Communications and Network (CN), Vol. 5 No. 4, November 2013
Lossless Compression of SKA Data Sets
Abstract: With astronomical data archives continuing to grow at an enormous rate, providers and end users of astronomical data sets will benefit from effective data compression techniques. This paper explores different lossless data compression techniques and aims to find an optimal algorithm for compressing astronomical data obtained by the Square Kilometre Array (SKA), which are new and unique in the field of radio astronomy. The compression was required to be lossless and to be performed while the data are being read. The project was carried out in conjunction with the SKA South Africa office. Data compression reduces the time taken and the bandwidth used when transferring files, and it can also reduce the cost of data storage. The SKA uses the Hierarchical Data Format (HDF5) to store the data collected from the radio telescopes; the data sets used in this study range from 29 MB to 9 GB in size. The compression techniques investigated include SZIP, GZIP, the LZF filter, LZ4 and the Fully Adaptive Prediction Error Coder (FAPEC). The algorithms and methods used to perform the compression tests are discussed, and the results from the three phases of testing are presented, followed by a brief discussion of those results.
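As a rough illustration of the kind of measurement the abstract describes (compression ratio and time, with a check that the round trip is lossless), the sketch below uses Python's standard zlib module, which implements the DEFLATE algorithm that GZIP is built on. It is not the authors' test harness: the synthetic samples stand in for SKA data, and the helper names make_samples and measure are hypothetical.

```python
import random
import struct
import time
import zlib


def make_samples(n, seed=0):
    """Synthetic stand-in for telescope samples (assumption): slow ramp plus small noise."""
    rng = random.Random(seed)
    values = [(i // 64) + rng.randint(-3, 3) for i in range(n)]
    # Pack as little-endian 32-bit signed integers.
    return struct.pack(f"<{n}i", *values)


def measure(raw, level):
    """Compress with DEFLATE at the given level; return (ratio, seconds, lossless)."""
    start = time.perf_counter()
    packed = zlib.compress(raw, level)
    elapsed = time.perf_counter() - start
    # Lossless means the decompressed bytes match the input exactly.
    restored = zlib.decompress(packed)
    return len(raw) / len(packed), elapsed, restored == raw


if __name__ == "__main__":
    raw = make_samples(100_000)
    for level in (1, 4, 9):
        ratio, seconds, lossless = measure(raw, level)
        print(f"level={level} ratio={ratio:.2f} "
              f"time={seconds * 1e3:.1f} ms lossless={lossless}")
```

Higher levels generally trade longer compression time for a better ratio, which is the trade-off that matters when data must be compressed while they are being read.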
Cite this paper: K. Rajeswaran and S. Winberg, "Lossless Compression of SKA Data Sets," Communications and Network, Vol. 5 No. 4, 2013, pp. 369-378. doi: 10.4236/cn.2013.54046.