IJCNS  Vol.6 No.12 , December 2013
Improving Compression of Short Messages
Abstract: Compression of short text strings, such as the GSM Short Message Service (SMS) and Twitter messages, has received relatively little attention compared to the compression of longer texts. This is not surprising given that for typical cellular and internet-based networks, the cost of compression probably outweighs the cost of delivering uncompressed messages. However, this is not necessarily true in the case where the cost of data transport is high, for example, where satellite back-haul is involved, or on bandwidth-starved mobile mesh networks, such as the mesh networks for disaster relief, rural, remote and developing contexts envisaged by the Serval Project [1-4]. This motivated the development of a state-of-art text compression algorithm that could be used to compress mesh-based short-message traffic, culminating in the development of the stats3 SMS compression scheme described in this paper. Stats3 uses word frequency and 3rd-order letter statistics embodied in a pre-constructed dictionary to affect lossless compression of short text messages. This scheme shows that our scheme compressing text messages typically reduces messages to less than half of their original size, and in so doing substantially outperforms all public SMS compression systems, while also matching or exceeding the marketing claims of the commercial options known to the authors. We also outline approaches for future work that has the potential to further improve the performance and practical utility of stats3.  
Cite this paper: P. Gardner-Stephen, A. Bettison, R. Challans, J. Hampton, J. Lakeman and C. Wallis, "Improving Compression of Short Messages," International Journal of Communications, Network and System Sciences, Vol. 6 No. 12, 2013, pp. 497-504. doi: 10.4236/ijcns.2013.612053.

[1]   P. Gardner-Stephen, “The Serval Project: Practical Wireless Ad-Hoc Mobile Telecommunications,” 2011.

[2]   P. Gardner-Stephen and R. Challans and A. Bettison and J. Lakeman and C. Wallis, “The Serval Project,” 2012.

[3]   P. Gardner-Stephen and J. Lakeman and R. Challans and C. Wallis and A. Stulman and Y. Haddad, “MeshMS: Ad Hoc Data Transfer within Mesh Network,” International Journal of Communications, Network and System Sciences, Vol. 5, No. 8, 2012, pp. 496-504.

[4]   P. Gardner-Stephen and S. Palaniswamy, “Serval Mesh Software-WiFi Multi Model Management. Proceedings of the 1st International Conference on Wireless Technologies for Humanitarian Relief in ACWR 11, New York, 2011, pp. 71-77.

[5]   European Telecommunications Standards Institute, “Digital Cellular Telecommunications System (Phase 2+); Compression Algorithm for Text Messaging Services (GSM TS 03.42 Version 7.1.1 Release 1998),” 1999.

[6]   European Telecommunications Standards Institute, “Digital Cellular Telecommunications System (Phase 2+); Universal Mobile Telecommunications System (UMTS); LTE; Compression Algorithm for Text Messaging Services (3GPP TS 23.042 version 11.0.0 Release 11),” 2012.

[7]   P. G. Howard and J. S. Vitter, “Arithmetic Coding for Data Compression,” Proceedings of IEEE, Vol. 82, No. 6, 1994, pp. 857-865.

[8]   P. G. Howard and J. S. Vitter, “Analysis of Arithmetic Coding for Data Compression,” Information Processing and Management, Vol. 28, No. 6, 1992, pp. 749-764.

[9]   G. G. Langdon Jr., “An Introduction to Arithmetic Coding,” IBM Journal of Research and Development, Vol. 28, No. 2, 1984, p. 135.

[10]   J. Rissanen and G. G. Langdon, “Arithmetic Coding,” IBM Journal of Research and Development, Vol. 23, No. 2, 1979, pp. 149-162.

[11]   A. Sanko, “Five Cents on Arithmetic Coding,” Technical Report, 2005.

[12]   I. H. Witten and Radford M. Neal and J. G. Cleary. “Arithmetic Coding for Data Compression,” Communications of the ACM, Vol. 30, No. 6, 1987, pp. 520-540.

[13]   CleverTexting, myMobile Ergonomics.

[14]   SMSzipper, Acticom GmbH.

[15]   Panini Keypad, Luna Ergonomics Private Ltd.

[16]   T. M. Mahmoud and B. A. Abdel-latef and A. A. Ahmed and A. M. Mahfouz and T. M. Mahmoud and B. A. Abdel-latef and A. A. Ahmed and A. M. Mahfouz, “Hybrid Compression Encryption Technique for Securing SMS,” International Journal of Computer Science and Security, 2009.

[17]   J. L. Gailly, “Gzip Program and Documentation,” 1993.

[18]   M. Burrows and D. J. Wheeler, “A Block-Sorting Lossless Data Compression Algorithm,” Technical Report, 124, Digital Equipment Corporation, 1994.

[19]   J. Seward, M. Burrows, D. Wheeler, P. Fenwick and A. Moffat and Radford Neal and Ian Witten, “The BZIP2 Compression Program,” 2001.

[20]   M. V. Mahoney, “Adaptive Weighing of Context Models for Lossless Data Compression,” Technical Report, CS-2005-16, Florida Institute of Technology CS Department, 2005.

[21]   T. J. Ferguson and J. H. Rabinowitz. Self-Synchronizing Huffman Codes. IEEE Transactions on Information Theory, Vol. 30, No. 4, 1984, p. 687.

[22]   R. G. Gallager, “Variations on a Theme by Huffman,” IEEE Transactions on Information Theory, Vol. IT-24, No. 6, 1978, pp. 668-674.

[23]   D. A. Huffman, “A method for the construction of minimum redundancy codes,” Proceedings of the IRE, Vol. 40, 1952, pp. 1098-1101.

[24]   M. Jakobsson, “Huffman Coding in Bit-Vector Compression,” Information Processing Letters, Vol. 7, No. 6, 1978, pp. 304-307.

[25]   J. van Leeuwen. “On the Construction of Huffman Trees. Proceedings of 3rd International Colloqium on Automata, Languages and Programming, 1976, pp. 382-410.

[26]   Salvatore Sanfilippo, “SMAZ—Compression for Very Small Strings,” 2009.

[27]   T. A. Almeida, J. M. G. Hidalgo and A. Yamakami, “Contributions to the Study of SMS Spam Filtering: New Collection and Results,” Proceedings of the 11th ACM Symposium on Document Engineering in DocEng‘11, New York, 2011, pp. 259-262.

[28]   M. T. Nuruzzaman, L. Changmoo and C. Deokjai, “Independent and Personal SMS Spam Filtering,” 2011 IEEE 11th International Conference on Computer and Information Technology (CIT), 2011, pp. 429-435.

[29]   R. McCreadie, I. Soboroff, J. Lin, C. Macdonald, I. Ounis, and D. McCullough, “On Building a Reusable Twitter Corpus,” Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval in SIGIR ‘12, New York, 2012, pp 1113-1114.

[30]   C. Constantinescu, J. Q. Trelewicz and R. B. Arps, “Natural Language Insensitive Short Textual String Compression,” 2004, pp. 1-10.

[31]   A. Moffat and L. Stuiver, “Binary Interpolative Coding for Effective Index Compression,” Information Retrieval, Vol. 3, No. 1, 2000, pp. 25-47.