JIS Vol.6 No.2, April 2015
Quantifying Malware Evolution through Archaeology
Abstract: Dynamic analysis of malware allows us to examine malware samples and then group those samples into families based on observed behavior. Using Boolean variables to represent the presence or absence of each of a range of malware behaviors, we create a bitstring that represents each sample behaviorally, and then group samples into the same class if they exhibit the same behavior. Combining class definitions with malware discovery dates, we construct a timeline showing the emergence date of each class, in order to examine the prevalence, complexity, and longevity of each class. We find that certain behavior classes are more prevalent than others, following a frequency power law. Some classes have had lower longevity, indicating that their attack profile is no longer manifested by new variants of malware, while others, of greater longevity, continue to affect new computer systems. We verify for the first time commonly held intuitions on malware evolution, showing quantitatively from the archaeological record that over 80% of the time, classes of higher malware complexity emerged later than classes of lower complexity. In addition to providing historical perspective on malware evolution, the methods described in this paper may aid malware detection through classification, leading to new proactive methods to identify malicious software.
Cite this paper: Seideman, J., Khan, B. and Vargas, C. (2015) Quantifying Malware Evolution through Archaeology. Journal of Information Security, 6, 101-110. doi: 10.4236/jis.2015.62011.
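The pipeline summarized in the abstract (Boolean behavior flags, one bitstring per sample, classes of identical bitstrings, and a timeline of class emergence dates) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names and the sample data are hypothetical, and two measures are assumptions made only for illustration: complexity is taken as the number of behaviors present in the bitstring, and longevity as the span between the first and last discovery dates in a class. The abstract does not define either measure.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations


def behavior_bitstring(flags):
    """Encode a sequence of Boolean behavior flags as a '0'/'1' string."""
    return "".join("1" if present else "0" for present in flags)


def group_into_classes(samples):
    """Group samples by identical bitstring; keep each sample's discovery date."""
    classes = defaultdict(list)
    for flags, discovered in samples:
        classes[behavior_bitstring(flags)].append(discovered)
    return classes


def summarize(classes):
    """Per-class emergence date, prevalence, longevity, and complexity."""
    summary = {}
    for bits, dates in classes.items():
        summary[bits] = {
            "emergence": min(dates),              # earliest discovery in the class
            "prevalence": len(dates),             # number of samples in the class
            "longevity": max(dates) - min(dates), # assumption: first-to-last span
            "complexity": bits.count("1"),        # assumption: count of behaviors
        }
    return summary


def fraction_complex_later(summary):
    """Share of class pairs in which the more complex class emerged later.

    Pairs with equal complexity or equal emergence date are skipped.
    Returns None when no pair can be compared.
    """
    comparable = agree = 0
    for a, b in combinations(summary.values(), 2):
        if a["complexity"] == b["complexity"] or a["emergence"] == b["emergence"]:
            continue
        comparable += 1
        simpler, richer = sorted((a, b), key=lambda c: c["complexity"])
        if richer["emergence"] > simpler["emergence"]:
            agree += 1
    return agree / comparable if comparable else None


# Hypothetical samples: (behavior flags, discovery date)
samples = [
    ((True, False, False), date(2004, 1, 10)),
    ((True, False, False), date(2006, 5, 2)),
    ((True, True, False), date(2005, 3, 1)),
    ((True, True, True), date(2007, 8, 15)),
    ((False, True, True), date(2003, 6, 1)),
]

summary = summarize(group_into_classes(samples))
for bits, info in sorted(summary.items(), key=lambda kv: kv[1]["emergence"]):
    print(bits, info["emergence"], info["prevalence"], info["complexity"])
print(fraction_complex_later(summary))
```

Sorting the classes by emergence date yields the timeline. The pairwise comparison is one simple way to express the claim that more complex classes tend to emerge later; other statistics, such as a rank correlation, would serve the same purpose.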
