JIS Vol.8 No.4, October 2017
Exploring the Effects of Gap-Penalties in Sequence-Alignment Approach to Polymorphic Virus Detection
Antiviral software systems (AVSs) have difficulty identifying polymorphic variants of viruses for which no explicit signatures exist. Alignment-based techniques from bioinformatics may provide a novel way to generate signatures from the consensus sequences found in polymorphic variant code. We demonstrate how multiple sequence alignment supplemented with gap penalties yields viral code signatures that generalize successfully to previously known polymorphic variants of the JS.Cassandra virus and to previously unknown polymorphic variants of the W32.CTX/W32.Cholera and W32.Kitti viruses. The implication is that future smart AVSs may be able to generate effective signatures automatically from actual viral code, varying gap penalties to cover both known and unknown polymorphic variants.
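The abstract does not state the scoring parameters used, so the following is only a minimal sketch of how a gap penalty shapes an alignment and the consensus extracted from it. It assumes a pairwise Needleman-Wunsch global alignment with a linear gap penalty and illustrative scores (match = 2, mismatch = -1), whereas the study itself uses multiple sequence alignment. The letters stand in for code tokens such as bytes or opcodes, and the names `align`, `consensus` and the `*` wildcard are hypothetical, not the authors' implementation.

```python
from typing import Tuple


def align(a: str, b: str, match: int = 2, mismatch: int = -1,
          gap: int = -2) -> Tuple[str, str, int]:
    """Needleman-Wunsch global alignment with a linear gap penalty (illustrative)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


def consensus(top: str, bottom: str) -> str:
    """Keep columns where both sequences agree; mark gaps and mismatches with a hypothetical '*' wildcard."""
    return "".join(c if c == d and c != "-" else "*" for c, d in zip(top, bottom))


if __name__ == "__main__":
    # Two toy "variants": the second contains one inserted token (X).
    t, b, s = align("ABCD", "ABXCD")
    print(t, b, s)            # AB-CD ABXCD 6
    print(consensus(t, b))    # AB*CD
    # The same substitution scored under a costly gap penalty and a free one.
    print(align("ABCD", "AXCD", gap=-2)[2], align("ABCD", "AXCD", gap=0)[2])  # 5 6
```

With a costly gap penalty the aligner accepts the substitution and the consensus becomes `A*CD`; with free gaps it prefers to split the differing tokens into gapped columns, which raises the score. Varying the penalty in this way changes which columns survive into the consensus, and therefore how general the resulting signature is.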
Cite this paper: Naidu, V., Whalley, J. and Narayanan, A. (2017) Exploring the Effects of Gap-Penalties in Sequence-Alignment Approach to Polymorphic Virus Detection. Journal of Information Security, 8, 296-327. doi: 10.4236/jis.2017.84020.