JIS Vol.6 No.3, July 2015
Evaluation of Modified Vector Space Representation Using ADFA-LD and ADFA-WD Datasets
Abstract: Predicting anomalous behaviour of a running process from its system call traces is a common practice in the security community and remains an active research area. It is a typical pattern recognition problem and can be addressed with machine learning algorithms. Standard system call datasets have been employed to train these algorithms. However, advancements in operating systems have made these datasets outdated and irrelevant. The Australian Defence Force Academy Linux Dataset (ADFA-LD) and the Australian Defence Force Academy Windows Dataset (ADFA-WD) are new-generation system call datasets that contain labelled system call traces for modern exploits and attacks on various applications. In this paper, we evaluate the performance of the Modified Vector Space Representation technique on the ADFA-LD and ADFA-WD datasets using various classification algorithms. Our experimental results show that the method performs well and helps to accurately distinguish process behaviour through system calls.
Cite this paper: Borisaniya, B. and Patel, D. (2015) Evaluation of Modified Vector Space Representation Using ADFA-LD and ADFA-WD Datasets. Journal of Information Security, 6, 250-264. doi: 10.4236/jis.2015.63025.
