JIS Vol.11 No.1, January 2020
Comparing the Area of Data Mining Algorithms in Network Intrusion Detection
Abstract: Network-based intrusion detection has become a common task for evaluating machine learning algorithms. Although the KDD Cup’99 dataset is imbalanced across its intrusion classes, it still plays a significant role in evaluating machine learning algorithms. In this work, we use singular value decomposition (SVD) to reduce the feature dimension. We then reconstruct the features from the reduced features and the selected eigenvectors. The reconstruction loss is used to decide the intrusion class of a given network feature vector: the class with the smallest reconstruction loss is accepted as the intrusion class for that sample. The proposed system yields 97.90% accuracy on the KDD Cup’99 dataset for this task. We have also analyzed the system on each intrusion category separately. This analysis suggests that an ensemble of multiple classifiers is preferable; we therefore also built a random forest classifier. The random forest classifier performs significantly better than the SVD-based system, achieving 99.99% accuracy for intrusion detection on the same training and testing data.
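The reconstruction-loss rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one rank-k basis is fitted per class from that class's training rows, that no mean-centering is applied, and that the loss is the squared Euclidean error. The function names (`fit_class_bases`, `reconstruction_loss`, `predict`) and the toy three-feature data are hypothetical and stand in for the KDD Cup’99 features.

```python
import numpy as np


def fit_class_bases(X, y, k):
    """Fit one rank-k basis per class (assumption: a separate SVD per class)."""
    bases = {}
    for label in np.unique(y):
        # Rows of vt are right singular vectors, i.e. eigenvectors of Xc^T Xc.
        _, _, vt = np.linalg.svd(X[y == label], full_matrices=False)
        bases[label] = vt[:k].T  # shape: (n_features, k)
    return bases


def reconstruction_loss(x, basis):
    """Squared error between x and its reconstruction from k reduced features."""
    reduced = x @ basis            # project onto the selected eigenvectors
    restored = reduced @ basis.T   # map the reduced features back
    return float(np.sum((x - restored) ** 2))


def predict(X, bases):
    """Assign each row to the class whose basis reconstructs it best."""
    labels = list(bases)
    losses = np.array(
        [[reconstruction_loss(x, bases[label]) for label in labels] for x in X]
    )
    return np.array(labels)[losses.argmin(axis=1)]


# Toy data (hypothetical): each class lies near a different direction.
rng = np.random.default_rng(0)
normal = rng.normal(size=(50, 1)) * np.array([1.0, 0.0, 0.0])
attack = rng.normal(size=(50, 1)) * np.array([0.0, 1.0, 1.0])
X_train = np.vstack([normal, attack]) + 0.01 * rng.normal(size=(100, 3))
y_train = np.array(["normal"] * 50 + ["attack"] * 50)

bases = fit_class_bases(X_train, y_train, k=1)
print(predict(np.array([[2.0, 0.0, 0.0], [0.0, 3.0, 3.0]]), bases))
```

In this sketch the number of retained eigenvectors `k` controls how much of each class's structure is kept; the value used in the paper is not stated in the abstract.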
Cite this paper: Alagrash, Y., Drebee, A. and Zirjawi, N. (2020) Comparing the Area of Data Mining Algorithms in Network Intrusion Detection. Journal of Information Security, 11, 1-18. doi: 10.4236/jis.2020.111001.
