JIS Vol. 7 No. 3, April 2016
Feature Selection for Intrusion Detection Using Random Forest
Abstract: An intrusion detection system collects and analyzes information from different areas within a computer or a network to identify possible security threats, including threats from both outside and inside the organization. It deals with large amounts of data, which contain various irrelevant and redundant features, resulting in increased processing time and a low detection rate. Therefore, feature selection should be treated as an indispensable pre-processing step to improve overall system performance significantly when mining huge datasets. In this paper, we focus on a two-step approach to feature selection based on Random Forest. The first step selects the features with higher variable importance scores and guides the initialization of the search process for the second step, which outputs the final feature subset for classification and interpretation. The effectiveness of this algorithm is demonstrated on the KDD’99 intrusion detection dataset, which is based on the DARPA 98 dataset and provides labeled data for researchers working in the field of intrusion detection. An important deficiency of the KDD’99 dataset is its huge number of redundant records, as observed earlier. Therefore, we have derived a dataset, RRE-KDD, by eliminating redundant records from the KDD’99 train and test datasets, so that the classifiers and the feature selection method will not be biased towards more frequent records. RRE-KDD consists of the KDD99Train+ and KDD99Test+ datasets for training and testing purposes, respectively. The experimental results show that the proposed Random Forest based approach can select the most important and relevant features for classification, which in turn not only reduces the number of input features and the processing time but also increases the classification accuracy.
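The redundant-record elimination described in the abstract can be illustrated with a minimal sketch. It assumes that "redundant" means an exact duplicate of an earlier record (all feature values and the label identical), which the abstract does not spell out. The function name `remove_redundant_records` and the toy rows are hypothetical and are not taken from the paper or from the KDD’99 files.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple

Record = Tuple[Hashable, ...]


def remove_redundant_records(records: Iterable[Sequence[Hashable]]) -> List[Record]:
    """Drop exact duplicate records, keeping the first occurrence of each in order.

    Assumption: a record is redundant only if every field (features and label)
    matches an earlier record exactly.
    """
    seen = set()
    unique: List[Record] = []
    for rec in records:
        key = tuple(rec)  # whole row as one hashable key
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Toy rows shaped like (protocol, service, src_bytes, label).
# The values are invented for illustration only.
train = [
    ("tcp", "http", 181, "normal"),
    ("icmp", "ecr_i", 1032, "smurf"),
    ("icmp", "ecr_i", 1032, "smurf"),
    ("tcp", "http", 181, "normal"),
    ("udp", "domain_u", 45, "normal"),
]

deduped = remove_redundant_records(train)
print(len(train), "->", len(deduped))
```

Applying the same function separately to the training and test sets would mirror the idea of deriving de-duplicated train and test sets, so that frequently repeated records do not dominate either the classifier or the importance scores.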
Cite this paper: Hasan, M., Nasser, M., Ahmad, S. and Molla, K. (2016) Feature Selection for Intrusion Detection Using Random Forest. Journal of Information Security, 7, 129-140. doi: 10.4236/jis.2016.73009.
