DDoS Attack Detection Using Heuristics Clustering Algorithm and Naïve Bayes Classification

Show more

1. Introduction

In today’s world of high speed internet and network system, security of system from various threats has been a major concern world widely. Among various possible network threats and attacks, Distributed Denial of Service attack is the attack with most devastating effects. A Denial of Service attack is the type of attack that typically uses a single computer and one internet connection to flood a targeted system or resources [1] so as to prevent the legitimate users from accessing the system or the resources. A distributed denial of service attack is one in which a multitude of compromised systems attack a single target, thereby causing denial of service for users of the targeted system.

Intrusion detection is “the process of monitoring the events occurring in a computer system or network and analyzing them for signs of intrusions, defined as attempts to compromise the confidentiality, integrity, availability, or to bypass the security mechanisms of a computer or network” [2] . There are generally two types of Intrusion detection system: Misuse detection and Anomaly detection. In misuse detection, each instance in a data set is labeled as “normal” or “intrusion” and a learning algorithm is trained over the labeled data. Whereas an anomaly detection technique builds models of normal behavior, and automatically detects any deviation from it, flagging the latter as suspect [2] .

For developing an effective intrusion detection system, data mining techniques have been very helpful and a lot of research is ongoing these days because data mining approach is useful for extracting a wide range of features from network flow which can be helpful for distinguishing the attack packet from normal packet. In this proposed system, clustering followed with classification technique of data mining has been used. Clustering is the unsupervised technique that is used to group together the similar items to extract new knowledge from a largely data set. While classification is a data mining technique that assigns categories to collection of data in order to aide in more accurate predictions and analysis.

Clustering technique means separating dissimilar items, according to some defined dissimilarity measure among data items themselves [3] . The most widely used clustering technique for DDoS detection is K-means Clustering algorithm that separates the anomaly packet from normal packet. A variation of K-means algorithm called as K-Medoids has also been used. K-Means algorithm takes the mean of data point as the cluster center therefore is influenced by the extreme values and outliers. It is simple, has low time complexity but is sensitive to initial centers since we need to assume the number of cluster at the beginning of the clustering and the initial centers are chosen at the random. The other major shortcomings of K-Means are: 1) degeneracy and 2) incapability to process the character attributes of network packet. K-Medoids algorithm however solves the degeneracy problem of K-Means algorithm since in K-Medoids we choose the actual data objects present in the data set as the center of the cluster instead of taking the mean value of the data sets. It is more robust to noises and outliers. Therefore, we can say that the existing works that has been based on K-Means and K-Medoids has three shortcomings namely degeneracy, cluster dependency and lacking of the ability of dealing with character attributes in the network transactions.

Classification technique categories the available set of data for accurate analysis. The category can be termed as class label. In case of anomaly detection, it will classify the data generally into two categories namely normal or abnormal [4] . A Naïve Bayes classifier is a simple probabilistic classifier based on applying Bayes’ theorem with strong (naïve) independence assumptions. A Naïve Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. Depending on the precise nature of the probability model, Naïve Bayes classifiers can be trained very efficiently in a supervised learning setting.

Bayes Theorem can be expressed as:

$P\left(H|X\right)=P\left(X|H\right)P\left(H\right)/P\left(X\right)$ (1)

Let X be the data record, H be some hypothesis representing data record X, which belongs to a specific class C. For classification, we would like to determine P(H|X), which is the probability that the hypothesis H holds, given an observed data record X. P(H|X) is the posterior probability of H conditioned on X. In contrast, P(H) is the prior probability. The posterior Probability P(H|X), is based on more information such as background knowledge than the prior probability P(H), which is independent of X. Similarly, P(X|H) is posterior probability of X conditioned on H. Bayes theorem is useful because it provides ways to calculate the posterior probability P(H|X) from P(H), P(X), and P(X|H) [5] .

Therefore, the use of Heuristic Clustering Algorithm followed by Naïve Bayes Classification in this paper has contributed to overcome the problem of degeneracy, has developed as DDoS attack detection system that takes into account of both the character and numerical attributes of the network data packet. The proposed hybrid learning approach has lead into better performances in terms of Accuracy, Detection Rate and False Positive Rate and has proved that hybrid learning approach is better than Clustering and Classification technique alone.

2. Related Work

M. Jianliang, et al. has introduced the application on intrusion detection based on K-means clustering algorithm. K-means is used for intrusion detection to detect unknown attack and partition large data space effectively but it has many disadvantages like degeneracy and cluster dependence. Yu Guan, et al. has introduced Y-means algorithm which is a clustering method of intrusion detection. This algorithm is based on K-means algorithm and other related clustering algorithm. It overcomes two short comings of K-means i.e. no of cluster dependency and degeneracy. Zhou mingqiang, et al. has introduced a new concept of a graph based clustering algorithm for anomaly based clustering algorithm for anomaly intrusion detection. They used outlier detection method which is based on local deviation coefficient (LDCGB). Compared to other intrusion detection algorithm of clustering this algorithm is unnecessary to initial cluster number.

T. Velmurugan and T. Santhanam have analyzed the efficiency of k-Means and k-Medoids clustering algorithms by using large datasets in the cases of normal and uniform distribution; and found that the average time taken by k-Means algorithm is greater than that of k-Medoids algorithms for both the cases [2] .

M. Jianliangetall has implemented K-means algorithm to cluster and analyze the data of KDD-99 dataset. This algorithm can detect unknown intrusions in the real network connections. The simulations results that run on KDD-99 data set showed that the K-means method is an effective algorithm for partitioning large data set. Jose F. Nieves presented a comparative study with more emphasis on the unsupervised learning methods for anomaly detection. K-means algorithm with KDD Cup 1999 network data set is used to evaluate the performance of an unsupervised learning method for anomaly detection. High detection rate can be achieved while maintaining a low false alarm rate is the results of this work evaluation [6] .

K. Sarmila, G. Kavin has introduced the Heuristic clustering algorithm to cluster the data and detect DDoS attacks i; n DARPA 2000 datasets and has obtained better results in terms of detection rate and false positive rate in comparison to K-Means and K-Medoids algorithm. Chitrakar R and Huang chuanhe has proposed a hybrid learning approach of combining k-medoids clustering and naive bayes classification that has grouped the whole data into clusters more accurately than K-means such that it results in better classification. The hybrid approach was tested in Kyoto 2006+ datasets.

3. Proposed Method

Figure 1 here shows the system block diagram of the proposed algorithm .Here, the workflow starts with the extraction of nine network attributes from the datasets followed by the preprocessing of data to eliminate those data values that would ultimately result in wrong output. Once, the dataset is prepared after preprocessing, those datasets are fed into Heuristics Clustering Algorithm that results

Figure 1. System block diagram.

in cluster formation. After the cluster formation dataset is then classified as either Attack or Normal instances using Naïve Bayes Classification.

The proposed method uses Heuristic Clustering Algorithm for clustering of data which is then followed by Naïve Bayes Classification for classifying the clusters into either Normal or Attack instances. For comparison of the results obtained from the proposed method with the result from existing system of reference paper, labelling scheme defined in the paper is also performed after clustering. Finally, the result obtained is compared using the performance parameters namely Accuracy, Detection Rate and False Positive Rate. The algorithm used is discussed below.

3.1. Heuristic Clustering Algorithm

1) Some Notations

Notation1: Let $H=\left\{{H}_{1},{H}_{2},\cdots ,{H}_{m}\right\}$ be a set of attribute values, the m is number of attribute values

Notation 2: Let
$H={H}_{N}\cup {H}_{S}$ and
${H}_{N}\cap {H}_{S}=\varnothing $ , where H_{N} is the subset of numerical attribute and H_{S} is the subset of character attribute.

Notation 3: Let,
${e}_{i}=\left({h}_{i1},{h}_{i2},\cdots ,{h}_{im}\right)$ , e_{i} is a record, the m is number of attribute values and h_{ij} is the value of H_{m}.

Notation 4: $E=\left\{{e}_{1},{e}_{2},\cdots ,{e}_{n}\right\}$ , E is the set of records; n is the number of packets [2] .

The Center of Cluster

A cluster is represented by its cluster center. In the HCA algorithm, we use the algorithm Count ( ) to compute the cluster center. The center of a cluster is composed of the center of numerical attributes and character attribute. Let P = (P_{N} + P_{S}), and P = (P_{1}, P_{2}, ∙∙∙, P_{m}) where P_{N} is the center of numerical attribute, the P_{S} is the center of character attribute,

${P}_{N}=\frac{1}{n}{\displaystyle {\sum}_{j=1}^{n}hji},\text{\hspace{0.17em}}i=1,2,\cdots ,p\text{\hspace{0.17em}}\left(p\le m\right)$ (2)

The hji is the numerical attribute and P_{S} is the frequent character attribute set which consists of q most frequent character attribute [2] .

2) The Initial Center of Cluster

In the beginning of clustering, we should confirm two initial center of clustering by the algorithm Search ( ).

Algorithm: Search_m(E,l).

Input: E = data set

l = number of sampling

Output: Initial center m_{1}, m_{2}.

Pseudocodes:

1) From the set of data E, get samples S_{1}, S_{2}, ∙∙∙, S_{l}

2) For i ← 1 to L

m_{i} = Count_m(Si) // m = center {m_{1},m_{2}, m_{3}, ∙∙∙m_{l}}

3) m_{1} = m, m_{2} = max (Sim (m, m_{i})) [2]

3) Computing Similarity

The dataset consists of numerical attribute and character attribute. The similarity of character attributes is calculated through attribute matching.

Let e_{i} and e_{j} be two records in the E. all containing m attributes (including P character attributes), the nhik and nhjk is the number of hik and hjk respectively.

$Si{m}^{P}\left({e}_{i},{e}_{j}\right)={\displaystyle {\sum}_{k=1}^{p}\frac{nhik+nhjk}{nhik\ast nhjk}\ast A}$ (3)

If (hik = hjk) then A = 0 else A = 1.

The similarity of numerical attribute (to the numerical attribute, still use the classical Euclidean distance to computer similarity.

$Si{m}^{N}\left({e}_{i},{e}_{j}\right)=\sqrt{{\displaystyle {\sum}_{k=1}^{q}\left|hik-hjk\right|2}}$ (4)

The similarity of two records (including similarity of numerical attribute and similarity of character attribute) is calculated as:

$Sim\left({e}_{i},{e}_{j}\right)=Si{m}^{N}\left({e}_{i},{e}_{j}\right)+Si{m}^{P}\left({e}_{i},{e}_{j}\right)$ (5) [2]

4) Heuristic Clustering Algorithm

Step 1. Confirm two initial cluster centers by algorithm search ( ).

Step 2. Import a new record.

Step 3. Compute the similarity between the new record and the centers of clusters by algorithm Similar ().

Step 4. Compute the similarity between the centers of clusters.

Step 5. If the minimum similarity between the record and centers of clusters is greater than the minimum similarity between the centers of clusters, create a new cluster with the record as the new center until no change [2] .

5) Labelling

In the labeling method, we assume that center of a normal cluster is highly close to the initial cluster center vh which are created from the clustering. In other words, if a cluster is normal, the distance between the center of the cluster and vh will be small, otherwise it will be large. Thus, we first, for each cluster center Cj, calculate the maximum distance to vh. We then calculate the average distance of the maximum distances. If the maximum distance from a cluster to vh is less than the maximum average distance, we label the cluster as normal. Otherwise, label as attack. Here the similarity measure is used as the distance measure i.e. Attribute Matching for character attributes and Euclidean distance measure for numerical attributes [2] .

3.2. Naïve Bayes Classification

Input: D: Data set having n data objects

C: Set of classes e.g. {Normal; Attack}

X: Data record to be classified

H: Hypothesis (that X is classified into C)

Output: The predicted class CNB where X should be classified into.

Pseudocodes:

For j ← 1 to no. of classes

Cj_count ← no. of Di where Di.class_label = j;

P(Cj) ← Cj_count/n;

For each attribute value Xl in X

Xl_count ← no. of Xl in Cj;

P(Xl |Cj) ← Xl_count / Cj_count;

EndFor

P(X) ← average (P(Xl |Cj));

Endfor

For j ← 1 to no_of_classes

P(Cj|X) ← P(Cj/H) * P(Cj) / P(X)

CNB = max(P(Cj|X)) [5]

4. Experiments and Results

Two sets of experiments are performed as:

1) Heuristics Clustering Algorithm with Labelling

2) Heuristics Clustering Algorithm with Naïve Bayes Classification

a) Selection of Experimental Data

To perform the series of experiments 12 samples of two different datasets namely “CAIDA UCSD DDoS Attack 2007 Dataset” and DARPA 2000 Dataset” with each sample consisting of 10,000 datasets are selected.

b) Extraction of Network Attributes

The set of 9 data packet attributes are extracted from the dataset. The attributes are Source IP Address, Destination IP Address, Protocol, Source Port, Destination Port, Sequence number, Acknowledgment number, length, and Window size.

c) Data Pre-processing

Data pre-processing is done to eliminate all those data packets that would ultimately lead to wrong results using data analysis tools: Wireshark Tool.

d) The Experimental Procedure

Using the selected sets of data samples, both the programs are executed simultaneously and the number of true positive, true negative, false positive and false negative values of both the programs are recorded and used in the performance evaluation of both the programs.

e) Performance Parameters

The performance of the proposed algorithm is evaluated using the Performance parameters namely Accuracy (A), Detection Rate (DR) and False Positive Rate (FPR) using following equations:

$\text{Accuracy}\left(\text{A}\right)=\left(\text{TP}+\text{TN}\right)/\left(\text{TP}+\text{TN}+\text{FP}+\text{FN}\right)$ (6)

$\text{DetectionRate}\left(\text{DR}\right)=\left(\text{TP}\right)/\left(\text{TP}+\text{FP}\right)$ (7)

$\text{FalsePositiveRate}\left(\text{FPR}\right)=\left(\text{FP}\right)/\left(\text{FP}+\text{TN}\right)$ (8)

where,

True Positive (TP) = Attacks that are correctly detected as attack

True Negative (TN) = Normal data that are correctly detected as normal

False Positive (FP) = Normal data that are incorrectly detected as attack

False Negative (FN) = Attack that are incorrectly detected as normal

The below shown tables illustrates the improvement of accuracy, detection rate and false positive rate of the proposed algorithm i.e. Heuristics Clustering Algorithm with Naïve Bayes Classification over Heuristics Clustering algorithm with Labelling.

Table 1 here shows the improvement in Accuracy with HCA Clustering with NB Classification in UCSD DDoS attack 2007 dataset where we can see the average improvement of 8.16% with highest improvement of 25.82% and lowest as 2.11%.

Table 2 here shows the improvement in Accuracy with HCA Clustering with NB Classification in DARPA 2000 dataset where we can see the average improvement of 14.31% with highest improvement of 29.67% and lowest as 0.8%.

Table 3 here shows the improvement in Detection Rate with HCA Clustering with NB Classification in UCSD DDoS attack 2007 dataset where we can see the average improvement of 32.21% with highest improvement of 66.35% and lowest as 1.71%.

Table 4 here shows the improvement in Detection Rate with HCA Clustering with NB Classification in DARPA 2000 dataset where we can see the average improvement of 42.49% with highest improvement of 90% and lowest as 0.1%.

Table 5 here shows the reduction in False Positive Rate with HCA Clustering with NB Classification in UCSD DDoS attack 2007 dataset where we can see the average reduction of 1.22% with highest reduction of 2.6% and lowest as 0.04%.

Table 6 here shows the reduction in False Positive Rate with HCA Clustering with NB Classification in DARPA 2000 dataset where we can see the average reduction of 11.84% with highest reduction of 27.03% and lowest as 0.8%.

Table 1. Comparison of accuracy in CAIDA UCSD DDoS attack 2007 dataset.

Table 2. Comparison of accuracy in DARPA 2000 dataset.

Table 3. Comparison of Detection Rate in CAIDA UCSD DDoS Attack 2007 Dataset.

Table 4. Comparison of detection rate in DARPA 2000 dataset.

Table 5. Comparison of false positive rate in CAIDA UCSD DDoS attack 2007.

Table 6. Comparison of false positive rate in DARPA 2000 dataset.

Performance Analysis

From the above experiments and results, it is seen that the Accuracy and Detection Rate has been improved with corresponding reduction in False Positive Rate. Therefore, the proposed algorithm has justified it’s intend of improving the results in terms of performance parameter of Heuristics algorithm alone.

Figure 2 shows the improvement in Accuracy with HCA clustering followed by NB classification where we can see that the highest improvement is 25.82% and lowest improvement is 3.74% for CAIDA dataset. Whereas, the highest improvement is 29.67% and the lowest improvement is 0.80% for DARPA dataset.

Figure 3 shows the improvement in Detection Rate with HCA clustering followed by NB classification where we can see that the highest improvement is 66.35% and lowest improvement is 4.99% for CAIDA dataset. Whereas, the highest improvement is 80.88% and the lowest improvement is 0.10% for DARPA dataset.

Figure 4 shows the improvement in False Positive Rate with HCA clustering followed by NB classification where we can see that the highest improvement is 2.60% and lowest improvement is 0.04% for CAIDA dataset. Whereas, the highest improvement is 27.03% and the lowest improvement is 0.80% for DARPA dataset.

Figure 2. Improvement in Accuracy with HCA Followed by NB Classification.

Figure 3. Improvement in Detection Rate with HCA Clustering followed by NB Classification.

Figure 4. Improvement in false positive rate with HCA clustering followed by NB classification.

5. Conclusions and Future Works

From the above analysis we can infer that for both the datasets, Heuristic Clustering Algorithm followed by Naïve Bayes Classification results in better result in terms of higher Accuracy, higher Detection Rate and lower False Positive Rate in comparison to result obtained from Heuristic Clustering Algorithm with Labelling.

In this work, we have performed all the experiments by taking a uniform sample size for both the datasets and 10% attack data is used collectively for the 12 data samples i.e. attack percentage is taken at random for 12 different data samples to reach the total 12 percentage margins. We have used Naïve Bayes Classification method that works very well for good data distributions but data distribution model varies from environment to environment for intrusion detection system.

Therefore in future, this work can be extended as:

1) Data distribution can be changed i.e. both small size and large data size samples can be taken instead of equal size uniform samples for testing the result.

2) Equal percentage of attack data can be taken for each data samples.

3) Since, Naïve Bayes Classification works well only for good data distribution another classification technique like Support Vector Machine that works better for small sized samples as well can be taken into consideration for future work.

Acknowledgements

The authors would like to extend their gratitude to Department of Graduate Studies, Nepal College of Information Technology for its constant support and motivation. We would also like to thank the Journal of Information Security for its feedbacks and reviews.

References

[1] SANS Institute InfoSec Reading Room (2011) Denial of Service Attacks and Mitigation Techniques: Real Time Implementation with Detailed Analysis. SANS Institute Reading Room Site.

[2] Sarmila, K. and Kavin, G. (2014) A Clustering Algorithm for Detecting DDoS Attacks in Networks. International Journal of Recent Engineering Science, 1, ISSN: 2349-7157.

[3] Bhaya, W. and Manaa, M.E. (2014) Review Clustering Mechanisms of Distributed Denial of Service Attacks. Journal of Computer Science, 10, 2037-2046, ISSN: 1549-3636.

https://doi.org/10.3844/jcssp.2014.2037.2046

[4] Shikha, A. and Jitendra, A. (2015) Survey on Anomaly Detection using Data Mining Techniques. 19th International Conference on Knowledge Based and Intelligent Information and Engineering Systems.

[5] Chitrakar, R. and Chuanhe, H. (2012) Anomaly Based Intrusion Detection Using Hybrid Learning Approach of Combining k-Medoids Clustering and Naïve Bayes Classification. Proceedings of 8th IEEE International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM).

[6] Digital Attack Map. In: Digitalattackmap.com. N.p., 2017. Web. 26 Apr. 2017.