thers. Additionally, nodes located on each link utilize the pairwise shared key for secure transmission  . Piyi et al. introduced an efficient data aggregation method with the constant communication overhead. The efficient actions against the passive and active privacy attacks were facilitated. The proposed method was proven to be robust for data loss and reduced transmission cost which was suited to be applied for large networks  . Arumugam and Ponnuchamy introduced an energy-efficient LEACH protocol for data gathering. The proposed protocol provided better packet delivery ratio and improved the network lifetime  . The limitations of the existing secure data aggregation technique and benefits of the proposed SAC-TA approach are shown in Table 1.
3. SAC-TA: A Secure Area Based Clustering for Data Aggregation Using Traffic Analysis
In this section, we proposed a novel secure area based clustering for data aggregation in WSN. In many applications, the physical occurrence is sensed by the sensors and reported the sensed information to the Base Station (BS). Data aggregation is used for solving the disintegration and overlapping problems occur during the data-centric routing in WSN. The data having same attribute is aggregated, when the data reach the same routing node located on the path back to the BS. The security issues, data confidentiality, and integrity are the vital factors for the data aggregation process, during the deployment of the sensor network in a hostile environment. Figure 2 shows the overall flow of the proposed approach.
To reduce the energy utilization, the application should incorporate an in-network aggregation before it reaches the BS. The compromised nodes can perform malicious actions that affect the aggregation results. Before the detection of the malicious nodes, a secure aggregation protocol will safeguard the data packets and forward the data in a secured route. The sensor nodes are separated into different clusters, where each cluster has a CH. The CH acts as an aggregator for collecting the data received from the sensor nodes.
3.1. Network Model
Let us assume a WSN of “N” sensor nodes that are randomly distributed over the area M * M. The proposed network model is based on some assumptions.
・ The sink with infinite energy level is located within the monitoring area.
・ The sensor nodes are deployed within the specified area.
・ The nodes can dynamically adjust the communication power according to the distance between the sink and other nodes.
・ The communication between the nodes is reliable and symmetric.
3.2. Area Based Clustering
The area based clustering method uses the location information for each sensor nodes. The coordinates (xi, yi) for each sensor nodes are utilized to calculate the distance between two sensor nodes. The nodes are clustered to minimize the residual energy and maximize the lifetime of the node and network. The CH is selected based on
Table 1. Comparison of the limitations of the existing secure data aggregation technique and benefits of the proposed SAC-TA approach.
Figure 2. Flow of the proposed SAC-TA approach.
the cluster center and highest residual energy. The cluster center is estimated based on the minimum distance between the cluster node and the centroid. The data gathering is performed between the cluster members to CHs; and CHs to BS. The distance between the reference nodes is calculated using the following equation
Here, and are the coordinates of the reference node.
3.2.1. Node Deployment
The nodes are deployed around the sink using the two-dimensional Gaussian distribution. The deployed nodes are battery powered with the initial energy. denotes the standard deviation for the “x” and “y” dimensions of the node. The Gaussian distribution is defined as
3.2.2. Cluster Formation
After the deployment of the nodes, the cluster is formed. A non-cluster node considers the size of the cluster “i” denoted as to decide which cluster to join in the kth round. In most of the cases, the non-cluster nodes depend on the signal strength of the CHs to decide about the cluster to join based on the assumption about the uniform distribution of the sensor nodes in the monitoring region. But this is not possible in the practical scenarios. The energy dissipation of the CH of a relatively big cluster for aggregating data from the nodes is highly greater than the CH in a smaller cluster. This leads to the imbalance of energy consumption in the CHs and results in the adverse impacts in the network lifetime. A new criterion for the cluster formation is introduced in this work, for achieving better load balancing among the CHs. When a non-cluster node decides to join a cluster, it considers the residual energy of the CH and size of the cluster. This is defined as
where, is the energy of the node “I” at the kth round; is a factor that is used for adjusting the impact of the size of the cluster and. When a sensor node is located far away from all CHs, the above equation is used to choose its CH having high residual energy and lesser cluster size than the other CHs. This ensures better load balancing to the CHs, to improve the overall lifetime and better performance of the WSN.
3.2.3. Optimal CH Selection
The CH is chosen based on the distance measurement approach. The node located at the center of the cell is selected as the CH, due to the minimum distance between the center node and other cluster member nodes. The residual energy of the node is the main condition for selecting the CH. The nodes forming a cluster are disseminated within a small region, and the sink is located far away from the nodes. The data aggregation results in the energy saving effect, since the nodes require only a minimum amount of energy for transmitting the data directly to the CH, rather than sending the data to the sink. So, the nodes located closer within a cluster are inferred due to the minimum amount of energy consumption. Hence, selection of the CH is performed based on the residual energy amount of the sensor nodes.
Let us consider a WSN with “N” number of sensor nodes. is defined as the concentration degree of the node “i”, for sensing the number of sensor nodes during the kth round.
is defined as the selection weight of the node “a” in the kth round.
where “C” is the number of clusters; “” is the initial energy of the node “a”; “” is the average residual energy of the network in the “kth” round; is the residual energy of the node “a” in the kth round; is an adaptive factor to adjust the influence of the residual energy of the node and concentration degree to the selection weight. The adaptive factor increases gradually with the reduction in the residual energy, to adapt to the decrease in the number of effective nodes in the WSN.
The CH aggregates and compresses data before transmitting the data to the sink. The optimal probability of a node being selected as a CH is defined as the function of the spatial density, during the uniform distribution of the nodes over the sensor field. Optimal clustering is achieved, when the energy consumption across all the sensor nodes is low. Let us consider that the distance between the nodes to the sink or CH is. Thus, the energy dissipation in the CH, during transmission of “B” bit message over a distance “D” in a round is given as
where, is the energy dissipated per bit; “D” is the distance between the sender and receiver nodes; “k” is the number of clusters; is the energy dissipation amount of the transmitter amplifier circuit; is the processing cost of a bit report to the sink or base station; and is the average distance between the sink and CH. The energy used in a non-CH node is equal to
where is the average distance between the cluster member and its CH. If the nodes are uniformly distributed, the is defined as
where, is the node distribution and “” is the area in which the nodes are distributed. The total energy dissipation in the network is
The optimal number of CHs is estimated by differentiating the total energy dissipation with respect to “k” and equating it to zero. The probability of a node to become a CH is computed as
where, is the energy dissipated by the transmitter amplifier circuit. If the residual energy level of the node is greater than the threshold value, optimal CH selection is performed. Otherwise, the residual energy calculation process is repeated. After the selection of the CH, data aggregation is performed and data is forwarded to the BS. If the cluster construction is not in the optimal manner, there is an exponential increase in the total energy consumption of the network, when the number of constructed clusters is greater than or less than the optimal number of clusters.
The CH is used for authenticating the nodes located inside the cluster. The CH validates the sensor nodes in the cluster, shares the secret key to the nodes, validates the data received from the nodes and forwards the data to the BS. The CH forwards the data received from the cluster member nodes after validating the signature of the data. If the sensor node is compromised by the attacker, the data received from that node is sent to the CH. The CH verifies the signature of the data received from that node. If there is a mismatch in the data signature, then the CH detects the node as a compromised node and stops the data communication with that node. After the successful transmission of data from the CH to the sink, the selected CH will lose energy for transmission. Again, a new CH having highest residual energy and minimum distance with other cluster member nodes is chosen. Figure 3 shows the flow diagram of the optimal CH selection process.
Figure 3. Flow diagram of the optimal CH selection process.
Each node maintains the energy and distance between the cluster centers in the routing table. For the predefined time slot t, the CH is selected and data aggregation process takes place. Through the periodic reassignment of the CH role to various nodes, the problem of single point of failure during the depletion of node energy is prevented. It is a really critical task to avoid the failure of the nodes caused by the early depletion of energy, to ensure high network lifetime. After the selection of the CH, the traffic behavior is analyzed. If the traffic is found as normal, the data transmission process is performed. During the detection of the abnormal traffic, the one-time key is generated and checked. If the key is valid, the data transmission is executed. Otherwise, the node is added to the block list.
3.3. Traffic Analysis
The amount of traffic between the sensor nodes is estimated for a specific period. During this period, the amount of traffic emitted from all other regions and the average size of the sent packets are recorded. If the difference between the estimated and actual values of the traffic is more than a threshold value, the probability of occurrence of attack is detected. This includes three main steps:
1) Assess the amount of traffic generated by the nodes located adjacent to the suspected node.
2) Estimate the traffic amount and compare it with the actual values.
3) Comparing the average size of the sent packets with the pre-stored information.
If the increase in the generated traffic occurs as a result of detection of an event in the area covered by the sensor node, the neighbor nodes should have experience the same increase. Therefore, the normal and misbehavior of the nodes is properly differentiated. If the adjacent nodes do not show noticeable increase in the traffic volume, the possibility of occurrence of an attack is detected more precisely. The packet size comparison is performed based on the concept that if the attacker tries to improve the packet sending rate, power consumption of the nodes will be high. Hence, the misbehaving node is identified based on the traffic analysis.
3.4. One-Time Key Generation
In the symmetric key generation process, a static secret key is generated by using the cipher text. During the symmetric key cryptography, if there is N number of nodes, then there is a need to generate (N − 1) keys in the network. If the value of N is large, it requires more memory space for storing the large key values. The public key cryptography system requires huge computation power, but the sensor has less processing power. The public key cryptography system is not efficient in the WSN applications. Hence, the dynamic key generation is mostly preferred for the security purpose in WSN, since the dynamic keys are single-time usable symmetric cryptographic keys for forming a sequence of keys. The probability of breaking a dynamic key is low. The dynamic key management does not involve a central key controller such as BS or third party in the rekeying process of the nodes. The key management is handled by the dynamically assigned key controllers. The cryptographic keys are provided securely, while preventing the activities of the attacker nodes inside a network. During the detection of a compromised sensor node, the current secret key of the compromised node is canceled and a new key is generated. This new key is distributed to its associated nodes, except the compromised node. Moreover, it is highly desirable for a dynamic key management scheme to maintain the security of the keys and avoid collusion between the compromised nodes and newly joined nodes. The dynamic key management schemes prevent a single point of failure and ensure high network scalability. However, these schemes are highly prone to the design errors because the compromised nodes can participate in the node removal process.
3.4.1. Key Generation
Several security parameters are defined during the construction of the Secure One-Time (SOT) key  . For
signing B-bit messages, initially ‘n’ and ‘k’ are selected such that and the security parameter “p” and
one-way hash function “” that operates on p-bit strings is chosen. The p-bit string are generated randomly to generate the public key. Let for. The public key is and private key is.
3.4.2. Signature Generation
Let for signing a message “M” with the secret key. Then, “H” is split into “k” substrings. Finally, each is interpreted as an integer for. The resulting signature is .
3.4.3. Signature Verification
The signature verification is similar as the signature generation process. If the verifier has the message “M”, signature and public key. The signature is accepted only if , for each “z”. Otherwise the signature is rejected.
In this scheme, the public key component can be used for multiple times. Signature generation requires only a single call to the hash function. But, the verification process requires “k” calls to the hash function. The main advantage of our scheme is the smaller signature size.
3.4.4. Key Generation Procedure
1) Each node chooses “n” number of secret key components at random.
2) Each node creates a “m” number of hash chain of length “l”.
3) The public key components are obtained through an one-way function. H denotes the hash function.
3.4.5. Public Key Handling
Each node generates a set of one-time keys that are distributed locally among the nodes. Since the one-time keys are valid once or used for a limited period of time, there is a need to update the one-time key for the nodes. Initially, the public key called as Initial key is distributed safely to the nodes during the system setup. This guarantees that each neighboring node holds the authentic copy of the public key. When, a new node enters into the network, it receives the public key and broadcasts to its neighbors. Hence, successive one-time public keys are distributed efficiently through the periodic broadcasting of the Hello message.
The first secret key SK1 is defined as
and the corresponding public key is
3.4.6. System Setup
If a node enters into the network, it is notified about the security parameters in the network. Then, the node selects its secret key components and creates a hash chain (Figure 4).
3.4.7. Route Discovery
If the source node (I) initiates a route discovery to a certain destination node, it simply generates a signature over the Route Request (RREQ). When the CH1 receives the RREQ, the signature of the source node is initially verified. If the signature is found to be correct, the CH1 hashes the received message and generates its own signature by using the SOT key generation scheme. The whole message is retransmitted to the next CH2. Once the CH2 receives this double signed RREQ, it initially verifies the previous hop using the public key of CH1 received through the Hello messages. If the one-time signature is found correct, CH2 hashes the signature again, creates a signature over the hash for replacing the signature of CH1. Then, this new message is broadcasted to the BS, only if both the signatures are correct. These operations are repeated until the RREQ reaches the desti-
Figure 4. Hash chain of secret key components.
nation node (BS). When RREQ reaches the BS, the BS performs same verification operations as each CH. Then, a route reply (RREP) is generated and signed in a same manner as RREQ. Each CH will transmit the RREP to the source node through the reverse route and same operations are performed along the route.
In our proposed work, the SOT key is generated dynamically based on the source ID, random number and position coordinates of the nodes. The main idea to generate the one-time key is to avoid distribution of the long-term shared cryptographic keys. Due to the randomness, the one-time key is unbreakable by the compromised nodes. Hence, the secret key is generated each time during transmission of packet between the nodes. The one-time key generation is used to prevent the compromised node or third party from extracting the key. Even if the source node is a compromised node, the one-time key is not hacked by the source node. The security of the one-time key depends on the randomness property.
As the sensor nodes report their data, the direction of the data movement is habitually towards the BS. This asymmetric communication design can assist a malicious node in tracking down the location of the BS. It can result in malicious launching the serious attacks on the BS and eventually settling down the entire network. There are many ways to track the location of BS:
1) If the malicious node can understand the information of the packet being transmitted, then the malicious nodes can correlate the packets that are transmitted towards the BS. It will permit the malicious node to follow the direction of these packets towards the locality of the BS, which leads to discovery/jamming and destruction of the BS.
2) If there is a time correlation between when a node receives or forwards a packet, a malicious node can use the time correlation to estimate the direction towards the BS.
Hence, it is necessary to block the malicious nodes from the data aggregation and data transmission process. The misbehavior of the nodes is checked by analyzing the traffic flow between the nodes. Before the key generation process, the location of the node is updated. The key generation step is initiated, if the traffic flow is found to be abnormal. If the one-time key of the encrypted packet is invalid, the current packet is dropped, and the node is added to the block list. Data processing is performed, only if the one-time key is found to be valid. After the BS receives the aggregated data from all the child nodes, it decrypts and validates the signature. It estimates the final aggregation result just like the operation of the normal intermediate node. The final aggregation result stored in the BS. The following algorithm is proposed to block the malicious nodes from the network.
The SOT key is generated based on the key generation scheme adopted in  . Here, IDS denotes the source ID, RP is the random prime number generated for CH, (Posx, Posy) is the node coordinates and OT_Ki denotes the SOT key generated for the corresponding nodes. The Exclusive-OR operation is performed in the SOT key generation process. Figure 5 shows the flowchart of the onetime key generation algorithm.
4. Performance Analysis
In this section, the performance of the proposed Secure Area Based Clustering for Data Aggregation Using
Figure 5. Flowchart for the one-time key generation algorithm.
Table 2. Simulation parameters.
Traffic Analysis (SAC-TA) is analyzed and compared with the existing Secure Data Aggregation Technique (SDAT)  . For analysis, there are 200 sensor nodes are deployed, and the results are taken for the simulation time up to 100 s and tested in NS2 tool. Table 2 presents the simulation parameters.
4.1. Energy Analysis for Data Aggregation
The energy consumption of the sensor network over a period can be estimated as follows:
・ Energy spent for sensing the channel.
・ Energy spent to send the packets from the sensor nodes to the CHs.
・ Energy spent to receive the gathered data from the CHs to the BS.
・ Average energy consumption and remaining energy for the entire data gathering process.
Scenario 1: There are totally 600 packets are gathered by the CHs among various sensor nodes. The CHs collect the packets and validate it to analyze the gathered information containing any malicious information. Figure 6(a) shows the average energy consumption for a number of packets to be gathered from the sensor nodes to the corresponding CHs. The results are compared with the existing Secure Data Aggregation Technique (SDAT). The proposed SAC-TC results in the lesser energy consumption than the existing SDAT.
Scenario 2: The BS collected the gathered data from the CHs. If any misbehaving activities are identified, then a one-time key is generated to identify whether the packet is included with the false injected data or it includes the secured gathered data. After the process gets completed, the remaining energy for the corresponding packet reception is calculated to determine the energy utilization rate of each node. The energy efficiency of the node is high if the remaining energy value is high.
Figure 6(b) depicts the results for the average remaining energy for the proposed SAC-TA with the existing
(a) (b)(c) (d)
Figure 6. Energy Analysis for the proposed SAC-TA and the existing SDAT (a) Average energy consumption for sending the packets to CHs, (b) Average remaining energy after receiving the packets, (c) Average remaining energy across the simulation time (100 s) and (d) Average energy consumption across varying the number of nodes.
SDAT approaches. The proposed approach results higher remaining energy than the existing SDAT.
Scenario 3: The process is evaluated for simulation time up to 100 s. The remaining energy is noted for every 20 seconds, and it examined with the existing method. We are using the highest energy nodes as the CHs to aggregate the packets. Hence, the remaining energy is obviously high, which helps to increase the network lifetime. Figure 6(c) shows the average remaining energy for the proposed and existing method. The proposed SAC-TA results in the higher residual energy than the existing method SDAT.
Scenario 4: In our analysis, 200 sensor nodes are taken for data aggregation process. The average energy consumption is examined for varying the nodes. Figure 6(d) shows the comparison result for the proposed SAC-TA with the existing SDAT regarding the variation in the number of nodes up to 200. The results show that the proposed approach takes lesser energy consumption than the existing method.
4.2. Time Taken for Secure Data Aggregation
The time taken to complete the secure data aggregation refers to the time taken to transfer a packet across a sensor network from the sensor to CHs and CHs to BS. The equation to estimate the End-to-End (E2E) delay is defined as follows,
Here, denotes the transmission delay; is the propagation delay; denotes the processing delay; and “N” denotes the number of links. The CH is selected based on the predefined time slot, to improve the system performance without causing any unwanted link failure. Hence, it automatically reduces the E2E delay for data aggregation.
Figure 7 shows the E2E delay for the proposed SAC-TA and existing SDAT which is quite lesser than the existing method.
4.3. False Data Detection
The malicious node hacks the network and stole the gathered data. The false data is injected by the malicious nodes to intrude the network performance. The malicious node needs to be isolated and removed from the network. Here, the traffic flow is analyzed and the abnormal behavior of the sensor nodes is identified. A one-time key generation procedure is introduced to validate the encrypted information about the gathered data. It helps to identify the false data on the gathered data to provide the secure data gathering environment.
Figure 8 shows the false data detection rate for the proposed SAC-TA with the existing method. The proposed approach results in better detection rate than the existing method for varying the data transmission.
Figure 7. Comparison of end-to-end delay for SCA-TA and SDAT.
Figure 8. False data detection across the transmission data for SAC-TA and SDAT.
Figure 9. Aggregation accuracy for SAC-TA and SDAT.
4.4. Data Aggregation Accuracy
The data aggregation accuracy is an important factor to predict the successful system performance. The accuracy measure is taken for varying the simulation time up to 100 s. It shows that the process gets completed at the final stage and results in the accuracy of about 89.5% to receive the gathered data at the BS. Totally 600 packets are transmitted by the sensor nodes to CHs, and approximately 576 packets are gathered and collected by the BS. Figure 9 shows the aggregation accuracy for the proposed approach with the existing method. The comparative analysis of the resultant aggregation accuracy values for the proposed SAC-TA approach and existing SDAT method are shown in Table 3. It shows the proposed SAC-TA approach achieves better accuracy rate than the existing SDAT method.
4.5. Alive Nodes Analysis
The number of alive nodes is calculated for each round to find the energy efficiency of the network. After com-
Table 3. Comparative analysis for aggregation accuracy.
Figure 10. Number of alive nodes after data aggregation completion.
pletion, the proposed approach contains approximately 84 alive nodes, whereas the existing holds only 30 alive nodes. Figure 10 shows the comparison of alive nodes for proposed and existing methods. The proposed approach holds higher alive nodes than the existing method. The proposed SAC-TA method improves the network lifetime. Hence, it is clearly evident that the proposed approach achieves better performance than the existing methods.
5. Conclusion and Future Work
In this paper, a SAC-TA approach is presented for data aggregation in WSN. The false injection attack is identified based on the traffic analysis at the time of route discovery process. One-time key generation algorithm is introduced to eliminate the malicious nodes from the network. The efficiency of the proposed data aggregation scheme is evaluated by comparing it with the existing secure data aggregation scheme (SDAT). It results in the lower energy consumption to gather the data from diverse sensor nodes. It automatically improves the residual energy, because half of the sensor nodes are alive at the end of the gathering process. The proposed approach also achieves low end-to-end delay and better false detection rate and aggregation accuracy, when compared to the existing method. In the future work, the behavior of the WSN on internal attackers is investigated, and the SAC-TA scheme is extended to resist such attacks. This will guarantee the secure data aggregation under the presence of attacks.