WSN Vol. 2 No. 2, February 2010
K-Nearest Neighbor Based Missing Data Estimation Algorithm in Wireless Sensor Networks
Abstract: In wireless sensor networks, missing sensor data are inevitable because of the inherent characteristics of these networks, and the resulting gaps cause difficulties in many applications. To address this problem, the missing data should be estimated as accurately as possible. In this paper, a k-nearest neighbor based missing data estimation algorithm is proposed that exploits the temporal and spatial correlation of sensor data. It adopts a linear regression model to describe the spatial correlation of sensor data among different sensor nodes, and uses the data of multiple neighbor nodes to estimate the missing data jointly rather than independently, so that stable and reliable estimation performance can be achieved. Experimental results on two real-world datasets show that the proposed algorithm can estimate the missing data accurately.
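The abstract only outlines the method. The following is a minimal sketch of the general idea, not the authors' algorithm. It assumes three things the abstract does not specify: neighbors are ranked by Euclidean distance between node positions, each neighbor's relation to the target node is a simple least-squares line fitted on earlier time steps where both nodes reported, and the per-neighbor estimates are merged by inverse-distance weighting. The names `fit_line` and `estimate_missing` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of ys ~ a + b * xs; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        # Neighbor reading never varied: fall back to the target's mean.
        return my, 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def estimate_missing(
    target: str,
    t: int,
    readings: Dict[str, List[Optional[float]]],
    positions: Dict[str, Tuple[float, float]],
    k: int = 3,
) -> Optional[float]:
    """Estimate readings[target][t] from up to k spatially nearest nodes.

    Assumed choices (not taken from the paper): Euclidean neighbor ranking,
    one linear regression per neighbor, inverse-distance weighted combination.
    """
    here = positions[target]
    # Candidate neighbors: other nodes that did report a value at time t.
    candidates = [n for n in readings if n != target and readings[n][t] is not None]
    candidates.sort(key=lambda n: math.dist(positions[n], here))

    num, den = 0.0, 0.0
    for n in candidates[:k]:
        # Spatial correlation: pairs from earlier steps where both nodes reported.
        pairs = [
            (readings[n][s], readings[target][s])
            for s in range(t)
            if readings[n][s] is not None and readings[target][s] is not None
        ]
        if len(pairs) < 2:
            continue  # not enough shared history to fit a line
        xs, ys = zip(*pairs)
        a, b = fit_line(list(xs), list(ys))
        estimate = a + b * readings[n][t]
        # Closer neighbors get more weight; epsilon guards against zero distance.
        w = 1.0 / (math.dist(positions[n], here) + 1e-9)
        num += w * estimate
        den += w
    return num / den if den > 0 else None
```

Combining several neighbors' regression estimates, instead of trusting the single nearest node, is one simple way to realize the "jointly rather than independently" idea: an unusual reading at one neighbor is diluted by the others. A multiple regression over all k neighbors would be an equally plausible alternative.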
Cite this paper: L. Pan and J. Li, "K-Nearest Neighbor Based Missing Data Estimation Algorithm in Wireless Sensor Networks," Wireless Sensor Network, Vol. 2 No. 2, 2010, pp. 115-122. doi: 10.4236/wsn.2010.22016.