Cleaning duplicate data remains a major problem despite the many approaches
proposed to solve it, owing to the exponential growth in the volume of data
processed and the need for fast, scalable algorithms. The problem depends on
the type and quality of the data and varies with the volume of the data set
being handled. In this paper we introduce a novel framework based on an
extended fuzzy C-means algorithm that uses a topic ontology. This work aims to
improve the OLAP querying process over heterogeneous data warehouses that
contain big data sets by improving the integration of query results,
eliminating redundancies with the extended classification algorithm, and
measuring the loss of information.
Cite this paper
Mouhni, N., Elkalay, A. and Chakraoui, M. (2014) Optimizing Query Results Integration Process Using an Extended Fuzzy C-Means Algorithm. Journal of Software Engineering and Applications, 354-359. doi: 10.4236/jsea.2014.75032