AM Vol.6 No.6, June 2015
A Novel Method for Transforming XML Documents to Time Series and Clustering Them Based on Delaunay Triangulation
Abstract: Exchanging data in XML format has become increasingly popular and widely applied because XML documents are simple to maintain and transfer. Accelerating search within such documents therefore improves search engine efficiency. In this paper, we propose a technique for detecting similarity in the structure of XML documents and then cluster the documents using Delaunay Triangulation. The technique is based on the idea of representing the structure of an XML document as a time series in which each occurrence of a tag corresponds to a given impulse. This lets us use the Discrete Fourier Transform as a simple method to analyze these signals in the frequency domain and to build similarity matrices from a distance measure, in order to group the documents into clusters. We use Delaunay Triangulation as the clustering method for the d-dimensional points that represent the XML documents. The results show significantly better efficiency and accuracy than common methods.
Cite this paper: Shafieian, N. (2015) A Novel Method for Transforming XML Documents to Time Series and Clustering Them Based on Delaunay Triangulation. Applied Mathematics, 6, 1076-1085. doi: 10.4236/am.2015.66098.
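The first stages described in the abstract (tag occurrences as impulses, Discrete Fourier Transform, distance matrix) can be illustrated with a minimal sketch. This is not the paper's implementation. The function names, the integer tag encoding, the use of start tags only, zero-padding to a common length, and the Euclidean distance between magnitude spectra are all assumptions made for illustration. The Delaunay Triangulation clustering step is not covered here.

```python
import cmath
import math
import xml.etree.ElementTree as ET


def tag_signal(xml_text, codes):
    """Encode a document as a series with one impulse per element, in document order.

    `codes` maps tag name -> integer amplitude (an assumed encoding) and is
    extended in place so that all documents share the same mapping.
    """
    root = ET.fromstring(xml_text)
    signal = []
    for elem in root.iter():  # pre-order traversal, i.e. document order
        amplitude = codes.setdefault(elem.tag, len(codes) + 1)
        signal.append(float(amplitude))
    return signal


def dft_magnitudes(signal, n):
    """Magnitudes of the n-point DFT of `signal`, zero-padded to length n."""
    padded = signal + [0.0] * (n - len(signal))
    magnitudes = []
    for k in range(n):
        total = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(padded)
        )
        magnitudes.append(abs(total))
    return magnitudes


def spectral_distance(a, b):
    """Euclidean distance between two magnitude spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(docs):
    """Pairwise structural distances between XML documents given as strings."""
    codes = {}
    signals = [tag_signal(doc, codes) for doc in docs]
    n = max(len(s) for s in signals)  # common length for comparable spectra
    spectra = [dft_magnitudes(s, n) for s in signals]
    return [[spectral_distance(p, q) for q in spectra] for p in spectra]


# Hypothetical toy documents: the first two share the same structure.
docs = [
    "<a><b/><c/></a>",
    "<a><b/><c/></a>",
    "<x><y><z/></y></x>",
]
matrix = distance_matrix(docs)
```

In this sketch, documents with identical tag sequences get a distance of zero, while structurally different documents are separated by a positive distance. Each row of spectrum magnitudes could then serve as a d-dimensional point for a clustering step.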
