AM Vol.6 No.6, June 2015
A Novel Method for Transforming XML Documents to Time Series and Clustering Them Based on Delaunay Triangulation
Author(s) Narges Shafieian
Exchanging data in XML format has become increasingly popular and widespread because XML documents are simple to maintain and transfer. Accelerating search within such documents therefore improves search engine efficiency. In this paper, we propose a technique for detecting similarity in the structure of XML documents and then cluster the documents using Delaunay triangulation. The technique is based on the idea of representing the structure of an XML document as a time series in which each occurrence of a tag corresponds to a given impulse. We can then use the Discrete Fourier Transform as a simple method to analyze these signals in the frequency domain and build similarity matrices from a distance measure, in order to group the documents into clusters. We use Delaunay triangulation as the clustering method for the d-dimensional points that represent the XML documents. The results show significant efficiency and accuracy compared with common methods.
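
The pipeline described in the abstract (tags as impulses, Discrete Fourier Transform, distance-based similarity matrix) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not specify the tag encoding, the zero-padding, or the distance measure, so the choices below (an integer id per distinct tag name, padding to the longest series, Euclidean distance between DFT magnitude spectra) and the helper names `tag_series`, `dft_magnitudes`, and `distance_matrix` are assumptions made for illustration only.

```python
import cmath
import math
import xml.etree.ElementTree as ET


def tag_series(xml_text, tag_ids):
    """Return one impulse per element start tag, in document order.

    Assumption: the impulse amplitude is an integer id given to each
    distinct tag name the first time it is seen; tag_ids is shared
    across documents so the same tag always maps to the same value.
    """
    root = ET.fromstring(xml_text)
    series = []
    for elem in root.iter():  # depth-first, i.e. document order
        tag_id = tag_ids.setdefault(elem.tag, len(tag_ids) + 1)
        series.append(float(tag_id))
    return series


def dft_magnitudes(series, n):
    """Magnitudes of the n-point DFT of the zero-padded series (naive O(n^2))."""
    padded = series + [0.0] * (n - len(series))
    mags = []
    for k in range(n):
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(padded))
        mags.append(abs(total))
    return mags


def distance_matrix(docs):
    """Pairwise Euclidean distances between the documents' magnitude spectra."""
    tag_ids = {}
    all_series = [tag_series(d, tag_ids) for d in docs]
    n = max(len(s) for s in all_series)  # assumed common signal length
    spectra = [dft_magnitudes(s, n) for s in all_series]
    return [[math.dist(a, b) for b in spectra] for a in spectra]


if __name__ == "__main__":
    docs = [
        "<a><b/><c/></a>",
        "<a><b/><c/></a>",
        "<a><b/><b/><b/><d/></a>",
    ]
    for row in distance_matrix(docs):
        print(["%.3f" % v for v in row])
```

Structurally identical documents produce identical series and therefore a distance of zero, while documents with different tag sequences are pushed apart. The Delaunay triangulation step is not shown; under the same assumptions, the spectra (or a reduced set of their coefficients) would play the role of the d-dimensional points that are triangulated and clustered.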

Cite this paper
Shafieian, N. (2015) A Novel Method for Transforming XML Documents to Time Series and Clustering Them Based on Delaunay Triangulation. Applied Mathematics, 6, 1076-1085. doi: 10.4236/am.2015.66098.
