JCC Vol.3 No.5, May 2015
Hadoop-Based Similarity Computation System for Composed Documents
Abstract

Universities accumulate a large number of composed documents during teaching, and most of them must be checked for similarity as part of validation. This paper constructs a similarity computation system for composed documents that contain both images and text. First, each document is split into two parts: its images and its text. Documents are then compared by computing the similarities of their images and of their text contents independently. The Hadoop system is used to separate the text contents easily and quickly. Experimental results show that the proposed system is efficient and practical.
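The text-comparison step can be illustrated with a small sketch. The abstract does not state which similarity measure the system uses, so the cosine similarity over term counts below is an assumption, and the function names (`map_tokens`, `reduce_counts`, `text_similarity`) are hypothetical. The code imitates the map and reduce phases in plain Python rather than calling Hadoop, and it leaves out the image comparison entirely.

```python
import math
import re
from collections import Counter


def map_tokens(text):
    """Map phase (illustrative): emit a (term, 1) pair for every word."""
    return [(tok, 1) for tok in re.findall(r"[a-z0-9]+", text.lower())]


def reduce_counts(pairs):
    """Reduce phase (illustrative): sum the emitted counts per term."""
    counts = Counter()
    for term, n in pairs:
        counts[term] += n
    return counts


def cosine_similarity(a, b):
    """Cosine similarity of two term-count vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def text_similarity(text_a, text_b):
    """Compare the text parts of two documents after splitting."""
    return cosine_similarity(
        reduce_counts(map_tokens(text_a)),
        reduce_counts(map_tokens(text_b)),
    )
```

Under these assumptions, two documents with identical text parts score close to 1, and documents that share no terms score 0. On a real cluster the map and reduce functions would run as distributed tasks over many documents rather than as local calls.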


Cite this paper
Zhang, X., Qin, Z., Liu, X., Hou, Q., Zhang, B. and Wu, J. (2015) Hadoop-Based Similarity Computation System for Composed Documents. Journal of Computer and Communications, 3, 196-202. doi: 10.4236/jcc.2015.35025.

 
 