JCC Vol. 3 No. 5, May 2015
Cluster Analysis Based on Contextual Features Extraction for Conversational Corpus
Abstract: Cluster analysis in computational linguistics has seldom addressed the pragmatic level. Pragmatic-level features of a corpus relate to specific situations, including backgrounds, titles and habits. Improving the accuracy of clustering for conversations collected from international students at Tsinghua University requires contextual features. Here, we collected four hundred conversations as a corpus and represented it as a Vector Space Model. Using the Oxford-Duden Dictionary and other methods, we modified the model and arrived at three groups. We tested our hypothesis with a self-organizing map neural network. The results suggested that the modified model gave a better clustering outcome.
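The pipeline summarized in the abstract (a corpus turned into a Vector Space Model, then clustered with a self-organizing map) can be illustrated with a minimal sketch. This is not the authors' implementation: the smoothed TF-IDF weighting, the 1-D map layout, the function names (`build_vsm`, `train_som`, `best_matching_unit`), the hyperparameters, and the toy conversations are all assumptions made for illustration, and the dictionary-based contextual modification is not modeled.

```python
import math
import random
from collections import Counter

def build_vsm(conversations):
    """Return (vocabulary, vectors): one L2-normalized TF-IDF vector per conversation."""
    docs = [Counter(text.lower().split()) for text in conversations]
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of conversations containing each term.
    df = {term: sum(1 for doc in docs if term in doc) for term in vocab}
    vectors = []
    for doc in docs:
        # Smoothed IDF (an assumed weighting, not taken from the paper).
        vec = [doc[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors

def best_matching_unit(units, vec):
    """Index of the map unit closest to vec (squared Euclidean distance)."""
    dists = [sum((u - x) ** 2 for u, x in zip(unit, vec)) for unit in units]
    return dists.index(min(dists))

def train_som(vectors, n_units=3, epochs=50, lr0=0.5, seed=0):
    """Train a small 1-D self-organizing map and return its unit weight vectors."""
    rng = random.Random(seed)
    # Initialize units from randomly chosen input vectors.
    units = [list(v) for v in rng.sample(vectors, n_units)]
    radius0 = max(n_units / 2.0, 1.0)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighborhood
        for vec in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(units, vec)
            for j, unit in enumerate(units):
                # Gaussian neighborhood on the 1-D grid of units.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(len(unit)):
                    unit[k] += lr * h * (vec[k] - unit[k])
    return units

# Hypothetical toy conversations, not drawn from the paper's corpus.
conversations = [
    "professor could you extend the homework deadline",
    "professor is the homework due on friday",
    "hey want to grab lunch at the canteen",
    "let us grab dinner at the canteen tonight",
    "excuse me where is the library",
    "excuse me how do i get to the library",
]
vocab, vectors = build_vsm(conversations)
units = train_som(vectors)
labels = [best_matching_unit(units, v) for v in vectors]
print(labels)
```

Each conversation is assigned the index of its closest map unit, which serves as its cluster label. The number of units (3 here) is an arbitrary choice for the toy data; a modified model in the spirit of the abstract would change the feature vectors before training, not the map itself.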
Cite this paper: Chen, Q., Chen, Y. and Jiang, M. (2015) Cluster Analysis Based on Contextual Features Extraction for Conversational Corpus. Journal of Computer and Communications, 3, 33-37. doi: 10.4236/jcc.2015.35004.
