Driven by the mobile Internet, new social media such as microblogs, WeChat, and
question answering systems are constantly emerging. They produce huge amounts of
short texts, which pose new challenges for text clustering. To handle the large
volume and dynamic growth of short texts, a two-stage clustering method is put
forward. The method slides a window over the stream of short texts. Inside each
window, hierarchical clustering is used; between windows, a cluster merging method
based on information gain is adopted. Experiments indicate that this method is
fast and achieves higher accuracy.
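
The two-stage idea can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the linkage, the window overlap, or the exact information-gain formula, so everything below is an assumption made for illustration: bag-of-words term counts, cosine similarity with centroid-style linkage inside a window, non-overlapping windows, and a merge criterion defined as the information gain of keeping two clusters apart, normalized to the range 0 to 1. The function names and threshold values are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in bits) of a term-count distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)

def split_gain(a, b):
    """Normalized information gain of keeping clusters a and b separate.

    Assumed criterion (not from the paper): entropy of the pooled terms minus
    the size-weighted entropies of the parts, divided by its maximum value.
    0 means identical term distributions, 1 means disjoint vocabularies.
    """
    na, nb = sum(a.values()), sum(b.values())
    if na == 0 or nb == 0:
        return 0.0
    n = na + nb
    gain = entropy(a + b) - (na / n) * entropy(a) - (nb / n) * entropy(b)
    return gain / entropy(Counter({"a": na, "b": nb}))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_window(texts, sim_threshold=0.3):
    """Stage 1: agglomerative clustering of the texts in one window."""
    clusters = [Counter(t.lower().split()) for t in texts if t.strip()]
    while len(clusters) > 1:
        best, pair = sim_threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(clusters[i], clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if pair is None:  # no pair is similar enough; stop merging
            break
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def merge_into_global(global_clusters, window_clusters, gain_threshold=0.5):
    """Stage 2: merge each window cluster into the closest global cluster."""
    for wc in window_clusters:
        best, idx = gain_threshold, None
        for k, gc in enumerate(global_clusters):
            g = split_gain(gc, wc)
            if g < best:
                best, idx = g, k
        if idx is None:  # separating is too informative; start a new cluster
            global_clusters.append(wc)
        else:
            global_clusters[idx] = global_clusters[idx] + wc
    return global_clusters

def cluster_stream(texts, window_size=4):
    """Process the text stream window by window (non-overlapping windows assumed)."""
    global_clusters = []
    for start in range(0, len(texts), window_size):
        window = texts[start:start + window_size]
        merge_into_global(global_clusters, cluster_window(window))
    return global_clusters
```

Under these assumptions, calling `cluster_stream(texts, window_size=4)` on a list of short strings returns one `Counter` of term counts per discovered cluster. Only one window is clustered hierarchically at a time, which is what keeps the quadratic pairwise comparison affordable on a growing stream.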
Cite this paper
Wang, Y., Wu, L. and Shao, H. (2014) Clusters Merging Method for Short Texts Clustering. Open Journal of Social Sciences, 186-192. doi: 10.4236/jss.2014.29032