JSEA Vol. 5 No. 12, December 2012
ML-CLUBAS: A Multi Label Bug Classification Algorithm

This paper presents ML-CLUBAS (Multi Label-Classification of software Bugs Using Bug Attribute Similarity), a multi-label variant of the CLUBAS algorithm [1]. CLUBAS is a hybrid algorithm built from text clustering, frequent-term calculation, and taxonomic term mapping, and is an example of classification by clustering. CLUBAS is a single-label algorithm: each bug cluster is mapped to exactly one bug category. However, a bug cluster can belong to more than one category when its cluster label matches the taxonomic terms of more than one category; ML-CLUBAS is designed to handle this case. The algorithm is evaluated using F-measure, accuracy, number of clusters, and purity, and these results are compared with those of CLUBAS and other multi-label text clustering algorithms.
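
The difference between single-label and multi-label mapping can be illustrated with a minimal Python sketch. It is not the authors' implementation: the taxonomy, its terms, the helper names (`cluster_label`, `map_single_label`, `map_multi_label`), and the "most matching terms wins" rule for the single-label case are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter

# Hypothetical taxonomy (category -> taxonomic terms); not taken from the paper.
TAXONOMY = {
    "GUI": {"button", "window", "layout", "screen"},
    "Network": {"socket", "timeout", "connection", "http"},
    "Database": {"query", "sql", "table", "index"},
}


def cluster_label(bug_texts, top_n=3):
    """Build a cluster label from the top_n most frequent terms in the cluster."""
    counts = Counter()
    for text in bug_texts:
        counts.update(text.lower().split())
    return [term for term, _ in counts.most_common(top_n)]


def map_single_label(label_terms, taxonomy):
    """Single-label mapping (assumed rule): keep only the category with the
    most matching terms, or None if no category matches."""
    best, best_hits = None, 0
    for category, terms in taxonomy.items():
        hits = len(terms.intersection(label_terms))
        if hits > best_hits:
            best, best_hits = category, hits
    return best


def map_multi_label(label_terms, taxonomy):
    """Multi-label mapping: keep every category that shares at least one
    term with the cluster label."""
    return sorted(
        category
        for category, terms in taxonomy.items()
        if terms.intersection(label_terms)
    )


if __name__ == "__main__":
    label = ["connection", "query", "timeout"]
    print(map_single_label(label, TAXONOMY))
    print(map_multi_label(label, TAXONOMY))
```

In this sketch a cluster whose label contains both network and database terms receives one category under the single-label rule and both categories under the multi-label rule, which is the situation the multi-label variant is meant to cover.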

Cite this paper
N. Nagwani and S. Verma, "ML-CLUBAS: A Multi Label Bug Classification Algorithm," Journal of Software Engineering and Applications, Vol. 5 No. 12, 2012, pp. 983-990. doi: 10.4236/jsea.2012.512113.

References
[1]   N. K. Nagwani and S. Verma, “CLUBAS: An Algorithm and Java Based Tool for Software Bug Classification Using Bug Attributes Similarities,” Journal of Software Engineering and Applications, Vol. 5 No. 6, 2012, pp. 436-447. doi:10.4236/jsea.2012.56050

[2]   S. Chapman, “Simmetrics, Java Based API for Text Similarity Measurement,” 2011. http://www.dcs.shef.ac.uk/~sam/simmetrics.html.

[3]   C. D. Manning, P. Raghavan and H. Schütze, “Introduction to Information Retrieval,” 2008. http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html

[4]   H. Li, K. Zhang and T. Jiang, “Minimum Entropy Clustering and Applications to Gene Expression Analysis,” Proceedings of IEEE Computational System Bioinformatics Conference, Stanford, August 2004, pp. 142-151.

[5]   I. H. Witten, E. Frank, L. E. Trigg, M. A. Hall, G. Holmes and S. J. Cunningham, “Weka (Waikato Environment for Knowledge Analysis),” 2011. www.cs.waikato.ac.nz/ml/weka

[6]   “Android Bug Repository,” 2011. http://code.google.com/p/android/issues.

[7]   JBoss-Seam, “Bug Repository,” 2011. https://issues.jboss.org/browse/JBSEAM.

[8]   “Mozilla Bug Repository,” 2011. https://bugzilla.mozilla.org.

[9]   MySQL, “Bug Repository,” 2011. http://bugs.mysql.com.

[10]   S. Osinski, J. Stefanowski and D. Weiss, “Lingo: Search Results Clustering Algorithm Based on Singular Value Decomposition,” Proceedings of the International Intelligent Information Processing and Web Mining Conference, Zakopane, 17-20 May 2004, pp. 359-368.

[11]   S. Osinski, “An Algorithm for Clustering of Web Search Results,” Master’s thesis, Poznań University of Technology, Poznań, 2003.

[12]   O. Zamir and O. Etzioni, “Grouper: A Dynamic Clustering Interface for Web Search Results,” Computer Networks, Vol. 31, No. 11-16, 1999, pp. 1361-1374. doi:10.1016/S1389-1286(99)00054-7

[13]   O. Zamir and O. Etzioni, “Web Document Clustering: A Feasibility Demonstration,” Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Melbourne, 24-28 August 1998, pp. 46-54.

[14]   W. Li, “Random Texts Exhibit Zipf’s-Law-Like Word Frequency Distribution,” IEEE Transactions on Information Theory, Vol. 38, No. 6, 1992, pp. 1842-1845. doi:10.1109/18.165464

[15]   W. J. Reed, “The Pareto, Zipf and Other Power Laws,” Economics Letters, Vol. 74, No. 1, 2001, pp. 15-19. doi:10.1016/S0165-1765(01)00524-9