Explanation vs Performance in Data Mining: A Case Study with Predicting Runaway Projects

ABSTRACT

Often, the explanatory power of a learned model must be traded off against model performance. In the case of predicting runaway software projects, we show that the twin goals of high performance and good explanatory power are achievable after applying a variety of data mining techniques (discretization, feature subset selection, rule covering algorithms). This result is a new high-water mark in predicting runaway projects. Measured in terms of precision, this new model is as good as can be expected for our data. Other methods might outperform our result (e.g. by generating a smaller, more explainable model), but no other method could outperform the precision of our learned model.

Cite this paper

T. MENZIES, O. MIZUNO, Y. TAKAGI and T. KIKUNO, "Explanation vs Performance in Data Mining: A Case Study with Predicting Runaway Projects," *Journal of Software Engineering and Applications*, Vol. 2 No. 4, 2009, pp. 221-236. doi: 10.4236/jsea.2009.24030.

