JILSA Vol. 7, No. 2, May 2015
Classifying Unstructured Text Using Structured Training Instances and an Ensemble of Classifiers
Typical supervised classification techniques require training instances in the same format as the values to be classified. This research proposes a methodology that can use training instances available in a different, structured format. The benefit of this approach is that it allows traditional classification techniques to be applied without hand-tagging training instances, provided the information already exists in other data sources. The proposed approach is demonstrated through a practical classification application. The evaluation results show that the approach is viable, and that segmenting the ensemble of classifiers can greatly improve accuracy.
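The core idea, that labeled values from a structured source (where each field name acts as a class label) can train classifiers that are then applied to unstructured text via an ensemble vote, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy data, the character-bigram and prefix features, and all function names are assumptions made for demonstration.

```python
from collections import Counter

# Hypothetical structured source: each column doubles as labeled
# training data, with the column name serving as the class label.
STRUCTURED = {
    "city":  ["london", "paris", "berlin", "madrid"],
    "color": ["red", "blue", "green", "black"],
}

def bigram_classifier(token):
    """Score a token by character-bigram overlap with each column's values."""
    grams = [token[i:i + 2] for i in range(len(token) - 1)]
    scores = {}
    for label, values in STRUCTURED.items():
        counts = Counter(g for v in values
                         for g in (v[i:i + 2] for i in range(len(v) - 1)))
        scores[label] = sum(counts[g] for g in grams)
    return max(scores, key=scores.get)

def _common_prefix_len(a, b):
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return i

def prefix_classifier(token):
    """Score a token by its longest shared prefix with each column's values."""
    scores = {label: max(_common_prefix_len(token, v) for v in values)
              for label, values in STRUCTURED.items()}
    return max(scores, key=scores.get)

def ensemble(token, classifiers):
    """Majority vote over the individual classifiers' predictions."""
    votes = Counter(c(token) for c in classifiers)
    return votes.most_common(1)[0][0]

# An unstructured token never seen in training, classified by the ensemble.
label = ensemble("londonderry", [bigram_classifier, prefix_classifier])
print(label)  # "city": both voters favor the city examples
```

No training instance was hand-tagged: the labels came for free from the structure of the source, which is the property the abstract describes. The paper's segmentation idea corresponds to running specialized classifiers on different feature views and combining their votes.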

Cite this paper
Lianos, A. and Yang, Y. (2015) Classifying Unstructured Text Using Structured Training Instances and an Ensemble of Classifiers. Journal of Intelligent Learning Systems and Applications, 7, 58-73. doi: 10.4236/jilsa.2015.72006.