JILSA Vol. 3 No. 3, August 2011
Insertion of Ontological Knowledge to Improve Automatic Summarization Extraction Methods
Abstract: The vast availability of information sources has created a need for research on automatic summarization. Current methods work either by extraction or by abstraction. Extraction methods are attractive because they are robust and independent of the language used. An extractive summary is obtained by selecting sentences from the original source based on their information content. This selection can be automated using a classification function induced by a machine learning algorithm. The function classifies sentences into two groups, important and non-important, and the important sentences then form the summary. However, the effectiveness of this function depends directly on the training set used to induce it. This paper proposes an original way of optimizing the training set by inserting lexemes obtained from ontological knowledge bases, so that the optimized training set is reinforced by ontological knowledge. An experiment with four machine learning algorithms was conducted to validate this proposal. The improvement achieved is significant for each of these algorithms.
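The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy ontology, the example sentences, and every function name below are hypothetical, and a simple perceptron stands in for the four (unnamed here) learning algorithms. The sketch only shows the general idea of inserting ontology lexemes into the training sentences before inducing a sentence classifier.

```python
from collections import Counter

# Hypothetical toy ontology: word -> related lexemes.
# An assumption standing in for a real ontological knowledge base.
TOY_ONTOLOGY = {
    "merger": ["acquisition", "takeover"],
    "profit": ["earnings", "income"],
}

# Hypothetical labelled training sentences: +1 = important, -1 = non-important.
LABELLED = [
    ("Merger boosted quarterly profit.", 1),
    ("Weather stayed pleasant today.", -1),
]

# Hypothetical document to summarize.
DOCUMENT = ["Acquisition lifted earnings.", "Pleasant weather continued."]


def tokenize(sentence):
    """Lowercase the words and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()").lower() for w in sentence.split())
    return [w for w in words if w]


def featurize(sentence, ontology=None):
    """Bag-of-words counts; if an ontology is given, insert its lexemes."""
    tokens = tokenize(sentence)
    if ontology:
        extra = [lex for tok in tokens for lex in ontology.get(tok, [])]
        tokens = tokens + extra
    return Counter(tokens)


def build_training_set(labelled, ontology=None):
    """Turn (sentence, label) pairs into (features, label) pairs."""
    return [(featurize(s, ontology), y) for s, y in labelled]


def score(feats, weights, bias):
    return bias + sum(weights[f] * v for f, v in feats.items())


def train_perceptron(examples, epochs=20):
    """Induce a linear classification function (weights, bias)."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for feats, label in examples:
            if label * score(feats, weights, bias) <= 0:  # misclassified
                for f, v in feats.items():
                    weights[f] += label * v
                bias += label
    return weights, bias


def summarize(sentences, model):
    """Keep the sentences the classifier marks as important."""
    weights, bias = model
    return [s for s in sentences if score(featurize(s), weights, bias) > 0]


if __name__ == "__main__":
    baseline = train_perceptron(build_training_set(LABELLED))
    enriched = train_perceptron(build_training_set(LABELLED, TOY_ONTOLOGY))
    print("baseline summary:", summarize(DOCUMENT, baseline))
    print("enriched summary:", summarize(DOCUMENT, enriched))
```

In this toy setup the document shares no vocabulary with the important training sentence, so only the model trained on the ontology-enriched set has learned weights for "acquisition" and "earnings". The enrichment is applied to the training set only, which is how the abstract describes the optimization; whether the source sentences are also expanded at summarization time is not stated there.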
Cite this paper: J. Motta, L. Capus and N. Tourigny, "Insertion of Ontological Knowledge to Improve Automatic Summarization Extraction Methods," Journal of Intelligent Learning Systems and Applications, Vol. 3 No. 3, 2011, pp. 131-138. doi: 10.4236/jilsa.2011.33015.
