JCC Vol.3 No.12, December 2015
Enhancing Amharic Information Retrieval System Based on Statistical Co-Occurrence Technique
Abstract
Information retrieval (IR) systems are designed to help information seekers retrieve relevant information from vast document collections. The need for relevant information from a vast amount of documents gave rise to IR systems. Although different IR systems exist, they cannot meet all users’ expectations. Because users differ in their level of knowledge, they express queries in different ways. As a result, the system may miss the core meaning of a user’s query and retrieve unsatisfactory results. This happens mainly because of the ambiguity of words in natural languages and the mismatch between the expressions used by users and those used by authors. The ambiguities present in the Amharic language have a negative impact on the performance of Amharic IR systems. Sources of this ambiguity include spelling variants of the same word and polysemous and synonymous terms. Users who are not fully knowledgeable about the information domain tend to formulate weak queries and thus end up frustrated with the results returned by an IR system. This research aims to improve the recall of previous work. A statistical co-occurrence technique is used to expand query terms. The main reason for performing query expansion is to retrieve relevant documents that satisfy the information need behind the user’s query. The statistical co-occurrence method considers terms that frequently appear together with the query term, regardless of their position. The proposed technique was tested on a prototype system, and the results were compared with those of the previous study. An improvement of 6% in recall and 2% in F-measure was obtained. Hence, the statistical co-occurrence method outperformed the bi-gram based IR system.
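The idea of expanding a query with terms that frequently co-occur with it, regardless of position, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the document-level co-occurrence window, the `top_k` cutoff, and the English toy corpus are all assumptions made for illustration, and a real Amharic system would also need language-specific tokenization and normalization that the abstract does not describe.

```python
from collections import Counter

def cooccurrence_counts(docs, term):
    """Count, per candidate term, the documents in which it appears with `term`.

    Position inside the document is ignored: two terms co-occur if both
    appear anywhere in the same document (an assumed window size).
    """
    counts = Counter()
    for doc in docs:
        tokens = set(doc.split())  # naive whitespace tokenization (assumption)
        if term in tokens:
            counts.update(tokens - {term})
    return counts

def expand_query(query, docs, top_k=2):
    """Append the `top_k` most frequent co-occurring terms for each query term."""
    terms = query.split()
    expanded = list(terms)
    for term in terms:
        counts = cooccurrence_counts(docs, term)
        # Highest count first; ties broken alphabetically for determinism.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        for candidate, _ in ranked[:top_k]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

# Hypothetical toy corpus, used only to show the behaviour.
sample_docs = [
    "coffee export market ethiopia",
    "coffee price market",
    "football match result",
    "coffee export growth",
]

if __name__ == "__main__":
    print(expand_query("coffee", sample_docs))
```

With the toy corpus, "export" and "market" each share two documents with "coffee", so they are the terms added to the query. A query term that never appears in the corpus is left unexpanded.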

Cite this paper
Bruck, A. and Tilahun, T. (2015) Enhancing Amharic Information Retrieval System Based on Statistical Co-Occurrence Technique. Journal of Computer and Communications, 3, 67-76. doi: 10.4236/jcc.2015.312006.

 
 