JSEA Vol. 2 No. 2, July 2009
Transliterated Word Identification and Application to Query Translation Mining
ABSTRACT
Query translation mining is a key technique in cross-language information retrieval and in knowledge acquisition for machine translation. To improve performance, queries are classified as transliterated or non-transliterated words using a transliterated word identification model, and each class is then channeled to a different mining process. This paper is a pilot study on query classification for better translation mining performance, based on supervised classification and linguistic heuristics. Person name identification achieves a precision of over 97%. Transliterated word translation mining shows satisfactory performance.
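
The classify-then-route idea described above can be illustrated with a minimal sketch. It does not reproduce the paper's supervised model or its linguistic heuristics: the function names, the toy suffix list, and the two bucket labels are all assumptions introduced here for illustration, and a real system would plug in a trained identification model in place of the toy rule.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical suffix list: a toy stand-in for a trained identification
# model. These suffixes are illustrative and are not taken from the paper.
TOY_NAME_SUFFIXES = ("son", "ton", "berg", "ski", "vic")


def looks_transliterated(query: str) -> bool:
    """Toy rule: a single capitalized token ending in a name-like suffix."""
    token = query.strip()
    return (
        " " not in token
        and token[:1].isupper()
        and token.lower().endswith(TOY_NAME_SUFFIXES)
    )


def route_queries(
    queries: Iterable[str],
    is_transliterated: Callable[[str], bool] = looks_transliterated,
) -> Dict[str, List[str]]:
    """Split queries into two buckets, one per downstream mining process."""
    routed: Dict[str, List[str]] = {
        "transliterated": [],
        "non_transliterated": [],
    }
    for query in queries:
        key = "transliterated" if is_transliterated(query) else "non_transliterated"
        routed[key].append(query)
    return routed


if __name__ == "__main__":
    print(route_queries(["Clinton", "machine translation", "Johnson"]))
```

Passing the classifier as a parameter keeps the routing step independent of how identification is done, so the toy rule can be swapped for a supervised classifier without changing the rest of the pipeline.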

Cite this paper
J. Zhang, L. Guo, M. Zhou and J. Yao, "Transliterated Word Identification and Application to Query Translation Mining," Journal of Software Engineering and Applications, Vol. 2 No. 2, 2009, pp. 122-126. doi: 10.4236/jsea.2009.22018.

 
 