JBiSE Vol.9 No.1, January 2016
Application of Word Embedding to Drug Repositioning
Abstract: As a key technology for rapid and low-cost drug development, drug repositioning is gaining popularity. In this study, a text mining approach to discovering unknown drug-disease relations was tested. Using a word embedding algorithm, the senses of over 1.7 million words were effectively represented by relatively short feature vectors. The feasibility of our approach was evaluated through various analyses, including clustering and classification. Finally, our trained classification model achieved 87.6% accuracy in predicting drug-disease relations in cancer treatment and succeeded in discovering novel drug-disease relations that have actually been reported in recent studies.
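The abstract notes that word senses were encoded as short feature vectors. A common way to compare such embedding vectors is cosine similarity, which also underlies many clustering and nearest-neighbor analyses. The following minimal Python sketch illustrates that idea only; the four-dimensional vectors and the names `drug_A`, `drug_B`, `drug_C`, `disease_X`, `cosine_similarity`, and `nearest_neighbors` are hypothetical and do not represent the authors' code, data, or pipeline.

```python
import math

# Hypothetical toy embeddings for illustration only. Real vectors would be
# learned from a large text corpus and usually have hundreds of dimensions.
EMBEDDINGS = {
    "drug_A": [0.9, 0.1, 0.0, 0.2],
    "drug_B": [0.8, 0.2, 0.1, 0.1],
    "drug_C": [0.0, 0.9, 0.4, 0.0],
    "disease_X": [0.7, 0.0, 0.1, 0.6],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, embeddings, k=2):
    """Return the k other words whose vectors are most similar to `word`."""
    query = embeddings[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    for name, score in nearest_neighbors("drug_A", EMBEDDINGS):
        print(f"{name}\t{score:.3f}")
```

In practice the vectors would be loaded from a trained embedding model rather than written by hand; the classification step described in the abstract is not reproduced here.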
Cite this paper: Ngo, D., Yamamoto, N., Tran, V., Nguyen, N., Phan, D., Lumbanraja, F., Kubo, M. and Satou, K. (2016) Application of Word Embedding to Drug Repositioning. Journal of Biomedical Science and Engineering, 9, 7-16. doi: 10.4236/jbise.2016.91002.
