JBiSE Vol.9 No.5, April 2016
DNA Sequence Classification by Convolutional Neural Network
Abstract: In recent years, the convolutional neural network, a deep learning model that can extract high-level abstract features from minimally preprocessed data, has been widely used. In this research, we propose a new approach to classifying DNA sequences with a convolutional neural network, treating the sequences as text data. We use one-hot vectors to represent sequences as input to the model, which conserves the essential position information of each nucleotide in a sequence. We evaluated the proposed model on 12 DNA sequence datasets and achieved significant improvements on all of them. This result shows the potential of applying convolutional neural networks to DNA sequences to solve other sequence problems in bioinformatics.
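As an illustration of the one-hot representation mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes one 4-element vector per nucleotide in the order A, C, G, T, and maps any other symbol to an all-zero vector; the abstract does not specify the exact encoding scheme, and the names `NUCLEOTIDES` and `one_hot_encode` are hypothetical.

```python
# Assumed alphabet and column order; the abstract does not state either.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Return a list with one 0/1 vector of length 4 per nucleotide.

    The row order follows the sequence order, so the position of every
    nucleotide is preserved in the encoded input.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = []
    for base in sequence.upper():
        vector = [0] * len(NUCLEOTIDES)
        # Assumption: symbols outside A/C/G/T (e.g. N) stay all-zero.
        if base in index:
            vector[index[base]] = 1
        encoded.append(vector)
    return encoded


if __name__ == "__main__":
    for base, row in zip("GATN", one_hot_encode("GATN")):
        print(base, row)
```

Each row of the result corresponds to one position in the sequence, which is why this representation keeps positional information that a simple nucleotide count would lose.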
Cite this paper: Nguyen, N., Tran, V., Ngo, D., Phan, D., Lumbanraja, F., Faisal, M., Abapihi, B., Kubo, M. and Satou, K. (2016) DNA Sequence Classification by Convolutional Neural Network. Journal of Biomedical Science and Engineering, 9, 280-286. doi: 10.4236/jbise.2016.95021.