JBiSE Vol.8 No.10, October 2015
Sequence Motif-Based One-Class Classifiers Can Achieve Comparable Accuracy to Two-Class Learners for Plant microRNA Detection
Abstract: microRNAs (miRNAs) are short nucleotide sequences expressed by a genome that are involved in post-transcriptional modulation of gene expression. Because a miRNA must be co-expressed with its target mRNA for an effect to be observed, and because miRNA-target interactions can be cooperative, it is currently not possible to develop a comprehensive experimental atlas of miRNAs and their targets. To overcome this limitation, machine learning has been applied to miRNA detection. In general, binary (two-class) learning approaches are applied to miRNA discovery. These learners consider both positive (miRNA) and negative (non-miRNA) examples during training. One-class classifiers, on the other hand, use only information from the target class (miRNA). The one-class approach is gradually receiving more attention in machine learning, particularly for problems where the negative class is not well defined. This is especially true for miRNAs: the positive class can be experimentally confirmed relatively easily, but it is currently not possible to call any part of a genome a non-miRNA. To do so, the candidate would have to be co-expressed with all other possible transcripts of the genome, which is currently a futile endeavor. For machine learning, miRNAs need to be transformed into feature vectors, and some currently used features, such as minimum free energy, vary widely in the case of plant miRNAs. In this study, our aim was to analyze different one-class methods and the effectiveness of motif-based features for the prediction of plant miRNA genes. We show that applying these one-class classifiers, relying only on sequence-based features such as k-mers and motifs, is promising and useful for this kind of problem when compared with the results of two-class classification. In some cases the one-class results are, to our surprise, more accurate than those of two-class classifiers.
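To make the one-class idea concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes a simple centroid-distance model as a stand-in for the one-class classifiers studied, uses normalized k-mer frequencies as the only features, and trains on positive examples alone. The names `kmer_features` and `CentroidOneClass`, the parameter `reject_fraction`, and the toy sequences are hypothetical and chosen for illustration only.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGU"


def kmer_features(seq, k=2):
    """Return normalized k-mer frequencies of an RNA sequence in a fixed order."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values()) or 1  # avoid division by zero for short input
    return [counts[m] / total for m in kmers]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CentroidOneClass:
    """Toy one-class model: accept points close to the positive-class centroid.

    Assumption: this is an illustrative stand-in, not a classifier from the paper.
    """

    def __init__(self, reject_fraction=0.1):
        self.reject_fraction = reject_fraction

    def fit(self, vectors):
        n = len(vectors)
        self.centroid = [sum(col) / n for col in zip(*vectors)]
        dists = sorted(euclidean(v, self.centroid) for v in vectors)
        # Pick the threshold so that about reject_fraction of the
        # training positives would fall outside the accepted region.
        keep = max(1, int(round(n * (1 - self.reject_fraction))))
        self.threshold = dists[keep - 1]
        return self

    def predict(self, vector):
        """Return True if the vector is accepted as the target class."""
        return euclidean(vector, self.centroid) <= self.threshold


# Hypothetical toy sequences, not real miRNAs; only positives are used to train.
positives = ["AAAAAAAA", "AAAAAAAU", "AAAAUAAA"]
model = CentroidOneClass(reject_fraction=0.0).fit(
    [kmer_features(s) for s in positives]
)
print(model.predict(kmer_features("AAAAAAAU")))
print(model.predict(kmer_features("GCGCGCGC")))
```

No negative examples enter `fit`, which is the defining property of the one-class setting. A two-class learner would instead need labeled non-miRNA sequences, and those are exactly what the abstract argues cannot currently be defined.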
Cite this paper: Yousef, M., Allmer, J. and Khalifa, W. (2015) Sequence Motif-Based One-Class Classifiers Can Achieve Comparable Accuracy to Two-Class Learners for Plant microRNA Detection. Journal of Biomedical Science and Engineering, 8, 684-694. doi: 10.4236/jbise.2015.810065.
