JILSA Vol. 8 No. 1, February 2016
Accurate Plant MicroRNA Prediction Can Be Achieved Using Sequence Motif Features
Abstract: MicroRNAs (miRNAs) are short (~21 nt) nucleotide sequences that are either co-transcribed during the production of mRNA or organized in intergenic regions transcribed by RNA polymerase II. Drosha in animals and DCL1 in plants recognize pre-miRNAs, which set themselves apart by their characteristic stem-loop (hairpin) structure. This structure appears to be important for their recognition during maturation, the process that leads to functional mature miRNAs. A large body of research is available on computational pre-miRNA detection in animals, but less on detection within the plant kingdom. Pre-miRNAs are usually predicted with machine learning approaches, which require converting each pre-miRNA into a set of calculable features; many such features have been described. Here, we select a subset of the previously described features and add sequence motifs as new features. The resulting model, which we call MotifmiRNAPred, was tested on known pre-miRNAs listed in miRBase, and its accuracy was compared to that of existing approaches in the field. With an accuracy of 99.95% for the generalized plant model, it distinguishes itself from previously published results, which reach average accuracies between 74% and 98%. We believe that our approach is useful for the prediction of pre-miRNAs in plants without per-species adjustment.
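To illustrate what turning a pre-miRNA into sequence motif features could look like, the following is a minimal sketch only. The function name `motif_features`, the example motifs, and the choice of overlapping occurrence counts are assumptions made for illustration; they are not taken from MotifmiRNAPred, and the motifs and encoding used in the paper may differ.

```python
from typing import Dict, List


def motif_features(sequence: str, motifs: List[str]) -> Dict[str, int]:
    """Count overlapping occurrences of each motif in an RNA sequence.

    Hypothetical helper: DNA input is accepted and converted to RNA (T -> U).
    """
    seq = sequence.upper().replace("T", "U")
    features: Dict[str, int] = {}
    for motif in motifs:
        m = motif.upper().replace("T", "U")
        if not m:
            continue  # skip empty motifs, which would match everywhere
        count = 0
        start = seq.find(m)
        while start != -1:
            count += 1
            # Advance by one position so overlapping matches are counted.
            start = seq.find(m, start + 1)
        features[m] = count
    return features


if __name__ == "__main__":
    # Illustrative (made-up) sequence fragment and motifs.
    print(motif_features("AUGAUGA", ["AUGA", "GG"]))
```

Each count would then form one column of the feature vector passed to a classifier, alongside whatever other features are selected.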
Cite this paper: Yousef, M., Allmer, J. and Khalifa, W. (2016) Accurate Plant MicroRNA Prediction Can Be Achieved Using Sequence Motif Features. Journal of Intelligent Learning Systems and Applications, 8, 9-22. doi: 10.4236/jilsa.2016.81002.
