ABSTRACT MicroRNAs are an important subclass of non-coding RNAs (ncRNAs) and serve as key players in RNA interference (RNAi). Mature microRNAs are derived from a stem-loop structure called a precursor. Identifying precursor microRNAs (pre-miRNAs) is therefore an essential step in locating microRNAs across a whole genome. The present work proposes 25 novel local features for identifying the stem-loop structure of pre-miRNAs, which capture characteristics of both sequence and structure. First, we pulled the stem of each hairpin and aligned the bases in bulges and internal loops using ‘―’. We then counted the 24 base-pair types (‘AA’, ‘AU’, …, ‘―G’, excluding ‘――’) in the pulled stem, normalized by the length of the pulled stem, and used them as the feature vector of a Support Vector Machine (SVM). Three classifiers built on our features with different kernels and trained on human data all outperformed Triplet-SVM-classifier on both the positive and the negative testing data sets. Moreover, we achieved higher prediction accuracy by combining our features with 7 global sequence-structure features. The results indicate the validity of the novel local features.
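The counting step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single hairpin given as a sequence plus a dot-bracket structure string, uses an ASCII '-' in place of the ‘―’ gap symbol, pads the shorter side of each bulge or internal loop with gaps at its loop-facing end, and ignores the terminal loop and any unpaired bases outside the outermost base pair. The function names (`paired_positions`, `pulled_stem`, `local_features`) are hypothetical, and the abstract does not say how the 25th feature is defined, so only the 24 pair counts are shown.

```python
from collections import Counter
from itertools import product

GAP = "-"  # ASCII stand-in for the gap symbol used in the abstract (assumption)
ALPHABET = "ACGU" + GAP
# 5 x 5 symbol pairs minus the gap-gap pair gives the 24 pair types
PAIR_TYPES = [a + b for a, b in product(ALPHABET, repeat=2) if a + b != GAP + GAP]


def paired_positions(structure):
    """Return (i, j) base-pair indices of a dot-bracket string, outermost first."""
    stack, pairs = [], []
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            pairs.append((stack.pop(), idx))
    return sorted(pairs)  # valid ordering only for a single, unbranched hairpin


def pulled_stem(seq, structure):
    """Align the 5' arm against the 3' arm, one two-character column at a time."""
    pairs = paired_positions(structure)
    columns = []
    for k, (i, j) in enumerate(pairs):
        columns.append(seq[i] + seq[j])
        if k + 1 == len(pairs):
            break  # innermost pair reached: the rest is the terminal loop
        ni, nj = pairs[k + 1]
        left = seq[i + 1:ni]         # unpaired 5' bases before the next pair
        right = seq[nj + 1:j][::-1]  # unpaired 3' bases, read towards the loop
        width = max(len(left), len(right))
        # Pad the shorter side with gaps (padding side is an assumption)
        columns.extend(a + b for a, b in zip(left.ljust(width, GAP),
                                             right.ljust(width, GAP)))
    return columns


def local_features(seq, structure):
    """Frequencies of the 24 pair types, normalized by pulled-stem length."""
    columns = pulled_stem(seq.upper().replace("T", "U"), structure)
    counts = Counter(columns)
    n = len(columns)
    return [counts[p] / n if n else 0.0 for p in PAIR_TYPES]


if __name__ == "__main__":
    # Hand-made toy hairpin with one single-base bulge on the 5' arm
    seq, structure = "GGCCUUCGAGCC", "((.((...))))"
    print(pulled_stem(seq, structure))
    print(dict(zip(PAIR_TYPES, local_features(seq, structure))))
```

The resulting 24-element vector is the kind of input that could be passed to an SVM library such as LIBSVM. The kernels and parameters used in the paper are not stated in the abstract and are not assumed here.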
Cite this paper
Zhao, Y., Ni, Q. and Wang, Z. (2009) Identification of microRNA precursors with new sequence-structure features. Journal of Biomedical Science and Engineering, 2, 626-631. doi: 10.4236/jbise.2009.28091.