ABSTRACT The application of one-class machine learning is gaining attention in the computational biology community. Several studies have described the use of two-class machine learning to predict microRNA (miRNA) gene targets. Most of these methods require the generation of an artificial negative class. However, designating the negative class can be problematic, and if it is not done properly it can dramatically degrade the performance of the classifier and/or yield a biased estimate of that performance. We present a study that uses one-class machine learning for miRNA target discovery and compares one-class with two-class approaches. Most of the one-class methods tested gave similar accuracies, ranging from 0.81 to 0.89, while the two-class naive Bayes classifier reached an accuracy of 0.99. One-class and two-class methods can both give useful classification accuracies. The advantage of one-class methods is that they require no additional effort to choose the best way of generating the negative class. Where a reliable negative class is hard to define, one-class methods can be superior to two-class methods, provided the features chosen to represent the positive class are well defined.
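The contrast drawn in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic Gaussian features rather than miRNA target features, the one-class model is a simple distance-threshold description of the positive class (not necessarily one of the methods evaluated in the paper), and the helper names `fit_one_class`, `fit_two_class_nb`, `accuracy` and `sample` are hypothetical. The point is only that the one-class model is fitted on positives alone, whereas the two-class naive Bayes model needs a negative class to be supplied.

```python
import math
import random

def fit_one_class(positives, quantile=0.9):
    """Fit per-feature mean/std on positives only; accept a point if its
    standardized distance is within the `quantile` of training distances."""
    n, d = len(positives), len(positives[0])
    means = [sum(x[j] for x in positives) / n for j in range(d)]
    stds = [math.sqrt(sum((x[j] - means[j]) ** 2 for x in positives) / n) or 1.0
            for j in range(d)]

    def dist(x):
        return math.sqrt(sum(((x[j] - means[j]) / stds[j]) ** 2 for j in range(d)))

    dists = sorted(dist(x) for x in positives)
    threshold = dists[int(quantile * (n - 1))]
    return lambda x: dist(x) <= threshold

def fit_two_class_nb(positives, negatives):
    """Gaussian naive Bayes with equal priors; requires both classes."""
    def stats(rows):
        n, d = len(rows), len(rows[0])
        means = [sum(x[j] for x in rows) / n for j in range(d)]
        variances = [sum((x[j] - means[j]) ** 2 for x in rows) / n or 1e-9
                     for j in range(d)]
        return means, variances

    def log_lik(x, means, variances):
        return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                   for xi, m, v in zip(x, means, variances))

    pos, neg = stats(positives), stats(negatives)
    return lambda x: log_lik(x, *pos) > log_lik(x, *neg)

def accuracy(predict, positives, negatives):
    """Fraction of positives accepted plus negatives rejected."""
    correct = sum(predict(x) for x in positives) + sum(not predict(x) for x in negatives)
    return correct / (len(positives) + len(negatives))

def sample(rng, center, n, d=3):
    """Synthetic stand-in for feature vectors (assumption, not real data)."""
    return [[rng.gauss(center, 1.0) for _ in range(d)] for _ in range(n)]

rng = random.Random(0)
train_pos, train_neg = sample(rng, 0.0, 200), sample(rng, 4.0, 200)
test_pos, test_neg = sample(rng, 0.0, 200), sample(rng, 4.0, 200)

one_class = fit_one_class(train_pos)                 # sees positives only
two_class = fit_two_class_nb(train_pos, train_neg)   # needs a negative class

acc_one = accuracy(one_class, test_pos, test_neg)
acc_two = accuracy(two_class, test_pos, test_neg)
print(f"one-class accuracy: {acc_one:.2f}")
print(f"two-class accuracy: {acc_two:.2f}")
```

In this toy setting the negative class is known exactly, so the two-class model has an easy job. The one-class model gives up some accuracy through its acceptance threshold but never depends on how the negatives were generated.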
Cite this paper
Yousef, M., Najami, N. and Khalifa, W. (2010) A comparison study between one-class and two-class machine learning for MicroRNA target detection. Journal of Biomedical Science and Engineering, 3, 247-252. doi: 10.4236/jbise.2010.33033.