JBiSE Vol.7 No.11, September 2014
Predicting Beta-Turns and Beta-Turn Types Using a Novel Over-Sampling Approach
ABSTRACT
The β-turn is one of the most important reverse turns because of its role in protein folding. Many computational methods have been proposed for predicting β-turns and β-turn types. However, because the datasets are imbalanced, prediction performance remains inadequate. In this study, we propose a novel over-sampling technique, FOST, to deal with the class-imbalance problem. Experimental results on three standard benchmark datasets show that our method is comparable with state-of-the-art methods. In addition, we applied our algorithm to five benchmark datasets from the UCI Machine Learning Repository and achieved significant improvement in G-mean and sensitivity. This indicates that our method is also effective for imbalanced data other than β-turns and β-turn types.
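
The abstract reports results in terms of G-mean and sensitivity but does not define them here. As a minimal sketch, assuming the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), G-mean = the square root of their product), the following shows why these measures are preferred over plain accuracy on imbalanced data. The function names and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of actual positives (e.g. turn residues) predicted as positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of actual negatives (e.g. non-turn residues) predicted as negative."""
    return tn / (tn + fp)


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity (standard definition, assumed)."""
    return math.sqrt(sensitivity(tp, fn) * specificity(tn, fp))


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Overall fraction of correct predictions."""
    return (tp + tn) / (tp + fn + tn + fp)


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced set: 100 positives, 900 negatives.
    tp, fn, tn, fp = 60, 40, 810, 90
    print("accuracy   :", accuracy(tp, fn, tn, fp))
    print("sensitivity:", sensitivity(tp, fn))
    print("specificity:", specificity(tn, fp))
    print("G-mean     :", g_mean(tp, fn, tn, fp))
```

With counts like these, accuracy is dominated by the majority class, while sensitivity and G-mean expose how many minority-class examples are missed. This is the gap that over-sampling methods aim to close.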

Cite this paper
Nguyen, L., Dang, X., Le, T., Saethang, T., Tran, V., Ngo, D., Gavrilov, S., Nguyen, N., Kubo, M., Yamada, Y., Satou, K. (2014) Predicting Beta-Turns and Beta-Turn Types Using a Novel Over-Sampling Approach. Journal of Biomedical Science and Engineering, 7, 927-940. doi: 10.4236/jbise.2014.711090.
 
 