immunodeficiency syndrome (AIDS) is a fatal disease that seriously threatens
human health. Human immunodeficiency virus (HIV) is the pathogen that causes
this disease. Investigating HIV-1 protease cleavage sites can help researchers
find or develop protease inhibitors that restrain the replication of HIV-1
and thereby help combat AIDS. Feature selection is a new approach to the HIV-1
protease cleavage site prediction task, and it is a key point of our research.
Compared with previous work, our work has several advantages.
First, a filter method is used to eliminate redundant features. Second,
besides the traditional orthogonal encoding (OE), two newly proposed kinds of
features are used, extracted by applying principal component analysis (PCA) and
non-linear Fisher transformation (NLF) to the AAindex database. The two
new features are shown to perform better than OE. Third, the data set used
here is substantially expanded, to 1922 samples.
To further improve prediction performance, we optimize the SVM parameters
so that the classifier obtains better prediction capability. We also fuse
the three kinds of features to ensure a comprehensive feature
representation. To evaluate the prediction performance of our method
effectively, five evaluation measures, more than in previous work, are used
for a complete comparison. The experimental results show that our method
achieves better performance than the state-of-the-art method. This indicates
that feature selection combined with feature fusion and classifier parameter
optimization can effectively improve HIV-1 cleavage site prediction.
Moreover, our work can provide useful help for developing HIV-1 protease
inhibitors in the future.
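
As an illustration of the orthogonal encoding (OE) baseline mentioned above, the following minimal sketch maps each residue of a peptide to a 20-bit indicator block and concatenates the blocks. The residue ordering, the function name `orthogonal_encode`, and the example octapeptide are assumptions made for illustration only; they are not taken from our implementation.

```python
# Minimal sketch of orthogonal (one-hot) encoding for a peptide.
# Assumption: the 20 standard residues in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def orthogonal_encode(peptide):
    """Return a flat 0/1 list with one 20-bit block per residue."""
    vector = []
    for residue in peptide.upper():
        if residue not in INDEX:
            raise ValueError(f"unknown residue: {residue!r}")
        block = [0] * len(AMINO_ACIDS)
        block[INDEX[residue]] = 1  # exactly one bit set per residue
        vector.extend(block)
    return vector


# Illustrative octapeptide (hypothetical input, 8 residues).
features = orthogonal_encode("SQNYPIVQ")
print(len(features), sum(features))  # 160 8
```

For an octapeptide this yields a sparse 160-dimensional vector with exactly eight non-zero entries, which is one reason denser property-based features such as those derived from AAindex are attractive alternatives.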