ENG Vol.5 No.10B, October 2013
Identification of Deleterious Single Amino Acid Polymorphism Using Sequence Information Based on Feature Selection and Parameter Optimization

Most human genetic variations are single nucleotide polymorphisms (SNPs), and among them non-synonymous SNPs, also known as single amino acid polymorphisms (SAPs), attract extensive interest. SAPs can be neutral or disease-associated, and many studies have sought to distinguish deleterious SAPs from neutral ones. Because many previous methods rely on both structural and sequence features of the SAP, they are not applicable when protein structures are unavailable. In this paper, we develop a method based on the univariate marginal distribution algorithm (UMDA) and support vector machines (SVM) that uses protein sequence information to predict the disease association of SAPs. We extract, for each SAP, a set of features that are independent of protein structure. An SVM-based machine-learning classifier, with parameters tuned by grid search, is then applied to predict the possible disease association of SAPs. The SVM method reaches good prediction accuracy. Because the SVM input data contain irrelevant and noisy features, and the SVM parameters also affect prediction performance, we introduce a UMDA-based wrapper approach to search for the ‘best’ solution. The UMDA-based method greatly improves prediction performance. Compared with current methods, our method achieves better performance.
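The UMDA-based wrapper search described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no encoding, population size, or fitness definition, so the binary feature mask, the defaults in `umda`, and the `toy_fitness` function (with its hypothetical `INFORMATIVE` feature set) are all assumptions made for illustration. In a real wrapper, the fitness would presumably be the cross-validated accuracy of an SVM trained on the selected features, with extra bits encoding the SVM parameters.

```python
import random

# Hypothetical stand-in for "cross-validated SVM accuracy on the selected
# features": rewards picking informative features, penalizes noisy ones.
INFORMATIVE = {0, 3, 5}   # assumed indices, for illustration only
N_FEATURES = 10

def toy_fitness(mask):
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    noise = sum(1 for i, bit in enumerate(mask) if bit and i not in INFORMATIVE)
    return 5 * hits - noise

def umda(fitness, n_bits, pop_size=60, n_select=20, generations=40, seed=0):
    """Univariate marginal distribution algorithm over binary strings."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits            # independent marginals P(bit_i = 1)
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # Sample a population from the current product of marginals.
        pop = [[1 if rng.random() < p else 0 for p in probs]
               for _ in range(pop_size)]
        pop.sort(key=fitness, reverse=True)
        top_fit = fitness(pop[0])
        if top_fit > best_fit:
            best, best_fit = pop[0][:], top_fit
        # Re-estimate each marginal from the selected (fittest) individuals.
        selected = pop[:n_select]
        for i in range(n_bits):
            freq = sum(ind[i] for ind in selected) / n_select
            probs[i] = min(0.95, max(0.05, freq))  # clamp to keep diversity
    return best, best_fit

best, best_fit = umda(toy_fitness, N_FEATURES)
print("selected features:", [i for i, bit in enumerate(best) if bit])
print("fitness:", best_fit)
```

Because UMDA models each bit independently, every generation only needs per-bit frequencies from the selected individuals. With an expensive fitness such as SVM cross-validation, caching fitness values per mask would be a natural addition.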

Cite this paper: Chen, X., Peng, Q. and Lv, J. (2013) Identification of Deleterious Single Amino Acid Polymorphism Using Sequence Information Based on Feature Selection and Parameter Optimization. Engineering, 5, 472-476. doi: 10.4236/eng.2013.510B097.
