ABSTRACT The capacity of zoonotic influenza to cross species boundaries and infect humans poses a global health threat. A previous study identified sites in 10 influenza proteins that characterize the host shifts from avian to human influenza. Here, we used seven feature selection algorithms based on machine learning techniques to generate a novel and extensive selection of diverse sites from the nine internal proteins of influenza, based on their statistical importance in differentiating avian from human viruses. A set of 131 sites was generated by processing each protein independently, and a selection of 113 sites was found by analyzing a concatenation of sequences from all nine proteins. These new sites were analyzed according to their annual mutational trends. The correlation of each site with all other sites (one-to-many) and the connectivity within groups of specific sites (one-to-one) were identified. We compared the performance of these new sites, as evaluated by four classifiers, against the sites recorded in previous research, and found our sites to be better suited to host distinction in all but one protein, validating the significance of our site selection. Our findings indicated that, in our selection of sites, human influenza tended to mutate more than avian influenza. Despite this, the correlation and connectivity among the avian sites were stronger than those among the human sites, and the percentage of sites with high connectivity was also greater in avian influenza.
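As an illustration of how a feature selection algorithm can rank alignment positions by their ability to separate avian from human sequences, the following minimal sketch scores each site by information gain. The abstract does not name the seven algorithms used, so the choice of information gain here is an assumption, and the function names (`entropy`, `information_gain`, `rank_sites`) and the toy sequences are hypothetical rather than taken from the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in label entropy obtained by splitting on the residue at one site."""
    total = len(labels)
    remainder = 0.0
    for residue in set(column):
        subset = [lab for res, lab in zip(column, labels) if res == residue]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_sites(sequences, labels):
    """Return (1-based position, score) pairs, highest information gain first."""
    length = len(sequences[0])
    scores = []
    for pos in range(length):
        column = [seq[pos] for seq in sequences]
        scores.append((pos + 1, information_gain(column, labels)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy aligned sequences (hypothetical, not real influenza data).
sequences = ["MKTA", "MKSA", "MRTA", "MRSA"]
labels = ["avian", "avian", "human", "human"]

ranking = rank_sites(sequences, labels)
for position, score in ranking:
    print(f"site {position}: information gain = {score:.3f}")
```

In this toy alignment only the second position separates the two host classes perfectly, so it is ranked first. A real analysis would apply such a score to each column of the aligned protein sequences and keep the top-ranked sites.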
Cite this paper
King, D., Miller, Z., Jones, W. and Hu, W. (2010) Characteristic sites in the internal proteins of avian and human influenza viruses. Journal of Biomedical Science and Engineering, 3, 943-955. doi: 10.4236/jbise.2010.310125.