characteristics shared across known members of a protein family enables their
identification within the complete set of proteins in an organism. Shared
features are usually expressed through motifs, which can incorporate specific
patterns and even amino acid (AA) biases. Based on a set of classification
patterns and biases, it can be determined which additional proteins may belong
to a specific family and share its functionality. A bioinformatics tool
(Prot-Class) was implemented to examine protein sequences and characterize them
based upon user-defined AA composition percentages and user-defined AA
patterns. In addition, the tool allows for the identification of repeated AA
patterns, biased AA compositions within windows of user-defined length, and the
characteristics of putative signal peptides and glycosylphosphatidylinositol
(GPI) lipid anchors. Prot-Class is general purpose and can be applied to analyze
protein sequences from any organism. The Prot-Class source code is available
through the GNU General Public License v3 and can be accessed via the Google
Code Repository: http://code.google.com/p/prot-class/.
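
The kinds of checks described above (composition percentages, pattern counts, and biased windows) can be illustrated with a minimal Python sketch. This is not the Prot-Class implementation: the function names, the example sequence, the residue set "PAST", the pattern "[AS]P", and the window length and threshold are all hypothetical choices made for illustration.

```python
import re
from collections import Counter


def composition_percent(seq, residues):
    """Percentage of seq made up of any of the given residues."""
    if not seq:
        return 0.0
    counts = Counter(seq)
    return 100.0 * sum(counts[r] for r in residues) / len(seq)


def count_pattern(seq, pattern):
    """Number of non-overlapping matches of a regular-expression AA pattern."""
    return len(re.findall(pattern, seq))


def biased_windows(seq, residues, window, threshold):
    """(start, percent) for each window whose composition meets the threshold."""
    hits = []
    for start in range(len(seq) - window + 1):
        pct = composition_percent(seq[start:start + window], residues)
        if pct >= threshold:
            hits.append((start, pct))
    return hits


# Made-up sequence, used only to exercise the functions.
example = "APAPSPGGGG"
print(composition_percent(example, "PAST"))
print(count_pattern(example, "[AS]P"))
print(biased_windows(example, "PAST", window=4, threshold=100.0))
```

A protein would then be assigned to a candidate family when its values meet user-chosen cutoffs. Those cutoffs are inputs to such a classifier and are not specified here.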
Cite this paper
Lichtenberg, J., Keppler, B., Conley, T., Gu, D., Burns, P., Welch, L. and Showalter, A. (2012) Prot-Class: A bioinformatics tool for protein classification based on amino acid signatures. Natural Science, 4, 1161-1164. doi: 10.4236/ns.2012.412A141.