ABC Vol.3 No.1, February 2013
Cysteine-associated distribution of aromatic residues in disulfide-stabilized extracellular protein families

Cysteine-dependent protein sequences were downloaded from annotated database resources to generate comprehensive EGF, Sushi, Laminin and Immunoglobulin (IgC) motif-specific sequence files. Each dataset was vertically registered, and the cumulative distribution of amino acid functional group chemistry was determined relative to the respective complement of cysteine residues, which provide critical disulfide stabilization in these four well-known modular motif families. The cysteine-aligned amino acid distribution data revealed limited ionic, polar, hydrophobic or other side chain preferences, unique to each protein scaffold. In contrast, all four cysteine-dependent protein families exhibited a strong positional preference for the aromatic residues phenylalanine (Phe) and tyrosine (Tyr) relative to analogous cysteine landmarks. More than eighty percent of the members in each protein family were found to possess the same conserved -Cys-(Xxx)3-4-(Phe/Tyr)- arrangement, placing an aromatic amino acid at the analogous positions EGF-C5+4, Sushi-C2+4, Laminin-C7+4 and IgC-C1+5. Over seventy percent of EGF, Sushi and IgC sequences exhibited a second obvious Cys-associated aromatic site, -(Phe/Tyr)-Xxx-Cys-, at EGF-C4-2, Sushi-C2-2 and IgC-C2-2. The cysteine-associated placement of aromatic amino acid chemistry in four major disulfide-dependent protein families likely represents conservation of a molecular determinant of global importance in the structure-function of this large and diverse subset of extracellular proteins.
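The positional notation used above (for example, C5+4 for an aromatic residue four positions after the fifth cysteine, or C4-2 for one two positions before the fourth cysteine) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names and the toy sequences are hypothetical, the sequences are invented purely for illustration, and the sketch assumes ungapped one-letter sequences in which cysteines are simply counted from the N-terminus.

```python
# Illustrative sketch only; names and toy data are assumptions, not from the paper.
AROMATIC = {"F", "Y"}  # Phe and Tyr, the two aromatic residues named in the abstract


def aromatic_near_cys(seq, cys_index, offsets):
    """Return True if Phe/Tyr occurs at any of `offsets` relative to the
    `cys_index`-th cysteine (1-based) of `seq`.

    A positive offset looks toward the C-terminus (C1+4 -> offset 4);
    a negative offset looks toward the N-terminus (C2-2 -> offset -2).
    """
    cys_positions = [i for i, aa in enumerate(seq) if aa == "C"]
    if len(cys_positions) < cys_index:
        return False  # sequence lacks the requested cysteine landmark
    anchor = cys_positions[cys_index - 1]
    return any(
        0 <= anchor + off < len(seq) and seq[anchor + off] in AROMATIC
        for off in offsets
    )


def fraction_with_site(seqs, cys_index, offsets):
    """Fraction of sequences carrying an aromatic residue at the given site."""
    if not seqs:
        return 0.0
    hits = sum(aromatic_near_cys(s, cys_index, offsets) for s in seqs)
    return hits / len(seqs)


# Invented toy sequences (not real EGF, Sushi, Laminin or IgC domains).
toy_seqs = ["CAAAYAA", "CAAAAAA", "AACGGGFA", "AAAA"]

# Share of toy sequences with Phe/Tyr four positions after the first Cys.
c1_plus_4 = fraction_with_site(toy_seqs, cys_index=1, offsets=(4,))
print(c1_plus_4)  # 0.5
```

Passing `offsets=(4, 5)` would cover both spacings implied by -Cys-(Xxx)3-4-(Phe/Tyr)-, and `offsets=(-2,)` would test the -(Phe/Tyr)-Xxx-Cys- arrangement. A real analysis would also need the domain boundaries and alignment gaps to be handled before counting cysteines.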

Cite this paper: Campion, S., Longenberger, J., Sealie, M. and Guraya, H. (2013) Cysteine-associated distribution of aromatic residues in disulfide-stabilized extracellular protein families. Advances in Biological Chemistry, 3, 90-100. doi:10.4236/abc.2013.31012
