ABSTRACT A novel 3-D graphical representation of protein sequences is introduced. A right cone with unit base and unit height is used to represent protein sequences on its surface. The twenty amino acids are represented by 20 circles, and all of a protein's residues are represented by n lines on the cone's surface. The spots representing the protein's residues are shown in the cone's top view. The spatial median of these spots is used as a new descriptor of a protein sequence. The approach is applied to two short protein segments from the yeast Saccharomyces cerevisiae. An examination of the similarities/dissimilarities among the eight ND5 proteins and the six β-globin proteins illustrates the utility of the approach. A linear correlation and significance analysis is provided to compare our results with the percentage sequence alignment identity.
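To make the descriptor concrete, the following is a minimal sketch of how a spatial (geometric) median could be computed for a set of spots. It assumes the spots in the cone's top view are available as 2-D (x, y) coordinates and uses Weiszfeld's iteration, a standard algorithm for the geometric median; the abstract does not say which algorithm the paper uses, and the function name `spatial_median` and its parameters are hypothetical.

```python
import math


def spatial_median(points, tol=1e-9, max_iter=1000):
    """Approximate the spatial (geometric) median of 2-D points.

    Uses Weiszfeld's iteration, started from the centroid. This is an
    illustrative choice, not necessarily the method used in the paper.
    """
    n = len(points)
    # Initial guess: the centroid of the points.
    x = sum(p[0] for p in points) / n
    y = sum(p[1] for p in points) / n

    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < 1e-12:
                # Simplification: skip a point that coincides with the
                # current estimate to avoid division by zero.
                continue
            w = 1.0 / d
            num_x += w * px
            num_y += w * py
            denom += w
        if denom == 0.0:
            # Every point coincides with the estimate; nothing to update.
            break
        new_x, new_y = num_x / denom, num_y / denom
        step = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if step < tol:
            break
    return x, y


def total_distance(points, centre):
    """Sum of Euclidean distances from centre to every point."""
    return sum(math.hypot(px - centre[0], py - centre[1]) for px, py in points)
```

For example, `spatial_median([(0, 0), (1, 0), (0, 1), (10, 10)])` should settle near the intersection of the diagonals of that quadrilateral, whereas the centroid is pulled toward the distant point. That robustness to outlying spots is the usual reason for preferring a spatial median to a mean.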
Cite this paper
Abo-Elkhier, M.M. (2012) Similarity/dissimilarity analysis of protein sequences using the spatial median as a descriptor. Journal of Biophysical Chemistry, 3, 142-148. doi:10.4236/jbpc.2012.32016