JBM Vol.5 No.3, March 2017
A New Method to Digitize DNA Sequence
The global description characterizes an amino acid sequence by its composition, transition and distribution, and has been widely adopted in various fields. Here we integrate it with the properties of nucleic acids to form a new method for digitizing DNA sequences. With this method, a DNA sequence can be represented by a 39-dimensional vector. We verify the method on exon 1 of the β-globin genes of eight species and compare it with other methods. The results are similar to those of the other methods, which indicates that the method is convincing. The method provides a new strategy for digitizing DNA sequences and generating DNA sequence descriptor vectors. Unlike other methods, it produces only a 39-dimensional vector, whose size does not depend on the length of the DNA sequence.
Cite this paper: Xu, X. and Zhu, F. (2017) A New Method to Digitize DNA Sequence. Journal of Biosciences and Medicines, 5, 7-12. doi: 10.4236/jbm.2017.53002.
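
The abstract does not spell out how the 39 dimensions are assembled, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes three two-class groupings of nucleotides (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding), and for each grouping computes 2 composition values, 1 transition value and 10 distribution values (positions of the first, 25%, 50%, 75% and last occurrence of each class), giving 3 × 13 = 39. The names `GROUPINGS`, `ctd_for_grouping` and `dna_descriptor` are hypothetical.

```python
import math
from typing import Dict, List

# ASSUMPTION: three two-class nucleotide groupings; the abstract does not name them.
GROUPINGS: Dict[str, Dict[str, int]] = {
    "purine_pyrimidine": {"A": 0, "G": 0, "C": 1, "T": 1},
    "amino_keto": {"A": 0, "C": 0, "G": 1, "T": 1},
    "weak_strong": {"A": 0, "T": 0, "C": 1, "G": 1},
}

# ASSUMPTION: distribution is sampled at these fractions of each class's occurrences.
FRACTIONS = (0.0, 0.25, 0.5, 0.75, 1.0)


def ctd_for_grouping(seq: str, classes: Dict[str, int]) -> List[float]:
    """Return 13 values (2 composition, 1 transition, 10 distribution)."""
    # Only A, C, G and T are handled; any other symbol raises KeyError.
    labels = [classes[base] for base in seq.upper()]
    n = len(labels)

    # Composition: fraction of positions that fall in each class.
    composition = [labels.count(k) / n for k in (0, 1)]

    # Transition: fraction of adjacent pairs whose classes differ.
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    transition = [changes / (n - 1) if n > 1 else 0.0]

    # Distribution: relative position (1-based index / length) of the
    # occurrence that marks each fraction of the class's occurrences.
    distribution: List[float] = []
    for k in (0, 1):
        positions = [i + 1 for i, lab in enumerate(labels) if lab == k]
        for f in FRACTIONS:
            if not positions:
                distribution.append(0.0)  # class absent from the sequence
                continue
            idx = max(1, math.ceil(f * len(positions))) - 1
            distribution.append(positions[idx] / n)

    return composition + transition + distribution


def dna_descriptor(seq: str) -> List[float]:
    """Concatenate the 13 values of each grouping into one 39-value vector."""
    if not seq:
        raise ValueError("sequence must not be empty")
    vector: List[float] = []
    for classes in GROUPINGS.values():
        vector.extend(ctd_for_grouping(seq, classes))
    return vector


if __name__ == "__main__":
    # Arbitrary illustrative fragment, not data from the paper.
    print(len(dna_descriptor("ATGGTGCACCTGACT")))  # 39
```

Because every component is a fraction of the sequence length or of a count, the vector length stays at 39 for any input length, which is the property the abstract emphasizes.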
