JAMP Vol.7 No.12, December 2019
A New Numerical Method for DNA Sequence Analysis Based on 8-Dimensional Vector Representation
Abstract: Background: Multiple sequence alignment (MSA) algorithms are the traditional way to compare and analyze DNA sequences. For long DNA sequences, however, these algorithms are computationally expensive. Objective: We propose a new numerical method to characterize and compare DNA sequences quickly. Method: Starting from a new 2-dimensional (2D) graphical representation of DNA sequences, we obtain an 8-dimensional vector using two basic concepts of probability, the mean and the variance. Results: We perform similarity/dissimilarity analyses on two real DNA data sets: the coding sequences of the first exon of the beta-globin gene of 11 species, and 31 mammalian mitochondrial genomes. Conclusion: Our results agree with existing analyses in the literature. We also compare our approach with other methods and find that ours is more effective.
Cite this paper: Zhang, D. (2019) A New Numerical Method for DNA Sequence Analysis Based on 8-Dimensional Vector Representation. Journal of Applied Mathematics and Physics, 7, 2941-2949. doi: 10.4236/jamp.2019.712204.
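The Method step in the abstract (a sequence reduced to an 8-dimensional vector of means and variances) can be illustrated with a minimal sketch. The abstract does not specify the 2D graphical representation, so everything below is an assumption made for illustration: each nucleotide (A, C, G, T) is described by the mean and variance of its normalized positions in the sequence, which gives 4 × 2 = 8 components, and two sequences are compared by Euclidean distance. The names `feature_vector` and `distance` are hypothetical and are not taken from the paper.

```python
import math

NUCLEOTIDES = "ACGT"  # assumed ordering of the four bases


def feature_vector(seq):
    """Return an 8-component list: (mean, variance) of the normalized
    positions of each nucleotide. Illustrative assumption only; the
    paper's actual construction is not given in the abstract."""
    seq = seq.upper()
    n = len(seq)
    vec = []
    for base in NUCLEOTIDES:
        # Positions are 1-based and scaled to (0, 1] so that sequences
        # of different lengths are comparable.
        positions = [(i + 1) / n for i, ch in enumerate(seq) if ch == base]
        if positions:
            mean = sum(positions) / len(positions)
            var = sum((p - mean) ** 2 for p in positions) / len(positions)
        else:
            # A base that never occurs contributes zeros.
            mean, var = 0.0, 0.0
        vec.extend([mean, var])
    return vec


def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    a = feature_vector("ATGGTGCACCTGACTCCTGA")
    b = feature_vector("ATGGTGCATCTGACTCCTGA")
    print(len(a), distance(a, b))
```

Because each vector has a fixed length regardless of sequence length, comparing two sequences costs time linear in their lengths, which is the kind of speed advantage over alignment that the abstract refers to.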
