JBM Vol.4 No.12, December 2016
A Complete and Accurate Short Sequence Alignment Algorithm for Repeats
Abstract: Eukaryotic genomes contain a substantial fraction of repeats, which have important biomedical functions. Aligning short sequencing reads that originate from repeats back to the reference genome is therefore a key step in downstream genome analysis. Unfortunately, current alignment algorithms perform poorly at distinguishing repeats from nonrepeats. To address this problem, we propose a new algorithm, HashRepAligner. In a cross-comparison with other aligners, HashRepAligner outperformed them in detecting repeats.
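The difficulty the abstract describes can be made concrete with a toy example. The sketch below is not HashRepAligner, because the abstract does not describe its internals. It assumes a simple exact-match k-mer hash index, using the hypothetical helpers `build_kmer_index`, `align_read_exact` and `classify`, to show how a read drawn from a repeat matches several reference positions while a read from unique sequence matches only one. Reporting every matching locus, rather than a single best hit, is one way to separate the two cases.

```python
from __future__ import annotations

from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring of the reference to all of its start positions.

    Illustrative assumption only: a plain exact-match hash index, not the
    index used by HashRepAligner.
    """
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def align_read_exact(read: str, reference: str,
                     index: dict[str, list[int]], k: int) -> list[int]:
    """Return every position where the read matches the reference exactly.

    Candidate loci come from a hash lookup of the read's first k-mer;
    each candidate is then verified against the full read.
    """
    if len(read) < k:
        raise ValueError("read must be at least k bases long")
    hits = []
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


def classify(hits: list[int]) -> str:
    """Label a read by how many loci it aligns to (hypothetical labels)."""
    if not hits:
        return "unaligned"
    return "repeat" if len(hits) > 1 else "unique"


if __name__ == "__main__":
    # Toy reference in which "ACGTAC" occurs twice (a repeat).
    reference = "GATTACA" + "ACGTAC" + "TTTT" + "ACGTAC" + "GGCC"
    k = 4
    index = build_kmer_index(reference, k)
    for read in ["ACGTAC", "GATTACA", "CCCC"]:
        hits = align_read_exact(read, reference, index, k)
        print(read, hits, classify(hits))
    # ACGTAC [7, 17] repeat
    # GATTACA [0] unique
    # CCCC [] unaligned
```

Real reads carry sequencing errors and repeat copies are rarely identical, so a practical aligner must tolerate mismatches when enumerating loci. The exact-match check here is only meant to show why a read from a repeat has more than one valid placement.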
Cite this paper: Lian, S., Liu, T., Gong, K., Chen, X. and Zheng, G. (2016) A Complete and Accurate Short Sequence Alignment Algorithm for Repeats. Journal of Biosciences and Medicines, 4, 144-151. doi: 10.4236/jbm.2016.412018.
