JSIP Vol. 2 No. 2, May 2011
Improved Comb Filter based Approach for Effective Prediction of Protein Coding Regions in DNA Sequences
Abstract: The prediction of protein coding regions in DNA sequences is an important problem in computational biology. Nucleotides in the protein coding regions (exons) of a DNA sequence exhibit a period-3 property, so identifying period-3 regions helps locate genes within the billions of nucleotides of eukaryotic DNA. This property allows time-domain and frequency-domain signal processing methods to predict exons efficiently, and several such approaches have been applied to the problem. This paper describes novel and efficient comb filter-based techniques for predicting protein coding regions from the period-3 behavior of codon sequences. The proposed method is validated on the Burset/Guigo1996, HMR195 and KEGG standard datasets using various prediction measures. The results show that a cascaded differentiator comb (CDC) filter predicts protein coding regions with better prediction efficiency and lower computational complexity than other signal processing techniques based on the period-3 property.
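To make the period-3 property concrete, the following minimal sketch maps a DNA string to four binary indicator sequences (one per nucleotide) and sums their DFT power at the frequency 2π/3. This is the conventional spectral measure of period-3 content, not the CDC filter proposed in the paper, whose structure is not given in the abstract. The function names `indicator_sequences` and `period3_power` are hypothetical and chosen only for illustration.

```python
import cmath


def indicator_sequences(dna):
    """Map a DNA string to four binary sequences, one per nucleotide.

    Illustrative helper (name is an assumption, not from the paper).
    """
    dna = dna.upper()
    return {base: [1 if ch == base else 0 for ch in dna] for base in "ACGT"}


def period3_power(dna):
    """Sum over A, C, G, T of the squared DFT magnitude at frequency 2*pi/3.

    A large value suggests strong period-3 content in the sequence.
    """
    total = 0.0
    for seq in indicator_sequences(dna).values():
        # Single DFT coefficient evaluated at angular frequency 2*pi/3.
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * n / 3) for n, x in enumerate(seq)
        )
        total += abs(coeff) ** 2
    return total


if __name__ == "__main__":
    periodic = "ATG" * 30      # perfectly period-3 toy sequence
    non_periodic = "AT" * 45   # period-2 toy sequence of the same length
    print(period3_power(periodic), period3_power(non_periodic))
```

On these toy inputs the period-3 sequence scores far higher than the period-2 one. In practice such a measure is usually evaluated over a sliding window along the sequence, so that peaks indicate candidate coding regions.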
Cite this paper: J. Meher, P. Meher and G. Dash, "Improved Comb Filter based Approach for Effective Prediction of Protein Coding Regions in DNA Sequences," Journal of Signal and Information Processing, Vol. 2 No. 2, 2011, pp. 88-99. doi: 10.4236/jsip.2011.22012.

[1]   Z. Wang, Y. Z. Chen and Y. X. Li, “A Brief Review of Computational Gene Prediction Methods,” Genomics Proteomics Bioinformatics, Vol. 2, No. 4, 2004, pp. 216-221.

[2]   D. Anastassiou, “Genomic Signal Processing,” Signal Processing Magazine, Vol. 18, No. 4, 2001, pp. 8-20. doi:10.1109/79.939833

[3]   J. W. Fickett, “The Gene Identification Problem: Overview for Developers,” Computers & Chemistry, Vol. 20, No. 1, 1996, pp. 103-118. doi:10.1016/S0097-8485(96)80012-X

[4]   R. Voss, “Evolution of Long-Range Fractal Correlations and 1/f Noise in DNA Base Sequences,” Physical Review Letters, Vol. 68, No. 25, 1992, pp. 3805-3808. doi:10.1103/PhysRevLett.68.3805

[5]   P. D. Cristea, “Genetic Signal Representation and Analysis,” Proceedings of SPIE Conference, International Biomedical Optics Symposium (BIOS'02), Vol. 4623, 2002, pp. 77-84.

[6]   A. K. Brodzik and O. Peters, “Symbol-Balanced Quaternionic Periodicity Transform for Latent Pattern Detection in DNA Sequences,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), Vol. 5, 2005, pp. 373-376.

[7]   T. M. Nair, S. S. Tambe and B. D. Kulkarni, “Application of Artificial Neural Networks for Prokaryotic Transcription Terminator Prediction,” FEBS Letters, Vol. 346, No. 2-3, 1994, pp. 273-277. doi:10.1016/0014-5793(94)00489-7

[8]   A. S. Nair and S. P. Sreenathan, “A Coding Measure Scheme Employing Electron-Ion Interaction Pseudopotential (EIIP),” Bioinformation, Vol. 1, No. 6, 2006, pp. 197-202.

[9]   G. L. Rosen, “Signal Processing for Biologically-Inspired Gradient Source Localization and DNA Sequence Analysis,” Ph.D. Thesis, Georgia Institute of Technology, Atlanta, 2006.

[10]   A. S. Nair and S. P. Sreenathan, “An Improved Digital Filtering Technique Using Frequency Indicators for Locating Exons,” Journal of the Computer Society of India, Vol. 36, No. 1, 2006.

[11]   R. Zhang and C. T. Zhang, “Z Curves, an Intuitive Tool for Visualizing and Analyzing the DNA Sequences,” Journal of Biomolecular Structure & Dynamics, Vol. 11, No. 4, 1994, pp. 767-782.

[12]   A. Rushdi and J. Tuqan, “Gene Identification Using the Z-Curve Representation,” IEEE International Conference on Acoustics, Speech and Signal Processing, Toulouse, 14-19 May 2006, pp. 1024-1027.

[13]   M. Akhtar, J. Epps and E. Ambikairajah, “On DNA Numerical Representations for Period-3 Based Exon Prediction,” IEEE International Workshop on Genomic Signal Processing and Statistics, Tuusula, 2007.

[14]   B. D. Silverman and R. Linsker, “A Measure of DNA Periodicity,” Journal of Theoretical Biology, Vol. 118, No. 3, 1986, pp. 295-300. doi:10.1016/S0022-5193(86)80060-1

[15]   S. Tiwari, S. Ramachandran, A. Bhattacharya, S. Bhattacharya and R. Ramaswamy, “Prediction of Probable Genes by Fourier Analysis of Genomic Sequences,” Bioinformatics, Vol. 13, No. 3, 1997, pp. 263-270. doi:10.1093/bioinformatics/13.3.263

[16]   D. Anastassiou, “Digital Signal Processing of Biomolecular Sequences,” Technical Report, Columbia University, 2000-20-041, April 2000.

[17]   D. Anastassiou, “Frequency-Domain Analysis of Biomolecular Sequences,” Bioinformatics, Vol. 16, No. 12, 2000, pp. 1073-1082. doi:10.1093/bioinformatics/16.12.1073

[18]   P. P. Vaidyanathan and B. J. Yoon, “Digital Filters for Gene Prediction Applications,” IEEE Asilomar on Signals, Systems, and Computers, Monterey, 3-6 November 2002, pp. 306-310.

[19]   P. P. Vaidyanathan and B. J. Yoon, “The Role of Signal Processing Concepts in Genomics and Proteomics,” Journal of the Franklin Institute, Vol. 341, No. 1-2, 2004, pp. 111-135. doi:10.1016/j.jfranklin.2003.12.001

[20]   D. Kotlar and Y. Lavner, “Gene Prediction by Spectral Rotation (SR) Measure: A New Method for Identifying Protein-Coding Regions,” Genome Research, Vol. 13, No. 8, 2003, pp. 1930-1937.

[21]   A. Fuentes, J. Ginori and R. Abalo, “A New Predictor of Coding Regions in Genomic Sequences Using a Combination of Different Approaches,” International Journal of Biomedical and Life Sciences, Vol. 3, No. 2, 2007, pp. 1-5.

[22]   J. Tuqan and A. Rushdi, “A DSP Approach for Finding the Codon Bias in DNA Sequences,” IEEE Journal of Selected Topics in Signal Processing, Vol. 2, No. 3, 2008, pp. 343-356. doi:10.1109/JSTSP.2008.923851

[23]   P. Jesus, M. Chalco and H. Carrer, “Identification of Protein Coding Regions Using the Modified Gabor-Wavelet Transform,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 5, No. 2, 2008, pp. 198-207. doi:10.1109/TCBB.2007.70259

[24]   L. Galleani and R. Garello, “The Minimum Entropy Mapping Spectrum of a DNA Sequence,” IEEE Transactions on Information Theory, Vol. 56, No. 2, 2010, pp. 771-783. doi:10.1109/TIT.2009.2037041

[25]   M. Akhtar, J. Epps and E. Ambikairajah, “Signal Processing in Sequence Analysis: Advances in Eukaryotic Gene Prediction,” IEEE Journal of Selected Topics in Signal Processing, Vol. 2, No. 3, 2008, pp. 310-321. doi:10.1109/JSTSP.2008.923854

[26]   S. K. Mitra, “Digital Signal Processing,” Tata McGraw-Hill, New Delhi, 2006.

[27]   A. V. Oppenheim and R. W. Schafer, “Discrete-Time Signal Processing,” Prentice-Hall Inc., Upper Saddle River, 1999.

[28]   M. Burset and A. R. Guigo, “Evaluation of Gene Structure Prediction Programs,” Genomics, Vol. 34, No. 3, 1996, pp. 353-367. doi:10.1006/geno.1996.0298

[29]   S. Rogic, A. Mackworth and F. Ouellette, “Evaluation of Gene Finding Programs on Mammalian Sequences,” Genome Research, Vol. 11, No. 5, 2001, pp. 817-832. doi:10.1101/gr.147901

[30]   M. Kanehisa and S. Goto, “KEGG: Kyoto Encyclopedia of Genes and Genomes,” Nucleic Acids Research, Vol. 28, No. 1, 2000, pp. 27-30. doi:10.1093/nar/28.1.27

[31]   G. Aggarwal and R. Ramaswamy, “Ab Initio Gene Identification: Prokaryote Genome Annotation with GeneScan and GLIMMER,” Journal of Biosciences, Vol. 27, No. 1, 2002, pp. 7-14. doi:10.1007/BF02703679