Syntax-Tree Regular Expression Based DFA FormalConstruction

ABSTRACT

Compiler is a program whose functionality is to translate a computer program written in source language into an equivalent machine code. Compiler construction is an advanced research area because of its size and complexity. The source codes are in higher level languages which are usually complex and, consequently, increase the level of abstraction. Due to such reasons, design and construction of error free compiler is a challenge of the twenty first century. Verification of a source program does not guarantee about correctness of code generated because the bugs in compiler may lead to an incorrect target program. Therefore, verification of compiler is more important than verifying the source programs. Lexical analyzer is a main phase of compiler used for scanning input and grouping into sequence of tokens. In this paper, formal construction of deterministic finite automata (DFA) based on regular expression is presented as a part of lexical analyzer. At first, syntax tree is described based on the augmented regular expression. Then formal description of important operators, checking nullability and computing first and last positions of internal nodes of the tree is described. In next, the transition diagram is described from the follow positions and converted into deterministic finite automata by defining a relationship among syntax tree, transition diagram and DFA. Formal specification of the procedure is described using Z notation and model analysis is provided using Z/Eves toolset.

Compiler is a program whose functionality is to translate a computer program written in source language into an equivalent machine code. Compiler construction is an advanced research area because of its size and complexity. The source codes are in higher level languages which are usually complex and, consequently, increase the level of abstraction. Due to such reasons, design and construction of error free compiler is a challenge of the twenty first century. Verification of a source program does not guarantee about correctness of code generated because the bugs in compiler may lead to an incorrect target program. Therefore, verification of compiler is more important than verifying the source programs. Lexical analyzer is a main phase of compiler used for scanning input and grouping into sequence of tokens. In this paper, formal construction of deterministic finite automata (DFA) based on regular expression is presented as a part of lexical analyzer. At first, syntax tree is described based on the augmented regular expression. Then formal description of important operators, checking nullability and computing first and last positions of internal nodes of the tree is described. In next, the transition diagram is described from the follow positions and converted into deterministic finite automata by defining a relationship among syntax tree, transition diagram and DFA. Formal specification of the procedure is described using Z notation and model analysis is provided using Z/Eves toolset.

Cite this paper

N. Zafar and F. Alsaade, "Syntax-Tree Regular Expression Based DFA FormalConstruction,"*Intelligent Information Management*, Vol. 4 No. 4, 2012, pp. 138-146. doi: 10.4236/iim.2012.44021.

N. Zafar and F. Alsaade, "Syntax-Tree Regular Expression Based DFA FormalConstruction,"

References

[1] C. J. Burgess, “The Role of Formal Methods in Software Engineering Education and Industry,” Technical Report, University of Bristol, Bristol, 1995.

[2] D. Richard , K. R. Chandramouli and R. W. Butler, “Cost Effective Use of Formal Methods in Verification and Validation,” Foundations V&V Workshop, 2002.

[3] N. A. Zafar, “Automatic Construction of Formal Syntax Tree Based on Regular Expressions,” The 2012 International Conference of Computer Science and Engineering (ICCSE), 2012, in Press.

[4] H. Beek, A. Fantechi, S. Gnesi and F. Mazzanti, “State/ Event-Based Software Model Checking,” Integrated Formal Methods, Vol. 2999, 2004, pp. 128-147.doi:10.1007/978-3-540-24756-2_8

[5] O. Hasan and S. Tahar, “Verification of Probabilistic Properties in the HOL Theorem Prover,” Integrated Formal Methods, Vol. 4591, 2007, pp. 333-352.doi:10.1007/978-3-540-73210-5_18

[6] F. Gervais, M. Frappier and R. Laleau, “Synthesizing B Specifications from EB3 Attribute Definitions,” Integrated Formal Methods, Vol. 3771, 2005, pp. 207-226.doi:10.1007/11589976_13

[7] J. S. Dong, R. Duke and P. Hao, “Integrating Object-Z with Timed Automata,” 10th IEEE International Conference on Engineering of Complex Computer Systems, Singapore, 16-20 June 2005, pp. 488-497.

[8] J. S. Dong, et al., “Timed Patterns: TCOZ to Timed Automata,” The 6th ICFEM, Seattle, 8-12 November 2004, pp. 483-498.

[9] R. L. Constable, et al., “Formalizing Automata II: Decidable Properties,” Technical Report, Cornell University, Ithaca, 1997.

[10] R. L. Constable, et al., “Constructively Formalizing Automata Theory,” Foundations of Computing Series, MIT Press, Cambridge, 2000.

[11] M. Heiner and M. Heisel, “Modeling Safety Critical Systems with Z and Petri Nets,” International Conference on Computer Safety, Reliability and Security, Toulouse, 27-29 November 1999, pp. 361-374. doi:10.1007/3-540-48249-0_31

[12] H. Leading and J. Souquieres, “Integration of UML and B Specification Techniques: Systematic Transformation from OCL Expressions into B,” Asia-Pacific Software Engineering Conference, Vandoeuvre-les-Nancy, 2002, pp. 495-504.

[13] H. Leading and J. Souquieres, “Integration of UML Views Using B Notation,” Proceedings of Workshop on Integration and Transformation of UML Models, Málaga, 11 June 2002.

[14] W. Wechler, “The Concept of Fuzziness in Automata and Language Theory,” Akademic-Verlag, Berlin, 1978.

[15] N. M. John and S. M. Davender, “Fuzzy Automata and Languages: Theory and Applications,” Chapman & Hall, London, 2002.

[16] M. Ito, “Algebraic Theory of Automata and Languages,” World Scientific Publishing Co., New York, 2004.doi:10.1142/9789812562685

[17] J. M. Spivey, “The Z Notation: A Reference Manual,” Printice-Hall, Englewood Cliffs, 1989.

[18] C. Lindig, “Random Testing of C Calling Conventions,” Proceedings of the Sixth International Symposium on Automated Analysis-Driven Debugging, Monterey, 19-21 September 2005, pp. 3-12.

[19] J. A. Anderson, “Automata Theory with Modern Applications,” Cambridge University Press, Cambridge, 2006.doi:10.1017/CBO9780511607202

[20] M. van den Brand, A. Sellink and C. Verhoef, “Generation of Components for Software Renovation Factories from Context-Free Grammars,” Proceedings of the Fourth Working Conference on Reverse Engineering, Amsterdam, 6-8 October 1997, pp. 144-153.doi:10.1109/WCRE.1997.624585

[21] M. Balakrishna, D. Moldovan and E. K. Cave, “Automatic Creation and Tuning of Context-Free Grammars for Interactive Voice Response Systems,” Proceedings of 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering, Dallas, 30 October-1 November 2005, pp. 158-163.

[22] L. Pedersen and H. Reza, “A Formal Specification of a Programming Language: Design of Pit,” Second International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, Grand Forks, 15-19 November 2006, pp. 111-118.

[23] D. P. Tuan, “Computing with Words in Formal Methods,” Technical Report, University of Canberra, Canberra, 2000.

[24] A. Hall, “Correctness by Construction: Integrating Formality into a Commercial Development Process,” Praxis Critical Systems Limited, Vol. 2391, 2002, pp. 139-157.

[25] D. K. Kaynar and N. Lynchn, “The Theory of Timed I/O Automata,” Morgan & Claypool Publishers, San Rafael, 2006.

[26] D. Jackson, I. Schechter and I. Shlyakhter, “Alcoa: The Alloy Constraint Analyzer,” Proceedings of the 22nd International Conference of Software Engineering, Limerick, 4-11 June 2000, pp. 730-733.

[27] D. Aspinall and L. Beringer, “Optimisation Validation,” Electronic Notes in Theoretical Computer Science, Vol. 176, No. 3, 2007, pp. 37-59. doi:10.1016/j.entcs.2006.06.017

[28] S. Briaisa and U. Nestmannb, “A Formal Semantics for Protocol Narrations,” Theoretical Computer Science, Vol. 389, 2007, pp. 484-511. doi:10.1016/j.tcs.2007.09.005

[29] L. Freitas, J. Woodcock and Y. Zhang, “Verifying the CICS File Control API with Z/Eves: An Experiment in the Verified Software Repository,” Science of Computer Programming, Vol. 74, No. 4, 2009, pp. 197-218.doi:10.1016/j.scico.2008.09.012

[30] S. E. Kleene, “Representations of Events in Nerve Nets and Finite Automata, Automata Studies,” Princeton University Press, Princeton, 1956, pp. 3-42.

[31] J. Sakarovitch, “Elements de Theorie des Automates,” Cambridge University Press, Vuibert, 2003.

[32] J. Sakarovitch, “The Language, the Expression, and the Small Automaton, Implementation and Application of Automata,” 10th International Conference (CIAA), Lecture Notes in Computer Science, Vol. 3845, 2005, pp. 15-30.

[33] V. M. Glushkov, “The Abstract Theory of Automata,” Russian Mathematical Surveys, Vol. 16, No. 5, 1961, pp. 1-53. doi:10.1070/RM1961v016n05ABEH004112

[34] R. F. McNaughton and H. Yamada, “Regular Expressions and State Graphs for Automata,” IEEE Transactions on Electronic Computers, Vol. 9, No. 1, 1960, pp. 39-57.doi:10.1109/TEC.1960.5221603

[35] J. A. Brzozowski, “Derivatives of Regular Expressions,” Journal of Association for Computing Machinery, Vol. 11, No. 4, 1964, pp. 481-494. doi:10.1145/321239.321249

[36] V. Antimirov, “Partial Derivatives of Regular Expressions and Finite Automaton Constructions,” Theoretical Computer Science, Vol. 155, No. 2, 1996, pp. 291-319.doi:10.1016/0304-3975(95)00182-4

[37] F. Gecseg and M. Steinby, “Tree Languages,” Handbook of Formal Languages, Vol. 3, 1997, pp. 1-68.

[1] C. J. Burgess, “The Role of Formal Methods in Software Engineering Education and Industry,” Technical Report, University of Bristol, Bristol, 1995.

[2] D. Richard , K. R. Chandramouli and R. W. Butler, “Cost Effective Use of Formal Methods in Verification and Validation,” Foundations V&V Workshop, 2002.

[3] N. A. Zafar, “Automatic Construction of Formal Syntax Tree Based on Regular Expressions,” The 2012 International Conference of Computer Science and Engineering (ICCSE), 2012, in Press.

[4] H. Beek, A. Fantechi, S. Gnesi and F. Mazzanti, “State/ Event-Based Software Model Checking,” Integrated Formal Methods, Vol. 2999, 2004, pp. 128-147.doi:10.1007/978-3-540-24756-2_8

[5] O. Hasan and S. Tahar, “Verification of Probabilistic Properties in the HOL Theorem Prover,” Integrated Formal Methods, Vol. 4591, 2007, pp. 333-352.doi:10.1007/978-3-540-73210-5_18

[6] F. Gervais, M. Frappier and R. Laleau, “Synthesizing B Specifications from EB3 Attribute Definitions,” Integrated Formal Methods, Vol. 3771, 2005, pp. 207-226.doi:10.1007/11589976_13

[7] J. S. Dong, R. Duke and P. Hao, “Integrating Object-Z with Timed Automata,” 10th IEEE International Conference on Engineering of Complex Computer Systems, Singapore, 16-20 June 2005, pp. 488-497.

[8] J. S. Dong, et al., “Timed Patterns: TCOZ to Timed Automata,” The 6th ICFEM, Seattle, 8-12 November 2004, pp. 483-498.

[9] R. L. Constable, et al., “Formalizing Automata II: Decidable Properties,” Technical Report, Cornell University, Ithaca, 1997.

[10] R. L. Constable, et al., “Constructively Formalizing Automata Theory,” Foundations of Computing Series, MIT Press, Cambridge, 2000.

[11] M. Heiner and M. Heisel, “Modeling Safety Critical Systems with Z and Petri Nets,” International Conference on Computer Safety, Reliability and Security, Toulouse, 27-29 November 1999, pp. 361-374. doi:10.1007/3-540-48249-0_31

[12] H. Leading and J. Souquieres, “Integration of UML and B Specification Techniques: Systematic Transformation from OCL Expressions into B,” Asia-Pacific Software Engineering Conference, Vandoeuvre-les-Nancy, 2002, pp. 495-504.

[13] H. Leading and J. Souquieres, “Integration of UML Views Using B Notation,” Proceedings of Workshop on Integration and Transformation of UML Models, Málaga, 11 June 2002.

[14] W. Wechler, “The Concept of Fuzziness in Automata and Language Theory,” Akademic-Verlag, Berlin, 1978.

[15] N. M. John and S. M. Davender, “Fuzzy Automata and Languages: Theory and Applications,” Chapman & Hall, London, 2002.

[16] M. Ito, “Algebraic Theory of Automata and Languages,” World Scientific Publishing Co., New York, 2004.doi:10.1142/9789812562685

[17] J. M. Spivey, “The Z Notation: A Reference Manual,” Printice-Hall, Englewood Cliffs, 1989.

[18] C. Lindig, “Random Testing of C Calling Conventions,” Proceedings of the Sixth International Symposium on Automated Analysis-Driven Debugging, Monterey, 19-21 September 2005, pp. 3-12.

[19] J. A. Anderson, “Automata Theory with Modern Applications,” Cambridge University Press, Cambridge, 2006.doi:10.1017/CBO9780511607202

[20] M. van den Brand, A. Sellink and C. Verhoef, “Generation of Components for Software Renovation Factories from Context-Free Grammars,” Proceedings of the Fourth Working Conference on Reverse Engineering, Amsterdam, 6-8 October 1997, pp. 144-153.doi:10.1109/WCRE.1997.624585

[21] M. Balakrishna, D. Moldovan and E. K. Cave, “Automatic Creation and Tuning of Context-Free Grammars for Interactive Voice Response Systems,” Proceedings of 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering, Dallas, 30 October-1 November 2005, pp. 158-163.

[22] L. Pedersen and H. Reza, “A Formal Specification of a Programming Language: Design of Pit,” Second International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, Grand Forks, 15-19 November 2006, pp. 111-118.

[23] D. P. Tuan, “Computing with Words in Formal Methods,” Technical Report, University of Canberra, Canberra, 2000.

[24] A. Hall, “Correctness by Construction: Integrating Formality into a Commercial Development Process,” Praxis Critical Systems Limited, Vol. 2391, 2002, pp. 139-157.

[25] D. K. Kaynar and N. Lynchn, “The Theory of Timed I/O Automata,” Morgan & Claypool Publishers, San Rafael, 2006.

[26] D. Jackson, I. Schechter and I. Shlyakhter, “Alcoa: The Alloy Constraint Analyzer,” Proceedings of the 22nd International Conference of Software Engineering, Limerick, 4-11 June 2000, pp. 730-733.

[27] D. Aspinall and L. Beringer, “Optimisation Validation,” Electronic Notes in Theoretical Computer Science, Vol. 176, No. 3, 2007, pp. 37-59. doi:10.1016/j.entcs.2006.06.017

[28] S. Briaisa and U. Nestmannb, “A Formal Semantics for Protocol Narrations,” Theoretical Computer Science, Vol. 389, 2007, pp. 484-511. doi:10.1016/j.tcs.2007.09.005

[29] L. Freitas, J. Woodcock and Y. Zhang, “Verifying the CICS File Control API with Z/Eves: An Experiment in the Verified Software Repository,” Science of Computer Programming, Vol. 74, No. 4, 2009, pp. 197-218.doi:10.1016/j.scico.2008.09.012

[30] S. E. Kleene, “Representations of Events in Nerve Nets and Finite Automata, Automata Studies,” Princeton University Press, Princeton, 1956, pp. 3-42.

[31] J. Sakarovitch, “Elements de Theorie des Automates,” Cambridge University Press, Vuibert, 2003.

[32] J. Sakarovitch, “The Language, the Expression, and the Small Automaton, Implementation and Application of Automata,” 10th International Conference (CIAA), Lecture Notes in Computer Science, Vol. 3845, 2005, pp. 15-30.

[33] V. M. Glushkov, “The Abstract Theory of Automata,” Russian Mathematical Surveys, Vol. 16, No. 5, 1961, pp. 1-53. doi:10.1070/RM1961v016n05ABEH004112

[34] R. F. McNaughton and H. Yamada, “Regular Expressions and State Graphs for Automata,” IEEE Transactions on Electronic Computers, Vol. 9, No. 1, 1960, pp. 39-57.doi:10.1109/TEC.1960.5221603

[35] J. A. Brzozowski, “Derivatives of Regular Expressions,” Journal of Association for Computing Machinery, Vol. 11, No. 4, 1964, pp. 481-494. doi:10.1145/321239.321249

[36] V. Antimirov, “Partial Derivatives of Regular Expressions and Finite Automaton Constructions,” Theoretical Computer Science, Vol. 155, No. 2, 1996, pp. 291-319.doi:10.1016/0304-3975(95)00182-4

[37] F. Gecseg and M. Steinby, “Tree Languages,” Handbook of Formal Languages, Vol. 3, 1997, pp. 1-68.