Identification of Categorical Registration Data of Domain Names in Data Warehouse Construction Task

ABSTRACT

This work addresses the construction of a data warehouse for processing a large volume of domain name registration data. Data cleaning is applied to increase the effectiveness of decision-making support: in a warehouse, it detects and removes errors and discrepancies in the data in order to improve data quality. For this purpose, fuzzy record comparison algorithms for cleaning domain name registration data are reviewed. A method for identifying domain name registration data during data warehouse construction is also proposed. Decision-making algorithms for identifying registration data are implemented in DrRacket and Python.
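As an illustration of what fuzzy record comparison can look like in Python, the following is a minimal sketch and not the authors' implementation. It assumes an edit-distance-based similarity (the Wagner-Fischer dynamic program for Levenshtein distance); the field names `name`, `organization`, and `city`, the averaging rule, and the `0.85` threshold are all hypothetical choices made only for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the Wagner-Fischer dynamic program, keeping two rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; case and outer whitespace are ignored."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def same_registrant(r1: dict, r2: dict,
                    fields=("name", "organization", "city"),  # hypothetical fields
                    threshold: float = 0.85) -> bool:          # hypothetical cutoff
    """Treat two records as the same entity if mean field similarity is high."""
    scores = [similarity(r1.get(f, ""), r2.get(f, "")) for f in fields]
    return sum(scores) / len(scores) >= threshold


# Two hypothetical registration records that differ only by small typos.
rec_a = {"name": "John Smith", "organization": "Example Ltd", "city": "Baku"}
rec_b = {"name": "Jon Smith", "organization": "Example Ltd.", "city": "Baku"}
print(same_registrant(rec_a, rec_b))
```

A plain average over fields is the simplest possible decision rule; a real cleaning pipeline would likely weight fields differently and tune the threshold on labeled data.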

Cite this paper

Alguliev, R. and Gasimova, R. (2013) Identification of Categorical Registration Data of Domain Names in Data Warehouse Construction Task. *Intelligent Control and Automation*, **4**, 227-234. doi: 10.4236/ica.2013.42027.

