JSEA Vol. 8 No. 12, December 2015
Data Modeling and Data Analytics: A Survey from a Big Data Perspective
ABSTRACT
In recent years we have witnessed tremendous growth in the volume and availability of data. This growth results primarily from the emergence of a multitude of sources (e.g., computers, mobile devices, sensors and social networks) that continuously produce structured, semi-structured or unstructured data. Database Management Systems and Data Warehouses are no longer the only technologies used to store and analyze datasets, mainly because the volume and complex structure of today's data degrade their performance and scalability. Big Data is one of the recent challenges, since it implies new requirements in terms of data storage, processing and visualization. Even so, properly analyzing Big Data can bring great advantages because it allows patterns and correlations in datasets to be discovered. Users can use this processed information to gain deeper insights and business advantages. Thus, data modeling and data analytics have evolved so that huge amounts of data can be processed without compromising performance and availability, by “relaxing” the usual ACID properties instead. This paper provides a broad view and discussion of the current state of this subject, with a particular focus on data modeling and data analytics, describing and clarifying the main differences between the three main approaches with respect to these aspects, namely: operational databases, decision support databases and Big Data technologies.

Cite this paper
Ribeiro, A., Silva, A. and da Silva, A. (2015) Data Modeling and Data Analytics: A Survey from a Big Data Perspective. Journal of Software Engineering and Applications, 8, 617-634. doi: 10.4236/jsea.2015.812058.