This paper critically analyzes the current literature on data mining within SMEs and the commercial benefits it provides for business success and development.
It is well known that advances in computing, especially in data storage and communication networks, have made storing large amounts of data easy and, to some extent, essential for business operations and decision-making. The sheer volume of data now available makes manual analysis impractical: first, the human brain is not capable of searching for composite, multifactor dependencies in stored data; second, such analysis may lack objectivity. The contemporary business world therefore requires automated systems for intelligent data analysis to support efficient operations and effective decision-making. SMEs, however, especially in emerging economies such as the Kingdom of Saudi Arabia, are lagging in the adoption of data mining, business intelligence and machine learning.
Small and Medium Enterprises (SMEs) and Their Role in Economic Development
While SMEs may not be as publicly visible within the economy as large corporations such as Amazon, BT or Primark, they are vital for the success and growth of the economy of the Kingdom of Saudi Arabia. They are not listed on the FTSE 100, yet SMEs make up 99.9% of all businesses in the UK, with 96% designated as micro-businesses (employing no more than 10 people). These businesses generate £2 trillion of turnover each year, around 52% of the UK total. Combined, SMEs are estimated to employ 60% of all workers in the UK, and a start-up culture in fintech, wider technology and high-value manufacturing is creating industries for the future.
SMEs are also vital for economic development because their lower headcount usually means a flatter management structure, which nurtures innovation and creativity that can then spread into the wider economy and drive new innovative businesses.
This paper reviews the existing literature on data mining, especially for the SME sector, with the aim of identifying potential uses of Big Data and Machine Learning in this vitally important sector of the economy.
The paper starts by outlining its theoretical underpinnings, moving from data warehousing and data mining to business intelligence. It then discusses the opportunities and scope of data mining for the SME sector, before examining the challenges faced by SMEs, where a lack of technical expertise and financial resources may hinder adoption. Finally, the paper develops a theoretical/conceptual model for the adoption of data mining in the SME sector. The rationale for this paper is to encourage SMEs in the Kingdom of Saudi Arabia to adopt modern business practices and technological tools in order to gain a technical advantage.
2. Data Warehousing to Data Mining
Data warehousing is a core component of business intelligence: it is the system used to report on and analyze business data. Such systems can be built internally using basic tools such as Microsoft Excel or Microsoft Access, or provided externally through systems such as Tableau or SAP. In either case the main aim is the same: to collect and analyze business-relevant data on sales, customers and costs, among others. Data is a powerful tool which, if used correctly, can help inform management and signpost strategy and development.
Data mining is defined as the process of analyzing large databases in order to discover patterns and generate new information. It requires an algorithm or method to analyze the data. Usually, a business will examine historical data to make projections about future prospects, and use these projections to inform future budgets or capital spending.
Multiple data mining techniques can be discussed, but the main techniques cited for finding patterns are regression, clustering and classification. As the name suggests, classification assigns data to different classes, with the goal of creating a set of classification rules that can be used to predict behavior or answer a question. Regression is used to predict numeric values and focuses mainly on the relationship between two variables, such as sales and day of the week. Clustering groups individual data points together based on their similarities; businesses may do this with customer data, potentially using personal information such as gender and age. Each technique suits a different type of data. In databases with a large amount of semi-structured text data, the SME may have to consider text mining, where the focus may be on grouping customer comments and reviews according to the specific words or phrases they contain. Many businesses use a Likert-scale approach for customer reviews, which allows for easy data analysis; however, in an increasingly competitive market there is also a need to gain a richer understanding of customers’ views through detailed qualitative reviews.
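As an illustration of the regression technique described above, the following minimal sketch fits a least-squares line relating a numeric day index to units sold. The data, the variable names and the function `fit_line` are hypothetical and chosen only for illustration; the toy data is perfectly linear so that the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: weekday index (1 = Monday ... 5 = Friday) and units sold.
days = [1, 2, 3, 4, 5]
units_sold = [12, 14, 16, 18, 20]

slope, intercept = fit_line(days, units_sold)

# Use the fitted line to estimate sales for Saturday (day 6).
predicted_saturday = slope * 6 + intercept
```

Real sales data would be noisy, so an SME would also want to inspect how well the line fits before relying on its predictions.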
Further, two categories of data mining model must be mentioned:
· Unsupervised Model: This category focuses on finding patterns/clusters in the given data.
· Supervised Model: The system is trained on historical data so that it can then predict future behavior. For instance, a retailer may use historical data so that the system can predict when seasonal buying may begin for a specified product, allowing stock levels to be managed accordingly and sales to be maximized.
It must be remembered, though, that data mining is largely a generic term and there is no single methodology for undertaking it.
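To make the unsupervised category concrete, the sketch below groups customers by annual spend using a very small one-dimensional k-means routine. It is only an illustration under stated assumptions: the spend figures, the starting centroids and the function name `kmeans_1d` are hypothetical, and a real analysis would normally use several customer attributes rather than one.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Tiny k-means for one-dimensional data with user-supplied starting centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            # Assign each value to the nearest centroid.
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Move each centroid to the mean of its cluster (unchanged if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical annual spend per customer.
annual_spend = [120, 150, 130, 900, 950, 1000]

centroids, clusters = kmeans_1d(annual_spend, centroids=[100.0, 500.0])
```

No labels are supplied here: the routine discovers a low-spend and a high-spend group on its own, which is what distinguishes it from a supervised model trained on known outcomes.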
Data Mining for Business Intelligence
There is a clear and robust relationship between data mining and business intelligence. Determining patterns or trends in the data is vital for any business seeking to better understand its customers or the market in which it operates. Business intelligence (BI) can be defined as a set of processes, architectures and systems used to convert raw data into meaningful information, which is then used in the business for decision-making.
Increased competition in markets and the growing volume of data collected by businesses raise the importance of BI activities. Summarizing the available literature, the benefits to SMEs of adopting BI are:
· Aggregating data from different sources and locations, and analyzing it in a way that is insightful for the business.
· Improved decision-making, which in turn improves efficiency, productivity, sales and profitability. The speed at which a business can react to changes in the market and in customer behavior is positively linked to performance.
· Risk mitigation.
3. Data Mining for SME: Opportunities and Scope
The scope for data mining is limited only by access to data, given that mining can be done with sales, inventory, manufacturing, social media and marketing data, among others. Between businesses, however, the scope is further limited by each business’s access to the resources needed to undertake data mining. For an SME in the retail sector, opportunities for data mining could include analyzing social media interactions, sales data from e-commerce, sentiment from Twitter or other review platforms, and customer journey data from the website. If a business can understand the journey taken by customers on its e-commerce site, then there is an opportunity to optimize it, potentially improving links throughout the site and making it easier for the customer to purchase, in turn increasing sales. Sales data can be tracked to determine where or when the most sales occur, helping the business to work out whether there are seasonal fluctuations in demand, or whether it sells more products in Sheffield than in Leeds, which could inform a decision on opening a standalone store. In addition, the data can be used to predict future occurrences, such as the growth of the market or the likelihood of customer loyalty.
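The where-and-when question raised above can be sketched with a simple aggregation of sales records by city and by month. The records, field layout and variable names below are hypothetical and serve only to show the idea.

```python
from collections import defaultdict

# Hypothetical sales records: (city, month, revenue in pounds).
sales_records = [
    ("Sheffield", "2023-11", 1200.0),
    ("Sheffield", "2023-12", 2100.0),
    ("Leeds", "2023-11", 800.0),
    ("Leeds", "2023-12", 950.0),
    ("Sheffield", "2023-12", 400.0),
]

revenue_by_city = defaultdict(float)
revenue_by_month = defaultdict(float)
for city, month, revenue in sales_records:
    revenue_by_city[city] += revenue
    revenue_by_month[month] += revenue

# The city and month with the highest total revenue.
best_city = max(revenue_by_city, key=revenue_by_city.get)
peak_month = max(revenue_by_month, key=revenue_by_month.get)
```

With several years of records, the same monthly totals would start to reveal whether a peak is a genuine seasonal pattern or a one-off.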
Data Mining for SME: Challenges
There is agreement among studies that the main challenge for any business is gaining access to the systems needed to store and analyze such vast amounts of data. SMEs, given their size, have fewer financial and human resources than larger businesses. They may not have the manpower to store, update and analyze these large databases, and at the same time may not have the financial means to pay an external business to do so. Scale, timeliness, complexity and privacy are also considered challenges of big data mining. SMEs in the Kingdom of Saudi Arabia must remember the need to comply with data protection regulation such as the GDPR, where it applies, when holding personal data on customers. In addition, “financial, awareness and knowledge constraints” are considered the main barriers for SMEs. If these could be overcome, the main benefits of data mining for SMEs would be flexibility, cost, efficiency, quality and competitive advantage, underscoring the importance of SMEs entering “Industry 4.0”, a term widely used to refer to businesses digitalizing their operations.
The GDPR and its associated compliance costs will put many SMEs off holding personal customer data, which could otherwise be analyzed to provide interesting insights into customer behavior. Figure 1 explains the challenges faced by SMEs while trying to adopt data mining.
The complexity of the above-mentioned architecture clearly represents a high barrier for SMEs, from both a financial and a cultural point of view. Even if an SME is able to gain access to this architecture, the business then needs the knowledge and time to take the analyzed data and use it effectively to drive strategy forward. This requires skills in data interpretation and presentation as well as strategy. Nevertheless, two main factors can provide a viable bridge between SMEs and data mining, which is also referred to as Big Data Analytics: the evolution of cloud architectures, which are used wholly online, and the development of open-source software, both of which can help SMEs gain insight into the related core technologies and tools. Tools such as Google Analytics are well-known basic analytics systems which SMEs can use if they have the human resources available to undertake such data manipulation and analysis.
Figure 1. Data mining challenges for SMEs.
These challenges could be summarized as a lack of IT skills, a lack of statistical knowledge and a lack of interest. As mentioned above, the majority of SMEs in the Kingdom of Saudi Arabia have fewer than 10 employees, with many being family-owned and family-run businesses. To gain the knowledge needed for data analytics, business owners must first have an interest in data mining, which is harder than it sounds given the time pressures that these owners/managers face from other areas of the business such as manufacturing, sales and customer service. Figure 2 below identifies challenges similar to those mentioned above and shows that there is strong agreement between studies, especially on challenges such as security.
4. Data Mining for SME: A Model
The previous section discussed the main challenges faced by SMEs when adopting data mining techniques. The review now discusses the models which could be adopted in an SME, noting the merits and criticisms of each.
The most common process framework for data mining is the Cross Industry Standard Process for Data Mining (CRISP-DM). CRISP-DM (Figure 3) consists of six stages that can be executed flexibly to implement data mining. While it is applicable to SMEs, Aalen University extended the model to incorporate some of the special demands of SMEs, as described below.
Figure 2. Data mining challenges for SMEs.
Figure 3. Cross industry standard process for data mining (CRISP-DM).
Research has proposed that SMEs need to consider cost-benefit analysis in their data mining to ensure that their limited resources are targeted at the areas that would generate the highest return. The above-mentioned model was therefore further extended by adding three more stages before the task is defined (Figure 4):
Initially, the SME generates ideas on the opportunities available from data mining, which can draw on data related to customers, production and marketing, among others. Evaluation then takes place, whereby the SME considers the costs of data mining for each idea and determines whether the potential payoff would exceed them, making the idea financially worthwhile. Once all the data mining ideas have been evaluated, they must be prioritized using this cost-benefit analysis. The data mining task that contributes the highest value to the business at the lowest cost is preferred, similar to how investments are assessed with the Net Present Value equation. Because SMEs lack capacity, prioritization is especially relevant for them. It has been proposed that a business should only undertake Big Data Analytics (associated with data mining) if the following four questions can be answered (i.e. a cost-benefit analysis).
1) Can we sense this data?
2) Can we generate a sensible result from it?
3) Can we use it for better service?
4) Can we convert it to a profit?
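The prioritization step above is compared to the Net Present Value equation, which in its standard form is

$$\mathrm{NPV} = \sum_{t=0}^{T} \frac{CF_t}{(1 + r)^t},$$

where $CF_t$ is the net cash flow in period $t$ (with $CF_0$ usually a negative upfront cost) and $r$ is the discount rate. The following sketch ranks a few data mining ideas by NPV. The idea names, cash flows and discount rate are hypothetical assumptions for illustration; they are not taken from the studies reviewed here.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] is the upfront cash flow at t = 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical ideas: upfront cost followed by three years of expected benefit.
ideas = {
    "customer churn analysis": [-5000, 3000, 3000, 3000],
    "website journey tracking": [-2000, 1000, 1000, 1000],
    "energy usage monitoring": [-8000, 3000, 3000, 3000],
}
discount_rate = 0.10  # assumed annual discount rate

# Rank ideas from highest to lowest NPV; ideas with negative NPV fall to the bottom.
ranked = sorted(ideas, key=lambda name: npv(discount_rate, ideas[name]), reverse=True)
worthwhile = [name for name in ranked if npv(discount_rate, ideas[name]) > 0]
```

In this toy example the energy monitoring idea has a negative NPV at the assumed rate, so it would be deferred until its cost falls or its expected benefit rises.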
Given that the focus of this study is on SMEs, the emphasis is on open-source tools which are freely available without a commercial license, encouraging widespread adoption and innovation. One discussion of the “Choose and Setup a Model” stage of CRISP-DM focused on RapidMiner, noting that the most popular methods are neural networks and decision trees. Previous studies have highlighted some key software which could be adopted, though the list is not exhaustive and other options are available for specific types of data; Mining Mart and RapidMiner are two of the tools highlighted. Each system essentially relies on a combination of statistics, mathematics and computing, and should be used in conjunction with other systems as part of an overall rounded approach to knowledge discovery within the business.
Figure 4. Extended CRISP-DM model.
Giudici’s Applications of Data Mining can be used to determine a methodology for data mining. It is a stepped approach:
1) Definition of the objectives for the analysis.
2) Selection, organization and any pre-treatment of the data needed. By pre-treatment the study is referring to the need to potentially omit any personal characteristics to meet regulation such as GDPR, or to remove any contaminated data which could skew the analysis.
3) Exploratory analysis of the data which may lead to some transformation as the data is prepared for further analysis.
4) Specification of the statistical methodology to be used, informed by the type and size of the database as well as the initial objectives set out.
5) Data analysis.
6) Evaluation and comparison of the methods used.
7) Interpretation of the results from the model and use of this within business decision-making.
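The pre-treatment part of step 2 above can be sketched as a small cleaning function that drops personal fields and discards records that look contaminated. The field names, the list of personal fields and the rule for what counts as contaminated (a missing or negative amount) are assumptions made for this illustration only.

```python
# Assumed set of fields treated as personal characteristics.
PERSONAL_FIELDS = {"name", "email", "age", "gender"}


def pretreat(records):
    """Drop personal fields and discard records with a missing or negative amount."""
    cleaned = []
    for record in records:
        amount = record.get("amount")
        if amount is None or amount < 0:
            continue  # treated here as contaminated data
        cleaned.append({key: value for key, value in record.items()
                        if key not in PERSONAL_FIELDS})
    return cleaned


# Hypothetical raw sales records.
raw_records = [
    {"name": "A. Customer", "email": "a@example.com", "product": "kettle", "amount": 25.0},
    {"name": "B. Customer", "email": "b@example.com", "product": "toaster", "amount": None},
    {"name": "C. Customer", "email": "c@example.com", "product": "kettle", "amount": -10.0},
]

clean_records = pretreat(raw_records)
```

Removing fields in this way is only one possible approach to meeting regulation; what counts as personal data in practice depends on the applicable rules.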
For many SMEs, the first stage of data mining may be the integration of data already produced within the business, much of which may currently be offline and stored in sales ledgers or notebooks. The initial stage would be to transfer this historical data into a database, potentially using Microsoft Excel or Access. A system could be built in which all future sales data is entered alongside customer information, sales enquiries and online sentiment, so that all data is in one accessible place. This allows for exploratory analysis by managers who may not be experienced in statistical analysis, using tools available in Excel such as simple regression, correlation and pivot tables. The issue, however, is that such a system requires colleagues within the business to enter data, which changes the culture of the business and adds further time pressure on workers.
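As a sketch of the kind of exploratory check mentioned above, the following computes the Pearson correlation between monthly sales enquiries and monthly sales, the same statistic a manager could obtain from a spreadsheet. The figures and the function name `pearson` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical monthly figures.
monthly_enquiries = [10, 20, 30, 40, 50]
monthly_sales = [3, 5, 8, 9, 12]

r = pearson(monthly_enquiries, monthly_sales)
```

A value of `r` close to 1 suggests that enquiries and sales move together, although correlation alone does not show that one causes the other.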
It has been noted that SMEs are reluctant to venture into data mining given the lack of case studies available from similar businesses. As mentioned, such case studies are hard to provide because SMEs differ in sector, business model and available resources, meaning that any knowledge transfer process may be unique to each business. This variety also makes it difficult to present a data mining model tailored to a particular business group or sector. Instead, previous studies have developed more generalized models which any SME can implement in a way that meets its own specifications. An SME needs to focus on customer satisfaction as a key metric for business success, given the positive relationship between satisfaction, customer loyalty and sales. The KANO model (Figure 5) has been proposed and further developed in earlier studies.
The idea of this method is to get the SME to focus on which products or areas of the business are important for driving customer satisfaction. This leads to a manageable BI plan which may focus on specific data points such as sentiment or sales of a specific product. It focuses management attention on data which could become Key Performance Indicators (KPIs) and helps reduce the risk that data mining leaves an SME drowning in data which cannot be properly analyzed and so adds little value to decision-making.
Figure 5. KANO for data mining.
RapidMiner and KNIME are both open-source programs cited in the literature. Nevertheless, these systems still require data to be imported from a source such as Excel, so an SME still needs to digitalize its data collection and storage. A survey of 12 open-source systems for SMEs showed KNIME to be the clear favorite, given its support for Excel/Access files as well as its ease of use. A similar conclusion was drawn elsewhere, noting how easily KNIME yields insight into data, while another study favored RapidMiner because workflows can be created to represent several methodologies. It has been observed that an Artificial Neural Network (ANN) was the best predictor of future sales, essentially using historical data and examples to form a kind of “consciousness” within the system which can then be used to predict future events. Another study described an ANN with three layers (Figure 6), with the hidden layer analyzing the data to find relationships between the variables.
The above ANN assimilates data in a way modeled on how the human brain processes information. The ANN can learn by assimilating historical data in order to better forecast and recommend action for the business. A simple learning model applied by neural networks is to weight input streams in favor of those most likely to be accurate, helping the business understand the key drivers of sales or production, among others.
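The three-layer structure described above (input, hidden and output layers, with weighted inputs) can be sketched as a single forward pass. The weights below are hand-picked, hypothetical values; in a real network they would be learned from historical data, and that training step is deliberately omitted here. Bias terms are also left out for brevity.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, output_weights):
    """Forward pass: input layer -> hidden layer -> single output neuron."""
    # Each hidden neuron takes a weighted sum of all inputs.
    hidden = [sigmoid(sum(w * x for w, x in zip(weights, inputs)))
              for weights in hidden_weights]
    # The output neuron takes a weighted sum of the hidden activations.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))


# Hypothetical weights: two inputs, two hidden neurons, one output.
hidden_weights = [[0.8, -0.2], [0.1, 0.9]]
output_weights = [1.5, -1.0]

# Hypothetical inputs, e.g. two scaled indicators drawn from sales history.
score = forward([1.0, 0.0], hidden_weights, output_weights)
```

Larger weights give an input more influence on the output, which is the sense in which a trained network favors the input streams that have proved most informative.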
Figure 6. Artificial Neural Network (ANN) for SMEs.
A decision tree is a structure that includes a root “node”, branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds a class label. The business may use historical data to improve the accuracy of these tests, allowing the SME to target its business better. The decision tree may show that a senior-aged customer is more likely to purchase the product than a younger customer, meaning the SME can target its marketing spend accordingly.
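The structure just described can be sketched as a nested dictionary in which internal nodes test an attribute, branches are the outcomes of that test, and leaves hold a class label. The attributes, labels and the shape of this particular tree are hypothetical; a real tree would be derived from the business's historical data.

```python
# Internal nodes have "attribute" and "branches"; leaf nodes have "label".
tree = {
    "attribute": "age_group",  # root node test
    "branches": {
        "senior": {"label": "likely to buy"},
        "younger": {
            "attribute": "previous_customer",
            "branches": {
                True: {"label": "likely to buy"},
                False: {"label": "unlikely to buy"},
            },
        },
    },
}


def classify(node, customer):
    """Follow branches from the root until a leaf node is reached."""
    while "label" not in node:
        outcome = customer[node["attribute"]]
        node = node["branches"][outcome]
    return node["label"]


senior_result = classify(tree, {"age_group": "senior", "previous_customer": False})
younger_result = classify(tree, {"age_group": "younger", "previous_customer": False})
```

Reading the path from root to leaf gives a plain-language rule, which is one reason decision trees are easy to explain to managers without a statistical background.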
Data mining has developed markedly over recent years to become a sophisticated tool with which a business can track customer behavior alongside internal business performance. Amazon has become extremely successful at using its ecosystem of tools to monitor customer interaction with its website and with systems such as Alexa in order to predict sales, and then to influence the customer with offers or marketing that improve the chance of the sale being realized. Facebook can work out when customers are more likely to engage with content, allowing businesses to plan their posts better, while internal data mining allows businesses to optimize manufacturing, achieving leaner processes with less wastage. Manufacturers also use Big Data to identify areas for cost reduction in energy usage. The potential opportunities are better insight into customer behavior, leading to better positioning of the business, and better internal processes, leading to increased efficiency and productivity and thus a lower cost per unit.
The review has highlighted that the challenge for any SME is implementing these systems in a way which matches the resources available to it, be they financial, cultural or human. Accordingly, the method section has recommended an incremental adoption of data mining techniques, in which the initial focus for any SME is integrating all its internal data into a centralized database which can be viewed and interpreted to determine key themes, and thus areas to focus on. SMEs are reluctant to invest heavily in data mining systems until they see real benefits to their revenue, production or cost base. Incremental steps would allow an SME to first undertake data mining internally using open-access software or the tools available in Excel, focusing on basic methods such as clustering and regression. If the SME gains new opportunities from the initial results, then more capital can be allocated in future years to expand. This incremental move into data mining will help overcome the challenge of limited resources, though the main risk is that SMEs would be slower to adopt the data mining practices of their larger peers and so could become less competitive over time, failing to meet the ever-changing needs of the customer. A balance is therefore needed between practicality, financial viability and the actual need for data mining in the business to realize new opportunities.
The review has highlighted several methods which an SME could use to create their bespoke data mining system.
To conclude, this paper has identified several key challenges for any SME to overcome when considering data mining. What is clear, however, is that these challenges are linked to the scale of the data mining system adopted. Challenges such as data security, resource intensity, financial cost and talent requirements all grow as the size of the data mining system increases. The main conclusion for any SME is that data mining is a technique which needs to be scaled up over time alongside the size of the business. “Start Small” is the key takeaway from this paper for any business. The first stage for any SME should be the digitalization of its internal data, which would allow management to assess what data points they have access to (i.e. sales, customer and production data) and to see the quality of the data they collect. Before any data mining technique is undertaken, the SME must be confident in the quality and reliability of its data collection. Later, the business can scale up the data mining process to gather greater BI within its industry. The business must take steps to match data mining to its ability to analyze the data and turn it into business decisions and strategy, in order to gain a competitive edge over rival businesses.
The author would like to express his cordial thanks to Prof. M. Zubair Khan for his valuable advice.
 Market Inspector (2019) Essential Facts You Should Know about SMEs in the UK.
 Oliveira, C., Guimarães, T., Portela, F. and Santos, M. (2019) Benchmarking Business Analytics Techniques in Big Data. Procedia Computer Science, 160, 690-695.
 Chertchom, P. (2018) A Comparison Study between Data Mining Tools over Regression Methods: Recommendation for SMEs. 2018 5th International Conference on Business and Industrial Research (ICBIR), Bangkok, 17-18 May 2018, 46-50.
 Shah, S., Soriano, C.B. and Coutroubis, A.D. (2017) Is Big Data for Everyone? The Challenges of Big Data Adoption in SMEs. In 2017 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), Singapore, 10-13 December 2017, 803-807.
 Hashmi, A.S. and Ahmad, T. (2016) Big Data Mining: Tools & Algorithms. International Journal of Recent Contributions from Engineering, Science & IT (iJES), 4, 36-40.
 Azevedo, A. (2016) Data Mining and Business Intelligence: A Comparative, Historical Perspective. In: Azevedo, A. and Filipe, M., Eds., Integration of Data Mining in Business Intelligence Systems, IGI Global, 1-11.
 Packianather, M.S., Davies, A., Harraden, S., Soman, S. and White, J. (2017) Data Mining Techniques Applied to a Manufacturing SME. Procedia CIRP, 62, 123-128.
 Coleman, S.Y. (2016) Data-Mining Opportunities for Small and Medium Enterprises with Official Statistics in the UK. Journal of Official Statistics, 32, 849-865.
 Del Vecchio, P., Di Minin, A., Petruzzelli, A.M., Panniello, U. and Pirri, S. (2018) Big Data for Open Innovation in SMEs and Large Corporations: Trends, Opportunities, and Challenges. Creativity and Innovation Management, 27, 6-22.
 Coleman, S., Göb, R., Manco, G., Pievatolo, A., Tort-Martorell, X. and Reis, M.S. (2016) How can SMEs Benefit from Big Data? Challenges and a Path Forward. Quality and Reliability Engineering International, 32, 2151-2164.
 Moeuf, A., Pellerin, R., Lamouri, S., Tamayo-Giraldo, S. and Barbaray, R. (2018) The Industrial Management of SMEs in the Era of Industry 4.0. International Journal of Production Research, 56, 1118-1136.
 Bachlechner, D. and Leimbach, T. (2016, August) Big Data Challenges: Impact, Potential Responses and Research Needs. 2016 IEEE International Conference on Emerging Technologies and Innovative Business Practices for the Transformation of Societies (EmergiTech), Balaclava, 3-6 August 2016, 257-264.
 Iqbal, M., Kazmi, S.H.A., Manzoor, A., Soomrani, A.R., Butt, S.H. and Shaikh, K.A. (2018, March) A Study of Big Data for Business Growth in SMEs: Opportunities & Challenges. 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, 3-4 March 2018, 1-7.
 Härting, R.C. and Sprengel, A. (2019) Cost-Benefit Considerations for Data Analytics-An SME-Oriented Framework Enhanced by a Management Perspective and the Process of Idea Generation. Procedia Computer Science, 159, 1537-1546.
 Ayoubi, E. and Aljawarneh, S. (2018) Challenges and Opportunities of Adopting Business Intelligence in SMEs: Collaborative Model. Proceedings of the 1st International Conference on Data Science, E-Learning and Information Systems, Madrid, October 2018, 1-5.
 Azevedo, F. and Reis, J.L. (2019) Big Data Analysis in Supply Chain Management in Portuguese SMEs “Leader Excellence”. World Conference on Information Systems and Technologies. Springer, Cham, 621-632.
 Ploder, C. and Kohlegger, M. (2018) A Model for Data Analysis in SMEs Based on Process Importance. In: Uden, L., Hadzima, B. and Ting, I.H., Eds., Knowledge Management in Organizations, KMO 2018, Communications in Computer and Information Science, Vol. 877, Springer, Cham, 26-35.
 Tontini, G., Søilen, K. and Silveira, A. (2013) How Do Interactions of Kano Model Attributes Affect Customer Satisfaction? An Analysis Based on Psychological Foundations. Total Quality Management and Business Excellence, 24, 1253-1271.
 Müller, J., Maier, L., Veile, J. and Voigt, K.I. (2017) Cooperation Strategies among SMEs for Implementing Industry 4.0. In: Wolfgang, K., Thorsten, B. and Ringle Christian, M., Ed., Digitalization in Supply Chain Management and Logistics: Smart and Digital Solutions for an Industry 4.0 Environment, Proceedings of the Hamburg International Conference of Logistics (HICL), Vol. 23, epubli GmbH, Berlin, 301-318.
 Almeida, P. and Bernardino, J. (2016) A Survey on Open Source Data Mining Tools for SMEs. In: Rocha, Á., Correia, A., Adeli, H., Reis, L. and Mendonça Teixeira, M., Eds., New Advances in Information Systems and Technologies, Springer, Cham, 253-262.
 Naik, A. and Samant, L. (2016) Correlation Review of Classification Algorithm Using Data Mining Tool: WEKA, Rapidminer, Tanagra, Orange and Knime. Procedia Computer Science, 85, 662-668.
 Chen, X., Ye, Y., Williams, G. and Xu, X. (2007, May) A Survey of Open Source Data Mining Systems. In: Washio, T., Eds., Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Berlin, Heidelberg, 3-14.
 Dwivedi, S., Kasliwal, P. and Soni, S. (2016) Comprehensive Study of Data Analytics Tools (RapidMiner, Weka, R Tool, Knime). 2016 Symposium on Colossal Data Analysis and Networking (CDAN), Indore, 18-19 March 2016, 1-8.
 Massaro, A., Maritati, V. and Galiano, A. (2018) Data Mining Model Performance of Sales Predictive Algorithms Based on RapidMiner Workflows. International Journal of Computer Science & Information Technology (IJCSIT), 10, 39-56.
 Marr, B. (2018) What Are Artificial Neural Networks—A Simple Explanation for Absolutely Anyone.
 Dai, Q.Y., Zhang, C.P. and Wu, H. (2016) Research of Decision Tree Classification Algorithm in Data Mining. International Journal of Database Theory and Application, 9, 1-8.