Machine learning is the field of computer science concerned with developing algorithms that learn from datasets and make predictions on them. With the growing popularity of machine learning, healthcare service providers and the wider medical fraternity are considering using it to find solutions to different maladies. Specifically, the medical fraternity uses decision tree, Naïve Bayes, and logistic model tree filtering algorithms for diagnosis based on medical data. Other notable medical applications of machine learning include the use of the random tree algorithm to predict the likelihood of diabetes from personal health records and family history, and the improvement of cardiorespiratory fitness. In other cases, doctors have used the decision tree algorithm to determine, with great accuracy and efficiency, the medication to administer to patients. A prominent example of the role of machine learning in medicine is the Ford Exercise Testing Project (FIT), which is a collation of data on coronary artery infections. Filtering algorithms are applied to the FIT data to learn and predict the likely incidence of infections, a crucial part of preventive practice in coronary treatment.
The application of machine learning to predicting mobile services remains largely unexplored. For instance, Riihijarvi and Mahonen used machine learning in their study, but their predictions for mobile network services aimed at identifying problems with wireless networks. Conversely, the research article that explored the acceptance and probable prediction of consumer mobile services was based on the Technology Acceptance Model (TAM) and did not employ any known machine learning tools. Consequently, this study, notwithstanding others that may have used the same technique, implements a machine learning algorithm to learn and predict mobile services for smartphones. It provides pertinent information on the use of decision tree induction for mobile service prediction. It lays the foundation by identifying the specific attributes upon which recommenders can be built to assist with service recommendation. It is pertinent to acknowledge that recommenders, besides the standard requirements of ratings and rated items, perform better when personal information such as level of education is included in the learning process. The contribution of this study to machine learning for predicting mobile service usage is therefore that it establishes the most critical user information influencing the likelihood of service usage.
2. Materials and Methods
2.1. Methodology (Decision Tree Algorithm)
A decision tree induction algorithm tests each attribute in the dataset against the outcome. It is a critical tool in decision making because it identifies the probability or likelihood of an outcome based on the test results for each attribute. A decision tree can be modelled using a top-down, a recursive, or a divide-and-conquer approach. The top-down approach results in a diagram with the most significant attribute at the root node and the least significant ones at the lowest leaf nodes. The dataset used in the model is visualized from the root node through branches to leaf nodes, in a way that resembles the physical growth pattern of a tree. In applications, which are diverse and intricate, decision trees help reduce the number of mistakes and improve outcomes. Where systems are built on a decision tree framework, automation reduces decision and selection time, leading to improved efficiency and efficacy. The algorithm requires little effort to implement, and the chosen dataset is retrieved in a format that requires only limited data management. It is important to note that the decision tree algorithm is best suited to dichotomous or categorical variables, so that decisions made from the root node rest on branching according to the categorical response. In the context of machine learning or data mining, a decision tree can elicit a linear similarity parameter (correlation coefficient) to visualize the relationships between attributes in the dataset. The linear similarity is the decision parameter upon which each path between the root node and the leaves is evaluated, and its interpretation resembles a set of “if-then” conditions leading to predictions.
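The reading of a root-to-leaf path as a chain of “if-then” conditions can be illustrated with a minimal sketch. The tree below is hypothetical: its attribute names, categories, and leaf labels are chosen for illustration only and are not the model fitted in this study.

```python
# Hypothetical toy tree: attribute names, categories, and leaf labels are
# illustrative assumptions, not the study's fitted CHAID model.
TREE = {
    "attribute": "education",
    "branches": {
        "college_or_higher": {
            "attribute": "marital_status",
            "branches": {
                "married": {"leaf": "owns_smartphone"},
                "widowed": {"leaf": "no_smartphone"},
            },
        },
        "high_school": {"leaf": "owns_smartphone"},
        "less_than_high_school": {"leaf": "no_smartphone"},
    },
}


def predict(tree, record, path=None):
    """Walk from the root to a leaf, collecting the conditions tested on the way."""
    path = [] if path is None else path
    if "leaf" in tree:
        return tree["leaf"], path
    attribute = tree["attribute"]
    value = record[attribute]
    # Following one branch corresponds to one "if attribute == value" condition.
    return predict(tree["branches"][value], record, path + [(attribute, value)])


record = {"education": "college_or_higher", "marital_status": "married"}
label, rule = predict(TREE, record)
rule_text = " and ".join(f"{name} == {value!r}" for name, value in rule)
print(f"if {rule_text} then {label}")
```

Each categorical attribute produces one branch per category, which is why dichotomous or categorical variables map naturally onto this structure.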
The algorithm and the modelling process have three distinct phases, namely tree growing, tree pruning, and tree selection. In the first phase, the algorithm creates a decision tree model, which involves merging and splitting. In tree merging, significant and non-significant attributes are grouped to ensure relevance. However, as more attributes are added to the tree, errors (impurities) grow, creating the need to remove the “noise” in the tree-splitting process. In the second phase, all irrelevant splitting nodes are removed to reduce the probability of an over-fitted model and the subsequent misclassification of data. In the final phase (tree selection), the model is evaluated using either cross-validation or conventional testing on withheld data. This phase also reduces the chance of misclassification. There are several decision tree algorithms, although this paper focuses on the Chi-squared Automatic Interaction Detection (CHAID) algorithm available in SPSS. The CHAID modelling technique was preferred because it supports categorical data and, most importantly, because it terminates the tree-growing phase in instances where large errors are identified in the training set. It therefore minimizes misclassification, as it reduces the instances that require tree splitting.
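The chi-squared test that gives CHAID its name can be sketched as follows. This is a simplified illustration rather than the SPSS procedure: it ranks candidate predictors by the raw Pearson chi-squared statistic, whereas CHAID first merges similar categories and then compares adjusted p-values. The records and field names are hypothetical.

```python
from collections import Counter


def chi_squared(rows, predictor, target):
    """Pearson chi-squared statistic for the predictor-by-target contingency table."""
    n = len(rows)
    observed = Counter((row[predictor], row[target]) for row in rows)
    level_totals = Counter(row[predictor] for row in rows)
    outcome_totals = Counter(row[target] for row in rows)
    statistic = 0.0
    for level, level_count in level_totals.items():
        for outcome, outcome_count in outcome_totals.items():
            # Expected cell count if predictor and target were independent.
            expected = level_count * outcome_count / n
            statistic += (observed[(level, outcome)] - expected) ** 2 / expected
    return statistic


# Hypothetical records for illustration; not drawn from the Pew dataset.
rows = [
    {"education": "high", "gender": "m", "owns": "yes"},
    {"education": "high", "gender": "f", "owns": "yes"},
    {"education": "low", "gender": "m", "owns": "no"},
    {"education": "low", "gender": "f", "owns": "no"},
]

scores = {name: chi_squared(rows, name, "owns") for name in ("education", "gender")}
best = max(scores, key=scores.get)
print(scores, "-> split on", best)
```

In this toy data, education separates owners from non-owners completely while gender carries no information, so education would be chosen for the first split. Comparing raw statistics is only meaningful here because both tables have the same dimensions.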
The study focused on mobile services because of their flexibility and portability, alongside their integration of sophisticated applications that accomplish innumerable tasks, including e-commerce services. Modern devices have strong computing power and collect vast amounts of information about the habits of the user. Moreover, these devices may have applications that benefit end-users. As such, machine learning can not only boost the mobile service experience but also facilitate the creation of new applications. For example, machine learning algorithms can help find patterns in data and facilitate the interpretation of the data that features such as voice recognition, camera, GPS, and accelerometer collect. Mobile services are becoming increasingly popular because of the growing usefulness of their features and the use of natural language. Some smartphone applications incorporate this feature. For instance, voice recognition has changed people’s perception of the security of these devices, especially in the context of IoT. Furthermore, automated services such as language translation are becoming more relevant given the impact that globalization is having on the world. Applications such as Skype have made communication easier, while the integration of translation services has eliminated the language barrier. More importantly, the integration of sensors, among other vendor features, that have turned phones into responsive body sensors is pushing the boundary of smartphones further. For these reasons, the demand for mobile services is likely to increase in future, and it is important to understand which services are used most and which factors are likely to drive their usage. From a broad perspective, machine learning algorithms have revolutionized different aspects of the mobile service experience, and with the increasing requirement for personalized mobile services, the role of data mining becomes more apparent.
Most people have developed a reasonable degree of addiction to their smartphones, and the services offered are among the many reasons. Is it possible, then, to ascribe the projected increase in the usage of mobile services to increasing business and entertainment usage, or are there hidden factors at play? This is the question that the decision tree model set out to address using the Pew Research Centre Internet and Technology data.
2.2. Dataset and Source
The study used the dataset collected and stored by Pew Research Centre Internet and Technology. The dataset contains 140 variables and 2001 themes and covers most data on social media and other mobile service platforms that remit data, either anonymously or through consent, to service providers and the platform at large. The Pew Research Centre Internet and Technology data had a sample size of 2001 adults. The dataset defines any individual aged 18 or older as an adult, its geographic scope is national, and it included data from 1300 cell phone interviews. The interviews were conducted between October 6th, 2015 and December 7th, 2015. The gender distribution was almost equal. Statistical techniques alongside the Decision Tree Algorithm in SPSS were used to analyse the data. The decision tree was created using variables with considerable influence on mobile usage, and the resultant model was used to predict mobile service adoption.
3.1. Mobile User Demographics
The dataset showed that 1000 males and 1001 females participated in the study. Of the 2001 participants, over 1000 were married while over 400 had never married. The aggregate proportion of the sample that was divorced, widowed, living with a partner, or separated was less than 25% (see Figure 1 and Figure 2).
The sample of 2001 participants had a diverse racial composition, with a majority being White, followed by Black or African American and Asian participants. Specifically, over 1500 of the 2001 participants were White, suggesting a likely racial bias in the data collection process. Given this imbalance, it was not prudent to consider race as an element in the tree-building process, especially in the higher levels of the tree, because of the likely misclassifications.
Figure 3 and Figure 4 show the levels of education; the study established that most of the participants had a diploma certificate or above. The findings imply that most of the participants are aware of mobile services, have a basic knowledge of their features, and can make decisions regarding them. The level of education may therefore account for a lack of mobile service adoption, although only from the perspective of unawareness.
3.2. Decision Tree
The decision tree (DT) identifies the associations between variables and uses that information to create a model for making predictions. Following the three phases of decision tree construction, the data were segmented and grouped based on similarity, and the resultant model was used to predict the most likely mobile service adoption. The summary in Table 1 shows that CHAID was the tree-growing method used, with SMART1 at the root node (dependent variable).
Figure 1. Participant distribution by gender.
Figure 2. Distribution of participant’s marital status.
Figure 3. Racial distributions of the participants showing a high-level of skewness.
Figure 4. Distribution of the levels of education among participants.
Table 1. Model summary from the CHAID Decision Tree Algorithm.
Table 2. Prediction summary classification.
In Table 1, possession of a smartphone is the dependent variable (root node item), while marital status, education level and race are the predictors of mobile service adoption. CHAID does not require tree splitting and validation, and the resultant model has 12 nodes, 8 of which are terminal. The model's prediction of smartphone possession is summarized in Table 2. The model classifies smartphone owners with 98.6% accuracy, while the accuracy for non-owners stands at 8.3%. The overall accuracy is 70.7%. From Table 2, the majority of participants possess a smartphone while the rest do not, and the larger percentage of possession suggests that people embrace the services offered by mobile phones.
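The relation between the per-class accuracies and the overall accuracy can be made explicit with a short calculation. The sketch below assumes that only two classes (owners and non-owners) enter the classification table, so that the overall accuracy is the share-weighted mean of the two per-class accuracies. That assumption is not stated in the text, and the implied owner share is derived here rather than reported.

```python
def overall_accuracy(class_accuracy, class_share):
    """Overall accuracy as the share-weighted mean of per-class accuracies."""
    return sum(class_accuracy[name] * class_share[name] for name in class_accuracy)


# Accuracies reported in the text, expressed as fractions.
acc_owner, acc_non_owner, acc_overall = 0.986, 0.083, 0.707

# Assumption: two classes only, so the class shares sum to 1.
# Solve acc_overall = s * acc_owner + (1 - s) * acc_non_owner for s.
owner_share = (acc_overall - acc_non_owner) / (acc_owner - acc_non_owner)

recomputed = overall_accuracy(
    {"owner": acc_owner, "non_owner": acc_non_owner},
    {"owner": owner_share, "non_owner": 1 - owner_share},
)
print(f"implied owner share: {owner_share:.3f}, recomputed overall: {recomputed:.3f}")
```

Under this assumption, the overall figure is dominated by the owner class, which is why it remains near 70% even though few non-owners are classified correctly.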
The findings in Table 2 are supported by the node statistics in Figure 5, which show that at node 0 (root node), 69.7% of the participants own smartphones while 24.1% do not. Education level is the most important predictor of smartphone possession and has four nodes.
The first node represents the levels of education that include a four-year college or undergraduate degree and a postgraduate or professional degree. The postgraduate and professional levels cover master’s, doctorate, medical or law degrees. Of these highly educated participants, 81.2% possess smartphones while 16.7% do not.
At the second node, which covers some college, a two-year associate degree from a college or university, and some postgraduate or professional schooling, 70.7% of the participants had smartphones while 24.7% did not.
At the third node, which refers to less than high school including those who never completed high school, the study shows that 45.3% of the participants had smartphones while 32.8% did not.
At the fourth node, which accounts for high school graduates, the model shows that 56.2% of the participants had smartphones and 33.3% did not. If the level of education is treated as ordinal, so that less than high school, high school graduate, college graduate, and postgraduate/professional certification are ranked in that order, then it can be inferred that smartphone ownership increases with the level of education (45.3%, 56.2%, 70.7%, and 81.2%), while the proportion of non-ownership broadly decreases (32.8%, 33.3%, 24.7%, and 16.7%). The finding suggests that the level of education influences smartphone ownership, and this can be projected to mobile service adoption. The other predictor considered in the study is marital status, which has seven nodes in the model and branches from level of education.
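The ordinal reading of the four education nodes can be checked directly from the percentages reported for them. The short sketch below uses only those reported figures. The level labels are shorthand for the node descriptions, and the ordering is the one assumed in the text.

```python
def strictly_increasing(values):
    """True when every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def strictly_decreasing(values):
    """True when every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


# Node percentages reported in the text, ordered from lowest to highest education.
levels = ["less than high school", "high school graduate",
          "some college", "four-year degree or higher"]
own = [45.3, 56.2, 70.7, 81.2]
not_own = [32.8, 33.3, 24.7, 16.7]

print("ownership rises at every level:", strictly_increasing(own))
print("non-ownership falls at every level:", strictly_decreasing(not_own))
```

Ownership rises at every step. Non-ownership is slightly higher for high school graduates (33.3%) than for the lowest level (32.8%) before falling, so its decrease holds overall rather than at every step.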
The fifth and sixth nodes represent participants with four-year college or university education. In the fifth node, 83% of the postgraduates who are married, divorced, never married, or living with a partner own a smartphone while 15.9% do not. In the sixth node, 58.2% of the participants who are either widowed or separated possess smartphones while 27.3% do not.
Figure 5. The Decision Tree model obtained using the CHAID technique.
In the seventh and eighth nodes, the participants are college graduates. In Node 7, 64% of the participants who are married, widowed, or divorced have smartphones while 31.2% do not.
Node 8 shows that 85.5% of participants who never married or are separated have smartphones while 10.1% do not. For high school graduates, 51.9% of participants who are married, widowed or divorced have a smartphone (Node 9), while 75.8% of those who have never been married but are living with a partner have a smartphone (Node 11).
The study established that a majority of those who are either widowed or separated do not have smartphones (Node 10). It is apparent that smartphone ownership is more complex among graduates than among the rest of the groups. Nonetheless, marital status tends to increase phone ownership among graduates, with the proportion increasing from 81.2% to 83%. However, contextualizing and ascribing this proportion to each marital status may produce different results. For instance, smartphone usage among married couples depends on factors such as communication frequency and perception of phone use. Where the perception is negative, it is likely that usage, and subsequently ownership, may be impeded (conjecture). However, it is apparent from the study that those who never married or who are separated are more likely to own smartphones than their counterparts, and as such can be a target group for mobile content service providers.
3.3. Uses of Smartphone
Smartphones and ordinary phones generally differ in their uses and applications. Of the participant data used in this section, about 600 people had their smartphones at the time of the interview while about 100 people did not (Figure 6). However, among those who had their smartphones, only some 200 people used them to make video calls or video chats (Figure 7). Further, about 400 people reported using their smartphones to make purchases online on different e-commerce platforms (Figure 8). Thus, despite the finding that a majority own smartphones, most of them do not use them for video calls or video chats, and most are not using those smart devices to purchase products. Such a finding is surprising given the rising popularity of e-commerce and the shift in concepts that practices such as drop shipping have brought. However, the date of data collection and the status of e-commerce and video call technology at that time can account for this unusual difference. It is also possible that the list of products limited the responses of the participants.
Regarding the use of smartphones for recommendations, most of the participants used their smartphones for navigation recommendations, using the GPS service overlaid on maps (Figure 9). The recommender, especially on Google, can determine the shortest distance and guide a user to the chosen destination, making this the most common use of smartphones. Given current trends, gambling would be expected to be one of the leading uses of mobile phones, although this is not the case in this study. According to Figure 10, most participants do not use their phones to obtain sports scores or analysis. However, a considerable number of the participants confirmed that they do use their phones for this purpose.
Regarding the use of phones for video streaming services, almost 600 participants did not use their phones for streaming while 200 people did, as illustrated in Figure 11. Over 400 of the participants confirmed using their smartphones to listen to music, while about 500 did not, as shown in Figure 12. These statistics and differences can be attributed to the excessive costs of subscriptions for online movie streaming services. Figure 13 shows that over 900 people use their smartphones to play games, although most participants do not use their phones for this purpose. This is not unusual, since game playing is popular among millennials, and the smartphone usage patterns suggest that most of the participants are not millennials.
Figure 6. Smartphone ownership summary at the time of the interview.
Figure 7. Video call or chat smartphone usage.
Figure 8. Usage of smart devices to purchase products such as clothing only.
Figure 9. A chart summarizing usage of phone to get direction and recommendation.
Figure 10. A chart summarizing participation of getting sports scores or analysis.
Figure 11. A chart showing whether participant used a cell phone to watch a movie.
Figure 12. A chart summarizing whether phone is used to listen to music.
Figure 13. A bar graph showing whether people use the phone to play video games.
The conventional uses of smartphones are entertainment, communication and business endeavours. Of the uses mentioned, a phone is mostly used for entertainment and for obtaining directions and recommendations. Given the rise in the number of mobile services, it is becoming increasingly important to have models that predict adoption based on demographic data. According to the Decision Tree model in Figure 5, education is the key determinant of mobile service adoption, although other factors such as age and marital status can also influence it.