Parkinson’s disease (PD) is a progressive neurodegenerative disease that belongs to the group of conditions called motor system disorders. PD sufferers worsen over time as normal bodily functions, including breathing, balance, movement, and heart function, deteriorate.
Other neurodegenerative disorders include Alzheimer’s disease, Huntington’s disease, and amyotrophic lateral sclerosis (Lou Gehrig’s disease). An estimated seven to ten million people worldwide suffer from Parkinson’s disease. The occurrence of Parkinson’s increases with age, but an estimated four percent of people with PD are diagnosed before the age of 50. There is no cure or prevention for PD; however, the disease can be controlled in its early stages. Hence, data mining techniques can play an effective role in early detection and diagnosis.
The application of data mining techniques in medicine is a research area that combines sophisticated representational and computing techniques with the insights of expert physicians to produce tools for improving healthcare. Data mining is a computational process for finding hidden patterns in datasets by building predictive or classification models that are learnt from past experience and applied to future cases. With the vast amount of medical data available to hospitals, medical centers, and medical research organizations, the field of medicine supported by data mining techniques can increase healthcare quality and help physicians make decisions about their patients’ care. There are various techniques for classification, such as support vector machines (SVM), neural networks, decision trees, and Naïve Bayes. The objective of this study is to analyze and compare the performance of four such classification techniques on Parkinson’s diagnosis. First, we compare the classifiers’ performance on the actual and discretized PD datasets, and then compare their performance when using an attribute selection algorithm.
2. Related Work
Several studies have focused on using data mining techniques for the automatic identification of Parkinson’s disease.
Mohammad S. Islam et al. conducted a comparative analysis of effective detection of Parkinson’s disease using random tree (RT), SVM, and feedforward back-propagation artificial neural network (FBANN). A 10-fold cross-validation analysis was carried out for all classifiers. The proposed model achieved an accuracy of 97.37%.
Aprajita Sharma and Ram Nivas evaluated the performance of models built using artificial neural networks (ANN), K-nearest neighbor (KNN), and
Shian Wu and Jiannjong Guo  applied factor analysis, logistic regression, decision tree, and
Geetha Ramani and G. Sivagami provided a survey of data mining techniques currently in use for classification. They concluded by showing that the random tree algorithm classified the Parkinson’s disease dataset accurately, offering 100% accuracy. Linear discriminant analysis, C4.5, Cs-MC4, and KNN yielded accuracies above 90%.
A. H. Hadjahamadi and Taiebeh J. Askari compared the classification methods Bayesian network, C5.1, SVM, ANN, and C&R (classification and regression). C&R has an accuracy of 93.75%, whereas
Yahia Alemami and Laiali Almazaydeh developed and validated classification algorithms based on Naïve Bayes and KNN; their results show that the automated classification algorithms, Naïve Bayes and KNN, obtained a high degree of accuracy of around 93.3%.
Rashidah et al. proposed a model for the early detection and diagnosis of PD using a multilayer feedforward neural network (MLFNN) with the back-propagation (BP) algorithm. The output of the network is classified as healthy or PD using the K-means clustering algorithm. The results show that the model can be used in the diagnosis and detection of PD owing to its good performance: 83.3% sensitivity, 63.6% specificity, and 80% accuracy.
3. Parkinson Dataset
We conduct our analysis on real-world PD data, in which the disease is diagnosed using several features extracted from the human voice. The dataset contains 22 features extracted from 31 people, 23 of whom suffered from PD. As shown in Table 1, each column denotes a particular voice feature, and each row corresponds to one of the 195 voice recordings from these individuals. The dataset was created by Max Little of the University of Oxford in collaboration with the National Centre for Voice and Speech, Denver, Colorado.
These extracted features of human voices are used to diagnose PD and to determine who had actually entered the stages of the disease and who were healthy.
This study applies several classification methods, namely Naïve Bayes, SVM, multilayer perceptron (MLP), and decision tree (j48), to the PD dataset. The goals of this study are as follows:
1) Examine which of the above classifiers gives better performance when applied to the actual PD dataset.
2) Examine the effect of attribute selection on the performance of the mentioned classifiers for the PD dataset.
3) Examine the effect of discretizing the PD dataset on the performance of the classifiers.
Attribute selection is the process of selecting a subset of relevant features for use in model construction. The central assumption when using a feature selection technique is that the data contain many redundant or irrelevant features. The discretization of a continuous-valued attribute consists of transforming it into a finite number of intervals and re-encoding, for all instances, each value of this attribute by associating it with its corresponding interval. There are many ways to realize this process. In this study, we use the Weka 3.6.11 software; the Weka filters used for attribute selection and discretization are CfsSubsetEval-BestFirst-D1-N5 and Discretize-R first-last, respectively (Figure 1 and Figure 2).

Table 1. Parkinson disease features.
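As a rough illustration of these two preprocessing steps, the sketch below uses scikit-learn rather than Weka, on synthetic data shaped like the PD set (195 instances, 22 features). These are only analogues: SelectKBest and KBinsDiscretizer are not exact equivalents of Weka's CfsSubsetEval-BestFirst and Discretize filters, and the data here are random, not the real voice features.

```python
# Hedged analogue of the paper's preprocessing: SelectKBest stands in for
# Weka's CfsSubsetEval-BestFirst, KBinsDiscretizer for its Discretize filter.
# The data are synthetic, shaped like the PD set (195 recordings, 22 features).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.default_rng(0)
X = rng.normal(size=(195, 22))       # stand-in for the 22 voice features
y = rng.integers(0, 2, size=195)     # 1 = PD, 0 = healthy (random labels)

# Attribute selection: keep the 9 features scoring highest against the class
X_sel = SelectKBest(score_func=f_classif, k=9).fit_transform(X, y)

# Discretization: re-encode each continuous feature as an interval index
disc = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform")
X_disc = disc.fit_transform(X)

print(X_sel.shape, X_disc.shape)     # (195, 9) (195, 22)
```

Nine features are kept here only to mirror the nine attributes the Weka filter selects later in the paper; CfsSubsetEval chooses its subset by a different, correlation-based criterion.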
4.1. Naïve Bayes
The Naïve Bayes classifier is a supervised learning method based on the concept of probability: it assigns a new observation to the most probable class. The classification process comprises two stages:
1) Training stage: using the training samples, the method estimates the probability distribution of each class.
2) Prediction stage: for a test sample, the method computes the posterior probability of that unknown instance belonging to each class and predicts the class with the largest posterior probability, known as the maximum a posteriori (MAP) decision.

Figure 1. Computing and analyzing the accuracy of the classifiers for both the actual and discretized PD datasets.

Figure 2. Computing and analyzing the accuracy of the classifiers for the actual PD data with and without attribute selection.
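The two stages described above can be sketched with scikit-learn's Gaussian Naïve Bayes; the Gaussian event model and the two-cluster data below are illustrative assumptions, not the paper's Weka setup or the PD recordings.

```python
# Minimal Naïve Bayes sketch of the two stages: training estimates per-class
# distributions, prediction picks the MAP class. Synthetic stand-in data.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (50, 3)),   # class 0 ("healthy")
                     rng.normal(3, 1, (50, 3))])  # class 1 ("PD")
y_train = np.array([0] * 50 + [1] * 50)

# Training stage: estimate per-class feature distributions and priors
clf = GaussianNB().fit(X_train, y_train)

# Prediction stage: posterior probability per class, then the MAP class
x_new = np.array([[3.0, 3.0, 3.0]])
posterior = clf.predict_proba(x_new)
print(clf.predict(x_new))            # [1] -- the most probable class
```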
4.2. Support Vector Machine (SVM)
SVM is used in supervised learning models with associated learning algorithms that analyze data and recognize patterns for classification. Given a set of training samples, each marked as belonging to one of two classes, an SVM training algorithm builds a model that assigns new examples to one class or the other, making it a non-probabilistic binary linear classifier. An SVM model represents the examples as points in space, mapped so that the examples of the separate classes are divided by a clear gap that is as wide as possible. New examples are then mapped into the same space and predicted to belong to a class based on which side of the gap they fall on.
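A minimal sketch of this maximum-margin idea, using a linear SVC from scikit-learn on synthetic 2-D data (an analogue for illustration, not the paper's Weka configuration or the PD features):

```python
# Linear SVM sketch: fit the widest-margin hyperplane between two point
# clouds, then classify new points by which side of the gap they fall on.
# Synthetic 2-D data for illustration only.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (40, 2)),   # class 0 cluster
               rng.normal(+2, 0.5, (40, 2))])  # class 1 cluster
y = np.array([0] * 40 + [1] * 40)

svm = SVC(kernel="linear").fit(X, y)
print(svm.predict([[-2.0, -2.0], [2.0, 2.0]]))  # [0 1]
```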
4.3. Multilayer Perceptron (MLP)
An MLP network comprises three layers. A three-layer MLP is a fully connected feedforward neural network consisting of an input layer, whose neurons perform no processing and serve only to present the inputs; a hidden layer; and an output layer (PD or healthy), which produces the classification result. Figure 3 shows the architecture of the multilayer perceptron neural network used. Each neuron in the input and hidden layers is linked to all neurons in the subsequent layer through weighted connections. Each neuron computes the weighted sum of its inputs, adds a threshold, and applies a sigmoid activation function to produce its output. The MLP network is trained with the backpropagation algorithm, a gradient descent method for weight adjustment. A backpropagation MLP is a supervised ANN: the network is presented with input examples together with the corresponding desired outputs.
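The architecture just described (inputs, one sigmoid hidden layer, an output layer, trained by gradient-descent backpropagation) can be sketched with scikit-learn's MLPClassifier; the hidden-layer size, learning rate, and synthetic 22-feature data below are stand-in assumptions, not the paper's exact configuration.

```python
# Three-layer MLP sketch: 22 inputs, one hidden layer with sigmoid
# ("logistic") activation, trained by stochastic gradient descent
# (backpropagation). Synthetic separable data, not the PD recordings.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 22)),   # "healthy" cluster
               rng.normal(2, 1, (60, 22))])  # "PD" cluster
y = np.array([0] * 60 + [1] * 60)

mlp = MLPClassifier(hidden_layer_sizes=(10,), activation="logistic",
                    solver="sgd", learning_rate_init=0.1,
                    max_iter=2000, random_state=0)
mlp.fit(X, y)
print(mlp.score(X, y))   # training accuracy on this easily separable problem
```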
4.4. Decision Tree (j48)
Decision trees represent a supervised approach to classification. A decision tree is a simple structure in which non-terminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes. j48 is Weka's implementation of the C4.5 algorithm. C4.5 generates a classification decision tree for a given dataset by recursive partitioning of the data. The tree is grown using a depth-first search strategy. The algorithm considers all possible tests that can split the dataset and selects the test that gives the best information gain. For each discrete attribute, one test with as many outcomes as the number of distinct values of the attribute is considered. For each continuous attribute, binary tests involving every distinct value of the attribute are considered. To compute the entropy gain of all these binary tests efficiently, the training data belonging to the considered node are sorted by the values of the continuous attribute, and the entropy gains of the binary cuts based on each distinct value are calculated in a single pass over the sorted data. This process is repeated for each continuous attribute.

Figure 3. Architecture of the multilayer perceptron neural network.
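A small sketch of this information-gain splitting, using scikit-learn's DecisionTreeClassifier with an entropy criterion (note that scikit-learn implements CART, not C4.5/j48 itself, so this is only an analogue); the synthetic data make the class depend on a single attribute, so the best binary cut is easy to see.

```python
# Entropy-based splitting sketch (CART with criterion="entropy", a stand-in
# for C4.5/j48). The class depends only on feature 0, so the root test
# should be a binary cut on that attribute near the 0.5 boundary.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 4))
y = (X[:, 0] > 0.5).astype(int)

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(tree.tree_.feature[0])    # 0 -- root splits on the informative attribute
print(round(tree.tree_.threshold[0], 2))  # cut point close to 0.5
```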
5. Accuracy Analysis
The supervised learning algorithms are applied one after another. The confusion matrix is a useful tool for determining how well a classifier classifies the instances of the different classes; it reports the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The accuracy of each classifier is calculated, and a comparative study is performed to identify the best classification algorithm.
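The accuracy figure follows directly from these four counts; a one-line sketch (the counts below are hypothetical illustrative values, not the experimental results):

```python
# Accuracy from confusion-matrix counts: the fraction of instances
# classified correctly. The example counts are hypothetical.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# e.g. 40 TP, 15 TN, 2 FP, 1 FN over 58 test instances
print(round(accuracy(tp=40, tn=15, fp=2, fn=1), 3))  # 0.948
```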
6. Experimental Results
The PD dataset was divided as follows: 70% for training and 30% for testing. The experiment was performed on the abovementioned algorithms as follows:
First, the abovementioned algorithms are applied one by one to the actual PD dataset without applying any filter algorithm. The Naïve Bayes algorithm classifies the PD dataset with 58.6% accuracy, SVM yields 86% accuracy, the MLP neural network offers 94.8% accuracy, and the decision tree (j48) provides 74% accuracy. Table 2 shows the accuracy obtained and the values of TP, TN, FP, and FN for each algorithm.
Applying the attribute selection algorithm CfsSubsetEval-BestFirst-D1-N5 to filter the PD dataset, the attributes selected were MDVP:Fo (Hz), MDVP:Fhi (Hz), MDVP:Flo (Hz), MDVP:RAP, MDVP:APQ, NHR, Spread1, Spread2, and D2. The accuracies obtained in this case are: Naïve Bayes, 72.4%; MLP neural network, 91.3%; SVM, 86.2%; and decision tree (j48), 82.7%. Table 3 shows the accuracy obtained when applying the attribute selection algorithm and the values of TP, TN, FP, and FN for each algorithm.
Table 2. Accuracy obtained when applying each classifier to the actual PD dataset.

Table 3. Accuracy obtained when applying the attribute selection algorithm.
Applying the classifiers to the discretized PD dataset, we obtained different accuracies: Naïve Bayes, 79.3%; MLP, 94.8%; SVM, 96.5%; and decision tree (j48), 89.6%. Table 4 shows the accuracy and confusion results for the classifiers on the discretized PD dataset.
When the test mode is changed, the classifiers give different accuracies. Using the cross-validation test mode instead of a percentage split between training and test sets leads to a significant change in the accuracy of some classifiers, while others show no change. Table 5 shows the changes in classifier accuracy upon changing the test mode.
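The two test modes can be sketched with scikit-learn (an analogue of the Weka workflow, on synthetic data): a 70/30 percentage split scores one held-out set, while 10-fold cross-validation averages accuracy over ten held-out folds, which is why the two modes can report different numbers for the same classifier.

```python
# Percentage split vs 10-fold cross-validation, sketched with scikit-learn
# on synthetic data (not the PD set); Naïve Bayes is used as the example.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1, (100, 5)),
               rng.normal(1.5, 1, (100, 5))])
y = np.array([0] * 100 + [1] * 100)

# Percentage split: train on 70%, score once on the held-out 30%
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
split_acc = GaussianNB().fit(X_tr, y_tr).score(X_te, y_te)

# Cross-validation: average accuracy over 10 stratified held-out folds
cv_acc = cross_val_score(GaussianNB(), X, y, cv=10).mean()
print(round(split_acc, 3), round(cv_acc, 3))
```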
As a result, we conclude the following:
Naïve Bayes gives its best performance when implemented on the discretized PD dataset with the cross-validation test mode, yielding 84.6%, the highest accuracy obtained compared with its performance on the actual PD data and on the selected attributes from the PD data.
SVM yields 96.5%, a high accuracy, when implemented on the discretized PD data with the percentage split test mode (70% training, 30% test).
Decision tree (j48) gives better performance when implemented on the discretized PD data, yielding 89.6%. Its performance can be further enhanced using the cross-validation test mode, through which it yields 92.3%.
The results show that the best performance is obtained by the MLP neural network for both the actual and discretized PD data, i.e., 94.8%. Moreover, the attribute selection algorithm and the cross-validation test mode had no significant effect on MLP performance in PD classification (Figure 4 and Figure 5).
7. Conclusion
The aim of this study was to determine how different classifiers perform on the PD dataset, to evaluate their performance, and to examine the effects of attribute selection, discretization, and test mode on the selected classifiers. A comparative study of the Naïve Bayes, SVM, MLP, and decision tree (j48) classifiers on the PD dataset was performed by implementing the classifiers upon the following:
Actual PD dataset.
Discretized PD dataset.
Selected set of attributes from the PD dataset.
Shifting between percentage split and 10-fold cross-validation test modes.

Figure 4. Comparison of classification accuracy for the classifiers when implemented on the actual PD dataset, the discretized PD dataset, and the selected attributes from the PD dataset.

Figure 5. Comparison of classification accuracy for the classifiers when implemented on the actual PD dataset, the discretized PD dataset, and the selected attributes from the PD dataset using 10-fold cross-validation test mode.

Table 4. Accuracy and confusion results for the classifiers on the discretized PD dataset.

Table 5. Accuracy when using cross-validation test mode.
From the experimental results, we conclude that Naïve Bayes and decision tree (j48) yield better accuracy when implemented on the discretized PD dataset with the cross-validation test mode, without applying any attribute selection algorithm.
In conclusion, data discretization enhanced the performance of all classifiers except MLP. The attribute selection algorithm increased only the performance of Naïve Bayes and decision tree (j48). The training method had no significant impact on the classifiers’ performance.
References
[1] Shen-Yang, L., Puvanarajah, S.D. and Ibrahim, N.M. (2011) Parkinson’s Disease: Information for People Living with Parkinson’s. Novartis Corporation (Malaysia) Sdn. Bhd. and Orient Europharma (M) Sdn. Bhd.
[2] Islam, M.S., Parvez, I., Deng, H. and Goswami, P. (2014) Performance Comparison of Heterogeneous Classifiers for Detection of Parkinson’s Disease Using Voice Disorder (Dysphonia). 3rd International Conference on Informatics, Electronics & Vision.
[3] Wu, S. and Guo, J. (2011) A Data Mining Analysis of Parkinson’s Disease. Scientific Research.
[4] Hadjahamadi, A.H. and Askari, T.J. (2012) A Detection Support System for Parkinson’s Disease Diagnosis Using Classification and Regression Tree. Journal of Mathematics and Computer Science, 4, 257-263.
[5] Olanrewaju, R.F., Sahari, N.S., Musa, A.A. and Hakiem, N. (2014) Application of Neural Networks in Early Detection and Diagnosis of Parkinson’s Disease. International Conference on Cyber and IT Service Management.
[6] Little, M.A., McSharry, P.E., Hunter, E.J. and Ramig, L.O. (2008) Suitability of Dysphonia Measurements for Telemonitoring of Parkinson’s Disease. IEEE Transactions on Biomedical Engineering, 56, 1015-1022.
[7] Fayyad, U.M. and Irani, K.B. (1993) Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. 13th International Joint Conference on Artificial Intelligence, 1022-1027.
[8] Cristianini, N. and Shawe-Taylor, J. (2000) An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge.
[9] Zhao, Y.H. and Zhang, Y.X. (2008) Comparison of Decision Tree Methods for Finding Active Objects. Advances in Space Research, 41, 1955-1959.