Data mining is the process of finding useful and relevant information in various types of databases. Different approaches to data mining have been suggested to meet the challenges of storing and processing all types of data. Data mining now has a growing range of applications in fields such as medical science and railways. It helps doctors provide the necessary treatments, so that patients are treated better and at lower cost, and it is becoming more popular day by day. In pathology, it has become established as a strong technique for searching enormous volumes of pathological information for knowledge. Additionally, comparing different classification techniques using WEKA (Waikato Environment for Knowledge Analysis) on blood-related data is a demanding task in medical science research. It is hard to compare classification algorithms across different collections of data in order to find the better ones. The main concern here is the classification of hematological data to predict diseases. To this end, the hematological data analysis is divided into three phases: hematological data collection, application of classification algorithms, and evaluation of results and performance. The three major data mining techniques are regression, classification and clustering. Data mining is now also applied in clinical research, for example in AML (Acute Myeloid Leukemia), where predictive models play an important role.
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 describes the materials and methods. The dataset and preprocessing are explained in Section 4. Section 5 presents the experimental results and discussion. Section 6 gives the conclusion.
2. Related Works
Several studies have evaluated the performance of data mining classification algorithms using WEKA. In one study, the researchers evaluated the performance of data mining classification algorithms in WEKA. Another study compared different classification techniques using different datasets. A further study compared the various clustering algorithms of the WEKA tool. Moreover, a performance analysis and evaluation of various data mining algorithms used for cancer classification has been carried out. Data mining has also been used in artificial intelligence and in predicting abnormality in peripheral blood smears. Data mining classifiers were used to develop an automated diagnosis of thalassemia. An analysis of various clustering algorithms of data mining on health informatics has also been performed. The area of bioinformatics has likewise used data mining tools, and various classification techniques have been compared there. Data mining techniques were also used to differentiate between patients with a normal blood disease and patients with a blood tumor. Another study contrasted two classification techniques, J48 and Random Tree, using WEKA to classify Sickle Cell Disease (SCD). More recently, anemia has been predicted using different data mining classification algorithms, among which the J48 algorithm performed best in classifying types of anemia. WEKA has been used in the present experiment because it can extract hidden predictive information from large databases. In addition, the experiment has been conducted on CBC (Complete Blood Count) data, for which WEKA is a reasonable choice given how widely it is employed for data mining.
3. Material and Methods
In this study, the freely available data mining tool WEKA (version 3.8.0) has been used. Two different data sets have been utilized, and the performance of the classification algorithms (classifiers) has been examined. The analysis was carried out on a Sony VAIO running Windows 8 with an Intel® Core™ i3 Central Processing Unit at 1.70 GHz and 4 GB of RAM. The data sets were selected so that they vary in size, chiefly in the number of attributes. The hematological parameters consist of White blood cell count (WBC), Red blood cell count (RBC), Hemoglobin (Hb), Hematocrit (Hct), Mean corpuscular volume (MCV), Mean corpuscular hemoglobin (MCH), Mean corpuscular hemoglobin concentration (MCHC), Platelet count (PLT), Neutrophil count (NEU), Lymphocyte (LYMP), Monocyte (MONO), Eosinophil (EO), and Basophil (BASO), measured on a Sysmex 1000i analyzer (Sysmex Corporation, Kobe, Japan). The hematological data were evaluated manually by a medical technologist. The collected data were assigned to the following tags: indicative of anaemia of chronic disorder, Eosinophilia, Microcytic hypochromic anaemia, Normocytic anaemia, Neutrophil leucocytosis, Neutrophilia, Non-specific findings, and High ESR.
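WEKA reads data in its ARFF text format, so the CBC parameters and diagnostic tags listed above would typically be declared as numeric attributes and one nominal class attribute. The following is a minimal sketch of how such a file could be generated; the relation name `cbc`, the short attribute names, the class attribute name `diagnosis`, the subset of tags and the sample values are all assumptions for illustration, not the authors' actual file.

```python
# Sketch: build ARFF text for CBC records (names and values are hypothetical).
CBC_ATTRIBUTES = ["WBC", "RBC", "Hb", "Hct", "MCV", "MCH", "MCHC",
                  "PLT", "NEU", "LYMP", "MONO", "EO", "BASO"]

# A subset of the tags named in the text, used here as class labels.
CLASS_LABELS = ["Eosinophilia", "Microcytic hypochromic anaemia",
                "Normocytic anaemia", "Neutrophilia", "Non-specific findings"]


def quote(value):
    """Single-quote a nominal value so that embedded spaces are allowed."""
    return "'" + value.replace("'", "\\'") + "'"


def build_arff(rows, relation="cbc"):
    """Return ARFF text for rows of (13 numeric values, class label)."""
    lines = [f"@relation {relation}", ""]
    for name in CBC_ATTRIBUTES:
        lines.append(f"@attribute {name} numeric")
    labels = ",".join(quote(label) for label in CLASS_LABELS)
    lines.append(f"@attribute diagnosis {{{labels}}}")
    lines += ["", "@data"]
    for values, label in rows:
        if len(values) != len(CBC_ATTRIBUTES):
            raise ValueError("expected one value per CBC attribute")
        if label not in CLASS_LABELS:
            raise ValueError(f"unknown class label: {label}")
        # ARFF marks a missing value with a question mark.
        fields = ["?" if v is None else str(v) for v in values]
        lines.append(",".join(fields + [quote(label)]))
    return "\n".join(lines) + "\n"


# Hypothetical sample record; the values are not taken from the study.
sample = ([7.2, 4.8, 13.9, 41.0, 85.4, 29.0, 33.9,
           250, 4.1, 2.2, 0.5, None, 0.05], "Eosinophilia")
arff_text = build_arff([sample])
```

Writing `arff_text` to a file with the `.arff` extension would give a dataset that can be opened in the WEKA Explorer.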
4. Dataset and Preprocessing
The dataset of Experiment 1 comprises 425 samples and the dataset of Experiment 2 consists of 298 samples. The attributes characterize the Complete Blood Count (CBC) features, as listed in Table 1.
In the preprocessing of the dataset, irrelevant attributes were eliminated, missing values were filled in, and outlier values in the outlier samples were removed or replaced. Table 2 lists the dataset attributes used in this investigation.
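The three preprocessing steps can be sketched as follows. The paper does not state which imputation or outlier rule was used, so mean imputation and clipping to the 1.5 × IQR fences are assumptions chosen for illustration, as are the function names and the sample values.

```python
import statistics


def drop_attributes(rows, irrelevant):
    """Remove irrelevant attributes from each record (a dict per sample)."""
    return [{k: v for k, v in row.items() if k not in irrelevant}
            for row in rows]


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in column]


def clip_outliers(column, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, _, q3 = statistics.quantiles(column, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in column]


# Hypothetical records; "PatientID" stands in for an irrelevant attribute.
records = [{"PatientID": 1, "Hb": 13.5}, {"PatientID": 2, "Hb": None}]
cleaned = drop_attributes(records, irrelevant={"PatientID"})

# Hypothetical hemoglobin column with one missing and one extreme value.
hb = [10, 11, 12, 13, 14, 15, 16, 17, None, 100]
hb_filled = impute_mean(hb)
hb_clean = clip_outliers(hb_filled)
```

Clipping keeps every sample in the dataset, whereas removing outlier samples would shrink it; which of the two the study applied to a given sample is not specified.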
5. Result and Discussion
Table 1. CBC test features.
Table 2. Dataset attributes.
In this study, the experiments with the data mining classifiers are divided into two branches: experiments with the full feature set and experiments with a reduced feature set. The outcomes of both branches, together with an in-depth analysis of classification accuracy that highlights the classification errors, are presented in the following sections. Three experiments were conducted in each branch: the first measures the performance of the Random Forest Tree classifier, the second that of the Bayesian Network classifier, and the third that of the Neural Network (Multilayer Perceptron). The feed-forward back-propagation neural network classifier was configured with 500 training cycles, a learning rate of 0.3, and a momentum of 0.2.
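To show what the three training settings control, the sketch below applies the standard gradient-descent-with-momentum update, in which each weight change is the negative gradient scaled by the learning rate plus a fraction (the momentum) of the previous change. This is the textbook form of the rule and not necessarily WEKA's exact internal implementation; the function names and the toy objective are assumptions.

```python
def momentum_step(weights, gradients, velocity,
                  learning_rate=0.3, momentum=0.2):
    """One update: delta(t) = momentum * delta(t-1) - learning_rate * grad."""
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, gradients)]
    new_weights = [w + dv for w, dv in zip(weights, new_velocity)]
    return new_weights, new_velocity


def train(weights, gradient_fn, epochs=500):
    """Repeat the momentum update for a fixed number of training cycles."""
    velocity = [0.0] * len(weights)
    for _ in range(epochs):
        gradients = gradient_fn(weights)
        weights, velocity = momentum_step(weights, gradients, velocity)
    return weights


# Toy objective (hypothetical): minimise (w - 3)^2, whose gradient is 2(w - 3).
def toy_gradient(weights):
    return [2.0 * (w - 3.0) for w in weights]


final_weights = train([0.0], toy_gradient)
```

In a real multilayer perceptron the gradients would come from back-propagating the classification error through the network; the toy gradient only stands in for that step.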
5.1. Experiment with Full Features
In these experiments, all features of each sample were used. The Random Forest tree classifier gives an accuracy of 96.47%, the Neural Network (Multilayer Perceptron) an accuracy of 75.29%, and the Bayesian network classifier an accuracy of 84.70%, as shown in Figure 1 and in Table 3.
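Accuracy here is the percentage of test samples whose predicted tag matches the actual tag. The following minimal sketch computes it, together with the per-pair counts that a confusion matrix would hold; the labels are hypothetical toy data, and the count of 82 correct out of 85 is only an assumed example that happens to reproduce 96.47%, since the test-set size is not stated in the text.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Percentage of samples whose predicted label equals the actual label."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * correct / len(actual)


def confusion_counts(actual, predicted):
    """Count each (actual, predicted) pair, i.e. the confusion-matrix cells."""
    return Counter(zip(actual, predicted))


# Hypothetical toy labels, not data from the study.
actual = ["Eosinophilia", "Neutrophilia", "Normocytic anaemia", "Neutrophilia"]
predicted = ["Eosinophilia", "Neutrophilia", "Neutrophilia", "Neutrophilia"]

toy_accuracy = accuracy(actual, predicted)
cells = confusion_counts(actual, predicted)

# Assumed counts: 82 correct out of 85 would round to the reported 96.47%.
assumed_accuracy = round(100.0 * 82 / 85, 2)
```

The off-diagonal cells, such as the single Normocytic anaemia sample predicted as Neutrophilia in the toy data, are the classification errors that an in-depth accuracy analysis would examine.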
5.2. Experiment with Reduced Feature
The results from these experiments are given in Table 4. The Random Forest Tree classifier achieves an accuracy of 86.44%, while the Neural Network classifier achieves 52.54% and the Bayesian Network classifier 74.57%, as shown in Figure 2 and in Table 4.
Figure 1, Figure 2 and Table 5 show that the maximum accuracy is 96.47% and the minimum accuracy is 52.54%. It can be concluded that the Random Forest tree classifier performs better than the other classifiers considered.
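The comparison can be reproduced directly from the accuracies reported for the two experiments. The sketch below ranks the classifiers and computes how far each one's accuracy falls when the reduced feature set is used; the accuracy values are those given in the text, while the data layout and function name are assumptions.

```python
# Reported accuracies in percent: (full features, reduced features).
REPORTED = {
    "Random Forest": (96.47, 86.44),
    "Neural Network (Multilayer Perceptron)": (75.29, 52.54),
    "Bayesian Network": (84.70, 74.57),
}


def summarize(results):
    """Return (name, full, reduced, drop) rows, best full accuracy first."""
    rows = [(name, full, reduced, round(full - reduced, 2))
            for name, (full, reduced) in results.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


summary = summarize(REPORTED)
for name, full, reduced, drop in summary:
    print(f"{name}: full {full:.2f}%, reduced {reduced:.2f}%, "
          f"drop {drop:.2f} points")
```

On these figures Random Forest ranks first in both experiments, and the Neural Network loses the most accuracy (22.75 percentage points) when features are removed, compared with roughly 10 points for the other two classifiers.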
6. Conclusion
This paper evaluated and investigated three widely used classification algorithms in WEKA. On the hematological data, the best-performing algorithm was the Random Forest classifier, with an accuracy of 96.47% and a total model-building time of 0.16 s. The Neural Network had the lowest accuracy of the classifiers compared, 52.54%, which is also an informative outcome of this study. These results will help researchers obtain sound results for a particular dataset, and the findings will help users analyze disease in minimal time, which is a useful contribution of this study.
Table 3. Simulation result of experiment 1.
Table 4. Simulation result of experiment 2.
Table 5. Comparison of various classifiers.
Figure 1. Classifiers accuracy value for Experiment 1.
Figure 2. Classifiers accuracy value for Experiment 2.
 Kaur, P., Singh, M. and Josan, G.S. (2015) Classification and Prediction Based Data Mining Algorithms to Predict Slow Learners in Education Sector. Procedia Computer Science, 57, 500-508.
 Zierk, J., Hirschmann, J., Toddenroth, D., Prokosch, H.U., Rauh, M. and Metzler, M. (2016) A Bioinformatics Approach to Pediatric Hematology Reference Intervals. Klinische Pädiatrie, 228, A45.
 Shouval, R., Bondi, O., Mishan, H., Shimoni, A., Unger, R. and Nagler, A. (2014) Application of Machine Learning Algorithms for Clinical Predictive Modeling: A Data-Mining Approach in SCT. Bone Marrow Transplantation, 49, 332-337.
 Papaemmanuil, E., Gerstung, M., Bullinger, L., Gaidzik, V., Paschka, P., Roberts, N., Potter, N.E., Heuser, M., Thol, F., Bolli, N., Gundem, G., Van Loo, P., Martincorena, I., Ganly, P., Mudie, L., McLaren, S., O’Meara, S., Raine, K., Jones, D., Teague, J., Butler, A.P., Greaves, M.E., Ganser, A., Döhner, K., Schlenk, R., Döhner, H. and Campbell, P.J. (2016) Genomic Classification and Prognosis in Acute Myeloid Leukemia. The New England Journal of Medicine, 374, 2209-2221.
 Chung, H.J., Park, C.H., Han, M.R., Lee, S., Ohn, J.H., Kim, J. and Kim, J.H. (2005) ArrayXPath II: Mapping and Visualizing Micro-Array Gene-Expression Data with High Dimension. Nucleic Acids Research, 33, W621-W626.
 Nookala, G.K.M., Orsu, N., Pottumuthu, B.K. and Mudunuri, S.B. (2013) Performance Analysis and Evaluation of Different Data Mining Algorithms Used for Cancer Classification. International Journal of Advanced Research in Artificial Intelligence, 2, 49-55.
 Saichanma, S., Chulsomlee, S., Thangrua, N., Pongsuchart, P. and Sanmun, D. (2014) The Observation Report of Red Blood Cell Morphology in Thailand Teenager by using Data Mining Technique. Advances in Hematology, 2014, Article ID: 493706.
 Othman, B., Fauzi, M. and Shan Yau, T.M. (2007) Comparison of Different Classification Techniques using WEKA for Breast Cancer. 3rd Kuala Lumpur International Conference on Biomedical Engineering, Kuala Lumpur, 11-14 December 2006, 520-523.
 Elshami, E.H. and Alhalees, A.M. (2012) Automated Diagnosis of Thalassemia Based on Data Mining Classifiers. In: The International Conference on Informatics and Applications, The Society of Digital Information and Wireless Communication, 440-445.
 Vijayarani, S. and Muthulakshmi, M. (2013) Comparative Analysis of Bayes and Lazy Classification Algorithms. International Journal of Advanced Research in Computer and Communication Engineering, 2, 3118-3124.
 Satish Kumar, D., Saeb, A.T.M. and Al Rubeaan, K. (2013) Comparative Analysis of Data Mining Tools and Classification Techniques using WEKA in Medical Bioinformatics. Computer Engineering and Intelligent Systems, 4, 28-38.
 Pandey, R., Guru, R.K. and Mount, D.W. (2004) Pathway Miner: Extracting Gene Association Networks from Molecular Pathways for Classifying and Predicting the Biological Significance of Gene Expression Microarray Data. Bioinformatics, 20, 2156-2158.
 Sharma, T., Sharma, A. and Mansotra, V. (2016) Performance Analysis of Data Mining Classification Techniques on Public Health Care Data. International Journal of Innovative Research in Computer and Communication Engineering, 4, 11381-11386.
 Alkrimi, J.A., Jalab, H.A., George, L.E., Ahmad, A.R., Suliman, A. and Al-Jashamy, K. (2015) Comparative Study using Weka for Red Blood Cells Classification. International Journal of Medical, Health, Pharmaceutical and Biomedical Engineering, 9, 19-22.