A Unique Approach to Epilepsy Classification from EEG Signals Using Dimensionality Reduction and Neural Networks


Received 28 March 2016; accepted 20 April 2016; published 13 June 2016

1. Introduction

Research on the functioning of the human brain is an active area for biomedical scientists, medical practitioners and biomedical engineers [1] . The brain is a highly complex organ in which neurons are interconnected both locally and remotely. Since epilepsy greatly affects patients' quality of life on a day-to-day basis, this disorder has drawn considerable attention. The local and remote interactions of the neurons are projected as the spatio-temporal electromagnetic field of the brain, which is what EEG recordings capture [2] . EEG is the only direct way to measure the activity of the brain, and it therefore holds paramount importance in biomedical research.

High-dimensional data are encountered in the analysis of information in most disciplines. Before approximation techniques can be applied easily, the dimensionality of the data must be lowered and the data regularized [3] . Data in a high-dimensional vector space are difficult to process, so they are mapped onto a smooth low-dimensional manifold. Low-dimensional data are often required for classification and for modeling numerous non-linear applications.

EEG signal processing has several vital constraints. A huge number of signals have to be processed, which is difficult because the signals are highly interdependent. Each EEG signal is unique and hence not repeatable. Also, depending on the characteristics of the equipment or source, EEG signals are often noisy [4] . Dimensionality reduction techniques focus primarily on the observed dataset rather than on generalization performance. They reduce the total number of columns and map the vectors to their respective sub-spaces. In an EEG recording session, waveforms are recorded for periods ranging from minutes to hours at a sampling frequency of about 200 Hz. The generated data can therefore grow a thousand-fold, to more than hundreds of gigabytes. Without data reduction, loading the dataset into memory becomes very difficult. Dimensionality reduction is employed by selecting the most appropriate channels and time epochs.

The organization of the paper is as follows. In Section 2, the materials and methods are discussed, followed by the dimensionality reduction techniques in Section 3. In Section 4, the Neural Networks used as post classifiers are discussed, followed by the results and discussion in Section 5. Section 6 concludes the paper.

2. Materials and Methods

To assess the performance of FMI, ICA, LGE, LDA and VBMF as dimensionality reduction techniques, followed by Neural Networks (NN) as post classifiers of epilepsy risk levels, the raw EEG data of 20 epileptic patients under treatment in the Neurology Department of Sri Ramakrishna Hospital, Coimbatore, were taken for study in European Data Format (EDF). The EEG is recorded by placing electrodes on the scalp according to the International 10-20 system. Sixteen channels of EEG are recorded simultaneously for both referential montages, where all electrodes are referenced to a common potential such as the ear, and bipolar montages, where each electrode is referenced to an adjacent electrode. Recordings are made while the patient is fully awake but in a resting condition, and include periods of eyes open, eyes closed, hyperventilation and photic stimulation. Amplification is provided by an EEG machine (Siemens Minograph Universal). Before placing the electrodes, the scalp is cleaned and lightly abraded, and electrode paste is applied between the electrode and the skin. With the electrode paste applied, the contact impedance is less than 10 kΩ. Generally disk-like surface electrodes are used; in some cases, needle electrodes are used to pick up the EEG signals. The signals are recorded at a speed of 30 mm/s.

The pre-processing stage of the EEG signals is given particular attention because it is vital to use the best available technique in the literature to extract all the useful information embedded in the non-stationary biomedical signals [5] . The EEG records obtained were continuous for about 30 minutes, and each of them was divided into epochs of two seconds. A two-second epoch is generally short enough to avoid unnecessary redundancy in the signal and long enough to detect any significant changes in activity and the presence of artifacts [5] . For each patient, there are 16 channels over three epochs. The frequency is taken to be 50 Hz and the sampling frequency about 200 Hz. Each sample corresponds to the instantaneous amplitude of the signal, which gives 400 values per epoch (2 s at 200 Hz). Four types of artifacts are present in the data: chewing artifact, motion artifact, eye blink and electromyography (EMG). Approximately 1% of the data are artifacts. No attempt was made to select artifacts of a more specific nature. The main objective of including artifacts is to differentiate the spike categories of waveforms from the non-spike categories. Figure 1 shows the block diagram of the procedure.
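The epoching step can be illustrated with a minimal sketch. It assumes a single channel held as a plain list of amplitude values, non-overlapping epochs, and that trailing samples which do not fill an epoch are discarded; the function name and the synthetic recording are illustrative and not taken from the study.

```python
FS_HZ = 200          # sampling frequency stated in the text
EPOCH_SECONDS = 2    # epoch duration stated in the text


def split_into_epochs(signal, fs_hz=FS_HZ, epoch_seconds=EPOCH_SECONDS):
    """Split a 1-D sequence of samples into consecutive, non-overlapping epochs.

    Trailing samples that do not fill a whole epoch are dropped (an assumption).
    """
    samples_per_epoch = fs_hz * epoch_seconds
    n_epochs = len(signal) // samples_per_epoch
    return [
        signal[i * samples_per_epoch:(i + 1) * samples_per_epoch]
        for i in range(n_epochs)
    ]


# Synthetic placeholder recording: 5 s of samples on one channel.
recording = [float(i) for i in range(5 * FS_HZ)]
epochs = split_into_epochs(recording)
print(len(epochs), len(epochs[0]))
```

With a 5 s recording, two full epochs of 400 samples each are produced and the last second is dropped.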

3. Dimensionality Reduction Techniques

The dimensions of the EEG data are reduced by a pre-processing step known as Dimensionality Reduction (DR). The dimensions are reduced by selecting a set of important features that satisfy certain criteria. The reduced dimensions have a vital effect on the classification process. Each epoch contains 400 values and hence the total volume for a patient is around 25,600 samples. So it is absolutely necessary to reduce the dimensions of the data for smooth processing of the EEG signals. In a high-dimensional data set, not all of the measured variables are needed for analyzing the underlying area of interest.

3.1. Independent Component Analysis (ICA)

Assume that there are n linear mixtures $x_1, \ldots, x_n$ of n independent components $s_1, \ldots, s_n$. Mathematically, this can be written as follows:

$$x_j = a_{j1}s_1 + a_{j2}s_2 + \cdots + a_{jn}s_n, \quad \text{for all } j. \tag{1}$$

In vector-matrix notation, the above equation can be written as follows [6]

$$\mathbf{x} = \mathbf{A}\mathbf{s}, \tag{2}$$

where A denotes the matrix with elements $a_{ij}$, x is the random vector with elements $x_1, \ldots, x_n$ ($\mathbf{x}^T$ denotes its transpose), and s is the random vector with elements $s_1, \ldots, s_n$. Emphasizing the columns of matrix A, the model can also be written as $\mathbf{x} = \sum_{i=1}^{n} \mathbf{a}_i s_i$, where $\mathbf{a}_i$ denotes the columns of matrix A. This is a generative model, since it describes how the observed data are generated. Once the matrix A is estimated, its inverse, say P, can be computed and the independent components are obtained as follows

$$\mathbf{s} = \mathbf{P}\mathbf{x}. \tag{3}$$
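The relations x = As and s = Px can be checked numerically with a small sketch. It assumes a known 2 x 2 mixing matrix purely for illustration; in actual ICA the matrix A is unknown and has to be estimated from the observed mixtures, which this sketch does not attempt. All names and values are hypothetical.

```python
def invert_2x2(m):
    """Invert a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


A = [[2.0, 1.0], [1.0, 3.0]]   # hypothetical mixing matrix (assumed known here)
s_true = [0.5, -1.5]           # hypothetical independent components

x = matvec(A, s_true)          # observed mixtures: x = A s
P = invert_2x2(A)              # unmixing matrix: P = A^{-1}
s_rec = matvec(P, x)           # recovered components: s = P x
print(x, s_rec)
```

The recovered vector matches the original components up to floating-point rounding, which is all that the inversion step of the model claims.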

3.2. Linear Graph Embedding (LGE)

This process generally involves Graph Embedding, Linearization and Kernelization procedures, but for dimensionality reduction of EEG signals the following procedure is considered. A sample set for model training is represented as a matrix $X = [x_1, x_2, \ldots, x_N]$, where N represents the sample number. Consider $x_i \in \mathbb{R}^m$, where m is the feature dimension [7] . In practice, the feature dimension m is often very high, so it is necessary to transform the data from the original high-dimensional space to a lower-dimensional space [7] . The main task of dimensionality reduction is to find a mapping function, which is represented as follows

$$F : x \rightarrow y. \tag{4}$$

This function transforms $x \in \mathbb{R}^m$ into the desired low-dimensional representation $y \in \mathbb{R}^{m'}$, where $m \gg m'$. Therefore it is mathematically represented as follows

$$y = F(x). \tag{5}$$

Figure 1. Block diagram of the procedure.

3.3. Fuzzy Mutual Information (FMI)

FMI is a filter method that removes irrelevant features. The mutual information measure is enriched using the fuzzy concept [8] . First the data are discretized and the number of clusters is assigned. The membership function of the fuzzy set is then constructed using a triangular membership function [8] . The fuzzy entropy is then calculated using the class degree as follows

. (6)

The entropy of class C is then calculated as follows

. (7)

The normalized Fuzzy entropy measure is then calculated as follows

. (8)
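Two building blocks named in this subsection, the triangular membership function and the entropy of a class distribution, can be sketched as follows. This is not a reconstruction of Equations (6) to (8); it assumes the standard triangular membership shape and the standard Shannon entropy, and the function names and parameters are illustrative.

```python
import math


def triangular_membership(x, left, peak, right):
    """Degree of membership of x in a triangular fuzzy set (left < peak < right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def shannon_entropy(probabilities):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Membership rises linearly to 1 at the peak and falls back to 0.
print(triangular_membership(2.5, 0.0, 5.0, 10.0))
# Two equally likely classes carry one bit of entropy.
print(shannon_entropy([0.5, 0.5]))
```

A fuzzy entropy measure would replace the crisp class probabilities with quantities derived from the membership degrees; how exactly that is done here depends on the class-degree definition in [8].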

3.4. Linear Discriminant Analysis (LDA)

LDA is a popular technique for dimensionality reduction [9] . It finds an orientation P that projects high-dimensional feature vectors belonging to different classes onto a lower-dimensional space. Suppose the dimensionality reduction is from a b-dimensional space to a c-dimensional space (where $c < b$); then the orientation P has size $b \times c$ and is obtained by maximizing Fisher's criterion function. The orientation P, the within-class scatter matrix $S_W$ and the between-class scatter matrix $S_B$ are the three factors that determine the criterion function [9] .

To define LDA explicitly, consider a multiclass pattern recognition and classification problem with e classes. Let $\Omega = \{\omega_i : i = 1, 2, \ldots, e\}$ be the set of e class labels, where $\omega_i$ denotes the $i$-th class label. Fisher's criterion as a function of P is then given as follows

$$J(P) = \frac{|P^T S_B P|}{|P^T S_W P|}. \tag{9}$$
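Fisher's criterion can be evaluated directly for the simplest case of a single projection direction, where it reduces to the ratio of between-class to within-class scatter along that direction. The sketch below assumes this one-column case and uses hypothetical two-dimensional data; it evaluates the criterion for given directions rather than solving for the maximizing orientation.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fisher_criterion(classes, p):
    """Ratio (p^T S_B p) / (p^T S_W p) for one projection direction p."""
    overall = mean([v for cls in classes for v in cls])
    between = 0.0
    within = 0.0
    for cls in classes:
        m = mean(cls)
        # Between-class scatter: class size times squared projected mean offset.
        between += len(cls) * dot(p, [mi - oi for mi, oi in zip(m, overall)]) ** 2
        # Within-class scatter: squared projected deviation from the class mean.
        within += sum(dot(p, [vi - mi for vi, mi in zip(v, m)]) ** 2 for v in cls)
    return between / within


# Hypothetical data: classes separated along the first axis, spread along the second.
class_0 = [[-0.1, -1.0], [0.1, 1.0]]
class_1 = [[1.9, -1.0], [2.1, 1.0]]

j_first_axis = fisher_criterion([class_0, class_1], [1.0, 0.0])
j_second_axis = fisher_criterion([class_0, class_1], [0.0, 1.0])
print(j_first_axis, j_second_axis)
```

The first axis separates the classes and scores far higher than the second, which is the behaviour the criterion is designed to reward.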

3.5. Variational Bayesian Matrix Factorization

VBMF is a method for uncovering a low-rank structure in data [10] . It approximates the data matrix as a product of two factor matrices. Matrix factorization is popular for collaborative prediction, where a user's unknown ratings are predicted. Let $Y \in \mathbb{R}^{A \times B}$ be a user-item rating matrix whose (a, b) entry, $Y_{ab}$, represents the rating of user a on item b. Matrix factorization [10] determines the factor matrices $U \in \mathbb{R}^{K \times A}$ and $V \in \mathbb{R}^{K \times B}$, where K is the rank of the factor matrices, so as to approximate the rating matrix Y by $U^T V$, which is represented as follows

$$Y \approx U^T V. \tag{10}$$

The Bayesian treatment of matrix factorization alleviates the overfitting problem.
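The low-rank approximation itself, a rating matrix approximated by the product of the transpose of one factor matrix with another, can be illustrated with a short sketch. It assumes the factor matrices are stored with one row per latent dimension (K rows), uses hypothetical values, and shows only the reconstruction step; the variational Bayesian inference of the factors is not implemented here.

```python
def approximate_ratings(U, V):
    """Return U^T V, where U has K rows (one column per user)
    and V has K rows (one column per item)."""
    K = len(U)
    n_users = len(U[0])
    n_items = len(V[0])
    return [
        [sum(U[k][a] * V[k][b] for k in range(K)) for b in range(n_items)]
        for a in range(n_users)
    ]


# Hypothetical rank-1 factors: 3 users, 2 items.
U = [[1.0, 2.0, 3.0]]
V = [[1.0, 0.5]]

Y_hat = approximate_ratings(U, V)
print(Y_hat)
```

Each predicted rating is the inner product of a user's factor column and an item's factor column, so K controls how much structure the approximation can capture.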

4. Post Classifiers Used Here

Several post classifiers for the classification of epilepsy risk levels were considered in [11] . The Neural Networks used as post classifiers here are the Cascaded Feed Forward Neural Network (CFFNN), the Generalized Regression Neural Network (GRNN) and the Time Delay Neural Network (TDNN).

4.1. Cascaded Feed Forward Neural Network (CFFNN)

To understand the cascaded feed forward neural network, the feed forward back propagation model is considered first. This model consists of input, hidden and output layers, and is trained with the back propagation (BPN) learning algorithm. During the training phase, calculations are carried out from the input layer of the network to the output layer, and the resulting error values are propagated back to the preceding layers [12] . There are generally one or more hidden layers, consisting of sigmoidal neurons. Cascaded forward networks are very similar to feed forward networks, except that a weight connection runs from the input to every layer in addition to the connections between successive layers. This type of network can learn any finite input-output relationship arbitrarily well. The additional connections help the network learn the desired relationship more quickly. Here, the testing process is evaluated by the Mean Square Error (MSE), which is defined as

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (O_i - T_j)^2, \tag{11}$$

where $O_i$ denotes the observed value at time i, $T_j$ is the target value at model j (j = 1 to 10), and N is the total number of observations per epoch.
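A minimal sketch of the mean square error computation follows. It assumes one target value per observation, which is a simplification of the per-model target described in the text; the function name and the sample values are illustrative.

```python
def mean_square_error(observed, target):
    """Average of squared differences between observed and target values."""
    if len(observed) != len(target):
        raise ValueError("observed and target must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    n = len(observed)
    return sum((o - t) ** 2 for o, t in zip(observed, target)) / n


# Hypothetical network outputs and targets for three observations.
mse = mean_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(mse)
```

Only the third observation deviates from its target (by 2), so the error is 4 divided by the three observations.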

4.2. Generalized Regression Neural Network (GRNN)

GRNN does not require an iterative training procedure, and its estimates are consistent. It can readily be used to estimate continuous variables and can approximate an arbitrary function between input and output vectors [13] . It is composed of four layers: the input layer, pattern layer, summation layer and output layer. The number of input units equals the number of parameters. The input layer is connected to the pattern layer, in which each unit represents a training pattern. Each pattern layer unit is connected to the two neurons in the summation layer, namely the S-summation neuron and the D-summation neuron. The S-summation neuron computes the sum of the weighted outputs of the pattern layer, and the D-summation neuron computes the sum of the unweighted outputs of the pattern layer. The output layer divides the output of the S-summation neuron by the output of the D-summation neuron [13] , so that a predicted value is supplied for an unknown input vector as follows

. (12)
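The four-layer structure described above can be sketched in a few lines. The sketch assumes a Gaussian kernel on the squared Euclidean distance in the pattern layer and a single smoothing parameter `sigma`, as in the commonly cited formulation of GRNN; the exact kernel behind Equation (12) may differ, and all names and data are hypothetical.

```python
import math


def grnn_predict(x, train_inputs, train_targets, sigma=1.0):
    """Predict a scalar output for input vector x from stored training patterns."""
    # Pattern layer: one Gaussian activation per stored training pattern.
    activations = []
    for xi in train_inputs:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        activations.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    # Summation layer: S sums target-weighted activations, D sums raw activations.
    s_sum = sum(act * y for act, y in zip(activations, train_targets))
    d_sum = sum(activations)
    # Output layer: ratio of the two sums.
    return s_sum / d_sum


# Hypothetical one-dimensional training set.
inputs = [[0.0], [1.0]]
targets = [0.0, 10.0]

midpoint = grnn_predict([0.5], inputs, targets)
near_first = grnn_predict([0.0], inputs, targets, sigma=0.1)
print(midpoint, near_first)
```

No iterative training occurs: the training patterns are simply stored, and `sigma` controls how sharply the prediction follows the nearest pattern. For inputs very far from every stored pattern the activations can underflow to zero, so a practical implementation would need to guard the division.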

4.3. Time Delay Neural Network (TDNN)

TDNN is a feed-forward artificial neural network architecture intended for sequential data. TDNN units recognize features independently of time shift [14] , and the architecture forms an integral part of many pattern recognition systems. The input signal is first augmented with delayed copies of itself, which are presented as additional inputs. Since there are no internal states, the network is generally assumed to be time-shift invariant.
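The augmentation of the input with delayed copies can be sketched as a tapped delay line. The sketch assumes a one-dimensional signal and keeps only time steps with a complete history; the function name and parameter are illustrative.

```python
def delay_embed(signal, n_delays):
    """Build rows [x[t], x[t-1], ..., x[t-n_delays]] for every t with a full history."""
    return [
        [signal[t - d] for d in range(n_delays + 1)]
        for t in range(n_delays, len(signal))
    ]


frames = delay_embed([1, 2, 3, 4, 5], n_delays=2)
print(frames)
```

Each row can then be fed to an ordinary feed-forward layer. Because the same weights are applied to every row, a pattern produces the same response wherever it occurs in the sequence, which is the time-shift invariance referred to above.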

4.4. Training Algorithm Used for the Neural Networks

The Levenberg-Marquardt (LM) algorithm is the basic training method for minimizing the Mean Square Error (MSE) criterion, owing to its fast convergence and robustness [15] . It is versatile, efficient and simple to implement, and the user does not need to initialize any unusual design parameters. It outperforms simple gradient descent and other conjugate gradient methods in a wide variety of scenarios. The LM algorithm can be viewed as a blend of vanilla gradient descent and Gauss-Newton iteration [15] . The error back propagation algorithm is used to compute the weight updates in each layer of the network. The derivation of the LM update rule is shown below for a standard back propagation algorithm. An approximate steepest descent rule is used, and the weights are updated according to the following equation

(13)

where W(k) is the weight at the k^{th} iteration, α is the learning rate, (k) is the difference between the NN output and the expected output, ΔW(k) is the weight difference between the k^{th} and (k − 1)^{th} iterations (this term is optional), and m is the momentum constant. In some adaptive algorithms α changes with time, but this requires many iterations and leads to a high computational burden. Fortunately, the non-linear least squares Gauss-Newton method has been used to solve many supervised NN training problems.
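A steepest descent update with a momentum term can be sketched as below. This assumes the textbook form, in which the weight change is the negative learning rate times the error gradient plus the momentum constant times the previous weight change; it is not necessarily identical to Equation (13), and it is plain gradient descent with momentum rather than the Levenberg-Marquardt rule. The quadratic objective is a hypothetical stand-in for the network error.

```python
def momentum_step(w, grad, prev_delta, alpha=0.1, momentum=0.9):
    """One steepest-descent update with momentum.

    delta(k) = -alpha * grad(k) + momentum * delta(k-1)
    w(k+1)   = w(k) + delta(k)
    """
    delta = -alpha * grad + momentum * prev_delta
    return w + delta, delta


# Hypothetical error surface E(w) = (w - 3)^2 with gradient 2 * (w - 3).
w, delta = 0.0, 0.0
for _ in range(200):
    grad = 2.0 * (w - 3.0)
    w, delta = momentum_step(w, grad, delta, alpha=0.1, momentum=0.5)
print(w)
```

Setting `momentum=0.0` recovers plain steepest descent, which reflects the optional nature of the momentum term.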

5. Results and Discussion

For FMI, ICA, LGE, LDA and VBMF as dimensionality reduction techniques and Neural Networks as Post Classifiers, the simulated results for the Performance Index, Quality Value, Time Delay and Accuracy are tabulated in Tables 1-3. The Performance Index (PI) is given as follows

$$PI = \frac{PC - MC - FA}{PC} \times 100, \tag{14}$$

where PC is Perfect Classification, MC is Missed Classification and FA is False Alarm.

The Sensitivity, Specificity and Accuracy measures are stated by the following

$$\text{Sensitivity} = \frac{PC}{PC + FA} \times 100, \tag{15}$$

$$\text{Specificity} = \frac{PC}{PC + MC} \times 100, \tag{16}$$

$$\text{Accuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2}. \tag{17}$$

Table 1. Analysis of dimensionality reduction techniques with GR-NN.

Table 2. Analysis of dimensionality reduction techniques with TD-NN.

Table 3. Analysis of dimensionality reduction techniques with CFF-NN.

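The Time Delay and the Quality Value measures are given by the following

$$\text{Time Delay} = 2 \times \frac{PC}{100} + 6 \times \frac{MC}{100}, \tag{18}$$

$$Q_V = \frac{C}{(R_{fa} + 0.2)\,(T_{dly} \times P_{dct} + 6 \times P_{msd})}, \tag{19}$$

where C is a scaling constant, $R_{fa}$ is the number of false alarms per set, $T_{dly}$ is the average delay of the onset classification in seconds, $P_{dct}$ is the percentage of perfect classification and $P_{msd}$ is the percentage of perfect risk level missed.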
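The classification measures can be computed from the three rates with a short sketch. It assumes that PC, MC and FA are expressed as percentages and that the measures take the ratio forms used in the authors' related work on epilepsy risk level classification: PI = (PC - MC - FA) / PC x 100, Sensitivity = PC / (PC + FA) x 100, Specificity = PC / (PC + MC) x 100, and Accuracy as the mean of Sensitivity and Specificity. The input values are hypothetical and are not results from this study.

```python
def performance_measures(pc, mc, fa):
    """Return PI, sensitivity, specificity and accuracy (all in percent).

    pc, mc, fa are the perfect classification, missed classification and
    false alarm rates in percent; the formulas are assumed definitions.
    """
    if pc <= 0:
        raise ValueError("perfect classification rate must be positive")
    sensitivity = pc / (pc + fa) * 100.0
    specificity = pc / (pc + mc) * 100.0
    return {
        "performance_index": (pc - mc - fa) / pc * 100.0,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (sensitivity + specificity) / 2.0,
    }


ideal = performance_measures(pc=100.0, mc=0.0, fa=0.0)
example = performance_measures(pc=90.0, mc=5.0, fa=5.0)
print(ideal)
print(example)
```

Under these definitions, a classifier with no missed classifications and no false alarms scores 100 on every measure.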

Table 1 shows that, with the exception of LGE, all the dimensionality reduction techniques combined with GR-NN provide 100% accuracy and similar results. Table 2 shows that FMI with TD-NN provides 100% accuracy, outperforming the other dimensionality reduction techniques. From Table 3, it is inferred that LDA with CFF-NN provides the highest accuracy, 97.15%. Figures 2-5 show the accuracy measures, quality value measures, time delay measures and performance index measures respectively.

6. Conclusion

EEG is the most widely used technique for capturing brain signals, and it provides excellent temporal resolution. The EEG is a highly complex signal that carries valid information about the functions of the brain and about neurological disorders. It also plays a vital role in the diagnosis of epilepsy, the early detection of brain tumours, the early detection of sleep-related problems and so on. Epilepsy affects people of all ages, but young infants and the elderly are more prone to it. It occurs due to abnormalities in genetic mechanisms, or due to developmental anomalies and infections of the central nervous system. Extracting the feature rhythms is difficult because the EEG signal is complex, stochastic and non-stationary in nature. Due to the abrupt and unpredictable nature of epileptic seizures, the everyday life of an epileptic patient is severely affected. Since epilepsy is marked by sudden disturbances of the mental functions that result from the excessive discharge of groups of cells in the brain, the epileptic EEG obtained from the scalp is characterized by synchronized periodic waveforms of very high amplitude. Spikes and sharp waves are also found between seizures, and detecting them visually is difficult because it requires skilled electroencephalographers, who are in great demand. This leads to a prolonged diagnosis time and to high costs. Surgery may not be suitable for all patients because other health risks also have to be considered. Therefore, seizures have to be detected automatically, and this forms an integral part of biomedical research. Research on epilepsy has therefore become an active interdisciplinary field of biomedical research.

In this work, the dimensions of the EEG signals were reduced using five different dimensionality reduction techniques, and the signals were then classified using three different types of Neural Network post classifiers. Results showed that FMI-GRNN, ICA-GRNN, LDA-GRNN, VBMF-GRNN and FMI-TDNN achieved an accuracy of 100% with the highest quality value of 25. Future work will incorporate other neural networks and genetic algorithms for epilepsy classification from EEG signals.

Figure 2. Accuracy measures of dimensionality reduction techniques with NN.

Figure 3. Quality value measures of dimensionality reduction techniques with NN.

Figure 4. Time delay measures of dimensionality reduction techniques with NN.

Figure 5. Performance index measures of dimensionality reduction techniques with NN.

References

[1] Gotman, J. (1982) Automatic Recognition of Epileptic Seizures in the EEG. Electroencephalography and Clinical Neurophysiology, 54, 530-540.

http://dx.doi.org/10.1016/0013-4694(82)90038-4

[2] Finley, K.H. and Dynes, J.B. (1942) Electroencephalographic Studies in Epilepsy: A Critical Analysis. Brain, 65, 256-265.

http://dx.doi.org/10.1093/brain/65.3.256

[3] Easwaramoorthy, D. and Uthayakumar, R. (2010) Analysis of Biomedical EEG Signals Using Wavelet Transforms and Multifractal Analysis. IEEE EMB Magazine, 30, 7487.

[4] Harikumar, R. and Sunil Kumar, P. (2015) Dimensionality Reduction Techniques for Processing Epileptic Encephalographic Signals. Biomedical and Pharmacology Journal, 8, No. 1.

[5] Zhang, G.Q. (2000) Neural Networks for Classification: A Survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 30, 451-462.

http://dx.doi.org/10.1109/5326.897072

[6] Xu, L., Cheung, C., Yang, H. and Amari, S. (1997) Independent Component Analysis by the Information-Theoretic Approach with Mixture of Densities. International Conference on Neural Networks, Houston, TX, 9-12 June 1997, 1821-1826.

[7] Yan, S., Xu, D., Zhang, B. and Zhang, H.J. (2005) Graph Embedding: A General Framework for Dimensionality Reduction. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, June 2005, 830-837.

[8] Sanchez, L. (2005) A Fuzzy Definition of Mutual Information with Application to the Design of Genetic Fuzzy Classifiers. International Conference on Machine Intelligence, Tozeur, Tunisia, 5-7 November 2005, 602-609.

[9] Gu, Q.Q., Li, Z.H. and Han, J.W. (2011) Linear Discriminant Dimensionality Reduction. Machine Learning and Knowledge Discovery in Databases, 6911, 549-564.

[10] Seeger, M. and Bouchard, G. (2012) Fast Variational Bayesian Inference for Non-Conjugate Matrix Factorization Models. Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), La Palma, Canary Islands, 15 February 2012, 1012-1016.

[11] Harikumar, R. and Sunil Kumar, P. (2015) Classifiers for the Epilepsy Risk Level Classification from Electroencephalographic Signals. Research Journal of Pharmaceutical, Biological and Chemical Sciences, 6, 469.

[12] Sumit, et al. (2011) Cascade and Feedforward BPN Neural Network Models for Prediction of Sensory Quality of Instant Coffee Flavoured Sterilized Drink. Canadian Journal on Artificial Intelligence, Machine Learning and Pattern Recognition, 2, No. 6.

[13] Shaikh, et al. (2010) Generalized Regression Neural Network and RBF for Heart Rate Diagnosis. IJCA Journal, 7, No. 13.

[14] Lang, K.J., et al. (1990) A Time-Delay Neural Network Architecture for Isolated Word Recognition. Neural Networks, 3, 23-43.

[15] Harikumar, R., Vijayakumar, T. and Sreejith, M.G. (2012) Performance Analysis of Morphological Operators Based Feature Extraction and SVD, Neural Networks as Post Classifier for the Classification of Epilepsy Risk Levels. Proceedings of the Fourth International Conference on Signal and Image Processing 2012, Coimbatore, 13-15 December, 1-12.