Epilepsy is a neurological disease that affects around 50 million people worldwide. It is characterized by recurrent, chronic, and often unprovoked epileptic seizures. Other symptoms include loss of consciousness, anxiety, depression, and impaired physical movement. Epilepsy is one of the most common neurological diseases and affects people of all ages. In low-income countries, where timely and effective treatment is almost inaccessible, it is even more dangerous. Sporadic seizures also prevent patients from performing normal social functions, so patients can become targets of discrimination and stigma. Moreover, reports show that epilepsy causes an annual economic burden of €5.5 billion in Europe and $12.5 billion in the United States.
About 70 percent of epilepsy patients can be treated with antiseizure medication. Other patients need brain surgery to remove the epileptogenic zone, the area in which a brain tissue lesion contributes to seizure onset. By correctly identifying and removing this pathological zone, physicians can prevent further seizures. However, the procedure is not without risk, because identification of the epileptogenic zone is not always accurate. In addition, even among the successful cases, only 65 percent of patients were completely seizure-free after surgery, according to a study of 2250 patients.
In recent years, the Brain-Machine Interface (BMI) has attracted attention as a potential way to forecast seizure onset and trigger appropriate countermeasures in time. A BMI is a device that collects neuronal signals from the brain and sends them to an external processor for further signal processing and control of peripheral machines. Electroencephalography (EEG), widely used with BMIs, measures the brain’s electrical activity with electrodes that are either implanted or placed on the scalp. Scalp EEG can only detect superficial cortical activity, and its spatial resolution is limited by electrode density. However, its low cost, high temporal resolution, and relative tolerance of motion make it well suited to long-term monitoring of brain pathology. Some commercially available lightweight wearable EEG devices, such as those from Emotiv, Neurosky, and IMEC, can be used for single brain feature detection. Compared with these scalp-electrode devices, intracranial EEG (iEEG) records from invasive electrodes implanted on the cortex or deeper in the brain, which increases the signal-to-noise ratio (SNR).
EEG devices collect neuronal signals from electrodes and translate them into a trace that shows the summed electrical activity of the neurons firing in the area beneath each electrode. In neural engineering, EEG signals are typically categorized into bands based on signal frequency, as shown in Table 1.
By analyzing the time-domain pattern and frequency-domain features of a patient’s EEG signal together with the corresponding pathological manifestations, clinicians can identify and diagnose the condition. Identifying the different brain waves in the EEG signal is crucial for detecting and predicting epilepsy, as the two are often correlated. The recording around a seizure can be further divided into four periods based on time of occurrence, as shown in Table 2.
When predicting epileptic seizures, the pre-ictal period, which is followed by the ictal period, has to be identified. Over the years, no clear biomarker for the pre-ictal period has been identified. However, machine learning and neural network algorithms can help uncover characteristics of the pre-ictal period that can then serve as the basis of an effective prediction method.
With the help of a machine learning algorithm, a BMI can predict seizures before their onset and enable countermeasures, such as counteracting the disorderly epileptic brain activity with signals sent from a closed-loop Responsive Neurostimulator (RNS), whose electrodes are implanted through a burr hole in the skull.
1.2. Related Work
Developing an effective seizure prediction and suppression method has been a high-profile research topic for epilepsy clinicians and researchers since the last century. In the 1970s, an “epileptic seizure warning system” was designed that takes electrical signals recorded from scalp electrodes and runs pattern recognition over the EEG signals. This pocket-sized seizure detection device checks for specific patterns related to pre-seizure activity. It warns its user if the number of pre-seizure decisions exceeds a certain threshold within a predefined epoch.
Table 1. EEG bands.
Table 2. Seizure period categories.
Patients can then take medication or find a quiet place before the seizure starts. Another invention went a step further by adding an electrical stimulator that counteracts and suppresses the seizure. With electrodes implanted on the critical epileptic cortex, this device automatically detects aberrant brain activity and sends signals to negate the imminent seizure. However, both devices suffer from low detection accuracy and a high false alarm rate. The accuracy problem was alleviated by the growth of computer processing power in the 21st century. Researchers have focused on the pre-ictal period, whose detection can improve the success rate of identifying when a seizure starts. In 2007, the first international seizure prediction contest was held by the International Workshops on Seizure Prediction (IWSP), in which competitors tested the accuracy of their algorithms against data from actual epilepsy patients. However, none of the competitors achieved a prediction performance above 50%. According to Epilepsy Ecosystem, a website dedicated to competitions that evaluate epilepsy algorithms, the top entry in a recent competition achieved an overall AUC of 86.674%, a significant improvement on the near guesswork of ten years earlier, but still not accurate enough to solve the problem. Most recent results employ the popular Artificial Neural Network (ANN) method, inspired by the neurons of the human brain. The input values are passed through individual “perceptrons” in the middle “hidden layers” and on to the final “output layer,” which generates a result. State-of-the-art deep learning algorithms were developed from ANN methods and have improved both accuracy and processing speed. Their advantage in modeling and classification grows further when a large training dataset is available.
In deep learning, more data usually leads to better performance, which is not always the case in traditional machine learning. In 2020, a group of researchers combined deep learning with image processing by transforming EEG signals into RGB images, whose features are extracted and processed through layers of ANN classifiers as well as three CNN layers, an architecture well suited to image processing. They achieved an AUC of 92% with the CNN model, which indicates that their algorithm is effective.
Despite the progress made in seizure detection with different machine learning algorithms, all current epilepsy detection methods still have some obvious limitations. First, the datasets available for training and validating seizure prediction models are small. In contrast to image training datasets, which can reach billions of samples, EEG datasets can hardly be shared between research centers, let alone opened to the public, because of privacy, medical confidentiality, and relatively high collection costs. This is why, even though many researchers have reported very good prediction accuracy in their publications, there is still no universally effective solution for clinical seizure treatment devices. Second, devices used in different regions vary widely. The datasets used to train a seizure prediction model are mostly collected from one specific type of EEG or iEEG device. However, RNS devices differ in channel count, sample rate, and SNR, which makes it very hard to migrate a good prediction model to other types of devices. Third, prediction models can be either hard to run on the small, low-power processor of an EEG apparatus or unsuitable for clinical or domestic use because of low sensitivity and specificity. Traditional machine learning models, such as the support vector machine (SVM), K-nearest neighbor (KNN), and Random Forest, suffer from a lower-than-ideal AUC score. They also depend heavily on selecting proper signal features for the classifier based on signal quality and device placement. Since they have difficulty processing raw data directly, an effective feature extractor is necessary to transform the raw data into a suitable feature vector. Furthermore, deep neural networks (DNNs), currently the most popular machine learning models, also have their own limitations in this application.
As mentioned above, epilepsy patient data is limited to the few datasets published by hospitals and research facilities, so the DNN’s advantage in processing large quantities of data effectively is diminished. A limited dataset brings an inevitable problem: the model tends to overfit to specific patients and thus loses its generality. This issue arises not only in epilepsy detection but also in other applications with small datasets. Since brain structures often differ between people, a DNN model trained on a small dataset can hardly achieve the same performance on other epilepsy patients.
Thus, in this paper, we present a signal reconstruction method and a corresponding seizure detection model that can be used as a transfer-learning model between patients with different EEG/iEEG devices. Because this is a proof-of-concept method trained on a dataset of limited size, we understand that the classification model cannot perform as well as other neural network-based models. However, we believe this method has a significant advantage and the potential to be applied to real clinical or commercial devices.
1.3. Our Contribution
1) Inspired by the work of Dr. Srinivas Sridhar and Dr. Yury Petrov, who introduced the concept of Electrical Field Encephalography (EFEG), we apply the same method to our dataset in the signal reconstruction process. EFEG is a novel modality that combines traditional EEG with the electric fields at the individual scalp electrodes to reconstruct a local directional electric field vector. The electric potential is measured at each electrode as in normal EEG, but instead of averaging the potentials, EFEG takes a central electrode and builds a coordinate system around it. EFEG uses a local reference that accounts for the individual electrodes and their relative positions, as opposed to traditional EEG, which uses a global reference. Applying the EFEG method, which amounts to solving an overdetermined linear system, removes the information common to the sensing region and dramatically reduces computational cost. However, the EEG dataset used in this work does not contain information about electrode positions. Since this can also happen in practice, we developed a virtual-coordinate EFEG algorithm that assigns each channel a virtual coordinate based on its signal feature ranking. As a result, only the highly relevant channels are sent to the electric field signal reconstruction process. This reduces the interference from channels less relevant to seizure events and makes the method applicable to devices with different numbers of electrodes.
2) We propose a seizure detection model based on an ensemble of a DNN, an SVM, and a random forest. To narrow the generalization gap and reduce the patient specificity of the classification model, we performed ensemble learning on these three models to build a new classifier. To reduce the risk of overfitting on a small dataset, we combined the neural network-based “strong classifier” with the signal feature-based “weak classifiers,” using the reconstructed EFEG signal as input. Time-domain and frequency-domain features are extracted from the EFEG signal and used for SVM and random forest training. Empirical Mode Decomposition (EMD) and Wavelet Transform (WVT) are also performed on the EFEG signal, and their outputs are used for DNN training. In the analysis section, we compare the ensemble model with the individual models. Cross-validation is performed on the training dataset, and datasets from patients with different EEG channel counts are used for model testing.
3) We also demonstrate model transfer between different patients and devices. A major obstacle in epilepsy prediction is the difficulty of model migration, that is, applying the same detection model to different patients, because of insufficient training data and differences in neurological mechanisms between people. EEG recording devices likewise come in many shapes and forms, forcing a general model to account for devices whose channel counts usually range from 16 to 256. Thus, we first demonstrate that, with our signal combination method, devices with different channel numbers can be used to train the same classifier. Second, we test our model on patients who are not included in the training dataset and whose devices have different channel numbers. This serves as a proof of concept for our transfer learning method, which has good potential for clinical practice and whose performance can improve even when trained on data from different sources.
2.1. Detection Algorithm
Our detection system is founded on the EFEG signal reconstruction method introduced in Section 1.3. In addition, we use channel importance ranking, extraction of appropriate features, and a final model ensemble based on the predictions of three different machine learning models. As shown in Figure 1, our automatic seizure detection algorithm is divided into the signal combination calibration phase and the model training phase. In the first phase, features extracted from the input EEG signals are used to calculate channel importance in preparation for the EFEG method. Unlike the original EFEG method, and because the exact electrode layout is not always available, we build a virtual-coordinate EFEG combination based on the feature ranking. The best channels are selected and ranked by their relative importance to the Random Forest classifier in the calibration phase. In the subsequent model training phase, the previously ranked channels are reconstructed through the EFEG method. The same features are then extracted from the reconstructed signals and used to train three different classifiers, which are ensembled to produce the final result.
Figure 1. EFEG-based automated seizure detection algorithm flow chart.
2.2. Signal Processing
2.2.1. Classical Signal Features in Seizure Detection
First, we use filters to remove brain-wave components at frequencies above and below the band of interest, retaining as much of the relevant signal as possible while smoothing the data.
After these pre-processing steps have removed irrelevant noise, 36 features are extracted from the pre-processed data.
As shown in Table 3, the 36 features are categorized into three feature types. The time-domain, frequency-domain, and statistical features are regular signal analysis features extracted in their respective domains; the WVT-based and EMD-based features are extracted not from the reconstructed signals directly but from signals further processed by the WVT and EMD methods, which are explained in detail in later paragraphs.
Table 3. Set of investigated features.
Hjorth activity and Hjorth mobility are Hjorth parameters, time-domain statistical properties widely used in EEG data analysis. Hjorth activity measures the signal’s power over a time range and is calculated as the variance of the signal. Hjorth mobility estimates the mean frequency, or the proportion of standard deviation of the power spectrum, and is calculated as the square root of the ratio between the variance of the signal’s first derivative and the variance of the signal. The subsequent five features are the Power Spectral Density (PSD) in different brainwave frequency bands. PSD measures a signal’s power content in the frequency domain. Unlike the closely related autopower spectrum, whose amplitudes depend on the frequency resolution, the PSD normalizes each value by the frequency resolution (the width of a frequency bin), so its results stay consistent as the frequency resolution changes. A power spectrum is generated for each data range and averaged to obtain the mean value. The five brainwave bands used are delta, theta, alpha, beta, and lower gamma, each associated with different states of human activity, as shown in Table 1. The next two features operate on the signal amplitude in the time domain: the former is its mean value and the latter measures its variation. The last two regular features, kurtosis and skewness, are common statistical parameters. Kurtosis represents the sharpness of the data distribution and how much it deviates from a Gaussian distribution, and skewness measures the asymmetry of the distribution of data points. The WVT-based features are the mean power and entropy of each coefficient array, and the EMD-based features are the entropy, mean envelope amplitude, and frequency variation of each IMF, as explained in the following paragraphs.
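The regular features described above can be sketched in a few lines of NumPy and SciPy. This is a minimal illustration rather than the authors’ implementation: the band edges, the sampling rate, the Welch settings, and the exact definitions of the two amplitude features are assumptions, and all function names are hypothetical.

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import kurtosis, skew

FS = 250  # sampling rate in Hz (assumed for this sketch)

# Illustrative band edges in Hz (assumed; Table 1 defines the actual bands).
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "low_gamma": (30, 50)}


def hjorth_activity(x):
    """Hjorth activity: variance of the signal."""
    return np.var(x)


def hjorth_mobility(x):
    """Hjorth mobility: sqrt(var(first difference) / var(signal)), per sample."""
    return np.sqrt(np.var(np.diff(x)) / np.var(x))


def band_psd(x, fs=FS):
    """Mean Welch PSD inside each frequency band."""
    freqs, psd = welch(x, fs=fs, nperseg=min(len(x), 2 * fs))
    return {name: psd[(freqs >= lo) & (freqs < hi)].mean()
            for name, (lo, hi) in BANDS.items()}


def regular_features(x, fs=FS):
    """Eleven regular features for one window of one channel."""
    feats = {
        "activity": hjorth_activity(x),
        "mobility": hjorth_mobility(x),
        "mean_amplitude": np.mean(np.abs(x)),   # assumed definition
        "amplitude_variation": np.std(x),       # assumed definition
        "kurtosis": kurtosis(x),                # excess kurtosis (0 for a Gaussian)
        "skewness": skew(x),
    }
    feats.update(band_psd(x, fs))
    return feats


# Synthetic 3-second window containing a 10 Hz (alpha-band) tone.
t = np.arange(3 * FS) / FS
window = np.sin(2 * np.pi * 10 * t)
features = regular_features(window)
```

For the 10 Hz test tone, the alpha-band PSD dominates the other four bands, which is a quick sanity check that the band masks are wired correctly.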
2.2.2. Wavelet Transform
The Wavelet Transform is a signal analysis technique that decomposes a signal in the frequency domain while retaining valuable time-domain information. Developed in the 1980s, it addresses the lack of time-domain information in the Fourier Transform. Benefits of WVT include its ability to analyze the rapidly changing edges of nonlinear, non-stationary signals and its denoising effect. The transformation is expressed through this equation:
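In its standard textbook form, the continuous wavelet transform of a signal x(t) can be written as follows. The symbols a, b, and ψ follow the definitions given in the text; x(t), W(a, b), and the complex-conjugate notation are assumptions introduced here for illustration.

```latex
W(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left( \frac{t - b}{a} \right) \mathrm{d}t
```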
In the equation, a and b are the scale and translation factors, two parameters that change as the transformation proceeds, and ψ represents the “mother wavelet,” a finite-duration wave with zero mean. The scale controls the size of the wavelet, and the translation controls its position along the analyzed signal. As the WVT algorithm runs, a and b are varied to cover the signal being analyzed, and the equation yields a correlation coefficient for each small shift in b. This procedure breaks the signal down into a number of smaller “coefficient arrays.” The multilevel decomposition can be carried out with the “wavedec” function of the PyWavelets library in Python. Among the many available wavelets, the “Daubechies 4” (db4) wavelet in the Discrete Wavelet Transform has been a popular choice in past ECG and EEG research.
As shown in Figure 2, the raw signal is decomposed five times with the Daubechies wavelet, which results in five array segments of decreasing complexity. Each segment is a representation of the original signal and reveals its frequency information in the time domain. Subsequently, the two aforementioned features, mean power and entropy, are extracted from each coefficient array, giving information on its power spectrum.
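A five-level db4 decomposition with PyWavelets might look like the sketch below. Note that `pywt.wavedec` with `level=5` returns six arrays (one approximation plus five detail arrays); which five the authors keep is not stated, so the sketch computes the two features for all of them. The entropy definition (Shannon entropy of the normalized coefficient energy) and the synthetic test signal are assumptions.

```python
import numpy as np
import pywt

FS = 250  # sampling rate in Hz (assumed)
rng = np.random.default_rng(0)

# Synthetic 9-second signal standing in for one EEG channel.
t = np.arange(9 * FS) / FS
signal = np.sin(2 * np.pi * 6 * t) + 0.3 * rng.standard_normal(t.size)

# Five-level discrete wavelet decomposition with the db4 wavelet.
# Returned order: [cA5, cD5, cD4, cD3, cD2, cD1]
coeffs = pywt.wavedec(signal, "db4", level=5)


def mean_power(c):
    """Mean squared value of a coefficient array."""
    return np.mean(c ** 2)


def energy_entropy(c):
    """Shannon entropy of the normalized coefficient energy (assumed definition)."""
    p = c ** 2 / np.sum(c ** 2)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))


# One (mean power, entropy) pair per coefficient array.
wvt_features = [(mean_power(c), energy_entropy(c)) for c in coeffs]
```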
2.2.3. Empirical Mode Decomposition
The last series of features is derived from the Intrinsic Mode Functions (IMFs) calculated by the EMD method. Similar to WVT (shown in Figure 2), EMD breaks a signal down into a number of IMFs that depends on the signal’s length and complexity (shown in Figure 3). An IMF is defined as a function in which the number of extrema and the number of zero crossings differ by at most one, and the mean of the envelopes defined by the local maxima and the local minima is zero at every point. The method specializes in processing the nonlinear and nonstationary signals that are abundant in nature, and it preserves time-domain information. Essentially, the local maxima and the local minima of the signal are each connected by a spline to create the upper and lower envelopes, which enclose all data points. The mean of the upper and lower envelopes is then subtracted from the signal, and this process, called “sifting,” is repeated until the result satisfies the IMF conditions, producing the first IMF. The IMF is subtracted from the signal, and the remainder is sifted in the same way to extract further IMFs until the remainder becomes monotonic and cannot be sifted anymore. The resulting IMFs retain many properties of the original signal and are open to further analysis. The Hilbert transform is then applied to the IMFs, converting them into analytic signals that have no negative frequency components.
Figure 3 shows five IMFs, produced through five rounds of sifting. The signal becomes progressively less disorderly and oscillatory from one IMF to the next. After the five IMFs are generated, three features are extracted from each. First, the entropy of the IMF reflects the disorder of the system. The second feature is the mean envelope amplitude of the IMF. The envelope is the magnitude of the analytic signal, and its mean value indicates the overall amplitude. Lastly, the variation of the first derivative of the instantaneous frequency is calculated. The instantaneous frequency is obtained by differentiating the instantaneous phase, which is the phase angle of the analytic signal. Its first derivative then captures how quickly the instantaneous frequency changes.
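The three per-IMF features can be sketched with SciPy’s Hilbert transform. The EMD step itself is omitted here: a synthetic amplitude-modulated oscillation stands in for one IMF. The histogram-based entropy and the use of the variance as the “variation” measure are assumptions, and the variable names are hypothetical.

```python
import numpy as np
from scipy.signal import hilbert

FS = 250  # sampling rate in Hz (assumed)
t = np.arange(9 * FS) / FS

# Stand-in for one IMF: an 8 Hz oscillation with slow amplitude modulation
# (synthetic, not patient data).
imf = (1.0 + 0.2 * np.sin(2 * np.pi * 1.0 * t)) * np.sin(2 * np.pi * 8.0 * t)


def histogram_entropy(x, bins=32):
    """Shannon entropy of the amplitude histogram (assumed definition)."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))


analytic = hilbert(imf)                          # analytic signal
envelope = np.abs(analytic)                      # instantaneous amplitude
phase = np.unwrap(np.angle(analytic))            # instantaneous phase (rad)
inst_freq = np.diff(phase) * FS / (2 * np.pi)    # instantaneous frequency (Hz)
inst_freq_rate = np.diff(inst_freq) * FS         # its first derivative (Hz/s)

imf_entropy = histogram_entropy(imf)
mean_envelope = envelope.mean()
freq_variation = np.var(inst_freq_rate)          # variance as the "variation" (assumed)
```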
Figure 2. An example of wavelet coefficients from a 9-second ictal signal processed by WVT.
Figure 3. An example of five IMFs from a 9-second ictal signal processed by EMD.
The EFEG method is a recently developed signal reconstruction method that computes electrical field components of the raw signal. It takes into account the relative positions of EEG channels on the scalp during the recording:
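Based on the symbol definitions that accompany it in the text, this relation can be sketched as follows. The form is a reconstruction under that assumption: F_i is the potential at electrode i, (x_i, y_i) is its position, N is the common-mode reference noise, and n, the number of electrodes, is a symbol introduced here.

```latex
F_i = E_x\, x_i + E_y\, y_i + N, \qquad i = 1, \dots, n
```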
Ex and Ey are the components of the electric field in a virtual coordinate system. The electric potential is the value recorded by EEG devices and indicates the signal amplitude; it corresponds to Fi in the equation above, where i denotes the ith electrode. Ex and Ey are multiplied by xi and yi, respectively, the coordinates of that electrode on the scalp.
The reference electrode noise N, a common-mode term with a coefficient of 1 that accounts for external disturbances common to the whole set of channels during EEG measurement, is added to obtain the final potential. Writing the relation as a matrix multiplication clarifies the calculation:
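One way to write this matrix form, assuming the row-vector layout implied by the text (unknowns on the left, coordinate matrix on the right), is the following sketch. The symbol n for the number of electrodes is introduced here for illustration.

```latex
\begin{bmatrix} E_x & E_y & N \end{bmatrix}
\begin{bmatrix}
x_1 & x_2 & \cdots & x_n \\
y_1 & y_2 & \cdots & y_n \\
1 & 1 & \cdots & 1
\end{bmatrix}
=
\begin{bmatrix} F_1 & F_2 & \cdots & F_n \end{bmatrix}
```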
As we can see, Ex and Ey, the values we want to obtain, are on the left side of the equation and cannot be calculated directly. To move the coordinate matrix to the right side, both sides must be multiplied on the right by its inverse. However, this cannot be done directly, because the coordinate matrix is not square. Therefore, a pseudo-inverse function in Python is applied to the coordinate matrix, so that Ex and Ey can be derived from the given data.
The pseudo-inverse is not a true inverse; it is called a “pseudo-inverse” because multiplying it with the original matrix yields a matrix that closely resembles the identity matrix. After the Ex and Ey components are obtained, the magnitude of the electric field is calculated as the square root of the sum of Ex squared and Ey squared. This magnitude is a crucial parameter derived from Ex and Ey and can serve as a vital input for epilepsy classification.
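The pseudo-inverse step can be illustrated with `numpy.linalg.pinv`. This is a sketch under stated assumptions: the electrode coordinates are random placeholders rather than the layouts used in the paper, the potentials are synthesized from a known field so that the recovery can be checked, and the row-vector layout (field times coordinate matrix equals potentials) is inferred from the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_samples = 8, 5

# Hypothetical virtual coordinates (x_i, y_i); not the layouts used in the paper.
x = rng.uniform(-1, 1, n_channels)
y = rng.uniform(-1, 1, n_channels)
C = np.vstack([x, y, np.ones(n_channels)])   # 3 x n: rows x_i, y_i, 1

# Simulated ground truth (Ex, Ey, N) for each time sample,
# used only to synthesize the electrode potentials F.
true_field = rng.standard_normal((n_samples, 3))
F = true_field @ C                           # shape (n_samples, n_channels)

# C is not square, so use the Moore-Penrose pseudo-inverse.
C_pinv = np.linalg.pinv(C)                   # shape (n_channels, 3)
field = F @ C_pinv                           # columns: Ex, Ey, N

Ex, Ey = field[:, 0], field[:, 1]
magnitude = np.sqrt(Ex ** 2 + Ey ** 2)       # field magnitude per time sample
```

With more electrodes than unknowns, the product `C @ C_pinv` equals the 3 by 3 identity up to rounding, which is why the synthetic field is recovered exactly in this noise-free example; with real data the result is a least-squares estimate.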
Because the EFEG calculation requires a fixed number of channels for each matrix while the channel counts of EEG devices vary, EEG data cannot be processed directly. To apply the same EFEG coordinates to different patients, we first create five different coordinate systems, each with different parameters.
As shown in Figure 4, the virtual channels have three different layouts. Coordinates 1, 3, and 5 include the optional common-mode noise term, while coordinates 2 and 4 do not. Coordinates 3 and 4 are centered on the origin, whereas coordinate 5 is a segmented layout that spreads only over the first quadrant. The five coordinate systems exist so that we can determine which one performs best. The coordinate matrices are designed to have different weights and sparsity. Coordinates 1, 2, and 5 are asymmetric, with a biased weight center, to simulate electrode array curvature.
Because devices differ in their number of channels, a fixed-size group of channels (8 or 16) is selected from each recording device. First, the features of the data from all channels are fed into a random forest classifier. A series of random features is also introduced as a baseline to check the reliability of the channel selection. In the classifier, the input feature values are perturbed one after another, and the influence of each perturbation on classifier performance is recorded. The more the performance changes, the more important the channel is to the classification. The channels are ranked by their relevance to the performance, and the top 8 or 16 are chosen.
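The perturb-and-measure ranking described above corresponds to permutation importance. The sketch below uses scikit-learn’s `permutation_importance` as a stand-in, which is not necessarily the tooling used by the authors; the data are synthetic, with one feature value per channel, and the channel counts, effect sizes, and variable names are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(42)
n_windows, n_channels = 300, 6

# Synthetic labels: 0 = non-ictal, 1 = ictal.
labels = rng.integers(0, 2, n_windows)

# One feature value per channel and window; channels 0 and 1 carry class information.
features = rng.standard_normal((n_windows, n_channels))
features[:, 0] += 2.0 * labels
features[:, 1] += 1.0 * labels

# Random pseudo-channel used as a baseline for the importance scores.
pseudo = rng.standard_normal((n_windows, 1))
X = np.hstack([features, pseudo])
pseudo_idx = n_channels

X_train, X_test = X[:200], X[200:]
y_train, y_test = labels[:200], labels[200:]

rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)

# Shuffle each column in turn and record the drop in held-out accuracy.
result = permutation_importance(rf, X_test, y_test, n_repeats=20, random_state=42)
scores = result.importances_mean

# Rank the real channels by importance and keep the top k (k = 2 here).
ranking = [int(i) for i in np.argsort(scores)[::-1] if i != pseudo_idx]
top_channels = ranking[:2]
```

Channels whose score does not clearly exceed that of the random pseudo-channel are unlikely to be informative, which is the role the pseudo-channel plays in the selection.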
Figure 4. Virtual EFEG coordinates: the red scatter plot corresponds to coordinates 1 and 2, the green to coordinates 3 and 4, and the blue to coordinate 5.
2.4. Classification Model
Among the numerous machine learning models, the Support Vector Machine (SVM) is one of the most suitable “weak” classifiers for seizure detection, as opposed to “stronger” classifiers such as the DNN. The SVM is well suited to seizure prediction because it generalizes well and is relatively robust to overfitting, a problem often encountered when training on a small dataset. The linear SVM sets up a linear boundary that separates one class of objects from the other by creating a hyperplane between the two classes. As Figure 5 shows, the optimal hyperplane is the one that maximizes the margins between itself and the nearest objects of the two classes. Samples on the margins are called support vectors, and they determine the position of the hyperplane. If the training set is not linearly separable in its original space, as is the case for EEG data, the model uses the kernel trick to map the data into a higher-dimensional space through a “kernel function,” such as the polynomial kernel or the Gaussian (radial basis function) kernel. The kernel function implicitly adds dimensions to the original data, creating a virtual space in which the data points become linearly separable. In Python, the scikit-learn package provides a basic implementation of this model. In this article, we use a polynomial kernel function and keep the package’s default values for the other parameters.
Figure 5. SVM.
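The effect of the kernel trick can be seen on a toy problem with scikit-learn’s `SVC`. The data below (an inner disk surrounded by a ring) are synthetic and only illustrate non-linear separability. The degree is set to 2 because it suits this particular toy geometry; it is an assumption of the sketch, whereas scikit-learn’s default polynomial degree is 3.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200

# Synthetic two-class data that no straight line can separate:
# class 0 is an inner disk, class 1 is a surrounding ring.
radius = np.concatenate([rng.uniform(0.0, 1.0, n), rng.uniform(2.0, 3.0, n)])
angle = rng.uniform(0.0, 2.0 * np.pi, 2 * n)
X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# Linear SVM: a single hyperplane in the original 2D space.
linear_svm = SVC(kernel="linear").fit(X, y)

# Polynomial-kernel SVM: the hyperplane lives in an implicit
# higher-dimensional space (degree 2 chosen for this toy data).
poly_svm = SVC(kernel="poly", degree=2).fit(X, y)

linear_acc = linear_svm.score(X, y)
poly_acc = poly_svm.score(X, y)
```

The polynomial kernel separates the two classes almost perfectly, while the linear SVM cannot do much better than assigning most points to one side of a line.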
Another type of weak classifier is based on the decision tree. In a random forest, the input data x passes through many decision trees (estimators), each of which classifies x through a series of decision nodes. During training, each tree is fitted so that its outputs match the labels of its training samples. Compared to a single decision tree (shown in Figure 6), the random forest retains the tree’s advantages and mitigates its overfitting problem through bootstrap aggregating. This algorithm improves performance by lowering the variance of the model and is especially effective when combined with decision trees. Essentially, the random forest draws bootstrap samples from the original data, trains one tree on each sample, and selects the best split at each node from a randomly chosen subset of features rather than from all of them. The predictions of all trees are then combined by majority vote to produce the final result. As a weak classifier, the random forest can be ensembled with the SVM to increase sensitivity and specificity, since the decision then takes both models into account. Additionally, the random forest is used to determine channel importance from the input features. By analyzing the features extracted from the raw data of each channel, the random forest identifies the channels that matter most by altering their feature values and measuring the effect on its performance. The larger the effect, the more important the channel. A set of random data is also included as a pseudo channel to check the correctness of the importance calculation.
Figure 6. Random forest classifier, from https://levelup.gitconnected.com/random-forest-regression-209c0f354c84.
The final classifier included in the ensemble model is the DNN, the strong classifier (shown in Figure 7). Compared with Artificial Neural Networks (ANNs) in general, a DNN has a larger number of hidden layers and perceptrons to process the data. In our model, the input layer takes a feature column rather than separate data points; its values are passed through the hidden layers, where each connection is assigned an individual weight. The output of the last hidden layer is passed to the output layer, which gives the final prediction. In each layer, an activation function determines the value that is output and passed to the next layer. For this purpose, we use the Rectified Linear Unit (ReLU), an activation function that is increasingly popular for its performance. The model trains by adjusting the weights associated with each perceptron every time a batch of data is passed through, and the loss function quantifies how well the model performs. As more data is fed in, training seeks a minimum of the loss function, which corresponds to the best performance. We chose the Binary Cross Entropy loss function because it is designed for binary prediction and suits our problem. A DNN performs exceptionally well when the dataset is sufficiently large. We, however, have access to very limited EEG data from epilepsy patients, which can cause serious overfitting. As a result, when given data from a different patient, the accuracy of the DNN alone is often too low to offer useful predictions. Therefore, it is combined with the weak classifiers, which decrease performance on the same patient but improve generalization, enabling model migration and application to a wider range of patients.
Figure 7. Deep neural network.
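The forward pass and loss described above can be sketched in plain NumPy. This is an untrained toy network, not the architecture used in the paper: the single hidden layer, its size, the sigmoid output unit, and the random weights are all assumptions made for illustration.

```python
import numpy as np


def relu(z):
    """Rectified Linear Unit: passes positive values, zeroes out negatives."""
    return np.maximum(z, 0.0)


def sigmoid(z):
    """Squashes the output to (0, 1); assumed here for the output layer."""
    return 1.0 / (1.0 + np.exp(-z))


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))


rng = np.random.default_rng(0)
n_features, n_hidden = 36, 16   # 36 features as in the text; hidden size is hypothetical

# Randomly initialized weights (training by gradient descent is omitted).
W1 = 0.1 * rng.standard_normal((n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.standard_normal((n_hidden, 1))
b2 = np.zeros(1)


def forward(X):
    """Input -> ReLU hidden layer -> sigmoid output probability."""
    hidden = relu(X @ W1 + b1)
    return sigmoid(hidden @ W2 + b2).ravel()


X = rng.standard_normal((8, n_features))   # 8 synthetic feature vectors
y = rng.integers(0, 2, 8)                  # synthetic 0/1 labels
probs = forward(X)
loss = binary_cross_entropy(y, probs)
```

Training would repeatedly adjust `W1`, `b1`, `W2`, and `b2` to lower `loss`; an uninformative prediction of 0.5 for every sample gives a loss of ln 2, which is a useful baseline when reading loss curves.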
After the Random Forest, SVM, and DNN models are trained on the same set of data, a simple voting classifier is built by adding the prediction outputs of the three models and dividing by three. In this way, the ensemble model makes a 0/1 prediction for a given data point only if at least two of the three models return that result. Accidental prediction errors are thereby reduced, as is the model’s patient specificity.
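The averaging rule amounts to a two-out-of-three majority vote, as the short sketch below shows. The function name and the example predictions are hypothetical.

```python
import numpy as np


def majority_vote(rf_pred, svm_pred, dnn_pred):
    """Average three 0/1 prediction arrays and threshold at 0.5."""
    average = (np.asarray(rf_pred) + np.asarray(svm_pred) + np.asarray(dnn_pred)) / 3
    return (average > 0.5).astype(int)


# Hypothetical predictions of the three models for four windows.
rf_pred = np.array([1, 0, 1, 0])
svm_pred = np.array([1, 0, 0, 1])
dnn_pred = np.array([0, 0, 1, 1])

ensemble = majority_vote(rf_pred, svm_pred, dnn_pred)  # array([1, 0, 1, 1])
```

Because each model outputs 0 or 1, the average can only be 0, 1/3, 2/3, or 1, so thresholding at 0.5 never produces a tie.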
3.1. Dataset Description
The dataset comprises ictal and non-ictal data sampled at 500 Hz; a 9-second example of each is shown in Figure 8.
Figure 8. 9 s seizure and non-seizure time domain signal of patient 2.
The ictal signal is typically characterized by rapid and drastic changes, whereas the non-ictal signal is usually more stable and regular. However, given the variability of brain waves across patients, these characteristics are not definitive. In the model training process, datasets from patients 2, 3, 4, 6, and 7 are used; their details are shown in Table 4. Datasets from patients 1 and 5 are used as validation sets to evaluate the effectiveness of model migration.
3.2. Signal Reconstruction Result
A total of 3069 seconds of ictal and non-ictal data are imported from five different patients, as shown in Table 4. For the channel selection process, 150 seconds of continuous ictal and non-ictal data are selected from the dataset. A Butterworth band-pass filter is first applied to remove artifacts and unwanted noise while keeping the relevant signals between 0.3 Hz and 50 Hz, the common range of brainwave frequencies. The data are then resampled from the original rate of 500 SPS (samples per second) to 250 SPS, which maintains signal quality while reducing calculation time. After these two pre-processing steps, the data are ready for feature extraction, an important process for gathering useful information from the dataset. As stated in the Method section, 36 features in total are extracted. Each feature is computed over a three-second window, so every three seconds of data yields one value of that feature, for a total of 100 feature points per feature. These features are stored in 2D NumPy arrays whose number depends on each patient’s channel count.
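The pre-processing chain (band-pass filtering, downsampling, and three-second windowing) can be sketched with SciPy as follows. The filter order, the zero-phase filtering, the polyphase resampler, and the synthetic test signal are assumptions; only the 0.3 to 50 Hz band, the 500 and 250 SPS rates, and the three-second window come from the text.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, resample_poly

FS_RAW, FS_NEW, WINDOW_S = 500, 250, 3


def preprocess(x, fs=FS_RAW):
    """Band-pass 0.3-50 Hz, then downsample from 500 to 250 samples per second."""
    # Filter order 4 and zero-phase filtering are assumptions of this sketch.
    sos = butter(4, [0.3, 50.0], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, x)
    return resample_poly(filtered, up=1, down=fs // FS_NEW)


def split_windows(x, fs=FS_NEW, seconds=WINDOW_S):
    """Cut a signal into non-overlapping windows; a trailing remainder is dropped."""
    step = fs * seconds
    n_windows = len(x) // step
    return x[: n_windows * step].reshape(n_windows, step)


# Synthetic 150-second channel: a 10 Hz rhythm plus 120 Hz interference.
t = np.arange(150 * FS_RAW) / FS_RAW
raw = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

clean = preprocess(raw)
windows = split_windows(clean)   # one row per three-second window
```

Each row of `windows` would then be passed to the feature extraction functions, producing one value per feature per window.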
Table 4. Patients and data description.
After all features are calculated and stored, a random forest model with 1000 estimators and a random state of 42 is created. Out-of-bag samples are used for the channel importance calculation. For each patient, an array is created to store the importance score of each channel for each feature. For example, a patient with 20 channels has a 20-by-36 array, in which each element is the “score” of one channel (row) for one feature (column). The score is calculated by training the random forest model on each feature from each patient, with the help of “rfpimp”, a Python library with feature importance calculation functions. The “importance” function takes in the model and test data and produces a score for each channel, which indicates how important that channel is to the feature calculation. After the scores for all 36 features are obtained, they are added up to give a final score for each channel. This score determines the top 8 and 16 channels in each patient, which concludes the channel selection process.
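The final ranking step, summing the per-feature scores and keeping the best channels, might look like the sketch below. The importance values here are random placeholders, and top_channels is a hypothetical helper; the actual scores would come from the importance calculation described above.

```python
import numpy as np

def top_channels(importance, k):
    """Return indices of the k channels with the highest summed importance.

    importance: array of shape (n_channels, n_features) holding one score
    per channel for each feature.
    """
    total = importance.sum(axis=1)        # final score per channel
    order = np.argsort(total)[::-1]       # best channel first
    return order[:k]

# Placeholder scores for a patient with 20 channels and 36 features.
rng = np.random.default_rng(42)
scores = rng.random((20, 36))
top8 = top_channels(scores, 8)
top16 = top_channels(scores, 16)
```

Because both selections come from the same ranking, the top 8 channels are always the first 8 entries of the top 16.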
Next, the EFEG-based channel combination begins. First, all 3069 seconds of data are loaded, but only the top 8 and 16 channels are kept. The same pre-processing methods are then applied to them. After that, the dot product of each dataset matrix and each EFEG coordinate matrix is calculated. At this point, each EFEG coordinate generates an array with rows representing Ex, Ey, and the optional common mode noise, which is discarded, leaving Ex and Ey. A further row of magnitude is added, computed as the square root of the sum of the squares of Ex and Ey. Subsequently, the same 36 features are extracted from the three EFEG combination arrays, which is the final step of signal reconstruction. The window is the same three seconds, which produces a total of 1023 values per feature per EFEG coordinate per combination (Ex, Ey, magnitude). After concatenating the three combinations’ features vertically, a new array of 36 by 3069 feature values is generated, ready to train the models. As an example, Figure 9 plots nine seconds of Ex, Ey, and magnitude data from the five coordinates to show the distinctions between the different EFEG-reconstructed signals.
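The projection and magnitude computation can be illustrated as follows. The coordinate matrix used here (eight channels placed evenly on a unit circle, plus a common-mode row) is purely hypothetical and is not one of the five virtual coordinates used in this work; efeg_combine is likewise an illustrative name.

```python
import numpy as np

def efeg_combine(data, coords):
    """Project channel data onto a virtual coordinate matrix.

    data:   (n_channels, n_samples) pre-processed signals
    coords: (2 or 3, n_channels) matrix whose rows yield Ex, Ey and,
            optionally, a common-mode component
    """
    projected = coords @ data               # dot product, (rows, n_samples)
    ex, ey = projected[0], projected[1]     # any common-mode row is dropped
    magnitude = np.sqrt(ex**2 + ey**2)
    return np.vstack([ex, ey, magnitude])

# Hypothetical layout: 8 channels evenly spaced on a unit circle.
n_channels, n_samples = 8, 750
angles = 2 * np.pi * np.arange(n_channels) / n_channels
coords = np.vstack([np.cos(angles), np.sin(angles),
                    np.ones(n_channels) / n_channels])

rng = np.random.default_rng(0)
data = rng.standard_normal((n_channels, n_samples))   # synthetic signals
efeg = efeg_combine(data, coords)                     # rows: Ex, Ey, magnitude
```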
Before the features are fed to the models, a 1-D array of 3069 labels, matching the total number of values per feature, is created. To test their general performance, five identical Random Forest models are constructed to serve as the performance benchmark. First, the train_test_split function from the sklearn library is called, which splits the feature dataset into a training set and a testing set comprising 70% and 30% of the original dataset, respectively. The random state is set to 42 to ensure the reproducibility of the results. The benchmark results are shown in Figure 10.
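A minimal sketch of this benchmark setup with scikit-learn is given below. The feature matrix and labels are synthetic placeholders, the number of estimators is reduced to keep the example fast, and rows are taken to be samples (the transpose of the 36-by-3069 layout described above).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: 300 windows, 36 features each.
rng = np.random.default_rng(42)
features = rng.standard_normal((300, 36))
labels = (features[:, 0] > 0).astype(int)    # placeholder ictal/non-ictal labels

# 70% training, 30% testing, with a fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, random_state=42
)

# Fewer trees than in the paper's channel-ranking model, for speed only.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)       # fraction of correct predictions
```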
As Figure 10 suggests, the first two coordinates have relatively balanced accuracy, sensitivity, and specificity, although their sensitivity is slightly lower than that of the latter three coordinates. The two best-performing coordinates are coordinates 4 and 5, with accuracy scores of 92.07% and 91.75%, respectively, a comparatively satisfactory result for the signal reconstruction process.
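For reference, the three reported metrics follow the standard confusion-matrix definitions, sketched below. The label convention (1 for ictal, 0 for non-ictal) and the example arrays are assumptions made for illustration.

```python
import numpy as np

def detection_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = ictal)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))   # seizures correctly detected
    tn = np.sum((y_true == 0) & (y_pred == 0))   # non-seizures correctly rejected
    fp = np.sum((y_true == 0) & (y_pred == 1))   # false alarms
    fn = np.sum((y_true == 1) & (y_pred == 0))   # missed seizures
    return {
        "accuracy": (tp + tn) / y_true.size,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical labels and predictions for eight windows.
metrics = detection_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                            [1, 0, 1, 0, 0, 1, 0, 1])
```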
Subsequently, the results of our novel EFEG approach are compared with the traditional signal reconstruction method of averaging signals from every channel (shown in Figure 11). First, two sets of raw data from the training patient datasets are prepared separately from the five that are processed by EFEG coordinates. The two datasets contain data from the best 8 and 16 channels, the same channels used for the EFEG feature columns. Then, instead of applying the EFEG method, all channel signals in the same time window are averaged, and new feature columns are created by calculating the same features as for the EFEG-processed data. Afterward, both sets are passed through the ensemble models and evaluated.
Figure 9. Reconstructed signal comparison for different EFEG virtual coordinate matrices.
Figure 10. EFEG combination benchmark test result using random forest.
Figure 11. Seizure detection result comparison between different EFEG virtual coordinate matrix combined signal and conventional channel averaging signal.
3.3. Model Transferring Test Result
Although the trained models performed well with test labels from the same set of patients, this does not guarantee the same performance with other patients. In our experiment, data from two patients, 1 and 5, are prepared to evaluate our model’s performance on different patients (shown in Figure 12). These two patients never entered the training process, so the model has not encountered them. In this stage, the two patients’ data undergo the same processing as the training dataset, are divided into the EFEG groups, and are tested against their respective models.
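A patient-wise hold-out of this kind can be sketched as follows. The patient_ids array, the helper name split_by_patient, and the synthetic feature matrix are hypothetical; only the patient numbers come from the text.

```python
import numpy as np

TRAIN_PATIENTS = (2, 3, 4, 6, 7)   # patients used for training
HELD_OUT_PATIENTS = (1, 5)         # patients never seen during training

def split_by_patient(features, labels, patient_ids):
    """Split samples so that no held-out patient appears in the training set."""
    patient_ids = np.asarray(patient_ids)
    train_mask = np.isin(patient_ids, TRAIN_PATIENTS)
    test_mask = np.isin(patient_ids, HELD_OUT_PATIENTS)
    train = (features[train_mask], labels[train_mask])
    held_out = (features[test_mask], labels[test_mask])
    return train, held_out

# Synthetic example: 10 feature windows for each of 7 patients.
rng = np.random.default_rng(0)
patient_ids = np.repeat(np.arange(1, 8), 10)
features = rng.standard_normal((patient_ids.size, 36))
labels = rng.integers(0, 2, size=patient_ids.size)

(train_X, train_y), (new_X, new_y) = split_by_patient(features, labels, patient_ids)
```

Splitting by patient rather than by random sample is what makes the held-out score a measure of transfer to unseen patients.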
Figure 12. Seizure detection result for new patients using transferred model.
All five coordinates perform worse when tested against the two new patients, which is a reasonable outcome considering the vast differences between devices and patients. All accuracy scores are measured five times and averaged to reduce the effect of outliers. For patient 1, the best-performing EFEG coordinate, coordinate 5, scored 82.85%, with the next best, coordinate 4, scoring 77.45%; for patient 5, however, all models perform significantly worse, with the best coordinate, coordinate 4, scoring 59.33%. The discrepancy between the two patients is likely caused by distinct brain mechanisms in different patients, malfunctioning recording devices, or motion artifacts. Patient 5 also has a shorter recording time, so the signal collected may not be representative. Overall, the EFEG-reconstructed coordinates still demonstrate a decent performance level on new patients and support their model transfer capabilities.
4.1. Summary and Expectation
Overall, the new EFEG method notably increases model performance on all datasets to which it is applied, as evidenced by Figure 11, where it is compared with the traditional way of averaging all signals. While the traditional methodology only considers the signals separately, with no regard to their relative position, strength, and importance, the EFEG method creates a virtual electrical field that takes all those properties into consideration. Through channel importance selection, channels are ranked before being put into the EFEG calculation, which ensures that only the most relevant channels are involved in the process. This also aids model transfer across different devices, which usually have vastly different numbers of channels. Additionally, a third common-mode noise column is included in three of the five virtual coordinates, aimed at reducing possible artifacts that arise during the EEG recording and are common to all channels. Its efficacy is demonstrated in Figure 12, where coordinate 5, a coordinate that includes the common-mode noise column, performs best among the coordinates.
The selection of features is also an important step in achieving superior results. Currently, our model is trained on a set of 36 different features, as enumerated in Table 3. A vital consideration when choosing features is how well they differ between labels. In EEG data training, the paramount feature is the “power” contained in each time window. Figure 8 visualizes the common difference between ictal and non-ictal signals: the former is more erratic and generally contains higher power, while the latter is usually more orderly and has lower power. Most of our selected features, including Hjorth parameters, PSD, mean, variation, and the entropy of wavelets and Hilbert-transformed EMD, quantify signal power in some way. This in general leads to a higher performance level.
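As one example of a power-related feature, the Hjorth parameters of a single window can be computed as sketched below. This uses the standard definitions with per-sample differences (not scaled by the sample rate) and synthetic sine-wave windows; it is not taken from the authors' code.

```python
import numpy as np

def hjorth_parameters(x):
    """Return Hjorth activity, mobility, and complexity of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)                     # first difference (per-sample slope)
    ddx = np.diff(dx)                   # second difference
    activity = np.var(x)                # signal power around the mean
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

# Two synthetic 3 s windows at 250 SPS: a low- and a high-amplitude 10 Hz wave.
fs = 250
t = np.arange(3 * fs) / fs
calm = np.sin(2 * np.pi * 10 * t)
burst = 4.0 * np.sin(2 * np.pi * 10 * t)

calm_activity, calm_mobility, calm_complexity = hjorth_parameters(calm)
burst_activity, _, _ = hjorth_parameters(burst)
```

Activity scales with the square of the amplitude, so a higher-power window is separated from a calmer one by this value alone, while complexity stays close to 1 for a pure sine wave.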
In addition to the EFEG method and feature selection, the choice of proper classifiers also plays an important role in generating accurate results. Considering the differences between EEG signals across patients, two “weak” classifiers and one “strong” classifier are chosen. The two “weak” ones, random forest and SVM, are included to mitigate overfitting caused by the DNN, a powerful classifier. By ensembling the three models, their probability predictions are averaged, and a result is generated based on their combined prediction. This way, the model can be more tolerant of ambiguous features, where the result of a single classifier is insufficient. Moreover, combining three classifiers this way can also contribute to model transfer between different patients, an intractable challenge in current prediction models due to the volatility of EEG signals.
4.2. Future Work
Several improvements can be made to our current model so that it encompasses a wider range of data and moves a step closer to clinical or commercial application. Firstly, a pre-ictal phase can be added, which is the time range most seizure detection models try to recognize. Given that this phase precedes the physical seizure onset, being able to discern it is necessary for any device that aims to give users enough time for intervention. However, as the pre-ictal phase can often be mingled with the ictal and non-ictal phases, accurately predicting its presence before the ictal phase will be a challenge and will require further investigation and improvements to the current EFEG modality.
Secondly, the dataset we employ is from a Kaggle competition, so information on the device with which the EEG signals were recorded is not included. A potential improvement to the raw dataset would be to include the device’s model, enabling us to use real coordinates instead of the virtual ones we arranged based on channel importance. Although not confirmed, the actual coordinates would most likely improve the EFEG accuracy and would also eliminate the need to rank the channels by importance, since their positions would be provided.
Based on the detection results, the model transfer performance on patient 5 is subpar, presumably because the EEG signals provided are not sufficient for the model to perform at its best level. In future experiments, EEG data from more patients with longer recording times should be acquired and provided to the model.
Through a combination of data pre-processing, channel selection, EFEG signal reconstruction, feature selection, and ensembling of multiple classifier models, our new classification model performs successfully on test data. The innovative EFEG method combined with channel selection overcomes the challenge of lacking information on electrode position. By selecting high-importance channels and assigning them to virtual EFEG coordinates, we are able to achieve model transfer between different devices and patients, with the highest accuracy at 82%. In our original ensemble, the Random Forest, SVM, and DNN classifiers complement each other and combine to produce a result that allows model migration across different patients without too large a loss in accuracy. On average, our proposed ensemble model attains a prediction accuracy of about 85%, which is better than classical channel averaging models by 7% to 14%.
The author would like to offer his genuine gratitude to Jin Zhou, his research mentor, for the indispensable help he provided in this research paper. Without his expertise and guidance on the subject matter, the author would never have completed this research due to the numerous challenges in the process. The author is also grateful for the invaluable opportunity provided by Shing-Tung Yau Science Awards, and the committee of expert judges for reviewing this paper.