About 15 million babies are born prematurely every year worldwide  . Unfortunately, many of the surviving babies suffer from lifelong disabilities such as visual and auditory problems, attention difficulties and learning problems. To avoid these pathologies, it is essential to diagnose, establish a prognosis for, and treat preterm babies as early and as accurately as possible   . In neonatal intensive care units, preterm babies are usually monitored closely using brain magnetic resonance imaging, cerebral ultrasound assessment or EEG. Non-invasive EEG records the electrical activity of the brain through electrodes placed on the scalp: it measures the voltage fluctuations resulting from ionic current flows within the neurons of the brain. This technique gives valuable information on the ongoing neurological status of a patient and remains a major diagnostic tool in neurology in many situations such as epilepsy, sleep disorders and coma  -  . As shown in Figure 1, the EEG of preterm infants physiologically consists of an alternation of bursts of activity and periods of quiescence, called interburst intervals (IBI). The duration and proportion of IBI vary with the sleep stage: IBI are longer during quiet sleep. They also depend on the term of birth: the more premature the baby, the longer the IBI.
During the past four decades, several studies have used the EEG of preterm babies to study neural disorders. Many of them focused on the neurological outcome of neonatal EEG  -  . The authors of  and  defined poor outcome as death or survival with neurodevelopmental impairment, and good outcome as survival without impairment. In  , the authors evaluate the correlation between the characteristics of the amplitude-integrated EEG (aEEG), the cerebral ultrasound assessment and the neurodevelopmental outcome at 3 years of age in premature infants born before 30 weeks of gestation. They conclude that aEEG is an accurate method for establishing a long-term neurological prognosis, with sensitivities and specificities comparable to those of cerebral ultrasound assessment. In  , the authors report a significant correlation between the long-term neurological prognosis of preterm infants and the IBI value measured from the aEEG in the first 3 days. More recently  , a meta-analysis confirmed the value of EEG in establishing a long-term prognosis in premature infants. In everyday clinical practice, however, EEG is still analysed visually, which raises several difficulties. First, physicians experienced in analysing the EEG of very preterm infants are rare, which often delays the interpretation of EEG tracings and makes the analysis subjective. Second, in small hospitals, such expertise is often not available. Therefore, in line with the current trend towards automatic diagnostic aids, the goal of this paper is to propose a method that automatically predicts the physician's EEG analysis (abnormal versus normal EEG).

Figure 1. An IBI example.
Several studies have attempted to automate the detection of bursts and of seizures (uncontrolled electrical activity in the brain, producing physical convulsions, minor physical signs, thought disturbances, or a combination of these symptoms). For instance, the authors of  , proposed a method for discriminating between seizure and non-seizure EEG epochs of full-term infants. They extracted time-domain, frequency-domain and information-theoretic features from 17 full-term newborns, and classified them as seizure or non-seizure EEG using a Support Vector Machine (SVM). It is noteworthy that EEG characteristics differ considerably between preterm and full-term babies  , and both differ greatly from adult EEG. Few studies have tackled the problem of identifying abnormal EEG in preterm infants. Within the scope of automatic EEG analysis for premature newborns, we can mention the work presented by  , whose authors proposed a method for automated burst detection in the EEG. The detection is based on line length, defined as the running sum of the absolute differences between all consecutive samples within a predefined window  . Their corpus consisted of 10 preterm infants with a gestational age of less than 34 weeks. It is worth noting that these approaches     were investigated retrospectively only, without prospective investigation, which may introduce inherent biases.
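As an illustration of the line-length measure described above, the following Python sketch computes it over a window that slides one sample at a time. It is a minimal sketch under assumptions: the window length `win` is a hypothetical parameter, and the window length, step and detection threshold used in the cited work are not specified here.

```python
from typing import List, Sequence


def line_length(x: Sequence[float], win: int) -> List[float]:
    """Sum of |x[n+1] - x[n]| over every window of `win` consecutive
    samples, with the window advancing one sample at a time."""
    if win < 2:
        raise ValueError("a window needs at least two samples")
    diffs = [abs(x[i + 1] - x[i]) for i in range(len(x) - 1)]
    k = win - 1  # a window of `win` samples holds win - 1 differences
    if len(diffs) < k:
        return []
    total = sum(diffs[:k])
    out = [total]
    for i in range(k, len(diffs)):
        total += diffs[i] - diffs[i - k]  # slide: add newest, drop oldest
        out.append(total)
    return out


print(line_length([0, 0, 0, 5, 0], win=3))  # [0, 5, 10]
```

A burst detector would then compare these values with a threshold, since high-amplitude, fast activity produces a larger line length than quiescent segments.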
Finally, we would like to mention our own recent work on this problem  . On the same corpus as this paper (100 infants born before 35 weeks of gestation), IBI and bursts were extracted from 316 EEG recordings. Temporal features were then computed from these bursts and IBI, yielding 12 indexes for each EEG. The gestational age was added to these 12 features, and multiple linear regressions were fitted on all features. With 5-fold cross-validation, we reached a sensitivity of 85.53% ± 15.97%, a specificity of 74.14% ± 5.67% and an AUC of 0.80 ± 0.08. The main drawback of that work is that it relied only on multiple linear regressions, without other machine learning methods such as neural networks or support vector machines. Furthermore, no selection of relevant features was performed, since all 13 features were systematically used. Finally, the standard deviation of the sensitivity is very high, which may indicate that the retained predictive model overfits.
The method outlined in this paper works in four steps. First, in a preprocessing stage, the EEG was filtered using a band-stop IIR filter and smoothed using a moving-average window. Second, IBI were detected by thresholding the standard deviation of the preprocessed EEG. Third, temporal features were extracted from IBI and bursts. Finally, feature selection was incorporated into the classification step so as to select the relevant features that maximize classification performance. Two classifiers were tested, Support Vector Machines (SVM) and Multiple Linear Regressions, each with all combinations of features. Performance was evaluated using the area under the ROC curve (AUC,  ). The proposed method was validated on a cohort of 100 preterm babies with no severe brain injuries.
The paper is organized as follows: Section 2 describes the collected database, Section 3 presents the method, Section 4 describes the results and Section 5 provides a discussion. Finally, conclusions are drawn and future work is suggested.
EEG signals from 100 preterm infants were collected at the neonatal intensive care unit of the neuropediatric department of the Hospital of Angers, France. This monitoring was part of the usual clinical follow-up of premature infants. The legal representatives of all babies gave informed consent for participation in research studies. EEG were recorded at a sampling rate of 256 Hz with the Alliance recording system (Nicolet Biomedical), using 8 to 11 scalp electrodes adapted to the head size; each EEG comprised 11 channels. Electrodes were placed according to the international 10-20 system (Figure 2). In the acquisition procedure, no hardware filters were used besides the internal filters of the EEG device; only a software high-pass filter with a 0.1 Hz cut-off frequency was applied to remove the baseline offset.
In total, 416 neonatal EEG recordings lasting 30 to 45 minutes were performed between January 1, 2003 and December 31, 2004. All 100 infants were born before 35 weeks of gestation, and each baby had between 1 and 7 recordings.
The 416 EEG were reviewed by an expert neuropediatrician and classified as normal, abnormal or doubtful. Based on careful visual analysis, an EEG was considered normal if its background activity was normal for the gestational age and no abnormal features appeared. Abnormal EEG were those showing excessive discontinuity, with a maximal IBI duration greater than 50% of the maximal value (in relation to the gestational age), seizures, or more than 2 positive rolandic sharp waves per minute. Of the 416 EEG, 100 were considered doubtful and were rejected. Of the 316 remaining EEG, visual inspection identified 274 normal EEG (86.71%, 31.04 ± 2.13 weeks of gestation) and 42 abnormal EEG (13.29%, 30.01 ± 2.19 weeks of gestation). Figure 1 shows an example of abnormal EEG illustrating the IBI phenomenon.

Figure 2. Names and positions of electrodes from  .
3.1. Problem Statement
Let $x(n)$, $n = 1, \dots, N$, denote the EEG signal of $N$ samples recorded in a given channel, in which abnormal EEG have to be detected. The EEG signal essentially contains background activity in which bursts appear together with abnormal activity (IBI with discontinuity, seizures, rolandic sharp waves, etc.). The problem we address in this paper consists first in detecting the IBI and then in classifying the EEG as normal or abnormal. Automatic detection of abnormal EEG works in four steps, summarized in Figure 3: preprocessing, IBI detection, feature extraction, and feature selection with classification. Each of these steps is detailed in this section.
3.2. Preprocessing
For each channel, the raw EEG signal $x(n)$ was band-stop filtered at 50 Hz with a second-order Butterworth IIR notch filter, yielding a filtered signal $x_f(n)$ from which the 50 Hz power-supply frequency was removed. Then, $x_f(n)$ was smoothed by computing its moving average over a window of width $L$:
$$x_s(n) = \frac{1}{L} \sum_{k=0}^{L-1} x_f(n-k)$$
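The preprocessing step can be sketched in Python with SciPy. This is a minimal sketch, not the authors' Matlab implementation: the stop band of 49 to 51 Hz (`half_bw`), the zero-phase forward-backward filtering with `filtfilt`, and the smoothing width `width` are assumptions, since these settings are not reported. Calling `butter` with order 1 and `btype="bandstop"` produces a second-order filter. The moving average is causal, so its first `width - 1` output samples average over implicit zeros.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 256.0  # sampling rate (Hz) of the recordings


def preprocess(x, fs=FS, mains=50.0, half_bw=1.0, width=5):
    """Notch out mains interference, then smooth with a moving average.

    Assumed (not reported) settings: stop band mains +/- half_bw Hz,
    zero-phase forward-backward filtering, smoothing width `width`.
    """
    # Order 1 with btype="bandstop" yields a second-order IIR filter.
    b, a = butter(1, [mains - half_bw, mains + half_bw], btype="bandstop", fs=fs)
    x_f = filtfilt(b, a, np.asarray(x, dtype=float))
    # Causal moving average: x_s[n] = mean of x_f[n - width + 1 .. n].
    kernel = np.ones(width) / width
    x_s = np.convolve(x_f, kernel, mode="full")[: len(x_f)]
    return x_f, x_s


# Usage: 10 s of a 5 Hz component plus 50 Hz interference.
t = np.arange(0, 10, 1 / FS)
x = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 50 * t)
x_f, x_s = preprocess(x)
spectrum = np.abs(np.fft.rfft(x_f))
freqs = np.fft.rfftfreq(len(x_f), d=1 / FS)
```

In the usage lines, the 50 Hz peak of `spectrum` is strongly attenuated while the 5 Hz component passes almost unchanged.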
3.3. Inter Burst Intervals Detection
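To detect IBI, the standard deviation of the smoothed signal $x_s(n)$ was computed and thresholded as in the work of  . The standard deviation was computed on sliding windows of $W$ samples overlapping by $O$ samples ($0 \le O < W$), as in the following formula, where the $m$-th window starts at sample $s_m = 1 + (m-1)(W-O)$ and $\mu(m)$ is the mean of $x_s$ over that window:
$$\sigma(m) = \sqrt{\frac{1}{W} \sum_{n=s_m}^{s_m+W-1} \left( x_s(n) - \mu(m) \right)^2}$$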
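The windowed standard deviation can be sketched with NumPy. This is a minimal sketch: `win` and `overlap` stand for the window size and overlap in samples, whose values are not reported, and the population (1/win) normalisation is an assumption.

```python
import numpy as np


def sliding_std(x, win, overlap):
    """Standard deviation of x over windows of `win` samples that overlap
    by `overlap` samples (hop = win - overlap).

    Uses the 1/win (population) normalisation, which is an assumption.
    Returns the start index of each window and its standard deviation.
    """
    if not 0 <= overlap < win:
        raise ValueError("need 0 <= overlap < win")
    x = np.asarray(x, dtype=float)
    if len(x) < win:
        return np.array([], dtype=int), np.array([])
    hop = win - overlap
    starts = np.arange(0, len(x) - win + 1, hop)
    # One row per window; np.std defaults to ddof=0, i.e. 1/win.
    frames = np.stack([x[s:s + win] for s in starts])
    return starts, frames.std(axis=1)


# Usage: a quiet stretch followed by higher-amplitude activity.
starts, sigma = sliding_std([0, 0, 0, 0, 1, -1, 1, -1], win=4, overlap=2)
# starts -> [0, 2, 4]; sigma -> approximately [0.0, 0.707, 1.0]
```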
Figure 3. Block diagram of the method.
Runs of successive standard-deviation values below a threshold $T$ (in μV) lasting longer than 1 s were detected and delimited by onset and offset markers. Consecutive detections less than 0.5 s apart were grouped together and considered as the same IBI. Finally, only IBI present across all 11 EEG channels and longer than 1 s were kept. Since the choice of the threshold is crucial for performance, 100 values of $T$, from 1 to 100 μV in steps of 1 μV, were tested.
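The detection rules above can be sketched in pure Python on interval lists. This is a minimal sketch under stated assumptions: each channel's standard deviation is given as one value per window, together with the window start times `times` and hop `hop` in seconds; treating window i as covering [times[i], times[i] + hop) is a simplification; and the helper names are hypothetical. The 1 s and 0.5 s limits follow the text.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (onset, offset) in seconds


def runs_below(values: Sequence[float], times: Sequence[float],
               hop: float, thr: float) -> List[Interval]:
    """Intervals where the windowed std stays below `thr`.

    Simplification: window i is taken to cover [times[i], times[i] + hop).
    """
    out, start = [], None
    for t, v in zip(times, values):
        if v < thr and start is None:
            start = t
        elif v >= thr and start is not None:
            out.append((start, t))
            start = None
    if start is not None:
        out.append((start, times[-1] + hop))
    return out


def keep_longer(iv: List[Interval], min_dur: float = 1.0) -> List[Interval]:
    return [(on, off) for on, off in iv if off - on > min_dur]


def merge_close(iv: List[Interval], max_gap: float = 0.5) -> List[Interval]:
    merged: List[Interval] = []
    for on, off in sorted(iv):
        if merged and on - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], off))
        else:
            merged.append((on, off))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersection of two sorted lists of disjoint intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        on, off = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if on < off:
            out.append((on, off))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def detect_ibi(channels_std: Sequence[Sequence[float]], times: Sequence[float],
               hop: float, thr: float) -> List[Interval]:
    per_channel = [
        merge_close(keep_longer(runs_below(v, times, hop, thr), 1.0), 0.5)
        for v in channels_std
    ]
    common = per_channel[0]
    for iv in per_channel[1:]:
        common = intersect(common, iv)  # IBI must be present on every channel
    return keep_longer(common, 1.0)


# Usage with two toy channels sampled every 0.25 s.
hop = 0.25
times = [k * hop for k in range(40)]
ch1 = [5 if 4 <= k <= 11 or 13 <= k <= 23 else 20 for k in range(40)]
ch2 = [5 if 8 <= k <= 30 else 20 for k in range(40)]
print(detect_ibi([ch1, ch2], times, hop, thr=10))  # [(2.0, 6.0)]
```

On the first channel, the two detections separated by 0.25 s are merged into one IBI, and only the part shared with the second channel is kept.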
3.4. Feature Extraction
For each 11-channel EEG, a vector of 13 features was extracted as follows:
1) the number of IBI, called nb_IBI,
2) the total duration of IBI, which is defined as the sum of all IBI durations, called tot_IBI (seconds),
3) the percentage of IBI in the EEG, called P_IBI,
4) the duration of the longest IBI, called Max_IBI (seconds),
5) the maximum of IBI percentage in the EEG, called P_Max_IBI,
6) the mean duration of IBI, defined as the sum of the IBI durations divided by the number of IBI, called Mean_IBI (seconds),
7) the number of bursts, called nb_B,
8) the total duration of the bursts, defined as the sum of all burst durations, called tot_B (seconds),
9) the percentage of bursts in the EEG, called P_B,
10) the duration of the longest burst, called Max_B (seconds),
11) the maximum of bursts percentage in the EEG, called P_Max_B,
12) the mean duration of the bursts, defined as the sum of the burst durations divided by the number of bursts, called Mean_B (seconds),
13) the gestational age of the infant at the time of the EEG examination, called Age_EEG (weeks).
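The 13 features listed above can be computed from the detected IBI as in the following Python sketch. It rests on assumptions the text does not state explicitly: bursts are taken to be the segments of the recording between consecutive IBI, percentages are relative to the total EEG duration, and P_Max_IBI and P_Max_B are read as the longest IBI or burst expressed as a percentage of that duration. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (onset, offset) in seconds


def complement(ibi: List[Interval], duration: float) -> List[Interval]:
    """Segments of [0, duration] not covered by the sorted, disjoint IBI
    (assumed here to be the bursts)."""
    out, t = [], 0.0
    for on, off in ibi:
        if on > t:
            out.append((t, on))
        t = max(t, off)
    if t < duration:
        out.append((t, duration))
    return out


def _stats(segments: List[Interval], duration: float, tag: str) -> Dict[str, float]:
    lengths = [off - on for on, off in segments]
    n, total = len(lengths), sum(lengths)
    longest = max(lengths, default=0.0)
    return {
        f"nb_{tag}": n,
        f"tot_{tag}": total,
        f"P_{tag}": 100.0 * total / duration,
        f"Max_{tag}": longest,
        f"P_Max_{tag}": 100.0 * longest / duration,
        f"Mean_{tag}": total / n if n else 0.0,
    }


def eeg_features(ibi: List[Interval], duration: float, age_weeks: float) -> Dict[str, float]:
    feats = _stats(ibi, duration, "IBI")
    feats.update(_stats(complement(ibi, duration), duration, "B"))
    feats["Age_EEG"] = age_weeks
    return feats


# Usage: a 100 s recording with two IBI.
feats = eeg_features([(10.0, 20.0), (50.0, 55.0)], duration=100.0, age_weeks=30.0)
# feats["nb_IBI"] == 2, feats["P_IBI"] == 15.0, feats["nb_B"] == 3, feats["Max_B"] == 45.0
```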
3.5. Feature Selection and Classification
The extracted features and the gestational age form a set of vectors $\mathbf{x}_i \in \mathbb{R}^{13}$, $i = 1, \dots, M$, with $M$ the total number of EEG. The entire data set is written as $\{(\mathbf{x}_i, y_i)\}_{i=1}^{M}$, with class labels $y_i = +1$ and $y_i = -1$ for Abnormal and Normal EEG respectively. The task hereafter consists of selecting relevant features and classifying each EEG as Abnormal or Normal. Two classifiers were compared: Support Vector Machines (SVM) and Multiple Linear Regressions. In the following, feature selection is explained in the context of both classification methods.
3.5.1. Support Vector Machines
Feature selection was performed jointly with SVM classification  -  . We briefly describe below the principles underlying SVM.
Technically, an SVM separates the data set by a hyperplane with the largest possible margin and the minimal number of misclassified data. This hyperplane is defined by a weight vector $\mathbf{w} \in \mathbb{R}^d$, $d$ being the dimension of the feature vectors, and an offset $b$ as follows:
$$\mathbf{w}^\top \mathbf{x} + b = 0$$
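This hyperplane is obtained by solving the following constrained optimization problem:
$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{M} \xi_i \quad \text{subject to} \quad y_i\left(\mathbf{w}^\top \mathbf{x}_i + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, M$$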
where minimizing $\|\mathbf{w}\|^2$ yields the maximal margin hyperplane, $C$ is the regularization parameter and $\xi_i$ are the nonnegative slack variables  measuring errors.
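By setting to zero the derivatives of the associated Lagrangian with respect to the primal variables $\mathbf{w}$, $b$ and $\xi_i$, the dual formulation of the optimization problem can be written as:
$$\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{M} \alpha_i - \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{M} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i^\top \mathbf{x}_j \quad \text{subject to} \quad \sum_{i=1}^{M} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C$$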
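The linear SVM is extended to a non-linear classifier by mapping the data into a higher-dimensional space using a mapping function $\phi$; the optimization problem then becomes:
$$\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{M} \alpha_i - \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{M} \alpha_i \alpha_j y_i y_j \, K(\mathbf{x}_i, \mathbf{x}_j) \quad \text{subject to} \quad \sum_{i=1}^{M} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C$$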
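where $K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^\top \phi(\mathbf{x}_j)$ denotes the kernel function. The resulting decision function is:
$$f(\mathbf{x}) = \operatorname{sign}\left( \sum_{i=1}^{M} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b \right)$$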
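To make the kernel decision function concrete, the following Python sketch evaluates it for the three kernel families used in this work. It is illustrative only: the multipliers `alphas`, the offset `b` and the support vectors in the usage lines are toy values rather than trained ones, and the polynomial offset `c` and the RBF form exp(−‖u − v‖² / (2σ²)) are common conventions assumed here (toolboxes such as LS-SVM may parametrise kernels differently). A score of exactly 0 is assigned to +1.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def linear_kernel(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(u: Vector, v: Vector, degree: int = 3, c: float = 1.0) -> float:
    # (u.v + c)^degree; the offset c is an assumed convention
    return (linear_kernel(u, v) + c) ** degree


def rbf_kernel(u: Vector, v: Vector, sigma: float = 1.0) -> float:
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def decide(x: Vector, support: Sequence[Vector], labels: Sequence[int],
           alphas: Sequence[float], b: float,
           kernel: Callable[[Vector, Vector], float]) -> int:
    """sign(sum_i alpha_i y_i K(x_i, x) + b), returning +1 or -1."""
    s = sum(a * y * kernel(xi, x) for xi, y, a in zip(support, labels, alphas)) + b
    return 1 if s >= 0 else -1


# Usage with toy values (not obtained by training).
support = [[1.0, 0.0], [-1.0, 0.0]]
labels = [1, -1]      # +1 = Abnormal, -1 = Normal
alphas = [0.5, 0.5]
print(decide([2.0, 0.0], support, labels, alphas, 0.0, linear_kernel))   # 1
print(decide([-3.0, 0.0], support, labels, alphas, 0.0, linear_kernel))  # -1
```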
Several kernels were tested, namely radial basis function (RBF), polynomial and linear kernels. For the dimension $d$ of the input data, all combinations of the 13 features were tested for each kernel. This results in testing $2^{13} - 1 = 8191$ feature combinations for each kernel and for each of the 100 threshold values mentioned in Section 3.3.
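The size of this exhaustive search can be checked with a short Python sketch that enumerates every non-empty subset of the 13 features of Section 3.4 for each of the 100 thresholds. It only counts configurations; training and evaluating a classifier on each one, which dominates the computation time, is not shown.

```python
from itertools import combinations

FEATURES = ["nb_IBI", "tot_IBI", "P_IBI", "Max_IBI", "P_Max_IBI", "Mean_IBI",
            "nb_B", "tot_B", "P_B", "Max_B", "P_Max_B", "Mean_B", "Age_EEG"]
THRESHOLDS_UV = range(1, 101)  # 1 to 100 μV in steps of 1 μV


def all_subsets(names):
    """Yield every non-empty combination of feature names."""
    for k in range(1, len(names) + 1):
        yield from combinations(names, k)


n_subsets = sum(1 for _ in all_subsets(FEATURES))
n_configs = n_subsets * len(THRESHOLDS_UV)
print(n_subsets, n_configs)  # 8191 819100
```

Each of these 819,100 configurations is then trained and evaluated on the 5 cross-validation folds, for every kernel, which explains the long SVM computation times.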
For the implementation, we used Matlab© (The MathWorks Inc., South Natick, MA, USA) and the LS-SVM 1.8 toolbox, which provides a complete implementation of SVM  .
3.5.2. Multiple Linear Regressions
Multiple linear regression is a generalization of simple linear regression  . It models the relationship between a response variable and several explanatory variables. Given $n$ observations and $p$ explanatory variables, with $y_i$ ($i = 1, \dots, n$) the variables to be predicted and $x_{i1}, \dots, x_{ip}$ the explanatory variables, the model is:
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, \dots, n$$
where the coefficients $\beta_0, \dots, \beta_p$ are the parameters to be estimated and $\varepsilon_i$ are the model errors, which account for the information not captured by the explanatory variables.
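The model above can be fitted with ordinary least squares, the usual estimator for multiple linear regression. The text does not state the estimation procedure, so least squares is an assumption here. The following NumPy sketch fits the coefficients and returns a continuous score that can be thresholded, or used to build a ROC curve, with labels +1 (Abnormal) and −1 (Normal).

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares estimate of (beta_0, beta_1, ..., beta_p)."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # column of 1s for beta_0
    beta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return beta


def mlr_score(beta, X):
    """Continuous prediction beta_0 + sum_j beta_j x_j for each row of X."""
    return beta[0] + np.asarray(X, dtype=float) @ beta[1:]


# Usage on noise-free synthetic data generated by y = 1 + 2*x1 - 3*x2.
X = [[0, 0], [1, 0], [0, 1], [2, 1], [3, 2]]
y = [1, 3, -2, 2, 1]
beta = fit_mlr(X, y)                               # approximately [1, 2, -3]
label = np.where(mlr_score(beta, X) >= 0, 1, -1)   # +1 = Abnormal, -1 = Normal
```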
As for SVM, all combinations of the 13 explanatory variables were tested for each threshold.
3.6. Performance Evaluation
To evaluate the accuracy of the predictions, two measures were used: sensitivity and specificity, computed as percentages as follows:
$$\text{Sensitivity} = \frac{TP}{TP + FN} \times 100, \qquad \text{Specificity} = \frac{TN}{TN + FP} \times 100$$
where:
- TP: number of true positives, TN: number of true negatives,
- FN: number of false negatives, FP: number of false positives.
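Sensitivity and specificity can be computed from reference and predicted labels as in the following Python sketch, taking Abnormal (+1) as the positive class, consistent with the labels used in this paper. The function name is illustrative.

```python
from typing import Sequence, Tuple


def sens_spec(y_true: Sequence[int], y_pred: Sequence[int],
              positive: int = 1) -> Tuple[float, float]:
    """Return (sensitivity, specificity) in percent."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return 100.0 * tp / (tp + fn), 100.0 * tn / (tn + fp)


# Usage: 4 abnormal (+1) and 5 normal (-1) EEG.
y_true = [1, 1, 1, 1, -1, -1, -1, -1, -1]
y_pred = [1, 1, 1, -1, -1, -1, -1, 1, -1]
print(sens_spec(y_true, y_pred))  # (75.0, 80.0)
```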
The use of sensitivity and specificity relies on a precondition: the numbers of "normal" and "abnormal" EEG must be reasonably balanced. With a prevalence of abnormal EEG of 13.29%, our corpus does not meet this condition. Therefore, ROC curves were used: this curve-based method is independent of the class distribution and of misclassification costs. The ROC curves were built by plotting sensitivity versus 1 − specificity for different cutoff values. The area under the curve reflects the accuracy of the test: the larger the area, the more accurate the test  .
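The area under the empirical ROC curve equals the probability that a randomly chosen abnormal EEG receives a higher score than a randomly chosen normal one, with ties counted as one half (the Mann-Whitney statistic). The following pure-Python sketch uses this identity; it is an illustration, not the toolbox routine used in this work, and its quadratic cost is acceptable for a corpus of a few hundred EEG.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int], positive: int = 1) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute an AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, -1, -1]))  # 0.75
```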
To estimate the generalization error with a small bias and a small variance, we used K-fold cross-validation  (with K = 5). The data set is randomly divided into K equal subsets (called folds). The classifier is trained on K − 1 folds, and its validation performance is measured on the remaining fold, which was not used during training. The process is repeated K times, each fold serving once as the validation fold, and the performance of the classifier is obtained by averaging the K AUC values. Since the AUC gives an overall accuracy for each ROC curve, the best sensitivity and specificity for each curve were obtained by selecting the cutoff that minimizes the following quantity:
The 5 subsets were built randomly, only keeping an equivalent number of children in each subset: since the 42 abnormal EEG cannot be divided evenly by 5, we had 3 subsets with 8 abnormal EEG and 2 subsets with 9 abnormal EEG.
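A stratified split of this kind can be sketched in Python by shuffling each class separately and dealing its recordings round-robin into the K folds. This is a minimal sketch: the seed is arbitrary, and the split is made per recording, so it does not balance the number of children per fold or keep all recordings of one infant in the same fold.

```python
import random
from typing import List, Sequence


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split recording indices into k folds with near-equal class counts."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class one by one into the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


labels = [1] * 42 + [-1] * 274  # 42 abnormal, 274 normal EEG
folds = stratified_folds(labels, k=5)
print(sorted(sum(labels[i] == 1 for i in f) for f in folds))  # [8, 8, 8, 9, 9]
```

Each fold then serves once as the validation set while the model is trained on the other four, and the five AUC values are averaged.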
During the 5-fold cross-validation, 3 kernels (linear, polynomial and Gaussian radial basis) were tested. For the polynomial kernel, the degree varied from 3 to 5. The Gaussian radial basis kernel was used with . For each kernel type, the configuration giving the highest mean AUC over the K folds was retained.
Table 1 shows the performance of all classifiers as mean ± standard deviation of sensitivity, specificity and AUC. The Multiple Linear Regression method achieved the best performance, with a selected threshold of 32 μV. The best combination of features was obtained with 11 features: Age_EEG, nb_IBI, tot_IBI, P_IBI, Max_IBI, P_Max_IBI, Mean_IBI, nb_B, P_B, P_Max_B and Mean_B (see Table 2 for the descriptive statistics of all extracted features for the best threshold of 32 μV).
For the linear SVM, the selected threshold was 35 μV and the selected features were nb_IBI, P_IBI, P_Max_IBI, Mean_IBI, nb_B and P_Max_B. The SVM with a polynomial kernel reached its optimal performance with a threshold of 32 μV, using Age_EEG, tot_IBI, P_IBI, Max_IBI, tot_B, P_B, Max_B, P_Max_B and Mean_B. Finally, the Gaussian SVM used only 3 features, Age_EEG, Mean_IBI and nb_B, with a threshold of 25 μV.
The final detector was trained on the whole corpus with the Multiple Linear Regression method on the 11 features Age_EEG, nb_IBI, tot_IBI, P_IBI, Max_IBI, P_Max_IBI, Mean_IBI, nb_B, P_B, P_Max_B and Mean_B. With the prediction set to +1 (Abnormal) and −1 (Normal), we obtained Equation (10), detailed below.
Table 1. 5-cross validation results.
Table 2. Extracted features for a threshold equal to 32 μV.
where P is the predicted value, x1 represents Mean_IBI, x2 represents nb_IBI, ..., and x11 represents Mean_B (all variables are listed in Table 3). Equation (10) thus shows the weight (impact) of each feature on the prediction and whether it is positively or negatively correlated with the prognosis. The weight associated with each feature and the cumulative weights are given in Table 3.
All calculations were performed on computers equipped with an Intel Core i5-3470 CPU at 3.20 GHz and 8 GB of RAM under Linux Ubuntu. We used 10 computers simultaneously: for the 100 thresholds, the linear SVM kernels took 9 days and 14 hours, the polynomial SVM kernels 65 days and 8 hours, and the RBF SVM kernels 10 days and 2 hours. By contrast, the Multiple Linear Regressions took only 59 minutes on a single computer.
Experimental results show that a Multiple Linear Regression estimated on 11 features (Age_EEG, nb_IBI, tot_IBI, P_IBI, Max_IBI, P_Max_IBI, Mean_IBI, nb_B, P_B, P_Max_B and Mean_B) can accurately detect abnormal EEG. The detection of an abnormal preterm infant EEG reaches a sensitivity of 86.11% ± 10.01%, a specificity of 77.44% ± 7.62% and an AUC of 0.82 ± 0.04. Thus, if the automatic detection considers an EEG to be abnormal, it must also be interpreted by the neurologist before further medical examinations such as MRI (Magnetic Resonance Imaging) are undertaken. Finally, owing to the high sensitivity of our test, an EEG classified as normal does not need to be interpreted urgently by the doctor.

Table 3. The impact and cumulative impact of each variable.
A main advantage of the proposed method is that the threshold and the features are selected so as to maximize classification performance. Other ways of selecting the threshold and the features exist      , but they are not optimal from a classification point of view.
When comparing SVM to Multiple Linear Regressions, the computation time of the linear SVM is $1.32 \times 10^6$ times longer, that of the RBF SVM $1.46 \times 10^6$ times longer and that of the polynomial SVM $9.52 \times 10^6$ times longer than that of the regressions. Moreover, the performance of Multiple Linear Regressions is higher than that of SVM. However, the SVM results are promising, especially those obtained with the RBF kernel, for which only 3 variables were selected (Age_EEG, Mean_IBI, nb_B). This sparsity in feature selection could enhance the robustness of our learning machines   . Note that the Multiple Linear Regression captures almost 95% of the prediction with 5 variables (Mean_IBI, nb_IBI, nb_B, P_Max_IBI, Age_EEG), as shown by the cumulative impact in Table 3.
It is also worth noting that these performances were achieved on a set of 316 EEG, after rejecting 100 doubtful EEG. It would be interesting to train a classifier that could automatically label such suspicious recordings as ambiguous. A weakness of this study is that the EEG were classified by a single EEG expert only. This is a major limitation of the proposed system: the opinions of two or three experts would reduce the bias of the reference labels. Another limitation is that only SVM and Multiple Linear Regressions were used, and not, for example, neural networks, mainly because testing all feature combinations with neural networks would have taken too long.
This study proposes an automated method to detect abnormal electroencephalograms (EEG) of preterm infants. The novelty of this paper lies in the combination of three facts: first, we work on preterm infants; second, we propose to automate the current diagnosis rather than the prediction of the long-term neurological outcome; and third, this automated prediction is evaluated in a prospective group and not only in a retrospective group. The method consists of detecting interburst intervals, extracting features from the EEG, selecting relevant features and classifying the EEG as normal or abnormal. Gestational age and 10 features extracted from the EEG (nb_IBI, tot_IBI, P_IBI, Max_IBI, P_Max_IBI, Mean_IBI, nb_B, P_B, P_Max_B, Mean_B), introduced in a Multiple Linear Regression model, could reliably predict an abnormal finding with a sensitivity of 86.11% ± 10.01%, a specificity of 77.44% ± 7.62% and an AUC of 0.82 ± 0.04.
These results are very promising and encourage further research to enhance the detection of abnormal EEG, for instance by considering additional features such as frequency-domain and information-theoretic features. Finally, combining several classifiers could also be a promising path of research.
This research received no specific grant. Sincere thanks to J.F. Gelfi and R. Woodward for their help in improving the quality of this paper.