Impact Factor 3.398

The #1 most cited and #1 largest open-access publisher in the category of Neuroscience.

Original Research ARTICLE

Front. Neurosci., 09 October 2012 |

Multi-class motor imagery EEG decoding for brain-computer interfaces

  • 1Department of Computer Science and Technology, Tongji University, Shanghai, China
  • 2Key Laboratory of Embedded System and Service Computing, Ministry of Education, Shanghai, China
  • 3Centre for Neuroscience Studies, Queen’s University, Kingston, ON, Canada
  • 4Canadian Action and Perception Network, Toronto, ON, Canada

Recent studies show that scalp electroencephalography (EEG) as a non-invasive interface has great potential for brain-computer interfaces (BCIs). However, one factor that has limited practical applications for EEG-based BCI so far is the difficulty to decode brain signals in a reliable and efficient way. This paper proposes a new robust processing framework for decoding of multi-class motor imagery (MI) that is based on five main processing steps. (i) Raw EEG segmentation without the need of visual artifact inspection. (ii) Considering that EEG recordings are often contaminated not just by electrooculography (EOG) but also other types of artifacts, we propose to first implement an automatic artifact correction method that combines regression analysis with independent component analysis for recovering the original source signals. (iii) The significant difference between frequency components based on event-related (de-) synchronization and sample entropy is then used to find non-contiguous discriminating rhythms. After spectral filtering using the discriminating rhythms, a channel selection algorithm is used to select only relevant channels. (iv) Feature vectors are extracted based on the inter-class diversity and time-varying dynamic characteristics of the signals. (v) Finally, a support vector machine is employed for four-class classification. We tested our proposed algorithm on experimental data that was obtained from dataset 2a of BCI competition IV (2008). The overall four-class kappa values (between 0.41 and 0.80) were comparable to other models but without requiring any artifact-contaminated trial removal. The performance showed that multi-class MI tasks can be reliably discriminated using artifact-contaminated EEG recordings from a few channels. This may be a promising avenue for online robust EEG-based BCI applications.


Electroencephalography (EEG) of brain activity has long been used for clinical diagnosis and exploring brain function. Over the past two decades, EEG-based brain-computer interfaces (BCIs) have received increased attention mainly due to the ease of use, high temporal resolution, and low cost compared to other non-invasive measurements of brain activity, such as fMRI, MEG, PET scans, etc (Wolpaw et al., 2002; Noirhomme et al., 2008; Arvaneh et al., 2011). Motor imagery (MI), which can be defined as the mental rehearsal of a motor act without overt movement execution (Alkadhi et al., 2005), is often used in EEG-based BCIs. By thinking about moving their arms, hands, tongue, legs, or rotating an object, participants can produce relevant motor-related EEG patterns. If properly decoded, these patterns can then be translated into a command to control external devices like a mobile robot/wheelchair (Millán et al., 2004; Lew et al., 2006), or a virtual helicopter (Doud et al., 2011).

However, there are serious challenges. For example, low signal decoding performance, highly subject-specific data, and low processing speed limit the practical applications as well as usefulness in analyzing neurophysiological data for human brain investigations. One reason for this is that EEG signals are prone to contamination from artifacts such as blinking or movements of the eyes (electrooculography, EOG), heart beats (electrocardiography, ECG/EKG), and electromyography (EMG) activity of cranial musculature. Movements of head, body, jaw, or tongue, etc. can also interfere with recordings. For example, EOG artifacts are a major noise source in EEG recordings. However, restricting eye movements/blinks limits experimental designs and may impact cognitive processes under investigation (Joyce et al., 2004). Generally speaking, there are two kinds of strategies for obtaining high-quality EEG recordings (Joyce et al., 2004; Fatourechi et al., 2007; Schlögl et al., 2007; Hallez et al., 2009; Zhou and Gotman, 2009): (1) eliminating contaminated trials after visual inspection, or (2) correcting the artifacts automatically. The former method leads to a substantial loss of data. Moreover, it requires EEG experts to carefully inspect each trial, a process that is generally very time-consuming and subjective. On the other hand, regression-based techniques have shown promising results in the field of EOG-related artifact correction (Schlögl et al., 2007). Another effective automatic method to correct for EOG artifacts is independent component analysis (ICA; Joyce et al., 2004; Fatourechi et al., 2007; Hallez et al., 2009; Zhou and Gotman, 2009). In this study, we combine regression analysis (RA) with ICA to automatically recover the source signals from EEG signals contaminated by EOG as well as artifacts generated by other sources.

Another important challenge for online EEG decoding is choosing the optimal number of electrodes and the relevant frequency bands to improve discrimination between MI (or other) tasks. In principle, using a small number of channels without carefully choosing their locations may cause a loss of important electrophysiological information. However, including more channels to collect data will provide redundant information which could increase the risk of data over-fitting, and which increases computational complexity to the degree that would make real-time BCI application infeasible with currently available desktop computer power. Therefore, it is necessary to determine the optimal channel set in EEG-based BCI studies. Moreover, the optimized electrode locations, obtained through what is also known as spatial pattern filtering, would reflect the specific motor cortical regions related to different MI tasks which helps to provide further insight into cognitive resources used in the tasks. Pfurtscheller and Aranibar(1977, 1979); Pfurtscheller and Lopes da Silva (1999) introduced an event-related (de-) synchronization (ERD/ERS) analysis method to distinguish between channels and to select a channel set for MI classification. However, the ERD/ERS depends on frequency band and so selecting the most discriminating frequency bands is important for ERD/ERS analysis. In Pfurtscheller and Lopes da Silva (1999), the authors suggested three effective ways to determine the upper and the lower limits of the band-pass filter, i.e., detect the frequency bands based on short-time power spectra, continuous wavelet transform, and peak frequency. In this study, we assume that non-contiguous frequency band filters might provide a much more accurate way to quantify the sensorimotor in the frequency domain than manually selecting a broad frequency range filter. Hence, we conducted an automatic selection of subject-specific reactive non-contiguous frequency bands via state-of-the-art information theoretic sample entropy.

To summarize, the principal aim of this study was to introduce a novel multi-class MI EEG decoding for BCIs, including an automatic artifact correction method to recover the original source signals from EOG and other artifacts and choosing the least number of channels yet yielding the best performance with the most reactive frequency bands, i.e., sub-bands in frequency domain sets, of the recordings. We tested the performance of our method on a well-known publicly available data set from BCI competition IV in 2008.

Data Acquisition and Datasets

In this study, we used dataset 2a from BCI Competition IV1, i.e., a four-class MI study which was provided by the Institute for Knowledge Discovery (Laboratory of Brain-Computer Interfaces), Graz University of Technology (Austria). Compared to datasets from past BCI Competitions (BCI Competition I, announced at NIPS 2001, BCI Competition II, also called BCI Competition 2003, and BCI Competition III, i.e., BCI Competition 2005), there were eye movement artifacts in dataset 2a as a new challenging problem that is highly relevant for practical BCI systems.

The data set consists of EEG data from nine subjects. Each subject was sitting in a comfortable armchair in front of a computer screen. The cue-based BCI paradigm consisted of four different MI tasks, namely the imagination of movement of the left hand (class 1), right hand (class 2), both feet (class 3), and tongue (class 4). Two sessions on different days were recorded for each subject. Each session was comprised of six runs separated by short breaks. Each run consisted of 48 trials (12 for each of the four possible classes), yielding a total of 288 trials per session. On the left of Figure 1 is depicted the timing scheme of one trial. An acoustic stimulus indicated the beginning of a trial and a fixation cross (+) was displayed for 2 s, which subjects were requested to fixate. Then a cue in the form of an arrow pointing either to the left, right, up, or down (corresponding to one of the four classes mentioned above) was displayed for 1.25 s. This prompted the subjects to carry out the mental imagination until the fixation cross disappeared from the screen at t = 6 s. A short break followed which lasted 1.5–2.5 s allowing subjects to relax. Twenty-two referenced EEG channels (Figure 1 right) and three monopolar EOG channels (positioned above the nasion and below the outer canthi of the eyes) were recorded using Ag/AgCI electrodes (left mastoid serving as reference and the right mastoid as ground), were sampled at 250 Hz and band-pass filtered between 0.5 and 100 Hz, with the 50 Hz notch filter enabled (Brunner et al., 2008).


Figure 1. Left: trial timing of the motor imagery paradigm. (Cued MI: left hand, right hand, both feet, tongue); Right: electrode montage corresponding to the international 10–20 system (adapted from Brunner et al., 2008).

Analysis Methods

We labeled the proposed method a five-stage decoding of EEG (FSDE). First, the original EEG signals were segmented into trials according to the header structure information (see Brunner et al., 2008 for details). Then, the correction method based on RA in combination with the fast ICA (FastICA) was used. The third stage was a normalization process. The z-score normalization was applied on the EEG segments. The fourth stage consisted of channel selection. After comparing the ERD/ERS value between target MI class and other non-target classes, data from the selected channels were used for later feature extraction. Finally, the classification was performed using support vector machine (SVM) classifiers. The full processing procedure is shown schematically in Figure 2, and the details are explained in the following sub-sections.


Figure 2. Schematic diagram for the EEG signal processing procedure (Five-stage decoding EEG – FSDE).


As Figure 2 shows, there are two different strategies in EEG segmentation. One is segmentation without invalid sample removal. The other is segmentation with artifact removal. Since our goal was to develop an algorithm that was robust to outliers and artifacts, no trials were removed in our experiment.

Artifact Correction

Extending the work of Schlögl et al. (2007), we assume the following linear mode including EOG and other artifacts:


where X=[x1,x2,,xN]N×T denotes a matrix that represents the recorded EEG signals, N and T denote the number of channels and the number of sampled time-points, respectively, A is composed of constant coefficients aij and is a linear mixture unknown matrix, S is the uncontaminated signal without artifact contamination, U denotes the three EOG components, and K = [k1(i), k2(i), k3(i)]T indicates the weights of the EOG artifacts at EEG channel i for signal correction.

Let Y = AS, then Eq. 1 can be written as

Y = X-UK.(2)

In order to obtain Y, the EOG noise U and its weighting coefficients K must be known. Here, U is known because it was recorded by separate EOG channels (they are positioned close to the eyes in order to minimize the influence of non-EOG components). In order to identify the weighting coefficients K, we assume that the EEG signal and the EOG noise are independent because they come from different cognitive component sources, then


where <UTY>=0 results in


where <UTU> is the auto-covariance matrix of the EOG channels, and <UTX> is the cross-covariance between the EEG and EOG channels. Accordingly, the output Y can be calculated from EOG artifacts by Eq. 2.

Our aim is to obtain the independent source signals S which cannot be recorded directly. Therefore, after correcting EEG from EOG artifacts, ICA was employed to unmix the signals from other artifacts. Those components corresponding other artifacts are not identified via visual inspection but will be discarded through the subsequent channel selection algorithm. In this study, 25 physical sources emit electric signals. Each records a mixture of the original source signals.

Y=AS[ y1(t)yn(t) ]=[ a11a1nan1ann ][ s1(t)sn(t) ],(5)

where yiY,aijA,siS.

All signals can be regarded as a linear superposition of the real task-related brain signals S. The aim is to find the source signal S from the mixture Y. Since the mixing coefficients aij are different enough to make the matrix invertible, there exists a matrix W with coefficients wij. Multiplying the unmixing matrix W to Eq. 5, results in

Z=WY=WAS[ z1(t)zn(t) ]=[ w11w1nwn1wnn ][ y1(t)yn(t) ],(6)

where Z = [z1, z2, …, zn]T can be regarded as being mathematically similar to signal S, especially when WA = E, i.e., matrix W and A are inverse to one another and Z is equal to the source signal S. W = [wij]is a so-called unknown unmixing matrix.

Based on this, the aim has been changed to estimate wij in Eq. 6. In fact, if the signals are not Gaussian, it is sufficient to find an “unmixing matrix” by considering the statistical independence of different linear combinations of Y. In the present study, the classical ICA algorithm (FastICA MATLAB pack2) was used to determine W from the given multidimensional signals (see FastICA in Appendix for detail). Finally, the independent source signal Z can be calculated by Eq. 6.


Before normalization, the signals from each electrode were winsorized to reduce the effects of large amplitude outliers (Hoffmann et al., 2008): for the signals from each electrode the 5th percentile and the 95th percentile were computed. Amplitude values lying below the 5th percentile or above the 95th percentile were then replaced by the 5th percentile of the 95th percentile, respectively. We used this method because both mean and SD are sensitive to outliers. Normalization steps were then applied to EEG signals for possible variations in signal acquisition from trial to trial. In our experiments, normalization techniques, such as log (Nakayama and Inagaki, 2006), min-max normalization, and zero-mean normalization (z-score) were tested. Compared with other normalization methods, the z-score normalization data set had the highest accuracy. The normalized signals S are given by


where μij=1Pk=1PSijk,σij=1Pk=1P(Sijk-μij)212,SN×T×P, N, T, and P denotes the number of channels, number of measurement samples, and number of trials, respectively.

Channel Selection Based on ERD/ERS Analysis

It is well-known that brain rhythms as measured by EEG are time series that composed of mixtures of multiple frequency components, such as δ (1–4 Hz), θ (4–8 Hz), α (8–13 Hz), β (13–30 Hz), and γ (>30 Hz) rhythms. People have naturally occurring brain rhythms over areas of the brain concerned with different functional states. For example, when people imagine moving, the functional connectivity of cortex is changed, i.e., the amplitudes of μ and central β rhythms are first suppressed, then enhanced (Pfurtscheller and Lopes da Silva, 1999). These two changes are called ERD/ERS (event-related desynchronization and event-related synchronization), respectively (Pfurtscheller, 1977, 1992; Pfurtscheller and Aranibar, 1977; Pfurtscheller and Lopes da Silva, 1999). Because different EEG rhythms can distinguish patterns of neuronal activity associated with specific behavioral and cognitive processing functions, different patterns of synchronization or desynchronization could result from different forms of processing or computation in the brain and represent different rhythmic states. The ERD/ERS is defined as the percentage of power decrease (ERD) or power increase (ERS) within a given frequency band relative to a reference interval (Pfurtscheller and Lopes da Silva, 1999). Mathematically, it can be estimated as follows:


where Ai(k) represents the power during an experimental task segment of class k, channel i, and Ri(k) denotes the power of given frequency bands during the reference time segment of class k, channel i. The value ERDSi(k) indicates the relative power during the task. A negative value of ERDSi(k) indicates a power decrease during the stimulation in the frequency band of interest which is a desynchronization. A value of zero means no power change in the interested frequency band, i.e., there is no ERD/ERS phenomenon. Finally, a positive value signifies an increase of power, i.e., synchronization. Furthermore, the larger the ERD/ERS, the more apparent the ERS phenomenon is.

Different rhythms evoked in specific MI tasks involve different brain areas with different mental processes which may produce different brain patterns useful in a BCI. After calculating ERDS for all channels and classes, in order to investigate the EEG pattern changes of motor imageries for each frequency, we proposed an approach based on sample entropy – a modification of the approximation entropy introduced by Richman and Moorman (2000). Sample entropy is employed to measure the uncertainty of the next observation knowing m past observations and using a certain resolution r. In this approach, the non-contiguous bands consisting of sub-upper and sub-lower limits of the band-pass filter (e.g., if {4–6, 8–12} is a non-contiguous frequency band set, 4 and 8 are called sub-lower limits, 6 and 12 are sub-upper limits) is determined which is much more accurate than a single frequency band for quantifying the sensorimotor rhythms in the frequency domain. See Sample entropy in Appendix for the computation of Sample Entropy.

We wished to calculate the relevant electrode positions (spatial domain) for detection and classification of MI-related EEG patterns in the cortex, and therefore used the following algorithm on the training set to select the optimal channel set with the most reactive multiple frequency bands of the recordings.

As for the test set (also called evaluation data), the normalized EEG signals were filtered using the saved non-contiguous frequency components (including Pi,* and γi,*) for each channel i on the selected optimal channel set. It should be noted, for each channel the multiple frequency band set is made up of several non-contiguous sub-bands since some frequency components (stop frequency bands) were removed using notch filters. The size of the frequency bands set varies according to electrode placement, and it also varies from subject to subject. For each subject, multiple frequency bands were selected during training stage. Then the optimal channel set with the most reactive multiple frequency bands was applied to test recordings for the same subject.

Input: Normalized EEG segments (S) with all channels, and the corresponding header structure (H).

Output: Optimal channel set (O).

S1. Set the reference interval from 0 to 2 s (visual cue-onset at second 2).

S2. Calculate ERDSik for all channels (i) and classes (k) using Eq. 8 to estimate the power changes caused by MI in specific frequency components from 2 to 40 Hz using the BioSig toolbox3 [frequency borders = [2, 40] with 2 Hz bandwidth and in 1 Hz frequency step size, i.e., calculate for the segments: (1–3 Hz), (2–4 Hz), …, (39–41 Hz)]. The classical bandpower method of quantification of ERD/ERS (Pfurtscheller and Aranibar, 1977) was used.

S3. For each channel i

a. For each column of ERDSik (frequency component):

i Calculate SampEn using a sliding time window of width 2 s from 2.5 to 3.5 s for each class, respectively. This was done since ERD and ERS display some intra- and inter subject variability and are not restricted to 2 s time windows.

ii Calculate the significant frequency components with the 95% confidence interval by using the paired-sample t-test between each combination of two different SampEn which maximizes differences between two classes.

iii Construct a set of frequency components (fCi) in which the frequency of each component has a significant difference for all combinations.

b. For each component pi,jfCi:

i Set the upper limit to pi,j, and determine the stop frequency band set (Γi,j={ τ|1<τ<pi,j,τfCi }). Si passes through a five-order Butterworth low-pass (pi,j+ 1) filter and a notch filter for each stop frequency τ.

ii Calculate classification accuracy using the filtered signal (fSi,j), save fSi,* which has the best accuracy (Acci) for step S6(b), and save the frequency component (pi,*) with the corresponding stop frequency band set (Γi,*) which will be used in the Acci test session.

S4. Sort all EEG channels into a list in descending order according to Acci.

S5. Initialize the size of O:l = 0, and the current accuracy: Acc(l) = 0.

S6. For l = 1:number of EEG channels

a. l←l+1.

b. Calculate classification accuracy Acc(l) using fS1,*:fS1,*.

S7. Return the optimal channels set O(1:l) which presents the highest classification accuracy obtained by validation test on the training set.

Feature Extraction

We used feature extraction to find a suitable representation of the EEG recordings that can be simplified in the subsequent classification. There are a variety of feature extraction methods used in BCI systems (see e.g., Bashashati et al., 2007 for a review). Considering the non-stationary characteristics (rapidly varying over time and particularly across tasks) of MI EEG, two kinds of features were extracted in this study. One was based on the fact that a source active for one mental task is active with a different energy for another mental task (inter-class diversity). The other was based on the fact that motor tasks involve a succession of activations in different brain areas. This method has been applied successfully in Gouy-Pailler et al. (2008). In the current study, the data from the selected channel in the selected frequency bands were separated into 4 timeframes (0.5 s long) from t = 2.5 s to t = 4.5 s for feature extraction. Afterward, in order to reduce the dimension of the extracted feature, we used principle component analysis (PCA; i.e., eigenvectors with eigenvalues greater than one were chosen). We used a training set StrainT×N, where T and N′ denote the number of sampled time-points and the number of selected channels, respectively. We defined the eigenvectors of the covariance matrix of the training set as follows:

Vi={ PCA(S  itrain(t[ 2.5,4.5 ])), PCA(S  itrain(t[ 2.5:0.5:4.5 ]))|i[ 1,,N ] }.(9)

For the test set StestT×N, the feature projection matrix is determined by,



After the feature extraction, the feature vectors are subjected to a classifier. SVMs (Vapnik, 1995) introduced in 1995 are some of the most frequently used machine learning methods both for classification and regression, and have proven to be useful in EEG signal classification for MI and BCI applications (Ang et al., 2008; Noirhomme et al., 2008; Arvaneh et al., 2011). To verify our method, SVMs with radial basis kernel function as the classifier was employed by implementing the LIBSVM toolbox4 (Chang and Lin, 2001). Because of individual differences, training sets (session 1: A01T to A09T) from the nine subjects (Subject 1 to Subject 9) were used to training the SVM classifier. To limit the amount of over-fitting and reduce training time, we used a 10-fold cross-validation procedure, where 90% of all trials in each file were used for the training set, and the remaining trials were used for validation to determine performance. This was repeated 10 times for different partitions of the training set. After the classifiers had been trained from the training sets, they were applied to the data sets (session 2: A01E to A09E). Therefore, the channels and frequency bands were selected based on the cross-validation accuracy from the session 1.

The Performance Evaluation Methods

Classification accuracy, kappa score, and information transfer rate (ITR) in bits/trial were calculated for performance evaluation of the proposed method.

Classification Accuracy

Classification accuracy was measured according to (11) to evaluate the performance of the proposed methods. This criterion was also used for selecting electrode locations, and frequency bands in section III.


where Ncorrect is the number of correct classified samples, and Ntotal is the number of total samples to be classified (the test set).

Kappa Score

In order to compare our results with previous results reported by BCI Competition IV5, the kappa score as well as the classification accuracy were calculated. Cohen’s kappa score often simply called Kappa score is thought to be a robust statistical measure for qualitative categorical items. The value of the Kappa ranges between 1 and −1, where 1 corresponds to perfectly correct classification and −1 to completely erroneous classification, while a Kappa score of 0 corresponds to chance performance. The equation defined in Cohen (1960) is


where Pr(a) is the relative observed agreement among raters, and Pr(e) is the hypothetical probability of chance agreement, using the observed data to calculate the probabilities of each observer randomly saying each category.

Information Transfer Rate

Since speed for assessing this kind of non-invasive communication and control systems (BCIs) would be affected by the characteristics of the specific application which make the comparisons between different studies difficult, a common method which incorporates speed and accuracy in a single value is ITR, or bit rate (Wolpaw et al., 2002). This measure is calculated by,


where P is the classification accuracy, i.e., how well thoughts are recognized, and N is the number of mental tasks.

Results and Discussion

Although a visual inspection of the raw EEG data was performed by an expert (see Table 1), no trials with marked artifacts were removed in this study so that we could evaluate the system’s robustness and sensitivity to outliers and artifacts, as in Gouy-Pailler et al. (2010). Figure 3 plots the correlation between the number of artifact-contaminated trials and kappa score for the training and test sets, showing that kappa scores are influenced more by artifact-contaminated trials from the test sets than from the training sets.


Table 1. Summary of the number of artifact-contaminated trials for each subject.


Figure 3. Influence of artifacts: the number of artifact-contaminated trials influences the decoding of the test set (A) but not the training set (B).

Each trial lasted for 6 s, but not all time-points of this 6 s period carry information about the difference among the four MI tasks. Subjects were told to begin imagining after the execution cue was presented but they could have begun imagining right after the presentation of the preparation cue. Therefore, the EEG data from 0.5 to 2.5 s after the visual cue (i.e., from 2.5 to 4.5 s, see Figure 1) were used in this study. We used the exact same time window for the channel selection algorithm, feature extraction with PCA, and classification. The selected time segment was also used by the winner of the BCI competition IV dataset 2a (Ang et al., 2008). Since the frequency bands of interest vary from subject to subject (Pfurtscheller et al., 1997, 2000), we used a subject-specific strategy in this study.

Time-frequency maps can provide an overview of the activity over broad frequency ranges and electrode locations showing significant band power increases or decreases during MI tasks. Figure 4 shows an example of ERD/ERS map calculated from 22 EEG channels during imagery of movements of the left, right hand, tongue, or both feet. The maps cover the frequency range from 2 to 40 Hz, which is sufficient to detect important ERD/ERS patterns such as μ and β rhythms. The reference period was 0 to 2 s. It is not easy to detect differences through visual inspection even though the selected channels were highlighted by bold boxes. In order to quantify significant ERD/ERS changes, we used paired-sample t-tests to calculate the significant difference based on sample entropy among all frequency bands, thus selecting the optimal channels for discrimination among four MI tasks. Figure 5 compares the performance between the 0.5 and 40 Hz band, the [min(fCi)−1]−[max(fCi)+1] band, and the proposed non-contiguous frequency sub-band filter approach for each EEG channel across all nine subjects which demonstrates effectiveness of the non-contiguous approach proposed for most channels.


Figure 4. Time-frequency maps (ERD/ERS) relative to the baseline recorded seconds before the event for all 22 EEG channels during four-class motor imagery task. Event-related desynchronization (ERD) is plotted in red, while event-related synchronization (ERS) is in blue.


Figure 5. Comparison of kappa scores using three different filtering approaches.

Optimum Channel Selection

EEG signals are electrical activity recorded from multiple electrodes placed on the surface of scalp and generally, not all signals from all electrodes are related to the desired task. This means that each channel makes a particular contribution to the discrimination between BCI tasks. Some are highly discriminative, some low. In addition, when we consider computational complexity and time costs, some channels should be discarded. In our channel selection algorithm, two input parameters must be specified to compute sample entropy. One is the embedding dimensions m, the other is the resolution tolerance r. Although both are critical in determining the outcome of this method for entropy estimation, no guidelines exist for optimizing their values. The various existing rules generally lead to the use of values of r between 0.1 and 0.25 and values of m of 1 or 2 for data records of length N ranging from 100 to 5,000 data points (Pincus, 1991; Lake et al., 2002). In our experiment, parameter values m = 2 and r = 0.2 times the standard deviation of the original data sequence were chosen.

According to Table 2, the optimum number of selected channels is in the range of 3–12, with an overall mean of eight which significantly reduces the number of channels from 22. The number of channels for subject 5 and 8 are 3, the smallest number amongst the subjects, while the largest number of selected channels is 12 for subject 6. Figure 6 gives correlation between the selected channel set size and kappa score. It shows kappa score is influenced by the size of the channel set. Moreover, the results in Table 3 shows that the third stage (channel selection) yields superior averaged test accuracies of 71.83 ± 6.76% and 64.76 ± 6.85% for session 1 and session 2 with the use of only 3–12 from 22 channels.


Table 2. Summary of selected channels with the highest classification accuracy (Acc) for each subject.


Table 3. Comparison of 10-fold cross-validation classification accuracy (%) for each processing stage presented in this paper for each subject on the training set (session 1) and test set (session 2).


Figure 6. Correlation between the selected channel set size and kappa score.

10-Fold Cross-Validation for Each Session Data

In order to evaluate each stage’s performance of the processing framework, we used a 10-fold cross-validation on the two sessions (i.e., the training set and the test set which were recorded on two different days), respectively. The detailed view of classification accuracies of all subjects for each stage is summarized in Table 3. As seen from this table, average accuracy improves at each stage across subjects for each session. For session 1 and session 2, the proposed artifact correction algorithm yielded an average improvement of 28.1 and 15.8% in classification accuracy. Similarly, the proposed channel selection algorithm yielded an average improvement of 11.2 and 8.5% in the classification accuracy. From Table 3, we can see that the proposed artifact correction algorithm and the channel/frequency band selection algorithm yielded better performance which show these two stages made the biggest contribution to our method. The parameters of the algorithm were estimated only on the session 1 and were used on the following session-to-session transfer test.

Comparison to Previous Work

To compare with the results of the winners of dataset 2a of BCI Competition IV, we used session-to-session transfer using the same criterion, namely, the kappa score. The procedure of this evaluation method is greatly simplified, using the first session (A01T to A09T) as training data (also called calibration data) to find the optimal parameters and then to apply the procedure to unseen data (also called evaluation data; session 2: A01E to A09E) to test performance. This is the most meaningful performance measure in actual BCI experiments. Table 4 summarizes the comparison of the kappa score of the proposed method with the existing multi-class methods for each subject on the dataset 2a of BCI competition IV. It can be seen that our proposed method (FSDE) without artifact removal performed comparably to the best competitors5 and Gouy-Pailler et al. (2010). Our experimental results also found that frequency 1 Hz is important only for subject 5 and 6. That is to say, if the signals from subject 5 and 6 were filtered above 1 Hz, the kappa scores were very low. We believe that this is why the three best competitions 1st–3st had the lowest results for these two subjects (see Table 4). For the competition winner, signals were band-pass filtered into multiple frequency bands (4–8, 8–12, …, 36–40 Hz), for the competition second ranked, signals were band-pass filtered between 8 and 30 Hz, and the authors of the third ranked paper filtered signals in an 8–25 Hz band.


Table 4. Comparison of session-to-session transfer performance for each subject.

Bit rate is an objective measure for measuring improvement in a BCI and for comparing different BCIs (Wolpaw et al., 2002). Bit rate for four different choices is shown as bits/trial in Figure 7 for each subject. By comparing Table 4 and Figure 7, it can be seen that the bit rate is in direct proportion to kappa score.


Figure 7. Information transfer rate (ITR) obtained for all nine subjects.

Performance Estimation for Stage Removal and Artifact Removal

To further evaluate each stage’s performance of the processing framework separately, we reported kappa scores for each subject on the test set by removing the different stages rather than adding them (as done in Table 3). Figure 8 shows that stages 2–5 contribute to the final performance of the whole framework, especially stage 2. Furthermore, we present the classification performances comparison of the whole processing framework with and without artifact removal in Figure 9. We observed that there is only a slight improvement in classification performance in subject1, 3, 5, and 9 when removing artifact-contaminated trials from training and test set. There was no significant performance difference between methods with and without artifact removal.


Figure 8. Comparison of the performance measures (kappa scores) for different processing stage removal across all subjects on the test set (session 2).


Figure 9. Comparison of the performance measures (kappa scores) for the whole processing framework with and without artifact removal for subject 1 to 9 on the test set (session 2).


Implementing information exchange between humans and machines through the use of EEG signals is one of the biggest challenges in signal processing and biomedical engineering and one of the fundamental issues is the proper interpretation of EEG signals (Kolodziej et al., 2010).

This paper illustrates how the proposed processing framework decodes the EEG signal for multi-class mental tasks. Our robust brain-computer interface processing framework FSDE includes five stages. Basically, we focused on two challenges for online EEG decoding. One was the EOG artifact correction and other artifacts separation. Considering raw EEG signals contaminated by not only EOG artifacts but also artifacts generated by other sources, we extended the work of Schlögl et al. (2007), proposed our new computational model, and implemented the automatic artifact correction method for recovering the original source signals. The other was the channel selection based on the reactive non-contiguous discriminating frequency sub-bands instead of setting a broad frequency range which was proposed by Pfurtscheller and Lopes da Silva (1999). We did not find any similar studies in the literatures. In Arvaneh et al. (2011), the EEG data were band-pass filtered using a manually selecting frequency range, i.e., 8–35 Hz. In Ang et al. (2008), the multi-channel EEG signals were first band-pass-filtered into multiple frequency bands (4–8 Hz, 8–12 Hz, …, 36–40 Hz), then the authors extracted common spatial patterns (CSP) features from each of these bands. Compared to these existing methods, our method introduced an automatic selection of subject-specific reactive frequency sub-bands through the training session and we confirmed that non-contiguous band filtering approach provides a much more accurate way to quantify the sensorimotor rhythms in the frequency domain. During training, the method was computationally intensive, but there was almost no computational time cost in test session. In BCI applications, ITR is used for evaluating the system performance. And we did obtain good performances that were comparable to that of the winner of the competitions but using different methods. We would also like to point out that our method has another clear advantage over previous methods. Indeed, from a Neuroscience point of view, it is often of interest to find which brain signals can explain behavior. Our method automatically provides the set of signals (electrodes and frequency bands) that lead to the best prediction of behavior. This is extremely valuable for experimental research involving EEG.

The proposed method was evaluated using a publicly available dataset of BCI competition. Using the same criterion (i.e., kappa score), the overall four-class kappa values were comparable to other models but without requiring any artifact-contaminated trial removal. The performance also showed that multi-class MI tasks can be reliably discriminated using a few selected channels. This may be a promising avenue for fast online and robust EEG-based BCI applications.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


We thank the organizers of the BCI Competition IV (2008) for providing the competition datasets. We would also like to express our gratitude to Dr. Aarlenne Zein Khan and the anonymous reviewers for their contribution to the manuscript. This work was supported by the National Natural Science Foundation of China (No. 61075056, No. 60970061, No. 61103067), the Fundamental Research Funds for the Central Universities, the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada Foundation for Innovation (CFI), and the Ontario Research Fund (ORF). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.



Alkadhi, H., Brugger, P., Boendermaker, S. H., Crelier, G., Curt, A., Hepp-Reymond, M. C., et al. (2005). What disconnection tells about motor imagery: evidence from paraplegic patients. Cereb. Cortex 15, 131–140.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Ang, K. K., Chin, Z. Y., Zhang, H., and Guan, C. (2008). “Filter bank common spatial pattern (FBCSP) in brain-computer interface,” in Proceeding of IJCNN’08, Hong Kong, China, 2390–2397.

Arvaneh, M., Guan, C. T., Ang, K. K., and Quek, C. (2011). Optimizing the channel selection and classification accuracy in EEG-Based BCI source. IEEE Trans. Biomed. Eng. 58, 1865–1873.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Bashashati, A., Fatourechi, M., Ward, R. K., and Birch, G. E. (2007). A survey of signal processing algorithms in brain-computer interfaces based on electrical brain signals. J. Neural. Eng. 4, R32–R57.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Brunner, C., Leeb, R., Müller-Putz, G. R., Schlögl, A., and Pfurtscheller, G. (2008). BCI Competition 2008 – Graz data set A. Available at:

Chang, C. C., and Lin, C. J. (2001). Libsvm: A Library for Support Vector Machines. Available at:

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20, 37–46.

CrossRef Full Text

Doud, A. J., Lucas, J. P., Pisansky, M. T., and He, B. (2011). Continuous three-dimensional control of a virtual helicopter using a motor imagery based brain-computer interface. PLoS ONE 6, e26322. doi:10.1371/journal.pone.0026322

CrossRef Full Text

Fatourechi, M., Bashashati, A., Ward, R. K., and Birch, G. E. (2007). EMG and EOG artifacts in brain computer interface systems: a survey. Clin. Neurophysiol. 118, 480–494.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Gouy-Pailler, C., Congedo, M., Brunner, C., Jutten, C., and Pfurtscheller, G. (2008). “Multi-class independent common spatial patterns: exploiting energy variations of brain sources,” in Proceeding of the 4rd International Brain–Computer Interface Workshop and Training Course, Graz, Austria.

Gouy-Pailler, C., Congedo, M., Brunner, C., Jutten, C., and Pfurtscheller, G. (2010). Nonstationary brain source separation for multiclass motor imagery. IEEE Trans. Biomed. Eng. 57, 469–478.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Hallez, H., De Vos, M., Vanrumste, B., Van Hese, P., Assecondi, S., Van Laere, K., et al. (2009). Removing muscle and eye artifacts using blind source separation techniques in ictal EEG source imaging. Clin. Neurophysiol. 120, 1262–1272.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Hoffmann, U., Vesin, J. M., Ebrahimi, T., and Diserens, K. (2008). An efficient P300-based brain-computer interface for disabled subjects. J. Neurosci. Methods 167, 115–125.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Hyvärinen, A. (1999). Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. Neural. Netw. 10, 626–634.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Hyvarinen, A., Karhunen, J., and Oja, E. (2001). Independent Component Analysis. New York: John Wiley and Sons.

Hyvarinen, A., and Oja, E. (1997). A fast fixed-point algorithm for independent component analysis. Neural. Comput. 9, 1483–1462.

CrossRef Full Text

Joyce, C. A., Gorodnitsky, I. F., and Kutas, M. (2004). Automatic removal of eye movement and blink artifacts from EEG data using blind component separation. Psychophysiology 41, 313–325.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Kolodziej, M., Majkowski, A., and Rak, R. J. (2010). Matlab FE-Toolbox – an universal utility for feature extraction of EEG signals for BCI realization. Prz. Elektrotechniczny 86, 44–46.

Lake, D. E., Richman, J. S., Griffin, M. P., and Moorman, J. R. (2002). Sample entropy analysis of neonatal heart rate variability. Am. J. Physiol. Regul. Integr. Comp. Physiol. 283, R789–R797.

Pubmed Abstract | Pubmed Full Text

Lew, E., Nuttin, M., Ferrez, P. W., Degeest, A., Buttfield, A., Vanacker, G., et al. (2006). “Noninvasive brain–computer interface for mental control of a simulated wheelchair,” in Proceeding of the 3rd International Brain–Computer Interface Workshop and Training Course, Graz, Austria.

Millán, JdR., Renkens, F., Mouriño, J., and Gerstner, W. (2004). Noninvasive brain actuated control of a mobile robot by human EEG. IEEE Trans. Biomed. Eng. 51, 1026–1033.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Nakayama, K., and Inagaki, K. (2006). “A brain computer interface based on neural network with efficient pre-processing,” in Proceeding of ISPACS ‘06, Yonago, Japan, 673–676.

Noirhomme, Q., Kitney, R. I., and Macq, B. (2008). Single-trial EEG source reconstruction for brain-computer interface. IEEE Trans. Biomed. Eng. 55, 1592–1601.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Pfurtscheller, G. (1977). Graphical display and statistical evaluation of event-related desynchronization (ERD). Electroencephalogr. Clin. Neurophysiol. 43, 757–760.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Pfurtscheller, G. (1992). Event-related synchronization (ERS): an electrophysiological correlate of cortical areas at rest. Electroencephalogr. Clin. Neurophysiol. 83, 62–69.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Pfurtscheller, G., and Aranibar, A. (1977). Event-related cortical desynchronization detected by power measurements of scalp EEG. Electroencephalogr. Clin. Neurophysiol. 42, 817–826.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Pfurtscheller, G., and Aranibar, A. (1979). Evaluation of event-related desynchronization (ERD) preceding and following voluntary self-paced movement. Electroencephalogr. Clin. Neurophysiol. 46, 138–146.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Pfurtscheller, G., and Lopes da Silva, F. H. L. (1999). Event-related EEG/MEG synchronization and desynchronization: basic principles. Clin. Neurophysiol. 110, 1842–1857.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Pfurtscheller, G., Neuper, C., Flotzinger, D., and Pregenzer, M. (1997). EEG-based discrimination between imagination of right and left hand movement. Electroencephalogr. Clin. Neurophysiol. 103, 642–651.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Pfurtscheller, G., Neuper, C., Guger, C., Harkam, W., Ramoser, H., Schlögl, A., et al. (2000). Current trends in Graz brain-computer interface (BCI) research. IEEE Trans. Rehabil. Eng. 8, 216–219.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Pincus, S. M. (1991). Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci. U.S.A. 88, 2297–2301.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Richman, J. S., and Moorman, J. R. (2000). Physiological time-series analysis using approximate entropy and sample entropy. Am. J. Physiol. Heart Circ. Physiol. 278, H2039–H2049.

Pubmed Abstract | Pubmed Full Text

Schlögl, A., Keinrath, C., Zimmermann, D., Scherer, R., Leeb, R., and Pfurtscheller, G. (2007). A fully automated correction method of EOG artifacts in EEG recordings. Clin. Neurophysiol. 118, 98–104.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. New York: Springer-Verlag.

Wolpaw, J. R., Birbaumer, N., McFarland, D. J., Pfurtscheller, G., and Vaughan, T. M. (2002). Brain-computer interfaces for communication and control. Clin. Neurophysiol. 113, 767–791.

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Zhou, W. D., and Gotman, J. (2009). Automatic removal of eye movement artifacts from the EEG using ICA and the dipole model. Prog. Nat. Sci. 19, 1165–1170.

CrossRef Full Text



Independent component analysis is a method for determining underlying factors or components from multivariate statistical data. What distinguishes ICA from other methods is that it looks for components that are both statistically independent, and non-Gaussian (Hyvarinen et al., 2001). The FastICA algorithm (Hyvarinen and Oja, 1997; Hyvärinen, 1999) is a commonly used ICA algorithm which uses a fixed-point iteration scheme to extract independent components by separately maximizing the negentropy of each mixture. It is a computationally more efficient method for performing the estimation of ICA than conventional gradient descent methods for ICA. To estimate W, these steps were followed:

1. Calculate the mean M of the data Y for each row.

2. YY-M.

3. Calculate PCA of Y, returns the eigenvector (E) and diagonal eigenvalue (D) matrices.

4. Calculate the whitening matrices WM:WMD-12ET.

5. Whiten Y:YWMY.

6. Begin calculating the ICA using Hyvarinen’s fixed-point algorithm (Hyvarinen and Oja, 1997; Hyvärinen, 1999) for each channel:

a. Initialize the weight matrix ω and set ε;

b. Repeat

(a) Update the function g, which is the derivative of the function G used in the general contrast function: ωnew+EygωoldTy-EgωoldTyωold.

(b) Normalize: ωnewωnew+ωnew+.

(c) Until ωnew-ωold<ε.

7. Return W = [ω1, ω2, …, ωn]T, where n denotes the number of channels.

Sample Entropy

For a given one-dimensional time series of T points: S = {s(1), s(2), …, s(T)}, computation of SampEn is shown in the following steps:

1. Change S into T–m + 1 vectors:

Sm(i)=s(i),s(i+1),,s(i+m-1) for i = 1, …, T m + 1, where m denotes the length of sequences to be compared.

2. Define the distance between each of two vectors as: dSm(i),Sm(j)=max0km-1|s(i+k)-s(j+k)| for i,j = 1, …, Nm, and i = j.

3. Given r (the tolerance for accepting matches), Bim(r) is defined as 1/(T – m) times the number of vectors Sm(j) falling within vector distance r of Sm(i), 1 ≤ j ≤ T – m; j i,



4. Similarly, calculate Bim+1(r) which is defined as 1/(T – m – 1)times the number of vectors Sm+1(j) falling within vector distance r of Sm+1(i), 1(i), 1 ≤ j ≤ T – m – 1,



5. Calculate the Sample entropy (SampEn) as


Keywords: electroencephalogram, brain-computer interface, multi-class motor imagery, artifact processing, EEG channel selection

Citation: Wang D, Miao D and Blohm G (2012) Multi-class motor imagery EEG decoding for brain-computer interfaces. Front. Neurosci. 6:151. doi: 10.3389/fnins.2012.00151

Received: 29 May 2012; Accepted: 19 September 2012;
Published online: 09 October 2012.

Edited by:

Cuntai Guan, Institute for Infocomm Research, Singapore

Reviewed by:

Fabien Lotte, Institut National de Recherche en Informatique et en Automatique, France
Alireza Mousavi, Brunel University, UK

Copyright: © 2012 Wang, Miao and Blohm. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.

*Correspondence: Deng Wang, Department of Computer Science and Technology, Tongji University, 4800 Cao’an Road, Jiading, Shanghai 201804, China. e-mail: