# **TOWARDS A NEW COGNITIVE NEUROSCIENCE: MODELING NATURAL BRAIN DYNAMICS**

**Topic Editors Klaus Gramann, Tzyy-Ping Jung, Daniel P. Ferris, Chin-Teng Lin and Scott Makeig**

# HUMAN NEUROSCIENCE

#### *FRONTIERS COPYRIGHT STATEMENT*

© Copyright 2007-2014 Frontiers Media SA. All rights reserved.

All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

**ISSN** 1664-8714 **ISBN** 978-2-88919-271-7 **DOI** 10.3389/978-2-88919-271-7

# **TOWARDS A NEW COGNITIVE NEUROSCIENCE: MODELING NATURAL BRAIN DYNAMICS**

Topic Editors:

**Klaus Gramann,** Berlin Institute of Technology, Germany **Tzyy-Ping Jung,** University of California San Diego, USA **Daniel P. Ferris,** University of Michigan, USA **Chin-Teng Lin,** National Chiao-Tung University, Taiwan **Scott Makeig,** University of California San Diego, USA

The picture illustrates a 'hand mirroring' MoBI experiment with participants following each other's hand movements. Participants wear high density EEG synchronized to motion capture of their arms, hands, and heads.

Decades of brain imaging experiments have revealed important insights into the architecture of the human brain and the detailed anatomic basis for the neural dynamics supporting human cognition. However, technical restrictions of traditional brain imaging approaches including functional magnetic resonance imaging (fMRI), positron emission tomography (PET), and magnetoencephalography (MEG) severely limit participants' movements during experiments. As a consequence, our knowledge of the neural basis of human cognition is rooted in a dissociation of human cognition from what is arguably its foremost, and certainly its evolutionarily most determinant function: organizing our behavior so as to optimize its consequences in our complex, multi-scale, and ever-changing environment. The concept of natural cognition, therefore, should not be separated from our fundamental experience and role as embodied agents acting in a complex, partly unpredictable world.

To gain new insights into the brain dynamics supporting natural cognition, we must overcome restrictions of traditional brain imaging technology. First, the sensors used must be lightweight and mobile to allow monitoring of brain activity during free participant movements. New hardware technology for electroencephalography (EEG) and near infrared spectroscopy (NIRS) allows recording electrical and hemodynamic brain activity while participants are freely moving. New data-driven analysis approaches must allow separation of signals arriving at the sensors from the brain and from non-brain sources (neck muscles, eyes, heart, the electrical environment, etc.). Independent component analysis (ICA) and related blind source separation methods allow separation of brain activity from non-brain activity from data recorded during experimental paradigms that stimulate natural cognition. Imaging the precisely timed, distributed brain dynamics that support all forms of our motivated actions and interactions in both laboratory and real-world settings requires new modes of data capture and of data processing. Synchronously recording participants' motor behavior, brain activity, and other physiology, as well as their physical environment and external events may be termed mobile brain/body imaging ('MoBI'). Joint multi-stream analysis of recorded MoBI data is a major conceptual, mathematical, and data processing challenge.

This Research Topic is one result of the first international MoBI meeting in Delmenhorst, Germany, in September 2013. During an intense workshop, researchers from all over the world presented their projects and discussed new technological developments and challenges of this new imaging approach. Several of the presentations are compiled in this Research Topic, which we hope may inspire new research using the MoBI paradigm to investigate natural cognition by recording and analyzing the brain dynamics and behavior of participants performing a wide range of naturally motivated actions and interactions.

# Table of Contents

*Toward a New Cognitive Neuroscience: Modeling Natural Brain Dynamics*

Klaus Gramann, Tzyy-Ping Jung, Daniel P. Ferris, Chin-Teng Lin and Scott Makeig

*Methodological Aspects of EEG and Body Dynamics Measurements During Motion*

Pedro M. R. Reis, Felix Hebenstreit, Florian Gabsteiger, Vinzenz von Tscharner and Matthias Lochmann

*Assessing the Quality of Steady-State Visual-Evoked Potentials for Moving Humans Using a Mobile Electroencephalogram Headset*

Yuan-Pin Lin, Yijun Wang, Chun-Shu Wei and Tzyy-Ping Jung

*A Comparison of Geometric- and Regression-Based Mobile Gaze-Tracking*

Björn Browatzki, Heinrich H. Bülthoff and Lewis L. Chuang

*MoBILAB: An Open Source Toolbox for Analysis and Visualization of Mobile Brain/Body Imaging Data*

Alejandro Ojeda, Nima Bigdely-Shamlo and Scott Makeig

*Pervasive Brain Monitoring and Data Sharing Based on Multi-Tier Distributed Computing and Linked Data Technology*

John K. Zao, Tchin-Tze Gan, Chun-Kai You, Cheng-En Chung, Yu-Te Wang, Sergio José Rodríguez Méndez, Tim Mullen, Chieh Yu, Christian Kothe, Ching-Teng Hsiao, San-Liang Chu, Ce-Kuen Shieh and Tzyy-Ping Jung

*Neuroergonomics: A Review of Applications to Physical and Cognitive Work*

Ranjana K. Mehta and Raja Parasuraman

*Continuous Monitoring of Brain Dynamics With Functional Near Infrared Spectroscopy as a Tool for Neuroergonomic Research: Empirical Examples and a Technological Development*

Hasan Ayaz, Banu Onaral, Kurtulus Izzetoglu, Patricia A. Shewokis, Ryan McKendrick and Raja Parasuraman

*Kinesthetic and Vestibular Information Modulate Alpha Activity During Spatial Navigation: A Mobile EEG Study*

Benedikt V. Ehinger, Petra Fischer, Anna L. Gert, Lilli Kaufhold, Felix Weber, Gordon Pipa and Peter König

*It's How You Get There: Walking Down a Virtual Alley Activates Premotor and Parietal Areas*

Johanna Wagner, Teodoro Solis-Escalante, Reinhold Scherer, Christa Neuper and Gernot Müller-Putz

*Neural Decoding of Expressive Human Movement From Scalp Electroencephalography (EEG)*

Jesus G. Cruz-Garza, Zachery R. Hernandez, Sargoon Nepaul, Karen K. Bradley and Jose L. Contreras-Vidal

*Linking Motor-Related Brain Potentials and Velocity Profiles in Multi-Joint Arm Reaching Movements*

Julià L. Amengual, Josep Marco-Pallarés, Carles Grau, Thomas F. Münte and Antoni Rodríguez-Fornells

*From Speech to Thought: The Neuronal Basis of Cognitive Units in Non-Experimental, Real-Life Communication Investigated Using ECoG*

Johanna Derix, Olga Iljina, Johanna Weiske, Andreas Schulze-Bonhage, Ad Aertsen and Tonio Ball

### Toward a new cognitive neuroscience: modeling natural brain dynamics

#### *Klaus Gramann<sup>1,2</sup>\*, Tzyy-Ping Jung<sup>3,4,5</sup>, Daniel P. Ferris<sup>6,7</sup>, Chin-Teng Lin<sup>8,9</sup> and Scott Makeig<sup>10</sup>*

*<sup>1</sup> Psychology and Ergonomics, Biological Psychology and Neuroergonomics, Berlin Institute of Technology, Berlin, Germany*


#### *Edited and Reviewed by:*

*John J. Foxe, Albert Einstein College of Medicine, USA*

**Keywords: mobile brain/body imaging, EEG, fNIRS, brain mapping, embodied cognition, natural cognition, wireless EEG sensors, computational neurosciences**

Decades of brain imaging experiments have revealed important insights into the architecture of the human brain and the detailed anatomic basis for the neural dynamics supporting human cognition. However, technical restrictions of traditional brain imaging approaches including functional magnetic resonance imaging (fMRI), positron emission tomography (PET), and magnetoencephalography (MEG) severely limit participants' movements during experiments (Makeig et al., 2009). As a consequence, our knowledge of the neural basis of human cognition is rooted in a dissociation of human cognition from what is arguably its foremost, and certainly its most evolutionarily determinant function: organizing our behavior so as to optimize its consequences in our complex, multi-scale, and ever-changing environment. The concept of *natural cognition*, therefore, should not be separated from our fundamental experience and role as embodied agents acting in a complex, partly unpredictable world.

Gaining new insights into the brain dynamics supporting natural cognition requires overcoming restrictions of traditional brain imaging technologies (Gramann et al., 2011). First, the sensors must be lightweight and untethered to allow monitoring of brain activity during free movements. Fortunately, new electroencephalography (EEG) and near infrared spectroscopy (NIRS) sensors and sensing devices allow recording both electrical and hemodynamic brain and body activity while participants are freely moving (Lin et al., 2011; Liao et al., 2012; Ayaz et al., 2013). Second, new data-driven analysis approaches must separate brain-generated signals arriving at the sensors from those generated by non-brain sources such as neck muscles, eyes, heart, and the electrical environment (Makeig et al., 2004). Independent component analysis (ICA) and related blind source separation methods have proven effective for separating brain from non-brain activities in electrophysiological data recorded during experimental paradigms that stimulate natural cognition (Gramann et al., 2014). ICA has also proven valuable for separating other multi-channel signals including electromyographic (EMG) and electrocardiographic (ECG) activities (Gramann et al., 2010; Gwin et al., 2010; Kline et al., 2014).

Adequate study of natural cognition also requires synchronous recording of participants' motor actions as well as the physical environment and external events influencing cognition. Recording what the brain does (via EEG and fNIRS brain imaging), what it senses (via scene and event recording), and what it organizes (via motor, ocular, and autonomic activity recording) may be termed *mobile brain/body imaging* ("MoBI"). Technically, recording MoBI data is now possible at reasonable cost and convenience. However, joint multi-stream analysis of the data recorded in MoBI paradigms presents major conceptual, mathematical, and data processing challenges (Ojeda et al., 2014).

To overcome restrictions of established brain imaging methods and to facilitate further development of mobile brain/body imaging, a group of researchers from all over the world gathered at the scientific retreat of the Hanse-Wissenschaftskolleg in Delmenhorst, Germany, in September 2013 for the first international meeting on Mobile Brain/Body Imaging. During a stimulating and intense workshop, attendees presented and discussed the newest developments in mobile brain imaging technologies, novel software architectures for recording and analyzing multidimensional data streams, and other topics relevant to MoBI. Most attendees at the Delmenhorst meeting contributed to this Research Topic; other research groups have added contributions sharing related ideas. The present Research Topic thus provides an overview of the current state of the art in mobile brain/body imaging. The topics cover the three main pillars of MoBI research, i.e., hardware for imaging mobile brain and body dynamics, software to record and analyze complex multi-dimensional data streams, and applications of MoBI to such diverse fields as neuroergonomics, gait rehabilitation, spatial cognition, and dance.

Starting with the technical aspects of MoBI, Reis et al. (2014) provide an overview of existing hardware and software solutions for MoBI recordings. Focusing on new sensor technology and analysis approaches, Lin et al. report a test of a new mobile EEG headgear using steady-state visual-evoked potentials in participants during treadmill walking (Lin et al., 2014). With the aim of including gaze tracking as an important information channel for investigations of natural cognition and associated brain dynamics, Browatzki and colleagues describe and compare two different approaches to measuring eye movements in mobile participants (Browatzki et al., 2014).

The second pillar of MoBI, software frameworks for recording and analyzing multi-modal imaging data, is addressed by Ojeda and colleagues, who describe a new open source toolbox (Ojeda et al., 2014). MoBILAB interoperates with EEGLAB (Delorme and Makeig, 2004) and allows for analysis and visualization of multidimensional mobile brain/body imaging data. Zao et al. (2014) introduce a new perspective on distributed computing, describing a novel network system approach to remote monitoring of brain/body activity of one or many mobile participants.

The majority of contributions to this Research Topic can be summarized under the pillar of MoBI applications. The review by Mehta and Parasuraman (2013) provides an overview of the advantages and disadvantages of existing imaging modalities in the area of neuroergonomics, describing their differing temporal and spatial resolutions and the degree of immobility that each brain imaging method imposes on participants. Ayaz et al. (2013) describe the development and application of a mobile fNIRS device for investigating changes in workload in real operating environments, providing an example of mobile recordings of hemodynamics. The first investigation of kinesthetic and vestibular information processing in actively navigating participants is given by Ehinger et al. (2014). The authors dissociate the brain dynamics underlying different proprioceptive senses during movements. Wagner et al. (2014) use MoBI to describe the cortical networks activated during robot-assisted walking and investigate the potential impact of movement-related feedback for gait rehabilitation. Cruz-Garza and colleagues investigate professional dancers during different whole body movements and derive distinct expressive qualities of movement from surface EEG (Cruz-Garza et al., 2014). While the previous studies used whole body movement, Amengual and colleagues describe the brain dynamics associated with the preparation and execution of multi-joint self-paced arm movements (Amengual et al., 2014). Finally, Derix et al. elucidate the neuronal basis of mental processes during natural communication based on electrocorticography in pre-neurosurgical patients (Derix et al., 2014).

All contributions in this Research Topic go beyond the state of the art in brain imaging and provide new approaches to recording and analyzing multi-modal data. The authors describe new insights into the neural basis of cognitive processes beyond traditional laboratory research. We hope this Research Topic may inspire new research that uses the MoBI paradigm to investigate natural cognition by recording and analyzing brain dynamics and behavior of participants performing a wide range of naturally motivated actions and interactions.

#### **REFERENCES**


Wagner, J., Solis-Escalante, T., Scherer, R., Neuper, C., and Muller-Putz, G. (2014). It's how you get there: walking down a virtual alley activates premotor and parietal areas. *Front. Hum. Neurosci.* 8:93. doi: 10.3389/fnhum.2014.00093

Zao, J. K.-K., Gan, T.-T., You, C.-K., Rodríguez Méndez, S. J., Chung, C.-E., Wang, Y.-T., et al. (2014). Pervasive brain monitoring and data sharing based on multi-tier distributed computing and linked data technology. *Front. Hum. Neurosci.* 8:370. doi: 10.3389/fnhum.2014.00370

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 May 2014; accepted: 02 June 2014; published online: 19 June 2014. Citation: Gramann K, Jung T-P, Ferris DP, Lin C-T and Makeig S (2014) Toward a new cognitive neuroscience: modeling natural brain dynamics. Front. Hum. Neurosci. 8:444. doi: 10.3389/fnhum.2014.00444*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Gramann, Jung, Ferris, Lin and Makeig. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Methodological aspects of EEG and body dynamics measurements during motion

#### *Pedro M. R. Reis<sup>1</sup>\*, Felix Hebenstreit<sup>2</sup>, Florian Gabsteiger<sup>2</sup>, Vinzenz von Tscharner<sup>3</sup> and Matthias Lochmann<sup>1</sup>*

*<sup>1</sup> Department of Sports and Exercise Medicine, Institute of Sport Science and Sport, Friedrich-Alexander-University Erlangen-Nuremberg, Erlangen, Germany*

*<sup>2</sup> Digital Sports Group, Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-University Erlangen-Nuremberg, Erlangen, Germany*

*<sup>3</sup> Human Performance Laboratory, Faculty of Kinesiology, University of Calgary, Calgary, AB, Canada*

#### *Edited by:*

*Klaus Gramann, Berlin Institute of Technology, Germany*

#### *Reviewed by:*

*Peter König, University of Osnabrück, Germany Pierpaolo Busan, University of Trieste, Italy*

#### *\*Correspondence:*

*Pedro M. R. Reis, Department of Sports and Exercise Medicine, Institute of Sport Science and Sport, Friedrich-Alexander-University Erlangen-Nuremberg, Gebbertstr 123B, D-91058 Erlangen, Germany e-mail: Pedro.Reis@FAU.de*

EEG involves the recording, analysis, and interpretation of voltages recorded on the human scalp which originate from brain gray matter. EEG is one of the most popular methods of studying and understanding the processes that underlie behavior. This is because EEG is relatively cheap, easy to wear, lightweight and has high temporal resolution. Behavior encompasses actions, such as movements, that are performed in response to the environment. However, methodological difficulties such as movement artifacts can occur when recording EEG during movement. Thus, most studies of the human brain have examined activations during static conditions. This article attempts to compile and describe relevant methodological solutions that have emerged for measuring body and brain dynamics during motion. These descriptions cover suggestions on how to avoid and reduce motion artifacts, as well as hardware, software and techniques for synchronously recording EEG, EMG, kinematics, kinetics, and eye movements during motion. Additionally, we present various recording systems, EEG electrodes, caps and methods for determining real/custom electrode positions. We conclude that it is possible to record and analyze synchronized brain and body dynamics related to movement or exercise tasks.

**Keywords: electroencephalography, methodology, hardware and software, movement and exercise, artifacts reduction, electrodes digitalization**

#### **1. INTRODUCTION**

Eighty-four years have passed since Hans Berger recorded the first human electroencephalogram, marking the creation of EEG (Berger, 1929; La Vaque, 1999). Methods and applications have come a long way since then. Indeed, clinicians and researchers nowadays use EEG in the management of epilepsy, monitoring of coma patients, investigation of stroke, sleep dysfunction studies, machine control and sports performance, amongst others. This method is often preferred to others because it is relatively cheap, easy to wear, lightweight and has a high temporal resolution. In contrast, other methods such as functional Magnetic Resonance Imaging (fMRI) have low temporal resolution, are more expensive and cannot be used to study participants while they are moving. Thus, EEG became one of the most used methods for inspecting and understanding the processes from which behavior originates.

Behavior includes all actions that beings perform in their environment, and these include motion (Vanderwolf, 2007). Makeig et al. (2009) proposed the development of methods for the investigation of brain dynamics during human motion in several dimensions and the development of wearable mobile brain/body imaging (MoBI) methodology. The authors additionally proposed the creation of analysis methods that can model the relationships between the recorded dimensions. The development of such methods will enable researchers to investigate a person's simultaneously recorded brain electric activity, muscle myoelectric activity, movements in 3D space, video, and audio recordings, thus enabling the simultaneous study of interactions between brain and body dynamics during motion and behavior.

Understanding brain-muscle interactions is beneficial for assessing degenerative diseases and impairments of motion, designing and optimizing neuro-rehabilitation therapies, human brain machine control, human performance optimization and other applications. However, clinicians and scientists considered EEG excessively artifact prone, and hence incapable of yielding analyzable recordings during motion. Consequently, researchers avoided using EEG recordings in movement studies and preferred indirect methods involving imagery or small limb movements to study brain activity during motion (Salenius et al., 1997; Dobkin et al., 2004; Schaal et al., 2004; Zehr and Duysens, 2004).

EEG recordings use either invasive electrodes (iEEG or ECoG) or surface electrodes (sEEG). Owing to the fact that iEEG involves direct contact with the brain, its signal to noise ratio is much higher than that of surface EEG. Nevertheless, iEEG involves surgery (craniotomy) to place an electrode grid on a small portion of the brain surface. This limits the source area from which the system and experts can obtain information, and the surgery can cause post-surgery problems for the subject. Further, due to ethical considerations the surgery must be indicated for the benefit of the patient. Thus it nearly always involves preparation for surgery of epileptic patients. Therefore, in general, iEEG is impractical for EEG in motion research in most populations. Hence, this paper focuses on spatial resolution and on high-density surface EEG methodology during motion. Consequently, we refer in this paper to sEEG simply as EEG.

We found no compilation of methodological articles or guidelines for brain and body dynamics measurements. Therefore, this paper aims to supply researchers with an overview of current hardware, software and methods for this purpose. Accordingly, we discuss issues that potentially impair recording and analysis, and recent solutions developed to address these problems. These cover suggestions on how to avoid motion artifacts, the use of custom designed accessories for EEG recording during movement, the possibility and advantages of using trans-impedance amplifiers, determination of real/custom electrode positions, EEG electrode types, the use of different EEG recording systems, artifact removal and the integration of brain, motion capture (MOCAP) and EMG recordings. As an introduction, we offer a short overview of EEG principles.

#### **2. PRINCIPLES OF EEG**

The basic functional unit of the brain is the neuron, and the human brain contains about 10<sup>11</sup> of them (Herculano-Houzel, 2009). Neurons are specialized cells that are able to manipulate their membrane electric potentials in order to transmit electrical signals from one to another. These electric signals, or action potentials, are rapid electric events. They have an amplitude of about 100 mV, last about 1 ms and are conducted along the axon at a speed that varies from 1 to 100 m/s. This is the mechanism that the brain uses for information exchange. It works well for fast communication because of the intricate network and the number of neurons that constitute the system (Kandel, 2000).

In an all-or-nothing chain reaction, the signal propagates throughout the network. The signal is transmitted in a wavelike movement of activation across the excitable medium of the brain, which is composed of axons, synapses, dendritic membranes and ionic channels. Following an axon depolarization and the creation of an excitatory postsynaptic potential (EPSP) at neighboring dendrites, cell membrane depolarization occurs. Neurotransmitters in the excitatory synapses cause an influx of positive ions at the postsynaptic membrane. This creates a negative charge at the apical dendrites of the postsynaptic neuron. Thus a reorganization of ions ensues inside the cell. Ions move from the apical dendrite to the cell body, depolarizing the cell body. This creates a positive charge on the extracellular side of the cell body and basal dendrites. A movement of positively charged ions from the cell body and the basal dendrites to the apical dendrite generates extracellular potentials (Magee, 2000; Hallez et al., 2007; Buzsáki et al., 2012). These events create two vertically oriented dipoles of opposing polarity in pyramidal cells. This is due to the arrangement of these cells. Pyramidal cells are arranged with cell bodies in deeper laminae and dendritic arbors directed upward to the surface. Neurons must be regularly arranged so that they amplify each other's extracellular potentials. For this reason neighboring pyramidal and surface cells contribute the most to the EEG signal, as the axes of their dendritic trees are parallel to each other (Hallez et al., 2007).

The flow of current through the extracellular space, and the relationship between a source and recordings made at a distance from it, are described by volume conduction theory (Schaul, 1998; Rutkove, 2007). This refers to the spread and conduction of extracellular potentials through the biological tissue between the source and the sensor. This conduction bypasses the delicate wiring of the brain and spreads through the tissue according to the standard laws of electrodynamics (Plonsey and Heppner, 1967; Hallez et al., 2007). Volume conduction makes measurement of EEG possible in the first place, yet makes separation and interpretation of EEG signals difficult.

Common EEG recording techniques measure the difference between the electric potential of a surface electrode and that of a reference surface electrode. The signals picked up at the electrodes are transmitted through cables to a high impedance amplifier. To resolve the high frequency content of EEG, the amplified signal needs to be sampled by an analog-to-digital converter at a high sampling rate. The sampling rate typically ranges from 250 to 2000 Hz and must be greater than twice the highest frequency of interest to ensure adequate sampling and to minimize aliasing. Equivalently, the Nyquist frequency, which is half the sampling rate, must lie above the highest frequency of interest. If the highest frequency of interest is 600 Hz, then the sampling rate should exceed 1200 Hz to avoid aliasing. Here aliasing refers to the effect of under-sampling when higher frequencies are present: these frequencies appear as spurious lower frequencies in the digitized signal (Sinclair et al., 2007). As an example, Waterstraat et al. (2012) used a sampling rate of 2000 Hz while recording EEG with the purpose of investigating frequencies around 600 Hz. After recording, the data are stored on a computer hard drive. Further signal processing and analysis involve the removal of uninteresting signals and noise from the raw data.
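The sampling requirement described above can be illustrated with a short calculation. The following Python sketch is not part of the original methodology; the function names `min_sampling_rate` and `apparent_frequency` are hypothetical and introduced only for illustration. It assumes an ideal sampler and a pure sinusoidal component, and it shows at which frequency a 600 Hz component would appear for two sampling rates.

```python
def min_sampling_rate(f_max_hz: float) -> float:
    """Return the bound that the sampling rate has to exceed.

    f_max_hz is the highest frequency of interest in the signal.
    (Hypothetical helper, for illustration only.)
    """
    if f_max_hz <= 0:
        raise ValueError("f_max_hz must be positive")
    return 2.0 * f_max_hz


def apparent_frequency(f_signal_hz: float, fs_hz: float) -> float:
    """Frequency at which a sinusoid shows up after ideal sampling at fs_hz.

    Components above fs_hz / 2 are folded back into the range [0, fs_hz / 2].
    """
    if fs_hz <= 0 or f_signal_hz < 0:
        raise ValueError("fs_hz must be positive and f_signal_hz non-negative")
    remainder = f_signal_hz % fs_hz
    return min(remainder, fs_hz - remainder)


if __name__ == "__main__":
    f_interest = 600.0  # Hz, the example frequency used in the text
    print("sampling rate must exceed", min_sampling_rate(f_interest), "Hz")
    for fs in (1000.0, 2000.0):
        print("fs =", fs, "Hz -> 600 Hz appears at",
              apparent_frequency(f_interest, fs), "Hz")
```

Under these assumptions, sampling at 1000 Hz would fold the 600 Hz component down to 400 Hz, whereas sampling at 2000 Hz leaves it at 600 Hz. In practice an analog anti-aliasing filter is usually applied before digitization, which this sketch does not model.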

After filtering, the clean signal appears as waves that are the product of the rhythmic activity of clusters of neuronal cells. It was formerly thought that brain rhythmicity was generated by medial thalamic structures. It is now thought that neurons in the nucleus reticularis thalami act as the pacemaker. These neurons discharge rhythmically to the thalamocortical relay, which leads to synchronous excitatory postsynaptic potentials (EPSPs) (Schaul, 1998). The brain's rhythmic activity is characterized by the number of cycles occurring each second, that is, by its frequency in hertz (Hz).

Brain rhythms occupy several frequency bands. Here we summarize these bands and give brief examples of their function. The lowest frequency band is the delta (δ) band. It ranges from approximately 0.3 to 4 Hz. This band is predominant during sleep and in infants. Its manifestation in adults is associated with learning and attention deficits (Clarke et al., 2001). The next frequency band in the spectrum is the theta (θ) band. It occupies the frequencies from 4 to 8 Hz. Theta waves are associated with repression or inhibition of behavioral activities, with drowsiness and with creative or spontaneous states. Occupying the next frequency band, from 8 to 13 Hz, are the alpha (α) waves. These were the first to be observed by Hans Berger and were therefore called alpha. Alpha waves occur during relaxation and with closed eyes and are associated with the inhibition of certain functions in the brain (Goldman et al., 2002). Beta (β) waves occur in the frequency range from 13 to 30 Hz. Beta activity is related to anxiety, irritability, agitation, sleep disturbances and addictions (Prichep and John, 1992). Gamma (γ) waves occupy the remaining frequency range from 30 to 100 Hz. This band is thought to be relevant for sensory and cognitive brain functions. Gamma waves are thus involved in the complex activities of information processing (Colgin et al., 2009). They may also be related to motor and visual processing and to the expression of facial features (Muthukumaraswamy, 2010; Tang et al., 2011). Activity at higher frequencies is also present in the central nervous system, for example at frequencies around 600 Hz. These oscillations consist of a brief burst of activity, often labeled sigma-burst (σ-burst). The previously mentioned frequency bands are considered to be generated by postsynaptic activity. However, higher frequencies, at around 600 Hz, are thought to originate from spiking activity, that is, the summed spiking activity of single neurons. Alterations in the amplitude and latency of the sigma-burst were observed under reduced attention, general anesthesia and different stimulation paradigms (Waterstraat et al., 2012).
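The band limits listed above can be collected in a small lookup table. The sketch below is illustrative only: the name `band_of` is hypothetical, and assigning a boundary frequency (for example exactly 4 Hz) to the higher band is an assumption, since the text gives the limits only approximately.

```python
# Approximate band limits in Hz, as listed in the text (lower bound inclusive,
# upper bound exclusive; the handling of boundaries is an assumption).
EEG_BANDS = [
    ("delta", 0.3, 4.0),
    ("theta", 4.0, 8.0),
    ("alpha", 8.0, 13.0),
    ("beta", 13.0, 30.0),
    ("gamma", 30.0, 100.0),
]


def band_of(frequency_hz):
    """Return the name of the conventional band containing frequency_hz, or None."""
    for name, low, high in EEG_BANDS:
        if low <= frequency_hz < high:
            return name
    return None  # outside the conventional bands, e.g., activity around 600 Hz


print(band_of(10.0))
print(band_of(600.0))
```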

Specifically regarding movement, EEG activity is used as an indicator of movement initiation and to predict the direction of movement and even the limb that could be active during motion (Ahmadian et al., 2013). Human EEG is synchronized with muscle contraction (Salenius et al., 1996, 1997; Schoffelen et al., 2008) and is coupled with gait phase (Gwin et al., 2011). EEG rhythms change before movement occurs, for example as the Bereitschaftspotential or as alpha and beta event-related desynchronization (ERD). The Bereitschaftspotential is a negative cortical potential which occurs around 1.5 to 1 s before the onset of a voluntary movement (Kornhuber and Deecke, 1965; Shibasaki and Hallett, 2006). ERDs are short-lasting decreases of power in the alpha and beta bands that appear about 2 s before movement (Pfurtscheller and Neuper, 2003). As a practical example, these signals are used to decode a subject's movement intentions and to control an exoskeleton which aids the subject during locomotion (Kilicarslan et al., 2013).
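ERD is commonly expressed as the percentage change of band power relative to a reference interval before the event. The following minimal sketch assumes that convention, which is not spelled out in this text; the function name and the example power values are hypothetical.

```python
def band_power_change_percent(reference_power, event_power):
    """Percentage change of band power relative to a reference interval.

    Negative values indicate a power decrease (desynchronization),
    positive values a power increase (synchronization).
    """
    if reference_power <= 0:
        raise ValueError("reference_power must be positive")
    return (event_power - reference_power) / reference_power * 100.0


# Hypothetical alpha-band power: 8 units at rest, 4 units shortly before movement.
print(band_power_change_percent(8.0, 4.0))  # a decrease, i.e., an ERD
```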

#### **2.1. EEG ARTIFACTS**

Noise and artifacts are inherent in the measurement of brain activity. During recording, several sources of artifacts exist and therefore several kinds of noise contaminate the raw signal. The first and most evident artifacts in recordings during movement are muscle activity artifacts. Muscle artifacts originate in the head and neck musculature, which becomes active during head movement or stabilization during motion tasks (Gwin et al., 2011). Electromyographic (EMG) artifacts are the most difficult to deal with because their spectrum overlaps with EEG activity, mainly with beta and gamma waves (Brown, 2000).

Other artifacts arise from sweat bridges, electrode and cable movements, cardiac activity such as ballistocardiographic artifacts, and eye movement. Sweat bridges occur when the person sweats and the salt and water form a contact bridge between two or more electrodes or simply alter the impedance of the electrodes. The electrolytes produced by the sweat glands create a battery effect, causing a low frequency artifact. Eye movements and blinks are also a source of EEG noise. When a common average reference is used, they tend to affect the frontal electrodes, causing a typical pattern that is easy to identify in the raw data and in a topographic plot of the scalp. With a nose reference they influence all electrodes. Electrode movement artifacts occur when the contact of the electrode with the scalp is disturbed, which results in a rapid change of impedance. Ballistocardiographic and cardiac activity artifacts occur when the pumped blood mechanically moves an electrode that lies on top of a blood vessel or when the signal is contaminated with the electric activity of the heart. These artifacts are also easy to spot because they are rhythmic and have a much higher amplitude than EEG (Tyner et al., 1983). In sections 3.5 and 4.2 we will present suggestions for the reduction of artifacts during recordings and during analysis.

#### **3. RECORDING HARDWARE, SOFTWARE, AND TECHNIQUES**

In this section, we present hardware, software, and techniques to deal with the previously described artifact issues and the recordings of body and brain dynamics during movement, with an emphasis on spatial resolution.

#### **3.1. AMPLIFIERS, ELECTRODES, AND CAP TYPES**

#### *3.1.1. Amplifiers*

Over the past decades amplifiers have been optimized to improve input impedance. Today's amplifiers therefore do not alter the surface potentials. However, the surface potential is a result of brain activity and is not necessary for the brain activity itself. von Tscharner et al. (2013) have recently shown, by a model computation, that because of the relatively low inter-electrode resistance, lateral currents between electrodes cause neighboring electrodes to record mixed signals. Thus, each signal contains information from both locations. Therefore, high impedance potential amplifiers do not allow optimal spatial resolution. The authors have shown this for EMG signals but this is most likely also the case for EEG signals. As an alternative, researchers may use trans-impedance amplifiers (electric current amplifiers). A trans-impedance amplifier removes or injects charges to keep the electrodes at ground or reference potential at all times. It yields a measurable voltage output proportional to these currents and thus to the EEG signal. von Tscharner et al. (2013) demonstrated that the trans-impedance amplifier significantly improves the spatial resolution of EMG recordings because the inter-electrode crosstalk is reduced. Hence, this method can perhaps improve the spatial resolution of EEG signals.

#### *3.1.2. Electrodes*

Traditionally, the most used type of electrode is the wet electrode, that is, an electrode that uses an electrolyte gel, or other means, to convey the signal from the person's scalp to the electrode pin, which is coated with Ag-AgCl. This coating is used to obtain a low resistivity between the skin and the electrode, and the conductive gel minimizes the electrochemical contact potential. Nevertheless, these electrodes require time-consuming preparation, especially when a high number of electrodes is used for source analysis studies. After the measurements, the subjects also have to wash their head to remove the conductive gel. In addition, during longer data collection sessions, the gel may dry, impairing signal conductivity. This limits the study of behavior, the development of brain-computer interfaces for everyday use, long-term EEG studies and measurements in extreme conditions such as in space. To address these issues, researchers in recent years have developed dry electrodes.

A review by Liao et al. (2012) explores several dry electrode alternatives. Most dry electrodes are of three types: dry micro-electromechanical system (MEMS) sensors, dry fabric-based sensors and hybrid dry sensors. An additional technology mentioned by Liao et al. (2012) is Photrodes™. These are a NASA spinoff in collaboration with the company Srico, Inc (Sawbury Blvd Columbus, OH 43235-4579, USA). A Mach-Zehnder interferometer measures the electric activity via the electro-optic effect, which modulates a light beam. Like other dry sensors, these do not require skin preparation (Kingsley et al., 2004). In terms of performance, Estepp et al. (2009) showed that the correlation between wet and dry electrodes ranged from 0.45 to 0.82 depending on the electrode position on the participant's head. Additionally, Grozea et al. (2011) tested bristle-sensors against wet sensors and verified that, from 7 to 44 Hz, the average coherence of the bristle-sensor/gel-based pair was above 80% of the average coherence of the two employed gel-based electrodes. In the frequency range around 10 Hz, the average coherence between dry and wet electrodes reached 90% of the wet-wet average coherence. Chi et al. (2012) report a correlation between dry non-contact and wet electrodes above 0.8 for half of the participants and, for dry contact electrodes, a correlation of 0.9. Chi et al. (2012) explain that the lower signal correlation seen with non-contact electrodes and contact electrodes is due to signal degradation and susceptibility to movement artifacts when using the electrodes through hair. In summary, most of these sensors performed well; however, no single study has tested these different devices under the same conditions. In addition, MEMS sensors may cause injury or skin irritation due to friction of the contact surfaces with the scalp skin. Thus, when designing studies, researchers are advised to weigh the trade-offs between technologies and to consider the conditions under which each performs better (Liao et al., 2012).
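The correlation values quoted above compare simultaneously recorded signals from two electrode types. As a rough illustration of how such a value could be computed, the sketch below implements the Pearson correlation coefficient for two equally long sample sequences. It is not the procedure of any of the cited studies, and the sample values are invented.

```python
import math


def pearson_correlation(signal_a, signal_b):
    """Pearson correlation coefficient of two equally long sample sequences."""
    if len(signal_a) != len(signal_b) or len(signal_a) < 2:
        raise ValueError("signals must have the same length (at least 2 samples)")
    mean_a = sum(signal_a) / len(signal_a)
    mean_b = sum(signal_b) / len(signal_b)
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(signal_a, signal_b))
    variance_a = sum((a - mean_a) ** 2 for a in signal_a)
    variance_b = sum((b - mean_b) ** 2 for b in signal_b)
    if variance_a == 0 or variance_b == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return covariance / math.sqrt(variance_a * variance_b)


# Invented samples (in microvolts) from a wet and a dry electrode at neighboring sites.
wet = [1.0, 2.5, 0.5, -1.0, -2.0, 0.0]
dry = [0.8, 2.9, 0.1, -0.7, -2.4, 0.3]
print(round(pearson_correlation(wet, dry), 2))
```

Coherence, as reported by Grozea et al. (2011), is a frequency-resolved measure and would require a spectral estimate rather than this time-domain calculation.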

To address the problem of movement noise and other signal interference, it is recommended to use active electrodes and shielded cables (Metting van Rijn et al., 1990, 1996). Active electrodes amplify the signal at the source and have a high input and low output impedance, thus reducing the noise created by stray potentials and cable movements (Metting van Rijn et al., 1996). Grozea et al. (2011) and Chi et al. (2012) elaborated on solutions for active, dry electrodes. One commercial MEMS electrode product is the g.SAHARA by g.tec medical engineering (g.tec medical engineering GmbH, Sierningstrasse 14, 4521 Schiedlberg, Austria). Cognionics (Cognionics, Inc., San Diego, CA 92121) proposes a different approach to active dry electrodes with their Flex Sensors, shown in **Figure 1**. This approach provides a solution to the hair interference problem which became evident with earlier dry electrodes (Chi et al., 2012). The electrodes are made from a 3D printed nylon material and have a set of angled appendages, similar to legs, which deform and flatten under pressure. This brushes the hair away and increases contact with the scalp surface while reducing hair interference. When compared to wet electrodes, these show a correlation of about 0.9 between the wet and dry signals (Chi et al., 2013). However, these electrodes can only be used 20 to 30 times. Wet active and shielded electrode solutions also exist, such as the actiCAP electrodes distributed by Brain Products (Brain Products GmbH, 82205 Gilching, Germany).

#### *3.1.3. Caps*

The number and spatial distribution of EEG electrodes in an EEG electrode holder cap influence the spatial resolution and the accuracy of source localization. Junghöfer et al. (1999) and Gutberlet et al. (2009) recommend a minimum of 64 channels with equidistant positions covering the lower areas of the head to record activity from these areas of the brain. A large number of electrodes is also recommended for independent component analysis (ICA) based artifact removal methods (Michel and Brandeis, 2010). For instance, Lau et al. (2012) showed that up to 125 electrode channels improve the ICA decomposition. On the other hand, it is possible to localize the two most robust sources with only 35 electrodes (Lau et al., 2012). Therefore, the number of channels may depend on the study objectives. Higher resolution may be necessary when measuring EEG activity during motion and correlating the EEG signals with EMG signals from specific muscles. The general view that localizing more sources requires more electrodes may be misleading, because the inter-electrode resistivity drops with shorter inter-electrode distances and thus crosstalk among electrodes limits the spatial resolution (von Tscharner et al., 2013). Future research may therefore take advantage of measurements using trans-impedance amplifiers (mentioned above). However, the main limiting factor for analyzing EEG activity acquired during motion is most likely noise and movement-induced artifacts, which affect source localization. Thus the maximal appropriate number of electrodes will depend on how well one can control the mechanical influences and the inter-electrode crosstalk. Nonetheless, as signal acquisition and pre-processing techniques improve, the field is approaching a technology that provides sufficient resolution and stability to obtain movement- and behavior-related information from the EEG.

Double-layered caps prevent cables from moving by restraining them between the layers, thus eliminating a source of artifacts. The most commonly used ones are the BrainWave cap (Medi Factory BV, Buizerdstraat 3a, 6414 VT Heerlen, The Netherlands) and the WaveGuard™ (ANT-Neuro, Colosseum 22, 7521 PT Enschede, Netherlands). Alternatively, researchers can combine two of their existing caps and accommodate the wires between the layers. This works particularly well with the actiCAP from Brain Products, as seen in **Figure 2**.

Cognionics provides a high-density dry electrode EEG headset system which supports up to 64 channels (Chi et al., 2013), illustrated in **Figure 3**. This system integrates the Cognionics Flex Sensors described above and the Cognionics version of the wireless acquisition unit, described in section 3.3.2. The headset design is important in order to keep adequate pressure on the sensors and thus ensure contact between sensor and scalp. The headset has concealed and restrained electrode cables, eliminating cable movement and thus cable noise. Additionally, it seems to require minimal preparation and only small adjustments of pressure to ensure adequate signal collection.

**FIGURE 2 |** […] electrode holder helps to fix both layers and the electrode. The cable passes through the first layer to be fixed between both layers.

**FIGURE 3 |** […] allocated on the side of the neck. Middle: View from the interior part of the headset with the structure that holds the electrodes. Right: Headset maintains its shape when not utilized. Picture, courtesy of Cognionics (Cognionics, Inc., San Diego, CA 92121).

#### **3.2. SPATIAL LOCALIZATION OF ELECTRODES**

Source localization techniques attempt to determine the generators in the brain that gave rise to a given scalp potential map. This is done by combining the EEG data with MRI images, thus providing a 3D representation of the possible cortical sources of electric activity. However, the accuracy of source localization is influenced by the precision of the spatial localization of the electrodes in a 3D volume (Wang and Gotman, 2001). The information about electrode positions allows for the co-registration of the sampled EEG data with the study participant's own anatomy (Michel et al., 2004). Three steps are necessary to obtain EEG sensor locations: digitization of the electrode positions, electrode labeling and finally co-registration of the labeled 3D positions on the head model (Koessler et al., 2010). For more details on EEG source imaging, readers can consult other studies (Grave de Peralta-Menendez and Gonzalez-Andino, 1998; Pascual-Marqui, 1999; Michel et al., 2004; Hallez et al., 2007).

Several methods exist to determine the electrode positions. The first and most widely described method is the 10–20 system, in which the distances between adjacent electrodes are either 10 or 20% of the total front-back or right-left distance of the skull (Jasper, 1958). This system is limited because the placement of electrodes is user dependent and therefore prone to subjective error. It also does not account for small differences in inter-electrode positioning or for the subject's own anatomy. Furthermore, many of today's EEG electrode systems are implemented on elastic caps or some other kind of structure that allows a faster placement of electrodes on the head. Electrodes integrated in this kind of structure have a roughly pre-determined position, which adapts to the person's head (Michel et al., 2004).
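The 10 and 20% spacing can be made concrete for the front-back midline. The sketch below is an illustration under stated assumptions: the electrode labels (Fpz, Fz, Cz, Pz, Oz) and the nasion-to-inion measurement follow the conventional 10–20 layout and are not named in this text, and the function name and the 36 cm example are hypothetical.

```python
# Conventional midline sites as fractions of the nasion-to-inion arc length:
# the first and last sites lie 10% from the landmarks, the others are 20% apart.
MIDLINE_FRACTIONS = {"Fpz": 0.10, "Fz": 0.30, "Cz": 0.50, "Pz": 0.70, "Oz": 0.90}


def midline_positions_cm(nasion_to_inion_cm):
    """Arc distance from the nasion (in cm) of each midline site."""
    if nasion_to_inion_cm <= 0:
        raise ValueError("the measured distance must be positive")
    return {
        label: fraction * nasion_to_inion_cm
        for label, fraction in MIDLINE_FRACTIONS.items()
    }


# Hypothetical head with a 36 cm nasion-to-inion distance.
for label, distance in midline_positions_cm(36.0).items():
    print(f"{label}: {distance:.1f} cm from the nasion")
```

Because each distance depends on a manual tape measurement, the sketch also shows where the user-dependent error mentioned above enters.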

To address these problems, researchers have several options for digitizing the position of each electrode. The ELPOS system (Zebris Medical GmbH, Max-Eyth-Weg 43, D-88316 Isny, Germany) and the FastTrack system (Polhemus Inc, 40 Hercules Dr, Colchester, VT 05446, United States of America) can be used for this purpose. These systems automatically label each electrode. However, digitization takes about 20–40 min, or more when multiple electrode systems are employed, and is user dependent, as the user must touch each electrode in order to acquire its position. A study by Engels et al. (2010) further describes some limitations and factors that influence the precision of systems such as FastTrack.

A less user-dependent method for acquiring electrode positions was described in the patent EP 2 561 810 A1 by Engels et al. (2011). This method uses at least 14 cameras arranged around the subject to determine the positions of reflective markers attached to the electrodes. The system detects and labels the electrodes automatically. However, this method also needs an MRI scan of the person's head and a laser-digitized scan of part of the person's face and head, which is time consuming, impractical and expensive.

Russell et al. (2005) describe a photogrammetry system. This device shows reliable results and seems easy to use. A limitation may be that the system only works with a geodesic electrode array from Electrical Geodesics (Electrical Geodesics Inc., Eugene, Oregon, United States of America).

Ettl et al. (2013) demonstrate another optical system for the spatial detection of electrodes. This system is user independent, highly accurate and fast. It uses a hand-held, motion-robust optical sensor based on Flying Triangulation (Ettl et al., 2012). During the measurement, a single-shot sensor acquires images yielding sparse 3D data. The data is then aligned and the current measurement process is visualized in real time. Finally, a dense 3D model of the object is obtained (Ettl et al., 2013). This system shows promise, although it does not yet detect and label electrodes automatically.

#### **3.3. WIRED AND WIRELESS EEG SYSTEMS**

Brain activity may be recorded by means of wired or wireless EEG systems. However, study possibilities differ substantially according to the systems' characteristics, as subjects are more restrained with a cable system than with a wireless system. Here we describe some of these systems and propose some means for allowing the recording of EEG during motion with wired systems. Additionally, we review wireless systems that show promise for recording EEG during motion. Finally, we present suggestions on how to decrease motion-related artifacts and suggest software for recording brain and body dynamics during movement.

#### *3.3.1. Wired EEG Systems*

With wired EEG systems the subject must remain constrained to a location and move only in that area. However, some solutions for the use of cable based EEG system during movement exist:

Most EEG recordings during movement have been performed on a cycle ergometer. The reason for this is that cycling does not create stepping impacts that provoke strong neck muscle contractions and electrode movements. Typical examples of studies that employed this methodology and successfully filtered the data to remove most artifacts are Brummer et al. (2011), Hilty et al. (2011), and Schneider et al. (2013). A strategy used by Jain et al. (2013) can further help with artifact reduction during cycling: they used a recumbent cycle ergometer in an attempt to decrease neck muscle contractions, electrode movements and other motion-induced artifacts.

For other tasks, such as running or walking, we may look at the examples of Gramann et al. (2010), Gwin et al. (2010, 2011), and De Sanctis et al. (2012). They used a customized wired EEG system that allowed the subject to run on a treadmill. The electrode cables were attached to an amplifier mounted above the head, as seen in **Figure 4**. However, the subject's movements were restricted by the limited cable length. This method allows the recording of EEG during walking or running, although cable movements add extra noise to the data (Gwin et al., 2010). This shows how important it is to restrain the cables and make use of solutions like the ones described in section 3.1.3.

Researchers may also utilize a modified overhead crane in a large room, as shown in **Figure 5**. The overhead crane carries the amplifier and a pre-recording system above the subject's head, which in turn is connected by cables to the computer that records the data. This system allows the subjects to move around the designated large space. The overhead crane movements can be controlled by a feedback loop mechanism using proximity sensors, by information from a MOCAP system or simply by manual control. Alternatively, the overhead crane can be moved by a passive system consisting of a cable attached to a body harness or vest worn by the subject, so that the crane is pulled along each time the person moves.

**FIGURE 4 | Schematic of an overhead holder for EEG amplifiers.** A, Amplifier; B, Arm holding the amplifier; C, Subject running on a treadmill (D).

#### *3.3.2. Mobile EEG systems*

Recently, developers have optimized wireless EEG systems that facilitate mobile recordings of brain activity. These offer an advantage compared to wired systems because the person is less restricted in movement range and types. The electronics are much smaller than in the conventional devices and allow the replacement of cables that transmit the data from the EEG cap to the computer.

The MOVE system, shown in **Figure 6**, replaces the cables between the electrode system and the amplifier. After the transmitter is connected to the electrode control box, the data is transmitted via radio signals to the receiver, which then sends the data to the amplifier. The transmitter pre-amplifies and digitizes the raw signals from the electrodes. The receiver then converts the signal back to an analog signal. This system can be used together with wet active electrode systems, such as the actiCAP from Brain Products. Moreover, the MOVE system works with several types of EEG amplifiers.

A study by Bulea et al. (2013) demonstrates the use of the wireless system MOVE. The video part of this study can be found via the link http://www.jove.com/video/50602/. In this study the subjects perform a series of exercises during data acquisition such as walking through a predetermined course in a large room, sit to stand and treadmill walking. Kilicarslan et al. (2013) used the MOVE system to acquire the brain activity of a paraplegic patient who controlled an exoskeleton with his thoughts.

Each MOVE unit can host a maximum of 64 electrodes. However, up to 5 units can be used in parallel for additional channels or for testing more than one subject at the same time. The receiver works best when it is less than 6 m from the transmitter. This may be a limitation, as it requires close proximity to the receiver or relocation of the receiver. Whenever the connection is interrupted, the receiver sends a TTL marker to the amplifier, and a second one when the connection is reestablished and stable. This allows the user, during the analysis phase, to know when the problem occurred. All components, including the electrode system, are powered by small long-life lithium batteries, which keep the system functional for about 9 h. The manufacturer also specifies that the system has 16 bit resolution and operates at a maximal sampling rate of 954 Hz.
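During analysis, the pairs of TTL markers can be used to exclude the samples recorded while the connection was interrupted. The sketch below is a hypothetical illustration: it assumes that the marker times have already been extracted from the recording as a list of seconds in which "connection lost" and "connection restored" markers strictly alternate. The manufacturer's actual marker format is not described in this text.

```python
def dropout_intervals(marker_times_s):
    """Pair alternating lost/restored marker times into (start, end) intervals."""
    if len(marker_times_s) % 2 != 0:
        raise ValueError("expected an even number of markers (lost/restored pairs)")
    return list(zip(marker_times_s[0::2], marker_times_s[1::2]))


def is_valid_sample(sample_index, sampling_rate_hz, intervals):
    """True if the sample was recorded outside every dropout interval."""
    time_s = sample_index / sampling_rate_hz
    return not any(start <= time_s <= end for start, end in intervals)


# Hypothetical marker times (s) from a recording sampled at 954 Hz.
intervals = dropout_intervals([12.0, 12.5, 40.0, 41.2])
print(intervals)
print(is_valid_sample(954 * 13, 954, intervals))        # sample at 13.0 s
print(is_valid_sample(954 * 12 + 100, 954, intervals))  # sample at about 12.1 s
```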

Another available system that allows high-density EEG recordings is the eegosports™ from ANT-Neuro. In an innovative project, much like Kilicarslan et al. (2013), researchers utilize this system to create a brain controlled exoskeleton, with the purpose of optimizing the rehabilitation of paraplegic patients. The MINDWALKER Project (Gancet et al., 2012) can be accessed under https://mindwalker-project.eu.

The eegosports wireless system uses a different approach: it uses a small amplifier and a VAIO™ Ultrabook® (Sony Corporation, Konan, Minato-ku, Tokyo 108-0075, Japan) laptop worn in a small backpack. EEG signals enter the device at the connectors and are pre-amplified. Afterwards, they are sampled in an A/D converter located in the amplifier case. The signals are amplified and pre-recorded locally. The computer sends the data wirelessly to the remote computer where it is stored. This approach allows data to be stored temporarily during unstable connections. The risk of lost data is thus minimized. The system has a maximum capacity of 64 EEG electrodes, and part of these can be used as bipolar EMG electrodes. Furthermore, this system works with the ANT 64 EEG electrode array WaveGuard cap. As described in section 3.1.3, the two layers of fabric fix the electrode cables, thus potentially reducing cable movement artifacts created during motion. However, this cap utilizes passive electrodes, with all their disadvantages compared to active electrode systems, even though these are shielded electrodes. Nevertheless, the data obtained in a mobile setting is of sufficient quality for use in sophisticated analysis (Ehinger et al., 2014). The amplifier weighs around 500 g. The whole system is light and small enough for a person to carry (**Figure 7**). No cables restrict the person to any location. One issue is the heat generated by the laptop, which may become uncomfortable and raise the subject's body temperature. This increase in body temperature is undesirable as it may cause the subject to sweat. An advantage of this system is its maximum sampling rate of 2048 Hz and resolution of 24 bit. The eegosports is powered by integrated batteries with an operating time of up to 6 h.

**FIGURE 7 |** […] cap cable goes inside the backpack where it connects to the amplifier and Ultrabook. Right: Lateral view.

The last wireless system we would like to describe is the Cognionics wireless EEG acquisition unit, which has 64 channels and a maximum sampling rate of 300 Hz. This unit encloses the digitizers, amplifier, microcontroller and wireless transmitter, as shown in **Figure 8**. This system uses standard 1.5 mm touchproof lead wires and is thus compatible with any device that utilizes touchproof connectors. The data is transmitted wirelessly via Bluetooth within a range of about 10 m. The system is also compatible with any computer, tablet or phone supporting the Bluetooth RFCOMM/Serial Port profile. The amplifier has a built-in wireless trigger receiver. Therefore, it can work with transmitters such as the ones mentioned in section 3.5.3. Two AAA batteries (44.5 mm in length and 10.5 mm in diameter) can power the system for about 6 h of data streaming. **Table 1** summarizes the characteristics of the described systems.
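The channel counts, sampling rates and resolutions quoted for the MOVE and eegosports systems allow a rough estimate of the raw data rate each wireless link has to carry. The sketch below is back-of-the-envelope arithmetic only: it ignores protocol overhead, compression and trigger channels, and it omits the Cognionics unit because its resolution is not stated in this text.

```python
# Specifications as quoted in the text; the dictionary layout is illustrative.
SYSTEMS = {
    "MOVE": {"channels": 64, "sampling_rate_hz": 954, "resolution_bits": 16},
    "eegosports": {"channels": 64, "sampling_rate_hz": 2048, "resolution_bits": 24},
}


def raw_data_rate_bits_per_s(channels, sampling_rate_hz, resolution_bits):
    """Uncompressed payload rate: one sample per channel per sampling period."""
    return channels * sampling_rate_hz * resolution_bits


for name, spec in SYSTEMS.items():
    rate = raw_data_rate_bits_per_s(**spec)
    print(f"{name}: {rate / 1e6:.2f} Mbit/s at maximum channel count and sampling rate")
```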

Other wireless system solutions are offered by g.tec (g.tec medical engineering GmbH, Sierningstrasse 14, 4521 Schiedlberg, Austria) and Mindo (National Chiao Tung University Brain Research Center, 1001 Ta-Hsueh Rd., Hsinchu 30010, Taiwan). It is beyond the scope of this paper to explore every system and its capabilities in detail. We have outlined the main features of some systems and advise researchers to choose the system that best suits their needs.

#### **3.4. RECORDING BODY DYNAMICS**

MOCAP and Electromyography (EMG) can be recorded simultaneously and synchronously combined with EEG recordings in order to obtain body spatial and muscular dynamics, corresponding to the specific brain activities occurring in a time window (Makeig et al., 2009; Gwin et al., 2011; Bulea et al., 2013).

#### *3.4.1. Motion Capture*

MOCAP is the digital acquisition of movement through the use of computers. There are a few methods for the acquisition of movement:


of the information, one can obtain joint angles and accelerations (Cooper et al., 2009; Fong and Chan, 2010; Sabatini, 2011).

• Optical methods: A person wears light reflective (passive) or emitting (active) markers (Sulivan et al., 2006; Tobon, 2010). Cameras track these markers and the system calculates their location through triangulation methods. There are also markerless methods based on computer vision (Gavrila, 1999; Poppe, 2007).

Motion-related studies predominantly utilize infrared MOCAP methods because of their reliability and accuracy. We therefore explain this method in more detail. Most MOCAP systems use reflective markers. Dedicated software combines the images acquired from different positions and tracks the markers' positions in space by triangulation techniques. By repeating the acquisition over time during a movement, the system is capable of describing the trajectory of an object. Systems such as the ones provided by Vicon (Vicon Motion Systems Ltd., Oxford, United Kingdom) and Qualisys (Qualisys AB, Gothenburg, Sweden) use this methodology.

**Table 1 | Wireless EEG systems.**

The camera set-up is important, as at least 2 cameras must see each reflective marker to allow for triangulation. A marker that is not visible to a camera is called an occluded marker. Adding extra cameras may solve this problem during motion. A set-up of eight cameras is in general sufficient to capture body dynamics while walking or running. The space in which the markers can be seen by the cameras is called the volume. The larger the volume, the more cameras will be required so that 2 or more cameras can track the markers at all times (Tobon, 2010).
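The requirement of at least two cameras per marker follows from the geometry of triangulation: each camera defines a ray toward the marker, and the marker lies where two such rays meet. The sketch below estimates the position as the midpoint of the shortest segment between two rays. It is a simplified illustration with hypothetical names and coordinates, not the algorithm of any commercial MOCAP system, which typically combines more cameras and calibrated lens models.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(origin_1, direction_1, origin_2, direction_2):
    """Estimate a marker position from two camera rays.

    Each ray starts at a camera position (origin) and points toward the
    marker (direction). The estimate is the midpoint of the shortest
    segment joining the two rays.
    """
    w = [p - q for p, q in zip(origin_1, origin_2)]
    a = _dot(direction_1, direction_1)
    b = _dot(direction_1, direction_2)
    c = _dot(direction_2, direction_2)
    d = _dot(direction_1, w)
    e = _dot(direction_2, w)
    denominator = a * c - b * b
    if abs(denominator) < 1e-12:
        raise ValueError("rays are parallel; the marker cannot be triangulated")
    s = (b * e - c * d) / denominator  # position along ray 1
    t = (a * e - b * d) / denominator  # position along ray 2
    closest_1 = [o + s * k for o, k in zip(origin_1, direction_1)]
    closest_2 = [o + t * k for o, k in zip(origin_2, direction_2)]
    return [(p + q) / 2.0 for p, q in zip(closest_1, closest_2)]


# Two hypothetical cameras 2 m apart, both seeing a marker at (1, 1, 5).
marker = triangulate_midpoint([0, 0, 0], [1, 1, 5], [2, 0, 0], [-1, 1, 5])
print(marker)
```

With a single camera only one ray is available, so the marker's depth along that ray remains undetermined, which is why an occluded marker cannot be reconstructed.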

When recording body dynamics, the placement of the reflective markers is important for data interpretation and movement modeling. Marker positions can differ amongst manufacturers and laboratories, which can sometimes create difficulties when comparing results. C-Motion's (C-Motion, Inc., Germantown, MD, United States of America) suggestion for marker placement can be found at the following location: http://www.c-motion.com/v3dwiki/index.php?title=Marker\_Set\_Guidelines#cite\_ref-Serge\_0-0. This suggestion from C-Motion also includes a well-known marker placement guideline known as the Helen Hayes marker set (Kadaba et al., 1990). In order to place the markers on a person's body, C-Motion recommends following the palpation guidelines for skeletal landmarks according to van Sint Jan (2007).

For MOCAP of locomotion over long distances and in natural environments, i.e., field tests, Ojeda et al. (2013) developed a mobile MOCAP platform. The device consists of a wheeled platform that moves along with the walking subject. The cart position must be known in order to determine the subject's position. The authors present several methods and conclude that these can be practically implemented with present-day sensors, which grant an accuracy of better than 1% over arbitrary distances. Therefore, researchers may be able to record full body and brain dynamics in an outdoor environment.

#### *3.4.2. Surface electromyography*

There are two kinds of electromyography (EMG): surface EMG (sEMG) and intramuscular EMG, which is an invasive technique involving needles. In this paper, we only address sEMG. In essence, sEMG is a technique that allows the evaluation of muscle activity by recording the electric activity produced by muscles. sEMG signals are the superimposed motor unit action potentials (MUAPs) from several motor units. sEMG is recorded similarly to EEG, i.e., by placing an electrode in contact with the skin.

Researchers and clinicians use sEMG in applications for the non-invasive assessment of the neuromuscular structure functions. Areas of application of sEMG methods include sport science, neurophysiology and rehabilitation. From the sEMG recordings, researchers and clinicians can monitor muscle activation patterns in order to identify pathologies or evaluate therapies and sports performance (Rainoldi et al., 2004).

sEMG acquisition is performed by placing a bipolar electrode in contact with the skin above the muscle of interest. The positioning of the electrodes, the condition of the skin and the electrode type are important factors for adequate signal acquisition. Therefore, guidelines for EMG acquisition, data analysis and reporting were developed by the project Surface ElectroMyoGraphy for the Non-Invasive Assessment of Muscles (SENIAM) (Hermens and Freriks, 1999; Hermens et al., 2000) http://www.seniam.org and the International Society of Electrophysiology and Kinesiology (ISEK) (Merletti and Di Torino, 1999) http://www.isek-online.org.

SENIAM offers guidelines for sensor types, placement and location. However, we suggest that researchers use sensor location references that best suit their experiments. Examples of other references for sensor positioning are Rainoldi et al. (2004) for lower limb muscles and Forsberg and Hellsing (1985); Schüldt et al. (1987) for electrode locations on the face, head and neck muscles.

Developments in wireless devices help reduce cable movement artifacts and increase the freedom of movement. Wireless EMG is therefore a good choice when brain and body recordings take place in a mobile setting. Wireless EMG systems offered by Noraxon (Noraxon USA Inc., Scottsdale, Arizona, USA), such as the Desktop Direct Transmission System (DTS), can hold up to 16 channels and sample at a rate of up to 3000 Hz. This system utilizes small lightweight probes attached to the electrodes, which preamplify the signal and transmit it wirelessly over a distance of up to 20 m. The DTS can also utilize other biomechanical sensors such as goniometers, inclinometers and foot switches, and can be combined with MOCAP.

Another wireless EMG system is the Trigno™ Wireless system (Delsys Inc. Massachusetts, USA), which uses dry EMG electrodes. The sensors include integrated triaxial accelerometers with motion artifact suppression and can be synchronized with motion capture. The Trigno Wireless supports 16 EMG channels, 48 accelerometer channels, a sampling rate of 2000 Hz and a transmission range of 40 m. Similarly to EEG systems, the literature is lacking in studies that compare EMG acquisition systems and the signal quality obtained.

#### *3.4.3. Force plates, IMUs, and eye tracking*

As researchers are not only interested in investigating the kinematics but also the kinetics of a subject's movements, force plates play an essential role in biomechanics. Usually composed of a plate with integrated piezoelectric sensors or strain gauges, force plates provide information about the forces exerted on the ground and equivalently the ground reaction forces acting on the body. Inverse dynamics algorithms can then be applied to determine the forces and moments acting on the body and joints during dynamic movements such as gait, running, cutting movements etc. (Robertson et al., 2004).

As single force plates can pose problems with acquiring valid data due to bad foot placement, which in turn requires a high number of trials (Oggero et al., 1998), instrumented treadmills are an increasingly common way to acquire kinetics during gait and running. For measuring each foot separately during gait with double limb support phases, split-belt instrumented treadmills are used. The advantage of instrumented treadmills is that data can be recorded continuously, allowing measurements with a high number of strides in less time. Nevertheless, the gap in split-belt instrumented treadmills might affect kinematics and kinetics as the base of support is increased (Lee and Hidler, 2008; Altman et al., 2012), and familiarization is advised (Zeni and Higginson, 2010).

The measurement of forces acting directly on the body during cycling is possible through force measuring pedals that act as mobile force platforms. Strain gauges attached to a pedal spindle in a Wheatstone bridge configuration allow for measuring the tangential and normal forces in the sagittal plane (Reiser et al., 2003). Furthermore, in order to assess human sensorimotor interactions during cycling, the seat as well as grip forces and torques may be measured with sensors attached to the seat supporting rod and the stem or handle bars (Zhang et al., 2012).

Inertial measurement units (IMUs) such as accelerometers, gyroscopes, and magnetometers allow subjects to move without being restricted to a MOCAP system's measuring volume. IMUs attached to the subject's body measure its kinematics. The data from these body sensors can then be fused to estimate joint angles (Cooper et al., 2009; Sabatini, 2011; Tadano et al., 2013). Further advantages are the low cost and the small size, which make measurements unobtrusive and feasible in realistic everyday settings. Wireless synchronization can ensure the synchronization with the other hardware components mentioned before. When the data are not stored internally on the devices, the quality of the wireless communication links must be ensured to guarantee transmission to a data recording station (Hanson et al., 2009). Specialized calibration procedures or other reference systems are required for the correct alignment of the sensors to the body and for angle estimation (Favre et al., 2009). The gold standard for measuring joint angles, especially during highly dynamic movement, is therefore still a marker-based MOCAP system.

Because there is a close relationship between vision and movement control, the synchronous analysis of gaze and motion plays an important role in current research (Ketcham et al., 2006; Heinen et al., 2012; Causer et al., 2013). Recently, Essig et al. (2012) presented a modular approach to combine infrared MOCAP systems and mobile eye trackers for the analysis of the 3D gaze vector within the 3D MOCAP volume, whereas traditional eye trackers relate the gaze only to 2D video positions. One-step calibration procedures can ensure the coherence between the gaze direction and the MOCAP system. The integral approach allows studying gaze during dynamic movement tasks, whereas traditional studies were usually carried out under artificial laboratory conditions. Researchers are thus able to investigate perception, attention and eye-body-environment interaction in realistic 3D environments and during realistic tasks in an integral approach.

The libGaze library presents an open-source framework to combine eye tracking with MOCAP systems for real-time tracking of gaze and the observer's positions (Herholz et al., 2008).

adjustment and fitting; B, Strap bands for cable holding; C, Integrated cooling

As a commercial solution, the Vicon MOCAP system and the Ergoneers Dikablis Eye Tracking Solution represent a closed approach within the Vicon Nexus analysis software to track the body's position and the 3D gaze vector. Version 2.9 of the Qualisys Oqus camera system also supports the Ergoneers Dikablis eye and 3D gaze vector tracking.

#### **3.5. DATA RECORDING**

#### *3.5.1. Reducing artifacts during data recording*

In order to deal with artifacts mentioned in section 2.1 during data recording, we present some recommendations.

To deal with salt and sweat bridges, short exercise tasks with resting intervals in an air-conditioned room are recommended. To further maintain body temperature, subjects can wear a cooling ventilation vest during exercise (Pohr and Vogler, 2007). A modified version of this vest can accommodate parts of the EEG system, as depicted in **Figure 9**. The vest opens completely and is only attached to itself in the middle section. A study by Barwood et al. (2009) shows that subjects wearing a cooling vest exercised for 18% longer, required less rest and maintained a lower skin temperature than control subjects. Thus, a ventilation vest may compensate for the increased heat created by wearing EEG equipment during motion and improve the subject's performance.

To avoid electric artifacts, the recording area should be free of sources of electric interference such as engines or radiation-emitting devices. Mains hum creates an electrical artifact at 50 Hz in Europe and at 60 Hz in the USA. Notch filters can reject this artifact during post-recording analysis. Further, ensuring a good connection and performing an online impedance check are essential in order to obtain a good signal. Finally, active shielding of the cables helps to reduce electrical noise. The solutions presented in section 3.1 reduce electrode and cable movements. As mentioned previously, the use of a double-layered cap effectively holds the electrodes and cables, thus minimizing this kind of artifact. In addition, researchers can use active electrodes with low output impedance that pre-amplify the signal at scalp level.
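To illustrate how a notch filter rejects mains hum while leaving neighboring frequencies largely untouched, the following is a minimal sketch of a second-order IIR notch (a standard biquad design). It is not the filter of any particular EEG toolbox; the function name `notch_filter`, the quality factor `q=30` and the synthetic test signals are assumptions chosen for illustration only.

```python
import math


def notch_filter(samples, fs, f0, q=30.0):
    """Apply a second-order IIR notch centered at f0 (Hz) to a list of samples."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    # Normalized feed-forward (b) and feedback (a) coefficients.
    b = (1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0)
    a = (1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0)
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def rms(values):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Synthetic signals (assumed for illustration): 3 s sampled at 1000 Hz.
fs = 1000.0
t = [n / fs for n in range(3000)]
hum = [math.sin(2.0 * math.pi * 50.0 * ti) for ti in t]   # 50 Hz mains hum
slow = [math.sin(2.0 * math.pi * 10.0 * ti) for ti in t]  # 10 Hz "alpha-like" rhythm

hum_out = notch_filter(hum, fs, 50.0)
slow_out = notch_filter(slow, fs, 50.0)

# Compare amplitudes in the last second, after the filter transient has decayed.
print("50 Hz remaining:", rms(hum_out[2000:]) / rms(hum[2000:]))
print("10 Hz remaining:", rms(slow_out[2000:]) / rms(slow[2000:]))
```

For recordings made in a 60 Hz mains environment, the center frequency `f0` would be set to 60 Hz instead.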

Red arrows indicate the flow of hot air leaving the vest.

Muscle activity is a major source of noise during recordings. EMG has amplitudes from 100 to 1000 μV at frequencies from about 5 to 450 Hz. Brain activity also occupies this frequency range, with amplitudes ranging from 0 to 100 μV. The dominant amplitude of EMG artifacts lies in the 50 to 150 Hz band, while about 90% of the EEG spectral power is present in the 1 to 30 Hz band. Therefore, when muscle activity is present, it affects much of the EEG (Shackman et al., 2009).

For these reasons, EMG artifact reduction is conducted during the signal pre-processing phase by computational methods, for instance ICA (Bell and Sejnowski, 1995; Makeig et al., 1996). Eye blinks, cardiac artifacts and other artifacts of electromyogenic nature are also reduced during the pre-processing phase. However, adequate signal acquisition is required for better results when using ICA methods. In section 4.2 we describe some computational methods for dealing with these artifacts and reducing their presence in the recorded data. Other suggestions for avoiding muscle artifacts during recording include instructing and training the participants to swallow and blink during the intervals between short recordings, and to avoid severe face and head muscle contractions during exercises such as weight lifting.

#### *3.5.2. Data acquisition settings recommendations*

The data acquisition settings are an important step in the study design. Adequate data sampling allows successful artifact reduction using ICA methods and provides better results. Also for body dynamics recording, adequate sampling rates and the numbers of samples are necessary, depending on the hardware and analysis methods.

In infrared MOCAP, an adequate sample rate is required to capture movements. For running, a sample rate between 120 and 250 fps should be sufficient. For instance, Gwin et al. (2010, 2011) used a sample rate of 120 fps for a running speed of 1.9 m/s. De Sanctis et al. (2012) utilized a capture rate of 100 fps for a speed of 1.39 m/s. For faster movements, such as throwing or hitting, an increased sampling rate might be required.

For EMG sampling, a minimum sample rate of 1000 Hz is recommended by SENIAM and ISEK. This is based on the signal range, since the significant EMG activity occurs between 5 and 450 Hz. We also advise the use of standard consensual EMG sensor locations, following the recommendations of SENIAM or ISEK. These may not be ideal for every muscle group, but they offer researchers a basis for comparison between studies (Viitasalo and Komi, 1977; Komi and Tesch, 1979; De Luca, 1997; Merletti and Di Torino, 1999; van Boxtel, 2001).
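The link between the 5 to 450 Hz signal range and the recommended 1000 Hz sample rate can be made explicit with the sampling theorem: the sample rate must exceed twice the highest frequency of interest. The sketch below only restates this arithmetic; the helper names are hypothetical.

```python
def min_sampling_rate_hz(highest_signal_frequency_hz):
    """Nyquist limit: the sample rate must exceed twice the highest frequency."""
    return 2.0 * highest_signal_frequency_hz


def satisfies_nyquist(sample_rate_hz, highest_signal_frequency_hz):
    """True if the sample rate is above the Nyquist limit for the given band."""
    return sample_rate_hz > min_sampling_rate_hz(highest_signal_frequency_hz)


emg_upper_band_hz = 450.0  # upper end of the significant EMG activity (from the text)
print(min_sampling_rate_hz(emg_upper_band_hz))        # 900.0
print(satisfies_nyquist(1000.0, emg_upper_band_hz))   # True
```

A rate of 1000 Hz therefore leaves a small margin above the 900 Hz limit, which also relaxes the requirements on the anti-aliasing filter.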

When using ICA-based methods for EMG artifact removal, it is necessary to acquire enough data for the algorithms to work adequately. With the Adaptive Mixture of Independent Component Analyzers (AMICA), 10,000 muscle samples may be enough to find the muscle components (Palmer et al., 2011; Delorme et al., 2012). It is also debatable how many data samples are necessary for ICA-based methods to find the different components. The EEGLAB FAQ web page http://sccn.ucsd.edu/~scott/tutorial/questions.html recommends using at least as many samples as the square of the number of channels. In a paper by Makeig et al. (1999), the authors used over six times the number of necessary input points for ICA, which allowed for the identification of three spatially fixed, temporally independent, behaviorally relevant, and physiologically plausible components.
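The rule of thumb of "at least the square of the number of channels" translates directly into a minimum recording duration for a given montage and sample rate. The following sketch shows this arithmetic; the multiplier `k` is a hypothetical safety factor for users who want more than the bare minimum, and the channel counts and sample rate in the example are assumptions.

```python
import math


def min_samples_for_ica(n_channels, k=1.0):
    """Minimum number of samples: k times the square of the channel count."""
    return int(math.ceil(k * n_channels ** 2))


def min_duration_s(n_channels, sample_rate_hz, k=1.0):
    """Recording time (s) needed to collect min_samples_for_ica samples."""
    return min_samples_for_ica(n_channels, k) / sample_rate_hz


# Hypothetical montages recorded at 512 Hz.
for channels in (32, 64, 128):
    print(channels, "channels:",
          min_samples_for_ica(channels), "samples,",
          min_duration_s(channels, 512.0), "s")
```

Because the requirement grows quadratically, doubling the number of channels quadruples the amount of clean data needed, which matters for short exercise protocols.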

Furthermore, simultaneous direct acquisition of the EMG signal from the neck muscles, which induce most of the artifacts during movement (Gramann et al., 2010), can help to detect noise components and subsequently exclude them. Thus, researchers can place EMG or EEG electrodes below the nuchal line and above the C7 process to measure the activity of the muscles that provide stability to the head during motion. The sternocleidomastoideus muscles also play an important role in head stabilization. Because of this stabilization function, the activity of these muscles can induce artifacts. Hence, their EMG should be recorded in order to facilitate artifact removal. Researchers can consult Forsberg and Hellsing (1985), Schüldt et al. (1987), and Leutheuser et al. (2013) for suggestions of locations for these electrodes.

Subject safety is important when recording during exercise. The American College of Sports Medicine (ACSM) provides guidelines for exercise testing and prescription (Pescatello et al., 2013). These guidelines give indications to clinicians and scientists on how to perform exercise testing in healthy and unhealthy subjects, as well as termination criteria based on physical and physiological signs. Lastly, the Borg scale of ratings of perceived exertion provides a measurement tool to monitor the participant's performance and fatigue during exercise testing (Löllgen, 2004). The Borg scale measurements are correlated with oxygen consumption and heart rate.

#### *3.5.3. Brain and body data acquisitions synchronization*

Synchronization of the measurement devices in the millisecond range is necessary when investigating different modalities (motion capture, force plates, EEG, EMG, etc.) simultaneously during movement. Even slight time shifts between the single devices potentially lead to a misinterpretation of the obtained results. Synchronization is also important for real-time analysis, since time shifts have immediate effects. The data from the acquisition devices are usually asynchronous due to different internal clocks, sampling rates, network and operating system delays (Delorme et al., 2011). Delorme et al. (2011) proposed a software approach for data streaming management with near real-time synchronization capabilities.

This software approach has evolved into the open-source project known as the lab streaming layer (LSL). This is a data acquisition system developed by Christian Kothe from the Swartz Center for Computational Neuroscience, Institute for Neural Computation, University of California San Diego, USA. LSL allows the exchange of time series between devices, programs and computers. It is a system for the unified collection of measurement time series in research experiments. It consists of a core transport library and a series of tools. These tools include a recording program, online viewers, importers and acquisition software. The acquisition programs can acquire data from various hardware, including EEG, eye tracking, motion capture and force plates, from several manufacturers.

The built-in time synchronization in LSL relies on clock offset measurements and a timestamp for each sample, which are collected alongside the actual sample data. The recording program included with LSL, the LabRecorder, collects this information, including time stamps and clock offsets, for every stream and stores it. Interested readers can consult the LSL Google Code page at https://code.google.com/p/labstreaminglayer/ for details, downloads and related documentation.
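The idea of combining per-sample timestamps with clock offset measurements can be sketched as follows. This is not the LSL API and not the algorithm used by LSL or its importers; it is a simplified illustration in which the offset between a device clock and a common reference clock is linearly interpolated between measurements and added to each timestamp. All function names and numbers are hypothetical.

```python
from bisect import bisect_left


def interpolate_offset(t, offset_times, offset_values):
    """Linearly interpolate the clock offset at device time t.

    offset_times must be sorted; values outside the measured range are
    held constant at the nearest measurement.
    """
    if t <= offset_times[0]:
        return offset_values[0]
    if t >= offset_times[-1]:
        return offset_values[-1]
    i = bisect_left(offset_times, t)
    t0, t1 = offset_times[i - 1], offset_times[i]
    v0, v1 = offset_values[i - 1], offset_values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def to_common_clock(timestamps, offset_times, offset_values):
    """Map device timestamps onto the common clock by adding the offset."""
    return [t + interpolate_offset(t, offset_times, offset_values)
            for t in timestamps]


# Hypothetical stream: the device clock lags the recorder by 0.5 s at t = 10 s
# and by 0.7 s at t = 20 s (i.e., the clocks drift apart).
corrected = to_common_clock([10.0, 15.0, 20.0], [10.0, 20.0], [0.5, 0.7])
print(corrected)
```

Once every stream has been mapped onto the same clock in this way, samples from EEG, EMG and MOCAP can be compared directly even though they were recorded at different rates on different machines.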

Another way to synchronize several devices is to use hardware synchronization. This is usually achieved through the use of TTL (transistor-transistor logic) signals, sent via coaxial trigger cables with BNC connectors between the devices. This eliminates potential software synchronization delays. There are several possible hardware synchronization implementations:


#### **4. DATA ANALYSIS SOFTWARE AND ARTIFACT REMOVAL TECHNIQUES**

#### **4.1. SOFTWARE FOR DATA ANALYSIS AND VISUALIZATION**

Several options exist for visualizing and analyzing synchronously captured data. Data acquired with the LSL software can be read by the MoBILAB software package. This software contributes to the Mobile Brain/Body Imaging (MoBI) concepts put forward by Makeig et al. (2009). MoBILAB was designed by Alejandro Ojeda, also from the SCCN, with Nima Bigdely-Shamlo and Christian Kothe. This package now runs as a standalone, open-source, cross-platform toolbox for Matlab (The MathWorks, Inc., Natick, Massachusetts, USA). MoBILAB supports the analysis and visualization of synchronously recorded EEG data, motion capture, EMG data and environmental data, as seen in **Figure 10**.

In the same issue as this paper, Alejandro Ojeda dedicates an article to the MoBILAB software. Therefore, we do not describe this software in further detail here. For details, readers are invited to consult Ojeda et al. (2014) and the wiki page http://sccn.ucsd.edu/wiki/Mobilabsoftware.

A further possibility is the subsequent use of biomechanical analysis and signal analysis software, such as Visual3D™ (C-Motion, Inc., Germantown, MD, United States of America) and EEGLAB (Delorme and Makeig, 2004) or other signal analysis software. Visual3D is a product for 3D MOCAP data analysis and biomechanical modeling. It provides signal processing and biomechanical analysis tools such as 6 degrees of freedom modeling, inverse kinematics and inverse dynamics, and can thus determine the joint angles, powers, moments, forces, velocities and accelerations during motion. Additionally, time series segmentation can be conducted with Visual3D, for example for gait cycle segmentation or for any other movement, using event detection based on minimum/maximum search, thresholding or template comparison on any calculated biomechanical parameter. When exported to EEGLAB, the segmentation time stamps can be of further use: provided that the measurements are synchronized, brain and muscle activity can be directly linked to the corresponding movements. In EEGLAB, the user can then proceed with the necessary EEG signal analysis, such as source localization for the specific movement task.
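Event detection by thresholding, as mentioned above for gait cycle segmentation, can be sketched in a few lines. The example below is not Visual3D's implementation; it assumes a vertical ground reaction force trace and a hypothetical threshold of 20 N, and it marks a heel strike wherever the force rises through the threshold. The resulting sample indices could then be converted to time stamps and used as epoch markers for the EEG.

```python
def detect_threshold_crossings(signal, threshold):
    """Return indices where the signal rises from below to at/above threshold."""
    return [i for i in range(1, len(signal))
            if signal[i - 1] < threshold <= signal[i]]


def segment_cycles(events):
    """Pair consecutive events into (start, end) index tuples, one per cycle."""
    return list(zip(events[:-1], events[1:]))


# Hypothetical vertical ground reaction force (N) of one foot, two stance phases.
force = [0, 0, 50, 400, 700, 300, 10, 0, 0, 30, 500, 650, 200, 0]

heel_strikes = detect_threshold_crossings(force, threshold=20)
print(heel_strikes)                  # [2, 9]
print(segment_cycles(heel_strikes))  # [(2, 9)]
```

In real data, a small amount of low-pass filtering or a minimum distance between events is usually needed so that noise around the threshold does not produce spurious crossings.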

#### **4.2. ARTIFACT REMOVAL METHODS**

Signal artifact reduction procedures combine various approaches and routines for EEG artifact detection and removal. Overall, artifact removal procedures can be divided into basic and advanced processes. The basic stage of artifact removal focuses on environmentally induced artifacts such as cable noise, power line noise and impedance increases. These can mostly be removed by band-pass and notch filters. The advanced stage involves the removal of EMG and other artifacts through methods such as ICA (Bell and Sejnowski, 1995; Makeig et al., 1996). Here we suggest a compilation of several artifact removal procedures. **Figure 11** describes the complete procedure.

As the first stage of data cleaning, we suggest the REMOV process, which is thoroughly described in Artoni et al. (2012). In this step, most of the environmental artifacts are removed through filtering and rejection of noisy segments using BCILAB tools (Kothe and Makeig, 2013), available for download at http://sccn.ucsd.edu/wiki/BCILAB. Band-pass filters retain the frequencies of interest and exclude other, less interesting frequencies and noise. The REMOV procedure includes the removal of eye blinks but not the removal of EMG, heart and loose electrode artifacts. Combining the REMOV process with other procedures allows further reduction of artifacts.

For the removal of the remaining artifacts (heart beat, loose electrodes, ocular movements, muscular activity), researchers can use ICA methods and EEGLAB-compatible tools for further processing. At present, there exist several variants of ICA algorithms. We advise the use of the Adaptive Mixture of Independent Component Analyzers (AMICA) (Palmer et al., 2011), as it outperforms other algorithms in decomposing data (Delorme et al., 2012) and in removing EMG artifacts (Leutheuser et al., 2013). Gramann et al. (2010) and Gwin et al. (2010, 2011) also used AMICA successfully to remove walking and running artifacts from EEG data. The AMICA source code is available at http://sccn.ucsd.edu/~jason/amicaweb.html. After the data are decomposed by ICA, the noise-inducing components must be selected. For the selection of ICA components, researchers can choose an automatic or a manual approach.

Because the selection of ICA components to exclude is typically subjective and time consuming, some researchers have created automatic component selection tools in an attempt to reduce the user-dependent factor. An example is the Multiple Artifact Rejection Algorithm (MARA) (Winkler et al., 2011), http://www.user.tu-berlin.de/irene.winkler/artifacts/. This is a universal classifier of ICA components from EEG data. MARA can be used as a plugin for EEGLAB. It is based on linear methods and can be utilized with different electrode placements. The classifier was trained by experts on a large amount of data recorded during static and dynamic situations. The algorithm identifies components from muscle, eye and electrode movements. It is an attempt to automate the time-consuming component selection process. However, we do not know of any walking, running or sport-related study that used MARA. Therefore, its performance is somewhat uncertain for movements other than those with which the classifier was trained.

Thus, Gabsteiger et al. (2013) trained a classifier for the selection of independent components that reflect muscle activity: the Automatic Classification of Electromyogenic ICA Components (ACEMIC). It is designed to cover a diverse selection of exercises that stimulate the musculature that most interferes with EEG recordings during movement. This selection of exercises should produce artifact patterns similar to those seen in most exercises or movements. Evaluation of this classifier shows a sensitivity of 93% and a specificity of 96%. ACEMIC is implemented as a plugin for EEGLAB and can be downloaded from http://www5.cs.fau.de/research/areas/digital-sports/automatic-classification-of-electromyogenic-ica-components/.
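For readers less familiar with these evaluation measures, the sketch below shows how sensitivity and specificity are computed for a component classifier, where a "positive" is a component labeled as muscle activity. The counts in the example are hypothetical and were chosen only so that the output matches the reported percentages; they are not the counts from the cited evaluation.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of true muscle components that the classifier flags as muscle."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of non-muscle components that the classifier leaves untouched."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical confusion counts (NOT taken from Gabsteiger et al., 2013).
tp, fn = 93, 7   # muscle components: correctly flagged vs. missed
tn, fp = 96, 4   # other components: correctly kept vs. wrongly flagged

print(sensitivity(tp, fn))  # 0.93
print(specificity(tn, fp))  # 0.96
```

In this setting, a high specificity is particularly relevant, because every false positive corresponds to a non-muscle component, possibly of neural origin, that would be removed from the data.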

Users may opt for manual selection of ICA components. For this purpose, we suggest that users follow the indications for data decomposition in the EEGLAB manual http://sccn.ucsd.edu/wiki/Chapter09:DecomposingDataUsingICA. Directions for selecting EMG and other artifact components according to their spectral and topographical characteristics are given in Goncharova et al. (2003) and McMenamin et al. (2010). Components that exhibit high spectral power and that are located at the electrodes of the periphery are more likely to be myogenic activity. The shape of the dipole patterns also has to be considered. EEG activity patterns are more likely to show smooth, well-localized and defined patterns. With these propositions, researchers will more accurately identify noise components that should be removed.

previously described steps. We advise running AMICA once, removing the 4–6 (depending on the number of channels) noisiest components, running AMICA again and again removing noise components.

It is important to remove artifact components while retaining neuronal signals. **Figure 12** therefore gives an example of an EMG component and an EEG component, selected according to the criteria from the mentioned studies. The more centrally localized component shows higher power in the lower frequencies and a drastic reduction in power at frequencies above 30 Hz, which is consistent with brain activity components. The posteriorly localized component, at the back of the head near the neck, has power above 30 Hz that is higher than usual for artifact-free EEG. This is consistent with EMG activity, and the component should therefore be rejected (Goncharova et al., 2003). The rejection of the components can be realized with EEGLAB as well. Further, artifact reduction techniques can be tested for overcorrection of the EEG signals. Gwin et al. (2010) did so by computing the power spectral density of the resulting signals and comparing spectral power in the 1.5- to 8.5-Hz frequency band before and after application of AMICA as an artifact removal tool. There was no sign of removal of EEG signal. The artifact-cleaned recordings were also tested for whether, in the movement conditions, it would be possible to identify an ERP time-locked to a visual target (oddball) stimulus. These ERPs were nearly identical to the ERPs in the baseline condition (standing). For the running condition, the ERP was only visible after artifact reduction. Therefore, with this methodology it is possible to remove artifacts during running so that ERPs are identifiable similarly to a baseline condition.
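The spectral criterion described above (brain components concentrate their power below 30 Hz, whereas EMG components carry substantial power above 30 Hz) can be expressed as a simple band-power ratio. The sketch below is an illustration only and not a validated classifier: it uses a plain discrete Fourier transform, synthetic component time courses, and hypothetical function names. The same `band_power` helper could be pointed at the 1.5 to 8.5 Hz band to compare power before and after cleaning, in the spirit of the overcorrection check by Gwin et al. (2010).

```python
import cmath
import math


def band_power(signal, fs, f_lo, f_hi):
    """Sum of squared DFT magnitudes for frequencies f_lo <= f < f_hi (Hz)."""
    n = len(signal)
    power = 0.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(signal))
            power += abs(coeff) ** 2
    return power


def high_to_low_ratio(signal, fs, split_hz=30.0):
    """Ratio of power at/above split_hz to power between 1 Hz and split_hz."""
    low = band_power(signal, fs, 1.0, split_hz)
    high = band_power(signal, fs, split_hz, fs / 2.0)
    return float("inf") if low == 0.0 else high / low


# Synthetic 1 s component time courses at 256 Hz (assumed for illustration).
fs = 256.0
n = 256
brain_like = [math.sin(2 * math.pi * 10 * i / fs) for i in range(n)]
muscle_like = [0.2 * math.sin(2 * math.pi * 10 * i / fs)
               + math.sin(2 * math.pi * 60 * i / fs)
               + math.sin(2 * math.pi * 90 * i / fs) for i in range(n)]

print("brain-like ratio:", high_to_low_ratio(brain_like, fs))
print("muscle-like ratio:", high_to_low_ratio(muscle_like, fs))
```

A ratio well above 1 points toward a myogenic component, but in practice the scalp topography and dipole fit should be inspected as well before a component is rejected.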

Another interesting and valuable approach is demonstrated by Plöchl et al. (2012). This study attempted to remove eye movement artifacts by simultaneously recording eye movements and EEG during a guided eye movement paradigm. It resulted in the creation of an algorithm that uses eye movement information to automatically identify eye movement related ICA components. Removing the detected ICs from the data resulted in the suppression of ocular artifacts, including microsaccadic spike potentials, while the EEG signal remained unaffected (Plöchl et al., 2012). Ultimately, this study is an example of how recording body dynamics simultaneously with EEG can help to reduce movement-induced artifacts.

Similar to ICA, canonical correlation analysis (CCA) is a blind source separation (BSS) method that can reduce the influence of EMG artifacts on EEG data (De Clercq et al., 2005, 2006). BSS-CCA assumes that the autocorrelation of sources that are mostly influenced by electromyogenic activity is significantly lower than the autocorrelation of brain sources. The user therefore only has to decide how many sources, i.e., components, to reject, but not which ones. The toolbox is available for download at: http://www.neurology-kuleuven.be/?id=210. The BCILAB toolbox (Kothe and Makeig, 2013) includes different filters to remove artifacts. The "clean peaks" filter projects events with abnormally high power, e.g., EMG artifacts, out of the data.
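The assumption behind BSS-CCA can be illustrated with the lag-1 autocorrelation of two synthetic sources. This sketch does not implement CCA itself; it only shows why a broadband, noise-like source (used here as a crude stand-in for EMG) ranks lower in autocorrelation than a slow oscillation (a stand-in for a brain rhythm). The signals, sample rate and function name are assumptions.

```python
import math
import random


def lag1_autocorrelation(x):
    """Normalized autocorrelation of a sequence at a lag of one sample."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    num = sum(centered[i] * centered[i + 1] for i in range(n - 1))
    den = sum(v * v for v in centered)
    return num / den


random.seed(0)  # fixed seed so the synthetic noise is reproducible
fs = 250
brain_like = [math.sin(2 * math.pi * 10 * i / fs) for i in range(1000)]
muscle_like = [random.gauss(0.0, 1.0) for _ in range(1000)]

print("brain-like:", lag1_autocorrelation(brain_like))
print("muscle-like:", lag1_autocorrelation(muscle_like))
```

Because BSS-CCA orders its sources by autocorrelation, the sources most likely to be myogenic end up at the low-autocorrelation end of the list, which is why the user only needs to choose how many to discard.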

#### **5. SUMMARY AND CONCLUSIONS**

In this paper, we presented methods and equipment available today that allow the recording of body and brain activity during motion. Hardware, software and techniques were covered. These methodologies open a wide range of research opportunities in the fields of cognition, motion, environment interaction and, therefore, behavior. Recording and analyzing EEG during motion nevertheless remains a challenge, and we hope that this paper can help researchers who wish to work in this field. This paper also intends to compile and give structure to the many new methods that have emerged to offer solutions for measuring and analyzing EEG and body dynamics during motion. We also speculated about future technologies, such as current amplifiers (trans-impedance amplifiers), that may allow measuring EEG with higher spatial resolution. We focused on high-density EEG and body dynamics, not addressing the field of brain computer interfaces. In future studies it will be necessary to compare different methods and hardware more often, for instance in studies comparing the reliability of different electrodes and the quality of the recorded signal. If a higher spatial resolution can be obtained, then it will be necessary to measure and report the spatial localization of the electrodes more accurately. Generally, today's methods have reached a point where one can consider measuring EEG, EMG, kinematics, and kinetics simultaneously during motion. Thus, they open new possibilities in the fields of behavior and neuroscience.

#### **FUNDING**

This work was supported by the Bayerische Forschungsstiftung (Bavarian Research Foundation).

#### **ACKNOWLEDGMENTS**

The authors would like to thank Lucie Novotná, DiS, for the time and patience invested in the artwork for this article.

*Language corrections:* Text revision and English language corrections were done by Andreas Oikonomou, B.A. and Robert Westbrook B.A.

#### **REFERENCES**


Component localized at the back of the head with high power content above 30 Hz which is consistent with EMG activity (Goncharova et al., 2003). This component should be considered for rejection which can be realized with EEGLab.


artifacts in EEG data - ACCEPTED," in *The 15th International Conference on Biomedical Engineering (ICBME 2013)* (Singapore), 1–4.


myogenic artifact correction for scalp and source-localized EEG. *Neuroimage* 49, 2416–2432. doi: 10.1016/j.neuroimage.2009.10.010


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 November 2013; accepted: 03 March 2014; published online: 24 March 2014.*

*Citation: Reis PMR, Hebenstreit F, Gabsteiger F, von Tscharner V and Lochmann M (2014) Methodological aspects of EEG and body dynamics measurements during motion. Front. Hum. Neurosci. 8:156. doi: 10.3389/fnhum.2014.00156*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Reis, Hebenstreit, Gabsteiger, von Tscharner and Lochmann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Assessing the quality of steady-state visual-evoked potentials for moving humans using a mobile electroencephalogram headset

#### **Yuan-Pin Lin, Yijun Wang\*, Chun-Shu Wei and Tzyy-Ping Jung\***

Swartz Center for Computational Neuroscience, Institute for Neural Computation, University of California, San Diego, CA, USA

#### **Edited by:**

Klaus Gramann, Berlin Institute of Technology, Germany

#### **Reviewed by:**

Thorsten O. Zander, Technical University Berlin, Germany Benjamin Blankertz, Technische Universität Berlin, Germany

#### **\*Correspondence:**

Yijun Wang and Tzyy-Ping Jung, Swartz Center for Computational Neuroscience, Institute for Neural Computation, University of California, 9500 Gilman Drive, Mail code 0559, La Jolla, San Diego, CA 92093-0559, USA e-mail: yijun@sccn.ucsd.edu; jung@sccn.ucsd.edu

Recent advances in mobile electroencephalogram (EEG) systems, featuring non-prep dry electrodes and wireless telemetry, have enabled and promoted the applications of mobile brain-computer interfaces (BCIs) in our daily life. Since the brain may behave differently while people are actively situated in ecologically-valid environments versus highly-controlled laboratory environments, it remains unclear how well the current laboratory-oriented BCI demonstrations can be translated into operational BCIs for users with naturalistic movements. Understanding inherent links between natural human behaviors and brain activities is the key to ensuring the applicability and stability of mobile BCIs. This study aims to assess the quality of steady-state visual-evoked potentials (SSVEPs), which are one of the promising channels for functioning BCI systems, recorded using a mobile EEG system under challenging recording conditions, e.g., walking. To systematically explore the effects of walking locomotion on the SSVEPs, this study instructed subjects to stand or walk on a treadmill running at speeds of 1, 2, and 3 miles per hour (MPH) while concurrently perceiving visual flickers (11 and 12 Hz). Empirical results of this study showed that the SSVEP amplitude tended to deteriorate when subjects switched from standing to walking. Such SSVEP suppression could be attributed to the walking locomotion, leading to distinctly deteriorated SSVEP detectability from standing (84.87 ± 13.55%) to walking (1 MPH: 83.03 ± 13.24%, 2 MPH: 79.47 ± 13.53%, and 3 MPH: 75.26 ± 17.89%). These findings not only demonstrate the applicability and limitations of SSVEPs recorded from freely behaving humans in realistic environments, but also provide useful methods and techniques for boosting the translation of the BCI technology from laboratory demonstrations to practical applications.

**Keywords: EEG, BCI, mobile EEG system, SSVEP, moving humans**

#### **INTRODUCTION**

Recent advances in mobile electroencephalogram (EEG) technologies (Stopczynski et al., 2011; Wang et al., 2011; Chi et al., 2012) have radically boosted the demand for mobile EEG-based brain-computer interfaces (BCIs) for various real-life applications, such as entertainment and clinical/in-home monitoring, assessment and rehabilitation. Thus, understanding and characterizing the inherent links between human behaviors and EEG dynamics is central to determining the applicability of mobile BCIs. Over the past decades, a considerable number of laboratory-oriented BCI studies and demonstrations have led to fundamental and practical insights into how the human brain actively or passively reacts in a closed-loop BCI. However, both theoretical and exploratory evidence suggests that brain dynamics in natural environments might differ from those observed in highly controlled laboratory environments (Mcdowell et al., 2013). For instance, the brain switches to a different operating method while humans actively behave, move, walk, and orient in ecologically valid environments (Gramann et al., 2011). Few studies have explored the performance of a closed-loop BCI in real-world environments (Kohlmorgen et al., 2007; Blankertz et al., 2010). It remains unclear how well the current laboratory-oriented demonstrations can be translated into operational BCIs for users under their natural head/body positions, postures and movements. This translation can facilitate the use of operational BCI systems in patients' homes (Sellers et al., 2010). Therefore, unveiling the brain dynamics associated with naturalistic human behaviors is of great interest and urgency for effective translational neuroscience.

A steady-state visual evoked potential (SSVEP)-based BCI falls into the category of reactive BCIs, which derive their outputs from brain activity in reaction to external stimulation (Zander and Kothe, 2011). For a clear comparison among active, passive and reactive BCIs, see Zander and Kothe (2011). SSVEP, along with evoked potentials, event-related potentials (ERPs), and sensorimotor rhythms (Wolpaw et al., 2002), is widely adopted in current active and reactive BCIs. The SSVEP signal is a frequency-coded brain response that is generated as neurons of the visual cortex synchronize their firing to the frequency of continuous, repetitive visual stimulation. Given this origin, electrodes placed at the occipital region over the visual cortex can measure SSVEPs with a high signal-to-noise ratio (SNR; Lin et al., 2006; Wang et al., 2006; Friman et al., 2007). Herrmann (2001) reported that SSVEP amplitudes are sensitive to the frequencies of visual flickers with a predominant resonance peak at 30–80 Hz. Wang et al. (2006) further explored three subsystems that existed in SSVEP resonances with a major peak around 15 Hz, followed by two other peaks at 31 Hz and 41 Hz. On the other hand, by assigning a unique flickering frequency to each visual target, several laboratory studies (Calhoun and Mcmillan, 1996; Cheng et al., 2002; Kelly et al., 2005a; Wang et al., 2006; Muller-Putz and Pfurtscheller, 2008) have successfully demonstrated that SSVEP signals can serve as a communication carrier in actuating BCI systems with advantages such as high SNR, brief user training, and smaller individual differences. However, the previous studies all assessed SSVEPs from stationary and tethered individuals, who were instructed to avoid gross task-irrelevant head/body movements. One can expect that when mobile BCIs are deployed to freely moving and non-tethered users in the real world, SSVEP-based BCI systems built entirely on laboratory evidence of SSVEP characteristics might generalize poorly. To date, little is known about the dynamics of SSVEPs accompanying naturalistic movements.

The unavailability of easy-to-use EEG sensing systems that do not require application of conductive gels to the scalp has long hindered BCIs from effective real-life applications. Novel mobile EEG systems, featuring wireless telemetry and/or non-prep dry electrodes, may significantly facilitate EEG recordings during natural movements and behaviors. Several studies have demonstrated the efficacy of using either experiment-grade (Popescu et al., 2007; Wang et al., 2010, 2011; Zander et al., 2011; Chi et al., 2012) or consumer-grade (Campbell et al., 2010; Crowley et al., 2010; Bobrov et al., 2011; Petersen et al., 2011) mobile EEG systems in fundamental research and BCI demonstrations. Furthermore, lightweight head-mounted display devices have gained increasing attention and enable easy access to multimedia content anytime and anywhere. Once the mobile EEG system is integrated with the display device in the near future, a ubiquitous BCI system functioning in real life becomes feasible. Nevertheless, until recently only scattered studies (Debener et al., 2012; Lin et al., 2013) applied such mobile EEG sensing technology to field recording. Thus, fully testing the capability and limitations of the mobile EEG/BCI technology is necessary not only for the practical generalizability issue, but also for any demands that involve brain activity monitoring of unconstrained, freely moving subjects performing ordinary tasks in their living environments.

This study aimed to address the feasibility of using a mobile and wireless EEG system to decode SSVEPs during steady walking. To systematically explore the effects of walking locomotion on the SSVEPs, this study instructed participants to stand or walk on a treadmill running at speeds of 1, 2, and 3 mile(s) per hour (MPH) to elicit different degrees of head/body movements while subjects were performing visual tasks. The main foci of the offline data analyses were: (1) evaluating the SSVEP quality using a mobile EEG system; (2) assessing the impact of walking locomotion on SSVEP signals; and (3) optimizing the SSVEP detection pipeline for moving humans. This study is intended to facilitate real-life SSVEP-based BCI applications for freely behaving humans using a mobile EEG system.

#### **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Nineteen healthy participants (14 males and 5 females; 24–33 years of age; mean age, 27.11 years) with normal or corrected-to-normal vision participated in this study. The UCSD Human Research Protections Program approved this study. Each participant read and signed an informed consent form before the experiment.

#### **EXPERIMENT SETUP**

To evaluate the impact of walking locomotion on EEG/SSVEP signals, this study instructed participants to walk on a treadmill at three speeds of 1, 2, and 3 MPH. Participants were asked to attentively gaze at continuous, repetitive black/white visual flickers at a frequency of 11 or 12 Hz for 60 s while walking (**Figure 1**). The frequencies of the stimuli were in the high-frequency α-band because SSVEPs in this frequency range often lead to higher classification performance than other frequency bands (Gao et al., 2003). The higher SNR in the high-frequency α-band can be explained by higher SSVEP amplitudes and a concurrent suppression of spontaneous α-activities (Birca et al., 2006). In addition, the conditions of standing still on the treadmill and/or gazing at the screen with a black background were included for comparison. This study adopted the frequency approximation approach (Wang et al., 2010) to present a single flicker (7.5 cm × 6.0 cm) at the center of a 19″ LCD monitor with a refresh rate of 60 Hz. The monitor was placed above the treadmill control panel and adjusted so that the flicker was located in the center of each participant's visual field. The participants were instructed to hold the treadmill hand grip during standing and walking, facing the monitor at a distance of 60 cm. Each participant completed 12 sessions (four treadmill speeds × three visual targets, without counterbalancing) with a between-session rest of 10–20 s to prevent visual and/or motor fatigue.
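The frequency approximation idea can be illustrated with a small sketch. A 60 Hz monitor cannot show equal black and white half-cycles for 11 or 12 Hz, because a half-cycle would last a non-integer number of frames, so the on/off run lengths are allowed to vary from cycle to cycle. The rule below (a frame is white when it falls in an even half-cycle of an ideal square wave) is an assumption made for illustration, and `flicker_frames` and `count_onsets` are hypothetical helper names; the exact sequence used in the experiment is not given in the text.

```python
def flicker_frames(freq_hz, refresh_hz=60, n_frames=60):
    """Return one 0/1 state per video frame (0 = black, 1 = white).

    Assumed rule: frame i belongs to half-cycle floor(2 * f * i / refresh);
    even half-cycles are drawn white, odd half-cycles black.
    """
    return [1 - ((2 * freq_hz * i) // refresh_hz) % 2 for i in range(n_frames)]


def count_onsets(frames):
    """Count black-to-white transitions (frame 0 counts if it is white)."""
    return sum(
        1
        for i, state in enumerate(frames)
        if state == 1 and (i == 0 or frames[i - 1] == 0)
    )


if __name__ == "__main__":
    for f in (11, 12):
        frames = flicker_frames(f)
        print(f, "Hz:", "".join(map(str, frames[:20])), "onsets/s =", count_onsets(frames))
```

With this rule, one second of frames contains as many white onsets as the nominal frequency, even though individual cycles differ in length.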

#### **EEG DATA ACQUISITION**

This study adopted a 32-channel EEG system (Cognionics, Inc.) featuring soft fabric dry electrodes (Chi et al., 2013) and wireless telemetry to sample EEG signals at 250 Hz. Notably, only two four-electrode straps (eight electrodes: P3, P1, P2, P4, PO3, PO1, PO2 and PO4) over the parietal and occipital areas were used in data recording. To assess the quality of EEG signals recorded with dry electrodes, two disposable wet-gel electrodes placed at O1 and O2 were also included. Both dry and wet electrodes were referenced to the same electrode placed at the forehead. Thus, this study used a total of 10 electrodes in the EEG recordings.

#### **OFFLINE EEG DATA ANALYSIS**

To assess the quality of SSVEPs for moving humans using a mobile EEG headset, an offline analysis was conducted to address three issues: (1) evaluating reliability and quality of SSVEP recorded by the non-prep and mobile EEG system; (2) exploring the impact of walking locomotion on the SSVEP signal quality; and (3) optimizing the SSVEP detection pipeline during walking movements.

**FIGURE 1 | The illustration of experiment setup for SSVEP recordings**.

#### **EEG power spectral density**

For each of the 19 participants, this study collected a dataset of twelve 10-channel 60-s EEG segments (four treadmill speeds × three visual tasks). This study adopted a semi-automatic artifact removal procedure to remove EEG artifacts induced by motion. The EEG data were first filtered by a 1–50 Hz band-pass finite impulse response (FIR) filter with zero phase-shift to remove the DC drifts and high-frequency artifacts. Then, transient artifacts and noisy channels attributable to walking locomotion were sequentially removed by hand. Since the number and locations of noisy dry channels might vary from one subject to another, it was difficult to find an identical pair of dry channels for all subjects to make a wet-dry comparison. Alternatively, two dry electrodes closest to the wet electrodes (O1 and O2) were selected for each subject from the four parieto-occipital electrodes (PO1, PO2, PO3, and PO4). The data from seven participants were discarded from the spectrum analysis after subjective inspection: two participants had poor signal quality at both wet electrodes, and five participants had poor signal quality at the parieto-occipital electrodes. This study then applied the short-time Fourier transform (STFT) with a 250-point Hamming window and 50% overlap to each of the 60-s EEG segments to estimate the EEG spectrogram with a frequency resolution of 1 Hz. The averaged power spectral density (PSD) of each channel was derived by averaging the PSDs from different time windows. Lastly, this study employed a relative PSD, i.e., the ratio of the PSD to the total power (1–50 Hz), to compare the spectral characteristics in different conditions.
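The relative-PSD computation described above can be sketched as follows. This is a minimal illustration with NumPy, not the authors' code: the function name `relative_psd` and its parameters are hypothetical, the periodogram is left unscaled because only ratios are used, and the 1–50 Hz band-pass filtering and manual artifact rejection are assumed to have been done beforehand.

```python
import numpy as np


def relative_psd(x, fs=250, nperseg=250, overlap=0.5, band=(1, 50)):
    """Average Hamming-windowed periodograms and normalise by total band power.

    A 250-point window at fs = 250 Hz gives 1 Hz frequency resolution.
    Returns (freqs, rel) restricted to the requested band; rel sums to 1.
    """
    x = np.asarray(x, dtype=float)
    step = int(nperseg * (1 - overlap))          # 125 samples for 50% overlap
    window = np.hamming(nperseg)
    starts = range(0, len(x) - nperseg + 1, step)
    # One (unscaled) power spectrum per time window.
    spectra = [np.abs(np.fft.rfft(window * x[s:s + nperseg])) ** 2 for s in starts]
    psd = np.mean(spectra, axis=0)               # average over time windows
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    keep = (freqs >= band[0]) & (freqs <= band[1])
    return freqs[keep], psd[keep] / psd[keep].sum()


if __name__ == "__main__":
    t = np.arange(60 * 250) / 250                # one 60-s segment
    demo = np.sin(2 * np.pi * 11 * t)            # synthetic 11 Hz "SSVEP"
    freqs, rel = relative_psd(demo)
    print("peak at", freqs[np.argmax(rel)], "Hz")
```

Normalising by the 1–50 Hz total makes spectra comparable across electrodes and conditions whose absolute amplitudes differ, which is the stated purpose of the relative measure.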

#### **Offline steady-state visual-evoked potential (SSVEP) analysis**

Previous SSVEP studies conducted with stationary, movement-constrained subjects have demonstrated several factors that affect the performance of SSVEP detection, including detection algorithm, data length, and channel montage (Lin et al., 2006; Wang et al., 2006; Bin et al., 2009). The offline SSVEP analysis of this study aimed to explore the effects of these factors on the much more challenging datasets and to find an optimal data-processing pipeline for detecting 11 and 12 Hz SSVEPs collected from freely moving subjects in a naturalistic environment. First, this study implemented and compared the PSD-based analysis (PSDA) and canonical correlation analysis (CCA; Lin et al., 2006) algorithms commonly used in SSVEP-based BCIs. Second, to evaluate the optimal data length, each of the eight 10-ch 60-s visually induced EEG trials (four speeds × two flickering frequencies) was segmented into non-overlapping *N*-s epochs (*N* = 1–5) for comparison. Third, this study explored the optimal channel montage from the eight dry electrodes for each detection method.

PSDA was the most widely used frequency-detection method in early BCI implementations. The PSD is estimated within a given time window of EEG data, and the PSDA method decides the frequency of an SSVEP signal according to the peak of spectral amplitude. This study used the PSD values at 11 and 12 Hz as features for target identification. The frequency with the higher PSD value was considered the target frequency. Using longer EEG data for deriving the spectra can increase the SNR (Wang et al., 2006) and thereby improve the SSVEP detectability (Lin et al., 2006; Bin et al., 2009). Since the PSDA method can be conducted on a single channel or a bipolar channel, its low computational cost and small electrode requirement make it hard to replace in BCI applications. The STFT with a non-overlapping 250-point Hamming window was applied to *N*-s EEG epochs to estimate the PSD over time with a frequency resolution of 1 Hz. This study adopted a bipolar-channel montage for the PSDA calculation to obtain a better SNR. In an optimal bipolar measurement of SSVEPs, most of the spontaneous background activities in the two electrodes are eliminated while the SSVEP component is retained (Wang et al., 2006). Notably, since the optimal channel montage may vary by subject, this study performed an exhaustive search for the optimal bipolar channel, based on the criterion of maximal frequency detection performance, from the eight dry electrodes for each subject.
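The bipolar PSDA decision rule can be sketched as below. This is an illustrative NumPy version under stated assumptions: `psda_detect` is a hypothetical name, the epoch is assumed to be already filtered, and the channel pair is passed in rather than searched for.

```python
import numpy as np


def psda_detect(epoch, pair, fs=250, targets=(11, 12), nperseg=250):
    """Return the target frequency with the larger PSD in a bipolar derivation.

    epoch: array of shape (n_channels, n_samples); pair: (i, j) channel indices.
    """
    epoch = np.asarray(epoch, dtype=float)
    # Bipolar derivation: activity common to both electrodes cancels out.
    bipolar = epoch[pair[0]] - epoch[pair[1]]
    n_win = len(bipolar) // nperseg
    # Non-overlapping 250-point Hamming windows, one row per window.
    segs = bipolar[:n_win * nperseg].reshape(n_win, nperseg) * np.hamming(nperseg)
    psd = (np.abs(np.fft.rfft(segs, axis=1)) ** 2).mean(axis=0)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    powers = [psd[np.argmin(np.abs(freqs - f))] for f in targets]
    return targets[int(np.argmax(powers))]


if __name__ == "__main__":
    fs = 250
    t = np.arange(3 * fs) / fs                       # one 3-s epoch
    background = np.random.default_rng(0).standard_normal(t.size)
    demo = np.vstack([np.sin(2 * np.pi * 12 * t) + background, background])
    print(psda_detect(demo, (0, 1)))
```

In the synthetic demo the shared background activity is removed by the subtraction, which mirrors the rationale given for the bipolar montage.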

Unlike the frequency-based PSDA method, CCA is a multivariate statistical method that aims to maximize the correlation between a linear combination of multichannel EEG signals and a linear combination of sinusoidal templates (sine and cosine waves for automatic phase adjustment) corresponding to the targeted flickering frequencies (Lin et al., 2006; Bin et al., 2009). The SSVEP frequency is determined according to the maximal canonical correlation among the predefined template frequencies. For example, the CCA method returns the SSVEP frequency of 11 Hz if the correlation coefficient between the measured signals and the 11 Hz template is larger than that between the measured signals and the 12 Hz template. The CCA calculation, which uses channel covariance information, has been suggested to return SNR-enhanced SSVEP signals. Unlike PSDA, CCA does not require channel selection, and its multivariate statistical analysis makes it capable of improving the SNR of SSVEPs through spatial filtering. Note that the CCA calculation in this study relied only on the fundamental frequency of the template signals, because a previous study has shown that the inclusion of harmonics did not significantly improve the SSVEP detection (Bin et al., 2009). In addition, the CCA calculation was conducted on several montages of the eight dry channels for comparison, including all channels (8-Ch), four parietal channels (P-4Ch; P3, P1, P2 and P4), four parieto-occipital channels (PO-4Ch; PO3, PO1, PO2 and PO4), two lateral parieto-occipital channels (LPO-2Ch; PO3 and PO4), and two inferior parieto-occipital channels (IPO-2Ch; PO1 and PO2). Note that the channel montage IPO-2Ch, which is closest to the wet electrodes (O1 and O2), was used to perform the wet-dry electrode comparison.
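The CCA decision rule can be sketched with NumPy as follows. The canonical correlation is computed here through QR decompositions followed by a singular value decomposition, which is one standard numerical route and not necessarily the one used in the study; `cca_max_corr` and `cca_detect` are hypothetical names, and the templates contain only the fundamental frequency, as stated in the text.

```python
import numpy as np


def cca_max_corr(X, Y):
    """Largest canonical correlation between the column spaces of X and Y.

    X: (n_samples, n_channels), Y: (n_samples, n_refs); both assumed full rank.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    qx, _ = np.linalg.qr(X)      # orthonormal basis of the EEG channel space
    qy, _ = np.linalg.qr(Y)      # orthonormal basis of the template space
    # Singular values of qx.T @ qy are the canonical correlations.
    return float(np.linalg.svd(qx.T @ qy, compute_uv=False)[0])


def cca_detect(epoch, fs=250, targets=(11, 12)):
    """Return (detected frequency, canonical correlation per target).

    epoch: array of shape (n_channels, n_samples).
    """
    epoch = np.asarray(epoch, dtype=float)
    t = np.arange(epoch.shape[1]) / fs
    scores = []
    for f in targets:
        # Sine and cosine at the fundamental let CCA absorb the unknown phase.
        ref = np.column_stack([np.sin(2 * np.pi * f * t), np.cos(2 * np.pi * f * t)])
        scores.append(cca_max_corr(epoch.T, ref))
    return targets[int(np.argmax(scores))], scores


if __name__ == "__main__":
    fs = 250
    t = np.arange(3 * fs) / fs
    rng = np.random.default_rng(1)
    demo = np.array([np.sin(2 * np.pi * 12 * t + p) for p in np.linspace(0, 1, 8)])
    demo = demo + rng.standard_normal(demo.shape)    # 8 noisy synthetic channels
    print(cca_detect(demo))
```

Restricting the rows of `epoch` to a subset of channels gives the montage comparison described above without any change to the decoder.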

To perform the CCA-PSDA comparison in a realistic online fashion (Lemm et al., 2011), this study selected an optimal bipolar channel for PSDA by estimating detection accuracy with a twofold cross validation. The training trials were only used to perform the exhaustive channel search for PSDA, whereas the test trials were adopted to calculate frequency detection performance.
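The separation between channel search and performance estimation can be sketched as follows. The input format (a dictionary mapping each candidate bipolar pair to per-epoch correctness flags), the interleaved fold assignment, and the function names are assumptions made for illustration; the text does not specify how trials were split.

```python
from itertools import combinations


def all_bipolar_pairs(channels):
    """Enumerate every unordered pair of electrodes for the exhaustive search."""
    return list(combinations(channels, 2))


def twofold_pair_selection(correct_by_pair):
    """Two-fold cross-validated accuracy of the best bipolar pair.

    correct_by_pair: dict mapping (ch_a, ch_b) -> list of booleans, one per
    epoch, telling whether PSDA detected the right frequency in that epoch.
    The pair is chosen on the training fold only and scored on the other fold.
    """
    n_epochs = len(next(iter(correct_by_pair.values())))
    # Assumed split: even-indexed epochs form fold 0, odd-indexed form fold 1.
    folds = [list(range(0, n_epochs, 2)), list(range(1, n_epochs, 2))]

    def accuracy(pair, idx):
        hits = correct_by_pair[pair]
        return sum(hits[i] for i in idx) / len(idx)

    test_scores = []
    for k in (0, 1):
        train, test = folds[k], folds[1 - k]
        best = max(correct_by_pair, key=lambda p: accuracy(p, train))
        test_scores.append(accuracy(best, test))
    return sum(test_scores) / 2


if __name__ == "__main__":
    dry = ["P3", "P1", "P2", "P4", "PO3", "PO1", "PO2", "PO4"]
    print(len(all_bipolar_pairs(dry)), "candidate bipolar pairs")
```

Choosing the pair on the same epochs used for scoring would bias the PSDA accuracy upward, which is why the selection and evaluation folds are kept apart.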

In sum, this study systematically applied both the PSDA and CCA methods to *N*-s EEG epochs with different channel montages. The SSVEP frequency was taken as whichever of 11 Hz and 12 Hz yielded the larger PSD value (in PSDA) or the larger canonical correlation coefficient (in CCA). This study aimed to explore an optimal pipeline for improving SSVEP detectability in moving humans. The detectability is the percentage of correctly detected epochs in frequency detection and was calculated only in the sessions with visual flickers (11 Hz and 12 Hz). The conditions without visual stimuli were used only for evaluating EEG spectral fluctuations irrelevant to visual stimulation.
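As a small illustration of the epoching and scoring just described, the sketch below cuts a 60-s trial sampled at 250 Hz into non-overlapping *N*-s epochs and computes detectability as a percentage. The function names are hypothetical and the predicted labels in the demo are made up for illustration only.

```python
def split_epochs(n_samples, epoch_s, fs=250):
    """Return (start, stop) sample indices of non-overlapping epochs."""
    size = int(epoch_s * fs)
    return [(s, s + size) for s in range(0, n_samples - size + 1, size)]


def detectability(predicted, true_freq):
    """Percentage of epochs whose detected frequency equals the stimulus frequency."""
    return 100.0 * sum(p == true_freq for p in predicted) / len(predicted)


if __name__ == "__main__":
    trial_samples = 60 * 250
    for n in range(1, 6):
        print(n, "s epochs:", len(split_epochs(trial_samples, n)))
    # Made-up detector outputs for four epochs of an 11 Hz trial.
    print(detectability([11, 11, 12, 11], true_freq=11), "%")
```

Longer epochs leave fewer epochs per 60-s trial, so accuracy estimates at *N* = 5 rest on far fewer decisions than those at *N* = 1.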

#### **RESULTS**

#### **EEG SPECTRAL FLUCTUATIONS ASSOCIATED WITH DIFFERENT WALKING SPEEDS**

One aim of this study was to assess whether or not a mobile EEG system featuring dry electrodes is capable of acquiring laboratory-quality EEG signals in moving humans. To this end, this study performed the wet-dry electrode comparison using spectral characteristics associated with standing and walking locomotion. This study employed the analysis of variance (ANOVA) to reveal the impact of different walking speeds (standing, 1 MPH, 2 MPH, and 3 MPH) on spectral changes along frequency (1–50 Hz). **Figure 2** depicts EEG spectral fluctuations associated with different walking speeds using dry and wet electrodes. As subjects started walking, both types of electrodes presented comparable tendencies in α (8–13 Hz) suppression compared to standing still (black solid line). Walking speed correlated more or less positively with the degree of α-suppression. There was a statistically significant α-suppression (*p* < 0.05) at 11 and 12 Hz for both electrodes. The walking-related α-suppression was reproduced when subjects gazed at visual flickers during walking. In the standing condition, both dry and wet electrodes detected resonance peaks at the stimulus frequencies (11 and 12 Hz) and the second harmonics (22 and 24 Hz). The third harmonic was only evident in the 12 Hz condition. The SSVEP amplitudes at the fundamental frequencies measured by both types of electrodes dropped significantly during walking (*p* = 0.05). Regardless of the presence or absence of visual flickers, both dry and wet electrodes exhibited a monotonic power increase at 2 Hz as walking speed increased.

**Figure 3** portrays the trend of the spectral changes at 2, 11 and 12 Hz at different walking speeds. In general, for both dry and wet electrodes, faster walking locomotion was accompanied by a progressive spectral increase at 2 Hz, but a monotonic decrease at 11 and 12 Hz, regardless of the presence or absence of visual tasks. A *t*-test was performed to compare the mean spectral power between walking speeds. The results showed that in most cases an increase in walking speed of 2 MPH or more, e.g., from standing to 2 MPH or to 3 MPH, led to a statistically significant spectral difference (*p* < 0.05). Only dry electrodes measured a significant 2 Hz spectral augmentation at 3 MPH versus standing.

#### **OFFLINE SSVEP ANALYSIS**

The offline SSVEP analysis aimed to not only evaluate the feasibility of using a mobile EEG system to acquire SSVEP signals, but also explore the optimal parameters for SSVEP detection in moving humans. Several analyses were performed with an emphasis on: (1) SSVEP detectability in dry versus wet electrodes; (2) optimal electrode montage; (3) SSVEP detection algorithm; and (4) frequency sensitivity in SSVEP detection (11 vs. 12 Hz).

**Figure 4** shows the SSVEP detectability using different epoch lengths at different walking speeds. In general, SSVEP detectability improved with longer EEG epochs at all walking speeds, and the detectability declined as walking speed increased. Specifically, **Figure 4A** shows the wet-dry comparison of CCA-based SSVEP detectability, i.e., wet electrodes (O1 and O2) vs. adjacent dry electrodes (PO1 and PO2). The results indicated that the detectability using wet electrodes (solid line) exceeded that using dry electrodes (dotted line) by at least 10% across epoch lengths for the standing condition. SSVEP detectability decayed as walking speed increased from 1 to 3 MPH for both electrode types. The detectability decay was more evident in wet electrodes, at around 5% per 1 MPH increase, making wet and dry electrodes competitive at higher walking speeds. **Figure 4B** systematically assesses the CCA-based SSVEP detectability using different montages of dry electrodes. The result showed that using more channels (from 2 to 8) in general improved SSVEP detectability across epoch lengths and walking speeds, except for the montage of four parietal channels (P-4ch, blue dashed line). The maximal accuracy was obtained by using 8 channels at any given walking speed, followed by four parieto-occipital channels (PO-4ch, blue dotted line), two inferior channels (PO1 and PO2, pink dotted line) and lateral channels (PO3 and PO4, pink dashed line) of the parieto-occipital strap, and four parietal channels (P-4ch, blue dashed line). Interestingly, both 2-ch montages returned comparable or even better results than the montage of four parietal channels. The SSVEP detectability tended to decrease as walking speed increased no matter how many channels were involved in the analysis. **Figure 4C** illustrates the CCA-PSDA comparison in SSVEP detectability based on eight dry electrodes. The profiles along different epoch lengths showed that CCA clearly outperformed PSDA at all walking speeds. Lastly, **Figure 4D** shows the frequency sensitivity in SSVEP frequency detection (11 vs. 12 Hz) using the 8-ch CCA method at different walking speeds. The result indicated that the SSVEP detectability at 11 Hz (dotted line) was clearly higher than at 12 Hz (dashed line) until the speed reached 3 MPH. The SSVEP detectability at 11 and 12 Hz was nearly identical during fast walking (at 3 MPH).

**Figure 5** summarizes the impact of different data lengths of 8-ch EEG epochs on the CCA-based SSVEP detectability. The result indicated that although the detectability improved with longer data epochs, there was no statistically significant improvement (*p* > 0.05) for epoch lengths longer than 3 s at any walking speed. The use of 8-ch 3-s EEG epochs (solid line) in CCA obtained an accuracy of 84.87 ± 13.55% for standing, which declined as subjects started walking (1 MPH: 83.03 ± 13.24%, 2 MPH: 79.47 ± 13.53%, and 3 MPH: 75.26 ± 17.89%).

#### **DISCUSSION**

Most BCI demonstrations have been conducted within well-controlled settings where tethered subjects had highly restricted movements. It remains unclear how well such laboratory-oriented demonstrations can be translated into operational BCIs for users situated in real environments. This study aimed to assess the applicability of using a mobile EEG system to decode SSVEP signals in moving humans. The results showed that although the SSVEPs began to deteriorate while subjects engaged in faster walking locomotion, the detectability obtained from a conceptual BCI paradigm showed its potential in naturalistic environments outside highly controlled laboratory environments. Most importantly, this study found that the targeted brain responses that serve as BCI channels (e.g., SSVEPs in this study) are more or less susceptible to change while humans are actively behaving in real-life environments. This evidence supports the view that brain dynamics might behave distinctively in natural environments versus laboratory environments (Mcdowell et al., 2013). Accordingly, prior to deploying a real-life mobile BCI, the desired BCI channel should be fully explored and characterized beyond laboratory settings.

#### **USING A MOBILE EEG HEADSET FOR MOVING HUMANS**

This study aimed to elucidate whether or not the non-prep, dry-electrode mobile EEG headset provides acceptable quality of SSVEP signals in moving humans. To clarify this issue, this study used two wet electrodes placed at O1 and O2 for comparison. As shown in the wet-dry detectability comparison (cf. **Figure 4A**), although the accuracy of both electrode types deteriorated as the walking speed increased, the wet electrodes tended to produce better accuracy for standing (by 10%) and for the different walking speeds (by 4%) across epoch lengths. However, it is worth mentioning that due to the non-identical channel locations (wet: O1 and O2; dry: PO1 and PO2) used in the comparison, the detectability gap might not be fully attributable to the electrode types. It could be partially attributed to the fact that the occipital electrodes over the visual cortex have better SNR than those at the parieto-occipital areas (Wang et al., 2006; Lin et al., 2012). The 4-ch comparison (cf. **Figure 4B**) also mirrored this phenomenon. That is, the parieto-occipital strap (PO-4ch) significantly outperformed the parietal strap (P-4ch) by 4–13% across different walking speeds and different epoch lengths. In addition, one might argue that the signals of dry electrodes might be more vulnerable to movement interference (Guger et al., 2012). In our study, the dry electrodes tended to be significantly affected by 2 Hz artifacts during fast walking. The SSVEP fluctuations (11 and 12 Hz) measured by both electrode types at different walking speeds were nevertheless comparable (cf. **Figure 3**). Considering practical factors such as ease of use for BCI users, as well as the acceptable performance derived from multiple channels for moving humans (cf. **Figure 4B**), using a mobile EEG system (dry, non-prep sensors) to record EEG/SSVEP signals under hostile recording settings should be feasible and practical for real-life BCI applications.

#### **SPECTRAL DYNAMICS ASSOCIATED WITH WALKING LOCOMOTION**

The SSVEP signals (11 and 12 Hz) in this study were found to progressively decrease as the walking speed increased from 0 (standing still) to 3 MPH (cf. **Figures 2**, **3**). Two factors might contribute to the deterioration of SSVEPs in walking locomotion. First, the SSVEPs targeted within the α-range (8–13 Hz) might be highly constrained by the α-suppression attributed to the transition from an idling to an alert state. People who are awake and engaged in no processing of sensory input or motor execution typically exhibit dominant 8–12 Hz resting EEG activity, called idling activity. One major idling activity, the α-rhythm over the visual cortex, can be inhibited by the increase of visual processing during walking (Williamson et al., 1997). The behavior of the idling activity may very likely explain in part the resulting occipital α-attenuation in this study. This study further found that the level of deterioration was positively correlated with the intensity and speed of walking locomotion, generally resulting in a significant drop as the walking pace increased, especially for dry electrodes (cf. **Figure 3**). Since the SSVEP signal is assumed to arise from stimulus-induced phase resetting of ongoing EEG oscillations (Sauseng et al., 2007), it is reasonable to assume that the suppression of the spontaneous α-rhythm led to reduced SSVEP amplitudes during fast walking. Second, participants reported a certain degree of visual distraction while keeping up with the movement of the treadmill, especially at the speed of 3 MPH. Since visual spatial attention plays an important role in modulating the SSVEP magnitude (Morgan et al., 1996; Kelly et al., 2005a; Lin et al., 2012), the suppression of SSVEP signals could also be attributed in part to the loss of visual focus on the flickering stimulus and/or rapid bouncing of the visual focus due to head nodding. However, the results of this study do not allow the contributions of these two factors to the SSVEP suppression in moving humans to be differentiated.

Another interesting finding related to walking locomotion was the spectral augmentation at 2 Hz. The 2 Hz power tended to increase monotonically as subjects started walking on the treadmill (cf. **Figures 2**, **3**), especially for dry electrodes, which might be more sensitive to motion artifacts. The head movement accompanying natural walking might explain this phenomenon. Our very recent study (Lin et al., 2013 under review) demonstrated that head movement, especially when walking at 3 MPH, predominantly involved intense 2 Hz head nodding (recorded by a vertical gyroscope sensor). This 2 Hz head-nodding movement swayed the EEG headset, including its cables and circuitry, and therefore yielded low-frequency drifts in the EEG signals. Fortunately, the 2 Hz headset swaying due to head nodding accompanying gait cadence (tested up to 3 MPH in this study) did not deteriorate the quality of SSVEPs (11 and 12 Hz). However, it might considerably contaminate ERP signals, which are widely used in ERP-based BCIs (Wolpaw et al., 2002).

#### **OPTIMAL PARAMETERS FOR SSVEP DETECTION IN MOVING HUMANS**

Several factors including data length, channel montage, decoding method, and SSVEP resonant frequency were reported to affect the performance of SSVEP-based BCIs. However, the previous comparative studies were all conducted on stationary subjects within laboratory settings. This study compared the effects of these critical factors on SSVEP detection in a hostile recording condition (e.g., walking). The goal of this study was not only to test whether or not previous statements on SSVEP parameters remain valid, but also to explore an optimal procedure for detecting SSVEPs in moving humans.

First, as revealed in **Figures 4**, **5**, using longer epochs improved the SSVEP detectability consistently at different walking speeds, which was in line with previous studies (Lin et al., 2006; Wang et al., 2006; Bin et al., 2009). This can be attributed to the fact that applying longer EEG data to spectrum estimation can enhance the SNR of SSVEPs and thereby increase their detectability (Lin et al., 2006; Wang et al., 2006; Bin et al., 2009). Second, regarding the montage selection (cf. **Figure 4B**), a comparison of the detectability using different montages (P-4Ch vs. PO-4Ch, LPO-2Ch vs. IPO-2Ch) showed that electrodes placed toward the central occipital cortex improved SSVEP detection. These findings are reasonable, as they accord with the fact that the cortical sources of SSVEPs mainly localize in primary visual cortex (V1) and in the motion-sensitive areas (V5), along with minor contributions from mid-occipital (V3A) and ventral occipital (V4/V8) areas (Di Russo et al., 2007). V1 is specialized for processing information about static and moving objects. Adopting the IPO-2Ch montage directly probed the V1 activation and might provide more informative signals compared to other sites. In addition, more channels covering the entire visual cortex enhanced the detection of SSVEP signals (Friman et al., 2007). CCA is a multivariate statistical method that determines the SSVEP frequency by maximizing the correlation coefficient between multichannel EEG signals and targeted reference signals. Applying CCA to multichannel SSVEP signals thus can improve the SNR of SSVEPs and benefit SSVEP detection (Lin et al., 2006; Bin et al., 2009). Previous CCA studies performed on data collected from stationary subjects (Lin et al., 2006; Bin et al., 2009) reported that the CCA method significantly outperformed the PSDA method, which supported our findings in the CCA-PSDA comparison. Last, as explored in **Figure 4D**, the 11 Hz SSVEP, rather than the 12 Hz SSVEP, predominantly contributed to the overall detectability until the walking speed reached 3 MPH. This result indicated that the SSVEP detectability of moving humans was sensitive to the resonant frequencies of visual flickers, which was consistent with the findings in stationary subjects (Herrmann, 2001; Kelly et al., 2005b; Wang et al., 2006; Lin et al., 2012).

To conclude, the SSVEP findings under the standing condition, i.e., movement-constrained, were comparable with previous studies of stationary (and seated) subjects. This study further explored the SSVEP dynamics in subjects walking steadily on the treadmill from 1 to 3 MPH. The SSVEP detectability tended to progressively deteriorate as walking speed increased no matter what channel montage, detection method, and flickering frequency was used. Although longer EEG epochs did improve the detectability, they could reduce the practicality of an online BCI system by decreasing the information transfer rate (ITR), an index for evaluating BCI performance that correlates positively with detection accuracy but negatively with decision time (Wolpaw et al., 2002). In addition, this study found that an epoch length exceeding 3 s did not significantly improve the detectability for moving subjects. Accordingly, taking montage generalizability into account, feeding 3-s 8-ch EEG data to the CCA decoder might be an optimal procedure to detect SSVEP signals in moving humans. Such a protocol yielded acceptable accuracies of 75–83% in distinguishing binary SSVEPs (11 Hz and 12 Hz) for walking speeds up to 3 MPH, compared to standing (84.87 ± 13.55%). The empirical findings of this study not only explored the inherent characteristics and limitations of SSVEPs of freely moving participants under realistic environments, but also boosted the development of conceptual BCI paradigms that can be further translated to practically feasible systems.
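The trade-off between accuracy and decision time can be made concrete with the ITR definition commonly attributed to Wolpaw et al. (2002). The sketch below assumes equiprobable targets and errors spread evenly over the wrong targets; `itr_bits_per_min` is a hypothetical name, and no ITR values are reported in this study, so the demo inputs are purely illustrative.

```python
import math


def itr_bits_per_min(accuracy, n_targets, decision_s):
    """Information transfer rate in bits per minute.

    bits per decision = log2(N) + P*log2(P) + (1 - P)*log2((1 - P)/(N - 1)),
    with the convention 0 * log2(0) = 0 at the boundaries P = 0 and P = 1.
    """
    p, n = accuracy, n_targets
    bits = math.log2(n)
    if 0.0 < p < 1.0:
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    elif p == 0.0:
        bits += math.log2(1 / (n - 1))
    return bits * 60.0 / decision_s


if __name__ == "__main__":
    # Same (illustrative) accuracy, different epoch lengths: longer epochs
    # mean fewer decisions per minute and therefore a lower ITR.
    for epoch_s in (1, 3, 5):
        print(epoch_s, "s:", round(itr_bits_per_min(0.85, 2, epoch_s), 2), "bits/min")
```

For a fixed accuracy the ITR is inversely proportional to the epoch length, so a longer epoch pays off only if the accuracy gain is large enough to offset the slower decision rate.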

#### **IMPLEMENTATION OF AN ONLINE BCI**

The offline classification used in this study demonstrated the feasibility of a conceptual SSVEP BCI during walking. To implement an online BCI, the following major issues need to be addressed: (1) multiple stimuli with different flickering frequencies need to be presented simultaneously on the screen; (2) the data processing procedures such as band-pass filtering must be causal and fast to satisfy real-time implementation; (3) parameters such as electrodes and data length need to be selected automatically; and (4) visual or auditory feedback needs to be provided to the subjects in near real time. These issues can be resolved using the existing methodologies developed in current SSVEP BCIs (Wang et al., 2006, 2011; Bin et al., 2009).

The "loss of focus" is a major challenge in building an online SSVEP-based BCI during walking. As discussed above, the deterioration of SSVEP amplitude during walking could be in part attributed to the loss of focus. A further challenge in an online BCI is to eliminate the interference among multiple targets caused by loss of focus. On one hand, increasing the distance between neighboring stimuli can reduce the interference between stimuli in the central and peripheral visual fields. On the other hand, a wearable stimulator (e.g., head-mounted display) may be used to facilitate fixation during walking.

#### **FUTURE DIRECTIONS**

Future efforts in decoding SSVEPs for freely moving humans can be devoted to eliciting SSVEPs outside the α-frequency band, which is subject to changes in visual processing during walking. Several studies have reported that SSVEP resonance also appears in higher frequency bands, up to the γ-band (30–50 Hz) (Herrmann, 2001; Wang et al., 2006; Lin et al., 2012). In addition, future work could replicate the treadmill experiment with the visual stimuli presented through a head-mounted display device. This might help to elucidate the α-suppression attributed to the loss of visual attention and the engagement of walking locomotion. More importantly, the integration of a mobile EEG headset and a head-mounted display device might help to establish ubiquitous mobile BCI systems in ecologically valid environments. Similar to the SSVEP signals, the visual focus also strongly influences ERP amplitudes and in turn affects the performance of a gaze-dependent BCI speller (Treder and Blankertz, 2010). Another direction is to incorporate gaze-independent paradigms (Treder et al., 2011; Riccio et al., 2012), which are applicable to patients with oculomotor impairments, to solve this issue using the same mobile settings.

#### **AUTHOR CONTRIBUTIONS**

Conceived and designed the experiments: Yijun Wang. Performed the experiments: Chun-Shu Wei, Yuan-Pin Lin. Analyzed the data: Yuan-Pin Lin, Yijun Wang, Tzyy-Ping Jung. Wrote the paper: Yuan-Pin Lin, Yijun Wang, Tzyy-Ping Jung.

#### **ACKNOWLEDGMENTS**

This work was supported by the Office of Naval Research (N00014-08-1215), Army Research Office (under contract number W911NF-09-1-0510), Army Research Laboratory (under Cooperative Agreement Number W911NF-10-2-0022), and DARPA (USDI D11PC20183).

#### **REFERENCES**


S. D'mello, A. Graesser, B. Schuller and J.-C. Martin (Berlin Heidelberg: Springer), 317–318.


**Conflict of Interest Statement**: The research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 04 November 2013; accepted: 11 March 2014; published online: 31 March 2014.*

*Citation: Lin Y-P, Wang Y, Wei C-S, and Jung T-P (2014) Assessing the quality of steady-state visual-evoked potentials for moving humans using a mobile electroencephalogram headset. Front. Hum. Neurosci. 8:182. doi: 10.3389/fnhum.2014.00182 This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Lin, Wang, Wei and Jung. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# A comparison of geometric- and regression-based mobile gaze-tracking

#### *Björn Browatzki<sup>1</sup>, Heinrich H. Bülthoff<sup>1,2</sup>\* and Lewis L. Chuang<sup>1</sup>\**

*<sup>1</sup> Department of Perception, Cognition and Action, Max Planck Institute for Biological Cybernetics, Tübingen, Germany*

*<sup>2</sup> Department of Brain and Cognitive Engineering, Korea University, Seoul, South Korea*

#### *Edited by:*

*Klaus Gramann, Berlin Institute of Technology, Germany*

#### *Reviewed by:*

*Martin Rolfs, Humboldt Universität zu Berlin, Germany (in collaboration with Carlos Cassanello) Sebastian Pannasch, Technische Universität Dresden, Germany*

#### *\*Correspondence:*

*Heinrich H. Bülthoff and Lewis L. Chuang, Department of Perception, Cognition and Action, Max Planck Institute for Biological Cybernetics, Spemannstrasse 41, 72076, Tübingen, Germany e-mail: heinrich.buelthoff@tuebingen.mpg.de; lewis.chuang@tuebingen.mpg.de*

Video-based gaze-tracking systems are typically restricted in terms of their effective tracking space. This constraint limits the use of eyetrackers in studying mobile human behavior. Here, we compare two possible approaches for estimating the gaze of participants who are free to walk in a large space whilst looking at different regions of a large display. Geometrically, we linearly combined eye-in-head rotations and head-in-world coordinates to derive a gaze vector and its intersection with a planar display, by relying on the use of a head-mounted eyetracker and body-motion tracker. Alternatively, we employed Gaussian process regression to estimate the gaze intersection directly from the input data itself. Our evaluation of both methods indicates that a regression approach can deliver comparable results to a geometric approach. The regression approach is favored, given that it has the potential for further optimization, provides confidence bounds for its gaze estimates and offers greater flexibility in its implementation. Open-source software for the methods reported here is also provided for user implementation.

**Keywords: calibration method, gaze measurement, eye tracking, eye movement, active vision, gaussian processes**

#### **1. INTRODUCTION**

Using gaze-tracking methods, it is possible to record where someone is looking on a visual display. Such methods facilitate the continuous observation of natural behavior, such as reading or visual search. In the context of electroencephalography (EEG) research, it allows neural activity to be co-registered with a visual stimulus that the participant chose to fixate (Baccino and Manunta, 2005; Jagla et al., 2007).

Unfortunately, accurate gaze-tracking often requires the participant's head and body movements to be restrained, for example, with a head-rest. As a consequence, the eye's position is fixed in a global reference frame and accurate gaze-tracking can be achieved by tracking only the rotations of the eye. This can be done either by tracking the induction current of a coil that is placed on the eye itself (Robinson, 1963; Collewijn et al., 1985) or with video-based eye-trackers, which utilize either head-mounted or long-range cameras to monitor characteristic visual features of the eye (i.e., pupil, corneal reflection). Video-based methods are non-invasive and are, thus, more comfortable for the user and suitable for studying natural behavior over longer test sessions. During calibration, visual stimuli (e.g., 0.5° radius annulus) are presented at extreme points on the display for fixation. By interpolating between the pupil positions recorded in the eye-tracker's camera image, it is possible to infer the observer's point of regard (POR) between these extreme screen positions. If the physical distance of the observer's eyes to these calibrated points is known, it is possible to infer the vertical and horizontal rotations of the observer's eye in a head-centered coordinate system (Nakayama, 1974; Moore et al., 1996).

If the observer's head pose is known (i.e., combined position and orientation), this geometric approach can be extended to compute gaze without restraining head movements (Epelboim et al., 1995; Johnson et al., 2007; Ronsse et al., 2007). Continuous measures of a user's head pose can be achieved with motion tracking systems. Such systems range from off-the-shelf markerless motion-tracking systems (e.g., Microsoft's Kinect) to those that track well-placed infra-red reflective markers on the user's body with a high level of precision (e.g., Vicon Motion Systems). The critical step lies in deriving the transformation matrix that expresses the eye model, which is calibrated in an eye-centered reference frame, in terms of the global reference frame that the user and task relevant objects share (see section 2.2). This defines a line-of-sight, namely a gaze vector that consists of the eye's origin and direction. If an accurate model of the display (and/or other real-world objects) in the same global reference frame is known, intersections between the current line-of-sight and the screen coordinates of the display can be easily computed.

There are several limitations to this geometric approach. First, it requires an accurate model of the display as well as of the observer's eye. Such models are often represented as idealized geometric objects and their interdependence must be explicitly stated as linear algebraic formulations. These formulations do not account for intrinsic error arising from non-linearities and inaccurate measurements during the calibration phase. For example, there might be small but systematic displacements of head-mounted eye-tracking cameras due to tension of the forehead muscles when fixating peripheral targets (e.g., >15°). This would cause non-linearities in the eye-model that are rarely accounted for. Finally, the geometric approach assumes that the vector of the user's gaze accurately intersects with the POR during the calibration phase. In reality, gaze stability is likely to vary across individuals and different activities, regardless of compensatory eye movements (e.g., vestibular ocular reflex; Medendorp et al., 2002). Even if gaze fixation can be assumed to be perfectly stable by minimizing head and body movements during calibration, this may not be the case during testing. Altogether, these small residual errors could accumulate and result in a significant combined error. In fact, the calibration accuracy of the eye-tracker is especially critical in a geometric-based system since this is the only aspect that can be controlled by the experimenter during data-collection. Therefore, it is often repeated until an acceptable level of error is achieved. If this is not possible, the experiment is aborted.

In contrast to the geometric approach, a purely data-driven regression approach could enable data from the motion- and eye-tracker to be directly mapped to the desired coordinates for POR, for example, the screen coordinates of the display(s) or object. This mapping can be inferred from training data without the need for any domain specific knowledge. In addition, system error or unanticipated behavioral singularities need not be explicitly specified as they will be implicitly incorporated in the model. Such an approach does not attempt to geometrically reconstruct the line-of-sight. However, data-driven methods suffer from the fact that outputs are highly dependent on the training data. This means that they can only be as accurate as the data provided during calibration, and they require behavior in the calibration phase to resemble expected behavior during testing. This may require an inordinate amount of training data, translating into an impractically long calibration phase. Nonetheless, it grants the experimenter the flexibility (and responsibility) of designing the calibration task so as to solicit looking behavior that best generalizes to the test conditions. Finally, a regression method will not only provide an estimate of the POR, but an associated confidence level as well. This can be obtained prior to experimentation and would determine if more data is required for further calibration. It can also be used to filter out unreliable PORs from the test data.

The purpose of the current work is threefold. First, it provides a comparison of a geometric and a regression approach to mobile gaze-tracking. To evaluate both methods, we adopted a calibration–validation protocol—a procedure that is common to most commercial eye-tracking systems. Data from a single user is first processed with one calibration method and then validated in terms of its accuracy in determining the user's gaze on known PORs. Therefore, our reported results should provide readers with a practical intuition of the data quality that can be expected when using either a geometric or a regression method. Previous reports on mobile gaze-tracking restricted their analyses to standing participants with unrestrained head movements (e.g., Ronsse et al., 2007; Cesqui et al., 2013). Here, we included a previously unreported condition that required our participants to walk freely. Second, we address how the procedure for collecting calibration data can influence the validation accuracy of either method. For this purpose, we collected datasets in two situations. Participants either fixated an unpredictable sequence of static markers (cf., Johnson et al., 2007) or pursued a moving marker (cf., Cesqui et al., 2013). Our algorithms were trained on either type of dataset and validated on the same or different type of dataset. Third, we provide the approaches reported in this paper as an open-source software toolbox to allow other researchers to implement the methods reported here in their own test environments and adapt them to their specific needs. Some variations of the geometric approach have been reported before (e.g., Epelboim et al., 1995; Johnson et al., 2007; Ronsse et al., 2007; Cesqui et al., 2013). Our implementation represents a general version of these methods and does not rely on specific equipment or assumptions. For example, we do not assume a particular geometric model of user's eye and head. 
It should be noted that our implementation is only intended for the retrieval of a mobile user's POR. It does not offer the level of spatial and temporal precision required for the study of gaze kinematics. For this, a scleral search-coil method should be employed instead.

This paper is organized as follows. Section 2 provides a systematic description of the geometric and the regression methods that we implemented for mobile gaze-tracking. Excellent textbooks are available that provide a comprehensive coverage of the basics of eye-tracking methodology as well as details of various implementations, and discussion of their relevance to behavioral research (e.g., Duchowski, 2007; Holmqvist et al., 2011). Section 2.4 reports a side-by-side evaluation of our geometric and regression methods. Three levels of user mobility were tested: (a) head-fixed, (b) head-free, (c) walking. The evaluations also explored instances where our regression method fared poorly, so as to highlight the limitations of this approach. We conclude by discussing the strengths and limitations of using either approach.

#### **2. MATERIALS AND METHODS**

#### **2.1. IMPLEMENTATION AND SYSTEM OVERVIEW**

In this section, we describe a geometric and a regression-based method for mobile gaze-tracking. These are publicly available as open-source software for mobile unrestrained gaze-tracking (*MUG*; https://bitbucket.org/browatbn/mug). Both methods require a motion tracking system and a head-mounted video eye-tracker for input data (**h**, **p**). The motion-tracking system provides the position and orientation of the user's head in a world coordinate system, which is collectively referred to as its pose, **h** = (*h<sub>x</sub>*, *h<sub>y</sub>*, *h<sub>z</sub>*, *h*<sub>φ</sub>, *h*<sub>θ</sub>, *h*<sub>ψ</sub>). The eye-tracker provides the 2-dimensional position of the user's pupil in the camera image, **p** = (*p<sub>x</sub>*, *p<sub>y</sub>*). The output of both methods is the user's POR, given as the horizontal and vertical coordinates of our screen model, (*u*, *v*). Although we assumed a planar surface for the current evaluation, this could be replaced by models with other display configurations (e.g., a curved screen), without modification of the core calibration algorithms *per se*. To the best of our knowledge, our methods do not depend on any proprietary algorithms of the chosen hardware systems, ensuring the generalizability of our methods to other hardware systems.

Sections 2.2 and 2.3 provide an overview of the algorithms on which our geometric and regression implementations are based. **Figure 1** provides a flowchart of the underlying processes of each method. Our geometric implementation operates by deriving the optimal parameters for a head-to-eye transform model (*T*), an eye-in-head model (*M*) and a screen model (*S*) from eye- and motion-tracker data that is collected during the calibration phase. In section 2.2, we describe these three models separately, before addressing how these models are simultaneously calibrated on the input data of a mobile user from the motion- and eye-tracker. Our regression-based implementation relies on Gaussian process regression, which estimates the best fitting multi-variate Gaussian distribution that directly maps input data from the motion-tracking system and the eye-tracker to screen coordinates in the display.

#### **2.2. GEOMETRIC APPROACH**

The geometric approach treats gaze as a vector in space that is jointly defined by the position and orientation of the eye in space **e** and the eye's rotation about its horizontal and vertical axes, φ and θ, respectively. However, a video-based eye-tracker can only provide estimates of the eye's rotations about its center. In addition, a motion-tracking system can only provide the position and orientation of the tracked markers, which have an unknown position and orientation offset to the center of the eye depending on their placement on the user's head. Thus, calibration consists of deriving the optimal parameters for a head-eye-transformation model (*T* ) and an eye-in-head model (*M*), based on input data that is collected from the eye- and motion-tracking system when the user is fixating known positions in space. These fixations are typically elicited by requiring the user to fixate a sequence of annuli on a visual display. If unknown, a physical representation of the visual display *S* can also be estimated from the input data, given the shape parameters of the visual display and the assumption that the user is accurately fixating the presented stimulus.

#### *2.2.1. Head-eye-transform model*

The head-eye-transform model *T* derives the eye's pose in the world coordinate system **e** from the motion-tracking data, which provides an estimate of the head's pose in the world coordinate system **h**. This transformation is affected by the user's anthropometric characteristics as well as the placement of the tracking markers on the user's head. Its parameters δ<sub>*T*</sub> have to be estimated from calibration data.

$$\mathbf{e} = \mathcal{T}(\mathbf{h}; \delta\_T) \tag{1}$$

Given that the eye is located at a fixed position (*x<sub>T</sub>*, *y<sub>T</sub>*, *z<sub>T</sub>*) relative to the position of the motion-tracking markers, which are attached to the user's head (*h<sub>x</sub>*, *h<sub>y</sub>*, *h<sub>z</sub>*), and has an orientation of (θ<sub>*T*</sub>, φ<sub>*T*</sub>, ψ<sub>*T*</sub>), δ<sub>*T*</sub> defines the affine transformation from the head-centered reference frame to the user's eye-centered reference frame:

$$\delta\_T = (x\_T, y\_T, z\_T, \theta\_T, \phi\_T, \psi\_T) \tag{2}$$

The eye position (*e<sub>x</sub>*, *e<sub>y</sub>*, *e<sub>z</sub>*)<sup>*T*</sup> is defined by a rotation of the eye's position offset (*x<sub>T</sub>*, *y<sub>T</sub>*, *z<sub>T</sub>*) around the tracked head position (*h<sub>x</sub>*, *h<sub>y</sub>*, *h<sub>z</sub>*). Here, the superscript *T* indicates the transpose of a matrix or vector. This rotation is specified by the head's orientation, expressed as a rotation matrix *R*<sub>**h**</sub>:

$$(e\_x, e\_y, e\_z)^T = R\_{\mathbf{h}} (x\_T, y\_T, z\_T)^T + (h\_x, h\_y, h\_z)^T \tag{3}$$

We express the eye orientation in the form of a rotation matrix *R*<sub>**e**</sub>. This is calculated by multiplying the current head orientation matrix *R*<sub>**h**</sub> with the rotation matrix *R<sub>T</sub>*, which is defined by the rotational components (θ<sub>*T*</sub>, φ<sub>*T*</sub>, ψ<sub>*T*</sub>) of the head-to-eye transformation:

$$R\_{\mathbf{e}} = R\_{\mathbf{h}} R\_{T} \tag{4}$$

Thus, *R*<sub>**e**</sub> represents the transformation from the tracked head orientation to the orientation of the eye-centered reference frame in the world coordinate system.
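
Equations (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration and not the MUG implementation: the text does not state the Euler-angle convention, so the composition order used in `euler_to_matrix` is an assumption, and all function names and numerical values are hypothetical.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def euler_to_matrix(theta, phi, psi):
    """Assumed convention (not given in the text): R = Rz(psi) @ Ry(theta) @ Rx(phi)."""
    return rot_z(psi) @ rot_y(theta) @ rot_x(phi)

def head_to_eye(head_pos, R_h, delta_T):
    """Apply T(h; delta_T): eye position (Eq. 3) and eye orientation (Eq. 4)."""
    x_T, y_T, z_T, theta_T, phi_T, psi_T = delta_T
    eye_pos = R_h @ np.array([x_T, y_T, z_T]) + np.asarray(head_pos, dtype=float)
    R_e = R_h @ euler_to_matrix(theta_T, phi_T, psi_T)
    return eye_pos, R_e

# Hypothetical pose: head 1.7 m above the origin, turned 90 degrees about the vertical axis.
head_pos = [0.0, 0.0, 1.7]
R_h = rot_z(np.pi / 2)
delta_T = (0.10, 0.03, -0.05, 0.0, 0.0, 0.0)  # offset in metres, no extra rotation
eye_pos, R_e = head_to_eye(head_pos, R_h, delta_T)
```

With a zero rotational offset, the eye orientation equals the head orientation and only the positional offset is rotated into the world frame.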

#### *2.2.2. Eye model*

The eye-tracking camera captures a pupil image and from this an eye model *M* is necessary to map the pupil's centroid position in the camera image, (*p<sub>x</sub>*, *p<sub>y</sub>*), to the rotations of the eye, (φ, θ), about its center:

$$(\phi, \theta) = \mathcal{M}(p\_x, p\_y; \delta\_{\mathcal{M}}) \tag{5}$$

This mapping is determined by position, size and orientation of the eye with respect to the camera's image plane. The parameters that are necessary to calculate this mapping are denoted as δ<sub>*M*</sub> and depend on the assumed relationship between the recorded eye and the obtained camera image. For example, an established model by Moore et al. (1996) assumes the pupil to be the center of a plane section (i.e., the iris) that is located on a perfect sphere at a fixed distance from the eye's centroid. Here, the pupil location in the camera image is treated as a perspective projection of the eye onto the image plane (see Cesqui et al., 2013 for a treatment of the pupil image as an orthographic projection instead).

In the current work, we assumed a linear correlation between the pupil's image positions and their corresponding rotation angles of the eye. This is expressed as linear models in Equations (6, 7).

$$
\phi = m\_{\phi} p\_x + b\_{\phi}, \tag{6}
$$

$$
\theta = m\_{\theta} p\_y + b\_{\theta}. \tag{7}
$$

The parameters *m* and *b* are fitted to eye-tracking data obtained in a calibration procedure. This is explained in more detail in section 2.2.4. This approximation is motivated by computational efficiency and is a reasonable assumption, if δ<sub>*T*</sub> is chosen appropriately. Doing so allows us to compute the model parameters *m* and *b* with a simple linear regression. We implemented the more complex model of Moore et al. (1996) but did not find a significant difference between the two eye models with respect to our evaluations. Both models are available in our software implementation.
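
The linear eye model of Equations (6, 7) amounts to two independent straight-line fits. The sketch below fits them by ordinary least squares in plain Python. It is illustrative only: the pupil positions and angles are synthetic, the function and key names are hypothetical, and in the calibration described in section 2.2.4 the target angles would come from the inverse screen model rather than being known in advance.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope m and intercept b for ys ~ m * xs + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x

def fit_eye_model(pupil_xy, angles):
    """Fit phi from p_x (Eq. 6) and theta from p_y (Eq. 7)."""
    m_phi, b_phi = fit_line([p[0] for p in pupil_xy], [a[0] for a in angles])
    m_theta, b_theta = fit_line([p[1] for p in pupil_xy], [a[1] for a in angles])
    return {"m_phi": m_phi, "b_phi": b_phi, "m_theta": m_theta, "b_theta": b_theta}

def eye_model(p_x, p_y, params):
    """Map a pupil position in the camera image to eye-rotation angles (phi, theta)."""
    phi = params["m_phi"] * p_x + params["b_phi"]
    theta = params["m_theta"] * p_y + params["b_theta"]
    return phi, theta

# Synthetic calibration samples: pupil positions in pixels, angles in degrees (assumed units).
pupil_xy = [(100.0, 80.0), (160.0, 120.0), (220.0, 160.0), (130.0, 140.0)]
angles = [(0.1 * px - 16.0, -0.2 * py + 24.0) for px, py in pupil_xy]
params = fit_eye_model(pupil_xy, angles)
phi, theta = eye_model(160.0, 120.0, params)
```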

#### *2.2.3. Screen model*

The screen model *S* provides a mapping between the display's 2D screen coordinates and the three-dimensional Cartesian coordinates of the same display.

$$(u, v) = \mathcal{S}(\mathbf{e}, \phi, \theta; \delta\_{\mathcal{S}}),\tag{8}$$

$$(\phi, \theta) = \mathcal{S}^{-1}(\mathbf{e}, u, v; \delta\_{\mathcal{S}}) \tag{9}$$

Assumptions about the display size, position, orientation, curvature, etc. are collectively expressed as δ<sub>*S*</sub>. The screen model relies on these parameters and the outputs of the head-eye-transform model (i.e., **e**) and the eye model (i.e., φ, θ) to estimate POR in terms of the display's horizontal and vertical screen coordinates (i.e., *u*, *v*). Conversely, the inverse of the screen model allows us to estimate the rotation angles of our eye-model, given the current eye pose and screen coordinates of the user's POR. This is a necessary step in the calibration algorithm as it allows the rotation angles of the eye to be estimated from a known POR on the display, such as when the user is fixating a specified calibration stimulus.

For current purposes, we assume that the screen is a planar surface that is defined by a center point **c**, a normal vector **n**, a metric width and height *s<sub>x</sub>*, *s<sub>y</sub>*, as well as a corresponding display resolution of *s<sub>u</sub>* × *s<sub>v</sub>* in pixels. Thus, we define δ<sub>*S*</sub> as

$$\delta\_{\mathcal{S}} = (\mathbf{c}, \mathbf{n}, s\_x, s\_y, s\_u, s\_v) \tag{10}$$

From a known eye pose, **e**, a gaze vector onto the screen can be calculated by multiplying the rotation matrices of the eye's orientation in space (Equation 4) and its rotations about its center:

$$\mathbf{g} = R\_{\mathbf{e}} R\_{\phi} R\_{\theta} \begin{pmatrix} 1, 0, 0 \end{pmatrix}^{T} \tag{11}$$

The 3D intersection point, **f**, of this gaze vector, **g**, and the display screen is determined. This constitutes the user's POR. With this, the current POR can be computed in terms of screen coordinates (*u*, *v*) by an interpolation that is based on the display screen's dimensions *s<sub>x</sub>* × *s<sub>y</sub>* and its pixel resolution *s<sub>u</sub>* × *s<sub>v</sub>*.
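
The planar screen model can be sketched as a ray-plane intersection followed by a linear mapping to pixels. The Python example below is a sketch under stated assumptions and not the MUG code: the rotation axes assigned to φ and θ, the `up` vector that fixes the in-plane orientation of the screen, the pixel origin (top-left, with *v* increasing downward), and all names and values are assumptions that the text does not specify.

```python
import numpy as np

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def gaze_vector(R_e, phi, theta):
    """Eq. 11 with assumed axes: phi turns about z (left/right), theta about y (up/down)."""
    return R_e @ rot_z(phi) @ rot_y(theta) @ np.array([1.0, 0.0, 0.0])

def point_of_regard(eye_pos, g, c, n, up, s_x, s_y, s_u, s_v):
    """Intersect the gaze ray with the screen plane and convert to pixels (u, v)."""
    n = n / np.linalg.norm(n)
    denom = n @ g
    if abs(denom) < 1e-12:
        return None  # gaze runs parallel to the screen
    t = (n @ (c - eye_pos)) / denom
    if t <= 0:
        return None  # the screen lies behind the eye
    f = eye_pos + t * g  # 3D intersection point
    right = np.cross(up, n)
    right = right / np.linalg.norm(right)
    up_in_plane = np.cross(n, right)
    u = ((f - c) @ right / s_x + 0.5) * s_u
    v = (0.5 - (f - c) @ up_in_plane / s_y) * s_v
    return u, v

# Hypothetical set-up: a 2 m x 1 m, 1920 x 1080 screen, 2 m in front of the eye.
eye_pos = np.zeros(3)
c = np.array([2.0, 0.0, 0.0])
n = np.array([-1.0, 0.0, 0.0])  # screen normal, facing the viewer
up = np.array([0.0, 0.0, 1.0])
screen = dict(c=c, n=n, up=up, s_x=2.0, s_y=1.0, s_u=1920, s_v=1080)

por_center = point_of_regard(eye_pos, gaze_vector(np.eye(3), 0.0, 0.0), **screen)
por_left = point_of_regard(eye_pos, gaze_vector(np.eye(3), np.arctan(0.25), 0.0), **screen)
```

A returned coordinate outside [0, *s<sub>u</sub>*] × [0, *s<sub>v</sub>*] indicates a gaze that misses the display. This sketch leaves that check to the caller.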

#### *2.2.4. Calibration algorithm*

In our geometric implementation, calibration works by requiring the participant to fixate on known positions on the display surface. Given the known PORs during calibration and the input data provided by the eye- and motion-tracking systems, our algorithm seeks to estimate the optimal values for the free parameters δ<sub>*T*</sub>, δ<sub>*M*</sub>, and δ<sub>*S*</sub>. The screen model parameters δ<sub>*S*</sub> are only dependent on the display surface of the experiment and not the user. Thus, they only need to be determined once. The parameters of the head-eye-transform model, δ<sub>*T*</sub>, and eye model, δ<sub>*M*</sub>, are user-specific and must be calculated for each individual participant.

This process is comparable to the standard calibration procedure of eye-trackers, whereby the head-fixed user is required to fixate on a sequence of annuli on the visual display. The sequence usually samples from a 3 by 3 grid that is centered and aligned to the display's boundaries. Based on the pupil's position on the camera image for each pre-determined POR, PORs on other regions of the screen within this grid can be estimated by interpolation.

For a mobile user, walking and head movements result in a changing head pose. These extra degrees of freedom must be accounted for in the calibration process. This can be achieved by performing the eye-tracker's calibration first, separately from the calibration of the head-eye-transform model (e.g., Johnson et al., 2007). Alternatively, one could optimize all the free parameters in one combined calibration process—for example, by requiring the user to fixate a known location in space while moving in a way that samples the range of possible head and body movements (Ronsse et al., 2007).

Like Ronsse et al. (2007), we optimize the free parameters of our models (i.e., *T* , *M*, and *S*) simultaneously. Unlike Ronsse et al. (2007), we do not require the user to perform any specific movement behavior. Instead, we presented a moving display stimulus that the user had to fixate, while moving his head and body according to the mobility that was permitted to him as per the experimenter's instructions. Details of our stimulus and mobility instructions are given in section 2.4. In this way, each participant provides a sample of calibration data that reflects his natural eye- and head-movements whilst fixating many PORs that cover a large area of the visual display.

Calibration data consists of a set of *n* input/output pairs *D* = {*x*, *y*}*<sup>n</sup>* across the time of the calibrated session. The input data *x* = (**h**, *px*, *py*) gives the user's head and eye configuration and the output *y* = (*u*, *v*) represents the screen-coordinates of the calibration stimulus. The former is provided by the motion-and eye-tracking system while the latter is (randomly) determined by the experimental control script.

Since the screen configuration is independent of the current user, the calibration process can incorporate multiple datasets, acquired from different users. We denote the combined training corpus as *K* = (*D*<sub>1</sub>, ..., *D<sub>K</sub>*). Screen model parameters δ<sub>*S*</sub> are obtained by minimizing a cost function *t<sub>D</sub>* over each *D*. This can be stated as:

$$\delta\_{\mathcal{S}} = \operatorname{argmin}\_{\delta\_{\mathcal{S}}} f \Big( \delta\_{\mathcal{S}} \Big) := \sum\_{k}^{K} \left[ \min\_{\delta\_{\mathcal{T}}} t\_{\mathcal{D}\_{k}} \Big( \delta\_{\mathcal{T}}, \delta\_{\mathcal{S}} \Big) \right] \tag{12}$$

Function *t<sub>D</sub>* returns the difference between estimated PORs of the algorithm and the true PORs, based on the current parameters δ<sub>*T*</sub> and δ<sub>*S*</sub> on the dataset *D* of a given user:

$$\begin{aligned} t\_{\mathcal{D}}(\delta\_T, \delta\_{\mathcal{S}}) &:= \sum\_{i}^{n} \left\| \left(u^i, v^i\right) - \mathcal{S}\left(\mathbf{e}^i, \phi^i, \theta^i; \delta\_{\mathcal{S}}\right) \right\| \qquad (13) \\ &= \sum\_{i}^{n} \left\| \left(u^i, v^i\right) - \mathcal{S}\left(\mathcal{T}\left(\mathbf{h}^i; \delta\_T\right), \mathcal{M}\left(p\_x^i, p\_y^i; \delta\_{\mathcal{M}}\right); \delta\_{\mathcal{S}}\right) \right\| \qquad (14) \end{aligned}$$

In evaluating Equation (13), an eye model *M* is used to estimate eye-rotation angles (φ, θ) based on the pupil's position in the camera image (*p<sub>x</sub>*, *p<sub>y</sub>*). To optimize its parameters (i.e., δ̂<sub>*M*</sub>), we carry out a minimization of the error between eye-rotation angles estimated based only on δ<sub>*M*</sub> and eye-rotation angles geometrically calculated from the current parameterization of the screen model and head-eye-transform model (i.e., δ<sub>*S*</sub> and δ<sub>*T*</sub>):

$$\begin{aligned} \hat{\delta}\_{\mathcal{M}} &= \operatorname{argmin}\_{\tilde{\delta}\_{\mathcal{M}}} m\_{\mathcal{D}}\left(\tilde{\delta}\_{\mathcal{M}}, \delta\_T, \delta\_{\mathcal{S}}\right) \\ &:= \sum\_{i}^{n'} \left\| \left(\tilde{\phi}^i, \tilde{\theta}^i\right)\_{\delta\_T, \delta\_{\mathcal{S}}} - \left(\tilde{\phi}^i, \tilde{\theta}^i\right)\_{\tilde{\delta}\_{\mathcal{M}}} \right\| \qquad (15) \\ &= \sum\_{i}^{n'} \left\| \mathcal{S}^{-1}\left(\mathcal{T}\left(\mathbf{h}^i; \delta\_T\right), u^i, v^i; \delta\_{\mathcal{S}}\right) - \mathcal{M}\left(p\_x^i, p\_y^i; \tilde{\delta}\_{\mathcal{M}}\right) \right\| \qquad (16) \end{aligned}$$

To increase computational efficiency, this optimization can be carried out on a subset *D*′ of *D*, with *n*′ = |*D*′| ≤ |*D*|.

After the screen model has been determined, the user specific parameters of the head-eye-transform model and eye models (i.e., δ<sub>*T*</sub>, δ<sub>*M*</sub>) need to be optimized for each user. The head-eye transformation coefficients are determined by minimizing Equation (13) on user data *D*:

$$\delta\_T = \operatorname{argmin}\_{\tilde{\delta}\_T} t\_{\mathcal{D}} \left( \tilde{\delta}\_T, \delta\_{\mathcal{S}} \right) \tag{17}$$

Likewise, we derive δ<sub>*M*</sub> by evaluating Equation (15):

$$\delta\_{\mathcal{M}} = \operatorname{argmin}\_{\delta\_{\mathcal{M}}} m\_{\mathcal{D}} \Big( \delta\_{\mathcal{M}}, \delta\_{\mathcal{T}}, \delta\_{\mathcal{S}} \Big) \tag{18}$$

When computing Equation (17), values for δ<sub>*M*</sub> are acquired as well. Similarly, we find δ<sub>*T*</sub> and δ<sub>*M*</sub> when evaluating Equation (12). However, once the parameters for any given model are determined, the implementation of the other models can be further modified. As an example, the optimization of the screen model may be based on a simple eye model, while a more complex (but computationally intensive) eye model could be employed as the actual representation for subsequent experiments.

The minimizations of Equations (12), (17), and (18) can be accomplished employing any non-linear optimization method. We rely on the dlib implementation (King, 2009) of the BOBYQA algorithm (Powell, 2009).

#### **2.3. REGRESSION APPROACH**

In contrast to a geometric approach, a regression approach operates by predicting output data directly from a set of input data, without specifying the explicit relationships between them. It does not attempt to derive the user's line-of-sight (i.e., gaze) and its intersection with the display. Instead, it infers the relationship between the input and output values from a training set or calibration sample and then generalizes novel input data to a POR.

#### *2.3.1. Gaussian process regression*

Gaussian process regression (GPR) is a non-linear modeling technique that is able to predict the output *y* = *f*(*x*<sub>∗</sub>) of a data point *x*<sub>∗</sub> based on a set of observations *D* = {(*x<sub>i</sub>*, *y<sub>i</sub>*)}<sub>*i*=1</sub><sup>*N*</sup>. Rasmussen and Williams (2005) provide a thorough introduction to the method and its applications. In GPR, the underlying function *f* is represented as a Gaussian process that is defined by a multi-variate Gaussian distribution with a mean function μ and a covariance function Σ. For a test input *x*<sub>∗</sub>, these are given by

$$
\mu\_\* = K\_\* K^{-1} \mathcal{Y} \tag{19}
$$

$$
\Sigma\_\* = K\_{\*\*} - K\_\* K^{-1} K\_\*^T \tag{20}
$$

where *K*<sub>∗</sub> = [*k*(*x*<sub>1</sub>, *x*<sub>∗</sub>), ..., *k*(*x<sub>n</sub>*, *x*<sub>∗</sub>)] determines the covariance vector between the training data and the current test input. Similarly, *K*<sub>∗∗</sub> = *k*(*x*<sub>∗</sub>, *x*<sub>∗</sub>) and *K* is defined as the *n* × *n* covariance matrix of the training inputs such that

$$K_{ij} = k(x_i, x_j) + \sigma_n^2 \delta_{ij}, \tag{21}$$

where δ*ij* represents the Kronecker delta function. There are numerous possibilities for specifying the kernel function *k*(*xi*, *xj*). In our implementation, we employ the automatic relevance determination (ARD) kernel:

$$k(\mathbf{x}\_i, \mathbf{x}\_j) = \sigma\_s \exp\left(-\frac{1}{2} \sum\_{d=1}^D \frac{|\mathbf{x}\_i^d - \mathbf{x}\_j^d|}{l\_d}\right) \tag{22}$$

The ARD kernel is defined by the signal variance σ<sub>*s*</sub>, the noise variance σ<sub>*n*</sub> and length-scale parameters *l*<sub>1</sub>, ..., *l<sub>D</sub>*. The length-scale adjusts the weights of the input data dimensions (e.g., head pose and pupil image position), thus adjusting the relevance of each dimension in predicting the output (e.g., POR coordinates). The kernel function is now specified by the set of hyper-parameters Θ = (σ<sub>*s*</sub>, σ<sub>*n*</sub>, *l*<sub>1</sub>, ..., *l<sub>D</sub>*). It follows that the outcome of future predictions depends highly on the choice of Θ. To obtain a sensible configuration we fitted the hyper-parameters to data *D* that we collected in a calibration phase. For this, we maximized the marginal log-likelihood given by

$$\log p(\mathcal{Y} \mid \mathcal{D}, \Theta) = -\frac{1}{2} \mathcal{Y}^T K^{-1} \mathcal{Y} - \frac{1}{2} \log |K| - \frac{n}{2} \log 2\pi, \tag{23}$$

where |*K*| denotes the determinant of *K*. The maximization can be carried out using optimization algorithms such as the conjugate gradient method (Hestenes and Stiefel, 1952).
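The prediction and likelihood equations above can be sketched in a few lines of plain Python. This is a minimal illustration, not the libgp/dlib implementation used in the paper: the function names (`ard_kernel`, `gp_fit`, `gp_predict`, `log_marginal_likelihood`) are hypothetical, the kernel follows Equation (22) as printed, and the log marginal likelihood is written in its standard form (Rasmussen and Williams, 2005) with a negative data-fit term. A Cholesky factorization is assumed here in place of an explicit matrix inverse.

```python
import math

def ard_kernel(xi, xj, sigma_s, lengths):
    # Equation (22) as printed: sigma_s * exp(-0.5 * sum_d |xi_d - xj_d| / l_d).
    s = sum(abs(a - b) / l for a, b, l in zip(xi, xj, lengths))
    return sigma_s * math.exp(-0.5 * s)

def cholesky(A):
    # Lower-triangular L with A = L L^T (A must be symmetric positive definite).
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    return L

def chol_solve(L, b):
    # Solve (L L^T) x = b by forward and backward substitution.
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(X, y, sigma_s, sigma_n, lengths):
    # Equation (21): K_ij = k(x_i, x_j) + sigma_n^2 * delta_ij.
    n = len(X)
    K = [[ard_kernel(X[i], X[j], sigma_s, lengths)
          + (sigma_n ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = chol_solve(L, y)  # alpha = K^{-1} y
    return L, alpha

def gp_predict(x_star, X, L, alpha, sigma_s, lengths):
    # Equations (19) and (20): predictive mean and variance at x_star.
    k_star = [ard_kernel(x, x_star, sigma_s, lengths) for x in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = chol_solve(L, k_star)
    k_ss = ard_kernel(x_star, x_star, sigma_s, lengths)
    var = k_ss - sum(k * w for k, w in zip(k_star, v))
    return mean, var

def log_marginal_likelihood(y, L, alpha):
    # -1/2 y^T K^{-1} y - 1/2 log|K| - n/2 log(2 pi), with log|K| from the Cholesky factor.
    n = len(y)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    fit = sum(a * b for a, b in zip(y, alpha))
    return -0.5 * fit - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)
```

In a calibration setting, `log_marginal_likelihood` would be the objective handed to an optimizer over (σ<sub>*s*</sub>, σ<sub>*n*</sub>, *l*<sub>1</sub>, ..., *l<sub>D</sub>*); replacing the absolute difference in `ard_kernel` with a squared difference yields the more common squared-exponential ARD form.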

#### *2.3.2. GPR for gaze-tracking*

Given our intention to map the eight-dimensional input data *x*<sub>∗</sub> = (*h<sub>x</sub>*, *h<sub>y</sub>*, *h<sub>z</sub>*, *h*<sub>φ</sub>, *h*<sub>θ</sub>, *h*<sub>ψ</sub>, *p<sub>x</sub>*, *p<sub>y</sub>*) to screen coordinates *y* = (*u*, *v*), two Gaussian processes *G<sub>u</sub>*, *G<sub>v</sub>* are created for predicting *u* and *v*, respectively. The GPs' optimal hyper-parameters Θ<sub>*u*</sub> and Θ<sub>*v*</sub> are estimated from a set of calibration data *D* = {(*x<sub>i</sub>*, *y<sub>i</sub>*)}<sub>*i*=1</sub><sup>*N*</sup>. Details on the calibration procedure are found in section 2.4. We submit *D*′, a reduced subset of *D*, with |*D*′| ≤ |*D*|, for the estimation of the hyper-parameters (Equation 23). We initialize this optimization by setting all parameters to the value of 1. This optimization runs in the range (0, *e*<sup>10</sup>]. After the optimal values for these hyper-parameters are established, the kernel matrices *K<sub>u</sub>* and *K<sub>v</sub>* are computed from the calibration data using Equation (22). This concludes the calibration procedure and *G<sub>u</sub>*, *G<sub>v</sub>* can now be used for predicting the POR. We obtain the target value *y*<sub>∗</sub> of an input *x*<sub>∗</sub> by evaluating the respective mean functions μ<sub>*u*</sub> and μ<sub>*v*</sub> at *x*<sub>∗</sub>:

$$y_*^T = \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} \mu_u(x_*) \\ \mu_v(x_*) \end{pmatrix} = \begin{pmatrix} K_u^* K_u^{-1} \mathcal{Y}_u \\ K_v^* K_v^{-1} \mathcal{Y}_v \end{pmatrix} \tag{24}$$

In addition, it is possible to estimate the confidence in each predicted POR by looking at the sample variance

$$(\sigma_*^T)^2 = \begin{pmatrix} \Sigma_u(x_*) \\ \Sigma_v(x_*) \end{pmatrix} = \begin{pmatrix} K_u^{**} - K_u^* K_u^{-1} K_u^{*T} \\ K_v^{**} - K_v^* K_v^{-1} K_v^{*T} \end{pmatrix}. \tag{25}$$

The standard deviation σ<sub>∗</sub> provides an estimate of the predicted POR's reliability. Our implementation makes use of the Gaussian process C++ library libGp<sup>1</sup>. Parameter optimization is performed based on the conjugate gradient implementation in dlib (King, 2009).
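As an illustration of how the predictive variance might be used in practice, the following sketch discards PORs whose standard deviation exceeds a threshold and reports the accepted proportion. The function name `filter_pors`, the threshold value and the rule of taking the larger of the two per-axis standard deviations are assumptions made for this example, not part of the published method.

```python
import math

def filter_pors(predictions, max_std_px):
    """Keep PORs whose predictive standard deviation is at most max_std_px.

    predictions: list of (u, v, var_u, var_v) tuples, where var_u and var_v
    are the per-axis predictive variances from Equation (25).
    Returns the accepted (u, v) pairs and the accepted proportion.
    """
    kept = []
    for u, v, var_u, var_v in predictions:
        # Clamp tiny negative variances that can arise from rounding.
        std_u = math.sqrt(max(var_u, 0.0))
        std_v = math.sqrt(max(var_v, 0.0))
        # Assumption: a POR is unreliable if either axis is unreliable.
        if max(std_u, std_v) <= max_std_px:
            kept.append((u, v))
    proportion = len(kept) / len(predictions) if predictions else 0.0
    return kept, proportion

# Hypothetical predictions: the second has a 100 px standard deviation in u.
example = [(100.0, 200.0, 4.0, 9.0), (500.0, 300.0, 10000.0, 25.0)]
accepted, proportion = filter_pors(example, max_std_px=50.0)
```

Reporting `proportion` alongside the chosen threshold keeps the rejection criterion explicit.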

#### **2.4. EXPERIMENTAL VALIDATION**

#### *2.4.1. Participants*

Twelve participants (age range: 23–35 years; 8 males) with normal or corrected-to-normal vision were recruited for a user evaluation of both mobile gaze-tracking methods. Two participants were authors (Björn Browatzki and Lewis L. Chuang). The remaining 10 participants were employees of the Max Planck Institute for Biological Cybernetics. Their heights ranged from 158 to 193 cm, with a median of 177.5 cm.

#### *2.4.2. Stimuli and apparatus*

We recorded eye movements at a sampling rate of 250 Hz, using a head-mounted eye-tracker (EyeLink II, SR Research Ltd). This system required at least one camera to be individually positioned beneath a given eye, so as to capture an image of the pupil in the camera's screen coordinate system.

An infrared optical tracking camera (Advanced Realtime Tracking; 60 Hz) was used to track a fixed configuration of six reflective markers, which were mounted on top of the eye-tracker itself. This provided us with data regarding the user's pose (i.e., head position and orientation) in space. This camera was mounted at the top of the display and oriented to accommodate a large range of user heights.

Visual stimuli were displayed on a back-projection screen (1024 by 768 pixels; 220 by 160 cm) with a projector (Christie Mirage S+3K DLP; 120 Hz).

A height-adjustable chin-rest was used in one trial. This was positioned 140 cm away from the screen. From this viewing position, the screen was ±38.2° wide and ±30° high in terms of visual angle.
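The visual angles quoted above follow from the screen dimensions and the viewing distance. A short worked check, assuming the viewer is centered on the screen and using a hypothetical helper `half_angle_deg`:

```python
import math

def half_angle_deg(extent_cm, distance_cm):
    # Half of the visual angle subtended by an extent viewed from its center line.
    return math.degrees(math.atan((extent_cm / 2.0) / distance_cm))

half_width = half_angle_deg(220.0, 140.0)   # horizontal half-angle of the 220 cm screen
half_height = half_angle_deg(160.0, 140.0)  # vertical half-angle of the 160 cm screen
print(round(half_width, 1), round(half_height, 1))
```

The horizontal value rounds to 38.2° and the vertical value is just under 30°, in line with the figures reported in the text.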

<sup>1</sup>https://bitbucket.org/mblum/libgp

#### *2.4.3. Procedure*

Prior to data collection, the eye-tracking cameras were manually positioned for each participant to provide a clear image of the participant's pupil. To ensure the quality of this camera placement, we performed the calibration and validation procedure provided by SR Research. It should be noted that this procedure did not contribute to the calibration of our mobile gaze-tracking algorithms. In fact, the data collection procedure that follows this was designed to emulate this established calibration–validation process. During this 2 min procedure, participants were required to fixate single dots (0.5°) that were presented one after another on the display. These dots were randomly sampled without replacement from a 3 × 3 grid, which was centered on the display and subtended a field-of-view that approximated ±32° visual angle. This was performed twice. The first time was for calibrating SR Research's algorithm and the second for validating the accuracy of the calibrated algorithm. The cameras were repeatedly re-adjusted until a mean error was achieved that was no larger than 1.5°. We only recorded data from the more accurate eye. Typically, behavioral experiments adopt a mean error threshold of 0.5° prior to recording. However, we adopted a larger error threshold because our chosen eye-tracking system was not intended for use on displays larger than ±16.5°.

Data collection for evaluating our system was performed for three levels of user mobility, which were randomized for their presentation order. We recorded the user's six degree-of-freedom head pose from the motion-tracker and two degree-of-freedom position of one pupil in an eye-tracker's camera image for offline analyses. The participant was either required to restrain his head in a chin-rest (*head fixed*), allowed to move his head freely (*head free*), or allowed to walk freely in a 150 by 145 cm area in front of the display (*walking*).

Each level of user mobility was divided into two phases that differed in terms of their gaze-tracking task. In the first phase (*Dynamic*), the participant was required to fixate a moving red dot on the visual display. This dot moved either vertically or horizontally at a speed of 100 px/s for at least 100 px, before changing direction randomly to one of the three alternative cardinal directions. The marker was paused for 750 ms on each change of direction. The overall duration of this phase was 3 min. In the second phase (*Static*), participants fixated red dots that were presented sequentially, one after another. The positions of these dots were sampled ten times without replacement from a 5 × 4 grid, which was centered on the screen and spanned ±32° width by ±23° height in visual angle. Each of the 20 grid points thus appeared 10 times in random order, for 1500 ms each. This resulted in a total of 200 presented stimuli and an overall duration of 5 min.
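The *Static* stimulus schedule can be sketched as follows. The function `static_schedule` is hypothetical, and it assumes that "sampled ten times without replacement" means ten consecutive shuffled passes over the 20 grid points; the original software may have randomized differently.

```python
import random

def static_schedule(cols=5, rows=4, repeats=10, seed=None):
    # Build `repeats` shuffled passes over a cols x rows grid of (column, row) indices.
    rng = random.Random(seed)
    grid = [(c, r) for r in range(rows) for c in range(cols)]
    order = []
    for _ in range(repeats):
        block = grid[:]
        rng.shuffle(block)  # sampling without replacement within one pass
        order.extend(block)
    return order

schedule = static_schedule(seed=0)
duration_s = len(schedule) * 1.5  # each stimulus is shown for 1500 ms
```

With the default arguments this gives 200 stimuli and a total duration of 300 s, matching the 5 min reported above.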

Short rests were provided to the participants between trials and the full data collection process took approximately 1 h to complete.

#### **2.5. DATA ANALYSIS**

The screen coefficients of the geometric algorithm were initially calibrated on the datasets originating from the head-free and walking conditions of the first six participants. This is a preliminary step that is necessary only for the geometric method (see section 2.2.3 and Equations 12–15).

Following this, the collected gaze-tracking data were treated to emulate the typical calibration–validation procedure that is performed prior to the use of most video-based eye-trackers (e.g., EyeLink II). First, the data were divided into four datasets for each mobility level. Two datasets were created from the first 2 min and the last minute of the *Dynamic* data collection phase that required participants to fixate a moving target. They are termed *Calib-Dynamic* and *Valid-Dynamic*, respectively. Two more datasets were created from the first 2 min and the last 3 min of stable fixations from the *Static* data collection phase wherein participants sequentially fixated single non-moving stimuli. These are termed *Calib-Static* and *Valid-Static*, respectively. *Calib-Static* and *Valid-Static* were filtered to keep only the stable fixations on the single dots. This was to account for the fact that every user required an undetermined amount of time to saccade toward and maintain a steady fixation on the new target location. Therefore, we removed eye- and head-movements between fixations by ignoring the first 1250 ms of data after each stimulus onset. Only the remaining 250 ms was used to represent the POR for each stimulus.
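The stable-fixation filter described above amounts to a time window per stimulus. A minimal sketch, assuming samples are stored as `(timestamp_ms, value)` pairs and that stimulus onset times are known; the function name and data layout are illustrative only:

```python
def stable_fixation_samples(samples, onsets_ms, skip_ms=1250.0, stim_ms=1500.0):
    """Return, per stimulus, the samples recorded in the last part of its presentation.

    samples: list of (timestamp_ms, value) pairs.
    onsets_ms: stimulus onset times in milliseconds.
    The first skip_ms after each onset are ignored; the rest, up to stim_ms, is kept.
    """
    windows = []
    for onset in onsets_ms:
        start, end = onset + skip_ms, onset + stim_ms
        windows.append([s for s in samples if start <= s[0] < end])
    return windows

# Hypothetical recording: 3 s at 250 Hz (one sample every 4 ms), two stimuli.
recording = [(t * 4.0, None) for t in range(750)]
windows = stable_fixation_samples(recording, onsets_ms=[0.0, 1500.0])
```

At a sampling rate of 250 Hz, each 250 ms window holds roughly 62 eye-tracker samples.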

Three evaluations were performed offline that differed in terms of the pairing between the dataset that was used for training the calibration algorithm and the dataset on which the calibrated algorithm was validated. These pairings were chosen to exemplify how the data collection procedure could influence the accuracy of the different calibration methods. For the first two evaluations, the regression and geometric calibration methods were trained on *Calib-Dynamic*. Following this, the calibrated algorithms were evaluated in terms of the difference between their estimated PORs on the display, given the datapoints from *Valid-Dynamic* and *Valid-Static*, and the known stimulus position. In a third evaluation, both calibration algorithms were trained on a combined dataset of *Calib-Dynamic* and *Calib-Static* and validated on *Valid-Dynamic* and *Valid-Static*. Neither algorithm was trained on *Calib-Static* alone. This is because the regression method requires a large and variable dataset of eye- and head-movements, which is not available from the discrete and static fixations recorded in *Calib-Static*.

A difference (or error) between the displayed stimulus and the computed POR of either algorithm could be attributed to the given gaze-tracking algorithm and our participants' accuracy in fixating the target stimulus. To allow for comparison to previous methods, these differences were expressed in visual angles rather than pixel distances. Thus, error was computed as the horizontal (azimuth) and vertical (elevation) angular discrepancy between the two direction vectors from either the current position of the participant's head to the estimated POR or to the visual stimulus on the screen. We also report the combined error, which is defined as the angle between these two vectors.
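The three error measures can be written down compactly. The sketch below assumes a right-handed coordinate frame with *x* pointing right, *y* up and *z* from the viewer toward the screen, and all positions given in the same length unit; this convention and the function names are assumptions for illustration and are not taken from the original software.

```python
import math

def azimuth_elevation(direction):
    # Horizontal and vertical angles (degrees) of a 3-D direction vector.
    x, y, z = direction
    return (math.degrees(math.atan2(x, z)),
            math.degrees(math.atan2(y, math.hypot(x, z))))

def angular_errors(head, por, stimulus):
    """Azimuth, elevation and combined angular error between POR and stimulus.

    head, por, stimulus: (x, y, z) positions in a common world frame.
    """
    d_por = tuple(p - h for h, p in zip(head, por))
    d_stim = tuple(s - h for h, s in zip(head, stimulus))
    az_p, el_p = azimuth_elevation(d_por)
    az_s, el_s = azimuth_elevation(d_stim)
    dot = sum(a * b for a, b in zip(d_por, d_stim))
    norm = math.sqrt(sum(a * a for a in d_por)) * math.sqrt(sum(b * b for b in d_stim))
    cos_angle = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    combined = math.degrees(math.acos(cos_angle))
    return az_p - az_s, el_p - el_s, combined

# Head 140 cm in front of the screen center; POR displaced 140 cm to the right.
az_err, el_err, total_err = angular_errors((0.0, 0.0, 0.0),
                                           (140.0, 0.0, 140.0),
                                           (0.0, 0.0, 140.0))
```

In this example the azimuth and combined errors are both 45° and the elevation error is zero, since the displacement is purely horizontal.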

Computation time (measured on a 2.8 GHz desktop CPU) was <3 s for the GP training of the regression method, <1 s for the user-specific calibration of the geometric method and <30 s for the calibration of the geometric screen model. If data from the tracking devices can be assumed to be always available, the regression method predicts PORs at approximately 400 Hz on the same hardware. This increases to 3000 Hz if only the POR is computed without its variance. Comparable performance can be achieved by the geometric method. Thus, our methods are computationally efficient and are suitable for real-time applications such as gaze-contingent display changes, given tracking devices with high sampling frequencies and low transmission latencies.

#### **3. RESULTS**

Three evaluations were performed for different pairings of calibration and validation datasets on the collected datasets of head pose and pupil image data (see section 2.5). These pairings differed in terms of the task that was performed during calibration and validation data collection. The results are plotted separately in **Figure 2** for the three user mobility conditions and summarize the mean error for each participant in the horizontal and vertical dimension as well as in the combined visual angle. In addition, the regression method offers a confidence bound for each POR estimate (see section 2.3.2). The mean of these confidence bounds is represented for each participant using a jet-color scheme whereby highly unreliable POR estimates are represented by dark red, which equals a mean standard deviation of 75 pixels and above, while bright green indicates a standard deviation of 0 pixels. The initials of some outlier participant data are highlighted in **Figure 2**. Their motion- and eye-tracking data are plotted in **Figures 3**, **4** to understand why the regression method fared poorly for these individuals.

Overall, our geometric method achieved comparable performance to previous work in the head-fixed and head-free conditions. Ronsse et al. (2007) reported a mean absolute error that was less than 3.5° whereas Johnson et al. (2007) reported azimuth and elevation errors that were less than 4.0°. Dotted lines are provided at the 4.0° value in **Figure 2** for ease of comparison. Generally, the regression method compares well against the geometric method. Nonetheless, the results of our evaluation highlight some vulnerabilities of the regression method that are addressed in the following paragraphs. Finally, both calibration methods are susceptible to an increase in vertical elevation errors with increasing user mobility.

Task dissimilarity between the calibration and validation phase affected the regression method far more than the geometric method (**Figure 2B**). This can be remediated by employing similar tasks for calibration and validation (**Figure 2A**). Alternatively, the calibration algorithm could be trained on gaze behavior that is elicited across multiple tasks (**Figure 2C**). This would result in more varied data of eye- and head-combinations, which is especially beneficial for training the regression method. Such data need not be exhaustive. Our current example relied on only two tasks that elicited pursuit and fixation gaze behavior, which was sufficiently generalizable.

The regression method appeared to be better than the geometric method for the head-fixed condition, especially for the horizontal azimuth component of estimated PORs (**Figures 2A–C**). We postulate that the regression method, unlike the geometric method, is able to account for non-linearities caused by large eye-in-head rotations. As mentioned previously, muscle tension in the forehead that results from extreme eye-in-head rotations could cause shifts in the head-mounted eye-tracker. While this would induce inaccuracies in the head-eye-transform model (i.e., *T*) of the geometric method, it would not pose a problem for the regression method as long as such a shift in the eye-tracker is consistently induced.

Differences between the two calibration methods are more apparent when the calibration task varies from the test condition (see **Figure 2B**). Here, the geometric method generalizes better than the regression method. However, this is not true for all participants. Participants with low gaze-tracking accuracy on the regression method represent outlier data. They are easily identifiable by the large standard deviations (i.e., dark red dots in **Figure 2B**) in the estimated PORs. If these participants are excluded on this criterion of POR reliability, the median accuracy of the regression method is comparable (if not superior) to the average accuracy of the geometric method. To reiterate, the geometric method provides no systematic method for removing unreliable data, apart from setting an arbitrarily defined criterion for eye-tracking accuracy during calibration itself.

Four participants (MS, KD, CG, and CG2) demonstrated substantially worse gaze-tracking performance with the regression method than with the geometric method. These outliers' raw motion- and eye-tracker data from the *Calib-Dynamic:Valid-Static* pairing of the *walking* condition are plotted in **Figures 3** and **4**, respectively, and contrasted against the raw data of participant CH, who represented a more typical participant. The main weakness of the regression method is highlighted here in that it requires the calibration data to overlap with the test data that we intend to collect using the calibrated gaze-tracker. **Figure 3** shows that MS and KD did not cover as much of the available walking space as CH. As a result, the regression method was not able to accurately generalize from the calibration data to the validation data. The geometric method does not suffer from this problem because it builds a head-eye-transform model (*T*) and eye model (*M*) that are independent of the user's position in space. In **Figure 4**, we note a similar pattern. Participant CG exhibited larger eye-in-head rotations in the *Valid-Static* dataset than her *Calib-Static* dataset. Participant CG2's dataset showed the same, albeit to a lesser extent. In contrast, Participant CH demonstrated an extensive overlap between the eye-tracker data from the calibration and validation datasets.

Therefore, greater overlaps between calibration and validation datasets should result in higher gaze-tracking accuracy, especially for the regression method. To confirm this, we computed the amount of overlap between the calibration and validation datasets for each of the three evaluations that were performed and examined their relationship to gaze-tracking accuracy (**Figure 5**). First, a five-dimensional space was defined in terms of head-position (*h<sub>x</sub>*, *h<sub>y</sub>*, *h<sub>z</sub>*) and pupil-position (*p<sub>x</sub>*, *p<sub>y</sub>*); we omitted the head-orientation dimensions because they would have resulted in a large and sparsely populated space. Subsequently, this space was divided into equal-sized bin regions (10 cm for *h<sub>x</sub>*, *h<sub>y</sub>*, *h<sub>z</sub>*; 1500 units for *p<sub>x</sub>*, *p<sub>y</sub>*) and, for each given evaluation, populated by the calibration and validation dataset. Overlap was defined as the proportion of bin regions that were jointly occupied by calibration and validation datasets to the total number of bin regions occupied by only the validation dataset. This was calculated for each mobility condition per participant, which resulted in 36 data-points per gaze-tracking method for each evaluation.
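The overlap measure can be sketched as a set operation on occupied bins. This is an illustrative reading of the definition above, not the authors' code: the function names are hypothetical, bins are assumed to be aligned to the origin, and the denominator is taken to be all bins occupied by the validation dataset.

```python
import math

def occupied_bins(samples, bin_sizes):
    # Map each sample to the integer index of the bin it falls into.
    return {tuple(math.floor(v / b) for v, b in zip(sample, bin_sizes))
            for sample in samples}

def dataset_overlap(calib, valid, bin_sizes=(10.0, 10.0, 10.0, 1500.0, 1500.0)):
    """Proportion of validation bins that are also occupied by calibration data.

    calib, valid: iterables of (hx, hy, hz, px, py) samples, with head position
    in cm and pupil position in eye-tracker units.
    """
    calib_bins = occupied_bins(calib, bin_sizes)
    valid_bins = occupied_bins(valid, bin_sizes)
    if not valid_bins:
        return 0.0
    return len(calib_bins & valid_bins) / len(valid_bins)

# Hypothetical samples: one of the two validation bins is covered by calibration data.
calib = [(5.0, 5.0, 5.0, 100.0, 100.0), (15.0, 5.0, 5.0, 100.0, 100.0)]
valid = [(6.0, 6.0, 6.0, 200.0, 200.0), (25.0, 5.0, 5.0, 100.0, 100.0)]
overlap = dataset_overlap(calib, valid)
```

Computed once per participant and mobility condition, this yields the 36 data-points (12 participants × 3 conditions) mentioned above.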

**Figure 2** (caption, partially recovered): [...] represent the median, upper and lower quartile, ±1.5 inter-quartile range and outliers. Data-points for individual participants are plotted for the regression [...] estimated PORs. Dotted lines are provided to indicate the calibration accuracy of previous work. **(A)** Calib-Dynamic:Valid-Dynamic; **(B)** Calib-Dynamic:Valid-Static; **(C)** (Calib-Dynamic,Calib-Static):(Valid-Dynamic,Valid-Static).

The results are in general agreement with our expectation: there was a significant, albeit weak, relationship between dataset overlap and gaze-tracking accuracy for both methods. **Figure 5B** shows that this relationship was most prominent for the regression method (black line), when the calibration and validation tasks differed from each other. The influence of dataset overlap on gaze-tracking accuracy was considerably reduced for both methods by combining data from the dynamic and static tasks (see **Figure 5C**).

#### **4. DISCUSSION**

In this paper, we compared a general geometric method and a regression method for mobile gaze-tracking. Our results indicate that a regression method for gaze-tracking can achieve comparable performance to a geometric approach. Our results also highlight the importance of using an appropriately designed calibration task that is able to elicit variable gaze behaviors. A mobile participant can achieve the same POR by a variety of eye, head and body pose combinations. Thus, submitting a variable and rich data set for calibration can be expected to improve the calibration accuracy of both gaze-tracking methods, especially a regression method. This was similarly noted by Cesqui et al. (2013) who performed calibrations in two phases, first by restraining their participants' heads in order to elicit large eye-in-head rotations and, subsequently, without restraints.

The strength of the geometric method lies in its ability to better generalize across different gaze behavior, regardless of the underlying task. Thus, it was able to maintain reasonable levels of gaze-tracking accuracy even when the calibration task differed from the tested task (**Figure 2B**). In contrast, the regression method was vulnerable to this difference, presumably because different tasks elicited different patterns of head- and eye-movements in some participants (see **Figures 3**, **4**, respectively). This shortcoming of the regression method could be addressed by ensuring that the calibration data is sufficiently diverse, perhaps by requiring more than one calibration task. In fact, it is generally advisable to calibrate on more than one task, as it is currently shown to benefit both methods (see **Figure 5**).

Unlike the geometric approach, a regression method for gaze-tracking does not require a specific data input (i.e., eye-in-head rotations) for training. It can be trained on any arbitrary units provided by the eye- and motion-tracking system. Therefore, a regression method can still be used even when the hardware manufacturer does not provide specific information regarding the nature of its available data output.

More importantly, the geometric method has intrinsic limitations that are less easy to overcome. In spite of our repeated efforts in eye-tracker camera placement, the mean accuracy of our eye-tracking calibrations was limited to a range of 0.48° to 1.26°. This is worth mentioning for practical reasons. Under normal circumstances, all of these participants would have been rejected from further participation in the experiment, since most experiments calibrate their participants to an accuracy level of 0.5°. Nonetheless, this level of accuracy in the eye-tracker was to be expected, given the large size of our tested field-of-view, which exceeded the recommended range of the eye-tracker itself (i.e., < ±16.5° field-of-view). Under such circumstances, the experimenter faces the dilemma of either relaxing the accuracy threshold for eye-tracker calibration or modifying the experiment. The latter could be achieved by reducing user mobility or the field-of-view. However, this would limit the scope of the researcher's study. The regression approach circumvents this problem in a principled fashion. Recorded PORs can be removed based on the regression method's expressed confidence in their estimation. If this removes a significant proportion of a participant's PORs, that participant's dataset could be removed altogether. Such a process would be transparent, given that the criteria for accepted PORs and the proportion of accepted PORs can be reported. Currently, the number of participants who are rejected because of poor eye-tracking calibration is rarely reported, even if the adopted criterion accuracy of 0.5° is fastidiously applied.

The methods reported in this paper do not cover eye-tracking solutions that calibrate and align gaze to the view-frustum of a front-facing video-camera recording (e.g., ETG, SensoMotoric Instruments GmbH). Such systems allow estimated PORs to be superimposed on a video-recording that approximates a first-person perspective of the user. This approach requires the content of the video-recording to be hand-coded for regions of interest. The methods that we address in this paper estimate PORs according to a known display or world objects without the need for hand-coding. This prevents the researcher from defining the regions of interest in an *ad hoc* fashion.

The accuracy of a geometric method can be improved by defining better models for the underlying eye-head transformation and the pupil's projection to the eye-tracking cameras. Additional procedures could also be introduced to compensate for any errors that might systematically accumulate during experimentation. For example, Cesqui et al. (2013) reported accuracy levels of less than 1° with their mobile gaze-tracking system. This improvement was achieved by introducing a procedure that corrected for drifts due to helmet slippage, by modifying the assumptions for the eye-model and by employing a non-linear optimization algorithm for deriving their calibration parameters. Given the novelty of a regression approach in mobile gaze-tracking, it remains to be seen whether similar improvements can be achieved. Future attempts to improve the regression approach should focus on selecting better algorithms for parameter optimization and improving upon calibration procedures. Unlike a geometric approach, a regression approach does not need to refine the assumptions of the eye-head transformation, eye and physical world model.

The work presented here was conducted to inform researchers who intend to employ gaze-tracking on mobile participants. To this end, we provide software for replicating and improving our methods. The computational efficiency of these methods makes them suitable for gaze-contingent experiment designs and applications, if low transmission latencies and synchronization between tracking devices can be ensured (see section 2.5). Based on our results, a regression approach for gaze-tracking approximates the expected accuracy of a geometric approach, if the calibration data captures the effective range of eye and head movements that a user is likely to exhibit in the experiment. In our opinion, a regression approach offers more flexibility and ease of implementation. While the geometric method restricts gaze-tracking accuracy to the limitations of its assumed models and equipment, the regression approach is limited by the design of the calibration task and the employed algorithm. We consider the latter to be more achievable.

#### **FUNDING**

This research was supported by the Max Planck Society. Part of Heinrich H. Bülthoff's research was supported by the Brain Korea 21 PLUS Program through the National Research Foundation of Korea funded by the Ministry of Education.

#### **ACKNOWLEDGMENTS**

Authors Björn Browatzki and Lewis L. Chuang contributed to this work equally. The authors would like to thank Mr. Hans-Joachim Bieg, Dr. Christian Herdtweck, and Dr. Daniel H. Baker for their helpful comments.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 November 2013; accepted: 21 March 2014; published online: 08 April 2014.*

*Citation: Browatzki B, Bülthoff HH and Chuang LL (2014) A comparison of geometric- and regression-based mobile gaze-tracking. Front. Hum. Neurosci. 8:200. doi: 10.3389/fnhum.2014.00200*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Browatzki, Bülthoff and Chuang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# MoBILAB: an open source toolbox for analysis and visualization of mobile brain/body imaging data

#### *Alejandro Ojeda\*, Nima Bigdely-Shamlo and Scott Makeig*

*Swartz Center for Computational Neuroscience, Institute for Neural Computation, University of California San Diego, La Jolla, CA, USA*

#### *Edited by:*

*Klaus Gramann, Berlin Institute of Technology, Germany*

#### *Reviewed by:*

*Robert Oostenveld, Cognition and Behaviour Centre for Cognitive Neuroimaging, Netherlands*

*Alexandre Gramfort, Telecom ParisTech CNRS, France*

#### *\*Correspondence:*

*Alejandro Ojeda, Swartz Center for Computational Neuroscience, Institute for Neural Computation, University of California San Diego, 9500 Gilman Drive #0559, La Jolla, CA 92093-0559, USA e-mail: alejandro@sccn.ucsd.edu*

A new paradigm for human brain imaging, *mobile brain/body imaging* (MoBI), involves synchronous collection of human brain activity (via electroencephalography, EEG) and behavior (via body motion capture, eye tracking, etc.), plus environmental events (scene and event recording), to study the joint brain/body dynamics that support natural human cognition and the performance of naturally motivated human actions and interactions in 3-D environments (Makeig et al., 2009). Processing complex, concurrent, multi-modal, multi-rate data streams requires a signal-processing environment quite different from one designed to process single-modality time series data. Here we describe MoBILAB (more details available at sccn.ucsd.edu/wiki/MoBILAB), an open source, cross platform toolbox running on MATLAB (The Mathworks, Inc.) that supports analysis and visualization of any mixture of synchronously recorded brain, behavioral, and environmental time series plus time-marked event stream data. MoBILAB can serve as a pre-processing environment for adding behavioral and other event markers to EEG data for further processing, and/or as a development platform for expanded analysis of simultaneously recorded data streams.

**Keywords: EEG, motion capture, mobile brain/body imaging, MoBI, EEGLAB, multimodal neuroimaging**

#### **INTRODUCTION**

For nearly 50 years the dominant approach to cognitive EEG experiment protocols and subsequent data analyses has been the Event Related Potential (ERP) paradigm in which EEG epochs are extracted from the continuous EEG data time-locked to one or more classes of experimental events (typically, stimulus onsets or finger button presses). Event-locked averages of these epochs (ERPs) extract the relatively small portion of the EEG that is both time-locked and phase-locked to the events of interest (Makeig et al., 2004). The same paradigm can be extended to linear transforms of the channel data including its maximally independent component processes (Makeig et al., 1996, 2002), and/or to time/frequency transforms of these EEG time series (Makeig, 1993; Tallon-Baudry et al., 1996; Delorme and Makeig, 2004).

The ERP paradigm assumes that differences in EEG dynamics across event-related trials unrelated to the experimental events of interest can be eliminated through random phase cancellation by averaging a sufficient number of event time-locked epochs. To maximize the effectiveness of this assumption, participants in ERP experiments are typically instructed to sit still and to minimize blinks and other muscle activities while performing some task involving evaluation of presented stimuli, the participant indicating his or her stimulus decisions by pressing a finger button (or "microswitch"). During data analysis these button press responses are considered to be in effect (point) processes without spatial or temporal extent. However, the instruction to refrain from blinking and making any other extraneous movement is in effect a dual-task that forces the brain to operate under unnatural and somewhat stressful circumstances (Verleger, 1991). It also severely restricts the range of task paradigms and behaviors that can be employed to observe and understand how human brain dynamics support our natural behavior and experience.

A new direction in experimental paradigm design was proposed by Makeig et al. (2009) to enable, for the first time, measurement and analysis of human brain dynamics under naturalistic conditions including subject eye and motor behavior in 3-D environments. Compared to previous modes of functional brain imaging, the new concept in effect proposed a new brain imaging modality that Makeig et al. termed *mobile brain/body imaging (MoBI)*. In this paradigm, synchronized streams of behavioral and environmental time series data are measured along with subject EEG and/or other brain and physiological signals. In many practical circumstances, data collection rates may differ and some information streams may be sampled irregularly. Combining data modalities as different as motion capture, eye tracking, sound and video recordings, etc., with high-density EEG data allows study of brain dynamics under conditions much closer to everyday life. To date the MoBI paradigm has been applied in studies of brain dynamics supporting gait, balance, and cognition during walking (reviewed in Gramann et al., 2011; Sipp et al., 2013) and to study expressive gesturing (Leslie et al., in press).

Traditional scalp-channel ERP analysis can be carried out in almost all available EEG toolboxes including EEGLAB (Delorme and Makeig, 2004), Brainstorm (Tadel et al., 2011), FieldTrip (Oostenveld et al., 2011), BrainVision Analyzer, and SPM (Friston et al., 1994). However, since EEG software has most often been designed to handle unimodal EEG data (plus one or more event-marker channels), a new tool set is needed to deal with the complex analysis problems involved in efficient analysis of multimodal MoBI data.

A related experimental EEG paradigm of increasing research interest is the development of brain-computer interface (BCI) models and applications (Makeig et al., 2012). In this paradigm, brain activity associated with a cognitive state or task is used to estimate a model of a subject's cognitive brain state, response, or intent with a goal of classifying or estimating current or future responses so as to control external interfaces (Schneider and Wolpaw, 2012). When the subject (or an external interface) requires a timely interpretation of current brain/behavior state or intent, a near real-time pipeline is needed to process the data and estimate and/or update the model continuously. Near real-time signal processing may impose serious computational constraints on the signal processing methods that can be used, limiting the type and range of states estimated and the number of data streams processed (Wang and Jung, 2013). Basic studies of brain dynamics, on the other hand, may use latent variables in complex BCI models trained off-line to model brain state or dynamics associated with subject brain states, responses, or intent during an EEG or MoBI experiment. Typically, dynamic features used in such models are optimally selected in some fashion from a large repertoire of possible features. Examination of the model features so selected may be termed informative feature analysis. Current leading edge BCI toolkits include BCILAB (Kothe and Makeig, 2013) and BCI2000 (Schalk et al., 2004). Other EEG toolboxes such as MNE (Gramfort et al., 2014) and FieldTrip provide basic support for real-time signal processing as well. BCILAB, in particular, provides a strong basis for informative feature analysis of recorded EEG and/or multi-modal data.

The MoBILAB toolbox introduced here is designed to support analysis and visualization of any number of concurrently recorded multimodal data streams during off-line analysis. To extend human EEG analysis tools to encompass new modes of analysis of multiple physiological and behavioral data modalities while making the basic software functions easy to use and build upon, MoBILAB exploits the most recent advances in object-oriented programming supported by MATLAB Versions 7.5 and above. To learn more about MATLAB object-oriented programming, see MathWorks on-line documentation under mathworks.com/help/matlab at matlab\_oop/classes-in-the-matlab-language.html.

#### **SOFTWARE ARCHITECTURE**

MoBILAB is composed of three independent functional modules: (1) the Graphic User Interface (GUI), (2) a data objects container, and (3) any number of modality-specific data stream objects (or stream objects). These three modules decouple GUI, file I/O, and data processing, thus allowing extensions and/or reimplementation of parts of the toolbox without making dramatic changes to the whole system, e.g., preserving large, still-stable parts of the code. **Figure 1** below shows a schematic of the MoBILAB architecture.

The MoBILAB GUI is controlled by a "*mobilabApplication*" object called "*mobilab*." This object creates the interactive tree in which the raw data stream objects and their descendants are represented; it also assigns a modality-specific menu item to each object.

The data object container is implemented in the class "*dataSource*"; this object imports a multimodal data file and collects modality-specific data stream objects in a cell array that is stored in the object property *item*. As a container, the *dataSource* object defines easily applied methods for deleting and inserting stream objects that take care of updating the logical pointers encoding parent-child relationships among the elements of the tree. To read a new file format, for example, a new class can be derived that inherits the basic functionality of *dataSource* while implementing only the format-specific data reading method.
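The container behavior described above can be sketched in a few lines. The following Python sketch is illustrative only and is not MoBILAB code: the class names `StreamNode`, `DataSource`, and `DataSourceFromNames`, and the choice to delete a stream together with its descendants, are assumptions made for the example. It shows a list playing the role of the *item* cell array, insert and delete methods that keep parent-child indices consistent, and a derived class that adds only a format-specific loading step.

```python
class StreamNode:
    """Hypothetical stand-in for a stream object stored in the container."""

    def __init__(self, name):
        self.name = name
        self.parent = None   # index of the parent in `item`, None for raw streams
        self.children = []   # indices of streams derived from this one


class DataSource:
    """Container that keeps stream objects in a list (like the `item` property)."""

    def __init__(self):
        self.item = []

    def insert(self, node, parent_index=None):
        """Append a stream, link it to its parent, and return its index."""
        index = len(self.item)
        node.parent = parent_index
        self.item.append(node)
        if parent_index is not None:
            self.item[parent_index].children.append(index)
        return index

    def delete(self, index):
        """Remove a stream and its descendants, then re-index all pointers."""
        doomed, stack = set(), [index]
        while stack:
            i = stack.pop()
            doomed.add(i)
            stack.extend(self.item[i].children)
        # Map the old index of every surviving stream to its new position.
        remap = {}
        for old in range(len(self.item)):
            if old not in doomed:
                remap[old] = len(remap)
        survivors = [n for i, n in enumerate(self.item) if i not in doomed]
        for node in survivors:
            node.parent = remap.get(node.parent)
            node.children = [remap[c] for c in node.children if c in remap]
        self.item = survivors


class DataSourceFromNames(DataSource):
    """Hypothetical format-specific reader: only the loading step is new."""

    def load(self, names):
        for name in names:
            self.insert(StreamNode(name))


src = DataSourceFromNames()
src.load(["eeg", "mocap"])
filtered = src.insert(StreamNode("eeg_filtered"), parent_index=0)
src.insert(StreamNode("eeg_ica"), parent_index=filtered)
src.delete(filtered)  # also removes eeg_ica, which was derived from it
remaining = [node.name for node in src.item]
```

Keeping the index bookkeeping inside the container means that code reading a new file format never has to touch the tree logic.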

Natively, MoBILAB reads multimodal data files in the *extensible data format* (XDF) (freely available at code.google.com/p/XDF) designed to work both with MoBILAB and with the Lab Streaming Layer (LSL) data acquisition and control framework of Christian Kothe (Delorme et al., 2011), software freely available at code.google.com/p/labstreaminglayer. To import *.xdf* files produced by LSL during MoBI experiments, MoBILAB defines the child class "*dataSourceXDF*." When an *.xdf* file is imported by *dataSourceXDF*, a set of header and binary files are created for each data stream read from the source *.xdf* file. Data stream objects are then constructed that map the metadata and data samples contained in the header and binary data files, respectively, onto object properties. Data sample mapping uses memory-mapping (detailed below). This mapping between object properties and files allows automatic file updating by modifying stream object properties. The *dataSource* object is stored in *mobilab* property *allStreams* and can therefore be accessed from the MATLAB workspace as *mobilab.allStreams*.

Data stream objects are organized in a hierarchy with base class *dataStream*; this class defines some methods that operate on generic time series data, including *resample*, *filter*, and *plot*. Deriving from the *dataStream* base object allows straightforward definition of modality-specific data stream objects including objects that handle EEG, motion capture, or eye tracking data, among others. In the specific case of EEG data, the class *eeg* is defined to also derive from the class *headModel*, allowing integration of spatial anatomical information with functional information contained in the EEG time series (see **Figure 2**).
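As a rough illustration of this hierarchy, the Python sketch below defines a generic base class, one modality-specific subclass, and an EEG class that derives from both the stream base class and a head model class. All names (`DataStream`, `HeadModel`, `Mocap`, `EEG`, `decimate`, `velocity`) are hypothetical, and `decimate` is a deliberately naive stand-in for resampling (it applies no anti-alias filter); none of this reproduces the MoBILAB implementation.

```python
import copy


class DataStream:
    """Base class: operations that make sense for any time series."""

    def __init__(self, data, sampling_rate):
        self.data = list(data)              # one channel, for brevity
        self.sampling_rate = sampling_rate  # in Hz

    def decimate(self, factor):
        """Keep every `factor`-th sample and return a new object of the same class."""
        out = copy.copy(self)
        out.data = self.data[::factor]
        out.sampling_rate = self.sampling_rate / factor
        return out

    def duration(self):
        """Length of the recording in seconds."""
        return len(self.data) / self.sampling_rate


class HeadModel:
    """Holds spatial information, here just channel positions."""

    def __init__(self, channel_positions):
        self.channel_positions = dict(channel_positions)


class Mocap(DataStream):
    """Modality-specific subclass that adds a motion-related method."""

    def velocity(self):
        """First-order finite difference scaled by the sampling rate."""
        fs = self.sampling_rate
        return [(b - a) * fs for a, b in zip(self.data, self.data[1:])]


class EEG(DataStream, HeadModel):
    """EEG combines generic time series behavior with head model information."""

    def __init__(self, data, sampling_rate, channel_positions):
        DataStream.__init__(self, data, sampling_rate)
        HeadModel.__init__(self, channel_positions)


eeg = EEG(range(8), 512, {"Cz": (0.0, 0.0, 1.0)})
half = eeg.decimate(2)               # inherited method, result is still an EEG
marker = Mocap([0.0, 0.5, 1.5], 2)
marker_velocity = marker.velocity()
```

Because the generic method lives in the base class, the decimated EEG object keeps its channel positions without any EEG-specific code.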

#### **COLLECTION AND SYNCHRONIZATION OF MULTIPLE DATA STREAMS BY LSL**

MoBILAB itself is not meant for use during data collection, but for off-line data exploration. When reading multi-stream data from *.xdf* files, MoBILAB makes use of the time synchronization provided by the LSL (Lab Streaming Layer, referenced above) acquisition system. LSL implements time synchronization of concurrently recorded data streams by associating every collected sample with a timestamp provided by the high-resolution clock of the local computer on which the LSL data recorder application is running. When multiple data streams are recorded, LSL also estimates and stores the relative clock offsets among them. In general, multiple data streams may be collected concurrently via different computers located on a local area network, any one (or more) of which may serve to save the synchronized data to disk as an *.xdf* file that can then be imported into MoBILAB for review and processing. As it reads the file (using LSL function *load\_xdf.m*), MoBILAB corrects the data stream timestamps by, first, estimating an (assumed) linear fit through the (possibly noisy) clock offsets and then correcting the time stamps for implied bias and drifts.
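The two-step correction described above (fit a line through the measured clock offsets, then remove the implied bias and drift) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code of *load\_xdf.m*: it uses an ordinary least-squares fit, it assumes each offset is expressed as reference clock minus stream clock so that the fitted offset is added to the raw timestamps, and the function names and numbers are invented for the example.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def correct_timestamps(stamps, offset_times, offset_values):
    """Add the fitted offset (bias + drift * t) to every raw timestamp."""
    bias, drift = fit_line(offset_times, offset_values)
    return [t + bias + drift * t for t in stamps]


# Assumed example: a 250 ms bias plus a drift of 1 ms per second.
offset_times = [0.0, 10.0, 20.0, 30.0]   # when each offset was measured (s)
offset_values = [0.25, 0.26, 0.27, 0.28]  # measured offsets (s)
raw_stamps = [0.0, 15.0, 30.0]

corrected = correct_timestamps(raw_stamps, offset_times, offset_values)
```

With real recordings the offsets are noisy, which is why a fitted line is used instead of correcting each sample with the nearest individual offset measurement.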

In practice, LSL can achieve millisecond precision or better synchronization, though only when the (non-zero) delay *within* each data acquisition device, from data sample capture to network sample transmission, is measured and made available to LSL in a device delay information file. Device delay information is most often best learned by direct measurement of each device. For example, the sample delay we have measured within our laboratory Biosemi Active 2 EEG system is 8 ms. When device delay information is available, LSL also corrects the *.xdf* file data stream timestamps for the measured delays, thereby achieving maximum stream synchronization accuracy while relieving *.xdf* data read functions of the need to locate and make use of this information.

**FIGURE 1 |** … (.xdf), EEGLAB (.set), or MoBILAB (.hdr, .bin)] and instantiates a corresponding *dataSource* object that identifies the constituent data streams and creates stream-specific objects to handle them. Each data stream object defines … handles two files, a header that provides metadata about its properties, and a binary data file that is memory mapped to disk, allowing its data (no matter how large) to be accessed through its *data* field.

#### **MEMORY MANAGEMENT**

In contrast to the common practice of representing data streams as matrix variables held in the MATLAB workspace in computer random access memory (RAM), MoBILAB stores its working data in MATLAB memory-mapped data objects. Such memory-mapped objects are used, for example, by SPM (Friston et al., 1994) to represent MRI data. These objects organize the data in disk files optimized for fast random access, making it possible for MoBILAB to process data streams of virtually any size without requiring the host computer to have an unlimited amount of RAM, with relatively little compromise of compute performance (particularly as solid state and other disk access latencies continue to decrease). Each stream object is created from two files: (1) a *.hdr* header file that contains stream object metadata in MATLAB format, and (2) a *.bin* binary file that contains the data stream samples. Based on information in the header file, portions of the binary file are mapped into main memory only as and when needed. These operations are transparent to both MATLAB users and programmers. Data samples can be accessed using standard MATLAB and EEGLAB data matrix syntax through the *dataStream* object property *data*. Thus, MATLAB syntax for accessing data in EEGLAB and MoBILAB datasets appears similar:

*>> this\_data = EEG.data; % copy EEGLAB "EEG" structure data into a MATLAB matrix*

*>> this\_data = eegObj.data; % copy MoBILAB "eegObj" object data into a MATLAB matrix*

In the example above note that, contrary to the EEGLAB convention, MoBILAB represents data using samples as the first dimension and channels as the second (the transpose of *EEG.data*). The reason for this design choice is that MATLAB stores matrices internally in column-major order, that is, as column-wise vectorized arrays. With samples along the first dimension, all samples of a channel lie next to each other in RAM or on disk, so operations performed channel-by-channel (for example, spectral filtering, correlations, etc.) run faster, as modern systems cache neighboring values before processing them.
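To make the header-plus-binary scheme and the samples-first layout concrete, here is a small Python sketch using only the standard library. It is an analogy, not MoBILAB's file format: the JSON header, the little-endian float64 encoding, and the function names `write_stream` and `read_channel` are assumptions chosen for the example. The binary file stores one channel after another, which is what a column-major samples-by-channels matrix looks like on disk, so a whole channel can be fetched as one contiguous slice of the memory map.

```python
import json
import mmap
import os
import struct
import tempfile

BYTES_PER_SAMPLE = 8  # float64


def write_stream(basename, samples_by_channel):
    """Write a .hdr file (metadata) and a .bin file (channel after channel)."""
    n_channels = len(samples_by_channel)
    n_samples = len(samples_by_channel[0])
    with open(basename + ".hdr", "w") as f:
        json.dump({"n_samples": n_samples, "n_channels": n_channels}, f)
    with open(basename + ".bin", "wb") as f:
        for channel in samples_by_channel:
            f.write(struct.pack(f"<{n_samples}d", *channel))


def read_channel(basename, channel):
    """Map the binary file and decode only the bytes of one channel."""
    with open(basename + ".hdr") as f:
        header = json.load(f)
    n = header["n_samples"]
    start = channel * n * BYTES_PER_SAMPLE
    stop = start + n * BYTES_PER_SAMPLE
    with open(basename + ".bin", "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return list(struct.unpack(f"<{n}d", mapped[start:stop]))


with tempfile.TemporaryDirectory() as folder:
    base = os.path.join(folder, "eeg")
    write_stream(base, [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
    second_channel = read_channel(base, 1)
```

Only the requested slice is decoded; the operating system decides which pages of the file are actually brought into RAM, which is the property that lets a memory-mapped stream exceed available memory.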

Currently MoBILAB includes object classes to read/write, analyze, and visualize EEG, motion capture, eye tracking, and force plate data as well as recorded audio and/or video scene data. These classes include methods to apply spectral filters, compute temporal derivatives, and perform wavelet time-frequency decomposition, independent component analysis (ICA), or principal component analysis (PCA), plus some methods for electrical forward head modeling and inverse EEG source modeling not discussed here.

#### **THE MoBILAB MULTI-STREAM BROWSER**

In MoBI experiments, the ability to review, interactively, the EEG data together with any other synchronously acquired data streams is critical for selecting, performing, and evaluating data measured during a MoBI experiment and for determining an appropriate data analysis path. The MoBILAB Multi-Stream Browser allows visual inspection, annotation, and sanity checking of recorded multistream data, and can also provide useful insights about suspected or previously unknown relationships between behavioral, environmental, and EEG data features. Each *dataStream* object has a built-in plotting function (a *plot* method) that displays the data stream in a natural manner. For some streams, more than one plotting method is supported to provide the benefits of different types of visualization. For instance, EEG may be displayed as a scrolling multi-channel time series or as an animated topographic scalp map, body motion capture data as movements of one or more 3-D stick figures, fixations in eye tracking data as time series or 2-D heat maps, etc. Extending MoBILAB plotting capabilities to a new type of *dataStream* is simple because core functions including data scrolling, interactive cursor actions, and other controls have been implemented in the base class *browserHandle* from which a programmer user can easily derive a new browser class by replacing and/or adding new elements that best suit the new data type.

Though each *dataStream* object can be visualized separately, either from the MoBILAB GUI or from the command line, when plotted through the Multi-Stream Browser multiple browser windows can be controlled from a single GUI (**Figure 3**). This control window performs user-directed scrolling through multimodal data (and/or through data streams derived from such data) either for multimodal data review or to mark experimental events of interest for further analysis.

**FIGURE 3 |** … coordinated views of raw and/or derived data streams within a multi-stream MoBI dataset, plus (bottom) a browser control window. **(A)** Full body motion capture of participant's rhythmic hand and arm movements, animated on a 3-D human stick figure whose body position depicts subject position at the current moment (CM) in the data, marked in the scrolling time series data windows by CM indicators (vertical red lines). **(B,C)** Derived (x, y, z) velocity and acceleration time series for the (red) left … series for a 32-channel EEG channel subset, shown in a 5-s window containing the CM. **(E)** The EEG scalp topography at the CM, visualized by interpolating the channel EEG data on a template head model. **(F)** The Multi-Stream Browser control window. Movement of the CM in all browser windows at once can be controlled either by manipulating the play buttons or scroll bar in this window or by moving the red vertical CM indicator in any scrolling data window. Pressing play will animate the stick figure and scalp map displays to match the advancing CM. Data from (Leslie et al., in press).

Although the Multi-Stream Browser is not meant for on-line data viewing, it is similar in spirit to video-EEG monitoring systems used in epilepsy or sleep studies. However, unlike MoBI data, the data in these systems are mostly recorded for clinical purposes and not always analyzed in great detail. As a complement, MoBILAB could be used in those cases where a more sophisticated analysis is required, as it provides the tools needed for multimodal data processing and exploration.

#### **DATA PROVENANCE**

**Figure 4** shows two sample MoBILAB GUI pipelines: (1) to process EEG data: raw data ⇒ re-referencing ⇒ high-pass filtering ⇒ artifact rejection ⇒ ICA decomposition; (2) to process motion capture data: raw data ⇒ occlusion artifact cleaning ⇒ low-pass filtering ⇒ computation of time derivatives. Central to MoBILAB's design is its built-in data provenance, which gives users the ability to track and recall all the transformations applied to the data in a processing pipeline. Every stream object has a *history* property that is initialized at the moment of its creation with the command and the parameters that were used to generate it. This mechanism allows representing pipelines in which child datasets are processed versions of a parent set. Graphically, parent-child links can be visualized as a tree. To make this tree as functional and interactively accessible as possible, MoBILAB embeds the Java component *JTree* in the main MoBILAB GUI. This *JTree* component allows the creation of contextual menus for each data object in the tree. By climbing back up any branch of the tree (using menu item "*Generate batch script*"), the user can generate scripts that ease the task of running the same sequence of operations on other datasets.
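The provenance mechanism can be pictured with a short sketch. The Python code below is a simplified illustration, not the MoBILAB implementation: the class `Stream`, the method `derive`, and the command strings are all hypothetical. Each object records the command that created it and a link to its parent, and a batch script is produced by walking from a leaf back to the root and reversing the collected commands.

```python
class Stream:
    """Hypothetical stream object that remembers how it was created."""

    def __init__(self, name, command, parent=None):
        self.name = name
        self.history = command  # command text that generated this object
        self.parent = parent    # None for raw (imported) data

    def derive(self, name, command):
        """Create a child object representing a processed version of this one."""
        return Stream(name, command, parent=self)


def generate_batch_script(stream):
    """Climb from `stream` to the root and list the commands in run order."""
    commands = []
    node = stream
    while node is not None:
        commands.append(node.history)
        node = node.parent
    return "\n".join(reversed(commands))


# Invented command strings mirroring the EEG pipeline of Figure 4.
raw = Stream("biosemi", "raw = import_file('session.xdf')")
reref = raw.derive("reref", "reref = rereference(raw)")
highpassed = reref.derive("hp", "hp = highpass(reref, cutoff_hz=1)")
cleaned = highpassed.derive("clean", "clean = reject_artifacts(hp)")
ica = cleaned.derive("ica", "ica = run_ica(clean)")

script = generate_batch_script(ica)
```

Calling `generate_batch_script` on an intermediate object (for example `highpassed`) yields only the commands needed to reach that stage, which corresponds to generating a script from a node partway down a branch.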

Other toolboxes including Brainstorm (Tadel et al., 2011) have successfully exploited this approach, making the user's interaction with the application simpler and more natural feeling. In multimodal data analysis, each data stream may have a different set of processing or pre-processing methods. For ease of use, therefore, MoBILAB provides a flexible menu interface that offers (only) selections relevant to each type of data stream present in the data.

The MoBILAB tree is meant to ease the analysis and exploration of different data types by exposing modality-specific standard options for visualization and analysis. Joint analysis of multi-rate MoBI data is still in an early stage, however. Though it is not yet possible to create converging multi-stream data processing pipelines from the GUI, it is possible for example to compute desired measures for more than one data stream and to then estimate their joint subject-level statistics through custom MoBILAB scripts, as demonstrated in (Leslie et al., in press).

**FIGURE 4 | The MoBILAB GUI and processing pipelines.** The left panel shows the tree of parent-children relationships among data objects in a loaded multi-stream MoBI dataset. The integers enclosed in parentheses to the left of each object name give the index in the cell array *mobilab.allStreams.item*. The two branches shown unfolded (by clicking on the data object names, here *biosemi* and *phasespace*) represent two already selected data processing pipelines (cf. **Text Boxes 1**, **2**): (1) for Biosemi (Amsterdam) system EEG data: raw data ⇒ re-referencing ⇒ high-pass filtering ⇒ artifact rejection ⇒ ICA decomposition; (2) for PhaseSpace (Inc.) system motion capture data: raw data ⇒ correction of occlusion artifacts ⇒ low-pass filtering ⇒ computation of time derivatives. By following any branch backwards (upwards), the user can generate MATLAB scripts that make it easy to run the same series of operations on other Biosemi and PhaseSpace datasets. The center panel shows the contextually defined menu for the EEG *dataStream* object. The menu item shaded in blue backtracks the history of every object in the selected branch, creating a script ready to run (see **Text Box 1**). The right panel shows the contextually defined menu for the motion capture data object (see **Text Box 2**). Note that the two stream objects (EEG and motion capture) have different processing menus that present their individually defined processing methods.


**Text Box 1 | Automatically generated script to run an EEG data processing pipeline (from reading an XDF-formatted MoBI dataset to performing ICA decomposition) as generated by the MoBILAB EEG menu item "***Generate batch script***" in Figure 4 (center).** Each command that modifies the data outputs a new object that handles the processed data; this object is also inserted into the MoBILAB object tree. Therefore, in the example below, *eegObj* is used as a temporary reference to the latest processed EEG dataset.


**Text Box 2 | A script implementing a MoBILAB pipeline for processing motion capture ("mocap") data (from stream separation to computing time derivatives).** This script was also generated using the menu item "*Generate batch script*" in **Figure 4** (right).

#### **MoBILAB EXTENSIONS**

Developers can use the MoBILAB infrastructure (stream objects and their signal processing and visualization methods) as building blocks for rapid development of new MoBILAB extensions (formerly "plug-ins"). The example in **Text Box 3** shows a simple function that reconstructs the EEG channel data as the sum of only those of its independent components deemed to represent brain activity. One way of identifying "brain components" could be for instance to estimate the equivalent current dipole model of each independent component scalp map and then to select those components for which the residual variance of the scalp map not accounted for by the equivalent dipole model is less than some threshold.
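The back-projection idea can be written down compactly. If the channel data are modeled as X = A S, with mixing matrix A (channels by components) and component activations S (components by samples), then keeping only a subset K of components gives the cleaned data A[:, K] S[K, :]. The Python sketch below illustrates this with plain lists; it is not the function of Text Box 3, the name `back_project` is hypothetical, the default threshold of 0.15 is an arbitrary placeholder, and the residual variances are assumed to have been computed elsewhere by dipole fitting. Data are laid out channels by samples here purely for readability.

```python
def back_project(mixing, activations, residual_variance, threshold=0.15):
    """Rebuild channel data from components whose dipole fit is good enough.

    mixing[c][k]         weight of component k at channel c
    activations[k][t]    activation of component k at sample t
    residual_variance[k] fraction of scalp map variance the dipole leaves unexplained
    """
    keep = [k for k, rv in enumerate(residual_variance) if rv < threshold]
    n_channels = len(mixing)
    n_samples = len(activations[0])
    cleaned = [
        [sum(mixing[c][k] * activations[k][t] for k in keep)
         for t in range(n_samples)]
        for c in range(n_channels)
    ]
    return cleaned, keep


# Toy example: two channels, two components, three samples.
mixing = [[1.0, 2.0],
          [0.5, -1.0]]
activations = [[1.0, 0.0, -1.0],   # component 0: kept below
               [3.0, 3.0, 3.0]]    # component 1: rejected below
residual_variance = [0.05, 0.60]

cleaned, kept = back_project(mixing, activations, residual_variance)
```

In this toy case only component 0 passes the threshold, so the constant contribution of component 1 disappears from both channels.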

The example below (in **Text Box 4**) shows how to create a class for a new type of *dataStream* object by re-using existing classes. The new class, named *icaEEG,* is intended to be a placeholder for the results of ICA decomposition applied to EEG data. It inherits all the properties and methods of the class *eeg* and adds properties to store the ICA field information from the EEGLAB *EEG* dataset structure. The first method defined is the so-called constructor; this function is called automatically by MATLAB at the moment of object creation. The constructor function is mandatory and has the same name as the class itself. The second method is described in **Text Box 3**. Integrating new functions and classes into the MoBILAB class hierarchy allows users to access the new methods directly from the contextual menu associated with each class. The third method uses the EEGLAB function *pop\_topoplot* to display IC scalp maps. The fourth method shows how to redefine methods already defined in a base class. In this case, the method *EEGstructure* is extended to add ICA fields to the EEGLAB *EEG* structure.
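The pattern of extending a base class and redefining one of its methods can be sketched in Python as follows. This is an analogy to the MATLAB class described above, not a translation of Text Box 4: the classes `EEG` and `IcaEEG` and the method `eeg_structure` are hypothetical, the exported dictionary only loosely mimics a few EEGLAB *EEG* fields (`data`, `srate`, `icaweights`, `icasphere`), and in Python the constructor is always named `__init__` instead of carrying the class name.

```python
class EEG:
    """Minimal hypothetical base class for EEG streams."""

    def __init__(self, data, sampling_rate):
        self.data = data
        self.sampling_rate = sampling_rate

    def eeg_structure(self):
        """Export a dict loosely mimicking a few EEGLAB EEG structure fields."""
        return {"data": self.data, "srate": self.sampling_rate}


class IcaEEG(EEG):
    """Derived class that stores the results of an ICA decomposition."""

    def __init__(self, data, sampling_rate, icaweights, icasphere):
        super().__init__(data, sampling_rate)  # reuse the base constructor
        self.icaweights = icaweights
        self.icasphere = icasphere

    def eeg_structure(self):
        """Extend, not replace, the export defined in the base class."""
        structure = super().eeg_structure()
        structure["icaweights"] = self.icaweights
        structure["icasphere"] = self.icasphere
        return structure


identity = [[1.0, 0.0], [0.0, 1.0]]
ica_obj = IcaEEG(data=[[0.1, 0.2], [0.3, 0.4]], sampling_rate=256,
                 icaweights=identity, icasphere=identity)
exported = ica_obj.eeg_structure()
```

Calling the base class method first and then adding fields keeps the derived export in step with any later change to the base export.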

#### **EEG DATA PROCESSING**

Although, as **Text Box 4** illustrates, MoBILAB can export an EEG *dataStream* object from a multi-stream data file to EEGLAB as an *EEG* structure, it can also be used to pre-process and export an EEG dataset after performing ICA decomposition. MoBILAB also contains some EEG processing methods (under development) not yet available in EEGLAB itself.

#### **FUTURE DIRECTIONS**

Here we have described MoBILAB, a software environment running on MATLAB for analysis and visualization of multimodal MoBI paradigm experiment data. MoBI analysis, and so also MoBILAB methods, are still at an early stage of development. We therefore have limited our description to its general infrastructure, its Multi-Stream Browser, its motion capture data preprocessing facility, and its EEGLAB related features. At present, MoBILAB is a toolbox intended to provide researchers with basic tools for exploring their multimodal data. As the field of MoBI data analysis evolves, new methods will be added, specifically those for separately and jointly modeling brain and body dynamics using models incorporating more of the richness of the multimodal (brain/body/environment) MoBI data concept. Modeling brain dynamics while also taking into account body dynamics and interactions with environmental events (and other agents) should provide a better basis for understanding how the human brain supports our behavior and experience in the ever-changing context of daily life, and thereby a deeper understanding of "how the brain works."

**Text Box 3 | A MATLAB function to back-project to the scalp channels only those independent components of EEG data estimated to represent brain activity.**

**Text Box 4 | A definition of the new** *dataStream* **class** *icaEEG* **in MoBILAB.** Observe that MATLAB requires the class to be defined in an m-file whose name matches the name of the class.

A possible way to model brain/body/environment dynamics might be to extend the methodology of Dynamic Causal Modeling (Kiebel et al., 2009) first to the mechanics of the human body and then to its interface with the central nervous system (from spinal cord to the brain). Other approaches to MoBI data analysis might follow more data-driven approaches including those used in the field of BCI design (Makeig et al., 2012), whereby informative body and/or eye movement-defined events or features, extracted using body and eye movement models, might help classify and segregate EEG trials by cognitive state, response, or intention, thereby opening the possibility of adding powerful informative multimodal feature analysis to the repertoire of EEG/MoBI data analysis (as well as to BCI modeling). In this regard, we hope to strengthen ties between the MoBILAB, BCILAB (Kothe and Makeig, 2013), and SIFT (Delorme et al., 2011) toolboxes with a goal of better modeling EEG brain dynamics from multimodal data.

MoBILAB is a work in progress: new methods, bug fixes, and scripting examples will be added to the existing documentation at sccn.ucsd.edu/wiki/MoBILAB. In the spirit of collaboration and openness that has characterized the development of EEGLAB and other open source scientific software projects, MoBILAB is freely available under an open source BSD license. Explicit instructions for downloading and/or cloning the repository are given on the wiki. The authors would be pleased to collaborate with other interested researchers to extend the capabilities of MoBILAB to serve the evolving needs of MoBI brain research.

#### **ACKNOWLEDGMENTS**

We thank Christian Kothe, Makoto Miyakoshi, and others at the Swartz Center, UCSD for useful discussions and suggestions. Development of the MoBILAB toolbox was funded in part by a gift from The Swartz Foundation (Old Field, NY) and by the Army Research Laboratory under Cooperative Agreement Number W911NF-10-2-0022. The views and the conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 November 2013; accepted: 19 February 2014; published online: 05 March 2014.*

*Citation: Ojeda A, Bigdely-Shamlo N and Makeig S (2014) MoBILAB: an open source toolbox for analysis and visualization of mobile brain/body imaging data. Front. Hum. Neurosci. 8:121. doi: 10.3389/fnhum.2014.00121*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Ojeda, Bigdely-Shamlo and Makeig. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Pervasive brain monitoring and data sharing based on multi-tier distributed computing and linked data technology

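*John K. Zao<sup>1</sup>\*, Tchin-Tze Gan<sup>1</sup>, Chun-Kai You<sup>1</sup>, Cheng-En Chung<sup>1</sup>, Yu-Te Wang<sup>2</sup>, Sergio José Rodríguez Méndez<sup>1</sup>, Tim Mullen<sup>2</sup>, Chieh Yu<sup>1</sup>, Christian Kothe<sup>2</sup>, Ching-Teng Hsiao<sup>3</sup>, San-Liang Chu<sup>4</sup>, Ce-Kuen Shieh<sup>4</sup> and Tzyy-Ping Jung<sup>2</sup>*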

*<sup>1</sup> Pervasive Embedded Technology Lab, Computer Science Department, National Chiao Tung University, Hsinchu, Taiwan, R.O.C.*

*<sup>3</sup> Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan, R.O.C.*

*<sup>4</sup> National Center for High-performance Computing, Hsinchu, Taiwan, R.O.C.*

#### *Edited by:*

*Klaus Gramann, Berlin Institute of Technology, Germany*

#### *Reviewed by:*

*Reinhold Scherer, Graz University of Technology, Austria Christian Lambert, St George's University of London, UK*

#### *\*Correspondence:*

*John K. Zao, Computer Science Department, National Chiao Tung University, Room EC-527, 1001 University Road, Hsinchu 30010, Taiwan, R.O.C. e-mail: jkzao@cs.nctu.edu.tw*

EEG-based Brain-computer interfaces (BCI) are facing basic challenges in real-world applications. The technical difficulties in developing truly wearable BCI systems that are capable of making reliable real-time prediction of users' cognitive states in dynamic real-life situations may seem almost insurmountable at times. Fortunately, recent advances in miniature sensors, wireless communication and distributed computing technologies offered promising ways to bridge these chasms. In this paper, we report an attempt to develop a pervasive on-line EEG-BCI system using state-of-art technologies including multi-tier Fog and Cloud Computing, semantic Linked Data search, and adaptive prediction/classification models. To verify our approach, we implement a pilot system by employing wireless dry-electrode EEG headsets and MEMS motion sensors as the front-end devices, Android mobile phones as the personal user interfaces, compact personal computers as the near-end Fog Servers and the computer clusters hosted by the Taiwan National Center for High-performance Computing (NCHC) as the far-end Cloud Servers. We succeeded in conducting synchronous multi-modal global data streaming in March and then running a multi-player on-line EEG-BCI game in September, 2013. We are currently working with the ARL Translational Neuroscience Branch to use our system in real-life personal stress monitoring and the UCSD Movement Disorder Center to conduct in-home Parkinson's disease patient monitoring experiments. We shall proceed to develop the necessary BCI ontology and introduce automatic semantic annotation and progressive model refinement capability to our system.

**Keywords: brain computer interfaces, bio-sensors, machine-to-machine communication, semantic sensor web, linked data, Fog Computing, Cloud Computing**

#### **INTRODUCTION**

In recent years, electroencephalography (EEG) based brain computer interfaces (BCI) have left their laboratory cradles and begun to seek real-world applications (Lance et al., 2012). Wearable BCI headsets such as Emotiv *EPOC*, NeuroSky *MindSet* and *MINDO* are sold as consumer products, while applications such as silent communication using *The Audeo* by Ambient and focus/relax exercises using the *Mindball* by Interactive Productline are attracting widespread attention. Despite this hype, BCI applications still need to overcome a few basic challenges in order to become truly useful in real-world settings:

1. *Finding reliable ways to determine users' brain states:* it is well known that individuals' EEG responses exhibit significant differences even when the individuals perform the same task or are exposed to identical stimuli. For example, the EEG correlates of fatigue vary remarkably across different subjects even though they remain relatively stable among different sessions of the same subject (Jung et al., 1997). As a result, long training sessions at different fatigue levels must be conducted on each user in order to calibrate a personalized EEG-based fatigue monitoring model. Hence, there is a pressing need to identify common EEG correlates of certain brain states in order to reduce the amount of training data required to calibrate individual users' BCI systems.


*<sup>2</sup> Swartz Center for Computational Neuroscience, University of California, San Diego, CA, USA*

often deteriorate and are also affected by changes in environmental conditions. Thus, feedback mechanisms must be in place to regulate the stimuli in order to counter the habituation trend and the environmental influences.

To tackle these challenges, real-world EEG-BCI systems need not only to conduct real-time signal analyses and brain state predictions on individual data sets but also to perform data-mining and machine-learning operations over large data sets collected from a vast user population over extended time periods. To do so, future EEG-BCI systems must be connected to high-performance computing servers as well as massive on-line data repositories through the global Internet in order to excavate the wealth of information buried in the massive data collection and adapt their prediction models and operation strategies in response to the incoming data in real time. To realize these futuristic scenarios, we implemented a pilot on-line EEG-BCI system using wireless dry-electrode EEG headsets and MEMS motion sensors as the front-end devices, Android mobile phones as the personal user interfaces, compact personal computers as the near-end Fog Servers and the computer clusters hosted by the Taiwan National Center for High-performance Computing (NCHC) to provide the far-end Cloud Computing services. So far, we have conducted two sets of experiments using our pilot system: first, a trial of synchronous multi-modal global data streaming was carried out in late March 2013, and then three runs of the multi-player on-line EEG-BCI game *EEG Tractor Beam* have been played since late September 2013. Outcomes of these experiments are discussed in the Results section.

This paper adopts the structure of a technology report. The Methods section expounds the two architectural concepts as well as the three operating scenarios of this system. The following Results section describes the two pilot experiments performed during the past year and uses them as examples to explain the relatively easy and modular approach to developing novel applications with this system. Finally, the Discussions section highlights the advantages of employing this system to implement future real-world EEG-BCI applications. It also discusses the information security and user privacy issues that may arise from the real-world deployment of this system. Potential cost/benefit tradeoffs are also considered. Since this is on-going work to develop a pilot system, a list of future work is provided at the conclusion.

#### **METHODS**

This pervasive on-line EEG-BCI system was built upon two information and communication technologies: (1) a *multi-tier distributed computing infrastructure* that is based on Fog and Cloud Computing paradigms and (2) a *semantic Linked Data superstructure* that connects all the data entries maintained in this distributed computing infrastructure through meta-data annotation. The system was designed to support three operation scenarios: (1) *"Big Data" BCI*, which can maintain an ever-increasing amount of real-world BCI data in a scalable distributed data repository and search for data relevant to specific task and event types using semantic queries; (2) *Interactive BCI*, which enables the BCI systems to regulate their brain stimuli based upon real-time brain state prediction and feedback control; (3) *Adaptive BCI*, which can train and refine brain state prediction and classification models based on the relevant data sets gathered through semantic data queries and then push these models back to the EEG signal processing and brain state prediction pipelines in real time. The following sections offer a conceptual overview of the relevant technologies and the system operation. Engineering details, however, will be described in a complementary paper.

#### **MULTI-TIER FOG AND CLOUD COMPUTING INFRASTRUCTURE**

#### *Rationale*

Real-world BCI systems (as well as other personal telemonitoring systems) constantly face the daunting challenge of providing reliable long-term monitoring results in ever-changing real-world situations using only battery-powered devices. As Cummings pointed out in her paper (Cummings, 2010), the necessary technology for hardware miniaturization and algorithmic improvement may not become available in the near future. Meanwhile, it is simply impossible to perform the computation- and communication-intensive tasks on these wearable systems: *computation offloading* provides the only viable solution, and the adoption of the *Fog Computing* paradigm was the practical engineering approach we chose to tackle this challenge.

Fog Computing was first proposed by Bonomi of Cisco (Bonomi et al., 2012) as an *ad-hoc* distributed computing paradigm that utilizes computing resources available among online computers (known as the Fog Servers) close to the wireless sensors and the mobile phones to offload their computing burden so as to prolong their battery life and enhance their data processing performance. When we superimpose Fog Computing onto Cloud Computing, we create a three-tier distributed computing architecture with the Fog Servers serving as the near-end computing proxies between the front-end devices and the far-end servers. These near-end servers can offer potent data processing and storage services to the front-end devices while incurring minimal communication latency. Thus, the Fog Servers can be useful aids in real-time human–computer interactions.

For the sake of reaping the most benefit from this three-tier architecture, however, one must allocate computing tasks strategically at each tier and exchange information efficiently between the tiers using succinct data formats and interoperable communication protocols. In the rest of this section, we explore various ways to trade off the computation and communication workloads among the front-end, near-end, and far-end computing nodes. Our objective is to optimize the computation and communication efficiency of the entire infrastructure while enhancing the responsiveness and robustness of the pervasive on-line EEG-BCI systems.

#### *Architecture*

**Figure 1** illustrates the concept of multi-tier Fog and Cloud Computing. The first tier, known as the *front-end*, consists of battery-powered wireless sensors and mobile devices, which serve as the interfaces between the physical world, the human users and the cybernetic information infrastructure. The second tier or the *near-end* is formed by an *ad-hoc* conglomerate of consumer IT products such as personal computers, television set-top boxes, and game consoles close to the front-end devices over the Internet. These computing nodes, known as the Fog Servers, have sufficient electric power, data storage, and computing capacity to offload the computing burden from the front-end devices in order to prolong their battery lives and enhance their performance. The final tier or the *far-end* is made up of Cloud Servers installed in public or private data centers. These high-performance computers not only have plenty of computing power, storage capacity and communication bandwidth; they have also accumulated vast amounts of information and can use it to make deductions and predictions beyond the capability of stand-alone computers. This massive Cloud-based information warehouse and computing engine is the "backbone" of this distributed infrastructure.

Sophisticated as it seems, the Fog/Cloud Computing infrastructure is expected to be widely deployed, riding the tide of the Internet-of-Things. For example, smart homes and buildings will have smart electric meters that can control the power consumption of electric appliances while interacting with the smart power grids; in-home multimedia servers will deliver bundled information and communication services from the "Internet cloud" to individuals' personal devices; intelligent transportation systems will install roadside controllers/servers that will interact with pedestrians' mobile phones and vehicles' on-board computers while pulling and pushing data to the municipal and national data centers. From this perspective, our on-line EEG-BCI systems can be regarded as a kind of pervasive personal telemonitoring system. Consequently, all our design decisions were made to ensure interoperability with the de-facto or emerging standards in the field of *machine-to-machine communication* and *Internet-of-Things*.

#### *Computation and communication tradeoffs*

Currently, there exist a communication bottleneck and an information chasm between the mobile applications running on the front-end devices and the computing services provided by the far-end Cloud Servers. The communication bottleneck exists because 3G/Wi-Fi Internet connections offer asymmetric data communication. These wireless networks operate on the assumption that data flow in larger quantities and at higher rates from the Internet content/service providers to the individual consumers; hence, the provider-to-consumer down-links are allotted much wider bandwidth than the consumer-to-provider up-links. However, the balance is gradually being tilted by the increasingly widespread deployment of Internet sensors; in the near future, much more data will be generated by the front-end devices than will be produced as results by the far-end servers. Meanwhile, an information chasm is also created by the separation between the data producers (sensors) and the data processors (servers). The data transport latency through the Internet core can run between 200 and 500 ms. Thus, it is impossible for mobile applications to produce sub-second real-time responses using Cloud Computing. Along with other Fog Computing advocates, we therefore propose to disperse computing tasks along the data transport paths. Specifically, we suggest: (1) to install powerful embedded processors in wireless sensors in order to perform on-board data pre-processing and streaming analysis; (2) to convert personal computers, television set-top boxes, and game consoles into ubiquitous Fog Servers through the deployment of *ad-hoc* computing proxy software in order to perform most of the real-time computation; (3) to support meshed-up web services among Cloud Servers in order to make full use of their information collection and computing power in cross-sectional and/or longitudinal data analyses. The following is the pragmatic approach we took to building our pervasive on-line EEG-BCI system.

Contrary to popular belief, modern wireless sensors and mobile devices are no longer impoverished in their communication and computing capability. Both the Bluetooth® 4.0 protocol (Bluetooth Smart Technology: Powering the Internet of Things) and the IEEE 802.11n low-power Wi-Fi technology (Venkatesh) can support data transfer rates up to 24 Mb/s. Also, several low-power embedded processors have 32-bit processing units, floating point co-processors, direct memory access channels and power management units built into their system-on-chip (SoC) design. With these new technologies, the design decision now lies in the tradeoff between the on-board computation and communication power budgets. In fact, computation is usually more power efficient than communication unless the communication occurs over a very short distance, as in the case of Bluetooth personal-area networks. Cell phone communication is much less efficient as its power consumption increases in proportion to the *fourth power* of the communication distance. With powerful embedded processors, the new generation of wireless sensors can perform various signal pre-processing tasks including artifact removal (Jung et al., 2000; Joyce et al., 2004), compressive sampling (Candes and Wakin, 2008), and even feature extraction (Suleiman and Fatehi, 2007) on board. These pre-processing tasks can transform large amounts of raw data into compact representations and hence improve the combined power efficiency of computation and communication measured in Joule/bit. We have used these technologies to build a 10-DOF motion sensor (Zao et al., 2013), which consumes less electric power and supplies much more computing power than similar commercially available sensors.
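As a rough worked illustration of the fourth-power relation mentioned above, the scaling can be written out explicitly. This is only a sketch that takes the stated proportionality at face value; the symbols $P$, $d$ and $k$ are notation introduced here for illustration and do not come from the original text.

$$
P(d) = k\,d^{4} \quad\Longrightarrow\quad \frac{P(2d)}{P(d)} = \frac{k\,(2d)^{4}}{k\,d^{4}} = 2^{4} = 16
$$

Under this assumption, doubling the communication distance multiplies the transmission power by 16, and halving it divides the power by the same factor, which is one way to see why short-range links to nearby servers are attractive for battery-powered sensors.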

Deploying ubiquitous Fog Servers close to the front-end devices (in terms of network distance) can serve two purposes at once: first, it can help the wireless sensors to provide sub-second real-time responses by offloading their heavy computation to the more powerful Fog Servers with minimal communication overhead; second, it can mitigate the communication bottleneck between the local area networks and the global Internet by drastically reducing the amount of traffic flowing between the Fog Servers and the Cloud Servers. In the example of our multi-player on-line EEG-BCI game, EEG Tractor Beam (section Multi-player On-line Interactive BCI Game), the Fog Servers sent only the brain states of individual players over the Internet every quarter of a second. Hence, the game generates very little real-time traffic even with hundreds of players participating in a single on-line session. Fragments of raw EEG data are uploaded only after the game for the sake of building up the vast EEG data repository.
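The traffic reduction described above can be made concrete with a back-of-the-envelope calculation. The sketch below compares the payload rate of a raw EEG stream with that of quarter-second brain-state messages. Only the four channels and the four messages per second are taken from the text; the sampling rate, sample width and message size are hypothetical values chosen for illustration, and protocol overhead is ignored.

```python
def raw_stream_bps(channels: int, sample_rate_hz: float, bits_per_sample: int) -> float:
    """Payload bit rate of an uncompressed multi-channel sample stream."""
    return channels * sample_rate_hz * bits_per_sample


def state_stream_bps(message_bytes: int, messages_per_second: float) -> float:
    """Payload bit rate of periodic brain-state messages."""
    return message_bytes * 8 * messages_per_second


# Assumed values (not from the paper): 256 Hz sampling, 24-bit samples,
# and a 16-byte brain-state message.
raw_bps = raw_stream_bps(channels=4, sample_rate_hz=256, bits_per_sample=24)
state_bps = state_stream_bps(message_bytes=16, messages_per_second=4)

print(f"raw EEG stream:     {raw_bps:.0f} bit/s")
print(f"brain-state stream: {state_bps:.0f} bit/s")
print(f"reduction factor:   {raw_bps / state_bps:.0f}x")
```

With these assumed numbers the brain-state stream carries roughly one fiftieth of the raw payload per player; the actual ratio depends on the real sampling parameters and message format.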

Computation off-loading becomes most effective when the Fog Servers possess high-performance multicore processors, have abundant electric power and are connected to both wired and wireless broadband networks. Game consoles are a perfect example of such servers. Other candidates include television set-top boxes with Wi-Fi connectivity, next-generation home Internet gateways with built-in servers and the dashboard computers on intelligent vehicles. Whenever the BCI front-ends come within the wireless network coverage of these Fog Servers, they should connect themselves directly to these servers. They can then stream their data directly and perform real-time signal processing and brain state prediction on these servers. The results can then be disseminated to the associated Cloud Server(s), the peer Fog Servers and the personal mobile devices in power- and bandwidth-efficient ways.

The Cloud Servers play the roles of both massive data repository and high-performance computing engine in our on-line EEG-BCI system. Nonetheless, not all these servers need to be installed in big data centers; many of them can be installed in server clusters all over the world. In fact, most data sets would likely be stored in local Fog Servers with only their meta-data uploaded onto the Cloud Servers. Together, the Cloud Servers create a logical Linked Data superstructure by maintaining a federated semantic meta-database and performing semantic search over this meta-database. Only when the semantic data search matches the meta-data with certain search criteria will the associated data sets be transported to one or more Cloud Servers. Cross-sectional and/or longitudinal analyses will then be performed on these data sets. Data will be cached within the Cloud Servers only for a finite duration; unused data will be flushed so as to make efficient use of the cloud-based data storage.

#### *Heterogeneous data interchanges*

To ensure interoperability, our pervasive EEG-BCI system implements two Internet data interchange mechanisms: (1) *machine-to-machine publish/subscribe data exchanges* between the sensors and the Fog Servers as well as among the peer Fog Servers; (2) *web-based client-server transactions* between the Fog Servers and the Cloud Servers.

The machine-to-machine publish/subscribe data exchanges are used to push multi-modal BCI data from the front-end sensors to one or more near-end Fog Servers. This data transport mechanism supports real-time multi-point communication with minimal overhead. We chose to use MQTT (Message Queuing Telemetry Transport) (IBM), a lightweight publish/subscribe protocol with reliable transmission, so that it can be implemented on simple low-power devices.
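The publish/subscribe pattern described above can be sketched with a tiny in-process broker. This is not MQTT and does not use any MQTT library: it only illustrates how one published message reaches several subscribers without the publisher knowing them. The class `InMemoryBroker`, the topic string and the payload are all hypothetical names invented for this example, and only exact topic matches are supported.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, bytes], None]


class InMemoryBroker:
    """Toy stand-in for a publish/subscribe broker (exact topic match only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register a callback to be invoked for every message on `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, payload: bytes) -> int:
        """Deliver `payload` to all subscribers of `topic`; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)


# Two hypothetical Fog Servers subscribe to the same sensor topic.
broker = InMemoryBroker()
fog_a_inbox: List[bytes] = []
fog_b_inbox: List[bytes] = []
broker.subscribe("bci/subject01/eeg", lambda topic, data: fog_a_inbox.append(data))
broker.subscribe("bci/subject01/eeg", lambda topic, data: fog_b_inbox.append(data))

# The sensor publishes once; the broker fans the message out to both servers.
delivered = broker.publish("bci/subject01/eeg", b"\x01\x02\x03\x04")
print(delivered, fog_a_inbox, fog_b_inbox)
```

A real deployment would replace this class with an MQTT broker and client library, which additionally provide network transport, delivery guarantees and topic wildcards.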

The client-server transactions enable the Fog Servers to interact with the Cloud Servers over a standard Web Service interface. We chose to employ RESTful Web Services (Fielding, 2000; Elmangoush et al., 2012), the de-facto standard server interface for mobile applications, to support these transactions. This choice ensures that our Fog Servers can interoperate with any web server in the Computing Cloud, and allows any user computer to query any of our Cloud Servers so as to obtain BCI services from our system.

#### *Modularized software interfaces*

Our pervasive EEG-BCI system aims to work with a wide variety of sensors as well as signal processing and neuro-imaging software. To do so, we must support conversion between different EEG data formats and provide program interfaces to software modules.

Currently, our system supports data conversion between the legacy BDF/GDF/EDF formats and the new Extensible Data Format (XDF) (Kothe, 2014b) as well as the SET format used by the MATLAB® EEGLAB toolbox (EEGLAB, 2014). Internally, our system employs Google protocol buffers (Protobuf) (Google, 2012) to encode all the data sent through MQTT and RESTful protocols and uses Piqi (Lavrik, 2014) to convert the data between Protobuf, XML and JSON formats.

In order for our EEG-BCI system to work with several EEG analysis MATLAB® toolboxes including BCI2000, BCILAB and EEGLAB (BCI2000, 2014; BCILAB, 2014; EEGLAB, 2014), we developed an application program interface (API) between the MQTT publish/subscribe data transport protocol and the MATLAB toolboxes using the Lab Streaming Layer (LSL) middleware (Kothe, 2014a). This API supports data acquisition, time synchronization and real-time data access among MATLAB modules.

Finally, in order to enable the MATLAB toolboxes to interact with the Linked Data superstructure described in the next section, we also devised a RESTful Web Service interface to support semantic data up/downloading, redirection and search operations. This interface allows mobile applications (1) to add meta-data links to the streaming EEG data and/or the archived EEG data sets and (2) to perform semantic search over these data streams and data sets without knowing the details of the semantic data structure.

#### **FEDERATED LINKED BIG DATA SUPERSTRUCTURE**

The second technology supporting our pervasive on-line EEG-BCI system is a logical data superstructure that was constructed according to the W3C Linked Data guidelines (Berners-Lee, 2006). The sole purpose of employing the Linked Data technology is to enable the Fog and Cloud Servers as well as other authorized computers to perform *semantic data search* on a distributed repository of BCI data sets. Unlike human users, computers cannot tolerate ambiguity in the meanings of the keywords they use to search for relevant data sets or to describe their characteristics. Traditional data models such as the relational model fail to deliver a proper solution as they lack the ability to specify the semantic relations existing among various data objects and concepts. We need a *semantic data model* and a *querying technique* that have rich semantics to describe the real-world settings of brain–computer interactions and provide sufficient granularity to specify different BCI stimuli and responses. In the following sections, we briefly introduce the principle behind the Linked Big Data Model we adopted and the Semantic Sensor Network (SSN) ontology we extended to support semantic search among the BCI data collection.

#### *Semantic data model and linked big data*

Linked Data (2014) is the latest phase of a relentless effort to develop a global interconnected information infrastructure: the first phase began with the deployment of the Internet, which connects information processors (computers) together using physical communication networks; the second phase was marked by the development of the World Wide Web, which connects information resources (documents and services) together through logical data references; the third and latest phase was launched through the dissemination of Linked Data, which connects information entities (data objects, classes, and concepts) together via semantic relations. From another perspective, the migration from the World Wide Web to Linked Data represents a paradigm shift from publishing data in human-readable HTML documents to publishing machine-readable semantic data sets so that the machines can do a little more of the thinking for us.

In essence, a Linked Data set is a graph whose nodes are the *data objects*, *classes*, and *concepts* and whose edges specify the *relations* among these data entities. Conforming to the convention of the Semantic Web (W3C, 2014b), every relation in this graph is specified as a *predicate* in the Resource Description Framework (RDF) (W3C, 2014a); each RDF predicate or triplet consists of a *subject*, an *object* and a *relation*, all expressed in Extensible Markup Language (2013) format. The formal semantics of a Linked Data set is prescribed by a core sub-graph known as an *RDF schema*. It specifies the semantic relations between data classes, concepts and attributes that are relevant to the data set. The additional information superimposed onto the actual data is referred to as the *meta-data*. An RDF schema that encompasses all the data classes, concepts and relations in a field of knowledge is known as an *ontology.* This graphic depiction of semantic relations presents a *semantic data model* in *knowledge representation* (Randall Davis, 1993).

To find all the entities in a Linked Data set that are related to a specific data object, concept or attribute, one simply performs a search or traversal through the graph: all the nodes that can be reached via the traversal by following a set of constraints constitute the results of this *semantic search*. Since the graph traversals can be performed by computers without any human intervention, they are perfectly suited to automatic machine-to-machine information queries. A query language known as SPARQL (W3C, 2014c) was developed to specify the criteria (objectives and constraints) of semantic search based on RDF predicates, much as SQL has done for relational databases.
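The idea of a Linked Data set as a graph of subject–relation–object triplets, and of semantic search as a constrained traversal of that graph, can be sketched in a few lines. The example below stores triplets as plain Python tuples and does not use RDF or SPARQL tooling; the identifiers and relation names (`session:001`, `hasRecord`, `recordedBy`, and so on) are hypothetical and are not taken from the BCI Ontology.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical meta-data describing two recording sessions.
TRIPLES: List[Triple] = [
    ("session:001", "hasSubject", "subject:A"),
    ("session:001", "hasRecord", "record:eeg-001"),
    ("record:eeg-001", "recordedBy", "device:headset-1"),
    ("session:002", "hasSubject", "subject:B"),
    ("session:002", "hasRecord", "record:eeg-002"),
    ("record:eeg-002", "recordedBy", "device:headset-1"),
]


def match(triples: Iterable[Triple], subject: Optional[str] = None,
          relation: Optional[str] = None, obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every field that is not None."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)]


def reachable(triples: Iterable[Triple], start: str,
              allowed_relations: Optional[Set[str]] = None) -> Set[str]:
    """Breadth-first traversal from `start`, following only allowed relations."""
    adjacency: Dict[str, List[str]] = {}
    for s, r, o in triples:
        if allowed_relations is None or r in allowed_relations:
            adjacency.setdefault(s, []).append(o)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen


# Which records were captured by a given device?
records = sorted(t[0] for t in match(TRIPLES, relation="recordedBy", obj="device:headset-1"))
# Everything linked, directly or indirectly, to one session.
linked = reachable(TRIPLES, "session:001")
print(records, sorted(linked))
```

Restricting `allowed_relations` plays the role of the search constraints: for instance, passing `{"hasRecord"}` limits the traversal from a session to its records only.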

We adopted the approach of Linked Big Data (Dimitrov, 2012; Hitzler and Janowicz, 2013) to support machine-to-machine semantic search among BCI data sets. This approach requires us to deposit a layer of meta-data upon the BCI data sets. These meta-data annotate the data sets (as a whole and in parts) with *semantic tags* that describe the characteristics of the subjects, the circumstances and the mechanisms with which the BCI data have been captured. Semantic search based on these meta-data will enable computers to find the annotated data sets and/or their fragments that match specific search criteria. Unlike Big Linked Data, an alternative approach that converts every data entity into a Linked Data object, the Linked Big Data approach maintains the original data representation, but adds meta-data "tags" to the data sets in order to facilitate the semantic search.

Our colleagues at the Swartz Center for Computational Neuroscience (SCCN) have designed the meta-data tags for annotating EEG data sets. Among them, the *EEG Study Schema* (ESS, 2013) and the XDF (Kothe, 2014b) were devised to describe the *context* (subjects, circumstances and mechanisms) of the recording sessions. On the other hand, the *Hierarchical Event Descriptor Tags for Analysis of Event-Related EEG Studies* (HED) (Bigdely-Shamlo et al., 2013) were devised to specify the events that evoke the EEG responses. Our contribution includes the specification of a BCI Ontology, which captures the semantics of the ESS and HED vocabularies, and the development of a RESTful Web Service interface for managing and querying the BCI repository.

#### *BCI ontology*

A prerequisite for organizing BCI data sets according to the Linked Data guidelines is to devise a *BCI Ontology* to capture the BCI domain knowledge. Since brain–computer interactions can be regarded as a form of sensor activity, we decided to devise the BCI Ontology as an application-specific extension to the *SSN* Framework Ontology (W3C, 2011) for organizing the sensors and sensor networks on the World Wide Web.

The core of the SSN Ontology is the *Stimulus-Sensor-Observation Ontology Design Pattern* (Compton and Janowicz, 2010) built upon the basic concepts of stimuli, sensors and observations. The sub-graph marked with the red outlines in **Figure 2** is the semantic graph of this design pattern.


The following are some of the basic concepts/classes defined in the BCI Ontology namespace: http://bci.pet.cs.nctu.edu.tw/ontology#. They are aligned with the core concepts in the SSN Stimulus-Sensor-Observation Ontology Design Pattern. **Figure 2** shows a few examples of the alignment.

• **Sessions, Resources, Devices, and Records:** these are the basic concepts and terminology pertaining to BCI applications. Among them, Sessions align with Observations; Records align with Observation Values and have EEG Records as a subclass; Devices align with Sensing Devices, which has EEG Device as its subclass; Resources is an abstraction of data files and streams.


#### *Federated linked data repository and semantic search*

In order to allow BCI users to maintain recorded data on their own servers and to conduct semantic data search among multiple servers, our BCI system must be equipped with a distributed Linked Data repository and a federated semantic data querying scheme. Both of these facilities are safeguarded by Internet communication security and multi-domain attribute-based access control mechanisms.

The distributed Linked Data repository consists of two functional components: (1) the individual Fog/Cloud Servers that maintain the actual BCI data sets and (2) the RDF repository spread across the Cloud Servers that manages the meta-data of the Linked Big Data superstructure. In order to protect user privacy, all personal information and raw BCI data shall be stored in either the Fog Server(s) on the users' premises or the trusted Cloud Server(s) authorized by the users. All sensitive data are protected by strong communication and information security measures. Only the anonymous subject identifiers, the universal resource identifiers (URI) and the meta-data tags of the data sets may be disseminated among the Cloud Servers. Together, the Cloud Servers maintain a distributed *RDF repository* that can be queried under anonymity protection using the *SPARQL Protocol and RDF Query Language* (SPARQL) v.1.1 (W3C, 2014c).

The SPARQL 1.1 query language supports the *federation* of multiple SPARQL endpoints. As shown in **Figure 3**, a client can issue a SPARQL 1.1 query to a *query mediator*, which will convert the query into several *sub-queries* and forward them to different SPARQL endpoints. Each endpoint then processes the sub-query it received and sends back the query results. Finally, the mediator joins the query results from different endpoints to produce the final result.
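The mediator workflow described above (split a query into sub-queries, forward them to the endpoints, join the partial results) can be sketched with in-memory stand-ins. The code below does not speak SPARQL and is not the mediator developed by the authors: `Endpoint`, `federated_query`, the relation names and the URIs are hypothetical, each sub-query is reduced to "return all subject/object pairs for one relation", and each subject is assumed to have at most one object per relation.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


class Endpoint:
    """In-memory stand-in for one query endpoint holding part of the meta-data."""

    def __init__(self, name: str, triples: List[Triple]) -> None:
        self.name = name
        self.triples = triples

    def query(self, relation: str) -> Dict[str, str]:
        """Answer one sub-query: map each subject to its object for `relation`."""
        return {s: o for s, r, o in self.triples if r == relation}


def federated_query(endpoints: List[Endpoint], relations: List[str]) -> List[Dict[str, str]]:
    """Send one sub-query per relation to every endpoint, then join on the subject."""
    partial_results: List[Dict[str, str]] = []
    for relation in relations:
        bindings: Dict[str, str] = {}
        for endpoint in endpoints:
            bindings.update(endpoint.query(relation))
        partial_results.append(bindings)

    # Join step: keep only subjects that have a binding for every relation.
    common = set(partial_results[0]) if partial_results else set()
    for bindings in partial_results[1:]:
        common &= set(bindings)

    rows = []
    for subject in sorted(common):
        row = {"subject": subject}
        for relation, bindings in zip(relations, partial_results):
            row[relation] = bindings[subject]
        rows.append(row)
    return rows


cloud_a = Endpoint("cloud-a", [
    ("session:001", "eventType", "visual-target"),
    ("session:002", "eventType", "auditory-tone"),
])
cloud_b = Endpoint("cloud-b", [
    ("session:001", "storedAt", "fog://site-1/session-001"),
    ("session:003", "storedAt", "fog://site-2/session-003"),
])

rows = federated_query([cloud_a, cloud_b], ["eventType", "storedAt"])
print(rows)
```

Only `session:001` appears in the joined result because it is the only subject for which both endpoints together supply an event type and a storage location.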

Currently, we use *Virtuoso Universal Server* (VUS) v6.01 (OpenLink Software, 2014) to host the distributed RDF repository. Offered freely as a key component of the LOD2 Technology Stack (LOD2 Technology Stack, 2013), VUS is the most popular open-source semantic search engine for Linked Data applications. VUS can perform *distributed RDF link traversals* as a rudimentary mechanism to support federated SPARQL. To use this mechanism, we developed a Federated Query Mediator that can run on any Fog Server. This mediator can accept semantic data queries expressed in the RESTful/JSON web service format, transform them into SPARQL 1.1 sub-queries and then issue these sub-queries to the VUS instances installed on multiple Cloud Servers. This RESTful/JSON-compatible Federated Query Mediator not only implements the federated semantic search; it also provides a standard web service interface for any authorized mobile application to issue SPARQL queries and thus access our linked BCI repository.


#### **RESULTS**

#### **PILOT SYSTEM**

In the past two years, the Pervasive Embedded Technology (PET) Laboratory at NCTU and the SCCN at UCSD have been working together closely to develop a proof-of-concept prototype of the proposed pervasive EEG-based BCI system. In this endeavor, we chose to use wireless dry-electrode EEG headsets and MEMS motion sensors as the *front-end devices*, Android mobile phones as the personal user interfaces, compact personal computers as the *near-end Fog Servers* and a supercluster of computers hosted by the Taiwan NCHC as the *far-end Cloud Servers*. **Table 1** provides a detailed list of the hardware and software components that are used to build this proof-of-concept pilot system.

This pilot system is currently deployed on two application/fog-computing sites: (1) NCTU PET Lab, (2) UCSD SCCN, and two cloud-computing sites: (1) NCHC supercluster and (2) UCSD SCCN virtual machine server. **Figure 4** illustrates the system configuration at these sites. Both NCTU and UCSD fog-computing sites have participated in all pilot experiments and demonstrations. Currently, the NCHC cloud-computing site is hosting the BCI data repository and the BCI web portal while the SCCN server is maintaining an archive of legacy BCI data sets.

In the past year, both the PET and SCCN teams have used this pilot system to perform different experiments demonstrating the capability and the potential of pervasive real-world BCI operations. The following subsections describe the two multi-site experiments we have performed.

#### **SYNCHRONOUS BCI DATA STREAMING OVER INTERNET**

The NCTU-UCSD team performed a successful live demonstration of real-time synchronous multi-modal BCI data streaming at a project review meeting of the Cognition and Neuroergonomics Collaborative Technology Alliance (Can-CTA) Program on March 13, 2013. In that intercontinental demonstration, Prof. John Zao was wearing a four-channel wireless *MINDO-4S* EEG headset and a 9-DOF *BodyDyn* motion sensor at the NCTU PET Lab in Hsinchu, Taiwan. Sampled data from both sensors were transmitted simultaneously via Bluetooth to a Samsung Galaxy Note 1 smart phone. The data streams were then sent to a Fog Server at the PET Lab and multicast over the Internet to a Cloud Server at the NCHC, also in Hsinchu, Taiwan, and a desktop computer at UCSD SCCN in San Diego, California. Four-channel EEG data as well as 3D linear acceleration and 3D angular velocity (a total of 10 channels) were displayed at SCCN in synchrony with the live image of Prof. Zao's movements that was streamed through a Google Hangout session. Almost no perceptible delay could be seen between the video images and the EEG/motion waveforms that appeared on the display at SCCN. A video clip attached to this paper shows an excerpt of that demonstration session.

Detailed timing measurements of the end-to-end synchronous transports were made later, in August, during several replays of the demonstration and were analyzed off-line. **Figure 5** shows the time traces of standalone and concurrent transport of the two data streams. **Table 2** lists the formats and sizes of individual messages as well as the statistics of the timing measurements of the transports. The significant differences in the mean values of transport latency were due to the offsets between the system clocks in the mobile phone at NCTU and the desktop computer at UCSD.

These time traces show that no message was lost because the transport was conducted using MQTT messaging over TCP sessions. The small standard deviations of transport latency imply that few retransmissions were needed to provide reliable delivery. The latency of the EEG sessions fluctuated slightly more than that of the motion sessions; this suggests that a few more retransmissions were needed to deliver the longer EEG messages. The average transmission intervals (237–243 ms) in both standalone and concurrent transport sessions match closely the expected quarter-second (250 ms) emission interval of the data messages. In addition, the average reception intervals match closely the average transmission intervals. These matching figures indicate smooth transmissions that were free of hop-by-hop traffic congestion and end-to-end message queuing. This performance may be partially due to the fact that the experiment was carried out between two university campuses equipped with gigabit Ethernet. Larger fluctuations in transmission/reception intervals as well as transport latency should be expected when the data streaming is conducted over home networks.
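The reasoning above, that a constant clock offset distorts the mean latency but leaves its standard deviation and the message intervals untouched, can be checked numerically. The following is a minimal sketch using made-up timestamps; the function names, the 90 ms delay and the 400 ms offset are illustrative assumptions, not measurements from Table 2.

```python
from statistics import mean, pstdev

def intervals(timestamps):
    """Differences between consecutive timestamps (ms)."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def transport_stats(sent_ms, received_ms):
    """Summarize one transport session from per-message send/receive times."""
    latency = [r - s for s, r in zip(sent_ms, received_ms)]
    return {
        "mean_tx_interval": mean(intervals(sent_ms)),
        "mean_rx_interval": mean(intervals(received_ms)),
        "mean_latency": mean(latency),
        "std_latency": pstdev(latency),
    }

# Hypothetical session: one message every 250 ms, true network delay near 90 ms.
sent = [0, 250, 500, 750, 1000]
true_delay = [90, 92, 88, 91, 89]
clock_offset = 400  # receiver clock assumed to run 400 ms ahead (illustrative)

received_true = [s + d for s, d in zip(sent, true_delay)]
received_skewed = [r + clock_offset for r in received_true]

no_offset = transport_stats(sent, received_true)
stats = transport_stats(sent, received_skewed)

# The offset shifts the mean latency but not its spread or the intervals.
print(stats["mean_latency"] - no_offset["mean_latency"])
print(stats["std_latency"], no_offset["std_latency"])
print(stats["mean_rx_interval"], no_offset["mean_rx_interval"])
```

Under these assumptions only the mean latency carries the clock offset, which is why the interval and standard-deviation figures remain usable even when the two clocks are not synchronized.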

#### **Table 1 | Hardware and software components for the pervasive on-line EEG-BCI pilot system.**

Both the live demonstration and the performance statistics indicate that it is entirely possible to send BCI data streams reliably in real time to multiple destinations over the Internet. Thus, this experiment affirms the feasibility of Internet-based on-line EEG-BCI operation. Nonetheless, we must point out a potential *scalability* issue that may arise during multicasting of multi-channel EEG data streams. As the EEG channel numbers and sampling rates increase, the data rates of the multicasting sessions may quickly exceed the up-link bandwidth (approximately 1 Mbps) of home networks. To avoid causing network congestion in these cases, data compression techniques such as *compressive sampling* (Candes and Wakin, 2008) must be employed to reduce the message size. In fact, as a general principle, we should avoid sending raw data over the Internet in real time because such a practice not only consumes more network bandwidth but also incurs longer transport latency. With ubiquitous Fog Servers available, we should perform most real-time signal processing and brain state prediction on the Fog Servers and send only the extracted signal features, the brain states and the meta-data over the Internet in real time. This operation principle was demonstrated in the following experiment.
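The scalability concern can be made concrete with a back-of-the-envelope calculation. The sketch below assumes 24-bit samples, no protocol overhead, and one copy of the stream per destination on the up-link; the channel counts, sampling rates and feature-message layout are hypothetical, and only the 1 Mbps up-link figure and the quarter-second message interval come from the text.

```python
UPLINK_BPS = 1_000_000  # approximate home up-link bandwidth quoted in the text

def raw_stream_bps(channels, sampling_rate_hz, bits_per_sample=24):
    """Raw payload rate of one EEG stream (bits per second), no overhead."""
    return channels * sampling_rate_hz * bits_per_sample

def uplink_load_bps(channels, sampling_rate_hz, destinations, bits_per_sample=24):
    """Up-link load when one copy of the stream is sent to each destination."""
    return raw_stream_bps(channels, sampling_rate_hz, bits_per_sample) * destinations

# Hypothetical low-density headset: 4 channels at 128 Hz, two destinations.
small = uplink_load_bps(channels=4, sampling_rate_hz=128, destinations=2)

# Hypothetical high-density cap: 64 channels at 512 Hz, two destinations.
large = uplink_load_bps(channels=64, sampling_rate_hz=512, destinations=2)

# Hypothetical feature message: 4-byte player id + 4-byte level, 4 per second.
feature_bps = (4 + 4) * 8 * 4

for label, bps in [("4 ch raw", small), ("64 ch raw", large), ("features", feature_bps)]:
    verdict = "exceeds" if bps > UPLINK_BPS else "fits within"
    print(f"{label}: {bps} bps {verdict} the up-link")
```

With these assumed numbers the 64-channel raw stream overshoots the up-link by roughly half again, while the feature-only message is several orders of magnitude smaller, which is the motivation for processing on the Fog Server.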

#### **MULTI-PLAYER ON-LINE INTERACTIVE BCI GAME**

In order to optimize communication and computation efficiency, users of our pervasive EEG-BCI system should always use a nearby Fog Server to perform real-time signal processing and brain state prediction rather than performing the computation on the frontend sensors or mobile phones or sending the raw data over the Internet to the Cloud Servers. To demonstrate this operation principle, we developed the *EEG Tractor Beam*, a multiplayer on-line EEG-BCI game, and launched its first game session in September 2013. Since then, this game has been played on several public occasions with players from both the US and Taiwan.

**Figure 6** illustrates the system architecture for this game, which is also a typical configuration for multi-site interactive BCI operation. Each user has a typical BCI frontend (shown as a sky blue box) consisting of an EEG headset and a mobile phone that are connected to a local Fog Server (a navy blue box). The Fog Servers associated with different users may exchange information with one another and with a Cloud Server (the green box). The game ran as a mobile application on each user's mobile phone, which served mainly as a graphical user interface (GUI). Raw EEG data streams were sent to the Fog Server either directly or through the mobile phones. Real-time signal processing and prediction were performed on the Fog Servers, each of which ran a BCI signal processing pipeline. The brain states of individual users were published by the Fog Servers and sent to the game running on each mobile phone, which subscribed to the brain state information.
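The publish/subscribe flow described above can be sketched with a toy in-memory broker. This is only an illustration of the pattern, not the messaging stack used in the game; the class names, the topic string and the message fields are all assumptions.

```python
from collections import defaultdict

class ToyBroker:
    """In-memory stand-in for a publish/subscribe broker (not a real protocol)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every callback registered for this topic.
        for callback in self._subscribers[topic]:
            callback(message)

class GameClient:
    """Mobile-phone side: it only displays the brain states it receives."""

    def __init__(self, name):
        self.name = name
        self.levels = {}

    def on_brain_state(self, message):
        self.levels[message["player"]] = message["concentration"]

TOPIC = "game/brain_state"  # hypothetical topic name

broker = ToyBroker()
phones = [GameClient("phone_A"), GameClient("phone_B")]
for phone in phones:
    broker.subscribe(TOPIC, phone.on_brain_state)

# Fog-server side: raw EEG is processed locally, and only a short message
# with the player's identifier and estimated level is published.
broker.publish(TOPIC, {"player": "A", "concentration": 0.45})
broker.publish(TOPIC, {"player": "B", "concentration": -0.19})

print(phones[0].levels)
```

Because every phone subscribes to the same topic, each one ends up with the same view of all players' levels without ever handling raw EEG.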

On its display, the multiplayer game shows all the players on a ring surrounding a target object. Each player can exert an attractive force onto the target in proportion to her level of concentration, which was estimated using the following formula (Eoh et al., 2005; Jap et al., 2009):

$$\mathbb{C} \triangleq \ln \left( \frac{\text{PSD}_{\beta}}{\text{PSD}_{\alpha} + \text{PSD}_{\theta}} \right),$$

where the PSDs are the average power spectral densities in the α, β and θ bands of the player. In order to win the game, a player should try to pull the target toward herself while depriving other players of their chances to grab the target. The game implements a "winner-take-all" strategy: a player is awarded points at a rate proportional to the percentage of the total attractive force she exerts on the target, which is calculated by dividing that player's concentration level by the sum of the levels among all the players. However, a player can only start to accumulate points if she contributes at least her fair share to the total sum. A tractor beam appears between that player and the target when her concentration level passes that threshold; that is when she starts to accumulate points. **Figure 7** shows a picture of four players engaging in the game across the Pacific Ocean.
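The concentration formula and the scoring rule can be sketched as follows. The text does not say how negative concentration levels enter the force calculation, so this sketch assumes they exert no force (clipped to zero) and takes the "fair share" to be 1/N of the total for N players; both choices, and all function names, are assumptions.

```python
import math

def concentration_level(psd_theta, psd_alpha, psd_beta):
    """C = ln(PSD_beta / (PSD_alpha + PSD_theta)), as in the formula above."""
    return math.log(psd_beta / (psd_alpha + psd_theta))

def force_shares(levels):
    """Fraction of the total attractive force exerted by each player.

    Assumption: a negative level exerts no force (it is clipped to zero).
    """
    forces = {player: max(level, 0.0) for player, level in levels.items()}
    total = sum(forces.values())
    if total == 0:
        return {player: 0.0 for player in forces}
    return {player: force / total for player, force in forces.items()}

def scoring_players(levels):
    """Players whose share reaches the assumed fair share of 1/N."""
    shares = force_shares(levels)
    fair_share = 1.0 / len(levels)
    return {player for player, share in shares.items() if share >= fair_share}

# Hypothetical levels for four players.
levels = {"A": 0.6, "B": 0.3, "C": 0.1, "D": -0.2}
shares = force_shares(levels)
scoring = scoring_players(levels)

print(shares)
print(sorted(scoring))
```

With these illustrative levels the fair share is one quarter, so only players A and B would see a tractor beam and accumulate points.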

**FIGURE 4 | Pilot system architecture of (A) Cloud Computing site at NCHC, Taiwan and (B) Fog Computing sites at NCTU PET Lab, Taiwan and UCSD SCCN, San Diego, California.**

**FIGURE 5 | Time traces of end-to-end synchronous transport of motion and EEG data streams. (A,B)** show the time traces of motion and EEG data transports in two separate sessions. **(C,D)** show the traces of both transports in the same session. The blue lines mark the traces of transmission time while the red lines mark those of reception time. Their slopes give the average transmission and reception intervals of individual messages.

#### **Table 2 | Performance measurements of synchronous BCI data streaming over Internet.**

*<sup>a</sup>The average or mean values of transport latency were contaminated by the offset between the system clocks in the mobile phone at NCTU and the desktop computer at UCSD.*

The necessary EEG signal processing and the estimation of *concentration level* were performed by the BCILAB/SIFT pipeline (Delorme et al., 2011) running on MATLAB R2013a (Mathworks, 2013) installed on the Fog Servers. **Figure 8** displays the typical processing stages of this brain state estimation pipeline. Its MATLAB code is included in the Appendix for reference. The EEG preprocessing stage aims at cleaning up the raw EEG signals, which were heavily contaminated by artifacts due to eye blinks and head movements. The heavy computation of signal correlation and artifact subspace reconstruction (Mullen et al., 2012) can only be performed on the Fog Servers; these algorithms would quickly drain the batteries in the sensors and the mobile phones. Because players' concentration levels were estimated as ratios between power spectral densities in different EEG frequency bands, multitaper spectral estimation, power density calibration<sup>1</sup> and averaging were done before the concentration levels were computed. Please note that although we chose to implement the BCI processing pipeline using BCILAB and SIFT, other real-time signal processing software can be used to perform the computation.
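The calibration and averaging steps can be illustrated with a small sketch. It follows the footnote's description (each spectral estimate is multiplied by its frequency) and then averages within each band; the band edges are conventional values assumed here, and the input spectrum is synthetic rather than a multitaper estimate.

```python
# Assumed conventional band edges in Hz (the pipeline's own edges are not stated here).
BANDS_HZ = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def band_averages(freqs_hz, psd, bands=BANDS_HZ):
    """Average frequency-weighted PSD (f * PSD) over each band [lo, hi)."""
    averages = {}
    for name, (lo, hi) in bands.items():
        weighted = [f * p for f, p in zip(freqs_hz, psd) if lo <= f < hi]
        if not weighted:
            raise ValueError(f"no spectral bins fall inside the {name} band")
        averages[name] = sum(weighted) / len(weighted)
    return averages

# Synthetic spectrum that declines as 1/f, sampled in 1 Hz steps from 1 to 40 Hz.
freqs = [float(f) for f in range(1, 41)]
psd = [10.0 / f for f in freqs]

avg = band_averages(freqs, psd)
print(avg)
```

For a purely 1/f spectrum the weighted averages come out equal across bands, so any difference that remains between bands after calibration reflects activity beyond the background decline.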

To demonstrate the working of our BCI processing pipeline, **Figure 9** shows two 1-min scatter plots of a player's concentration levels estimated during a 2-min eyes-open relaxation period and an equal-length eyes-open concentration period. The average concentration level during the relaxation period was μ*<sup>R</sup>* = −0.19 < 0, as expected, while the average level during the concentration period was μ*<sup>C</sup>* = +0.45. The difference between these values was statistically significant. The estimated values fluctuated notably during both periods. This was partly due to the wavering of the player's concentration level, but more likely the fluctuations were caused by the remaining artifacts of head movements and muscle tension. These artifacts remain an inevitable component of real-life EEG recording and a challenge to real-world BCI operation. Finally, both plots showed a general downward trend. When the player tried to sustain her concentration, mental fatigue invariably set in after a short while; hence, her EEG power in the beta band tended to decrease gradually relative to the power in the alpha band. On the other hand, when the player tried to relax, it took some time for her to settle into a relaxed state; hence, her alpha power was expected to increase gradually relative to her beta power. In both cases, a gradual decrease in concentration level was expected, especially if the player was untrained in the cognitive task.

In all the gaming sessions, the data rates and transport latencies over the Internet were low since the Fog Servers published short messages containing merely the players' identifiers and concentration levels. Also, the game displays of different players were synchronized because they all used Samsung Galaxy phones with comparable computing power. A small but noticeable display lag may appear if a player uses an old Android phone. This display lag can be eliminated using standard game synchronization protocols.

While *EEG Tractor Beam* is a somewhat frivolous demonstration of the capability of the pervasive on-line EEG-BCI system, it does demonstrate some powerful concepts that may have applications far beyond on-line gaming. Foremost, the system has the ability to acquire and process EEG data in real time from a large number of users all over the world and feed their brain states back to these individuals as well as to any professionals authorized to monitor their cognitive conditions. With distributed Fog and Cloud Servers, our on-line EEG-BCI infrastructure can be scaled up without adding unsustainable traffic load onto the Internet. Hence, it presents a viable way to realize *interactive BCI*. Furthermore, the system has the ability to process, annotate and archive vast amounts of real-world BCI data collected during the BCI sessions. Unlike the existing EEG databases, which depend on researchers to donate their data sets, this pervasive EEG-BCI infrastructure collects data sets, with users' approval, as an essential part of its normal operation. This intrinsic data collection provides a natural way to implement *"big data" BCI* as well as *adaptive BCI* in the near future. In the following section, we discuss the potential value and impact of this pervasive on-line system for real-world BCI applications.

#### **DISCUSSIONS**

In this section, we examine the operation scenarios supported by the pervasive on-line EEG-BCI system as well as the costs and benefits of its potential use. The discussion begins with a comparison with existing BCI systems and on-line physiological data repositories and concludes with highlights of future development.

#### **COMPARISON WITH CURRENT PRACTICE**

Currently, all BCI systems operate in a *standalone* fashion and need to be *personalized* before use. Whether they are used to control patients' wheelchairs, conduct neuromarketing or provide biofeedback, these systems require their users to go through tedious training processes in order to adapt them for personal use. Moreover, they often require the training process to be repeated once the use situation changes. Our on-line EEG-BCI system, however, can download an initial brain state prediction model from the Cloud Server based on the real-world situation in which it operates, and then refine the model progressively using the data gathered from its users (section Adaptive BCI). These *adaptive*, *interactive* and *big data processing* capabilities distinguish our system from the existing ones.

<sup>1</sup>The multitaper estimates of EEG power spectral density were multiplied by their sampled frequencies in order to compensate for the natural decline of EEG spectral power, which is inversely proportional to frequency.

**FIGURE 7 | An EEG Tractor Beam game session with four people playing over the Internet: two players at SCCN in San Diego, USA are shown in the foreground while two other players at NCTU in Hsinchu, Taiwan appear in the monitor display.** The inset at the lower right corner shows a captured view of the game display.

The biomedical engineering community has been exploiting Cloud Computing and Big Data Mining technologies for years. In the past decade, several on-line physiological data repositories including BrainMap (Research Imaging Institute, 2013), PhysioNet (Goldberger et al., 2000), and HeadIT (Swartz Center for Computational Neuroscience, 2013) have been put on line. Among them, PhysioNet has earned the best reputation by offering a wide range of data banking and analysis services. However, none of these data repositories is ready to accept real-time streaming data.

Furthermore, as demonstrated in the *EEG Tractor Beam* gaming sessions, our on-line EEG-BCI system also has the ability to support real-time multi-user collaborative/competitive neurofeedback. This unique ability may lead to many novel applications in cognitive collaboration, e-learning as well as on-line gaming and mind training.

#### **OPERATION SCENARIOS**

As shown in **Figure 10**, the pervasive on-line EEG-BCI system can operate in three different scenarios: Big Data BCI, Interactive (or Closed-Loop) BCI and Adaptive BCI. Each scenario represents an incremental enhancement of system capability.

#### *Big data BCI*

In this first operation scenario, the pervasive EEG-BCI system is endowed with the capability to collect multi-modal data along with relevant environmental information from real-world BCI applications anytime, anywhere. This capability not only enables BCI applications to identify common EEG correlates among different users while they perform the same tasks or are exposed to similar stimuli; it also provides a pragmatic way to gather vast amounts of BCI data from real-life situations for cross-sectional and longitudinal studies. A linked BCI data repository and a RESTful web service API have been created for maintaining the data collection. Human clients would use the Web Portal (http://bci.pet.cs.nctu.edu.tw/databank) to access and query the data. Machine or application clients would use the RESTful web service API (http://bci.pet.cs.nctu.edu.tw/api) to perform specific data operations.

Currently, Big Data BCI is the only fully functioning scenario of our pilot system. All our experiments archived their data sets in the linked BCI data repository.

#### *Interactive BCI*

People's brain states and their EEG characteristics can be influenced acutely by changes in environmental conditions. Various visual, auditory, heat and haptic stimuli have long been used to evoke neural responses or modulate users' brain states. Currently, all these stimuli are static in nature as they lack the ability to adapt to users' changing brain states. Hence, the stimuli may become ineffective as habituation dampens users' neural responses or, in the worst cases, cause harmful side effects.

Since the on-line EEG-BCI system can perform real-time brain state prediction on the Fog Servers, we can introduce feedback control loops between the stimuli and the users' EEG responses. This *interactive* operation scenario can improve the accuracy of exogenous brain state prediction and the effectiveness of brain state modulation by applying the most powerful stimuli based on closed-loop feedback control.
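A closed feedback loop of this kind can be sketched with a simple proportional controller. This is only an illustration under strong assumptions: the linear user-response model, the gain, the target level and the intensity limits are all hypothetical, and a real system would replace the toy model with the brain state predicted on the Fog Server.

```python
def next_intensity(intensity, target, measured, gain=0.5, lo=0.0, hi=1.0):
    """One proportional-control step on stimulus intensity, clamped to [lo, hi]."""
    error = target - measured
    return min(hi, max(lo, intensity + gain * error))

def toy_user_response(intensity):
    """Hypothetical user model: the brain-state level rises linearly with stimulus."""
    return intensity - 0.2

target_level = 0.4   # desired brain-state level (illustrative)
intensity = 0.0      # start with no stimulus
history = []
for _ in range(20):
    # In a real loop this value would be the Fog Server's brain state prediction.
    measured = toy_user_response(intensity)
    intensity = next_intensity(intensity, target_level, measured)
    history.append(intensity)

print(round(intensity, 4), round(toy_user_response(intensity), 4))
```

In this toy setting the stimulus settles near the intensity at which the modeled response meets the target, instead of staying at a fixed level regardless of the user's state.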

#### *Adaptive BCI*

It is well known that people's EEG responses to the same tasks (or stimuli) often differ significantly from one another and can change drastically over time. Thus, the prediction models employed by our BCI system must adapt to individual users' EEG responses and adjust their parameters continuously to track the changes in their characteristics. Usually, model adaptation and refinement are conducted using a large amount of training data. In order to reduce the amount of training data needed from individual users, we are exploring the feasibility of adapting the prediction model by leveraging the archived data collected from other users plus a small amount of training data acquired from the new user.

In our system, the adaptive BCI operation is performed through the cooperation between a Fog Server and its associated Cloud Server. The Fog Server will upload the annotated BCI data along with the predicted brain states, the prediction model specification and the confidence level on its prediction onto the Cloud Server. Then, the Cloud Server will issue semantic queries to find similar EEG data fragments among the archived BCI data sets and then apply *transfer learning* techniques on both the acquired and the archived data sets. Through repetitive trials, this *progressive refinement* process will likely produce a prediction model better-adapted to the BCI activity of that user in a specific real-world situation.

#### **PRACTICAL ISSUES**

Users are rightfully concerned about several practical issues such as *cost*, *availability*, *security* and *privacy* that may arise from the daily use of this elaborate infrastructure. We hope the following facts may allay some of these concerns.

First, the technologies we employ are already being used to provide Internet services today. Cloud Servers have been running Google search and Yahoo web portals all along. Television set-top boxes and game consoles that can function as Fog Servers are popular electronic appliances. Almost without exception, mobile applications are installed on every smartphone these days. From this perspective, pervasive EEG-BCI is a natural outcome of the on-going trend to foster smart living using state-of-the-art information and communication technologies. The incremental costs of using pervasive EEG-BCI will be quite affordable. A user only needs to purchase a wearable EEG headset and download a mobile application. The computing engine will be automatically downloaded onto her "fog server" once the user signs a service agreement. It is quite possible that pervasive EEG-BCI will become a fashion very much like the use of fitness gadgets these days.

Second, pervasive EEG-BCI will likely be offered by a supply chain of vendors that can bundle this service with Internet connectivity, content and computing. The huge infrastructure deployment and maintenance costs must be amortized among these service providers. Furthermore, the BCI data repository and the progressive model refinement technologies will take time to develop. Hence, this service must go through a maturing process.

Third, information security and personal privacy should indeed be users' common concerns. However, they must be dealt with as two separate issues. The basic guarantees of user anonymity, secure exchange, safe storage and limited access can be provided through the employment of necessary communication and information security measures. These mechanisms are discussed in the following section. However, many users would be terrified by the notion that "the big brother can know not only where I click but also what I *think* when I browse the web!" Protection of personal privacy in that sense must be offered not merely through technical means but by developing and enforcing public policies according to social norms. Surprisingly, the protection of personal cognitive information is no more difficult than the protection of personal behavioral data collected by, say, Google, and is much easier than preventing information leakage via social networking because, unlike individuals, reputable service providers are much more serious and diligent in guarding their clients' personal information.

#### **FUTURE DEVELOPMENT**

The pervasive EEG-BCI pilot system is merely a prototype. We plan to develop it into a field-deployable system within the coming year. Specifically, we will further develop its semantic data model and provide multiple ways to access streaming and archived data via multiple Internet protocols. Moreover, the following capabilities will be added to the system.

#### *Cloud based progressive model refinement*

Fog Servers will be able to perform adaptive brain state prediction with the aid of *progressive model refinement* carried out by the Cloud Servers. The process begins with *automatic annotation* of EEG data segments with their corresponding brain states according to the outcome of the current prediction process. The meta-data annotation will be sent to the Cloud Servers so that cloud-based semantic search can find a large number of data segments that match certain personal, environmental and event specifications. These data segments will then be fed into machine learning algorithms to calibrate the prediction model. The calibrated model will be pushed back to the Fog Servers and used to perform the next round of brain state prediction and data annotation. This iterative process will continue to improve the accuracy of prediction and enable the system to track the non-stationary brain dynamics. The Predictive Model Markup Language (PMML v.3.2, 2008; Guazzelli et al., 2009) will be adopted as the interoperable model specification and encoding format.
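The annotate, search, calibrate and push-back cycle can be sketched with a deliberately tiny model. Here the "prediction model" is a single threshold on the concentration level, the "semantic search" is a plain metadata filter, and calibration takes the midpoint between class means; these are stand-ins chosen for illustration, not the planned machine learning algorithms, and all names and data are hypothetical.

```python
from statistics import mean

def annotate(levels, threshold):
    """Fog side: label each new segment using the current model's prediction."""
    return [
        {"level": x, "state": "focused" if x > threshold else "relaxed"}
        for x in levels
    ]

def semantic_search(archive, task):
    """Cloud side: toy stand-in for a semantic query (simple metadata filter)."""
    return [segment for segment in archive if segment["task"] == task]

def calibrate(segments, fallback):
    """Cloud side: new threshold = midpoint between the two class means."""
    focused = [s["level"] for s in segments if s["state"] == "focused"]
    relaxed = [s["level"] for s in segments if s["state"] == "relaxed"]
    if not focused or not relaxed:
        return fallback  # keep the old model if one class is missing
    return (mean(focused) + mean(relaxed)) / 2

# Hypothetical archive of annotated segments from other users.
archive = [
    {"task": "game", "level": 0.5, "state": "focused"},
    {"task": "game", "level": -0.3, "state": "relaxed"},
    {"task": "driving", "level": 1.5, "state": "focused"},
]
user_levels = [0.9, 0.8, 0.1, 0.0]  # this user's levels run higher than the archive's
threshold = 0.0                      # initial model downloaded from the cloud

for round_number in range(5):
    labeled = annotate(user_levels, threshold)            # Fog: predict and annotate
    pooled = semantic_search(archive, "game") + labeled   # Cloud: find matching data
    threshold = calibrate(pooled, fallback=threshold)     # Cloud: recalibrate, push back
    print(round_number, round(threshold, 4))
```

In this toy run the threshold moves upward over the first two rounds and then stabilizes, showing how repeated rounds can shift an initial generic model toward one user's data.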

#### *Information security and user privacy protection*

We are developing a pervasive machine-to-machine communication security infrastructure based on the Internet standards: Host Identity Protocols (HIP) (IETF, 2014) and Host Identity Indirection Infrastructure (Hi3) (Nikander et al., 2004). HIP has become an increasingly popular approach to offer secure communication among the Internet of Things (Kuptsov et al., 2012). In addition, we developed a multi-domain attribute-enriched role-based access control architecture (Zao et al., 2014). Both of these technologies will be used to offer the essential communication and information security protection.

#### **CONCLUSION**

The pervasive on-line EEG-BCI system we built is the culmination of the development trends of two state-of-the-art information technologies: *Internet of Things* and *Cloud Computing*. As such, our pilot system can be regarded as a pioneering prototype of a new generation of real-world BCI systems. As mentioned in section Operation Scenarios, these on-line systems will not merely connect the existing standalone EEG-BCI devices into a global distributed system; more importantly, they are equipped to support futuristic operations including intrinsic real-world data collection, massive semantic-based data mining, progressive EEG model refinement and stimuli-response adaptation. In academic and clinical research, these pervasive on-line systems will accumulate vast amounts of EEG-BCI data and thus enable cross-sectional and longitudinal studies of unprecedented scale. Inter-subject EEG correlates of specific tasks and stimuli may be found through these studies. In the commercial world, numerous consumer applications will become feasible as wearable EEG-BCI devices become able to track people's brain states accurately and robustly in real time.

#### **ACKNOWLEDGMENTS**

This system development project is a team effort. The Pervasive Embedded Technology (PET) Laboratory at the National Chiao Tung University (NCTU) in Taiwan, the SCCN at the University of California, San Diego (UCSD) in the United States of America and the NCHC sponsored by the Taiwan National Research Council have all contributed to the development of this pilot system. In addition, Dr. Ching-Teng Hsiao of the Research Center for Information Technology Innovation (CITI) at the Academia Sinica of Taiwan has served as a technology consultant throughout this project. Among the authors: Tchin-Tze Gan, Chun-Kai You, and Chien Yu of PET as well as Yu-Te Wang of SCCN were responsible for the development of the Fog and Cloud computing infrastructure; Sergio José Rodríguez Méndez (PET), Cheng-En Chung (PET), and Ching-Teng Hsiao (CITI) created the Linked Data superstructure and developed the mobile applications to perform the semantic data queries; Tim Mullen, Christian Kothe, and Yu-Te Wang, all of SCCN, developed the BCILAB and LSL toolboxes and implemented the EEG signal processing pipelines; San-Liang Chu and his technical team at NCHC set up the cloud servers for this project. Finally, John K. Zao, the Director of the PET Lab, was the innovator and designer of this infrastructure; Tzyy-Ping Jung, the Associate Director of SCCN, first proposed the approach of pervasive adaptive BCI and mobilized this effort; Ce-Kuen Shieh, the Director of NCHC, endorsed and promoted the inter-collegiate deployment of this pilot system.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00370/abstract

#### **REFERENCES**


DBpedia. (2014). *wiki.dbpedia.org*: *About*. Available online at: http://dbpedia.org/


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 08 December 2013; accepted: 13 May 2014; published online: 03 June 2014. Citation: Zao JK, Gan T-T, You C-K, Chung C-E, Wang Y-T, Rodríguez Méndez SJ, Mullen T, Yu C, Kothe C, Hsiao C-T, Chu S-L, Shieh C-K and Jung T-P (2014) Pervasive brain monitoring and data sharing based on multi-tier distributed computing and linked data technology. Front. Hum. Neurosci. 8:370. doi: 10.3389/fnhum.2014.00370*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Zao, Gan, You, Chung, Wang, Rodríguez Méndez, Mullen, Yu, Kothe, Hsiao, Chu, Shieh and Jung. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Neuroergonomics: a review of applications to physical and cognitive work

#### *Ranjana K. Mehta<sup>1</sup>\* and Raja Parasuraman<sup>2</sup>*

<sup>1</sup> Department of Environmental and Occupational Health, School of Rural Public Health, Texas A&M University, College Station, TX, USA

<sup>2</sup> Center of Excellence in Neuroergonomics, Technology, and Cognition, George Mason University, Fairfax, VA, USA

#### *Edited by:*

Klaus Gramann, Berlin Institute of Technology, Germany

#### *Reviewed by:*

Matthew Rizzo, The University of Iowa College of Medicine, USA

Andy McKinley, Applied Neuroscience Branch of the 711 Human Performance Wing, USA

#### *\*Correspondence:*

Ranjana K. Mehta, Department of Environmental and Occupational Health, School of Rural Public Health, Texas A&M University, Room 106, College Station, TX 77843-1266, USA

e-mail: rmehta@tamu.edu

Neuroergonomics is an emerging science that is defined as the study of the human brain in relation to performance at work and in everyday settings. This paper provides a critical review of the neuroergonomic approach to evaluating physical and cognitive work, particularly in mobile settings. Neuroergonomics research employing mobile and immobile brain imaging techniques are discussed in the following areas of physical and cognitive work: (1) physical work parameters; (2) physical fatigue; (3) vigilance and mental fatigue; (4) training and neuroadaptive systems; and (5) assessment of concurrent physical and cognitive work. Finally, the integration of brain and body measurements in investigating workload and fatigue, in the context of mobile brain/body imaging ("MoBI"), is discussed.

**Keywords: physical work parameters, physical fatigue, mental fatigue, vigilance, training, neuroadaptive systems**

#### **INTRODUCTION**

Neuroergonomics is defined as the study of the human brain in relation to performance at work and in everyday settings (Parasuraman, 2003; Parasuraman and Rizzo, 2007). It integrates theories and principles from ergonomics, neuroscience, and human factors to provide valuable insights on brain function and behavior as encountered in natural settings (Parasuraman, 2011). In this paper, we review neuroimaging techniques applicable to neuroergonomics that have expanded our understanding of the neural correlates of operators' physical and cognitive capabilities and limitations when they interact with work systems. Moreover, while experimental laboratory studies have advanced our knowledge of brain functions during simulated work, it is important to assess operator performance in naturalistic work settings. Understanding brain function in such dynamic and mobile work settings requires the use of ambulatory neuroimaging techniques (Makeig et al., 2009).

There are two main reasons why ambulatory neuroimaging techniques need to be developed for ergonomics research and practice. First, by definition, physical ergonomics requires that participants move their limbs or bodies while carrying out some physical task. Second, while cognitive ergonomics studies can be conducted in immobile participants, research on *embodied cognition* has shown that cognitive processing when moving and interacting in the physical world may have unique characteristics that can only be captured with mobile neuroimaging (Clark, 1998; Parasuraman, 2003; Raz et al., 2005). This review discusses the use of neuroergonomics methods to evaluate brain responses in mobile work environments. We discuss the suitability and feasibility of mobile and immobile brain imaging techniques in the context of physical neuroergonomics, cognitive neuroergonomics, and neuroergonomic assessment of concurrent physical and mental work. Finally, we consider the requirements and utility of combined brain and body measurements in investigating workload and fatigue for neuroergonomic investigations.

#### **NEUROERGONOMIC METHODS**

Neuroergonomic studies rely heavily on existing neuroimaging techniques to understand brain structures, mechanisms, and functions during work. Neuroimaging techniques applicable to neuroergonomics fall into two general categories: those that are direct indicators of neuronal activity in response to stimuli, such as electroencephalography (EEG) and event-related potentials (ERPs), and those that provide indirect metabolic indicators of neuronal activity, such as functional magnetic resonance imaging (fMRI), positron emission tomography (PET), and functional near infrared spectroscopy (fNIRS). EEG represents summated post-synaptic electrical activity of neurons firing in response to motor/cognitive stimuli as recorded on the scalp, and thus offers excellent temporal resolution of electromagnetic brain changes, on the order of milliseconds. In comparison, fMRI and PET techniques, which provide information on cerebral blood flow in response to neuronal activity, have low temporal resolution (on the order of about 10 s) but offer excellent spatial resolution (1 cm or better) and, unlike EEG, provide valuable information on the location of the neural signal generated.

Since neuroergonomics distinguishes itself from traditional neuroscience in that it evaluates brain functions in response to work, it is important that neuroergonomic methods provide the flexibility to assess brain function in naturalistic work settings. Some neuroimaging techniques are better suited than others to assessing brain functions in mobile work environments. The pros and cons of neuroergonomic methods are discussed in reference to three criteria: (1) temporal resolution, (2) spatial resolution, and (3) degree of immobility. **Figure 1** illustrates how these neuroimaging techniques compare against

"fnhum-07-00889" — 2013/12/19 — 20:28 — page 1 — #1

each other based on the three criteria. In addition, **Table 1** lists these methods and their major characteristics, such as portability and cost, along with spatial and temporal resolution. In this section, we provide a brief review of the various methods that have been used in neuroergonomic evaluations of human work, emphasizing measures of brain function and applicability in mobile experimental/field settings.

Electroencephalography signals are the spatial summation of current density induced by synchronized post-synaptic potentials occurring in large clusters of neurons, measured at the scalp (Pizzagalli, 2007). The EEG is recorded as differences in voltage between active electrodes at different positions on the scalp, such as over the frontal, parietal, temporal, and occipital lobes of the brain according to the International 10–20 System, and a reference electrode, typically placed on the ear. EEG signals comprise different frequency bands, each associated with various cognitive and physical states. Spectral analyses of EEG signals can be conducted to assess power in different frequency bands: delta (0.5–3 Hz), theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz), and gamma (40–50 Hz). Another commonly computed EEG-derived spectral metric, which relates brain activity to muscular output (i.e., body), is corticomuscular coherence (CMC). CMC reflects "communications" between the brain and muscle and is determined as the coherence between sensorimotor cortex activation obtained from EEG and muscular activation as measured by electromyography (EMG) during motor activities (Halliday et al., 1995).
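To make the spectral analysis described above concrete, the following is a minimal sketch of band-power estimation using only the Python standard library. It assumes a single-channel signal sampled at a known rate and uses a plain discrete Fourier transform with arbitrary (unnormalized) power scaling; the function names `power_spectrum` and `band_power`, the synthetic test signal, and the rule that a shared band edge (e.g., 8 Hz) is assigned to the higher band are illustrative assumptions, not part of any method described in the studies cited here.

```python
import cmath
import math

# Frequency bands (Hz) as listed in the text.
BANDS = {
    "delta": (0.5, 3.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (40.0, 50.0),
}


def power_spectrum(signal, fs):
    """Return (freqs, power) from a one-sided DFT of a real-valued signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)  # arbitrary scaling
    return freqs, power


def band_power(signal, fs, bands=BANDS):
    """Sum spectral power inside each band (lower edge inclusive, upper exclusive)."""
    freqs, power = power_spectrum(signal, fs)
    return {
        name: sum(p for f, p in zip(freqs, power) if lo <= f < hi)
        for name, (lo, hi) in bands.items()
    }


# Synthetic example (hypothetical data): a strong 10 Hz rhythm plus a weak 20 Hz rhythm.
fs = 128
eeg = [
    2.0 * math.sin(2 * math.pi * 10 * t / fs)
    + 0.5 * math.sin(2 * math.pi * 20 * t / fs)
    for t in range(2 * fs)
]
powers = band_power(eeg, fs)
print(max(powers, key=powers.get))  # the alpha band dominates for this signal
```

In practice, windowed estimators such as Welch's method are usually preferred over a raw DFT because they reduce the variance of the estimate; the direct form is used here only to keep the sketch self-contained.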

Electroencephalography-derived ERPs represent the brain's neural response to specific sensory, motor, and cognitive events. ERPs represent the outcome of signal averaging of EEG epochs time-locked to a particular stimulus or response event. To evaluate mental workload or examine human error (Fedota and Parasuraman, 2010), ERP waveforms are examined for changes in the amplitude and latency of different ERP *components*, typically defined as positive or negative peak activity (such as the P3 and N1 components) or slowly rising activity such as the lateralized readiness potential (Luck, 2005). To assess the neural bases of motor activities, motor-related cortical potential (MRCP) ERP components have been studied that are characterized by a slowly rising negative potential, called *Bereitschaftspotential* (BP) or readiness potential, which is followed by a sharply rising negative potential, known as negative slope. As the onset of MRCP occurs prior to the onset of the motor activity, MRCP is considered to indicate premotor activity, which involves specific brain regions that prepare for a desired motor behavior (Kornhuber and Deecke, 1965).
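The signal-averaging step described above can be sketched as follows. This is a minimal illustration under stated assumptions: a single-channel recording stored as a list of samples, event times given as sample indices, and a pre-event baseline subtraction (a common practice that is assumed here rather than taken from the text). The function names `extract_epochs` and `average_erp` are hypothetical.

```python
def extract_epochs(signal, event_samples, pre, post):
    """Cut a window [event - pre, event + post) around each event.

    Events whose window would run past either end of the recording are skipped.
    """
    epochs = []
    for ev in event_samples:
        start, stop = ev - pre, ev + post
        if start >= 0 and stop <= len(signal):
            epochs.append(signal[start:stop])
    return epochs


def average_erp(epochs, pre):
    """Baseline-correct each epoch with its pre-event mean, then average pointwise."""
    if not epochs:
        raise ValueError("no complete epochs to average")
    if pre <= 0:
        raise ValueError("pre must be positive to compute a baseline")
    corrected = []
    for ep in epochs:
        baseline = sum(ep[:pre]) / pre
        corrected.append([x - baseline for x in ep])
    n = len(corrected)
    # zip(*corrected) groups the samples that share the same latency.
    return [sum(samples) / n for samples in zip(*corrected)]
```

Averaging works because activity that is time-locked to the event adds consistently across epochs, whereas activity unrelated to the event tends to cancel as the number of epochs grows.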

Electroencephalography-derived metrics, both spectral and temporal, are relatively unobtrusive for evaluating brain function during naturalistic complex tasks, so they do not interfere with the operator's work performance. The compact size and low cost of EEG, compared to other neuroimaging techniques such as fMRI and PET, make it fairly well suited for use in both laboratory and field conditions. While artifacts attributed to movement, eye blinks, and physiological interference accompany EEG data, several algorithms have been developed to allow for the removal of noise in the EEG signal in real time or during post-processing of the data (Jung et al., 2000). Recent developments in making "field-friendly" EEG systems include "dry" electrode caps, which do not need extensive participant preparation time, as well as wireless systems that do not require the participant to be tethered to cables. These technical developments have enhanced the relevance and value of EEG for mobile applications (Makeig et al., 2009).

Cerebral hemodynamic techniques such as fMRI and PET provide valuable information on source locations of distinct neural activation patterns associated with simple and complex cognitive, motor, and affective functions. While PET uses injected radioactive tracers to measure blood flow in response to stimuli, fMRI exploits the different magnetic characteristics of oxygenated and deoxygenated blood, focusing on the resulting contrast, called the Blood Oxygenation Level Dependent (BOLD) signal (Poldrack et al., 2011). Both fMRI and PET have been fundamental in advancing our knowledge of brain functions and mechanisms during simple, and relatively static, cognitive and motor tasks. By leveraging the high spatial resolution offered by fMRI measurements, reliable techniques for fMRI-EEG integration have been made possible that offer greater spatio-temporal resolution in imaging dynamic brain activity, as well as significant improvement over the conventional fMRI-weighted EEG source imaging techniques (Liu and He, 2008; Yang et al., 2010). At the same time, fMRI and PET present several limitations in studying brain functions, such as the required supine position, which may yield hemodynamic changes different from those in seated or standing positions (Raz et al., 2005), limited mobility, and restrictions on synchronized brain-body measurements (Makeig et al., 2009). Moreover, the increasing need to examine brain activation patterns in complex tasks more representative of natural everyday situations has led researchers to adopt alternative neuroimaging techniques that offer better mobility features.

Functional near infrared spectroscopy is a non-invasive optical technique for measuring cerebral hemodynamics, similar to PET and fMRI but with lower spatial resolution. By utilizing the tight neurovascular coupling between neuronal activity and regional cerebral blood flow (Villringer and Chance, 1997), fNIRS measures regional cerebral hemodynamic changes (i.e., changes in oxy- and deoxy-hemoglobin levels) (Jobsis, 1977). Since oxygenated and deoxygenated blood can be contrasted by their different optical absorption properties, fNIRS detects the levels of these blood parameters in response to neuronal activity. fNIRS is portable and inexpensive, and has been shown to be an effective tool in quantifying cortical activation during static and dynamic motor movements, without causing substantial movement artifact issues (Perrey, 2008). While fNIRS measurements, particularly oxygenated hemoglobin levels, have been shown to be strongly correlated with the fMRI BOLD signals, albeit with a relatively lower signal-to-noise ratio (Strangman et al., 2002; Cui et al., 2011), unlike fMRI and PET its effectiveness in mapping neural activations across closely connected regions or within deep cortical areas is limited due to its relatively lower spatial resolution. Multimodal imaging approaches using both fNIRS and EEG systems have demonstrated that fNIRS is capable of enhancing event-related desynchronization-based EEG measurements significantly (Leamy et al., 2011; Fazli et al., 2012).

While fNIRS enables measurement of oxygenated and deoxygenated hemoglobin levels in cortical regions, transcranial Doppler sonography (TCDS) uses ultrasound to image cerebral blood flow to the brain hemispheres (Aaslid, 1986). TCDS uses an emitter attached to the head to direct ultrasound toward the middle cerebral artery (MCA) within the brain, and a receiver then records the frequency of the sound wave reflected by red blood cells moving through the artery. The magnitude of the change in frequency (the Doppler shift) is directly proportional to the velocity of blood flow within the artery (Duschek and Schandry, 2003). In response to increased task-related neuronal activity, MCA blood flow velocity increases to remove by-products of the metabolic exchange, which is captured using TCDS (Aaslid, 1986). TCDS has become increasingly popular in cognitive neuroergonomic studies of vigilance and mental workload (Warm and Parasuraman, 2007). However, because cerebral blood volume and blood flow velocity are influenced by systemic changes such as heart rate and blood pressure during exercise (Ainslie et al., 2007), TCDS is less popular in assessing task-related neuronal activity in physical neuroergonomic studies of fatigue.
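The proportionality between Doppler shift and flow velocity follows from the standard pulse-echo Doppler relation, Δf = 2 f₀ v cos θ / c, where f₀ is the emitted frequency, θ the angle between the beam and the flow direction, and c the speed of sound in tissue. The sketch below simply inverts this relation. The default values (a 2 MHz emitter, 1540 m/s for the speed of sound in soft tissue, and a zero insonation angle) are typical figures assumed for illustration and are not taken from the studies cited here; the function name is hypothetical.

```python
import math


def blood_flow_velocity(doppler_shift_hz, emitted_hz=2.0e6,
                        angle_deg=0.0, sound_speed=1540.0):
    """Invert the Doppler relation: v = c * shift / (2 * f0 * cos(theta)).

    Defaults are assumed typical values, not parameters from the source.
    Returns velocity in m/s.
    """
    cos_theta = math.cos(math.radians(angle_deg))
    if abs(cos_theta) < 1e-9:
        raise ValueError("beam is perpendicular to the flow; velocity is undefined")
    return sound_speed * doppler_shift_hz / (2.0 * emitted_hz * cos_theta)


# Under these assumptions, a 1600 Hz shift corresponds to roughly 0.62 m/s.
print(round(blood_flow_velocity(1600.0), 3))
```

Because the relation is linear in the shift, doubling the measured Doppler shift doubles the estimated velocity, provided the insonation angle stays fixed.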

In contrast to the excellent temporal resolution offered by EEG techniques (on the order of milliseconds), magnetic resonance imaging (MRI) provides a structural image of the brain and offers excellent spatial visualization of deep internal parts, such as the hippocampus. While MRI provides static images of the brain that are critical in examining structural changes in the brain due to diseases (such as tumors), its application in studying structural changes in the brain over time (i.e., plasticity) has provided important information on learning and training (Huttenlocher, 2002). A relatively newer MRI technique, called diffusion tensor imaging (DTI), uses MRI to target the diffusion of water molecules in the axons that make up white matter in the brain and allows for the computation of *fractional anisotropy* (FA). FA values can range from 0 to 1, where 0 indicates non-directional (isotropic) and 1 indicates perfectly directional (anisotropic) diffusion. Higher FA values are thought to reflect greater integrity of white matter linking different cortical and subcortical regions of the brain. Several recent studies have assessed the effectiveness of cognitive and motor training on white matter integrity using the DTI technique (Draganski et al., 2004; Takeuchi et al., 2010; Strenziok et al., 2014). In general, the MRI technique does not offer any mobility features, but an MRI static image can be overlaid with more dynamic fMRI images (i.e., blood oxygenation) so that areas of activation can be associated with particular brain regions.
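The 0-to-1 range of FA can be seen directly from its standard definition in terms of the three eigenvalues of the diffusion tensor, FA = sqrt(3/2) · sqrt(Σ(λᵢ − λ̄)²) / sqrt(Σλᵢ²), where λ̄ is the mean eigenvalue. The following minimal sketch computes this quantity; the function name and the example eigenvalues are illustrative assumptions, and returning 0 for an all-zero tensor is a convention chosen here.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """Fractional anisotropy from the three diffusion-tensor eigenvalues."""
    mean = (l1 + l2 + l3) / 3.0
    numerator = (l1 - mean) ** 2 + (l2 - mean) ** 2 + (l3 - mean) ** 2
    denominator = l1 ** 2 + l2 ** 2 + l3 ** 2
    if denominator == 0.0:
        return 0.0  # convention assumed here for an all-zero tensor
    return math.sqrt(1.5 * numerator / denominator)


# Equal eigenvalues: diffusion has no preferred direction (isotropic).
print(fractional_anisotropy(1.0, 1.0, 1.0))
# A single non-zero eigenvalue: diffusion along one axis only (fully anisotropic).
print(fractional_anisotropy(1.0, 0.0, 0.0))
```

The two printed cases correspond to the limits described in the text: equal eigenvalues give an FA of 0, and diffusion confined to a single axis gives an FA of 1 (up to floating-point rounding).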

The electromagnetic and hemodynamic neuroimaging techniques discussed thus far are based on sensing brain activity while a human operator is engaged in cognitive or physical work. As such, all such techniques are *correlational*, and it may therefore be difficult to establish *causal* links between brain activity and performance using these methods. Researchers have therefore turned to non-invasive stimulation techniques that modulate brain activity, such as transcranial magnetic stimulation (TMS) and transcranial direct current stimulation (tDCS), in order to establish such causal associations. These techniques allow for temporary inhibition or activation of specific brain regions, thereby allowing researchers to examine the causal role of different brain regions in various cognitive functions (Walsh and Pascual-Leone, 2005). TMS and tDCS can also be used to modulate brain activity so that the performance of a given cognitive or motor task is improved (Coffman et al., 2014). Alternatively, these techniques can also be applied not to enhance performance over baseline, but to reduce or eradicate a normally occurring performance limitation, such as performance decrements that occur in vigilance tasks (Nelson et al., 2014).

Transcranial magnetic stimulation uses a magnetic coil positioned on the participant's scalp over a brain region of interest. Electrical current passed through the coil produces a changing magnetic field perpendicular to the head. This induces current flow in the underlying cortical tissue, sufficient to alter neural firing (Walsh and Pascual-Leone, 2005). The spatial resolution of TMS is relatively high, particularly when the participant's MRI scan is available to guide the TMS coil placement. The temporal resolution is also high, since the TMS pulses can be delivered with millisecond precision. However, due to the equipment setup, TMS does not offer a sufficient degree of mobility needed for neuroergonomic assessment in naturalistic work settings. Whereas TMS uses changing magnetic pulses, tDCS uses a small DC electric current (1 or 2 mA) delivered through electrodes attached to the scalp. A positive polarity (anode) is typically used to stimulate neuronal function and enhance performance, and a negative polarity (cathode) is used to inhibit neuronal activity. Compared to TMS, tDCS has low spatial and temporal resolution, but has the advantage that it is portable and very inexpensive and thus is more likely to be adopted in applied neuroergonomic studies.

#### **PHYSICAL WORK**

Ergonomics began as the science of work to maximize productivity, particularly in physical work environments, but has since expanded to become a scientific discipline concerned with the understanding of the interactions among humans and other elements of a system, in order to optimize human well-being and overall system performance. Physical ergonomics focuses on human physical capabilities and limitations, pertaining to anthropometry, physiology, and biomechanics of the human body, as they relate to physical work (Karwowski et al., 2003). Traditional ergonomic evaluations focus solely on peripheral outcomes, such as force or muscle activity, and disregard the contributions of the brain during work. Physical neuroergonomics is an emerging field of study that focuses on the knowledge of human brain activities in relation to the control and design of physical tasks (Karwowski et al., 2003), by taking into consideration an operator's physical, cognitive, and affective capabilities and limitations. Here we consider how neuroergonomic methods have been employed to evaluate different physical work parameters (such as force production and repetition) and physical fatigue (localized muscle fatigue and whole body fatigue).

#### **PHYSICAL WORK PARAMETERS**

The primary goal of ergonomics is to ensure that work demands are always lower than operator capacity, and the conventional assessment of work demands includes measuring biomechanical and physiological outcomes, such as joint torque, muscle activity, and heart rate, in laboratory and field settings. There has been recent interest in assessing physical work using neuroergonomic methods in controlled laboratory conditions; however, there is a clear lack of neuroergonomic studies assessing physical work in actual field/work settings. Like any new field, physical neuroergonomics research first needs to understand the capabilities, limitations, and considerations of existing neuroimaging techniques in simulated work environments, which can help build the knowledge base necessary to perform research in naturalistic work environments.

Since physical work can involve both static and dynamic work at different intensities, repetitions, and durations, which in turn can affect autonomic responses, different work parameters can influence the type of measurement technique adopted. For example, dynamic or ambulatory tasks, such as walking or lifting, cannot be assessed using fMRI due to mobility constraints. More appropriate neuroimaging methods to evaluate ambulatory physical work are EEG, ERP, and fNIRS. Of these, EEG appears to be the most common neuroimaging technique since it provides excellent temporal resolution. Effective artifact removal techniques are available that allow for its use in evaluating dynamic tasks. For example, EEG-derived MRCP has provided valuable information on the role of cortical motor commands (represented by the MRCP) in the control of voluntary muscle activation. MRCP from the supplementary motor area and the contralateral sensorimotor cortex has been shown to be highly correlated with force production and rate of force production during isometric elbow-flexion, and with associated muscle activity (Siemionow et al., 2000). Of note, a recent fNIRS investigation has demonstrated obesity-related alterations in neural patterns of force control (i.e., lower prefrontal cortex activation associated with decreased joint stability) that can shed some light on the increased incidence of injury rates and higher work absenteeism in obese workers (Schulte et al., 2007; Mehta and Shortz, 2013). High repetition is one of the major work-related risk factors that contribute to the development of musculoskeletal disorders (Bernard, 1997). To evaluate the effects of repetition that involves flexion and extension of a joint, traditional ergonomic methods focus on muscular responses such as EMG.
In a study investigating thumb flexion and extension movements, EEG-derived MRCP findings from the supplementary motor area and contralateral motor cortex demonstrated that extension and flexion result from separate corticospinal projections to the motor neurons (Yue et al., 2000). Thumb extensions resulted in lower EMG but elicited greater brain responses than flexion movements. This particular finding may be important to our understanding of the etiology of musculoskeletal disorders due to repetitive motion. Real work environments are seldom static, and can require operators to focus not only on the physical work demands but also on the necessary visual/auditory cues associated with the tasks. Such tasks, which are dynamic and require visuomotor control, have been shown to increase corticomuscular coupling at higher EEG frequencies (i.e., gamma bands), indicating the adaptive role of cortical oscillations in rapidly integrating visual (or new) information with the somatosensory information (Marsden et al., 2000; Omlor et al., 2007). These findings have important implications for task analysis and design, particularly for work tasks that require visual feedback or fine or precise control of body motions.

#### **PHYSICAL FATIGUE**

Fatigue is defined as the inability to maintain required power after prolonged use of the muscle(s) (Latash et al., 2003), and can be affected by central (i.e., motivation, cortical activity, etc.) and peripheral (i.e., changes in muscle contractile properties) mechanisms. Neuroergonomic methods can help examine the role of central brain mechanisms in fatigue development. Based on the work tasks, fatigue in the workplace can be broadly categorized as localized muscle fatigue, which is the fatigue of specific muscle groups during tasks such as assembly line work or precision work, and whole body fatigue, which is more cardiorespiratory in nature and can occur during manual materials handling tasks. Commonly used ergonomic indicators of localized muscle fatigue include a reduction in force generating capacity (Vøllestad, 1997) and a decrease in the EMG power spectrum (Mehta and Agnew, 2012). However, these measures do not separate the contributions of central fatigue from those of peripheral fatigue. Using EEG-derived MRCP, Johnston et al. (2001) demonstrated a significant increase in the activity of the BP component and the motor potential (MP) component of the MRCP, associated with a decline in force production and reduced EMG activity during a fatiguing grasping task. These increases in the early components of MRCP may reflect development of compensatory cortical strategies to accommodate the inability to maintain the desired force levels due to peripheral fatigue. Supporting this, Liu et al. (2007) argued that muscle fatigues well before the brain does; in essence, that peripheral fatigue occurs before central fatigue. They demonstrated, by estimating the changes of source locations of high-density EEG signals using a single moving current dipole model, that handgrip muscle fatigue was associated with shifting of brain activation centers from one location to another when neurons in the previous location become fatigued. These studies collectively demonstrate the application of EEG in examining the neural correlates of localized fatigue development of smaller muscles during relatively static, or immobile, tasks.

Of the various neuroimaging techniques, EEG offers the greatest flexibility and mobility, which make it an attractive candidate for assessing whole body fatigue. By simultaneously obtaining information on eye movements and spontaneous EEG signals, Kubitz and Mott (1996) demonstrated increased brain activation (i.e., decreased alpha activity and increased beta activity) during a fatiguing cycling task. While technical advances have been made in minimizing mechanical artifacts from high-density EEG signals during whole body movements (Gwin et al., 2010), fNIRS has gained rapid attention in evaluating whole body fatigue owing to its methodological advantages over EEG. First, fNIRS provides information on the location of the neural signal generated, whereas with EEG signals, source localization has to be computationally derived. Second, there are no time-sensitive requirements in examining whole body fatigue when compared to fast reaction time tasks; slower hemodynamic responses of fNIRS are thus appropriate when compared to fast EEG responses. As such, fNIRS responses have been shown to be less affected by movement artifacts than EEG signals (Perrey, 2008). Several fNIRS studies have reported a significant decrease in relative levels of oxygenated hemoglobin in the prefrontal cortex, accompanied by muscular impairment, at exhaustion during submaximal and maximal fatiguing contractions (González-Alonso et al., 2004; Bhambhani et al., 2007; Nybo and Rasmussen, 2007). In particular, Thomas and Stephane (2008) demonstrated that oxygenated hemoglobin levels in the prefrontal cortex during incremental cycling exercise increased in the early stages, but decreased markedly in the last stage until exhaustion. These findings imply that prefrontal cortex activation is associated with reduction in motor output at the cessation of exercise.
However, these fatiguing tasks are accompanied by cardiorespiratory changes in the autonomic system that can affect fNIRS responses (Obrig et al., 1996). Depending on the research questions asked, such systemic influences on cerebral hemodynamic responses may be desired or undesired. Obrig and Villringer (2003) emphasize the importance of analyzing deoxygenated hemoglobin levels as an indicator of "neuronal activation" over the more commonly used oxygenated hemoglobin values. They argue that oxygenated hemoglobin levels are acceptable neuronal activity indicators when cerebral autoregulation is intact, i.e., cerebral blood flow is in homeostasis. Increases in oxygenated hemoglobin during exercise can be attributed not only to neuronal activation but also to exercise-induced increased blood flow to the brain, and as such a decrease in deoxygenated hemoglobin is the most valid parameter. Thus, neuroergonomic investigations of fatigue need to consider these systemic influences, and perhaps collect peripheral measurements such as arterial blood pressure and heart rate to ensure that appropriate inferences are made from fNIRS signals.

#### **COGNITIVE WORK**

The field of human factors and ergonomics had its origins in time-and-motion studies conducted in the early 1900s. With the advent of World War II, increasing attention was paid to evaluation of human psychological processes during work performance, but the dominant approach was behaviorism, or stimulus-response psychology. The advent of the cognitive revolution in the late 1950s led to the introduction of the cognitive approach in human performance assessment from the 1960s to the present day, but there was still a relative neglect of brain mechanisms. Advances in neuroimaging and related methods that led to the development of the field of cognitive neuroscience in turn led to the argument that neural measures should also be considered in human factors and ergonomics (Parasuraman, 2003). Since that time, the neuroergonomic approach has been applied to a number of different issues in cognitive ergonomics.

These historical trends in theoretical frameworks used in ergonomics can be seen clearly in the periodic reviews of the field of engineering psychology in the *Annual Review of Psychology*. Fitts (1958) reviewed work conducted mainly within time-and-motion and stimulus-response frameworks; Wickens and Kramer (1985) presented a cognitive or information-processing approach; and the most recent review, by Proctor and Vu (2010), describes the neuroergonomic approach. In this paper, we review a few key issues in cognitive neuroergonomics, focusing on those areas where the most research and development work has been done. These include: (1) mental workload, (2) vigilance and mental fatigue, and (3) neuroadaptive systems.

#### **MENTAL WORKLOAD**

The assessment of human mental workload is one of the most widely studied topics in ergonomics (Wickens and McCarley, 2008). If operator mental workload is either too high or too low, human-system performance may suffer in work environments, thereby potentially compromising safety. Hence, workload must be assessed in the design of new systems or the evaluation of existing ones. Behavioral measures, such as accuracy and speed of response on secondary tasks, or subjective reports (such as the NASA-TLX) have been widely used to assess mental workload. However, measures of brain function offer some unique advantages that can be exploited in mental workload assessment (Kramer and Parasuraman, 2007). Among these is the ability to extract covert physiological measures continuously in complex system operations in which overt behavioral measures may be relatively sparse.

The dominant theory of human mental workload is *resource* theory (Wickens, 1984, 2002). This theory postulates that except for highly overlearned "automatic" tasks, task performance is directly proportional to the application of attentional resources. The theory also proposes that the degree of overlap of multiple pools of resources determines the pattern and amount of interference when two or more tasks are performed simultaneously (such as driving and talking on the cell phone). Dual-task studies have provided abundant support for resource theory (Wickens and McCarley, 2008), but one criticism is that the theory is circular (Navon, 1984), which can be linked to the lack of an independent measure of resources. This criticism can be countered if neural measures of mental resources can be identified.

Measures of cerebral hemodynamics, such as fNIRS and TCDS, have provided validation for the resource construct. In a recent study, Ayaz et al. (2012) tested experienced air traffic controllers (ATC) on a complex ATC task requiring them to keep aircraft in their sector free of conflicts. fNIRS was used to measure activation of the prefrontal cortex. Ayaz et al. (2012) found there was an increase in prefrontal cortex activation as the number of aircraft in their sector increased. These findings suggest that fNIRS can provide a sensitive index of cognitive workload in a skilled group performing a realistic task that was highly representative of their work environment. fNIRS has also been found to index changes in prefrontal cortex activation with skill acquisition in both basic working memory tasks (McKendrick et al., 2014) and more complex piloting tasks (Ayaz et al., 2012). Most recently, portable versions of fNIRS have been developed for use in mobile neuroimaging (Ayaz et al., 2013).

There are many factors, such as cost, ease of implementation, intrusiveness, etc., that must be taken into consideration when selecting neuroergonomic techniques for mental workload assessment. Some of these factors (e.g., cost) may rule out the use of neuroergonomic methods in favor of simpler indexes such as subjective measures. Some workers may also not wish to be "wired up" for physiological recording, so operator acceptance must also be carefully considered. However, with increasing miniaturization and development of dry electrode, wireless, wearable systems, some of these concerns are diminishing.

#### **VIGILANCE AND MENTAL FATIGUE**

The evaluation of operator vigilance and mental fatigue in work environments is a topic closely related to workload assessment. The widespread implementation of automation in many work environments, including air and surface transportation and health care, while often leading to a reduction in operator workload, can also increase workload because of the resulting need for monitoring the automation (Parasuraman, 1987). The typical finding in vigilance studies is that the detection rate of critical targets declines with time on task (Davies and Parasuraman, 1982). Vigilance decrement was originally attributed to a reduction in physiological arousal (Frankmann and Adams, 1962), but more recent neuroergonomic research using TCDS and fNIRS has attributed it to resource depletion (Warm et al., 2008). Warm et al. (2008) reported a series of studies of TCDS and vigilance (for reviews, see Warm and Parasuraman, 2007; Warm et al., 2008). A consistent finding is that the vigilance decrement is paralleled by a decline in blood flow velocity over time, relative to a baseline of activity just prior to beginning the vigilance session. The parallel decline in vigilance performance and in blood flow velocity is found for both visual and auditory tasks (Shaw et al., 2009). These findings have been interpreted using resource theory. A critical control finding in support of resource theory (as opposed to a generalized arousal or fatigue model) is that the blood flow change occurs only when observers actively engage with the vigilance task. When observers are asked to simply watch a display passively, without having to detect targets, for the same amount of time as in an active vigilance condition (a case of maximal underarousal), blood flow velocity does not decline but remains stable over time.
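One simple way to express a measurement "relative to a baseline", as in the TCDS findings above, is to divide each task-period value by the mean of a pre-task baseline recording. The sketch below assumes this proportional scaling; the cited studies may have used a different normalization, and the function name and sample values are hypothetical.

```python
def relative_to_baseline(baseline_samples, task_samples):
    """Express task-period values as a proportion of the mean baseline value."""
    if not baseline_samples:
        raise ValueError("baseline recording is empty")
    baseline_mean = sum(baseline_samples) / len(baseline_samples)
    if baseline_mean == 0:
        raise ValueError("baseline mean is zero; ratio is undefined")
    return [value / baseline_mean for value in task_samples]


# Hypothetical blood flow velocities (cm/s): a resting baseline, then successive task periods.
baseline = [50.0, 50.0]
task_periods = [50.0, 49.0, 47.5]
print(relative_to_baseline(baseline, task_periods))
```

Values below 1.0 indicate a decline relative to the pre-task level, which makes sessions and participants with different absolute velocities easier to compare.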

The deleterious effects of loss of operator vigilance can countered with reduced work hours and more frequent rest breaks, but this may not be practical in all work settings. Another mitigating strategy is to use cueing. Detection performance in vigilance tasks can be improved by providing observers with consistent and reliable cues to the imminent arrival of critical signals, with the extent of the decrement being reduced or eliminated (Wiener and Attwood, 1968). With cueing, observers need to monitor a display only after having been prompted about the arrival of a signal and therefore can husband their information processing resources over time. In contrast, when no cues are provided, observers are never certain of when a critical signal might appear and consequently have to process information on their displays continuously across the watch, thereby consuming more of their resources over time than cued observers. If the vigilance decrement stems from resource depletion due to need to attend continuously to a display, then pre-cues should reduce the decline in cerebral blood flow velocity as measured by TCDS. This was confirmed in a study by Hitchcock et al. (2003). They used no pre-cues or pre-cues that were 100, 80, or 40% reliable in pointing to an upcoming critical event in a simulated air traffic control task. Performance efficiency remained stable when perfectly reliable cues were provides but declined over time in the remaining conditions, so that by the end of the vigil, performance efficiency was clearly best in the 100% group, followed in order by the 80, 40%, and no-cue groups. Blood flow declined in the no cue control condition, but there was a progressive reduction in the extent of the decline with progressively more reliable cues. There was no decline when the cues were perfectly reliable. This pattern of change in blood flow exactly matched that of performance.

In addition to cueing, non-invasive brain stimulation could also be used to mitigate vigilance decrement and mental fatigue. Nelson et al. (2014) applied 1 mA anodal tDCS to either the left or right prefrontal cortex while participants performed the same vigilance task used by Hitchcock et al. (2003). tDCS was applied either early or late during the course of the vigilance task. Compared to a control group that showed the normal vigilance decrement, the early stimulation group had a higher detection rate of critical signals. The late stimulation group initially exhibited a vigilance decrement, but this was reversed following application of tDCS. These initial findings are highly encouraging, but need to be followed up with additional research to examine the long-term effectiveness of tDCS as a method to alleviate vigilance problems at work.

#### **TRAINING AND NEUROADAPTIVE SYSTEMS**

While the goal of ergonomic design is to avoid having workers exposed to extremes of workload and to loss of vigilance, this may not always be possible in certain work settings where unexpected events, equipment failures, or other unanticipated factors lead to a transient increase in the task load imposed on the human operator, or long work hours impose demands on operator vigilance. Adaptive automation offers one approach to deal with these issues (Parasuraman, 1987, 2000). In this approach, the allocation of functions to human and machine agents is flexible during system operations, with greater use of automation during high task load conditions or emergencies and less during normal operations, consistent with the approach of dynamic function allocation (Lintern, 2012). The adaptive automation concept has a long history (Parasuraman et al., 1992), but neuroergonomic methods for its implementation have been considered relatively recently (Inagaki, 2003; Parasuraman, 2003; Scerbo, 2007).

Several methods to implement adaptive systems have been examined, including neuroergonomic measures to assess the operator's functional state (Byrne and Parasuraman, 1996; Kramer and Parasuraman, 2007; Wilson and Russell, 2007; Parasuraman and Wilson, 2008; Ting et al., 2010). Many studies have used EEG because of its ease of recording and (relative) unobtrusiveness (compared, say, to secondary tasks or subjective questionnaires). EEG also has the property of being a very high bandwidth measure, offering the possibility of sampling the human operator at up to about 30 Hz (Wilson and Russell, 2003). Workload adaptive systems need to assess operator state in real time, or near real time, so that task allocation or restructuring can be implemented in cases of overload or underload. A number of different statistical and machine learning techniques have been used for this purpose. These include discriminant analysis (Berka et al., 2004), artificial neural networks (Wilson and Russell, 2007; Baldwin and Penaranda, 2012), Bayesian networks (Wang et al., 2011), and fuzzy logic (Ting et al., 2010). These have been implemented in real time and typically provide accuracies of 70–85%.
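As a rough illustration of the kind of classifier listed above, the following is a minimal sketch of a two-class linear discriminant that separates "low" from "high" workload epochs. It is not the method of any cited study. The two features per epoch (imagined here as EEG band-power values), the class centers, and all function names are assumptions made for illustration, and the data are synthetic.

```python
import random

def fit_lda(class0, class1):
    """Fit a two-class linear discriminant on 2-D features (equal priors assumed)."""
    def mean_vec(rows):
        return [sum(r[k] for r in rows) / len(rows) for k in range(2)]

    m0, m1 = mean_vec(class0), mean_vec(class1)
    # Pooled within-class covariance matrix (2x2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((class0, m0), (class1, m1)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    n = len(class0) + len(class1) - 2
    s = [[v / n for v in row] for row in s]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    # Discriminant direction w = S^-1 (m1 - m0); threshold at the class midpoint.
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    bias = -(w[0] * mid[0] + w[1] * mid[1])
    return w, bias

def predict(model, x):
    """Return 1 (high workload) or 0 (low workload) for one feature vector."""
    w, bias = model
    return 1 if w[0] * x[0] + w[1] * x[1] + bias > 0 else 0

# Synthetic, made-up features (not real EEG): two hypothetical values per epoch.
rng = random.Random(0)

def sample(center, n):
    return [(rng.gauss(center, 1.0), rng.gauss(center, 1.0)) for _ in range(n)]

train_low, train_high = sample(0.0, 100), sample(3.0, 100)
test_low, test_high = sample(0.0, 100), sample(3.0, 100)

model = fit_lda(train_low, train_high)
hits = sum(predict(model, x) == 0 for x in test_low)
hits += sum(predict(model, x) == 1 for x in test_high)
accuracy = hits / (len(test_low) + len(test_high))
```

Because the class separation in the synthetic data is arbitrary, the resulting accuracy says nothing about real recordings; the 70–85% range quoted above refers to the cited studies, not to this sketch.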

Implementing neuroergonomic adaptive systems in real settings poses significant challenges. A major issue concerns the detection and removal of artifacts in real time. Furthermore, while initial success has been achieved in using computational techniques to classify workload on the basis of EEG and other neuroergonomic measures, the reliability and stability of these methods within and across individuals needs to be more rigorously tested (Wang et al., 2011; Christensen et al., 2012). Finally, the operational community must be involved in the design of adaptive systems to ensure user acceptance and compliance.

#### **NEUROERGONOMIC ASSESSMENT OF CONCURRENT PHYSICAL AND COGNITIVE WORK**

Both physical and cognitive neuroergonomics have helped advance our understanding of the role of the human brain during physical and cognitive work, respectively. Only a small number of studies have investigated the interaction between physical and cognitive work, which is a big concern since "work" places combined physical and cognitive demands on operators, never either one in isolation. High cognitive demands can influence physical work, and physical activity can in turn influence cognitive processing. In comparison to traditional evaluation techniques in either the physical or the cognitive ergonomics domain, neuroergonomic methods offer a great advantage in assessing these combined demands. For example, using EEG signals, Kamijo et al. (2004) investigated the influence of exercise intensity on cognitive function using the P300 ERP component. They suggested that exercise influenced the amount of attentional resources devoted to a given task and that the changes in P300 amplitude followed an inverted *U*-shaped pattern across exercise intensities. When examining the impact of cognitive demand on physical capacity, a few studies have attributed decreased muscle endurance in the presence of a cognitively stressful situation to lower motivation (Marcora et al., 2009), increased neuromotor noise impairing joint steadiness (van Loon et al., 2001; Mehta and Agnew, 2011, 2013; Mehta et al., 2012), or neuronal interference at the prefrontal cortex, which is involved in both cognitive processing and isometric motor contractions (Dettmers et al., 1996; Rowe et al., 2000; Mehta and Parasuraman, 2013).
In particular, using fNIRS to monitor cerebral oxygenation during handgrip exercises, Mehta and Parasuraman (2013) demonstrated that concurrent handgrip exercises in cognitively stressful conditions were associated with lower oxygenated hemoglobin levels in the bilateral prefrontal cortex at exhaustion when compared to the handgrip exercises at the same intensity levels (i.e., no changes observed in peripheral muscular responses of EMG and force exerted) under no stress. Quite similarly, using an EEG- and EMG-derived corticomuscular coupling measure, Kristeva-Feige et al. (2002) reported that corticomuscular coupling decreased significantly during a cognitively stressful condition despite no changes observed in traditional measures such as EMG and force production. These studies collectively emphasize the importance of obtaining brain (or central) responses along with the more conventional ergonomic measurements to accurately understand the "total" demands placed on humans during work that requires both physical and cognitive processing. Future investigations comparing these brain-body responses with the more traditional performance or subjective measures are also needed to understand the underlying neural "cost" of operator functional state. Such studies are also needed to develop evaluation tools (surveys, heuristic checklists) that are predictive of the neural and physiological cost associated with optimizing work tasks, which can be used by designers or supervisors to quantify operator workload and fatigue.

#### **MOBILE BRAIN IMAGING CONSIDERATIONS FOR WORKLOAD/FATIGUE ASSESSMENTS**

One of the key distinctions between neuroergonomics and neuroscience is that neuroergonomics is the study of brain and behavior "at work." Thus, it is extremely important that neuroergonomic methods are capable of examining human operators in their naturalistic work settings. In this paper, we discussed the merits and disadvantages of the available neuroimaging techniques applicable to neuroergonomics, and a key theme identified was the lack of studies evaluating neural bases of mobile work, particularly in the physical neuroergonomics domain. Recent efforts in developing mobile brain imaging (MoBI) techniques, which consider the physical and environmental impact on human cognitive processing, show great promise. For example, Gramann et al. (2011) reviewed the implications and feasibility of a newly developed MoBI system that was previously employed in examining cognitive processing during human stance and locomotion. In particular, their MoBI investigation included simultaneous brain-body measurements from a 256-channel EEG system and kinematic and kinetic outcomes that are otherwise employed during conventional gait biomechanics using motion capture systems and force plates (Gwin et al., 2011). In their review, Gramann and colleagues identify key requirements for MoBI methods that include: (1) robust mobile sensor technology to measure brain activity, (2) a comprehensive "wireless" body measurement system, and (3) powerful computational software to collectively process and analyze brain and body responses. While developing an ideal MoBI system may be a challenging goal, understanding current limitations in mobile brain-body imaging and addressing them, albeit painstakingly, is a critical step toward achieving this goal.
Future investigations can also include developing similar mobile brain-body imaging systems for hemodynamic neuroimaging techniques, utilizing either fNIRS or TCDS to provide brain imaging measures, and using peripheral measurements such as heart rate and blood pressure to document physiological whole-body responses.

#### **CONCLUSION**

Ergonomics has long since moved from being a science of improving work efficiency to now being focused on enhancing well-being while improving systems performance. To effectively understand how humans interact with work systems, it is not only important to ask how well they perform, but also *why* they perform a certain way. Neuroergonomics has helped fill in the gaps on the neural bases of both physical and cognitive performance that were left unanswered with traditional ergonomic assessments. In this review we discussed the recent developments and adoption of neuroergonomic methods and applications in investigating physical, cognitive, and combined physical and cognitive work. We also reviewed the applicability and feasibility of neuroimaging techniques in evaluating mobile work environments. While some neuroimaging methods, such as MRI, fMRI, PET, and DTI, are expensive and immobile, portable methods such as EEG, fNIRS, and TCDS are more likely to be adopted in applied ergonomics research. With the advent of, and recent developments in, MoBI technology, we can be assured that neuroergonomics can continue providing critical information on how and why humans interact in ambulatory and naturalistic work settings.

#### **AUTHOR CONTRIBUTIONS**

Both authors contributed equally to this work. Ranjana K. Mehta performed the literature review on neuroergonomics applications to physical work and Raja Parasuraman performed the literature review on cognitive neuroergonomics. Both authors discussed the reviewed implications and commented on the manuscript at all stages.

#### **ACKNOWLEDGMENTS**

Supported in part by Air Force Office of Scientific Research grant FA9550-10-1-0385 to Raja Parasuraman.

#### **REFERENCES**



during a maintained motor contraction task. *Clin. Neurophysiol.* 113, 124–131. doi: 10.1016/S1388-2457(01)00722-2



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 23 August 2013; accepted: 05 December 2013; published online: 23 December 2013.*

*Citation: Mehta RK and Parasuraman R (2013) Neuroergonomics: a review of applications to physical and cognitive work. Front. Hum. Neurosci. 7:889. doi: 10.3389/fnhum.2013.00889*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Mehta and Parasuraman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Continuous monitoring of brain dynamics with functional near infrared spectroscopy as a tool for neuroergonomic research: empirical examples and a technological development

#### *Hasan Ayaz<sup>1</sup>\*, Banu Onaral<sup>1</sup>, Kurtulus Izzetoglu<sup>1</sup>, Patricia A. Shewokis<sup>1,2</sup>, Ryan McKendrick<sup>3</sup> and Raja Parasuraman<sup>3</sup>*

*<sup>1</sup> School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA*

*<sup>2</sup> Nutrition Sciences Department, College of Nursing and Health Professions, Drexel University, Philadelphia, PA, USA*

*<sup>3</sup> Center of Excellence in Neuroergonomics, Technology, and Cognition, George Mason University, Fairfax, VA, USA*

#### *Edited by:*

*Klaus Gramann, Berlin Institute of Technology, Germany*

#### *Reviewed by:*

*Craig Speelman, Edith Cowan University, Australia*

*Dietrich Manzey, Technische Universitaet Berlin, Germany*

#### *\*Correspondence:*

*Hasan Ayaz, School of Biomedical Engineering, Science and Health Systems, Drexel University, 3508 Market Street, Monell Suite 101, Philadelphia, PA 19104, USA e-mail: hasan.ayaz@drexel.edu*

Functional near infrared spectroscopy (fNIRS) is a non-invasive, safe, and portable optical neuroimaging method that can be used to assess brain dynamics during skill acquisition and performance of complex work and everyday tasks. In this paper we describe neuroergonomic studies that illustrate the use of fNIRS in the examination of training-related brain dynamics and human performance assessment. We describe results of studies investigating cognitive workload in air traffic controllers, acquisition of dual verbal-spatial working memory skill, and development of expertise in piloting unmanned vehicles. These studies used conventional fNIRS devices in which the participants were tethered to the device while seated at a workstation. Consistent with the aims of mobile brain imaging (MoBI), we also describe a compact and battery-operated wireless fNIRS system that performs with similar accuracy as other established fNIRS devices. Our results indicate that both wired and wireless fNIRS systems allow for the examination of brain function in naturalistic settings, and thus are suitable for reliable human performance monitoring and training assessment.

**Keywords: fNIRS, optical brain monitoring, working memory training, prefrontal cortex, hemodynamic response, wireless NIRS**

#### **INTRODUCTION**

Understanding the neural mechanisms that contribute to cognitive functions such as performing complex cognitive tasks, acquisition, development, and use of cognitive skills, is an important goal for cognitive neuroscience research (e.g., Poldrack et al., 1998) and for applications of neuroscience to work and everyday activities, or neuroergonomics (Parasuraman, 2011). Various magnetic resonance imaging (MRI) methods have provided essential information about the brain systems involved in skill acquisition. These include functional MRI (Karni et al., 1995), resting state functional connectivity (Lewis et al., 2009), and diffusion tensor imaging (Lövdén et al., 2010). Such studies are critical for the development of theories of neuroplasticity, because human brain changes associated with learning can be compared to studies in animals using invasive neurophysiological and pharmacological methods (Sarter and Parikh, 2005). However, MRI has two major limitations: (1) its requirement for participant immobility, and (2) its high operational cost. The former rules out its use for understanding brain dynamics during everyday activities such as walking or running, and while many functional MRI studies have been carried out using virtual reality simulations of such naturalistic activities as spatial navigation (Hartley et al., 2003), flying (Callan et al., 2012), and driving (Calhoun et al., 2002), the concern is that carrying out these activities while prone and immobile may not recruit the same brain networks as those involved when one is mobile and upright (Raz et al., 2005). One consequence of the second limitation of MRI, its high operational cost, is that skill acquisition studies typically image participants pre- and post-training, so that only linear changes in brain structure and function can be assessed.
However, skill acquisition is known to be non-linear, e.g., described by power or hyperbolic functions (e.g., Newell and Rosenbloom, 1980; Speelman and Kirsner, 2005). Also, neural changes are likely to occur throughout training, so it is important to examine how such changes are linked to performance at multiple time points during learning, not just before and after training.
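The limitation of a pre-/post-training design can be illustrated numerically. The sketch below assumes a power-law learning curve of the form T(n) = a + b·n^(−c); the parameter values, the ten-session schedule, and the function names are made up for illustration and are not taken from any of the cited studies. A straight line drawn through only the first and last sessions matches the curve at those two points but overestimates time per trial at every intermediate session.

```python
def power_law_time(session, asymptote=1.0, gain=4.0, rate=0.8):
    """Hypothetical power-law learning curve: time per trial at a given session."""
    return asymptote + gain * session ** (-rate)

def two_point_line(session, first, last, sessions_total):
    """Straight line through the first and last session only (pre/post design)."""
    slope = (last - first) / (sessions_total - 1)
    return first + slope * (session - 1)

sessions = list(range(1, 11))
observed = [power_law_time(s) for s in sessions]
linear = [two_point_line(s, observed[0], observed[-1], len(sessions))
          for s in sessions]

# Positive error: the two-point line predicts slower performance than the curve.
errors = [lin - obs for lin, obs in zip(linear, observed)]
worst_session = sessions[max(range(len(errors)), key=lambda i: abs(errors[i]))]
```

Sampling at multiple time points, as continuous monitoring allows, is what makes the curvature visible in the first place.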

One method that is well-suited for such continuous monitoring of brain dynamics is functional near infrared spectroscopy (fNIRS). fNIRS is a non-invasive, safe, and portable optical method to monitor brain activity within the prefrontal cortex of the human brain. fNIRS has emerged during the last decade as a promising non-invasive neuroimaging tool and has been used, with increasing interest from research communities, to monitor various types of brain activity during motor and cognitive tasks. fNIRS uses specific wavelengths of light to provide measures of cerebral oxygenated and deoxygenated hemoglobin that are correlated with the fMRI BOLD signal (Cui et al., 2011). While fMRI can monitor the whole brain with high spatial resolution at the sub-millimeter level, fNIRS can only monitor cortical regions with lower spatial resolution (usually in the centimeter range). However, unlike fMRI, fNIRS is quiet (no operating sound), provides higher temporal resolution (faster sampling frequency), and does not restrict participants to a confined space or require them to remain motionless in a supine position. Hence, fNIRS is an ideal candidate for monitoring changes related to cortical activity not only in laboratory settings but also in more ecologically valid, everyday working and field conditions.
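The text does not spell out how light measurements at specific wavelengths become hemoglobin measures. One widely used approach in fNIRS generally is the modified Beer-Lambert law, in which the change in optical density at each wavelength is a weighted sum of the oxygenated (HbO) and deoxygenated (HbR) hemoglobin changes, scaled by the source-detector distance and a differential pathlength factor (DPF). The sketch below assumes two wavelengths; the extinction coefficients, distance, and DPF are placeholder values chosen for illustration, not tabulated constants or the settings of any device described here.

```python
# Placeholder extinction coefficients (illustrative only, NOT tabulated constants).
EXTINCTION = ((0.40, 1.10),   # wavelength 1: (eps_HbO, eps_HbR)
              (1.06, 0.69))   # wavelength 2: (eps_HbO, eps_HbR)

def forward_delta_od(d_hbo, d_hbr, extinction, distance_cm, dpf):
    """Optical density changes predicted by the modified Beer-Lambert law."""
    path = distance_cm * dpf
    return tuple(path * (eps_hbo * d_hbo + eps_hbr * d_hbr)
                 for eps_hbo, eps_hbr in extinction)

def mbll_concentration_changes(delta_od, extinction, distance_cm, dpf):
    """Invert the 2x2 system to recover (dHbO, dHbR) from two wavelengths."""
    (a, b), (c, d) = extinction
    path = distance_cm * dpf
    det = (a * d - b * c) * path
    if det == 0:
        raise ValueError("wavelengths do not separate HbO from HbR")
    y1, y2 = delta_od
    d_hbo = (y1 * d - b * y2) / det
    d_hbr = (a * y2 - c * y1) / det
    return d_hbo, d_hbr

# Round trip with made-up concentration changes (arbitrary units).
measured = forward_delta_od(0.5, -0.2, EXTINCTION, distance_cm=3.0, dpf=6.0)
recovered = mbll_concentration_changes(measured, EXTINCTION, distance_cm=3.0, dpf=6.0)
```

The round trip only checks that the inversion is consistent with the forward model; real pipelines also need filtering and artifact handling, which are outside this sketch.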

In this paper we provide results from a number of studies that illustrate the use of this approach to the examination of workload and training-related brain dynamics with human performance assessment. This paper has two major aims. The first is to show that fNIRS provides a sensitive and reliable index of brain activity in skill acquisition, task performance and during the development of expertise in complex tasks. Because occupations involving such tasks often require their human operators to be free to speak, move their eyes or heads, and otherwise be mobile, a second aim of this paper is to describe a mobile fNIRS system that can supplement "tethered" fNIRS in studies of human performance. We first describe the use of fNIRS in monitoring cognitive workload in air traffic controllers. We then describe a study of the acquisition of a dual-tasking skill, in which participants had to perform challenging verbal and spatial working memory tasks simultaneously, as well as a study examining development of expertise in piloting unmanned vehicles. Each of these studies used conventional fNIRS devices in which the participants were tethered to the device while seated at a workstation. Current fNIRS instruments require the participant to be connected to the sensor and the device via cables and/or fiber optic lines which imposes restrictions on the ambulatory nature of the experiment protocol and to the participant. Consistent with the aims of mobile brain imaging (MoBI) (Makeig et al., 2009; Gramann et al., 2011) and neuroergonomics (Parasuraman, 2011), it would be desirable to measure brain dynamics while participants can move freely. To address these limitations we introduce a compact and battery-operated wireless fNIRS system that performs with similar accuracy as other established fNIRS devices.

#### **MONITORING COGNITIVE WORKLOAD IN AIR TRAFFIC CONTROLLERS**

The assessment of cognitive workload using neural measures is a central feature of research and development in neuroergonomics (Parasuraman, 2011). Mental workload is also a critical factor in maintaining safety in air traffic control (ATC), particularly as traffic density increases and new systems and operational procedures are implemented for air traffic management (Loft et al., 2007). Accordingly, there is a need for sensitive, objective methods of measuring cognitive workload in air traffic controllers.

The feasibility of using fNIRS for human performance assessment has been demonstrated in several recent studies. In a prior research project, we incorporated fNIRS into studies conducted at the Federal Aviation Administration (FAA)'s William J. Hughes Technical Center Human Factors Laboratory, where certified controllers were monitored with fNIRS while they managed realistic ATC scenarios under typical and emergent conditions (Ayaz et al., 2012b). The primary objective of this work was to use neurophysiological measures to assess cognitive workload and usability of new interfaces developed for ATC systems (see **Figure 1**). Throughout the study, certified professional controllers (CPCs) completed ATC tasks with different interface settings and controlled difficulty levels for verification. The results provide evidence that brain activation as measured by fNIRS provides a valid measure of mental workload in this realistic ATC task (Ayaz et al., 2012b).

For the first part of the study, we used a working memory task (N-back) that has been widely used in the cognitive neuroscience research literature (Owen et al., 2005). The N-back paradigm provides varying task-load conditions to test associations between level of difficulty and cortical activation, and has been shown to activate the dorsolateral (DLPFC) and ventrolateral prefrontal cortex (VLPFC) as assessed with positron emission tomography (PET) (Smith et al., 1996) and functional MRI (D'Esposito et al., 1998; Owen et al., 2005). During the task, participants were asked to monitor stimuli (single letters) presented serially on a screen and to click a response button when a target stimulus arrived. Four conditions were used to incrementally vary working memory load from zero to three items. In the 0-back condition, subjects responded to a single pre-specified target letter (e.g., "X") with their dominant hand (pressing a button to identify the stimulus). In the 1-back condition, the target was defined as any letter identical to the one immediately preceding it (i.e., one trial back). In the 2- and 3-back conditions, the targets were defined as any letter that was identical to the one presented two or three trials back, respectively. The total test included seven sessions of each of the four N-back conditions (hence, a total of 28 N-back blocks, each of 1 min duration, with 20 letters each presented for 500 ms with a 2500 ms inter-stimulus time) presented in a pseudo-random order.
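The target rule and the block timing described above can be sketched in a few lines. The function names are hypothetical, and the timing calculation assumes that the 2500 ms inter-stimulus time follows each 500 ms letter, which makes 20 letters add up to the stated 1 min block.

```python
def nback_targets(letters, n, zero_back_target="X"):
    """Mark which positions in a letter sequence are targets for a given N."""
    if n == 0:
        # 0-back: respond to a single pre-specified target letter.
        return [letter == zero_back_target for letter in letters]
    # N-back: a letter is a target if it matches the one presented n trials back.
    return [i >= n and letters[i] == letters[i - n] for i in range(len(letters))]

def block_duration_ms(n_letters=20, stimulus_ms=500, isi_ms=2500):
    """Block length, assuming each letter is followed by the inter-stimulus time."""
    return n_letters * (stimulus_ms + isi_ms)

conditions = (0, 1, 2, 3)
sessions_per_condition = 7
total_blocks = sessions_per_condition * len(conditions)
total_task_minutes = total_blocks * block_duration_ms() / 60_000
```

For example, `nback_targets(list("ABAXA"), 2)` marks the third and fifth letters as targets, since each matches the letter two positions earlier.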

Results showed that average oxygenation changes due to task engagement (mean for each block with baseline compared to beginning of the block) at optode 2 that is close to AF7 in the International 10–20 System, located within left inferior frontal gyrus in the dorsolateral prefrontal cortex (DLPFC), were associated with task difficulty and increased monotonically with increasing task difficulty (see **Figure 2**, left). Moreover, the significant region within left PFC in this study was implicated in many previous studies of the N-back task using PET (Smith et al., 1996; Reuter-Lorenz et al., 2000), fMRI (Cohen et al., 1997; Owen et al., 2005), and fNIRS (Schreppel et al., 2008).
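The block-level measure described above (a block mean taken relative to a baseline at the beginning of the block) could be computed along the following lines. This is a sketch only: the length of the baseline window, the function names, and the sample values are assumptions for illustration and do not reproduce the study's processing pipeline or its data.

```python
from statistics import mean

def block_oxygenation_change(samples, baseline_samples=3):
    """Mean oxygenation over a block, relative to the mean of its first samples."""
    baseline = mean(samples[:baseline_samples])
    return mean(s - baseline for s in samples)

def is_monotonic_increasing(values):
    """True if each value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

# Made-up samples (arbitrary units), one illustrative block per N-back condition.
blocks = {
    0: [0.0, 0.0, 0.0, 0.1, 0.1],
    1: [0.0, 0.0, 0.0, 0.2, 0.3],
    2: [0.0, 0.0, 0.0, 0.4, 0.5],
    3: [0.0, 0.0, 0.0, 0.6, 0.8],
}
changes = [block_oxygenation_change(blocks[n]) for n in sorted(blocks)]
increases_with_load = is_monotonic_increasing(changes)
```

In practice the per-block values would be averaged across the seven blocks of each condition and across participants before testing for an effect of task difficulty.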

For the second part of the study, complex cognitive tasks (i.e., ATC) were used. A critical transition defined in the planned future ATC system called NextGen involves augmenting the current auditory-based communications between ATC and the flight deck with text-based messaging, or DataComm systems (Willems et al., 2006, 2010). DataComm systems are expected to allow controllers to manage more air traffic at a lower level of cognitive load, thereby increasing both the capacity of the national airspace system and passenger safety.

Based on the approach and findings of the N-back working memory study, the objective in the second part of the study was to use neurophysiological measures to predict changes in cognitive workload during a complex cognitive task that very closely simulated the activities of air traffic controllers. Two types of communications between the CPCs and pilots, either typical (VoiceComm) or emergent (DataComm), were used in ATC simulations in a pseudo-random order (Willems et al., 2006, 2010). For each communication type, task difficulty was varied by the number of aircraft in each sector, with sectors containing 6, 12, or 18 aircraft to increase task load. Three simulation pilots supported each sector within voice-based scenarios and entered data at their workstations to maneuver aircraft, all based on controller clearances.

Analysis of data from 24 participants indicated a significant measurement location at optode 8, which is within the medial PFC/frontopolar cortex, with two significant main effects: Task Difficulty (number of aircraft) and Communication (VoiceComm vs. DataComm) (see **Figure 2**, right). The fNIRS results for the main effect of Communication type confirm that the VoiceComm condition results in higher oxygenation than the DataComm condition, with a small to moderate effect size. These results are consistent with the idea that, given the same cognitive workload (within identical scenarios), DataComm required fewer cognitive resources.

#### **BRAIN DYNAMICS DURING EXTENDED WORKING MEMORY TRAINING**

The previously described study by Ayaz et al. (2012b) showed that frontal cerebral oxygenation as measured by fNIRS increases with working memory load. Specifically, average oxygenation systematically increased in the N-back task as N was varied from 0 to 3 (see **Figure 2**, left). Working memory capacity is predictive of performance on visual attention (Engle, 2002), decision-making (Endsley, 1995), and supervisory control tasks (McKendrick et al., 2013b). Hence it is of interest that recent studies have shown that an individual's working memory capacity is not fixed but can be increased by training (Jaeggi et al., 2008), although whether such training transfers to other general domains of cognition is controversial (Shipstead et al., 2012). MRI studies have also shown that such working memory training is associated with both structural (Takeuchi et al., 2010) and functional (Dahlin et al., 2008) brain changes. However, as mentioned previously, most studies have used MRI in pre- and post-training designs, so that fine-grained and non-linear changes in brain dynamics have not been studied. Accordingly, we describe a study that used fNIRS to monitor skill acquisition in a dual verbal and spatial working memory task (McKendrick et al., 2013a).

As people perform a task repeatedly, they are likely to experience changes in the degree of mental effort expended, either voluntarily or as required by the task. To distinguish brain changes that result from working memory training from those that reflect increases in mental effort, we compared two training conditions in this study: an adaptive training condition in which working memory load was adjusted based on the trainee's performance, and a yoked condition in which working memory load was adjusted based on the performance of individuals in the adaptive condition. Since task demands are not matched to the capabilities of participants in the yoked group, we predicted that they would expend more mental effort in order to perform the task and show an increase in prefrontal cortex (PFC) total hemoglobin (HbT) as measured with fNIRS. At the same time, because task demands are matched to the capabilities of adaptive trainees, we expected this group to show little change in hemodynamic response in PFC.
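The difference between the two conditions can be made concrete with a small sketch. The adjustment rule used here (raise the load by one item after a correct trial, lower it by one after an error) is a hypothetical stand-in, since the actual rule is not specified in this text, and the function names and starting load are likewise assumptions. The point is only that a yoked trainee receives a partner's load sequence regardless of their own performance.

```python
def adaptive_loads(correct_by_trial, start_load=4, min_load=1):
    """Load per trial for an adaptive trainee under a hypothetical 1-up/1-down rule."""
    loads = [start_load]
    for correct in correct_by_trial[:-1]:
        step = 1 if correct else -1
        loads.append(max(min_load, loads[-1] + step))
    return loads

def yoked_loads(partner_loads):
    """A yoked trainee replays the partner's loads, ignoring their own performance."""
    return list(partner_loads)

# Made-up trial outcomes for one adaptive trainee (True = correct recall).
adaptive_outcomes = [True, True, False, True]
adaptive_sequence = adaptive_loads(adaptive_outcomes)
yoked_sequence = yoked_loads(adaptive_sequence)
```

Because the yoked sequence tracks someone else's ability, it can be too hard or too easy for the person performing it, which is the mismatch the prediction above relies on.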

In addition, to improve the efficacy of the working memory training design, we implemented the suggestions for optimal training proposed by Gibson et al. (2012). First, we used a challenging dual verbal-spatial working memory task (see **Figure 3**), in which participants first memorized a string of digits, then a number of spatial locations, and following a delay period, recalled the locations and then the digits. The use of the dual task allowed for taxing the updating and executive control components associated with working memory (Baddeley, 1986). Second, to avoid ceiling effects and challenge participants, the load for verbal and spatial working memory was set to a range beyond what is considered the average capacity limit (spatial: 4 locations, verbal: 7 digits). Participants trained on the working memory task for about 2 h each day for 5 successive days. Daily training was separated into two 1 h sessions with a 15 min break between training sessions. Within a given training session, participants performed 10 training blocks, and each training block consisted of nine trials of the dual working memory task. Finally, we used linear mixed effects modeling of the data to examine both linear and nonlinear changes in performance and brain dynamics with working memory training.

**Figure 4** shows the data, plotted for each individual, for the verbal working memory task. As is clear from **Figure 4**, verbal working memory span increased with training, but in a nonlinear manner. A cubic function provided the best fitting model for training-related changes in performance. As expected, the adaptive group reached higher span levels than the yoked group at the end of training. Differences between training groups were modeled by a significant negative quadratic component for the yoked training condition, representing a slowing of skill development on the third and fourth days of training relative to the adaptive condition (see **Figure 4**).

As predicted, we observed an increase in hemodynamic response for the yoked control condition. This was specifically observed in the right rostral prefrontal cortex during the first 3 days of training. In the same region, in the adaptive condition there was a decrease in hemodynamic response over the same time period. The rostral prefrontal cortex is believed to be involved in the monitoring and processing of sensory stimuli during multitasking (Burgess et al., 2005). The changes in rostral prefrontal cortex suggest that in order to keep pace with the performance of the adaptive group the yoked group had to apply considerably more effort in maintaining and processing dual task representations. This is expected as in the yoked group task demands are not matched with participant capabilities and thus require higher mental effort. Furthermore, toward the end of training the adaptive group had to increase the effort applied to processing dual task representations to improve their performance. The differential quadratic workload effect between adaptive and yoked conditions can be seen in **Figure 5**. During this time the yoked group may have become fatigued due to the high level of effort required on the first 3 days of training. Non-linear increases in left DLPFC and right VLPFC were also observed with increased exposure to working memory training.

These findings point to the sensitivity of fNIRS to track both linear and non-linear changes in cerebral hemodynamics as a result of working memory training. Importantly, non-linear changes over time would not have been observed if a pre-/post-training design commonly used in MRI studies had been used.

The findings also show that fNIRS provides an efficient and effective way to continuously monitor hemodynamic changes over extended periods of time, as required in training studies. In addition, as described in this paper, portable NIRS systems are being developed that could be used to measure the effects of training in complex real world tasks where the use of fMRI would be challenging or impossible.

#### **MONITORING THE DEVELOPMENT OF EXPERTISE IN PILOTING UNMANNED VEHICLES**

The study by McKendrick et al. (2013a) described above used a basic but challenging cognitive task: dual verbal/spatial working memory. Expertise development in more complex tasks that simulate work and other everyday real-world settings has also been examined. A majority of the studies examining task practice have found decreases in the extent or intensity of neural activations with ongoing practice, particularly in the attentional control areas (Kelly and Garavan, 2005). This finding holds whether the task is primarily motor [e.g., golf swing (Milton et al., 2004)] or primarily cognitive in nature, as in the Tower of London task (Beauchamp et al., 2003), and extends to more complex tasks such as videogame training (Prakash et al., 2012) or the center-out adaptation task (Gentili et al., 2013). Decreases in activation are thought to represent a contraction of the neural representation of the stimulus (Poldrack, 2000) or a more precise functional circuit (Garavan et al., 2000).

In a recent study, we utilized fNIRS to investigate the relationship of the hemodynamic response in the anterior prefrontal cortex to changes in the level of expertise and task performance during learning of simulated unmanned aerial vehicle (UAV) piloting tasks (Ayaz et al., 2012a). Novice participants with no prior UAV piloting experience took part in a 9-day training program in which they used a flight simulator to execute real-world maneuvers. Each day, self-reported measures (with NASA TLX), behavioral measures (task performance), and fNIRS measures (prefrontal cortex activity as mental effort on task) were recorded.

Participants practiced approach and landing scenarios while piloting a virtual UAV. The scenarios were designed to expose novice subjects to realistic and critical tasks for a UAV ground operator directly piloting an aircraft. The first scenario was a turn-to-approach task, in which the pilot flies through several waypoints on an approach to land at an airfield. The second scenario was a landing task, in which the pilot performs the actual touchdown. In both scenarios, subjects were told to fly as smoothly as possible, learn the optimal paths, cope with crosswinds, and operate within certain speed and bank angle constraints. The experiment protocol involved a total of nine sessions per subject, one session per day. The first session on day 1 was to allow subjects to become acquainted with the flight simulator; by the end of this session, they needed to demonstrate basic understanding of flight simulator controls. Study data were collected during the following eight practice sessions.

Analysis of data from thirteen participants showed a reduction in the fNIRS measures (see **Figure 6**), which differed significantly across practice levels and matched the trends reported in behavioral performance and self-reported measures. A valid hypothesis can be derived from the evidence that expertise tends to be associated with overall lower brain activity relative to novices, particularly in prefrontal areas (Milton et al., 2004). Both practice and the development of expertise typically involve decreased activation across attentional and control areas, freeing these neural resources to attend to other incoming stimuli or task demands. As such, measuring activation in these attentional and control areas relative to task performance can provide an index of level of expertise and illustrate how task-specific practice influences the learning of tasks.

Results indicate that level of expertise does appear to influence the hemodynamic response in the dorsolateral/ventrolateral prefrontal cortices, confirming previous studies of learning in cognitive-motor tasks (Hatakenaka et al., 2007; Leff et al., 2008). Since fNIRS allows development of portable and wearable instruments, it has the potential to be used in future learning environments to personalize the training regimen and/or assess the effort of human operators in critical multitasking settings (Ayaz et al., 2012a,b).

#### **DEVELOPMENT OF A PORTABLE, WIRELESS fNIRS SYSTEM**

The portable optical brain imaging system used in the studies reported here was first described by Chance et al. (Chance, 1998; Chance et al., 1993), further developed at Drexel University (Philadelphia, PA), and manufactured and supplied by fNIR Devices LLC (Potomac, MD; www.fnirdevices.com).

The system is composed of three modules: a flexible headpiece (sensor pad), which holds light sources and detectors to enable a fast placement of all 16 optodes; a control box for hardware management; and a computer that runs the data acquisition (see **Figure 7**).

The sensor has a temporal resolution of 500 milliseconds per scan and a 2.5 cm source-detector separation, allowing for approximately 1.25 cm penetration depth. Its 16 measurement locations (optodes) lie on a rectangular grid covering the forehead region (see **Figure 7**) and are designed to monitor dorsal and inferior frontal cortical areas. The light emitting diodes (LEDs) were activated one light source at a time, and the four photodetectors surrounding the active source were sampled. For data acquisition and visualization, COBI Studio software was used (Ayaz et al., 2011).

#### **EVOLUTION OF DREXEL WIRELESS fNIRS**

The need for development and improvement of fNIRS instrumentation has been growing as fNIRS has been increasingly used in human brain activation studies since it was first described by Jobsis (1977) as an optical method for non-invasively assessing cerebral oxygenation changes. In the 1980s, Delpy et al. designed and tested an fNIRS system for a clinical application with newborn infants (Wyatt et al., 1986). Further efforts improved the methodology and hardware (Delpy et al., 1988; Wray et al., 1988; Cope, 1991; Chance et al., 1993; Elwell et al., 1994) and thus expedited the translation of fNIRS based techniques into a useful neuroimaging tool (Villringer et al., 1993; Chance et al., 1997, 1998; Hoshi and Tamura, 1997; Villringer and Chance, 1997; Obrig et al., 2000; Strangman et al., 2002). Recent comprehensive reviews on fNIRS technology (Ferrari and Quaresima, 2012) and instrumentation (Scholkmann et al., 2013) confirm that the vast majority of instrumentation development was on continuous wave (CW) type fNIRS which is limited in terms of its information content (i.e., it measures only changes of oxygenated and deoxygenated-Hb) compared to frequency and time-resolved approaches. However, CW fNIRS is also most appropriate for miniaturization and portable system development, because the signal type and acquisition timing requirements are less demanding. Moreover, other than brain imaging, the same approach can also be used for many biomedical approaches (Macnab and Shadgan, 2012) such as muscle assessment and other *in vivo* and clinical applications.

**FIGURE 6 | Changes throughout the practice levels: Self-reported ratings: perceived mental effort as measured by NASA TLX (left), Behavioral performance: average error in banking angle (middle), fNIRS measures: average total hemoglobin concentration changes (right) of all subjects throughout days.** Error bars are standard error of the mean (SEM).

The development of wearable and low cost fNIRS systems began in 1996 for prefrontal cortex brain hemodynamics and muscle measurements (Chance et al., 1997). These systems were later further developed into the portable systems at Drexel University (Izzetoglu et al., 2004, 2005, 2011; Ayaz et al., 2012b) and used in the studies reported here.

One of the earliest wireless telemetry systems for fNIRS was based on a single channel muscle oximeter developed by Nakase and Shiga (Omron Institute of Life Science, Kyoto, Japan) in collaboration with Chance (Ferrari and Quaresima, 2012). This system was used by Hoshi et al. for assessing cognitive function of children, who carried the system in a backpack and could therefore move untethered (Hoshi and Chen, 2002). More recently, a 4-channel wireless *in vivo* imager has been developed at the University of Zurich, Switzerland (Muehlemann et al., 2008) and utilized for neurorehabilitation fNIRS studies (Holper et al., 2010). Also, an EEG integrated prototype was reported for epilepsy research (Safaie et al., 2013).

Consistent with the MoBI (Makeig et al., 2009; Gramann et al., 2011) and neuroergonomics (Parasuraman, 2011) approaches, development of miniaturized fNIRS systems for ubiquitous monitoring of the brain could benefit studies for understanding brain dynamics in ecologically valid real world environments.

#### **TWO UNIT APPROACH USING SMARTPHONES**

In this initial effort, our aim was to develop a miniaturized and battery operated CW fNIRS system with operating features comparable to those of the wired/portable system (Yurtsever et al., 2006), such as 16 channel full forehead assessment and 2 Hz sampling. To expedite the development and take advantage of off-the-shelf embedded systems, a smartphone/pocket PC platform was utilized (Yurtsever et al., 2003). A custom software application and a low level driver for direct hardware access were developed to interface a PCMCIA based National Instruments data acquisition card and to control the optical imaging hardware unit and the sampling of the data. The smartphone system simultaneously transmits the collected signals through a Wi-Fi network to a computer base station running COBI Studio software, thus allowing online monitoring at a remote base station (see **Figure 8**).

#### *Control circuitry*

The circuit board contains a stable current source for the LEDs (implemented with a high precision voltage regulator), timing control elements (counter, demultiplexer, multiplexers), amplifiers, and filters, and is powered by a 7.2 V lithium-ion camcorder battery. A 5 V voltage regulator is used to provide constant voltage to the circuit, since battery voltage decreases throughout operation.

The circuit is designed to use a minimum number of digital and analog channels of the data acquisition card, so that different data acquisition (DAQ) cards can be used with the control box. Only two digital channels and four analog channels are required to operate the system.

Controlling the timing of the LEDs and photodetectors is the key point in the design; this timing is controlled by the data acquisition software on the pocket PC. The LEDs turn on and off sequentially, one at a time. The LED turn on sequence in one scan cycle is depicted in red, pink and black colors below the timing signals in **Figure 9**. The LED turn on sequence is as follows: Turn on LED1 730 nm, read D1, D2, D3, D4 (Detectors 1–4); turn on LED1 850 nm, read D1, D2, D3, D4; dark, read D1, D2, D3, D4 (read offset); turn on LED2 730 nm, read D3, D4, D5, D6, and so on.
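The scan cycle described above can be written out as a short Python sketch. The steps for LED1 and the first step for LED2 follow the text; the detector groups for LED3 and LED4 are an assumption extrapolated from "and so on" (each LED shifts the group of four detectors by two), and the function names are hypothetical.

```python
# One scan cycle: for each LED, sample at 730 nm, at 850 nm, then with all LEDs off.
WAVELENGTHS_NM = (730, 850)
N_LEDS = 4


def detectors_for(led):
    """Detectors sampled for a 1-based LED index.

    Assumed pattern: LED k is surrounded by detectors D(2k-1) .. D(2k+2).
    The text confirms this only for LED1 (D1-D4) and LED2 (D3-D6).
    """
    first = 2 * led - 1
    return [first + i for i in range(4)]


def scan_cycle():
    """Yield (led, state, detectors); state is a wavelength in nm or 'dark'."""
    for led in range(1, N_LEDS + 1):
        dets = detectors_for(led)
        for wavelength in WAVELENGTHS_NM:
            yield led, wavelength, dets
        yield led, "dark", dets  # offset reading with the LEDs off


steps = list(scan_cycle())
for led, state, dets in steps:
    label = "dark" if state == "dark" else f"{state} nm"
    print(f"LED{led} {label}: read " + ", ".join(f"D{d}" for d in dets))
```

Under these assumptions a scan consists of 12 steps of four detector readings each, that is, three readings (two wavelengths plus offset) for each of the 16 optodes.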

The timing in the circuit is controlled by only two digital signals, DIO0 and DIO1, that feed a negative edge triggered 3 bit counter. A detailed timing diagram is presented in **Figure 9**, and a block diagram of the circuit is given in **Figure 10**. One of the two digital control lines, DIO0, is the reset/initialization signal, and the other, DIO1, controls the 4:16 demultiplexer and triggers state transitions as shown in **Figure 10**. That demultiplexer controls selection of the light source LEDs by turning on/off the single pole throw switch connecting to the current source. During a scan, one of four LEDs is lit at one wavelength at a time and the surrounding four light detectors are sampled. The operation is repeated for the second wavelength and for the other three LEDs. During a scan, the background light level at each detector is also measured while all LEDs are off. The outputs of the analog multiplexers, which select the detector outputs, are amplified, filtered and digitized. The data collected for each scan cycle are transmitted from the pocket PC to the base station computer through a wireless connection for data analysis and display.
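The two-line control scheme can be illustrated with a toy software model. This is only a sketch of the behavior described in the text (reset on DIO0, state advance on each falling edge of DIO1); the class and method names are hypothetical, and the model simply counts modulo 12 states (4 LEDs × 3 phases) rather than reproducing the counter and demultiplexer hardware.

```python
class ScanSequencer:
    """Toy model: DIO0 resets the state, each falling edge on DIO1 advances it."""

    N_LEDS = 4
    PHASES = (730, 850, "dark")  # per-LED order taken from the text

    def __init__(self):
        self.count = 0
        self._last_dio1 = 0

    def reset(self):
        """Model a pulse on DIO0: return to the first state of the scan."""
        self.count = 0
        self._last_dio1 = 0

    def write_dio1(self, level):
        """Drive DIO1; only a 1 -> 0 transition (falling edge) advances the state."""
        if self._last_dio1 == 1 and level == 0:
            self.count = (self.count + 1) % (self.N_LEDS * len(self.PHASES))
        self._last_dio1 = level

    def state(self):
        """Decode the counter value into (LED number, phase)."""
        led, phase = divmod(self.count, len(self.PHASES))
        return led + 1, self.PHASES[phase]


seq = ScanSequencer()
seq.reset()
visited = [seq.state()]
for _ in range(3):
    seq.write_dio1(1)  # rising edge: no state change
    seq.write_dio1(0)  # falling edge: advance to the next state
    visited.append(seq.state())
print(visited)
```

In this model the acquisition software would read the four selected detectors after each falling edge, which keeps the number of digital lines at two regardless of how many states the scan has.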

#### **INTEGRATED SINGLE UNIT APPROACH**

The two unit system developed earlier presented usability challenges: for example, it required participants to carry two separate hardware pieces (pocket PC and control box), and the experimenter needed to maintain and charge two separate batteries. To address these challenges and further miniaturize the overall system, a unified single part wireless system was developed (Rodriguez et al., 2010; Rodriguez and Pourrezaei, 2011). The single unit wireless fNIRS system was designed to meet a series of requirements, including being pocket-sized, lightweight, and compatible with operation in hospital/clinical or other work settings where other wireless communications are taking place. Design parameters also included operation from a rechargeable battery, with a minimum of 3 h of operation on a single charge at maximum power consumption settings. The device was also required to interface with the fNIRS sensor currently used with the "wired" versions of the system developed at the Optical Brain Imaging Lab of Drexel University. Finally, the system had to interface and communicate with COBI Studio (Ayaz et al., 2011), a hardware integrated software platform that is used for all instruments developed at the Optical Brain Imaging Lab of Drexel University. The implemented system is depicted in **Figure 11**. It provided comparable performance to that of the wired system in terms of signal integrity and signal to noise ratio (SNR) (Rodriguez and Pourrezaei, 2011) and was used in pediatric monitoring for pain assessment of neonatal patients in intensive care units (Izzetoglu et al., 2011).

#### *Implementation*

The instrumentation is split into two nodes: (i) a "collector node" that connects to a host computer and (ii) a "sensor node" that collects the data (**Figure 12**). The two nodes communicate with each other wirelessly. For the wireless link between the nodes, common technologies were assessed, including Zigbee, Bluetooth, and Wi-Fi. Zigbee was chosen for this implementation due to its low cost, low power requirement, and wireless mesh networking capabilities. The low power usage allows longer life with smaller batteries, and the mesh networking provides high reliability and larger network range coverage.

The Texas Instruments CC2430 System-On-Chip (SOC) was selected for this solution; the built in 2.4 GHz IEEE 802.15.4 transceiver, 8051 enhanced core, eight channel 12-bit ADC, and 21 digital IOs were sufficient (see **Figure 12**). A new sensor that contains two monolithic photodiodes with built-in transimpedance amplifiers and a two-wavelength (730 and 850 nm) LED source was developed (see **Figure 13**). The LED driver circuit was designed to be capable of driving each wavelength separately with output currents up to 50 mA. The sensor circuit has a buffer amplifier for each of the two photodiodes, whose gain can be controlled individually by the microcontroller. The input signals from the detectors are multiplexed and then passed through an anti-aliasing filter (a low pass analog filter) to remove high frequency noise before being fed to the analog-to-digital converter (ADC), which samples at a 1.5 kHz rate per channel.

Optional digital filtering of the data can be performed at the sensor node, prior to packaging and sending the data packets over the wireless network. This system can be used with various types of sensors that have different numbers of detectors. The "collector node" of the system communicates with a host computer through the Universal Serial Bus (USB) port utilizing a custom communication protocol that allows the COBI Studio software to control the system.

#### *Evaluation of the system*

Testing of the newly designed sensor was performed in terms of noise, accuracy, stability, and effectiveness of obtaining signals using solid and liquid phantoms that mimic the adult human head before measurements in humans.

*Solid phantom tests.* SNR values calculated for the modular fNIR sensor were compared at input buffer amplifier gain settings of 1 and 10. The SNR was higher for gain 10 and increased with LED current, from around 60 dB to 75 dB as the LED current reached 40 mA.
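The text does not state how SNR was computed. One common convention for a steady detector signal, assumed in the sketch below, is the ratio of the mean signal to its standard deviation expressed in decibels. The function name and the synthetic readings are illustrative only and are not measured data.

```python
import math
import statistics


def snr_db(samples):
    """SNR in dB, assumed here to be 20*log10(mean / standard deviation)."""
    mean = statistics.fmean(samples)
    noise = statistics.pstdev(samples)
    if noise == 0:
        raise ValueError("noise estimate is zero; SNR is undefined")
    return 20 * math.log10(abs(mean) / noise)


# Synthetic readings (not measured data): a level of 1000 units with +/-1 unit ripple.
readings = [1000 + (1 if i % 2 else -1) for i in range(100)]
print(round(snr_db(readings), 1))
```

With this definition, a tenfold improvement in the mean-to-noise ratio corresponds to a 20 dB increase, so the reported change from about 60 dB to 75 dB would correspond to a ratio roughly 5.6 times larger.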

In **Figure 14** the responses of both systems are compared when the input gain setting is set to 10 and the LED current is swept from 5 up to 40 mA. It can be seen that the wireless system performs very close to the wired version: the difference between the measurements is 8% and this difference remains almost constant across the different setting configurations under which the data were collected. The gain linearity of the wireless system was also analyzed. With the LED current held constant at 20 mA and the gain changed to values of 1, 10, and 20, the system responded linearly within this range.
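The two comparisons above (wired versus wireless difference, and gain linearity) can be expressed as simple calculations. The sketch below is one possible formulation: the helper names are hypothetical, the readings are made-up values in arbitrary units, and the linearity check assumes a response proportional to gain with no offset.

```python
def percent_difference(wired, wireless):
    """Difference of the wireless reading relative to the wired one, in percent."""
    return 100.0 * abs(wireless - wired) / abs(wired)


def linearity_error(gains, outputs):
    """Largest relative deviation from a least-squares line through the origin."""
    slope = sum(g * y for g, y in zip(gains, outputs)) / sum(g * g for g in gains)
    return max(abs(y - slope * g) / abs(slope * g) for g, y in zip(gains, outputs))


# Hypothetical readings in arbitrary units, chosen only to exercise the functions.
gains = [1, 10, 20]
outputs = [0.05, 0.50, 1.00]
print(percent_difference(1.00, 0.92))
print(linearity_error(gains, outputs))
```

A near-constant percent difference across LED currents, as reported, would point to a fixed scale factor between the two systems rather than a setting-dependent error.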

*Liquid phantom tests.* To evaluate the dynamic response of the system to changes in oxygenated-hemoglobin (HbO) and deoxygenated-hemoglobin (Hb) concentrations, a tissue simulating phantom was used (Bozkurt et al., 2005; Yurtsever et al., 2006; Rodriguez and Pourrezaei, 2011). A 1% Liposyn III solution was prepared in a cylindrical transparent glass beaker from 30% Liposyn III in 1000 mL phosphate buffered saline at pH 7.4. This solution has a reduced scattering coefficient of 10 cm<sup>−1</sup> at 830 nm, which is a good estimate for the human forehead. The mixture was continuously stirred with a magnetic stirring rod to keep the solution homogeneous. To simulate the blood content in tissue (around 50 µM), 22 mL of human blood was added to the beaker. The sensor pad was attached to the side of the beaker and a baseline was recorded from the wireless unit using COBI Studio. Then, 4 g of baker's yeast was added to the mixture. The yeast respiration led to deoxygenation, so [HbO] decreased and [Hb] increased. Representative data are presented in **Figure 15**. After 13 min, [Hb] and [HbO] reached a steady state, where oxygenation of hemoglobin and yeast respiration are at equilibrium. Then, we provided oxygen to the solution from an oxygen tank (green line in the graph) to re-oxygenate deoxyhemoglobin. As a result of oxygen bubbling inside the beaker, hemoglobin saturation exceeded the initial saturation and a steady state was reached at a higher saturation level.
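The conversion from detected light to [HbO] and [Hb] is not spelled out in the text. A common choice for two-wavelength CW systems is the modified Beer-Lambert law, in which the optical density change at each wavelength is a weighted sum of the two concentration changes multiplied by an effective path length. The sketch below assumes that model. The function names, the effective path length, and the extinction coefficients are placeholders for illustration; they are not tabulated values and are not taken from the phantom experiment.

```python
def mbll_forward(d_hbo, d_hb, ext, path_cm):
    """Optical density change at each wavelength for given concentration changes."""
    return [(e_hbo * d_hbo + e_hb * d_hb) * path_cm for e_hbo, e_hb in ext]


def mbll_inverse(d_od, ext, path_cm):
    """Solve the 2x2 system for (delta HbO, delta Hb) using Cramer's rule."""
    (a, b), (c, d) = ext
    det = a * d - b * c
    if det == 0:
        raise ValueError("extinction matrix is singular")
    y1, y2 = (value / path_cm for value in d_od)
    return (y1 * d - b * y2) / det, (a * y2 - c * y1) / det


# Placeholder coefficients (NOT tabulated values). Rows: 730 nm, 850 nm;
# columns: (HbO, Hb). They only mimic the qualitative fact that Hb absorbs
# more strongly at 730 nm and HbO more strongly at 850 nm.
EXT = ((0.4, 1.1), (1.1, 0.7))
PATH_CM = 15.0  # hypothetical effective optical path length

# Simulated deoxygenation, as in the yeast phase: HbO falls and Hb rises.
d_od = mbll_forward(-2.0, 2.0, EXT, PATH_CM)
d_hbo, d_hb = mbll_inverse(d_od, EXT, PATH_CM)
print(d_hbo, d_hb)
```

The round trip recovers the simulated changes, with the sign pattern expected during yeast respiration (negative for HbO, positive for Hb); reversing the signs would mimic the re-oxygenation phase.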

#### **CONCLUSIONS**

This paper described a range of studies on human performance assessment and skill acquisition monitoring using fNIRS measures of the hemodynamic response of the prefrontal cortex and its relationship to mental workload, expertise, and performance. The results show that the effects of task load and expertise on the hemodynamic response can be reliably and sensitively assessed in a range of tasks, from standardized laboratory tasks to complex cognitive tasks representative of real work settings. With respect to the development of wireless, portable fNIRS, the results, although preliminary, corroborate previous findings and point to the potential of the fNIRS system as a wearable, portable and non-invasive sensor for future neuroergonomics studies. Moreover, miniaturization and wireless system development efforts reported here will benefit future studies that can allow participants to freely navigate in indoor or outdoor environments untethered, consistent with the MoBI approach. Although further work may be needed in specific applications, both wired and wireless fNIRS systems allow for the examination of dynamic aspects of brain function in more natural settings, and are thus suitable for reliable human performance and training assessment.

#### **ACKNOWLEDGMENTS**

The authors would like to acknowledge Drs. Meltem Izzetoglu, Scott Bunce, Kambiz Pourrezaei, and also Davood Tashayyod, Gunay Yurtsever, Mauricio Rodriguez, and Juan Du for their constructive support, contributions, and for taking an active role in the studies discussed here. This work was supported in part by the U.S. Federal Aviation Administration through BAE Systems Technology Solutions Services Inc. under Primary Contract, DTFA01-00-C-00068, and Subcontract Number, 31–5029862 as well as under U.S. Army Medical Research Acquisition Activity; Cooperative Agreements W81XWH-08-2-0573 and W81XWH-09-2-0104. Additional support was provided by the Air Force Office of Scientific Research grant FA9550-10-1-0385. The views, opinions, and/or findings contained in this article are those of the authors and should not be interpreted as representing the official views or policies, either expressed or implied, of the funding agencies.


**Conflict of Interest Statement:** fNIR Devices, LLC manufactures the optical brain imaging instrument and licensed IP and know-how from Drexel University. Hasan Ayaz, Kurtulus Izzetoglu, and Banu Onaral were involved in the technology development and thus offered a minor share in the new startup firm fNIR Devices, LLC. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 03 September 2013; accepted: 28 November 2013; published online: 18 December 2013.*

*Citation: Ayaz H, Onaral B, Izzetoglu K, Shewokis PA, McKendrick R and Parasuraman R (2013) Continuous monitoring of brain dynamics with functional near infrared spectroscopy as a tool for neuroergonomic research: empirical examples and a technological development. Front. Hum. Neurosci. 7:871. doi: 10.3389/fnhum.2013.00871*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Ayaz, Onaral, Izzetoglu, Shewokis, McKendrick and Parasuraman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Kinesthetic and vestibular information modulate alpha activity during spatial navigation: a mobile EEG study

#### *Benedikt V. Ehinger<sup>1</sup>\*†, Petra Fischer<sup>1</sup>†, Anna L. Gert<sup>1</sup>, Lilli Kaufhold<sup>1</sup>, Felix Weber<sup>1</sup>, Gordon Pipa<sup>2</sup> and Peter König<sup>1,3</sup>*

*<sup>1</sup> Neurobiopsychology, Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany*

*<sup>2</sup> Neuroinformatics, Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany*

*<sup>3</sup> Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany*

#### *Edited by:*

*Klaus Gramann, Berlin Institute of Technology, Germany*

#### *Reviewed by:*

*Lutz Jäncke, University of Zurich, Switzerland Tobias Meilinger, Max Planck Institute for Biological Cybernetics, Germany Markus Plank, Brain Products GmbH, Germany*

#### *\*Correspondence:*

*Benedikt V. Ehinger, Neurobiopsychology, Institute of Cognitive Science, University of Osnabrück, Albrechtstraße 28, Osnabrück 49069, Germany e-mail: behinger@uos.de*

*†These authors have contributed equally to this work.*

In everyday life, spatial navigation involving locomotion provides congruent visual, vestibular, and kinesthetic information that needs to be integrated. Yet, previous studies on human brain activity during navigation focus on stationary setups, neglecting vestibular and kinesthetic feedback. The aim of our work is to uncover the influence of those sensory modalities on cortical processing. We developed a fully immersive virtual reality setup combined with high-density mobile electroencephalography (EEG). Participants traversed one leg of a triangle, turned on the spot, continued along the second leg, and finally indicated the location of their starting position. Vestibular and kinesthetic information was provided either in combination, as isolated sources of information, or not at all within a 2 × 2 full factorial intra-subjects design. EEG data were processed by clustering independent components, and time-frequency spectrograms were calculated. In parietal, occipital, and temporal clusters, we detected alpha suppression during the turning movement, which is associated with a heightened demand of visuo-attentional processing and closely resembles results reported in previous stationary studies. This decrease is present in all conditions and therefore seems to generalize to more natural settings. Yet, in incongruent conditions, when different sensory modalities did not match, the decrease is significantly stronger. Additionally, in more anterior areas we found that providing only vestibular but no kinesthetic information results in alpha increase. These observations demonstrate that stationary experiments omit important aspects of sensory feedback. Therefore, it is important to develop more natural experimental settings in order to capture a more complete picture of neural correlates of spatial navigation.

**Keywords: spatial navigation, mobile EEG, alpha band, event related desynchronization, alpha suppression, virtual reality, independent component analysis, time-frequency analysis**

#### **INTRODUCTION**

Well-controlled studies under restricted laboratory conditions have contributed enormously to the knowledge about brain processes over the past decades. These insights are thought to capture relevant aspects of brain functionality that also hold true in natural settings or even generalize to brain processes under natural conditions. It remains to be tested whether these assumptions hold and to which degree the results obtained in reduced experimental setups transfer to natural conditions.

Specifically, such controlled settings often imply sitting in front of a computer monitor, thus omitting important sensory information that would otherwise be available in natural behavior. Particularly in the case of spatial navigation, kinesthetic information (registered by joint, tendon, and muscle proprioceptors) and vestibular information (originating from translational or rotational changes mediated by the semicircular canals of the inner ear) have to be regarded as key percepts. With our present work, we attempt to take a first step toward evaluating the generalizability of typical laboratory paradigms to real world conditions.

In everyday life, navigation requires continuous multimodal integration of inputs from various senses, including visual, kinesthetic, and vestibular information, to compute one's relative position in the environment. The mere ability to see already gives access to a multitude of spatial cues (e.g., optical flow, binocular disparity, or motion parallax) and aids not only the recognition of objects but also the perception of spatial relations. Vision, therefore, is assumed to dominate spatial processing. This can occur in various reference frames, i.e., multiple ways to describe how objects relate to each other. Several studies (e.g., Schicke et al., 2002; for review see Eimer, 2004; Pasqualotto and Proulx, 2012) provide evidence that even non-visual spatial perception via sound, touch, or proprioception is influenced by the existence of an early visually induced external reference frame when two modalities are interacting. Performance accuracies of healthy participants, as well as late-blind people, drop when additional biased sensory information of another modality poses a distraction. Congenitally blind people, in contrast, succeed perfectly in ignoring irrelevant stimuli exactly as the task requires. These results indicate that early visual experience establishes constitutive sensory integration within one common external reference frame and that this process does not emerge in the complete absence of vision.

However, some studies challenge the dominant role of vision. For example, Loomis et al. (1993) compared the path integration ability of congenitally blind and blind-folded sighted participants and found only small differences, suggesting that proficiency in spatial navigation relying on non-visual modalities is not necessarily dependent on previous visual experience. Other studies (e.g., Loomis et al., 1993; Klatzky et al., 1998; Wartenberg et al., 1998) also suggest that for accurate spatial updating, i.e., revision of internal information on the spatial context, vision alone is not sufficient when kinesthetic and vestibular signals that are normally generated by whole-body movements are missing.

Previous psychophysical experiments showed that when the availability of vestibular and kinesthetic sensory information was systematically varied, subjects' orientation estimates differed significantly: Frissen et al. (2011) found evidence of inaccurate spatial updating when kinesthetic information was provided but vestibular updating was prevented. Subjects tended to underestimate their perceived self-motion while they were walking in place on a circular treadmill in the absence of vision. The authors hypothesized that this effect potentially results from the conflicting zero-movement input from the vestibular system. In contrast, passive movement generated by the treadmill provided only vestibular information but yielded accurate spatial updating in spite of the complete absence of muscle activity. While vision was absent in the study of Frissen et al., Chance et al. (1998) showed that performance in indicating the directions of previously passed objects benefited when vestibular and kinesthetic information were provided in addition to vision. This is not surprising given the following: proprioceptive information does not necessarily have to be available or congruent when vestibular information is changing, for example when driving a car, riding a train, or being carried as an infant. Muscle activity during natural movement, in contrast, never occurs without appropriate vestibular updating. Likewise, many other psychophysical experiments (Chance et al., 1998; Loomis et al., 1999; Kearns et al., 2002; Frissen et al., 2011) show that altering the availability of sensory information causes changes in behavior. Therefore, we also expect to see modulations of the underlying brain processes.

A number of studies investigated the electrophysiological correlates of spatial navigation by recording electroencephalography (EEG) (e.g., Gramann et al., 2006, 2010; Plank et al., 2010; Chiu et al., 2012). Gramann et al. (2010) distinguished between neuronal correlates of subject groups that either use allocentric or egocentric reference frames while navigating. Subjects were classified according to their strategies of mentally representing their heading in a given environment (Gramann et al., 2006, 2010; Goeke et al., 2013). Usually, roughly half of all subjects hold on to an egocentric reference frame, which is also named *Turner* strategy, while the other half solves the navigation task according to a *Non-Turner* strategy using an allocentric reference frame. This nomenclature comes from the fact that so-called *Turners* update their heading and position in a mental map, whereas *Non-Turners* only update their position but not their heading direction—they will "not turn away" from their initial orientation. This updating is also influenced by response modality (Avraamides et al., 2004) and it depends on whether translational or rotational aspects are included (May, 2004), and whether subjects actively move through the environment (Klatzky et al., 1998). In addition to the predicted behavioral differences between these two groups, Gramann et al. (2010) also detected significant differences in their neuronal activities. Subjects virtually passed through a tunnel consisting of a straight segment, a turn of varying angle, and another straight segment while EEG was measured. During turns, alpha desynchronization occurred in parietal and occipital areas, which is in general considered to reflect enhanced cognitive processing in the respective areas (e.g., Pfurtscheller and da Silva, 1999). Gramann et al. 
(2010) moreover reported stronger alpha blocking in Turners in right inferior occipital gyrus, whereas Non-Turners showed stronger alpha blocking near bilateral occipito-temporal, inferior parietal, and retrosplenial cortex. The authors argue that this enhanced suppression probably indicates abstract processing of egocentric visual flow (like using a bird's eye view) when maintaining an allocentric reference frame. Functional magnetic resonance imaging studies provide similar evidence for activity in parietal cortex, more precisely in the precuneus and retrosplenial cortex (Committeri et al., 2004; Wolbers et al., 2007).

The studies described above, however, were conducted in stationary setups and therefore have provided insights only into neural processing of spatial updating without physical movement. For this reason, Gramann et al. (2014) emphasize the need for whole body imaging under real world conditions. Following these ideas, we test in what way these findings on brain processes during passive navigation generalize to a task that provides not only visual input, but successively adds proprioceptive and vestibular information, two major additional task-relevant senses.

The investigation of physiological mechanisms of spatial navigation raises challenging technical issues that go hand in hand with the application of non-invasive techniques to study the human brain while the subject is in motion. Techniques such as functional magnetic resonance imaging or positron emission tomography have been used extensively in spatial navigation research (e.g., Maguire et al., 1998; Committeri et al., 2004; Wolbers et al., 2011) and provide a high spatial resolution, but are unsuitable for mobile setups as they are stationary. Miniaturization of electronic devices has led to the development of mobile EEG recording equipment; yet, EEG signals are weak and prone to movement artifacts. Recently developed systems equipped with actively shielded electrodes and cables have been particularly designed to record electrophysiological data from moving and even walking probands (Waveguard, ANT, Netherlands). Furthermore, advances in data analysis techniques permit improved cleaning of EEG signals from artifacts, instead of excluding such recordings (Gwin et al., 2010; Delorme et al., 2011). The aim of our study is to take advantage of these new possibilities to extend previous findings on neural correlates of spatial navigation and to investigate especially the integration of multiple senses, which inevitably occurs during navigation coupled with active movement.

We complemented the EEG system with a mobile virtual reality device that allowed us to implement a less restricted but at the same time well-controlled experimental paradigm. The core of our experiment is to vary the availability of vestibular and kinesthetic sensory information while the provided visual input stays identical across conditions.

To this end, we devised a dedicated hardware setup built of two straight segments that are connected by a turnable platform. The segments can be rotated freely to form new path configurations and serve as guide rails to keep our subjects on track. A cart assured stability and carried auxiliary technical equipment. Additionally, it enabled us to transport subjects passively along the track, preventing kinesthetic updating in the presence of vestibular information. Our main motivation for devising the construction was to be able to manipulate vestibular updating with the help of the turntable: It can be rotated by leg movements of the subject while the orientation of the upper body remains constant, providing kinesthetic input but no vestibular updating (see Materials and Methods for a detailed description).

Considering that our task implied similar sensory modifications, we were interested in a comparison between our subjects' behavior and the studies introduced earlier (Chance et al., 1998; Frissen et al., 2011): Providing vestibular information leads to better performance; we therefore hypothesized that the performance of our subjects, namely the accuracy of their homing angle estimates, would improve with vestibular sensory information. Furthermore, in the passive baseline condition, we expected to detect alpha suppression in the same regions as Gramann et al. (2010). Adding only kinesthetic or vestibular information could lead to an incongruency effect and consequently higher alpha suppression, reflecting increased cognitive demands. Since subjects are probably more involved and "immersed" in the active conditions with additional sensory input, we expected to find even stronger effects there. This hypothesis is also suggested by previous comparisons of EEG recordings of participants in 3D or 2D environments, which showed a very similar alpha decrease in the more immersed 3D setting (Havranek et al., 2012; Kober et al., 2012).

Taken together, the overall aim of our study was to assess how previous findings on EEG correlates of spatial navigation extend to life-like experimental tasks and, moreover, to explore how the integration of multiple senses influences the underlying task-driven brain dynamics.

#### **MATERIALS AND METHODS**

#### **GENERAL METHODS**

#### *Subjects*

Five right-handed male students (mean age: 22.4 years, range 21–24 years) participated in the study. Two of those subjects showed allocentric navigation behavior in a previous online test (www.navigationexperiments.com/TurningStudy.html), while the other three exhibited egocentric behavior. Subjects' gaming experience was either less than 6 months (S3, S4), 2–5 years (S5), or up to 10 years (S1, S2). All participants had normal or corrected-to-normal vision. They were paid €8 per hour. The procedure had been approved by the local ethics committee, and prior to the start of the experiment subjects gave informed written consent.

#### **DESIGN**

We employed a 2 × 2 within-subjects design by manipulating available information as follows: (1) In the passive condition, participants stood while watching a movement presented via a head-mounted display. (2) In the vestibular condition, subjects were moved while standing on a cart and thus received vestibular but no kinesthetic information about the turn. (3) In the kinesthetic condition, subjects rotated a turntable beneath their feet with a lower limb movement while keeping their head oriented straight. This mimics an on-the-spot turn without vestibular updating but with appropriate kinesthetic information. (4) Lastly, the active condition approximated natural behavior best as participants walked and turned by themselves.

#### *Task*

Our experiment is based on a modified triangle completion task: Participants traversed one leg of a triangle, turned on the spot, and continued along the second leg. In order to keep our probands on the right track when navigating through a random-dot starfield they had to follow a centered, small, spherical guiding object from their start to their end position. The starfield consisted of randomly distributed dots that were aligned on a horizontal ground plane. Additionally, a small number of dots were scattered across the remaining space, leaving the area through which the subjects were passing clear (see Supplementary Materials). Visibility of the dots faded to black within 20 m viewing distance from the subjects. At the end of each trial, a virtual arrow faded in with a black background while the starfield dissolved. The arrow was displayed at a constant distance in front of the participants. It was initially oriented in walking direction, and subjects rotated it either to the left or to the right by pressing two buttons on a gamepad. Once the arrow reached its desired direction, the participants pressed a third button to confirm their final decision.

Each subject completed 120 trials per condition. A total of 480 trials per subject were recorded in four sessions, each comprising two 60-trial blocks and therefore two different conditions. Participants were allowed to take breaks in between the blocks or whenever requested. Each of the six different angles (30°, 60°, and 90°, to the left and to the right) occurred equally often in each condition, namely 20 times. The order of the conditions across subjects as well as the order of the angles in the conditions was randomized. All subjects took part in a training session prior to their first recording in which they familiarized themselves with the setup and task in each condition until they felt confident with the experiment.
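The trial counts above can be checked with a small sketch of how one condition's angle sequence might be generated. This is an illustration only, not the presentation code of the experiment: the function name, the sign convention (negative for left turns), the seed, and the choice to shuffle over all 120 trials rather than within each block are assumptions.

```python
import random

# Six turn angles; negative values stand for left turns (assumed convention).
ANGLES_DEG = [-90, -60, -30, 30, 60, 90]
REPEATS_PER_ANGLE = 20   # 6 angles x 20 repeats = 120 trials per condition
BLOCK_SIZE = 60          # each condition is recorded in two 60-trial blocks


def make_condition_blocks(seed):
    """Return the turn angles of one condition as shuffled 60-trial blocks."""
    rng = random.Random(seed)
    trials = ANGLES_DEG * REPEATS_PER_ANGLE
    rng.shuffle(trials)  # randomized order of angles within the condition
    return [trials[i:i + BLOCK_SIZE] for i in range(0, len(trials), BLOCK_SIZE)]


blocks = make_condition_blocks(seed=1)
n_trials = sum(len(block) for block in blocks)
```

With four conditions this yields 4 × 120 = 480 trials per subject, matching the total given above.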

#### *Hardware*

In each of our four conditions, subjects were moving or standing on a customizable walkway consisting of two straight segments which are linked by a round platform with a turntable that can be locked in position (**Figure 1**). The straight segments have wheels attached and can be adjusted by the experimenter between trials to match the new path layout and serve as guide rails. Participants moved along the track inside of a wheeled walking frame in order to prevent deviations from the desired path.

**FIGURE 1 | Walkway.** We devised a flexible walkway consisting of two straight segments that are connected by a turnable platform. Subjects were wearing an EEG cap (not shown in the picture) and a Head Mounted Display, and (were) moved along this predefined track while being stabilized by a cart, carrying auxiliary equipment.

The cart additionally provided storage space for VR and EEG equipment (**Figure 2**).

The virtual environment was developed in Python-based WorldViz Vizard and conveyed via a head-mounted display (HMD, nVisor SX60, total horizontal field of view = 44°). Positions were tracked with the optical PPTX4 Precision Position Tracker system (WorldViz). We used two trackers: one was attached on top of the HMD to assess the subject's position, and the other was placed at the end of the second straight segment to determine the exact angle between the two segments prior to the beginning of each trial. The angle was calculated in relation to the center of the virtual environment, which had previously been set to the center of the turntable. Precise knowledge of the track and the required angle of the turn was essential, as this information was displayed to the subject via the guiding object in the virtual environment. The subjects' head orientation was tracked with an additional inertial orientation sensor (InterSense InertiaCube2+) that was directly attached to the HMD. In order to transfer the rotation information of the turntable to the displayed virtual environment, which was required for the kinesthetic condition, a second wireless inertial 3D motion tracker (Xsens MTw) was attached underneath the platform.

For adjusting the arrow at the end of each trial, subjects used a consumer gamepad (Microsoft Sidewinder Plug and Play Gamepad). All devices were connected to a laptop (Dell Precision M4700, i7-3720 2.6 GHz, 4 GB Ram, NVIDIA Quadro K2000M) that rendered the virtual environment, transmitted it to the HMD via a video control unit (NVIS) and simultaneously sent triggers to the EEG laptop (DELL Latitude E6230, i5-3320 2.6 GHz, 4 GB Ram).

#### **STATISTICAL ANALYSIS OF BEHAVIORAL DATA**

By using the law of sines, the correct homing angle, defined by the direct line between start and end position (see **Figure 3**), was calculated from the exact angle of the on-the-spot turn and the three vertices of the triangle: the start position, the position of the turn, and the subjects' end position where the answer was given. The correct answer was then subtracted from the estimated angle, yielding negative errors for responses that underestimated the correct angle. Such underestimation, also called *undershoot*, is indicated by an arrow that was not rotated far enough under the assumption that the shortest way was used. In **Figure 3** all blue arrows display underestimation behavior. Correspondingly, positive errors, shown in green, denote an arrow adjustment that ended beyond the correct angle under the assumption of the shortest route, which corresponds to so-called *overestimation behavior*. Overestimation of the correct answer angle, or overshoot, hence corresponds to an inward bias in a triangle-completion task. We will call this measure *relative error*, which captures the systematic bias of a subject by including negative and positive signs, and distinguish it from the *absolute error*. Absolute error simply refers to the magnitude of the error irrespective of whether the angle was over- or underestimated.
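The error definitions above can be made concrete with a short sketch. It is a minimal illustration, not the analysis code of the study: it obtains the correct homing angle from the three vertices with `atan2` rather than the law of sines (both describe the same triangle), and the function names, the counter-clockwise-positive sign convention, and the example coordinates are assumptions.

```python
import math


def homing_angle_deg(start, turn, end):
    """Signed rotation in degrees (counter-clockwise positive) from the final
    walking direction (turn -> end) to the direction pointing back to start."""
    heading = math.atan2(end[1] - turn[1], end[0] - turn[0])
    to_start = math.atan2(start[1] - end[1], start[0] - end[0])
    diff = math.degrees(to_start - heading)
    return (diff + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)


def relative_error_deg(estimated, correct):
    """Estimated minus correct angle, signed so that negative values mean
    undershoot and positive values overshoot on either turning side."""
    side = 1.0 if correct >= 0 else -1.0
    return side * (estimated - correct)


def absolute_error_deg(estimated, correct):
    """Magnitude of the error, ignoring over- versus underestimation."""
    return abs(estimated - correct)


# Hypothetical trial: 4 m straight ahead, 90 degree right turn, 4 m again.
correct = homing_angle_deg((0.0, 0.0), (0.0, 4.0), (4.0, 4.0))  # -135 up to rounding
error = relative_error_deg(-120.0, correct)  # arrow rotated 120 instead of 135 degrees
```

In this example the arrow stops 15° short of the correct angle, so the relative error is negative (undershoot) while the absolute error is 15°.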

In a next step the data were winsorized, setting all values beyond the 5th or 95th percentile to the nearest percentile in order to get robust estimates of the mean errors. This was done separately for each subject-condition combination to avoid raising the performance of a single subject or a single condition. Two 2 × 2 repeated measures analyses of variance (ANOVA) with mean relative error and variance of relative errors across trials as dependent variables were conducted in SPSS (IBM) with two factors: kinesthetic information (on/off) and vestibular information (on/off). As the most extreme winsorized relative errors were −69.9° and 41.4°, implying that the von Mises distribution can be approximated by a normal distribution (Mardia and Peter, 2000, p. 36), circular statistics were not required.
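Winsorizing as described above can be sketched as follows. The percentile definition (linear interpolation between order statistics) and the function names are assumptions; the software actually used may define percentiles slightly differently.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of an ascending list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def winsorize(values, lower_q=5.0, upper_q=95.0):
    """Clip values below/above the given percentiles to those percentiles."""
    ordered = sorted(values)
    low = percentile(ordered, lower_q)
    high = percentile(ordered, upper_q)
    return [min(max(v, low), high) for v in values]


# Made-up relative errors of one subject in one condition, with two outliers.
errors = [-100.0] + [float(x) for x in range(-9, 10)] + [100.0]
cleaned = winsorize(errors)
```

Applying the function to each subject-condition combination separately keeps one subject's extreme trials from shifting the limits used for another.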

Changes of variance across conditions were investigated by computing multiple-sample and pairwise Levene's tests on the unwinsorized data of all five subjects individually. This assessment is of special interest to us as a difference in variance could be regarded as change in stochastic error. Furthermore, we correlated the absolute errors with the trial numbers to check for general performance improvements. To detect a potential change in the over-/underestimating behavior we additionally correlated the relative errors with the trial numbers.
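For orientation, the statistic behind a Levene-type test can be sketched in a few lines. The version below centers each group on its median (the Brown-Forsythe variant, often described as the robust form); whether this matches the exact variant used here is an assumption, and the function name and data are hypothetical. The resulting statistic W is compared against an F distribution with (k − 1, N − k) degrees of freedom, which is not computed in this sketch.

```python
from statistics import mean, median


def brown_forsythe_w(groups):
    """Levene-type W statistic using absolute deviations from group medians."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group median.
    z = [[abs(y - median(g)) for y in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Made-up relative errors of one subject in two conditions.
same_spread = brown_forsythe_w([[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]])
different_spread = brown_forsythe_w([[1.0, 2.0, 3.0, 4.0, 5.0],
                                     [-10.0, 0.0, 10.0, 20.0, 30.0]])
```

Groups that differ only in their mean give W close to zero, whereas groups with different spread give a large W.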

#### **PHYSIOLOGICAL METHODS**

#### *Recording and preprocessing*

Electrophysiological data were recorded using 128 Ag/AgCl electrodes which were placed according to the 5% international system (Oostenveld and Praamstra, 2001). We kept scalp impedances below 10 kOhm and sampled EEG data at 1024 Hz using an average reference (asalab, ANT, Netherlands) with the ground electrode placed on the forehead. The electrode positions were digitized using a 3D positioning device (Xensor, ANT, Netherlands). We used passive electrodes that are actively shielded (Waveguard, ANT, Netherlands), which minimizes cable sway and line noise artifacts.

We analyzed the EEG data with custom scripts using MATLAB (Mathworks) and EEGLAB (v12, SCCN, Delorme and Makeig, 2004). The data were resampled to 256 Hz and filtered with a 1 Hz high-pass (−6 dB cutoff: 0.5 Hz, 1 Hz transition bandwidth) and a 120 Hz low-pass (−6 dB cutoff: 124 Hz, 8 Hz transition bandwidth) FIR filter (EEGLAB, firfilt plugin from Widmann). In order to counter line noise, a notch FIR filter between 48 and 52 Hz (−6 dB cutoff: 49 Hz, 51 Hz, 2 Hz transition bandwidth) was applied. The data were visually cleaned for strong artifacts resulting from electrical noise and strong muscle artifacts. On average 9.5% of trials (range = [0% 26.7%]) were excluded from the analysis. Moreover, channels with extreme noise or signal drop-off were removed. On average 4.25 channels (range = [2 13]) were excluded. The data were re-referenced to the new average of all remaining data channels. As re-referencing to the average introduces a linear dependency among the channels, channel IZ was excluded in all subjects to obtain a full-rank data matrix. The AMICA algorithm (version 12, Palmer et al., 2008) was applied with standard parameters except for the addition of automatic rejection of unlikely data. In total we obtained 5485 independent components (ICs).
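The reason for dropping one channel after average referencing can be illustrated with a toy example. After subtracting the across-channel mean, the channels sum to zero at every sample, so any single channel equals the negative sum of the others and adds no independent information. The function name and the data below are hypothetical.

```python
def average_reference(data):
    """Subtract the across-channel mean from every channel at each sample.

    data: list of channels, each a list of samples of equal length.
    """
    n_channels = len(data)
    n_samples = len(data[0])
    means = [sum(ch[t] for ch in data) / n_channels for t in range(n_samples)]
    return [[ch[t] - means[t] for t in range(n_samples)] for ch in data]


# Toy data: 3 channels x 4 samples (made-up values).
raw = [[1.0, 2.0, 3.0, 4.0],
       [0.5, 0.0, -0.5, 1.0],
       [2.0, 2.0, 2.0, 2.0]]
referenced = average_reference(raw)

# The channels now sum to (numerically) zero at each sample, so the last
# channel can be rebuilt from the remaining ones.
column_sums = [sum(ch[t] for ch in referenced) for t in range(4)]
rebuilt_last = [-(referenced[0][t] + referenced[1][t]) for t in range(4)]
```

Removing one channel after re-referencing therefore loses no information but restores a full-rank matrix for the decomposition.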

Dipoles of each IC topography were fitted using the DIPFIT toolbox (Oostenveld and Oostendorp, 2002) and a standard Boundary Element Method (BEM) head model. Individual electrode positions were warped to fit the template. When the explained dipole variance was less than 85%, or the source localization was outside of the brain, indicating neck muscle or eye artifacts, ICs were excluded from the analyses. In total, the remaining set of ICs consisted of 1807 components.

#### *ERSPs and clustering*

After epoching the data from 20 s before the turn to 12 s after the turn, event related spectral perturbations (ERSPs) were calculated using Morlet wavelets with three cycles at the lowest frequency, linearly increasing to 75 cycles at 50 Hz. We accounted for different trial lengths by linearly warping the ERSPs in the time domain (see Gwin et al., 2010). The duration of the central part of the first straight leg and the complete turn was warped to a constant time span of 2.5 and 4.25 s, respectively. After warping, we applied single trial normalization (Grandchamp and Delorme, 2011), i.e., we divided each point in time during the turn segment by the mean log power of the baseline segment of that specific trial, which is the central part of the first straight leg. Finally, trial average ERSPs were calculated to serve as input for subsequent clustering.
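The two steps described above, linear time warping and single-trial baseline normalization, can be sketched for one frequency bin of one trial. This is a simplified illustration and not the EEGLAB implementation: the function names are hypothetical, and the normalization is expressed here as a dB ratio against the trial's own mean baseline power, which is one common form and may differ in detail from the operation used in the study.

```python
import math


def warp_linear(samples, n_out):
    """Resample a sequence to n_out points by linear interpolation."""
    if n_out < 2 or len(samples) < 2:
        raise ValueError("need at least two input and two output samples")
    step = (len(samples) - 1) / (n_out - 1)
    warped = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), len(samples) - 2)
        frac = pos - lo
        warped.append(samples[lo] + frac * (samples[lo + 1] - samples[lo]))
    return warped


def baseline_normalize_db(turn_power, baseline_power):
    """Express turn-segment power in dB relative to the mean baseline power."""
    base = sum(baseline_power) / len(baseline_power)
    return [10.0 * math.log10(p / base) for p in turn_power]


# Made-up power values: a turn of 5 samples warped to a fixed length of 3.
turn_warped = warp_linear([0.0, 1.0, 2.0, 3.0, 4.0], 3)
turn_db = baseline_normalize_db([2.0, 4.0], [1.0, 3.0])
```

Warping every trial's segments to the same number of samples is what allows trials of different duration to be averaged point by point.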

ICs were grouped into functional and anatomical clusters to allow a comparison of components over subjects and sessions. For this purpose, principal component analysis (PCA) was applied to reduce measures (ERSP, scalp maps, and dipole location) into a joint measure space. We used the standard EEGLAB k-means clustering to obtain functional clusters of ICs over subjects. IC clustering parameters from previous studies (Gramann et al., 2010) were used: 3D dipole locations were weighted by a factor of 15, ERSPs were reduced by PCA to 10 dimensions, normalized and weighted by a factor of 4. As an additional measure, IC topographies were reduced to 10 dimensions, normalized and weighted with 1. Finally, a PCA dimension reduction to 10 dimensions was applied to the joint measure space. This combined joint measure space over all subjects was clustered with a robust k-means algorithm into 25 clusters plus an additional one that contained outliers deviating by more than 3 standard deviations.

#### *Cluster-stability test*

In order to test the robustness or stability of our clusters, we compared them against the null hypothesis (H0) that they are not stable; in other words, as k-means clustering always returns k clusters, we have to make sure that each cluster is not a random result. One thousand bootstrap samples with 1807 ICs in each sample were drawn with replacement from the set of all ICs. The same clustering procedure (as described above) was used in order to get a bootstrap distribution of clusters. We then calculated the maximal overlap of the bootstrapped cluster components with the originally observed cluster components. The overlap was calculated as the number of identical components in both clusters. We also calculated a normalized overlap where we removed multiple identical components in the bootstrap clusters. This did not change the results. Afterwards, we calculated the H0-distribution by assuming that the clusters were randomly arranged in the brain: The same bootstrapping procedure was used, but we randomly applied cluster labels to the ICs, assigning them to random clusters. In a last step we tested both distributions of overlap values against each other with an unpaired *t*-test. All clusters were significantly different from their H0-distribution (*p* < 0.001). As we still found the same clusters after resampling, we showed that our clusters were stable and not randomly assigned.
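The overlap measure and the label-shuffling null described above can be sketched as follows. The sketch leaves out the re-clustering of each bootstrap sample and only shows how overlaps might be computed once clusters are given as lists of IC identifiers; all names and numbers are hypothetical. Because sets are used, repeated components in a bootstrap cluster count once, which corresponds to the normalized overlap mentioned above.

```python
import random


def max_overlap(original_cluster, bootstrap_clusters):
    """Largest number of shared IC ids between one original cluster and any
    of the bootstrap clusters (duplicates within a cluster count once)."""
    original = set(original_cluster)
    return max(len(original & set(c)) for c in bootstrap_clusters)


def shuffle_labels(clusters, rng):
    """Null model: keep the cluster sizes but assign ICs to clusters at random."""
    pooled = [ic for cluster in clusters for ic in cluster]
    rng.shuffle(pooled)
    shuffled, start = [], 0
    for cluster in clusters:
        shuffled.append(pooled[start:start + len(cluster)])
        start += len(cluster)
    return shuffled


# Made-up example: one original cluster and three bootstrap clusters.
original = [1, 2, 3, 4]
bootstrap = [[1, 2, 9], [3, 8], [7]]
observed = max_overlap(original, bootstrap)
null_clusters = shuffle_labels(bootstrap, random.Random(0))
null_value = max_overlap(original, null_clusters)
```

Repeating both computations over many bootstrap samples gives the two distributions of overlap values that are then compared.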

#### *ROI analysis*

As described above, we were interested in cortical alpha band modulation during the turn, which has been shown to be sensitive to spatial updating (Gramann et al., 2010). We defined a region of interest (ROI) in time-frequency space consisting of the turn from 0 to 4250 ms and the alpha band with its well-established borders of 8 and 12 Hz. Then we analyzed the clusters based on their component ERSP ROI activations. To check whether the activations in the ROI differ significantly from zero, we Monte Carlo resampled data points with replacement 1000 times and calculated their means. Finally, we calculated the *p*-value by dividing the number of values that are larger (respectively smaller) or equal to the observed mean by the total number of values. To compensate for two-sided testing, resulting values were multiplied by two to get the respective *p*-values.
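A resampling test of this kind can be sketched as below. The sketch follows one possible reading of the procedure, in which the tail is the share of bootstrap means lying at or beyond zero on the side opposite to the observed mean; this reading, the function name, the seed, and the data are assumptions.

```python
import random
from statistics import mean


def bootstrap_p_value(values, n_boot=1000, seed=0):
    """Two-sided bootstrap p-value for the mean of `values` differing from zero."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and keep the mean of every resample.
    boot_means = [mean(rng.choices(values, k=n)) for _ in range(n_boot)]
    if mean(values) < 0:
        tail = sum(m >= 0 for m in boot_means)
    else:
        tail = sum(m <= 0 for m in boot_means)
    # Doubling compensates for two-sided testing; cap the result at 1.
    return min(1.0, 2.0 * tail / n_boot)


# Made-up ROI activations (dB) of the components in one cluster.
clear_decrease = bootstrap_p_value([-3.0, -2.5, -2.0, -3.5, -2.8])
no_effect = bootstrap_p_value([-1.0, 1.0, -2.0, 2.0, -0.5, 0.5])
```

A result of zero simply means that none of the resampled means crossed zero, which would be reported as *p* < 1/1000 rather than as exactly zero.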

To test for differences between conditions, we used the EEGLAB "statcond" function (Delorme and Makeig, 2004) and applied a non-parametric permutation-based 2 × 2 ANOVA to our data. For *post-hoc* investigations of differences between conditions we used permutation-based unpaired *t*-tests.
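The logic of a permutation-based unpaired comparison can be illustrated with a minimal sketch. It is not the `statcond` implementation: it uses the absolute difference of means as test statistic instead of a *t*-value, and the function name, number of permutations, seed, and data are assumptions.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in means of two groups."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of condition labels
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Counting the observed labeling itself avoids a p-value of exactly zero.
    return (hits + 1) / (n_perm + 1)


# Made-up ROI activations (dB) of components in two conditions.
passive = [0.1, 0.2, 0.0, -0.1, 0.3, 0.1, -0.2, 0.2]
vestibular = [-5.0, -5.2, -4.8, -5.1, -4.9, -5.3, -5.0, -4.7]
p_different = permutation_p_value(passive, vestibular)
p_identical = permutation_p_value([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

Because the null distribution is built from the data themselves, no assumption about the shape of the underlying distribution is needed.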

#### *Cluster selection*

To select those clusters that are informative for our hypothesis, we deployed the following strategy: First, artifactual clusters representing muscular, oculomotor, or cardiac activities were identified by their dipole locations and spectra. We excluded these clusters from further analyses. The remaining clusters were screened for modulations in the alpha band during spatial updating that were similar to those previously reported (Gramann et al., 2010). Further details on cluster selection will be given in the results section.

#### **RESULTS**

#### **BEHAVIORAL RESULTS**

In the following section, we will compare subjects' pointing errors across the four different conditions when performing a modified triangle-completion task.

The average trial duration including the time of the response was 36.4 s. Traversing the straight segments required on average 7.8 s for the first and 7.6 s for the second segment. The mean duration of all turns was 5.0 s and the average response time for rotating the arrow was 7.5 s (passive condition: 6.2 s, kinesthetic condition: 8.3 s, vestibular condition: 7.9 s, active condition: 7.7 s). Pairwise *t*-tests resulted in significant differences between the passive and all other conditions (against kinesthetic: *p* = 0.003, against vestibular: *p* = 0.015, against active: *p* = 0.010). However, the reduced response times in the passive condition are not surprising, as the subjects did not have to stop the cart (kinesthetic/active condition) or wait for the cart to be stopped (vestibular condition) in order to answer.

After classifying all trials into either Turner or Non-Turner responses, we found that 98.8% of all given answers were closer to the optimal Turner response. It seems reasonable to assume that the remaining 1.2% of all trials were highly erroneous trials rather than genuine Non-Turner responses. Therefore, we conclude that regardless of their previously determined preference, our subjects responded in an egocentric reference frame in the present study.

The averages of the absolute errors over trials were 13.3°, 14.3°, 12.8°, and 12.2° in the passive, kinesthetic, vestibular, and active condition, respectively. Whether the absolute errors differed significantly between conditions was assessed by a 1 × 4 and a 2 × 2 repeated measures ANOVA. Neither test showed any significant effects.

Means over subject averages and bootstrapped 95% confidence intervals of relative errors for the four conditions were −8.6° [−21.2°, −3.5°], −1.1° [−13.8°, 6.0°], −6.3° [−11.8°, −0.7°], and −7.4° [−13.3°, −1.7°]. The negative sign indicates a tendency toward undershooting the correct answer angle in all but the kinesthetic condition.

In **Table 1**, means of the systematic relative error and standard deviations over all trials are shown for each subject in each condition. The table reveals rather heterogeneous behavior between subjects concerning angle estimation accuracy and performance changes across conditions.

Correlations of the absolute errors with the trial numbers showed an improvement in performance over the whole experimental course in four of five subjects (S1: *r* = −0.387, *p* < 0.001; S2: *r* = −0.308, *p* < 0.001; S4: *r* = −0.326, *p* < 0.001; S5: *r* = −0.139, *p* = 0.003; S3, ns: *r* = −0.068, *p* = 0.150).

The undershooting behavior of four of the five subjects was similarly reduced over time according to the correlations between relative errors and trial numbers (S1: *r* = 0.464, *p* < 0.001; S3: *r* = 0.385, *p* < 0.001; S4: *r* = 0.359, *p* < 0.001; S5: *r* = 0.483, *p* < 0.001; S2, ns: *r* = 0.068, *p* = 0.189). Thus, subjects showed small learning effects after prior training.

A 2 × 2 repeated measures ANOVA with kinesthetic sensory information (on/off) and vestibular sensory information (on/off) as factors and the mean relative errors over all 120 trials for each subject and condition as dependent variable revealed a significant interaction [*F*(1, 4) = 24.42, *p* = 0.008, partial η<sup>2</sup> = 0.859] but no significant main effects [kinesthetic: *F*(1, 4) = 2.661, *p* = 0.178, partial η<sup>2</sup> = 0.243; and vestibular: *F*(1, 4) = 0.337, *p* = 0.593, partial η<sup>2</sup> = 0.078]. In each factor level, the dependent variable was normally distributed.

The interaction plot (**Figure 4**) shows that the subtraction of vestibular information from active walking resulted in a much less pronounced or nearly absent bias toward underestimating the correct homing angle in the kinesthetic condition. In contrast to this, the subtraction of vestibular information from the vestibular to the passive condition did not evoke such a drastic change. The kinesthetic condition, therefore, seems to be special in regard to the systematic performance error of our subjects. However, multiple comparison corrected pairwise *t*-tests between individual conditions yielded no significant differences.

With subject-specific variances of relative errors across trials as dependent variable, the 2 × 2 repeated measures ANOVA yielded no significant effects. In order to examine how random errors of individual subjects differed across conditions, we calculated robust multiple-sample Levene's tests for equal variances for each subject. The tests indicated that variances were heterogeneous in at least two conditions for only three subjects. After conducting pairwise Levene tests for each subject, we found diverging multiple comparison corrected significant differences (α = 0.0083) for the different subjects. The effects were diverging as they were either only detected in single subjects (for passive-kinesthetic, passive-vestibular, and kinesthetic-vestibular condition comparisons) or otherwise were present in two participants but contradicting each other as the differences of the respective two variances had opposite signs. Subject 4 exhibited a higher variance and thereby stochastic error in the passive condition compared with the active condition, whereas for subject 1 it was exactly the other way round. These results let us conclude that a change in condition does not lead to a systematic change in stochastic error in our group of five subjects.

**Table 1 | Mean errors and standard deviations [mean (std)] for each participant in every condition.**

#### **EEG RESULTS**

During the experiment, subjects needed to update their spatial heading and position to point back to their starting position. We expected to detect the strongest effects of spatial updating processes during the turn. In order to investigate alpha-band related modulation during the turn we calculated time-frequency decompositions (ERSPs) of our EEG data.

The EEG data were clustered into 25 individual clusters, not only to remove artifactual components, but also to identify separate electrophysiological processes. Due to the low number of participants we can only claim statistical evidence for our group and not the whole population.

We visually inspected the cluster spectra and dipole locations for the purpose of locating non-neural artifact clusters and identified six stereotypical muscle clusters, one heart and one eye artifact cluster. We also detected one theta-midline cluster (Onton et al., 2005). Seven further clusters did not show any sign of specific alpha modulation in the ROI and we could not classify them as other electrophysiological processes (see **Table 1**, Supplementary Material). We excluded these and the previously mentioned artifact clusters from further analyses.

The remaining nine clusters [Occipital Medial (OM), Occipital Left (OL), Occipital Right (OR), Parietal Left (PL), Parietal Medial (PM), Parietal Right (PR), Motor Left (ML), Motor Right (MR) and Fronto-Parietal (FP)] were analyzed in more detail. **Table 2** shows the coordinates of the cluster centroids and their localization in the brain; they are, due to the large spread, not necessarily representative for the exact location of the underlying source (Akalin Acar and Makeig, 2013). All nine clusters can be seen in the left section of **Figure 5**. The ERSPs of PR, MR, and OR are not shown as they exhibit similar (PR) to identical (OR, MR) patterns over conditions to their contralateral equivalent.

As a first step of analyzing the selected clusters, we looked at individual cluster data pooled over all conditions in order to investigate alpha modulation in the ROI. We observed a significant alpha decrease in all occipital/parietal clusters as the bootstrapped means were significantly different from zero (*p* < 0.001). This replicates the findings of previous studies (Gramann et al., 2010; Plank et al., 2010; Chiu et al., 2012). Remarkably, we do not find significant alpha decrease in the other, more anterior clusters (MR: *p* = 0.518, ML: *p* = 0.092, and FP: *p* = 0.306).

A main question of the study was the investigation of alpha modulation between different conditions. We therefore pooled over all clusters that individually showed a significant alpha decrease (OM, OL, OR, PM, PL, PR). A significant effect of the factor vestibular (*p* = 0.017) and a significant interaction (*p* = 0.006) were found using a bootstrapped 2 × 2 ANOVA. *Post-hoc* comparisons with Monte Carlo permutation unpaired *t*-tests showed a significant difference of the passive against the kinesthetic condition (*p* = 0.039), the passive against the vestibular condition (*p* = 0.001), and the active against the vestibular condition (*p* = 0.017). These results indicate that the passive condition does not generalize to all other conditions, as the kinesthetic and the vestibular conditions go along with alpha suppression that is stronger than in the passive condition.

In order to check whether we find differences in single clusters between at least two conditions, we split the data into clusters and conditions and applied a permutation-based ANOVA. The factor kinesthetic and the interactions were significant in two of the three anterior clusters [ML(Kinesthetic): *p* < 0.001, ML(Interaction): *p* = 0.011, FP(Kinesthetic): *p* < 0.024, FP(Interaction): *p* = 0.005]. A significant effect of the factor vestibular was found in one of the posterior clusters [OM(Vestibular): *p* = 0.023]. No other significant effects were detected.

**Table 2 |** *X, Y, Z* **coordinates in Talairach space (Lancaster et al., 2000) of the cluster centroids and their localization in the brain.**

Subsequently, we ran *post-hoc* tests in order to examine which conditions were pairwise different. For cluster ML, *post-hoc* permutation tests showed a significant difference between passive and vestibular (*p* = 0.018), passive and active (*p* = 0.005), and vestibular and active (*p* = 0.002). For cluster FP, we found significant differences between passive and vestibular (*p* = 0.039), passive and active (*p* = 0.004), kinesthetic and active (*p* = 0.036), and vestibular and active (*p* = 0.007). In cluster OM, we found a significant effect of passive vs. vestibular (*p* = 0.012) and kinesthetic vs. vestibular (*p* = 0.030).

Summarizing the cluster effects, we identify the following pattern: In posterior regions, the passive condition shows the weakest alpha modulation with slightly higher desynchronization in the active condition—whereas the kinesthetic and vestibular conditions display stronger modulations and therefore strong alpha desynchronization.

In more anterior regions, the fronto-parietal clusters, we see alpha synchronization in the vestibular condition, but desynchronization in the active condition. This pattern is visible in both clusters ML and MR, but only significant in ML. We conclude that differences between conditions were accompanied by significant ERSP alpha modulations in occipital and parietal regions.

#### **DISCUSSION**

Our study was designed with the aim to investigate the influence of different types of sensory information on EEG correlates of spatial navigation. By manipulating the availability of kinesthetic and vestibular input, we demonstrate that task-related brain activation is indeed modulated depending on the access to different sensory modalities.

We reproduced findings of earlier studies (Gramann et al., 2010; Plank et al., 2010; Chiu et al., 2012) that had shown a modulation of the alpha band in different brain areas during the turn in a triangle completion task. Furthermore, we demonstrated that incongruent information results in a modulation of alpha suppression. Depending on whether kinesthetic or vestibular information is given, medial and frontal areas show ambiguous patterns of synchronization or desynchronization. These observations reveal significant differences between the passive condition, as usually employed in laboratory setups, and the other conditions involving locomotion.

In this study, we relied on independent component analysis with subsequent source localization. Only afterwards did we pool the data (dipoles) of all subjects. This is an efficient way to deal with the small sample size. Clustering was performed by a k-means algorithm resembling the procedure in Gramann et al. (2010).
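As an illustration of the clustering step, here is a minimal sketch of k-means (Lloyd's algorithm) applied to pooled dipole locations. It assumes that only the three spatial coordinates enter the distance measure, which may differ from the feature set actually used in the study; the coordinates, the function name `kmeans`, and its parameters are hypothetical.

```python
import math
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm on 3-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical dipole locations (x, y, z in mm) pooled over subjects:
# three occipital-like and three motor-like sources.
dipoles = [
    (-2, -85, 10), (3, -80, 14), (0, -88, 8),
    (-40, -20, 50), (-36, -25, 48), (-42, -18, 55),
]
centroids, labels = kmeans(dipoles, k=2)
print(labels)
print(centroids)
```

Each resulting centroid is the mean location of the components assigned to it, which corresponds to the kind of cluster centroid reported per cluster.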

Given the experimental setup and results, there are some issues to be discussed. One might argue that the sensory impression generated by the kinesthetic condition could be artificial. Yet, it was designed in a way that the conveyed impression was as natural as possible. When using the cart and the turnable platform, the experience felt close to an on-the-spot-turn with a fixed cart, as the visual input was directly linked to the motion of the platform by an orientation sensor. We thus assume that the setup was effective in providing the desired impression of kinesthetic information in addition to vision.

projected to an MNI standard brain. Log-frequency ERSPs are shown in the central four columns. Blue denotes a decrease and red an increase in EEG power compared with baseline. ERSPs are not shown for PR, MR, and OR; they strongly resemble the pattern of their contralateral equivalents PL, ML, and OL. Boxplots in the right column depict the mean, cluster-wise ERSP activity in the alpha band (8–12 Hz) during the turn for each condition. In

significant alpha band effects between vestibular and passive, vestibular and active, and active and passive conditions. ERSP alpha activity of cluster FP shows a significant difference between the kinesthetic and active condition. The clusters are labeled as follows: OM, Occipital Medial; OL, Occipital Left; OR, Occipital Right; PL, Parietal Left; PM, Parietal Medial; PR, Parietal Right; ML, Motor Left; MR, Motor Right; and FP, Fronto-Parietal.

Another issue and potentially confounding factor is the presence of active self-conducted and not passively initiated movement. In both the active and kinesthetic conditions, subjects had full control over their movement in the environment. In contrast to this, they had no self-control over the movement in the passive and vestibular condition. This could have been avoided by enabling the subject to navigate via joystick in the passive condition. However, in the vestibular condition active control would be more difficult to achieve. Hence, we have to bear in mind that the kinesthetic and the active condition not only include information about muscle movements, but also include cognitive processing involved in action generation as well.

Furthermore, the EEG clustering was performed not based on individual MRIs, but in a common brain space. Subsequently, we made statistical inferences on cluster level with ICs as independent measures. This implies that our statements have to be understood restricted to the specific set of subjects investigated. Future studies can improve on this situation by recruiting a representative sample of the general population and utilizing individual MRIs providing information about individual differences in subject's brain structures (Akalin Acar and Makeig, 2013).

By testing all subjects in a classical, online homing task (Goeke et al., 2013) prior to the main experiment, three of them were classified as Turners and two as Non-Turners. Subsequently, subjects exercised all four conditions of the main experiment until they felt comfortable. In the course of the actual recordings, all subjects displayed Turner behavior, i.e., the use of an egocentric reference frame, which is in line with Klatzky et al. (1998). This applied even to the two subjects that had been previously classified as Non-Turners. Their interpretation of a passive condition, akin to the typical laboratory setup, might have been influenced by performing the same task actively during the accommodation phase. Specifically, the training included active components, leading to a conflict between the movement and the mentally constructed allocentric maps. This conflict might have initiated the switch of the Non-Turner's reference frame from allo- to egocentric. Therefore, the concept of Turner and Non-Turner behavior can be interpreted as a differential involvement of distinct navigation modes in virtual environments.

We observed no distinct relationship between performance and the previously classified preferred reference frame. The classification thus informs about individual preferences in spatial navigation, but not necessarily about performance in the real world (see also Klatzky et al., 1998; Goeke et al., 2013).

The overall behavioral results depict a trend toward underestimating the correct answer angle. One possible explanation could be the fact that throughout the whole experiment the arrow was initially oriented such that it was pointing away from the participant. Just as a matter of convenience or impatience, subjects might have released the button too early and thereby submitted a slightly biased, undershooting arrow adjustment. Future experiments should take this into consideration and randomize the initial orientation of the angle. An alternative explanation is that subjects simply overestimated the size of the turns.

Nevertheless, we observe a significant interaction with kinesthetic being the only condition that includes zero in the 95% bootstrapped confidence interval. As pairwise *t*-tests do not show significant differences between conditions, we can only discuss the descriptive behavioral differences. The mean relative errors of our subjects suggest a tendency toward reduced undershooting behavior in the kinesthetic condition, which is in line with the results of Frissen et al. (2011). If their findings directly transfer to our subjects, the conflicting zero-movement input from the vestibular system would lead to an underestimation of the turned angle, which should therefore elicit overshooting of the correct homing angle. An alternative explanation could be a strategy switch. In principle, reaction times could give an indication of a strategy switch. Indeed, we observe variation in reaction times with the fastest responses in the passive condition. However, in this condition subjects did not need to come to a halt. In the kinesthetic and active condition subjects had to stop by themselves and in the vestibular condition the experimenters stopped the cart. These differences can easily explain the observed variations of reaction time. Hence, interpreting the reaction time data is difficult, which leads us to refrain from a strong statement on the possibility of a strategy switch.
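The criterion of whether zero lies inside a 95% bootstrapped confidence interval can be sketched as follows. This is a percentile bootstrap of the mean, which is one common variant and an assumption here; the relative-error values are invented for illustration and are not the study's data.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    # Resample with replacement and collect the mean of each resample.
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical mean relative errors (negative = undershooting).
undershooting = [-0.20, -0.15, -0.25, -0.10, -0.18]
near_zero = [-0.10, 0.08, -0.05, 0.12, -0.03]

for name, errors in [("undershooting", undershooting), ("near zero", near_zero)]:
    lower, upper = bootstrap_ci(errors)
    includes_zero = lower <= 0.0 <= upper
    print(f"{name}: CI = [{lower:.3f}, {upper:.3f}], includes zero: {includes_zero}")
```

A condition whose interval contains zero shows no reliable bias in either direction, whereas an interval lying entirely below zero indicates systematic undershooting.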

We did not detect consistent improvement in stochastic error in the active condition compared with the passive condition. This means that more complete and congruent navigational information did not improve the capability to choose the correct homing angle. Related results were reported by Grant and Magee (1998), who found that participants did not differ in performance when they were either actively navigating or navigating only by operating a joystick. Their performance had improved only when they carried out their task in real instead of virtual environments.

Conversely, other studies provide evidence that adding vestibular information should lead to an improved performance. Chance et al. (1998), for example, showed decreased accuracy in estimating the directions of object locations when vestibular and kinesthetic information were missing. In another study by Kearns et al. (2002), participants performed a triangle completion task either provided with visual information only, or with full bodily information from active walking. As a consequence of introducing additional kinesthetic and vestibular information, the variability of homing angle estimates decreased and general answer patterns shifted from under- to overestimation behavior. Although some participants of our study performed best in the active condition, no clear trend emerged from our data. This could be explained by a ceiling effect potentially resulting from low task difficulty.

One of our main goals was to test whether the neural correlates of spatial navigation determined in previous static EEG experiments would generalize to a more active setting. Comparing our results to past findings reveals similarities but also provides an extension of the previous literature. Four of the nine clusters from our study closely reproduce four of the seven clusters reported by Gramann et al. (2010). These clusters show a highly similar alpha pattern and centroid location. The similar clusters are OM, PM, MR, and PL. The cluster centroids deviate to some degree between studies (average deviation in Talairach space: *x*: 3.75, *y*: 1.75, *z*: 9.25). This might be due to the fact that Gramann et al. visually inspected all ICs before clustering, whereas we included all computed components for clustering, and therefore added more noise to the clusters. Nonetheless, the intersection is large and clusters can be related easily. This means that the underlying sources seem to be reliable and generalize to mobile setups. The remaining five clusters are either the respective mirroring hemispheric clusters (three of five) or are new observations.

In general, the passive condition reproduces the findings by Gramann et al. (2010): Alpha activation was decreased during the turn in occipital, temporal, and parietal clusters. This suppression is thought to represent active processing and stronger cortical excitability (Pfurtscheller and da Silva, 1999; Klimesch et al., 2007) and as it was found during the turn, we conclude that the participants used more cortical resources while spatial updating was most demanding.

Even though we made some critical changes in the experimental design—we changed body position from sitting to standing, we changed the environment from a tunnel design to a starfield, we used an immersive 3D HMD instead of a computer screen, and most importantly, we used an on-the-spot-turn instead of a curved path—our results from the passive condition are nearly identical to Gramann et al.'s (2010) by means of alpha pattern and cluster locations. Due to these similarities, we argue that alpha suppression during spatial updating in a triangle completion task is a general phenomenon independent of certain changes in experimental setups.

We recorded EEG not only in the passive condition, but also in conditions where we manipulated whether kinesthetic and vestibular sensory information was provided. In posterior clusters, we found the strongest desynchronization in the vestibular and kinesthetic conditions—those that provide incongruent information about the path traveled. Conversely, in the passive condition, only moderate alpha suppression was present. In the kinesthetic and vestibular condition, the brain might need more resources to integrate partially contradictory information, like the lack of kinesthetic information or the zero-movement input from the vestibular system. The observed enhancement of alpha desynchronization could be a result of such an increased demand of resources. This pattern is present when all posterior clusters are taken into account and therefore could indicate ongoing integration processes as the parietal lobe is a prominent area for spatial navigation (Stein, 1989; Frings et al., 2006; Wolbers et al., 2007; Gramann et al., 2010) and multimodal integration (Bremmer et al., 2001). This is compatible with the observed activity in the posterior clusters, which ultimately show differential activity with different available modalities.

A different pattern emerges in anterior clusters (ML, MR, FP). The proximity to motor cortices suggests that the synchronization patterns in those clusters can be classified as mu rhythm, which is in the range of 8–12 Hz and known to be desynchronized during movement (Arroyo et al., 1993); this might account for the strong desynchronization in the active condition. In contrast to the other three conditions, the vestibular condition produces synchronization in those clusters, which might result from the absence of active movement while the participants were passively moved through space. Taken together, the availability of kinesthetic and vestibular information significantly influences the pattern of alpha activity in cortical clusters.

#### **CONCLUDING REMARKS**

In this paper, we reproduced and extended previous results of Gramann et al. (2010). When only visual information was provided, we detected similar alpha band suppression during the turn of a modified triangle completion task in occipital, temporal, and parietal areas. We extended these results by providing vestibular and kinesthetic information in combination or as single, isolated sources of information. The observed difference in alpha modulations in these additional conditions demonstrates that static experiments, providing only purely visual information, omit important aspects of spatial navigation. We therefore claim that it is necessary to construct more realistic and life-like experiments to clarify the actual neural correlates behind spatial navigation. Due to rapid advances in the development of experimental equipment, this objective might become even easier to achieve in the course of the next couple of years. In regard to future studies, our approach can be applied to more complex spatial navigation tasks, like way-finding or maze tasks.

With our work, we have provided first insights into the complete picture of underlying processes and conclude that the presence of additional sensory information significantly modulates neural correlates of spatial navigation.

#### **AUTHOR CONTRIBUTIONS**

Anna L. Gert, Benedikt V. Ehinger, Felix Weber, Gordon Pipa, Lilli Kaufhold, Petra Fischer, Peter König designed the study. Anna L. Gert, Benedikt V. Ehinger, Felix Weber, Lilli Kaufhold, Petra Fischer recorded the data. Anna L. Gert, Benedikt V. Ehinger, Lilli Kaufhold, Petra Fischer did the analysis. Anna L. Gert, Benedikt V. Ehinger, Gordon Pipa, Petra Fischer, Peter König wrote and revised the manuscript.

#### **ACKNOWLEDGMENTS**

The authors would like to thank Maria Marchante Fernandez for her help designing the project. This project was supported by Cognition and Neuroergonomics/Collaborative Technology Alliance #W911NF-10-2-0022 (Peter König) and ERC-2010-AdG #269716 - MULTISENSE (Peter König).

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00071/abstract

**Supplementary Figure S1 | Starfield and guiding object.** Subjects followed a small, spherical guiding object (blue) indicating the predefined path and turn. The starfield consisted of randomly distributed dots (red), which faded out within 20 m viewing distance.

**Supplementary Figure S2 | Response arrow during answer period.**

#### **REFERENCES**


with navigation performance. *Psychophysiology* 49, 43–55. doi: 10.1111/j.1469-8986.2011.01270.x


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 07 November 2013; accepted: 29 January 2014; published online: 25 February 2014.*

*Citation: Ehinger BV, Fischer P, Gert AL, Kaufhold L, Weber F, Pipa G and König P (2014) Kinesthetic and vestibular information modulate alpha activity during spatial navigation: a mobile EEG study. Front. Hum. Neurosci. 8:71. doi: 10.3389/fnhum.2014.00071*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Ehinger, Fischer, Gert, Kaufhold, Weber, Pipa and König. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# It's how you get there: walking down a virtual alley activates premotor and parietal areas

#### *Johanna Wagner<sup>1</sup>, Teodoro Solis-Escalante<sup>1,2</sup>, Reinhold Scherer<sup>1,3</sup>\*, Christa Neuper<sup>1,4</sup> and Gernot Müller-Putz<sup>1</sup>*

*<sup>1</sup> Laboratory of Brain-Computer Interfaces, Institute for Knowledge Discovery, BioTechMed, Graz University of Technology, Graz, Austria*

*<sup>2</sup> Department of Biomechanical Engineering, Delft University of Technology, Delft, Netherlands*

*<sup>3</sup> Rehabilitation Clinic Judendorf-Strassengel, Judendorf-Strassengel, Austria*

*<sup>4</sup> Department of Psychology, BioTechMed, University of Graz, Graz, Austria*

#### *Edited by:*

*Klaus Gramann, Berlin Institute of Technology, Germany*

#### *Reviewed by:*

*Lutz Jäncke, University of Zurich, Switzerland; Alissa Fourkas, National Institutes of Health, USA; Daniel P. Ferris, University of Michigan, USA*

#### *\*Correspondence:*

*Reinhold Scherer, Laboratory of Brain-Computer Interfaces, Institute for Knowledge Discovery, BioTechMed, Graz University of Technology, Inffeldgasse 13, 8010 Graz, Austria e-mail: reinhold.scherer@tugraz.at*

Voluntary drive is crucial for motor learning; therefore, we are interested in the role that motor planning plays in gait movements. In this study we examined the impact of an interactive Virtual Environment (VE) feedback task on the EEG patterns during robot assisted walking. We compared walking in the VE modality to two control conditions: walking with a visual attention paradigm, in which visual stimuli were unrelated to the motor task; and walking with mirror feedback, in which participants observed their own movements. Eleven healthy participants were considered. Application of independent component analysis to the EEG revealed three independent component clusters in premotor and parietal areas showing increased activity during walking with the adaptive VE training paradigm compared to the control conditions. During the interactive VE walking task, spectral power in the frequency ranges 8–12, 15–20, and 23–40 Hz was significantly (*p* ≤ 0.05) decreased. This power decrease is interpreted as a correlate of an active cortical area. Furthermore, activity in the premotor cortex revealed gait cycle related modulations significantly different (*p* ≤ 0.05) from baseline in the frequency range 23–40 Hz during walking. These modulations were significantly (*p* ≤ 0.05) reduced depending on gait cycle phases in the interactive VE walking task compared to the control conditions. We demonstrate that premotor and parietal areas show increased activity during walking with the adaptive VE training paradigm, when compared to walking with mirror feedback and movement-unrelated feedback. Previous research has related a premotor-parietal network to motor planning and motor intention. We argue that movement related interactive feedback enhances motor planning and motor intention. We hypothesize that this might improve gait recovery during rehabilitation.

**Keywords: neurorehabilitation, robotic gait training, locomotion, motor planning, electroencephalography, interactive feedback, gait adaptation**

#### **1. INTRODUCTION**

Gait recovery is a major rehabilitation goal in post-stroke therapy. Impairments in normal gait affect balance, stride length, walking speed, obstacle avoidance and endurance. These factors often lead to an increased risk of falls and related injuries (Said et al., 1999). In consequence, affected individuals are not able to react adequately and promptly to demands within their environment, which hinders them in performing activities of daily living autonomously (Duncan et al., 1998).

Much has been discussed about optimal training strategies in rehabilitation and different therapy approaches. Several key features including the form and intensity of motor training are assumed to support neural plasticity in motor learning. In gait rehabilitation extensive training can be provided by using a robotic gait orthosis that allows a high number of movement repetitions (Lum et al., 2002; Mehrholz et al., 2013). However, robotic rehabilitation alone generates a highly repetitive and monotonous practice environment that requires little effort from the individual. Findings on discrete upper limb movements indicate that active performance in the training is more effective for motor learning (Lotze et al., 2003; Kaelin-Lang et al., 2005). Furthermore, several studies suggest that the individual's motivation in the training is one of the critical factors in determining the therapy outcome (Maclean and Pound, 2000; Liebermann et al., 2006). It has been argued that a more interactive and demanding learning context might enhance the individual's motivation and promote active participation in the motor task. Virtual Environments (VEs) provide a convenient solution to these ends as different kinds of motor tasks with various degrees of difficulty can easily be implemented (Holden, 2005; Liebermann et al., 2006). Recent studies suggest that VE can in fact promote active participation during robotic gait training. Brütsch et al. (2010, 2011) and Schuler et al. (2011) showed that training with VE significantly increased active participation during robot assisted gait in children with various neurological gait disorders and healthy controls. Active participation was assessed using biofeedback values from hip and knee torques (Brütsch et al., 2010, 2011) and electromyographic activity of the lower limbs (Schuler et al., 2011). Other research suggests that VE combined with robot assisted lower limb training has a greater effect on improving gait parameters such as balance, speed, and endurance in individuals after stroke than robot-assisted training alone (Jaffe et al., 2004; You et al., 2005; Mirelman et al., 2009, 2010).

However, so far the underlying neurophysiological processes that are elicited by motor related feedback in a VE during gait training and their relevance to the relearning of motor skills have not been investigated. Active participation and voluntary drive in movements have been shown to be crucial for motor learning (Lotze et al., 2003; Kaelin-Lang et al., 2005). But how does the notion of voluntary drive translate to the movement of gait? In general, voluntary movements have been defined in terms of two different kinds of subjective experiences: "intention," which relates to the phase of movement planning, and "agency," which describes the feeling that one's own movement has caused a specific effect (Tsakiris et al., 2010). These feelings can be promoted by feedback in a VE. Findings also indicate that the experience of agency is related to the presence of perceptual and sensory feedback about the effects of motor actions in the physical world (Blakemore et al., 2002). Thus the feeling of agency can be increased by enhancing feedback to motor actions in a VE. Investigations on upper limb movements reveal a sensorimotor network of premotor-parietal cortices that is related to motor awareness and intention (Sirigu et al., 2003; Berti et al., 2005; Tsakiris et al., 2010; for a review see Haggard, 2008). However, walking is a rhythmic and highly automated movement and it is not clear which parts of the movement are controlled by the cortex, the brain stem and central pattern generators in the spinal cord (Armstrong, 1988; Grillner et al., 1998). Hence motor awareness and intention most likely differ between walking and discrete upper limb movements. In animals, motor areas of the cortex are only activated during gait initiation and gait adaptation, but not during unperturbed gait (Armstrong, 1988; Drew et al., 2008).

Few studies in humans have investigated motor preparation during gait. Recently we compared active to passive walking in a gait robot and found a trend for differences in sensorimotor EEG rhythms over the premotor cortex in addition to differences over sensory areas (Wagner et al., 2012). Wieser et al. (2010) studied evoked potentials related to gait-like movements in an upright position. They found that the cortical activity over sensorimotor areas was highest shortly before a change of direction between the flexor and extensor movement of the legs. Using EEG, Haefeli et al. (2011) showed an increased activation over prefrontal areas during the preparation and performance of obstacle steps. Recently Sipp et al. (2013) showed that walking on a balance beam elicited increased electroencephalographic theta band activity over a wide range of mostly midline cortical areas compared to steady state treadmill walking. Several fNIRS studies have investigated motor preparation during gait. Increased activity over the prefrontal cortex (PFC) and the SMA was observed during adaptive walking compared to steady state walking (Suzuki et al., 2004), as well as during the preparation before gait initiation (Suzuki et al., 2008; Koenraadt et al., 2013). Additionally, Koenraadt et al. (2013) found increased activation over the PFC during precision stepping. Consequently, it seems that adaptive and challenging training paradigms that continually require participants to adjust their gait are necessary to produce motor planning during gait.

In the current study we examined the impact of an interactive VE feedback task on the EEG patterns during robot assisted walking. We compared this to walking with a visual attention task in which the stimuli were unrelated to the movement and to walking with mirror feedback in which participants observed their own movements. We chose these control conditions for two different reasons. First, to account for the amount of visual attention that is required by the interactive feedback task. The visual attention task provides visual stimuli unrelated to the movement, while the mirror feedback consists of visual information relevant to the participants' movement. The latter condition should thus activate the mirror neuron system and account for possible activations of this system during VE feedback. Higher cortical activation during VE compared to mirror feedback and the visual attention task should therefore reflect additional motor planning and visuomotor processing required by the interactive feedback. The second reason we chose the mirror feedback as a control condition is that mirror feedback is often used in automated gait rehabilitation therapy. Research has demonstrated that mirror feedback during therapy can improve motor recovery after stroke (for a review see Ramachandran and Altschuler, 2009). These studies assume that part of the efficacy of mirror feedback could be due to the stimulation of dormant "mirror neurons." Thus we wanted to examine whether the interactive VE feedback would produce a measurably higher activation of sensorimotor areas relative to mirror feedback.

In particular, we hypothesize that walking with interactive feedback in a VE increases motor planning and intention and thus activates premotor and parietal areas relative to walking with mirror feedback and a visual attention task. Additionally, we hypothesize that if the VE task yields higher cortical activation of these areas compared to mirror feedback, interactive VE feedback may be more beneficial for motor learning.

#### **2. MATERIALS AND METHODS**

#### **2.1. PARTICIPANTS**

Eleven healthy volunteers (26 ± 2 years, 7 male) with no past or current neurological or locomotor deficits participated in this study. The experimental procedures were approved by the ethical committee of the Medical University Graz. Written informed consent was obtained from all subjects before the experiment.

#### **2.2. EXPERIMENTAL DESIGN AND PROCEDURE**

Participants walked with a robotic gait orthosis (Lokomat, Hocoma AG, Switzerland) under five different visual feedback conditions. Each condition lasted 4 min and was repeated twice during the experiment. The Lokomat is a robotic driven gait orthosis that includes electrical drives in knee and hip joints and incorporates a motorized treadmill and body weight support system. Parameters of the Lokomat were adjusted according to the common practice in clinical therapy with the help of experienced physical therapists. Walking speed was adjusted according to the participant's leg length with the formula speed = 0.54 × leg/27.8, where leg is the participant's leg length in cm and the speed is given in kilometers per hour. Walking speed ranged from 1.8 to 2.2 km/h between participants. For comparison, fast overground walking speed lies at around 5 km/h (Bohannon, 1997). Body weight support (BWS) was adjusted for each participant at around 30%. The Lokomat was run in a control mode with 100% guidance force. The feedback conditions consisted of:

**NoFB** Participants walked while looking at a black screen.
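The leg-length rule for the treadmill speed given above can be written as a one-line function. This sketch assumes the formula is read as 0.54 multiplied by the leg length and divided by 27.8; the function name and the example leg lengths are hypothetical.

```python
def lokomat_speed_kmh(leg_length_cm):
    """Walking speed in km/h from leg length in cm (speed = 0.54 * leg / 27.8)."""
    return 0.54 * leg_length_cm / 27.8


# Hypothetical leg lengths in cm; not measurements from the study.
for leg in (95.0, 100.0, 110.0):
    print(f"leg = {leg:.0f} cm -> speed = {lokomat_speed_kmh(leg):.2f} km/h")
```

Under this reading, leg lengths of roughly 93 to 113 cm map onto the reported speed range of 1.8 to 2.2 km/h.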


One gait cycle was defined as the interval between two right leg heel contacts (one gait cycle lasted from 1.6 to 2.4 s depending on the participant's leg length). Before starting the experimental sessions subjects were asked to train under the virtual reality feedback conditions for some minutes to get used to the orthosis and to steering in the VE. After a short training period (about 3 min for each VE task), all subjects reported that they were able to control the VE sufficiently well. Conditions were randomized. In all conditions, participants were asked to look straight ahead, not to close their eyes for prolonged periods of time, and to blink normally. **Figure 1** summarizes the experimental setup.

**FIGURE 1 | Experimental setup: subject walking in the Lokomat gait orthosis with body weight support.** The amplifiers for EEG recordings are fixed on a board in front of the participant. The orthosis is adapted and fixed to the participant's legs with the help of an experienced physical therapist; Left: robotic assisted walking. Speed (≤2.2 km/h) and body weight support (∼30%) were adjusted for each participant; Right top: participant walking in the 3rd person VE condition. Right bottom: gaze screen with possible locations for the graphical objects.
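The gait cycle definition used here (the interval between two successive right heel contacts) can be sketched as follows. The heel-strike timestamps and the function name `gait_cycles` are hypothetical and only illustrate how cycle boundaries and durations follow from the foot-switch events.

```python
def gait_cycles(right_heel_strikes_s):
    """Return (start, end) pairs between consecutive right heel contacts."""
    strikes = sorted(right_heel_strikes_s)
    return list(zip(strikes[:-1], strikes[1:]))


# Hypothetical right heel-strike times in seconds.
heel_strikes = [0.0, 2.0, 4.1, 6.0]
cycles = gait_cycles(heel_strikes)
durations = [end - start for start, end in cycles]

for (start, end), duration in zip(cycles, durations):
    print(f"cycle {start:.1f}-{end:.1f} s, duration {duration:.2f} s")
```

Every heel strike except the last opens a new cycle, so n heel contacts yield n - 1 gait cycles.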

#### **2.3. DATA ACQUISITION**

The EEG was recorded from 61 sites using two 32-channel amplifiers (BrainAmp MR plus amplifiers, Brainproducts, Munich, Germany). Electrodes were mounted in an electrode cap (EasyCap, Germany) according to the 5% 10/20 system (Oostenveld and Praamstra, 2001). The electrooculogram (EOG) was recorded from three electrodes, two placed on the outer canthi of the eyes and one between the eyes on the forehead. Both EEG and EOG were referenced to the left mastoid, and ground was placed on the right mastoid. All electrode impedances were reduced below 10 kΩ before the recording. Three-dimensional electrode coordinates were measured on a screening day prior to the actual measurement with the Zebris Elpos system (Noraxon, USA). EEG and EOG were acquired with a 1 kHz sampling rate and band pass filtered between 0.1 and 500 Hz. The timing of the heelstrike of both legs was assessed using mechanical foot switches placed over the calcaneus bone at the foot sole of both feet.

#### **2.4. EEG ANALYSIS**

EEG data analysis was performed using Matlab 2012b (The MathWorks Inc., Natick, MA) and EEGLAB 11.0b functions (Delorme and Makeig, 2004).

In Wagner et al. (2012) we showed that it is possible to account for artifact contamination of the EEG with Infomax Independent Component Analysis during robotic gait training following the methods of Onton et al. (2006) and Gwin et al. (2010). Before submitting the EEG to an ICA, the data were preprocessed accordingly.

First, the data (EOG and EEG) were high pass filtered at 1 Hz using a zerophase FIR filter (order 7500) to minimize drifts, low pass filtered at 200 Hz (zerophase FIR filter, order 36), and subsequently downsampled to 500 Hz. Channels with prominent artifacts were excluded from further analysis (avg. 2.2; range: 0–7), and the EEG and EOG were rereferenced to a common average reference that was computed from the remaining EEG channels. The continuous EEG data were then visually inspected for non-stereotyped artifacts (e.g., swallowing, electrode cable movements, etc.) and affected partitions were removed from further analysis. For automatic artifact rejection the data were partitioned into segments of 0.5 s to identify outliers exceeding the average of the probability distribution of values across the data segments by ±5 *SD*. On average, per condition 72% of the gait cycles of each participant's EEG data remained in the analysis (range: 61–89%, *SD*: 11).
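The automatic rejection step can be illustrated with a simplified sketch. It cuts a single channel into 0.5 s segments and flags segments whose summary statistic lies more than 5 *SD* from the mean over segments. The statistic used here (mean absolute amplitude) is an assumption made for brevity and stands in for the probability-based measure described above; the function name and the toy signal are likewise illustrative.

```python
import statistics


def flag_outlier_segments(signal, srate_hz=500, seg_s=0.5, n_sd=5.0):
    """Return indices of segments deviating more than n_sd SDs from the segment mean.

    Assumption: the per-segment statistic is the mean absolute amplitude,
    a simplified stand-in for a probability-based measure.
    """
    seg_len = int(round(srate_hz * seg_s))
    n_seg = len(signal) // seg_len
    if n_seg < 2:
        raise ValueError("need at least two full segments")
    stats = []
    for i in range(n_seg):
        segment = signal[i * seg_len:(i + 1) * seg_len]
        stats.append(statistics.fmean([abs(x) for x in segment]))
    mean = statistics.fmean(stats)
    sd = statistics.stdev(stats)
    if sd == 0:
        return []
    return [i for i, s in enumerate(stats) if abs(s - mean) > n_sd * sd]


# Toy signal: 100 segments of unit amplitude, with one high-amplitude segment.
seg_len = 250
toy = [1.0 if i % 2 == 0 else -1.0 for i in range(100 * seg_len)]
for i in range(40 * seg_len, 41 * seg_len):
    toy[i] *= 100.0
print(flag_outlier_segments(toy))
```

With only a handful of segments no value can lie 5 *SD* from the mean, so a threshold of this kind is only meaningful for recordings that span many segments.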

Next, the preprocessed datasets containing EEG and EOG were decomposed using an adaptive independent component analysis (ICA) mixture model algorithm (AMICA) (Palmer et al., 2006, 2008). AMICA is a generalization of the Infomax algorithm (Bell and Sejnowski, 1995; Makeig et al., 1996) and multiple mixture (Lee et al., 1999; Lewicki and Sejnowski, 2000) ICA approaches. Infomax ICA utilizes temporal independence to perform blind source separation (Makeig et al., 1996). ICA was performed on individual subjects over all conditions (GAZE, MIRROR, 1stP VE, 3rdP VE, noFB).

Individual component scalp maps were submitted to a single dipole source localization algorithm using a standardized three-shell boundary element head model (BEM) implemented in EEGLAB (Oostenveld and Oostendorp, 2002; Delorme et al., 2012). Individual participants' electrode positions were coregistered and aligned with a standard brain model (Montreal Neurological Institute, MNI, Quebec, Canada). Ideally independent components representing synchronous activity within a cortical domain are characterized by scalp maps fitting the projection of a single equivalent current dipole. Therefore, the goodness of fit for modeling each independent component scalp map with a single equivalent current dipole was used to quantify component quality. Only ICs whose dipoles were located within the head and fitted their scalp projection with a residual variance of less than 10% were considered further.
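The selection criterion can be made concrete with a small sketch of the residual variance computation: the fraction of scalp map variance left unexplained by the dipole's forward projection. This is a minimal illustration under stated assumptions. The function names and the toy maps are hypothetical, and the dipole fitting itself (done in EEGLAB in the study) is not reproduced.

```python
def residual_variance(scalp_map, dipole_projection):
    """Fraction of the scalp map's energy not explained by the dipole projection."""
    if len(scalp_map) != len(dipole_projection):
        raise ValueError("maps must have the same number of channels")
    total = sum(v * v for v in scalp_map)
    if total == 0:
        raise ValueError("scalp map is all zeros")
    residual = sum((v - p) ** 2 for v, p in zip(scalp_map, dipole_projection))
    return residual / total


def keep_component(scalp_map, dipole_projection, inside_head, max_rv=0.10):
    """Keep an IC only if its dipole lies inside the head and RV is below max_rv."""
    return inside_head and residual_variance(scalp_map, dipole_projection) < max_rv


# Hypothetical 4-channel maps, for illustration only.
ic_map = [1.0, 2.0, -1.0, 0.5]
fit = [1.1, 1.9, -0.9, 0.5]
print(residual_variance(ic_map, fit), keep_component(ic_map, fit, inside_head=True))
```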

ICs representing artifacts were identified and rejected from further analysis by visual inspection considering the scalp map, the event-locked time course and the power spectrum. The remaining ICs were submitted to an automatic clustering routine implemented in EEGLAB (Delorme and Makeig, 2004) using principal component analysis (PCA). Feature vectors coding differences between ICs in dipole location, power spectral density (PSD) (3–40 Hz), and scalp projection were reduced to 10 principal components and clustered with *k*-means (with *k* = 13). Components further than three standard deviations from the obtained cluster centers were moved to a separate "Outlier" cluster. Only clusters that contained more than half of the participants were further analyzed. Furthermore, as we were interested in motor related functions, we considered only clusters in sensorimotor areas.

#### **2.5. CLUSTERS OF CORTICAL ICs**

The PSD (using Welch's Method) and event-related spectral perturbations (ERSP) (Makeig, 1993) were computed for each independent source. To generate gait cycle ERSPs, single trial spectrograms were computed and timewarped using a linear interpolation function, thus aligning the timepoints for right and left heelstrike over trials. Relative changes in spectral power were obtained by averaging the difference between each single-trial log spectrogram and baseline (the mean IC log spectrum over all gait cycles per condition). To visualize significant event-related changes from baseline, deviations from the average gait cycle log spectrum were computed with a bootstrap method (Delorme and Makeig, 2004). This analysis revealed gait cycle related activity in one of the clusters that differed significantly from baseline (see **Figure 2**). This modulation occurred in a varying frequency band ranging from 23 to 40 Hz between persons. For further statistical analysis an individual band in this frequency range was selected for each participant, considering only frequencies that were significantly different from baseline. Spectral activity in 8–12 Hz alpha and 15–20 Hz beta bands did not differ overtly between subjects. Furthermore the spectra of single subjects did not show multiple peaks in these frequency bands. Therefore the standard bands were used for further analysis.
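The time-warping and baseline steps described above can be sketched for a single frequency bin. The sketch assumes that each trial starts at a right heel strike, that the left heel strike is mapped to 50% and the next right heel strike to 100% of the gait cycle (as in the Figure 4 description), and that the baseline is the mean log power over all warped gait cycles. The function names and toy values are illustrative; EEGLAB's own implementation is not reproduced here.

```python
from bisect import bisect_right
from statistics import fmean


def interp(times, values, t):
    """Linear interpolation of values sampled at strictly increasing times."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    j = bisect_right(times, t)
    t0, t1 = times[j - 1], times[j]
    w = (t - t0) / (t1 - t0)
    return values[j - 1] + w * (values[j] - values[j - 1])


def warp_gait_cycle(times, values, left_hs, next_right_hs, n_out=101):
    """Resample one trial onto a 0-100% grid with left heel strike at 50%."""
    warped = []
    for k in range(n_out):
        pct = 100.0 * k / (n_out - 1)
        if pct <= 50.0:
            t = left_hs * pct / 50.0
        else:
            t = left_hs + (next_right_hs - left_hs) * (pct - 50.0) / 50.0
        warped.append(interp(times, values, t))
    return warped


def ersp_single_band(warped_trials):
    """Mean over trials of (log power - baseline); baseline = mean over all cycles."""
    baseline = fmean([v for trial in warped_trials for v in trial])
    n_points = len(warped_trials[0])
    return [fmean([trial[k] - baseline for trial in warped_trials])
            for k in range(n_points)]


# Toy trial: sample times in ms, with values equal to the time for easy checking.
times = [float(t) for t in range(0, 2001, 100)]
print(warp_gait_cycle(times, times, left_hs=800.0, next_right_hs=2000.0, n_out=5))
```

In the toy call, the five output points fall at 0, 25, 50, 75 and 100% of the cycle, so the left heel strike at 800 ms lands exactly on the 50% point.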

For statistical analysis ERSPs were computed for the GAZE, MIRROR, 1stP VE and 3rdP VE using a common baseline: the average gait cycle log spectrum computed from the noFB condition. Independent component ERSPs were then averaged in three frequency bands: 8–12 Hz (alpha), 15–20 Hz (beta), and subject specific bands in the range 23–40 Hz.

For statistical analysis we divided the gait cycle symmetrically into two stationary phases (10–30% and 60–80% of the gait cycle) and two transition phases (30–60% and 80–10% of the gait cycle). Since two of the sensorimotor clusters we identified were located in midline areas we could not attribute their activity to one of the hemispheres (see **Figure 3**). The stationary phases correspond to the midstance (10–30%), initial swing (60–73%), and midswing phases (73–87%). The transition phases correspond to the terminal stance (30–50%), preswing (50–60%), terminal swing (87–100%), and loading response (0–10%) following the definition by Perry (1992).
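The phase division above can be expressed as a lookup from gait cycle percentage to phase, followed by averaging within each phase. In this sketch the phase labels, the half-open interval boundaries, and the function names are assumptions made for illustration. The 80–10% phase wraps around the heel strike at 0/100%.

```python
from statistics import fmean

PHASES = ("stationary 10-30", "transition 30-60",
          "stationary 60-80", "transition 80-10")


def gait_phase(pct):
    """Map a gait cycle percentage to one of the four analysis phases."""
    p = pct % 100.0
    if 10.0 <= p < 30.0:
        return PHASES[0]
    if 30.0 <= p < 60.0:
        return PHASES[1]
    if 60.0 <= p < 80.0:
        return PHASES[2]
    return PHASES[3]  # 80-100% together with 0-10%


def phase_means(percents, values):
    """Average values (e.g., band-averaged ERSP) within each gait phase."""
    grouped = {name: [] for name in PHASES}
    for pct, value in zip(percents, values):
        grouped[gait_phase(pct)].append(value)
    return {name: fmean(vals) for name, vals in grouped.items() if vals}


print(gait_phase(5), "|", gait_phase(20), "|", gait_phase(95))
```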

A repeated-measures 4 × 4 within-subject ANOVA with factors "feedback" (GAZE vs. MIRROR vs. 1stP VE vs. 3rdP VE) and "gait cycle phase" (two stationary phases and two transition phases) was computed for each cluster and each frequency band separately. Multiple comparisons were corrected controlling for false discovery rate (Benjamini and Yekutieli, 2001) with a significance level set *a priori* at 0.05. In cases where the assumption of sphericity was violated, significance values were Greenhouse-Geisser corrected. Additionally we computed the effect size η<sup>2</sup>. Simple paired *t*-tests with a bootstrapping method were employed for *post hoc* testing, and multiple comparisons were corrected controlling for false discovery rate with an *a priori* alpha level at 0.05. For *post hoc* comparisons we also computed the effect size (Cohen's *d*) based on the distance between means.
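For readers unfamiliar with the false discovery rate correction cited above, the following is a minimal sketch of the Benjamini-Yekutieli step-up procedure. The function name and the example *p*-values are hypothetical and unrelated to the study's data, and the sketch is not the analysis code used by the authors.

```python
def benjamini_yekutieli(p_values, q=0.05):
    """Return a reject flag per p-value, controlling FDR at level q.

    Step-up rule: find the largest rank k with p_(k) <= k * q / (m * c(m)),
    where c(m) = sum_{i=1..m} 1/i, and reject the k smallest p-values.
    """
    m = len(p_values)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / (m * c_m):
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values, for illustration only.
print(benjamini_yekutieli([0.001, 0.04, 0.03, 0.5]))
```

The harmonic factor c(m) makes the procedure valid under arbitrary dependence between tests, at the cost of being more conservative than the Benjamini-Hochberg variant.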

#### **3. RESULTS**

Three clusters located in central midline areas revealed differences between the feedback conditions (see **Figure 3**). The number of subjects and sources contained in each cluster and the Talairach coordinates of the cluster centroids are displayed in **Table 1**.

Cluster A, located in the premotor cortex, showed significant changes (*p* ≤ 0.05) from baseline relative to the phases of the gait cycle in the 23–40 Hz band, visible in the single IC ERSPs during GAZE, NoFB, and MIRROR and in reduced form during 1stP VE and 3rdP VE (see **Figure 2**). This cluster also presented a significant difference in the average spectrum between the feedback conditions in the beta band (*F*(3, 24) = 6.9, *p* ≤ 0.0094, η<sup>2</sup> = 0.46) (see **Table 2**). *Post hoc* tests revealed a significant (*p* ≤ 0.03) difference between VE and all other feedback conditions. For gait cycle related modulations in the 23–40 Hz frequency range a significant interaction between gait phases and conditions was found (*F*(9, 72) = 2.6, *p* ≤ 0.0094, η<sup>2</sup> = 0.25) (see **Table 3**). *Post hoc* tests revealed that power in this range was significantly (*p* ≤ 0.0085) reduced in the two stationary gait phases during both of the VE conditions compared to GAZE (see **Figure 4**). However, only the second stationary gait phase during 3rdP VE was significantly (*p* ≤ 0.0085) different from MIRROR. Compared to GAZE, MIRROR showed significantly (*p* ≤ 0.0085) reduced power in this band in the first stationary gait phase. Interestingly, there is a significant difference between 1stP VE and 3rdP VE in the second transition phase of the gait cycle. For an overview and Cohen's *d* values see **Table 4**.

**the posterior cortex (Brodmann area 7); (C) Cluster C located in the posterior cortex (Brodmann area 40).** From left to right in each row: cluster average scalp projections; dipole locations of cluster ICs (blue

both of the VE conditions in the mu and in the beta range can be observed [Naming: Ss and ICs denote the number of subjects (Ss) and Independent Components (ICs) in the cluster].

For cluster B (parietal cortex, Brodmann area 7) the ANOVA revealed a significant main effect for the mean spectrum between the visual feedback conditions in the mu band (*F*(3, 27) = 9.9, *p* ≤ 0.0094, η<sup>2</sup> = 0.56), and in the beta band (*F*(3, 27) = 11.8, *p* ≤ 0.0094, η<sup>2</sup> = 0.60). *Post hoc* tests show that spectral power in the mu band (*p* ≤ 0.0025) and in the beta band (*p* ≤ 0.0045) is significantly reduced in the VE conditions compared to MIRROR and GAZE. The ANOVA for cluster C (parietal cortex, Brodmann area 40) revealed a significant main effect for the mean spectrum between the visual feedback conditions for the mu band (*F*(3, 24) = 10.0, *p* ≤ 0.0094, η<sup>2</sup> = 0.55), the beta band (*F*(3, 24) = 14.0, *p* ≤ 0.0094, η<sup>2</sup> = 0.64) and the gamma band (*F*(3, 24) = 8.3, *p* ≤ 0.0094, η<sup>2</sup> = 0.51) (see **Figure 3**). *Post hoc* tests show that spectral power in the mu band (*p* ≤ 0.0055) is significantly reduced in the VE conditions and in the MIRROR condition compared to GAZE. The *post hoc* tests also show that spectral power in the beta band (*p* ≤ 0.013) and in the 23–40 Hz range (*p* ≤ 0.0075) is significantly reduced in the VE conditions compared to MIRROR and GAZE. Additionally the tests reveal that during MIRROR feedback spectral power in the beta band (*p* ≤ 0.013) is significantly reduced compared to GAZE. For an overview of significant comparisons and Cohen's *d* values refer to **Tables 2** and **3**.

**Table 3 | Significant differences in mean gait cycle spectra between feedback conditions (*p* ≤ 0.05, corrected with false discovery rate), and effect size (Cohen's *d*) (*d*1 and *d*3 denote Cohen's *d* values for 1stP VE and 3rdP VE, respectively).**

**FIGURE 4 | Average gait event-related spectral perturbations (ERSPs) for cluster A:** for each feedback condition ERSPs are computed relative to the full gait cycle baseline obtained from the noFB condition. Then ERSPs are averaged over subject specific frequency bands between 23 and 40 Hz and then averaged over subjects for cluster A. Temporally aligned events are marked for the right leg heel contact at 0% as the beginning and 100% as the end of the gait cycle, and for the left heel-strike at 50%. Each feedback condition is represented by a colored trace. It is visible that during 1stP and 3rdP VE in stationary gait phases (10–30% and 60–80%) power in this band is decreased compared to the other feedback conditions. Also a difference between 3rdP VE and 1stP VE during the second transition phase of the gait cycle (30–60%) is evident. Vertical lines mark the beginning and the end of gait cycle phases. Asterisks mark significance between feedback conditions in the indicated gait cycle phase.

#### **4. DISCUSSION**

Our analysis revealed three independent component clusters in premotor and parietal areas that showed significantly decreased spectral power in alpha, beta and 23–40 Hz frequency ranges during the interactive VE tasks compared to MIRROR and GAZE. This spectral power decrease indicates a higher neuronal activation (Pfurtscheller and Lopes da Silva, 1999).

Gait cycle related modulations in cluster A visible in the single IC ERSPs (see **Figure 2**) showed reduced activity during 3rdP VE compared to GAZE. Statistical analysis revealed that during both VE conditions power in the 23–40 Hz range is significantly decreased in the two stationary gait phases compared to GAZE. Also comparisons between MIRROR vs. GAZE and MIRROR vs. VE show only significant differences in stationary gait phases. Interestingly, however, there is a significant difference between 1stP VE and 3rdP VE in the second transition phase of the gait cycle (see **Figure 4** and **Table 4**). In a previous study we found the same gait cycle related modulation in a 25–40 Hz frequency range during active and passive robot-assisted walking in the premotor cortex (Wagner et al., 2012). Central midline activity in the frequency range 30–45 Hz has been previously related to muscle activation during upper and lower limb movements (Pfurtscheller and Neuper, 1992; Pfurtscheller et al., 1993; Brown, 2000; Mima et al., 2000; Alegre et al., 2003; Müller-Putz et al., 2003, 2007; Raethjen et al., 2008). Results from Pfurtscheller and Lopes da Silva (1999) and Pfurtscheller et al. (1996) suggest that activity in an overlapping frequency band is involved also in motor planning. These studies reported synchrony of oscillations in the frequency range 36–40 Hz over the premotor area and in relation to the sensorimotor area shortly before movement-onset and during execution of movement. Interestingly Petersen et al. (2012) recently observed synchrony in the frequency range 24–40 Hz between EEG recordings over the foot motor area and the electromyogram from the tibialis anterior muscle during steady state walking. The significant coupling occurred prior to heel strike during the swing phase of walking. This corticomuscular coherence is similar in frequency band and cortical location to the gait cycle related modulation we find in the 23–40 Hz range. 
The stationary gait phases in our study coincide with the swing phases of both legs. Hence the decreased power during VE may represent processes involved in motor planning during these phases. The difference between 1stP VE and 3rdP VE during the second transition phase of the gait cycle is especially interesting and may indicate that participants were using different strategies for steering the avatar in the two conditions. We generally observed a more variable pattern of the 23–40 Hz modulation during 3rdP VE compared to the other conditions.

Our results also show a significant decrease in beta band power in the premotor cortex during VE compared to MIRROR and GAZE. Numerous scalp EEG and ECoG studies have related event-related desynchronization (ERD) in the alpha (8–13 Hz) and beta (15–25 Hz) rhythms to the activation of sensorimotor areas (Crone et al., 1998; Pfurtscheller and Lopes da Silva, 1999; Neuper and Pfurtscheller, 2001; Pfurtscheller et al., 2003; Miller et al., 2007), while synchrony in alpha and beta bands has been connected to a deactivation or inhibition of these areas (Klimesch et al., 2007; Neuper et al., 2007). Interestingly two recent studies showed that elevated synchrony in the sensorimotor beta rhythm promotes postural and tonic contraction and causes movements to be slowed (Gilbertson et al., 2005; Joundi et al., 2012); and a recent review suggests that modulation of beta activity is predictive of potential actions (Jenkinson and Brown, 2011). There is evidence that these principles hold for whole body movements such as walking. Wieser et al. (2010) showed decreased alpha and beta band power during gait like leg movements in an upright position, compared to periods of rest in which participants were lying. Presacco et al. (2011) showed that spectral power in the alpha band is suppressed during precision walking compared to standing. These results are in line with our recent study where we showed that alpha and beta spectral power in sensorimotor areas is suppressed during robot assisted walking compared to standing (Wagner et al., 2012). We also show that spectral power in these bands is significantly decreased during active compared to passive walking. Thus our findings indicate that the task of active gait adjustment in the VE requires enhanced motor planning and increases activity in the premotor cortex. 
This is in line with numerous studies that relate increased activity in the premotor area to the planning of single limb movements (Pfurtscheller and Berghold, 1989; Ikeda et al., 1992; Tanji, 1994), (for a review see Haggard, 2008). Recent studies have demonstrated that the premotor areas are also activated during gait initiation and adaptation (Suzuki et al., 2004, 2008; Haefeli et al., 2011; Koenraadt et al., 2013).

In the posterior parietal cortex (PPC) two clusters were identified, one located centrally (Cluster B) and one located in the right hemisphere (Cluster C). In Cluster B power in the mu and beta band was significantly suppressed during both VE conditions compared to MIRROR and GAZE. Cluster C also revealed decreased power in the beta band and the 23–40 Hz range during the VE tasks relative to all other feedback conditions. The 23–40 Hz range overlaps with the upper beta band, and is suppressed during feedback conditions in which participants had to actively modify their steps. We assume therefore that a decrease in this band has the same functional meaning previously described for the mu and beta band. Alpha and beta rhythms in the parietal cortex have been previously linked to spatial attention, decision making, and sensorimotor integration (Capotosto et al., 2009; Donner and Siegel, 2011; Hipp et al., 2011; Capotosto et al., 2012). Interestingly, two recent studies by Tombini et al. (2009) and Perfetti et al. (2011) relate alpha and beta ERD in parietal regions to movement planning in visually guided upper limb movements under both feedforward and feedback control. For the MIRROR condition a significant power decrease in mu and beta bands relative to the movement unrelated feedback (GAZE) was observed solely in Cluster C. The PPC has been related to the mirror neuron system (Fogassi et al., 2005); we therefore conclude that the activation we find during MIRROR feedback is related to the participants' monitoring of their own movements.

Our results show that parietal cortex regions are more activated in conditions that require visually guided gait adaptation. These results are in line with studies that associate the PPC with visuomotor transformations in reaching movements. Neuronal recordings in monkeys have identified two subareas in the PPC responsible for the action planning of different body parts: the lateral intraparietal area (LIP) for saccades and the parietal reaching region (PRR) for reaching (Snyder et al., 1997). In humans, functional magnetic resonance imaging (fMRI) studies on the PPC have determined regions corresponding to the monkey PRR area (Connolly et al., 2003; Pellijeff et al., 2006). Recently Wang and Makeig (2009) demonstrated that it is possible to decode intended movement direction using human EEG recorded over the parietal cortex with a delayed saccade-or-reach task. Neuronal recordings in cats have revealed a higher activation in the PPC during visually guided gait modification, and suggest that the PPC may contribute to locomotor control (Drew et al., 2008). Interestingly a recent study has related activity in the parietal cortex directly to the awareness of human actions (Desmurget et al., 2009). Previous findings also indicate that the PPC is involved in the planning of eye-movements (Snyder et al., 1997). Planning of eye-movements in our study should have occurred mainly during GAZE as subjects were supposed to direct their gaze to objects appearing in different corners of the screen. In the parietal clusters we can observe decreased power in mu and beta bands during GAZE compared to NoFB (see **Figure 3**). Possibly some of this activity is related to the planning of eye-movements. However, differences between GAZE and VE should reflect the portion of activity not related to saccades.

Our finding that an interactive gait adaptation task activates premotor and parietal areas is especially interesting as these areas have been related to motor intention and motor planning (Haggard, 2008). The increased activity we find in premotor and parietal areas during walking in a VE might thus reflect increased motor planning that is required by the adaptive training paradigm. VE feedback elicited a higher activation compared to movement unrelated feedback and mirror feedback in all of the clusters. Mirror feedback showed enhanced activation relative to movement unrelated feedback only in one of the parietal clusters. This provides evidence that the benefits of gait training with a more demanding and interactive task may be superior to simple mirror feedback.

Interestingly, we found a significant difference between 1stP VE and 3rdP VE in the premotor cortex during one of the transition phases of the gait cycle. In general 3rdP VE seems to be related to a more variable pattern of the 23–40 Hz modulation compared to the other conditions, including 1stP VE. This could be an indication that the gait movements are less regular and less automatic, involving more motor planning during 3rdP VE compared to 1stP VE, at least during certain phases of the gait cycle. Studies on body ownership show that first person perspective is superior to third person perspective VE for the induction of full-body ownership illusions (Slater et al., 2010; Petkova et al., 2011). These studies relate the first person and third person perspective, respectively, to an egocentric and allocentric reference frame. Studies show that the processing of egocentric spatial information and self-motion activates the right parietal cortex (Maguire et al., 1998; Andersen et al., 1999; Vogeley and Fink, 2003). Interestingly, in our study we found clusters only in the right parietal cortex, and these were more activated during the VE walking tasks compared to MIRROR and GAZE. However, we did not find differences between 1stP and 3rdP VE in these clusters. Differences between 1stP and 3rdP perspective were located in the premotor cortex, a brain region that has been identified in a previous study to be related to the feeling of agency (Tsakiris et al., 2010). From observations we can say that the participants in our experiment needed more time in the beginning to get used to the first person control in the VE. We could speculate that this increased performance success in visuomotor adaptation might have induced a greater feeling of agency in the third person perspective.

Our results further support previous findings (Brütsch et al., 2010, 2011; Schuler et al., 2011) suggesting that a more challenging gait adaptation task can promote the motivation for active participation in the movement. It is, however, not clear to which extent this motivation is increased by the immersiveness of the VE or whether any kind of interactive feedback might have the same effect. A recent study by Zimmerli et al. (2013) suggests that the interactivity of the training environment is fundamental in promoting the participants' active engagement in the motor task. Interactivity can be enhanced by providing functionally significant responses to the movement.

#### **5. CONCLUSION**

This study is the first to analyze brain activity during an interactive visual gait adaptation task with a robotic gait orthosis, and to show that the premotor and parietal areas are involved in visually guided gait in humans. We found that mu, beta, and lower gamma rhythms in premotor and parietal cortices are suppressed during conditions that require an adaptation of steps in response to visual input. Such suppression indicates increased activation of these brain areas. We show that this activity is higher compared to mirror feedback and a visual attention task. Higher cortical activation during visually guided gait adaptation may reflect additional motor planning and visuomotor processing. Activity in the parietal cortex likely reflects direct visuomotor transformations required by the task. Increased activity in the premotor cortex may indicate motor planning involved in adapting the steps to the visual input. Considering studies showing that voluntary drive is crucial for motor learning (Lotze et al., 2003; Kaelin-Lang et al., 2005), our results suggest the possible benefit of goal directed walking tasks that recruit brain areas involved in motor planning. Our results are relevant for gait rehabilitation after stroke and may help to better understand the cortical involvement in human gait control.

#### **ACKNOWLEDGMENTS**

This work was partly supported by the European Union research project BETTER (ICT-2009.7.2-247935) and the Land Steiermark project BCI4REHAB. The authors are thankful to Prof. Dr. Peter Grieshofer for providing the Lokomat and to Georg Schaffhauser and Pamela Holper for assistance during the experiments. We thank Prof. Ales Holobar and colleagues from the University of Maribor for providing the gaze screen and assistance during the experiments.

#### **REFERENCES**


*Augmented Cognition. Neuroergonomics and Operational Neuroscience*, eds D. D. Schmorrow, I. V. Estabrooke, and M. Grootjen (Berlin: Springer), 437–446.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

#### *Received: 29 November 2013; accepted: 07 February 2014; published online: 25 February 2014.*

*Citation: Wagner J, Solis-Escalante T, Scherer R, Neuper C and Müller-Putz G (2014) It's how you get there: walking down a virtual alley activates premotor and parietal areas. Front. Hum. Neurosci. 8:93. doi: 10.3389/fnhum.2014.00093*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Wagner, Solis-Escalante, Scherer, Neuper and Müller-Putz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Neural decoding of expressive human movement from scalp electroencephalography (EEG)

#### *Jesus G. Cruz-Garza<sup>1,2†</sup>, Zachery R. Hernandez<sup>1,3\*†</sup>, Sargoon Nepaul<sup>4</sup>, Karen K. Bradley<sup>5</sup> and Jose L. Contreras-Vidal<sup>1,3</sup>*

*<sup>1</sup> Laboratory for Noninvasive Brain-Machine Interface Systems, Department of Electrical and Computer Engineering, University of Houston, Houston, TX, USA*

*<sup>2</sup> Center for Robotics and Intelligent Systems, Instituto Tecnológico y de Estudios Superiores de Monterrey, Monterrey, Mexico*

*<sup>3</sup> Department of Biomedical Engineering, University of Houston, Houston, TX, USA*

*<sup>4</sup> Department of Neurobiology, University of Maryland, College Park, MD, USA*

*<sup>5</sup> Department of Dance, University of Maryland, College Park, MD, USA*

#### *Edited by:*

*Klaus Gramann, Berlin Institute of Technology, Germany*

#### *Reviewed by:*

*Joseph T. Gwin, University of Michigan, USA*

*Julie A. Onton, Institute for Neural Computation, USA*

#### *\*Correspondence:*

*Zachery R. Hernandez, Laboratory for Noninvasive Brain-Machine Interface Systems, Department of Electrical and Computer Engineering, University of Houston, 4800 Calhoun Rd., Houston, TX 77004, USA e-mail: zrhernandez@uh.edu*

*†These authors have contributed equally to this work.*

Although efforts to characterize human movement through electroencephalography (EEG) have revealed neural activities unique to limb control that can be used to infer movement kinematics, the extent to which EEG can be used to discern the expressive qualities that influence such movements is still unknown. In this study we used EEG and inertial sensors to record brain activity and movement of five skilled and certified Laban Movement Analysis (LMA) dancers. Each dancer performed whole body movements of three Action types: movements devoid of expressive qualities ("Neutral"), non-expressive movements while thinking about specific expressive qualities ("Think"), and enacted expressive movements ("Do"). The expressive movement qualities that were used in the "Think" and "Do" actions consisted of a sequence of eight Laban Effort qualities as defined by LMA, a notation system and language for describing, visualizing, interpreting and documenting all varieties of human movement. We used delta band (0.2–4 Hz) EEG as input to a machine learning algorithm that computed locality-preserving Fisher's discriminant analysis (LFDA) for dimensionality reduction followed by Gaussian mixture models (GMMs) to decode the type of Action. We also trained our LFDA-GMM models to classify all the possible combinations of Action Type and Laban Effort quality (giving a total of 17 classes). Classification accuracy rates were 59.4 ± 0.6% for Action Type and 88.2 ± 0.7% for Laban Effort quality Type. Ancillary analyses of the potential relations between the EEG and movement kinematics of the dancer's body indicated that motion-related artifacts did not significantly influence our classification results. In summary, this research demonstrates that EEG contains valuable information about the expressive qualities of movement. These results may have applications for advancing the understanding of the neural basis of expressive movements and for the development of neuroprosthetics to restore movements.

**Keywords: EEG, neural classification, mobile neuroimaging, neural decoding, dance, Laban Movement Analysis**

#### **INTRODUCTION**

In recent years, neural engineering approaches to understanding the neural basis of human movement using scalp electroencephalography (EEG) have uncovered dynamic cortical contributions to the initiation and control of human lower limb movements such as cycling (Jain et al., 2013), treadmill walking (Gwin et al., 2010, 2011; Presacco et al., 2011, 2012; Cheron et al., 2012; Petersen et al., 2012; Severens et al., 2012; Schneider et al., 2013), and even robotic assisted gait (Wagner et al., 2012; Kilicarslan et al., 2013). Most of these studies, however, have been limited to slow walking speeds and have been constrained by treadmills or the cycling or robotic devices used in the tasks, and have yet to examine more natural, and therefore less constrained, expressive movements. To address these important limitations, a mobile EEG-based brain imaging (MoBI) approach may be a valuable tool for recording and analyzing what the brain and the body do during the production of expressive movements, what the brain and the body experience, and what or how the brain self-organizes while movements of physical virtuosity are modified by expressive qualities that communicate emotional tone and texture, the basic language of human interactions. These expressive patterns are unique to each person, and we organize them in such particular ways that they become markers for our identities, even at great distances and from behind (Williams et al., 2008; Hodzic et al., 2009; Ramsey et al., 2011).

Interestingly, studies of the so-called human action observation network, comprised of ventral premotor cortex, inferior parietal lobe, and the superior temporal sulcus, have shown dissociable neural substrates for body motion and physical experience during the observation of dance (Cross et al., 2006, 2009). Orgs et al. (2008) reported modulation of event-related desynchronization (ERD) in alpha and beta bands between 7.5 and 25 Hz in accordance with a subject's dance expertise while viewing a dance movement. Tachibana et al. (2011) reported gradual increases in oxygenated-hemoglobin (oxy-Hb) levels using functional near-infrared spectroscopy (fNIRS) in the superior temporal gyrus during periods of increasing complexity of dance movement. While current neuroimaging research aims to recognize how the brain perceives dance, no study has described the various modes of expressive movements within a dance in relation to human scalp EEG activity. Thus, the current study focuses on extracting information about expressive movements performed during dance from non-invasive high-density scalp EEG.

The study emerged from many questions about the differences in neural engagement between functional and expressive movement in elite performers of movement; specifically, dance, and movement theatre. The questions are important, because dance has been studied primarily as elite athletic movement, located in the motor cortex. And yet, dancers train for years to express nuanced and complex qualities in order to tell a story, express an emotion, or locate a situation. Where do these various communicative messages, manifested in expressive movers, fire? Are they part of the motor functions, or are other aspects of cognition involved? The questions therefore became the basis of an emergent inquiry, using the high-density scalp EEG. Since no previous data on the differences between these two modalities of movement have been found, the study is nascent. As the investigators planned for the research, it became clear from the lack of any prior studies making these distinctions that we would be gathering baseline data and demonstrating feasibility for further studies.

Our study utilized expert analysts and performers of expressive movement, all trained in Laban Movement Analysis (LMA) (Laban, 1971; Bradley, 2009). LMA is composed of four major components (Body, Space, Effort, and Shape), which make up the grammar for movement "sentences," or phrases. In this study, we focus on the Effort component, which represents dynamic features of movement, specifically the shift of an inner attitude toward one or more of four factors: *Space* (attention or focus), *Weight* (impact, overcoming resistance), *Time* (pacing), and *Flow* (on-goingness). Each factor is a continuum between two extremes: (1) *Indulging in or favoring* the quality and (2) *Condensing* or fighting against the quality. **Table 1** lists Laban's Effort qualities, giving each factor's indulging and condensing elements with textual descriptions and examples.

LMA differentiates between functional and expressive movement. Functional movement is perfunctory, task-oriented, nonexpressive movement. It can be highly skill-based and technically complex, but it does not communicate an attitude or express an emotion. An example of functional movement might be cycling or treadmill walking, when such activities are primarily about the mechanics of executing the action. Expressive movement occurs through shifts in thoughts or intentions, and communicates something about the personal style of the mover. Human beings communicate in both verbal and nonverbal ways; the nonverbal expressive aspects of movement are "read" as indicators of our unique personalities and personal style. For example, movement analysts would describe individuals as "hyper" or "laid-back" based, in part, on their Effort patterns. Individuals might have recurring moments of a Strong, Direct stance. Others may demonstrate recurring moments of Quick, Free, Light gestures that accent a sparkly or lively presence. These expressive components of movement do not occur in isolation from the other aspects of movement analysis (Body, Space, and Shape), but rather modify movement events. They are capable of a wide range of such modifications, and the complex patterns of expressiveness make up unique movement signatures. In this way, familiar people can be identified from even great distances, simply from their Effort qualities. Unfortunately, prior research investigating natural expressive movement has been limited to motion capture technology (Zhao and Badler, 2005; Bouchard and Badler, 2007). The markers that track the body in movement are tantalizingly close to being able to trace movement qualities, but have not yet achieved legibility of the shift into expressive movement. Thus, the goal of this study is two-fold: (1) Identify those efforts and individual differences in such qualities from brain activity recorded with scalp EEG, and (2) further develop MoBI approaches to the study of natural unconstrained expressive movement.

**Table 1 | Effort factors and effort elements (Zhao, 2001; Bishko, 2007; Bradley, 2009).**

Certified Laban Movement Analysts were used as subjects because of the extensive training in distinguishing between categories of movement as both observers and performers. The five subjects were also teachers of LMA, and had extensive experience in demonstrating the differences and unique qualities of each feature of expressive movement to students of the work. One of the researchers (Bradley) is a Certified Laban Movement Analyst and has been teaching the material for 30 years. Such experienced subjects and researcher allowed for the identification (and labeling) of shifts in performance from functional to expressive moments.

#### **MATERIALS AND METHODS**

#### **EXPERIMENTAL SETUP**

#### *Subjects*

Five healthy Certified Movement Analysts (CMAs) proficient in the expressive components of LMA participated in the study after giving Informed Consent. All subjects were professional teachers and performers of movement; either dancers or movement-based actors. One man and four women were studied with ages ranging from 28–62 years. Data from subject 2 were discarded due to technical issues during the recording that resulted in missing data or data of bad quality.

#### *Task*

The study consisted of three-trial blocks where synchronized scalp EEG and whole-body kinematics data were recorded during a ∼5 min unscripted and individualized dance performance. Each trial block consisted of three Action Types ("neutral," "think," "do"). During "neutral" action, subjects were directed to perform functional movements without any additional qualities of expression. This was followed by the "think" condition where subjects continued to perform functional movements, but now imagined a particular Laban Effort quality instructed by the experimenter. Lastly, subjects executed (i.e., enacted) the previously imagined expressive movement during the "do" condition. Dancers were instructed to begin and end each Laban Effort quality cued by the experimenter, a professional movement analyst, in addition to a monotone auditory trigger at the onset of each condition. The sequence of Laban Effort qualities varied from trial-to-trial as well as from subject-to-subject. Nonetheless, all efforts were arranged such that the *indulging* (*favored*) element was preceded by *condensing* element of the Laban Effort quality. As we were interested in inferring expressive qualities, all the "neutral" instances, which were devoid of willed expressiveness, were collapsed within a superset "neutral" leaving therefore a total of 17 distinct classes of expressive movements to infer from scalp EEG ("neutral" + "think" × 8 efforts + "do" × 8 efforts).
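The class count described above (one pooled "neutral" class plus "think" and "do" for each of the eight Effort elements) can be enumerated with a short sketch. The element names are those used elsewhere in the text; the function name `build_class_labels` and the ordering of the labels are assumptions made for illustration, not part of the original analysis.

```python
from itertools import product

# Effort factors and their two elements, as named in the text.
# The order within each pair carries no meaning in this sketch.
EFFORTS = {
    "Space": ("Direct", "Indirect"),
    "Weight": ("Strong", "Light"),
    "Time": ("Quick", "Sustained"),
    "Flow": ("Free", "Bound"),
}


def build_class_labels():
    """Return one pooled 'Neutral' label plus Think/Do for each Effort element."""
    elements = [
        f"{element} {factor}"
        for factor, pair in EFFORTS.items()
        for element in pair
    ]
    labels = ["Neutral"]
    labels += [
        f"{action} {element}"
        for action, element in product(("Think", "Do"), elements)
    ]
    return labels


labels = build_class_labels()
print(len(labels))  # 1 + 2 * 8 classes
```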

#### **DATA ACQUISITION AND PREPROCESSING**

Brain activity was acquired non-invasively using a 64 channel, wireless, active EEG system sampled at 1000 Hz (BrainAmpDC with actiCAP, Brain Products GmbH). Electrode labeling was prepared in accordance with the 10–20 international system using FCz as reference and AFz as ground. The kinematics of each dancer's movements were captured using 10 wireless Magnetic, Angular Rate, and Gravity (MARG) sensors (OPAL, APDM Inc., Portland, OR) sampled at 128 Hz and mounted on the head, upper torso, lumbar region, arms, thighs, shanks, and feet. Each sensor contains a triaxial magnetometer, gyroscope, and accelerometer (**Figure 1**). A Kalman filter was used to estimate the orientation of each IMU with respect to the global reference frame. Using this information about sensor orientation, the tri-axial acceleration data, compensated for gravitational effects, were estimated (Marins et al., 2001).

Peripheral EEG channels (FP1-2, AF7-8, F7-8, FT7-10, T7-8, TP7-10, P7-8, PO7-8, O1-2, Oz, PO9-10 in the extended 10–20 EEG system montage) were rejected as these channels are typically heavily corrupted with motion artifacts and scalp myoelectric (EMG) contamination. In addition, time samples of 500 ms before and after the onset of each condition were removed from further analysis to minimize time transition effects across conditions. EEG signals were resampled to 100 Hz, followed by a removal of low frequency trends and constrained to the delta band (0.2–4 Hz) using a 3rd order, zero-phase Butterworth bandpass filter. The EEG data were then standardized by channel by subtracting the mean and dividing by the standard deviation. Finally, a time-embedded feature matrix was constructed from *l* = 10 lags corresponding to a *w* = 100 ms window of EEG data. The embedded time interval was chosen based on previous studies demonstrating accurate decoding of movement kinematics from the fluctuations in the amplitude of low frequency EEG (Bradberry et al., 2010; Presacco et al., 2011, 2012). The feature vector for each time sample *t<sub>n</sub>* was constructed by concatenating the 10 lags (*t<sub>n</sub>* − 9, *t<sub>n</sub>* − 8, ..., *t<sub>n</sub>*) for each channel into a single vector of length 10 × *N*, where *N* is the number of EEG channels. To avoid the problem of missing data, the feature matrix was buffered by starting at the 10th EEG sample of the trial. All EEG channels and time lags were subsequently concatenated and standardized to form a [*t*<sup>0</sup> − *w*]×[*N* ∗ *l*] feature matrix.
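The standardization and time-embedding steps described above can be sketched in plain Python. This is a minimal illustration on toy data, not the authors' implementation: the function names `zscore` and `time_embed` are hypothetical, the population standard deviation is an assumption (the text does not say which estimator was used), and the resampling and Butterworth filtering steps are omitted.

```python
from statistics import mean, pstdev


def zscore(channel):
    """Standardize one channel: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(channel), pstdev(channel)
    return [(v - mu) / sigma for v in channel]


def time_embed(eeg, n_lags=10):
    """Build lagged feature vectors from multichannel data.

    eeg: list of N channels, each a list of T samples.
    The row for sample t_n concatenates, channel by channel, the samples
    t_n - (n_lags - 1), ..., t_n, so every row has n_lags * N entries.
    Rows start at the n_lags-th sample so that no lag is missing.
    """
    n_samples = len(eeg[0])
    rows = []
    for n in range(n_lags - 1, n_samples):
        row = []
        for channel in eeg:
            row.extend(channel[n - n_lags + 1 : n + 1])
        rows.append(row)
    return rows


# Toy data: 3 channels x 25 samples (values chosen only to make rows easy to read).
toy = [[float(100 * c + t) for t in range(25)] for c in range(3)]
features = time_embed(toy)
print(len(features), len(features[0]))  # 16 30
```

With the 39 retained channels and 10 lags, each feature vector in this layout would have 390 entries.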

#### **DIMENSIONALITY REDUCTION**

Once feature matrices were generated for all trial blocks, training and testing data were randomly sampled in equal sizes for each class for cross-validation purposes, and reduced in dimensionality (Bulea et al., 2013; Kilicarslan et al., 2013). Local Fisher's Discriminant Analysis (LFDA) is deployed here to reduce the dimensionality of the feature space by minimizing the scatter of samples within each class and maximizing the scatter between classes, while preserving the locality of the samples that form each class. Details of the technique are described in Sugiyama (2006, 2007).

#### **NEURAL CLASSIFIER ALGORITHM**

A Gaussian mixture model (GMM), capable of representing arbitrary statistical distributions as a weighted summation of multiple Gaussian distributions, or components (Paalanen et al., 2006), was employed to classify the Laban Movement (LBM) Efforts from scalp EEG. As the name implies, a GMM represents each class as a mixture of Gaussian components whose parameters and component number are estimated using the Expectation-Maximization (EM) algorithm and the Bayesian Information Criterion (BIC), respectively (Li et al., 2012). The two main parameters of the overall algorithm are the number of reduced dimensions *r* and the number of *k*-nearest neighbors *knn* (both from the LFDA), which must therefore be optimized for this particular application of expressive movement classification (Li et al., 2012; Kilicarslan et al., 2013).

The probability density function for a given training data set *X* = {*x<sub>i</sub>*}<sup>*n*</sup><sub>*i*=1</sub> ∈ ℝ<sup>*d*</sup> is given by:

$$p(\mathbf{x}) = \sum\_{k=1}^{K} \alpha\_k \phi\_k(\mathbf{x}) \tag{1}$$

$$\phi\_k(\mathbf{x}) = \frac{e^{-0.5(\mathbf{x} - \mu\_k)^\mathsf{T} \Sigma\_k^{-1} (\mathbf{x} - \mu\_k)}}{(2\pi)^{d/2} |\Sigma\_k|^{1/2}} \tag{2}$$

where *K* is the number of components, and α*<sub>k</sub>* is the mixing weight, μ*<sub>k</sub>* is the mean, and Σ*<sub>k</sub>* is the covariance matrix of the *k*-th component. The parameters of each GMM component *k*, namely α*<sub>k</sub>*, μ*<sub>k</sub>*, and Σ*<sub>k</sub>*, are estimated as those which maximize the log-likelihood of the training set, given by:

$$L = \sum\_{i=1}^n \log p\left(\mathbf{x}\_i\right) \tag{3}$$

where *p*(*x*) is given in (1). Maximization of (3) is carried out using an iterative, greedy expectation-maximization (EM) algorithm (Vlassis and Likas, 2002), with the initial guess of the parameters α*<sub>k</sub>*, μ*<sub>k</sub>*, and Σ*<sub>k</sub>* established via k-means clustering (Su and Dy, 2007), until the log-likelihood reaches a predetermined threshold. The determination of *K* is critical to successful implementation of GMMs for classification. The BIC has been reported as an effective metric for optimizing *K* (Li et al., 2012).

$$BIC = -2L\_{\text{max}} + 2\log\left(n\right) \tag{4}$$

where *L*<sub>max</sub> is the maximum log-likelihood of each model from (3). During training, the maximum value of *K* = 10 was chosen based on estimates from prior work in our lab (Kilicarslan et al., 2013). We then computed *L*<sub>max</sub> for each value of *K* ∈ {1, 2,..., 10} and selected as optimal the value of *K* whose model gave the minimum BIC from (4). In this manner, class-specific GMMs representing each Effort could be specified for use in a maximum-likelihood classifier. The parameters for each class-conditional GMM were specified using an optimization data set (classifier optimization). The posterior probability of each new data point was computed using the optimized model for each class, and that data point was then assigned to the class that returned the largest value.
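To make Equations (1)–(3) and the final decision rule concrete, the following sketch evaluates a mixture density and assigns a sample to the class whose model returns the largest value. It is an illustration under stated assumptions rather than the study's code: it is restricted to *d* = 1 (so the covariance matrix reduces to a scalar variance), the component parameters are hand-picked hypothetical values instead of EM estimates, equal class priors are assumed (so the largest class-conditional density also gives the largest posterior), and the EM fitting and BIC selection steps are omitted.

```python
import math


def gaussian_pdf(x, mu, var):
    """Eq. (2) for d = 1: the univariate normal density."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_pdf(x, components):
    """Eq. (1): weighted sum of component densities.

    components: list of (alpha_k, mu_k, var_k); the alpha_k should sum to 1.
    """
    return sum(alpha * gaussian_pdf(x, mu, var) for alpha, mu, var in components)


def log_likelihood(samples, components):
    """Eq. (3): sum of log p(x_i) over a data set."""
    return sum(math.log(gmm_pdf(x, components)) for x in samples)


def classify(x, class_models):
    """Assign x to the class whose GMM returns the largest density."""
    return max(class_models, key=lambda label: gmm_pdf(x, class_models[label]))


# Hypothetical, hand-picked parameters for illustration only.
class_models = {
    "neutral": [(1.0, 0.0, 1.0)],
    "think": [(0.5, 3.0, 0.5), (0.5, 5.0, 0.5)],
}
print(classify(0.2, class_models))  # neutral
print(classify(4.1, class_models))  # think
```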

Neural classification from scalp EEG was performed using two schemes of class initialization. We defined the Scheme 1 (Action Type) as a differentiation of *n* time samples into one of three classes corresponding to the conditions of "Neutral," "Think," and "Do." In a similar initialization for Scheme 2 (Laban Effort quality Type), each condition of "Think" and "Do" were segregated into each of the eight Laban Effort quality elements, thereby forming an accumulation of 17 classes. The results of each classification could be observed by obtaining the confusion matrix of each classification scheme. This matrix provides the user with a detailed understanding of the overall accuracy rate in terms of the accuracy, or sensitivity and precision, for each class.
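The per-class quantities read off a confusion matrix can be sketched as follows. The layout (rows for the true class, columns for the predicted class) and the toy labels are assumptions chosen for illustration; the function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count samples; rows index the true class, columns the predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def sensitivity(matrix, i):
    """Fraction of true class-i samples that were predicted as class i."""
    row_total = sum(matrix[i])
    return matrix[i][i] / row_total if row_total else 0.0


def precision(matrix, i):
    """Fraction of samples predicted as class i that truly belong to class i."""
    col_total = sum(row[i] for row in matrix)
    return matrix[i][i] / col_total if col_total else 0.0


def overall_accuracy(matrix):
    """Fraction of all samples on the diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Toy labels, for illustration only.
classes = ["neutral", "think", "do"]
true = ["neutral", "neutral", "think", "think", "think", "do", "do", "do"]
pred = ["neutral", "think", "think", "think", "think", "do", "neutral", "think"]
cm = confusion_matrix(true, pred, classes)
print(cm)  # [[1, 1, 0], [0, 3, 0], [1, 1, 1]]
```

In this toy case the "think" class has perfect sensitivity (3 of 3) but a precision of only 0.6 (3 of 5 predictions), which shows why both rates are reported per class.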

#### **CROSS VALIDATION**

Overall classification accuracy and class precision rates were averaged by implementing a random sub-sampling cross validation scheme. That is, samples from the concatenated feature matrix of three trial blocks were randomly selected and placed into an equal number of samples per class based on a percentage of samples from the least populated class. This process was then repeated 10 times (**Figure 2**) in order to minimize the effects of random sampling bias, avoid over-fitting, and demonstrate replicability of the algorithm. A sampling of 10 accuracies was found to be sufficient as it usually resulted in a low standard error (ε < 1).
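A random sub-sampling scheme of this kind can be sketched as below. The helper names, the choice to draw as many test samples as training samples per class, and the `evaluate` callback (which in the study would train and test the LFDA-GMM classifier) are assumptions made for this illustration.

```python
import random


def balanced_split(labels, train_fraction, rng):
    """Draw equal-sized training and test index sets for every class.

    The per-class training size is train_fraction of the least populated
    class; the same number of test samples is drawn from the remainder of
    each class (an assumption of this sketch).
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    smallest = min(len(v) for v in by_class.values())
    n_train = int(train_fraction * smallest)
    n_test = min(n_train, smallest - n_train)
    train, test = [], []
    for indices in by_class.values():
        shuffled = rng.sample(indices, len(indices))
        train += shuffled[:n_train]
        test += shuffled[n_train:n_train + n_test]
    return train, test


def repeated_subsampling(labels, evaluate, train_fraction=0.5, n_repeats=10, seed=0):
    """Repeat the balanced split and return (mean score, standard error)."""
    rng = random.Random(seed)
    scores = [
        evaluate(*balanced_split(labels, train_fraction, rng))
        for _ in range(n_repeats)
    ]
    avg = sum(scores) / len(scores)
    var = sum((s - avg) ** 2 for s in scores) / (len(scores) - 1)
    return avg, (var / len(scores)) ** 0.5
```

Here `evaluate(train_indices, test_indices)` is expected to return one accuracy value per repetition; averaging over the 10 repetitions gives the mean and standard error reported for each subject.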

#### **FORWARD SELECTION OF EEG CHANNELS**

In an attempt to identify the EEG channels that contributed most to classification accuracy, the iterative process of forward selection was applied to the EEG channels and their corresponding lags that comprise the feature matrix. This was performed by computing the mean classification accuracy of each EEG channel independently using the LFDA-GMM algorithm, and ranking them in descending order of accuracy values. The highest ranked channel was added to the selected channels list (SCL), and tested against each of the remaining channels. The channel that ranked highest in classification accuracy when tested along with the SCL was added to the SCL for the next iteration. This procedure was repeated until all remaining non-SCL channels were exhausted.
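The greedy procedure above can be sketched generically. The `score` callback stands in for the cross-validated LFDA-GMM accuracy, and the toy scorer with its weights is purely hypothetical; only the selection loop mirrors the description in the text.

```python
def forward_selection(channels, score):
    """Greedy forward selection of channels.

    channels: iterable of channel names.
    score: callable mapping a list of channel names to a classification
           accuracy (a stand-in for the cross-validated classifier).
    Returns the channels in order of selection and the score after each step.
    """
    remaining = list(channels)
    selected, history = [], []
    while remaining:
        # Try each remaining channel together with the current selection.
        best = max(remaining, key=lambda ch: score(selected + [ch]))
        selected.append(best)
        remaining.remove(best)
        history.append(score(selected))
    return selected, history


# Hypothetical additive scorer, used only to exercise the loop.
weights = {"Cz": 0.30, "C3": 0.20, "C4": 0.15, "Pz": 0.05}


def toy_score(subset):
    return sum(weights[ch] for ch in subset)


order, history = forward_selection(weights, toy_score)
print(order)  # ['Cz', 'C3', 'C4', 'Pz']
```

Plotting `history` against the number of selected channels gives the kind of curve used to decide where accuracy stops increasing.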

#### **EXAMINATION OF POTENTIAL MECHANICAL ARTIFACTS ON EEG DECODING**

To assess the potential contribution of mechanical/motion artifacts to decoding, we performed a series of analyses including time-frequency analysis, cross-correlation analysis, and coherence analysis to compare the EEG signals with the motion signals acquired with the MARG sensors. First, we performed principal component analysis (PCA; Duda et al., 2012) on the acceleration data (*d* = 10 sensors). A cross-correlation analysis was then performed between the raw EEG (resampled to 100 Hz) and the first "synergy" (i.e., first PC) of acceleration data. Histograms and box plots of each EEG channel by PC1 calculated correlation values were subsequently assessed to observe differences across the distribution of each class. Second, we performed a time-frequency analysis to compare the raw EEG signals over selected frontal, lateral, central, and posterior scalp sites and the gravity-compensated accelerometer readings from the MARG sensor placed on the head. Then, we estimated the coherence between the raw EEG signals and the accelerometer signals. Finally, we computed a whole-scalp cross-correlation of the EEG signals and the head accelerometer readings to examine the contribution of head motion to EEG.

#### **RESULTS**

#### **KINEMATIC ANALYSIS**

**Figure 3** depicts a sample set of EEG and motion capture recordings for Subject 4, Trial 2 comprising all Action type classes for the Laban Effort quality of Flow, which includes the opposing elements of free and bound flows. PCA was performed upon the full time series of acceleration data from all 10 MARG sensors. The PCs whose cumulative variability summed to at least 80% were also featured within the sample set of signal data in **Figure 3**. Time series provided for both "neutral" blocks in **Figure 3** appear to be relatively "smooth" (less varying) in terms of both neural activity and kinematic movement. One exception to this includes rapid changes in acceleration around 169 s as confirmed by the acceleration plots. EEG signal patterns are visually distinct between "think" time segments of free and bound flow elements, especially with unique areas of modulation of neural activity at 185 s (free flow) and 209 and 214 s (bound flow), which contained little to no effect of motion artifacts, as confirmed by the kinematics signal data. By contrast, the "do" section of the Laban Effort quality of free flow was found to contain the greatest influence of motion embedded in the EEG signal data, as demonstrated by the large excursions in signal magnitude for both EEG and kinematics data. These differences between classes are more prominent when the distribution of PC values is observed for every class in the trial, as shown in **Figure 4**. Key features to note include the small variance accounted for by the "Do Light Weight" and "Do Sustained Time" classes, which reflects the small amount of movement the subject produced for these particular actions. Other classes such as "Do Free Flow" and "Do Quick Time" have a higher variance due to the nature of these efforts, as they cover a greater range of motion. Potential motion artifacts produced by the subject's movements appear to contaminate EEG signal patterns; however, the effect appears to be localized to specific classes of Laban Effort qualities (e.g., "Do Free Flow") and is thus not consistent over the entire time series. A more detailed analysis of potential mechanical/motion artifacts based on cross-correlation, coherence and time-frequency analyses is thus provided next.

**FIGURE 3 | Sample EEG and MARG recordings for Subject 4, Trial 2 with video recording (see Supplementary Materials).** EEG and accelerometer data are segmented by each condition (Neutral, Think, Do) of the Laban Effort quality of Flow. The first four PCs of acceleration data are also shown.

The distribution of correlation values between raw EEG channels and the first PC of the raw acceleration data returned a range of median correlation coefficients between 0.02 and 0.15 across classes (**Figure 5A**). Outliers were identified for some efforts, and may be indicative of a close relationship between a particular EEG channel and the first PC "synergy" of acceleration. The coefficient of determination was obtained by squaring each correlation coefficient ρ. This coefficient is defined as the percent variation in the values of the dependent variable (raw EEG) that can be explained by variations in the values of the independent variable (acceleration). Coefficient of determination (ρ<sup>2</sup>) values were generally low and ranged from ∼0.0 to ∼0.23 (that is, ∼0 to 23% of the total variation of the raw EEG can be accounted for by changes in PC1) across all subjects and electrodes. Spatial distributions of ρ<sup>2</sup>-values were plotted as scalp maps to indicate the relationship between the raw EEG and the head acceleration across scalp channels. Peaks of highest accounted variance (**Figure 5B**) were observed for certain Laban Effort qualities, most notably in the occipital regions for "Think Quick Time" and "Think Light Weight" and temporal regions for "Do Sustained Time" for Subject 4 (see Supplementary Material for ρ<sup>2</sup> data from other subjects).
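The relation between a correlation coefficient and the fraction of variance it accounts for can be illustrated with a short sketch. It computes a zero-lag Pearson coefficient on toy sequences; treating the reported correlations as zero-lag Pearson values is an assumption of this example, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Zero-lag Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coefficient_of_determination(x, y):
    """Square of the correlation: fraction of variance in y explained by x."""
    return pearson_r(x, y) ** 2


# Toy sequences standing in for one EEG channel and the first acceleration PC.
eeg_channel = [1.0, 3.0, 2.0, 4.0]
accel_pc1 = [1.0, 2.0, 3.0, 4.0]
print(round(pearson_r(accel_pc1, eeg_channel), 2))  # 0.8
print(round(coefficient_of_determination(accel_pc1, eeg_channel), 2))  # 0.64
```

By the same arithmetic, a median correlation of 0.15 corresponds to about 2% of explained variance (0.15² ≈ 0.02).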

A similar analysis comparing the raw EEG signals and the head accelerometer (which directly recorded EEG electrode movements), rather than the first PC "synergy," was also conducted (**Figure 6**). This resulted in correlation values generally below ρ = 0.15, though many boxplot distributions varied by subject throughout each Laban Effort quality (**Figure 6A**). Although strong relationships between the accelerometer and EEG signals may be expected, the relatively low ρ<sup>2</sup> scores indicate otherwise. Low correlations between neural activity and head motion were observed for classes such as "Bound Flow," which is reasonable given the rigid-like movements that this effort entails. In contrast, much higher correlation coefficients remained for "Light Weight" and "Indirect Space" time segments. **Figure 6B** depicts scalp maps with ρ<sup>2</sup>-values between head accelerometer and raw EEG data for Subject 4. In the scalp maps some classes show channels with slightly elevated values of ρ<sup>2</sup> = 0.1 (that is, ∼10% of the total variation of the EEG is accounted for by head motion), specifically in "Think Light Weight," "Think Direct Space," "Think Quick Time," and "Do Sustained Time," for Subject 4. Overall, these analyses showed a slight contamination of EEG signals due to head movement for some classes of Laban Effort qualities (see Supplementary Material), but the amount of total variance in the EEG signals explained by head motion was relatively small.

Additionally, time-frequency and coherence analyses were performed upon the raw signals of three selected EEG electrodes (Cz, C6, and POz) representing a sampling of the spatial assortment of neural activity across the scalp, as well as the gravity-compensated acceleration magnitude of the head MARG sensor, by generating two spectrograms, as shown in **Figure 7**. The spectrograms were generated by computing the short-time Fourier transform (STFT) over a time window of samples with overlap at each PSD computation of the FFT. We used a frequency range between 0.1–40 Hz and a time window of 1024 samples with 93% overlap. The mean-squared coherence between the head acceleration and each corresponding EEG electrode at each frequency value was computed using Welch's overlapped-segment averaging technique (Carter, 1987). From the spectrograms it can be observed that the actions "Do Quick Time," "Do Think Free Flow," "Do Strong Weight," and short-lived portions of "Neutral" tasks contained higher power in the head accelerometer readings that may affect decoding. However, coherence estimates were generally low (<0.3; see **Figure 7**) with some transient increases in coherence between EEG and head acceleration during some Laban Effort qualities. Given that relatively high levels of coherence were short-lived and localized to a few classes of Laban Effort qualities, and that random sampling of EEG signals was used for training and cross-validation of our neural classifiers, we argue that motion artifacts, if present, had only a very minor contribution to decoding. We further discuss these results below.

#### **DECODING ACTION TYPE FROM SCALP EEG**

We first examined the feasibility of inferring the action type ("neutral," "think," "do"), irrespective of Laban Effort quality, from scalp EEG. Analyses showed that the "think" condition had higher sensitivity than the other two action types. Based on the optimization of LFDA parameters, the mean accuracy rate (10 random subsampling cross-validation iterations were used for each subject) was 56.2 ± 0.6% by Action Type for Subject 1 (*r* = 300, *knn* = 21), which was well above the 33% chance level. Similar classification accuracy results were obtained for the rest of the subjects, namely 57.0 ± 0.4% for Subject 3, 62.1 ± 0.5% for Subject 4, and 62.4 ± 1.0% for Subject 5. **Figure 8** shows the mean classification accuracies for the different data sets tested.

Predicted samples were summed across all four subjects and normalized by dividing each predicted sample size by the actual class sample size, as indicated by the percentages within each confusion matrix block (**Figure 9**). **Figure 9** depicts the confusion matrix for the Action Type decodes. Classification of EEG patterns corresponding to the "think" class achieved the highest classification rates (88.2%), followed by both "neutral" and "do" classes. Note that the highest misclassifications occurred for class "neutral," which were classified as belonging to the "think" (32.9%) class. The worst performance was for the "do" class as instances of "neutral" (23.5%) and "think" (50.7%) were misclassified as "do."

#### **DECODING LABAN EFFORT QUALITY TYPE FROM SCALP EEG**

We then examined the classification accuracy for Laban Effort quality Type (8 Think about Laban Effort quality + 8 Do Laban Effort quality + Neutral = 17 classes). In this case, nearly all test samples were accurately classified into their respective classes, which resulted in 88.2% classification accuracy across subjects. **Figure 8** (black bars) shows the mean classification accuracies for Laban Effort qualities across subjects. Interestingly, most of the misclassified test samples were assigned to the "neutral" class, as shown by the relatively high percentages for all non-"neutral" classes in the first column (**Figure 10**). Based upon **Figure 10**, classes related to actions of "do" were more difficult to classify (relative to actions of "think") except for "Do Quick Time," which had the highest sensitivity rate overall (96.5%).

#### **TRAINING SAMPLE SIZE EFFECTS ON CLASSIFICATION ACCURACY**

The effect of training sample size on classification accuracy was also examined in Subject 1. The training sample size constituted a percentage (20–90) of the least populated class. Classification of Action type was not significantly affected by percentage of training samples (**Figure 11**); however, classification of Laban Effort quality type showed a non-linear increase as a function of percentage of training samples.

#### **RELEVANT EEG CHANNELS FOR CLASSIFICATION**

A forward selection approach was employed per subject in order to identify the EEG channels with the most useful information for classification (Pagano and Gauvreau, 2000). While maintaining the number of reduced dimensions (*r*) and *k*-nearest neighbors (*knn*) constant (*r* = 10, *knn* = 7) and operating under the Effort Type classification scheme, the mean classification accuracy was computed for all 39 channels and corresponding lags independently. The channel that yielded the highest classification accuracy (channel A) was then selected. Classification accuracies were then re-computed by adding channel A to each of the remaining 38 channels independently. The channel-pair yielding the highest accuracy was again selected and added to each of the remaining channels to find the channel-triplets yielding the highest accuracy, and so on. This process continued until no channels remained, and classification accuracy was shown to stop increasing after selecting approximately 10 electrodes for each subject (shaded gray region in **Figure 12**). Hence, 10 electrodes were retained for further analysis per subject, as illustrated by the scalp maps depicted in **Figures 13A–D**. Electrodes common to at least two subjects are highlighted in **Figure 13E**; they span scalp areas above bilateral premotor and motor cortices and dorsal parietal lobule areas. This is consistent with previous studies seeking to associate dancing movements with cortical regions (Cross et al., 2006, 2009). Though peak accuracies at 10 electrodes (**Figure 12**) were low (40–50%) relative to optimized Effort Type accuracies (**Figure 8**), this was largely due to the lower reduced dimension parameter for LFDA. This suggests that a higher-than-chance classification accuracy can be obtained by using as few as 10 electrodes. Nevertheless, relevant information within all 39 EEG channels ultimately allows the classifier to reach more than 90% decoding accuracy (**Figure 8**).

**magnitude of the head MARG sensor for Subject 4.** Frequency axes are shown in logarithmic scale. Note the generally low coherence

lines above each figure indicate the efforts windows in **Figures 3**, **4** to compare to each spectrogram plot.

#### **EFFECTS OF HEAD MOTION ON NEURAL CLASSIFICATION**

We examined the relationship between classification performance and motion artifact contamination. Taking the ρ-values from **Figure 5A**, we compared them with each class' F1 score in classification. If classes with higher ρ-values showed a higher F1 score, this would mean that the classifier was able to better classify the classes that were modulated by motion artifacts. However, **Figure 14** shows no evidence of a correspondence between the F1 score and the correlation coefficients per class.

The F1 score (5) is the harmonic mean of the sensitivity and precision rates, and thus reflects the overall accuracy of a particular class (Hripcsak and Rothschild, 2005). For purposes of this study we use the balanced F1 score equation, defined as:

$$F = \frac{\left(1 + \beta^2\right) \times sensitivity \times precision}{\left(\beta^2 \times precision\right) + sensitivity}, \ \beta = 1\tag{5}$$

where β is used as a weighting factor between sensitivity and precision. Overall, a direct relationship between classification success and the median correlation coefficient of EEG channels-to-acceleration data does not seem to occur, but rather a tendency exists for high successes of neural classification in classes that also contain low correlations with accelerometer data.
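For reference, the balanced F score can be computed directly from a class's sensitivity and precision. The sketch below uses the standard F<sub>β</sub> weighting (β<sup>2</sup> in the denominator), which coincides with Equation (5) for the value β = 1 used in this study; the function name `f_score` and the input values are illustrative assumptions.

```python
def f_score(sensitivity, precision, beta=1.0):
    """F-beta score; beta = 1 gives the balanced F1 score (harmonic mean)."""
    denominator = beta ** 2 * precision + sensitivity
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * sensitivity * precision / denominator


# Illustrative values: perfect sensitivity, but only 60% precision.
print(round(f_score(1.0, 0.6), 2))  # 0.75
```

Because the harmonic mean is pulled toward the smaller of the two rates, a class that is detected reliably but also attracts many false assignments still receives a moderate score.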

#### **EFFORT TYPE CLASSIFICATION REPRESENTED IN 4D LABAN SPACE**

**Figure 15** illustrates the highly predictive power of the Laban Effort quality Type neural classification scheme. Using a normalized variant of the GMM probability density function, we placed weightings to the four coordinates in the Laban Effort quality space. Each axis corresponds to a Laban Effort quality of *Space*, *Flow*, *Weight*, and *Time*. Some testing samples were found to be misclassified between *Indirect Space*, *Light Weight*, and *Quick Time* axes, as shown by the ellipsis in **Figure 15**. This may suggest shared characteristics between the expressive movements that cause such misclassification. Non-expressive, or non-classifiable, samples are depicted as green foci falling near the center of the plot, as indicated by the small arrows. The small amount of nonclassified samples reflects the overall error of the classifier to predict Laban Effort quality using neural recordings.

#### **DISCUSSION**

#### **CLASSIFICATION OF EXPRESSIVE MOVEMENTS FROM SCALP EEG**

In this study we demonstrate the feasibility of classifying expressive movement from delta-band EEG signals. Classification rates ranged from 59.4 ± 0.6% for decoding of Action Type ("neutral," "think," and "do") to 88.2 ± 0.7% for decoding of Laban Effort quality (17 classes). Surprisingly, only the "think" class was reliably decoded from EEG, whereas the "neutral" and "do" classes were poorly decoded. It should be noted that subjects were not instructed to perform a particular pattern of movement, but rather a mode of action ("neutral," "think," and "do") and a Laban Effort quality as a component of LMA. Thus, subjects performed highly individualized, changing movement patterns throughout the recording session irrespective of mode of action. We note that our neural decoding framework uses a within-subject approach in which neural classifiers are trained for each subject. Such neural decoding approaches are subject specific (Lotte et al., 2007; Bradberry et al., 2010; Presacco et al., 2011, 2012; Wagner et al., 2012; Bulea et al., 2013), and thus both common and unique neural patterns are expected to influence classification. Conventional statistical analyses can therefore be difficult to interpret in the context of this framework because many factors affect the resulting estimates of significance (e.g., assumptions underlying response distribution, sample size, number of trials, and data over-fitting) (Tsuchiya et al., 2008). Given the cross-validation procedure (i.e., separate random sampling of data for training and test trials) used in our methodology, the risk of over-fitting is minimized. By deploying our methodology for investigating differences in cortical EEG activity patterns, especially as a function of within-subject training, valuable information could be learned about the adaptation/learning trajectories of those patterns and their relationship to performance and training. On the other hand, the consistency of the underlying neural representations, within a subject, would be a valuable metric in longitudinal studies.

**FIGURE 11 | Mean accuracies (for 10 iterations) across varying percentage of training samples for classification by Action (3 classes) and Laban Effort quality (17 classes) types for Subject 1.** LFDA parameters: (*r* = 180, *knn* = 7) for both classification schemes. <sup>∗</sup>Training data samples constitute a percentage of the least populated class.

rate at 10 electrodes, highlighted by the vertical gray bar, was displayed in **Figure 8** to demonstrate the extent of classifying using only 10 electrodes at such a relatively low dimensionality.

#### **DECODING OF ACTION TYPE AND LABAN EFFORT QUALITIES**

The mean decoding accuracy for action type ("neutral," "think," "do") was near 60%, which was well above chance level. Interestingly, the classification rate for the "think" actions was highest (88.2%), followed by "neutral" (64.3%) and "do" actions (25.8%). We note that individualized and unscripted functional movements were performed across all three action types. Thus, the lowest classification rate for the "do" actions may reflect neural patterns that contain integrated elements of "thought" expressiveness and functional movement that were enacted by the dancers. This would likely have introduced "noise" to these patterns, as diverse functional movements were performed irrespective of the Laban Effort qualities being imagined. On the other hand, the "neutral" actions, albeit unscripted and varying across time, contained separable information for the classifiers to discriminate them from the other action types. Only the "think" actions contained separable information about functional movement and Laban Effort qualities, which could be decoded by the classifiers. Thus, the "neutral" class would be expected to yield the worst classification rate given the stochastic pattern of functional movements it contains. Likewise, the poor classification of the "do" class may be due to the heterogeneous mixture of co-occurring functional and expressive movements, which may introduce some neural noise within the neural activity evoked by this action.

Interestingly, our results demonstrate greater predictive power for the classification of each Laban Effort quality element than for the aggregation of all Laban elements into a single condition-defined class (**Figures 9**, **10**), suggesting that the neural internal states associated with these efforts contain differentiable features, beyond the movements performed, that can be extracted from scalp EEG.

#### **INFLUENCE OF MOTION ARTIFACTS**

Given the nature of the experimental setup, it is reasonable to assume that the EEG data may be contaminated by motion artifacts. To examine this possibility we performed a series of analyses to uncover any potential relationship between the EEG signals and the dancers' body and head movements. We found that in a few instances the correlation between the raw EEG and the dancers' movements assessed via the MARG sensors was moderately high; however, these effects appear to be localized to particular segments of time (see **Figures 3**, **4**). We also note that periods of intense unscripted and varying functional movements may have been responsible for the periods of higher correlation and coherence estimates. However, we hypothesize that for the same reason, neural activity related to the "thinking" of Laban Effort qualities may have occurred during or modulated varying body and head movements, thus making these motions likely irrelevant for classifiers. Additionally, the relatively low coefficients of determination between EEG and kinematics data demonstrated that the percentage of EEG signal variability accounted for by head motion was rather small. Moreover, the random sampling of both training and testing datasets would have precluded the introduction of kinematic influences in both calibration and testing of the classifier, as the temporal nature of kinematic artifacts would not have been included in the training or testing data. This, however, warrants further investigation to develop better strategies for implementing MoBI approaches to capture neural mechanisms behind general movements.

**FIGURE 13 | Binary scalp maps for each subject depicting the first 10 electrodes identified as yielding the highest combined accuracy as computed by the forward selection algorithm. (A)** S1. **(B)** S3. **(C)** S4. **(D)** S5. **(E)** Electrodes common to at least two subjects, as indicated by the circles above a particular electrode channel. Given the subject-specific nature of neural decoding schemes, common and unique neural patterns were expected (Lotte et al., 2007; Bradberry et al., 2010; Presacco et al., 2011, 2012; Wagner et al., 2012; Bulea et al., 2013).

clearly distinguishable as pertaining to a specific class.

Overall, our results show the feasibility of inferring the expressive component of movements (according to the Laban categorization) from scalp EEG, especially when those components are imagined as subjects perform unscripted natural body movements. These results may have implications for the study of movement training, disease and brain-computer interfaces for restoration of expressive movements.

#### **ACKNOWLEDGMENTS**

This material is based upon work supported by the National Science Foundation under Grant # HRD-1008117 and the National Institutes of Health Award # NINDS R01 NS075889, and also in collaboration with Laboratorio de Robotica del Noreste y Centro de Mexico-CONACyT, Tecnologico de Monterrey. We would also like to express our gratitude to Thomas Bulea, Atilla Kilicarslan, and Harshavardhan Agashe for assisting with the LFDA-GMM algorithms and setting up the EEG, kinematics, and video recordings for this study.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00188/abstract

#### **Video Recordings to Figure 3 | Subject 4, Trial 2 EEG and kinematics data.**

Recordings are included in the online manuscript submission and are titled as: Direct Space Effort (Movie 1), Free Flow Effort (Movie 2), and Bound Flow Effort (Movie 3).

**Figure 5.S1 | Mapping of the coefficient of determination (*ρ*<sup>2</sup>) between the first principal component of the accelerometer data and unprocessed EEG data for each Laban Effort quality performed by each subject.**

**Figure 6.S1 | Mapping of the coefficient of determination (*ρ*<sup>2</sup>) between the head accelerometer magnitude and unprocessed EEG data for each Laban Effort quality performed by each subject.**

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 November 2013; accepted: 14 March 2014; published online: 08 April 2014.*

*Citation: Cruz-Garza JG, Hernandez ZR, Nepaul S, Bradley KK and Contreras-Vidal JL (2014) Neural decoding of expressive human movement from scalp electroencephalography (EEG). Front. Hum. Neurosci. 8:188. doi: 10.3389/fnhum.2014.00188 This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Cruz-Garza, Hernandez, Nepaul, Bradley and Contreras-Vidal. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Linking motor-related brain potentials and velocity profiles in multi-joint arm reaching movements

#### *Julià L. Amengual<sup>1</sup>\*, Josep Marco-Pallarés<sup>1,2</sup>, Carles Grau<sup>3</sup>, Thomas F. Münte<sup>4</sup> and Antoni Rodríguez-Fornells<sup>1,2,5</sup>*

*<sup>1</sup> Cognition and Brain Plasticity Unit, Department of Basic Psychology, University of Barcelona, Barcelona, Spain*

*<sup>2</sup> Bellvitge Biomedical Research Institute (IDIBELL), Hospitalet de Llobregat, Spain*

*<sup>3</sup> Neurodynamic Laboratory, Department of Psychiatry and Clinical Psychobiology, Universitat de Barcelona, Barcelona, Spain*

*<sup>4</sup> Department of Neurology, University of Lübeck, Lübeck, Germany*

*<sup>5</sup> Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain*

#### *Edited by:*

*Klaus Gramann, Berlin Institute of Technology, Germany*

#### *Reviewed by:*

*Hirokazu Tanaka, Japan Advanced Institute of Science and Technology, Japan*

*Franck Vidal, Aix-Marseille Université, France*

#### *\*Correspondence:*

*Julià L. Amengual, Cognition and Brain Plasticity Unit, Department of Basic Psychology (Campus Bellvitge), University of Barcelona, C/Feixa Llarga S/N 08107, Hospitalet de Llobregat, Barcelona, Spain e-mail: julian.amengual@gmail.com*

The study of movement-related brain potentials (MRBPs) requires accurate technical approaches to disentangle the specific patterns of brain activity during the preparation and execution of movements. During the last forty years, synchronizing the electromyographic activation (EMG) of the muscle with electrophysiological recordings (EEG) has been commonly used for this purpose. However, new clinical approaches in the study of motor diseases and rehabilitation call for new paradigms that go further into the study of the brain activity associated with the kinematics of movements. In response to this call, we used a 3-D hand-tracking system to continuously record the position of an ultrasonic sender attached to the hand during the performance of multi-joint self-paced movements. We synchronized the time series of position and velocity of the sender with the EEG recordings, obtaining specific patterns of brain activity as a function of the fluctuations of the kinematics during natural movement performance. Additionally, the distribution of the brain activity during the preparation and execution phases of movements was similar to that reported previously using EMG, suggesting the validity of our technique. We claim that this paradigm could be usable in patients because of its simplicity and the potential knowledge that can be extracted from clinical protocols.

**Keywords: motor related brain potentials, 3-D movement analyser, time-series analysis, kinematics, self-paced movement, motor activity**

#### **INTRODUCTION**

Over the last 40 years, the electrophysiological brain activity (EEG) associated with the preparation and execution of movements has been widely described. The Bereitschaftspotential (BP) (Kornhuber and Deecke, 1965), also termed readiness potential, is a slow negativity starting 1.5–2 s before the onset of the movement that shows a wide scalp distribution, being maximal over centro-parietal regions. In addition to the BP itself, a set of components related to the preparation and execution of movements has been identified, known as movement-related brain potentials (MRBPs) (see Shibasaki and Hallett, 2006, for a review). Furthermore, during the preparation and execution of voluntary movements, a characteristic modulation of the oscillatory brain activity power within the beta (17–24 Hz) and the mu (8–13 Hz) bands has been extensively described (Pfurtscheller and Aranibar, 1977, 1979; Pfurtscheller et al., 2003; Jurkiewicz et al., 2006).

In order to give a more fine-grained characterization of the neural sources of these potentials and the associated oscillatory brain activity, several studies have applied the Laplacian transformation to the EEG to obtain current source density (CSD) waveforms (Nunez, 2002; Carbonnell et al., 2004; Kayser et al., 2010; Meckler et al., 2010; Tenke and Kayser, 2012). This method allows the topographical distribution of the brain activity to be evaluated in terms of current sources and sinks across the scalp (Kayser and Tenke, 2006). The maps of activity generated by these CSD waveforms are sensitive to high spatial frequency changes of local cortical potentials because volume conduction from distant sources is reduced (Le et al., 1994). Additionally, the method minimizes smearing effects caused by tissue transmission distortion (Perrin et al., 1989) and other possible artifacts (Kayser and Tenke, 2006). In particular, it has been proposed that this transformation is especially useful for localizing the sources of activity in sensorimotor tasks (Tenke and Kayser, 2012).

Classical paradigms designed to study these MRBPs use the surface electromyographic (EMG) signal originating from one muscle or group of muscles, recorded simultaneously with the EEG, to measure the activation of these muscles while subjects repeat movements at self-paced rates (Cui et al., 1999; Ohara et al., 2006). This technique allows the movement onset to be identified as a rebound in the EMG signal, so that the EEG can be epoched offline, time-locked to the onset of each movement, and the epochs subsequently averaged. Additionally, this signal provides useful information about the strength of the muscular contraction, allowing the characterization of the components of the MRBPs as a function of the kinetic parameters of movement. However, only a few studies have focused on the association between these components and the kinematic properties of movement (e.g., position, velocity and acceleration). Slobounov et al. (2000) reported an increase in the amplitude of the late component of the BP (the so-called *late BP*; Shibasaki and Hallett, 2006) when the maximum degree of the index finger extension was achieved. In a pioneering study, Kirsch et al. (2010) reported a positive correlation between the amplitude of the BP and both velocity and distance during the execution of goal-directed movements. In addition, they found that increases in the target distance and the movement time were associated with the smoothness of the time-course of the BP. To this aim, these authors used a goal-directed movement paradigm and measured the movement performance using a 3D hand-tracking system to identify the movement onset instead of using the classical EMG signal. This procedure allowed them to relate both the internal forces of the movement (i.e., kinetics) and external motion parameters such as position and velocity (i.e., kinematics) to the electrical brain activity.

Other studies have used the time series of velocity coupled with the EEG signal in order to develop brain-computer interface (BCI) methods. In an outstanding study, Bradberry et al. (2010) estimated the trajectories of self-paced reaching movements by extracting associated patterns of EEG activity. They used a 3D tracking system to continuously extract the hand velocity coupled with the EEG signal and estimated the sources of brain activity that were most strongly involved in encoding the hand velocity using sLORETA. However, no studies to date have established a direct relationship between changes in the pattern of velocity during movement performance and the associated EEG activity at each stage of the movement.

We aimed to investigate the cross-relationship between the fluctuations of the velocity during the execution of natural self-paced movements and the concomitant EEG activity. Self-paced movements have the property of being self-initiated, that is, triggered by the internal decision of the subject instead of being externally triggered. In the present study we designed a paradigm that required participants to reach a target positioned at a given exact location using both arms through multi-joint arm reaching movements. During the task, the 3D spatial position of an ultrasound marker located on the hand was recorded using a hand-tracking system, thus obtaining the time series of the position of this marker. We synchronized this signal with that obtained from the EEG recordings with two different aims. First, we wanted to establish the onset of each movement using the derivative of the time series of the spatial position (that is, the time series of velocity), which permitted epoching the EEG time-locked to this time-point, as done in previous studies with the EMG signal (Cui et al., 1999; Ohara et al., 2006). Second, we aimed to directly compare the time series of the velocity of the sender with the components of the MRBPs and their CSD-transformed signal, determining a point-to-point relationship between the different phases of the movement execution and the concomitant brain activity. To our knowledge, no previous studies have addressed this issue, and we hypothesize that this novel way of working with these components would allow us to find patterns of neural and oscillatory activity related to variations of the velocity during movement performance. Finding results similar to those reported in studies using the EMG as the movement-related signal would indicate the validity of this technique for studying movement-related brain activity.

Also, because of the simplicity of our experimental design, our study highlights the potential of examining the brain activity associated with movement using the hand-tracking system in clinical protocols.

#### **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Fifteen right-handed healthy volunteers participated in this experiment (8 women, mean age 26.51 ± 3.42 years). All participants were drug free and had no history of neurological diseases. They all gave written informed consent and were paid for their participation in the study. The study was approved by the ethics committee of the University of Barcelona and was conducted according to the Helsinki Declaration.

#### **EXPERIMENTAL PROCEDURE**

Participants sat in a comfortable chair. They rested their hand on the table surface, about 10 cm from the edge, with the index finger in extended position. They were asked to perform self-paced pointing movements reaching a white target plate located 20 cm in front of the starting point with their extended index finger (**Figure 1**). These movements consisted of elbow extensions, which might additionally involve other components such as shoulder extension (see **Figure 1B**). Therefore, we will refer to these movements as multi-joint reaching movements. The trajectory of these movements was parabolic-shaped, i.e., participants were not to drag the arm across the table to reach the target, and they had to perform the backward movement to the initial position as soon as the target was reached. Thus, movements were performed with a very sharp onset, starting out from muscular relaxation.

Importantly, no external cue was used to trigger the intention of the movement, so that subjects performed each movement of their own accord. They were asked to allow an interval of 7–10 s between each movement. In order to avoid horizontal eye-movement artifacts, subjects were instructed to fixate their gaze on the target cue during the whole task and not to blink from about 3 s before movement onset to around 4 s after completion of the movement. At the beginning of the experimental session, at least 10 practice trials were performed in order to check and adjust the frequency of execution of the movements, as well as to avoid any kind of rhythmicity in the performance and blinking. Special care was taken so that the subjects sat upright during this task, and they were instructed before and during the task to minimize head movements.

The experimental design consisted of four blocks of movements (10 min per block), each performed with a single arm, with two blocks per arm. Arms were alternated (right-left-right-left and vice versa) and the order of alternation was counterbalanced across subjects.

#### **HAND-TRACKING SYSTEM AND ANALYSIS**

A computerized hand-tracking system (CMS-30P, Zebris, Isny, Germany) was used to continuously record the three-dimensional spatial position of an ultrasound marker attached to the metacarpophalangeal joint of the index finger (**Figure 1A**). Data were sampled at 66 Hz and analyzed with an in-house script using MatLab 7.5 (Mathworks Inc., Natick, MA). The recorded time series of the trajectory of the hand movements were filtered offline using a moving average filter (10 data points) in order to reduce signal artifacts produced by spurious movements during performance. Each data sample consisted of three coordinates (components) that were used for the three-dimensional reconstruction of the trajectory of each movement (**Figures 1C and D**).

time points during movement (preparation, achievement of the maximum

For each component of the time series of the trajectory, we computed offline the time series of the velocity through numerical differentiation (Hermsdörfer et al., 2003). Typically, in each trajectory, the velocity increased up to a maximum and then decreased again until a local minimum when the target was reached (forward movement). Afterwards, an increase of negative velocity indicated the return of the hand to the starting position (backward movement).
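The smoothing and differentiation steps can be sketched as follows. This is a minimal Python illustration rather than the in-house MatLab script: the trailing form of the 10-point moving average and the central-difference scheme are assumptions, since the text specifies only the window length and "numerical differentiation". Function names and the example trace are hypothetical.

```python
from typing import List

TRACKER_RATE_HZ = 66.0  # sampling rate reported in the text


def moving_average(signal: List[float], window: int = 10) -> List[float]:
    """Trailing moving average; the window is shorter for the first samples."""
    smoothed = []
    for i in range(len(signal)):
        segment = signal[max(0, i - window + 1): i + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def velocity(position_mm: List[float], fs: float = TRACKER_RATE_HZ) -> List[float]:
    """Central-difference derivative in mm/s (one-sided at both ends)."""
    n = len(position_mm)
    if n < 2:
        return [0.0] * n
    dt = 1.0 / fs
    vel = [(position_mm[1] - position_mm[0]) / dt]
    for i in range(1, n - 1):
        vel.append((position_mm[i + 1] - position_mm[i - 1]) / (2 * dt))
    vel.append((position_mm[-1] - position_mm[-2]) / dt)
    return vel


# Hypothetical y-coordinate trace in mm: a steady 2 mm advance per sample.
y_mm = [2.0 * i for i in range(20)]
y_velocity = velocity(moving_average(y_mm))
```

A trailing window delays the smoothed signal by a few samples; a centered window would avoid this lag at the cost of using future samples.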

For each movement, the onset of the forward movement was defined as the first data point whose longitudinal component (y-axis, **Figure 1D**) met three conditions: (i) its velocity exceeded a threshold of 8 mm/s, (ii) it was displaced at least 5 mm from the initial position, and (iii) none of the following 20 points of the velocity time series (corresponding to approximately 300 ms) crossed back over the zero line.
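The three onset criteria can be expressed as a short search over the longitudinal time series. The Python sketch below is one possible reading of the rule: it assumes that the "initial position" is the first sample of the analyzed segment and that "not crossing back" means the velocity stays strictly positive. The function name and the example data are hypothetical.

```python
from typing import List, Optional


def find_onset(position_mm: List[float],
               velocity_mm_s: List[float],
               vel_threshold: float = 8.0,   # mm/s, criterion (i)
               min_displacement: float = 5.0,  # mm, criterion (ii)
               lookahead: int = 20) -> Optional[int]:  # samples, criterion (iii)
    """Return the index of the first sample meeting all three criteria."""
    start = position_mm[0]  # assumed resting position
    for i in range(len(velocity_mm_s) - lookahead):
        fast_enough = velocity_mm_s[i] > vel_threshold
        displaced = (position_mm[i] - start) >= min_displacement
        following = velocity_mm_s[i + 1: i + 1 + lookahead]
        stays_positive = all(v > 0 for v in following)
        if fast_enough and displaced and stays_positive:
            return i
    return None  # no onset found in this segment


# Hypothetical trace: 30 samples at rest, then 1 mm of advance per sample.
pos = [0.0] * 30 + [float(k) for k in range(1, 41)]
vel = [0.0] * 30 + [66.0] * 40
onset_index = find_onset(pos, vel)
```

At 66 Hz, 20 samples span about 303 ms, which matches the approximately 300 ms look-ahead stated in the text.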

For each movement trajectory and for each hand, several performance parameters were considered. Movement time indicated the time taken to reach the target position. The peak velocity was the maximum value that the velocity achieved during the movement time. As described in Hermsdörfer et al. (2003), the percentage of acceleration time was calculated as the percentage of the whole movement time that had elapsed when the peak velocity was achieved. The maximum height achieved during the movement time was also determined, as well as the percentage of acceleration time for the height. Finally, the time elapsed between two consecutive movement onsets was calculated.
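A minimal Python sketch of these parameters for a single forward movement is given below. It assumes that the onset and offset sample indices are already known and that the percentage of acceleration time is the time to peak velocity divided by the movement time. The function name, dictionary keys, and example values are hypothetical.

```python
from typing import Dict, List


def kinematic_parameters(velocity_mm_s: List[float],
                         onset: int,
                         offset: int,
                         fs: float = 66.0) -> Dict[str, float]:
    """Movement time, peak velocity and percentage of acceleration time."""
    if offset <= onset:
        raise ValueError("offset must come after onset")
    segment = velocity_mm_s[onset: offset + 1]
    peak_velocity = max(segment)
    samples_to_peak = segment.index(peak_velocity)
    duration_samples = offset - onset
    return {
        "movement_time_s": duration_samples / fs,
        "peak_velocity_mm_s": peak_velocity,
        "pct_acceleration_time": 100.0 * samples_to_peak / duration_samples,
    }


# Hypothetical velocity profile (mm/s) for one forward movement.
vel_profile = [0.0, 10.0, 20.0, 30.0, 40.0, 30.0, 20.0, 10.0, 5.0, 2.0, 0.0]
params = kinematic_parameters(vel_profile, onset=0, offset=10)
```

The same function could be applied to the vertical component to obtain the corresponding values for the height.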

correspond to the backward movement towards the initial position.

To test for differences between left and right movement performance, paired *t*-tests were applied separately for each parameter described above. To analyze the similarity of these parameters between left and right movements, we used the Pearson correlation for each parameter. For the *t*-tests and the Pearson correlations, the significance level was set at *p* = 0.05.

#### **EEG DATA ACQUISITION**

The EEG signal was recorded continuously (bandpass-filtered 0.01–250 Hz; A/D rate 500 Hz) with a Brainvision system (BrainProducts, Munich, Germany), and analyzed offline using the EEGLAB toolbox (Delorme and Makeig, 2004). An electrode cap was used to record EEG from 29 Ag/AgCl electrodes (Fp1/2, F3/4, C3/4, P3/4, O1/2, F7/8, T3/4, T5/6, Fz, Cz, Pz, FC1/2, FC5/6, CP1/2, CP5/6, PO1/2) using the extended 10–20 system (Jasper, 1958). An external electrode placed on the right ocular canthus was used as reference. The ground electrode was placed on FCz. A VEOG electrode was placed 1 cm below the right eye to detect vertical eye movements, and two additional electrodes were placed on each mastoid, all of them recorded against the reference electrode. All impedances were kept below 5 kΩ. Data were bandpass-filtered offline between 0.01 and 45 Hz. Eye-movement artifacts were removed using a second-order blind identification (SOBI) technique (Joyce et al., 2004). EEG data were re-referenced offline to the algebraic summation of both mastoids.

#### **EEG-TRACKING SYSTEM SYNCHRONIZATION**

The synchronization between the EEG signal and the hand-tracking system was performed to allow a point-by-point temporal correspondence between both time series (ERPs and trajectories). To this end, we used a PC running the software Presentation (Neurobehavioral Systems, Albany, CA) to simultaneously send a 5 V electrical square wave to both the hand-tracking system and the EEG recorder before each block of movements. We used an in-house cable to send this electrical signal out through a parallel port and feed it into the tracking system on one side (through a parallel port) and into the EEG recorder on the other side (through a serial port). When the square wave was received, the continuous recording of the position of the ultrasound sender started. The mark appearing in the EEG recording at this time was later used offline as a synchronization marker between the recordings from the tracking system and the EEG.
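Because the tracker (66 Hz) and the EEG (500 Hz) run at different sampling rates, the shared marker lets any tracker sample be mapped onto the EEG time base. The Python sketch below assumes that tracker sample 0 coincides exactly with the EEG marker and that both clocks run without drift. Neither assumption is stated in the text, and the function name is hypothetical.

```python
EEG_RATE_HZ = 500.0      # A/D rate of the EEG recording
TRACKER_RATE_HZ = 66.0   # sampling rate of the hand-tracking system


def tracker_to_eeg_sample(tracker_index: int, marker_eeg_sample: int) -> int:
    """Map a tracker sample index to the nearest EEG sample index."""
    seconds_since_marker = tracker_index / TRACKER_RATE_HZ
    return marker_eeg_sample + round(seconds_since_marker * EEG_RATE_HZ)


# Hypothetical example: the marker sits at EEG sample 1000 and a movement
# onset was detected at tracker sample 66 (one second into the block).
onset_eeg_sample = tracker_to_eeg_sample(66, marker_eeg_sample=1000)
```

With these rates, one tracker sample spans roughly 7.6 EEG samples, so the mapped onset is accurate to about one tracker sampling interval (about 15 ms).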

#### **EEG SIGNAL ANALYSIS**

Single-trial EEG data epochs were extracted from the continuous EEG and used for averaging. Epochs were time-locked to the onset of the movement defined using the time series acquired with the hand-tracking system. Each epoch was 7 s long, taking 3 s before and 4 s after the movement onset. The baseline was determined as the average activity in the −2250 to −2000 ms interval preceding each onset. Trials exceeding ±200 μV were rejected. For each participant, at least 100 artifact-free trials were obtained for each arm. The averaged ERPs were transformed into reference-free CSD waveforms using the spherical spline surface Laplacian algorithm (using 4th-degree Legendre polynomials and a smoothing coefficient of 10<sup>−5</sup>) reported by Perrin et al. (1989). The CSD waveforms were computed for each original ERP waveform using a CSD toolbox for MatLab (Kayser and Tenke, 2006).
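The epoching, baseline correction, and amplitude rejection described above can be sketched for a single channel as follows. This minimal Python illustration assumes that the ±200 μV criterion is applied after baseline subtraction, an ordering the text does not specify. The function name and the synthetic data are hypothetical, and the actual analysis was performed with EEGLAB.

```python
from typing import List, Optional, Tuple

EEG_RATE_HZ = 500  # A/D rate reported in the text


def extract_epoch(eeg_uv: List[float],
                  onset_sample: int,
                  fs: int = EEG_RATE_HZ,
                  pre_s: float = 3.0,
                  post_s: float = 4.0,
                  baseline_ms: Tuple[int, int] = (-2250, -2000),
                  reject_uv: float = 200.0) -> Optional[List[float]]:
    """Return a baseline-corrected epoch, or None if it must be discarded."""
    start = onset_sample - int(pre_s * fs)
    stop = onset_sample + int(post_s * fs)
    if start < 0 or stop > len(eeg_uv):
        return None  # epoch would run past the recording
    b0 = onset_sample + int(baseline_ms[0] * fs / 1000)
    b1 = onset_sample + int(baseline_ms[1] * fs / 1000)
    baseline = sum(eeg_uv[b0:b1]) / (b1 - b0)
    epoch = [v - baseline for v in eeg_uv[start:stop]]
    if max(abs(v) for v in epoch) > reject_uv:
        return None  # amplitude criterion exceeded
    return epoch


# Synthetic single-channel recording: a constant 10 uV offset for 10 s.
clean = [10.0] * 5000
epoch = extract_epoch(clean, onset_sample=2000)

# Same recording with one artifactual sample inside the epoch.
noisy = list(clean)
noisy[2100] = 500.0
rejected = extract_epoch(noisy, onset_sample=2000)
```

At 500 Hz each 7 s epoch contains 3500 samples, and the baseline window covers 125 samples.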

Time-frequency analysis was performed by convolving single-trial data from both ERPs and CSD waveforms with a complex Morlet wavelet (Tallon-Baudry et al., 1997). The frequencies studied ranged from 1 to 40 Hz, in linear steps of 1 Hz. The time-varying energy was computed for each trial and averaged separately for each subject. The percentage change with respect to a baseline set 2250–2000 ms before the movement onset was extracted and averaged. The percentage of power decrease (ERD) or increase (ERS) in the mu (8–13 Hz) and beta (17–24 Hz) bands with respect to baseline was calculated, since these are the bands most commonly studied in motor tasks (Neuper et al., 2006).
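The percentage change relative to baseline can be illustrated with a few lines of Python. The sketch assumes the conventional definition, 100 × (power − baseline mean) / baseline mean, in which negative values indicate ERD and positive values indicate ERS. The wavelet convolution itself is not shown, and the function name and power values are hypothetical.

```python
from typing import List


def percent_power_change(power: List[float],
                         baseline_power: List[float]) -> List[float]:
    """Percentage change of band power relative to the mean baseline power."""
    reference = sum(baseline_power) / len(baseline_power)
    return [100.0 * (p - reference) / reference for p in power]


# Hypothetical band-power values (arbitrary units) for one frequency band.
baseline = [4.0, 4.0]           # samples from the baseline window
during_task = [2.0, 4.0, 6.0]   # samples around movement onset
change = percent_power_change(during_task, baseline)
```

In this example the first sample corresponds to a 50% power decrease (ERD) and the last to a 50% power increase (ERS).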

An initial analysis was performed to ensure that the topographic distributions of the ERPs and ERD/ERS were the same for left and right arm movements (See Supplementary material for a further description of the analysis). After demonstrating this, data for left and right arm movements were merged. To maintain the laterality effects, the signals acquired from channels located on the left and right hemispheres were swapped for left arm movements. This procedure allowed us to consider all movements as right hand movements.
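The hemisphere swap amounts to relabeling each lateral channel with its homologous counterpart while leaving midline channels untouched. The Python sketch below uses the electrode pairs listed in the EEG acquisition section. The dictionary-based data layout and the function name are assumptions made for illustration.

```python
from typing import Dict, List

# Homologous left/right electrode pairs from the recording montage.
HOMOLOGOUS_PAIRS = [
    ("Fp1", "Fp2"), ("F3", "F4"), ("C3", "C4"), ("P3", "P4"), ("O1", "O2"),
    ("F7", "F8"), ("T3", "T4"), ("T5", "T6"), ("FC1", "FC2"), ("FC5", "FC6"),
    ("CP1", "CP2"), ("CP5", "CP6"), ("PO1", "PO2"),
]


def flip_hemispheres(epoch: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Swap left and right channels; midline channels (Fz, Cz, Pz) are kept."""
    swap = {}
    for left, right in HOMOLOGOUS_PAIRS:
        swap[left] = right
        swap[right] = left
    return {swap.get(name, name): data for name, data in epoch.items()}


# Hypothetical left-arm epoch with one sample per channel.
left_arm_epoch = {"C3": [1.0], "C4": [2.0], "Cz": [3.0]}
flipped = flip_hemispheres(left_arm_epoch)
```

After flipping, C3 always refers to the hemisphere contralateral to the moving arm, so left-arm and right-arm epochs can be pooled.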

The statistical analysis aimed to study specific scalp distributions of the activity during the different phases of the movement, based on the behavior of the velocity time series during movement performance. We identified the time-windows of interest based on the kinematics of the movement (see definition of these intervals in the Results section). We analyzed differences in the scalp distribution of the activity for each time-window of interest. We divided the set of electrodes into nine different regions: anterior-left (F7, F3, and FC5), anterior-medial (Fz, FC1, and FC2), anterior-right (F4, F8, and FC6), central-left (C3, T3, and CP5), central-medial (Cz, CP1, and CP2), central-right (C4, T4, and CP6), posterior-left (P3, T5, and O1), posterior-medial (Pz, PO1, and PO2) and posterior-right (P4, T6, and O2). First, we conducted an analysis of variance (ANOVA) with factors TIME-COURSE (each time window selected from kinematic data of the movement) × ANTEROPOSTERIOR (anterior regions vs. central regions vs. posterior regions) × LATERALITY (left regions vs. medial regions vs. right regions) to test for distributional differences between the different time-windows. This clustering was used in order to reduce the number of degrees of freedom in the statistical analysis (See **Figures S1**–**S3** in Supplementary Material for illustration of the average waveforms corresponding to each of these clusters for the ERPs, mu and beta power bands). Second, we conducted an ANOVA for each time-course with factors ANTEROPOSTERIOR and LATERALITY, to investigate the distribution of the activity within each time interval. When appropriate, the Greenhouse-Geisser correction was used. In all analyses, the level of significance was set at *p* = 0.05.
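The nine-region clustering can be written as a lookup table that averages the mean amplitude of the three electrodes in each region. The following Python sketch assumes one amplitude value per electrode for a given time-window. The function name and the example amplitudes are hypothetical, and the ANOVA itself is not shown.

```python
from typing import Dict

# Electrode clusters as defined in the text.
REGIONS = {
    "anterior-left": ["F7", "F3", "FC5"],
    "anterior-medial": ["Fz", "FC1", "FC2"],
    "anterior-right": ["F4", "F8", "FC6"],
    "central-left": ["C3", "T3", "CP5"],
    "central-medial": ["Cz", "CP1", "CP2"],
    "central-right": ["C4", "T4", "CP6"],
    "posterior-left": ["P3", "T5", "O1"],
    "posterior-medial": ["Pz", "PO1", "PO2"],
    "posterior-right": ["P4", "T6", "O2"],
}


def region_means(amplitudes: Dict[str, float]) -> Dict[str, float]:
    """Average the per-electrode amplitudes within each of the nine regions."""
    return {
        region: sum(amplitudes[ch] for ch in channels) / len(channels)
        for region, channels in REGIONS.items()
    }


# Hypothetical mean amplitudes (uV) in one time-window: 1 uV everywhere
# except a larger value at C3.
example = {ch: 1.0 for channels in REGIONS.values() for ch in channels}
example["C3"] = 4.0
means = region_means(example)
```

Each region then contributes one value per time-window to the ANTEROPOSTERIOR × LATERALITY design (three levels each). Note that Fp1 and Fp2 are recorded but do not appear in any cluster.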

#### **RESULTS**

#### **BEHAVIORAL ANALYSIS**

All movements showed bell-shaped velocity profiles with respect to the longitudinal axis. **Figure 1C** shows the three-dimensional reconstruction of the trajectory of one movement. The recorded kinematic parameters presented the standard characteristics of pointing movements in both arms, that is, single positive and negative peaks in the velocity and single positive peaks in the displacement (**Figure 2**, middle).

We did not find differences between right and left movements in movement time [*t*(14) = −0.9, *p* > 0.1], maximum altitude [*t*(14) = 0.5, *p* > 0.1], percentage of acceleration time [*t*(14) = 0.93, *p* > 0.1] and percentage of the acceleration time for the altitude [*t*(14) = 1.46, *p* > 0.1]. Only a slight but non-significant difference in maximum velocity [*t*(14) = −1.9, *p* = 0.08] was found, with left movements being slightly faster than right movements. Furthermore, the number of movements performed with each hand was similar [*t*(14) = 0.58, *p* > 0.1]. Indeed, the elapsed time between two consecutive forward movement onsets (mean ± *SD*) was 8.4 ± 2.7 s for the right arm movements and 8.6 ± 2.3 s for the left arm movements. Movement time (*r* = 0.96, *p* < 0.01), maximum velocity (*r* = 0.73, *p* < 0.01), maximum altitude (*r* = 0.62, *p* < 0.01) and the latency of the peak of maximum velocity relative to the movement time (*r* = 0.85, *p* < 0.01) were all strongly correlated between left and right movements. By contrast, the latency of the peak of the maximum altitude relative to the movement time did not show a significant correlation between hands (*r* = 0.4, *p* > 0.1). For further details about the behavioral performance, see Table S1 in Supplementary material.

#### **ELECTROPHYSIOLOGICAL DATA**

**FIGURE 2 | Movement-related ERPs (A) and their Laplacian-transformed CSD waveforms (B) at the locations C3, C4, and Cz time-locked to the onset of the movement for merged left and right movements.** Before merging, electrodes were flipped from one hemisphere to the other for left movements. Below, mean 2-D time series of the trajectory (y-coordinate) of the movement registered with the ultrasound sender (gray). The time series of velocity (black) is represented as the numerical differentiation of the displacement time series. Vertical squares (time-intervals) correspond to different components of the movement-related ERPs that were related to different stages of the movement preparation and execution. Below, the topographical distribution of the scalp activity in each of these time-intervals is shown. Warm colors indicate positive activation and cold colors indicate negative activation.

The statistical analysis did not reveal differences in the distribution as a function of the active hand, so we merged the epochs corresponding to the neural activity obtained during left and right arm movements (see Supplementary Material for details of this analysis). **Figure 2** shows the waveforms extracted from the grand mean ERPs and CSD over the contralateral, ipsilateral and medial motor regions (see **Figure S4** for the data from the right and left hand separately). Single-trial epochs were averaged time-locked to the movement onset, detected using the time series of velocity. For both ERPs and CSD-transformed epochs, we performed an analysis to verify that the amplitudes did not differ between left (non-dominant) and right (dominant) hand movements during the preparation and execution phases. To this end, we divided the interval from 1 s before to 1 s after the movement onset into sub-intervals of 100 ms each. This selection included the EEG signal corresponding to the preparation and execution of the movement. We entered into the ANOVA the mean amplitude of the activity over the selected regions in each time interval for both arm movements. Afterwards, we merged the epochs obtained from left and right arm movements. As shown in **Figure 2**, we selected three time-intervals for the statistical analysis. This selection was based on the kinematics of the movement: from −200 to 0 ms (prior to the movement onset), from 250 to 350 ms (corresponding to the peak velocity) and from 700 to 800 ms after the onset (corresponding to the offset of the forward movement).
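The extraction of mean amplitudes in the three kinematics-based time-intervals can be sketched as follows. The window limits are those given in the text; the names `WINDOWS_MS` and `window_means`, the inclusive treatment of the window edges, and the representation of an averaged waveform as two parallel lists are assumptions made for this example only.

```python
# Time-intervals (ms relative to movement onset) taken from the text.
WINDOWS_MS = {
    "pre_onset": (-200, 0),
    "peak_velocity": (250, 350),
    "forward_offset": (700, 800),
}


def window_means(times_ms, values, windows=WINDOWS_MS):
    """Mean amplitude of one waveform inside each time window.

    `times_ms` and `values` are parallel sequences (hypothetical input);
    both window edges are treated as inclusive.
    """
    result = {}
    for name, (start, stop) in windows.items():
        selected = [v for t, v in zip(times_ms, values) if start <= t <= stop]
        if not selected:
            raise ValueError(f"no samples fall inside window {name!r}")
        result[name] = sum(selected) / len(selected)
    return result
```

Applied to each regional waveform, this yields one value per region and window, which is the form of data an ANOVA with factors TIME-COURSE, ANTEROPOSTERIOR and LATERALITY would take as input.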

#### *ERPs grand mean*

After merging epochs from both arms, we found a significant effect of TIME-COURSE [*F*(2, 28) = 16.9, *p* < 0.01] and a significant TIME-COURSE × ANTEROPOSTERIOR × LATERALITY interaction [*F*(8, 112) = 16.79, *p* < 0.01], indicating changes in the scalp distribution of the activity during the performance of movements. **Figure 2A** shows the ERP waveforms at the scalp locations C3, C4, and Cz, together with the concomitant temporal evolution of the displacement and velocity of the marker. First, we observed a large negative deflection starting prior to the movement onset that reached its maximum at −100 ms. This activity showed a clear central and anterior-medial distribution [time-interval −200 to 0 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 21.67, *p* < 0.01] (**Figure 2A**, below). Additionally, to test for a significant lag between the peak activity corresponding to the contralateral M1 and the SMA, we conducted a paired *t*-test between the latencies of both peaks. We did not find a significant time-lag between these two peaks [*t*(14) = 0.27, *p* > 0.5]. At the time of peak velocity, the distribution of the activity became mainly posterior and contralateral to the movement side [time-interval 250 to 350 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 5.62, ε = 0.61, *p* < 0.01]. To corroborate this result, we recomputed the ERP epochs locked trial-by-trial to the peak velocity. **Figure 3A** shows the ERP waveforms at the scalp locations C3, C4, and Cz. Again, we observed this negative component, which was maximal at 0 ms, coincident with the peak velocity, and showed the same scalp distribution [time-interval −50 to 50 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 13.49, ε = 0.57, *p* < 0.01]. Finally, we found a slow positive deflection starting 350 ms after the movement onset and peaking at the end of the forward movement. In this time-interval, the distribution of the activity was anterior-medial and bilateral, greater in regions ipsilateral to the movement [time-interval 700–800 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 11.18, *p* < 0.01].
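Where an ε value is reported alongside an *F* statistic, the Greenhouse-Geisser correction has been applied: the *F* value is unchanged, but both degrees of freedom are multiplied by ε before the *p* value is evaluated. The small sketch below only illustrates this arithmetic; the function name `gg_corrected_df` is an assumption for the example.

```python
def gg_corrected_df(df_effect, df_error, epsilon):
    """Scale both degrees of freedom by the Greenhouse-Geisser epsilon."""
    if not 0 < epsilon <= 1:
        raise ValueError("epsilon must lie in the interval (0, 1]")
    return epsilon * df_effect, epsilon * df_error


# Example with values reported in the text: F(4, 56) with epsilon = 0.61
# is evaluated against roughly (2.44, 34.16) degrees of freedom.
corrected = gg_corrected_df(4, 56, 0.61)
```

With ε = 1 the degrees of freedom are unchanged, which corresponds to no violation of sphericity.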

**FIGURE 3 | Movement-related ERPs (A) and their Laplacian-transformed CSD waveforms (B) at the locations C3, C4, and Cz time-locked to the peak-velocity.** Before merging, electrodes were flipped from one hemisphere to the other for left movements. Below, mean 2-D time series of the trajectory (y-coordinate) of the movement registered with the ultrasound sender (gray). The time series of velocity (black) is represented as the numerical differentiation of the displacement time series. The vertical square marks the time-interval (centered at 0 ms) that corresponds to the topographic representation. Warm colors indicate positive activation and cold colors indicate negative activation. Note the clear similarity between this component (−50 to 50 ms) and its distribution and those observed in **Figure 2**.

#### *CSD grand mean*

After merging the data, we found differences in the distribution of the CSD-transformed activity over the whole scalp between the three time-intervals [TIME-COURSE × ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 4.65, *p* < 0.01]. A first negative component was found starting 1000 ms before the movement onset and reaching its maximum around −100 ms. This activity was prominently distributed over centro-medial regions [time-interval −200 to 0 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 32.45, *p* < 0.01], as shown in **Figure 2B**. As for the ERP grand mean analysis, we tested for significant differences in the peak latencies between the contralateral M1 and the SMA. Again, we did not find a significant time-lag between these activities [*t*(14) = 0.58, *p* > 0.5]. Coincident with the peak velocity, we found a sink distributed over the posterior region, contralateral to the movement side. In addition, a current source distributed over frontal regions was observed at this time, more prominent over the contralateral regions [time-interval 250–350 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 11.64, *p* < 0.01]. When locking the epochs trial-by-trial to the peak velocity, we found CSD waveforms at the locations C3, C4, and Cz, as well as a scalp distribution, similar to those observed when locking the activity to the movement onset [ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 11.9, *p* < 0.01] (see **Figure 3B**). Finally, at the end of the forward movement, we found a source distributed over post-central bilateral regions, more prominent over regions contralateral to the side of the movement [time-interval 700–800 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 14.6, *p* < 0.01].

#### **TIME FREQUENCY-ANALYSIS**

As for the ERP and CSD waveforms, we merged the epochs corresponding to the oscillatory brain activity obtained from left and right arm movements (see Supplementary Material for the description of this procedure). **Figures 4** and **5** show the mu- and beta-ERD/S at locations C3, C4, and Cz locked to the movement onset (see **Figures S5**, **S6** for the data from the right and left hand separately). In both power bands we found a large desynchronization over the contralateral, ipsilateral and central motor areas, starting around 1500 ms before the movement onset and lasting until 2000 ms after the movement onset. In addition, a post-movement synchronization in both power bands was found starting 2300 ms after the movement onset. This synchronization extended over regions contralateral to the side of the movement, more prominently in the mu-band (see **Figure 4**).

In addition to the intervals considered in the statistical analysis of the ERPs, we included two other intervals corresponding to the early preparation of the movement (−1000 to −800 ms) and the end of the whole movement (3200–3400 ms).
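The ERD/S values discussed in this section are expressed as percentages of the power in a baseline period (−2250 to −2000 ms, as stated in the caption of Figure S5). A minimal sketch of that normalization is given below, assuming band power has already been computed for each time point; the function name `erd_percent` and the list-based input are assumptions for illustration only.

```python
def erd_percent(power, times_ms, baseline=(-2250, -2000)):
    """Express band power as percentage change from the baseline mean.

    Negative values indicate desynchronization (ERD) and positive values
    indicate synchronization (ERS). `power` and `times_ms` are parallel
    sequences (hypothetical input format).
    """
    start, stop = baseline
    reference = [p for t, p in zip(times_ms, power) if start <= t <= stop]
    if not reference:
        raise ValueError("no samples fall inside the baseline period")
    ref_mean = sum(reference) / len(reference)
    return [100.0 * (p - ref_mean) / ref_mean for p in power]
```

Under this convention, a value of −50 means that power has dropped to half of its baseline level, and a value of 100 means that it has doubled.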

#### *Time-frequency derived from ERPs*

The analysis of the mu-band extracted from the merged data indicated different and specific spatial distributions in each time-interval [TIME-COURSE × ANTEROPOSTERIOR × LATERALITY, *F*(6, 84) = 4.14, *p* < 0.01]. During the preparation of the movement, the mu-ERD showed a clear distribution over posterior regions contralateral to the movement side [time-interval −1000 to −800 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 3.97, *p* < 0.01] (**Figure 4A**). From the onset of the movement until reaching the target location, the mu-ERD was distributed mainly over posterior and bilateral regions [time intervals from −200 to 0 ms, from 250 to 350 ms and from 700 to 800 ms, ANTEROPOSTERIOR, *F*(2, 28) > 17.7, *p* < 0.01 in all intervals]. At the end of the whole movement, a clear mu-ERS arose in regions contralateral to the movement side [time-interval 3200–3400 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 3.58, *p* < 0.05].

The mu-band power extracted from the CSD-transformed data also showed specific spatial distributions in each time-interval [TIME-COURSE × ANTEROPOSTERIOR × LATERALITY, *F*(6, 84) = 5.005, *p* < 0.01]. The mu-ERD obtained from these data showed a distribution over the posterior contralateral regions, similar to that obtained from the voltage signal [time-interval −1000 to −800 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 2.86, *p* < 0.05]. However, the topographical distribution revealed activity that was more confined within these regions than that observed in the voltage data (**Figure 4B**, bottom). Again, no differences in the distribution of the mu-ERD were found in the time intervals covering the onset, the peak velocity and the end of the forward movement [time intervals from −200 to 0 ms, from 250 to 350 ms and from 700 to 800 ms, ANTEROPOSTERIOR, *F*(2, 28) > 6.07, *p* < 0.01 in all intervals]. As for the mu-band power obtained from the voltage signal, we found a clear ERS over central and contralateral regions [time-interval 3200–3400 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 2.78, *p* < 0.05], more localized over contralateral motor regions (see **Figure 4B**, bottom).

For the beta band, we did not find a significant effect of TIME-COURSE [*F*(3, 42) = 0.52, *p* > 0.1] in the merged data. However, we found significant interactions of TIME-COURSE × LATERALITY [*F*(6, 84) = 3.16, ε = 0.61, *p* < 0.05] and TIME-COURSE × ANTEROPOSTERIOR [*F*(6, 84) = 2.41, *p* < 0.05], suggesting a certain specificity of the distribution of the oscillatory beta power as a function of the time-interval. During the preparation of the movement, the beta-ERD was distributed prominently over central and contralateral regions (see **Figure 5A**) [time-interval −1000 to −800 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 2.74, *p* < 0.05] and became larger over bilateral and medio-central regions around the onset of the movement [time-interval −200 to 0 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 3.07, *p* < 0.05]. At the peak velocity, we did not find a clear distribution of the ERD [time-interval 250–350 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 1.93, *p* > 0.1]. However, the data showed a large beta-ERD over central and medial regions [time-interval 250–350 ms, ANTEROPOSTERIOR, *F*(2, 28) = 4.2, *p* < 0.05; LATERALITY, *F*(2, 28) = 6.67, *p* < 0.05]. We found a larger desynchronization over centromedial regions at the end of the forward movement [time-interval 700–800 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 2.99, ε = 0.609, *p* < 0.05]. Finally, we found an increase of power synchronization (ERS) starting 2 s after the movement with a clear distribution over centromedial regions [time-interval 3200–3300 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 3.1, *p* < 0.05].

#### *Time frequency derived from CSD-transformed data*

The mu-band power extracted from the CSD-transformed data also showed specific spatial distributions in each time-interval [TIME-COURSE × ANTEROPOSTERIOR × LATERALITY, *F*(6, 84) = 5.05, *p* < 0.01]. The mu-ERD obtained from these data was distributed over the posterior contralateral regions, similar to that obtained from the voltage signal [time-interval −1000 to −800 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 2.86, *p* < 0.05] (**Figure 4B**, bottom). Again, no differences in the distribution of the mu-ERD were found in the time intervals covering the onset, the peak velocity and the end of the forward movement [time intervals from −200 to 0 ms, from 250 to 350 ms and from 700 to 800 ms, ANTEROPOSTERIOR, *F*(2, 28) > 6.07, *p* < 0.01 in all intervals]. As for the mu-band power obtained from the voltage signal, we found a clear ERS over central and contralateral regions [time-interval 3200–3400 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 2.78, *p* < 0.05], more localized over contralateral motor regions (see **Figure 4B**, bottom).

Regarding the oscillatory activity within the beta band from the CSD-transformed signal, we again found a certain similarity with that obtained from the voltage signal. After merging, we found a significant effect of TIME-COURSE [*F*(3, 42) = 9.61, *p* < 0.01], as well as significant interactions of TIME-COURSE × ANTEROPOSTERIOR [*F*(6, 84) = 4.28, ε = 0.38, *p* < 0.05] and TIME-COURSE × LATERALITY [*F*(6, 84) = 3.655, ε = 0.331, *p* < 0.05]. During the preparation of the movement, the beta-ERD was mainly distributed over central contralateral regions [time-interval −1000 to −800 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 5.63, *p* < 0.01] (see **Figure 5B**). Around the onset of the movement, the distribution of the ERD shifted toward medio-central and contralateral regions [time-interval −200 to 0 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 3.4, *p* < 0.05], and remained similar during the execution until the end of the forward movement. At the end of the movement, a beta-ERS was found, showing a clear distribution over contralateral-central regions [time-interval 3200–3400 ms, ANTEROPOSTERIOR × LATERALITY, *F*(4, 56) = 2.87, *p* < 0.05].

#### **DISCUSSION**

The main purpose of the present study was to investigate the association between the fluctuation of the time series of velocity during the performance of multi-joint reaching movements and (i) the components of the MRBPs and (ii) the ERD/S in the mu and beta bands. We found three amplitude peaks in the components of the MRBPs corresponding to specific time intervals within the preparation and execution phases of movements. These components were clearly associated with the dynamics of the time series of velocity obtained from the trajectories of the movements recorded with a hand-tracking system. We present a novel approach to investigate the components of the movement-related brain activity of multi-joint self-paced movements as a function of the changes in the velocity pattern during their performance.

#### **BEHAVIORAL DATA**

The analysis of movement trajectories showed the standard characteristics of pointing movements in both arms (Kirsch et al., 2010). With regard to the time series of velocity, a first positive peak after the movement onset indicated the peak velocity during the forward movement. A subsequent negative peak corresponded to the peak velocity during the backward movement. Interestingly, behavioral parameters showed a high degree of similarity between right (dominant) and left (non-dominant) hand movements. This may seem contradictory, as the performance of the two hands would be expected to differ as a consequence of functional lateralization (Lavrysen et al., 2012). A considerable number of studies suggest that right arm advantages (in right-handers) might exist for kinematic parameters such as movement velocity and movement time (Hoffmann, 1994; Elliott et al., 1995; Sainburg and Kalakanis, 2000). Sainburg and Kalakanis (2000) found differences in the magnitude of the left/right shoulder muscle torque during reaching movements, indicating that control of the two limbs might be underlain by different neural sources. Nevertheless, more recent studies support the idea that differences between the dominant and non-dominant sides arise in aspects of motor performance other than pure kinematics (Sainburg, 2002; Wang and Sainburg, 2007), such as the strength at the initiation of the movement or the strategy selected to achieve the target. Our findings seem to point in this direction, given that our measurements capture kinematic characteristics of the movement rather than other qualitative parameters (e.g., the median deviation of the movement path) that depend on the handedness of the subject. Nonetheless, acquiring the time series of position at more locations on the arm, such as the shoulder or the forearm, would allow a more fine-grained comparison of the kinematic properties of left and right movements. Therefore, our conclusions on this point should remain tentative.

#### **ELECTROPHYSIOLOGICAL RECORDINGS**

Most studies of MRBPs have used the EMG activity acquired with skin-attached electrodes to identify the movement onset in the absence of an external trigger (Deecke et al., 1969, 1980; Berardelli et al., 1996; Mackinnon and Rothwell, 2000). In our paradigm we used the signal recorded from the ultrasound marker to determine the movement onset using a velocity threshold. Ultrasonic signals have previously been used to categorize trials as a function of velocity, in order to study the sensitivity of evoked brain activity to the range of motion in rapid goal-directed movements (Kirsch and Hennighausen, 2010; Kirsch et al., 2010), as well as to synchronize the EEG signal to the movement onset for extracting the epochs (Bradberry et al., 2010). However, this is the first time that both time series (trajectory-based and EEG) are analyzed together with the aim of finding associations between the characteristics of the MRBP components and the kinematics during the performance of natural movements.
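The velocity-based onset detection described above can be sketched in a few lines. The velocity is obtained by numerical differentiation of the marker position, and the onset is taken as the first sample exceeding a threshold. The function names, the central-difference scheme and the threshold value are assumptions for illustration; the text does not state the differentiation scheme or the threshold actually used.

```python
def velocity(position, fs):
    """Numerically differentiate a position time series sampled at fs Hz.

    Central differences are used in the interior and one-sided
    differences at the two ends (an assumed, common choice).
    """
    n = len(position)
    if n < 2:
        raise ValueError("need at least two samples")
    dt = 1.0 / fs
    vel = [0.0] * n
    vel[0] = (position[1] - position[0]) / dt
    vel[-1] = (position[-1] - position[-2]) / dt
    for i in range(1, n - 1):
        vel[i] = (position[i + 1] - position[i - 1]) / (2 * dt)
    return vel


def movement_onset(vel, threshold):
    """Index of the first sample whose velocity exceeds the threshold.

    `threshold` is a hypothetical parameter. Returns None if the
    threshold is never exceeded.
    """
    for index, value in enumerate(vel):
        if value > threshold:
            return index
    return None
```

The returned index, converted to time with the tracker sampling rate, would give the event to which the EEG epochs are locked.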

The MRBPs during the multi-joint out-and-back movements showed, for both voltage and CSD waveforms, a series of deflections that have been reported previously as accompanying ballistic movements (Berardelli et al., 1996; Babiloni et al., 1999; Cui et al., 1999). As a novelty, we establish a point-to-point association between the time series of velocity of the movements and specific components of the MRBPs during movement performance. First, we observed a negative component that peaks a few tens of milliseconds prior to the onset and that might correspond to the late-BP (Shibasaki and Hallett, 2006). Previous studies have reported the recruitment of the supplementary motor area (SMA) during the preparation of the movement, as well as the involvement of the contralateral primary motor regions immediately before the onset, using different techniques such as EEG (Deecke et al., 1980; Cui et al., 1999; Ohara et al., 2006), event-related functional magnetic resonance imaging (Cunnington et al., 2003, 2005) and magnetoencephalography (MEG) (Cheyne et al., 1991; Nagamine et al., 1996; Erdler et al., 2000). As expected from these studies, our data reveal a prominent activation of fronto-central regions during the late-BP, which would indicate the recruitment of the SMA. Another possible explanation for the increase of activity in the SMA relates to the role of this area in time estimation. Indeed, in this task we asked participants to wait a specific period of time (7–10 s) between two consecutive movements. Several studies have pointed to the role of the SMA in the attentional modulation of time estimation (Coull et al., 2004; Schwartze et al., 2012). This possible explanation cannot be ruled out in this study. However, the SMA activation that we observed is very similar in distribution and latency to that observed in other studies using the same kind of paradigm involving self-paced motor programs (see Shibasaki and Hallett, 2006, for review), which is considered to be motor-related. Notably, we found that the SMA activity did not precede the activity of the contralateral M1, as would be expected given the hierarchical organization of the motor system suggested in previous studies (Vidal et al., 2003). Such a time-lag between these activities has been clearly observed in tasks involving response choice (see Carbonnell et al., 2004). However, the task described here consists of the repetition of movements that are performed identically throughout the whole task, which might reduce the hierarchical flow of activity within the motor system. This could explain the apparent coincidence in time of the activity peaks corresponding to these two structures.

A second selected stage of the movement corresponds to the peak velocity. In this period, a second negative component was observed, maximal over parietal areas contralateral to the movement side. We confirmed this finding by epoching the MRBPs to this time point, which showed a similar behavior. Therefore, this activation might indicate a neural substrate of the encoding of movement kinematics. Such activity is in agreement with previous studies that reported increases of activity over the posterior parietal cortex (PPC) in sensorimotor processes during visuomotor reaching movements (Reichenbach et al., 2014). However, to our knowledge, we report for the first time a clear relationship between this neural activity and the kinematics of the movement, specifically the attainment of the peak velocity. This result suggests an active role of the PPC, not only in encoding the afferent input from the sensory system, but also in other processes related to monitoring the kinematics of the movement. Bourguignon et al. (2012) reported MEG evidence for the pivotal role of the left posterior parietal cortex in the integration of sensorimotor features of limb kinematics, which is consistent with the involvement of this area in processing velocity changes during movement performance.

Finally, a large positive activation arises over both motor cortices when the target is reached, mainly distributed over contralateral parieto-central regions. Few studies have provided evidence of changes in corticospinal excitability accompanying voluntary relaxation of a muscle. Transcranial magnetic stimulation studies have reported decreased motor evoked potentials (MEPs) in the contracting muscle, related to the decrease of the EMG signal from the same muscle at the offset of movements (Waldvogel et al., 2000). In addition, the positive movement-related potential has been described as reflecting an inhibitory process, which is in agreement with our findings.

#### **EVENT-RELATED SYNCHRONIZATION/DESYNCHRONIZATION**

In addition to ERPs, we investigated whether the activity and the scalp distribution of the ERD/S in the mu and beta bands were also related to kinematic properties of the movement. During the preparation and the execution of movements, we found the same pattern of synchronization and desynchronization in both bands as reported previously (Pfurtscheller and Aranibar, 1977, 1979; Pfurtscheller et al., 1996; Stancák Jr and Pfurtscheller, 1997; Alegre et al., 2003, 2004a,b). This oscillatory activity has long been considered an indicator of neural activation during motor tasks (Salmelin et al., 1995). However, we did not find any specific distribution of this ERD associated with the kinematics of the movement, as we did for the MRBPs. It has been suggested that the sources of MRBPs and those related to the oscillatory brain activity may have different roles during movement execution; our findings would support this hypothesis. Of interest, we found a certain overlap in these results when we studied the oscillatory brain activity of the voltage and the CSD. Notably, however, the CSD maps were superior at localizing the scalp regions with maximal activation in both mu and beta power. This is a consequence of the Laplacian transformation applying a spatial high-pass filter, which reduces the contribution of spurious remote activity when calculating the sources (Tenke and Kayser, 2012).
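The spatial high-pass property of the Laplacian can be illustrated with a deliberately simplified example. The study used a spline-based CSD estimate; the sketch below instead uses the much simpler nearest-neighbor (Hjorth-style) local Laplacian, in which the mean potential of the surrounding electrodes is subtracted from each electrode. The function name `local_laplacian` and the choice of FC1, FC2, CP1 and CP2 as the neighbors of Cz are assumptions made for this example.

```python
def local_laplacian(potentials, neighbors):
    """Subtract the mean of the neighboring electrodes from each electrode.

    `potentials` maps electrode name to voltage; `neighbors` maps an
    electrode to the list of electrodes assumed to surround it.
    """
    estimate = {}
    for channel, nearby in neighbors.items():
        surround = sum(potentials[name] for name in nearby) / len(nearby)
        estimate[channel] = potentials[channel] - surround
    return estimate


# Hypothetical neighborhood for Cz, chosen from electrodes in the montage.
NEIGHBORS = {"Cz": ["FC1", "FC2", "CP1", "CP2"]}
```

A potential that is identical at Cz and all of its neighbors, as would be produced by a remote or widespread source, yields zero under this operation, whereas a focal deflection at Cz is preserved. This is the sense in which the transformation acts as a spatial high-pass filter.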

#### **CLINICAL APPLICATIONS**

An important aspect of this study is the use of a hand-tracker to extract the kinematic aspects of movement performance. This method offers a potential tool to study the evolution of the EEG in relation to the intrinsic properties of movement performance. In our view, this approach could be useful in clinical studies. Indeed, hand-trackers are used to evaluate several clinical scores of the quality of movement in patients suffering from the consequences of stroke (Hermsdörfer and Goldenberg, 2002) and from focal dystonia (Berardelli et al., 1996; Ruiz et al., 2011). We believe that the application of this experimental setup would help to disentangle specific patterns of brain activity associated with the behavioral outcome of movements. Furthermore, longitudinal studies could also benefit from this method, allowing the study of changes in brain activity and performance due to rehabilitative interventions (Amengual et al., 2012; Grau-Sánchez et al., 2013).

In particular, EMG recorded from the skin is very sensitive to spurious activity and is inevitably contaminated by artifacts, especially in clinical studies (Olier et al., 2011). This undesirable activity may alter the interpretation of the EMG signal when relating muscular activation to movements (De Luca et al., 2010; Olier et al., 2011). In addition, despite the valuable physiological information that the EMG signal provides, it misses specific kinematic properties of the movement. By contrast, the signal recorded from ultrasonic markers allows a better understanding of the kinematics of the movement and an easier detection of movement changes than EMG. However, more cross-modal studies are needed to compare and validate both signals.

#### **LIMITATIONS OF THIS STUDY**

This study has several limitations. First, we used only a single marker of the hand-tracking system to register the position of the hand during the performance of movements. Although this setup was sufficient to extract the time series of velocity and to identify associations between brain activity and kinematics, more markers attached to different locations on the arm, such as the shoulder and the forearm, would provide finer information about the dynamics of joints and muscles during movements and their relation to the EEG activity (Wang and Sainburg, 2007). This additional information could help to confirm the left/right similarities that we found in the kinematic parameters included in our behavioral analysis. A second caveat of this study is the limited temporal resolution of the hand-tracking system (66 Hz) compared to the sampling rate of the EEG signal (250 Hz). This difference in acquisition frequency might reduce the accuracy of locking the MRBPs to the response onset compared to more standardized methods using the EMG signal. However, providing a better method for locking EEG to the movement onset than EMG-based ones was beyond the scope of our study. Instead, we propose this method as a different way to look at these motor potentials, as it allows direct comparison between the changes of brain electrical activity and the kinematics during movement execution, which has not been previously described. Indeed, to rule out that inaccuracy in locking the EEG to the motion signal might have introduced variability in our data that could explain the component found at the time of the peak velocity, we extracted the ERPs locked to this time point in a subsequent analysis and obtained the same pattern of activity. Therefore, this method seems reliable enough to study the motor-related brain activity associated with the kinematics of movement performance. Future studies should address this issue by including the recording of EMG activity in the same experimental setup reported here. This would allow comparison of the ERPs extracted by both locking methods (EMG and hand-tracking system), as well as a quantitative estimate of the inaccuracy of the method that we report. A third limitation of this study is the reduced number of electrode locations for the EEG recordings, which may affect the estimation of the Laplacian transformation of the EEG signal (Yao and Dewald, 2005). However, CSD waveforms extracted with similar algorithms based on spline interpolation have previously been used with the same number of electrodes or even fewer (Carbonnell et al., 2004; Tandonnet et al., 2005; Meckler et al., 2010). Another limitation regarding the application of the Laplacian transformation concerns the mean inter-electrode distance of our montage (∼5 cm). Early reports suggest that the accuracy of cortical source localization methods decreases as a function of the distance between the electrodes considered in the model (Law et al., 1993). However, Giard et al. (2013) suggest that the optimal number of electrodes ranges between 30 and 50, in order to avoid errors between the theoretical and real electrode positions (which are sensitive to the number of electrodes). In this sense, our montage consists of 29 electrode positions, barely below the threshold defined by Giard et al. (2013). Yet, CSD waveforms are considered a sound method to extract the neural sources in sensorimotor tasks (Tenke and Kayser, 2012), and a low number of electrodes might not affect the reliability of our findings (Ohara et al., 2006).
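The size of the timing mismatch between the two recording systems follows directly from the sampling rates given above. The short calculation below is only a back-of-the-envelope illustration; it ignores any additional latency or jitter in the synchronization hardware, which the text does not quantify, and the names used are chosen for the example.

```python
def sample_period_ms(rate_hz):
    """Time between two consecutive samples, in milliseconds."""
    return 1000.0 / rate_hz


TRACKER_HZ = 66   # hand-tracking system, from the text
EEG_HZ = 250      # EEG sampling rate, from the text

tracker_period = sample_period_ms(TRACKER_HZ)   # about 15.2 ms
eeg_period = sample_period_ms(EEG_HZ)           # 4.0 ms

# Number of EEG samples spanned by one tracker sample (about 3.8).
eeg_samples_per_tracker_sample = EEG_HZ / TRACKER_HZ
```

An event detected on the tracker time grid can therefore be located only to within roughly 15 ms, that is, almost four EEG samples, which is the order of magnitude of the locking inaccuracy discussed in this section.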

#### **ACKNOWLEDGMENTS**

We are very grateful to Dr. Toni Cunillera for his technical assistance in synchronizing the EEG amplifier and the hand-tracking system. Different institutions and grants have supported the present work. Julià L. Amengual has been supported by the Ministry of Education and Science of the Spanish Government, within the Research Formation Program (SEJ2006-13996). Josep Marco-Pallarés is supported by a Spanish Research Grant (MICINN, PSI2012-37472). Thomas F. Münte is supported by the DFG and BMBF. This project has been supported by a Spanish Research Grant (MICINN, PSI2011-29219) awarded to Antoni Rodríguez-Fornells.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00271/abstract

**Figure S1 | ERPs (A) and CSD-transformed waveforms (B) at the clustered locations over the scalp, time-locked to the onset of the movement.** The activity corresponding to the merged left and right movements is shown. Before merging, regions were flipped from one hemisphere to the other for left movements. FL, frontal left; FM, frontal medial; FR, frontal right; CL, central left; CM, central medial; CR, central right; PL, posterior left; PM, posterior medial; PR, posterior right.

**Figure S2 | Mu-ERD/ERS (8–13 Hz) extracted from voltage (A) and CSD-transformed waveforms (B) at the clustered locations over the scalp, time-locked to the onset of the movement.** The activity corresponding to the merged left and right movements is shown. Before merging, regions were flipped from one hemisphere to the other for left movements. FL, frontal left; FM, frontal medial; FR, frontal right; CL, central left; CM, central medial; CR, central right; PL, posterior left; PM, posterior medial; PR, posterior right.

**Figure S3 | Beta-ERD/ERS (18–24 Hz) extracted from voltage (A) and CSD-transformed waveforms (B) at the clustered locations over the scalp, time-locked to the onset of the movement.** The activity corresponding to the merged left and right movements is shown. Before merging, regions were flipped from one hemisphere to the other for left movements. FL, frontal left; FM, frontal medial; FR, frontal right; CL, central left; CM, central medial; CR, central right; PL, posterior left; PM, posterior medial; PR, posterior right.

**Figure S4 | Movement-related ERPs (A) and their Laplacian-transformed CSD waveforms (B) at the locations C3, C4, and Cz time-locked to the onset of the movement separately for left and right movements.** Below, mean 2-D time series of the trajectory (y-coordinate) of the movement registered with the ultrasound sender (gray). The time series of velocity (black) is represented as the numerical differentiation of the displacement time series. Vertical squares (time-intervals) correspond to different components of the movement-related ERPs that were related to different stages of the movement preparation and execution. Below, the topographical distribution of the scalp activity in each of these time-intervals is shown. Warm colors indicate positive activation and cold colors indicate negative activation.

**Figure S5 | Grand average traces of mu (8–13 Hz) ERD/ERS extracted from voltage (A) and CSD-transformed signal (B) for electrodes C3, C4, and Cz time-locked to the onset of the movement separately for left and right movements.** Values are in percentages of the baseline period (−2250 to −2000 ms). Below, mean 2-D time series of the trajectory (y-coordinate) of the movement registered with the ultrasound sender (gray). The time series of velocity (black) is represented as the numerical differentiation of the displacement time series. Vertical squares (time-intervals) correspond to different components of the ERD/S that were related to different stages of the movement preparation and execution. Below, the topographical distribution of the power synchronization and desynchronization is shown. Warm colors indicate increases of synchronization and cold colors indicate increases of desynchronization.

**Figure S6 | Grand average traces of beta (18–24 Hz) ERD/ERS extracted from voltage (A) and CSD-transformed signal (B) for electrodes C3, C4, and Cz time-locked to the onset of the movement separately for left and right movements.** Values are in percentages of the baseline period (−2250 to −2000 ms). Below, mean 2-D time series of the trajectory (y-coordinate) of the movement registered with the ultrasound sender (gray). The time series of velocity (black) is represented as the numerical differentiation of the displacement time series. Vertical squares (time-intervals) correspond to different components of the ERD/S that were related to different stages of the movement preparation and execution. Below, the topographical distribution of the power synchronization and desynchronization is shown. Warm colors indicate increases of synchronization and cold colors indicate increases of desynchronization.

#### **REFERENCES**


therapy in chronic stroke patients revealed by transcranial magnetic stimulation. *PLoS ONE* 8:e61883. doi: 10.1371/journal.pone.0061883


conditions. *Clin. Neurophysiol.* 111, 1997–2007. doi: 10.1016/S1388-2457(00)00432-6


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 22 October 2013; accepted: 11 April 2014; published online: 29 April 2014. Citation: Amengual JL, Marco-Pallarés J, Grau C, Münte TF and Rodríguez-Fornells A (2014) Linking motor-related brain potentials and velocity profiles in multi-joint arm reaching movements. Front. Hum. Neurosci. 8:271. doi: 10.3389/fnhum.2014.00271 This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Amengual, Marco-Pallarés, Grau, Münte and Rodríguez-Fornells. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# From speech to thought: the neuronal basis of cognitive units in non-experimental, real-life communication investigated using ECoG

*Johanna Derix1,2,3\*, Olga Iljina1,4,5,6, Johanna Weiske1,2,3, Andreas Schulze-Bonhage1,3, Ad Aertsen2,3 and Tonio Ball1,3\**

*<sup>1</sup> Department of Neurosurgery, Epilepsy Center, University Medical Center Freiburg, Freiburg, Germany*

*<sup>2</sup> Department of Neurobiology and Biophysics, Faculty of Biology, University of Freiburg, Freiburg, Germany*

*<sup>3</sup> Bernstein Center Freiburg, University of Freiburg, Freiburg, Germany*

*<sup>4</sup> GRK 1624, University of Freiburg, Freiburg, Germany*

*<sup>5</sup> Department of German Linguistics, University of Freiburg, Freiburg, Germany*

*<sup>6</sup> Hermann Paul School of Linguistics, University of Freiburg, Freiburg, Germany*

#### *Edited by:*

*Klaus Gramann, Berlin Institute of Technology, Germany*

#### *Reviewed by:*

*Ivana Konvalinka, Technical University of Denmark, Denmark Wallace Chafe, University of California, Santa Barbara, USA*

#### *\*Correspondence:*

*Johanna Derix and Tonio Ball, Epilepsy Center, University Medical Center Freiburg, Engelbergerstr. 21, 79106 Freiburg, Germany e-mail: johanna.derix@uniklinik-freiburg.de; tonio.ball@uniklinik-freiburg.de*

Exchange of thoughts by means of expressive speech is fundamental to human communication. However, the neuronal basis of real-life communication in general, and of verbal exchange of ideas in particular, has rarely been studied until now. Here, our aim was to establish an approach for exploring the neuronal processes related to cognitive "idea" units (IUs) in conditions of non-experimental speech production. We investigated whether such units, corresponding to single, coherent chunks of speech with syntactically-defined borders, are useful to unravel the neuronal mechanisms underlying real-world human cognition. To this aim, we employed simultaneous electrocorticography (ECoG) and video recordings obtained in pre-neurosurgical diagnostics of epilepsy patients. We transcribed non-experimental, daily hospital conversations, identified IUs in transcriptions of the patients' speech, classified the obtained IUs according to a previously-proposed taxonomy focusing on memory content, and investigated the underlying neuronal activity. In each of our three subjects, we were able to collect a large number of IUs which could be assigned to different functional IU subclasses with a high inter-rater agreement. Robust IU-onset-related changes in spectral magnitude could be observed in high gamma frequencies (70–150 Hz) on the inferior lateral convexity and in the superior temporal cortex regardless of the IU content. A comparison of the topography of these responses with mouth motor and speech areas identified by electrocortical stimulation showed that IUs might be of use for extraoperative mapping of eloquent cortex (average sensitivity: 44.4%, average specificity: 91.1%). High gamma responses specific to memory-related IU subclasses were observed in the inferior parietal and prefrontal regions. IU-based analysis of ECoG recordings during non-experimental communication thus elicits topographically- and functionally-specific effects. We conclude that segmentation of spontaneous real-world speech in linguistically-motivated units is a promising strategy for elucidating the neuronal basis of mental processing during non-experimental communication.

**Keywords: natural behavior, parietal cortex, prefrontal cortex, electrocorticography, high gamma mapping, autobiographical memory, idea unit, speech production**

#### **INTRODUCTION**

Spontaneous language can reflect mental states and thus constitutes a fundamental link between externally-observable behavior and internal cognitive processes (Chafe, 1994, 2000, 2012). In the present study, we explored the utility of spoken language to investigate the neuronal correlates of higher-order cognitive functions. To this purpose, we analyzed real-world conversations from simultaneously-obtained video and intracranial electroencephalographic data.

Intracranial electroencephalography recorded for diagnostic purposes from the human brain includes both electrocorticography (ECoG) and stereo-electroencephalography and is now increasingly being used to study higher-order cognition. The functions addressed include speech perception (Crone et al., 2001a; Canolty et al., 2007; Pasley et al., 2012) and production (Crone et al., 2001b; Towle et al., 2008; Bouchard et al., 2013), social interaction (Cristofori et al., 2012; Derix et al., 2012; Mesgarani and Chang, 2012; Caruana et al., 2013), and episodic (Burke et al., 2013) and autobiographical (Steinvorth et al., 2010) memory. Non-experimental ECoG approaches to study speech (Towle et al., 2008; Bauer et al., 2013; Ruescher et al., 2013) and social cognition (Derix et al., 2012) have lately been proposed, which allow the study of brain activity of humans behaving in out-of-the-lab conditions. Recently, we presented a new approach to study real-life interaction between people based on ongoing simultaneous ECoG and monitoring video recordings obtained during pre-neurosurgical diagnostics of epilepsy (Derix et al., 2012). Such data encompass situations in which patients are engaged in naturalistic discourse and thus constitute a rich source of information about uninstructed, real-world social behavior. This makes it possible to conduct neurolinguistic studies based on concepts developed in psycholinguistic research on spontaneously-spoken language. ECoG is particularly well-suited for such investigations, as it combines a high temporal resolution with a high resistance against myographic artifacts (Ball et al., 2009; Derix et al., 2012).

For a start, we sought a way to break down long periods of continuous speech into comparable linguistic entities. Different approaches exist to split spoken language into meaningful constituents (Auer, 2010). For instance, segmentation into single words or phrase structures appears to be a direct and intuitive approach. Yet, if one aims to study such abstract phenomena as memory-related processing, longer units of about clausal length are most likely required (Dritschel, 1991). A speech unit of suitable length may be, e.g., the *prosodically*-defined "idea unit" (Chafe, 1980, 1985), later referred to as "intonation unit" (Chafe, 1994), which is identified in the course of speech by its cohesive intonation contour, or also the *syntactically*-defined "idea unit" (IU; Dritschel, 1991), identified as "a clause consisting of a finite verb plus all its modifiers." We used the latter segmentation approach to extract cognitive units from ongoing speech in the present study. As previous literature indicates that the human capacity for short-term memory roughly corresponds to the length of an average syntactic clause (Pawley and Syder, 1983), we hypothesized that the message contained in an IU might be processed as a single entity, and that the underlying neuronal activity would reflect such cognitively-meaningful pieces of information.

Segmentation of speech into IUs as defined above or into comparable entities proved useful in psycholinguistic research on memory (Stafford and Daly, 1984; Stafford et al., 1987; Bangerter, 2000; Cuc et al., 2006; Muller and Hirst, 2010). However, only a few IU-based studies exist in neurolinguistics (autobiographical narratives: Braun et al., 2001; effects of prior knowledge in memory processing: Maguire et al., 1999). All of them were conducted experimentally, and although a unit-driven approach is particularly well-suited to investigate spontaneous discourse (Dritschel, 1991), we are not aware of any IU-based neurolinguistic investigation under real-world conditions.

We here thus aimed to explore whether IU-segmented spontaneous, non-experimental speech as it occurs during conversations of ECoG-implanted patients in pre-neurosurgical evaluation is suited to investigate the underlying cognitive and neuronal processes. We transcribed several hours of conversations per patient, extracted IUs from the transcriptions, and assessed whether the obtained IUs could be classified into groups with clearly-defined functional differences, and whether such groups were comparable in terms of their basic features, such as the average temporal duration and word count. To elucidate functional differences between the IUs, we assigned them to subclasses based on the presence and type of memory content according to a previously-devised IU taxonomy by Dritschel (1991). These IU classes with different content were finally used to identify the underlying neuronal differences.

High-frequency oscillations of population activity are caused by delayed inhibitory feedback (Brunel and Hakim, 1999; Brunel, 2000) and shape the oscillatory properties of pairwise neuronal correlations (Helias et al., 2013). Neuronal activity in the high gamma range reflects spiking processes (Ray et al., 2008; Manning et al., 2009) and constitutes a direct and robust temporal, spatial, and functionally-specific index of event-related cortical activation (Crone et al., 1998; Ball et al., 2008; Cheyne et al., 2008). Previous intracranial EEG studies show that high gamma activity is a reliable marker for cognitive processing (Crone et al., 2011), such as in expressive and receptive speech (Crone et al., 2001a,b; Sinai et al., 2005; Perrone-Bertolotti et al., 2012) as well as during the involvement of memory functions (Jensen et al., 2007; Sederberg et al., 2007; van Vugt et al., 2010). We therefore focused our analyses on spectral magnitude modulations of ECoG recordings in this signal component.

#### **MATERIALS AND METHODS**

#### **SUBJECTS**

Data were analyzed from three patients (two female: S1, S2 and one male: S3) who underwent temporary placement of intracranial electrodes for the purpose of pre-neurosurgical diagnostics of medically-intractable epilepsy. Electrodes were implanted for 1–3 weeks to localize the seizure onset zone and to evaluate the possibility of surgical treatment. The patients were video-monitored 24 h a day during this time period. All patients gave their informed consent that the recordings of neuronal activity and other data collected during the diagnostic procedure might be used for scientific purposes. The locations and numbers of implanted electrodes were determined without regard to the present study and depended entirely on the individual clinical needs of the patients. All subjects had either left-hemispheric (S1 and S2) or bilateral (S3) speech dominance according to functional magnetic resonance imaging and electrocortical stimulation mapping (ESM). Subject details are summarized in **Table 1**.

#### **ECoG RECORDINGS**

The subdurally-implanted 8 × 8 platinum electrode grids had an inter-electrode distance of 10 mm and an electrode diameter of 4 mm. The grids of all three subjects covered parts of the left temporal, frontal, and parietal cortices (see **Table 1, Figures 3A**, **4A**, **5A**). Additional recordings were made in all subjects using subdural strip electrodes and/or depth electrodes. For comparability across subjects, we analyzed IU-category-related activity in grid recordings. Data from the subdural strips were inspected only for language mapping with high gamma activity (see below). ECoG was obtained using a clinical AC EEG-System (IT-Med, Germany) at a sampling rate of 1024 Hz. Data were high-pass-filtered with a cutoff frequency of 0.032 Hz and low-pass (anti-aliasing) filtered at 379 Hz. Synchronous monitoring audio and digital video recordings were acquired at a sampling frequency of 25 Hz and at a resolution of 640 × 480 pixels.

#### **ESM**

ESM was performed using an INOMED NS 60 stimulator (INOMED, Germany). Pulse trains of 10 s consisting of pulses at 50 Hz and alternating-polarity square waves of 250 μs were applied systematically to pairs of electrodes. Bipolar stimulation was performed to identify non-overlapping pairs of electrodes with movement- and speech-related functions. The functionally-relevant contact(s) of the pair was (were) further identified using monopolar ESM. The intensity of the stimulus was gradually increased until a sensory, motor, or speech effect was induced. If no sensory (e.g., tactile), motor (stimulation-evoked movement or transient inability to move), or speech-related response (i.e., transient impairment of speech production and/or comprehension) could be observed at 15 mA (18 mA for speech functions), the stimulation was interrupted. Areas involved in receptive and expressive speech were localized using a battery of six tasks: counting, execution of body commands, naming everyday objects, reading, repetition of sentences, and Token Test. The subjects were unaware of the stimulation timing until the occurrence of the aforementioned functional effects. All stimulations were performed by the medical personnel at the University Medical Center Freiburg. See Ruescher et al. (2013) for more information.

**Table 1 | Subject details.**

*Hand., handedness; Lang. dominance, language dominance; F, female; M, male; L, left; A, ambidextrous; R\*, right-handed converted from left.*

#### **ACQUISITION AND SELECTION OF IUs**

In the ongoing digital video recordings, time periods were identified in which the patients were engaged in uninstructed, spontaneous conversations with at least one person. Dialog partners were visitors (friends, family members, or life partners) or medical staff (physicians and nurses). During the selected time periods, the patients were awake, alert, and they actively participated in conversations. The patients were neither eating nor extensively moving. The data were selected such that no ESM was performed immediately before or during the analyzed time periods, and no epileptic seizure occurred at least 30 min before and 30 min after these periods.

#### **TRANSCRIPTION**

The audio signal was extracted from the digital audio-video mpg recordings of the selected conversation periods in wav-format using the Media Converter SA Edition 0.8. Orthographic transcriptions of the patients' speech were made by native speakers of German using PRAAT (Boersma and Weenink, 2014) whenever acoustic conditions allowed (i.e., when there were no strong background noises, no overlapping talk, and when the speech was loud and distinctive enough). The overall duration of the transcribed periods was 169.12 min for S1, 113.22 min for S2, and 36 min for S3, yielding 600 IUs in S1, 390 IUs in S2, and 141 IUs in S3 (**Table 2**).

#### **CLASSIFICATION OF IDEA UNITS (IUs)**

IUs were identified in the transcriptions based on the definition of an IU as "a clause consisting of a finite verb plus all its modifiers" (Dritschel, 1991, p. 320). Thus, our identification of IUs relied on grammatical, and not on prosodic or semantic characteristics. IUs were classified into different categories according to the taxonomy proposed by Dritschel (1991). As illustrated in **Figure 1**, the classification at the highest level distinguishes between *memory units (MUs)*, which "implicitly or explicitly refer to the past," (Dritschel, 1991, p. 320) and *non-memory units (nMUs)*. As Dritschel does not provide a definition of an nMU, we here defined this category as IUs which bear no explicit or implicit reference to the past. MUs were further subdivided into *personal memory units (PMUs)*, which, according to Dritschel (1991, p. 320) "implicate the self," and *non-personal memory units (nPMUs)* "for the remaining memory units." PMUs contain four subclasses: (1) *autobiographical fact units* with "some autobiographical/biographical information that need not be accessed in memory by event-related knowledge," (2) *prospective memory units* expressing "a memory for satisfying some future plans," (3) *metamemory units* relating "information about one's memory and the ability to access memories," and (4) *autobiographical memory units (AMUs)* containing "an implicit or explicit self-reference to a past event or a collection of past events" (Dritschel, 1991, p. 321). The latter subclass was further subdivided into (i) *actions* "describing physical activity(ies) done or observed," (ii) *evaluations* "expressing a previous interpretation of an action or feeling," (iii) *propositional attitudes*, that "follow a verb which is expressed in the past tense and explicitly denotes a belief, thought, attitude, doubt or intention," and (iv) *reported speech* "explicitly relating a statement made by a speaker in a previous conversation" (Dritschel, 1991, p. 326). See Supplementary Table 1 for examples of units in each of the described categories.

All IUs were assigned to the described categories by four independent raters. The raters were familiar with the taxonomy by Dritschel (1991) and evaluated whether an IU belonged to a category as either "positive," "negative," or "unclear." While Dritschel (1991) used only positive and negative ratings, we introduced this latter option to account for IUs in which the context was either not sufficient to arrive at a clear decision, or for cases in which the IUs could not be assigned to any of the available categories. The "miscellaneous" category was only used for such cases at the level of PMUs and AMUs in the study by Dritschel. Since ambiguities could occur at all levels of classification, the rating "unclear" was allowed for all categories of IUs in the present study.

To assess the inter-rater reliability of the classification system, Fleiss' kappa (κ) (Fleiss, 1971) was calculated separately for each patient for the different IU classes. The degree of inter-rater reliability was evaluated based on the resulting κ-values, as proposed by Landis and Koch (1977).
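As an illustration of the reliability measure, the following minimal sketch computes Fleiss' kappa from a table of rating counts and maps the result onto the Landis and Koch (1977) labels. It is not the authors' code: the function names and the example ratings are hypothetical, and the three columns are assumed to hold the number of raters choosing "positive," "negative," and "unclear" for one category.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who gave item i the rating j.
    Every item must have been rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Observed agreement: share of agreeing rater pairs per item, averaged.
    p_obs = sum(
        sum(c * (c - 1) for c in row) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement from the overall proportion of each rating.
    total = n_items * n_raters
    p_exp = sum(
        (sum(row[j] for row in counts) / total) ** 2
        for j in range(len(counts[0]))
    )
    if p_exp == 1.0:
        return float("nan")  # all ratings identical: kappa is undefined
    return (p_obs - p_exp) / (1.0 - p_exp)


def landis_koch_label(kappa):
    """Verbal interpretation of kappa following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical example: five IUs, four raters, columns are
# [positive, negative, unclear] for a single category.
example = [[4, 0, 0], [3, 1, 0], [0, 4, 0], [3, 0, 1], [1, 3, 0]]
kappa = fleiss_kappa(example)
print(round(kappa, 2), landis_koch_label(kappa))
```

Computing kappa per patient and per IU class, as described above, would amount to calling `fleiss_kappa` once for each such table.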

An IU was considered as belonging to a category if it was assigned to the same category with an inter-rater agreement of at least 75% (i.e., either all four ratings were "positive," or three were "positive" and one "unclear" or "negative"). This threshold was selected to restrict further analyses to the IUs for which the majority of raters agreed on the classification.

**Table 2 | Inter-rater reliability and the fractions of IUs in the different categories.**

category.

*were rounded to two digits after the decimal point.*

**study.** Figure modified from Dritschel (1991). The abbreviations of category names are specified in brackets. The same colors as used here in the

distribution of these categories in our spoken data. Categories employed for ECoG data analyses are highlighted by solid blue boxes.

We detected on- and offsets of all transcribed IUs in the auditory signal and marked them manually in the neuronal recordings made simultaneously using the Deltamed Coherence PSG System (Paris, France). The median average durations, word counts, and their respective interquartile ranges (IQRs) were calculated in Matlab for the MUs, nMUs, PMUs, nPMUs, and AMUs. Statistical differences in the durations and word counts of the different IU classes were assessed using a non-parametrical Wilcoxon rank sum test (Gibbons, 2003), suited for unequal sample sizes (Sheskin, 2007).

**Table 3 | Overview of the length of the different IU categories.**

*Median durations and the numbers of words and their respective IQRs are shown for each subject (S1, S2, S3) separately and for all subjects together ("All duration" and "All No words"). These average values were calculated as the median of the respective category in all subjects together. All values were rounded to two digits after the decimal point.*

#### **DATA PREPROCESSING, TIME-FREQUENCY ANALYSIS, AND STATISTICS**

The recordings from the ECoG grids were re-referenced to a common average reference (CAR) across all grid electrodes. Whenever strip electrodes were analyzed, the recordings from these electrodes were re-referenced to a CAR across all strip electrodes in the respective lobe. For all channels and for each IU, we calculated event-related, time-resolved spectral magnitude values in a time frame between 4000 ms before and 3000 ms after IU onset to accommodate the entire duration of the IUs (**Table 3**) and to enable the analysis of memory retrieval processes which may begin well before the actual IU onset. Like in our previous ECoG studies (e.g., Pistohl et al., 2012; Ruescher et al., 2013), we employed a multi-taper method (Percival and Walden, 2010), using time windows of 500 ms, sliding windows of 50 ms, and 3 Slepian tapers. Trial-averaged, time-resolved spectral magnitudes were calculated in each subject for all available IUs together, as well as separately for IUs from the MU, nMU, PMU, nPMU, and AMU categories. Relative spectra were computed by dividing the time-resolved magnitude by the median baseline amplitude for each frequency bin. The baseline period was chosen between 4000 and 3000 ms before the onset of the respective IU.

For statistical comparison between IUs from two different categories, the spectral magnitude in each trial was averaged over a time window of interest [corresponding to the period (i) 1 s before to IU onset or (ii) starting from IU onset to 1 s after IU onset] and averaged across the analyzed range of high gamma frequencies (70–150 Hz). We also performed the same analysis on theta (3–5 Hz) and alpha (8–12 Hz) frequencies to establish whether low-frequency effects parallel high gamma responses. The data were statistically tested in each of the analyzed time windows using a Wilcoxon rank sum test. The resulting *p*-values were false discovery rate (FDR)-corrected for multiple testing (Benjamini and Yekutieli, 2001) across the number of grid electrodes (64) and time windows (2) at a *q*-level of 0.05. In this way, we compared the categories MU vs. nMU, PMU vs. nPMU, PMU vs. nMU, and AMU vs. nMU.

We additionally performed a single-trial decoding analysis using a regularized linear discriminant analysis described in Pistohl et al. (2012) to assess whether PMUs and nMUs could be differentiated based on neuronal activity in single trials. The decoding was performed for each subject and recording channel, based on averaged high gamma magnitude values in the time windows (i) and (ii) separately. Like in our previous studies (e.g., Derix et al., 2012; Pistohl et al., 2012), the decoding accuracies were normalized to correct for bias due to unequal sample size of IU subclasses by averaging the class-specific decoding accuracies of the two classes.
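The normalization of decoding accuracies can be illustrated with a short sketch. It assumes that the class-specific accuracy is the fraction of trials of a class that the decoder labels correctly, so that the normalized value is what is commonly called balanced accuracy. The function name and the example labels are hypothetical and are not taken from the original analysis code.

```python
def normalized_decoding_accuracy(true_labels, predicted_labels):
    """Mean of the class-specific decoding accuracies.

    With unequal class sizes, a decoder that always predicts the larger
    class reaches a high raw accuracy; averaging the per-class accuracies
    removes this bias (chance level is 0.5 for two classes).
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    per_class = []
    for cls in sorted(set(true_labels)):
        trials = [i for i, t in enumerate(true_labels) if t == cls]
        hits = sum(1 for i in trials if predicted_labels[i] == cls)
        per_class.append(hits / len(trials))
    return sum(per_class) / len(per_class)


# Hypothetical example: 8 PMU trials and 2 nMU trials, and a trivial
# decoder that labels every trial as PMU.
truth = ["PMU"] * 8 + ["nMU"] * 2
guess = ["PMU"] * 10
raw = sum(t == g for t, g in zip(truth, guess)) / len(truth)
print(raw, normalized_decoding_accuracy(truth, guess))
```

In this example the raw accuracy is inflated by the larger class, whereas the normalized accuracy stays at chance level.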

To address the possibility of differential neuronal effects between the analyzed IU classes due to differences in syntactic complexity, we further correlated the time- and frequencyaveraged high gamma activity in the same time windows as used for statistical comparison with the word count of the IUs, as word count has been shown to be a reliable measure for syntactic complexity (Szmrecsanyi, 2004). Resulting statistical values were FDR-corrected for multiple comparisons across electrodes at *q* < 0.05.
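The FDR correction referred to here and in the category comparisons can be sketched as follows, assuming the step-up procedure of Benjamini and Yekutieli (2001) cited earlier in this section. The function name and the example *p*-values are hypothetical. In the category comparisons the input would contain 128 *p*-values (64 grid electrodes × 2 time windows).

```python
def fdr_benjamini_yekutieli(p_values, q=0.05):
    """Return one boolean per p-value: True if significant at FDR level q.

    Step-up procedure valid under arbitrary dependence: find the largest
    rank k with p_(k) <= k * q / (m * c(m)), where c(m) = sum_{i=1..m} 1/i,
    and declare the k smallest p-values significant.
    """
    m = len(p_values)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / (m * c_m):
            k_max = rank
    significant = [False] * m
    for idx in order[:k_max]:
        significant[idx] = True
    return significant


# Hypothetical p-values from five electrodes.
print(fdr_benjamini_yekutieli([0.9, 0.001, 0.041, 0.008, 0.039], q=0.05))
```

Because of the factor c(m), this procedure is more conservative than the Benjamini-Hochberg variant, which is a reasonable choice when tests at neighboring electrodes cannot be assumed independent.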

#### **HIGH GAMMA MAPPING (HGM)**

Since high gamma mapping has been suggested as a valuable adjunct to ESM to identify eloquent cortex in pre-neurosurgical diagnostics of epilepsy (Sinai et al., 2005; Leuthardt et al., 2007), we evaluated the topographic agreement of IU-related high gamma responses (below referred to as "high gamma mapping," HGM) with the speech and mouth motor areas identified using ESM. The trial-averaged spectral magnitudes were calculated as described above. For statistical analysis, we averaged the data over the first 500 ms after IU onset and across the selected range of high gamma frequencies (70–150 Hz). Additionally, responses in theta (3–5 Hz) and alpha (8–12 Hz) frequencies were analyzed. We used a sign test, FDR-corrected at *q* < 0.001 for multiple testing across electrodes. The sensitivity and specificity of the HGM with IUs were calculated as in Ruescher et al. (2013). In addition to the 64 grid electrodes shown in **Figures 3A**, **5A** for S1 and S2, respectively, one 1 × 6 and five 1 × 4-contact strips were analyzed in S1, and two 1 × 6 and four 1 × 4-contact strips in S2 (resulting in 26 and 28 additional contacts in S1 and S2, respectively) for comparability with the HGM findings by Ruescher et al. (2013). The strip electrodes were implanted in frontal and interhemispheric areas. IU-related HGM results from these two subjects were compared to the sensitivity and specificity values from the data sets of non-experimental onsets of speech production analyzed in the same subjects by Ruescher et al. (2013). IUs in the present study and speech onset data in Ruescher et al. (2013) were extracted from only partially overlapping hours of conversation material, since the selection of the speech onset trials in this earlier study involved stricter inclusion criteria. For optimal comparability of the IUs with the data set reported in Ruescher et al. (2013), we reanalyzed this latter data set using the same parameters for spectral analysis as for the IUs.
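The exact computation follows Ruescher et al. (2013) and is not reproduced here. As a rough guide, the sketch below uses the standard definitions of sensitivity and specificity and assumes that ESM serves as the reference and that each electrode is classified as positive or negative by both methods. The function name and the example values are hypothetical.

```python
def sensitivity_specificity(esm_positive, hgm_positive):
    """Sensitivity and specificity of HGM with ESM taken as the reference.

    Both arguments hold one boolean per electrode: True if the electrode
    was functionally positive according to ESM or HGM, respectively.
    """
    if len(esm_positive) != len(hgm_positive):
        raise ValueError("both lists need one entry per electrode")
    pairs = list(zip(esm_positive, hgm_positive))
    tp = sum(1 for e, h in pairs if e and h)          # both positive
    fn = sum(1 for e, h in pairs if e and not h)      # missed by HGM
    tn = sum(1 for e, h in pairs if not e and not h)  # both negative
    fp = sum(1 for e, h in pairs if not e and h)      # HGM only
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical example with five electrodes.
esm = [True, True, False, False, False]
hgm = [True, False, False, False, True]
print(sensitivity_specificity(esm, hgm))
```

Under these definitions, a pattern of low sensitivity with high specificity means that HGM-positive electrodes are rarely ESM-negative, while a sizable share of ESM-positive electrodes show no significant high gamma response.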

#### **ANATOMICAL ASSIGNMENT OF ECoG ELECTRODES AND DEFINITION OF BRAIN AREAS**

A T1-weighted magnetization-prepared rapid-acquisition gradient-echo (MPRAGE) image was obtained from each patient during the implantation period using a 1.5-T Vision magnetic resonance imaging (MRI) scanner (Siemens, Erlangen, Germany). After normalizing the MR images to a standard brain in MNI (Montreal Neurological Institute) space using SPM5 (Friston et al., 1994), electrode void artifacts, as well as the central, postcentral, and lateral sulci were identified and marked manually. The MNI coordinates of electrode positions were extracted and used for anatomical assignment to cortical areas based on a probabilistic atlas system (see Pistohl et al., 2012 and Ruescher et al., 2013 for details). The inferior parietal cortex (IPC) was defined using the Anatomy Toolbox version 1.6 (Eickhoff et al., 2005) and included the areas PF, PFm, PFt, PGa, PGp, and Fop (Caspers et al., 2006).

#### **RESULTS**

#### **INTER-RATER RELIABILITY OF IU CLASSIFICATION**

Fleiss' kappa κ was calculated for each IU category (see **Figure 1**), to evaluate the agreement of the raters' assignments ("positive," "negative," or "unclear"). **Table 2** lists the κ-values for all subjects and categories. According to the interpretation of κ-values by Landis and Koch (1977), the average inter-rater agreement for MUs, nMUs, and PMUs was moderate (i.e., between 0.41 and 0.60), and the average inter-rater agreement for nPMUs and AMUs was fair (i.e., between 0.21 and 0.40). The κ-values for autobiographical fact units, prospective memory units, and metamemory units varied between moderate and fair inter-rater agreement. The subclasses of AMU reached higher agreement values than the other categories. The κ-values were substantial (i.e., between 0.61 and 0.80) for "action" AMUs and "evaluation" AMUs, and moderate for "propositional attitude" units. Assignment of "reported speech" AMUs elicited an almost perfect (κ = 0.9) inter-rater agreement. Further analyses were performed only with the IUs which elicited at least 75% of inter-rater agreement (see Methods). **Figures 2A–C** and **Table 2** provide an overview of the numbers of IUs per category for all three subjects, as well as their relative portion in the total number of IUs per subject.

#### **DISTRIBUTION OF IUs OVER CATEGORIES**

The contribution of MUs to all IUs was 57% on average. It was similar in S1 and S2 (53.3% and 57.7%), while the percentage for S3 was higher (70.9%). The proportion of nMUs varied between 25.5% (S1), 14.4% (S2), and 5.7% (S3), and was 19.2% on average.

The fractions of PMUs and nPMUs in MUs were likewise similar across subjects (average 74% and 9.8% for PMUs and nPMUs, respectively), ranging from 71.6% (S1) to 77.8% (S2) for PMUs, and from 3.6% (S2) to 15% (S1) for nPMUs. The distributions of MUs, nMUs, PMUs, and nPMUs could not be directly compared to the findings of Dritschel (1991), since this latter study did not report quantitative results for these categories. All IUs analyzed in the present study for each patient were extracted from several transcriptions and analyzed together. In the study by Dritschel (1991), however, each transcription was analyzed separately. We thus re-calculated the overall distribution of IU subcategories across all transcriptions in the data reported by Dritschel (1991) in the same way as we did for our transcriptions (cf. our **Table 2** and Tables 2, 4 in Dritschel, 1991) for better comparability. AMUs formed the majority of PMUs in our data. The overall share of AMUs in the PMUs was 82.8% on average, and ranged between 81.2% in S1 and 89% in S3. The other categories of autobiographical fact units, prospective memory units, and metamemory units were sparsely present (on average 1.3%, 0.6%, and 2.9%, respectively). Overall, this proportion is consistent with the report by Dritschel.

The distribution of the four subclasses of AMUs ("action," "evaluation," "propositional attitude," and "reported speech") is shown in **Figure 2** for each subject (**Figures 2D–F**), as well as for all subjects together (**Figure 2G**). The "action" category comprised the largest part of AMUs in all subjects, consistent with the earlier findings by Dritschel (**Figure 2H**), although the 35.5% of "action" AMUs observed by Dritschel (1991) is smaller than in our data (average 60%). S1 had conspicuously fewer "action" AMUs (47.8%) than S2 and S3 (both 70.8%). Dritschel found 35.2% of all AMUs to be "reported speech" units, clearly more than we observed in our transcriptions (12.9%). The frequency of these units in the present study also varied between subjects: S2 used them less often (7.6%) than S1 (12.9%) or S3 (24.6%). 13.2% of our AMUs were "evaluations," slightly less than in Dritschel's data (17.3%). Only 2.5% of our AMUs were assigned to the "propositional attitude" category. This category was also underrepresented in Dritschel's transcriptions (6.7%). 11.4% of our AMUs remained unassigned, similar to the 6.2% of miscellaneous units in Dritschel's study.

#### **NUMBER OF WORDS PER IU AND IU DURATIONS**

**Table 3** summarizes the average word count and durations for all analyzed IUs together and for the categories MU, nMU, PMU, nPMU, and AMU separately. The average number of words in one IU was 5.6 ± 2.8. This value was comparable across subjects: 5.6 ± 3.1 words in S1; 5.4 ± 2.2 words in S2, and 6.4 ± 2.7 words in S3. Word count was comparable for all subcategories of IU in all subjects (Wilcoxon rank sum test, *p* > 0.2).

The average IU duration was 1271.5 ms (IQR 970.7 ms), varying between 1174.8 ms (IQR 852.4 ms) in S1, 1401.9 ms (IQR 848.7 ms) in S2, and 1683.6 ms (IQR 1444.3 ms) in S3. Durations were comparable for some categories, e.g., for PMU vs. nPMU (*p* = 0.44 in S1, *p* = 0.63 in S2, and *p* = 0.11 in S3), while they differed for other categories such as nMU vs. MU (Wilcoxon rank sum test, *p* < 0.001 in all subjects).
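For readers unfamiliar with the Wilcoxon rank sum test used in these comparisons, the following is a minimal sketch of its two-sided normal approximation. The function names and the duration values are hypothetical, and the sketch applies no tie or continuity correction, so it may differ in detail from the implementation used in the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns the z statistic and the two-sided p-value."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical IU durations in ms for two categories (not study data).
durations_mu = [1310.0, 1450.0, 1280.0, 1520.0, 1390.0, 1610.0]
durations_nmu = [1090.0, 1150.0, 1230.0, 980.0, 1180.0, 1260.0]
z_value, p_value = rank_sum_test(durations_mu, durations_nmu)
print(z_value, p_value)
```

With realistic trial counts (hundreds of IUs per category) the normal approximation is generally adequate. For very small samples an exact test would be preferable.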

#### **FUNCTIONAL TOPOGRAPHY OF ECoG RESPONSES**

Time-resolved spectral analysis of the ECoG signals during the production of IUs revealed characteristic and significant changes in the high gamma frequency band, mostly increases but also decreases (spectra for S1, S3, and S2 are shown in **Figures 3**–**5**). As expected, significant IU-related effects were mostly observed in brain areas implicated in language and mouth motor functions, including Broca's area (BA 44 and 45) and the premotor cortex (cf. **Figure 3A**, **4A**, **5A** for exact anatomical locations). Spectral magnitude changes in different IU categories are shown in **Figure 5** (S2) for electrodes with ESM-identified mouth motor functions localized by means of monopolar stimulation. Spectral response patterns were similar between categories, which likely reflects the predominance of common articulatory mechanisms involved in speech production regardless of the IU content.

The same analysis on theta and alpha frequencies elicited spatially sparser effects, some of which accompanied significantly increased high gamma activity in association areas including the IPC (increased theta activity at electrode A5 in S1, **Figure 3**), the superior temporal cortex (decreased theta activity at electrodes C5, C6, D4, E4 in S3, decreased alpha activity at electrodes D4, D5, E4 in S3; see **Figure 4**), and the lateral sulcus (decreased alpha activity at E5 in S3). Other effects in low frequencies which did not parallel those in high gamma frequencies took place in the superior middle and posterior temporal cortex (decreased theta activity at electrode H1 in S2, decreased alpha activity at electrodes H1–H5 in S2 (see **Figure 5A** for anatomical locations), and decreased alpha activity at electrode D4 in the superior middle temporal cortex of S3). There was decreased alpha activity at electrode H2 in the dorsal primary somatosensory cortex of S1 (see **Figure 3**). Decreased alpha activity could also be observed in S2 at electrodes F4 and G3 on the central sulcus and G4 in the IPC (an example of a spectral response at electrode F4 is shown in **Figure 5B**, see **Figure 5A** for anatomical locations). Since low-frequency effects showed a looser spatial correspondence with ESM-identified speech and mouth motor areas (**Figures 3**, **4**), we performed quantitative comparisons between ECoG and ESM effects selectively on high gamma frequencies.

#### **HGM AND ESM**

procedure are color-coded (see legend). **(B)** Cortical responses underlying the production of IUs averaged across 600 trials. Each of the 64 panels corresponds to the respective electrodes in **(A)**. The left vertical dashed line marks the onset (0 s), the right vertical dashed line marks the average end of

color of the electrode outline (red, mouth motor; magenta, cognitive speech impairment upon electrostimulation). The gray semi-transparent lines indicate the positions of the lateral and central sulci identified in the individual MRIs of the subjects.

As overt speech production relies on both articulatory and cognitive functions, we considered electrodes with mouth-motor as well as cognitive speech-related effects in the comparison of ESM with HGM as potentially speech-relevant. HGM of IU trials revealed an average sensitivity of 44.4% and a specificity of 91.1%. Similarly, HGM performance with the speech onset data from Ruescher et al. (2013) reached a sensitivity of 43.3% and a specificity of 94.2%. All functional mapping results are summarized in **Table 4**. On average, 5 electrodes per patient (5, 7, and 3 electrodes in S1–S3, respectively) showed significant high gamma responses but were not identified by ESM as responsible for speech or mouth movements.
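Sensitivity and specificity in such a comparison follow directly from the counts of true/false positive and negative electrodes, with ESM treated as the reference. The sketch below shows the arithmetic. The electrode labels, the set-based bookkeeping, and the function names are hypothetical and do not reproduce the values in **Table 4**.

```python
def confusion_counts(all_electrodes, esm_positive, hgm_positive):
    """Count electrodes by agreement between ESM (reference) and HGM.

    All three arguments are sets of electrode labels."""
    tp = len(esm_positive & hgm_positive)   # flagged by both methods
    fp = len(hgm_positive - esm_positive)   # flagged by HGM only
    fn = len(esm_positive - hgm_positive)   # flagged by ESM only
    tn = len(all_electrodes - esm_positive - hgm_positive)
    return tp, tn, fp, fn


def sensitivity_specificity(tp, tn, fp, fn):
    """Sensitivity = tp / (tp + fn); specificity = tn / (tn + fp)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical 6-electrode example (not study data).
grid = {"A1", "A2", "A3", "A4", "A5", "A6"}
esm = {"A1", "A2", "A3"}   # speech or mouth motor effect upon stimulation
hgm = {"A1", "A4"}         # significant high gamma response
counts = confusion_counts(grid, esm, hgm)
print(counts, sensitivity_specificity(*counts))
```

In this framing, an electrode with a significant high gamma response but no ESM effect counts as a false positive, which is how the electrodes mentioned at the end of the preceding paragraph would enter the specificity estimate.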

#### **CATEGORY-SPECIFIC EFFECTS**

Alongside the category-independent responses described above, differences in high gamma activity were observed in the IPC region and in the prefrontal cortex (PFC) between IU categories with vs. without memory content. The magnitude of high gamma activity in the IPC region was consistently smaller in nMU trials than in MU/PMU/AMU trials both in the 1-s time interval before and after IU onset (**Figure 6**). These parietal effects were significant (see Methods) in electrodes E1 from S1 (MNI: −51/−49/43, probability for IPC (PFm): 50%, probability for IPC (PGa): 30%, probability for IPC (PF): 30%, cf. **Figure 3A** for electrode position) and F8 from S3 (MNI: −56/−22/42, probability for Brodmann area 2: 60%, probability for IPC (PFt): 60%, probability for Brodmann area 1: 50%, cf. **Figure 4A** for electrode position). There was a similar IPC response pattern in electrode H7 from S2 (see **Figure 5A** for electrode position), yet the cross-category differences in this subject were not significant. Out of all subjects and time windows, the identified difference between PMUs and nMUs could only be decoded in S1 and only with a relatively low accuracy of 61.6% during the second before IU onset (*p* = 0.0029, Bonferroni-corrected for the number of grid electrodes and the number of time windows).

We further detected one PFC electrode in S1 (**Figure 7A**, MNI: −36/20/50, no probabilistic assignment available, cf. electrode E8 in **Figure 3A**) with a significantly larger magnitude of high gamma activity within a second prior to the onset of IUs with memory content (MUs and PMUs, **Figure 7B** shows PMU-related effects) than without memory content (nMUs, **Figure 7C**). A similar significant effect could be observed in S2 at electrode A1 in the dorsomedial PFC. It consisted of significantly stronger high gamma activity during the first second after IU onset in the PMU vs. nMU and in the MU vs. nMU contrasts. There were also reproducible effects in alpha frequencies in the PMU vs. nMU comparison in the time window of 0–1 s relative to IU onset in both subjects. As opposed to the aforementioned gamma effects, the level of alpha activity was significantly lower in PMUs than nMUs at electrode D8 in S1 and at electrode A1 in S2.

Additionally, there were category-specific differences in high gamma activity in the anterior/middle (S3) and posterior (S1) superior temporal cortex in the time period of 0–1 s relative to the onset of IU production. Electrode B1 in S1 showed a significantly higher level of gamma activity in PMUs than in nPMUs, and electrodes A1 and B6 in S3 showed a contrary response with less gamma activity in the PMU than in the nPMU data. Electrode A1 in S3 had shown less gamma activity during conversations with the life partner than with the physician in our previously published study (S3 in Derix et al., 2012). Thus, modulations of gamma activity at this electrode may reflect self-referential processing.

correspond to: electrodes with significant high gamma responses D6, F4, E3 (all IUs together) and electrodes without significant high

High gamma magnitude at the IPC and PFC electrodes with the aforementioned memory-related effects showed no significant correlation (Spearman's correlation, *p*-values FDR-corrected at *q* < 0.05) with the number of words in the IUs. Therefore, an explanation of these differential responses by systematic differences in syntactic complexity defined as the number of words (Szmrecsanyi, 2004) is unlikely.
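The text does not name the FDR procedure used. Assuming the common Benjamini-Hochberg step-up procedure, thresholding a set of *p*-values at a level *q* could look like the following minimal sketch. The function name and the example *p*-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (assumed here, not confirmed
    by the source). Returns one boolean per input p-value, True where
    the corresponding null hypothesis is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= q * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from per-electrode correlation tests.
example_p = [0.001, 0.04, 0.03, 0.5]
print(benjamini_hochberg(example_p, q=0.05))
```

Under this procedure, a result such as "no significant correlation" means that no *p*-value fell at or below its rank-dependent threshold.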

Reproducible effects could be observed in the temporal cortex in the theta frequency range. These consisted of stronger activation in PMU than in the nPMU data in the posterior superior (electrode B1 in S1, see **Figure 3A** for electrode location), middle superior (electrode B6 in S3), and anterior inferior (electrode A1 in S3) temporal cortex (see **Figure 4A** for electrode locations). Furthermore, significant differences in theta frequencies occurred between MU vs. nMU, PMU vs. nMU, and AMU vs. nMU categories in the anterior inferior temporal cortex of S3, who had been investigated in our earlier study (Derix et al., 2012). The increased levels of theta activity in the memory-specific IU conditions at electrode C2 in this subject are consistent with our previously expressed hypothesis that theta responses in the anterior temporal lobe may reflect autobiographical mnemonic processing (Derix et al., 2012).

and nMUs, see **Table 3**), speech-related responses are comparable across categories.

All described effects were found outside the epileptic seizure onset zone and outside areas with language and mouth motor functions defined by the ESM and HGM procedures.

#### **DISCUSSION**

Implementation of study paradigms which are relevant to and representative of real-world situations is of central importance to understanding natural human cognition (Kingstone et al., 2003; Zaki and Ochsner, 2009; Maguire, 2012; Przyrembel et al., 2012; Stanley and Adolphs, 2013). To be able to capture neuronal processes which are grounded in real-life experiences, researchers more and more frequently employ such stimuli as longer and increasingly naturalistic text passages, Hollywood movies and video recordings of interacting individuals, or they place subjects in real-life-like environments such as highly detailed virtual simulations of face-to-face communication or traffic situations (see Spiers and Maguire, 2007; Mar, 2011; Borghini et al., 2012; Konvalinka and Roepstorff, 2012; Maguire, 2012; Schilbach et al., 2013 for reviews). Here, we explored the concept of "idea units" (IUs) as a way to get a handle on differential cognitive functions involved in non-experimental, real-life speech production. To this end, we transcribed continuous speech of ECoG-implanted patients, subdivided these data into syntactically-meaningful chunks of information (IUs), classified the obtained IUs according to their mnemonic content, and analyzed the underlying neuronal activity.

**Table 4 | Comparison of HGM of mouth motor and language functions for IUs and speech onset-related high gamma responses.**

*tp, number of true positive electrodes; tn, number of true negative electrodes; fp, number of false positive electrodes; fn, number of false negative electrodes (see Ruescher et al., 2013 for more information). Statistics for IU- and speech-onset-related responses within the first 500 ms after onset are reported. The high sensitivity and specificity of HGM for speech production indicate that the present settings for ECoG data analysis are well suited for speech mapping under non-experimental conditions. The relatively good sensitivity and specificity of the IU-based approach show that it is also suitable for identification of cortical areas supporting speech-related functions. The higher number of false positive electrodes in the IU data set than in the speech onset data suggests that additional processes may be involved in the production of IUs.*

**production of PMUs vs. nMUs in the inferior parietal cortical (IPC) region of S1 (electrode E1) and S3 (electrode F8). (A,B)** Show stronger high gamma (70–150 Hz) magnitude in PMU than in nMU trials. Magnitude differences in subjects S3 **(A)** and S1 **(B)** were significant at one electrode in the IPC region of each subject (Wilcoxon rank sum test, FDR-corrected at *q* < 0.01) before IU onset (−1 to 0 s; dark red traces for nMUs and dark blue traces for PMUs), and also during the production of the IUs (in the first second after IU onset; light red traces for nMUs and light blue traces for PMUs). Data were smoothed using a first-order Savitzky-Golay filter with a bandwidth of 42 Hz. Electrode positions are visualized on a standard brain from SPM5 based on their MNI coordinates in **(C)**, the approximate extent of the IPC is indicated in orange.

#### **APPLICABILITY OF THE INVESTIGATED IU CONCEPT TO SIMULTANEOUS ECoG/VIDEO DATA**

Spontaneous ECoG data are obtained for pre-neurosurgical diagnostics during everyday hospital life. While implanted with electrodes, patients are confined to bed for safety reasons and have to stay under constant video and audio surveillance by medical personnel. This can be expected to influence the patients' behavior and topics of conversation. Thus, we refrain from calling these unusual life circumstances "natural" but rather employ the terms "real-life" or "real-world." The total recording time is limited to the time period of invasive monitoring (1–3 weeks), yet the recorded social situations are diverse. It was our aim to establish how many IUs can be collected from such data, whether they can be subdivided into functional subclasses, and how these speech data compare to IUs produced by previously-reported subjects in non-clinical settings. Our results showed that the patients' everyday dialogs contained sufficient amounts of IUs for elaborate behavioral and neurophysiological analyses.

The IU approach applied in the present study (Dritschel, 1991) allowed classifying IUs according to the different types of mnemonic content with a fair to almost perfect inter-rater agreement (**Table 2**). There was very good agreement for the AMU subclasses "action," "evaluation," and "reported speech." Only fair agreement could be achieved for the categories nPMU, AMU, autobiographical fact units, and the AMU subclass "propositional attitude." We employed a threshold of 75% inter-rater agreement to define functional categories for subsequent analyses. With this inclusion criterion, we were still able to obtain large numbers of trials in the major IU subclasses (**Figures 2A–C**), including MU, nMU, PMU, AMU, and "action" AMU. Thus, on the one hand, Dritschel's method could successfully be applied to our data. On the other hand, improvements are desirable in the reliability of ratings and in the level of detail of IU classification, for which further refined taxonomies (Bangerter, 2000; Cuc et al., 2006) and alternative segmentation methods (see Outlook) may be useful.

As to the distribution of IUs across the different subclasses, most IUs had mnemonic content (assigned to MU). The majority of them contained an explicit or an implicit reference to self (assigned to PMU), and most PMUs referred to a past experience (AMU). Most of those contained references to past actions ("action" AMU). Other PMU categories (metamemory units, autobiographical fact units, and prospective memory units) were covered only sparsely. Overall, our results are in keeping with those reported by Dritschel (1991) in healthy subjects during different real-life conversational situations. The somewhat larger share of "action" AMUs in our data than in the study by Dritschel (cf. Table 4 in Dritschel and our **Table 2**, **Figures 2D–H**) may reflect differences in the individual manner or contents of conversations between subjects, and/or it may be attributable to our stricter inclusion criteria.

The average number of words in an IU roughly corresponds to previous observations (Chafe, 1994). The number of words in our study was comparable across different IU subclasses, and the average durations were comparable across different MU subclasses (**Table 3**). Interestingly, the durations of MU subclasses were around 200 ms longer than those of nMUs (Wilcoxon rank sum test, *p* < 0.001). An explanation for this may be that speech with mnemonic content is slower due to memory retrieval processes. It may be interesting to address this putative difference in future psycho- and neurolinguistic studies.

The employed taxonomy allows classifying IUs according to several types of memories. Still, its major limitation is that it does not subdivide nMUs and nPMUs into further functional subclasses, which could provide useful counterparts to the different types of IUs with mnemonic and self-referential content. Theoretical research on subclassification of these kinds of IUs is hence desired. A further observation which may be relevant for future research is that, while there were many trials in the nMU and MU categories and in the two major (sub-)fractions of MUs (PMU and AMU), some IU classes are underrepresented in spontaneous communication. Since we obtained only few IUs from the available data in the autobiographical fact units, in the prospective memory units, and in the different subclasses of AMUs (cf. **Figure 2** and section "Distribution of IUs over categories"), we did not perform further quantitative analyses on these types of IUs. Considerably more extensive amounts of spoken data will be required to elucidate the neuronal correlates of these IU classes during real-world communication.

As is illustrated in **Figure 1**, the taxonomy by Dritschel (1991) classifies IUs based on a single hierarchy. However, as one of the reviewers has pointed out, a fine-grained description of semantic differences between various kinds of units will most likely comprise multiple dimensions. Development of theoretical approaches to IU classification and tests of their biological validity will be a valuable endeavor to which various lines of research can contribute.

Our aim in the second part of the study was to elucidate the neuronal activity underlying IUs as defined above. In all subjects, we observed prominent neuronal activations related to the production of IUs in such speech-related brain regions as Broca's area, the superior temporal gyrus, and the premotor cortex (**Figures 3**–**5**). The topography of the observed effects was consistent with research in healthy subjects (Pulvermüller and Fadiga, 2010; Price, 2012). As in previous experimental ECoG studies on speech production, there were significant increases in the high gamma band (Crone et al., 2001b; Towle et al., 2008), often accompanied by decreased onset-related activity in alpha frequencies (Wu et al., 2010; Toyoda et al., 2014), although low-frequency effects seldom reached significance in our analysis. High gamma responses in the majority of electrodes were most pronounced around the onset of IUs and persisted over the entire average duration of the IUs. The sharp and accentuated change of activity around IU onset was striking, considering that IU onsets did not necessarily coincide with the start of speech production. This might be an indication that IU boundaries indeed have clear representations in brain activity. Since we did not account for the temporal distance of IU onset to the start of the respective speech production epoch (e.g., in the sense defined in Ruescher et al., 2013), future research will be required to disentangle the contribution of articulatory onset-related effects from those specific to the onset of syntactic constructions. Importantly, IU-related neuronal responses were equally well visible in all IU categories for which the amount of the gathered data permitted trial-averaged spectral analysis (**Figures 2A–C**). The similarity of responses across the different IU categories in ESM-identified mouth motor areas (**Figure 5B**) indicates articulation-related processes common to the production of all classes of IUs. Taken together, these findings provide initial evidence that IUs are useful and appropriate basic elements for investigating the neuronal correlates of speech production under real-world conditions.

#### **SUITABILITY OF THE PRESENT APPROACH FOR LANGUAGE MAPPING**

To establish whether the present IU-based approach is suited to identify cortical areas which support expressive language functions, we compared the topography of IU-related high gamma responses with the results of ESM, as well as with a data set of non-experimental speech onsets previously obtained for HGM of expressive speech (Ruescher et al., 2013). We found that IU-related responses had a high specificity (91.1%) and a moderate sensitivity (44.4%) for speech/mouth motor areas identified using electrocortical stimulation (**Table 4**). Thus, the present IU-based approach may provide a promising starting point for the development of adjuncts to experimental as well as other non-experimental approaches to define eloquent language cortex in pre-neurosurgical diagnostics (Ojemann and Whitaker, 1978; Sinai et al., 2005; Ruescher et al., 2013). Importantly, IU-related neuronal effects were not only observed in the classical speech areas but also in association areas including the PFC and the IPC regions. This suggests that there may be additional higher-order processes during IU production which may remain undetected by ESM. Further investigation is needed to address this issue.

The sensitivity and specificity of the proposed IU-based method were comparable to the HGM results obtained with a previously-published data set of speech onsets (Ruescher et al., 2013) in our S1 and S2 (P2 and P1 in Ruescher et al., 2013, respectively). Our re-analysis of the speech onset data using the same parameters as for IUs revealed a specificity of 94.2% and a sensitivity of 43.3% (**Table 4**). Interestingly, the present results for both data sets exhibited a higher sensitivity for ESM-defined speech areas than in the earlier report by Ruescher and colleagues. Note that the latter study aimed to develop a common mapping approach which would readily be applicable for mapping upper- and lower-extremity motor and language functions in a clinical environment. In the present study, however, we focused on optimizing the parameters of neuronal data analysis specifically for the purpose of language mapping. Exploration of other ECoG signal components, time windows, and further alternative parameters for neuronal data analysis may be of interest, as can be seen from the comparison of our speech mapping results (**Table 4**) with the effects observed by Ruescher et al. (2013). Together with their findings, our results suggest that optimal identification of speech and extremity motor functions may have different requirements for the analysis of ECoG data. This observation may be of consequence in achieving maximally-precise definitions of eloquent cortex in pre-neurosurgical diagnostics using HGM.

#### **CATEGORY-SPECIFIC IU-RELATED BRAIN RESPONSES IN THE IPC**

We observed differential modulations of activity in the parietal cortex depending on the presence or absence of memory content in the IUs. A comparison of the trial-averaged spectral magnitude in high gamma frequencies revealed consistent differences between nMU and MU/PMU/AMU trials in the IPC region, as shown in **Figure 6** for nMUs and PMUs. These differences occurred both before and after IU onset, were significant in S1 and S3, and started prior to the gamma activation measured in articulation-related areas.

The observed memory-related activity in the IPC region agrees well with results from previous studies pointing to an important role of the parietal cortex in mnemonic processing (Wagner et al., 2005; Vilberg and Rugg, 2008). According to Svoboda et al. (2006), the lateral parietal cortex and the temporo-parietal junction form integral parts of the autobiographical memory network. Notably, we found IU-related reduced high gamma activity in the IPC region in the non-memory condition, compared to a steadily higher level of activation in the memory trials (**Figure 6**). One would expect such a difference if the IPC supported ongoing memory-related processing which was briefly interrupted by the occurrence of non-memory content.

Vilberg and Rugg (2008) proposed that the parietal cortex may serve as an "episodic buffer" (Baddeley, 2000) responsible for binding information from sensorimotor systems and from long-term memory into a temporary episodic recollection. Following this notion, it seems plausible that the buffer "empties" during the processing of non-memory content, which may explain the reduced gamma-band responses related to the nMUs in our study. Future investigation is needed to address this putative memory-retrieving mechanism.

#### **CATEGORY-SPECIFIC IU-RELATED BRAIN RESPONSES IN THE PFC**

A handful of fMRI studies which have been conducted to identify the neuronal correlates of thoughts show that the PFC is sensitive to their content. For instance, Spiers and Maguire (2006a,b) proposed and evaluated an approach to study cognitive units by correlating the content of the subjects' utterances in *post-hoc* oral reports about their experiences during navigation in a virtual-reality environment with the neuronal activity recorded during these experiences. Among other effects, these authors reported higher levels of activation in the medial PFC related to route planning (Spiers and Maguire, 2006b) and during Theory-of-Mind (ToM) recollections (Spiers and Maguire, 2006a) compared with several other categories. In a different fMRI study from the same group, Bonnici et al. (2012) asked the subjects to recall rich and vivid memories of recent (2 weeks ago) or remote (10 years ago) events, and found that the latter were more readily detectable in the ventromedial PFC than the former ones.

In the present study, we observed significantly higher levels of high gamma activity for memory- vs. non-memory-related IUs in the PFC of both subjects in whom this cortical region had electrode coverage (S1 and S2; **Figure 7** shows an example of such differential PFC responses for S1 in the PMU vs. nMU contrast). These effects are in line with previous neuroimaging literature pointing to the contribution of the PFC to mnemonic (Maguire et al., 1999; Spiers and Maguire, 2006a; Bonnici et al., 2012) and self-referential (Johnson et al., 2002; Mitchell et al., 2005; Cabeza and St. Jacques, 2007) processing, and they indicate that the PFC can be involved in personal memory retrieval not only in experimental circumstances but also during real-life conditions.

We assume that these category-specific high gamma effects in association areas cannot be explained by articulation- or movement-related differences between conditions for the following reasons: First, these effects occurred outside the sensorimotor cortex. Second, no significant differences in sensorimotor-cortical gamma responses between the investigated conditions mirrored gamma effects in association areas in any subject. Third, there were no correlations of high gamma responses with the IU word count.

#### **OTHER COGNITIVE FUNCTIONS**

Beyond mnemonic functions, the differential effects in the IPC and in the PFC regions may also be related to other higher-order processes. Both areas have been implicated in intention perception (Fogassi et al., 2005) and intentional behavior (Thinnes-Elker et al., 2012). As the production of mnemonic contents in real-world conversations usually corresponds to the speaker's effective intentions, the differences in neuronal activity between the investigated conditions may also be related to the speaker's *intentions* to express memory- vs. non-memory-related content.

The category-specific neuronal effects in the present study could not be explained by different numbers of words between IU categories. Word number has been shown to be a reliable index of syntactic complexity in quantitative linguistic research (Szmrecsanyi, 2004), and an explanation of the observed category-specific responses by systematic differences in syntactic complexity is therefore unlikely. However, since oral speech comprises various levels of description involving articulation, word retrieval, short-term working memory, coordination with communication partners, and a multitude of other processes (Price, 2012), systematic coupling of such communication-related features with mnemonic content is an important topic for future research. Further investigation of psychological and linguistic differences between cognitive units may be of interest. For example, one may classify IUs based on the degree and valence of emotional content, or based on their syntactic properties.

To sum up, the present non-experimental, IU-based approach shows that human cognition can be studied in a real-life environment, and that IUs provide a handle to quantitatively explore such higher-order functions as naturalistic mnemonic processing in the human brain. Our behavioral and neuronal findings indicate that IUs can be used to decompose long conversations into small, self-contained units which (i) elicit robust speech-related activations in the articulatory areas and (ii) reflect differential IU contents in higher-order association regions. Since IUs in real-world speech production comprise a wealth of information about natural human cognition, future research in this direction can be expected to shed more light on the neuronal basis of brain functions which enable social discourse in real-world conditions.

#### **OUTLOOK**

As summarized by Auer (2010), many other ways exist to segment spoken language into basic meaningful elements, e.g., according to its prosodic and semantic characteristics. Since prosody has been proposed to reflect the boundaries of thoughts more directly than sound-based elements of speech (Chafe, 2012), and considering that spontaneously spoken language contains "many instances in which prosodic and syntactic units fail to coincide" (Chafe, pers. commun.), an interesting question for future neurolinguistic investigations will be to explore the neuronal differences between units obtained using alternative segmentation approaches, and to find out which unit types have borders that are most clearly reflected in brain activity. Unsupervised learning algorithms (e.g., hierarchical clustering) applied to ongoing recordings of neuronal activity during spontaneous speech may be used to assess the success of linguistic segmentation. Beyond spatially localized effects at the level of single electrodes, large-scale dynamical network states may provide segmentation-relevant information.
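As a rough illustration of how a clustering-based check of linguistic segmentation might look, the following Python sketch merges neighbouring time windows bottom-up and compares the resulting borders with a set of reference borders. It is not a method used in the present study: the contiguity constraint (only adjacent segments are merged), the centroid distance, the tolerance-based hit rate, the function names, and the synthetic two-channel features are all assumptions made for illustration.

```python
def segment_by_adjacent_merging(features, n_segments):
    """Contiguity-constrained agglomerative clustering of time windows.

    features: list of equal-length feature vectors, one per time window.
    Returns the start index of every resulting segment (the first is 0).
    """
    # Start with one segment per window; each segment is a list of indices.
    segments = [[i] for i in range(len(features))]
    dims = len(features[0])

    def centroid(segment):
        return [sum(features[i][d] for i in segment) / len(segment)
                for d in range(dims)]

    def distance(a, b):
        # Squared Euclidean distance between segment centroids.
        return sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))

    while len(segments) > n_segments:
        # Merge the pair of neighbouring segments with the closest centroids.
        best = min(range(len(segments) - 1),
                   key=lambda k: distance(segments[k], segments[k + 1]))
        segments[best:best + 2] = [segments[best] + segments[best + 1]]
    return [segment[0] for segment in segments]


def boundary_hit_rate(found, reference, tolerance=1):
    """Fraction of reference borders with a found border within `tolerance` windows."""
    if not reference:
        return 0.0
    hits = sum(any(abs(r - f) <= tolerance for f in found) for r in reference)
    return hits / len(reference)


# Synthetic data (hypothetical): three plateaus in a two-channel feature.
windows = [[0.0, 0.25]] * 4 + [[1.0, 0.75]] * 3 + [[0.25, 2.0]] * 5
starts = segment_by_adjacent_merging(windows, n_segments=3)

# Hypothetical linguistic borders, expressed as window indices.
reference_borders = [4, 8]
rate = boundary_hit_rate(starts[1:], reference_borders, tolerance=1)
```

With real recordings, each window would hold features such as band power per electrode, and the hit rate could be computed separately for borders obtained from prosodic, syntactic, or other segmentation schemes.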

Single-trial decoding of IU-related activity in the present study proved difficult, as IU classes could only be decoded from single ECoG trials in one subject (S1) with an accuracy of 61.6%, and significant decoding of IU subclasses was not possible in the remaining subjects. Future analyses with different decoding algorithms, different features, or based on signals from electrodes with a higher spatial resolution such as micro-ECoG (Gierthmuehlen et al., 2011; Bouchard et al., 2013) may result in a better decoding performance. If feasible, decoding the content of IUs from single trials of neuronal activity may further aid restoration of intended speech output in paralyzed patients with articulatory impairments (Pei et al., 2011; Derix et al., 2012; Pasley et al., 2012).
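Whether a single-trial decoding accuracy exceeds chance depends on the number of trials and on the chance level. The sketch below computes an exact one-sided binomial tail probability in Python. It is a generic illustration rather than the statistical procedure of the present study, and the trial count and the two-class chance level of 0.5 used in the example are hypothetical values chosen only to make it runnable.

```python
from math import comb


def binomial_p_value(n_correct, n_trials, chance):
    """Probability of at least n_correct hits in n_trials under pure guessing."""
    return sum(
        comb(n_trials, k) * chance ** k * (1 - chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


# Hypothetical example: the trial count and chance level are assumptions,
# not values reported in the study.
n_trials = 500
n_correct = round(0.616 * n_trials)
p_value = binomial_p_value(n_correct, n_trials, chance=0.5)
```

The same accuracy can fall short of significance when it is based on only a few dozen trials, which is one reason why larger amounts of data per subject may help future decoding attempts.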

Apart from the taxonomy by Dritschel (1991) which we have made use of in the present study, other approaches exist to classify units of cognition by their content. For instance, one may distinguish between IUs describing events vs. states, whether a reference is made to a situation which is immediate or displaced, whether it is factual or fictional, and if the given IU conveys a belief, intention, or desire. Chafe (1994) proposes these and several other ways of classification. It may be interesting in future neurolinguistic investigations into non-experimental, spontaneous communication to elucidate the neuronal activation patterns peculiar to these IU classes.

A further question that merits attention with regard to mnemonic processing is how different levels of recency are reflected in neuronal activity during recollection. Bonnici et al. (2012) performed a comparison of neuronal activation patterns in fMRI while the subjects thought about recent vs. old memories, and these authors obtained topographically-specific results in the ventromedial PFC and in the hippocampus. Future ECoG studies of spontaneous speech may classify the subjects' recollections in a similar way or perhaps attempt temporally more fine-grained differentiation. Since German is a language with multiple past tenses which could provide indications of recency of the recollected situation, tense information may be useful to detect IUs with different temporal references.

A relevant further question which can be addressed with the present approach would be whether memory retrieval in real-world conversations differs when the subject is directly asked about a past event, compared to a situation in which mnemonic processing is triggered intrinsically. What is the impact of the conversation partner on how many and which memories are accessed and how? Do neuronal mechanisms of memory retrieval differ when subjects talk about a topic discussed in the directly preceding utterance, compared to a new utterance which relates to a new topic? These and many related questions can be studied by analyzing natural, uninstructed conversations. With regard to psycholinguistic research, Neisser stated in 1978 that "the naturalistic study of memory is an idea whose time has come." We assert that ECoG obtained during non-experimental communication is a rich source of information for cognitive studies in the neuroscientific domain.

Last but not least, the change in content of IUs over time may merit attention. As temporal sequences of IUs are intimately linked to the flow of thoughts in the course of spontaneous speech (Chafe, 1984, 2012), deciphering patterns in the temporal structure of IU production in larger speech epochs than investigated in the present study may be a way to address these cognitive dynamics. Linguistic approaches to information structure analysis (Heusinger, 1999) or psycholinguistic methods to identify more and less likely temporal patterns of cognitive unit precedence (Spiers and Maguire, 2008) may be useful to address this largely unexplored question in future neurolinguistic studies on real-world communication (Chafe, 2012).

#### **ACKNOWLEDGMENTS**

This work was supported by the German Research Foundation (DFG) grant EXC 1086 BrainLinks-BrainTools to the University of Freiburg, Germany. The article processing charge was funded by the DFG and the University of Freiburg within the funding programme Open Access Publishing.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00383/abstract

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 August 2013; accepted: 14 May 2014; published online: 13 June 2014. Citation: Derix J, Iljina O, Weiske J, Schulze-Bonhage A, Aertsen A and Ball T (2014) From speech to thought: the neuronal basis of cognitive units in non-experimental, real-life communication investigated using ECoG. Front. Hum. Neurosci. 8:383. doi: 10.3389/fnhum.2014.00383*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Derix, Iljina, Weiske, Schulze-Bonhage, Aertsen and Ball. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*