# TRENDS IN NEUROERGONOMICS: A COMPREHENSIVE OVERVIEW

EDITED BY: Klaus Gramann, Stephen H. Fairclough, Thorsten O. Zander and Hasan Ayaz

PUBLISHED IN: Frontiers in Human Neuroscience

#### *Frontiers Copyright Statement*

*© Copyright 2007-2017 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88945-203-3 DOI 10.3389/978-2-88945-203-3

#### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

#### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

#### Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

#### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# **TRENDS IN NEUROERGONOMICS: A COMPREHENSIVE OVERVIEW**

#### Topic Editors:

**Klaus Gramann,** Berlin Institute of Technology (TU Berlin), Germany & University of California, San Diego, USA

**Stephen H. Fairclough,** Liverpool John Moores University, UK

**Thorsten O. Zander,** Berlin Institute of Technology (TU Berlin), Germany

**Hasan Ayaz,** Drexel University & University of Pennsylvania & Children's Hospital of Philadelphia, USA

This Research Topic is dedicated to Raja Parasuraman, who unexpectedly passed away on March 22nd, 2015. Raja Parasuraman's pioneering work led to the emergence of Neuroergonomics as a new scientific field. He combined his research interests in the field of Neuroergonomics, which he defined as the study of the human brain in relation to performance at work and everyday settings. Raja Parasuraman was a pioneer, a truly exceptional researcher and an extraordinary person. He made significant contributions to a number of disciplines, from human factors to cognitive neuroscience. His advice to young researchers was to be passionate in order to develop theory and knowledge that can guide the design of technologies and environments for people. His legacy, the field of Neuroergonomics, will live on in the countless faculty and students whom he advised and inspired with unmatched humility throughout the whole of his distinguished career. Raja Parasuraman was an impressive human being, a very kind person, and an absolutely inspiring individual who will be remembered by everyone who had the chance to meet him.

#### **About this Research Topic**

Since the advent of neuroergonomics, significant progress has been made with respect to methodology and tools for the investigation of the brain and behavior at work. This is especially the case for neuroscientific methods, where the availability of ambulatory hardware, wearable sensors and advanced data analyses allows for imaging of brain dynamics in humans in applied environments. Methods such as electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS), and stimulation approaches like transcranial direct-current stimulation (tDCS), have made significant progress in both recording and altering brain activity while allowing full-body movements outside laboratory environments.

For neuroergonomics, the application of brain imaging in real-world scenarios is highly relevant. Traditionally, brain imaging experiments in human factors research tend to avoid active behavior for fear of artifacts and a contaminated data set that would provide limited insight into brain dynamics in real working environments. To overcome these problems, new analysis approaches have to be developed that identify artifacts resulting from hostile recording environments and movement-related non-brain activity stemming from eye, head, and full-body movements. The application of methodology from the field of Brain-Computer Interfacing (BCI) for neuroergonomics is one approach that has significant potential to enhance ambulatory monitoring and applied testing. Passive BCIs allow for assessing aspects of the user state online, such that systems can automatically adapt to their user. This neuroadaptive technology could lead to highly efficient working environments, to auto-adaptive experimental paradigms and to a continuous tracking of cognitive and affective aspects of the user state. Hence, deployment of portable neuroimaging technologies to real time settings could help assess cognitive and motivational states of personnel assigned to perform critical tasks.

This Research Topic gathers submissions that cover new approaches in neuroergonomics. Different article types cover advanced neuroscience methods and neuroergonomics techniques as well as analysis approaches to investigate brain dynamics in working environments. The selection of papers provides insights into new neuroergonomic research approaches that demonstrate significant advances in brain imaging technologies, which are becoming more and more mobile. Moreover, a strong trend toward new analysis approaches and paradigms investigating real work settings can be seen. Together, this unique collection of research papers provides a comprehensive overview of the latest developments in neuroergonomics.

**Citation:** Gramann, K., Fairclough, S. H., Zander, T. O., Ayaz, H., eds. (2017). Trends in Neuroergonomics: A Comprehensive Overview. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-203-3

# Table of Contents

# **Chapter 0: Introduction**


# **Chapter 1: Electroencephalography (EEG)**

# **Section 1: Electroencephalography (EEG) in Neuroergonomics**

*Mobile Brain/Body Imaging (MoBI) of Physical Interaction with Dynamically Moving Objects*

Evelyn Jungnickel and Klaus Gramann


Andreas Meinel, Sebastián Castaño-Candamil, Janine Reis and Michael Tangermann

*Evaluation of a Dry EEG System for Application of Passive Brain-Computer Interfaces in Autonomous Driving*

Thorsten O. Zander, Lena M. Andreessen, Angela Berg, Maurice Bleuel, Juliane Pawlitzki, Lars Zawallich, Laurens R. Krol and Klaus Gramann

# **Section 2: Brain-Computer Interfaces (BCI) in Neuroergonomics**

*An Intelligent Man-Machine Interface—Multi-Robot Control Adapted for Task Engagement Based on Single-Trial Detectability of P300*

Elsa A. Kirchner, Su K. Kim, Marc Tabie, Hendrik Wöhrle, Michael Maurus and Frank Kirchner


Daniel E. Callan, Cengiz Terzibas, Daniel B. Cassel, Masa-aki Sato and Raja Parasuraman

*Efficient Workload Classification based on Ignored Auditory Probes: A Proof of Concept*

Raphaëlle N. Roy, Stéphane Bonnet, Sylvie Charbonnier and Aurélie Campagne

*Gaussian Process Regression for Predictive But Interpretable Machine Learning Models: An Example of Predicting Mental Workload across Tasks*

Matthew S. Caywood, Daniel M. Roberts, Jeffrey B. Colombe, Hal S. Greenwald and Monica Z. Weiland

*Evaluation of an Adaptive Game that Uses EEG Measures Validated during the Design Process as Inputs to a Biocybernetic Loop*

Kate C. Ewing, Stephen H. Fairclough and Kiel Gilleade

#### **Section 3: EEG in Multimodal Recordings**

*Neural Mechanisms of Inhibitory Response in a Battlefield Scenario: A Simultaneous fMRI-EEG Study*

Li-Wei Ko, Yi-Cheng Shih, Rupesh Kumar Chikara, Ya-Ting Chuang and Erik C. Chang


#### **Chapter 2: Functional Near-Infrared Spectroscopy (fNIRS)**


Cem Seref Bediz, Adile Oniz, Cagdas Guducu, Enise Ural Demirci, Hilmi Ogut, Erkan Gunay, Caner Cetinkaya and Murat Ozgoren

*Prefrontal Cortex Activation Upon a Demanding Virtual Hand-Controlled Task: A New Frontier for Neuroergonomics*

Marika Carrieri, Andrea Petracca, Stefania Lancia, Sara Basso Moro, Sabrina Brigadoi, Matteo Spezialetti, Marco Ferrari, Giuseppe Placidi and Valentina Quaresima

*Into the Wild: Neuroergonomic Differentiation of Hand-Held and Augmented Reality Wearable Displays during Outdoor Navigation with Functional Near Infrared Spectroscopy*

Ryan McKendrick, Raja Parasuraman, Rabia Murtza, Alice Formwalt, Wendy Baccus, Martin Paczynski and Hasan Ayaz

*Processing Functional Near Infrared Spectroscopy Signal with a Kalman Filter to Assess Working Memory during Simulated Flight*

Gautier Durantin, Sébastien Scannella, Thibault Gateau, Arnaud Delorme and Frédéric Dehais

#### **Chapter 3: Stimulation Methods**

*Commentary: Cumulative effects of anodal and priming cathodal tDCS on pegboard test performance and motor cortical excitability*

Pierre Besson, Stephane Perrey, Wei-Peng Teo and Makii Muthalib

*Simultaneous tDCS-fMRI Identifies Resting State Networks Correlated with Visual Search Enhancement*

Daniel E. Callan, Brian Falcone, Atsushi Wada and Raja Parasuraman

*Transcranial Direct Current Stimulation Modulates Neuronal Activity and Learning in Pilot Training*

Jaehoon Choe, Brian A. Coffman, Dylan T. Bergstedt, Matthias D. Ziegler and Matthew E. Phillips

*Does a Combination of Virtual Reality, Neuromodulation and Neuroimaging Provide a Comprehensive Platform for Neurorehabilitation? – A Narrative Review of the Literature*

Wei-Peng Teo, Makii Muthalib, Sami Yamin, Ashlee M. Hendy, Kelly Bramstedt, Eleftheria Kotsopoulos, Stephane Perrey and Hasan Ayaz

*Corrigendum: Does a Combination of Virtual Reality, Neuromodulation and Neuroimaging Provide a Comprehensive Platform for Neurorehabilitation? – A Narrative Review of the Literature*

Wei-Peng Teo, Makii Muthalib, Sami Yamin, Ashlee M. Hendy, Kelly Bramstedt, Eleftheria Kotsopoulos, Stephane Perrey and Hasan Ayaz

#### **Chapter 4: Eye Movement Methods**

*High Working Memory Load Impairs Language Processing during a Simulated Piloting Task: An ERP and Pupillometry Study*

Mickaël Causse, Vsevolod Peysakhovich and Eve F. Fabre

*The impact of expert visual guidance on trainee visual search strategy, visual attention and motor skills*

Daniel R. Leff, David R. C. James, Felipe Orihuela-Espina, Ka-Wai Kwok, Loi Wah Sun, George Mylonas, Thanos Athanasiou, Ara W. Darzi and Guang-Zhong Yang

*The Role of Cognitive and Perceptual Loads in Inattentional Deafness*

Mickaël Causse, Jean-Paul Imbert, Louise Giraudet, Christophe Jouffrais and Sébastien Tremblay

# **Chapter 5: Overview on Neuroscience Methods in Automation Research**

*From Trust in Automation to Decision Neuroscience: Applying Cognitive Neuroscience Methods to Understand and Improve Interaction Decisions Involved in Human Automation Interaction*

Kim Drnec, Amar R. Marathe, Jamie R. Lukos and Jason S. Metcalfe

# Editorial: Trends in Neuroergonomics

Klaus Gramann<sup>1,2</sup>\*, Stephen H. Fairclough<sup>3</sup>, Thorsten O. Zander<sup>1,4</sup> and Hasan Ayaz<sup>5,6,7</sup>

*<sup>1</sup> Department of Psychology and Ergonomics, Berlin Institute of Technology (TU Berlin), Berlin, Germany, <sup>2</sup> Center for Advanced Neurological Engineering, University of California, San Diego, San Diego, CA, USA, <sup>3</sup> School of Natural Sciences and Psychology, Liverpool John Moores University, Liverpool, UK, <sup>4</sup> Team PhyPA, Berlin Institute of Technology (TU Berlin), Berlin, Germany, <sup>5</sup> School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA, <sup>6</sup> Department of Family and Community Health, University of Pennsylvania, Philadelphia, PA, USA, <sup>7</sup> Division of General Pediatrics, Children's Hospital of Philadelphia, Philadelphia, PA, USA*

Keywords: neuroergonomics, human-machine interaction, brain-computer interface, mobile brain/body imaging, physiological computing, EEG/ERP, NIRS/fNIRS, tDCS

**Editorial on the Research Topic**

**Trends in Neuroergonomics**

#### NEW METHODS IN NEUROERGONOMICS

This Research Topic is dedicated to Professor Raja Parasuraman, who unexpectedly passed away on March 22nd, 2015.

Raja Parasuraman's pioneering work led to the emergence of Neuroergonomics as a new scientific field. Neuroergonomics is defined as the study of the human brain in relation to performance at work and everyday settings (Parasuraman, 2003; Parasuraman and Rizzo, 2008). Since the advent of Neuroergonomics, significant progress has been made with respect to methodology and tools for the investigation of the brain and behavior at work. This is especially the case for neuroscientific methods where the availability of ambulatory hardware, wearable sensors, and advanced data analyses allow for imaging of brain dynamics in humans in applied environments.

For neuroergonomics, the application of brain imaging in real-world scenarios is highly desirable as an investigation tool. Traditionally, brain imaging experiments in human factors research tend to avoid active behavior for fear of artifacts contaminating the signal of interest. Here, the development of new data analysis techniques, as well as the combination of different methods providing complementary insights into brain and behavioral dynamics, allows for new insights into human-machine interaction. To overcome the problem of artifactual data in mobile recordings and to allow analyses of brain activity in real working environments, new portable sensors and improved analysis approaches have to be developed. Hence, deployment of portable neuroimaging technologies to real time settings could help assess cognitive and motivational states of personnel assigned to perform critical tasks.

# THE "TRENDS IN NEUROERGONOMICS" RESEARCH TOPIC: A BRIEF INTRODUCTION

The eBook of this Frontiers Research Topic is divided into four sections, defined by the primary research methods used to address a variety of neuroergonomic research questions. The scientific topics range from air traffic control and automation, through mental load detection and the use of brain activity to control a system (brain-computer interfaces, BCI), to physical work, rehabilitation, and training. Across the diverse research areas, the majority of studies in this Research Topic used electroencephalography (EEG), followed by functional near infrared spectroscopy (fNIRS), reflecting the constantly growing use of these methods in neuroergonomics. At the same time, this eBook clearly demonstrates a trend to investigate physical and cognitive activity outside standard laboratory settings, moving neuroergonomics "into the wild." In addition, traditional methods like the measurement of eye movements, pupil metrics, and electrocardiography (ECG), and established imaging approaches like functional magnetic resonance imaging (fMRI), are used in combination with other methods to better understand the physiological responses to cognitive or physical tasks and their coupling to hemodynamic changes. The different sections include original research articles, but also reviews and opinion pieces. Here, we provide short summaries as an orientation for the interested reader.

Edited and reviewed by:

*Mikhail Lebedev, Duke University, USA*

#### \*Correspondence:

*Klaus Gramann klaus.gramann@tu-berlin.de*

Received: *07 February 2017* Accepted: *21 March 2017* Published: *05 April 2017*

#### Citation:

*Gramann K, Fairclough SH, Zander TO and Ayaz H (2017) Editorial: Trends in Neuroergonomics. Front. Hum. Neurosci. 11:165. doi: 10.3389/fnhum.2017.00165*

The first section in this eBook thus comprises studies that used EEG to investigate ergonomic research questions, with three studies using EEG in a mobile setting. The first of these, by Jungnickel and Gramann, uses a mobile brain/body imaging approach (MoBI; Makeig et al., 2009; Gramann et al., 2012, 2014) to investigate the brain and behavioral dynamics of human participants interacting with dynamically moving objects. The results indicate increased activity in parietal regions when active physical behavior, as compared to standard laboratory button press behavior, was required to respond to relevant changes in the environment. The findings point to changes in brain dynamic states dependent on the behavioral state. The study by Wascher et al. demonstrates that mobile EEG allows for a non-obtrusive assessment of mental fatigue in natural working situations. The authors investigate EEG variations time-locked to eye-blinks as a new tool to unobtrusively monitor cognitive processing in real-life environments. Mijović et al. also use EEG in a naturalistic work environment to show that instructed responses can increase attention as reflected in brain dynamics without changes in response parameters. The results point to the possibility of using instructed responses to increase attentional processing without compromising work performance in manual assembly tasks. Again explicitly allowing movement of participants, Meinel et al. demonstrate that EEG can be used to improve motor rehabilitation approaches. Better performance in movement tasks can be achieved by identifying comodulation of different sources in the EEG before a movement is executed. The results provide participant-specific prediction of performance fluctuations that could be used to enhance neuroergonomic and rehabilitation scenarios. The last study, by Zander et al., investigates how well a passive brain-computer interface can work in an autonomous driving scenario using a dry EEG system. The results reveal comfort issues but acceptable usability of the tested EEG system and sufficient signal quality for use in an autonomous driving context.

A number of EEG studies use the recorded brain electrical signals for system interaction through brain-computer interfaces (BCIs, Zander and Kothe, 2011). In this context, Kirchner et al. demonstrate that event-related potentials can be used on a single trial level to infer task-engagement of an operator controlling multiple robots and to adapt the man-machine interface to the individual operator. The results could be used to adapt the task load to operators with different qualifications or capabilities to avoid mental overload. Alonso-Valerdi et al. suggest that a wider variety of control commands in motor imagery-based BCIs might accelerate brain-computer communication, while Callan et al. increase response speed in flight simulation by using a passive BCI based on MEG. The former study provides insights into the use of control commands to increase BCI-based system communication, while the latter study demonstrates the potential to decode motor intention faster than manual control in response to hazardous change in the system interaction cycle. Roy et al. investigate mental workload based on auditory evoked potentials. The authors present a new minimally intrusive paradigm that paves the way to monitor operators' mental state in real-life settings to allow adaptation of the user interface without interfering with the primary task. Caywood et al. increase the interpretability of BCI models by using the approach of Gaussian Process Regression for assessing cognitive workload. Ewing et al. describe the development of an adaptive game system that measures spontaneous EEG activity in real-time in order to adjust the difficulty level of the game. In two studies the concept of a biocybernetic control loop (Fairclough, 2009) is introduced in detail with a particular emphasis on validating EEG measures experimentally prior to their incorporation into an adaptive game system.

Studies using EEG in combination with other methods show that different physiological parameters can lead to an improved understanding of the construct under investigation. Ko et al. demonstrate the advantage of integrating the high temporal resolution of EEG with the high spatial resolution of fMRI in a stop-signal paradigm. Their results from multimodal recordings provide new insights into the complex brain networks underlying inhibitory control in naturalistic environments. Using EEG in combination with ECG and fNIRS, Ahn et al. investigate mental fatigue in drivers. They show that a combination of different physiological measures substantially improves the classification of drivers as sleep-deprived or well-rested. Scheer et al. investigate the demands on mental resources during a closed-loop steering task in simulated car driving scenarios. The results indicate an impact of steering demands on event-related EEG activity for task-irrelevant distractor probes, allowing for an evaluation of mental workload in steering environments.

The second section summarizes the use of fNIRS in traditional stationary settings but also in new mobile applications. In the first of these studies, Von Lühmann et al. describe the development of wireless, low-cost, open source fNIRS hardware with details on system concept, hardware, software and mechanical implementation. The proposed system was tested in a mental arithmetic BCI experiment. Mandrick et al. discuss electrocortical and neurovascular measures with respect to the measurement of mental workload. They propose that EEG and fNIRS are complementary methods in the context of applied testing in the sense that the weakness of each approach, i.e., poor spatial and temporal resolution, respectively, is counteracted by the strength of the other. They argue for a combined fNIRS-EEG approach to index neurovascular coupling during the assessment of mental workload.

In Bediz et al. the effects of supramaximal exercise on cognitive task related oxygenation changes are investigated. Performance in a working memory task before and after exercise indicated higher task related activation changes in prefrontal cortex post-exercise and a higher cognitive task related brain activation increase in high-performing participants. The study by Carrieri et al. investigates the neural correlates of a cognitive/motor task in a virtual reality (VR) environment. The findings support the use of VR in combination with fNIRS as a very good platform for neuroergonomic studies to objectively evaluate cortical hemodynamic activity. McKendrick et al. utilize ultra-portable, wearable and miniaturized fNIRS sensors on participants walking outdoors in the open air. The battery-operated miniaturized system implementation was described by Ayaz et al. (2013). The results of a spatial navigation task indicated greater mental capacity reserves for users with head mounted displays but also unwanted attention capture and cognitive tunneling as indicated by hemodynamic measures. The final contribution in this section, by Durantin et al., provides evidence for using a Kalman filter as a suitable approach for real-time noise removal for fNIRS signals in ecological situations and the development of BCI. The findings from working memory tasks indicate that the Kalman filter increased the performance of the classification of task load levels based on brain signal.
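To make the idea of real-time Kalman filtering of a noisy hemodynamic signal concrete, the following is a minimal sketch only. It assumes a scalar random-walk state model and simulated data; the function name `kalman_filter_1d`, the variance parameters, and the test signal are hypothetical illustrations and are not taken from Durantin et al.

```python
import numpy as np


def kalman_filter_1d(measurements, process_var=1e-3, measurement_var=1e-1):
    """Causal scalar Kalman filter with a random-walk state model.

    process_var and measurement_var are assumed tuning values, not
    parameters reported in any of the studies summarized here.
    """
    estimate = float(measurements[0])
    error_var = 1.0
    filtered = np.empty(len(measurements))
    for i, z in enumerate(measurements):
        # Predict: the random walk keeps the estimate, uncertainty grows.
        error_var += process_var
        # Update: weight the new sample by the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        filtered[i] = estimate
    return filtered


# Simulated example: a slow oscillation standing in for a hemodynamic
# trend, corrupted by white measurement noise (both hypothetical).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 60.0, 600)
true_signal = 0.5 * np.sin(2 * np.pi * t / 30.0)
noisy = true_signal + rng.normal(scale=0.3, size=t.size)
filtered = kalman_filter_1d(noisy)
```

Because each output sample depends only on past and current measurements, a filter of this form can in principle run sample by sample during recording, which is the property that makes it attractive for online use.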

Section three comprises studies using stimulation methods alone or in combination with other methods to investigate stimulation-based improvement of motor or cognitive functions. The introductory commentary by Besson et al. sets the stage for this section by providing a critical commentary on existing transcranial direct current stimulation (tDCS) protocols to enhance neuroplasticity and performance in real-world settings. They argue that priming tDCS protocols have significant potential to improve learning and motor performance. Callan et al. demonstrate the simultaneous use of tDCS and fMRI to investigate the effect of neurostimulation on resting state functional connectivity and behavioral performance. The results reveal greater spontaneous resting state activity for the tDCS group with higher resting state functional connectivity for participants demonstrating performance improvement in a visual search task. Choe et al. investigate the use of tDCS along with multimodal neuroimaging (EEG and fNIRS), demonstrating that tDCS stimulation to the dorsolateral prefrontal or left motor cortex during flight simulation training enhances behavioral performance and changes neurophysiological measurements (EEG and fNIRS), indicating improved skill acquisition consistent with previous studies (Ayaz et al., 2012). The final contribution in this section is the review of Teo et al., providing a discussion of the theoretical framework underlying the use of VR in combination with neuroimaging and neuromodulation as a therapeutic intervention for neurorehabilitation. The authors provide evidence for the use of VR in treating motor and mental disorders such as cerebral palsy, Parkinson's disease, stroke, schizophrenia, anxiety disorders, and other emerging clinical areas.

The fourth and last section in this eBook then covers research approaches using eye movement measures including pupillometric methods and other peripheral physiological methods like ECG. The first study in this section, by Causse et al., uses EEG and pupillometry to investigate the impact of high working memory load on language processing during piloting. The results demonstrate that high working memory load disrupts visual and language processing and reveal a subtle effect of congruency that was observable only at an electrophysiological level. In the study by Leff et al. the authors use a collaborative gaze channel (CGC) to detect and display trainer gaze behavior to trainees in surgery tasks. The results of a simulated robotic surgery task imply liberation of attentional resources with the use of CGC, potentially improving the capability of trainees to attend to additional safety critical events during the procedure. Causse et al. then demonstrate that the pupil diameter correlates with inattentional deafness in an air traffic control task with varying perceptual and cognitive load.

The final contribution to this Research Topic is a review on neuroscientific methods in automation research by Drnec et al. The authors provide a comprehensive overview of neuroscientific methods in trust in automation research and summarize how neuroscience can improve interaction design.

## CONCLUDING REMARKS

Neuroergonomics has undergone remarkable development since the introduction of the field by Raja Parasuraman. This Research Topic demonstrates how different methods can be used to better understand the mind, body, and brain at work and to create and design systems that are better adapted to and make use of the human information processing structures, including the body and the brain. This research is important because it allows for a human-centered design of environments that include natural behaviors.

#### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

#### FUNDING

This Research Topic gathers, among other contributions, papers presented at the 11th Berlin Workshop Human-Machine Systems, 2016, supported by a grant from the German Research Foundation (DFG GR2627/6-1) awarded to KG.

#### REFERENCES

Ayaz, H., Onaral, B., Izzetoglu, K., Shewokis, P. A., McKendrick, R., and Parasuraman, R. (2013). Continuous monitoring of brain dynamics with functional near infrared spectroscopy as a tool for neuroergonomic research: empirical examples and a technological development. Front. Hum. Neurosci. 7, 1–13. doi: 10.3389/fnhum.2013.00871

Ayaz, H., Shewokis, P. A., Bunce, S., Izzetoglu, K., Willems, B., and Onaral, B. (2012). Optical brain monitoring for operator training and mental workload assessment. Neuroimage 59, 36–47. doi: 10.1016/j.neuroimage.2011.06.023


Zander, T. O., and Kothe, C. (2011). Towards passive brain–computer interfaces: applying brain–computer interface technology to human–machine systems in general. J. Neural Eng. 8:025005. doi: 10.1088/1741-2560/8/2/025005

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Gramann, Fairclough, Zander and Ayaz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Mobile Brain/Body Imaging (MoBI) of Physical Interaction with Dynamically Moving Objects

Evelyn Jungnickel<sup>1</sup>\* and Klaus Gramann<sup>1,2</sup>

<sup>1</sup> Department of Psychology and Ergonomics, Biological Psychology and Neuroergonomics, Institute of Psychology and Ergonomics, Berlin Institute of Technology, Berlin, Germany, <sup>2</sup> Center for Advanced Neurological Engineering, University of California, San Diego, CA, USA

The non-invasive recording and analysis of human brain activity during active movements in natural working conditions is a central challenge in Neuroergonomics research. Existing brain imaging approaches do not allow for an investigation of brain dynamics during active behavior because their sensors cannot follow the movement of the signal source. However, movements that require the operator to react fast and to adapt to a dynamically changing environment occur frequently in working environments such as assembly-line work, the construction trade, and health care, but also outside the working environment, for example in team sports. Overcoming the restrictions of existing imaging methods would allow for deeper insights into neurocognitive processes at workplaces that require physical interactions and thus could help to adapt work settings to the user. To investigate the brain dynamics accompanying rapid volatile movements we used a visual oddball paradigm in which participants had to react to color changes either with a simple button press or by physically pointing towards a moving target. A mobile brain/body imaging (MoBI) approach including independent component analysis (ICA) with subsequent back projection of cluster activity allowed for systematically describing the contribution of brain and non-brain sources to the sensor signal. The results demonstrate that visual event-related potentials (ERPs) can be analyzed for simple button presses and physical pointing responses and that it is possible to quantify the contribution of brain processes, muscle activity and eye movements to the signal recorded at the sensor level even for fast volatile arm movements with strong jerks. Using MoBI in naturalistic working environments can thus help to analyze brain dynamics in natural working conditions and help to improve unhealthy or inefficient work settings.

Keywords: mobile brain/body imaging, EEG, embodied cognition, independent component analysis, P300, oddball paradigm, MoBI

#### Edited by:

Lutz Jäncke, University of Zurich, Switzerland

#### Reviewed by:

Edmund Wascher, Leibniz Research Centre for Working Environment and Human Factors, Germany

Irene Sturm, Berlin School of Mind and Brain, Germany

#### \*Correspondence:

Evelyn Jungnickel evelyn.jungnickel@tu-berlin.de

Received: 08 January 2016 Accepted: 03 June 2016 Published: 27 June 2016

#### Citation:

Jungnickel E and Gramann K (2016) Mobile Brain/Body Imaging (MoBI) of Physical Interaction with Dynamically Moving Objects. Front. Hum. Neurosci. 10:306. doi: 10.3389/fnhum.2016.00306

# INTRODUCTION

Studying human brain dynamics accompanying natural cognition (Gramann et al., 2014) works best by studying the brain under naturalistic conditions. The embodied cognition paradigm claims that the body's interactions with the world are an essential root of cognitive processes (Wilson, 2002). Thus it appears that perception and action should both be considered when studying cognitive processes and their neural basis. However, conventional neuroimaging studies consider electrical potentials generated by eye movement or muscle activity during physical movements as artifacts that have to be avoided so as not to contaminate the signal of interest. This view led to experimental setups that restrict participants' mobility and require them to sit or lie still even in tasks that would require standing or moving (Makeig et al., 2009; Gramann et al., 2011, 2014). These constraints change the way information is perceived and processed by the human agent, as becomes obvious, for example, with respect to the integration of proprioceptive and vestibular information (Gramann, 2013). This kind of idiothetic information is absent when movement is restricted, or altered in case the body orientation differs from its natural state for a particular task. Following the embodied cognition approach, those alterations will change the concurring cognitive processes and thereby lead to different brain activity.

Neuroergonomics, as the scientific study of the human brain in relation to performance at work and in everyday settings (Parasuraman, 2003), is faced with the challenge of investigating brain dynamics in environments that require physical interaction of the operator with a system. New insights into brain activity during physical human-machine interaction allow for the improvement of systems to adapt to the operators' physical and cognitive resources (see e.g., Wascher et al., 2016; Mijovic et al., 2016). However, traditional brain imaging approaches do not allow for any kind of movement (Makeig et al., 2009; Gramann et al., 2011). Mobile brain/body imaging (MoBI), in contrast, is a general research approach that embraces a variety of (the best fitting) hardware and software solutions to record and analyze brain dynamics in actively behaving participants. Lightweight and mobile sensors for electroencephalography (EEG) or functional near-infrared spectroscopy (fNIRS) are compatible with experimental paradigms using a MoBI approach to study the brain and body dynamics that accompany natural cognition and behaviors including physical interaction with an environment (Mehta and Parasuraman, 2013; Gramann et al., 2014). While fNIRS provides relatively high spatial resolution of a restricted cortical surface, this method lacks the high temporal resolution that is desirable when investigating fast cognitive processes. EEG provides the necessary temporal resolution but has only limited spatial resolution. However, recent investigations using MoBI have demonstrated that equivalent dipole reconstruction of independent components (ICs) as decomposed by independent component analysis (ICA) allows for reconstructing the origin of EEG activity with reasonable spatial accuracy (Gramann et al., 2010a; Acar and Makeig, 2013).
In conclusion, mobile EEG allows for an investigation of cognitive processes in working environments with high temporal resolution and with sufficient spatial resolution to allow for conclusions regarding the underlying cortical sources and their neuroanatomical function. Such a MoBI approach no longer considers eye movements and muscle activity as artifacts but as aspects of cognitive activity associated with the accomplishment of a task (Gramann et al., 2010a). By using high-density EEG recordings synchronized with motion tracking of participants' movements and data-driven analysis methods, it overcomes existing imaging restrictions and enables participants to behave more naturally (Makeig et al., 2009; Gramann et al., 2011, 2014).

The first MoBI studies investigated participants walking and running on a treadmill and clearly demonstrated that brain activity can be analyzed under such conditions (Gramann et al., 2010a; Gwin et al., 2010, 2011). However, walking is a highly symmetric recurrent behavior that does not include fast movements associated with jerk. Stereotyped movements like walking further allow for extracting templates of artifacts based on recurrent movement patterns (Gwin et al., 2010). It is important to investigate to what extent MoBI can be used to measure and analyze brain dynamics during non-stereotyped and aperiodic behaviors that include sudden orientation movements or manual interaction with dynamic systems. First, such an approach could be used to determine how much traditional brain imaging results obtained while restricting participants' movements deviate from results in actively moving, more naturally behaving participants. Secondly, if feasible, such an approach would significantly increase the number of conceivable neurocognitive studies, especially in the fields of physical ergonomics and human-machine interaction that require physical manipulation. Insights gained from MoBI studies comprising natural recurrent and non-stereotyped movements would thus open up new vistas for investigating cognition and action within the field of Neuroergonomics and beyond.

This study investigated the feasibility of MoBI during physical interaction with a dynamic system based on non-stereotyped fast movements. The setup mimicked real-world working environments that require physical interaction in a dynamically changing system. Dynamic changes in the system were simulated using a three-stimulus visual oddball paradigm (Grillon et al., 1990) with participants reacting either by simple button presses or by pointing at the moving stimulus. We examined whether it is possible to record and analyze an event-related P3 component during rapid pointing movements that include strong eye movement and neck muscle activities. To this end we compared event-related potentials (ERPs) at the sensor level with ERPs back projected from ICs that decomposed the sensor data into maximally statistically independent source time series using ICA. Separating brain processes from activity generated by muscles and eye movement and comparing these to the scalp-recorded potential allowed for a direct comparison and evaluation of the feasibility of standard sensor-based analysis approaches during active pointing movements. In addition, isolation of brain-related activity patterns and their contribution to the surface signal allowed for a quantification of how much certain ICs representing brain processes contributed to the surface signal.

# MATERIALS AND METHODS

## Participants

Data was collected from 15 healthy right-handed adult volunteers (7 females, 8 males) with a mean age of 26.1 years (σ = 2.9). All participants had normal or corrected-to-normal vision, none reported a history of neurological disease and all provided written informed consent before the experiment in compliance with the standards as defined in the Declaration of Helsinki. The study was approved by the local ethics committee of the Institute of Psychology and Ergonomics of the Berlin Institute of Technology according to the guidelines of the German Psychological Society. Volunteers were compensated 12 €/h for their participation. Due to technical issues the behavioral data of three participants had to be excluded from further analysis and all results reported are based on the final group of 12 participants.

## Experimental Design and Procedure

Participants stood in front of a projection screen (W × H: 1.2 m × 1.0 m) with a light gray background placed one arm length in front of them (**Figure 1**). Participants had to attend to a three-stimulus visual oddball paradigm and were asked to react to color changes of a moving sphere by either pointing to the stimulus with their right index finger (physical pointing condition) or pressing a response button (button press condition) on a Bluetooth remote (Logitech wireless presenter R400, Logitech, Apples, Switzerland). The response conditions were blocked and block order was counterbalanced across participants. Each response condition consisted of five blocks with 50 trials each. Breaks between blocks within each response condition were adapted to the participants' needs.

Every trial began with a black sphere (ø 14 cm) moving from the middle of the screen in a randomly chosen direction and being reflected from the borders of the projection screen. Color changes took place uniformly randomized between 1 and 5 s after onset of a trial. A change from black to blue indicated a target stimulus (15%), a change to green indicated a distractor stimulus (15%), and a change to yellow indicated a standard stimulus (70%). Participants were instructed to react as quickly and accurately as possible to the onset of the target color. After a response, or after 4 s in case no response was given, the sphere stopped moving and remained on the screen for 500 ms. Thus, the trial duration for correct non-target trials ranged from 5.5 to 9.5 s with an average duration of 7.5 s. For target trials the mean trial duration was shorter because button presses or pointing movements were executed before the 4 s time window closed. Thus, the duration of target trials depended on the response onset, movement speed and movement path. Altogether the experiment lasted about 1 h.
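The timing constraints above can be illustrated with a short sketch. It is not the authors' stimulus code: the function names, the dictionary layout and the independent per-trial draw of the stimulus type are assumptions made for illustration (independent draws only approximate the 15/15/70 split within a single block).

```python
import random

# Stimulus probabilities as reported: target 15%, distractor 15%, standard 70%.
STIMULUS_PROBS = {"target": 0.15, "distractor": 0.15, "standard": 0.70}
CHANGE_MIN_S, CHANGE_MAX_S = 1.0, 5.0  # color change latency after trial onset
RESPONSE_WINDOW_S = 4.0                # time until the sphere stops without a response
FREEZE_S = 0.5                         # sphere remains on screen after stopping


def make_block(n_trials=50, seed=None):
    """Draw one block of trials (hypothetical helper, not the original protocol)."""
    rng = random.Random(seed)
    kinds = list(STIMULUS_PROBS)
    weights = list(STIMULUS_PROBS.values())
    block = []
    for _ in range(n_trials):
        kind = rng.choices(kinds, weights=weights)[0]
        change_at = rng.uniform(CHANGE_MIN_S, CHANGE_MAX_S)
        block.append({"stimulus": kind, "change_at_s": change_at})
    return block


def no_response_duration(trial):
    """Trial duration when no response is given (the correct case for non-targets)."""
    return trial["change_at_s"] + RESPONSE_WINDOW_S + FREEZE_S
```

With a color change between 1 and 5 s, the no-response duration is bounded by 1 + 4 + 0.5 = 5.5 s and 5 + 4 + 0.5 = 9.5 s, and the midpoint of the uniform distribution (3 s) gives the reported average of 7.5 s.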

#### EEG Recording

The EEG was recorded from 156 active electrodes referenced to Cz with a sampling rate of 500 Hz and band-pass filtered from 0.016 Hz to 250 Hz (BrainAmps and Move System, Brain Products, Gilching, Germany). To allow for recording of neck muscle activity resulting from participants' head movements, 28 electrodes were placed around the neck using a custom neck band (EASYCAP, Herrsching, Germany). The remaining 128 electrodes were placed on the head using an elastic cap with a custom design (EASYCAP, Herrsching, Germany). Electrode impedances were brought below 7 kΩ. Due to a technical problem the neck EEG data of one participant was not recorded. Individual electrode locations were recorded using an optical tracking system (Polaris Vicra, NDI, Waterloo, ON, Canada).

## Motion Capture Recordings

Motion was captured using six cameras tracking the position of 16 red active LEDs (Impulse X2 System, PhaseSpace Inc., San Leandro, CA, USA) placed on the shoulders, the chest, and the right arm as well as the right index finger of the participants. The motion tracking system generated a data stream containing x, y, and z location and a reliability value for each LED with a sampling rate of 480 Hz. Before each data acquisition the screen position and orientation were calibrated to align with the motion capture coordinate system.

All data streams, namely EEG, motion capture, events from the experimental protocol, and behavioral data, were synchronized and recorded using the Lab Streaming Layer Software (Kothe, 2014).

#### Behavioral Analysis

In the physical pointing condition, online tracking of the LED on the participants' right index finger allowed sphere movement to be stopped as soon as the distance between the LED and the projection screen was smaller than 10 cm (labeled "hot zone" in **Figure 1**). This information was also used to create corresponding event markers. The LED was placed 5 cm from the fingertip, approximately over the proximal phalanx of the index finger. The distance of 10 cm was chosen to avoid damage to the setup due to impact of the participants' finger with the screen. Because of occlusions the position of the finger LED was not recorded correctly in some trials and event markers were generated that did not match the movement profile of the participant. For the statistical analyses only trials with consistent event markers and motion tracking data of the right index finger were considered. This led to the exclusion of about 34.4% of the trials per participant in the physical pointing condition (range: 5.2–69.9%, σ = 21.7%) with the highest percentage of removals in standard (x̄ = 38.0%) and distractor trials (x̄ = 38.3%) that required no response. In these cases event markers indicated a movement even though in most cases the velocity profile did not indicate a response. In case of target trials on average only 13.1% were rejected.

To calculate velocity profiles from the motion capture data the MATLAB toolbox MoBILAB (Ojeda, 2011) was used. Occluded samples for each LED were interpolated using spline interpolation and the data stream was smoothed by applying a 6 Hz low-pass zero-phase FIR filter before computing the velocity data. Subsequently the velocity profiles in the z-dimension of the LED placed on the index finger were analyzed with custom MATLAB scripts detecting pointing movements in the physical pointing condition on the basis of velocity peaks. To identify response movements, only the z-axis of the motion capture data was used, indicating motions towards or away from the screen. This excluded smaller movements not related to the response. Based on velocity peaks, defined as maximum positive deflections preceded and followed by lower values, the onset and offset of the corresponding movement were defined. For each color change the time window from 200 to 1800 ms after stimulus onset was selected to exclude movements unrelated to the stimulus response. Based on visual inspection, only peaks with a velocity of at least 22% of the participant's maximum finger velocity in the physical pointing condition were considered. This excluded smaller jerks and other movements not related to the pointing behavior. The definition of the movement onset is important in this context since its time lag to the color change was taken as response time and used for further statistical analysis. The earliest movement onset was defined as the time point with a velocity of 5% of the subsequent peak velocity. To allow for a more conservative comparison of response times in the physical pointing condition with response times in the button press condition, increasing percentage values (>5%) of the subsequent maximum peak velocity were analyzed.
The resulting movement onset distributions were then compared to response time distributions in the button press condition where no velocity profiles or force time-series could be derived. Response time statistics were calculated by means of a one-way analysis of variance (ANOVA) with subsequent correction for multiple comparisons using honestly significant difference (HSD) contrasts (Tukey, 1949).
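The peak-based onset definition described above can be sketched as follows. This is a minimal illustration in Python and not the authors' MATLAB scripts: the function name and arguments are hypothetical, the velocity trace is assumed to be already interpolated and low-pass filtered, and taking the first qualifying peak in the window is an assumption.

```python
def detect_movement_onset(vz, fs, stim_idx, vmax,
                          peak_fraction=0.22, onset_fraction=0.05,
                          window_s=(0.2, 1.8)):
    """Return (onset_idx, peak_idx), or None if no qualifying peak is found.

    vz       : z-velocity samples (positive = towards the screen), already smoothed
    fs       : sampling rate in Hz
    stim_idx : sample index of the color change
    vmax     : the participant's maximum finger velocity in the pointing condition
    """
    start = stim_idx + int(round(window_s[0] * fs))
    stop = min(stim_idx + int(round(window_s[1] * fs)), len(vz) - 1)
    for i in range(max(start, 1), stop):
        # A peak is a positive deflection preceded and followed by lower values.
        is_peak = vz[i] > vz[i - 1] and vz[i] >= vz[i + 1]
        if is_peak and vz[i] >= peak_fraction * vmax:
            threshold = onset_fraction * vz[i]
            onset = i
            # Walk backwards from the peak while velocity stays at or above threshold.
            while onset > 0 and vz[onset - 1] >= threshold:
                onset -= 1
            return onset, i
    return None
```

The response time then follows as `(onset_idx - stim_idx) / fs`. Raising `onset_fraction` above 0.05 corresponds to the more conservative onset definitions mentioned in the text.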

## EEG Analysis

#### EEG Data Preprocessing

Data analysis was done with custom MATLAB scripts based on the open source EEGLAB toolbox<sup>1</sup> (Delorme and Makeig, 2004). **Figure 2** shows a flow chart explaining the whole data processing pipeline. The data was filtered using a high-pass filter (1 Hz) and a low-pass filter (120 Hz) and subsequently downsampled to 250 Hz. Single channels and time periods containing artifacts were removed by visual inspection of the data. Eye movements were not considered as artifacts. Artifact rejection was performed with an EEGLAB function automatically removing channels in case they contained zero activity for more than 5 s or revealed a correlation value below 0.6 with neighboring channels, and removing time windows containing more than 30% noisy channels. On average, 132 EEG channels remained for further analyses (range: 114–142; σ = 8.1).

In a next step the data was re-referenced to an average reference and then parsed into maximally temporally independent and spatially fixed components (ICs; Makeig et al., 1996) using an adaptive ICA mixture model algorithm (AMICA; Palmer et al., 2006, 2008), which is a generalization of earlier ICA approaches such as infomax (Bell and Sejnowski, 1995; Lee et al., 1999a) and the multiple mixture approach (Lee et al., 1999b; Lewicki and Sejnowski, 2000). After the first iteration the model was trained for 10 iterations rejecting time windows with a likelihood below 4 standard deviations (SDs). For the remaining parameters the default settings were used (Palmer, 2016).

<sup>1</sup>http://www.sccn.ucsd.edu/eeglab

Frontiers in Human Neuroscience | www.frontiersin.org June 2016 | Volume 10 | Article 306 |

For each IC an equivalent dipole model was computed using a boundary element head model (BEM) based on the MNI brain (Montreal Neurological Institute, MNI, Montreal, QC, Canada) as implemented by DIPFIT routines (Oostenveld and Oostendorp, 2002). To this end corresponding landmarks (nasion, inion, vertex and ears) were aligned by rotating and rescaling each individually measured electrode montage. The use of an average head model decreases the accuracy of source localization and thus we refer to the approximation of the spatial origin of surface activity using the description "in or near" a specific structure. ICs primarily accounting for brain, eye or neck muscle activity were selected for further analysis based on their time courses, spectra, and scalp topographies as well as the location and residual variance of their corresponding dipoles. Dipoles placed outside of the head model were not further considered. This resulted in 594 remaining ICs for all participants with an average of 49.5 ICs per subject (range: 31–92, σ = 17.3, Σ = 594). The weights and spheres returned from the AMICA decomposition were copied to the downsampled, high- and low-pass filtered continuous EEG data excluding the same channels that were excluded for ICA decomposition. Missing channels were interpolated.

#### EEG Group Level Analyses

The continuous data was epoched into 3 s long epochs with onset of a color change including a 1 s pre-stimulus baseline. Only epochs with correct responses were included in the study. Artifactual epochs containing fluctuations above 1000 µV or data values outside of 5 SDs on the sensor level were rejected in an iterative fashion keeping at least 95% of the total trial numbers per iteration. The remaining epochs (x̄ = 370.5 per participant, σ = 53.1) were subsequently combined into a study. The study comprised a 2 (response condition) × 3 (stimulus type) factorial design providing main effects for the two independent variables as well as their interaction.
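As a rough illustration of the epoching step, the sketch below cuts 3 s epochs (1 s before to 2 s after each color change) from a single-channel signal sampled at 250 Hz and applies only the absolute amplitude criterion. The function names are hypothetical, subtracting the mean of the pre-stimulus interval is an assumption, and the 5 SD criterion as well as the iterative scheme that retains at least 95% of trials are deliberately left out.

```python
def extract_epochs(signal, event_samples, fs=250, pre_s=1.0, post_s=2.0):
    """Cut baseline-corrected epochs around each event (single channel)."""
    n_pre = int(round(pre_s * fs))
    n_post = int(round(post_s * fs))
    epochs = []
    for ev in event_samples:
        if ev - n_pre < 0 or ev + n_post > len(signal):
            continue  # skip events too close to the edges of the recording
        segment = signal[ev - n_pre: ev + n_post]
        # Assumption: the pre-stimulus mean is removed from the whole epoch.
        baseline = sum(segment[:n_pre]) / n_pre
        epochs.append([x - baseline for x in segment])
    return epochs


def reject_by_amplitude(epochs, limit_uv=1000.0):
    """Keep only epochs whose absolute amplitude never exceeds the limit (in µV)."""
    return [ep for ep in epochs if max(abs(x) for x in ep) <= limit_uv]
```

At 250 Hz each epoch contains 750 samples, of which the first 250 form the pre-stimulus baseline.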

Distances between all ICs were calculated with the weighted measures of ERP, power spectrum (for a frequency range of 3–75 Hz), event-related spectral perturbations (ERSPs), intertrial coherences (ITCs), the components' scalp maps and their equivalent dipole model locations using the EEGLAB preclustering function. For all measures (except dipole location with only three dimensions) a principal component analysis (PCA) reduced the dimensionality to the first 10 principal components. The resulting measures were normalized, weighted and combined into cluster position vectors. Dipole locations were weighted by a factor of 25 to promote spatially tight clusters and to compensate for their low dimensionality. ERSPs were weighted with a factor of 10 as they were assumed to express the most relevant time-varying information regarding the task. All other measures were weighted with the standard weighting of 1. Subsequently a PCA restricted the resulting cluster position vectors to a 10-dimensional subspace.

Clustering was done via a K-means algorithm implemented in EEGLAB with the number of clusters set to 36. By default, ICs with a distance of more than 3 SDs to the mean of any cluster centroid in joint measure space were assigned to an outlier cluster. The same was done manually for ICs if a cluster contained more than one IC of a participant relying on the same measures as for the calculation of the cluster position vectors. The residual variance of the equivalent dipole models of the remaining ICs was about 10.5% for all ICs representing brain processes (range: 1.3–47.7%, σ = 7.4%) and about 23.7% for all other ICs (range: 2.8–69.1%, σ = 14.3%). Overall, 302 ICs were assigned to the outlier cluster and 292 ICs were assigned to the other clusters (range: 20–30, x¯ = 24.3, σ = 2.6 ICs per participant). Of those 292 ICs, 106 ICs revealed equivalent dipole locations within the gray matter of the head model (range: 7–11, x¯ = 8.8, σ = 1.4 ICs per participant).

# RESULTS

## Behavioral Data

An exemplary velocity profile for one physical pointing response with corresponding events derived from the velocity profile and the system generated markers is displayed in **Figure 3** illustrating a typical pointing movement. In most cases, movements towards the screen were faster than the subsequent backward movements to the initial position.

Response times were significantly faster in the physical pointing condition (x̄ = 383.1 ms, σ = 40.7 ms) than in the button press condition (x̄ = 515.8 ms, σ = 52.9 ms) when response onsets in the physical pointing condition were defined as starting at 5% of the subsequent peak velocity (p < 0.001). The means for each condition and participant are shown in **Figure 4**. Significant differences in response onsets between the two response conditions were observed up to a threshold of 53% of the subsequent peak velocity (p < 0.05; x̄ = 474.3 ms, σ = 41.5 ms).

FIGURE 3 | Pointing movement velocity profile as a function of time with corresponding markers. The y-axis displays the z-component of the velocity in m/s with positive values corresponding to motion towards the screen. The blue vertical line indicates a color change of the moving sphere to the target color. The green and magenta vertical lines indicate the movement onset and offset, respectively. The red vertical line indicates the velocity peak. The black vertical line indicates a distance between LED and projection screen below 10 cm.

Response accuracies were very high with an average of only 0.24% and 7.99% false alarms to color changes indicating a standard stimulus in the button press and physical pointing condition, respectively. Incorrect responses to distractors revealed comparable tendencies with 1.13% and 7.47% false alarms for button presses and physical pointing responses, respectively. In cases of color changes indicating a target stimulus only 0.20% misses were observed for the button press condition and no incorrect responses at all (0%) in the physical pointing condition. While for both standard and distractor stimuli more incorrect responses were observed in the physical pointing condition, target stimuli were associated with fewer incorrect responses when participants had to point at the moving object. However, only 3 out of 12 participants committed errors in the physical pointing condition while eight participants committed errors in the button press condition. Due to the absence of incorrect responses in the majority of participants no further statistical analysis was conducted. **Table 1** displays means and standard deviations of response errors in all conditions.

TABLE 1 | Means and standard deviations of response errors for all conditions.


#### EEG Data

Rapid volatile pointing movements were associated with increasing artifactual activity stemming from both physiological and mechanical sources. To correct for artifactual activity, the EEG signal was cleaned in the time and channel domain (see "Materials and Methods" Section). Cleaning in the channel domain revealed a specific topography for channels with a high probability to be removed. **Figure 5** displays the probability for each channel to be included in the analysis plotted with respect to its scalp position.

Channels were most likely to be removed in five different regions of the montage with the highest likelihood of removal for channels located at the left and right posterio-inferior locations in the montage. One position over the midline located near Cz and two lateralized areas around FT7 and TP8 also showed a high likelihood of channel removal. On average, a subset of 24 channels was removed from the montage before further data analyses (range: 14–42).

## Event-Related Potentials on the Sensor Level

Changes in the color of the moving sphere were associated with ERPs including a late positive complex at parietal sensors in the time range of the P3. **Figure 6** displays ERPs with onset of color changes indicating standard, distractor, and target stimuli for the button press and the physical pointing condition for the electrode closest to the parieto-central electrode of the international 10–20 system (referred to as Pz' in the following). To investigate differences in the P3 component measured at the scalp, mean amplitudes in the time range from 400 to 800 ms after a color change were submitted to a 2 (response condition) × 3 (stimulus type) repeated measures ANOVA.
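The mean-amplitude measure can be sketched in a few lines. The function name and defaults are illustrative assumptions: the ERP is taken to be a single-channel average sampled at 250 Hz whose first sample lies 1 s before the color change, matching the epoch layout described in the methods.

```python
def mean_amplitude(erp, fs=250, epoch_start_s=-1.0, window_s=(0.4, 0.8)):
    """Mean ERP amplitude in a post-stimulus window (default 400-800 ms)."""
    # Convert latencies relative to stimulus onset into sample indices.
    first = int(round((window_s[0] - epoch_start_s) * fs))
    last = int(round((window_s[1] - epoch_start_s) * fs))
    samples = erp[first:last]
    return sum(samples) / len(samples)
```

With these defaults the window covers samples 350 to 449 of a 750-sample epoch. One such value per participant, response condition and stimulus type would form the input to the 2 × 3 repeated measures ANOVA.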

Greenhouse-Geisser corrected p-values are reported in case of non-sphericity. The results revealed a significant main effect of stimulus type (F(2,22) = 8.58, p = 0.010, η<sup>2</sup> = 0.343) and a tendency for the response condition (F(1,11) = 3.32, p = 0.084, η<sup>2</sup> = 0.247) but no interaction effect (F(2,22) = 2.99, p = 0.208, η<sup>2</sup> = 0.133). Post hoc HSD contrasts (Tukey, 1949) revealed that the P3 amplitude for targets in the pointing condition was significantly higher than for standards in both response conditions (all ps < 0.009) as well as distractors in the button press condition (p = 0.02). Comparing P3 amplitudes for targets and distractors in the pointing condition revealed only a trend towards significance (p = 0.09) and there was no significant difference between targets in the physical pointing and the button press condition (p = 0.14). There were no significant differences between any of the stimuli in the button press condition (all ps > 0.70).

While both response conditions were associated with increased P3 amplitudes for targets as compared to standard stimuli and distractors, the physical pointing condition demonstrated stronger amplitude increases in the time range of the P3 as compared to the button press condition. The stronger effect in the pointing condition could have been caused by increased processing demands or a generally higher alertness in a condition that required fast responses to a dynamically moving target. However, because the P3 component was located in a time window that also comprised participants' pointing responses, increased P3 amplitudes might have been confounded with non-brain related processes. The rather strong jerks of the rapid pointing movements could have added mechanical artifacts induced by the movement. In addition, physical pointing at a moving target required constant coordination of eye, head, and arm movements that, due to volume conduction of the corneoretinal potential and neck muscle activity, likely contributed to the P3 component at the sensor level. To further investigate to what extent signals from brain and non-brain sources like eye movements or muscle activity contributed to the sensor signal, the corresponding independent component processes were analyzed.

#### Contributions of Brain, Neck Muscle and Eye Movement Activity Related ICs

Clustering of ICs resulted in 26 clusters with cluster centroids located in the gray matter of the brain model or in regions of the model indicating eye movement or neck muscle activity. **Figure 7** displays clusters of IC processes (smaller spheres) and their respective cluster centroids (larger spheres) reflecting brain dynamics, eye movement and muscle activity.


TABLE 2 | Variance in µV<sup>2</sup> in the −200 to 1000 ms time range for the total data and separately for clusters of eye movement, neck muscle, and brain activity.

Brackets show the corresponding pvaf in %. Columns are displaying values separately for the cluster combinations and rows display values to standard, distractor and target stimuli in the physical pointing and button press condition.

Back projection of event-related activity originating from different clusters to the sensors allowed for quantifying the contribution of brain and non-brain sources to the sensor P3 component. ERPs of clusters with a centroid located in the gray matter of the brain as well as clusters representing eye movement and neck muscle activity were selected for back projection. The absolute variance and the percent variance accounted for (pvaf) with respect to the P3 envelope were computed for all clusters for the time interval between 200 ms before stimulus onset and 1000 ms post stimulus. The pvaf of a specific cluster is defined as 1 − R (expressed in percent), where R is the quotient of the absolute variance of the remaining clusters (after excluding the considered one) and the absolute variance of all clusters. The pvaf has an upper bound of 100% but can be negative if the cluster's projection to the scalp electrode cancels the projected signal of another cluster. This can happen in case ICs are spatially non-orthogonal. Pvaf values were used to estimate the relative share of certain clusters within one condition. Absolute variances, in contrast, allowed for comparing the contributions of one or more clusters to the sensor level in different response conditions where relative values could be misleading due to differences in overall absolute activity.
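The pvaf definition can be made concrete with a small sketch. It assumes, for illustration only, that each cluster's back-projected ERP at a single sensor is available as a plain list of samples and that the summed signal of all clusters serves as the reference; the function names and the toy values are hypothetical and unrelated to the reported data.

```python
def variance(x):
    """Population variance of a sequence of samples."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def pvaf(cluster_projections, name):
    """Percent variance accounted for by one cluster: 100 * (1 - R).

    cluster_projections maps a cluster name to its back-projected time series
    at one sensor; R is var(all clusters minus this one) / var(all clusters).
    """
    n = len(cluster_projections[name])
    total = [sum(p[t] for p in cluster_projections.values()) for t in range(n)]
    remaining = [total[t] - cluster_projections[name][t] for t in range(n)]
    return 100.0 * (1.0 - variance(remaining) / variance(total))


# Toy example: two projections of opposite polarity partially cancel each other.
toy = {"a": [1.0, -1.0, 1.0, -1.0], "b": [-0.5, 0.5, -0.5, 0.5]}
pvaf_a = pvaf(toy, "a")  # removing "a" leaves less variance than the total
pvaf_b = pvaf(toy, "b")  # removing "b" leaves more variance than the total
```

In the toy example `pvaf_b` is negative because removing cluster "b" increases the variance of the remaining signal, which is the cancellation case described in the text. It also shows that pvaf values of different clusters need not sum to 100%.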

**Table 2** shows the resulting absolute variances and pvafs. Here, the total variance refers to all 36 clusters resulting from the clustering, while neck variance refers to 12 clusters indicating neck muscle activity, eye variance refers to two clusters contributing to horizontal and vertical eye movements, and brain variance to 12 clusters located to the gray matter of the brain. **Figure 8** displays in gray the back-projected summed sensor signal envelope based on all brain, eye, and neck muscle clusters and in red from left to right the contribution of clusters accounting for eye movements, neck muscle activity, and brain activity, respectively.

# Relative Contribution of Clusters to the Envelope

#### Button Press Condition

The relative contributions to the ERP envelope for standard stimuli in the button press condition were high for eye movement activity, only marginal for neck muscle activity, and low for brain processes (eye: 87.9%, neck: −0.2%, brain: 3.8%). A decreasing contribution of eye movement activity and increasing contributions of brain processes were observed for distractor stimuli (eye: 83.3%, neck: −2.7%, brain: 8.2%) and target stimuli (eye: 63.9%, neck: 3.6%, brain: 34.3%).

#### Physical Pointing Condition

The contributions to the envelope of the ERP for standard stimuli in the pointing condition (eye: 82.4%, neck: 3.6%, brain: 7.4%) were similar to those in the button press condition, with slightly stronger contributions of neck muscle activity and brain processes. This trend grew stronger for distractor stimuli (eye: 38.2%, neck: 12.0%, brain: 13.8%), with a pronounced drop in eye movement contribution. For targets, neck muscle activity exceeded all other processes considerably (eye: 15.3%, neck: 56.6%, brain: 10.6%).

# Absolute Contribution of Clusters to the Envelope

#### Button Press Condition

In the button press condition the absolute variance of all non-brain and brain processes was relatively stable for standard (3.45 µV²), distractor (3.04 µV²) and target stimuli (3.17 µV²). The absolute contribution of clusters representing eye movements revealed 2.12 µV² for standards, 2.71 µV² for distractors, and 1.62 µV² absolute variance for targets. For clusters with the equivalent dipole model of the cluster centroid located in or near regions of the head model indicative of neck muscles the absolute variance increased from standard (0.01 µV²) to distractor (0.04 µV²) and target stimuli (0.05 µV²). The same trend was observed for clusters representing brain activity contributing 0.07, 0.14, and 0.43 µV² absolute variance for standard, distractor, and target stimuli, respectively.

#### Physical Pointing Condition

In the physical pointing condition the absolute variance of all non-brain and brain processes strongly increased from standard (2.39 µV²) and distractor (4.46 µV²) to target stimuli (19.32 µV²). The absolute contribution of clusters representing eye movements revealed lower values compared to the button press condition explaining 1.58 µV² for standards, 1.26 µV² for distractors, and 1.98 µV² for targets. The absolute variance for clusters representing neck muscle activity increased from standard (0.03 µV²) to distractor (0.56 µV²) and target stimuli (6.09 µV²). A comparable pattern was observed for brain activity with the lowest absolute variance for standard (0.17 µV²) and distractor stimuli (0.53 µV²), followed by target stimuli (1.33 µV²).

column by 12 clusters comprising on average 8.8 ICs (range: 6–11, σ = 1.5) from 12 participants. Cluster locations are projected onto the standard MNI brain volume and displayed in sagittal, horizontal, and coronal views. (B) Red: ERP contributions of clusters representing eye movement (left), neck muscle (middle), and brain (right) activity. Light gray: ERP envelopes of all 36 back-projected clusters. The dark gray area displays the latency range of the P3 component from 400–800 ms after a color change. The left and right columns display envelopes for the button press and the physical pointing condition, respectively, with rows displaying from top to bottom the different stimuli (standard, distractor and target).

In summary, the absolute variance and the increase in absolute variance for clusters representing brain and neck muscle activity were more pronounced in the physical pointing condition, with clusters representing neck muscle activity explaining by far the highest amount of the sensor envelope for target stimuli. In contrast, eye movement contributions were lower for standard and distractor stimuli in the physical pointing condition.

Compared to neck muscle and eye movement activity, the contribution of brain processes to the surface potential was relatively small in both response conditions, demonstrating a prominent role of non-brain sources for sensor-based ERP analyses during active movements of the head and upper torso.

To further investigate the brain dynamics accompanying target processing in the physical pointing as compared to the button press condition, all non-brain clusters were excluded and only brain-related activity was back projected to the sensor level.

# Relative Contributions of Brain Activity to the Sensor Event-Related Potential

Examining the grand average ERP from back-projecting all clusters representing brain activity revealed which clusters contributed most to the sensor level variance in the time window of the P3 component of the ERP. **Table 3** displays the explained absolute and relative (pvaf) variance for the parietal and anterior cingulate cortex (ACC) clusters for each condition in the 400–800 ms time window. For pvafs and absolute variances of all brain clusters, see Supplementary Table 1.

TABLE 3 | Variance in µV² in the 400–800 ms time range for all clusters contributing to brain activity and separately for the parietal and ACC clusters.

Brackets show the corresponding pvaf in %. Columns display values separately for the cluster combinations and rows display values for standard, distractor and target stimuli in the physical pointing and button press condition.

The absolute variance of the sensor ERP explained by brain processes increased from standard (0.14 µV²) to distractor (0.25 µV²), and target stimuli (0.87 µV²) in the button press condition. The same trend was observed for the physical pointing condition with lowest absolute variance for standards (0.32 µV²) and distractor stimuli (0.43 µV²), followed by target stimuli (3.11 µV²). The amount of variance explained and the increase in explained variance were stronger in the physical pointing condition.

The general pattern observed for the contribution of all brain clusters was also observed for the backprojection of a subset of clusters with their centroids located in or near the anterior cingulate and parietal cortex. Three clusters (Cls 5, 21, and 24) representing brain activity in or near the ACC explained lower absolute variance for standard (0.098 µV²) and distractor stimuli (0.094 µV²) than for target stimuli (0.486 µV²) in the physical pointing condition. A different contribution was observed in the button press condition with absolute variance increasing from standards (0.055 µV²) to distractors (0.076 µV²), and targets (0.084 µV²). Because of the general increase in absolute variance in the target condition the relative contribution of the ACC clusters was considerably more pronounced for standard and distractor stimuli than for the target related P3 (see **Figure 9**). The relative contribution of the ACC clusters for standard stimuli was 68.6% and 72.5%, for distractor stimuli 39.5% and 58.8% and for target stimuli 0.3% and 1.5% in the physical pointing and the button press condition, respectively.

Parietal clusters explained increasing variance with the lowest contribution for standard stimuli (0.004 µV²), followed by distractor (0.010 µV²) and target stimuli (0.130 µV²) in the button press condition. This increase from standard to target was also observed for the physical pointing condition with 0.004, 0.020, and 0.954 µV² for standard, distractor and target stimuli, respectively. With 55.4% in the physical pointing condition and 38.2% in the button press condition the two parietal clusters contributed the most to the P3 signal for target stimuli. The right panel of **Figure 9** displays two clusters located in or near the parietal lobe and their summed backprojected ERP activity relative to the envelope of all ICs representing brain activity.

parietal cortex, respectively; light gray: ERP envelope computed by back-projecting all clusters located in the gray matter of the brain model. The dark gray area displays the latency range of the P3 component from 400–800 ms after a color change which was used for calculating corresponding pvafs. The left and right columns display envelopes for the button press and the physical pointing condition, respectively, with rows displaying from top to bottom the different stimuli (standard, distractor and target).

Beyond the contribution of the described clusters located in or near the ACC and parietal lobe, other clusters also contributed to the sensor envelope for target stimuli in the P3 time range in the physical pointing condition. These clusters were located in or near the junction of the left parietal and occipital cortex (x = −40, y = −73, z = 27 corresponding to BA 39/BA 19) explaining 38.3%, the right motor and premotor cortex (x = 40, y = −6, z = 54, corresponding to BA 6/BA 4) explaining 14.8%, and the left dorsolateral prefrontal cortex (x = −43, y = 22, z = 31, corresponding to BA 9) explaining 9.8% of variance of the sensor envelope (see Supplementary Table 1 for additional cluster contributions in the button press condition).

# The Contribution of Brain Activity to the P3 at Pz'

To analyze the brain dynamic contribution to the maximum of the P3 at the central parietal electrode, only ICs with their equivalent dipole model located to the gray matter of the brain were back projected to Pz' (see **Figure 10**). The resulting summed activity was analyzed with respect to the response condition and stimulus type. To this end mean ERP amplitudes at Pz' were calculated for a time window ranging from 400 to 800 ms after a color change of the sphere and tested for statistical differences using a 2 × 3 repeated measures ANOVA with the factors response condition (physical pointing vs. button press) and stimulus type (standard, distractor, target). Greenhouse–Geisser correction was performed in cases where the assumption of sphericity was violated.
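The mean-amplitude measure entering the ANOVA can be sketched as follows. This is an illustrative sketch only: the function name `mean_amplitude`, the 100 ms sampling grid, and the amplitude values are hypothetical, and the window is assumed to include both endpoints.

```python
def mean_amplitude(erp, times_ms, start_ms=400, end_ms=800):
    """Mean ERP amplitude over all samples with start_ms <= t <= end_ms."""
    window = [v for v, t in zip(erp, times_ms) if start_ms <= t <= end_ms]
    if not window:
        raise ValueError("no samples fall inside the requested window")
    return sum(window) / len(window)


# Hypothetical back-projected ERP at one sensor, sampled every 100 ms
# from 200 ms before to 1000 ms after the color change.
times_ms = list(range(-200, 1001, 100))
erp_uv = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 6.0, 4.0, 2.0, 0.0, 0.0]

print(mean_amplitude(erp_uv, times_ms))  # mean over the 400-800 ms samples
```

One such value per participant, response condition, and stimulus type would form the 2 × 3 design described above.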

The analysis revealed a significant main effect of the response condition (F(1,11) = 11.70; p = 0.006; η<sup>2</sup> = 0.515) and stimulus type (F(2,22) = 16.04; p = 0.001; η<sup>2</sup> = 0.593). The interaction of both factors was also significant (F(2,22) = 12.47; p = 0.003; η<sup>2</sup> = 0.531). Post hoc HSD contrasts revealed that P3 amplitudes were significantly higher for targets in the pointing condition than for standards and distractors in the pointing condition (all ps < 0.001) and than for standards, targets, and distractors in the button press condition (all ps < 0.001). In the button press condition the P3 amplitude was significantly higher for targets than for standard stimuli (p < 0.02) but did not differ from distractor stimuli (p > 0.19).
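The reported effect sizes can be cross-checked against the F statistics. The sketch below assumes that the η<sup>2</sup> values are partial eta squared, which the text does not state explicitly; under that assumption the standard identity partial η² = F · df_effect / (F · df_effect + df_error) applies. The function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported eta squared) as given in the text.
reported = [
    ("response condition", 11.70, 1, 11, 0.515),
    ("stimulus type", 16.04, 2, 22, 0.593),
    ("interaction", 12.47, 2, 22, 0.531),
]

for label, f_value, df1, df2, eta in reported:
    computed = partial_eta_squared(f_value, df1, df2)
    print(f"{label}: computed {computed:.3f}, reported {eta:.3f}")
```

All three computed values agree with the reported ones to three decimals, which is consistent with the partial eta squared reading.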

#### DISCUSSION

In the present study, a visual oddball paradigm was used to investigate the feasibility of MoBI during volatile rapid movements. The systematic manipulation of response requirements to color changes of a dynamically moving object allowed for a direct comparison of ERPs during simple button presses and active physical pointing. Whereas earlier studies demonstrated that treadmill walking introduces comparatively more eye movements than neck muscle activity (Gramann et al., 2010a), the impact of neck muscle activity was much stronger in the present study with non-stereotyped pointing movements accompanying a wide range of different velocities and movement directions. To react properly in the physical pointing condition participants were requested to move fast and accurately. This required continuous tracking of the stimulus accompanied by eye and head movements and, whenever a target appeared, rapid arm and head movements integrating visual information from the dynamically moving object to intercept the target. This mimicked the fundamental difference between traditional imaging approaches using simple button responses and the MoBI approach allowing for natural interaction with the environment.

The present study revealed important new insights into the brain dynamics accompanying physical interaction with a moving object. Firstly, the study clearly demonstrated that MoBI is feasible for recording and analyzing embodied cognitive processes and the accompanying brain/body dynamics during volatile rapid movements in a realistic 3-D environment. Secondly, applying blind source separation methods to the EEG signals recorded during the visual oddball paradigm allowed for separating and clustering ICs corresponding to neck muscle activity, eye movements or brain processes. This way it was possible to analyze the contribution of different clusters to the scalp signal, revealing strong activity of neck muscles during the physical pointing response resulting from head orientation changes and compensation of shoulder and arm movements during pointing. Thirdly, movement onsets and corresponding reaction times in the physical pointing condition demonstrated significantly faster response onsets as compared to the button press condition. Fourthly, analysis of the data in the time range of the P3 component revealed a clear P3 in both response conditions at the sensor level as well as the level of cluster activity. This manifested in significantly higher mean ERP amplitudes for target stimuli as compared to standard stimuli as well as absolute variance increasing from standard to distractor and target stimuli in both response conditions. Finally, backprojecting all brain-related clusters to the centro-parietal sensor showed significantly higher P3 amplitudes for target stimuli in the physical pointing condition compared to the button press condition. This finding indicates different brain dynamics for different behavioral states and has far-reaching implications in the field of Neuroergonomics.

# Natural Cognition and the Contribution of Brain and Non-Brain Sources

During physical interaction with a dynamically moving object, non-brain sources stemming mainly from eye movements and neck muscle activity as well as mechanical artifacts strongly diminished the observable fraction of brain activity recorded on the scalp and prevented meaningful analysis of sensor-based potentials without further preprocessing. However, removing all ICs not associated with brain activity allowed for analyzing the P3 component and the contribution of different clusters to its time course, revealing the following findings.

Clusters contributing to the sensor P3 component were mostly in line with the results of previous studies. As in Makeig et al. (2004), central parietal, motor and occipital processes contributed to the P3 with the largest contribution of parietal clusters to the onset of target stimuli. The contribution of brain processes located near or in the ACC was in line with previous findings using an oddball paradigm during treadmill walking (Gramann et al., 2010a).

An increase in the variance explained by brain-related sources from standard to distractor and target stimuli was found in both response conditions, with a stronger effect in the physical pointing condition. This is consistent with the assumption that a potential physical interaction with the environment requires additional cognitive and motor processes and thus leads to higher computational effort. In the present study it was necessary to track the position and movement direction of the relevant object and body parts required for orienting to and interacting with the stimulus. Physical interaction with target stimuli required action planning, execution, and control. Those processes were not required for frequent standard and rare distractor stimuli, as reflected in smaller amplitudes and lower variance in clusters reflecting brain processes. However, distractors attracted more attention and potentially triggered an initial response. This response had to be suppressed, resulting in additional inhibitory processes and accompanying brain activity as indicated by higher variance for distractor stimuli than for standard stimuli.

Clusters representing neck muscle activity also accounted for increasing variance from standard to distractor and target stimuli in both conditions. The increase was stronger in the physical pointing condition where a correct response to the target required a pointing movement comprising movement of the head, shoulder and arm. Those movements were accompanied by strong neck muscle activity as observed for target stimuli in the physical pointing condition. Since the readiness to act was very high, as indicated by faster response times and the absence of any misses, it is likely that rare distractor stimuli caused the initiation of response movements. Even if the response was subsequently inhibited for distractor stimuli, response initiation would be reflected in higher neck muscle contribution to distractor than to standard stimuli.

Finally, clusters representing eye movements explained more variance in the sensor signal in the button press as compared to the physical pointing condition for standard and distractor stimuli. One possible explanation is that in the physical pointing condition head alignment to the stimulus position facilitated physical movements in that direction. As a consequence of increasing head movements during stimulus tracking, fewer eye movements were required for keeping the moving stimulus in the visual field. However, since the sphere kept moving after the color change, successful pointing movements to targets required an ongoing prediction of its future position. This caused extended coordination of eye and arm movement resulting in an increase of variance explained by eye movements. In the button press condition a simple button press was sufficient to respond to a color change, requiring no further coordination of eye movement and physical response. In addition, the visual stimulus stopped moving after response execution, rendering stimulus tracking unnecessary. This would have resulted in a decrease of corresponding variance for target trials compared to distractor and standard trials in both conditions.

#### Limits of MoBI

The present study required continuous head and eye movements during stimulus tracking, causing electrical potentials at the surface electrodes that were superimposed on the EEG signal. This happened especially for target stimuli in the physical pointing condition where a correct response demanded arm movements accompanied by strong jerks associated with increased neck muscle activity. Consequently, no significant mean ERP difference was found on the scalp electrodes in the P3 time range between physical pointing and button press for target stimuli. Volume conducted non-brain activity in the recorded EEG signal is an inevitable consequence of active movements of the participants. Using ICA for separating brain related from non-brain related activity and back projecting the former to Pz' revealed the expected differences between those conditions. Thus, MoBI proved feasible for analyzing event-related EEG dynamics of participants performing rapid pointing movements in a realistic 3-D environment.

However, some caveats were identified in this study indicating potential constraints of the MoBI approach for investigating natural movements. These included an increase of artifact contaminated trials and channels as well as higher residual variances compared to EEG studies with stationary participants who are not allowed to move their heads.

A relatively high number of trials had to be removed due to inconsistencies between markers written online during the experiment and those derived afterwards from the velocity profiles. Thus the amount of considered data was decreased impeding statistical analysis especially in the physical pointing condition. Future MoBI experiments will need a setup fully covered with fixed cameras minimizing the risk of LED occlusions and camera movement causing such inconsistencies.

Related to the reduced number of trials due to technical problems, the impact of movement-related mechanical artifacts like cable sway was reflected in a specific distribution of the probabilities for channels to be removed. Jerks and micromovements of the electrodes over the skin surface associated with fast response movements led to impedance changes with strong artifactual activity affecting the outermost neck electrodes in the posterior-inferior regions. The central midline area as well as the two lateral regions over the scalp, in contrast, were most likely affected by cable pull during head movements due to the cable routing over the scalp to the back. This is one likely explanation for the high rejection rate of those electrodes. The lateral regions near the mastoid processes were prone to bad contact with high impedance leading to artifactual activity due to the fit of the cap. For further MoBI experiments a redesign of electrode attachment or cable-free electrodes has to be considered to increase the number of channels that can be analyzed.

Finally, in contrast to non-MoBI studies the ICA results revealed many ICs with large residual variances that would not be included into the clustering process applying standard selection criteria (e.g., Gramann et al., 2010b). However, muscle activity and eye movements originate in regions of the head that are usually not included in the head model for source reconstructions, rendering it difficult to calculate suitable dipole models. Moreover, muscle contraction causes tissue movements which could result in dipole displacement. Thus, although in general higher residual variance is associated with decreased result accuracy, dipoles with relatively large residual variances were included in the analysis. A future improvement would be the introduction of forward models including neck muscles and their contraction profiles as additional parameters for the inverse solution.

## Implications on Neuroergonomics Research

The faster response onsets in the physical pointing condition might be the consequence of another brain dynamic state caused by the need for physical interaction with the stimulus. In addition, the physical pointing condition might have led to increased motivation and more fun for this response format as reported by the participants after the experiment. However, using movement and velocity profiles for the purpose of additional brain activity analysis requires the definition of corresponding features that are widely used and accepted. Defining the movement onset as a fixed percentage of the corresponding maximum velocity as in this study was only one possible solution. Other criteria might be useful in different contexts, like a fixed absolute velocity or acceleration value which would be independent of individual movement differences. A general definition should be discussed and established to increase comparability of experimental results in the field of Neuroergonomics and for MoBI research in general. Importantly, comparing different onset criteria starting from 5% of the corresponding peak velocity up to 53% of the corresponding peak velocity still indicated faster responses in the physical pointing condition as compared to button presses. Whether this was simply due to the fact that participants enjoyed the physical response format or whether interception of a dynamically moving object was associated with a generally different behavioral and brain dynamic state will have to be investigated in future experiments. There are clear arguments in favor of state differences in brain dynamics depending on the behavioral state. Introducing a task that requires large volatile movements not only produces muscle activity and eye movements but also requires additional processes that allow for movement planning, control and execution. These additional processes will be reflected in changes in brain dynamics.
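A relative onset criterion of this kind can be sketched as follows. This is a minimal illustration, not the study's implementation: the function name `movement_onset`, the rule of taking the first sample at or above the threshold, and the velocity values are assumptions made for the example.

```python
def movement_onset(speed, times_ms, fraction):
    """Time of the first sample whose speed reaches `fraction` of the peak speed.

    Returns None if the profile contains no movement (peak speed <= 0).
    """
    peak = max(speed)
    if peak <= 0:
        return None
    threshold = fraction * peak
    for value, t in zip(speed, times_ms):
        if value >= threshold:
            return t
    return None


# Hypothetical speed profile of one pointing movement (arbitrary units),
# sampled every 10 ms after the color change.
times_ms = [0, 10, 20, 30, 40, 50, 60, 70]
speed = [0.0, 0.01, 0.06, 0.2, 0.6, 1.0, 0.7, 0.2]

print(movement_onset(speed, times_ms, 0.05))  # lenient criterion, early onset
print(movement_onset(speed, times_ms, 0.53))  # strict criterion, later onset
```

A stricter fraction always yields an onset at or after the one obtained with a more lenient fraction, which is why comparing a range of criteria, as described above, is a useful robustness check. A fixed absolute threshold would replace `fraction * peak` with a constant.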
Moreover, the oddball paradigm required constant attention directed towards a sphere that moved within the borders of a large screen in front of participants. In addition, the physical pointing condition necessitated the prediction of the targets' movement to integrate this information with proprioceptive information about position and orientation of the arm and hand for concurrent dynamic motor planning and execution. Continuous observation and integration of environmental aspects with complex motor programs causes higher computational effort and can be assumed to lead to different brain dynamic states compared to passive observation. This is indicated by the significantly increased amplitude of the P3 and faster response onsets in the physical pointing condition compared to the button press condition. It would be important for future investigations to analyze the impact of stimulus speed on the brain dynamic state since higher speeds increase task difficulty and thus affect head and eye movement velocities.

This has significant implications for Neuroergonomics investigating the brain at work, especially when the working environment requires physical interaction with a dynamic system. The physical pointing task required the participants to actively interact with their environment using fast, precise movements of the upper torso and the arm and hand. This generalizes to a wide range of working tasks where people have to manipulate objects as, for example, in assembly-line work or the construction trade. Future studies might investigate the brain dynamics underlying spatially extended movements with different velocities including team sports or spatial orientation with or without navigation assistance. Studying the brain activity in the described work settings could provide valuable insights into the cognitive processes and the limits of the cognitive system and thus allow for suggestions on how to increase system safety. For example, the degree of interaction seems to be one factor improving working environments by influencing motivation, reaction time and task complexity. Another factor to be considered is the body posture since active movement is naturally associated with an upright posture whereas cognitive neuroscientists still investigate sitting or lying participants. Changes in brain dynamics due to different body postures could have an impact on result quality and information processing speed.

Investigating the human brain dynamics accompanying physical interaction with dynamically moving objects for the first time, this study clearly demonstrated that it is possible to record and analyze EEG activity during volatile rapid movements. Thus, future MoBI studies examining the mentioned aspects will have an important impact on Neuroergonomics specifically and cognitive neuroscience in general.

### AUTHOR CONTRIBUTIONS

KG and EJ have been actively involved in the experimental design. EJ recruited experimental participants, acquired and analyzed the data. KG participated in the data analysis. All authors contributed to the interpretation of the data, revised the manuscript and approved the final version to be published. EJ and KG agree to be accountable for all aspects of the work and ensure that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnhum.2016.00306/abstract


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Jungnickel and Gramann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Age-Sensitive Effects of Enduring Work with Alternating Cognitive and Physical Load. A Study Applying Mobile EEG in a Real Life Working Scenario

Edmund Wascher\*, Holger Heppner, Sven O. Kobald, Stefan Arnau, Stephan Getzmann and Tina Möckel

IfADo-Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany

Ergonomic assessment of a workplace requires the evaluation of physical as well as cognitive aspects of a particular working situation. In particular the latter is hardly possible without interfering with the natural setting. Mobile acquisition of neurophysiological measures (such as parameters of the EEG) may close this gap. At a simulated workplace we tracked older and younger participants with mobile EEG during a 4–5 h work shift. They had to perform either a monotonous cognitive task, a self-paced cognitive task or a self-paced physical task in a predefined order. Self-assessment, behavioral performance and spectral measures of the EEG (most notably alpha power) indicated that younger participants suffered from monotony. Older adults, on the other hand, were overall impaired by inefficient information processing. This was visible in EEG variations time-locked to eye blinks (blink-related synchronizations), a new measure to investigate cognitive processing in real life environments. Thus, we were able to distinguish between active and passive task-related aspects of mental fatigue without impinging on the natural working situation.

Keywords: working environment, mental fatigue, mobile EEG, aging

# INTRODUCTION

Evaluation of workplaces may take place on quite different levels. Traditional ergonomics focuses on physiological and physical factors of the working environment, applying surveys to different aspects of the workplace in order to support and protect the worker (McAtamney and Corlett, 1993; Hignett and McAtamney, 2000; Halim et al., 2012). More recently, virtual models of humans help to evaluate many of these aspects of work (Lämkull et al., 2007; Bandouch et al., 2008). Cognition, however, and human factors affecting cognition, can be addressed only superficially in real life situations. While experimental psychology may address distinct cognitive aspects that play a role in working situations, this can hardly apply to the complex interaction of cognitive requirements that a worker faces during a work shift. Neurophysiological measures that can be taken while regular work is performed may help to close this gap. In particular, the increasing accessibility of mobile neurophysiological technology that allows for an "online" registration of work-related parameters is a huge challenge and opportunity for ergonomic evaluation (Wascher et al., 2014a).

#### Edited by:

Klaus Gramann, Berlin Institute of Technology, Germany

#### Reviewed by:

Juliana Yordanova, Bulgarian Academy of Sciences, Bulgaria Lutz Jäncke, University of Zurich, Switzerland

> \*Correspondence: Edmund Wascher wascher@ifado.de

Received: 09 October 2015 Accepted: 21 December 2015 Published: 13 January 2016

#### Citation:

Wascher E, Heppner H, Kobald SO, Arnau S, Getzmann S and Möckel T (2016) Age-Sensitive Effects of Enduring Work with Alternating Cognitive and Physical Load. A Study Applying Mobile EEG in a Real Life Working Scenario. Front. Hum. Neurosci. 9:711. doi: 10.3389/fnhum.2015.00711

In the present study, we focused on a neuroergonomic evaluation of the interaction between mental fatigue and age-related changes in cognitive performance. The proportion of employed older adults is continuously increasing with the general demographic change. While older employees may benefit in many cases from professional experience, they are also facing physiological and cognitive decline, which makes it difficult to keep up with their younger colleagues, in particular when long-term working situations are considered. There is some evidence from laboratory experiments that not only physical but also mental fatigue rises faster with higher age (Wascher and Getzmann, 2014). On the other hand, it has been shown that varying the cognitive task may prevent older adults from an accelerated rise of mental fatigue (Falkenstein et al., 2002). However, does this finding also hold for situations in which varying tasks continuously require flexible retrieval of cognitive and physical capabilities, as may be the case during a working shift?

We addressed this question based on a multi-level model of fatigue and its possible relation to age-related cognitive decline by pinning the observed behavioral measures down to neurophysiological mechanisms.

Long lasting activity, regardless of whether it is physical or cognitive, leads to a decline in the capabilities of a worker (Halim et al., 2012; Lerman et al., 2012). Despite this commonality and the usage of the common term "fatigue," physical and cognitive declines have quite different underlying mechanisms and consequences. Physical fatigue goes along with metabolic changes in the muscle, which lead to a decline in physiological capabilities. Additionally, an increase in the plasma fatty acid level may lead to an increase in tryptophan. This is assumed to increase the 5-HT concentration in the brain and thereby to contribute to central fatigue (Newsholme et al., 1992), a mental contribution to physical fatigue. Although central aspects may play an important role for the accessibility of physical resources (Marcora et al., 2009; Mehta and Parasuraman, 2013), the peripheral exhaustion of resources is the core aspect of physical fatigue.

Mental fatigue, on the other hand, is not known to go along with any physiological resource consuming aspect (Hockey, 2013). Apart from circadian rhythms and sleeping behavior (sleep-related fatigue = SR), mental fatigue may derive either from cognitive overload (active task-related fatigue = aTR) or from mental underload (passive task-related fatigue = pTR) due to, e.g., monotony (May and Baldwin, 2009). The latter two mechanisms result in distinct types of fatigue that have different consequences for subsequent activity. While mental aTR was found to persist even after completion of the task and not to depend on the amount of motivation (van der Linden et al., 2003), pTR is strongly coupled with the motivational system (Boksem et al., 2006; Boksem and Tops, 2008; Bonnefond et al., 2011). Both in basic laboratory research and in applied contexts, it has been demonstrated that pTR can be efficiently counteracted by incentives. Even after long lasting mental tasks, the instruction to put more effort into a given task is sufficient to restore cognitive performance almost completely to the level shown at the beginning of the experiment (Boksem et al., 2006). Therefore, pTR has been framed in motivational terms. Boksem and Tops (2008), for example, proposed that pTR reflects an imbalance between resources invested and outcome. Whenever the effort that is needed to perform on an adequate level is too high in relation to the outcome that is generated, motivation declines and, as a consequence, cognitive processing becomes less efficient.

Until now, the different aspects of mental fatigue have been investigated in isolation and under controlled laboratory settings. However, one has to be aware that most working activities that require physical and cognitive effort also contain periods in which monotonous cognitive tasks have to be performed. Such changes in duties may at least prevent pTR, as has been demonstrated in studies that investigated effects of aging on mental fatigue. When tasks changed during the experimental session, no mental fatigue was observed in older adults (Falkenstein et al., 2002), whereas performing the same task for a longer period of time led to a clear age-related decline of attentional performance (Wascher and Getzmann, 2014).

Age-related effects on pTR are of central interest in fatigue research insofar as the well-known decline of structures in the frontal lobe of the human brain (Chao and Knight, 1997) primarily affects those structures that are also involved in motivational processes and thereby in upholding cognitive performance in non-demanding situations (Berridge and Robinson, 1998). Besides this, lower muscle strength with higher age may contribute to central fatigue when physically demanding tasks are part of a working situation.

We designed a simulated workplace that resembled the post room of a German wholesale house (see Funding). The tasks of the participants recreated parts of the real workflow but were nevertheless controlled experimental settings. Participants had to perform a monotonous stimulus-response task, a self-paced cognitive task, and a physical task (moving and sorting boxes of different weights and sizes) in a repetitive sequence for about 4–5 h. Behavioral performance was measured in the cognitive tasks, and self-estimations of experienced fatigue and motivation were repeatedly taken. One of the core questions of the present study was to what degree a neuroergonomic approach may help to obtain objective data on task load, effort, and fatigue-related changes in cognitive processing. To this end, besides the "classical" measures of behavioral performance and self-estimation, mobile EEG was recorded continuously while the participants were freely moving in an office-like room, dealing with the different tasks.

Since there is hardly any literature on such an experimental setting, we focused first on well-known aspects of the EEG that have been related to mental fatigue. This includes, first and foremost, oscillatory activity. It has been repeatedly shown that brain oscillations slow down with mental fatigue, indicated by increasing power in the alpha and the theta band of the EEG (Lal and Craig, 2001; Akerstedt et al., 2004; Wascher et al., 2014b). Given that most of these studies used longer lasting monotonous tasks, these effects may be related primarily to pTR. In particular, the increase in alpha activity may well be a correlate of decreasing motivation and a withdrawal of attention that lead to a kind of idle state in the sensory and attention related structures of the brain (Hanslmayr et al., 2012), which should be most pronounced in the monotonous stimulus-response task. The increase in theta activity may rather be related to increasing effort that is invested to keep performance high (Sarter et al., 2001, 2006), which should be stronger in self-paced tasks. Most interestingly, a kind of slowing has also been reported with increasing age, as indicated by a reduction in the individual alpha frequency (for a review see Klimesch, 1999), but so far not for mental fatigue.

Besides these rather energetic aspects of brain activity, we asked for specific neuronal processes that can be related to information processing, and how they change with age and fatigue. In laboratory settings, stimuli are presented at distinct time points and cortical activity is measured time-locked to these events (so-called event-related potentials or oscillations). Such events are not accessible in real life situations. Adding stimuli to a real life situation may substantially alter the task of participants, leading to, e.g., attentional distraction. Distinct events from the surroundings that may be identified by scene cameras would not be comparable across different tasks, and a sufficient number of repetitions of comparable events is not guaranteed. Events that occur repetitively and independently of a particular task, also in real life situations, are so-called eye events. Horizontal eye movements (e.g., saccades) are the core human behavior related to spatial orientation of attention. More interesting for the temporal segmentation of incoming information are eye blinks, which occur primarily at the end of an information processing sequence (e.g., Doughty, 2001; Wascher et al., 2015). Recently, we demonstrated that time locking of EEG activity to eye blinks provides reliable measures of cognitive effort (Wascher et al., 2014a). Because event-related potentials did not reliably show time on task related changes in a previous task (see Wascher and Getzmann, 2014), we applied event-related synchronization/desynchronization (ERS/ERD; Pfurtscheller and Aranibar, 1979) analyses to the eye-blink related data. Due to the lower time resolution, these data might be more robust in complex experimental situations like a workplace simulation. Moreover, phasic changes in brain oscillatory activity (in particular in the Alpha band) appear to be reliable correlates of signal processing (Klimesch et al., 2002; Müller et al., 2009).

Taken together, applying these methods to a working situation that resembles a real workplace should provide information about age and fatigue-related changes in information processing and the underlying neuronal mechanisms. The aim of the present study was to go beyond the description of age-related differences in performance.

#### METHODS

#### Participants

Thirteen younger adults (20–29 years old, mean age 25.3) and 12 older adults (55–73 years old, mean age 64.4) took part in the experiment. All participants had normal or corrected to normal vision, were of good physical health, and reported no history of psychiatric or neurological diseases. For the entire procedure (lasting around 5–6 h including preparation) participants received 60 €.

Prior to the experiment participants gave written informed consent. The study was approved by the local ethics committee and conducted according to the Declaration of Helsinki.

#### Task, Stimuli, and Procedure

The experiment took place in an office room (3.50 × 4.80 m) with partly covered windows. Tables stood along the walls where the boxes for the physical task were placed (see **Figure 1**). On one of the tables, a computer monitor was positioned for the presentation of instructions for the physical task and the cognitive tasks. A research assistant who controlled the experiment sat in another room and monitored the EEG recordings. Participants could reach the assistant via phone at all times. The assistant only entered the room during breaks or to correct any technical failures if necessary.

The experiment consisted of three tasks, which were repeated in a predefined sequence within each block. Blocks started and ended with a computerized version of the d2-task that closely followed the original paper-pencil version (Brickenkamp, 1962). In between, a block of the Simon task was presented, followed by the physical task, after which the Simon task had to be performed again. This procedure was repeated three times with short breaks in between and added up to an overall duration of the work shift of about 4–4.5 h. Each subtest ran for a predefined duration; thus, the self-paced tasks were stopped no matter how many passes were finished.

In the d2-task, three lines of 57 d's or p's each with one to four marks (in the form of single or double quotes) above and/or below the letter were presented. The participants had to mark as many d's with exactly two marks as possible in a given time window using a computer mouse, while simultaneously ignoring all p's and the d's with fewer or more than two marks. The participants had 20 s per line. Then a sound signaled to proceed with the next line. After the three lines were done, a new screen appeared with three new lines. In each d2-task block, five screens with three lines each were presented. Overall, one d2-task block took about 5 min. The d2-task served as a self-paced cognitive task.

In the Simon task (Simon, 1969) one of two symbols (either a square or a diamond) was presented on the left or the right side of a fixation cross for 150 ms. The participants had to decide which one of the symbols was shown by responding with either the left or the right hand, while ignoring the side on which the stimulus was presented. Thus, a trial could be either corresponding (stimulus presentation and response on the same side) or non-corresponding (stimulus presentation and response on different sides). The inter-stimulus interval was 1800 (±500) ms, and 704 stimuli were presented overall. Each block of the Simon task took about 21 min. The Simon task served as an externally-paced, monotonous cognitive task.
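The trial logic of the Simon task can be illustrated with a short Python sketch. It is not the authors' presentation software. The symbol-to-hand mapping, the uniform jitter distribution, and the function name `make_simon_trials` are assumptions made for illustration only. The final lines check the reported timing: 704 stimuli at a mean interval of 1800 ms give roughly 21 min, which suggests that the 704 stimuli refer to one Simon block.

```python
import random


def make_simon_trials(n_trials=704, isi_ms=1800, jitter_ms=500, seed=0):
    """Generate a hypothetical Simon-task trial list (illustration only)."""
    rng = random.Random(seed)
    # Assumed mapping, not stated in the text: square -> left, diamond -> right.
    response_for = {"square": "left", "diamond": "right"}
    trials = []
    for _ in range(n_trials):
        symbol = rng.choice(["square", "diamond"])
        side = rng.choice(["left", "right"])  # task-irrelevant stimulus position
        response = response_for[symbol]
        trials.append({
            "symbol": symbol,
            "side": side,
            "response": response,
            # Corresponding: stimulus and required response on the same side.
            "corresponding": side == response,
            # Assumed uniform jitter of +/- jitter_ms around the mean interval.
            "isi_ms": isi_ms + rng.uniform(-jitter_ms, jitter_ms),
        })
    return trials


trials = make_simon_trials()
# Nominal block duration from the mean inter-stimulus interval, in minutes.
nominal_minutes = 704 * 1800 / 1000 / 60
print(round(nominal_minutes, 1))
```
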

Before and after each Simon task the participants were asked to rate their subjectively experienced amount of mental fatigue and their motivation to continue with the task on a 9-point Likert scale.

In the physical box-sorting task (Boxes), participants had to handle 12 cardboard boxes of three different sizes and three different weights (0.5–15 kg). The boxes were placed on waist high tables, which formed two "zones" on opposite sides of the room. The distance between the two opposite zones was about 190 cm. Participants had to carry the boxes between these zones. They had to sort them according to size, weight, or label, consisting of either a letter (A, B, C) or a number (1–12) both attached to the boxes. Boxes always had to be arranged in three groups of four objects each. In case of sorting by numbers, boxes 1–4, 5–8, and 9–12 had to be put together. In case of sorting by letters, boxes A, B, and C had to be put together. The sorting rules were presented on a computer screen. After finishing one sorting task the participant had to press a button on the keyboard to get new instructions. The order of the sorting rules was randomized. There was no time limit for a single sorting task. Overall, one physical task block was performed for 25 min. A message on the screen signaled the ending of the block. The physical task was not paced at all.
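The reported durations can be cross-checked with a small calculation. The following Python sketch simply adds up the task durations given in the text for one block (d2, Simon, Boxes, Simon, d2) and for three blocks. The variable names are illustrative, and the interpretation that the remaining time up to the reported 4–4.5 h is taken by breaks, ratings, and instructions is an assumption.

```python
# Durations in minutes, taken from the task descriptions in the text.
D2_MIN = 5 * 3 * 20 / 60   # 5 screens x 3 lines x 20 s per line
SIMON_MIN = 21             # "about 21 min" per Simon block
BOXES_MIN = 25             # physical task block

# Task order within one block as described in the procedure.
BLOCK = [
    ("d2", D2_MIN),
    ("Simon", SIMON_MIN),
    ("Boxes", BOXES_MIN),
    ("Simon", SIMON_MIN),
    ("d2", D2_MIN),
]
N_BLOCKS = 3

block_minutes = sum(duration for _, duration in BLOCK)
net_hours = N_BLOCKS * block_minutes / 60

print(block_minutes)        # net task time per block in minutes
print(round(net_hours, 2))  # net task time for the whole shift in hours
```

Under these figures the net task time is somewhat below 4 h, which is compatible with the reported overall duration once breaks and ratings are added.
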

#### EEG Data Recording and Processing

EEG was recorded from 60 standard electrode sites using an active electrode system (ActiCap; BrainProducts). Vertical eye movements and blinks were measured from two electrodes above and below the right eye (vEOG). Two electrodes at the outer canthi of both eyes were used for the measurement of horizontal eye movements (hEOG). Electrode impedance was kept below 10 kΩ. EEG and EOG were digitized at 1000 Hz and submitted via a WiFi module (MOVE; BrainProducts) to a BrainAmp MR plus EEG amplifier (BrainProducts). Data were recorded with a resolution of 0.1 µV, a low cutoff at DC, and a high cutoff at 250 Hz. The transmitter and the ActiCap Control Box were placed in a belt bag at the lower back of the participants. They could move around without any restrictions.

Data were offline re-referenced to averaged mastoids and a bandpass filter (0.5–45 Hz) was applied. Data were set up both for regular and event-related frequency analyses (event-related desynchronizations/synchronizations = ERD/ERS: see Pfurtscheller and Aranibar, 1979; Klimesch, 1999; Pfurtscheller and Lopes da Silva, 1999). Because of the large structural differences between tasks, no task-inherent temporal markers for event-related analyses were available across tasks. Therefore, we referred to eye blinks as temporal markers for information processing (Wascher et al., 2014a, 2015). Only singular blinks, which were not followed by another blink within 700 ms, were used for this procedure. Additionally, blinks that were accompanied by marked horizontal eye movements were excluded (for more detailed information about the blink detection mechanism, see Wascher et al., 2015). Data segments from −1000 to 2000 ms around the maximum of blink-related activity in the bipolar vEOG were extracted. An interval between −450 and −250 ms served as baseline. This interval was selected to avoid any temporal overlap with ongoing vertical eye movements. After statistics-based artifact removal as implemented in EEGLAB (Delorme and Makeig, 2004), an independent component analysis (ICA) was applied (data downsampled to 250 Hz). Independent components (ICs) reflecting artifacts were identified and rejected using ADJUST (Mognon et al., 2011). The remaining ICs were tested for biological plausibility based on their scalp maps. The goodness of fit for modeling each IC with a single equivalent current dipole was calculated by submitting individual component maps to an automatic source localization algorithm (DIPFIT, contributed to EEGLAB by Oostenvelt et al., 2003), using a standard four-shell spherical head model. Any IC with a residual variance of more than 40% was automatically removed from the data (for a similar procedure see Debener et al., 2005).
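The blink selection and segmentation rules can be sketched as follows. This is a minimal Python illustration, not the blink detection procedure of Wascher et al. (2015). It implements only the literal rule that a blink is kept when no other blink follows within 700 ms, it omits the exclusion of blinks with marked horizontal eye movements, and the function names are hypothetical.

```python
def select_singular_blinks(blink_times_ms, min_gap_ms=700):
    """Keep blinks that are not followed by another blink within min_gap_ms."""
    times = sorted(blink_times_ms)
    kept = []
    for i, t in enumerate(times):
        is_last = i + 1 == len(times)
        if is_last or times[i + 1] - t > min_gap_ms:
            kept.append(t)
    return kept


def epoch_bounds(blink_ms, srate_hz=1000, pre_ms=1000, post_ms=2000):
    """Sample indices of a segment from -pre_ms to +post_ms around a blink maximum."""
    start = int(round((blink_ms - pre_ms) * srate_hz / 1000))
    stop = int(round((blink_ms + post_ms) * srate_hz / 1000))
    return start, stop


# Hypothetical blink maxima in ms; 1000 and 4000 are each followed too closely.
blinks = [1000, 1500, 4000, 4600, 9000]
singular = select_singular_blinks(blinks)
print(singular)
print([epoch_bounds(t) for t in singular])
```

Note that under this literal reading the second blink of a close pair is retained; a stricter definition of "singular" would also require a minimum gap to the preceding blink.
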

For the analyses of frequency spectra, fast Fourier transforms (FFTs) were applied to the extracted segments, using the spectopo function of EEGLAB. In order to provide a sufficient resolution of frequencies, data were padded with zeros to a length of 2048 data points (freqfac = 4). For ERD/ERS analyses, the matrix of valid ICs was projected back to the continuous data set for band-pass filtering (4–7.5 Hz for Theta activity; 8–12 Hz for Alpha activity). Segments that were marked as artifactual in the preprocessing pipeline were removed from those data as well.
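The effect of zero-padding on the frequency grid can be illustrated with a plain discrete Fourier transform. The sketch below is not the EEGLAB spectopo implementation. It assumes a sampling rate of 250 Hz (the text does not state the rate at which the spectra were computed) and a 3 s segment, and it uses a synthetic 10 Hz sinusoid. With 2048 points, the bin spacing is the sampling rate divided by 2048, i.e., about 0.12 Hz at 250 Hz.

```python
import cmath
import math


def padded_power_spectrum(segment, srate_hz, nfft=2048, fmax_hz=45.0):
    """Power at each DFT bin up to fmax_hz after zero-padding to nfft points."""
    if len(segment) > nfft:
        raise ValueError("segment is longer than nfft")
    df = srate_hz / nfft                      # bin spacing in Hz
    n_bins = int(fmax_hz / df) + 1
    freqs = [k * df for k in range(n_bins)]
    power = []
    for k in range(n_bins):
        w = -2j * math.pi * k / nfft
        # Padded zeros add nothing to the sum, so only real samples are visited.
        acc = sum(x * cmath.exp(w * n) for n, x in enumerate(segment))
        power.append(abs(acc) ** 2)
    return freqs, power


SRATE = 250.0  # assumed sampling rate in Hz
signal = [math.sin(2 * math.pi * 10.0 * n / SRATE) for n in range(750)]
freqs, power = padded_power_spectrum(signal, SRATE)
peak_freq = freqs[power.index(max(power))]
print(round(freqs[1], 4), round(peak_freq, 2))
```

Zero-padding interpolates the spectrum on a finer grid; it does not improve the ability to separate neighboring frequencies, which remains limited by the segment length.
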

#### Data Analysis

#### Self Assessment

Analyses of variance for repeated measurements (ANOVAs) were conducted for the subjective measures (i.e., the rated mental fatigue and rated motivation to continue the task) with the between-subject factor Age (younger, older), and the within-subject factors Time on Task (ToT; across the three blocks) and Sequence (order within each block).

#### Behavioral Data (Simon Task)

ANOVAs were conducted for response times and error rates in the Simon task with the between-subject factor Age (younger, older) and the within-subject factors ToT, Sequence (task run before vs. after physical task), and S-R Correspondence (relates to the spatial relation between stimulus and response location: corresponding vs. non-corresponding).

#### EEG Data

All EEG analyses were restricted to FCz and POz, two electrodes that are commonly reported in studies investigating mental fatigue.

Since several individuals in the sample showed either multiple peaks in the EEG spectrum or no peak at all, we chose the gravity frequency method in order to determine the individual Alpha frequency (IAF; see Klimesch, 1999). Gravity frequency (GF) is defined as the weighted sum of spectral estimates in the Alpha range divided by the total Alpha power (Goljahani et al., 2012). Extracted power measures were individually adjusted to GFs (for a review see Klimesch, 1999). Lower Alpha power was defined as the mean power between GF − 2 Hz and GF. Upper Alpha ranged from GF to GF + 2 Hz, Theta from GF − 5 to GF − 3 Hz, and Beta from GF + 5 to GF + 18 Hz. GF and the mean power in all bands were entered into ANOVAs with the between-subject factor Age and the within-subject factors Task (3; D2, Simon task, Boxes), Time on Task (3), and Electrode (2; FCz, POz).
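Written out, the definition above is $GF = \sum_i f_i P(f_i) / \sum_i P(f_i)$, with the sums running over the frequency bins $f_i$ of the Alpha range and $P$ denoting spectral power. A minimal Python sketch of this computation and of the GF-adjusted bands is given below. The Alpha range of 8–12 Hz is an assumption (the text does not state the limits used for GF), and the function names and the synthetic spectrum are illustrative.

```python
def gravity_frequency(freqs, power, lo_hz=8.0, hi_hz=12.0):
    """Power-weighted mean frequency within the (assumed) Alpha range."""
    pairs = [(f, p) for f, p in zip(freqs, power) if lo_hz <= f <= hi_hz]
    total = sum(p for _, p in pairs)
    if total <= 0:
        raise ValueError("no power in the selected range")
    return sum(f * p for f, p in pairs) / total


def individual_bands(gf):
    """Frequency bands defined relative to GF, as described in the text."""
    return {
        "theta": (gf - 5, gf - 3),
        "lower_alpha": (gf - 2, gf),
        "upper_alpha": (gf, gf + 2),
        "beta": (gf + 5, gf + 18),
    }


def band_mean(freqs, power, band):
    """Mean power of all bins inside a (low, high) band."""
    lo, hi = band
    vals = [p for f, p in zip(freqs, power) if lo <= f <= hi]
    return sum(vals) / len(vals)


# Synthetic spectrum: a triangular Alpha peak centered on 10 Hz, 0.5 Hz bins.
freqs = [0.5 * k for k in range(61)]
power = [max(0.0, 1.0 - abs(f - 10.0) / 2.0) for f in freqs]
gf = gravity_frequency(freqs, power)
bands = individual_bands(gf)
print(gf, bands["lower_alpha"], band_mean(freqs, power, bands["upper_alpha"]))
```
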

For ERD/ERS analyses, band-pass filtered data were squared and set into relation to the mean power in the baseline (−1000 to 0 ms relative to the blink maximum). The most pronounced effect occurred immediately after the re-opening of the eyes (see also **Figure 8**). Therefore, ERD/ERS were measured in a distinct time window between 0 and 300 ms after the blink maximum. Mean ERD/ERS were calculated for this time window and entered into the same analysis as power values and GF.
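The ERD/ERS measure can be sketched in a few lines of Python. The percentage convention used here, $(A - R)/R \times 100$ with $R$ the mean baseline power and $A$ the mean power in the test window, follows the common definition of Pfurtscheller and Lopes da Silva (1999) and is an assumption, since the text only states that power was "set into relation" to the baseline. The baseline interval is kept as a parameter because the text mentions both −1000 to 0 ms and −450 to −250 ms. The signal is synthetic and the function name is hypothetical.

```python
def erd_ers_percent(filtered, times_ms, baseline=(-1000, 0), window=(0, 300)):
    """Percent power change in `window` relative to `baseline`.

    Positive values indicate synchronization (ERS), negative values
    desynchronization (ERD). `filtered` is a band-pass filtered signal.
    """
    power = [x * x for x in filtered]  # squaring yields instantaneous power

    def mean_power(lo, hi):
        vals = [p for t, p in zip(times_ms, power) if lo <= t < hi]
        if not vals:
            raise ValueError("empty interval")
        return sum(vals) / len(vals)

    ref = mean_power(*baseline)
    act = mean_power(*window)
    return (act - ref) / ref * 100.0


# Synthetic segment at 250 Hz: amplitude doubles after the blink maximum (t = 0).
times_ms = list(range(-1000, 2000, 4))
filtered = [1.0 if t < 0 else 2.0 for t in times_ms]
ers = erd_ers_percent(filtered, times_ms)
print(ers)
```

In practice, power would be averaged across many blink-locked segments before the ratio is formed; the single synthetic segment only demonstrates the arithmetic.
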

For factors with more than two levels, Greenhouse-Geisser adjusted p-values are reported where appropriate. Additionally, effect sizes by means of partial eta squared ($\eta_p^2$) are reported for significant results. Post-tests were Bonferroni corrected. Signal analyses were performed in MATLAB®. All statistical analyses were conducted using GNU R (R Core Team, 2012). Plots were drawn using VEUSZ (Jeremy Sanders, 2013; http://home.gna.org/veusz/).
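For readers who want to relate the reported F statistics to the effect sizes, partial eta squared can be recovered from an F value and its degrees of freedom via the standard identity $\eta_p^2 = F \cdot df_1 / (F \cdot df_1 + df_2)$. The Python sketch below applies this identity to values reported in the Results and shows a plain Bonferroni adjustment. It illustrates the arithmetic only and is not the R code used for the analyses; the function names and the example p-values are hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: each p times the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# ToT effect on fatigue ratings as reported: F(2, 48) = 9.70
print(round(partial_eta_squared(9.70, 2, 48), 2))
# Task effect on fatigue ratings as reported: F(3, 72) = 56.04
print(round(partial_eta_squared(56.04, 3, 72), 2))
# Hypothetical uncorrected p-values for three pairwise comparisons
print(bonferroni([0.004, 0.02, 0.5]))
```
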

FIGURE 2 | Sequence of tasks and outcome of the self assessment. Fatigue increased and motivation decreased with time on task, in particular after the monotonous Simon task (t1 and t3).

## RESULTS

#### Self Assessment

Fatigue increased with ToT, F(2, 48) = 9.70, p = 0.001, $\eta_p^2$ = 0.29, and motivation decreased, F(2, 48) = 13.92, p < 0.001, $\eta_p^2$ = 0.37. For both scales (see **Figure 2**), a clear modulation was found with the task performed [fatigue: F(3, 72) = 56.04, p < 0.001, $\eta_p^2$ = 0.70, motivation: F(3, 72) = 28.94, p < 0.001, $\eta_p^2$ = 0.55]. In particular, after the Simon task self-experienced fatigue was high and motivation was low. The time on task effect was more pronounced in older participants for mental fatigue, interaction ToT by Age: F(2, 48) = 3.33, p = 0.061, $\eta_p^2$ = 0.12, but not for motivation ratings, F(4, 48) = 0.069, p > 0.2, indicating that older adults experienced a stronger increase in mental fatigue than younger adults. On the other hand, for both measures some evidence for an interaction of age by task was found, fatigue: F(3, 72) = 2.44, p = 0.108, $\eta_p^2$ = 0.09, motivation: F(3, 72) = 3.23, p = 0.045, $\eta_p^2$ = 0.12, indicating more impact of the Simon task in younger compared to older adults.

#### Behavioral Data (Simon Task)

Older adults responded marginally slower than younger participants in the Simon task (see **Figure 3**), F(1, 23) = 3.99, p = 0.058, $\eta_p^2$ = 0.15, and responses were faster for S-R corresponding trials, F(1, 23) = 6.25, p < 0.001, $\eta_p^2$ = 0.76. No overall effect of ToT was found, F(2, 46) = 2.16, p = 0.132, $\eta_p^2$ = 0.09; however, within blocks (Sequence), response times were faster after the physical task, F(1, 23) = 7.67, p = 0.007, $\eta_p^2$ = 0.25. This phenomenon was most pronounced at the beginning of the experiment, interaction of ToT by Sequence: F(2, 46) = 3.99, p = 0.007, $\eta_p^2$ = 0.21. No systematic variation of time on task parameters with age was found.

No age effect was found for error rates, F(1, 23) = 0.40, p = 0.534, $\eta_p^2$ = 0.02, but there was an increase of error rates with non-corresponding trials, relative to corresponding trials, F(1, 23) = 39.31, p < 0.001, $\eta_p^2$ = 0.63. Error rates slightly increased with ToT, F(2, 46) = 5.16, p = 0.013, $\eta_p^2$ = 0.18. The latter effect was more pronounced in younger adults, interaction ToT by age: F(2, 46) = 2.79, p = 0.079, $\eta_p^2$ = 0.18, indicating that younger participants committed more errors in the last block of the experiment.

#### EEG Data

#### Gravity Frequency (GF), Alpha, and Theta power

GF (see **Figure 4**) did not overall vary with age, F(1, 24) = 1.70, p = 0.205, $\eta_p^2$ = 0.07, but an interaction of age by channel, F(1, 24) = 26.91, p < 0.001, $\eta_p^2$ = 0.53, was found. No age effect was visible at the anterior lead, F(1, 24) = 0.10, p > 0.05, whereas GF was reduced with higher age at the posterior electrode, F(1, 24) = 8.12, p = 0.018, $\eta_p^2$ = 0.25. The effect of ToT also did not reach significance, F(1, 24) = 2.93, p = 0.200, $\eta_p^2$ = 0.11. However, GF strongly varied with the task performed, F(2, 48) = 12.14, p < 0.001, $\eta_p^2$ = 0.34, and the effect of task was qualified by electrode position, F(2, 48) = 13.74, p < 0.001, $\eta_p^2$ = 0.36. At frontal leads, GF was higher in the Simon task compared to the self-paced D2 task (D2), F(1, 24) = 17.18, p < 0.001, $\eta_p^2$ = 0.41. Also, the physical task showed higher GFs than the D2 at the anterior lead, F(1, 24) = 10.00, p = 0.008, $\eta_p^2$ = 0.29. At POz, again the Simon task evoked higher GFs compared to the D2, F(1, 24) = 19.79, p < 0.001, $\eta_p^2$ = 0.45. At this electrode location no difference in GF was found between the two self-paced tasks (D2 and Boxes), F(1, 24) = 2.31, p > 0.2.

Alpha power (see **Figures 5**, **7**) increased with ToT, F(1, 24) = 17.76, p < 0.001, $\eta_p^2$ = 0.43, and varied with task, F(2, 48) = 10.96, p < 0.001, $\eta_p^2$ = 0.31. Pairwise comparisons revealed that in the cognitive tasks, alpha power was higher in the monotonous Simon task than in the self-paced D2 task, F(1, 24) = 15.40, p = 0.002, $\eta_p^2$ = 0.39. Comparing the two self-paced tasks (D2 and Boxes), alpha power was higher when participants performed the physical task, F(1, 24) = 18.18, p < 0.001, $\eta_p^2$ = 0.43. The effect of task was modulated both by age, F(2, 48) = 5.81, p = 0.012, $\eta_p^2$ = 0.19, and by ToT, F(2, 48) = 6.97, p = 0.004, $\eta_p^2$ = 0.23.

While ToT effects were obtained for both cognitive tasks [D2: F(1, 24) = 17.69, p < 0.001, $\eta_p^2$ = 0.42; Simon: F(1, 24) = 14.72, p = 0.003, $\eta_p^2$ = 0.38; Boxes: F(1, 24) = 4.44, p = 0.138, $\eta_p^2$ = 0.16], the age effect was restricted to the Simon task. Alpha power was enhanced in younger adults, F(1, 24) = 4.83, p = 0.076, $\eta_p^2$ = 0.17.

Theta power (see **Figures 6**, **7**) was reduced in older adults, F(1, 24) = 5.29, p = 0.030, $\eta_p^2$ = 0.18, and varied with the task performed, F(2, 48) = 26.64, p < 0.001, $\eta_p^2$ = 0.53. Theta power did not differ between the two cognitive tasks (D2 and Simon), F(1, 24) = 0.19, p > 0.5, but was markedly increased in the physical task compared to the self-paced cognitive D2 task, F(1, 24) = 37.66, p < 0.002, $\eta_p^2$ = 0.61. No effect of ToT was observed, F(1, 24) = 0.20, p > 0.5. Task effects were more pronounced at frontal leads, F(2, 48) = 46.81, p < 0.001, $\eta_p^2$ = 0.66, and varied across age groups, F(2, 48) = 3.67, p = 0.033, $\eta_p^2$ = 0.13. Significant age effects were only observed in the cognitive tasks, D2: F(1, 24) = 8.96, p = 0.018, $\eta_p^2$ = 0.27, Simon: F(1, 24) = 10.48, p = 0.012, $\eta_p^2$ = 0.30, but not in the physical task, F(1, 24) = 0.56, p > 0.5.

#### Blink-Related Desynchronization/Synchronization (ERD/ERS) of the EEG

As depicted in **Figure 8**, alpha activity synchronized after the eyes were opened. This effect was strongly modulated by experimental factors and differed across age groups. In the following, statistics will be reported for the mean ERD/ERS in the time window between 0 and 300 ms after the maximum of the blink in the EOG.

#### **Alpha ERD/ERS**

Event-related synchronizations in the alpha band (see **Figures 8**, **9**) after the blink were enhanced for older adults, F(1, 24) = 9.81, p = 0.005, $\eta_p^2$ = 0.29, and varied with the task performed, F(2, 48) = 6.97, p = 0.002, $\eta_p^2$ = 0.23. Pairwise comparisons of tasks, however, did not show any significant effects [Simon vs. D2, F(1, 24) = 3.95, p = 0.118, $\eta_p^2$ = 0.14, D2 vs. self-paced physical task, F(1, 24) = 2.21, p = 0.300, $\eta_p^2$ = 0.08]. In the overall analysis, a number of interactions were also observed, Age by ToT: F(1, 24) = 5.91, p = 0.023, $\eta_p^2$ = 0.20, Age by Task by Channel: F(2, 48) = 3.93, p = 0.026, $\eta_p^2$ = 0.14, Age by Channel by ToT: F(1, 24) = 4.38, p = 0.047, $\eta_p^2$ = 0.15, Age by Task by Channel by ToT: F(2, 48) = 3.52, p = 0.038, $\eta_p^2$ = 0.13, indicating that ERS was modulated by all experimental factors. ERS systematically increased with ToT in younger adults, F(1, 12) = 11.25, p = 0.012, $\eta_p^2$ = 0.48, but not in older ones, F(1, 12) = 1.49, p = 0.492, $\eta_p^2$ = 0.11. Post-tests separate for each task revealed no significant interactions of Age by ToT [D2: F(1, 24) = 4.72, p = 0.120, $\eta_p^2$ = 0.16, Boxes: F(1, 24) = 5.47, p = 0.084, $\eta_p^2$ = 0.19, Simon task, F(1, 24) = 0.09, p > 0.5].

#### **Theta ERD/ERS**

Overall, Theta ERS (see **Figure 10**) was enhanced in older adults, F(1, 24) = 7.52, p = 0.011, $\eta_p^2$ = 0.24, and varied with the task performed, F(2, 48) = 3.00, p = 0.059, $\eta_p^2$ = 0.11. Theta ERS was slightly higher in the Simon task compared to the self-paced D2 task, F(1, 24) = 4.31, p = 0.049, $\eta_p^2$ = 0.15, but did not differ between the two self-paced tasks, F(1, 24) = 0.21, p = 0.652, $\eta_p^2$ = 0.01. The effect of task was modulated by a number of other variables, Age by Channel by Task: F(2, 48) = 3.01, p = 0.059, $\eta_p^2$ = 0.11, Age by Task by ToT: F(2, 48) = 5.69, p = 0.006, $\eta_p^2$ = 0.19, Channel by Task by ToT: F(2, 48) = 5.38, p = 0.008, $\eta_p^2$ = 0.18. All of those interactions reached significance in the young group, but failed to do so in older adults, reflecting the fact that Theta ERS strongly increased in variance in older adults. Post-hoc tests revealed evidence toward an increase in Theta ERS for older adults in all tasks, D2: F(1, 24) = 5.00, p = 0.105, $\eta_p^2$ = 0.17, Simon: F(1, 24) = 7.21, p = 0.039, $\eta_p^2$ = 0.23, Boxes: F(1, 24) = 5.10, p = 0.099, $\eta_p^2$ = 0.18. An interaction of Age by ToT was only observed in the self-paced cognitive task, D2: F(1, 24) = 7.31, p = 0.036, $\eta_p^2$ = 0.23, Simon: F(1, 24) = 0.94, p > 0.5, Boxes: F(1, 24) = 0.00, p > 0.5.

In sum, the EEG data showed a pattern that is closely comparable to previous experimental settings. Alpha power increased with ToT (see Wascher et al., 2014b) and showed a marked reduction in older adults. The latter effect, however, was restricted to the monotonous cognitive task that most resembled a regular cognitive experiment. Theta power was reduced in older adults and also systematically varied with the task performed. Age-related effects in this measure were more pronounced in cognitive tasks. Finally, a strongly enhanced synchronization of both frequency bands was observed when ERS/ERD were investigated time-locked to the blink maximum in the EEG.

### DISCUSSION

In the present study, participants simulated a short (4–5 h) working shift in the post room of a wholesale house. They moved parcels and interacted with a computer in a repetitive sequence. The cognitive tasks on the computer were either repetitive (and rather monotonous) or self-paced. The design of the study was inspired by the real workflow in this particular working environment. During the entire shift, the EEG of the participants was recorded by mobile EEG equipment that did not restrict free movement and thus allowed natural behavior at any moment.

On average, subjectively experienced fatigue remained rather stable in younger adults during the entire shift, but increased for older adults. For both groups, however, fatigue was highly related to the task performed. After the monotonous computer task, fatigue ratings were substantially increased compared to the physical task. This finding nicely stresses the role of monotony for the experience of mental fatigue, which was more pronounced in younger adults who were obviously subjectively more affected by the monotonous task compared to older participants. These effects go along with a local decline in motivation in the monotonous cognitive task that was found only in the younger participants.

The enhanced impact of monotony upon younger participants is also nicely mirrored in behavioral and neurophysiological data. With increasing time on task, error rates but not response times increased in younger adults. At the end of the working shift, this led to an accuracy in the Simon task that was even lower in the younger than in the older participants. Considering the EEG, younger participants showed markedly increased alpha activity in this particular task. Referring to the assumption that high alpha power is related to an idle state of the attentional system (Hanslmayr et al., 2012), younger adults might have switched to a state of attentional withdrawal (see Wascher et al., 2014b).

Within the theoretical framework described in the Section Introduction, in which mental fatigue may result either from cognitive overload or from mental underload (May and Baldwin, 2009), the fatiguing factor in this case is a passive one (pTR), namely monotony and the decline of motivation that goes along with that. Older adults appear to deal better with monotony.

Factors that made them tired were more widespread across tasks. We cannot rule out that the physical task in particular was more demanding for older adults. Factors like central fatigue, i.e., a decrease of cognitive capabilities due to muscular strain (e.g., Davis, 1995; Blomstrand, 2001), might have influenced their behavior in terms of an active task-related (aTR) factor. Both age (Müller et al., 2009) and central fatigue (Hilty et al., 2011) have been reported to go along with larger cortical phase synchronization. In particular, early synchronizations in the EEG might indicate that older adults were far more driven by external signals (see Klimesch et al., 2002) compared to young participants (Zacks and Hasher, 1997; Lustig et al., 2007; Wascher et al., 2011, 2012). This is in accordance with a number of laboratory studies that showed amplified early EEG responses both in evoked potentials and in time frequency based analyses (e.g., Müller et al., 2009). This stronger impact of stimulation is assumed to be due to reduced executive cognitive control with increasing age that may affect numerous cognitive functions (Gazzaley et al., 2005, 2008; Grady et al., 2006). As a neurophysiological correlate for this deficit, reduced frontal theta activity has been discussed (Cummins and Finnigan, 2007), which was also found in the present study. This latter effect, however, was restricted to the computer-based cognitive tasks and disappeared when boxes had to be sorted. Thus, we can demonstrate that the decrease in theta power is not a global phenomenon with increasing age, but rather a task-dependent decline. Cognitive tasks therefore appear to be more demanding for older adults, because of deficient signal handling when information enters the system. Too much irrelevant information might be processed (Wascher et al., 2012), which is resource consuming. Therefore, mental fatigue in older adults is at least in part related to the exhaustion of cognitive functions.

Finally, regarding effects of time on task, alpha activity showed the well-known pattern of increasing power. In contrast to pure laboratory experiments (Wascher et al., 2014b), no saturation was visible in any task in the present study. This phenomenon might be due to the alternation of tasks, which interrupted monotony. In particular, the huge increase of alpha power in younger participants in the Simon task indicates that monotony was an important factor that drove alpha power. More interestingly, both measures of event-related synchronization/desynchronization showed convergence between age groups with time on task. If younger participants were impaired in particular by the passive task-related factor of monotony, a decline in motivation should accompany it. Reduced motivation is correlated with reduced activity in the frontal dopaminergic motivation system (Berridge and Robinson, 1998). An impairment of executive control functions was the consequence and led to more stimulus-driven behavior. This transient state resembles the aging brain, which lacks frontal activity due to physical decline (Bäckman et al., 2000).

Taken together, these results show that applying neural measures to a real-life work situation provided substantial information about mechanisms and causes of mental fatigue in younger and older adults. The core results were highly comparable to laboratory studies; therefore, the validity and reliability of the data appear to be sufficient. The diversity of tasks additionally provided important insight into the meaning and usefulness of particular neurophysiological measures for neuroergonomics. Most importantly, blink-related activity in the EEG (Berg and Davies, 1988) changed systematically with the task performed and with other experimental factors. As has been shown before (Wascher et al., 2014a), cognitive demands and cognitive strategies are reflected in these measures. Thus, it can be assumed that the re-opening of the eyes after a blink denotes a moment when new information enters the system, very similar to the presentation of a visual stimulus. This allows event-related EEG analyses to be performed without any external stimulation. In working situations in particular, nothing in the natural environment has to be changed. Nevertheless, aspects of information processing can be specifically addressed.

In summary, the present study demonstrates that mobile EEG provides substantial information about information processing at the workplace and its alteration due to fatigue or age-related aspects. The data pattern in self-assessments, behavioral data, and neurophysiological measures indicates that younger participants suffered most from monotony. Passive task-related fatigue led to deficits in information processing with time on task. Older adults, on the other hand, were challenged from the very beginning of the work shift by altered information processing. Due to declined executive control mechanisms, their information processing was much more stimulus-driven. Thus, the active process of overcoming this deficit appears to play a major role in mental fatigue in older workers in this particular working situation. Addressing this issue when designing the working environment and work flow could substantially improve the quality of life of employees.

#### AUTHOR CONTRIBUTIONS

EW wrote the MS and analyzed data. HH contributed to the design of the study and recorded data. SK was involved in the development of the theoretical framework. SA helped develop methods of analysis. SG contributed with respect to the theoretical embedding of age-related cognitive decline. TM provided theoretical aspects with respect to mental fatigue.

#### FUNDING

This research was supported by funding from the German Social Accident Insurance Institution for the trade and logistics sector (BGHW), the Federal Labour Office (Bundesagentur für Arbeit) and the "Metro AG."




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Wascher, Heppner, Kobald, Arnau, Getzmann and Möckel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Benefits of Instructed Responding in Manual Assembly Tasks: An ERP Approach

Pavle Mijović<sup>1</sup> \*, Vanja Ković<sup>2</sup>, Maarten De Vos<sup>3</sup>, Ivan Mačužić<sup>1</sup>, Branislav Jeremić<sup>1</sup> and Ivan Gligorijević<sup>1</sup>

<sup>1</sup> Department for Production Engineering, Faculty of Engineering, University of Kragujevac, Kragujevac, Serbia, <sup>2</sup> Laboratory for Neurocognition and Applied Cognition, Department for Psychology, Faculty of Philosophy, University of Belgrade, Belgrade, Serbia, <sup>3</sup> Department of Engineering, The Institute of Biomedical Engineering, University of Oxford, Oxford, UK

The majority of neuroergonomics studies focus mainly on the interaction between operators and automated systems. Far less attention has been dedicated to brain processes in more traditional workplaces, such as manual assembly, which are still ubiquitous in industry. The present study investigates whether assembly workers' attention can be enhanced if they are instructed with which hand to initiate the assembly operation, as opposed to the case when they can commence the operation with whichever hand they prefer. To this end, we replicated a specific workplace, where 17 participants simulated a manual assembly operation of the rubber hoses that are used in vehicle hydraulic brake systems while wearing a wireless electroencephalography (EEG) system. The EEG feature of interest was the amplitude of the P300 component of the event-related potential (ERP), which has previously been shown to be positively related to human attention. Reaction times (RTs) were also recorded as a behavioral, attention-related measure. Participants were presented with two distinct tasks during the simulated operation, which were counterbalanced across participants. In the first task, digits were used as indicators for the operation initiation (Numbers task), and participants could freely choose with which hand they would commence the action upon seeing the digit. In the second task, participants were presented with arrows, which served as instructed operation initiators (Arrows task), and they were instructed to start each operation with the hand that corresponded to the arrow direction. The results showed that the P300 amplitude was significantly higher in the instructed condition. Interestingly, the RTs did not differ across task conditions. This, together with the other findings of this study, suggests that attention levels can be increased using instructed responses without compromising work performance or operators' well-being, paving the way for future applications in manual assembly task design.

Keywords: neuroergonomics, wireless electroencephalography, event-related potentials, P300, attention, manual assembly

#### Edited by:

Klaus Gramann, Berlin Institute of Technology, Germany

#### Reviewed by:

Rolf Verleger, Universität zu Lübeck, Germany Edmund Wascher, Leibniz Research Centre for Working Environment and Human Factors, Germany

#### \*Correspondence: Pavle Mijović

p.mijovic@kg.ac.rs

Received: 01 October 2015 Accepted: 04 April 2016 Published: 20 April 2016

#### Citation:

Mijović P, Ković V, De Vos M, Mačužić I, Jeremić B and Gligorijević I (2016) Benefits of Instructed Responding in Manual Assembly Tasks: An ERP Approach. Front. Hum. Neurosci. 10:171. doi: 10.3389/fnhum.2016.00171

# INTRODUCTION

The importance of studying human brain processes during the execution of complex everyday tasks in naturalistic environments was pinpointed by Parasuraman (2003), through a new direction in human factors and ergonomics (HF/E) research. This novel direction was tentatively named neuroergonomics (Parasuraman, 2003; Parasuraman and Rizzo, 2006; Parasuraman, 2011). Although Parasuraman and Wilson (2008) modestly stated that neuroergonomics should not be thought of as revolutionary, but rather as another step in HF/E research, the growing body of neuroergonomics research refutes this statement. In fact, ever-advancing technology has facilitated neuroergonomics research, and only 12 years from its inception it has become one of the principal directions in HF/E research. Ultimately, understanding brain processes in naturalistic environments can lead to improved design of existing industrial processes and to the creation of safer and more efficient working conditions (Parasuraman, 2003), consequently improving the operators' overall well-being.

Neuroergonomics has had significant success in evaluating brain activity in its interaction with automated systems, through the studies of mental workload, dual-task performance (Ayaz et al., 2013) and operators' vigilance (Warm and Parasuraman, 2006; Warm et al., 2009). Additionally, it has gone a step further with the development of state-of-the-art neuroadaptive systems facilitating the mutual interaction between an automated system and operators, in the sense that both the human and the system can initiate a change in the level of automation when needed (Scerbo, 2006; Mehta and Parasuraman, 2013). On the one hand, this trend is understandable, as industry has tried over several decades to reach the "lights-out manufacturing" concept (Tompkins et al., 2010), i.e., completely automated factories that can operate without the direct presence of human operators in the production processes. In that case, human supervisory control of automated systems becomes essential (Sheridan and Parasuraman, 2005), as human operators would be solely responsible for controlling the automated production systems (Warm et al., 2008). On the other hand, although automation is becoming ubiquitous in industry and everyday life (Parasuraman and Wilson, 2008), the "lights-out" concept is still rather futuristic and there is still a need for human manual operations in production processes. This is especially notable in assembly tasks and processes where costs related to process automation are generally not justifiable (Tang et al., 2003).

For these reasons, it is evident that neuroergonomics studies should pay additional attention to more traditional workplaces, through investigation of concurrent physical and cognitive work. This approach has received far less attention in neuroergonomic studies (for review see Mehta and Parasuraman, 2013). For example, in the car manufacturing industries the majority of processes are automated; however, human operators play a crucial role in the final car cockpit and interior assembly, i.e., final assembly (Michalos et al., 2010). Typically, manual assembly tasks require a large number of repetitions and are monotonous in nature, thus leading to hypo-vigilance of operators (Spath et al., 2012). In turn, operators have difficulty sustaining the desired level of attention during the task, and the risk of work-related injuries, material damage or even accidents is therefore increased (Kletz, 2001). Employing existing neuroimaging techniques to understand how the brain processes various stimuli in this class of tasks could thus be beneficial, as the task design could be optimized to obtain and maintain sufficient operator attention, thereby avoiding possibly hazardous situations.

An extensive review of neuroimaging techniques applicable to neuroergonomics research has recently been published by Mehta and Parasuraman (2013). Although functional near-infrared spectroscopy (fNIRS) is a convenient technique for neuroergonomics research in naturalistic settings due to its light weight and portability (Ayaz et al., 2010, 2012; Mehta and Parasuraman, 2013), it suffers from low temporal resolution and its use in dynamic everyday environments is still somewhat limited (Gramann et al., 2011). On the other hand, electroencephalography (EEG) and the event-related potentials (ERPs) derived from it belong to the neuroimaging techniques that directly measure brain activity (Gramann et al., 2011; Mehta and Parasuraman, 2013), and both EEG and ERPs possess high temporal resolution (down to the order of milliseconds), making them suitable for real-time investigation of brain dynamics in complex environments (Gramann et al., 2011). Even though Parasuraman (1990) proposed the introduction of ERPs in ergonomics research, until recently traditional EEG recording suffered from long wiring between the electrode cap and amplifier unit, which engendered artifacts that degrade signal quality (Debener et al., 2012). Additionally, EEG recordings usually required shielded, dimly lit and sound-attenuated rooms, which was one of the main preconditions for recording and thus limited its use in naturalistic environments (Gramann et al., 2011). However, these problems were recently overcome by the development of wearable EEG systems, enabling its use in everyday and applied settings (Debener et al., 2012; De Vos et al., 2014; Wascher et al., 2014; Mijović et al., 2016). Consequently, operators' brain dynamics can nowadays be successfully investigated with wearable EEG in faithfully replicated workplaces, by simulating the work activity (Wascher et al., 2014; Mijović et al., 2016). This can provide insight into how the brain responds to complex industrial tasks, and these findings can contribute to more efficient task designs.

The aim of this article is the investigation of assemblers' mental states, by utilizing ERPs in a realistically replicated workplace. Neuroergonomics implies that overt performance measurements are unreliable (Parasuraman, 2003), since they do not provide the possibility for timely investigation of the underlying covert cognitive processes during everyday tasks. To get better insights into the temporal course of the underlying attention processes engaged in manual assembly operation, we selected two tasks in which we triggered goal-directed actions of workers by presenting them with either digits (in one) or arrows (in the other task) prior to initiating the operation. In this way we wanted to elicit the P300 ERP component (also called P3 or P3b), which is represented by the positive ERP voltage deflection that usually appears between 300 and 500 ms after appearance of the task-relevant stimuli (Polich and Kok, 1995; Verleger et al., 2005). The P300 component is often used to identify the depth of cognitive information processing and its amplitude and latency are considered to be related to the human attention level (Johnson, 1988; Polich, 2007; De Vos et al., 2014).

The P300 complex is most prominent over the midline scalp sites (Polich, 2007) and it is among the most prominent ERP components (Verleger et al., 2014), making it one of the most studied components of the human ERP. However, there is still a lack of consensus regarding what brain functions the P300 component represents (the arguments are briefly summarized in Verleger et al., 2014). One influential view is that the P300 component can be explained through the context updating hypothesis proposed by Donchin (1981), which holds that the P3 reflects the updating of working memory that is related to task-relevant and unexpected events. The context updating theory assumes that the mental processes that elicit the P3 component reflect a revision of the model of the environment rather than serving to organize a response to the eliciting stimulus (Verleger et al., 2005). In other words, it is assumed that following initial sensory processing, an attention-related process evaluates the representation of the previous event in working memory, and if a new stimulus in a train of standard stimuli is detected, the attention-related process updates this representation, which is followed by production of the P300 component (Polich, 2007). However, arguments against the context updating theory have also been raised (Verleger et al., 2005, 2014). In fact, Verleger et al. (2005) proposed a new hypothesis in which they argued that the P300 component is related both to stimulus processing and to organizing the response. To test this hypothesis, Verleger et al. (2005) compared the P3 amplitude in stimulus- and response-locked ERPs and found that both P3 amplitudes were comparable. This indicated that the P300 amplitude does not reflect just the simple reaction to stimulus change. Rather, P300 reflects a process that mediates between perceptual analysis and response (Verleger et al., 2005), i.e., it is related to the organization of the response and it depends on the stimulus-response links (Verleger et al., 2014).

Based on these findings, the present study investigated whether and how the neural correlates of goal-directed actions would differ if the operators were requested to initiate the simulated assembly operation spontaneously (upon seeing a digit), as opposed to the condition where participants were instructed with which hand to commence the operation (upon seeing an arrow). In the spontaneous condition (the Numbers task), we adopted the stimuli from the original SART paradigm, a simple "go/no-go" task that consists of consecutively presenting digits from "1" to "9", in which participants are required to give a speedy response to all stimuli with the exception of the digit "3" (Robertson et al., 1997). The main difference between the original SART and the Numbers paradigm (used in our study) is that the digits in Numbers are randomized. Further, the original SART paradigm requires participants to provide the speedy response with the index finger upon stimulus presentation. However, this would impede the simulation of the real working operation, since it would require an additional, task-unrelated operation from participants. Instead, in the Numbers paradigm, participants were instructed to initiate the assembly operation as soon as the visual (target) stimulus appeared on the screen, with whichever hand they felt more comfortable (the assembly operation is explained in detail in Section "Simulated Assembly Operation"). For the instructed responding (Arrows) task, we adopted the stimuli and procedures from Donkers and van Boxtel (2004). The Arrows task is essentially a choice reaction task, where arrows pointing to the left or right appear on the screen; white arrows represent the target ("go") condition, while red arrows represent the "no-go" stimulus. The main difference between the Numbers and Arrows tasks was that in the Numbers task participants could freely choose the hand with which they would initiate the assembly operation, while in the Arrows task participants were instructed to commence each operation with the hand that corresponded to the direction in which the white arrow on the screen was pointing. Importantly, the tasks differed not only in the stimulus itself (digit vs. arrow), but also in the informational value of the stimuli: the Arrows task arguably requires stimulus-response mapping, which in turn requires more cognitive evaluation, consequently inducing higher-level attentional processing than in the simple "go/no-go" task. In both the instructed and the spontaneous condition, the visual stimuli (digits and arrows) appeared in the center of a screen that was placed in front of the participants.

We expected attention, as assessed through the P300 amplitude, to be enhanced in the instructed responding (Arrows) task compared to the task in which participants could initiate the assembly operation upon seeing the task-unspecific cue (Numbers task). Further, we wanted to investigate whether the difference in task condition would also influence the reaction times (RTs), as the performance of the participants is also important, given that this study simulates a naturalistic assembly task replicated from industry. In other words, we wanted to investigate whether the participants would be slower when they are instructed with which hand they should start the assembly operation, as compared to the condition when they can spontaneously initiate the assembly operation with whichever hand they prefer.

# MATERIALS AND METHODS

#### Participants

Seventeen healthy subjects, one of whom was left-handed, aged between 19 and 21 years, volunteered as participants in the study. Due to abnormalities in the recording, three subjects were excluded from further analysis, leaving a total of 14 participants. The study was restricted to male participants, both to exclude possible inter-gender differences and to replicate the selected job task more faithfully, since in the company that supported our research only males occupy the specific workplace under study. Participants did not report any past or present neurological or psychiatric conditions and were free of medication and psychoactive substances. They were instructed not to consume any alcoholic drinks prior to, or on the day of, participation in the study. All participants had normal or corrected-to-normal vision. They agreed to participate in the study and signed informed consent after reading the experiment summary, in accordance with the Declaration of Helsinki. The Ethical Committee of the University of Kragujevac approved the study and procedures for the participants.

#### Replication of the Workplace

As we stated in the introduction, reliable EEG recording still relies on wet electrodes, limiting on-site industrial EEG recording. For that reason, we simulated the production process of the rubber hoses, which are used in hydraulic brake systems in the automotive industry, in a faithfully replicated workplace (**Figure 1**). A full-scale replica of the specific workplace was created at the laboratory of the University of Kragujevac, in consultation with the car sub-component manufacturing company. In order to create a naturalistic environment, all major elements from the real factory settings were included while preserving the respective spatial ratios and replicating ambient conditions.

The laboratory was air-conditioned and microclimate conditions were controlled, keeping the ambient temperature at 24 ± 1 °C, while the measured relative air humidity was between 40% and 60%. The lighting at the real workplace was also replicated from the industrial settings, using the same lighting and maintaining the illuminance at 810 lx. Finally, the noise trace was obtained by recording sounds in the vicinity of the original production facility, using a cardioid condenser microphone AT2020USB (Audio-technica, Japan), and this was replayed during the experiments with an SW-HF 5.1 6000 surround multimedia speaker (Genius, Taiwan). The ambient (light, noise) and microclimate (temperature, humidity) condition values were obtained using the multifunctional environmental meter PCE-EM882 (PCE instruments, UK).

The experimental setup used in this study was similar to previously reported studies (Mijović et al., 2015a,b, 2016), while the experimental task and procedure were modified. For clarity, we will repeat the detailed experimental setup here.

#### Simulated Assembly Operation

In the production process, an operator carries out a crimping operation in order to join a metal extension to a rubber hose. This single operation, carried out in a sitting position, consists of eight simple steps (actions). The step-by-step simulated operation, carried out by participants in the replicated working environment, is graphically presented in **Figure 2A** and explained in detail further in the text.

FIGURE 1 | Left image: Real workplace (replicated from our industrial partner); Right image: Replicated workplace.

The major production steps can be summarized as follows (**Figure 2A**): first, the information to initiate the simulated assembly operation is presented to the participant in the form of a visual stimulus (step 1, explained in detail in Section "Experimental Procedure"), upon which he is instructed to instantly initiate the operation by taking the metal part (step 2) and the rubber hose (step 3). Following this, the participant should place the metal part on the hose (step 4) and place both inside the crimping machine (step 5). Once the rubber hose and metal part are correctly placed inside the opening, the industrial green lamp lights up and presents a visual cue to the participant, informing him that the part has been correctly placed. The participant then proceeds by promptly pressing the pedal, which initiates the improvised machine and replicates the real machine's crimping sound with a duration of 3500 ms (step 6). The real crimping operation that would happen upon pressing the pedal was avoided, while preserving its major aspect from the operator's perspective: the sound it produces, the cessation of which indicates the end of machine operation, analogously to the real case. Upon completion of the simulated crimping process, the participant removes the component and places it in the box with completed parts (step 7). Finally, following these steps, the participant sits still, waiting for the subsequent stimulus (step 8) indicating the next-in-line operation.

Although the assembly task consists of eight sub-actions, the whole operation lasts less than 10 s and a single operator completes between 2500 and 3000 elements during a work shift. Hence, this workplace represents a typical example of a repetitive, monotonous operational task in industrial assembly settings.

FIGURE 2 | [...] pedal in order to initiate the simulated crimping operation; Step 7: Placing the completed part into the box with completed parts; Step 8: Waiting for the successive stimulus presentation. (B) Graphical representation of the Numbers task. (C) Graphical representation of the Arrows task.

#### Preparation

Each participant arrived at the laboratory at 9:00 a.m. After carefully reading the experiment summary and signing the informed consent for participation in the study, participants started the training session in order to gain familiarity with the task. Due to its simplicity, they were given 15 min for practicing, following which they confirmed their readiness to start the experiment. Finally, an EEG cap and amplifier were mounted on the participant's head (as explained in the Section "EEG Recording") and the recording started around 9:30 a.m.

#### Experimental Procedure

During the experiment, at least two experimenters were constantly present in the laboratory in order to ensure that experimental procedures were strictly followed. The experimenters were seated behind an opaque board (so that participants could not see them during the task) and they observed the participants through a red-green-blue (RGB) camera that recorded the entire experiment.

Participants were seated in a comfortable chair in front of an improvised workplace including the improvised machine (**Figure 1**). In order to extract the ERP component from the continuous EEG recording, a single functional modification in the simulated assembly task was made. Simultaneously with the simulated assembly process, the participants were subjected to either the Numbers (**Figure 2B**) or Arrows (**Figure 2C**) task to prompt initiation of the assembly operation. Both tasks were presented on a 24-inch screen at a distance of approximately 100 cm, in a balanced order across participants (with a 15 min break between the tasks). The screen was height adjustable and the center of the screen was set level with the participants' eyes. Upon presentation of the stimuli on the screen, the participants were instructed to complete the previously explained assembly operation (also graphically presented in **Figure 2A**).

All the stimuli were presented for 1000 ms on a black screen background. In both tasks the order of the stimuli was randomized, with the constraint that two "no-go" stimuli (digit "3" in the Numbers task, and red arrow in the Arrows task) never appeared consecutively. Additionally, in the Numbers task, five randomly allocated digit sizes were presented to increase the demands for processing the numerical value and to minimize the possibility that subjects would set a search template for some perceptual feature of the "no-go" trial (the digit "3"). Digit font sizes were 60, 80, 100, 120 and 140 in Arial font (similar to Dockree et al., 2005). The main difference between the tasks is that in the Arrows task the participants were instructed to initiate the simulated operation with the right hand (step 2) if the white arrow was pointing to the right, or with the left hand (step 3) if it was pointing left (as depicted in **Figure 2C**), while in the Numbers task the participants could freely choose between step 2 and step 3 (from **Figure 2A**) upon seeing the digit. Each task consisted of 500 stimuli, where the probability of appearance of the "no-go" stimuli was set at 10% (50 in total), while the "go" stimuli were presented 450 times. The inter-stimulus interval (ISI) between two consecutive "go" stimuli was on average 11,240 ms (STD = 410 ms), while between a "no-go" and the following "go" stimulus the average ISI was 3210 ms (STD = 120 ms). The duration of each task was around one and a half hours, after which participants had a 15 min break before starting the second task. Thus, the whole experiment lasted around 3 h and 15 min.
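The randomization constraint and the reported task duration can be illustrated with a minimal Python sketch. This is not the SNAP script used in the study: the function names `make_sequence` and `estimated_duration_min` are hypothetical, and the sketch assumes that every "go" trial is followed by the mean "go" ISI and every "no-go" trial by the mean "no-go" ISI.

```python
import random


def make_sequence(n_total=500, n_nogo=50, seed=0):
    """Return a random list of 'go'/'no-go' labels with no two adjacent 'no-go'."""
    rng = random.Random(seed)
    n_go = n_total - n_nogo
    # There are n_go + 1 gaps around the 'go' trials. Placing at most one
    # 'no-go' in each chosen gap guarantees that no two 'no-go' trials touch.
    chosen_gaps = set(rng.sample(range(n_go + 1), n_nogo))
    sequence = []
    for gap in range(n_go + 1):
        if gap in chosen_gaps:
            sequence.append("no-go")
        if gap < n_go:
            sequence.append("go")
    return sequence


def estimated_duration_min(sequence, isi_go_s=11.24, isi_nogo_s=3.21):
    """Rough task duration in minutes (assumption: one mean ISI per trial)."""
    total_s = sum(isi_nogo_s if label == "no-go" else isi_go_s
                  for label in sequence)
    return total_s / 60.0


if __name__ == "__main__":
    seq = make_sequence()
    print(len(seq), seq.count("no-go"), round(estimated_duration_min(seq), 1))
```

Under these assumptions, 450 "go" trials at 11.24 s plus 50 "no-go" trials at 3.21 s give about 87 min, which is consistent with the reported duration of around one and a half hours per task.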

The task specifications were programmed in the Simulation and Neuroscience Application Platform (SNAP)<sup>1</sup>, developed by the Swartz Center for Computational Neuroscience (SCCN). As explained in Bigdely-Shamlo et al. (2013) and Gramann et al. (2014), SNAP is a Python-based experiment control framework that is able to send markers as strings to Lab Streaming Layer (LSL)<sup>2</sup>. LSL is a real-time data collection and distribution system that allows multiple continuous data streams as well as discrete marker timestamps to be acquired simultaneously in the eXtensible Data Format (XDF)<sup>3</sup>. This data collection method provides synchronous, precise recording of multi-channel, multi-stream data that is heterogeneous in both type and sampling rate (Bigdely-Shamlo et al., 2013; Gramann et al., 2014), and is obtained via a local area network (LAN).

#### EEG Recording

EEG data acquisition was performed using the SMARTING (mBrainTrain, Serbia) wireless EEG system, with a sampling frequency of 500 Hz and 24-bit data resolution. The small and lightweight EEG amplifier (85 × 51 × 12 mm, 60 g) was tightly connected to a 24-channel electrode cap (Easycap, Germany) at the occipital site of the participant's head, using an elastic band. The EEG amplifier communicated with the recording computer via Bluetooth, and the data were streamed to the described LSL recorder. The design of the cap-amplifier unit ensured minimal isolated movement of individual electrodes, cables, or the amplifier, which strongly reduced electromagnetic interference and movement artifacts. Further, the small dimensions of the recording system provided full mobility and comfort to the participants, as movement constraints were not imposed. The electrode cap contained sintered Ag/AgCl electrodes that were placed according to the international 10–20 System: Fp1, Fp2, Fz, F7, F8, FC1, FC2, Cz, C3, C4, T7, T8, CPz, CP1, CP2, CP5, CP6, TP9, TP10, Pz, P3, P4, O1 and O2. The electrodes were referenced to FCz and the ground electrode was AFz. The experimental procedure required electrode impedances to be below 5 kΩ before the start of the experiment, which was confirmed by the device acquisition software.

#### ERP Processing

EEG signal processing was performed offline using EEGLAB (Delorme and Makeig, 2004) and MATLAB (Mathworks Inc., Natick, MA, USA). EEG data were first bandpass filtered in the 1–35 Hz range, after which the signals were re-referenced to the average of the mastoid channels (TP9 and TP10). Further, an extended infomax Independent Component Analysis (ICA) was used to semi-automatically attenuate contributions from eye blink and (sometimes) muscle artifacts (as explained in Viola et al., 2009; De Vos et al., 2010, 2011). After this preprocessing, ERP epochs were extracted from −200 to 800 ms with respect to the timestamps of the ''go'' and ''no-go'' stimuli indicated by the SNAP software. Baseline correction was performed by subtracting the mean value of the −200 to 0 ms pre-stimulus period. The electrode sites of interest for the ERP analysis in this study were Fz, Cz, CPz and Pz, as the P300 component is most prominent over the central and parietocentral scalp locations (Picton, 1992).

<sup>1</sup>https://github.com/sccn/SNAP

<sup>2</sup>https://code.google.com/p/labstreaminglayer/

<sup>3</sup>https://code.google.com/p/xdf/

For the ''no-go'' condition we extracted and averaged the ERPs across the trials. For the ''go'' condition, the ERPs that preceded the ''no-go'' condition were calculated. Following these steps, the grand average (GA) ERPs across participants were formed. Further, the P300 amplitude was calculated for both ''go'' and ''no-go'' conditions and for each experimental condition, using mean amplitude measure (Luck, 2005) in the time window from 350 to 450 ms, with regard to the time stamps of the stimuli. Finally, the statistical analysis on the obtained results was carried out.
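The epoching, baseline correction and mean-amplitude steps can be sketched for a single channel as follows. This is an illustration in plain Python rather than the EEGLAB/MATLAB pipeline used in the study; the function names and the synthetic signal are assumptions, while the sampling rate (500 Hz), epoch limits (−200 to 800 ms) and P300 window (350 to 450 ms) are taken from the text.

```python
FS = 500  # sampling rate in Hz, as reported for the recording


def ms_to_samples(ms, fs=FS):
    """Convert a latency in milliseconds to a whole number of samples."""
    return int(round(ms * fs / 1000.0))


def extract_epoch(signal, onset, tmin_ms=-200, tmax_ms=800, fs=FS):
    """Cut one epoch around a stimulus onset (sample index) and baseline-correct it."""
    start = onset + ms_to_samples(tmin_ms, fs)
    stop = onset + ms_to_samples(tmax_ms, fs)
    if start < 0 or stop > len(signal):
        raise ValueError("epoch exceeds the recording")
    epoch = signal[start:stop]
    n_base = onset - start  # number of pre-stimulus samples
    baseline = sum(epoch[:n_base]) / n_base
    return [v - baseline for v in epoch]


def average_erp(epochs):
    """Average equally long epochs sample by sample."""
    n = len(epochs)
    return [sum(col) / n for col in zip(*epochs)]


def mean_amplitude(erp, win_ms=(350, 450), tmin_ms=-200, fs=FS):
    """Mean amplitude of the ERP inside a post-stimulus window."""
    a = ms_to_samples(win_ms[0] - tmin_ms, fs)
    b = ms_to_samples(win_ms[1] - tmin_ms, fs)
    window = erp[a:b]
    return sum(window) / len(window)


# Synthetic channel: 10 uV offset plus a 5 uV deflection 350-450 ms after each onset.
signal = [10.0] * 5000
onsets = [1000, 3000]
for onset in onsets:
    for i in range(onset + 175, onset + 225):
        signal[i] += 5.0

erp = average_erp([extract_epoch(signal, o) for o in onsets])
p300 = mean_amplitude(erp)
print(len(erp), p300)
```

In this toy signal the constant offset is removed by the baseline step, so the mean amplitude in the window reflects only the simulated deflection.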

## Reaction Times

As already stated in Section ''Experimental Procedure'', our experimental design did not allow subjects to react with a button press upon seeing the visual ''go'' stimulus. Therefore, the RT could not be measured in the traditional fashion, as the time elapsed between the stimulus presentation and the response by the participants (usually executed with the right index finger). Instead, the RTs here were measured as the time elapsed between the stimulus presentation (step 1) and the pedal press (step 6 from the Section ''Preparation'', also depicted in **Figure 2A**). The pedal used in our study was a modified mouse button, connected to the recording computer via USB. As LSL is capable of real-time recording of the timestamps of mouse button presses, it enabled us to gather precise information regarding the time when the pedal was pressed. This allowed the calculation of RTs as the difference between the timestamps of stimulus presentation (operation initiation) and the beginning of the machine-simulated crimping process.
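The RT definition above amounts to pairing each stimulus timestamp with the first pedal-press timestamp that follows it. A minimal sketch in plain Python is given below; the function name `reaction_times`, the rule of ignoring presses that occur after the next stimulus, and the example timestamps are assumptions made for illustration and are not taken from the study's analysis scripts.

```python
from bisect import bisect_right


def reaction_times(stimulus_times, pedal_times):
    """Pair each stimulus with the first pedal press that follows it.

    Both lists hold timestamps in seconds on the same clock, sorted ascending.
    Returns one entry per stimulus: the RT in seconds, or None when no press
    occurs before the next stimulus (e.g., a trial without a completed operation).
    """
    rts = []
    for k, t_stim in enumerate(stimulus_times):
        if k + 1 < len(stimulus_times):
            t_next = stimulus_times[k + 1]
        else:
            t_next = float("inf")
        pos = bisect_right(pedal_times, t_stim)  # first press strictly after the stimulus
        if pos < len(pedal_times) and pedal_times[pos] < t_next:
            rts.append(pedal_times[pos] - t_stim)
        else:
            rts.append(None)
    return rts


# Hypothetical timestamps: three stimuli, the second without a pedal press.
go_onsets = [10.0, 21.0, 32.0]
pedal_presses = [14.5, 36.25]
rts = reaction_times(go_onsets, pedal_presses)
print(rts)
```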

#### Error Processing

Errors of omission were classified as the errors occurring when participants did not respond to the appearance of the ''go'' stimuli. Processing commission errors was challenging, since our task did not require a speeded button press and the errors of commission were therefore difficult to interpret. The most obvious classification of a commission error would be when participants completely execute the simulated operation upon appearance of the ''no-go'' stimuli. However, it is important to note that participants sometimes made slight movements upon appearance of the ''no-go'' stimuli (in the sense that they showed an intention to initiate the action) and then inhibited the response upon realizing that it was a ''no-go'' stimulus. We classified this kind of error as a near-miss. The identification of near-misses and commission errors was conducted initially by the experimenters in the room and subsequently confirmed in an off-line analysis, by replaying the videos recorded with the RGB camera during the experiment.

#### Statistical Analysis

The statistical analysis was performed using IBM SPSS software. The ERPs used for statistical analysis included all ERPs related to the ''no-go'' condition and 50 ERPs related to ''go'' preceding the ''no-go'' condition. A 4 × 2 × 2 × 2 repeated measures analysis of variance (ANOVA) was conducted with Site (Fz, Cz, CPz and Pz), Task (Arrows vs. Numbers) and Condition (''go/no-go'') as within-subject factors and Order of presentation (first vs. second) as between-subject factor. Additionally, a 2 × 2 ANOVA comparing RTs across Task (Arrows vs. Numbers) as within-subject factor and Order of presentation (first vs. second) as between-subject factor was conducted. Finally, we carried out a 2 × 2 ANOVA comparing near-misses across Task (Arrows vs. Numbers) as within-subject factor and Order of presentation (first vs. second) as between-subject factor. Greenhouse-Geisser corrections (FG) were applied where necessary. Since the participants did not make any omission errors and only seven commission errors occurred across the participants, these errors were excluded from further statistical analysis.

# RESULTS

#### Behavioral Results

#### Reaction Times

The 2 × 2 ANOVA comparing RTs across Task (Arrows vs. Numbers) as within-subject factor and Order of presentation (first vs. second) as between-subject factor revealed neither significant main effects nor interaction effects.

#### Errors

As stated in the ''Materials and Methods'' Section (Section ''Statistical Analysis''), the participants did not make any omission errors, and the low number of commission errors was not statistically analyzed. Regarding near-misses, however, the ANOVA revealed only a significant effect of Task (F(1,8) = 11.9, p < 0.01, η = 0.60), with more near-misses occurring in the Numbers than in the Arrows task.

## ERP Results

The GA ERPs for each task (Arrows and Numbers), each condition (''go/no-go'') and each electrode site under study (Fz, Cz, CPz and Pz) are depicted in **Figure 3**.

The 4 × 2 × 2 × 2 ANOVA revealed that the ERPs differed depending on the condition (Go/No-Go: F(1,12) = 5.99, p < 0.05, η = 0.33), the task (Task: F(1,12) = 17.06, p < 0.001, η = 0.59), the order of presentation (Order of presentation: F(1,12) = 15.635, p < 0.01, η = 0.57) and across the scalp (Site: F(1.48,17.75) = 5.352, p < 0.05, η = 0.31). Namely, the P300 amplitudes elicited for ''go'' trials were higher than for ''no-go'' trials (M = 5.73, SD = 1.47; M = 2.25, SD = 1.41, respectively). Further, the Arrows task produced higher amplitudes in comparison to Numbers (M = 5.24, SD = 1.11; M = 2.73, SD = 1.46, respectively). The P300 amplitudes elicited with regard to the Order of presentation demonstrated higher amplitudes for whichever task was presented first in comparison to second task (M = 5.11, SD = 1.31; M = 2.86, SD = 1.54, respectively). Finally, amplitudes elicited at Pz were significantly higher than the amplitudes at the other three sites and amplitudes at CPz site were higher than at Cz and Fz sites at the p < 0.05 level. All the other comparisons were significant in the same direction apart from the Fz-Cz difference.
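The reported effect sizes can be cross-checked against the F statistics. The sketch below assumes that the η values are partial eta squared, which the text does not state explicitly, and uses the standard identity η²p = (F · df_effect) / (F · df_effect + df_error). The function name is hypothetical, and the figures are copied from the Results text.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, effect size as reported in the text)
reported = [
    ("Near-misses: Task", 11.9, 1, 8, 0.60),
    ("ERP: Go/No-Go", 5.99, 1, 12, 0.33),
    ("ERP: Task", 17.06, 1, 12, 0.59),
    ("ERP: Order of presentation", 15.635, 1, 12, 0.57),
    ("ERP: Site (Greenhouse-Geisser df)", 5.352, 1.48, 17.75, 0.31),
]

recomputed = [
    (label, round(partial_eta_squared(f, df1, df2), 2), value)
    for label, f, df1, df2, value in reported
]
for label, computed, value in recomputed:
    print(f"{label}: recomputed {computed:.2f}, reported {value:.2f}")
```

Under this assumption every recomputed value agrees with the reported one to two decimals.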

**Figure 4** depicts the GA ERPs elicited over all four electrode sites under study for the ''go'' condition.

The P300 amplitude differences for all four sites and depending on the task representation order are presented in **Figure 5**.

#### DISCUSSION

The present study investigated whether operators' attention is enhanced when they are instructed with which hand to initiate the manual assembly operation, as compared to spontaneous and free choice of preferred hand. The attention was assessed through the P300 amplitude, as it is widely accepted that the P300 amplitude is positively related to the human level of attention (Ford et al., 1994; Polich, 2007; De Vos et al., 2014). For this aim we simulated a manual assembly operation, where we provided the participants with two distinct psychological tasks (Numbers and Arrows) simultaneously with the simulated operation.

The P300 amplitude was significantly higher for the frequent ''go'' (target) condition than for the infrequent ''no-go'' condition (as presented in **Figure 3**). This finding is in contrast to the majority of previously reported studies, where an infrequent target condition elicits a higher P300 amplitude, since the participants are usually required to note the occurrence of infrequent targets by button press or by silent counting (Strüber and Polich, 2002). In our task, by contrast, the target stimuli were the frequent ones, as the continuity of operation in manual assembly is essential, while the participants were instructed just to sit still and take no action during the infrequent ''no-go'' condition. As such, it is not surprising that a lower P300 amplitude was elicited in the infrequent non-target condition, as passive stimulus processing induces smaller P300 amplitudes than active tasks (Polich, 2007). This is also supported by the results of Potts et al. (2004), who reported that the P300 amplitude was larger in the frequent ''go'' condition than in the rare non-target condition in a task where the ratio between ''go'' and ''no-go'' conditions was 80/20. Moreover, it has been found that the ISI between target stimuli influences the P300 amplitude, in the sense that a short ISI leads to decreased amplitude, while relatively long ISIs elicit a higher P300 amplitude, which is the case even in the single-stimulus paradigm (Strüber and Polich, 2002; Polich, 2007). This was also the case in our study, since the ISI was relatively long (approximately 11 s), and we believe that it was suitable for eliciting the P300 even in the frequent target condition.


The main finding of the present study is that the P300 amplitude was considerably higher when participants were instructed with which hand to initiate the simulated assembly operation, as compared to the case when participants could freely choose the preferred hand for the operation initiation. This may not be surprising, since in the choice reaction task (Arrows) participants were subjected to slightly higher demands of incoming stimulus evaluation, as they were unaware of the direction in which the white arrow stimuli would point. On the other hand, the digit stimulus carries considerably less information, as participants are required just to distinguish whether it is a ''go'' or ''no-go'' stimulus and to perform their action accordingly, i.e., the participants may stop evaluating the content of the stimuli after some time. Therefore, the response selection requirements during the Arrows task are substantially higher than in the Numbers task, which may lead to increased P300 amplitude in the condition that required instructed responding from the participants (Verleger et al., 2005, 2014). Following this finding, it may be proposed that workers on repetitive and monotonous assembly tasks should not receive information solely on whether they should initiate the operation or not, but that it would be beneficial if they received information that carries slightly higher cognitive demands. In fact, the task that consisted of the stimuli with the higher cognitive demands induced the higher P300 amplitude, which may be related to the attention of the worker to the task at hand. An important caveat, however, is that the P300 amplitude in this study may not reflect solely the attention level of a worker, but may also be influenced by the different cognitive demands of the tasks. For that reason, it is important to further investigate whether the P300 amplitude was influenced by the presented task demands, or whether it was solely related to the attention of the workers.

Interestingly, although it was expected that the RTs could differ between the two tasks, this was not the case in our study. One of the possible reasons for the absence of the response time effect could be the methodology used for the RT calculation. In fact, the time period for RT calculation is much longer than in conventional studies, where a speeded response from the participants is expected. Apart from that, the RT calculation includes several coordinated hand movements before the foot switch is pressed. All of these could induce a large variation within and between subject conditions, which may reduce the accuracy of the RT methodology used in this study. Further, with regard to behavioral measurements, the number of commission errors was relatively low and did not differ between the tasks. However, there was a significantly higher number of near-misses in the Numbers than in the Arrows task. The larger number of near-misses in the Numbers task may be expected, as the Arrows task imposes a higher workload on the participants, due to the higher response selection requirements, and, as previously reported, errors and mental workload are related according to a U-shaped curve (Desmond and Hoyes, 1996).

Although we showed that the Arrows task produced a higher P300 amplitude than the Numbers task, one could argue about the selection of the tasks, as the stimulus type differed between task conditions (digits vs. arrows). The main reason for not investigating the difference between instructed and non-instructed conditions with the same type of stimuli was the avoidance of the interference effect (Pashler, 1994). In fact, if only stimuli from the Numbers task had been used, with directions assigned to specific digits in the hand-instructing task (e.g., odd numbers meaning left hand first and even numbers right hand first), it is highly likely that memory would have strongly influenced attentional processing. On the other hand, if we had only used the Arrows stimulus type, an undesired bias would have been introduced in the condition in which participants could initiate the operation with their preferred hand. An additional concern is whether the two distinct psychological tests trigger different attentional resources, given that they are composed of different stimulus types and that the Arrows task alternates the response hand, while in the Numbers task participants could respond with whichever hand they preferred. The answer to this doubt could be found in the premotor theory of attention (Rizzolatti et al., 1994), which states that attention orienting processes are triggered during uni-manual response preparation and that the orienting processes are assumed to be equivalent to the processes elicited during instructed endogenous shifts of spatial attention (Eimer et al., 2005). Moreover, Ranzini et al. (2009) also used tasks with Arabic digits and arrows and demonstrated that the processes evoked by these cues are alike and that volitional and non-volitional attentional shifts rely on the same fronto-parietal brain networks. Thus, both the Numbers and Arrows tasks should evoke the same cognitive resources of attention, which gives legitimacy to the choice of the tasks used in this study.

One of the limitations of the present study is that it was conducted in a simulated working environment, instead of a real factory setting. The main reason for this was the use of a wet-electrode EEG recording system, which is still uncomfortable for application in actual industrial environments. Nevertheless, we replicated both the spatial dimensions and ambient conditions and performed the wearable EEG study, demonstrating its applicability for the investigation of covert cognitive processes in naturalistic environments for HF/E studies. Another limitation is that, simultaneously with the simulated operation, we used two distinct psychological tests, with the aim of eliciting the P300 ERP component. Although it could be argued that psychological tests could interfere with the simulated operation, it is important to note that assembly workers should be provided with timely information regarding the performed operation (Stork and Schubö, 2010). Therefore, we believe that this modification did not significantly differ from the actual assembly operation in industrial environments. Moreover, in naturalistic settings it is usually hard to isolate and analyze specific cognitive processes, since they must first be evoked and co-occurring cognitive factors must be isolated (Bulling and Zander, 2014). Thus, this modification in the information presentation to the participants was necessary in order to elicit the anticipated P300 ERP component during the simulated assembly operation. Unfortunately, the present study is unable to compare brain responses between self-paced work routines (as in this specific workplace) and the externally paced work routines that we used in our study. This issue should be addressed in future studies.

The present study demonstrated that wearable EEG recording could be beneficial for task design in HF/E studies. Future studies should investigate whether the reported findings also hold for similar job positions, which are monotonous and repetitive in nature but require continuous focus of the worker on the industrial task (e.g., quality control tasks). Although the present study utilized wearable EEG in a faithfully replicated workplace environment, it seems that it is just a matter of time until EEG systems will be willingly accepted for everyday use (van Erp et al., 2012; Mihajlovic et al., 2015). This could even lead to the application of passive brain-computer interfaces, which could be used for real-time assessment of cognitive user states in industrial environments (Zander and Kothe, 2011). Nevertheless, the fact that it is nowadays possible to investigate brain dynamics during natural movements of the recorded individual (without imposing movement constraints) brings us a step closer to the guiding principle of neuroergonomics, that is, to investigate how the brain carries out the complex tasks of everyday life and not just simplified and artificial tasks in laboratory settings (Parasuraman and Rizzo, 2006).

# CONCLUSION

Comparing the monotonous (''go/no-go'') Numbers task to the choice-reaction (Arrows) task, which instructs the participants with which hand to commence the assembly operation, the present study indicates that the latter is more suitable for preserving participants' attention during the initiation of an externally paced assembly task. This finding was achieved through investigation of the ERP waveform, where it was found that the P300 amplitude, which is related to the level of attention, was enhanced in the task that instructed the participants with which hand to initiate the simulated assembly operation. This study demonstrated the potential benefits of introducing EEG measurements into industrial task design, as from the presented results it may be concluded that in monotonous assembly tasks, instructed responding, or a similar method of engagement, should be imposed on operators, since it is indicated that additional engagement enhances the worker's attention.

# AUTHOR CONTRIBUTIONS

Study design and protocols were created by PM, VK, MDeV, BJ, IM and IG. Data acquisition was performed by PM and IG. Data analysis was performed by PM, VK, MDeV, IM and IG. The data interpretation was performed by PM, VK, BJ, IM and IG. The manuscript was written by PM and VK and critical editing was performed by MDeV, BJ, IM and IG. Final approval of the version was obtained from all co-authors. All co-authors agree on all aspects of the work and ensure that questions related to the accuracy and integrity of any part of the submitted work are appropriately investigated and resolved.

# ACKNOWLEDGMENTS

This research is financed under EU—FP7 Marie Curie Actions Initial Training Networks—FP7-PEOPLE-2011-ITN, project name ''Innovation Through Human Factors in Risk Analysis and Management (InnHF)'', project number: 289837. We would further like to acknowledge company ''Gomma Line'' (Serbia), for their assistance and advisory during the experimental setup phase. We would like to acknowledge Nora Balfe from Center for Innovative Human Systems (CIHS), Trinity College Dublin, Department for Psychology, for the critical comments of the manuscript.

# REFERENCES


**Conflict of Interest Statement**: IG is associated with the mBrainTrain Company, supplier of wireless EEG system ''SMARTING'' used in this study. However, no financial or other conflicting interests arise from this fact. The other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Mijović, Ković, De Vos, Mačužić, Jeremić and Gligorijević. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Pre-Trial EEG-Based Single-Trial Motor Performance Prediction to Enhance Neuroergonomics for a Hand Force Task

Andreas Meinel<sup>1</sup>, Sebastián Castaño-Candamil<sup>1</sup>, Janine Reis<sup>2</sup> and Michael Tangermann<sup>1</sup>\*

<sup>1</sup> Brain State Decoding Lab, Cluster of Excellence BrainLinks-BrainTools, Department of Computer Science, Albert-Ludwigs-University, Freiburg, Germany, <sup>2</sup> Department of Neurology, Albert-Ludwigs-University, Freiburg, Germany

We propose a framework for building electrophysiological predictors of single-trial motor performance variations, exemplified for SVIPT, a sequential isometric force control task suitable for hand motor rehabilitation after stroke. Electroencephalogram (EEG) data of 20 subjects with a mean age of 53 years were recorded prior to and during 400 trials of SVIPT. The trials were executed within a single session with the non-dominant left hand, while subjects received continuous visual feedback of the produced force trajectories. The behavioral data showed strong trial-by-trial performance variations for five clinically relevant metrics, which accounted for reaction time as well as for the smoothness and precision of the produced force trajectory. 18 out of 20 tested subjects remained after preprocessing and entered offline analysis. Source Power Comodulation (SPoC) was applied to EEG data of a short time interval prior to the start of each SVIPT trial. For 11 subjects, SPoC revealed robust oscillatory EEG subspace components whose bandpower activity is predictive of the performance of the upcoming trial. Since SPoC may overfit to non-informative subspaces, we propose to apply three selection criteria accounting for the meaningfulness of the features. Across all subjects, the obtained components were spread along the frequency spectrum and showed a variety of spatial activity patterns. Those containing the highest level of predictive information resided in and close to the alpha band. Their spatial patterns resemble topologies reported for visual attention processes as well as those of imagined or executed hand motor tasks. In summary, we identified subject-specific single predictors that explain up to 36% of the performance fluctuations and may serve for enhancing neuroergonomics of motor rehabilitation scenarios.

Keywords: single-trial performance prediction, trial-by-trial variability, isometric force modulation, hand motor rehabilitation, visuomotor integration, EEG, oscillatory subspace, spatial filtering

#### Edited by:

Klaus Gramann, Berlin Institute of Technology, Germany

#### Reviewed by:

Sara L. Gonzalez Andino, Hôpitaux Universitaires de Genève, Switzerland; Floriana Pichiorri, Fondazione Santa Lucia, IRCCS, Italy; Jörn M. Horschig, Artinis Medical Systems BV, Netherlands

#### \*Correspondence:

Michael Tangermann, michael.tangermann@blbt.uni-freiburg.de

Received: 15 December 2015; Accepted: 04 April 2016; Published: 25 April 2016

#### Citation:

Meinel A, Castaño-Candamil S, Reis J and Tangermann M (2016) Pre-Trial EEG-Based Single-Trial Motor Performance Prediction to Enhance Neuroergonomics for a Hand Force Task. Front. Hum. Neurosci. 10:170. doi: 10.3389/fnhum.2016.00170

# 1. INTRODUCTION

Motor training is utilized in rehabilitation scenarios to accelerate the re-gain of lost motor function after brain injury. State-of-the-art rehabilitation concepts are based on repetitive training tasks with the aim to reach a functional gain (Dobkin, 2004; Timmermans et al., 2009; Langhorne et al., 2011). The most prominent training paradigms comprise mirror training (French et al., 2007), constraint-induced movement therapy (Wolf et al., 2002), simultaneous bilateral training (Coupar et al., 2010), BCI-supported training (Ang and Guan, 2013) and robot-assisted techniques (Kwakkel et al., 2008). Recent rehabilitation approaches include the training of novel, unfamiliar motor skills instead of training well-known habitual motor tasks, attempting to optimize functional cortical reorganization.

Repetitive paradigms allow for the assessment of motor performance on a very fine-grained time scale. The performance of each single trial can be monitored by metrics such as the length, speed or smoothness of the produced movement trajectory. The distributions and temporal characteristics of trial-wise motor performance variations have been studied by different groups (Abe and Sternad, 2013; van Beers et al., 2013; Wu et al., 2014; Hadjiosif and Smith, 2015). While practicing a motor task over several sessions enables a user to acquire a skill (Lage et al., 2015), trial-by-trial variability of motor performance is a prominent feature which does not fully vanish with training (Cohen and Sternad, 2009; Osu et al., 2015). The underlying neural mechanisms of motor performance fluctuations on short time scales are a subject of controversial discussion in the literature and are not yet fully resolved (Faisal et al., 2008; Hadjiosif and Smith, 2015; Osu et al., 2015).

In the present work, we aim toward closing this gap. To this end, trial-wise performance fluctuations of a sequential visuo-motor task (SVIPT; Camus et al., 2009; Reis et al., 2009; Fritsch et al., 2010) are investigated while registering a user's brain activity by EEG. In SVIPT trials, the quality of a movement changes within seconds and from repetition to repetition.

Our hypothesis is that subject-specific pre-trial brain signals can partially explain and temporally predict the trial-by-trial fluctuations of the upcoming motor performance. If such informative neural markers exist, the SVIPT paradigm could be altered in order to meet the cognitive ergonomic requirements of each single user. Practically, the starting time point of the upcoming trial can be determined based on the information contained in this pre-trial neural marker. Ideally, such a neuroergonomic closed-loop gating strategy could provide control over the level of difficulty. This should make it possible to causally influence user performance and ultimately support SVIPT motor learning in the long run.
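The gating idea can be illustrated with a short sketch: a trial is started as soon as a pre-trial marker indicates a favorable state, with a fallback so that the user never waits indefinitely. Everything in this example is hypothetical, including the function name `gate_trial_start`, the threshold, the timeout, and the assumption that larger marker values indicate a more favorable state; it is not the procedure used in the study, which was analyzed offline.

```python
def gate_trial_start(marker_stream, threshold, max_wait):
    """Decide at which time step the next trial should start.

    marker_stream: iterable of pre-trial marker values, one per time step.
    threshold:     start as soon as a value reaches this level (assumed favorable).
    max_wait:      start anyway after this many steps, so the wait stays bounded.
    Returns (step, triggered), where triggered is False for a timeout start.
    Returns (None, False) if the stream ends before either condition is met.
    """
    for step, value in enumerate(marker_stream):
        if value >= threshold:
            return step, True
        if step + 1 >= max_wait:
            return step, False
    return None, False


# Hypothetical marker values sampled once per time step.
favorable = gate_trial_start([0.2, 0.4, 0.9, 0.3], threshold=0.8, max_wait=10)
timed_out = gate_trial_start([0.1, 0.1, 0.1, 0.1, 0.1], threshold=0.8, max_wait=3)
print(favorable, timed_out)
```

Raising or lowering the threshold is one conceivable way to adjust how often trials start in a favorable state, which relates to the control over difficulty mentioned above.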

Paradigms which include brain-state-dependent experimenting (see Jensen et al., 2011; Horschig et al., 2014c) require that an informative neural marker can be extracted robustly from brain signals. Given the high dimensionality and noisy characteristic of most types of brain signals, the extraction and decoding of such individual neural markers is a challenging task.

Screening the literature on relevant neural markers of visual and motor performance, it is important to make a distinction between the use of single-trial decoding and the extraction of statistical differences, which may even be reported as group averages. Neural features which correlate with the task performance on the grand average (GA) of a set of subjects have limited usefulness for closed-loop experimenting with a given individual. As inter-subject differences get lost during the averaging, GA features may have low predictive power when tested with data of a novel subject. Research in the field of brain-computer interfaces (BCI) has pushed forward methods for single-trial decoding of individual brain activity (mostly EEG signals) (Millán et al., 2010; Makeig et al., 2012). Results from this field affirm that brain signals and informative features vary strongly between individuals (Müller et al., 2008). To obtain optimal decoding results, BCI data processing pipelines strive to identify subject-specific informative features. Technically, these are gained either from a calibration recording prior to the online use of the BCI (Blankertz et al., 2007), or by transfer learning methods (Kindermans et al., 2014) which exploit features from pre-trained machine learning models of earlier sessions or previous users. Furthermore, attention needs to be paid to temporal dependencies: brain features may correlate with previous behavior, with simultaneous behavior or may even be predictive of future behavior. Only the latter brain features can serve as a tool for brain-state-dependent experimenting.

Statistical correlates of visual perception performance are reported by several groups. For stimuli near the perception threshold, the pre-stimulus occipital alpha bandpower correlates with the detection performance (van Dijk et al., 2008), even on a single-trial basis using predictive features (Hanslmayr et al., 2007). In addition to bandpower, the pre-stimulus alpha phase was reported to correlate with the detection performance (Busch et al., 2009). Single-trial decoding methods were not applied in those visual studies, but the reported correlates precede the perception, which may open the possibility for closed-loop experimenting. Based on the findings of Hanslmayr et al. (2007) and van Dijk et al. (2008), there are two examples that set up an online experiment based on occipital alpha bandpower features. Tonin et al. (2013), using EEG data, and Horschig et al. (2014b), who employed MEG signals, both decoded covert visual attention in a closed-loop experiment by utilizing single-trial feedback on the detected attention shift. However, neither group fully closed the loop, e.g., by manipulating the perception performance, which may have been possible by selecting suitable brain states for stimulation. Gonzalez Andino et al. (2005) studied a cued reaction time task and identified that gamma band oscillatory activity observed in fronto-parietal regions prior to the stimulus onset correlates with reaction time. Similarly, Hoogenboom et al. (2010) stated that the strength of visually induced gamma band activity is predictive of the detection of stimulus motion. Somatosensory stimuli of low intensity, but above threshold, were delivered and combined with a distracting masker stimulus by Schubert et al. (2009). Investigating perceived vs. missed stimuli in an offline analysis, pre-stimulus beta bandpower over the left frontal cortex was found predictive of the perception performance on the grand average, as were mu and beta bandpower over the pericentral sensorimotor areas.

In the motor domain, several groups have successfully decoded hand kinematics, using the center-out task as the dominating experimental approach. In their own work, Jerbi et al. (2011) provide a review of the decoding of hand movement parameters such as direction, position and velocity based on brain signals. ECoG signals were used by Pistohl et al. (2008) to decode two-dimensional hand movement trajectories using an autoregressive filtering approach. Considering non-invasive techniques, Waldert et al. (2008) have decoded (but not temporally predicted) the hand movement direction based on MEG and EEG. Neural correlates which encode the velocity of a movement have been investigated by Bradberry et al. (2009). The decoding of produced grip force based on a phase feature extracted from the beta range has been reported on data of three subjects by Logar et al. (2008). Zaepffel et al. (2013) reported an increased centro-parietal beta power during the planning period of grasping movements, but did not investigate whether decoding would work on the basis of single trials. Focusing on single-trial methods, Lew et al. (2014) used slow cortical potentials of the EEG from fronto-parietal areas to predict self-paced movement directions a few hundred milliseconds prior to movement onset. Similarly, Hammon et al. (2008) inspected predictive EEG features for planning target directions using a cue-based paradigm.

In the field of BCI research, Maeder et al. (2012) studied a motor imagery paradigm. The single-trial decoding performance of left vs. right hand movement imagery tasks was found to correlate with the level of pre-trial alpha bandpower over the sensorimotor cortices. Although it was used offline, this neural marker would allow for a predictive intervention in a closed loop. In their statistical analysis, Yang et al. (2014) identified frontal alpha and beta bandpower features which correlate with performance metrics of a reaching task. Proceeding to single-trial methods, Meyer et al. (2014) reported on data of six subjects, who performed a hand positioning task. Their offline analysis revealed that the normalized time-to-target could be predicted based on pre-cue alpha-band activity of the EEG.

The state of the art can be summed up as follows: In the perception domain, several studies have established single-trial performance prediction, partially even in closed-loop applications. The situation is different for the motor domain, since only very few studies have investigated subject-specific motor performance prediction on a single-trial basis in a sufficiently large subject group. Closest to all of these requirements is the study by Meyer et al. (2014). Our research hypothesis builds exactly upon this point. In the context of a hand force task, we propose a generalized workflow which identifies subject-specific predictive oscillatory EEG features evaluated on a single-trial basis.

First, by means of a simulated online analysis, an approach to extract robust and meaningful EEG components is developed. We evaluate whether the information contained in selected components is able to partially explain the trial-by-trial variation of SVIPT performance in a predictive fashion, i.e., the pre-trial component is required to predict the outcome of the upcoming trial. Second, the characteristics of the best performance predictors are investigated by a group-level analysis.

# 2. MATERIALS AND METHODS

# 2.1. Hand Motor Paradigm

In the context of hand motor skill learning, Reis et al. (2009) investigated the Sequential Visual Isometric Pinch Task (SVIPT), which demands an isometric force control of thumb and index finger. Interestingly, training-induced improvement of the SVIPT generalizes well to other hand motor control tasks, even though pinch grasp activities are rarely displayed during natural behavior patterns. In contrast to the original SVIPT setup, brain activity is additionally recorded by electroencephalography (EEG) during a training session for post-hoc offline analysis. The resulting SVIPT setup follows the proposal in Meinel et al. (2015) and is sketched in **Figure 1**.

Each SVIPT trial consists of three phases: a light blue (inactive) cursor appears on the leftmost edge of the T0 field (corresponding to zero force), while the user is touching the sensor only slightly with their non-dominant left hand. The appearance of the cursor indicates the start of the get-ready phase, which corresponds to a waiting period with enhanced attention level. Its duration is varied randomly between 2 and 3 s. The transduction of force into cursor movements is deactivated during the get-ready phase. Fixating the cursor, the user will observe a distinct color change of the cursor from light to dark blue. This go-cue indicates the beginning of the running phase, in which the cursor position can be controlled by applying force to the sensor. As force is transduced into horizontal cursor position, increasing force will move the dark blue cursor toward the rightmost position, which is pre-calibrated at session start to represent 30% of the user's maximum force. The user has been instructed to navigate the cursor as quickly and accurately as possible, in order to visit a sequence of target fields (T0, T1, and T2). Overshoots of the cursor are to be avoided. The current target field is visually indicated to the subject by a green shading (see **Figure 1**). Once a target field is reached, a dwell time of 200 ms must be fulfilled in order to achieve a successful hit of this target field. Hit events are indicated visually by a switch of the target field (another field is shaded in green), or by the end of the trial. Trials were chosen randomly from two conditions, each with a specific required target field sequence (T1-T0-T2-T0 or T2-T0-T1-T0). A trial was finished by fulfilling the complete sequence; skipping a target was not allowed. Trial duration is presented visually as an immediate performance feedback during the pause phase between trials.

# 2.2. Subjects and Ethics

Overall, 20 right-handed normally aged subjects (8 female, average age: 53 years, std: 6 years) were recruited. The subject group resembles the target group of first-stroke patients with respect to age and gender (Ovbiagele and Nguyen-Huynh, 2011). The term normally aged was chosen to indicate our selection criteria: the participants did not have any known neurological or psychological history and were probably healthy, even though we cannot exclude the possibility that some participants had a history of unrecognized micro stroke events.

The offline study was approved by the Ethics Committee of the University Medical Center Freiburg. Following the principles of the Declaration of Helsinki, written informed consent was given by subjects prior to participation. In one session of about 3–4 h (including EEG setup and washing the hair), every participant controlled the cursor with their left hand for 20 blocks of 20 trials each.

# 2.3. SVIPT Performance Scores

SVIPT makes it possible to capture single-trial motor performance. Given a high level of motor control, the force profile F(t) of a single trial is characterized by a quick force ramp-up upon the presentation of the go-cue and the avoidance of overshoots. The requested speed-accuracy trade-off can be translated into various performance scores of the SVIPT task. In Tangermann et al. (2015), the authors selected a set of scores which describe the single-trial performance:


$$CPL = \int\_{t\_{\text{go}}}^{t\_{\text{hit}}} |\dot{F}(t)| \, dt$$

• **Integrated Squared Jerk/ISJ**: The level of fine-granular motor control is reflected in variations of the trajectory smoothness. Therefore, jerk—defined as the third derivative of the force profile—is expressed by the ISJ metric, which is defined as:

$$ISJ = \int\_{t\_{\text{go}}}^{t\_{\text{hit}}} \left|\frac{d^3F(t)}{dt^3}\right|^2 \, dt$$

• **Normalized Jerk/NJ**: A unit-free variant of ISJ captures smoothness variations. It is given by the normalized jerk:

$$NJ = \sqrt{\frac{ISJ \cdot DUR^5}{2 \cdot CPL^2}}$$

Since there are two conditions of target field sequences, a standardization of the performance scores (except for RT) is the prerequisite for pooling trials of both conditions. Therefore, the extracted metrics of each condition were standardized (zero mean and standard deviation one) prior to pooling. Except for RT, the metrics are defined with respect to some end point (e.g., t<sub>hit</sub>). Choosing this boundary represents a trade-off between (a) harvesting a metric which is temporally close and thus related to the get-ready interval (the interval before the go-cue), and (b) including thorough information about the force trajectory of the current trial. To balance the two conflicting goals, we chose the hit of the first target field.
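As an illustration of the definitions above, the following minimal sketch computes CPL, ISJ and NJ from a sampled force profile and standardizes a score per condition. The function names, the numerical differentiation via `np.gradient`, the trapezoidal integration and the choice DUR = t<sub>hit</sub> − t<sub>go</sub> are assumptions made for this example and are not taken from the original analysis code.

```python
import numpy as np

def _trapz(y, dt):
    """Trapezoidal integral of equally spaced samples y."""
    return float(np.sum((y[1:] + y[:-1]) * 0.5) * dt)

def svipt_scores(force, fs, i_go, i_hit):
    """Scores of one trial between go-cue and first target hit.

    force       : 1-D array with the sampled force profile F(t)
    fs          : sampling rate in Hz
    i_go, i_hit : sample indices of the go-cue and of the first hit
    """
    dt = 1.0 / fs
    seg = np.asarray(force, dtype=float)[i_go:i_hit + 1]
    d1 = np.gradient(seg, dt)                      # dF/dt
    d3 = np.gradient(np.gradient(d1, dt), dt)      # third derivative (jerk)
    dur = (i_hit - i_go) * dt                      # assumed: DUR = t_hit - t_go
    cpl = _trapz(np.abs(d1), dt)                   # cursor path length
    isj = _trapz(d3 ** 2, dt)                      # integrated squared jerk
    nj = np.sqrt(isj * dur ** 5 / (2.0 * cpl ** 2))  # undefined if cpl == 0
    return {"DUR": dur, "CPL": cpl, "ISJ": isj, "NJ": nj}

def standardize_per_condition(scores, condition):
    """Z-score a metric separately for each target-sequence condition."""
    scores = np.asarray(scores, dtype=float)
    condition = np.asarray(condition)
    out = np.empty_like(scores)
    for c in np.unique(condition):
        m = condition == c
        out[m] = (scores[m] - scores[m].mean()) / scores[m].std()
    return out
```

For a perfectly linear force ramp, the jerk vanishes and CPL equals the total force change, which serves as a simple sanity check of the sketch.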

# 2.4. Data Acquisition and Preprocessing

During a single session, subjects were placed in a chair at 80 cm distance from a 24-inch flat screen. EEG signals from 63 passive Ag/AgCl electrodes (EasyCap) were recorded, which were placed according to the extended 10–20 system. Impedances were kept below 20 kΩ. All channels were referenced against the nose. The signals were registered by multichannel EEG amplifiers (BrainAmp DC, Brain Products) at a sampling rate of 1 kHz. An analog lowpass filter of 250 Hz was applied before digitization. The signal of the force sensor was recorded by an additional amplifier system (BrainAmp ExG, Brain Products).

For outlier identification, the offline preprocessing consisted of high-pass filtering the raw EEG signals at 0.2 Hz, low-pass filtering at 48 Hz and sub-sampling to a sampling frequency of 500 Hz. For this purpose, linear Butterworth filters of 5th order were applied. For each trial and all 63 channels, an epoch of 2000 ms duration prior to the go-cue was extracted. In order to identify outlier epochs, three rejection methods were applied. First, EEG epochs violating a min-max threshold of 60 µV on frontal channels were excluded from further analysis. Second, a variance threshold on single epochs and channels was applied to remove high-frequency muscular artifacts. For this purpose, the variance of single epochs needs to be within P<sub>up</sub> = 90th percentile and is not allowed to exceed 2 · (P<sub>up</sub> − P<sub>low</sub>) with P<sub>low</sub> = 10th percentile. Third, epochs belonging to extreme trials, represented by outliers of the motor performance metric, were removed. For this purpose, the following min-max thresholds were defined based on earlier pilot recordings (Meinel et al., 2015): [150, 900] ms for RT, [−1.5, 1.5] for ISJ, [−0.6, 0.6] for CPL, [−1.5, 2] for DUR, and [0, 1300] for NJ. They were applied prior to further data analysis. The total number of trials N<sub>e</sub> entering the following offline analysis procedures varied across subjects and performance metrics. For only 2 out of 20 subjects, fewer than 150 of the original 400 epochs remained after the EEG preprocessing. We discarded the data of these subjects from the following analysis. The frequency filtering for our main analysis is described in Section 2.6.
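The first and third rejection step can be sketched as a boolean mask over epochs. This is an illustrative reading only: the min-max threshold is interpreted here as a peak-to-peak amplitude limit, RT is used as the example metric, and the function name `keep_mask` as well as its arguments are hypothetical.

```python
import numpy as np

def keep_mask(epochs_uv, frontal_idx, rt_ms,
              max_range_uv=60.0, rt_bounds=(150.0, 900.0)):
    """Return True for epochs that pass both criteria.

    epochs_uv   : array (n_epochs, n_channels, n_samples) in microvolts
    frontal_idx : indices of the frontal channels to be checked
    rt_ms       : reaction time per trial in milliseconds
    """
    frontal = np.asarray(epochs_uv)[:, frontal_idx, :]
    # Assumption: "min-max threshold" = peak-to-peak amplitude per channel
    p2p = frontal.max(axis=2) - frontal.min(axis=2)
    eeg_ok = (p2p <= max_range_uv).all(axis=1)
    rt_ms = np.asarray(rt_ms, dtype=float)
    rt_ok = (rt_ms >= rt_bounds[0]) & (rt_ms <= rt_bounds[1])
    return eeg_ok & rt_ok
```

The variance-based criterion is omitted from the sketch, since its exact formulation cannot be derived unambiguously from the description above.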

# 2.5. Single-Trial Performance Prediction

In the following, the multivariate variable **x**(t) ∈ ℝ<sup>N<sub>c</sub></sup> characterizes the EEG signal recorded from N<sub>c</sub> sensors. In addition, **s**(t) defines the time course of a neural source. The physics of volume conduction assumes a linear mapping of the source space to the sensor space. The forward (or generative) model thus reads:

$$\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t) \tag{1}$$

where the matrix **A** ∈ ℝ<sup>N<sub>c</sub>×M</sup> describes the projection of the M sources to the EEG sensor space.

The main goal is to approximate the true neural source **s**(t) by **ŝ**(t), whose power achieves the highest correlation with a predefined external variable z(t), called target variable from this point onwards. Several methods can be used to estimate such a source, among them Blind Source Separation (BSS) and source reconstruction techniques. BSS techniques rely solely on an unsupervised framework, which may be a suboptimal approach given the availability of labels in the form of the target variable z. Source reconstruction techniques may provide a high level of interpretability for the results directly in the source space, and may describe non-stationarities in the data and other complex dynamics (Castaño-Candamil et al., 2015b). However, there are three potential drawbacks of source reconstruction approaches (Grech et al., 2008): First, the estimation of **ŝ**(t) usually creates a rather high computational burden. Second, the methods require a forward model **A** for each individual subject, which may not be available in most situations since it corresponds to the exact anatomical description of a subject's brain. Third, as source reconstruction problems are intrinsically ill-posed, the quality of an estimated source depends on additional assumptions, such as the density of sources or their location within the brain.

An alternative to both BSS and source reconstruction approaches is the family of so-called supervised spatial filtering methods. One widely known approach is the Common Spatial Patterns algorithm (CSP; Ramoser et al., 2000), which searches for spatial filters that enhance the contrast between two classes. Consequently, it is well suited for supervised classification problems. A more recent approach, the Source Power Comodulation algorithm (SPoC; Dähne et al., 2014), is suited to regression problems. As the five SVIPT metrics are continuous variables, we preferred to include the SPoC algorithm in our data analysis framework.

SPoC learns an optimal spatial filter **w**<sub>opt</sub> ∈ ℝ<sup>N<sub>c</sub>×1</sup> that allows the source to be estimated as **ŝ**(t) by projecting **x**(t) into a subspace, which maximizes the correlation between the band power of **ŝ**(t) and the target variable z(t):

$$
\hat{\mathbf{s}}(t) = \mathbf{w\_{opt}}^{\top} \mathbf{x}(t) \tag{2}
$$

Without loss of generality, the objective function for SPoC may be defined in terms of the epoched data **x**(e), where e refers to the e-th epoch. Assuming that the EEG sensor signal has been bandpass-filtered to a narrow frequency band and that the norm of the spatial filter is constrained, e.g., by requiring Var[**w**<sup>⊤</sup>**x**(t)] = 1, the optimization problem can be solved by maximizing the covariance:

$$\begin{aligned} \mathbf{w}\_{opt} &= \underset{\mathbf{w}}{\text{argmax}} \{ \text{Cov} [\Phi\_x(e), z(e)] \} \; \forall \; e\\ \text{s.t.} \; \text{Var} [\mathbf{w}^\top \mathbf{x}(t)] & \stackrel{!}{=} 1 \end{aligned} \tag{3}$$

where Φ<sub>x</sub>(e) = Var[**ŝ**](e). This formulation of the algorithm, called SPoC<sub>λ</sub>, can be transformed into a generalized eigenvalue problem and thus delivers a set of N<sub>c</sub> spatial filters **W** ∈ ℝ<sup>N<sub>c</sub>×N<sub>c</sub></sup>. In this paper, the SPoC<sub>λ</sub> algorithm is utilized, which will subsequently be abbreviated by the term SPoC.

Applying a SPoC filter **w**<sub>tr</sub> learned from training data **x**<sub>tr</sub>(t), the method allows the target variable z<sub>est</sub> to be estimated on novel, unseen test data **x**<sub>te</sub>(t) on a single-trial basis by calculating the bandpower of the narrowband subspace signal:

$$z\_{\rm est} = \text{Var}[\mathbf{w\_{tr}}^{\top}\mathbf{x\_{te}}(t)](e) \tag{4}$$

Using this relation, we will focus on the prediction of single-trial SVIPT performance using EEG activity within the get-ready phase of the trial.
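Equations (2)–(4) can be sketched in a few lines. The sketch below assumes the formulation of SPoC<sub>λ</sub> given by Dähne et al. (2014), in which the covariance in Equation (3) is expressed via the z-weighted average of the epoch-wise covariance matrices and solved as a generalized eigenvalue problem; here, the problem is solved by whitening with NumPy. All function names are hypothetical, and the data are assumed to be band-pass filtered already, with a full-rank average covariance matrix.

```python
import numpy as np

def epoch_covs(X):
    """Epoch-wise covariance matrices; X has shape (n_epochs, n_channels, n_samples)."""
    Xc = X - X.mean(axis=2, keepdims=True)
    return np.einsum('ecs,eds->ecd', Xc, Xc) / (X.shape[2] - 1)

def spoc_lambda(X, z):
    """Return filters W (columns) and eigenvalues, sorted in descending order."""
    z = np.asarray(z, dtype=float)
    z = (z - z.mean()) / z.std()                  # standardized target variable
    covs = epoch_covs(X)
    C = covs.mean(axis=0)                         # average covariance
    Cz = (covs * z[:, None, None]).mean(axis=0)   # z-weighted covariance
    d, U = np.linalg.eigh(C)
    P = U / np.sqrt(d)                            # whitening: P.T @ C @ P = I
    lam, V = np.linalg.eigh(P.T @ Cz @ P)
    order = np.argsort(lam)[::-1]
    return P @ V[:, order], lam[order]            # W.T @ C @ W = I (constraint in Eq. 3)

def bandpower(w, X):
    """Equation (4): epoch-wise variance of the spatially filtered signal."""
    s = np.einsum('c,ecs->es', w, X)
    return s.var(axis=1, ddof=1)
```

In a predictive setting, `spoc_lambda` would be fitted on training epochs only, and `bandpower` would then be applied to held-out pre-trial epochs.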

As Haufe et al. (2014) have pointed out, there is a corresponding forward model of the form of Equation (1) for every backward model as in Equation (2). Thus, the corresponding spatial activation patterns can be obtained from the spatial filters **W** and the covariance matrix **C**<sub>x</sub> of the data **x**(t) via:

$$\mathbf{A} = \mathbf{C}\_{\mathbf{x}} \mathbf{W} \tag{5}$$
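A minimal sketch of Equation (5) is given below. The function name and the data layout (channels × samples) are assumptions made for illustration.

```python
import numpy as np

def patterns_from_filters(X, W):
    """Equation (5): activation patterns A = C_x W.

    X : continuous data, shape (n_channels, n_samples)
    W : spatial filters as columns, shape (n_channels, n_components)
    """
    Cx = np.cov(X)      # channel covariance matrix C_x
    return Cx @ W       # one pattern per column
```

For uncorrelated sources of unit variance that are mixed by a matrix **A** and unmixed by the corresponding filters, the returned patterns approximate **A** itself.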

Note that any subspace components resulting from the SPoC analysis depend mainly on four hyperparameters. In the temporal domain, two of them define the epoching interval [t<sub>0</sub>, t<sub>0</sub> + Δt], where t<sub>0</sub> is the starting time relative to the go-cue and Δt is the duration. In the frequency domain, the lower frequency f<sub>0</sub> and the band width Δf are the hyperparameters describing the band [f<sub>0</sub>, f<sub>0</sub> + Δf] in which **x**(t) is bandpass-filtered.

Even though simple regression of bandpower features on the channel level does not fulfill the requirements of the assumed forward model, we added this simple method for comparison with SPoC. For this purpose, channel-wise bandpower features of the training and test set were calculated.

# 2.6. Selection Criteria for Informative Oscillatory Components

Performing a grid search across subjects and SPoC parameters, we restrict the evaluation to a fixed predictive time interval given by t<sub>0</sub> = −800 ms relative to the go-cue and a window size of Δt = 750 ms.

As sketched in **Figure 2**, logarithmically increasing and overlapping frequency bands ranging from ≈ 1–100 Hz (55 configurations in total) were evaluated from the original non-filtered signals. For bandpass filtering, linear Butterworth filters of 5th order were utilized. As a trial-wise target variable z, the five different performance metrics introduced in Section 2.3 were considered. Evaluating SPoC across the complete study group of 18 subjects, using five different motor performance metrics, sweeping through 55 discrete frequency bands and selecting the highest-ranked components (see details below) per configuration, results in more than 12,000 oscillatory components. In this section, we will describe an offline selection strategy in order to identify a subset of the most robust and informative oscillatory components which qualify to predict single-trial motor performance.

For each parameter configuration, a chronological K-fold cross-validation procedure with K = 5 was employed for the calculation of SPoC (Lemm et al., 2011). Only trials that survived the data preprocessing (see Section 2.4) were considered. From these, N<sub>e</sub> EEG pre-trial epochs and their corresponding target variable values z were extracted in chronological order and split into five equally sized folds. Thus, four folds served as training data while the remaining one was used for validating the SPoC filter as described in Equation (4). Since each fold served as test fold once, the estimated target variables z<sub>est,j</sub> of the folds j can be concatenated for all N<sub>e</sub> epochs, resulting in z<sub>est</sub> = [z<sub>est,j</sub>]<sub>j∈[1,K]</sub>. According to Equation (5), on each fold j the corresponding test pattern is given as **a**<sub>j</sub> = **C**<sub>te,j</sub> **w**<sub>tr</sub>, utilizing the covariance matrix **C**<sub>te,j</sub> of the test data **x**<sub>te</sub>(t).
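The chronological cross-validation can be sketched as follows. In contrast to a shuffled K-fold split, each test fold is a contiguous block of consecutive trials. The generic `fit` and `predict` callables are placeholders for any estimator (e.g., a SPoC filter followed by a bandpower estimate) and are assumptions of this example.

```python
import numpy as np

def chronological_folds(n_epochs, k=5):
    """Yield (train, test) index arrays; test folds are contiguous blocks in time."""
    idx = np.arange(n_epochs)
    for test in np.array_split(idx, k):
        yield np.setdiff1d(idx, test), test

def cv_estimate(X, z, fit, predict, k=5):
    """Concatenate the fold-wise estimates into one vector z_est of length N_e."""
    z = np.asarray(z, dtype=float)
    z_est = np.empty(len(z))
    for train, test in chronological_folds(len(z), k):
        model = fit(X[train], z[train])        # training folds only
        z_est[test] = predict(model, X[test])  # held-out fold
    return z_est
```

Since every epoch is part of exactly one test fold, the resulting vector contains one out-of-sample estimate per epoch.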

The same cross-validation scheme was applied to the linear regression model. The whole parameter space of 4950 configurations (18 subjects × 5 metrics × 55 frequency bands) was screened. Note that this number is smaller than the number of components delivered by the SPoC analysis, since the latter may deliver more than one component per parameter configuration. The regression, which delivers only a single component per configuration, was trained on the training data and finally applied to the test data such that an estimate z<sub>est</sub> was gained on all N<sub>e</sub> trials which had survived the data preprocessing step.

For a given parameter set, SPoC<sub>λ</sub> returns a set of N<sub>c</sub> filters. As described in Tangermann et al. (2015), it is sufficient to take only the first-ranked components into consideration<sup>1</sup>. For this purpose, we applied a rank-based criterion. First, we removed the linear trend from the ordered set of N<sub>c</sub> eigenvalues. Then, a threshold of 1.5·σ(r) relative to the standard deviation σ of the resulting N<sub>c</sub> residuals r was defined. We restricted the investigation to positive eigenvalues.
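One possible reading of this rank-based criterion is sketched below: a straight line is fitted to the eigenvalues in descending order, and components are kept if their residual exceeds 1.5 standard deviations of all residuals and their eigenvalue is positive. The function name and the details of this interpretation (e.g., fitting by ordinary least squares) are assumptions.

```python
import numpy as np

def select_top_components(eigvals, factor=1.5):
    """Return the 0-based ranks of components passing the residual threshold."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # descending order
    rank = np.arange(lam.size)
    slope, intercept = np.polyfit(rank, lam, 1)             # linear trend
    resid = lam - (slope * rank + intercept)                # residuals r
    keep = (resid > factor * resid.std()) & (lam > 0)
    return np.flatnonzero(keep)
```

For an eigenvalue spectrum with a single dominant value on top of a linearly decaying tail, only the first-ranked component is returned.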

Given a single component **w**, the following set of scores characterizes its predictive strength and stability:

• **Correlation characteristics:** As a measure to verify the quality of the predictive strength of a SPoC configuration, the overall correlation of the N<sub>e</sub> measured performance labels z<sub>true</sub> with the corresponding predictions z<sub>est</sub> can be considered:

$$R\_{all} = \text{Corr}[z\_{true}, z\_{est}] \tag{6}$$

Similarly, the predictive strength in terms of single-trial performance can also be verified by checking the mean of the fold-wise correlations R<sub>j</sub> = Corr[z<sub>true,j</sub>, z<sub>est,j</sub>], which rewards temporally stable components:

$$R\_{\text{folds}} = \frac{1}{K} \sum\_{j=1}^{K} \text{Corr}[z\_{\text{true},j}, z\_{\text{est},j}] \tag{7}$$

The correlation-based metrics R<sub>all</sub> and R<sub>folds</sub> come closest to the original optimization objective of the SPoC algorithm. If the trained spatial filters model trial-to-trial fluctuations well, R<sub>all</sub> and R<sub>folds</sub> will report a large value, but only R<sub>folds</sub> makes it possible to discriminate between single-trial predictors and session-trend models. Furthermore, a stable component requires that the correlation of each fold j shares the same sign with R<sub>all</sub>. Thus, it is reasonable to require a high homogeneity H<sub>folds</sub>:

$$H\_{\text{folds}} = \sum\_{j=1}^{K} \Theta(\text{sign}(R\_{all}) \cdot \text{sign}(R\_j)),\tag{8}$$

with Θ(x) = 1 for x ≥ 0 and Θ(x) = 0 for x < 0 representing the unit-step function.

• **Separability of estimated performance:** Simulating the trial-wise online application, the continuous prediction is transferred into a two-class problem. For this purpose, we split the N<sub>e</sub> prediction values z<sub>est</sub> into two distributions (z<sub>est,h</sub> and z<sub>est,l</sub>) based on the 50th percentile of the true target variable distribution z<sub>true</sub>, thus representing high and low performance. The separability of z<sub>est,h</sub> and z<sub>est,l</sub> can be quantified by a statistical test. Here, the area under the receiver-operating-characteristic curve (Fawcett, 2006) is reported. It is denoted as z-AUC and has a chance level of 0.5.

<sup>1</sup>The components are sorted according to their eigenvalues. In the case of SPoC<sub>λ</sub>, these equal the covariance between the bandpower features and the target variable.

• **Stressing the stability:** SPoC is a supervised method, which uses label information to guide the spatial filter calculation. Thus, the robustness of a resulting component can be stressed by introducing label noise. The concept of a step-wise reduction of the SNR of the labels has been introduced by Castaño-Candamil et al. (2015a). Here, SNR levels were varied from −20 dB to 10 dB by adding white noise. Applying SPoC, we estimated the target variable z<sub>est</sub> for all N<sub>e</sub> epochs using 5-fold cross-validation. At each SNR level, three sets of noisy labels z were calculated. For each SNR level, the separability of the resulting z<sub>est</sub> distribution is verified by the z-AUC value. Regarding the z-AUC values as a function of the SNR, the area under this curve, referred to as AAUC<sub>SNR</sub>, describes the stability of the component.
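The scores listed above can be sketched as follows. The sketch covers Equations (6)–(8), the z-AUC based on a median split, and the generation of noisy labels at a given SNR. The function names are hypothetical, and two details are assumptions: the AUC is computed via the rank-based (Mann-Whitney) formulation, and the SNR is defined as the ratio of label variance to noise variance.

```python
import numpy as np

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

def correlation_scores(z_true, z_est, k=5):
    """R_all (Eq. 6), R_folds (Eq. 7) and H_folds (Eq. 8) for chronological folds."""
    z_true, z_est = np.asarray(z_true, float), np.asarray(z_est, float)
    folds = np.array_split(np.arange(len(z_true)), k)
    r_all = corr(z_true, z_est)
    r_j = np.array([corr(z_true[f], z_est[f]) for f in folds])
    h_folds = int(np.sum(np.sign(r_all) * np.sign(r_j) >= 0))  # unit step with Theta(0) = 1
    return r_all, float(r_j.mean()), h_folds

def z_auc(z_true, z_est):
    """AUC for separating z_est of high vs. low true performance (median split)."""
    z_true, z_est = np.asarray(z_true, float), np.asarray(z_est, float)
    med = np.median(z_true)
    high, low = z_est[z_true > med], z_est[z_true <= med]
    diff = high[:, None] - low[None, :]
    # Probability that a "high" estimate exceeds a "low" one (ties count half)
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

def add_label_noise(z, snr_db, rng):
    """Add white Gaussian noise such that var(z) / var(noise) matches the SNR."""
    z = np.asarray(z, float)
    noise_var = np.var(z) / (10.0 ** (snr_db / 10.0))
    return z + rng.normal(0.0, np.sqrt(noise_var), size=z.shape)
```

Note that with this definition, a predictor that is inversely related to the labels yields a z-AUC below 0.5, so that the deviation from 0.5 reflects the separability.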

To finally identify and select robust and predictive components, we propose to apply three out of these five criteria in parallel. As a prerequisite, the data set needs to consist of at least N<sub>e</sub> = 150 trials in order to ensure the convergence of the SPoC algorithm (see Dähne et al., 2014; Castaño-Candamil et al., 2015a):

1. The separability of the predicted performance z<sub>est</sub> can be verified by the resulting z-AUC value. A corresponding threshold z-AUC<sub>min</sub> = 0.59 was determined according to the 85th percentile across all configurations.


# 3. RESULTS

# 3.1. SVIPT Performance Metrics

Single-trial based SVIPT performance can be assessed by different metrics, as described in Section 2.3. In **Figure 3**, examples of the trial-to-trial fluctuations of different metrics are visualized for two subjects. The visualization covers the full sessions, but omits trials removed during the preprocessing. **Figures 3A,D** show the metric reaction time (RT) for two subjects. It is not affected by a session trend. Its distribution is slightly asymmetric, which is caused by a physiological limit for the minimal RT. The normalized jerk (NJ) in **Figures 3C,F** behaves in a similar manner. It is affected only slightly by a global trend, but shows a more skewed distribution compared to RT. In contrast, integrated squared jerk (ISJ) depicted in **Figure 3B**, and cursor path length (CPL) in **Figure 3E** both show a strong session trend, which can be explained by user learning (data not shown here). A comparably strong session trend is also present in the duration metric DUR (data not shown).

The cross-correlations between all five metrics and the shape of their distributions were reported in Tangermann et al. (2015). Metrics ISJ, CPL and DUR showed strong correlations to each other, while RT as well as NJ both are rather independent from the four other metrics.

# 3.2. Contrasting SPoC with Linear Regression on Sensor Level

As a baseline comparison for the predictive power of SPoC components, a linear regression model employing channel-wise bandpower features was evaluated as described in Section 2.6. The resulting distributions of the overall correlation R<sub>all</sub> and the performance separability z-AUC are reported in **Figure 4**. Across all configurations, SPoC delivers a median correlation R<sub>all,med</sub> = 0.07 and a separability of z-AUC<sub>med</sub> = 0.54, while on average the regression performs at chance level. While both methods come up with components revealing z-AUC values above chance level, those with the strongest predictive information are generated by the SPoC method.

# 3.3. Single-Trial Motor Performance Predictors

In **Figure 5**, five exemplary predictive and robust SPoC components, gained from five different subjects, are characterized. Although SPoC components are computed from band-pass filtered data, the resulting filter **w** (gained on all available N<sub>e</sub> trials) of a component can be re-applied to non-frequency-filtered epoched data. This spectral content of a SPoC component is shown in **Figure 5A**. The frequency band from which the component was extracted is indicated by the dashed gray area. Using all available epochs, **Figure 5B** shows the spatial activity pattern gained via Equation (5). In **Figure 5C**, the SPoC filter weights on the 2D-scalp projection are shown. The scatter plot in **Figure 5D** reports on the measured performance metric z<sub>true</sub> as a function of the predicted performance z<sub>est</sub> according to the CV scheme described in Equation (4). The data points are colored by the fold index (1–5), which corresponds to the temporal order of the session. Fold 1 represents the beginning of the session, fold 5 its end. In addition, the overall correlation R<sub>all</sub> reports on the predictive strength of the component. The distributions shown in **Figure 5E** illustrate the separability of the single-trial performance values z<sub>est</sub>. For this purpose, the estimated labels z<sub>est</sub> have been reduced to the lower and upper quartile. The corresponding true labels z<sub>true</sub> were used to compute the quartiles Q<sub>low,est</sub> and Q<sub>high,est</sub> and were fitted by a kernel distribution (solid lines). In an ideal case, those quartiles would converge toward the extreme quartiles (Q<sub>low</sub> and Q<sub>high</sub>) of z<sub>true</sub>, which are indicated by dashed lines. As a score of their separability, the score z-AUC as described in Section 2.6 is reported based on the 50th percentile.

The exemplary components in **Figure 5** are selected across the investigated frequency range depicted in **Figure 2**. The predictor of S7 can be assigned to the theta band, those of S9 and S13 correspond to the alpha range, the component for S5 originates from the beta range and the one of S8 was found in the gamma range. Regarding the scatter plots, there are two different types of patterns recognizable: single-trial predictors showing a confined point cloud without a clear trend over time (all examples except for S13), whereas the scatter plot of subject S13 shows a clear trend over the course of the session. The separability plots indicate that the predictive power of a single component nicely matches with the z-AUC value.

# 3.4. Testing the Stability of SPoC Components

The stability of an oscillatory component can be challenged by varying the signal-to-noise ratio (SNR) of the target variable z. In **Figure 6**, the z-AUC score is investigated as a function of the SNR for two parameter configurations. **Figure 6A** shows a stable component, for which z-AUC is expected to decrease as the SNR is lowered, while for a non-informative component in **Figure 6B** the z-AUC can be expected to stay around the noise floor. Thus, the resulting area under the z-AUC curve can be used as a tool for mapping the stability of the subspace component under challenging noise conditions. In **Figure 6C**, the distribution of this so-called AAUC<sub>SNR</sub> is reported for all evaluated SPoC components across all 18 subjects. The distribution of AAUC<sub>SNR</sub> values has its median at 0.07 and is slightly skewed.

# 3.5. Identification of Robust and Predictive Components

As described in Section 2.6, the highest-ranked components of each parameter configuration have been evaluated, resulting in about 12,000 different subspace components. In **Figure 7**, the configurations are characterized by their stability under noise (AAUC<sub>SNR</sub>), which is plotted in **Figure 7A** as a function of the separability measure z-AUC, in **Figure 7B** as a function of the homogeneity of the fold-wise sign of the correlation H<sub>folds</sub> and in **Figure 7C** as a function of the overall correlation R<sub>all</sub>. A few observations can be made: First, the metrics are not centered at zero. Second, based on all initial configurations (blue data points), AAUC<sub>SNR</sub> correlates with the z-AUC as well as with R<sub>all</sub>. The largest AAUC<sub>SNR</sub> values are evoked by the most homogeneous fold-wise correlation signs with H<sub>folds</sub> ≥ 3.

The threshold criteria applied to select the best of the 12,000 subspace components are indicated by red dashed lines, and red dots indicate the components finally selected. As shown in **Figure 8**, the overall correlation R<sub>all</sub> is strongly correlated with the z-AUC metric, such that an additional threshold criterion on R<sub>all</sub> was not necessary. The most predictive components achieve a correlation value of up to 0.6, corresponding to R<sub>all</sub><sup>2</sup> = 0.36. Assuming a linear relationship between z<sub>true</sub> and z<sub>est</sub> as well as normally distributed data, this means that z<sub>est</sub> can explain up to 36% of the performance variance contained in z<sub>true</sub>.

In **Figure 9**, all 361 selected components are characterized by histograms in terms of their input parameters. **Figure 9A** displays the subject-wise grouping. In total, 11 out of 18 subjects contribute at least one component; for three subjects more than 50 configurations survive the selection procedure. **Figure 9B** shows the histogram over the number of trials available for the offline analysis. Note that this histogram is dominated by the best three subjects reported in **Figure 9A**, who contributed a large number of the selected 361 configurations. **Figure 9C** characterizes the selected components assigned to their underlying frequency band [f<sub>0</sub>, f<sub>0</sub> + Δf] (see **Figure 2**). Most components are gained from the alpha- and beta-band range. Interestingly, robust features detected in the gamma band were dominantly selected for their ability to predict CPL. The slow frequency (<4 Hz) components are dominated by artifactual subspaces. **Figure 9D** reports on the occurrences of the different performance metrics among the selected components. Most components could be extracted for RT (61%), followed by CPL (16%), while all other metrics contribute almost equally well with 6–8% of the selected components. **Figure 9E** provides an overview of the SPoC ranks of the surviving components. The rank ordering corresponds to the eigenvalue ordering of the complete data set. As the number of selected components drops with increased rank, the ranking is associated with the information content of the subspace component.

FIGURE 6 | Stressing the stability of two exemplary SPoC components for two different parameter configurations (A,B). While stepwise decreasing the SNR (indicated by the red arrow), z-AUC values (solid lines) describing the separability of the prediction are plotted together with standard deviations (dashed lines). The area under the z-AUC curve, further on called AAUC<sub>SNR</sub>, describes the stability of the component under the challenge of added noise. (C) Shows the histogram of all AAUC<sub>SNR</sub> scores evaluated for the considered parameter configurations.

SPoC as a linear filtering method allows for a limited neurophysiological interpretation of spatial activity patterns. A representative subset of typical scalp topographies from the selected stable and informative subspaces is plotted in **Figure 10**. The components were assigned to three groups. About 70% of components fall into group G1, which comprises patterns ranging from activations in occipital to central or frontal areas. The maximum activity of those components is often found over one of the hemispheres. About 10% of the components fall into group G2. They show patterns of probable non-neural sources and may represent, e.g., eye artifacts, muscular activity or single noisy channels. Group G3 comprises noisy topographies. As indicated by patterns in the intersection area of the three groups, mixed components were observed as well. The detailed parameter configurations of the plotted components are listed in **Table 1**.

# 4. DISCUSSION

We hypothesized that subject-specific pre-trial brain signals contain information which allows us to partially explain and temporally predict the trial-by-trial variability of the upcoming motor performance in SVIPT. To test the hypothesis, we developed a workflow which is capable of extracting informative oscillatory EEG subspace components and of identifying the most robust ones. Simulating an online application, our analysis revealed strong evidence that the band power of the selected components is predictive of the single-trial SVIPT performance. Major findings were that these components indeed exist, but need to be optimized for individual users. With 11 out of 18, not all, but a majority of the subjects revealed the desired informative features.

In the following, we will first discuss the decision to utilize SPoC instead of alternative analysis methods. In this context, the proposed selection procedure and the stability of SPoC components over time are discussed, with a special focus on the role of SNR, frequency and the illiteracy phenomenon. In addition, the detected components will be related to existing literature and characterized on a group level with respect to the covered frequency bands, the sub-processes reflected by the components and the time courses revealed. Before concluding, we will describe a neuroergonomically enhanced rehabilitation paradigm as a possible use case of our contribution.

# 4.1. SPoC and its Alternatives

Designing the data analysis workflow, we built upon our background in BCI. Accordingly, we carefully selected algorithmic building blocks only if they can be applied in single-trial analysis [e.g., the application of the spatial SPoC filter according to Equation (4)]. This decision should simplify the translation of the presented workflow to closed-loop experiments in the future. The choice of the supervised SPoC algorithm for extracting informative components is supported by its good performance compared to a supervised linear regression of bandpower features on the sensor level (see Section 3.2). This is in accordance with findings of Dähne et al. (2014). On data from an auditory steady-state evoked potential paradigm, these authors reported better results for SPoC compared to both linear regression and an unsupervised subspace decomposition using independent component analysis (ICA). SPoC does not reconstruct sources of the brain, but instead performs a supervised subspace decomposition. Thus, a SPoC subspace component cannot be expected to correspond to a single physical source or even a dipole source (even though such SPoC components are possible). Theoretically, even several spread-out brain areas may contribute to a single SPoC component if they share oscillatory activity which co-varies over time with the labels. The choice between SPoC and source reconstruction approaches (Gonzalez Andino et al., 2005) represents a tradeoff: while the latter may facilitate the interpretation of results, SPoC components avoid several of the drawbacks mentioned in Section 2.5. As our workflow was designed for applicability in single-trial online paradigms, our decision was biased toward SPoC.

# 4.2. Selection Criteria for Robust and Predictive Components

Over-fitting is a general issue for supervised methods and for SPoC in particular, as no form of regularization was applied. This requires some form of post-hoc selection of SPoC components. The situation is aggravated by the fact that SPoC returns full-rank filter matrices, which result in a very large number of subspaces. However, only a fraction of these can be expected to be informative about the labels. As robustness over time as well as with respect to label noise are important criteria for the potential closed-loop applicability of a component, a single selection criterion (e.g., a threshold on the correlation value) is not sufficient. We therefore selected three criteria (see Section 2.6) which best suited these requirements. Out of the initial five selection criteria, the two scores Rall and Rfolds turned out to be beneficial for characterizing the extracted components. They were omitted from the selection process itself, since a strong correlation between z-AUC and Rall was observed (see **Figure 8**). The same holds for the correlation between z-AUC and Rfolds (not shown).

FIGURE 9 | Histograms of different parameters solely for the selected SPoC components. (A) Shows the assignment over the 18 subjects. (B) Gives the allocation over data set sizes Ne (with a lower limit of 150 trials). (C) Visualizes the distribution across frequency bands. (D) Depicts the spread of components over the five utilized motor performance metrics, while (E) shows the split according to their SPoC rank positions.

distribution of Rall values for the selected components.

TABLE 1 | Parameter configurations for components of groups G1, G2 and G3 as visualized in Figure 10.

An alternative to the current selection procedure would be to relax the thresholds and to combine them with additional methods to judge the plausibility of the remaining components post-hoc. For ICA components, such workflows have been proposed, for example MARA, an automatic classification of artifactual components by Winkler et al. (2014). MARA uses features based on topology, time-frequency analysis and source reconstruction. Similar approaches have been proposed by Daly et al. (2015) and Grosse-Wentrup et al. (2013).

# 4.3. Influence of SNR on SPoC Components

By applying rather strict selection criteria, weaker but still informative components may have been removed. As a result, the data of some subjects did not reveal informative pre-go oscillatory components. One reason may be a lower SNR of their data, which hides potentially informative content from the SPoC analysis, especially in combination with the limited number of trials used. The work of Castaño-Candamil et al. (2015a) on robustness testing of SPoC components backs this interpretation. In this case, future improvements may be expected from regularization techniques introduced to SPoC, from a reduction of the dimensionality prior to applying SPoC, from using more data, or from transfer learning approaches. However, we cannot exclude that informative oscillatory components may not be visible in the EEG or may be absent in some subjects. This problem has been described as BCI "illiteracy." It has predominantly been studied in the context of motor imagery paradigms for the control of BCI applications (Hammer et al., 2012), where decoding the imagery class usually is not possible for a subset of subjects. The BCI illiteracy problem was tackled by novel experimental setups like hybrid BCI paradigms (Allison et al., 2012; Müller-Putz et al., 2015), but could also be alleviated by more advanced decoding methods (Sannelli et al., 2010).

# 4.4. Rank Stability of SPoC Components over Time

In Section 4.3, the relation between SPoC solutions and the SNR of the data has been touched upon. As SPoC ranks the detected components according to their covariance values, solutions may seem unstable when only the first-ranked component is considered. In real-life data sets, variation of the SNR over time can induce rank switches or mixed components. Tracking a component over multiple runs of the subspace decomposition method is a challenging task, especially as mixtures theoretically cannot be distinguished from a single source. However, as similar problems arise for online learning of blind source separation methods like ICA, practical solutions are available (Hsu et al., 2015). **Figure 11** gives examples of stable, stationary components (**Figure 11A**) and of unstable SPoC components (**Figures 11B,C**), both observed over the five chronological cross-validation folds. Unstable components may arise if the stationarity assumption of SPoC is violated, e.g., by slow temporal intensity variations due to user learning. For **Figure 11B**, arrows indicate a possible path through rank positions across folds by connecting corresponding components. Note that SPoC can generate cases with even more severe variation between folds than those depicted in **Figures 11B,C**. However, such components typically have been removed during the selection process. While mixed, yet stable components may be hard to interpret, they can still be useful for predicting the task performance.

We have observed a high sensitivity of SPoC to small differences in the frequency parameters. Seemingly unstable components which display rank switching behavior (see **Figure 12** at 8.7 Hz) can sometimes be stabilized by slightly changing the frequency, e.g., to 9.4 Hz in this example. A further increase of the frequency to 10.2 Hz again induces instability in this example.

# 4.5. Characterization of Informative SPoC Components and Sub-Processes

The proposed SPoC workflow delivers a diverse set of oscillatory components, which vary in their topological patterns as well as in their underlying frequency band. This is not surprising, since SVIPT requires the interaction of several cognitive sub-processes in order to reach a good overall performance. For each subprocess, one or more specific neural features may exist, with all of them being informative about the overall outcome of the complex task.

The best components differ between subjects and predominantly occur in the alpha band, followed by the beta and gamma bands. Our findings are supported by informative features in the alpha and beta range observed during premovement intervals of a hand grasping task (Zaepffel et al., 2013; Meyer et al., 2014; Yang et al., 2014). Furthermore, the informative frequency ranges for SVIPT are comparable to those reported for attention-related tasks (Gonzalez Andino et al., 2005; Hoogenboom et al., 2010; van Ede et al., 2012). We obtained the best results when using RT as a performance metric, which supports our earlier findings on disjoint data from younger subjects (Castaño-Candamil et al., 2015c; Meinel et al., 2015). RT of course does not automatically lead to a successful trial, but it can be seen as an indicator for a quick ramp-up phase and alertness. For fewer users, presumably those with the highest SNR characteristics, informative oscillatory features could be identified for other performance metrics of the force task, too.

Comparing the topological plots of group G1 in **Figure 10** with those reported in literature, it can be observed that many of them resemble patterns emerging for motor imagery tasks in BCI (Krauledat et al., 2008). These often display a clear maximum of activity in channels located over one of the sensorimotor areas (cp. pattern 5 of G1 in **Figure 10** and the pattern of S5 in **Figure 5**) or are located centrally over both hemispheres. While similarity of patterns is by no means proof of an origin of these oscillatory components in the sensorimotor cortices, the hand force action required to succeed in the SVIPT task would allow for such components.

Other components show a maximum intensity over parietal and occipital areas and may reflect the involvement of the visual system in the SVIPT task. Pattern 2 of **Figure 10** and patterns in **Figure 11A** display a lateralization similar to patterns reported for directed and covert visual attention processes (Hanslmayr et al., 2007; Horschig et al., 2014a). Components with a centrally located maximum (cp. pattern 1 of **Figure 10** or the pattern of S9 in **Figure 5**), or with double wing shapes (e.g., pattern 3 of **Figure 10**) resemble components reported for generalized visual attention processes (van Dijk et al., 2008; Meyer et al., 2014). Again, most of these rather clear patterns originate from the alpha frequency band.

While the relevance of several of the selected components cannot be fully interpreted, we do consider these features as added value for neurologists, e.g., by tracking the power time course over sessions for a subject-specific component. Further insight into underlying sub-processes and participating brain areas may be obtained from a post-hoc source reconstruction applied upon single SPoC subspaces.

# 4.6. Behavioral Variability on Different Time Scales

Independent of the choice of the exact motor task, subjects generally display two types of performance variations (Chaisanguanthum et al., 2014). First, a large trial-to-trial performance variability is observed in behavioral data. Second, slow performance drifts can occur over the course of a session. Accordingly, SPoC can deliver components which reflect either one of the two types of performance variations. To tell them apart, a comparison between Rall and Rfolds is helpful. A high value for Rall combined with a low value for Rfolds indicates a session trend. If both are high, then the component is informative for trial-by-trial variation (see single-trial predictors and session trend predictors in **Figure 13** as well as the examples given in **Figure 5**).
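The comparison of the two scores can be illustrated with a minimal Python sketch. It assumes that Rall is the correlation between component band power and performance labels over all trials of a session, and that Rfolds is the mean correlation within chronologically ordered folds; the exact definitions are those of Section 2.6. The function names, the threshold of 0.3 and the synthetic data are hypothetical and serve illustration only.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    return float(np.corrcoef(x, y)[0, 1])


def r_all_and_r_folds(power, labels, n_folds=5):
    """Return (R_all, R_folds) for one component.

    ASSUMPTION: R_all is the correlation over all trials, R_folds is the
    mean correlation within chronologically ordered folds.
    """
    power = np.asarray(power, dtype=float)
    labels = np.asarray(labels, dtype=float)
    r_all = pearson_r(power, labels)
    folds = zip(np.array_split(power, n_folds), np.array_split(labels, n_folds))
    r_folds = float(np.mean([pearson_r(p, l) for p, l in folds]))
    return r_all, r_folds


def predictor_type(r_all, r_folds, threshold=0.3):
    """Label a component; the threshold is a hypothetical illustration value."""
    if abs(r_all) >= threshold and abs(r_folds) >= threshold:
        return "single-trial predictor"
    if abs(r_all) >= threshold:
        return "session trend predictor"
    return "uninformative"


# Synthetic illustration (not data from the study).
rng = np.random.default_rng(0)
n_trials = 500
drift = np.linspace(0.0, 1.0, n_trials)

# Component and labels share only a slow drift over the session.
trend_power = drift + 0.2 * rng.standard_normal(n_trials)
trend_labels = drift + 0.2 * rng.standard_normal(n_trials)

# Component and labels share fast trial-to-trial fluctuations.
shared = rng.standard_normal(n_trials)
trial_power = shared + 0.5 * rng.standard_normal(n_trials)
trial_labels = shared + 0.5 * rng.standard_normal(n_trials)

print(predictor_type(*r_all_and_r_folds(trend_power, trend_labels)))
print(predictor_type(*r_all_and_r_folds(trial_power, trial_labels)))
```

In the first synthetic case the shared drift dominates the session-wide correlation but contributes little within a single fold, so only the first score is high. In the second case the shared fluctuations are present in every fold, so both scores are high.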

For the purpose of brain-state-informed closed-loop experimenting, single-trial predictors may be more suitable. Session trend predictors, however, may still be useful for precleaning the performance labels. While session trend predictors may reflect increasing fatigue or a learning effect, it is much harder to determine the underlying mechanisms which cause the rapidly changing trial-to-trial performance of the single-trial predictors (Wu et al., 2014; Hadjiosif and Smith, 2015; Osu et al., 2015). However, our identified components reveal strong evidence that the pre-trial brain activity is partially informative about trial-by-trial variability of motor performance. This is in accordance with Churchland et al. (2006), who reported on monkey experiments that at least 30% of behavioral variability could be explained by the fluctuations of preparatory neural activity in the dorsal premotor cortex. However, Chaisanguanthum et al. (2014) reported only a weak relationship between motor cortex activity (PMd/M1) in monkeys and trial-wise fluctuations of behavior.

# 4.7. Closed-Loop Experimenting as Neuroergonomical Application

The predictive EEG features are extracted from a pre-go interval of each trial. Our pipeline carefully simulated an online scenario, but this approximation of course cannot replace the evaluation within a future online study. However, the informative trial-by-trial performance predictors may serve the neuroergonomic needs of motor rehabilitation scenarios. Since motor performance variability was reported to become larger for stroke patients (Lodha et al., 2010), applying identified patient-specific components within brain-state-dependent closed-loop experimenting may make it possible to causally influence their performance, e.g., by manipulating difficulty levels in motor rehabilitation paradigms. So far, BCI methods in stroke rehabilitation (Ang and Guan, 2013) have been used to detect the attempted movement of the affected hand by analyzing informative ERD/ERS features of the EEG and subsequently to close the feedback loop for the patient either by triggering a simulated hand movement on a screen (Pichiorri et al., 2015) or by triggering a passive movement of the affected hand, e.g., via an external robotic device or an active orthosis (Ramos-Murguialday et al., 2013).

When implemented in future closed-loop applications, it may be worthwhile to combine SPoC features across multiple frequency bands, e.g., by a regression approach. This might enhance the trial-wise performance prediction if the information contained in different frequency bands is independent. Similarly, the combination of predictors based on different performance metrics might serve to gain an enhanced performance estimate.
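Such a combination could be sketched as follows. The example assumes that one log band power feature per frequency band is available for each trial and uses ridge regression as one possible regression approach; the regularization strength, the chronological split and the synthetic data are hypothetical and not part of the presented workflow.

```python
import numpy as np


def fit_ridge(features, labels, alpha=1.0):
    """Ridge regression with an unpenalized intercept.

    features: (n_trials, n_bands) log band power, one column per band.
    labels:   (n_trials,) performance metric, e.g., reaction time.
    alpha:    hypothetical regularization strength.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Closed-form ridge solution: (Xc^T Xc + alpha * I)^-1 Xc^T yc
    weights = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


def predict(features, weights, intercept):
    """Predict the performance metric from per-band features."""
    return np.asarray(features, dtype=float) @ weights + intercept


# Synthetic illustration: two bands carry independent information.
rng = np.random.default_rng(1)
n_trials = 400
log_power = rng.standard_normal((n_trials, 2))  # columns: e.g., alpha, beta
labels = (0.5 * log_power[:, 0] + 0.5 * log_power[:, 1]
          + 0.5 * rng.standard_normal(n_trials))

split = 280  # chronological split: train on early trials, test on late ones
weights, intercept = fit_ridge(log_power[:split], labels[:split])
predicted = predict(log_power[split:], weights, intercept)

r_combined = np.corrcoef(predicted, labels[split:])[0, 1]
r_single = [np.corrcoef(log_power[split:, b], labels[split:])[0, 1]
            for b in range(2)]
print(r_combined, r_single)
```

With independent information in the two synthetic bands, the combined prediction correlates more strongly with the held-out labels than either band alone. If the bands carried redundant information, little gain would be expected.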

# 5. CONCLUSION

In summary, we have shown that the proposed workflow is a suitable basis to identify subject-specific, single-trial based neural markers which are predictive of the performance of an upcoming motor task. Those predictors may be valuable building blocks for neuroergonomic applications since they are informative about the status of the visual subsystem as well as the sub-processes involved in hand motor control. Moreover, exploiting those features in future closed-loop experiments, e.g., by temporal gating of upcoming trials, will allow for brain-state-informed rehabilitation paradigms. Furthermore, the group-level analysis motivates the use of our workflow to gain a better understanding of trial-to-trial variations of cognitive sub-processes which are relevant for a successful rehabilitation outcome.

# AUTHOR CONTRIBUTIONS

Conceived and designed the experiments: AM, SC, JR, and MT. Performed the experiments: AM, SC, and MT. Analyzed the data: AM and MT. Contributed reagents/materials/analysis tools: AM, SC, JR, and MT. Wrote the paper: AM and MT.

# ACKNOWLEDGMENTS

This work was fully supported by the BrainLinks-BrainTools Cluster of Excellence funded by the German Research Foundation (DFG), grant number EXC 1086. The analysis was partly performed on the computational resource bwUniCluster funded by the Ministry of Science, Research and the Arts Baden-Württemberg and the Universities of the State of Baden-Württemberg, Germany, within the framework program bwHPC. The article processing charge was funded by the German Research Foundation (DFG) and the Albert-Ludwigs-University Freiburg in the funding programme Open Access Publishing. The authors would like to thank Eva-Maria Schlichtmann for her support with the data collection.

# REFERENCES


Ang, K. K., and Guan, C. (2013). Brain-computer interface in stroke rehabilitation. J. Comput. Sci. Eng. 7, 139–146. doi: 10.5626/JCSE.2013.7.2.139


effectiveness of perceptual masking. J. Cogn. Neurosci. 21, 2407–2419. doi: 10.1162/jocn.2008.21174


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Meinel, Castaño-Candamil, Reis and Tangermann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Evaluation of a Dry EEG System for Application of Passive Brain-Computer Interfaces in Autonomous Driving

Thorsten O. Zander<sup>1,2</sup>\*, Lena M. Andreessen<sup>1,2</sup>, Angela Berg<sup>1</sup>, Maurice Bleuel<sup>1</sup>, Juliane Pawlitzki<sup>1</sup>, Lars Zawallich<sup>1</sup>, Laurens R. Krol<sup>1,2</sup> and Klaus Gramann<sup>1,3</sup>

*<sup>1</sup> Biological Psychology and Neuroergonomics, Technical University of Berlin, Berlin, Germany, <sup>2</sup> Team PhyPA, Biological Psychology and Neuroergonomics, Technical University Berlin, Berlin, Germany, <sup>3</sup> Center for Advanced Neurological Engineering, University of California San Diego, San Diego, CA, USA*

We tested the applicability and signal quality of a 16 channel dry electroencephalography (EEG) system in a laboratory environment and in a car under controlled, realistic conditions. The aim of our investigation was to estimate how well a passive Brain-Computer Interface (pBCI) can work in an autonomous driving scenario. The evaluation considered speed and accuracy of self-applicability by an untrained person, quality of recorded EEG data, shifts of electrode positions on the head after driving-related movements, usability and complexity of the system as such, and wearing comfort over time. An experiment was conducted inside and outside of a stationary vehicle with running engine, air-conditioning, and muted radio. Signal quality was sufficient for standard EEG analysis in the time and frequency domain as well as for the use in pBCIs. While the influence of vehicle-induced interference on data quality was insignificant, driving-related movements led to strong shifts in electrode positions. In general, the EEG system used allowed for a fast self-application of cap and electrodes. The assessed usability of the system was still acceptable, while the wearing comfort decreased strongly over time due to friction and pressure on the head. From these results we conclude that the evaluated system should provide the essential requirements for an application in an autonomous driving context. Nevertheless, further refinement is suggested to reduce shifts of the system due to body movements and to increase the headset's usability and wearing comfort.

#### Keywords: autonomous driving, passive BCI, EEG, usability, ERP

# INTRODUCTION

Driving has become a part of everyday life, which makes the drive to work or for recreational activities a simple routine task. However, the effects of the mental workload and effort required by driving often go unnoticed. A study by Borghini et al. (2014) found that mental workload, fatigue, and drowsiness are significantly increased while driving. Especially long periods of constant driving, as often performed by professional truck drivers, result in an accumulation of these effects over time, decreasing the driver's cognitive capabilities and driving performance, thus increasing the chances of traffic accidents.

#### Edited by:

*Mikhail Lebedev, Duke University, USA*

#### Reviewed by:

*Nima Bigdely-Shamlo, Qusp, USA Tjeerd W. Boonstra, University of New South Wales, Australia Tomas Emmanuel Ward, Maynooth University, Ireland*

> \*Correspondence: *Thorsten O. Zander tzander@gmail.com*

Received: *08 January 2016* Accepted: *08 February 2017* Published: *28 February 2017*

#### Citation:

*Zander TO, Andreessen LM, Berg A, Bleuel M, Pawlitzki J, Zawallich L, Krol LR and Gramann K (2017) Evaluation of a Dry EEG System for Application of Passive Brain-Computer Interfaces in Autonomous Driving. Front. Hum. Neurosci. 11:78. doi: 10.3389/fnhum.2017.00078*

The field of automotive human factors and ergonomics is concerned with minimizing safety risks depending on human performance in driving tasks. Today, many automations and small devices have found their way into cars in order to help reduce the mental workload required to operate the vehicle (Young and Stanton, 1997; Tadaka and Shimoyama, 2004; Ma and Kaber, 2005). A different approach aims to fully or at least partly automate the task of driving, so the human driver can be eliminated as a risk factor in most instances. The scientific field working toward this goal is called Autonomous Driving (Franke et al., 1998) and has grown more important over the past years.

One particular problem with autonomous driving is the question of responsibility: Who is accountable in case of an accident? Most countries still define the human driver of a car as the entity responsible for anything that happens while driving (Beiker, 2012). Therefore, experts believe it would be best to only automate some of the tasks that arise while driving, but to leave the most complex tasks to a human driver for the time being. According to Sukthankar et al. (1997), the task of driving consists of different levels, which are the strategic level (route planning), the tactical level (maneuver selection), and the operational level (maneuver operation). Automation of the lowest, operational level is thus legally the least complex, and also technically possible (Dickmanns and Zapp, 1987; Pomerleau, 1992). Driving along a highway could relatively easily be automated, but once the traffic situation changes, the human may be required to take over control. This approach thus requires an important exchange of information between the human driver and the automated system: The human must be informed of the pending takeover in a timely and appropriate manner. As stated by Llaneras et al. (2013), people tend to focus their attention on secondary tasks once the primary objective of driving has been taken over by automation. As a consequence, in a situation where the car drives autonomously, a signal given by the system to indicate the necessity for takeover might be missed, or might catch the human by surprise. This may result in loss of control over the vehicle.

As a solution to the above problem, the car could monitor the driver's mental state, and adapt the notification process to the current context. A completely attentive driver might quickly perceive and understand even simple signals, whereas, for example, a sleeping driver may need to be woken carefully by the car in advance of leaving the highway. Passive brain-computer interfaces (passive BCIs, Zander and Kothe, 2011) are promising approaches for such monitoring and automated adaptation (Zander et al., 2011). This technology enables real-time detection of mental conditions like fatigue, workload, and degree of relaxation (Blankertz et al., 2010; Gerjets et al., 2014), which offer a good estimate of whether or not the driver is ready to take over control of the car. But the passive BCI approach during autonomous driving is not limited to this. More general information (like mood or situational awareness) and also very specific information about the subjective interpretation of the current context, which might be reflected in the driver's brain as error responses, could be assessed by the passive BCI (Zander and Jatzev, 2012). This information could then be integrated in the autonomous decisions of the car. The car learns how the driver interprets the context and gains a degree of context-awareness by utilizing the driver's brain as a sensor.

Passive BCIs are commonly based on electroencephalography (EEG). Traditional EEG systems are relatively cumbersome to apply and use, requiring preparation of the skin, application of conductive gel, and cleaning of the cap afterwards. To make EEG applicable for non-scientific uses, e.g., to be used by drivers, its application and handling need to be as easy as possible. This is why alternative electrode systems (e.g., described in Zander et al., 2011; Liao et al., 2012) are an important focus of autonomous driving related BCI research. Primarily, the use of gel is eliminated, and the caps containing the electrodes are made for quick application, resulting in less preparation time and, ideally, more comfort for the wearer. Recent laboratory studies provided evidence of good signal quality, comparable to that of standard gel-based electrodes. It is still unclear, however, whether the signal quality can be maintained in real-world contexts.

This study focused on evaluating the use and application of a dry electrode EEG system in the context of a running vehicle. It was assessed how easy it is for an untrained person to apply the system on their own head, how well the electrodes can be positioned and remain in place, and whether the signal quality is sufficient for BCI usage when the system is self-applied. Two common features in the EEG, an N200-P300 ERP and the parietal alpha rhythm, were analyzed as examples of signals that potentially can be used in a passive BCI application. Furthermore, interference in the EEG signal resulting from usage inside a running car (a noisy environment) was investigated. Finally, wearing comfort over a prolonged period of time as well as general user acceptance were evaluated.

# MATERIALS AND METHODS

# Participants

Ten participants (five male) took part in the experiment. The mean age was 28 years (SD = 3.4). Two participants reported having sensitive skin. All participants gave their written informed consent to participate in the study and were paid 20 euros as an expense allowance. The overall duration of the experiment was on average 165 min (SD = 39 min), including breaks.

# Apparatus

#### Vehicle

The vehicle we used to evaluate the influence of vehicle-induced noise on the recorded EEG was a Volkswagen Touran (year of manufacture 2003). The car was stationary during the experiments, but had the engine running, the radio switched on (though muted), and the air conditioning enabled. A 7.6-inch TFT display was mounted to the right of the steering wheel near the center console.

#### Experimental Room

The experimental room used for baseline recordings was a non-frequented room at the TU Berlin with constant light, right next to the parked car. Distractions and disturbances were kept to a minimum.

#### Computer System

The EEG system was connected to a laptop (Sony Vaio Z, 2012) and EEG data was recorded using the BrainVision Recorder, BrainVision RDA (Brain Products GmbH, Munich, Germany), and LabRecorder (as part of the BCILAB framework, Delorme et al., 2010). The experimental paradigms were run using SNAP<sup>1</sup> (Iversen and Makeig, 2013). To analyze the data, we used the EEGLAB toolbox (Delorme and Makeig, 2004), an open source toolbox embedded in MATLAB. For classification we used the open source toolbox BCILAB (Kothe and Makeig, 2013), also embedded in MATLAB.

#### EEG System

The system examined in this study was the Brain Products actiCAP Xpress dry-electrode EEG system (see **Figure 1**) provided by Brain Products GmbH for the duration of the experiment. The system included 16 active data electrodes plus one reference and one ground electrode. Electrodes were applied to one of two differently-sized flexible caps, depending on the head circumference of the participant (52–58, or 58–64 cm). To ensure fixation on the participant's head, a chin belt was attached to the cap. Each cap provided 78 possible electrode positions, covering most of the extended international 10% system, with additional options to set up regions of interest. We used electrode positions Fp1, Fp2, Fz, FC5, FC6, C3, C4, Cz, CPz, Pz, CP5, CP6, PO3, PO4, POz, and Oz.

To adjust the system to an individual participant, the electrodes can be extended to different shapes and sizes by attaching so-called QuickBits (see **Figure 2**). The kit used in the study came with six T-shaped flat tips (with a diameter of 7 mm) to be attached to the forehead and earlobes, as well as 32 mushroom-head tips for application on the scalp. The latter come in different lengths of 8, 10, 12, and 14 mm and can be attached to the electrodes according to head shape and required pressure. This enabled a personalization of the system: Optimal sensor lengths for electrode positions can be noted, stored and re-applied in follow-up experiments.

Prior to applying the actiCAP Xpress, the electrodes were cleaned using a disinfectant spray. This was done even if the electrodes and sensors had not been used before, in order to remove dust and particles and thereby improve connectivity.

The electrode array was connected to a V-Amp EEG signal amplifier (Brain Products GmbH, Munich, Germany), which in turn was connected to a laptop computer through a universal serial bus (USB) 2.0.

# Experimental Procedure

#### Experimental Rationale

This study was designed to assess different requirements for an EEG system to be applied in real-world driving scenarios. We defined the following requirements: (1) self-applicability of the system, (2) impact of interfering noise signals inside a running vehicle on EEG signal quality, (3) stability of cap and electrode positions after context-related movements, and (4) usability and wearing comfort of the system.

The experiment was divided into four blocks covering these four issues, answering the following questions.


**Figure 3** summarizes the experimental session. After arrival of the participant, the experiment was explained and a demographic survey was conducted. While the cap was personalized by the investigator by exchanging electrode tips where necessary, the participant was asked to read the instruction manual of the system, in preparation for Block I.

#### Block I: Self-application

Self-application of the cap, as opposed to having the cap fitted by a trained operator, may take a different amount of time and may affect the positioning of the electrodes and the signal quality. To estimate these effects, we compared cap application in two conditions: application by the experimenter, and self-application by the participant. Customization of the cap was not included here, as it is assumed to be a one-time effort.

<sup>1</sup> Simulation and Neuroscience Application Platform (SNAP). Available: https://github.com/sccn/SNAP.

Participants were seated in the experimental room, in front of a laptop. A stopwatch was used to first measure the time required by the experimenter to apply the EEG cap to the participant's head.

Once the cap and ground/reference electrodes were in place, electrode positions were measured using the Polaris Vicra system (Northern Digital Inc., Waterloo, ON, Canada), allowing for measuring 3-dimensional electrode locations. We chose to record the 16 electrode positions, as well as the inion, the nasion and the left and right preauricular points. The latter three were used as coordinate references to allow the transformation of coordinates taken from different measuring sessions into one coordinate system to allow comparison (described below in the section "Analysis Procedures"). To achieve comparable, stable positions for the reference points in each measurement during the experiment, we marked them by a small dot on the respective positions on the participant's skin using a removable eudermic marker.

Following this, signal quality was optimized by relatively fine-grained adjustments to the electrodes. As the system did not provide an objective measure of signal quality or electrode contact (e.g., impedance), signal quality was assessed visually. The signal was monitored using the BrainVision Recorder software, with all 16 channels displayed at once, set to a resolution of 50 µV. A display filter was enabled, bandpass-filtering the visible signal from 0.1 to 40 Hz without affecting the recording. The duration of this optimization was again timed using a stopwatch. The resulting signal quality was also recorded, as rated by the experimenter. Signal quality was judged from the visual form of the signal on the display; artifacts had to be recognized visually. The rating followed predefined guidelines and was done on a 5-point scale with 5 meaning "perfect signal" and 1 meaning "no signal at all" (see **Figure 4**). This rating was done twice: Once for the signals with the display filter switched on, and once based on the unfiltered raw signal.

Following this, the cap was removed and participants, who had read the instruction manual, were asked to put on the cap by themselves after all of their questions about the procedure had been answered by the experimenter. Application time was again measured, as were the electrode positions and the resulting signal quality.

#### Block II: EEG Recording

To investigate signal quality in standard EEG analyses, we chose the well-known N200 and P300 components of the visual event-related potential and the parietal alpha rhythm. Both the time-domain and the frequency-domain parameters are well-examined phenomena in EEG research. Hence, clear expectations about morphology, topography and signal strength can be derived, which form the baseline of comparison for our results.

In order to assess the EEG signal and the possible influence on it of the electromagnetically noisy environment that is the car, participants performed in two established experimental paradigms of BCI research (Zander et al., 2011), once in the experimental room, and once inside the car. The order of these two conditions was randomized between participants.

The first paradigm focused on the elicitation of visual event-related potentials (ERPs) using a standard oddball approach: An infrequent deviant stimulus sometimes appeared instead of the frequent standard stimuli (Duncan-Johnson and Donchin, 1977; see **Figure 5**). This is a common approach when researching ERPs referred to as the N200-P300 complex (Polich and Kok, 1995; Linden, 2005). ERP detection during autonomous driving can be useful, as it may allow a car to detect how drivers react cognitively to perceived stimuli/information.

On the screen, participants saw a circle divided by lines into 30° angles. First, a bar appeared, like a clock's arm pointing 12 o'clock. This bar then rotated clockwise in discrete steps, once every second. A standard stimulus had it rotate by 90°; a deviant consisted of an initial 60° rotation, followed by a 100 ms pause and a 15° counterclockwise rotation. After each deviant, the bar disappeared and reappeared at the 12 o'clock position.

10% of all stimuli were deviants. In total 400 trials were displayed (360 standard, 40 deviant).

The second paradigm focused on features in the spectral domain, specifically the parietal alpha rhythm. This feature is of special interest to autonomous driving, as parietal alpha can be used as an indicator of whether the participant is currently in a relaxed state or performing some mentally demanding task (Berka et al., 2007). It also is a standard example for features in the spectral domain.

The paradigm (see **Figure 6**) presented to the participant was designed to induce changes in parietal alpha activity by alternating between two states of mind: Engaged and relaxed. To engage the participant, a six-letter word was presented letter by letter, with letters appearing on random locations on the screen amidst visual noise. Each letter was only visible for 1 s. Participants were instructed to read the word. After each engagement trial, the participant was instructed simply to relax for 6 s with their eyes open. This relaxation phase was introduced using an auditory signal and ended by a similar one with lower pitch.

There were 32 trials of each condition. The order of words in the engaged condition was randomized across participants.

These two paradigms were presented in fixed order to the participants in the two conditions (room vs. car).

#### Block III: Driving-Related Movements

The third block investigated the influence of movements on the position of the electrodes.

Electrode positions were recorded, again using the Polaris system mentioned earlier, at the start of this block. Participants then performed a series of three different types of driving-related movements inside the car, and the electrode positions were measured again after each group of movements. Because measurements were not done inside the car but in a nearby room, some walking was required. Electrode cables were bundled together and fixed to the participant's clothing in a relaxed way to minimize their strain on the cap while walking.

To make movements comparable between participants, we placed markers (sticky notes) at certain places in the car: One on the left rear window, one above the driver's seat, one in the legroom of the front passenger seat and one in the center of the rear bench seat. Before seating the participant in the driver's seat, the markers were shown to them. The EEG system was not connected to the amplifier during the movements. All instructions for different movements were given through pre-recorded audio files played back using a laptop and speakers inside the car.

#### Block IV: Usability

To assess the usability of the system, the participants were asked to fill out a questionnaire right after Block I. The System Usability Scale (SUS; Brooke, 1986) was employed, which has also been used in earlier BCI-related studies (Pasqualotto et al., 2011; Duvinage et al., 2012). SUS is a standardized questionnaire consisting of ten questions based on Likert scales with five options ranging from "strongly disagree" to "strongly agree." In total, SUS contains five positively and five negatively formulated questions about the system being assessed, for example "I think that I would like to use this system frequently" or "I found the system unnecessarily complex." From the answers given, a SUS score is calculated, ranging between 0 (worst possible system) and 100 (best possible system). This score has to be interpreted taking the individual context of system usage into account. In contrast to qualitative assessments, the SUS does not yield any insight into which usability problems exactly are present within the system. It does, however, provide a quick and reliable way to determine whether or not major changes are necessary in order to make the system safe and comfortable to use.

Additionally, the participants were asked to rate the level of comfort wearing the system after each of the previously described experimental blocks (I–III) on a scale from 1 to 10, one meaning "extremely bad" and ten "very comfortable." We acquired these three subjective impressions to gather insight into how the system's perceived comfort changed over the course of the experiment.

To get an even deeper insight into the comfort of wearing the system, participants were asked to fill out another questionnaire after the third experimental block, after roughly 140 min of wearing the system almost constantly. We adapted a questionnaire for the evaluation of the wearing comfort for firemen helmets (Fabrizio and Cimolino, 2014), by only keeping questions deemed fitting to our context. All questions were rated on a five point Likert scale. In addition to these questions, we asked two yes-no questions: Whether or not the participant believed the cap had moved, and whether or not it induced the feeling of dents on their head. Finally, we asked the participants to mention any discomfort associated with wearing the system, like the feeling of pressure on the head, headaches, or nausea.

# Analysis Procedures

#### Block I: Self-application

Comparison of time needed by the experimenter and the participant to apply the system and to adjust the electrodes was done by two-sample t-tests.

The signal quality ratings were subjected to a three-way mixed measures ANOVA with the two within-subject factors visual filters (no filters vs. 0.1–40 Hz bandpass) and electrode (Fp1 vs. Fp2 vs. Fz vs. FC5 vs. FC6 vs. C3 vs. C4 vs. Cz vs. CPz vs. Pz vs. CP5 vs. CP6 vs. PO3 vs. PO4 vs. POz vs. Oz) and the between-subject factor applicant (investigator vs. participant).

Because a total of six different measurements of electrode positions were taken during the course of this experiment, these measurements were first transformed into one coordinate system to allow a unified comparison. To this end, all measurements were re-referenced to a mean head middle and radius, within participants, as follows.

a. Drawing a line through both preauricular points $lp_j$ and $rp_j$:

Calculate the direction of the line by computing new coordinates

$$(u_j)_i := (lp_j)_i - (rp_j)_i, \text{ for } i = 1, 2, 3 \text{ denoting the components of the three-dimensional vector } u_j.$$

Define the line by

$$g_j := lp_j + r_j u_j, \text{ with } r_j \text{ to be determined.}$$

b. Construction of a plane $H_j$ through the nasion $n_j$, which is perpendicular to the line $g_j$:

Determine the plane equation for $H_j$ in the variables $x$, $y$, $z$

$$H_j: (u_j)_1 x + (u_j)_2 y + (u_j)_3 z = e.$$

To find $e$, insert the coordinates of the nasion reference point $n_j$ into the equation

$$H_j(n_j): (u_j)_1 (n_j)_1 + (u_j)_2 (n_j)_2 + (u_j)_3 (n_j)_3 = e.$$

c. For the purpose of finding the intersection of the line $g_j$ with the plane $H_j$, insert the coordinates of $g_j$ into the plane equation above and solve for $r_j$:

$$H_j(g_j): r_j = \frac{e - (u_j)_1 (lp_j)_1 - (u_j)_2 (lp_j)_2 - (u_j)_3 (lp_j)_3}{(u_j)_1^2 + (u_j)_2^2 + (u_j)_3^2}.$$

Inserting $r_j$ into the line equation yields the head midpoint:

$$hm_j = lp_j + r_j u_j.$$

With $\overline{hm}$ denoting the mean head midpoint, the offset of each measurement is

$$d_j := hm_j - \overline{hm}, \quad j = 1, \dots, 6.$$

Then, all recorded electrode positions $(ep_k)_j$, $k = 1, \dots, 16$, are re-referenced to $\overline{hm}$ by addition with $d_j$ and the Euclidean distance $ed_{j_1 j_2}$ between different recordings $j_1$, $j_2$ is calculated:

$$\begin{aligned} (d_{j_1 j_2})_i &:= ((ep_k)_{j_1} + d_{j_1})_i - ((ep_k)_{j_2} + d_{j_2})_i, \\ ed_{j_1 j_2} &:= \sqrt{(d_{j_1 j_2})_1^2 + (d_{j_1 j_2})_2^2 + (d_{j_1 j_2})_3^2} \end{aligned}$$

The value used for comparison of different recordings $j_1$, $j_2$ was this Euclidean distance $ed_{j_1 j_2}$.
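The construction of the head midpoint in steps a to c amounts to projecting the nasion perpendicularly onto the line through the two preauricular points. The following is a minimal sketch of that geometry in plain Python; the function names `head_midpoint` and `euclidean_distance` are hypothetical and chosen for illustration only, and the example coordinates are made up rather than taken from the study.

```python
import math


def head_midpoint(lp, rp, n):
    """Intersect the line through lp and rp with the plane through n
    that is perpendicular to that line (steps a to c)."""
    # Direction vector u = lp - rp
    u = [a - b for a, b in zip(lp, rp)]
    # Plane constant e = u . n (plane passes through the nasion n)
    e = sum(ui * ni for ui, ni in zip(u, n))
    # Solve u . (lp + r * u) = e for the line parameter r
    u_dot_lp = sum(ui * li for ui, li in zip(u, lp))
    r = (e - u_dot_lp) / sum(ui * ui for ui in u)
    # Head midpoint hm = lp + r * u
    return [li + r * ui for li, ui in zip(lp, u)]


def euclidean_distance(p, q):
    """Euclidean distance between two three-dimensional points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Illustrative coordinates only (arbitrary units)
hm = head_midpoint(lp=(-1.0, 0.0, 0.0), rp=(1.0, 0.0, 0.0), n=(0.2, 1.0, 0.0))
```

In this example the nasion lies slightly off-center, so the midpoint falls at x = 0.2 on the interaural line rather than at the geometric center between the two preauricular points.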

For Block I, recorded positions from the investigator-applied cap were compared to the positions from the self-applied cap. Mean differences of electrode positions were then compared to the expected value of no difference in positions using a one-sample t-test against zero.

#### Block II: EEG Recordings

#### **Oddball paradigm: ERP analysis**

EEG data was first preprocessed by applying a bandpass-filter from 1 to 30 Hz, retaining all frequencies relevant for later analyses. Then, epochs of 1100 ms were extracted, starting 100 ms before stimulus onset of the standard and deviant events. Baseline correction was performed with a 100 ms pre-stimulus interval.

To compare event-related activity between car and indoor recordings, amplitudes and latencies of the N200 and P300 components were extracted.

First, the indoor condition was used as a baseline as it conforms to laboratory conditions. Inspection of the grand average revealed a global negative minimum at 300 ms over the centro-parietal lead (Pz) and a global positive maximum at 400 ms over the centro-central lead (Cz). Based on these peaks, search windows of 300 ± 70 and 400 ± 70 ms were defined to search for the corresponding extrema in the individual averages. Once the global peaks had been identified for each individual, the peaks on individual channels were identified using a ± 25 ms window around the individual global peak. Mean amplitudes and latencies were extracted for all channels. This procedure resulted in a 4 x 16 matrix for each participant, consisting of the mean amplitudes and the latencies of the two components at each channel.
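The window-restricted peak search described above can be illustrated with a short sketch. It only covers the search for an extreme value inside a latency window and is not the analysis code used in the study; the function name `find_peak`, its parameters, and the sample values are assumptions made for this example.

```python
def find_peak(times_ms, amplitudes, center_ms, half_width_ms, polarity):
    """Return (latency_ms, amplitude) of the most extreme sample inside
    the window center_ms +/- half_width_ms.

    polarity is "negative" to search for a minimum and "positive" to
    search for a maximum.
    """
    window = [
        (t, a)
        for t, a in zip(times_ms, amplitudes)
        if center_ms - half_width_ms <= t <= center_ms + half_width_ms
    ]
    if not window:
        raise ValueError("search window contains no samples")
    pick = min if polarity == "negative" else max
    return pick(window, key=lambda sample: sample[1])


# Made-up averaged ERP samples (time in ms, amplitude in microvolts)
times = [200, 250, 300, 350, 400, 450]
erp = [0.0, -2.0, -5.0, -1.0, 6.0, 3.0]

negativity = find_peak(times, erp, center_ms=300, half_width_ms=70, polarity="negative")
positivity = find_peak(times, erp, center_ms=400, half_width_ms=70, polarity="positive")
```

The same function could be applied a second time per channel with a 25 ms half-width centered on the individual global peak latency.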

For comparison of mean peak amplitudes two repeated measures ANOVAs were performed. Mean amplitudes from electrode Pz were used for the negativity and from Cz for the positivity. Each 2x2 ANOVA had the two within-participant factors recording condition (indoor vs. car) and stimulus (standard vs. deviant).

In order to examine disparities of mean peak latencies between conditions (indoor vs. car), mean difference peak latencies were calculated by subtracting the negative from the positive peak latency. The mean difference was taken per participant for the two conditions and subjected to a paired sample t-test.

To test for equivalence of EEG measures between recording conditions, the two one-sided tests procedure (TOST; Schuirmann, 1981, 1987; Westlake, 1981) was applied to mean peak amplitudes and mean difference peak latencies with an epsilon of the standard deviation of the indoor condition, which was regarded as the control group (R-package "equivalence" May 14, 2016; V0.7.2). A p-value of 0.05 was taken as the significance threshold for all TOST.
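The logic of an equivalence test can be sketched through its confidence-interval formulation: two one-sided tests at level alpha reject non-equivalence when the (1 − 2·alpha) confidence interval of the mean difference lies entirely within ±epsilon. The sketch below assumes paired observations and takes the critical t-value as an argument; it is not the implementation of the R package named above, and the function name `tost_paired_ci` and the data are hypothetical.

```python
import math
import statistics


def tost_paired_ci(control, treatment, epsilon, t_crit):
    """Equivalence check for paired data via confidence-interval inclusion.

    t_crit is the one-sided critical t-value for the chosen alpha and
    df = n - 1 (supplied by the caller, e.g. about 1.895 for df = 7
    and alpha = 0.05).
    Returns (mean_difference, (lower, upper), equivalent).
    """
    diffs = [t - c for c, t in zip(control, treatment)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean paired difference
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    lower = mean_diff - t_crit * se
    upper = mean_diff + t_crit * se
    equivalent = (-epsilon < lower) and (upper < epsilon)
    return mean_diff, (lower, upper), equivalent


# Made-up data for four participants; df = 3, one-sided t_crit about 2.353
indoor = [1.0, 2.0, 3.0, 4.0]
car = [1.1, 2.0, 3.1, 4.0]
mean_diff, interval, equivalent = tost_paired_ci(
    indoor, car, epsilon=statistics.stdev(indoor), t_crit=2.353
)
```

As in the procedure described above, the example uses the standard deviation of the control (indoor) condition as epsilon.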

#### **Induced alpha paradigm: frequency analysis**

To compare oscillatory features between car and indoor recordings, three different measures were taken: The power spectral density function covering 0.1–40 Hz, single measurements of the band power in the alpha band, and the time course of the alpha band power during the 6-s trials of the paradigm (engaged vs. relaxed).

Fluctuations in alpha power occur with a broader distribution over posterior areas of the scalp (Sauseng et al., 2005). Since we were interested in parietal alpha as potential indicator of mental load, analyses were restricted to five posterior electrodes, namely Pz, PO3, PO4, POz, and Oz. The data was bandpass filtered from 0.1 to 40 Hz and time epochs of 6 s were selected, covering each full trial.

Power spectral densities (PSD) were calculated for each entire epoch and averaged per participant, resulting in 2 x 2 x 5 PSD distributions for each participant (2 experimental conditions x 2 mental states x 5 channels). We used these participant-individual PSDs as well as the averaged PSDs over all participants (grand average), resulting in a total of 11 (2 x 5+1) PSD-distributions for each experimental condition.

Individual and grand average Pearson Correlation of the PSD in the frequency band of 0.1 Hz to 40 Hz were calculated for each electrode between indoor and car conditions and tested for significance using one sample t-tests against zero.

The alpha band (7–13 Hz) being of prime interest here, we also calculated a single bandpower value in this frequency range for each participant, electrode, and trial. We used epochs of 4 s length, starting 2 s after stimulus onset. Logarithmic variances of each trial per electrode of each participant were calculated and normalized with the maximum value of each electrode. These measures were then averaged over all trials, resulting in a normalized mean alpha band power for each participant under each experimental condition on the five investigated electrodes. Effects between recording conditions, stimuli and electrodes were investigated in a 2 x 2 x 5 ANOVA with the three within-participant factors recording condition (indoor vs. car), stimulus (engaged vs. relaxed) and electrode (Pz vs. PO3 vs. PO4 vs. POz vs. Oz). The factor electrode is a repeated measure here as EEG measures at one electrode depend on values measured by other electrodes. Again, the TOST procedure with an epsilon of the standard deviation of the indoor condition was applied to normalized mean alpha band power values to test for equivalence between recording conditions.
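The normalized band power measure can be illustrated as follows. The sketch assumes that each trial has already been band-pass filtered to the alpha range (filtering is omitted here), uses the population variance, and requires the largest log-variance to be positive so that division by the maximum is meaningful; these choices, as well as the function name `normalized_log_bandpower`, are assumptions rather than details given in the text.

```python
import math
import statistics


def normalized_log_bandpower(trials):
    """Mean of single-trial log-variances, normalized by their maximum.

    trials: list of single-trial signals (lists of samples) for one
    electrode, assumed to be band-pass filtered to the alpha band.
    """
    log_vars = [math.log(statistics.pvariance(trial)) for trial in trials]
    peak = max(log_vars)
    if peak <= 0:
        # Assumption of this sketch: normalization needs a positive maximum
        raise ValueError("maximum log-variance must be positive")
    return statistics.mean(lv / peak for lv in log_vars)


# Two made-up trials with variances 4 and 16
example_trials = [[-2.0, 2.0, -2.0, 2.0], [-4.0, 4.0, -4.0, 4.0]]
alpha_power = normalized_log_bandpower(example_trials)
```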

As a third measure, the time course of the band power in the alpha band was used. It was calculated by shifting a 500 ms window over each single trial and calculating the band power for each window position. To avoid leakage effects, the window was multiplied with a Gaussian bell curve of the same size. Afterwards the single-trial measurements were normalized with the mean of all band powers. The normalized measurements were averaged, resulting in 2 x 5 time courses for each participant (2 experimental conditions x 5 channels). As above, we also took the grand average into account, resulting in 11 time courses in total per experimental condition.

To examine the difference in the time course of the band power in the alpha range between conditions, Pearson Correlations were calculated for each participant, channel and condition.

#### **BCI Analysis of both paradigms**

BCILAB's built-in classification approaches were used to evaluate the offline single-trial accuracies as an estimate of potential online performance.

For the oddball paradigm, data was bandpass filtered from 0.1 to 15 Hz and downsampled to 100 Hz. Epochs of 800 ms were extracted starting at each stimulus marker. A windowed-means approach (Blankertz et al., 2011) was used to extract features, using 8 consecutive windows of 50 ms starting at 300 ms poststimulus. As a classifier we used linear discriminant analysis, LDA (Webb, 2002). Mean ERP classification error rates of all eight participants were subjected to a paired samples t-test.
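As an illustration of the windowed-means feature extraction, the sketch below averages one channel of an epoch in 8 consecutive 50 ms windows starting 300 ms after the stimulus marker, at a sampling rate of 100 Hz. It is a simplified stand-in and not BCILAB code; the function name `windowed_means` and its parameters are hypothetical.

```python
def windowed_means(epoch, srate_hz=100, start_ms=300, window_ms=50, n_windows=8):
    """Mean amplitude in consecutive time windows of a single-channel epoch.

    epoch: list of samples whose first sample coincides with the
    stimulus marker.
    """
    samples_per_window = window_ms * srate_hz // 1000
    first = start_ms * srate_hz // 1000
    features = []
    for w in range(n_windows):
        lo = first + w * samples_per_window
        hi = lo + samples_per_window
        if hi > len(epoch):
            raise ValueError("epoch too short for the requested windows")
        features.append(sum(epoch[lo:hi]) / samples_per_window)
    return features


# An 800 ms epoch at 100 Hz has 80 samples; a ramp makes the means easy to check
ramp_epoch = list(range(80))
features = windowed_means(ramp_epoch)
```

For a multi-channel recording, the feature vector passed to the classifier would be the concatenation of these eight values over all channels.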

Logarithmic band power was used for feature extraction (Solis-Escalante et al., 2010; Zander et al., 2011) of the data of the second paradigm. This was applied to epochs of 6 s, as above. We performed a (10 x 10)-fold cross-validation, and classified using LDA. Mean classification error rates were subjected to a paired samples t-test.

Classification error rate results from both paradigms were subjected to a TOST procedure with an epsilon of the standard deviation of the indoor condition to test for equivalence between recording conditions.

#### Block III: Driving-Related Movements

Each of the three movement groups had one electrode position measurement before, and one after it. Mean differences of electrode positions prior to and after each movement group were compared to the expected value of no difference in positions using a one-sample t-test against zero.

#### Block IV: Usability

The System Usability Scale was interpreted following the guidelines set by Brooke (1986). To determine the resulting SUS score of the system, all given answers were weighted accordingly and added up. This resulted in a total score per participant, which then was multiplied by the factor 2.5.
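The weighting referred to here follows the standard SUS scoring rule: odd-numbered (positively worded) items contribute their response minus 1, even-numbered (negatively worded) items contribute 5 minus their response, and the sum is multiplied by 2.5. A minimal sketch, assuming responses are coded from 1 ("strongly disagree") to 5 ("strongly agree"); the function name `sus_score` is chosen for illustration.

```python
def sus_score(responses):
    """Compute the SUS score (0 to 100) from ten responses coded 1 to 5."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    total = 0
    for index, answer in enumerate(responses):
        if not 1 <= answer <= 5:
            raise ValueError("responses must be between 1 and 5")
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded
            total += answer - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded
            total += 5 - answer
    return total * 2.5


best_case = sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])
neutral_case = sus_score([3] * 10)
```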

After experimental blocks I to III, participants were asked to give a subjective estimate of how comfortable the system felt. The median of the comfort ratings of all participants was used as the overall comfort rating here. To test for differences between the three time points, a Wilcoxon Signed-rank test was applied. The wearing comfort questionnaire was evaluated descriptively.

# RESULTS

#### Block I: Self-application

#### Application Time

A two-sample t-test indicated that the mean time needed for application of the cap did not differ significantly between experimenter (M = 123.2 s, SD = 43.8) and participants (M = 104.9 s, SD = 49.0), t(9) = 0.880, p = 0.391, although participants tended to be faster. Mean times needed for adjustment of electrodes also did not differ significantly between investigator (M = 256.3 s, SD = 221.3) and participants (M = 310.2 s, SD = 285.1), t(9) = 0.472, p = 0.642, although the investigator tended to be faster.

#### Electrode Signal

The three-way mixed measures ANOVA on signal quality ratings revealed no significant main effect of applicant, F(1, 18) = 0.341, p = 0.341, η<sup>2</sup> = 0.019. The main effect of filter was significant, F(1, 18) = 66.861, p < 0.001, η<sup>2</sup> = 0.788. Since the main effect of electrode violated the assumption of sphericity, Greenhouse-Geisser corrected values were used. The main effect of electrode was significant, F(5.167, 93.012) = 2.876, p = 0.017, η<sup>2</sup> = 0.138. None of the interaction effects were significant, all ps > 0.281.

#### Electrode Positions

The t-test against zero performed on mean differences of electrode positions (M = 13.76 mm, SD = 5.12 mm) between investigator- and self-applied cap yielded significance, t(9) = 8.498, p = 0.00001. The electrode positions varied most on the midline of the head, with 15.5 mm variation (averaged over all 10 participants) at Oz to 16.1 mm averaged variation at Fz. This could be due to the structure of the cap: It has two holes for the ears, so electrodes in this area are held in place more firmly than electrodes elsewhere. Electrodes on the forehead can be positioned up to 1 cm higher or lower without any obvious effects on the cap such as discomfort or poor fit, so it was hard for both participants and investigators to position the cap correctly around the midline of the head (see **Figure 7**).


# Block II: EEG Recordings

Due to software problems on a laptop, EEG data of two participants had to be excluded. Analyses of the EEG data were based on the remaining eight participants.

#### Oddball Paradigm: ERP Results

Grand average ERPs from the oddball paradigm are depicted in **Figure 8**. The repeated measures ANOVA performed on mean amplitudes of the negativity measure yielded significance for the main factor stimulus, F(1, 7) = 21.745, p = 0.002, η<sup>2</sup> = 0.756. Amplitudes for deviant stimuli (M = −5.44 µV, SD = 6.21 µV) were more negative than for standard stimuli (M = −0.01 µV, SD = 2.66 µV). The main factor environment was not significant, F(1, 7) = 0.101, p = 0.760, η<sup>2</sup> = 0.014. There was also no significant interaction, F(1, 7) = 0.261, p = 0.625, η<sup>2</sup> = 0.036. Results of a TOST procedure with an epsilon of the standard deviation of the indoor condition were not significant (mean difference = 0.145; epsilon = 3.95; confidence-interval: −6.79 to 7.08; df = 7; p = 0.166).

For the positivity measure there was no significant main effect of stimulus, F(1, 7) = 5.001, p = 0.060, η<sup>2</sup> = 0.417. The main effect environment also was not significant, F(1, 7) = 2.767, p = 0.140, η<sup>2</sup> = 0.283. The interaction between stimulus and environment was significant, F(1, 7) = 31.800, p = 0.001, η<sup>2</sup> = 0.820. Amplitudes of the deviant trials were higher indoors (M = 9.54 µV, SD = 9.05 µV) than in the car (M = 5.18 µV, SD = 10.57 µV), while amplitudes in standard trials indoors (M = 0.02 µV, SD = 1.25 µV) were only slightly smaller than in the car (M = 0.92 µV, SD = 2.27 µV). Due to this significant interaction effect no TOST was performed.

Results from the t-test performed on mean peak latency differences of the indoor (M = 85 ms, SD = 46.3 ms) and the car condition (M = 101.5 ms, SD = 75.1 ms) were not significant (p = 0.569). The TOST procedure with an epsilon of the standard deviation of the indoor condition showed no significance for mean peak latency differences (mean difference = −16.5; epsilon = 46.3; confidence-interval: −68.8 to 35.8; df = 7; p = 0.158).

#### Induced Alpha Paradigm: Frequency Results

All individual correlation values for power spectral densities between conditions were higher than 0.79 on all five electrodes, with a mean correlation value of 0.97 (SD = 0.046). All t-tests of these correlations against zero were significant with ps < 0.0001. For the grand average, correlation values between indoor and car condition were both higher than 0.989, with a mean of 0.997 (SD = 0.004). T-tests against zero yielded significance (ps < 0.0001) for both conditions (engaged/relaxed).

The three-way repeated measures ANOVA with within-subject factors recording condition (p = 0.061), stimulus (p = 0.177), and electrode (p = 0.24) performed on mean alpha band powers yielded no significant main effects and no significant interactions (all ps > 0.272). The TOST procedure with an epsilon of the standard deviation of the indoor condition applied to mean alpha band powers showed significance on electrodes PO4 (mean difference = 0.049; epsilon = 0.129; confidence-interval: −0.031 to 0.128; df = 7; p = 0.049) and Oz (mean difference = 0.001; epsilon = 0.127; confidence-interval: −0.079 to 0.076; df = 7; p = 0.009). The TOST was not significant for electrodes PO3, POz, and Pz, all ps > 0.340.

Alpha band time course (see **Figure 9**) correlations between indoor and car condition yielded a mean correlation of r = 0.27 for the relaxed condition (Pz: r = 0.43, PO3: r = 0.26, PO4: r = 0.29, POz: r = 0.30, Oz: r = 0.09). Correlations in this condition were significant on all five electrodes for five participants (ps < 0.00001), on four electrodes for one participant (ps < 0.005), and for the other three participants on three electrodes (ps < 0.021). In the engaged condition the mean correlation of all participants was r = 0.23 (Pz: r = 0.34, PO3: r = 0.19, PO4: r = 0.18, POz: r = 0.31, Oz: r = 0.14). Tests yielded significance of correlations on all five channels for three participants (ps < 0.043). For three participants correlation was significant on four channels (ps < 0.00001) and for two participants on three electrodes (ps < 0.00001).

#### BCI Results of Both Paradigms

A paired samples t-test indicated that the error rates for ERP classification in the indoor condition (M = 0.126, SD = 0.086) did not differ significantly from the error rates in the car condition (M = 0.145, SD = 0.116), t(7) = −0.68149, p = 0.518. Furthermore, the TOST procedure with an epsilon of the standard deviation over participants in the indoor condition confirmed significant equivalence of classification results in the two recording conditions (mean difference = 0.018; epsilon = 0.086; confidence-interval: −0.032 to 0.069; df = 7; p = 0.020).

A paired samples t-test indicated that the error rates of band power classification for the indoor condition were lower (M = 0.283, SD = 0.160), but did not differ significantly from the error rates in the car condition (M = 0.351, SD = 0.137), t(7) = −1.608, p = 0.152. The TOST procedure with an epsilon of the standard deviation over the participants in the indoor condition confirmed significant equivalence for classification results in the two recording conditions (mean difference = 0.066; epsilon = 0.162; confidence-interval: −0.012 to 0.144; df = 7; p = 0.026).

FIGURE 9 | Grand Averages of the alpha band time courses for relaxed and engaged conditions indoors and in the car. For the red and the green curve, displaying the relaxed conditions, a similar pattern starting 1 s after onset of stimulus presentation is observed. Similarities over time are also apparent for the engaged conditions, represented in the black and blue curve. Clear co-variation of indoor and in car alpha time courses for both relaxed and engaged conditions is proven by high correlation between the signals.

#### Block III: Driving-Related Movements

**Figure 10** shows the shifts in electrode positions after each of the three groups of movements.

After head-related movements the difference between electrode positions (M = 9.6, SD = 9.1) differed significantly from zero, t(9) = 3.3237, p = 0.009. The apparent lateralization of this effect (25.3 mm mean variation at CP5 vs. 19.6 mm at CP6) may be due to the direction of the shoulder check.

After performance of arm movements the mean difference between electrode positions (M = 7.6, SD = 4.8) differed significantly from zero, t(9) = 5.0241, p = 0.001. Variations were located mainly to the right side of the head with a maximum of 10.5 mm mean variation at PO4. The cause for this may be the direction of the rotation and/or handedness of participants.

Mean electrode position differences after whole-body movements (M = 8.4, SD = 6.4) differed significantly from zero, t(9) = 4.1691, p = 0.002. The greatest shift was on the forehead with 10.1 mm average variation on Fp2 and on the midline of the head (8.2 and 9.3 mm mean variation at POz and Fz). This could be caused by the cables, which were tied together, but interfered with the seatbelt nevertheless.

#### Block IV: Usability

The total SUS score of the system added up to 65. Following the official SUS score interpretation, this is slightly above the threshold for an acceptable system.

Due to minor delays during the experiments, the time points of the additional questionnaires varied slightly for each participant. On average, questions were answered after 60 (Block I), 122 (Block II), and 137.5 (Block III) min.

After the first 60 min, the system received a comfort rating of 7.5, which then decreased significantly over the next hour, resulting in a rating of 3 after 122 min. In the following quarter of an hour needed for Block III, the comfort rating stayed stable at 3. A Wilcoxon signed-rank test showed that there was a significant difference between the first rating after 60 min (Mdn = 7.5) and the second rating after 122 min (Mdn = 3), (W = 0, Z = −2.69, p = 0.008). No valid Wilcoxon signed-rank test could be performed to compare the second and third ratings, because the difference between the two ratings was zero for six participants, leaving fewer than six effective samples (W = 4, Z = −0.82, p = 0.625). The first and the third rating again differed significantly (W = 0, Z = −2.67, p = 0.008).

The six examined items of wearing comfort of the system are summarized in **Figure 11**. A feeling of pressure on the head was rated as the most irritating with a mean score of 2.2. The overall impression of wearing comfort got a mean score of 2.7, and was therefore also perceived as bad. The overall weight of the system on the head was on average rated as the most pleasant aspect of it with a score of 4.2.

Furthermore, the wearing comfort questionnaire yielded the following insights. Seven participants complained about dents and chafe marks on their heads, four about headaches, and one each about neck pains, nausea, and dizziness. Moreover, one participant had the subjective impression that the system had moved over the course of the experiments. None of the participants reported skin irritations due to wearing the cap.

#### DISCUSSION

#### Block I: Self-application

We found that the participants were as fast as the experimenter in applying the cap, and equally capable of optimizing signal quality. We thus conclude that this type of dry electrode EEG system can indeed be used by individual end users. We should note, however, that there was no objective measure of when the application was finished; this was based on the individual judgement of the experimenter.

We did not investigate the personalization of the cap by adjusting the length of each electrode pin, because this task needs to be done only once. Therefore, we did not investigate how easy it is to personalize the cap while wearing it. Personalization did, however, take up quite some time. We assume that the QuickBit approach would benefit from improvement: Continuously adjustable bits would probably simplify personalization and optimize the result.

While it is not surprising that the signal quality was rated better with active display filters, we had assumed that the signal quality would be better after adjustments by an expert operator than after adjustments by the participant. This, however, was not the case: Participants reached a similar, sometimes even better signal quality. We assume that participants had a better feeling for how hard and where exactly the electrodes pressed against their heads, allowing them to fit the electrodes even better to the scalp than the experimenter could without the risk of harming the participant.

For the electrode positions, some variation in the measurements must be taken into account. The system used has known variations in measured data points, and for some electrodes (primarily at the back of the head), the measuring stylus may have moved slightly due to head shifts that were sometimes necessary for the measurement. This problem was addressed mathematically, as described above. It was also not possible to point the stylus exactly at the electrode's point of contact with the skin, but only at the electrode's body. It remains unclear whether, or to what extent, the differences in electrode positions we measured imply that the points of contact changed as well.
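A position difference of this kind can be expressed as the Euclidean distance between two digitized 3D coordinates of the same electrode. The sketch below assumes coordinates in millimeters in a common head-centered frame. The function name and coordinate values are hypothetical, and the correction for head shifts mentioned above is not reproduced.

```python
import math


def electrode_shifts(before, after):
    """Return the Euclidean displacement per electrode, in the units of the input."""
    # Only electrodes digitized at both time points can be compared
    return {
        name: math.dist(before[name], after[name])
        for name in before
        if name in after
    }


# Hypothetical digitized positions (x, y, z) in mm, not data from the study.
positions_before = {"Fp2": (30.0, 90.0, 20.0), "Fz": (0.0, 60.0, 80.0)}
positions_after = {"Fp2": (33.0, 94.0, 20.0), "Fz": (0.0, 60.0, 80.0)}

shifts = electrode_shifts(positions_before, positions_after)
print(shifts)
```

Such a distance describes the displacement of the measured point on the electrode body only, which is why it leaves open whether the point of contact with the skin moved by the same amount.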

#### Block II: EEG Recordings

For the oddball paradigm, ERP analysis revealed highly similar morphology of ERPs elicited by deviant stimuli in both recording conditions. We found highly significant effects for the negative peak in the ERP condition. The deviant trials were significantly different from the standard trials in both the indoor and the car condition, showing no difference between conditions. This is not the case for the positivity, for which the main effect is not significant. It should be mentioned, though, that there is a clear tendency in the expected direction, with a p value narrowly missing the 5% significance threshold. Peaks of the P300 are reduced in the car environment as a result of other signals interfering with the recorded signal in the car. No significant differences in peak latencies were found between indoor and car recordings. We conclude that the main information carried in the signal is comparable for indoor and in-car recordings, but its signal strength is attenuated slightly in the car condition.

For the alpha recordings, the case is slightly more complex. We clearly see a correlation between conditions: alpha values show a similar development over time outside of and in the car. However, averaged over all participants there is no significant difference between relaxed and engaged trials, although such a difference was expected from the experimental design. When we take a closer look at the individual values (see **Figure 12**), we see that some participants managed to relax in the corresponding task, while others did not. This explains why we do not get significant main effects: several participants were not able to relax in the appropriate condition. This effect can be seen consistently in both conditions, inside and outside the car. However, we do perhaps see a tendency in the main effect of condition that, even though it is not significant, indicates a small change in alpha power between recordings inside and outside of the car.

For all comparisons that showed no significant difference between conditions, an equivalence test was performed. Features of the ERP were not equivalent between conditions, while spectral features were equivalent on some of the tested electrodes.

These results show that even though we do not find significant differences, the recorded data cannot be considered equivalent. For strict neurophysiological measurements, it might hence be worth considering whether the tested headset should be used.

For both ERP and spectral data, classification results were not significantly different between conditions and were furthermore clearly equivalent. We hence assume that the evaluated system measured the differences in cognitive states well in both conditions. Despite small morphological and power differences, classification results were comparable in both domains. Therefore, a BCI can be applied with equal reliability to data from both conditions.

The results we found on the EEG components examined here are as expected from the literature and replicate results from a previous comparison study (Zander et al., 2011). Therefore, we conclude that the dry electrode system investigated here provides comparable data to a conventional gel-based system when used in an autonomous driving context.

It remains unclear whether the results can be fully transferred to a real-world autonomous driving context where the car would most likely be moving. A moving car would bring additional factors like increased vibration from the engine, jerks due to uneven roads, or inertial effects induced by direction changes. Moreover, the driving task itself could lead to additional artifacts, such as stress-related sweating on the scalp and the user scratching their own skin. Head movements against the headrest might also lead to changes of electrode positions in a way that was not examined here. Another factor would be the radio not being muted in a real-world driving scenario: Environmental noises between 70 and 120 decibels have been found to increase the amplitude of measured P300 events (Nam et al., 2008). Drivers will also move, e.g., their heads and hands, movements which the participants minimized during data recording. This study, however, presents a first step in investigating the applicability of dry systems in a car environment, revealing initial insights in a scenario with controlled artifact activity. These results can form the basis for future studies in active driving scenarios, where that control is further relaxed.

#### Block III: Driving-Related Movements

The results showed that the electrodes shifted in position when executing different driving-related movements.

The most significant shifts occurred during movements involving the head directly, primarily at the rear left of the head.

We assume this was due to the shoulder check, which required a sudden, fast turn of the whole head to the left and back. We cannot, however, be sure whether the shoulder check or the look at the ceiling had more effect on the electrode positions, since they were measured together as one movement group. Either way, the resulting differences may well influence the quality of the data recorded by the system.

The performed arm movements had less impact on the electrode positions, though the shifts were still significant.

The third group of movements resulted in the least position changes for all electrodes although the participants had to move their whole upper body—including the head. The most pronounced shifts were observed at the right frontal area. The instruction to touch the marker in the legroom of the passenger seat might offer an explanation for this, as the head had to be moved rather far to the right and down. Also in the area around the left ear increased shifts in position were observed. Most likely, this was a result of fastening and unfastening the seatbelt which may have induced some strain in that area, maybe by pulling on the cables.

Finally, since the movements were always performed in the same order (head, arm, and body), order effects cannot be excluded.

For future use, the cap could be applied e.g., only after the seat belt has been fastened, which often requires some effort. Since the cables may also have caused some of the position shifts, a wireless system is preferable.

#### Block IV: Usability

The System Usability Scale is a general questionnaire to evaluate the usability of technical systems, and is not specifically designed for BCI systems. As SUS provided significant insights in other BCI-related studies, we decided to use it here as well (Duvinage et al., 2012; Käthner et al., 2013). Some questions, however, especially those about the interaction with the system, did not fit the current purpose and even confused some of the participants. The resulting SUS score might therefore not be entirely accurate but, we believe, still provides a good indication of the overall usability of the system in an autonomous driving context.

The evaluation of the wearing comfort was better tuned to the current context and raised no questions from participants. The results showed that the first hour of using the system did not bother the participants much, which qualifies it for short-term usage at least. After the second hour of using the system, however, the subjective comfort ratings dropped significantly and participants began to complain about dents, slight headaches, neck pain, and even nausea and dizziness, which clearly shows that the EEG system with the current cap design is not suitable for long-term use. We did not investigate recovery time: How long a break is needed before the cap can be comfortably worn again? This remains an open question.

The most annoying feature of the system, according to the participants, was its rather tight fit on the head, resulting in a feeling of pressure. The overall weight of the system was, in contrast, rated to be quite pleasant, which might be due to the flexible, thin material of the cap. Also, participants rated the adaptability of the cap as quite high. The cap was rated as being well fixated, thanks to the chin belt and the holes for the ears providing a lot of stability; only one participant had the feeling the cap had moved at all.

# CONCLUSION

Concluding in brief, the EEG system allowed for technically sound recordings, even with car-induced interferences present. It thus appears to be suitable for passive BCIs in autonomous driving scenarios, allowing mental states to be detected in real time.

In only a few minutes, individuals were able to apply and adjust a pre-customized cap, with the help of a little mirror, like the rear view mirror of a car. A system to better support the evaluation of signal quality would be beneficial, however.

According to the system usability scale, the system is at the edge of acceptability in terms of usability. This may suffice for professional drivers, who likely stand to gain the most from autonomous driving and supportive systems, but room for improvement remains. In particular the reported discomfort after longer use is unacceptable. Here, major improvement is necessary to decrease pressure on the scalp so the system is no longer obstructive and uncomfortable, hindering the users from focusing on themselves and their tasks.

Seeing now that EEG technology has made clear progress toward ease of use and mobile scenarios, we can envision the application of passive BCIs in the context of autonomous driving. Passive BCIs can provide essential information about the driver's cognitive or affective state, which can be combined with other sensor data of the car. In that way, the car can adapt to, and make decisions informed by, individual aspects of the driver. As passive BCIs do not rely on directed or even conscious actions of the driver (Zander and Kothe, 2011), the car will still drive autonomously but gains an additional stream of information, pertaining to the subjective situational interpretation of the driver.

For example, we can clearly imagine applications improving safety and comfort. In cases where the driver is required to take over control, the communication of this requirement can be adapted to the current, actual state of the driver. Another scenario would be the detection of whether or not communicated alarm signals were perceived and processed by the driver. These are only a few, simple examples of a broad range of applications to be thought of.

Moreover, the investigated system could be used in a broader field of scenarios and might be of special interest for the field of Mobile brain/body imaging (MoBI). The field's objective is to acquire neurophysiological recordings of human cognition in real-world environments where subjects perform real-world tasks. A portable, wireless dry contact system that records high-quality data and is fast to prepare would prove useful for brain activity recordings on actively behaving participants (Gramann et al., 2011, 2014; De Sanctis et al., 2012).

The application of passive BCI during autonomous driving, however, provides an exemplary use case for technology that adapts to the (neuronal) state of its operator during automation in general. Such Neuroadaptive Technology is a clear additional step toward closing the cybernetic loop (Pope et al., 1995).

#### ETHICS STATEMENT

The study involved standard EEG procedures covered by an ethics statement approved by the ethics committee of the Institute of Psychology and Ergonomics of the Berlin Institute of Technology. All participants gave written consent to their participation in the conducted study. They were provided with information on the purpose of the study, given the opportunity to ask questions, and informed that their participation was voluntary and that they could end the experiment whenever they liked without a need to provide reasons. Participants also gave their consent for data recording, anonymous storage of that data, as well as its usage for publication.

# AUTHOR CONTRIBUTIONS

All authors contributed substantially to the work presented here. Everybody contributed to the drafting and revising of the documents and approved the final version. Everybody agreed to be accountable for the integrity and accuracy of the work. Specifically: TZ designed and supervised the experimental procedures, conducted and supervised the analyses, and interpreted the results for the context of autonomous driving. LK and KG were responsible for quality of writing and validation of results. Everybody below was involved in conducting the experiments and ensured data quality. LA was responsible for the statistical analyses and integrity of the manuscript. JP and MB were responsible for the electrode localization and the related mathematical procedures. LZ was responsible for the programming and the EEG and BCI analyses. AB was responsible for evaluation of the questionnaires.

### ACKNOWLEDGMENTS

We thank Brain Products GmbH (Munich, Germany) for providing us with the tested EEG system, which made this research possible. We are also indebted to Prof. Dr.-Ing. Matthias Roetting and Mario Lasch, Chair for Human-Machine Systems, TU Berlin, for providing the research car and supporting us in all technical questions regarding the car.

#### REFERENCES

International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design (Bolton Landing: NY), 92–98.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Zander, Andreessen, Berg, Bleuel, Pawlitzki, Zawallich, Krol and Gramann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# An Intelligent Man-Machine Interface—Multi-Robot Control Adapted for Task Engagement Based on Single-Trial Detectability of P300

Elsa A. Kirchner<sup>1,2</sup>\*, Su K. Kim<sup>2</sup>, Marc Tabie<sup>2</sup>, Hendrik Wöhrle<sup>2</sup>, Michael Maurus<sup>2</sup> and Frank Kirchner<sup>1,2</sup>

<sup>1</sup> Research Group Robotics, Mathematic and Computer Science, University of Bremen, Bremen, Germany, <sup>2</sup> Robotics Innovation Center (RIC), German Research Center for Artificial Intelligence (DFKI GmbH), Bremen, Germany

Advanced man-machine interfaces (MMIs) are being developed for teleoperating robots at remote and hardly accessible places. Such MMIs make use of a virtual environment and can therefore make the operator immerse him-/herself into the environment of the robot. In this paper, we present our developed MMI for multi-robot control. Our MMI can adapt to changes in task load and task engagement online. Applying our approach of embedded Brain Reading we improve user support and efficiency of interaction. The level of task engagement was inferred from the single-trial detectability of P300-related brain activity that was naturally evoked during interaction. With our approach no secondary task is needed to measure task load. It is based on research results on the single-stimulus paradigm, distribution of brain resources and its effect on the P300 event-related component. It further considers effects of the modulation caused by a delayed reaction time on the P300 component evoked by complex responses to task-relevant messages. We prove our concept using single-trial based machine learning analysis, analysis of averaged event-related potentials and behavioral analysis. As main results we show (1) a significant improvement of runtime needed to perform the interaction tasks compared to a setting in which all subjects could easily perform the tasks. We show that (2) the single-trial detectability of the event-related potential P300 can be used to measure the changes in task load and task engagement during complex interaction while also being sensitive to the level of experience of the operator and (3) can be used to adapt the MMI individually to the different needs of users without increasing total workload. Our online adaptation of the proposed MMI is based on a continuous supervision of the operator's cognitive resources by means of embedded Brain Reading. 
Operators with different qualifications or capabilities receive only as many tasks as they can perform to avoid mental overload as well as mental underload.

Keywords: EEG, P300, machine learning, space robotics, teleoperation, task load, man-machine interaction, embedded brain reading

#### *Edited by:*

Klaus Gramann, Berlin Institute of Technology, Germany

#### *Reviewed by:*

Tamer Demiralp, Istanbul University, Turkey

Pieter-Jan Kindermans, TU-Berlin, Germany

*\*Correspondence:* Elsa A. Kirchner ekir@informatik.uni-bremen.de

*Received:* 05 December 2015 *Accepted:* 31 May 2016 *Published:* 21 June 2016

#### *Citation:*

Kirchner EA, Kim SK, Tabie M, Wöhrle H, Maurus M and Kirchner F (2016) An Intelligent Man-Machine Interface—Multi-Robot Control Adapted for Task Engagement Based on Single-Trial Detectability of P300. Front. Hum. Neurosci. 10:291. doi: 10.3389/fnhum.2016.00291

# 1. INTRODUCTION

Human-robot interaction with semi-autonomous robots has to be improved to be safe and intuitive. This can be achieved by (1) building robots with advanced "on-board" solutions that support natural interaction behavior between human and robot (Kirchner et al., 2015) and (2) by developing intelligent man-machine interfaces (MMIs). Especially in cases of tele-operating robots at remote places the MMI has to be easy, intuitive and comfortable.

Usually only experienced people are chosen to remotely operate robotic systems (Cornella et al., 2012), since their performance is robust. During remote control of several robots in a complex mission, task load and task engagement change tremendously over time, which can lead to mental over- or underload as well as fatigue. Therefore, an online-adaptable MMI can be applied to act on these changes. For this, reliable measures of online changes in the human's state are needed (Allanson and Fairclough, 2004). Such real-time indicators have to consider theories about brain capacity and resources (Kahneman, 1973; Wickens, 1984, 1992, 2008), which propose that brain resources are limited and must be shared between tasks. Comprehensive work showed that certain patterns in the electroencephalogram (EEG), e.g., the amplitude of the event-related potential (ERP) P300 (Prinzel et al., 2003), or ratios of EEG power bands like alpha, beta or theta bands (Pope et al., 1995), can be used to measure the processing capability of the brain, mental workload and task demands. Earlier work by Pope et al. (1995) showed that an EEG-based index of user engagement and arousal could indeed be used, e.g., to adapt the level of system automation in response to changes in mental workload demands. It was found that especially the P300 is a reliable measure for changes in task load (Kok, 2001; Prinzel et al., 2003). Earlier work that examined the P300 in response to primary and secondary task demands showed that an increase in demands on the primary task resulted in fewer resources for the secondary task accompanied by a smaller P300 amplitude (Isreal et al., 1980). 
Many studies make use of the dual-task design (Isreal et al., 1980; Prinzel et al., 2003) to detect an increase in workload or task load in the primary task by analyzing the P300 amplitude evoked by the secondary task, e.g., listening to auditory stimuli presented in an oddball fashion (Prinzel et al., 2003) or P300 that is evoked by ignored probes (Kramer et al., 1995).

With the focus on online user state detection based on the analysis of brain activity, which is naturally evoked during human-machine interaction and deeply embedded into the system's control, embedded Brain Reading (eBR) was developed (Kirchner and Drechsler, 2013; Kirchner, 2014, 2015). The main focus of embedded Brain Reading is to passively infer the human's intention in order to implicitly improve interfaces, like an exoskeleton used for explicit interaction, such that the intended interaction or behavior can be supported best (Folgheraiter et al., 2012; Kirchner et al., 2013a,b, 2014). However, embedded Brain Reading can also be applied to passively infer the users' neurophysiological state, such as their current workload or task load, to adapt an interface implicitly in such a way that the user is neither stressed nor bored (Kirchner et al., 2010, 2013b; Wöhrle and Kirchner, 2014a), both of which would have a negative impact on human-robot interaction. We already showed that eBR can utilize P300-related activity to infer whether subjects recognize and will respond to important task messages, which were presented interleaved with task-irrelevant messages in an oddball fashion, while performing a complex interaction task like playing a labyrinth game (Kirchner et al., 2013b). In a later work we showed that eBR can indeed be applied to improve interaction in an application scenario in which subjects had to respond to warnings interleaved with task-irrelevant status messages while remotely controlling a robotic arm via an exoskeleton (Wöhrle and Kirchner, 2014a). In both cases, the information about the operator's capability of recognizing task-relevant warnings was used to adapt the developed MMI with respect to the timing of repetitions of task messages. To this end, the MMI was adapted before the operator would respond to the task message. In our previous work, subjects had to perform two tasks: controlling a machine and responding to task-relevant warnings. 
Thus, we did not make use of the primary and secondary task design just for the purpose of measuring task load on the user. The second task was indeed required to be performed by the user with the goal to estimate an operator's capability to perform two tasks at the same time. We also believe that even when using ignored probes to measure load on the user, i.e., workload (Kramer et al., 1995), any extra stimulation which is only added for the purpose of measuring load on the user will likely disturb the operator in a complex and demanding interaction task. Instead, we used the single-trial detectability of the naturally evoked P300 components in case that rare task-relevant stimuli were presented (i.e., warnings that anyway requested responses of the operator) and had to be answered as index of load, here, task load and task engagement. However, in many real world applications the occurrence of task-relevant target stimuli is likely not interleaved consistently with task-irrelevant stimuli as it was implemented in the previous studies by using the oddball design. Thus, it is of interest to investigate whether single-target stimuli successfully and reliably evoke P300 ERP components during human-machine interaction, as suggested by comprehensive work performed under controlled conditions of the single-stimulus paradigm (Mertens and Polich, 1997; Polich and Margala, 1997). Polich and Margala (1997) for example showed, that single-target stimuli evoke P300 components with similar characteristics as target stimuli presented in an oddball fashion as long as the probability and the inter target interval (ITI) were kept the same.

One research interest of the current work is therefore to investigate whether P300 ERP components are reliably evoked under application conditions in case of a single-stimulus presentation that is naturally embedded into a human-machine interaction task. We further investigate whether eBR can be used to adapt the frequency of task messages that are presented to the user by an MMI instead of modulating task repetitions as in former work (Kirchner et al., 2013b; Wöhrle and Kirchner, 2014a). The adaptation of the MMI should again be performed online. However, the proposed MMI is designed for multi-robot control. Hence, an adaptation of the MMI with respect to the inferred task load and the user's current task engagement in preceding, still ongoing, tasks for other robots can be investigated. Again, task engagement or task load was inferred from P300-related ERP activity that is naturally evoked during interaction. Both a high task load and a high task engagement in a preceding task were expected to reduce the amplitude of P300-related activity evoked by a new task message. In the presented work, subjects performed only one type of task: controlling different robots with respect to different requested tasks. Hence, we break down dual-task execution into sequential and temporally overlapping task execution to investigate the influence of task load and task engagement between subsequent tasks. We again show that it is not necessary to artificially add an extra task or probe, as in the dual-task or ignored-probe design, to evoke P300-related activity for measuring task load and task engagement. Instead, we directly infer the task load and task engagement of the operator from the P300 activity evoked by task messages.

Hence, our approach matches natural requirements on the user during robot control, since it avoids adding potentially disturbing stimuli, like auditory stimuli, merely to measure and adapt to task load.

We further present and describe the developed MMI, which makes use of a virtual control environment, i.e., a Cave Automatic Virtual Environment (CAVE) (**Figure 1**). This MMI can be adapted based on the changes in task engagement of the user measured by EEG, i.e., P300-related ERP activity. While the presentation of each task-relevant message was expected to evoke a P300 we further assumed that the amplitude of a single-trial P300 evoked by a new task message is reduced in case that the user is still involved in executing a previous task. This is due to the fact that mental resources are still bound to the previous task. The more frequently such task conflicts occurred the stronger we expected a reduction in averaged P300 peak amplitude. We further assumed that the expected changes in P300 amplitude were mainly caused by effects like task engagement or task load but not by target probability, since the inter-stimulus interval (ISI) between stimuli was very long. Polich (1990) showed by means of an auditory discrimination task that the target probability has no effect on P300 amplitude in case of longer ISIs, i.e., ISIs longer than 6–8 s (Polich, 2007). For longer ISIs, the probability effect (Tueting et al., 1970; Duncan-Johnson and Donchin, 1977) is missing since brain resources can be redirected fast enough to process a new target stimulus.

It is important to state that in the present work the level of task load and task engagement as well as the occurrence of task conflicts may strongly depend on different factors, e.g., the general capability of the user in controlling the robots, fatigue levels or secondary requirements on attention that are not related to the main task, i.e., distractions of any kind that may occur while the operator was controlling the robots. While the concept of workload is distinct from the concept of multiple resource theory (Wickens, 2008), both concepts do overlap in real world applications and it is not always clear what contributes most. Moreover, additional mechanisms like confusion, cooperation between task elements like ongoing task engagement to the preceding task and unwanted diversion of attention influence the allocation of brain resources (Wickens, 2008). Additionally, as known from educational research, changes in the motivational state influence perception of workload, task complexity and cognitive strategies (Kyndt et al., 2011). Real world applications are therefore not a good paradigm to decouple components and dimensions of influencing parameters, but they can be used as a test case on whether certain measures can be used to predict the general state and capacities of a subject. Since the goal of our study was to measure the current task engagement or task load of an operator and to use this measure to adapt an MMI continuously to avoid an overall state of overload, we took measures to avoid excessive workload.

In summary, the scope of this study was to artificially evoke task conflicts to show (I) not only that P300-related activity was naturally evoked when task messages were presented, but also that it was indeed modulated by generally high demands on the operator and by task engagement in previous tasks, and (II) that the detectability of P300-related activity could be used to adapt an MMI with regard to task engagement, thereby enabling a kind of steady-state task involvement. This should result in higher subjective contentment and high overall task performance.

The paper is structured as follows. In Section 2 we describe the experimental setting, i.e., the developed MMI, the kind of human-machine interaction task which can be performed and the interaction tasks that the subjects had to solve, the experiments that were performed for this work, and data recording procedure. We further describe our research goals and hypotheses in more detail and describe the performed data processing and analysis. In Section 3 we describe our results with respect to behavioral, machine learning and ERP average analysis. Finally in Section 4 we will discuss the outcome of our work and its relevance for the improvement of MMIs for multi-robot control.

# 2. MATERIALS AND METHODS

#### 2.1. Experimental Design

We developed an experimental setup in which a subject can control several simulated robots. For this, we designed a virtual environment using the in-house developed software "Machina Arte Robotum Simulans" (MARS) (Rommerman et al., 2009; DFKI - RIC, 2015), which can be run as a 3D environment in, e.g., a CAVE (see **Figure 1**), or as a 2D environment on a standard personal computer with monitors or a multi-screen system (see **Figure 2**). In both environments the operator can use different input devices to control the robot, e.g., a 3D mouse, a wand, an exoskeleton or an eye tracking device. In the future, the developed virtual 3D environment will be used to control real robots. To allow this, we use a close to realistic physical simulation of the real robots developed at our institute. In this work a 2D multi-screen system was used as the environment and a wand was used as the interface to control the simulated robots in the simulated environment. The wand is a hardware device that functions in a 3D environment similarly to a mouse in a 2D environment. It is tracked in 3D space using an ultrasound-based tracking system combined with an IMU and has five buttons as well as a pressure-sensitive joystick as input options. We used the inertial-ultrasonic hybrid tracking device InterSense IS-900 (Thales Visionix, Inc., Billerica, USA) in our experiments.

#### 2.1.1. Human-Robot-Interaction

In general, the task of the operator in the multi-robot control environment (see **Figure 2**) was to supervise all robots and to assign new tasks to individual robots as indicated by messages presented to the user on the screen (see **Figures 3A,B** upper part for examples of different messages). Individual robots were labeled with different colors. Task messages were presented as icon based widgets supporting fast recognition by the operator. The operator used the interface to select a robot he or she wanted to control by either selecting the robot directly or by selecting the robot's icon in the upper part of the middle screen (see **Figure 3A**: 2). Moreover, information about the chosen system was presented to the operator on the right screen via an icon based information panel. Information such as the robot's name, its energy level, its current task as well as robot control commands were presented here (see **Figure 3A**: middle picture lower right corner). On the left monitor, tasks for the operator were listed as soon as the operator confirmed that he/she had seen the message by clicking on the appropriate robot icon on the monitor in the middle. By selecting the robot's icon with a double click, the virtual camera was additionally moved such that the chosen robot was in the focus of the operator. After selecting a robot, the operator could issue a task by clicking the corresponding robot control command icon (see **Figure 3A**: 4). In case an operator was not sure or did not recognize the robot to which a task was assigned, he or she could select an unknown icon displaying a gray robot with a question mark (see **Figure 3A**). After clicking the unknown icon, all the missed tasks were displayed in the task list on the left screen. However, in the experiments presented here this gray robot button was disabled to force the subjects to focus on the task messages as much as possible. In case a user did not recognize the task message correctly, she or he had to wait for the automatic repetition of the task message.

#### 2.1.2. Interaction Tasks

As mentioned in Section 2.1.1 the operator had to fulfill different tasks with the robots. Within the experiment there were three kinds of tasks with varying complexity:


Interaction is controlled by different software managers and schedulers. Widget-based icons are used to display information about the robots, messages for the user and to select robot commands. The user "Need-to-Know Area" is the part of the system visible to the user. The robot interface with connections to the real robots (depicted by dotted lines) is not yet implemented.

robot's state. The operator may also forget to click the recharge icon after the robot reached the lander. An example of such a message for the red robot can be seen in the upper right part of **Figure 3B**.

All tasks were pseudo-randomly chosen, such that no more than one task at a time was assigned per robot. When creating a new "Go to Landmark" task for a specific robot, the robot's distance to the landmarks will be computed first. In order to solve the task the robot has to be in a specific radius around the chosen landmark. If the robot is already within the specific radius, the new task would directly be solved when the robot is selected. In such a case the target landmark will be chosen among the other landmarks. Further, there was an automated mechanism which generated a "Recharge Robot" task in case the energy level of a robot dropped below a certain value. This was necessary to ensure that a robot would remain fully functional. If a robot runs out of energy it would get stuck at its position and no more tasks could be solved by this robot.

task by clicking on a response button (2). A (middle) after the task was confirmed, it was shown in the task manager (3). A (bottom) when the green robot was selected, a menu with all possible control commands was shown. In this example, the mission could be accomplished by clicking on the send-message button of the control menu (4). When a task was accomplished, it was removed from the task manager (5). (B) The scenario contained three possible tasks, which were depicted by an intuitive symbol. All tasks were related to a specific robot, encoded by a colored symbol, see the following examples. B (top left) send a message with the green robot. B (top middle) send the red robot to waypoint 3. B (top right) recharge the red robot. Different robots (encoded by color) and different task messages were randomly combined. B (bottom) messages are sorted in order as they are presented. Some messages (repetitions of tasks) get a higher priority and will be presented earlier.

Frontiers in Human Neuroscience | www.frontiersin.org June 2016 | Volume 10 | Article 291 |

When a message requesting interaction was presented, the first response of the user, such as selecting the correct robot, was counted as correct behavior, and the message was not repeated. On the other hand, a predefined response time (in our experiments 13 s) and a predefined ISI were set for the operator. The predefined ISI was important for our experiments and research questions, as will be explained in Section 2.4. Task messages were put into a message queue. To avoid unfair scheduling due to different urgency of information, pending messages may change their priority over time (see **Figure 3B** lower part). So far, a message is repeated as a warning in case a complex task with longer duration has been started, i.e., a robot has been sent to a landmark, but does not arrive after a certain amount of time. Since the robot might have got stuck, the warning is repeated with higher priority. To give the user an overview of initiated but still running tasks, they were visualized in an icon panel in the upper left corner of the left monitor in the order in which they appeared, with the newest depicted at the top (see **Figure 3A**: 5). As soon as a task was fulfilled the task message was removed.

#### 2.2. Performed Experiments

Six subjects participated in the study. All subjects were male with normal or corrected to normal vision and aged between 20 and 38 years (mean: 28.74, SD: 6.92). All subjects were intensively trained in the scenario on a different day to get used to the tasks, i.e., to control the robots by using the developed MMI. On the same day of the study just before data recording subjects were asked to get comfortable with the scenario. The study consisted of 6 runs, performed in the same order. In each run, subjects had to complete 30 tasks. The response behavior was supervised and logged by the message scheduler (see **Figure 2** lower part).

In case no response was detected within 13 s after presentation of a task message, the same task message was again attached to the message queue. Since the queue is implemented as a FIFO (first in first out), the message is repeated after presentation of all other messages within the queue.
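The repetition rule can be illustrated with a minimal Python sketch. It assumes a plain FIFO queue without the priority changes mentioned in Section 2.1.2; the function `present_all`, the callback `response_latency`, the message names, and the `max_attempts` guard are hypothetical and not part of the described MMI.

```python
from collections import deque

RESPONSE_WINDOW_S = 13.0  # response window stated in the text


def present_all(messages, response_latency, max_attempts=5):
    """Present messages in FIFO order and re-append unanswered ones.

    response_latency(msg, attempt) is a hypothetical callback returning the
    response latency in seconds, or None if the operator did not respond.
    max_attempts is an illustrative guard against endless repetition.
    """
    queue = deque(messages)
    attempts = {}
    presented = []
    while queue:
        msg = queue.popleft()
        attempts[msg] = attempts.get(msg, 0) + 1
        presented.append(msg)
        latency = response_latency(msg, attempts[msg])
        missed = latency is None or latency > RESPONSE_WINDOW_S
        if missed and attempts[msg] < max_attempts:
            # Re-attached at the end: repeated after all other queued messages.
            queue.append(msg)
    return presented


def example_latency(msg, attempt):
    # The operator misses the first presentation of one message only.
    if msg == "recharge_red" and attempt == 1:
        return None
    return 4.0


order = present_all(["send_green", "recharge_red", "goto_blue"], example_latency)
print(order)
```

In this example the missed message is presented again only after the remaining queued message has been shown.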

Task messages (**Figure 3** top illustration and **Figure 3B**) were presented for 1.1 s. The duration of presentation was determined by empirical tests with a different group of 4 subjects. The goal was to keep the duration of message presentation as short as possible to allow the evaluation of event-related activity in the EEG while ensuring that subjects were able to recognize and understand the presented messages.

#### 2.2.1. Adaptation of the Inter-Stimulus Interval (ISI)

Between the 6 runs experimental conditions were varied with respect to the ISI (**Table 2.1**: EEG data). For runs 1 to 4 ISIs were fixed. We used two different ISIs: a long ISI (25 s) in runs 1 and 2 and a short ISI (15 s) in runs 3 and 4. In both cases an additional random jitter of ±5 s was added. Appropriate time intervals for long and short ISIs were empirically determined beforehand by tests with 4 subjects that were not involved in this study. The time interval for the short ISI was chosen such that the overall workload or overall task load caused by the message frequency was not too high. That the chosen short ISI was appropriate is supported by the results of the NASA Task Load Index questionnaire (see Section 3.1.3). The time interval for the long ISI was empirically chosen such that it was perceived as clearly longer by the 4 test subjects. A very short ISI could not be chosen, since we experienced that subjects easily gave up the run in cases of very short ISIs, i.e., with a duration of 5 s or even 10 s. Further, no P300 was evoked under extremely stressful circumstances, as in runs with an ISI of 5 s. Moreover, high-quality training examples were required to train the classifier. Finally, we had to limit the number of runs and thus the total experiment time to avoid overstraining the subjects.

For runs 5 and 6 the ISI was adapted online with respect to detectability of the P300 and related ERP activity. For the online detection of single-trial ERP activity a classifier was trained on examples from either runs 1 and 2 (for application in run 5) or runs 3 and 4 (for application in run 6) (see Section 2.8 for more details). In runs 5 and 6 the ISI was increased gradually (up to a maximum of 35 s in steps of 5 s) in case an expected P300 was not detected two times in a row after a new task message, or was decreased stepwise (down to a minimum of 5 s in steps of 5 s) in case an expected P300 was detected two times in a row. For both adapted runs the ISI was preset to 25 s. We always started with the fixed ISI condition with an ISI of 25 s in runs 1 and 2 to allow subjects to get comfortable with the control task. This was done since long training sessions just before the experimental session were not possible; they would have increased the total experiment time to an unacceptably long duration. For our experimental setting it was more important to record all runs in the same session to avoid between-session effects on the shape of the ERPs as well as on the single-trial classification performance. Although subjects were intensively trained, they needed to readapt to the control of the robots, since the control task was very complex. Next, in runs 3 and 4 training data was recorded under the fixed ISI condition with an ISI of 15 s. We did not perform a run with adapted ISI right after the recording of training data with ISI 25, in order to keep both runs with adapted ISI close together and thus the condition of the subjects similar. Further, runs with fixed and adapted ISIs were not interleaved, since this might have had an influence on the motivation of the subject during the recording of training data after a run with adapted ISI.
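The adaptation rule can be sketched in Python as follows. The limits, step size and start value are taken from the text. The function name `adapt_isi` and, in particular, the reset of the detection streak after each adaptation are assumptions, since the text does not state how consecutive detections are counted after a change.

```python
ISI_MIN_S = 5     # minimum ISI from the text
ISI_MAX_S = 35    # maximum ISI from the text
ISI_STEP_S = 5    # step size from the text
ISI_START_S = 25  # preset ISI for both adapted runs


def adapt_isi(detections, start=ISI_START_S):
    """Return the ISI in effect after each classified task message.

    detections: iterable of bool, True if a P300 was detected after a new
    task message. Assumption: the streak counter restarts after each change.
    """
    isi = start
    streak_value = None
    streak_len = 0
    trace = []
    for detected in detections:
        if detected == streak_value:
            streak_len += 1
        else:
            streak_value, streak_len = detected, 1
        if streak_len == 2:
            if detected:
                # P300 detected twice in a row: shorten the ISI.
                isi = max(ISI_MIN_S, isi - ISI_STEP_S)
            else:
                # P300 missed twice in a row: lengthen the ISI.
                isi = min(ISI_MAX_S, isi + ISI_STEP_S)
            streak_len = 0
        trace.append(isi)
    return trace


print(adapt_isi([True, True, True, True, False, False]))
```

With this counting scheme the ISI changes at most once every two task messages.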

#### 2.2.2. Ethics Statement

The study has been conducted in accordance with the Declaration of Helsinki and approved with written consent by the ethics committee of the University of Bremen. Subjects have given informed and written consent to participate.

#### 2.3. Recorded Data

During each executed run EEG was recorded with 64 electrodes referenced against electrode FCz. An actiCap system (Brain Products GmbH, Munich, Germany) arranged as an extended 10–20 system was used for recording. Electrode impedance was kept below 5 kΩ. EEG signals were sampled at 5 kHz, amplified by two 32 channel BrainAmp DC amplifiers (Brain Products GmbH, Munich, Germany) and filtered with a low cutoff of 0.1 Hz and high cutoff of 1 kHz.

#### 2.4. Research Goals & Hypotheses

The presented work addresses two different research goals with specific subgoals. (I) We want to show that a P300-related activity is naturally evoked when task messages are presented and recognized. (Ia) We investigate whether the evoked P300 is modulated by factors like demands on the operator or the operator's task engagement to previous tasks. (II) We want to show that single-trial detection of P300-related activity can be used to adapt the interaction with respect to the task engagement of the operator. (IIa) In particular, we investigate whether an individual balanced task involvement of the operator can be achieved by adaptation of the ISI resulting in a higher subjective contentment of the operator and in an individually optimized overall task performance.

By means of data recorded in runs 1–4 we investigated research goal (I). We artificially modulated the current task engagement (on the previous task) at the time a new task was presented. This was achieved by modulating the time interval between two consecutive tasks: long ISIs of 25 s in runs 1 and 2; short ISIs of 15 s in runs 3 and 4. Changes in P300 characteristics were investigated by averaged ERP analysis and machine learning methods. To support the usage of single-trial P300 detection we had to ensure that the detection performance is adequately high and not too strongly influenced by the ISI per se, such that for very short ISIs possibly no P300 would be detectable in single trials. For this, an offline machine learning analysis was performed first with training and test on runs with the same ISI. These results were used as a baseline for other experiments. This condition was called "baseline" condition. Using this analysis, we investigated whether P300-related activity is detectable in single trials under application conditions and for different ISIs, as well as how strongly different ISIs would influence classification performance.

Further, we investigated the effect of classifier transfer between runs with different ISIs. More precisely, a transfer of the classifier between training runs (runs 1 and 2 or runs 3 and 4) and test runs (runs 5 and 6 with adapted ISI) was applied. This condition was called "transfer" condition. This offline analysis was relevant because under the online condition the classifier was transferred between different ISI conditions. Different ISIs were caused by the adaptation of the ISI under the online condition. The results allow us to estimate the sensitivity of the classifier to changes in ISI.

To achieve research goal (II) we adapted the developed MMI with respect to the current task engagement of the user to previous tasks when a new task was presented in runs 5 and 6 (**Table 2.3**: online stCL). Current task engagement was measured by the online single-trial classification of P300-related activity evoked by recognized target stimuli, i.e., task messages: (1) task engagement to a previous task was expected to be high in case that the P300-related activity was weakly evoked by a new task and thus not detected by a classifier, (2) task engagement to a previous task was expected to be low in case that P300 related activity was more strongly expressed and thus detected by a classifier. Note that in the online case each EEG trial after a presented first task message was classified, thus in case the operator completely missed a task message no P300 was expected to be evoked and could therefore not be detected. Hence, our approach did not only account for reduced P300 activity but also for missed P300 in case of missed target events.

To prove that the interaction of the user was improved by online adaptation of the ISI, we analyzed the total runtime, median reaction time and number of late responses and missed messages. We expected a reduction in total runtime by online adaptation of the ISI compared to the case of a fixed long ISI (ISI-25; runs 1 and 2). We did not expect a significant difference to be found for reaction times, since our approach would avoid user overload and responses were rather complex (see Section 2.1). However, we expected some late responses and missed messages in cases that the user was strongly involved in ongoing tasks when a new task was presented.

Our approach of online adaptation of the ISI allows an MMI to be adapted with respect to the current task engagement or task load, and improves user performance by equalizing the level of task engagement over all tasks and by selectively avoiding task overload. To further support this, we investigated the effect of an online adaptation of the ISI on averaged P300-related activity, i.e., we investigated whether expected changes in P300 amplitude related to task engagement could be found. For this evaluation, we compared averaged activity evoked in case of a fixed ISI of 25 s (runs 1 and 2) and a fixed ISI of 15 s (runs 3 and 4) with averaged P300-related activity evoked in runs 5 and 6.

Based on the research goals, we had three hypotheses: (1) The online adaptation of the ISI reduces total runtime if compared to the long fixed ISI condition (ISI of 25 s). (2) The modulation of the ISI influences amplitudes of averaged ERP. In particular, we expect differences between ISI types with respect to peak amplitudes of the averaged ERP. (3) The usage of historic data is feasible to detect P300 in the current data (e.g., a transfer of the classifier trained on historic data to the current data is possible).

#### 2.5. Analysis of Subjects' Behavior

#### 2.5.1. Analysis of Total Runtime

The total runtime was measured as the time between the first and the 30th task message within the experiment. This procedure was chosen since the total number of tasks differs slightly. This happens if the last task is from one of the categories "go to landmark" or "recharge robot" and if the adapted ISI is quite low. Solving one of these more complex tasks may take some time since the traveling distance can be rather long. Therefore, all robots may get one of these tasks. When one of the robots reaches its goal position the experiment is finished, but in this way more than 30 task messages could have been displayed to the user (see **Figure 5**).

For the statistical analysis, the value of total runtime was merged depending on the ISI type. This leads to three groups: ISI-25 (runs 1 and 2), ISI-15 (runs 3 and 4), and ISI-online adaptation (runs 5 and 6). The three ISI groups were compared by the Friedman test. For multiple comparison, the Wilcoxon signed-rank test was performed (the p-value was adjusted by the Bonferroni-Holm correction).
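As an illustration of the correction step, the following Python sketch implements the Bonferroni-Holm step-down adjustment for a set of pairwise p-values. The function name `holm_adjust` and the three example p-values are hypothetical and are not results of this study. The Friedman and Wilcoxon tests themselves are not reimplemented here.

```python
def holm_adjust(p_values):
    """Bonferroni-Holm step-down adjustment of a list of p-values.

    The smallest p-value is multiplied by m, the next by m - 1, and so on.
    A running maximum keeps the adjusted values monotone, and each value is
    capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values for the three pairwise comparisons:
# ISI-25 vs. ISI-15, ISI-25 vs. adapted, ISI-15 vs. adapted.
raw = [0.03125, 0.0625, 0.4375]
print(holm_adjust(raw))
```

An adjusted p-value is then compared directly against the chosen significance level.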

#### 2.5.2. Analysis of Reaction Times

To calculate the reaction times, the EEG marker files were analyzed in order to deduce all important operator- and scenario-related events. Whenever a message was presented to the operator or the operator issued a control command, this was marked in the EEG file. Based on the markers we calculated the reaction times, i.e., the amount of time the operator required to react to a task message by clicking on the correct response button for the robot. Only first task messages were considered in the analysis. Repetitions of task messages were not analyzed. The median of the reaction times was calculated because of strong deviations and outliers. For a comparison with the ERP average analysis an additional analysis was performed considering only reaction times after target trials with ISIs that were used for the average analysis, i.e., target trials which belonged to one of the two groups: ISI-long or ISI-short (see **Table 1**). Note that for the ERP analysis not all trials could be used, since in run 5 6.82% of the ISI-long trials and 13.33% of the ISI-short trials, and in run 6 18.57% of the ISI-long trials and 12.05% of the ISI-short trials, contained artifacts and were discarded from analysis.

For the statistical analysis, the value of reaction time was merged depending on ISI type and this leads to three groups: ISI-25 (runs 1 and 2), ISI-15 (runs 3 and 4), and ISI-online adaptation (runs 5 and 6). The three ISI groups were compared by the Friedman test. For multiple comparison, the Wilcoxon signed-rank test was performed (the p-value was adjusted by the Bonferroni-Holm correction).

In addition to median reaction times, we counted late responses (responses later than 15 s) and missed messages. EEG trials after messages with responses later than 15 s as well as missed message trials were not considered during training of the classifier (see Section 2.8).

#### 2.5.3. Questionnaires

Before the experiments started, each subject was instructed to assess his skills related to the use of computers by filling out the "Computer usage questionnaire" (CUQ) (Schroeders and Wilhelm, 2011). For the statistical analysis, the Friedman test was performed to compare the patterns of computer usage between subjects. For multiple comparison, the Wilcoxon signed-rank test was performed (the p-value was adjusted by the Bonferroni-Holm correction). Furthermore, after each of the six runs of the experimental session, the subjects had to fill out the NASA Task Load Index (TLI) questionnaire (Hart and Staveland, 1988). For the statistical analysis, the value of task load index was merged depending on the ISI type and this leads to three groups: ISI-25 (runs 1 and 2), ISI-15 (runs 3 and 4), and ISI-online adaptation (runs 5 and 6). The three ISI groups were compared by the Friedman test. For multiple comparison, the Wilcoxon signed-rank test was performed (the p-value was adjusted by the Bonferroni-Holm correction).

#### 2.6. Analysis of the MMI Behavior

The behavior of the MMI was analyzed by plotting the changes in the ISI for each subject in case of ISI adaptation (runs 5 and 6, see **Figure 5**). **Figure 5** illustrates what kind of tasks were presented to the operator and which ISI was used; the trace is therefore the same as during the actual experiment. The purpose of this analysis was to give an impression of how well the adaptation worked and which ISI was most comfortable for the operator over the course of the run. For a comparison of the mean ISI between subjects, the mean ISI for each subject and run was calculated and the mean ISI of each run was compared between subjects by using the Friedman test. For multiple comparison, the Wilcoxon signed-rank test was performed (the p-value was adjusted by the Bonferroni-Holm correction). Furthermore, we investigated whether the mean ISI is a useful indicator for the analysis of the MMI behavior. To this end, the correlation between the mean ISI and the total runtime was calculated using Spearman's rank correlation. We expected a positive correlation such that a longer ISI leads to a longer total runtime. In addition, we investigated task type as another factor with a potential effect on the total runtime. For example, the task types "go to landmark" and "charging robot" required a longer total runtime compared to the task type "send message." The frequency and order of task types were randomly chosen. Thus, differences in frequency of task types can in principle lead to differences in total runtime between subjects. However, we did not expect a strong correlation between task type and total runtime.
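Spearman's rank correlation between mean ISI and total runtime can be sketched in plain Python as the Pearson correlation of the ranks. The helper names and the example values below are hypothetical and do not reproduce data from this study.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-subject values (mean ISI in s, total runtime in s).
mean_isi = [12.5, 18.0, 9.0, 22.5, 15.0, 20.0]
total_runtime = [410.0, 560.0, 380.0, 700.0, 450.0, 610.0]
print(spearman_rho(mean_isi, total_runtime))
```

Because the example runtimes increase monotonically with the mean ISI, the coefficient is close to 1, which corresponds to the expected positive correlation.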

#### 2.7. ERP-Average Analysis

Continuous EEGs were bandpass-filtered (0.1–30 Hz) and segmented into "target" trials from −100 to 1000 ms with respect to the stimulus onset (baseline correction: from −100 to 0 ms relative to the stimulus onset). As for the machine learning analysis, only trials after the first task messages which had been responded to within a time period of 15 s were labeled as "target" trials when analyzing runs 1–4. For runs 5 and 6 again only trials with answered task messages were used as "target" trials and averaged as explained in **Table 1**. This procedure copies the procedure of the offline analysis. Trials after missed task messages were not averaged to exclude their contribution to the average ERP characteristic. We used a common average reference (CAR) and recalculated the data from channel FCz. For ERP average analysis only artifact-free segments were used (see **Table 1**). Artifact detection was performed semi-automatically with amplitude limits of −100 µV and 100 µV. We compared average artifact-free ERP activity evoked in runs with ISI-25 and ISI-15 as well as ISI-long and ISI-short. Trials for ISI-25 were conducted in runs 1 and 2 and trials for ISI-15 in runs 3 and 4. An adaptation of the ISI in runs 5 and 6 did not only result in various ISIs but also in individual ranges of ISIs for different users (see **Table 1**). Therefore, we individually divided the EEG segments of runs 5 and 6 into two ISI groups with respect to trials being evoked after short or long ISIs for each subject. For example, from the data of the subject depicted in **Figure 9** we merged examples after ISI-15 and ISI-20 to calculate average ERP activity after long ISIs and ISI-5 and ISI-10 to calculate average ERP activity after short ISIs (see **Table 1**). By means of this procedure, we could compare averaged P300-related activity for ISI-short and ISI-long of runs 5 and 6 with the activity evoked in runs 1 and 2 (fixed ISI of 25 s: ISI-25) or runs 3 and 4 (fixed ISI of 15 s: ISI-15) (**Table 2.2**). For peak detection, we selected a single window of the interval 0.3–0.7 s after the stimulus onset of a "target" trial. The positive maximum peak was detected within the selected window.

Subject Number of targets for all possible ISI-groups within runs 5 and 6

For average ERP analysis different ISIs were categorized in two ISI-groups: ISI-short (marked as red) and ISI-long (marked as blue).

For the statistical analysis of average ERP amplitude values with a sample size of 6 (i.e., 6 subjects), we performed the Wilcoxon signed-rank test to compare different ISI types (ISI-25 vs. ISI-15 and ISI-long vs. ISI-short).
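The single-channel part of this ERP procedure (baseline correction, amplitude-based artifact rejection, averaging, and search for the positive peak between 0.3 and 0.7 s) can be sketched in Python as follows. All function names and the synthetic epochs are hypothetical. Applying the ±100 µV criterion after baseline correction is an assumption, since the text does not specify the order, and filtering and re-referencing are omitted.

```python
def baseline_correct(epoch, times_ms, start=-100, end=0):
    """Subtract the mean of the pre-stimulus interval [start, end) ms."""
    base = [v for v, t in zip(epoch, times_ms) if start <= t < end]
    offset = sum(base) / len(base)
    return [v - offset for v in epoch]


def is_artifact_free(epoch, limit_uv=100.0):
    """True if all samples stay within -limit_uv .. +limit_uv."""
    return all(-limit_uv <= v <= limit_uv for v in epoch)


def average_erp(epochs):
    """Sample-wise mean over epochs of equal length."""
    n = len(epochs)
    return [sum(column) / n for column in zip(*epochs)]


def positive_peak(erp, times_ms, t_min=300, t_max=700):
    """Return (amplitude, latency_ms) of the maximum within the window."""
    window = [(v, t) for v, t in zip(erp, times_ms) if t_min <= t <= t_max]
    return max(window)


# Synthetic example: epochs from -100 to 1000 ms in 10 ms steps.
times_ms = list(range(-100, 1001, 10))


def bump(t):
    # Triangular deflection peaking at 450 ms with 8 µV amplitude.
    return max(0.0, 8.0 - abs(t - 450) / 25.0)


epoch_a = [2.0 + bump(t) for t in times_ms]    # DC offset of +2 µV
epoch_b = [-1.0 + bump(t) for t in times_ms]   # DC offset of -1 µV
epoch_c = [150.0 if t == 200 else bump(t) for t in times_ms]  # artifact

corrected = [baseline_correct(e, times_ms) for e in (epoch_a, epoch_b, epoch_c)]
clean = [e for e in corrected if is_artifact_free(e)]
peak = positive_peak(average_erp(clean), times_ms)
print(len(clean), peak)
```

The third synthetic epoch exceeds the amplitude limit and is excluded, so the average is computed over the two remaining epochs.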

#### 2.8. Machine Learning Analysis

The data flow of the machine learning algorithm is depicted in **Figure 4A**. For the analysis the software framework pySPACE (Krell et al., 2013a) was used. First the continuous EEGs were processed by a DC removal filter, which is an online-capable method for centering the signal around zero. The normalized EEGs were then decimated from 5000 to 25 Hz. A cutoff frequency of 4 Hz was used for the anti-alias filter in the decimation process (Jansen et al., 2004; Ghaderi et al., 2014). Afterwards the EEGs were segmented into chunks of 1 s length. Chunks cut right after a first task message (not after repetitions of messages) were labeled as "targets." During training, these windows were only cut if the operator responded to the first task message within 15 s after presentation; in the online case every first task message was analyzed. We further cut "standard" windows of length 1 s during training. These windows were needed to train the binary classifier. The standard windows were cut every second with the constraint that no other action relevant for task recognition was performed in a range of [−1, 1] s around the cut window. Actions relevant for task recognition were the presentation of a task message or the response of the operator to one of these messages. The segments were further processed with the xDAWN spatial filter (Rivet et al., 2009). xDAWN is a spatial filter especially designed for P300 detection. It (1) enhances the separability of the P300 ERP and noise and (2) reduces the dimensionality of the data. To achieve this, a set of filters maximizing the signal-to-signal-plus-noise ratio is computed on a training data set. The resulting filters can be used to create a set of pseudo-channels that contain the filtered signal. From the newly created pseudo-channels the 8 most relevant channels were used for further processing.
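The selection of "standard" windows can be illustrated with a short Python sketch. It reads the [−1, 1] s constraint as a guard interval from 1 s before the window start to 1 s after the window end; this reading, the function name `cut_standard_windows`, and the example event times are assumptions.

```python
def cut_standard_windows(recording_end_s, event_times_s,
                         window_s=1.0, guard_s=1.0):
    """Return start times of 1 s standard windows free of nearby events.

    event_times_s: times of task messages and operator responses.
    A window starting at t is kept only if no event falls into
    [t - guard_s, t + window_s + guard_s].
    """
    starts = []
    t = 0.0
    while t + window_s <= recording_end_s:
        lower = t - guard_s
        upper = t + window_s + guard_s
        if not any(lower <= e <= upper for e in event_times_s):
            starts.append(t)
        t += window_s
    return starts


# Hypothetical 10 s recording with a task message at 4.2 s and a
# response at 7.0 s.
print(cut_standard_windows(10.0, [4.2, 7.0]))
```

In this example only the windows starting at 0, 1, 2 and 9 s remain as standards; all windows close to the two events are skipped.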

As features we used local straight line features, i.e., polynomial features. To fit a polynomial function, EEG data must be segmented (see **Figure 4B**). Earlier investigations showed that the longer the segments are chosen, the more coefficients are needed to keep the performance level high. In this work, segments of 400 ms length were cut every 120 ms within the 1 s segment after stimulus onset. Polynomial features of order one, i.e., straight lines, were fitted to the 400 ms long segments of the ERP data with 120 ms steps to describe the ERP in terms of a series of slope values (see **Figure 4B**). Polynomial features of order one were chosen since in former investigations of P300 ERP activity the highest performance was obtained with this low order. Previous analyses, as performed for example in Wöhrle and Kirchner (2014b), also support our choice.
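A minimal Python sketch of such slope features is given below. It assumes the 25 Hz sampling rate after decimation, so that a 400 ms segment spans 10 samples and a 120 ms step spans 3 samples. The function names are hypothetical and the sketch is not the pySPACE implementation; only the slope of each least-squares line is kept.

```python
def slope(samples, dt):
    """Least-squares slope of a straight line fitted to equally spaced samples."""
    n = len(samples)
    t = [i * dt for i in range(n)]
    t_mean = sum(t) / n
    y_mean = sum(samples) / n
    num = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, samples))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den


def line_features(signal, fs=25, win_s=0.4, step_s=0.12):
    """Slopes of lines fitted to overlapping windows of one pseudo-channel."""
    win = round(win_s * fs)    # 10 samples at 25 Hz
    step = round(step_s * fs)  # 3 samples at 25 Hz
    return [slope(signal[s:s + win], 1.0 / fs)
            for s in range(0, len(signal) - win + 1, step)]


# A 1 s segment (25 samples) rising linearly by 2 µV per second.
ramp = [2.0 * i / 25 for i in range(25)]
features = line_features(ramp)
print(len(features), features)
```

Under these assumptions each 1 s segment of a pseudo-channel yields 6 slope values, so 8 pseudo-channels would give 48 features per trial.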

After this preprocessing a Support Vector Machine (SVM) (Chang and Lin, 2011) was used as classifier. During training the complexity of the SVM was optimized with a grid search and an internal five-fold cross validation. The possible complexities were 10<sup>n</sup> with n ∈ {0, −1, …, −6}. Further, a threshold optimization was applied (Metzen and Kirchner, 2011). After building the model of an SVM the decision boundary is defined as 0 and the two classes (here target and standard) lie on the positive and negative side of the boundary. The threshold optimization gives the opportunity to further improve the classification performance with respect to a given metric, here the balanced accuracy. The threshold is shifted in the negative or positive direction such that the highest classification performance in terms of balanced accuracy is achieved on the training data.
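The idea of the threshold shift can be sketched in Python as follows. The sketch assumes that decision scores of the trained SVM are available for the training data and searches the midpoints between neighboring scores. This search strategy, the function names and the example scores are illustrative assumptions and do not reproduce the method of Metzen and Kirchner (2011) in detail.

```python
def balanced_accuracy(labels, predictions):
    """Mean of true positive rate and true negative rate."""
    tp = sum(1 for l, p in zip(labels, predictions) if l and p)
    tn = sum(1 for l, p in zip(labels, predictions) if not l and not p)
    positives = sum(1 for l in labels if l)
    negatives = len(labels) - positives
    return 0.5 * (tp / positives + tn / negatives)


def optimize_threshold(scores, labels):
    """Pick the decision threshold with the highest training bACC.

    Candidates are the default boundary 0 and the midpoints between
    neighboring scores. Ties are resolved in favor of the first candidate,
    so the default 0 is kept unless another threshold is strictly better.
    """
    unique = sorted(set(scores))
    candidates = [0.0] + [(a + b) / 2 for a, b in zip(unique, unique[1:])]
    return max(candidates,
               key=lambda th: balanced_accuracy(labels, [s > th for s in scores]))


# Hypothetical SVM scores: targets (True) are scored too low by default.
scores = [-2.0, -1.5, -1.0, -0.5, -0.25, 0.5]
labels = [False, False, False, True, True, True]

default_bacc = balanced_accuracy(labels, [s > 0.0 for s in scores])
best = optimize_threshold(scores, labels)
best_bacc = balanced_accuracy(labels, [s > best for s in scores])
print(default_bacc, best, best_bacc)
```

In this example the default boundary at 0 detects only one of the three targets, whereas the shifted threshold separates both classes on the training data.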

We used the balanced accuracy (bACC), i.e., the mean of true positive rate (TPR) and true negative rate (TNR), as the performance metric due to the insensitivity of this metric to changes in class distribution (Krell et al., 2013b; Straube and Krell, 2014). Area under the curve (AUC) values were additionally calculated. Classification performance was compared between all conditions. For details see **Table 2.3**. Although the adaptation of the ISI was evaluated online (**Table 2.3**: online stCL), we additionally analyzed the data in the offline mode (**Table 2.4**: offline stCL). This procedure was chosen for reasons of fair comparison. While in the online mode data of two runs (runs 1 and 2 or runs 3 and 4) were used for training, this was not possible for evaluating the general P300 detectability in case of fixed ISIs since here only one run could be used for training while the other was used for testing. By means of the chosen offline approach we were able to analyze the no-transfer case (as baseline/control) and the transfer case equally.

For the statistical analysis on single-trial classification performance, two separate comparisons were performed by using the Wilcoxon signed-rank test. First, we compared two online cases: online P300 detection in run 5 vs. run 6 (see (e) vs. (f) in **Table 2.3**: online stCL). Here, two samples per subject were obtained for each online case. Altogether, we obtained a sample size of 12 (2 samples × 6 subjects) for each online case. Second,


TABLE 2 | Design for the recording of EEG data, evaluation design for ERP analysis and design for the analysis of single-trial classification performance (online/offline-mode).

ERP, event-related potentials; online stCL, online single-trial classification; offline stCL, offline single-trial classification; and ISI, inter-stimulus interval. Each run contained 30 trials. For online single-trial classification, 60 trials (e.g., runs 1 and 2) were used to train a classifier and 30 trials (e.g., run 5) were used for evaluation. For offline single-trial classification, 30 trials were used for training and testing in both cases (no transfer/classifier transfer).

two adapted ISI conditions were compared with two fixed ISI conditions in offline mode depending on the type of training data (ISI-25 or ISI-15) used to train the classifier: (1) adapted ISI (e) vs. ISI-25 (control) (see **Table 2.4**: offline stCL) and (2) adapted ISI (f) vs. ISI-15 (control) (see **Table 2.4**: offline stCL). In the offline analysis, the number of training examples for the fixed ISI conditions (run 1 or run 2 / run 3 or run 4, see **Table 2.4**) was half the number of training examples used for the adapted ISI conditions in the online evaluation (run 5 or run 6, see **Table 2.3**). For a fair comparison between the adapted and fixed ISI conditions, only one run (run 1 or run 2) was used to train the classifier that was tested on run 5, and the mean of the classification performance obtained by using run 1 or run 2 for training was calculated in the case of the adapted ISI (e) (see **Table 2.4** (e) in offline stCL). Similarly, in the case of the adapted ISI (f), only one run (run 3 or run 4) was used to train the classifier that was tested on run 6, and the mean of the classification performance obtained by using run 3 or run 4 for training was calculated (see **Table 2.4** (f) in offline stCL). Each adapted and fixed condition has two samples per subject. Altogether, we obtained a sample size of 12 (2 samples × 6 subjects) for each condition.
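For readers unfamiliar with the Wilcoxon signed-rank test, the following sketch shows how a paired comparison with 12 samples per condition can be computed. It is a plain-Python illustration under stated assumptions, not the statistics software used in the study: the function name `wilcoxon_signed_rank` is hypothetical, the p-value is obtained by exact enumeration of all sign assignments (feasible for 12 pairs), and the performance values are synthetic percentages invented for this example.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped; tied absolute differences get average ranks.
    Returns (W_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2
    observed_dev = abs(w_plus - expected)
    # Exact null distribution: every difference is equally likely to be + or -.
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)


# Synthetic bACC values in percent (2 samples x 6 subjects per condition).
fixed_isi = [84, 86, 80, 88, 83, 85, 79, 87, 82, 90, 81, 89]
adapted_isi = [75, 80, 77, 76, 78, 74, 81, 73, 79, 72, 83, 70]
w_plus, p_value = wilcoxon_signed_rank(fixed_isi, adapted_isi)
```

Integer percentages are used here so that tied differences are detected exactly; with floating-point performance values, ties would need a tolerance.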

#### 3. RESULTS

# 3.1. Behavior of Subjects

#### 3.1.1. Total Runtime

**Figure 5** shows how the ISI changed over one run based on the inferred task load and task engagement of the user measured by P300 detectability. Subjects reported that, with online adaptation, the task frequency felt just right. This indicates that online adaptation of the MMI has a positive effect on the interaction. The finding was supported by the results of the behavioral analysis of the total runtime (see **Figure 6**). The online adaptation of the ISI reduced total runtime significantly compared to the ISI-25 condition [p < 0.001]. Moreover, there was no significant difference in total runtime between online adaptation of the ISI and the ISI-15 condition [p = n.s.].

#### 3.1.2. Reaction Time

**Figure 7A** shows the median reaction time for individual subjects over all runs. It can be seen that median reaction times are very similar over all conditions and runs for each subject. When merging the two runs of each condition (ISI-25, ISI-15, and ISI-adapt) we found no significant difference between ISI types. However, when analyzing median reaction time individually for ISI-long and ISI-short groups of runs 5 and 6 as performed for average ERP analysis it can be seen that the reaction time on task messages presented after short ISIs showed a higher variance compared to task messages presented after long ISIs (see **Figure 7B**).

A descriptive analysis of the sum of late responses and missed messages per subject for each run is visualized in **Figure 8**. It can be seen that for some subjects the number of late responses and missed messages was higher than for others (subjects 3 and 4).

**Table 3** provides information about the number of late responses, missed messages and the sum of both as depicted in **Figure 8**.

#### 3.1.3. Questionnaires

The analysis of the "computer usage questionnaire" showed a significant difference between subjects; in particular, subject 4 differed significantly from the other subjects [p < 0.03]. The analysis of the "NASA Task Load Index (TLI) questionnaire" showed no significant differences between runs [p = n.s.].

#### 3.2. Behavior of MMI

**Figure 5** depicts the changes of the ISI for both adapted runs (runs 5 and 6) for each subject. It can be seen that the adaptation of the ISI is very individual for each subject and even for each run. While for some subjects and runs, as for subject 2 in run 5, the ISI goes down to the minimum of 5 s and stays there for almost 20 trials, for other subjects the ISI is not reduced that much (see for example subject 5 for both runs).

In most cases the ISI gradually decreases and later increases again. However, there are exceptions to this pattern. For example, subject 1 shows a reduction of the ISI at the end of run 6, and subject 6 stays at a low ISI during both runs. For all subjects, the ISI, which started at 25 s, was reduced to a lower mean ISI, with average values of 14.67 and 15.62 s (runs 5 and 6) (see **Table 5**). Moreover, we also found differences in the mean ISI between subjects. For example, while the mean ISI for subject 4 and subject 5 is around 19 and 22 s (runs 5 and 6), the mean ISI for subject 6 is 10.45 and 8.43 s (runs 5 and 6) and for subject 2 it is 9.85 and 12.42 s (runs 5 and 6). The mean ISI for subject 4 and subject 5 was significantly higher compared to the other subjects [p < 0.017]. Furthermore, the mean ISI correlated strongly with the total runtime [r = 0.874, p < 0.001], but not with the task type (e.g., send message, go landmark, etc.).

#### 3.3. Average P300-Related Activity

As shown in **Figures 9**, **10**, we observed differences in averaged ERP shape depending on the ISI condition (short/long ISI). Note that the long and short ISIs differ between the two average analysis conditions (fixed-ISI condition and adapted-ISI condition, see **Table 5**). While the ISI is set to 25 s for the fixed ISI-long average analysis condition, ISI-long for the adapted-ISI condition is around 19 s. Similar differences can be found for the ISI-short average analysis condition (fixed short ISI: 15 s versus adapted ISI around 10 s). The peak amplitude of the averaged P300-related activity was not significantly reduced in case of ISI-15 (runs 3 and 4) compared to the ISI-25 condition (runs 1 and 2) [p = n.s.]. However, we observed a significant reduction in averaged P300 amplitude in run 5 and run 6 for short ISI groups compared to long ISI groups [p < 0.04]. Furthermore, there was a significant difference between ISI-15 and ISI-short [p < 0.04], but not between ISI-25 and ISI-long [p = n.s.].
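The quantity compared here, the peak amplitude of the averaged ERP, can be sketched as follows. This is only an illustration of the general procedure: the function names, the toy epochs, the 100 ms sampling grid, and the 300–600 ms search window are assumptions made for this example and are not taken from the study.

```python
def average_erp(epochs):
    """Sample-wise mean over single-trial epochs (each a list of amplitudes)."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def peak_amplitude(erp, times_ms, window_ms):
    """Largest amplitude of the averaged ERP inside a latency window."""
    start, stop = window_ms
    return max(v for v, t in zip(erp, times_ms) if start <= t <= stop)


# Toy data (hypothetical): one sample every 100 ms after message onset.
times_ms = [0, 100, 200, 300, 400, 500, 600, 700, 800]
long_isi_epochs = [[0, 1, 2, 6, 8, 5, 2, 1, 0], [0, 1, 3, 8, 10, 7, 3, 1, 0]]
short_isi_epochs = [[0, 1, 2, 4, 5, 3, 1, 0, 0], [0, 0, 1, 3, 4, 3, 2, 1, 0]]

window = (300, 600)  # assumed P300 search window in ms
peak_long = peak_amplitude(average_erp(long_isi_epochs), times_ms, window)
peak_short = peak_amplitude(average_erp(short_isi_epochs), times_ms, window)
```

Per-subject peak values obtained in this way for two conditions could then be compared with a paired test.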

#### 3.4. Online P300 Detectability

Finally, we achieved high classification performance in both the online and offline analysis. In the online evaluation, we found no significant difference between the two online runs [adapted ISI (e) vs. adapted ISI (f): bACC of 0.77 vs. bACC of 0.78, p = n.s., see adapted ISI (e) vs. adapted ISI (f) in **Table 4.1**]. In the offline evaluation, classification performance obtained by using the classifier trained on ISI-25 statistically differed from classification performance obtained in case of no transfer [ISI-25 vs. adapted ISI: bACC of 0.84 vs. bACC of 0.75: p < 0.003, see adapted ISI (e) vs. ISI-25 in **Table 4.2**]. However, we found no significant difference in classification performance when using the classifier trained on ISI-15 compared to the case of no transfer (ISI-15) [ISI-15 vs. adapted ISI: bACC of 0.80 vs. bACC of 0.79: p = n.s., see adapted ISI (f) vs. ISI-15 in **Table 4.2**]. There was no significant difference between the online and offline evaluation for the case of ISI adaptation [adapted ISI (e) in **Table 4.1** vs. adapted ISI (e) in **Table 4.2**: p = n.s.; adapted ISI (f) in **Table 4.1** vs. adapted ISI (f) in **Table 4.2**: p = n.s.]. In summary, we found a transfer effect on classification performance when the classifier was trained on data from the ISI-25 runs. However, such an effect was missing when the classifier was trained on data from the ISI-15 runs. It must be emphasized that the classification performance was very similar in both classifier transfer analyses, i.e., adapted ISI (e) and adapted ISI (f) (see **Table 4.1**).

### 4. DISCUSSION

#### 4.1. Improvement of Interaction

Supporting our hypothesis (1), behavioral data showed that total runtime in runs with adapted ISI was significantly shorter compared to an unadapted condition with an ISI of 25 s. Although there was no significant difference between the adapted ISI and the fixed shorter ISI of 15 s, the mean total runtime was still very low considering that runs with ISI adaptation started at an ISI of 25 s. Significant differences in the total runtime between runs with adapted ISI and the fixed shorter ISI of 15 s were not expected, since the time needed until a task was performed by a robot does (although not strongly) depend on the type of task. For example, sending data was nearly instantaneous, while reaching a certain landmark could take a long time depending on the current position of the robot and the landmark. Thus, some deviation in runtime, depending on the kind of tasks that had to be performed by the robot, was expected. On the other hand, we did not choose subjects with a certain qualification but chose subjects independent of their experience in robot control or video gaming. Thus, we expected differences in the subjects' performances resulting in different "suitable" ISIs and hence also in different total runtimes. What mattered was that a significantly shorter runtime could be achieved compared to the fixed ISI-25 condition, under which all subjects could perform the tasks without being stressed.

Besides, the goal was not to reduce the total runtime to a minimum but to adapt the ISI with respect to the demands of the user of the MMI. Indeed, for some subjects the mean ISI was reduced to values around 10 s, while for other subjects, i.e., subjects 4 and 5, the ISI was clearly above 15 s (around 19 s, see Section 3.2). On the other hand, even for subjects for whom the ISI was not reduced that much, the mean ISI was clearly below 25 s, supporting our presupposition from the 4 test subjects that were not included in this study that a fixed ISI of 25 s ensures that all subjects can easily perform the tasks but will probably make the subjects feel bored. An interesting finding is that subject 4, for whom the ISI was reduced only to a still high value (around 19 s), significantly differed from the other subjects with respect to computer usage as evaluated by the "Computer usage questionnaire" (CUQ). This finding supports our assumption that the MMI could be adapted based on the detectability of the P300 to support the user with respect to her or his general capabilities. Note that subject 4 showed the lowest classification performance in both runs compared to the other subjects (although no significant differences between subjects could be found, see **Table 4**). Moreover, subject 4 had a high number of late responses and missed messages (see **Figure 8**). Another interesting finding is that the median reaction time does not significantly differ between subjects. This finding suggests that in our application behavioral data is probably not a good indicator for task load. Moreover, it shows that with our approach subjects were exposed to an appropriate workload. In summary, the results suggest that by using the developed MMI utilizing embedded Brain Reading, the MMI can be adapted not only to the general capabilities of the user (e.g., experienced or rather inexperienced in computer usage) but also to the changes in task load over time.

# 4.2. Changes in the Characteristic of Average P300 Depending on the ISI

Applying average ERP analysis, we were able to show that during a complex multi-robot control task a P300-related activity is evoked by task messages which are presented to the operator. This finding is the most important basis for our approach to adapt an MMI based on P300 detectability. As expected we found no significant differences in the averaged-peak P300 amplitude for both fixed ISI conditions. This supports earlier findings that the ISI has no influence on the P300 amplitude in case of long ISIs (longer than 6–8 s as found by Polich, 2007). More importantly this finding supports our assumption that on both fixed ISI conditions the general workload on the subjects was rather modest and comparable. Hence, any found differences in the P300 peak amplitude should be caused by changes in the current task load and task engagement. This finding is supported by the fact that in case of an ISI adaptation the average P300 peak amplitude was significantly reduced for trials after short ISIs compared to trials after long ISIs.

TABLE 3 | Number of tasks with late or no response in runs 5 and 6.

Our results from the average ERP analysis support hypothesis (2): we could show differences in the P300 peak amplitude for average conditions with a high task load (averaged ERP activity after ISI-short in adapted ISI condition) compared to average conditions with low task load (averaged ERP activity after ISI-long in adapted ISI condition).

The finding that the peak amplitude of the average P300 activity after trials with ISI-short (adapted ISI condition) is significantly smaller compared to the peak amplitude of the average P300 activity of both fixed ISI conditions (ISI-25 and ISI-15) suggests that for all subjects the MMI was indeed adapted to achieve the best performance without enhancing the workload too much such that no P300 would be evoked. Tests on 4 subjects (not included in this study) showed that in cases in which the workload was too high no P300 was evoked on average or could not be detected in single-trial while subjects reported that they were very stressed and could not perform the tasks. Hence, the MMI is adapted such that subjects perform best while avoiding an excessive general workload. Some subjects were able to keep their performance high with a short ISI all through the experiment while others did not. For the latter, the MMI was again adapted to longer ISIs reducing the task load back to normal. The task load and thus the general workload being modest under the adapted condition after long ISIs is supported by the finding that the average P300 peak amplitude evoked after long ISI trials under the adapted ISI condition is comparable to the average P300 peak amplitude under the fixed ISI conditions (ISI-25 and ISI-15). This was even the case although the mean long and short ISI differed strongly between subjects (see **Tables 1**, **5**). Based on these findings we suggest that the P300 ERP is indeed a good indicator for the current and individually different task load of a subject while controlling the robots.

# 4.3. Detectability of P300 in Single-Trial

The results of the offline machine learning analysis support that the P300-related activity which was evoked by task messages can be detected in single-trial even in case that the classifier is transferred between different ISI conditions. Thus, the results support hypothesis (3).

When comparing online classification with offline classification, a performance drop can be observed. This can be explained as follows: in the online case, each first message was classified regardless of whether it had been responded to. Therefore, trials after missed task messages, which likely did not contain a P300, were classified, leading to "false negative" results. It was therefore expected that classification performance was lower for the online case, since the approach is sensitive to missed targets. The small difference between online and offline results supports that the MMI was well designed such that only few target events (messages) were completely missed (see also **Table 3**).

Besides this, similar classification performance can be achieved in both transfer cases. Hence, for an application, it is not that relevant for the classification on which data a classifier is trained. While we found no significant differences between subjects for online classification performance, it is noticeable that subject 4 had the worst classification performance in both runs compared to the other subjects (for discussion, see Section 4.1).

# 4.4. P300 Detectability as Index for Task Load or Task Engagement

By reducing the ISI to values much shorter than in the ISI-15 condition (see **Table 5**), we strongly enhanced the task load and the likelihood of conflicts, since subjects might still be engaged in a former task when a new task message was presented. This is supported by two findings: (1) the higher variance in reaction time found for the ISI-short group (based on grouping for average analysis) and (2) the smaller average P300 evoked after short ISI trials in the adapted ISI condition (see **Figure 10**). Likely, subjects were still involved in a previous task and could therefore often respond to a new task only with a delay.

We found a similar effect in a previous study (Kim and Kirchner, 2012). In this previous study, subjects played a labyrinth game and had to respond to target stimuli which were presented in an oddball design. However, subjects were not allowed to respond to target events right away. We asked the subjects to steer the ball into a safe corner first before answering a target event. When analyzing the average P300 potential, we grouped the data with respect to reaction time such that the first group consisted of EEG trials with only short reaction times up to 1.4 s, for the second group trials were added which had reaction times up to 1.6 s, for the third group up to 1.8 s, for the fourth up to 2.0 s, and for the fifth up to 7.0 s. Although the trials with short reaction times up to 1.4 s were kept in the second group and those up to 1.6 s in the third group, we still found descriptive differences in average peak amplitude of the P300 component between all groups, with the highest amplitude for the group of 1.4 s and the lowest for the group of 7.0 s. When classifying between standard and target trials, we found significant differences between the group of 1.4 s and all other groups, with the exception of group 1.4 s compared to group 1.6 s, and significant differences between the group of 7.0 s and all other groups, with the highest classification performance of 0.85 for the group of 1.4 s and the lowest classification performance of 0.76 for the group of 7.0 s. These results suggest that ongoing task engagement, i.e., playing the labyrinth game, reduced the P300 evoked by a new target stimulus tremendously and would also reduce classification performance.
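The cumulative grouping by reaction time described above can be sketched in a few lines. The reaction-time limits are those named in the text; the function name `cumulative_rt_groups`, the trial representation, and the toy trials are hypothetical and serve only to make the grouping explicit.

```python
def cumulative_rt_groups(trials, limits_s=(1.4, 1.6, 1.8, 2.0, 7.0)):
    """Group trials cumulatively: each group holds all trials with RT <= its limit."""
    return {limit: [t for t in trials if t["rt"] <= limit] for limit in limits_s}


# Toy trials (hypothetical): reaction time in seconds per target trial.
trials = [
    {"id": 0, "rt": 1.2},
    {"id": 1, "rt": 1.5},
    {"id": 2, "rt": 1.7},
    {"id": 3, "rt": 1.9},
    {"id": 4, "rt": 3.5},
    {"id": 5, "rt": 1.3},
]
groups = cumulative_rt_groups(trials)
group_sizes = {limit: len(members) for limit, members in groups.items()}
```

Because the groups are nested, each larger group still contains all fast-response trials, which is why differences between groups reflect the added slow-response trials.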

# 4.5. Summary and Outlook

In summary, our results show that complex interaction between humans and robotic systems can be improved by the application of an MMI adapted by eBR. The time between tasks can be adjusted such that a reduction of run time compared to a safe mode is possible. The strength of adaptation further correlates with the experience of the user. Thus, the MMI can be adapted to the needs of the user within a range of workload that can otherwise not be resolved. Our approach shows that EEG activity like the P300-related activity that is naturally evoked during interaction can be used to adapt an MMI with respect to online changes in task load or task engagement of an operator. Thus, the dual-task design (with a primary and usually artificially introduced secondary task) that is often applied to infer the current processing capacity of the brain need not be applied to adapt for task engagement. The ERP activity can be used rather naturally, similar to approaches that make use of ratios of EEG power bands (Pope et al., 1995), while being specific to certain stages of information processing (Prinzel et al., 2003). Hence, for the user, our approach of measuring brain states and task engagement remains invisible and avoids any possible additional load on the user, since the task itself is used to measure task load, without any additional task.

TABLE 4 | Online and offline classification performance.

Table 4.1. Online single trial classification performances (cf. Table 2.3. Online stCL)

Table 4.2. Offline single trial classification performances (cf. Table 2.4. Offline stCL)

TABLE 5 | Mean ISIs in case of online ISI-adaptation (runs 5 and 6).

*Table 5.1*: mean over all trials. *Table 5.2*: mean over a selected group of trials with ISI-short and ISI-long as defined for average ERP analysis (see *Table 1*).

In the future, we will have a closer look at the long-term effect of adaptation of the ISI compared to a high task load condition, i.e., an ISI of 10 s or even lower. For this, it is required to avoid the recording of extra training data, since this requires a considerable amount of time. The total time for one experiment (6 runs) was already between 3 and 4 h including preparation. Thus, for a long-term study, preparation and especially training of the classifier must be kept to a minimum. This can be achieved by using zero-training approaches (Krauledat et al., 2008; Kindermans et al., 2012) or by using old training data from either previous recordings of the same subject or other subjects (Lotte and Guan, 2010; Devlaminck et al., 2011; Samek et al., 2013). To reduce transfer effects (between sessions and between subjects), adaptive algorithms for the spatial filter (Rivet et al., 2011; Ghaderi and Straube, 2013), the classifier (Li et al., 2008; Lu et al., 2009; Tabie et al., 2014) or both (Wöhrle et al., 2015) can be applied. Moreover, we want to investigate whether adaptive measures can be used to further improve the classification performance and the support for the user, as we could already show for the prediction of movement onsets (Tabie et al., 2014). Finally, we will investigate transferability of the final approach to a mobile analysis system which makes use of hardware accelerators as already tested for the current application. Even for an adaptive approach, hardware accelerators have been shown to be feasible for the detection of both the P300 event-related potential (Wöhrle et al., 2013b,a, 2014a) and the movement-related ERP activity (Wöhrle et al., 2014b).

#### AUTHOR CONTRIBUTIONS

EK developed concepts for the MMI and for data evaluation, interpreted the results and wrote most of the manuscript. She further contributed to data recording, evaluation and statistical design. SK performed the analysis of ERP averages and of questionnaires and the statistical evaluation of classification performance, ERP results and behavior data. She further wrote parts of the manuscript and supported data acquisition. MT conducted the experiments, designed the online processing flow, did the machine learning analysis and evaluated the ISI changes. He further wrote parts of the manuscript. HW conducted the experiments, designed the online processing flow and performed behavioral analysis with respect to reaction time and late reaction time, selected the questionnaires and wrote parts of the manuscript. MM conducted the experiments, adjusted the MMI to match the experiments' needs and wrote parts of the manuscript. FK contributed to the concept of the MMI, critically discussed the research goals, and revised and improved the manuscript. All authors gave their final approval of the version to be published and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

#### FUNDING

This work was funded by research grants from the German Federal Ministry for Economic Affairs and Energy (grants FKZ 50 RA 1011, FKZ 50 RA 1012, and FKZ 50 RA 1301).

#### ACKNOWLEDGMENTS

We want to thank Elke Neubauer for her help in data acquisition, Alexander Dieterle and Johannes Teiwes for their help during the development of the MMI, and Nils Eckardt for helping us in digitizing behavioral data as preparation of analysis.

#### REFERENCES

Allanson, J., and Fairclough, S. (2004). A research agenda for physiological computing. Interact. Comput. 16, 857–878. doi: 10.1016/j.intcom.2004.08.001

Chang, C.-C., and Lin, C.-J. (2011). LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27. doi: 10.1145/1961189.1961199

Cornella, J., Zerbato, D., Giona, L., Fiorini, P., and Sequeira, V. (2012). "Dynamics simulation for the training of teleoperated retrieval of spent nuclear fuel," in ICRA (St. Paul, MN), 5012–5017.

Devlaminck, D., Wyns, B., Grosse-Wentrup, M., Otte, G., and Santens, P. (2011). Multisubject learning for common spatial patterns in motor-imagery BCI. Comput. Intell. Neurosci. 2011, 1–9. doi: 10.1155/2011/217987


Kahneman, D. (1973). Attention and Effort. Englewood Cliffs, NJ: Prentice Hall.


calibration," in Proc. International Congress on Neurotechnology, Electronics and Informatics (NEUROTECHNIX 2013) (Vilamoura: ScitePress), 46–53.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer PK and handling Editor declared their shared affiliation, and the handling Editor states that the process nevertheless met the standards of a fair and objective review.

Copyright © 2016 Kirchner, Kim, Tabie, Wöhrle, Maurus and Kirchner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Perception and Cognition of Cues Used in Synchronous Brain–Computer Interfaces Modify Electroencephalographic Patterns of Control Tasks

#### *Luz María Alonso-Valerdi<sup>1,2</sup>\*, Francisco Sepulveda<sup>1</sup> and Ricardo A. Ramírez-Mendoza<sup>2</sup>*

*<sup>1</sup>Brain-Computer Interfaces (BCI) Group, School of Computing Science and Electronic Engineering, University of Essex, Colchester, UK, <sup>2</sup>Escuela de Ingeniería y Ciencias, Tecnológico de Monterrey – Campus Ciudad de México, Mexico City, Mexico*

A motor imagery (MI)-based brain–computer interface (BCI) is a system that enables humans to interact with their environment by translating their brain signals into control commands for a target device. In particular, synchronous BCI systems make use of cues to trigger the motor activity of interest. So far, it has been shown that electroencephalographic (EEG) patterns before and after cue onset can reveal the user's cognitive state and enhance the discrimination of MI-related control tasks. However, there has been no detailed investigation of the nature of those EEG patterns. We, therefore, propose to study the cue effects on MI-related control tasks by selecting EEG patterns that best discriminate such control tasks, and analyzing where those patterns are coming from. The study was carried out using two methods: standard and all-embracing. The standard method was based on sources (recording sites, frequency bands, and time windows) where the modulation of EEG signals due to motor activity is typically detected. The all-embracing method included a wider variety of sources, where not only motor activity is reflected. The findings of this study showed that the classification accuracy (CA) of MI-related control tasks did not depend on the type of cue in use. However, EEG patterns that best differentiated those control tasks emerged from sources well defined by the perception and cognition of the cue in use. An implication of this study is the possibility of obtaining different control commands that could be detected with the same accuracy. Since different cues trigger control tasks that yield similar CAs, and those control tasks produce EEG patterns differentiated by the cue nature, brain–computer communication could be accelerated by having a wider variety of detectable control commands. This is an important issue for Neuroergonomics research because neural activity could not only be used to monitor the human mental state, as is typically done, but might also be employed to control the system of interest.

Keywords: brain–computer interface, motor imagery, classification accuracy, electroencephalographic patterns, human factors

#### *Edited by:*

*Klaus Gramann, Berlin Institute of Technology, Germany*

#### *Reviewed by:*

*Reinhold Scherer, Graz University of Technology, Austria Josef Faller, Graz University of Technology, Austria*

> *\*Correspondence: Luz María Alonso-Valerdi luzalonsoval@gmail.com*

*Received: 17 August 2015 Accepted: 06 November 2015 Published: 23 November 2015*

#### *Citation:*

*Alonso-Valerdi LM, Sepulveda F and Ramírez-Mendoza RA (2015) Perception and Cognition of Cues Used in Synchronous Brain–Computer Interfaces Modify Electroencephalographic Patterns of Control Tasks. Front. Hum. Neurosci. 9:636. doi: 10.3389/fnhum.2015.00636*

# INTRODUCTION

A brain–computer interface (BCI) is a system that enables humans to interact with their environment by translating their brain signals into control commands for a device of interest (Graimann et al., 2010). The mechanism of a BCI system fundamentally consists of two steps: (1) detecting and decoding the user's intentions for controlling the system and (2) maintaining continuous user-system communication. The *user intentions* of controlling a BCI system are changes in the user's brain signals that are regulated through control tasks. A control task can be based on exogenous or endogenous paradigms (Jackson and Mappus, 2010). Particularly, the endogenous paradigm is based on the quantification of brain oscillations that are modulated via cognitive tasks such as motor imagery (MI). The *user-system communication* is established by a control interface. A control interface can be synchronous or asynchronous (Hassanien and Azar, 2015). In a synchronous interface, the user-system communication is allowed only in fixed time windows, whereas in an asynchronous interface, the user initiates the communication with the system at will. Both synchronous (Obermaier et al., 2003; Leeb et al., 2006; Maeder et al., 2012; Bamdadian et al., 2014) and asynchronous (Scherer et al., 2007; Galán et al., 2008; Lotte et al., 2008; Tseng et al., 2015) systems have been developed over the past few years. For real-world applications, the prototyping of asynchronous systems is preferred because these allow users to interact naturally with their environment. The relevance of synchronous systems cannot, however, be ignored, even in real applications. The cueing process facilitates the early and accurate detection of the user's control tasks, regardless of the user's ability to modulate his/her brain signals. This, in turn, raises confidence, persistence, and autonomy in users as they work toward mastering BCI skills. Furthermore, the identification of MI onset makes it possible to analyze the periods before and after onset, which have been associated with improved BCI performance and the recognition of the user's cognitive state (Maeder et al., 2012; Bamdadian et al., 2014; Gutierrez et al., 2015).

As brain signals are modulated by neural networks that modify their degree of synchronization according to the sensory–cognitive input, it is not surprising that control tasks (particularly those based on MI) contain much more information than only that related to the user's intention of controlling the system (Kropotov, 2010). MI-related control tasks are a source of information that has been exploited not only to generate control commands for a target device, but also to enhance BCI performance, to predict classification accuracy (CA), or to determine the user's mental state. For example, Pfurtscheller and Neuper (2001), and then Obermaier et al. (2003), reported that left- and right-hand MIs were correctly discriminated as early as 250 ms after the onset of a specific visual cue. They attributed the early discrimination to the cue properties, concluding that the control tasks were the result of conscious (MI) and unconscious (visual stimulation) processes over the sensory–motor area of the brain. Furthermore, in a later and more detailed study, Pfurtscheller et al. (2008) found that distinct short-lasting brain patterns appeared within a time window of about 500–750 ms after cue onset. Those brain patterns produced different features for different imaginary movements (hands and feet), facilitating and accelerating the discrimination of MI-related control tasks in naïve subjects. Another example is the study carried out by Grosse-Wentrup and Scholkopf (2012), in which high gamma-range (55–85 Hz) activity between two frontoparietal networks was used to predict BCI performance on a trial-to-trial basis. Additional and important examples are two studies undertaken by Maeder et al. (2012) and Bamdadian et al. (2014), respectively. Those researchers demonstrated that the user performance in classical synchronous BCIs can be predicted by quantifying the modulation of the brain signals in pre-cue stages because they somehow reflected the user's cognitive state. 
Finally, and more recently, Scheel et al. (2015) found that visual and auditory cues provoked significant differences of the peak amplitude of movement-related cortical potentials in synchronous BCIs. They also found that potentials from the auditory-cue paradigm had a wider spatial distribution than those from the visual cue.

Overall, all the aforementioned studies support the view that brain patterns extracted from MI-related control tasks can provide much more information than that used to control a target device. In particular, the cue effects on MI-related control tasks have been studied. Researchers in the field have shown that both perception (e.g., sensory–cognitive processing of the cue) and cognition (e.g., imaginary motor activity) are reflected in the brain signals from which BCI control tasks are extracted. Studying the influence of human factors on BCI control tasks may help to design a more versatile human–machine interaction for this type of system because of the active (e.g., extraction of control commands for manipulating a target device) and passive (e.g., monitoring of the level of attention of an individual) use of the brain signals. This work could have further applications in Neuroergonomics, where neural activity is registered in order to monitor human mental state. Making use of neural activity in both an active and a passive way may be much more fruitful.

So far, it has been shown that brain patterns before and after cue onset can reveal the user cognitive state and enhance the discrimination of MI-related control tasks. However, there has been no detailed investigation of the nature of those brain patterns. We, therefore, propose to study the cue effects on MI-related control tasks by selecting the brain patterns that best discriminate such control tasks, and analyzing where those patterns are coming from in order to answer two questions:


The present study was conducted as follows. First, brain activity was registered by means of electroencephalography (EEG). Second, the frequently used stimulation modalities (SMs) for cueing in training sessions were applied. These were auditory (Nijboer et al., 2008) and visual (Boostani et al., 2007) stimuli. In addition, a bimodal cue (combination of auditory and visual stimuli) was included in the study because previous investigations in sensory encoding (Basar et al., 1999; Isoğlu-Alkaç et al., 2007) have shown that simultaneous presentation of auditory, visual, and somatosensory stimuli significantly enhances sensory responses. Third, as preparation and imagination of movements evoke similar neural desynchronization events over the sensory–motor areas (Neuper et al., 2006) and both of them are widely used as control tasks, the two motor activities were included in the study. Finally, given that brain oscillations occur in a wide range of EEG recording sites, frequency bands, and time intervals (Kropotov, 2010), brain patterns were analyzed using two methods: standard and all-embracing. The standard method was restricted to the well-established motor activity sources (Pfurtscheller et al., 2007), while the all-embracing method involved all the available EEG information.

## MATERIALS AND METHODS

## Experimental Procedure

#### Participant Recruitment and General Instructions

Nine participants (four females and five males) took part in this study, which was previously authorized by the Ethics Committee of the University of Essex. All of them were aged between 28 and 41 years. None of them reported auditory impairments, seven of them had normal vision, and two of them had corrected-to-normal vision. Eight of the nine reported being right-handed and only one was left-handed.

The participants were informed about the experimental procedure and signed a consent form. Only two of the nine had previously engaged in cognitive tasks related to imagination of movements. At the beginning of the experiment, every participant was carefully instructed as follows:


#### Organization of the Experiment

In order to collect sufficient EEG data, the participants attended two sessions. The sessions lasted 48 min each and followed an identical procedure. Every session consisted of six runs and one run had 50 trials. One trial took from 8500 to 9500 ms (**Figure 1**), resulting in runs of ~8 min. Within each trial, there were three phases: MP (0–2500 ms), MI (2500–6000 ms), and relaxing (6000–8500 ± 1000 ms). In the latter phase, a random variation of 1000 ms was included to reduce expectation effects.

As there were three SMs (audio, visual, and bimodal) and both hands (left and right) were involved, there were six categories of trials: audio-left, audio-right, visual-left, visual-right, bimodal-left, and bimodal-right. Each of these categories was randomly presented 50 times per session and distributed over the six runs. We thereby obtained 12 conditions (six categories of trials × two control tasks), and each condition had 100 trials (2 sessions × 50 trials).
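The design arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are hypothetical and are not taken from the study's software.

```python
from itertools import product

stimulation_modalities = ["audio", "visual", "bimodal"]
hands = ["left", "right"]
control_tasks = ["MP", "MI"]

# Six trial categories (SM x hand) and twelve conditions (category x control task)
categories = [f"{sm}-{hand}" for sm, hand in product(stimulation_modalities, hands)]
conditions = list(product(categories, control_tasks))

presentations_per_category = 50  # per session, as stated in the text
sessions = 2
runs_per_session = 6

trials_per_session = len(categories) * presentations_per_category
trials_per_run = trials_per_session // runs_per_session
trials_per_condition = sessions * presentations_per_category

print(len(categories), len(conditions), trials_per_run, trials_per_condition)
```

The counts agree with the six runs of 50 trials per session and the 100 trials per condition reported above.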

#### Timing Protocol

The duration of the cues was standardized to 500 ms in accordance with sensory recognition and reaction time studies (Teichner, 1954; Shelton and Kumar, 2010). The movement preparation (MP) was adjusted to 2000 ms, which is the necessary period to achieve readiness in the neural networks over the sensory–motor area (Jeannerod, 2006; Neuper et al., 2006). The MI was limited to 3000 ms, as is commonly done in synchronous BCIs. The relaxation span varied from 2000 to 3000 ms, guaranteeing a proper recovery of the longest desynchronization process, i.e., the alpha one (Pfurtscheller et al., 1996). See **Figure 1**.

#### EEG Data Collection

The EEG signals were recorded by means of Biosemi equipment (Amsterdam, The Netherlands), which integrates the ActiveTwo system with the ActiView software (Honsbeek et al., 1998). The ActiveTwo system was configured to acquire the signals within a bandwidth between DC and 400 Hz, and at a sampling frequency of 2048 Hz. The ActiView software was programmed to decimate the signals to 512 Hz. This configuration limited the effective digital bandwidth to 104 Hz by default.

The EEG signals were sensed via 61 active electrodes, plus driven-right-leg and common-mode-sense electrodes. The 61 active electrodes were mounted on a head-cap labeled as stated in the 10/10 system. The other two electrodes were only used for referencing electrically the ActiveTwo system, but they were not recorded. In addition, three external electrodes were included for recording the eye movements (EOG). Two of them (EOGL and EOGR) were placed 1 cm below and above the lateral canthus of the left and right eyes, respectively. The third one was placed on the right mastoid (MR) for referencing EEG and EOG signals (**Figure 2**). At the end of the experiments, we gathered 18 datasets (9 participants × 2 sessions).

#### Data Analysis

The datasets of one participant were excluded from the study. Those showed electrode-pop artifacts over the occipital area of the scalp. There were then 16 datasets for the study purposes.

#### Processing of Continuous EEG Data

To attenuate the interference in the EEG channels, these were processed by using the open-access toolbox for electrophysiological signal processing, EEGLAB (Delorme and Makeig, 2004). First, every channel from each of the 16 datasets was processed as follows: (1) referencing against MR, (2) high-pass filtering at 0.1 Hz using a Butterworth filter of order 4, (3) low-pass filtering at 41 Hz using a Butterworth filter of order 7, and (4) down-sampling from 512 to 256 Hz. Second, every dataset was scanned to eliminate discontinuities and detect high-impedance electrodes. Up to three electrodes under this condition were detected per dataset. Third, independent component analysis was applied to each dataset for rejecting artifacts such as EOG and electrocardiography. Only EEG channels without high-impedance difficulties were involved in such analysis. EOGL and EOGR channels were used to identify all the independent components related to EOG activity. Finally, the EEG channels with high-impedance difficulties were replaced by interpolating their nearest neighboring channels as reported by Gargiulo et al. (2010). See **Figure 3**.

#### Processing of the Control Tasks (MP and MI)

The processing of the control tasks was carried out through the *miBCI* software package<sup>1</sup> published by Alonso-Valerdi and Sepulveda (2015). From every EEG channel of the 16 datasets, the control tasks were extracted in line with the cue onset (**Figure 1**). The MP and MI were thus 2500 and 3500 ms long, respectively. Having obtained the EEG signals of interest, they were spatially filtered via large Laplacian in order to obtain more localized electrical activity (Dornhege et al., 2007).
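To make the epoch lengths concrete, the sketch below cuts one MP and one MI epoch per trial from a single channel. It is not the *miBCI* implementation: the function names are hypothetical, trial onsets are assumed to be sample indices of the first cue, and the MI epoch is assumed to start 2500 ms after that onset, following the trial phases given earlier.

```python
FS_HZ = 256  # sampling rate after down-sampling, as stated in the text


def ms_to_samples(ms, fs_hz=FS_HZ):
    """Convert a duration in milliseconds to a number of samples."""
    return int(round(ms * fs_hz / 1000))


def extract_epochs(channel, trial_onsets):
    """Cut (MP, MI) epoch pairs from a 1-D sequence of samples.

    trial_onsets: sample indices at which each trial starts (assumption).
    """
    mp_len = ms_to_samples(2500)  # MP phase: 0-2500 ms
    mi_len = ms_to_samples(3500)  # MI phase: 2500-6000 ms
    epochs = []
    for onset in trial_onsets:
        mp = channel[onset:onset + mp_len]
        mi = channel[onset + mp_len:onset + mp_len + mi_len]
        epochs.append((mp, mi))
    return epochs


# Usage with a dummy channel whose sample values equal their indices
dummy_channel = list(range(3000))
epochs = extract_epochs(dummy_channel, trial_onsets=[0, 100])
print(len(epochs[0][0]), len(epochs[0][1]))
```

At 256 Hz, the MP and MI epochs contain 640 and 896 samples, respectively.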

#### Feature Extraction

It is well-established that MP and MI provoke neural desynchronization with peak power around 10 and 20 Hz (Neuper et al., 2009). As band power (BP) estimation has been validated as a stable and consistent method for quantifying EEG power changes due to motor activity (Neuper et al., 2005), this was selected as feature extractor. BP estimation was applied in line with the methods described below.

#### *Standard Method*

Previous investigations have empirically established the following criteria to effectively discriminate hand imaginary movements. First, 18 central recording sites have been validated as the maximum number of EEG channels for satisfactory classification (Ramoser et al., 2000). Second, narrow frequency bands around the maxima 10 and 20 Hz have been widely used in synchronous BCI systems (Pfurtscheller et al., 2007; Neuper et al., 2009). Third, it has become common practice to discard the first second post-cue, wherein evoked potentials are typically detected (Boostani et al., 2007). With these criteria in mind, we laid down the *standard method*. This method was based on 15 central recording sites (**Figure 2**), four frequency bands, and EEG segments starting 1 s post-cue. The frequency bands were established as follows: lower alpha (αL) from 8 to 10 Hz, upper alpha (αU) from 10 to 12 Hz, lower beta (βL) from 16 to 20 Hz, and upper beta (βU) from 20 to 24 Hz.

#### *All-Embracing Method*

EEG signals are regulated by brain oscillators that adjust their state of synchrony according to sensory (e.g., cue decoding) and cognitive (e.g., MP and MI) events. These oscillators are neural networks that enter into synchrony in a wide range of resonant frequencies (from 0 up to about 80 Hz) and over specific periods of time (Krause, 2003). In view of this fact, we extended the scope of the standard method by establishing the *all-embracing method*. This method was based on 61 recording sites (**Figure 2**), seven frequency bands, and the whole trace of MP and MI. In addition to the previously mentioned bands, the following ones were also considered: lower theta (θL) from 4 to 6 Hz, upper theta (θU) from 6 to 8 Hz, and gamma (γ) from 39 to 41 Hz. These bands were included in the analysis on the basis of the following evidence. Theta band rhythms resonate at the frequency band 4–8 Hz and emanate from the frontal midline due to audio–visual information encoding, attention demands, memory retrieval, and cognitive load. Moreover, these rhythms are enhanced after practice on the cognitive tasks at hand. They are more prevalent when the subject is focused and relaxed, and prolonged activity is related to selective attention (Basar et al., 1999; Krause, 2003; Kropotov, 2010). The upper theta band (6–8 Hz) generally reflects levels of alertness (Pineda, 2005). On the other hand, gamma band rhythms oscillate near 40 Hz during sensory encoding, perceptual–cognitive functions, and motor behaviors. These rhythms are phase-locked to the stimulus and short-lasting, and appear 100 ms post-stimulus in sensory–motor tasks (Pfurtscheller and Lopes da Silva, 1999; Ward, 2003; Altermaller et al., 2005).

<sup>1</sup>Available at https://github.com/LuzAlondra/BrainComputerInterfaces/tree/master/MI-based\_BCIsystem

Bearing in mind the criteria of standard and all-embracing methods, we can now briefly describe the feature extraction based on BP. The MP/MI signals were first filtered through Butterworth band-pass filters of order 7, with cut-off frequencies defined by the afore-stated bands. Afterwards, the signals were squared per sample and segmented by using time windows of 500 ms length with 50% overlapping rate. Finally, the resulting time segments (herein denoted by δ*n*) were averaged and logarithmically transformed (refer to **Figure 3**), obtaining nine features per MP signal and 13 features per MI signal.
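The windowing step just described can be sketched as follows. This is an illustration rather than the authors' code: the input is assumed to be a signal that has already been band-pass filtered, the natural logarithm is an assumption (the text does not state the base), and the function name is hypothetical.

```python
import math


def band_power_features(filtered, fs_hz=256, win_ms=500, overlap=0.5):
    """Log band power in sliding windows of an already band-pass-filtered signal."""
    win = int(round(win_ms * fs_hz / 1000))   # 128 samples for 500 ms at 256 Hz
    step = int(round(win * (1 - overlap)))    # 64 samples for 50% overlap
    power = [x * x for x in filtered]         # squaring per sample
    features = []
    for start in range(0, len(power) - win + 1, step):
        mean_power = sum(power[start:start + win]) / win   # averaging
        features.append(math.log(mean_power))              # log transform
    return features


# Usage: a 10 Hz unit-amplitude sine stands in for a filtered EEG trace
mp_signal = [math.sin(2 * math.pi * 10 * n / 256) for n in range(640)]  # 2500 ms
mi_signal = [math.sin(2 * math.pi * 10 * n / 256) for n in range(896)]  # 3500 ms
mp_features = band_power_features(mp_signal)
mi_features = band_power_features(mi_signal)
print(len(mp_features), len(mi_features))
```

With 500 ms windows and 50% overlap, a 2500 ms trace yields nine windows and a 3500 ms trace yields 13, matching the feature counts per signal given above.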

By the standard method, there were 15 channels and 4 frequency bands under consideration. In addition, the time segments overlapping the first second post-cue [δ1 (0–500 ms), δ2 (250–750 ms), δ3 (500–1000 ms), and δ4 (750–1250 ms)] were discarded, leaving five time windows for MP (δ5 to δ9) and nine for MI (δ5 to δ13). Hence, vectors of 300 features (15 × 4 × 5) for MP and vectors of 540 features (15 × 4 × 9) for MI were obtained. By the all-embracing method, vectors of 3843 features (61 × 7 × 9) for MP and vectors of 5551 features (61 × 7 × 13) for MI were similarly obtained.
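The vector sizes follow from multiplying channels, bands, and time windows. The short check below assumes that the standard method retained five windows for MP and nine for MI (δ5 onward, as in the description of Figure 7), which is the only combination consistent with the reported 300 and 540 features; the helper name is hypothetical.

```python
def n_features(n_channels, n_bands, n_windows):
    """Size of a feature vector: one BP estimate per channel, band, and window."""
    return n_channels * n_bands * n_windows


standard = {"MP": n_features(15, 4, 5), "MI": n_features(15, 4, 9)}
all_embracing = {"MP": n_features(61, 7, 9), "MI": n_features(61, 7, 13)}
print(standard, all_embracing)
```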

#### Feature Selection and Classification

After the feature extraction, there were 24 types of feature vectors that proceeded from three SMs, two control tasks, two hands, and two methods. These feature vectors were grouped by merging left and right MIs. Having obtained 12 different cases of study (**Figure 3**), Davies–Bouldin indexes (DBIs) were determined in each case to sort the corresponding features in increasing order (Sepulveda et al., 2004; Kovács et al., 2005). DBI is a method for measuring the linear separability among *m* classes (Eq. 1). This metric is based on comparing the similarity (*R*) among classes. Such similarity is determined by the class dispersion (*s*) and the distance (*d*) between centroids (Eq. 2). The class dispersion is the average distance between every element (τ) in the class and the centroid of the class (*v*). See Eq. 3. Thereby, the features within each vector were ranked from the most to the least suitable feature in terms of linear separability between two classes: left and right (Kovács et al., 2005). Note that smaller DBIs correspond to greater linear separability.

$$\text{DBI} = \frac{1}{m} \sum_{i=1}^{m} R_i \tag{1}$$

where

$$R_i = \max_{j = 1,\ldots,m;\; j \neq i} R_{ij}, \qquad i = 1,\ldots,m$$

$$R_{ij} = \frac{s_i + s_j}{d_{ij}} = \frac{s_i + s_j}{d\left(v_i, v_j\right)} \tag{2}$$

$$s_i = \frac{1}{T_i} \sum_{\tau} d\left(\tau, v_i\right) \tag{3}$$

where *T*<sub>*i*</sub> is the number of features in class *i*.
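Equations 1–3 can be illustrated with a minimal sketch that scores each feature separately and ranks the features by increasing DBI. Several points are assumptions rather than details from the study: each feature is treated as a one-dimensional variable, the distance *d* is the absolute difference, and the function names are hypothetical. The sketch also does not guard against classes with identical centroids, for which the ratio is undefined.

```python
def davies_bouldin_index(classes):
    """DBI for one scalar feature; `classes` holds one list of values per class."""
    centroids = [sum(c) / len(c) for c in classes]
    # Eq. 3: mean distance between each element and its class centroid
    dispersions = [
        sum(abs(x - v) for x in c) / len(c) for c, v in zip(classes, centroids)
    ]
    m = len(classes)
    total = 0.0
    for i in range(m):
        # Eq. 2, maximized over all other classes j
        total += max(
            (dispersions[i] + dispersions[j]) / abs(centroids[i] - centroids[j])
            for j in range(m)
            if j != i
        )
    return total / m  # Eq. 1


def rank_features(left, right):
    """Rank feature indices from most to least separable (smallest DBI first).

    left, right: lists of trials, each trial being a list of feature values.
    """
    n = len(left[0])
    dbi = [
        davies_bouldin_index([[t[f] for t in left], [t[f] for t in right]])
        for f in range(n)
    ]
    return sorted(range(n), key=dbi.__getitem__), dbi


# Usage: feature 1 separates the two classes far better than feature 0
left_trials = [[0.0, 0.0], [4.0, 2.0]]
right_trials = [[1.0, 10.0], [5.0, 12.0]]
order, dbi = rank_features(left_trials, right_trials)
print(order, dbi)
```

In this toy case, feature 0 has overlapping classes (DBI of 4.0), whereas feature 1 has well-separated classes (DBI of 0.2), so feature 1 is ranked first.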

After ranking the features, a classification process took place in order to select the appropriate number of features that best discriminated between left and right. With two classes and κ denoting the total number of features in each vector, κ classifications were run for each case of study (**Figure 4**). From the κ resulting CAs, the feature vector yielding the maximum performance was selected from each case of study. Thereby, we obtained 12 feature vectors for every participant. They were called the highest quality feature vectors (HQFVs). Note that the term "maximum performance" refers to 1.5 times the interquartile range plus the upper quartile of the general distribution of all the CAs obtained at the end of the process. As a result, any peak value that was beyond the 99% of the distribution was discarded.
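The "maximum performance" criterion can be sketched as an upper fence (upper quartile plus 1.5 times the interquartile range) applied to a list of CAs. Several choices here are assumptions: the quartile convention (Python's "inclusive" method), applying the fence to a single list rather than to the pooled distribution of all CAs, the tie-breaking rule favoring fewer features, and reading the *k*-th CA as the result obtained with the *k* top-ranked features.

```python
import statistics


def upper_fence(accuracies):
    """Upper quartile plus 1.5 times the interquartile range."""
    q1, _, q3 = statistics.quantiles(accuracies, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def select_best(accuracies):
    """Return (k, CA) for the highest CA not exceeding the upper fence.

    accuracies[k - 1] is taken to be the CA obtained with k features (assumption).
    Ties are broken in favor of the smaller k (assumption).
    """
    fence = upper_fence(accuracies)
    candidates = [
        (ca, -k) for k, ca in enumerate(accuracies, start=1) if ca <= fence
    ]
    ca, neg_k = max(candidates)
    return -neg_k, ca


# Usage: the isolated peak of 95 lies beyond the fence and is discarded
cas = [50, 52, 54, 56, 58, 60, 62, 64, 95]
print(upper_fence(cas), select_best(cas))
```

Here the quartiles are 54 and 62, so the fence is 74; the outlying value of 95 is ignored and the vector with eight features (CA of 64) is retained.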

Every classification process<sup>2</sup> was based on Fisher discriminant analysis (FDA) and consisted of two phases: training and testing. The 100 available trials per cluster were distributed half and half; that is, 50 trials (session 1) for training and 50 trials (session 2) for testing. The classifier was trained via 10-fold cross-validation. That is, 50 training trials were split into a training set and a validation set. Through the validation set, the model was optimized by adjusting the regularization term that generally avoids overfitting problems due to the large number of features in use (Bishop, 2006). Once the classifier had been trained, this was tested by the rest of the trials and the percentage of the total number of correct predictions was estimated. The resulting CA in the testing phase was the parameter for acquiring the HQFVs. See **Figure 4**.
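The training and testing scheme can be sketched with NumPy as follows. This is not the classifier used in the study: the ridge-style regularization of the within-class covariance, the candidate regularization values, the fold assignment, and the synthetic data are all assumptions made for illustration, and the feature normalization step is omitted.

```python
import numpy as np


def fit_fda(X, y, reg):
    """Two-class Fisher discriminant with a ridge-style regularization term."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    Sw = Sw / (len(X) - 2) + reg * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, mu1 - mu0)   # projection direction
    b = -0.5 * w @ (mu0 + mu1)           # threshold midway between class means
    return w, b


def accuracy(model, X, y):
    w, b = model
    return float(np.mean((X @ w + b > 0).astype(int) == y))


def choose_reg(X, y, regs, n_folds=10, seed=0):
    """Pick the regularization value with the best mean validation accuracy."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)

    def cv_score(reg):
        scores = []
        for k in range(n_folds):
            val = folds[k]
            trn = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            scores.append(accuracy(fit_fda(X[trn], y[trn], reg), X[val], y[val]))
        return np.mean(scores)

    return max(regs, key=cv_score)


# Synthetic stand-in for two sessions of 50 trials (25 per class, 5 features)
rng = np.random.default_rng(1)


def make_session(n_per_class=25, n_features=5):
    X = rng.normal(size=(2 * n_per_class, n_features))
    y = np.repeat([0, 1], n_per_class)
    X[y == 1, 0] += 4.0  # only the first feature separates the classes
    return X, y


X_train, y_train = make_session()   # plays the role of session 1
X_test, y_test = make_session()     # plays the role of session 2
reg = choose_reg(X_train, y_train, regs=[0.01, 0.1, 1.0])
test_ca = accuracy(fit_fda(X_train, y_train, reg), X_test, y_test)
print(reg, test_ca)
```

The regularization value is chosen only from the session-1 trials, so the session-2 accuracy remains an independent estimate, which mirrors the split described above.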

In this study, all statistical analyses were performed using the non-parametric method Kruskal–Wallis one-way ANOVA, and significance levels were set at 5%.
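For reference, the Kruskal–Wallis *H* statistic can be computed with the standard library alone. The sketch below is illustrative: it uses average ranks without the tie correction, and the *p*-value helper applies only to comparisons of three groups (two degrees of freedom, as when comparing the three SMs), where the chi-square survival function reduces to exp(−*H*/2). The function names and the sample values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic without tie correction."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum * rank_sum / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * h - 3 * (n + 1)


def p_value_three_groups(h):
    """Chi-square approximation for exactly three groups (2 degrees of freedom)."""
    return math.exp(-h / 2)


# Usage with made-up values for three groups
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
p = p_value_three_groups(h)
print(h, p)
```

For small samples the chi-square approximation is rough, so a statistics package with exact or tie-corrected tests would be preferable in practice.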

#### Statistical Evaluation of the HQFVs

The features of the HQFVs proceeded from specific recording sites (e.g., C3, Cz, or C4), frequency bands (e.g., αL, αU, βL, or βU) and time windows (δn). The origin of a feature in any of these three domains (location, frequency, and time) was referred to as the *feature source*. On this basis, the HQFVs were statistically evaluated in accordance with those three domains and under two parameters: index of dispersion (ID) and *mode*.

The ID was calculated by using Eq. 4 and was an approach to quantify how spread a HQFV was over the feature sources in each domain. In Eq. 4, *k* is the number of feature sources in the domain of interest, *fi* is the number of occurrences of each feature source, and *N* is the total number of features in the HQFV under analysis (Norman and Streiner, 2008). Note that ID is 0 when all the features fall into one feature source. By contrast, it is 1 when the features are equally divided among the *k* feature sources.

$$\text{ID} = \frac{k\left(N^2 - \sum f_i^2\right)}{N^2\left(k - 1\right)}\tag{4}$$
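Equation 4 can be illustrated with a short sketch. The function name and the example feature sources are hypothetical; *k* is passed explicitly because some feature sources may not occur at all in a given HQFV.

```python
from collections import Counter


def index_of_dispersion(sources, k):
    """Eq. 4: `sources` lists the feature source of every feature in an HQFV;
    `k` is the number of possible feature sources in the domain."""
    counts = Counter(sources)
    n = len(sources)
    sum_sq = sum(f * f for f in counts.values())
    return k * (n * n - sum_sq) / (n * n * (k - 1))


# Usage in the frequency domain of the standard method (k = 4 bands)
concentrated = index_of_dispersion(["aU"] * 8, k=4)
even = index_of_dispersion(["aL", "aU", "bL", "bU"] * 2, k=4)
mixed = index_of_dispersion(["aU"] * 6 + ["bL"] * 2, k=4)
print(concentrated, even, mixed)
```

The two extreme cases reproduce the bounds stated above: 0 when all features share one source and 1 when they are evenly divided among the *k* sources.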

The *mode* was the central tendency of a HQFV, i.e., the most frequently occurring feature source in the domain at hand. Having gathered the *modes* of all the HQFVs, these were graphically represented via a 2D-histogram (*modal* distribution) for each domain. In every 2D-histogram, the number of occurrences of each *mode* (*fmode*) was normalized by dividing *fmode* by *N*.
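The *mode* and its normalized frequency can be obtained as in the following sketch. The function name and the example recording sites are hypothetical, and when two sources tie, the one encountered first is returned, which is an implementation choice rather than a rule from the study.

```python
from collections import Counter


def mode_and_relative_frequency(sources):
    """Most frequent feature source of an HQFV and its share of the N features."""
    counts = Counter(sources)
    mode, f_mode = counts.most_common(1)[0]
    return mode, f_mode / len(sources)


# Usage in the location domain
hqfv_sites = ["C3", "C3", "C4", "Cz"]
print(mode_and_relative_frequency(hqfv_sites))
```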

<sup>2</sup>Note that, before the classification process, the features were normalized using the normalization and standardization methods of *mlpy* (a high-performance Python package for predictive modelling, available at http://mlpy.sourceforge.net/), to prevent BP estimates in greater numeric ranges from dominating those in smaller numeric ranges.

# RESULTS

#### Classification Accuracy of the HQFVs

The CAs reached by the HQFVs are arranged in **Figure 5**. This figure indicates that there is no significant difference of CAs among SMs (*p* = 0.935). The figure also indicates that there is a significant increase of CAs (*p* = 1.11 × 10<sup>−16</sup>) between standard and all-embracing methods for both control tasks (MP and MI) and the three SMs (audio, visual, and bimodal). Finally, the figure shows that CAs (*p* = 0.707) between MP and MI are comparable for the three SMs and the two methods in use.

#### Index of Dispersion of the HQFVs in Location, Frequency, and Time

The IDs of the HQFVs, which were obtained from the standard method, are presented in **Figures 6A–C**. In location and time, the HQFVs are generally spread over all the feature sources showing IDs above 0.52 and 0.7, respectively. By contrast, IDs range between 0 and 1 in frequency. The IDs of the HQFVs, which resulted from the all-embracing method, are provided in **Figures 6D–F**. These IDs are above 0.85, 0.5, and 0.87 in location, frequency, and time, respectively. The statistical comparison of the IDs between both methods in location, frequency, and time resulted in the following *p*-values: 1.461 × 10<sup>−9</sup>, 0.049, and 4.767 × 10<sup>−7</sup>. Note that all the remarks mentioned in this section apply to MP and MI.

#### Modal Distribution of the HQFVs in Location, Frequency, and Time

#### Standard Method

**Figure 7** presents the *modal* distribution of the HQFVs over the following feature sources: (a) 15 recording sites, (b) 4 frequency bands, and (c) 5/9 time windows for MP/MI. With regard to the location domain, **Figure 7A** shows that *modes* from audio cues mainly tend toward FC3 and C3, while those from visual cues mostly tend to FC2, FC4 (only applicable for MI), and C4. *Modes* from bimodal cues are essentially distributed among FC3, C3, FC4, and C4. In all the cases, MI displays greater tendencies than MP.

We can see from **Figure 7B** that the overriding band for the three SMs is αU. In this case, the highest and the lowest tendencies are reached by *modes* from visual and bimodal cues, respectively. The MI control task shows a second dominant band. Such dominant band for *modes* from audio and visual cues is βL, while that for *modes* from bimodal cues is βU. The MP control task only shows a second dominant band for *modes* from bimodal cues, which is βL. As in the location domain, MI reveals stronger tendencies in comparison with MP.

Lastly, **Figure 7C** provides the *modal* distribution in time. Keeping in mind that MP only involved five time windows (from δ5 to δ9), we can see that *modes* from the three SMs are evenly distributed along most of them. Although MI involved the nine time windows, the *modes* from the three SMs are mostly distributed across δ5 and δ9 as well. In both cases, the major *modal* tendencies for audio, visual, and bimodal cues are, respectively, the following: δ7/δ8 (1500–2000 ms), δ6/δ8/δ9 (1250–2500 ms), and δ5/δ9 (1000–1500 ms and 2000–2500 ms). There is additionally a relevant *modal* distribution over δ13 for the three SMs in MI, regardless of the decreasing trend of the foregoing time windows.

#### All-Embracing Method

**Figure 8** provides the *modal* distribution of the HQFVs over the following feature sources: (a) 61 recording sites, (b) 7 frequency bands, and (c) 9/13 time windows for MP/MI. With respect to the location domain, **Figure 8A** indicates that *modes* from the three SMs are distributed over about 40% of the feature sources in both control tasks. Specifically, *modes* from audio cues are distributed among 24 of the 61 recording sites. From those, 62% are on central areas, 25% are on parieto-occipital areas, and 13% are on frontal areas. *Modes* from visual cues are also distributed among 24 of the 61 recording sites. However, those are differently spread. Over half of them are distributed between frontal and parieto-occipital areas (33% and 21%, respectively), while less than half of them are related to central areas (46%). *Modes* from bimodal cues are distributed among 27 of the 61 recording sites. From those, 56% are on central areas, 37% are on parieto-occipital areas, and 7% are on frontal areas.

**Figure 8B** illustrates the prevalence of αU band in the *modes* from the three SMs in both MP and MI. The figure also reveals the secondary but not insignificant role of βL band. The *modal* distribution between θL and αL bands is moderate for the three SMs, whereas that between θU and βU bands is negligible for the three SMs. Furthermore, the *modal* distribution over γ band is considerable for bimodal cues. In all these cases, MI shows much higher tendencies than MP.

Last but not least, **Figure 8C** depicts the *modal* distribution in time, considering that MP only involved 9 of the 13 time windows. It can be seen from this figure that the *modes* from audio cues are spread across δ1 and δ9 (0–2500 ms), while those from visual and bimodal cues are spread across δ1 and δ7 (0–2000 ms). Particularly for MI control tasks, the *modes* from audio, visual, and bimodal cues strongly tend toward δ3 (500–1000 ms), δ4 (750–1250 ms), and δ2/δ3 (250–1000 ms), respectively. In addition, there is an unexpected *modal* tendency to δ13 for the three SMs in MI.

# DISCUSSION

This paper set out with the aim of analyzing the cue effects on the discriminability of MP and MI control tasks. The analysis was carried out using two methods: standard and all-embracing. The standard method was based on feature sources, where the modulation of brain signals due to MP/MI is typically detected. For the all-embracing method, the scope of the standard method was extended by including a wider variety of feature sources, where not only motor activity is reflected. The analysis was limited to the HQFVs, i.e., the feature vectors that yielded the highest CAs during a DBI-FDA process. The following is a discussion of the most relevant results of the analysis.


#### Classification Accuracy of the HQFVs

On the question of improving CA by using different cues, we found that there was no significant difference in the discrimination of MI-related control tasks triggered by three SMs: audio, visual, and bimodal. However, there was a significant increase in the CA of control tasks analyzed under the all-embracing method over those analyzed using the standard method. These findings demonstrated that an unbiased approach in location, frequency, and time leads to a better performance, but different cues do not make a difference. Although Scheel et al. (2015) suggested that different stimuli might improve the CA of the control tasks at hand, this study has been unable to demonstrate that. Nevertheless, more distinguishable EEG patterns were indeed extracted from non-MI-related sources. Possibly, other types of stimuli could improve the differentiation of MI-related control tasks. In accordance with the findings of Pfurtscheller and Neuper (2001) and Obermaier et al. (2003), control tasks are the result of conscious and unconscious processes. As different stimuli may evoke different unconscious processes, more differentiable EEG patterns could be found. However, this needs further investigation.

Another important result was the analogous performance of MP and MI for the three SMs and the two methods. It is well established that both control tasks generate similar event-related oscillations, but the "no-go" signal accompanying MI is frequently overlooked (Krepki et al., 2007). An imaginary movement activates motor areas of the brain almost to the same extent as a real one, except for the visible contractions. This means that the neural commands for muscular contractions are blocked at some level of the motor system by an active inhibitory mechanism. This raises the question of whether MI in motor-disabled people takes place as it does in healthy ones, or whether it is rather a real movement process (Jeannerod, 2006). In addition, the use of MI as a control task involves the development of an electromyographic detector so as to eliminate undesirable muscular activity unconsciously or consciously triggered by healthy BCI users. Based on these two factors and given that both control tasks achieved analogous performances, MP may be a better option for BCI systems.

#### Index of Dispersion of the HQFVs in Location, Frequency, and Time

For both methods, the distribution of the HQFVs over the available feature sources was much more even in location and time than in frequency. This indicates that the most gainful features for discriminating between left and right MIs mainly proceeded from the entire set of recording sites and the total duration of the control task, but only from a specific frequency band (αU). The finding is in agreement with that of other studies (Pfurtscheller and Neuper, 2001; Neuper et al., 2009) which showed that the correct discrimination between left and right started 250–500 ms after cue onset and where the most discriminating frequency band was the αU. With reference to the location domain, although this finding differs from some published studies (Ramoser et al., 2000; Leeb et al., 2006), it is consistent with those of Meckes et al. (2004) and Sepulveda et al. (2004). They suggested giving attention to non-motor locations, even when the mental task of interest was movement related.

The current study also found that the HQFVs tended to be more widely spread over the feature sources in the all-embracing method than in the standard method. It is worth mentioning that the inclusion of more feature sources increased the diversity of HQFVs. This result corroborates the ideas of Pfurtscheller and Neuper (2001) and Obermaier et al. (2003), who suggested that control tasks are the result of the mental effort demanded by the control task (conscious process) and the sensory–cognitive processing of the cue (unconscious process).

#### Modal Distribution of the HQFVs in Location, Frequency, and Time

The *modes* of the HQFVs fundamentally tended toward the expected sources (Pineda, 2005). These were the C3/C4 recording sites and the αU/βL frequency bands. The *modes* also revealed clear tendencies toward feature sources that reflected the nature of the cue in use. Before going on to discuss this further, it is necessary to mention that the *modal* tendencies were much greater in MI than in MP. The reason for this is not clear, but it may be due to the mental effort involved in each control task. MP is merely an intention, whereas MI is a dynamic process that goes through many of the central phases of actual movements.

#### Standard Method

The *modes* from audio cues tended to the left hemisphere, where some language-related functions take place, whereas those from visual cues tended to the right hemisphere, where visual perception is processed (Kropotov, 2010). As the bimodal cue was a composition of audio and visual cues, the corresponding *modal* tendency was toward both hemispheres. This result suggests that the most discriminating features were defined not only by the MP/MI mechanisms but also by the sensory–cognitive processing of the cue in use. With respect to the frequency domain, the involvement of high frequency bands gained importance successively in *modes* from audio, visual, and bimodal cues. This result may be related to previous work of Giannitrapani [whose work is cited in Kropotov (2010)], who found that high beta activity (21–33 Hz) increased when the stimulus structure complexity also increased. It is possible, therefore, that the cue complexity played a significant role in the discrimination process of features as well. Regarding the time domain, the highest tendencies took more time (after the cue onset) to appear in *modes* from audio than from bimodal cues. Hence, it is also possible to hypothesize that more informative features were found earlier when a more direct cue was employed.

#### All-Embracing Method

The *modes* from audio cues mostly tended to central recording sites, where auditory evoked potentials are typically recorded (Proverbio and Zani, 2003), and to the δ3 (500–1000 ms) time window, where brain rhythms normally respond to the recognition and/or retrieval of acoustic stimuli (Krause, 2006). On the other hand, the visual stimulation is registered around 200 ms post-stimulus as a response to modulations of alpha band rhythms over parieto-occipital areas and beta band rhythms over fronto-parieto-occipital areas (Kropotov, 2010; Andreassi, 2013). This may be a reason why *modes* from visual cues tended to fronto-parieto-occipital recording sites, the αU band, and the δ2 (250–750 ms) time window. Finally, the *modes* from bimodal cues displayed a well-balanced distribution between central and fronto-parieto-occipital recording sites and between δ2 (250–750 ms) and δ3 (500–1000 ms). This finding confirms that bimodal stimuli require feature sources that are also required by audio and visual stimuli separately (Isoğlu-Alkaç et al., 2007). In the frequency domain, the remarkable tendency of these *modes* toward the γ band accords with previous findings of Ward (2003), who found that the sensory decoding around 250 ms post-stimulus is reflected in modulation of γ band rhythms.

In the three SMs, one unanticipated finding was the minor role occupied by βU band that is well-known as one of the major contributors in the discrimination process of MI activity. A possible explanation for the small contribution of this band is that neural desynchronization around 20 Hz has been considered as a harmonic response of desynchronization around 10 Hz, whereas the one around 16 Hz is an authentic response to motor activity (Pfurtscheller et al., 1996). Moreover, Pfurtscheller et al. (1999) found that the most discriminating frequency components throughout MI-related tasks were found within the αU band in three of four subjects, while those were found within the βU band only in one subject.

Lastly, it is worth noting the underlying tendency of *modes* from visual and bimodal cues toward the δ9 (2000–2500 ms) time window in the standard method. There was also another clear tendency of *modes* from the three SMs toward the δ13 (3000–3500 ms) time window in both methods. For visual and bimodal stimulation, we believe that gaze fixation at the screen center provoked by cues "left"/"right" could have driven the participants to anticipate the upcoming cue "start." Similarly, the cue "start" appearance caused the cue "stop" expectation. For audio stimulation, once the participants had received the cue "start," and owing to the likeness between increasing and decreasing tones (cues "start" and "stop," respectively), the anticipation of the audio cue "stop" was likely to have arisen. This speculation is supported by the findings of Scherer et al. (2008), who found that the involuntary expectations for the approaching cues provoked false control commands during virtual navigation. Another interesting tendency of *modes* of the three SMs is toward δ2 (250–750 ms) and δ3 (500–1000 ms) time windows in the all-embracing method. These results are in agreement with the findings of Pfurtscheller et al. (2008), who showed that distinct short-lasting brain patterns appeared within a time window of about 500–750 ms after cue onset.

All these interpretations must, however, be taken with caution. More research on this topic needs to be undertaken because these findings can only be conclusive in early training sessions. The effects observed in this study could diminish or vanish, either in further training sessions or in online applications. Another source of uncertainty is the ambiguity of multivariate classifiers (such as FDA) in determining the brain regions, frequencies, and time intervals where cognitive processes are reflected. Haufe et al. (2014) demonstrated that backward models (e.g., multivariate classifiers) combine information from different channels to separate the brain patterns of different classes. These models may, however, give significant weight to channels unrelated to the brain processes of interest. By contrast, forward models (e.g., blind source separation) explain how the measured data are generated from the neural sources, providing a neurophysiological meaning of the outcomes. Furthermore, Haufe et al. showed that brain patterns were much smoother and covered more diverse cognitive-related areas when those patterns were obtained via forward methods. These findings are of particular interest due to the nature of our study. It seems that the present results are limited by the methods applied to select the feature vectors. Possibly, by transforming the backward model in use (DBI-FDA process) into a forward model as proposed by Haufe et al. (selection of brain patterns according to the neurophysiological contribution of each EEG channel), a clearer feature distribution over unrelated MI sources could have been achieved.
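
The distinction between backward-model weights and forward-model patterns can be illustrated with a toy two-channel example. For a single extracted component, the activation pattern is commonly computed as the covariance of each channel with the filter output, divided by the variance of that output. The sketch below assumes this single-component form; the signal values, the weights, and the function names are made up for illustration and are not taken from the present study.

```python
def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def activation_pattern(channels, weights):
    """Forward-model pattern for one backward-model filter.

    channels: list of per-channel sample lists; weights: one weight per channel.
    Returns cov(channel, filter output) / var(filter output) for each channel.
    """
    n_samples = len(channels[0])
    output = [sum(w * ch[t] for w, ch in zip(weights, channels))
              for t in range(n_samples)]
    output_variance = covariance(output, output)
    return [covariance(ch, output) / output_variance for ch in channels]


# Toy data (hypothetical): channel 1 records source + distractor,
# channel 2 records the distractor only.
source = [1.0, -1.0, 1.0, -1.0]
distractor = [1.0, 1.0, -1.0, -1.0]
channel_1 = [s + d for s, d in zip(source, distractor)]
channel_2 = list(distractor)

# A filter that recovers the source must subtract channel 2 from channel 1,
# so channel 2 gets a large weight although it carries no source activity.
weights = [1.0, -1.0]
pattern = activation_pattern([channel_1, channel_2], weights)
```

In this toy case the second channel has a weight of −1 but a pattern entry of zero, which is the sense in which classifier weights alone can point to channels unrelated to the process of interest.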

#### Implications on Neuroergonomics Research

This is a key issue for Neuroergonomics research because neural activity could be used not only to monitor the human mental state but also to control a system of interest. In fact, Myrden and Chau (2015) have suggested developing a BCI system on the basis of an overt adaptation that keeps the user's mental state within the optimal region, and a covert adaptation that automatically adjusts BCI parameters according to that mental state.

An important application may be in driver modeling and vehicle simulation environments (Xu et al., 2015). These two areas of research have been of interest for developing driver assistance systems for safer driving and intelligent transportation. For example, EEG signals of a driver can be used to model the driver's neuromuscular dynamics (Bi et al., 2015) and thus improve the performance of a driver simulator. Such EEG signals can also be employed to detect the fatigue (Wang et al., 2014) and level of attention (Wang et al., 2015) of the driver to activate the driver simulator and, hence, prevent driving accidents. Furthermore, the performance of the driver simulator can be improved by analyzing the human reaction to traffic cues such as car horns, direction indicators, and traffic lights. All of these cues produce specific EEG patterns in the driver's brain signals, as has been shown in this study.

#### CONCLUSION

The findings of this study have provided a new understanding of how MI-related control tasks used to control a BCI system may be modified by their preceding cues. Although previous investigations have studied cue effects on MI-related control tasks to some extent, in this study we have shown that the CA of those control tasks does not depend on the type of cue in use. Moreover, we found that the EEG patterns that best differentiate MI-related control tasks emerge from recording sites, frequency bands, and time windows well defined by the perception and cognition of the cue in use. An implication of this study is the possibility of obtaining different control commands that could be detected with the same accuracy. Since different cues trigger control tasks that yield similar CAs, and those control tasks produce EEG patterns differentiated by the cue nature, brain–computer communication could be accelerated by having a wider variety of detectable control commands. This is an important issue for Neuroergonomics research because neural activity could be used not only to monitor the human mental state but also to control the system of interest.

#### ACKNOWLEDGMENTS

LA-V wishes to express her sincere gratitude to the National Council of Science and Technology of Mexico (CONACyT) for providing funding support to accomplish this research project (grant number: 306714). The authors also thank Alexander Bedborough (language advisor of the International Academy at the University of Essex) for his editorial help with the manuscript.

#### REFERENCES

interfaces for continuous control of robots. *Neurophysiol. Clin.* 119, 2159–2169. doi:10.1016/j.clinph.2008.06.001


Wang, H., Zhang, C., Shi, T., Wang, F., and Ma, S. (2014). Real-time EEG-based detection of fatigue driving danger for accident prediction. *Int. J. Neural Syst.* 25, 1–14. doi:10.1080/00207721.2014.953798

Wang, Y. K., Jung, T. P., and Lin, C. T. (2015). EEG-based attention tracking during distracted driving. *IEEE Trans. Neural Syst. Rehabil. Eng*. 23(6), 1085–1094.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Alonso-Valerdi, Sepulveda and Ramírez-Mendoza. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The Brain Is Faster than the Hand in Split-Second Intentions to Respond to an Impending Hazard: A Simulation of Neuroadaptive Automation to Speed Recovery to Perturbation in Flight Attitude

Daniel E. Callan<sup>1,2</sup>\*, Cengiz Terzibas<sup>2</sup>, Daniel B. Cassel<sup>3</sup>, Masa-aki Sato<sup>4</sup> and Raja Parasuraman<sup>5</sup>

*<sup>1</sup> Center for Information and Neural Networks, National Institute of Information and Communications Technology, Osaka University, Osaka, Japan, <sup>2</sup> Multisensory Cognition and Computation Laboratory, Universal Communication Research Institute, National Institute of Information and Communications Technology, Kyoto, Japan, <sup>3</sup> Locomobi Inc., Toronto, ON, Canada, <sup>4</sup> Neural Information Analysis Laboratories, Advanced Telecommunications Research Institute, Kyoto, Japan, <sup>5</sup> Center of Excellence in Neuroergonomics, Technology, and Cognition, George Mason University, Fairfax, VA, USA*

#### Edited by:

*Thorsten O. Zander, Technical University of Berlin, Germany*

#### Reviewed by:

*Sebastien Helie, Purdue University, USA; Josef Faller, Graz University of Technology, Austria*

#### \*Correspondence:

*Daniel E. Callan dcallan@nict.go.jp*

#### In Memoriam:

*The present manuscript wishes to be the authors' personal honor to the memory of Raja Parasuraman, as a tribute to his scientific competence, and the contribution that he gave to our research projects in these years of proficient collaboration.*

> Received: *06 November 2015* Accepted: *12 April 2016* Published: *27 April 2016*

#### Citation:

*Callan DE, Terzibas C, Cassel DB, Sato M and Parasuraman R (2016) The Brain Is Faster than the Hand in Split-Second Intentions to Respond to an Impending Hazard: A Simulation of Neuroadaptive Automation to Speed Recovery to Perturbation in Flight Attitude. Front. Hum. Neurosci. 10:187. doi: 10.3389/fnhum.2016.00187*

The goal of this research is to test the potential for neuroadaptive automation to improve response speed to a hazardous event by using a brain-computer interface (BCI) to decode perceptual-motor intention. Seven participants underwent four experimental sessions while brain activity was measured with magnetoencephalography. The first three sessions were of a simple constrained task in which the participant was to pull back on the control stick to recover from a perturbation in attitude in one condition and to passively observe the perturbation in the other condition. The fourth session consisted of having to recover from a perturbation in attitude while piloting the plane through the Grand Canyon, constantly maneuvering to track over the river below. Independent component analysis was used on the first two sessions to extract artifacts and find an event-related component associated with the onset of the perturbation. These two sessions were used to train a decoder to classify trials in which the participant recovered from the perturbation (motor intention) vs. just passively viewing the perturbation. The BCI-decoder was tested on the third session of the same simple task and found to be able to significantly distinguish motor intention trials from passive viewing trials (mean = 69.8%). The same BCI-decoder was then used to test the fourth session on the complex task. The BCI-decoder significantly classified perturbation from no perturbation trials (73.3%) with a significant time savings of 72.3 ms (original response time of 425.0 ms vs. 352.7 ms for the BCI-decoder). The BCI-decoder model of the best subject was shown to generalize for both performance and time savings to the other subjects. The results of our off-line open loop simulation demonstrate that BCI-based neuroadaptive automation has the potential to decode motor intention faster than manual control in response to a hazardous perturbation in flight attitude while ignoring ongoing motor and visually induced activity related to piloting the airplane.

Keywords: neuroadaptive automation, brain computer interface, brain machine interface, neuroergonomics, decoding, independent component analysis, MEG, aviation

# INTRODUCTION

Safe and effective performance in many occupational settings is critically dependent on people making timely and correct split-second decisions to avoid an impending hazard. Consider a speeding driver having to swerve to avoid hitting a child running unexpectedly onto the roadway; a nurse having to administer defibrillation to a patient having sudden cardiac arrest; or a pilot having to execute a rapid maneuver to recover from a stall or other abrupt perturbation during high-speed flight. Although some drivers, nurses, and pilots may be sufficiently skilled to make quick decisions and avoid mishaps in these situations, there are many conditions (fatigue, stress, mind wandering, and task overload, to name a few) that can degrade human performance so that a correct and timely response is not possible and an accident may result.

One approach to this problem is to enhance human performance in such time-critical situations by decoding a person's neural activity associated with the intention to act. Once intention has been detected, one could provide appropriate feedback to the human operator or trigger computer aiding. Brain activity precedes motor action, so if neural signals associated with the intention to act could be successfully decoded in real time, one could use the decoded output to aid the human user. Using computer technology to augment human performance based on an assessment of human operator cognitive states is termed adaptive automation (Parasuraman et al., 1992; Scerbo, 2008; Parasuraman and Galster, 2013). Neuroadaptive automation is when neural signals are used to assess operator state, an approach that has been successful in mitigating human performance decrements in a variety of cognitive tasks (Byrne and Parasuraman, 1996; Prinzel et al., 2000; Scerbo et al., 2003; Wilson and Russell, 2007; Ting et al., 2010; Durantin et al., 2015; Gateau et al., 2015). Such an approach is consistent with the field of passive Brain Computer Interfaces (BCI), also referred to as Brain Machine Interfaces (BMI), in which user neural states are monitored in order to enhance human interaction with external devices (Blankertz et al., 2010; Lotte et al., 2013).

There is extensive research on the use of BCIs to support partially or fully disabled persons in controlling devices such as computers and prosthetic limbs (Reiner, 2008), and a smaller but growing literature on their use by healthy individuals to enhance human-system interaction (Zander and Kothe, 2011; Lotte et al., 2013). Comparatively little work has been conducted comparing the effects of neuroadaptive automation or passive BCIs to human performance in time-critical (split-second) decision-making situations [for related research see the studies by Haufe et al. (2011, 2014) and Kim et al. (2015) concerned with detection of braking intention by EEG]. In particular, when a critical event has to be detected and responded to quickly, can one decode the associated neural states of the human operator to achieve a faster response than the operator's manual action? We can rephrase the question as follows: given that the brain is faster than the hand (or foot or other effector), can one use decoded brain activity to respond to a critical hazardous event, and so solve the problem that human manual actions are sometimes too sluggish to avoid a mishap when very little time is available?

We addressed this issue in the present study by examining whether neural signals could be decoded to enhance human performance in a time-critical decision-making task. We chose a decision-making situation that is encountered in aviation tasks: responding quickly to an in-flight perturbation, such as turbulence, micro-bursts, severe windshear, or structural damage (e.g., from trim tab failure or bird strike). While such perturbations can occur in many types of flight, they can be a major contributor to mishaps in military aviation, given the greater exposure to risky situations requiring split-second decision-making, such as low-level flight over terrain or high G-force maneuvers (Knapp and Johnson, 1996; Moroze and Snow, 1999; Nakagawa et al., 2007). When flying at high speed and very close to terrain, a savings of even a few milliseconds in responding to a perturbation can represent the difference between life and death (Haber and Haber, 2003).

Decoding neural states corresponding to cognitive states has been the object of considerable attention in the neuroimaging literature. A major approach to the problem has been to apply pattern-classification algorithms to multi-voxel functional MRI data in order to decode information representation in a participant's brain (Kamitani and Tong, 2005, 2006; Norman et al., 2006; Nishimoto et al., 2011; Poldrack, 2011; Shibata et al., 2012; Callan et al., 2014; Christophel and Haynes, 2014; Hutzler, 2014). However, the relatively low temporal resolution of fMRI and other neuroimaging methods based on cerebral hemodynamics renders them unsuitable for decoding neural states associated with split-second decision-making. Instead, electroencephalography (EEG) and magnetoencephalography (MEG) provide sufficient temporal resolution to decode neural states associated with rapid decision-making. In the present study we used MEG as our primary source of neural activity, but also conducted an fMRI study to allow for better localization of MEG activity to brain areas.

A number of studies have applied pattern classification methods to neural signals in order to decode specific cognitive states. Typically these approaches train the classifier on part (e.g., half) of the neuroimaging data obtained during performance of a cognitive task and then evaluate the effectiveness of the classifier on the remaining (untrained) half of the data (Garrett et al., 2003; Wilson and Russell, 2007; Baldwin and Penaranda, 2012). This is certainly an accepted criterion for evaluating how well a particular decoding algorithm works in a particular domain of human performance. But a stricter test is necessary if such neural BCI-decoders are to be useful in a general way. The more stringent test would involve application of a trained classifier to untrained data taken from a different task in the same general cognitive domain. Such a strategy, if successful, can provide a more generalizable test of the efficacy of neural state decoding for a given cognitive function. We used this approach in the present study by training an MEG classifier during performance of a simple flight task involving a perturbation and applying it to a more complex flight task involving similar types of perturbations during ongoing piloting. It is important to note that the ongoing piloting task uses the same control stick (controlled by articulation of the hand, wrist, and arm) as that needed to recover from the perturbation. It is therefore necessary for the BCI-decoder to be able to distinguish brain activity related to visual-field changes and motor intention that result from piloting from brain activity related to visual-field changes and motor intention that arise from a perturbation (even though the BCI-decoder has not been specifically trained to do so). 
We additionally evaluated the generalizability of a trained model across participants by using the weights of the model of the best participant to test performance over the trials of the complex flying task of the remaining participants.
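
A minimal sketch of this evaluation logic follows: fit a classifier on one task, then score it both on held-out data from the same task and on data from a different task. It uses a nearest-centroid rule as a simple stand-in (it is not the decoder used in this study), and all feature values and labels are toy numbers invented for illustration.

```python
def centroid(rows):
    """Mean feature vector of a list of equally long feature vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def fit_nearest_centroid(features, labels):
    """Return {label: centroid} computed from the training trials."""
    model = {}
    for label in set(labels):
        rows = [x for x, y in zip(features, labels) if y == label]
        model[label] = centroid(rows)
    return model


def predict(model, x):
    """Assign x to the label with the closest centroid (squared Euclidean distance)."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(model, key=lambda label: squared_distance(model[label]))


def accuracy(model, features, labels):
    """Fraction of trials whose predicted label matches the true label."""
    hits = sum(predict(model, x) == y for x, y in zip(features, labels))
    return hits / len(labels)


# Toy trials (hypothetical): label 1 = motor intention, label 0 = no intention.
train_x = [[1.0, 1.2], [0.8, 1.0], [1.2, 0.9],
           [-1.0, -0.8], [-0.9, -1.1], [-1.2, -1.0]]
train_y = [1, 1, 1, 0, 0, 0]

same_task_x = [[0.9, 1.1], [-1.1, -0.9]]          # held-out trials, same task
same_task_y = [1, 0]

transfer_x = [[0.4, 0.7], [-0.3, -0.6], [0.6, 0.2], [-0.5, 0.1]]  # different task
transfer_y = [1, 0, 1, 1]

model = fit_nearest_centroid(train_x, train_y)
same_task_accuracy = accuracy(model, same_task_x, same_task_y)
transfer_accuracy = accuracy(model, transfer_x, transfer_y)
```

The point of scoring the two sets separately is that a model can do well on held-out data from its training task and still lose accuracy when the task changes, which is what the stricter test is designed to reveal.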

Several MEG and EEG studies have identified neural correlates of visual and motor responses that suggest our goal of predicting motor intention to a visually presented hazard prior to actual movement is possible. Single trial response times to visual coherent motion onsets were predicted by MEG activity from 150 to 250 ms before the manual response of the observer (Amano et al., 2006). While the focus of the Amano et al. (2006) study is on the onset of visual perception, not on motor intention, it does provide a potential link between response time and the identification of the perceptual event. In a study investigating neural correlates of speeded motor responses to a visual stimulus it was found that larger low-theta complexes in EEG preceded more rapid button presses (Delorme et al., 2007). It has also been found that self-paced motor intention of reaching direction can be successfully decoded prior to movement onset (62.5 ms with 76% classification performance) using slow cortical potentials (0.1–1 Hz) recorded by EEG (Lew et al., 2014). In addition, research conducted on the detection of braking intention in simulated (Haufe et al., 2011; Kim et al., 2015) and real (Haufe et al., 2014) driving using EEG was able to make predictions about 130 ms earlier than the corresponding behavioral responses. The real-world task set out in our experiment to be able to predict motor intention to a visual hazard in the presence of complex ongoing motor control and a dynamically changing visual field goes beyond what was investigated in these previous studies. Nevertheless, we do believe that these studies taken together suggest that there may potentially be some features present in the MEG (and EEG) signal that can be decoded prior to movement onset in response to a visually presented hazard even under the robust conditions set out in our experiment.

There have been previous studies (Blankertz et al., 2002; Parra et al., 2003) using online BCI to detect error-related potentials to reduce error-rate and improve overall performance. While these methods are promising they utilize data that occurs after the response is made and are thus not applicable to our objective of detecting motor intention prior to movement. It is the goal of our study to utilize an off-line BCI-decoder to evaluate the feasibility of using real-time neuroadaptive automation to enhance piloting performance by reducing response time to recover from an impending hazard (see **Figure 1**).

# MATERIALS AND METHODS

## Participants

A total of seven right-handed adults participated in this study. Five (three females and two males) were glider pilots from local university clubs. The two participants (males) who were not pilots had considerable experience with driving- or flying-related video games. The age of the participants ranged from 19 to 40 years with a mean of 23.9 years and SE = 2.7 years. All participants gave written informed consent for experimental procedures approved by the ATR Human Subject Review Committee in accordance with the principles expressed in the Declaration of Helsinki.

## Experimental Tasks

Two different tasks were used: a simple piloting task of level flight over the ocean and a more complex piloting task through the Grand Canyon. We used the first task to develop a method for decoding neural states associated with the response to a perturbation and the second task to investigate the generalizability of the method to a related but more complex situation. In both tasks the participant was given a first-person unobstructed view from the airplane (the view was as if from a camera in the front of the aircraft, see **Figures 2A–D**). The aircraft model simulated was an F-22 Raptor using the X-Plane flight simulator (Version 9.75, Laminar Research). The data for various flight parameters (elevator, aileron, rudder deflections, pitch, roll, yaw, heading, speed, dive rate, structural g-forces, latitude, longitude, altitude, etc.) and the control stick (NATA Technologies, MRI and MEG compatible) deflections were collected at a mean sampling rate of 400 Hz using a UDP Matlab interface. The experimental conditions could be controlled via Matlab by using the UDP interface to give commands to the flight simulator.

#### Simple Piloting Task

This task had four conditions formed by crossing two factors: the presence or absence of a perturbation, and the participant's choice to either pilot the plane or passively watch the screen without moving the control stick [see Supplementary Videos 1–4; (1) fly\_perturbation; (2) fly\_noperturbation; (3) watch\_perturbation; (4) watch\_noperturbation; the participants viewed the 1st person perspective given on the left side of the video]. The primary task required the participant to pull back on the control stick (causing an upward elevator deflection that made the plane climb) as rapidly as possible in response to a perturbation in attitude (orientation of the plane with respect to the horizon) causing the plane to dive at a steep rate (see **Figures 2A,B**). The participant was instructed to hold the control stick but not to move it until after the perturbation occurred. The perturbation consisted of instantaneous maximum downward deflection of the elevator for 200 ms, causing the plane to enter a steep dive. The trial started with the plane flying at an altitude of 107 m above sea level at a speed of 1040 kph. The perturbation occurred on 67% of the trials at a randomly determined time between 2 and 4 s after the beginning of the trial. If the plane descended to 30 m above sea level, the simulator was paused before the plane crashed into the ocean. At the end of each trial the simulator was paused for 1.5–2.5 s (randomly determined). The timing was the same for trials in which there was no perturbation.

Before the beginning of each trial participants chose by button press whether they were going to pilot the plane or passively watch without moving the joystick. Participants were instructed to try to make about twice as many piloting trials as passive trials. The rationale for having participants select whether they were to fly or watch, rather than instructing them which condition to perform, was to better ensure that they were actually doing the task correctly. If given visually directed instructions, participants would often try to recover from the perturbation even when they were instructed to just watch. Allowing participants to choose whether to fly or watch helped to alleviate this problem. In this study, for the simple piloting task, only the trials containing a perturbation (fly\_perturbation and watch\_perturbation) were used to train the BCI-decoder. Please see the section under Decoding Pilot Intention below for the rationale. After the button was pressed there was a delay of 0.8–1.3 s (randomly determined) before the trial began. The passive trials were the same as the piloting trials with the exception that the plane would pause when it reached an altitude of 30 m above the ocean, which occurred at a mean time of 1.3 s after the onset of the perturbation.

**Figure 1** (caption, continued): perturbations to the airplane. This information can be used to reduce the occurrence of false alarms made by the system (executing upward elevator deflection when there is no actual perturbation or motor intention to recover). The aircraft computer can use up to 120 ms (time of the processing window for the BCI-decoder) to determine the presence of a non-pilot initiated perturbation in attitude without causing a loss in the time savings afforded by the neuroadaptive automation system. ICA, Independent Component Analysis; BCI, Brain Computer Interface; LSPC, Least Squares Probabilistic Classification; UDP, User Datagram Protocol.

**Figure 2** (caption, continued): panel for each task (B,D) shows a representative image of what the view may appear like during the perturbation. Notice that in the simple piloting task over the ocean (A,B) the bank angle is always level, whereas in the complex piloting task the bank angle is continuously changing based on the control stick inputs to maintain the goal of tracking the river (see Supplementary Videos 1–6).

There were 90 trials per session. On average there were 40 piloting perturbation trials, 20 piloting no-perturbation trials, 20 passive viewing perturbation trials, and 10 passive viewing no-perturbation trials. The actual number of trials for each condition was dependent on the participant's choice to pilot or passively view. The percentage of perturbation trials (67%) was experimentally determined and presented randomly within each of those conditions. Each session was ∼13 min. Bad trials (plane did not fly straight and level until time of perturbation) were removed from the analysis. Additionally, trials with response times slower than 700 ms from the onset of the perturbation were removed from the analysis.
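
The average trial counts above follow from two proportions given in the text: about two thirds of trials contained a perturbation, and participants aimed for about twice as many piloting as passive trials. The sketch below reproduces that arithmetic, assuming the two proportions combine independently, and applies the 700 ms exclusion rule to a few made-up response times. The constant and function names are illustrative only.

```python
from fractions import Fraction

TRIALS_PER_SESSION = 90
P_PERTURBATION = Fraction(2, 3)  # "67%" of trials in the text
P_PILOTING = Fraction(2, 3)      # about twice as many piloting as passive trials
MAX_RESPONSE_MS = 700            # slower responses were removed from the analysis


def expected_counts(n_trials, p_pilot, p_perturb):
    """Expected trials per condition if the two proportions are independent."""
    return {
        "fly_perturbation": int(n_trials * p_pilot * p_perturb),
        "fly_noperturbation": int(n_trials * p_pilot * (1 - p_perturb)),
        "watch_perturbation": int(n_trials * (1 - p_pilot) * p_perturb),
        "watch_noperturbation": int(n_trials * (1 - p_pilot) * (1 - p_perturb)),
    }


def keep_trial(response_ms):
    """A trial is kept if the response was not slower than the cutoff."""
    return response_ms <= MAX_RESPONSE_MS


counts = expected_counts(TRIALS_PER_SESSION, P_PILOTING, P_PERTURBATION)

# Hypothetical response times in ms, not data from the study.
response_times_ms = [310.0, 455.0, 699.0, 720.0]
kept_ms = [rt for rt in response_times_ms if keep_trial(rt)]
```

Because the actual split between piloting and passive trials depended on each participant's choices, the per-condition counts in a real session would deviate from these expected values.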

#### Complex Piloting Task

This task involved flying through the Grand Canyon and consisted of two conditions: perturbation (67% of trials) or no perturbation (33%) [see Supplementary Videos 5, 6; (5) Grand\_Canyon\_perturbation; (6) Grand\_Canyon\_noperturbation]. Unlike the simple flying task, the participant was always required to pilot the plane. There were no passive viewing conditions. In the complex task the participant was constantly required to move the elevator and ailerons of the plane with the control stick to track the river through the Grand Canyon. The perturbation was caused by an instantaneous maximum downward deflection of the elevator for 200 ms. Depending on the attitude (particularly the angle of bank–roll) of the plane, the perturbation would cause a rapid departure from the trajectory of flight toward the ground and/or one of the cliffs (see **Figures 2C,D**). The plane started each trial at approximately 30 m above ground level at a speed of 1135 kph. As in the simple task, the perturbation occurred between 2 and 4 s (randomly determined) after the beginning of the trial. There was also a pause for 1.5–2.5 s (randomly determined) at the end of each trial. Unlike the simple task, in which the participant specified by button press whether they were going to pilot the plane or passively watch, in the complex task every trial was a piloting trial. The instructions on the screen denoted that the participant could push the button when they were ready to begin the trial. After the button was pressed there was a delay of 0.8–1.3 s (randomly determined) before the trial began. Unlike the simple task, in the complex task the plane was allowed to crash into the ground or cliff. Upon a crash the system would pause the screen. There were 90 trials total in the complex piloting task. There were 60 perturbation trials and 30 no-perturbation trials. The order was randomly determined. Each session was approximately 14 min. 
Trials in which the plane crashed before the onset of the perturbation were removed from the analysis.

## Functional MRI

To develop a classifier of operator intention to undertake a rapid action to avoid a perturbation, our goal was to use a neuroimaging method with high temporal resolution, such as EEG or MEG. We used MEG in the current study, but in order to bolster our ability to localize MEG activity to intracortical sources, we also conducted an fMRI study of the same piloting tasks in order to establish seeds for conducting source localization analyses of MEG data. In the fMRI experiment participants underwent two sessions of the simple piloting task. Visual presentation of the flight simulation was projected by mirrors to a screen behind the head coil that could be viewed by the participant via a mirror mounted on the head coil. An fMRI-compatible control stick (NATA Technologies) was used by the right hand of the participant to control the elevator (back = pitch up; forward = pitch down) and aileron (roll left and right) deflections. Trigger timing of the fMRI scanning was directly read into one of the flight parameters of the flight simulator by means of a National Instruments Hi Speed USB NI USB-9162 BNC analog to digital converter.

A Siemens Verio 3T scanner was used to obtain functional T2<sup>∗</sup> weighted images with a gradient echo-planar imaging sequence (echo time 30 ms; repetition time 2500 ms; flip angle 80◦ ). A total of 40 interleaved axial slices were acquired with a 4 × 4 × 4 mm voxel resolution covering the cortex and cerebellum. A single run consisted of approximately 340 scans. (The number varied depending on the randomized time and how long the participant took to make a button response to start the trial). The first three scans were discarded. Structural T2 images, later used for normalization, were also collected using the same axial slices as the functional images with a 1 × 1 × 4 mm resolution. Images were preprocessed using SPM8 (Wellcome Department of Cognitive Neurology, UCL). Echo planar images EPI were unwarped and realigned. The T2 image was co-registered to the mean EPI image. The T2 images were acquired during the same fMRI session as the EPI images with the same slice thickness. Since the head was in approximately the same position it is thought that this will facilitate coregistration. The EPI images were then spatially normalized to MNI space (3 × 3 × 3 mm voxels) using a template T2 image and the coregistered T2 image as the source. Normalization was done using the T2 image rather than EPI because we believe it gives better results due to better spatial resolution. The images were smoothed using an 8 × 8 × 8 mm FWHM Gaussian kernel. Regional brain activity was assessed using a general linear model employing an event-related analysis in which the onset times were convolved with a hemodynamic response function. High pass filtering (cutoff period 128 s) was carried out to reduce the effects of extraneous variables (scanner drift, low frequency noise, etc.). Auto-regression was used to correct for serial correlations. 
The six movement parameters were used as regressors of non-interest in the analysis to account for head movement correlated with the experimental conditions. Anatomical T1 weighted images were acquired with a 1 × 1 × 1 mm voxel resolution for use in constructing source models for localizing brain activity recorded by MEG.

## MEG

In the MEG experiment participants underwent three sessions of the simple piloting task and one session of the complex piloting task. The first two sessions of the simple piloting task were used for training the decoding algorithm. The third session of the simple piloting task was used to evaluate the effectiveness of the trained algorithm in decoding neural states when participants perform the same task. As discussed previously, however, an effective classifier should be able to decode neural states not only on the task it was trained on but also on more complex versions of the task on which it has not been trained; that is, the classifier should achieve transfer. Accordingly, we also assessed the effectiveness of the classifier in decoding neural activity preceding detection and response to a perturbation in the complex piloting task. The flight simulation was projected via a mirror onto a screen above the participant's head. An fMRI-compatible control stick (NATA technologies) was operated with the participant's right hand to control the elevator (back = pitch up; forward = pitch down) and aileron (roll left and right) deflections. Trigger timing for the start of each trial and the start of the perturbation was registered by a photodiode placed on the screen. A small white square was constantly presented on the lower center part of the screen (out of the view of the participant); at the start of each trial and at the onset of the perturbation, the small square turned black for 20 ms. The light intensity change was detected by the photodiode and written directly to one of the extra channels on the MEG.

The data were recorded using a Yokogawa 400 channel MEG supine position system. Head movement was restrained by using a strap across the forehead. A sampling rate of 1000 Hz was used with an input gain of ×5 and an output gain of ×100. The trials were segmented from 1000 ms before to 1000 ms after the onset of the perturbation. For trials with no perturbation the timing of the virtual perturbation was given by the photodiode and used as the onset point for segmentation. The data were downsampled from 1000 to 250 Hz and filtered using a causal Butterworth online bandpass filter from 2 to 100 Hz. The only trials removed from the data were bad trials in which a machine failure in the flight simulator caused the plane to diverge from a straight and level course (for the simple piloting task) or in which the plane crashed before the onset of the perturbation (for the complex piloting task). Besides bandpass filtering there was no manual or automated artifact cleaning of the data prior to independent component analysis (ICA). Infomax ICA (EEGLAB, Delorme and Makeig, 2004) with principal component analysis (PCA) reduction to 64 components was conducted on the first two sessions of the simple piloting task (processing time was approximately 7 min). ICA has been shown to be well suited for separation of artifact and task related components (Delorme et al., 2007). The weights derived from the ICA were used to calculate component activation waveforms for the trials in sessions one and two. They were also used to calculate component activation waveforms for the trials in session three of the simple piloting task and the session of the complex piloting task.
There were two reasons that the weights from the first two sessions were used to calculate the activation waveforms of the later sessions: first, we did not want to bias the classification results of the BCI-decoder by including the test data of the later sessions in training; and second, we wanted to simulate the conditions required to run the BCI-decoder online in real time. The independent components showing evoked responses to the averaged perturbation piloting trials were considered for training of the BCI-decoders. Each participant had one evoked potential related component with an ICA spatial filter showing a prominent sink and source (see **Figures 3A,B**). All preprocessing steps described above were automated except for the selection of the independent components showing evoked responses, which was done by visual inspection of the averaged activation waveform and the ICA spatial filter. This step can also be automated if desired.
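The segmentation and fixed-weight projection described above can be illustrated with a minimal sketch. It assumes an unmixing matrix has already been estimated from the training sessions; the function names `segment_trial` and `apply_unmixing` are hypothetical and are not taken from EEGLAB or any other toolbox used in the study.

```python
from typing import List

Matrix = List[List[float]]  # rows = channels (or components), columns = samples


def segment_trial(data: Matrix, onset: int, pre: int, post: int) -> Matrix:
    """Cut samples [onset - pre, onset + post) from every channel.

    At 250 Hz, a window of 1000 ms before and after onset is pre = post = 250.
    """
    if onset - pre < 0 or onset + post > len(data[0]):
        raise ValueError("epoch extends beyond the recording")
    return [channel[onset - pre:onset + post] for channel in data]


def apply_unmixing(weights: Matrix, epoch: Matrix) -> Matrix:
    """Project a sensor-space epoch onto components: activations = W @ X.

    `weights` is assumed to come from ICA on the training sessions only,
    so later sessions never influence the spatial filters.
    """
    n_samples = len(epoch[0])
    return [
        [sum(w * channel[t] for w, channel in zip(row, epoch))
         for t in range(n_samples)]
        for row in weights
    ]
```

Reusing one fixed `weights` matrix for every later session mirrors the online setting, in which the spatial filters must exist before new data arrive.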

# MEG Source Localization Analysis

Source localization analysis involves the following steps: (1) Determining the position of the head (brain) within the MEG device, (2) Segmentation of the cortex of the brain, (3) Estimation of the leadfield model on the vertex points of the segmented cortex, and (4) Current source estimation on the cortex.

1. Five coils attached to the participant's head (one behind each ear, and three across the forehead) were used to determine the position of the head within the MEG. The positions of the five coils on the participant's head were measured by the Polhemus FastSCAN Cobra system. This system allows a 3D laser scan of the face, as well as the coordinate locations of the five markers, to be obtained. Matlab software (part of the VBMEG toolbox) was used to register the coordinate space of the 3D face image to the participant's anatomical T1 MRI structural image. Once the position of the five coils in reference to the MEG sensors is known, the position of these sensors can be registered in the coordinate space of the participant's T1 MRI structural image.


The VBMEG analysis used the fMRI t-values of the contrast of the perturbation piloting condition vs. the no perturbation piloting condition on the simple piloting task. The results of the SPM analysis for the contrast for each participant (using a threshold of p < 0.05 uncorrected, with a spatial extent of 50 voxels, and masking out activity in the cerebellum and subcortical areas) were projected onto the brain of their own T1 image using their individual-specific normalization parameters. For the one participant for whom fMRI data were not collected, the random effects analysis across all of the participants for the same contrast (using a threshold of p < 0.05 uncorrected, with a spatial extent of 50 voxels, and masking out activity in the cerebellum and subcortical areas) was used as a prior and projected back to the individual's T1 image using that participant's normalization parameters. The fMRI t-values were then mapped to the vertex points of the segmented brain, serving as prior information for the VBMEG analysis. A lenient uncorrected threshold of p < 0.05 was used to ensure that sufficient vertex points of the brain were given prior information for the VBMEG analysis. Using a conservative threshold corrected for multiple comparisons for the fMRI analyses may considerably restrict the extent of prior information for the VBMEG analysis.

The activation waveforms of the trials from all conditions and sessions for both the simple and complex tasks were projected to the MEG sensor space (400 channels) using the weights of the independent components as determined from the ICA on the trials from the first two sessions of the simple piloting task. The mean activity of the trials for each condition was used in the VBMEG analysis. The noise model, serving as a baseline, was calculated using the activity from the no perturbation passive viewing condition. The VBMEG analysis estimated current activity over the entire cortex using a variance magnification factor = 500 and a confidence parameter = 500 [these parameters give somewhat less weight to the fMRI prior activity in determining the location of the source activity; see Sato et al. (2004) and Yoshioka et al. (2008)]. The time period for current estimation was 250 samples and the time step for the next period was 100 samples. The output of the analysis was the mean estimated current across trials for each cortical vertex point for each condition.

To determine the location of current on the brain thought to underlie the response to the perturbation, and to be able to compare the results across participants, the following procedure was used for each participant using data from the complex piloting task. For each of the vertex points (there were ∼6000 for each participant), the root-mean-squared (RMS) current was determined for perturbation recovery and for a baseline period prior to perturbation. The RMS current for perturbation recovery was calculated from 12 ms before to 8 ms after the new mean response time (utilizing performance of the adaptive automation BCI-decoder; see Results section). The RMS current for the baseline period was calculated from 400 ms before the onset of the perturbation to just before the onset of the perturbation. The current for each vertex point was normalized by subtracting 20 times the mean RMS current of the baseline period (across all vertex points) from the RMS current of perturbation recovery for each vertex point and then dividing by the maximum RMS current across all vertex points. The vertex points whose normalized current was greater than zero were projected to the standard template brain (2 × 2 × 2 mm) (given in SPM8) using the MNI coordinates determined during segmentation by FreeSurfer. The resulting images were smoothed using a FWHM 8 × 8 × 8 mm Gaussian kernel. Because smoothing may cause activity to be spread to regions that were not originally active, a threshold was used such that only voxels showing mean RMS values greater than the lowest value of the original smoothed voxels (corresponding to the original projected vertex points) were considered to be significant (using a spatial extent threshold of 100 voxels). The intersection of active voxels across all seven participants was used to define common activity. The SPM Anatomy Toolbox v1.8 (Eickoff et al., 2005) was used to determine the labels of active brain regions.
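The per-vertex normalization described above can be written out as a short sketch. One point is an assumption on our part: the text does not say whether "the maximum RMS current across all vertex points" is taken before or after the baseline subtraction, and the sketch uses the maximum of the unsubtracted recovery RMS. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def rms(values: Sequence[float]) -> float:
    """Root-mean-squared value of a sequence of current samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def normalized_current(recovery: List[Sequence[float]],
                       baseline: List[Sequence[float]],
                       factor: float = 20.0) -> List[float]:
    """Normalize the recovery-window RMS current of each vertex.

    recovery[i] and baseline[i] hold the current samples of vertex i in the
    recovery window and in the pre-perturbation baseline, respectively.
    """
    recovery_rms = [rms(samples) for samples in recovery]
    # Mean baseline RMS across all vertex points (one scalar for the brain).
    baseline_mean = sum(rms(samples) for samples in baseline) / len(baseline)
    # Assumption: divide by the largest unsubtracted recovery RMS.
    max_rms = max(recovery_rms)
    return [(r - factor * baseline_mean) / max_rms for r in recovery_rms]
```

Only vertices with a value above zero would then be carried forward to the projection step, which is equivalent to requiring a recovery RMS above 20 times the mean baseline RMS.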

# Decoding Pilot Intention

We developed a method to decode the participant's intention to perform an action in response to a perturbation by training a classifier on neural data taken from the first two sessions of the simple piloting task. The classifier was then evaluated by testing its ability to decode participant intention on the third session of the same task. As a more stringent test of classifier performance (an examination of its transfer generalizability), we then examined its ability to decode intention in the complex piloting task. It should be noted that this classifier represents an open loop simulation of a BCI, intended to test the feasibility of such a method for real-time implementation of neuroadaptive automation using a closed loop BCI-decoder. See **Figure 1** for a depiction of the hypothesized neuroadaptive automation system implemented in this study.

The training of the classifiers was conducted using trials from the first two sessions of the simple piloting task. The two classes to be decoded were presence of perturbation while piloting the plane vs. presence of perturbation while passively viewing. We selected this contrast for training because we wanted to ensure that the BCI-decoder was not just picking up the visual evoked response induced by the perturbation but was extracting activity related to the attentional components of the response to the perturbation in relation to the intention to recover from the change in attitude. Rather simple features were used for decoding in the hope that they would generalize across sessions, tasks, and participants. The first step in calculating the features used for decoding was to determine the time point of the absolute peak in the mean evoked response (occurring less than 300 ms after the onset of the perturbation) in the selected independent component of the perturbation piloting trials of the first session of the simple task used for training (the peak time for the participants was as follows: S1 = 232 ms; S2 = 236 ms; S3 = 284 ms; S4 = 196 ms; S5 = 228 ms; S6 = 284 ms; S7 = 264 ms; mean = 246 ms) (see **Figure 3B** for the mean activation waveforms for the Fly and Watch conditions for each participant from session 1). For each trial, RMS amplitude was calculated within two consecutive 40 ms windows prior to the time of the peak of the averaged evoked potential and one 40 ms window after (these windows are depicted as blue bars at the top of the mean activation waveforms in **Figure 3B** for each participant). The RMS amplitude values in these three windows served as the decoding features for the perturbation piloting trials. To help in generalization and to bias the classifier to make fewer false alarms, the perturbation passive trials used three separate time points to extract the features (120, 60, and 0 ms before the onset of the perturbation).
This had the effect of increasing the number of training trials for the perturbation passive condition by a factor of three. Since there were originally half as many perturbation passive trials as perturbation piloting trials, this raised the training ratio to about 1.5 times as many perturbation passive trials as perturbation piloting trials. The greater number of training trials and the greater variability for the perturbation passive condition serve to increase the ability to reject trials that are not from the perturbation piloting condition (reducing false alarms) and to increase the noise variability with regard to timing, so that the classifier may more readily generalize to the complex task, in which the attitude of the plane (and thus the visual image on the screen) is constantly changing as a result of the continuous piloting task of tracking the river in the Grand Canyon. The reason we did not use the no perturbation piloting condition as one of the conditions to train the BCI-decoder on is that the decoder would likely just extract the visual evoked response to the perturbation and not the attention related component of the motor intention for attitude recovery that we are interested in. Given the continuous changes in attitude of the plane while maneuvering on the complex task, a BCI-decoder that is based on visual evoked responses to perturbations from the simple task may produce a large number of false alarms.
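As an illustration of the feature extraction, the sketch below computes the three 40 ms RMS features around a reference sample of a single-trial component waveform sampled at 250 Hz. For the passive trials it assumes that each of the three time points (120, 60, and 0 ms before onset) takes the place of the peak as the reference; the text does not spell this out, so that reading, like the function names, is an assumption.

```python
import math
from typing import List, Sequence

SAMPLING_RATE_HZ = 250  # rate after downsampling, as stated in the text


def ms_to_samples(ms: float, rate_hz: int = SAMPLING_RATE_HZ) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(ms * rate_hz / 1000)


def rms(values: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in values) / len(values))


def window_rms_features(waveform: Sequence[float], reference: int,
                        window_ms: float = 40.0) -> List[float]:
    """RMS in two consecutive windows before `reference` and one after it."""
    w = ms_to_samples(window_ms)
    starts = (reference - 2 * w, reference - w, reference)
    if starts[0] < 0 or reference + w > len(waveform):
        raise ValueError("feature windows extend beyond the trial")
    return [rms(waveform[s:s + w]) for s in starts]


def passive_features(waveform: Sequence[float], onset: int,
                     offsets_ms: Sequence[float] = (120.0, 60.0, 0.0)
                     ) -> List[List[float]]:
    """One feature vector per offset, tripling the passive training set.

    Assumption: each offset before onset serves as the reference sample.
    """
    return [window_rms_features(waveform, onset - ms_to_samples(o))
            for o in offsets_ms]
```

For a piloting trial epoched with 1000 ms of pre-onset data, the reference would be sample 250 plus the participant's peak latency in samples (for example 250 + 58 for a 232 ms peak).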

The BCI-decoder was trained on approximately 80 trials of the perturbation piloting task and 120 trials of the perturbation passive task using the Matlab Least Squares Probabilistic Classification (LSPC) toolbox (Sugiyama et al., 2010). LSPC uses a linear combination of kernel functions to model the class-posterior probability. Regularized least-squares fitting of the true class-posterior probability is used to learn its parameters (Sugiyama et al., 2010). The use of least-squares fitting to determine a linear model allows a global solution to be obtained analytically, providing a considerable speedup in computation time. The default parameters were used in training of the LSPC models (see Matlab code: Sugiyama et al., 2010). The time required to train the classifier is approximately 0.25 s. Prior to training, the features for the trials were normalized by subtracting the mean and dividing by the standard deviation. The mean and standard deviation from the training trials were used to normalize the trials used for testing. The first test set consisted of trials from the session of the simple piloting task that was not used during training. There were approximately 40 perturbation piloting trials and 20 perturbation passive trials to be classified using the trained LSPC model. Balanced accuracies (Brodersen et al., 2010) are reported to account for biases due to the unequal number of trials in the two conditions to be classified. The test data consisted of features computed at the time point specified by the peak of the evoked response determined from the training data. No information about the distribution of the test data was used. The BCI-decoder treats each test trial as independent. One hundred BCI-decoders were trained and then tested using trials from the simple piloting task. The primary parameter that is random in the training of each BCI-decoder is the order of the trials in the training cross validation steps. The BCI-decoder with the best performance as determined by balanced accuracy was used to test the trials from the complex piloting task.
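Two steps of this procedure lend themselves to a brief sketch: normalizing test features with statistics taken from the training trials only, and scoring with balanced accuracy. The sketch shows only the point estimate of balanced accuracy (the mean of the two class-wise accuracies), not the posterior distribution described by Brodersen et al. (2010), and the use of the sample standard deviation is an assumption. The LSPC classifier itself is not reproduced here.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_standardizer(train: Sequence[Sequence[float]]
                     ) -> Tuple[List[float], List[float]]:
    """Per-feature mean and standard deviation from the training trials."""
    columns = list(zip(*train))
    means = [statistics.fmean(col) for col in columns]
    # Assumption: sample standard deviation (n - 1 denominator).
    stds = [statistics.stdev(col) for col in columns]
    return means, stds


def standardize(rows: Sequence[Sequence[float]], means: Sequence[float],
                stds: Sequence[float]) -> List[List[float]]:
    """Apply training statistics to any set of trials (training or test)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in rows]


def balanced_accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    """Mean of the hit rate and the correct-rejection rate."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))
```

Because `standardize` never sees test-set statistics, the same call works unchanged for trials arriving one at a time, which matches the statement that no information about the distribution of the test data was used.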

The goal for the BCI-decoder in the complex piloting task was to be able to detect the intention to recover from a perturbation in attitude faster than by movement of the control stick by the hand. The selected LSPC model trained on the simple piloting task was used for testing of the complex piloting task. Additionally the same parameters (mean and standard deviation) used during training were also used on the test set for normalization of the features. For the perturbation piloting trials and the no perturbation piloting trials the LSPC model began testing at time point zero when the perturbation started. The window for the BCI-decoder was 120 ms encompassing the three 40 ms time windows in which the RMS amplitude was calculated. Therefore, the earliest time the perturbation could be detected was at 120 ms. The 120 ms time window tested by the BCI-decoder was incremented in 8 ms steps through 1000 ms of the data for each trial. The earliest point at which the BCI-decoder detected the presence of a perturbation piloting trial was the point at which the adaptive automation would be implemented to recover attitude. The time between detection by the BCI-decoder and the onset of the control stick by the hand to recover from the perturbation in attitude was used to evaluate the time benefit (time savings) of the implementation of the adaptive automation. The trial was only considered a hit if the BCI-decoder predicted time was faster than the actual movement time of the control stick. To determine the statistical significance of the BCI time savings, BCI-decoders were trained using 1000 random permutations of the labels and each was tested on the complex piloting task. All 1000 permuted models used for evaluation had less than 25% false positives. This criterion was used in order to be comparable to the false positive performance of that of the BCI models trained with correct labels. 
The perturbation time benefits were calculated for each of the 1000 permuted models and used as a distribution against which to compare the model trained with the actual labels.
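The sliding-window test on the complex task can be sketched as follows. At 250 Hz the 120 ms decoder window is 30 samples and the 8 ms step is 2 samples. The `is_perturbation` callable stands in for the trained decoder applied to one window and is purely hypothetical, as are the function names.

```python
from typing import Callable, Optional, Sequence

SAMPLE_MS = 4.0      # duration of one sample at 250 Hz
WINDOW_SAMPLES = 30  # 120 ms decoder window
STEP_SAMPLES = 2     # 8 ms step between successive windows


def first_detection_ms(waveform: Sequence[float],
                       is_perturbation: Callable[[Sequence[float]], bool]
                       ) -> Optional[float]:
    """Time (ms after perturbation onset) of the first positive window.

    `waveform` starts at perturbation onset. A detection is reported at the
    end of the window, so the earliest possible detection is 120 ms.
    """
    for start in range(0, len(waveform) - WINDOW_SAMPLES + 1, STEP_SAMPLES):
        if is_perturbation(waveform[start:start + WINDOW_SAMPLES]):
            return (start + WINDOW_SAMPLES) * SAMPLE_MS
    return None


def time_saving_ms(detection_ms: Optional[float],
                   stick_onset_ms: float) -> float:
    """Time benefit of the decoder over the control stick response.

    A trial is a hit only if the decoder precedes the stick movement;
    otherwise the original response time stands and the saving is zero.
    """
    if detection_ms is not None and detection_ms < stick_onset_ms:
        return stick_onset_ms - detection_ms
    return 0.0
```

Averaging `time_saving_ms` over all perturbation piloting trials gives the kind of mean time benefit that is compared against the permuted-label models.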

To evaluate the generalizability of a single model across participants, the weights of the model of the participant with the best performance were used to test the trials of the complex piloting task of the remaining six participants. Performance measures, including BCI time savings, were determined using the same method as applied when using each participant's own model to test the trials of the complex piloting task (see above).

#### Procedure

Participants underwent the fMRI and MEG sessions on separate days. The fMRI experiment was used to calculate a prior for the MEG source localization analysis using Variational Bayesian Multimodal Encephalography (VBMEG). Six of the seven participants took part in the fMRI experiment; one participant did only the MEG experiment. All of the participants who took part in the fMRI experiment did so prior to the MEG experiment. MRI anatomical T1 scans were acquired for all seven participants and used to make models for source localization analysis using VBMEG. All analyses were conducted using Matlab software unless otherwise stated.

# RESULTS

#### Behavioral Performance

The response times (RTs) for each of the participants to initiate pull back on the control stick in reaction to a perturbation causing the plane to enter a steep dive, for both the simple piloting task and the complex piloting task, are presented in **Table 1**. The RT in the complex task (median = 436.5 ms) was not found to be significantly higher than in the simple task (median = 368.5 ms), p = 0.0781 (df = 6), although there is a tendency in this direction. The number of trials in which the plane crashed into the ground/cliff, as well as the number of bad trials (resulting from machine failures and/or movement before the onset of the perturbation for the simple task and crashes before the onset of the perturbation for the complex task), are also presented in **Table 1**. As **Table 1** indicates, these numbers were relatively small, but were greater in the complex task. It should be noted that bad trials were excluded from analysis and not used for calculation of response times or training/testing of the BCI-decoders. In some cases on the complex task there were crashes after the onset of the perturbation. These trials were not excluded from analysis.

#### Source Localization

The smoothed RMS current of the projected task related independent component, centered around the time of the perturbation on the complex piloting task and rendered on the surface of the brain (see Methods for details of the source localization analysis), is displayed for each participant along with the corresponding independent component spatial map in **Figure 3C**. There was some degree of variability in the extent to which different brain regions were active across participants (**Figure 3C**, **Table 2**). As can be seen in **Figure 4** and **Table 3**, brain regions that were commonly active for all participants include the pre-central gyrus (including premotor cortex), post-central gyrus, the superior parietal lobule, the primary visual cortex, and human occipital cortex visual motion processing area V5 (hOC5). It should be noted that while source localization is of interest in determining the brain regions associated with the independent component upon which decoding is based, it is not a necessary step in implementing the proposed neuroadaptive automation brain-machine interface.

## BCI-Decoder Performance

The results of the performance of the BCI-decoder are presented in **Tables 4**–**9**. The performance of the best (as determined by the highest balanced accuracy score) out of 100 BCI-decoders tested on the novel session of the simple piloting task is presented in **Table 4** for each participant. The average over all 100 BCI-decoders is given in the table for comparison. The BCI-decoder for six of the seven participants showed significant differences (p < 0.05) in being able to distinguish between perturbation piloting trials and perturbation passive viewing trials. The mean balanced accuracy performance was approximately 70%. Certainly the selection of the best BCI-decoder out of 100 trained biases these results; however, it was our goal to find the model that may best extract attentional information related to the intention of recovering from the perturbation in attitude. In this respect


*ID, Participant identification number; RT, Response Time; Ses, Session; BT, Bad Trial; CT, Crash Trial.*

#### TABLE 2 | MNI coordinates of clusters of brain activity for each participant.


*P, participant identification number; IFG, Inferior Frontal Gyrus; SFG, Superior Frontal Gyrus; SMA, Supplementary Motor Area; PMC, Premotor Cortex; Pre-CG, Pre Central Gyrus; Post-CG, Post Central Gyrus; SPL, Superior Parietal Lobule; IPC, Inferior Parietal Cortex; hOC5 (V5), Human Occipital Cortex Visual motion processing area V5; MT, Middle Temporal Cortex overlaps area hOC5; IOG, Inferior occipital gyrus; MTG, Middle Temporal Gyrus; ITG, Inferior temporal gyrus. MNI coordinates of clusters of root-mean-squared (RMS) current 12 ms before and after the mean time at which the decoder detected motor intention to the presence of a perturbation. The threshold of significant RMS current activity at a specific vertex point was set at 20× the mean baseline RMS current from −400 to 0 ms across all vertex points. A spatial extent threshold of 100 voxels was used on the smoothed projection into MNI space.*

#### TABLE 3 | MNI coordinates of clusters of brain activity common across all participants.


*Pre-CG, Pre Central Gyrus; Post-CG, Post Central Gyrus; SPL, Superior Parietal Lobule; hOC5 (V5), Human Occipital Cortex Visual motion processing area V5.*

*MNI coordinates of clusters of root-mean-squared (RMS) current 12 ms before and after the mean time at which the decoder detected motor intention to the presence of a perturbation that are common across all seven participants. The threshold of significant RMS current activity at a specific vertex point was set at 20× the mean baseline RMS current from −400 to 0 ms across all vertex points.*

we feel justified in selecting the best model trained and tested on the simple piloting task to use for testing in an unbiased manner on the complex piloting task. Although training the BCI-decoder to distinguish between the perturbation piloting and no perturbation piloting trials on the simple piloting task may have provided better performance when testing on the novel session from the same task, it is likely that the model would have learned the response to the visual aspects of the perturbation rather than the neural activity related to the intention to recover attitude.

As discussed above, the model with the highest balanced accuracy on the test session of the simple piloting task was used to test the session of the complex piloting task. The goal was to simulate the use of a brain computer interface in real time that would engage adaptive automation to initiate recovery from a perturbation in attitude faster than could be done by moving the control stick by hand. To accomplish this goal, the BCI-decoder was applied to a 120 ms window starting at the time of the perturbation and stepping through the data in 8 ms steps. The BCI-decoder was also tested on trials in which there was no perturbation, within the same time region in which the perturbation may have occurred. This point was determined randomly during the experiment and triggered on the MEG trace using a photodiode (see Methods). Bad trials were eliminated from the analysis (see **Table 1**). The first instance of classification by the BCI-decoder as a perturbation piloting trial is the time point at which the adaptive automation is initiated. Only trials in which the BCI-decoder is faster than the movement of the control stick are counted as hits (true positives). The results of the classification performance for the complex piloting task are presented in **Table 5**. Because unequal numbers of trials existed for perturbation piloting and no perturbation piloting trials, balanced accuracies were used (Brodersen et al., 2010). All seven participants showed significant classification performance above chance even with the additional criterion that the detection of a perturbation piloting trial had to occur before movement of the control stick. In cases where a perturbation trial was classified after control stick movement, the trial was counted as a miss (false negative). The ratio of correct rejections (true negatives) to false alarms (false positives) was greater than the ratio of hits (true positives) to misses (false negatives).
The mean balanced accuracy across participants was 73%. **Table 6** shows the performance on the complex piloting task of the six subjects tested using the weights from the model of the best participant. The results indicate that the balanced accuracies of all six participants showed significant classification performance above chance (**Table 6**). While there were significant differences in hit rate and false alarm rate between the generalized and own model tests, there were no significant differences, using the Wilcoxon signed rank test, in the primary performance measures including balanced accuracy, d′, and a′ [a′ (a prime, or area under the curve) was calculated by the method given in Macmillan and Creelman (1991)] (see **Table 3B**).

The improvement in response time afforded by the use of the neuroadaptive automation is given in **Table 7**. In trials in which there was a miss, neuroadaptive automation was not employed and the original response time was used. The mean response time difference was calculated as the original onset time minus the onset of the neuroadaptive automation for all perturbation piloting trials. The mean response time across participants was reduced from 425 to 353 ms under neuroadaptive automation, an average improvement of 72 ms. **Figures 5A,B** depict the decoded response times of the adaptive automation (black circles) plotted on the single trial activation waveforms for participant 1 (best performer) and participant 3 (median performer), respectively. The single trials are arranged in increasing order of behavioral response time from bottom to top (white line). The significance of the time savings was



*ID, Participant identification number; bacc\_mean, Balanced accuracy mean in percent; bacc\_ppi, Posterior probability intervals; bacc\_p, p value; TP, true positive (hit); FN, false negative (miss); FP, false positive (false alarm); TN, true negative (correct rejection); HR, hit rate; FAR, false alarm rate; d*′ *, d prime; a*′ *, a prime.*

*In the case when the FAR* = *0 calculation of FAR is made by adding 1 to the original FP and TN values.*

*The performance scores are for the best out of 100 BCI-decoders trained on the first two sessions and tested on the novel simple piloting task session. The average balanced accuracy for all 100 BCI-decoders is given in parentheses for comparison.*

*The bacc\_p value for the group mean is from the Wilcoxon signed rank test, over the bacc\_mean values for the seven participants, of whether the values are greater than 50.*

#### TABLE 5 | Novel test session classification performance: complex piloting task through Grand Canyon: detect perturbation piloting vs. no perturbation piloting.


*ID, Participant identification number; bacc\_mean, Balanced accuracy mean in percent; bacc\_ppi, Posterior probability intervals; bacc\_p, p value; TP, true positive (hit); FN, false negative (miss); FP, false positive (false alarm); TN, true negative (correct rejection); HR, hit rate; FAR, false alarm rate; d*′ *, d prime; a*′ *, a prime.*

*The bacc\_p value for the group mean is from the Wilcoxon signed rank test, over the bacc\_mean values for the seven participants, of whether the values are greater than 50.*

#### TABLE 6 | Generalization of performance using best subjects weights: novel test session classification performance: complex piloting task through Grand Canyon: detect perturbation piloting vs. no perturbation piloting.


*ID, Participant identification number; bacc\_mean, Balanced accuracy mean in percent; bacc\_ppi, Posterior probability intervals; bacc\_p, p value; TP, true positive (hit); FN, false negative (miss); FP, false positive (false alarm); TN, true negative (correct rejection); HR, hit rate; FAR, false alarm rate; d*′ *, d prime; a*′ *, a prime.*

*The bacc\_p value for the group mean is from the Wilcoxon signed rank test, over the bacc\_mean values for the six subjects, of whether the values are greater than 50.*

*The numbers in parentheses are the group mean values of the original decoder excluding sub01.* \**Denotes p* < *0.05 on paired Wilcoxon signed rank test for the comparison between the original decoder and the one trained with sub01 model over the six participants.*



*The p-value in the last column denotes the significance of the time savings improvement of the BCI adaptive automation over the original joystick based response times based on permutation testing of 1000 models trained with random labels.*

*ID, Participant identification number; N, Number of Perturbation Piloting Trials; TP, True Positives (hits); FP, False Positives (false alarms); rt diff, response time difference; Org, Original; se, standard error; Perm, Permuted; BCI, Brain Computer Interface.*

*The Perm P-value for the group mean is the paired Wilcoxon signed rank test comparing the BCI rt diff values to the Perm rt diff values for the seven participants.*

#### TABLE 8 | Generalization of performance using the best subject's weights: improvement in response time by adaptive automation on the complex piloting task through the Grand Canyon.


*The p-value in the last column denotes the significance of the time savings improvement of the BCI adaptive automation over the original joystick based response times based on permutation testing of 1000 models trained with random labels.*

*ID, Participant identification number; N, Number of Perturbation Piloting Trials; TP, True Positives (hits); FP, False Positives (false alarms); rt, response time; Org, Original; se, standard error; Perm, Permuted; BCI Brain Computer Interface.*

*The Perm P-value for the group mean is the paired Wilcoxon signed rank test comparing the BCI rt diff values to the Perm rt diff values for the six subjects.*

*The numbers in parentheses are the group mean values of the original decoder excluding sub01.* \**Denotes p* < *0.05 on paired Wilcoxon signed rank test for the comparison between the original decoder and the one trained with sub01 model over the six participants.*

evaluated by comparing the neuroadaptive automation response time difference (relative to the control stick response) with the distribution of response time differences of 1000 models trained with randomly permuted labels (see Methods section). The p value was computed as the proportion of the 1000 permuted-label models that had larger response time differences than the BCI trained with the correct labels (see **Table 7**). **Table 8** shows the time savings of the six participants tested using the weights from the model of the best participant on the complex piloting task. The same permutation technique as discussed above was used to evaluate statistical significance. While all participants showed a significant difference in time savings even when using a model trained on a different participant, the time savings were significantly (p < 0.05; paired Wilcoxon signed rank test) greater when using their own model (median = 62.4 ms; mean = 71.9 ms) vs. the generalized model (median = 35.0 ms; mean = 51.5 ms).
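The permutation-based p value described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `permutation_p_value` and all numbers in the example are hypothetical, and the response time differences of the permuted-label models are assumed to have been computed already.

```python
def permutation_p_value(observed_diff, permuted_diffs):
    """Proportion of permuted-label models whose response time difference
    is larger than that of the decoder trained with the correct labels."""
    if not permuted_diffs:
        raise ValueError("need at least one permuted model")
    n_larger = sum(1 for d in permuted_diffs if d > observed_diff)
    return n_larger / len(permuted_diffs)


# Hypothetical values in ms, for illustration only (not data from the study).
observed = 72.0
permuted = [10.0] * 990 + [80.0] * 10  # 10 of the 1000 permuted models exceed 72 ms
print(permutation_p_value(observed, permuted))  # 0.01
```

With 1000 permutations the smallest non-zero p value this estimate can return is 0.001, which is one reason the number of permuted models matters.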

#### DISCUSSION

The present study examined whether it is possible to decode neural signals associated with the intention to act in response to an impending hazard. Using MEG, the results showed that neural activity could be decoded so as to decrease the time needed to respond to the hazard, compared to manual action. As such, the results demonstrate that neuroadaptive automation can be implemented to speed up intentional action when there is very little time available to respond.

There has been extensive prior research showing the effectiveness of both neuroadaptive automation (Byrne and Parasuraman, 1996; Wilson and Russell, 2007; Ting et al., 2010) and passive BCI (Blankertz et al., 2010; Zander and Kothe, 2011) in enhancing human performance. However, the present study represents the first successful attempt to show that decoded neural activity can be used to potentially speed up split-second decision making in response to an impending hazard on a novel complex task that neither the participant nor the classification model has been trained on. While the brain is indeed faster than the hand in responding to a hazard, its activity must be accurately decoded so as to accrue a savings in time. In the piloting task used in the present study, the mean savings in response time was 72 ms (ranging from 36.1 to 138.9 ms). Although this may seem relatively small, in situations where humans are moving at high speed toward a hazard, as in driving or piloting, the savings may be sufficient to avert disaster.

#### TABLE 9 | Flight characteristics of F22 on Grand Canyon task.

*ID, Participant identification number; DR, Descent Rate; Avg, average.*

*The climb/descent rate is variable depending on the attitude of the plane at time of perturbation. The values given are the (1) mean of the maximum slope of descent calculated over a 200 ms period across trials and (2) the greatest maximum slope of descent calculated over a 200 ms period across trials.*

*It is important to note that the time and distance to ground saved by earlier elevator engagement reflect not only reduced descent toward the ground but also a gain in altitude over time because of the earlier climb.*

To put a time savings of 72 ms in context, consider the flight characteristics of an F22 aircraft on the complex piloting task. **Table 9** gives the response times to the in-flight perturbation for each participant [**Figures 5A,B** depict the decoded response times plotted on the single trial activation waveforms of the adaptive automation (black circles) for participants 1 (best performer) and 3 (median performer), respectively]. Even an average improvement of 72 ms in response time can result in an average savings of up to 7.4 m of lost altitude as a result of earlier initiation of recovery in attitude to the perturbation. This could make a difference between a successful and failed attempt to avoid a collision. It should be noted that the large variability in savings between participants is likely a result of the quality of the MEG data in terms of separating task related activity from artifacts rather than expertise on the task. There was no apparent relationship between the savings afforded by the simulated neuroadaptive automation and manual response time on the task. It is known that there is considerable variability in the quality of MEG and EEG data across individuals that impacts successful BCI performance (Lotte et al., 2013).

It must be acknowledged, however, that the improvement in response time using neuroadaptive automation comes at the expense of making false alarms on a small number of trials. As in any automated alarm system, the tradeoff between correct early warning (hits) and false alarms has to be considered when setting the decision criterion of the alarm (Swets, 1973; Parasuraman and Riley, 1997). It may be possible in some cases to adjust the criterion of the BCI-decoder to make fewer false alarms, at the expense of making fewer hits as well and reducing the overall response time improvement afforded by the neuroadaptive automation. For example, in the study by Blankertz et al. (2002) the classifier was trained such that it was optimal under the constraint that the false positive (false alarm) rate attains a preset value.
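To make the hit/false alarm tradeoff concrete, the sketch below computes the measures reported in the tables (hit rate, false alarm rate, balanced accuracy, d′, and a′) from trial counts. It is illustrative only: the function name `detection_metrics` and the counts are hypothetical, d′ is taken as z(HR) − z(FAR), and a′ is assumed to follow a commonly used nonparametric formula, which may differ in detail from the computation used in the study.

```python
from statistics import NormalDist


def detection_metrics(tp, fn, fp, tn):
    """Signal detection measures from trial counts.

    Both rates must lie strictly between 0 and 1; otherwise d' is
    undefined and inv_cdf raises statistics.StatisticsError.
    """
    hr = tp / (tp + fn)            # hit rate
    far = fp / (fp + tn)           # false alarm rate
    bacc = 0.5 * (hr + (1.0 - far))  # balanced accuracy
    z = NormalDist().inv_cdf       # inverse of the standard normal CDF
    d_prime = z(hr) - z(far)
    # Assumed nonparametric a' formula (two branches, hr >= far or hr < far).
    if hr >= far:
        a_prime = 0.5 + ((hr - far) * (1 + hr - far)) / (4 * hr * (1 - far))
    else:
        a_prime = 0.5 - ((far - hr) * (1 + far - hr)) / (4 * far * (1 - hr))
    return {"hr": hr, "far": far, "bacc": bacc,
            "d_prime": d_prime, "a_prime": a_prime}


# Hypothetical counts: a liberal criterion vs. a stricter one.
print(detection_metrics(tp=80, fn=20, fp=30, tn=70))
print(detection_metrics(tp=60, fn=40, fp=10, tn=90))
```

In the second call the stricter criterion yields fewer false alarms but also fewer hits, which is the tradeoff discussed above.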

The presence of a false alarm by the BCI-decoder could be somewhat problematic. Without some type of system that would distinguish externally induced perturbations from changes in attitude induced by the pilot in flight, the neuroadaptive automation would initiate a recovery maneuver, which in this case is to reverse the pitch down elevator deflection caused by the perturbation. Without the presence of a real perturbation, if the plane was in level flight and the BCI-decoder made a false alarm, the neuroadaptive automation would cause the plane to make an abrupt climb. With respect to the pilot, this would constitute a pitch up perturbation. The goal of the hypothesized neuroadaptive automation is not to take control away from the pilot but rather to speed up the response of the pilot's motor intentions to unexpected flight conditions such as perturbation of attitude. While the use of detecting error-related potentials to decrease error rate has been successful in some implementations (Blankertz et al., 2002; Parra et al., 2003), it is unfortunately not likely to be of benefit in detecting motor intention to improve response time. This is because the relevant features for detecting the error-related potential on a single trial basis occur after the response is made. One way to possibly keep the pilot in the loop and reduce the effects of false alarms is to engage the neuroadaptive automation for only a couple hundred milliseconds and immediately disengage it in response to opposite deflection of the flight controls by the pilot. This would reduce the detrimental effects of false alarms and at the same time would speed up response to recover from true perturbations in the case of hits. Given that the BCI-decoder is extracting motor intention related activity, it would be interesting to determine whether the pilot actually notices the engagement of the neuroadaptive automation in the case of hits or rather just feels that they are really fast in reacting. The extent to which pilot-automation induced oscillations arise and offset the beneficial effects of the time savings of the neuroadaptive automation needs to be investigated using a closed-loop implementation of the system during flight simulation (it should be noted that the study reported here only uses an open-loop BCI-decoder tested offline to test the feasibility of implementation in neuroadaptive automation).

than the manual response time (white line). For perturbation absent trials the black circles denote false alarms. A red circle is shown over the original response time in the case when the simulated neuroadaptive automation failed to classify the trial as a hit (misses) or in which it was slower than the original response time (slow responses).
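The disengagement idea in the paragraph above (brief engagement, immediate release when the pilot deflects the controls in the opposite direction) could be expressed as a simple rule. The sketch below is purely hypothetical: the function name, the 200 ms default (one reading of "a couple hundred milliseconds"), and the sign convention for pitch commands are all assumptions, and no such closed-loop system was implemented in the study.

```python
def automation_engaged(elapsed_ms, automation_pitch_cmd, pilot_pitch_cmd,
                       max_engage_ms=200.0):
    """Return True while the (hypothetical) automation should stay engaged.

    elapsed_ms: time since the automation was triggered by the decoder.
    *_pitch_cmd: signed stick/elevator commands; positive = pitch up
                 (assumed sign convention).
    """
    # Rule 1: engage only briefly, then hand control back to the pilot.
    if elapsed_ms >= max_engage_ms:
        return False
    # Rule 2: disengage at once if the pilot deflects the controls in the
    # opposite direction (opposite signs give a negative product).
    if automation_pitch_cmd * pilot_pitch_cmd < 0:
        return False
    return True


print(automation_engaged(50.0, 1.0, 0.0))    # pilot has not responded yet
print(automation_engaged(50.0, 1.0, -0.3))   # pilot opposes the automation
print(automation_engaged(250.0, 1.0, 0.5))   # engagement window has expired
```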

Although the BCI-decoder was trained using a specified window (120 ms) centered at the time of the peak evoked response prior to movement onset to detect a perturbation causing a pitch down attitude while in straight and level flight over the ocean (simple visual field), it was able to generalize to a novel complex flight condition in which the pilot maneuvered the plane through the Grand Canyon. In this complex condition the orientation of the perturbation with respect to the horizon is dependent on the roll angle (bank angle) of the plane at the time of the perturbation. The magnitude of the perturbation reflected in negative deflection in the pitch axis is dependent on the plane's attitude (pitch, roll, yaw axes), speed, airflow over the flight surfaces, and the time it takes for the pilot to initiate recovery (the longer it takes the bigger the perturbation effect). It is impressive that the BCI-decoder is able to generalize to the novel complex flight condition given that the nature of the perturbation and the corresponding visual aspects of the scene and ongoing motor control are quite different from the training situation. As we envision the closed-loop operational neuroadaptive automation system, it would not need to know the magnitude of the perturbation (although this information may be available from flight instruments) as its job is to only initiate recovery based on the decoded motor intention of the pilot. It is up to the pilot to appropriately control the plane within the first couple hundred milliseconds after the neuroadaptive automation has been initiated. As it stands now the system is only set up to recover from a pitch down attitude. Ideally, we would like a system that could recover from a perturbation in attitude to any of the axes (pitch, roll, yaw) or combinations thereof.
By comparing data from flight instruments that precisely measure attitude of all axes of the plane and pilot directed control movements the neuroadaptive automation could initiate the proper combination of control surface deflections to recover from various types of non-pilot induced perturbations. It would be interesting to test whether our system would generalize to other types of perturbation in attitude even though it was only trained on a pitch down perturbation. While this system using constraints determined by flight instruments may work in the case of perturbations it may not be effective in situations involving collision avoidance (e.g., with another aircraft or bird, etc.). In these situations it would be necessary to additionally build a BCI-decoder that can determine the desired direction of motor intention as it relates to the flight controls governing the attitude of the plane. This task may be difficult to accomplish within the framework of achieving the desired time savings to initiate recovery as fast as possible.

For the complex flying task no information was given concerning the timing of the peak of the event related evoked response to the onset of the perturbation used during training. Rather, the 120 ms window of the BCI-decoder progressed through the data in 8 ms steps until it identified an occurrence of a perturbation. The initial time window for the perturbation present trials started at the onset of the perturbation (the onset of the perturbation absent trials was randomly determined). However, there is no implicit information in this starting time that would reference the time of the evoked response upon which the BCI-decoder was trained. The presence of false alarms for the perturbation absent trials may be problematic for application of neuroadaptive automation working in a continuous manner given that the occurrence of true perturbations is quite sparse. As was discussed above, one way to reduce the number of false alarms made by the BCI-decoder is to only attempt to decode motor intention at points in which a perturbation is detected by flight instruments and then the appropriate recovery maneuver is applied by the neuroadaptive automation. One could implement a system that automatically recovers from a perturbation without regard to the pilot's intention ("Automation"). However, this is not the intention of the neuroadaptive automation proposed here, for which the goal is to always keep the pilot's intentions in control of the aircraft.
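The sliding-window procedure described above (a 120 ms window advanced in 8 ms steps until a perturbation is identified) can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's decoder: it assumes a 1000 Hz sampling rate (so 120 samples per window and 8 samples per step), a hypothetical linear decoder with a decision threshold at zero, and that the decision time is the end of the first window classified as a perturbation. The function name, weights, bias, and waveform are invented for illustration.

```python
def first_detection(waveform, weights, bias=0.0, step=8, fs_hz=1000):
    """Slide a window of len(weights) samples over a single-trial activation
    waveform in increments of `step` samples. Return the time in ms at the
    end of the first window whose linear decoder score is positive, or None
    if no window is classified as a perturbation."""
    win = len(weights)
    for start in range(0, len(waveform) - win + 1, step):
        segment = waveform[start:start + win]
        score = bias + sum(w * x for w, x in zip(weights, segment))
        if score > 0:
            # The decision cannot be issued before the window's last sample.
            return (start + win) * 1000.0 / fs_hz
    return None


# Hypothetical example: a step in the waveform 200 ms after trial onset and
# a decoder that fires when the window mean exceeds 0.5.
weights = [1.0 / 120] * 120
waveform = [0.0] * 200 + [1.0] * 200
print(first_detection(waveform, weights, bias=-0.5))  # 264.0
```

Because the window is never told where the evoked response lies, the same loop applies unchanged to perturbation absent trials, where any positive score counts as a false alarm.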

Previous research conducted on detection of driver braking intention, using EEG (Haufe et al., 2011, 2014; Kim et al., 2015), is relevant to the discussion of our results. In their studies as well as in ours, simple amplitude based features related to the onset of the visual event were used for decoding the onset of movement intention. The visual event signaling the onset to move in the Haufe et al. (2011, 2014) and Kim et al. (2015) studies was the flashing of the brake light on the car just in front of the one the participant was driving. In our study the visual event signaling the onset to move was the change in the optic flow field and the change in the position of the horizon (sky relative to ground) (See **Figure 2**). The finding that the best participant's decoding model generalizes to the remaining six subjects on detecting perturbation on the complex flying task with significant, although reduced, time savings (See **Tables 6, 8**) does suggest that the features selected by the model are not individual specific but are to some degree common across participants. As it stands now at least one session of the simple flying task is necessary to extract the task related independent component that helps in artifact extraction. However, the finding that the BCI-decoder generalizes across participants (See **Tables 6, 8**) is promising for future attempts to make a generic system that does not require training.

There are three important aspects that distinguish our study from that of previous studies investigating braking intention.

The first is that our test condition was on a novel task that was fairly different from the one the BCI-decoder was trained on rather than just using a subset of trials on the same task for testing as is commonly done in decoding studies (Garrett et al., 2003; Wilson and Russell, 2007; Haufe et al., 2011, 2014; Baldwin and Penaranda, 2012; Callan et al., 2015; Kim et al., 2015). Our study demonstrates that a BCI-decoder trained on a simple task can generalize to a more complex one characteristic of real world conditions with significant performance in identifying perturbation events (mean bacc = 73%, p < 0.05; mean d′ = 1.52; mean a′ = 0.84; **Table 5**) with a significant time savings of 72 ms (**Table 7**).

The second is that the testing session (complex piloting task) requires that the participant use the same control stick to recover from the perturbation as used to maneuver the plane tracking above the river. Under these conditions it is necessary for the BCI-decoder to be able to distinguish brain activity related to the perturbation and the intention to move from ongoing changes in the visual field and motor intention required to pilot the plane. This is substantially different from decoding of movement intention of the foot from the accelerator to the brake in response to a flashing light. In order to extract neural activity related to movement intention in response to a perturbation, rather than that just related to the visual event, the BCI-decoder was trained to distinguish between trials in which the participant was to pull back on the control stick in response to a perturbation vs. just passively viewing the perturbation. All but one of the subjects showed significant classification performance in identifying movement intention trials from passive viewing trials on the test session (mean bacc = 69.8%, p < 0.05; mean d′ = 1.30; mean a′ = 0.80; See **Table 4**). The ability of the BCI-decoder to be able to identify cases of motor intention in response to identical visual events likely contributes to its ability to distinguish between variations in brain activity in response to changes in the optic flow pattern and movement intention in response to a perturbation rather than changes in the optic flow pattern induced by piloting while maneuvering through the Grand Canyon.

The third is the difference in response time for emergency braking, which is approximately 650 ms (Haufe et al., 2011, 2014; Kim et al., 2015), compared to pulling back on the stick to recover from a perturbation, which took approximately 437 ms for the complex flying task and 369 ms for the simple flying task. The larger time savings in the braking studies [up to 222 ms using combined EEG and EMG (Haufe et al., 2014)] compared with our study (72 ms) may be attributed to the longer response time for emergency braking (over 200 ms longer). The mean peak of the event related potentials used as the target range to train the BCI-decoders in our study was 246 ms (See **Figure 3B**). Because of the relatively fast response times the slower event related potentials could not be used for decoding because they occur after the behavioral response has already been given. The mean response time for the complex flying task for the adaptive automation is 352.7 ms compared to the original of 425.0 ms. The mean time in which decoding performance reached an area under the curve (A′) value of 0.8 was also around 350 ms in the emergency braking studies (Haufe et al., 2011, 2014; Kim et al., 2015). It should be mentioned that the improvement in response time afforded by the adaptive automation in our study for some of the participants allowed them to have almost superhuman performance on this piloting task.

Given that the perturbation we employed abruptly alters the optic flow field we predict that visual motion processing areas as well as brain regions involved with motor intention (premotor cortex, motor cortex, somatosensory cortex, parietal cortex) are involved in decoding the decision for rapid movement in response to an impending hazard. While there is considerable variability in the extent and location of brain activity of the selected independent component used for the BCI-decoder for each participant, there are regions that are commonly activated across the participants (See **Figures 3**, **4** and **Tables 2**, **3**). Consistent with our predictions all subjects showed some degree of activity in visual motion processing areas (hOC5, MT, IOG), as well as the premotor cortex, pre-central gyrus (motor cortex), post-central gyrus (somatosensory cortex), and parietal cortex (superior parietal lobule) (See **Figures 3**, **4** and **Tables 2**, **3**). The visual cortex (BA17,18) also showed some degree of activity from all subjects (See **Figures 3**, **4** and **Tables 2**, **3**). Our findings are consistent with fMRI research investigating action intentions from preparatory brain activity (Gallivan et al., 2011). In the Gallivan et al. (2011) study, decoded activity from voxels in multiple parietal, premotor, and motor regions was found to successfully predict intended future grasp and reach movements. A study using electrocorticography (ECoG) revealed that in addition to motor and premotor activity somatosensory activity also precedes voluntary movement (Sun et al., 2015). The finding of predominantly caudal rather than rostral dorsal premotor cortex activation for most participants in our study (See **Figures 3**, **4** and **Tables 2**, **3**) is interesting as it relates to studies showing that action intention is processed more caudally and attention is processed more rostrally in the premotor cortex (Boussaoud, 2001).

A potential limitation of our study is the low number of participants. However, the primary aim of our study is to show the feasibility of the proposed approach for the development of neuroadaptive automation and to determine limitations that need to be addressed in future research. In our study the results from each individual participant are given. Even when individually tailored models were trained specifically on data from that participant there is some degree of variability in performance at predicting presence/absence of a perturbation (ranging from 63.7 to 85.6%, See **Table 5**), and the corresponding time savings (ranging from 36.1 to 138.9 ms, See **Table 7**), as well as the pattern of brain activation (See **Figures 3**, **4** and **Tables 2**, **3**). In the future, it may be interesting to investigate why some participants have better predictive models than others. These results strongly suggest that best performance will be achieved by individually tailored systems rather than using a generalized system that works across individuals. The drawback of individually tailored systems is the time necessary to train the system including ICA and the BCI-decoder. While this study does demonstrate that it is potentially possible to enhance response time by using an off-line BCI-decoder in these select participants it will be necessary to test a larger sample to see how well they generalize to the population in general and to determine factors predicting model efficacy.

Given that the mean time savings is 72 ms in the simulated offline open loop neuroadaptive automation system demonstrated here, it is important to discuss whether the processing time would be of sufficient speed to be used in a real-time closed loop neuroadaptive automation system (see **Figure 1**). The Yokogawa 400 channel MEG system at ATR is set up with a real-time processing system. The hardware and software for acquiring MEG channel data in real time and analog to digital conversion includes the following: National Instruments A/D Converter boards (6 Boards: 80 channels per board) can convert 400 MEG channels plus additional channels (EEG, EOG, triggers, etc.) sampled at 1000 Hz. To get high temporal precision that is stable the National Instruments real-time operating system "Pharlab" is used on a dedicated computer. Pharlab carries out filtering operations on 400 channel MEG and sends the analog to digital converted MEG channel data via UDP to a different computer for further processing in ∼1.5 ms. The application of the ICA weight matrix of the selected independent component to the 400 channel MEG data as well as the weight matrix of the BCI-decoder over the computed activation waveform takes <0.1 ms. The ICA and BCI-decoder can operate in such a short time because the weights have been trained ahead of time based on data from previous sessions. Therefore, the data acquisition, preprocessing, and BCI-decoding can all be accomplished in <2 ms in the real-time system. The X-Plane flight simulator is running at around 400 Hz. It takes ∼2.5–5 ms for the flight simulator computer to receive the command from the BCI-decoder computer over UDP and initiate the directed command. Based on the specifications of the system at ATR the loss in time savings afforded by the simulated neuroadaptive automation resulting from processing time would be ∼4.5–7 ms.
This would still leave a mean time savings afforded by the neuroadaptive automation ranging from 65 to 67.5 ms, which could be of substantial benefit in hazardous time critical situations. Most of the delay (resulting in reduction of time savings) is in the processing speed of the flight simulator, which theoretically could be improved if using dedicated hardware and software in a real aircraft.
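The latency budget above is simple arithmetic and can be checked directly. The sketch below uses only the figures quoted in the text (72 ms mean savings, under 2 ms for acquisition through decoding, and 2.5 to 5 ms for the flight simulator to act on the command); the variable and function names are hypothetical, and the 2 ms acquisition figure is treated as an upper bound.

```python
MEAN_TIME_SAVINGS_MS = 72.0
ACQ_TO_DECODE_MS = 2.0            # upper bound on acquisition + preprocessing + decoding
SIM_COMMAND_RANGE_MS = (2.5, 5.0)  # flight simulator receive-and-initiate delay


def remaining_savings(savings_ms, fixed_delay_ms, variable_delay_range_ms):
    """Return (worst case, best case) time savings after processing delays."""
    fastest, slowest = variable_delay_range_ms
    worst = savings_ms - (fixed_delay_ms + slowest)
    best = savings_ms - (fixed_delay_ms + fastest)
    return worst, best


print(remaining_savings(MEAN_TIME_SAVINGS_MS, ACQ_TO_DECODE_MS,
                        SIM_COMMAND_RANGE_MS))  # (65.0, 67.5)
```

The total processing loss of 4.5 to 7 ms is dominated by the simulator term, which matches the observation that the flight simulator accounts for most of the delay.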

While this system is specific to the MEG setup at ATR it is possible to make such a dedicated real-time system that will work with EEG that can be used in real-world settings. In order for the system to be feasible in real aircraft it will be necessary to use a more portable technology such as EEG. The signal processing techniques used in this study together with automatic subspace reconstruction have been shown to be able to separate artifacts from brain related activity in flight even in an open cockpit biplane (Callan et al., 2015). It is uncertain whether moving from 400 channels to 64 or 20 channels with an EEG setup will have a large effect on system performance. Source localization would likely be considerably worse in the case of EEG especially with 20 channels compared with that of MEG. The number of channels will also play an important part in the ICA in the number of brain and artifact components that can be separated. In future research we will test an EEG based closed-loop version of this neuroadaptive automation system on a motion platform based flight simulator to determine its feasibility and additional processing that may be necessary if it were to be realized in actual manned or unmanned aircraft.

# CONCLUSION

Our study explores the potential that neuroadaptive automation may have in facilitating human performance. Our goal is to develop a system that enhances performance to superhuman levels during normal hands on operation of an airplane (vehicle) by reducing the response time by directly extracting from the brain the movement intention in response to a hazardous event. This approach differs considerably from those that utilize BCI to maneuver a vehicle by hands-off control by such methods as decoded mental imagery or attention related steady state visual evoked potentials (Blankertz et al., 2010; LaFleur et al., 2013). These applications of BCI, although impressive, are severely limited in performance compared to normal hands on control, with the addition of greater workload as well as divided attention away from the task at hand (it should be noted, however, that these types of BCI are of extreme benefit when the normal channels of motor control are impaired). Advantages of the neuroadaptive automation BCI implementation proposed here, afforded by the use of only brain activity naturally occurring during the perceptual motor task, include improved performance with no additional workload or attentional demands for the pilot (operator), as well as no training by the pilot to fit the BCI. However, it should be noted that human training protocols for utilizing BCI are likely to improve performance (Lotte et al., 2013). Our proposed BCI-decoder works continuously over time without any a priori knowledge of when a perturbation may occur. In addition it was shown to be able to generalize to more complex tasks and to differentiate motor intention in response to an unexpected perturbation from that used during normal maneuvering. Future research needs to test the proposed neuroadaptive automation online using EEG in motion based flight simulators as well as in real airplanes to evaluate its real-world performance. It is interesting to conjecture whether the participants will notice when the neuroadaptive automation is active or will just think they are responding really fast. This research adds to the growing field of neuroergonomics and specifically to aviation cerebral experimental sciences. Our results, using an off-line BCI-decoder, suggest that indeed neuroadaptive automation can be implemented that is faster than the hand. The data can be shared with interested scientists upon request.

#### REFERENCES

# AUTHOR CONTRIBUTIONS

DEC, CT, and DBC designed and conducted the experiment. DEC, CT, DBC, MS, and RP analyzed the results of the experiment. DEC and RP wrote the manuscript.

# FUNDING STATEMENT

This research was supported in part by a contract with the National Institute of Information and Communications Technology, Japan, entitled, "Multimodal integration for brain imaging measurements" and by the Center for Information and Neural Networks, National Institute of Information and Communications Technology. MS was supported by the National Institute of Information and Communications Technology. Additional support to RP was given by the Air Force Office of Scientific Research grant FA9550-10-1-0385.

## ACKNOWLEDGMENTS

We would like to thank Eiji Nawa for his assistance with this research. We would also like to thank MEG and fMRI technicians Yasuhiro Shimada, Ichiro Fujimoto, Hiroaki Mano, and Hironori Nishimoto at the Brain Activity Imaging center at ATR as well as Yuka Furukawa for assisting in running the experiments. We would especially like to thank Yasushi Morikawa and Erika Matsumoto for helping in recruitment of glider pilots who participated in this study.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnhum.2016.00187

on Pattern Recognition IEEE (Istanbul), 3121–3124. doi: 10.1109/icpr.2010.764

adaptive aiding. Hum. Factors 49, 1005–1018. doi: 10.1518/001872007X249875


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Callan, Terzibas, Cassel, Sato and Parasuraman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Efficient Workload Classification based on Ignored Auditory Probes: A Proof of Concept

Raphaëlle N. Roy1,2 \*, Stéphane Bonnet1,3, Sylvie Charbonnier1,2 and Aurélie Campagne1,4

<sup>1</sup> Université Grenoble Alpes, Grenoble, France, <sup>2</sup> Gipsa-Lab, Centre National de la Recherche Scientifique, Grenoble, France, <sup>3</sup> CEA LETI, Grenoble, France, <sup>4</sup> Laboratoire de Psychologie et NeuroCognition, Centre National de la Recherche Scientifique, Grenoble, France

Mental workload is a mental state that is currently one of the main research focuses in neuroergonomics. It can notably be estimated using measurements in electroencephalography (EEG), a method that allows for direct mental state assessment. Auditory probes can be used to elicit event-related potentials (ERPs) that are modulated by workload. Although some papers do report ERP modulations due to workload using attended or ignored probes, to our knowledge there is no literature regarding effective workload classification based on ignored auditory probes. In this paper, in order to efficiently estimate workload, we advocate for the use of such ignored auditory probes in a single-stimulus paradigm and a signal processing chain that includes a spatial filtering step. The effectiveness of this approach is demonstrated on data acquired from participants who performed the Multi-Attribute Task Battery – II. They carried out this task during two 10-min blocks. Each block corresponded to a workload condition that was pseudorandomly assigned. The easy condition consisted of two monitoring tasks performed in parallel, and the difficult one consisted of those two tasks with an additional plane driving task. Infrequent auditory probes were presented during the tasks and the participants were asked to ignore them. The EEG data were denoised and the probes' ERPs were extracted and spatially filtered using a canonical correlation analysis. Next, binary classification was performed using a Fisher LDA and a fivefold cross-validation procedure. Our method allowed for a very high estimation performance with a classification accuracy above 80% for every participant, and minimal intrusiveness thanks to the use of a single-stimulus paradigm. Therefore, this study paves the way to the efficient use of ERPs for mental state monitoring in close to real-life settings and contributes toward the development of adaptive user interfaces.

Keywords: workload, classification, auditory evoked potentials, spatial filtering

## INTRODUCTION

Mental workload is frequently defined as task difficulty and the associated mental effort (Gevins and Smith, 2007). Better assessing this state is therefore of critical interest to the human factors community, which aims at developing smart technologies that enhance operators' safety and performance. The impact of workload on behavior has been extensively documented. Participants' reaction time is known to increase linearly with the number of items to memorize (Sternberg, 1969), as well as with the number of tasks to perform in parallel (Cain, 2007). However, behavioral responses are not always sufficient for mental state monitoring (MSM) systems, mainly because of their latency of occurrence and because some mental states are not necessarily or systematically reflected by a specific response. Physiological data give more insight into the operator's state, especially electroencephalography (EEG), a method that allows for direct mental state assessment. The use of physiological markers derived from cerebral activity for human factors purposes has given rise to a new field: neuroergonomics (Parasuraman et al., 2012).

#### Edited by:

Thorsten O. Zander, Technical University of Berlin, Germany

#### Reviewed by:

Virginia R. De Sa, Vanderbilt University, USA

Hasan Ayaz, Drexel University, USA

\*Correspondence: Raphaëlle N. Roy roy.raphaelle@gmail.com

Received: 04 October 2015 Accepted: 30 September 2016 Published: 13 October 2016

#### Citation:

Roy RN, Bonnet S, Charbonnier S and Campagne A (2016) Efficient Workload Classification based on Ignored Auditory Probes: A Proof of Concept. Front. Hum. Neurosci. 10:519. doi: 10.3389/fnhum.2016.00519

Amongst the various markers derived from EEG activity, event-related potentials (ERPs) are frequently used for MSM. ERPs correspond to the EEG activity that is time-locked to the appearance of a given stimulation, or probe. Unlike frequency measures, ERPs only allow for a discontinuous evaluation of the operator's mental state. However, according to Roy et al. (2016), frequency measures are very sensitive to mental fatigue and vigilance states, whereas ERPs are more robust to these states. Therefore, ERPs may be more suitable for ecological settings. Moreover, the literature describes numerous workload-related ERP modulations, such as amplitude decreases of early and late components. For instance, the P300 component's amplitude is reduced by an increase in workload (Kok, 2001; Schultheis and Jameson, 2004; Gomarus et al., 2006; Holm et al., 2009; Friedrich et al., 2011), and so are the amplitudes of the N1, N2, and P2 components (Kramer et al., 1995; Ullsperger et al., 2001; Gomarus et al., 2006; Allison and Polich, 2008; Miller et al., 2011; Boonstra et al., 2013). In the specific context of simulated flight, the P300 component's amplitude elicited by auditory probes has also been shown to decrease when the primary task's complexity increases (Natani and Gomer, 1981; Kramer et al., 1987; Sirevaag et al., 1993).

In order to determine an operator's mental state to modify the behavior of a system, one needs to compute an index or a class of workload to be fed as an input. This can be done using machine learning algorithms developed for brain-computer interfaces (BCIs). When those algorithms are used for applications that are not directed toward the voluntary control of an effector, the resulting systems are often referred to as passive BCIs (Zander and Kothe, 2011; van Erp et al., 2012). Although the number of publications regarding mental workload assessment has drastically increased this decade, only a few articles actually propose a classification based on ERPs. Brouwer et al. (2012) used seven electrodes and achieved 64% of correct binary classifications. Recently, it was shown that ERP spatial filtering could significantly enhance workload classification (Mühl et al., 2014; Roy et al., 2015). The authors achieved 72 and 98% of correct classifications using respectively a Fisher spatial filtering (FSF; Hoffmann et al., 2006) and a canonical correlation analysis filtering (CCA; Hotelling, 1936). However, these authors used task-dependent probes, i.e., items that were paramount for the task at hand, which is quite unrealistic for real-life settings. Very recently, Roy et al. (2016) showed that ERPs elicited by visual task-independent probes could be used for mental workload estimation. They inserted a basic detection task in a Sternberg memory task and used the ERPs elicited by the targets to classify the workload level of the memory task. They reached 91% of correct binary classifications by filtering the ERPs using a CCA. This is very promising; however, the probes, although task-independent, still required an overt answer from the participant. This kind of dual-task setting can therefore lead to decreased attentional engagement in the primary task, which seems rather unwelcome for operators' monitoring in hazardous work situations (e.g., driving, plant monitoring, custom control). Hence, the best approach to use ERPs in ecological settings would be a stimulation paradigm with task-independent and ignored probes. And as the ultimate goal should be to develop systems based on minimally intrusive probes, these stimulations should be as scarce as possible. As reported by Mertens and Polich (1997), the ERPs elicited in a single-stimulus paradigm by visual or auditory probes are a viable alternative to the traditional oddball procedure, although late components' amplitude is reduced when the stimuli are ignored compared to when they are counted or await a motor response. The authors even report that auditory probes elicit ERPs that are more robust to response type. That is to say, ignored auditory stimuli generate early and late components whose amplitude is quite similar to that of stimuli awaiting an active answer. This makes them very good candidates for the features to use in a mental workload estimation procedure.

This study intends to provide an evaluation of the efficiency of a workload estimation based on the ERPs elicited by infrequent, task-independent and ignored auditory probes. Workload was manipulated by varying the number of tasks to perform in parallel with the Multi-Attribute Task Battery – II (MATB; Comstock and Arnegard, 1992). A single-stimulus paradigm was used to elicit ERPs, which were then spatially filtered with a CCA and classified. The performance of this processing chain was also compared to that of a simpler chain without spatial filtering. The contributions of this paper are threefold: (1) to assess the validity of the single-stimulus paradigm for effective mental workload estimation; (2) to assess the relevance of a processing chain that includes a spatial filtering step in order to accurately classify the auditory evoked potentials (AEPs) of those ignored, infrequent probes; (3) to assess the relevance of both the stimulation paradigm and the processing chain for an ecologically valid task, the MATB.

## MATERIALS AND METHODS

This research was promoted by Grenoble's clinical research direction (France) and was approved by the French ethics committee (ID number: 2014-A00040-47) and the French health safety agency (B140052-31).

### Experimental Setup

Eight healthy right-handed volunteers (three females; 29.9 years old ± 5.9) performed two 10-min experimental blocks of the Multi-Attribute Task Battery-II, the latest version of the task battery developed by NASA to study divided attention and multitasking (Comstock and Arnegard, 1992; **Figure 1**). In this experimental setup, each block corresponded to a different workload level (low/high), which was pseudo-randomly assigned. In the low workload condition, the participants performed two monitoring tasks using the keyboard, i.e., the system monitoring and the resource management tasks. The system monitoring task was presented in the upper left window of the display. As explained by Comstock and Arnegard (1992), it simulates the demands of monitoring gauges and warning lights. The participants had to respond to the absence of the green light and to the presence of the red light, and to monitor the deviation of the four moving pointer dials from midpoint. The resource management task simulated the demands of fuel management. The participants had to maintain tanks A and B at 2500 units each. This was done by turning on or off any of the eight pumps, which can sometimes fail.

In the high workload condition, they had an additional tracking task to manage in parallel. The tracking task was located in the upper middle window and simulated the demands of manual control. The participants had to keep the target at the center of the window using the joystick. Therefore, in both the low and high workload conditions, perceptual, attentional, and decision-making processes are recruited, along with motor preparation and performance. The difference between the low and the high workload conditions stems only from the additional workload imposed by the additional task.

In addition to the visual stimulations induced by the MATB-II, the participants received auditory stimuli. They were instructed to ignore these auditory stimuli and to focus on the task at hand. These stimuli were sent by the E-Prime software (Psychology Software Tools, Inc., Pittsburgh, PA, USA) to their Sennheiser headset. In a similar fashion to the single-stimulus paradigm of Allison and Polich (2008), they consisted of 100 ms 1000 Hz pure tones (10 ms rise/fall, 65 dB SPL), with a random 6–30 s inter-tone interval (**Figure 2**). A minimum of 30 stimulations per block were presented.
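The probe timing described above can be sketched as follows. This is a minimal illustration and not the authors' E-Prime script: the function name `draw_probe_onsets`, the uniform distribution of the inter-tone interval, and the rule of redrawing the schedule until it holds 30 tones are all assumptions. Only the 6–30 s range, the 10-min block duration, and the minimum of 30 probes come from the text.

```python
import random

def draw_probe_onsets(block_s=600.0, iti_min=6.0, iti_max=30.0,
                      min_probes=30, seed=None):
    """Return probe onset times (in seconds) for one block.

    Assumptions (not from the paper): intervals are drawn uniformly,
    and the whole schedule is redrawn until it holds `min_probes` tones.
    """
    rng = random.Random(seed)
    while True:
        onsets, t = [], 0.0
        while True:
            t += rng.uniform(iti_min, iti_max)  # next inter-tone interval
            if t >= block_s:
                break
            onsets.append(t)
        if len(onsets) >= min_probes:
            return onsets

onsets = draw_probe_onsets(seed=0)
print(len(onsets), "probes; first onset at", round(onsets[0], 1), "s")
```

Under these assumptions the mean interval is 18 s, so a 10-min block holds roughly 33 probes on average and most draws already satisfy the 30-probe minimum.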

### Data Acquisition

Data acquisition was performed at the IRMaGe Neurophysiology facility (Grenoble, France). The participants' answers to the Rating Scale Mental Effort questionnaire (RSME; Zijlstra, 1993) and their resource management task root mean square (RMS) error scores were recorded, as well as their EEG activity using an Acticap® (Brain Products, Inc.) equipped with 32 Ag-AgCl unipolar active electrodes that were positioned according to the 10–20 system. The reference and ground electrodes used for acquisition were those of the Acticap, i.e., FCz for the reference and AFz for the ground. The electro-oculographic activity was also recorded using two electrodes positioned at the outer canthi of the eyes, and two placed respectively above and below the left eye. Impedance was kept below 10 kΩ for all electrodes. The signal was amplified using a BrainAmp™ system (Brain Products, Inc.) and sampled at 500 Hz with a 0.1 Hz high-pass filter and a 0.1 µV resolution. Participants were instructed to limit eye and body movements during the task.

### Signal Processing

The processing chain is detailed in **Figure 3**. Broadly, the raw data were preprocessed, then spatially filtered, and lastly classified. Details regarding each step of this chain are given in the following subsections. It should be noted that the same processing chain was replicated without the spatial filtering step in order to evaluate whether spatial filtering enhances the discriminability of the two workload levels.

#### Preprocessing

The digital EEG signal was band-pass filtered between 1 and 40 Hz and re-referenced to a common average reference. The signal was then epoched starting 100 ms before and ending 600 ms after the auditory stimulation. Next, artifacts related to ocular movements (saccades and blinks) were corrected using the signal recorded from the electro-oculographic (EOG) electrodes and the Second Order Blind Identification algorithm (SOBI; Belouchrani et al., 1997). This algorithm was chosen to perform the source decomposition because its assumption of non-correlation (rather than mutual independence) has been shown by Congedo et al. (2008) to make it more suitable for electrophysiological data. In order to get closer to a system that could be implemented on-line in a real-life setting, the two sources that were the most correlated to the EOG activity were canceled. All trials were kept for analysis. The AEPs were then extracted by subtracting a 100 ms baseline (i.e., mean signal amplitude) from the 600 ms segment that starts at the onset of the stimulation. Lastly, the data were decimated to 100 Hz using a five-point moving average.
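The last two steps of this preprocessing (baseline subtraction and decimation) can be illustrated with a short NumPy sketch. It is not the authors' code: the function name `extract_aep` is hypothetical, the baseline is assumed to be the 100 ms preceding stimulus onset, and the "five-point moving average" is read as averaging non-overlapping blocks of five samples, which is one plausible interpretation. Filtering, re-referencing and the SOBI-based ocular correction are not shown.

```python
import numpy as np

FS = 500                      # acquisition rate in Hz (from the text)
PRE_MS, POST_MS = 100, 600    # epoch limits around the probe (from the text)
DECIM = 5                     # 500 Hz -> 100 Hz

def extract_aep(epoch):
    """epoch: (n_samples, n_channels) array spanning -100..+600 ms at 500 Hz."""
    epoch = np.asarray(epoch, dtype=float)
    n_pre = PRE_MS * FS // 1000       # 50 baseline samples
    n_post = POST_MS * FS // 1000     # 300 post-stimulus samples
    baseline = epoch[:n_pre].mean(axis=0)          # mean amplitude per channel
    post = epoch[n_pre:n_pre + n_post] - baseline
    # A five-point moving average read out every fifth sample is a block mean.
    return post.reshape(n_post // DECIM, DECIM, -1).mean(axis=1)

aep = extract_aep(np.random.default_rng(0).standard_normal((350, 32)))
print(aep.shape)   # (60, 32)
```

The 60 samples per channel obtained here are the per-filter sample count used in the feature vector described in the spatial filtering subsection.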

#### Spatial Filtering

Then, the preprocessed data $\mathbf{X}$ ($N_s \times N_e$, with $N_s$ the number of samples and $N_e$ the number of channels) were spatially filtered, resulting in the signal $\mathbf{Z} = \mathbf{X}\mathbf{W}$ ($N_s \times N_f$, with $N_f$ the number of spatial filters). Each column of the matrix $\mathbf{W}$ contains a spatial filter, with its spatial pattern in the corresponding column of $\mathbf{A} = (\mathbf{W}^{-1})^{T}$. In this paper, we use CCA as a spatial filtering method. As detailed by Spüler et al. (2014), in a two-class scenario the CCA filters are computed in order to maximize the correlation between the EEG signal $\mathbf{X}$ and the matrix $\mathbf{Y} = \mathbf{D}_1\mathbf{P}_1 + \mathbf{D}_2\mathbf{P}_2$ that contains the time replication of the average ERP responses $\mathbf{P}_i$ for each class. The matrix $\mathbf{D}_i$ is a binary Toeplitz matrix that indicates the stimulation onsets for the $i$th class (Rivet et al., 2009).

Several methods have been proposed to solve CCA by computing orthonormal bases for the data matrices using either a QR decomposition or a singular value decomposition (SVD; Björck and Golub, 1973).

The CCA spatial filters were computed using the training data only. Then, the spatial filters with the two highest associated canonical correlations were selected. When these filters are applied on the testing data, the feature vector for the $j$th trial is given by the column concatenation $\mathbf{f}_j = \mathrm{vec}(\mathbf{X}_j [\mathbf{w}_1\ \mathbf{w}_2])$ with dimension $120 \times 1$ (i.e., 60 samples × 2 virtual electrodes). In order to have the same number of features for both processing chains (with and without spatial filtering), for the chain without spatial filtering the feature vector was composed of the concatenated signals of the C3 and Pz electrodes (chosen visually using the average spatial patterns presented in Section "Spatial Patterns").
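The filter computation and feature extraction described in this subsection can be sketched with NumPy, using the QR-plus-SVD route mentioned above. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, epochs are assumed to be non-overlapping and simply concatenated (so that the template matrix holds, for each trial, the average ERP of its class), and the concatenated data are assumed to have full column rank. Average-referenced data with canceled sources are rank-deficient, so real data would likely need a rank-revealing variant.

```python
import numpy as np

def erp_template(epochs, labels):
    """Replace every trial by the average ERP of its class (the rows of Y)."""
    Y = np.empty_like(epochs)
    for c in np.unique(labels):
        Y[labels == c] = epochs[labels == c].mean(axis=0)
    return Y

def cca_spatial_filters(epochs, labels, n_filters=2):
    """epochs: (n_trials, n_samples, n_channels); labels: (n_trials,).

    Returns W (n_channels, n_filters) and the canonical correlations.
    Assumes the concatenated data have full column rank.
    """
    epochs = np.asarray(epochs, dtype=float)
    n_channels = epochs.shape[-1]
    X = epochs.reshape(-1, n_channels)
    Y = erp_template(epochs, labels).reshape(-1, n_channels)
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(X)                 # orthonormal basis of the EEG
    Qy, _ = np.linalg.qr(Y)                  # orthonormal basis of the template
    U, corr, _ = np.linalg.svd(Qx.T @ Qy)    # singular values = canonical corr.
    W = np.linalg.solve(Rx, U[:, :n_filters])  # map back to channel space
    return W, corr[:n_filters]

def feature_vectors(epochs, W):
    """vec(X_j [w1 w2]) per trial: filtered time courses stacked column-wise."""
    Z = np.asarray(epochs, dtype=float) @ W  # (n_trials, n_samples, n_filters)
    return Z.transpose(0, 2, 1).reshape(len(Z), -1)
```

With 60 samples per trial and two filters, `feature_vectors` returns 120 features per trial. To follow the procedure in the text, `cca_spatial_filters` would be called on training trials only and the resulting `W` applied to the held-out trials.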

#### Classification

A single-trial classification was performed on the feature vector $\mathbf{f}$ using a Fisher linear discriminant analysis (FLDA) with a shrinkage estimation of the covariance matrices (Schäfer and Strimmer, 2005). As explained by Blankertz et al. (2011), this estimation method allows the use of LDA with high-dimensional features and gives results that generalize well. We used a random fivefold cross-validation procedure. The spatial filters were learned on the training set and applied to the testing set. In the same way, the shrinkage estimation was learned on the training set. The performance of the processing chains was assessed based on their intra-subject binary classification accuracy.
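A simplified version of this classification step is sketched below. It departs from the paper in one labeled respect: the shrinkage intensity `gamma` is a fixed, hypothetical parameter, whereas the paper estimates it analytically following Schäfer and Strimmer (2005). The function names are illustrative, and the feature matrix `F` is assumed to be precomputed; in the full chain the spatial filters would also be re-learned inside each training fold.

```python
import numpy as np

def fit_shrinkage_lda(F, y, gamma=0.5):
    """Fisher LDA with the pooled covariance shrunk toward a scaled identity.

    gamma is a fixed, hypothetical shrinkage intensity (an assumption);
    the paper uses an analytic estimate instead.
    """
    mu0, mu1 = F[y == 0].mean(axis=0), F[y == 1].mean(axis=0)
    centered = np.vstack([F[y == 0] - mu0, F[y == 1] - mu1])
    S = centered.T @ centered / (len(F) - 2)           # pooled covariance
    d = S.shape[0]
    S_shrunk = (1 - gamma) * S + gamma * (np.trace(S) / d) * np.eye(d)
    w = np.linalg.solve(S_shrunk, mu1 - mu0)           # discriminant direction
    b = -w @ (mu0 + mu1) / 2                           # threshold at the midpoint
    return w, b

def cross_validated_accuracy(F, y, n_folds=5, seed=0):
    """Random k-fold cross-validation; all estimates use training trials only."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        w, b = fit_shrinkage_lda(F[train], y[train])
        pred = (F[test_idx] @ w + b > 0).astype(int)
        scores.append(np.mean(pred == y[test_idx]))
    return float(np.mean(scores))
```

The shrinkage term is what keeps the covariance matrix invertible here: with 120 features and a few dozen training trials, the unregularized pooled covariance is singular.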

### Statistical Analyses

Statistical analyses were carried out on all results, i.e., subjective results from the RSME questionnaire, N1, P1, N2, P2, and P3 peak amplitude and latency from the AEP components, and classification results obtained using the processing chains with and without spatial filtering. All results were compared between themselves using repeated measures ANOVAs and Tukey post hoc tests. The significance level was set at 0.05.

## RESULTS

### Behavioral and Subjective Data

In a similar manner to Fournier et al. (1999), behavioral responses were standardized within each participant by dividing their response times to the resource management task by their proportion of correct responses. There was a significant effect of workload on this performance score (t = 2.99, p < 0.05): the participants' performance was significantly degraded in the high workload condition compared to the low workload condition (m1\_perf = 0.33; sd1\_perf = 0.12; m2\_perf = 0.43; sd2\_perf = 0.12). Moreover, the participants reported having exerted a significantly greater effort in the high workload condition than in the low workload one [F(1,7) = 38.04, p < 0.01; m1\_RSME = 45.5; sd1\_RSME = 18.2; m2\_RSME = 71.6; sd2\_RSME = 24.3].

FIGURE 4 | Grand average (in bold) and standard deviation (dotted line) of the auditory evoked potentials (AEPs) elicited by the ignored infrequent auditory probes depending on workload condition at major midline electrode sites (Fz, Cz, Pz, and Oz), as well as at auditory processing relevant sites (T7 and T8).

### Auditory Evoked Potentials


**Figure 4** gives the grand-average AEPs across participants at major midline electrode sites (Fz, Cz, Pz, and Oz), as well as at electrode sites located close to the auditory cortex (T7 and T8). **Figure 5** also gives the individual AEPs for the eight participants at the Pz electrode site (chosen to illustrate the results that follow regarding early components). The typical components reported to be modulated by workload can be noticed, i.e., N1, P1, P2, N2, and P3 (Kramer et al., 1995; Kok, 2001; Ullsperger et al., 2001; Schultheis and Jameson, 2004; Gomarus et al., 2006; Allison and Polich, 2008; Holm et al., 2009; Miller et al., 2011; Boonstra et al., 2013). However, the statistical analyses revealed only a few significant results at the group level, which is understandable given the largely overlapping variance of both signals (see standard deviations in **Figure 4**). Indeed, with increasing workload there were only trends at the Pz electrode for a decrease in amplitude of the P1 component (p = 0.11; **Figure 5**) and for a decrease in latency of the N1 component (p = 0.07). Moreover, when workload increased there was a significant decrease in latency of the P2 component at all electrode sites [F(1,7) = 6.74, p < 0.05].

### Spatial Patterns

The topographical representations of the two CCA spatial patterns obtained for each participant using the processing chain proposed in this paper are presented respectively in **Figures 6** and **7** (average across training folds). The first spatial pattern reveals that, in order to better discriminate workload levels, our first selected spatial filter enhances the activity from centro-parieto-occipital regions (consistent with attentional processing), while the second one enhances the activity from temporal regions (consistent with auditory processing) as well as prefrontal areas, which could be related to ocular activity.

### Filtered EEG Signal

The filtered EEG signals obtained from the testing sets for each participant using the first and second CCA filters are presented respectively in **Figures 8** and **9** (grand average across cross-validation folds). Both filters seem to mainly enhance the ERP activity of the low workload condition while decreasing it for the high workload condition from around 50 to 400 ms. This is particularly true for participants 3 and 4 for the early components. The signal's polarity fluctuates in a different manner depending on the participant and the filter; however, a general pattern emerges. In particular, for both filters we can see an enhancement in the low workload condition of the amplitude of the early components that peak between 80 and 250 ms, be it in the negative or in the positive range. Therefore, it seems that the filters enhance the relevance of early auditory evoked components but not so much that of later components.


### Classification Accuracy

The workload level classification results obtained using the single-stimulus paradigm with both the processing chain that includes a CCA spatial filtering step and the simpler processing chain without spatial filtering are given in **Figure 10** for each participant. There was a significant effect of the type of processing chain [F(1,7) = 39.90, p < 0.001]. Indeed, the chain that included the CCA spatial filtering step gave higher classification results than the one that did not. The mean percentage of correct binary classification across the eight participants was 90.51% (± 10.7 SD) and 71.49% (± 15.9 SD) respectively for the processing chains with and without spatial filtering. Using the chain that included the CCA filtering, the performance was optimal for participant 4 with a classification accuracy of 100% and a null standard deviation, and the lowest performance was obtained for participant 7 with a classification accuracy of 80% and a very large standard deviation of 21.73.

## DISCUSSION

Studies have demonstrated that workload modulates the ERPs elicited by attended or ignored auditory probes in a classical oddball paradigm involving deviant and standard tones (Kramer et al., 1995). Allison and Polich (2008) had also demonstrated this phenomenon using only infrequent standard tones (i.e., single-stimulus paradigm). However, to our knowledge, there was no literature regarding effective workload classification based on ignored auditory probes. Indeed, no signal processing chain had been applied to estimate workload in an automatic way from the ERPs of ignored auditory stimuli. Hence, this study was intended to shed new light on the potential use of ignored infrequent task-independent probes to efficiently and automatically assess mental workload in a minimally intrusive way. In order to do so, a single-stimulus paradigm similar to that of Allison and Polich (2008) was used, along with a processing chain that included a CCA spatial filtering step. The participants rated their effort as significantly higher for the high workload condition than for the low one and also exhibited a decrease in performance in the high workload condition compared to the low workload condition, akin to that observed by Fournier et al. (1999). Their ERPs revealed only trends for a decrease in P1 amplitude and N1 latency, as well as a significant decrease in P2 latency. These results are in line with the literature regarding resource allocation processes. In a general manner, the amplitude of the ERP components that occur within the first 250 ms following stimulus onset has been demonstrated to be influenced by the attentional capacity allocated to the eliciting stimulus and task operations (for a review see Kok, 1997). For instance, the P1 component amplitude is larger in active relative to passive viewing conditions (Fu et al., 2010). As for the N1 and P2 latencies, they have also been shown to decrease with a decrease in allocated attentional resources (Okita, 1979; Callaway and Halliday, 1982). However, more differences in amplitude were expected based on articles that describe workload modulations for ERPs elicited by task-dependent stimuli, specifically on late ERP components' amplitude (Kok, 2001; Ullsperger et al., 2001; Schultheis and Jameson, 2004; Gomarus et al., 2006; Holm et al., 2009; Miller et al., 2011; Boonstra et al., 2013). Yet, Kramer et al. (1995) had found that only the early components of the ERPs elicited by ignored task-irrelevant probes were relevant to perform a non-intrusive workload assessment, and that the late P300 component was not a good marker for such a goal. Our results confirm theirs as to which components are significantly modulated by workload using ignored auditory probes.

Despite the few significant results obtained at the group level regarding AEP components' amplitude, very accurate mental workload estimations were obtained using a signal processing chain that included a CCA spatial filtering step, with at least 80% of correct binary classifications for all participants and an average of 90.51%. This result is in line with the literature showing that classifiers can reveal statistical differences when standard statistical tests between ERPs do not (Noh and de Sa, 2014). Also, here the use of the CCA spatial filtering step significantly enhanced the estimation performance, as already demonstrated by Roy et al. (2015), and also reduced the variance in the results. Besides, this is only slightly lower than the 91% that Roy et al. (2016) obtained using task-independent visual probes. This is very promising given that here, unlike in their protocol, the task-independent probes required no overt response and were ignored by the participants. Moreover, the probes used in this experiment are auditory, whereas they were visual in their protocol. Lastly, those results are also higher than those obtained by previous studies that classified raw or spatially filtered ERPs elicited by task-dependent probes (Brouwer et al., 2012; Mühl et al., 2014). Therefore, the use of the single-stimulus paradigm coupled to a processing chain that includes a spatial filtering step allows a precise estimation of mental workload for a task that is very close to an actual work task. A limitation to this study is the number of trials, although in ecological settings it will be difficult to use more probes and to remain minimally intrusive. Nevertheless, according to Combrisson and Jerbi (2015), if we have more than 20 trials our performance should be over 70% in order to account for a significant detection with a p < 0.05 significance rate. Here, using the spatial filtering step we obtained at least 80% of correct detections, and an average of 90.51% with a minimum of 30 trials per condition. Therefore, we can say that our results were significantly above chance and that our method is quite efficient.
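The 70% figure quoted above for 20 trials can be recovered from a simple binomial model of chance-level guessing. The sketch below is an assumption about how such a threshold can be derived (an inverse binomial cumulative distribution), not code from the cited authors, and the function name `chance_threshold` is hypothetical.

```python
from math import comb

def chance_threshold(n_trials, n_classes=2, alpha=0.05):
    """Accuracy that must be exceeded to beat chance at level alpha.

    Finds the smallest k with P(K <= k) >= 1 - alpha for
    K ~ Binomial(n_trials, 1 / n_classes) and returns k / n_trials.
    """
    p = 1.0 / n_classes
    cdf = 0.0
    for k in range(n_trials + 1):
        cdf += comb(n_trials, k) * p**k * (1 - p)**(n_trials - k)
        if cdf >= 1 - alpha:
            return k / n_trials
    return 1.0

print(chance_threshold(20))   # 0.7
```

Under this model the threshold decreases as the number of trials grows, so with 30 trials per condition it falls below 70%, and the reported minimum accuracy of 80% lies above it.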

The spatial patterns of the selected CCA filters revealed that an enhancement of temporal and centro-parietal activity allowed reaching such high classification results. This is in accordance with the auditory nature of our probes. It is interesting to note that the activity that was enhanced by the spatial filters in the previously mentioned study of Roy et al. (2016), who used task-independent visual probes, originated from the occipital sites, in accordance with the visual nature of their stimuli. In our study, given that the probes were auditory, we observed a specific enhancement of the activity from the temporal electrode sites. The signal from the centro-parietal sites was also enhanced in their study as in ours. These sites are known to be involved in attentional processing, and more generally in resource engagement (Kok, 1997). Additionally, the patterns also revealed an implication of prefrontal sites, which could stem from an under-efficient ocular artifact correction step in our processing chain. Indeed, in order to preserve the cerebral activity as much as possible, we only deleted the 2 out of 32 sources that were the most correlated to respectively the vertical and horizontal EOG channels. Also, given that the MATB-II is a task that is very close to a real work task, it elicits more ocular movements than classical laboratory tasks during which participants are asked to fixate the center of the screen and to limit eye movements and blinks. In any case, if it is indeed ocular activity that our second spatial filter enhanced, it means that this ocular activity allows efficient mental workload estimation. This is not surprising given that blink frequency has been reported to vary depending on task difficulty (Holland and Tarlow, 1972; Tanaka and Yamaoka, 1993). Moreover, it is known that the appearance of an unexpected stimulation leads to a startle eyeblink reflex. This reflex is attenuated during a multiple-task (high workload) condition compared to a single-task (low workload) condition (Neumann, 2002). Therefore, the ocular activity produced in response to an infrequent auditory probe could be an efficient marker of task engagement and mental workload. As Roy et al. (2014) already argued, if ocular activity is helping to discriminate workload levels, why remove it? Hence, it might be interesting for future developments to use a processing chain that either does not include an ocular artifact correction step or does include one but performs classification by fusing two feature vectors, a clean EEG one and an ocular activity one. Multimodality in terms of origin of the physiological markers (e.g., cerebral or ocular) could therefore be the key to enhance classification accuracy for real-life implementations.

FIGURE 9 | Individual filtered test data – using the second CCA filter – depending on workload condition (grand average across cross-validation folds).

Besides, this study evaluates the relevance of a stimulation paradigm and its dedicated processing chain for an ecological task, the MATB. Although still in a laboratory setting, this task is very close to that performed by pilots and air traffic controllers. However, it modulates workload only by varying the number of tasks to perform in parallel, that is to say by varying the participants' degree of divided attention. In order to pursue the evaluation of this stimulation paradigm, future work should assess its relevance for several tasks that modulate workload based on different cognitive functions, e.g., working memory load, divided attention, executive functions. To our knowledge, only Berka et al. (2007) assessed the relevance of an EEG marker across several types of tasks, but they focused on frequency power in the classical EEG bands. Thus, in order to progress toward an efficient estimation in real-life settings, the literature still lacks a thorough comparison of ERP modulations due to workload across several tasks. Also, although the participants of our study told us that they were not annoyed by the infrequent auditory stimulations and generally forgot about them entirely, a more thorough investigation of the real cost of such a paradigm in terms of operator fatigue and efficiency should be carried out. What is more, in order to increase the practicality of EEG measures, the number of electrodes should be reduced. However, this study, along with that of Roy et al. (2015), clearly establishes the relevance of a spatial filtering step in order to enhance the discriminability between the two workload levels. Therefore, future studies should evaluate how to reduce the number of electrodes while keeping enough channels to efficiently apply such a filtering step.

Consequently, this study contributes to the neuroergonomics research topic on mental workload estimation by uncovering three main points. First, the single-stimulus paradigm in which participants are probed by infrequent task-independent and ignored probes allows minimally intrusive workload estimation. Second, a spatial filtering step such as a CCA filtering enables a very accurate AEP-based workload classification. Lastly, the combination of this single-stimulus paradigm with infrequent ignored probes and its dedicated processing chain allows efficient workload estimation for an ecologically valid task such as the MATB.

## CONCLUSION

This study has demonstrated as a proof-of-concept that a single-stimulus paradigm based on infrequent ignored auditory probes and its dedicated processing chain could allow a very accurate estimation of mental workload with a classification performance above 80% for every participant. This is also the first study to effectively classify workload based on ERPs elicited by ignored stimuli for a task that is very close to a real-life work situation. It paves the way toward the efficient use of ERPs for MSM and brings us closer to the implementation of user adaptive systems in ecological settings.

## AUTHOR CONTRIBUTIONS

Study conception and design: RR, SB, SC, AC. Acquisition of data: RR. Analysis and interpretation of data: RR, SB. Drafting of manuscript: RR. Critical revision: SB, SC, AC.

## ACKNOWLEDGMENTS

Grenoble Neurophysiology facility IRMaGe was partly funded by the French program "Investissement d'Avenir" run by the "Agence Nationale pour la Recherche": Grant "Infrastructure d'Avenir en Biologie Santé" (ANR-11-INBS-0006).

# REFERENCES




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Roy, Bonnet, Charbonnier and Campagne. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Gaussian Process Regression for Predictive But Interpretable Machine Learning Models: An Example of Predicting Mental Workload across Tasks

Matthew S. Caywood<sup>1</sup>, Daniel M. Roberts<sup>1,2</sup>, Jeffrey B. Colombe<sup>1</sup>\*, Hal S. Greenwald<sup>1</sup> and Monica Z. Weiland<sup>1</sup>

<sup>1</sup> The MITRE Corporation, McLean, VA, USA, <sup>2</sup> Department of Psychology, George Mason University, Fairfax, VA, USA

#### Edited by:

Thorsten O. Zander, Technical University of Berlin, Germany

#### Reviewed by:

Virginia R. De Sa, Vanderbilt University, USA; Nicolas Langer, Child Mind Institute, USA; Sheng-Hsiou Hsu, University of California, San Diego, USA

> \*Correspondence: Jeffrey B. Colombe jcolombe@mitre.org

Received: 23 October 2015 Accepted: 05 December 2016 Published: 11 January 2017

#### Citation:

Caywood MS, Roberts DM, Colombe JB, Greenwald HS and Weiland MZ (2017) Gaussian Process Regression for Predictive But Interpretable Machine Learning Models: An Example of Predicting Mental Workload across Tasks. Front. Hum. Neurosci. 10:647. doi: 10.3389/fnhum.2016.00647

There is increasing interest in real-time brain-computer interfaces (BCIs) for the passive monitoring of human cognitive state, including cognitive workload. Too often, however, effective BCIs based on machine learning techniques may function as "black boxes" that are difficult to analyze or interpret. In an effort toward more interpretable BCIs, we studied a family of N-back working memory tasks using a machine learning model, Gaussian Process Regression (GPR), which was both powerful and amenable to analysis. Participants performed the N-back task with three stimulus variants, auditory-verbal, visual-spatial, and visual-numeric, each at three working memory loads. GPR models were trained and tested on EEG data from all three task variants combined, in an effort to identify a model that could be predictive of mental workload demand regardless of stimulus modality. To provide a comparison for GPR performance, a model was additionally trained using multiple linear regression (MLR). The GPR model was effective when trained on individual participant EEG data, resulting in an average standardized mean squared error (sMSE) between true and predicted N-back levels of 0.44. In comparison, the MLR model using the same data resulted in an average sMSE of 0.55. We additionally demonstrate how GPR can be used to identify which EEG features are relevant for prediction of cognitive workload in an individual participant. A fraction of EEG features accounted for the majority of the model's predictive power; using only the top 25% of features performed nearly as well as using 100% of features. Subsets of features identified by linear models (ANOVA) were not as efficient as subsets identified by GPR. This raises the possibility of BCIs that require fewer model features while capturing all of the information needed to achieve high predictive accuracy.

Keywords: EEG, BCI, Gaussian Process Regression, machine learning, neuroergonomics

# INTRODUCTION

fnhum-10-00647 January 10, 2017 Time: 17:14 # 2

Neuroimaging methods, particularly inexpensive and noninvasive techniques such as electroencephalography (EEG) and functional near infrared spectroscopy (fNIRS), are increasingly being used to continuously assess the cognitive state of individuals during task performance, an example of Neuroergonomics (Parasuraman, 2003; Parasuraman and Rizzo, 2006). This information can be used to better understand the demands of the task being performed, assess the limitations of the individual, or be fed back into the system to adjust the task relative to the individual's current state. The use of physiological data to assess operator state has also recently been described as a 'passive' brain-computer interface (BCI) (Zander et al., 2009; Zander and Kothe, 2011), in contrast to traditional 'active' BCIs which utilize physiological data to allow an individual to act on the outside world (Wolpaw and Wolpaw, 2012).

Workload, the demand on the individual's attention and working memory, is a cognitive state of special interest for passive measurement during task performance. Cognitive Load Theory (CLT) (Sweller et al., 1998), for example, suggests that maintaining an optimal level of workload for a given task can assist in learning new material. Further, Coyne et al. (2009) incorporate CLT with Multiple Resource Theory (MRT) (Wickens, 2008), which distinguishes between different modes of mental demand, suggesting that real-time measurement of participant workload could be utilized to optimally redirect mental demand across the several modes of resources available, for example presentation of information in a spatial versus verbal code as delineated by MRT (see Coyne et al., 2009). Physiological measures of workload have been sought in a variety of tasks including N-back (Grimes et al., 2008; Baldwin and Penaranda, 2011; Ayaz et al., 2012; Brouwer et al., 2012), the Sternberg Memory Scanning Task (Wilson and Fisher, 1995; Baldwin and Penaranda, 2011), memory span tests (Baldwin and Penaranda, 2011; Chaouachi et al., 2011), the MAT-B multi-tasking scenario (Wilson and Russell, 2003b; Kothe and Makeig, 2011), and operational simulations (Wilson and Russell, 2003a; Ayaz et al., 2012). The extent of this literature reflects scientific awareness of the limitations of behavioral or subjective workload assessment techniques, including limited sensitivity (Gevins and Smith, 2003; Just et al., 2003), subjective bias, and intrusiveness.

EEG-based workload monitoring has been explored using a variety of different machine learning approaches, including step-wise linear discriminant analysis (SWDA) (Wilson and Fisher, 1995; Wilson and Russell, 2003a), artificial neural networks (ANN) (Wilson and Russell, 2003a; Baldwin and Penaranda, 2011), naïve Bayes models (Grimes et al., 2008), and least-angle regression (Kothe and Makeig, 2011).

To estimate workload as defined above based on EEG spectra, we applied a supervised machine learning approach, performing a statistical regression to take processed neurophysiological signals as inputs and to predict the load parameter N from the N-back task as an output. Many previous projects seeking to predict mental workload have used a classifier rather than a regressor approach. For example, Baldwin and Penaranda (2011) used three working memory tasks, each with two levels of imposed difficulty; Wilson and Fisher (1995) used a battery of tasks, each with two levels of difficulty; and both Wilson and Russell (2003b) and Kothe and Makeig (2011) used the Multi-Attribute Task Battery with two levels of difficulty. Wilson and Russell (2003a) distinguished between up to seven conditions, using three different simulated ATC tasks with three, three, and one level of difficulty, respectively. However, the condition levels were treated as categorical, with the authors using ANN and stepwise discriminant analysis as classifiers to discriminate between data from each condition. Similarly, Grimes et al. (2008) predicted working memory load within 4 levels of the N-back task (0- through 3-back), but classified the levels as categorical labels, rather than as a continuous construct.

Treating mental workload as a series of categorical states has the effect of forcing estimates of workload to reside in discrete categorical bins without any continuous variation. The N-back task is comprised of discrete task load levels N = 1, 2, 3, and considered in isolation, this task is readily amenable to prediction based on a classifier. However, we conceptualize the mental state of workload as potentially lying along a continuum of values that the N-back task visits at discrete levels due solely to the structure of the task, not necessarily due to the inherent structure of working memory and attentional resources. The neurophysiological data is continuous in nature, and in order to preserve any potential information about workload as a continuously varying mental state, we treated the predicted N as a continuous variable even though all the training data for N was discrete. This required the use of a regression method rather than a classification method. One consequence of treating workload as a continuous measure is that the appropriate measure of error to be minimized in supervised training, as well as for operational testing, is continuous rather than discrete. For this reason, we present predictor performance primarily in standardized mean square error (sMSE), discussed more fully in the Section "Materials and Methods."

Our choice of regression on a continuous task load variable was also motivated by a follow-on application of methods described here for estimating cognitive workload in a highly realistic en-route air traffic control (ATC) simulation, in which task difficulty was multivariate, and in each dimension highly granular and ordinal. This required a regressor rather than a classifier. The results presented here are meant to relate workload estimation to the dominant baseline literature on workload, and to generalize those studies to a broad variety of operational contexts including but not limited to ATC.

We employed Gaussian Process Regression (GPR; Rasmussen and Williams, 2005), a type of non-parametric regression, in which a single unknown target variable's status (in this case, the number 'N' back) is estimated as a function of the state of one or more known input variables (in this case, power spectra at each electrode in the EEG montage).

Parametric regression methods, for example multiple linear regression (MLR), replace training data with a user-specified function, such as a line or curve or surface in the geometric space of inputs and outputs, whose parameters can be fitted to optimize estimation of outputs from inputs over the training data. For parametric methods, after the regression weights have been obtained, the original training data may be discarded. Non-parametric regression methods, by contrast, may keep the original training data to use as a scaffold for constructing a regressor function. Test data is compared to the training data points, with output value of the test point estimated via the distance of the test data input to the training data input. As a result of this weighting, estimates of output values form a locally smooth surface spanning the input data, in a process often referred to simply as smoothing. Non-parametric regression only assumes that data points with similar input values will be close in the output space. For GPR specifically, the form of the local weighting is defined by the covariance function and associated hyperparameters learned during model training.

This non-parametric GPR approach has several benefits with respect to cognitive monitoring. First, GPR makes few assumptions about the shape of the estimator function beyond the assumptions associated with the choice of covariance function. This is beneficial especially in high-dimensional input spaces, as is the case when there are many known variables for each data point, and the shape of the relationship between knowns and unknowns cannot easily be visualized and understood by a researcher.

Second, a GPR model can be constructed to change the width of the local weighting functions separately for each known input dimension during training, providing an indirect measure of that input dimension's relevance. Measuring relevance adds interpretability to the model, and can be used to relate the features used by the model to existing literature, or aid in understanding which of the input variables could be left out of the analysis with little or no reduction in predictive accuracy.

A third benefit of GPR is its robustness to spontaneous failure of sources of input during operational test use of a BCI, such as the loss of good electrical contact by an EEG electrode or other equipment failure. Changes in the set of features available to machine learning methods challenge parametric methods such as linear or quadratic models, which typically have dependencies between features. In contrast, GPR depends more directly on the data and is robust to such changes; it can even be applied to data containing many fewer features than the model received during training.

Finally, a fourth major benefit of GPR for cognitive monitoring is its inherently probabilistic nature, returning both point predictions and confidence intervals around those predictions. Confidence values associated with each prediction may be used to dynamically inform decisions about when to trust a trained model's predictions in operational settings.

While GPR has been used to classify EEG in the context of a BCI task involving imagined hand movement (Zhong et al., 2008; Wang et al., 2009), its use in cognitive state assessment has been limited (although see Chaouachi et al., 2011, 2015).

A reasonable assumption in cognitive neuroscience is that similar regions of the brain are engaged in similar functions across individuals during a specific task. This assumption motivates an approach to research that seeks constant neurophysiologic signatures for cognitive functions that generalize broadly among human participants. The present study has employed a more conservative and directed approach based on another reasonable assumption, which is that brain function involves learning, and that as a result, meaningful idiosyncratic differences may be expected among individuals with different learning histories, or within an individual over learning timescales. As such, we focused our analysis on a same-day, same-individual construct for training and testing our machine learning methods. Further, we did not set out to evaluate the neural basis of working memory and attention during task loading, although we regard this as an important goal for other research. Our goal was simply to evaluate the effectiveness and interpretability of a best-of-class machine learning approach for real-time, passive BCI targeted to cognitive monitoring in its simplest and most direct form.

We present a paradigm for assessment of cognitive workload for an operator, specifically the working memory and attentional demand based on measurable task load. We predict workload within several N-back tasks by training a GPR model, then testing it on held-out data from the same participant and session. The N-back task variants, which were designed to have face validity to an operational ATC task, include the following variations: auditory, numeric, and spatial. Finally, we analyze the GPR model to identify which EEG electrode sites, frequency bands, and derived features are essential to the predictive accuracy of the model, which serves to set a lower bound on the number of features required for accurate prediction.

# MATERIALS AND METHODS

#### Participants

The study included 16 male participants, aged 39–62 years old, selected for operational experience in the target operational domain of ATC. All participated voluntarily, and provided written informed consent after having had the procedures of the study described to them. Personally identifiable information for all participants was anonymized and kept secure by a trusted agent. All participants were salaried employees of the MITRE Corporation, and were compensated by allowing them to apply the time spent participating in the study to their work hours. Human subjects procedures were approved by the MITRE Corporation Institutional Review Board (MIRB), to which the Code of Federal Regulations, Title 45 (Public Welfare), Department of Health and Human Services, Part 46 (Protection of Human Subjects) applies for federally funded research involving human subjects.

#### Task

To change working memory load in a controlled manner, we used an N-back working memory task in one of three stimulus modes (Auditory, Numeric, Spatial) and three task levels (N = 1, 2, 3) for each mode. The N-back task required participants to view a series of stimuli and press the spacebar key when the currently presented stimulus matched the stimulus presented N stimuli before the current one. The task was implemented in BrainWorkshop (Hoskinson, 2011), modified to synchronize with the EEG system.

The Auditory stimuli were NATO letters ('Alpha,' 'Bravo,' 'Charlie,' etc.) spoken by a computer-generated voice. Numeric stimuli were numbers of 3 or 4 digits, e.g., "505" or "6099," presented in the center of the screen. Spatial stimuli were blue squares presented in one of eight spatial locations on the screen, in a 3 × 3 grid leaving out the center square (**Figure 1**). Within each condition, eight unique stimuli were presented over the course of the block. Within the spatial condition, these eight stimuli were the aforementioned eight spatial positions. Within the Auditory and Numeric blocks, these eight stimuli were eight sounds or images randomly selected from a pool of 26 possible NATO letter sounds or 26 possible Numeric images. Each trial lasted 3 s, with visual stimuli in the Numeric and Spatial conditions remaining onscreen for the first 500 ms of the trial. The stimuli for each trial were selected pseudorandomly from the eight possible stimuli within the block, with an N-back match additionally forced on 1/8 of trials. The combination of the inherent 1/8 probability of random match and the independent forced match probability of 1/8 results in an overall 76.56% (7/8 × 7/8) chance of non-matching stimuli and 23.44% chance of matching stimuli. Participants were instructed to respond to matching stimuli by pressing the spacebar key on a standard computer keyboard, while non-matching trials did not require a response.
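The match and non-match rates quoted above follow from simple arithmetic. The short Python sketch below is not part of the original task code; it only checks the figures under the stated assumption that the forced match and the chance match are independent, and the variable names are illustrative.

```python
from fractions import Fraction

# Assumption (from the text): a match is forced on 1/8 of trials, and an
# unforced trial still matches by chance with probability 1/8 (eight stimuli).
p_forced = Fraction(1, 8)
p_chance = Fraction(1, 8)

# A trial is a non-match only if it is neither forced nor a chance match.
p_nonmatch = (1 - p_forced) * (1 - p_chance)
p_match = 1 - p_nonmatch

print(f"non-match: {float(p_nonmatch):.4%}")  # 76.5625%
print(f"match:     {float(p_match):.4%}")     # 23.4375%
```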

The task was performed in blocks of 100 comparison trials of a single modality and task level. Participants were allowed to take short breaks between 100-trial blocks. Three 100-trial blocks (one per N-back level) were performed for each of the three stimulus modes, totaling 900 trials for each participant. Stimulus modes were counterbalanced across participants, while N-back levels were performed in the order 1-Back, 2-Back, 3-Back, within each modality block. Before each block, a resting baseline condition was recorded; however, data from this resting baseline condition were not included in the regression models. Prior to the experimental blocks, participants completed 20 practice trials at each N-back level of a Color variant, in which participants indicated if the color of the current stimulus (a square presented in the center of the display) matched the color of the stimulus presented N stimuli prior.

Following each block, participants reported their subjective rating of block difficulty (subjective workload) on a 1–7 Likert scale from low to high workload. Subjective workload ratings were collected in order to confirm that the N-back task was subjectively experienced as more demanding as N-back level increased, as well as to investigate any subjective differences in demand between N-back modalities used (Auditory, Numeric, Spatial).

#### Behavioral Data

Accuracy on the N-back task is evaluated within each block as the number of true positive (TP) responses (correctly responding when the current stimulus matched the stimulus presented "N" back), divided by the sum of the TP responses, false positive (FP) responses (incorrectly responding when the current stimulus did not match the stimulus presented "N" back), and false negative (FN) responses (incorrectly failing to respond when the current stimulus matched the stimulus presented "N" back). This is equivalently described as accuracy = TP/(TP + FP + FN). As the N-back match probability was 23.44%, this places an upper limit of chance performance at 23.44%. For example, responding to all stimuli regardless of N-back match would generate an accuracy of 23.44%, while responding to no stimuli regardless of N-back match would generate an accuracy of 0%.
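The accuracy definition and the two limiting cases described above can be illustrated as follows. This is a minimal sketch rather than the authors' scoring code; the function name and the 64-trial block (chosen so that 15 matches give exactly 23.44%) are hypothetical.

```python
def nback_accuracy(tp, fp, fn):
    """Block accuracy as defined in the text: TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    # Assumption: with no matches and no responses the ratio is undefined.
    return tp / denom if denom else float("nan")

# Hypothetical block: 64 trials, 15 of which are N-back matches.
n_match, n_nonmatch = 15, 49

# Responding on every trial: all matches are hits, all non-matches are false alarms.
respond_to_all = nback_accuracy(tp=n_match, fp=n_nonmatch, fn=0)

# Never responding: every match is a miss.
respond_to_none = nback_accuracy(tp=0, fp=0, fn=n_match)

print(respond_to_all, respond_to_none)  # 0.234375 0.0
```

Because correct rejections do not enter the ratio, a participant cannot score well simply by withholding responses on the frequent non-match trials.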

Behavioral accuracy and subjective workload were assessed via separate two-way repeated-measures ANOVAs, with factors N-back level (1, 2, 3) and task mode (Auditory, Numeric, Spatial). Mauchly's test was used to assess sphericity, with F-values adjusted via Greenhouse–Geisser correction where appropriate. Effect size is indicated by generalized eta squared ($\eta^2_G$) (Olejnik and Algina, 2003), a measure of effect size appropriate for repeated measures designs (Bakeman, 2005).

#### EEG Collection

EEG data were collected via a 32-channel actiCAP active electrode system and BrainAmp amplifier at a sampling rate of 500 Hz using Recorder software (Brain Products GmbH), with online reference at electrode FCz and online bandpass filter from 0.1 to 250 Hz.

#### Processing

Offline data analysis was completed with the EEGLAB toolbox for MATLAB (Delorme and Makeig, 2004) and custom MATLAB scripts. EEG signals were band-pass filtered to 1–50 Hz, downsampled to 250 Hz, and re-referenced to the average of the left and right mastoid sites (TP9 and TP10).

Trial epochs were extracted from 0 to 3 s post-stimulus onset, and labeled according to N-back level, stimulus mode, and behavioral accuracy. Channels and epochs containing paroxysmal artifacts such as gross EMG or cap movement were identified via visual inspection, and were removed from further analysis (Delorme et al., 2007). Between 0 and 4 electrodes were removed per participant (mean: 0.75 electrodes). The remaining epochs were decomposed via independent component analysis (ICA), using the extended InfoMax algorithm as implemented in EEGLAB. For each participant, independent components (ICs) representing sources of artifact including eye blinks, lateral eye movements, and muscle activity were manually identified based on IC topography, frequency spectra, and time-domain activity, and were removed from the data.

#### Feature Extraction


Band-power features were extracted by transforming each epoch from the time to frequency domain via the Welch method. The Welch method averages the Fast Fourier Transform (FFT) results from several overlapping Hamming windowed segments. A window size of 500 points (2 s) and overlap of 250 points (1 s) were used, along with a 512 point FFT.

For each channel, frequencies were averaged into 6 prespecified bands, delta: 1–3 Hz, theta: 4–7 Hz, low alpha: 8–10 Hz, high alpha: 11–12 Hz, beta: 13–25 Hz, gamma: 26–40 Hz.
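The band-power extraction can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the authors' MATLAB implementation: the function names are hypothetical, band edges are treated as inclusive, and the one-sided scaling factor is omitted because only relative band power matters for the example.

```python
import numpy as np

FS = 250          # sampling rate after downsampling (Hz)
NPERSEG = 500     # 2 s Hamming window
NOVERLAP = 250    # 1 s overlap
NFFT = 512        # FFT length (segments are zero-padded from 500 to 512)
BANDS = {"delta": (1, 3), "theta": (4, 7), "low_alpha": (8, 10),
         "high_alpha": (11, 12), "beta": (13, 25), "gamma": (26, 40)}

def welch_psd(x, fs=FS, nperseg=NPERSEG, noverlap=NOVERLAP, nfft=NFFT):
    """Average the periodograms of overlapping Hamming-windowed segments."""
    x = np.asarray(x, dtype=float)
    window = np.hamming(nperseg)
    step = nperseg - noverlap
    scale = fs * np.sum(window ** 2)
    periodograms = []
    for start in range(0, len(x) - nperseg + 1, step):
        spectrum = np.fft.rfft(x[start:start + nperseg] * window, n=nfft)
        periodograms.append(np.abs(spectrum) ** 2 / scale)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return freqs, np.mean(periodograms, axis=0)

def band_powers(x):
    """Mean spectral power within each of the six pre-specified bands."""
    freqs, psd = welch_psd(x)
    return {name: psd[(freqs >= lo) & (freqs <= hi)].mean()
            for name, (lo, hi) in BANDS.items()}

# A 3 s epoch (750 samples) yields two overlapping 2 s segments.
t = np.arange(3 * FS) / FS
epoch = np.sin(2 * np.pi * 9.0 * t)      # synthetic 9 Hz oscillation
features = band_powers(epoch)
print(max(features, key=features.get))   # low_alpha
```

Applying `band_powers` to every retained channel of an epoch and concatenating the results gives one feature vector per trial.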

Band power values were then converted to the natural logarithm of their original values to more closely approximate a Gaussian distribution, and each feature was then zero-centered and normalized by its standard deviation on the training set. The same normalization was applied to trials from both training and test sets: both were z-scored relative to the mean and standard deviation of the training set, rather than of the test set, to place the test trials on the same scale as the training trials. Scaling the test set trials to the test set mean and standard deviation could eliminate meaningful differences between training and test sets, for example when using a trained model to derive workload predictions on a new task that is on average more difficult than the training task. In addition, for online prediction applications the mean and standard deviation of the full test set are unknown in advance.
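The normalization step can be illustrated with a short Python sketch. The function names and the synthetic data are hypothetical; the point is only that the scaling statistics are estimated on the training trials and then reused unchanged for the test trials.

```python
import numpy as np

def fit_log_zscore(train_power):
    """Per-feature mean and SD of log band power, estimated on training trials only."""
    log_train = np.log(train_power)
    return log_train.mean(axis=0), log_train.std(axis=0)

def apply_log_zscore(power, mean, std):
    """Log-transform trials and scale them with the training-set statistics."""
    return (np.log(power) - mean) / std

rng = np.random.default_rng(0)
train = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 4))  # trials x features
test = rng.lognormal(mean=0.5, sigma=1.0, size=(50, 4))    # systematically shifted

mu, sd = fit_log_zscore(train)
z_train = apply_log_zscore(train, mu, sd)
z_test = apply_log_zscore(test, mu, sd)

# The training features are exactly standardized; the shift in the test
# trials is preserved instead of being normalized away.
print(z_train.mean(axis=0).round(6), z_test.mean().round(2))
```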

### Machine Learning: Gaussian Process Regression

The information present in EEG band-power features about task level was analyzed using a continuum of methods including ANOVA, MLR and GPR. Additionally, while imposed task level (the number 'N' back) was of primary interest, models were additionally constructed using participants' subjective rating of their mental demand as labels.

For machine learning, a feature vector composed of each of the 6 bands at each of the 32 electrode sites, less any electrodes rejected due to excessive artifact or poor electrode contact, was taken as input. The features were normalized as described in Section "Feature Extraction." The length of the feature vector is the product of the number of bands and number of electrode sites analyzed, and was thus of length 192 (6 × 32) for the 11 participants for whom no electrodes were rejected due to artifact, and six elements (frequency bands) less for each rejected channel for the remaining five participants.

#### Gaussian Process Regression

A GPR model, a form of Bayesian non-linear regression, was trained using the Gaussian Processes for Machine Learning (GPML) library for MATLAB (Rasmussen and Williams, 2005; Rasmussen and Nickisch, 2010). A GPR model is defined primarily by the selection of a covariance function, which defines how the expected value of the target variable changes as values change across the input space. Here, a squared-exponential covariance function with automatic relevance determination (ARD) was used, in conjunction with a constant zero mean function. ARD refers to the inclusion of a length-scale for each feature within the covariance function, which can be examined after training to determine the relative importance of that feature to prediction. As described by Rasmussen and Williams (2005), the squared exponential covariance function with ARD is defined as:

$$k(x_p, x_q) = \sigma_f^2 \exp\left(-\frac{1}{2}(x_p - x_q)^\mathsf{T} \operatorname{diag}(\ell)^{-2} (x_p - x_q)\right) \tag{1}$$

Where $x_p$ and $x_q$ represent values in the input space, $\sigma_f^2$ represents the noise-free signal variance, and $\ell$ is a vector of length-scales (one for each feature).

This covariance function is stationary in the sense that the relationship between values in the input space depends only on their distance, not on their particular location in the space. The squared exponential covariance function was selected a priori based on its relative simplicity; the assumption inherent in its use is that data points that are close in the input space will tend to be close in the output space. The constant zero mean function was selected as the data was normalized to have zero mean in the training set. Rasmussen and Williams (2005) present an in-depth discussion of the properties of different covariance and mean functions in the context of GPR.
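The squared-exponential covariance with one length-scale per feature can be written compactly in NumPy. The sketch below is an illustration, not the GPML implementation used in the study; the function name and toy inputs are assumptions. It also demonstrates the two properties discussed above: stationarity, and the way a very long length-scale makes a feature nearly irrelevant to the covariance.

```python
import numpy as np

def se_ard_kernel(A, B, length_scales, signal_var):
    """Squared-exponential ARD covariance between rows of A (n x d) and B (m x d)."""
    A = np.asarray(A, dtype=float) / length_scales
    B = np.asarray(B, dtype=float) / length_scales
    sq_dist = (np.sum(A ** 2, axis=1)[:, None]
               + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    # Clip tiny negative values caused by floating-point round-off.
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0))

X = np.array([[0.0, 0.0], [1.0, 5.0]])
ell = np.array([1.0, 1e6])   # second feature has a very long length-scale

K = se_ard_kernel(X, X, ell, signal_var=1.0)
K_shifted = se_ard_kernel(X + 3.0, X + 3.0, ell, signal_var=1.0)

# Stationarity: shifting both inputs by the same offset leaves K unchanged.
# Relevance: the large difference in feature 2 barely matters, so K[0, 1]
# is essentially exp(-0.5), the value implied by feature 1 alone.
print(np.allclose(K, K_shifted), K[0, 1])
```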

The covariance and mean functions were used in conjunction with a Gaussian likelihood for prediction via the following equations, all from Rasmussen and Williams (2005):

$$f_* \mid X, y, X_* \sim \mathcal{N}\left(\bar{f}_*, \operatorname{cov}(f_*)\right) \tag{2}$$

Where $f_*$ is the vector of latent function values at the test inputs, whose posterior distribution is Gaussian, $X$ is a matrix of training inputs, $y$ is a vector of training targets, $X_*$ is a matrix of test inputs, $\bar{f}_*$ is the posterior mean, and $\operatorname{cov}(f_*)$ is the posterior covariance.

The posterior mean is specified as:

$$\bar{f}_* \triangleq \mathbb{E}\left[f_* \mid X, y, X_*\right] = K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} y \tag{3}$$

The posterior covariance is specified as:

$$\operatorname{cov}(f_*) = K(X_*, X_*) - K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} K(X, X_*) \tag{4}$$

Where $K$ indicates a covariance matrix, and $\sigma_n^2$ is a noise variance term.
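Equations (3) and (4) translate almost directly into code. The following Python sketch is illustrative and is not the GPML routine used in the study; it assumes a zero mean function, takes precomputed covariance matrices as inputs, and uses a Cholesky factorization in place of an explicit matrix inverse. The one-dimensional kernel and toy data are hypothetical.

```python
import numpy as np

def gp_posterior(K, K_s, K_ss, y, noise_var):
    """Posterior mean (Eq. 3) and covariance (Eq. 4) of a zero-mean GP.

    K    : K(X, X),   n x n covariance of the training inputs
    K_s  : K(X*, X),  m x n covariance between test and training inputs
    K_ss : K(X*, X*), m x m covariance of the test inputs
    """
    n = K.shape[0]
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # [K + s2*I]^-1 y
    V = np.linalg.solve(L, K_s.T)                        # L^-1 K(X, X*)
    mean = K_s @ alpha
    cov = K_ss - V.T @ V
    return mean, cov

# Toy 1-D example with a unit squared-exponential kernel (illustrative only).
k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)
x = np.array([-1.0, 0.0, 1.0])
y = np.array([-1.0, 0.0, 1.0])
x_star = np.array([0.5, 5.0])   # one point near the data, one far away

mean, cov = gp_posterior(k(x, x), k(x_star, x), k(x_star, x_star), y, noise_var=0.1)
std = np.sqrt(np.diag(cov))
print(mean, std)
```

Far from the training data the posterior mean returns toward the zero prior mean and the posterior standard deviation approaches the prior value, which is what makes the predictive variance usable as a confidence measure.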

The covariance function contains several hyperparameters, which are optimized during model training. Hyperparameters for the covariance function include a length-scale for each feature ($\ell$), and a noise-free signal variance ($\sigma_f^2$). In addition, the covariance function is evaluated using a Gaussian likelihood function, which has a single hyperparameter, the noise variance ($\sigma_n^2$). The constant zero mean function has no hyperparameters.

Prior to each model run, these hyperparameters are set to default values, which are subsequently adjusted during model training. Here, for hyperparameters associated with the squared exponential covariance function with ARD, the length-scale for each feature was set to 10, and the signal variance was set to 1. Additionally, for the hyperparameter associated with the Gaussian likelihood function, the likelihood variance was set to 1. These hyperparameters are then optimized within each model run by the GPML library, by minimizing the negative log marginal likelihood on the training set, over 100 function evaluations.
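The training criterion can be made concrete with a short sketch. The code below evaluates the standard negative log marginal likelihood of a zero-mean GP with Gaussian noise (Rasmussen and Williams, 2005) for a given hyperparameter setting; it is not the GPML optimizer, and the helper names, synthetic data, and the hand-picked alternative setting are assumptions made for illustration.

```python
import numpy as np

def neg_log_marginal_likelihood(K, y, noise_var):
    """0.5*y'[K+s2*I]^-1 y + 0.5*log|K+s2*I| + (n/2)*log(2*pi)."""
    n = len(y)
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2 * np.pi)

def se_ard(X, length_scales, signal_var):
    """Squared-exponential ARD covariance of X with itself."""
    Z = X / length_scales
    sq_dist = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist)

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 3))                            # toy z-scored features
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.standard_normal(40)   # only feature 0 is informative

# Initial values quoted in the text: length-scale 10, signal variance 1, noise variance 1.
nlml_init = neg_log_marginal_likelihood(se_ard(X, np.full(3, 10.0), 1.0), y, 1.0)

# A hand-picked alternative: shorter length-scale on feature 0 and lower noise.
nlml_alt = neg_log_marginal_likelihood(
    se_ard(X, np.array([0.7, 10.0, 10.0]), 1.0), y, 0.05)

print(nlml_init, nlml_alt)   # the optimizer searches for settings that lower this value
```

An optimizer repeats this evaluation, together with its gradient, while adjusting the length-scales and variances; features whose length-scales grow large during this search contribute little to the covariance.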

After training the model, new predictions are made via the conditional distribution of target output values, given the test inputs, training inputs, training targets, covariance function, and associated hyperparameters. The mean and variance of the posterior target distribution are used to generate point predictions and confidence intervals, respectively.

#### Evaluation of Model Performance

Model performance at predicting N-back task level (N) was assessed via fivefold cross-validation with a five-trial buffer between training and test sets. Data from each modality and N-back level block (9 blocks total) was split into five partitions, with each partition containing a contiguous block of trials. On any given fold of the fivefold cross-validation procedure, 4 of the 5 partitions (80% of data) were used for training the GPR model, with the remaining partition held out as a test set for assessing model performance. Additionally, any trials from the test set that occurred within five trials of a member of the training set were removed from the test set and not included in measures of model performance. Trials were removed from the test set, and not the training set, to ensure a constant amount of training data (4 of 5 partitions or 80%) across runs. These neighboring trials were removed in order to reduce any short-timescale effects of attention or participant posture on model performance. After identification of the training and test trials from each of the 9 blocks, the data from these 9 blocks (3 N-back levels and 3 modalities) were pooled for training and testing, labeled by N-back level and subjective workload rating provided by each participant after each block, but not labeled by modality. Data from the three modalities were pooled in an attempt to identify features indicative of working memory load independent of any particular stimulus modality. Measures of prediction quality were obtained for each participant by combining the results from the five model runs. Specifically, for each participant, the true and predicted values from each model run were collected and used to compute a single sMSE and a single Pearson correlation coefficient for that participant. On average, 661.25 trials were included in each training set, and 88.66 trials were included in each test set. Despite the use of fivefold cross-validation, the number of trials in the average test set is less than 1/4 of the trials in the average training set due to the removal of trials from the test partitions that occurred within five trials of a trial from the training set.
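The buffered split can be sketched for a single block as follows. This is an illustrative reading of the procedure, not the authors' BCILAB code: the function name is hypothetical, "within five trials" is interpreted as an index distance of five or fewer, and the example uses a complete 100-trial block, whereas the real data also lost epochs to artifact rejection.

```python
import numpy as np

def buffered_folds(n_trials, n_folds=5, buffer=5):
    """Yield (train_idx, test_idx) for contiguous folds within one block.

    Test trials lying within `buffer` trials of any training trial are dropped,
    so the training set always keeps 4 of the 5 partitions.
    """
    partitions = np.array_split(np.arange(n_trials), n_folds)
    for k, part in enumerate(partitions):
        train = np.concatenate([p for j, p in enumerate(partitions) if j != k])
        # Distance from each candidate test trial to its nearest training trial.
        nearest = np.min(np.abs(part[:, None] - train[None, :]), axis=1)
        yield train, part[nearest > buffer]

sizes = [(len(train), len(test)) for train, test in buffered_folds(100)]
print(sizes)  # [(80, 15), (80, 10), (80, 10), (80, 10), (80, 15)]
```

Interior partitions lose trials on both sides and edge partitions on one side only, which is why the retained test trials amount to well under a quarter of the training trials.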

As a parametric regression model for performance comparison to GPR, we used MLR with one linear term per feature plus a constant term. The model training and testing functions were implemented using BCILAB (Kothe and Makeig, 2013). Our BCILAB plugins for Gaussian Processes (a BCILAB wrapper around the GPML library), and for MLR (a BCILAB wrapper around the 'regress' function in MATLAB), are available as open source code.
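For comparison, the MLR baseline amounts to an ordinary least-squares fit with an explicit constant column. The sketch below uses NumPy rather than the MATLAB `regress` function wrapped by the authors; the function names and synthetic data are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit with one linear weight per feature plus a constant term."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef   # coef[0] is the constant term

def predict_mlr(coef, X):
    """Apply the fitted weights; the training data are no longer needed."""
    return coef[0] + X @ coef[1:]

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = 2.0 + X @ np.array([1.0, -0.5, 0.0])   # noise-free linear target

coef = fit_mlr(X, y)
print(coef.round(6))
```

Unlike the GPR model, the fitted MLR model is fully described by its weight vector.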

#### TABLE 1 | Predictive ability of feature subsets.



Feature subsets are categorized by whether they are subsets of electrode sites (over all frequency bands), frequency bands (at all electrode sites), or selected from sites × bands. <sup>a</sup>For this analysis, Emotiv EPOC sites AF3 and AF4, which were not included in our configuration, were substituted by adjacent sites Fp1 and Fp2.

Continuous prediction accuracy was quantified using two metrics: standardized mean squared error (sMSE) and Pearson correlation coefficient (r). sMSE is the mean squared error (MSE) of true and predicted values, divided by the variance of the true values. sMSE has a characteristic scale of 0–1 and, due to the standardization on the variance of the true values, is dimensionless, unlike the MSE. Like MSE, sMSE equals 0 for a perfectly accurate prediction. However, due to standardization sMSE equals 1 for a naïve model which always predicts the mean of the ground truth values, and exceeds 1 for predictions that are more erroneous than could be obtained by only predicting the mean of the ground truth values. For machine learning purposes, r ranges from 1 (perfect accuracy) to 0 (uncorrelated); however, a naïve model predicting a constant output will show positive r. Additionally, although mental workload is argued to be best treated as a continuous, rather than discrete, variable, we have also included discretized versions of the continuous MLR and GPR output. These predictions were included to allow the presented results to be more readily compared with other reports in which discrete classification is performed, and are computed by rounding each continuous prediction to the nearest label in the training set (i.e., a continuous prediction of 2.4 is relabeled as 2), then computing the fraction of predictions which have the correct label.
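As a rough illustration of these three measures, the following Python sketch computes sMSE, Pearson's r, and the discretized accuracy for a small set of made-up values. The function names are hypothetical, the example numbers are not data from the study, and the use of the population variance (dividing by n) is an assumption chosen so that a model predicting the mean scores exactly 1.

```python
import math


def smse(y_true, y_pred):
    """Mean squared error divided by the variance of the true values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    var_true = sum((t - mean_true) ** 2 for t in y_true) / n
    return mse / var_true


def pearson_r(y_true, y_pred):
    """Pearson correlation between true and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)


def discretized_accuracy(y_true, y_pred, labels):
    """Round each prediction to the nearest training label, then score."""
    nearest = [min(labels, key=lambda lab: abs(lab - p)) for p in y_pred]
    return sum(a == b for a, b in zip(nearest, y_true)) / len(y_true)


# Illustrative values only.
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1.2, 0.9, 2.4, 1.8, 2.6, 3.3]
print(smse(y_true, y_pred))
print(smse(y_true, [2.0] * 6))  # naive mean predictor
print(pearson_r(y_true, y_pred))
print(discretized_accuracy(y_true, y_pred, labels=[1, 2, 3]))
```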

While predicting the imposed task load is of primary focus, an additional model was trained to predict subjectively experienced workload, using the reports provided by each participant following each task block. This model was computed in the same manner as the previously described model for imposed task load, with the exception of each trial being labeled according to the subjective workload provided by that participant for that block (a value that can range from 1 to 7), rather than the imposed task load (1–3).

Additionally, models were constructed with data from single task variants, in order to investigate the ability of the model to predict the task load within task variants relative to across task variants. Data from each task variant and load were split into five partitions, with a separation of at least five trials between partitions, as previously described for the primary analysis. While the primary analysis combined data across the three task variants for a given fold, the present analysis used data from only a single task variant for training, and a single task variant for testing. For example, the first run of training on the Auditory task and testing on the Auditory task uses the first training fold and first testing fold of exclusively Auditory task data. In contrast, the first run of training on the Auditory task and testing on the Spatial task uses the first training fold of exclusively Auditory task data, and the first testing fold of exclusively Spatial task data. As three task variants were included in the experiment, this procedure generated nine combinations of training and test task variants.
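The nine combinations can be enumerated directly. The short sketch below is only meant to make the within-variant and across-variant cases explicit; the variable names are illustrative.

```python
from itertools import product

variants = ["Auditory", "Numeric", "Spatial"]

# Every (training variant, test variant) pair.
pairs = list(product(variants, repeat=2))
within = [(tr, te) for tr, te in pairs if tr == te]   # same variant for both
across = [(tr, te) for tr, te in pairs if tr != te]   # differing variants

print(len(pairs), len(within), len(across))
```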

#### Feature Analysis

To illuminate the association between individual participants' EEG features and working memory load prediction, we used two techniques. First, we applied a one-way ANOVA for task level (the number N-back; 1, 2, 3) to individual participants' EEG data. Second, using the trained GPR predictive model, we examined ARD length scales of each feature to identify which played the greatest role in prediction. ARD length scales were also used to evaluate the predictive power of alternate EEG electrode montages mapped to other commercial EEG equipment, as is further explained in Section "Results."

FIGURE 3 | GPR predictions of N-back level for participants 1–8. The figure displays predictions derived from the 5 cross-validation folds in a single graph. Main graph shows ground truth task load (black line). Predicted load is represented as the point prediction for each trial, for both Gaussian Process (GP) and Multiple Linear Regression (MLR) predictors, displayed in blue x's and green o's, respectively. Task block for each section of the experiment (Auditory, Numeric, Spatial) is indicated by colored labels above the predictions. The gray region shows the ± 2σ confidence interval for each GP point prediction generated by the model. sMSE for both GP and MLR models are included in the lower left of each participant panel. Participant behavioral performance is shown in the line graph at top of each subplot (+ = correct, − = incorrect, with incorrect points also colored in red) in order to visually examine the relation between model prediction and participant behavioral performance on the task.

FIGURE 4 | GPR predictions of N-back level for participants 9–16. The figure displays predictions derived from the 5 cross-validation folds in a single graph. Main graph shows ground truth task load (black line). Predicted load is represented as the point prediction for each trial, for both Gaussian Process (GP) and Multiple Linear Regression (MLR) predictors, displayed in blue x's and green o's, respectively. Task block for each section of the experiment (Auditory, Numeric, Spatial) is indicated by colored labels above the predictions. The gray region shows the ± 2σ confidence interval for each GP point prediction generated by the model. sMSE for both GP and MLR models are included in the lower left of each participant panel. Participant behavioral performance is shown in the line graph at top of each subplot (+ = correct, − = incorrect, with incorrect points also colored in red) in order to visually examine the relation between model prediction and participant behavioral performance on the task.

TABLE 2 | Predictive ability of Gaussian Process Regression (GPR) model in comparison to multiple linear regression (MLR) model, predicting either task load or subjective workload using all model features.

Model performance is provided via Pearson's correlation coefficient 'r', standardized mean squared error (sMSE), and categorical classification accuracy. Values displayed are the mean performance across participants ± the standard error of the mean across participants.

#### RESULTS

#### Behavioral Data

Participants completed all N-back working memory tasks (Auditory, Numeric and Spatial tasks) at above chance performance within all N-back levels. As N-back level increased, performance significantly decreased. Across all participants and modalities, mean 1-back performance was 97%, mean 2-back performance was 79%, and mean 3-back performance was 46% (**Figure 2**). There was a main effect of level, F(2,30) = 135.108, p < 0.001, η<sup>2</sup><sub>G</sub> = 0.747, as well as a main effect of mode, F(2,30) = 14.457, p < 0.001, η<sup>2</sup><sub>G</sub> = 0.097, and a level by mode interaction, F(4,60) = 3.336, p = 0.016, η<sup>2</sup><sub>G</sub> = 0.033.

As indicated by the reported measure of effect size, generalized eta squared (η<sup>2</sup><sub>G</sub>), the effect of N-back level on performance was of greater magnitude than the effect of modality on performance.

Participants reported subjective workload levels spanning from 1 to 7, with mean 1-back workload 2.0, mean 2-back workload 3.9, and mean 3-back workload 5.9 (**Figure 2**). For subjective workload, there was both a main effect of level, F(2,30) = 181.449, p < 0.001, η<sup>2</sup><sub>G</sub> = 0.735, and a main effect of mode, F(2,30) = 7.773, p = 0.002, η<sup>2</sup><sub>G</sub> = 0.042, while the level by mode interaction was not significant (p > 0.10).

Similar to task accuracy, according to the reported measure of effect size, generalized eta squared, the effect of N-back level on subjective workload was of greater magnitude than the effect of modality on subjective workload. Participants performed the Spatial task more accurately, and additionally rated it as lower in subjective workload, in comparison to the Auditory or Numeric tasks. Although each task used only 8 stimuli within each block, it is possible that the consistent use of the same 8 spatial locations across blocks of the spatial task contributed to this performance and subjective workload difference.

#### Predictive Accuracy of BCI

The GPR with ARD was trained to predict task level for individual participants on a mixture of all three N-back tasks, and tested on the left-out test data using fivefold cross-validation.

Individually trained GPR models were able to predict task level across participants with high accuracy. sMSE mean and standard error across multiple participants was 0.44 ± 0.04, where 0 is perfect prediction and 1 is a model which performs no better than a naive model always predicting the mean of ground truth (**Table 1**). Pearson's r correlation was 0.75 ± 0.03, where r = 1 is perfect, r = 0 is uncorrelated. (All error estimates are given as standard error of the mean.) The GPR predictions of task level for each trial are presented within **Figure 3** (participants 1–8) and **Figure 4** (participants 9–16). The predictions derived from the 5 model folds have been merged into a single dataset for presentation.

The models trained using GPR and all features performed significantly better than models trained using MLR and all features, using the same training and test folds. GPR models had mean sMSE of 0.44 ± 0.04, while MLR models had mean sMSE of 0.55 ± 0.04, t(1,15) = −6.28, p < 0.001. Similarly, models trained to predict subjective workload ratings performed better using GPR than MLR. The subjective workload model trained using GPR had mean sMSE of 0.43 ± 0.04, while the analogous model trained using MLR had mean sMSE of 0.54 ± 0.04, t(1,15) = −6.07, p < 0.001. Measures of model quality in terms of Pearson r and discretized classification are provided in **Table 2** for comparison with other paradigms. The GPR predictions of subjective workload for each trial are presented within **Figure 5** (participants 1–8) and **Figure 6** (participants 9–16). The predictions derived from the 5 model folds have been merged into a single dataset for presentation. **Table 3** additionally displays the sMSE for each participant, both collected across the 5 runs prior to calculating sMSE, and the mean and standard deviation of sMSE calculated by first computing sMSE within run. The sMSE for each participant collected across the 5 runs prior to calculating sMSE is very similar to the mean of sMSE calculated by first computing sMSE within runs.

Comparing the performance of the GPR model trained on task level to the equivalent model trained on subjective workload, the ability to predict the two label types was not significantly different, t(1,15) = 0.53, p = 0.606.

#### Feature Analysis in the ANOVA, GPR, and MLR Models

To determine which EEG band-site features were significantly associated with N-back level, we applied a one-way ANOVA for level to individual participants' EEG data. Because the predictive model was also individualized, it was necessary to analyze individual data rather than group effects as is commonly done in cognitive neuroscience.

FIGURE 6 | GPR predictions of subjective workload for participants 9–16. The figure displays predictions derived from the 5 cross-validation folds in a single graph. Main graph shows ground truth subjective workload (black line) provided by the participant at the end of the block. Predicted subjective workload is represented as the point prediction for each trial, for both Gaussian Process (GP) and Multiple Linear Regression (MLR) predictors, displayed in blue x's and green o's, respectively. Task block for each section of the experiment (Auditory, Numeric, Spatial) is indicated by colored labels above the predictions. The gray region shows the ± 2σ confidence interval for each GP point prediction generated by the model. sMSE for both GP and MLR models are included in the lower left of each participant panel. Participant behavioral performance is shown in the line graph at top of each subplot (+ = correct, − = incorrect, with incorrect points also colored in red) in order to visually examine the relation between model prediction and participant behavioral performance on the task.

The total sMSE combines the results of each run into vectors of true and predicted values, which weights each test example equally. The sMSE mean over 5 runs averages the sMSE results derived from each test run. The sMSE standard deviation is the standard deviation of the sMSE results derived from each test run. The total sMSE and sMSE mean are not equal because each test run can have different numbers of test trials, due to the removal of any test trials that occurred within 5 trials of a training trial.

Using the GPR predictive model, we examined the set of features to identify which played the greatest role in prediction. When ARD is used in training a GPR, the resulting length scale of each feature indicates the relative sensitivity of the model to changes in that feature's value (MacKay, 2003; Rasmussen and Williams, 2005). A model is most sensitive to features with short length scales and least sensitive (most invariant) to features with long length scales.
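The role of the length scale can be seen in a small numerical sketch. The code below assumes a squared-exponential covariance with one length scale per feature, which is a common ARD form; the exact covariance function used in the study is not restated here, and the function name and example values are hypothetical.

```python
import math


def ard_sq_exp(x1, x2, length_scales, signal_var=1.0):
    """Squared-exponential covariance with one length scale per feature."""
    sq = sum(((a - b) / ell) ** 2 for a, b, ell in zip(x1, x2, length_scales))
    return signal_var * math.exp(-0.5 * sq)


base = [0.0, 0.0]
ells = [0.5, 10.0]  # feature 0: short length scale; feature 1: long length scale

# Shift each feature by the same amount and compare the covariance with `base`.
k_shift_f0 = ard_sq_exp(base, [1.0, 0.0], ells)
k_shift_f1 = ard_sq_exp(base, [0.0, 1.0], ells)
print(k_shift_f0, k_shift_f1)
```

A unit change in the short-length-scale feature sharply reduces the covariance between the two inputs, so the prediction can change substantially, whereas the same change in the long-length-scale feature leaves the covariance close to its maximum.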

**Figures 7** and **8** show one-way ANOVA F-values compared to GPR length scales for each channel × band power feature, for each of the 16 participants. The values displayed are the average of the values for that participant, over the 5 runs of cross-validation performed. Although there is substantial between-participant variability, gamma band features at occipital and temporal sites are commonly utilized by the GPR models for prediction.

Unlike the features with significant level effects in ANOVA, the (most sensitive) features with the shortest length scales are not generally clustered into individual frequency bands, with the exception of the gamma band, where several channels are uniformly short in length scale. This lack of spatial patterning was also typical across participants.

As the MLR predictions were derived from a multivariate regression, multicollinearity between features can make interpretation of the resulting regression coefficients difficult or misleading (Haufe et al., 2014). Weights from the MLR models were therefore transformed into activation patterns via Equation (6) from Haufe et al. (2014). Specifically, the activations are derived by:

$$A = \Sigma_{x} W \Sigma_{\hat{s}}^{-1} \tag{5}$$

where $\Sigma_{x}$ is the covariance of the data, $W$ is the matrix of multivariate regression weights, and $\Sigma_{\hat{s}}^{-1}$ is the inverse covariance matrix of the latent factors, in this case simply the N-back level labels. **Figure 9** displays the activation patterns from the MLR models predicting task load.
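For a single latent factor, the transformation reduces to the covariance of each feature with the model output, divided by the variance of that output. The sketch below illustrates this special case in plain Python; it takes the latent factor to be the model output (the data projected onto the weights), the function name `activation_pattern` is hypothetical, and the toy data are invented purely to show why weights and activation patterns can disagree.

```python
def activation_pattern(X, w):
    """Activation pattern for one latent factor: Cov(X) w / Var(X w).

    X is a list of trials (each a list of feature values) and w holds the
    regression weights. One activation value is returned per feature.
    """
    n, d = len(X), len(w)
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    # Estimated latent factor: the (centered) model output for each trial.
    s_hat = [sum(xj * wj for xj, wj in zip(row, w)) for row in Xc]
    var_s = sum(s * s for s in s_hat) / (n - 1)
    # Cov(X) w equals the covariance of each feature with s_hat.
    cov_xs = [sum(Xc[i][j] * s_hat[i] for i in range(n)) / (n - 1)
              for j in range(d)]
    return [c / var_s for c in cov_xs]


# Toy data: feature 0 = signal + noise, feature 1 = noise only.
# The weights [1, -1] recover the signal by subtracting the noise.
X = [[2, 1], [0, 1], [0, -1], [-2, -1]]
w = [1, -1]
print(activation_pattern(X, w))
```

In this toy case the noise-only feature receives a large negative weight, yet its activation is zero, which is the kind of discrepancy that makes raw regression weights misleading to interpret.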

#### Feature Selection and Prediction Accuracy

For each participant, we compared the predictive ability of several feature subsets. The feature subset "All" (i.e., all electrodes × all bands) was the upper bound on predictive accuracy for this data set (**Table 1**).

To illuminate which frequency bands are important to the task, we considered the predictive accuracy of feature subsets corresponding to single frequency bands (e.g., the 32 features corresponding to the beta band at all electrodes). While all bands contributed to predictive accuracy, the largest contribution came from features in the beta and gamma frequency range. This suggests that information that might be discounted by standard EEG analysis can be highly informative in the context of a BCI predicting workload.

How important was it for the BCI to include all 32 electrodes for this task? We considered feature subsets with a smaller number of EEG electrode sites than were actually measured (but all frequency bands). The model's accuracy for several subsets of electrode sites, averaged over all 16 participants, is also shown (**Table 1**). We compared the montage of our laboratory EEG headset to the montages of two EEG headsets with fewer electrodes, one focused on rapid deployment (B-Alert X10) and one on affordability for home use by consumers (Emotiv EPOC). In this task, the 16 channels present in the Emotiv EPOC device, primarily near equatorial sites such as F7, F8, P3, P4, P7, and P8, capture much of the model's predictive ability. However, the montage of channels present in the B-Alert X10, located more along midline sites such as Fz, Cz, and POz, is less effective, yielding performance similar to that achieved by using only a single region's channels (e.g., parietal channels or occipital channels).

One operationally relevant scenario is that a full laboratory electrode cap might be used to calibrate a model for a participant before switching to a simpler EEG device for operational use. We tested this concept by training the GPR model on the full feature set, removing features according to ARD length scales or ANOVA F-values, then testing the newly reduced model's predictive power. With this paradigm, we found that selecting a reduced feature model using GPR length scales was more resilient than reducing models using ANOVA features. For both the top 25% and top 50% of features, selection based on training data GPR length scale generated a model with lower test set sMSE in comparison to selection based on training data ANOVA F-value, as evaluated with paired-samples t-tests [t(1,15) = −5.62, p < 0.001 for the top 25% of features; t(1,15) = −2.68, p = 0.017 for the top 50% of features; see **Table 1**].

A similar method allowed us to measure the absolute minimum number of features required for prediction of task level. Each individual's feature length scales were sorted from shortest to longest, and the GPR model was tested on subsets of increasing size, from 1 to 100% of total features, in increments of 1% (**Figure 10**). As features are added beyond the minimum level required for the model to function, the trend is for classifier error to decrease monotonically until it plateaus near the minimum sMSE of the full model. Approximately 20% of the total number of features are sufficient for prediction quality near the full model.
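The selection step can be sketched as follows. This only shows how a subset of features might be chosen from learned length scales; retraining and scoring the GPR on each subset is omitted, and the function name and example length scales are hypothetical.

```python
import math


def shortest_length_scale_subset(length_scales, percent):
    """Indices of the `percent`% of features with the shortest length scales."""
    n = len(length_scales)
    n_keep = max(1, math.ceil(percent * n / 100))
    order = sorted(range(n), key=lambda i: length_scales[i])
    return sorted(order[:n_keep])


# Hypothetical ARD length scales for eight features.
ells = [3.0, 0.2, 8.5, 0.9, 1.5, 12.0, 0.4, 5.0]

print(shortest_length_scale_subset(ells, 25))
print(shortest_length_scale_subset(ells, 50))

# Sweep from 1% to 100% of features in 1% steps, as in the analysis above.
subset_sizes = [len(shortest_length_scale_subset(ells, p)) for p in range(1, 101)]
```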

#### Prediction within and across Task Variants

Predictions obtained using single task variants for training and testing are presented in **Table 4**. When the training and test data are from the same task variant (the diagonal of the table), sMSE is approximately the same as or lower than what was obtained by combining training and test data across pooled modalities. However, when training and test data are obtained from differing task variants, prediction is no better than what could be obtained by naively predicting the mean of the target distribution.

#### DISCUSSION

We used GPR to train a model capable of accurately predicting N-back working memory load or workload. When data from all three task variants were pooled for training and testing, above chance predictions were obtained. This result is consistent with a meta-analysis of functional magnetic resonance imaging (fMRI) studies using N-back variations which showed a frontoparietal network which, although affected by the nature of the information retained, is generally active across all N-back variants (Owen et al., 2005). However, if the model was trained exclusively on a single task variant, then prediction on alternative task variants was no better than a naïve model which always predicts the mean task load. Training exclusively on a single task variant may overfit to that particular variant, impairing prediction when the test variant differs. It is possible that improved cross-variant prediction could be obtained by modification of the GPR model to account for greater uncertainty in predicting a new task variant.

For the pooled data, predictive accuracy was high overall (sMSE = 0.44, r = 0.75), although the GPR model was less able to predict (or extrapolate to) extreme values, tending to smooth extreme values to middling values. This limitation is typical of interpolation-based regressors such as GPR, especially given the limited number of data points (∼800) relative to the high dimensionality of the data (up to 192 features per participant, dependent on whether any channels were removed due to excessive artifact). Extrapolation might be improved by the use of alternative covariance functions incorporating linear terms.

GPR performed significantly better than the baseline performance established by a simpler parametric technique, MLR. This was the case for models trained and tested on N-back task load, as well as for models trained and tested on subjective workload ratings provided by the participants after each block of the task. Model performance between N-back task load and subjective workload was similar, likely due to the strong relation between N-back task load and subjectively reported workload as reported in the behavioral results.

Applying feature subset selection to the model revealed that feature subsets selected based on techniques such as ANOVA are significantly less efficient at prediction than the subsets identified by GPR with ARD. For example, using the top 25% of features derived from GPR generates model performance approximately equivalent to the top 50% of features derived from an ANOVA model. Similarly, models using GPR consistently outperformed models utilizing MLR, a simpler but less flexible approach.

TABLE 4 | Predictive ability of Gaussian Process Regression (GPR) models trained and tested on trials from single working memory task variants.

The predicted variable for all models was task load. Model performance is provided as the mean and standard error of the mean of standardized mean squared error (sMSE).

Periods of data containing obvious muscular artifacts were manually rejected from the dataset prior to training the machine learning model. Despite this, features in the gamma band of the EEG were most sensitive to variations in N-back level. Several works have cautioned against the use of higher frequency band power features such as beta and gamma for workload estimation (Gerjets et al., 2014; Brouwer et al., 2015), due to EMG contamination from differential motor activity in different blocks of the task. Here, the N-back task was utilized, in which mental workload between task levels is varied by task instruction rather than alteration of the perceptual or motor demands of the task. While perceptual demands do vary between the modality variants of the task utilized (auditory, numeric, spatial), as our predictor was trained and tested on a random selection of data from each of three modalities at each N-back level, the N-back level groupings do not contain systematic differences in perceptual or motor demands that would aid prediction.

As gamma band power is susceptible to contamination from muscular artifact (Muthukumaraswamy, 2013), the source of these features cannot be assumed to be of neural origin. However, the consistency of the gamma band features across participants despite the constant motor demands of the N-back suggests that, if predictions result in part due to EMG activity, more direct measures of EMG activity may prove useful for mental workload classification. It is possible that the diagnosticity of the gamma band feature is due not to a confound in how participants respond between levels of the task, but subtle postural changes on the part of participants as mental demand increases. In this sense, gamma band features may be considered an artifact when EEG is used to measure electrical activity of exclusively neural origin, but in our analysis may also represent a feature that is truly diagnostic of mental demand, and not simply an experimental design confound.

It is often desirable to minimize the number of electrode sites required for a BCI. In laboratory settings with standard electrode caps, using fewer sites can reduce experimental preparatory time or allow experimenters to allocate more time to ensuring a low-impedance connection at key sites, improving data quality. In custom-designed electrode caps, fewer sites may also reduce the size, weight, and power (SWaP) and cost of BCI systems intended to operate in real-time. Taking advantage of our non-parametric GPR model, we were able to demonstrate a method for determining subsets of channels that capture the full predictive accuracy of the entire electrode cap, and even determine the minimum number of channels, or even EEG features, required for accuracy. We observed that the 16 channels present in a commercial off-the-shelf device, mostly lateral sites near the head's equator, capture a very large fraction of the predictive ability of the full 32 channel laboratory cap. Devices with such electrode montages might be used in future experiments, provided their EEG signal quality is acceptable.

Despite what we believe to be an overall contribution to the field, several limitations of the current report should be noted. The present paradigm used 80% of the available data for training on each cross-validation fold. This amount of training data may not be practical to acquire before a real-time device could be utilized. Additionally, our models were trained and tested within each participant. A more optimal model would be participant independent. It is possible that these issues could be partially mitigated by adapting data or Gaussian Process hyperparameters that were learned from previous participants to reduce the training time required for new participants. Additionally, the present work contains data from 16 participants, all of whom are male and middle-aged. Future reports should expand workload prediction using larger and more demographically variable participant samples. Finally, while stimulus modality was randomized, participants completed increasingly demanding experimental blocks within each modality. Therefore, fatigue or tiredness could potentially contribute to estimations of mental demand.

#### CONCLUSION

There is potentially great value in real-time, non-invasive monitoring of cognitive states by 'passive' BCI using methods such as electroencephalography (EEG). Cognitive variables such as workload, which are predictive of operational errors, are potentially valuable targets for real-time monitoring. Information about these variables may be useful in a variety of downstream applications, including providing situational awareness for human operators, alerting operators about high-workload situations, testing and training operators, redesign of interfaces, and redesign of working practices to optimize operator performance.

In this paper, we used EEG to monitor cognitive workload during a simple working memory task (N-back) in multiple sensory and cognitive modalities (Auditory, Numeric, and Spatial). Calibration from training data was demonstrated to be effective using GPR, outperforming a more basic model utilizing MLR. GPR also provided the ability to assess the relative predictive value of each input variable (EEG electrode sites, and frequency bands at each site, together summarized as EEG 'features') in predicting the workload variable of interest. The GPR approach was superior to conventional analysis of variance (ANOVA) methods in determining which reduced subsets of EEG features from the training set would be most predictive about the cognitive variable of interest in the test set. This type of analysis may inform engineering efforts to produce EEG systems with few electrodes placed at the most highly informative sites on the scalp for the desired evaluations.

The current approach can be placed within a class of methods that seek to use techniques from machine learning to not only make predictions, but glean useful information about the neural or behavioral processes under study. In another example, Noh and de Sa (2014) have reported that a machine learning model trained on a subset of EEG data can be used to select features for traditional hypothesis testing on an independent test set. As this method derives candidate features for discriminating between conditions from the independent training set, it avoids the issue of multiple comparisons encountered when performing traditional hypothesis testing on several potential features within a single set of data.

In addition, in contrast to more traditional statistical methods such as MLR, the GPR approach provides confidence intervals around each prediction. Information regarding the confidence of a predictor may be useful in operational domains in order to determine when to trust the outputs of the predictive model. For example, a test point that contains data that is far outside what was observed within the training set would be predicted with a large confidence interval.
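The behavior described here can be reproduced with a toy one-dimensional Gaussian Process. The sketch below is a simplified illustration under stated assumptions, not the GPML-based model used in the study: it assumes a zero-mean prior, a squared-exponential covariance with fixed hyperparameters, and a fixed noise variance, and all function names and data values are hypothetical.

```python
import math


def rbf(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gp_predict(x_train, y_train, x_test, noise_var=0.1):
    """Return (mean, std) of the GP predictive distribution at one test input."""
    K = [[rbf(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    k_star = [rbf(a, x_test) for a in x_train]
    mean = sum(k * a for k, a in zip(k_star, solve(K, y_train)))
    reduction = sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    var = rbf(x_test, x_test) + noise_var - reduction
    return mean, math.sqrt(var)


x_train, y_train = [0.0, 1.0, 2.0], [1.0, 2.0, 3.0]
mean_near, std_near = gp_predict(x_train, y_train, 1.0)   # inside the training range
mean_far, std_far = gp_predict(x_train, y_train, 10.0)    # far outside it
print("near: mean %.2f, 2-sigma half-width %.2f" % (mean_near, 2 * std_near))
print("far:  mean %.2f, 2-sigma half-width %.2f" % (mean_far, 2 * std_far))
```

Far from the training data, the covariance between the test point and every training point is essentially zero, so the prediction reverts to the prior mean and the interval widens to the prior uncertainty plus noise.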

#### AUTHOR CONTRIBUTIONS

MC: Co-led research effort, collected and analyzed data, wrote software code, wrote paper. DR: Collected majority of data, wrote software code, analyzed data, contributed to writing of paper. JC: Participated in data collection, edited paper. HG: Participated in data collection, wrote stimulus/task software. MW: Initiated project plan, co-led research effort, participated in data collection.

#### ACKNOWLEDGMENTS

This research was supported by the MITRE Innovation Program. Approved for public release, distribution unlimited, MITRE case number 15-2950.

power and ERPs in the n-back task. J. Neural Eng. 9, 045008. doi: 10.1088/1741- 2560/9/4/045008



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 The MITRE Corporation. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Evaluation of an Adaptive Game that Uses EEG Measures Validated during the Design Process as Inputs to a Biocybernetic Loop

#### Kate C. Ewing, Stephen H. Fairclough\* and Kiel Gilleade

School of Natural Sciences and Psychology, Liverpool John Moores University, Liverpool, UK

Biocybernetic adaptation is a form of physiological computing whereby real-time data streaming from the brain and body is used by a negative control loop to adapt the user interface. This article describes the development of an adaptive game system that is designed to maximize player engagement by utilizing changes in real-time electroencephalography (EEG) to adjust the level of game demand. The research consists of four main stages: (1) the development of a conceptual framework upon which to model the interaction between person and system; (2) the validation of the psychophysiological inference underpinning the loop; (3) the construction of a working prototype; and (4) an evaluation of the adaptive game. Two studies are reported. The first demonstrates the sensitivity of EEG power in the (frontal) theta and (parietal) alpha bands to changing levels of game demand. These variables were then reformulated within the working biocybernetic control loop designed to maximize player engagement. The second study evaluated the performance of an adaptive game of Tetris with respect to system behavior and user experience. Important issues for the design and evaluation of closed-loop interfaces are discussed.

#### Edited by:

Mikhail Lebedev, Duke University, USA

#### Reviewed by:

Dimitrios Kourtis, Central European University, Hungary; Viktor Müller, Max Planck Institute for Human Development, Germany; Jean-Arthur Micoulaud Franchi, Centre National de la Recherche Scientifique, France

#### \*Correspondence:

Stephen H. Fairclough s.fairclough@ljmu.ac.uk

Received: 14 December 2015 Accepted: 29 April 2016 Published: 18 May 2016

#### Citation:

Ewing KC, Fairclough SH and Gilleade K (2016) Evaluation of an Adaptive Game that Uses EEG Measures Validated during the Design Process as Inputs to a Biocybernetic Loop. Front. Hum. Neurosci. 10:223. doi: 10.3389/fnhum.2016.00223

Keywords: psychophysiology, EEG, gaming, physiological computing, adaptive interface, effort, engagement

#### INTRODUCTION

Biocybernetic control describes how the implicit measurement of physiological signals from the brain or body can be transformed into a control input for real-time software adaptation. This category of physiological computing system (Fairclough, 2009) has also been described as a passive brain-computer interface (Zander and Kothe, 2011) because the user simply responds to events at the interface without any requirement for volitional control. The purpose of biocybernetic adaptation is to create a seamless and tacit form of human-computer interaction where software adaptation is timely and intuitive from the perspective of the user.

The biocybernetic model has been applied to a range of domains, such as adaptive automation (Bailey et al., 2006), detection of negative emotions (Kapoor et al., 2007), adaptive robotics (Liu et al., 2007) and support for social behavior (Chanel and Mühl, 2015). An early example of a working biocybernetic control loop was developed by NASA in the 1990s, where the real-time analysis of electroencephalography (EEG) signals was converted into an input variable for the control of the level of system automation during simulated aviation tasks (Pope et al., 1995; Freeman et al., 1999; Prinzel et al., 2000; Scerbo et al., 2003). This control loop was designed to sustain operator engagement within an optimal zone that avoided complacency and inattention by selectively disabling system automation in order to oblige the operator to engage with a manual interface. This example of biocybernetic control set a blueprint for a data processing protocol wherein electrocortical activity interacts with a computerized system within a negative feedback loop. This model of closed-loop control detects deviations from an optimal state of brain activity and uses these variations to cue changes at the human-computer interface in order to "pull" the psychological state of the user in a desired direction.

The design of a biocybernetic closed-loop incorporates a number of distinct processing stages: (1) data collection from sensors, (2) filtering of raw data coupled with artifact correction techniques, (3) data analysis for the extraction of meaningful metrics that permit a valid inference of the user state, (4) conversion of the metrics in order to instigate adaptation at the user interface, i.e., defining criteria/triggers for adaptation or by categorizing data using machine learning algorithms (Baldwin and Penaranda, 2012; Novak et al., 2012); and (5) adaptation of the user interface in a manner designed to promote a desirable user state.

All biocybernetic closed-loop systems are rooted in a psychophysiological inference; for example, inferring increased arousal from increases in skin conductance level or inferring negative affect from activation of the corrugator supercilii. The validity of this inference is fundamental to the integrity of a working loop, but the process of establishing validity is complex (Cacioppo and Tassinary, 1990). The loop is designed to utilize software adaptation in order to influence a key psychological concept or dimension in the user, e.g., engagement, mental workload, attention. If the fundamental link between input measures, the psychological concept targeted by those measures and the adaptive logic of the loop is weak or tenuous, then the effectiveness of the closed-loop system will be compromised (Fairclough, 2007, 2009). Because the loop works in real-time, it is important that measures are: (a) sufficiently sensitive to changes in the relevant psychological dimension; and (b) specific to that dimension, i.e., not confounded with other psychological variables. Consequently, it is important to construct biocybernetic loops on the basis of measures that have either been scientifically validated according to research literature or tested and validated in the context of the target task or application.

This article describes the development of an adaptive computer game where the software responds in real-time in order to enhance the experience of the player by making the game appropriately challenging. Optimizing task difficulty is one of several methods of adapting gaming experiences using biocybernetics, others include enhancing emotional engagement and reducing player frustration (Gilleade et al., 2005). This closed-loop approach employs the same logic that underpins the integration of biofeedback mechanics into gaming applications (Nacke et al., 2011) and the design of adaptive games dedicated to the creation of a specific emotion (Dekker and Champion, 2007). One goal for an adaptive game is to deliver a level of difficulty tailored to the skills of the player via closed-loop control such that the game is personalized to the skills and abilities of each player. This article will describe the development of an adaptive game of Tetris designed to sustain player engagement (see also Chanel et al., 2011) and also an experimental study intended to validate the psychophysiological inference underpinning the system that was conducted prior to the creation of the working prototype.

Game construction began with the formulation of a conceptual framework upon which to model the responses of the adaptive game. Our framework was based upon the Motivational Intensity Model (MIM: Wright, 2008), which describes the relationship between effort investment and task demand, a model that has been corroborated via a number of experimental studies (e.g., Wright and Kirby, 2001; Richter et al., 2008; Richter and Gendolla, 2009). One prediction of this model is that effort rises proportionally with increases in task difficulty until demand is so great that the human deems task success unlikely and withdraws effort, the result of which is a shark-fin shaped effort curve (**Figure 1**). The MIM was adapted to provide a conceptual framework for defining a desirable state of player engagement that could serve as the target for the biocybernetic loop. The adaptation took account of research on the gaming experience to define an ideal "zone" state for the player. For instance, Csikszentmihalyi (1990) described the ideal or optimal level of engagement as "flow": a state where engagement with a task is full to the point that time seems to slip away. According to Nacke and Lindley (2008), flow is characterized by an absence of undesirable mental states (i.e., boredom) and entails a positive emotional experience. Similar states, such as being in the zone or total immersion, have been described by Chen (2004) and Ryan et al. (2006), respectively. The observation has also been made that situations of high effort promote skill development and an opportunity to demonstrate mastery or competence that leads to a positive gaming experience (Ryan et al., 2006). Thus, the MIM was adapted to represent four broad categories of player state: boredom, engagement, zone and overload (**Figure 1**). The conceptual distinction between these four categories was used to define adaptive goals for the biocybernetic loop, namely:


• To make no adjustment when the player occupied the target states of engagement and zone

In order for the control loop to work within this framework (**Figure 1**), the model must be operationalized using psychophysiological measures. The MIM has been extensively corroborated by cardiovascular indices of mental effort (e.g., Wright and Dill, 1993; Wright and Kirby, 2001; Richter et al., 2008; Richter and Gendolla, 2009); however, cardiovascular measures have a number of limitations as inputs to a biocybernetic loop, including an inability to diagnose and monitor individual psychological dimensions of effort, e.g., reactivity in blood pressure is simultaneously sensitive to motivation, cognitive effort and physical effort (Cacioppo et al., 2000). By contrast, EEG provides a wide choice of metrics that permit a multidimensional monitoring of engagement, including spontaneous oscillations, evoked and event-related potentials (EPs and ERPs), different frequency bands, scalp locations and power values. Multivariate combinations of EEG measures have demonstrated impressive levels of accuracy at discriminating user workload (e.g., Gevins et al., 1998; Prinzel et al., 2003; Scerbo et al., 2003; Chanel et al., 2008; Christensen et al., 2012). Of particular interest are EEG oscillations in the alpha (7.5–13 Hz) and theta (4–7 Hz) bands, which are reliable measures of cortical activation and mental effort (e.g., Gevins et al., 1998; Klimesch, 1999; Wilson, 2002, 2003). In an earlier study (Fairclough et al., 2013), measures of power in the alpha and theta bands were sensitive to manipulations of cognitive demand and motivational incentives using the N-back working memory task; however, the capacity of these metrics to index demand and motivation in the context of a computer game remained unknown.

# STUDY ONE: VALIDATION OF INPUT MEASURES

#### Introduction

An experimental study was conducted to evaluate the sensitivity and reliability of the EEG alpha and theta bands to variations in game demand and motivation during play of the popular game Tetris. The study aimed to establish: (a) the most suitable EEG measures to use as inputs to a real-time biocybernetic loop; and (b) an appropriate framework for the operationalization of the MIM with respect to measures of spontaneous EEG. The study employed a within-subjects design and involved game-based manipulations of motivation and demand: three levels of game demand were tested (low, high, excessive) along with two incentive conditions whereby a game-based incentive was present in one condition and absent in the other. It was expected that changes in oscillatory EEG activity in the alpha and theta bands would capture: (1) situations of low effort (i.e., due to boredom or overload); (2) instances of effort increasing in line with demand (when players were engaged with the game); and, most significantly, (3) when players were in the "zone" (when maximal effort was apparent; **Figure 1**). It was also anticipated that the addition of an incentive would increase effort investment provided that game success was likely (Wright, 2008).

# Method

#### Participants

Twenty participants (11 females) took part in the experiment. Participants were aged between 19 and 36 years, and had a mean age of 23.2 years (SD = 4.02). All participants were volunteers who gave their written informed consent prior to data collection in accordance with the Declaration of Helsinki.

#### Game Demand

Cognitive demand was manipulated using an adapted version of the Tetris game. The game requires participants to rotate and move falling pieces in order to build rows of blocks at the bottom of a game board. Falling pieces were one of seven possible colored shapes; each comprised of four squares arranged in different configurations. Pieces were selected to fall in random order. In order to allow gameplay for a fixed duration of 180 s the conventional Tetris game-board was adapted to prevent gamedeath (when pieces stack to the top of the board to signal game-over). The adaptation consisted of shifting the game-board upwards so that the highest stacked piece was maintained at the center of the game board, and was unable to rise above this level (**Figure 2**).

The speed and quantity of the falling pieces were systematically manipulated to create three levels of game demand (low, high or excessive). In the low demand condition, an average of 22.1 pieces fell with a drop speed of 2.5 board squares s<sup>−1</sup>. An average of 66.2 pieces fell with a drop speed of 6.7 board squares s<sup>−1</sup> in the high demand condition. In the excessive demand condition, an average of 217 pieces fell at a drop speed of 20 board squares s<sup>−1</sup>. These parameters were determined on the basis of a small pilot study (N = 7).

#### Incentives

Games were presented in one of two incentive conditions (incentive + performance feedback vs. no incentive + no performance feedback). Each participant completed both incentive conditions (i.e., within-subjects). In the incentive + feedback condition, game coins could be earned for completing rows of Tetris pieces. Coins were accrued in proportion to the number of rows cleared relative to the maximum possible row clearance, such that a maximum of 70 coins could be earned (representing 100% possible clearance). Between zero and seven coins were accumulated every 10 s depending on the proportion of maximum cleared rows achieved at the time of accrual, i.e., at the end of each game, best performance = 70 coins and worst performance = 0 coins (**Figure 2**). Sounds were presented with each award of coins: "kerching" with an award (if current total was less than 35 coins) or "coin jackpot" (if total was over 35 coins). In the no-incentive (+ no feedback) condition, the display related to the coin incentive was absent and no sound effects related to the award of coins were played. For both incentive conditions sound effects occurred when rotating the pieces (small "pop") and shifting the pieces left or right (small "snap").

#### Experimental Design

The experiment consisted of six 180 s games (2 incentive blocks × 3 levels of demand per incentive block). Incentive blocks were delivered in a counterbalanced order and each level of demand was presented in random order within each incentive block. Subjective questionnaires were completed after each game. Throughout each game EEG was measured along with task performance; the total duration of the experimental session was approximately 40 min. Participants practiced by playing each of the six game versions once prior to the experiment and the fitting of EEG equipment. The procedure for the experiment and data collection protocol was approved by the Liverpool John Moores University (LJMU) University Research Ethics Committee and the experiment was conducted in accordance with the recommendations of the LJMU University Research Ethics Committee.

#### Subjective Questionnaires

Subjective workload was assessed using the NASA Task Load Index (TLX; Hart and Staveland, 1988) which consists of six scales (subjective effort, mental demand, temporal demand, physical demand, perception of performance and frustration). Subjective levels of motivation were assessed using the Dundee State Stress Questionnaire (DSSQ) v1.2 motivation scale, which includes eight items relating to motivation, task enjoyment, desire for success, task value, mental effort, agreeableness on completion, concern over poor performance and eagerness to do well (Matthews et al., 1999). Participants completed one version of each questionnaire immediately after each of the six experimental conditions.

#### EEG Recording and Analysis

EEG was recorded monopolarly from 64 Ag–AgCl pin-type active electrodes mounted in a BioSemi stretch-lycra head cap. Electrodes were positioned using the international 10–20 system and recorded activity from the following sites: frontal pole (FPz, FP1 and FP2), anterior-frontal (AFz, AF3, AF4, AF7 and AF8), frontal (Fz, F1, F2, F3, F4, F5, F6, F7 and F8), fronto-central (FCz, FC1, FC2, FC3, FC4, FC5 and FC6), central (Cz, C1, C2, C3, C4, C5 and C6), temporal (FT7, FT8, T7, T8, TP7 and TP8), parieto-central (CPz, CP1, CP2, CP3, CP4, CP5 and CP6), parietal (Pz, P1, P2, P3, P4, P5, P6, P7, P8, P9 and P10), occipitoparietal (POz, PO3, PO4, PO7 and PO8) and occipital/inion (Oz, O1, O2 and Iz). Two reference electrodes, the "common mode sense" (CMS) and "driven right leg" (DRL), were used; these function via a feedback loop to drive the participant's voltage (acquired via CMS) as close as possible to zero. AC differential amplifiers performed continuous digitization at 16,384 Hz, which was then down-sampled online to 256 Hz. No filters were applied online to allow visual inspection of noise. Offline filtering was performed using a notch filter of 50 Hz and high and low pass filters of 0.05 and 40 Hz respectively. The data were visually inspected for artifacts from external electromagnetic sources. Automatic correction of blink artifacts and horizontal and vertical saccades was performed using detection through predefined topographies. Muscle activity over 100 µV was also excluded. Fast Fourier transforms (FFTs) were computed over 50% overlapped windows of 2 s (512 points). The total power in µV<sup>2</sup> was obtained for lower alpha frequency band (7.5–10 Hz), upper alpha frequency band (10.5–13 Hz) and theta frequency band (4–7 Hz; Klimesch, 1999).
For the analysis of spectral power in the alpha bands data from the electrodes most spatially representative of the regions of interest were used i.e., frontal (F3, F4); temporal (T7, T8); central (C3, C4); parietal (P3, P4); occipital (O1, O2). This selection permitted analysis of distributed signals whilst minimizing type one error. The theta band used in this study consisted of a 1 Hz window taken around the frequency of peak modulation within the 4–7 Hz theta range for each participant. This was in order to individualize measurements and maximize their validity. As the majority of participants tend not to produce a clear peak frequency within the theta band, and because there tends to be a large inter-individual variability in the magnitude of the theta response to demand, individualization of the measure was deemed necessary (Gevins et al., 1998). The method involved (for each participant) plotting the spectral power values that lay within the 4–7 Hz theta band for each demand condition on a graph where frequency was represented on the x-axis and spectral power on the y-axis. The graph for each participant was then visually inspected to discern the theta frequency possessing the greatest demand related modulation of power. Many participants did not display a unique frequency with the greatest power modulation, but instead a small window of similar frequencies that displayed greater modulation than the other theta frequencies; for this reason a 1 Hz window was selected for each participant. Power spectra values for both alpha and theta bands were log transformed (using the natural log) to normalize distribution. A single 180 s continuously recorded data segment was analyzed for each experimental condition.
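The spectral steps described above (2 s windows with 50% overlap, total power per band, natural-log transform) can be sketched in Python with NumPy. This is a minimal illustration rather than the authors' analysis pipeline: the Hann taper, the window-energy normalisation, and the names `band_power`, `BANDS` and `log_band_powers` are assumptions introduced here; only the window length, overlap, sampling rate and band limits come from the text.

```python
import numpy as np

# Band limits (Hz) as reported in the text.
BANDS = {
    "theta": (4.0, 7.0),
    "lower_alpha": (7.5, 10.0),
    "upper_alpha": (10.5, 13.0),
}

def band_power(signal, fs, band, win_sec=2.0, overlap=0.5):
    """Total power in `band` from FFTs over overlapped windows.

    Assumption: a Hann taper and window-energy normalisation are used;
    the text does not specify either for the offline analysis.
    """
    n = int(win_sec * fs)              # 2 s at 256 Hz -> 512 points
    step = int(n * (1.0 - overlap))    # 50% overlap -> hop of 256 points
    window = np.hanning(n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])

    spectra = []
    for start in range(0, len(signal) - n + 1, step):
        segment = signal[start:start + n] * window
        spectra.append(np.abs(np.fft.rfft(segment)) ** 2 / np.sum(window ** 2))

    mean_spectrum = np.mean(spectra, axis=0)   # average over windows
    return float(np.sum(mean_spectrum[in_band]))

def log_band_powers(signal, fs=256):
    """Natural-log band power for each band, for one channel."""
    return {name: float(np.log(band_power(signal, fs, limits)))
            for name, limits in BANDS.items()}
```

Calling `log_band_powers(channel_data)` on a one-dimensional array holding a single channel's 180 s recording would return one log-power value per band for that condition.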

#### Statistical Analysis

A priori hypotheses concerning effects for demand were tested using repeated measures analyses of variance (ANOVA). Multivariate analyses are reported using the Pillai's Trace statistic and, where multivariate tests failed to reach significance due to the small sample size (N = 20), significant univariate analyses are reported. Greenhouse-Geisser corrections were applied for violations of sphericity as indicated by Mauchly's test. Alpha levels for a priori tests were set at 0.05. Significant omnibus effects have been followed up with post hoc tests where the alpha levels were corrected to minimize Type I errors using the Bonferroni adjustment.
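As a worked illustration of the Bonferroni adjustment mentioned above, the sketch below enumerates the pairwise comparisons among the three demand levels and divides the family-wise alpha by their number. The function name `bonferroni_alpha` is hypothetical, and the text does not state whether the authors adjusted the alpha level in exactly this way or adjusted the p-values instead; this shows one common form of the correction.

```python
from itertools import combinations

def bonferroni_alpha(levels, family_alpha=0.05):
    """Return all pairwise comparisons and the per-comparison alpha."""
    pairs = list(combinations(levels, 2))
    return pairs, family_alpha / len(pairs)

# Three demand levels give three pairwise post hoc comparisons.
pairs, alpha_per_test = bonferroni_alpha(["low", "high", "excessive"])
```

With three comparisons, each test would be evaluated against 0.05 / 3 under this form of the correction.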

#### Results

#### Performance

A 2 × 3 (incentive × demand) repeated measures ANOVA was performed on game performance scores (i.e., the percentage of successful line completions), which revealed an omnibus effect for demand (F(2,18) = 504.8, p < 0.01, η<sup>2</sup> = 0.98). There were no main or interaction effects for the incentive. Post hoc tests revealed that performance was significantly reduced at excessive compared to high demand (p < 0.01) and low demand (p < 0.01). Performance scores were also significantly lower at high compared to low demand (p < 0.01); descriptive statistics are presented in **Table 1**.

#### Subjective Self-Report Data

TABLE 1 | Mean scores and standard deviation (in brackets) for Tetris performance (the percentage of rows completed; N = 20).

A 2 × 3 (incentive × demand) MANOVA on scores for the six scales of the NASA TLX revealed significant main effects for demand (F(12,220) = 22.64, p < 0.01, η<sup>2</sup> = 0.55) and incentive (F(6,109) = 2.85, p < 0.05, η<sup>2</sup> = 0.14). Ratings of mental, physical and temporal demand increased significantly with each increment in demand (all p < 0.05). Effort ratings increased from low to high demand (p < 0.01) and showed a marginally significant increase at excessive vs. high demand (p = 0.05). Perceptions of performance quality were reduced at excessive vs. high and low demand (both p < 0.01) while frustration was elevated at excessive vs. high and low demand (both p < 0.01). Ratings of mental demand, physical demand and effort all increased with incentive (p < 0.05). However, there was no effect for incentive upon the ratings of temporal demand, frustration and perception of performance quality; descriptive statistics are provided in **Table 2**.

Scores on items from the DSSQ Motivation subscale had a high internal consistency (Cronbach's alpha = 0.88) and so were collapsed into one index of subjective motivation. A demand (3) × incentive (2) repeated measures ANOVA revealed significant main effects for demand (F(2,18) = 29.42, p < 0.01, η<sup>2</sup> = 0.77) and incentive (F(1,19) = 15.16, p < 0.05, η<sup>2</sup> = 0.44). Post hoc T-tests indicated enhanced motivation at high demand (high vs. low: p < 0.01; high vs. excessive: p = 0.01). Motivation was also elevated when the incentive was present for all demand conditions (p = 0.01; **Table 2**).

#### EEG Theta Power

A 2 × 3 repeated measures MANOVA was conducted on theta power data from five frontal (F, FC) and AF sites (AFz, Fz, FCz, F1, F2). This analysis produced a main effect for demand (F(2,18) = 21.89, p < 0.01, η<sup>2</sup> = 0.71) and site (F(4,16) = 38.73, p < 0.01, η<sup>2</sup> = 0.91). A quadratic trend for demand was significant (F(1,19) = 19.71, p < 0.01, η<sup>2</sup> = 0.51), indicating maximum power at high demand. There was no effect of incentive on frontal theta power.

To locate the effects for demand, paired sample T-tests were conducted on data that had been collapsed across the levels of site and incentive. Theta power was significantly elevated at high vs. low and excessive demand (p < 0.01). There was also a marginally significant increase of theta power during excessive compared to low demand (p = 0.05).

#### EEG Alpha Power (7.5–13 Hz)

TABLE 2 | Mean and standard deviation (brackets) scores for the six NASA TLX Scales (mental demand, physical demand, temporal demand, frustration, effort and perception of performance) and the DSSQ motivation scale.

Inc., incentive; No inc., no incentive; N = 20.

To discern effects of the manipulations upon spectral power in the alpha band, repeated measures (2 × 3 × 5 × 2) ANOVAs with factors of incentive (incentive, no incentive) × demand (low, high, excessive) × site (frontal (F3, F4), parietal (P3, P4), occipital (O1, O2), central (C3, C4), temporal (T7, T8)) × hemisphere (left, right) were performed separately on lower and upper alpha band power.

The omnibus analyses for lower alpha band power (7.5–10 Hz) produced main effects for site (F(4,16) = 41.05, p < 0.01, η<sup>2</sup> = 0.91) and hemisphere (F(1,19) = 4.92, p < 0.04, η<sup>2</sup> = 0.21). Trend analysis showed a linear trend for hemisphere with reduced lower band power in right hemisphere (statistic as for effect). Interactions were also present in the analysis of lower alpha power for incentive × hemisphere (F(1,19) = 5.73, p < 0.03, η<sup>2</sup> = 0.23) and demand × site (F(4,82) = 4.01, p < 0.01, η<sup>2</sup> = 0.17). Post hoc tests indicated the incentive × hemisphere interaction was related to greater reduction of alpha power in right hemisphere during the incentive condition (p = 0.02). The demand × site interaction was linked to a reduction of lower alpha power at occipital sites during high compared to excessive demand (p = 0.03); lower alpha was also suppressed at high compared to low demand at temporal sites (p < 0.01). Summary statistics for the post hoc tests are presented in **Table 3**.

The omnibus ANOVA for upper alpha band (10.5–13 Hz) produced main effects for incentive (F(1,19) = 6.41, p < 0.03, η<sup>2</sup> = 0.25), demand (F(2,18) = 6.62, p < 0.01, η<sup>2</sup> = 0.42) and site (F(4,16) = 25.22, p < 0.01, η<sup>2</sup> = 0.86). There were significant linear trends indicating that upper alpha power decreased as demand increased (F(1,19) = 13.63, p < 0.01, η<sup>2</sup> = 0.42) and when the incentive was offered (statistic as for effect). Interactions were also present for incentive × hemisphere (F(1,19) = 6.81, p < 0.02, η<sup>2</sup> = 0.26) and demand × site (F(4,81) = 8.69, p < 0.01, η<sup>2</sup> = 0.31).

TABLE 3 | Differences in power between levels of Tetris demand by region for lower alpha band (N = 20).

Post hoc T-tests revealed a reduction of upper alpha power when game coins were present (p = 0.02). Upper alpha was also suppressed at excessive compared to high and low demand (p < 0.01) and at high compared to low demand (p = 0.02), indicating a concomitant drop in upper alpha power as game demand increased.

Analysis of the demand × site interaction revealed a stepwise reduction of upper alpha power as demand increased at parietal, frontal and central sites. However, this demand effect was not apparent at occipital and temporal sites. Post hoc tests indicated that the hemisphere × incentive interaction was related primarily to a reduction in power during the incentive condition compared to the no-incentive condition in the right hemisphere (p < 0.01). The t-values and effect sizes for these post hoc tests are displayed in **Table 4**.

#### Discussion

This study was performed to assess the suitability of oscillatory EEG metrics for the real-time monitoring of effort and cognitive demand during Tetris play. The results indicated frontal theta was robustly sensitive to objective game demand but that alpha activity only responded to demand at specific sites. For both frontal theta power and subjective motivation there were significant quadratic trends with maxima at high demand, indicating that this level stimulated the highest subjective motivation and effort investment, as predicted by the MIM (**Figures 1**, **3**). Upper alpha band (10.5–13 Hz) indicated a linear increase in cortical activation as the challenge of the game increased (**Figure 4**), which corresponded with the trend in subjective workload (**Table 2**). There was no main effect for either manipulation upon the lower alpha band (7.5–10 Hz); however, an interaction with site revealed sensitivity to demand over temporal and occipital areas of the scalp. The sensitivity of upper alpha activity to game demand was specific to frontal, central and parietal sites. In addition, upper alpha was the only frequency band to respond to the incentive coins (greater power reduction when game coins were present over the right hemisphere).

TABLE 4 | Differences in power between levels of Tetris demand by region for upper alpha band (N = 20).

Augmentation of frontal theta has been widely reported in association with sustained attention, increased cognitive control and working memory (Gevins et al., 1998; Klimesch, 1999; Jensen and Tesche, 2002; Gevins and Smith, 2003; Sauseng et al., 2005; Cavanagh and Frank, 2014; Hsieh and Ranganath, 2014; Clayton et al., 2015). However, the decline of frontal theta power under conditions of excessive demand (**Figure 3**) has not previously been observed. The reproduction of this pattern in Tetris players provided an indication of the ecological validity of this metric and the ability of frontal theta to retain sensitivity to demand when generalized to spatial cognition in a gaming context. The capacity of frontal theta to act as a "generic" index of mental effort makes it an appropriate input to a closed-loop system since games typically use different elements of cognition at different stages of play. In addition, frontal theta demonstrated a degree of face validity owing to the similar pattern of modulation between EEG activity in this band and subjective motivation. The large effect sizes attest to the sensitivity of this measure and its capacity to discriminate between three or more categories of demand as well as detect the "tipping point" where effort is withdrawn due to overload (**Figure 1**).

Alpha power in the upper band, which is associated with task-specific cognitive processes (Klimesch, 1999), was suppressed as demand increased from low to high to excessive levels (**Figure 4**); a finding supported by a significant body of literature on cortical activation (e.g., Pfurtscheller, 1992; Gevins et al., 1998; Fournier et al., 1999; Klimesch, 1999). However, this main effect did not extend to lower band power (an index of cortical arousal and alertness); instead, an interaction between demand and site showed that sensitivity of lower alpha band was limited to occipital and temporal areas. The lessening of power in the upper alpha band, and hence the level of cortical activation, was maximal during excessive demand despite a reduction of frontal theta power at this level. This suggests that upper alpha reflected the objective level of task demand upon spatial cognition (e.g., the processing of high numbers of fast moving stimuli in the form of falling Tetris blocks) whereas frontal theta represented the level of effort mobilization in the face of excessive demand (i.e., a withdrawal of effort). These findings suggested that a two-dimensional space could be created akin to the MIM (**Figure 1**) wherein demand is represented by upper alpha power and frontal theta power is used as an index of mental effort.

The sensitivity of the alpha band was found to vary across recording sites. Upper band effects occurred at frontal, central and parietal sites which provides some agreement with other studies linking these cortical areas with mental rotation—a key cognitive component of Tetris play (e.g., Inoue et al., 1998; Yoshino et al., 2000). Conversely, the effects of demand in the lower alpha band were restricted to temporal and occipital electrodes. In addition, the lower band revealed stronger activation in right hemisphere, which is traditionally associated with spatial tasks (Hellige, 1993), whereas the upper band indicated bilateral sensitivity to game demand. This regional variation indicates the importance of targeting the right cortical sites in order to maximize the sensitivity of the EEG metrics to the chosen psychological variables.

The results from the study identified two EEG measures as suitable inputs to a biocybernetic loop designed to control an adaptive game of Tetris. Frontal theta was selected to index mental effort due to its sensitivity to this variable, its reliability and its specificity (i.e., theta did not respond to the incentive + feedback manipulation). The sensor location Fz, which generally lies at the center of the scalp area associated with frontal theta augmentation, was selected as the recording site. Power in the upper alpha band (10.5–13 Hz) was selected to index the level of task cognition; this variable was sensitive to the objective difficulty of the task and demonstrated a linear pattern over the three levels of demand in accordance with subjective workload ratings. There is also strong literature-based support for the involvement of upper alpha band with task-related cognition, including mental rotation (for a review see Klimesch, 1999). The right parietal site P4 was the chosen sensor input for the sampling of upper alpha oscillations. A parietal site was selected because in the first study parietal sites P3 and P4 detected sensitivity to game demand; central sites were also responsive but there were concerns that these would be subject to confounds from motor activity associated with game play. Although sensitivity was recorded at frontal sites, this was smaller in magnitude than the parietal response to demand (**Table 4**). The choice of recording site was also constrained to the set of sites analyzed in study one, i.e., frontal (F3, F4), temporal (T7, T8), central (C3, C4), parietal (P3, P4) and occipital (O1, O2), to preserve the validity of the psychophysiological inference regarding game-related cognition. Although there was no interaction of demand with hemisphere in the first study to guide this selection of site, the right hemisphere electrode P4 was selected on the basis of a robust association of right hemisphere with spatial cognition (Klimesch, 1999).

To summarize, the selection of the two EEG inputs to the biocybernetic loop made it possible to operationalize the adapted MIM (**Figure 1**) i.e., frontal theta was used to represent effort and parietal upper alpha to represent game demand (**Figure 5**). According to this conceptual model the desirable states of ''zone'' and ''engagement'' are associated with high effort while undesirable states are defined by low effort combined with high demand (overload) or combined with low demand (boredom).

## DEVELOPMENT OF THE REAL-TIME ADAPTIVE GAMING SYSTEM

The working biocybernetic loop was created from a network that involved the connection of two PCs; one PC that ran the adaptive Tetris Software and a second PC that hosted a virtual instrument (VI) constructed with LabVIEW. Raw EEG data were transmitted to the VI to be filtered and averaged prior to transformation into estimates of motivation and workload by a state classification algorithm. These estimates were defined in terms of the four states of boredom, engagement, zone and overload (**Figure 5**). If the state fell within the undesirable categories of boredom or overload, a signal would be transmitted to the adaptive Tetris Software in order to adjust the level of game demand. The components of this loop are illustrated in **Figure 6**.

EEG measures (cortical activation is inversely proportional to alpha band power).

EEG data was recorded monopolarly from two Ag-AgCl pin-type active electrodes mounted in a BioSemi head cap at the locations Fz and P4 (sites determined by the 10–20 system). AC differential amplifiers amplified signals at source with continuous digitization at 16,384 Hz and online downsampling to 512 Hz. No filters were applied online to allow visual inspection of noise. The EEG signal was filtered using a Kaiser Finite Impulse Response (FIR) filter of 2–30 Hz then subjected to an FFT in real time using a 2 s Hanning window. Theta activity between 4–8 Hz was obtained from the midfrontal electrode Fz and activity in the upper alpha band (10.5–13 Hz) was derived from right parietal site P4. The FFT calculated power spectra for each frequency band to generate total power values for each measure. These values were then converted to estimates of workload (upper alpha) and motivation (frontal theta).
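The band-power step described above can be sketched as follows. This is a minimal illustration and not the LabVIEW virtual instrument used in the study: it assumes the signal has already been band-pass filtered, it evaluates the discrete Fourier transform only at the bins inside each band, the function names (`hann_window`, `band_power`) are hypothetical, and the power scaling is arbitrary. The test signals are synthetic sinusoids, not EEG.

```python
import cmath
import math

FS = 512                    # sampling rate after downsampling (Hz), from the text
WINDOW_S = 2                # analysis window length (s), from the text
THETA = (4.0, 8.0)          # frontal theta band (Hz), recorded at Fz
UPPER_ALPHA = (10.5, 13.0)  # upper alpha band (Hz), recorded at P4


def hann_window(n):
    """Return an n-point Hann (Hanning) window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def band_power(samples, fs, band):
    """Sum spectral power over the DFT bins that fall inside `band`.

    Only the bins inside the band are evaluated, so no FFT library is needed.
    The scaling is arbitrary (assumption); only relative values matter here.
    """
    n = len(samples)
    tapered = [s * w for s, w in zip(samples, hann_window(n))]
    resolution = fs / n                      # 0.5 Hz for a 2 s window
    k_lo = math.ceil(band[0] / resolution)
    k_hi = math.floor(band[1] / resolution)
    total = 0.0
    for k in range(k_lo, k_hi + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(tapered))
        total += abs(coeff) ** 2
    return total / n ** 2


# Synthetic 2 s test signals (not EEG): a 6 Hz and a 12 Hz sinusoid.
n_samples = FS * WINDOW_S
theta_like = [math.sin(2 * math.pi * 6 * t / FS) for t in range(n_samples)]
alpha_like = [math.sin(2 * math.pi * 12 * t / FS) for t in range(n_samples)]
```

With a 2 s window the frequency resolution is 0.5 Hz, so the 4–8 Hz and 10.5–13 Hz band edges fall exactly on DFT bins; a shorter window would blur the boundary between the two bands.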

For the operational model to trigger adaptations of game demand in real-time, it was necessary to select criteria for adaptation so that the four regions of the user state model could be defined (**Figure 5**). To maximize the effectiveness of adaptation, it was desirable to calibrate the criteria or trigger levels to individual players to counteract individual variability in the magnitude of EEG responses to game demand (Gevins et al., 1998).

The criteria for triggering adaptations of the Tetris interface were developed based upon patterns of theta and upper alpha oscillations that were observed relative to a baseline reading. Our participants were required to watch a relaxing video clip (Piferi et al., 2000) in order to establish baseline EEG levels of frontal theta and (parietal) upper alpha for each participant. Baselined derivatives of theta and alpha were captured in 5-s windows during subsequent game play. For example, if frontal theta activity increased or decreased from baseline by 100% in any 5-s window whilst parietal alpha increased or decreased by 100%, then system adaptation may be triggered. In practice, frontal theta and parietal alpha were assessed every 5 s as the participant played the adaptive version of Tetris. If the system detected that frontal theta had decreased by 100% or more (from baseline) whilst parietal alpha had increased by 100% or more (from baseline), the player was assessed to be in a state of boredom (**Figure 5**). If the decrease of frontal theta was accompanied by a decrease of parietal alpha, the player was deemed to be in a state of overload.
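The trigger rule described above can be written as a small decision function. This is a sketch under stated assumptions: the text does not specify how the baselined values are scaled, so the function takes the baseline-relative changes as given, and the `percent_change` helper shows only the conventional definition. The names `percent_change`, `classify_epoch` and the label `"no_trigger"` are hypothetical.

```python
def percent_change(value, baseline):
    """Conventional percentage change from baseline (assumed definition)."""
    return 100.0 * (value - baseline) / baseline


def classify_epoch(theta_change, alpha_change, threshold=100.0):
    """Classify one 5-s epoch from baseline-relative changes (in percent).

    Rules taken from the text:
      frontal theta down and parietal alpha up   -> boredom
      frontal theta down and parietal alpha down -> overload
    Any other pattern triggers no adaptation (labelled "no_trigger" here).
    """
    theta_down = theta_change <= -threshold
    if theta_down and alpha_change >= threshold:
        return "boredom"
    if theta_down and alpha_change <= -threshold:
        return "overload"
    return "no_trigger"
```

For instance, `classify_epoch(-120, 150)` falls in the boredom region and `classify_epoch(-120, -110)` in the overload region, whereas an epoch in which theta rises above baseline triggers nothing.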

A straightforward strategy for the adaptation of the game interface was used, i.e., reducing or increasing the drop speed of the falling Tetris blocks to manipulate game difficulty. Speed was increased if the player was deemed to be in a state of boredom and decreased if overload was detected (**Figure 5**). If neither of those states was detected by the system, the drop speed of the Tetris blocks was maintained. This assessment took place in 5-s epochs, hence the drop speed of the game increased or decreased over a period of play depending on the relative frequency of ''boredom'' or ''overload'' epochs that occurred within that period.
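The adaptation rule can be illustrated with a short simulation over successive 5-s epochs. This is only a sketch: the 1–10 range follows the difficulty scale reported with Table 5, but the one-level step size, the state labels and the function names (`adapt_drop_speed`, `play`) are assumptions made for illustration.

```python
MIN_LEVEL, MAX_LEVEL = 1, 10   # difficulty range reported with Table 5
STEP = 1                       # size of each adjustment (assumption)


def adapt_drop_speed(level, state):
    """Return the new difficulty level after one 5-s epoch."""
    if state == "boredom":
        level += STEP          # speed the blocks up
    elif state == "overload":
        level -= STEP          # slow the blocks down
    # Any other state leaves the level unchanged; clamp to the valid range.
    return max(MIN_LEVEL, min(MAX_LEVEL, level))


def play(states, start_level=MIN_LEVEL):
    """Apply the rule to a sequence of epoch states; return the level history."""
    levels = [start_level]
    for state in states:
        levels.append(adapt_drop_speed(levels[-1], state))
    return levels
```

Because each epoch moves the level by at most one step, the level reached over a period of play depends on the balance of boredom and overload epochs, and a request to slow down at the lowest level has no effect.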

A series of pilot tests were conducted to determine an appropriate magnitude of the drop speed changes and whether or not to incorporate feedback of drop speed into the interface. The outcomes from these tests indicated that small adjustments without any overt feedback of drop speed were the most acceptable version of the Tetris interface from a user perspective. This design corresponded to a covert adaptive strategy where the adaptive process is expected to produce a gradual impact rather than an immediate impact on player state. This strategy was adopted in order to focus the attention of the players on the game as opposed to the ongoing activity of the biocybernetic loop.

# STUDY TWO: EVALUATION OF THE BIOCYBERNETIC LOOP

#### Introduction

A study was conducted to evaluate the adaptive Tetris game with respect to two questions: (1) does adaptation improve player experience compared to a manual adjustment of game demand; and (2) how does varying the reactivity of the biocybernetic loop (i.e., liberal vs. conservative trigger levels) impact upon player experience and the behavior of the closed-loop. The first question contrasts a covert, automated process of adjustment with a scenario where adjustments of game demand are both overt and manually instigated by the player. The second question pertains to the design of the trigger events for adaptation and how psychophysiological criteria can impact upon the process of system adaptation and the player experience.

# Method

#### Design

Three types of biocybernetic loop were compared: (a) a conservative system that produced an upward or downward adjustment of game demand (i.e., drop speed) when changes in frontal theta and parietal alpha substantially deviated from baseline (greater than 200%); (b) a liberal system that adjusted game demand in response to smaller deviations from baseline EEG activity (100%); and (c) a moderate system that responded to moderate changes in EEG (150%). It was anticipated that the conservative system would be the least reactive and would respond slowly and only to extreme examples of boredom and overload. By contrast, the liberal system was expected to make frequent adjustments and be the most responsive to instances of boredom/overload. For the fourth system, which operated under manual control, participants were required to speak aloud an instruction to increase (''higher'') or decrease (''lower'') the speed of the falling blocks. These adjustments were made in real-time by an experimenter sitting behind a screen in the laboratory. Ten participants played each of the four Tetris games (conservative closed-loop, liberal closed-loop, moderate closed-loop, manual) for 5 min. The order of presentation of each system was counterbalanced and participants were given a 5 min rest break between each game. Every game began on the slowest speed setting. If the blocks reached the top of the board and ''game death'' occurred, the game would restart with an empty board on the slowest speed setting. The procedure for the experiment and data collection protocol was approved by the LJMU University Research Ethics Committee and the experiment was conducted in accordance with the recommendations of this same committee.
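The a priori expectation about reactivity can be made concrete by applying the three trigger levels to one fixed sequence of epochs. This is a hypothetical sketch: the epoch values are invented, the names (`TRIGGER_LEVELS`, `count_adjustments`) are not from the study, and in a real closed loop each adjustment alters the subsequent EEG, so observed behavior need not follow this open-loop pattern.

```python
# Trigger levels (percent deviation from baseline) for the three loops.
TRIGGER_LEVELS = {"liberal": 100.0, "moderate": 150.0, "conservative": 200.0}


def count_adjustments(epochs, threshold):
    """Count upward and downward adjustments for a list of epochs.

    Each epoch is a (theta_change, alpha_change) pair in percent from baseline.
    """
    up = down = 0
    for theta, alpha in epochs:
        if theta <= -threshold and alpha >= threshold:
            up += 1        # boredom detected -> increase drop speed
        elif theta <= -threshold and alpha <= -threshold:
            down += 1      # overload detected -> decrease drop speed
    return up, down


# Invented epoch values, for illustration only.
epochs = [(-110, 120), (-160, 170), (-210, 230),
          (-120, -130), (-220, -205), (40, -60)]
summary = {name: count_adjustments(epochs, level)
           for name, level in TRIGGER_LEVELS.items()}
```

On this fixed input the liberal loop triggers on five of the six epochs, the moderate loop on three and the conservative loop on two, which matches the expectation that a higher trigger level yields a less reactive system.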

#### Participants

Ten volunteers (6 females) participated in the evaluation session. A repeated measures design was used where each participant encountered each of the four versions of the system (conservative/liberal/moderate/manual). All participants were volunteers who gave their written informed consent prior to data collection in accordance with the Declaration of Helsinki.

#### Subjective Measures

Player experience was analyzed using subjective measures of mood and game immersion. The UWIST Mood Adjective Checklist (UMACL; Matthews et al., 1990) assesses three components of mood: energetic arousal (EA: tired-alert), tense arousal (TA: relaxed-tense) and hedonic tone (HT: happy-sad). The UMACL was administered before and after each game to allow calculation of the change scores (post- minus pre-game) for each mood component. Participants also completed the Immersive Experience Questionnaire (IEQ) designed to capture the immersive quality of the gaming experience (Jennett et al., 2008); this scale was administered after each game.

#### Measures of System Behavior

Data were obtained in order to quantify the behavior of each version of the system. This enabled the three versions of the adaptive closed-loop to be contrasted with one another and an understanding to be acquired of how they differed from the manual control system. Three aspects of system behavior were measured for each system version: the mean frequency of increases and decreases in game demand, the mean frequency of game deaths (resets), and the average level of game difficulty.


#### Results

An ANOVA was conducted on each of the three measures of system behavior (see **Table 5** below for the descriptive statistics). Each measure (mean frequency of increases/decreases in demand, mean frequency of game deaths (resets), average game difficulty) was subjected to a one-way ANOVA to assess statistical significance. The number of adjustments to increase task demand was significantly higher for the conservative system compared to the other three systems; unsurprisingly, all three biocybernetic systems exhibited a higher rate of upward adjustment compared to the manual system (F(3,7) = 79.40, p < 0.01). The analysis of downward adjustment (to decrease game demand) revealed that automated decreases of demand occurred more frequently during games played with the moderate and liberal versions of the biocybernetic loop (F(3,7) = 18.4, p < 0.01). The analysis of reset frequency indicated that resets were most common in the conservative system; however, this increase failed to reach statistical significance. The analysis of mean demand level indicated that difficulty was significantly lower for the liberal system compared to all other systems (F(3,7) = 12.3, p < 0.01).

The impact of system adaptation on the user experience was assessed using two types of subjective questionnaire; the IEQ and the UMACL mood adjective checklist. The UMACL was administered before and after each game session in order for us to calculate a change score that quantified the changes in the three components of mood: EA (alert-tired), TA (tense-relax) and HT (happy-sad). All three components were subjected to a one-way ANOVA; mean values are displayed in **Table 6**.

TABLE 5 | Mean values for measures of system adaptation across the four systems (N = 10).


Highest difficulty level = 10; lowest difficulty = 1.



The mean values for the changes in mood indicated some consistent trends, namely that participants found the game to be alerting and conducive to tension and negative affect. An ANOVA of all three mood components revealed a significant effect for EA (energetic arousal) only (F(3,7) = 5.48, p < 0.05), i.e., participants found the experience of playing the conservative version of the biocybernetic game to be more alerting compared to the liberal version (p < 0.05). The analysis of responses to the immersion questionnaire revealed no significant effect, but a trend was observed that participants found the manual version of the game to be the most immersive.

#### Discussion

This evaluation study demonstrated how the reactivity of the biocybernetic loop affected the performance of the system and the experience of players.

The analysis of system behavior revealed that the conservative system provided the greatest level of challenge, i.e., it produced the highest average level of demand and made the largest number of adjustments to increase game demand. This skew towards increased adjustment of demand was mirrored by the liberal system, which tended to adjust difficulty in the opposite direction, such that the liberal version produced the lowest number of game deaths and lowest average level of demand. The moderate system produced a pattern of upward and downward adjustments that represented a midpoint between that of the conservative and liberal systems. As anticipated, the number of adjustments made manually by participants was lower than the numbers produced by the biocybernetic loop, as they tended to simply increase the level of difficulty to their preferred level early in the game without making any subsequent adjustments. The mean level of difficulty during play on the manual system (included as a benchmark to compare with the adaptive protocols) provided an indication of the optimal level of demand for the group (3.3). By contrast, the conservative system generally pushed the players to a higher level of demand (3.8), resulting in the greatest number of game deaths; the moderate and liberal systems tended to set difficulty at a lower level than the manual system on average. Therefore, the three adaptive systems and their respective triggers tended to either over- or undershoot the mean level of demand that was preferred by our participants.

It was noteworthy that the conservative system produced a large number of upward adjustments in game demand (63.6), suggesting that this system was detecting boredom via the EEG (i.e., a 200% decrease in theta and increase in alpha relative to baseline; **Figure 5**). Boredom may have resulted from games starting on the slowest drop speed setting and the return to the slowest speed when the game was reset (**Table 5**). By contrast, the liberal system produced more downward than upward adjustments even though games started at the easiest level, meaning that on some occasions where the trigger criteria for a downward adjustment were fulfilled the interface was unable to slow the speed because the player was already at the lowest level. The liberal system was also detecting more player overload than the other two adaptive systems (more downward adjustments), i.e., the trigger criteria of a 100% decrease in theta and alpha power relative to baseline were fulfilled the most frequently (**Figure 5**). This is surprising in view of the low levels of demand delivered by the liberal system. One explanation may be that the EEG indicators of overload used were incorrect and that simultaneous decreases in alpha and theta power of around 100% indicate low effort (reduced frontal theta) combined with low levels of sensory processing (reduced parietal alpha) instead of overload (Klimesch, 1999). It may be that deviations of 200% or more are required to indicate overload where excessive demand leads to a high level of alpha power suppression, as occurred when demand was excessive in the first study (**Figure 4**). This underlines the importance of not only selecting the best combination of input measures for the biocybernetic loop, but also of defining accurate trigger criteria in terms of the relative magnitudes of the input variables.

It was expected that player experience would be affected by the different outcomes in system behavior between the four versions of the system. However, there were few statistically significant effects on mood and immersion. Alertness was enhanced under the conservative system relative to the liberal system but there were no other significant effects. Of the four systems analyzed, the conservative version of the loop produced the most desirable overall impact on player mood, i.e., it evinced the greatest increase in arousal and the least negative affect, which may be because participants were too challenged to dwell upon their emotional state. It may also be because the conservative system was the most successful at detecting boredom and alleviating it with increases in demand. Conversely, ratings of immersion in the game were greatest for the manual system, which may reflect the impact of taking momentary breaks from the game to voluntarily control difficulty with a verbal instruction. Even very short breaks from a task are known to increase vigilance performance (Ariga and Lleras, 2011) and opportunities for control can increase the intrinsic motivation for a task (Fisher, 1978). Alternatively, it may be argued that the level of demand during the manual control condition was optimal for enhancing immersion. The observation that play on the game increased negative affect under all but the conservative system was unanticipated and may reflect the impact of the lower levels of challenge experienced by participants.

Based upon the results, it would appear that the criteria used to define the three versions of the biocybernetic loop were too similar to evince much difference in player experience. It may be speculated that if players were provided with more time to experience play upon each version of the system they may have been better able to differentiate their respective experiences.

The biocybernetic loop employed a straightforward linear process of calibration to the individual instead of machine learning algorithms. The rationale for this approach was that our psychophysiological measures, EEG theta and upper alpha frequency power, had been validated prior to construction of the loop—and we wished to preserve the transparency of both measures and criteria when testing the working loop. However, there may have been scope to use machine learning during calibration such that more precise linear models may have been generated especially for each participant.

The results highlighted a number of questions surrounding the evaluation of working biocybernetic systems, particularly with respect to the benchmarking of system performance. In this study, a manual system was selected as the benchmark for comparative purposes on the assumption that participants would tailor gameplay to their personal preference. However, this comparison was asymmetrical because the locus of system control for a manual system resided with the user while control was automated within the biocybernetic loop. This is a significant factor when comparing player experience across automated and manual systems since the opportunity for control over a task (as provided by the manual system) is known to affect the level of engagement with that task (Fisher, 1978; Wright and Kirby, 2001). Comparisons with other autonomous systems may therefore be more informative. For example, the loop could be benchmarked against a system that adapts game demand in a random fashion without an objective rationale, or against a ''yoked'' system where the game responds to the physiology of another individual (Bailey et al., 2006). Either of these options may have provided a more parsimonious comparison with the three versions of the working biocybernetic loop.

# GENERAL DISCUSSION

This article has described the process of creating a working biocybernetic loop whereby hypotheses derived from experimental work on EEG were first validated in a gaming context in order to select the input measures for the loop. Predictions regarding the modulation of EEG frontal theta and alpha power by variations in the level of cognitive demand and effort were validated during Tetris play; subsequently an adaptive game of Tetris was built that used a biocybernetic loop with the EEG measures tested during the validation stage. Our development process for this prototype exemplifies the principle of designing interactive technologies based upon a theory-driven process of psychophysiological inference (Fairclough, 2009).

The evaluation of autonomous, closed-loop control systems raises important issues for the development of biocybernetic adaptation. The relationship between criteria or categories of psychophysiological activity and the triggering of adaptive responses at the interface requires careful design. The derivation of valid input measures and effective categorization of psychophysiological data in real-time is one stage of this process. Once a method of categorizing the states of the user has been defined (**Figure 5**), these classes must be mapped onto appropriate responses at the interface. This mapping reflects more than a simple linkage between state x and response y; decisions must be made regarding the frequency and likelihood of those responses as well as the temporal characteristics and relative magnitude of the adaptations. As was demonstrated in the evaluation study, once a working biocybernetic loop has been constructed, responses may be adjusted to optimize the user experience, a process that inevitably involves exploring the interaction between the user and the adaptive response. The behavior of the biocybernetic loop and the interaction between user psychophysiology and adaptive control is an object of study in itself.

Together these two studies provide a potential blueprint for the development and evaluation of a biocybernetic loop. However, further research is required to incorporate psychophysiological theory into the design of physiological computing systems and to develop an effective methodology for system evaluation.


#### AUTHOR CONTRIBUTIONS

The design and experimental protocol used for Study 1 was developed by KCE and SHF. The Tetris game used in Study 1 was built by KG. Data collection and analysis for Study 1 was performed by KCE. The design and experimental protocol for Study 2 was developed by KG and SHF. Build of the biocybernetic loop and adaptive game of Tetris for Study 2 was performed by KG as was the data collection and analysis for this study. The manuscript was written and edited by KCE (primary author), SHF and KG.

## FUNDING

This research was funded by a grant from the REFLECT Project; the European Union's Future and Emerging Technologies Scheme, 7th Framework Programme (FP7). Grant number 215893. Project website: http://reflect.pst.ifi.lmu.de/.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Ewing, Fairclough and Gilleade. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Neural Mechanisms of Inhibitory Response in a Battlefield Scenario: A Simultaneous fMRI-EEG Study

Li-Wei Ko<sup>1,2,3</sup>\*, Yi-Cheng Shih<sup>1,2</sup>, Rupesh Kumar Chikara<sup>2,3</sup>, Ya-Ting Chuang<sup>1,2</sup> and Erik C. Chang<sup>4</sup>\*

<sup>1</sup> Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan, <sup>2</sup> Brain Research Center, National Chiao Tung University, Hsinchu, Taiwan, <sup>3</sup> Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan, <sup>4</sup> Institute of Cognitive Neuroscience, National Central University, Taoyuan, Taiwan

The stop-signal paradigm has been widely adopted as a way to parametrically quantify the response inhibition process. To evaluate inhibitory function in realistic environmental settings, the current study compared stop-signal responses in two different scenarios: one uses simple visual symbols as go and stop signals, and the other translates the typical design into a battlefield scenario (BFS) where a sniper-scope view was the background, a terrorist image was the go signal, a hostage image was the stop signal, and the task instructions were to shoot at terrorists only when hostages were not present but to refrain from shooting if hostages appeared. The BFS created a threatening environment and allowed the evaluation of how participants' inhibitory control manifests in this realistic stop-signal task. In order to investigate the participants' brain activities with both high spatial and temporal resolution, simultaneous functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) recordings were acquired. The results demonstrated that both scenarios induced increased activity in the right inferior frontal gyrus (rIFG) and presupplementary motor area (preSMA), which have been linked to response inhibition. Notably, in the right temporoparietal junction (rTPJ) we found both higher blood-oxygen-level dependent (BOLD) activation and synchronization of theta-alpha activities (4–12 Hz) in the BFS than in the traditional scenario after the stop signal. The higher activation of rTPJ in the BFS may be related to morality judgments or attentional reorienting. These results provided new insights into the complex brain networks involved in inhibitory control within naturalistic environments.

Keywords: electroencephalography (EEG), functional magnetic resonance imaging (fMRI), inhibitory control, theta-alpha band, right temporoparietal junction (rTPJ)

#### Edited by:

Klaus Gramann, Berlin Institute of Technology, Germany

#### Reviewed by:

Marcus Heldmann, University of Lübeck, Germany

Scott Edward Kerick, US Army Research Laboratory, USA

#### \*Correspondence:

Li-Wei Ko lwko@mail.nctu.edu.tw; Erik C. Chang auda@ncu.edu.tw

Received: 16 December 2015; Accepted: 11 April 2016; Published: 02 May 2016

#### Citation:

Ko L-W, Shih Y-C, Chikara RK, Chuang Y-T and Chang EC (2016) Neural Mechanisms of Inhibitory Response in a Battlefield Scenario: A Simultaneous fMRI-EEG Study. Front. Hum. Neurosci. 10:185. doi: 10.3389/fnhum.2016.00185

# INTRODUCTION

Inhibitory control is a crucial aspect of cognitive control processes. It allows one to stop an ongoing action when it is deemed inappropriate (Aron, 2007). Bari and Robbins (2013) suggested dividing inhibitory control into two categories: cognitive inhibition and behavioral inhibition. Cognitive inhibition can be defined as ''the stopping or overriding of a mental process, in whole or in part, with or without intention'' (MacLeod, 2007), and is usually measured by the interference task (Kipp, 2005; Leroux et al., 2006). In contrast, behavioral inhibition, which is the focus of the current study, refers to suppressing the actualization of a behavioral outcome, and can be measured by the stop signal task (SST) or go/no-go task (GNGT). Both SST and GNGT use frequent go trials, which require participants to perform an action (e.g., press a key button), and infrequent stop (no-go) trials, which require participants to inhibit the prepared action (e.g., not to press a key button) upon receiving an additional stop signal (SST) or a different target stimulus (GNGT).

Previous studies have adopted either GNGT or SST to explore the neuroanatomical loci and temporal characteristics of associated brain activities with functional magnetic resonance imaging (fMRI) and electroencephalography (EEG), respectively. In the neuroanatomical domain, many studies found that the prefrontal gyrus (PFG) is important for executive control (for a comprehensive review, see Miller and Cohen, 2001). Consistent activation for response conflict, novelty, working memory (number of elements and delay) and perceptual difficulty has been observed in the inferior frontal gyrus (IFG), dorsal anterior cingulate gyrus (ACG) and dorsolateral prefrontal gyrus (DLPFG), but not in other frontal regions, regardless of the specific contrast task (Duncan and Owen, 2000). Aron et al. (2004) concluded that the right IFG (rIFG) was more closely related to inhibitory control because damage to the rIFG crucially affected performance in executive cognitive control paradigms, apparently by disrupting inhibition.

However, a number of studies have also proposed that the rIFG is recruited across different task conditions that require sustained attention (Shallice et al., 2008a,b; Simmonds et al., 2008). Hampshire et al. (2010) also suggest that the rIFG serves a general role in attentional control, which rapidly adapts in order to respond to relevant and salient stimuli related to inhibitory control in GNGT and SST. Hence, the suppression of an already initiated response likely depends on rIFG, yet exactly how the inhibitory function is manifested in the motor system remains to be investigated. On the other hand, Aron and Poldrack (2006) had shown that the subthalamic nucleus (STN), which is a part of the basal ganglia, may play a role in suppressing the ''direct'' fronto-striatal pathway that is activated by response initiation and also involves the presupplementary motor area (preSMA). The findings by Mostofsky et al. (2003) suggest that the preSMA appears necessary for inhibiting unwanted movements (stop or no-go condition). Based on previous studies (Aron and Poldrack, 2006; Nachev et al., 2008; Verbruggen and Logan, 2008), Duann et al. (2009) applied Granger causality analysis in an fMRI study on the stop-signal task to explore the functional connectivity of IFG and preSMA. Their study found that preSMA and primary motor gyrus (PMG) have functional interconnectivity via the basal ganglia circuitry to mediate response inhibition, whereas IFG connects with preSMA to modulate the basal ganglia circuitry. According to Duann et al. (2009), the PMG is mediated by IFG and preSMA via basal ganglia circuitry and the functional connectivity between IFG and preSMA is ''bi-directional'' in SST. Recently, IFG has been hypothesized to serve various functions including resolution of stimulus conflict, attentional orienting or the monitoring of behavior. Consequently, results from some studies have suggested that preSMA is more directly related to response inhibition than IFG, given its involvement in motor control (Bari and Robbins, 2013; Obeso et al., 2013; Aron et al., 2014).

Although the imaging studies are informative about the neuroanatomical loci of response inhibition in the brain, equally important is how the inhibitory process evolves across time upon its inception. Huster et al. (2013) reviewed EEG studies on response inhibition under GNGT and SST. Most empirical reports mainly examined event-related potentials (ERP), and it is commonly observed that both stop and no-go conditions evoked two different ERP components which are usually absent in the go condition: a fronto-central negativity occurring around 200–300 ms after stimulus onset (stop or no-go stimulus), followed by a positive potential with a delay of approximately 150 ms exhibiting a fronto-central to centro-parietal topography. These two components have often been conjointly referred to as the N2/P3 complex. Nevertheless, N200s and P300s were also evoked in a broad range of paradigms, including but not limited to response inhibition (e.g., SST, GNGT, Stroop task, Flanker task and Simon task; Kopp et al., 1996; Liotti et al., 2000; Falkenstein et al., 2002; Nieuwenhuis et al., 2004; Ramautar et al., 2006; Johnstone et al., 2007; Bruchmann et al., 2010).

To more specifically determine the temporal marker(s) for response inhibition, an alternative way of analyzing EEG data is through time-frequency analysis for uncovering the oscillatory components involved in inhibitory response (e.g., Herrmann et al., 2005). Basar et al. (1999) demonstrated that EEG can be investigated in the frequency domain and oscillations of specific frequencies are related to specific cognitive functions, such as alpha band (8–12 Hz) fluctuations during both sustained and directed attention (Mathewson et al., 2014). While ERP analysis generally compares latencies or magnitudes of components elicited by different conditions (e.g., go condition vs. stop or no-go conditions), in time-frequency analysis the oscillations of frequency bands associated with different conditions are usually compared. Recently, a number of studies have applied time-frequency analysis in response inhibition tasks. The most common finding from these time-frequency analyses is a burst in frontal-midline theta power for no-go and stop signal conditions as compared to the go condition between 200 and 600 ms after the no-go or stop signal presentation, which falls well within the time range of the N2/P3 complex (Huster et al., 2013). In addition, Schmiedt-Fehr and Basar-Eroglu (2011) also reported activity in the delta power for the same time window using a GNGT. These time-frequency components seem to be more specifically associated with response inhibition.
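As a generic illustration of time-frequency analysis, the sketch below estimates theta power over time with a complex Morlet wavelet, one common choice for this purpose. It is not the method of any study cited here: the function name `morlet_power`, the four-cycle wavelet width, the 256 Hz sampling rate and the synthetic signal (a 6 Hz burst between 200 and 600 ms on a weak 10 Hz background) are all assumptions chosen for illustration.

```python
import cmath
import math


def morlet_power(signal, fs, freq, n_cycles=4):
    """Power over time at `freq`, using a complex Morlet wavelet.

    At each sample the signal is multiplied by the conjugated wavelet centred
    on that sample and summed; samples beyond the signal edges are skipped.
    """
    sigma_t = n_cycles / (2 * math.pi * freq)   # Gaussian SD in seconds
    half = int(round(3 * sigma_t * fs))         # wavelet spans +/- 3 SD
    wavelet = [
        cmath.exp(2j * math.pi * freq * (i / fs))
        * math.exp(-((i / fs) ** 2) / (2 * sigma_t ** 2))
        for i in range(-half, half + 1)
    ]
    norm = sum(abs(w) for w in wavelet)
    n = len(signal)
    power = []
    for t in range(n):
        acc = 0j
        for j, w in enumerate(wavelet):
            idx = t + j - half
            if 0 <= idx < n:
                acc += signal[idx] * w.conjugate()
        power.append(abs(acc / norm) ** 2)
    return power


# Synthetic signal (not EEG): 6 Hz burst from 0.2 to 0.6 s, weak 10 Hz background.
fs = 256
times = [i / fs for i in range(int(1.2 * fs))]
signal = [
    (math.sin(2 * math.pi * 6 * t) if 0.2 <= t < 0.6 else 0.0)
    + 0.2 * math.sin(2 * math.pi * 10 * t)
    for t in times
]
theta_power = morlet_power(signal, fs, freq=6.0)
peak_time = times[max(range(len(theta_power)), key=theta_power.__getitem__)]
```

The resulting power trace peaks inside the burst interval, which is the kind of time-locked increase that a comparison between stop and go conditions would look for; the number of wavelet cycles trades temporal against frequency resolution.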

Most studies explored the ''inhibitory network'' by using stimuli with simple configurations in SST or GNGT (e.g., a circle as the go stimulus, and an ''X'' as the stop (no-go) signal; Chang et al., 2014; Lavallee et al., 2014). How this inhibitory network for typical SST generalizes to response inhibition in more realistic scenarios remains to be investigated. The generalizability issue is not new in cognitive experiments, and not many studies have explored how well cognitive phenomena established in simple scenes can be generalized to more complex and realistic ones. Lapenta et al. (2014) used transcranial direct current stimulation (tDCS) to explore inhibitory control of EEG under food craving using realistic food pictures as the go signal in GNGT. They first induced the participant's food craving by a brief movie showing scenes of food and then required the participant to complete a visual analog scale for appearance, smell and taste of the exposed food. All participants were then required to perform GNGT twice: once with active tDCS at F4 and F3 (10–20 EEG coordinate system) and once with sham tDCS. Their results indicated that tDCS reduced the magnitude of the frontal N2 component but enhanced the P3a component, as compared with the sham condition. Regenbogen et al. (2010) used real and virtual computer game scenarios to compare the pattern of brain activation between gamers and non-gamers. They analyzed fMRI data by contrasting different combinations of conditions, including Violent vs. Nonviolent scenarios under real and virtual modality for gamers and non-gamers, respectively. The activity pattern of non-gamers under the contrast Violent vs. Nonviolent was more complex than that of gamers in both real and virtual scenes. More importantly, when the neural activities of the real modality were compared with the virtual modality between gamers and non-gamers, they found that non-gamers had more activated brain regions when contrasting Violent vs. Nonviolent conditions, and when contrasting Real vs. Virtual scenes. Based on the findings above, it seems that real and virtual scenes may recruit the brain in distinct ways.

Given the paucity of literature exploring response inhibition by combining methods with high spatial and temporal resolutions, by applying time-frequency analyses, and by contrasting performance under simple vs. naturalistic scenarios, the current study aims to compare behavioral performance and neural mechanisms of inhibitory response under simple and realistic scenarios with simultaneous recording of EEG and fMRI. Scenes from a well-known shooting game ''Count Strike'' were adopted as the visual background in the battlefield scenario (BFS), where an image of a ''terrorist'' holding a gun served as the go signal, and a ''hostage'' image as the stop signal. Besides a higher degree of visual complexity, this scenario is supposed to induce stressful feelings in the participants. As a control condition, the conventional SST in which simple symbols represent go and stop signals, namely a symbol scenario (SBS), was also adopted. In order to investigate both the rapid brain dynamics and precise spatial loci of the inhibitory process, simultaneous fMRI and EEG recordings were carried out to acquire signals of brain activation with high spatial and temporal resolutions, respectively. Compared with independent recording of fMRI and EEG, simultaneous fMRI-EEG ensures that the functional activations and frequency oscillations of brain networks are characterized under the same experimental condition, and thus more likely reflect the same neural networks (Mulert, 2013). The current study examined significant differences in fMRI and EEG responses associated with successful-stop (SS) vs. successful-go (SG) trials to identify inhibition-related brain activations/dynamics, and SS vs. fail-stop (FS) trials to identify error-related brain activations/dynamics (Li et al., 2006; Boehler et al., 2010; Swick et al., 2011).
Based on the literature of inhibitory control reviewed above, we predict that preSMA will show fMRI activation and modulations in theta-alpha band power under both scenarios of SST. However, for the comparison between SBS and BFS, it remains an empirical question whether additional neural networks related to cognitive processing of emotional or social information, such as amygdala or middle temporal gyrus would be involved.

# MATERIALS AND METHODS

## Participants

All participants (n = 35; mean age = 23.39; SD = 1.86) were right-handed, had normal or corrected-to-normal vision, and none reported a history of neurological or psychiatric disorders. Each participant provided written informed consent approved by the Research Ethics Committee of the National Taiwan University prior to participation. Data from three participants were excluded from analyses due to low performance in SST (SG ratio more than 2 SD below the group mean). Among the remaining participants, simultaneous fMRI-EEG data were successfully acquired from 11 participants, and 21 participants had fMRI data only. Therefore, the fMRI results were based on 32 datasets, whereas the EEG results were based on 11 datasets. Although there were only 11 participants for the EEG analysis, given that each participant made responses to 105 trials, the total number of epochs was 1155. These epochs were distributed into the four conditions (SG = 705, SS = 170, FG = 92, FS = 188). We consider this number of epochs sufficient for our EEG analyses.
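The sample sizes and epoch counts reported above can be cross-checked with a few lines of arithmetic. The snippet below is only a bookkeeping sketch; the variable names are ours and all numbers are copied from the paragraph above.

```python
# Counts reported in the Participants section (variable names are illustrative).
n_recruited = 35
n_excluded = 3
n_eeg_fmri = 11      # participants with simultaneous fMRI-EEG data
n_fmri_only = 21     # participants with fMRI data only
trials_per_participant = 105
epochs_by_condition = {"SG": 705, "SS": 170, "FG": 92, "FS": 188}

n_fmri = n_recruited - n_excluded
total_epochs = n_eeg_fmri * trials_per_participant

# Both pairs of numbers should agree if the reported counts are consistent.
print(n_fmri, n_eeg_fmri + n_fmri_only)
print(total_epochs, sum(epochs_by_condition.values()))
```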

## Experimental Design

The experiment implemented the stop-signal task under two different scenarios (**Figure 1**), one consisting of simple symbols (i.e., SBS) and the other of battlefield images (i.e., BFS). Every participant was asked to respond to a go stimulus (a circle for SBS and a ruffian for BFS) and to withhold the response when a stop stimulus (a cross for SBS and a hostage for BFS) was presented after the go stimulus. The critical stop signal delay (cSSD), the delay at which the probability of SS is approximately 50%, was measured using a staircase tracking procedure before participants performed the formal experimental trials in the scanner. The staircase tracking procedure worked in the following way: SSD started at 150 ms; if the participant successfully stopped their response, SSD increased by 50 ms; otherwise, SSD decreased by 50 ms, with a lower bound of 150 ms. The formal task used five different SSDs (cSSD, cSSD ± 40 ms, and cSSD ± 80 ms) and each SSD had an equal number of trials. The participant performed four runs in the fMRI experiment and each run was equally divided into one half for BFS and the other for SBS, where the order of scenario was completely counterbalanced across runs. Each block of scenario in a run had 105 trials of which 25% were stop trials while the rest were go trials. Each go trial began with a fixation cross lasting for a random duration (0.5–6.5 s), followed by a go-signal lasting for 1 s or until response. In a stop trial, the stop-signal was presented N milliseconds after the go-signal, where N was defined by the SSD assigned to that trial.
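The staircase rule and the five SSD levels described above can be sketched in a few lines of Python. This is a minimal illustration, not the presentation code used in the study: the function names and the example outcome sequence are hypothetical, and how cSSD was read off the staircase is not specified in the text, so it is taken here as a given input.

```python
def staircase_ssd(outcomes, start_ms=150, step_ms=50, floor_ms=150):
    """Return the SSD presented on each stop trial.

    outcomes: iterable of booleans, True if the participant stopped successfully.
    """
    ssd = start_ms
    history = []
    for stopped in outcomes:
        history.append(ssd)
        if stopped:
            ssd += step_ms                      # successful stop: longer delay next time
        else:
            ssd = max(floor_ms, ssd - step_ms)  # failed stop: shorter delay, bounded below
    return history


def formal_ssds(cssd_ms):
    """Five SSD levels of the formal task: cSSD, cSSD +/- 40 ms, cSSD +/- 80 ms."""
    return [cssd_ms + offset for offset in (-80, -40, 0, 40, 80)]


# Hypothetical sequence of stop-trial outcomes.
print(staircase_ssd([True, True, False, False, False, True]))
# Hypothetical cSSD of 190 ms.
print(formal_ssds(190))
```

Note that with a 150 ms floor and a 150 ms starting value, an initial run of failed stops leaves the SSD unchanged.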

## fMRI Signal Acquisition and Preprocessing

Participants performed the task in a Siemens 3T MAGNETOM Skyra scanner located in the Taiwan Mind and Brain Imaging Center at National Chengchi University, Taipei. Structural T1 weighted images were acquired using the MPRAGE sequence (TR: 2530 ms; TE: 3.03 ms; flip angle: 7°; matrix size: 224 × 256; field of view: 224 × 256 mm; in-plane resolution: 1 × 1 mm; slice thickness: 1 mm; 192 slices). Functional brain images were acquired using a gradient echo-planar imaging sequence (TR: 2000 ms; TE: 25 ms; flip angle: 90°; matrix size: 64 × 64; field of view: 220 × 220 mm; voxel size: 3.438 × 3.438 × 4.0 mm<sup>3</sup>; 292 volumes per run). The preprocessing stream as well as statistical analyses were completed using the Analysis of Functional Neuroimages (AFNI) software (Cox, 1996). The preprocessing stream included image reconstruction, slice-time correction (time-shifting the time series using Fourier interpolation), and motion-correction (linear least-squared alignment via affine transformation with three translational and three rotational parameters). Activation outside the brain was removed using edge detection techniques. After the preprocessing, each participant's anatomical image was transformed into the standard space of the Montreal Neurological Institute (MNI) 152 brain template using an automated feature-matching algorithm (Collins et al., 1994). Each participant's functional data were first aligned to their own anatomical image and then transformed into the standardized MNI space.

## EEG Signal Acquisition and Preprocessing

An MR-compatible 34-channel amplifier (BrainAmp MR; Brain Products) and an MR-compatible EEG cap (BrainCap-MRI 32-Channel-Standard) with a head volume coil were applied in this study. EEG was recorded in the MR scanner room simultaneously with fMRI acquisition. The EEG cap had 31 electrodes for brainwave recording and one for electrocardiography (ECG) recording. Electrode-skin impedance was kept below 10 kOhms by using abrasive electrolyte-gel (ABRALYT HiCl). Data were transferred through fiber-optic cables to an IBM-compatible laptop and recorded by the BrainVision Program (BrainVision Recorder, Brain Products) synchronized with the BOLD signals via triggers from the MR scanner. The EEG signals were recorded with a passband of 1–250 Hz, digitized at 5000 Hz with 32-bit resolution (equivalent to 0.5 µV; dynamic range: 16.38 mV). The EEG data were band-pass (1–50 Hz) filtered and re-referenced to the average of channels TP9 and TP10. The MR gradient artifacts in the EEG data were corrected. The MR-denoised EEG data were then downsampled to 500 Hz, and the cardioballistic signals from the ECG recording were used to adjust EEG signals via peak-detection algorithms in the BrainVision Analyzer software. Severe artifacts in the EEG signal induced by muscle activities, environmental noise, eye movements, and blinking were manually removed to minimize their impact on the subsequent analysis.

## Behavioral Data Analysis

We calculated the SG and SS ratios of both scenarios to verify whether each participant's performance met the criterion. Behavioral characteristics of performance in the stop-signal task, including the go reaction time (Go-RT) and cSSD, were analyzed with Student's t test (BFS vs. SBS). Furthermore, the stop-signal reaction time (SSRT) based on the horse-race model of stopping (Logan et al., 1984) was computed to represent one's inhibitory ability. Since the stopping mechanism itself cannot be directly measured, the SSRT was calculated by subtracting SSD from the Go-RT. The inhibition function was computed as the number of SS trials divided by the number of all stop trials, and subjected to a two-way within-subject ANOVA to assess the effect of Scenarios (BFS vs. SBS), SSD (cSSD, cSSD ± 40 ms, and cSSD ± 80 ms), and their interaction.
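The two behavioral measures defined above, SSRT as Go-RT minus SSD and the inhibition function as the proportion of successful stops at each SSD, can be illustrated as follows. This is a sketch under stated assumptions: the function names and the toy trial data are hypothetical, and the mean Go-RT is used as the summary of the go distribution, which the text does not specify.

```python
from statistics import mean


def ssrt_ms(go_rts_ms, ssd_ms):
    """SSRT estimated by subtracting the SSD from the (mean) go reaction time."""
    return mean(go_rts_ms) - ssd_ms


def inhibition_function(stop_trials):
    """Proportion of successful stops at each SSD.

    stop_trials: list of (ssd_ms, stopped) pairs, stopped being True for SS trials.
    """
    counts = {}
    for ssd, stopped in stop_trials:
        n_ss, n_all = counts.get(ssd, (0, 0))
        counts[ssd] = (n_ss + int(stopped), n_all + 1)
    return {ssd: n_ss / n_all for ssd, (n_ss, n_all) in sorted(counts.items())}


# Hypothetical data for one participant in one scenario.
go_rts = [400, 420, 440]
stop_trials = [(150, True), (150, True), (190, True),
               (190, False), (230, False), (230, False)]

print(ssrt_ms(go_rts, ssd_ms=190))
print(inhibition_function(stop_trials))
```

In this toy example the proportion of successful stops falls as the SSD grows, which is the pattern the inhibition function is meant to capture.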

## fMRI Data Analysis

The fMRI analysis was also completed in AFNI. Stimulus types and the participant's responses conjointly determined four conditions for each scenario: SG, SS, FS and fail go. The first-level statistical analysis for each participant was carried out in a general linear model (GLM) by convolving the onset of the go stimulus in the SG, SS, and FS conditions, respectively, with a canonical hemodynamic response function (the BLOCK function in 3dDeconvolve of AFNI). Here the effects of interest are inhibitory control and error detection. The active brain areas for inhibitory control were defined by the contrast between SG and SS; the active brain regions for error detection were defined by the contrast between SS and FS. The scenario effects on inhibitory control and error detection were examined by comparing the ''difference of difference'', namely (SS − SG)<sub>BFS</sub> − (SS − SG)<sub>SBS</sub> and (FS − SS)<sub>BFS</sub> − (FS − SS)<sub>SBS</sub>, respectively. In the second-level analysis, the between-scenario differences were analyzed with a linear mixed-effect model (3dMEMA), and the whole-brain type I error was controlled at a cluster threshold (alpha) of 0.05 via Monte Carlo simulation (3dClustSim).
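The ''difference of difference'' contrast can be written out explicitly for a single voxel or ROI. The following sketch is purely illustrative: the coefficient values and the function name are hypothetical and it does not reproduce the AFNI pipeline.

```python
def scenario_effect(betas, cond_a, cond_b):
    """(cond_a - cond_b) in BFS minus (cond_a - cond_b) in SBS."""
    bfs = betas["BFS"][cond_a] - betas["BFS"][cond_b]
    sbs = betas["SBS"][cond_a] - betas["SBS"][cond_b]
    return bfs - sbs


# Hypothetical GLM coefficients for one participant at one voxel.
betas = {
    "BFS": {"SG": 0.25, "SS": 1.00, "FS": 1.50},
    "SBS": {"SG": 0.25, "SS": 0.75, "FS": 1.25},
}

inhibitory_control = scenario_effect(betas, "SS", "SG")  # (SS - SG)BFS - (SS - SG)SBS
error_detection = scenario_effect(betas, "FS", "SS")     # (FS - SS)BFS - (FS - SS)SBS
print(inhibitory_control, error_detection)
```

A positive value indicates that the within-scenario contrast is larger in BFS than in SBS; a value of zero indicates no scenario effect.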

To more sensitively detect activations associated with inhibitory control and error detection, we also carried out region of interest (ROI) analysis by both adopting ROIs related to stop-signal task in the literature (literature-based ROIs) and by selecting ROIs surviving the whole-brain analysis from the inhibitory control and error detection contrasts, respectively, regardless of scenarios (empirical-based ROIs). For the empirical-based ROIs, the leave-one-subject-out (LOSO) method (Esterman et al., 2010) was applied to extract the GLM coefficients, and the differences between scenarios were statistically assessed. It turned out the literature-based ROIs did not yield any significant difference between scenarios and will not be further described. On the other hand, six ROIs empirically identified from the whole brain analysis of inhibitory control and error detection, respectively, regardless of scenarios were analyzed to verify the between scenario difference. Empirical-based ROIs for inhibitory control included rIFG, left insula, preSMA, left inferior parietal gyrus (IPG), right middle occipital gyrus (rMOG) and left MOG. ROIs for error detection included right middle frontal gyrus (rMFG), left IFG, right IPG, right superior temporal gyrus (STG), right inferior occipital gyrus (IOG) and left MOG. The empirical MNI coordinates of inhibitory control and error detection were listed in the Supplementary Materials Table 1, while the literature-based ROIs were listed in the Supplementary Materials Table 2.
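The logic of the leave-one-subject-out extraction can be sketched as follows: for each participant, the ROI is defined from the remaining participants only, and the coefficient is then read out from the held-out participant, so that ROI selection and ROI measurement stay independent. The sketch below makes strong simplifying assumptions that are ours, not the authors': each contrast map is a flat list of voxel values, and thresholding the mean of the other participants stands in for the group-level statistics computed in AFNI.

```python
def loso_roi_values(contrast_maps, threshold):
    """Leave-one-subject-out ROI extraction on toy data.

    contrast_maps: one list of voxel values per participant (same length each).
    Returns the held-out participant's mean value inside the ROI defined by
    the other participants, or None if that ROI is empty.
    """
    n_vox = len(contrast_maps[0])
    values = []
    for i, held_out in enumerate(contrast_maps):
        others = [m for j, m in enumerate(contrast_maps) if j != i]
        group_mean = [sum(m[v] for m in others) / len(others) for v in range(n_vox)]
        roi = [v for v in range(n_vox) if group_mean[v] > threshold]
        values.append(sum(held_out[v] for v in roi) / len(roi) if roi else None)
    return values


# Hypothetical contrast maps: three participants, three voxels.
maps = [[1.0, 0.0, 2.0],
        [1.0, 0.0, 4.0],
        [3.0, 0.0, 0.0]]
print(loso_roi_values(maps, threshold=1.0))
```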

## EEG Data Analysis

The EEG analysis was completed in EEGLab. Independent Component Analysis (ICA; Makeig et al., 1996; Delorme and Makeig, 2004) was used to separate temporally independent time courses of activation, and the dipole source location of each component (Oostenveld and Oostendorp, 2002) was localized in the brain of each participant for group (cross-subject) analysis. We removed artifact components manually and then performed component clustering based on k-means (k = 5) criteria and dipole-fitting coordinates to identify the most representative clusters. The value of k was determined both by considering the potential number of sources associated with the stop-signal task, and the number of ROIs identified in the fMRI results. One of the five resultant clusters was excluded because fewer than 70% of participants contributed to it. Therefore, four clusters (preSMA, rMFG, and bilateral MOGs) and their dipole locations were identified (see **Figure 2**) to investigate brain dynamics following the go events and the subsequent stop events. Note that the preSMA and bilateral MOG clusters were in anatomical proximity to the inhibitory control ROIs of the fMRI results, and the rMFG and the left MOG clusters were close to the error detection ROIs of the fMRI results.
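The cluster-retention criterion described above (a cluster is dropped when fewer than 70% of participants contribute a component to it) can be expressed compactly. In this sketch the cluster labels and participant IDs are hypothetical, and the clustering itself (k-means in EEGLab) is assumed to have been run already.

```python
def retained_clusters(cluster_members, n_participants, min_fraction=0.7):
    """Keep clusters to which at least min_fraction of participants contribute.

    cluster_members: dict mapping a cluster label to the list of participant
    IDs owning a component in that cluster (IDs may repeat).
    """
    kept = []
    for label, members in cluster_members.items():
        coverage = len(set(members)) / n_participants
        if coverage >= min_fraction:
            kept.append(label)
    return kept


# Hypothetical membership for 11 participants.
clusters = {
    "A": list(range(1, 12)),        # 11 of 11 participants
    "B": [1, 2, 3, 4, 5, 6, 7, 8],  # 8 of 11 participants
    "C": [1, 1, 2, 3, 4, 5, 6, 7],  # 7 of 11 participants (one contributes twice)
}
print(retained_clusters(clusters, n_participants=11))
```

With 11 participants, the 70% criterion corresponds to at least 8 contributing participants.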

Each epoch was separately transformed into the time-frequency domain using the event-related spectral perturbation (ERSP) routine (Delorme and Makeig, 2004). Three conditions, namely SG, SS and FS, were identified as the effects of interest. The baseline was defined as the signal between −0.5 and 0 s before the Go-stimulus for comparing response magnitudes of corresponding epochs. A two-way Scenario × Condition ANOVA was conducted on the baseline data to verify whether they were equivalent across scenarios and conditions. We explored not only the power spectrum of each condition, but also the power spectrum of inhibitory control and error detection, respectively, in each scenario, as was also done in the fMRI analyses.
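Baseline normalization of a time-frequency power estimate can be illustrated for a single frequency bin. The sketch below assumes the common decibel convention, 10·log10(power / mean baseline power), with the baseline window of −0.5 to 0 s taken from the text; the exact normalization used in the study is not stated, and the function name and sample values are hypothetical.

```python
import math


def ersp_db(power, times_s, baseline=(-0.5, 0.0)):
    """Express power at each time point in dB relative to mean baseline power."""
    base = [p for p, t in zip(power, times_s) if baseline[0] <= t < baseline[1]]
    base_mean = sum(base) / len(base)
    return [10 * math.log10(p / base_mean) for p in power]


# Hypothetical power values for one frequency bin, time-locked to the go stimulus.
times = [-0.5, -0.25, 0.0, 0.25, 0.5]
power = [2.0, 2.0, 2.0, 20.0, 0.2]
print(ersp_db(power, times))
```

Positive values correspond to power increases (synchronization) and negative values to power decreases (desynchronization) relative to the pre-stimulus baseline.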

FIGURE 2 | Clusters of dipole locations for the analysis of EEG dynamics. PreSMA and rMFG are used for the analysis of inhibitory control and error detection, whereas lMOG and rMOG are used for processing visual stimuli. Small spheres indicate individual participants' dipole locations, while large spheres indicate dipole locations of each cluster. lMOG, Left middle occipital gyrus; rMOG, Right middle occipital gyrus; preSMA, Pre-supplementary motor area; rMFG, Right middle frontal gyrus.

# RESULTS

#### Behavioral Results

In SBS, the Go-RT, cSSD, SSRT, SG ratio and SS ratio were 425 ± 62 ms, 188 ± 50 ms, 240 ± 60 ms, 94.0 ± 7.6% and 45.6 ± 16.5%, respectively. In BFS, the Go-RT, cSSD, SSRT, SG ratio and SS ratio were 422 ± 58 ms, 195 ± 68 ms, 230 ± 53 ms, 93.0 ± 10.3% and 45.6 ± 13.6%, respectively. When compared between scenarios, none of these behavioral outcomes differed significantly (all ps > 0.05; **Figure 3**). In addition, the averaged inhibition function approached 50% at cSSD and the error rate increased with the length of SSD (**Figure 3**).

#### Imaging Results

#### Inhibitory Control

#### **Whole Brain Analysis**

**Tables 1A,B** summarize brain regions that were more activated in the SS than in the SG condition, namely the inhibitory control component, under SBS and BFS, respectively. **Figure 4** also shows these activations under the two scenarios conjointly so that overlapping brain regions are explicit. In SBS, the MOG and a few different frontal areas were activated in this contrast (see **Table 1A** and **Figure 4** right panel). On the other hand, the brain areas activated by the BFS (see **Table 1B** and **Figure 4** left panel) were similar to those in SBS (see the regions colored in purple in the **Figure 4** middle panel). Moreover, when directly contrasting the two scenarios under inhibitory control, the only significant loci (BFS > SBS) fell within the right temporal-parietal junction (rTPJ; MNI: x = 48, y = −74, z = 11; cluster size = 39).

number unsuccessful stop trials with all stop trials under each SSD.

#### **ROI Analysis**

Pairwise t tests between the BFS and SBS in the six empirically defined ROIs for inhibitory control revealed significantly higher activation in BFS than in SBS at the left IPG (t(32) = 2.4, p = 0.02) and rMOG (t(32) = 2.5, p = 0.02).

#### Error Detection

#### **Whole Brain Analysis**

**Tables 1C,D** summarize brain regions more activated in SS than in fail stop under SBS and BFS, respectively. **Figure 5** also shows these activations in both volumetric (left and right panels) and surface (middle panel) views as described in section ''Inhibitory Control'' for the inhibitory control. In SBS, the MOG, the bilateral IFG, the rMFG and right postcentral gyrus were activated in this contrast (see **Table 1C** and **Figure 5** right panel). On the other hand, the rMFG, left IFG, the right IPG, the fusiform gyrus, the right precuneus and the rMOG were activated by the BFS (**Table 1D** and **Figure 5** left panel). There were very few overlapping brain regions (purple regions in the middle panel of **Figure 5**). When directly contrasting the two scenarios under error detection, no region showed a significant difference.

#### **ROI Analysis**

Paired t tests between the BFS and SBS in the six ROIs mentioned above revealed only significantly higher activation in BFS than in SBS at right IOG (t(32) = 2.7, p = 0.01).

#### EEG Results

**Figure 2** shows the four clusters (rMFG, preSMA, and bilateral MOGs) and their dipoles that fulfilled the cluster selection criteria (see ''EEG Data Analysis'' Section). Because the rMFG is considered a crucial area for sustaining attention rather than stopping action and preSMA is considered directly related to response inhibition, preSMA and rMFG were subject to the analysis at the time period when sustained attention and response inhibition were supposed to be ongoing. On the other hand, because bilateral MOGs were considered only relevant to visual perception, which is relatively minor to the stop-signal task, their ERSPs were analyzed at the time period of visual processing and described in the supplementary materials (Supplementary Figures 5, 6). For the rMFG and preSMA clusters described in the main text, the focus is on the contrasts for inhibitory control (i.e., SS vs. SG) and for error detection (i.e., FS vs. SS) in each scenario. The significant modulations within the individual conditions (i.e., SS, SG, and FS) can be found in **Figures 6**–**9**, which are mainly described in the following sections.

TABLE 1 | Brain regions more activated in (A) SS compared with SG under SBS, (B) SS compared with SG under BFS, (C) SS compared with FS under SBS, (D) SS compared with FS under BFS.

Voxelwise threshold, p = 0.0001; cluster alpha < 0.01; BA, Brodmann Area; R, Right; L, Left.

The baseline power of EEG oscillations was expected to be equivalent between the SS and SG conditions as well as between the FS and SS conditions in both scenarios, because participants should be in a similar state before the presentation of the stimulus in each condition. Consistent with this assumption, one-way ANOVAs comparing SS and SG in BFS and SBS found no significant difference, and neither did the one-way ANOVAs comparing FS and SS. The analyses of baseline power are described in the Supplementary Figures 2, 4.

#### Inhibitory Control

**Figures 6**, **7** show the results of time-frequency analyses in preSMA and rMFG, respectively. In the preSMA component (**Figure 6**), the brain dynamics for inhibitory control can be examined by contrasting the SS and SG conditions. In this contrast, the burst of delta and theta band power was observed in BFS, whereas the suppression of alpha and beta band power was observed in SBS.

In the rMFG component (**Figure 7**), the brain dynamics for inhibitory control (SS vs. SG) showed delta, theta and alpha band power desynchronization after response in BFS, whereas beta band power was in synchronization after go stimulus in SBS.

FIGURE 4 | Inhibitory control related brain areas. All results were mapped onto a standard brain surface model in Caret (Van Essen et al., 2001). Left panel: horizontal sections under the BFS; middle panel: visualization of significant activations on the cortical surface for both scenarios (Red: BFS; Blue: symbol scenario [SBS]; Purple: overlap of both scenarios); right panel: horizontal slices under the SBS. The top-left number beside each slice indicates the z-axis. Right hemisphere is at the right side of the figure. Voxelwise statistical threshold was set at p < 0.0001, and cluster threshold alpha < 0.01.

#### Error Detection

In the preSMA component (**Figure 8**), the brain dynamics for error detection (SS vs. FS conditions) showed the suppression of theta and alpha band power in BFS; on the other hand, all frequency bands power of FS condition displayed much greater magnitude than SS in SBS.

In the rMFG component (**Figure 9**), the brain dynamics for error detection (SS vs. FS) showed that delta and theta band power were in desynchronization after response in the BFS, whereas beta band power was in synchronization after go stimulus in SBS.

# DISCUSSION

The current study aims to compare inhibitory functions and the associated brain mechanisms underlying realistic and simplified scenarios. Based on the behavioral results, participants successfully performed the stop-signal task under BFS and SBS (the SG ratio was above 90% and SS ratio approached 50% for both scenarios). The SSRT has been suggested to be an indicator of one's inhibitory ability (Band et al., 2003). Since SSRTs of the two scenarios do not differ, performance on response inhibition does not seem to be influenced by the different scenarios one faces, likely due to the highly adaptive nature of human inhibitory processing. The brain mechanisms for inhibition under the two scenarios can therefore be compared on an equivalent basis of behavioral performance.

To summarize, the main findings in the fMRI and EEG data are as follows: in the whole-brain analysis of fMRI data, a significant difference between BFS and SBS was found only in rTPJ for inhibitory control, and no significant region was found for error detection. In the ROI analysis of fMRI data, a significant difference between the two scenarios (BFS > SBS) was found in left IPG and rMOG for inhibitory control, and in right IOG for error detection. As for the EEG results, for inhibitory control in the preSMA, the burst of delta and theta band power was observed in BFS, whereas the suppression of alpha and beta band power was observed in SBS. In the rMFG, there was delta, theta and alpha band power desynchronization after response in BFS, and beta band power synchronization after go stimulus in SBS. For error detection in the preSMA, there was suppression of theta and alpha band power in BFS, and broadband synchronization in SBS. In the rMFG, there was delta and theta band power desynchronization after response in the BFS, and beta band power synchronization after go stimulus in SBS.

#### Neural Mechanisms of Inhibitory Control

The fMRI results show that, under the contrast of inhibitory control, the stop-signal task in BFS and SBS activate overlapped brain areas including preSMA, rIFG, bilateral IPG, bilateral MOG. All of these brain areas are either involved in target detection or attention to salient events (Corbetta and Shulman, 2002; Eckert et al., 2009; Menon and Uddin, 2010). Specifically, one of MOG functions is visual form perception and recognition (Grill-Spector and Malach, 2004), the parietal lobe is a crucial locus for spatial attention (Yantis et al., 2002), and the rIFG and preSMA show significant activation for contrast between SS and SG conditions (Boehler et al., 2010; Swick et al., 2011). While participants performed stop-signal task in both scenarios, we expect to observe stronger activations in BFS than SBS because BFS contains more complex visual information and may evoke other cognitive functions involved in the inhibitory network.

With respect to the main goal of the current study, when contrasting the inhibitory control component in both scenarios in the whole-brain analysis, we observe higher activation for the BFS in the rTPJ. The rTPJ has been implicated, together with the rIPG, in detecting behaviorally relevant salient events (Corbetta and Shulman, 2002; Husain and Nachev, 2007). Chang et al. (2013) used transcranial magnetic stimulation (TMS) to interfere with bilateral TPJ to probe its function in attentional networks, and found that the rTPJ is critically involved in attentional reorienting. In addition, rTPJ is also involved in the ''theory-of-mind'' (ToM) network which includes the medial PFG, precuneus, right superior temporal sulcus and bilateral TPJ (Saxe and Kanwisher, 2003; Aichhorn et al., 2009). The ToM network increases metabolic activity when one thinks about other people's thoughts. Koster-Hale et al. (2013) used multi-voxel pattern analysis to examine the difference between intentional and accidental harms on other people, and concluded that rTPJ is associated with moral judgments. In the current study, the rTPJ may serve one or a few of the functions mentioned above in BFS because the task involves a shooting decision which may be aimed at an innocent hostage.

There is a greater potential negative consequence of failing to stop a shooting response in the presence of an innocent hostage, which may actually decrease response impulsivity but yet still increase the level of activation of inhibitory systems. To verify this speculation with enhanced sensitivity, six brain areas are selected from the whole-brain analysis of inhibitory control that were localized by contrast orthogonal to the scenario effect, including the rIFG, preSMA, left insula, left IPG, rMOG and left MOG. The left IPG and rMOG show a greater activation in BFS. To relate the findings with the roles of these ROIs in previous studies, the left IPG has been implicated in tool manipulation (Ishibashi et al., 2011) or executive function (Kübler et al., 2003), which are both relevant in the current study because participants might have connected the task to firing with a gun (using a tool) to shoot terrorist in BFS. According to Slotnick et al. (2003), the reallocation of visual attention to external stimulus will result in an increase in occipital activation. In ROI results, the stronger activation of rMOG suggests that participants might have focused on terrorist and hostage and ignore the battlefield background. However, as BFS and SBS not only differed in their contextual information but also their visual complexity and emotional implications, the above conjectures need to be considered with caution.

With respect to the temporal dynamics, we first examine common findings in SS and SG conditions of both scenarios. In the preSMA source, there is a burst of each frequency band power except the beta band following the go stimulus, which lasted for 400–600 ms. This phenomenon is consistent with what was found after no-go and stop signal trials in previous studies (Schmiedt-Fehr and Basar-Eroglu, 2011; Huster et al., 2013). Because the preSMA is essential for the conversion from volitional thoughts to actions (Penfield and Welch, 1951; Fried et al., 1991), the beta band power has been generally considered as a marker of explicit responses. The event related desynchronization (ERD) of beta band occurs before and during response and then the event related synchronization (ERS) would follow actual response (Schulz et al., 2014). In the current experiment, the ERD of beta band occurs in SS and SG conditions of two scenarios likely because participants have already prepared to respond when they see the go stimuli; however, the ERS of beta band only occurs in SG condition of both scenarios because participants do not make actual response in the SS condition. Furthermore, the spectral perturbation of SS between BFS and SBS show that the power of theta-alpha band is much greater in BFS. According to Huster et al. (2013), the burst of frontal theta band power is associated with successful inhibition. In the current study, we observed synchronization of theta-alpha band power of SS under BFS and SBS in the preSMA. Furthermore, the theta-alpha band power in preSMA of BFS is higher than SBS, which suggests that the impulsivity in BFS is stronger than in SBS.

One thing worth noticing is that, in the preSMA brain source (**Figure 6**), there is no difference in the baseline power between scenarios, likely because each go stimulus may or may not be followed by a stop signal. This indicates that these two scenarios had the same baseline states when preparing for inhibiting prepotent response in the current trial regardless of stop signal. However, unlike in fMRI analysis, the rTPJ does not show significant differences between scenarios in the EEG analysis.

On the other hand, Swann et al. (2012) demonstrate that the power of 4–15 Hz is suppressed and beta band power would increase in right frontal lobe after the stop signal. Beta band power from the right frontal lobe may serve to compute coherence with preSMA. Therefore, they suggest that right frontal lobe monitors and detects the stop signal and then transfers the information to preSMA (coherent beta activity). This finding about the role of the right frontal lobe is similar to our results of SS of SBS at the rMFG, but not in the BFS. Perhaps the rMFG is involved in transferring information but not directly in inhibitory control so that different scenarios evoked different spectral perturbation. Finally, the spectral perturbation of two scenarios under bilateral MOG are similar, likely due to their similar roles in processing visual stimuli.

#### Neural Mechanisms of Error Detection

In the whole-brain analysis, we observe higher activation in the IFG, MFG and MOG for the SS than the FS condition in both scenarios. These brain regions may reflect different cognitive functions in attention during visual processing, decision making, response execution and post-response processing (Iannaccone et al., 2015). Previous fMRI studies have indicated that attention neural network modulates visual cortical activation and facilitation of visual stimulus processing through inhibition of unattended stimulus information (Brefczynski and Deyoe, 1999; Smith et al., 2000; Slotnick et al., 2003). Although higher activation in MOG for error detection can be observed in both scenarios, we expect to observe stronger activations in BFS than SBS because BFS is a more complex situation requiring participants to correct their error and evoke other cognitive functions involved in inhibitory control (see also ''Neural Mechanisms of Inhibitory Control'' Section).

To verify the above speculation with improved sensitivity, rMFG, left IFG, right STG, right IPG, right IOG and left MOG were identified as ROIs from a contrast (SS − FS) orthogonal to the scenario effect in the whole-brain analysis. Only the rMOG shows greater activation in BFS than SBS. This result supports the idea that participants need to pay more attention to SS in BFS (Slotnick et al., 2003). Although we did not find significantly different activation of the MFG between the two scenarios, the current findings still suggest that these middle and inferior frontal regions may differ in post-response processes during error detection (i.e., FS vs. SS). The middle and inferior frontal areas have been implicated in error detection and conflict monitoring (Braver et al., 2001; Menon et al., 2001; Rubia et al., 2003, 2005; Rushworth et al., 2004).

The current study also finds that both scenarios show stronger activation of the MFG in error detection. The activation of the MFG may reflect stronger performance monitoring after FS. Furthermore, we observed activation of the fusiform gyrus only in BFS (**Table 1D**), which is likely due to our stimulus design and may reflect face recognition (George et al., 1999).

With respect to the temporal dynamics, when analyzing the effect of scenarios, although the ROI analysis of the fMRI results reveals significant differences in the right IOG, we do not observe a significant difference between scenarios in the rMOG (Supplementary Figure 5). Although we do not find an effect of scenario, we observe suppression of the theta band after the response during error detection in both scenarios. Previous EEG studies have indicated that theta-alpha band oscillations are associated with the attention network (Fan et al., 2007). The current study finds that the suppression of the theta band may be associated with the activation of the right occipital gyrus, reflecting greater attention to the visual stimuli during SS than during FS. On the other hand, we explored the temporal dynamics in the preSMA and rMFG brain sources and observed that the bursts of delta, theta and alpha band power lasted longer in the FS condition than in the SS condition. The prolonged duration in the FS condition may reflect error detection. Previous EEG studies suggest that the burst of theta and alpha band power after the response in the frontal lobe reflects error processing (Cavanagh and Frank, 2014; Cohen, 2015; Shou and Ding, 2015). Finally, this study reveals that the EEG oscillation of the preSMA brain source is related not only to inhibitory control but also to error detection.

# CONCLUSION

This study uses BFS to translate the stop signal paradigm into a simulated threatening situation and demonstrates that when humans inhibit their actions under threat, the rTPJ is involved in mediating inhibitory control. Theta-alpha band power is greater in the threatening situation than in the normal situation, which may be associated with the rising activation level of the preSMA. Over more than half a century of investigation into cognitive functions, a significant amount of knowledge about basic cognitive processes has been acquired using stimuli with extremely simple configurations. The behavioral performance in the current study demonstrates that findings obtained with simple stimuli remain valid when the stimuli are carefully and comparably transformed into complex and realistic ones. At the same time, additional brain regions relevant to the new configuration may be recruited dynamically for the more complex stimuli, as can be identified from signal sources that are differentially specialized in spatial and temporal resolution. How these simultaneously recorded signals (e.g., EEG and fMRI) are jointly related to the valence, complexity, and motivational effects induced by scenes embedding the basic cognitive process remains an intriguing and important issue for future studies.

# AUTHOR CONTRIBUTIONS

L-WK initiated the main idea of this study, designed the experiment and advised on the EEG data analysis methods for the collected fMRI-EEG data. Y-CS, RKC and Y-TC performed the fMRI-EEG data collection and data analysis. ECC provided advice on modifying the experimental design and analyzing the fMRI data.

# ACKNOWLEDGMENTS

The current study was supported in part by the Taiwan Ministry of Science and Technology (MOST) under grant numbers 102-2420-H-009-003-MY3 and 103-2410-H-009-019-MY2, and in part by UST-UCSD International Center of Excellence in Advanced Bioengineering sponsored by MOST I-RiCE Program under grant number MOST 103-2911-I-009-101. Research was also in part sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-10-2-0022. The authors are grateful for the technical support of Taiwan Mind and Brain Imaging Center (TMBIC).

#### REFERENCES


#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnhum.2016.00185/abstract




**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Ko, Shih, Chikara, Chuang and Chang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Exploring Neuro-Physiological Correlates of Drivers' Mental Fatigue Caused by Sleep Deprivation Using Simultaneous EEG, ECG, and fNIRS Data

Sangtae Ahn<sup>1†</sup>, Thien Nguyen<sup>2†</sup>, Hyojung Jang<sup>1</sup>, Jae G. Kim<sup>2</sup>\* and Sung C. Jun<sup>1</sup>\*

*<sup>1</sup> School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, South Korea, <sup>2</sup> Department of Biomedical Science and Engineering, Gwangju Institute of Science and Technology, Gwangju, South Korea*

#### Edited by:

*Klaus Gramann, Berlin Institute of Technology, Germany*

#### Reviewed by:

*Tzyy-Ping Jung, University of California San Diego, USA; Chin-Teng Lin, National Chiao-Tung University, Taiwan*

\*Correspondence:

*Jae G. Kim jaekim@gist.ac.kr; Sung C. Jun scjun@gist.ac.kr*

*† These authors have contributed equally to this work.*

Received: *05 November 2015* Accepted: *27 April 2016* Published: *13 May 2016*

#### Citation:

*Ahn S, Nguyen T, Jang H, Kim JG and Jun SC (2016) Exploring Neuro-Physiological Correlates of Drivers' Mental Fatigue Caused by Sleep Deprivation Using Simultaneous EEG, ECG, and fNIRS Data. Front. Hum. Neurosci. 10:219. doi: 10.3389/fnhum.2016.00219*

Investigations of the neuro-physiological correlates of mental loads, or states, have attracted significant attention recently, as it is particularly important to evaluate mental fatigue in drivers operating a motor vehicle. In this research, we collected multimodal EEG/ECG/EOG and fNIRS data simultaneously to develop algorithms to explore neuro-physiological correlates of drivers' mental states. Each subject performed simulated driving under two different conditions (well-rested and sleep-deprived) on different days. During the experiment, we used 68 electrodes for EEG/ECG/EOG and 8 channels for fNIRS recordings. We extracted the prominent features of each modality to distinguish between the well-rested and sleep-deprived conditions, and all multimodal features, except EOG, were combined to quantify mental fatigue during driving. Finally, a novel driving condition level (DCL) was proposed that distinguished clearly between the features of well-rested and sleep-deprived conditions. This proposed DCL measure may be applicable to real-time monitoring of the mental states of vehicle drivers. Further, the combination of methods based on each classifier yielded substantial improvements in the classification accuracy between these two conditions.

Keywords: EEG/ECG/EOG/fNIRS, neuro-physiological correlates, drivers' mental fatigue, sleep deprivation, simulated driving, multimodal integration, driving condition level

# INTRODUCTION

Neuroergonomics is an emerging field that investigates human mental states and their workloads in order to improve the reliability of human performance, and ensure its stability in various environments (Parasuraman, 2003; Parasuraman and Rizzo, 2008). In neuroergonomics, both the fundamental principles of neuroscience and human factors are considered thoroughly, and neural behaviors have been investigated primarily when people are engaged in tasks in a work environment (Parasuraman and Wilson, 2008). Due to the implications for public safety, a major application of neuroergonomics is the assessment of driver fatigue. In general, driver fatigue is categorized as either mental or physical. Mental fatigue occurs because of gradual and cumulative mental effort (Grandjean, 1979) during driving, or sleep deprivation before driving (Durmer and Dinges, 2005). In contrast, physical fatigue represents reduced muscular strength and coordination. Physical fatigue may be countered by deliberate action; however, mental fatigue is difficult to resolve. Because of mental fatigue, drivers begin to doze involuntarily, which often results in traffic accidents (Horne and Reyner, 1999; Connor et al., 2002; Herman et al., 2014).

One potential method that may be used to reduce traffic accidents is to measure inherent mental fatigue before or during driving, in order to predict a driver's mental condition and determine whether s/he can drive safely. Because driving requires complex cognitive processes and sustained concentration, predicting a driver's mental fatigue before or during driving could be effective in preventing traffic accidents. Thus, we attempted in this work to explore neuro-physiological correlates in two different conditions, one well-rested with a low risk of fatigue, and the other sleep-deprived with a high risk of fatigue.

Among many studies performed to evaluate drivers' fatigue in real-time, computer vision-based systems have been used widely. Bergasa et al. (2006) proposed a noninvasive system to monitor a driver's vigilance using several parameters, including percentage or duration of eye closure, blinking, and the frequency of nodding. By using a fuzzy classifier, the researchers then inferred the level of the drivers' fatigue. However, the reliability of the findings decreased when the drivers wore glasses or the surrounding brightness changed. To address these problems, D'Orazio et al. (2007) designed an experimental paradigm that incorporated conditions in which some subjects had different eye colors, wore glasses, and drove vehicles in light of varying intensities. Using the proposed visual framework, the authors obtained robust results. In addition, various visual cues that characterized eyelid, gaze, and head movements, as well as facial expressions were employed in a probabilistic model developed to predict fatigue (Ji et al., 2004) that yielded even more robust results. Recently, Wang et al. (2014) developed an online, closed-loop lapse detection system featuring a mobile wireless electroencephalograph (EEG), and were able to extract certain EEG signatures associated with fatigue.

To date, EEG has been found to be a promising indicator for investigations of driver fatigue (Lal and Craig, 2001). EEG data have shown that there is a significant increase in theta and delta activity, and a decrease in heart rate (HR) associated with fatigue (Lal and Craig, 2002). Further, in a subsequent study that considered three phases of fatigue (early, medium, and extreme), software was developed to monitor driving fatigue, and was validated with EEG data from 35 subjects engaged in a simulated driving task (Lal et al., 2003). Another study (Lin et al., 2005) estimated drowsiness and driver performance by correlating changes in log power spectra. To detect drowsiness, they constructed an individualized linear regression model to assess EEG dynamics continuously based on an independent component analysis. Because drowsiness is a crucial factor in driving, Lin and his colleagues investigated the effect of continuous arousing auditory feedback on sustained attention in a driving simulator (Lin et al., 2010). They found that spectral powers in alpha and theta bands were suppressed and lasted 30 s or longer after feedback. This finding was introduced to estimate classification accuracy; as a result, they achieved a classification accuracy of approximately 78% using the maximum likelihood classifier (Lin et al., 2013) and applied it to develop an online, closed-loop system for practical lapse detection in real environments (Wang et al., 2014).

Various other methods have been used to explore drivers' mental fatigue, such as a support vector machine (SVM) (Shen et al., 2008; Yeo et al., 2009), Bayesian network (Yang et al., 2010), wavelet analysis (Kar et al., 2010; Li and Chung, 2013), and others. In addition to EEG studies, electrocardiography (ECG) and electrooculography (EOG) have been used to determine neuro-physiological correlates of drivers' mental fatigue. One study (Patel et al., 2011) used neural network analysis and demonstrated that the variability in drivers' HRs differed significantly in alert and fatigued states. They investigated the power spectral density behaviors between the two states during long-term driving and reported that the neural network was 90% accurate in classifying mental state.

Eyelid-related features from EOG data also have been reported to be possible candidates to detect whether or not a driver is drowsy (Hu and Zheng, 2009). In that report, the authors used vertical and horizontal EOG channels to extract and validate eye blinks according to eyelid movement parameters, such as blink duration, speed, and amplitude. Three conditions (alert, sleepy, and very sleepy) were classified with high reliability using SVM. Simultaneous recording of EEG/ECG (Zhao et al., 2012) and the combination of multimodal features from EEG, EOG, and ECG data (Khushaba et al., 2011) demonstrated significant differences during long-term driving. In the latter study, the researchers developed an efficient, fuzzy mutual information-based wavelet packet transformation that combined EEG, EOG, and ECG features to detect drivers' drowsiness; this technique yielded a classification accuracy greater than 90%.

An emerging portable and noninvasive brain functional imaging technique, functional near infra-red spectroscopy (fNIRS), has been introduced to monitor cognitive workload or fatigue in simulated environments (Ayaz et al., 2012). fNIRS data from the prefrontal cortex were collected during a complex air-traffic control task that required the subjects to prevent collisions between aircraft in their sectors. As the number of aircraft in their sector increased, a concomitant increase in prefrontal cortex activation was observed, which suggests that fNIRS provides a sensitive index of cognitive workload. fNIRS also demonstrated changes in prefrontal activation during skill acquisition in both basic working memory tasks (McKendrick et al., 2014) and more complex piloting tasks (Harrison et al., 2014; Gateau et al., 2015).

A portable fNIRS device was developed for use in mobile neuroimaging of the prefrontal cortex (Ayaz et al., 2013). In a driving environment, Li et al. (2009) observed changes in cerebral oxygenation during prolonged simulated driving. Forty healthy subjects were divided randomly into two groups (driving vs. non-driving), and the driving group performed a simulated 3 h driving task. The authors found a relative increase in frontal cortex oxygenation in the driving group by comparison to the non-driving group, and oxygenation decreased gradually after the driving task. Considering real driving situations, Yoshino et al. (2013) investigated the changes in cerebral oxygen exchange during actual driving on an expressway. An fNIRS signal was recorded in the subjects' parietal and prefrontal cortices using an fNIRS device mounted in the vehicle. They found that the areas activated varied depending on the driving task, such as parking, acceleration, driving at constant speed, deceleration, and U-turns. Thus, the use of fNIRS may be an effective approach to evaluate brain activity in various driving environments.

Recently, hybrid approaches that combine two different modalities (Pfurtscheller et al., 2010) to improve performance and reduce classification error have been reported as promising for future brain-computer interfaces (BCI). One example of a hybrid BCI that incorporates both EEG electrical activity and fNIRS hemodynamic changes yielded improved classification performance in sensorimotor rhythm-based BCI systems (Fazli et al., 2012). The researchers calculated classification accuracies in the movements executed and motor imagery by estimating a meta-classifier. After the estimation of both classifiers (EEG and fNIRS), the combination of outputs of each classifier resulted in improved classification accuracy. Khan et al. (2014) decoded four movement directions (left, right, forward, and backward) using the mixed features of EEG and fNIRS, in which EEG features were used to classify left/right, and fNIRS features were used to classify forward/backward. In addition, hybrid BCI may be used as a brain switch that determines whether a certain task is active. Koo et al. (2015) employed a novel experimental paradigm to detect the occurrence of motor imagery in fNIRS data. Threshold-based detection with a feature value of the fNIRS data determined whether or not the action of a motor imagery task was attempted. The combination of EEG and fNIRS is also applicable to language studies (Wallois et al., 2012) and cortical current estimation (Morioka et al., 2014). Hybrid BCIs may provide a good opportunity to increase BCI performance by offering the synergistic effects of multimodal brain imaging techniques.

In this work, we recorded multimodal EEG/ECG/EOG and fNIRS data simultaneously in a driving simulator and combined their features to distinguish drivers with high and low risks of fatigue using neuro-physiological correlates and a classification method. Hemodynamic changes in the prefrontal cortex (Li et al., 2009; Ayaz et al., 2012, 2013; Yoshino et al., 2013; Harrison et al., 2014; McKendrick et al., 2014; Gateau et al., 2015) have been used to identify neuro-physiological correlates, and these activities were reported to play an important role in neuroergonomics, for example in mental workload (Mandrick et al., 2013a), cognitive operation (Mandrick et al., 2013b), and emotional function (Doi et al., 2013). Furthermore, it is clear that EEG, ECG, and EOG are also promising indicators that may be used to investigate the neuro-physiological correlates of drivers' mental fatigue (Lal and Craig, 2001). Therefore, combining this hybrid system with prefrontal fNIRS may be a far more informative measure for identifying neuro-physiological correlates under varying driving conditions. To the best of our knowledge, this multimodal approach has rarely been tested to explore neuro-physiological correlates of drivers' mental fatigue.

Thus, the goal of this study was to determine modality-specific features of EEG, EOG, ECG, and fNIRS. These features were then used to distinguish between well-rested and sleep-deprived conditions, and resulted in a classifier that showed whether or not a driver was in an alert mental state. The use of a reasonable combination of these multimodal features may improve classification accuracy and its quantification may yield a real-time strategy to monitor drivers' mental fatigue.

# MATERIALS AND METHODS

# Experimental Procedure

Eleven healthy subjects (10 males, 1 female, aged 26.6 ± 1.4, range = 24–28) who had valid driver's licenses participated in a custom-built virtual driving simulation task, as depicted in **Figure 1A**. The subjects practiced repeatedly until they were familiar with the simulation system. The purposes of, and instructions for, the experiment were explained in advance, and all of the subjects signed an informed consent. Subjects received approximately \$10 per h as compensation for their participation. Each subject performed simulated driving under two conditions (well-rested and sleep-deprived) on different days. Under the well-rested condition, subjects were instructed to sleep at least 7 h before the experiment, as sleeping seven or more hours is known to maintain healthy mental alertness (Kripke et al., 2002). In the sleep-deprived condition, the subjects were instructed to stay up all night in order to produce mental fatigue.

Driving tests in both conditions were performed before 9 a.m. In this experiment, we assumed that subjects would be significantly mentally fatigued after one night of sleep deprivation. To determine the degree of fatigue produced by sleep deprivation, a subjective questionnaire was administered to the subjects before the experiment to score their levels of fatigue, and the scores demonstrated clearly that the sleep-deprived subjects were substantially more fatigued than were the well-rested subjects. The subjects sat in a comfortable driver's seat and drove on an oval track for a minimum of 30 min. The maximum driving speed was set at 100 km/h in both conditions. The steering wheel vibrated whenever the vehicle collided with a crash barrier in order to prevent the drivers from falling asleep completely. A high-definition webcam (Logitech HD Pro C920) was used to record each subject's behavior in real-time. This experiment was approved by the Institutional Review Board at the Gwangju Institute of Science and Technology (20150615-HR-18-02-06).

# Data Recording of EEG/ECG/EOG and fNIRS

Sixty-four EEG electrodes were attached to the drivers' scalps according to the 10–20 international position system. Horizontal and vertical EOGs were used and two ECG electrodes were attached to the left/right chest (Biosemi ActiveTwo System). These data were collected at a 512 Hz sampling rate using BCI2000 software (Schalk et al., 2004). Biosemi ActiView software monitored the stability and reliability of the EEG signal. After the experiment, bad channels that contained abnormal noise were identified by visual inspection and excluded from the analysis.

A custom-built fNIRS system (continuous wave, 10 Hz sampling rate) was used to record hemodynamic changes in the brain. This was an updated version of one described in a previous work (Kim et al., 2015). The system consists of probe and control circuits. The probe includes 2 LEDs (emitters) and 8 photodetectors (detectors). The LEDs emit near infrared (NIR) light at two wavelengths (735 and 850 nm). Each emitter and its four surrounding detectors were separated by 3 cm, as Homma et al. (1996) suggested that in soft tissues, NIR is able to attain a penetration depth equal to half of the emitter-detector separation. Therefore, with a 3.0 cm emitter-detector separation, our system should have been able to collect brain activity at a depth of 1.5 cm below the scalp. An emitter-detector pair forms one fNIRS channel that measures hemodynamic changes midway between the emitter and the detector. Given a suitable geometric arrangement, many detectors may receive light from one emitter. This enabled us to design an 8-channel probe with 2 LEDs and 8 photodetectors. The 8-channel probe was attached to the prefrontal region to investigate the subjects' mental state, as illustrated in **Figure 1B** (Li et al., 2009; Sato et al., 2013). The control circuit receives a signal from the probe, amplifies it, and sends it to the computer via serial communication. MATLAB-based software was programmed to record, process, and display the hemodynamic signals. Interference between EEG/EOG/ECG electrodes and fNIRS emitters has been observed and is believed to result from light leakage from the emitters, which may cause deterioration in the quality of electrical data (Koo et al., 2015). This interference was removed by blocking light leakage from the emitters and applying a simple pre-processing technique. Two desktop computers were used to record the EEG/ECG/EOG and fNIRS data simultaneously. Triggers for start and end times were sent to the BCI2000 software to synchronize the multimodal EEG/ECG/EOG and fNIRS data. The computer that recorded EEG/ECG/EOG data sent a start trigger, at which time the second computer began to record fNIRS data. The end time of the experiment was marked in the same way.

# Data Analysis

#### Feature Extraction and Classification from EEG

After the experiment, the data collected were inspected visually and bad channels were rejected. The logistic infomax independent component analysis (Bell and Sejnowski, 1995) was used to remove EOG artifacts and the data were then bandpass filtered from 1 to 50 Hz. We analyzed data from the first 30 min only after the drivers began the task, because fatigue levels between well-rested and sleep-deprived subjects were likely to be quite different during the initial minutes of driving. From the real-time webcam video monitoring data, we observed that even some well-rested subjects became drowsy and quite bored after that length of time.

EEG data (30 min) were divided into 10-s segments (trials) to yield a total of 180 trials for each driving condition. A power spectral density was computed for each trial using the EEGLAB library (Delorme and Makeig, 2004), and a relative power level (RPL) was computed in order to reduce session/subject variability (Ahn et al., 2013a,b, 2014; Cho et al., 2015). To calculate the RPL, we considered five spectral bands: delta (1–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz), and gamma (30–50 Hz). Each band power was normalized by the total power, defined as the sum over all band powers, after which we extracted the most informative RPL features between the two driving conditions. To discriminate between the well-rested and sleep-deprived conditions, the pre-processed data (180 trials) from each driving condition were first divided in temporal order into two groups: a training group of 126 trials (70%) and a test group of 54 trials (30%). Then, to avoid temporal dependency between groups, the last 6 trials (1 min) of each group were excluded; thus, for each driving condition, 120 training and 48 test trials were obtained. With this grouping, temporal dependency (adjacency) was retained within groups but excluded between groups. This procedure was repeated 30 times by sliding a temporal window of 1 min (6 trials) and then choosing the training and test groups again. Thereafter, the RPL feature vectors of the training and test data were fed into the classifier. The training group was used to construct a classifier based on Fisher's linear discriminant analysis (FLDA), and the test group was input to the constructed classifier in order to measure classification accuracy. Finally, the 30 classification accuracies were averaged to obtain an average accuracy.
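The RPL normalization and the trial grouping described above can be illustrated with a minimal Python sketch. It is not the authors' EEGLAB/MATLAB pipeline: the function names, the half-open band edges, and the circular shift used to generate the 30 repetitions are assumptions made for illustration, since the text does not specify how the sliding window wraps around.

```python
# Band edges in Hz, as listed in the text.
BANDS = {
    "delta": (1.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 50.0),
}


def relative_power_levels(freqs, psd):
    """Return each band's share of the summed band power for one trial."""
    band_power = {}
    for name, (low, high) in BANDS.items():
        # Assumption: half-open bins [low, high) so no frequency is counted twice.
        band_power[name] = sum(p for f, p in zip(freqs, psd) if low <= f < high)
    total = sum(band_power.values())
    return {name: power / total for name, power in band_power.items()}


def split_trials(n_trials=180, train_fraction=0.7, gap=6, shift=0):
    """Chronological train/test split that drops the last `gap` trials of each group."""
    # Assumption: each repetition moves the starting trial circularly by `shift`.
    order = [(shift + i) % n_trials for i in range(n_trials)]
    n_train = round(n_trials * train_fraction)  # 126 of 180 trials
    train = order[:n_train - gap]               # 120 trials remain
    test = order[n_train:n_trials - gap]        # 48 trials remain
    return train, test


# Thirty repetitions, each shifted by one minute (6 trials of 10 s).
splits = [split_trials(shift=6 * k) for k in range(30)]
```

Under the circular-shift assumption, the dropped trials leave a 1-min gap on both sides of the test group, so no training trial is adjacent in time to a test trial.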

#### Feature Extraction and Classification from ECG and EOG

The HRs of each subject were extracted using two ECG channels (left/right chest). During pre-processing, ECG data were bandpass filtered from 0.1 to 30 Hz and were detrended to remove the baseline shift. After detrending, a QRS-complex was observed to be the most prominent repeating peak in the ECG signal. The QRS-wave is used commonly to determine subjects' HRs or predict abnormalities in cardiac function. Specifically, the emergence of an R-peak indicated a subject's HR clearly and was extracted easily by adjusting a deterministic threshold of the ECG magnitude. Next, the number of R-peaks per minute was counted and used to determine HR per minute. HRs from the two ECG channels on the left and right chest were calculated for the entire 30 min and averaged to reduce possible detection error and bias. To classify mental state from the ECG data, we adopted the extraction of RR-peak interval features (de Chazal et al., 2004). After detection of the R-peak in each 10-s trial, the intervals between one R-peak and the next were averaged, and the procedure was repeated for all trials. In this way, 180 R-peak intervals were estimated as a feature set. The EOG signal was used to extract the rate of eye blinking in each 1-min trial, which has been reported to be associated well with a human's mental state (Schleicher et al., 2008): when the eye blinks, a clear, sharp wave is observed. After baseline drift removal was applied, a peak detection algorithm (Pettersson et al., 2013) was used with a given threshold of signal magnitude. Finally, the number of peaks per minute, which represented the eye-blinking rate, was used as the EOG feature.
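As a rough illustration of the threshold-based peak counting described above, the following Python sketch detects peaks in a one-dimensional signal and derives the mean R-peak interval and a per-minute rate. The function names, the `min_distance` refractory parameter, and the synthetic spike train are hypothetical; the study itself relied on a deterministic ECG threshold and, for EOG, on the peak detection algorithm of Pettersson et al. (2013).

```python
def detect_peaks(signal, threshold, min_distance=1):
    """Indices of local maxima above `threshold`, at least `min_distance` samples apart."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if is_local_max and signal[i] > threshold:
            # Skip peaks that fall too close to the previously accepted one.
            if not peaks or i - peaks[-1] >= min_distance:
                peaks.append(i)
    return peaks


def mean_rr_interval(peaks, fs):
    """Average interval between successive peaks, in seconds."""
    if len(peaks) < 2:
        raise ValueError("at least two peaks are needed to form an interval")
    intervals = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    return sum(intervals) / len(intervals)


def rate_per_minute(peaks, n_samples, fs):
    """Number of detected peaks scaled to a one-minute rate (HR or blink rate)."""
    return len(peaks) * 60.0 * fs / n_samples


# Synthetic 10-s trial at 512 Hz with one idealized R-peak per second.
fs = 512
ecg = [0.0] * (10 * fs)
for i in range(256, len(ecg), fs):
    ecg[i] = 1.0
r_peaks = detect_peaks(ecg, threshold=0.5, min_distance=fs // 4)
```

The same `rate_per_minute` helper could serve for the EOG blink rate, given a drift-corrected EOG trace and a suitable threshold.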

#### Feature Extraction and Classification from fNIRS

We adopted the modified Beer-Lambert's law (mBLL) to retrieve relative concentration changes from the light intensities of the 8 detectors (Cope et al., 1988; Kocsis et al., 2006). The change in optical density at two wavelengths (735 and 850 nm) is related to changes in oxy-hemoglobin concentration (HbO) and deoxyhemoglobin concentration (HbR). Data with abnormal noise were removed by visual inspection, and the remaining data were filtered with a 0.01 Hz high-pass filter to remove baseline drifts. Light intensities for 30 s after the initiation of the experiment were averaged and set as baseline intensities. HbO and HbR were estimated with the following equations:

$$
\Delta HbO = \frac{\log \frac{I_b^{\lambda_1}}{I_t^{\lambda_1}} \cdot \varepsilon_{HbR}^{\lambda_2} - \log \frac{I_b^{\lambda_2}}{I_t^{\lambda_2}} \cdot \varepsilon_{HbR}^{\lambda_1}}{d \cdot DPF \left[ \varepsilon_{HbO}^{\lambda_1} \cdot \varepsilon_{HbR}^{\lambda_2} - \varepsilon_{HbO}^{\lambda_2} \cdot \varepsilon_{HbR}^{\lambda_1} \right]},\tag{1}
$$

$$
\Delta HbR = \frac{\log \frac{I_b^{\lambda_2}}{I_t^{\lambda_2}} \cdot \varepsilon_{HbO}^{\lambda_1} - \log \frac{I_b^{\lambda_1}}{I_t^{\lambda_1}} \cdot \varepsilon_{HbO}^{\lambda_2}}{d \cdot DPF \left[ \varepsilon_{HbO}^{\lambda_1} \cdot \varepsilon_{HbR}^{\lambda_2} - \varepsilon_{HbO}^{\lambda_2} \cdot \varepsilon_{HbR}^{\lambda_1} \right]},\tag{2}
$$

where

I<sub>b</sub><sup>λ</sup>: baseline intensity (λ<sub>1</sub>: 735 nm, λ<sub>2</sub>: 850 nm)

I<sub>t</sub><sup>λ</sup>: transient intensity

d: emitter-detector separation

ε<sub>Hb</sub><sup>λ</sup>: extinction coefficient

DPF: differential path length factor

In continuous wave fNIRS, the differential path length factor (DPF) is unknown. However, it is similar for both wavelengths and is included conventionally in the unit of hemodynamic changes as a scaling factor. Thus, HbO and HbR have the same unit of mM/DPF, and the extinction coefficients are specific for HbO and HbR at each wavelength. Matcher et al. (1995) measured extinction coefficients of hemoglobin at different wavelengths as follows:

at wavelength λ<sub>1</sub> = 735 nm,

$$
\varepsilon_{HbO}^{\lambda_1} = 0.4646\ \text{mM}^{-1}\,\text{cm}^{-1} \text{ and } \varepsilon_{HbR}^{\lambda_1} = 1.2959\ \text{mM}^{-1}\,\text{cm}^{-1},
$$

at wavelength λ<sub>2</sub> = 850 nm,

$$
\varepsilon_{HbO}^{\lambda_2} = 1.1596\ \text{mM}^{-1}\,\text{cm}^{-1} \text{ and } \varepsilon_{HbR}^{\lambda_2} = 0.7861\ \text{mM}^{-1}\,\text{cm}^{-1}
$$
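Equations (1) and (2) with the extinction coefficients above can be evaluated directly. The Python sketch below is one possible reading, not the authors' MATLAB code: the function name is hypothetical, the logarithm is assumed to be base 10 (the text writes only "log", and the base must match the definition of the extinction coefficients), and `dpf` defaults to 1.0 so that the unknown DPF stays in the result as a scaling factor.

```python
import math

# Extinction coefficients quoted in the text, in mM^-1 cm^-1.
EPS = {
    735: {"HbO": 0.4646, "HbR": 1.2959},
    850: {"HbO": 1.1596, "HbR": 0.7861},
}


def hemoglobin_changes(i_base_1, i_now_1, i_base_2, i_now_2, d_cm=3.0, dpf=1.0):
    """Relative HbO and HbR changes from intensities at 735 nm (1) and 850 nm (2)."""
    # Optical density change at each wavelength (base-10 log is an assumption).
    od1 = math.log10(i_base_1 / i_now_1)
    od2 = math.log10(i_base_2 / i_now_2)
    e1, e2 = EPS[735], EPS[850]
    # Shared denominator of Equations (1) and (2).
    denom = d_cm * dpf * (e1["HbO"] * e2["HbR"] - e2["HbO"] * e1["HbR"])
    d_hbo = (od1 * e2["HbR"] - od2 * e1["HbR"]) / denom  # Equation (1)
    d_hbr = (od2 * e1["HbO"] - od1 * e2["HbO"]) / denom  # Equation (2)
    return d_hbo, d_hbr
```

With equal baseline and transient intensities at both wavelengths the function returns zero change for both chromophores, which is a quick sanity check on the signs.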

Like EEG feature extraction, 10 s of data were defined as one trial, which yielded a total of 180 trials per condition. Next, relative concentration changes were estimated for each trial. To reduce the effects of fluctuations and noise, fNIRS data were smoothed using 10-s temporal windowing with a 50% overlap. Finally, the amplitudes of HbO and HbR were used as informative features for classification of the two driving conditions.
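One plausible reading of the 10-s windowing with 50% overlap is a sliding-window mean over the 10 Hz fNIRS samples, sketched below in Python. The function name and the choice of the mean as the smoothing statistic are assumptions; the text does not state which statistic was applied within each window.

```python
def windowed_means(samples, fs=10, window_s=10.0, overlap=0.5):
    """Mean of each window of `window_s` seconds, with consecutive windows overlapping."""
    win = int(round(window_s * fs))           # 100 samples at 10 Hz
    step = int(round(win * (1.0 - overlap)))  # 50 samples for 50% overlap
    return [
        sum(samples[start:start + win]) / win
        for start in range(0, len(samples) - win + 1, step)
    ]


# Thirty seconds of a linearly increasing signal sampled at 10 Hz.
smoothed = windowed_means(list(range(300)))
```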

#### RESULTS

#### Relative Power Level from EEG

We investigated RPL values over five spectral bands—delta, theta, alpha, beta, and gamma—and found that the RPL values for delta, theta, and gamma did not differ statistically between the two driving conditions. However, alpha and beta RPL values differed clearly in the two conditions, as shown in **Figure 2A**. Grand-averaged topographies were described for each condition, and alpha RPL in the sleep-deprived condition was activated to a greater degree in the right centro-parietal region. Such an increase in alpha power has been reported in the literature as a notable marker in driving (Lal and Craig, 2001; Simon et al., 2011). A decrease in beta RPL over the fronto-central region was observed in the sleep-deprived condition. This decrease in beta power may indicate a lack of arousal, which is consistent with the results of several studies (Tanaka et al., 2012; Zhao et al., 2012).

**Figure 2B** shows the distributions of alpha and beta RPLs from subject S5 in a two-dimensional Cartesian coordinate space. We note that most subjects showed similar physiological behaviors, and S5's results were chosen as representative because they yielded the highest classification accuracy in the EEG, as shown in **Table 3**. For the purposes of consistency and comparison, the other results (ECG, fNIRS) from subject S5 are also illustrated in the subsequent sections. Each RPL was averaged spatially over significant regions, namely the centro-parietal region for alpha and the fronto-central region for beta. Each dot represents the corresponding alpha (x-coordinate) and beta (y-coordinate) RPLs for one trial in each condition (well-rested and sleep-deprived). In the well-rested condition, most RPL dots were distributed in the upper left area of the R<sup>2</sup> space, while they were distributed in the lower right area in the sleep-deprived condition. Thus, these features (centro-parietal alpha RPL and fronto-central beta RPL) allowed us to discriminate well between the two driving conditions in the collected dataset.

To investigate inter-subject variability (Ahn and Jun, 2015), we plotted RPL topographies in **Figures 2C,D**, respectively, for the two subjects who achieved the highest (S5) and the lowest (S2) EEG classification accuracies (**Table 3**). As depicted in the figures, subject S5 showed a clear alpha RPL increase in the right centro-parietal region and a beta RPL decrease in the fronto-central region in the sleep-deprived condition. In contrast, subject S2, who demonstrated the lowest classification accuracy, showed only a slight alpha RPL increase and beta RPL decrease in the sleep-deprived condition. Interestingly, this subject (S2) was likely to have been fatigued already, despite being in the well-rested condition before the experiment. Our investigation of this subject will be described in detail in the Discussion section.

# Time Course of Relative Concentration Changes from fNIRS

The time courses of the relative concentration changes of HbO and HbR were estimated using the mBLL. **Figure 3** depicts the concentration changes at channels 1 and 5 for subject S5. Because all channels were placed over the prefrontal cortex, they all showed similar behaviors over time. Thus, for the purpose of illustration, we chose two representative channels (1 and 5). The concentration changes of HbO increased gradually over time in the well-rested condition and reached the highest level at channels 1 and 5 between 20 and 30 min. Paying attention while driving a vehicle requires high oxygen consumption in the brain, which induces an increase in cerebral blood flow; this increase in cerebral oxygenation, as shown by an increase in HbO and a decrease in HbR, indicates that the cerebral blood flow increased during driving under the well-rested condition. On the other hand, while driving under the sleep-deprived condition, concentration changes of HbO decreased slightly (Channel 1) or remained stable (Channel 5) compared to the baseline, and HbR concentration remained at baseline values. Less oxygen may be consumed when mentally fatigued, and therefore, brain activity is likely to be suppressed, resulting in less blood flow to the brain.

# Reduced Heart Rate and Eye Blinking in the Sleep-Deprived Condition

The mean HRs of all subjects were calculated from ECG signals over the entire 30-min driving period (average of varied HRs), as tabulated in **Table 1** and depicted in **Figure 4A**. As shown in the table and figure, HRs in the sleep-deprived condition were significantly lower than were those in the well-rested condition for all subjects (p < 0.01, Wilcoxon signed-rank test). **Figure 4B** shows the HR of subject S5 over time (from initiation to 30 min of driving). HRs in the well-rested condition were higher than those in the sleep-deprived condition; however, the difference in the HR between the two conditions decreased gradually as driving time increased and became quite small at the end of the driving task (∼30 min). We deduced from this time variance in the HR that even a well-rested driver began to feel fatigued after some duration of driving and was considerably fatigued by the end of the task.
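As a brief worked check of the reported significance level: when all n paired differences point in the same direction and there are no ties or zero differences, the two-sided exact p-value of the Wilcoxon signed-rank test takes its smallest possible value, 2/2^n. The Python sketch below enumerates all sign patterns for n = 11, the number of subjects in this study. The helper name `exact_wilcoxon_p` and the HR differences are hypothetical and are not the study's data.

```python
from itertools import product


def exact_wilcoxon_p(diffs):
    """Two-sided exact Wilcoxon signed-rank p-value (assumes no ties and no zeros)."""
    n = len(diffs)
    # Rank the differences by absolute value (1 = smallest).
    ranks = {d: r for r, d in enumerate(sorted(diffs, key=abs), start=1)}
    w_plus = sum(ranks[d] for d in diffs if d > 0)
    total = n * (n + 1) // 2
    observed = min(w_plus, total - w_plus)
    # Under the null hypothesis, each rank carries a positive or negative sign
    # with equal probability, so all 2**n sign patterns are equally likely.
    count = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        if min(w, total - w) <= observed:
            count += 1
    return count / 2 ** n


# Hypothetical HR differences (sleep-deprived minus well-rested, in bpm), all negative.
diffs = [-3.1, -5.4, -2.2, -7.8, -4.0, -6.3, -1.5, -8.9, -2.9, -5.0, -3.7]
p = exact_wilcoxon_p(diffs)
print(p)
```

With 11 subjects, this minimum is 2/2048, which lies below the 0.01 threshold.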

From the EOG signal, we expected that subjects in the sleep-deprived condition would demonstrate a relatively higher rate of eye blinking than those in the well-rested condition. Instead, we observed (not shown here) that some subjects showed higher rates of eye blinking in the well-rested than in the sleep-deprived condition, although the differences were not statistically significant. From the video data, we found that these subjects closed and opened their eyes frequently to overcome drowsiness, and this behavior may have seriously affected the rate of eye blinking derived from the EOG data.

# Driving Condition Level and Relative Driving Condition Level

We attempted to demonstrate that it may be possible to evaluate neuro-physiological correlates of drivers' mental fatigue using the significant features found in EEG, ECG, and fNIRS data. To that end, we used the three factors extracted from multimodal signals in the previous sections: RPL (ratio of beta to alpha) from EEG, HbO from fNIRS, and the averaged HR from ECG. Each feature was normalized by scaling between 0 and 1 for equal contribution, as formulated in Equation (3), and the values of the EEG, ECG, and fNIRS features were distributed well across this range. Values outside the approximately 95% range (mean ± 2σ) were considered outliers and were clipped to the maximum or minimum value in the feature set. The sum of the three normalized factors was proposed as the driving condition level (DCL), as given in Equation (4). In addition, we estimated the relative difference in DCL between the well-rested and sleep-deprived conditions (rDCL), which represented the degree of the drivers' fatigue compared to that in the well-rested condition, as defined in Equation (5); a higher rDCL indicates greater fatigue.

$$norm(\mathbf{x}) = \frac{\mathbf{x} - \min(\mathbf{x})}{\max(\mathbf{x}) - \min(\mathbf{x})},\tag{3}$$

$$DCL = norm\left(\frac{\text{beta RPL}}{\text{alpha RPL}}\right) + norm(HbO) + norm(HR),\tag{4}$$

$$\left(0 \le norm\left(\frac{\text{beta RPL}}{\text{alpha RPL}}\right),\ norm(HbO),\ norm(HR) \le 1,\quad 0 \le DCL \le 3\right)$$

$$rDCL \text{ (\%)} = 100 - \frac{DCL\_{\text{sleep-deprived}}}{DCL\_{\text{well-rested}}} \times 100,\tag{5}$$
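Equations (3)–(5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: outliers are clipped to the most extreme values that lie within mean ± 2σ, each feature is normalized over the samples of both conditions pooled together, and all function names and sample values are hypothetical. The study's exact outlier handling and normalization set may differ.

```python
from statistics import mean, pstdev


def clip_outliers(values):
    """Clip values outside mean ± 2*sigma to the most extreme inlier values (assumed reading)."""
    m, s = mean(values), pstdev(values)
    lo, hi = m - 2 * s, m + 2 * s
    inliers = [v for v in values if lo <= v <= hi]
    return [min(max(v, min(inliers)), max(inliers)) for v in values]


def norm(values):
    """Min-max scaling to [0, 1], Equation (3); assumes the values are not all equal."""
    vmin, vmax = min(values), max(values)
    return [(v - vmin) / (vmax - vmin) for v in values]


def dcl(beta_rpl, alpha_rpl, hbo, hr):
    """Driving condition level per sample, Equation (4)."""
    ratio = [b / a for b, a in zip(beta_rpl, alpha_rpl)]
    feats = [norm(clip_outliers(f)) for f in (ratio, hbo, hr)]
    return [sum(vals) for vals in zip(*feats)]


def rdcl(dcl_sleep_deprived, dcl_well_rested):
    """Relative DCL in percent, Equation (5)."""
    return 100 - dcl_sleep_deprived / dcl_well_rested * 100


# Hypothetical samples: the first three are well-rested, the last three sleep-deprived.
beta = [0.30, 0.32, 0.31, 0.20, 0.21, 0.19]
alpha = [0.20, 0.21, 0.19, 0.30, 0.31, 0.32]
hbo = [0.8, 0.9, 1.0, 0.1, 0.0, 0.2]
hr = [75, 76, 74, 66, 65, 67]

levels = dcl(beta, alpha, hbo, hr)
print(rdcl(mean(levels[3:]), mean(levels[:3])))
```

Because each normalized feature lies in [0, 1], every DCL value lies in [0, 3], and a DCL in the sleep-deprived condition that is half of the well-rested DCL corresponds to an rDCL of 50%.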

FIGURE 4 | Heart rates from ECG in two different conditions. (A) Averaged heart rates for all subjects in well-rested and sleep-deprived conditions. (B) Subject 5's HRs over time. Each HR was estimated every minute.



*Values in parentheses indicate standard deviations for each HR. The Wilcoxon signed-rank test was performed.*

Using our proposed definition of DCL (Equation 4), DCL values were estimated in the two conditions for each subject. For all subjects, 30 min of multimodal data were used to estimate the values. DCL ranged from 0 to 3, with a smaller DCL indicating greater fatigue. These DCL values are tabulated in **Table 2**. Subject S8 showed the highest DCL value in the well-rested condition, while subject S4 showed the lowest DCL value in the sleep-deprived condition. We found that the two driver conditions (well-rested and sleep-deprived) differed significantly (p < 0.01, Wilcoxon signed-rank test). In addition, rDCL, which represents the percentage of fatigue in a driver's mental condition, was introduced in this work. All rDCLs were consistently greater than 30%, except for those of two of the 11 subjects (S7 and S9); thus, when rDCL is tuned more finely with more data, it may be used as a predictor of mental fatigue.

To investigate the drivers' mental fatigue over time, each DCL value (per minute) was estimated by extracting features of RPL (beta over alpha), HbO, and averaged HR. **Figure 5** shows the DCL values of subject S5, in which the values decreased gradually over time in the well-rested condition, and remained consistent at approximately 1 in the sleep-deprived condition. Sleep-deprived subjects were quite fatigued already at the beginning of the driving task. From the questionnaire, we found a weak correlation (r<sup>2</sup> = 0.42) between rDCL values in the sleep-deprived condition and subjects' reported degree of sleepiness (1: rarely sleepy to 5: very sleepy). The average scores for sleepiness over all subjects were 1.4 and 4.1 in the well-rested and sleep-deprived conditions, respectively, while the average hours of sleep reported were 7.36 and 0 h, respectively.

# Multimodal Integration to Determine Neuro-Physiological Correlates

In this work, we recorded simultaneous EEG/ECG/EOG and fNIRS signals for multimodal analysis. Multimodal integration is an efficient technique that yields important insights into brain processes (Uludağ and Roebroeck, 2014). Even though the EOG signals did not differ statistically in this work, they were used to eliminate eye movement artifacts in the EEG data. On the other hand, EEG/ECG and fNIRS data yielded clear features that discriminated between the driving conditions, and each feature from the three different modalities differed significantly between the well-rested and sleep-deprived conditions. Based on these features, DCL (summation of these normalized factors) was proposed to determine neuro-physiological correlates of drivers' mental fatigue in a quantitative manner. As a result, we observed that DCL may offer a reasonable method to discriminate well between the two driving conditions. To illustrate the individual contribution of each modality to the differences in DCL between the two driving conditions, the modality-specific contributions are shown for all subjects in **Figure 6**. Accumulation of the three colored bars indicates DCL differences in the multimodal data (EEG+ECG+fNIRS), which represent the synergistic effect of these data. As shown in this figure, subjects S2 and S4 demonstrated the greatest differences in DCL, while the DCL of subjects S7 and S9 differed the least between the two conditions. Because of the unbiased combination, the averaged contributions of each modality (EEG, ECG, and fNIRS) to the DCL differences were quite similar (0.46, 0.45, and 0.46, respectively).

TABLE 2 | Driving condition level (DCL) in well-rested and sleep-deprived conditions.

*Relative DCL (rDCL) indicates percentage of fatigue level by comparison to well-rested condition.*

# Comparison of Hybrid Approaches Using EEG/ECG and fNIRS

To investigate the hybrid effect on the classification of the two driving conditions, we compared various combinations of modalities with respect to the classifiers' outputs. For the combined classifiers, the outputs of each single-modality classifier (EEG, ECG, and fNIRS) were used as the input features of a second classifier. The outputs of the second classifier then represent the results of the combined classifiers. A flow diagram of this procedure is depicted in **Figure 7**.
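The two-level scheme described above can be sketched in Python as follows. This is an illustration only: a nearest-centroid rule stands in for both classifier levels, because the actual classifiers are not specified in this section, and all function names, feature layouts, and values are hypothetical. Label 0 denotes well-rested and label 1 denotes sleep-deprived.

```python
def fit_centroids(X, y):
    """Nearest-centroid classifier: store the mean feature vector of each class."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents


def score(cents, x):
    """Signed output: positive when x is closer to class 1 than to class 0."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, cents[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, cents[1]))
    return d0 - d1


def fit_stacked(modalities, y):
    """Train one classifier per modality, then a second classifier on their outputs."""
    first = {m: fit_centroids(X, y) for m, X in modalities.items()}
    names = sorted(first)
    Z = [[score(first[m], modalities[m][i]) for m in names] for i in range(len(y))]
    return first, fit_centroids(Z, y), names


def predict_stacked(model, trial):
    first, second, names = model
    z = [score(first[m], trial[m]) for m in names]  # first-level outputs as features
    return 1 if score(second, z) > 0 else 0


# Hypothetical training trials: [alpha RPL, beta RPL], [HR], [HbO, HbR].
train = {
    "EEG": [[0.20, 0.30], [0.22, 0.28], [0.35, 0.18], [0.33, 0.20]],
    "ECG": [[75], [74], [66], [67]],
    "fNIRS": [[0.8, -0.2], [0.7, -0.1], [0.0, 0.0], [0.1, 0.0]],
}
model = fit_stacked(train, [0, 0, 1, 1])
print(predict_stacked(model, {"EEG": [0.34, 0.19], "ECG": [65], "fNIRS": [0.05, 0.0]}))
```

In this sketch the second classifier is trained on outputs for the same trials used to train the first level, and the first-level outputs are not rescaled. A careful implementation would typically use held-out outputs and comparable output scales.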

Classification accuracies of single modalities (EEG, ECG, fNIRS) and all combinations of modalities (EEG+ECG, EEG+fNIRS, ECG+fNIRS, and EEG+ECG+fNIRS) at the classification level are summarized in **Table 3**. Most of the subjects (8 of 11) achieved improved performance with the combined EEG+ECG+fNIRS, and the average performance of this combination was greater than that of the others. A one-way ANOVA conducted on the seven different approaches indicated a significant difference [F(6, 70) = 4.38, p = 0.0008 < 0.10]. Further, the EEG+ECG+fNIRS combination differed significantly from the others (p < 0.05, Wilcoxon signed-rank test). Each level of significance is marked with asterisks in **Table 3**. Notably, subject S6 showed the greatest improvement (∼30.5%) in the EEG+ECG+fNIRS combination compared to EEG only.

# DISCUSSION

# EEG Spectral Changes and Driving Conditions

To date, most researchers have investigated driving fatigue using EEG changes, which are promising indicators of this phenomenon (Lal and Craig, 2001), and EEG has the advantages of being portable, noninvasive, inexpensive, and safe to measure during driving. With EEG recording, Lal and Craig (2002) found substantial increases in delta, theta, and alpha activity in the transition to fatigue, which was consistent with existing findings described in a review paper (Sahayadhas et al., 2012). Alpha activity is believed to be the most prominent indicator of driver fatigue. With this reasoning, Simon et al. (2011) verified alpha spindle activity based on a short-time Fourier transformation in real traffic conditions. Statistical analysis of these actual driving data revealed significant increases for all alpha spindle parameters, such as rate, duration, amplitude, and power, between the awake and drowsy states during 20 min of driving. Similarly, in the EEG recordings in this work, we observed a significant increase in alpha based on RPL. To reduce session and subject variability, a normalized alpha RPL was introduced and a significant alpha RPL difference was found in the centro-parietal region.

*The highest accuracies among the seven approaches are displayed in bold. Asterisks indicate the level of significance relative to the EEG+ECG+fNIRS combination (\*\*p < 0.01; \*p < 0.05).*

It is known that attention is also correlated with alpha power suppression. In our experiment, visual attention may be expected to be related closely to the simulated driving task. Such visual attention-related alpha power suppression is normally observed in the occipital region, as reported in previous studies (Worden et al., 2000; Sauseng et al., 2005; Rihs et al., 2007). However, in our study, the notable alpha difference was observed in the centro-parietal region alone, which is consistent with results in previous studies of fatigue (Lal and Craig, 2001, 2002; Simon et al., 2011; Sahayadhas et al., 2012). For this reason, the alpha-band change in this experiment is more likely to be related to fatigue than to visual attention. We also observed beta RPL changes in the sleep-deprived condition, and beta power may be an additional indicator of mental fatigue. Tanaka et al. (2012) found that beta power densities decreased significantly after tiring cognitive tasks. They calculated EEG power spectra in each band and showed that beta waves decreased significantly in the fronto-central region with increased driving times. It also has been reported that beta rhythm is associated closely with increased alertness and arousal (Okogbaa et al., 1994), which is likely to be applicable to driving situations (Yeo et al., 2009; Yang et al., 2010; Zhao et al., 2012). In this work, we inferred that the lack of arousal and alertness caused by mental fatigue and sleep deprivation may result in a decreased beta rhythm during simulated driving.

Until now, most studies related to the detection of mental fatigue during driving have been experimental, and driving conditions have been divided according to the elapsed duration of driving. For example, data from the first 10 min have been considered to be the normal condition, while those from the last 10 min were specified as the fatigued condition (Li et al., 2009; Simon et al., 2011). Such an approach may not be appropriate, however, as some people may not develop fatigue even after 2–3 h of driving, especially professional drivers. Therefore, in order to discriminate between high- and low-risk conditions explicitly, each subject was tested in both a high-risk and a low-risk fatigue condition, determined by how many hours they had slept the night before the driving test. Because sleep deprivation is well known to affect decision-making, attention, vigilance, human performance, and mental fatigue (Åkerstedt et al., 2004; Alhola and Polo-Kantola, 2007), it is appropriate to refer to sleep deprivation as analogous to fatigue. In addition, in our driving simulator, the steering wheel vibrated whenever the car crashed into a barrier, which prevented drivers from actually falling asleep. Preventing the subjects from falling asleep may have suppressed activation in the delta and theta bands because these waves are associated closely with deep sleep (Maquet et al., 1997) and REM sleep (Jouvet, 1969), respectively.

# Observations of Fatigue in the Well-Rested Condition

Before the experiment, all subjects in the well-rested condition were instructed to sleep for more than 7 h to ensure that they were mentally alert and physically refreshed. However, several subjects experienced fatigue in the driving task nonetheless, due to various internal or external environmental factors, even though they reported that they had slept well the previous night. Subjects S7 and S9 had the lowest DCL values in the well-rested condition, as shown in **Table 2** and **Figure 6**. According to their questionnaires, these two subjects recorded a score of 2 in the sleepiness section (1: rarely sleepy to 5: very sleepy) prior to the experiment, yet they often nodded off during driving. Their behavior was recorded on the HD-webcam, and showed clearly that they were drowsy. Furthermore, they reported scores of 4 and 5, respectively, on the sleepiness scale after the experiment. **Figure 8** represents the DCLs for these two subjects over time. As shown, their DCLs in the well-rested condition were similar to those in the sleep-deprived condition. The DCLs fluctuated in the well-rested condition, as shown by repeated increases and decreases. Moreover, these two subjects achieved low classification accuracies in a single modality, as shown in **Table 3**, although their accuracy improved when measured with mixed features. We believe that these two subjects were likely to have been fatigued despite their assignment to the well-rested condition. In this work, we used the HD-webcam only to monitor the subjects, and did not measure or analyze behavioral data. Analyzing subjects' behavior in real-time, such as head or eye movements, may offer supporting evidence that some subjects slept well but experienced mental fatigue nonetheless. We will collect such behavioral data in the online mental fatigue monitoring system, which is currently under investigation.

# Temporal Mismatch between EEG and fNIRS

Recently, many studies have tried to combine EEG and fNIRS to improve classification accuracy or increase the degrees of freedom in BCI systems (Fazli et al., 2012; Khan et al., 2014; Putze et al., 2014; Koo et al., 2015; Yin et al., 2015). However, the fNIRS system measures hemodynamic change, which is a delayed response compared to neuronal electrical activity, and it also has a relatively low temporal resolution (<10 Hz), both of which are critical drawbacks in fNIRS measurements. Because of its low temporal resolution, it is sometimes difficult to combine fNIRS data with other brain imaging data. Nevertheless, one of the most significant merits of the fNIRS system is its ability to measure oxygen consumption related to blood flow in the brain, similar to functional magnetic resonance imaging (fMRI), and fNIRS has been nicknamed the portable fMRI for this reason.

Considering the advantages and drawbacks of the fNIRS system, we calculated the features during each 1-min period throughout the dataset in this work. Each minute in the 30 min of data yielded a DCL value, which was used to discriminate between the well-rested and sleep-deprived conditions. In addition, oxygen consumption in the prefrontal cortex may represent cognitive workload or fatigue (Ayaz et al., 2012, 2013; Harrison et al., 2014; McKendrick et al., 2014). Therefore, it is likely that fNIRS may be a significant indicator of mental fatigue, and we believe that employing multimodal data is quite useful in monitoring mental fatigue. Also, the prefrontal cortex is related closely to mental workload (Mandrick et al., 2013a) and the performance of cognitive tasks (Mandrick et al., 2013b). For these reasons, we introduced the prefrontal fNIRS in this work. However, whole-head fNIRS would be beneficial and will be considered in our future work.

# Limitations and Future Work

We proposed here an indicator of drivers' mental fatigue (Equation 4) that was able to discriminate the drivers' mental conditions well. To include equal contributions of EEG, ECG, and fNIRS features in the indicator, we normalized and summed all three. Further, all three were weighted equally to calculate the indicator, although this might not be an optimal method of quantification. Therefore, we calculated a new indicator using weighting factors: DCL = a × norm(beta RPL / alpha RPL) + b × norm(HbO) + c × norm(HR). Each weighting factor was defined between 0 and 1 with increments of 0.10; thus, we were able to search for the highest DCL difference between the two conditions (subtraction of sleep-deprived from well-rested). The resulting DCL differences were comparable to those obtained with equal weights, and we observed no significant improvement. More elegant optimization methods could be applied, and these might yield a somewhat better indicator.
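The weight search described above can be sketched in Python as an exhaustive grid search. The function name `best_weights` and the normalized feature values are hypothetical, and the search is shown for a single pair of averaged feature triples; how the study aggregated across samples and subjects is not specified here.

```python
from itertools import product


def best_weights(well_rested, sleep_deprived, step=0.1):
    """Grid-search (a, b, c) in [0, 1] maximizing the weighted DCL difference.

    well_rested, sleep_deprived: triples of normalized features
    (beta/alpha RPL ratio, HbO, HR) for the two conditions.
    """
    n = round(1 / step)
    grid = [i * step for i in range(n + 1)]
    diffs = [w - s for w, s in zip(well_rested, sleep_deprived)]
    # Weighted DCL difference = a*d_ratio + b*d_hbo + c*d_hr.
    return max(product(grid, repeat=3),
               key=lambda abc: sum(k * d for k, d in zip(abc, diffs)))


# Hypothetical normalized features; every feature is higher when well-rested.
print(best_weights((0.8, 0.7, 0.9), (0.3, 0.2, 0.5)))
```

In this sketch the objective is linear in the weights, so whenever every feature is higher in the well-rested condition the search ends at the grid corner where all weights equal 1, which coincides with equal weighting.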

The primary reason to monitor drivers' mental fatigue is to prevent car accidents by providing drivers with a rapid and reliable alarm. To achieve this purpose, both making predictions before driving and monitoring a driver's condition in real-time are potential approaches. Our proposed rDCL (Equation 5) can predict a driver's fatigue prior to driving if training data can be obtained when the driver is in an alert state. By estimating the DCL prior to driving and comparing it with the DCL in the well-rested condition, the driver's condition may be pre-checked. Although we used the entire dataset obtained during 30 min in the well-rested condition in this work, data of a shorter duration could be used as the baseline. We are currently investigating the minimum duration needed to predict drivers' fatigue. Another approach to accident prevention is to monitor drivers' fatigue in real-time.

We performed only an offline analysis in this work, and normalized DCL values were estimated each minute. For online monitoring, however, normalization of each modality's features is quite difficult to implement based on current methods available. One possible approach to solve the normalization issue is to record resting data before driving and use them as baseline data. Alternatively, an adaptive normalization method that updates feature values in real-time is a candidate. We are investigating the most appropriate normalization method for a subsequent online mental fatigue monitoring system.
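The two normalization options mentioned above, a resting baseline recorded before driving and adaptive updating in real-time, can be combined in a simple running min-max scaler. The Python sketch below is a hypothetical illustration, not the method under investigation by the authors; the class name `RunningMinMax` and the HR values are assumptions.

```python
class RunningMinMax:
    """Min-max normalizer seeded with baseline data and updated with every new sample."""

    def __init__(self, baseline):
        # Initial bounds come from resting data recorded before driving.
        self.lo, self.hi = min(baseline), max(baseline)

    def update(self, x):
        """Widen the bounds if needed, then return x scaled to [0, 1]."""
        self.lo, self.hi = min(self.lo, x), max(self.hi, x)
        if self.hi == self.lo:
            return 0.0  # no spread observed yet
        return (x - self.lo) / (self.hi - self.lo)


# Hypothetical resting heart rates (bpm) followed by streamed driving samples.
hr_norm = RunningMinMax([70, 72, 74])
stream = [hr_norm.update(x) for x in (73, 66, 70)]
print(stream)
```

One consequence of this design is that the same raw value can map to different normalized values over time as the bounds widen, so earlier and later DCL values are not strictly comparable.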

In this work, we attempted to analyze multimodal data with simultaneous recordings of EEG/ECG/EOG and fNIRS. One of the advantages of multimodal signal integration is that each imaging method provides a physiologically and physically filtered view of one or more brain processes of interest. Thus far, the EEG-fMRI combination has been investigated widely, especially in epilepsy research, to help localize specific regions (Rosenkranz and Lemieux, 2010) by improving spatial and temporal resolution. Now, the EEG-fNIRS combination may be an alternative imaging method with merits that include low cost and simple implementation.

In this study, we custom-built an fNIRS system that had already been validated in a previous study (Kim et al., 2015), which enabled us to design the experiment well. Although a lengthy preparation time was required to attach the detectors and emitters and to test the quality of the light intensity for fNIRS measurement, this EEG-fNIRS integration may be quite beneficial in developing a monitoring system, as reported in the existing literature (Fazli et al., 2012; Wallois et al., 2012; Khan et al., 2014; Morioka et al., 2014; Putze et al., 2014; Koo et al., 2015; Yin et al., 2015). Another concern in analyzing multimodal data is how their differences (physical values, temporal resolution) are considered in an integrated frame. An in-depth investigation is needed to enhance the synergistic effect of multimodal data recording. Similarly, in this study, we were unable to guarantee that the EEG, ECG, and fNIRS features, or their combinations, are related linearly to the fatigue level, although in the multimodal results, each modality influenced the fatigue level to some degree, as shown in **Figure 8**.

# CONCLUSIONS

The purpose of this study was to use simultaneous EEG/ECG/EOG and fNIRS recordings to determine neurophysiological correlates that can be used to discriminate sleep deprivation-induced mental fatigue in drivers by comparison to those who are well-rested. To achieve our purpose, we introduced two driving conditions (well-rested and sleep-deprived), and were able to extract significant features from their EEG, ECG, and fNIRS data. However, no significant feature was found in the EOG due to high variability in the subjects' data. The features observed allowed us to determine the mental condition of each driver, and also yielded good discriminative results between two driving conditions. Further, we investigated the synergistic effects of multimodal data to compare the various combinations at the classification level with a single modality. In conclusion, our proposed combined approach of simultaneous EEG/ECG and fNIRS data may be a promising tool with which to monitor drivers' mental fatigue.

# AUTHOR CONTRIBUTIONS

SA, TN, HJ, JGK, and SCJ designed the experimental paradigm, and TN and JGK custom-built the fNIRS system. Data collection was conducted by SA, TN, and HJ, and analysis was performed by SA; the entire manuscript was written by SA and SCJ.

# ACKNOWLEDGMENTS

This work was supported by Hyundai Next Generation Vehicle (NGV), the National Research Foundation of Korea (#2013R1A1A2009029), and the GIST Research Institute (GRI) in 2016.

# REFERENCES


Grandjean, E. (1979). Fatigue in industry. Br. J. Ind. Med. 36, 175–186.


population-based case control study in Fiji (TRIP 12). Injury 45, 586–591. doi: 10.1016/j.injury.2013.06.007


near-infrared spectroscopy-derived cerebral hemodynamic responses. Int. J. Ind. Ergon. 43, 335–341. doi: 10.1016/j.ergon.2013.05.003


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Ahn, Nguyen, Jang, Kim and Jun. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Steering Demands Diminish the Early-P3, Late-P3 and RON Components of the Event-Related Potential of Task-Irrelevant Environmental Sounds

Menja Scheer<sup>1</sup>, Heinrich H. Bülthoff<sup>1,2\*</sup> and Lewis L. Chuang<sup>1\*</sup>

<sup>1</sup> Department of Perception, Cognition and Action, Max Planck Institute for Biological Cybernetics, Tübingen, Germany, <sup>2</sup> Department of Cognitive and Brain Engineering, Korea University, Seoul, South Korea

The current study investigates the demands that steering places on mental resources. Instead of a conventional dual-task paradigm, participants of this study were only required to perform a steering task while task-irrelevant auditory distractor probes (environmental sounds and beep tones) were intermittently presented. The event-related potentials (ERPs), which were generated by these probes, were analyzed for their sensitivity to the steering task's demands. The steering task required participants to counteract unpredictable roll disturbances and difficulty was manipulated either by adjusting the bandwidth of the roll disturbance or by varying the complexity of the control dynamics. A mass univariate analysis revealed that steering selectively diminishes the amplitudes of early P3, late P3, and the re-orientation negativity (RON) to task-irrelevant environmental sounds but not to beep tones. Our findings are in line with a three-stage distraction model, which interprets these ERPs to reflect the post-sensory detection of the task-irrelevant stimulus, engagement, and re-orientation back to the steering task. This interpretation is consistent with our manipulations for steering difficulty. More participants showed diminished amplitudes for these ERPs in the "hard" steering condition relative to the "easy" condition. To sum up, the current work identifies the spatiotemporal ERP components of task-irrelevant auditory probes that are sensitive to steering demands on mental resources. This provides a non-intrusive method for evaluating mental workload in novel steering environments.

#### Keywords: steering, mental workload, distraction, MMN, early P3, late P3, RON

# INTRODUCTION

Safety concerns have strongly motivated research in determining the demands, or workload, that users experience while performing closed-loop steering tasks, particularly in the context of driving a car or piloting an aircraft (for a general review about workload, see Kramer, 1991; Wickens, 2008; Young et al., 2015). Even if competence can be maintained in spite of high mental workload, such scenarios leave little spare capacity for handling unexpected occurrences. There is no doubt that steering places high requirements on visual and motoric resources (Land and Lee, 1994; Salvucci and Gray, 2004). Besides this, some aspects of steering have also been shown to require mental resources (Wickens et al., 1983, 1984). This has been typically demonstrated with the use of dual-task paradigms that induce a competition for mental resources between the primary steering task and an appropriately chosen secondary task (McLeod, 1977; Wickens and Gopher, 1977). The purpose of this article is to evaluate the demands that steering places on mental resources without requiring the user to perform a secondary task. To do so, we investigate how steering demands modify the event-related potentials (ERPs) to task-irrelevant auditory probes. The steering task is further manipulated for two aspects of steering that are known to influence handling difficulty, namely the bandwidth of disturbance and the complexity of (vehicle) control dynamics.

#### Edited by:

Hasan Ayaz, Drexel University, USA

#### Reviewed by:

Kimmo Alho, University of Helsinki, Finland; Erich Schröger, University of Leipzig, Germany

#### \*Correspondence:

Heinrich H. Bülthoff heinrich.buelthoff@tuebingen.mpg.de; Lewis L. Chuang lewis.chuang@tuebingen.mpg.de

> Received: 22 December 2015 Accepted: 15 February 2016 Published: 01 March 2016

#### Citation:

Scheer M, Bülthoff HH and Chuang LL (2016) Steering Demands Diminish the Early-P3, Late-P3 and RON Components of the Event-Related Potential of Task-Irrelevant Environmental Sounds. Front. Hum. Neurosci. 10:73. doi: 10.3389/fnhum.2016.00073

Workload can be defined as the ratio between the demands of a task and the resources of the human operator. The concept originates from the idea that human operators possess, at any given time, a limited reserve of mental resources (Kramer, 1991; Wickens, 2008). By introducing a competition for this limited reserve, for example by requiring participants to perform two tasks simultaneously, researchers are able to investigate how difficulty manipulations in a primary task can create a demand for resources that are drawn away from an accompanying secondary task. Changes in resource demands are indexed by secondary task performance. A comparison of performance measures on competing tasks typically demonstrates that participants are capable of varying the relative prioritization of competing tasks (Wickens and Gopher, 1977), but only when the tasks overlap in their resource requirements (McLeod, 1977). The "Multiple Resource Theory" provides a framework that allows researchers and practitioners to define the resource requirements of different tasks and in doing so, predict possible conflicts (Wickens and Yeh, 1983; Wickens, 2002, 2008). Within this framework, a steering task places obvious demands on visual perception and motoric responses. By using electroencephalography (EEG) to measure the ERP to secondary task stimuli, Wickens and colleagues were able to demonstrate the demands of various aspects of steering on mental resources as well.

To date, ERP studies have broadly demonstrated that steering demands tend to reduce the amplitude of the P300, an ERP component that is generated by the target stimuli of a secondary task (e.g., Wickens et al., 1977; Isreal et al., 1980; Wickens and Yeh, 1983). Dual-task studies that investigate steering demands typically require participants to detect and explicitly respond to infrequently presented "oddball" targets as a secondary task. "Oddballs" elicit a prominent P300 component in the EEG signal. The P300 is a positive deflection between 250 and 400 ms and its amplitude has been used to index the level of experienced workload (Kok, 1997). The finding that steering demands diminish P300 amplitudes in an accompanying "oddball" detection task is commonly interpreted as follows. The primary steering task places prioritized demands on mental resources, resulting in the reduced availability of mental resources that would otherwise be recruited for the detection of secondary "oddball" targets (Wickens et al., 1977; Isreal et al., 1980; Wickens and Yeh, 1983). Hence, the reduced availability of mental resources is reflected in the reduced amplitudes of P300 that are elicited by the detected "oddballs". This serves as a proxy for evaluating the demands for mental resources, given different manipulations of steering difficulty. Some steering parameters exert a uniform cost on P300 amplitudes regardless of their manipulated difficulty levels, while increasing the difficulty levels of other parameters can induce decreased P300s to secondary "oddball" targets. For example, increasing the number of simultaneously tracked dimensions (Wickens et al., 1977; Kramer et al., 1983; Sirevaag et al., 1989), tracking speed (Kida et al., 2004), and the frequency bandwidth of the tracked target (Isreal et al., 1980) does not result in a decrease of P300 amplitudes. In contrast, increasing the complexity of control dynamics (e.g., from a first-order to a second-order integrator; Wickens et al., 1983, 1984; Sirevaag et al., 1989) or the unpredictability of the tracked target (Kida et al., 2004) results in corresponding decreases in P300 amplitudes. Other ERP components have also been analyzed for their sensitivity to changes in steering demands, albeit with mixed results. Kida et al. (2004) reported a decrease in the amplitude of the N140 component to the somatosensory targets of a secondary oddball task, which did not vary with the predictability of the steering task.

Until now, ERP studies of steering demands have mainly been performed in the presence of a secondary task that contains the stimuli for eliciting the ERP. It is generally believed that ERP probes are only effective for evaluating the resource demands of tasks that they are in explicit conflict with. Indeed, Wickens et al. (1983) have shown that the influence of steering demands on P300 amplitudes is removed when the ERP probes were task-irrelevant. Unfortunately, dual-task paradigms present several limitations in understanding steering demands. First, requiring an overt response to a secondary task interferes with the performance of the primary steering task (Wickens et al., 1983). In this regard, the secondary task is not a passive consumer of residual mental resources but is, rather, in direct competition with the primary task for shared resources. Second, the researcher has little control over how participants might choose to divide their resources between primary and secondary task, regardless of explicit instructions. Finally, estimated workload from ERP measurements could be due to the interaction of the primary and the secondary task demands, instead of the primary task alone. These reasons, amongst others, have motivated the development of nonintrusive methods for estimating primary task demands that do not necessitate a secondary task.

In contrast to Wickens et al.'s (1983) findings, ERPs to task-irrelevant stimuli can sometimes be demonstrated to vary with the demands of a task that is performed in isolation. This has been shown with the use of ERP probe stimuli that are more likely to recruit larger momentary shifts of resources than simple beep tones, such as complex environmental sounds (Courchesne et al., 1975; Ullsperger et al., 2001; Polich, 2003). Such stimuli are task-irrelevant and reliably elicit a positive ERP component, termed the novelty-P3 (P3a), that has a similar time-course to the P300 but with a frontal instead of a parietal distribution (Polich, 2007). Given their task-irrelevant nature, it is more reasonable to assume that their elicited ERP components reflect residual resources that are not consumed by the demands of the investigated task. Task-irrelevant probes have been used to estimate the demands of a variety of tasks including arithmetic and visual monitoring (Ullsperger et al., 2001), working memory (i.e., n-back task; SanMiguel et al., 2008), Tetris® (Miller et al., 2011; Dyke et al., 2015), first-person-shooter (Allison and Polich, 2008) and car racing games (Burns and Fairclough, 2015). It has not always been necessary to employ novel environmental sounds in order to generate ERPs for the evaluation of task demands; simple beep tones have proven to be sufficient in some instances (Burns and Fairclough, 2015). Nonetheless, there are also other examples whereby simple beep tones do not generate ERPs (i.e., P3a) that are sensitive to task demands (e.g., Ullsperger et al., 2001; Muller-Gass et al., 2007). Environmental sounds have the added value of generating larger novelty-P3s that are further separable into an early and a late P3 component, which are claimed to be functionally distinct (Alho et al., 1998; Yago et al., 2003; McDonald et al., 2010). Early P3 is claimed to reflect post-sensory detection of unexpected events that contradict the observer's representation of the external world, while late P3 is claimed to reflect attentional processing of the unexpected event. Besides novelty-P3, other ERP components of task-irrelevant probes (i.e., N1/MMN; Ullsperger et al., 2001; Dyke et al., 2015; P2 and N2; Allison and Polich, 2008; late positive potential or LPP; Miller et al., 2011) have also been claimed to be diminished by increased task demands, albeit less consistently.

Taken together, ERP probes can be regarded as distractors that demand resources either through explicit competition with the primary task (Isreal et al., 1980; Wickens et al., 1983, 1984) or by implicitly drawing upon residual resources that are unconsumed by the primary task (SanMiguel et al., 2008; Miller et al., 2011; Burns and Fairclough, 2015; Dyke et al., 2015). Previous work that assessed steering demands might have required ERP probes to be task-relevant because the employed probes (i.e., beep tones) did not recruit sufficient resources to indicate the influence of steering demands.

ERP components that are elicited by distracting stimuli have been suggested to reflect three stages of distraction (Schröger and Wolff, 1998; Escera and Corral, 2007; Wetzel and Schröger, 2014). Based on the specific ERP components that are decreased with an increase of the task demands, inferences about the stages of distraction that are influenced can be drawn. The first stage of distraction is the detection that the model of the environment was violated. When engaged in a task, participants can be expected to be primarily focused on this task. At the same time, the regularities of the acoustic environment are encoded and used to form a predictive model of the surroundings. Whenever a current event violates this predictive model, the distraction process is initiated. This first stage of distraction is reflected in the elicited ERP by the mismatch negativity (MMN). The MMN is an early, negative ERP component that is apparent in the difference wave between the distractor and the standard stimuli, for example in an oddball paradigm. Thus, the presence of an MMN indicates early sensory detection of an unexpected change in the environment. The second stage is the voluntary or involuntary orientation of attention towards the distracting event. Depending on the level of readily available resources and the eliciting event, resources might be directed towards the distracting event in order to process it. This stage is reflected by the occurrence of the novelty-P3 component. The third stage describes a disengagement of resources from the distracting event and a re-orientation back to the task at hand. Disengagement from the distractor stimuli is reflected by the re-orientation negativity (RON), a late negative component.

The current study investigates the influence of steering demands on ERP components that are generated by task-irrelevant auditory distractor stimuli. In the viewing baseline condition, we expect distractor stimuli to elicit ERP components that correspond to the three-stage distraction model, regardless of whether they are infrequently presented beep tones or infrequently presented environmental sounds. However, we expect these ERP components to be larger when generated by environmental sounds. Furthermore, we expect these ERP components to decrease when participants are required to perform a steering task, but only when they are generated by environmental distractors. We employ a data-driven approach (i.e., mass univariate analyses; Groppe et al., 2011) to ensure the validity of any correspondence between distractor ERP components and steering demands. This approach allows us to define each affected component in terms of its spatial and temporal characteristics, as opposed to restricting our analyses to an a priori selection of components (cf., Miller et al., 2011; Dyke et al., 2015). ERP components that are found to be sensitive to steering demands are subsequently submitted to permutation tests to evaluate their suitability for discriminating between manipulated levels of steering difficulty. We manipulate steering difficulty by either increasing the frequency bandwidth of the disturbance that is experienced during steering (cf., Isreal et al., 1980), or by varying the complexity of the control dynamics (cf., Wickens et al., 1983). We expect more participants to demonstrate a significant reduction in these targeted ERP components in the "hard" condition compared to the "easy" condition.

# MATERIALS AND METHODS

### Participants

We tested 24 right-handed volunteers (seven women, mean age = 27.9 years, SD = 5.2). All participants reported normal or corrected-to-normal vision, no hearing impairment and no history of neurological diseases. The experimental procedure was approved by the MPG Ethics Council and all participants gave written informed consent.

### Stimuli and Apparatus

The experiment was set up in a dimly-lit, low-noise environment. It consisted of a primary steering task and the presentation of task-irrelevant, auditory stimuli. The steering task was presented via a central display (1027 × 581 mm, resolution 1920 × 1080 px), approximately 180 cm away from the seated participants. Auditory stimuli were presented to both ears via headphones (MDR-CD380, Sony) that were driven by a soundcard (sampling frequency: 96 kHz; DELTA1010LT, M-Audio). A secondary heads-down display informed the participants of their most recent steering performance and the current experimental status. Data collection was performed using customized software written in Matlab Simulink. The software version of the NASA-TLX questionnaire (Hart and Staveland, 1988) was presented on a separate notebook.

Two lines (length: 16° visual angle, thickness: 2 px) were presented on a blue background. These lines were a white horizontal non-moving reference line and a second black line that rotated around the joint center of both lines. A right-handed sidestick (Extreme 3D Pro, Logitech) with a spring constant of 0.6 N/degree was used as input device.

During the entire experiment, participants were probed with task-irrelevant stimuli with a random inter-stimulus interval (mean = 1.20 s, SD = 62 ms). Infrequently presented environmental sound distractors (prob. of presentation: p = 0.1) were intermixed with frequent, standard (p = 0.8) and infrequent distractor (p = 0.1) beep-tones. Two easily discriminable beep-tones were used (i.e., 300 and 700 Hz) and their probability (p = 0.1 and p = 0.8) was counter-balanced across participants. The environmental sounds consisted of a set of 30 recognizable complex sounds (e.g., human laughter) that were selected from a database obtained from the New York State Psychiatric Institute (Fabiani et al., 1996). The environmental sounds were presented in quasi-random order without replacement. Environmental sounds, as well as standard and distractor beep-tones, had a mean duration of 336 ms (SD = 62.5 ms) and a mean intensity of 60 dB SPL (SD = 0.31 dB). Both environmental and beep distractors were always preceded by at least one standard beep.
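The probe schedule described above can be illustrated with a minimal sketch. It is not the authors' software: the function name, the seed and the refill rule for the pool of 30 sounds are assumptions made for illustration. Because a distractor may only follow a standard, the conditional probability of a distractor after a standard has to be 0.25 (not 0.2) for the long-run shares to come out at 0.8, 0.1 and 0.1.

```python
import random

def make_probe_sequence(n_stimuli, n_env_sounds=30, seed=0):
    """Return labels 'standard', 'beep_distractor' or ('env', sound_index).

    Hypothetical helper: only the probabilities, the 30-sound pool and the
    'preceded by a standard' rule come from the text.
    """
    rng = random.Random(seed)
    env_pool = []                 # environmental sounds not yet used in this pass
    sequence = []
    previous_was_standard = False
    for _ in range(n_stimuli):
        if not previous_was_standard:
            # A distractor is only allowed directly after a standard beep.
            kind = "standard"
        else:
            # Conditional weights 0.75 / 0.125 / 0.125 after a standard give
            # long-run shares of 0.8 / 0.1 / 0.1 under the rule above.
            kind = rng.choices(
                ["standard", "beep_distractor", "env"],
                weights=[0.75, 0.125, 0.125],
            )[0]
        if kind == "env":
            if not env_pool:      # assumed: reshuffle once all sounds were used
                env_pool = list(range(n_env_sounds))
                rng.shuffle(env_pool)
            sequence.append(("env", env_pool.pop()))
        else:
            sequence.append(kind)
        previous_was_standard = kind == "standard"
    return sequence

sequence = make_probe_sequence(1000)
```

The inter-stimulus jitter is left out on purpose, since the text gives its mean and SD but not its distribution.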

### Task

Participants performed a steering task in which they were required to continuously counteract a quasi-random roll motion of a rotating line. This unpredictable roll motion was defined by the forcing function f<sub>t</sub>(t) (see Equation (1) and **Table 1**). Participants were instructed to minimize the displacement e(t) of the rotating line (black in **Figure 1**) relative to the reference line (white in **Figure 1**), with lateral deflections of the sidestick.

Task-irrelevant sounds were presented that our participants were instructed to disregard. The experiment consisted of steering as well as of viewing trials. The viewing trials presented the same visual feedback in all sessions and served as a baseline. In this condition, participants viewed the steering task that was prerecorded. By comparing the steering trials against these viewing trials that both presented the same visualization, we could determine how the demands of the steering task influenced the measured ERPs, independent of the visualization.

Two aspects of the steering task were used to influence the level of workload in the task: (1) the frequency bandwidth of the roll disturbance and (2) the complexity of the internal control dynamics. In every steering trial, one of these aspects was manipulated, leading to two levels of steering task difficulty, namely "easy" and "hard" for each of the two manipulations. The other aspect was kept constant and will be referred to as "standard" in the following. The objective was to create two levels of workload for independent manipulations (cf. Isreal et al., 1980; Wickens et al., 1984). Details of these manipulations are given in the following.

#### Manipulation of the Bandwidth of Roll Disturbance

The roll disturbance was designed as a sum of ten sine waves that could be manipulated for the number and intensity of roll reversals by adjusting the frequency bandwidth, such that the "easy" condition presented less power in the higher frequencies, compared to the "hard" condition. The "standard" condition was designed to be an intermediate of these two conditions.

In all conditions, the forcing function was formalized as the sum of ten sine waves that were non-harmonically related, as described in (1):

$$f_t(t) = \sum_{j=1}^{10} A(j) \cdot \sin\left(\omega(j) \cdot t + \phi(j)\right) \tag{1}$$

The amplitude A(j), frequency ω(j) and phase φ(j) of these 10 sine waves, for the "standard", the "easy" and the "hard" condition, are given in **Table 1**.

TABLE 1 | Amplitude A(j), frequency ω(j) and phase φ(j) of the ten sine waves, contained in the forcing function, for the "standard", "easy" and "hard" condition.


The forcing function in the "standard" condition had a variance of 1.61 degree<sup>2</sup>, adapted from Nieuwenhuizen et al. (2013). In the "easy" condition a variance of 1.47 degree<sup>2</sup> and in the "hard" condition a variance of 1.78 degree<sup>2</sup> was applied.

To sum up, the "hard" condition presented larger amplitudes in the higher frequencies that resulted in more instances of roll reversals than the "standard" and "easy" conditions.
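As an illustration of Equation (1), the sketch below evaluates a sum of ten sine waves and checks its variance against the closed form Σ A(j)²/2, which holds for sinusoids of distinct frequencies observed over whole cycles. All amplitudes, frequencies and phases here are hypothetical placeholders, not the Table 1 values; the use of prime multiples of a base frequency is likewise only an assumed way of keeping the components non-harmonically related.

```python
import math

def forcing_function(t, amplitudes, omegas, phases):
    """Equation (1): sum of sine waves evaluated at time t (in seconds)."""
    return sum(a * math.sin(w * t + p)
               for a, w, p in zip(amplitudes, omegas, phases))

def theoretical_variance(amplitudes):
    """Variance of a sum of sines with distinct frequencies: sum of A^2 / 2."""
    return sum(a * a for a in amplitudes) / 2.0

# Hypothetical parameters (NOT the values of Table 1).
WINDOW = 100.0                                   # s, assumed observation window
BASE_OMEGA = 2.0 * math.pi / WINDOW              # rad/s, one cycle per window
MULTIPLES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]  # primes: no shared harmonics
omegas = [k * BASE_OMEGA for k in MULTIPLES]
amplitudes = [1.0, 0.9, 0.7, 0.5, 0.3, 0.2, 0.1, 0.08, 0.05, 0.03]  # degrees
phases = [0.1 * j for j in range(10)]            # rad

# Sample exactly one window so that every component completes whole cycles.
N = 10000
samples = [forcing_function(i * WINDOW / N, amplitudes, omegas, phases)
           for i in range(N)]
mean = sum(samples) / N
empirical_variance = sum((x - mean) ** 2 for x in samples) / N
```

Under these assumptions, raising the amplitudes of the high-frequency components raises the variance, which is in line with the ordering of the reported variances for the "easy", "standard" and "hard" conditions.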

#### Manipulation of the Control Dynamics

By manipulating the control dynamics, the motion of the rotating line, relative to the sidestick input of the participants, was manipulated. The control dynamics can be formally described as the transfer function H(s).

In the "standard" condition the transfer function had the form of:

$$H_{\text{standard}}(s) = \frac{2.75}{s \, (s + \omega_b)} \tag{2}$$

This represents a hybrid controller that reacts to the sidestick input with a weighted mixture of velocity and acceleration control. In other words, depending on the frequency of the sidestick input of the participant, either the velocity or the acceleration of the rotating line was influenced. To manipulate the internal control dynamics for difficulty levels, we removed either the velocity or the acceleration component, resulting in either a pure velocity controller with the following form for the "easy" condition:

$$H_{\text{easy}}(s) = \frac{1.5}{s} \tag{3}$$

or a pure acceleration controller with the following form for the "hard" condition:

$$H_{\text{hard}}(s) = \frac{5}{s^2} \tag{4}$$

These transfer functions were adopted from Zollner et al. (2010).

Controlling the acceleration has been shown to be more demanding than controlling the velocity (e.g., Wickens et al., 1984; Sirevaag et al., 1989). When the velocity is controlled, the angle of the sidestick translates to the velocity of the controlled line. In this case, keeping the sidestick in the center results in no motion of the controlled line. When the acceleration is controlled instead, keeping the sidestick in the center results in no further acceleration, but the controlled line will maintain its current velocity. Thus, participants have to anticipate the future consequence of their input commands when using a pure acceleration controller.
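The difference between the two controllers can be made concrete with a small discrete-time sketch. It assumes a simple forward-Euler integration, a normalized stick deflection and a hypothetical input profile (1 s of deflection followed by 2 s with the stick centered); only the gains 1.5 and 5 are taken from Equations (3) and (4).

```python
def simulate(stick, dt, order, gain):
    """Integrate the stick input once (velocity control, gain / s)
    or twice (acceleration control, gain / s^2) with forward Euler."""
    angle, rate = 0.0, 0.0
    angles = []
    for u in stick:
        if order == 1:
            rate = gain * u          # stick deflection sets the line velocity
        else:
            rate += gain * u * dt    # stick deflection sets the line acceleration
        angle += rate * dt
        angles.append(angle)
    return angles

DT = 0.01                                 # s, assumed simulation step
# Hypothetical input: deflect for 1 s, then keep the stick centered for 2 s.
stick = [1.0] * 100 + [0.0] * 200

velocity_ctrl = simulate(stick, DT, order=1, gain=1.5)      # H_easy(s) = 1.5 / s
acceleration_ctrl = simulate(stick, DT, order=2, gain=5.0)  # H_hard(s) = 5 / s^2

# Angle change during the 2 s in which the stick is centered.
drift_velocity = velocity_ctrl[-1] - velocity_ctrl[99]
drift_acceleration = acceleration_ctrl[-1] - acceleration_ctrl[99]
```

In this sketch the velocity-controlled line stops as soon as the stick is centered, whereas the acceleration-controlled line keeps moving at the velocity it has built up, so an opposite input is needed to bring it to rest.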

### Design and Procedure

The experiment consisted of two sessions on two separate days, one that contained the manipulation of the bandwidth of the roll disturbance and one that contained the manipulation of the complexity of the control dynamics. Session order was counterbalanced across participants. Each of the two sessions consisted of four blocks that contained three trials each. The four blocks differed in terms of the implemented difficulty ("easy" or "hard"). Each block contained two steering and one viewing trial, where the order of the trials was randomized for every participant. Each of the trials lasted 4 min 26 s and trials were separated by 20 s of rest. During EEG preparation, participants were trained on every difficulty level and for each manipulation for at least one trial. Over the whole course of the experiment, after each trial, participants were presented with their performance (normalized root-mean-square error, nRMSerror) to keep them motivated. At the end of each block, participants were asked to rate their perceived workload in the NASA-TLX questionnaire for each level of difficulty, separately.

### EEG Signal Processing

The EEG was recorded with 26 active g.tec Ag/AgCl electrodes (g.LADYbird, g.tec), mounted in an elastic cap (g.GAMMAcap, g.tec). The electrooculogram (EOG) was recorded from four additional electrodes: at the outer canthi of the left and right eye, and above and below the left eye. All recorded signals were re-referenced off-line to the linked mastoids. The ground electrode was placed at FPz. The signals were amplified in the range between 0 and 2.4 kHz and digitized with a sampling rate of 256 Hz (g.USBamp, g.tec).

Further processing and analysis of the ERP signal was performed with Matlab and the open source Matlab toolboxes EEGLAB (Delorme and Makeig, 2004) and ERPLAB (Lopez-Calderon and Luck, 2014). In the off-line preprocessing, the data were high-pass filtered at 1 Hz and low-pass filtered at 15 Hz. Second-order Butterworth filters were used in both cases. From the filtered data, epochs from −200 to 1000 ms, relative to the onset of the presented sounds, were extracted. Epochs that showed blink or eye movement characteristics, in any of the electrodes, were rejected. The remaining epochs were averaged for each auditory stimulus type (environmental distractor, beep distractor, standard beep tone) and baseline corrected with reference to the prestimulus interval. The statistical analysis of the ERPs was based on the difference wave between ERPs that were elicited by distractors (the beep and environmental distractors, separately) and standards. This difference wave has also been referred to as the distraction potential (DP; Escera et al., 2003).
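The averaging, baseline correction and subtraction steps described above can be sketched for a single electrode as follows. This is an illustrative outline rather than the EEGLAB/ERPLAB pipeline that was actually used: the function names are hypothetical, and filtering and artifact rejection are assumed to have been done beforehand.

```python
FS = 256                          # Hz, sampling rate stated in the text
N_PRE = round(0.2 * FS)           # samples in the 200 ms prestimulus interval

def average(epochs):
    """Average equally long single-trial epochs sample by sample."""
    n = len(epochs)
    return [sum(column) / n for column in zip(*epochs)]

def baseline_correct(erp, n_pre):
    """Subtract the mean of the first n_pre (prestimulus) samples."""
    baseline = sum(erp[:n_pre]) / n_pre
    return [x - baseline for x in erp]

def difference_wave(distractor_epochs, standard_epochs, n_pre=N_PRE):
    """Distractor ERP minus standard ERP, each baseline corrected."""
    distractor = baseline_correct(average(distractor_epochs), n_pre)
    standard = baseline_correct(average(standard_epochs), n_pre)
    return [d - s for d, s in zip(distractor, standard)]

# Toy usage with 4-sample epochs and a 2-sample baseline (made-up numbers).
toy_difference = difference_wave(
    [[1.0, 1.0, 3.0, 5.0], [1.0, 1.0, 5.0, 7.0]],
    [[2.0, 2.0, 3.0, 3.0]],
    n_pre=2,
)
```

The same function would be applied separately to the environmental and the beep distractors, each contrasted with the standard beep.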

### Statistical Analysis of the ERPs

We adopted a two-stage approach for analyzing the ERPs elicited by the environmental and beep distractors. First, we employed mass univariate analyses to: (i) determine the ERP components that were elicited by the distractors; (ii) determine the ERP components that differed between the environmental and beep distractors; and (iii) identify and define the spatiotemporal characteristics of ERP components that were significantly reduced during steering, relative to the viewing baseline condition. To perform the mass univariate analyses, measured brain potentials were compared between the relevant conditions at all time points (between 100 and 900 ms after the presentation of the auditory stimuli) and all measured electrodes (26 electrodes distributed over the scalp). Two-tailed t-tests were performed between the compared conditions to yield t-values for every time-point of each electrode. The false discovery rate (FDR) was controlled using the Benjamini and Yekutieli (2001) procedure with an FDR level of 5%. This particular FDR procedure guarantees that the true FDR will not exceed the nominal FDR level of 5%, regardless of the dependency structure of the multiple tests (a tutorial review of the mass univariate analysis is provided by Groppe et al., 2011). This revealed ERP time points and their corresponding electrodes that were significantly different between the conditions.
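For readers unfamiliar with the Benjamini and Yekutieli (2001) step-up procedure, a minimal sketch is given below. It operates on a flat list of p-values, one per electrode and time-point combination; the function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_yekutieli(p_values, q=0.05):
    """Return one boolean per p-value: True if the test is declared significant.

    Step-up rule: with the p-values sorted in ascending order, find the largest
    rank k such that p_(k) <= k * q / (m * c(m)), where c(m) = sum_{i=1..m} 1/i,
    and reject the k smallest p-values.
    """
    m = len(p_values)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * q / (m * c_m):
            k_max = rank
    rejected = [False] * m
    for index in order[:k_max]:
        rejected[index] = True
    return rejected

# Made-up p-values for five tests.
example = benjamini_yekutieli([0.6, 0.001, 0.041, 0.008, 0.039], q=0.05)
```

The harmonic factor c(m) is what makes the procedure valid under arbitrary dependence between tests, at the price of a stricter threshold than the original Benjamini-Hochberg rule.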

Second, the ERP components that were identified to be sensitive to steering demands were submitted to permutation tests for each individual participant, in order to determine if these components were influenced by our difficulty manipulations for either disturbance bandwidth or control dynamics. A description of these single-subject permutation tests and their interpretation is provided by Maris and Oostenveld (2007). In brief, four key steps are performed for each participant: First, the selected electrode's mean amplitude over the time-range of interest was computed for every trial. Second, these mean amplitudes were submitted to a one-tailed, paired-samples t-test to yield a test t-value. Third, a null-distribution of t-values was generated. All trials were pooled and randomly distributed (without replacement) to two subsets. A paired t-test was performed between these two subsets to generate a single t-value. This was repeated 10,000 times to generate a null distribution. Fourth, the test t-value was compared to this generated null-distribution to determine its z-value. An alpha-level of 0.05 was adopted to determine if the tested participant showed a significant difference for the difficulty manipulations. This procedure was repeated for each participant and each ERP component of interest.
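The four steps above can be outlined in code. The sketch below deviates from the description in two labeled ways: it uses a Welch two-sample t statistic instead of a paired t-test, because the pairing of single trials is not specified here, and it reports a one-tailed permutation p-value directly instead of a z-value. The single-trial amplitudes are simulated, hypothetical numbers.

```python
import math
import random

def t_statistic(a, b):
    """Welch two-sample t statistic (a swapped-in choice, see the lead-in)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

def permutation_test(cond_a, cond_b, n_perm=10000, seed=0):
    """One-tailed test of mean(cond_a) > mean(cond_b) by label shuffling."""
    rng = random.Random(seed)
    observed = t_statistic(cond_a, cond_b)
    pooled = list(cond_a) + list(cond_b)
    n_a = len(cond_a)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                   # random split without replacement
        if t_statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            at_least_as_large += 1
    # The +1 terms count the observed statistic as one of the permutations.
    return observed, (at_least_as_large + 1) / (n_perm + 1)

# Hypothetical single-trial mean amplitudes (microvolts) for one participant.
data_rng = random.Random(1)
viewing = [data_rng.gauss(5.0, 1.0) for _ in range(40)]
steering = [data_rng.gauss(3.0, 1.0) for _ in range(40)]
t_obs, p_value = permutation_test(viewing, steering, n_perm=2000)
significant = p_value < 0.05
```

Repeating this for every participant and component would give per-participant significance counts of the kind summarized in the figures of the Results section.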

# RESULTS

### Steering Performance and Perceived Workload

Steering performance and the perceived workload were analyzed for our manipulations of steering demands. This was performed independently for our manipulations of disturbance bandwidth and control dynamics complexity with the use of a paired-samples t-test. This was to validate that our participants responded appropriately to our difficulty manipulations for "easy" and "hard". An alpha-level of 0.05 was adopted for significance testing. Cohen's d is reported for the effect size. Overall, we found medium to large effects in our manipulations of difficulty for both performance and perceived workload.
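A paired-samples t-test with its effect size can be sketched as below. The variant of Cohen's d is not stated in the text; the sketch assumes the mean difference divided by the standard deviation of the differences, which equals t/√n (for instance, 6.6/√24 ≈ 1.35). The per-participant scores are made-up numbers for illustration only.

```python
import math

def paired_t_and_d(easy, hard):
    """Return (t, Cohen's d, degrees of freedom) for paired scores.

    The difference is computed as easy minus hard, so a larger error in the
    "hard" condition yields negative t and d.
    """
    diffs = [e - h for e, h in zip(easy, hard)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in diffs) / (n - 1))
    t = mean / (sd / math.sqrt(n))
    d = mean / sd                  # assumed variant: d = t / sqrt(n)
    return t, d, n - 1

# Hypothetical RMS errors of four participants in the two conditions.
t_value, d_value, dof = paired_t_and_d([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 5.0])
```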

Steering performance was evaluated based on the root-mean-squared deviation of the rotating line from the reference line (i.e., RMSerror). The mean RMSerror was significantly higher in the "hard" than in the "easy" condition for manipulations of the disturbance bandwidth (t(23) = −6.6, p < 0.001, d = −1.4) and control dynamics (t(23) = −2.2, p = 0.04, d = −0.4).

Perceived workload was based on the participants' responses in the NASA-TLX questionnaire. The resulting workload score is the weighted sum of six subscales that were perceived by the participants as contributing to the overall workload in the following proportions: Effort: 24.5%, Mental Demand: 23.1%, Temporal Demand: 17.7%, Performance: 14.3%, Physical Demand: 13.4%, and Frustration: 7.0%. The "hard" condition was rated as being significantly more demanding than the "easy" condition for both manipulations (disturbance bandwidth: t(23) = −3.4, p < 0.01, d = −0.7; control dynamics: t(23) = −3.6, p < 0.001, d = −0.7). **Figure 2** illustrates the distribution of the six subscales over the two manipulations and two levels of difficulty.

### ERP Results

This section is divided into three parts that describe the three analyzed aspects of the elicited ERP components. First, we present the comparison of the two distractor stimuli. Second, we present the results of the comparison between the viewing and steering trials. Third, we present the results of the comparison between the two applied manipulations of steering demands.

#### Comparison of the Two Distractor Stimuli

To begin, we separately identified ERP components that were elicited by the environmental and beep distractors. To this end, we used mass univariate analysis to identify the time-periods for which ERP amplitudes were significantly different from the pre-stimulus time interval. **Figure 3** illustrates the grand averaged waveforms and indicates significant ERP components with black bars. The environmental sounds elicited, in the steering and the viewing condition, an MMN, an early and late P3, a RON, a late positive potential (LPP) and a late negativity (LN). The beep distractors elicited an MMN, a P3a that was not further discriminable for early and late P3 sub-components, a RON, and (only in the steering condition) an LN.

Subsequently, we contrasted the ERPs that were elicited by the environmental and beep distractors. This was performed separately for the steering and the viewing trials with the use of mass univariate analyses. **Figure 3** highlights (in gray) the time-periods where the ERPs of the beep and environmental distractor differ significantly. This reveals that environmental distractors generate larger P3, RON, LPP and LN components than the beep distractors. The beep distractor generated an MMN that peaked earlier than that of the environmental distractor.

FIGURE 2 | Weighted sum of the six subscales of the NASA-TLX that were perceived by the participants as contributing to the overall workload. The error bars represent the 95% confidence interval.

FIGURE 3 | … pre-stimulus time-interval. The gray areas highlight the time-periods where the ERPs of the beep and environmental distractor differed significantly from each other.

#### General Demands of the Steering Task

Here, we determined the influence of steering demands on the elicited ERP components. In the grand averaged waveform (see **Figure 4**), the influence of the steering demands can be mainly observed in the ERPs that were elicited by the environmental distractor stimuli and to a lesser degree, in the beep distractors. As expected, for the ERPs that were elicited by the standard beeps the steering demands did not have a visible influence.

Using a mass univariate analysis, we determined the electrodes and time points for which ERPs were significantly decreased during the steering trials, relative to the viewing trials. This was performed separately for the ERPs that were elicited by the environmental distractors and those elicited by the beep distractors. The ERPs elicited by the beep distractors were not significantly influenced by steering demands for any electrode at any time point. In contrast, the ERPs elicited by the environmental distractors were selectively decreased by steering demands at specific time-points and electrodes. **Figure 5** provides a raster diagram to indicate the time-points and electrodes where ERPs of the environmental distractors were sensitive to steering demands. The scalp topographies for significant ERP components are provided together with the significant electrodes, indicated as white filled circles. Altogether, we find that steering demands diminish an early and late sub-component of the novelty-P3, and the RON. These ERP components have a frontocentral distribution.

Steering demands significantly decrease the early P3 generated by the environmental distractor in the time window between 280–330 ms in the frontocentral electrodes (AF3, AF4, F3, F4, FC5, FC1, FC2, C3, T7, Fz, Cz). The late P3 was significantly decreased between 330–430 ms in the central electrodes (FC1, FC2, FC6, C3, C4, CP1, CP2, P3, CP6, Cz, CPz, Pz). Interestingly, steering demands influence late P3 amplitudes at electrodes that do not correspond with the frontal electrodes, which exhibit the largest late P3 amplitudes. The RON was significantly decreased in the time window of 500–550 ms over the left electrodes (AF3, F3, FC1, FC2, FC5, FC6, C3, CP1, CP2, CP5, P3, PO3, Fz, Cz, CPz, Oz).

Following this, we employed permutation tests to analyze the influence of steering demands on the early P3, late P3, and RON of individual participants, when elicited by environmental distractors. Single trials of the two steering conditions ("easy" and "hard") were independently compared to the baseline viewing condition. For each participant, we submitted the recorded data from the electrodes and time points of the targeted ERP components to the permutation test. This was performed independently for the two different manipulations of steering difficulty, namely disturbance bandwidth and control dynamics complexity. **Figure 6** plots the number of participants that produced significantly larger ERP amplitudes in the viewing compared to the "easy" or "hard" steering trials for the targeted ERP components.

The single-subject analysis produced results that were consistent across both manipulations (i.e., disturbance bandwidth and control dynamics complexity) and all three analyzed components (early P3, late P3 and RON). More participants showed a significant reduction in the three targeted ERP components for the "hard" condition than the "easy" condition, relative to the "viewing" baseline. **Figure 6** also indicates differences across individuals, in terms of how they varied in response to the difficulty manipulations. White bars represent participants whose selected ERP components were diminished in both the "easy" and "hard" conditions. The dark gray bars represent participants whose ERP components were only diminished by the "hard" condition but not by the "easy" condition. The light gray bars represent participants whose ERP components were only diminished by the "easy" condition but not by the "hard" condition. Overall, the results are in line with our expectations. More participants whose ERPs were unaffected by the "easy" condition were, nonetheless, affected by the "hard" condition than vice versa.

Frontiers in Human Neuroscience | www.frontiersin.org March 2016 | Volume 10 | Article 73 |

… viewing (red) and steering (black) trials.

#### Influence of the Steering Manipulations

Permutation tests were conducted to identify the number of participants who reliably exhibited lower amplitudes for the targeted ERP components (i.e., early P3, late P3, and RON) in the "hard" trials relative to the "easy" trials. **Figure 7** represents these results as gray bars. The same analysis was performed based only on the peak-amplitude electrode and corresponding time-window (i.e., ±20 ms around the grand average peak). This is the approach that is employed by comparable research (cf., Miller et al., 2011; Dyke et al., 2015). **Figure 7** represents these results as black bars. A comparison shows that a mass univariate analysis approach identified ERP components that were more sensitive to the current steering manipulations. Finally, more participants responded in the expected direction for the targeted ERP components when the complexity of the control dynamics was manipulated for difficulty than when the bandwidth of disturbance was manipulated.
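The peak-based alternative mentioned above can be sketched as follows. This is a hedged illustration only: it assumes that single-trial data from the peak-amplitude electrode are available as a trials-by-samples array, that the latency of the grand average peak is already known, and that each trial is summarized by its mean amplitude within ±20 ms of that latency. The function name `window_mean_amplitude` and its parameters are hypothetical.

```python
import numpy as np


def window_mean_amplitude(epochs, times_ms, peak_time_ms, half_width_ms=20.0):
    """Mean amplitude per trial within a window around a given peak latency.

    epochs:       array of shape (n_trials, n_samples), one electrode
    times_ms:     array of shape (n_samples,), sample times in milliseconds
    peak_time_ms: latency of the grand average peak (assumed to be known)
    """
    epochs = np.asarray(epochs, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)

    # Select the samples that lie within the window around the peak latency.
    in_window = np.abs(times_ms - peak_time_ms) <= half_width_ms
    if not in_window.any():
        raise ValueError("No samples fall inside the requested window.")

    # One value per trial: the average over the selected samples.
    return epochs[:, in_window].mean(axis=1)


# Synthetic example: 4 trials sampled every 10 ms from 0 to 590 ms.
times = np.arange(0, 600, 10)
demo_rng = np.random.default_rng(0)
trials = demo_rng.normal(size=(4, times.size))
amplitudes = window_mean_amplitude(trials, times, peak_time_ms=300)
print(amplitudes.shape)
```

The resulting per-trial values from two conditions could then be compared with a permutation test, which restricts the comparison to one electrode and one short window instead of the full set of electrodes and time points of a component.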

# DISCUSSION

The current study was designed to investigate whether the demands of a steering task would attenuate the amplitudes of ERPs to task-irrelevant stimuli. It is in this regard that the current work sets itself apart from previous work that evaluated steering demands by measuring the ERPs to the task-relevant stimuli of a concurrent secondary task (e.g., Wickens et al., 1983, 1984; Sirevaag et al., 1989). The main finding of the current study is that steering demands can significantly reduce the amplitudes of three ERP components (i.e., early P3, late P3, and RON) elicited by task-irrelevant auditory probes. However, this requires the probes to be complex environmental sounds and not simple beep tones. Two aspects of the steering task (i.e., disturbance bandwidth and control dynamics complexity) were manipulated to vary steering demands, and for both manipulations the identified ERP components were significantly diminished in more participants during the difficult conditions than during the easy conditions. The current results agree with a three-stage distraction model, whereby the ERP probes can be regarded as distractor stimuli that consume mental resources involuntarily (Schröger and Wolff, 1998; Escera and Corral, 2007; Wetzel and Schröger, 2014). Therefore, we will discuss our results within this simple framework. The discussion is organized as follows. First, we shall discuss the differences between complex environmental sounds and simple beep tones in order to understand why the former elicit ERPs that are sensitive to steering demands while the latter do not. Second, we will discuss the implications of each ERP component that was found to respond to steering demands. Third, we will discuss the observed differences in the ERPs between manipulating either the disturbance bandwidth or the control dynamics complexity.

#### Comparison of Complex Environmental Sounds and Beep-Tone Distractor Stimuli

Both types of task-irrelevant distractor sounds elicited a characteristic waveform that contained ERP components, which were significantly different from the baseline (see **Figure 3**). In temporal order, they are the MMN, the novelty-P3, and the RON. Respectively, they are claimed to represent the three subsequent stages of how users respond to distraction (Schröger and Wolff, 1998; Escera and Corral, 2007; Wetzel and Schröger, 2014): (1) detection of the unexpected stimulus; (2) orientation towards the stimulus; and (3) disengagement from the distractor to re-orient back to the steering task. In other words, infrequently presented sounds are preferentially processed by the brain in spite of being task-irrelevant, whether they are complex environmental sounds or beep-tones. Two other ERP components (i.e., LPP and LN) were also elicited, but were not sensitive to steering demands.

Environmental sounds elicited ERPs that differed from the beep tones in two ways. First of all, they elicited larger ERPs. Second, their ERPs contained components that were sensitive to steering demands. These two aspects are related. To begin, it can be argued that the larger novelty-P3 and RON amplitudes (see gray areas in **Figure 3**) indicate that environmental sounds recruit more corresponding mental resources than the beep sounds (Kok, 1990, 1997). This difference is apparent in the baseline viewing condition during which the participants' mental resources were unoccupied and readily available. Involuntary resource recruitment is attenuated when participants are required to perform a steering task (i.e., in the steering trials), but only for the novelty-P3 and the RON of the environmental distractors (see **Figure 5**). This is because the steering task reduced the amount of available resources to a lower level than task-irrelevant environmental distractors would typically recruit. In view of this, we believe that our use of task-irrelevant environmental distractors is a more direct assessment of the resource demands of the steering task, when compared to dual-task paradigms that increase the resource demands of task-relevant stimuli that actively compete for resources with the steering task (Wickens et al., 1983, 1984; Sirevaag et al., 1989).

What are the properties of environmental sounds that allow them to recruit more mental resources and hence, generate larger ERPs even when they are task-irrelevant? Previous work suggests that distractor stimuli tend to recruit more resources if they are personally meaningful and/or exhibit high dissimilarity from their context. The personal meaning and dissimilarity from the context are respectively referred to as being stimuli specific and aspecific (Eimer et al., 1996; Hughes, 2014). Specific aspects are parameters that are inherent to the stimulus, which represent its meaning to the observer (Hughes, 2014). For example, one's personal ringtone is more distracting, as reflected by larger elicited ERPs, than another person's ringtone (Roye et al., 2007). In the current study, the environmental distractors represented familiar objects (e.g., dogs, cats, babies), which have more personal meaning than the beep-tone distractors. Thus, they can be expected to recruit more resources. Aspecific aspects of the eliciting stimulus recruit resources involuntarily due to its embedded presentation context. For example, a task-irrelevant female voice has been shown to be less distracting, as reflected by a decrease of performance in a visual recall task, when presented in a series of female voices than when presented in a series of male voices (Hughes et al., 2013). In the current experiment, we presented the environmental sounds as well as the beep sounds against a context of frequent beep tones. Arguably, environmental sounds that are a complex combination of multiple frequencies are more dissimilar to this context than their beep tone counterparts. This raised the likelihood that the environmental sounds would recruit more resources than their beep tone counterpart.

To sum up, task-irrelevant stimuli are more likely to be sensitive to task demands if they are personally meaningful and differ sufficiently from their embedded context. Some studies have successfully used task-irrelevant beep tones to evaluate task demands. However, these studies investigated complex tasks, namely first-person shooter (Allison and Polich, 2008) and racing games (Burns and Fairclough, 2015), that presumably induced higher task engagement and varied in their resource demands at levels that beep tones were sensitive to. We expect the ERPs of task-irrelevant environmental sounds to be even more sensitive than beep tones to the resource demands of such complex tasks.

#### Influence of Steering Demands on the Measured ERP Components

The current study is the first to employ task-irrelevant ERP probes in a task that allows for the systematic manipulation of different steering demands. Such task-irrelevant probes, in particular environmental sounds, continue to elicit ERPs with components that we have identified to be selectively diminished by steering demands: early P3, late P3 and RON (see **Figures 3**, **5**). As noted before, these components correspond to the mid and late stages of a three-stage distraction model (Schröger and Wolff, 1998; Escera and Corral, 2007; Wetzel and Schröger, 2014). From the perspective of this model, steering demands did not inhibit our participants' capacity for detecting unexpected occurrences. Instead, steering demands significantly diminished the extent to which available mental resources could be directed towards the processing of distractor stimuli. In turn, this hinders an efficient re-orientation away from the distractor stimuli. Altogether, these findings demonstrate that steering places demands on mental resources that would otherwise be directed towards an instinctive evaluation of unexpected events. These resources are based on attentional processes, but at a cognitive rather than a perceptual level. It is interesting to note that our participants were able to articulate this in that they rated the "hard" condition as being more demanding than the "easy" condition in terms of mental rather than physical effort (see **Figure 2**). This supports our research motivation in understanding the demands of a steering task beyond its perceptual and response requirements.

The ability to maintain an appropriate level of "distraction" is a fundamental capability of our attentional system and a critical aspect of effective vehicle handling. On the one hand, the capacity to be distracted by unexpected events is necessary when these events reflect potential dangers in the environment. For example, the phenomenon of "attentional tunneling" refers to scenarios when high-performance pilots miss unexpected hazards given their increased engagement with vehicle handling. Such undesirable instances have even been observed in novel cockpit environments that are designed to promote engagement with vehicle handling, for example when synthetic vision displays with intuitive flight guidance were employed for fixed-wing control (Wickens and Alexander, 2009). On the other hand, distraction presents a danger when it interrupts a safety-critical task and prevents one from carrying it out. In the United States, driver distraction raises the risk of a light-vehicle near-crash/crash to approximately three times the baseline level (Klauer et al., 2006; Regan et al., 2011). Task-irrelevant or task-relevant probes can be judiciously employed in steering environments depending on whether the goal is to investigate involuntary or voluntary distraction, respectively. A perspective that considers steering environments in terms of the driver's engagement with the steering task and potential distractions (both voluntary and involuntary) is more likely to yield practical insights and operational recommendations than one that simply evaluates driving workload.

In this study, we show that both early and late P3 components were influenced by steering demands. These components are discriminable from each other in terms of their spatial and temporal characteristics. Functionally, the early P3 reflects a sensitivity towards violations of one's model of the environment at a post-sensory stage (Ceponiene et al., 2004). The late P3 relates to attending to the unexpected event itself, presumably for the purpose of updating one's model of the environment when deemed necessary (Escera et al., 1998; Yago et al., 2003; SanMiguel et al., 2008). Earlier studies have provided mixed evidence on the relationship between workload and these components. Difficulty manipulations in a complex Tetris® gaming environment have been found to only diminish early P3 amplitude (Dyke et al., 2015), while other studies, in particular those that target memory load, identified the late P3 as the only P3 sub-component that is influenced by workload (Escera et al., 1998; SanMiguel et al., 2008). Until the subtle interactions between workload and these P3 sub-components are better understood, we recommend employing approaches such as mass univariate analyses to determine the role of either sub-component in new task paradigms (e.g., steering), so as to reduce the risk of false positives.

Characterizing the relevant sub-components in terms of their spatial and temporal distributions provides an additional benefit. It allowed us to discriminate between manipulations of steering demands that would not be noticeable by only analyzing the peak, given inter- and intra-individual differences (cf., Munka and Berti, 2006; Miller et al., 2011; Dyke et al., 2015). In the current work, we show that more participants discriminated between the "easy" and "hard" steering trials under the mass univariate analysis than when the analysis was based on the highest peak in the grand average (see **Figure 7**). Mass univariate analysis also offers a further benefit in that it more accurately defines the spatial location of the effect of interest. In the case of late P3, we find that the electrodes that are sensitive to steering demands have a more parietal distribution than the peak amplitude electrode. This agrees with the work of Yago et al. (2003) who also defined a discriminable parietal aspect of late P3 that is claimed to be involved with working memory updating and is believed to originate from the posterior and superior parietal lobes.

Besides early and late P3, we found that steering demands significantly decreased RON amplitude. RON is believed to reflect the re-orientation of attention from the distractor stimulus (Schröger and Wolff, 1998; Escera and Corral, 2007; Wetzel and Schröger, 2014). In this sense, it can be regarded as a disengagement of resources from processing distractor stimuli. Our results are comparable to those reported by Berti and Schröger (2003) who also found that increasing workload in the primary forced-choice task reduced RON amplitudes to a distracting task-irrelevant feature. In their experiment, participants were required to discriminate between sounds with "short" and "long" durations. Infrequent changes in the task-irrelevant pitch of the sounds produced RONs with an approximate latency of 500 ms. In their experiment, workload was manipulated either by allowing participants to respond immediately or by requiring them to respond upon the presentation of the next stimulus. The latter was considered to be more difficult as it involved a stimulus-response conflict. The amplitude of RON was found to be diminished in the difficult condition. Our current results indicate that a similar RON component can be diminished by increased task demands, even when the task is presented in a separate modality from the distractor. One reason for this could be that fewer resources were available to begin with, that could be effectively engaged by the distractor stimuli. Another reason could be that mental resources are more likely to be engaged with processing distractor stimuli for longer periods of time when sub-optimal levels of resources are allocated for their processing. In this case, the disengagement from the distractor stimuli could be expected to be less efficient. Whichever the reason, it is important to realize that RON reflects resource (re-)allocation processes at a post-sensory stage and that its amplitude does not simply decrease with increased workload.
In fact, RON amplitudes have been found to be larger for the 1-back working memory task than its 0-back counterpart (SanMiguel et al., 2008). In this example, the 1-back task required participants to reference information of the primary task from recent history and larger RONs could have reflected a disengagement of resources from the distractor stimulus in addition to the re-allocation of resources to task-relevant information. We believe that our manipulation of steering demands resulted in decreased RON amplitudes because it only reflected the disengagement of resources from task-irrelevant distractor stimuli. If this is true, a dual-task paradigm that entails resource competition between a steering task and a task-relevant probe should result in larger RON amplitudes when steering demands are increased.

#### The Steering Demands of Manipulating Disturbance and Control Dynamics

In the current study, we manipulated two aspects of steering that are known to influence steering demands—that is, disturbance bandwidth and control dynamics complexity. Both manipulations of steering difficulty had an influence on the identified ERP components in the expected direction (**Figures 6**, **7**). Comparatively, this influence was evident in more participants when the complexity of control dynamics was manipulated. This result is in agreement with previous work that has shown a greater sensitivity of secondary task ERPs to the manipulation of control dynamics in the primary task (Isreal et al., 1980; Wickens et al., 1983, 1984; Sirevaag et al., 1989).

While encouraging, these results should be treated with caution. Our analyses reveal that our manipulations for steering demands do not influence the identified ERP components in all of our participants. In fact, some participants responded to steering demands only in the "easy" but not the "hard" condition, albeit to a lesser extent than vice versa (**Figure 6**). We believe that this reflects two aspects of inter-participant variance that are difficult to control for with the use of task-irrelevant ERP probes. The first is the amount of resources that are involuntarily recruited for the processing of task-irrelevant probes. The second is steering competence and engagement with the steering task.

Participants can be expected to differ in terms of how meaningful they perceive different environmental sounds to be. Such differences could vary the extent to which these task-irrelevant distractors attract resources for their processing. If "insufficient" resources are recruited, changes in the level of available resources due to manipulations in steering demands can be expected to go undetected. To mitigate this, future studies could consider employing environmental distractors that are not as easily recognizable. It has been shown that larger frontal and parietal novelty-P3s are elicited by environmental sounds that are not as easily recognizable, compared to their more recognizable counterparts (Opitz et al., 1999). Moreover, it has been shown that the novelty-P3's amplitude decreases with the repetition of familiar sounds but not unfamiliar sounds, presumably because participants are more effective in ignoring them (Cycowicz and Friedman, 1998, 2007).

Participants can be expected to vary in terms of steering proficiency. Therefore, some participants may only start to exhibit reduced levels of available resources under highly demanding scenarios. In fact, this is reflected in our results (see **Figure 6**). The current experiment employed fixed levels of steering difficulty. Subsequent studies could calibrate levels of steering difficulty for individual participants so that their performance discriminates sufficiently between "easy" and "hard" conditions. This would be similar to the use of adaptive methods in psychophysics to calibrate stimuli settings to individual differences in perception (Kingdom and Prins, 2010).

In spite of these limitations, our current findings are consistent with previous findings. The ERP components, which we have identified as being sensitive to steering demands, are more likely to differentiate between "easy" and "hard" conditions when control dynamics complexity was manipulated than when disturbance bandwidth was manipulated (cf., Isreal et al., 1980). This difference between the two manipulations is more prominent for early P3 and RON than it is for late P3. This suggests that increasing the complexity of the control dynamics limits how resources are directed towards and away from distractor stimuli.

#### Conclusion and Outlook

To conclude, we have shown that the demands of a steering task influence how the brain responds to task-irrelevant stimuli. Specifically, steering demands diminish the amplitudes of the early P3, late P3, and RON that are elicited by task-irrelevant auditory distractors, which are personally meaningful and distinct from the background. A three-stage distraction model would suggest that steering demands decrease one's sensitivity and likelihood to attend to unexpected events (early/late P3), as well as one's capacity to re-orient back to the steering task at hand (RON). In particular, we found this to be true for steering manipulations that increased the complexity of control dynamics.

The three-stage model of distraction, and its associated ERP components, is a simplification. It assumes a serial chain of information processing of the distractor stimulus and is agnostic to how the stages could be selectively influenced by factors that do not pertain to the distractor stimulus itself. Thus, its explanatory power is limited. Our finding, that environmental sound distractors are more "distracting" than deviant beep tones (and result in larger MMN, P3a, and RON), is in line with the predictions of the three-stage distraction model. However, the three-stage distraction model does not explain why steering demands selectively influence P3a and RON amplitudes but not MMN. In fact, there is accumulating evidence to suggest that dissociations exist between the three stages of distraction. Factors such as the predictability of the distractor, which is not dependent on the distractor per se but on the homogeneity of the sequence of stimuli that precedes it, can influence MMN and P3a but not RON (Horváth et al., 2008). Converse dissociations have been reported whereby increasing the predictability of an auditory distractor with a visual cue can decrease P3a and RON amplitudes but leave MMN intact (e.g., Sussman et al., 2003). Hence, more complex accounts have since been proposed that not only consider how distractor stimuli are processed but also how their processing might interact with the perceived regularity of the auditory scene (for example, see Bendixen, 2014). For now, it is sufficient to note that the demands of a steering task are reflected in how it modulates the distractibility of task-irrelevant environmental sounds, as reflected in the early/late P3 and RON that they elicit. Besides electrophysiological responses, future experiments should be designed to investigate the behavioral consequences of distraction on steering performance (cf., Parmentier, 2014). This could elucidate differences between distractor stimuli that passively reflect steering engagement and those that pose an involuntary conflict with the cognitive processes that underlie steering itself.

Task-irrelevant stimuli can be expected to be more easily integrated into real-world operations than ERP probes that require an explicit response. In this regard, our current findings raise the opportunity of estimating steering demands across a wider range of scenarios than was previously considered to be practical. Furthermore, the use of task-irrelevant and task-relevant distractor stimuli can reveal complementary aspects of how mental resources are managed during steering. Together, they can be effectively employed to understand the demands of steering and users' level of engagement with the steering task and their environment.

# AUTHOR CONTRIBUTIONS

HHB and LLC: conception of the work; LLC and MS: design and interpretation of the data; LLC and MS: acquisition and analysis of the data; HHB, LLC and MS: drafting and revision of the work; HHB, LLC and MS: final approval of the version to be published and agreement to be accountable for all aspects of the work.

# FUNDING

This research was supported by the German Research Foundation (DFG) within project C03 of SFB/Transregio 161 as well as by the Max Planck Society.

# ACKNOWLEDGMENTS

Special thanks to Christiane Glatz, Nina Flad, Eva Symeonidou, Alessandro Nesti, Tonja Machulla and two anonymous reviewers for their insightful comments and suggestions.

# REFERENCES


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Scheer, Bülthoff and Chuang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Toward a Wireless Open Source Instrument: Functional Near-infrared Spectroscopy in Mobile Neuroergonomics and BCI Applications

Alexander von Lühmann<sup>1,2</sup>\*, Christian Herff<sup>3</sup>, Dominic Heger<sup>3</sup> and Tanja Schultz<sup>3</sup>

<sup>1</sup> Machine Learning Department, Computer Science, Technische Universität Berlin, Berlin, Germany, <sup>2</sup> Institute of Biomedical Engineering, Karlsruhe Institute of Technology, Karlsruhe, Germany, <sup>3</sup> Cognitive Systems Lab, Karlsruhe Institute of Technology, Karlsruhe, Germany

Brain-Computer Interfaces (BCIs) and neuroergonomics research have high requirements regarding robustness and mobility. Additionally, fast applicability and customization are desired. Functional Near-Infrared Spectroscopy (fNIRS) is an increasingly established technology with the potential to satisfy these conditions. EEG acquisition technology, currently one of the main modalities used for mobile brain activity assessment, is widespread and open for access and thus easily customizable. fNIRS technology, on the other hand, has to be either bought as a predefined commercial solution or developed from scratch using published literature. To help reduce the time and effort of future custom designs for research purposes, we present our approach toward an open source multichannel stand-alone fNIRS instrument for mobile NIRS-based neuroimaging, neuroergonomics and BCI/BMI applications. The instrument is low-cost, miniaturized, wireless and modular and openly documented on www.opennirs.org. It provides features such as scalable channel number, configurable regulated light intensities, programmable gain and lock-in amplification. In this paper, the system concept, hardware, software and mechanical implementation of the lightweight stand-alone instrument are presented and the evaluation and verification results of the instrument's hardware and physiological fNIRS functionality are described. Its capability to measure brain activity is demonstrated by qualitative signal assessments and a quantitative mental arithmetic based BCI study with 12 subjects.

Keywords: open source, functional near-infrared spectroscopy (fNIRS), brain computer interface (BCI), modularity, wearable devices, neuroergonomics

#### Edited by:

Srikantan S. Nagarajan, University of California, San Francisco, USA

#### Reviewed by:

Hasan Ayaz, Drexel University, USA; Barak A. Pearlmutter, National University of Ireland Maynooth, Ireland; Laurens Ruben Krol, Technische Universität Berlin, Germany

#### \*Correspondence:

Alexander von Lühmann, a.vonluehmann@campus.tu-berlin.de

Received: 26 May 2015; Accepted: 26 October 2015; Published: 12 November 2015

#### Citation:

von Lühmann A, Herff C, Heger D and Schultz T (2015) Toward a Wireless Open Source Instrument: Functional Near-infrared Spectroscopy in Mobile Neuroergonomics and BCI Applications. Front. Hum. Neurosci. 9:617. doi: 10.3389/fnhum.2015.00617

# 1. INTRODUCTION

Functional Near-Infrared Spectroscopy (fNIRS) is an increasingly established technology pioneered by Jöbsis (1977) that allows non-invasive, comparatively low-cost, compact and hazard-free continuous measurement of cerebral oxygenation levels using near-infrared light.

While first generation instruments were rather bulky and expensive, using Laser Diodes with Photo Multiplier Tubes (PMTs) (Cope and Delpy, 1988; Cope, 1991; Rolfe, 2000; Schmidt et al., 2000) and later Avalanche Photo Diodes (APDs) (Boas et al., 2001; Coyle et al., 2004, 2007), today's devices often take advantage of Light Emitting Diodes (LEDs) and Photo Diodes (PDs) (Vaithianathan et al., 2004; Bunce et al., 2006; Chenier and Sawan, 2007; Ayaz et al., 2013; Safaie et al., 2013; Piper et al., 2014) which allow safe, more compact and mobile applications. After the initial development of laboratory and bedside-monitoring devices for monitoring local oxygenation levels, e.g., in newborn infants (Cope and Delpy, 1988; Cope, 1991), in the 2000s many research groups focused on the design of imaging instruments for brain activity mapping from topographic information [functional Near-Infrared Imaging (fNIRI)] (Schmidt et al., 2000; Boas et al., 2001; Vaithianathan et al., 2004). Recently, fNIRS and fNIRI have entered neuroscience as a reliable and trustworthy research tool for research based on investigating groups of subjects (Scholkmann et al., 2014), offering potentially complementary information to fMRI, PET and EEG (e.g., oxygenation information or cytochrome oxidase as marker of metabolic demands; Strangman et al., 2002). But also in adjacent fields such as Brain Computer Interfaces (BCI) and neuroergonomics, defined as the study of the human brain in relation to performance at work (Parasuraman, 2003, 2011), fNIRS technology opens new possibilities (see e.g., Matthews et al., 2008, for an introduction to hemodynamics for Brain-Computer Interfaces). It is increasingly built for and used in single-trial fNIRS applications for BCIs for control (Naseer and Hong, 2013; Schudlo and Chau, 2015) and rehabilitation (Kanoh et al., 2009; Yanagisawa et al., 2010) and has successfully been used for cognitive workload assessment (Son and Yazici, 2006; Ayaz et al., 2012), brain dynamics monitoring during working memory training and expertise development (Ayaz et al., 2013), hybrid NIRS-EEG based signal processing tasks (Safaie et al., 2013; Putze et al., 2014) and recently also in combination with trans-cranial direct current stimulation (tDCS; McKendrick et al., 2015). Furthermore, fNIRS is found to be a promising multimodal expansion to EEG-based BCI (Pfurtscheller et al., 2010; Biessmann et al., 2011; Fazli et al., 2012). The major limitation of fNIRS is the relatively slow onset of hemodynamic processes. However, especially in the field of passive BCI (Zander and Kothe, 2011), reaction times do not necessarily have to be extremely fast.

An increasing number of approaches using fNIRS in the field of mobile brain imaging (e.g., Piper et al., 2014) and neuroergonomics (Fairclough, 2009) shows the demand for wireless, miniaturized and customizable fNIRS technology. So far, researchers either relied on costly and mainly static commercial devices, or designed their own fNIRS equipment from scratch. Trying to overcome the restrictions of commercial tabletop instruments, the latter was done by groups such as Ayaz et al. (2013), Safaie et al. (2013), Lareau et al. (2011) and Atsumori et al. (2007), using LEDs with Si PDs/APDs for new generation instruments that generally enable mobile use. To the best of our knowledge, however, only very few of even the newer generation devices (Safaie et al., 2013) are truly miniaturized, stand-alone, unobtrusive and mobile and can be carried on the body without a backpack while still enabling free movement and data transmission/processing at the same time, as external static instrumentation such as DAQ equipment, lock-in amplifiers and power sources is often required. Also, in many cases, signal extraction technologies like lock-in amplification seem to be sacrificed for the sake of miniaturization or reduced complexity. Probe and attachment designs proposed in recent years, such as the use of flexible PCBs (Vaithianathan et al., 2004; Bozkurt et al., 2005; Bunce et al., 2006; Son and Yazici, 2006; Rajkumar et al., 2012), EEG-cap-like optodes (Kiguchi et al., 2012; Piper et al., 2014) and mechanical mounting structures (Coyle et al., 2007), are usually limited to static applications and/or, in the case of flexible PCBs, to a fixation on the forehead due to obstruction by hair. An often reported issue in the field of mobile applications, which seems not to have been resolved satisfactorily so far, is the attachment of the optodes to the head in a way that provides stable optical contact, sufficient light levels and comfortable wearing.

While the recent trend of new system designs for portability and mobility can also increasingly be observed in commercial devices, researchers will always need custom solutions for innovative approaches. To help reduce the time and effort in these cases, we present the design and a first evaluation of a configurable, miniaturized, modular, fully mobile (wireless) multichannel fNIRS system that is provided open source on www.opennirs.org under a CC BY-NC 4.0 license with detailed documentation. It is a customizable low-cost research tool that enables both stand-alone use and the combination with custom or external DAQ equipment. Also, the device makes use of a new detailed spring-loaded optode fixation concept to tackle the above-mentioned optode attachment issue.

# 2. MATERIALS AND METHODS

#### 2.1. Instrument Requirements

We identified aspects that are crucial for an fNIRS device in the context of mobile BCI and neuroergonomics. Besides the criterion for the hardware to be comparatively low cost, these can be assigned to four groups:


The following subsections will provide detailed information on our approach to fulfill these requirements on a concept, hardware and software level.

#### 2.2. Instrumentation Design

#### 2.2.1. System Concept

The system concept of the modular open instrument is shown in **Figure 1**. It consists of one or more stand-alone 4-channel Continuous Wave NIRS modules and a mainboard. Each module is controlled by the mainboard via a simple parallel 4 Bit control interface. The mainboard provides the power supply rails, AD-conversion of the NIRS signals and a UART communication interface and can be replaced by any custom data acquisition (DAQ) equipment when the control interface and symmetric ±5 V power rail are supplied. This enables full customization of the instrument with respect to physical channel number, power consumption and conversion rate and depth, while spatially distributing the hardware components (and weight), and performing local hardware signal amplification and processing, thus minimizing noise and interferences.

The fNIRS modules were designed considering the current understanding of fNIRS instrumentation technology as reviewed by Scholkmann et al. (2014) and others (Obrig and Villringer, 2003; Son and Yazici, 2006) with special regard to hardware design and wavelength-selection for SNR maximization/crosstalk minimization and considering potential hazards as identified by Bozkurt and Onaral (2004).

Each module provides four dual wavelength fNIRS channels using 750 and 850 nm multi-wavelength Epitex L750/850-04A LEDs. While the LEDs have a broader emission spectrum (Δλ = 30/35 nm) than sharp peaked laser diodes (typically Δλ ≈ 1 nm), their incoherent and uncollimated light allows for a higher tissue interrogation intensity and direct contact with the scalp due to less heating and is safer for the human eye.

The LED current is regulated by adjustable current regulator circuits based on high precision amplifiers (Analog Devices AD824A) and field effect transistors (FMB2222A). Channel activation and current modulation for lock-in amplification is performed by analog switches (Analog Devices ADG711) that are accessed via an analog 1:8 demultiplexer (NXP HEF4051). After tissue interrogation, NIR light is detected by a central Si photo detector with integrated trans-impedance amplifier for output noise minimization (Texas Instruments OPT101, 1 MΩ feedback resistor, bandwidth 14 kHz) and is then amplified and lock-in demodulated (using Analog Devices AD630). An 8 Bit Atmel Corp. AtMega16A microcontroller's PWM module creates the 3.125 kHz square wave reference for lock-in (de-)modulation using an external 20 MHz crystal for jitter minimization. It also processes incoming control signals from the 4 Bit control interface and operates and configures the on-board hardware. For adjustment of the LED currents, an 8 Bit digital-to-analog converter (DAC; Maxim MAX5480) is implemented. It supplies the voltage level at the current regulator inputs that is the command variable for the current regulation level. A programmable gain amplifier (Texas Instruments PGA281) is implemented for pre lock-in amplification of the detected NIR signal with a variable gain from G = 0.688 to 88.

During lock-in demodulation, the signal is filtered by a 3rd-order Butterworth low-pass and is then again amplified (G = 5.1) and stabilized by a set of two high precision amplifiers (Texas Instruments LMC6062) before leaving the fNIRS module for external AD conversion.

The system is designed for Time-Division Multiplexing (TDM) of the fNIRS channels. This is a trade-off between minimizing inter-channel crosstalk, heating (Bozkurt and Onaral, 2004) and battery consumption on the one hand and sacrificing SNR, which is limited by the width of the applied time windows, on the other. For demultiplexing of the locked-in output branches, a variable (sample rate dependent) dwell time is inserted after each onset of a single channel activation before sampling the steady state photo detector signal on the mainboard or with custom DAQ equipment.

Configurable PGA gain (G = 0.6875–88) and LED intensity (256 DAC levels) in combination with a feedback "signal monitor" line allow the signal dependent adaptation for maximum amplification in the lock-in demodulation process without reaching the dynamic range limit of one of the components.

**Modularity:** The above described design of the fNIRS modules allows operation in many configurations—only requiring compatibility with the above mentioned interface consisting of 4 Bit control, power supply and analog output. For an extension of the total channel count, several modules can be used. Changes in set-up and module count only affect the control unit and its routines chosen by the user, which activate the time division multiplexed channels and convert the analog fNIRS signals from the modules:


#### 2.2.2. Selection of Hardware Design Aspects

**Emitter Branch:** For a high accuracy of the fNIRS instrument, a careful design of the NIR-light emitting circuit is crucial, as fluctuations in the radiation intensity cannot be discriminated from changes in absorption due to changes in chromophore concentrations in the tissue.

To keep the current through the LED semiconductor junctions constant and independent from variations in supply voltage and temperature, and at the same time allow intensity adjustment and current modulation for the lock-in amplification process, a customized current regulator circuit was designed (see **Figure 2**).

Similar to a solution proposed by Chenier and Sawan (2007), an analog switch is used in the OpAmp based regulation circuit for square-wave modulation of the current. However, instead of disrupting the regulation process at the transistor base, analog switches (ADG711) are used at the inputs of the regulator circuits to pull the regulator inputs low when deactivated. fNIRS channel activation and modulation is thus realized by simply feeding through the square wave reference to the corresponding current regulator switch selected by the multiplexer.

As the regulator is modulated in the kHz-range, over- and undershoots influence the ideally square-wave shape of the current. To optimize the shape, a passive negative RC feedback was added and evaluated for best performance.

**Receiver Branch:** The receiver branch was designed to maximize SNR by minimizing noise influences from shot, thermal and 1/f noise, dark currents and stray light from external light sources.

Shot noise is based on the quantum nature of the photons and therefore unavoidable and, for detectors without internal amplification, proportional to the square root of the average incident intensity (Scholkmann et al., 2014). To maximize SNR, the instrument is operated using the maximum NIR-light intensity level for the current regulators that is feasible in the experimental situation. Opaque cell rubber tubes are used to cover the sides of the NIR emitters and detector and the fNIRS module housing is covered with opaque paint to minimize shot noise influences from background radiation.

To reduce thermal noise influences, a Si photo diode with integrated trans-impedance amplifier circuitry (OPT101) was selected for detection. Lock-in extraction of the detected signal further reduces stray light, dark current and 1/f noise influences. Placing the PGA between the detection and lock-in extraction unit enables maximum pre-amplification of the signal while amplifier noise components added in the amplification process are reduced by the subsequent lock-in demodulation. Nonphysiological high frequency components of the signal are attenuated by the 3rd order low pass filter of the lock-in demodulation unit.

#### 2.2.3. Interfaces and Software Design

**Figure 3** shows the software concept. The fNIRS module software sets up hardware components (PGA, DAC, MUX,...) and is controlled by an interrupt-based architecture that receives its control signals from the 4 Bit parallel interface. Therefore, interface operation and analog signal conversion can be done by the mainboard or any custom or standard DAQ equipment with 4 Bit programmable digital outputs (e.g., NI USB600x series). Using the mainboard, a channel administration routine both supervises data acquisition and acts as interface between the fNIRS modules and the PC by processing received user commands (configuration, start, stop...), translating them into signals for the 4 Bit fNIRS module interface(s) and sending acquired data packages via the UART interface. On the PC's operating system side, the user can control the instrument and directly read out the data packages in ASCII CSV format via a simple serial port command console or access the serial port with any software such as LabView or Matlab. A LabView graphical user interface was developed for easy configuration and control as well as display and logging of raw and modified Beer-Lambert Law data.

FIGURE 3 | [...] operated via parallel control interface by the mainboard or any custom control and data acquisition device. Function of the 4 Bit interface (3:RST, 2:TRIG, 1:CH1, 0:CH0): Bits CH1:CH0 select one of the four physical NIRS channels. A rising edge on the TRIG line activates the selected channel, always beginning with wavelength 750 nm of the corresponding LED. Each subsequent rising edge toggles the activation between 750 and 850 nm. When the RST line is pulled up, all channels are turned off. The next rising edge on the TRIG line starts the process again, beginning with 750 nm.
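The behavior of the 4 Bit interface (3:RST, 2:TRIG, 1:CH1, 0:CH0) described above can be illustrated with a small software model. The following is a minimal sketch and not the module firmware: the class `FnirsModuleInterface` and the helper `scan_words` are hypothetical names, and the handling of a channel change without a preceding reset is an assumption, since the text does not specify it.

```python
class FnirsModuleInterface:
    """Toy model of the 4 Bit control interface (3:RST, 2:TRIG, 1:CH1, 0:CH0)."""

    WAVELENGTHS_NM = (750, 850)

    def __init__(self):
        self._prev_trig = 0
        self._next_wl_index = 0  # a reset always restarts at 750 nm
        self.active = None       # (channel, wavelength_nm), or None when all LEDs are off

    def write(self, word):
        """Apply one 4 bit word and return the currently active (channel, wavelength)."""
        rst = (word >> 3) & 1
        trig = (word >> 2) & 1
        channel = word & 0b11    # bits CH1:CH0 select one of four channels
        if rst:
            # RST pulled up: all channels off, wavelength sequence restarts
            self.active = None
            self._next_wl_index = 0
        elif trig and not self._prev_trig:
            # rising edge on TRIG: activate the channel, toggling 750/850 nm
            # (assumption: the toggle continues even if the channel bits change)
            self.active = (channel, self.WAVELENGTHS_NM[self._next_wl_index])
            self._next_wl_index ^= 1
        self._prev_trig = trig
        return self.active


def scan_words(n_channels=4):
    """Words for one time-division multiplexed scan over all channels."""
    words = []
    for ch in range(n_channels):
        # reset, select channel, TRIG up (750 nm), TRIG down, TRIG up (850 nm), TRIG down
        words += [0b1000, ch, 0b0100 | ch, ch, 0b0100 | ch, ch]
    return words


module = FnirsModuleInterface()
states = [module.write(w) for w in scan_words(2)]
```

In a real set-up, the controlling device would insert the dwell time after each rising edge and then sample the analog output before issuing the next word.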

#### 2.2.4. Mechanical and Probe Design

In the fNIRS instrument's mechanical design, the idea of modularity/scalability and robust fixation is continued by providing independent custom 3D printed solutions for the single fNIRS modules and the mainboard:

The Mainboard, Bluetooth module and batteries are worn on the upper arm of a subject in a chained multiple-unit housing (see also **Figure 5**, in the next section).

For the single fNIRS modules, a new mechanical spring-loaded design was approached to optimize signal quality, sensitivity and light penetration depth together with easy and robust, adaptive fixation of the optodes (see **Figure 4**). Based on a spherical approximation of the head with diameter D = 20 cm, the central NIR light detector and the four NIR LEDs are placed perpendicular to the scalp with a source-detector distance of d = 35 mm. To enable perpendicular fixation of the emitters/detector and at the same time allow alignment to the natural unevenness of the head and its deviations from the spherical approximation, the NIR light LEDs are not stiffly connected to the module body housing but integrated in movable spring-loaded LED holders. These holders are based on two nested tubes that are spring-loaded against each other (S1) and against the module housing (S2) and are able to rotate around an axis (R): Spring S1 presses the LED toward the surface of the head, thus enabling alignment and preventing the loss of contact during movements. Spring S2 and the rotary joint R keep the LED perpendicular to the surface while enabling small deviations for comfort and alignment.

To minimize stray light influences and for cushioning purposes, the detector and emitters are encased by opaque cell rubber tubing. To fixate a single module to the head, a flexible ribbon with hook-and-loop fastener that is sewn to the module housing can be used.

The mechanical concept was designed to allow the modules to be used on the forehead as well as over haired regions of the head: The single spring-loaded optodes are easily accessible due to their modular fixation without a cap or other concealing elements. This enables the user to manually brush aside obstructing hair from under the optodes for better optical contact. Even though we successfully conducted measurements over hairy regions of the head, it has to be pointed out that the usability of the modules on other regions than the forehead has not been proven under controlled conditions so far.

#### 2.3. System Evaluation

#### 2.3.1. Hardware Analysis

To enable a differentiated characterization of the instrument's hardware according to functional units, evaluation and analysis was split into emitter branch (current regulation and modulation), receiver branch (lock-in module), power supply stability and overall drift characteristics:

• Current regulator/modulator speed and current shape/oscillation characteristics: To evaluate and optimize the current regulator design characteristics for a stable and minimally oscillating but steep square wave shape of the regulated current signal, both LTSpice simulations and measurements were conducted and the regulator design parameters iteratively improved using two high-precision operational amplifiers (Analog Devices AD824A and Linear Technologies LMC6064). To minimize transient oscillation and settling times, a negative feedback decoupling capacitor C was introduced to the regulator design. For the determination of its optimal value, the shape of the regulated square wave current signal was investigated in a range from C = 0 pF to C = 330 pF at different current levels.

• Lock-in performance: The sum of propagation delays that result from each hardware component in the emitter-detector signal path leads to an overall phase shift between input and reference signal in the analog lock-in amplification process. Such a phase shift results in an attenuation of the signal during demodulation (Meade, 1982, 1983). To minimize this effect, all hardware elements in the signal path were selected with respect to high-speed/low delay times. The remaining overall phase shift ΔΦ = (Δt / T) · 2π between the reference signal (with period T) and the detected pre-amplified signal was measured before demodulation. Using the established straightforward mathematical model for square wave reference lock-in demodulation, as in Meade (1983), a phase shift dependent attenuation factor

$$A = \cos(\Delta \Phi) \tag{1}$$

was used to estimate the resulting attenuation.

For an estimation of the receiver sensitivity using the noise equivalent power (NEP), dark voltage noise levels (no incident light to the photo detector) were measured at the output of the lock-in-module.

• System drifts: The following possible sources of system drift were considered: Changes in the 1 Ω LED current regulation resistance due to temperature changes, changes in the total radiated power of the LEDs due to semiconductor junction temperature changes despite constant currents, and supply voltage variations. Changes in stray light, amplifier and thermal resistor noise are strongly suppressed by the lock-in amplification process. To minimize signal drifts resulting from changes in the 1 Ω current regulator resistance, Panasonic current sensing resistors with a low temperature coefficient of resistance (TCR = ±50 · 10<sup>−6</sup>/°C) were chosen.

The overall system drift of a single fNIRS module was specified with 20 min continuous acquisition windows of a single active channel at maximum intensity (100 mA) with the PGA set to G = 44 and the module being placed at a fixed position in an opaque closed box.

• Mainboard power supply stability: DC supply voltage drifts during 20 min signal acquisition periods and current modulation impacts on the supply voltage were evaluated. As the 100 mA (max.) square wave 3.125 kHz modulation can influence the power supply voltage stability and noise, it can degrade the performance of the signal detection and amplification elements. Their output signals during active modulation were acquired while zero optical input to the photo detector was ensured by encasing the active LED with an opaque metal box. For customization, the layout of the fNIRS module allows both separate and common supply of the LED currents and module hardware.

#### 2.3.2. Physiological Verification

Simple qualitative experiments were conducted using a channel at 10–20 point Fp1 to verify significant strength of physiological information in the raw signal and its power spectrum. Amongst others, visibility and strength of pulse artifacts are indicators for the signal quality and have been widely documented in fNIRS literature with the pulse artifact's amplitude being in the order of metabolic variations due to brain activity (Boas et al., 2004; Lareau et al., 2011; Scholkmann et al., 2014). Thus, with the fNIRS module pressed firmly against the head to reduce the sensitivity to scalp signals (decreased blood flow under the optodes), a clearly visible pulse artifact is a first indicator for sufficient signal quality to measure brain activation. The pulse rate was verified with conventional reference pulse measurements.

For verification and quantification of the device's capability to measure metabolic brain activity, a mental arithmetic BCI experiment was conducted with 12 subjects. In this experiment, it is shown that the measured hemodynamic responses can be classified on a single-trial basis, i.e., each trial can be classified as containing mental arithmetic or relaxation, instead of measuring only the difference in the average hemodynamic response.

Mental arithmetic tasks are known to elicit strong hemodynamic reactions in frontal brain areas and have been investigated in a variety of studies with fNIRS (Ang et al., 2010; Herff et al., 2013; Bauernfeind et al., 2014). Here, 30 trials of mental arithmetic data were recorded for each participant. During each 10 s trial, participants were asked to repeatedly subtract a number between 7 and 19 (excluding 10) from a number between 501 and 999. Both numbers were presented on a screen at a distance of roughly 50 cm. After each mental arithmetic trial, participants were asked to relax for 25–30 s. These pause intervals were indicated by a fixation cross on the screen. A longer resting period of variable length was included after 15 trials to allow participants to rest and drink. No data of these extended resting periods were used in our analysis.

The open fNIRS device was placed on the forehead and fixated around the head with the flexible ribbon with hook-and-loop fastener sewed to its housing. It was placed such that both active emitters were placed on the locations Fp1 and Fp2 of the international 10–20-system. The light detector was placed on AFz resulting in an emitter-detector distance of approximately 3.5 cm.

All subjects were informed prior to the experiment and gave written consent.

The signal processing of the recorded data was performed in a straightforward and simple manner, since we focus on the developed hardware in this paper. More advanced methods have been shown to improve accuracies for classification in neuroimaging (Calhoun et al., 2001; Blankertz et al., 2008; Lemm et al., 2011; Heger et al., 2014). The raw optical densities were converted to concentration changes of oxygenated and deoxygenated hemoglobin (HbO and HbR, respectively) using the modified Beer-Lambert Law (Sassaroli and Fantini, 2004). HbO and HbR values were then linearly detrended in windows of 300 s. Low frequency noise was attenuated by subtracting, from every sample, the moving average over the 30 s before and after that sample. Finally, the data were low-pass filtered using an elliptic IIR filter with filter order 6 and a cut-off frequency of 0.5 Hz to reduce high-frequency systemic noise like pulse artifacts.
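As an illustration of the low frequency noise attenuation step, the following minimal sketch subtracts a centered moving average spanning 30 s before and after each sample. It is not the original analysis code: the function name `subtract_centered_moving_average`, the sampling rate used in the example and the truncation of the window at the recording edges are assumptions. The detrending and elliptic filter steps are not reproduced here, since the filter's ripple and stop-band parameters are not stated.

```python
def subtract_centered_moving_average(samples, fs_hz, half_window_s=30.0):
    """Subtract from each sample the mean of the samples within +/- half_window_s.

    Assumption: near the start and end of the recording the window is
    truncated to the available samples.
    """
    half = int(round(half_window_s * fs_hz))
    n = len(samples)

    # prefix sums allow each window mean to be computed in constant time
    prefix = [0.0]
    for value in samples:
        prefix.append(prefix[-1] + value)

    corrected = []
    for i, value in enumerate(samples):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        window_mean = (prefix[hi] - prefix[lo]) / (hi - lo)
        corrected.append(value - window_mean)
    return corrected


# Example with a hypothetical sampling rate of 2 Hz: a slow linear drift is
# removed wherever the full +/- 30 s window is available.
drift = [0.001 * i for i in range(400)]
corrected = subtract_centered_moving_average(drift, fs_hz=2.0)
```

Because the window is symmetric, a purely linear trend cancels exactly in the interior of the recording, while faster components such as the task-related response pass through largely unchanged.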

After preprocessing, trials were extracted based on the experiment timings. For the pause blocks, we extracted the last 10 s of the 25–30 s pause intervals, to ensure that hemoglobin levels have returned to baseline. For each mental arithmetic trial, we extracted 10 s of data starting 5 s after stimulus presentation, to ensure that the hemodynamic response has already developed. Labels were assigned to the trials referring to either mental arithmetics or pause data. For each trial, we extracted the slope of a straight line fitted to the HbO and HbR data of each channel as a feature. The line was fitted using linear regression with a least-squares approach. Slope features have been shown to work well in previous studies (Herff et al., 2014).
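The trial extraction and slope feature described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the original analysis code: the function names, the dictionary layout of a trial and the sampling rate are hypothetical.

```python
def extract_window(signal, start_s, duration_s, fs_hz):
    """Return the samples between start_s and start_s + duration_s."""
    first = int(round(start_s * fs_hz))
    last = int(round((start_s + duration_s) * fs_hz))
    return signal[first:last]


def slope(values, fs_hz):
    """Least-squares slope (signal units per second) of a straight line fit."""
    n = len(values)
    times = [i / fs_hz for i in range(n)]
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    numerator = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    denominator = sum((t - t_mean) ** 2 for t in times)
    return numerator / denominator


def trial_features(trial, fs_hz):
    """One slope per signal; `trial` maps names such as 'HbO_ch1' to sample lists."""
    return [slope(trial[name], fs_hz) for name in sorted(trial)]


# Mental arithmetic trial: 10 s of data starting 5 s after stimulus onset.
fs = 2.0                                          # hypothetical sampling rate in Hz
hbo = [0.01 * i for i in range(200)]              # synthetic, steadily rising HbO trace
onset_s = 20.0                                    # hypothetical stimulus onset
task_window = extract_window(hbo, onset_s + 5.0, 10.0, fs)
task_slope = slope(task_window, fs)
```

For a pause trial, the same helper would be called with the start set to 10 s before the end of the pause interval. The resulting feature vectors are what a classifier such as Linear Discriminant Analysis would then receive.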

Evaluation was performed using a 10-fold cross-validation and classification by Linear Discriminant Analysis. In addition to the single trial analysis, the average hemodynamic response is calculated by averaging over all mental arithmetics or all pause trials.

# 3. RESULTS

The developed open modular multichannel fNIRS system (see **Figure 5**) proved functionality, fast set up and easy application in all testing conditions. Wearing the mainboard module on the upper arm and the fNIRS module on the head using flexible ribbons and hook-and-loop fastener, the user can move freely and is bothered minimally by the instrument while signal quality and robustness to movement showed promising results. It should be noted however, that the physiological signals used for evaluation results in this paper were acquired from sitting subjects to reduce possible error sources and thus allow a more explicit first performance assessment of the new open source hardware.

The final instrument is characterized by:


Low cost components were used for the design. The total cost of the instrument's hardware for one 4 channel fNIRS module and one mainboard mainly depends on PCB fabrication costs and is approximately 200 EUR/250 USD.

A full documentation including detailed descriptions, schematics, and evaluation can be found in the supplementary materials for this article/on the web: www.opennirs.org. In the following, we present the main evaluation results of the steps described in section 2.

#### 3.1. Current Regulation/modulation Circuit

Mostly due to its higher slew rate, the AD824A showed a much faster current regulation and lower transient oscillations than the LMC6064. The experimental results for the minimization of oscillation and settling times with different decoupling capacitor values C (see **Figure 6**) showed higher settling oscillations for low C at low current levels, higher transient oscillation for high C at higher current levels and allowed the identification of the optimal C: C = 100 pF showed the best tradeoff between minimal oscillations and maximal edge steepness for all current levels.

#### 3.2. System Drifts

The signal drift of a continuously active channel in volts per second was calculated using linear least squares regression on the acquired 20 min raw signals, yielding a negative drift coefficient of C<sub>D</sub> = −1 · 10<sup>−6</sup> V/s (the measured fNIRS signals typically being, depending on the device configuration, in the order of several hundred mV) and a respective long term stability coefficient of < −0.42% for both wavelengths. It was observed that, independent of the fNIRS module, power supply heating on the peripheral hardware can add additional drifts of up to one order of magnitude by affecting the analog-to-digital converter. This points out the importance of careful selection/design of peripheral hardware for the acquisition of the fNIRS module's analog signal.

#### 3.3. Lock-in Amplification, SNR and Dynamic Range

For an approximation of the total effective phase shift between reference and demodulator input signal in the lock-in unit, the delays between both signals were measured at the 50% levels of both respective rising (t<sub>dr</sub>) and falling edges (t<sub>df</sub>). It is the sum of times where the logical levels of both signals do not match and was measured as Δt = t<sub>dr</sub> + t<sub>df</sub> = 18.5 + 7.2 µs. To estimate the attenuation A caused by non-phase-synchronous demodulation of the signal, Equation (1) is used with the measured Δt and reference signal cycle duration T = 320 µs, and yields A ≈ 0.875, which does not affect the overall accuracy significantly. Evaluation of the single component phase delays in the signal path revealed that further minimization approaches should first target the PGA (Δt<sub>PGA</sub> = 7.0 + 4.5 µs).
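The attenuation estimate can be retraced numerically from the measured delays, the reference period and Equation (1). The sketch below only restates that arithmetic; the function name `lockin_attenuation` is hypothetical.

```python
import math


def lockin_attenuation(delay_s, period_s):
    """Attenuation factor A = cos(phase shift), with phase shift = delay / period * 2 * pi."""
    phase_shift = delay_s / period_s * 2.0 * math.pi
    return math.cos(phase_shift)


period_s = 1.0 / 3125.0            # 3.125 kHz reference, i.e., T = 320 microseconds
total_delay_s = 18.5e-6 + 7.2e-6   # measured rising plus falling edge delay
pga_delay_s = 7.0e-6 + 4.5e-6      # share of the delay attributed to the PGA

a_total = lockin_attenuation(total_delay_s, period_s)
a_pga_only = lockin_attenuation(pga_delay_s, period_s)

print(f"A (full signal path) = {a_total:.3f}")   # 0.875
print(f"A (PGA delay only)   = {a_pga_only:.3f}")
```

The second value indicates how much attenuation would remain if the PGA were the only source of delay in the signal path.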

FIGURE 7 | Estimation of signal and noise in the instrument: With the NEPs identified to be 2.27/2.21 nWpp, the distance between these optical powers equivalent to the noise floor of the detection circuit and the measured powers incident to the tissue (5.70 mW/5.38 mW) at medium LED illumination can be determined to be approximately 128 dB/127.7 dB, respectively. The degree of optical loss of the incident light in the tissue is subject dependent. Here, we estimate it to be in the order of at least 40–60 dB. With the actual metabolic fNIRS signal being in the order of 1% of the measured optical signal, the distance of the fNIRS signals to the noise floor is approximately 28 dB.

For the evaluation of the detector's sensitivity and dynamic range, the mean dark voltage signal µ<sub>d</sub> (no incident light on the photodetector) at the output of the lock-in amplifier and post amplification branch was measured to be µ<sub>d</sub> = 0.101 Vrms with a standard deviation of σ<sub>d</sub> = 3.99 mVrms at a typical PGA gain of G = 44 and fixed lock-in filter gain of G = 5.1. Using the mean dark voltage plus standard deviation and the responsivities R<sub>λ</sub> of the OPT101 photodiode (R<sub>750</sub> = 0.55 V/µW and R<sub>850</sub> = 0.60 V/µW), the Noise Equivalent Powers of the whole detector branch for a SNR of one

$$NEP_{\lambda} = \frac{\mu_d + \sigma_d}{R_{\lambda} \cdot G_{total}} \tag{2}$$

were estimated to be NEP<sub>750</sub> = 2.27 nWpp = 0.80 nWrms and NEP<sub>850</sub> = 2.21 nWpp = 0.78 nWrms.

The optical powers radiated by the LED at medium intensity (I<sub>F</sub> = 50 mA) were measured to be 5.70 mW for 750 nm and 5.38 mW for 850 nm. Using these incident powers and the NEPs allows an estimation of the signal to noise distances (for an overview see **Figure 7**): The wavelength dependent ratio of incident light to light that is no longer detectable because its signal drowns in noise yields signal to noise distances of 128 dB (750 nm) and 127.7 dB (850 nm). These distances are largely decreased by the optical loss due to tissue scattering and absorption, which is subject dependent and assumed to be in the order of > 60 dB. With the physiological fNIRS signal usually being around 1% of the measured optical signal, the distance between the fNIRS signal components and the noise floor of the detection circuit is further decreased by ≈40 dB and estimated to be in the order of 28 dB.
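The signal to noise distances can be retraced from the reported powers. The sketch below is a worked example under stated assumptions: the decibel values are computed as 20 · log10 of the power ratio, a convention inferred here because it reproduces the reported 128 dB and 127.7 dB, and the tissue loss is fixed at the 60 dB assumed in the text. All variable names are hypothetical.

```python
import math


def to_db(ratio):
    """Decibel value using the 20 * log10 convention (inferred from the reported numbers)."""
    return 20.0 * math.log10(ratio)


incident_power_w = {750: 5.70e-3, 850: 5.38e-3}   # LED output at 50 mA
nep_w = {750: 2.27e-9, 850: 2.21e-9}              # noise equivalent powers (peak-to-peak)
tissue_loss_db = 60.0                             # assumed, subject dependent
physiological_fraction = 0.01                     # fNIRS signal is about 1% of the optical signal

distance_to_noise_db = {}
fnirs_margin_db = {}
for wavelength_nm in (750, 850):
    distance = to_db(incident_power_w[wavelength_nm] / nep_w[wavelength_nm])
    distance_to_noise_db[wavelength_nm] = distance
    # subtract the tissue loss and the 40 dB corresponding to the 1% signal share
    fnirs_margin_db[wavelength_nm] = distance - tissue_loss_db + to_db(physiological_fraction)

for wavelength_nm in (750, 850):
    print(wavelength_nm,
          round(distance_to_noise_db[wavelength_nm], 1),
          round(fnirs_margin_db[wavelength_nm], 1))
```

With a lower tissue loss, for example the 40 dB lower bound mentioned in the caption of Figure 7, the remaining margin would be correspondingly larger.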

Saturation of the detection branch occurs when the upper input voltage limit of the ADC, here 2.5 Vpp, is reached for the lowest PGA gain setting of G = 0.6875, which is the case at 1.296/1.188 µWpp incident light (750 nm/850 nm). Using these results, the minimum system dynamic range, expressed as the ratio of signal saturation to the NEPs, is estimated to be in the order of 55.13/54.6 dB. It should be stated that the configuration of the LED intensities (25–100 mA) on the emitter side can further increase the dynamic range of the instrument. **Table 1** summarizes the performance characteristics.

#### TABLE 1 | Performance characteristics of the fNIRS Module.

#### 3.4. Mainboard Power Supply

The ±5 V DC supply voltage drift measurements showed a stable supply voltage of +4.959 V and −4.960 V with less than 500 µV total drift in 20-min measurement periods. Evaluation of the maximum impact of the current modulation on the detecting components via the power supply revealed that current modulation edges can create a ±2 mV high-frequency (kHz) noise around the photo detector output baseline signal that is further amplified by the PGA to strong ±100 mV peaks (at G = 44) when supplying the LED current either directly from the battery or from the regulated +5 V rail for the other fNIRS module hardware. However, as the supply variations are synchronous with the signals and as high-frequency noise is effectively suppressed by the 3rd-order lock-in low-pass of the fNIRS module, influences on the baseline of the lock-in demodulated signal were not observed.

#### 3.5. Physiological Measurements

Qualitative physiological experiments showed very clear signals and proved the basic functionality of the instrument. **Figure 8** shows a representative raw signal (750 nm) during three mental arithmetics trials performed by a subject (a) and the power spectrum computed over the whole session for the same subject (b). The latter shows the typical power law appearance and peaks by systemic artifacts that have widely been reported for fNIRS signals in the literature (e.g., Fekete et al., 2011).

The average hemodynamic response (see **Figure 9A**) over all subjects of the mental arithmetics experiment shows the expected behavior, i.e., an increase in HbO peaking after approximately 10 s during mental arithmetics. During the average pause interval, HbO levels still slowly return to baseline after the preceding activation.

Discrimination between pause and mental arithmetics yielded an average of 65.14% accuracy. Of the 12 recorded participants, 9 yielded accuracies significantly higher than chance level (one-sided t-test, p < 0.05). Classification results for all participants can be seen in **Figure 9B**. In a similar study by Herff et al. (2013), mental arithmetics could be discriminated from pause with 71.17% accuracy using 8 channels and 67.26% when using only two channels at similar positions as in this study.

# 4. DISCUSSION AND CONCLUSION

#### 4.1. Key Findings

At the beginning of this paper, we identified system requirements for mobile fNIRS based neuroergonomics/BCI applications. The results indicate that the presented open source device satisfies the requirements.

FIGURE 8 | (A) Excerpt of typical raw signal (blue) during mental arithmetics. Green line: binary label (high states: mental arithmetics, low states: relax), magenta line: median filtered signal. (B) Typical power spectrum of raw signal of the same subject and complete session, showing expected shape (power spectrum follows power law) and deviations caused by systemic artifacts.

FIGURE 9 | (A) Average hemodynamic response over all subjects during mental arithmetics and pause. (B) Classification results for single-trial discrimination between pause and mental arithmetics. Whiskers indicate standard errors. Solid line shows chance level.

In the course of the experiments, both experimenters and subjects rated the usability of the device as high. Miniaturization of the modules and mobility through Bluetooth based wireless transmission allowed free movement. The use of commercial reference systems, however, usually required longer preparation times for optode fixation and was often uncomfortable and static because of the weight of the optical fiber guides and the lack of cushioning of the optodes. In contrast, the new wearable system could be applied within several seconds and was generally perceived as less cumbersome during the experiments.

The hardware evaluation results and physiological verification of the designed miniaturized fNIRS instrument indicated a sufficient signal quality and system performance for brain activity measurements with an approximated signal to noise distance of 28 dB. The lock-in amplifier, detector sensitivity, current modulation precision and drift evaluation of the device showed satisfying results comparable to other documented fNIRS devices. The physiological measurements showed the expected hemodynamic responses, classification accuracies in single-trial analysis exceeded chance level for 9 out of 12 participants and yielded results comparable to those measured with a commercial device in a similar study (Herff et al., 2013) using 2 of 8 channels at similar positions (65.14 vs. 67.26%). The open fNIRS device can thus be used for mobile fNIRS-based BCI and neuroergonomics applications.

Battery supply and wireless communication, low heating due to time multiplexing of the channels and the use of LEDs as light sources assured a safe usage of the device.

The scalable modular concept, configurable light intensities and detector amplification gains and the flexible parallel interface of the fNIRS modules allow easy customization and configuration of the hardware.

However, there are still several elements in the design that can be optimized to further improve instrument performance in the future.

# 4.2. Limitations and Next Steps

#### 4.2.1. Mainboard/Data Acquisition and Control

An obvious but crucial component for the use of the fNIRS module is the data acquisition unit. When using custom hardware for data acquisition, the design and selection of the analog-to-digital converter (ADC) determine not only the quantization depth but also the frequency resolution of the time-division-multiplexed fNIRS channels, as the ADC sampling rate has to be shared by the up to 4 active channels of one module. The ADC (LTC2486) first used on the mainboard offered 16-bit conversion depth and exceptional DC accuracy but significantly limited time resolution due to a typical conversion time of 80.3 ms. Additional experiments indicated that, when using ADCs with a significantly higher sample rate but lower resolution, a quantization depth as low as 10 bit can suffice for reliable brain activation measurements. Future designs of the mainboard/DAQ hardware should therefore aim to use a better suited (faster) ADC to prevent the sampling frequency bottleneck. Here, the modular concept is advantageous, as the DAQ unit can be customized and optimized independently of the hardware of the fNIRS modules.
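To make the bottleneck concrete, the following minimal sketch estimates the effective sample rate per channel when one ADC is shared by time multiplexing. It assumes that the 80.3 ms conversion time is the only delay (no multiplexer settling or communication overhead) and that each active channel needs exactly one conversion per cycle; the function name is hypothetical and not part of the published firmware.

```python
def per_channel_rate_hz(conversion_time_s: float, active_channels: int) -> float:
    """Effective sample rate of each channel sharing one ADC by time multiplexing.

    Assumption: one conversion per channel per cycle, no switching overhead.
    """
    if conversion_time_s <= 0 or active_channels < 1:
        raise ValueError("conversion time must be positive and channels >= 1")
    adc_rate_hz = 1.0 / conversion_time_s  # conversions per second of the ADC
    return adc_rate_hz / active_channels   # shared equally among the channels


if __name__ == "__main__":
    rate = per_channel_rate_hz(80.3e-3, 4)
    print(f"{rate:.2f} Hz per channel, Nyquist limit {rate / 2:.2f} Hz")
```

Under these assumptions each of the four channels is sampled at roughly 3.1 Hz, which illustrates why a faster converter is suggested even at the cost of quantization depth.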

The evaluation of the power supply and current modulation impact showed that, even though the implemented linear-voltage-regulator-based symmetric supply appeared to be sufficient, several improvements can be suggested for use with the fNIRS module:

To minimize crosstalk between the modulated NIR-LED current and the regulated ±5 V supply voltage rail for the detection hardware, supplying the LEDs with a separate additional voltage regulator circuit is preferable to the use of a common regulator or a direct battery connection in the design. Implementation of additional high-frequency filters and enhanced stabilization is also recommended in future approaches to reduce noise pickup from external sources and further minimize the influence of LED current modulation on the rest of the system. The use of voltage regulators with higher efficiency can further enhance battery life and decrease heating effects, which can also influence system drifts, depending on the design and layout of the supply and acquisition hardware.

#### 4.2.2. fNIRS Module

The phase-delay-dependent attenuation of approximately 0.875 by the lock-in detector is acceptable as it does not significantly decrease overall system accuracy. However, it can be further minimized: to improve the lock-in performance, an analog adjustment of the PWM reference phase could be implemented for overall phase shift compensation. Alternatively, a potentially superior approach for a next-generation design would be digital lock-in demodulation based on a microcontroller/DSP. This has several advantages: reduced cost of hardware components, reduced power consumption, and an adjustable phase shift correction and thus higher precision.
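As an illustration of why digital demodulation removes the phase problem, the sketch below implements a generic two-phase (in-phase/quadrature) lock-in in plain Python. It is not the authors' design: it assumes a sinusoidal modulation, a cosine dependence of the single-phase output on the phase delay, and hypothetical reference and sampling frequencies (1 kHz and 20 kHz). The analog detector in the device uses a PWM reference, so its exact phase dependence may differ.

```python
import math


def lock_in(samples, f_ref_hz, fs_hz):
    """Two-phase digital lock-in: returns (in_phase, quadrature, magnitude)."""
    n = len(samples)
    i_sum = 0.0
    q_sum = 0.0
    for k, x in enumerate(samples):
        angle = 2.0 * math.pi * f_ref_hz * k / fs_hz
        i_sum += x * math.cos(angle)  # mix with the in-phase reference
        q_sum += x * math.sin(angle)  # mix with the 90-degree shifted reference
    # Averaging acts as the low-pass filter; the factor 2 restores the amplitude.
    in_phase = 2.0 * i_sum / n
    quadrature = 2.0 * q_sum / n
    return in_phase, quadrature, math.hypot(in_phase, quadrature)


def simulate(amplitude, phase_rad, f_ref_hz=1000.0, fs_hz=20000.0, periods=50):
    """Noise-free detector signal delayed by phase_rad (hypothetical values)."""
    n = int(periods * fs_hz / f_ref_hz)  # whole number of reference periods
    return [
        amplitude * math.cos(2.0 * math.pi * f_ref_hz * k / fs_hz - phase_rad)
        for k in range(n)
    ]


if __name__ == "__main__":
    phase = math.acos(0.875)  # phase delay that attenuates a single-phase output to 0.875
    i, q, mag = lock_in(simulate(1.0, phase), 1000.0, 20000.0)
    print(f"single-phase output: {i:.3f}, phase-corrected magnitude: {mag:.3f}")
```

The single-phase output reproduces the attenuation caused by the phase delay, whereas the magnitude computed from both components is independent of it, which is the adjustable phase shift correction referred to above.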

The four-channel setup per module using four LEDs and one photodiode was necessary for this first approach, which used a single-channel analog lock-in receiver branch to provide a simple interface in favor of modularity. However, to further reduce energy consumption and increase channel density, future approaches should utilize configurations with more PDs measuring simultaneously. Additionally, although the fNIRS module is already compact and provides stand-alone functionality, further miniaturization is possible. A next step will be the development of entirely stand-alone modules that make peripheral hardware such as the mainboard redundant. Integrating the above-mentioned insights together with data acquisition, digital lock-in, power management and wireless transmission components onto a further miniaturized multichannel fNIRS module could enable even more applications in and out of the lab.

The instrument can be improved and evaluated in several more ways. However, by providing this fNIRS device as open source, we hope that aspects of this work will help to simplify future custom fNIRS-based mobile BCI and neuroergonomics approaches and reduce the time and effort they require.

#### ACKNOWLEDGMENTS

We acknowledge support by Deutsche Forschungsgemeinschaft and Open Access Publishing Fund of Karlsruhe Institute of Technology.



**Conflict of Interest Statement:** The Review Editor Laurens Ruben Krol declares that, despite being affiliated with the same institution as the Author Alexander Von Lühmann, the review process was handled objectively. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 von Lühmann, Herff, Heger and Schultz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Why a Comprehensive Understanding of Mental Workload through the Measurement of Neurovascular Coupling Is a Key Issue for Neuroergonomics?

Kevin Mandrick <sup>1</sup> , Zarrin Chua<sup>1</sup> , Mickaël Causse<sup>1</sup> , Stéphane Perrey <sup>2</sup> and Frédéric Dehais <sup>1</sup> \*

<sup>1</sup> Département Conception et conduite des véhicules Aéronautiques et Spatiaux, Institut Supérieur de l'Aéronautique et de l'Espace, Toulouse, France, <sup>2</sup> EuroMov, University of Montpellier, Montpellier, France

Keywords: mental workload, mental resources, neurovascular coupling, neuroergonomics, electroencephalography, near-infrared spectroscopy

Raja Parasuraman, the father of Neuroergonomics (the crossroads of Ergonomics and Neuroscience, **Figure 1**), opened the doors to new discoveries and techniques for advancing the understanding of human behavior together with its underlying brain mechanisms (Parasuraman, 1998). As of his death in 2015, a precise and objective definition of the concept of mental workload (MWL) had not yet been formulated. In this opinion piece, we posit that MWL can be characterized through the measurement of neurovascular coupling (NVC) and that innovative neuroimaging methods are now capable of measuring this phenomenon, while highlighting Parasuraman's many contributions to this field.

#### Edited by:

Stephen Fairclough, Liverpool John Moores University, UK

#### Reviewed by:

Ranjana K. Mehta, Texas A&M Health Science Center, USA

> \*Correspondence: Frédéric Dehais frederic.dehais@isae.fr

Received: 14 December 2015 Accepted: 13 May 2016 Published: 31 May 2016

#### Citation:

Mandrick K, Chua Z, Causse M, Perrey S and Dehais F (2016) Why a Comprehensive Understanding of Mental Workload through the Measurement of Neurovascular Coupling Is a Key Issue for Neuroergonomics? Front. Hum. Neurosci. 10:250. doi: 10.3389/fnhum.2016.00250

# BEYOND THE CONCEPT OF MENTAL WORKLOAD AND TOWARD MENTAL RESOURCES IN NEUROERGONOMICS

MWL measurement is an important issue in the Human Factors field, as seen through its ubiquitous presence in the literature. It is well acknowledged that an accurate assessment of MWL could help to reduce human error while improving human performance. The recently founded field of Neuroergonomics may help to reduce the ambiguity surrounding the MWL concept by providing data on its underlying neural processes. Neuroergonomics allows for the study of the human brain structure and function with respect to behavior during physical or cognitive performances in the workplace (Mehta and Parasuraman, 2013). The main goal of this interdisciplinary field is to integrate our understanding of the neural basis of cognition in relation to technologies and settings in complex daily life tasks.

However, Neuroergonomics does not yet provide a consensual and comprehensive explanation of MWL. Although the concept remains roughly defined, there have been some formal attempts to define it. Generally, MWL reflects how hard one's mind is working (under-loaded, over-loaded, or occupied) at any given moment, or how much mental effort it will cost the brain to meet given task demands (Parasuraman, 2003). Furthermore, Parasuraman and Caggiano (2002) and Kramer and Parasuraman (2007) defined MWL as a set of mental and composite brain states that modulate human performance in different perceptual, cognitive, and/or sensorimotor skills. It is also considered a construct that reflects the relation between the demands of the environment (input load), the human characteristics (capacities), and the task performance (output performance). However, the notion of MWL is dissociated from performance, as suggested by Ayaz et al. (2012). MWL presupposes that the consumption of true brain resources supports brain activity during work, suggesting a possible link between MWL and the key concept of mental resources. These two concepts can be treated by the intensity of the mental costs and be measured by the mental effort of performing tasks to predict operator performance. As stated by Cain (2007) "As such, [MWL] is an interim measure and one that should provide insight into where increased task demands." Therefore, it is not possible to define MWL without also clearly characterizing mental resources.

Though it is generally accepted that mental resources are appreciable, multiple, independent, and limited (Wickens, 2008), most studies remain vague on their exact nature. One perspective is to think of mental resources as neural pathways. However, this oversimplification ignores the fact that mental resources exist in other forms. As a metaphor, an army may have efficient firepower, but without ammunition, a supply corps, and roads, it is useless. Similarly, the army of the brain has mental resources composed of neural pathways, energy supply, and irrigation (communication channel) to fuel mental effort, implemented by the mobilization of neurophysiological cellular processes in the operator's brain.

# ENERGY MOBILIZATION OF NEUROVASCULAR COUPLING FOR THE OPERATOR'S BRAIN MACHINERY

The absence of consideration of the neurophysiological mechanisms in Neuroergonomics is certainly due to the difficulty in investigating them. Yet, there are real energy mobilizations that occur within the operator's brain machinery across several cellular levels to meet task demand. Often compared to a supercomputer, the brain machinery supports mental processors that have substantial and constant energy requirements. However, the human brain has almost no intracellular capacity to store energy in the form of oxygen, lactate, or glucose (although small glycogen stores exist). Fortunately, the high metabolic energy demand of the brain tissue is mainly regulated by complex but adequate delivery of energetic substrates via a dense and redundant network of microvessels. Hence, metabolic demands are orchestrated by the hemodynamic response of the blood supply.

Since the first discoveries by Roy and Sherrington (1890), it has been possible to better understand the close spatiotemporal dynamics between the electrical activity of neuronal cells and the hemodynamic phenomena that boost blood circulation in localized arterioles and capillaries. The intimate neurofunctional relationship that links metabolically active neurons with the increasing oxygenation of the blood flow near these cells reflects functional hyperemia and is more widely known as neurovascular coupling (NVC). Simply put, NVC is a tight temporal association of neuronal activity with regional cerebral blood flow delivery. Understanding the fundamental cellular mechanisms underlying NVC is necessary to measure a dimension of the local brain machinery expenditure at work. The appraisal of the energetic costs required by NVC implies the assessment of mental resources. For instance, when an operator is engaged in a task, the mobilization of the neural pathways needs the synergistic support of a large number of astrocytes (glial cells) to fuel neurons and interneurons with oxygen and nutrients supplied by nearby capillaries.

NVC is observable due to changes in neuronal-astroglial and microvasculature activities, which occur in several steps. First, the measurable electrical neuronal activity (spiking and postsynaptic potential activity) is accompanied by synaptic neurotransmitter release (glutamate, GABA) with a neuronal-astroglial regional cerebral metabolic rate of oxygen consumption, mainly for the regional cerebral metabolic rate of glucose demand. Second, this activity induces a cascading pathway involving the production and release of powerful vasodilator metabolites by neurons and astrocytes and drives a chemical signal up to the vascular smooth muscle and pericyte cells along the microvessels, which dilate the microvasculature. Third, the microvessel dilation significantly modulates the regional cerebral blood activity (flow, volume, and oxygenation), which greatly exceeds the neuronal-astroglial oxygen requirements and results in a measurable overabundance of blood flow and hence a local hyperoxygenation. Yet, the role of NVC as it contributes to the comprehension of the energy mobilization in response to mental resources is not common knowledge. The cellular measures of energy production, delivery, and utilization are crucial to understanding and interpreting NVC activity. How can the role of NVC in the operator's brain machinery be clearly established? One possible way would be to relate the level of the correlates of NVC to the degree of task demand. It thus seems quite possible that an accurate measurement of NVC, spatially and temporally and in terms of amplitude, would be a valuable neurophysiological marker for quantifying changes in brain activation. Although this statement is still reductionist (in assuming that NVC activity is proportional to the operator's brain activity), this approach links the concepts of human MWL and mental resources to objective neurophysiological measures for Neuroergonomics purposes.

Recent Neuroergonomics research has progressed in neurocognitive or neuroimaging-sensing instrumentation for determining operator states through the measurement of NVC activity associated with the degree of mental processes (Parasuraman and Wilson, 2008). Tremendous advances have been made toward establishing approaches for portable neuroimaging equipment and brain activation measurements to assess sensitivity to NVC in human operators acting in realistic work environments. This development is especially the case in ambulatory functional neuroimaging methods such as functional near-infrared spectroscopy (fNIRS) and electroencephalography (EEG). To date, the aforementioned non-invasive brain imaging techniques are beginning to be well-established in the Neuroergonomics community. These advantages will be even more beneficial in the future as the coupling between these methods becomes more widespread.

# ASSESSING NEUROVASCULAR COUPLING WITH FNIRS-EEG METHODS: AN OBJECTIVE NEUROERGONOMICS APPROACH FOR EVALUATION OF THE OPERATOR'S BRAIN ACTIVITY

Technological advances in opto- and electronic miniaturization have improved the portability and operational flexibility of brain imaging sensors, allowing for greater comprehension of the brain at work in real-world applications (aeronautics, automotive, robotics). fNIRS provides continuous monitoring of the hemodynamic activity using near-infrared light transmitted between optodes. It infers the changes in the concentrations of oxygenated and deoxygenated hemoglobin in the cortical regions from the scattering and absorption properties of light probing beneath the surface of the skull (Perrey, 2008). These two fNIRS signals have their origins in the metabolic response corresponding to a shift in the oxygen consumed and the vascular response linked to a modulation of the microvasculature activity (dilation). This hemodynamic response alters the regional cerebral blood flow and volume, which exceed the oxygen intake consumed by the recruited neuronal population (functional hyperemia). fNIRS responses characterize the operator's brain activity related to cerebral blood flow and cerebral tissue oxygenation changes over time (Mandrick et al., 2013; Durantin et al., 2014; Fishburn et al., 2014). Good spatial localization can be derived if a high number of optodes are used in an array, but temporal resolution is limited by the delayed nature of the hemodynamic response to cortical activity (a few seconds).
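The inference from light attenuation to hemoglobin concentration changes is commonly formalized with the modified Beer-Lambert law. The text above does not name a specific model, so the following is a sketch under that assumption; the symbols ($\Delta OD$ for the change in optical density, $\varepsilon$ for the extinction coefficients, $d$ for the source-detector distance, and $\mathrm{DPF}$ for the differential pathlength factor) are introduced here for illustration only.

```latex
\Delta OD(\lambda)
  = -\log_{10}\frac{I(\lambda, t)}{I_{0}(\lambda)}
  = \left[\varepsilon_{\mathrm{HbO}}(\lambda)\,\Delta c_{\mathrm{HbO}}
        + \varepsilon_{\mathrm{HbR}}(\lambda)\,\Delta c_{\mathrm{HbR}}\right]
    d \cdot \mathrm{DPF}(\lambda)
```

Measuring $\Delta OD$ at two wavelengths $\lambda_1$ and $\lambda_2$ yields two linear equations in the two unknowns $\Delta c_{\mathrm{HbO}}$ and $\Delta c_{\mathrm{HbR}}$, which is why fNIRS instruments typically emit light at two or more wavelengths.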

On the other hand, EEG offers a fine temporal resolution (milliseconds), thus enabling the detection of brief neuronal processes, but is limited in its capacity for spatial resolution, at least in real time, even though dense-array EEG permits source localization. EEG uses scalp electrodes to capture weak electrical current fluctuations generated by inhibitory or excitatory postsynaptic potentials of a pool of neurons firing simultaneously in response to a stimulus. The electrophysiological roots of these signals correspond to the summation of the spontaneously and synchronously recruited neuronal population that contributes to the neuronal activity of the superficial layers of the cortex. EEG waves and event-related potentials are particularly strong candidates for objective measures of the operator's brain activity at the workplace (Parasuraman and Rizzo, 2006). In general, fNIRS and EEG are complementary, as each compensates for the other's measurement weaknesses in terms of information content (Fazli et al., 2012). Additionally, there is no noise cross-interference between fNIRS and EEG (light and electrical, respectively; Karanasiou, 2012). Therefore, simultaneous fNIRS-EEG signal acquisition would be suitable for assessing NVC in order to evaluate the operator's brain activity in ecological contexts (Hirshfield et al., 2009; Safaie et al., 2013).

However, we should not limit our understanding of brain activity to only one perspective; by looking at the brain at work with new tools and new eyes, we could gain a new comprehension of NVC in ecological contexts. Readers should note that multimodality using fNIRS-EEG methods is a very promising approach for investigating where, when, and how much NVC exhibits energy mobilization during work. Recording temporal electrical activity together with spatial hemodynamic activity makes it possible to follow the spatiotemporal evolution of functional neural connectivity and blood flow regulation through the scalp. Consequently, the evaluation of the NVC distribution throughout the head becomes accessible. This measurement makes it possible to dynamically map the brain activity and identify the brain areas where NVC is mainly activated. Additionally, the assessment of the power of the electrical signal by EEG coupled with the amplitude of the hemodynamic signal by fNIRS will enable a better depiction of the intensity of the NVC, thus extrapolating the effectiveness of the metabolic effort of performing tasks. This view of the degree of extrapolated metabolic correlates as an indicator of the level of mental resources seems straightforward at first glance. However, the metabolic expenditure that fuels cognitive processes is the prerequisite for any mental resources and for the assessment of the operator's brain activity. The challenge now is to enhance the reliability of NVC measurement in situ with fNIRS-EEG methods.

#### THE FUTURE FOR NEUROERGONOMICS

It is clear that the extensive work of Parasuraman has left the scientific community in an excellent position to objectively define MWL and, subsequently, mental resources, through the measurement of NVC activity. It is our opinion that NVC measurement could be achieved through the use of an efficient fNIRS-EEG coupling. In particular, there needs to be greater characterization of the energy mobilization of NVC with respect to neurophysiological mechanisms (neuronal-astroglial, metabolic and hemodynamic activity) and methods for its assessment in work settings (Parasuraman, 2011). A great deal of work remains in Neuroergonomics before a standard approach for assessing NVC with innovative neuroimaging technology can be developed for the evaluation of the operator's brain activity at work. In other words, there are still opportunities for the technological deployment of coupled hybrid devices (dry-electrode EEG within a high-density headset of fNIRS optodes). From a broader perspective, emerging devices must meet several criteria: discriminate different levels of workload; not interfere with the subject's work and environment; be accepted by the individual; be low cost with high portability; be easy to implement and to evaluate; be reproducible and reliable; and dissociate the mental workload from emotional processes (sensitivity and specificity). Theoretically, a multimodal fNIRS-EEG approach should help to investigate the interactions between different mental states and user behavior while taking into account the physiological processes. Further investigations are warranted to address newer assessments of the neurophysiological events of the operator's brain at work.

## AUTHOR CONTRIBUTIONS

Each of the authors has read and concurs with the content in the final manuscript. The first author (KM) wrote the majority of the manuscript. The other authors (ZC, MC, SP, and FD) have extensively reviewed and revised the manuscript from the first draft before giving final approval of the version to be submitted.

# FUNDING

This work was funded by the French Research National Agency, the French Defence Procurement Agency (ASTRID), and the AXA Research Fund.

#### REFERENCES

Wickens, C. D. (2008). Multiple resources and mental workload. Hum. Factors 50, 449–455. doi: 10.1518/001872008X288394

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Mandrick, Chua, Causse, Perrey and Dehais. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Acute Supramaximal Exercise Increases the Brain Oxygenation in Relation to Cognitive Workload

Cem Seref Bediz 1,2\*, Adile Oniz <sup>2</sup> , Cagdas Guducu<sup>2</sup> , Enise Ural Demirci <sup>1</sup> , Hilmi Ogut <sup>2</sup> , Erkan Gunay 2,3 , Caner Cetinkaya<sup>3</sup> and Murat Ozgoren<sup>2</sup>

<sup>1</sup> Department of Physiology, Faculty of Medicine, Dokuz Eylul University, Izmir, Turkey, <sup>2</sup> Department of Biophysics, Faculty of Medicine, Dokuz Eylul University, Izmir, Turkey, <sup>3</sup> School of Sport Sciences and Technology, Dokuz Eylul University, Izmir, Turkey

A single bout of exercise can improve performance on cognitive tasks. However, cognitive responses may be controversial due to differences in the type, intensity, and duration of exercise. In addition, the mechanism of the effect of acute exercise on the brain is still unclear. This study aimed to investigate the effects of supramaximal exercise on cognitive tasks by means of brain oxygenation monitoring. The oxygenation of the prefrontal cortex (PFC) was measured in 35 healthy male volunteers via a functional near infrared spectroscopy (fNIRS) system. Subjects performed a 2-Back test before and after the supramaximal exercise, a Wingate anerobic test (WAnT) lasting 30 s on a cycle ergometer. The evaluation of the PFC oxygenation change revealed that the rise in PFC oxygenation during the post-exercise 2-Back task was considerably higher than that during the pre-exercise 2-Back task. In order to describe the relationship between oxygenation change and exercise performance, subjects were divided into two groups, high performers (HP) and low performers (LP), according to their peak power values (PP) obtained from the supramaximal test. The oxy-hemoglobin (oxy-Hb) values were compared between pre- and post-exercise conditions within subjects and also between subjects according to peak power. When performers were compared, in the HP group, the oxy-Hb values in the post-exercise 2-Back test were significantly higher than those in the pre-exercise 2-Back test. HP had a significantly higher post-exercise oxy-Hb change (∆) than LP. In addition, the PP of the total group was significantly correlated with ∆oxy-Hb. The key findings of the present study revealed that acute supramaximal exercise has an impact on brain oxygenation during a cognitive task. Also, the higher the anerobic PP, the larger the oxy-Hb response in the post-exercise cognitive task. The current study also demonstrated a significant correlation between peak power (exercise load) and post-exercise hemodynamic responses (oxy-, deoxy- and total-Hb). The magnitude of this impact might be related to the physical performance capacities of the individuals. This can become a valuable parameter for future studies on human factor.

#### Edited by:

Hasan Ayaz, Drexel University, USA

#### Reviewed by:

Ranjana K. Mehta, Texas A&M Health Science Center, USA Frederic Dehais, Institut Supérieur de l'Aéronautique et de l'Espace, France

> \*Correspondence: Cem Seref Bediz cem.bediz@deu.edu.tr

Received: 07 October 2015 Accepted: 05 April 2016 Published: 20 April 2016

#### Citation:

Bediz CS, Oniz A, Guducu C, Ural Demirci E, Ogut H, Gunay E, Cetinkaya C and Ozgoren M (2016) Acute Supramaximal Exercise Increases the Brain Oxygenation in Relation to Cognitive Workload. Front. Hum. Neurosci. 10:174. doi: 10.3389/fnhum.2016.00174

Keywords: PFC, human factor, fNIRS, cognition and exercise, N-back test

# INTRODUCTION

Individuals usually feel mentally aroused and report an increase in their cognitive abilities after exercise. The relation between exercise and cognitive function has been investigated for several decades. It has been proposed that differences in exercise duration, exercise intensity and cognitive tasks could have caused some contradictory results in previous studies (Ando et al., 2011; Endo et al., 2013; Schmit et al., 2015). Despite the large variations in the reported results, there is meta-analytic evidence of significant beneficial effects of acute exercise on cognition in the general population (Lambourne and Tomporowski, 2010; McMorris and Hale, 2012). Most reports concern the positive effects of erobic exercise on various cognitive tasks. However, there are few reports about the effect on cognition of anerobic exercise (i.e., the protocol in the current study), characterized as a short-term and highly intensive effort. The specific mechanisms by which exercise affects cognitive functions remain largely unclear (Rupp and Perrey, 2008). One of the mechanisms proposed is the relationship between prefrontal cortex (PFC) oxygenation and cognitive function. With the development of imaging methods, it has become possible to show brain oxygenation during or after exercise. Most studies demonstrate an increase in PFC oxygenation following exercise (Rupp and Perrey, 2008; Jung et al., 2015).

In the literature, the effects of exercise on cognitive performance have been investigated both during and after exercise. It is largely accepted that metabolic activity in the brain increases during cognitive tasks. The energy consumption of the neurons that accompanies this increase in metabolic activity leads to an increase in cerebral blood flow to meet the increased demand for oxygen and glucose. For this reason, the increase in brain oxygenation is accepted as a physiological indicator of cognitive workload. Near infrared spectroscopy (NIRS) is a sensitive, non-invasive measurement method that indicates the cerebral hemodynamic response (Ozgoren et al., 2012) to cognitive tasks by means of changing levels of oxy- and deoxy-Hb (Albinet et al., 2014). Some studies reported a relationship between cerebral oxygenation level and cognitive test scores (Ayaz et al., 2007, 2012; McMorris et al., 2011; McMorris and Hale, 2012). Additionally, Li et al. (2005) and Gateau et al. (2015) describe a higher dorsolateral PFC activation during a working memory task using functional near infrared spectroscopy (fNIRS). An inadequate increase in cerebral oxygenation during a cognitive task has been defined as an indicator of cerebral fatigue (Nybo and Rasmussen, 2007; Mandrick et al., 2013; Mehta and Parasuraman, 2014). Dietrich (2003) reported that, aside from cerebral fatigue, hypofrontality may occur due to the challenge of resource allocation between the areas responsible for physical and cognitive workload.

We hypothesized that brain oxygenation during a cognitive task after acute supramaximal exercise is higher than during the pre-exercise cognitive task. We also expected higher behavioral performance after the acute supramaximal exercise. This study aimed to evaluate the hemodynamic and behavioral changes of cognitive processes following an acute (anerobic) supramaximal exercise.

TABLE 1 | Ages, heights, body weights, peak powers and maximal heart rates (Max-HR) of the low performers (LP) and high performers (HP) groups, presented as mean ± SD.

# MATERIALS AND METHODS

# Participants and Experimental Design

Thirty-five healthy and physically active male subjects (1.78 ± 0.07 m height and 72.1 ± 10.3 kg weight) aged between 18 and 23 years participated in the study (**Table 1**). All subjects were informed about the procedures and signed a written consent form. Experiments were conducted in two visits. In the first visit, the subjects were informed about and familiarized with both the exercise and N-Back test protocols. Within 3 days, subjects revisited the laboratory to perform a short-term supramaximal acute exercise, with cognitive tasks (2-Back tests) before and after the exercise (**Figure 1**). Brain oxygenation was continuously measured via fNIRS during the cognitive tasks and the exercise. In order to evaluate the performance-dependent results, an average peak power value was calculated for all subjects. Subjects who had higher power values than the average (751 Watt) were considered high performers (HP; N = 17 subjects), while those who had lower power values than the average were considered low performers (LP; N = 18 subjects). There were no statistically significant differences in terms of age, weight and height between the HP and LP groups. The Ethics Committee of Dokuz Eylul University approved all procedures and the experimental design.
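The grouping rule described above can be sketched as a simple split at the group mean. The subject identifiers and power values in the usage example are hypothetical, and assigning a subject exactly at the mean to the LP group is an assumption, since the text does not cover that case.

```python
from statistics import mean


def split_by_mean(peak_powers_w):
    """Split subjects into high and low performers at the mean peak power.

    peak_powers_w maps a subject identifier to peak power in watts.
    Assumption: a value exactly equal to the mean counts as a low performer.
    """
    threshold = mean(peak_powers_w.values())
    high = sorted(s for s, p in peak_powers_w.items() if p > threshold)
    low = sorted(s for s, p in peak_powers_w.items() if p <= threshold)
    return threshold, high, low


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from the study.
    demo = {"S01": 820.0, "S02": 690.0, "S03": 760.0, "S04": 730.0}
    threshold, hp, lp = split_by_mean(demo)
    print(f"threshold {threshold:.0f} W, HP: {hp}, LP: {lp}")
```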

# Cognitive Task Procedure

Cognitive performance was evaluated with the N-Back test before and after exercise. The N-Back test, which mainly assesses working memory as well as sustained attention, was administered to evaluate working memory and inhibitory control of irrelevant information (Jonides et al., 1997; Jaeggi et al., 2010). The test was developed in OpenGL using the C programming language and was administered to the participants on a laptop. A pseudorandom sequence of 120 letters drawn from "K, Q, H, X, M, F, R, and B" was displayed at the center of the screen, one letter at a time. Participants were asked to press the button on a response pad only if the letter on the screen was the same as the letter shown "n" steps earlier. In the present study the "2-Back" condition was employed. The probability that a letter matched the letter shown two steps earlier was 30%. Each letter was presented on the computer screen for 500 ms and the inter-stimulus interval varied between 1500 and 2000 ms. All tests were completed within 5 min. A brief training session, including a practice sequence of 15 letters, was given to all participants before the actual test to familiarize them with the protocol. To evaluate working memory, the reaction time for all stimuli (total reaction time, TRT), the reaction time for correct answers (CART), and the numbers of correct answers (CA), wrong answers (WA), and missed answers (M) were recorded. A Nexus-10 bio-amplifier equipped with appropriate sensors was used to monitor the appearance of stimuli on the screen and the participant responses in order to measure exact reaction times.
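As a rough illustration of the task logic described above, the following Python sketch generates a 2-Back letter sequence and scores button presses. It is not the authors' OpenGL/C implementation. The function names, the seeding, and the reading of CA as hits on targets, WA as presses on non-targets and M as targets without a press are assumptions made for this example.

```python
import random

LETTERS = "KQHXMFRB"  # letter set named in the text


def make_two_back_sequence(length=120, target_prob=0.30, n=2, seed=0):
    """Build a pseudorandom sequence in which about 30% of trials are n-back targets."""
    rng = random.Random(seed)
    seq = []
    for i in range(length):
        if i >= n and rng.random() < target_prob:
            seq.append(seq[i - n])  # target: repeats the letter shown n steps earlier
        else:
            # non-target: any letter except the one shown n steps earlier
            choices = [c for c in LETTERS if i < n or c != seq[i - n]]
            seq.append(rng.choice(choices))
    return seq


def score_two_back(seq, pressed, n=2):
    """Count correct (CA), wrong (WA) and missed (M) answers.

    `pressed` is the set of trial indices at which the button was pressed.
    """
    ca = wa = missed = 0
    for i, letter in enumerate(seq):
        is_target = i >= n and letter == seq[i - n]
        if is_target and i in pressed:
            ca += 1
        elif is_target:
            missed += 1
        elif i in pressed:
            wa += 1
    return {"CA": ca, "WA": wa, "M": missed}
```

For example, in the sequence `K Q K Q H X H` the trials at indices 2, 3 and 6 are targets, so pressing only at indices 2 and 5 gives one correct answer, one wrong answer and two missed answers.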

# Exercise Procedure

The Wingate Anaerobic Test (WAnT) protocol was applied as the supramaximal exercise. The WAnT is an exhaustive exercise for the assessment of anaerobic performance (Dotan and Bar-Or, 1983). It is also accepted as a standardized anaerobic exercise model for evaluating physiological responses to supramaximal exercise under controlled conditions (Weinstein et al., 1998). Following a 5-min warm-up on the bicycle at an intensity of about 50 W, all subjects performed the WAnT on a mechanically braked cycle ergometer with an optical pedal counter (Monark 824E, Sweden). Two unloaded 5-s sprints were performed during the warm-up. Following the warm-up, subjects were instructed to pedal as fast as possible for 30 s against a resistance of 80 g/kg body mass. The subjects were verbally encouraged to maintain a high pedaling rate throughout the WAnT. Pedal revolutions were monitored and recorded at 1-s intervals. Power outputs were calculated as described in Weinstein et al. (1998). The highest power output in the first 5 s of the test was used to represent the peak power (PP; PP = Distance × Load / Time).
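The relation PP = Distance × Load / Time can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure of Weinstein et al. (1998): the flywheel distance of 6 m per pedal revolution is a value commonly quoted for Monark ergometers and is not given in the text, the load is converted to newtons with g = 9.81 m/s², and "highest power output in the first 5 s" is read here as the maximum of the 1-s values. All function names are hypothetical.

```python
G = 9.81  # m/s^2, gravitational acceleration (rounded)


def wingate_power_series(revs_per_second, body_mass_kg,
                         load_kg_per_kg=0.080, metres_per_rev=6.0):
    """Power (W) for each 1-s interval: braking force x flywheel distance / time.

    load_kg_per_kg: resistance of 80 g per kg body mass, as stated in the text.
    metres_per_rev: assumed flywheel travel per pedal revolution (not from the text).
    """
    force_n = load_kg_per_kg * body_mass_kg * G
    interval_s = 1.0  # revolutions were recorded at 1-s intervals
    return [force_n * revs * metres_per_rev / interval_s
            for revs in revs_per_second]


def peak_power(power_series, window_s=5):
    """Highest 1-s power value within the first `window_s` seconds."""
    return max(power_series[:window_s])


# Hypothetical pedal counts (revolutions per second) for a 75-kg subject.
series = wingate_power_series([2.0, 2.5, 2.0], body_mass_kg=75.0)
pp = peak_power(series)
```

With these hypothetical inputs the braking force is about 58.9 N and the peak 1-s power is about 883 W.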

# fNIRS Recordings

A continuous wave (CW) fNIR system was used in the present study (Imager 100, fNIR Devices LLC, MD, USA). The system was connected to a flexible sensor pad that contained four light sources with built-in peak wavelengths at 730 nm and 850 nm and 10 detectors designed to scan cortical areas underlying the forehead. The forehead area was cleaned with an alcohol swab and scrubbing cream (NuPrep, USA). After cleansing, the sensor was placed on the forehead and covered by an elastic bandage specifically designed to hold it tightly across the head. Furthermore, a black head bandage was placed on top to eliminate possible ambient light effects.

This system records two wavelengths and dark current for each of the 16 voxels, totaling 48 measurements for each sampling period (Ayaz et al., 2011, 2013). With a fixed source-detector separation of 2.5 cm, this configuration generates a total of 16 measurement locations (voxels) per wavelength. Data acquisition and visualization were conducted using COBI Studio software (Ayaz and Onaral, 2005). The fNIR device calculates changes in oxy-hemoglobin (oxy-Hb) and deoxy-hemoglobin (deoxy-Hb) relative to baseline values by means of a CW spectroscopy system that applies light to tissue at constant amplitude. The mathematical basis of CW-type measurements is the modified Beer-Lambert law (Cope and Delpy, 1988). In the present study the baseline period started at the beginning of each task and lasted 20 s. The sum of the oxy-Hb and deoxy-Hb values was defined as total-Hb. Moreover, the ∆oxy-Hb and ∆deoxy-Hb parameters were used to describe the difference (oxygenation change, i.e., rise or fall) between the average values of two sessions (i.e., ∆OxyHb = OxyHb<sub>post</sub> − OxyHb<sub>pre</sub>), for both oxy-Hb and deoxy-Hb.
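For reference, the modified Beer-Lambert law mentioned above is commonly written in the following form. The symbols are standard notation rather than notation taken from this article: $\varepsilon$ denotes the extinction coefficients, $d$ the source-detector separation (2.5 cm here), $\mathrm{DPF}$ the differential pathlength factor, and $I_0$ the light intensity during the baseline period.

$$
\Delta OD(\lambda) = -\log_{10}\frac{I(\lambda)}{I_0(\lambda)} = \left(\varepsilon_{\mathrm{HbO_2}}(\lambda)\,\Delta[\text{oxy-Hb}] + \varepsilon_{\mathrm{Hb}}(\lambda)\,\Delta[\text{deoxy-Hb}]\right) d \cdot \mathrm{DPF}(\lambda), \qquad \lambda \in \{730\ \mathrm{nm},\ 850\ \mathrm{nm}\}
$$

Measuring at two wavelengths gives two equations in the two unknowns $\Delta[\text{oxy-Hb}]$ and $\Delta[\text{deoxy-Hb}]$. The derived quantities used in the text then follow directly:

$$
\text{total-Hb} = \text{oxy-Hb} + \text{deoxy-Hb}, \qquad \Delta\mathrm{OxyHb} = \mathrm{OxyHb}_{\mathrm{post}} - \mathrm{OxyHb}_{\mathrm{pre}}
$$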

#### fNIRS Analysis

The raw intensity measurements at 730 and 850 nm were low-pass filtered with a Butterworth filter in MATLAB (MATLAB and Statistics Toolbox, 2007). The Butterworth filter was designed to eliminate possible respiration and heart rate signals and unwanted high-frequency noise (Huppert et al., 2009). Artifact removal was performed according to Ayaz et al. (2012). The PFC oxygenation data retrieved via fNIRS were examined in the right, left and central PFC areas, which were defined as regions of interest (ROI; **Figure 2**). Optodes 1, 2, 3 and 4, located on the leftmost side of the forehead, were combined to denote the left PFC; the centrally located optodes 7, 8, 9 and 10 were combined to denote the central PFC; and optodes 13, 14, 15 and 16, located on the rightmost side, were combined to denote the right PFC. fNIRS signals were calculated and averaged over the whole pre- and post-exercise 2-Back sessions (Endo et al., 2013).
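The optode-to-ROI grouping described above can be expressed as a small Python sketch. The function name and the input layout (one session-averaged value per optode) are assumptions for illustration. Skipping optodes with no value, for example after artifact rejection, is also an assumption and not a documented step of the original pipeline.

```python
from statistics import mean

# Optode groups as given in the text.
ROI_OPTODES = {
    "left": (1, 2, 3, 4),
    "central": (7, 8, 9, 10),
    "right": (13, 14, 15, 16),
}


def roi_means(optode_values):
    """Average session-level values over the optodes of each ROI.

    optode_values: dict mapping optode number (1-16) to a value, or None if
    the optode was rejected. Missing optodes are skipped (assumption).
    """
    result = {}
    for roi, optodes in ROI_OPTODES.items():
        vals = [optode_values[o] for o in optodes
                if optode_values.get(o) is not None]
        result[roi] = mean(vals) if vals else None
    return result
```

Optodes 5, 6, 11 and 12 do not belong to any of the three ROIs and are therefore ignored by this sketch.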

#### Statistical Analysis

Shapiro-Wilk and Kolmogorov-Smirnov tests were used to check the normality of the data. All of the data were normally distributed. Age, height and weight differences between groups were tested via independent-groups t-test, and the results are given in **Table 1**. Statistical analysis was performed per ROI on the combined channels.

A two-way group (LP vs. HP) by time (pre- vs. post-exercise) analysis of variance (ANOVA) was performed on the fNIRS data and the behavioral data. Moreover, oxy- and deoxy-Hb changes were calculated as described above (i.e., ∆OxyHb) and used for the statistical analysis. An independent-groups t-test was performed to test group differences (LP vs. HP) in oxy-, deoxy- and total-Hb changes. The correlations between peak power and the oxy-Hb, deoxy-Hb, and total-Hb changes were investigated via Spearman's rank-order correlation test.
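To make the last step concrete, the following stdlib-only Python sketch computes Spearman's rank-order correlation coefficient as the Pearson correlation of the ranks, with tied values receiving their average rank. It only illustrates the statistic; the function names are hypothetical, and the statistics package actually used in the study is not restated here.

```python
def _average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because only ranks are used, any strictly increasing relation between peak power and an oxygenation change gives a coefficient of 1, whether or not the relation is linear.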

#### RESULTS

#### fNIRS Findings

In the present study, 35 healthy subjects were recruited. To evaluate performance-dependent results, pre- and post-exercise PFC oxygenation levels during the 2-Back tests were compared for the HP and LP groups with a two-way mixed ANOVA. The oxy-Hb, deoxy-Hb, and total-Hb values of the groups for the pre- and post-exercise 2-Back sessions are given in **Table 2**. The oxy-Hb and deoxy-Hb levels in the central PFC area during the pre- and post-exercise 2-Back tests are shown as whole-group averages in **Figure 3**. All figures represent group averages during the related task (pre- and post-exercise 2-Back).

TABLE 2 | Pre- and post-exercise 2-Back oxy-, deoxy-, and total-Hb levels (mean ± SD) are given for the total, LP and HP groups in the central prefrontal cortex (PFC) area.

The significant differences in oxygenation, deoxygenation and total-Hb between the pre- and post-exercise 2-Back sessions are marked with asterisks (<sup>∗</sup> denotes p < 0.05, <sup>∗∗</sup> denotes p < 0.01, and <sup>∗∗∗</sup> denotes p < 0.001).

For the oxygenation-related analysis, pre- and post-exercise 2-Back oxy-Hb values were selected as the within-subject factor and peak power was selected as the between-subject factor. According to these analyses, oxygenation in the post-exercise 2-Back test was higher than in the pre-exercise 2-Back test for all three PFC ROIs [F(1,33) = 51.82, p < 0.001, η = 0.61 for left; F(1,33) = 78.42, p < 0.001, η = 0.70 for central; F(1,33) = 54.25, p < 0.001, η = 0.61 for right]. For the right PFC area there was no interaction, so pairwise comparisons were used. According to the pairwise comparisons there was no significant difference between groups, but there were significant differences between pre- and post-exercise oxygenation levels in the right PFC (p < 0.001). Because the two-way ANOVA showed a significant interaction between the within-group effect (pre/post) and the between-group effect (HP/LP) in the left and central PFC areas, between-group effects could not be evaluated there (McDonald, 2009). Further analyses were made to clarify the statistical significance via within-group paired-samples t-tests. A paired-samples t-test was conducted separately for each group, and post-exercise 2-Back oxy-Hb levels were significantly higher than pre-exercise 2-Back oxy-Hb levels for both groups [for the HP group: in left PFC, T(16) = −5.99, p < 0.001; in central PFC, T(16) = −8.12, p < 0.001; for the LP group: in left PFC, T(17) = −3.89, p < 0.001; in central PFC, T(17) = −4.13, p < 0.001].
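The paired-samples t statistic used in these within-group comparisons can be sketched in Python as follows. The function name and the data are hypothetical. Differences are taken as pre minus post, which is an assumption consistent with the negative T values reported when post-exercise levels are higher. The degrees of freedom equal the number of pairs minus one, which matches T(16) for the 17 HP subjects and T(17) for the 18 LP subjects.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired-samples t statistic and degrees of freedom (differences: pre - post)."""
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # mean difference over its standard error
    return t, n - 1


# Hypothetical oxy-Hb values for three subjects (illustration only).
t_value, df = paired_t(pre=[1.0, 2.0, 3.0], post=[2.0, 4.0, 3.5])
```

The p-value would then be read from a t distribution with `df` degrees of freedom, which requires a statistics library or a table and is omitted here.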

Moreover, deoxy-Hb-related analyses were conducted. For these analyses, pre- and post-exercise 2-Back deoxy-Hb values were selected as the within-subject factor and peak power was selected as the between-subject factor. According to these analyses, deoxy-Hb values were significantly higher in the post-exercise 2-Back test than in the pre-exercise 2-Back test for all three PFC ROIs [F(1,33) = 9.56, p < 0.01, η = 0.23 for left PFC; F(1,33) = 7.03, p < 0.05, η = 0.17 for central PFC; F(1,33) = 11.59, p < 0.01, η = 0.26 for right PFC]. For the right and left PFC areas there was no interaction, so pairwise comparisons were used. According to the pairwise comparisons there was no significant difference between groups, but there were significant differences between pre- and post-exercise deoxygenation levels in the left and right PFC (p < 0.01 for both). Because the two-way ANOVA showed a significant interaction between the within-group effect (pre/post) and the between-group effect (HP/LP) in the central PFC area, between-group effects could not be evaluated there (McDonald, 2009). Further analysis was made to clarify the statistical significance via within-group paired-samples t-tests. A paired-samples t-test was conducted separately for each group, and post-exercise 2-Back deoxy-Hb levels were significantly higher than pre-exercise 2-Back deoxy-Hb levels for the HP group in the central PFC area [T(16) = −2.67, p < 0.05], whereas there was no significant difference for the LP group in the central PFC area.
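The text reports an effect size η alongside each F value without naming the measure. One plausible reading, offered here only as an assumption, is partial eta squared, which can be recovered from an F statistic and its degrees of freedom. The sketch below shows that calculation; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic: F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values taken from the oxy-Hb and total-Hb results reported in the text.
left_oxy = partial_eta_squared(51.82, 1, 33)       # about 0.61
central_oxy = partial_eta_squared(78.42, 1, 33)    # about 0.70
central_total = partial_eta_squared(47.20, 1, 33)  # about 0.59
```

These three values agree with the reported effect sizes to two decimals, which is consistent with the partial eta squared reading but does not confirm it.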

TABLE 3 | Post- and pre-exercise oxy-Hb, deoxy-Hb, and total-Hb changes (∆oxy-Hb, ∆deoxy-Hb, and ∆total-Hb) are given for the LP and HP groups in all PFC areas (mean ± SD).

Significant differences between groups are marked with asterisks (<sup>∗</sup> denotes p < 0.05, <sup>∗∗</sup> denotes p < 0.01).

Total-Hb was also significantly higher in the post-exercise 2-Back test for all three PFC ROIs [F(1,33) = 38.55, p < 0.001, η = 0.54 for left PFC; F(1,33) = 47.20, p < 0.001, η = 0.59 for central PFC; F(1,33) = 42.15, p < 0.001, η = 0.56 for right PFC]. For the right and left PFC areas there was no interaction, so pairwise comparisons were used. According to the pairwise comparisons there was no significant difference between groups, but there were significant differences between pre- and post-exercise total-Hb levels in the left and right PFC (p < 0.001 for both). Because the two-way ANOVA showed a significant interaction between the within-group effect (pre/post) and the between-group effect (HP/LP) in the central PFC area, between-group effects could not be evaluated there (McDonald, 2009). Further analyses were made to clarify the statistical significance via within-group paired-samples t-tests. A paired-samples t-test was conducted separately for each group, and post-exercise 2-Back total-Hb levels were significantly higher than pre-exercise 2-Back total-Hb levels for the HP group [in central PFC, T(16) = −6.24, p < 0.001] and for the LP group [in central PFC, T(17) = −3.06, p < 0.01].

In addition to these findings, the changes in oxy-Hb, deoxy-Hb, and total-Hb were compared between the HP and LP groups via independent-samples t-test (**Table 3**). For the oxygenation changes (∆oxy-Hb), the PFC oxygenation rise in the central PFC (p < 0.01) and left PFC (p < 0.05) areas was significantly higher in the HP group. For the deoxygenation changes (∆deoxy-Hb) between the pre- and post-exercise 2-Back tests, the HP group's change in the central PFC area was significantly higher (p < 0.05). For the total-Hb changes (∆total-Hb) between the pre- and post-exercise 2-Back tests, the HP group's change in the central PFC (p < 0.01) and left PFC (p < 0.05) areas was significantly higher. Moreover, the peak power (PP) values of the total group were significantly correlated with the ∆oxy-Hb, ∆deoxy-Hb, and ∆total-Hb values for the central PFC area (r = 0.042, p < 0.01; r = 0.035, p < 0.02; r = 0.040, p < 0.01, respectively). The average oxy-Hb and deoxy-Hb levels in the central PFC area during the pre- and post-exercise 2-Back tests for the HP and LP groups are shown in **Figures 4**, **5**.

#### Behavioral Findings

All subjects successfully completed the whole procedure. The 2-Back test scores (CA, WA, M, CART, TRT) were compared for the total group, and there were no significant differences between pre- and post-exercise scores. The same comparison was made for the HP and LP groups, and again there were no significant differences within groups (pre-exercise vs. post-exercise sessions) or between groups (**Table 4**). Although the change was not statistically significant, the reaction times for correct answers (CART) improved slightly after the exercise.

### DISCUSSION

Both the physical benefits and the cognitive improvements associated with exercise have been frequently studied in the field. In the present study, the effects of anaerobic exercise on brain oxygenation were investigated in relation to cognitive workload. The results revealed an increment in oxy-, deoxy-, and total-Hb levels in the post-exercise session of the cognitive task. These parameters were statistically higher than in the pre-exercise session of the cognitive task. In the related literature, it has been demonstrated that the oxy-Hb level increases in the PFC during both physical exercise and cognitive tasks. Such a rise has been linked to an increase in neural metabolic activation (González-Alonso et al., 2004; Shibuya et al., 2004; Rupp and Perrey, 2008; Endo et al., 2013). Tam and Zouridakis (2014) describe the oxy-Hb, deoxy-Hb, their sum (oxy-Hb + deoxy-Hb), and their difference (oxy-Hb − deoxy-Hb) as corresponding to changes in oxygen delivery, oxygen extraction, total blood volume delivered, and total oxygenation, respectively. An increasing level of oxy- and deoxy-Hb is accepted as an indicator of an increase in blood flow (Endo et al., 2013). The present study revealed a rise in oxy-Hb, deoxy-Hb, and total-Hb levels during both pre- and post-exercise cognitive tasks, which might be related to a regional increase in cerebral blood flow. In this context, regional blood flow in the PFC during the post-exercise session of the cognitive task could be considered higher than during the pre-exercise session. Such increased hemodynamic responses may be an indicator of additional effort during the post-exercise cognitive task (Mandrick et al., 2013).

In the present study, a relationship between the brain's hemodynamic responses and physical performance (under anaerobic supramaximal exercise conditions) has been revealed. The participants were divided into two groups by PP, referred to as high and low performers, in order to address the performance relationship. Despite similar hemodynamic response patterns during the pre-exercise session of the cognitive task, different hemodynamic response patterns were observed between the groups during the post-exercise session (**Figures 3**, **4**). As the most striking difference, the rise in oxy-Hb and total-Hb levels in the HP group was statistically higher than in the LP group in the central and left PFC areas. These findings may demonstrate that the rise in oxygen consumption and demand in the PFC is higher in the HP group, which has the higher anaerobic capacity, than in the LP group. In this context, Drigny et al. (2014) reported that brain oxygenation could change with training in obese patients. Also, Khan and Hillman (2014) reviewed the connection between training and its effects on brain oxygenation levels and suggested a relationship between aerobic fitness and cognitive processes that can be demonstrated via different fMRI and EEG studies. A similar relationship between exercise and cognitive functions was described by Davenport et al. (2012). In the light of the results of the present study, the level of PFC oxygenation during the post-exercise cognitive task might be higher in the physically better performing group than in the physically lower performing group.

Standard deviations are also marked in line with the average as vertical lines (note that only the positive deflection is displayed for the sake of simplicity). The four panels are divided into performers (low and high performance, left and right, respectively) and oxy- and deoxy-Hb (upper and lower, respectively). Left panel indicated oxy-Hb and right panel deoxy-Hb results. The vertical scale denotes the strength of the fNIRS signal in µMolar units, normalized to baseline. The horizontal scale denotes time in minutes. Vertical dashed lines denote pre-, during- and post-exercise onsets and durations. The dotted lines represent the periods of warm-up and cool-down.

The current study also demonstrated a significant correlation between peak power (exercise load) and the post-exercise change in hemodynamic responses (∆oxy-Hb, ∆deoxy-Hb, and ∆total-Hb). This correlation also supports the aforementioned assumption that higher performers have higher PFC oxygenation. This can become a valuable parameter for future human factors studies of physical/cognitive load. Formerly, using a different paradigm, our group demonstrated a centrofrontal shift of the active brain areas due to cognitive load (Bayazıt et al., 2009). In that study, cognitive load, such as conflict resolution, displayed a physical drive from the posterior areas towards the frontal areas, with an increased demand on the frontal regions. Similarly, in the current study, the high peak power task is not solely a motor one but also requires a multitude of cognitive activities. In order to cope with the higher motor (and cognitive) load, the brain has to sustain increased levels of concentration, attention, coordination, environment monitoring, and reactive and interactive motor control. The frontal areas are the primary areas for these and similar executive and complex skills. Further clarification is needed in this context to describe the effect of physical fitness on brain oxygenation by means of new study designs and larger groups.

The effects of acute exercise on cognitive functions have been investigated for decades. However, fNIRS studies demonstrating the mechanisms by which physical exercise affects brain oxygenation in relation to cognitive performance are very limited, and these mechanisms are not well understood (Albinet et al., 2014; Drigny et al., 2014; Dupuy et al., 2015). In previous studies, participants' performance in decision-making, mental processing speed, selective attention, and reaction time was investigated (Aks, 1998; Arcelin and Brisswalter, 1999; Emery et al., 2001). Moreover, Brisswalter et al. (2002) and Tomporowski (2003) reported that cognitive performance could be improved by acute exercise. Interestingly, previous studies revealed different findings depending on the type of exercise: intense and exhausting exercise causes fatigue (Brisswalter et al., 2002), and light exercise causes cognitive performance impairment (Varner and Ellis, 1998). Besides, some studies reported a decrease in cognitive functions related to the fatigue caused by the intensity or duration of the exercise (Mehta and Parasuraman, 2014). The physiological basis of this decrement could also originate from the natural competition between the PFC areas related to cognition and the motor areas related to motion, which causes hypofrontality (Dietrich, 2003). The experimental setup of the current study employed a high-load but very short (30-s) anaerobic exercise, and such an exercise model could cause higher oxygenation responses in the PFC, similar to aerobic exercise models. As mentioned before, the behavioral results displayed only a small, non-significant increase in WA and a slight, non-significant increase in CA after the exercise. Our results did not provide strong support for either an improvement or a decrement in cognitive scores after acute exercise.

In the related literature, the Stroop test was frequently employed as a cognitive task, and an improvement in the total test time was generally accepted as an indicator of improved cognitive functions (Hyodo et al., 2012; Endo et al., 2013). In the present study we employed the 2-Back test as the cognitive task and evaluated all of its parameters (CA, WA, M, CART, TRT), but not the total test time. Therefore the 2-Back test might not be the right tool for indicating improved cognitive functions. Also, the rise in PFC oxygenation may not necessarily mean an improvement in cognitive function during the post-exercise session. Another plausible explanation might be that the participants had to work harder "neurally" (indicated by increased oxy-Hb levels) to maintain cognitive performance during the post-exercise session.

TABLE 4 | The demonstration of correct answers, wrong answers, missed answers; correct answers reaction time, and total reaction time (mean ± SD) for high (HP) and low performers (LP) group.

From another point of view, some studies reported an improvement in cognitive functions after moderate exercise (Potter and Keeling, 2005; Coles and Tomporowski, 2008). However, a recent meta-analysis (McMorris et al., 2011) raised an issue. Those authors examined the effect of acute, moderate-intensity exercise on working memory tasks and found that speed and accuracy of processing were differentially affected. In particular, the positive effects of acute exercise seem to be disproportionately influential on executive control processes (i.e., planning, coordination, inhibition, mental flexibility, working memory) relative to tasks of recall and alertness (Chang et al., 2012; McMorris and Hale, 2012). The N-Back task used in the current study indeed involves working memory, alertness, planning, and inhibition strategies. Therefore a possible explanation could be that cognitive functions (i.e., executive control processes) did increase, owing to a prolonged increase in total blood flow and hence in the metabolic capacity of the brain (both oxygen and glucose should be increased), but that the 2-Back test was not demanding enough to reveal such an improvement. Another explanation could be that the increased metabolic activity in the brain reflects greater neural effort to maintain performance rather than cognitive improvement.

The findings of the current literature are not sufficient to explain the positive effects of exercise intensity on cognitive functioning (Soga et al., 2015). This study employed a short-term, high-intensity, all-out exercise model. Following this anaerobic exercise, the 2-Back test was administered and its results showed a non-significant enhancement in cognitive scores. The different effects of anaerobic and aerobic exercise on cognitive scores could explain the non-significant differences in test results despite the increase in brain oxygenation. The variation in the cognitive tests administered across studies could also lead to such non-significant results. Finally, the administration time of the post-exercise tests could affect the cognitive scores. Soga et al. (2015) reported that there is no clear information in the literature about cool-down periods following exercise and the administration time of cognitive tests. In addition, they underlined the importance, for future studies, of assessing both time and heart rate variables post-exercise before administering the cognitive tests.

In the related literature, studies using aerobic exercise have suggested that the increment in PFC oxygenation has a positive effect on PFC functions (Endo et al., 2013). Our study has no statistically significant behavioral results to demonstrate such positive effects on PFC functioning. Although the current study used an anaerobic exercise model, the results (hemodynamic responses) are in line with studies that used aerobic exercise models (Ando et al., 2011; Endo et al., 2013; Byun et al., 2014). The observed changes in oxygenation levels are compatible with the results of previous studies. However, no statistical differences were found in the cognitive performance parameters that were positively affected by aerobic exercise. The major dividing line between aerobic and anaerobic exercise lies in the duration and intensity parameters, and these definitions are mostly related to body and muscle metabolism. As far as we know, no study has specifically established the dividing line between aerobic and anaerobic capacity with regard to the brain alone. This might point to the fact that the relationship between metabolic capacity and dynamic patterns in the body might differ from that in the brain.

There are findings demonstrating that cognitive functions, especially following intense exercise, could be impaired or not augmented. Soga et al. (2015) reported that exercise has different effects on working memory and inhibitory control components. In our study, despite the PFC oxygenation increment, no statistical differences were found between 2-Back test scores. The different effect mechanisms, the exercise intensity, and the administration time of the cognitive tests could be the reasons for this lack of significance. Future studies with an experimental design employing both aerobic and anaerobic exercises could demonstrate the hemodynamic and cognitive performance changes.

The current study may have a considerable limitation. Heart rate may not recover linearly after exercise (aerobic or anaerobic), and it is likely that cerebral oxygenation, which is regulated by autonomic responses, also changes drastically from exhaustion to 1 min and to 5 min into recovery. Therefore, even though the oxygenation recordings were made from 5 min post-exercise onwards, fluctuations within the 5-min averaging periods may have introduced a bias.

#### CONCLUSION

In summary, acute supramaximal exercise increased the oxy-Hb, deoxy-Hb and total-Hb levels during the post-exercise session of the 2-Back test. This would suggest that PFC function increases after acute exercise, but the comparison between pre- and post-exercise 2-Back test scores did not show any significant improvement. These results could not reveal any cognitive performance augmentation despite the increased PFC oxygenation. As mentioned in the discussion, such a post-exercise oxygenation increase in the PFC may not derive from cognitive performance alone. To clarify the basis and effects of the oxygenation rise, further studies are required with an experimental design that includes cognitive tasks in all three phases (pre-, during-, and post-exercise).

The HP group, which had the higher PP, showed larger oxy-, deoxy-, and total-Hb changes than the LP group. Such a difference raises the question of whether short-term brain oxygenation responses might be affected by physical performance. The current study could provide a methodological approach for human factors studies that require combining behavioral and objective methods (fNIRS, etc.).

#### AUTHOR CONTRIBUTIONS

The study was planned by CSB, AO, and MO, with experimental design by CSB, CG, EUD, EG, CC. Data was collected and analyzed by CG, EUD, HO, CC. Work was drafted by EUD, CC, EG, and HO. Important intellectual support was given by CSB, AO, MO, EG, and CG throughout the project. All authors prepared the manuscript and approved the final version of the manuscript.

#### FUNDING

This study was funded partly via Dokuz Eylul University, Department of Scientific Research Projects coded by 2012.KB.SAG.081 and 2014.KB.SAG.012.

#### ACKNOWLEDGMENTS

The authors thank the research group of Dokuz Eylul University Human Factor Labs. The authors also thank all volunteers for their participation in the study and Ipek Ergonul for editing and commenting on the manuscript. The authors would like to thank Dr. Pembe Keskinoglu for her contributions and comments on the statistical analyses.

### REFERENCES

Cope, M., and Delpy, D. T. (1988). System for long-term measurement of cerebral blood and tissue oxygenation on newborn infants by near infra-red transillumination. Med. Biol. Eng. Comput. 26, 289–294. doi: 10.1007/bf02447083


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Bediz, Oniz, Guducu, Ural Demirci, Ogut, Gunay, Cetinkaya and Ozgoren. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Prefrontal Cortex Activation Upon a Demanding Virtual Hand-Controlled Task: A New Frontier for Neuroergonomics

Marika Carrieri <sup>1</sup> , Andrea Petracca<sup>1</sup> , Stefania Lancia<sup>1</sup> , Sara Basso Moro<sup>1</sup> , Sabrina Brigadoi <sup>2</sup> , Matteo Spezialetti <sup>1</sup> , Marco Ferrari <sup>3</sup> , Giuseppe Placidi <sup>1</sup> and Valentina Quaresima<sup>1</sup> \*

<sup>1</sup> Department of Life, Health and Environmental Sciences, University of L'Aquila, L'Aquila, Italy, <sup>2</sup> Department of Developmental Psychology, University of Padova, Padova, Italy, <sup>3</sup> Department of Physical and Chemical Sciences, University of L'Aquila, L'Aquila, Italy

Functional near-infrared spectroscopy (fNIRS) is a non-invasive vascular-based functional neuroimaging technology that can assess, simultaneously from multiple cortical areas, concentration changes in oxygenated-deoxygenated hemoglobin at the level of the cortical microcirculation blood vessels. fNIRS, with its high degree of ecological validity and its very limited requirement of physical constraints to subjects, could represent a valid tool for monitoring cortical responses in the research field of neuroergonomics. In virtual reality (VR) real situations can be replicated with greater control than that obtainable in the real world. Therefore, VR is the ideal setting where studies about neuroergonomics applications can be performed. The aim of the present study was to investigate, by a 20-channel fNIRS system, the dorsolateral/ventrolateral prefrontal cortex (DLPFC/VLPFC) in subjects while performing a demanding VR hand-controlled task (HCT). Considering the complexity of the HCT, its execution should require the allocation of attentional resources and the integration of different executive functions. The HCT simulates the interaction with a real, remotely-driven system operating in a critical environment. The hand movements were captured by a high spatial and temporal resolution 3-dimensional (3D) hand-sensing device, the LEAP motion controller, a gesture-based control interface that could be used in VR for tele-operated applications. Fifteen University students were asked to guide, with their right hand/forearm, a virtual ball (VB) over a virtual route (VROU) reproducing a 42 m narrow road including some critical points. The subjects tried to travel as far as possible without making the VB fall. The distance traveled by the guided VB was 70.2 ± 37.2 m. The less skilled subjects failed several times in guiding the VB over the VROU. Nevertheless, a bilateral VLPFC activation, in response to the HCT execution, was observed in all the subjects. No correlation was found between the distance traveled by the guided VB and the corresponding cortical activation. These results confirm the suitability of fNIRS technology to objectively evaluate cortical hemodynamic changes occurring in VR environments. Future studies could contribute to a better understanding of the cognitive mechanisms underlying human performance either in expert or non-expert operators during the simulation of different demanding/fatiguing activities.

#### Edited by:

Hasan Ayaz, Drexel University, USA

#### Reviewed by:

Stephane Perrey, University of Montpellier - EuroMov, France Murat Perit Cakir, Middle East Technical University, Turkey

> \*Correspondence: Valentina Quaresima valentina.quaresima@univaq.it

Received: 04 November 2015 Accepted: 01 February 2016 Published: 16 February 2016

#### Citation:

Carrieri M, Petracca A, Lancia S, Basso Moro S, Brigadoi S, Spezialetti M, Ferrari M, Placidi G and Quaresima V (2016) Prefrontal Cortex Activation Upon a Demanding Virtual Hand-Controlled Task: A New Frontier for Neuroergonomics. Front. Hum. Neurosci. 10:53. doi: 10.3389/fnhum.2016.00053

Keywords: functional near-infrared spectroscopy, neuroergonomics, hand-controlled task, LEAP motion controller, virtual reality, remote control, brain activation

# INTRODUCTION

The term neuroergonomics was first introduced in 1997 to describe an interdisciplinary area of research at the intersection of two disciplines: neuroscience and ergonomics (Parasuraman and Rizzo, 2007). The studies in this field, carried out using either mobile or immobile neuroimaging techniques, have been reviewed by Mehta and Parasuraman (2013). Virtual reality (VR), a computer-based technology that allows the creation of multisensory simulated environments in which users can interact and receive real-time feedback on their performance, was claimed by Kearney et al. (2007) to be highly relevant for neuroergonomics. This is because VR can replicate, with greater control than is possible in the real world, a wide range of conditions that are impractical or impossible to observe in real situations, thus allowing behavioral and neurophysiological observations of the mind and brain at work. Given this peculiarity, VR is also effectively used by human operators to accomplish their work in dangerous environments, thus avoiding any physical risk. For instance, by applying gesture-based control interfaces, VR is usually employed for tele-operated applications such as driving robots, rovers and other devices remotely, with the operators at a certain distance from them (Chen et al., 2015; Liu and Zhang, 2015; Wei et al., 2015). Tele-operated systems are very expensive and unique, being neither replicable nor quickly replaceable, and the success or failure of long-planned, critical, costly and challenging operations depends on their proper use. Taking into account the high degree of responsibility inherent to operators' duties, their considerable physical/cognitive work should be evaluated objectively by neuroimaging techniques in the framework of neuroergonomics (for review, see Gramann et al., 2011, 2014; Mehta and Parasuraman, 2013).

Although the most widely used immobile functional neuroimaging modality has undoubtedly been functional magnetic resonance imaging (fMRI), the development of portable and wearable neuroimaging devices, comprising electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS; for review, see Mehta and Parasuraman, 2013; Gramann et al., 2014; Scholkmann et al., 2014), has considerably facilitated the neuroergonomics approach. fNIRS, with its high degree of ecological validity and its very limited requirement of physical constraints to subjects, represents a valuable tool for monitoring cortical responses in the research fields of neuroergonomics (for reviews, see Ayaz et al., 2013; Derosière et al., 2013). Furthermore, compared to fMRI, fNIRS is silent, which avoids any bias in the results due to difficulties in focusing on the task because of a high level of noise. Briefly, fNIRS is a non-invasive vascular-based functional neuroimaging technology which assesses, simultaneously from multiple measurement sites, concentration changes in oxygenated-deoxygenated hemoglobin (O2Hb/HHb, respectively) at the level of the cortical microcirculation blood vessels (Scholkmann et al., 2014). O2Hb and HHb interact differently with near-infrared light, so that both physiological indexes can be recovered from the measured signal. This is a further advantage of fNIRS over fMRI, the latter being able to recover only a single physiological index, namely the blood-oxygen level dependent (BOLD) signal. When a specific brain region is activated, cerebral blood flow increases in a temporally and spatially coordinated manner through a complex sequence of events tightly linked to changes in neural activity (i.e., neurovascular coupling). The coupling between neuronal activity and cerebral blood flow is fundamental to brain function. 
fNIRS relies exactly on this coupling to reveal the activated cortical region by measuring the associated cortical blood oxygenation changes (i.e., the increase in O2Hb and the decrease in HHb).

Since 1993, fNIRS has been employed for evaluating the spatiotemporal characteristics of the cortical activation during different motor tasks related to upper and/or lower limb exercise (for a review, see Leff et al., 2011). In most previous fNIRS studies, the activation of the sensory-motor cortex and the prefrontal cortex (PFC) has been widely investigated in different lower-limb tasks such as walking (Koenraadt et al., 2014; Mirelman et al., 2014), stepping (Huppert et al., 2013), precision stepping (Koenraadt et al., 2014), etc. Several fNIRS studies have also investigated the PFC activation during hand tasks such as: finger movement (Wriessnegger et al., 2012), passive finger movement (Chang et al., 2014), isometric grasping/grasping (Mandrick et al., 2013), handgrip exercise (Derosière et al., 2014), learning a hand motor skill (Hatakenaka et al., 2007), etc.

Since 2000, fNIRS has also been employed in real-world activities reproduced in VR environments for evaluating the PFC activation during the simulation of different hand-related demanding/fatiguing activities, such as airplane piloting (Ayaz et al., 2012a; Durantin et al., 2014; Gateau et al., 2015), car driving (Tomioka et al., 2009), grasping (Holper et al., 2010, 2012), natural orifice transluminal endoscopic surgery (James et al., 2011), etc. Interestingly, the inferior frontal gyrus (Ayaz et al., 2012a) and the dorsolateral PFC (DLPFC; Durantin et al., 2014; Gateau et al., 2015) were found to be activated during airplane piloting tasks. The bilateral ventrolateral PFC (VLPFC) was found to be activated during natural orifice transluminal endoscopic surgery when the simulation required a more difficult navigation path through an orifice (James et al., 2011).

It is well known that the PFC, and in particular the VLPFC and DLPFC, are involved in the control of motor actions. On the one hand, it has been demonstrated that the VLPFC is involved in visuo-motor learning tasks (Yamagata et al., 2012; Hoshi, 2013). Moreover, reflexive orienting seems to be controlled by the right VLPFC (Corbetta et al., 2008), whereas the goal-relevant information for action control seems to be maintained and retrieved by the left VLPFC (Badre and Wagner, 2007; Souza et al., 2009). On the other hand, the DLPFC apparently plays a specific role in learning by trial and error (Halsband and Lange, 2006). Furthermore, the DLPFC, for its involvement in mediating and monitoring of actions, is considered to be the major anatomical correlate of the central executive (Baddeley, 2003; Gateau et al., 2015). Therefore, the VLPFC and the DLPFC are involved in associating visual information with motor responses (Halsband and Lange, 2006; Tanji and Hoshi, 2008).

The PFC plays a crucial role not only in single cognitive or motor tasks, but also in combined sensorimotor-cognitive tasks (i.e., dual-tasks; Gentili et al., 2013; Mandrick et al., 2013; Mirelman et al., 2014). Several fNIRS studies have reported that, in comparison to a single task, attention-demanding dual-tasks (e.g., walking while talking, calculating while stepping, balancing a ball while walking, etc.) induced an increase of the PFC activation due to a greater cognitive load (Holtzer et al., 2011). For instance, Mandrick et al. (2013) investigated how an additional mental load (i.e., arithmetic task) during isometric grasping affects the PFC activation. The performance of the mental task was impaired when the motor task difficulty increased, suggesting that performing a dual-task requires more attentional resources than performing a single task.

In the last few years, the use of VR interfaces driven by natural hand movements for remote control has been growing thanks to the development of innovative optical 3-dimensional (3D) systems for gesture recognition (Erden and Çetin, 2014). The key advantage of gesture recognition technology is that no physical contact is required between the human body and the gesture recognition device, so that the subjects can move freely. One of the most recent optical 3D sensors, based on stereo-vision, is the LEAP Motion Controller® (LEAP). The LEAP is a high-resolution 3D hand-sensing device, which allows the freehand natural interaction crucial for the implementation of real-time, realistic VR systems. This low-cost, non-bulky device is highly accurate and responsive (Bachmann et al., 2014). In addition, 3D rendering technologies, including state-of-the-art displays and visors specifically designed for VR (Dodgson, 2013; Desai et al., 2014; Nan et al., 2014), have permitted a large development of VR techniques. In particular, their high visual/rendering fidelity and immersive wide field of view enable the sensation of presence and the feeling of being actually inside the virtual scene, with the possibility of multi-sensory feedback.

The aim of the present study was to investigate non-invasively by fNIRS the PFC responses in healthy subjects performing a complex hand-controlled task (HCT) in a VR environment. This task emulated the interaction with a real, remotely-driven system operating in a critical environment. The hand movements were captured by the LEAP, a high spatial and temporal resolution 3D hand-sensing device. The subjects were asked to move their right hand/forearm with the purpose of guiding a virtual ball (VB) over a virtual route (VROU). The VROU can be easily and purposely designed to replicate a real track that an operator should travel to carry out a given challenging operation. The HCT-related PFC response was monitored non-invasively by a 20-channel fNIRS system. The HCT involved the control of the hand/forearm movement, and the active interaction with the virtual environment through the hand/forearm motor actions. The allocation of attentional resources and the integration of different executive functions (e.g., coordination, planning, decision making, etc.) are needed in performing the HCT. Taking into account the well-known role played by the VLPFC and the DLPFC in motor action preparation and in the allocation of attentional resources to generate goals from current situations, it was hypothesized that either the VLPFC or the DLPFC, or both of them, would be activated bilaterally in subjects performing the HCT.

# MATERIALS AND METHODS

### Participants

Fifteen university students (all males, age: 26.6 ± 2.9 years; level of education: 14.4 ± 2.1 years), without neurological or psychiatric illness and with normal or corrected-to-normal vision, were recruited for the study. In order to prevent any gender differences in emotional responses (Matud, 2004) and in visuo-motor abilities (Wang et al., 2015), only men were enrolled. To exclude left-handed subjects, all participants completed the Edinburgh Handedness Inventory assessing hand dominance. Following a full explanation of the protocol and its non-invasiveness, and prior to the start of the experimental procedure, written informed consent was obtained from each participant. All procedures were conducted in accordance with the Declaration of Helsinki and approved by the University Ethics Committee.

### Experimental Setup

#### Hand-Controlled Task (HCT)

A VR HCT was implemented by integrating a LEAP Motion Controller® with a real-time 3D engine. The LEAP provides both a 3D hand model and real-time hand tracking information, enabling subjects to transpose their hand movements within the virtual 3D HCT (**Figure 1**). The LEAP is a small (1.3 cm × 3.2 cm × 8 cm) 3D sensor which uses two internal infra-red (IR) cameras and three IR light emitting diodes to detect objects within a dome of approximately 0.22 m<sup>3</sup> above it. Its spatial and temporal resolution is 1 mm and 15 ms, respectively. The LEAP, connected to a computer via a USB cable, is designed specifically to detect, in real-time, hand and finger motions and gestures, such as pinching fingers, closing hand, tapping, etc. This device, positioned under the palm center of the right hand at a distance of about 25 cm (**Figure 1**), was utilized to: (1) capture the movements of the hand; (2) associate hand movements to a virtual hand model; and (3) translate the movements of the virtual hand model to a set of commands in order to drive a VB within a virtual environment.

*Figure caption (fragment):* … indicates the corresponding effect on the rover. The length of the arrows indicates the force amplitude impressed by the operator. Note that, during the HCT, the hand/forearm virtual model (second column) was not shown to the operator and the visualization of the arrows (third column) was disabled.

In the present study, the adopted virtual environment was aimed at simulating the driving action of a spider-like robot similar to the one developed by the National Aeronautics and Space Administration (NASA). Due to its high stability, equilibrium and ability to change direction quickly, this robot has proven to be suited to moving in very rough environments and in environments designed for humans (e.g., going up and down stairs). The movements of the ball, in fact, reasonably simulated those performed by the considered robot (lateral, ahead, behind, and stop). The aim of the adopted HCT was to move the VB over a VROU of a fixed length (42 m; **Figure 2**), trying to travel as far as possible in a fixed time (2 min) without falling. Either in the case of the VB falling or in the case of completing a VROU before the time expired, subjects were requested to restart the VROU from the beginning. To calculate the whole distance traveled by each subject over the HCT, the completed VROUs and the distance traveled until the task end were considered. The VROUs with failure were not considered. The VROU adopted in this study (**Figure 2**) was purposely designed to reproduce a narrow road including some critical points (i.e., stairs, turns, a slippery part, and climbs). The Torque 3D Engine<sup>1</sup>, a cross-platform high performance real-time 3D engine, was used both for the editing and the rendering of the whole virtual 3D HCT. A controllable VB and a 3D VROU were created by using a customized version of Marble Motion, a well-known game<sup>2</sup>. 
In particular, the whole native source code was rewritten and enriched in order to fulfill the requirements of the task: (1) a time-driven version of the task in which both the start and the end of the HCT were established by a fixed time interval; (2) the possibility of storing information related to the operator-task interaction process (including the number of times the VB fell off the VROU, the positions of the VB falls over the VROU, and the distance traveled by the VB in the given time). The software also allowed the calculation of some dynamic parameters (followed trajectory, VB speed and acceleration). These last parameters were used to calculate the VB speed along the VROU, to get information about resting periods occurring during the HCT, and to evaluate the subject's skills while executing the HCT. Moreover, to increase the subject's concentration and motivation during the HCT, all the technical items, including elements of the VROU (textures and materials), were redefined utilizing some predefined items of the 3D engine editor (Torque 3D Editor<sup>3</sup>).
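The distance-scoring rule described above (completed routes and the final unfinished run count, failed runs do not) can be sketched as follows. This is a minimal illustration, not the authors' software: the function name, the attempt tuples and the outcome labels `"completed"`, `"fell"` and `"timeout"` are hypothetical; only the 42 m route length comes from the text.

```python
ROUTE_LENGTH_M = 42.0  # VROU length stated in the text


def total_distance(attempts, route_length=ROUTE_LENGTH_M):
    """Sum the distance credited to one subject over the 2 min task.

    `attempts` is a chronological list of (outcome, distance_m) tuples.
    The outcome labels are hypothetical:
      "completed" -> the whole route was finished (credited as route_length)
      "fell"      -> the ball fell off the route; the attempt is not credited
      "timeout"   -> the attempt was interrupted by the end of the task
    """
    total = 0.0
    for outcome, distance_m in attempts:
        if outcome == "completed":
            total += route_length
        elif outcome == "timeout":
            total += distance_m
        elif outcome != "fell":
            raise ValueError(f"unknown outcome: {outcome!r}")
    return total


# One fall, one full route, then 16 m covered when the 2 min expired.
example = [("fell", 12.5), ("completed", 42.0), ("timeout", 16.0)]
print(total_distance(example))  # 58.0
```

Under this rule a subject who falls on every attempt before the final one is credited only with the last partial run, which is why totals below the 42 m route length are possible.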

<sup>1</sup>http://www.garagegames.com/products/torque-3d

<sup>2</sup>http://mit.garagegames.com/MarbleMotion-1-0b.zip

<sup>3</sup>http://www.garagegames.com/products/torque-3d/overview/editor

The subject was asked to place his right forearm on a fixed and firm support in order to allow the hand capture (**Figure 1**). This support ensured the maintenance of the correct position of the forearm, and consequently of the hand, during the execution of the HCT. The task started with a stationary VB placed at the beginning of the VROU. The subject had to keep his right hand open over the LEAP device, with his forearm on the support, the center of the palm perpendicular to the center of the device, and all five fingers extended (**Figure 1**). At the beginning of the HCT, the subject had to guide the VB over the VROU by using four commands (**Figure 1**). The first command (hand flexion) made the VB proceed forward; the second command (hand extension) made the VB decrease its speed (until it stopped) and then proceed backward; the third and the fourth commands (counter-clockwise and clockwise rotations of the wrist) made the VB move toward the left or the right, respectively (a combined use of the hand flexion/extension movements or the rotation of the wrist made the VB stop). These hand movements had a real-time proportional impact on the VB. More specifically, the command chosen by the subject transmitted the direction and the "force" to the VB (e.g., a slight downward hand flexion corresponded to a low "force" applied to the VB in the forward direction), so that the VB speed depended on the degree of inclination of the downward flexion and on the time during which the hand was held in the same position. Thus, the HCT was purposely designed to combine the four main commands: when the subject assumed a hand pose halfway between two commands, the system merged both directions and "force" amplitudes. In this way, the subject had the feeling of guiding the VB without restrictions or constraints.
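The proportional, mergeable command scheme described above can be illustrated with a small sketch. It is not the authors' implementation: the angle conventions, the dead zone, the saturation angle and the function names are all assumptions introduced here to show how two joint angles could be turned into a single direction-plus-"force" vector.

```python
import math

DEAD_ZONE_DEG = 5.0   # hypothetical: ignore small tremor around the neutral pose
MAX_ANGLE_DEG = 45.0  # hypothetical: angle that maps to full "force"


def _axis(angle_deg):
    """Map one joint angle to a signed force component in [-1, 1]."""
    magnitude = abs(angle_deg)
    if magnitude <= DEAD_ZONE_DEG:
        return 0.0
    span = MAX_ANGLE_DEG - DEAD_ZONE_DEG
    scaled = (min(magnitude, MAX_ANGLE_DEG) - DEAD_ZONE_DEG) / span
    return math.copysign(scaled, angle_deg)


def hand_pose_to_force(flexion_deg, rotation_deg):
    """Return a (lateral, forward) force vector for the virtual ball.

    Assumed sign conventions:
      flexion_deg  > 0: downward hand flexion (forward), < 0: extension (brake/backward)
      rotation_deg > 0: clockwise wrist rotation (right), < 0: counter-clockwise (left)
    A pose halfway between two commands yields both components at once,
    which is one way to "merge" directions and force amplitudes.
    """
    return (_axis(rotation_deg), _axis(flexion_deg))


print(hand_pose_to_force(45.0, 0.0))    # full forward push
print(hand_pose_to_force(25.0, -25.0))  # half forward, half left
```

Because each component scales with the angle, holding a slightly flexed hand applies a small forward force, while a pose between flexion and rotation steers diagonally.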

The effective range of the LEAP tracking system is limited to roughly 60 cm due to the low near-infrared light intensity. In this study, the distance between the fNIRS head probe and the LEAP was always greater than 60 cm. Therefore, the LEAP should not have interfered with the fNIRS measurements. Interestingly, several fNIRS studies in combination with other near-infrared-based tracking systems such as Kinect or eye trackers have been published (Kita et al., 2010; Sukal-Moulton et al., 2014; Urakawa et al., 2015). Although those systems utilized more powerful light emitters than the LEAP, and the light emitters were pointed directly toward the fNIRS head probe, no potential interference with the fNIRS data was mentioned.

#### fNIRS Instrumentation and Data Processing

A two-wavelength continuous wave 20-channel fNIRS system (Oxymon Mk III, Artinis Medical Systems, Netherlands) was utilized to map non-invasively the changes in O2Hb and HHb over the bilateral PFC. The details of this instrumentation have been previously reported (Basso Moro et al., 2013). The O2Hb/HHb data from the 20 channels were acquired at 10 Hz. The O2Hb/HHb concentration changes (expressed in ∆µM), obtained by using the modified Beer-Lambert law and the age-dependent differential pathlength factor (4.99 + 0.067 × Age<sup>0.814</sup>), were displayed in real-time on a PC monitor. Eight optical fiber bundles (length: 3.15 m; diameter: 4.5 mm) were utilized to transport the light to the left and the right PFC (four for each hemisphere), whereas ten optical fiber bundles of the same size (five for each hemisphere) were utilized to collect the light emerging from the PFC. The illuminating and collecting bundles were assembled into a flexible probe holder, consisting of two mirror-like units (9.7 cm × 8.9 cm each) held together by three flexible junctions. In 16 out of the 20 channels the illuminator-detector distance was set at 3.5 cm, while in the remaining four channels the illuminator-detector distance was set at 1 cm (short-separation channels or SS channels). In the 16 channels, the measurement points were defined as the midpoint of the corresponding illuminator-detector pairs. The probe holder was placed over the subject's head by a Velcro brand fastener in order to get a stable optical contact with the scalp (**Figure 1**). In particular, the two frontopolar fiber bundles, collecting the light at the bottom of the holder, were centered (according to the International 10–20 system for the EEG electrode placement) on the Fp1 and Fp2 locations for the left and right hemisphere, respectively. The pressure created by the fastener was sufficient to induce a partial transient blockage of the skin circulation during the fNIRS study. 
The adopted procedure would suggest that a consistent reduction of forehead skin blood flow was occurring as a result of this approach. The Montreal Neurological Institute coordinates of the optodes and the relative 16 measurement points were calculated using a probe placement method. For the details of this procedure see Basso Moro et al. (2013). The measurement points 1, 2, 3, 9, 10, 11 corresponded to the DLPFC, which includes part of the Brodmann's Area (BA) 46; the measurement points 5, 6, 13, 14 corresponded to the frontopolar cortex, which includes part of the BA 10; and measurement points 4, 7, 8, 12, 15, 16 corresponded to the VLPFC, which includes part of the BA 45.
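Two bookkeeping items from this paragraph lend themselves to a short sketch: the age-dependent differential pathlength factor (DPF) expression and the assignment of the 16 measurement points to cortical regions. The function and variable names are illustrative assumptions; the coefficients and the point-to-region lists are taken from the text.

```python
def differential_pathlength_factor(age_years):
    """Age-dependent DPF as quoted in the text: 4.99 + 0.067 * Age^0.814."""
    return 4.99 + 0.067 * age_years ** 0.814


# Measurement points grouped by region, as listed in the text.
POINTS_BY_REGION = {
    "DLPFC (BA 46)": (1, 2, 3, 9, 10, 11),
    "frontopolar cortex (BA 10)": (5, 6, 13, 14),
    "VLPFC (BA 45)": (4, 7, 8, 12, 15, 16),
}

# Inverted lookup: measurement point -> region label.
REGION_BY_POINT = {
    point: region
    for region, points in POINTS_BY_REGION.items()
    for point in points
}

# DPF for the mean age of the sample (26.6 years), roughly 5.96.
print(round(differential_pathlength_factor(26.6), 2))
print(REGION_BY_POINT[7])
```

The lookup makes it easy to check that the four points analyzed later (7, 8, 15 and 16) all fall within the VLPFC group.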

During the data collection procedure, the fNIRS signal quality as well as the absence of movement artifacts were verified on the PC monitor. The subject's heart rate (HR) was monitored by a pulse oximeter (N-600, Nellcor, Puritan Bennett, St. Louis, MO, USA) with the sensor clipped to the index finger of the left hand. The Homer2 NIRS processing package<sup>4</sup> was employed to analyze the data. Raw intensity data in each channel were converted into optical density changes (OD). Channels showing low intensity values were excluded from further analyses. The wavelet motion correction method was employed to correct motion artifacts. Based on the method developed by Molavi and Dumont (2012), it sets to zero all wavelet detail coefficients exceeding a predefined threshold (iqr = 0.1). The modified Beer-Lambert law was then applied to convert the corrected OD data into concentration changes. A General Linear Model (GLM) approach (hmrDeconvHRF\_DriftSS) was utilized to recover the mean hemodynamic response function (HRF) for each subject and channel. The approach, which consists of adding to the design matrix the SS channel signal having the highest correlation with the analyzed standard channel signal, was able to reduce the contribution of the mean arterial blood pressure changes to the task-evoked fNIRS signal. The less restrictive set of Gaussian functions with standard deviation (SD) of 3 s and with their means separated by 2 s was chosen as temporal basis functions (spanning from 20 s before to 210 s after the start of the HCT; Gagnon et al., 2011).

<sup>4</sup>http://www.nmr.mgh.harvard.edu/PMI/resources/homer2/home.htm
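The Gaussian temporal basis mentioned above (SD of 3 s, means 2 s apart, covering the window from 20 s before to 210 s after task onset, sampled at 10 Hz) can be sketched in plain Python. This is an illustration of the idea only: it does not reproduce Homer2's `hmrDeconvHRF_DriftSS`, and the exact placement of the first and last Gaussian centers is an assumption.

```python
import math


def gaussian_basis(t_start=-20.0, t_end=210.0, spacing=2.0, sd=3.0, fs=10.0):
    """Build Gaussian temporal basis functions on a regular time grid.

    Returns (times, centers, basis), where basis[k][i] is the k-th Gaussian
    evaluated at times[i]. Centers are assumed to start at t_start.
    """
    n_samples = int(round((t_end - t_start) * fs)) + 1
    times = [t_start + i / fs for i in range(n_samples)]
    n_basis = int(math.floor((t_end - t_start) / spacing)) + 1
    centers = [t_start + k * spacing for k in range(n_basis)]
    basis = [
        [math.exp(-((t - c) ** 2) / (2.0 * sd ** 2)) for t in times]
        for c in centers
    ]
    return times, centers, basis


def weighted_sum(weights, basis):
    """Reconstruct a response estimate as a weighted sum of the basis functions."""
    return [
        sum(w * row[i] for w, row in zip(weights, basis))
        for i in range(len(basis[0]))
    ]


times, centers, basis = gaussian_basis()
print(len(times), len(centers))  # 2301 116
```

In a GLM of this kind the weights are what the regression estimates; the recovered HRF is then the weighted sum of the basis functions over the analysis window.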

#### Experimental Design

A familiarization/training phase was carried out 3 days before the study. The subjects were informed about the procedures and familiarized with both the experimental setting and the HCT. During this phase, the fNIRS probe holder was placed over the head of the subjects, who were trained to stay as still as possible to avoid movement artifacts in the fNIRS measurement during the HCT execution. After evaluating the joint mobility of their hand and wrist, the subjects were requested to pay attention to the four commands to be used for guiding the VB properly (hand flexion, hand extension, counter-clockwise and clockwise rotations of the wrist). Later, the subjects were asked to place their right hand open over the LEAP in order to verify the correctness of the estimated hand virtual model. Once this phase was completed, the subjects were asked to guide a VB over a VROU. In order to avoid a potential learning effect, a different VROU was used in this training phase. For each subject, the training phase was considered completed when he demonstrated his ability to guide the VB properly. After 3 days, at the same time of day as the training phase, the subjects participated in the study. A monetary reward was not given.

This study was carried out in a quiet and dimly lit room. The subject was asked to sit on a comfortable high-backed chair in front of a 17″ PC monitor, and to keep his forearm on the firm support with the right hand open over the LEAP (see "Hand-Controlled Task (HCT)" Section; **Figure 1**). The HCT protocol lasted 6 min. Specifically, the protocol started with a 3 min baseline, during which subjects were asked to relax (observing a white fixation cross presented on a black screen) in order to get stable fNIRS signals. Then, a stationary VB came into view on the PC monitor, and a visual instruction informed the subjects that the 2 min HCT was starting. During the HCT, the subject had to guide the VB over the VROU through the four hand movements (see "Hand-Controlled Task (HCT)" Section). When the subject failed in guiding the VB over the VROU or completed the VROU in less than 2 min, the VB was repositioned at the VROU starting point, and the route restarted. At the end of the HCT execution, there was a recovery period (1 min), in which the subject was requested to relax while observing a white fixation cross presented on a black screen. In order to evaluate the potential "state anxiety" provoked by the HCT, all the subjects completed the 20 items of the State Trait Anxiety Inventory Form Y-1 (STAI) before and after the protocol.

### Data Analysis and Statistics

The integral values of the O2Hb/HHb (INTO2Hb/HHb) changes of the HRFs were calculated from the beginning (at 0 s) until the end of the HCT (at 120 s), for each measurement point and subject, and were used as the metric for the following statistical analysis. The mean values of the HR changes (analyzed as percentage of control) were calculated from the beginning (at 0 s) until the end of the HCT (at 120 s). Both the INTO2Hb/HHb changes and the HR mean values were corrected for the baseline periods, calculated over the last 20 s before the start of the HCT. The median of the values of the distance traveled by the guided VB was calculated to subdivide the subjects into two groups: best performers (above the median) and worst performers (below the median). The subject with the median value was not included in either group. Student's t-test was conducted in order to evaluate the presence of any difference, in terms of the distance traveled by the guided VB, between the worst and best performers.
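The baseline-corrected integral described above can be sketched as a trapezoidal sum over the task window after subtracting the mean of the last 20 s before onset. The function name and the synthetic step signal are assumptions for illustration; the 0 s to 120 s window, the 20 s baseline and the 10 Hz sampling rate follow the text.

```python
from statistics import fmean


def baseline_corrected_integral(times, values, task_start=0.0,
                                task_end=120.0, baseline_len=20.0):
    """Integrate (signal - baseline mean) over [task_start, task_end].

    The baseline is the mean over the `baseline_len` seconds preceding
    task_start; the integral uses the trapezoidal rule.
    """
    baseline = fmean([
        v for t, v in zip(times, values)
        if task_start - baseline_len <= t < task_start
    ])
    task = [
        (t, v - baseline) for t, v in zip(times, values)
        if task_start <= t <= task_end
    ]
    return sum(
        0.5 * (v0 + v1) * (t1 - t0)
        for (t0, v0), (t1, v1) in zip(task, task[1:])
    )


# Synthetic signal sampled at 10 Hz: 0.5 before onset, 1.5 during the task.
times = [i / 10 for i in range(-200, 1801)]
values = [0.5 if t < 0 else 1.5 for t in times]
integral = baseline_corrected_integral(times, values)
print(round(integral, 6))  # 120.0 (a 1.0 step held for 120 s)
```

With concentration changes in µM, the resulting metric has units of µM·s; a positive value for O2Hb and a negative value for HHb would correspond to the activation pattern described in the text.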

All data were examined for normality and sphericity using Shapiro–Wilk and Mauchly's Sphericity tests, respectively. Each level of the independent variables followed a normal or approximately normal distribution in all the dependent variables (O2Hb, HHb and HR), permitting the use of parametric statistical analyses. When the sphericity was not assumed, the Greenhouse-Geisser correction was utilized.

In order to investigate the PFC activation in response to the HCT, a two-way analysis of variance (ANOVA) was applied to the INTO2Hb/HHb changes. The ANOVA included two factors: measurement point (16 levels) and cortical hemodynamic response (CHR; i.e., corrected task period vs. zero; 2 levels). To control for multiple significance tests, Fisher's least significant difference adjustment was applied. A series of one-way ANOVAs was performed for the HCT in order to evaluate the influence of the CHR (2 levels) on the INTO2Hb/HHb changes. In particular, the series of one-way ANOVAs was performed only for the INTO2Hb/HHb changes related to the measurement points 7, 8, 15 and 16, chosen as descriptive measurement points of the hemodynamic response. For the HCT, Pearson's correlation coefficient was calculated in order to evaluate the relation between the distance traveled by the guided VB and the INTO2Hb/HHb changes in the 7, 8, 15 and 16 measurement points. A one-way ANOVA was performed for the HCT in order to evaluate the influence of the CHR (2 levels) on the HR changes. Pearson's correlation coefficient was also calculated for evaluating the relation between the distance traveled by the guided VB and the HR. Student's t-tests were conducted in order to evaluate the presence of any difference in: (1) the anxiety state before and after the protocol; (2) INTO2Hb/HHb changes in the 7, 8, 15 and 16 measurement points between the worst performers and best performers; and (3) the distance traveled by the guided VB of worst and best performers.
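One way to read the "CHR (corrected task period vs. zero; 2 levels)" factor for a single measurement point is as a within-subject comparison of each subject's integral against zero. Under that reading (an interpretation, not a statement of what SPSS computed for the authors), the test is equivalent to a one-sample t-test with F(1, n − 1) = t². The sketch below uses made-up values purely to show the arithmetic.

```python
import math
from statistics import fmean, stdev


def one_sample_t_and_f(values):
    """Test whether the mean of `values` differs from zero.

    Returns (t, F, df) where F = t**2 has (1, df) degrees of freedom,
    the form in which a two-level within-subject contrast against a
    constant is usually reported.
    """
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)
    t = fmean(values) / standard_error
    return t, t * t, n - 1


# Hypothetical integrals for three subjects (illustration only).
t_stat, f_stat, df = one_sample_t_and_f([1.0, 2.0, 3.0])
print(round(f_stat, 3), df)  # 12.0 2
```

With the 15 subjects of this study the denominator degrees of freedom would be 14, matching the F(1,14) form used for the per-point results.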

All statistical analyses were conducted with SPSS 20.0 (SPSS Inc., Chicago, IL, USA). Data were expressed as mean ± SD. The criterion for significance was p < 0.05.

# RESULTS

The behavioral data analysis revealed the following main results. There was no significant difference (t = 0.64, p = 0.53) in the anxiety state before (28.8 ± 6.4) and after the protocol (28.9 ± 5.8). The distance traveled by the guided VB in the 15 subjects was: 21, 26, 36, 43, 51, 53, 56, 58, 61, 83, 88, 96, 101, 132, and 148 m, respectively. The median value was 58 m. The mean distance was 70.2 ± 37.2 m. The less skilled subjects failed several times in guiding the VB over the VROU. Therefore, the distance traveled by their guided VB was shorter. Indeed, the distance traveled by the guided VB was significantly different (t = −4.89, p < 0.001) between the worst performers (40.9 ± 13.7 m) and best performers (101.3 ± 29.7 m). The fNIRS data evidenced a heterogeneous O2Hb/HHb response over the mapped cortical area in the subjects while performing the HCT (**Figure 3**). In particular, from the beginning of the HCT, a progressive O2Hb increase and a concomitant progressive HHb decrease were observed in the measurement points 7, 8, 15 and 16, corresponding to the VLPFC, which includes part of the BA 45. About 15 s after the end of the HCT, a gradual return of O2Hb/HHb to the corresponding baseline values was observed. This delay is reasonable considering that the cerebral blood flow increase lasts over the period of the HCT. The statistical analysis revealed the following main results. The two-way ANOVA analysis, carried out on the INTO2Hb changes, revealed a significant main effect of: (1) the measurement point (F(2.67,37.47) = 9.52, p < 0.001), and (2) the measurement point × CHR interaction (F(2.67,37.47) = 9.52, p < 0.001). The two-way ANOVA analysis, carried out on the INTHHb changes, revealed a significant main effect of the: (1) measurement point (F(2.83,39.67) = 15.70, p < 0.001); (2) CHR (F(1.00,14.00) = 13.83, p = 0.002); and (3) measurement point × CHR interaction (F(2.83,39.67) = 15.70, p < 0.001). 
The two-way ANOVAs, carried out on the INTO2Hb/HHb changes, revealed the main significant differences between the measurement points 7, 8, 15, 16 and all the others (ps < 0.05). The series of one-way ANOVAs, carried out on the INTO2Hb changes of the measurement points 7, 8, 15, and 16, revealed a significant activation in all the measurement points (F(1,14) = 5.32, p = 0.037; F(1,14) = 23.79, p < 0.001; F(1,14) = 7.33, p = 0.017; F(1,14) = 16.89, p = 0.001). The series of one-way ANOVAs carried out on the INTHHb changes of the measurement points 7, 8, 15, and 16 revealed a significant cortical activation in all the measurement points (F(1,14) = 41.61, p < 0.001; F(1,14) = 58.57, p < 0.001; F(1,14) = 19.12, p = 0.001; F(1,14) = 26.23, p < 0.001). In the HCT, no correlation was found between the distance traveled by the guided VB and the corresponding INTO2Hb/HHb changes (ps > 0.05). During the HCT, no differences (ps > 0.05) were found in the INTO2Hb/HHb changes in the 7, 8, 15 and 16 measurement points between the worst performers and best performers. The one-way ANOVA analysis for the HR mean values revealed a significant main effect of the CHR (F(2.88,40.40) = 8.06, p < 0.001). However, the mean values of the HR changes during the execution of the HCT increased by only about 15% with respect to the mean value of the baseline. No correlation was found between the HR and the distance traveled by the guided VB.
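The behavioral summary statistics can be re-derived from the 15 distances listed in the text. The sketch below assumes sample standard deviations and a pooled-variance (Student's) two-sample t statistic; the helper name `pooled_t` is illustrative, and the p-value is not computed here.

```python
import math
from statistics import mean, median, stdev

# Distances (m) traveled by the guided VB, as listed in the text.
distances = [21, 26, 36, 43, 51, 53, 56, 58, 61, 83, 88, 96, 101, 132, 148]

med = median(distances)
worst = [d for d in distances if d < med]  # below the median
best = [d for d in distances if d > med]   # above the median; median subject excluded


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))


print(round(mean(distances), 1), round(stdev(distances), 1))  # 70.2 37.2
print(med)                                                    # 58
print(round(mean(worst), 1), round(stdev(worst), 1))          # 40.9 13.7
print(round(mean(best), 1), round(stdev(best), 1))            # 101.3 29.7
print(round(pooled_t(worst, best), 2))                        # -4.89
```

These values agree with the reported mean, median, group means and t statistic, with 12 degrees of freedom for the group comparison (7 subjects per group).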

# DISCUSSION

In this feasibility study, the bilateral PFC was investigated by a multi-channel fNIRS system while subjects performed a demanding VR HCT, a remotely-driven operation simulated by a high-resolution and low-cost 3D hand-sensing device. The observed involvement of the bilateral VLPFC supports the formulated hypothesis.

The results of the present study have indicated a consistent bilateral VLPFC activation (measured as an O2Hb increase and a concomitant HHb decrease) in response to the execution of the VR HCT (**Figure 3**). It has been reported that the VLPFC is involved in associating visual information with motor responses (Tanji and Hoshi, 2008). In fact, the execution of the adopted HCT requires the combination of contextual visual information in order to coordinate the hand/forearm movements for guiding the VB over the VROU. Although the tested subjects had the same age and level of education (University students), their skills in performing the HCT were different. Indeed, the distance traveled by the VB was heterogeneous: the distance traveled by the VB guided by the best performer was about seven times longer than the distance traveled by the VB guided by the worst performer. These results confirm that the designed VROU (**Figure 2**) was demanding. The diverse skills of the subjects could not be attributed to emotional factors, because no difference between the anxiety state before and after the HCT, and no correlation between the distance traveled by the guided VB and the HR of the subjects, were found. However, a bilateral VLPFC activation was observed indiscriminately in all the tested subjects, including those who never completed a single VROU. Therefore, the present fNIRS data did not make it possible to discriminate the subjects according to their performance. This could be partly explained by the fact that other cortical areas were not investigated in this study and that subcortical areas and/or cortical-subcortical networks are supposed to be responsible for the differences between the best and the worst performers. As is well known, the PFC is involved in the executive functions (e.g., attention, coordination, planning, decision making, etc.), which are required to perform the HCT regardless of the performer's skills. 
The combined use of an fNIRS-EEG system has evidenced a greater involvement of the deeper structures (e.g., hippocampus) in the "good performers group" compared with the "bad performers group" while executing a spatial navigation task (Kober et al., 2013). Moreover, it has been reported that a complex sensorimotor-cognitive task, such as the adopted HCT, would require the involvement of different cortical-subcortical networks including: PFC, spinal cord, brainstem, cerebellum, basal ganglia, and motor cortex (Takakusaki, 2008). This suggests that the degree of the cognitive demand (measured by fNIRS), required for executing the HCT, is not associated with the subjects' performance (measured as the distance traveled by the guided VB). This in part confirms the results of other VR fNIRS studies in which a dissociation between the mental work demanded to execute a complex task and the performance output was observed (Ayaz et al., 2012a; Boyer et al., 2015). In the present study, this dissociation could also be explained by the fact that the tested subjects had only a short familiarization/training phase. A higher activation of the PFC was usually observed in non-expert subjects while executing a novel VR task compared with experts (Ayaz et al., 2012b). Non-expertise requires the employment of more attentional resources to perform a novel task. On the contrary, expertise implies an increase of automaticity and does not require the same high level of attention and control (Ayaz et al., 2012b). The VLPFC activation, observed in the present study in non-expert subjects while performing the novel VR HCT, suggests that some of the well-known executive functions (e.g., attention, coordination, planning, decision making, etc.) are required in the learning phase. It could therefore be supposed that the amplitude of the observed VLPFC activation would become lower or even disappear in subjects very familiar with the VR HCT.

Several studies evidenced the advantages of using fNIRS technology for investigating non-invasively cortical responses in subjects while performing different VR tasks; the most representative studies are listed in **Table 1**. The common relevant finding is represented by the activation of different regions of the frontal cortex and the PFC. For example, the involvement of the medial PFC (mPFC) and the frontopolar cortex (Ayaz et al., 2012b) and the inferior frontal gyrus (Harrison et al., 2014) was observed in subjects while executing air traffic control tasks; an activation of the overall frontal cortex was observed in subjects while performing a train piloting task (Kojima et al., 2005). Very recently, the usefulness of fNIRS as a tool to conduct driving research has been nicely reviewed (Liu et al., 2015). For example, an activation of the right PFC (Tomioka et al., 2009) and an activation of the overall PFC (Tsunashima and Yanagisawa, 2009) were found in subjects while executing different simulated car driving tasks. However, in all of the above reported studies, no activation of the VLPFC, and in particular of the BA 45, was found.

#### TABLE 1 | Selected fNIRS studies about the effects of VR tasks on different cortical areas.


ACC, adaptive cruise control; AD, Alzheimer's disease; ATC, anterior temporal cortex; BD, brain damage patients; BLK, blocked order; BP, bad performers; Ch, number of channels; CS, chronic stroke patient; D, Device; D1, Imager 1000 (fNIR Devices, USA); D2, NIRO-200 (Hamamatsu Photonics, Japan); D3, CW6 (TechEn, USA); D4, ETG-4000 (Hitachi, Japan); D5, OMM-3000 (Shimadzu, Japan); D6, FOIRE-3000 (Shimadzu, Japan); D7, NIRO-500 (Hamamatsu Photonics, Japan); D8, Wireless prototype (Zurich University, Switzerland); D9, Imagent (ISS, USA); D10, OEG-16 (Spectratech Inc., Japan); DLPFC, dorsolateral prefrontal cortex; FC, frontal cortex; FP, frontopolar; GP, good performers; IFG, inferior frontal gyrus; IS, intraparietal sulcus; (L), left; M1, primary motor cortex; MC, motor cortex; MTG, middle temporal gyrus; NA, not available; OC, occipital cortex; PC, parietal cortex; PFC, prefrontal cortex; PMC, premotor cortex; (R), right; SC, somatosensory cortex; SMA, supplemental motor area; STG, superior temporal gyrus; TC, temporal cortex; VR, virtual reality; <sup>∗</sup>fNIRS-fMRI study; <sup>∗∗</sup>combined fNIRS-EEG study.

The combined use of fNIRS-EEG would be an ideal tool for carrying out studies in the field of neuroergonomics. The pros of lightweight, high-density EEG and fNIRS recording to study natural human cognition have been previously reviewed (Gramann et al., 2011, 2014). Simultaneous fNIRS-EEG measurements offer complementary functional information about neuronal activity and hemodynamic changes in order to provide a wider perspective on different aspects of the cortical processes. To the best of our knowledge, Kober et al. (2013) were the first to compare neuronal responses of good and bad navigators during a VR spatial navigation task by a combined fNIRS-EEG system. The commercial fNIRS systems, utilized both by Kober et al. (2013) and in the present study, are equipped with fiber optic bundles. The disadvantage of using fiber optic bundles is that the fibers are often heavy and of limited flexibility. Therefore, this kind of fNIRS instrumentation is not the best choice for studies in the neuroergonomics field. Since 2009, different battery operated multi-channel wearable/wireless fNIRS systems have been commercialized (Scholkmann et al., 2014). A 64-measurement point wireless fNIRS system was developed and integrated with simultaneous EEG and electrocardiography (ECG) monitoring in order to record data up to several days (Zhang et al., 2014). These most advanced versions of integrated fNIRS-EEG systems represent a suitable tool for evaluating brain activation in response to cognitive tasks executed in normal daily activities. To make fNIRS technology more suitable for neuroergonomics studies, in terms of robustness, mobility, user-friendliness and customization, a very recent and successful effort has been made to realize a dedicated fNIRS device (von Lühmann et al., 2015). Future hardware developments will increase the production of miniaturized wireless integrated fNIRS-EEG devices and their use in neuroergonomics. 
Notably, a 16-measurement point wireless fNIRS system has recently been coupled with transcranial direct current stimulation (tDCS) in order to investigate the effects of tDCS on spatial working memory (McKendrick et al., 2015). These authors have suggested the utility of combining simultaneous tDCS and fNIRS techniques for future applications in the field of neuroergonomics: from enhanced/accelerated learning and training of complex human-machine systems to optimization of task load for improved safety and productivity. Neuroergonomics approaches can also provide sensitive and reliable assessment of mental workload in complex tasks and naturalistic work settings (Parasuraman, 2011). Mental workload has been defined as the degree of effort to be made by the brain to meet the task demands (Young et al., 2015). Recently, Peck et al. (2014) have reviewed the use of fNIRS to measure mental workload in real-world tasks, and the different approaches for automatic detection of the workload.

The strengths and the limitations of the fNIRS technique have been previously discussed in detail (for review, see Scholkmann et al., 2014). The fNIRS equipment is transportable, completely safe and non-invasive. These advantages allow for the investigation of brain activity in natural conditions (e.g., while sitting on a chair) and during daily life activities (e.g., standing and/or walking). Therefore, with respect to other functional neuroimaging methods such as fMRI, fNIRS represents a useful tool for neuroergonomics research (for review, see Ayaz et al., 2013; Derosière et al., 2013), and for studies in other fields of neuroscience such as brain-computer interface (for review, see Naseer and Hong, 2015), human-robot interaction (for review, see Canning and Scheutz, 2013), and cognitive states measurements (for review, see Strait and Scheutz, 2014). In addition, very recently the integration of fNIRS with a wearable technology, such as Google Glass, has been demonstrated (Afergan et al., 2015). For an adequate understanding of the current findings, some limitations should be pointed out: (1) this study was conducted in a small sample of healthy young adult male subjects, and the subjective cognitive load was not assessed with the NASA Task Load Index; (2) the duration of the adopted task and the length of the route were relatively short, hence, the effect of a longer duration of the VR HCT on the PFC hemodynamic response remains unknown; (3) this study did not include a control session (for example, one including or not including the motor task, with or without VR); (4) this study did not contemplate repeated trials on separate days in order to verify the reproducibility and the potential learning effect of the HCT; (5) the limited number of measurement points (16) made possible by the utilized fNIRS system did not allow the investigation of the supposed connectivity between the PFC and other cortical areas (e.g., premotor and motor cortices) likely involved in performing the HCT; and (6) this study did not take into account the impact of the variability of the skull thickness amongst the 16 measurement points within subject and amongst subjects. This anatomical variability could be examined by acquiring structural T1-weighted MRI scans from each subject.

# CONCLUSION

The results of the present study confirm the promising application of fNIRS technology to objectively evaluate cortical hemodynamic changes occurring in VR environments. The ongoing development of fNIRS technology, aimed at delivering more dedicated, sophisticated and wireless devices, together with the most advanced VR solutions, could provide the best combined approach for monitoring operators' training and assessing mental work. Future studies could contribute to a better understanding of the cognitive mechanisms underlying human performance in both expert and non-expert operators.

# AUTHOR CONTRIBUTIONS

Study design and protocols were conceived by MC, AP, SL, SBM, MS, MF, GP and VQ. Data collection was performed by MC, AP, SL, SBM, MS, MF, GP and VQ. Data analysis was performed by MC, AP, SL, SBM, SB, MS, MF, GP and VQ. The manuscript was written by MC, AP, SL, SBM, SB, MS, MF, GP and VQ.

# ACKNOWLEDGMENTS

The study has been performed in the framework of the "Interdepartmental Research Centre for Molecular Diagnostics and Advanced Therapies". This work was supported in part by: (1) the 2014 grant from the "Fondazione Cassa di Risparmio della Provincia dell'Aquila", and (2) the "Abruzzo Earthquake Relief Fund" (Toronto, ON; purchase of the Artinis system). The authors wish to thank Dr. Simone Cutini, "Department of Developmental Psychology and Socialization", University of Padua, Italy for helping in the identification of the relative Brodmann's Areas, and Dr. Danilo Avola, "Department of Mathematics and Informatics", University of Udine, Italy for HCT implementation.

# REFERENCES

Afergan, D., Hincks, S. W., Shibata, T., and Jacob, R. J. K. (2015). Phylter: a system for modulating notifications in wearables using physiological sensing. Lect. Notes Comput. Sc. 9183, 167–177. doi: 10.1007/978-3-319-20816-9\_17

Ayaz, H., Cakir, M. P., Izzetoglu, K., Curtin, A., Shewokis, P. A., Bunce, S., et al. (2012a). "Monitoring expertise development during simulated UAV piloting tasks using optical brain imaging," in Proceedings of the IEEE Aerospace Conference (Big Sky, MT), 1–11.


and near-infrared spectroscopy study. PLoS One 5:e11050. doi: 10.1371/journal.pone.0011050


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Carrieri, Petracca, Lancia, Basso Moro, Brigadoi, Spezialetti, Ferrari, Placidi and Quaresima. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Into the Wild: Neuroergonomic Differentiation of Hand-Held and Augmented Reality Wearable Displays during Outdoor Navigation with Functional Near Infrared Spectroscopy

Ryan McKendrick<sup>1</sup>\*, Raja Parasuraman<sup>1</sup>, Rabia Murtza<sup>1</sup>, Alice Formwalt<sup>1</sup>, Wendy Baccus<sup>1</sup>, Martin Paczynski<sup>1</sup> and Hasan Ayaz<sup>2,3,4</sup>\*

<sup>1</sup> Psychology Department, Human Factors and Applied Cognition, George Mason University, Fairfax, VA, USA, <sup>2</sup> School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA, <sup>3</sup> Department of Family and Community Health, University of Pennsylvania, Philadelphia, PA, USA, <sup>4</sup> Division of General Pediatrics, Children's Hospital of Philadelphia, Philadelphia, PA, USA

#### Edited by:

Richard A. P. Roche, National University of Ireland, Maynooth, Ireland

#### Reviewed by:

Sean Commins, National University of Ireland, Maynooth, Ireland Antonia Hamilton, University of Nottingham, UK

#### \*Correspondence:

Ryan McKendrick rmckend2@gmu.edu; Hasan Ayaz hasan.ayaz@drexel.edu

Received: 13 December 2015 Accepted: 26 April 2016 Published: 18 May 2016

#### Citation:

McKendrick R, Parasuraman R, Murtza R, Formwalt A, Baccus W, Paczynski M and Ayaz H (2016) Into the Wild: Neuroergonomic Differentiation of Hand-Held and Augmented Reality Wearable Displays during Outdoor Navigation with Functional Near Infrared Spectroscopy. Front. Hum. Neurosci. 10:216. doi: 10.3389/fnhum.2016.00216

Highly mobile computing devices promise to improve quality of life, productivity, and performance. Increased situation awareness and reduced mental workload are two potential means by which this can be accomplished. However, it is difficult to measure these concepts in the "wild". We employed ultra-portable battery operated and wireless functional near infrared spectroscopy (fNIRS) to non-invasively measure hemodynamic changes in the brain's prefrontal cortex (PFC). Measurements were taken during navigation of a college campus with either a hand-held display, or an augmented reality wearable display (ARWD). Hemodynamic measures were also paired with secondary tasks of visual perception and auditory working memory to provide behavioral assessment of situation awareness and mental workload. Navigating with an augmented reality wearable display produced the least workload during the auditory working memory task, and a trend for improved situation awareness in our measures of prefrontal hemodynamics. The hemodynamics associated with errors were also different between the two devices. Errors with an augmented reality wearable display were associated with increased prefrontal activity and the opposite was observed for the hand-held display. This suggests that the cognitive mechanisms underlying errors between the two devices differ. These findings show fNIRS is a valuable tool for assessing new technology in ecologically valid settings and that ARWDs offer benefits with regards to mental workload while navigating, and potentially superior situation awareness with improved display design.

Keywords: fNIRS, situation awareness, mental workload, spatial navigation, working memory, head-mounted display, neuroergonomics

# INTRODUCTION

The availability and use of highly mobile computing devices is increasing. Examples include fitness trackers, smartwatches, and smartphones; however, there are also devices such as Google Glass, Oculus Rift and Microsoft HoloLens which promise not just mobile computing but the coexistence of real world objects with supplementary computer generated objects (i.e., augmented reality; Azuma et al., 2001). Augmented reality wearable displays (ARWD) are already being put into service by the National Aeronautics and Space Administration (NASA). It is believed that these devices will help astronauts on the International Space Station improve their training and performance in highly demanding situations (Schierholz et al., 2015). While it is clear that having a hands-free display can improve physical ergonomics, especially when both hands are required for adequate task execution, ARWDs could also enhance cognitive ergonomics through augmentation of mental workload and situation awareness.

Ideal task performance is dependent on optimizing mental workload. Mental workload refers to the limited information processing capacity of the brain that is demanded by a task (Parasuraman et al., 2008). When demands exceed the brain's maximum information processing capacity, further increases in mental workload lead to ever increasing decrements in performance (Hancock and Parasuraman, 1992). This can be realized as incorrect responses, missed responses or even the "shedding" of secondary tasks (Wickens et al., 2013). ARWDs have the potential to reduce mental workload by reducing the distance and time between visual fixations. Reducing fixation time and distance could reduce the amount of information that needs to be held in working memory. For example, during simulated emergency braking, drivers using Google Glass to send text messages experienced less mental workload relative to drivers using a smartphone (Sawyer et al., 2014). ARWDs have also been used to improve operator comfort and procedure efficiency during cardiac surgery (Opolski et al., 2015). An ARWD allowed cardiologists to view reconstructed tomographic images while performing catheterization, improving landmark visualization and verification of surgical tools.

Situation awareness, the perception of critical information (stage 1), comprehension of its meaning (stage 2), and the projection of this information into the future (stage 3; Endsley, 1995a), is also critical for complex task performance (Wickens et al., 2013). High situation awareness, while not guaranteeing successful performance, increases the probability of successful performance. Like mental workload, situation awareness is dependent on working memory and highly dependent on attention (Endsley, 1995a). In this regard ARWDs have the potential to both enhance and degrade situation awareness. ARWDs may enhance situation awareness by freeing up working memory capacity. Conversely, ARWDs may reduce situation awareness through degradation of divided attention. Divided attention relates to the optimal allocation of attention to different inputs by splitting or rapidly shifting the focus of attention (Parasuraman, 1998). The compellingness of ARWD symbology is more likely to exogenously capture the focus of attention and hold it (Thomas and Wickens, 2001, 2004). This results in increased focused attention to display elements, and reduced or eliminated attention to task relevant information outside of the ARWD display. This phenomenon of increased focused attention to a display coinciding with decreased divided attention to an external scene is referred to as cognitive tunneling (Fischer et al., 1980). Cognitive tunneling is often implicated in aviation studies where a failure to perceive and act on an unexpected event reduces performance (Crawford and Neal, 2006).

Measurement of situation awareness and mental workload in ARWDs is problematic. Traditionally situation awareness and workload are assessed with questionnaires administered during artificial pauses (Situation Awareness Global Assessment Technique (SAGAT); Endsley, 1995b), with in-task probes (Situation Present Assessment Measure (SPAM); Durso and Dattel, 2004), or upon task completion (NASA Task Load Index (TLX); Hart and Staveland, 1988). Within dynamic environments such assessments can be intrusive, thereby reducing ecological validity, or can underrepresent time critical signals, such as abrupt changes in workload. Workload can also be objectively assessed via dual-task secondary task decrements. In the dual-task paradigm, interference on a cognitive process is anticipated between the primary task and the secondary task. This results in a decrement in performance on the secondary task, due primarily to the mental resource demands of the secondary task exceeding the mental resources that can be allocated. This secondary task performance decrement can be used as an index of the cognitive workload required of the primary task (Gopher, 1993; Wickens, 2008; Wickens et al., 2013). However, dual-task decrements have been criticized with regard to circularity, as performance varies with resource allocation, but resources are only inferred from performance (Navon, 1984).

An objective, non-invasive, motion artifact robust and portable method is needed to measure situation awareness and mental workload in ARWDs. Functional near infrared spectroscopy (fNIRS) provides an attractive method for continuous monitoring of brain dynamics in both seated and mobile participants (Ayaz et al., 2013). fNIRS is safe, highly portable, user-friendly and relatively inexpensive, with rapid application times and near-zero run-time costs (Villringer and Chance, 1997; Ayaz et al., 2012a; Ferrari and Quaresima, 2012). fNIRS uses specific wavelengths of light to provide measures of cerebral oxygenated and deoxygenated hemoglobin that are correlated with the blood-oxygen-level dependent (BOLD) contrast used in functional magnetic resonance imaging (fMRI; Cui et al., 2011; Sato et al., 2013). Importantly fNIRS measurements are objective and non-invasive to the mental task being measured. fNIRS for mobile neural measurement is also relatively robust to motion artifacts and allows wearable sensors to be physically untethered to the acquisition module (Ayaz et al., 2013; McKendrick et al., 2015). Mobile fNIRS allows for a freedom of movement not previously possible in neuroimaging, providing the opportunity to monitor mental workload and situation awareness in dynamic mobile tasks.

FIGURE 1 | Map depicting the four routes followed by participants. Exact routes depicted in red, white arrows indicate walking direction. Image © 2015 DigitalGlobe.

Hemodynamic indexes of mental workload as used by fNIRS and fMRI assume that activity-related metabolic changes in specific functional brain regions are useful indexes of mental workload. Prefrontal cortex (PFC) is commonly monitored due to its functional relationship with working memory (Braver et al., 1997; Cohen et al., 1997), decision making (Ramnani and Owen, 2004; Figner et al., 2010), and executive control (Badre et al., 2005; Badre and Wagner, 2007). A growing body of research has found fNIRS hemodynamic measurements of PFC to be a useful index of mental workload in a number of complex cognitive and real world tasks (Ayaz et al., 2011, 2012b; Abibullaev and An, 2012; Naseer and Keum-Shik, 2013; Bogler et al., 2014; Derosière et al., 2014; Herff et al., 2014; Schudlo and Chau, 2014; Pinti et al., 2015; Solovey et al., 2015). Divided attention has also been associated with activity in PFC (Corbetta et al., 1991; Herath et al., 2001; Loose et al., 2003; Fagioli and Macaluso, 2009; Mizuno et al., 2012). Divided attention is a key component of dual-tasking (Pashler, 1994), and superior dual-tasking has been associated with decreased activity/more efficient processing in PFC (Rypma et al., 2002; Grabner et al., 2006; McKendrick et al., 2014). Reduced demands on working memory capacity and superior dual-tasking are factors that influence greater situation awareness (Endsley, 1995a). Therefore, reduced PFC activity may be indicative of greater situation awareness during ARWD use.

#### TABLE 1 | List of situation awareness queries.


The present study implemented a neuroergonomics approach (Parasuraman, 2003) to examine the cognitive differences between an ARWD (Google Glass) and a handheld display (Smartphone). We used mobile fNIRS to monitor lateral PFC and complemented it with two separate secondary tasks assessing differences in mental workload and situation awareness during navigation. Superior performance on the secondary tasks is anticipated to reflect reduced mental workload and greater situation awareness, respectively. Reduced PFC activity is anticipated to index reduced mental workload and improved situation awareness in the absence of secondary task errors. Specifically, the ARWD was expected to show reduced mental workload and superior situation awareness across both behavioral and hemodynamic indices.

## MATERIALS AND METHODS

## Participants

Twenty participants (12 female adults) volunteered for the study. All participants were right-handed and aged 18–29 years. Each participant was randomly assigned to one of two experimental groups. The two experimental groups each contained 10 participants. Participants who experienced complications viewing the Google Glass display were moved into the other experimental condition (two such complications occurred). All participants reported normal or corrected-to-normal vision. All participants also reported average or greater cardiovascular health, and had no history of cardiovascular abnormalities. Each participant gave informed consent via a form approved by the George Mason University Institutional Review Board prior to study participation.

# Primary Task

#### Route Following

Participants were given a visual map of a route to walk along. The visual map was generated via Google Maps and presented to the participant via an Apple iPhone 4s (which the participant held in hand) or Google Glass (affixed to the participant's head). On both devices Google Maps presents a bird's-eye view of the route with a digital arrow indicating the direction to be followed, as well as written turn-by-turn instructions. Google Maps also provides auditory turn-by-turn instructions, but these were muted on both devices. Four different routes were used (route one = 1500 ft, route two = 1400 ft, route three = 1600 ft, route four = 2000 ft; **Figure 1**) and each participant walked all four routes; total experiment time was between 45 and 60 min. The route following took place on a North American college campus. Portions of the routes were familiar to the participants; however, the majority were unfamiliar and selected specifically because these regions are not frequented by university undergraduates. The routes also contained portions that simulated urban and rural environments. Each route was entered into either device by the experimenter. Participants in the hand-held device (Smartphone; Apple iPhone 4s) group were asked to hold the device in their right hand and lift the device near their field of view when confirmation of the correct route was needed (to avoid excessive motion artifacts in the fNIRS signal from tilting the head down). Participants in the ARWD (Google Glass) group were instructed to keep their right index finger on the Google Glass touchpad. This was done to ensure that Google Glass did not enter "sleep mode" during route following and to control for physical load in the right arm across devices. Once the route navigation began participants had no interaction with the devices other than viewing the generated route. Participants were instructed to walk at the pace they felt most comfortable with. 
This was done to minimize variability in the physical load of the walking task via self-adaptation. If errors were made during route following, participants were tapped on the shoulder and instructed as to the correct direction of the route. Only two such errors occurred throughout the experiment, one in each display group on the same navigation route; both were related to a poor GPS signal.

# Secondary Tasks

#### Auditory 1-Back

While following the route, participants simultaneously completed 37 blocks of an auditory 1-back lasting 60 s each. The auditory stimuli consisted of tone triplets randomly composed from fundamental frequencies of 493.88, 554.36, 698.45 and 880 Hz presented via Bluetooth in-ear headphones. The tones were created from bandpass filtered white noise and a tone overlay. The triplets were presented randomly in one of three spatial locations; left, right and central (balanced sound distribution). Five triplets were presented for each block. Participants were asked to compare the triplet they had just heard to the triplet they had previously heard. If the two triplets were of the same frequencies presented in the same sequence, then the trial was considered a match. At the end of a block participants were prompted by the experimenter to verbally indicate how many matches they heard. The experimenter recorded the response within the program administering the auditory task and participants were immediately given feedback regarding the accuracy of their response. An fNIRS measurement block began with each 1-back block and ended just prior to the participant being prompted to respond.
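The scoring rule of the auditory 1-back can be made concrete with a short sketch. The Python code below is a hypothetical illustration, not the program used in the study: the names `make_block`, `count_matches` and `is_correct` are invented here, triplets and locations are drawn uniformly at random (the study's actual sampling scheme is not described), and the spatial location is ignored when deciding on a match because the text defines a match only by the frequencies and their order.

```python
import random

# Fundamental frequencies (Hz) of the tones, as given in the text.
FREQUENCIES = (493.88, 554.36, 698.45, 880.0)
LOCATIONS = ("left", "right", "central")
TRIPLETS_PER_BLOCK = 5


def make_block(rng, n_triplets=TRIPLETS_PER_BLOCK):
    """Draw a block of (triplet, location) pairs uniformly at random (assumption)."""
    return [
        (tuple(rng.choice(FREQUENCIES) for _ in range(3)), rng.choice(LOCATIONS))
        for _ in range(n_triplets)
    ]


def count_matches(block):
    """Count triplets that repeat the previous triplet's tones in the same order."""
    triplets = [triplet for triplet, _location in block]
    return sum(prev == cur for prev, cur in zip(triplets, triplets[1:]))


def is_correct(block, reported):
    """Check the match count a participant reports at the end of a block."""
    return reported == count_matches(block)


# Hand-built example block: the second and fourth triplets repeat their predecessor.
a = (493.88, 554.36, 880.0)
b = (698.45, 698.45, 493.88)
example_block = [(a, "left"), (a, "right"), (b, "central"), (b, "central"), (a, "left")]
example_matches = count_matches(example_block)
random_block = make_block(random.Random(0))
```

With uniform sampling, matches are rare (there are 64 possible triplets), so an actual implementation would presumably control how often a triplet repeats; that detail is not specified in the text.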

#### Scenery Probe

While route following, participants were also asked 10 questions about their surroundings to assess and help maintain an accurate awareness of the environment. After a prompt from the experimenter to be ''situationally aware'', participants maintained this search disposition for approximately 30 s after which the experimenter asked them to stop moving and face forward. During this time the experimenter queried whether the participant had seen a particular object in the environment. The participant was previously informed to respond verbally with a response of either ''yes'' or ''no''. Queried objects could either have been present in the environment or not present, and there were six instances where the queried object was present and four where it was not. When the object was present in the environment the participant was stopped and queried 5 s after the object was no longer visible. Participants were given immediate feedback regarding the accuracy of their responses. The query list is presented in **Table 1**. An fNIRS measurement trial began when participants were prompted to be situationally aware and ended when the participant was asked to stop walking just before the scenery probe query.
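Because each scenery probe has a yes/no answer and a known ground truth (six objects present, four absent), the responses can be tallied as in the sketch below. This is only an illustration under stated assumptions: the text does not describe how the probe responses were scored, the hit/miss/false alarm/correct rejection breakdown is borrowed from standard detection terminology, and both the function name `score_probes` and the example answers are hypothetical.

```python
def score_probes(object_present, answered_yes):
    """Tally yes/no probe answers against whether the queried object was present."""
    if not object_present or len(object_present) != len(answered_yes):
        raise ValueError("need one answer per probe and at least one probe")
    counts = {"hit": 0, "miss": 0, "false_alarm": 0, "correct_rejection": 0}
    for present, yes in zip(object_present, answered_yes):
        if present:
            counts["hit" if yes else "miss"] += 1
        else:
            counts["false_alarm" if yes else "correct_rejection"] += 1
    correct = counts["hit"] + counts["correct_rejection"]
    counts["accuracy"] = correct / len(object_present)
    return counts


# Ground truth from the text: six probes with the object present, four without.
present = [True] * 6 + [False] * 4
# Hypothetical answers from one participant (not data from the study).
answers = [True, True, True, True, False, True, False, False, True, False]
result = score_probes(present, answers)
```

Keeping the four response types separate, rather than reporting accuracy alone, makes it possible to tell a participant who misses objects from one who tends to answer "yes" regardless of what was seen.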

#### Procedures

#### fNIRS Setup

Participants were seated and asked to remove any makeup from their forehead with an alcohol swab and/or adjust their hair prior to affixing the wireless, battery-operated fNIRS neuroimaging device, Model 1100W (fNIR Devices, LLC<sup>1</sup>). The hardware unit was connected to headband sensor pads via cable and transmitted the data wirelessly to a remote tablet computer. Both the pocket-sized control hardware (which contains the battery and antenna) and the sensor pads were affixed to the participant, making the participant completely mobile during recording. Two separate sensor headband pads were placed approximately 3 cm above the participant's brow and centered approximately with respect to the eye pupil of the corresponding side, laterally symmetric from the midline of the participant's forehead, one pad for left and the other for right hemisphere monitoring. The positioning was intended to capture hemodynamic changes in bilateral dorsolateral PFC. Draw strings attached to the sensor pads were used to prevent the pads from moving once positioned on the participant. A 9 cm wide self-adhesive bandage of length approximately the circumference of the participant's head was folded width-wise and secured around the participant's head across the brow just below the fNIRS sensor pads. Next, a sheet of aluminum foil approximately half the circumference of the participant's head and folded width-wise was form fitted over the bandage and fNIRS sensor pads. Care was taken to ensure that the fNIRS sensor pads were fully encapsulated by the aluminum foil sheet. This was done to ensure that, while imaging in sunlight, infrared light from the sun would not contaminate the fNIRS signal. Once the foil was affixed to the participant, two more self-adhesive bandages of length approximately the circumference of the participant's head were used.
One bandage folded twice width-wise was wrapped around the participant's head just below the fNIRS sensor pads, over the participant's brow and over the aluminum foil. The second bandage was folded once width-wise and wrapped around the participant's head just above the fNIRS sensor pads and over the foil. These bandages were used to ensure that the foil did not shift during walking, and special care was taken to minimize constrictive pressure over the fNIRS sensor, as initial pilot tests showed this to be extremely uncomfortable for the participants after only a few minutes of walking. Once the sensors, foil, and bandages were positioned, the fNIRS device was turned on and the received light signal was adjusted by light source brightness and detector gain for signal quality. Also, an ambient light channel was captured to further assess signal quality. When the signal was deemed adequate, the participant was asked to put the fNIRS transmitter in their pocket. Final setup can be viewed in **Figure 2**.

FIGURE 2 | Participant in augmented reality wearable displays (ARWD) group wearing battery operated wireless functional near infrared spectroscopy (fNIRS) sensor over the forehead, Google Glass and Bluetooth headphones (left), wireless fNIRS sensor pads (right, top) and placement sketch (right, bottom) with four optodes identified between light source and detectors.

<sup>1</sup>www.fnirdevices.com.

#### Experimental Paradigm

Once the fNIRS neuroimaging setup was complete, participants were given the Bluetooth headphones and instructed to place the earbuds in their ears. Prior to this, the earbuds were cleaned with alcohol swabs. If the earbuds did not fit, a different size of bud was used to optimize the setup for the participant. Once the headphones were set up, participants were introduced to the auditory 1-back and scenery probe tasks as described in sections ''Auditory 1-Back'' and ''Scenery Probe'' respectively. For the auditory task, participants were informed as to the type of stimuli they would hear, and what was considered a correct response, after which participants performed one practice block to ensure they understood the task. If participants were still unclear as to the nature of the task following the practice block, a second practice block was given. No participant required more than two practice blocks in order to understand the principle of the auditory task. For the scenery probe task, participants were told they would be prompted to be ''situationally aware'', at which point they should be acutely aware of their surroundings. They were also informed that after being in this state for a brief period they would be questioned as to whether an object was or was not present in the environment during this time. Participants were informed that both of these tasks would take place while they were route following, but that the auditory 1-back and scenery probe task would never occur simultaneously. Once participants acknowledged they understood the nature of the two secondary tasks, the experimenters and participant relocated outdoors. All testing sessions took place between 7 and 11 am to minimize fatigue from midday heat. The participant was told they would navigate a predetermined route, which would be displayed via a navigation device (dependent upon their group assignment) and programmed into the device by the experimenter. The navigation task is described in detail in section ''Route Following''. The first secondary task was prepared and the participant was instructed to begin. The secondary task orders were randomized within-subjects across the four routes, and at least 15 s of navigation occurred between secondary task blocks. The start and end positions along the routes for each secondary task were preplanned so that each participant would experience the same secondary task at the same place along their navigation routes. Start and end times of the secondary tasks were synchronized with the fNIRS signal via manual entry of timing markers in the data acquisition program at the preplanned start and end positions. Upon completing a route, the participant was instructed to relax and asked whether they were still comfortable and if they wished to continue. No participant indicated they would like to stop participation due to discomfort. The navigation device was then taken by the experimenter, a new route was entered, and the next route began.

TABLE 2 | Auditory 1-back secondary-task hemodynamics as a function of device and accuracy.

Notes. <sup>∗</sup>p < 0.05; ∗∗p < 0.01; ∗∗∗p < 0.001.

TABLE 3 | Auditory 1-back secondary-task hemodynamics as a function of device and accuracy.


Notes. <sup>∗</sup>p < 0.05; ∗∗p < 0.01; ∗∗∗p < 0.001.

#### fNIRS Signal Processing

TABLE 4 | Auditory 1-back secondary-task hemodynamics as a function of device and accuracy.

Notes. ∗∗∗p < 0.001.

For each participant, raw light intensity fNIRS data (4 optodes × 2 wavelengths per optode), sampled at 4 Hz, were low-pass filtered with a linear-phase finite impulse response filter of order 20 and a cut-off frequency of 0.1 Hz to attenuate high frequency noise, respiration and cardiac cycle effects (Ayaz et al., 2011). Each participant's data were checked for potential saturation (when light intensity at the detector was higher than the analog-to-digital converter limit) and for motion artifact contamination by means of a coefficient of variation based assessment (Ayaz et al., 2010); for each optode, a separate channel that recorded ambient light provided additional verification. The light intensity changes at the 730 and 850 nm wavelengths for each optode and each task block were extracted using the time synchronization markers of task onset and end entered during the experiment, and hemodynamic changes during each block were calculated separately using the modified Beer-Lambert law as described in Ayaz et al. (2012b). Local baselines of 10 s were used in the modified Beer-Lambert law to calculate oxygenation for each task condition, in order to examine the relative changes in oxygenated and deoxygenated hemoglobin within each task condition. The local baselines were taken at the beginning of each secondary task; during that time participants were mobile and performing the primary task. The time series for each block was further binned: the hemodynamic response at each optode across the trial was temporally divided into sub-blocks of 10 s each, and each sub-block was averaged across time to provide a down-sampled hemodynamic response at each optode for each block. The final outputs for each optode were mean block deoxygenated hemoglobin (HbR) and mean block oxygenated hemoglobin (HbO).
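The filtering and binning steps described above can be sketched in plain Python. The source specifies only a linear-phase FIR filter of order 20 with a 0.1 Hz cut-off at a 4 Hz sampling rate, followed by 10 s sub-block averaging. The windowed-sinc design with a Hamming window, the edge handling, and the function names below are assumptions made for illustration, not the authors' implementation.

```python
import math


def fir_lowpass(order=20, cutoff_hz=0.1, fs_hz=4.0):
    """Windowed-sinc low-pass taps (the Hamming window is an assumption)."""
    fc = cutoff_hz / fs_hz            # cut-off in cycles per sample
    mid = order / 2.0
    taps = []
    for n in range(order + 1):
        x = n - mid
        ideal = 2.0 * fc if x == 0 else math.sin(2.0 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / order)
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]   # normalise to unit gain at 0 Hz


def apply_fir(signal, taps):
    """Causal FIR filtering; samples before the start repeat the first sample."""
    return [sum(t * signal[max(i - k, 0)] for k, t in enumerate(taps))
            for i in range(len(signal))]


def bin_means(signal, fs_hz=4.0, bin_s=10.0):
    """Average consecutive sub-blocks of bin_s seconds (a trailing partial bin is dropped)."""
    size = int(round(fs_hz * bin_s))
    return [sum(signal[i:i + size]) / size
            for i in range(0, len(signal) - size + 1, size)]


# Synthetic 60 s trace: slow 0.02 Hz drift plus a 1 Hz cardiac-like ripple.
fs = 4.0
raw = [math.sin(2 * math.pi * 0.02 * i / fs) + 0.5 * math.sin(2 * math.pi * 1.0 * i / fs)
       for i in range(240)]
taps = fir_lowpass()
smooth = apply_fir(raw, taps)
binned = bin_means(smooth)
```

Because the taps are symmetric, the filter delays every frequency by the same ten samples (2.5 s at 4 Hz), which is what linear phase means in practice. A 60 s block sampled at 4 Hz yields six 10 s bins.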

# fNIRS Analysis

#### Generalized and Linear Mixed Effects Models

All forthcoming statistical tests employ either linear mixed effects or generalized linear mixed effects models implemented in R (R Core Team, 2012) via lme4 (Bates et al., 2014). Denominator degrees of freedom and p-values were estimated via Satterthwaite corrections implemented via lmerTest (Kuznetsova et al., 2013). These models offer several advantages as extensions of the general linear model (GLM), such as analysis of binomial outcomes, treatment of effects as simultaneously fixed and random, hierarchical modeling, analysis of unbalanced designs, and robustness to missing data (Pinheiro and Bates, 2000; Baayen et al., 2008; Jaeger, 2008; Verbeke and Molenberghs, 2009; Demidenko, 2013).

#### Fixed and Random Effects Selection

Notes. <sup>∗</sup>p < 0.05; ∗∗p < 0.01; ∗∗∗p < 0.001.

The Bayesian information criterion (BIC) was used to select the fixed and random effects in the final models for each dependent variable. Competing models were constructed by adding potentially meaningful random and fixed effects to a null model. The null model was specified in each case as having no fixed effects and a random effect of participant intercept. All competing models were estimated with maximum likelihood to allow for testing of fixed effects. The competing models were tested simultaneously with BIC, and the strength of evidence criterion described by Kass and Raftery (1995) was employed. In this procedure, BIC differences of greater than two are viewed as meaningful. The final model was selected based on having the lowest BIC, with no other model of interest having a BIC within two of it. This procedure serves both to minimize overfitting of the models' random effects and to act as an omnibus test of variance for fixed effects and interactions between fixed effects, as passing this procedure ensures that these variables accounted for a meaningful amount of variance in the data.
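The selection rule above can be sketched as follows. The log-likelihoods, parameter counts and number of observations are hypothetical values chosen only to exercise the rule, and the helper names `bic` and `select_model` are illustrative. The actual models were fitted in R with lme4.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_model(bics, min_gap=2.0):
    """Return (best_name, is_clear): the lowest-BIC model and whether every
    competitor is more than min_gap BIC units worse."""
    ranked = sorted(bics.items(), key=lambda item: item[1])
    best_name, best_bic = ranked[0]
    is_clear = all(value - best_bic > min_gap for _, value in ranked[1:])
    return best_name, is_clear


# Hypothetical log-likelihoods and parameter counts, for illustration only.
n_obs = 400
candidates = {
    "null (participant intercept)": bic(-520.0, 2, n_obs),
    "+ condition": bic(-512.0, 3, n_obs),
    "+ condition x performance": bic(-511.5, 5, n_obs),
}
best, clear = select_model(candidates)
```

With these made-up numbers the model adding condition has the lowest BIC and both competitors are more than two units worse, so it would be retained. The richer interaction model improves the likelihood slightly but is penalized for its extra parameters.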

#### Multiple Comparisons Corrections

In all forthcoming analyses of fNIRS data, multiple comparisons were corrected for across hypotheses and optodes, but within secondary tasks and chromophores, by adjusting the p-value criterion with false discovery rate (FDR) corrections. Controlling the FDR can increase statistical power relative to correcting for multiple comparisons by controlling the familywise error rate (FWER). The Benjamini-Hochberg procedure employed here for controlling the FDR is adaptive in that the threshold for rejecting the null hypothesis depends on the size of the initial p-value and the number of hypotheses tested (Benjamini and Hochberg, 1995; Lindquist, 2008). Adjustments were made with alpha set to 0.05 in the Benjamini-Hochberg equation.
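A minimal sketch of the step-up procedure of Benjamini and Hochberg (1995) is given below to show how the rejection threshold depends on both the rank of each p-value and the number of tests. The p-values are hypothetical and the function name is illustrative. In R, the same correction is available through `p.adjust` with `method = "BH"`.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            reject[idx] = True
    return reject


# Hypothetical p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
decisions = benjamini_hochberg(p_values)
```

Here only the two smallest p-values survive correction, even though five of the eight are below 0.05 when considered individually.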

FIGURE 4 | Relative concentrations of oxygenated hemoglobin in RLPFC for correct and incorrect blocks of an auditory 1-back while navigating with an ARWD (Google Glass) and HHD (Smartphone). <sup>∗</sup>p < 0.05.

FIGURE 5 | Relative concentrations of deoxygenated hemoglobin in RLPFC for correct and incorrect blocks of an auditory 1-back while navigating with an ARWD (Google Glass) and HHD (Smartphone). ∗∗p < 0.01; ∗∗∗p < 0.001.

# RESULTS

#### Auditory 1-Back

#### Behavioral

The results of the auditory 1-back were submitted to a generalized linear mixed effects regression. The link function was specified as binomial and parameter estimates were calculated using maximum likelihood. The tested fixed effects included condition (ARWD vs. HHD, with the smartphone coded as the reference level), trial and the interaction between the two. The trial component was included to determine if there were any accommodation effects within and between the two devices. Participant intercepts were specified as the random effect. Parameter estimates are reported here as log odds ratios (as they are linear and non-conditional within this analysis). Participants in the HHD group were more likely to correctly than incorrectly report the number of matches heard (b = 0.528, SE = 0.178, p < 0.005). Participants in the ARWD group were more likely than the HHD group to correctly report the number of matches heard (b = 0.551, SE = 0.257, p < 0.05; **Figure 3**). The effect of trial was non-significant (b = 0.019, SE = 0.013, p = 0.137), and this did not differ between the two device groups (b = 0.013, SE = 0.019, p = 0.492). These results suggest that participants in the ARWD condition experienced lower levels of cognitive load relative to participants in the HHD condition when route following. Furthermore, there is no evidence that this level of load changed throughout the experiment either within or between conditions.
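Because the estimates above are on the log odds scale, converting them to probabilities can make them easier to read. The sketch below does this for the two reported device estimates. It assumes the intercept describes a typical participant at a trial value of zero and ignores the non-significant trial terms, so the resulting probabilities are an illustrative reading rather than values reported by the authors.

```python
import math


def inv_logit(log_odds):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Estimates reported in the text (HHD is the reference level).
b_hhd = 0.528          # intercept: log odds of a correct report, HHD group
b_arwd_vs_hhd = 0.551  # ARWD contrast relative to HHD

p_hhd = inv_logit(b_hhd)
p_arwd = inv_logit(b_hhd + b_arwd_vs_hhd)
odds_ratio = math.exp(b_arwd_vs_hhd)
```

Under these assumptions the implied probability of a correct report is roughly 0.63 for the HHD group and 0.75 for the ARWD group, corresponding to an odds ratio of about 1.7.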

#### fNIRS

Relative measures of HbO and HbR acquired during the auditory 1-back were submitted to a linear mixed effects regression. Parameter estimates in the model selected from the procedure described in section ''Fixed and Random Effects Selection'' were calculated with restricted maximum likelihood. Fixed effects were condition (ARWD vs. HHD) and an interaction with performance (correct vs. incorrect). Participant intercepts and block slope were specified as random effects for HbO, and participant block slope was specified for HbR. The results of models for optodes over left lateral, left medial, right medial, and right lateral PFC are reported in **Tables 2**–**5**.

#### **Left Lateral PFC (LLPFC)**

Correct blocks while using an ARWD were associated with a decrease in the hemodynamic response as evidenced by a reduction in oxygenated hemoglobin relative to the null hypothesis. Furthermore, correct blocks while using a HHD were associated with an increase in the hemodynamic response as evidenced by a decrease in deoxygenated hemoglobin relative to incorrect blocks.

#### **Left Medial PFC (LMPFC)**

Correct blocks while using a HHD were associated with an increase in the hemodynamic response as evidenced by a reduction in deoxygenated hemoglobin. Furthermore, there is evidence to suggest that incorrect blocks while using an ARWD were associated with an increase in the hemodynamic response. Specifically, relative to the null hypothesis and correct blocks, incorrect blocks were related to an increase in oxygenated hemoglobin.

#### **Right Medial PFC (RMPFC)**

Incorrect blocks while using an ARWD were associated with an increase in the hemodynamic response as evidenced by increased oxygenated hemoglobin relative to correct blocks.

#### **Right Lateral PFC (RLPFC)**

Correct blocks while using an ARWD were associated with a decrease in the hemodynamic response as evidenced by reduced oxygenated hemoglobin relative to the null hypothesis and incorrect blocks. Furthermore, HHD use during correct and incorrect blocks reduced total hemoglobin as evidenced by the reductions in oxy and deoxygenated hemoglobin. Of particular note for workload comparison between the display conditions is that during correct blocks, HHD deoxygenated hemoglobin was less than ARWD deoxygenated hemoglobin. Finally, during incorrect blocks HHD use was associated with decreases in oxygenated and deoxygenated hemoglobin relative to ARWD use.

Overall, correct auditory memory performance while using an ARWD was associated with a reduction in the hemodynamic response in bilateral PFC. Interestingly, incorrect responses were associated with an increase in the hemodynamic response at the more medial measurement sites. Effects of HHD use were mainly observed in left medial PFC, where correct auditory memory performance while using a HHD was associated with an increase in the hemodynamic response. Workload differences as inferred from errors on secondary tasks are most apparent in RLPFC, where auditory errors during HHD use were associated with reductions in oxygenated (**Figure 4**) and deoxygenated (**Figure 5**) hemoglobin relative to ARWD use.

# Situation Awareness

#### Behavioral

Notes. <sup>∗</sup>p < 0.05; ∗∗p < 0.01; ∗∗∗p < 0.001.

The results of the scenery probe task were submitted to a generalized linear mixed effects regression. The link function was specified as binomial and parameter estimates were calculated using maximum likelihood. The tested fixed effects were condition (ARWD vs. HHD, with HHD coded as the reference level), trial and the interaction between the two. The trial component was included to determine if there were any accommodation effects within and between the two devices. Participant intercepts and uncorrelated trial slopes were specified as the random effects. Parameter estimates are reported here as log odds ratios. Participants in the HHD group were more likely than not to respond correctly to the scenery probe (b = 0.914, SE = 0.235, p < 0.001). Participants in the ARWD group showed no significant difference relative to the HHD group in correctly responding to the scenery probe (b = −0.155, SE = 0.335, p = 0.644; **Figure 6**). The effect of trial was non-significant (b = −0.098, SE = 0.091, p = 0.282), and this did not differ between the two device groups (b = −0.148, SE = 0.132, p = 0.260). These results suggest that participants in both conditions were able to perform the task effectively. However, there is no measurable difference in situation awareness for environmental objects between the two conditions. Furthermore, there is no evidence that situation awareness changed throughout the experiment either within or between conditions.
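The reported p-values can be approximately recovered from the estimates and standard errors above with a Wald z-test, which is the test printed by default in lme4's `glmer` summaries. That the authors used this test for the binomial models is an assumption, since the text does not say so. Because the inputs are rounded to three decimals, recomputed values can differ slightly from those reported.

```python
import math


def wald_p(estimate, std_error):
    """Return (z, p): the Wald z statistic and its two-sided p-value."""
    z = estimate / std_error
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Estimates and standard errors as reported in the text.
reported = {
    "HHD intercept": (0.914, 0.235),
    "ARWD vs. HHD": (-0.155, 0.335),
    "trial": (-0.098, 0.091),
    "trial x device": (-0.148, 0.132),
}
results = {name: wald_p(b, se) for name, (b, se) in reported.items()}
```

For example, the device contrast gives a z of about −0.46 and a two-sided p-value of about 0.64, in line with the reported p = 0.644.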

#### fNIRS

Relative measures of HbO and HbR acquired during the scenery probe task were submitted to a linear mixed effects regression. Parameter estimates in the model selected from the procedure described in section ''Fixed and Random Effects Selection'' were calculated with restricted maximum likelihood. Fixed effects were condition (ARWD vs. HHD) and an interaction with performance (correct vs. incorrect). Participant random trial slopes were specified as random effects. The results of models for optodes over left lateral, left medial, right medial, and right lateral PFC are reported in **Tables 6**–**9**.

#### **Left Lateral PFC**

Correct trials while using an ARWD were associated with a decrease in oxygenated hemoglobin. Incorrect ARWD trials were associated with an increase in oxygenated hemoglobin, and the difference in relative oxygenated hemoglobin between the two outcomes was significant. Incorrect trials while using a HHD were related to reduced deoxygenated hemoglobin. Finally, incorrect trials while using a HHD reduced oxygenated and deoxygenated hemoglobin relative to incorrect trials while using an ARWD. This is representative either of a reduction in total hemoglobin alone, or of a reduction in total hemoglobin together with a reduction in brain activity in this region, as the decline in oxygenated hemoglobin is greater than that of deoxygenated hemoglobin.

#### **Left Medial PFC**

Correct trials while using an ARWD were associated with a decrease in the hemodynamic responses as evidenced by the reduction in oxygenated hemoglobin. Furthermore, incorrect trials while using a HHD were associated with an increase in the hemodynamic response as evidenced by reduced deoxygenated hemoglobin.

#### **Right Medial PFC**

No significant differences in hemodynamics were observed with regard to accuracy or device use.

#### **Right Lateral PFC**

Incorrect trials while using a HHD were associated with a decrease in the hemodynamic response as evidenced by reduced oxygenated hemoglobin relative to the null hypothesis and correct trials. Furthermore, incorrect trials while using an ARWD were associated with an increase in the hemodynamic response as evidenced by an increase in oxygenated hemoglobin relative to correct trials, and incorrect trials while using a HHD.

Overall, high situation awareness while using Google Glass was associated with a reduced hemodynamic response in left PFC. Low situation awareness while using Google Glass was related to an increase in the hemodynamic response in bilateral PFC. Conversely, low situation awareness while using a smartphone was associated with a reduced hemodynamic response in bilateral PFC (**Figure 7**).

# DISCUSSION

ARWDs are increasing in use and it is important that we understand how such devices affect mental workload and situation awareness. NASA plans to use ARWDs to improve training and performance in highly demanding situations (Schierholz et al., 2015). Objectively measuring mental workload and situation awareness in ARWDs can be difficult due to the immersive and mobile nature of the technology. To circumvent issues of mobility and immersion we used wireless fNIRS to examine hemodynamic differences in mental workload and situation awareness between an ARWD (i.e., Google Glass) and a hand-held display (i.e., a smartphone) during real-world navigation and dual-tasking.



TABLE 8 | Scenery probe secondary-task hemodynamics as a function of device and accuracy.


Notes. <sup>∗</sup>p < 0.05; ∗∗p < 0.01.

Behavioral differences between the ARWD and HHD while navigating and performing an auditory working memory task suggest differences in experienced workload. While dual-tasking, both tasks were performed successfully across display types. However, individuals using an ARWD showed superior working memory recall relative to HHD users. The dual-task method of assessing mental workload (Ogden et al., 1979; O'Donnell and Eggemeier, 1986) dictates that higher performance observed in secondary tasks represents reduced workload during the primary task. The increased working memory performance observed while using an ARWD suggests that, relative to hand-held displays, ARWDs induce less mental workload while being used for navigation.

Mental workload and the hemodynamic response representative of brain activity are positively related, especially in working memory tasks (Braver et al., 1997; Cohen et al., 1997; Culham et al., 2001; Ayaz et al., 2012b). From our behavioral results, we therefore expected a lower hemodynamic response for ARWD users relative to HHD users. In accordance with our behavioral results, ARWD blocks were associated with a reduction of oxygenated hemoglobin representative of a reduction of brain activity in bilateral PFC. Furthermore, HHD trials were associated with a reduction of deoxygenated hemoglobin representative of an increase in brain activity in left medial and right lateral PFC. A direct comparison of the two conditions' hemodynamics in RLPFC revealed reduced deoxygenated hemoglobin during HHD use relative to ARWD use during correct auditory working memory performance. This provides further evidence that even when the interference between the auditory working memory task and the navigation task was not overloading, neural activity was higher while using an HHD.

With regard to the scenery probe task, we observed no performance differences between ARWDs and hand-held displays, but hemodynamic differences were observed. Both display groups performed the scenery probe and navigation tasks successfully. However, unlike when working memory and navigation co-occurred, dual-task assessment could not differentiate between the two displays in terms of mental workload during the scenery probe task. This was not the case for hemodynamic measurements made with wireless fNIRS. The difference between the display conditions is strongest in left lateral PFC. In this region there was a reduction in oxygenated hemoglobin during ARWD use on correct trials. A decrement was not present in left lateral PFC during hand-held display use. While not as large, a similar trend can be seen between ARWD and hand-held displays in right lateral PFC as well. While inconclusive, considering the non-significant differences in oxygenated hemoglobin on correct trials between ARWD and hand-held displays, the trend is for reduced brain activity during ARWD use. Taking the scenery probe task as a proxy for level 1 and 2 situation awareness, less mental resources were required during landmark perception and comprehension while navigating with an ARWD relative to a hand-held display.

Scenery probe and working memory errors were associated with changes in ARWD hemodynamics. Lower situation awareness during ARWD use was associated with increased oxygenated hemoglobin in bilateral PFC. Similarly, incorrect working memory trials and ARWD use were associated with increased oxygenated hemoglobin across PFC. Effectively, poor secondary task performance was associated with an increase in PFC activity while navigating with an ARWD. This increase in activity coincides with the increase in workload expected due to dual-task interference. Stimulus driven attention capture is related to increased activity in PFC (Fockert et al., 2004; Serences et al., 2005; Asplund et al., 2010). Furthermore, head-up display symbology is known to negatively affect performance through unnecessary attention capture (Thomas and Wickens, 2001, 2004). The presence of cognitive tunneling during ARWD use can parsimoniously explain the presence of an error, the increase in brain activity and the increase in mental workload observed across both secondary tasks. Also, considering that the display symbology was unchanged between the ARWD and HHD conditions, and that the symbology was originally designed for the HHD, the presence of cognitive tunneling was expected.

Notes. ∗∗p < 0.01; ∗∗∗p < 0.001.

The association of secondary task errors on HHD hemodynamics was the opposite of that observed during ARWD use. Across both secondary tasks, errors were associated with decreases in brain activity. Working memory errors were associated with an increase in left PFC deoxygenated hemoglobin. Lower situation awareness was associated with a decrease in bilateral PFC oxygenated hemoglobin and RLPFC deoxygenated hemoglobin. It is probable that HHD errors were related to task shedding, the abandonment of one of the two tasks being performed; a common strategy during dual-tasks that overload mental resources (Schneider and Detweiler, 1988; Raby and Wickens, 1994; Hancock and Szalma, 2003; Grier et al., 2008; Schulte and Donath, 2011). Task shedding should produce a reduction in brain activity due to reducing mental workload. Therefore, we would expect a reduced hemodynamic response during correct secondary task trials if the primary task was shed. This effect was not observed. Instead, brain activity decreased during incorrect trials. Continuing with the logic that reduced activity is related to reduced mental workload, reduced activity during incorrect secondary trials suggests that the secondary-tasks may have been shed. This explanation is consistent with the emphasis we placed on the navigation task as well as our observed behavioral and hemodynamic effects.

# LIMITATIONS

Due to the nature of the wireless fNIRS and the miniaturized design of our imaging unit, we were limited to four optodes imaging the PFC. Therefore, other cortical regions may have shown significant hemodynamic differences between the two devices that we could not measure. Furthermore, given the current design, we could not account for all factors that might influence differences in mental workload between the two devices. We could only measure differences in mental workload that manifest as dual-task interference from increased working memory load or increased perceptual load.

# CONCLUSION

Taking a neuroergonomic approach combining dual-task interference and wireless fNIRS, we were able to examine differences in mental workload and situation awareness between a hand-held display (smartphone) and an augmented reality wearable display (Google Glass) while navigating an outdoor environment. ARWDs show few downsides with regard to dual tasking while route following. Relative to a HHD, mental workload while navigating with an ARWD was reduced during both a working memory and a situation awareness secondary task; performance was also enhanced during the working memory dual-task. Hemodynamic effects induced during errors also suggest ways in which ARWDs can be improved, specifically by reducing unwanted attention capture and cognitive tunneling. Future work should identify other hemodynamic biomarkers induced by cognitive tunneling. From an applied perspective, development of tunneling biomarkers could greatly advance display design for navigation, training and other tasks ARWDs are expected to enhance.

# DEDICATION

In memory of Professor Raja Parasuraman. This article is dedicated to Professor Parasuraman for his guidance on past and present neuroergonomic studies, as well as the inspiration he provides for studies to come.

# REFERENCES


#### AUTHOR CONTRIBUTIONS

RMcK designed the study, collected the data, analyzed the data and wrote the manuscript. RP designed the study. RM designed the study and collected data. AF designed the study and collected data. WB designed the study, analyzed data and edited the manuscript. MP analyzed data and edited the manuscript. HA designed the study, analyzed data and wrote the manuscript.

#### FUNDING

This research was supported by Air Force Office of Scientific Research (AFOSR) Grant No. FA9550-10-1-0385, and the Center of Excellence in Neuroergonomics, Technology, and Cognition (CENTEC).




**Conflict of Interest Statement**: fNIR Devices, LLC manufactures the optical brain imaging instrument and licensed IP and know-how from Drexel University. HA was involved in the technology development and thus offered a minor share in the new startup firm fNIR Devices, LLC. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The Review Editor SC and handling Editor declared their shared affiliation, and the handling Editor states that the process nevertheless met the standards of a fair and objective review.

Copyright © 2016 McKendrick, Parasuraman, Murtza, Formwalt, Baccus, Paczynski and Ayaz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Processing Functional Near Infrared Spectroscopy Signal with a Kalman Filter to Assess Working Memory during Simulated Flight

Gautier Durantin<sup>1, 2, 3</sup>\*, Sébastien Scannella<sup>1</sup>, Thibault Gateau<sup>1</sup>, Arnaud Delorme<sup>2, 3</sup> and Frédéric Dehais<sup>1</sup>

<sup>1</sup> Département Conception et Conduite des Véhicules Aéronautiques et Spatiaux, Institut Supérieur de l'Aéronautique et de l'Espace (ISAE-Supaéro), Toulouse, France, <sup>2</sup> Centre de Recherche Cerveau et Cognition, Université Toulouse III - Paul Sabatier, Toulouse, France, <sup>3</sup> Centre National de la Recherche Scientifique, Centre de Recherche Cerveau et Cognition, Toulouse, France

Working memory (WM) is a key executive function for operating aircraft, especially when pilots have to recall series of air traffic control instructions. There is a need to implement tools to monitor WM, as its limitations may jeopardize flight safety. An innovative way to address this issue is to adopt a Neuroergonomics approach that merges knowledge and methods from Human Factors, System Engineering, and Neuroscience. A challenge of great importance for Neuroergonomics is to implement efficient brain imaging techniques to measure the brain at work and to design Brain Computer Interfaces (BCI). We used functional near infrared spectroscopy (fNIRS), as it has already been successfully tested to measure WM capacity in complex environments with air traffic controllers (ATC), pilots, or unmanned vehicle operators. However, the extraction of relevant features from the raw signal in ecological environments is still a critical issue due to the complexity of implementing real-time signal processing techniques without a priori knowledge. We proposed to implement the Kalman filtering approach, a signal processing technique that is efficient when the dynamics of the signal can be modeled. We based our approach on the Boynton model of hemodynamic response. We conducted a first experiment with nine participants involving a basic WM task to estimate the noise covariances of the Kalman filter. We then conducted a more ecological experiment in our flight simulator with 18 pilots who interacted with ATC instructions (two levels of difficulty). The data were processed with the same Kalman filter settings as in the first experiment. This filter was benchmarked against a classical band-pass IIR filter and a Moving Average Convergence Divergence (MACD) filter. Statistical analysis revealed that the Kalman filter was the most efficient at separating the two levels of load, increasing the observed effect size in prefrontal areas involved in WM. In addition, the use of a Kalman filter increased the performance of the classification of WM levels based on brain signal. The results suggest that the Kalman filter is a suitable approach for real-time improvement of near infrared spectroscopy signal in ecological situations and for the development of BCI.

Keywords: fNIRS, Kalman filtering, Neuroergonomics, working memory, SVM

#### Edited by:

Klaus Gramann, Berlin Institute of Technology, Germany

#### Reviewed by:

Daniele Marinazzo, University of Ghent, Belgium

Tao Liu, Sun Yat-Sen University, China

#### \*Correspondence:

Gautier Durantin gautier.durantin@isae.fr

Received: 25 September 2015 Accepted: 17 December 2015 Published: 19 January 2016

#### Citation:

Durantin G, Scannella S, Gateau T, Delorme A and Dehais F (2016) Processing Functional Near Infrared Spectroscopy Signal with a Kalman Filter to Assess Working Memory during Simulated Flight. Front. Hum. Neurosci. 9:707. doi: 10.3389/fnhum.2015.00707

# 1. INTRODUCTION

The development of passive Brain Computer Interfaces (BCI) is a key topic of research in Neuroergonomics. In contrast with active ones, passive BCI (Cutrell and Tan, 2008) allow the use of unintentionally produced brain activity to derive various cognitive states (Blankertz et al., 2010) such as excessive mental workload. Such state inference is of interest because it aims at dynamically adapting the nature of the human-system interactions to overcome cognitive limitations (Zander and Kothe, 2011; Brouwer et al., 2013). In the field of BCI design to enhance user performance, there is a growing interest in functional near infrared spectroscopy (fNIRS) based BCI (Coyle et al., 2004; Derosière et al., 2014; Strait et al., 2014). This brain imaging technique uses near infrared light absorption properties to estimate local variations of cortical hemodynamics. It uses a modified Beer-Lambert law to link light transmittance through brain tissues to variations in local concentrations of oxygenated hemoglobin (HbO<sub>2</sub>) and deoxygenated hemoglobin (HHb) (Villringer and Obrig, 2002). fNIRS has a good spatial resolution (around 1 cm<sup>2</sup>) and an interesting signal-to-noise ratio. Moreover, this technique has the advantage of being easy and fast to set up over the participant's head, with a short calibration process (Naseer and Hong, 2015). However, the processing of fNIRS signal faces a lack of methodological consensus and thus still represents a great challenge (Bashashati et al., 2007). The extraction of the relevant activity from brain signals requires complex techniques (van Erp et al., 2012), and the most efficient ones often rely on long calibration times [e.g., subspace artifact removal techniques (von Bünau et al., 2009) or adaptive filtering (Zheng et al., 2002)]. The complexity of these methods limits their applicability for Neuroergonomics purposes, as the signal has to be usable in real time.

Most BCI designs rely on classical linear bandpass filtering techniques such as Infinite Impulse Response (IIR) filters (Naseer and Hong, 2015), although current research also investigates alternative signal processing techniques, such as the Moving Average Convergence Divergence (MACD) filter (Durantin et al., 2014b; Gateau et al., 2015). Because the Neuroergonomics approach calls for improving signal quality in real-world conditions, Kalman filtering is an ideal candidate. This signal processing and estimation technique relies both on the measurements performed on a system and on a model of its dynamics to improve signal quality (Kalman, 1960). Kalman filters that include a physiological model of brain function to improve signal usability have previously been applied to EEG (Georgiadis et al., 2005; Callan et al., 2015) and fMRI (Diamond et al., 2005). However, concerning fNIRS, this technique has been limited to the estimation of model parameters (Abdelnour and Huppert, 2009) or the correction of motion artifacts (Izzetoglu et al., 2010), neither of which requires a physiological model of the hemodynamic response to stimulation.

One of the greatest challenges regarding Kalman filter design is the tuning of its parameters, i.e., evaluating the level of measurement noise (R) affecting the signal and the state noise (Q) in the model (Diamond et al., 2005). The value of the ratio Q/R greatly influences the behavior of the Kalman filter. Indeed, a Kalman filter with a low value of Q/R will put confidence in the dynamical model, whereas a Kalman filter with a high value of Q/R will put confidence in the measurements. In practice, the value of this ratio often has to be chosen empirically (Abdelnour and Huppert, 2009; Callan et al., 2015), as there exists no efficient way to evaluate it. Consequently, the dynamics of the Kalman filter may not be adapted to the data to be improved. The challenge of this study was to design a Kalman filter suitable for fNIRS that includes a physiological model of hemodynamic response (Boynton et al., 1996). By applying this filter to fNIRS data collected during both controlled and ecological experiments, we also aimed at testing the improvements such a filter could bring to fNIRS signal toward the implementation of a passive BCI. To that end, we first designed a Kalman filter relying on a model of the hemodynamic response (Boynton et al., 1996) to improve signal quality. We then conducted a first experiment with a prefrontal fNIRS device, involving a digit sequence memorization task used to measure Working Memory (WM) storage and update capacity. Given that the objective of this study was to develop a signal improvement technique usable in realistic operational settings, this basic task was chosen because WM is a key executive function to operate complex systems (Causse et al., 2011). Data collected during the first experiment were used to select the value of the filter parameter Q/R using an optimization procedure. Finally, the improvement of the signal by the optimal Kalman filter was evaluated with formal classification during an ecological experiment which involved pilots performing a realistic WM task (i.e., recalling air traffic instructions) in a flight simulator.

# 2. KALMAN FILTER DESIGN

The functional model used to design the Kalman filter for fNIRS signal was inspired by the Hemodynamic Response Function (HRF) proposed by Boynton et al. (1996). This function is simple enough to be represented by a low order state-space model. This model assumes a third order impulse response to stimulation, and has the following transfer function:

$$HRF(p) = \frac{\tau^3 e^{-\delta p}}{(p + \tau)^3} \tag{1}$$

As shown in Equation (1), the response shape depends on two parameters: δ represents the pure delay between stimulation and the start of the HbO<sub>2</sub> increase; τ influences the time-to-peak delay. The typical values chosen here were extracted from Boynton et al. (1996), and are δ = 2 s and τ = 1.5 s. This choice leads to a time-to-peak delay from pulse stimulation of around 5 s (Handwerker et al., 2004). The Kalman filter principle then requires adding to the model a state noise w (defined as the amount of noise affecting the model, i.e., the amount of error in it) and a measurement noise v (defined as the amount of noise affecting the measurements). As shown in **Figure 1**, we chose to represent the state noise as a perturbation affecting the stimulus (i.e., the input of the model). This choice led us to consider that state noise represents a stimulus perception (or internalization) bias.
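The shape of the response described above can be checked numerically. The sketch below is illustrative only: it assumes the gamma-function form of the Boynton et al. (1996) response, with τ read as a time constant in seconds and n = 3 for the third-order response, which is the reading that reproduces a time-to-peak of about 5 s for δ = 2 s and τ = 1.5 s. The function names `boynton_hrf` and `time_to_peak` are hypothetical, and Python is used for convenience rather than the Matlab environment of the study.

```python
import math


def boynton_hrf(t, delta=2.0, tau=1.5, n=3):
    """Gamma-shaped impulse response, delayed by `delta` seconds.

    Assumption: tau is a time constant (seconds) and n is the model order.
    The response is zero before the pure delay has elapsed.
    """
    s = t - delta
    if s < 0.0:
        return 0.0
    return (s / tau) ** (n - 1) * math.exp(-s / tau) / (tau * math.factorial(n - 1))


def time_to_peak(delta=2.0, tau=1.5, n=3, dt=0.01, horizon=30.0):
    """Locate the maximum of the response on a regular time grid."""
    steps = int(round(horizon / dt))
    times = [k * dt for k in range(steps + 1)]
    return max(times, key=lambda t: boynton_hrf(t, delta, tau, n))


peak = time_to_peak()
print(f"time to peak after stimulus: {peak:.2f} s")
```

Under these assumptions the maximum falls at δ + 2τ after the stimulus, and the response integrates to one, so it does not rescale the stimulus amplitude.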

The perception bias perturbing the stimulus is denoted b. In the nominal model, Kalman filter assumptions impose that b = w, where w is the state noise following a zero-mean Gaussian distribution. This model, together with the choice of a Q/R value (where Q is the variance of the state noise and R is the variance of the measurement noise), allowed us to design a Kalman filter for fNIRS signal improvement. The inputs of the Kalman filter were the stimuli onsets and the fNIRS raw signal.
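The role of the ratio Q/R can be illustrated on a deliberately simple case. The sketch below is not the fNIRS filter of this study: it assumes a scalar random-walk state observed in white noise, and iterates the standard predict/update recursion until the Kalman gain settles. The function name `steady_state_gain` is hypothetical, and the gains it prints belong to this toy model only.

```python
def steady_state_gain(q_over_r, n_iter=20000):
    """Steady-state Kalman gain for a scalar random walk observed in noise.

    Toy model (an assumption made for illustration):
        x[k] = x[k-1] + w,  w ~ N(0, Q)
        z[k] = x[k]   + v,  v ~ N(0, R)
    Only the ratio Q/R matters for the gain, so R is set to 1.
    """
    q, r = q_over_r, 1.0
    p = 1.0      # initial estimation error variance
    gain = 0.0
    for _ in range(n_iter):
        p_prior = p + q                  # predict: uncertainty grows by Q
        gain = p_prior / (p_prior + r)   # weight given to the new measurement
        p = (1.0 - gain) * p_prior       # update: uncertainty shrinks
    return gain


for ratio in (1e-5, 1e-2, 1.0, 1e2, 1e5):
    print(f"Q/R = {ratio:g}: steady-state gain = {steady_state_gain(ratio):.4f}")
```

A gain close to 0 means the estimate mostly follows the model prediction, while a gain close to 1 means it mostly follows each new measurement, which is the trade-off governed by Q/R.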

One of the main limitations of this approach is that the stimulus perception bias has to be centered (i.e., b = 0 on average), which can be erroneous when the subject disengages from the task for a sustained period and does not pay attention to the stimuli. To take this into account, we built a second model, which is an augmentation of the nominal model, and in which ḃ = w. Thus, as the first derivative of b follows the zero-mean Gaussian distribution, it is still possible to design a Kalman filter without assuming that b is null on average. This augmented model, along with the value of Q/R, allowed the computation of the augmented Kalman filter for fNIRS signal processing. For both filters, the value of the ratio Q/R was fixed according to an optimization process (see next section).
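In state-space terms, the two models can be written as follows. This is only a sketch: A, B, and C stand for any realization of the third-order response of Equation (1) (with the pure delay δ applied to the stimulus u beforehand), x is the corresponding state vector, and y is the measured fNIRS signal. These symbols are introduced here for illustration and do not appear in the original text.

$$
\begin{aligned}
\text{nominal:}\quad & \dot{x} = A x + B\,(u + b), \qquad b = w, \qquad y = C x + v \\
\text{augmented:}\quad & \frac{d}{dt}\begin{bmatrix} x \\ b \end{bmatrix} = \begin{bmatrix} A & B \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ b \end{bmatrix} + \begin{bmatrix} B \\ 0 \end{bmatrix} u + \begin{bmatrix} 0 \\ 1 \end{bmatrix} w, \qquad y = \begin{bmatrix} C & 0 \end{bmatrix} \begin{bmatrix} x \\ b \end{bmatrix} + v
\end{aligned}
$$

The augmented form treats b as an additional state driven by w, so the filter can track a slowly drifting, non-zero bias instead of assuming that it averages to zero.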

# 3. FIRST STEP: SETTING THE FILTER PARAMETERS

#### 3.1. Material and Methods

Nine healthy participants from the Institut Supérieur de l'Aéronautique et de l'Espace (ISAE; mean age = 21.6; SD = 1.5; eight males, eight right handed) participated in the experiment. The volunteers performed a computer-based digit sequence memorization task, while fNIRS measurements of the prefrontal cortex were recorded. Data were recorded using a Biopac® fNIR100 device, composed of 16 optodes placed on the forehead (see **Figure 2**). Each optode of the device records hemodynamics at a frequency of 2 Hz in terms of oxygenated hemoglobin (HbO<sub>2</sub>) and deoxygenated hemoglobin (HHb) level variations in comparison to a baseline.

Each trial of the experiment consisted of the memorization of a sequence of 5, 7, or 9 randomly chosen digits. The size of the sequence defined a level of difficulty. **Figure 3** summarizes the time sequence of a trial. During each trial, the subjects were asked to look at a fixation cross at the center of the screen. The digit sequence was presented through the loudspeakers of the computer using prerecorded audio tracks, at a rate of one digit per second. After the presentation of the last digit, the fixation cross was replaced by three crosses, indicating that the subjects had 8 s to type the memorized sequence on the keyboard. Between two consecutive trials, the subjects looked passively at the fixation cross at the center of the screen for 6 to 9 s (the intertrial interval was chosen randomly to avoid task periodicity). The experiment consisted of 27 trials (nine trials for each of the three levels of difficulty), presented in a randomized order.

#### 3.2. Data Processing

Data were processed using Matlab®. Two different types of Kalman filters were applied to the data: the nominal Kalman filter (in which we assumed that the stimulus perception bias is null on average) and the augmented Kalman filter (without this assumption). The inputs for both filters were the stimuli onsets and the raw fNIRS data. For each filter, the value of the Q/R ratio chosen for the Kalman filter tuning ranged from 10<sup>−5</sup> to 10<sup>5</sup>, in order to look for the optimal results. In parallel, we also applied the MACD filter (Durantin et al., 2014b) to the raw data in order to compare Kalman results with classical filtering.

For each trial, we computed the HbO<sub>2</sub> peak response (noted ΔHbO<sub>2</sub>), i.e., the difference between the maximum value of HbO<sub>2</sub> in the 30 s following the trial onset and the value of HbO<sub>2</sub> at onset time. We similarly computed the HHb peak response (noted ΔHHb).
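The peak response defined above can be sketched in a few lines. This is a minimal illustration, not the authors' Matlab code: the function name `peak_response` is hypothetical, the sampling rate defaults to the 2 Hz of the device, and the onset sample itself is assumed to belong to the 30 s window (so the result is never negative).

```python
def peak_response(signal, onset_index, fs=2.0, window_s=30.0):
    """Maximum of `signal` in the window after onset, minus the value at onset.

    signal      : list of concentration values for one optode
    onset_index : sample index of the trial onset
    fs          : sampling frequency in Hz
    window_s    : length of the search window in seconds
    """
    n_samples = int(round(window_s * fs))
    window = signal[onset_index:onset_index + n_samples + 1]
    if not window:
        raise ValueError("onset_index is outside the signal")
    return max(window) - signal[onset_index]


# Synthetic trace sampled at 2 Hz (values are made up for the example).
trace = [0.0] * 100
trace[20] = 1.0   # 10 s after onset: inside the 30 s window
trace[70] = 5.0   # 35 s after onset: outside the window, ignored
print(peak_response(trace, onset_index=0))  # prints 1.0
```

The same function applies unchanged to the HHb signal.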

FIGURE 2 | fNIRS device optodes location. The device is composed of four light sources and 10 light detectors. The association of one light source and one light detector composes the optodes. The disposition of the sources and detectors leads to 16 optodes over the prefrontal cortex.

As a preliminary step, the potentially good values for Q/R (i.e., those leading to improvement in the signal) were isolated by computing an Effect Size Index (ESI, illustrated in **Figure 4**). For each filter (MACD, nominal Kalman or augmented Kalman with a given Q/R value) and each digit sequence size N, we computed the mean (µ<sub>N</sub>) and standard deviation (σ<sub>N</sub>) of the level of ΔHbO<sub>2</sub> or ΔHHb measured at each optode. We used these values to compute a confidence interval corresponding to one standard deviation as [µ<sub>N</sub> − σ<sub>N</sub>; µ<sub>N</sub> + σ<sub>N</sub>]. The ESI was defined as the sum of the gaps between the confidence intervals of consecutive conditions (each gap being negative if the confidence intervals overlap), i.e.,

$$ESI = \left( (\mu_7 - \sigma_7) - (\mu_5 + \sigma_5) \right) + \left( (\mu_9 - \sigma_9) - (\mu_7 + \sigma_7) \right)$$
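The index above translates directly into code. The sketch below is illustrative: the function name `effect_size_index` is hypothetical, the input lists are made-up peak responses for sequence sizes 5, 7, and 9, and the sample standard deviation is assumed since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def effect_size_index(peaks_5, peaks_7, peaks_9):
    """Sum of the gaps between one-standard-deviation intervals.

    Each argument is a list of peak responses for one sequence size.
    A gap is negative when the two neighbouring intervals overlap.
    """
    mu5, s5 = mean(peaks_5), stdev(peaks_5)
    mu7, s7 = mean(peaks_7), stdev(peaks_7)
    mu9, s9 = mean(peaks_9), stdev(peaks_9)
    gap_5_7 = (mu7 - s7) - (mu5 + s5)
    gap_7_9 = (mu9 - s9) - (mu7 + s7)
    return gap_5_7 + gap_7_9


# Well separated conditions give a positive index.
separated = effect_size_index([0.9, 1.0, 1.1], [1.9, 2.0, 2.1], [2.9, 3.0, 3.1])
# Identical conditions give a negative index (intervals fully overlap).
overlapping = effect_size_index([0.9, 1.0, 1.1], [0.9, 1.0, 1.1], [0.9, 1.0, 1.1])
print(separated, overlapping)
```

Evaluating such an index for each candidate Q/R value gives a curve from which promising tunings can be picked.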

We then proceeded to visual inspection to find the best values for the Q/R ratio, by finding the parameters leading to higher ESI values. Each set of data was finally tested using a two-way analysis of variance (ANOVA) with two factors (16 optodes, three levels of difficulty), performed using STATISTICA® software. The strength of the statistical effect of the difficulty level, evaluated using the partial η<sup>2</sup>, was used to compare the results of the different filters.
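For reference, partial η<sup>2</sup> is conventionally computed from the ANOVA sums of squares as the share of variance attributable to an effect once the other effects are set aside. The notation below is the standard textbook one and is not taken from the original text:

$$\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}$$

Values closer to 1 indicate that the difficulty factor explains a larger part of the variability left after filtering.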

#### 3.3. Results

As shown in **Figure 5**, the optimal results were obtained for HbO<sub>2</sub> at optode 2, recording mainly from the left inferior frontal gyrus, when using the nominal Kalman filter with Q/R = 3.98 or the augmented Kalman filter with Q/R = 0.50. **Table 1** summarizes the effect sizes obtained for each of the signal processing techniques tested (those effects showed an increase in the level of ΔHbO<sub>2</sub> with growing sequence sizes).

The frequency and phase responses (Bode diagram) of the nominal Kalman filter (Q/R = 3.98) and of the augmented Kalman filter (Q/R = 0.50) are given on **Figure 6**. As the two filters exhibit similar Bode diagrams (and therefore similar filtering properties), we retained only the augmented Kalman filter for testing on new data.

TABLE 1 | Effect sizes obtained for the effect of difficulty over all the subjects for the level of ΔHbO<sub>2</sub> measured at optode 2, depending on the type of filter used for signal processing.

gaps between the confidence intervals (negative if the confidence intervals are overlapping).

# 4. SECOND STEP: TESTING THE APPLICABILITY OF THE FILTER IN ECOLOGICAL CONDITIONS

#### 4.1. Material and Methods

Data used for testing the Kalman filter were extracted from a second experiment involving a digit sequence memorization task in a realistic flight simulator (see **Figure 7** for an illustration of the setup). The experiment was similar to that of Gateau et al. (2015), and included 18 healthy subjects (mean age = 27.1; SD = 6.4; six women). Pilots heard prerecorded Air Traffic Controller (ATC) messages and were asked to dial the corresponding flight parameters in the Flight Control Unit (FCU) using the four knobs (i.e., speed, heading, altitude, and vertical speed knobs) of the FCU. The ATC messages were delivered at 78 dB SPL through a Sennheiser® headset. We defined two levels of difficulty depending on the complexity of the message:


The task consisted of 20 repetitions of each difficulty level, for a total of 40 trials. Each ATC message started with the airplane call sign (i.e., "Supaero 32"), followed by the sequence of flight parameters. It ended with the message "over." The subjects were instructed to set the parameters strictly only after they heard the "over" message. A practice session was conducted for each subject before the actual experiment to allow them to become familiar with the experiment and the interface. After each message, the pilots had 18 s to enter the flight parameters. Trials were separated by 11 to 13 s of rest. During the experiment, hemodynamics of the prefrontal cortex were recorded using the same device as in the first experiment.

#### 4.2. Data Processing and Classification

The raw HbO<sub>2</sub> data measured at each optode were filtered using three types of filter. First, we used the MACD filter and the augmented Kalman filter retained from the optimization phase (Q/R = 0.50). We also used a classical IIR Butterworth bandpass filter (0.02 Hz < f < 0.1 Hz), in order to compare the results to classical filtering. The statistical effect sizes of the level of ΔHbO<sub>2</sub> (computed in the same way as in the first experiment) were evaluated using repeated measures ANOVA performed with STATISTICA®. The performance of the different filters was compared in terms of partial η<sup>2</sup>. In addition, we computed the statistical t-maps representing the differences in the contrasts between high and low load conditions in terms of level of HbO<sub>2</sub> for each type of filter. This computation was done using Matlab and plotted using the topograph tool from fNIRSoft®.

FIGURE 8 | Comparison of the t-maps for the contrast High load - Low load on the level of HbO<sub>2</sub> over the prefrontal cortex, obtained as a function of the type of filter used for signal processing (classical IIR, MACD, or Kalman). The topographical view was extracted from fNIRSoft® and the threshold was fixed at the statistical significance level with α = 0.01, to account for multiple comparisons.
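As a point of comparison for the filters listed above, the generic MACD principle (the difference between a fast and a slow exponential moving average) can be sketched as follows. This is the textbook definition only: the exact variant and the periods used by Durantin et al. (2014b) are not given in this text, so the periods below (4 and 12 samples) and the function names `ema` and `macd` are hypothetical.

```python
def ema(values, period):
    """Exponential moving average with smoothing factor 2 / (period + 1)."""
    alpha = 2.0 / (period + 1)
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


def macd(values, short_period, long_period):
    """Fast EMA minus slow EMA: removes the baseline, keeps recent changes."""
    fast = ema(values, short_period)
    slow = ema(values, long_period)
    return [f - s for f, s in zip(fast, slow)]


# A constant baseline is cancelled; a step produces a transient that fades.
flat = macd([5.0] * 50, short_period=4, long_period=12)
step = macd([0.0] * 20 + [1.0] * 40, short_period=4, long_period=12)
print(max(abs(v) for v in flat), step[20], step[-1])
```

Because it only needs the incoming samples, a filter of this kind requires no stimulus onsets, unlike the Kalman filter described here.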

The improvement of the signal depending on the type of filter used for processing was also evaluated by performing formal classification on the data. This analysis was performed using the Statistics and Machine Learning toolbox from Matlab. The ΔHbO<sub>2</sub> values extracted from each optode were used to train and test a linear Support Vector Machine (SVM) classifier through a 10-fold cross-validation process: for each subject, data from all trials were randomly divided into 10 folds. The difficulty (high or low load) of the trials in each fold (10% of the data) was predicted by an SVM classifier previously trained on the remaining 90% of the data. The predicted labels were then examined to evaluate the Accuracy (probability of correct classification), Sensitivity (probability of correct classification for high load trials), and Specificity (probability of correct classification for low load trials) of the classifier.
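The fold splitting and the three metrics described above can be sketched without any toolbox. This is an illustration under stated assumptions, not the Matlab pipeline of the study: the function names `kfold_indices` and `classification_metrics` are hypothetical, the labels are made up, and the SVM training step itself is left out.

```python
import random


def kfold_indices(n_trials, k=10, seed=0):
    """Randomly split trial indices into k folds of (nearly) equal size."""
    indices = list(range(n_trials))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def classification_metrics(true_labels, predicted_labels, positive="high"):
    """Accuracy, sensitivity (high load) and specificity (low load)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in true_labels")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


folds = kfold_indices(40)  # 40 trials per subject -> 10 folds of 4 trials
true = ["high"] * 5 + ["low"] * 5
pred = ["high"] * 4 + ["low"] * 4 + ["high"] * 2
print(len(folds), classification_metrics(true, pred))
```

In a full analysis, each fold in turn would serve as the test set for a classifier trained on the other nine, and the predicted labels of all folds would be pooled before computing the metrics.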

#### 4.3. Results

The partial η<sup>2</sup> values obtained for each type of filter are given in **Table 2** (for each optode and across all the optodes). The results show that the use of MACD elicits a better statistical effect size than the classical IIR filter. Similarly, the use of the Kalman filter yields better results than both MACD and IIR filters. This result holds not only when filtering data from optode 2, but notably at all optodes located in the bilateral dorsolateral areas of the prefrontal cortex (optodes 1, 2, 3, 4 and 13, 14, 15, 16).

The effect of trial difficulty on the level of HbO<sub>2</sub> measured over the prefrontal cortex is shown in **Figure 8**. In this figure, we observe that both the MACD and Kalman filters, compared with the classical IIR filter, improve the discriminability between the two conditions in the lateral areas of the prefrontal cortex.

TABLE 2 | Effect sizes (partial η<sup>2</sup>) obtained for the effect of difficulty over all the subjects for the level of HbO<sub>2</sub> measured for each optode (plus main effect size over all optodes), depending on the type of filter used for signal processing.

The effect sizes corresponding to a significant effect (p < 0.05) after correction for multiple comparisons are reported in bold font.

Finally, the results of the cross-validation procedure performed on the data to classify low-load vs. high-load trials are presented in **Figure 9** in terms of accuracy, sensitivity, and specificity. The classification results were all significantly better than chance, and the Kalman filter led to statistically better results than the IIR and MACD filters. Using Kalman filtering, the classification accuracy reached 77.8%, with a sensitivity of 79.4% and a specificity of 76%.

# 5. DISCUSSION

The objective of the study was to design a Kalman filter to improve fNIRS signal for Neuroergonomics applications. In particular, the main challenge concerned the tuning of the parameters Q and R (Diamond et al., 2005), representing the state noise and measurement noise variances. Based on a simple model of the hemodynamic response to neuronal stimulation (Boynton et al., 1996), we designed a Kalman filter model taking into account both the measurement noise and the stimulus perception bias that can occur in periods of disengagement or when the level of attention varies. During an optimization process, we showed that it was possible to find values for the parameters that lead to better statistical results (Q/R = 0.50) with an augmented model. Interestingly, the relatively low value of the Q/R ratio in the second model suggests that the Kalman filter puts more confidence in the dynamical model of the hemodynamic response than in fNIRS data. The higher optimal value obtained for this ratio when using the first choice of model (Q/R = 3.98) suggests this model was less consistent with the actual hemodynamic characteristics.

We applied the optimal results found in the first experiment on new data from an ecological experiment in a flight simulator, and showed that the optimal Kalman filter tuning could be applied generically. This filter led to higher effect sizes when looking at the effect of task difficulty in both tasks, compared to classical filters (see **Figure 8**). It is argued that the use of a dynamical physiological model by the Kalman filter implies less variability across trials and subjects, therefore explaining the greater stability of the results obtained with this filter. These results suggest that this filter would be suitable to improve the discriminability between the two conditions toward the implementation of a BCI to assist the operator, and would support the use of Kalman filtering to improve fNIRS signal (Izzetoglu et al., 2010). In particular, the Kalman filter helped us perform better during the SVM-based classification procedure between low-load and high-load trials, which confirms its contribution to the improvement of the signal. In addition, the experiment also confirmed that the MACD filter brings good results compared to classical IIR filtering, as it was previously demonstrated (Durantin et al., 2014b; Gateau et al., 2015). Although the discriminability obtained with this filter is not as good as the one obtained with the Kalman filter, it presents the advantage of not requiring any information on the stimulus onsets.

Interestingly, the optimal results for the first experiment were found at optode 2, recording mainly from the left inferior frontal gyrus. More generally, when applying the optimal Kalman filter in the second experiment, the WM solicitation elicited an activation of bilateral areas in the inferior and middle frontal gyri, part of the dorsolateral prefrontal cortex (see **Figure 8** and **Table 2**). This result is in agreement with previous fNIRS studies that have found these regions to be sensitive to WM solicitation (Ayaz et al., 2012; Durantin et al., 2014a). Therefore, the improvement of the fNIRS signal collected in this region suggests that this filter could be applied to any experiment recruiting the same functional areas. In particular, the optimization process carried out in this study would avoid the need for a calibration phase or a convergence phase (in the case of adaptive filtering) to improve signal quality. However, further investigation is still needed to assess whether this filter could be used with the same model and tuning in experiments recruiting different brain areas. Similarly, further investigation is also needed to assess the usability of these filters in ecological conditions that would differ from a simulated flight (e.g., with higher levels of light variations or motion artifacts).

Nevertheless, some modifications of the model could lead to better usability and performance of the Kalman filter. For instance, the use of a stimulus onset detection technique such as the detection technique based on the MACD filter (Durantin et al., 2014b; Gateau et al., 2015) could replace the stimulus onsets input of the Kalman filter, therefore reducing the complexity of the filter. In addition, it would be interesting to compare the results of the current Kalman model relying on a simple modeling of the hemodynamic response to more complex physiological models (e.g., Buxton et al., 2004). Finally, using an adaptive Q/R gain or realizing an optimization process for each subject instead of using a generic filter could also yield better results, although it would add complexity and a calibration phase to the procedure.

Altogether, the promising results of the study stand in favor of the use of Kalman filtering as a signal improvement technique for fNIRS signals with applications in Neuroergonomics. In particular, the improved signal would be available in realtime and without a calibration phase, and would allow better classification of WM levels in ecological settings.

#### AUTHOR CONTRIBUTIONS

All the authors contributed to the experiment design, the discussion of the results, and the writing of the paper. Data were collected by GD, TG, and SS. Signal processing tools (Kalman filter, MACD) were developed by GD.

#### ACKNOWLEDGMENTS

This research was supported by the Midi-Pyrénées region and Pôle de Recherche et d'Enseignement Supérieur (Neurocockpit project), and the AXA research fund (Neuroergonomics for flight safety). The work was approved by the Inserm Committee of Ethics Evaluation (Comité d'Évaluation Éthique de l'Inserm - CEEI/IRB00003888).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Durantin, Scannella, Gateau, Delorme and Dehais. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Commentary: Cumulative effects of anodal and priming cathodal tDCS on pegboard test performance and motor cortical excitability

Pierre Besson<sup>1</sup>, Stephane Perrey<sup>1</sup>, Wei-Peng Teo<sup>2</sup> and Makii Muthalib<sup>1</sup>\*

*<sup>1</sup> EuroMov, University of Montpellier, Montpellier, France, <sup>2</sup> School of Exercise and Nutrition Sciences, Deakin University, Melbourne, VIC, Australia*

Keywords: tDCS, motor performance, metaplasticity, neuroergonomics, priming effects

#### **A commentary on**

#### **Cumulative effects of anodal and priming cathodal tDCS on pegboard test performance and motor cortical excitability**

by Christova, M., Rafolt, D., and Gallasch, E. (2015). Behav. Brain Res. 287, 27–33. doi: 10.1016/j.bbr.2015.03.028

#### Edited by:

*Stephen Fairclough, Liverpool John Moores University, UK*

#### Reviewed by:

*Ute Kreplin, Massey University, New Zealand*

#### \*Correspondence:

*Makii Muthalib makii.muthalib@umontpellier.fr; makii.muthalib@gmail.com*

Received: *22 November 2015* Accepted: *12 February 2016* Published: *01 March 2016*

#### Citation:

*Besson P, Perrey S, Teo W-P and Muthalib M (2016) Commentary: Cumulative effects of anodal and priming cathodal tDCS on pegboard test performance and motor cortical excitability. Front. Hum. Neurosci. 10:70. doi: 10.3389/fnhum.2016.00070*

Consistent with a neuroergonomics approach, task performance can be facilitated by non-invasive neuromodulation techniques, such as anodal transcranial direct current stimulation (atDCS; Clark and Parasuraman, 2014; McKendrick et al., 2015). However, robust stimulation parameters and protocols need to be developed for applying atDCS to enhance motor performance in clinical and healthy populations. For instance, protocols using Online atDCS, where the motor task is performed during the stimulation, have greater facilitative effects on motor performance/learning than protocols in which the motor task is performed after the stimulation (i.e., Offline atDCS; Stagg and Nitsche, 2011). These greater facilitative effects of Online atDCS on motor performance/learning are likely due to enhanced synaptic efficacy in the simultaneously engaged neural network through a "gating" mechanism (Ziemann and Siebner, 2008). Overall, the timing of tDCS application relative to the motor task is a crucial parameter for optimizing the effects of atDCS on motor performance/learning.

The recent study of Christova et al. (2015) aimed to optimize Online atDCS effects on enhancing motor performance/learning by applying a novel cathodal tDCS (ctDCS) priming protocol that harnessed homeostatic metaplastic mechanisms. In the design of the study, healthy subjects were randomly distributed into three priming tDCS groups (n = 12) and were required to perform a grooved pegboard test (GPT) with their non-dominant left hand over four training blocks and a retest 2 weeks later. Three priming tDCS conditions were investigated on the right primary motor cortex (M1): (1) Sham: Sham ctDCS (15 min) 10 min before Sham Online atDCS (20 min); (2) Online atDCS: Sham ctDCS (15 min) 10 min before Online atDCS (1 mA, 20 min); (3) ctDCS priming: ctDCS (1 mA, 15 min) 10 min before Online atDCS (1 mA, 20 min). Transcranial magnetic stimulation (TMS) parameters (motor evoked potential, MEP; intracortical facilitation, ICF; and short interval intracortical inhibition, SICI) were assessed before and up to 60 min after the tDCS conditions. The results indicated that although both Online atDCS conditions improved GPT performance (i.e., faster completion time) over Sham after the four training blocks, only the priming ctDCS/Online atDCS condition further enhanced GPT performance 2 weeks later. These latter findings were explained in relation to homeostatic metaplastic mechanisms based on the Bienenstock-Cooper-Munro (BCM) theory that postulates a "sliding threshold" for bidirectional synaptic plasticity (Karabanov et al., 2015). Accordingly, priming with ctDCS, which reduced cortical excitability (reduced MEP amplitude and ICF) and increased cortical inhibition (increased SICI) after the ctDCS session, would have reduced post-synaptic activity in the activated neural network. Based on the BCM model, this ctDCS-induced reduction in post-synaptic activity would be expected to reduce the modification threshold for long term potentiation (LTP)-like plasticity during subsequent Online atDCS, and thus to further enhance GPT performance 2 weeks later. The prolonged increase in ICF and reduced SICI for at least 60 min afterwards provides some evidence for this homeostatic metaplastic effect enhancing offline learning of the GPT. However, the authors acknowledged that a limitation of the study design was that a priming ctDCS followed by Sham Online atDCS condition was not tested, which could have confirmed that the results of the priming ctDCS/Online atDCS condition were primarily due to homeostatic metaplastic mechanisms. Nevertheless, Christova et al.'s (2015) novel methodology and findings can be used to optimize tDCS priming protocols to modulate neuroplasticity and enhance motor performance/learning. The following sections will provide a commentary on ways to optimize the timing and polarity of tDCS applications, which could have significant implications for the original paper's conclusion.

An important tDCS parameter that requires further investigation is the influence of the time delay between priming and test tDCS application on homeostatic metaplasticity and its effects on motor performance/learning (Karabanov et al., 2015). A few studies have investigated the effects of altering the delay between repeated tDCS applications of the same polarity on cortical excitability (Fricke et al., 2011; Monte-Silva et al., 2013; Bastani and Jaberzadeh, 2014) and motor performance/learning (Bastani and Jaberzadeh, 2014). However, no clear evidence of the optimal delay time period could be ascertained from their respective priming tDCS protocols. Christova et al. (2015) considered a 10 min delay between ctDCS and Online atDCS to be sufficient to allow homeostatic metaplastic mechanisms to take hold. But it is still not known if a shorter or longer time delay between priming ctDCS and Online atDCS would differentially modulate homeostatic metaplasticity and motor performance/learning. We (Muthalib et al., 2016) have previously postulated a non-homeostatic approach of priming with atDCS immediately before Online atDCS to further facilitate the neuroplastic effects of Online atDCS. We reason that since sub-threshold neuronal membrane depolarization induced by atDCS has an intensity- and time-dependent effect to strengthen synaptic efficacy (Nitsche and Paulus, 2001), performing atDCS (2 mA, 10 min) immediately before Online atDCS would boost the already strengthened synaptic connections through a further "gating" mechanism induced with the concurrent motor task. We have recently shown that this priming atDCS/Online atDCS protocol on the left M1 can reduce bilateral M1 activation to perform a unilateral simple finger sequence task at the same tapping rate (Muthalib et al., 2016). 
These results could be explained by a non-homeostatic mechanism following the "gating" theory, such that the reduced motor-task-related bilateral M1 activation during the atDCS suggests a greater efficiency of neuronal transmission (i.e., less synaptic input for the same neuronal output) in the activated neuronal network. Whether this priming atDCS/Online atDCS protocol would enhance motor performance/learning more than a priming ctDCS/Online atDCS or Online atDCS protocol remains to be investigated.

Since the design of the Christova et al. (2015) study corresponded to a learning paradigm, it is difficult to differentiate the tDCS effect from the learning effect on improving online GPT performance during the four training blocks. One way to specifically test the tDCS effect on performance, while minimizing the effects of learning, would have been to include a familiarization session to allow the GPT task to become "well learned" and performance to stabilize at near maximal levels in all individuals prior to starting the tDCS interventions (Hummel et al., 2010). For highly skilled individuals (e.g., elite athletes, expert operators), it is extremely difficult to improve maximal performance levels since learning has reached relative "ceiling" levels. However, this "ceiling" performance can conceivably be modulated directly using neuromodulation protocols. For example, an excitatory TMS protocol to the dominant left M1, which led to increased M1 excitability, was able to increase dominant right hand maximal finger tapping rate and reduce the decline of the movement rate over 10 s (Teo et al., 2012). In contrast, an inhibitory TMS protocol to the dominant left M1, which decreased M1 excitability, was shown to decrease maximal finger tapping rate of the dominant right hand (Jäncke et al., 2004). We therefore consider that applying tDCS to the dominant left M1/right hand and utilizing a "well learned" stable motor task, such as a simple finger sequence task performed at maximum rate (Avanzino et al., 2008), may provide a sensitive means to investigate the tDCS effects on task performance.

In conclusion, priming tDCS protocols are promising ways to optimize tDCS facilitatory effects on motor performance/learning, which has relevance from a neuroergonomic standpoint. Thus, future studies are necessary to determine the optimal polarity and timing of tDCS applications to modulate neuroplasticity and enhance performance in clinical, sports, and real-world settings.

# AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct, and intellectual contribution to the work, and approved it for publication.

# ACKNOWLEDGMENTS

We would like to thank Prof. John Rothwell for his helpful comments on the manuscript. MM was supported by a Labex NUMEV Fellowship (Digital and Hardware Solutions, Environmental and Organic Life Modeling, ANR-10-LABX-20). WT is supported by an Alfred Deakin Postdoctoral Fellowship.

# REFERENCES


(tDCS): expanding vistas for neurocognitive augmentation. Front. Syst. Neurosci. 9:27. doi: 10.3389/fnsys.2015.00027


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Besson, Perrey, Teo and Muthalib. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Simultaneous tDCS-fMRI Identifies Resting State Networks Correlated with Visual Search Enhancement

Daniel E. Callan<sup>1,2</sup>\*, Brian Falcone<sup>3</sup>, Atsushi Wada<sup>1,2</sup> and Raja Parasuraman<sup>3</sup>

<sup>1</sup> Center for Information and Neural Networks (CiNet), National Institute of Information and Communications Technology (NICT), Osaka University, Osaka, Japan, <sup>2</sup> Multisensory Cognition and Computation Laboratory, Universal Communication Research Institute, National Institute of Information and Communications Technology, Kyoto, Japan, <sup>3</sup> Center of Excellence in Neuroergonomics, Technology, and Cognition (CENTEC), George Mason University, Fairfax, VA, USA

This study uses simultaneous transcranial direct current stimulation (tDCS) and functional MRI (fMRI) to investigate tDCS modulation of resting state activity and connectivity that underlies enhancement in behavioral performance. The experiment consisted of three sessions within the fMRI scanner in which participants conducted a visual search task: Session 1: Pre-training (no performance feedback), Session 2: Training (performance feedback given), Session 3: Post-training (no performance feedback). Resting state activity was recorded during the last 5 min of each session. During the 2nd session one group of participants underwent 1 mA tDCS stimulation and another underwent sham stimulation over the right posterior parietal cortex. Resting state spontaneous activity, as measured by fractional amplitude of low frequency fluctuations (fALFF), for session 2 showed significant differences between the tDCS stim and sham groups in the precuneus. Resting state functional connectivity from the precuneus to the substantia nigra, a subcortical dopaminergic region, was found to correlate with future improvement in visual search task performance for the stim over the sham group during active stimulation in session 2. The after-effect of stimulation on resting state functional connectivity was measured following a post-training experimental session (session 3). The left cerebellum Lobule VIIa Crus I showed performance related enhancement in resting state functional connectivity for the tDCS stim over the sham group. The ability to determine the relationship that the relative strength of resting state functional connectivity for an individual undergoing tDCS has on future enhancement in behavioral performance has wide ranging implications for neuroergonomic as well as therapeutic, and rehabilitative applications.

Keywords: fMRI, tDCS, resting state, functional connectivity, visual search, neuroergonomics

Edited by: Hasan Ayaz, Drexel University, USA

#### Reviewed by:

Christophe Phillips, University of Liège, Belgium; Filippo Brighina, University of Palermo, Italy; Vincent Clark, University of New Mexico, USA

> \*Correspondence: Daniel E. Callan dcallan@nict.go.jp

Received: 07 November 2015; Accepted: 12 February 2016; Published: 07 March 2016

#### Citation:

Callan DE, Falcone B, Wada A and Parasuraman R (2016) Simultaneous tDCS-fMRI Identifies Resting State Networks Correlated with Visual Search Enhancement. Front. Hum. Neurosci. 10:72. doi: 10.3389/fnhum.2016.00072

## INTRODUCTION

In recent years there has been an explosion of research investigating a method by which to augment human cognition by passing a low amplitude direct current (typically in the range of 0.5–2 mA) through a person's head and enhancing human performance and abilities (Coffman et al., 2014). This technique is called transcranial direct current stimulation (tDCS). tDCS has been shown to enhance such abilities as attention and performance on vigilance, threat detection, and visual search tasks (Falcone et al., 2012; Parasuraman and Galster, 2013; Nelson et al., 2014); to enhance learning and performance on perceptual and cognitive tasks (Clark et al., 2012; Parasuraman and McKinley, 2014); and to improve motor and cognitive function in patients with brain damage, neuropsychiatric, and neurological diseases (Flöel, 2014; Kuo et al., 2014; O'Shea et al., 2014). The underlying neurological processes that allow for these enhancements in ability are under ongoing investigation. It has been shown that anodal DC stimulation decreases neural firing thresholds, and that glutamatergic modulation of long-term potentiation/depression may be involved with the enduring effects of tDCS (Liebetanz et al., 2002; Nitsche et al., 2003; Bikson et al., 2004; Coffman et al., 2014; Hunter et al., 2015). While one may expect these effects to be localized on the cortex near the stimulating electrode, functional MRI (fMRI) studies have also shown modulation in activity in distal brain regions suggesting possible network effects induced by tDCS (Clemens et al., 2014; Ellison et al., 2014; Weber et al., 2014).

It is our goal in this study to use simultaneous tDCS and fMRI to investigate the relationship between modulation in resting state activity as well as resting state functional connectivity of the brain correlated with improved performance as a result of stimulation. Studies have shown that resting state activity and connectivity in the brain can predict various characteristics such as attention (Kelley et al., 2008), learning (Baldassarre et al., 2012), memory (Hampson et al., 2006), language processing (Koyama et al., 2011), personality (Adelstein et al., 2011), and IQ (van der Heuvel et al., 2009; for review, see Stevens and Spreng, 2014). Previous studies using tDCS and fMRI have revealed that, as a result of stimulation, resting state networks can show widespread changes in activity and connectivity in cortical and subcortical brain regions (Saiote et al., 2013; Clemens et al., 2014).

In our study, we investigate both resting state activity and performance related resting state connectivity in response to tDCS. A visual search task was employed before (pre-training), during (training), and after (post-training) tDCS stimulation to determine its facilitative effects on performance. Resting state fMRI was recorded toward the end of each session after completing the visual search task. We placed the stimulating electrode over the posterior parietal cortex as it has been found in previous tDCS studies to modulate visual search performance (Bolognini et al., 2010; Ellison et al., 2014). We used the fractional amplitude of low frequency fluctuations (fALFF) in the BOLD signal, which has been found to be associated with spontaneous neural activity (Biswal et al., 1995; Zou et al., 2008; Song et al., 2011), as a measure of resting state activity. By comparing fALFF across tDCS stimulation and sham groups we intend to show brain regions in which the spontaneous neural activity is being modulated. Unlike most previous neuroimaging studies, we applied tDCS and fMRI concurrently in order to observe the active effects of tDCS on resting state activity rather than just the after-effects that exist following the cessation of tDCS. Brain regions determined to show tDCS induced activity are then used as seed regions for a functional connectivity analysis (Song et al., 2011). It is hypothesized that resting state functional connectivity related to improvement in behavioral performance on the visual search task will be found to exist for these seed regions for the tDCS group to a greater extent than for the sham group.

Our study addresses many of the future directions concerning the investigation of tDCS on resting state activity and connectivity proposed by Clemens et al. (2014). Specifically, we applied tDCS and fMRI concurrently to investigate the immediate active effects on resting state activity and connectivity. The after-effects of tDCS on resting state functional connectivity were also investigated following a post-training session. In addition, as proposed by Clemens et al. (2014), our study includes the use of sham stimulation. By comparing between tDCS stim and a sham group (unlike other studies that look at tDCS stim vs. pre-stim), our study is able to investigate behaviorally related enhancement in resting state functional connectivity that differs between the two groups that can be attributed to modulation by tDCS rather than changes in resting state connectivity that normally occur as a result of task training.

# MATERIALS AND METHODS

#### Participants

Twenty-eight participants took part in this study. All of the participants (14 males, 14 females) were Japanese right-handed adults ranging from 18 to 25 years (mean = 20.7) of age from Osaka University. The participants were pseudo-randomly assigned to the tDCS stim and sham groups such that there were seven females and seven males in each group. All participants were screened for exclusion if there was a history of head injury, history of mental, neurological, alcohol or drug abuse disorders, or use of medication that affects central nervous system function. The participants gave written and informed consent to take part in this experiment. The experimental procedures were approved by the National Institute of Information and Communications Technology (NICT) Human Subject Review Committee and were carried out in accordance with the principles expressed in the WMA Declaration of Helsinki. Originally there were 18 tDCS stim and 17 sham participants. From the tDCS stim group two participants were excluded because of pressure pain caused by the tight fit of the headphones within the head coil and two participants were excluded because task performance was below chance on session three even after completing the training session. From the sham group one participant was excluded because of pressure pain caused by the tight fit of the headphones within the head coil and two participants were excluded because task performance was below chance on session three even after completing the training session.

#### Procedure

The experiment consisted of three sessions within the fMRI scanner. During the first part of scanning the participants conducted a visual search task. During the last 4.5 min of fMRI scanning, for each session, resting state activity was acquired. In this study, we will focus only on the resting state fMRI data from these sessions.

The visual search task was based on a search and rescue mission that required participants to locate a red pickup truck located in the search area amongst buildings and other similar looking distractor vehicles. In each trial there were five nonmoving vehicles distributed throughout the search area, one of which could be the red truck. There were a total of 60 trials in each session and the target red truck was randomly present on half of the trials. The task was designed so that as the unmanned aerial vehicle (UAV) loitered in a circle around the search area, all vehicles would remain in constant view despite a continually changing view angle. Each trial lasted 10 s, during which the participants searched the area looking for the target and were required to make a button press indicating whether the search area contained a target or not.

The three experimental sessions consisted of the following: Session 1: Pre-training session that did not provide performance feedback. Session 2: Training session in which tDCS stimulation or sham stimulation was delivered. In the training session, immediate reinforcement error-feedback ("ding" sound for correct, "buzz" sound for incorrect) was given after each response. Additionally, for target present trials only, a transparent white sphere would appear over the target at the end of the 10 s trial identifying the target location. This type of auditory reinforcement feedback allows subjects to know immediately whether the features they were attending to are incorrect, in the case of a false alarm or a miss, or correct, in the case of a correct rejection or a hit. This information, together with the visual feedback of the position of the target at the end of the trial when it was present, allows for learning of the relevant features and improved performance. Session 3: Post-training session with no feedback. The total time of the visual search task for sessions 1 and 3 was approximately 15 min and for session 2 was approximately 16 min. After each experimental session resting state activity was recorded for 4.5 min. For session 2 tDCS stimulation was given concurrently with fMRI scanning. The task for the participants during collection of the resting state data was to visually fixate on a white cross mark presented in the center of the display against a black background. Participants were instructed to fixate on the cross on the screen, and to relax without falling asleep.
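The mapping from a response to its signal-detection outcome and auditory feedback can be sketched as follows. This is only an illustration of the logic described in the text; the function names and the example trial list are hypothetical and do not reflect the actual experiment software.

```python
def classify_trial(target_present: bool, responded_present: bool) -> str:
    """Label a trial as hit, miss, false alarm, or correct rejection."""
    if target_present:
        return "hit" if responded_present else "miss"
    return "false alarm" if responded_present else "correct rejection"


def feedback_sound(outcome: str) -> str:
    """Auditory feedback in the training session: 'ding' if correct, 'buzz' if not."""
    return "ding" if outcome in ("hit", "correct rejection") else "buzz"


def percent_correct(trials) -> float:
    """Percent correct over (target_present, responded_present) pairs."""
    n_correct = sum(
        feedback_sound(classify_trial(present, response)) == "ding"
        for present, response in trials
    )
    return 100.0 * n_correct / len(trials)


# Hypothetical mini-session: one hit, one false alarm, one correct rejection, one miss.
example_trials = [(True, True), (False, True), (False, False), (True, False)]
example_score = percent_correct(example_trials)
```

Percent correct computed this way corresponds to the behavioral measure reported for each session.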

#### Transcranial Direct Current Stimulation

tDCS was delivered during the training session (session 2) using the MRI compatible NeuroConn DC-Stimulator MR. Two rectangular-shaped (5.3 × 7.2 cm) MRI compatible conductive rubber electrodes were placed on the participant before entering the MRI scanner (see **Figure 1** for picture of placement of electrode on the head of a participant and a rendered MRI showing the tDCS electrode on the head). The anodal electrode was placed over the right posterior parietal cortex. It was placed over where the P4 electrode is located according to the 10–20 International EEG System. The electrode was held in place by the conductive paste (Ten20 conductive paste gel, Weaver and Company) as well as a padded headband. The cathodal electrode was placed over the contralateral left side trapezius muscle on the back of the shoulder.

FIGURE 1 | Top: Picture showing the placement of the anodal tDCS electrode on the right posterior parietal cortex of the participant. Bottom: The placement of the tDCS electrode can be seen in the rendered MRI of the participant. Sections are shown through the brain at the site of the electrode. For the MRI sections the right side of the image is the right side of the brain.

Participants in the stim group received 1 mA current for a total of 30 min (1 mA was the highest level possible within the MRI scanner with our version of the NeuroConn DC-Stimulator MR). Stimulation was started 5 min before the task in order to ensure that the full modulatory effect of tDCS was active during task performance. Current was ramped up over the initial 10 s and ramped down the last 10 s of stimulation. The participants in the sham group also received 1 mA current but only for 30 s and then the unit was turned off. This procedure helps to conceal from the participant which group (stim or sham) they belong to as both groups feel the onset of the stimulation. In addition group membership of the participant was not known by the experimenter giving the instructions.
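The stimulation time course can be pictured as a trapezoid. The sketch below assumes linear 10 s ramps and, for illustration only, applies the same ramp to the 30 s sham period (the text does not specify how the sham current was ramped); the function and constant names are hypothetical and unrelated to the stimulator's software.

```python
def current_ma(t_s, duration_s, peak_ma=1.0, ramp_s=10.0):
    """Current (mA) at time t_s for a trapezoidal profile with linear ramps."""
    if t_s < 0 or t_s > duration_s:
        return 0.0  # stimulator off outside the stimulation window
    ramp_up = t_s / ramp_s
    ramp_down = (duration_s - t_s) / ramp_s
    return peak_ma * min(1.0, ramp_up, ramp_down)


STIM_DURATION_S = 30 * 60  # 30 min of active stimulation
SHAM_DURATION_S = 30       # sham: current applied for 30 s only

# One minute into the session the stim group is at full current,
# while the sham unit has already been switched off.
stim_at_60s = current_ma(60, STIM_DURATION_S)
sham_at_60s = current_ma(60, SHAM_DURATION_S)
```

Both groups experience the onset of current during the first seconds, which is what makes the sham condition hard to distinguish from active stimulation.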

#### fMRI Data Collection and Analysis

fMRI scanning of resting state activity was acquired for 4.5 min at the end of each session (TR = 2 s; 30 interleaved slices covering the brain and cerebellum, 3 mm × 3 mm × 4 mm voxels; Siemens 3T Trio Scanner; 32 Channel head coil). Preprocessing of fMRI data was conducted using SPM8 (Wellcome Department of Cognitive Neurology, UCL) and included realignment and unwarping, normalization to the template EPI image (2 mm × 2 mm × 2 mm), and smoothing (8 mm × 8 mm × 8 mm). EPI template based normalization was used because the source image upon which the normalization parameters are determined is in the same space as the scans to be normalized. This has advantages in that it avoids additional steps of coregistration to/or from EPI space (resulting in image distortion) that are required when using an anatomical T1 or T2 image to determine the normalization parameters. Because whole brain EPI was acquired the difficulties associated with mapping partial EPI volumes to the template image are avoided.

The REST (Song et al., 2011) Toolkit was used to conduct the resting state spontaneous activity (fALFF) and the functional connectivity analyses. The realignment parameters were used as covariates of no interest and regressed out of the preprocessed EPI data to remove potential confounds related to head movement while scanning. The linear trend was then removed from the data. The parameters for the fALFF analysis included a low frequency fluctuation band of 0.01–0.08 Hz (Biswal et al., 1995) compared to the entire frequency range (0–0.25 Hz). The fALFF results were normalized by dividing by the mean fALFF values within the whole brain mask to be used for second level random effects analyses. The functional connectivity analysis was carried out on the preprocessed data after covariate removal, detrending, and filtering (0.01–0.08 Hz). Three separate functional connectivity analyses were conducted. The seed regions of interest (ROI) were determined from the results of the fALFF analysis (see "Results" Section). The ROI included the precuneus (MNI 6, −46, 60). A spherical region with a radius of 8 mm at the given coordinate was used as the seed for the resting state functional connectivity analyses conducted separately for each session. The Pearson linear correlation was used to determine the functional connectivity between the mean of the voxels within the seed ROI and the rest of the voxels in the brain according to the defaults in the REST toolbox (Song et al., 2011). The Fisher's z transform was used to normalize the correlation coefficients to be used for second level random effects analyses. SPM8 was used to conduct the random effects analyses. Correction for multiple comparisons (p < 0.05) across the entire brain was carried out using Monte-Carlo simulation of the brain volume to define a voxel contiguity threshold at an uncorrected significance level of p < 0.005 (Slotnick et al., 2003; Ellison et al., 2014). 
Using 10000 Monte-Carlo simulations, a cluster extent greater than 154 voxels, thresholded at p < 0.005 uncorrected, is necessary to correct for multiple comparisons across the whole brain at a threshold of p < 0.05. Activated brain regions were identified using the SPM Anatomy Toolbox v1.8 (Eickhoff et al., 2005) as well as Talairach Client after transforming from the MNI to the Talairach coordinate system using the mni2tal function in Matlab. The substantia nigra, red nucleus, and subthalamic nuclei were identified using the regions specified in Keuken and Forstmann (2015).
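The idea behind a Monte-Carlo cluster extent threshold can be sketched in a few lines. This is a deliberately simplified illustration, not the Slotnick et al. (2003) procedure used in the study: it assumes spatially independent noise (no smoothing, which is why it yields far smaller extents than 154 voxels), a small toy volume, and face-connected clusters. All function names and default values are hypothetical.

```python
from collections import deque

import numpy as np

NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def largest_cluster(mask):
    """Size of the largest face-connected cluster of True voxels in a 3D mask."""
    visited = np.zeros(mask.shape, dtype=bool)
    best = 0
    for start in zip(*np.nonzero(mask)):
        if visited[start]:
            continue
        visited[start] = True
        queue, size = deque([start]), 0
        while queue:  # breadth-first flood fill of one cluster
            x, y, z = queue.popleft()
            size += 1
            for dx, dy, dz in NEIGHBORS:
                n = (x + dx, y + dy, z + dz)
                inside = all(0 <= n[i] < mask.shape[i] for i in range(3))
                if inside and mask[n] and not visited[n]:
                    visited[n] = True
                    queue.append(n)
        best = max(best, size)
    return best


def cluster_extent_threshold(shape, voxel_p=0.005, alpha=0.05, n_sims=200, seed=0):
    """Extent one voxel above the (1 - alpha) quantile of null maximum cluster sizes."""
    rng = np.random.default_rng(seed)
    max_sizes = np.empty(n_sims, dtype=int)
    for i in range(n_sims):
        noise_mask = rng.random(shape) < voxel_p  # voxels passing threshold by chance
        max_sizes[i] = largest_cluster(noise_mask)
    return int(np.quantile(max_sizes, 1 - alpha)) + 1


# Toy volume; a real analysis would use the brain mask and the estimated smoothness.
toy_extent = cluster_extent_threshold((20, 20, 20))
```

Clusters at least as large as the returned extent occur by chance in roughly no more than a fraction alpha of the simulated noise volumes, which is the logic behind the corrected cluster-level threshold.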

## RESULTS

#### Behavioral Results

The behavioral results in terms of percent correct on the visual search task for the tDCS stim and sham groups are as follows: there was a significant enhancement in performance post- relative to pre-training (ANOVA F(2,52) = 12.47, p < 0.05). The enhancement was statistically significant (t = 4.05; p < 0.05) for the stim group (pre-training mean = 64.26%; SE = 2.64; post-training mean = 72.06%; SE = 2.36) and was statistically significant (t = 3.15; p < 0.05) for the sham group (pre-training mean = 64.98%; SE = 2.47; post-training mean = 73.02%; SE = 1.99). There was no significant difference (assessed at p < 0.05) between stim and sham groups for either pre- (t = −0.21) or post-training (t = −0.33) sessions. The interaction between stim and sham groups and pre- and post-training session was not significant (F(2,52) = 0.3; p > 0.05). Additionally, there was no significant difference (t = −0.83, p > 0.05) between stim (mean = 64.86%; SE = 2.7) and sham (mean = 67.94%; SE = 2.74) groups for the training session 2.

#### Brain Imaging Results

#### Resting State Activity: fALFF Analysis

The results of the fALFF analysis are given in **Figure 2** and **Table 1**. Using a random effects between groups t-test, statistically significant (p < 0.05 corrected for multiple comparisons, see "Materials and Methods" Section) differences in fALFF between the stim and sham groups for session 2 were found to be located in three clusters of activity: Cluster 1 is located around the right superior parietal spreading into the left parietal cortex as well as the neighboring regions of the precuneus, post central gyrus, pre-central gyrus, and supplementary motor area (It should be noted that since the analysis is corrected for multiple comparisons at the cluster level we cannot definitively know which of these regions making up the cluster are activated, only that some of them are); Cluster 2 is located in the right inferior parietal lobule including the temporal parietal junction; Cluster 3 is located in the premotor cortex BA6 (see **Figure 2**, top and **Table 1**, top). Brain regions statistically significant (p < 0.05 corrected) for the stim–sham contrast masked by the interaction (random effects ANOVA) of Stim (Session 2 – Session 1) – Sham (Session 2 – Session 1) with a corrected threshold of p < 0.05, consisted of the right precuneus (see **Figure 2**, bottom and **Table 1**, bottom). Masking by the interaction controls against differences in fALFF that may exist between the tDCS stim and sham groups prior to training. In addition the masking also allows for a more focal site that is likely modulated by the tDCS, which can be used as a seed for the resting state functional connectivity analyses. No statistically significant differences in fALFF were found for session 2 for the sham greater than stim contrast when correcting for multiple comparisons.
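For orientation, the fALFF quantity compared between groups in this analysis can be sketched per voxel as the share of spectral amplitude in the 0.01–0.08 Hz band relative to the full sampled range. This is a minimal NumPy illustration under stated assumptions (single voxel, already motion-corrected and detrended data, TR = 2 s); it is not the REST toolkit's implementation, and the function names are hypothetical.

```python
import numpy as np


def falff(timeseries, tr_s=2.0, band=(0.01, 0.08)):
    """Fractional amplitude of low frequency fluctuations for one voxel."""
    ts = np.asarray(timeseries, dtype=float)
    ts = ts - ts.mean()                        # remove the constant (DC) component
    amplitude = np.abs(np.fft.rfft(ts))        # amplitude spectrum
    freqs = np.fft.rfftfreq(ts.size, d=tr_s)   # 0 up to about 0.25 Hz for TR = 2 s
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    total = amplitude.sum()
    return float(amplitude[in_band].sum() / total) if total > 0 else 0.0


def normalize_by_mean(falff_values):
    """Divide fALFF values by their mean (here over a flat array of mask voxels)."""
    values = np.asarray(falff_values, dtype=float)
    return values / values.mean()


# 4.5 min of resting state data at TR = 2 s corresponds to 135 volumes.
t = np.arange(135) * 2.0
slow_voxel = np.sin(2 * np.pi * (10 / 270) * t)  # about 0.037 Hz, inside the band
fast_voxel = np.sin(2 * np.pi * (54 / 270) * t)  # 0.2 Hz, outside the band
```

A voxel dominated by slow fluctuations, such as `slow_voxel`, yields a value near 1, whereas `fast_voxel` yields a value near 0.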

#### Resting State Connectivity: Functional Connectivity Analysis

Resting state functional connectivity analyses for each of the three sessions, using post relative to pre behavioral performance as a covariate of interest, were conducted using the precuneus (significant for the fALFF analyses; see **Figure 2**, bottom and **Table 1**, bottom) as a seed. In order to determine differences between the tDCS stim and sham groups in resting state functional connectivity related to enhanced behavioral performance, a random-effects between groups t-test using post- minus pre-training behavioral performance as a covariate of interest was conducted. Enhancement in behavioral performance was defined as the percent correct score for session 3 minus session 1 for each participant. The resting state functional connectivity score for each participant was the resultant Fisher's z transformed normalized correlation coefficient of the connectivity analysis for each voxel in the brain.

TABLE 1 | Differential resting state activity between stim and sham groups determined by fractional amplitude of low frequency fluctuations.

Top: Brain regions showing significant differential activity for the stim–sham comparison for session 2 corrected for multiple comparisons at the cluster level (p < 0.05) using Monte-Carlo simulation (corrected cluster extent threshold greater than 154 contiguous voxels over uncorrected significance threshold of p < 0.005). Bottom: Regions that are also significant when masking the results by the interaction of Stim (Session 2 – Session 1) – Sham (Session 2 – Session 1) corrected for multiple comparisons at the cluster level (p < 0.05). BA, Brodmann area; IPL, Inferior Parietal Lobule; SPL, Superior Parietal Lobule; SMA, Supplementary Motor Area; TPJ, Temporal Parietal Junction. Negative "x" MNI coordinates denote left hemisphere and positive "x" values denote right hemisphere activity.
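The connectivity score and behavioral covariate used in this analysis can be sketched as follows: a Pearson correlation between the seed time series and each voxel, Fisher z transformed, and an enhancement score defined as session 3 minus session 1 percent correct. This is a simplified NumPy illustration, not the REST or SPM8 implementation; the function names are hypothetical, the voxel-wise general linear model is replaced by a plain across-participant correlation, and voxels with zero variance are assumed to have been masked out.

```python
import numpy as np


def seed_connectivity_z(seed_ts, voxel_ts):
    """Fisher z of the Pearson correlation between a seed and each voxel.

    seed_ts: shape (T,), mean time series of the seed ROI.
    voxel_ts: shape (T, V), time series of V voxels.
    """
    seed_ts = np.asarray(seed_ts, dtype=float)
    voxel_ts = np.asarray(voxel_ts, dtype=float)
    seed = (seed_ts - seed_ts.mean()) / seed_ts.std()
    vox = (voxel_ts - voxel_ts.mean(axis=0)) / voxel_ts.std(axis=0)
    r = (seed[:, None] * vox).mean(axis=0)   # Pearson r per voxel
    r = np.clip(r, -0.999999, 0.999999)      # keep arctanh finite when |r| = 1
    return np.arctanh(r)                     # Fisher z transform


def enhancement_scores(pct_session3, pct_session1):
    """Behavioral enhancement: percent correct in session 3 minus session 1."""
    return np.asarray(pct_session3, dtype=float) - np.asarray(pct_session1, dtype=float)


def brain_behavior_r(z_values, enhancement):
    """Across-participant correlation between connectivity (z) and enhancement."""
    return float(np.corrcoef(z_values, enhancement)[0, 1])
```

In the study the brain-behavior relationship was assessed voxel-wise in a random-effects model with the enhancement score as a covariate; `brain_behavior_r` only conveys the direction and strength of that relationship for a single voxel or cluster.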

The results of the behavioral enhancement related functional connectivity analysis for sessions 1, 2, and 3 using the right precuneus as the seed region to the voxels in the entire brain are the following:

For session 1 behavioral enhancement related resting state functional connectivity was not observed for the stim–sham contrast, the stim alone contrast, or the sham alone contrast using a cluster level corrected threshold of p < 0.05.

For session 2 a cluster encompassing the substantia nigra, red nucleus, and subthalamic nuclei was found to show statistically significant differences in behaviorally related resting state functional connectivity when correcting for multiple comparisons at the cluster level (p < 0.05) for the stim–sham contrast (see **Figure 3** and **Table 2**). For the stim alone contrast three clusters showed statistically significant behavioral enhancement related resting state functional connectivity (p < 0.05 corrected). These clusters include the following: (1) the substantia nigra, red nucleus, and subthalamic nuclei; (2) the thalamus; and (3) the cerebellar lobule VIIIa Vermis (see **Figure 4** and **Table 2**, bottom). The sham alone contrast for session 2 did not show any statistically significant behaviorally related resting state functional connectivity using a corrected threshold of p < 0.05.

FIGURE 3 | Session 2 results of the SPM random effects between groups t-test for stim relative to the sham group for resting state functional connectivity with the precuneus using post-pre behavioral performance as a covariate of interest. Statistically significant (p < 0.05 corrected) differences in behaviorally related resting state functional connectivity are rendered on sections of a template T1 MRI scan at MNI coordinates for the peaks in the various significant clusters. Negative "x" MNI coordinates denote left hemisphere and positive "x" values denote right hemisphere activity. For the MRI sections the right side of the image is the right side of the brain.

For session 3 two clusters (one in the cerebellum lobule VIIa Crus I and the other in the insula) showed statistically significant (p < 0.05 corrected) differences in behaviorally related resting state functional connectivity for the stim–sham contrast (see **Figure 5** and **Table 3**). For the stim alone contrast four clusters showed statistically significant (p < 0.05 corrected) behaviorally related resting state functional connectivity. These clusters include the following: (1) the cerebellum lobule VIIa Crus I; (2) the cerebellar lobule VI vermis and hemisphere; (3) the thalamus; and (4) the inferior parietal cortex (see **Figure 6** and **Table 3**, bottom). The sham alone contrast for session 3 did not show any statistically significant behaviorally related resting state functional connectivity using a corrected threshold of p < 0.05.

Brain regions showing significant differential resting state connectivity for session 2 corrected for multiple comparisons at the cluster level (p < 0.05) using Monte-Carlo simulation (corrected cluster extent threshold greater than 154 contiguous voxels over uncorrected significance threshold of p < 0.005). BA, Brodmann area. Negative "x" MNI coordinates denote left hemisphere and positive "x" values denote right hemisphere activity. Correlation coefficient r: ∗∗denotes (p < 0.005).

The magnitude and the direction of the correlation between behavioral enhancement and resting state functional connectivity are given in **Tables 2**, **3** (correlation coefficient r). For the tDCS stim group an increase in resting state functional connectivity is statistically significantly correlated (see **Tables 2**, **3**) with the increase in behavioral performance. This relation is not found for the sham group.

FIGURE 5 | Session 3 results of the SPM random effects between groups t-test for stim relative to the sham group for resting state functional connectivity with the precuneus using post-pre behavioral performance as a covariate of interest. Statistically significant (p < 0.05 corrected) differences in behaviorally related resting state functional connectivity are rendered on sections of a template T1 MRI scan at MNI coordinates for the peaks in the various significant clusters. Negative "x" MNI coordinates denote left hemisphere and positive "x" values denote right hemisphere activity. For the MRI sections the right side of the image is the right side of the brain.

# DISCUSSION

Resting state functional connectivity during tDCS is correlated with future improvement in performance. Our study shows that tDCS affects low frequency fluctuations in spontaneous brain activity in the precuneus region around the anodal stimulating electrode (see **Figure 2**, bottom and **Table 1**, bottom). Performance enhancement related differences (between tDCS stim and sham groups, also present in the stim alone analysis) in resting state functional connectivity were found from the precuneus to a cluster encompassing the substantia nigra, red nucleus, and the subthalamic nuclei during concurrent tDCS stimulation for session 2 (see **Figures 3**, **4** and **Table 2**). An after-effect of tDCS stimulation on resting state functional connectivity was measured following a post-training session on the visual search task that occurred approximately 20 min after the session of tDCS stimulation. Performance enhancement related differences (between tDCS stim and sham groups, also present in the stim alone analysis) in resting state functional connectivity were found from the precuneus to a cluster encompassing the right cerebellum lobule VIIa Crus I for session 3 (see **Figures 5**, **6** and **Table 3**).

Brain regions showing significant differential resting state connectivity for session 3 corrected for multiple comparisons at the cluster level (p < 0.05) using Monte-Carlo simulation (corrected cluster extent threshold greater than 154 contiguous voxels over uncorrected significance threshold of p < 0.005). BA, Brodmann area; IPC, Inferior Parietal Cortex. Negative "x" MNI coordinates denote left hemisphere and positive "x" values denote right hemisphere activity. Correlation coefficient r: <sup>∗</sup>denotes (p < 0.05) and ∗∗denotes (p < 0.005).

The mechanisms behind tDCS-induced enhanced cognition have been associated with activity-dependent plasticity. The specific modulation of the precuneus region by anodal tDCS, revealed by the fALFF analysis, is most likely the result of increased spontaneous neuronal firing due to excitability changes brought on by tDCS. Spontaneous fluctuations in BOLD signal related to cognitive abilities are known to be present at rest (Biswal et al., 1995; Stevens and Spreng, 2014). Furthermore, studies have shown that resting state activity is modulated by tDCS (Saiote et al., 2013; Clemens et al., 2014).

The results of the fALFF analysis revealed that the precuneus showed significantly greater spontaneous resting state activity for the tDCS stim over the sham group that was not attributed to preexisting differences between the groups in resting state activity present prior to tDCS stimulation (see **Figure 2**, bottom and **Table 1**, bottom). The precuneus has been found to be involved with processes related to the visual search task employed in this experiment. These include attentive tracking of moving targets (Culham et al., 1998), attention orientation (Le et al., 1998; Simon et al., 2002), attention shift between object features (Nagahama et al., 1999), and mental rotation (Suchan et al., 2002). It has been put forward that the precuneus is involved with internally guided attention and manipulation of mental images related to visuospatial processing (Cavanna and Trimble, 2006).

Using the precuneus as a seed, we were able to reveal performance related differences in resting state functional connectivity associated with tDCS stimulation. The resting state functional connectivity analysis assumes that, in the absence of ongoing task related activity, two regions that display spontaneous fluctuations in BOLD signal that are highly temporally synchronized are likely within the same functional network. Using visual search performance post-training (session 3) relative to pre-training (session 1) as a covariate of interest in this analysis allowed us to identify regions that were associated with improved performance as temporal synchrony (functional connectivity) increases with our seed ROI (precuneus). Our results revealed visual search performance enhancement related differences in resting state functional connectivity between the precuneus and a cluster encompassing primarily the substantia nigra for the stim over the sham group (that was also present for the stim group alone contrast; **Figures 3**, **4** and **Table 2**). Interestingly, consistent with the task presented in our study, previous research has implicated the substantia nigra in aspects of visuospatial processing (Matsumoto and Takada, 2013). The study by Matsumoto and Takada (2013), using single cell recordings in monkeys, showed that neurons in the substantia nigra were active when the task required visual search and working memory. Consistent with these findings, the task in our experiment required the participant to maintain the features of the target truck and distractors in working memory to accomplish the visual search task. Also relevant to our study, the substantia nigra is part of the dopaminergic system (Björklund and Dunnett, 2007). The dopaminergic system is thought to be intimately involved with value dependent learning (Montague et al., 1996; Schultz, 1998; Doya, 2002; Callan and Schweighofer, 2008). The performance related enhancement in resting state functional connectivity for the stim over the sham group between the precuneus and the substantia nigra is consistent with the hypothesis that tDCS may in part be modulating value dependent learning systems involved with the visual search task (see **Figures 3**, **4** and **Table 2**). While we cannot rule out that the effect of tDCS stimulation alone is responsible for our observed performance related resting state connectivity, given the function of the brain regions involved, it is perhaps more likely that the effect of tDCS stimulation interacting with the visual search task is responsible for the changes in performance related resting state functional connectivity that we observe in our study.
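The seed-based logic described above can be sketched in a few lines. This is a minimal illustration only, not the SPM pipeline used in the study: the function names, the clipping constant, and all toy values are hypothetical. It correlates a seed time course with a target time course, applies a Fisher z-transform, and then relates per-participant connectivity to a behavioral gain score.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def seed_connectivity(seed_ts, target_ts):
    """Fisher z-transformed correlation between seed and target time courses."""
    r = pearson_r(seed_ts, target_ts)
    # Clip so that atanh stays finite for perfectly correlated toy data.
    r = max(min(r, 0.999999), -0.999999)
    return math.atanh(r)

# Hypothetical time courses for one participant (arbitrary units).
seed = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0]
target = [1.1, 2.3, 2.9, 2.2, 0.8, 1.9]
z = seed_connectivity(seed, target)

# Hypothetical group data: one connectivity value and one post-minus-pre
# performance gain per participant.
connectivity = [0.10, 0.25, 0.40, 0.55, 0.70]
gain = [2, 5, 4, 8, 9]
r_behavior = pearson_r(connectivity, gain)
print(z, r_behavior)
```

In a whole-brain analysis the same correlation would be computed for every voxel, and the behavioral gain would enter a group-level model as the covariate of interest.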

In addition to investigating the active effects of tDCS stimulation on performance related enhancement in resting state functional connectivity we also investigated the after-effect of tDCS stimulation following a post-training session on the visual search task that occurred approximately 20 min after the session of tDCS stimulation. The session 3 results revealed visual search performance enhancement related differences in resting state functional connectivity between the precuneus and a cluster in the cerebellum lobule VIIa Crus I for the stim over the sham group (that was also present for the stim group alone contrast; **Figures 5**, **6** and **Table 3**). From studies on individuals with localized brain damage, human functional imaging studies, and animal studies the cerebellum is known to be involved with visuospatial processing (for review, see Molinari and Leggio, 2007). Related to our experimental task, the same region of the cerebellum as is present in our study has been found in an fMRI study to be involved with preparatory processes involved with visual search (Bourke et al., 2013). Also relevant is the known presence of anatomical connections between the precuneus and multiple circuits in the cerebellum through the basis pontis (Cavanna and Trimble, 2006).

One interesting finding of the analyses concerning the after-effects of tDCS stimulation is that the locus of differential performance related resting state functional connectivity (between tDCS stim and sham groups) is different from that of active tDCS stimulation. It is unclear why the focus of performance related resting state functional connectivity to the precuneus switches from the substantia nigra during active stimulation to the cerebellum as an after-effect. It is possible that the differences reflect distinctive stages of learning and corresponding modification of resting state networks.

Also of interest was the lack of any significant performance related resting state functional connectivity with the precuneus for the sham group while several regions were found to be significant for the tDCS stim group for active and after-effect analyses (see **Figures 4**, **6** and **Tables 2**, **3**). One possibility is that multiple degenerate networks (Edelman, 1987) are utilized for processing the visual search task for the sham group whereas for the tDCS stim group, as a result of stimulation, specific networks are selectively used. This is potentially why the degree of correlation between the strength of these networks and behavioral improvement in performance is relatively high (see **Tables 2**, **3**) for the tDCS stim group.

In our study as well as in others (e.g., Polanía et al., 2011) the seed ROIs for the resting state functional connectivity analyses were selected based on previous, although different, analyses of the same data. The advantage of using the results of the fALFF analysis as seed ROIs is that we ensure that we are actually utilizing regions that are showing potential modulation as a result of the tDCS for the functional connectivity analysis instead of arbitrarily selecting a region underneath the stimulation electrode. We do not believe that this will unduly bias the results of the functional connectivity analyses for stim over the sham group comparison because the fALFF (fluctuations in low frequency activity in single voxels) and functional connectivity (correlation in time course between voxels) analyses are quite different. Additionally, we used improvement in behavioral performance post- relative to pre-training as a covariate of interest in the functional connectivity analyses. There is no a priori reason to believe that future improvement in behavioral performance should be predicted by differences in fALFF or functional connectivity unless of course these changes are induced as a result of tDCS.

As with all brain imaging studies, there are many potential limitations and confounds that need to be addressed. One potential limitation of this study is that the changes we see in fALFF and resting state functional connectivity may not be a result of changes in spontaneous neural activity, but rather may be a result of changes in cerebral perfusion or noise induced by tDCS (this is only a potential problem for session 2, in which concurrent tDCS and fMRI was applied). Previous studies using concurrent tDCS and fMRI have suggested that tDCS induced distortions on fMRI SNR are minimal (Antal et al., 2011; Zheng et al., 2011). It is unlikely that these tDCS distortion effects would be specific to the low frequency range (0.01–0.08 Hz) used in the fALFF analysis. Since the fALFF compares this low frequency range to the entire range (0–0.25 Hz) it would cancel out any effects induced by tDCS (artifacts on fMRI, etc.) that are not frequency specific. In terms of the resting state functional connectivity analyses we employed there is no reason why changes in cerebral perfusion or noise induced by tDCS would correlate with post- relative to pre-training behavioral performance. It is much more likely that the behaviorally related resting state functional connectivity we observed is a result of changes in spontaneous neural activity resulting from tDCS stimulation.
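To make the cancellation argument concrete, the sketch below computes fALFF as the summed spectral amplitude in the 0.01–0.08 Hz band divided by the summed amplitude over the whole measurable range. It is an illustration under stated assumptions, not the authors' implementation: a repetition time of 2 s is assumed (which gives the 0.25 Hz upper limit quoted above), the function names are hypothetical, and a naive discrete Fourier transform is used to keep the example dependency-free. It shows only that a frequency-independent gain change leaves the ratio unchanged.

```python
import cmath
import math

def amplitude_spectrum(ts, tr):
    """Return (freqs, amplitudes) for non-negative frequencies via a naive DFT."""
    n = len(ts)
    mean = sum(ts) / n
    x = [v - mean for v in ts]  # remove the mean so the 0 Hz bin is ~0
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k / (n * tr))
        amps.append(abs(coef))
    return freqs, amps

def falff(ts, tr=2.0, low=0.01, high=0.08):
    """Fraction of total spectral amplitude that falls in the [low, high] Hz band."""
    freqs, amps = amplitude_spectrum(ts, tr)
    total = sum(amps)
    if total == 0:
        return float("nan")  # flat signal: the ratio is undefined
    band = sum(a for f, a in zip(freqs, amps) if low <= f <= high)
    return band / total

# Toy signal: equal-amplitude sinusoids at 0.05 Hz (inside the band)
# and 0.20 Hz (outside the band).
TR = 2.0  # assumed repetition time in seconds
N = 200   # number of volumes (hypothetical)
signal = [math.sin(2 * math.pi * 0.05 * t * TR) + math.sin(2 * math.pi * 0.20 * t * TR)
          for t in range(N)]
scaled = [3.0 * v for v in signal]  # frequency-independent gain change

print(falff(signal, TR), falff(scaled, TR))
```

The invariance holds for a uniform multiplicative change; an artifact concentrated in one part of the spectrum would still shift the ratio, which is why the frequency-specificity question in the text matters.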

An additional confound that we did not test was whether the radio frequency and gradients associated with fMRI EPI scanning influence the tDCS current. The NeuroConn DC-Stimulator MR that we used for this experiment includes an RF filter module with MRI compatible cables and electrodes. These components help to prevent effects of RF on tDCS current. In addition we avoided cable loop formation in the setup that may result in gradient induced currents. Furthermore, the fMRI compatible cables used for tDCS had a high resistance (5 kΩ), which should also decrease the induced current, avoiding potential effects of MRI on tDCS. Although we do not believe it to be the case, since we did not measure the current during stimulation, we cannot rule out the possibility that the tDCS current could have been reduced or modulated such that its enhancement effects on behavior were diminished.

One of the biggest limitations of our study was the lack of a robust difference in behavioral performance after training for the tDCS stim over the sham group. There are several reasons why group differences in improved performance between the stim and sham groups may not have been observed in our study when they have been found in many previous studies (Coffman et al., 2014). One reason may be that the task was too difficult when first starting such that many participants were at near chance levels. The effects of training were so great in this situation that the modulatory benefits of tDCS were washed out in the behavioral data. Another reason for a lack of a behavioral performance enhancement difference between tDCS stim and sham groups may be that training was too short for the modulatory effects of tDCS to be revealed in behavioral performance. Relatedly, another limitation in our study was the restriction on the level of tDCS stimulation to a maximum of 1 mA with our version of the NeuroConn DC-Stimulator MR. Many tDCS studies utilize 2 mA to get robust enhancement in behavioral performance (Coffman et al., 2014). One reason why we did not see an overall difference between tDCS stim and sham groups may be that the level of stimulation was too low to induce robust enhancement in the short training time of approximately 16 min. Given that we do see strong performance related differences in the resting state functional connectivity data it may be the case that we are observing the early stages of tDCS modulated learning. The involvement of dopaminergic brain regions associated with value dependent learning, as well as with working memory and visual attention, is certainly consistent with this hypothesis. In the future it would be interesting to investigate whether longer training on this same task as well as higher stimulating levels (2 mA) results in enhanced behavioral performance for tDCS stim over sham groups.

# CONCLUSION

Participants who showed greater resting state functional connectivity between parietal regions and dopaminergic subcortical brain regions (substantia nigra, hippocampus, and amygdala) showed greater improvement in visual search task performance. Essentially, future improvement in performance showed a significant linear relationship with resting state functional connectivity (thought to be) induced by tDCS. These results suggest that it may be possible to employ multivariate pattern analysis machine learning decoding techniques to predict future performance, given a certain pattern of resting state functional connectivity. Future experiments need to be conducted to determine if changes in resting state functional activity and connectivity induced by tDCS can be used to predict long-term changes in task performance. One could even use neurofeedback techniques (Fukuda et al., 2015) in conjunction with tDCS to induce greater changes in specific functional networks. These techniques could be utilized to optimize performance benefits resulting from tDCS. Our results have wide ranging implications regarding effective utilization of tDCS for neuroergonomic as well as therapeutic and rehabilitative applications.

# DEDICATION

The present manuscript wishes to be the authors' personal honor to the memory of Raja Parasuraman, as a tribute to his scientific competence, and the contribution that he gave to our research projects in these years of proficient collaboration. Raja was a kind and generous man who was a pioneer in the field of neuroergonomics. He passionately sought out to facilitate and inspire researchers in the field instilling a thirst for knowledge. RP will always be present as a driving force inspiring us all to succeed.

# AUTHOR CONTRIBUTIONS

DEC, BF and AW conducted the experiment. DEC, BF, AW and RP designed the experiment, analyzed the data and wrote the manuscript.

# FUNDING

This work was supported by the Center for Information and Neural Networks, National Institute of Information and Communications Technology. Additional support to BF was given by the National Science Foundation sponsored East Asia and Pacific Summer Institutes fellowship award 1414852, funded in collaboration with Japan Society for the Promotion of Science. Additional support to RP was given by the Air Force Office of Scientific Research grant FA9550-10-1-0385.

#### ACKNOWLEDGMENTS

We would like to thank Chie Kawakami for her extensive assistance in conducting the experiment and Akiko Callan for her assistance in translating instructions from English to Japanese. We would also like to thank the MRI technicians at CiNet for their assistance in running the experiments.

#### REFERENCES


critical role of attention. Front. Hum. Neurosci. 7:273. doi: 10.3389/fnhum.2013.00273


visual rotation. Behav. Brain Res. 136, 533–544. doi: 10.1016/s0166-4328(02)00204-8


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Callan, Falcone, Wada and Parasuraman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Transcranial Direct Current Stimulation Modulates Neuronal Activity and Learning in Pilot Training

Jaehoon Choe<sup>1</sup>, Brian A. Coffman<sup>1,2,3</sup>, Dylan T. Bergstedt<sup>1,4</sup>, Matthias D. Ziegler<sup>1,5</sup> and Matthew E. Phillips<sup>1</sup>\*

<sup>1</sup> HRL Laboratories LLC, Malibu, CA, USA, <sup>2</sup> Department of Psychiatry, The University of Pittsburgh, Pittsburgh, PA, USA, <sup>3</sup> Psychology Clinical Neuroscience Center, The University of New Mexico, Albuquerque, NM, USA, <sup>4</sup> Department of Sports Medicine, Pepperdine University, Malibu, CA, USA, <sup>5</sup> Advanced Technologies Laboratories, Lockheed Martin, Arlington, VA, USA

Skill acquisition requires distributed learning both within (online) and across (offline) days to consolidate experiences into newly learned abilities. In particular, piloting an aircraft requires skills developed from extensive training and practice. Here, we tested the hypothesis that transcranial direct current stimulation (tDCS) can modulate neuronal function to improve skill learning and performance during flight simulator training of aircraft landing procedures. Thirty-two right-handed participants consented to participate in four consecutive daily sessions of flight simulation training and received sham or anodal high-definition-tDCS to the right dorsolateral prefrontal cortex (DLPFC) or left motor cortex (M1) in a randomized, double-blind experiment. Continuous electroencephalography (EEG) and functional near infrared spectroscopy (fNIRS) were collected during flight simulation, n-back working memory, and resting-state assessments. tDCS of the right DLPFC increased midline-frontal theta-band activity in flight and n-back working memory training, confirming tDCS-related modulation of brain processes involved in executive function. This modulation corresponded to significantly different online and offline learning rates for working memory accuracy and decreased inter-subject behavioral variability in flight and n-back tasks in the DLPFC stimulation group. Additionally, tDCS of left M1 increased parietal alpha power during flight tasks and tDCS to the right DLPFC increased midline frontal theta-band power during n-back and flight tasks. These results demonstrate a modulation of group variance in skill acquisition through an increase in learned skill consistency in cognitive and real-world tasks with tDCS. Further, tDCS performance improvements corresponded to changes in electrophysiological and blood-oxygenation activity of the DLPFC and motor cortices, providing a stronger link between modulated neuronal function and behavior.

Keywords: tDCS, EEG, fNIRS, DLPFC, M1, flight simulation, skill learning

Edited by: Hasan Ayaz, Drexel University, USA

#### Reviewed by:

Makii Muthalib, University of Montpellier, France; Mickael Causse, Institut Supérieur de l'Aéronautique et de l'Espace, France

> \*Correspondence: Matthew E. Phillips mephillips@hrl.com

Received: 06 November 2015 Accepted: 19 January 2016 Published: 09 February 2016

#### Citation:

Choe J, Coffman BA, Bergstedt DT, Ziegler MD and Phillips ME (2016) Transcranial Direct Current Stimulation Modulates Neuronal Activity and Learning in Pilot Training. Front. Hum. Neurosci. 10:34. doi: 10.3389/fnhum.2016.00034

#### Choe et al. tDCS in Pilot Training

# INTRODUCTION

There has recently been a rapid increase in the number of published studies in the field of neuromodulation due to the availability of non-invasive stimulation technologies such as transcranial direct current stimulation (tDCS). New tools for training enhancement are emerging which target specific, basic cognitive functions, with the goal of increasing performance in high-level, real-world tasks, such as pilot training. For example, Clark et al. (2012) demonstrated enhanced concealed image detection training with tDCS. Others have observed enhanced skill learning with tDCS in spatial and verbal working memories (Martin et al., 2014; Richmond et al., 2014), language acquisition (Flöel et al., 2008) and motor skills development (Banissy and Muggleton, 2013; Reis et al., 2015; Rumpf et al., 2015). For a review of tDCS enhancements, see Coffman et al. (2014).

Computerized cognitive training methods have been only moderately successful in enhancing performance (Ball et al., 2002). However, computerized procedural training (flight simulation) has been an important part of airplane pilot training since the mid-1970s. Commercial and military pilot training programs now utilize flight simulation extensively for training basic flight and combat skills (Bell and Waag, 1998; Rosenkopf and Tushman, 1998). Research on the effectiveness of flight simulator training has historically been limited by the high cost of full flight simulators, and occurs in the context of ongoing pilot training programs, rather than unbiased third-party research programs (Hays et al., 1992; Rosenkopf and Tushman, 1998). The field has recently overcome this limitation through the commercialization of relatively low-cost flight simulator devices available for purchase and use in standard research environments. These personal computer-based flight simulators are also used in various contexts for flight training (Koonce and Bramble, 1998), lending ecological validity to simulator studies.

Piloting an airplane is a demanding task requiring skillful execution of learned procedures. This has been observed as a correlation between flight simulator performance and measures of reasoning and working memory in general aviation pilots (Causse et al., 2011), and a concurrent decline in working memory and flight errors (Dismukes, 2008; Engle, 2010). Furthermore, neurophysiological markers of both short-term (e.g., fatigue) and long-term (e.g., expertise) cognitive functions correlate with behavioral performance (Ayaz et al., 2013; Borghini et al., 2014). Pilot skill development requires a synthesis of multiple cognitive faculties, many of which are enhanced by tDCS and include: dexterity (Boggio et al., 2006), mental arithmetic (Hauser et al., 2013), cognitive flexibility (Chrysikou et al., 2013), visuo-spatial reasoning (Heimrath et al., 2012), and working memory (Gill and Hamilton, 2014)—an important predictor of flight situation awareness in novices (Sohn and Doane, 2004).

Working memory is linked primarily with brain activity in the dorsolateral prefrontal cortex (DLPFC) (Courtney et al., 1996; Braver et al., 1997; Curtis and D'Esposito, 2003), an area often targeted by non-invasive brain stimulation in cognitive research. Most researchers agree that tDCS of DLPFC has substantial effects on working memory (for a review see Coffman et al., 2014); however, Horvath et al. (2015) recently reported disconfirming evidence for this hypothesis in a meta-analysis of selected studies investigating the cognitive effects of tDCS. In this meta-analysis, tDCS did not have a significant effect on any cognitive measure. However, their approach may be confounded by calculation of effect sizes based only on post-stimulation scores, rather than accounting for pre-stimulation differences between groups. Chhatbar and Feng (2015) illustrated this issue in their response paper, where they show substantial effects of tDCS when calculating effect sizes from pre-post difference scores rather than post-stimulation scores alone.
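The difference between the two effect-size calculations can be illustrated with a small worked example. The sketch below is not the computation performed by Horvath et al. (2015) or by Chhatbar and Feng (2015); it uses Cohen's d with a pooled standard deviation as one common choice, and every score in it is hypothetical. The toy sham group happens to start with higher baseline scores, so a comparison of post-stimulation scores alone hides a gain that the pre-post difference scores reveal.

```python
import math
from statistics import mean, stdev

def cohens_d(a, b):
    """Cohen's d for two independent groups using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(
        ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    )
    return (mean(a) - mean(b)) / pooled

# Hypothetical accuracy scores (percent correct) before and after training.
stim_pre, stim_post = [50, 60, 70, 80], [59, 71, 80, 92]
sham_pre, sham_post = [60, 70, 80, 90], [61, 73, 82, 94]

# Effect size from post-stimulation scores only.
d_post = cohens_d(stim_post, sham_post)

# Effect size from pre-post difference scores.
stim_gain = [post - pre for pre, post in zip(stim_pre, stim_post)]
sham_gain = [post - pre for pre, post in zip(sham_pre, sham_post)]
d_gain = cohens_d(stim_gain, sham_gain)

print(d_post, d_gain)
```

With these toy values the post-only effect size is slightly negative while the difference-score effect size is large and positive, which is the pattern the pre-stimulation confound would produce.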

The focality of stimulation is also a critical component of tDCS-driven behavioral changes, and this aspect of experimental design is difficult to capture in a meta-analysis. Large pad-type electrodes used in previous studies have poor focality and target current intensity compared to the multiple electrode montage approach (Dmochowski et al., 2011). Finite element modeling work with MRI-derived brain models performed by various groups demonstrates optimization of currents to the brain that improves focality and intensity to areas of interest by 80 and 98%, respectively (Bikson et al., 2009; Datta et al., 2011, 2012; Dmochowski et al., 2011; Faria et al., 2011; Edwards et al., 2013). The importance of this modeling work is underscored by clinical investigations that show that differences in targeting and stimulation intensity result in marked differences in behavioral output and stimulation efficacy (Valle et al., 2009; Moliadze et al., 2010; Mendonca et al., 2011). Finally, Santarnecchi et al. (2015) have suggested that the impact of tDCS on target brain structures is dependent on not only the placement of electrodes and current density, but also the current state of activity in those brain areas. This crucial point is often overlooked in tDCS research, and investigators should carefully consider the cognitive task performed during stimulation to maximize the desired effect.

Despite recent controversy over the effects of tDCS on working memory, tDCS applied to specific brain regions has been reported to improve behavioral performance in a diverse array of cognitive categories: attention (Coffman et al., 2014), reaction time (Teo et al., 2011), object recognition (Clark et al., 2012), memory (Manenti et al., 2013), creativity (Chrysikou et al., 2013), and motor skill acquisition (Nitsche et al., 2003). In addition to acute improvement of various performance measures, some laboratories have also observed persistence of cognitive enhancement even after the electrical current is removed (Snowball et al., 2013; Lefebvre et al., 2014). These results indicate that, in some cases, stimulation need only be applied initially or periodically to achieve continual performance gains. Although the modulation of procedural learning through enhancement of working memory has remained an open question in the field, non-invasive brain stimulation methods are potential vehicles for enhancing learning and performance and for delivering nootropic benefits in commercial and military applications (Clark et al., 2012; Phillips and Ziegler, 2014).

The application of neuroimaging techniques, such as functional near-infrared spectroscopy (fNIRS) and electroencephalography (EEG), allows the precise measurement of spatial and dynamic functional brain activity. The development of these non-invasive, low overhead and high-resolution tools has given investigators the ability to observe the activity of the human brain in vivo with an unprecedented degree of control (Been et al., 2007; McKendrick et al., 2015).

EEG results confirm tDCS-related modulation of brain processes involved in working memory, as evidenced by increased midline frontal theta-band oscillatory brain activity (MFT) during a working memory task (Miller et al., 2015). MFT is most commonly measured during maintenance of information in working memory, and reflects theta coupling between the DLPFC and anterior cingulate cortex (Sauseng et al., 2004). MFT is positively correlated with attentional demands during mental calculation (Ishii et al., 2014) and working memory load (Jensen and Tesche, 2002), and theta-band synchrony between frontal and parietal areas is directly related to individual working memory capacity (Palva et al., 2010). Further evidence supporting the functional relationship comes from studies temporarily disrupting the DLPFC with transcranial magnetic stimulation, which leads to performance decrements in working memory tasks (Grafman et al., 1994; Pascual-Leone and Hallett, 1994). Other frequency bands have also been implicated in working memory and attentional control. For example, tonic increases (and phasic decreases) in parietal alpha-band power reflect greater perceptual involvement for tasks requiring attention to the environment (Klimesch, 1999), suggesting a role of alpha in perception. Furthermore, Sauseng et al. (2009) showed that alpha band activity over sensorimotor areas indicates greater excitability in that region, as measured with transcranial magnetic stimulation. Therefore, stimulation of either M1 or DLPFC could increase tonic alpha band activity in this study compared to sham by enhancing sensorimotor excitability and/or perceptual involvement.

Other imaging studies employing fNIRS have found significant correlations between cognitive performance and blood oxygenation in the DLPFC (Yanagisawa et al., 2010; McKendrick et al., 2014). fNIRS is a non-invasive imaging technique that measures the relative concentrations of oxygenated (Hboxy) and deoxygenated (Hbdeoxy) hemoglobin to infer neuronal activity. fNIRS relies on differences in the near infrared absorption spectra of oxygenated and deoxygenated hemoglobin along with a neuro-vascular hemodynamic response function to relate relative changes in localized cerebral blood flow to neuronal activity (Villringer et al., 1993).

Hbdeoxy and total hemoglobin concentrations (Hbtot) are linked to levels of cognitive workload in the anterior prefrontal cortex (PFC) (Ayaz et al., 2012). For example, using a Scarborough adaptation of the Tower of London task, Ruocco et al. (2014) found that difficult problems were associated with greater Hboxy concentrations in the DLPFC relative to a baseline condition. The study also found that participants who scored higher in deliberation, or careful thinking, before acting, showed greater activation in this same region, regardless of task difficulty. The magnitude of Hbtot and Hbdeoxy concentration changes in specific brain regions has been used as a proxy for mental workload and expertise. Hbtot levels increase in the PFC during difficult trials in the N-back task, suggesting greater recruitment of neural resources (Herff et al., 2013). In addition, during a complex flight task, Hbtot levels decrease in the PFC over a 9-day learning period with progression from beginner to intermediate and finally advanced levels of performance (Ayaz et al., 2012). Furthermore, blood oxygenation level-dependent (BOLD) responses, which correlate with Hboxy, Hbdeoxy and Hbtot concentrations (Cui et al., 2011), decrease with improvements in response time, suggesting more efficient activation of PFC (Holland et al., 2011). Decreases in hemoglobin concentrations exist in the motor system (Hbdeoxy—Wolf et al., 2007), and in prefrontal cortex where they were correlated with reward value (Hboxy and Hbtot—DiStasio and Francis, 2013).

Although reported effects of primary motor cortex (M1) stimulation on skill acquisition and procedural learning have been promising, these methods have primarily been investigated in standard psychological and motor tasks including the serial reaction time task (Nitsche et al., 2003), the Tower of London task (Dockery et al., 2009), and the sequential visual isometric pinch task (Reis et al., 2009). Increasing evidence for the application of tDCS to enhance real-world skills has been reported for vehicle control (Beeli et al., 2008; Sakai et al., 2014), golf (Zhu et al., 2015), threat detection in image analysis (Falcone et al., 2012), and air traffic control (Nelson et al., 2014). tDCS has also decreased resumption lag after interruption (Blumberg et al., 2014) and maintained vigilance (McIntire et al., 2014) in real-world tasks.

Critical for the acquisition of these real-world skills are both online and offline learning. Online learning is the change in behavioral performance across trials within an experimental session and is analogous to encoding (Reis et al., 2009). Offline learning is the change in performance between sequential experimental sessions, from the last trial of the (n−1)th session to the first trial of the nth session, and is analogous to consolidation (Robertson et al., 2004). The modulation of online and offline learning rates for practical, real-world skill acquisition with tDCS of M1 or DLPFC has remained unexplored.
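The two definitions translate directly into simple differences over a sessions-by-trials score table. The sketch below is one literal reading of those definitions, not the estimator used in this study (which is not specified in this section); the function names and the scores are hypothetical.

```python
def online_learning(sessions):
    """Within-session change: last trial minus first trial of each session."""
    return [trials[-1] - trials[0] for trials in sessions]

def offline_learning(sessions):
    """Between-session change: first trial of session n minus last trial of session n-1."""
    return [cur[0] - prev[-1] for prev, cur in zip(sessions, sessions[1:])]

# Hypothetical accuracy (percent correct) for four daily sessions of three trials each.
scores = [
    [50, 55, 60],
    [62, 66, 70],
    [69, 72, 75],
    [77, 78, 80],
]

online = online_learning(scores)    # one value per session
offline = offline_learning(scores)  # one value per pair of consecutive sessions
print(online, offline)
```

A positive offline value indicates that performance improved between the end of one session and the start of the next, consistent with consolidation; a negative value indicates some loss between sessions.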

Here, we investigated changes in skill acquisition and learning rates with tDCS applied to either DLPFC or M1 during custom pilot training exercises developed and administered with a commercially available flight simulator (X-Plane). These results were recently reported in a poster presentation at the Society for Neuroscience Meeting (Choe et al., 2015). We measured task-evoked changes in functional activity using fNIRS and EEG as subjects learned to complete flight simulator and N-back training exercises at increasing levels of expertise across four daily consecutive sessions. We hypothesized that stimulation of DLPFC over the course of flight simulation and N-back training would alter group variability in skill learning, MFT power, and Hboxy and Hbtot concentrations in the DLPFC. Furthermore, we hypothesized that tDCS of M1 will alter tonic alpha-band power over parietal cortex.

# MATERIALS AND METHODS

#### Participants

Thirty-two right-hand dominant, healthy adult HRL Laboratories employees (31 males) participated in this study.

Easy Landing tasks are highlighted, and the duration of tDCS is depicted in red. (B) An example of 6 trials of the N-Back task is shown. 1-back orientation and location match trials are highlighted in yellow. (C) The flight simulator, neuroimaging (EEG and fNIRS), and tDCS setup is shown on a subject (1). Flight simulator equipment includes a three-panel display, a radio panel (2), an instrument panel (3) with (from left to right) compass, altimeter, airspeed indicator, vertical speed indicator, and turn/slip indicator, a multi-panel (4) with (from left to right) autopilot settings, auto throttle switch, flaps switch, and elevator trim wheel, a yoke (5), and a throttle quadrant system (6). (D) Autopilot flight path for the Easy Landing task is shown in 3 dimensions, color-coded by vertical speed. Screenshots for initial descent, approach, and landing are also shown.

Their ages ranged from 21 to 64 (mean ± STD = 38 ± 13). Participants were randomly assigned to one of four groups: DLPFC stim (n = 7, age = 35 ± 11), DLPFC sham (n = 7, age = 42 ± 13), M1 stim (n = 10, age = 41 ± 16), or M1 sham (n = 8, age = 31 ± 5). HRL Laboratories employees are a vulnerable class of subjects for this study. In order to manage the risk of any undue influence, coercion, or confidentiality breach, we only allowed individuals who are not directly supervised by the investigators of this study to volunteer, and only performed experiments during normal business hours (9 a.m.–5 p.m.) to mitigate any possibility for recourse or reward for participation in performance evaluation or job advancement. To maintain confidentiality, each subject was assigned a unique number, known only to the investigators of the study, and subject identities were not shared. This design is in line with the recommendations of Meyers (1979) on students and employees as a vulnerable population of subjects and complies with DHHS: protected human subject 45 CFR 46; FDA: informed consent 21 CFR 50. Inclusion criteria were: (1) normal or corrected-to-normal vision, (2) no prior history of epileptic seizures or known neurological disorders, and (3) no females who are pregnant or are likely to become pregnant during the course of the study. All participants provided written informed consent to participate in the experiment. JC, MDZ, and MEP are listed as inventors in patent applications on brain stimulation methods.

## Materials

#### Flight Simulator

Flight simulation tasks were designed and administered with the XForce Dream Simulator package (X-Force PC) and the X-plane 10 Flight Simulator software (Laminar Research). A depiction of the XForce Dream Simulator package can be seen in **Figure 1C**, and included a yoke, a radio panel, an instrument panel with compass, attitude indicator, altimeter, airspeed indicator, vertical speed indicator, and turn/slip indicator, a multi-panel with autopilot settings, auto throttle switch, flaps switch, and elevator trim wheel, and a throttle quadrant system. This flight simulator included an adjustable seat for maximum comfort for the subject. Three monitors were placed at an optimal distance from the subject to avoid any eyestrain. Custom scenarios were designed using the simulator software development kit following a model of flight training (Williams, 2012, see **Table 1**).

#### Neuroimaging

We recorded continuous EEG and fNIRS data during flight simulation training, N-back, finger tapping, situational awareness, and resting-state assessments. Horizontal and vertical electro-oculogram (EOG) was also recorded. EEG was collected using a 32-channel acti32Champ system, with electrodes placed in a custom, 10-10 based arrangement to accommodate tDCS electrodes (StarStim Neuroelectrics) and fNIRS illuminators/receivers (NIRSport NIRX) within custom headcaps (BrainVision). EEG caps were selected for each subject based on individual head size and aligned to Cz. Conductive gel (Signagel) was applied onto each EEG electrode and ultrasound gel (Aquasonic clear) was applied to each fNIRS source and detector. fNIRS was recorded with dual-wavelength continuous-wave (CW) near infrared (NIR) diffuse tomographic measurements at 760 and 850 nm. A total of 20 fNIRS channels (source-detector pairs) were recorded over the left M1 (10 channels) and right DLPFC (10 channels, see **Figure 2**). The distance between source-detector pairs was <3.5 cm (see **Figure 2**). EEG data were collected at 500 Hz, and fNIRS data were collected at 8 Hz. Locations of EEG electrodes and fNIRS channels can be seen in **Figure 2**.

#### tDCS

FIGURE 2 | Neuroimaging and tDCS experimental setup for DLPFC (A,C) and M1 stimulation (B,D). (A,B) EEG locations are denoted in blue and follow the 10–20 locations where possible. fNIRS sources (red) and detectors (green) are shown over the left M1 and right DLPFC with channels depicted as orange lines (M1 channels: FC3-FCC5h, FC3-FCC3h, C5-FCC5h, C3-FCC5h, C3-FCC3h, C1-FCC3h, C5-CCP5h, C3-CCP5h, C3-CCP3h, C1-CCP3h; DLPFC channels: AFF6h-AFF2h, AFF6h-F4, F2-AFF2h, F2-F2, F2-FFC4h, FFC6h-F4, FFC6h-FFT8h, FFC6h-FFC4h, FC4-FFC4h, FC4-F4). tDCS electrodes are denoted in purple (cathodes) and yellow (anodes) and follow the current values specified in Section Neuroimaging. (C,D) Predicted electric field intensities from the maximum focality montages from the Male 1 model in the Soterix HD Targets software (Soterix Medical).

Sham or actual tDCS was applied with the Starstim system (Neuroelectrics) following the finger tapping task (see **Figure 1A**). The total current applied was 2 mA, with scalp current density of 0.04 A/m<sup>2</sup> for active tDCS (for 60 min), or 0.1 mA (0.002 A/m<sup>2</sup>) for sham tDCS (for 1 min). Currents were applied with a 1 min ramp-in at initiation and a 1 min ramp-out at termination. Sham stimulation was used as a control condition to induce the physical sensation associated with tDCS (e.g., tingling) without directly stimulating the brain areas located below the electrodes (Coffman et al., 2012b). Silver/silver chloride electrodes were each 3.14 cm<sup>2</sup> in size (total anode area = 6.28 cm<sup>2</sup>; total cathode area = 9.42 cm<sup>2</sup>). During stimulation the impedance value was limited to 20 kΩ for operation of the device; actual impedance values typically were below 10 kΩ, and impedances were observed to be stable throughout the duration of the experiment. tDCS channel impedances were continually monitored at 1 Hz. To achieve maximum focality for the targeted brain regions of interest, electrode placements were derived using HD Targets (Soterix Medical) with stimulation targets in the left M1 (right posterior field orientation model) and right DLPFC (left anterior field orientation model), and possible electrode locations were defined using standard 10-10 electrode locations (see **Figure 2**). HD Targets uses an MRI-derived finite element brain model that provides predictions for current flow and alignment for multiple interacting electrodes; this model was used to calculate maximal focality and intensity for regions of interest. For M1 stimulation, this resulted in current values of: CP1 = 1244 µA, CP3 = 745 µA, FP1 = −417 µA, F8 = −448 µA, and F9 = −1124 µA. For DLPFC stimulation, current values were F6 = 1511 µA, FC6 = 482 µA, AF8 = −271 µA, AF4 = −283 µA, and FP2 = −1439 µA (see **Figure 2**). The predicted field intensities at the target locations were 0.56 V/m (DLPFC) and 0.45 V/m (M1). Groups are denoted as: DLPFC stim, DLPFC sham, M1 stim, and M1 sham.
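As a quick consistency check on the reported montages, the electrode currents for each target should sum to zero (current entering through the anodes must leave through the cathodes), and the summed anodal current should be close to the stated 2 mA total. The sketch below runs that arithmetic; the dictionary layout and the function name `summarize_montage` are illustrative choices, not part of the stimulation or targeting software.

```python
# Electrode currents in microamps, copied from the montage description.
M1_MONTAGE = {"CP1": 1244, "CP3": 745, "FP1": -417, "F8": -448, "F9": -1124}
DLPFC_MONTAGE = {"F6": 1511, "FC6": 482, "AF8": -271, "AF4": -283, "FP2": -1439}


def summarize_montage(currents_ua):
    """Return anodal, cathodal, and net current (microamps) for one montage."""
    anodal = sum(c for c in currents_ua.values() if c > 0)
    cathodal = sum(c for c in currents_ua.values() if c < 0)
    return {"anodal_ua": anodal, "cathodal_ua": cathodal, "net_ua": anodal + cathodal}


for name, montage in (("M1", M1_MONTAGE), ("DLPFC", DLPFC_MONTAGE)):
    print(name, summarize_montage(montage))
```

Both montages balance exactly, with 1989 µA (M1) and 1993 µA (DLPFC) of anodal current, which is consistent with the nominal 2 mA total.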

#### Procedures

All participants performed flight simulation training, N-back, finger tapping, situational awareness, and resting-state assessments once per day for four consecutive daily sessions (see **Figure 1**). Resting-state brain activity was collected for 1 min both before and after the experiment. During resting scans, subjects observed autopilot flight (level flight at 5000 ft. altitude) and were instructed to keep their eyes open and observe the visual scene while keeping their hands in their laps. Following the pre-experiment resting-state assessment, motor reference scans were taken during a simple motor sequence task in which subjects were instructed to touch each fingertip with the thumb of the right hand in sequence/cycle, continuously for 30 s (**Figure 1A**, finger-tapping task). We analyzed neuroimaging data recorded during the finger-tapping task as a confirmatory measure (see Supplementary Figures S4, S5), where sensorimotor network activity was expected to be evident in EEG as increased power in the beta band, and reduced power in the alpha band, compared to baseline, and in fNIRS as an increase in deoxygenated hemoglobin beneath M1 sensors.

Participants then performed the N-back task followed by a series of basic flight training exercises including a situational awareness task, climbing to a fixed altitude, turning at a constant roll angle, and a controlled descent. Following these flight control tasks, participants performed a series of landing tasks including the "easy landing" task, nighttime landing, nighttime landing without runway lights, a landing in mountainous terrain, and a landing in turbulent weather (**Figure 1A**). Results for the situational awareness, free flight, climb to fixed altitude, heading change at constant roll angle, descent at constant vertical speed, nighttime landing, no-lights landing, mountain, and turbulence landing tasks are the subject of subsequent manuscripts.

#### N-Back

The Brain Workshop N-back task was implemented in this study (Paul Hoskinson, V.4.8.8 http://brainworkshop.sourceforge.net/). Participants monitored position and image for N-back matches without audio feedback. Custom N-back images were used, showing airplanes in eight different orientations with 1 of 3 possible flight numbers (24 total image possibilities, see **Figure 1B**). Subjects completed six blocks of 20 trials each day. Every subject began the N-Back task at the 1-back level, and was instructed at the beginning of each block to focus on a central fixation point. Subjects were free to move their eyes during the task. Upon reaching an upper threshold of accuracy within a given block (>80%), the task difficulty was increased (N + 1) using an adaptive threshold paradigm (Jaeggi et al., 2008). Upon reaching a lower threshold of accuracy (<20%), the task difficulty was decreased (N − 1). Each time this occurred, the changes were explained to the subjects between blocks. At the completion of each block, subjects were allowed to review the rules and ask clarifying questions about the tasks.
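The adaptive difficulty rule described above can be sketched in a few lines. This is a minimal illustration rather than the Brain Workshop implementation: the function name `next_n_level` is hypothetical, and the floor at 1-back (`min_n`) is an assumption, since the text only states that every subject started at the 1-back level.

```python
def next_n_level(current_n, block_accuracy, upper=0.80, lower=0.20, min_n=1):
    """Return the N level for the next block given accuracy (0-1) on this block."""
    if block_accuracy > upper:
        return current_n + 1              # above the upper threshold: harder
    if block_accuracy < lower:
        return max(min_n, current_n - 1)  # below the lower threshold: easier (assumed floor)
    return current_n                      # otherwise stay at the same level


def run_session(block_accuracies, start_n=1):
    """Trace the N level that follows each block of a session."""
    levels = []
    n = start_n
    for accuracy in block_accuracies:
        n = next_n_level(n, accuracy)
        levels.append(n)
    return levels


print(run_session([0.85, 0.90, 0.50, 0.10, 0.60, 0.95]))
```

For the six hypothetical block accuracies shown, the level rises to 2 and then 3, holds, drops back to 2 after the 10% block, holds, and returns to 3.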

#### Autopilot Landing Observation

Subjects viewed a replay video of the autopilot executing an "optimal" landing from ∼800 ft. altitude onto a runway. Initial aircraft position was aligned with the runway and the aircraft was already maintaining proper vertical speed for ideal glide slope. This scenario presents a wide, long, flat runway with no visual obstructions and no landscape features that interfere with landing the aircraft. Subjects were instructed not to manipulate controls or control the simulation in any way, but were told to pay close attention to the flight parameters through the instrumentation, as well as the visual field displayed by the simulator as the aircraft proceeded with landing. Particular emphasis was placed on two key parameters: azimuth (20°) for runway alignment, and vertical speed (∼700–800 ft./min) for appropriate glide slope. Attention was also drawn to the final control input to landing (pitch up at ground contact), and subjects were instructed to minimize landing force (G-force) as a top priority. Once the autopilot landing was viewed in its entirety, subjects were given the opportunity to ask questions about the landings. Most subjects asked very few, if any, questions, typically on the first trial day.

#### Easy Landing Task

Subjects were instructed to complete the landing task as shown by the autopilot under daylight conditions and 100% visibility. Subjects attempted landing under these conditions a total of 5 times per day. As the subject attempted replication of the autopilot landing, the experimenter made observations in three categories: (1) Vertical speed maintenance; (2) Runway alignment; and (3) Final approach dynamics (pitch angle at touchdown). Any large deviations from the autopilot in any of these categories were noted, then provided as feedback to the subject after the plane had touched down and the simulator paused. When given, feedback was ∼1–2 min in length and conducted in an informal manner. The duration of feedback also shortened throughout training as the subject made fewer errors. Following feedback, the subject was offered the opportunity to ask any questions regarding landing technique, then the scenario was restarted. If subjects passed beyond the terminal end of the runway, the attempt was ended and the landing listed as "missed landing." This counted against the number of subject attempts (i.e., attempts were not repeated due to missed landing). Feedback methods and handling of missed landings were identical for all landing tasks.

## Data Analysis

#### EEG

EEG data were preprocessed using EEGLAB (Delorme and Makeig, 2004) by applying a 0.5 Hz high-pass filter (Butterworth, 12 dB/oct) and removing bad channels (max = 19%). Adaptive Mixture Independent Components Analysis (AMICA) (Delorme et al., 2012) was then used to detect and remove artifacts associated with eye blinks, vertical and horizontal electrooculogram, electrocardiogram, and tDCS-related voltage fluctuation. Following artifact rejection using AMICA, data were back-reconstructed and channels removed prior to AMICA decomposition were interpolated back into the data by spherical interpolation. Blocks corresponding to N-back, resting-state, and Easy Landing tasks were then segmented from the data.

Frequency decomposition was performed using FieldTrip (Oostenveld et al., 2011) by first segmenting data for each task into sequential 1-s epochs. Data were then windowed using a Hanning taper, and the frequency content of each epoch was assessed at 1 Hz increments from 4 to 7 Hz (theta-band) or 8–12 Hz (alpha-band) using a Fast Fourier Transform (multitaper method). After frequency decomposition, epochs with average theta or alpha power greater than two standard deviations from the mean were rejected, and remaining epochs were averaged for each participant, training day, and task. Data missing due to equipment issues (i.e., amplifier battery failure: N = 4, stimulus trigger errors: N = 1, or excessive noise/artifact during recording which could not be removed with AMICA: N = 7) were replaced with the mean for that participant group and training day prior to statistical analysis. We verified sensorimotor network activity during the finger-tapping task on the first day of flight simulator training (prior to tDCS) within baseline-subtracted beta and alpha band power maps, calculated across all subjects (Supplementary Figures S4, S5).
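The epoch-wise band-power computation and the two-standard-deviation rejection rule can be illustrated with a small standard-library sketch. It is not the FieldTrip pipeline: it evaluates a direct discrete Fourier sum at the listed integer frequencies with a single Hann taper, the function names are hypothetical, and treating "two standard deviations from the mean" as a symmetric (absolute) deviation is an assumption.

```python
import cmath
import math
import statistics

THETA_HZ = [4, 5, 6, 7]
ALPHA_HZ = [8, 9, 10, 11, 12]


def hann_window(n):
    """Symmetric Hann taper of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def band_power(epoch, fs_hz, freqs_hz):
    """Mean power of a Hann-tapered epoch over the given frequencies."""
    taper = hann_window(len(epoch))
    tapered = [x * w for x, w in zip(epoch, taper)]
    powers = []
    for f in freqs_hz:
        # Direct Fourier sum at frequency f (adequate for a handful of bins).
        coef = sum(x * cmath.exp(-2j * math.pi * f * i / fs_hz)
                   for i, x in enumerate(tapered))
        powers.append(abs(coef) ** 2 / len(epoch))
    return sum(powers) / len(powers)


def reject_outlier_epochs(powers, n_sd=2.0):
    """Keep epochs whose band power lies within n_sd sample SDs of the mean."""
    mean = statistics.mean(powers)
    sd = statistics.stdev(powers)
    return [p for p in powers if abs(p - mean) <= n_sd * sd]


# Synthetic 1-s epoch at 500 Hz containing a pure 6 Hz oscillation.
fs = 500
epoch = [math.sin(2 * math.pi * 6 * i / fs) for i in range(fs)]
theta = band_power(epoch, fs, THETA_HZ)
alpha = band_power(epoch, fs, ALPHA_HZ)
print(theta > alpha)
```

With a 1-s epoch the frequency resolution is 1 Hz, which matches the 1 Hz increments used in the analysis; the synthetic 6 Hz signal therefore loads on the theta band rather than the alpha band.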

Participants receiving tDCS were compared with sham tDCS participants at each of the 4 days of training using independent-samples t-tests, which separately tested differences in alpha-band and theta-band activity at each sensor. Additionally, day 1 was compared to day 4 within each tDCS group and sensor using paired t-tests to assess training-related effects on alpha-band and theta-band activity. Statistical tests were corrected for multiple comparisons using cluster-based permutation tests (500 repetitions, data point α = 0.05, cluster-level α = 0.05, minimum spatial extent = 2 channels). Results from these comparisons are reported separately for each cluster of significant differences between groups/conditions. We calculated mean alpha/theta band power within clusters for use in examining relationships between task-related EEG and fNIRS/behavioral data.

We also examined correlations between behavioral measures, fNIRS beta values, and mean theta/alpha power across clusters identified during cluster-based permutation tests comparing days 1 to 4. fNIRS beta values were unavailable for 7 subjects (4 active M1 subjects and 3 sham M1 subjects) because time stamps could not be parsed from the fNIRS data files; therefore, the numbers of participants used in this analysis were: M1 stim = 6, M1 sham = 5, DLPFC sham = 7, and DLPFC stim = 7. These correlations were examined only within the stimulation groups where significant clusters were identified. To investigate relationships between midline frontal theta-band activity and behavioral measures in the easy landing and N-Back tasks, Pearson correlation statistics were examined. Midline frontal theta-band activity was calculated as the mean theta power across electrodes Fz and FC1, the electrodes nearest to medial prefrontal cortex. We compared midline frontal theta-band activity in the easy landing task with autopilot displacement, g-force at landing, vertical speed at landing, roll at landing, pitch at landing, and online/offline learning rates for number of control inputs, autopilot displacement, vertical speed deviance from autopilot, and vertical speed variance. In the N-Back task, we compared midline frontal theta-band activity with average N level achieved and online/offline learning rates. Correlations were examined separately for DLPFC and M1 groups, stim and sham groups, and days of training. Because of the large number of correlation statistics examined, we used a conservative alpha of 0.001 to determine statistical significance. We additionally report statistics with a relaxed alpha of 0.05; however, these effects are considered trends in this analysis.

In addition to cluster-based permutation tests across all channels, a 3-way split-plot ANOVA was used to compare midline frontal theta-band activity between tDCS conditions (stim and sham), days of training (day 1, 2, 3, and 4), and training block (Block 1, 2, 3, 4, and 5) for the N-back and easy landing tasks. Huynh-Feldt epsilon was used to correct degrees of freedom for assumptions of sphericity, and Fisher's Least Significant Difference corrections of alpha were used for simple-effects/pairwise comparisons (Maxwell and Delaney, 2004).

#### fNIRS

TABLE 2 | Average ± standard deviation of day 1, day 4, and day 4–day 1 Hboxy, Hbdeoxy, and Hbtot concentrations across subjects and channels for M1 and the DLPFC.

All values presented are in mM concentration units. Bold numbers indicate a significant difference between days 4 and 1 as determined by SPM (see Materials and Methods Section fNIRS).

fNIRS data were processed within the nirsLAB analysis package (NIRx Medical Technologies, Glen Head, NY; Xu et al., 2014). The Gratzer Spectrum was used to measure the absorbance spectra of Hbdeoxy and Hboxy, with average wavelengths of 760 and 850 nm, respectively. The corresponding molar extinction coefficients ε are ε<sub>Hboxy</sub> = [1097.0 781.0] cm<sup>−1</sup>/M and ε<sub>Hbdeoxy</sub> = [645.5 1669.0] cm<sup>−1</sup>/M (nirsLAB, NIRx Medical Technologies). The differential path lengths were 5.98 for Hboxy and 7.15 for Hbdeoxy (Essenpreis et al., 1993). In the Beer-Lambert law calculation, the distance between each source-detector pair was ≤3.5 cm, and the exact distances were computed within nirsLAB according to the corresponding distances on the headcap.
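For readers unfamiliar with the modified Beer-Lambert law, the conversion from optical density changes at two wavelengths to Hboxy and Hbdeoxy concentration changes amounts to solving a 2 × 2 linear system. The sketch below is illustrative only and is not the nirsLAB implementation. Several details are assumptions made for the example: the first and second entries of each extinction-coefficient vector are taken to correspond to the same two wavelengths, the two differential path length values are applied per wavelength, and the 3.0 cm source-detector distance is a placeholder.

```python
# Assumed layout: one (eps_oxy, eps_deoxy) pair per wavelength, in cm^-1/M.
EPS = [(1097.0, 645.5), (781.0, 1669.0)]
DPF = [5.98, 7.15]       # assumed to apply per wavelength
DISTANCE_CM = 3.0        # hypothetical source-detector separation


def forward_od(d_oxy, d_deoxy, eps=EPS, dpf=DPF, distance_cm=DISTANCE_CM):
    """Optical density change at each wavelength for given concentration changes (M)."""
    return [(e_oxy * d_oxy + e_deoxy * d_deoxy) * distance_cm * p
            for (e_oxy, e_deoxy), p in zip(eps, dpf)]


def invert_mbll(delta_od, eps=EPS, dpf=DPF, distance_cm=DISTANCE_CM):
    """Solve the 2x2 system for (d_oxy, d_deoxy, d_total) using Cramer's rule."""
    a11, a12 = (e * dpf[0] * distance_cm for e in eps[0])
    a21, a22 = (e * dpf[1] * distance_cm for e in eps[1])
    det = a11 * a22 - a12 * a21
    d_oxy = (delta_od[0] * a22 - a12 * delta_od[1]) / det
    d_deoxy = (a11 * delta_od[1] - a21 * delta_od[0]) / det
    return d_oxy, d_deoxy, d_oxy + d_deoxy


# Round trip: simulate optical densities, then recover the concentrations.
od = forward_od(2e-6, -1e-6)
print(invert_mbll(od))
```

Hbtot is obtained as the sum of the two recovered concentration changes, matching its use alongside Hboxy and Hbdeoxy in the analyses.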

Hbdeoxy, Hboxy, and Hbtot concentration time series were band-pass filtered from 0.01 to 0.2 Hz (finite impulse response with least-squares error minimization) to remove slow drifts in the signal and respiratory and cardiac rhythms. Inter-trial data were removed from the time series, and the average baseline concentration values were subtracted from the task-evoked concentration measurements.

The average concentration values of Hbtot, Hboxy, and Hbdeoxy were computed separately for each channel, subject, task, and day. Concentration values were averaged within days, across all 20 trials of each of the 6 blocks in the N-back, and all 5 trials of the easy landing task. Individual channel concentration values were then averaged across channels within regions (M1 and DLPFC) and across subjects within each group. Day 1 group-averaged concentration values were then subtracted from Day 4 concentrations to compute the change in concentrations across the duration of the experiment.

Statistical significance of group-averaged concentration changes from days 1 to 4 was determined using Statistical Parametric Mapping (SPM version 8). SPM was performed based on a general linear model of the canonical hemodynamic response function, with a discrete cosine transformation used for temporal filtering. A t-statistic-thresholded, baseline-subtracted Beta image was generated for each subject for baseline-subtracted, task-evoked Hbtot, Hboxy, and Hbdeoxy concentrations for days 1 and 4 (corrected for multiple comparisons across channels using the Bonferroni correction: number of channels = 20, p < 0.0025). Paired t-statistic maps (subtracting the day 1 from day 4 betas) were generated from baseline-subtracted, trial/block-averaged (within day n = 5 Easy landing, n = 6 N-back) task betas obtained from individual subjects. If a t-statistic exceeded the corrected p-value threshold of 0.0025, the days 4–1 concentration values were determined to be significant (**Table 2**, denoted by bolded values).

Channel-wise statistical analysis was performed on all channels for measurements of Hboxy, Hbdeoxy, and Hbtot days 4–1 concentrations in easy landing and N-back for all subjects. Significance was determined if the trial-wise average exceeded 3.5 standard deviations from the null hypothesis of no concentration change (Bonferroni corrected, two-sided, Fisher's test p < 0.00035).

# Behavioral Performance

#### **N-Back**

FIGURE 3 | N-back learning rates across experimental groups. The average group-learning rate is shown for each group in 1 and 2 back trials (left) and for all back trials (right column, scaled by information content; see Section N-Back). Learning rates computed by combining across position and image match trials (top row), for position trials (middle row), and for image trials (bottom row) are shown.

Raw percent accuracy values for each subject and for each block were scaled according to the information content required for each back condition. A 100% score on a 1-back trial requires both an image match (9 possible plane orientations, 4 possible flight numbers) and a position match (9 possible spatial locations); a 100% score on 2-back doubles the required information kept in working memory, and a 100% score on a 3-back trial triples this value. The normalization weights used for the 1-, 2-, and 3-back raw accuracy values were therefore 0.33, 0.66, and 1.0. Alternative normalization schemes (e.g., bit-wise maximum information and log-scaling) did not generate substantial differences in the outcome metrics. Learning rates were determined by the slope (±standard deviation) of a linear regression over block-wise group-averaged scaled percent accuracy: (1) across all 4 days (overall learning rate), (2) within each day independently (online learning rate), and (3) between the accuracy of the first trial of the nth day and the last trial of the (n−1)th day (offline learning rate) (Reis et al., 2009). Meta-learning rate was determined from the slope of the linear regression over the combined online/offline learning rate time series (the rate of change in the learning rates over time). The average number of trials for each group to reach the 2 and 3 back levels in the adaptive N-back task and the average streak (number of consecutive trials) at 2 and 3 back were calculated for each group. Learning rates were compared using one-sample (against zero) or paired two-tailed t-tests (both α = 0.05).
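The scaling and the three learning-rate definitions above can be made concrete with a short sketch. It is a simplified illustration under stated assumptions: function names are hypothetical, block index is used as the regression predictor, and the offline rate is taken as the simple difference between the first block of one day and the last block of the previous day (the slope of a line through those two points).

```python
WEIGHTS = {1: 0.33, 2: 0.66, 3: 1.0}  # normalization weights for 1-, 2-, 3-back


def scaled_accuracy(raw_percent, n_level):
    """Scale raw percent accuracy by the weight for the N-back level."""
    return raw_percent * WEIGHTS[n_level]


def slope(values):
    """Ordinary least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def online_rates(days):
    """Within-day slope for each day (each day is a list of block scores)."""
    return [slope(blocks) for blocks in days]


def offline_rates(days):
    """Change from the last block of day n-1 to the first block of day n."""
    return [days[d][0] - days[d - 1][-1] for d in range(1, len(days))]


def overall_rate(days):
    """Slope across all blocks of all days."""
    return slope([b for blocks in days for b in blocks])


days = [[10.0, 20.0, 30.0], [40.0, 40.0, 40.0]]  # hypothetical scaled scores
print(online_rates(days), offline_rates(days), overall_rate(days))
```

In this toy example the first day shows online learning (a positive within-day slope), the second day shows none, and the jump between days is captured entirely by the offline rate.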

#### **Easy landing**

G-force assessment. Flight parameters were sampled from the simulator at 10 Hz, including altitude (above ground level), longitude, and latitude. The derivative of the vertical speed of the aircraft at runway touchdown determined the landing impact g-force (acceleration divided by 9.8 m/s<sup>2</sup> ). Smaller g-force landings reflected improved skill with the landing task as subjects were asked to minimize this value to the best of their ability for each trial. The impact g-force is a "one-shot" assessment of landing skill at the most difficult and critical phase of the landing task, while ignoring other factors of landing performance (e.g., approach, glide slope, alignment, aircraft attitude). Online, offline, and meta learning rates are negative indicating a reduction in the applied G-force at landing (Supplementary Table S2, **Figure 4**).
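As a rough illustration of the impact g-force calculation, the derivative of vertical speed at touchdown can be approximated with a finite difference at the 10 Hz sampling rate. The sketch assumes vertical speed is already expressed in m/s, uses a backward difference at a known touchdown sample, and reports the magnitude; these choices and the function name are assumptions for the example, not details given in the text.

```python
G_MPS2 = 9.8  # standard gravity used for normalization, in m/s^2


def landing_g_force(vertical_speed_mps, touchdown_index, fs_hz=10.0):
    """Approximate impact g-force from the vertical-speed change at touchdown."""
    if touchdown_index < 1:
        raise ValueError("touchdown_index must be at least 1")
    dv = vertical_speed_mps[touchdown_index] - vertical_speed_mps[touchdown_index - 1]
    acceleration = dv * fs_hz          # finite-difference derivative, m/s^2
    return abs(acceleration) / G_MPS2  # magnitude in units of g


# Hypothetical descent at 3 m/s that is arrested within one 0.1-s sample.
print(landing_g_force([-3.0, -3.0, 0.0], touchdown_index=2))
```

A gentler flare before touchdown reduces the vertical-speed change across the touchdown sample and therefore the computed g-force, which is the behavior subjects were asked to optimize.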

Flight path deviation. Latitude, longitude, and altitude were transformed into Cartesian (X-Y-Z) coordinates and the Euclidean distance between coordinates of subject and autopilot was computed over the flight path, $\sum \sqrt{x^2 + y^2 + z^2}$, using a moving window average to resample and align the flight paths. The Euclidean distances for each sample were then summed in order to provide the total deviation from the autopilot flight path. Unlike G-force, this metric takes into account the entire approach, including all flight maneuvers leading up to the final descent and touchdown. This measure, however, does not take into account proficiency with aircraft controls or avionics; it merely assesses the ability of the subject to adhere to the reference flight path. Subjects were instructed to replicate the flight path of the autopilot landing observation. With this metric, a better landing would have lower deviation values (Supplementary Table S2, **Figure 5**).
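The summed Euclidean distance can be sketched as follows. The example assumes the subject and autopilot paths have already been converted to Cartesian coordinates and resampled to the same number of aligned samples (the moving-window alignment step is not reproduced here); the function name is hypothetical.

```python
import math


def path_deviation(subject_xyz, autopilot_xyz):
    """Sum of per-sample Euclidean distances between two aligned 3-D paths."""
    if len(subject_xyz) != len(autopilot_xyz):
        raise ValueError("paths must be resampled to the same length first")
    return sum(math.dist(s, a) for s, a in zip(subject_xyz, autopilot_xyz))


# Hypothetical two-sample paths: identical start, then a 5-unit offset.
subject = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
autopilot = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(path_deviation(subject, autopilot))
```

Because the distances are summed rather than averaged, the result depends on the number of samples, so comparisons are only meaningful between paths resampled to a common length.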

Vertical speed deviation. The vertical speed of the subject throughout the landing trial was subtracted from the vertical speed of the autopilot landing at each time step and summed as in the flight path deviation. The vertical speed profile of the aircraft is stereotypic for an excellent landing and this parameter is visible on the aircraft's instruments. Subjects could therefore be reasonably expected to match the vertical speed of their aircraft with that of the example shown during autopilot observation (vertical speed maintained at 600 ft./min for majority of approach, see **Figure 1D**). Replication of the autopilot-derived demonstration flight should result in lower overall vertical speed deviation values as performance improves (Supplementary Table S2, **Figure 5**).

FIGURE 5 | G-force at moment of landing results across all four experimental groups. (A–D) Average g-force at moment of landing across all 4 days is plotted for each group. Note reduction in between-subject variance in days 3 and 4 of the DLPFC stim group. (E) Online and offline g-force learning rates are plotted for each experimental group across the duration of the experiment. Whole numbers on the x-axis represent the average online learning rate (slope of scaled percent correct linear regression for each subject across 6 blocks within a day) and ½ numbers on the x-axis represent offline learning rate (slope of the percent correct on the last trials of the N-1 day to the first trial of the Nth day). Smaller G-force indicates improved performance.

Vertical speed variance. The amount of vertical speed variation throughout the landing approach was summed across trials to represent the degree to which a subject could maintain a steady, continuous descent. This measure does not penalize the subject for deviating from the ideal flight path, it merely assesses the degree to which the subject can maintain a smooth descent with little variation. This removes the goal-directed aspect of flight parameter maintenance while focusing on the motor aspect of flight parameter maintenance. As a means of comparison, the autopilot flight data only changes vertical speed in the final 5 s before landing, which minimizes this variance in the autopilot. In the ideal scenario, vertical speed stays constant, with only slight changes necessary for the final phase of landing; therefore, smaller variances indicate superior flight performance (Supplementary Table S2, **Figure 6**).

Control input measure. The number of control inputs was computed over landing trials by identifying the number of sign changes in the vertical speed parameter throughout the landing period. This metric identifies to what extent subjects could maintain a consistent vertical speed profile (negative vertical speed indicates descent, positive ascent). Since maintaining vertical speed with minimal control input adjustment does not require specific planning of actions or prediction of flight path, it was hypothesized to be a primarily motor-processing focused measure. The number of sign changes in the vertical speed variable was summed between start and end of the landing. As a means of comparison, the autopilot had 1 major control input at the nose flare ∼1 s before touchdown.
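Counting sign changes in a sampled vertical-speed trace is straightforward; a minimal sketch follows. How samples that are exactly zero should be treated is not specified in the text, so skipping them is an assumption here, and the function name is hypothetical.

```python
def count_sign_changes(vertical_speed):
    """Count transitions between descent (negative) and ascent (positive)."""
    # Zero samples carry no sign and are skipped (assumed convention).
    signs = [1 if v > 0 else -1 for v in vertical_speed if v != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Hypothetical trace: descent, brief climb, then descent again.
print(count_sign_changes([-5.0, -3.0, 2.0, 4.0, -1.0]))
```

For the hypothetical trace, the function counts two changes: one when the aircraft starts climbing and one when it resumes descending.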

#### **Outlier rejection**

For each metric, trial-wise data were examined for outliers across subjects across all groups. If any trial exceeded three standard deviations from the mean, it was determined an outlier and removed from analysis. Outlier rejection was performed on a trial-wise basis for all computed metrics.

#### **Group variance analysis**

For each metric, the variance in the average online learning rate was computed as the change in the group's average accuracy, treating each subject's performance in a trial as a repeated sample within days. For the N-back task, the metric used was the scaled percent accuracy across 6 trials per day. For the easy landing task, the metrics used were the flight metrics computed over the course of 5 landing trials per day. This measure is the variance in the online learning rate linear regression (Reis et al., 2009). Significant differences in learning rate variance were assessed with a two-sample F-test for equal variances. The null hypothesis that two independent samples of two subject pools come from a single normal distribution with the same variance was tested against the alternative that they come from two normal distributions with different variances. F-stat criticality was computed by generating an F cumulative distribution function appropriate to the variance ratio and degrees of freedom of the sample pools. The resulting critical values are asymmetric and can be used at either tail. We were then able to determine the distance between the computed F and F = 1 (null hypothesis). Across days, Bartlett's test was performed to test the hypothesis of equal population variance across groups. This test was performed on the average subject metrics (across trials within days) to preserve sample independence. Reported p-values represent the probability of observing the given result by chance if the null hypothesis were true (Snedecor and Cochran, 1989).

# RESULTS

# Finger-Tapping Task

As expected, the finger-tapping task induced beta band oscillatory activity and increased the concentration of Hbdeoxy over sensorimotor cortex, and reduced alpha band activity over frontal and parietal cortex compared to baseline (Supplementary Figures S4, S5). Power in the beta band was greatest over left sensorimotor cortex, contralateral to the hand used during the finger-tapping task.

#### N-Back Task Behavioral Results

#### **DLPFC stimulation**

The DLPFC stim group showed significant overall learning in five separate learning rate measures, compared to two significant learning rates observed for the DLPFC sham stimulation group (one-sample, two-tailed t-test, see **Figures 3**, **4**, and Supplementary Table S1). Significant overall learning was observed for the DLPFC stim group collectively across all trial types (combining image and position match trials, denoted as "combined trials") and for position match trials, aggregating across all 1/2/3 back trials (scaled according to the methods in Section N-Back). Significant overall learning was also observed for the DLPFC stim group in 1-Back combined, position, and image trials. Significant overall learning was observed for the DLPFC sham group for combined and position trials, across all backs (see **Figure 3**). Meta-learning regressions did not show statistically significant changes in learning rates between stim and sham groups.

Neither the initial nor the final behavioral performance was significantly different between DLPFC stimulation and sham groups (Supplementary Table S1). The average trial duration to reach 2-/3-back was not significantly different between stimulation and sham groups. In addition, the average number of trials to reach 2-/3-back and the average 2-/3-back streak durations were not statistically different between groups. Significant differences in online, offline, and combined learning rates were not observed between stimulation and sham groups (see Supplementary Table S1, **Figure 4E**).

The variance in the DLPFC stim group's learning rate was significantly less than the variance of the DLPFC sham group. Examined across days, the DLPFC stim group had significantly reduced variance compared to the DLPFC sham group on Day 3 of experimentation [Chi(1) = 5.77, p < 0.02, see **Figures 4A,B**]. Examined at the trial level, the reduced variance reached statistical significance in >33% of individual N-back trials comparing DLPFC stim with DLPFC sham, and no trials showed greater variance in the DLPFC stim group [Day 1 Trial 2: F(6, 6) = 0.21; Day 1 Trial 4: F(6, 6) = 0.18; Day 1 Trial 6: F(6, 6) = 0.23; Day 2 Trial 5: F(6, 6) = 0.095; Day 3 Trial 1: F(5, 5) = 0.069; Day 3 Trial 2: F(5, 5) = 0.17; Day 3 Trial 6: F(5, 5) = 0.15; Day 4 Trial 6: F(5, 4) = 0.021; p < 0.05]. These results support the hypothesis that tDCS of the right DLPFC would reduce the variability in individual learning rates in a cognitive task.
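For readers who wish to see how a two-group Bartlett statistic of the form Chi(1) can be obtained, the following is a minimal sketch restricted to two groups (one degree of freedom). It is not the authors' analysis code. The function names and sample values are hypothetical, and the p-value uses the identity that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math
from statistics import variance


def chi2_df1_p(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(chi2, 0.0) / 2.0))


def bartlett_two_groups(sample_a, sample_b):
    """Bartlett's test of equal variance for exactly two groups.

    Returns (chi2, p), where chi2 has k - 1 = 1 degree of freedom.
    """
    groups = (sample_a, sample_b)
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]
    variances = [variance(g) for g in groups]
    df_total = sum(dfs)  # N - k
    pooled = sum(df * v for df, v in zip(dfs, variances)) / df_total
    numerator = df_total * math.log(pooled) - sum(
        df * math.log(v) for df, v in zip(dfs, variances)
    )
    correction = 1.0 + (sum(1.0 / df for df in dfs) - 1.0 / df_total) / (3.0 * (k - 1))
    chi2 = max(numerator / correction, 0.0)  # guard against rounding below zero
    return chi2, chi2_df1_p(chi2)


# Hypothetical per-subject daily averages for two groups of 7 subjects.
stim_day = [0.70, 0.72, 0.71, 0.73, 0.69, 0.74, 0.72]
sham_day = [0.52, 0.85, 0.63, 0.91, 0.48, 0.77, 0.66]
print(bartlett_two_groups(stim_day, sham_day))
```

Under this identity, a statistic of 5.77 with one degree of freedom corresponds to a p-value between 0.01 and 0.02, consistent with the threshold reported above.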

#### **M1 stimulation**

The M1 stim group showed significant overall learning in five separate learning rate measures, compared to six significant overall learning rates observed for the M1 sham stimulation group. Significant overall learning was observed for the M1 stim group for combined, image and position trials aggregating across all 1/2/3 back trials. Significant overall learning was also observed for M1 stim in 1-Back combined and image trials. Significant overall learning was observed for the M1 sham group for combined trials, aggregating across all backs and for 1 and 2-Back trials as well as for position trials (all and 2-Backs) and 2-Back image trials (see **Figure 3**). Meta-learning regressions did not show statistically significant changes in learning rates between stim and sham groups.

As with the DLPFC groups, initial and final behavioral performance between stimulation and sham groups was not significantly different (Supplementary Table S1). The average duration to reach 2-/3-back was not significantly different between stimulation and sham groups. In addition, the average number of trials to reach 2-/3-back and the average 2-/3-back streak durations were not statistically different between groups. Significant differences in online, offline, and combined learning rates were not observed between stimulation and sham groups (see Supplementary Table S1 and **Figures 3**, **4E**).

Unlike the results observed for DLPFC stimulation, M1 stimulation resulted in minimal differences in learning rate variance between stimulation and sham groups. Only 1 trial showed reduced M1 stim variance compared to M1 sham variance, while there were 3/24 trials that indicated smaller M1 sham variance when compared with the values from the M1 stim group. Examined across days, no trials showed significant differences in variance under Bartlett's Test.

#### fNIRS Results

#### **DLPFC stimulation**

Hboxy. The exclusion criterion for individual fNIRS channels was a fluctuation greater than 0.001 mM between the maximum and minimum measured concentrations during baseline. We did not observe any concentration fluctuations above this cutoff threshold for any of the 20 fNIRS channels (10 above the DLPFC and 10 above the M1 cortex) across all 4 days of recording for the 25 subjects analyzed (7 DLPFC stim, 7 DLPFC sham, 6 M1 stim, 5 M1 sham). Subjects were not included (n = 3 in M1 stim, and n = 3 in M1 sham) if event time stamps could not be identified robustly within the fNIRS data files.
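The channel exclusion rule can be expressed compactly. The sketch below is illustrative only and assumes baseline concentrations are available as one list of values (in mM) per channel. The function name, the data layout, and the example values are hypothetical and do not describe the acquisition software used in the study.

```python
def channels_to_exclude(baseline_by_channel, max_range_mm=0.001):
    """Return channel names whose baseline max-min range exceeds the cutoff (mM)."""
    return [
        name
        for name, series in baseline_by_channel.items()
        if max(series) - min(series) > max_range_mm
    ]


# Hypothetical baseline Hboxy concentrations (mM) for two channels.
baseline = {
    "AFF6h-F4": [0.0100, 0.0104, 0.0098, 0.0101],   # range 0.0006 mM: kept
    "FC4-FFC4h": [0.0100, 0.0125, 0.0110, 0.0102],  # range 0.0025 mM: excluded
}
print(channels_to_exclude(baseline))
```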

Average Hboxy concentrations across subjects and channels significantly increased between day 1 and day 4 in M1 channels, and significantly decreased in DLPFC channels for the DLPFC sham group (see **Table 2**, **Figure 7**). Individual channel analysis showed no significant change in Hboxy concentrations from days 1 to 4.

Hbdeoxy. Average Hbdeoxy concentrations across subjects and channels significantly decreased between day 1 and day 4 in M1 and DLPFC channels for the DLPFC sham group (see **Table 2**).

Hbtot. Like Hboxy, average Hbtot concentrations across subjects and channels significantly increased between day 1 and day 4 in M1 channels for the DLPFC sham group, and significantly decreased in DLPFC channels for the DLPFC sham group (see **Table 2**). Individual channel analysis showed no significant change in Hbtot concentrations from days 1 to 4.

#### **M1 stimulation**

Hboxy. The average Hboxy concentration across subjects and channels within the DLPFC channels significantly decreased between days 1 and 4 in the M1 stim group (see **Table 2**). Individual channel analysis shows no significant change in Hboxy concentrations from day 1 to 4.

Hbtot. The average Hbtot concentration across subjects and channels within the DLPFC channels significantly decreased between day 1 and 4 in the M1 stim group (see **Table 2**). Individual channel analysis shows no significant change in Hbtot concentrations from days 1 to 4.

#### EEG Results

#### **Theta (4–7 Hz)**

DLPFC stimulation. In each day, significant differences in theta-band power were found between DLPFC stim and sham tDCS groups in frontal/central electrodes (**Table 3**). In days 1–3, right frontotemporal theta power was higher in DLPFC stim participants, compared to sham. Statistical differences were distributed over midline frontal electrodes in day 4. Comparison of days 1 and 4 revealed a significant increase in midline frontal theta-band power in stim, but not sham participants (see **Figure 8A** and **Table 3**). Split-plot ANOVA comparing MFT in the N-back task revealed a trend-level main effect of tDCS group, with DLPFC stim participants showing a greater effect than DLPFC sham participants [F(1, 12) = 4.65, p = 0.052]. There was no main effect of training or interaction between tDCS group and day of training (p > 0.1).

M1 stimulation. For M1 stimulation, broadly-distributed differences in theta-band power were seen between stim and sham participants during N-Back performance on days 1 and 3, which were mostly left-lateralized, and were strongest near the site of stimulation (**Table 3**). Importantly, no differences between days 1 and 4 were seen for M1 stim or sham participants in the N-Back (see **Table 3** and **Figure 8A**). Split-plot ANOVA comparing MFT revealed a significant main effect of day of training [F(3, 48) = 3.23, p = 0.048]; however, no significant or trend-level pairwise comparisons were found. There was no main effect of tDCS group or interaction between group and day of training.

#### TABLE 3 | Cluster statistics for comparisons of alpha- and theta-band power during the N-back task.

\*Reported t-values are the average t-statistic across all electrodes in a given cluster. \*\*Reported p-values are corrected for multiple comparisons using cluster-based permutation tests.

#### **Alpha (8–12 Hz)**

DLPFC stimulation. Alpha-band power differences between DLPFC stim and sham groups were found in days 2 and 4 (**Table 3**). In day 2, frontal alpha power was greater in DLPFC stim than sham participants. In day 4, differences existed in parietal and occipital electrode sites, with DLPFC stim greater than sham. Differences between day 1 and 4 were found only for the DLPFC stim group, characterized by reduced alpha power at left temporoparietal sites (see **Table 3** and **Figure 8B**).

M1 stimulation. Greater alpha-band power was found for M1 stim compared to sham participants in days 1 and 3 (**Table 3**). These differences were distributed over frontal, central, and parietal electrode sites, mostly near the site of stimulation. No differences in alpha power were found in the comparison of days 1 and 4, for either M1 stim or M1 sham (see **Table 3** and **Figure 8B**).

#### **EEG/fNIRS/behavioral correlations**

We did not find any significant correlations between MFT or alpha power in the N-Back task and behavioral measures (i.e., average N level achieved and online/offline learning rates, p's > 0.05). No correlations were identified between fNIRS beta values and either MFT or alpha power in the N-Back task for any group (p's > 0.1).

#### Easy Landing Task G-Force

#### **DLPFC stimulation**

tDCS to DLPFC reduced the variability (standard deviation) of the third and fourth day online learning rates compared to sham (DLPFC stim: day 3 = 0.355, day 4 = 0.583; DLPFC sham: day 3 = 0.846, day 4 = 0.637, see **Figure 5**). First trial comparisons of variance between DLPFC stim and DLPFC sham groups in day 3 showed statistically significant changes in between-subject variance [F(4, 5) = 0.046, p < 0.02]. There were no significant differences in variance for first trials of day 1 and day 2 (p > 0.1). Trial 1 of day 4 also did not show significant changes in variance (p > 0.1). Examined across days, Bartlett's comparisons of variance between DLPFC stim and DLPFC sham groups in day 3 showed statistically significant changes in between-subject variance [Chi(1) = 7.33, p < 0.01]. There were no significant differences in variance for days 1, 2, or 4 (p > 0.1). These results support the hypothesis that tDCS of the right DLPFC would reduce the variability in individual learning rates in the easy landing task.

Learning rates were determined by computing the rates at which performance improved (i.e., reduction of G Force over time, see **Figures 5A–D**). Meta-learning regressions did not show statistically significant changes in learning rates between stim and sham groups. The DLPFC stim group exhibited positive meta-learning rates (DLPFC stim = 0.052 ± 0.090), whereas the DLPFC sham group showed overall negative meta-learning (DLPFC sham = −0.051 ± 0.106), but this between-group difference did not reach statistical significance due to large within-group variance (p > 0.1; **Figure 5E**).
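As a sketch of how a within-day (online) learning rate and its between-subject variability might be computed, the example below fits an ordinary least-squares slope of G-force against trial number for each subject and then summarizes the slopes across the group. It assumes one value per trial and equally spaced trials. The function names and the G-force values are hypothetical, and the exact regression used in the study may differ.

```python
from statistics import mean, stdev


def learning_rate(scores):
    """Least-squares slope of score against trial index (1, 2, ..., n)."""
    trials = list(range(1, len(scores) + 1))
    t_bar, s_bar = mean(trials), mean(scores)
    sxx = sum((t - t_bar) ** 2 for t in trials)
    sxy = sum((t - t_bar) * (s - s_bar) for t, s in zip(trials, scores))
    return sxy / sxx


def group_rate_variability(scores_by_subject):
    """Return (mean slope, standard deviation of slopes) across subjects."""
    rates = [learning_rate(scores) for scores in scores_by_subject]
    return mean(rates), stdev(rates)


# Hypothetical landing G-forces over 5 trials for three subjects on one day.
day_scores = [
    [2.0, 1.9, 1.8, 1.7, 1.6],
    [2.4, 2.1, 1.8, 1.5, 1.2],
    [1.8, 1.9, 1.7, 1.8, 1.6],
]
print(group_rate_variability(day_scores))
```

A negative slope corresponds to improvement for G-force, since lower landing forces indicate better performance.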

There were also no statistically significant differences in offline learning rates (p's > 0.1), but DLPFC stim showed relatively strong offline learning between day 1 and 2 (−0.149 ± 0.52) compared to sham (0.124 ± 0.990).

The number of missed landings (did not land before or during the runway) was not different across groups over days (DLPFC stim: day 1: 2.9%, day 2: 2.9%, day 3: 0%, day 4: 0%; DLPFC sham: day 1: 2.9%, day 2: 14.3%, day 3: 5.7%, day 4: 0%). Missed landings typically occurred on the first trial of the day.

#### **M1 stimulation**

tDCS to M1 resulted in no significant changes in inter-subject variance when compared to the M1 sham group across days (Bartlett's Test, p's > 0.1). M1 sham appeared to have unusually low variance during day 2 (**Figure 3D**), and this was determined to be a statistically significant reduction of variance when compared to the M1 stim group for three of the five trials of Day 2 [F(8, 7) = 6.32, F(8, 7) = 3.86, F(8, 7) = 5.07; p < 0.05]. However, this reduction in variance was seen only on Day 2, and in single trials only on Days 1 and 3. All trials on Day 4 had no significant change between M1 stim and sham variances. There were no statistically significant changes in learning rates between stim and sham groups (Supplementary Table S2). As with the DLPFC stim group, there were also no statistically significant differences in offline learning rates (p's > 0.1), but M1 stim showed relatively strong offline learning between day 1 and 2 (−0.235 ± 0.676) compared to sham (0.258 ± 1.330).

Initial (day 1 average), group-averaged (across all days), and final (day 4 average) G-forces were not significantly different between experimental and control groups (paired t-test p > 0.1), which is similar to the results found with the N-back task. Though performance improved in both sham and stimulation cohorts (reduced overall G-force), the ultimate performance of each subject group was similar. It is probable that the landing task was effectively learned over the course of four training days, and subjects reached a G-force performance ceiling.

The number of missed landings was not different across groups over days (M1 stim: day 1: 10%, day 2: 10%, day 3: 2.0%, day 4: 0%; M1 sham: day 1: 0%, day 2: 0%, day 3: 0%, day 4: 2.5%).

#### 3D Autopilot Displacement

No group (DLPFC stim/sham or M1 stim/sham) exhibited statistically significant learning (i.e., reduction of flight path displacement over time), as inter-subject variability was very high for this metric (**Figures 6A–D**). All groups showed inconsistent positive and negative learning slopes, and though the M1 stim group had negative online learning slopes for all 4 days, none of these reached statistical significance (p's > 0.1). When comparing group variances for 3D autopilot displacement over experiment days, the M1 stim group showed greater variance compared with the M1 sham group during Day 1 [Chi(1) = 8.46, p < 0.01]. The variance for the M1 stim group, however, was significantly lower than that of the DLPFC stim group for 3 trials across 3 days of training [Day 1, Trial 1: F(5, 9) = 5.1032; Day 2, Trial 4: F(6, 9) = 6.03; Day 3 Trial 2: F(6, 7) = 3.99; p < 0.05]. Interestingly, this was also true for M1 sham vs. DLPFC stim [Day 1, Trial 1: F(5, 7) = 13.67; Day 2 Trial 4: F(6, 7) = 4.74; p < 0.001]. No other variance comparisons yielded statistically significant results (Supplementary Table S2, **Figure 6**). While the combined learning intercept for the M1 sham group was negative (−13140 ± 13300, p < 0.05), this resulted from an isolated day 1/2 offline learning rate with a large standard deviation (−29011 ± 41147).

#### Vertical Speed Variance

The M1 stim group exhibited the lowest average values of vertical speed variance on the final day of training (4.955 ± 0.433; Supplementary Figure S2C). This is similar to the M1 sham (Supplementary Figure S2D) value of 5.051 ± 0.502 and an improvement over DLPFC groups (5.684 ± 0.690 and 5.647 ± 0.718, for stim and sham groups, respectively), but this does not reach significance under 2-way ANOVA (p > 0.1; Supplementary Figures S2A,B). This appears to be derived from the relatively higher online/offline learning rates in both M1 groups as compared with the DLPFC groups, though the overall rates were not statistically significant. ANCOVA, covarying the learning rates of vertical speed variance with group identity, performed on these data shows that the slopes appear identical (p > 0.9 vs. null hypothesis) but the initial performance (intercept) approaches significance (p < 0.07).

Online, offline and meta-learning rates were largely flat, and training effects were not observed within any group at the p < 0.1 level (Supplementary Table S2, Supplementary Figure S2). Because vertical speed variation is primarily a motor-centric task, it may be subject to a different learning curve that was not specifically measured during this study.

#### Autopilot Vertical Speed Deviation

M1 sham showed significant overall offline learning, with smaller deviations of vertical speed on Day 4 as compared with Day 1 (−12.21 ± 2.51, p < 0.05, Supplementary Figure S3D). DLPFC sham (Supplementary Figure S3B) also had a negative slope indicating reduced deviation from ideal vertical speeds, but this was not statistically significant (−69.71 ± 952.145, p > 0.05). However, both of these offline learning effects were washed out when combined into overall learning rates across the 4 days (M1 sham: −5.98 ± 20.26; DLPFC sham: −17.09 ± 25.80). Overall performance did not significantly change over the course of the 4 days, and initial/final performance were not significantly different across groups (p's > 0.1; Supplementary Figure S3E).

Unlike tests of G-force and flight path deviation, F-tests do not show any significant difference for inter-subject variance during 1st trial comparisons across all groups (p's > 0.1).

#### Number of Control Inputs

Variance between subjects for both DLPFC groups appeared larger than that of both M1 groups (Supplementary Figure S1). However, no meta, online, offline, or combined learning rates reached significance, and no significant changes were observed between groups (see Supplementary Table S2 and Supplementary Figure S1).

# fNIRS Results

#### DLPFC Stimulation

#### **Hboxy**

Average Hboxy concentrations across subjects and channels significantly decreased between days 1 and 4 in M1 channels for the DLPFC stim and DLPFC sham groups, and decreased between days 1 and 4 in DLPFC channels for the DLPFC stim group (see **Table 2**, **Figure 7**).

Furthermore, individual channel analysis revealed that only two subjects (S7 in the DLPFC stim group, and S7 in the DLPFC sham group) showed a significant change in Hboxy concentrations from days 1 to 4 (−0.01 mM decrease at DLPFC channel: source AFF6h to detector F4 in DLPFC stim subject 7, and 0.02 mM increase at DLPFC channel: source FC4 to detector FFC4h in DLPFC sham subject 7, see **Figure 2** for channel locations). Within the DLPFC stim group, of all 70 channels measured across subjects in the DLPFC region (10 channels per subject, 7 subjects per group), 65 showed a decrease in Hboxy concentration from days 1 to 4 (compared to 40/70, 33/60, 325/50 for DLPFC sham, M1 stim, and M1 sham respectively).

#### **Hbdeoxy**

The average Hbdeoxy concentration across subjects and channels within M1 significantly increased between day 1 and day 4 in the DLPFC stim and DLPFC sham groups (see **Table 2**). Individual channel analysis shows no significant change in Hbdeoxy concentrations from days 1 to 4.

#### **Hbtot**

Average Hbtot concentrations across subjects and channels significantly decreased between days 1 and 4 in M1 channels for DLPFC stim and in the DLPFC channels for the DLPFC sham group, and increased in M1 channels for the DLPFC sham group (see **Table 2**). Individual channel analysis revealed that only one subject (S7 in the DLPFC stim group) showed a significant change in Hbtot concentrations from days 1 to 4 (−0.01 mM decrease at DLPFC channel: source AFF6h to detector F4, see **Figure 2**). Within the DLPFC stim group, 64/70 channels in the DLPFC region showed a decrease in Hbtot concentration from days 1 to 4 (compared to 37/70, 34/60, 32/50 for DLPFC sham, M1 stim, and M1 sham respectively).

blocks, solid lines indicate 1/2 and 2/3-back block types in (A), and easy landing trials in (B)]. An upward displacement in Hboxy and Hbtot concentrations can be seen during pauses between subsequent blocks. (C) fNIRS t-statistic beta maps of Day 4 vs. Day 1 Hboxy (top) and Hbtot (bottom) in the Easy landing task. Images are the averages for the DLPFC stim (left) and DLPFC sham (right) groups (Bonferroni corrected p < 0.0025, see Table 2 for the corresponding concentration changes averaged over all channels within M1 and DLPFC).

#### M1 Stimulation

#### **Hboxy**

Average Hboxy concentrations across subjects and channels significantly decreased between days 1 and 4 in M1 channels for the M1 stim group, and increased within M1 channels for the M1 sham group (see **Table 2**). Individual channel analysis shows no significant change in Hboxy concentrations from days 1 to 4.

#### **Hbdeoxy**

Average Hbdeoxy concentrations across subjects and channels significantly decreased between days 1 and 4 in M1 channels for the M1 stim group, and increased within M1 channels for the M1 sham group (see **Table 2**). Individual channel analysis shows no significant change in Hbdeoxy concentrations from days 1 to 4.

#### **Hbtot**

The average Hbtot concentration across subjects and channels significantly increased between days 1 and 4 in M1 channels for the M1 sham group (see **Table 2**). Individual channel analysis shows no significant change in Hbtot concentrations from days 1 to 4.

### EEG

#### Theta (4–7 Hz)

#### **DLPFC stimulation**

In each day, significant differences in theta-band power were found between DLPFC stim and sham groups in frontal/central electrodes (**Table 4**). In days 1 and 3, right frontotemporal theta power was higher in DLPFC stim participants. Statistical differences were more broadly distributed in days 2 and 4, encompassing bilateral frontotemporal and midline frontal electrode sites. Comparison of days 1 and 4 revealed a significant increase in midline frontal theta-band power in DLPFC stim, but not DLPFC sham participants (see **Figure 8A** and **Table 4**). Split-plot ANOVA comparing MFT in the easy landing task revealed a significant main effect, with DLPFC stim greater than DLPFC sham [F(1, 12) = 4.86, p = 0.048]. Additionally, an interaction was found between group and day of training [F(3, 36) = 4.54, p = 0.014]. Simple-effects comparisons revealed increased MFT in stim compared to sham for only day 4 [day 4: F(1, 12) = 6.47, p = 0.026]. The simple effect of day within the DLPFC stim group reached trend-level significance [F(3, 16) = 3.15, p = 0.087].

#### **M1 stimulation**

Theta-band differences between M1 groups during the easy landing task were found only in day 3, and were restricted to central/parietal electrodes (**Table 4**). Broadly-distributed differences in theta-band power were seen between days 1 and 4 in M1 stim participants, but not M1 sham participants (see **Table 4** and **Figure 8A**). No main effects or interactions were found in ANOVA comparing MFT in the easy landing task between M1 groups.

TABLE 4 | Cluster statistics for comparisons of alpha- and theta-band power during the Easy Landing task.


\*Reported t-values are the average t-statistic across all electrodes in a given cluster. \*\*Reported p-values are corrected for multiple comparisons using cluster-based permutation tests.

#### Alpha (8–12 Hz)

#### **DLPFC stimulation**

Significant differences in alpha-band power were found between DLPFC stim and sham groups in parietal/occipital electrodes (day 1) and frontal/central electrodes (day 3), with greater power in the DLPFC stim group (**Table 4**). No differences in alpha power were found in the comparison of day 1 and 4, for either DLPFC stim or sham groups (see **Table 4** and **Figure 8B**).

#### **M1 stimulation**

Alpha-band differences between M1 groups during the easy landing task were found only in day 1, and were broadly distributed over frontal, central, and parietal electrode sites (**Table 4**). Two separate clusters of significant differences in alpha-band power were seen between days 1 and 4 for M1 stim participants, but not M1 sham participants (see **Table 4** and **Figure 8B**). The first cluster revealed increased alpha power in the M1 stim group over parietal electrode sites. The second revealed decreased alpha power in right temporal electrodes.

#### **EEG/fNIRS/behavioral correlations**

theta power were found in the Easy Landing task.

There were positive correlations between change in MFT power (day 4 minus day 1) and both average Hbtot and average Hboxy beta values in DLPFC channels for M1 stim subjects. The direction of this correlation indicates that increased theta from days 1 to 4 is correlated with less reduction of Hboxy/Hbtot from days 1 to 4 in DLPFC fNIRS channels (**Table 5**). There were also strong negative correlations between change in alpha power and both average Hbtot and average Hboxy beta values in M1 channels for M1 stim subjects, indicating that increased parietal alpha power is correlated with reduced fNIRS beta values. No correlations were identified between theta/alpha power and fNIRS beta values for sham groups, there was no correlation between theta and fNIRS beta values at M1 channels, and there was no correlation between alpha power and fNIRS beta values at DLPFC channels (p's > 0.1).
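To make the correlation analysis concrete, the sketch below computes a Pearson correlation coefficient between per-subject changes in EEG power and per-subject fNIRS beta values. It is a minimal illustration and not the study's analysis code. The function name and the paired values are hypothetical, and the significance test is omitted.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length with at least 3 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-subject values: change in alpha power (day 4 minus day 1)
# and average Hboxy beta value in M1 channels.
delta_alpha = [0.12, 0.30, 0.05, 0.41, 0.22, 0.18]
hboxy_beta = [-0.8, -2.1, -0.2, -2.9, -1.6, -1.1]
print(pearson_r(delta_alpha, hboxy_beta))
```

A coefficient near −1 would indicate that subjects with larger increases in alpha power tend to have lower fNIRS beta values, which is the direction of the relationship described for the M1 stim group.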

#### DISCUSSION

#### Overview

In this study, we measured task-evoked changes in functional neural activity and the modulation of learning from tDCS to the right DLPFC or left M1. Simultaneous fNIRS and EEG measured changes in neural activity as subjects learned to complete flight simulator and n-back training exercises at increasing levels of expertise across four consecutive daily sessions. Behavioral performance was assessed on n-back accuracy and flight metrics of landing performance, as well as on online and offline learning rates associated with practice and skill acquisition. We report that tDCS to the right DLPFC reduced the variability in online learning across individuals in the n-back task, and in g-force on the easy landing task. This was associated with decreased Hboxy and Hbtot in the DLPFC across days for the landing task, and increased MFT power in both the n-back and landing tasks. Additionally, tDCS to the left M1 increased tonic parietal alpha power, which was correlated with changes in Hboxy and Hbtot at M1 fNIRS channels.

VSDFA, Vertical Speed Deviance from Autopilot.

TABLE 6 | Summary of behavioral and neurophysiological results.

Parietal Alpha Power × Hboxy −0.94 0.005

#### Interpretation—Behavior

The observed reduction in group variability in online learning may be attributed to "convergence to the mean" (i.e., increasing online learning rates of low performing individuals and reducing online learning rates of high performing individuals). Subjects may have employed distinct cognitive and behavioral strategies, with correspondingly different brain networks, to complete and learn the n-back task across sessions. tDCS of the right DLPFC may have therefore facilitated the deployment and consolidation of a particular strategy in some subjects, and inhibited certain behaviors in others. The variance in the learning rates did not arise from individual differences of untrained performance, as initial and final performances were similar (see Section Behavioral Results). Furthermore, the results could indicate that all groups reached a ceiling of behavioral performance, or that our measures are under-powered to detect a change in performance statistically, or that a reduction in individual variability produced this observation.

The variability results reported for the easy-landing task were specific to DLPFC stim subjects for the g-force metric [a similar reduction in variance was not seen for the same data in the autopilot displacement (**Figure 6**), the number of control inputs (Supplementary Figure S1), the variability of vertical speed (Supplementary Figure S2), and the vertical speed deviation from autopilot (Supplementary Figure S3)]. Since both the initial and final g-force values were not significantly different across stim and sham groups, the reduction in DLPFC stim group variability implies a convergence-to-the-mean phenomenon similar to that observed for n-back learning. tDCS of the DLPFC may therefore facilitate the learning of a smoother landing procedure in subjects who would otherwise have consolidated an incorrect landing procedure and increased landing g-forces in subsequent days. Likewise, tDCS of the DLPFC may have hindered some subjects who would have otherwise consolidated a superior landing procedure and decreased landing g-forces in subsequent days.

It should be noted that for the measure of 3D autopilot flight path deviation (Section 3D Autopilot Displacement), it was not readily apparent to subjects when the aircraft deviated from the prescribed flight path of the autopilot; there is no visual field indication that they are deviating from the glide slope, and the Flight Director instrument does not indicate degree of displacement from optimal glide slope. Additionally, for deviation from the autopilot's vertical speed (Section Autopilot Vertical Speed Deviation), it is possible that, because vertical speed was a peripheral skill required for landing (i.e., non-essential for a successful landing), subjects did not train to maintain a low vertical speed deviation from the reference glide path. As subjects needed only to maintain one constant vertical speed during the landing task, they may have reached maximal capacity to do so beginning from day 1. The combined learning rates and online learning metrics seem to support this view (see Supplementary Table S2, Supplementary Figure S3). Furthermore, low-G Force landings can be performed from a wide range of glide slopes, which can mask large deviations from the "ideal" flight path.

### Interpretation—Neurophysiology

We observed an increase in MFT in the DLPFC stim group compared to the DLPFC sham group, as well as an experience-related increase in MFT and decrease in central/parietal alpha in DLPFC stim, indicating increased working memory and attention (Klimesch et al., 1997; Jensen and Tesche, 2002; Ishii et al., 2014). Increased theta/alpha band activity in M1 stim compared to M1 sham near the site of stimulation may indicate greater motor cortex excitability (Sauseng et al., 2009). Furthermore, experience-related increases (day 4 vs. 1) in broad central/parietal theta/alpha in M1 stim during flight tasks implicate greater tactile/proprioceptive monitoring. For example, Baumeister et al. (2008) observed that increased parietal theta during goal-directed learning was associated with increased motor skill performance. Although neither MFT nor parietal alpha power increases were correlated with behavioral performance increases in this study, it is possible that increases in MFT or parietal alpha may be indirectly associated with cognitive performance enhancement. The significant correlations observed between MFT and online and offline learning of autopilot displacement, vertical speed variance and deviation in the M1 stim group support this hypothesis (**Table 5**).

We observed a decrease in Hboxy and Hbtot in DLPFC channels for the DLPFC stim group in the easy landing task (**Table 2** and results Hboxy and Hbtot). This evidence suggests that tDCS produced more efficient neural activation to consolidate the newly-learned procedural skills, as has been previously reported (Wolf et al., 2007; Holland et al., 2011; Ayaz et al., 2012; DiStasio and Francis, 2013). Previous work by McKendrick et al. (2015) suggests that some, but not all, of these changes may be related to the task performance enhancements associated with tDCS. However, changes in Hbtot concentration may also be related to task reward value (DiStasio and Francis, 2013), the recruitment of additional motor resources (Herff et al., 2013), or a behavioral ceiling effect where low-performing subjects were not able to advance to expert performance levels as shown by Ayaz et al. (2012). Although reward was not explicitly manipulated in the easy landing task, subjects' motivations may have played a role based on their prior day's performance. Similarly, the motor resources required for the easy landing task may have changed as subjects learned more advanced motor programs to complete the task. Finally, a ceiling effect could explain the more efficient neural activation, and the trend in meta-learning for stim groups supports this theory (**Figure 5E**).

Hbtot and Hboxy were also significantly correlated with MFT from days 4 to 1 in the easy landing task in the M1 stim group. These results suggest that these separate neurophysiological measures are not totally independent. Future studies should examine the relationships between MFT, Hbtot, and behavioral performance in a larger cohort to determine whether these effects are truly concomitant.
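To illustrate the kind of analysis referred to above, the following is a minimal sketch of correlating per-subject change scores (day 4 minus day 1) for two measures. It is not the analysis pipeline used in this study: the function names `pearson_r` and `change_scores` and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def change_scores(day1, day4):
    """Per-subject change from day 1 to day 4 (day 4 minus day 1)."""
    return [late - early for early, late in zip(day1, day4)]


# Hypothetical, made-up values for five subjects (NOT data from the study).
mft_day1 = [1.0, 1.2, 0.9, 1.1, 1.3]
mft_day4 = [1.4, 1.3, 1.5, 1.2, 1.9]
hbtot_day1 = [0.50, 0.55, 0.48, 0.52, 0.60]
hbtot_day4 = [0.42, 0.53, 0.36, 0.51, 0.47]

r = pearson_r(change_scores(mft_day1, mft_day4),
              change_scores(hbtot_day1, hbtot_day4))
```

With a small sample, a coefficient of this kind is sensitive to individual subjects, which is one reason a larger cohort would be needed before treating the two measures as concomitant.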

# Relation to Prior Investigations of tDCS in Real-World Tasks

To date, there have been few studies in which procedural/real-world learning tasks have been tested with a tDCS intervention (Izzetoglu et al., 2014; Nelson et al., 2014), and even fewer with a significant motor component as the focus of performance/training enhancement (Zhu et al., 2015). However, tDCS enhancement of real-world skills has been reported for complex motor control tasks. For example, Beeli et al. (2008) reported that anodal tDCS to either the left or right DLPFC (10/20 EEG site F3 or F4) significantly improved the carefulness of driving style, as measured by following distance, average speed and number of errors. Similarly, Sakai et al. (2014) reported that anodal tDCS to the right DLPFC significantly improved car-following and lane-keeping performance in a driving simulator task across days. Finally, Zhu et al. (2015) reported that cathodal tDCS to the left DLPFC suppressed verbal working memory but improved motor learning. The results presented here support these findings, as we observed that tDCS to the right DLPFC reduced online learning variability in higher cognitive measures (e.g., affecting the g-force value of landing by judging multimodal flight data in a timely fashion, or n-back accuracy variance) more than in those related to motor planning or judgment (e.g., flight path deviation; see **Table 6**).

Furthermore, real-world skill enhancement from right inferior frontal tDCS has been reported in perceptual threat detection (Clark et al., 2012; Falcone et al., 2012), and tDCS of the DLPFC has been shown to increase regional cerebral blood oxygenation and behavioral performance in target detection in an air traffic control task (Nelson et al., 2014). The results presented here are indirectly related to these findings, as the reduction in behavioral variance we observed from tDCS to the right DLPFC could be attributed to an increase in spatial attention, vigilance, or perceptual discrimination (e.g., when to judge an n-back match or the correct time for a nose-flare maneuver during landing). We also observed that tDCS of the right DLPFC decreased Hboxy and Hbtot in DLPFC channels across days in the easy landing task. One possible explanation for the difference in cerebral blood oxygenation reported between the two studies concerns the disparate experimental designs employed. Here, all subjects returned for four consecutive days of testing, regardless of physiological or behavioral measures, whereas Nelson et al. (2014) had subjects return for days 2–4 only if performance and blood flow velocity declined over the course of the first 40-min session.

# Relation to Prior Investigations of tDCS in Working Memory

Previous studies have reported evidence that working memory improvements are correlated with the administration of tDCS in diverse contexts (Grafman et al., 1994; Nitsche et al., 2003; Dockery et al., 2009; McKendrick et al., 2015). Specifically, tDCS over DLPFC was associated with acute increases in working memory accuracy (Stagg and Johansen-Berg, 2013; Chhatbar and Feng, 2015; De Putter et al., 2015; Santarnecchi et al., 2015). Although we observed a reduction in learning rate variance from tDCS to the right DLPFC in the n-back task, we did not find an increase in working memory accuracy for tDCS of either the DLPFC or M1. This discrepancy may be attributed to the adaptive n-back design employed here, the long durations of experimental sessions, and a potential ceiling effect from repeated tDCS and n-back sessions across consecutive days. In addition, the application of tDCS began directly prior to the n-back task (see **Figure 1**), and the effects of stimulation may require more time to produce the reported improvements in n-back accuracy.

#### Limitations and Future Directions

A goal of this research was to determine if tDCS would improve training techniques for pilots in a flight simulator. Such improvements could drastically reduce the time, and therefore the cost, of training a pilot, as they would in any training environment. While our results show decreased variability in training, it is too early to confirm or deny any useful improvements to simulation training until the sources of, and contributing factors to, the observed behavioral variance are understood.

Additional studies must be performed to further investigate n-back accuracy improvement with tDCS by comparing different stimulation montages, stimulation timing, and task paradigms. Because we were unable to parametrically manipulate these parameters in this study, we are unable to determine which of these factors may have led to null effects of tDCS on n-back accuracy. The baseline performance of individuals with differing initial skill levels in n-back and flight tasks is important, and measures of this were limited by the study design employed. In addition, the experimental design employed here (continuous, multiple tasks over 60 min duration) did not provide a sufficient means to control the endogenous brain state of subjects before and throughout the experimental session, given the numerous tasks and the instructions and feedback required for subjects to perform them. Thus, subjects' diverse experiences and resultant brain states throughout the session may be a significant factor in the interpretation of our findings. For example, the n-back task was performed near the beginning of the stimulation period, while the easy flight landing was performed near the end of the stimulation period. Future studies should examine relationships between tDCS effects and EEG microstates and/or brain metabolic activity.

Some of the null findings in this study were related to exceptionally high within-group variance. One potential method to examine within- and across-group behavioral variance is to categorize subjects by learning rate bins or perform a cluster analysis of tDCS responders and nonresponders. Since the same tDCS protocol may have variable effects across individuals, possibly due to neuroanatomical and neurophysiological differences, and may produce different effects within an individual over time, due to changes resulting from neural plasticity, the absence of post-hoc categorization of subjects likely reduces the statistical power and interpretability of our results (e.g., Supplementary Table S2). Future studies may benefit from real-time assessments and individualized tDCS planning rather than a "one size fits all" approach. While a priori selection or post-hoc classification of subjects within experimental groups can control for differences in baseline performance levels, it is not realistic when transferring this technology into real-world training environments.
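As a rough illustration of the post-hoc categorization suggested above, the sketch below splits subjects into two groups by learning rate using a simple one-dimensional k-means with two clusters. The choice of k-means, the function name `two_means_1d`, and the learning-rate values are all assumptions made for illustration; the study itself did not perform this analysis, and other clustering or binning methods could be equally appropriate.

```python
def two_means_1d(values, max_iter=100):
    """Split 1-D values into two clusters with a simple k-means (k = 2).

    Returns (labels, centers), where label 1 marks the cluster with the
    higher center (e.g., putative responders) and label 0 the lower one.
    """
    if len(set(values)) < 2:
        raise ValueError("need at least two distinct values")
    low, high = min(values), max(values)  # initialise centers at the extremes
    labels = []
    for _ in range(max_iter):
        # Assign each value to the nearer center (ties go to the low cluster).
        labels = [1 if abs(v - high) < abs(v - low) else 0 for v in values]
        lows = [v for v, lab in zip(values, labels) if lab == 0]
        highs = [v for v, lab in zip(values, labels) if lab == 1]
        new_low = sum(lows) / len(lows)
        new_high = sum(highs) / len(highs)
        if new_low == low and new_high == high:
            break  # assignments have stabilised
        low, high = new_low, new_high
    return labels, (low, high)


# Hypothetical per-subject learning rates (NOT data from the study).
learning_rates = [0.05, 0.07, 0.06, 0.30, 0.28, 0.33, 0.04]
labels, centers = two_means_1d(learning_rates)
```

Group labels obtained this way are descriptive only; any comparison between the resulting groups would still need to account for the fact that the split was derived from the same data.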

The high variability between subjects and the need for personalized training become more important when we recognize that the subject pool for this experiment fits the western, educated, industrialized, rich and democratic (WEIRD) population. Although this subject population was acceptable for the experimental goal of pilot training, we speculate that the inclusion of a wider demographic range of the world population may produce an even larger variability in behavioral performance. Therefore, a systematic understanding of the sources and contributing factors to the observed behavioral variance is extremely important for the application of tDCS across a wider range of subjects.

# CONCLUSIONS

The results presented here underscore the importance of developing the understanding needed to identify and optimize neurostimulation protocols. Our results suggest that the time course of both online and offline learning is critical for the observed changes in working memory and procedural flight performance. Repeated training sessions reveal time-dependent factors regarding the interaction between tDCS and the learning processes that remain unclear in the literature. Applying such interventions in the real world will require a much larger investment than initially anticipated in order for the scientific community to measure and catalog the precise behavioral, learning, and neurophysiological changes resulting from each component of procedural skill acquisition. Because there appears to be a differential, region-based effect of neurostimulation interventions, it is critical to determine the optimal targets, stimulation parameters, timing relative to the target behaviors, and synchrony between innate learning processes and strategies and exogenous stimulation for maximally effective augmentation.

# AUTHOR CONTRIBUTIONS

JC, MZ, and MP designed the experiments; JC and DB performed the experiments; JC, BC, and MP analyzed the data; and JC, BC, and MP wrote the manuscript.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnhum.2016.00034



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Choe, Coffman, Bergstedt, Ziegler and Phillips. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Does a Combination of Virtual Reality, Neuromodulation and Neuroimaging Provide a Comprehensive Platform for Neurorehabilitation? – A Narrative Review of the Literature

Wei-Peng Teo<sup>1</sup>\*, Makii Muthalib<sup>2,3</sup>, Sami Yamin<sup>4,5</sup>, Ashlee M. Hendy<sup>6</sup>, Kelly Bramstedt<sup>4</sup>, Eleftheria Kotsopoulos<sup>4,7</sup>, Stephane Perrey<sup>2</sup> and Hasan Ayaz<sup>8,9,10</sup>

#### Edited by:

Mikhail Lebedev, Duke University, USA

#### Reviewed by:

Richard B. Reilly, Trinity College Dublin, Ireland Zoltan Nadasdy, NeuroTexas Institute Research Foundation, USA Ilya Boristchev, IT Universe, Russia

\*Correspondence: Wei-Peng Teo weipeng.teo@deakin.edu.au

Received: 08 December 2015 Accepted: 25 May 2016 Published: 24 June 2016

#### Citation:

Teo W-P, Muthalib M, Yamin S, Hendy AM, Bramstedt K, Kotsopoulos E, Perrey S and Ayaz H (2016) Does a Combination of Virtual Reality, Neuromodulation and Neuroimaging Provide a Comprehensive Platform for Neurorehabilitation? – A Narrative Review of the Literature. Front. Hum. Neurosci. 10:284. doi: 10.3389/fnhum.2016.00284

<sup>1</sup> Institute for Physical Activity and Nutrition (IPAN), Deakin University, Burwood, VIC, Australia, <sup>2</sup> EuroMov, University of Montpellier, Montpellier, France, <sup>3</sup> Cognitive Neuroscience Unit, Deakin University, Burwood, VIC, Australia, <sup>4</sup> Liminal Pty Ltd., Melbourne, VIC, Australia, <sup>5</sup> Adult Mental Health, Monash Health, Dandenong, VIC, Australia, <sup>6</sup> School of Exercise and Nutrition Sciences, Deakin University, Burwood, VIC, Australia, <sup>7</sup> Aged Persons Mental Health Service, Monash Health, Cheltenham, VIC, Australia, <sup>8</sup> School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA, <sup>9</sup> Department of Family and Community Health, University of Pennsylvania, Philadelphia, PA, USA, <sup>10</sup> The Division of General Pediatrics, Children's Hospital of Philadelphia, Philadelphia, PA, USA

In the last decade, virtual reality (VR) training has been used extensively in video games and military training to provide a sense of realism and environmental interaction to its users. More recently, VR training has been explored as a possible adjunct therapy for people with motor and mental health dysfunctions. The concept underlying VR therapy as a treatment for motor and cognitive dysfunction is to improve neuroplasticity of the brain by engaging users in multisensory training. In this review, we discuss the theoretical framework underlying the use of VR as a therapeutic intervention for neurorehabilitation and provide evidence for its use in treating motor and mental disorders such as cerebral palsy, Parkinson's disease, stroke, schizophrenia, anxiety disorders, and other related clinical areas. While this review provides some insights into the efficacy of VR in clinical rehabilitation and its complementary use with neuroimaging (e.g., fNIRS and EEG) and neuromodulation (e.g., tDCS and rTMS), more research is needed to understand how different clinical conditions are affected by VR therapies (e.g., stimulus presentation, interactivity, control and types of VR). Future studies should consider large, longitudinal randomized controlled trials to determine the true potential of VR therapies in various clinical populations.

Keywords: neurorehabilitation, neuroplasticity, tDCS, fNIRS, EEG, virtual reality therapy

# INTRODUCTION

fnhum-10-00284 June 22, 2016 Time: 13:26 # 2

In the last two decades, the application of VR training has become increasingly popular, not only as a means to enhance gaming experiences, but also in the education and healthcare settings to improve learning and rehabilitation outcomes. Particularly in the area of neurorehabilitation, the use of VR technology has shown great promise by providing a sense of realism during training, thereby promoting skill acquisition and retention, and inducing functional recovery (**Figure 1**; for review, see Adamovich et al., 2009).

In the context of neurorehabilitation, VR therapy can be described as a method of brain–computer interaction that involves real-time simulation of an environment, scenario or activity that allows for user interaction and targets multiple senses. In particular, the combination of VR and recent technological advances in robotic and haptic interfaces allows users a seemingly life-like interactional experience in a VE (Jung et al., 2012; Yeh et al., 2014). For example, VR has been used in clinical settings as a training tool for surgeons to learn intricate fine motor skills associated with precision surgery (Wang, 2012; Fang et al., 2014), and as a tool to deliver cognitive-based therapies (Kim et al., 2011; Kandalaft et al., 2013). More complex forms of VR presentation, such as augmented VR (whereby VR is superimposed on the actual environment) and immersive VR (first-person interaction in a VR environment), bring the immersive experience to another level with technology such as head-mounted displays (e.g., Oculus® Rift and Microsoft® HoloLens) or screens. The primary advantage of implementing VR technology is that this naturalistic environment allows for interactive behaviors while they are monitored and recorded. This means that VR technology can be used to deliver meaningful and relevant stimulation to an individual's nervous system and thereby capitalize on neuroplasticity to promote both cognitive and motor rehabilitation.

In this review, we will discuss the theoretical framework for the use of VR in the context of neurorehabilitation. We will provide evidence for the use of VR in motor rehabilitation for neurological disorders such as PD, CP and stroke and in mood and mental health disorders such as anxiety, PTSD and schizophrenia. We will also review the concurrent use of noninvasive brain stimulation and neuroimaging techniques during VR, discussing how these combined techniques may augment the benefits and complement current VR training protocols.

#### THEORETICAL FRAMEWORK FOR VR AND LEARNING

## Experiential Learning

The most important aspect of using VR is to provide new experiences by allowing users to interact physically and emotionally within a VE that is almost identical to the real world. The combination of physical, mental and emotional interaction encourages active participation and involvement of the user. In this sense, users of VR assimilate knowledge more effectively when they have the freedom to engage in self-directed activities within their learning context. By finding solutions and learning new skills autonomously, users of VR invest mental effort by constructing conceptual models that are consistent both with what they already understand and with the new content that is presented (Garrison and Garrison, 1997). Another key feature of VR training is that it offers users the opportunity to acquire skills in the context where they need to be applied. This results in more meaningful and effective learning, as compared with learning out of context (Nieuwenhuijsen et al., 2006). In physical rehabilitation, for example, fine motor control of the hands and wrists can be "re-trained" by simulating a VE where a stroke patient needs to pour himself or herself a glass of water in the kitchen. In this way, patients practice and refine fine motor control of the muscles controlling the hands and wrists by manipulating a virtual object that allows the same kind of natural interaction with objects that patients would engage in in the real world.

# Augmented Feedback: Knowledge of Results and Performance

Another important aspect of VR therapy is the ability to provide augmented feedback to its users. Augmented feedback is additional information provided through any means (e.g., visual, auditory or kinesthetic) that is complementary to the inherent feedback received via the sensory systems. There is no hard and fast rule as to how or what kinds of information augmented feedback should provide; however, VR therapy offers two vital pieces of information that are essential for learning (Winstein, 1991; Lauber and Keller, 2014): (1) knowledge of performance – information on how the participant performs during movement (e.g., movement sequences, joint angles, force outputs at each phase of movement); (2) knowledge of results – information on the outcome of the performance (e.g., overall quality and quantity of movement). Currently, most commercially available VR games incorporate visual, auditory and even kinesthetic feedback that can be provided either during or after the game. Very often, these VR games are designed so that users have to maintain or achieve a pre-determined score or level in order for the game to progress. For example, VR applications can provide knowledge of performance throughout gameplay in the form of movement kinematics (e.g., joint angles, velocity, and speed), kinetics (e.g., ground reaction forces and torque) or even the level of activation in specific brain regions during a particular task. In order to progress to the next level, users must maintain or exceed a threshold that has been set based on previous trials or specific performance outcomes. Upon task completion, knowledge of results and performance can be provided, allowing both clinicians and users to understand deficiencies in movement patterns that are associated with specific dysfunctional movement outcomes, apply progressions appropriately, and address those deficiencies with a targeted rehabilitation approach.

**Abbreviations:** BCI, brain computer interface; CP, cerebral palsy; EEG, electroencephalography; fMRI, functional magnetic resonance imaging; fNIRS, functional near-infrared spectroscopy; PD, Parkinson's disease; PTSD, posttraumatic stress disorder; tDCS, transcranial direct current stimulation; VE, virtual environment; VR, virtual reality.
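The threshold-based progression described in this section can be sketched in a few lines. This is a hypothetical illustration rather than the logic of any particular VR product: the function names, the use of the mean of previous trial scores as the threshold, the optional `margin`, and the `max_level` cap are all assumptions.

```python
from statistics import mean


def progression_threshold(previous_scores, margin=0.0):
    """Threshold derived from earlier trials: their mean plus an optional margin."""
    if not previous_scores:
        raise ValueError("at least one previous trial is required")
    return mean(previous_scores) + margin


def next_level(current_level, trial_score, previous_scores,
               margin=0.0, max_level=10):
    """Advance one level when the trial meets or exceeds the threshold.

    The user stays at the current level otherwise, or when already at
    the (assumed) maximum level.
    """
    threshold = progression_threshold(previous_scores, margin)
    if trial_score >= threshold and current_level < max_level:
        return current_level + 1
    return current_level
```

For instance, with previous scores of 70, 75 and 80, the threshold is 75, so a trial score of 80 would advance the user while a score of 70 would not. A clinician-facing system might instead set the threshold from specific performance outcomes, as the text notes.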

## Observational Learning

Apart from providing feedback that is necessary for learning, another aspect of VR training is the enhancement of observational learning. The basis for learning, or at least its intended outcome, is to mimic or replicate an ideal response that brings about a desired result and induces a lasting change in behavior. In terms of neurorehabilitation, observation of goal-oriented movements or processes provides sensory feedback about the movement, behavior or emotional state, which contributes to learning (Oouchida et al., 2013; Williams and Carnahan, 2014). These observations preferentially activate parts of the brain that are involved with the physical performance itself, allowing a motor program to be developed based upon the observed movements (Burke et al., 2010). Training in a VE may facilitate observational learning in four different ways: (1) VR applications can provide an accurate visual representation of the user's body and limb position using motion capture technology; (2) VR applications commonly use an avatar to mimic the movement of users, or conversely, the user could mimic the movements of the avatar; (3) accurate guides or a correct movement pattern can be produced for users to follow; (4) VR applications can facilitate mental imagery by inducing optimal mood states and providing instructions for mental imagery.

# Motivation

Importantly, the goal-oriented nature of VR tasks may support the maintenance of, and adherence to, neurorehabilitation programs. Unlike traditional therapist-led sessions, where improvements in physical or cognitive function may be subjective or difficult for patients to identify (Van den Broek, 2005), VR programs can provide an objective, quantitative measure of session outcomes and objectives. Furthermore, VR applications can provide both users and clinicians the ability to individualize training programs or alter the progression of a training session based upon the user's personal performance. The capacity to individualize therapy intensity may enhance motivation by allowing users to select practice sessions that are tailored to their individual time and need, and more importantly, to manipulate treatment parameters to create optimal learning conditions. Another way for VR to improve motivation is to incorporate competition or co-operation between players during therapy sessions. Engaging users in a group environment, either competing against each other or working in teams, promotes an element of enjoyment through increased social interaction, particularly amongst people suffering from similar conditions (Van den Broek, 2005).

# EVIDENCE OF VR THERAPY IN MOVEMENT NEUROREHABILITATION

#### Stroke

The use of motion-controlled VR game consoles, including the Nintendo® Wii and Xbox® Kinect, has been explored as an adjunct to conventional physical therapy (see **Table 1**), specifically for improving upper limb function (Thomson et al., 2014; Laver et al., 2015). VR programs for stroke neurorehabilitation are based on the potential for brain neuroplasticity after neurological injury to support acquisition and retention of new motor skills to recover motor function. The goal of VR therapy in stroke is to apply these motor learning principles for stroke neurorehabilitation, such as providing repetitive, graded-intensity, and motivating task-specific training with real-time multimodal feedback of movements and performance (Saposnik et al., 2011). Thus, VR systems are designed to enhance conventional therapy by providing a tool to deliver more specific, intensive and enjoyable therapy with real-time feedback of performance (Levin et al., 2015).


#### TABLE 1 | Examples of recent systematic reviews and meta-analyses demonstrating the effects of VR in neurorehabilitation of stroke, PD and CP.

PEDro, Physiotherapy Evidence Database; RCT, randomized controlled trial; PD, Parkinson's disease; CP, cerebral palsy; VR, virtual reality.

Despite the potential utility of commercial VR game consoles for stroke neurorehabilitation, a number of limitations have been highlighted (Bower et al., 2015): (1) VR games designed for the general population can be too challenging for stroke patients with physical and cognitive deficits; (2) the difficulty levels and control of VR games are often not readily adjustable to rehabilitation targets, and the tasks may lack functional relevance; (3) feedback and scoring provided can be negative and frustrating for the user; (4) current VR games do not include neurological assessment; (5) VR does not integrate multiple environmental factors that connect to motor performance. In response to some of these limitations, there has been an emergence of research and development of modified VR programs specifically designed for stroke neurorehabilitation using adaptable software and hardware components of commercial VR systems (e.g., Kinect system) and guidance from clinicians in their development (Laffont et al., 2014; Bower et al., 2015). These adapted VR systems are progressively optimized with new functions including: (1) allowing automatic adaptation/intensity grading of the activity to the patient's own achievements; (2) allowing the therapist to adapt online the task's characteristics to the patient's needs; (3) allowing multiplayer VR systems via a web-service platform to enhance interactivity; (4) automatic recording of the patient's movements to provide therapists with data describing the quality and quantity of motor function recovery/progression, including the level of compensatory movements (Laffont et al., 2014).

While there is some evidence to suggest that VR may be highly applicable for stroke rehabilitation, the evidence from recent systematic reviews and meta-analyses indicates that current studies are limited by sample size issues and study designs (see **Table 1**). Early VR interventions used commercial applications such as the Nintendo Wii, in which the user controls an avatar; more recently, customized systems have focused on interactive platforms to target activities of everyday living (e.g., reaching and grasping tasks). However, a major challenge with stroke is that no two stroke patients present with the same motor deficit, and therefore an individualized approach to therapy, including VR therapy, is needed. In this sense, future systems must be adaptive and customizable to manage the heterogeneous nature of stroke for patients to gain greater benefits.

## Parkinson's Disease

Emerging VR therapies present an attractive option for delivering neurorehabilitation therapies to manage the cognitive-motor symptoms in people with PD, as they can be employed at any stage as an adjunct to standard pharmacological (Levodopa therapy) and/or surgical (ablation, deep brain stimulation) treatment (see **Table 1**). This new possibility in the field of neurorehabilitation aims to provide PD patients with a motivating way to perform multiple motor neurorehabilitation exercises, with the rationale that the VR system might promote balance training and cognitive-motor practice. Some commercial VR systems, such as the Nintendo® Wii system using a balance board, have drawn considerable attention from both the research and clinical communities as effective and feasible neurorehabilitation interventions to enhance gait and balance for people with PD (Barry et al., 2014; Harris et al., 2015). More recent studies have added custom programming and hardware to their VR systems to specifically improve balance and gait in PD (Mirelman et al., 2011).

It has been demonstrated that a VR neurorehabilitation program of 6–8 weeks involving 40–60 min a day, three times per week, appears to be a viable option for significantly improving balance in a clinical population of individuals with PD (Esculier et al., 2012). The intensity/difficulty load of interventions used across existing studies appears to be a key contributing factor for the discordant findings reported in the literature (Esculier et al., 2012). In addition, activity selection could have contributed to some differential findings among studies. Some studies targeted static, slow, controlled movements in a closed environment such as the Wii Fit with balance board (dos Santos Mendes et al., 2012; Esculier et al., 2012), while others involved dynamic movements in an open environment such as Wii Sports (Herz et al., 2013).
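To make the training dose implied by such a program concrete, the short sketch below computes the total training time at the two ends of the reported range. The function name is hypothetical, and the calculation assumes that every scheduled session is completed at the stated duration.

```python
def total_training_minutes(weeks, sessions_per_week, minutes_per_session):
    """Total scheduled training time, assuming full attendance."""
    return weeks * sessions_per_week * minutes_per_session


# Lower and upper ends of the range described above (three sessions per week).
low_dose = total_training_minutes(6, 3, 40)    # 720 min, i.e., 12 h
high_dose = total_training_minutes(8, 3, 60)   # 1440 min, i.e., 24 h
```

The twofold spread between the two ends of the range is one simple way to see why differences in intensity across studies could contribute to discordant findings.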

Despite some evidence for performance improvement in balance, there are still limitations inherent in commercial VR systems that may not directly apply to realistic everyday settings for PD neurorehabilitation. Additionally, the programs are not very scalable, or modifiable, to each individual's needs or progress for all stages of disease. Mirelman et al. (2011, 2013) utilized a custom-made VR system to incorporate virtual obstacles presented on a screen during treadmill walking (18 sessions over 6 weeks). During the gait training they used a novel method (V-TIME) for tracking foot position based on the Xbox Kinect technology. Interestingly, Mirelman et al. (2011) observed significantly elevated gait speed with and without a cognitive dual-task upon completion of training and 4 weeks post-training. However, this VR gait training protocol confines participants to straight walking, a gait pattern that is relatively uncommon in real-life environments. A more viable approach may be the development of a VR system that can be used in conjunction with activities of daily living. People with PD are known to use visual and/or auditory cues to improve physical performance (Lee et al., 2012), and augmented VR, delivered via goggles or smart glasses, might be used to provide sensory cues as a feedforward or feedback mechanism to improve physical performance.

# Cerebral Palsy

Cerebral Palsy (CP) is the most common pediatric physical disability, thought to affect three to four individuals per 1000 of the population (Aisen et al., 2011; Oskoui et al., 2013). It is characterized as a spectrum of disorders of motor and postural development that cause limited functionality or dysfunction (Monge Pereira et al., 2015). Studies investigating exercise-based treatments for children with CP have provided growing evidence in the last decades for effectiveness in improving postural control (see **Table 1**). Although effective, traditional physical exercise in clinical settings consists of repetitive tasks that limit enthusiasm for regular, periodic application.

While the study of VR in children with CP is still in its infancy, Denise Reid at the University of Toronto's Virtual Reality Laboratory (Reid, 2002a,b, 2004) has provided preliminary evidence to support its use. In these studies, children with CP were engaged with VR-based exercises for upper extremity and postural control. The self-reported effect of VR on perceived self-efficacy to perform given tasks was tested in an uncontrolled study, before and after intervention (Reid, 2002a). Based on self-efficacy theory, Reid (2002b) attempted to identify if use of VR could increase the motivation for exercise in children with CP. The pilot study yielded encouraging results for VR use, with improvement in perceived performance abilities and satisfaction with performance. In a follow-up study on upper-extremity efficiency, improvement was also reported with VR use (Reid, 2002b). Similarly, Reid (2004) later investigated the effect of VR intervention on playfulness and found that VR environments stimulated playfulness in children, specifically the VR tasks that allowed creativity, expression, and choice of activity. You et al. (2005) used fMRI in a case report to investigate cortical reorganization and associated motor function improvement after VR therapy. Neuroplastic changes were observed in the primary sensorimotor cortex and supplementary motor area following VR therapy, together with enhanced functional motor skills. A later study by Bryanton et al. (2006) compared VR therapy with conventional exercises in children and found that although children completed more repetitions of the conventional exercises, the range of motion and hold time in the stretched position were greater during VR tasks.

While the current research for VR in children is still in the early stages, VR therapies represent a viable option to increase exercise adherence and physical activity as they are both engaging and rewarding particularly in an adolescent population. The process of gamification, one that entails an interactive dynamic storyline and an overall goal, is likely to better capture and retain the attention of children over traditional physical training (Lister et al., 2014). The challenge, particularly in children with CP, will be to incorporate a diversity of activities performed during the game to train a repertoire of fundamental skills so as to further develop their motor and cognitive skills.

# EVIDENCE FOR VR THERAPY IN COGNITIVE REHABILITATION AND MENTAL HEALTH

# Anxiety, Phobias and Post-traumatic Stress Disorder

Anxiety can be generalized in nature [i.e., generalized anxiety disorder (GAD)], characterized by long-lasting anxiety that is not focused on a specific object, or may be more focal (i.e., phobias), occurring in the presence of, or in anticipation of, a specific object or situation. Preliminary evidence on the use of VR in GAD indicates that a combination of relaxation, controlled exposure and stress inoculation may help patients to cope with various stressors and sources of worry (Gorini et al., 2010; Repetto and Riva, 2011). Additionally, the addition of biofeedback (e.g., heart rate and electrodermal response) may help to identify particular sources of worry and emotion that can be used to modify specific features of the VR environment (Gorini et al., 2010; Repetto and Riva, 2011).

Despite the limited evidence for the use of VR therapy in GAD, there is some support for the use of VR in a range of other anxiety disorders (see **Table 2**), including specific phobias (Cote and Bouchard, 2005; Maskey et al., 2014), panic disorder (Vincelli et al., 2002) and social phobia (Klinger et al., 2004, 2005). Current VR therapies, particularly for phobias, use controlled exposure therapy that allows the patient to experience a sense of presence in an immersive, interactive VE that minimizes avoidance behavior and facilitates emotional involvement. The VE also allows the therapist to control the delivery of sensory stimulation, so that the patient confronts the feared stimuli in a progressive manner. Another advantage of VR therapy is the ability to recreate situations that cannot be re-experienced in vivo (e.g., a combat situation or terrorist attack). VR therapy may be used as an alternative to imaginal exposure, meaning that patients with PTSD need not rely on internal imagery to visualize an event. A potential limitation of imaginal exposure therapy is that the therapist has no control over, or even knowledge of, what imagery the patient actually evokes (Strosahl and Ascough, 1981), whereas in the VE the stimuli presented can be carefully controlled and monitored. As with phobic patients, VR-based exposure therapy may be particularly useful for patients with PTSD for whom avoidance and failure to engage with therapy may hinder the therapeutic process. The efficacy of VR therapy for the treatment of PTSD has predominantly been examined in military populations (Cukor et al., 2009; Goncalves et al., 2012). A systematic review found that VR therapy was just as efficacious as traditional exposure treatment for PTSD (Goncalves et al., 2012). Seven of the 10 studies included in that review found that VR environments significantly reduced PTSD symptoms in comparison to control; however, no significant differences in symptoms were observed between VR therapy and traditional exposure treatment.

Whilst the existing literature on the use of VR therapy for phobias, panic disorder, and PTSD is promising, several limitations must be considered (Powers and Emmelkamp, 2008; Meyerbroker and Emmelkamp, 2010; Opris et al., 2012). Meyerbroker and Emmelkamp (2010) noted that VR as a therapeutic tool is difficult to assess because it is often combined with other techniques, which potentially masks any underlying benefits of VR therapy for the patient. Furthermore, most studies do not include behavioral avoidance tasks, which would help to determine how transferable the results are to the real world.

# Schizophrenia

As an assessment tool, VR offers the possibility of creating unique environments that allow researchers to better identify and understand specific areas of the brain commonly affected in schizophrenia. It is proposed that binding errors during the memory encoding process are responsible for the episodic memory impairments reported in schizophrenia (Waters et al., 2004). In this sense, VR can help to tease out areas of the brain responsible for binding impairments by providing specific situations or tasks that patients have to perform. For example, a study by Ledoux et al. (2013) examined contextual binding in schizophrenia using fMRI during a navigation task in a virtual town (e.g., find the grocery store from the school). Their results showed significantly less activation among patients relative to controls in the left middle frontal gyrus, and right and left hippocampi. Ledoux et al. (2013) further suggested that the reduced activation was indicative of context and content not being appropriately linked, therefore affecting the formation of a cognitive map representation in the patient group and eliciting a contextual binding deficit.

As a rehabilitation tool, VR offers a unique potential to expose individuals to controlled rehabilitation environments and allow for interaction within a VE. Indeed, VEs may be perceived as less intimidating for patients as they allow for a more gradual increase in task difficulty and may therefore enhance participation in rehabilitation (Rizzo et al., 1998). In particular, VR therapy has been explored, with some success, as an alternative option to improve cognitive function (Marques et al., 2008) and vocational skills (Tsang and Man, 2013) in patients with schizophrenia. However, perhaps one of the most important roles of VR therapy may be to attenuate the deficit in social skills associated with schizophrenia. Traditionally, social skills training using role-play has been effective in remediating these deficits (Benton and Schroeder, 1990); however, role-play approaches to social skills training are limited in that they require appropriately matched groups, and may produce social anxiety, negative symptoms and poor insight. VR-based techniques offer an alternative to traditional role-playing techniques by providing a computer-generated but realistic three-dimensional world and human-like avatars that can provide emotional stimuli. These VR-based techniques may be highly beneficial for re-training conversational skills (e.g., beginning a conversation, breaking silences, and differentiating facial expressions; Ku et al., 2007; Park et al., 2011).

While there is great potential for the role of VR in the treatment of schizophrenia, the evidence for its use remains contentious. Questions still remain as to whether VR directly affects the condition itself, or whether it instead attenuates other psychiatric comorbidities, such as anxiety or depression, that may trigger visual or auditory hallucinations in sufferers of schizophrenia. As these are still early days for VR therapy in general, there is a need to determine the precise role of VR in treatment therapies for schizophrenia and its limitations.

#### TABLE 2 | Examples of systematic reviews and meta-analyses demonstrating the use of VR in treating PTSD and anxiety disorders.

CBT, cognitive behavioral therapy; PTSD, post-traumatic stress disorder; VR, virtual reality.

# FUTURE OUTLOOK TO THE MOST PROMISING RESEARCH AVENUES - COMPLEMENTING VR THERAPY USING NON-INVASIVE NEUROMODULATION AND NEUROIMAGING TECHNOLOGIES

This review so far has provided evidence for the use of VR therapy in various clinical populations as a standalone or adjunctive tool with mainstream neurorehabilitation treatment modalities. However, the question remains as to whether the beneficial effects of VR can be augmented via neuromodulation techniques such as tDCS, or if it is possible for a more targeted approach to monitor the effects of VR via non-invasive and portable neuroimaging methods such as fNIRS and EEG.

#### Augmenting VR Therapy with tDCS

Transcranial direct current stimulation is an emerging noninvasive brain stimulation technique that uses low-intensity constant direct electrical currents to modulate the excitability of cortical neurons and related networks (Nitsche and Paulus, 2000; Lang et al., 2005). By placing either a positive anode or negative cathode electrode over the scalp, tDCS is able to facilitate (anodal tDCS) or inhibit (cathodal tDCS) excitability of the underlying cortical neurons in a polarity-specific manner. Due to this robust neuromodulatory effect, tDCS has often been applied either before (offline) or during (online) rehabilitation therapy to improve motor and cognitive performance in healthy and clinical populations (for recent reviews, see Coffman et al., 2014 and Floel, 2014).

In theory, the application of tDCS with VR therapy to augment neurorehabilitation appears complementary. A study by Lee and Chun (2014) showed an improvement in stroke-specific clinical measures, the manual muscle test and the Korean-modified Barthel Index in subacute stroke patients after 15 sessions of VR therapy with online cathodal tDCS to the unaffected motor cortex compared to sham. Kim et al. (2014) further demonstrated that the addition of online anodal tDCS to the affected motor cortex with VR therapy not only improved upper arm function, but also increased corticospinal excitability in subacute stroke patients.

In contrast to the aforementioned findings, mixed results were reported by Viana et al. (2014), who compared the effects of combining VR with offline anodal tDCS over the affected motor cortex of stroke patients across fifteen 1-h VR therapy sessions. While the results showed no statistical differences in stroke-specific clinical measures (i.e., Fugl-Meyer assessment, Wolf motor assessment, and modified Ashworth scale) of upper arm function between patients receiving real tDCS compared to sham, it is important to note that more than 50% of participants receiving anodal tDCS and VR therapy had clinically significant improvements in wrist spasticity following treatment. These limited findings suggest that applying tDCS during, rather than before, VR therapy may be an important factor in enhancing effects beyond those of VR therapy alone, as also appears to be the case when combining tDCS with neurorehabilitation more generally (Rothwell, 2012). Although combined VR and neuromodulation (tDCS) has been primarily applied in movement disorders, to the best of our knowledge, there are currently no known studies that have investigated this combination in cognitive and mood disorders. Thus, therapists who are currently adopting VR therapy in mental and mood disorders could potentially exploit the concurrent use of VR and tDCS to augment therapy benefits above and beyond VR therapy alone.

It is likely that the combined effects of VR and tDCS are influenced by several factors, namely (1) general patient characteristics (e.g., brain region affected, and structural/functional reserve) and (2) tDCS parameters including electrode placement (affected or unaffected brain region), polarity (anodal or cathodal) and timing of tDCS application (online or offline). In such circumstances, where the efficacy of combined VR and tDCS interventions is both timing and location dependent (i.e., when and where to stimulate), a method of detecting and monitoring changes to neurophysiological function as patients receive treatment is crucial for optimizing intervention effects. In this regard, neuroimaging methods could be applied to monitor the progression of VR treatment, which will be discussed in the subsequent section.

# Monitoring VR Therapy with Neuroimaging

Neurophysiological changes associated with VR neurorehabilitation can be measured by non-invasive and portable neuroimaging techniques including fNIRS and/or EEG, to ascertain changes in cerebral hemodynamic responses or oscillatory brainwaves, respectively. In particular, the use of fNIRS as a tool to measure online cerebral hemodynamic responses during neurorehabilitation has received attention (for review see Irani et al., 2007; Ferrari and Quaresima, 2012). As a neuroimaging method, fNIRS relies on the principle of neurovascular coupling: it measures the increase in regional cerebral blood flow (i.e., increase in oxygenated and decrease in deoxygenated hemoglobin) induced by neuronal activation, which is analogous to the blood-oxygenation-level-dependent responses measured by fMRI (Ferrari et al., 2004). Cortical activation measurements by fMRI and fNIRS techniques show highly correlated results in both motor and cognitive tasks (Huppert et al., 2006; Cui et al., 2011; Muthalib et al., 2013). While the application of fNIRS techniques is gaining popularity, EEG has long been used to measure online brain activity during cognitive or motor tasks, and in various clinical populations (Bonanni et al., 2008; Gosselin et al., 2011). Of particular importance, EEG is used to detect changes in various brainwaves (i.e., Gamma, Alpha, Beta, Theta, and Delta), which are differentially affected by changes in mood (Huang and Lo, 2009), wakefulness (De Gennaro et al., 2001), neurological diseases (Bonanni et al., 2008), and brain injury (Gosselin et al., 2011). Both fNIRS and EEG have several advantages over fMRI, as they are portable, relatively inexpensive to use, and easy to operate with high temporal resolution. Furthermore, new generation systems are battery operated, wireless and further miniaturized to the size of a smartphone, making them ideal for ambulatory and untethered measurements consistent with a neuroergonomics approach (Ayaz et al., 2013; de Lissa et al., 2015).
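As a rough illustration of how band-specific EEG activity might be quantified, the following minimal Python sketch sums spectral power within each conventional frequency band using a plain discrete Fourier transform. The band edges, the function names (`band_powers`, `dominant_band`) and the synthetic test signal are assumptions introduced here for illustration only; they are not taken from the studies cited above, and band boundaries differ between laboratories.

```python
import cmath
import math

# Conventional band edges in Hz. These are assumptions for illustration;
# exact boundaries vary between laboratories and studies.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 45.0),
}


def band_powers(signal, fs):
    """Sum squared, length-normalized DFT magnitudes of `signal` per band.

    signal: list of samples from one EEG channel (arbitrary units).
    fs: sampling rate in Hz.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    powers = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2 + 1):
        freq = k * fs / n  # frequency of DFT bin k in Hz
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2 / n ** 2
        for name, (low, high) in BANDS.items():
            if low <= freq < high:
                powers[name] += power
    return powers


def dominant_band(signal, fs):
    """Return the name of the band carrying the most power."""
    powers = band_powers(signal, fs)
    return max(powers, key=powers.get)


# Synthetic 2-s signal: a strong 10 Hz component plus a weak 20 Hz component.
fs = 128
sig = [
    math.sin(2 * math.pi * 10 * t / fs) + 0.2 * math.sin(2 * math.pi * 20 * t / fs)
    for t in range(2 * fs)
]
print(dominant_band(sig, fs))
```

In practice, windowed estimators (e.g., Welch's method), artifact rejection and multi-channel handling would be needed; the naive transform above is used only to keep the example free of external dependencies.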

As the use of fNIRS and EEG techniques in VR therapy is still relatively new, most studies to date have focused on healthy individuals (Bayliss and Ballard, 2000; Mingyu et al., 2005; Holper et al., 2010; Seraglia et al., 2011; Basso Moro et al., 2014), with the potential for more studies in clinical populations emerging as the popularity of these portable neuroimaging technologies increases. The current use of fNIRS and EEG in VR therapy has two proposed roles: (1) to monitor and provide augmented feedback regarding regions of cortical activation during therapy and (2) to serve as part of a BCI paradigm for therapy. In support of the first role, several studies have investigated the efficacy of fNIRS and EEG in recording cortical hemodynamic and oscillatory changes during actual motor tasks and motor imagery in a VR environment. These studies demonstrated that fNIRS and EEG can detect task-specific changes in cortical hemodynamics (Holper et al., 2010; Seraglia et al., 2011; Basso Moro et al., 2014) and oscillatory patterns (Bayliss and Ballard, 2000; Mingyu et al., 2005).

FIGURE 2 | Stroke participants engaged in VR therapy using an X-Box Kinect motion capture system while receiving tDCS.

The ability of fNIRS and EEG to detect changes in these neurophysiological measures can provide feedback on both the location and level of activation, which clinicians and users can use to set the intensity and progression of therapy. Furthermore, feedback on cortical activation can also be used to identify areas of hypo- or hyperactivity, which can be modulated using neuromodulatory techniques such as tDCS (see Prospective Integration of Neuromodulation-Neuroimaging with VR Therapy). In support of the second role, the cortical areas, patterns and timing of activation associated with various movements or mood states may also be recorded and used to train classifiers for BCI. Indeed, most BCI studies to date have employed EEG as a measure of cortical activation with which to control a robotic limb or avatar in a VR environment (Lin et al., 2007; Formaggio et al., 2013, 2015). Although relatively new, there are also fNIRS-based BCI approaches demonstrating feasibility for future integration with VR and neurorehabilitation (Sitaram et al., 2007; Hong et al., 2015). Moreover, joint use of fNIRS- and EEG-based BCI approaches has also been demonstrated (Koo et al., 2015; Xuxian et al., 2015). This shows the potential to adopt fNIRS in a similar manner, whereby appropriate cortical hemodynamic responses can be classified to control a robotic or computer interface.

# Prospective Integration of Neuromodulation-Neuroimaging with VR Therapy

In the last 5 years, new research suggests that the combination of VR therapy with neuromodulation and neuroimaging techniques may help to improve the effects and delivery of VR therapies. Neuromodulation techniques such as tDCS (**Figure 2**) and neuroimaging methods such as fNIRS (**Figure 3**) and EEG have already been combined and have shown some success (Ang et al., 2015; Dutta et al., 2015; Muthalib et al., 2016). While the use of these techniques in combination with VR is still in its infancy, the available evidence suggests highly complementary effects when combining neuromodulation and neuroimaging with VR therapy.

As neuroimaging (fNIRS/EEG) and neuromodulation (tDCS) techniques have complementary capabilities, and both can be built as wearable and wireless systems, integration of the two presents a natural avenue for applications in natural environments and real-world settings (McKendrick et al., 2015). One potential use is enhancing BCI applications. In its most general form, BCI provides a route for neural output that does not involve the neuromuscular system (Wolpaw et al., 2002). Almost all current non-invasive BCI systems are read-only; that is, brain signals are read directly into the system (via neuroimaging), and the system provides an output or feedback that relies on sensory input mechanisms and the peripheral nervous system to reach back to the brain. A more direct BCI could eliminate the need for sensory input by using neuromodulation, and hence provide feedback directly to the brain for a read-write BCI.

FIGURE 3 | The use of a semi-immersive VR environment and fNIRS system.

Integration of VR therapy with such a read-write BCI holds potential for enhanced or accelerated therapeutic processes in neurorehabilitation. A study by Ang et al. (2015) was one of the first to combine VR with both tDCS and EEG to investigate the additive effect of offline anodal tDCS on BCI haptic training in chronic stroke patients. Although this study reported no difference in the Fugl-Meyer assessment and block-and-box test between the group that received real tDCS prior to BCI training and the sham group, it did demonstrate increased EEG mu rhythm suppression in the real tDCS group, indicating improved neurophysiological responses in motor preparation. Future studies are necessary to determine the significance of these neurophysiological improvements to clinical outcomes.
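Mu rhythm suppression of the kind described above is commonly expressed as the percentage change in mu-band power (roughly 8 to 13 Hz over sensorimotor cortex) during a task relative to a resting reference. The short Python sketch below shows that calculation; the function name and the example values are hypothetical, and the metric actually used by Ang et al. (2015) may differ.

```python
def mu_suppression_percent(rest_power, task_power):
    """Percent change in mu-band power from rest to task.

    Negative values indicate suppression (desynchronization) during the task;
    positive values indicate an increase in mu power.
    """
    if rest_power <= 0:
        raise ValueError("rest_power must be positive")
    return 100.0 * (task_power - rest_power) / rest_power


# Hypothetical band-power values in arbitrary units.
change = mu_suppression_percent(rest_power=4.0, task_power=3.0)
print(change)  # a negative value indicates suppression
```

Expressing the change relative to each participant's own resting power makes values comparable across individuals whose absolute EEG power differs.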

Neuroimaging-guided tDCS and VR therapy can be applied to neurorehabilitation in general. The optimal location for applying tDCS electrodes to modulate a target cortical region and connected networks is a debated point in the field. Modeling studies of current flow between the tDCS electrodes have provided some guidance on the placement of electrodes to stimulate a specific brain region; however, whether these models predict the actual current flow and polarization of the targeted brain networks is not known. Simultaneous tDCS-neuroimaging could provide a means of confirming the modeling predictions and/or guiding the direction of current flow between multielectrode tDCS montages (Ruffini et al., 2014). For example, a motor task could be used to activate a broad network of cortical regions of interest that can be measured using fNIRS/EEG neuroimaging. Once the locations of the tDCS electrodes have been determined using neuroimaging guidance, neuroimaging and tDCS could be used simultaneously or independently to guide and modulate VR therapy. In this scenario, neuroimaging during VR tasks can be used to adapt task intensity based on the level of activation in a region of interest, such as the attentional hub of the dorsolateral prefrontal cortex (DLPFC). The level of DLPFC activation during the initial task would be expected to be high due to the novelty of the activity; however, as the activity progresses, task requirements become automatic and performance improves, fewer attentional resources would be required and the level of DLPFC activation would be expected to decrease (Ayaz et al., 2012; McKendrick et al., 2014). In such a scenario, the VR task can be adapted online to modulate the intensity level and maintain optimal DLPFC activation. Also, if DLPFC activation remains high and/or performance is stagnant, then tDCS could potentially be applied online to upregulate the neuronal networks required to perform the task.
Preliminary evidence using a modeling approach to locate tDCS electrodes was provided by McKendrick et al. (2015) using a spatial memory task with concurrent fNIRS and tDCS. They showed that when task performance declined rapidly following baseline, the application of tDCS almost immediately eliminated the performance decrement. Furthermore, they showed that tDCS can modulate the neural activity of specific brain regions near the site of stimulation. However, they cautioned that current models and protocols for determining tDCS montages are lacking, due to complex interactions between stimulation montage, task performance and underlying hemodynamics that are not fully understood. Therefore, additional joint tDCS and fNIRS/EEG studies are required to further unravel these complexities and to better define the pattern of cortical excitation induced by tDCS during the performance of cognitive and motor tasks.
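The adaptive scenario described above can be sketched as a simple rule-based update step. The Python example below is a minimal, hypothetical illustration and not a validated protocol: the `SessionState` fields, the normalized activation scale (0 to 1), and all thresholds (`low`, `high`, `min_gain`, `max_level`) are assumptions introduced here, and the code only sets a flag rather than controlling any real stimulator.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    difficulty: int = 1     # hypothetical VR task level
    tdcs_on: bool = False   # whether online stimulation is requested


def adapt(state, dlpfc_activation, performance_gain,
          low=0.3, high=0.7, min_gain=0.01, max_level=10):
    """Apply one update of a hypothetical closed-loop rule.

    dlpfc_activation: normalized 0..1 activation estimate (e.g., from fNIRS).
    performance_gain: change in task score since the previous block.
    """
    if dlpfc_activation < low:
        # Low activation suggests the task has become automatic:
        # raise the difficulty to re-engage attentional resources.
        state.difficulty = min(state.difficulty + 1, max_level)
        state.tdcs_on = False
    elif dlpfc_activation > high and performance_gain < min_gain:
        # Sustained high activation without improvement:
        # flag that online stimulation could be considered.
        state.tdcs_on = True
    else:
        # Activation is in the target range or performance is improving:
        # keep the current level without stimulation.
        state.tdcs_on = False
    return state


state = SessionState()
state = adapt(state, dlpfc_activation=0.2, performance_gain=0.05)
print(state)
state = adapt(state, dlpfc_activation=0.9, performance_gain=0.0)
print(state)
```

Fixed thresholds are used here for readability; as the caution from McKendrick et al. (2015) implies, any real implementation would need individually calibrated criteria and clinical oversight.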

## CURRENT LIMITATIONS OF VR THERAPY

While using a VE offers many unique advantages over traditional treatment and neurorehabilitation approaches, limitations to its efficacy and practicality must be acknowledged. Firstly, larger clinical studies are required to establish the efficacy of using VR in physical and cognitive rehabilitation in different clinical populations. Much of the existing literature reports mixed findings from small sample sizes, and often lacks appropriate control comparisons (see **Tables 1** and **2**). Secondly, there is little information on the transfer of the training effects of VR into the corresponding physical environment in general, and the VR training parameters associated with optimal transfer to real-world functional improvements are yet to be elucidated. Thirdly, in many clinical populations it is unclear whether advantages of VR over real-world training exist, and if so, precisely what these advantages are (see **Tables 1** and **2** for study limitations). Furthermore, because this literature is extremely vulnerable to selective reporting and Type-I statistical error, there is an inherent bias toward publishing results that show a correlation between rehabilitation improvements and the application of VR. Therefore, to limit potential bias, it may be useful for future studies to adopt a double-blinded protocol for evaluating the effectiveness of VR. Lastly, it is important to investigate whether VR has any unique rehabilitative effects that may be exploited, or whether its benefits can be attributed to the enjoyment of the gaming platforms associated with VR (i.e., VR therapies may only present as more effective because they engage and motivate participants throughout their training session, providing increased adherence). While limitations in VR technology exist, the potential for favorable neuroplasticity afforded by such technology warrants further investigation.


#### CONCLUSION

In summary, this review has discussed the strengths and limitations of VR therapies in motor and mental health neurorehabilitation. The current evidence suggests that a combination of VR and conventional therapies is safe and likely to be more efficacious than either traditional or VR therapy alone. However, it is not known whether the use of VR therapies can lead to cost-saving benefits (i.e., reduced financial and manpower cost), or even whether current commercial or customized systems will be usable by patients living within the community. More importantly, there is a need to elucidate the aspects of VR that are most effective for rehabilitation. While this is not apparent in the current review, future studies should attempt to systematically determine the role of self-projection, sensory feedback or motivation in rehabilitation in relation to specific diseases or impairments. Furthermore, we have discussed the potential for VR therapy to be complemented by other technologies such as neuromodulation (tDCS) and neuroimaging (fNIRS/EEG) in order to augment the training benefits of VR and provide a more targeted approach to neurorehabilitation. Large-scale longitudinal studies will also be required to determine the effects of VR therapy (in combination with tDCS/fNIRS/EEG) and the translation of VR therapy to non-clinical environments (i.e., home setting).

## AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

## ACKNOWLEDGMENTS

W-PT is supported by an Alfred Deakin Postdoctoral Research Fellowship. MM is supported by a Postdoctoral Research Fellowship of the University of Montpellier.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Teo, Muthalib, Yamin, Hendy, Bramstedt, Kotsopoulos, Perrey and Ayaz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Corrigendum: Does a Combination of Virtual Reality, Neuromodulation and Neuroimaging Provide a Comprehensive Platform for Neurorehabilitation? – A Narrative Review of the Literature

Wei-Peng Teo<sup>1</sup>\*, Makii Muthalib<sup>2,3</sup>, Sami Yamin<sup>4,5</sup>, Ashlee M. Hendy<sup>6</sup>, Kelly Bramstedt<sup>4</sup>, Eleftheria Kotsopoulos<sup>4,7</sup>, Stephane Perrey<sup>2</sup> and Hasan Ayaz<sup>8,9,10</sup>

*<sup>1</sup> Institute for Physical Activity and Nutrition (IPAN), Deakin University, Burwood, VIC, Australia, <sup>2</sup> EuroMov, University of Montpellier, Montpellier, France, <sup>3</sup> Cognitive Neuroscience Unit, Deakin University, Burwood, VIC, Australia, <sup>4</sup> Liminal Pty Ltd., Melbourne, VIC, Australia, <sup>5</sup> Adult Mental Health, Monash Health, Dandenong, VIC, Australia, <sup>6</sup> School of Exercise and Nutrition Sciences, Deakin University, Burwood, VIC, Australia, <sup>7</sup> Aged Persons Mental Health Service, Monash Health, Cheltenham, VIC, Australia, <sup>8</sup> School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA, <sup>9</sup> Department of Family and Community Health, University of Pennsylvania, Philadelphia, PA, USA, <sup>10</sup> The Division of General Pediatrics, Children's Hospital of Philadelphia, Philadelphia, PA, USA*

Keywords: neurorehabilitation, neuroplasticity, tDCS, fNIRS, EEG, virtual reality therapy

#### **A corrigendum on**

**Does a Combination of Virtual Reality, Neuromodulation and Neuroimaging Provide a Comprehensive Platform for Neurorehabilitation? – A Narrative Review of the Literature**

by Teo, W.-P., Muthalib, M., Yamin, S., Hendy, A. M., Bramstedt, K., Kotsopoulos, E., et al., (2016). Front. Hum. Neurosci. 10:284. doi: 10.3389/fnhum.2016.00284

In the original article, there was a mistake in the legend for Figure 2 as published. The correct legend appears below. The authors apologize for this error and state that this does not change the scientific conclusions of the article in any way.

**Figure 2**. **Stroke participants engaged in VR therapy using an X-Box Kinect motion capture system, MediMoov by NaturalPad, while receiving tDCS**.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Teo, Muthalib, Yamin, Hendy, Bramstedt, Kotsopoulos, Perrey and Ayaz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Edited and reviewed by: *Mikhail Lebedev, Duke University, USA*

\*Correspondence: *Wei-Peng Teo weipeng.teo@deakin.edu.au*

Received: *12 January 2017* Accepted: *24 January 2017* Published: *03 February 2017*

#### Citation:

*Teo W-P, Muthalib M, Yamin S, Hendy AM, Bramstedt K, Kotsopoulos E, Perrey S and Ayaz H (2017) Corrigendum: Does a Combination of Virtual Reality, Neuromodulation and Neuroimaging Provide a Comprehensive Platform for Neurorehabilitation? – A Narrative Review of the Literature. Front. Hum. Neurosci. 11:53. doi: 10.3389/fnhum.2017.00053*

# High Working Memory Load Impairs Language Processing during a Simulated Piloting Task: An ERP and Pupillometry Study

Mickaël Causse<sup>1,2</sup>\*, Vsevolod Peysakhovich<sup>1</sup> and Eve F. Fabre<sup>1</sup>

<sup>1</sup> Département Conception et Conduite des Véhicules Aéronautiques et Spatiaux, Institut Supérieur de l'Aéronautique et de l'Espace, Toulouse, France, <sup>2</sup> Ecole de Psychologie, Université Laval, Québec, QC, Canada

Given the large amount of visual and auditory linguistic information that pilots have to process, operating an aircraft generates a high working-memory load (WML). In this context, the ability to focus attention on relevant information and to remain responsive to concurrent stimuli might be altered. Consequently, understanding the effects of WML on the processing of both linguistic targets and distractors is of particular interest in the study of pilot performance. In the present work, participants performed a simplified piloting task in which they had to follow one of three colored aircraft, according to specific written instructions (i.e., the written word for the color corresponding to the color of one of the aircraft) and to ignore either congruent or incongruent concurrent auditory distractors (i.e., a spoken color name). The WML was manipulated with an n-back sub-task. Participants were instructed to apply the current written instruction in the low WML condition, and the 2-back written instruction in the high WML condition. Results revealed a major effect of WML at behavioral (i.e., decline of piloting performance), electrophysiological, and autonomic levels (i.e., greater pupil diameter). Increased WML consumed resources that could not be allocated to the processing of the linguistic stimuli, as indexed by lower P300/P600 amplitudes. Also, significantly lower P600 responses were measured in incongruent vs. congruent trials in the low WML condition, showing a higher difficulty reorienting attention toward the written instruction, but this effect was canceled in the high WML condition. This suppression of interference in the high load condition is in line with the engagement/distraction trade-off model. We propose that P300/P600 components could be reliable indicators of WML and that they allow an estimation of its impact on the processing of linguistic stimuli.

Keywords: mental workload evaluation, electroencephalography/event-related potential (EEG/ERPs), pupil size, neuroergonomics, human factors, selective attention, attentional orienting

# INTRODUCTION

## Visual-Auditory Interference

Edited by: Hasan Ayaz, Drexel University, USA

Reviewed by: Mario Bonato, Ghent University, Belgium; Lewis Leewui Chuang, Max Planck Institute for Biological Cybernetics, Germany

\*Correspondence: Mickaël Causse mickael.causse@isae.fr

Received: 07 November 2015 Accepted: 09 May 2016 Published: 25 May 2016

Citation: Causse M, Peysakhovich V and Fabre EF (2016) High Working Memory Load Impairs Language Processing during a Simulated Piloting Task: An ERP and Pupillometry Study. Front. Hum. Neurosci. 10:240. doi: 10.3389/fnhum.2016.00240

Depending on the current task, the information surrounding us is roughly divided into relevant or irrelevant. Naturally, we tend to ignore the irrelevant information and to privilege that which is relevant. Despite such top-down attentional focus on a primary task, concurrent stimuli can capture human attention, especially when they share common characteristics with the focal task (Folk and Remington, 1998). As stated by Watkins et al. (2007), although distracting subjects, such attentional capture may be advantageous for survival because even a single stimulus can convey critical information about the environment. The distraction phenomenon has been widely investigated over the last decades (e.g., Parmentier, 2014). According to various authors, distraction may result from three different processing steps (Escera et al., 2000; Berti, 2008, 2013; Horváth et al., 2008). First, a preattentive change detection step may occur automatically when a novel/deviant stimulus appears in the environment. Second, once the concurrent stimulus is detected, attentional resources may be automatically allocated to it (i.e., involuntary orienting of attention) at the expense of goal-relevant stimuli. Third, if the stimulus is irrelevant to the task, a voluntary reorientation of attentional resources from the irrelevant stimulus to the relevant stimulus may finally occur. These involuntary and voluntary shifts of attention are assumed to interfere with the processing of the information relevant to the task at hand.
Generally, the literature on visual-auditory interference tends to support this three-step model using a wide variety of experimental paradigms such as auditory-visual oddball tasks (Andrés et al., 2006; Boll and Berti, 2009; Bendixen et al., 2010; Parmentier and Andrés, 2010; Ljungberg and Parmentier, 2012), task-irrelevant auditory distractor probes (Scheer et al., 2016), visual-auditory Stroop tasks (Roelofs, 2005; Donohue et al., 2013; Elliott et al., 2014), response competition paradigms (e.g., Lavie and Cox, 1997; Tellinghuisen and Nowak, 2003), and more ecological tasks like decision-making during aircraft landing (e.g., Scannella et al., 2013). Overall, auditory distractors are likely to interfere with the processing of visual targets, as longer response times and sometimes a decrement in accuracy are observed (Stuart and Carrasco, 1993; Yuval-Greenberg and Deouell, 2009; Chen and Spence, 2011; Berti, 2013; Donohue et al., 2013).

The electroencephalography (EEG) technique is particularly appropriate for the study of the processing of auditory distractors because its high temporal resolution makes it possible to observe the different steps of the process (Luck and Kappenman, 2011). The mismatch negativity (MMN) event-related potential (ERP), a negative deflection occurring between 150 and 250 ms after stimulus onset, maximal at frontal and central sites, was found to be elicited by novel/deviant auditory distractors; this was interpreted as reflecting the pre-attentive detection of the distractors (e.g., Friedman et al., 2001; Berti, 2013). In addition, the novelty-P3 component, a positive deflection occurring around 300 ms after stimulus onset and highest in the frontal lobes, was found to index the involuntary switch of attention to the distractors (Escera et al., 1998, 2000; Friedman et al., 2001). Finally, the reorienting negativity (RON) component, a later negative deflection occurring around 500 ms after stimulus onset, maximal at frontal sites, was found to index the reorientation of attention back to the task after distraction (Schröger and Wolff, 1998; Schröger et al., 2000; Berti and Schröger, 2001; Wetzel et al., 2004).

## The Effect of Working Memory Load on the Processing of Auditory Distractors

Many studies have investigated how the processing of distractors is impacted by both perceptual load (Tellinghuisen and Nowak, 2003; Lavie, 2005; Parks et al., 2011; Lavie et al., 2014; Bonato et al., 2015) and working-memory load (WML; Lavie et al., 2004; Kim et al., 2005; SanMiguel et al., 2008). Lavie et al. (2004) proposed that while an increase in perceptual load may reduce distractor interference, an increase in WML may, on the contrary, increase distractor interference. However, various studies investigating the impact of WML on the processing of both visual targets and auditory distractors found opposite results (SanMiguel et al., 2008; Lv et al., 2010; Sörqvist et al., 2012). In SanMiguel et al.'s (2008) study, participants performed an auditory-visual distraction paradigm. While performing the visual task, participants had to ignore task-irrelevant auditory stimuli (i.e., 20% novel environmental sounds and 80% repetitive standard tones). The WML was also manipulated by an n-back task (Kirchner, 1958). In the low load condition (i.e., 0-back), participants had to decide whether the two digits appearing on screen at the same time were the same or different, while in the high load condition (i.e., 1-back) they had to compare the left digit appearing on the screen with the left digit seen in the previous trial. An increase of response times and a decrease of hit rate showed that participants were distracted by novel sounds. Moreover, behavioral data and an attenuation of the amplitude of the novelty-P3 showed that high WML decreased the distraction effect. In another study, Lv et al. (2010) asked participants to remember the order of three (low load) or seven digits (high load). As in SanMiguel et al. (2008), task-irrelevant auditory stimuli were played during the working memory (WM) task with 80% repetitive standard sounds and 20% novel environmental sounds. 
Participants responded faster and performed significantly better on the task in the low load condition than in the high load condition. Moreover, lower novelty-P3 amplitudes were found in the high WML condition in comparison to the low WML condition, leading the authors to conclude that high WML decreases the distraction effect. Finally, Sörqvist et al. (2012) measured the auditory brainstem responses (ABR; i.e., a neural signal transmitted by the cochlea to the auditory cortex via the brainstem) of participants completing a visual-verbal version of the n-back task (i.e., low load for 1-back, medium load for 2-back and high load for 3-back). They were presented with a visual sequence of letters and were asked to press the space bar on the computer keyboard when the letter was the same as the letter presented n letters back in the sequence. The results of this study demonstrate that a medium increase in WML (i.e., 2-back condition) may disrupt the processing of the distractor (i.e., lower ABR responses) without affecting task performance, while a significant increase of WML (i.e., 3-back condition) may not only result in lower ABR responses but also in lower accuracy.

The behavioral and electrophysiological (i.e., ERPs, ABR) results of these three studies tend to confirm that high WML reduces the distraction effect. The impact of WML on distractor processing may also depend on how WM content (e.g., tones, digits, letters, words, geometric forms, etc.) overlaps task-relevant information. Stroop interference was found to increase when target types overlapped WM content (Kim et al., 2005). This result provides a suitable explanation for the discrepancy in results of studies investigating the impact of WML on auditory distractor processing (i.e., an increase of distraction vs. a decrease of distraction under high WML). However, significantly, all these experiments only used tones as auditory distractors. Linguistic distractors were studied previously by Mayer and Kosson (2004), but as far as we know the impact of WML on the processing of visual task-relevant and auditory task-irrelevant linguistic stimuli has never been investigated.

## Visual-Auditory Interference and Load on Working Memory in the Cockpit

Operating an aircraft generates a high WML: pilots have to simultaneously select, process, memorize and retrieve a large amount of information, which requires high multitasking and WM capacity (Konig et al., 2005). Previous studies in flight simulators have demonstrated the critical impact of human WM limitations on piloting performance (Taylor et al., 2000; Causse et al., 2011) and how it is likely to compromise flight safety (Borghini et al., 2014). For instance, high WML was found to affect the ability of the pilots to process ATC verbal instructions (Taylor et al., 2005) and simulated auditory alerts (Dehais et al., 2013; Giraudet et al., 2015). However, pilots also find themselves confronted with false or irrelevant information (Belcastro et al., 2016). In order to maintain optimal performance, they sometimes need to insulate themselves from auditory distractors to focus on relevant visual information. For example, they have to ignore irrelevant ATC communications and unjustified warnings (e.g., ground proximity system alarm, Loomis and Porter, 1982) while always maintaining the ability to shift attention to concurrent stimuli to decide whether or not they are task-relevant. However, as previously stated, this ability to detect and process concurrent stimuli might be altered, a situation sometimes referred to as cognitive tunneling (Wickens et al., 2015).

## Present Study and Hypothesis

The present study aimed at investigating the impact of high WML on the processing of both linguistic visual targets and auditory distractors. Participants performed a simplified piloting task in which they had to control an aircraft using a joystick. They were instructed to follow one of three different colored aircraft displayed on the left of the computer screen, according to written instructions (i.e., the color corresponding to one of the aircraft) displayed in the center of the screen. Similarly to Donohue et al. (2013), each time a color was displayed on the screen, a concurrent spoken distractor (i.e., a color to be ignored) was simultaneously presented either congruently or incongruently with the written instructions. WML was manipulated via an n-back-like sub-task (i.e., the delay between the displayed instruction and its execution). In the low WML condition, participants were instructed to immediately apply the written instruction. In the high WML condition, they had to apply the 2-back written instruction. We measured piloting performance (i.e., accuracy in following the correct aircraft), ERPs, and pupil diameter.

We expected that accuracy would be higher in the congruent vs. incongruent condition. Also, the piloting task should be cognitively more demanding when the WML is high (i.e., 2-back condition) compared to when it is low (i.e., 0-back condition); thus, we expected to observe higher accuracy in the low WML condition compared to the high WML condition (in line with Sörqvist et al., 2012). In addition, given that fewer attentional resources should be available for processing spoken distractors in the high WML condition, in line with various authors (e.g., SanMiguel et al., 2008; Lv et al., 2010; Sörqvist et al., 2012), incongruence may affect accuracy only in the low WML condition.

As pupil diameter was found to vary according to attentional effort (Smallwood et al., 2011) and task engagement (Gilzenrat et al., 2010), we predicted greater pupil diameter in response to incongruent trials compared to congruent trials (Siegle et al., 2004). This greater pupil diameter may be observed in the low WML condition only. Pupil diameter measurements have also been found to be a reliable psycho-physiological marker of WML (van Gerven et al., 2004; Lisi et al., 2015; Peysakhovich et al., 2015), with larger tonic pupil responses indicating an increase in WML. In line with previous studies, we expected to observe greater pupil diameter in the 2-back condition than in the 0-back condition.

At an electrophysiological level, we expected to observe amplitude modulations of ERP components associated with attention allocation (i.e., novelty-P3). Various studies support the fact that the P3a component and the novelty-P3 component are variations of the same potential (Spencer and Polich, 1999; Simons et al., 2001; Polich and Comerchero, 2003; Polich, 2007). Since no novel auditory distractors were used in the present task, we chose to use the generic term P3a and not novelty-P3 when referring to this component. Based again on previous results showing that an increase in WML may lower the distraction effect (SanMiguel et al., 2008; Lv et al., 2010; Sörqvist et al., 2012), we predicted greater P3a amplitudes in response to incongruent trials than to congruent trials in the low WML condition only, reflecting an involuntary switch of attention to the spoken distractors (e.g., Escera et al., 1998, 2000; Friedman et al., 2001). In addition, spoken distractors seem to lead to involuntary semantic evaluation (Parmentier, 2008; Parmentier et al., 2011; Parmentier and Hebrero, 2013). In other words, spoken distractors are, at least partly, processed. Based on these previous results, we predicted amplitude modulations of ERPs associated with language processing (i.e., N400 component). The N400 component is a negative deflection peaking around 400 ms after stimulus onset, is highest at central-parietal sites (for a review see Kutas and Federmeier, 2011), and was found to index semantic incongruence processing (Pickering and Schweinberger, 2003). Consequently, we predicted greater N400 amplitudes in response to incongruent trials compared to congruent trials (Kutas and Hillyard, 1980; Kutas and Federmeier, 2011), indicating the processing of both written instructions and spoken distractors (in line with Donohue et al., 2012). As for the P3a, we may also observe N400 amplitude differences in response to incongruent trials vs. congruent trials in the low WML condition only. Finally, we predicted a general effect of WML with lower P3a/N400 amplitudes in the 2-back condition than in the 0-back condition (SanMiguel et al., 2008; Lv et al., 2010).

# MATERIALS AND METHODS

#### Participants

Participants were 24 healthy volunteers (mean age = 24.6 years, SD ± 1.86), all native French speakers. They were recruited at the Institut Supérieur de l'Aéronautique et de l'Espace (ISAE) and were familiar with the aeronautical domain. All were right-handed as assessed by the Edinburgh Handedness Inventory (Oldfield, 1971), had normal auditory acuity and normal or corrected-to-normal vision. None of the participants reported a prior history of neurological disorder. All participants were informed of their rights and gave written informed consent for participation in the study according to the Helsinki Declaration. The research was carried out fulfilling ethical requirements in accordance with the standard procedures of the University of Toulouse. The experimental protocol was reviewed and approved by a national ethics committee (CEEI/IRB00003888).

#### Material

The piloting task was displayed on a 22-inch monitor (1680 × 1250) located at a distance of approximately 70 cm from the participants. The screen luminance and the piloting task were identical in all experimental conditions. As a consequence, no confounding effect of light could have jeopardized pupil measurements. Spoken names of colors (i.e., gray, red, blue, yellow, green) were presented via two stereo speakers, positioned on each side of the computer monitor. They were recorded using a synthetic voice taken from the French Voxygen startup website.<sup>1</sup> Throughout the entire experiment, both pupil diameter variations and EEG signals were recorded (see "Electroencephalography" and "Pupillometry" Sections).

#### Experiment Design

We used a full factorial design with two within-participant factors: WML (Load: Low, High) and the congruency of the task-irrelevant auditory distractor (Congruency: Congruent, Incongruent). Participants performed four blocks of 250 trials each, each of which presented only "Low" or only "High" load trials. Block order was counterbalanced among participants. The congruency of the task-irrelevant auditory distractor was randomly determined per trial. We computed accuracy for each condition as the percentage of correctly targeted aircraft. We considered that an aircraft was correctly followed if the vertical distance between the user's aircraft and the target one was less than 100 pixels for at least 90% of the trial length.
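The scoring rule described above can be made concrete with a short sketch. It assumes, hypothetically, that the vertical distance between the two aircraft is sampled at regular intervals within each trial and stored as a list of pixel values; the function names and the sampling scheme are illustrative and are not taken from the original study.

```python
def is_trial_correct(vertical_distances_px, threshold_px=100, min_fraction=0.90):
    """Return True if the distance stayed under the threshold for enough of the trial.

    vertical_distances_px: hypothetical per-sample distances (in pixels) between
    the participant's aircraft and the target aircraft during one trial.
    """
    if not vertical_distances_px:
        return False
    within = sum(1 for d in vertical_distances_px if abs(d) < threshold_px)
    return within / len(vertical_distances_px) >= min_fraction


def accuracy_percent(trials):
    """Percentage of trials scored as correctly followed."""
    if not trials:
        return 0.0
    correct = sum(1 for t in trials if is_trial_correct(t))
    return 100.0 * correct / len(trials)
```

For example, a trial in which 95% of the samples lie within 100 pixels counts as correct, whereas a trial spent about 220 pixels away (i.e., following another aircraft) does not.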

<sup>1</sup>https://www.voxygen.fr/fr

#### Task and Stimuli

The task involved controlling an aircraft with a joystick in order to follow one of three possible aircraft that were defined by unique colors. The color name corresponding to the color of the aircraft to target was presented in black ink every 4500 ms in the center of the screen for 1000 ms (i.e., written instructions). In addition, a task-irrelevant auditory distractor (i.e., spoken distractor) was presented simultaneously for 280 ms (i.e., visual-auditory Stroop paradigm; Roelofs, 2005). These written instructions and spoken distractors created four different trial combinations. In the first combination, occurring 10% of the time, the spoken and the written color names (i.e., blue, red, green or yellow) were the same (i.e., congruent trials). In the second combination, also occurring 10% of the time, the spoken and the written color names differed from one another (i.e., incongruent trials). In the third combination, occurring 10% of the time, the spoken color name did not correspond to any aircraft color (i.e., neutral trials). Finally, in the fourth combination, occurring 70% of the time, the spoken color name was "gray" (i.e., standard trials). The neutral and standard trials were not analyzed. These two trial combinations were used to inhibit habituation effects and create a rarity effect toward the congruent/incongruent distractors, respectively.
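As an illustration of the four trial combinations, the following sketch labels a trial from the written instruction, the spoken word, and the set of aircraft colors shown in the current block. The function name and the string labels are assumptions made for the example. Treating a mismatching spoken color as incongruent only when it belongs to another displayed aircraft is our reading of the description above.

```python
def classify_trial(written, spoken, aircraft_colors):
    """Label a trial as standard, congruent, incongruent or neutral.

    written: color name shown on screen (the instruction).
    spoken: color name played as the auditory distractor.
    aircraft_colors: the three colors assigned to the aircraft in this block.
    """
    if spoken == "gray":
        return "standard"      # most frequent distractor, never an aircraft color
    if spoken == written:
        return "congruent"     # spoken word matches the written instruction
    if spoken in aircraft_colors:
        return "incongruent"   # spoken word names a different displayed aircraft
    return "neutral"           # spoken color matches no displayed aircraft


block_colors = {"red", "yellow", "blue"}
labels = [classify_trial("red", s, block_colors)
          for s in ("red", "yellow", "green", "gray")]
```

With a written instruction of "red" and aircraft colored red, yellow and blue, the four spoken words above are labeled congruent, incongruent, neutral and standard, in that order.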

**Figure 1** (caption fragment): … displayed for 1000 ms while the auditory distractor was played for 280 ms. In this particular example, the written target color is "red". Consequently, a congruent auditory distractor would be "red". "Yellow" or "blue" would be incongruent distractors, and "green" would be neutral. The standard distractor was "gray" throughout the whole experiment.

The three aircraft to target were displayed on the left of the computer screen. **Figure 1** shows the layout of the presentation display. The colors of the three aircraft were randomly chosen for each block among four possible colors: red, blue, yellow and green. The initial horizontal positions of the targeted aircraft and the control aircraft were equidistant from the center and the edges of the display. They were respectively positioned 30% from the left/right borders of the screen. In order to create a continuous, engaging, and dynamic interaction with the task, the position of the three aircraft on the left changed every 50 ms: a random shift of up to 12 pixels vertically and up to 2 pixels horizontally was applied. The amplitude of this shift was chosen by a moving average filter (of 10th order) of randomly generated numbers (from −1 to 1 to choose the proportion of the greatest authorized shift). A small jitter was also added to the aircraft under control, with a maximum authorized shift of up to five pixels vertically and up to one pixel horizontally, so that it would be unstable and require continuous control.
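The smoothed random shift described above can be sketched as follows. This is a minimal illustration, not the original task code: it assumes that the "10th order" moving average is the mean of the 10 most recent random draws, and the function name, parameters and seeding are hypothetical.

```python
import random
from collections import deque


def smoothed_shifts(n_steps, max_shift_px, window=10, rng=None):
    """Generate one shift per update step (e.g., every 50 ms).

    Each step draws a random proportion in [-1, 1], averages it with the
    previous draws (moving average over `window` values, an assumption),
    and scales the result by the greatest authorized shift in pixels.
    """
    rng = rng or random.Random()
    recent = deque(maxlen=window)   # keeps only the last `window` draws
    shifts = []
    for _ in range(n_steps):
        recent.append(rng.uniform(-1.0, 1.0))
        proportion = sum(recent) / len(recent)
        shifts.append(proportion * max_shift_px)
    return shifts


# Vertical and horizontal shifts of a target aircraft over 10 s (200 steps of 50 ms)
vertical = smoothed_shifts(200, max_shift_px=12, rng=random.Random(0))
horizontal = smoothed_shifts(200, max_shift_px=2, rng=random.Random(1))
```

Averaging consecutive draws keeps every shift within the authorized maximum while making successive shifts correlated, which produces a drifting rather than a jittering trajectory.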

Task difficulty was manipulated in terms of WML and the tracking task was designed to be similar to an n-back task. Two difficulty levels corresponded to the delay between the displayed instruction and its execution. Contrary to the classic n-back paradigm in which a participant has to indicate if the current stimulus matches one from n steps earlier in the sequence, our participants had to target the aircraft corresponding to the current written instruction (n = 0) in the low WML condition or corresponding to the instruction presented two trials before (n = 2) in the high WML condition. After each block, participants filled out the NASA Task Load Index questionnaire (NASA TLX; Hart and Staveland, 1988), see **Figure 2**. This questionnaire provides an evaluation of the subjective mental demand elicited by the task for each level of difficulty.
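The difference between the two load conditions comes down to which instruction defines the current target. A minimal sketch, assuming the written instructions of a block are stored in presentation order (the function name and the handling of the first trials of a 2-back block are assumptions):

```python
def target_color(instructions, trial_index, n_back):
    """Return the color the participant should follow on a given trial.

    instructions: written color names in presentation order.
    n_back: 0 in the low WML condition, 2 in the high WML condition.
    Returns None when no instruction is available yet (the first n_back trials).
    """
    source = trial_index - n_back
    if source < 0:
        return None
    return instructions[source]


block = ["red", "blue", "green", "red", "yellow"]
low_load_target = target_color(block, 3, n_back=0)    # instruction shown now
high_load_target = target_color(block, 3, n_back=2)   # instruction shown two trials earlier
```

In the high WML condition, the participant therefore has to hold two pending instructions in memory while encoding the current one.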

We did not manipulate piloting complexity per se; the control of the aircraft remained constant. From a neuroergonomic standpoint, this task was designed to recreate an ecological context with an engaging, dynamic and complex situation. As in piloting, the participants had to continuously control the trajectory of their aircraft and had to remain responsive to the written and verbal instructions under various WML conditions. In addition, the task also reproduced the multiple conflicting warnings that can confuse crews (Belcastro et al., 2016). Given the deliberate complexity of the task, we did not intend to specifically separate all cognitive processes at each time point. However, the main cognitive abilities engaged during the task were visuospatial (monitoring the position of the aircraft), psychomotor (control of the aircraft) and attentional (toward the instructions).

#### Procedure

Participants were comfortably seated in an armchair in a sound-dampened experimental room. The room had no windows and the light was kept constant and moderate. After the training session, they were equipped with the EEG electrode cap as well as the electrooculographic electrodes for blink and saccade detection. Eye tracker calibration was performed to record participants' pupil diameter. Participants then performed the four experimental blocks. The instructions were generated so that the participant had to change target aircraft every trial. The tracking was continuous during the whole task; when a new instruction was displayed, participants still followed the previous aircraft until they processed the new instruction and switched to the corresponding aircraft. Participants could not repeat the instructions aloud and they were instructed to avoid moving and talking. After each of the four blocks, participants filled out the NASA-TLX.

#### Electroencephalography

EEG was amplified and recorded with an ActiveTwo BioSemi system (BioSemi, Amsterdam, Netherlands) from 30 Ag/AgCl active electrodes mounted on a cap and placed on the scalp according to the International 10–20 System (FP1, FP2, AF3, AF4, F7, F3, Fz, F4, F8, FC5, FC1, FC2, FC6, CP5, CP1, Cz, CP2, CP6, P7, P3, Pz, P4, P8, T7, T8, PO3, PO4, O1, Oz, O2) plus two sites below the eyes for monitoring eye movements. Analyses were focused on 23 electrodes of interest. Two additional electrodes placed close to Cz, the Common Mode Sense (CMS) active electrode and the Driven Right Leg (DRL) passive electrode, were used to form a feedback loop that maintains the average potential of the participant as close as possible to the AD-box reference potential. Electrode impedance was kept below 5 kΩ for scalp electrodes, and below 10 kΩ for the four eye channels. Skin-electrode contact, obtained using electro-conductive gel, was monitored, keeping voltage offset from the CMS below 25 mV for each measurement site. All the signals were (DC) amplified and digitized continuously with a sampling rate of 512 Hz with an anti-aliasing filter with 3 dB point at 104 Hz (fifth-order sinc filter); no high-pass filtering was applied online. The triggering signals to each word onset were recorded on additional digital channels. EEG data were re-referenced offline to the average activity of the two mastoids and band-pass filtered (0.1–40 Hz, 12 dB/octave), given that for some participants the low-pass filter was not effective in completely removing the 50 Hz artifact. Epochs were time-locked to instruction onset and extracted in the interval from −200 ms to 800 ms. The 200 ms pre-stimulus baseline was used in all analyses. Given their synchronicity, we could not dissociate the respective contributions of written instructions and auditory distractors to the ERPs. Segments with excessive blinks and/or artifacts (such as excessive muscle activity) were eliminated offline before data averaging. 
Data lost because of artifacts amounted to 7%.
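The epoching and baseline-correction step can be illustrated with a minimal single-channel sketch. It assumes the continuous signal is a plain list of samples recorded at 512 Hz and that the stimulus onset is given as a sample index; the function name and these data structures are hypothetical, and dedicated EEG toolboxes would normally be used for this step.

```python
def extract_epoch(signal, onset_index, sfreq=512.0, tmin=-0.2, tmax=0.8):
    """Cut one epoch around a stimulus onset and subtract the pre-stimulus mean.

    signal: continuous samples from one channel (hypothetical list of floats).
    onset_index: sample index of the instruction onset.
    Returns the baseline-corrected epoch, or None if it does not fit in the signal.
    """
    start = onset_index + round(tmin * sfreq)   # about 102 samples before onset
    stop = onset_index + round(tmax * sfreq)    # about 410 samples after onset
    if start < 0 or stop > len(signal):
        return None
    epoch = signal[start:stop]
    n_baseline = onset_index - start
    baseline = sum(epoch[:n_baseline]) / n_baseline
    return [value - baseline for value in epoch]


# Toy signal: a step from 2.0 to 7.0 at sample 600, used as the "onset"
toy_signal = [2.0] * 600 + [7.0] * 600
toy_epoch = extract_epoch(toy_signal, onset_index=600)
```

After correction, the pre-stimulus part of the toy epoch is centered on zero and the post-stimulus part expresses the change relative to that baseline.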

#### Pupillometry

The diameter of participants' left pupil was continuously recorded with a remote SMI RED eye-tracker (SensoMotoric Instruments GmbH, Germany) at a sampling rate of 500 Hz. Before each condition, participants performed a 5-point calibration procedure. The continuous pupillary recordings were cleaned of blink artifacts using linear interpolation, including the adjacent 40 ms on each side to avoid eyelid closure artifacts. The data were then filtered with a "two pass" 9-point filter (low-pass) and segregated into trials by condition. A trial was validated for the statistical analysis if the time spent blinking during the trial did not exceed 50% (i.e., 2250 ms). This resulted on average in 87% (SD = 18%) of validated trials per condition and was not dependent on the condition.
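The blink-cleaning and trial-validation steps can be sketched as follows. This is an illustrative reconstruction rather than the original processing code: it assumes that blink samples are marked as `None` in a list of pupil diameters sampled at 500 Hz, that the 40 ms margin corresponds to 20 samples on each side, and that the blink proportion is computed on the raw blink samples without the margin.

```python
def interpolate_blinks(samples, sfreq=500.0, pad_ms=40.0):
    """Linearly interpolate over blinks, widened by pad_ms on each side."""
    n = len(samples)
    pad = round(pad_ms * sfreq / 1000.0)
    missing = [False] * n
    for i, value in enumerate(samples):
        if value is None:
            for j in range(max(0, i - pad), min(n, i + pad + 1)):
                missing[j] = True
    cleaned = list(samples)
    i = 0
    while i < n:
        if not missing[i]:
            i += 1
            continue
        start = i
        while i < n and missing[i]:
            i += 1
        left = cleaned[start - 1] if start > 0 else None
        right = cleaned[i] if i < n else None
        if left is None and right is None:
            continue                      # nothing to anchor the interpolation
        if left is None:
            left = right                  # gap at the start: hold the first valid value
        if right is None:
            right = left                  # gap at the end: hold the last valid value
        span = i - start + 1
        for k in range(start, i):
            cleaned[k] = left + (right - left) * (k - start + 1) / span
    return cleaned


def trial_is_valid(samples, max_blink_fraction=0.5):
    """Keep a trial only if blinks cover at most half of its samples."""
    blinks = sum(1 for value in samples if value is None)
    return blinks / len(samples) <= max_blink_fraction


raw = [4.0] * 50 + [None] * 5 + [6.0] * 50   # toy trace with one short blink
clean = interpolate_blinks(raw)
```

In the toy trace, the five blink samples and the 20 samples on either side are replaced by a straight line between the last valid value before the gap and the first valid value after it.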

#### Statistical Analyses

Statistical analyses were performed using Statistica 10 (StatSoft). Differences between the experimental conditions were investigated by using analysis of variance (ANOVA) followed by Tukey's honestly significant difference (HSD) post hoc testing.

# RESULTS

#### Performance

A 2 × 2 (Congruency [Congruent; Incongruent] × Load [Low; High]) repeated measures ANOVA revealed a main effect of WML on piloting performance (F(1,23) = 7.79, p < 0.05, $\eta_p^2$ = 0.25); the participants were better at aircraft targeting in the low WML condition (M = 85.65, SD ± 13.26) compared to the high WML condition (M = 75.55, SD ± 16.65; see **Figure 3**). In contrast, we found neither an effect of congruency (F(1,23) = 0.08, p = 0.78) nor a Load × Congruency interaction (F(1,23) = 1.18, p = 0.29, $\eta_p^2$ = 0.05) on piloting performance. **Figure 4** shows the temporal evolution of the distance between the user's and the target aircraft for each condition, for correct and incorrect trials. This fine-grained analysis confirms that poorer piloting performance in the high WML condition cannot be associated with slow or inaccurate following of the correct aircraft. The decline of piloting performance under high WML was due to incorrect targeting (i.e., the inability to correctly encode and retrieve the aircraft to target). For the incorrect trials, the distance between the target aircraft and the user's was indeed kept constant at about 220 pixels, corresponding to a wrong aircraft. In contrast, the rare incorrect trials under low WML were mainly due to poor following of the correct aircraft (the average distance oscillates just above the 100 pixel threshold).
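For readers who wish to check the reported effect sizes, partial eta squared can be recovered from an F value and its degrees of freedom using the standard identity η²p = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the two F tests above for which an effect size is reported; the function name is ours.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of load on piloting performance: F(1,23) = 7.79
load_effect = round(partial_eta_squared(7.79, 1, 23), 2)
# Load x Congruency interaction: F(1,23) = 1.18
interaction_effect = round(partial_eta_squared(1.18, 1, 23), 2)
```

Both values agree, after rounding to two decimals, with the effect sizes reported in the text (0.25 and 0.05).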

#### NASA-TLX Questionnaire

The 2 × 6 (Load [low, high] × NASA-TLX dimensions [Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, Frustration]) repeated measures ANOVA showed a significant effect of load (F(1,23) = 32.68, p < 0.001, $\eta_p^2$ = 0.59). Participants evaluated the high WML condition as more mentally demanding, see **Figure 5**. The ANOVA also showed a significant effect of the NASA-TLX dimension (F(5,115) = 8.23, p < 0.001, $\eta_p^2$ = 0.26). The three dimensions with the highest scores were (in descending order): Performance, Effort and Mental Demand. Finally, a Load × NASA-TLX dimensions interaction was found (F(5,115) = 7.83, p < 0.001, $\eta_p^2$ = 0.25). Most notably, in the high WML condition, the Mental Demand dimension was rated higher on average than all other dimensions (M = 72.50, SD = 15.28). However, this difference was statistically significant only when compared to Physical Demand and Temporal Demand (HSD: p < 0.01 in both comparisons).

## Electroencephalography

Because of EEG recording issues, data were not available for one participant (corrupted data). First, five 3 × 2 × 2 (Electrode [Fz, Cz, Pz] × Congruency [congruent, incongruent] × Load [low, high]) repeated measures ANOVAs were conducted to assess mean amplitudes of the MMN, P3a, P3b, N400 and P600 components on the three midline electrodes. ERP time windows were determined through both the literature and visual analysis of the peak amplitudes. Second, in order to investigate possible topographical differences for these ERPs, the remaining electrodes were collapsed into four regions of interest of five electrodes each (see Siyanova-Chanturia et al., 2012): Left Anterior (AF3, F7, F3, FC1, FC5), Right Anterior (AF4, F4, F8, FC2, FC6), Left Posterior (CP1, CP5, P3, P7, PO3) and Right Posterior (CP2, CP6, P4, P8, PO4). A 4 × 2 × 2 (Region [Left Anterior, Right Anterior, Left Posterior, Right Posterior] × Congruency [congruent, incongruent] × Load [low, high]) ANOVA was conducted. See **Figure 6** for grand average ERP waveforms.
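The mean-amplitude measure used for each component can be sketched as the average of an epoch's samples within the component's time window. The example assumes an epoch stored as a list of microvolt values that starts 200 ms before stimulus onset and is sampled at 512 Hz, in line with the recording parameters reported in the Methods; the function name, the rounding of window limits to the nearest sample and the inclusive end point are assumptions.

```python
def mean_amplitude(epoch, t_start_ms, t_end_ms, sfreq=512.0, epoch_tmin_ms=-200.0):
    """Average amplitude of one epoch between two latencies (inclusive)."""
    first = round((t_start_ms - epoch_tmin_ms) * sfreq / 1000.0)
    last = round((t_end_ms - epoch_tmin_ms) * sfreq / 1000.0)
    window = epoch[first:last + 1]
    if not window:
        raise ValueError("time window falls outside the epoch")
    return sum(window) / len(window)


# Toy epoch: flat at 0 µV except for a 2 µV plateau covering the 300-330 ms window
toy_erp = [0.0] * 256 + [2.0] * 16 + [0.0] * 240
p3a_like = mean_amplitude(toy_erp, 300, 330)
```

The same function would be called with the limits of each window analyzed below (e.g., 200–240 ms for the MMN or 530–750 ms for the P600).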

#### MMN (200–240 ms Time Window)

The MMN amplitude was assessed in terms of the mean amplitude in the 200–240 ms time window. The statistical analysis revealed no significant main effect or interaction (ps > 0.05).

#### P3a (300–330 ms Time Window)

The P3a amplitude was assessed in terms of mean amplitude in the 300–330 ms time window. The analysis revealed a main effect of load (F(1,22) = 8.13, p < 0.01, $\eta_p^2$ = 0.27), with a greater positivity in the low WML condition (M = 3.34 µV, SD ± 6.77) than in the high WML condition (M = 0.10 µV, SD ± 5.92). The analysis also revealed a significant Electrode × Congruency interaction (F(2,44) = 3.23, p < 0.05, $\eta_p^2$ = 0.13), with a greater positivity for incongruent trials (M = 1.21 µV, SD ± 6.64) than for congruent trials (M = 0.39 µV, SD ± 7.11) at Fz (HSD: p < 0.05). No significant differences were found at Cz (HSD: p = 0.84) and at Pz (HSD: p = 0.13).

#### P3b (400–490 ms Time Window)

The P3b amplitude was assessed in terms of the mean amplitude in the 400–490 ms time window. The analysis revealed a main effect of load (F(1,22) = 5.03, p < 0.05, $\eta_p^2$ = 0.19), with a greater positivity in the low WML condition (M = 1.27 µV, SD ± 7.89) than in the high WML condition (M = −2.03 µV, SD ± 5.24). The analysis also revealed an Electrode × Congruency interaction (F(2,44) = 3.52, p < 0.05, $\eta_p^2$ = 0.14), with a greater positivity in response to incongruent trials (M = −2.08 µV, SD ± 6.28) compared to congruent trials (M = −3.22 µV, SD ± 7.76; HSD: p = 0.01) at Fz. No significant differences were found at Cz (HSD: p = 0.21) and at Pz (HSD: p = 0.54).

#### N400 (470–540 ms Time Window)

The N400 was assessed in terms of the mean amplitude in the 470–540 ms time window. The analysis revealed a significant Electrode × Load interaction (F(2,44) = 9.45, p < 0.001, $\eta_p^2$ = 0.30), with greater negativities in the high WML condition (Fz: M = −3.70 µV, SD ± 6.00; Cz: M = −2.91 µV, SD ± 4.93; Pz: M = −1.05 µV, SD ± 4.26) than in the low WML condition (Fz: M = −2.09 µV, SD ± 7.72; Cz: M = −0.36 µV, SD ± 7.52; Pz: M = 2.80 µV, SD ± 7.67; HSD: ps < 0.001) on Fz, Cz and Pz. The analysis also revealed a significant Electrode × Congruency interaction (F(2,44) = 8.95, p < 0.001, $\eta_p^2$ = 0.29), with a greater N400 amplitude for incongruent trials (M = 0.48 µV, SD ± 6.12) than for congruent trials (M = 1.27 µV, SD ± 6.82) at Pz (HSD: p < 0.05). The opposite pattern was found at Fz (incongruent trials: M = −2.17 µV, SD ± 6.48; congruent trials: M = −3.62 µV, SD ± 7.34; HSD: p < 0.001; this apparent contradiction of a decreased N400 for incongruent trials at Fz can be explained by the preceding, more positive P3a on this electrode in this condition). No difference was found at Cz (incongruent trials: M = −1.33 µV, SD ± 6.16; congruent trials: M = −1.94 µV, SD ± 6.78; HSD: p = 0.11). The topographical analysis revealed no significant effect or interaction.

Figure caption (fragment): … instruction and the auditory distractor) and the y-axis displays amplitude in microvolts. Negative is plotted down.

#### P600 (530–750 ms Time Window)

The P600 was assessed in terms of the mean amplitude in the 530–750 ms time window. The analysis revealed a main effect of load (F(1,22) = 8.15, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.27), with a greater positivity in the low WML condition (M = 2.44 µV, SD ± 7.08) than in the high WML condition (M = −1.02 µV, SD ± 5.18). A significant Electrode × Congruency interaction was found (F(2,44) = 5.53, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.20), with a greater positivity observed for congruent trials (M = 2.97 µV, SD ± 6.45) compared to incongruent trials (M = 1.62 µV, SD ± 5.43) at Pz (HSD: p < 0.001), but not at Fz (HSD: p = 0.44) nor at Cz (HSD: p = 0.75). The topographical analysis revealed a significant Load × Region interaction (F(3,66) = 8.83, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.29), with greater positivity in both left and right posterior regions in the low WML condition (respectively: M = −0.28 µV, SD ± 3.77; M = −0.14 µV, SD ± 3.82) than in the high WML condition (respectively: M = −1.14 µV, SD ± 7.72, p < 0.001; M = −1.61 µV, SD ± 8.34; HSD: p < 0.001). The analysis also revealed a significant Load × Congruency interaction (F(1,22) = 5.11, p < 0.05, η<sub>p</sub><sup>2</sup> = 0.19), with a greater positivity in response to congruent trials compared to incongruent trials in the low WML condition (respectively: M = −0.45 µV, SD ± 6.43; M = −1.79 µV, SD ± 8.47; HSD: p < 0.05), but not in the high WML condition (respectively: M = −0.67 µV, SD ± 3.89; M = −0.24 µV, SD ± 4.09; HSD: p = 0.44). Moreover, a greater positivity was also found for incongruent trials in the high WML condition (M = −0.24 µV, SD ± 4.09) than in the low WML condition (M = −1.79 µV, SD ± 8.47; HSD: p < 0.05).
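The component measures reported above are mean amplitudes over a fixed latency window. As a minimal sketch of that computation, assuming an averaged waveform stored as a plain list of microvolt samples, the window mean can be obtained as follows. The function name, the input format, the 1000 Hz sampling rate and the toy waveform are illustrative assumptions, not details taken from the study.

```python
def mean_amplitude(erp_uv, sfreq_hz, epoch_start_ms, win_start_ms, win_end_ms):
    """Mean of an averaged ERP waveform (µV) over [win_start_ms, win_end_ms].

    erp_uv          -- list of samples in microvolts (hypothetical input format)
    sfreq_hz        -- sampling rate in Hz (assumed; not stated in this section)
    epoch_start_ms  -- time of the first sample relative to stimulus onset
    """
    ms_per_sample = 1000.0 / sfreq_hz
    first = round((win_start_ms - epoch_start_ms) / ms_per_sample)
    last = round((win_end_ms - epoch_start_ms) / ms_per_sample)
    if first < 0 or last >= len(erp_uv) or first > last:
        raise ValueError("window falls outside the epoch")
    window = erp_uv[first:last + 1]  # inclusive of both window edges
    return sum(window) / len(window)


# Toy waveform: 1000 Hz, epoch from -200 ms to 999 ms, flat at 0 µV except for
# a 2 µV plateau between 530 and 750 ms (the P600 window used above).
times_ms = range(-200, 1000)
toy_erp = [2.0 if 530 <= t <= 750 else 0.0 for t in times_ms]
p600 = mean_amplitude(toy_erp, 1000, -200, 530, 750)
p3b = mean_amplitude(toy_erp, 1000, -200, 400, 490)
```

In practice the same window mean would be taken per participant, electrode and condition before entering the ANOVA.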

#### Pupillometry

Two 2 × 2 (Load [low, high] × Congruency [congruent, incongruent]) repeated-measures ANOVAs were carried out on the mean value of a 1.5-s recording starting 1 s post-stimulus, one for the tonic pupil response (absolute diameter) and one for the phasic pupil response (relative dilation). This interval largely includes the peak of the pupillary reaction, known to appear about 1200–1500 ms post-stimulus (Beatty and Lucero-Wagoner, 2000). We used the mean value of the 500 ms pre-stimulus period as a baseline for statistical analyses of the phasic pupil response.
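The two pupil measures can be sketched as follows. This is a minimal illustration that assumes a subtractive baseline correction for the phasic response (the text says only "relative dilation", so a divisive correction is equally possible) and a hypothetical 100 Hz eye-tracker sampling rate; the function names and the toy trial are not from the study.

```python
def window_mean(samples_mm, sfreq_hz, onset_index, start_s, end_s):
    """Mean pupil diameter (mm) from start_s to end_s relative to stimulus onset."""
    first = onset_index + round(start_s * sfreq_hz)
    last = onset_index + round(end_s * sfreq_hz)
    if first < 0 or last > len(samples_mm) or first >= last:
        raise ValueError("window falls outside the recording")
    segment = samples_mm[first:last]
    return sum(segment) / len(segment)


def pupil_responses(samples_mm, sfreq_hz, onset_index):
    """Return (tonic, phasic) for one trial.

    tonic  -- absolute mean diameter from 1.0 s to 2.5 s post-stimulus
    phasic -- the same mean minus the mean of the 0.5 s before stimulus onset
              (subtractive correction is an assumption)
    """
    tonic = window_mean(samples_mm, sfreq_hz, onset_index, 1.0, 2.5)
    baseline = window_mean(samples_mm, sfreq_hz, onset_index, -0.5, 0.0)
    return tonic, tonic - baseline


# Toy trial at a hypothetical 100 Hz: 3.50 mm before onset, 3.75 mm afterwards.
sfreq = 100
trial = [3.50] * 50 + [3.75] * 300  # stimulus onset at sample index 50
tonic, phasic = pupil_responses(trial, sfreq, onset_index=50)
```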

#### Tonic Pupil Response

The ANOVA showed a significant effect of WML (F(1,23) = 9.14, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.28; see **Figure 7**). The high WML condition elicited a larger tonic pupil response (M = 3.76 mm, SD ± 0.11) compared to the low WML condition (M = 3.68 mm, SD ± 0.11). Neither a main effect of Congruency (F(1,23) = 1.82, p = 0.19, η<sub>p</sub><sup>2</sup> = 0.07) nor a Load × Congruency interaction (F(1,23) = 0.85, p = 0.37, η<sub>p</sub><sup>2</sup> = 0.04) was found.

#### Phasic Pupil Response

No significant effects of WML (F(1,23) = 0.85, p = 0.37, η<sub>p</sub><sup>2</sup> = 0.04), congruency (F(1,23) = 0.53, p = 0.47, η<sub>p</sub><sup>2</sup> = 0.02), or Load × Congruency interaction (F(1,23) = 0.24, p = 0.63, η<sub>p</sub><sup>2</sup> = 0.01) were found for phasic pupil response.
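The partial eta squared values reported throughout the results can be recovered from each F ratio and its degrees of freedom through the standard identity η<sub>p</sub><sup>2</sup> = (F × df<sub>effect</sub>) / (F × df<sub>effect</sub> + df<sub>error</sub>). The sketch below applies it to a few of the statistics quoted above as a consistency check; the function name and the selection of effects are illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics quoted in the text, with the effect size reported alongside them.
reported = [
    ("P3b, load", 5.03, 1, 22, 0.19),
    ("N400, Electrode x Load", 9.45, 2, 44, 0.30),
    ("Tonic pupil, WML", 9.14, 1, 23, 0.28),
    ("Phasic pupil, WML", 0.85, 1, 23, 0.04),
]
for label, f_value, df1, df2, eta_reported in reported:
    eta = partial_eta_squared(f_value, df1, df2)
    print(f"{label}: computed {eta:.2f}, reported {eta_reported:.2f}")
```

For these four effects the computed values agree with the reported ones to two decimal places.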

# DISCUSSION

Operating an aircraft is cognitively demanding and requires high multitasking and WM capacities (Konig et al., 2005). Pilots have to simultaneously process, memorize and retrieve a large amount of visual and auditory information. In addition to this high cognitive load, pilots sometimes have to ignore irrelevant auditory distractors such as background ATC communications and false alarms. Paradoxically, they must remain responsive to unexpected stimuli at all times. Previous studies emphasized that attention-demanding settings that generate a high WML can impair the perception of unexpected/irrelevant stimuli (Berti and Schröger, 2003; Sörqvist et al., 2012). However, the impact of WML on the processing of linguistic material has rarely been tested in an explicit way. In the present study, participants completed a crossmodal version of the Stroop task paradigm (Donohue et al., 2013) adapted to a dynamic piloting task in combination with ERP and pupillary measurements. They were asked to take into account a written target instruction (i.e., the name of a color) and to ignore a concurrent spoken distractor (i.e., also a color). We investigated how WML modulated the processing of both target and distractors and to what extent it affected piloting performance (the latter being dependent on the processing/maintenance/retrieval of written instructions). Overall, the results revealed a subtle effect of congruency that was observable only at an electrophysiological level, an interaction between congruency and load on P600 amplitude, and a major effect of WML at behavioral, electrophysiological, and autonomic levels.

### Impact of the Congruency

At a behavioral level, the results revealed no main effect of the congruency between the written target and the spoken distractor, indicating that the latter does not interfere enough with the processing of the written instruction to affect piloting performance. This absence of interference at a behavioral level may be due to some limitations in the experimental paradigm. First, a recent study showed that distraction is maximal when an auditory distractor is presented 400 ms before the onset of the visual information of interest, but that it is significantly reduced when both stimuli of interest and the distractor are presented at the same time (Donohue et al., 2013). Since in the present study, ERP measurements were performed, it would have been complex to interpret electrophysiological results if the distractor and the target had been presented at different times. Presenting the spoken distractor before the written instruction could have led to a greater distraction effect, observable at the behavioral level. Second, in general, the distraction effect is observable on reaction times rather than accuracy measures. Longer reaction times reflect the penalty yielded by the involuntary orientation of attention to and away from deviant sounds (Parmentier, 2008). Given the continuous control of the aircraft trajectory with the joystick, it was not possible to measure reaction times accurately. We could have captured variations of reaction times if the task allowed such measurements.

However, electrophysiological data demonstrate the increased complexity of processing the written instructions when incongruent spoken distractors were presented simultaneously, with greater P3a and N400 amplitudes in incongruent trials compared to congruent trials. According to Polich (2007), the P3a component may be generated when focal attention on the task-relevant stimuli is captured by a distractor, and it indexes the automatic allocation of attentional resources to the distractor at the expense of goal-relevant stimuli. Greater P3a amplitudes in response to incongruent trials may reflect the recruitment of supplementary attentional resources and the involuntary orienting of attention to the spoken distractors (Escera et al., 1998, 2000; Friedman et al., 2001).

The N400 component was found to index semantic incongruence processing (Kutas and Federmeier, 2011). In line with the literature (Hanslmayr et al., 2008; Kutas and Federmeier, 2011; Donohue et al., 2012), greater N400 amplitudes were found at parietal sites in incongruent trials compared to congruent trials. Some behavioral studies have shown that not only are attentional resources allocated to spoken distractors, but that they also lead to an involuntary semantic evaluation (Parmentier, 2008; Parmentier et al., 2011; Parmentier and Hebrero, 2013). The N400 component may index this involuntary semantic analysis of the spoken distractor. However, as the processing of the spoken distractor was not sufficient to trigger significant distraction observable at a behavioral level, we conclude that the mobilization of additional attentional resources indexed by the P3a component was sufficient to process both the relevant written instruction and the irrelevant spoken distractor.

#### Impact of the Working-Memory Load

At both behavioral and subjective levels, a higher level of WML was found to decrease the accuracy and was associated with a higher perceived mental demand as shown by the NASA-TLX questionnaire. Moreover, the pupil diameter was also modulated by WML, with greater tonic pupillary responses under high WML compared to low WML, thus objectively confirming the increased task difficulty in the high WML condition.

Given that difficulty levels were generated using an n-back-like task, the tonic pupil diameter indicates the WML that was maintained throughout the block. Thus, when segregated into trials, the tonic reaction is always higher during the high WML condition. Pupil diameter was found to be correlated with WML and task difficulty (Kahneman and Beatty, 1966; Beatty and Lucero-Wagoner, 2000; Karatekin et al., 2007; Causse et al., 2010; Peysakhovich et al., 2015). Greater pupil diameter observed during trials of high WML provides additional evidence that higher WML increases the attentional resources allocated to the task. The present study also demonstrates that it is possible to measure pupil dilation relative to variations in WML in a task that requires natural eye movement if luminosity remains constant.

At an electrophysiological level, an increase in WML was found to affect the allocation of attentional resources (i.e., P3a and P3b components) and the semantic processing of both the written instruction and the spoken distractors (i.e., P600 component). The amplitudes of both the P3a and P3b components were lower in the high load condition than in the low load condition. An increased WM demand in the n-back task was previously shown to divert attentional resources and processing capacity away from the matching subtask (i.e., the comparison process; Watter et al., 2001) and was associated with reduced P300 amplitudes. For this reason, the decreased amplitudes of P3a and P3b in the high WML condition of the present study are interpreted as an "overall" alteration of the ability to orient attention and process environmental stimuli, including the critical written instruction. A classic hypothesis postulates that the P3a component originates from stimulus-driven frontal attention mechanisms when sufficient attentional focus is engaged (Polich, 2007). Recent research has shown that the P3a component indicates information selection within the WM (Berti, 2016). The P3b component originates from temporal-parietal activity when subsequent attentional resources promote context updating operations and memory processing (Knight, 1996; Brázdil et al., 2001; Polich, 2007). The P3b component is also considered to indicate stimulus analysis and response initiation (Verleger et al., 2005). Lower P3a and P3b responses in the 2-back condition indicate the mobilization of resources that cannot be allocated to processing the current written instruction since these resources are utilized by complex operations in WM. An increased N400 negativity was found at parietal sites in the high load condition compared to the low load condition. However, this increased negativity could be attributable to a decrease in amplitude of both P3a and P3b components preceding the N400 component. Therefore, it is difficult to draw definitive conclusions on the impact of load on this component.

Our results also revealed an unpredicted decrease in amplitude of the P600 in high WML compared to the low WML condition. The P600 component is a positive deflection occurring around 500 ms after stimulus onset and is known to reflect language revision processes (Friederici, 1995; Kaan et al., 2000; Papageorgiou et al., 2001). Interestingly, a recent study suggests that the P600 component may also index attention reorientation processes (Sassenhagen and Bornkessel-Schlesewsky, 2015). While the voluntary reorientation of attention was found to be indexed by the RON component (e.g., Schröger and Wolff, 1998; Schröger et al., 2000; Berti and Schröger, 2001; Wetzel et al., 2004), occurring around 400 ms after stimulus onset for simple stimuli (i.e., tones), in the present study, this process step appears to be delayed for linguistic stimuli (i.e., words) and indexed by the P600 component. It is likely that both the voluntary reorientation of attention back to the written instruction and its reanalysis/rechecking were affected in the high WML condition because fewer WM resources were available. This interpretation corresponds with a recent study showing that patients with WM deficits demonstrate lower P600 amplitudes (El-Kholy et al., 2012).

#### Interaction between Load and Congruency

At a behavioral level, the results of the present study revealed no interaction effect between the WML and target/distractor congruency. Previously mentioned limitations in the experimental paradigm could have contributed to this null effect. At an electrophysiological level, congruency was found to modulate the amplitude of the P600 component only in the low WML condition, with lower P600 responses in incongruent trials compared to congruent trials. Moreover, lower P600 amplitudes were found in response to incongruent trials in the low WML condition compared to the high WML condition. Taken together, these results are consistent with the idea that the interference effect is reduced as WML increases (e.g., SanMiguel et al., 2008; Lv et al., 2010; Sörqvist et al., 2012; Scharinger et al., 2015). We assume that attentional resources in WM were not sufficient to intensively process auditory distractors in the high WML condition. Consequently, no congruency effect was observed and the voluntary reorientation of attention back to the target instruction was easier. Higher task engagement generated by high WML condition tended to further reduce the effect of distraction observable only at an electrophysiological level. According to Kim et al. (2005), WML can either impair or benefit attentional selection depending on whether it overlaps with target/distractor processing or not. Their results showed that Stroop interference increased when the type of WML overlapped with the type of information required for the task. At the same time, Stroop interference decreased when the type of WML overlapped with distractor processing. In the present study, WML was elicited by exactly the same type of content as the targets/distractors (i.e., verbal stimuli). 
As a consequence, WML most likely impacted the processing of both written instructions and distractors, as shown by the decline in piloting performance and the mitigation of interference caused by the incongruent distractor, demonstrated by the P600 results.

We found results in line with previous studies (SanMiguel et al., 2008; Lv et al., 2010; Sörqvist et al., 2012) but also apparently in contradiction with the load theory, which predicts that high WML increases distractor interference by impeding inhibitory cognitive control (de Fockert et al., 2001; Lavie et al., 2004; Woodman and Luck, 2004). These contradictory results could possibly be explained by differences in the experimental paradigm. In the studies finding that WML enhances distraction, two independent tasks were combined: a "WM task" and a "selective attention task". In these experiments, a trial consisted of: first, a memorization phase (e.g., memorizing a list of digits), second, a selective attention task (e.g., classifying a written list of famous names such as pop stars or politicians while ignoring distractor faces), and third, a memory probe (e.g., reporting the digit that followed a probe in the memory set presented at the beginning of the trial). However, in the studies in which high WML was found to reduce the distraction effect, including the present study, the WM task and the selective attention task were nested. In other words, when the distraction task and the WM task were concomitant, WML was found to lower distractor interference, while when they were not, WML enhanced distractor interference. Future studies should test this hypothesis by comparing the effect of delayed vs. concomitant WML on distractor processing. Also, in order to get closer to the real piloting situation, future work should use spoken material conveying information that is relevant, neutral or irrelevant to the piloting task in order to investigate top-down processing modulations associated with spoken information according to its value to the focal task.

# CONCLUSION

In the present study, we adapted a visual-auditory version of the Stroop paradigm (Donohue et al., 2013) to a dynamic simulated piloting task in combination with ERPs and pupillary measurements. WML was also manipulated using an n-back task. Electrophysiological results revealed that more attentional resources were mobilized during incongruent trials (i.e., P3a component) and that the incongruence between written instructions and spoken distractors was detected (i.e., N400 component), suggesting that spoken distractors were semantically processed. This result confirms previous behavioral findings showing that not only are attentional resources allocated to spoken distractors but that they also lead to an involuntary semantic evaluation of the latter (Parmentier, 2008; Parmentier et al., 2011; Parmentier and Hebrero, 2013). However, the semantic processing of distractors was not sufficient to impair task accuracy, probably thanks to the mobilization of supplementary attentional resources that enabled participants to process both the target and the incongruent distractor.

Overall, high WML disrupted the processing of both the visual target instruction and the spoken distractors. High WML provoked a decline in task accuracy and increased pupil diameter. At an electrophysiological level, an alteration of the ERP components was found when the WML was high. In particular, we found lower P3a/P3b responses, indexing the mobilization of resources by the WM task that could not be allocated to orienting attention and processing environmental stimuli, including the critical written instruction. We also found lower P600 responses, showing the impairment of voluntary reorientation of attention back to the processing of the written instruction, thus altering the reanalysis/rechecking process. In addition, lower P600 responses in incongruent trials than in congruent trials were significant in the low WML condition only, indexing an easier voluntary reorientation of attention back to the target instruction because interference was reduced in the high WML condition. Our electrophysiological results can be related to a recent study (Scheer et al., 2016) that supports a three-stage distraction model with ERPs that reflect the post-sensory detection of the task-irrelevant stimulus, engagement, and reorientation back to the relevant task. They showed that the difficulty of a steering task diminished not only the amplitudes of the early P3 and late P3 but also the re-orientation negativity (RON) to the steering task (reorientation being rather indexed by the P600 component in our study). Our results are also consistent with theories such as enhanced inhibitory control (Scharinger et al., 2015) and the task engagement/distraction trade-off model (Sörqvist and Rönnberg, 2014), which hold that a higher cognitive engagement in a task can diminish distractibility and responsiveness to additional stimuli.
From an operational point of view, we confirm that high WML can compromise the ability of pilots to process, maintain, and execute ATC verbal instructions (2005) and to react to critical auditory alerts (Giraudet et al., 2015). We also demonstrate that P300 and P600 components are good candidates to detect variations in WM demand and that they allow estimation of its impact on the processing of linguistic stimuli.

### AUTHOR CONTRIBUTIONS

MC: designed the experiment, conducted data analysis, interpreted the data and wrote the manuscript; EFF: administered the experiment, conducted data analysis, interpreted the data and wrote the manuscript; VP: designed the experiment, developed the experimental task, conducted data analysis and wrote the manuscript.


#### ACKNOWLEDGMENTS

Authors would like to thank Marine Gonzalez and Louise Giraudet for their precious help with data collection. The authors would also like to thank the two reviewers for their helpful comments. We are also grateful to Zarrin Chua and Joseph Shea for proofreading earlier versions of the article.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Causse, Peysakhovich and Fabre. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The impact of expert visual guidance on trainee visual search strategy, visual attention and motor skills

Daniel R. Leff<sup>1†</sup>, David R. C. James<sup>1†</sup>, Felipe Orihuela-Espina<sup>1,2</sup>, Ka-Wai Kwok<sup>1</sup>, Loi Wah Sun<sup>1</sup>, George Mylonas<sup>1</sup>, Thanos Athanasiou<sup>1</sup>, Ara W. Darzi<sup>1</sup> and Guang-Zhong Yang<sup>1\*</sup>

<sup>1</sup> Hamlyn Centre for Robotic Surgery, Imperial College London, London, UK, <sup>2</sup> National Institute for Astrophysics, Optics and Electronics (INAOE), Tonantzintla, Mexico

Minimally invasive and robotic surgery changes the capacity for surgical mentors to guide their trainees with the control customary to open surgery. This neuroergonomic study aims to assess a "Collaborative Gaze Channel" (CGC), which detects trainer gaze behavior and displays the point of regard to the trainee. A randomized crossover study was conducted in which twenty subjects performed a simulated robotic surgical task necessitating collaboration either with verbal (control condition) or visual guidance with CGC (study condition). Trainee occipito-parietal (O-P) cortical function was assessed with optical topography (OT) and gaze behavior was evaluated using video-oculography. Performance during gaze assistance was significantly superior [biopsy number (mean ± SD): control = 5.6 ± 1.8 vs. CGC = 6.6 ± 2.0; p < 0.05] and was associated with significantly lower O-P cortical activity [ΔHbO<sub>2</sub> mMol × cm, median (IQR): control = 2.5 (12.0) vs. CGC = 0.63 (11.2); p < 0.001]. A random effect model (REM) confirmed the association between guidance mode and O-P excitation. Network cost and global efficiency were not significantly influenced by guidance mode. A gaze channel enhances performance, modulates visual search, and alleviates the burden in brain centers subserving visual attention and does not induce changes in the trainee's O-P functional network observable with the current OT technique. The results imply that through visual guidance, attentional resources may be liberated, potentially improving the capability of trainees to attend to other safety critical events during the procedure.

Keywords: functional near infrared spectroscopy, optical topography, neuroergonomics, graph theory, collaborative gaze, visual attention, skills assessment, mentoring


#### Edited by:

Klaus Gramann, Berlin Institute of Technology, Germany

#### Reviewed by:

Peter König, University of Osnabrück, Germany; Frederic Dehais, Institut Supérieur de l'Aéronautique et de l'Espace, France

#### \*Correspondence:

Guang-Zhong Yang, Hamlyn Centre for Robotic Surgery, Imperial College London, Level 4, Bessemer Building, South Kensington Campus, London, SW7 2AZ, UK g.z.yang@imperial.ac.uk

†These authors have contributed equally to this work.

Received: 28 July 2015 Accepted: 10 September 2015 Published: 14 October 2015

#### Citation:

Leff DR, James DRC, Orihuela-Espina F, Kwok K-W, Sun LW, Mylonas G, Athanasiou T, Darzi AW and Yang G-Z (2015) The impact of expert visual guidance on trainee visual search strategy, visual attention and motor skills. Front. Hum. Neurosci. 9:526. doi: 10.3389/fnhum.2015.00526

# Introduction

In high-risk industries, collaboration between operators is integral to performing goal-orientated tasks successfully (e.g., pilots, air traffic controllers, surgeons, etc.). Regarding surgery, collaboration is necessary between surgeons and their assistant(s), theatre nurse(s) and occasionally members of allied specialties. Recent developments in technologies for robotic surgery such as dual console systems (e.g., da Vinci<sup>®</sup> Si) enable two surgeons to operate simultaneously, facilitating both high-level co-operation and mentorship as well as potentially streamlining the operators' cognitive resources towards improved safety. However, in this scenario, it is important that communication between both surgeons is effective to enable a seamless flow of information between the two operators and ensure an efficient workflow. Similarly, excellent communication facilitates technical skills training in surgery. During "open" surgery, expert trainers employ a variety of methods for communication with trainees that include a combination of verbal instruction, physical pointing or actual demonstration(s). However, during robotic minimally invasive surgery (MIS), there may be circumstances in which the trainee or collaborating surgeon is using both instruments simultaneously within the operative field of view, constraining the trainer/master surgeon and rendering them reliant solely on verbal communication.

Within MIS and robotic surgery, techniques exist such as telestration that aid information transfer between surgeons and/or between trainer and trainee. Telestration allows information to be "drawn" onto a monitor at a remote site by the surgeon guiding the procedure. This information is then displayed on the operator's screen with the aim of guiding performance and may be undertaken either remotely or locally (Ferguson and Stack, 2010). Remote guidance or telementoring enables surgeons to be guided by a mentor at a location remote from the operation. This form of instruction has been applied to better enable regional experts to guide surgeons at local centers and to provide assistance and mentoring from surgical experts in other countries (Micali et al., 2000; Schlachta et al., 2010).

There has been interest in the role that gaze behavior may have in improving the flow of communication between collaborating subjects. For example, it has been demonstrated that shared gaze during visual collaboration enables a more efficient search strategy when compared to verbal collaboration alone (Brennan et al., 2008). Therefore, it is anticipated that observing a guiding surgeon's point of regard instead of, or in conjunction with, their verbal instruction(s) will significantly improve the performance of the operating surgeon by providing supplementary cues critical to task success. Based on this concept, a new system referred to as "collaborative gaze control" (CGC) was developed to enable an operating surgeon to be directed by visual guidance as opposed to or in conjunction with verbal instruction(s) from an expert (Kwok et al., 2012). With CGC enabled, the trainer's gaze behavior is extracted in real-time. Their point of regard is subsequently relayed to the trainee's screen, which may be in a remote location. Therefore, the trainee's operative manoeuvres can be directed more precisely, potentially obviating the dependence on verbal instruction(s). Importantly, in manipulating target salience, visual search is modulated leading to enhanced behavioral performance (Avraham et al., 2008).

More recently, there is evidence that workload can be inferred from saccadic eye movements (Tokuda et al., 2011), pupillary responses (Zheng et al., 2015) and blink frequency (Zheng et al., 2012). Challenging, effortful visual search results in greater visual cortical (V1) excitation (Kojima and Suzuki, 2010). Evaluating the impact that technological manipulation of visual search has on an operator's cortical function helps to determine whether performance enhancement is offset by greater attentional demands at brain level. This is encompassed by "neuroergonomics", which concerns the investigation of brain and behavior at work (Parasuraman, 2003), a paradigm that has been applied to surgery in order to investigate how recruited brain regions may be modulated by novel performance-enhancing tools (James et al., 2010b, 2013).

In order to examine this effect, functional Near Infrared Spectroscopy (fNIRS), a non-invasive neuroimaging modality, is utilized to measure task-evoked fluctuations in oxygenated and deoxygenated hemoglobin (HbO<sub>2</sub> and HHb respectively) within cortical tissues, which reflect the magnitude of cortical activation (Jöbsis, 1977). This is based upon the principle that neuronal activity and the associated increased metabolic demand within the brain lead to local hemodynamic changes, termed "neurovascular coupling" (Roy and Sherrington, 1890). Unlike functional magnetic resonance imaging (fMRI), fNIRS is relatively resistant to motion artifact and can be used in conjunction with ferromagnetic instruments, and it has been successfully applied to monitor the cortical responses in surgeons (Leff et al., 2008a,b,c; Ohuchida et al., 2009; James et al., 2011, 2013). Broadly, these studies highlight the importance of the prefrontal cortex (PFC) in supporting "cognitive phases" of skill learning (Leff et al., 2008a), evolution in PFC excitation with technical skills training (Leff et al., 2008c), and relative PFC redundancy amongst expert surgeons (Ohuchida et al., 2009). More recently, investigators have demonstrated the impact of the type of learning (e.g., implicit vs. explicit) and the influence of technology to stabilize performance and enhance neuronal efficiency amongst surgeons (Zhu et al., 2011; James et al., 2013).

Functional brain connectivity captured in coherence or cross-correlation between different brain regions can be used to investigate efficiency in brain networks (Zhu et al., 2011; James et al., 2013). Graph Theory, a popular method for interrogating brain networks, can model the organization, development and function of complex networks (Sporns et al., 2004; Bullmore and Sporns, 2009; Sporns, 2011) and has been successfully applied to networks derived from fNIRS data (Niu et al., 2012; James et al., 2013). In this regard, studies investigating graph topology such as the number of connections, cost and efficiency have demonstrated associations between task performance and brain network efficiency or cost-efficiency (Bassett et al., 2009). Despite the above, there have been no studies investigating the influence of varying trainer/mentor guidance on brain function or network architectures amongst trainees.
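To make the graph measures named above concrete, the following minimal sketch computes network cost (connection density) and global efficiency for a binary, undirected graph. These are the common textbook definitions; the exact definitions, thresholds and weighting used by the authors are not given in this section, so the binary graph, the function names and the four-node example are assumptions for illustration only.

```python
from collections import deque


def network_cost(adjacency):
    """Fraction of possible undirected edges that are present (connection density)."""
    n = len(adjacency)
    edges = sum(adjacency[i][j] for i in range(n) for j in range(i + 1, n))
    return edges / (n * (n - 1) / 2)


def shortest_path_lengths(adjacency, source):
    """Breadth-first search hop counts from source; unreachable nodes are omitted."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour, connected in enumerate(adjacency[node]):
            if connected and neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def global_efficiency(adjacency):
    """Mean inverse shortest path length over all ordered node pairs.

    Unreachable pairs contribute zero, as in the usual definition.
    """
    n = len(adjacency)
    total = 0.0
    for i in range(n):
        dist = shortest_path_lengths(adjacency, i)
        total += sum(1.0 / d for node, d in dist.items() if node != i)
    return total / (n * (n - 1))


# Toy 4-channel network forming a chain 0-1-2-3 (hypothetical thresholded graph).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
cost = network_cost(chain)
efficiency = global_efficiency(chain)
```

A fully connected graph has cost 1 and efficiency 1, so comparing the two values across conditions indicates how much integration a network achieves for a given number of connections.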

The aim of this paper is to investigate the influence of a gaze channel on changes in visual search strategies, technical performance, and brain behavior in a group of task-naïve subjects instructed to perform a simulated biopsy using robotic MIS. It is anticipated that, compared to verbal guidance, technical procedural skills may be superior during gaze assistance owing to the improved perceptual flow of information to the trainee. The primary hypothesis is that increased target saliency will lead to a "bottom-up" search strategy, reflected in a more focused pattern of V1 activation and a reduction in the need for recruitment of extra-striatal visual association areas. Conversely, verbal communication (gold standard) is anticipated to lead to a more effortful "top-down" visual search strategy, necessitating recruitment of additional cortical regions outside V1, manifest as greater excitation in centers of visual attention. The secondary hypothesis is that collaborative gaze may facilitate the flow of information transfer in the visual-parietal network, manifest as reduced network costs, improved efficiency and reduced network burden.

## Materials and Methods

#### Subjects

The study was carried out in accordance with the recommendations of the Local Regional Research Ethics Committee (LREC 05/Q0403/142). All subjects gave written informed consent in accordance with the Declaration of Helsinki. Following ethical approval, a randomized controlled trial was conducted in which 20 subjects (1 female) were recruited from Imperial College London (mean age, years ± SD = 28.9 ± 1.5). Left-handed subjects and those with a history of neuropsychiatric illness or previous exposure to the task were excluded (Orihuela-Espina et al., 2010). Subjects were included on the basis that they were task naïve. The task was performed under both guidance conditions (order randomized) such that subjects served as their own controls and bias associated with learning or ordering effects was minimized.

#### Task Paradigm

The robotic surgical task entailed the subject ("trainee") and an expert ("trainer") collaborating in taking virtual biopsies from a simulated gastric mucosa in a shared surgical environment as depicted in **Figure 1**. Haptic manipulators (Phantom Omni, SensAble Technologies, USA) were used to control robotic graspers in the virtual scene. The task necessitated that the trainee take a virtual biopsy and pass the specimen to the guiding trainer. Both the trainee's and the trainer's graspers were visible within the same field of view with the former located inferiorly and the latter superiorly as depicted in **Figure 1** (panels i–iv). Within the operative field, seven nodules were visible to the trainee. The choice of nodule for biopsy was randomly determined and this selection was available only to the trainer. Therefore, the appropriate biopsy site had to be conveyed to the trainee either visually or verbally by the trainer. Once the biopsy was taken by the trainee, the specimen was passed towards the trainer's graspers and when successfully transferred to the trainer, it disappeared from the field of view. This process was repeated as many times as possible during the allotted task periods.

Prior to commencing the study, all subjects received a standardized period of task familiarization. All subjects performed the simulated biopsy task under verbal (control) and visual instruction (CGC; Kwok et al., 2012). The order was randomized (random number generator) in order to control for learning effects. In the control task, the location of the biopsy site was described by the trainer using verbal instructions. With CGC enabled, a portable eyetracker (x50 eyetracker, Tobii Technologies, Sweden) situated beneath the trainer's monitor detected the trainer's fixation point and conveyed this to the trainee's screen as a cross. Therefore, with CGC enabled, the trainer's target selection was conveyed to the trainee visually. For each condition (verbal and CGC) a block design experiment was employed comprising a baseline rest period (30 s) followed by five task blocks, each of which comprised alternating episodes of simulated nodule biopsy (30 s) and inter-trial rest periods (30 s). During rest periods, subjects were asked to remain still with their eyes open, looking at a black screen on the task monitor. Within functional neuroimaging experiments, block design paradigms have the advantage of allowing the hemodynamic response to return to baseline between task episodes, therefore providing reliable indices of task-evoked cortical activity. Furthermore, the block design allows task data to be averaged, increasing the signal-to-noise ratio.
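The timing of one condition can be laid out explicitly. The following is a minimal sketch, assuming that every 30 s task episode is followed by a 30 s rest (including the last one); the function name `block_schedule` and the epoch labels are illustrative and do not come from the study software.

```python
def block_schedule(baseline_s=30, n_blocks=5, task_s=30, rest_s=30):
    """Return (label, onset_s, offset_s) epochs for one guidance condition."""
    epochs = [("baseline_rest", 0, baseline_s)]
    t = baseline_s
    for block in range(1, n_blocks + 1):
        # Task episode (simulated nodule biopsy)
        epochs.append((f"task_{block}", t, t + task_s))
        t += task_s
        # Inter-trial rest (assumed to follow every task episode)
        epochs.append((f"rest_{block}", t, t + rest_s))
        t += rest_s
    return epochs


schedule = block_schedule()
for label, onset, offset in schedule:
    print(f"{label:>13}: {onset:3d}-{offset:3d} s")
```

Under these assumptions a single condition lasts 330 s, of which 150 s is task time.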

#### Cortical Activity

Brain activation was assessed using a commercially available 24-channel optical topography (OT) system (ETG-4000, Hitachi Medical Corp., Japan). Sixteen optodes (8 emitters and 8 detectors) were positioned in a 4 × 4 array over the O-P cortices as displayed in **Figure 1**. A "channel" represents a banana-shaped volume of cortex where changes in absorption of near infrared light from the optode emitters are interpreted as changes in HbO<sub>2</sub> and HHb. The array was centered on "Oz" of the International 10–20 system (Jurcak et al., 2007) with the intention of capturing activation within the visual cortex. Cortical data were subjected to both manual and automated data integrity checks (Orihuela-Espina et al., 2010) to identify and eliminate data contaminated with noise, optode movement and saturation-related artifacts (i.e., apparent non-recordings and "mirroring"). Since both ambient light and near infrared light from eye-tracking systems have the potential to influence OT data (Orihuela-Espina et al., 2010), laboratory lights were dimmed and the probes were shielded using a combination of external fixation tapes and a shower cap.
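The relationship between 16 optodes and 24 channels follows from the grid geometry. The sketch below assumes a checkerboard layout in which emitters and detectors alternate and each channel is one pair of horizontally or vertically adjacent optodes; this layout is an assumption for illustration, not a description of the probe taken from the text.

```python
def count_channels(rows=4, cols=4):
    """Count emitters, detectors and adjacent-pair channels on an optode grid."""
    emitters = detectors = 0
    channels = []
    for r in range(rows):
        for c in range(cols):
            # Checkerboard assumption: even parity = emitter, odd = detector
            if (r + c) % 2 == 0:
                emitters += 1
            else:
                detectors += 1
            # Each neighbouring pair has opposite parity, so it is always
            # one emitter plus one detector, i.e. one measurement channel.
            if c + 1 < cols:
                channels.append(((r, c), (r, c + 1)))
            if r + 1 < rows:
                channels.append(((r, c), (r + 1, c)))
    return emitters, detectors, channels


emitters, detectors, channels = count_channels()
print(emitters, detectors, len(channels))
```

With a 4 × 4 grid there are 12 horizontal and 12 vertical neighbour pairs, which matches the 24 channels reported.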

#### Technical Performance

The number of nodules that the trainee was able to successfully biopsy and transfer to the trainer's graspers across the task period and the trainee's instrument pathlength (metres) were recorded and used as objective metrics of technical performance. This was preferred to restricting the overall number of moves towards calculating time/nodule biopsied, and helped to ensure that subjects were focusing on the task quality and not the procedural time, or perceiving the number of movements.

#### Gaze Behavior

Subject and trainer gaze behavior was recorded throughout the study with portable eyetracking technology (x50 eyetracker, Tobii Technologies, Sweden) situated beneath the task monitor (as displayed in **Figure 1**). The gaze behavior of the trainer was interrogated to derive their fixation point in order to display this as a cross on the trainee's monitor, thereby facilitating gaze-guidance in CGC (study condition). The trainer's fixation point was not visible to the trainee during episodes of verbal guidance (control condition). The trainee's fixation points were recorded to determine the time taken, termed "gaze latency" (GL, seconds), to fixate on the same area of the surgical scene as the expert.
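One way such a latency could be computed is sketched below. The text does not specify how "the same area" was operationalized, so the radius criterion (`radius_px`), the sample format and the function name `gaze_latency` are all hypothetical choices made for illustration.

```python
import math


def gaze_latency(trainer_onset_s, trainer_xy, trainee_samples, radius_px=50.0):
    """Time from the trainer's fixation onset until the trainee's gaze first
    falls within radius_px of the trainer's fixation point.

    trainee_samples: iterable of (time_s, x_px, y_px), ordered by time.
    Returns None if the trainee never reaches the target area.
    """
    tx, ty = trainer_xy
    for t, x, y in trainee_samples:
        if t < trainer_onset_s:
            continue  # ignore gaze samples recorded before the trainer fixated
        if math.hypot(x - tx, y - ty) <= radius_px:
            return t - trainer_onset_s
    return None


# Illustrative data only: the trainer fixates (500, 400) at t = 0.25 s
samples = [(0.0, 100, 100), (0.5, 300, 250), (1.0, 480, 390), (1.5, 505, 402)]
latency = gaze_latency(0.25, (500, 400), samples)
print(latency)
```

A fixation-detection step (for example, dispersion or velocity thresholding) would normally precede this; it is omitted here to keep the sketch short.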

#### Heart Rate Monitoring

A portable band electrocardiogram (Bioharness v2.3.0.5; Zephyr Technology Limited, USA) was used to acquire continuous heart rate data, from which heart rate variability (HRV) was derived and used to infer subject stress (Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology, 1996).

# Data Analysis

#### Cortical Hemodynamics

Cortical hemodynamic data and network graph econometrics were observed to be non-Gaussian and were therefore analyzed using non-parametric tests of significance. Channel-wise cortical activation was defined as a task-evoked statistically significant increase in HbO<sub>2</sub> coupled to a significant decrease in HHb from baseline rest (Wilcoxon signed-rank test, p < 0.05). For each channel of data and hemoglobin species a variable ∆Hb was computed (∆Hb = Hb<sub>task</sub> − Hb<sub>rest</sub>). To investigate the influence of the mode of guidance (CGC vs. control) and stress on cortical hemodynamics (i.e., ∆HbO<sub>2</sub> and ∆HHb), random effects models (REM) were generated (Intercooled Stata, v10.0 for Windows, Stata Corporation, USA).
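The activation criterion can be illustrated with a small, self-contained sketch. It implements an exact one-sided Wilcoxon signed-rank test in pure Python and assumes no zero differences and no tied absolute differences; the one-sided form, the function names and the example values are assumptions for illustration, not the authors' analysis code.

```python
def signed_rank_p_greater(deltas):
    """Exact one-sided p-value, P(W+ >= observed), for H1: deltas tend to be > 0.
    Assumes no zero values and no ties among the absolute values."""
    n = len(deltas)
    order = sorted(range(n), key=lambda i: abs(deltas[i]))
    # W+ is the sum of the ranks (1..n) belonging to positive differences
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if deltas[i] > 0)
    max_sum = n * (n + 1) // 2
    # counts[s] = number of subsets of ranks {1..n} whose sum is s
    counts = [1] + [0] * max_sum
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    return sum(counts[w_plus:]) / 2 ** n


def channel_is_active(delta_hbo2, delta_hhb, alpha=0.05):
    """Active = significant HbO2 increase AND significant HHb decrease."""
    p_rise = signed_rank_p_greater(delta_hbo2)
    p_fall = signed_rank_p_greater([-d for d in delta_hhb])  # decrease test
    return p_rise < alpha and p_fall < alpha


# Illustrative task-minus-rest values for one channel (not study data)
d_hbo2 = [0.8, 1.1, 0.5, 1.4, 0.9, 0.3]
d_hhb = [-0.2, -0.4, -0.1, -0.5, -0.3, -0.6]
print(channel_is_active(d_hbo2, d_hhb))
```

In practice a statistics package with tie handling would be preferable; the exact enumeration here is only practical for small samples.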

Cortical hemodynamic data were subsequently used to construct a task-evoked network of the 24 channels using graph theory (Bullmore and Sporns, 2009). A 24 × 24 two-dimensional cross-correlation matrix was constructed by cross-correlating data between all channels, as previously described (James et al., 2013). This matrix represents the strength of functional associations within the network of 24 channels. Comparisons between graphs of different functional networks are potentially sensitive to the method used for thresholding, for which an optimal solution does not yet exist (van Wijk et al., 2010). Therefore, to evaluate the active network, the matrix was pruned to eliminate "inactive" graph nodes. This approach renders a network for each subject during each task condition.
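The construction of the pruned matrix can be sketched as follows. The example uses zero-lag Pearson correlation as a stand-in for the cross-correlation described above, which is an assumption; the function names, data layout and toy signals are likewise illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Zero-lag Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def active_network(signals, active):
    """Correlation matrix restricted to channels flagged as active.

    signals: dict mapping channel id -> time series
    active:  set of channel ids that met the activation criterion
    """
    nodes = sorted(ch for ch in signals if ch in active)
    matrix = {
        a: {b: 1.0 if a == b else pearson(signals[a], signals[b]) for b in nodes}
        for a in nodes
    }
    return nodes, matrix


# Toy example with four channels, of which channel 4 is inactive
signals = {
    1: [0, 1, 2, 3, 4],
    2: [0, 2, 4, 6, 8],
    3: [4, 3, 2, 1, 0],
    4: [1, 0, 1, 0, 1],
}
nodes, matrix = active_network(signals, active={1, 2, 3})
print(nodes, matrix[1][2], matrix[1][3])
```

Pruning before any thresholding means that inactive channels contribute neither nodes nor edges to the subject-level graph.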

Econometric data from these networks were then calculated to derive: (a) the number of network connections; (b) the maximum global efficiency (Achard and Bullmore, 2007); (c) the normalized cost (Achard and Bullmore, 2007); and (d) the task-induced "network burden" (James et al., 2010a). Network economy, which equates to "cost-efficiency", is defined as efficiency minus cost (Achard and Bullmore, 2007). The network burden is defined here as the negative of economy (burden = −economy). If a network is economical, the cost-efficiency is high and accordingly the network burden is low. Network measures were also compared between the study and control conditions using REM analysis to determine whether the mode of guidance (CGC vs. control) significantly influenced network econometrics. The threshold for statistical significance was set at p < 0.05.
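These definitions can be made concrete on a small binary graph. The sketch below computes global efficiency from shortest path lengths, cost as the fraction of possible edges present, and then economy and burden as defined above. It evaluates a single already-thresholded graph, which is a simplification: it does not reproduce the maximization over thresholds implied by "maximum global efficiency", and all names are illustrative.

```python
from collections import deque


def global_efficiency(adj):
    """Mean inverse shortest path length over all ordered node pairs."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search gives unweighted path lengths
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Unreachable nodes are absent from dist and contribute zero
        total += sum(1.0 / d for node, d in dist.items() if node != src)
    return total / (n * (n - 1))


def network_metrics(adj):
    """adj: dict node -> set of neighbours (undirected, no self-loops)."""
    n = len(adj)
    if n < 2:
        raise ValueError("need at least two nodes")
    edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    cost = edges / (n * (n - 1) / 2)
    efficiency = global_efficiency(adj)
    economy = efficiency - cost
    return {"connections": edges, "cost": cost, "efficiency": efficiency,
            "economy": economy, "burden": -economy}


# Toy chain of four nodes: A - B - C - D
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
metrics = network_metrics(adj)
print(metrics)
```

For the four-node chain, three of six possible edges are present (cost 0.5) and the efficiency is 13/18, so the economy is positive and the burden negative.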

#### Performance and Gaze Behavior

The number of nodules biopsied by each subject during the allotted task time and the instrument pathlength (metres) were determined. GL (seconds) was derived from the eye-tracking data stream. Behavioral performance and GL data were observed to be Gaussian and were therefore analyzed using paired t-tests. These data were subsequently incorporated into the REM analysis in order to assess whether the guidance mode (control vs. CGC) was a predictor of performance accuracy and efficiency in visual search.

#### Heart Rate Analysis

HRV, calculated as the standard deviation of the R to R interval (SDRR), was derived from the HR data stream (Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology, 1996). The SDRR decreases under stress and was incorporated into the REM analysis to exclude any potential confounding effect that differences in HRV or changes in mean HR may exert on changes in cortical hemodynamics. Furthermore, HRV was utilized to determine which mode of guidance (verbal vs. CGC) trainees found the most stressful by undertaking a univariate random effects analysis (significance threshold p < 0.05).
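As a brief illustration, SDRR can be computed from a series of R-peak times. The sketch assumes R-peak timestamps in seconds are already available and uses the sample standard deviation (n − 1 denominator); the function name and the example values are hypothetical.

```python
from statistics import stdev


def sdrr_ms(r_peak_times_s):
    """Standard deviation of successive R-R intervals, in milliseconds."""
    rr_ms = [(b - a) * 1000.0
             for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]
    return stdev(rr_ms)  # sample standard deviation (n - 1 denominator)


# Illustrative R-peak times giving R-R intervals of 800, 900, 800 and 1000 ms
r_peaks = [0.0, 0.8, 1.7, 2.5, 3.5]
sdrr = sdrr_ms(r_peaks)
print(round(sdrr, 1))
```

Artifact rejection and ectopic-beat correction, which normally precede HRV estimation, are not shown.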

### Results

#### Technical Performance

Biopsy number and instrument pathlength were analyzed to determine whether CGC improved trainees' technical performance. As illustrated in **Figure 2A**, gaze-guidance under the influence of CGC resulted in enhanced technical performance. **Table 1** highlights the differences in technical performance according to the mode of guidance. With gaze-assistance, trainees biopsied a significantly greater number of nodules [biopsy number (mean ± SD): control = 5.6 ± 1.8 vs. CGC = 6.6 ± 2.0, p < 0.05] using a significantly shorter instrument pathlength [metres (mean ± SD): control = 0.6 ± 0.1 vs. CGC = 0.3 ± 0.7, p < 0.001]. This implies that trainees were faster, more productive and used virtual instruments more economically when operating in CGC mode.

FIGURE 2 | (A) Technical performance as indexed by the number of biopsies retrieved (I) and instrument path length (II). Box plots indicate mean and error bars represent 95% confidence interval. (B) Gaze plots from a representative subject under control (I) and gaze guidance (II) demonstrate more focused fixations during gaze-assistance.

TABLE 1 | The influence of guidance mode on technical performance, visual search behavior, changes in cortical hemodynamics, network topological properties and systemic effects.

p < 0.05 = bold, p < 0.001 = bold italic.

#### Gaze Behavior

GL, which represents the temporal delay between trainer and trainee gaze fixation, was analyzed to determine whether gaze guidance streamlined trainee visual search. **Figure 2B** depicts the visual search pattern acquired from a representative trainee under both guidance conditions. Whilst operating under gaze guidance, trainee fixations appear to be more localized to the nodule to be biopsied. GL was significantly shorter in CGC mode [GL seconds (mean ± SD): control = 1.4 ± 0.3 vs. CGC = 0.8 ± 0.2, p < 0.001]. This suggests that gaze assistance manifests as more rapid fixation on the appropriate target nodule to be biopsied.

depicting spatially broader task-evoked oxygenated hemoglobin change during verbal guidance.

#### Cortical Activation

Cortical hemodynamic change was analyzed to compare trainee brain responses between verbal and gaze-assisted modes of operation, with the hypothesis that verbal guidance would induce higher amplitude and spatially broader O-P hemodynamic changes. Topograms of a representative subject depicting the average change in HbO<sub>2</sub> overlying the O-P cortices are displayed in **Figure 3**. **Table 1** depicts cortical hemodynamic change as ∆HbO<sub>2</sub> (mMol × cm) averaged across the O-P cortices for both verbal and gaze-guidance. Cortical hemodynamic change evoked by verbal guidance was more diffuse, as illustrated in **Figure 4** (CGC: 11/24 channels active vs. verbal: 19/24 channels active), more likely to involve bilateral parietal as well as bilateral visual cortices, and greater in magnitude than the response evoked by gaze guidance (∆HbO<sub>2</sub> mMol × cm [median (IQR)]: control = 2.5 (12.0) vs. CGC = 0.63 (11.2), p < 0.001; ∆HbT mMol × cm [median (IQR)]: control = 3.6 (13.0) vs. CGC = 1.1 (11.6), p < 0.001). Overall, these data support the primary hypothesis that training in CGC mode evokes an attenuated O-P brain response. The mode of guidance did not significantly influence the magnitude of ∆HHb [∆HHb mMol × cm [median (IQR)]: control = −1.4 (5.0) vs. CGC = −1.0 (4.5), p = 0.27]. Similarly, as highlighted in **Table 2**, REM analysis revealed that guidance mode was a predictor of ∆HbO<sub>2</sub> (p < 0.001) but not of ∆HHb (p = 0.19).

#### Cortical Networks

Graph theoretical econometric data were computed and compared between guidance modes with the hypothesis that the functional network in CGC mode would be associated with less cost and greater efficiency. **Figure 5** depicts the activated cortical network under control and CGC conditions for a representative subject. **Table 1** presents the results of the econometric analysis delineating the number of cortical connections, normalized cost, maximum global efficiency and network burden. Differences in these network topological properties between modes of guidance did not reach statistical significance. Additionally, even when subject-level clustering was considered (**Table 2**), guidance mode was not found to predict network properties (e.g., cost, efficiency, etc.). This suggests that CGC does not induce changes in the trainee's O-P functional network observable with the current OT technique.

#### Heart Rate Data

HR and SDRR were monitored to determine the influence of guidance mode on stress-related change in systemic responses (**Table 1**). Between-condition differences in HR and SDRR were not statistically significant [Median HR (IQR): control = 71.2 (10.0) vs. CGC = 73.4 (8.1), p = 0.70; Median SDRR (IQR): control = 57.7 (42.0) vs. CGC = 47.2 (36.9), p = 0.43]. Additionally, upon REM analysis, neither HR nor SDRR was observed to be a predictor of changes in cortical hemodynamics.

#### Harms

No harms occurred in the study.

# Discussion

In this study, performance on a simulated surgical task has been improved by modulating the manner in which collaborating surgeons interact with one another. Communicating through collaborative gaze-driven control leads to a greater number of successful biopsies and a reduction in instrument path length, the latter being a measure of dexterity previously shown to reflect skill level in laparoscopic and open surgery (Bann et al., 2003; Xeroulis et al., 2009). The foundation for this improvement appears to be a change in visual search strategy, manifest as a reduced GL, indicating that with gaze-assistance, trainee fixation points more rapidly reach those of the expert. This was accompanied by an attenuation of cortical excitation across primary visual centers in the brain, but without an appreciable difference in O-P network costs or burden.

The current paper offers a potential mechanistic explanation for improvements observed in novices' performance when training under the influence of expert visual cues (Wilson et al., 2011; Chetwood et al., 2012). Experienced operators are known to utilize more effective gaze-strategies than novices, characterized by fixating on relevant target locations and adopting optimal psychomotor control (Wilson et al., 2011). Unlike novices, who learn mapping rules by switching their point of regard between tool and target, experts utilize a target locking strategy and rarely need to check tool locations (Leong et al., 2008). As demonstrated by Wilson et al. (2011), novices trained to observe and then "mimic" the more focused gaze patterns of experts improve their laparoscopic performance and multi-tasking capabilities more than novices trained to observe expert performance without the benefit of expert gaze-cues. Similarly, Chetwood et al. (2012) observed improved completion times and reduced errors in novices guided by expert gaze vs. expert verbal instructions. However, unlike the current experiment, the aforementioned studies were not designed to explain the foundation for improved performance owing to gaze guidance, resulting instead in speculation regarding adaptation in visual cognitive function. Here, improved performance as a result of expert gaze guidance is understood as a reduction in visual activation and hence attentional demand on the visual cortex. This is in line with studies demonstrating learning-related plasticity in activation maps, implying attenuation of attentional resources associated with training and expertise (Dayan and Cohen, 2011). By manipulating the visual behavior of novices so that it aligns more closely with that of experts, it is conceivable that novices may bypass the early "cognitive" phases of visual-motor learning (Fitts and Posner, 1967). Nevertheless, GL alone cannot confirm that the gaze behavior of trainees operating under gaze guidance was characterized by less random saccadic activity and was indeed more "expert"; this would necessitate a more elaborate analysis of eye-tracking data, for example using the exploit/explore ratio (Dehais et al., 2015) or visual entropy (Di Nocera et al., 2007).

There is evidence from functional neuroimaging studies that streamlined visual search strategies lead to reduced activation in the visual cortex (Kojima and Suzuki, 2010). For example, Kojima and Suzuki (2010) observed greater hemodynamic responses in fNIRS channels centred on the visual cortex during more effortful search strategies. However, it must be acknowledged that the introduction of a target feature into the surgical scene might be anticipated to increase visual attention owing to changes in visual saliency. This is relevant since the eye-tracking derived fixation point of the expert was projected to the trainee as a visually salient target. Interestingly, shifts in visual attention secondary to manipulations in visual saliency as a result of gaze-guidance (i.e., the trainer's fixation point) did not manifest as greater activation in the visual cortex when compared to verbal instruction. Rather, the resultant visual search is potentially streamlined from a "top-down" to a "bottom-up" strategy (van der Stigchel et al., 2009; Theeuwes, 2010). Specifically, if a target markedly differs from its background, it is visually salient and is more likely to be detected by a "bottom-up" search strategy guided by the saliency of the scene, whereas if a target requires greater cognitive input to be identified, a "top-down" search ensues which is dependent on the PFC and parietal cortex (PC; van der Stigchel et al., 2009; Theeuwes, 2010). Bottom-up saliency is not coded in the primary visual cortex (Betz et al., 2013), and this mode results in search simplification leading to a reduction in activity in visual association areas (Kojima and Suzuki, 2010). Enhanced saliency through visual guidance may parallel visual processing of natural stimuli (Einhäuser and König, 2010), whereby responses in V1 cells are optimally sparse (Vinje and Gallant, 2000). In the current study, this effect has been observed as a reduction in O-P cortical hemodynamic changes with comparatively fewer channels reaching statistical threshold for activation.

Parietal cortical activity is also associated with oculomotor intention and attention and may be important in planning eye movements (Kanwisher and Wojciulik, 2000). Verbal guidance may result in demanding visual search since it necessitates that auditory information be explicitly processed and translated into visual-spatial co-ordinates to understand the desired target's location, and parietal lobe activation has been shown to be important in spatial integration (Molholm et al., 2006). Conversely, gaze-guidance protocols may share many similarities with implicit learning protocols (Wilson et al., 2011). Implicit learning, a form of unconscious, incidental and procedural knowledge demands fewer attentional resources than explicit learning, a form of conscious, intentional or declarative knowledge. Implicit motor learning has been shown to reduce non-essential co-activation or connectivity between verbal-analytic and motor planning regions during laparoscopic performance (Zhu et al., 2011).

Here, as well as investigating connectivity (i.e., correlations), network topology has been explored with graph theory, which provides a powerful method for quantitatively describing the topology of brain connectivity (He and Evans, 2010). Graph theory has been utilized to interrogate cortical networks in both pathological and non-pathological brains (Achard and Bullmore, 2007; Bassett et al., 2009), and allows network parameters such as cost and efficiency to be determined (Bullmore and Bassett, 2011). In the present study, graph theory was applied to experimental data in order to further appreciate the impact of a "gaze-channel" on functional brain networks. From the active network analysis (i.e., that which retains only activated nodes), it is evident that compared to verbal-guidance, gaze-assistance does not lead to significant differences in O-P network topologies; the secondary hypothesis is therefore not supported. Our conclusion is that collaborative gaze exerts a positive effect on technical skills, alleviates burden on the visual cortices, and yet critically does not significantly alter performance of the functional O-P network.

Intuitively, verbal instructions about target location are time consuming to deliver, more complex to interpret and harder to translate into the "visual" workspace, ultimately relying therefore on greater cognitive work, as evidenced by enhanced task performance when visual guidance is employed (Chetwood et al., 2012). We suspect that gaze assistance makes the flow of information between the trainer and trainee more seamless by increasing the perceptual fidelity of the instruction given. Extrapolating this effect to the in vivo setting, a reduction in the attentional demands necessary to execute a procedure may manifest as a liberation of resources to devote to other safety-critical aspects of clinical care (e.g., reacting to unexpected events, multitask decision making, planning operative steps, etc.). Future studies may capitalize on a framework that enables combined analysis of brain responses, visual behavior and HRV to improve the detection of changes in workload, as has been demonstrated in pilots (Durantin et al., 2014). Furthermore, although not specifically investigated within the confines of this study, it is feasible that in using visual guidance the need to verbalize the intended target is bypassed and as such the trainer can focus on supplementary aspects of the procedure. For example, if the site of suture placement is already determined and displayed visually, a trainer can then focus verbal instruction on the technical aspects of suturing manoeuvres required to achieve accurate tissue apposition.

# Conclusion

To summarize, this study demonstrates that capitalizing on visual behavior enhances communication between collaborating surgeons and improves operator performance. This may be achieved through a bottom-up allocation of resources within the visual cortex of the surgeon being instructed. It is plausible that trainees instructed in this fashion will be better able to devote neural resources to other safety-critical aspects of the procedure. In investigating these hypotheses, fNIRS technology is well placed to make an impact, as it overcomes the limitations of traditional scanning environments (Cutini et al., 2012). However, future validation of graph theory measures for fNIRS connectivity analysis will necessitate comparison against models of anticipated responses and structural connectivity, as have been observed using other neuroimaging technologies such as fMRI (van den Heuvel et al., 2009; Zhang et al., 2010). Critically, demonstration of correspondence between predicted and observed patterns of functional connectivity would support the feasibility and validity of fNIRS-derived connectivity measures.

# Author Contributions

Study design and protocols were conceived by DRCJ, DRL, FO-E, LWS, K-WK, GM, G-ZY, and AWD. Data collection was performed by DRCJ, DRL, FO-E, LWS and K-WK. Data analysis was performed by DRCJ, FO-E, DRL, K-WK, LWS, GM and TA. The manuscript was written by DRCJ, DRL and FO-E and final critical editing was performed by DRL, LWS, GM, TA, G-ZY and AWD.

# Funding

This work was funded in part by research grants from the Academy of Medical Sciences (Lecturer Starter Grant) and Cancer Research UK (Academic Lecturership).

#### References


learning with fNIRS. Med. Image Comput. Comput. Assist. Interv. 13, 319–326. doi: 10.1007/978-3-642-15711-0\_40


of measurement, physiological interpretation and clinical use. Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology. Eur. Heart J. 17, 354–381. doi: 10.1161/01.cir.93.5.1043


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Leff, James, Orihuela-Espina, Kwok, Sun, Mylonas, Athanasiou, Darzi and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Role of Cognitive and Perceptual Loads in Inattentional Deafness

Mickaël Causse<sup>1,2</sup>\*, Jean-Paul Imbert<sup>3</sup>, Louise Giraudet<sup>1</sup>, Christophe Jouffrais<sup>4</sup> and Sébastien Tremblay<sup>2</sup>

<sup>1</sup> Département Conception et Conduite des Véhicules Aéronautiques et Spatiaux, Institut Supérieur de l'Aéronautique et de l'Espace (ISAE), Toulouse, France, <sup>2</sup> School of Psychology, Co-Dot Laboratory, Université Laval, Québec, QC, Canada, <sup>3</sup> Laboratoire d'Informatique Interactive (LII), École Nationale de l'Aviation Civile (ENAC), Toulouse, France, <sup>4</sup> Centre National de la Recherche Scientifique (CNRS) and Université de Toulouse, IRIT, Toulouse, France

The current study examines the role of cognitive and perceptual loads in inattentional deafness (the failure to perceive an auditory stimulus) and the possibility of predicting this phenomenon with ocular measurements. Twenty participants performed Air Traffic Control (ATC) scenarios in the Laby ATC-like microworld, guiding one (low cognitive load) or two (high cognitive load) aircraft while responding to visual notifications related to 7 (low perceptual load) or 21 (high perceptual load) peripheral aircraft. At the same time, participants were played standard tones which they had to ignore (probability = 0.80), or deviant tones (probability = 0.20) which they had to report. Behavioral results showed that 28.76% of alarms were not reported in the low cognitive load condition and up to 46.21% in the high cognitive load condition. In contrast, perceptual load had no impact on the inattentional deafness rate. Finally, the mean pupil diameter of the fixations that preceded the target tones was significantly lower in the trials in which the participants did not report the tones, likely showing a momentary lapse of sustained attention, which in turn was associated with the occurrence of inattentional deafness.

#### Edited by:

Klaus Gramann, Berlin Institute of Technology, Germany

#### Reviewed by:

Peter König, University of Osnabrück, Germany Massimo Grassi, University of Padova, Italy

#### \*Correspondence:

Mickaël Causse mickael.causse@isae.fr

Received: 15 December 2015 Accepted: 21 June 2016 Published: 06 July 2016

#### Citation:

Causse M, Imbert J-P, Giraudet L, Jouffrais C and Tremblay S (2016) The Role of Cognitive and Perceptual Loads in Inattentional Deafness. Front. Hum. Neurosci. 10:344. doi: 10.3389/fnhum.2016.00344

Keywords: inattentional deafness, cognitive load, perceptual load, pupil diameter, neuroergonomics

# INTRODUCTION

The Air Traffic Control (ATC) environment involves supervisory control of emergency response, and security surveillance. Air traffic controllers must deal with dynamic and cognitively demanding tasks: guiding aircraft through a controlled airspace and optimizing trajectories whilst adhering to minimum distance and altitude separation requirements. This task must be completed in the face of temporal pressure, stress, and high-risk decision-making situations. Several studies have tried to identify the characteristics of the ATC environment that create cognitive demand (e.g., Manning et al., 2002; Loft et al., 2007). Manning et al. (2001) showed that these characteristics include, among others, the total number of aircraft controlled, the number of aircraft changing altitude, and the total number of conflict alerts displayed. Other studies revealed that the dynamic density of the airspace at a given moment accounts for approximately half the variance in workload (Laudeman et al., 1998; Kopardekar and Magyarits, 2003). Although task demand has a strong relationship with workload, this relationship depends on the ATC operator's capacity to select priorities and manage their cognitive resources (Loft et al., 2007).

The auditory channel is an essential means for air traffic controllers to exchange information with pilots and other controllers through radio and phone communications. They must also be vigilant and responsive to the occurrence of auditory alarms such as ground collision avoidance alerts or area infringement warnings that have been increasingly integrated into ATC workstations (Cabrera et al., 2005). Given that the auditory modality provides information without requiring head/gaze movements (Edworthy et al., 1991), it is particularly suitable for the transmission of alerts and warnings in emergency situations because perception is not dependent on the direction of gaze at a particular moment (Harris, 2011). However, research in the field of aviation has provided ample evidence that individuals can still remain unaware of unexpected task-relevant and often safety-critical auditory stimuli if deeply involved in demanding tasks (Dehais et al., 2012, 2014; Giraudet et al., 2015b).

Several studies support the notion of a central bottleneck of attention processing (Jolicoeur, 1999; Arnell and Larson, 2002; Lavie, 2005; Dux et al., 2006; Raveh and Lavie, 2015; Wahn and König, 2015) but other works propose modality-specific restrictions of attention (Duncan et al., 1997; Talsma et al., 2006; Martens et al., 2010; Keitel et al., 2013). In accordance with the first view, Tombu et al. (2011) proposed a central attentional bottleneck that includes the inferior frontal junction, superior medial frontal cortex, and bilateral insula that temporally limits cognitive processes such as perceptual encoding or decision-making. In contrast, other studies show support for modality-specific limitations by demonstrating that attentional capacity between modalities is greater than attentional capacity within the same modality (Talsma et al., 2006). Furthermore, Martens et al. (2010) showed that an attentional blink is produced only when targets are both presented within the same modality (auditory or visual) but not cross-modally, thus favoring the idea of a modality-specific sensory system rather than a central amodal system. From a theoretical viewpoint, multiple resource theory (Wickens, 1980) posits that there are multiple, independent pools of resources and that tasks that share the same limited resource would interfere with each other but would not affect other tasks that require a different type of resource. For example, Kim et al. (2005) showed that Stroop interference increased when the type of working memory (WM) load overlapped with the type of information required for the target task. At the same time, Stroop interference decreased when the type of WM load overlapped with distractor processing. Beyond this debate between central vs. modality-specific attentional limitations, many studies show that WM load also affects the ability to process visual or auditory environmental stimuli. For example, Sörqvist et al. (2012) demonstrated that brain response to an irrelevant sound decreased as a function of central WM load, induced by a visual-verbal version of the n-back task. In the same way, it has been shown that manipulating the task load of the primary task markedly reduced the sensitivity to auditory distractors during a duration-discrimination task (Berti and Schröger, 2003).

Given the evidence for both sides of the amodal vs. modality-specific debate on attentional capacity, we might postulate the existence of both central limitations in the control of attention and executive control (Rossi et al., 2009), with a key role of the prefrontal cortex (Asplund et al., 2010) and higher-order multisensory cortices (Calvert and Thesen, 2004), and additional capacity limits in modality-specific sensory brain areas (Talsma and Kok, 2001). Such a hypothesis is supported by Vachon and Tremblay (2008), who used an attentional blink paradigm. Their results tend to support the idea that attentional limitations are due to a mixture of both modality-specific and amodal resource constraints. Based on the results of Berti and Schröger (2003) and Sörqvist et al. (2012) showing the adverse effect of WM load, as well as similar fundamental works (Wood and Cowan, 1995; Spence and Read, 2003; Lavie et al., 2004; Hughes et al., 2013) and observations in flight and ATC simulators (Dehais et al., 2012, 2014; Giraudet et al., 2015a,b) indicating that a high cognitive load context can lead to the neglect of auditory alerts, we may reasonably postulate that the risk of missed alarms is substantial in complex activities such as ATC.

The high cognitive and perceptual loads typical of ATC operations may consume most attentional resources, thus reducing the remaining attentional capacity for processing unexpected stimuli such as auditory alarms. This failure to perceive auditory stimuli has been called inattentional deafness (Macdonald and Lavie, 2011; Koreimann et al., 2014). Given the potential impact of inattentional deafness in safety-critical occupations, it is important to understand the factors that promote this phenomenon and to be able to detect its occurrence. When no visual feedback from the operator is available, or when the alarm is triggered by a system, it is almost impossible to interpret human reactions. However, recent studies have found electroencephalographic indicators of the occurrence of inattentional deafness, namely a reduced amplitude of the P300 evoked potential (Giraudet et al., 2015a,b). These results are promising since they allow an offline analysis to test alarm designs and to evaluate the conditions favoring inattentional deafness. However, the online detection of inattentional deafness with ERPs is complex under ecological conditions given the low signal-to-noise ratio of the event-related EEG activity. A more robust way of detecting inattentional deafness online is still to be developed, but the ability to predict its occurrence using a physiological measure has excellent potential. With the visual modality monopolizing most attentional resources, we suggest that recording eye movements while operators are exposed to alarms can inform about their auditory capacity in real time, particularly if they are displaying inattentional deafness. Eye tracking has already proven very useful for interface design and for usability tests (Goldberg and Kotval, 1999). 
Several behavioral ocular metrics such as the number of fixations and their duration, the scanpath direction and length, or the switching rate between areas of interest can provide a non-invasive measure of cognitive activity (for a review, see Jacob and Karn, 2003). Evidence suggests that when the eye is free to move, fixation location is strongly correlated with where attention is focused (Findlay and Gilchrist, 2003). But while eye tracking is known to reflect visual cognition, it is uncertain whether ocular behavior could reflect further mental processes beyond basic visual encoding of task-relevant information. Also, the pupil diameter is a classic measure to index cognitive activity and is generally higher in the context of high mental workload (Kahneman and Beatty, 1966; Palinko et al., 2010; Peysakhovich et al., 2015) or when the level of vigilance is high (Beatty, 1982). For example, Beatty (1982) showed that vigilance decrement was associated with decreased amplitude of the phasic task-evoked pupillary response during an auditory vigilance task, while tonic or baseline pupillary diameter exhibited no such relationship.

Inattentional deafness is generally studied by varying perceptual load (Koreimann et al., 2009, 2014; Macdonald and Lavie, 2011; Molloy et al., 2015; Raveh and Lavie, 2015), while the effects of variations in mental workload (central demand) are less well investigated (Giraudet et al., 2015b). Importantly, no studies have examined and compared the impact of these two loads on the ability to perceive auditory stimuli. The present study had two main objectives: to further understand how cognitive and perceptual loads impact auditory detection sensitivity, and to assess whether eye movements and pupil diameter can predict the occurrence of inattentional deafness. Twenty participants performed a realistic ATC simulation task called Laby (Imbert et al., 2014a) while an auditory oddball task was presented. Participants had to react to deviant tones (simulating auditory alarms) by pressing a button, as an indicator of their detection of the sound. We separately examined the impact of cognitive and perceptual loads on auditory detection sensitivity with a 2 × 2 factorial design. The cognitive load varied with the number of central aircraft to control, and the perceptual load with the number of peripheral aircraft to monitor. In a previous study also using Laby, we demonstrated that the cerebral response (P300) to deviant auditory tones was diminished when the visual design of Laby was poor (Giraudet et al., 2015a). That study also showed that approximately 6% of the deviant tones were missed in the high cognitive load condition with the poor visual design. To further understand the factors that promote the occurrence of inattentional deafness, in the present study we intended to increase the inattentional deafness rate by using a more engaging and complex version of Laby. Inducing a high level of missed alarms would enable comparison between the ocular behavior associated with missed and reported alarms. 
We hypothesized that: (1) the high cognitive and perceptual load conditions should generate more missed alerts than the low cognitive and perceptual load conditions; (2) increased cognitive and perceptual load should impact ocular measurements; and (3) ocular measurements may predict the occurrence of inattentional deafness.

#### MATERIALS AND METHODS

#### Participants

Twenty participants, all students of Université Laval, were recruited for this study (mean age = 23.5 years, standard deviation (SD) = 4.2). None had a history of neurological disease, psychiatric disturbance, or substance abuse, or took psychoactive medications. They all received full information on the experimental protocol, signed an informed consent form, and received compensation for their participation in the study.

#### Experimental Design

We used a 2 × 2 factorial design crossing two independent variables, cognitive and perceptual loads. The cognitive load was manipulated by the number of central aircraft in the corridor. The low cognitive load condition was the first half of the scenarios, with one aircraft to guide. The high cognitive load condition was the second half of the scenarios, with two aircraft to guide. The perceptual load was manipulated by the number of peripheral aircraft around the corridor (5 or 21). Each scenario had a single perceptual load level, and the order in which low and high perceptual load scenarios were performed was counterbalanced across participants.
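The design described above can be enumerated in a few lines. The sketch below is only illustrative: the aircraft counts (1 or 2 central, 5 or 21 peripheral) come from the text, whereas the alternating counterbalancing rule in `perceptual_order` is a hypothetical choice, since the article does not specify how the order was assigned.

```python
from itertools import product

# Levels taken from the text: central aircraft to guide, peripheral aircraft to monitor.
COGNITIVE = {"low": 1, "high": 2}
PERCEPTUAL = {"low": 5, "high": 21}


def conditions():
    """Return the four cells of the 2 x 2 (cognitive x perceptual) design."""
    return [
        {
            "cognitive": c,
            "perceptual": p,
            "central_aircraft": COGNITIVE[c],
            "peripheral_aircraft": PERCEPTUAL[p],
        }
        for c, p in product(COGNITIVE, PERCEPTUAL)
    ]


def perceptual_order(participant_index):
    """Hypothetical counterbalancing rule (an assumption, not from the article):
    even-numbered participants start with low perceptual load, odd with high."""
    return ("low", "high") if participant_index % 2 == 0 else ("high", "low")


if __name__ == "__main__":
    for cell in conditions():
        print(cell)
```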

#### The Laby Microworld and the Auditory Oddball Task

#### The ATC Task

The Laby microworld is a functional simulation of ATC, developed to create and evaluate new designs for controllers' visualization. It is built on the main task of guiding aircraft around a route shown in the central part of the screen (light green path). For the first half of the Laby scenario, there was only one aircraft to guide. In order to increase the main task demand, at the beginning of the second half of the scenario, a second aircraft entered the corridor and participants had to guide both aircraft along the route (**Figure 1**).

In order to maintain the central aircraft within the corridor or to follow altitude instructions, participants had to regularly modify their heading and altitude, using drop-down menus (**Figures 2A,B**).

In addition to the central aircraft, participants had to monitor a set of static aircraft (5 in the low perceptual load condition, 21 in the high perceptual load condition) located around the main aircraft corridor (**Figure 1**). "Color-Blink" visual notifications were displayed in or around the radar label located in the vicinity of these peripheral aircraft (**Figure 3**). Color-Blink uses colored text with the word "ALRT", which switches from white to red (see **Figures 3A,B**). It is used in ATC operational radar visualization for high-priority short-term conflict alerts. The Laby interface design is similar to operational radar visualization and has been used in a previous study comparing the performance of several visual notifications in peripheral vision (Imbert et al., 2014b). Compared to other enhanced designs, the Color-Blink notification was found to be less salient and had a lower detection rate among controllers (see Imbert et al., 2014a). We thus selected the Color-Blink notification to increase the overall monitoring effort in the present study. In another study also using Laby (Giraudet et al., 2015a), we showed that a high cognitive load condition was associated with 6% of unreported tones. As we intended to increase the inattentional deafness rate with a more engaging and complex version of Laby in the present study, two modifications were made. First, there were two aircraft to guide in the high cognitive load condition (one in the previous study). Second, contrary to the previous studies, in which the heading indications to give to the aircraft were already computed by the system and just had to be selected by the participants in a drop-down menu, participants in the present study had to mentally calculate the various headings that the aircraft should follow to turn and stay in the corridor. 
An orange heading indicator was displayed in the top left corner to help participants transform directions into heading values in degrees.

FIGURE 1 | Screenshot of the Laby microworld simulation. An example with 21 static peripheral aircraft positioned around the corridor. The central aircraft navigates through the corridor.

aircraft. The menus appeared when clicking on the radar label.

Visual notifications were randomly displayed in the radar label located in the vicinity of these peripheral aircraft. Only one notification was issued at a time. The notification disappeared as soon as the participant clicked on the aircraft. If the participant did not react within a given time (5 s), the notification disappeared. Thirty-four visual notifications were displayed in each scenario.

A score was displayed at the top left of the screen. The score decreased for the following three reasons: first, when a participant led an aircraft outside the corridor; second, when he/she gave an incorrect altitude instruction; third, when he/she failed to click on a peripheral notification within the time limit. A deviation from the assigned route resulted in the aircraft crossing the border and initiating a visual alert in the center of the screen. An error in the altitude instructions resulted in the aircraft maintaining its trajectory, with no alert, and continued control. The score aimed to engage the participants in the ATC-like simulation so that they would not pay attention to the auditory alarm detection task only. The score was not considered in the analysis. The simulation ended as soon as the first aircraft reached the arrival area (colored red), at the end of the corridor.

#### Auditory Oddball to Simulate an Alarm Detection Task

In parallel to the ATC task, participants had to perform an auditory alarm detection task. Standard pure tones (1000 Hz, 52.5 dB SPL, 500 ms long, probability = 0.8) and deviant pure tones (2000 Hz, 52.5 dB SPL, 500 ms long, probability = 0.2) were randomly played. The tones were not representative of the auditory alerts recently integrated in ATC operations; their frequencies were chosen from previous works (Giraudet et al., 2014, 2015a,b). The mean time window between successive tones was 4.2 s. Participants were told to consider the deviant tones as auditory warnings and to report them as fast as possible by pressing a specific button. The auditory oddball detection task had no impact on the score. The number of auditory alarms (10) was the same in the four experimental conditions. There were ten tones in the first half (with one main aircraft) and ten tones in the second half (two aircraft). In order to increase the sound detection task difficulty, a 42 dB white noise was played continuously during each ATC scenario and the oddball control task. A control condition was also performed by the participants, in which they only had to react to the deviant auditory tones of the oddball while fixating a cross on the screen.
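A randomized oddball sequence with the stated probabilities can be sketched as follows. This is a minimal illustration, not the authors' stimulus code: the tone frequencies and the 0.2 deviant probability come from the text, while the sequence length of 50, the fixed-count shuffle, and the seed are assumptions made so that the example is reproducible.

```python
import random

STANDARD_HZ = 1000  # standard pure tone (probability 0.8 in the text)
DEVIANT_HZ = 2000   # deviant pure tone (probability 0.2 in the text)


def oddball_sequence(n_tones=50, p_deviant=0.2, seed=0):
    """Return a shuffled list of tone frequencies.

    Assumption: the deviant proportion is enforced exactly (fixed count, then
    shuffled) rather than drawn independently for each tone.
    """
    rng = random.Random(seed)
    n_deviant = round(n_tones * p_deviant)
    sequence = [DEVIANT_HZ] * n_deviant + [STANDARD_HZ] * (n_tones - n_deviant)
    rng.shuffle(sequence)
    return sequence


if __name__ == "__main__":
    seq = oddball_sequence()
    print(len(seq), "tones,", seq.count(DEVIANT_HZ), "deviants")
```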

#### Procedure

The whole procedure lasted about 1 h. First, participants were seated comfortably at 60 cm from the 30″ screen in a sound-attenuated room with their right hand on the computer mouse and their left hand on the auditory alarm button. Second, they completed a 5-min training phase to familiarize themselves with the Laby microworld software, i.e., to correctly enter path and altitude instructions using the drop-down menus, acknowledge visual notifications, and report deviant tones. After the training, the eye tracker was calibrated and participants completed the four ATC scenarios. Between scenarios, the eye tracker was recalibrated. Finally, participants performed the auditory oddball control task.

#### Eye Tracking Measurements and Data Processing

Continuous eye tracking was performed with a Tobii T1750 during the four ATC scenarios. The signal was recorded at a sampling rate of 300 Hz. For all eye movement analyses, the threshold to detect a fixation was set at 100 ms and the fixation field corresponded to a circle with a 30-pixel radius (equivalent to 1.15° of visual angle when seated at a distance of 50 cm). The position of both eyes on the screen was recorded. Data analysis was performed using MATLAB 7.1 (The Mathworks). Heatmap visualizations of the distribution of fixations were generated using the open source software Open Gaze and Mouse Analyzer (OGAMA; Vosskühler et al., 2008).
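To make the fixation criteria concrete, the following sketch groups consecutive gaze samples that stay within 30 pixels of their running centroid and keeps groups lasting at least 100 ms at 300 Hz. It is a simplified dispersion-style detector written for illustration; it is not the fixation filter of the Tobii software or of the authors' MATLAB pipeline, and the function name and data layout are assumptions.

```python
from math import hypot


def detect_fixations(samples, rate_hz=300.0, min_dur_ms=100.0, radius_px=30.0):
    """Detect fixations in a list of (x, y) gaze samples given in pixels.

    A candidate fixation grows while each new sample lies within radius_px of
    the centroid of the samples already in it; candidates shorter than
    min_dur_ms are discarded. Returns dicts with sample indices and duration.
    """
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        sum_x, sum_y = samples[i]
        j = i + 1
        while j < n:
            count = j - i
            cx, cy = sum_x / count, sum_y / count
            if hypot(samples[j][0] - cx, samples[j][1] - cy) > radius_px:
                break  # sample left the fixation field: close the candidate
            sum_x += samples[j][0]
            sum_y += samples[j][1]
            j += 1
        duration_ms = (j - i) * 1000.0 / rate_hz
        if duration_ms >= min_dur_ms:
            fixations.append({"start": i, "end": j, "duration_ms": duration_ms})
        i = j  # the next candidate starts at the first sample that broke out
    return fixations


if __name__ == "__main__":
    gaze = [(100.0, 100.0)] * 60 + [(300.0, 300.0), (500.0, 500.0)] + [(700.0, 200.0)] * 45
    for fixation in detect_fixations(gaze):
        print(fixation)
```

With the default parameters, the minimum fixation length corresponds to 30 consecutive samples.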

#### Statistical Analysis

The impact of cognitive and perceptual loads on the accuracy in the central aircraft guiding task, the peripheral notification detection rate, and the rate of missed auditory stimuli (rare tones) was analyzed. We also calculated the mean fixation duration over each of the four whole scenarios and the mean duration of the fixation that preceded the onset of a deviant tone (time-locked analysis), as well as the mean pupil diameter during this fixation (averaged over both eyes). Statistical analyses were performed using Statistica 10 (StatSoft©). Differences between the experimental conditions were investigated with within-subjects analysis of variance (ANOVA) followed by post hoc testing (Tukey's honestly significant difference, Tukey HSD). We finally computed a multivariate logistic regression in order to further determine the variables that predicted inattentional deafness.
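The time-locked analysis can be illustrated by selecting, for each deviant tone, the fixation that precedes its onset. In the sketch below, the tuple layout, the millimeter unit for pupil diameter, and the rule "last fixation that started at or before the tone onset" are all assumptions made for illustration; the article does not detail how the preceding fixation was operationalized.

```python
from bisect import bisect_right


def fixation_before(onset_ms, fixations):
    """Return duration and mean pupil diameter of the fixation preceding a tone.

    fixations: list of (start_ms, duration_ms, pupil_left, pupil_right),
    sorted by start_ms (hypothetical layout). Returns None when no fixation
    started at or before onset_ms.
    """
    starts = [f[0] for f in fixations]
    index = bisect_right(starts, onset_ms) - 1
    if index < 0:
        return None
    _, duration_ms, pupil_left, pupil_right = fixations[index]
    # Pupil diameter is averaged over both eyes, as stated in the text.
    return {"duration_ms": duration_ms, "pupil": (pupil_left + pupil_right) / 2.0}


if __name__ == "__main__":
    fixation_list = [(0, 400, 3.0, 3.5), (450, 380, 3.5, 3.5), (900, 420, 4.0, 4.5)]
    print(fixation_before(600, fixation_list))
```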

#### RESULTS

#### Effects of Cognitive and Perceptual Loads on Performance in the ATC Task

We examined whether performance in the ATC task depended on cognitive and perceptual loads (see **Figure 4**). The 2 × 2 (cognitive load × perceptual load) repeated measures ANOVA showed no significant effect of the cognitive and perceptual loads on the accuracy in the central aircraft guiding task (respectively, F(1,19) = 0.86, p > 0.05, $\eta_p^2$ = 0.04; F(1,19) = 0.64, p > 0.05, $\eta_p^2$ = 0.03). The interaction term was not significant (F(1,19) = 0.37, p > 0.05, $\eta_p^2$ = 0.01). Regarding the peripheral notification detection rate, the 2 × 2 (cognitive load × perceptual load) repeated measures ANOVA showed a significant effect of the cognitive load, with lower performance in the high cognitive load condition (F(1,19) = 37.45, p < 0.001, $\eta_p^2$ = 0.66). The perceptual load had a near-significant impact (F(1,19) = 3.74, p = 0.06, $\eta_p^2$ = 0.16). The interaction term was not significant (F(1,19) = 0.00, p > 0.05, $\eta_p^2$ = 0.00).
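For readers who wish to relate the reported F values to the effect sizes, partial eta squared can be recovered from an F statistic and its degrees of freedom through the standard identity $\eta_p^2 = \frac{F \cdot df_{\text{effect}}}{F \cdot df_{\text{effect}} + df_{\text{error}}}$. The short sketch below applies it to two of the values reported above; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # Cognitive load effect on the peripheral notification detection rate, F(1,19) = 37.45.
    print(round(partial_eta_squared(37.45, 1, 19), 2))
    # Perceptual load effect on the same measure, F(1,19) = 3.74.
    print(round(partial_eta_squared(3.74, 1, 19), 2))
```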

#### Effects of Cognitive and Perceptual Loads on Ocular Behavior

The analysis of the dispersion of fixations across the Laby interface is shown in **Figure 5**. In the high cognitive load condition, there is an increase in the overall time spent fixating the central part of the Laby interface where the central aircraft are moving. Also, the time spent fixating peripheral aircraft is increased in the high perceptual load condition.

We analyzed the extent to which the overall fixation time, averaged across each whole condition duration, was affected by cognitive and perceptual loads (**Figure 6**). The 2 × 2 (cognitive load × perceptual load) repeated measures ANOVA showed a significant effect of cognitive load on fixation duration (F(1,19) = 7.69, p < 0.05, $\eta_p^2$ = 0.29). Neither the effect of the perceptual load (F(1,19) = 0.00, p > 0.05, $\eta_p^2$ = 0.00) nor the interaction term (F(1,19) = 0.41, p > 0.05, $\eta_p^2$ = 0.02) was significant. Overall average fixation durations were approximately 420 ms (see **Figure 6**), which is consistent with a previous study using the same eye tracker during a simulated combat control system microworld. In this latter work, participants demonstrated average fixation durations above 300 ms in several experimental conditions (Hodgetts et al., 2015).

Fixation durations before a saccade have been shown to be modulated by the relative angle of the saccade (see Wilming et al., 2013). The alternation between the two central planes in the high load condition could lead to systematic differences in the angle between subsequent saccades in comparison to the low load condition with only one central plane. This difference in angle, by means of saccadic momentum, can in turn lead to differences in fixation duration. Consequently, we compared the average angle between two saccades in the low vs. high cognitive load condition in order to examine a possible effect of momentum on fixation times. This analysis revealed that the mean angle slightly increased with increased cognitive load (low cognitive load = 84.64° (SD = 0.81); high cognitive load = 85.95° (SD = 1.08)), but the difference did not reach the significance threshold (F(1,19) = 2.24, p > 0.05, $\eta_p^2$ = 0.10). Saccadic momentum alone therefore cannot explain the variations of fixation times across the two cognitive load conditions.
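The angle between two successive saccades can be computed from three consecutive fixation positions. The sketch below assumes a convention in which 0° means that the second saccade continues in the same direction as the first and 180° means a complete reversal; the article does not state which convention was used, so this is an illustrative choice.

```python
from math import atan2, degrees


def angle_between_saccades(p0, p1, p2):
    """Unsigned angle (degrees) between the saccade p0 -> p1 and the saccade p1 -> p2.

    Points are (x, y) fixation positions in pixels. Assumed convention:
    0 = same direction, 180 = return saccade.
    """
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    cross = ax * by - ay * bx
    dot = ax * bx + ay * by
    return abs(degrees(atan2(cross, dot)))


if __name__ == "__main__":
    print(angle_between_saccades((0, 0), (100, 0), (100, 100)))
```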

#### Effects of Cognitive and Perceptual Loads on the Inattentional Deafness Rate

FIGURE 4 | Correct responses to the central aircraft guiding sub-task according to the levels of cognitive and perceptual loads. Validation of the peripheral aircraft sub-task according to the levels of cognitive and perceptual load. The square in the center of the boxes represents the mean, the horizontal line in the center of the boxes represents the 50th percentile (median), the ends of the boxes represent the 25th and 75th percentiles, and the whiskers represent the 5th and 95th percentiles.

FIGURE 5 | Heatmap visualizations of the distribution of fixations on the Laby interface. (A) Low cognitive load/low perceptual load; (B) low cognitive load/high perceptual load; (C) high cognitive load/low perceptual load; (D) high cognitive load/high perceptual load.

The control condition revealed that the inattentional deafness rate (missed alerts = 1 − hit rate) was extremely low, with 2% (SD = 5.93) of missed alerts. Indeed, two participants omitted a few deviant tones whereas the 18 others reacted to 100% of the deviant tones. This result confirms that the tones were clearly perceptible despite the continuous white noise. We then examined whether the inattentional deafness rate depended on cognitive and perceptual loads. The 2 × 2 (cognitive load × perceptual load) repeated measures ANOVA showed a significant effect of the cognitive load on the percentage of missed auditory alarms, with an increased percentage of missed auditory stimuli in the high cognitive load condition (F(1,19) = 24.49, p < 0.001, $\eta_p^2$ = 0.56; see **Figure 7**). The perceptual load had no significant impact (F(1,19) = 0.10, p > 0.05, $\eta_p^2$ = 0.00). The interaction term was not significant (F(1,19) = 0.49, p > 0.05, $\eta_p^2$ = 0.02). We finally examined whether the specificity (true negative rate) and the discriminability index ($d' = Z(\text{hit rate}) - Z(\text{false alarm rate})$; Stanislaw and Todorov, 1999) depended on cognitive and perceptual loads. A first 2 × 2 (cognitive load × perceptual load) repeated measures ANOVA revealed no significant main effect of cognitive load (F(1,19) = 1.23, p > 0.05, $\eta_p^2$ = 0.06) or perceptual load (F(1,19) = 0.07, p > 0.05, $\eta_p^2$ = 0.00) and no interaction effect (F(1,19) = 0.22, p > 0.05, $\eta_p^2$ = 0.01) on the specificity. A second 2 × 2 (cognitive load × perceptual load) repeated measures ANOVA revealed a significant main effect of cognitive load (F(1,19) = 18.88, p < 0.001, $\eta_p^2$ = 0.50) but no effect of perceptual load (F(1,19) = 0.00, p > 0.05, $\eta_p^2$ = 0.00) and no interaction (F(1,19) = 3.30, p > 0.05, $\eta_p^2$ = 0.14) on the discriminability index. In summary, the cognitive load specifically affected the sensitivity (hit rate), which increased the number of missed alerts. As illustrated in **Figure 7**, the cognitive load had a specific impact on the true positive rate, not on the false positive rate. The $d'$ variations are due only to this effect of cognitive load on the true positive rate. 
We finally estimated whether the variations in loads impacted the reaction time to alerts. In all four experimental conditions, mean reaction times were markedly below the mean time available to respond between two tones (i.e., 4.2 s): M = 1.67 s in the low cognitive load condition, 1.53 s in the high cognitive load condition, 1.65 s in the low perceptual load condition, and 1.55 s in the high perceptual load condition. The 2 × 2 repeated measures ANOVA showed no significant effect of cognitive load (F(1,19) = 1.02, p > 0.05, $\eta_p^2$ = 0.05) or perceptual load (F(1,19) = 1.03, p > 0.05, $\eta_p^2$ = 0.05) and no significant interaction (F(1,19) = 0.06, p > 0.05, $\eta_p^2$ = 0.00) on reaction times.
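The discriminability index used in this section, defined as Z(hit rate) minus Z(false alarm rate), can be sketched with the Python standard library as shown below. The clamping of rates away from 0 and 1 (parameter `eps`) is a hypothetical safeguard added here because the inverse normal function is undefined at those values; the article does not say how extreme rates were handled.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate, eps=1e-3):
    """Discriminability index d' = Z(hit rate) - Z(false alarm rate).

    eps is an assumed clamp that keeps both rates strictly inside (0, 1).
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit = min(max(hit_rate, eps), 1.0 - eps)
    false_alarm = min(max(false_alarm_rate, eps), 1.0 - eps)
    return z(hit) - z(false_alarm)


if __name__ == "__main__":
    # Illustrative rates only: a lower hit rate at a constant false alarm rate lowers d'.
    print(d_prime(0.90, 0.05))
    print(d_prime(0.60, 0.05))
```

With a constant false alarm rate, any drop in the hit rate translates directly into a lower index, which is the pattern described for the high cognitive load condition.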

#### Multivariate Logistic Regression Analysis

In order to further investigate the factors that promote inattentional deafness, we performed a multivariate logistic regression analysis stratified by trial (trials with deviant tones only) with all participants grouped together (number of trials = 800). We used the occurrence of inattentional deafness (yes/no) as the binary dependent variable, and cognitive and perceptual load were introduced as categorical independent variables. Furthermore, the mean duration of the fixation that occurred just before the rare tone and the mean pupil diameter during this fixation were used as continuous variables. The Wald chi-square p-values confirmed that the cognitive load was a significant predictor of the occurrence of inattentional deafness (Wald statistic = 14.38, p < 0.001) while perceptual load was not (Wald statistic = 0.77, p > 0.05). In addition, the regression showed that the pupil diameter during the fixation that preceded the rare sound was significantly lower in the trials in which the participants did not react to the target tones (Wald statistic = 18.66, p < 0.001) (see **Figure 8**). Finally, the duration of the previous fixation was not predictive (Wald statistic = 0.20, p > 0.05).
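The p-values associated with the Wald statistics above can be checked with a short calculation. The sketch assumes that each test has one degree of freedom (one coefficient per binary or continuous predictor), which the article does not state explicitly; under that assumption, the upper tail of the chi-square distribution reduces to a complementary error function.

```python
from math import erfc, sqrt


def wald_p_value(wald_statistic):
    """Upper-tail p-value of a chi-square distribution with 1 degree of freedom.

    Assumption: each predictor is tested with a single coefficient (df = 1).
    For df = 1, P(X > w) = erfc(sqrt(w / 2)).
    """
    return erfc(sqrt(wald_statistic / 2.0))


if __name__ == "__main__":
    reported = {
        "cognitive load": 14.38,
        "perceptual load": 0.77,
        "pupil diameter": 18.66,
        "fixation duration": 0.20,
    }
    for name, wald in reported.items():
        print(f"{name}: Wald = {wald}, p = {wald_p_value(wald):.5f}")
```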

#### DISCUSSION

#### Summary of Results

Our results showed that a high level of cognitive load, manipulated by the number of planes to guide, significantly increased the inattentional deafness rate whereas the perceptual load, manipulated by the number of peripheral aircraft to monitor, had no significant impact. The cognitive load also impacted ocular behavior with lower fixation time in the high load condition, while perceptual load had no significant effect. Finally, logistic regressions showed that the mean pupil diameter of the fixation that preceded the onset of the tones predicted inattentional deafness.

#### Effects of Cognitive and Perceptual Loads

Participants missed 28.76% of alarms in the low cognitive load condition (irrespective of the perceptual load) compared to 46.21% in the high cognitive load condition. This strikingly high rate of missed alerts cannot be attributed to sensory difficulties, as only 2% of alerts were missed in the control condition, in which the participants completed the tone detection task only (only 2 out of 20 participants missed any alerts). In the high cognitive load condition, participants engaged in greater mental effort by shifting their attention from one plane to the other, and also had a higher workload due to the need to calculate/modify heading and to change altitude parameters more often. These factors increased the chance of experiencing inattentional deafness.

Interestingly, perceptual load did not increase the missed alarm rate. In order to disentangle cognitive from perceptual load as much as possible, the high perceptual load condition was designed not to generate supplementary cognitive effort: the number of peripheral aircraft was increased while the number of associated peripheral alerts remained constant. Indeed, as demonstrated by Manning et al. (2001), the total number of conflict alerts displayed contributes to increased cognitive effort. Inattentional deafness seems to be produced when an individual is engaged in and monopolized by a task rather than when the individual is gazing more passively at visual information. One might argue that the additional peripheral aircraft were simply ignored by the participants, which may explain this lack of effect of the high perceptual load condition. However, the heatmap illustrating the distribution of fixations clearly showed that fixations on the peripheral aircraft increased in the high perceptual load condition.

As demonstrated by the heatmaps and by a decline in the detection rate of peripheral notifications, the high load condition resulted in a strong focus on the central aircraft, a behavior that can be compared to attentional tunneling (Wickens and Alexander, 2009; Régis et al., 2014). In general, fixation duration is known to reflect the attention (Findlay and Gilchrist, 2003) and mental effort (De Rivecourt et al., 2008) of an observer. In this last study on simulated flights, De Rivecourt et al. (2008) showed that momentary altitude changes can result in increased mean fixation duration. Variation of the fixation duration should be considered as task dependent: both shorter and longer fixations may indicate an increase in workload, and in our study shorter fixations indicated higher workload and increased temporal pressure. This strong engagement of cognitive resources seems to have contributed to a momentary "deafness" to auditory stimuli.

One might also argue that this high inattentional deafness rate was due to an insufficient time window to report the alarm (i.e., mean time window = 4.2 s). Yet, in all four experimental conditions, mean reaction times (between 1.53 and 1.67 s) were markedly below the mean time available to respond between two tones, and the reaction times did not significantly vary across the four experimental conditions. Even if we cannot completely eliminate the idea that a relatively small number of deviant tones were not reported because participants reacted too late, these two results tend to exclude this explanation as a major contributor to the variation in the inattentional deafness rate with increased cognitive load. The analysis of $d'$ confirmed that the decline in the number of reported alerts in the high cognitive load condition is associated with a loss of sensitivity to deviant tones, and is not due to an effect on the ability to discriminate the two tones. In this latter situation, the number of false alarms would have likely increased.

Importantly, the inattentional deafness rate of the present study was considerably higher than in previous research using Laby (Giraudet et al., 2015a), in which the percentage of unreported tones was 6% in the high cognitive load condition. To further understand the factors that promote the occurrence of inattentional deafness, the present study employed two modifications to create a more engaging and complex version of the Laby task. First, there were two aircraft to guide in the high cognitive load condition whereas only one was displayed in the previous study. Second, in the current study participants had to mentally calculate the various headings that the aircraft should follow to turn and stay within the corridor, whereas previously these were pre-calculated by the system and just required selection from a drop-down menu. These modifications led to a considerable rise in the incidence of inattentional deafness. It must be noted that in both studies, the importance of reporting the sounds was emphasized and that the time between two tones was identical. The mental calculation of heading was undoubtedly a key factor in this increase of inattentional deafness, as even in the low load condition, in which only one aircraft was displayed, the inattentional deafness rate was more than three times that observed in the previous study, in which the heading was given by the system.

#### Lower Pupil Diameter Predicts Inattentional Deafness

The multivariate logistic regression confirmed that cognitive load significantly predicted the occurrence of inattentional deafness. Most importantly, the regression also revealed that pupil diameter was lower during the fixation that preceded the onset of the target tones in the "deaf" trials. This result is counterintuitive, as inattentional deafness was clearly increased by the high cognitive load context, which is in turn expected to increase the pupil diameter (Kahneman and Beatty, 1966; Palinko et al., 2010; Peysakhovich et al., 2015). Yet, as previously mentioned, Beatty (1982) showed that vigilance decrement was associated with decreased amplitude of the phasic task-evoked pupillary response during an auditory vigilance task, while tonic or baseline pupillary diameter exhibited no such relationship. In addition, a very recent study (Unsworth and Robison, 2016) indicated that pupillary diameter can index lapses of sustained attention. They showed that, compared to focused states, inattentive and mind-wandering states are associated with lower pretrial baseline pupil responses and that distracted states are associated with larger pretrial baseline pupil diameter. These results support the notion that pupil diameter is sensitive to different types of lapses of attention, which is consistent with theories of locus coeruleus norepinephrine (LC-NE) functioning. In our study, despite a context of sustained high cognitive load, momentary lapses of sustained attention may have occurred, which could explain the relationship between lower phasic pupil diameter and the occurrence of inattentional deafness. This assumption can also be related to a past study that revealed that information overload resulted in a leveling of the dilation pattern, which suggested a momentary suspension of processing effort (Peavler, 1974).

Our results, which demonstrate an effect of cognitive load but not of perceptual load on inattentional deafness, are somewhat contradictory to a study by Macdonald and Lavie (2011) in which participants were engaged in a visual discrimination task of a cross shape. In the low visual load condition, this discrimination was made according to the line color, and in the high visual load condition, participants had to discriminate subtle differences in line length. One brief pure tone was presented simultaneously at the final trial onset. Failures to notice the presence of this tone reached a rate of 79% in the high visual load condition, significantly greater than in the low load condition. We could postulate that the type of perceptual load manipulated by the authors likely generated an indirect increase in mental effort and task engagement due to the comparison process of the line length. For example, Fierro et al. (2000) showed that the line-length comparison process engages the parietal cortex, indicating that spatial cognition is also taxed in such a task. Also, the paradigm used by Macdonald and Lavie (2011) was quite different, as only one pure tone was presented in that study while 10 target tones per condition were presented in the current study. We believe that our paradigm is closer to a real-life context or complex activity such as ATC, in which the auditory environment is composed of a mixture of different sounds that can be repeated several times. As our paradigm can be related to "change deafness" studies, for example those in which a subtle change between two voices goes unnoticed (Vitevitch, 2003), a future study would look to reproduce the same paradigm but with only one deviant tone.

#### Conclusion

The present study suggests that inattentional deafness is promoted by cognitive load rather than by a "passive" perceptual load that does not generate a supplementary amount of work. A strong engagement of cognitive resources in a given task can momentarily render one "deaf" to auditory stimuli. In our study, the key factor that promoted inattentional deafness was most likely the cognitive load generated by the mental calculation of heading and by the numerous tasks to conduct. This result confirmed previous studies showing that inattentional deafness drastically increases in the context of high cognitive load (Giraudet et al., 2015b), which can have serious consequences in safety-critical occupations like ATC. Finally, the mean pupil diameter of the period that just preceded the rare sound onset was significantly lower in the trials in which the participants did not react to the target tones, likely showing a momentary lapse of sustained attention, which in turn promoted the occurrence of inattentional deafness.

### ETHICS STATEMENT

The study was approved by the National Scientific Research Ethics Committee in Paris (CEEI/IRB00003888).

# AUTHOR CONTRIBUTIONS

LG, J-PI and MC designed the experiment. LG conducted the experiments. LG, MC, J-PI analyzed and interpreted the data. MC, LG, ST and CJ wrote the article.

#### FUNDING

Financial support was provided by a Discovery grant from the Natural Sciences and Engineering Research Council of Canada (NSERC), awarded to Sébastien Tremblay [Grant number CG073877], by the Institut Supérieur de l'Aéronautique et de l'Espace in the form of an operating grant to Mickaël Causse, and by the Direction Générale de l'Armement in the form of a scholarship to Louise Giraudet.

#### ACKNOWLEDGMENTS

We would like to thank Danny Lebel for his technical support, and Marie-Kim Côté for her help running the experiment. We are also grateful to Zarrin Chua and Helen Hodgetts for proofreading earlier versions of the article.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Causse, Imbert, Giraudet, Jouffrais and Tremblay. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# From Trust in Automation to Decision Neuroscience: Applying Cognitive Neuroscience Methods to Understand and Improve Interaction Decisions Involved in Human Automation Interaction

#### Kim Drnec<sup>1</sup>\*, Amar R. Marathe<sup>1</sup>, Jamie R. Lukos<sup>2</sup> and Jason S. Metcalfe<sup>1</sup>

<sup>1</sup> Human Research and Engineering Directorate, U.S. Army Research Laboratory, Aberdeen, MD, USA, <sup>2</sup> Advanced Concepts and Applied Research Branch, Space and Naval Warfare Systems Center Pacific, San Diego, CA, USA

#### Edited by:

Klaus Gramann, Berlin Institute of Technology, Germany

#### Reviewed by:

Agnieszka Wykowska, Ludwig-Maximilians-Universität, Germany; Dietrich Manzey, Technische Universitaet Berlin, Germany

\*Correspondence:

Kim Drnec kdrnec@gmail.com; kim.a.drnec2.ctr@mail.mil

Received: 07 November 2015; Accepted: 30 May 2016; Published: 30 June 2016

#### Citation:

Drnec K, Marathe AR, Lukos JR and Metcalfe JS (2016) From Trust in Automation to Decision Neuroscience: Applying Cognitive Neuroscience Methods to Understand and Improve Interaction Decisions Involved in Human Automation Interaction. Front. Hum. Neurosci. 10:290. doi: 10.3389/fnhum.2016.00290

Human automation interaction (HAI) systems have thus far failed to live up to expectations mainly because human users do not always interact with the automation appropriately. Trust in automation (TiA) has been considered a central influence on the way a human user interacts with an automation; if TiA is too high there will be overuse, if TiA is too low there will be disuse. However, even though extensive research into TiA has identified specific HAI behaviors, or trust outcomes, a unique mapping between trust states and trust outcomes has yet to be clearly identified. Interaction behaviors have been intensely studied in the domain of HAI and TiA and this has led to a reframing of the issues of problems with HAI in terms of reliance and compliance. We find the behaviorally defined terms reliance and compliance to be useful in their functionality for application in real-world situations. However, we note that once an inappropriate interaction behavior has occurred it is too late to mitigate it. We therefore take a step back and look at the interaction decision that precedes the behavior. We note that the decision neuroscience community has revealed that decisions are fairly stereotyped processes accompanied by measurable psychophysiological correlates. Two literatures were therefore reviewed. TiA literature was extensively reviewed in order to understand the relationship between TiA and trust outcomes, as well as to identify gaps in current knowledge. We note that an interaction decision precedes an interaction behavior and believe that we can leverage knowledge of the psychophysiological correlates of decisions to improve joint system performance.
As we believe that understanding the interaction decision will be critical to the eventual mitigation of inappropriate interaction behavior, we reviewed the decision making literature and provide a synopsis of the state of the art understanding of the decision process from a decision neuroscience perspective. We forward hypotheses based on this understanding that could shape a research path toward the ability to mitigate interaction behavior in the real world.

Keywords: trust in automation, interaction decisions, decision making, human automation interaction, neuroergonomics

# INTRODUCTION

The purpose of this review is to address a largely unexplored aspect of human automation interaction (HAI); that is, the human decision that leads to interaction behavior, traditionally considered a manifestation of the user's level of Trust in Automation (TiA). The extension of this concept has been that, if HAI is to be actively managed in joint human-automation systems, one must calibrate the TiA of the user so that decisions about interactions with automation are appropriate. Further, it has been considered that if one could measure instantaneous levels of TiA, inappropriate interaction decisions could be predicted and mitigated. Research interest in HAI systems is motivated in large part because of observations that even the most advanced HAI systems have not yet fully realized the ultimate vision of both safe and seamless integration of the human into the system that would lead to improved task performance. Specifically, successful applications of automation within task spaces involving human operators have not yet been realized without simultaneous definition of significant context-specific design constraints that delineate human and automation responsibilities. Such constraints may improve focused aspects of performance, but also increase the risk in other ways, particularly in circumstances and moments involving handoff of control authority, and these constraints limit more generalized application of HAI concepts and methods, particularly in terms of improving joint system efficiency (Parasuraman and Riley, 1997; Dekker and Woods, 2002; Dzindolet et al., 2003; Jamieson and Vicente, 2005; Parasuraman and Manzey, 2010).

Decades of human factors research have resulted in an understanding of what factors affect TiA, but as of yet, it remains unclear how specific levels of TiA translate into specific human decisions regarding interaction with a given automation. This knowledge gap may exist because human behavior and joint system performance can be thought of as the result of a combination of many factors only one of which is TiA (Hancock et al., 2011; Schaefer et al., 2014). Research aimed at predicting interaction behaviors has previously met with some success, particularly with respect to decision aid systems (Bliss et al., 1995; Meyer and Bitan, 2002; Meyer et al., 2014) and automated driving aids (Kumagai et al., 2003; Gold et al., 2015; Terai et al., 2015). We consider such results to suggest that understanding the interaction behavior may be a more fruitful and immediate route toward active, online mitigation of problems thought to arise from mis-calibrated TiA. This idea is developed with the appreciation that interaction behaviors result from decisions about how and when to interact, and any individual interaction decision may or may not be motivated by a change in TiA.

Our specific proposal is that, as much as behavior is a key to managing HAI, understanding the process of decision making in the context of HAI is critical for understanding and predicting interaction behaviors. It is an important step that is implicitly necessary for eventual online mitigation of inappropriate interaction behavior. This is especially applicable in our discussion inasmuch as we believe that TiA reflects changing degrees of perceived risk and uncertainty and is an instance of value based decision making in dynamic contexts. We further suggest that physiological correlates of value based decisions could be measured and leveraged to provide valuable data that may increase the likelihood of predicting a consequent interaction. To develop the connection between TiA and value based decision making, the discussion begins by reviewing extant human factors literature to demonstrate that, while TiA is one of many important factors influencing HAI performance, it is ultimately the interaction behavior that is of interest. It is then argued that this behavior, if intentional, is the result of a decision, and thus understanding the decision process leading to the behavior may facilitate near- to medium-term solutions for active mitigation strategies while understanding of the nuances and complexities of TiA continues to evolve over the long term. The discussion then turns towards a synthesis of selected cognitive neuroscience literature that focuses particularly on value based decision making. We conclude with future research directions that would be necessary to enable decision based monitoring and prediction of interaction behaviors and the eventual development of active mitigations for the types of HAI problems currently believed to be brought about by mis-calibrated TiA.

# THE IMPORTANCE OF TiA IN JOINT SYSTEM PERFORMANCE

The term automation, or automated system, as used here is best defined by Parasuraman et al. (2000) as a "machine execution of functions". This definition includes automation with capabilities as diverse as controlling a sophisticated cockpit system or as simple as an automated coffee maker. Because automation is not yet fully "intelligent" it has no agency for adapting to unexpected circumstances, and therefore often requires the supervision and/or occasional intervention of humans. Part of this supervisory role requires that there be HAIs, but these interactions need to be appropriate, or the joint system performance will suffer. Decades of HAI system research have indicated that appropriate interactions are the result of decisions subsequent to calibrated TiA. An established conceptual model of factors influencing HAI performance is provided in **Figure 1**. As the model suggests, TiA has traditionally been considered to be the critical component driving human user decisions about interactions such as intervening in an automated task. Given the importance that TiA has been accorded to overall joint system performance, in this section we provide a brief review of important aspects of TiA and its dynamics. We aim to highlight the complex relationship between TiA and human user behavior, and the implication of this relationship: even if moment-to-moment levels of TiA could be measured, it is unclear how such information could be leveraged to predict an interaction behavior.

Early theories about the construct of TiA were developed from the psychological construct of interpersonal trust, and they posited that calibrated TiA was critical for successful HAI system performance (Sheridan, 1980; Sheridan and Hennessy, 1984). There are aspects of interpersonal trust that are analogous to TiA, in particular that there needs to be a sense of risk or vulnerability on the part of the trustor for trust to develop (Lee and Moray, 1994; Muir, 1994; Corritore et al., 2003; Lee and See, 2004; Evans and Krueger, 2011). However, it has been debated whether the two constructs are homologous (Madhavan and Wiegmann, 2007), and so trust as it specifically applies to automation became a central point of interest in human factors research aimed at improving joint system performance (Lee and Moray, 1992; Muir, 1994; Muir and Moray, 1996; Lee and See, 2004). Myriad definitions of TiA imply that it is the result of a feeling of trustworthiness towards the automation such that a human user can depend on the automation to perform the task for which it was designed. It is worth noting that if the consequence of the task to the human user is small, if TiA develops at all, its level becomes irrelevant because the outcome of the joint system fails to be important. Therefore, much like interpersonal trust (Lee and Moray, 1994; Muir, 1994), TiA develops in the face of a sense of risk. In these situations, TiA then develops and shows dynamic changes from the ongoing comparison of the expectations about the automation's behavior and observations by the human user about the automation's performance, weighted heavily by the risk borne by the human user (Sheridan and Hennessy, 1984; Muir, 1994; Muir and Moray, 1996).

# Determinants and Dynamics of Trust in Automation: Expectations and Observations

One of the first explicit theories of TiA (Muir, 1994) stated that appropriate levels of TiA would develop if three expectations were met during the course of automation interaction. These expectations are technical competence, persistence, and fiduciary responsibility, but they play differential roles in TiA development and dynamics throughout the course of automation use. For example, perceptions of competence might be more important in the early stages of automation use than later in time. The expectation of technical competence is the expectation that the automation will accurately and successfully perform the functions for which it was designed. Persistence, perhaps here better conceived of as predictability, relates to the issue of reliability in that an automation that performs in a particular manner now will be expected to perform in the same or a similar manner when it encounters similar circumstances in the future. Finally, fiduciary responsibility addresses the notion that a given human user will hold expectations of an automation of a particular type that will impact role allocation. That is, the human user will expect that the automation will necessarily be responsible for its designed functions as they understand them and thus fewer personal resources need to be allocated to carrying out those functions. The importance of these expectations for TiA dynamics differs depending on the stage of the interaction with the automation.

When first presented with an automated system there is limited information available for the human user to observe, and thus little with which to evaluate the trustworthiness of the automation. Some key elements that significantly affect early levels of TiA include initial expectations borne from biases toward automations in general, and initial observations about the design of the automation (Muir and Moray, 1996; Nass et al., 1996; Dzindolet et al., 2002; Lee and See, 2004; Parasuraman and Miller, 2004; Miller, 2005; Merritt and Ilgen, 2008; Merritt, 2011; Merritt et al., 2012; Pak et al., 2012). After having been introduced to an automation, human users tend to explore different strategies for subsequent interaction (Lee and Moray, 1992) and thus learn more about the automation's behavior. This experimentation arguably helps the human user gauge competence, which emerges as one of the most important predictors of TiA at this early stage. However, it is worth noting that for various reasons, human users are notoriously poor at making accurate judgments of competence (Sheridan and Hennessy, 1984; Lee and Moray, 1992; Dzindolet et al., 2002, 2003; Madhavan et al., 2006; Verberne et al., 2012; Merritt et al., 2014). Once automation competence has been judged, whether correct or not, the most important factor driving levels of TiA is persistence or predictability of performance over time (Lee and Moray, 1992). Persistence of performance is important enough that as long as errors are predictable and the automation error rate is at a consistent rate of approximately 30% or less, most human users will decide to continue to use and benefit from the automation (Parasuraman et al., 2000; Wickens and Dixon, 2007; Wang et al., 2009). As levels of TiA dynamically change throughout the course of observations about the automation's behavior, theory posits that interaction decisions, and consequent behaviors, should reflect the extant level of TiA. 
If there is too much or too little TiA, as the theory goes, a human user may decide to overuse or underuse the automation, respectively. Specific patterns of behaviors resulting from decisions about how to interact with the automation have been well documented and are commonly referred to as trust outcomes, as they are believed to directly reflect certain levels of TiA.

# Trust Outcomes and their Relationship to TiA

The trust outcomes most commonly discussed are misuse and disuse and are described in detail by Parasuraman and Riley (1997). Misuse refers to instances when the automation is used without due skepticism, tending to result in overuse (Parasuraman and Riley, 1997; Bahner et al., 2008; Parasuraman and Manzey, 2010). Misuse has two related causes: automation bias and complacency (Manzey et al., 2006; Parasuraman and Manzey, 2010). They are related in that they both result in a lack of monitoring where lack of attention plays a central role (Parasuraman and Manzey, 2010). Automation bias arises through the mere presence of an automated system, possibly because humans demonstrate a tendency to choose the route of least cognitive effort, making it easier, or at least preferable, to accept feedback from an automation as correct (Dzindolet et al., 1999, 2001; Skitka et al., 1999, 2000; Wang et al., 2008; Parasuraman and Manzey, 2010; Goddard et al., 2014; Mosier and Skitka, 1996). Complacency, less well understood, can be said to occur when monitoring is less than optimal and joint system performance suffers (Parasuraman and Manzey, 2010). However, both automation bias and complacency tend to increase in cases of high workload and high consequence environments wherein users often make conscious decisions to rely on even imperfect automation (Dixon et al., 2007; Wickens and Dixon, 2007). Disuse describes a continuum that spans from the user underutilizing the automation to entirely abandoning the automation in favor of a manual mode. Disuse tends to occur if a human user has a high expectation of automation performance and then observes unexpected errors, or has more self-confidence in her ability to perform the task than confidence in the efficacy of the automation for the same task (Lee and Moray, 1994; Parasuraman and Riley, 1997; Moray et al., 2000; Dzindolet et al., 2003).

Although trust outcomes have been well defined, a synthesis of the literature, and recent experimental evidence (Wiczorek and Manzey, 2010; Chancey et al., 2015) indicate a far more complex relationship between TiA and trust outcomes than is implied in the above discussion, and this implies that predicting interactions based on extant TiA levels is problematic. Such complex interactions involve perceived risk, self-confidence, workload, and even personality type (Lee and Moray, 1994; Muir, 1994; Parasuraman and Riley, 1997; Lee and See, 2004; Merritt and Ilgen, 2008; Hancock et al., 2011; Schaefer et al., 2012; Merritt et al., 2014). For instance, human users have been documented as reporting a high level of TiA and then, paradoxically choosing a manual operation mode, demonstrating disuse (Lee and Moray, 1992). Conversely, even when low TiA has been reported, human users may misuse even a poorly competent automation, particularly under high workload conditions (Daly, 2002; Biros et al., 2004). Clearly levels of TiA do not map uniquely onto trust outcomes, regardless of how they are represented (i.e., attention, intervention rate, etc.). Therefore, they are not predictive of the way a human user will decide to interact with an automation, limiting the use of measuring TiA for real world applications to improve HAI. We suggest that this is because TiA is far more complex than may be useful for those with more immediate concerns regarding actively managing HAI. However, it is important to note that when TiA is studied it is the interaction behavior that is of interest most often. Therefore, regardless of the manageability of trust, what might be learned if we focus more simply and exclusively on the behavior?

# TiA as Predictable Behavior

The present discussion is not the first to offer that a shift in focus from trust to behavior is well justified. In fact, a number of researchers in this domain have re-framed the problem space of TiA into one of reliance and compliance, which are defined exclusively in terms of observable behavior, and are not intended to imply specific psychological cause such as trust (Meyer, 2001, 2004; Parasuraman et al., 2008; Rice, 2009; Meyer et al., 2014). Indeed, there is a non-unique mapping of reliance and compliance to traditional trust outcomes such that an observation of inappropriate amounts of either may alternately signal disuse or misuse and possibly motivate conflicting interpretations of TiA. We find these behaviorally defined terms to be useful in their functionality for application in real world situations. That is, objectively defined and observable behaviors are especially valuable for the purposes of modeling and prediction because they obviate the need for drawing inferences to and making assumptions about manifestations of more subjectively defined constructs, such as TiA, automation bias or complacency that are difficult to measure objectively and thus unsuitable for use in attempts at active optimization and/or mitigation.

Reliance is the tendency of the human user to accept the lack of an alarm, alert, warning, or prompt as a true reflection of the state of the world (Lee and Moray, 1994; Singh et al., 1997; Parasuraman et al., 2000; Yeh and Wickens, 2000; Moray, 2003; Dixon et al., 2006). That is, in the absence of an alarm or warning, the human user accepts, often tacitly, that all is well and there is no reason for possible intervention. Compliance, on the other hand, is defined when the user responds to, putatively agrees with, and ultimately takes the action specified by an alarm or recommendation from the automation (Meyer, 2001, 2004). Though reliance and compliance are often discussed in terms of optimal behavior, too much of either in the wrong context is detrimental to system performance. For instance, if an alarm is absent and the human user assumes that no circumstances warranting an alarm exist and thus fails to monitor the automation over significant time, he or she is at risk of over-reliance and the consequences thereof. Conversely, overcompliance occurs when the human accepts all suggestions from the automation (when present) without confirming their validity.

Beyond observation of general behavioral patterns, the greater benefit of defining compliance and reliance behaviors has been in providing an avenue towards greater precision in understanding the factors that affect automation use during HAI, which might eventually lead to prediction of an interaction. For instance, some have observed that reliance and compliance are differentially affected by error type, i.e., false alarms vs. misses in target detection tasks (Meyer, 2004; Rice and Geels, 2010; Wiczorek and Manzey, 2014), and by the predictive value of the alarm (Meyer and Bitan, 2002; Manzey et al., 2014). If a human user observes frequent failures to trigger alarms, the frequency of monitoring the automation will increase, thereby reducing reliance on the automated agent (Masalonis and Parasuraman, 1999; Bagheri and Jamieson, 2004; Meyer, 2004; Madhavan and Wiegmann, 2007; Parasuraman and Manzey, 2010; Geels-Blair et al., 2013). Compliance, however, is degraded by higher rates of false alarms. In particular, when higher rates of false alarms are observed, users tend to consume critical time and attentional resources to verify alarms before choosing a response. Further progress in this line of inquiry has resulted in more general characterization of how interaction behaviors change with the positive and negative predictive value of an alarm. Positive predictive value is derived from a Bayesian calculation of the likely existence of a hazard given an alarm and, likewise, the probability of an alarm given no existing hazard (Meyer et al., 2014). Negative predictive value is calculated similarly, but in the absence of an alarm. Therefore, positive predictive value decreases as false alarm rate goes up and negative predictive value decreases with more frequent misses (no alarm in the presence of a hazard). Interaction behavior is thus differentially affected by changing positive vs. negative predictive value. 
Positive predictive value has been shown to have strong effects on reliance, but only for values less than 0.75 (Meyer et al., 2014), where values below this threshold have been associated with excessive time spent monitoring the automation. Research in HAI domains has thus benefited considerably from the use of these narrowly and objectively defined behavioral terms.
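The Bayesian calculation behind positive and negative predictive value can be sketched in a few lines. The function name `predictive_values`, its parameter names, and the numbers in the demonstration are assumptions chosen for illustration; they are not taken from Meyer et al. (2014). The sketch uses the standard definitions: positive predictive value is P(hazard | alarm) and negative predictive value is P(no hazard | no alarm).

```python
import math


def predictive_values(base_rate, hit_rate, false_alarm_rate):
    """Return (PPV, NPV) for an alarm system using Bayes' rule.

    base_rate        -- P(hazard), prior probability that a hazard exists
    hit_rate         -- P(alarm | hazard); the miss rate is 1 - hit_rate
    false_alarm_rate -- P(alarm | no hazard)
    All names and values are illustrative assumptions.
    """
    for name, p in (("base_rate", base_rate),
                    ("hit_rate", hit_rate),
                    ("false_alarm_rate", false_alarm_rate)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")

    # Total probability of an alarm (true alarms plus false alarms).
    p_alarm = hit_rate * base_rate + false_alarm_rate * (1.0 - base_rate)
    p_no_alarm = 1.0 - p_alarm

    # PPV: P(hazard | alarm); undefined if the system never alarms.
    ppv = hit_rate * base_rate / p_alarm if p_alarm > 0 else math.nan
    # NPV: P(no hazard | no alarm); undefined if the system always alarms.
    npv = ((1.0 - false_alarm_rate) * (1.0 - base_rate) / p_no_alarm
           if p_no_alarm > 0 else math.nan)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical systems: same hazard base rate, different error profiles.
    for label, hit, fa in (("baseline", 0.9, 0.1),
                           ("more false alarms", 0.9, 0.3),
                           ("more misses", 0.6, 0.1)):
        ppv, npv = predictive_values(base_rate=0.1, hit_rate=hit,
                                     false_alarm_rate=fa)
        print(f"{label}: PPV={ppv:.3f}, NPV={npv:.3f}")
```

With these hypothetical numbers, raising the false alarm rate lowers the positive predictive value, while raising the miss rate lowers the negative predictive value, which matches the relationship described in the text.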

We advocate here for shifting research towards more clearly defined behaviors and the factors that affect them because of how this shift creates important opportunities for systematic research into HAI. The domain of application for such a shifted focus would include contexts where TiA may be involved, at least inasmuch as TiA reflects assessments of the relative value of specific behavioral options defined in terms of probable risk versus reward. We argue that such behavior-based understandings are important for progress on multiple levels from phenomenology to predictive modeling. The extant work discussed above has provided an essential corpus of knowledge regarding the relationship between automation performance characteristics (i.e., error rate, type, and predictive value) and human user interaction behaviors. However, we also suggest that in order to be useful down the road for real-world mitigation of inappropriate interactions, this shift from trust to behavior does not go quite far enough for two important reasons. First, to mitigate a potentially detrimental interaction behavior in a dynamic context, prediction is necessary. This is because a behavior that has already occurred cannot be changed and the consequences are likely to be too immediate to offset in post hoc fashion. Moreover, the predictive power required must occur on a time-scale that allows a reasonable opportunity to enact a mitigation when an inappropriate behavior is expected. Second, the current understanding of reliance and compliance is tied to automation design; an automation that frequently misses events reduces reliance, and an automation that frequently produces false alarms reduces compliance. This understanding, then, usefully provides an improved framework for HAI, but has yet to account for variability in individual instances of HAI. 
Therefore, the predictive power of the current understanding of interaction behaviors based on population averages remains limited to overall design strategies, whereas we are interested in building towards eventual prediction and mitigation of reliance and compliance at the level of individual instances of interaction behavior. In order to improve the ability to predict an interaction behavior we thus believe it is necessary to consider not only the effects of automation design on interaction behaviors such as compliance and reliance, but also the individual internal phenomena that precede the behavior.

## INTERACTION BEHAVIOR REFRAMED AS A DECISION TO INTERACT WITH AN AUTOMATION

Before an intentional behavior occurs, the human user must make a decision as to which among a limited array of options will be selected. Here we argue that research concerned with improving HAI would benefit greatly from studying the decisions that precede interaction behaviors. Such an approach satisfies the need for focus on individual interactions in a manner that affords prediction on a time-scale that is useful for active mitigation. We define such decisions as interaction decisions, given as specific to the intention to interact with an automation. While TiA has often been considered to motivate interaction decisions, the richness of the decision process itself, as well as accompanying stereotypical psychophysiological indicators thereof, has not been thoroughly investigated as a source of information that could be applied to the prediction of a consequent interaction behavior. Our starting point in this pursuit is to understand the underlying psychological and physiological processes of decision making, with a particular focus on value based decision making. This understanding can provide a cornerstone for the advancement of scientifically based hypotheses about how interaction behaviors may eventually be predicted for the sake of active mitigation. Predicting decision outcomes, or the interaction behavior in a real-world HAI context, is of course not trivial, but laboratory-based research in decision neuroscience has established decision making as a reasonably stereotyped process with clear behavioral and physiological precursors. Further, the ability to predict decision outcomes has been pursued by both the cognitive neuroscience (Soon et al., 2008; Haynes, 2011; Perez et al., 2015) and brain computer interface (Musallam et al., 2004) communities.
Indeed, attempts at predicting some types of decision outcome behaviors have already met with success in a laboratory environment, possibly because the specifics of a decision process in the brain begin even before there is conscious awareness of the impending decision (Soon et al., 2008, 2013; Haynes, 2011; Perez et al., 2015). Some of these studies have been criticized because a controlled experiment lacks a sense of risk or value to the decision maker, and the decision outcomes being predicted are therefore assumed to be trivial (Gold and Shadlen, 2007; Lavazza and De Caro, 2010).

The lack of risk, value, or reward in these controlled laboratory environments is in contrast to interaction decisions that inherently involve some type of personal risk or reward. For example, over-relying on an automation can degrade joint system performance. Thus, we are chiefly concerned with decisions that are based on expected value and risk, or value based decisions (Rangel et al., 2008; Wallis, 2012). Value based decisions are particularly relevant to the HAI context because it is often required that a human user continuously weigh the expected personal value of allowing the automation to complete the task versus performing it manually or, rather, whether to comply with the recommendation of an automated system. Importantly, this assessment and subsequent judgment of value to the user must be made against the backdrop of risk that the decision may compromise joint system performance. Thus, through the common elements of risk, reward, and expected value, we believe interaction decisions during HAI to be an instance of value based decision making. We believe that understanding the value based decision process is important to improving HAI and, therefore, we briefly discuss results of value based decision making research as it relates to HAI in order to support hypotheses forwarded in the discussion, aimed at establishing a research path that will allow the eventual prediction of HAI interaction behaviors.

# The Importance of Considering the Decision Process

An important argument in favor of studying the decision process in order to improve HAI is that significant efforts in cognitive neuroscience have revealed decision making as fairly stereotyped, and therefore, a potentially predictable process. Moreover, this body of research has identified a number of psychophysiological correlates that unfold in advance of, and during, a decision. Critically, these correlates are measurable, and therefore useful for understanding the decision process, at least in laboratory settings. Some have observed that these correlates unfold in predictable ways through defined cognitive stages, and therefore measuring them has potential use for active mitigations of inappropriate interaction decisions and behavior. This approach is fundamentally different from attempting to measure and calibrate TiA because the psychophysiological correlates of a decision are measurable whereas the construct of TiA is yet to be defined in a way that is equally useful for active monitoring. In general, many cognitive neuroscientists model decisions as comprising three cognitive stages (Fellows, 2004; Bogacz, 2007). However, five cognitive processes, some analogous to stages of the cognitive models of general decisions, have been described in value based decision making (Rangel et al., 2008) and are therefore relevant to our discussion of interaction decisions. These processes, which are not discrete stages per se, are: (1) representation of the problem, i.e., identification of alternative choices, and of internal and external states that affect the value of the choices; (2) evaluation of gathered evidence that allows the assignment of a value to the alternatives; (3) comparison of these values in order to make a decision; (4) accumulation of the comparative value for each alternative and making the decision; and (5) generation of prediction errors that provide feedback in order for learning to occur.
The psychophysiological processes that unfold within the first four processes will be discussed as they relate to HAI contexts such as risk and reward. The fifth process, generating feedback on the decision, has been studied in the context of learning, and may be useful for later development of adaptive mitigation strategies, but is beyond the scope of the current review (Nieuwenhuis et al., 2005; Christie and Tata, 2009; Cohen et al., 2011; van de Vijver et al., 2011). We note that we discuss these processes sequentially mainly for organizational purposes; during the decision process, however, they may overlap or even occur in parallel (Rangel et al., 2008).

# The Value Based Decision Process

In order for the need to identify alternatives to arise, there must be some recognition of the need for a decision; in a sense it is the motivation to perform a task (Gold and Shadlen, 2007). Decisions must be initiated by either salient external or internal stimuli. These stimuli will often produce an orienting response (Sokolov et al., 2002; Glimcher and Rustichini, 2004; Delgado et al., 2005), characterized in humans by a measurable increase in tonic skin conductance (SC) levels and a decrease in heart rate variability (Figner and Murphy, 2011). In an HAI domain, relevant stimuli typically include those specifying alerts from the automation, acute changes in environment, or internal feelings that the current behavior is inappropriate (typically seen as an error-related potential in the brain or a gradual shift in peripheral physiology). Once the need for a decision has been established, decision alternatives are identified. As alternatives are identified, in the case of interaction decisions, the human user will also form, even if not consciously, a representation of internal and external states (Rangel et al., 2008). These representations play an important part during the process of assigning values to individual alternatives. For example, a human user is more likely to take control from the automation if they detect that the automation is malfunctioning and they perceive an associated risk. The neural basis of this early stage in the decision process is not well understood. For example, it is unclear how the brain decides which alternatives should be considered, and if there is a functional limit to the number that can be assessed at one time (Rangel et al., 2008). Nevertheless, such questions are important for determining how to leverage physiological indicators into models of decision making during HAI.

Once the possible alternatives are identified, evidence for or against each alternative must be evaluated in order to make an optimal decision. In the case of interaction decisions, which due to the presence of risk are analogous to value based decisions, it has been hypothesized by some (Rangel et al., 2008; Glimcher and Fehr, 2013) that there are three cognitive valuation systems that the brain might use: the Pavlovian, habitual, and goal directed systems. We believe the goal directed system to be most relevant to our discussion because the goal directed system assigns values to potential actions by calculating action-outcome associations from previous experience and comparing this value to the perceived rewards associated with possible outcomes of the decision (Rangel et al., 2008). In the goal directed valuation system the value assigned to a piece of evidence is equivalent to the potential value of the alternative it supports, with the value assigned to an alternative being equal to the expected reward of the action. In the context of HAI, an important research question would be whether the probability of success is greater by relying on the automation or not and, moreover, how that probability scales with perceived risk to determine the direction of a given interaction decision.
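The goal directed comparison described above can be illustrated with a toy calculation. The sketch below is not a model from the cited literature: it assumes, purely for illustration, that the value of an alternative is the success-weighted reward minus the failure-weighted cost, that perceived risk scales the cost linearly through a hypothetical `risk_weight`, and that manual control carries a fixed hypothetical `manual_effort` penalty. All function names and numbers are invented for the example.

```python
def expected_value(p_success, reward, cost, risk_weight=1.0, effort=0.0):
    """Toy expected value of one alternative (illustrative assumption, not a fitted model)."""
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must lie in [0, 1]")
    gain = p_success * reward                      # reward if the action succeeds
    loss = (1.0 - p_success) * cost * risk_weight  # cost of failure, scaled by perceived risk
    return gain - loss - effort


def interaction_decision(p_auto, p_manual, reward, cost, manual_effort, risk_weight=1.0):
    """Compare relying on the automation with taking manual control."""
    ev_rely = expected_value(p_auto, reward, cost, risk_weight)
    ev_manual = expected_value(p_manual, reward, cost, risk_weight, effort=manual_effort)
    choice = "rely" if ev_rely >= ev_manual else "manual"
    return choice, ev_rely, ev_manual


if __name__ == "__main__":
    # Hypothetical numbers: the human is slightly more accurate but pays an effort cost.
    for risk in (1.0, 5.0):
        print(risk, interaction_decision(0.85, 0.95, 10.0, 10.0, 3.0, risk_weight=risk))
```

Under these assumed numbers, the preferred alternative switches from relying to manual control as the risk weight grows, which is one simple way the probability of success could interact with perceived risk to set the direction of an interaction decision.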

When a person evaluates a piece of evidence they will do so by observing the relevant data (e.g., visual scanning, sound or other stimulus), consulting their memory, and integrating this against a backdrop of expectations (Mulder et al., 2014). In the case of visual evidence, gaze fixation is thought to support evidence evaluation such that evidence about the value of an alternative is sampled at each fixation (Krajbich et al., 2010; Krajbich and Rangel, 2011). Memory consultation causes a person to compare past decision outcomes with the available alternatives. The brain creates a prediction error that represents the difference between the expected value of choosing current decision alternatives and the value that has been experienced in the past by choosing alternatives that are similar in nature (Hare et al., 2008). For example, consider a human user who has previously experienced aberrant behavior from an automation, but there has been no decrement in joint system performance, and joint system performance continues to remain better than what would be expected from only one agent performing the task. Even in risky environments such as in the battlefield, or during a search and rescue operation, the experienced user is more likely to rely on an automation (Lyons and Stokes, 2012) than a user who has not experienced the aberrant behavior because the experienced user has realized the value in relying on the automation despite the probability of an error.
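The role of the prediction error can be made concrete with a minimal sketch. It uses a simple delta-rule (Rescorla-Wagner style) update, which is our illustrative choice rather than a method taken from the cited studies; the `learning_rate` and the outcome values are hypothetical.

```python
def update_expected_value(expected, experienced, learning_rate=0.2):
    """Return (prediction_error, new_expected) after observing one outcome."""
    prediction_error = experienced - expected  # experienced value minus expected value
    new_expected = expected + learning_rate * prediction_error
    return prediction_error, new_expected


def learn_value_of_relying(outcomes, initial_expected=0.0, learning_rate=0.2):
    """Track the expected value of relying on an automation across repeated outcomes."""
    expected = initial_expected
    errors = []
    for outcome in outcomes:
        error, expected = update_expected_value(expected, outcome, learning_rate)
        errors.append(error)
    return expected, errors


if __name__ == "__main__":
    # Hypothetical history: joint performance stays good (outcome 1.0) on every trial.
    final_value, errors = learn_value_of_relying([1.0] * 20)
    print(final_value, errors[0], errors[-1])
```

In this sketch, a user whose reliance is repeatedly followed by good joint system performance ends up with a high expected value for relying and progressively smaller prediction errors, which is consistent with the experienced user described above.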

At a cellular level, data have suggested that the cognitive evaluation of evidence is supported by neural "evaluators" that store dynamic estimates of which decision alternative is supported by the evidence. For instance, studies using fMRI in risk reward scenarios have identified two candidate neural evaluators: the amygdala and ventral striatum. A reward based fMRI study indicated that the amygdala evaluates the cost or risk of acting on an alternative (Yacubian et al., 2006; Basten et al., 2010). In the same fMRI study the ventral striatum was implicated in the formation of representations of the expected value or reward of an alternative (Yacubian et al., 2006; Kable and Glimcher, 2007; Rangel et al., 2008; Basten et al., 2010; Lim et al., 2011). Other authors, however, have found that, in addition to the amygdala and the ventral striatum, the lateral orbitofrontal cortex and the medial orbital frontal cortex also act as neural evaluators for risk and reward, respectively (Hare et al., 2008; Rangel et al., 2008; Rangel and Hare, 2010). The neural substrates that have been observed to support value based decision making processes are detailed in **Table 1**.

The neural evaluators, then, form representations of the risks and rewards for each alternative, for example the benefit of relying on or complying with an automation as opposed to choosing to complete the task manually. As the risks and rewards of a potential interaction behavior are processed by the amygdala and ventral striatum, the value of these representations must be assessed relative to each other; they must be compared. Neural correlates of this third process involved in value based decision making, comparison of the values assigned to the evidence, have been observed in fMRI studies. That is, value based comparison has been suggested as supported by activation in the ventral medial prefrontal cortex (vmPFC; Chib et al., 2009; Gläscher et al., 2009; Basten et al., 2010), whereas the "comparator" function in perceptual decisions has been associated with increased activity in the dorsolateral prefrontal cortex (dlPFC; Basten et al., 2010; Philiastides et al., 2011). While the evidence comparison process unfolds, some have hypothesized that the comparative value, also known as the decision variable, is accumulated in the lateral intraparietal cortex (LIP) until a decision threshold is reached, bringing about a decision (Platt and Glimcher, 1999; Kiani and Shadlen, 2009; Mulder et al., 2014). Evidence to support this hypothesis has mainly been shown in primate studies of single cell recordings during cued saccade trials (Platt and Glimcher, 1999; Platt, 2002). However, there has been evidence from fMRI studies that the human parietal cortex is also involved in accumulating the decision variable (Ploran et al., 2007; Heekeren et al., 2008). It is interesting to note that the temporal integration of activity in the frontal-parietal regions, which are considered to be involved in comparing and accumulating compared value signals, has been observed as preceding the conscious decision to act (Gold and Shadlen, 2007; Soon et al., 2013; Perez et al., 2015).
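The idea of a decision variable that accumulates until it reaches a threshold can be sketched as a simple bounded random walk. This is a generic accumulator in the spirit of drift-diffusion models, offered as an assumption-laden illustration rather than the model used in the cited studies; the labels `"rely"` and `"manual"`, the parameter values, and the symmetric bounds are all hypothetical.

```python
import random


def accumulate_to_threshold(drift, threshold=1.0, noise_sd=0.1, dt=0.01,
                            max_steps=10_000, seed=0):
    """Accumulate a noisy comparative value until an upper or lower bound is crossed.

    drift > 0 means the evidence favors relying; drift < 0 favors manual control.
    Returns (choice, decision_time); choice is None if no bound is reached.
    """
    rng = random.Random(seed)
    decision_variable = 0.0
    for step in range(1, max_steps + 1):
        # Deterministic drift plus Gaussian noise scaled by sqrt(dt), as in a discretized diffusion.
        decision_variable += drift * dt + rng.gauss(0.0, noise_sd) * dt ** 0.5
        if decision_variable >= threshold:
            return "rely", step * dt
        if decision_variable <= -threshold:
            return "manual", step * dt
    return None, max_steps * dt


if __name__ == "__main__":
    print(accumulate_to_threshold(drift=2.0))
    print(accumulate_to_threshold(drift=-0.5, noise_sd=0.3, seed=1))
```

In such a sketch the time at which the bound is crossed depends on both the strength of the comparative value (the drift) and the noise, which is one way to think about why the moment a decision threshold is reached is difficult to pin down from external measurements.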

The putative involvement of the parietal cortex in decision making is noteworthy because of its central role in the process. For example, gaze fixation, critical for evidence evaluation (Poole and Ball, 2006) in visually based decisions, is controlled by the LIP in monkeys (Coe et al., 2002). This region forms a "salience map" for the oculomotor system to saccade to a target, or maintain gaze fixation on a target (Goldberg et al., 2006). The LIP, then, not only plays a role in accumulating the comparative value of the evidence as discussed above, but is also critical for its initial evaluation. Brain computer interface research has also found that the medial intraparietal cortex in monkeys forms representations of the value of an alternative that has been encoded in the vmPFC (Musallam et al., 2004), such that the intent of the monkey to choose one alternative over another can be decoded from intracellular electrodes. Although much evidence of the importance of the parietal cortex during decision making, and especially value based decision making, has come from primate research, there is evidence that analogs in the human parietal cortex are also central to decision making (Ploran et al., 2007; Heekeren et al., 2008).

Despite the hypothesized causal role of integrated frontal-parietal activity in the conscious decision to act, there are no measurable psychophysiological variables that allow an accurate determination of the exact time that a decision threshold is reached. However, psychophysiological correlates occurring hundreds of milliseconds before a conscious decision involving risk and reward (Cohen et al., 2009), inherent in value based decision making, have been identified. For instance, the readiness potential, a slow negativity in scalp recordings of cortical activity, precedes fully endogenous decisions by a few hundred milliseconds (Libet, 1993). Spectral correlates even more proximal to the decision have also been observed. In a paradigm involving playing a competitive game against a computer, spectral decomposition of scalp-recorded EEG led to the finding that the decision process was accompanied by a general shift in power from lower bands (delta, 1–4 Hz and theta, 5–7 Hz) to higher frequency bands (alpha, 8–12 Hz and beta, 13–35 Hz), as well as a broadband increase in cross-trial phase coherence at about 220 ms post stimulus (Cohen and Donner, 2013). Similar indications were found during complex real world choice tasks and a two-choice forced-decision paradigm. In these cases, significant correlations of increased power were seen in delta, theta, beta, and gamma (36+ Hz) bands of EEG activity approximately 250–500 ms post-stimulus (Guggisberg et al., 2007; Davis et al., 2011). In decisions involving risk, risk is represented by an asymmetry in the alpha band such that there is an increased alpha power in the right frontal region (Gianotti et al., 2009). These spectral correlates of value based decision making are measurable in real time and available for current application outside a laboratory, which is encouraging in the context of improving HAI.
However, these scalp recorded spectral correlates occur only hundreds of milliseconds before an interaction decision and therefore have limited use because they are so temporally proximal to the behavior itself.
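To illustrate how such spectral correlates might be quantified, the sketch below computes band power from a single-channel signal with a plain discrete Fourier transform and derives a frontal alpha asymmetry index. The band limits follow those quoted above; everything else is an assumption made for the example: the index is taken as the difference of log alpha power (right minus left), the function names are hypothetical, and the synthetic sine waves stand in for real EEG. The code uses only the standard library to stay self-contained; a practical analysis would more likely rely on an established signal-processing package and on artifact handling that is omitted here.

```python
import math

# Band limits in Hz as quoted in the text above.
BANDS_HZ = {"delta": (1, 4), "theta": (5, 7), "alpha": (8, 12), "beta": (13, 35)}


def band_power(signal, sample_rate, low_hz, high_hz):
    """Sum of squared DFT magnitudes (normalized by n^2) for bins within [low_hz, high_hz]."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    total = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate / n
        if low_hz <= freq <= high_hz:
            angle = 2.0 * math.pi * k / n
            re = sum(x * math.cos(angle * i) for i, x in enumerate(centered))
            im = sum(x * math.sin(angle * i) for i, x in enumerate(centered))
            total += (re * re + im * im) / (n * n)
    return total


def frontal_alpha_asymmetry(left, right, sample_rate):
    """ln(right alpha power) - ln(left alpha power); positive means more right alpha power.

    Raises ValueError if either channel has zero alpha power (log of zero).
    """
    low, high = BANDS_HZ["alpha"]
    return (math.log(band_power(right, sample_rate, low, high))
            - math.log(band_power(left, sample_rate, low, high)))


if __name__ == "__main__":
    fs = 128  # hypothetical sampling rate in Hz
    t = [i / fs for i in range(2 * fs)]  # two seconds of synthetic data
    left = [math.sin(2 * math.pi * 10 * x) for x in t]         # 10 Hz, amplitude 1
    right = [2.0 * math.sin(2 * math.pi * 10 * x) for x in t]  # 10 Hz, amplitude 2
    print(frontal_alpha_asymmetry(left, right, fs))
```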

The proximity of these value based decision correlates to the actual decision may be discouraging in the context of predicting and mitigating interaction behavior. Nonetheless, research efforts in decision neuroscience and in brain computer interface have found in fMRI studies that the correlates of an outcome of a decision to move at a time chosen by the subject are measurable up to 7 s before the conscious awareness of the decision is reached. Moreover, through analysis of these correlates, the intended goal of the decision can be decoded before conscious awareness of it arises (Haynes et al., 2007; Soon et al., 2008, 2013; Haynes, 2011; Perez et al., 2015). In two fMRI studies (Haynes, 2011; Soon et al., 2013) subjects were asked to decide at will when to either press a button on their left or right side, or add or multiply a set of numbers, and then to report when they were consciously aware of the decision. Spatial pattern analysis of the blood oxygen level dependent signal, a measure of neural activation in fMRI studies, revealed that the frontal polar cortex appeared to encode the intentions of the subjects before they reported having made the decision. In a driving study using implanted EEG electrodes in human epilepsy patients, a modulation of gamma power in the posterior lateral cortex predicted whether the subjects would turn left or right at an intersection before they consciously made the decision (Perez et al., 2015). These studies made use of technology such as fMRI that as yet is not available for real world applications, unlike EEG, because of the need for the subject to lie still in the large, non-portable fMRI equipment. Further, although EEG is portable, it cannot directly measure activity in deep cortical and subcortical areas, and these are exactly the areas that showed activity prior to a conscious decision.
However, the results are encouraging for making use of the interaction decision process to improve HAI and, moreover, can be leveraged into research aimed at developing models that capture relations between cortical and subcortical brain activations during value based decision making. Identification of such relations is an important avenue for future research aimed at active mitigation during HAI.

Indeed, decision neuroscience and brain computer interface research have yielded a precise understanding of decision making that could facilitate the development of methods for identifying an interaction decision within the contextual space of HAI. Efforts in decision neuroscience as well as in more applied domains, such as neuroergonomics, have shown that there are clear and measurable behavioral and psychophysiological correlates (fMRI activation patterns, EEG, SC, gaze fixation, heart rate, etc.) of component processes that are antecedent to the decision. In addition, decision neuroscience has begun to provide an understanding of the underlying cortical and sub-cortical processes involved in decision making. While many of these processes have as yet only been identified by fMRI, their understanding will allow meaningful hypotheses to be advanced about measures that can be recorded in real time.

# DISCUSSION

HAI systems have yet to live up to their expectations, and one critical reason is that human users often make inappropriate decisions about how and when to interact with an automation. These interaction decisions have traditionally been considered to be motivated by extant levels of TiA. Therefore, if TiA can be measured, it is expected that it can be managed and inappropriate interaction behaviors could be mitigated. Given that substantive body of theory, we appreciate that TiA is an important construct that undoubtedly affects human user interaction behavior, and hence we reviewed and synthesized the TiA literature. From that exercise, we observed that the relationship between TiA and human behavior is complex and not fully understood. Further, relevant to immediate real world applications for improving HAI system performance, TiA cannot be readily measured, and even if it were measurable in real time it is unclear how certain levels of TiA map onto specific interaction behaviors. By contrast, specific behaviors such as reliance and compliance are readily observed and measured in real time and do not have the confounding effect of inferring psychological causality. Such cause-agnostic variables are particularly attractive in HAI research aimed at defining concrete methods for improving joint system performance, both in terms of initial system design as well as for ultimate real-time applications.

Reframing the problem space of HAI and TiA as a problem of behavior, rather than of TiA, has been successful in allowing general predictions of interaction behavior based on knowledge of system design given specific environmental and internal conditions such as increased risk or increased workload, respectively. For example, knowing that an automation is prone to false alarms will allow the general prediction that a human user will often fail to comply with alerts. This understanding allows a system designer to set thresholds for alarms that are appropriate to intended use. For example, in high risk environments it may be better to set an alarm threshold low so that critical cases are not missed. This predictive ability has been significant in designing systems, but the knowledge is unlikely to allow active mitigation of interaction behavior on an individual basis in real time application, an implicit goal of the aims of HAI focused research. This is the case for two important reasons. First, we argue that focusing attention on interaction behavior does not go far enough because once the behavior has occurred it is too late to mitigate it in a timely manner. Second, behaviorally based predictions for automation use are by definition general because the predictive ability has been achieved through extensive observation of how human user behavior is affected by system design. The behaviorally based predictions, however, do not take into account individual variation and dynamic changes in environment. Therefore, they are unlikely to apply to individuals on a case by case basis. For example, it has been shown that a human user may continue to rely on an automation that persistently commits errors of omission, or "misses", due to automation independent reasons such as increasing workload.
Therefore, an approach is needed that considers individual cognitive and behavioral aspects, as well as ensuring that there is time to not only mitigate behavior, but allow the prediction of the likelihood of an interaction behavior.
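The alarm-threshold trade-off mentioned above (a low threshold misses fewer critical cases but produces more false alarms) can be sketched with a textbook equal-variance signal detection model. This model is our illustrative assumption and is not taken from the studies under review; the means, standard deviation, and thresholds are hypothetical.

```python
from statistics import NormalDist


def alarm_rates(threshold, noise_mean=0.0, signal_mean=2.0, sd=1.0):
    """Return (hit_rate, false_alarm_rate, miss_rate) for an alarm that fires above threshold.

    Assumes the monitored quantity is Gaussian with equal variance when a critical
    event is absent (noise) and when it is present (signal).
    """
    noise = NormalDist(noise_mean, sd)
    signal = NormalDist(signal_mean, sd)
    hit_rate = 1.0 - signal.cdf(threshold)         # alarm fires and the event is present
    false_alarm_rate = 1.0 - noise.cdf(threshold)  # alarm fires and the event is absent
    return hit_rate, false_alarm_rate, 1.0 - hit_rate


if __name__ == "__main__":
    for threshold in (0.5, 1.0, 1.5):
        hit, false_alarm, miss = alarm_rates(threshold)
        print(f"threshold={threshold:.1f} hit={hit:.3f} false_alarm={false_alarm:.3f} miss={miss:.3f}")
```

A sketch of this kind captures only the population-level design choice; it says nothing about whether a particular user will comply with a particular alert, which is the gap the interaction decision approach is meant to address.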

We note that antecedent to the interaction behavior is a decision, which has been characterized by the decision neuroscience community as fairly stereotyped and accompanied by measurable psychophysiological correlates. These properties of decisions suggest predictability, and importantly, as decisions are individual in nature, these properties also imply the likelihood of behavioral prediction at an individual level. We thus believe that understanding the interaction decision is a useful approach to improving HAI, and that, with future research, the decision correlates can be leveraged to predict the likelihood of an individual's impending interaction behavior. This approach not only addresses the problems just discussed, but takes into consideration the human and environmental variability that is found in real world situations in ways that research focused on reliance and compliance has yet to achieve. While it is true that many of the psychophysiological correlates of value based decisions are only measurable with fMRI, we consider that the discoveries afforded by fMRI studies provide a solid basis to form specific scientific hypotheses to guide future research aimed at understanding the interaction decision and consequent interaction behavior. We understand that the fruit of decision neuroscience research might be applied to any domain where one decision outcome is advantageous over another. However, our domain of interest is HAI, and therefore, our focus is on leveraging what is known about value based decision making to understand interaction decisions in the hopes of eventually predicting the likelihood of one decision over another. This goal will require future research, and we begin by forwarding hypotheses to guide research efforts.

One of our first assumptions in this review is that interaction decisions are in fact a case of value based decisions, and that assumption guides our first hypothesis: interaction decisions are a special instance of value based decisions, and therefore the neural correlates accompanying value based decisions will be observable during interaction decisions. One of the first avenues of research is to demonstrate that value based decisions and interaction decisions are analogous in that the sense of risk and reward is inherent in both. This could be achieved by measuring bilateral frontal alpha power during an interaction decision to look for the characteristic asymmetry found in situations entailing risk. Further research to support this hypothesis should necessarily include observing the predicted vmPFC-parietal activation found in value based decision making, during an interaction decision. The strength of this evidence, if found, could be enhanced if concurrent with the parietal-vmPFC activation there is significantly less activation of the dlPFC.

If our first hypothesis is confirmed, we can begin to make more specific hypotheses. Our second hypothesis relates to the fact that there is little understanding of the first stage of value based decision making, the observation of alternatives and representation of internal and external states. Understanding this stage could be particularly important for mitigating inappropriate behavior because it is also accompanied by physiological changes (SC, decreased heart rate variability), which are readily observed. We believe that future research should be aimed at revealing neural correlates of this stage. We hypothesize that activity in the frontal polar cortex and in areas of the parietal cortex during, or just prior to, a conscious decision to interact with an automation will occur along with, or just prior to, the physiological correlates. Evidence previously discussed, that activation patterns in the area of the frontal polar cortex in humans, and in the area of the mid-parietal in monkeys, can be decoded to reveal behavioral intention, supports this hypothesis. Should this hypothesis be confirmed, it would add significant evidence that the approach of focusing on interaction decisions will provide an improved method to mitigate interaction behaviors on an individual level. For example, consider the fact that users tend to rely on an automation in the face of risk as demonstrated by traditional behavioral research. However, if this interaction behavior is inappropriate, but the decision to rely can be decoded, there is a chance to mitigate the inappropriate interaction behavior.

Beyond these hypotheses, we believe that future research should also be focused on understanding the psychophysiological basis for, or correlates of, the interaction behaviors of relying on or complying with an automation. One first step should include finding the psychophysiological correlates of the demonstrated tendencies of users to rely on or comply with automations in circumstances such as workload and risk. For example, what psychophysiological processes drive a human user to perhaps over-comply with an automation in conditions of increased cognitive workload, and are there measurable correlates that would suggest that this is the likely interaction behavior? Conversely, what psychophysiological correlates can be found, apart from alpha asymmetry, that appear to suggest the human user is perceiving increased risk, and therefore more likely to over-rely on an automation? In order to mitigate disuse, potential psychophysiological correlates, such as particular levels of heart rate variability, SC and EEG power, require future study.

Finally, though on a longer horizon, we suggest that it might ultimately be feasible to leverage the understanding of value based decisions in behavioral mitigations aimed at improving HAI system performance. Logically, management of a particular interaction between a human and an automated system requires a minimum of three elements. First, it is essential to understand the capabilities and vulnerabilities of both the particular operator and the particular automation as well as how these may vary under different task and contextual constraints. With such knowledge, one may be able to infer an optimal strategy for allocation of control or decision authority, such as has been done with behaviorally based predictions. Second, though the behavior of automated controls is relatively predictable with knowledge of how its control system was designed, establishing likelihood of human behaviors is a much more challenging task. Therefore, it is also critical to develop methods for prediction of likely changes in operator behavior on a time-scale that leaves room for active intervention through the understanding of the interaction decision. Third, an understanding of how to influence the decision process of humans in principled ways is necessary to ultimately define appropriate systems of actuation when inappropriate behaviors are expected. Of these three elements, it seems that the second may be the most challenging. This is because it is relatively trivial to establish baseline operational or performance characteristics of both humans and automated systems and it is already known that human behaviors and perceptions are subject to influence by a variety of factors, including workload, display properties, transparency, and may be amenable to influence by other task and contextual factors. 
However, predicting impending behavioral choices is particularly challenging because this requires methods to develop advance insight into the unfolding of the decision process that has largely been studied through the use of fMRI. Here, we offer one way of addressing this: the application of modern techniques from cognitive neuroscience and psychophysiology.

# CONCLUSION

The main purpose of this review is to explore the gap between the understanding of TiA and the actual human user interaction behavior, which does not appear to have a clear mapping from TiA levels. We argue in this article that, in addition to understanding the influence of changing levels of TiA, understanding the antecedent decision of the human user's interaction behavior is critical for improving HAI system performance. Decisions have not been explicitly studied in the context of HAI and TiA specifically, but due to the importance of these interaction decisions we reviewed decision making literature and summarized findings that provide a basic understanding of the psychophysiological processes involved in decision making. We are particularly interested in value based decision making because, just as in the case of TiA, if there is no risk, the behavior ceases to be important. While the value based decision process is not yet fully understood as it relates to interaction behaviors, there is a significant understanding of the underlying psychophysiological processes and correlates. This knowledge can be used to advance hypotheses that define a research path aimed at achieving mitigation of human user interaction behaviors.

## AUTHOR CONTRIBUTIONS

KD: completed literature review and steered review towards current domain; main contributing author. JSM: final approval authority, guided literature review, advanced concepts germane to content, significant contribution to authorship. ARM: advised in fundamental concepts regarding BCI, HCI, and HAI; also reviewed, edited, and approved successive steps in the process. JRL: provided expertise in the cognitive neuroscience of trust in automation; also reviewed, edited and approved successive review and writing processes.

#### ACKNOWLEDGMENTS

This research was supported by the U.S. Office of the Secretary of Defense through the Autonomy Research Pilot Initiative (MIPR DWAM31168) as well as by an appointment to the US Army Research Laboratory Postdoctoral Fellowship program administered by the Oak Ridge Associated Universities through a cooperative agreement with the US Army Research Laboratory. We would also like to acknowledge Dr. Kristin Schaefer for her insights and guidance regarding understanding the complex set of human factors that influence trust in automation. The authors are also very grateful for the feedback and guidance of two reviewers whose suggestions have proven rather valuable in improving the quality of the final work.



**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer DM and handling Editor declared their shared affiliation, and the handling Editor states that the process nevertheless met the standards of a fair and objective review.

Copyright © 2016 Drnec, Marathe, Lukos and Metcalfe. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.