# NEURAL IMPLEMENTATIONS OF EXPERTISE

EDITED BY: Merim Bilalić, Robert Langner, Guillermo Campitelli, Luca Turella and Wolfgang Grodd

PUBLISHED IN: Frontiers in Human Neuroscience

#### *Frontiers Copyright Statement*

*© Copyright 2007-2015 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-688-3 DOI 10.3389/978-2-88919-688-3


## **NEURAL IMPLEMENTATIONS OF EXPERTISE**

Topic Editors: **Merim Bilalić,** Alpen Adria University Klagenfurt, Austria **Robert Langner,** Heinrich Heine University Düsseldorf, Germany **Guillermo Campitelli,** Edith Cowan University, Australia **Luca Turella,** University of Trento, Italy **Wolfgang Grodd,** University Hospital Aachen, Germany

When we think about expertise, we usually consider people who master tasks at a level not reachable by most other people. Although we rarely realise it, most humans are experts in many aspects of everyday life. This expertise enables us to find our way through the complex environment that is our life. For instance, we can instantly recognise multiple objects and relations between them to form a meaningful unit, such as an office. Thus, research on expertise is not only important to investigate the cognitive and neural processes within an "elite" group, but it is also a powerful tool to understand how everyone can acquire complex skills.

The goal of this RESEARCH TOPIC is to shed further light on the common and distinct neural mechanisms that implement various kinds of expertise. We broadly define expertise as skill in any perceptual, cognitive, social or motor domain, with the common core being optimised information processing due to knowledge acquired from repeated experiences. Thus, we are interested in the full range of mental processes modulated or modified by expertise, from "simple" object or pattern recognition to complex decision making or problem solving in a particular domain. These domains can range from everyday or occupational expertise to sports and rather artificial domains such as board games. In all cases, the aim should be to elucidate how the brain implements these sometimes incredible feats. We are particularly interested in connecting cognitive theories about expertise and expertise-related performance differences with models and data on the neural implementation of expertise. We welcome original research contributions using the full range of behavioural neuroscience methods, as well as theoretical, methodological or historical reviews, and opinion papers focusing on any of the above-mentioned aspects.

**Citation:** Bilalić, M., Langner, R., Campitelli, G., Turella, L., Grodd, W., eds. (2015). Neural Implementations of Expertise. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-688-3

# Table of Contents


*Assimilation of L2 vowels to L1 phonemes governs L2 learning in adulthood: a behavioral and ERP study*

Mirko Grimaldi, Bianca Sisinni, Barbara Gili Fivela, Sara Invitto, Donatella Resta, Paavo Alku and Elvira Brattico

*Training of ultra-fast speech comprehension induces functional reorganization of the central-visual system in late-blind humans*

Susanne Dietrich, Ingo Hertrich and Hermann Ackermann

Assaf Harel, Dwight Kravitz and Chris I. Baker

Alan C.-N. Wong and Yetta K. Wong

Marcelo S. Brogliato, Daniel M. Chada and Alexandre Linhares

Emily B. J. Coffey and Sibylle C. Herholz

*The neural circuitry of expertise: perceptual learning and social cognition*

Michael Harré

## Editorial: Neural implementation of expertise

Merim Bilalić<sup>1</sup>\*, Robert Langner<sup>2</sup>, Guillermo Campitelli<sup>3</sup>, Luca Turella<sup>4</sup> and Wolfgang Grodd<sup>5</sup>

<sup>1</sup> Department of Cognitive Psychology, Alps Adria University Klagenfurt, Klagenfurt, Austria, <sup>2</sup> Clinical Neuroscience and Medical Psychology, Heinrich Heine University Düsseldorf, Düsseldorf, Germany, <sup>3</sup> School of Psychology and Social Science, Edith Cowan University, Perth, WA, Australia, <sup>4</sup> Center for Mind/Brain Sciences, University of Trento, Trento, Italy, <sup>5</sup> Department of Magnetic Resonance, Max Planck Institute for Biological Cybernetics, Tuebingen, Germany

Keywords: expertise, neurosciences, sports, music, chess, go, language, perception

How the brain enables humans to reach an outstanding level of performance typical of expertise is of great interest to cognitive neuroscience, as demonstrated by the number and diversity of the articles in this Research Topic (RT). The RT presents a collection of 23 articles written by 80 authors on traditional expertise topics such as sport, board games, and music, but also on the expertise aspects of everyday skills, such as language and the perception of faces and objects. Just as the topics in the RT are diverse, so are the neuroimaging techniques employed and the article formats. Here we will briefly summarize the articles published in the RT.

### Board Games

The traditional expertise domain of board games has been covered in the RT by two articles, both employing the expertise approach of pitting experts against novices (Bilalić et al., 2010, 2012, 2014) but using different neuroimaging techniques. Bartlett et al. (2013) employed fMRI to demonstrate that chess experts engage the fronto-parietal network when they try to find a logical pattern in a "constellation" of randomly placed chess pieces. Jung et al. (2013) found structural differences as well as differences in brain networks between Baduk (Korean name for the board game Go) experts and novices, which point to the importance of visuospatial processing in the problem solving and decision making of board-game experts.

#### Edited and reviewed by:

Srikantan S. Nagarajan, University of California, San Francisco, USA

> \*Correspondence: Merim Bilalić, merim.bilalic@aau.at

Received: 09 June 2015 Accepted: 17 September 2015 Published: 30 September 2015

#### Citation:

Bilalić M, Langner R, Campitelli G, Turella L and Grodd W (2015) Editorial: Neural implementation of expertise. Front. Hum. Neurosci. 9:545. doi: 10.3389/fnhum.2015.00545

### Sport

Wright et al. (2013) extended the research on anticipation of action in sport by showing that the neural basis for deception involves, besides the well-known action observation network, the structures responsible for social cognition and affection. Turella et al. (2013) review other recent studies on the anticipation of action in sport and connect them with the mirror neurons in animal research. The review by Chang (2014) deals with motor domains such as sports and music and the structural and functional changes associated with expertise. Debarnot et al. (2014) go a step further in their review and contrast the neural changes during skill acquisition with those in mental training techniques such as motor imagery and meditation.

### Music

Music has been one of the most often investigated domains in expertise because its complexity and richness enable researchers to tackle diverse topics. The variety of themes in the domain of music is also evident in this RT. Tervaniemi et al. (2014), for example, pitted expert musicians against novices in a novel paradigm to investigate memory and attentional processes with EEG. On the other hand, Bergman Nutley et al. (2014) used the music domain to investigate longitudinal effects on cognitive processes such as working memory, speed of processing, and reasoning, while Fauvel et al. (2013) apply the promising findings of transfer and neural plasticity associated with musical practice to cognitive aging in their review.

### Language

Unlike the previous articles, which deal with specialized expertise domains, a number of contributions highlight the fact that even the everyday skills we often take for granted represent impressive feats of human expertise. One group of articles deals with language, which is one such everyday skill. Reichle and Reingold (2013) review the electrophysiological evidence of the link between eye movements and the mind during reading. The learning of a second language based on its similarity to one's native language was investigated by Grimaldi et al. (2014), while Dietrich et al. (2013) demonstrated the neural changes associated with the process of learning to comprehend speech that was several times faster than normal speech. Finally, Lotze et al. (2014) demonstrate by means of resting-state fMRI that people who write highly creatively have increased functional connectivity between the task-related brain regions in the right hemisphere but reduced interhemispheric connectivity.

### Perception

Similarly, a couple of articles deal with perception of own-race and other-race faces (Wiese, 2013) as well as with perception of familiar faces and objects and the functional connectivity within the medial temporal lobe (McLelland et al., 2014). The role of the fusiform face area (FFA) in expertise has been a bone of contention between Harel et al. (2013, 2014), on the one hand, and Wong and Wong (2014), on the other.

### Theoretical and Simulation Work

Finally, a number of articles provide either new theoretical ideas or revisions of already established theories. Campitelli and Speelman (2013) highlight the advantages of using the expertise paradigm in investigating memory, while Brogliato et al. (2014) expand the Sparse Distributed Memory (SDM) model to incorporate the effects of practice on memory retrieval. Guida et al. (2013) extend their two-stage framework of skill acquisition (Guida et al., 2012) by arguing that functional cerebral reorganization (FCR) is the neural signature of expertise. The way one structures training studies is considered by Coffey and Herholz (2013), who suggest a new approach for characterizing and deconstructing the task requirements in training studies. Finally, Harré (2013) demonstrates the parallels between two seemingly unrelated fields, perceptual expertise and social cognition.

### Conclusion

It is clear that we cannot do justice to all submissions in this brief editorial. We hope, however, that our brief summary demonstrates the diversity in topics and methods employed in research on human expertise and also, indirectly, the growing interest in the field of expertise. It should become evident that research on expertise is not only relevant for understanding exceptional human performance but also for understanding how mind and brain work more generally. We are grateful to all authors for their contribution and hope that the RT, with its broad and deep coverage, will provide a useful reference for the reader interested in expertise and, particularly, current approaches to its neural implementation.

### References

Grimaldi, M., Sisinni, B., Gili Fivela, B., Invitto, S., Resta, D., Alku, P., and Brattico, E. (2014). Assimilation of L2 vowels to L1 phonemes governs L2 learning in adulthood: a behavioral and ERP study. Front. Hum. Neurosci. 8:279. doi: 10.3389/fnhum.2014.00279


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Bilalić, Langner, Campitelli, Turella and Grodd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

### Expertise and processing distorted structure in chess

#### *James C. Bartlett<sup>1</sup>\*, Amy L. Boggan<sup>2</sup> and Daniel C. Krawczyk<sup>1,3</sup>*

*<sup>1</sup> Program in Cognition and Neuroscience, School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA*

*<sup>2</sup> Department of Psychology, Young Harris College, Young Harris, GA, USA*

*<sup>3</sup> Department of Psychiatry, The University of Texas Southwestern Medical Center, Dallas, TX, USA*

#### *Edited by:*

*Merim Bilalić, University of Tübingen Clinic, Germany*

#### *Reviewed by:*

*Guillermo Campitelli, Edith Cowan University, Australia Fernand Gobet, University of Liverpool, UK*

#### *\*Correspondence:*

*James C. Bartlett, School of Behavioral and Brain Sciences, The University of Texas at Dallas, 800 W Campbell Rd., Richardson, TX 75083-0688, USA*

*e-mail: jbartlet@utdallas.edu*

A classic finding in research on human expertise and knowledge is that of enhanced memory for stimuli in a domain of expertise as compared to either stimuli outside that domain, or within-domain stimuli that have been degraded or distorted in some way. However, we do not understand how experts process degradation or distortion of stimuli within the expert domain (e.g., a face with the eyes, nose, and mouth in the wrong positions, or a chessboard with pieces placed randomly). Focusing on the domain of chess, we present new fMRI evidence that when experts view such distorted/within-domain stimuli, they engage an active search for structure—a kind of exploratory chunking—that involves a component of a prefrontal-parietal network linked to consciousness, attention and working memory.

**Keywords: chess, chunking, consciousness, expertise, meaning, prefrontal-parietal network, structure**

#### **INTRODUCTION**

A useful strategy for addressing a complex cognitive process is to present people with stimuli that engage that process, but which will also disrupt or interfere with it, creating errors or difficulties in its execution. A classic example is Bartlett's (1932) famous study of memory for an English translation of a North American (Inuit) folk tale called "The War of the Ghosts." To his English participants, the story was strange with bizarre details and weird turns of events, and yet it was, quite clearly, a story. The well-known finding was that participants' reproductions of the story were distorted in a way that made them more coherent and plausible than the original story was, a phenomenon Bartlett called "rationalization" and which he attributed to "effort after meaning." According to Bartlett (and many others since), rationalization and effort after meaning cannot be studied using meaningless stimuli, such as lists of nonsense syllables, because the relevant process, effort after meaning, will not be activated with such materials. At the same time, the cognitive effects of effort after meaning may be hard to discern with materials that are easy to interpret and readily assimilated with a person's prior knowledge. The effortful component of effort after meaning might be minimized in such cases.

Much more recently, Bor and colleagues (Bor and Owen, 2007; Bor, 2012; Bor and Seth, 2012) have proposed a conception of human consciousness that emphasizes the importance of frontal and parietal neural networks in the active search for patterns or chunks in stimulus displays, a process akin to Bartlett's concept of "effort after meaning." A core observation comes from working memory tasks in which participants are able to improve their memory for a sequence of numbers by detecting that the sequence follows a rule or is a repetition of a previously studied sequence, allowing chunking on that basis. Chunking of such sequences is associated with extensive activation of a prefrontal-parietal network, as detected by fMRI. In considering these studies, it is important to keep in mind the distinction between active, strategic chunking and the identification of overlearned stimulus patterns such as familiar words and acronyms (e.g., dog, FBI; see Gobet et al., 2001, for an elaborated theoretical discussion). In Bor's theory, it is only active, strategic chunking that engages the prefrontal-parietal network. Thus, as chunking of stimuli in a given domain becomes automatized through practice (as might be the case in domains of expertise), the role of the prefrontal-parietal network will be decreased. For the sake of clarity, we will refer to such automatized chunking as "pattern recognition."

The present paper is focused on stimuli that differ rather drastically from sequences of numbers, but which are well suited for the study of chunking and pattern recognition as a function of expertise. Specifically, we examine working memory for chessboard displays by master-level chess players, as well as, for comparison, less skilled players and novices at chess. Our findings suggest that chess masters engage at least one component of the prefrontal-parietal network in the service of chunking chessboard displays. However, they do this more with "scrambled" displays—boards on which the pieces are placed randomly—as opposed to normal displays that represent possible chess game positions. We argue that our findings are in line with the view that the prefrontal-parietal network is involved with strategic, non-automatized chunking, as opposed to the automatized pattern recognition that occurs with chess experts viewing normal chessboard displays.

The basis for our argument is perhaps the best known finding from over 60 years of research on expertise: Chess experts are much better at recalling normal displays than randomized displays and, with the former, their recall is much greater than that of novices or less skilled players (Chase and Simon, 1973). The result has been attributed to knowledge structures in long-term memory that allow experts to encode a normal chessboard as a relatively small number of patterns or groups, each including several pieces and their relative positions on the board. Novices lack such knowledge structures, and therefore, are unable to perform this type of grouping. Less skilled players have fewer such structures, and/or less elaborated structures, as compared to more skilled players. Therefore, less skilled players encode chessboards less effectively than more skilled players do, encoding fewer and smaller patterns (see, e.g., Gobet and Simon, 1996a,b).

The pattern recognition account of normal/random chessboard recall is virtually unchallenged in the expertise literature, and we accept it for purposes of the present research. However, little is known about what occurs when expert players encounter random chess displays. Gobet and Simon (1996a) marshaled evidence that recall of random chessboards is positively correlated with chess expertise, albeit more weakly than the recall of normal chessboards. This finding suggests that experts perform some degree of pattern recognition, even with random boards. Indeed, this finding (and others) was predicted by a computer model that was trained in identifying patterns of pieces in positions from master-level chess games (Gobet and Simon, 1996b). As the degree of simulated training increased, the model recognized more patterns in random boards.

Yet the processes that differentiate more and less skilled players when encoding random boards are not fully understood. The computer simulations of Gobet and Simon (1996b) give strong support to one hypothesis: Because chess experts have a huge database of chess-piece configurations in long-term memory, they are more likely than less skilled players to recognize meaningful patterns that occur by chance in random games. However, Gobet and Simon noted that some of the chunks encoded by chess players are not meaningful in chess, citing the example of an expert noticing that three white pawns formed a diagonal from a1. Based on such observations, they concluded that "... chessplayers may use special strategies to recall pieces on a board that is almost bare of familiar patterns" (p. 501). Similarly, Gobet and Simon (1996a) suggested that stronger and weaker players might differ in "the possession of strategies for coping with uncommon positions" (p. 161).

The present study addressed an idea that links the special-strategies hypothesis to the prefrontal-parietal network as conceptualized by Bor and Owen (2007). We propose that experts' processing of random chess displays differs from their processing of normal displays not only quantitatively (involving fewer and/or smaller patterns or groups), but qualitatively as well, engaging the prefrontal-parietal network in an active search for chunks. The chunks in question may include patterns of pieces identical or similar to what might occur in real chess games, as well as illegal, strange, or highly unlikely patterns that are, nonetheless, encodable based on knowledge of chess (e.g., three white pawns on a diagonal from a1). The key claim is that experts engage this active, knowledge-based search process more than less skilled players do.

Our thinking took as its starting point two recent studies comparing neural processing of chessboard displays (as well as faces and other stimuli) by experts and novices at chess (Bilalić et al., 2011; Krawczyk et al., 2011). Both studies were focused on the fusiform face area (FFA) in the ventral temporal cortex, due to its importance in the processing of faces. Further, both addressed the question of whether the FFA is better characterized as being face-specific (responding more to faces than non-faces) or expertise-specific (responding to faces as well as other objects with which observers have high expertise). Using a standard working memory task (one-back), Bilalić et al. (2011, Experiment 1) reported that FFA activity was substantially greater for faces than chessboards, whether shown in standard upright orientation or upside-down. However, they also found that FFA activity in response to chessboards was greater among expert players than among novices at chess. Using a similar one-back working memory task, Krawczyk et al. (2011) also observed substantially higher FFA activation for faces than chessboards, though they found no evidence that FFA activation in response to chessboards was greater among experts than novices. Despite this discrepancy, the two studies converged in another respect: Both showed that FFA activation was as strong if not stronger with random chessboards than normal boards. In fact, in five of six conditions across Experiments 2 and 3 of the Bilalić et al. study, there was a statistically reliable interactive pattern such that experts showed stronger FFA activation with random boards than normal boards, while novices showed no difference. The normal-random comparison in the Krawczyk et al. study produced only non-significant trends in FFA activation, possibly due to the limited sample size.

The experiment reported here is the same as that reported by Krawczyk et al. (2011), with the addition of: (a) five new master-level chess experts, bringing the total expert sample to an *n* of 11, and (b) five midlevel players. According to their international Elo ratings (Elo, 1986), our master-level experts (*M* = 2469) had greater expertise than the Bilalić et al. experts (*M* = 2117). Our midlevel players had lower expertise, yet they were active players with national Elo ratings (*M* = 1501), and were substantially more skilled than our novice participants (*n* = 6), all of whom had played chess but did not do so regularly. Our goal was to further assess the strength of our prior observations, and, more importantly, to determine if experts' processing random boards produces not only an FFA response, but also activation of the prefrontal-parietal network previously described by Bor and Owen (2007). Our guiding hypothesis was that players with greater expertise would engage an active chunking strategy with the random chessboards to maximize their performance on the working memory task, and that their use of this strategy would involve activation in the prefrontal-parietal network, possibly extending to FFA regions due to top-down control effects (Corbetta and Shulman, 2002). Others have reported that prefrontal-parietal activation associated with working memory typically decreases in novices with practice up to a certain point at which functional reorganization occurs with greater expertise (Guida et al., 2012). Once an expert has achieved functional reorganization, he or she is more likely to access long term memory representations in the domain of expertise. Our expert players would likely fit this profile, showing *reduced* prefrontal-parietal activation during working memory for normal chess displays. Notwithstanding, an active chunking hypothesis predicts they will show *increased* prefrontal-parietal activation during working memory for random displays.

We planned to test our hypothesis with a whole-brain analysis as well as with more focused region-of-interest (ROI) comparisons based on coordinates for seven prefrontal-parietal regions provided by Bor and Owen (2007). These regions consisted of the anterior cingulate cortex (ACC) (Duncan and Owen, 2000), the left and right inferior parietal sulcus (IPS) (Duncan, 2006), the left and right ventrolateral prefrontal cortex (VLPFC) (Bor and Owen, 2007), and the left and right dorsolateral prefrontal cortex (DLPFC) (Bor et al., 2003, 2004). We also collected subjective reports of participants' chunking activity at the end of the experimental session. One goal was to determine whether any increment in activation for random boards as compared to normal boards extends throughout the entire prefrontal-parietal network, or is restricted to one or two components. A second goal was to test our assumption that such activation increments reflect an active search for chunks. To the extent that they do, these activation increments should show correlations with subjective reports of chunking.

In addition to the prefrontal-parietal ROI analyses, we also conducted ROI analyses for the left and right FFA. The goal here was to determine whether activation increments for random boards in the FFA (Bilalić et al., 2011) are linked to activation increments in the prefrontal-parietal network.

One final ROI analysis was aimed at linking our active-chunking hypothesis with accumulating evidence that highly expert individuals show activation in medial-temporal brain regions in working memory tasks with objects of expertise (see Campitelli et al., 2007 and Guida et al., 2012). The dominant explanation for these medial-temporal activations is that well-formed stimuli in a domain of expertise contain many familiar patterns that activate representations in long-term memory, allowing long-term memory to support performance in working memory tasks. Because normal chessboards contain more familiar patterns than randomized boards, expertise-related medial-temporal activations should be largely restricted to normal chessboards. Hence, while expertise-related prefrontal-parietal activations should be stronger with random boards than normal boards, expertise-related medial-temporal activations might be stronger with normal boards than random boards. We chose four medial-temporal ROIs (the left and right hippocampus and the left and right parahippocampal gyrus) to assess this possibility.

#### **MATERIALS AND METHODS**

#### **SUBJECTS**

Subjects were 22 healthy, right-handed, male volunteers. Eleven subjects were chess experts recruited from the UT Dallas Chess Program, age 19–28 (*M* = 23 years). These subjects ranked within the top one percent of active tournament players (three Grandmasters; eight International Masters) at the time of the experiment. Subject expertise was substantiated by their competitive ratings (Elo range 2353–2570; *M* = 2469), their years playing chess (*M* = 16 years), and their tournament activity (*M* = 13 per year). Six of the remaining subjects were healthy males who were chess novices age 21–27 (*M* = 25 years). These subjects reported that they rarely played chess and had never participated in chess tournaments. Lastly, we included five players (age range 19–40, *M* = 24) who had some tournament experience in chess and were competitively rated (Elo range = 1332–1634; *M* = 1495), but did not approach the skill-level of the experts. Given the strong differences in expertise level between the experts and the other two groups (novices and individuals with some experience), we collapsed the non-expert groups forming a larger group of 11 individuals termed "less experienced players." Notably there was no significant difference in behavioral accuracy on the chessboard conditions between the true novices and the individuals with some chess experience. This experimental protocol received approval from the Institutional Review Boards of The University of Texas at Dallas and UT Southwestern Medical Center. All subjects provided informed consent in accordance with the 1964 Declaration of Helsinki.

#### **MRI PERCEPTION TASK**

We used a one-back task previously described by Krawczyk et al. (2011). In the task we presented blocks of visual items and subjects judged whether each item was a repeated image or new image. Stimuli consisted of sets of images of chess boards from normal games, randomly positioned chess boards that could not occur in normal games, faces, everyday objects (from Geusebroek et al., 2005), and outdoor scenes (**Figure 1**). Images were presented in five runs of 8 blocks with 12 images per block, 2 s per image, 500 ms inter-stimulus-interval (ISI). Stimulus exposure times and ISIs were set to ensure that novices could reasonably perform the task given the degree of visual complexity present in chess board stimuli. Images were presented offset from center to the right or left in an alternating sequence to avoid apparent motion effects that occur in the chess conditions between non-matching stimuli that occur in sequence. One or two images repeated per block, and subjects were instructed to press both buttons (one in each hand) when a repeat was detected. Each block contained one type of image (e.g., faces) or was a fixation block lasting 30 s. Block order was presented in a pseudo-randomized manner.
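The block structure described above can be sketched in code. This is a minimal illustration, not the authors' stimulus-presentation script: the helper names (`make_one_back_block`, `score_block`), the image labels, and the assumption that an ISI follows every image are ours; only the counts and timings (12 images per block, 2 s per image, 500 ms ISI, one or two repeats per block) come from the text.

```python
import random

IMAGES_PER_BLOCK = 12   # from the text
STIM_S = 2.0            # stimulus duration in seconds, from the text
ISI_S = 0.5             # inter-stimulus interval in seconds, from the text


def make_one_back_block(pool, n_repeats, rng):
    """Return 12 image labels in which exactly `n_repeats` items repeat the
    immediately preceding item (hypothetical helper, for illustration)."""
    n_unique = IMAGES_PER_BLOCK - n_repeats
    unique = rng.sample(pool, n_unique)
    # Choose which of the unique items are shown twice in a row.
    doubled = set(rng.sample(range(n_unique), n_repeats))
    block = []
    for i, img in enumerate(unique):
        block.append(img)
        if i in doubled:
            block.append(img)  # the one-back repeat
    return block


def score_block(block):
    """Correct answer per trial: True where an item matches the previous one."""
    return [i > 0 and block[i] == block[i - 1] for i in range(len(block))]


rng = random.Random(0)
pool = [f"board_{k:02d}" for k in range(40)]  # hypothetical image labels
block = make_one_back_block(pool, n_repeats=2, rng=rng)
targets = score_block(block)

# Assumes one ISI after each image, including the last one in the block.
block_duration_s = IMAGES_PER_BLOCK * (STIM_S + ISI_S)
```

Under that timing assumption a stimulus block lasts 12 × 2.5 s = 30 s, the same length as the fixation blocks.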

#### **POST-SESSION QUESTIONNAIRE**

After the imaging task, participants completed a questionnaire on which they rated the difficulty they experienced with normal chessboards, random chessboards, faces, scenes, and objects on a 1 to 7 scale. They also estimated the average number of groupings they perceived with normal and random boards, the average number of chess pieces per grouping, and the average total number of pieces they tracked.

#### **FUNCTIONAL MRI ACQUISITION AND ANALYSIS**

Images were acquired using a 3T Philips Achieva MRI scanner running a gradient echoplanar sequence (*TR* = 2000 ms, *TE* = 28 ms, flip angle = 20°) sensitive to BOLD contrast. Each volume consisted of tilted axial slices (3 mm thick, 0.5 mm slice gap) that provided whole brain coverage. Anatomical T1-weighted images were acquired in the following space: *TR* = 2100 ms, *TE* = 10, slice thickness = 4 mm with no gap at a 90° flip angle. Head motion was limited using foam head padding.

FMRI block design analyses were carried out using multiple regression. Preprocessing was conducted using SPM5 (Wellcome Trust Center for Neuroimaging, www.fil.ion.ucl.ac.uk/spm). Echoplanar Images (EPIs) were realigned to the first volume of acquisition and then smoothed (8 mm 3D Gaussian kernel). Separate regressors were used to model each block-type. Each regressor was convolved with a canonical hemodynamic response function (HRF) used to model blood oxygen level dependent (BOLD) responses to trial blocks. A *t*-statistic was generated for each voxel, and a subsequent map (an SPM) was created. Linear contrasts were used to test the relative activation associated with conditions of interest. Resulting contrast maps reflected the differences in activation between the conditions at each voxel location. Significant voxels met both a whole brain threshold of *p* < 0.001, and a minimum Familywise Error-corrected cluster size threshold computed using the SPM data structure for each relevant contrast map (requiring minimum cluster sizes ranging from 60 to 100 contiguous voxels). Contrasts of normal chessboards minus random chessboards and of random chessboards minus normal chessboards were computed for each of the subjects independently. We then performed a second-level analysis of the group activation for these contrasts. Finally, we performed a between groups analysis contrasting the experts and less experienced players on both the normal-minus-random contrast and random-minus-normal contrast in order to localize areas in which the effects of chessboard organization differed between groups.
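To make the regressor construction concrete, the following sketch builds a boxcar for one block type and convolves it with a double-gamma HRF. It is an illustration under stated assumptions rather than the SPM5 implementation: the gamma shape parameters (6 and 16) and the 1/6 undershoot ratio follow a commonly used double-gamma form, and the block onsets, run length, and function names are hypothetical. Only the TR of 2 s and the block-wise modeling come from the text.

```python
import math

TR = 2.0  # seconds, from the acquisition parameters in the text


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density, written out so only the stdlib is needed."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (math.gamma(shape) * scale ** shape)


def double_gamma_hrf(t, peak=6.0, undershoot=16.0, ratio=6.0):
    """Double-gamma HRF: an early response peak minus a scaled late undershoot
    (parameter values are assumptions, not taken from the paper)."""
    return gamma_pdf(t, peak) - gamma_pdf(t, undershoot) / ratio


def block_regressor(n_scans, onsets_s, duration_s, tr=TR, hrf_len_s=32.0):
    """Boxcar for one block type, convolved with the HRF, sampled every TR."""
    boxcar = [0.0] * n_scans
    for onset in onsets_s:
        for i in range(n_scans):
            if onset <= i * tr < onset + duration_s:
                boxcar[i] = 1.0
    hrf = [double_gamma_hrf(k * tr) for k in range(int(hrf_len_s / tr) + 1)]
    # Discrete convolution, truncated to the length of the run.
    return [
        sum(boxcar[i - k] * hrf[k] for k in range(len(hrf)) if i - k >= 0)
        for i in range(n_scans)
    ]


# Hypothetical run: two 30 s blocks of one condition, starting at 30 s and 150 s.
regressor = block_regressor(n_scans=120, onsets_s=[30.0, 150.0], duration_s=30.0)
```

One such regressor per block type would form the columns of the design matrix; a contrast such as random minus normal is then a weighted difference of the fitted coefficients for the two chess regressors.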

#### **RESULTS**

#### **BEHAVIORAL RESULTS**

Experts performed with significantly greater accuracy (*M* = 98%) than less skilled players (*M* = 84%) for normal chessboard stimuli, *t*(19) = 3.09, *p* < 0.01. There were no group differences and high accuracy with random chessboards (experts: *M* = 89%, less skilled: *M* = 86%), faces (experts: *M* = 96%, less skilled: *M* = 93%), objects (experts: *M* = 95%, less skilled: *M* = 88%), and scenes (experts: *M* = 82%, less skilled: *M* = 86%).

The expert and less skilled groups varied on several questionnaire dimensions reported after the task, further establishing that the two groups differed in their processing of chessboard stimuli (evaluated with independent samples *t*-tests). Note that data were incomplete for some comparisons because some participants did not answer some questions. Looking first at the difficulty ratings, less skilled players reported significantly greater difficulty (*n* = 11, *M* = 4.27, 1–7 scale) with tracking normal chess stimuli than experts (*n* = 11, *M* = 2.05), *t*(20) = 3.82, *p* < 0.001, but not with random chess stimuli (*M*'s = 4.18 and 4.52, respectively). There were no significant differences in reported difficulty for comparisons of experts and less skilled players on faces, scenes, or objects. We note that nine of the eleven experts reported that the random chessboards were more difficult to perceive than the normal chessboards, with the two remaining experts reporting that they were equally difficult. Only three of the eleven less skilled players reported that the random boards were more difficult to perceive than the normal boards, with all others reporting equal difficulty.

With normal chessboards, experts reported tracking a greater number of pieces (*n* = 9, *M* = 21.22) than did less skilled participants (*n* = 11, *M* = 6.73), *t*(18) = 4.75, *p* < 0.0001, as was expected given the difference in expertise. With random boards as well, experts reported tracking more pieces (*n* = 9, *M* = 10.28) than less skilled participants (*n* = 11, *M* = 5.00), though the difference was smaller and only marginally significant, *t*(18) = 1.80, *p* = 0.08.

Many participants reported seeing groupings of pieces (i.e., chunks) within chessboards. Experts reported more pieces per grouping (*n* = 8, *M* = 9.69) than less skilled players (*n* = 10, *M* = 2.75), *t*(16) = 2.15, *p* < 0.04. Looking at normal and random chessboards separately, there was also a marginally significant difference with experts reporting more groupings of pieces than did less skilled participants in normal chessboards (*n* = 8, 10, *M*'s = 2.94 and 1.45, respectively, *t*(16) = 1.81, *p* = 0.08), but not in random chessboards (*M*'s = 1.80 and 1.00).

#### **NEUROIMAGING RESULTS**

Our initial comparisons were conducted independently on each group (experts and less skilled players). A normal—random contrast (i.e., the normal-minus-random-chess difference) showed activation in the bilateral insula among the experts (**Figure 2A**), but it did not show any significant regions of activation among the less skilled players. The reverse (random-minus-normal) comparison resulted in extensive activation in the experts within the left inferior parietal lobe, the left middle frontal gyrus, the lingual gyrus of the occipital lobe bilaterally, the left cuneus, and the right temporal lobe (**Figure 3A**). By contrast, the less skilled players showed activation only within the bilateral fusiform gyrus for the random—normal comparison (**Figure 3B**, and refer to **Table A1** in the Appendix for activation coordinates and cluster sizes). Note that the random—normal contrast showed extensive activation in the expert group but not the less skilled group within parietal and frontal cortical regions that overlap with regions of the prefrontal-parietal network identified by Bor and Owen (2007).

**FIGURE 2 | Task-related activation comparing groups. (A)** Chess experts showed bilateral activation of the insula for the comparison of real chess perception minus randomly scrambled chess. **(B)** Chess experts differed from less skilled players in showing more activation in posterior cingulate and right superior temporal cortex when processing normal chess as compared to random chess. **(C)** Chess experts differed from less skilled players in showing more activation within the left parietal cortex when processing randomly scrambled chess as compared to normal chess.


As a conservative test of the group differences, we compared the experts to the less skilled players with respect to both the normal—random difference (**Figure 2B**) and the random-minus-normal difference (**Figure 2C**). The first of these comparisons identified two brain regions—the bilateral posterior cingulate and the right middle and superior temporal gyri—as those in which the normal—random difference was reliably greater in the expert group than in the less skilled group. The underlying pattern was that, with normal chessboards, experts exceeded less skilled players in mean percent signal change in both areas (*M*'s = 0.42 and 0.09, respectively, for the posterior-cingulate area, and −0.02 and −0.13, respectively, for the combined right temporal areas). These differences were absent or reversed with random chessboards (*M*'s = 0.10 and 0.11, respectively, for the posterior cingulate area and −0.19 and −0.17 for the lateral temporal areas).

The second comparison identified a single brain region—the left inferior parietal cortex—as one in which the random—normal difference was reliably greater in the expert group (**Figure 2C**). This area overlaps with the left inferior parietal region (the IPS) in Bor's frontal-parietal network. The next section examines the activation pattern in that region as well as in other frontal-parietal regions.

#### **PREFRONTAL-PARIETAL REGION-OF-INTEREST RESULTS**

The left inferior parietal region identified by our between-group analysis of the random—normal contrast is only one of seven areas that Bor and Owen (2007) linked to the prefrontal-parietal network. We sought to determine whether one or more of the remaining six areas might show a similar effect among experts—as compared to less skilled players—if examined with more sensitive ROI-based analyses. We created regions of interest (ROIs) using the coordinates and diameters specified by Bor and Owen (2007) based on prior reports from Duncan and Owen (2000), Bor et al. (2003, 2004), and Duncan (2006) using MarsBaR software (http://marsbar.sourceforge.net/). All ROIs were spherical with diameters of 10 mm and centers specified as follows in MNI coordinates: ACC (1, 33, 23), left DLPFC (−42, 33, 11), right DLPFC (39, 36, 13), left VLPFC (−43, 22, −6), right VLPFC (41, 22, −5), left IPS (−38, −47, 45), and right IPS (41, −47, 43). All ROI center coordinates were obtained by converting the Talairach coordinates reported in Bor and Owen (2007) to MNI using the Signed Differential Mapping software (http://www.sdmproject.com/utilities/?show=Coordinates). These regions are shown in **Figure 4**. ROI data were extracted and converted to percent-signal-change for each participant using MarsBaR software (sourceforge.net/projects/marsbar, Brett et al., 2002). We extracted percent signal change for all ROIs for the normal chessboard and random chessboard conditions. We statistically evaluated the ROI data first by conducting a 2 (group) × 2 (stimulus category) × 7 (area) ANOVA, which supported two interactions, an area × group interaction, *F*(6, 120) = 3.09, *p* = 0.01, and an area × stimulus interaction, *F*(6, 120) = 4.89, *p* = 0.0002. We followed this with separate group × stimulus ANOVAs for each of the seven areas.
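The spherical ROI definition can be illustrated with a short sketch. It is not the MarsBaR procedure itself: the function names and the 2 mm isotropic voxel grid aligned to the MNI origin are assumptions made for illustration, whereas the seven centers and the 10 mm diameter are taken from the text.

```python
import math

# ROI centers in MNI space (mm), as listed in the text.
ROI_CENTERS_MNI = {
    "ACC": (1, 33, 23),
    "left DLPFC": (-42, 33, 11),
    "right DLPFC": (39, 36, 13),
    "left VLPFC": (-43, 22, -6),
    "right VLPFC": (41, 22, -5),
    "left IPS": (-38, -47, 45),
    "right IPS": (41, -47, 43),
}
ROI_DIAMETER_MM = 10.0  # from the text


def in_sphere(point_mm, center_mm, diameter_mm=ROI_DIAMETER_MM):
    """True if a point lies within (or on) the sphere around the ROI center."""
    return math.dist(point_mm, center_mm) <= diameter_mm / 2.0


def sphere_voxels(center_mm, voxel_mm=2.0, diameter_mm=ROI_DIAMETER_MM):
    """Voxel centers inside the sphere, on a hypothetical isotropic grid
    whose points are integer multiples of `voxel_mm`."""
    radius = diameter_mm / 2.0
    axes = []
    for c in center_mm:
        lo = math.ceil((c - radius) / voxel_mm)
        hi = math.floor((c + radius) / voxel_mm)
        axes.append([k * voxel_mm for k in range(lo, hi + 1)])
    return [
        (x, y, z)
        for x in axes[0]
        for y in axes[1]
        for z in axes[2]
        if in_sphere((x, y, z), center_mm, diameter_mm)
    ]


left_ips_voxels = sphere_voxels(ROI_CENTERS_MNI["left IPS"])
```

Averaging a condition's signal over the voxels returned for one ROI gives a single value per participant and condition, which is the form of data entered into the ANOVAs reported here.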


The ANOVAs of the DLPFC and VLPFC ROIs yielded no significant effects. However, the VLPFC areas showed significant correlations that are described in the subsequent *correlational analyses* section.

In the ACC ROI, we observed a significant main effect of group, *F*(1, 20) = 10.77, *p* = 0.004, in which less skilled participants showed greater activation (*M* percent signal change = 0.51) than experts (*M* = −0.12). This group effect was approximately equally strong with normal chess and random chess, as shown in **Figure 5** (lower panel).

The IPS showed a different pattern, bilaterally differentiating the groups by stimulus category activation levels. The left IPS region showed a significant effect of category with random chessboards resulting in greater activation (*M* = 0.91) than normal boards (*M* = 0.67), *F*(1, 20) = 31.32, *p* = 0.0001. It also showed a group by stimulus interaction, *F*(1, 20) = 6.54, *p* = 0.02, such that the increased activation for random boards over normal boards was stronger among experts (*M*'s = 0.99 and 0.65, respectively) than among less skilled players (*M*'s = 0.82 and 0.69, respectively), as shown in **Figure 5** (upper panel). A similar but less robust pattern was observed in the right IPS, where there was a significant main effect of stimulus category, with random boards yielding greater average activation (*M* = 0.66) than normal chessboards (*M* = 0.55), *F*(1, 20) = 9.06, *p* = 0.007, with a marginal group by stimulus interaction present, *F*(1, 20) = 3.88, *p* = 0.06 (**Figure 5**, middle panel).
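The group by stimulus interaction can be read as a difference of differences. From the reported left IPS means, the random-minus-normal difference is 0.99 − 0.65 = 0.34 for experts and 0.82 − 0.69 = 0.13 for less skilled players. In a 2 (group, between subjects) × 2 (stimulus, within subjects) design, the interaction test is equivalent to an independent-samples *t*-test on each participant's difference score, with *F*(1, *df*) = *t*². The sketch below illustrates that equivalence; the function names are ours and the data values are made up for illustration, not taken from the study.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Independent-samples t statistic (equal variances assumed) and its df."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


def interaction_f(group1, group2):
    """Group x stimulus interaction from (normal, random) pairs per participant.

    Each participant is reduced to a random-minus-normal difference score;
    the interaction F is the squared t statistic comparing the two groups.
    """
    d1 = [rand - norm for norm, rand in group1]
    d2 = [rand - norm for norm, rand in group2]
    t, df = pooled_t(d1, d2)
    return t * t, df


# Illustrative, made-up percent-signal-change values (NOT the study's data).
experts = [(0.60, 0.95), (0.70, 1.05), (0.62, 0.98), (0.68, 0.99)]
less_skilled = [(0.70, 0.80), (0.66, 0.82), (0.72, 0.81), (0.68, 0.84)]
f_value, df = interaction_f(experts, less_skilled)
```

With 11 participants per group, as in the present study, the same computation gives 11 + 11 − 2 = 20 denominator degrees of freedom, matching the *F*(1, 20) values reported.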

As the left IPS ROI overlaps with the region in which we previously found a reliable group difference in random-minus-normal activation levels (**Figure 2C**), the ROI-based analysis of this region confirms our prior observation that experts, more than less skilled players, show greater activation to random boards than to normal boards. The analysis of the right IPS suggests that the group difference is largely if not completely bilateral. Finally, the ACC analysis shows that this interactive pattern obtained in the IPS regions does not extend to the entire prefrontal-parietal network, and that, moreover, at least one region within this network—the ACC—shows a strikingly different pattern.

The significant group × stimulus category interaction in left parietal cortex—and, by a less stringent criterion, right parietal cortex—suggests that parts of the prefrontal-parietal network may be relevant for the processing of random boards by experts.

However, the theoretical significance of this pattern depends on whether experts' left parietal activation for random boards exceeds their left parietal activation not only for normal boards, but also for nonchess stimuli. We therefore examined IPS activations in response to faces, scenes, and objects, in addition to random chessboards. The left IPS showed significant effects of both category, *F*(3, 60) = 123.59, *p* < 0.001, and group (experts greater than novices), *F*(1, 20) = 4.59, *p* < 0.05. Similar results were observed for the right IPS, with significant effects of category, *F*(3, 60) = 63.43, *p* < 0.001, and group (experts greater than novices), *F*(1, 20) = 69.02, *p* < 0.001. To establish whether the IPS activation to random boards differed from that to the other categories in the experts, we conducted additional *post-hoc* comparisons (Bonferroni corrected *p* < 0.05). For the left IPS in experts, *post-hoc* comparisons revealed that random chess activation (*M* = 0.99) was higher than faces (*M* = 0.25), scenes (*M* = 0.29), and objects (*M* = 0.40). Similarly, for the right IPS in experts, *post-hoc* comparisons showed that random chess activation (*M* = 0.71) was higher than faces (*M* = 0.13), scenes (*M* = 0.21), and objects (*M* = 0.20).

#### **FUSIFORM FACE AREA REGION-OF-INTEREST RESULTS**

To evaluate effects of stimulus and group in the FFA, we created left and right FFA ROIs independently for each participant by defining maximally active voxels within the fusiform gyrus based on a contrast of face activation minus scene and object activation. Note that this ROI does not violate independence, as the defining contrast did not include either of the conditions to be evaluated in the ROI data (normal and random chess). We then conducted 2 (group) × 2 (normal chess, random chess) ANOVAs like those we performed on the prefrontal-parietal ROIs. As shown in **Figure 6**, the left FFA showed a significant effect of stimulus category, *F*(1, 19) = 11.75, *p* < 0.01, with the FFA response to random chess (*M* = 0.73) being higher than that for normal chess (*M* = 0.54). Similar results were observed for the right FFA, *F*(1, 19) = 6.49, *p* < 0.05, with higher activation for random chess (*M* = 1.13) than normal chess (*M* = 0.96). Neither ANOVA supported effects involving expertise.

#### **MEDIAL-TEMPORAL REGION-OF-INTEREST RESULTS**

To examine expertise and stimulus effects in medial temporal regions linked to long-term memory, we evaluated hippocampus and parahippocampal gyrus ROIs bilaterally. These ROIs were created by using the anatomically defined ROI from the WFU PickAtlas tool (Maldjian et al., 2003, 2004). A set of four ANOVAs supported no significant differences within the right or left hippocampus, but did support a significant interaction of stimulus category and expertise, *F*(1, 20) = 6.75, *p* < 0.02, in the left parahippocampal ROI. As shown in **Figure 6**, left parahippocampal activation to normal chess for experts (*M* = 0.27) was higher than that for less experienced players (*M* = 0.06), while both groups exhibited very similar activation levels to random chess (experts *M* = 0.18, less experienced players *M* = 0.16). A similar, marginally significant, interaction emerged in right parahippocampal gyrus, *F*(1, 20) = 3.96, *p* < 0.06, with the activation to normal chess for experts (*M* = 0.34) being higher than that for less experienced players (*M* = 0.17), and both groups similar for random chess (experts *M* = 0.25, less experienced players *M* = 0.25). These interactions are in line with prior evidence for activation increases in memory-related, medial-temporal regions with experts processing well-formed objects of expertise (Campitelli et al., 2007; Guida et al., 2012). Note also that the previously reported whole brain normal—random contrast (**Figure 2B**) supported similar patterns in the posterior cingulate and right middle and superior temporal brain regions.

**FIGURE 6 | Activation from Regions of Interest within the FFA and Parahippocampal gyrus. (A)** Left FFA activation showed a significant effect of stimulus category with the response to randomly scrambled chess being higher than that for real chess. **(B)** Similar results were observed for the right FFA with higher activation for random chess than real chess. **(C)** An interaction in the left parahippocampal gyrus with activation to normal chess for experts being higher than that for less experienced players. Both groups showed similar activation to randomly scrambled chess. **(D)** A marginally significant interaction in right parahippocampal gyrus with the activation to normal chess for experts being higher than that for less experienced players.

#### **CORRELATIONAL ANALYSES**

Of interest was whether activation differences between random chessboards and normal chessboards in the seven prefrontal-parietal ROIs might be correlated with (a) self-reported chunking-related activity, and (b) activation differences in the FFA and parahippocampal gyri. We computed a mean percent signal change difference score from each ROI for each participant by subtracting mean percent signal change in the normal chess condition from that in the random chess condition. We then computed inter-correlations (Pearson correlation coefficients, *r*) between these ROI difference scores for the five prefrontal-parietal areas and the following behavioral measures: mean number of groupings of pieces for random chess trials, mean number of pieces per grouping for random chess trials, and mean overall estimate of number of pieces tracked for random chess trials (the latter measure was strongly correlated with the product of the first two, *r* = 0.87). Total-pieces-tracked was reliably correlated with activation differences in both left IPS (*r* = 0.48, *p* < 0.05) and right IPS (*r* = 0.51, *p* < 0.05, *df* = 20 in all cases).
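The difference-score and correlation steps described above can be sketched as follows. The function names are ours, and the conversion from *r* to a *t* statistic (with *df* = *n* − 2) is the standard test of a Pearson correlation against zero; it is offered as an assumption about how significance would be assessed, since the text does not spell out the test.

```python
import math
from statistics import mean


def difference_scores(normal, random_):
    """Per-participant random-minus-normal percent signal change."""
    return [r - n for n, r in zip(normal, random_)]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, df):
    """t statistic for testing r against zero, where df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))


# Reported correlations from the text, with the reported df of 20.
t_left_ips = r_to_t(0.48, 20)
t_right_ips = r_to_t(0.51, 20)
```

For the reported values this gives *t* of roughly 2.45 (left IPS) and 2.65 (right IPS), both above the two-tailed .05 critical value of 2.086 for 20 degrees of freedom, in agreement with the reported *p* < 0.05.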

We next computed correlations of random-minus-normal differences in the five prefrontal-parietal ROIs with random-minus-normal differences in the FFA and parahippocampal gyri (*df* = 19 for correlations involving FFA, 20 for all others). Activation differences in the ACC were reliably correlated with activation differences in both left and right FFA and in both left and right parahippocampal gyrus (*r*'s = 0.54, 0.45, 0.46 and 0.52, respectively, all *p*'s < 0.05). Similarly, activation differences in the left VLPFC were correlated with activation differences in the left FFA and left and right parahippocampal gyrus (*r*'s = 0.54, 0.59 and 0.52, respectively, *p*'s < 0.05). Finally, activation differences in the right VLPFC were correlated with activation differences in the left FFA (*r* = 0.49, *p* < 0.05). The correlations involving the ACC and VLPFC regions should not be viewed as independent, as activation differences in ACC were reliably correlated with activation differences in the left and right VLPFC (*r*'s = 0.60 and 0.48, respectively, *p* < 0.05), though activation differences in the left and right VLPFC were only weakly (and non-significantly) intercorrelated (*r* = 0.34). We note that activation differences in the left and right FFA and in the left and right parahippocampal gyri were strongly intercorrelated (*r*'s = 0.73 and 0.95, respectively, *p*'s < 0.01).

#### **DISCUSSION**

The main purpose of this study was to test the hypothesis that when experts encounter stimuli that fall within their skill domain, but that are distorted in a manner which makes them impossible or in violation of rules in that domain, these experts engage an active search for structure akin to Bartlett's (1932) conception of effort after meaning. We framed this hypothesis in light of Bor and Seth's (2012) more recent theory linking conscious awareness to an active search for units or chunks, a process akin to effort after meaning. Bor's (2012) theory links this active chunking process to neural activity in prefrontal and lateral parietal brain regions, and this theory is supported by recent fMRI studies showing that neural activation in these brain regions, known to be increased in a range of working memory and attentional tasks, is particularly increased in tasks involving active chunking of information.

Our focus here was on how extreme expertise might be related to this prefrontal-parietal chunking network in the domain of chess. Our starting point was the surprising observation that randomly scrambled chessboard displays, which violate the rules of chess and are known to disrupt expert performance, evoke as much if not more neural activation than do normal, meaningful displays in a ventral visual processing region linked to expertise, the FFA (Bilalić et al., 2011; Krawczyk et al., 2011). Considering other evidence that the prefrontal-parietal network exerts top-down control over ventral visual cortex activity (Tomita et al., 1999), including the FFA (Chadick and Gazzaley, 2011), we hypothesized that experts' processing of random displays, as compared to normal displays, should be linked to activation in prefrontal and lateral parietal brain regions, as well as in the FFA.

We tested this prediction with 11 master level chess experts and 11 less skilled players, all of whom performed a simple working memory task with normal chessboards, random chessboards, and three other types of stimuli using a blocked design. A whole-brain analysis identified a left IPS region that was reliably more active with random boards than normal boards in our expert group. This effect was not observed among our less skilled players, and, moreover, the random—normal difference in this region was significantly greater in the expert group than in the less skilled group (**Figure 2**). This observation was extended in a ROI analysis based on Bor and Owen's (2007) coordinates for seven subregions of the prefrontal-parietal network. In one of these regions, the left IPS, we found an interaction such that experts, more than less skilled players, showed increased activation for random boards over normal boards, confirming our finding from the whole-brain analysis (**Figure 5**, panel **A**). Activation in the right IPS showed a similar pattern, though it was less robust statistically (**Figure 5**, panel **B**).

The IPS data are in line with the hypothesis that expert chess players performing a working memory task respond to random boards by engaging an active search for novel chunks involving the prefrontal-parietal network. In further support of this hypothesis, IPS activation among expert subjects not only was higher for random boards than normal boards; it also was higher for random boards than for three types of nonchess stimuli (faces, objects, and scenes). Finally, we observed correlations between random—normal activation differences in the left and right IPS and self-reports of the number of pieces tracked from random boards (a measure strongly correlated with the product of the reported number of groups and the average size of groups). These two findings strengthen the case that IPS activation is functionally linked to an active search for chunks, in line with the active chunking hypothesis.

It may seem odd to propose that experts engaged in more chunking activity with randomly scrambled boards than normal boards, as the former are undoubtedly difficult for them to encode and might reasonably be expected to support the detection of fewer and/or smaller patterns or groups. Indeed, our post-test questionnaire data support this proposition. However, our proposal is more intuitive under a dual-mode view (Bor, 2012, pp. 152–156), a view that distinguishes the active search for and discovery of chunks from a more automatic process of identifying previously learned patterns. It is the former process that we suggest is supported by the IPS, and that is more strongly engaged in expert processing of random boards than normal boards. Of course, a change in the behavioral task from our simple working memory test to one more demanding of chess expertise might increase experts' active search for chunks in normal chess displays. The conditions in which experts engage an active search for structure have only begun to be explored.

Although an active chunking hypothesis is consistent with our data from the IPS brain regions, it was not supported for five other components of the prefrontal-parietal network identified by Bor and Owen (2007). In none of these five regions did we find support (in the form of a group × stimulus category interaction) for experts showing a greater increase in activation for random boards over normal boards than less skilled players. In fact, in one of these regions—the ACC—we found a strikingly different pattern: In that region, which is associated with detection of conflict and executive function (Botvinick et al., 1999, 2001; Botvinick, 2007), experts showed reliably less activation than less skilled persons with both normal and random boards (**Figure 5**, panel **C**). We did not predict this pattern, but it is consistent with much evidence for reduced activation in prefrontal brain regions as a function of expertise with the stimuli being processed (Hill and Schneider, 2006). The fact that our ACC data support this effect with random boards as well as normal boards might be viewed as contradicting this general expertise-deactivation relationship. However, Bilalić et al. (2010) have provided evidence that chess expertise entails object-level processing (identifying individual chess pieces) as well as pattern-level processing (processing chess configurations). Thus, one possible resolution is that our ACC data reflect the effects of expertise on object-level processing of chessboard pieces, which might proceed similarly with normal boards and random boards.

Another interesting difference between the ACC area and the IPS regions concerned correlations with the ventro-temporal FFA regions. We expected to observe correlations between random-minus-normal differences in FFA activation and random-minus-normal differences in prefrontal-parietal regions. Indeed, activation differences in the left FFA were reliably correlated with activation differences in the ACC, and in the left and right VLPFC areas. However, activation differences in the FFA were not reliably correlated with activation differences in the IPS areas. The pattern is puzzling, as the ACC region did not show an overall random-minus-normal difference while the FFA and IPS regions did. Further, only the IPS regions supported the critical prediction of an increased random-minus-normal difference in experts as compared to less skilled players.

There is much to learn. However, thinking more broadly, the striking difference between the IPS and ACC areas in the pattern of group and randomization effects underscores the importance of characterizing the functions of different prefrontal-parietal regions as they relate to expertise. It is possible that only some of these regions contribute to the active search for chunks, or that, alternatively, all contribute to active chunking, but in different ways. One plausible and readily testable hypothesis is that our simple, 1-back working memory task did not evoke a full-fledged active chunking strategy, but only one component of that strategy. There is evidence that the IPS region may be involved in the conjoint encoding or "binding" of features "intrinsic" to a stimulus (e.g., the color and spatial location of a word, Uncapher et al., 2006). In the case of chess displays, binding of color, spatial location and shape might constitute one component of an active chunking strategy, one that is engaged in simple, time-limited tasks, such as the 1-back task used here. In more complex and temporally extended tasks, other components of an active chunking strategy might become engaged, and expertise-related random-minus-normal differences might emerge in other prefrontal-parietal brain regions.

A subsidiary goal of this study was to extend prior evidence bearing on the hypothesis that well-formed stimuli in a domain of expertise contain many familiar patterns that activate representations in long-term memory, allowing long-term memory to support performance in working memory tasks (Campitelli et al., 2007; Guida et al., 2012). An ROI analysis of the left and right parahippocampal regions supported this hypothesis: In both of these regions, known to be involved in recollection and memory for the spatial contexts of objects (Eichenbaum et al., 2007), chess expertise was linked to increased activation with normal boards, but not with random boards (see **Figure 6**). Our whole brain normal—random contrast supported a similar interactive pattern in the posterior cingulate and the middle and superior right temporal gyri (**Figure 2B**). Using a threat-detection task, Bilalić et al. (2012) observed a similar interactive pattern in both the parahippocampal and anterior cingulate regions, although in their case the expertise effect, while larger with normal than random boards, appeared to be present with both. More research is needed to better characterize the processes linked to these regions in expert chess processing, as there are several viable candidates including episodic memory retrieval, pattern recognition, visuospatial semantic memory retrieval, and self-referential/default network processing (discussions in Krawczyk et al., 2011 and Bilalić et al., 2012). For the present, the data from this and several other recent studies are consistent with the use of long-term memory in working memory tasks as one aspect of expertise. At the same time, the present study suggests that an active chunking strategy—or some component of that strategy—is engaged by experts when processing distorted structure in their domain of expertise.

#### **ACKNOWLEDGMENTS**

The authors wish to thank James Stallings and the University of Texas at Dallas Chess Program for assisting with the study. Daniel C. Krawczyk received financial support from a UT Dallas Catalyst Grant.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 08 August 2013; paper pending published: 07 September 2013; accepted: 14 November 2013; published online: 03 December 2013.*

*Citation: Bartlett JC, Boggan AL and Krawczyk DC (2013) Expertise and processing distorted structure in chess. Front. Hum. Neurosci. 7:825. doi: 10.3389/fnhum.2013.00825*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Bartlett, Boggan and Krawczyk. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*


#### **APPENDIX**

**Table A1 | Independent whole-brain comparisons for Expert and Less Experienced player groups.**


## Exploring the brains of *Baduk* (*Go*) experts: gray matter morphometry, resting-state functional connectivity, and graph theoretical analysis

#### *Wi Hoon Jung<sup>1</sup>, Sung Nyun Kim<sup>2</sup>, Tae Young Lee<sup>2</sup>, Joon Hwan Jang<sup>2</sup>, Chi-Hoon Choi<sup>3</sup>, Do-Hyung Kang<sup>2</sup> and Jun Soo Kwon<sup>1,2,4</sup>\**

*<sup>1</sup> Department of Psychiatry, Clinical Cognitive Neuroscience Center, SNU-MRC, Seoul, South Korea*

*<sup>2</sup> Department of Psychiatry, Seoul National University College of Medicine, Seoul, South Korea*

*<sup>3</sup> Department of Diagnostic Radiology, National Medical Center, Seoul, South Korea*

*<sup>4</sup> Brain and Cognitive Sciences-WCU Program, College of Natural Sciences, Seoul National University, Seoul, South Korea*

#### *Edited by:*

*Merim Bilalic, University Tübingen, University Clinic, Germany*

#### *Reviewed by:*

*Robert Langner, Heinrich Heine University Düsseldorf, Germany; Xiaochen Hu, University of Tübingen, Germany*

#### *\*Correspondence:*

*Jun Soo Kwon, Department of Psychiatry & Behavioral Sciences, Seoul National University College of Medicine, 101 Daehak-no, Chongno-gu, Seoul 110-744, South Korea e-mail: kwonjs@snu.ac.kr*

One major characteristic of experts is intuitive judgment, which is an automatic process whereby patterns stored in memory through long-term training are recognized. Indeed, long-term training may influence brain structure and function. A recent study revealed that chess experts at rest showed differences in structure and functional connectivity (FC) in the head of caudate, which is associated with rapid best next-move generation. However, less is known about the structure and function of the brains of *Baduk* experts (BEs) compared with those of experts in other strategy games. Therefore, we performed voxel-based morphometry (VBM) and FC analyses in BEs to investigate structural brain differences and to clarify the influence of these differences on functional interactions. We also conducted graph theoretical analysis (GTA) to explore the topological organization of whole-brain functional networks. Compared to novices, BEs exhibited decreased and increased gray matter volume (GMV) in the amygdala and nucleus accumbens (NA), respectively. We also found increased FC between the amygdala and medial orbitofrontal cortex (mOFC) and decreased FC between the NA and medial prefrontal cortex (mPFC). Further GTA revealed differences in measures of the integration of the network and in the regional nodal characteristics of various brain regions activated during *Baduk*. This study provides evidence for structural and functional differences as well as altered topological organization of the whole-brain functional networks in BEs. Our findings also offer novel suggestions about the cognitive mechanisms behind *Baduk* expertise, which involves intuitive decision-making mediated by somatic marker circuitry and visuospatial processing.

**Keywords: amygdala,** *Baduk***, head of caudate, intuitive judgment, resting-state functional connectivity, somatic marker hypothesis, voxel-based morphometry**

#### **INTRODUCTION**

Board games such as chess have been studied by researchers from a variety of fields, such as economics (Levitt et al., 2011), computer science (Bouzy and Cazenave, 2001; Cai et al., 2010), and cognitive science (de Groot, 1965; Chase and Simon, 1973), because of the similarity between board games and real life in terms of the need to engage in decision-making and adaptive behavior to achieve specific goals under changing environmental conditions. Cognitive science, in particular, has used board games to study cognitive expertise, as playing involves diverse cognitive functions such as attention, working memory, visuospatial processing, and decision-making (Chase and Simon, 1973; Gobet and Charness, 2006). Board-game players with the highest level of skill, known as grand masters, are considered cognitive experts who develop the knowledge structures used in problem solving in a given domain through long periods of deliberate practice (Chase and Simon, 1973). Using these knowledge structures, called chunks, templates, or schemas (Chase and Simon, 1973; Gobet and Charness, 2006), experts can rapidly match the patterns they have learned and make faster and better decisions. Such chunk-driven unconscious automatic cognitive processes are often referred to as intuition, which is defined as the recognition of patterns or structures stored in long-term memory (Chase and Simon, 1973). A number of researchers have proposed accounts of the mechanisms underlying intuitive judgment (Hodgkinson et al., 2009; Minavand chal et al., 2013), including dual-process theory, naturalistic decision-making (NDM), and the somatic marker hypothesis (SMH). For example, the recognition-primed decision (RPD) model within the NDM approach focuses on the success of expert intuition (de Groot, 1965; Klein, 1998, 2008), as opposed to the heuristics-and-biases approach, which adopts a skeptical attitude toward expert judgment (Kahneman and Klein, 2009).
This model shows how experts can make extremely rapid and favorable decisions by combining two processes: (i) an intuitive (automatic) process involving pattern matching based on past experience and (ii) a deliberative (conscious) process involving mental simulation (or analysis) to imagine how a course of action will play out (Klein, 1993; Kahneman and Klein, 2009). The SMH emphasizes the influence of emotion-based signals (somatic states) emerging from the body, such as gut feelings, on intuitive decision-making (Damasio, 1996; Dunn et al., 2006). Despite previous extensive studies on the mechanism behind intuitive expertise in board games, its neural basis remained largely enigmatic until the last two decades (Nichelli et al., 1994). Recent brain imaging studies during board-game play have resulted in renewed interest in the neural basis of cognitive expertise and have revealed brain regions associated with object recognition, such as the lateral occipital complex, occipitotemporal junction (Bilalić et al., 2011a,b), and the fusiform cortex (FFC) (Bilalić et al., 2011c); with pattern recognition, such as the collateral sulcus (CoS) and retrosplenial cortex (RSC) (Bilalić et al., 2010, 2011b); with recognition of relations between objects, such as the supramarginal gyrus (SMG) (Bilalić et al., 2011a,b); and with intuitive best next-move generation during chess play, such as the head of the caudate (HOC) (Wan et al., 2011, 2012). However, most neuroimaging studies with board-game experts have involved chess, even though *Baduk* differs fundamentally from chess in terms of the mental strategies involved.

*Baduk*, as it is known in Korean (*Go* in Japanese and *Weiqi* in Chinese), is a popular board game in East Asia; it is played on a square board consisting of a pattern of 19 by 19 crossed lines. Whereas chess pieces have specific identities and functions, all *Baduk* pieces (called stones) have the same value and function. The rules of the game are very simple (http://english.Baduk.or.kr); two players, one playing with black stones and the other playing with white ones, alternately place a stone to capture as large an area as possible on the board by surrounding the opponent's stones. Despite its simple rules, *Baduk* is characterized by greater combinatorial complexity than chess due to the tremendous size of its game tree; the average branching factor (i.e., the number of move choices available per turn) is approximately 200 in *Baduk*, whereas it is about 35 in chess (Keene and Levy, 1991). Additionally, unlike in most other strategy games, computer programs have so far been unable to win at *Baduk* against expert human players, whereas computerized chess programs can beat even the world's best human player (Bouzy and Cazenave, 2001). Although chess and *Baduk* share common cognitive and affective processes, such as memory, attention, perception, and emotional regulation, the two games nonetheless differ in important ways. Given its larger game tree and heavy dependence on spatial positioning rather than on selecting pieces according to their roles, knowledge and pattern recognition with respect to spatial positioning may be more important in *Baduk* than in other strategy games (Gobet et al., 2004). Recent neuroimaging studies on *Baduk* experts (BEs) have demonstrated increased activity in the occipitotemporal and parietal cortices, areas associated with visuospatial processing, such as integration of local features (Kourtzi et al., 2003) and spatial attention (Fink et al., 1996), respectively, while performing *Baduk* tasks (Chen et al., 2003; Ouchi et al., 2005).
In addition to cognitive competences such as spatial processing, researchers have recently emphasized emotional processing in competitive board games (Grabner et al., 2007) because, according to evidence for the SMH (Bechara et al., 1994; Blakemore and Robbins, 2012), performance (i.e., decision-making) is strongly affected by emotions. Thus, since board-game players experience a variety of emotions while playing, an imbalance in these emotions can cause mistakes (DeGroot and Broekens, 2003). Accumulated evidence from neuroimaging and lesion studies implicates the amygdala (AMY), striatum, and orbitofrontal cortex in emotional processing (Phillips et al., 2003a,b), and suggests the ventromedial prefrontal cortex (vmPFC), AMY, somatosensory cortex, and insula as regions of brain circuitry involved in the SMH (Damasio, 1996; Dunn et al., 2006). In particular, the vmPFC is thought to play a role in generating somatic markers (Damasio, 1996). Taken together, BEs may show differences in morphology and/or function in brain regions associated with spatial processing and emotion-based decision-making. However, until recently, there have not been studies investigating whether such specific differences exist in the brains of long-term trained BEs.

Many neuroimaging studies about the learning- and practice-based superior performance of experts have provided evidence for cross-sectional differences and longitudinal changes in brain structure and function, known as neuroplasticity, in brain areas underlying specific skills. Such brain areas include the occipito-temporal cortex, which is associated with complex visual motions in jugglers (Draganski et al., 2004), the hippocampus, which is associated with spatial learning and memory in taxi drivers (Maguire et al., 2000; Spiers and Maguire, 2006), and the medial prefrontal cortex (mPFC)/medial orbitofrontal cortex (mOFC), which is associated with emotion regulation and self-referential processing in meditation experts (Jang et al., 2011; Kang et al., 2013). In particular, recent studies have revealed that, compared to novices, chess experts demonstrate morphological differences in the HOC, along with differences in its functional circuits, showing a decrease in gray matter volume (GMV) and an increase in functional connectivity (FC) in this region during the resting state (Duan et al., 2012). However, whether such brain differences are specific to chess experts or extend to experts in other strategy games remains unclear. To address this issue, we used voxel-based morphometry (VBM) and resting-state functional connectivity (RSFC) analysis to compare BEs and novices in terms of GMV and to examine the effects of these morphological differences on functional brain connectivity at rest.

RSFC analysis based on resting-state functional magnetic resonance imaging (rs-fMRI) reveals spontaneous or intrinsic functional connections of the brain, which are reflected in the correlation pattern of low-frequency blood-oxygen-level-dependent (BOLD) fluctuations between small regions of interest and all other brain regions (Fox et al., 2005). Recently, this approach has been extensively used in conjunction with graph theory to investigate the topological organization of brain networks (Wang et al., 2010). Graph theoretical analysis (GTA) of rs-fMRI enables visualization of the overall connectivity pattern across all brain regions and provides quantitative measurement of complex patterns of organization across a network, such as small-worldness, which measures global network connection efficiency. Using this approach, recent studies have reported differences in topological organization of the whole-brain functional network between personality dimensions of extraversion and neuroticism (Gao et al., 2013), as well as between various brain diseases that involve cognitive impairments, such as Alzheimer's disease (Supekar et al., 2008) and schizophrenia (Lynall et al., 2010), and healthy controls. However, the topological organization of the whole-brain functional network in cognitive experts is yet to be elucidated.

We hypothesized that BEs would exhibit morphological differences in brain regions underlying expertise in *Baduk*, particularly the occipitotemporal and parietal areas associated with visuospatial processing and spatial attention, respectively, as well as the somatic marker circuitry involved in emotion-based decision-making, and that these morphological differences may be associated with alterations in the functional circuits of these regions. We also predicted that the topological organization of their whole-brain functional network would be altered in the service of achieving the most efficient network for playing *Baduk*. To test our hypotheses, we employed VBM and RSFC, and further analyzed the topological properties of the intrinsic brain connectivity network using a graph theoretical approach. We expect that this study will provide evidence for structural and functional brain differences in BEs, as well as offer additional insight into the nature of the varied and complex cognitive mechanisms that enable superior performance by BEs.

### **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Seventeen BEs who had been training for 12.47 ± 1.50 years were recruited from the Korea *Baduk* Association (http://english.Baduk.or.kr/). BEs were statistically matched for age, sex, and education level to 16 novices who knew the rules for playing *Baduk*. All subjects were right-handed and had no history of neurological or psychiatric problems. The demographic characteristics of each group are presented in **Table 1**. All procedures performed in this study were approved by the Institutional Review Board of Seoul National University Hospital.

#### **DATA ACQUISITION**

All image data were acquired using a 1.5-T scanner (Siemens Avanto, Germany). High-resolution anatomical images of the whole brain were acquired with T1-weighted 3-D magnetization-prepared rapid-acquisition gradient-echo (MPRAGE) sequence [repetition time (TR)/echo time (TE) = 1160/4.76 ms, flip angle = 15°, field of view (FOV) = 230 mm, matrix size = 256 × 256]. rs-fMRI data were obtained via a gradient echo-planar imaging pulse sequence (TR/TE = 2340/52 ms, flip angle = 90°, FOV = 220 mm, voxel size = 3.44 × 3.44 × 5 mm<sup>3</sup>), during which subjects were instructed to relax with their eyes closed without falling asleep. rs-fMRI scans were part of fMRI sessions, during which participants performed working memory tasks. Resting-state runs were performed for 4.68 min (120 volumes) prior to administration of the working memory tasks. Other image parameters (task-related fMRI and DTI) that are not related to the present study are not described herein. Based on visual inspection, a neuroradiologist (CHC) judged all scans to be excellent, without obvious motion artifacts, signal loss, or gross pathology.

*IQ, intelligence quotient. Values are presented as mean (standard deviation). †χ<sup>2</sup> test was used. ‡Independent t-test was used.*

#### **VOXEL-BASED MORPHOMETRY ANALYSIS**

T1 data were processed using the VBM8 toolbox (http://dbm.neuro.uni-jena.de/vbm.html) implemented in SPM8 (http://www.fil.ion.ucl.ac.uk/spm), with default parameters incorporating the DARTEL toolbox to produce a high-dimensional normalization protocol (Ashburner, 2007). Images were corrected for bias-field inhomogeneities, tissue-classified into gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) based on unified segmentation from SPM8, and spatially normalized to the MNI space using linear (12-parameter affine) and non-linear transformations (warping). The non-linear transformation parameters were calculated via the DARTEL algorithm (Ashburner, 2007) with an existing standard template in VBM8. The warped GM segments were modified to compensate for volume changes during spatial normalization by multiplying the intensity value in each voxel by the Jacobian determinants (modulated GMVs). Finally, the resulting GM images were smoothed with an 8-mm full-width at half maximum (FWHM) isotropic Gaussian kernel. Voxel-wise comparisons of GMV in the two groups were performed using two-sample *t*-tests. Total intracranial volume (TIV) was modeled as a covariate of no interest. TIV was calculated by summing the raw volumes of GM, WM, and CSF; each tissue volume was automatically written to a text file for each subject (\*\_seg8.txt) during VBM8 processing. The statistical significance of group differences was set at *p* < 0.05 using AlphaSim correction (with a combination of a threshold of *p* < 0.005 and a minimum cluster size of 340 voxels) (Cox, 1996). Based on previous research (Duan et al., 2012), a looser threshold was chosen (*p* < 0.005 and expected voxels per cluster *k* > 133) to detect the presence of group differences in the HOC. To investigate associations between the GMV of BEs and training duration, we employed SPM8 to perform voxel-wise correlation analysis between these two values using a multiple regression model.

#### **FUNCTIONAL CONNECTIVITY ANALYSIS**

rs-fMRI data preprocessing was performed using SPM8 and the REST V1.7 toolkit (http://www.restfmri.net/; Song et al., 2011). The preprocessing procedures for rs-fMRI data were performed as follows. After discarding the first four volumes to allow for stabilization of the BOLD signal, each subject's rs-fMRI data were (i) corrected for slice-timing differences, (ii) realigned to their first scan to correct for movement, (iii) spatially normalized to the MNI echo-planar imaging template in SPM8 (voxels were resampled to 3 × 3 × 3 mm<sup>3</sup>), (iv) spatially smoothed with a 6-mm FWHM Gaussian kernel, (v) linearly detrended, (vi) temporally band-pass filtered (0.01–0.08 Hz), and (vii) subjected to regression of nuisance signals (head-motion profiles, global signal, WM, and CSF) to correct for physiological noise. Regions showing significant group differences in GMV according to the VBM results were defined as seed regions for subsequent FC analysis [i.e., right AMY, right and left nucleus accumbens (NA); **Figure 1A**]. FC maps were produced by extracting the time series averaged across voxels within each seed region and then computing the Pearson's correlation between that time series and those from all other brain voxels. Finally, correlation coefficients for each voxel were converted into a normal distribution by Fisher's *z* transform (Fox et al., 2005). For each group, individual *z*-value maps were analyzed with a random-effect one-sample *t*-test to identify voxels with a significant positive correlation to the seed time series, with correlations thresholded at *p* < 0.001, uncorrected, and with a topological false-discovery-rate (FDR) correction threshold of *p* < 0.05 for multiple comparisons (**Figure 2A**; Chumbley et al., 2010).
For between-group comparisons, two-sample *t*-tests were used to compare *z*-value maps between experts and novices using AlphaSim correction with significance set at *p* < 0.05 (with a combination of a threshold of *p* < 0.005 and a minimum cluster size of 13 voxels for each mask map) (**Table 2**). This analysis was restricted to the voxels showing significant positive correlations for either experts or novices by using an explicit mask from the combined sets of the results of the one-sample *t*-tests (*p* < 0.05, topological FDR corrected) of the two groups.
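The seed-based correlation and Fisher *z* transform described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the SPM8/REST pipeline used in the study; the function names and the toy time series are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def fisher_z(r):
    """Fisher r-to-z transform: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def seed_fc_map(seed_voxels, target_voxels):
    """Average the seed voxels' time series, then correlate with each target voxel.

    seed_voxels and target_voxels are lists of time series (one list per voxel).
    Returns one Fisher z value per target voxel.
    """
    n_timepoints = len(seed_voxels[0])
    seed_mean = [sum(v[t] for v in seed_voxels) / len(seed_voxels)
                 for t in range(n_timepoints)]
    return [fisher_z(pearson_r(seed_mean, ts)) for ts in target_voxels]


# Toy data (hypothetical): two seed voxels, two target voxels, five time points.
seed = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0, 6.0]]
targets = [[2.0, 4.0, 5.0, 4.0, 5.0], [5.0, 4.0, 3.0, 2.0, 2.0]]
z_map = seed_fc_map(seed, targets)
print(z_map)  # first target correlates positively with the seed, second negatively
```

In a real analysis the resulting *z* maps would then enter the one-sample and two-sample *t*-tests described in the text.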

#### **NETWORK CONSTRUCTION AND ANALYSIS**

In this study, brain networks were composed of nodes representing brain regions and edges representing interregional RSFC. To define network nodes, the Harvard–Oxford atlas (HOA) was employed to divide the whole brain, excluding the brainstem, into 110 (55 for each hemisphere) cortical and subcortical regions-of-interest (ROIs) (**Table 3**). To define the network edges, we calculated the Pearson correlations between pairs of ROIs. Correlation matrices were thresholded into binary networks by applying a network sparsity (*S*) threshold (the ratio of the number of existing edges to the maximum number of possible edges in a network). The sparsity threshold was normalized so that each group network had the same number of nodes and edges, allowing investigation of the relative network efficiency of each group (Achard and Bullmore, 2007). Given the absence of a gold standard for selecting a single threshold, and based on previous studies (Wang et al., 2010; Tian et al., 2011), a continuous range of 0.10 ≤ *S* ≤ 0.42 with an interval of 0.01 was employed to threshold the correlation matrices into a set of binary matrices. This range of sparsity allows prominent small-world properties of brain networks to be observed (Watts and Strogatz, 1998); that is, the small-worldness of the thresholded networks was larger than 1.1 for all participants (Zhang et al., 2011; Gao et al., 2013). We calculated both global and regional network measures of brain networks at each sparsity threshold (**Figures 3A,B**, **4A,B**). The global measures included (i) small-world parameters (clustering coefficient *C*<sub>*P*</sub>, characteristic path length *L*<sub>*P*</sub>, and small-worldness σ) and (ii) network efficiency (local efficiency *E*<sub>loc</sub> and global efficiency *E*<sub>glob</sub>). The regional measures included three nodal centrality metrics: degree, efficiency, and betweenness (Rubinov and Sporns, 2010; Tian et al., 2011).
In this study, we calculated all these metrics using GRETNA v1.0 (https://www.nitrc.org/projects/gretna/), a graph-theoretical network analysis toolkit, together with PSOM (Pipeline System for Octave and Matlab; http://code.google.com/p/psom) and the MatlabBGL package (http://www.cs.purdue.edu/homes/dgleich/packages/matlab\_bgl/). Mathematical explanations for each network metric are provided in the following sub-sections.
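To make the sparsity threshold concrete: with 110 nodes there are 110 × 109 / 2 = 5995 possible edges, and the range 0.10 ≤ *S* ≤ 0.42 in steps of 0.01 gives 33 thresholds. The sketch below shows one way to binarize a correlation matrix at a given sparsity. It is an illustration in plain Python rather than GRETNA's implementation; the function name and the toy matrix are hypothetical, and ranking edges by raw (not absolute) correlation is an assumption.

```python
def threshold_by_sparsity(corr, sparsity):
    """Binarize a symmetric correlation matrix, keeping the strongest edges.

    sparsity = (edges kept) / (maximum possible edges), as defined in the text.
    Assumption: edges are ranked by raw correlation value; the source does not
    state whether absolute values were used.
    """
    n = len(corr)
    pairs = [(corr[i][j], i, j) for i in range(n) for j in range(i + 1, n)]
    n_keep = round(sparsity * len(pairs))
    pairs.sort(reverse=True)  # strongest correlations first
    adj = [[0] * n for _ in range(n)]
    for _, i, j in pairs[:n_keep]:
        adj[i][j] = adj[j][i] = 1
    return adj


# Toy 4-node correlation matrix (hypothetical values).
toy_corr = [
    [1.0, 0.9, 0.1, 0.4],
    [0.9, 1.0, 0.7, 0.2],
    [0.1, 0.7, 1.0, 0.6],
    [0.4, 0.2, 0.6, 1.0],
]
adj = threshold_by_sparsity(toy_corr, 0.5)  # keep 3 of the 6 possible edges
for row in adj:
    print(row)
```

Because every subject's matrix is cut at the same sparsity, all resulting networks have the same number of edges, which is what allows the group comparison of relative efficiency described above.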



*†Values are presented as mean beta values (standard deviation) for each group; mean beta values for each subject were extracted from each region using MarsBaR toolbox for SPM (http://marsbar.sourceforge.net/).*

*‡We performed a correlation analysis between beta value extracted from the region of and training duration of each subject using SPSS (p < 0.001, r = −0.802).*

#### *Global network parameters*

To characterize the global topological organization of the whole-brain functional network, we considered five network metrics: clustering coefficient (*C*<sub>*P*</sub>), characteristic path length (*L*<sub>*P*</sub>), small-worldness (σ), global efficiency (*E*<sub>glob</sub>), and local efficiency (*E*<sub>loc</sub>). *C*<sub>*P*</sub> indicates how well neighbors of a node *i* are connected (i.e., local interconnectivity of a network). *L*<sub>*P*</sub> is the shortest path length (i.e., number of edges) required to transfer from one node to another averaged over all pairs of nodes. *E*<sub>glob</sub> is a measure of the capacity for parallel information transfer over the network, and is inversely related to *L*<sub>*P*</sub>. *E*<sub>loc</sub> is a measure of the fault tolerance of the network, indicating how well each subgraph exchanges information when the index node is eliminated, and is related to *C*<sub>*P*</sub>. While high *E*<sub>loc</sub> and *C*<sub>*P*</sub> reflect a high local specialization (called segregation) of information processing, high *E*<sub>glob</sub> and low *L*<sub>*P*</sub> express a great ability to integrate information from the network.

#### **Table 3 | Anatomic regions-of-interest included in the network analysis†.**

*†To define network nodes, the Harvard-Oxford atlas (HOA) was employed to divide the whole brain into 110 (55 for each hemisphere) cortical and subcortical regions of interest (ROIs), except the brainstem. ‡To facilitate data characterization and interpretation, we sorted nodes based on lobar (i.e., frontal, temporal, parietal, occipital, and subcortical) classification.*

**FIGURE 4 | Regional network measures (i.e., nodal degree, nodal efficiency, and nodal betweenness) for experts and novices.** Panel **A** shows values of each nodal metric over a range of thresholds in each group. Panel **B** shows mean values for each nodal metric across a range of thresholds in each group, which were superimposed on an inflated standard brain using BrainNet Viewer (http://www.nitrc.org/projects/bnv/). Panel **C** shows a bar plot of the AUC values of each nodal metric (red, frontal areas; green, temporal areas; blue, parietal areas; sky, occipital areas; purple, subcortical areas).

For a given graph *G* with *N* nodes, the clustering coefficient is defined by Watts and Strogatz (1998) as:

$$C_P(G) = \frac{1}{N} \sum_{i \in G} \frac{E_i}{D_{\text{nod}}(i)\,(D_{\text{nod}}(i) - 1)/2},$$

where *D*<sub>nod</sub>(*i*) (see below) is the degree of a node *i*, and *E*<sub>*i*</sub> is the number of edges in *G*<sub>*i*</sub>, the subgraph consisting of the neighbors of a node *i*. The characteristic path length is defined by Newman (2003) as:

$$L_P(G) = \frac{1}{\frac{1}{N(N-1)} \sum_{j \neq i \in G} \frac{1}{L_{ij}}},$$

where *L*<sub>*ij*</sub> is the shortest path length between nodes *i* and *j*. To examine the small-world properties, the values of *C*<sub>*P*</sub> and *L*<sub>*P*</sub> were normalized against those of 100 degree-matched random networks (γ = *C*<sub>*P*</sub><sup>real</sup>/*C*<sub>*P*</sub><sup>rand</sup>, λ = *L*<sub>*P*</sub><sup>real</sup>/*L*<sub>*P*</sub><sup>rand</sup>, and σ = γ/λ) before statistical analysis (Maslov and Sneppen, 2002). Typically, a small-world network should meet the following conditions: γ > 1 and λ ≈ 1 (Watts and Strogatz, 1998), or σ = γ/λ > 1 (Humphries et al., 2006). The global efficiency of *G* is defined by Latora and Marchiori (2001) as:

$$E_{\text{glob}}(G) = \frac{1}{N(N-1)} \sum_{j \neq i \in G} \frac{1}{L_{ij}}.$$

The local efficiency of *G* is defined by Latora and Marchiori (2001) as:

$$E_{\text{loc}}(G) = \frac{1}{N} \sum_{i \in G} E_{\text{glob}}(G_i),$$

where *E*<sub>glob</sub>(*G*<sub>*i*</sub>) is the global efficiency of *G*<sub>*i*</sub>, the subgraph composed of the neighbors of a node *i*.
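The four global measures defined above can be computed directly from a binary adjacency matrix. The following sketch is a plain-Python illustration under stated assumptions, not GRETNA's code: function names and the toy graph are hypothetical, the characteristic path length is taken as the harmonic mean (the reciprocal of global efficiency), unreachable node pairs contribute zero, and nodes with fewer than two neighbors contribute zero to the clustering coefficient.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search; hop counts from source (None if unreachable)."""
    dist = [None] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, connected in enumerate(adj[u]):
            if connected and dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_efficiency(adj):
    """E_glob: mean of 1/L_ij over ordered node pairs (unreachable pairs add 0)."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        dist = shortest_path_lengths(adj, i)
        total += sum(1.0 / d for j, d in enumerate(dist) if j != i and d)
    return total / (n * (n - 1))


def characteristic_path_length(adj):
    """L_P as the harmonic mean of shortest path lengths, i.e. 1 / E_glob."""
    e_glob = global_efficiency(adj)
    return float("inf") if e_glob == 0 else 1.0 / e_glob


def neighbor_subgraph(adj, i):
    """Adjacency matrix of G_i, the subgraph induced by the neighbors of node i."""
    nbrs = [j for j, connected in enumerate(adj[i]) if connected and j != i]
    return [[adj[a][b] for b in nbrs] for a in nbrs]


def clustering_coefficient(adj):
    """C_P: average over nodes of E_i / (D(i) * (D(i) - 1) / 2)."""
    total = 0.0
    for i in range(len(adj)):
        sub = neighbor_subgraph(adj, i)
        k = len(sub)
        if k < 2:
            continue  # assumed convention: such nodes contribute 0
        edges = sum(sum(row) for row in sub) / 2
        total += edges / (k * (k - 1) / 2)
    return total / len(adj)


def local_efficiency(adj):
    """E_loc: average over nodes of E_glob(G_i)."""
    n = len(adj)
    return sum(global_efficiency(neighbor_subgraph(adj, i)) for i in range(n)) / n


# Toy graph (hypothetical): triangle 0-1-2 with a pendant node 3 attached to 2.
toy = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(global_efficiency(toy), characteristic_path_length(toy))
print(clustering_coefficient(toy), local_efficiency(toy))
```

Small-worldness would follow by dividing *C*<sub>*P*</sub> and *L*<sub>*P*</sub> by their means over degree-matched random networks; generating those networks is omitted here.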

#### *Regional nodal parameters*

To investigate the regional characteristics of the whole-brain functional network, we considered three nodal metrics: the nodal degree (*D*<sub>nod</sub>), the nodal efficiency (*E*<sub>nod</sub>), and the nodal betweenness (*B*<sub>nod</sub>). All these nodal metrics quantify the importance of individual nodes in the network. *D*<sub>nod</sub> measures the connectivity of a node *i* with all other nodes in the whole brain. That is, nodes with high degree interact with many other nodes in the network. *E*<sub>nod</sub> measures the information propagation ability of a node *i* with all other nodes in the whole brain. *B*<sub>nod</sub> measures the influence of a node *i* over information flow between all other nodes in the whole network. That is, it is the fraction of all shortest paths in the network that pass through a given node. The nodal degree of a node *i* is defined as:

$$D_{\text{nod}}(i) = \sum_{j \neq i \in G} e_{ij},$$

where *e*<sub>*ij*</sub> is the element in the *i*th row and *j*th column of the obtained binarized correlation matrix. The nodal efficiency of a node *i* is defined by Achard and Bullmore (2007) as:

$$E_{\text{nod}}(i) = \frac{1}{N - 1} \sum_{j \neq i \in G} \frac{1}{L_{ij}}.$$

The betweenness of a node *i* is defined by Freeman (1977) as:

$$B_{\text{nod}}(i) = \sum_{j \neq i \neq k \in G} \frac{\delta_{jk}(i)}{\delta_{jk}},$$

where δ<sub>*jk*</sub> is the number of shortest paths from a node *j* to a node *k*, and δ<sub>*jk*</sub>(*i*) is the number of shortest paths from a node *j* to a node *k* that pass through a node *i* within graph *G*.
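The three nodal metrics can likewise be sketched for a small binary graph. This is an illustrative plain-Python version, not the MatlabBGL routines used in the study; the function names and toy graph are hypothetical. Betweenness is summed over ordered pairs (*j*, *k*) exactly as the formula is written, so in an undirected graph each unordered pair is counted twice; this normalization is an assumption.

```python
from collections import deque


def bfs_counts(adj, source):
    """Return (dist, n_paths): hop distance and number of shortest paths from source."""
    n = len(adj)
    dist = [None] * n
    n_paths = [0] * n
    dist[source], n_paths[source] = 0, 1
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, connected in enumerate(adj[u]):
            if not connected:
                continue
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
            if dist[v] == dist[u] + 1:
                n_paths[v] += n_paths[u]  # every shortest path to u extends to v
    return dist, n_paths


def nodal_metrics(adj):
    """Nodal degree, efficiency, and betweenness for a binary undirected graph."""
    n = len(adj)
    info = [bfs_counts(adj, s) for s in range(n)]
    degree = [sum(row) for row in adj]  # assumes a zero diagonal
    efficiency = [
        sum(1.0 / d for j, d in enumerate(info[i][0]) if j != i and d) / (n - 1)
        for i in range(n)
    ]
    betweenness = [0.0] * n
    for j in range(n):
        dist_j, paths_j = info[j]
        for k in range(n):
            if k == j or dist_j[k] is None:
                continue
            for i in range(n):
                if i in (j, k) or dist_j[i] is None:
                    continue
                dist_i, paths_i = info[i]
                # i lies on a shortest j-k path iff the distances add up exactly
                if dist_i[k] is not None and dist_j[i] + dist_i[k] == dist_j[k]:
                    betweenness[i] += paths_j[i] * paths_i[k] / paths_j[k]
    return degree, efficiency, betweenness


# Toy graph (hypothetical): triangle 0-1-2 with a pendant node 3 attached to 2.
toy = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
degree, efficiency, betweenness = nodal_metrics(toy)
print(degree, efficiency, betweenness)
```

In the toy graph, node 2 is the only route to node 3, so it is the only node with nonzero betweenness.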

#### *Statistical analysis in network parameters*

For statistical comparisons of the two groups, we calculated the area under the curve (AUC) for each network metric, which yields a summarized scalar to integrate the topological characteristics of brain networks over a range of thresholds (**Figures 3C**, **4C**). Between-group differences in each measure were inferred by nonparametric permutation tests (5000 permutations) for the AUC of each global and regional measure. Based on a previous study (Zhang et al., 2011), we identified the brain regions showing significant between-group differences in at least one nodal metric (*p* < 0.05, permutation corrected). We also performed Pearson correlation analyses between the AUC of each network metric and the duration of *Baduk* training in BEs using SPSS (*p* < 0.05/110 for multiple comparisons correction).
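The AUC summary and the permutation test can be illustrated as follows. This is a sketch under assumptions, not the authors' procedure: the trapezoidal rule for the AUC, the mean-difference test statistic, and the two-sided *p*-value are illustrative choices, and all names and data are hypothetical. The source specifies only that 5000 permutations were used.

```python
import random


def auc_trapezoid(xs, ys):
    """Area under a curve sampled at xs, using the trapezoidal rule."""
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2
               for k in range(len(xs) - 1))


def _mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Group labels are shuffled n_perm times; the p-value is the share of
    shuffles whose absolute mean difference is at least the observed one
    (with the usual +1 correction so that p is never exactly 0).
    """
    rng = random.Random(seed)
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# One metric sampled at three sparsity thresholds (hypothetical values).
sparsities = [0.10, 0.11, 0.12]
metric_values = [1.0, 2.0, 3.0]
auc = auc_trapezoid(sparsities, metric_values)

# Hypothetical per-subject AUC values for two clearly separated groups.
group_a = [1, 2, 3, 4, 5, 6]
group_b = [11, 12, 13, 14, 15, 16]
p_value = permutation_test(group_a, group_b, n_perm=2000)
print(auc, p_value)
```

Each subject contributes one AUC value per metric, so the permutation test operates on one scalar per subject rather than on the full curve across thresholds.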

#### **RESULTS**

#### **GRAY MATTER VOLUME**

Relative to novices, BEs exhibited decreased GMV in the right AMY and increased GMV in the bilateral HOC, particularly the NA (**Table 2**; **Figure 1A**). Significant negative correlations were observed between the degree of GMV reduction in the mOFC adjacent to the gyrus rectus and training duration in BEs (*p* < 0.001, *r* = −0.802) (**Table 2**; **Figures 1B,C**).

#### **FUNCTIONAL CONNECTIVITY**

Compared to novices, BEs showed increased FC between the right AMY seed and the left mOFC, and decreased FC between the right NA seed and the right mPFC (**Table 2**; **Figure 2B**). We found no significant correlations between FC measures and training duration in BEs.

#### **TOPOLOGICAL ORGANIZATION OF THE WHOLE-BRAIN NETWORK**

#### *Global network measures*

Both BEs and novices showed small-world architecture in whole-brain functional networks (i.e., σ > 1). Compared to novices, BEs showed a lower normalized characteristic path length λ and increased normalized global efficiency *E*<sub>glob</sub> in the whole-brain functional network (**Table 4**; **Figure 3C**). No significant differences were found in any other global network measures.

#### *Regional network measures*

The groups differed with respect to nodal centrality measures (i.e., nodal degree, nodal efficiency, and nodal betweenness) in several brain regions (**Table 4**; **Figure 5**). Compared to novices, nodal degree in BEs showed significant increases in the right postcentral gyrus (PocG), right inferior lateral occipital cortex (iLO), right thalamus, and bilateral NA, but significant decreases in the right superior frontal gyrus (SFG), bilateral inferior frontal gyrus (IFG), right posterior middle temporal gyrus (pMTG), left superior lateral occipital cortex (sLO), and right AMY (**Figure 5A**). In comparison to novices, nodal efficiency in BEs was significantly increased in the right PocG, right iLO, right posterior cingulate gyrus (PCG), bilateral thalamus, and bilateral NA, but significantly decreased in the right SFG, right pMTG, and left sLO (**Figure 5B**). Compared to novices, nodal betweenness in BEs was significantly higher for the right PocG, right iLO, left intracalcarine cortex, left parietal operculum cortex (PO), and right thalamus, while significantly lower for the right SFG, right pMTG, temporooccipital part of right middle temporal gyrus (TO), right posterior supramarginal gyrus (pSMG), left anterior temporal fusiform cortex (aTFC), left occipital fusiform gyrus (OF), right pallidum, and left AMY (**Figure 5C**).

#### **Table 4 | Regions showing significant differences in nodal centrality metrics between experts and novices.**

*†Values are presented as mean AUC (standard deviation) for each group. ‡They are p-values based on nonparametric permutation tests. \*Regions were considered changed in experts if they exhibited significant between-group differences (p < 0.05, permutation-corrected) in at least one nodal metric.*
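As an illustration of the nodal centrality measures compared above, the sketch below computes nodal degree and nodal betweenness for an unweighted, undirected graph. Betweenness follows Brandes' algorithm, which is a standard choice and not necessarily the routine used by the study's toolbox. The function names and the toy star network are hypothetical.

```python
from collections import deque

def nodal_degree(adjacency):
    """Number of edges attached to each node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}

def nodal_betweenness(adjacency):
    """Unnormalized betweenness centrality via Brandes' algorithm."""
    centrality = {v: 0.0 for v in adjacency}
    for source in adjacency:
        order = []                                  # nodes in order of discovery
        predecessors = {v: [] for v in adjacency}
        n_paths = {v: 0 for v in adjacency}         # shortest-path counts
        n_paths[source] = 1
        dist = {v: -1 for v in adjacency}
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to source.
        dependency = {v: 0.0 for v in adjacency}
        while order:
            w = order.pop()
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each undirected path is counted from both endpoints, so halve the sums.
    return {v: c / 2.0 for v, c in centrality.items()}

# Hypothetical toy network: node 0 is a hub linked to three leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

In the star network the hub has degree 3 and lies on all three shortest paths between leaf pairs, so both measures single it out as the most central node.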

Nodal network metrics in several regions were correlated with the training duration of BEs (*p* < 0.05; **Figure 6**), although nodal metrics in the regions that showed significant group differences were not correlated with training duration. The duration of training in BEs was positively correlated with nodal degree in the left SPL, but negatively correlated with nodal degree in the left CN and left pTFC. Nodal efficiency in the bilateral SPL, right anterior supramarginal gyrus (aSMG), and right pSMG was positively correlated with training duration. Training duration in BEs was positively correlated with nodal betweenness in the left SPL, but negatively correlated with nodal betweenness in the right CN, left pMTG, and right PO. **Figure 6** shows plots of the correlation between nodal metrics and training duration in BEs. However, none of these correlations withstood correction for multiple comparisons (*p* < 0.05/110).
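The gap between the uncorrected and corrected thresholds can be made explicit with a short sketch. It computes a Pearson correlation coefficient and applies the Bonferroni-style threshold of 0.05/110 stated in the text. The function names are hypothetical, and the study itself ran these analyses in SPSS.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def survives_correction(p_value, alpha=0.05, n_tests=110):
    """True if a p-value passes the corrected threshold (0.05/110 by default)."""
    return p_value < bonferroni_threshold(alpha, n_tests)
```

With 110 comparisons the corrected threshold is about 0.00045, so a correlation with, say, *p* = 0.01 passes the uncorrected criterion but not the corrected one.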

#### **DISCUSSION**

To our knowledge, this is the first study to combine structural MRI and rs-fMRI to investigate the morphological differences in the brain of BEs, the effect of such morphological differences on functional circuits at rest, and the topological organization of the whole-brain functional network in board-game experts, who are treated as excellent examples of cognitive expertise. Four main findings emerged from this study. First, relative to novices, BEs showed increased GMV in the bilateral NA and reduced GMV in the right AMY. Additionally, the GMV in the mOFC was correlated with training duration. Second, BEs showed higher FC between the right AMY and left mOFC, and decreased FC between the right NA and right mPFC. Third, BEs showed increased global efficiency and decreased characteristic path length, implying an increase in the global integration of the whole-brain functional network. Fourth, BEs exhibited differences in the nodal centrality metrics of many brain regions related to the diverse cognitive functions utilized in *Baduk* games.

#### **INTUITIVE EXPERTISE IN BOARD-GAME EXPERTS AND ITS IMPLICATION ON THE PRESENT FINDINGS**

Board games have historically been the primary focus of research on expertise (de Groot, 1965; Chase and Simon, 1973; Kahneman and Klein, 2009). Specifically, interest in how board-game experts make rapid and effective decisions and in the neural correlates associated with these intuitive judgments has been increasing in recent years (Duan et al., 2012; Wan et al., 2012). Such expertise or unconscious processing and linked brain changes are the products of prolonged experience within a domain and cannot be obtained through shortcuts. As stated previously, researchers have studied intuitive expertise in board-game experts using the NDM approach, particularly the RPD model (de Groot, 1965; Klein, 1998, 2008; Kahneman and Klein, 2009). To describe how experts can make extremely rapid and good decisions, the RPD model combines intuitive pattern matching processes based on past experience and deliberative mental simulation processes (Klein, 2008; Kahneman and Klein, 2009). These two processes correspond to System 1 and System 2, respectively, in dual-process accounts of cognition (Hodgkinson et al., 2009; Kahneman and Klein, 2009). Interestingly, in this study, the brain regions showing significant group differences in GMV, the NA and AMY, correspond to theories of the psychological and neurobiological mechanisms underlying intuitive judgment. That is, these regions belong to the X-system (System 1), which supports reflexive cognitive processing in social cognition (Lieberman, 2007), and, as expected, they also belong to the somatic marker circuitry based on the SMH. Compared with novices, BEs showed reduced GMV in the AMY and increased GMV in the NA. This finding is very interesting because of the inverse effect (increase/decrease) on the GMV of these two regions. Previous studies have reported specific correlations between regional cortical thickness and each component of cognition (Westlye et al., 2011), as well as a mixed pattern of increases and decreases in regional cortical thickness in experts (Kang et al., 2013). For example, when compared with novices, meditation experts showed a thicker cortex in the mOFC and temporal pole, areas associated with emotional processing, and a thinner cortex in the parietal areas and PCG, areas associated with attention and self-perception (Kang et al., 2013). In addition, recent studies have argued for the nonlinearity of training-induced GMV changes, showing an initial increase followed by a decrease in regional GMV (Boyke et al., 2008; Driemeyer et al., 2008), and suggested that these changes are affected by training length and intensity (Takeuchi et al., 2011). Based on such previous studies, a possible explanation for the opposite GMV effects in the AMY and NA is that each component involved in *Baduk* expertise affects a different part of the brain in a distinctive way (e.g., an increase or decrease in GMV), or that the specific processing associated with reduced GMV in the amygdala may be more strongly involved in BEs.

Contrary to our expectations, BEs and novices did not show any differences in the GMV of the occipitotemporal and parietal areas associated with visuospatial processing and spatial attention. However, we found significant group differences in terms of regional nodal properties of the functional brain network in these regions, particularly in the iLO, pSMG, TO, and pMTG, as well as in the NA and AMY, which also showed morphological differences. Previous studies have reported that, compared with novices, long-term chess experts (over 10 years of training) have morphological and functional differences in the HOC, showing decreased GMV in the region and increased resting-state FC in brain regions known as the default-mode network (DMN) (Duan et al., 2012), whereas individuals who trained for 15 weeks showed differences only in the activity of the anatomically identified HOC, which increased during chess play, but no morphological differences in the region between before and after training (Wan et al., 2012). Based on previous studies and our present findings, it is, thus, conceivable that changes in brain structure, as well as those in brain function, through long-term training may be required to become experts (i.e., to reach a professional level), and that BEs may primarily be involved in intuitive decision-making, which is associated with the regions showing both morphological and FC differences, and secondarily in visuospatial processing, which is associated with the regions showing group differences only in the functional brain network.

#### **STRUCTURAL AND FUNCTIONAL DIFFERENCES IN THE AMYGDALA**

The observed decrease in GMV of the AMY following training is consistent with results from previous studies that involved cognitive, motor, or mental training (Boyke et al., 2008; Takeuchi et al., 2011; Kang et al., 2013), suggesting that a possible mechanism underlying such structural changes is the use-dependent selective elimination of synapses (Huttenlocher and Dabholkar, 1997). The AMY is involved in configural/holistic visual processing for both faces and emotional facial expressions (Sato et al., 2011); it enables people to master facial affective processing. More specifically, the right AMY is more relevant to the unconscious (Morris et al., 1998) and rapid (Wright et al., 2001) processing of facial expressions than the left AMY. Beyond its role in emotional and facial processing, the AMY, in addition to the VMPFC, has also been proposed as one of the key structures in the SMH and decision-making, as measured by the Iowa gambling task (Bechara et al., 2005), a decision-making task that requires implicit learning and executive function abilities. Additionally, animal studies have found that the AMY may participate in the recognition of visual patterns based on past experiences through direct thalamo–amygdala projection (LeDoux et al., 1989), and that lesions to this area may increase impulsivity in decision-making (Winstanley et al., 2004). Human patients with AMY damage have shown decreased decision-making performance under conditions of ambiguity and risk (Brand et al., 2007). Thus, GMV reduction in the AMY of BEs may be related to intuitive decision-making that is based on feelings (a somatic marker) rather than on reasoning and/or the enhanced cognitive functioning (e.g., holistic visual processing) achieved through long-term *Baduk* training. The results of our RSFC and correlation analyses support this interpretation by showing increased FC between the AMY and mOFC in BEs and a correlation between the GMV in the mOFC and training duration.
Previous studies have demonstrated that interactions between the AMY and mOFC are crucial for goal-directed behavior (Holland and Gallagher, 2004) and emotional regulation (Banks et al., 2007). Lee et al. (2010) recently reported differences in the WM tract, the uncinate fasciculus, connecting these two regions in BEs. As mentioned above, the mOFC/vmPFC is an important area that generates somatic markers according to the SMH, and the correlation between the GMV in the region and training duration suggests that the morphological difference in the region may result from *Baduk* training rather than from pre-existing differences. An alternative interpretation of the present finding is that this opposing pattern (reduced GMV and increased FC in the amygdala) may reflect a compensatory neural mechanism pre-existing in people who later become BEs. That is, the observed pattern may be an endophenotype that enables improved somatic-marker-based (automatic) decision-making, and thus favorably influences (i.e., predicts) the development of *Baduk* expertise.

#### **STRUCTURAL AND FUNCTIONAL DIFFERENCES IN THE STRIATUM**

In contrast to the GMV in the AMY, the GMV in the NA was increased in BEs compared to novices. This is especially intriguing, as a recent study with chess experts using the same method (Duan et al., 2012) reported a GMV decrease in the dorsal part of the HOC, which is known to be involved in the quick generation of the best next move during chess playing (Wan et al., 2011), while we found a GMV increase in the ventral part of the HOC, the NA. Whereas the dorsal striatum is associated with sensorimotor experiences or reward-outcomes processing, the ventral striatum is associated with emotional and motivational experiences or reward-anticipation processing. Consistent with the present finding, Boyke et al. (2008) found increased GMV in the NA after juggling training. They suggested that learning to juggle may stimulate an increase in the size of this region due to its role as an interface between the limbic and motor systems rather than any role in motor control (Mogenson et al., 1980).

We also found decreased FC between the NA and mPFC in BEs compared with novices. Such connections are seen in the anterior part of the DMN (Di Martino et al., 2008; Duan et al., 2012; Jung et al., 2013), which is thought to be involved in self-referential processing and theory of mind (ToM) (Murdaugh et al., 2012; Reniers et al., 2012). ToM refers to the ability to infer the thoughts or intentions of others. Given the hyperconnectivity within the DMN of individuals at high-risk for psychosis (Shim et al., 2010), who also suffer from impaired ToM (Chung et al., 2008), a reverse pattern (decreased FC) in BEs may be due to their capacity to infer the opponent's intention while playing *Baduk*. It is also conceivable that the strength of this connection may reflect a change in their sensitivity to feedback, given that these couplings are thought to alter expectations in the face of negative feedback (van den Bos et al., 2012). However, these ideas are just speculations, and additional neuroimaging studies with *Baduk* tasks are necessary to identify the physiological mechanisms that underlie these brain differences and to clarify the relationship between their cognitive functions and brain structure and function.

#### **TOPOLOGICAL ALTERATION IN THE WHOLE-BRAIN FUNCTIONAL NETWORK**

The GTA results reveal increased global efficiency and decreased characteristic path length in BEs compared with novices, implying an increase in the global integration of the whole-brain network. This may reflect effective integrity and rapid information propagation between and among the remote regions of the brain involved in the cognitive processing required for *Baduk* play in BEs (Wang et al., 2010). Thus, this finding reflects a difference in the functional aspect of the whole-brain circuitry in the service of achieving the most efficient network for playing *Baduk*.

We also found the following differences between the two groups in the regional nodal characteristics of many brain regions: increased nodal centrality metrics in nine regions, namely the right PocG, right iLO, right PCG, left intracalcarine cortex, left PO, bilateral NA, and bilateral thalamus, and decreased nodal centrality metrics in 12 regions, namely the right SFG, right pMTG, right TO of MTG, right pSMG, right pallidum, left sLO, left aTFC, left OF, bilateral IFG, and bilateral AMY. Interestingly, whereas the brain regions showing increased nodal centralities in BEs were involved primarily in implicit processing, such as somatic sensation (PocG and thalamus), visual expertise (iLO), and affective/motivational processing (NA), the brain regions showing decreased nodal centralities were related to higher-order cognitive functions such as executive function (SFG, IFG, and sLO), semantic memory processing (pMTG and pSMG), and visual perception (TO and OF). It is speculated that brain regions showing differences in nodal centralities are important contributors to *Baduk* expertise, and may facilitate efficient exchange of information. In this context, our findings are consistent with the RPD model mentioned above (Klein, 2008). Therefore, *Baduk* is thought to involve diverse cognitive functions with respect to both automatic and deliberative processes.

Previous neuroimaging studies that focused on *Baduk* reported enhanced activation in the occipitotemporal and parietal cortices during these games (Chen et al., 2003; Ouchi et al., 2005), as well as learning-induced differences in activity in fronto–parietal and visual cortices (Itoh et al., 2008), which corresponds to our findings of differences in the PocG and iLO. Correlation analyses between each nodal metric and training duration of BEs showed positive correlations between these two values in the parietal areas, more extensively in the right than in the left hemisphere, although these correlations did not remain significant after correction for multiple comparisons. In terms of *Baduk* play, the right SPL may contribute to spatial working memory (Ungerleider et al., 1998) and/or spatial attention (Fink et al., 1996).

Chess requires the recognition of the identity and function of each piece (i.e., chess-specific object and function recognition, which is known to be associated with the occipito-parietotemporal junction, OTJ; Bilalić et al., 2011a,b), whereas this is not needed in *Baduk*. This difference is a confounding factor that makes it difficult to compare the results of the present study with those of previous neuroimaging studies on chess experts. However, both *Baduk* and chess involve pattern recognition with respect to the spatial positioning of stones or pieces, which is an essential component in improving at both games. Recently, Bilalić et al. (2010, 2011b) demonstrated that the CoS and RSC, part of the parahippocampal place area, play an important role in chess-specific pattern recognition. Intriguingly, in BEs compared to novices, we found increased nodal centralities in the iLO, which corresponds to the OTJ associated with chess-specific object and function recognition in studies by Bilalić et al. (2011a,b), but no differences in the CoS and RSC between the two groups. This discrepancy in results between the present study with BEs and previous studies with chess experts may stem from differences in the basic aspect of pattern recognition in these board games; pattern recognition in *Baduk* is based only on the shapes the stones take, while that in chess is based on object and function recognition of chess pieces and the ability to rapidly access information about potential moves or move sequences for each piece. Thus, differences in the iLO may be linked to the recognition of shapes and integration of local features (Kourtzi et al., 2003) during *Baduk* play.

Taken together, these GTA results provide insight into the topological organization of the functional brain networks of BEs: increased functional integration across global brain regions and increased nodal centralities in regions associated with spatial attention and somatic sensation.

#### **LIMITATIONS**

The present study has some limitations. First, it is difficult to accurately and quantitatively assess the skill level of BEs. Although we used training duration (in years) as a proxy for skill level in BEs, the validity of this proxy can be challenged. Skill levels in BEs may be independent of training duration, resulting in similar skill levels for any given pair of short- and long-trained participants. Previous studies have noted that the lack of any significant correlation between brain imaging data and the amount/intensity of training in experts may result from the inaccuracy of the proxy chosen to determine the actual extent/intensity of individual training (Luders et al., 2011; Kang et al., 2013). Second, as a result of the cross-sectional nature of this study, our findings do not allow unambiguous causal inference. Therefore, it is unclear whether the brain differences we found result from acquiring *Baduk* expertise through prolonged training, or if they simply reflect pre-existing differences in brain structure and function that predict later expertise. Longitudinal studies will help to clarify this issue. Finally, the present findings with respect to FC and GTA are based on rs-fMRI data of BEs rather than on brain activity during game performance. Thus, further functional imaging studies are necessary to investigate the topological properties and FC within the functional brain network while the individuals actually play *Baduk*.

#### **CONCLUSIONS**

The current study demonstrates differences in the structure and the functional circuits of the AMY and NA in BEs; compared with novices, experts showed decreased GMV and increased FC with the mOFC in the AMY as well as increased GMV and decreased FC with the mPFC in the NA. As interfaces between the cognitive and affective components of the limbic cortico–striatal loop, the AMY and NA are involved in implicit processing and goal-directed adaptive behavior under changing environmental conditions. In particular, the AMY is critical for emotional and holistic visual processing, as well as emotion-based decision-making. Based on our hypothesis that long-term *Baduk* training would influence the structure and functional circuits of regions associated with the cognitive mechanisms underlying *Baduk* expertise, our findings suggest that intuitive decision-making, which is mediated by somatic marker circuitry such as the AMY and NA, is a key cognitive component of *Baduk* play. The current study also provides new evidence for differences in the topological organization of the whole-brain network of BEs, showing increased global integration and altered regional nodal centralities in the regions related to visuospatial processing.

#### **ACKNOWLEDGMENTS**

This study was supported by a grant of the *Baduk* (Go) Research Project, Korea *Baduk* Association, Republic of Korea (KBA11). We thank the *Baduk* experts for participating, the Korea *Baduk* Association (http://english.Baduk.or.kr/) for supporting this study, Yong He, Jin-hui Wang, Xin-di Wang, and Ming-rui Xia for their development of the GRETNA toolbox, Sang-yoon Jamie Jung for the continuing encouragement, and Jiyoon Seol for the English revision of the revised manuscript. Furthermore, we are especially grateful to a number of individuals who provided valuable contributions to the study, including Ji Yeon Han, Bon-mi Gu, Ji-Young Park, and the clinical and nursing staff of the Clinical Cognitive Neuroscience Center.

#### **REFERENCES**


*Neurosci. Lett.* 487, 358–362. doi: 10.1016/j.neulet.2010.10.056


chess players. *Am. Econ. Rev.* 101, 975–990. doi: 10.1257/aer.101.2.975


Miletich, R. (1994). Brain activity in chess playing. *Nature* 369, 191. doi: 10.1038/369191a0


Alzheimer's disease. *PLoS Comput. Biol.* 4:e1000100. doi: 10.1371/journal.pcbi.1000100


*J. Neurosci.* 24, 4718–4722. doi: 10.1523/JNEUROSCI.5606-03.2004

Wright, C. I., Fischer, H., Whalen, P. J., McInerney, S. C., Shin, L. M., and Rauch, S. L. (2001). Differential prefrontal cortex and amygdala habituation to repeatedly presented emotional stimuli. *Neuroreport* 12, 379–383. doi: 10.1097/00001756-200102120-00039

Zhang, J., Wang, J., Wu, Q., Kuang, W., Huang, X., He, Y., et al. (2011). Disrupted brain connectivity networks in drug-naive, first-episode major depressive disorder. *Biol. Psychiatry* 70, 334–342. doi: 10.1016/j.biopsych.2011.05.018

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 12 June 2013; accepted: 12 September 2013; published online: 02 October 2013.*

*Citation: Jung WH, Kim SN, Lee TY, Jang JH, Choi C-H, Kang D-H and Kwon JS (2013) Exploring the brains of Baduk (Go) experts: gray matter morphometry, resting-state functional connectivity, and graph theoretical analysis. Front. Hum. Neurosci. 7:633. doi: 10.3389/fnhum.2013.00633*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Jung, Kim, Lee, Jang, Choi, Kang and Kwon. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **HUMAN NEUROSCIENCE**

## Brain regions concerned with the identification of deceptive soccer moves by higher-skilled and lower-skilled players

#### *Michael J. Wright<sup>1</sup>\*, Daniel T. Bishop<sup>1,2</sup>, Robin C. Jackson<sup>2</sup> and Bruce Abernethy<sup>3,4</sup>*

*<sup>1</sup> Department of Psychology, Centre for Cognition and Neuroimaging, Brunel University, Uxbridge, Middlesex, UK*

*<sup>2</sup> School of Sport Sciences and Education, Centre for Sports Medicine and Human Performance, Brunel University, Uxbridge, Middlesex, UK*

*<sup>3</sup> Faculty of Health Sciences, University of Queensland, Brisbane, QLD, Australia*

*<sup>4</sup> Institute of Human Performance, University of Hong Kong, Pokfulam, Hong Kong, China*

#### *Edited by:*

*Merim Bilalić, University Tübingen, University Clinic, Germany*

#### *Reviewed by:*

*Yongmin Chang, Kyungpook National University, South Korea*

*Rouwen Cañal-Bruland, VU University Amsterdam, Netherlands*

#### *\*Correspondence:*

*Michael J. Wright, Department of Psychology, Centre for Cognition and Neuroimaging, Brunel University, Kingston Lane, Uxbridge, Middlesex, UB8 3PH, UK*

*e-mail: michael.wright@brunel.ac.uk*

Expert soccer players are able to utilize their opponents' early body kinematics to predict the direction in which the opponent will move. We have previously demonstrated enhanced fMRI activation in experts in the motor components of an action observation network (AON) during sports anticipation tasks. Soccer players often need to prevent opponents from successfully predicting their line of attack, and consequently may try to deceive them; for example, by performing a step-over. We examined how AON activations and expertise effects are modified by the presence of deception. Three groups of participants (higher-skilled males, lower-skilled males, and lower-skilled females) viewed video clips in point-light format, from a defender's perspective, of a player approaching and turning with the ball. The observer's task in the scanner was to determine whether the move was normal or deceptive (involving a step-over), while whole-brain functional images were acquired. In a second counterbalanced block with identical stimuli the task was to predict the direction of the ball. Activations of the AON for identification of deception overlapped with activations from the direction identification task. Higher-skilled players showed significantly greater activation than lower-skilled players in a subset of AON areas, and lower-skilled males in turn showed greater activation than lower-skilled females, but females showed more activation in visual cortex. Activation was greater for deception identification than for direction identification in dorsolateral prefrontal cortex, medial frontal cortex, anterior insula, cingulate gyrus, and premotor cortex. Conversely, greater activation for direction than deception identification was found in anterior cingulate cortex and caudate nucleus. Results are consistent with the view that explicit identification of deceptive moves entails cognitive effort and also activates limbic structures associated with social cognition and affective responses.

**Keywords: fMRI, action observation, deception, expertise, soccer, football, mirror neuron system, sport**

#### **INTRODUCTION**

Expert players in interceptive sports such as soccer react under great time pressure and therefore need to predict the actions of their opponents and the direction of play (Reilly et al., 2000; Abernethy et al., 2001; Savelsbergh et al., 2005; Williams et al., 2011). Studies have shown that superior analysis of body kinematics underpins much anticipation skill in sport. In the temporal occlusion paradigm, action is cut off at various time intervals relative to a crucial event (such as point of direction change in soccer), and the observer judges the direction of the shot. Results consistently show that experts are able to detect the predictive information with greater accuracy and earlier than novices (Abernethy and Russell, 1984, 1987; Abernethy et al., 2008). The nature of the predictive information has also been identified using techniques such as spatial occlusion, in which different parts of the opponent's body are systematically masked (Muller et al., 2006; Jackson and Mogan, 2007). The reductive approach to identifying the minimum visual information sufficient to support expert anticipation has been taken further with the use of point-light video stimuli. Comparisons of performance based on ball trajectory alone and studies using point-light stimuli indicate the pre-eminence of body kinematics as a cue to future action (Abernethy et al., 2001, 2008; Huys et al., 2008).

A general conclusion from this research is that experts are better than novices at detecting predictive cues in opponents' body kinematics, and this gives them an advantage in speed and accuracy. Precisely for this reason, skilled players also need to develop strategies to reduce the predictability of their own actions. The effectiveness with which deceptive moves can thwart anticipation has been established in soccer (Dicks et al., 2011; Smeeton and Williams, 2012; Bishop et al., 2013); rugby football (Jackson et al., 2006; Brault et al., 2012; Mori and Shimada, 2013); basketball (Sebanz and Shiffrar, 2009; Kunde et al., 2011); handball (Cañal-Bruland and Schmidt, 2009; Cañal-Bruland et al., 2010), and tennis (Rowe et al., 2009). In these studies it was found that experts are more accurate than novices in predicting the outcome of deceptive moves, and that the expert-novice difference tends to be greater for deceptive than for normal moves.

Neuroimaging studies have provided some insights into the neural structures that mediate anticipation skills. A substantial literature has developed around functional imaging studies of cortical networks that mediate the perception of, and the production of responses to, others' actions. Molenberghs et al. (2012) conducted a meta-analysis of 125 fMRI studies of the human "mirror neuron system" (MNS) and identified a core network of brain areas including inferior frontal gyrus, dorsal and ventral premotor cortex, and inferior and superior parietal lobule, that were activated in studies involving the observation and/or production of actions. Most fMRI experiments on the observation of actions do not include direct evidence for the presence of mirror neurons, so we refer in the present paper to an action observation network (AON: Grafton, 2009) rather than MNS. The AON does nevertheless include the structures identified by Molenberghs et al. (2012) as core elements of the MNS.

Research has demonstrated the importance of the AON in sport, including structures traditionally interpreted as having motor functions. Wright and Jackson (2007) measured cortical fMRI activation in predicting the direction of a tennis serve from temporally-occluded video clips. Relative to a passive, action-observation control condition, action prediction activated the anterior components of the AON, particularly the dorsal and ventral premotor cortex. Aglioti et al. (2008) found that observation of basketball shots increased the strength of motor-evoked potentials elicited by transcranial magnetic stimulation, and that experts showed a time-specific motor activation for missed shots, indicating a close and specific interaction between perceptual and motor systems that is dependent on experiential learning. Wright et al. (2010) found stronger activation for expert badminton players while predicting the direction of badminton shots in components of the AON, specifically, medial frontal cortex, inferior frontal gyrus, anterior insula, and superior parietal lobule. Wright et al. (2011) showed that low-resolution point-light badminton video effectively supported judgments of the direction of a shot, and elicited a corresponding full pattern of fMRI activations in these areas including expertise effects, thus indicating the sufficiency of body kinematics as input to the AON. Bishop et al. (2013) studied neural correlates of direction prediction in soccer, with temporally-occluded video stimuli that included deceptive moves, and with randomized presentation that maximized uncertainty. High-skilled observers showed stronger responses than intermediates and novices not only in cortical AON structures but also in subcortical structures, including cerebellum, lentiform nucleus, and thalamus, that have been implicated in response selection (Yarrow et al., 2009).

Correct direction prediction in a situation where an opponent can use deceptive moves, for example when an oncoming rugby player executes a side-step, may involve attending to "honest" movement cues and ignoring "deceptive" movement signals (Brault et al., 2012). This is a complex skill that entails more than simply being able to recognize a normal or a deceptive move: the correct implication of that move in terms of outcome (future direction of play) must also be perceived or comprehended. This is perhaps the reason that highly-skilled players often take longer to react than novices in the presence of deception, and achieve greater accuracy as a result (Brault et al., 2012; Mori and Shimada, 2013). In some studies, experts are found to be significantly disadvantaged by deception, although they may be less disadvantaged than novices (Brault et al., 2012; Bishop et al., 2013; Mori and Shimada, 2013). Possible reasons for this include an increased cognitive load, perceptual uncertainty, or misdirection of attention in the presence of deception.

The purpose of the present study was to analyze the neural and behavioral responses of lower-skilled and higher-skilled players to the task of identifying soccer moves as normal or deceptive and, by comparison, to measure the neural and behavioral responses to identifying future direction of play in an identical (normal plus deceptive) stimulus set. Most studies of deceptive moves in sport have used identification of future direction of play as a measure (Jackson et al., 2006; Rowe et al., 2009; Dicks et al., 2011; Kunde et al., 2011; Brault et al., 2012; Smeeton and Williams, 2012; Bishop et al., 2013; Mori and Shimada, 2013). A smaller number have measured deception identification (Cañal-Bruland and Schmidt, 2009; Sebanz and Shiffrar, 2009; Cañal-Bruland et al., 2010). These tasks are not equivalent. Firstly, as Cañal-Bruland and Williams (2010) found, the kinematic information used when predicting the direction of a shot differs from that used when discriminating between two different movement patterns. Secondly, the consequences of the judgment are different. Direction identification requires a directional or spatial judgment with implications for the direction of an interceptive movement. By contrast, deception identification implies a more analytical judgment of an observed action as having some goal or intent, but without specifying direction. It was therefore hypothesized that both behavioral performance and cortical patterns of activation would differ between direction identification and deception identification, and that there would be differences in the activation of task-related regions, as identified by fMRI, in lower-skilled and higher-skilled players. In view of the research reviewed above showing the sufficiency of body kinematics in sport action prediction tasks, and in order to eliminate irrelevant stimulation by background stimuli, physical appearance and clothing of actors, we utilized point-light stimuli for the tasks.

#### **MATERIALS AND METHODS**

#### **EXPERIMENTAL DESIGN AND PROCEDURE**

In a block-design fMRI study, participants in the scanner viewed 2-s video sequences of an opposing soccer player dribbling the ball toward the viewer, and pressed a button to indicate which direction the player would turn; that is, the leftmost button for a turn to the observer's left, and the rightmost button for a turn to the observer's right. There was an interstimulus interval of 2 s during which a gray screen at mean luminance was present, and instructions were to respond as accurately as possible during the interstimulus interval. There were five video clips in each block. Exactly half of the sequences of each type were based on deceptive moves (step-over) and half on normal moves, both for direction prediction and for control conditions. The type of move (normal or deceptive) was randomized within blocks. In addition to fMRI data, button press responses were recorded and analyzed for accuracy.

A second session of the experiment utilized exactly the same stimulus material and block design but required a different action identification task: instead of predicting which direction the player would turn, the observer had to indicate by a button press whether a move was normal or deceptive. The order of sessions was counterbalanced across participants.

For both action identification tasks, we used a control block: a single static frame at the start of the point-light footballer's run was used, and it was slowly magnified (zoomed) over 2 s to match the apparent motion of the footballer toward the observer. However, as it was derived from a static frame, there was no biological motion: that is, there was no relative motion between the dots representing the movements of the footballer's limbs and trunk. We therefore refer to this as a non-biological motion (NBM) control. The required response for this type of video was simply to press a middle button. Mean accuracy on this task was 99.9%. A further type of block required participants to respond to an altered dot in the point-light footballer video (98.5% correct); further analyses of the responses to this condition are not within the scope of the present paper.

Before each block of 5 trials, a 5-s instruction screen appeared specifying the task for the subsequent block. Blocks were presented in a fixed pseudorandom sequence. Altogether there were eight repetitions of the three types of block: (1) soccer direction identification with 0 ms occlusion; (2) soccer direction identification with −160 ms occlusion; (3) NBM control. The total duration including instruction screens and blank intervals was 18 min.

The same control task and stimuli were used in the deception identification session as in the direction identification session. Thus, the only difference in the material for the two versions of the experiment was in the on-screen instructions. The three types of block were thus: (1) soccer deception identification with 0 ms occlusion; (2) soccer deception identification with −160 ms occlusion; (3) NBM control. Participants undertook both versions of the experiment, and the order was counterbalanced across participants. The experiment as a whole comprised two 18-min sessions, plus an anatomical scan lasting 5 min.

After completing the pre-scan screening and informed consent procedures, participants were instructed in the nature of the task and shown examples of the stimuli. They were asked whether they were familiar with the step-over as a deceptive move, and if not, a brief verbal explanation was given.

#### **PARTICIPANTS**

The participants were 17 higher-skilled male soccer players (mean age 22.6, *SD* 4.0, range 19–33 years) and 17 lower-skilled male soccer players (mean age 22.1, *SD* 3.7, range 19–31 years). Additionally, 17 females (mean age 20.1, *SD* 1.1, range 19–23 years) were included as a group with minimal soccer experience. Participants were recruited by advertising on University notice-boards and websites and by word of mouth and were offered £20 in expenses to recompense for their time and inconvenience. All participants gave their written informed consent as part of a protocol approved by the Brunel University Department of Psychology Ethics Committee. Procedures for fMRI were conducted according to the Rules of Operation of the Combined Universities Brain Imaging Centre. All participants completed a questionnaire giving brief demographic details and providing information on their soccer experience and expertise. Higher-skilled players were defined as those playing currently or within the last year in a league with regular fixtures and for a named club whose provenance could be checked on the internet. They were drawn from local leagues and University teams and did not include elite or professional players. Lower-skilled players were nonplayers or recreational players, but included some with previous experience (more than 1 year previous) of playing competitively for local sports clubs or school teams. All but one participant in the lower-skilled male group had played soccer in childhood. **Table 1** compares the samples according to age and soccer experience. Higher- and lower-skilled males differed significantly on a Mann-Whitney *U* test in the highest level of competition achieved, *U* = 29, *p* < 0.0005; the number of hours per week in training, *U* = 50, *p* < 0.005; and the number of matches watched (live or on television or other media) per month, *U* = 105, *p* < 0.05. They did not differ significantly in age, in the age at which they started playing, or in the skill level of other sports played. The lower-skilled females differed significantly from the lower-skilled males in the number of years playing, *U* = 82, *p* < 0.005; competitive level, *U* = 116, *p* < 0.05; hours per week training, *U* = 111, *p* < 0.05; and matches watched per month, *U* = 70, *p* < 0.005. They did not differ significantly in the level of other sports played. From the point of view of the research hypotheses, the female lower-skilled group provides a baseline with a low level of soccer experience: the possible influence of gender will be addressed in the Discussion.

#### **STIMULI**

All experiments utilized 2-s point-light video clips of three junior international male soccer players dribbling the ball toward a video camera (NV GS400; Panasonic Corporation, Secaucus, NJ) placed at a distance of 11.5 m from the start of the player's run, in an indoor sports hall. The actors ran toward the camera, then at a predetermined point, moved obliquely to the left or right as they would in evading a defending player's interception. They performed a deceptive maneuver known as a step-over in 50% of runs immediately prior to a direction change. The color video was edited (Pinnacle Studio Pro v 11.0, Pinnacle Systems, CA) frame by frame to produce sparse binary (black/white) point-light representations consisting of 15 small disc markers on principal body joints and extremities. The ball was represented in each frame by a white disc. There was no representation of surface texture, depth, orientation, or color, either that of the player or that of the background. To generate different levels of temporal occlusion, the video was truncated at various time points relative to the passing of the floor marker (0 ms). Two occlusion levels were used (−160 and 0 ms).

#### **Table 1 | Comparison of the soccer experience of the participant groups.**

#### **ACQUISITION AND ANALYSIS OF fMRI DATA**

Functional and structural images were acquired on a MAGNETOM Trio 3T MRI scanner (Siemens Medical Solutions; Bracknell, UK) using Siemens' parallel imaging technology (iPat), which was deployed with a generalized autocalibrating partially parallel acquisitions (GRAPPA) acceleration factor of two, via a Siemens eight-channel array head coil. For each functional run, an ultra-fast echo planar gradient-echo imaging sequence sensitive to blood-oxygen-level dependent (BOLD) contrast was used to acquire 41 transverse slices (3 mm thickness) per TR (3000 ms, TE 31 ms, flip angle = 90°). For each version of the experiment, 360 volumes were acquired in a 192 × 192 mm field of view with a matrix size of 64 × 64, giving an in-plane spatial resolution of 3 mm (generating 3 × 3 × 3 mm voxels). Anatomical data were collected in the same orientation and plane as the functional data to enable localization, using an MP-RAGE T1-weighted sequence, in which 176 one-mm slices alternated with a 0.5 mm gap. The structural sequence incorporated 1830 ms TR, 4.43 ms TE, FoV 256 mm and a GRAPPA acceleration factor of two.
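The acquisition parameters above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original analysis; the variable names are illustrative, and the values are those stated in the text.

```python
# Acquisition parameters as stated in the text (variable names are illustrative).
tr_s = 3.0                 # repetition time: 3000 ms
n_volumes = 360            # volumes per version of the experiment
fov_mm = 192.0             # field of view along one in-plane axis
matrix = 64                # acquisition matrix along the same axis
slice_thickness_mm = 3.0   # transverse slice thickness

# Duration of one functional run, in minutes.
run_minutes = n_volumes * tr_s / 60.0

# In-plane resolution is the field of view divided by the matrix size.
in_plane_mm = fov_mm / matrix

# Volume of a single voxel in cubic millimetres.
voxel_volume_mm3 = in_plane_mm * in_plane_mm * slice_thickness_mm

print(f"run duration: {run_minutes} min")
print(f"in-plane resolution: {in_plane_mm} mm")
print(f"voxel volume: {voxel_volume_mm3} mm^3")
```

The run duration obtained this way agrees with the 18-min session length given in the experimental design, and the in-plane resolution agrees with the stated 3 mm.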

#### *Data acquisition and preprocessing*

fMRI data were analyzed using the batch processing utilities of SPM8 (http://www.fil.ion.ucl.ac.uk/spm/). Functional images for both sessions were spatially realigned by initially aligning the first images of each session, and then aligning the images within each session to the first image, to moderate the effects of participants' head motion. Images were normalized using the SPM8 EPI template to account for anatomical variability, and to facilitate reporting of activation sites in the Montreal Neurological Institute (MNI) standard space. Finally, data were smoothed using a Gaussian kernel of 6 mm full-width half-maximum (FWHM) to increase the signal-to-noise ratio according to the matched filter theorem. The selected design matrix convolved the experimental design with a hemodynamic response function to model the hemodynamic lag behind the neuronal response. This model was estimated using proportional scaling over the session to remove global effects, and with a high pass filter of 128 s.
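To illustrate what convolving a block design with a hemodynamic response function involves, the sketch below builds a boxcar for two 20-s blocks (five clips of 2 s, each followed by a 2-s interval) and convolves it with a double-gamma response. This is an illustration only and not the SPM8 implementation: the gamma shape parameters (6 and 16) and the undershoot ratio (1/6) are assumed, commonly used values, and the block onsets and run length are hypothetical.

```python
from math import exp, gamma


def gamma_pdf(t, shape):
    """Gamma probability density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * exp(-t) / gamma(shape)


def double_gamma_hrf(t):
    """Assumed double-gamma response: a peak minus a smaller, later undershoot."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0


def block_regressor(onsets_s, block_len_s, run_len_s, dt=0.1, hrf_len_s=32.0):
    """Convolve a boxcar (1 during blocks, 0 elsewhere) with the response function."""
    n = int(round(run_len_s / dt))
    boxcar = [0.0] * n
    for onset in onsets_s:
        start = int(round(onset / dt))
        stop = min(n, int(round((onset + block_len_s) / dt)))
        for i in range(start, stop):
            boxcar[i] = 1.0
    # Sample the response function on the same time grid, scaled by dt.
    kernel = [double_gamma_hrf(k * dt) * dt for k in range(int(round(hrf_len_s / dt)))]
    out = [0.0] * n
    for i, b in enumerate(boxcar):
        if b:
            for k, h in enumerate(kernel):
                if i + k < n:
                    out[i + k] += b * h
    return out


# Hypothetical timing: two 20-s task blocks in a 120-s stretch of a run.
regressor = block_regressor(onsets_s=[5.0, 65.0], block_len_s=20.0, run_len_s=120.0)

# Sample the high-resolution regressor once per TR (3 s = 30 steps of 0.1 s).
regressor_per_tr = regressor[::30]
```

The predicted response lags the block onset by several seconds, which is the hemodynamic lag that the design matrix is intended to model.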

#### *Statistical analysis*

Individual level whole-brain fMRI *t*-contrasts were computed between experimental and control conditions as follows: (1) 0 ms occlusion vs. NBM control, (2) −160 ms occlusion vs. NBM control. This analysis was repeated for both direction identification and deception identification. Second-level, group data were analyzed using the SPM8 full factorial ANOVAs procedure. Two ANOVAs were conducted: the first, within-group ANOVA was based on the first level contrast between the experimental tasks and the NBM control and it was carried out twice, once for each group. The purpose of this analysis was to identify the within-group patterns of activation for the two tasks, deception and direction identification. Between-group differences were analyzed in the main (3 × 2 × 2) mixed ANOVA; the three factors were *expertise* (higher-skilled males, lower-skilled males, lower-skilled females), *task* (deception identification, direction identification), and *occlusion* (0, −160 ms). The input data to the ANOVA model were the first-level *t*-contrasts for 0 ms and for −160 ms occlusion vs. NBM control, for both the deception and direction identification tasks. Family-wise error (FWE) correction was used for all whole brain data. Identification of the location of peaks and clusters and assignment of Brodmann area (BA) labels was carried out in MNI space using WFU-Pickatlas (Maldjian et al., 2003). Accuracy of behavioral responses in the scanner was also analyzed statistically: details are given below.

#### **RESULTS**

### **BEHAVIORAL RESULTS**

#### *Identification accuracy*

The percentage of correct responses was measured for both direction identification and deception identification in the scanner, and a mixed ANOVA was conducted with identification task (deception, direction), trial type (normal, deceptive), and occlusion (0, −160 ms) as within-participant variables and group (higher-skilled male, lower-skilled male, lower-skilled female) as a between-participant variable. There was a significant main effect of trial type, *F*(1, 48) = 63.5, *p* < 0.0005, η<sup>2</sup><sub>p</sub> = 0.59, with the mean accuracy higher for normal (*M* = 75.8%) than for deceptive trials (*M* = 53.6%). There was also a significant main effect of occlusion, *F*(1, 48) = 64.5, *p* < 0.0005, η<sup>2</sup><sub>p</sub> = 0.59, with higher accuracy for late occlusion (*M* = 72.4%) than for early occlusion (*M* = 57.6%). There was also a significant main effect of group, *F*(2, 48) = 10.4, *p* < 0.005, η<sup>2</sup><sub>p</sub> = 0.32. Tukey's HSD showed that higher-skilled males (*M* = 72.7%) differed significantly in overall accuracy from lower-skilled males (*M* = 64.5%, *p* < 0.05) and lower-skilled females (*M* = 57.9%, *p* < 0.001). Lower-skilled males and females did not differ significantly from one another. It was also expected that higher-skilled participants would be relatively superior in their response to deceptive stimuli, and this was confirmed; the interaction of expertise and trial type was significant, *F*(2, 48) = 3.9, *p* < 0.05, η<sup>2</sup><sub>p</sub> = 0.15. These results are broadly consistent with previous work on expertise and anticipation skill. Additionally, normal and deceptive stimuli were differentially affected by occlusion: for trial type × occlusion, *F*(1, 48) = 25.7, *p* < 0.0005, η<sup>2</sup><sub>p</sub> = 0.39.

A novel aspect of the design was the comparison of two different identification tasks. Overall, the two tasks had a similar level of difficulty (deception identification, *M* = 65.7%; direction identification, *M* = 64.3%), and the overall difference in accuracy was not significant. However, there was a significant two-way interaction between identification task (deception, direction) and trial type (normal, deceptive), *F*(1, 48) = 28.7, *p* < 0.0005, η<sup>2</sup><sub>p</sub> = 0.39. As shown in **Figure 1**, on normal moves, accuracy was significantly higher for direction identification, but on deceptive moves, accuracy was significantly higher for deception identification. There was also a significant three-way interaction between identification task, trial type and occlusion, *F*(1, 48) = 13.2, *p* < 0.005, η<sup>2</sup><sub>p</sub> = 0.22, such that the task difference on deceptive trials was greater on late-occluded than early-occluded blocks. These interactions are of particular interest and to aid their interpretation, the mean scores are shown in **Figure 1**.

**Figure 1** shows that for normal trial stimuli, mean accuracy of identification was relatively high (66–86%), both for early and late occlusion. For deceptive trial stimuli, the results were more complex. Planned comparisons (within-participants *t*-tests) were carried out for all comparisons of deception identification and direction identification, and the significant results are shown in **Figure 1**. Thus, for late-occluded stimuli, participants performed with 76% accuracy in identifying deceptive stimuli on deceptive trials, but in judging direction with the same stimuli their accuracy (54%) was not significantly better than chance on a one-sample *t*-test.

#### *Signal detection theory analysis*

Because the percentage correct accuracy values may be affected by response bias, signal detection theory (SDT: Green and Swets, 1966) was applied. This method has been used previously to analyze the identification of normal vs. deceptive movements by Cañal-Bruland and Schmidt (2009). SDT calculates two variables, d-prime (*d*′: perceptual sensitivity) and beta (β: likelihood ratio or response bias). The *d*′ is a measure of the difference between the signal and noise distributions, expressed in terms of their common standard deviation (*z*-units), and is calculated as *d*′ = *z*(H) − *z*(F), where H is "hits" or correct identifications, and F is false positives (Macmillan and Creelman, 2005). Thus, for deception identification, H was taken to be the proportion of correct identifications of normal moves, and F was taken to be the proportion of deceptive moves incorrectly identified as normal. For direction identification, H was taken to be the proportion of correct identifications of direction on normal moves, and F was taken to be the proportion of incorrect identifications of direction on deceptive moves.

**FIGURE 1 | Mean percentage accuracy on normal and deceptive trials in scanner sessions where the task was to identify the type of move (normal or deceptive) and in sessions where the task was to identify the direction of play (left or right).** Error bars are ±1 s.e.m. Difference between deception identification and direction identification (bracketed bars) is significant at ∗∗*p* < 0.005.

For deception identification, *d*′ was significantly greater for late-occluded stimuli (*M* = 1.46) than for early-occluded stimuli (*M* = 0.59). There was also a significant main effect of expertise. *Post-hoc* contrasts (Tukey) showed that the higher-skilled males were more sensitive to the difference between normal and deceptive moves than the lower-skilled males and lower-skilled females.

For direction identification, *d*′ was significantly greater for stimuli occluded at 0 ms (*M* = 1.07) than for stimuli occluded at −160 ms (*M* = 0.39). There was also a significant main effect of expertise and *post-hoc* contrasts (Tukey) showed that the higher-skilled males were more sensitive to direction than the lower-skilled males or females, taking into account the false identifications of direction on deceptive moves. Although raw accuracy scores on normal moves were higher for direction identification (**Figure 1**), overall sensitivity to direction (*d*′) was lower than for deception identification, because of the incorrect responses on deceptive moves (**Figure 2**).

The criterion position *c* is the midpoint of the normalized hits and false positives, *c* = −½[*z*(H) + *z*(F)]. A more generally accepted measure of response bias is the likelihood ratio (β), which takes sensitivity (*d*′) into account, and is calculated as β = *e*<sup>*cd*′</sup>, where *cd*′ = −½[*z*(H)<sup>2</sup> − *z*(F)<sup>2</sup>] (Macmillan and Creelman, 2005; Cañal-Bruland and Schmidt, 2009). A neutral criterion is *c* = 0, or β = 1. For deception identification, if an observer were biased toward identifying moves as normal, then this would increase both the hits and the false positives, and give β < 1, that is, a liberal criterion. If the observer were biased toward identifying moves as deceptive, it would decrease both the hits and the false positives and give β > 1, that is, a conservative approach to identifying a normal move.
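The sensitivity and bias measures defined above can be computed directly from a hit rate and a false-positive rate. The following is a minimal sketch in Python using only the standard library; the function name and the example rates are hypothetical and are not data from the study. It assumes both rates lie strictly between 0 and 1, since the *z*-transform is undefined at the extremes.

```python
from math import exp
from statistics import NormalDist


def sdt_measures(hit_rate, false_positive_rate):
    """Return d', criterion c and likelihood ratio beta.

    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf           # inverse standard normal CDF (z-transform)
    z_h = z(hit_rate)
    z_f = z(false_positive_rate)
    d_prime = z_h - z_f                # d' = z(H) - z(F)
    c = -0.5 * (z_h + z_f)             # c = -1/2 [z(H) + z(F)]
    beta = exp(c * d_prime)            # equals exp(-1/2 [z(H)^2 - z(F)^2])
    return {"d_prime": d_prime, "c": c, "beta": beta}


# Hypothetical observer with symmetric rates: neutral criterion (c = 0, beta = 1).
neutral = sdt_measures(0.80, 0.20)

# Hypothetical observer who calls most moves "normal": hits and false positives
# are both raised, giving a liberal criterion (beta < 1).
liberal = sdt_measures(0.90, 0.50)

print(neutral)
print(liberal)
```

Because *c*·*d*′ expands to −½[*z*(H)<sup>2</sup> − *z*(F)<sup>2</sup>], computing β from the product of the two measures or from the squared *z*-scores gives the same value.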

For deception identification, ANOVA showed a significant main effect of occlusion (**Table 2**), with larger β indicating a conservative criterion for late-occluded (*M* = 1.7) but not for early-occluded stimuli (*M* = 0.39). There was also a significant main effect of expertise. *Post-hoc* comparison showed that higher-skilled males set their criterion significantly further toward deception, relative to lower-skilled females. The interaction between expertise and occlusion was also significant, with the expertise difference appearing on late-occluded stimuli (**Figure 2**).

For direction identification the interpretation of β is a little more complex, as it represents a perceptual bias rather than a response bias. A value of β < 1 implies incorrect identification of direction on deceptive moves (designated false positives), without a corresponding increase in errors on normal moves (designated misses), that is, a tendency to analyze the direction of all moves as if they were normal. This would arise if deceptive cues that are incongruent with direction resemble normal cues that are congruent with direction. A value of β > 1, conversely, would represent a tendency to err on normal moves but not on deceptive moves; in effect, to treat appearances as deceptive. Thus, some of the effect of deception is revealed on the criterion measures. **Figure 2** shows that all cell mean β values for direction identification were less than 1 and the planned comparisons (one-sample *t*-tests) were significant separately for all experimental conditions except for higher-skilled players on late-occluded stimuli: there was a general tendency to treat appearances as normal and to be fooled by deceptive moves. The main effect of expertise on β was not significant: there was no evidence that higher-skilled players adopted a different criterion from lower-skilled players. The only significant ANOVA result for β in direction identification (**Table 2**) was a main effect of occlusion, with β smaller for late-occluded (*M* = 0.70) than early-occluded stimuli (*M* = 0.86).

**Table 2 | Significant results from ANOVA conducted separately for deception identification and direction identification.**

(one sample *t*-test, two-tailed).

*Means and standard errors for every cell are shown in Figure 2. In each case, the independent variables were temporal occlusion (0 ms,* −*160 ms), and expertise (HSM, higher-skilled males; LSM, lower-skilled males; LSF, lower-skilled females). p < 0.05\*, < 0.005\*\*, < 0.0005\*\*\*.*

**FIGURE (caption fragment) |** … occlusion; **(C)** β, 0 ms occlusion; **(D)** β, −160 ms occlusion. Error …

It can be concluded that higher-skilled males were significantly more sensitive to the cues that differentiate normal from deceptive soccer moves. However, they also showed a bias toward identifying moves as deceptive, and in this respect they resembled the skilled handball goalkeepers in the study of Cañal-Bruland and Schmidt (2009). Results also showed that higher-skilled males were significantly more sensitive overall than lower-skilled observers to directional cues. However, observers on average adopted a liberal criterion for direction identification (**Figure 2**), that is, a significant tendency to treat deceptive moves like normal moves for the purposes of identifying direction; in other words, to be fooled by the deceptive stimuli.

#### **WHOLE-BRAIN ANALYSIS OF fMRI DATA**

#### *Within-groups analysis*

**Figures 3**–**5** show group data for the activations due to the first-level contrast between the two soccer identification conditions and the NBM control trials, superimposed on horizontal sections of a normalized brain anatomy.

The data were entered into separate 2 × 2 ANOVAs, using the factorial design options of SPM8, one ANOVA for each participant group. **Figures 3**–**5** show the responses in the deception task colored cyan (light blue), and the responses in the direction task colored magenta (pink). Overlapping activations are shown in a mixed color (purple). Both sets of data are based on first-level *t*-contrasts measured relative to NBM control. The second level data are displayed with a very conservative statistical threshold (*p* < 0.0005, FWE corrected) for both occlusion levels combined. **Figure 3** shows results for higher-skilled males, **Figure 4** for lower-skilled males, and **Figure 5** for lower-skilled females. For all participant groups, the main anatomical areas showing strong activations were similar for deception identification and direction identification and included regions identified as part of a human AON, specifically the intraparietal sulcus (BA40) and premotor cortex (BA6). The supplementary motor area in medial frontal cortex (BA6) was also consistently activated, along with the adjoining anterior cingulate cortex (ACC; BA32). Consistent activations in the anterior insula (BA13) were also present. The numerical data corresponding to **Figures 3**–**5** are available in **Tables 3**–**5**.

Areas showing stronger activation to deception than direction identification are shown in darker blue; and areas showing stronger activation to direction than deception identification are shown in red. These are displayed with a liberal statistical criterion (*p* < 0.001 uncorrected, minimum cluster size = 5); some of these clusters coincide with the principal task-sensitive areas but some do not. A further analysis of task differences will be given in the next section of the Results.

#### *Analysis of differences between identification tasks and expertise groups*

To establish the significance of differences between expertise groups and tasks, fMRI data were combined in a factorial ANOVA. There were three expertise groups (higher-skilled males, lower-skilled males, and lower-skilled females), two levels of task (deception identification, direction identification), and two levels of occlusion (0, −160 ms). The inputs to the second level factorial model were the first-level *t*-contrasts between the identification conditions and the NBM control condition. There were significant main effects of expertise group, task, and occlusion. No significant two- or three-way interactions were found.

**FIGURE 3 | Higher-skilled males.** Second-level fMRI activations (*p* < 0.005, FWE corrected, 25 voxels minimum cluster size) to deception identification (cyan) and direction identification (magenta) in point-light soccer video clips, relative to stimulus-matched non-biological motion (NBM) controls. Overlapping areas responding to both identification tasks appear purple. Activations above threshold (blobs) are displayed in co-registration with an individual normalized structural brain image and sampled in horizontal sections 10 mm apart from *z* = 60 to *z* = −20. In darker blue areas, activation to deception identification exceeds activation to direction identification; and in red areas, activation to direction identification exceeds activation to deception identification (at *p* < 0.001 uncorrected). Key: a: premotor, BA6; b: parietal, BA40; c: medial frontal, BA6; d: anterior cingulate, BA32; e: posterior cingulate, BA23; f: dorsolateral prefrontal, BA46; g: caudate nucleus; h: superior temporal gyrus, BA37; i: anterior insula/frontal operculum, BA13/45; j: cerebellum; k: superior parietal lobule, BA7.


*Differences between deception and direction identification.* **Figure 6** shows areas responding differentially to the two tasks, measured across all participants. Regions responding significantly more strongly to deception than to direction identification were identified in second-level SPM *t*-contrasts, at *p* < 0.05 with whole-brain FWE correction and minimum cluster size of 5. As identified in **Table 6**, these comprised the right dorsolateral prefrontal cortex (BA46), medial frontal cortex (BA6), right premotor cortex (BA6), left and right anterior insula (BA13), posterior cingulate cortex (BA23), and right intraparietal sulcus (BA40). Regions responding more to direction than to deception identification were limited to the ACC (BA32) and caudate nucleus: the peaks in these two structures were connected at the cluster level at *p* < 0.05 FWE.

Additionally, there was a significant main effect of occlusion that was represented by a single large cluster located in left premotor cortex (BA6).

*Expertise group differences.* As shown in **Table 7** and **Figure 7**, significant differences between higher- and lower-skilled male groups were restricted to task-sensitive AON regions: dorsal and ventral premotor cortex and frontal operculum, together with the left occipital-temporal junction, a region sensitive to visual motion, and some differences in occipital cortex. There were no significant voxels for greater activation in lower- than in higher-skilled male players. Differences between male and female lower-skilled participants were more extensive, principally comprising AON regions, but not overlapping with the male skill-related activations. The reversed *t*-contrast found areas in the temporal-parietal junction (BA19) and visual cortex (BA18) responding more strongly in female than male lower-skilled participants.


#### **DISCUSSION**


#### **RESPONSES TO SOCCER ACTION IDENTIFICATION**

The general pattern of activations found in the action identification tasks in the present study is consistent with previous research on action identification in general (Decety and Grèzes, 1999; Rizzolatti and Craighero, 2004; Filimon et al., 2007) and in fMRI sport anticipation studies in particular (Wright and Jackson, 2007; Wright et al., 2010, 2011).

The involvement of limbic and subcortical structures in soccer action identification, specifically, anterior insula, ACC, cerebellum, posterior cingulate cortex, caudate nucleus and thalamus, extends previous findings and suggests that there is an affective aspect to these tasks that is emphasized by the inclusion of deceptive stimuli (Grèzes et al., 2004; Molenberghs et al., 2012; Bishop et al., 2013). There is a clear correspondence in present results with an extension of the action observation (AON) brain network that has been identified as the "social network" (SN) (Grafton, 2009; Juan et al., 2013). It is evident from the within-groups analysis of fMRI data, employing control stimuli closely matched in all respects, that the SN is not simply an accessory to AON but was strongly activated in both deception identification and direction identification in the presence of deceptive stimuli (**Figures 3**–**5** and **Tables 3**–**5**). One of the largest and most consistently activated clusters in both groups of male participants and both tasks was in the anterior insula, and this area was also implicated in the females' data.

**FIGURE (caption fragment) |** … soccer video clips, relative to stimulus-matched non-biological motion (NBM) controls. Overlapping areas responding to both identification tasks appear purple. Activations above threshold (blobs) are displayed in co-registration with an individual normalized structural brain image and sampled in horizontal sections 10 mm apart from *z* = 60 to *z* = −20. In darker blue … premotor, BA6; b: parietal, BA40; c: medial frontal, BA6; d: anterior cingulate, BA32; e: posterior cingulate, BA23; f: dorsolateral prefrontal, BA46; g: caudate nucleus; h: superior temporal gyrus, BA37; i: anterior insula/frontal operculum, BA13/45; j: cerebellum; k: superior parietal lobule, BA7; m: medial occipital cortex, BA18; n: anterior cingulate.

#### **EXPERTISE AND GENDER DIFFERENCES**

Expertise differences between the male lower- and higher-skilled groups were reflected in substantial differences in accuracy across all tasks and conditions. In the fMRI experiments, expertise effects in the male groups were identified in a subset of the AON regions that were activated by the experimental tasks, consistent with previous research (Wright et al., 2010, 2011).

The male and female lower-skilled groups did not differ significantly in accuracy on behavioral measures, but comparison of male and female lower-skilled groups revealed significant differences in fMRI activation in both AON and SN structures. The female participants' lower familiarity with soccer actions and their very low level of soccer playing experience (**Table 1**) clearly differentiate them from the lower-skilled male group. This would arguably contribute to the observed expertise-related group differences in fMRI when comparing the two lower-skilled groups. An exception was found in visual cortex, including presumptive visual motion areas, where stronger activation was found in females than in lower-skilled males. This, however, is consistent with Wright et al. (2011) where, in a badminton direction identification task, stronger activation in novice brains was found, exceptionally, in visual cortex.

An interesting question is whether the gender-specificity of the video material may be a factor. For females, the gender of viewer and performer was always different, and the argument would be that this may reduce AON activation. Calvo-Merino et al. (2006) recorded fMRI while male and female dancers viewed videos of both gender-specific and gender-nonspecific ballet moves. The strength of activations depended both on motor expertise and on the gender of the viewer relative to the gender of the performer. Separating these effects, they showed that motor experience of gender-specific moves increased activation in motor components of AON (premotor cortex, parietal cortex, and cerebellum). The increased activation of visual cortex in females relative to lower-skilled males does not contradict Calvo-Merino et al. (2006), whose results applied specifically to motor-related rather than visual areas of AON. In the present study, the effect of viewing a performer of the same or different gender is likely to have been reduced but not abolished by the point-light representation (Pollick et al., 2005; Calvo-Merino et al., 2010). The motor expertise effect is, moreover, a plausible one for the interpretation of the present results because the female group had substantially less motor experience of soccer moves than the lower-skilled male group.

#### **Table 3 | Locations of significant clusters as shown in Figure 3.**

*VAN, ventral anterior nucleus; VLN, ventral lateral nucleus; LPN, lateral posterior nucleus.*

Together with previous work, our results suggest that increasing familiarity with observed actions as well as motor experience of those actions is associated with increasing expertise and results in a shift in brain activation away from visual brain areas and toward AON motor areas and SN areas.

There are some general limitations of current fMRI research into action observation in general and sporting expertise in particular (Mann et al., 2013). The whole-body sensory-motor coupling, affordance-rich environment, and powerful contextual cues in soccer field play greatly exceed what is available to an immobile viewer of videos in a scanner. It is a challenge for future research to study the neural basis of sporting expertise in more dynamic and interactive scenarios. Despite this limitation, research to date has shown a consistent relationship between anticipatory behavioral responses to sports video and expertise in open-skill sports, and this extends to the use of point-light video stimuli (Abernethy et al., 2001, 2008; Huys et al., 2008). Moreover, the present behavioral results recorded in the scanner (section Signal Detection Theory Analysis) have revealed expertise effects both in sensitivity (d′) and in response strategy (β). We would therefore argue that the present fMRI results reflect the brain's processing of the minimum visual information sufficient to support an anticipatory response, and that our methods provide sufficient sensitivity to detect and localize expertise effects in the brain.

#### **TASK-RELATED DIFFERENCES IN fMRI**

There were no significant interactions between the task (direction identification or deception identification) and the three participant groups, either in the behavioral or in the fMRI data, but there were significant task-related differences in fMRI activations overall. Although AON and SN were activated strongly in both deception and direction identification, there were also significant differences in the two conditions (sections Within-Groups Analysis and Differences Between Deception and Direction Identification). The SN network is engaged particularly when participants are required to make inferences about the intentions of other people's behavior (Juan et al., 2013). This was explicitly the case in the deception identification condition, and in comparison with the direction identification condition, where participants were not required to identify deception, there was significantly greater activation in left and right insula and posterior cingulate, which are part of SN. It must also be fully recognized that despite the simplified and abstract nature of the point-light stimuli, they were universally understood as meaningful in a specific social context (the game of soccer).

R sup parietal 7 12 −73 55 10.9 7 12 −73 −23 10.7

Cerebellum 6 L −9 −73 −23 9.4 166

#### **Table 4 | Locations of significant clusters as shown in Figure 4.**

Post. cingulate 23 −6 −55 −20 6.9

Medial occipital 18 −9 −76 −23 8.6 66

31 −18 −58 −23 6.7

The direct comparison of deception identification with direction identification in the present results provides further insights into specialization within the AON/SN network. First of all, there was significantly greater activation of ACC and caudate nucleus in the direction identification task compared with the deception identification task. The role of anterior cingulate has been established in response conflict and suppression of incorrect response tendencies (Carter and van Veen, 2007); and the caudate has been implicated in the learning of associations between stimuli and response tendencies (Melcher et al., 2013). This interpretation is consistent with both previous and present results; for example, Bishop et al. (2013) found ACC activity in higher-skilled players in the presence of deception, at a very early occlusion level, −160 ms. Bishop et al. (2013) also proposed that enhanced caudate activation in experts when predicting direction at very early occlusion (−160 ms), prior to an oncoming opponent's change of direction, indicated the learning of response contingencies. In the present study, awareness that an automatic left or right response tendency may need to be corrected, according to whether the move appears normal or deceptive, would occur only in the direction identification task. Thus, the greater caudate activation when predicting direction may arise because the close mapping of leftward and rightward movements to left- or right-sided responses, respectively, is contingent upon whether the move is deceptive or normal; conversely, the identification of a move as deceptive or normal was not contingent on movement direction.

6 L −27 −55 −29 7.4 6 L −6 −55 −23 7.3

*VLN, ventral lateral nucleus; VPLN, ventral posterior lateral nucleus.*

There was relatively greater activation in deception identification in right dorsolateral prefrontal cortex, which has a strong relationship with top-down cognitive and attentional control (Fassbender et al., 2006). This would be consistent with an interpretation that deception identification requires cognitive effort, whereas direction prediction is a more automatic perceptual-motor task (Kibele, 2006). The fMRI results are also consistent with Ivanoff et al. (2008) who found increased activation in pre-SMA (medial frontal cortex, BA6) associated with criterion (β) effects in a motion coherence task.

#### **Table 7 | Expertise group differences at** *p <* **0.05 FWE corrected.**

The behavioral data identified the strong influence of the trial type (deceptive vs. normal) on accuracy, and found significant interactions with task type. It may be possible in future to conduct a finer-grained analysis of fMRI responses to normal and deceptive moves using single-trial blocks (Bishop et al., 2013) and multi-voxel pattern analysis (Norman et al., 2006) in order to study how normal and deceptive moves are classified, and how this classification interacts with other variables such as temporal occlusion and task type.

It is likely that for both higher- and lower-skilled players, deception identification is a less practiced skill, requiring greater cognitive effort, and that conversely, the ability to react to the trajectory of someone's body actions is to some extent based on general as well as sport-specific experience, and therefore likely to have become somewhat automatic. Thus, for late-occluded sequences, the direction of a normal move was determined at significantly higher accuracy than identification of this move as normal, suggesting that if the valid direction cues can be picked up, they readily prime the appropriate response. However, analysis of the sensitivity (d′), which takes into account the proportion of incorrect responses to deceptive moves, showed similar but slightly lower overall sensitivity for direction identification compared with deception identification, which would be consistent with the similar global strength of fMRI activations seen across tasks.

Likewise, early-occluded deceptive moves gave rise to significantly worse than chance direction identification because lower-skilled players especially were not simply responding randomly but were misdirected by the deceptive cues. This was borne out by analysis of likelihood ratio (β). In the direction identification task, observers adopted a liberal criterion, that is, one which increases both hits and false positives, and this was interpreted as a direct response to directional cues: veridical cues in the case of normal moves and false cues in the case of deceptive moves. Conversely, in the deception detection task, male observers tended to adopt a conservative criterion on late-occluded stimuli, which means that they were biased toward judging such moves as deceptive, and this inflated their correct detections of deceptive moves and reduced their correct detections of normal moves. Their overall accuracy remained higher than that of the two lower-skilled groups, as revealed by the d′ measure. The ability to identify a move as deceptive, however, does not guarantee that its true direction can be identified. Experts are known in some circumstances to delay their responses (Brault et al., 2012; Mori and Shimada, 2013), perhaps so that they can inhibit and correct their initial automatic reactions. This dissociation between performance on the deception identification and direction identification tasks is consistent with the differing involvement of the components of the AON and SN observed in the brain imaging data.
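As a rough illustration of the two signal detection measures discussed in this section, the following Python sketch computes sensitivity (d′) and likelihood ratio (β) from response counts using the standard equal-variance Gaussian definitions. The function name, the example counts, and the log-linear correction for extreme rates are assumptions made for illustration; they are not taken from the analysis reported in this study.

```python
from math import exp
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, beta) for one observer from raw response counts.

    Assumes the equal-variance Gaussian signal detection model.
    """
    # Log-linear correction (an assumption of this sketch): adding 0.5 to
    # each cell keeps both rates strictly between 0 and 1, so the z-scores
    # below are always finite.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(fa_rate)

    d_prime = z_hit - z_fa                     # sensitivity
    beta = exp((z_fa ** 2 - z_hit ** 2) / 2)   # likelihood ratio at criterion
    return d_prime, beta


if __name__ == "__main__":
    # Hypothetical counts for a single observer and condition.
    d_prime, beta = sdt_measures(hits=45, misses=5,
                                 false_alarms=25, correct_rejections=25)
    print(f"d' = {d_prime:.2f}, beta = {beta:.2f}")
```

Under these definitions, β below 1 corresponds to a liberal criterion (more hits and more false positives) and β above 1 to a conservative one, while d′ near 0 indicates chance-level discrimination.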

#### **ACKNOWLEDGMENTS**

Supported by the Research Grants Council of the Hong Kong Special Administrative Region, China. Project No. HKU 7400/05H.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 August 2013; accepted: 21 November 2013; published online: 17 December 2013.*

*Citation: Wright MJ, Bishop DT, Jackson RC and Abernethy B (2013) Brain regions concerned with the identification of deceptive soccer moves by higher-skilled and lower-skilled players. Front. Hum. Neurosci. 7:851. doi: 10.3389/fnhum.2013.00851*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Wright, Bishop, Jackson and Abernethy. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Expertise in action observation: recent neuroimaging findings and future perspectives

*Luca Turella<sup>1</sup>\*, Moritz F. Wurm<sup>1</sup>, Raffaele Tucciarelli<sup>1</sup> and Angelika Lingnau<sup>1,2</sup>*

*<sup>1</sup> Center for Mind/Brain Sciences (CIMeC), University of Trento, Trento, Italy*

*<sup>2</sup> Department of Cognitive Sciences, University of Trento, Trento, Italy*

*\*Correspondence: luca.turella@gmail.com; luca.turella@unitn.it*

#### *Edited by:*

*Robert Langner, Heinrich Heine University Düsseldorf, Germany*

*Reviewed by:*

*Svenja Caspers, Research Centre Juelich, Germany*

**Keywords: fMRI, expertise, action observation network, mentalizing system**

#### **INTRODUCTION**

In everyday life, we continuously interact with other individuals. Understanding actions of other people, i.e., the ability to distinguish between different actions, such as passing over vs. threatening someone with a knife, has been crucial for the survival of our species and is a fundamental capability for our social interactions.

Neuroimaging studies investigated the neural substrates subtending action perception using a variety of techniques, ranging from univariate analysis of fMRI data (Brass et al., 2007; Gazzola et al., 2007; De Lange et al., 2008; Gazzola and Keysers, 2009; Turella et al., 2009a, 2012; Wurm et al., 2011; Wurm and Schubotz, 2012; Wurm et al., 2012; Lingnau and Petris, 2013), to fMRI repetition suppression (Dinstein et al., 2007; Chong et al., 2008; Lingnau et al., 2009; Kilner et al., 2009) and multivoxel pattern analysis (MVPA; Dinstein et al., 2008a; Oosterhof et al., 2010, 2012). These studies reported the consistent recruitment of a number of regions, generally assumed as pertaining to two different networks, typically referred to as the action observation network (AON) and the mentalizing system (**Figure 1A**). Both networks have been advocated to be involved in action understanding (Brass et al., 2007; De Lange et al., 2008; Van Overwalle, 2009; Van Overwalle and Baetens, 2009; Wurm et al., 2011), but their precise roles and their causal involvement are strongly debated (Dinstein et al., 2008b; Mahon and Caramazza, 2008; Hickok, 2009; Turella et al., 2009b; Rizzolatti and Sinigaglia, 2010).

In homology with monkey neurophysiological studies, three regions have been proposed to form the human AON (Rizzolatti and Craighero, 2004; Rizzolatti and Sinigaglia, 2010; see **Figure 1A**). This "core" AON was defined as comprising (i) the ventral premotor cortex (PMV) together with the posterior part of the inferior frontal gyrus (pIFG), (ii) the anterior inferior parietal lobule (aIPL) and (iii) the superior temporal sulcus (STS). Human neuroimaging studies suggested the recruitment of several additional areas that were incorporated in an "extended" version of the AON (Gazzola and Keysers, 2009; Caspers et al., 2010; see **Figure 1A**).

The *mentalizing system* has been identified in human neuroimaging studies investigating social cognition tasks, such as the attribution of intentions and beliefs to the self or others, while observing action-related stimuli (Van Overwalle, 2009). The regions consistently assigned to this network are the medial prefrontal cortex (mPFC) and the temporo-parietal junction (TPJ) (**Figure 1A**), and less often also the precuneus and the posterior cingulate cortex (Amodio and Frith, 2006; Brass et al., 2007; De Lange et al., 2008; Van Overwalle, 2009; Van Overwalle and Baetens, 2009).

The involvement of sensorimotor regions during action perception was first described with the discovery of mirror neurons in the ventral premotor cortex of macaque monkeys (Di Pellegrino et al., 1992). These visuomotor neurons responded both when the monkey executed an action and when it observed a similar action, and were later also described within the monkey inferior parietal lobule (Fogassi et al., 2005). Note that both regions also contain neurons with motor-only and visual-only properties (Gallese et al., 1996, 2002).

Following their discovery, motor theories of action understanding proposed that mirror neurons might provide the basis for a matching mechanism between what we observe and what we can perform, allowing the understanding of observed actions in motoric terms (Rizzolatti et al., 2001). Even if this hypothesis is strongly debated (Jacob and Jeannerod, 2005; Mahon and Caramazza, 2008; Hickok, 2009, 2013), a homologous mechanism has been proposed to exist in the human AON (Rizzolatti et al., 2001; Rizzolatti and Craighero, 2004; Rizzolatti and Sinigaglia, 2010).

In this brief overview, we will first describe previous fMRI studies that investigated how motor experience affects activation within the AON, and to which degree these studies allow drawing conclusions about the role of this network in action understanding. As the majority of the studies investigated only the AON and given the limited scope of this Opinion, we will focus on this network, even if our considerations might also hold true for other areas. We will then try to delineate how future studies might exploit motor expertise as a tool for gaining insights into the neural basis of action understanding.

#### **RECENT NEUROIMAGING FINDINGS ON MOTOR EXPERTISE IN ACTION OBSERVATION**

Following motor theories of action understanding, changes in motor repertoire should modify the brain response within the AON while observing these newly acquired actions. Starting from this assumption, most studies on expertise investigated how the acquisition of a skilled action, such as sport or dance moves, affects AON activity while observing the same movement.

**FIGURE 1 | (A)** Schematic representation of the core AON, extended AON and mentalizing system. Three-dimensional representation of lateral and medial brain surface. The regions assigned to the "core" AON, "extended" AON and the "mentalizing" system are depicted. In red, the core AON is presented comprising: the PMV/pIFG complex, the aIPL and the STS. In pale red, the "extended" AON is presented comprising: the anterior part of the inferior frontal gyrus (aIFG), the dorsal premotor cortex (PMD), the supplementary motor area (SMA), the superior parietal lobule (SPL), the anterior intraparietal sulcus (AIP), the somatosensory cortex (S1) and the occipito-temporal cortex (OTC), including also STS. The mentalizing system (blue) is assumed to consist of the medial prefrontal cortex (mPFC) and the temporo-parietal junction (TPJ). Note that the extension of these networks is not representative of their real dimension or functional significance. **(B)** Expertise effects. Three-dimensional representation of lateral and medial brain surface with location of peaks for the comparisons of interest superimposed. For Kim et al. (2011), we considered the comparison between the two groups (Table 2). For Wright et al. (2010), we considered results from ROI analysis (Table 2). For Wright et al. (2011), we plotted results for normal video (Table 2). For Abreu et al. (2012), we used the peak of the significant cluster within the temporal lobe in the group comparison (page 1649 of the manuscript). For Calvo-Merino et al. (2005), we plotted the reported interaction (see Table 1). For Calvo-Merino et al. (2006), we plotted the results from Table S2. For Cross et al. (2006), we considered the main effect of the contrast of interest (Table 2). For Cross et al. (2009a), we reported the contrast for physical training (Danced *>* Untrained) (Table 1). For Cross et al. (2009b), we considered the physical training results (Table 1) and the observational training results (Table 1). We excluded the peaks located within the cerebellum.

Most of the contributions investigating motor expertise while observing sport actions are limited to one or few studies within the same domain, such as archery (Kim et al., 2011), badminton (Wright et al., 2010, 2011) or basketball (Abreu et al., 2012). Typically, these studies compare the blood-oxygen-level-dependent (BOLD) response between experts and novices. Although these studies considered different tasks and comparisons of experimental conditions, they seem to suggest a stronger activation for experts in comparison to novices not limited to the AON but recruiting also other brain regions (see **Figure 1B**). However, an interpretation of these results is difficult as, in addition to extensive practice of the observed movements, experts also have a strong visual familiarity with the observed stimuli which might affect the BOLD effect within the very same regions.

Besides these sparse investigations on different sport actions, a more systematic investigation involved the effect of dance expertise on activity within the AON (Calvo-Merino et al., 2005, 2006; Cross et al., 2006, 2009a,b, 2012, 2013; Pilgramm et al., 2010). Calvo-Merino et al. (2005) measured the BOLD effect of ballet dancers, capoeira dancers, and non-dancers watching two different types of dance movements (ballet or capoeira moves). They found a stronger recruitment of several regions within the AON (bilateral PMD, bilateral SPL and AIP, left PMV and left STS) in ballet and capoeira dancers for the observation of the trained in comparison to the untrained dance style, whereas they found no difference between the two dance styles in the non-dancers.

Calvo-Merino et al. argued that the activation for the trained in comparison to the untrained dance style was due to simulation of those actions that were within the motor repertoire of the dancer. Alternatively, as pointed out above, dancers' strong visual familiarity with the observed stimuli might affect the measured difference in BOLD effect.

In a follow-up study, Calvo-Merino et al. (2006) investigated this issue by trying to disentangle the different contributions of visual familiarity and motor practice on the BOLD effect within the AON of expert dancers. They exploited the fact that some ballet movements are gender-specific while others are commonly performed by both male and female dancers. Calvo-Merino et al. (2006) found that activity within several regions of the AON (left PMD, bilateral AIP) was higher when observing actions within the observer's motor repertoire. However, visual familiarity might have played a role also in this study since dancers might have gathered more visual experience with those movements that are part of their own motor repertoire.

Another series of studies by Cross et al. (2006, 2009a,b) explored how activity related to action observation is modified after the acquisition of motor (physical practice) and/or visual experience (visual practice) with specific dance actions. These authors demonstrated stronger activity within AON regions during the observation and imagination of observed actions which were previously trained physically in comparison to actions that were not (Cross et al., 2006). In a subsequent study Cross et al. (2009a,b) showed that both previous physical and visual practice of dance sequences modulates activity within the AON while observing dance movements.

**Figure 1B** shows the peaks of activations for the different motor expertise studies. It is evident that there seems to be a consistent recruitment of premotor and parietal nodes of the AON for observing trained with respect to untrained moves, but, at the same time, there is also a widespread recruitment of other brain regions.

These studies suggest an effect of motor expertise on AON activation while *perceiving* an action, but it is difficult to assess the involvement of the AON in action *understanding* as none of these studies adopted a task directly investigating this process in a quantitative manner. Action understanding is intended here as the distinction between different actions irrespective of the properties (e.g., kinematics, goal, environmental cues, etc.) adopted to achieve such discrimination. We will elaborate on this point in the final section.

#### **FUTURE PERSPECTIVE: USING MOTOR EXPERTISE TO STUDY ACTION UNDERSTANDING**

In this section, we discuss possible ways of testing the proposed role of the AON in action understanding. If the ability to understand actions depends on sensorimotor representations of these actions, then an experience-based modification (either impairment or improvement) of these representations should lead to a corresponding measurable modification in the ability to understand these actions, as in tasks involving action recognition. Crucially, it is also necessary to discount the possible role of regions outside this network (e.g., the *mentalizing system*).

Motor expertise might serve as an interesting tool to test the involvement of areas within and outside the AON in action understanding. However, one of the problems to overcome is making sure that the learned movements were not previously experienced by the participants. As most everyday actions are physically or visually experienced during normal development, the new acquisition of complex movements, such as sport and dance moves, makes it easier to control for possible confounds related to previous exposure or practice of the studied movements. Another problem to face is that performance might be close to ceiling in tasks using natural stimuli (videos or pictures of actions), making it difficult to find a modulation of performance as a function of motor experience. One possibility to overcome this issue could be to use point-light displays (Johansson, 1973). This type of stimulation recruits part of the AON (Saygin et al., 2004; Wright et al., 2011), and its perception has been shown to be affected by motor expertise (Casile and Giese, 2006). Furthermore, the adoption of point-light displays might mitigate visual familiarity confounds, as they do not resemble a "natural" stimulation, and they can be easily manipulated in order to disrupt the perceived movement simply by adding noise. A recent study (Lingnau and Petris, 2013) adopted this approach and observed that the ability to understand actions decreased with increasing noise level.
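To make the idea of degrading a point-light display concrete, the following Python sketch adds Gaussian positional jitter to each marker in each frame. This is only one simple way of adding noise and is not necessarily the manipulation used in the studies cited above; the function name, the data layout (a list of frames, each a list of (x, y) marker positions) and the example coordinates are all assumptions made for illustration.

```python
import random


def add_positional_noise(frames, noise_sd, seed=None):
    """Return a jittered copy of a point-light sequence.

    frames:   list of frames; each frame is a list of (x, y) marker positions.
    noise_sd: standard deviation of the Gaussian jitter, in the same units
              as the coordinates. A value of 0 leaves the display unchanged.
    seed:     optional seed so that a given noise level is reproducible.
    """
    rng = random.Random(seed)
    noisy = []
    for frame in frames:
        # Jitter x and y of every marker independently; the input is not modified.
        noisy.append([
            (x + rng.gauss(0.0, noise_sd), y + rng.gauss(0.0, noise_sd))
            for x, y in frame
        ])
    return noisy


if __name__ == "__main__":
    # Two hypothetical frames with three markers each.
    clip = [
        [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)],
        [(0.1, 1.0), (0.6, 0.5), (1.1, 0.0)],
    ]
    for sd in (0.0, 0.05, 0.2):
        print(sd, add_positional_noise(clip, sd, seed=0)[0])
```

Presenting the same clip at several values of `noise_sd` would give a graded difficulty manipulation, which is one way to keep recognition performance away from ceiling.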

This approach could be adopted to investigate differences in action understanding, using point-light displays with different levels of noise, within the same individual on trained and untrained stimuli after different types of practice (as in Cross et al., 2006, 2009a,b). In addition to the possible effects of visual and physical practice, a motor-only training could be introduced where physical practice might be performed blindfolded in order to eliminate potential visual confounds (as in Casile and Giese, 2006). These different types of training might affect common or different parts of the brain during action understanding. Crucially, motor learning without visual feedback alone could determine a modification in action understanding performance and a related functional modification within or outside regions of the AON. This could demonstrate that motor learning alone might have an effect on visual recognition of trained actions, avoiding interpretational confounds induced by a concomitant visual learning.

We have highlighted motor expertise as an interesting experimental manipulation to comprehend the role of the AON in action understanding. Further, these studies will profit strongly from the adoption of new MVPA decoding techniques (Kriegeskorte and Bandettini, 2007), as these allow more fine-grained distinctions (e.g., between different types of observed or executed actions, see also Oosterhof et al., 2013) that cannot be revealed with univariate methods. This could be especially useful to assess decoding accuracy modifications between different actions (e.g., move A vs. move B) based on the type of training (trained vs. untrained) with different levels of noise. Further, changes in decoding accuracy between different actions before and after training might also be informative regarding the regions affected by the different types of training (physical, visual or motor-only).
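As a minimal sketch of what "decoding accuracy" between two actions means, the following Python example classifies held-out activity patterns as move A or move B with a nearest-centroid rule and leave-one-out cross-validation. It is a deliberately simple stand-in for the classifiers typically used in MVPA studies, not the method of any study cited here; the function names and the toy two-voxel patterns are assumptions made for illustration.

```python
import math


def centroid(patterns):
    """Mean pattern (one value per voxel) of a list of equal-length patterns."""
    n = len(patterns)
    return [sum(p[i] for p in patterns) / n for i in range(len(patterns[0]))]


def distance(a, b):
    """Euclidean distance between two patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def loo_decoding_accuracy(patterns, labels):
    """Leave-one-out decoding accuracy with a nearest-centroid classifier.

    patterns: list of per-trial activity patterns (lists of voxel values).
    labels:   list of condition labels, e.g. "A" or "B", one per trial.
    """
    correct = 0
    for i, (test_pattern, true_label) in enumerate(zip(patterns, labels)):
        # Build class centroids from all trials except the held-out one.
        by_label = {}
        for j, (pattern, label) in enumerate(zip(patterns, labels)):
            if j != i:
                by_label.setdefault(label, []).append(pattern)
        centroids = {label: centroid(ps) for label, ps in by_label.items()}
        # Predict the label whose centroid is closest to the held-out pattern.
        predicted = min(centroids,
                        key=lambda label: distance(test_pattern, centroids[label]))
        correct += predicted == true_label
    return correct / len(patterns)


if __name__ == "__main__":
    # Hypothetical two-voxel patterns for three trials of each move.
    patterns = [[1.0, 0.0], [1.1, 0.1], [0.9, -0.1],
                [0.0, 1.0], [0.1, 1.1], [-0.1, 0.9]]
    labels = ["A", "A", "A", "B", "B", "B"]
    print(loo_decoding_accuracy(patterns, labels))
```

In a design like the one proposed above, such an accuracy could be computed separately per region, training condition (trained vs. untrained) and noise level, and then compared before and after training.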

To conclude, this Opinion focused on describing neuroimaging investigations on action perception/understanding, which are correlational in nature. It is not possible to define a causal link between such results and concomitant behavioral changes. However, these studies might provide interesting starting points for future studies using TMS in healthy participants or voxel-based lesion-symptom mapping in brain-damaged patients.

#### **ACKNOWLEDGMENTS**

This work was supported by a CARITRO grant of the Fondazione Cassa di Risparmio e Rovereto to Angelika Lingnau, and by the Provincia Autonoma di Trento.

#### **REFERENCES**

Abreu, A. M., Macaluso, E., Azevedo, R. T., Cesari, P., Urgesi, C., and Aglioti, S. M. (2012). Action anticipation beyond the action observation network: a functional magnetic resonance imaging study in expert basketball players. *Eur. J. Neurosci.* 35, 1646–1654. doi: 10.1111/j.1460-9568.2012.08104.x


and executed movements. *J. Neurophysiol.* 98, 1415–1427. doi: 10.1152/jn.00238.2007


agent do not modulate human brain activity during action observation. *Neuroimage* 46, 844–853. doi: 10.1016/j.neuroimage.2009.03.002


*Received: 30 July 2013; accepted: 13 September 2013; published online: 16 October 2013.*

*Citation: Turella L, Wurm MF, Tucciarelli R and Lingnau A (2013) Expertise in action observation: recent neuroimaging findings and future perspectives. Front. Hum. Neurosci. 7:637. doi: 10.3389/fnhum.2013.00637*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Turella, Wurm, Tucciarelli and Lingnau. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Reorganization and plastic changes of the human brain associated with skill learning and expertise

#### *Yongmin Chang\**

*Department of Molecular Medicine and BK21 Plus KNU Biomedical Convergence Program, Kyungpook National University School of Medicine, Daegu, South Korea*

#### *Edited by:*

*Merim Bilalic, University Tübingen, Germany*

#### *Reviewed by:*

*Marika Berchicci, University of Rome "Foro Italico", Italy Ana M. Abreu, Technical University of Lisbon, Portugal*

#### *\*Correspondence:*

*Yongmin Chang, Department of Molecular Medicine and Radiology, Kyungpook National University and Hospital, 50 Samduk-Dong 2Ga, Chung-Gu, Daegu 700-721, South Korea e-mail: ychang@knu.ac.kr*

Novel experience and learning new skills are known as modulators of brain function. Advances in non-invasive brain imaging have provided new insight into structural and functional reorganization associated with skill learning and expertise. In particular, significant imaging evidence comes from the domains of sports and music. Data from *in vivo* imaging studies in sports and music have provided vital information on plausible neural substrates contributing to brain reorganization underlying skill acquisition in humans. This mini review will attempt to take a narrow snapshot of imaging findings demonstrating functional and structural plasticity that mediate skill learning and expertise while identifying converging areas of interest and possible avenues for future research.

**Keywords: expertise, plasticity, reorganization, neuroimaging, skill learning**

#### **INTRODUCTION**

Neuroplasticity, which refers to the brain's ability to change its structure and function, is not an occasional state of the brain, but rather the normal ongoing state of the human brain throughout the life span (Zilles, 1992; Pascual-Leone et al., 2005; Kempermann, 2006; Jäncke, 2009). Plastic changes in the human brain lead to brain reorganization that might be demonstrable at the level of behavior, anatomy, and function and down to the cellular and even molecular levels (Kolb and Whishaw, 1998; Kelly and Garavan, 2005; Kleim et al., 2006).

Intentional practice in sports and music has been shown to contribute to the acquisition of expertise (Schlaug, 2001; Baker et al., 2003; Hutchinson et al., 2003; Lotze et al., 2003; Calvo-Merino et al., 2005; Ericsson, 2005; Cross et al., 2006; Hung et al., 2007; Nielsen and Cohen, 2008). Acquisition of expertise is accompanied by structural and functional changes of the brain, and the advent of brain imaging methods has bolstered the study of these changes in the human brain. Understanding the neural mechanisms underpinning expertise may provide a basis for determining what types of practice or training are most likely to be beneficial for performance enhancement. This knowledge may also provide a clue as to why some people improve at different rates than others or reach much higher levels of achievement. Thus, the study of plastic changes associated with skill learning and expertise in the human brain is one of the most challenging areas of current neuroscience research.

This mini review provides a summary of the *in vivo* imaging evidence of longitudinal and cross-sectional studies on structural and functional plasticity of the human brain in skill learning and expertise with emphasis on sports and music. In the literature, a cross-sectional approach has been most widely used, and many interesting findings have been reported. However, one of the criticisms of cross-sectional studies is that the differences in brain organization are possibly correlational, and thus caution is needed to avoid drawing overly strong causal inferences from cross-sectional data. Plasticity spans many levels of organization, including molecular, neuronal, and chemical events; these molecular views of neuroplasticity are beyond the scope of this mini review.

#### **STRUCTURAL NEUROPLASTICITY IN SKILL LEARNING AND EXPERTISE**

#### **CROSS-SECTIONAL STUDIES**

Cross-sectional imaging studies have demonstrated structural changes of the human brain as a result of experience and learning in sports and music (Amunts et al., 1997; Gaser and Schlaug, 2003; Bangert and Schlaug, 2006; Jacini et al., 2009; Jäncke et al., 2009; Park et al., 2009; Hänggi et al., 2010; Wan and Schlaug, 2010; Wei et al., 2011; Di Paola et al., 2013). For example, Jacini et al. (2009) reported that, compared to control subjects, elite judo players had significantly higher gray matter volume in the frontal lobe, related to motor planning and execution, and in regions of the prefrontal cortex, related to working memory and cognitive processes. Training-induced enlargement in gray matter structure was not limited to brain regions associated with motor planning and execution. When compared to age-matched control subjects, world-class mountain climbers showed significantly larger vermian lobule volumes, possibly associated with highly dexterous hand movements and eye-hand coordination in the detection and correction of visuomotor errors (Di Paola et al., 2013). In the music domain, using the length of the posterior wall of the precentral gyrus as an estimate of the size of the hand motor area, Amunts et al. (1997) identified substantial structural differences in the hand motor area between professional musicians and non-musicians: in general, the hand motor area was larger in professional musicians than in non-musicians. More importantly, the authors also found that the measures of hand motor area in both hemispheres correlated with the age of commencement of musical training, implying that earlier musical training has a stronger impact on structural changes in the hand motor area.

In a study using the voxel-based morphometry (VBM) technique, it was found that skilled golfers (professional and low handicap golfers) had larger gray matter volumes in a frontoparietal network, including premotor and parietal areas (Jäncke et al., 2009). Using the VBM approach, Gaser and Schlaug (2003) reported that professional keyboard players showed differences in gray matter volume in motor, auditory, and visual-spatial brain regions when compared with a matched group of amateur players and non-keyboard players. While the majority of studies on structural neuroplasticity have reported increased gray matter density or volume in expert brains, a few studies have reported the inverse relationship, that is, decreased gray matter volume (Draganski et al., 2006; Hänggi et al., 2010). Several possible reasons for these discrepant findings have been suggested (Hänggi et al., 2010).

A handful of studies have investigated differences in white matter structure between experts and non-experts, using diffusion tensor imaging (DTI); however, the results have been inconsistent. Using DTI, Jäncke et al. (2009) demonstrated decreased white matter volume and fractional anisotropy (FA) values in several brain structures, including the corticospinal tract (CST), in skilled golfers, compared with less-skilled golfers. Additional evidence for decreased white matter volume and FA values was reported in a study of professional ballet dancers (Hänggi et al., 2010). In contrast to these findings of decreased FA values in white matter structures, a very recent study showed increased FA values in the bilateral CST in elite gymnasts as compared to control subjects, possibly in response to long-term gymnastic training (Wang et al., 2013). Inconsistent results have also been reported in the music domain. Imfeld et al. (2009) reported significantly lower FA values in both the left and the right CST in professional musicians compared to non-musicians. However, in another study, pianists who practiced frequently showed higher FA values (Han et al., 2009). Therefore, further evidence is needed before concluding whether specific structural changes in white matter can be induced by extensive training.
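The FA values compared in these DTI studies summarize, in a single number per voxel, how strongly water diffusion is oriented along one direction. As a minimal sketch (not taken from any of the cited studies), FA can be computed from the three eigenvalues of the diffusion tensor using its standard definition. The function name `fractional_anisotropy` and the example eigenvalues are illustrative assumptions.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """Return FA for the three eigenvalues of a diffusion tensor.

    Standard definition:
        FA = sqrt(3/2) * sqrt(sum((l_i - mean)^2)) / sqrt(sum(l_i^2))
    """
    mean = (l1 + l2 + l3) / 3.0
    numerator = (l1 - mean) ** 2 + (l2 - mean) ** 2 + (l3 - mean) ** 2
    denominator = l1 ** 2 + l2 ** 2 + l3 ** 2
    if denominator == 0.0:
        # No diffusion measured at all: treat the voxel as isotropic.
        return 0.0
    return math.sqrt(1.5 * numerator / denominator)


# Hypothetical eigenvalues, chosen only to show the two extremes.
print(fractional_anisotropy(1.0, 1.0, 1.0))  # equal in all directions: 0.0
print(fractional_anisotropy(1.0, 0.0, 0.0))  # single direction: close to 1
```

FA is 0 when diffusion is equal in all directions and approaches 1 when diffusion is confined to a single axis, which is why it is commonly used as an index of white matter microstructure; a group difference in FA alone does not identify which microstructural property has changed.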

#### **LONGITUDINAL STUDIES**

To date, only a small number of longitudinal studies have investigated structural brain reorganization as a result of experience and learning. Draganski et al. (2004) investigated the training effect of juggling in inexperienced young jugglers. After a 3-month training period, subjects in the training group showed changes in gray matter density in the intraparietal sulcus and the midtemporal area of visual cortex. The intraparietal sulcus is involved in transforming retinotopic into body-centered information necessary to visually control movements. The midtemporal area of visual cortex is a highly specialized brain area for analyzing visual movement information. Of particular interest, the authors also found that after another 3 months without juggling practice, the increase in gray matter density following practice had diminished in all subjects in the juggling group, indicating that structural plasticity is reversible. In a recent study of 60-year-old elderly individuals who were able to learn juggling, gray matter changes related to skill acquisition were observed in the midtemporal area of visual cortex similar to those found in young subjects, suggesting that age is not in itself a limiting factor for structural brain plasticity driven by skill learning (Boyke et al., 2008). In a more recent longitudinal study using VBM in golf novices between the ages of 40 and 60 years, 40 h of golf training was associated with gray matter increases in a task-relevant cortical network encompassing sensorimotor regions and areas belonging to the dorsal stream (Bezzola et al., 2011). More importantly, in that study, a strong positive relationship was observed between the increase in gray matter and training intensity in the parieto-occipital junction (POJ), a critical structure of the dorsal stream. A recent review provided evidence of a close association of the POJ with visuomotor processes, particularly in the on-line control and on-line correction of visually guided arm movements (Kravitz et al., 2011). For musical training, Hyde et al. (2009) found that 6-year-old children receiving instrumental musical training for 15 months showed structural change in brain areas such as the precentral gyrus, which is known to be involved in the control of playing a musical instrument. Most of these brain areas are part of the cortical motor system; however, structural changes in the auditory system, such as Heschl's gyrus, and in the corpus callosum were also observed. These structural changes in the brain correlated with performance on various auditory and motor tasks. In addition, in the music domain, the evidence suggests that training-induced plasticity in musicians appears to be most prominent in those who engaged in practice early in childhood (for a review, see Wan and Schlaug, 2010).

#### **FUNCTIONAL NEUROPLASTICITY IN SKILL LEARNING AND EXPERTISE**

#### **CROSS-SECTIONAL STUDIES**

In motor function, a common finding is the functional enlargement or focused activation of the motor area involved in control of that particular skill (Krings et al., 2000; Pearce et al., 2000; Lotze et al., 2003; Haslinger et al., 2004; Meister et al., 2005; Bangert and Schlaug, 2006). For example, Pearce et al. (2000) reported that the cortical representation of the hand used for playing is larger in professional racquet ball players as compared with novices. In music, one study demonstrated a differential brain adaptation depending on the instrument played (Bangert and Schlaug, 2006). More specifically, keyboard players showed a more pronounced left motor area, as they predominantly use the right hand. In contrast, string players showed a more pronounced right motor area, as the left hand is crucially engaged while playing.

Recent neuroimaging studies have attempted to elucidate the neural activity during action observation in the expert brain (Calvo-Merino et al., 2006; Pilgramm et al., 2010; Kim et al., 2011). For example, Calvo-Merino et al. (2006) demonstrated the neural bases of motor influences on action observation in expert ballet dancers. They showed an effect of motor expertise on neural activation within the ventral premotor area and also stronger activation in the inferior parietal and cerebellar regions when observing dance videos, suggesting that the action observation network is more extended than previously suggested (Di Pellegrino et al., 1992). For motor planning in expertise, an fMRI study using a motor imagery task, which refers to the mental rehearsal of motor acts, demonstrated that the task-related neural networks of expert golfers are focused and efficiently organized, whereas novices have difficulty filtering out irrelevant information (Milton et al., 2007). This finding is consistent with the notion of relative economy (neural efficiency) in the cortical processes of elite athletes during the specific challenge in which they are highly practiced. A similar finding was also observed in professional musicians. Lotze et al. (2003) reported that professional violinists showed focused cerebral activations in the contralateral primary sensorimotor cortex, the bilateral superior parietal lobes, and the ipsilateral anterior cerebellar hemisphere as compared to amateur violinists during the imagination of violin-playing movements.

*MRI, magnetic resonance imaging; VBM, voxel-based morphometry; GM, gray matter; DTI, diffusion tensor imaging; FA, fractional anisotropy; DBM, deformation-based morphometry; TMS, transcranial magnetic stimulation; fMRI, functional MRI; EMG, electromyography; EEG, electroencephalography.*

As for visuospatial abilities in sport, the evidence seems to suggest that experts differ in visuospatial abilities directly tied to their domain of expertise. For example, one study reported that expert athletes did not differ from novices in their visuospatial capacity as measured on a general visuospatial test (Furley and Memmert, 2010). However, a recent study using fMRI reported quantitative differences in brain activation during visuospatial processing between elite rugby players and novices, indicating the possible existence of a strategy (a bird's eye view) regarding visuospatial cognitive processing for elite rugby players that differs from that of novices (Sekiguchi et al., 2011). More recently, Seo et al. (2012) investigated possible differences in cognitive strategy between archery experts and novices in visuospatial working memory processing. According to their results, archery experts have increased activation in cortical regions important for visuospatial attention and working memory, suggesting that degree of expertise may modulate higher order brain functioning. Taken together, these studies suggest that differences in visuospatial abilities are pronounced within the specific domain of expertise but do not transfer outside that domain to general visuospatial ability. The possible modulation of working memory and attention functions by expertise was also recently demonstrated in music training. In their multilevel cross-sectional study, Oechslin et al. (2013) found evidence for stepwise modulation of brain responses according to level of music expertise in a fronto-temporal network hosting functions of working memory and attention.

#### **LONGITUDINAL STUDIES**

For motor skill acquisition, previous studies using fMRI demonstrated that learning of sequential finger movements initially leads to a functional expansion in the primary motor cortex (M1) and this change in M1 follows more dynamic, rapid changes in the cerebellum, striatum, and other motor-related cortical areas, suggesting an experience-dependent shift of activation from a cerebellar–cortical to a striatal–cortical network with extended practice (Karni et al., 1995; Doyon et al., 2002). In addition, repetition of movements has been suggested to result in motor memories in the primary motor cortex and probably other cortical areas that encode the kinematic details of the practiced movements (Classen et al., 1998; Butefisch et al., 2000; Stefan et al., 2005; Cross et al., 2009). Of particular interest, previous studies have demonstrated that motor memory can also be encoded by action observation and this form of action observation can enhance the effects of motor training on memory encoding, possibly through modulation of intracortical excitatory mechanisms (Stefan et al., 2005; Celnik et al., 2006).

Formation of multisensory connections during motor learning has often been reported in music. In a longitudinal EEG study (Bangert and Altenmüller, 2003), beginning pianists, who had never played an instrument before, were trained on a computer piano over a period of 5 weeks. They listened to short piano melodies, and, after a brief pause, they were then required to replay the melodies using their right hand. After 5 weeks of practice, listening to piano tunes produced additional activity in the sensorimotor regions and, in turn, playing on a keyboard produced additional activity in the auditory regions. Therefore, this study nicely demonstrates how dynamic brain adaptations accompany these multisensorimotor learning processes. In another longitudinal study using fMRI (Herdener et al., 2010), the neural responses of music students in acoustic novelty detection were compared before and after two semesters of intensive aural skills training. Following the training period, hippocampal responses to temporal novelty in sounds were increased in music students. Previous studies suggested involvement of the hippocampus in various forms of novelty detection in addition to its role in memory (Knight, 1996; Strange et al., 1999). Therefore, this study provides evidence for functional plasticity in the adult hippocampus related to musical training.

#### **CONCLUDING REMARKS**

Over the past decades, advances in human brain imaging have provided new insights into the neuroplastic changes underlying skill learning and expertise in both sports and music (**Table 1**). These plastic changes can be seen at both structural and functional levels (**Figure 1**). A main finding of structural plasticity is increased volume and gray matter density of brain areas involved in control of the practiced task. Another major finding in structural plasticity is that experience-dependent structural changes can disappear when practice stops, indicating that structural plasticity is reversible. In musical expertise, one of the distinctive features of structural neuroplasticity is that brain plasticity can be found more clearly if practice starts at a young age. That is, a sensitive period might exist, beyond which music-induced structural changes and learning effects are less pronounced. Unfortunately, such studies on a sensitive period are missing in the sport domain.

In functional reorganization, a common finding is the functional enlargement or focused activation of the motor area involved in control of that particular skill. In addition, because expert performance is mediated by cognitive and perceptual motor skills, functional imaging evidence has shown that functional neuroplasticity occurs not only in the motor domain but also in cognitive and perceptual domains associated with improved performance. Furthermore, in music, evidence has demonstrated a strong coupling of sensorimotor and auditory processing for music expertise. Practice in playing a musical instrument involves constant improvement of complex sensory-motor coordination through repeated execution of motor activities under the controlled monitoring of the auditory system.

Despite the accumulation of significant imaging evidence, as discussed in the current mini review, understanding of the mechanisms underlying these plastic changes is still far from complete, which opens a broad avenue for future research. For example, neuroplasticity can be traced to cellular and molecular levels, and, thus, one of the main challenges is linking human brain imaging findings to the underlying molecular events. Because the poor specificity of macroscopic MR imaging signals largely precludes molecular information, other non-invasive approaches are needed. Of these methods, molecular imaging using positron emission tomography (PET) is a good candidate. Although there is still a lack of prospective studies on plasticity, integration of PET into MRI with simultaneous recordings of molecular and hemodynamic brain responses opens new and promising prospects for the future (Judenhofer et al., 2008). Another challenge for the understanding of neural mechanisms underlying plastic changes is the time scale of neural activity: the temporal resolution of fMRI, on the order of seconds, is approximately three orders of magnitude coarser than the millisecond time scale of neural events. Therefore, to measure brain activity on the time scale of neuronal activity and to assess specific neurophysiological events in humans, combining fMRI with non-invasive electrophysiological methods such as electroencephalography (EEG) would be beneficial for simultaneous measurement of electrophysiological and hemodynamic brain responses. Combined EEG and fMRI studies can thus take advantage of both the good spatial resolution of fMRI and the good temporal resolution of EEG (Thees et al., 2002; Debener et al., 2005).

#### **ACKNOWLEDGMENTS**

This study was supported by a grant of the Korean Health Technology R & D Project, Ministry for Health, Welfare, and Family Affairs, Republic of Korea (A092106).

#### **REFERENCES**


somatosensory categorization. *Neuroimage* 18, 707–719. doi: 10.1016/S1053-8119(02)00054-X


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 18 July 2013; accepted: 17 January 2014; published online: 04 February 2014.*

*Citation: Chang Y (2014) Reorganization and plastic changes of the human brain associated with skill learning and expertise. Front. Hum. Neurosci. 8:35. doi: 10.3389/fnhum.2014.00035*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Chang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

**REVIEW ARTICLE** published: 07 May 2014 doi: 10.3389/fnhum.2014.00280

## Experts bodies, experts minds: How physical and mental training shape the brain

#### *Ursula Debarnot<sup>1,2</sup>, Marco Sperduti<sup>3,4</sup>, Franck Di Rienzo<sup>2</sup> and Aymeric Guillot<sup>2,5</sup>\**

*<sup>1</sup> Département des Neurosciences Fondamentales, Centre Médical Universitaire, Université de Genève, Genève, Suisse*

*<sup>2</sup> Centre de Recherche et d'Innovation sur le Sport, Université Claude Bernard Lyon 1, Université de Lyon, Villeurbanne Cedex, Lyon, France*

*<sup>3</sup> Centre de Psychiatrie et Neurosciences (Inserm UMR S894), Université Paris Descartes, Paris, France*

*<sup>4</sup> Laboratoire Mémoire et Cognition, Institut de Psychologie, Boulogne-Billancourt, France*

*<sup>5</sup> Institut Universitaire de France, Paris, France*

#### *Edited by:*

*Merim Bilalic, Alpen-Adria-Universität Klagenfurt, Austria*

#### *Reviewed by:*

*Luca Turella, University of Trento, Italy Alessandro Guida, University of Rennes 2, France*

#### *\*Correspondence:*

*Aymeric Guillot, Centre de Recherche et d'Innovation sur le Sport, EA 647, Université Claude Bernard Lyon 1, Université de Lyon, 27-29 boulevard du 11 Novembre 1918, 69622 Villeurbanne Cedex, Lyon, France e-mail: aymeric.guillot@univ-lyon1.fr*

Skill learning is the improvement in perceptual, cognitive, or motor performance following practice. Expert performance levels can be achieved with well-organized knowledge, using sophisticated and specific mental representations and cognitive processing, applying automatic sequences quickly and efficiently, being able to deal with large amounts of information, and many other challenging task demands and situations that otherwise paralyze the performance of novices. The neural reorganizations that occur with expertise reflect the optimization of the neurocognitive resources to deal with the complex computational load needed to achieve peak performance. As such, capitalizing on neuronal plasticity, brain modifications take place over the course of practice and during the consolidation process. One major challenge is to investigate the neural substrates and cognitive mechanisms engaged in expertise, and to define "expertise" from its neural and cognitive underpinnings. Recent insights showed that many brain structures are recruited during task performance, but only activity in regions related to domain-specific knowledge distinguishes experts from novices. The present review focuses on three expertise domains placed across a motor to mental gradient of skill learning: sequential motor skill, mental simulation of the movement (motor imagery), and meditation as a paradigmatic example of "pure" mental training. We first describe results on each specific domain from the initial skill acquisition to expert performance, including recent results on the corresponding underlying neural mechanisms. We then discuss differences and similarities between these domains with the aim of identifying the highlights of the neurocognitive processes underpinning expertise, and conclude with suggestions for future research.

#### **Keywords: expertise, motor skill, motor imagery, meditation, motor consolidation, neural networks**

Brain plasticity refers to the putative changes in neural organization that account for the diverse forms of short-lasting or enduring behavioral modifiability. There is considerable evidence that neuronal plasticity is not an occasional state, but rather allows the human brain to adapt to environmental pressure, physiologic changes, and experiences (Johansson, 2004). Interestingly, changes in the input of any neural system, or in the trafficking of its efferent connections, lead to reorganizations that are visible at the level of behavior, anatomy, and physiology, encompassing cellular and molecular levels (Pascual-Leone et al., 2005). Currently, the challenge is to explore in greater detail the processes of neuroplasticity and how to modulate them to achieve the best behavioral outcome. Typically, the human brain undergoes constant changes triggered by environmental stimulation or resulting from intrinsic remodeling activity (Johansson, 2011). Therefore, the brain is the source of behavior, but is in turn modified by the behavior itself.

Most of the prior studies that aimed to investigate the neural substrates underlying expertise compared the skill level of novices and experts, which gives a snapshot of two endpoints on the skill level continuum. More recently, a growing number of investigations have instead tested subjects on several occasions to assess the gain in expertise throughout the training period, so that the "novice state" can be compared with the "expert state," or with stages in between (Guida et al., 2012). Using both approaches makes it possible to infer the neural reorganization that contributes to reaching the highest level of performance, which is impacted and reinforced through the consolidation processes (Diekelmann et al., 2011). This set of processes takes place automatically, without awareness, and allows the conversion of the initial unstable memory representation into a more stable and effective form, available for continued reactivation and recall over extended periods of time (Stickgold and Walker, 2007). Interestingly, processes of consolidation over time can also facilitate behavior, often through offline memory reorganization (Stickgold and Walker, 2005). Indeed, continued plasticity over time is crucial whenever newly acquired information is integrated with old memories (Abraham and Robins, 2005). Given this consolidation effect, some investigations include follow-up tests (for example, Draganski et al., 2004; Scholz et al., 2009). For all of these memory processes, sleep has been shown to play a critical role in meeting the demands of the organism (Walker and Stickgold, 2010; Rasch and Born, 2013). At the functional level, consolidation refers to processes in which reverberating activity in newly encoded representations stimulates a redistribution of the neuronal representations to other neuronal circuitries for long-term storage (Dudai, 2004).

On the basis of several thousand years of education, along with more recent laboratory research on learning and skill acquisition, a number of conditions for optimal learning and improvement of performance have been uncovered. The most common condition for optimal learning and improvement of performance concerns extended periods of deliberate practice (Ericsson et al., 1993). Four hours of physical/mental deliberate practice daily for approximately 10 years purportedly leads, for example, to expertise in chess, music composition, art, sport, and science (the 10-year or 10,000-h rule; Ericsson, 2008). One of the major interests for modern neuroscience is to investigate the plastic changes that occur in brain structures when people participate in such intensive motor and/or mental training. Skill acquisition and training in various domains, from motor function to higher order cognitive skills, have been shown to elicit substantial changes in brain anatomy (Jäncke, 2009). Previous studies including cross-sectional and longitudinal designs showed that skill acquisition and practice can induce changes in the functional properties of specific brain regions involved in the corresponding task (Buschkuehl et al., 2012; Guida et al., 2012). Recent investigations further demonstrated that not only brain activity, but also gray and white matter structures change as a consequence of skill learning (Draganski et al., 2004; Scholz et al., 2009). For instance, recent neuroimaging studies focusing on the effects of motor or mental (meditation) training, lasting several weeks in previously untrained participants, showed plastic changes in specific white matter regions (Scholz et al., 2009; Tang et al., 2012). Overall, technological and methodological advances in neuroimaging and non-invasive human brain stimulation have provided insights into the neuroplastic mechanisms that underlie skill expertise. More generally, cross-sectional paradigms, where highly skilled participants were compared with less-skilled persons, showed a strong link between skill acquisition and neuronal plasticity at both cortical and subcortical levels over time, engaging different spatially distributed and interconnected brain regions.
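The daily practice figure and the "10,000-h rule" mentioned above can be related by simple arithmetic. The sketch below is illustrative only: the number of practice days per year is an assumption that the text does not state, and the function name `total_practice_hours` is hypothetical.

```python
def total_practice_hours(hours_per_day, days_per_year, years):
    """Total accumulated practice time in hours."""
    return hours_per_day * days_per_year * years


# Assumption: about 250 practice days per year (not stated in the text).
print(total_practice_hours(4, 250, 10))  # 10000

# Assumption: practice on every calendar day of the year.
print(total_practice_hours(4, 365, 10))  # 14600
```

Under these assumptions, 4 h of daily practice over 10 years corresponds to roughly 10,000 to 15,000 h, depending on how many days per year are counted as practice days.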

A fundamental ability of the human brain is to form and retrieve memories enabling the individual to adapt its behavior to the demands of an ever-changing environment, and appropriately select and improve the behaviors of its given repertoire. The distinction between expertise based on declarative (*knowing that*) and procedural knowledge (*knowing how*) is directly related to the real-world domains of expertise. Some domains are mostly characterized by one type of knowledge, although experts may have both and often need to rely on one type of knowledge rather than the other (Dror, 2011). Practically, the most appropriate type of expert knowledge depends on the situational demands and cognitive mechanisms involved in operationalizing this knowledge (e.g., Beilock et al., 2004; Dror, 2011). This kind of adaptation is a reliable marker of expertise. Interestingly, it has been shown that there are both costs and benefits to expertise, so that the inflexibility of experts might make them unable to adapt to new task demands, at least on some occasions (Bilalic et al., 2008a,b). Typically, experts may fail to develop successful patterns of thought by using a less familiar, albeit more appropriate solution, when available. Bilalic et al. (2008a,b, 2010a), however, demonstrated that individuals with the highest degree of expertise remained more flexible and more likely to find the optimal solution by memory retrieval and/or by search.

The primary aim of the present paper is to review studies showing how learning and experience induce structural and functional brain plasticity that supports expertise, following a motor to mental gradient of skill (**Figure 1**). We consider not only the structural and functional organization that the brain develops as expertise is gained, but also the brain's potential to reorganize its functional organization by modifying its structure in response to practice. To meet this challenge, this review integrates available results from several neuroimaging approaches in human populations and animal models. For the first time, we address these issues by exploring three domains of expertise that are usually considered separately (motor skill, mental simulation of movement, and meditation). We first report the neuroanatomical correlates of expertise in each domain, following a motor to mental gradient, and then attempt to identify and link the common patterns of brain changes that subserve expert-level performance across domains. Practically, using such a transversal approach, this review aims to provide new insights into how brain plasticity occurs and supports the expert level of performance.

#### **WHEN MOTOR SKILLS EXPERTISE SUBSERVES BRAIN OPTIMIZATION**

Recent years have seen significant growth in knowledge of the neuroplasticity that underpins motor skill learning, from acquisition to the expert level of performance (for review, see Patel et al., 2013). Basically, compared to novices, highly trained individuals exhibit a number of differences, including a reduction in the variability of repeated movements (Milton et al., 2004) and in muscle activation (Lay et al., 2002), and a decrease in the overall volume of brain activation together with a relative increase in the intensity of activation in specific brain regions necessary for the execution of the task (Jäncke et al., 2000; Munte et al., 2002; Lotze et al., 2003).

In our attempt to understand such robust expertise-dependent differences, we first report in this section the main data on how motor skills are acquired and enhanced through the consolidation process. Then, we review recent findings showing changes in the pattern of brain activity as well as modifications in the neural structures associated with expertise.

#### **THE NEUROCOGNITIVE BASIS OF MOTOR SKILL LEARNING**

Motor skill acquisition refers to the process by which a movement comes to be performed effortlessly through repeated practice and interactions with the environment (Willingham, 1998). Accurate motor performance is essential to almost everything we do, from typing to driving or playing sports. In cognitive psychology, theoretical accounts describe skilled performance as moving from cognitive to automatic processing (Fitts, 1964). The key concept is increasing automaticity: controlled processes are attention demanding, conscious and inefficient, whereas automatic processes are rapid, smooth, effortless, require little attentional capacity, and are difficult to disrupt consciously (Shiffrin and Schneider, 1977). Two experimental paradigms are used to investigate the cognitive processes and the neural substrates mediating our capacity to learn behaviors: the first measures the incremental acquisition of movements into a well-executed behavior (motor sequence learning), whereas the second tests our capacity to compensate for environmental changes (motor adaptation). Doyon and Benali (2005) proposed an integrated view of the functional plasticity that the motor memory trace undergoes in each case. This model suggests that, depending upon the nature of the cognitive processes required during learning, both motor sequence and motor adaptation tasks recruit similar cerebral structures early in the learning phase, including the striatum, cerebellum, and motor cortical regions, in addition to prefrontal, parietal and limbic areas. A shift of motor representation from the associative to the sensorimotor striatal territory can be seen during sequence learning, whereas additional representation of the skill can be observed in the cerebellar nuclei after practice on motor adaptation tasks. When consolidation has occurred, the participant has achieved asymptotic performance.
However, the neural representation of a new motor skill is believed to be distributed in a network of structures involving the cortico-striatal and/or cortico-cerebellar circuits, depending on the type of skill acquired. At this stage, the model suggests that motor adaptation relies on the cerebellum for the retention and future execution of the acquired skill. By contrast, a reverse pattern of functional plasticity occurs in motor sequence learning, where the cerebellum is no longer essential, and the consolidation of the skill involves representational changes in the striatum and associated motor cortical regions. New insights into the neuroplastic mechanisms that underlie motor skill learning corroborate that skill acquisition is subserved by multiple mechanisms that operate across different temporal scales (Dayan and Cohen, 2011).

Practically, it is not automaticity *per se* that is indicative of high proficiency, but rather the level of skill at which automaticity is attained. Although the border between the concepts of automaticity and expertise begs for clarification, one may consider that most people fail to develop beyond a hobbyist level of performance because they settle into automaticity at a level of skill that they find enjoyable, rather than continuing to improve (Ericsson, 2007). Hence, automaticity is more a false ceiling than a measure of excellence. Previous studies typically defined skill acquisition in terms of reductions in movement execution time or reaction time, increases in accuracy, or decreases in movement variability. Yet, such measurements are often interdependent, in that faster movements can be performed at the cost of reduced accuracy and vice versa, a phenomenon often referred to as the speed–accuracy trade-off (Fitts, 1954). One solution to this issue is to assess changes in speed–accuracy trade-off functions, i.e., to test whether the performer defies the speed–accuracy trade-off for a given task (Krakauer, 2009; Krakauer and Mazzoni, 2011). In other words, a skilled tennis player can serve both faster and more accurately than a novice. Thus, sporting skill at the level of motor execution can be considered as the acquisition of a new speed–accuracy trade-off relationship for each sub-task of the motor sequence.
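
The notion of a shifted speed–accuracy trade-off function can be illustrated with a minimal sketch. It assumes the classic Fitts's law formulation, MT = a + b·log2(2D/W), where D is the movement distance and W the target width; the text cites Fitts (1954) without giving the equation. The function names and the coefficient values for the "novice" and "expert" performers are hypothetical and chosen purely for illustration; they are not taken from any study reviewed here.

```python
import math


def index_of_difficulty(distance, width):
    """Fitts's index of difficulty in bits: log2(2D / W)."""
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    return math.log2(2.0 * distance / width)


def movement_time(distance, width, a, b):
    """Predicted movement time in seconds: MT = a + b * ID."""
    return a + b * index_of_difficulty(distance, width)


# Hypothetical coefficients (intercept a in s, slope b in s/bit).
NOVICE = {"a": 0.20, "b": 0.15}
EXPERT = {"a": 0.10, "b": 0.08}

# Same movement distance, progressively narrower (more demanding) targets.
for width in (8.0, 4.0, 2.0, 1.0):
    novice_mt = movement_time(16.0, width, **NOVICE)
    expert_mt = movement_time(16.0, width, **EXPERT)
    print(f"W={width:4.1f}  novice={novice_mt:.2f}s  expert={expert_mt:.2f}s")
```

Under these assumed coefficients the expert curve lies below the novice curve at every accuracy demand, which is the sense in which skill can be read as a change of the trade-off function itself rather than a move along a fixed function.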

#### **FUNCTIONAL PLASTICITY CHARACTERIZING NOVICES AND EXPERTS**

The existing longitudinal studies in healthy participants attached great importance to controlled practice situations by keeping training parameters as constant as possible (e.g., training duration per day, overall training duration, training schedule, strategies, etc.). Movement automatization reflects a high level of motor skill performance and has been associated with increased activation in the primary motor cortex (M1), primary somatosensory cortex, supplementary motor area (SMA), and putamen, as well as decreased activation in lobule VI of the cerebellum (Floyer-Lea and Matthews, 2005; Lehericy et al., 2005). Training-related automaticity is also accompanied by decreased activation in the fronto-parietal and dorsal attention networks, suggesting that progress from the acquisition to the automatization stage of motor skill learning is characterized by reduced demands on externally focused attention and executive function (Kelly and Garavan, 2005). Such a pattern of activation is particularly elicited by motor sequence learning, but one would hypothesize that forms of motor and visuomotor learning that are more cognitive or associative in nature (Parsons et al., 2005) may recruit slightly different cerebral networks undergoing other patterns of plasticity with learning. For example, the preparatory period has been most extensively studied in athletes, where it is called the "pre-shot routine." Consistency and reproducibility of pre-shot routines are suggested to be among the most important differences that distinguish experts from novices in sports such as golf (Feltz and Landers, 1983), in contrast to motor skills such as goal kicking in rugby (Jackson, 2003), where no association between temporal consistency of the pre-shot routine and performance has been observed. In a recent study, Milton et al. (2007) found different functional activations during the pre-shot routine in expert and novice golfers.
In particular, the posterior cingulate cortex, the amygdala–forebrain complex, and the basal ganglia were active only in novices, whereas experts yielded activation primarily in the superior parietal lobule, the dorsal lateral premotor area, and the occipital cortex. These results suggest that the disparity between the quality of the performance of novice and expert golfers lies at the level of the functional organization of neural networks during motor planning. More generally, Patel et al. (2013) demonstrated that spatially distributed cortical networks and subcortical striatal regions may serve as neural markers of practice interventions.

Studies of the expertise stage of motor skill learning, both in humans and animals, have reported either increased or decreased M1 activation depending on both the time interval and task complexity. On the one hand, performing an explicit sequence of finger movements over several weeks showed a progressive increase of activity in M1 (Karni, 1995; Hlustik et al., 2004; Floyer-Lea and Matthews, 2005), reflecting recruitment of additional M1 units into the local network that represents the acquired sequence of movements (Ungerleider et al., 2002). Learning a motor sequence over several days was also accompanied by an increase in the size of motor maps and in the cortico-motoneuronal excitability of the body parts involved in the task (Pascual-Leone et al., 1995). Such plastic changes in M1 function linked with the slow stage of motor skill learning are well established in animal models as well. For example, functional reorganization of movement representations in M1 has been documented in squirrel monkeys (Nudo and Milliken, 1996) and rodents (Kleim et al., 2004). It was found that an expansion in movement representations with training, detectable only after substantial practice periods, paralleled behavioral gains. Such findings challenged theories of neural efficiency proposing that optimized neural processing is associated with reduced activity in M1 (Jäncke et al., 2000; Krings et al., 2000; Haslinger et al., 2004). In particular, increased activity of the motor network, including M1, cerebellum, premotor cortex, basal ganglia, pre-SMA, and SMA, was reported during the initial acquisition of motor skills, with significant attenuation of activity following consolidation of the motor skill (Steele and Penhune, 2010). Recently, Picard et al. (2013) examined the consequence of practice-dependent motor learning on the metabolic and neural activity in M1 of monkeys that had extensive training (∼1–6 years) on sequential movement tasks.
They found that practicing a skilled movement and the development of expertise lead to lower M1 metabolic activity, without a concomitant reduction in neuron activity. In other words, they showed that less synaptic activity was required to generate a given amount of neuronal activity. The authors suggested that this gain in M1 efficiency might result from a number of factors such as more effective synapses, greater synchrony in inputs and more finely tuned inputs. They concluded that low activation in M1 elicited during extended practice might be a reflection of plastic mechanisms involved in the development of expertise. Although there is no clear consensus about M1 activation during the gain of expertise, it is likely that the time interval is one of the main factors that play a role in brain modifications. This issue has been partially raised by Hotermans et al. (2008), who investigated the role of M1 in the different phases of motor memory consolidation by applying repetitive transcranial magnetic stimulation (rTMS) over M1 just after a training session involving a sequential finger-tapping task. They showed that interfering with M1 attenuated the early post-training performance without any detrimental consequences on the long-term behavioral improvement tested 4 or 24 h later. These results suggest that M1 is causally involved early after the acquisition of a new motor skill but is no longer mandatory following a consolidation period. Overall, increased activity in M1 might correspond to the integration of new sensory information during the short period of training of a new skill, but when the newly learned skill becomes automatic, M1 activation is less important. **Table 1** summarizes the human studies reviewed in this section with the brain areas showing early–late and expert-related functional plasticity.

An alternative explanation for the modulation of M1 activity has been suggested by Landau and D'Esposito (2006), who reported that an optimized motor system is capable of greater flexibility and adaptability, depending on the nature of the task demands. Using a complex sequential finger task, they found greater motor activation in pianists than in non-pianists, and further argued that pianists may show decreased activation when they are less challenged. Further research is therefore needed to determine the modulation of M1 activation in experts and novices using task characteristics that vary across several degrees of difficulty, either early or late during the learning process.

#### **EXPERIENCE-DEPENDENT STRUCTURAL CHANGES IN THE HUMAN BRAIN**

In addition to the functional reorganization of the brain motor networks, current neuroimaging studies suggest that physical practice is also reflected in macroscopic changes in motor-related structures (Draganski and May, 2008). Practically, higher gray matter volume in auditory, sensorimotor, and premotor cortex, as well as in the cerebellum, was reported in musicians compared to non-musicians (Gaser and Schlaug, 2003). Related findings have been reported for many other specific skills, including typing (Cannonieri et al., 2007), basketball (Park et al., 2009, 2011), and golf (Jäncke et al., 2009; Bezzola et al., 2011). Although these findings tend to follow the law "more skill, more gray matter," there is new evidence that training and ensuing expertise may induce local decreases in cortical volume (Hänggi et al., 2010; Granert et al., 2011). Conflicting findings were also reported on training-related plasticity in white matter microstructure in healthy adults, although fewer studies dealing with this issue have been published.

On the one hand, numerous investigations reported a strong association between specialized skill and structural changes in particular brain structures (Maguire et al., 2000; Hutchinson et al., 2003; Mechelli et al., 2004; Park et al., 2009). In a longitudinal study, Draganski et al. (2004) used a complex visuo-motor juggling task where perception and anticipation of moving targets determined the planning of the subsequent motor action. Young volunteers were scanned before and after a 3-month period of daily training. The results showed a transient bilateral expansion in gray matter in the mid-temporal area (hMT/V5) and the left inferior parietal sulcus. The authors concluded that juggling, and consequently the perception and spatial anticipation of moving objects, is a stronger stimulus for structural plasticity in visual areas than in motor areas. Similarly, cross-sectional neuroimaging studies showed experience-dependent structural plasticity in the cerebellum following training of complex motor skills (Park et al., 2009; Scholz et al., 2009). For example, Park et al. (2012) found greater right-than-left cerebellar volume asymmetry and relatively larger volumes of the right hemisphere and vermis lobules VI–VII (declive, folium, and tuber) in short-track speed skating players compared to matched controls. This finding suggests that the specialized abilities of balance and coordination are associated with structural plasticity of the right cerebellar hemisphere and vermis lobules VI–VII, regions that play a critical role in balance and coordination. Cannonieri et al. (2007) further found that the volume of brain areas corresponding to the motor skill increased proportionally to the duration of the training period. Together, these findings demonstrate that practice modulates brain anatomy in a manner specific to practice demands, although opposite structural correlates of stepwise increases in expertise have also been found. For instance, Hänggi et al. (2010) found differences in structural characteristics within the sensorimotor neural network between professional female ballet dancers and novices. Specifically, they reported decreased gray matter volumes in the left premotor cortex, the SMA, the putamen, and the superior frontal gyrus anterior to the premotor cortex. More recently, James et al. (2014) reported changes in gray matter as a function of musical training intensity in three groups of young adults (non-musicians, amateurs, and professionals). Surprisingly, they observed a progressive increase of gray matter density with the level of expertise in several regions involved in higher-order cognitive processing (e.g., right fusiform gyrus activated for visual pattern recognition), whereas the opposite pattern was found in sensorimotor areas. To summarize, the type of density change (increase or decrease) and the localization of structural plasticity in gray matter may be related to different factors such as the nature of the motor skill and the duration and stage of practice.

*FTT, finger tapping task; rTMS, repetitive transcranial magnetic stimulation; fMRI, functional magnetic resonance imaging; ↑, increase; ↓, decrease; M1, primary motor cortex; S1, primary somatosensory cortex; DLPC, dorsolateral prefrontal cortex; SMA, supplementary motor area; pre-SMA, pre-supplementary motor area; PMC, premotor cortex; CMA, cingulate motor areas.*

Current investigations are designed to determine the time scales of gray matter changes from novices to experts (Taubert et al., 2010; Dayan and Cohen, 2011). So far, gray matter adaptation has been observed as early as after 7 days of practice (Driemeyer et al., 2008) and as late as after 6 weeks (Scholz et al., 2009), demonstrating relatively fast and durable structural gray matter plasticity. Although interpretation of such striking results is premature, it has been proposed that processes occurring both at the synapse level and at larger scales (e.g., glial hypertrophy) may play a contributory role (Draganski and May, 2008).

Recent longitudinal neuroimaging studies further focused on the effects of motor training lasting several days or weeks in previously untrained participants, and showed specific structural plasticity in white matter regions (Scholz et al., 2009; Taubert et al., 2010). Scholz et al. (2009) reported experience-induced changes in white matter architecture following a short period of practice. Specifically, 6 weeks of juggling practice produced an increase in fractional anisotropy in a region of white matter underlying the intraparietal sulcus. Interestingly, Della-Maggiore et al. (2009) further demonstrated that the speed in a visuomotor adaptation task might be partially determined by the variation of fractional anisotropy in the posterior cerebellum and superior cerebellar peduncle. Together, these findings show that rates of motor skill practice might correlate with higher values of fractional anisotropy at a local level determined by the nature of the motor task. Cross-sectional studies, primarily in highly trained musicians, also examined white matter correlates of skilled behavior (Schmithorst and Wilke, 2002; Bengtsson et al., 2005; Han et al., 2009). Bengtsson et al. (2005) found a correlation between fractional anisotropy in the posterior limb of the internal capsule, which contains descending corticospinal fibers from the primary sensorimotor and premotor cortices, and the number of practice hours during childhood in skilled musicians. These results suggest that training during a critical developmental period may induce local structural plasticity. Rüber et al. (2013) further reported higher fractional anisotropy values in musicians than in non-musicians in descending motor tracts, with differences depending on whether the practiced fine finger skill was unimanual or bimanual.
The white matter tracts were modified to reflect specific motor demands, with unimanual motor skill primarily eliciting structural remodeling of right hemispheric tracts, and bimanual motor skill leading to bilateral structural tract remodeling. Similarly, Roberts et al. (2012) compared the behavior and brain structure of healthy controls with a group of karate black belts, an expert group who are able to perform rapid, complex movements that require years of training. As expected, experts were better able than novices to coordinate the timing of inter-segmental joint velocities, and data revealed significant group differences in the microstructure of white matter in the superior cerebellar peduncles and M1, brain regions that participate in the voluntary control of movement. Overall, as experts demonstrate optimal behavior on specific tasks, differences in white matter structure relative to novices might reflect a "fine-tuning" of the connectivity between specific brain regions. Nevertheless, contrasting patterns of results show lower white matter values in experts than in non-experts. In order to resolve the question of whether sensorimotor training leads to increased or decreased white matter density, Imfeld et al. (2009) investigated the corticospinal tract of musicians using different methods of analysis (fiber tractography, voxelwise analysis, region of interest analysis, and detailed slicewise analysis of diffusion parameters). Data clearly demonstrated that sensorimotor training leads to decreased white matter density. This finding is supported by Hänggi et al. (2010), who found reduced white matter volume in dancers vs. non-dancers. Further investigations are required to understand the cellular mechanisms underlying learning-dependent changes in white matter microstructure.
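
Fractional anisotropy, the white matter measure used in most of the diffusion imaging studies above, is not defined in the text. As a reading aid, the sketch below computes it from the three eigenvalues of a diffusion tensor using the standard definition, FA = sqrt(3/2) · sqrt(Σ(λi − λ̄)²) / sqrt(Σλi²). The function name and the example eigenvalues are hypothetical and do not come from any of the cited studies.

```python
import math


def fractional_anisotropy(eigenvalues):
    """Return FA (between 0 and 1) from three diffusion tensor eigenvalues."""
    l1, l2, l3 = eigenvalues
    mean = (l1 + l2 + l3) / 3.0
    numerator = (l1 - mean) ** 2 + (l2 - mean) ** 2 + (l3 - mean) ** 2
    denominator = l1 ** 2 + l2 ** 2 + l3 ** 2
    if denominator == 0:
        return 0.0  # all-zero tensor: treated as isotropic here (an assumption)
    return math.sqrt(1.5 * numerator / denominator)


# Hypothetical eigenvalues in units of 10^-3 mm^2/s.
isotropic_voxel = (0.8, 0.8, 0.8)  # diffusion equal in all directions
coherent_tract = (1.7, 0.3, 0.3)   # diffusion mostly along one axis

print(fractional_anisotropy(isotropic_voxel))
print(fractional_anisotropy(coherent_tract))
```

Values near 0 indicate diffusion that is similar in all directions, and values near 1 indicate diffusion constrained to a single axis. This is why both increases and decreases in the measure are open to several microstructural interpretations.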

#### **THE NEURAL CORRELATES OF MOTOR IMAGERY**

Across a motor to mental gradient of skill learning, there is now compelling evidence that motor imagery contributes to enhancing motor skill learning and motor performance (for reviews, see Feltz and Landers, 1983; Driskell et al., 1994; Guillot and Collet, 2008; Schuster et al., 2011). Motor imagery is a dynamic state during which one simulates an action mentally without any concomitant body movement. Over the last three decades, the advent of functional brain mapping studies has allowed researchers to investigate the neural correlates of motor imagery and to understand in greater detail the neural underpinnings of expertise in imagery.

#### **THE EFFECT OF EXPERTISE LEVEL**

In previous sections, we reviewed a handful of experimental studies providing clear evidence that individuals differ in their ability to perform voluntary motor acts, and that advanced motor learning is associated with functional brain plasticity (for reviews, see Doyon and Benali, 2005; Doyon, 2008; Johansson, 2011). Practically, neuroimaging studies showed that the neural networks activated by the execution of the motor task differed as a function of the individual expertise level. Although few studies have looked at this issue in motor imagery, similar differences in neural activation between novices and experts have been reported. For instance, by comparing the neural substrate of judgment processing in amateur and professional Go players, Ouchi et al. (2005) found that in the checkmate-decision problems, the precuneus and cerebellum were activated in the professionals, while the premotor and parieto-occipital cortices were extensively activated in the amateurs. Their results suggest that the precuneus and the cerebellum play a crucial role in processing accurate judgments by visual imagery. Likewise, brain imaging studies investigating object and pattern identification, such as chess pieces and their prospective functions (e.g., imagining moving a piece to capture another), revealed that expertise modulates the activity of several regions. Accordingly, Bilalic et al. (2010a,b, 2011) showed that chess-specific object recognition was accompanied by bilateral activation of the occipito-temporal junction, while chess-specific pattern recognition was related to bilateral activations in the middle part of the collateral sulci. More generally, Bilalic et al. (2012) provided evidence that experts not only engage the same regions as novices, but also recruit additional regions including bilateral activation of the retrosplenial cortex, the collateral sulcus, and the temporoparietal region (see also Rennig et al., 2013), suggesting that the pattern of activation moves from frontal parts at the beginning of the process to posterior parts responsible for retrieval of domain-specific knowledge around the final expertise stage.

Several other studies were designed to investigate the neural networks mediating the phenomenological experience of imagining music (Halpern and Zatorre, 1999; Halpern et al., 2004; Zatorre et al., 2007; for reviews, see Zatorre and Halpern, 2005; Lotze, 2013). Lotze et al. (2003) compared the patterns of brain activation during auditory imagery in experienced and novice musicians, with the former reporting high vividness and frequent use of imagery. Interestingly, they showed that experienced musicians overall recruited fewer cerebral areas, while amateurs manifested a widely distributed activation map. In the professional group, however, more activation was observed in regions assuming motor functions, including the SMA, the superior premotor cortex, and the cerebellum, as well as in the superior parietal lobule. By means of magnetoencephalography, Herholz et al. (2008) also compared musicians and non-musicians during imagery of a musical task. An early pre-attentive brain response (imagery mismatch negativity) to unexpected continuations of imagined melodies was observed only in musicians, reflecting the neuroplasticity due to intense training for music processing.

Similar experiments were conducted in the field of motor imagery *per se* by comparing the neural networks mediating the imagery experience in novices and elite athletes (e.g., Milton et al., 2008). In particular, Ross et al. (2003) compared the brain activations of six participants during motor imagery of a golf swing. They found an inverse relationship between brain activity and skill level, i.e., decreased activations occurred with increased golf skill level, especially in the SMA and cerebellum. Also, the imagined golf swing elicited little activation of the basal ganglia and cingulate gyri across all skill levels (see also Milton et al., 2007). Wei and Luo (2010) compared the pattern of cerebral activations in professional divers and novices during imagery of both professional (diving task) and simple (basic gymnastics task) motor skills. Elite athletes yielded greater activation in the parahippocampus during imagery of professional skills and a more focused activity of the prefrontal regions in both tasks. In a comparable study in archers, Chang et al. (2011) reported peaks of activation in the premotor cortex and SMA, in the inferior frontal region, as well as in the basal ganglia and cerebellum, in novices. In contrast, elite archers predominantly recruited the SMA, confirming a more focused pattern of activity following intense training. The between-group analysis revealed that novices exhibited significantly higher activation in M1, premotor area, inferior parietal cortex, basal ganglia, and cerebellum.

Overall, these data strongly support the existence of distinct neural mechanisms of motor expertise during imagery, as a function of the individual skill level. Interestingly, dynamic changes resulting from intense practice of a given motor task tend to support a reduction of the general cortical activation during motor imagery, with a more refined and circumscribed pattern of activity in trained participants (**Figure 2**).

Apart from differences in motor expertise, there is now ample evidence that expertise in the use of imagery, commonly referred to as individual imagery ability, also varies widely across individuals. It is therefore possible to distinguish good from poor imagers. Yet, very few experimental studies investigated the functional neuroanatomical correlates of imagery ability/expertise (**Figure 3**). Guillot et al. (2008) compared the pattern of cerebral activations in 13 skilled and 15 unskilled imagers, during both physical execution and imagery of a finger movement sequence. As expected, both groups manifested similar peaks of activation in many cerebral regions (inferior and superior parietal lobules, as well as motor-related regions including the lateral and medial premotor cortex, the cerebellum and putamen). Inter-group comparisons revealed, however, that good imagers showed greater activation of the parietal and ventrolateral premotor regions, known to play a crucial role in the formation of mental images. In contrast, poor imagers recruited the cerebellum and the orbito-frontal and posterior cingulate cortices. With reference to the motor sequence learning literature (Doyon and Ungerleider, 2002; Doyon and Benali, 2005), these findings strongly suggest that the neural networks mediating expertise in motor imagery are not identical in high- and low-skilled individuals. Findings also suggested that, compared to poor imagers, good imagers have a more efficient recruitment of movement engrams. In a recent study, van der Meulen et al. (2014) investigated the effect of imagery ability/expertise on the neural correlates of gait control. They confirmed that good and poor imagers recruited a similar network of brain regions. Good imagers, however, showed greater activity in motor-related areas including the left M1, right thalamus and bilateral cerebellum, as well as in the left prefrontal cortex, which contributes to higher-order gait control.
A greater activation was also found in the right SMA. Differences in the experimental designs as well as the criteria for determining the individual imagery ability may explain the differences between the two latter studies in terms of brain activations. Despite this, these studies provide a better understanding of the neural networks underlying imagery ability/expertise and highlight the importance of assessing the ability of participants to generate accurate mental images in order to adjust and individualize the content of mental practice programs.

#### **BRAIN PLASTICITY FOLLOWING MOTOR IMAGERY PRACTICE**

We provided evidence that the neural networks underlying motor imagery in novices and elite athletes are not totally overlapping but selectively depend upon the individual level of motor expertise and imagery ability. Based on this assumption, one may postulate that, with practice, the pattern of cerebral activation recorded during imagery in poor imagers and/or novices might evolve toward that observed in good imagers and/or experts. This might suggest that the expected changes in subcortical and cortical activations during motor imagery would reflect those elicited by the process of actual motor learning. In a pioneering series of experimental studies, Lacourse et al. (2004, 2005) reported functional cerebral and cerebellar sensorimotor plasticity following either physical or motor imagery practice. In line with this hypothesis, Lafleur et al. (2002) and Jackson et al. (2003) demonstrated that the functional plasticity occurring during the incremental acquisition of a motor sequence was also reflected during motor imagery. In other words, the patterns of dynamic changes in cerebral activity were significantly different between early and more advanced learning phases of imagined sequential foot movements. In particular, Lafleur et al. (2002) observed significant differences medially in the rostral portion of the anterior cingulate and orbito-frontal cortices, as well as in the striatum, bilaterally. Jackson et al. (2003) also explored the functional cerebral reorganization following motor imagery learning of a similar task. They confirmed the robustness of Lafleur et al.'s (2002) findings by a regression analysis, which showed a positive correlation between the increase in cerebral blood flow within the right medial orbito-frontal cortex and the individual performance enhancement. Recently, Baeck et al. (2012) confirmed the neuroplasticity elicited by motor imagery practice, while Bezzola et al. (2012) demonstrated that performing intensive physical training influences subsequent motor imagery of the corresponding task.

#### **CONCLUSION**

The results reviewed in this section support the existence of distinct neural mechanisms for expertise in imagery. The neural networks mediating the imagery experience in individuals with poor imagery ability are not entirely similar to those observed in good imagers. Likewise, comparisons between expert athletes and novices demonstrated different patterns of brain activation during motor imagery of the corresponding task. At this stage, one cannot entirely rule out that the modulation of brain activity also arises from experts' visual familiarity with the movement to be imagined. Higher activation might thus arise because an action is easier to imagine, either because experts can perform it or because they see or experience it extensively in daily life. Such comparisons between expert and non-expert athletes, or between good and poor imagers, thus suffer from a major limitation related to familiarity with the movement to be imagined, regardless of motor experience *per se*.

An interesting result supporting the functional equivalence between motor imagery and motor performance is that the functional plasticity that occurs during mental practice was found to closely mimic that observed after physical practice of the same motor skill. Imagery training may thus result in dynamic plastic changes such that the neural networks mediating imagery practice in poor imagers become closer to those observed in good imagers. Along these lines, real-time image analysis in functional magnetic resonance imaging (fMRI) studies may provide participants with objective information related to the vividness of their imagery content. This method might be particularly useful during the imagery learning process for modifying mental images when the pattern of activation during mental simulation is not the expected one. In other words, the participant may directly modulate his/her level of motor imagery expertise.


A significant illustration of the strength of this methodology in the field of motor imagery was offered by de Charms et al. (2004) during imagery of a manual action task. In this study, participants received feedback about the activation level in the somatomotor cortex through a simple virtual reality interface. The results showed that participants enhanced the level of activation driven by motor imagery in the somatomotor cortex over the course of training. Moreover, the activation of this region after imagery training was as robust as that recorded during actual practice. Yoo et al. (2008) later showed that real-time fMRI might help individuals learn how to increase region-specific cortical activity associated with a motor imagery task. In practice, the increased level of activation in motor areas was consolidated after a 2-week self-practice period. More recently, Xie et al. (2011) supported the effectiveness of delivering neurofeedback during motor imagery using real-time fMRI, and further provided evidence that the SMA was controllable by participants. These data strongly support the view that real-time fMRI is a valuable technique for investigating whether participants are able to use a cognitive strategy to control a target brain region in real time, and that motor imagery can reflect plastic changes in the neural correlates associated with intensive training (Baeck et al., 2012).

#### **MENTAL TRAINING: MEDITATION AS A PARADIGMATIC EXAMPLE**

In this section, we discuss recent findings on the cognitive processes and the neural underpinnings supporting a purely mental task, i.e., meditation. First, we briefly introduce the concept of meditation and give some definitions widely used in the neuroscientific community; then we review findings on the cognitive enhancement linked to different levels of meditative expertise. Finally, we present recent findings on the structural and functional brain changes associated with long-term meditation practice and on the neuronal underpinnings of meditation. In the last two sections, we mainly take into account data from functional and structural neuroimaging studies, without considering results from the large body of research using electroencephalography (we refer readers to some recent extensive reviews of this field, e.g., Cahn and Polich, 2006; Travis and Shear, 2010).

### **MEDITATION: BASIC CONCEPTS**

Meditation is commonly used in the literature as an umbrella term that covers different practices, including yoga, tai-chi, transcendental meditation, and various techniques derived from the Buddhist tradition. Here we will mainly focus on the latter, since they have received great attention in recent years, partly due to their clinical applications (Rubia, 2009), and since they focus on the training of well-defined cognitive processes. Indeed, a common feature of these practices is the voluntary control of the attentional focus. A distinction between focused attention (FA) and open monitoring (OM) techniques has been proposed (Lutz et al., 2008). FA practices are based on concentrating attention on a particular external, bodily, or mental object while ignoring all irrelevant stimuli. In contrast, OM techniques aim to widen the attentional focus to all incoming sensations, emotions, and thoughts from moment to moment, with a non-judgmental attitude and without focusing on any of them (Lutz et al., 2008).

FA is thought not only to train sustained attention but also to develop three attentional skills: monitoring of and vigilance to distracting stimuli beyond the intended focus of attention, disengagement of attention from distracting stimuli once the mind has wandered, and redirection of attention to the intended object. OM meditation involves more monitoring of one's own phenomenological experience and is thought to develop awareness and non-reactive meta-cognitive monitoring (Lutz et al., 2008; Slagter et al., 2011).

Expertise in other domains (e.g., sport, music) is defined as the achievement of better performance compared to novices in the particular activity that is trained. Following this definition, it is not clear what an objective measure of expertise in meditation could be, since there are no clear assessments of "meditative performance," and the quality of meditation is at best measured by subjective introspective reports. For this reason, the only consensual measurable parameter of expertise is the extent of practice, reported in years or hours per week. Nevertheless, an indirect quantitative measure of expertise could be the improvement of cognitive functions that are supposed to be trained during meditation. Thus, in the following section, we start by presenting behavioral results that have highlighted cognitive improvements in expert meditators.

#### **COGNITIVE ENHANCEMENT INDUCED BY MEDITATION**

Meditation training has been recently shown to improve different cognitive abilities. One of the most investigated domains, not surprisingly, is that of attentional improvement after either long- or short-term meditative practice.

Following the distinction proposed by Posner and Petersen (1990) between three attentional subsystems, namely alerting (the ability to reach and maintain a state of vigilance), orienting (the capacity to focus attention on a subset of stimuli), and conflict resolution or executive attention (the ability to resolve conflict or allocate limited resources between competing stimuli), different researchers have employed the Attention Network Test (Fan et al., 2002) to investigate the impact of meditation on each subcomponent. Results from expert mindfulness meditators showed better performance in the orienting component and a trend toward better executive attention (van den Hurk et al., 2010). These findings are consistent with other studies reporting enhanced sustained attention in expert meditators (Valentine and Sweet, 1999; Pagnoni and Cekic, 2007; Josefsson and Broberg, 2010).
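As an illustration of how the three attentional components are typically quantified in the task of Fan et al. (2002), the sketch below computes the conventional reaction-time difference scores: no cue minus double cue for alerting, center cue minus spatial cue for orienting, and incongruent minus congruent flankers for conflict. The trial format, field names, and reaction times are hypothetical and are not taken from any of the studies reviewed here.

```python
from statistics import mean

def ant_network_scores(trials):
    """Compute alerting, orienting, and conflict scores (in ms).

    `trials` is a list of dicts with the hypothetical keys
    'cue' ('none', 'double', 'center', 'spatial'),
    'flanker' ('congruent', 'incongruent'), 'rt_ms', and 'correct'.
    Only correct trials contribute to the mean reaction times.
    """
    def mean_rt(key, value):
        rts = [t["rt_ms"] for t in trials if t[key] == value and t["correct"]]
        if not rts:
            raise ValueError(f"no correct trials for {key}={value}")
        return mean(rts)

    return {
        # Benefit of a temporal warning signal
        "alerting": mean_rt("cue", "none") - mean_rt("cue", "double"),
        # Benefit of knowing where the target will appear
        "orienting": mean_rt("cue", "center") - mean_rt("cue", "spatial"),
        # Cost of resolving conflicting flankers
        "conflict": mean_rt("flanker", "incongruent") - mean_rt("flanker", "congruent"),
    }

# Hypothetical data for one participant (illustrative values only)
trials = [
    {"cue": "none", "flanker": "congruent", "rt_ms": 560, "correct": True},
    {"cue": "none", "flanker": "incongruent", "rt_ms": 640, "correct": True},
    {"cue": "none", "flanker": "incongruent", "rt_ms": 300, "correct": False},
    {"cue": "double", "flanker": "congruent", "rt_ms": 520, "correct": True},
    {"cue": "double", "flanker": "incongruent", "rt_ms": 600, "correct": True},
    {"cue": "center", "flanker": "congruent", "rt_ms": 530, "correct": True},
    {"cue": "center", "flanker": "incongruent", "rt_ms": 610, "correct": True},
    {"cue": "spatial", "flanker": "congruent", "rt_ms": 490, "correct": True},
    {"cue": "spatial", "flanker": "incongruent", "rt_ms": 570, "correct": True},
]

scores = ant_network_scores(trials)
print(scores)
```

Under this convention, a smaller conflict score indicates more efficient executive attention, whereas larger alerting and orienting scores indicate a greater benefit from the corresponding cues.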

In another study, Jha et al. (2007) compared performance on a similar task between a group of expert meditators, a group that followed a 5-week mindfulness training, and a control group. They compared the performance of the three groups at baseline (T1) and again (T2) after a 1-month meditative retreat for the meditator group, the 5-week training for the mindfulness group, and no treatment for the control group. Their findings showed that at T1, experts performed better in conflict monitoring than the other two groups, while at T2 the mindfulness group showed enhanced orienting and the retreat group performed better in the alerting component. In the same vein, Tang et al. (2007) reported that a short 5-day meditation training enhanced performance in conflict monitoring.

Other results consistent with a benefit of meditation for the executive component of attention come from a study investigating the attentional blink effect. This effect refers to the fact that if two target stimuli (t1 and t2) are presented in rapid succession, t2 is normally not detected (Raymond et al., 1992; Ward et al., 1996). Slagter et al. (2007) reported that after 3 months of intensive meditative practice the attentional blink was reduced as a result of fewer neurocognitive resources being allocated to t1, as evidenced by a reduction in an electrophysiological brain potential (the P3b) normally associated with attentional resource allocation. These results were also replicated, using a similar paradigm, in a sample of more expert meditators (van Leeuwen et al., 2009).
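One common way to quantify the attentional blink behaviorally is to compare t2 detection accuracy at a short and a long t1-t2 lag, counting only trials on which t1 was reported correctly. The sketch below follows that convention; the trial format, the lag values, and the data are hypothetical and do not reproduce the analyses of the studies cited above.

```python
def t2_given_t1_accuracy(trials, lag):
    """Proportion of correct t2 reports among trials at `lag` with t1 correct."""
    relevant = [t for t in trials if t["lag"] == lag and t["t1_correct"]]
    if not relevant:
        raise ValueError(f"no trials with correct t1 at lag {lag}")
    return sum(1 for t in relevant if t["t2_correct"]) / len(relevant)

def blink_magnitude(trials, short_lag, long_lag):
    """Accuracy drop at the short lag relative to the long lag.

    Larger values indicate a stronger attentional blink.
    """
    return t2_given_t1_accuracy(trials, long_lag) - t2_given_t1_accuracy(trials, short_lag)

# Hypothetical trials: lag is the number of items between t1 and t2
trials = [
    {"lag": 3, "t1_correct": True, "t2_correct": True},
    {"lag": 3, "t1_correct": True, "t2_correct": False},
    {"lag": 3, "t1_correct": True, "t2_correct": False},
    {"lag": 3, "t1_correct": True, "t2_correct": True},
    {"lag": 3, "t1_correct": False, "t2_correct": True},  # excluded: t1 missed
    {"lag": 8, "t1_correct": True, "t2_correct": True},
    {"lag": 8, "t1_correct": True, "t2_correct": True},
    {"lag": 8, "t1_correct": True, "t2_correct": False},
    {"lag": 8, "t1_correct": True, "t2_correct": True},
]

magnitude = blink_magnitude(trials, short_lag=3, long_lag=8)
print(magnitude)
```

With such a measure, a training-related reduction of the attentional blink would appear as a smaller magnitude after training than before.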

Taken together, these findings suggest that the most affected component of the attentional network is the executive one, even though some studies reported better performance in the orienting or the alerting component (van den Hurk et al., 2010). It should be noted, however, that participants in this study had much longer meditation experience (mean 14.5 years) than those in the others. This could suggest that the executive system is the first attentional component to benefit from meditation training and that longer practice is needed to achieve improvement in the other systems.

Beyond attentional performance, preliminary findings have shown that meditation could have a beneficial effect on other cognitive functions. For example, some studies reported better performance in participants assigned to a mindfulness meditation group on several executive functions such as verbal fluency (Wenk-Sormaz, 2005; Heeren et al., 2009; Zeidan et al., 2010), cognitive inhibition (Heeren et al., 2009) and working memory (Chambers et al., 2008; Zeidan et al., 2010; Mrazek et al., 2013).

Moreover, it has been shown that short-term meditation training can have a positive effect on autobiographical memory specificity in formerly depressed patients (Williams et al., 2000) and also in non-clinical populations (Heeren et al., 2009), but these effects are largely mediated by enhanced performance in executive functions (Heeren et al., 2009).

#### **STATE-DEPENDENT FUNCTIONAL ACTIVATION DURING MEDITATION**

In this section, we mainly focus on the ongoing brain activity during the practice of meditation and, in particular, on how this activity evolves with the extent of practice. Before reviewing existing data, we want to underline that meditation is poorly suited to current neuroimaging investigation: it is a long-lasting state in which different parallel cognitive processes coexist, and the time needed to achieve it varies. Indeed, the time course of entering the meditative state is unknown and variable across participants. One interesting solution has recently been applied by Hasenkamp et al. (2012). In their study, participants performing FA meditation were asked to press a button as soon as they realized their attention was wandering. Using this stratagem, the authors were able to dissociate different aspects of the meditative process: mind wandering, awareness of mind wandering, shift of attention, and FA. Moreover, in trying to adapt meditation tasks to neuroimaging protocols, most previous studies used blocked designs with short meditation periods (i.e., from 30 s to several minutes); this methodology may not reflect the complexity of meditative processes. Thus, results from functional neuroimaging studies on meditation should be taken with caution. Nevertheless, several activated regions were consistently reported across different studies and meditative techniques, as pointed out by two recent meta-analyses (Sperduti et al., 2012; Tomasino et al., 2013).

Different studies have reported that meditation is supported by a large set of brain areas encompassing lateral and medial frontal regions, including the anterior cingulate cortex (ACC), parietal structures, the insula, medial temporal structures such as the hippocampus and the parahippocampal formation, and the basal ganglia (Brefczynski-Lewis et al., 2007; Hölzel et al., 2007; Bærentsen et al., 2010; Engström et al., 2010; Manna et al., 2010; Wang et al., 2011; Sperduti et al., 2012; Tomasino et al., 2013). Frontal and parietal regions, including the ACC, are thought to reflect cognitive control and attentional monitoring during meditation, while the insula is known to be involved in interoceptive awareness. In contrast, the role of the hippocampus during meditation is still a matter of debate and could reflect memory consolidation, emotional regulation, or monitoring of spontaneous thoughts (Engström et al., 2010; Sperduti et al., 2012).

Nevertheless, establishing a unitary neural correlate of the complex task of meditation is not without problems. Indeed, different studies have shown that the neural underpinnings of meditation may differ depending on the specific meditative technique investigated and on the expertise of the participants. For example, Manna et al. (2010) directly compared FA and OM meditation in the same participants and reported that OM more strongly activated lateral prefrontal regions. In the same vein, Wang et al. (2011) showed that a breath-based meditation (defined by the authors as a non-FA technique), compared to mantra repetition (a form of FA meditation), more strongly activated limbic structures (hippocampus, parahippocampus, and amygdala), the insula, and lateral frontal areas, while mantra repetition was more associated with activations in the precentral gyrus, parietal cortex, and medial frontal gyrus. Moreover, the engagement of certain structures, above all frontal regions, could vary with individual expertise. Indeed, Brefczynski-Lewis et al. (2007) reported an inverted U-shaped relation between frontal activity and meditators' expertise. In other words, the most trained individuals showed less frontal activity, suggesting a disengagement of attentional control with training. These results are in line with a recent meta-analysis by Sperduti et al. (2012), in which pooling studies of expert meditators practicing different meditative techniques revealed no frontal activity but common activation in the basal ganglia, medial prefrontal cortex, and parahippocampus. These data suggest that some brain regions, in particular those involved in cognitive control, are necessary at the initial and intermediate levels of expertise, while with practice meditation may become a highly automatic and effortless process (Lutz et al., 2008).
Moreover, while the neurocognitive underpinnings of different meditative practices may differ in novices, they could eventually converge when expertise is attained, as suggested by Newberg and Iversen (2003, p. 283): "*Phenomenological analysis suggests that the end results of many practices of meditation are similar, although these results might be described using different characteristics depending on the culture and individual. Therefore, it seems reasonable that while the initial neurophysiological activation occurring during any given practice may differ, there should eventually be a convergence*."

#### **TRAIT STABLE MODIFICATION OF BRAIN ORGANIZATION ASSOCIATED WITH LONG-TERM MEDITATION**

One of the first studies investigating the long-term effects of meditation on brain morphology is that of Lazar et al. (2005). The authors reported greater cortical thickness in meditators than in controls in the prefrontal cortex, the insula, and the somatosensory cortex. Moreover, they found a positive correlation between cortical thickness and expertise in the occipito-temporal visual cortex. Increased gray matter concentration in the insula was further confirmed by Hölzel et al. (2008), who additionally reported greater gray matter concentration in the right hippocampus and left inferior temporal gyrus. In this latter region, gray matter concentration correlated positively with the amount of meditation training. Hippocampal alterations in expert meditators have also been reported in several recent studies (Luders et al., 2009, 2013). Another structure that has been found to be altered in meditators is the putamen (Pagnoni and Cekic, 2007). Indeed, while this region normally shows a decrease in gray matter concentration with aging, this was not the case for a group of expert meditators. This result, together with findings suggesting a protective effect of long-term meditation against cognitive decline in aging (Prakash et al., 2012), opens the interesting perspective of employing meditation as a neurocognitive training technique in the elderly.

There are also some studies reporting white matter changes induced by meditation. Luders et al. (2011) observed widespread increases in fractional anisotropy, a measure of fiber tract integrity, throughout the brain in expert meditators compared to controls. Similar results have been reported in a recent study by Kang et al. (2013). Moreover, two interesting studies on novice meditators showed that only 11 h of meditation training increased fractional anisotropy in the corona radiata (Tang et al., 2010, 2012), a tract connecting the ACC with other cortical structures. Increased fractional anisotropy could be interpreted as enhanced connectivity between large-scale networks. Studies reporting altered resting-state functional connectivity in meditators are compatible with this interpretation. Indeed, several studies reported increased connectivity both within the default mode network (Brewer et al., 2011; Jang et al., 2011) and the attentional network (Hasenkamp and Barsalou, 2012), and between these two networks (Brewer et al., 2011; Hasenkamp and Barsalou, 2012; Taylor et al., 2013).
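For reference, the fractional anisotropy measure discussed above is conventionally derived from the three eigenvalues of the diffusion tensor estimated in each voxel. The standard definition is reproduced below; the symbols $\lambda_1, \lambda_2, \lambda_3$ (eigenvalues) and $\bar{\lambda}$ (their mean) are introduced here for illustration, and the measure is written out in full to avoid confusion with the abbreviation FA, which denotes focused attention in this section.

$$
\text{fractional anisotropy} = \sqrt{\frac{3}{2}}\,\sqrt{\frac{(\lambda_1-\bar{\lambda})^2+(\lambda_2-\bar{\lambda})^2+(\lambda_3-\bar{\lambda})^2}{\lambda_1^2+\lambda_2^2+\lambda_3^2}}, \qquad \bar{\lambda}=\frac{\lambda_1+\lambda_2+\lambda_3}{3}
$$

The value ranges from 0, when diffusion is equal in all directions, to 1, when diffusion occurs along a single axis, which is why higher values are commonly read as reflecting more coherently organized fiber tracts.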

These findings confirm that long-term, but also short-term, meditative practice could lead to stable structural alterations in gray and white matter and to a functional rewiring of large-scale networks. Regions that are active during meditation are the most affected by these structural changes, and in some cases expertise has been reported to correlate with the magnitude of morphological changes.

#### **DISCUSSION**

In the previous sections, we reviewed findings showing that meditation is subserved by a widespread network of brain regions involved in different processes such as cognitive control, attention, and interoceptive awareness. Moreover, the engagement of some of these structures seems to depend on the degree of expertise, with frontal regions possibly being involved at the initial stages of practice, while at more advanced stages, when meditation practice eventually becomes "effortless," the basal ganglia seem to play a central role. Continuous training in meditation, possibly through the repeated activation of this set of regions, produces long-term structural and functional changes at the local level, but also in long-range brain connectivity.

At the behavioral level, meditation has been shown not only to enhance attentional performance, which is supposed to be directly trained during practice, but also to improve different cognitive processes, such as executive functions, working memory, and long-term autobiographical memory. These findings are in line with the proposal of Slagter et al. (2011) that meditation could promote "process-specific learning," a kind of learning that is not confined to improved performance on the trained task but can transfer to other tasks and domains.

Since most of the evidence about the relationship between meditative practice and cognitive, structural, and functional brain alterations comes from cross-sectional studies, conclusions about a causal role should be drawn with caution. Nevertheless, the correlations often reported between the degree of expertise and the extent of performance or neural modifications, together with some recent longitudinal studies on short-term meditative training, seem to corroborate this interpretation. Further efforts should be made to carry out longitudinal studies that could shed light on the progressive cognitive and brain reorganization induced by different levels of meditative expertise.

#### **GENERAL CONCLUSIONS**

Technological and methodological advances in neuroimaging and non-invasive brain stimulation in humans, along with novel findings stemming from animal studies, provide new insights into the neuroplastic mechanisms underlying expert-level performance, and suggest that multiple mechanisms operate across different temporal scales from skill acquisition to expertise. In this review, we attempted to identify such typical changes in neuronal processes along a motor to mental gradient of skills.

We have seen, across each domain of expertise, that the degree of functional brain plasticity reflecting an increased level of expertise was task-dependent and followed extensive practice. Enhanced behavioral and cognitive performance involves dynamic shifts in the strength of pre-existing neuronal connections, including changes in task-related cortico-cortical and cortico-subcortical coherence. In the motor domain, experts usually show a reduced pattern of brain activity in the cortico-cerebellar pathway, along with a more focused activation in the striatum (caudate nucleus/putamen), suggesting that this structure may be critical for long-term retention of well-learned motor sequences. Likewise, in the cognitive domain, Dahlin et al. (2009) demonstrated positive behavioral effects of training on working memory, which were associated with decreased activity in cortical areas typically related to working memory, along with fewer peaks of activity in regions participating in attentional control processes (e.g., fronto-parietal regions) and increases in the striatum. Together, these data suggest that, along a motor to mental gradient, practice leads to lower involvement of control-related cortical areas (e.g., prefrontal cortex) and increased recruitment of the striatum. Accordingly, we also underlined in this review the role of the basal ganglia across the motor to mental gradient. Indeed, the basal ganglia are not restricted to motor aspects of behavior but are involved in most areas of cognitive and emotional functioning (Seger, 2006), which is consistent with their anatomical connections with all areas of the cortex (Alexander et al., 1986). Briefly, the basal ganglia interact with the cortex through independent processing loops in which the cortex projects to the striatum, the striatum to the pallidum, the pallidum to the thalamus, and the thalamus back to the cortex. These processing loops have functions that complement those of the cortical areas they interact with.
Neuroimaging studies in humans have undoubtedly demonstrated that the basal ganglia play a critical role in the planning, learning, and execution of a new motor skill, as well as in the long-lasting representation of the skilled behavior (Doyon et al., 2009). In the same vein, and in addition to common activations in brain areas associated with the generation and maintenance of mental images in working memory, the basal ganglia showed increased activity after intensive motor imagery practice, which demonstrates their involvement in expert-level performance and reinforcement learning (Baeck et al., 2012). Finally, the contribution of the basal ganglia to meditation has also been reported, albeit in a different way, in experienced meditators. Indeed, the meditative state would start with the activation of frontal regions, which would successively activate, in cascade, different cortico-thalamic-limbic-basal ganglia loops that would maintain the meditative state. Ritskes et al. (2003) further demonstrated that when experienced meditators switched from normal consciousness to a meditative state, increased activation in the prefrontal cortex and basal ganglia occurred along with decreased activation in the superior occipital gyrus and anterior cingulate. It has been hypothesized that these increased activations may be associated with the gating of cortical–subcortical interactions that leads to an overall decrease in readiness for action (Kjaer et al., 2002).

As seen in this review, complex motor skills that involve procedural learning, as well as purely mental skills, result in measurable changes in brain structure. Recently, studies using MR-based morphometry have provided new insights into brain plasticity related to the level of expertise. Clear modifications of gray and white matter volumes are related to skill level, which in turn depends on training characteristics. Moreover, those structural changes have been shown to correlate with behavioral improvements (for motor practice) and with attentional and emotional regulation (for meditation). This latter finding might serve as neural evidence for the common idea that practice makes perfect. However, it remains unknown when morphometric changes can first be detected and how long they last.

One of the main characteristics of neural functional plasticity across a motor to mental gradient of expertise is that experts achieve higher levels of performance with less cognitive effort. In particular, using a quantitative ALE meta-analytic method, Patel et al. (2013) demonstrated common reduced activation for both motor and cognitive training in regions closely overlapping with the fronto-parietal control and dorsal attention networks. Indeed, it remains difficult and time-consuming to control multiple tasks concurrently (Miller, 1956; Just et al., 2001; Rubinstein et al., 2001), so that any strategy that minimizes the demands placed on intentional control is of great benefit. For instance, the functional activation observed in premotor areas or M1 during motor performance is reduced or more focused in professional musicians compared to amateurs or non-musicians (Hund-Georgiadis and von Cramon, 1999; Meister et al., 2005). More generally, the reduced activation in highly skilled performers is often taken as evidence for "increased efficiency of the motor system" and the need for a smaller number of active neurons to perform a given set of movements (Krings et al., 2000; Haslinger et al., 2004), although other researchers have argued that lower levels of activation result from reduced attention or task difficulty (Floyer-Lea and Matthews, 2004; Landau and D'Esposito, 2006). As suggested by Picard et al. (2013), low activation is not always a sign of low neuronal activity, but may rather reflect plastic mechanisms involved as expertise emerges. Connectivity studies provide a definitive answer to this point by examining whether the reduction in activity of certain areas is accompanied by increased activity in other brain regions when experts perform skilled and non-skilled tasks.

#### **FUTURE PERSPECTIVES**

In this review, we attempted to provide an overview, although not exhaustive, of the structural and functional brain plasticity that accompanies expert-level performance. It is worth acknowledging that covering several substantial research domains along a motor to mental gradient imposes limits on the emphasis placed on each, given that each currently constitutes a very active area of research in cognitive neuroscience. Nevertheless, and despite these limitations, the main purpose of this review was to select and consider the most relevant insights in the three domains of expertise and to identify the common patterns of brain changes.

As a postulate, changes in functional or structural networks may be expected to occur after long-term intensive skill training. Many studies have exploited those changes through cross-sectional designs to reveal group differences underpinning expert performance. However, the main issue with this approach is that individual differences in anatomical networks before training could predispose individuals to practice a specific skill. A solution to this problem comes from longitudinal studies, which are time-consuming and often nearly impossible if training occurs over long time periods. Thus, in the future, the emphasis should be on the use of multimodal imaging approaches (e.g., DTI and high-field MRI) to provide conjoint analyses of changes in activity within specific regions and patterns of connectivity between different regions. Further work is still needed to better discriminate the specific components of training tasks that influence expertise and how and when this is reflected in functional and structural plasticity.

Although further research is needed to understand and identify the functional and structural changes produced by expertise, it is now possible to take up the challenge of modulating neural plasticity for optimal and long-term behavioral gains. The use of real-time fMRI specifically opens a space to examine functional changes associated with the acquisition of a motor or cognitive skill, since participants are educated to gain some familiarity with the neuroanatomy before neurofeedback sessions (Sulzer et al., 2013). Given that brain-activation patterns and behavior are assumed to be closely linked, it seems likely that learned control over brain activation would lead to changes in cognition and behavior (Fetz, 2007). A growing body of research is currently attempting to develop new types of task paradigms that might benefit most from neurofeedback fMRI. This neuroimaging technology has already demonstrated performance-enhancement applications, such as boosting memory by increasing activation in memory-related brain areas (Zhang et al., 2013) or improving motor imagery accuracy (de Charms et al., 2004). Likewise, non-invasive brain stimulation techniques such as transcranial magnetic or direct current stimulation (TMS/tDCS) induce lasting effects that can be used to explore the mechanisms of cortical plasticity in the intact human cortex and to determine their therapeutic potential for behavioral and cognitive improvement. For example, Marshall et al. (2004) showed that the application of tDCS over frontal areas during slow-wave sleep contributed to improved declarative memory consolidation. Thus, both approaches can be applied to explore and enhance brain plasticity in the context of expertise, and future research should certainly investigate their synergistic effects with sleep homeostatic functions.
Encouraging findings have already demonstrated that rTMS applied concomitantly with daily motor training over 4 weeks might contribute to improved motor performance, along with corresponding neural plasticity in motor sequence learning networks (Narayana et al., 2014).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 November 2013; accepted: 15 April 2014; published online: 07 May 2014. Citation: Debarnot U, Sperduti M, Di Rienzo F and Guillot A (2014) Experts bodies, experts minds: How physical and mental training shape the brain. Front. Hum. Neurosci. 8:280. doi: 10.3389/fnhum.2014.00280*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Debarnot, Sperduti, Di Rienzo and Guillot. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Corrigendum: Experts bodies, experts minds: how physical and mental training shape the brain

#### *Ursula Debarnot<sup>1</sup>, Marco Sperduti<sup>2,3</sup>, Franck Di Rienzo<sup>4</sup> and Aymeric Guillot<sup>4</sup>\**

*<sup>1</sup> Département des Neurosciences Fondamentales, Centre Médical Universitaire, Université de Genève, Genève, Switzerland*

*<sup>4</sup> Centre de Recherche et d'Innovation sur le Sport, University Claude Bernard Lyon 1, Villeurbanne, France*

*\*Correspondence: aymeric.guillot@univ-lyon1.fr*

#### *Edited by:*

*John J. Foxe, Albert Einstein College of Medicine, USA*

*Reviewed by:*

*John W. Krakauer, Johns Hopkins University, USA*

**Keywords: expertise, meditation, motor imagery, motor skills, motor consolidation, neural networks**

#### **A corrigendum on**

**Experts bodies, experts minds: how physical and mental training shape the brain**

*by Debarnot, U., Sperduti, M., Di Rienzo, F., and Guillot, A. (2014). Front. Hum. Neurosci. 8:280. doi: 10.3389/fnhum.2014.00280*

An important reference (Yarrow et al., 2009) has been mistakenly omitted from the published article.

The review uses the material of the article by Yarrow et al. in three parts, specifically in the section: The neurocognitive basis of motor skill learning. The sentences were, unfortunately, used and cited verbatim without citing this important reference. The passages concerned are reproduced below.

"In cognitive psychology, theoretical descriptions of changes in skilled performance were shown to move from cognitive to automatic processing (Fitts, 1964). The key concept is the increasing automaticity: controlled processes are attention demanding, conscious and inefficient, whereas automatic processes are rapid, smooth, effortless, require little attentional capacity, and are difficult to be consciously disrupted (Shiffrin and Schneider, 1977)."

"Practically, it is not automaticity per se that is indicative of high proficiency, but rather the level of skill at which automaticity is attained. Although the border between the automaticity and the expertise concepts beg for clarification, one may consider that most people fail to develop beyond a hobbyist level of performance as they settle into automaticity at a given level of skill that they find enjoyable, rather than continuing to improve skills (Ericsson, 2007). Hence, automaticity is more a false ceiling than a measure of excellence."

"One solution to this issue is through assessment of changes in speed-accuracy trade-off functions, i.e., to defy the speed–accuracy tradeoff for a given task (Krakauer, 2009; Krakauer and Mazzoni, 2011). In other words, a skilled tennis player can serve both faster and more accurately than a novice. Thus, sporting skill at the level of motor execution can be considered as acquiring a new speed–accuracy trade-off relationships for each sub-task of the motor sequence."

These sentences are lifted verbatim from the study by Yarrow and collaborators in their important review paper, which should have therefore been cited explicitly. This is an individual error that we collectively take responsibility for. The authors apologize for their oversight and the unintentional inconvenience caused.

#### **REFERENCES**

Ericsson, K. A. (2007). Deliberate practice and the modifiability of body and mind: toward a science of the structure and acquisition of expert and elite performance. *Int. J. Sport Psychol*. 38, 4–34.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 10 August 2014; accepted: 17 January 2015; published online: 09 February 2015.*

*Citation: Debarnot U, Sperduti M, Di Rienzo F and Guillot A (2015) Corrigendum: Experts bodies, experts minds: how physical and mental training shape the brain. Front. Hum. Neurosci. 9:47. doi: 10.3389/fnhum.2015.00047*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2015 Debarnot, Sperduti, Di Rienzo and Guillot. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

*<sup>2</sup> Centre de Psychiatrie et Neurosciences (InsermUMRS894), Université Paris Descartes, Paris, France*

*<sup>3</sup> Laboratoire Mémoire et Cognition, Institut de Psychologie, Boulogne Billancourt, France*

## Melodic multi-feature paradigm reveals auditory profiles in music-sound encoding

#### **Mari Tervaniemi<sup>1</sup>\*, Minna Huotilainen<sup>1,2</sup> and Elvira Brattico<sup>1,3</sup>**

<sup>1</sup> Cognitive Brain Research Unit, Institute of Behavioural Sciences, University of Helsinki, Helsinki, Finland

<sup>2</sup> Finnish Institute of Occupational Health, Helsinki, Finland

<sup>3</sup> Brain and Mind Laboratory, Department of Biomedical Engineering and Computational Science (BECS), Aalto University School of Science, Espoo, Finland

#### **Edited by:**

Merim Bilalić, Alpen Adria University Klagenfurt, Austria

#### **Reviewed by:**

Lutz Jäncke, University of Zurich, Switzerland

Erich Schröger, University of Leipzig, Germany

#### **\*Correspondence:**

Mari Tervaniemi, Cognitive Brain Research Unit, Institute of Behavioural Sciences, P.O. Box 9 (Siltavuorenpenger 1 B), FIN-00014 University of Helsinki, Helsinki, Finland

e-mail: mari.tervaniemi@helsinki.fi

Musical expertise modulates preattentive neural sound discrimination. However, this evidence originates to a great extent from paradigms using very simple stimulation. Here we use a novel melody paradigm (revealing the auditory profile for six sound parameters in parallel) to compare memory-related mismatch negativity (MMN) and attention-related P3a responses recorded from non-musicians and Finnish Folk musicians. MMN emerged in both groups of participants for all sound changes (except for rhythmic changes in non-musicians). In Folk musicians, the MMN was enlarged for mistuned sounds when compared with non-musicians. This is taken to reflect their familiarity with pitch information, which holds a key position in Finnish folk music when compared with, e.g., rhythmic information. The MMN was followed by P3a after timbre changes, rhythm changes, and melody transposition. The MMN and P3a topographies differentiated the groups for all sound changes. Thus, the melody paradigm offers a fast and cost-effective means for determining the auditory profile for music-sound encoding and also, importantly, for probing the effects of musical expertise on it.

**Keywords: musical expertise, auditory event-related potentials (ERPs), mismatch negativity (MMN), P3a, learning, memory, involuntary attention**

#### **INTRODUCTION**

Intensive music making modulates brain function and structure quite dramatically. By comparing adult individuals with training in music with those individuals who lack personal experience in music making, various functional and anatomical differences have been observed (for reviews, see Jäncke, 2009; Tervaniemi, 2009, 2012; Pantev and Herholz, 2011). In the auditory modality, these findings indicate enhanced functions in cortical (e.g., Koelsch et al., 1999; Pantev et al., 2001; Fujioka et al., 2004; Tervaniemi et al., 2005; Kühnis et al., 2013) and subcortical (Musacchia et al., 2007; Wong et al., 2007; Lee et al., 2009; Strait et al., 2009) brain regions.

In the current context, several relevant findings have been obtained by investigating how brain function can be adaptively adjusted to perceive and encode musical information even when the sounds are not in the focus of the participants' selective attention. Such a setting is particularly well suited for studies on musicians since differences in the motivational background or attentional abilities of the participants are less likely to contaminate the observations (Kujala et al., 2007). More specifically, by recording the mismatch negativity (MMN) component of the auditory event-related potentials (ERPs), we can determine how accurately the auditory cortex maps the regularities of the sound stream. The MMN is an index of the acoustical or cognitive incongruity between various relatively constant sound events (Näätänen et al., 2001). If the incongruity between the most plausible sound and the encountered one is large, then the MMN can be followed by a P3a response (Escera et al., 2000; Friedman et al., 2001). Traditionally, the P3a is assumed to reflect an involuntary attention shift towards the sound input. More recently, it has been suggested to reflect a multi-stage process of sound evaluation which, in turn, leads to an attention shift (Schröger et al., 2013).

The MMN was first observed and extensively investigated in an oddball paradigm, consisting of two sounds with different acoustical features (Näätänen, 1992). In other words, within a sequence of frequent standard sounds (e.g., probability of occurrence 0.9), infrequent deviant sounds are presented (e.g., with a probability of 0.1). They differ from the standard sounds, for instance, in their frequency or duration. More recently, however, researchers have made special efforts to render the stimulus paradigms more ecologically valid and to avoid artificially simplified paradigms. A more ecologically valid stimulation paradigm in ERP research can, for instance, consist of short transposed melodies with different contour or interval structure (Tervaniemi et al., 2001; Fujioka et al., 2004; Seppänen et al., 2007), of transposed chords differing in their mode (major/minor) (Virtala et al., 2011, 2012, 2013), or of melodies composed for the experiment (Brattico et al., 2006).

However, oddball paradigms are quite time-consuming. For instance, recording brain responses to out-of-key and out-of-scale stimuli required an EEG session of almost 1 h (Brattico et al., 2006). Most recently, MMN has been studied using a so-called multi-feature paradigm in which deviant sounds with different acoustic deviances alternate with standard sounds, allowing several MMNs to be recorded in as little as 15 min (Näätänen et al., 2004; Pakarinen et al., 2007; Partanen et al., 2011; Vuust et al., 2011; Torppa et al., 2012). In that paradigm, each standard sound is followed by a different deviant, the sound sequence being, for instance, Standard, Deviant-Frequency, Standard, Deviant-Intensity, Standard, Deviant-Duration, etc. These sound sequences can consist of sinusoidal sound complexes (Näätänen et al., 2004; Pakarinen et al., 2007), phonemes (Partanen et al., 2011), or alternating piano tones (Vuust et al., 2011; Torppa et al., 2012).
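The contrast between the two designs can be made concrete with a small sketch. The Python below is illustrative only: the function names, the deviant labels, and the shuffling of deviant order within each cycle are assumptions made for the example, not the stimulus code of the cited studies.

```python
import random

def oddball_sequence(n_trials, p_deviant=0.1, seed=0):
    """Classic oddball: each sound is a deviant with probability p_deviant."""
    rng = random.Random(seed)
    return ["deviant" if rng.random() < p_deviant else "standard"
            for _ in range(n_trials)]

def multi_feature_sequence(deviant_types, n_cycles, seed=0):
    """Multi-feature: every other sound is a standard, and each standard is
    followed by one deviant. Deviant order is shuffled within each cycle
    (an assumption of this sketch)."""
    rng = random.Random(seed)
    seq = []
    for _ in range(n_cycles):
        order = list(deviant_types)
        rng.shuffle(order)
        for dev in order:
            seq.extend(["standard", dev])
    return seq

deviants = ["frequency", "intensity", "duration"]
seq = multi_feature_sequence(deviants, n_cycles=4)
odd = oddball_sequence(1000)
```

In the multi-feature sequence half of all sounds are deviants, so with three deviant types each type makes up one sixth of the sounds, compared with one tenth for the single deviant of the oddball example. This is why several MMNs can be collected in a much shorter session.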

During the past 15 years, the MMN and P3a responses have been found to differentiate musicians and non-musicians in various acoustical and musical features such as pitch (Koelsch et al., 1999; Brattico et al., 2001, 2009; Fujioka et al., 2004; Marie et al., 2012), spatial sound origin (Nager et al., 2003; Tervaniemi et al., 2006), sound omission (Rüsseler et al., 2001), speech sound features (Kühnis et al., 2013), rhythm (Vuust et al., 2005), and grouping (van Zuijen et al., 2004, 2005). In all these cases, the MMN and/or P3a have been either larger and/or faster in musicians than in non-musicians, this being considered as an index of facilitated neural processing in musicians.

However, the majority of the studies on musical expertise are suboptimal for two reasons. First, in these studies the sound stimulation has lacked ecological validity. To have full control over the sounds and their time of occurrence, they have been presented one by one or as brief sound patterns. Spectrally, they have also been relatively simple (e.g., sinusoidal tones or chords composed of sinusoidal tones). Even the two multi-feature paradigms utilized to study music-related brain functions have not improved much in terms of ecological validity, owing to the repetitive character of their sound sequences. Thus, compared to real music with its multitude of overlapping and temporally successive sounds, the sound environment in previous ERP experiments on music-related brain functions has been oversimplified. Furthermore, the experiments have also been quite long and thus quite demanding for some groups of participants, particularly children.

Second, in the vast majority of the studies in the field of musical expertise, the participating musicians have had training in Western classical music (for exceptions, see Vuust et al., 2005, 2012; Tervaniemi et al., 2006; Brattico et al., 2013). Investigating the brain basis of musical expertise in classical musicians was an excellent starting point for the pioneering studies. Musicians with a classical orientation have a relatively well documented training history (e.g., number of hours spent practicing) and most of them also started playing quite early (e.g., before the age of 7). Thus it has been feasible and fruitful to start tracing the neuroplastic effects of musical expertise from classically trained musicians. One should note, however, that in our current society, not all musical activities occur in the formal training contexts of classical music. Nor do all musicians have a musical history dating back to their childhood. For instance, many teenagers all over the world play in bands, some of them even going on to earn their living as musicians or as other professionals in the music industry, without any formal training.

In the current paper, we wish to meet these two challenges at once. Here we use a novel melodic multi-feature paradigm, introduced by Putkinen et al. (2014). It is fast in practice and has better ecological validity than previous MMN paradigms (see below). Furthermore, because of the diversity of the current musical scene, we broaden the selection of the musicians under neuroscientific investigation. Of specific interest to us was to investigate folk musicians in Finland who, in most cases, have their first training in Western classical music but who later turn to highly original forms of music making. In most cases, the musical score is little used, so that performance imposes a different cognitive load within the auditory modality. Additionally, Finnish folk musicians usually perform with many instruments, either within one instrumental family (e.g., different types of flutes) or from several families (e.g., flutes, percussion, string instruments with and/or without bowing, with divergent tuning systems). Thus, from the viewpoints of music perception, cognition, and performance, the demands set by their music are rather different from those of the classical orientation. As the first evidence about their advanced neurocognitive processes, we recently found that the early right anterior negativity (ERAN; triggered by chord incongruities among Western cadences) (Koelsch, 2009) is both quantitatively and qualitatively modulated in Finnish folk musicians compared to previous data from musicians and non-musicians (Brattico et al., 2013). More specifically, ERAN was generally larger in amplitude in folk musicians compared with non-musicians. Importantly, folk musicians showed a subsequent P3a to the strongly-violating ending chord, and their late ERAN to mildly-violating ending tones was more prominent than in non-musicians.

Here, we investigated the auditory profile of Finnish Folk musicians using a novel melodic multi-feature paradigm lasting less than 20 min (see **Figure 1**). To upgrade the traditions of the ERP paradigms, it did not have any pauses even between subsequent melody presentations but instead the melody was presented in a looped manner. As introduced below, it had several kinds of deviant sounds in terms of pitch, timbre, harmony, and timing, ranging from low-level changes not affecting the melody content to high-level changes modulating the course of the successive melody. It was also transposed to 12 keys to prevent fatigue, as well as to increase sound variation and hence ecological validity.

#### **METHODS**

The experiment was approved by the Ethical committee of the Faculty of Behavioural Sciences, University of Helsinki. All participants signed an informed consent form and the study was conducted according to the principles of the Declaration of Helsinki.

#### **STIMULI**

The stimuli consisted of brief melodies composed by the second author. They are described in detail in Putkinen et al. (2014) and briefly introduced in the following.

The sound parameters were selected on the basis of careful listening of different combinations of deviances followed by pilot EEG recordings. In the melodies, as the standard timbre, digital piano tones (McGill University Master Samples) were used to form a short melody pattern that was in accordance with Western tonal rules and was recursively repeated. Short melodies always started with a triad (300 ms), which was followed by four tones of varying length and an ending tone. There was a 50-ms gap between successive tones. The ending tone was 575 ms in duration. There was also a 125-ms gap between each melody. So, one melody lasted for 2100 ms. The melody was presented in total for 15 min, in a looped manner.
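The timing figures above can be cross-checked with simple arithmetic. The sketch below is an illustration under stated assumptions: it assumes five 50-ms gaps between the six sound events of a melody, and it shows both readings of the 2100-ms figure (with or without the 125-ms between-melody gap), because the text does not say which one applies. The variable names are hypothetical.

```python
# Durations stated in the text, in milliseconds.
TRIAD_MS = 300
ENDING_TONE_MS = 575
TONE_GAP_MS = 50
MELODY_GAP_MS = 125
MELODY_MS = 2100
TOTAL_MS = 15 * 60 * 1000

# Assumption: six sound events (triad, four tones, ending tone),
# hence five within-melody gaps.
N_EVENTS = 6
n_tone_gaps = N_EVENTS - 1

fixed_ms = TRIAD_MS + ENDING_TONE_MS + n_tone_gaps * TONE_GAP_MS

# Time left for the four middle tones under the two readings of "2100 ms".
middle_if_gap_included = MELODY_MS - fixed_ms - MELODY_GAP_MS
middle_if_gap_excluded = MELODY_MS - fixed_ms

# Complete melody presentations in 15 min if the loop period is 2100 ms.
n_presentations = TOTAL_MS // MELODY_MS

print(fixed_ms, middle_if_gap_included, middle_if_gap_excluded, n_presentations)
```

Under these assumptions the four tones of varying length share 850 ms or 975 ms in total, and a 15-min recording holds 428 complete melody presentations.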

Six different deviant events (changes) were included in the melodies. They are divided into low-level changes which do not change the melody and into high-level changes which alter the melodic contour. One melody could contain several changes. For illustration, see **Figure 1**.

Low-level changes


High-level changes


All high-level changes became the repeated form of the melody in the subsequent presentations in the so-called roving-standard fashion (Cowan et al., 1993). Thus, for example, after a modulation of the melody, the following melodies were repeated in the modulated form. In addition, all high-level changes were musically plausible, i.e., the resulting melody was in key and consonant, very similar to the other variants of the melody. Correspondingly, the rhythm modulations resulted in maintaining the beat of the repeated melodies, and the melody modulations were from 1 to 3 semitones such that they resulted in the new tones belonging to the original key.

#### **PARTICIPANTS**

In total, there were 28 healthy adult participants involved in EEG recordings. Two of them were excluded from the EEG analyses, one based on noisy EEG and another one based on unspecific musical profile (this participant had some training in folk music but not advanced enough to justify being a member of that group, see below).

Of the remaining 26 participants, 13 were active in learning and performing Finnish folk music. Their mean age was 26.7 yrs (range: 18–31 yrs, SD = 3.6), and 10 of them were female. They were either actively performing artists or students of the Sibelius Academy (Finnish university for music performance). They had started to play, on average, at the age of 7.8 years (range: 4–25 yrs, SD = 5.8 yrs). Their instrumental choices are listed in **Table 1**. Six of them named violin as the major instrument. The rest listed *kantele* (2; Finnish traditional instrument resembling Celtic harp), vocals (2), accordion, guitar, and wind instruments as their major instrument. In total, they named 21 instruments as their minor instruments. In this paper, these participants will be called *Folk musicians*. Of the other 13 participants, most had not been involved in music lessons outside school at all (*N* = 10) while some had taken part in extracurricular music activities for 1–2 years before puberty (*N* = 3). Their mean age was 25.1 yrs (range: 20–35 yrs, SD = 4.1), and 11 of them were female. In this paper, they will be called *Non-musicians*.

#### **EXPERIMENT**

#### **Procedure**

During the EEG recordings, the participants were sitting in a dimly lit EEG chamber in a comfortable chair. They were instructed to watch a silent nature documentary while stimuli were presented via headphones. The EEG recording was preceded by a 10 min session during which the participants were asked to listen to three self-selected music samples. These data will be reported elsewhere.

#### **EEG recordings**

The recordings were conducted in an acoustically and electromagnetically shielded room (Euroshield Ltd., Finland) of the Institute of Behavioural Sciences, University of Helsinki.

The EEG was recorded with a BioSemi system at a sampling rate of 4096 Hz using a 64-electrode EEG cap and six additional facial silver-chloride (Ag/AgCl) electrodes. These were placed on the mastoids, on the tip of the nose, under the right eye (for EOG monitoring), and on two EMG-related (electromyography) sites on the left cheek and over the left eyebrow. The mean value of the mastoid electrodes was used as a reference during the offline analyses. The EOG electrode was used to extract eye-blink artifacts from the data.

Hearing thresholds were determined by averaging five tests of just-audible sounds. A volume level of 60 dB HL over the individually determined hearing threshold was used. The sounds were presented via headphones.

#### **DATA ANALYSIS**

#### **Table 1 | Instrumental background of the Folk musicians.**

In the first step of preprocessing, data were referenced to the nose electrode, resampled to 256 Hz and, due to the fast stimulation rate, filtered with a 1-Hz high-pass cut-off. After this, data were divided into epochs from −100 ms prestimulus to 600 ms poststimulus and individual data blocks were merged together as one dataset. As recommended by Onton and Makeig (2006), before independent component analysis (ICA), "nonstereotyped" or "paroxysmal" noise, associated with non-fixed (random) scalp projections, was removed by rejecting epochs based on voltage amplitudes. This voltage rejection was done with a threshold of ±300 µV in most participants, of ±340 µV in four participants and of ±370 µV in one participant, depending on the quality of the data in the different experimental sessions. This initial cleaning of the largest artifacts was conducted to improve the performance of the ICA decomposition, which is known to be optimal for separating only certain kinds of artifacts, particularly those associated with fixed scalp projections. The final rejection of extra-encephalic artifacts was hence achieved with ICA, which was conducted with the *runica* algorithm of the EEGLab software. The independent components (ICs) originating from eye blinks, ocular movements and other muscle artifacts were removed manually from the data of each individual subject. ICs were identified based on their topographical location, frequency power spectrum, and temporal shape. After removing the non-encephalic ICs, 25-Hz low-pass filtering was done and ±100 µV voltage rejection was applied to the epochs to remove artifacts from ±100 to ±300 µV remaining in the data. Finally, all the qualified data of each single subject were separately averaged according to the sound type (standard and the several different deviance types).
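The two voltage-rejection passes described above can be illustrated with a minimal sketch. It is not the EEGLab procedure used in the study: the function name and the toy data are hypothetical, the ICA and filtering steps that sit between the two passes are omitted, and the sketch assumes that an epoch is rejected when the absolute amplitude on any channel exceeds the threshold.

```python
def reject_epochs(epochs, threshold_uv):
    """Keep epochs whose absolute amplitude never exceeds threshold_uv.

    epochs: list of epochs; each epoch is a list of channels; each channel
    is a list of samples in microvolts.
    """
    kept = []
    for epoch in epochs:
        peak = max(abs(v) for channel in epoch for v in channel)
        if peak <= threshold_uv:
            kept.append(epoch)
    return kept

# Toy data: three epochs, two channels, four samples each (microvolts).
epochs = [
    [[5.0, -12.0, 30.0, 8.0], [2.0, 4.0, -6.0, 1.0]],    # clean
    [[10.0, 350.0, 20.0, 0.0], [3.0, -2.0, 5.0, 7.0]],   # large artifact
    [[15.0, -150.0, 40.0, 9.0], [1.0, 0.0, -3.0, 2.0]],  # moderate artifact
]

coarse = reject_epochs(epochs, threshold_uv=300.0)  # pass before ICA
fine = reject_epochs(coarse, threshold_uv=100.0)    # pass after ICA and filtering
```

The coarse pass removes only the largest artifacts so that ICA receives cleaner input, and the stricter pass afterwards removes whatever remains between ±100 and ±300 µV.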

The ERP amplitudes were quantified by first determining the peak latencies from the grand-average difference waves separately for each sound change (deviant) as the largest peak between 100 and 300 ms at Fz for MMN and between 300 and 400 ms for P3a. After the peak definition, the amplitude values of the individual difference waveforms were integrated over a 40-ms window centered at the ERP peak.
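A minimal sketch of this two-step quantification follows. The helper names and the triangular toy waveforms are hypothetical, the 10-ms sample spacing is chosen for readability rather than matching the 256-Hz data, and "integrated" is read here as the mean amplitude within the window, which is an assumption.

```python
def peak_latency(times_ms, wave, t_min, t_max, polarity):
    """Latency of the largest peak of the given polarity in [t_min, t_max].

    polarity is -1 for a negative component (MMN) and +1 for a positive
    one (P3a).
    """
    candidates = [(polarity * v, t) for t, v in zip(times_ms, wave)
                  if t_min <= t <= t_max]
    return max(candidates)[1]

def window_mean(times_ms, wave, center_ms, width_ms=40):
    """Mean amplitude in a window of width_ms centered on center_ms."""
    half = width_ms / 2
    vals = [v for t, v in zip(times_ms, wave) if abs(t - center_ms) <= half]
    return sum(vals) / len(vals)

times = list(range(-100, 601, 10))  # epoch from -100 to 600 ms

def toy_wave(mmn_amp, p3a_amp):
    """Triangular deflections at 200 ms (MMN) and 350 ms (P3a)."""
    return [mmn_amp * max(0.0, 1 - abs(t - 200) / 50)
            + p3a_amp * max(0.0, 1 - abs(t - 350) / 50) for t in times]

grand_avg = toy_wave(-2.0, 1.5)  # grand-average difference wave at Fz
subject = toy_wave(-3.0, 1.0)    # one participant's difference wave

# Step 1: latencies come from the grand average.
mmn_lat = peak_latency(times, grand_avg, 100, 300, polarity=-1)
p3a_lat = peak_latency(times, grand_avg, 300, 400, polarity=+1)

# Step 2: amplitudes come from the individual wave around those latencies.
mmn_amp = window_mean(times, subject, mmn_lat)
p3a_amp = window_mean(times, subject, p3a_lat)
```

Taking the latency from the grand average and the amplitude from each individual waveform gives every participant the same measurement window, which avoids picking noise peaks in single-subject data.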

#### **Statistical analysis**

The group differences in the brain responses were tested with separate three-way mixed-model ANOVAs with MMN/P3a amplitudes as the dependent variables. Group (Folk musicians/Non-musicians) was used as a between-subject factor and Laterality (left/midline/right) and Frontality (frontal/central/posterior) as within-subject factors. Factor Frontality consisted of the amplitude values obtained at three lines of electrodes: Frontal (F3, Fz, F4), Central (C3, Cz, C4), and Posterior (P3, Pz, P4). Factor Laterality consisted of the amplitude values obtained at the Left (F3, C3, P3), Midline (Fz, Cz, Pz), and Right (F4, C4, P4) electrodes. Subsequently, paired post hoc comparisons were conducted using the Least Significant Difference test.
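The factorial layout can be sketched as a mapping from the nine electrodes to the two within-subject factors, together with the degrees of freedom that follow from the design. The code is illustrative: the function name, the row format, and the toy amplitudes are assumptions, and no particular statistics package is implied.

```python
FRONTALITY = {
    "frontal": ["F3", "Fz", "F4"],
    "central": ["C3", "Cz", "C4"],
    "posterior": ["P3", "Pz", "P4"],
}
LATERALITY = {
    "left": ["F3", "C3", "P3"],
    "midline": ["Fz", "Cz", "Pz"],
    "right": ["F4", "C4", "P4"],
}

def long_format(subject, group, amplitudes):
    """One row per Frontality x Laterality cell for a single participant."""
    rows = []
    for front, f_elecs in FRONTALITY.items():
        for lat, l_elecs in LATERALITY.items():
            # Each cell of the 3 x 3 grid corresponds to exactly one electrode.
            (electrode,) = set(f_elecs) & set(l_elecs)
            rows.append({"subject": subject, "group": group,
                         "frontality": front, "laterality": lat,
                         "electrode": electrode,
                         "amplitude": amplitudes[electrode]})
    return rows

# Toy amplitudes (microvolts) for one hypothetical participant.
toy = {e: -1.0 for elecs in FRONTALITY.values() for e in elecs}
rows = long_format("s01", "Folk", toy)

# Degrees of freedom implied by 26 participants in 2 groups and
# two 3-level within-subject factors.
n_subjects, n_groups, n_levels = 26, 2, 3
df_group = (n_groups - 1, n_subjects - n_groups)
df_within = (n_levels - 1, (n_levels - 1) * (n_subjects - n_groups))
df_interaction = ((n_levels - 1) ** 2,
                  (n_levels - 1) ** 2 * (n_subjects - n_groups))
```

The resulting degrees of freedom, (1, 24), (2, 48), and (4, 96), match those reported with the *F* values in the Results.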

#### **RESULTS**

Using a novel melodic multi-feature paradigm with six sound changes (deviants) embedded within it, we show here that, despite the complexity of the stimulation, both Folk musicians and Non-musicians were able to neurally discriminate the changes from the regular melody continuation. This was evidenced by the presence of the MMN in both groups of participants to all deviants except the rhythmic modulation in Non-musicians (see **Table 2**).

#### **LOW-LEVEL CHANGES**

#### **Mistuning**

As indicated in **Figure 2**, MMN to mistuning was significantly larger in Folk musicians than in Non-musicians (Group: *F*(1,24) = 17.3, *p* < 0.0001). Moreover, it was frontally maximal and largest at the midline (Frontality: *F*(2,48) = 24.1, *p* < 0.0001; Frontal MMN > Posterior MMN, *p* < 0.0001; Laterality: *F*(2,48) = 5.1, *p* < 0.01; Midline MMN > Left MMN, *p* < 0.009, and Frontality × Laterality: *F*(4,96) = 4.0, *p* = 0.01, = 0.7). Additionally, its distribution was modulated by musical expertise (Frontality × Group: *F*(2,48) = 11.5, *p* < 0.0001), MMN being enhanced at the frontocentral electrodes in Folk musicians (*p* < 0.0001).

**Table 2 | Two-tailed t-tests for verifying the significance of the MMN amplitudes against the zero baseline**.

For timing delay two tests were conducted since there were two distinct peaks observable.

but they did not modulate the continuation of the melody.

Folk musicians were highly sensitive towards mistuning as indicated by a significantly larger MMN when compared to that of Non-musicians. Additionally, about half of the deviants initiated further evaluation and attentional processing in the participants, as evidenced by the P3a elicitation (**Table 3**). These and other results are further specified below, first for low-level changes and thereafter for high-level changes.

*T*-tests were not conducted for mistuning and rhythm shortening since no P3a peaks were visible from the grand-average curves.

#### **Timbre**

MMN to timbre change was maximal at mid-central electrodes (Frontality; *F*(2,48) = 16.7, *p* < 0.0001, = 0.6; central MMN > frontal and posterior MMN, *p* < 0.006; Laterality: *F*(2,48) = 9.1, *p* < 0.001; Midline MMN > Left and Right MMN, *p* < 0.004). Additionally, its distribution was modulated by the musical expertise (Frontality × Laterality × Group, *F*(4,96) = 5.2, *p* = 0.001). According to *post-hoc* tests, in Folk musicians, the MMN over the left hemisphere and midline electrodes was larger at frontal and central electrodes than at posterior ones (*p* < 0.04). Over their right hemisphere, the MMN was larger at central than at posterior electrodes (*p* = 0.001). In contrast, in Non-musicians, the left central MMN was larger than posterior MMN (*p* = 0.03), midcentral MMN was larger than the frontal and posterior MMN (*p* < 0.02 in both comparisons), and the right central MMN was larger than the frontal MMN which, in turn, was larger than the posterior MMN (*p* < 0.02 in all comparisons).

P3a to timbre change was frontally maximal and largest at the midline (Frontality: *F*(2,48) = 19.9, *p* < 0.0001; frontal and central P3a being larger than posterior P3a, *p* < 0.002 for all; Laterality: *F*(2,48) = 17.5, *p* < 0.0001; midline P3a being larger than both left P3a and right P3a, *p* < 0.0001; Laterality × Frontality: *F*(4,96) = 6.7, *p* < 0.0001). However, musical expertise did not significantly modulate the P3a amplitude or its distribution above the scalp.

#### **Timing delay**

Timing delay elicited two subsequent MMN responses. The first one was frontocentrally maximal (Frontality; *F*(2,48) = 9.0, *p* = 0.004, = 0.6, frontal and central MMN larger than posterior MMN, *p* < 0.009 for all). Additionally, its distribution was modulated by the musical expertise (Laterality × Group, *F*(2,48) = 6.5, *p* = 0.003; Frontality × Laterality × Group, *F*(4,96) = 4.5, *p* = 0.002), with a tendency for a larger MMN in Folk musicians over Non-musicians at the left posterior electrode site (*p* = 0.08). Moreover, in Folk musicians the MMN over the left hemisphere was largest at frontal and central electrodes whereas it was minimal at the posterior site (*p* = 0.006). At midline electrodes their MMN was largest frontally and smallest posteriorly (*p* < 0.02). Over the right hemisphere there were no indications of frontality (*p* > 0.05). In Non-musicians the scalp distribution followed a different pattern: they also showed the MMN over the left hemisphere which was largest frontally and smallest posteriorly (*p* < 0.03). Over the midline electrodes there were no indications of frontality and, finally, over their right hemisphere the MMN was largest centrally, smaller at the frontal electrodes and smallest at the posterior electrodes (*p* < 0.05).

The second one was maximal at central electrodes (Frontality: *F*(2,48) = 4.5, *p* = 0.04, = 0.7; central MMN being larger than the frontal MMN and posterior MMN, *p* = 0.001 for all; Laterality: *F*(2,48) = 4.7, *p* = 0.01; midline MMN being larger than the left MMN, *p* = 0.005). Its distribution was modulated by musical expertise (Frontality × Laterality × Group: *F*(4,96) = 3.3, *p* = 0.01), resulting from the varying pattern of frontality between Folk musicians and Non-musicians. Folk musicians had centrally and posteriorly larger MMN than their frontal MMN at left and midline sites (*p* < 0.05 for all). Moreover, their MMN over the right hemisphere had no indication of frontality (*p* > 0.05). Non-musicians showed a broadly distributed MMN at left sites and a central MMN which was larger than their frontal MMN at midline sites (*p* = 0.01) and a central MMN which was larger than their posterior MMN at right sites (*p* = 0.003).

The MMN to timing delay was followed by a P3a that was maximal at central electrodes (Frontality: *F*(2,48) = 22.3, *p* < 0.0001; central P3a was larger than frontal P3a which, in turn, was larger than Posterior P3a, *p* < 0.02 for all; Laterality: *F*(2,48) = 22.4, *p* < 0.0001; midline P3a being larger than left and right P3a, *p* < 0.0001). Additionally, its scalp distribution was modulated by musical expertise (Frontality × Laterality × Group: *F*(4,96) = 4.1, *p* = 0.004), due to the larger P3a in Non-musicians compared to Folk musicians over the right frontal region (*p* = 0.01) and left posterior region (*p* < 0.05).

#### **HIGH-LEVEL CHANGES**

#### **Melody modulation**

MMN to melody modulation was maximal at mid-central electrodes (see **Figure 3**; Frontality: *F*(2,48) = 21.1, *p* < 0.0001; frontal and central MMN being larger than posterior MMN, *p* < 0.001; Laterality: *F*(2,48) = 16.4, *p* < 0.0001; midline MMN being larger than left and right MMN, *p* < 0.0001). Additionally, its distribution was modulated by musical expertise (Laterality × Group: *F*(2,48) = 3.9, *p* = 0.03). In Folk musicians, MMN amplitudes did not differ between the left and right electrode sites but were largest at the midline (*p* < 0.05). In Non-musicians, the MMN was larger at the left electrodes than at the right ones (*p* = 0.02). Additionally, the left and right MMNs were smaller than the MMN at the midline (*p* < 0.006 for both comparisons). An interaction between Laterality and Frontality was also observed (*F*(4,96) = 5.0, *p* = 0.001), deriving from larger MMN at the frontal and central electrodes than at the posterior and lateral electrodes (*p* < 0.01 in all comparisons).

#### **Rhythm modulation: shortening of a long tone**

MMN to rhythm modulation of tone shortening was maximal at frontocentral midline electrodes (Frontality: *F*(2,48) = 7.5, *p* = 0.007, ε = 0.6; frontal and central MMN being larger than posterior MMN, *p* < 0.02; Laterality: *F*(2,48) = 6.4, *p* = 0.004; midline MMN larger than left and right MMN, *p* < 0.002).

#### **Rhythm modulation: lengthening of a short tone**

MMN to rhythm modulation of tone lengthening was frontocentrally maximal (Frontality: *F*(2,48) = 4.0, *p* = 0.04, ε = 0.8; frontal and central MMN larger than posterior MMN) and modulated by musical expertise (Frontality × Group: *F*(2,48) = 7.4, *p* = 0.002). This resulted from larger MMN above the frontal and central electrodes (vs. posterior ones) in Folk musicians only (*p* < 0.002 for both) and a small, uniformly distributed MMN in Non-musicians (*p* > 0.05 in all comparisons).

#### **Transposition**

In both groups, MMN to transposition was maximal at the frontal electrodes (Frontality: *F*(2,48) = 12.1, *p* < 0.0001, ε = 0.7; frontal and central MMN being larger than posterior MMN, *p* < 0.001). P3a to transposition was maximal at frontocentral midline electrodes (Frontality: *F*(2,48) = 25.6, *p* < 0.0001, ε = 0.8; frontal and central P3a larger than posterior P3a, *p* < 0.0001; Laterality: *F*(2,48) = 3.4, *p* = 0.04; midline P3a larger than left and right P3a, *p* < 0.03). It was modulated by musical expertise (Frontality × Group: *F*(2,48) = 7.5, *p* = 0.001; Frontality × Laterality × Group: *F*(4,96) = 3.0, *p* = 0.02): in *post-hoc* analyses, Folk musicians showed a larger P3a overall at frontal electrodes (*p* = 0.008) and at the left central electrode (*p* = 0.03) when compared with Non-musicians.

#### **DISCUSSION**

In the present study, we used the novel melodic multi-feature paradigm to investigate the brain basis of early musical sound encoding using the MMN and P3a components of the auditory ERPs. We composed a short melody with several changes embedded in it (**Figure 1**). "Low-level" changes in timbre, tuning, and timing interspersed the regular melody content, but were not included in the continued melody. In contrast, "high-level" changes in melody contour, rhythm, and key modified the continuation of the melody. The melody was presented in a looped fashion without any pauses between the subsequent melodies. Half of our participants were musicians with professional-level expertise in current Finnish folk music. According to the present results, training in folk music is especially reflected in the brain encoding of pitch, as evidenced by the larger MMN for mistuning observed in Folk musicians than in Non-musicians. Moreover, all changes evoked MMN or P3a responses with different scalp distributions in Folk musicians than in Non-musicians, implying that brain dynamics can be relatively broadly modulated by musical activities. Importantly, we also show here that this novel melodic multi-feature paradigm provides an excellent tool for revealing the auditory neurocognitive profile of musically non-trained adults.

To specify the results further, the MMN and P3a responses were enhanced or generated with slightly different brain architectures in Folk musicians and in Non-musicians. These effects of musical expertise were most obvious for pitch information, in terms of an enhanced MMN specifically to mistuning in Folk musicians. This finding coincides with and extends our previous results, which indicated that Folk musicians encode chord cadences in a unique manner (Brattico et al., 2013): in addition to a generally enhanced early right anterior negativity (ERAN) around 200 ms, Folk musicians had subsequent late ERAN and P3a responses as a function of the given inappropriate chord position. More specifically, Folk musicians showed a subsequent P3a only to the strongly violating ending chord and a late ERAN in response to the milder chord violation at the fifth position (where it violated the expectations of the listeners less). Since in Brattico et al. (2013) the ERAN was recorded in a semi-attentive paradigm in which the participants were instructed to detect timbre-deviant chords (but not inappropriate chord functions), those results leave it open as to whether the auditory sensitivity of Folk musicians exists already at the pre-attentive stage of processing. Even if the current study does not allow us to completely rule out the involvement of involuntary attentional functions either (especially for deviances in timbre, timing, and transposition, as they were followed by P3a responses), mistuned sounds in particular seem to be encoded without the triggering of involuntary attention. This further strengthens the conclusion about a parameter-specific (pitch, harmony) enhancement of sound processing in Folk musicians.

Notably, pitch is a pivotal dimension in Finnish folk music. Similarly to folk music in other parts of the world, Finnish folk music stresses the importance of improvisation and variation of a motif by adding grace notes or various kinds of auxiliary notes (Saha, 1996). These pitch-based features are partly a consequence of the independence of folk music from musical scores: most folk tunes are transmitted from generation to generation through performance practice and memory rather than written scores. Moreover, Finnish folk music is often monophonic or heterophonic, with contemporary variations of the same lines of music played according to the idiomatic characteristics associated with the performance of a particular instrument (Asplund et al., 2006). For instance, music played by the *kantele* is repeated and varied particularly in the melody and harmony dimensions (Saha, 1996). Hence, our findings of pitch-specific enhancements of ERP responses contribute to the growing literature on auditory neuroplasticity specific to the idiosyncratic practices of a musical style (see below).

Previously, parameter-specific pre-attentive enhancement of sound processing has been evidenced among classical musicians for timbre (trumpet *vs*. violin players) (Pantev et al., 2001) and spatial sound origin (conductors *vs*. pianists) (Nager et al., 2003). It was also shown for pitch (Koelsch et al., 1999) in violin players without being generalizable for other instrumentalists (Tervaniemi et al., 2005). Correspondingly, it was shown for spatial sound origin and intensity in rock musicians (Tervaniemi et al., 2006) and, among other sound features, also for rhythmic changes and pitch glides in jazz musicians (Vuust et al., 2005, 2012). Together with the current MMN data and previously introduced ERAN evidence (Brattico et al., 2013), we are tempted to conclude that the sound parameters which are of most importance in a given musical genre or which became most familiar during the training history (e.g., timbre of one's own instrument) become particularly sensitively processed during the course of the musicians' training history. This line of reasoning is supported by the current finding with regard to the rhythm MMN which, in contrast to pitch MMN, was enhanced in Folk musicians when compared with non-musicians only when tones were lengthened but not when they were shortened. In other words, rhythmic modulation was not encoded by Folk musicians in great detail. This is not a surprise when keeping in mind the special characteristics of Finnish folk music as described above. In contrast, rhythm MMN did differentiate jazz musicians and non-musicians in a pioneering MMNm study (Vuust et al., 2005). In jazz, very fine-grained modulations in the timing of the performance carry high importance in expressivity of music.

Yet, without a large cross-sectional study systematically comparing musicians from several genres (e.g., classical, jazz, rock, folk) using the same stimulation paradigm, such a conclusion should be considered only tentative. Even by such a laborious procedure, one would obtain reliable information only about the current functionality of the auditory encoding in these musicians.

It is noteworthy that even then the outcome of such a cross-sectional study conducted in adult musicians would not reveal whether the resulting auditory profile reflects the auditory encoding abilities as they were prior to commencement of play, their maturation and enhanced development due to training in music, or a combination of these. Therefore, there is an urgent need for longitudinal projects on children who are involved either in musical programs (musically active children) or in other training programs of comparable intensity and content but without a musical aspect (control children). Only those can reveal the relationship between the original auditory encoding accuracy and its further development, for instance, in terms of sound parameters that are unequally important in musical genres and instrument families. A first study along these lines was recently conducted by Putkinen et al. (2014). According to those results, children actively training to play classical music displayed larger MMNs than control children for the melody modulations by the age of 13 and for the rhythm modulations, timbre deviants and slightly mistuned tones already at the age of 11. At the onset of the study, when the children were 9 years old, there were no group differences. Thus, the current paradigm is also sensitive to the development of neural sound discrimination during musical training.

Methodologically, the current paradigm provides a novel means for determining the integrity of auditory neurocognition simultaneously for six acoustical and musical parameters. It thus updates the tradition initiated by the multi-feature paradigm, which used isolated sinusoidal or speech-sound stimulation (Näätänen et al., 2004; Pakarinen et al., 2007; Partanen et al., 2011). Recently, the multi-feature paradigm was also successfully extended into a fast musical MMN paradigm using instrumental sounds in a harmonic accompaniment setting (Vuust et al., 2011, 2012). The current paradigm is an upgrade of that since it includes both low- and high-level deviances, which either preserve or modify the melody. In parallel, one should not ignore previous initiatives to use more ecologically valid music-like stimuli in MMN studies (see, e.g., Tervaniemi et al., 2001; Fujioka et al., 2004, 2005; van Zuijen et al., 2004; Brattico et al., 2006; Virtala et al., 2012, 2013). Even if those included a more limited number of deviances than the more recent multi-feature paradigms, they were able to reveal novel information regarding pre-attentive encoding of musically relevant sound attributes both in musically trained and in non-trained groups of participants.

Promisingly, we show here that despite the complexity of the melodic multi-feature paradigm, the MMN could also be observed at the group level in non-musicians for all deviances except rhythm modulation. Moreover, the paradigm has also been used successfully in child recordings (Putkinen et al., 2014). Thus, in the future, auditory profiles of child and adult participants can be probed during the whole life span with various multi-feature paradigms using sinusoidal, linguistic, and musical stimulation entities approaching natural sound environments as present in everyday life.

#### **ACKNOWLEDGMENTS**

We thank Ms. Tiina Tupala, M.A. (Psych) for the data acquisition and Mr. Hannu Loimo, M.Sc. (Tech.) for help with the data analyses. Technical assistance provided by Mr. Tommi Makkonen, M.Sc. (Tech.), is also warmly acknowledged.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 03 April 2014; paper pending published: 09 May 2014; accepted: 18 June 2014; published online: 07 July 2014*.

*Citation: Tervaniemi M, Huotilainen M and Brattico E (2014) Melodic multi-feature paradigm reveals auditory profiles in music-sound encoding. Front. Hum. Neurosci. 8:496. doi: 10.3389/fnhum.2014.00496*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Tervaniemi, Huotilainen and Brattico. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Music practice is associated with development of working memory during childhood and adolescence

#### *Sissela Bergman Nutley\*, Fahimeh Darki and Torkel Klingberg*

*Neuroscience Department, Developmental Cognitive Neuroscience, Karolinska Institutet, Stockholm, Sweden*

#### *Edited by:*

*Merim Bilalic, University Tübingen – University Clinic, Germany*

#### *Reviewed by:*

*E. Glenn Schellenberg, University of Toronto, Canada*

*Hervé Platel, Université de Caen – Inserm U1077, France*

#### *\*Correspondence:*

*Sissela Bergman Nutley, Neuroscience Department, Developmental Cognitive Neuroscience, Karolinska Institutet, SE-171 77 Stockholm, Sweden e-mail: sisselanutley@gmail.com*

Practicing a musical instrument is associated with cognitive benefits and structural brain changes in correlational and interventional trials; however, the effect of musical training on cognition during childhood is still unclear. In this longitudinal study of child development we analyzed the association between musical practice and performance on reasoning, processing speed and working memory (WM) during development. Subjects (*n* = 352) between the ages of 6 and 25 years participated in neuropsychological assessments and neuroimaging investigations (*n* = 64) on two or three occasions, 2 years apart. Mixed model regression showed that musical practice had an overall positive association with WM capacity (visuo-spatial WM, *F* = 4.59, *p* = 0.033; verbal WM, *F* = 9.69, *p* = 0.002), processing speed (*F* = 4.91, *p* = 0.027) and reasoning (Raven's progressive matrices, *F* = 28.34, *p* < 0.001) across all three time points, after correcting for the effect of parental education and other after-school activities. Music players also had larger gray matter volume in the temporo-occipital and insular cortex (*p* = 0.008), areas previously reported to be related to musical notation reading. The change in WM between the time points was proportional to the weekly hours spent on music practice for both WM tests (VSWM, β = 0.351, *p* = 0.003; verbal WM, β = 0.261, *p* = 0.006) but this was not significant for reasoning ability (β = 0.021, *p* = 0.090). These effects remained when controlling for parental education and other after-school activities. In conclusion, these results indicate that music practice positively affects WM development and support the importance of practice for the development of WM during childhood and adolescence.

**Keywords: musical practice, working memory, reasoning, cognitive development, gray matter volume**

#### **INTRODUCTION**

Previous research on practice on specific skills demonstrates domain specific expertise, whether it be within the field of chess, memorizing numbers, or dance, with little or no transfer evident of this superiority to other tasks (Ericsson and Lehmann, 1996). This is explained by the use of material specific strategies, schemas, and automatization of the procedures being performed (Ericsson et al., 2006). However, the development of such strategies requires abilities considered to be executive functions such as working memory (WM), updating, monitoring of performance, etc. In the emerging field of cognitive training, targeted interventions aim to train these types of cognitive abilities, with WM being the most studied to date. The most effective of these paradigms often shows large effects on the trained ability and moderate effects on closely related abilities after as little as 5 weeks of computerized training (for a review, see Klingberg, 2010). This is of importance for theoretical and clinical reasons where a particular ability is deficient. A different way of achieving cognitive enhancement may be to target the entire bodily system through physical training where such effects have been observed (Hillman et al., 2008; Hotting and Roder, 2013). Yet another way to achieve this may be to regularly engage in a complex activity that requires one to use higher order thinking, such as playing a musical instrument.

Formal music practice involves several cognitively challenging elements, e.g., long periods of controlled attention, keeping musical passages in WM or encoding them into long-term memory, decoding music scores, and translating the product into corresponding motor commands. This type of activity taxes complex cognitive functions as seen in brain imaging research (Schon et al., 2002; Stewart et al., 2003). Other investigations suggest that music practice is associated with cognitive benefits (for a review, see Schellenberg and Weiss, 2013).

Associations between formal music training and cognitive ability have mostly been reported in retrospective studies of musicians and non-musicians (Schellenberg, 2006; Forgeard et al., 2008; Ruthsatz et al., 2008). Individuals practicing music demonstrate higher performance in tasks requiring visuo-spatial reasoning (Hurwitz et al., 1975; Rauscher et al., 1997; Bilhartz et al., 1999; Costa-Giomi, 1999; Schellenberg, 2004, 2006; Forgeard et al., 2008; Ruthsatz et al., 2008), processing speed (Schellenberg, 2006; Bugos et al., 2007) as well as WM (Schellenberg, 2006; Bugos et al., 2007; Pallesen et al., 2010). These effects point to a quite general cognitive advantage for music players compared with non-music players. There are also reports of associations between the number of months of music practice and academic performance in math, reading and spelling after controlling for general intelligence and parental education (Schellenberg, 2006), although these findings are not always consistent (Forgeard et al., 2008). A recent study also reported a positive association between music practice and grades in practically all school subjects in three separate grades (Cabanac et al., 2013). A recurring question in retrospective correlation studies is whether the differences reported in musically trained individuals predate their music training or appear as a consequence of it. There are, however, a few prospective randomized controlled studies suggesting a causal relation between music and cognitive ability, demonstrated with measures of processing speed and WM (Bugos et al., 2007) as well as full-scale IQ (Schellenberg, 2004).

Extensive music training is known to affect the anatomy of the brain, with greater gray matter volumes observed in motor-related areas (Elbert et al., 1995; Pascual-Leone, 2001; Hyde et al., 2009), auditory discrimination areas (Gaser and Schlaug, 2003; Hyde et al., 2009) as well as greater white matter volumes in motor tracts (Bengtsson et al., 2005) in professional musicians. Whilst few studies have been carried out to understand the neurophysiological effects of music training on cognitive abilities, there is some evidence to suggest an absence of pre-existing differences in music-related brain functions prior to learning an instrument (Norton et al., 2005), with differences emerging approximately 1 year after music training has commenced (Schlaug et al., 2005; Hyde et al., 2009).

Over the past decade, it has been clearly demonstrated that certain cognitive abilities are susceptible to targeted training. For instance, WM capacity has been shown to improve with as little as 5 weeks of cognitive training (Klingberg et al., 2002, 2005; Westerberg et al., 2007; Holmes et al., 2009; Thorell et al., 2009; Bergman Nutley et al., 2011; Brehmer et al., 2012; Green et al., 2012). The neural effects of this type of training have been studied with functional magnetic resonance imaging (fMRI), revealing increased prefrontal and parietal activity (Hempel et al., 2004; Olesen et al., 2004; Jolles et al., 2010). Other studies have reported gains in fluid intelligence after training WM (Klingberg et al., 2002, 2005; Jaeggi et al., 2008) or non-verbal reasoning (Bergman Nutley et al., 2011; Mackey et al., 2011), although the transfer effects from WM to reasoning tests are smaller and inconsistent across studies compared to the effect on WM (Holmes et al., 2009; Thorell et al., 2009). Studies also show that processing speed is susceptible to computerized training improvements (Mackey et al., 2011; Wolinsky et al., 2013). There is also emerging evidence, although with mixed results thus far, that WM training may affect academic performance with improvements in mathematical problem solving (Holmes et al., 2009) and reading comprehension reported (Dahlin, 2010; Egeland et al., 2013). If causality can be established between music training and its association with cognitive and academic benefits, then music training should perhaps be considered a type of cognitive training.

To date, the specific aspects of musical training that affect cognitive improvements have not been identified. One candidate factor is the practice of sight reading. The complexity of sight reading requires rapid (and for some instruments dual) information processing, visuo-spatial decoding, and constant updating of notes to come while playing current notes. In other words, sight reading requires visuo-spatial WM abilities. Sight reading shows correlations with both IQ (*r* = 0.6; Salis, 1978) and with WM (*r* = 0.3–0.4; Salis, 1978; Meinz and Hambrick, 2010). Brain activation studies during sight reading have shown occipito-temporal and parietal activations (Nakada et al., 1998; Schon et al., 2002; Stewart et al., 2003; Bengtsson and Ullen, 2006), areas typically involved in visuo-spatial abilities (dorsal-stream) and pattern recognition.

While data suggest that there is a link between music practice and cognitive ability, it has not been ascertained how music training affects cognitive abilities during development. One factor expected to be of importance for the effect is the time spent training (Schellenberg, 2006; Forgeard et al., 2008), although some studies have failed to find a relation between the number of music lessons and cognitive ability (Ruthsatz et al., 2008; Meinz and Hambrick, 2010).

Most studies have thus far either had small sample sizes (Rauscher et al., 1997; Gromko and Poorman, 1998; Bilhartz et al., 1999; Schlaug et al., 2005; Bugos et al., 2007) or have simply correlated music and cognitive ability in natural samples at one time point providing no information about causality (Schellenberg, 2006; Forgeard et al., 2008; Ruthsatz et al., 2008). To test if music training is associated with differential trajectories of the development of different cognitive functions, longitudinal designs on developing populations could be useful. This study aimed to investigate the effects of music training on WM, processing speed, and reasoning ability in a longitudinal developmental sample.

By using a longitudinal approach we can study the change in cognitive ability related to musical practice, controlling for initial cognitive performance, age, and other possible confounders. More specifically, we hypothesize that music training will be associated with: (1) overall positive cognitive effects as seen in performance on neuropsychological tests of WM, processing speed and reasoning ability (primary outcomes) as well as on tests of mathematical and reading ability (secondary outcomes) across time points, (2) a dose-related development of cognitive and academic ability (as seen with the same outcomes as in hypothesis 1) between time points, i.e., a positive linear relation between the amount of practice and the magnitude of the cognitive and academic benefits, (3) structural differences in the brain between music players and non-music players across the three time points.

#### **MATERIALS AND METHODS**

#### **PARTICIPANTS**

This study used a longitudinal design with three completed measurement points, T1, T2, and T3, collected in 2007, 2009, and 2011, respectively. Participants between the ages of 6 and 25 were randomly chosen from the population registry in the town of Nynäshamn, Sweden. At each time point, information regarding the study was sent out to the parents together with a consent form to be returned to the researchers upon agreement. Parents were then contacted by telephone and given an opportunity to ask questions. Participants aged 18 years and older were contacted directly. At T1, 339 subjects participated in the study, out of which 273 also participated at T2 and 65 at T3. Exclusion criteria were: a first language other than Swedish, any diagnosis of psychiatric or neurological disorder (with the exception of ADHD or dyslexia), and any vision or hearing impairment considered to affect test performance. The study was approved by the local ethics committee in Stockholm. A subset of the sample were also asked to participate in neuroimaging data collection (see **Table 1** for sample sizes).

#### **COGNITIVE TESTS**

Neuropsychological assessment and questionnaires were administered at T1, T2, and T3. The tests were administered individually in different schools in a separate, quiet room. The questionnaire, which covered after-school activities and socio-economic background, was sent to the participants' homes. Raw data from the assessments are found in **Table 1**.

In order to assess visuo-spatial WM, the Dot matrix from the Automated Working Memory Assessment (AWMA) battery was used (Alloway, 2007). This is a simple WM task and involves remembering the location and order of dots displayed sequentially in a grid on a computer screen. The dots were displayed in red in a four-by-four grid on a white background. Each dot was presented for 1000 ms, with an inter-stimulus interval of 500 ms. Each level consisted of six trials, with four correct trials as the minimum required for moving to the next level, where the number of dots to be remembered increased by one item. The test terminated after three errors were committed on one level. The Dot matrix test was performed on an HP Compaq nc6320 laptop with a 15-inch screen.
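The level-up and termination rules described above can be sketched as a small simulation. This is an illustration only, not the AWMA software: the function name `run_span_task`, the starting level, the level ceiling, and the returned scores are assumptions introduced here. The sketch ends a level as soon as four trials are correct or three are wrong; with six trials per level, this gives the same pass/fail decision as running all six.

```python
from typing import Callable, Dict


def run_span_task(respond: Callable[[int], bool],
                  start_level: int = 1,       # assumption: starting span is not stated
                  max_level: int = 9,         # assumption: arbitrary ceiling for the sketch
                  trials_per_level: int = 6,
                  correct_to_advance: int = 4,
                  errors_to_stop: int = 3) -> Dict[str, int]:
    """Apply the advance/terminate rules of a span task.

    `respond(level)` returns True when a trial with `level` items
    is recalled correctly.
    """
    level = start_level
    highest_passed = start_level - 1
    trials_correct = 0
    while level <= max_level:
        correct = 0
        errors = 0
        for _ in range(trials_per_level):
            if respond(level):
                correct += 1
                trials_correct += 1
            else:
                errors += 1
            # The level is decided once either criterion is met.
            if correct >= correct_to_advance or errors >= errors_to_stop:
                break
        if correct < correct_to_advance:
            break  # three errors on this level: the test terminates
        highest_passed = level
        level += 1
    return {"highest_level_passed": highest_passed,
            "trials_correct": trials_correct}


# Hypothetical participant who always recalls up to four dots and never five.
result = run_span_task(lambda n_dots: n_dots <= 4)
print(result)
```

How the published scores were derived from such a run (for example, trials correct versus highest level) is not stated in the text, so both are returned here.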

Backward digit recall was used to assess verbal WM (Alloway, 2007). This test requires the participant to repeat, in reverse order, a list of numbers read out loud. The test started with two digits, and the difficulty level increased until the participant failed to pass a level. Each level consisted of six trials. Four correct trials were required to move on to the next level.

Speed of processing was measured using a letter–digit substitution task. In this task, the participants were shown a row with nine consonants, each paired with a number. Underneath, nine additional rows of 15 letters were presented with the numbers absent. The task was to pair as many numbers with their corresponding letter as possible during a 1 min period.

Raven's Advanced Progressive Matrices (Raven, 2003) was used to assess non-verbal reasoning. The test consists of matrices in black and white distributed over sets A–D for the 6-year-olds and A–E for the rest, with 12 items in each set. The task was to identify the completing piece to a matrix, having a choice of 6 in sets A–B and a choice of 8 in sets C–E. Because the items performed differed between individuals in the sample the outcome underwent item response theory modeling where the performance of each participant was converted into a standardized score which was used as the outcome measure.

#### **ACADEMIC TESTS (AGES 8–25)**

The arithmetical assessment was based on the Trends in Mathematics and Science Study (Martin et al., 2004) and Basic Number Screening Test (Gillham and Hesse, 2001), having been designed for four school-grade-dependent versions (grades 2, 4, 6, and 8, suitable for 14–27-year-olds). Grades 2 and 4 problems included magnitude judgments, questions about the number sequence, as well as elementary arithmetic (addition, subtraction, division, multiplication, and fractions). Grades 6 and 8 problems included elementary arithmetic and elementary algebra (simple equations with variables).

Narrative and expository texts from the Progress in International Reading Literacy Trend Study (PIRLS 2001 T) and The International Association for the Evaluation of Educational Achievement Reading Literacy Study 1991 (Gustafsson and Rosén, 2004) were used to measure reading comprehension. Seventy-seven items were used to form four age-adapted reading comprehension tests for 8–25-year-olds.

**Table 1 | Sample sizes and descriptive data for the variables included for each time point, with standard deviations in parentheses.**

*\*The academic tests were administered only at ages 8–25; the Ns in the table therefore do not apply to these two variables, for which the total N was 273 at T1.*

#### **LIFE STYLE QUESTIONNAIRE**

The questionnaire included questions on musical practice (yes/no), the instrument played, and the number of hours per week of practice. It also included questions on the highest level of education for each parent (total years of education used as a continuous variable), time spent online, gaming and watching TV (summed together in analyses) and time spent doing physical activity outside of physical education.

For the first hypothesis the music practice was coded as a binary variable (0/1) in order to investigate the relation between music practice and cognitive and academic performance overall (across the time points). The second hypothesis was investigated using the number of weekly hours reported spent on music practice for the time point previous to the predicted one as an explanatory variable (for individuals who had reported playing for two of the time points) for cognitive and academic development.
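The two codings of music practice described above can be illustrated with a short sketch. The participant IDs and values are made up, and the helper names `music_binary` and `lagged_hours` are hypothetical. The inclusion rule for the second hypothesis is read here as "practice reported at both the previous and the predicted time point", which is an assumption about the original analysis.

```python
from typing import Dict, List, Optional, Tuple

TIME_POINTS = ["T1", "T2", "T3"]

# Hypothetical questionnaire data: weekly practice hours per time point,
# with None meaning that no music practice was reported.
responses: Dict[str, Dict[str, Optional[float]]] = {
    "p01": {"T1": 2.0, "T2": 3.5, "T3": None},
    "p02": {"T1": None, "T2": None, "T3": None},
    "p03": {"T1": None, "T2": 1.0, "T3": 4.0},
}


def music_binary(hours: Optional[float]) -> int:
    """Hypothesis 1 coding: 1 if music practice was reported, else 0."""
    return 0 if hours is None else 1


def lagged_hours(by_time: Dict[str, Optional[float]]) -> List[Tuple[str, float]]:
    """Hypothesis 2 coding: pair each predicted time point with the weekly
    hours reported at the previous one, keeping only intervals in which
    practice was reported at both time points (assumed inclusion rule)."""
    pairs: List[Tuple[str, float]] = []
    for prev, curr in zip(TIME_POINTS, TIME_POINTS[1:]):
        prev_hours = by_time[prev]
        if prev_hours is not None and by_time[curr] is not None:
            pairs.append((curr, prev_hours))
    return pairs


binary = {pid: [music_binary(h[t]) for t in TIME_POINTS]
          for pid, h in responses.items()}
lagged = {pid: lagged_hours(h) for pid, h in responses.items()}
print(binary)
print(lagged)
```

In this reading, participant `p01` contributes the T1 hours as the predictor of change up to T2, and `p03` contributes the T2 hours as the predictor of change up to T3.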

#### **NEUROIMAGING**

Three-dimensional structural T1-weighted imaging (magnetization-prepared rapid gradient echo sequence, repetition time 2300 msec, echo time 2.92 msec) with a 256 mm × 256 mm field of view, 176 sagittal slices, and 1 mm<sup>3</sup> voxel size was carried out with a 1.5T Siemens Avanto scanner on 64 participants and repeated after 2 and 4 years. The GRAPPA parallel imaging technique with an acceleration factor of two was also employed to speed up the acquisition. Gray matter segmentation was performed on the structural data with a voxel-based morphometry tool available via SPM5 (www.fil.ion.ucl.ac.uk/spm/software/spm5) and followed by an alignment technique performed with the Diffeomorphic Anatomical Registration with Exponentiated Lie algebra (DARTEL) toolbox in SPM. This method iteratively aligned the gray matter images from all three time points to their common average template. The modulated images were then spatially smoothed with a Gaussian kernel size of 8 mm and registered to Montreal Neurological Institute space. Because the DARTEL morphing was applied to tissue segmented images, output images were the tissue probability maps in which each voxel shows the probability of being locally expanded or contracted.
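The 8 mm kernel size is presumably a full width at half maximum (FWHM), which is the usual convention in SPM; the text does not say so explicitly, so this is an assumption. Under that assumption, the following minimal Python sketch shows the standard conversion from FWHM to the standard deviation of the Gaussian, and expresses it in voxels for a hypothetical voxel size (the voxel size of the normalized images is not stated in the text, and the function names are illustrative only).

```python
import math


def fwhm_to_sigma(fwhm_mm: float) -> float:
    """Convert a Gaussian kernel's FWHM (in mm) to its standard deviation (in mm).

    Uses the identity FWHM = 2 * sqrt(2 * ln 2) * sigma.
    """
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def sigma_in_voxels(fwhm_mm: float, voxel_size_mm: float) -> float:
    """Express the kernel's standard deviation in voxel units."""
    return fwhm_to_sigma(fwhm_mm) / voxel_size_mm


# Assumed values: 8 mm FWHM (from the text) and 1 mm isotropic voxels (hypothetical).
sigma_mm = fwhm_to_sigma(8.0)
sigma_vox = sigma_in_voxels(8.0, voxel_size_mm=1.0)
print(f"sigma = {sigma_mm:.3f} mm = {sigma_vox:.3f} voxels")
```

An 8 mm FWHM therefore corresponds to a standard deviation of roughly 3.4 mm, which gives a sense of the spatial scale at which local gray matter differences are averaged.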

#### **RESULTS**

Descriptive information regarding the demographics of the sample is included in **Table 1**. There was a significant difference between groups (music vs. non-music) in age (*t* = 1.98, *p* = 0.048) at time point 1, and trends for maternal and paternal education levels [*t* = 1.91, *p* = 0.061 (T3) and *t* = 1.77, *p* = 0.077 (T2)], and time spent online/gaming/TV viewing at T3 (*t* = 1.89, *p* = 0.067; as tested with *t*-tests of independent samples at each time point). There was neither a difference between the music and non-music players in weekly hours spent on physical exercise (*t* < 1.36, *p* > 0.17), nor on any of the other covariates at the remaining time points (all *t*s < 0.71, *p* > 0.48).
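As an illustration of the group comparisons reported above, the sketch below computes an independent-samples *t* statistic in Python. The text does not state whether equal variances were assumed, so the classical pooled-variance (Student) form used here is an assumption, and the ages in the example are invented purely for demonstration.

```python
import math
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Return Student's t statistic (pooled variance) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical ages (years) at time point 1; not data from the study.
music_ages = [10.5, 12.0, 14.5, 16.0, 18.5, 20.0]
non_music_ages = [8.0, 9.5, 12.0, 13.5, 15.0, 17.5]

t_value, df = independent_t(music_ages, non_music_ages)
print(f"t({df}) = {t_value:.2f}")
```

The *p* value would then be read from the *t* distribution with the returned degrees of freedom, a step left out here to keep the sketch within the standard library.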

To investigate the first hypothesis, namely whether music practice was associated with a different performance level on the four cognitive tests across all three time points, we ran mixed linear regression models with time as the repeated variable (unstructured repeated covariance type) and performance on each of the four tests as the dependent variable. Sex, time, and music practice (yes/no) were entered as factors, and age (inverse), father's and mother's education levels (total years), hours spent on gaming/internet/TV viewing, and hours spent on physical activity were entered as covariates.

There was a significant main effect of music on visuo-spatial WM [*F*(1,333) = 4.59, *p* = 0.033], verbal WM performance [*F*(1,333) = 9.69, *p* = 0.002], processing speed [*F*(1,333) = 4.91, *p* = 0.027], and reasoning ability [*F*(1,332) = 28.34, *p* < 0.001]. There was also a significant positive association between music practice and math performance [*F*(1,317) = 12.91, *p* < 0.001] but not with reading comprehension [*F*(1,317) = 1.42, *p* = 0.23]. The residuals from corresponding mixed models excluding the music factor are plotted for the primary outcomes according to music category (thus removing the effect of age, sex, time, parental education, and hobbies) in **Figure 1**.

Next, we evaluated hypothesis number two: the expected dose-related development of cognitive ability between time points. This was done by running a mixed linear regression model (unstructured repeated covariance type) with a variable for later time points (e.g., visuo-spatial WM Late, which could be data from T2, T3, or both if the subject participated at all three time points) as the dependent variable and a variable for earlier time points (e.g., visuo-spatial WM Early, which could be T1, T2, or both) as an independent variable, so that data from individuals participating at two or three time points could be included as repeated measures. The models also included music hours/week, age (inverse), sex, father's education level, mother's education level, TV/online/gaming hours/week, and exercise hours/week as independent variables.
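The Early/Late restructuring described above can be sketched in Python as follows. This is a minimal illustration of the data layout only, not the authors' procedure: the function name and record format are hypothetical, pairing consecutive attended time points is one possible reading of the text, and using the music hours reported at the earlier time point of each pair (with missing reports coded as 0) is an assumption.

```python
from typing import List, Optional, Tuple

# (early score, late score, weekly music hours at the early time point)
Pair = Tuple[float, float, float]


def early_late_pairs(scores: List[Optional[float]],
                     music_hours: List[Optional[float]]) -> List[Pair]:
    """Build Early/Late rows for one participant.

    `scores` and `music_hours` hold one entry per time point (T1, T2, T3),
    with None where the participant was not assessed.
    """
    attended = [i for i, score in enumerate(scores) if score is not None]
    pairs: List[Pair] = []
    # Pair each attended time point with the next attended one.
    for early, late in zip(attended, attended[1:]):
        hours = music_hours[early]
        pairs.append((scores[early], scores[late], 0.0 if hours is None else hours))
    return pairs


# Hypothetical participants, for illustration only.
three_visits = early_late_pairs([20, 24, 27], [2.0, 3.0, 3.0])
two_visits = early_late_pairs([18, None, 25], [0.0, None, 0.0])
print(three_visits)
print(two_visits)
```

A participant seen at all three time points contributes two rows (T1 to T2 and T2 to T3), which is why the model treats the rows as repeated measures rather than independent observations.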

The results showed a significant effect of music practice hours/week on the outcome measures visuo-spatial WM and verbal WM, with a trend for reasoning ability (see **Table 2** for regression coefficients). There was no significant effect of music practice hours/week on the development of processing speed, math, or reading performance (all βs for the music variable <0.151, *p* > 0.17).

We investigated the presence of developmental windows where music practice would lead to larger cognitive effects by adding interaction terms of music practice by age to the models listed in **Table 2** but found no evidence of such an effect (with *p*s > 0.16).

The third hypothesis was that there would be a detectable difference in the brain between music and non-music players. To find gray matter structural differences between the music and non-music groups, we entered music group as the main factor in a flexible factorial design analysis using SPM. Data from all three time points were included in the analysis, with subject and time as factors to take the repeated measures into account. The analysis was corrected for the effects of age, sex, handedness, and total gray matter volume. We also added an interaction term for age by group in order to detect age-sensitive periods for music practice in the brain. The significance level was set at the cluster level with a threshold of *p* < 0.05 (corrected for the multiple comparisons of all voxels within gray matter volume). We found two clusters showing significant differences in gray matter volume. One was in the temporal lobe (*p* = 0.008, see **Figure 2A**), situated mainly in the inferior temporal and temporal-occipital fusiform gyri, based on the MNI structural and Harvard-Oxford Cortical Structural atlases (see **Table 3** for coordinates and cluster sizes); the other was in the insula, caudate, and putamen (*p* = 0.002, see **Figure 2B**). The interaction of age by group was not significant.

**Table 2 | Regression coefficients for the models investigating the amount of change in cognitive performance on the three outcomes that could be explained by music hours/week and other possible factors.**

#### **DISCUSSION**

In this study, the effect of musical practice on cognitive ability was investigated in a developmental longitudinal study. The results showed that practicing a musical instrument was associated with higher performance on tests of reasoning, processing speed, and WM, as well as mathematics but not reading comprehension. Importantly, the development between time points was related to the time spent on music practice per week for both WM tests. The effect was also seen after correcting for baseline performance, parental education, and other after-school activities, including physical activity, TV-watching, and gaming. These results confirm previously reported associations between musical practice and cognitive ability (Hurwitz et al., 1975; Schellenberg, 2006; Forgeard et al., 2008; Ruthsatz et al., 2008). The dose-response relation also supports the previously reported causal relation between the two (Schellenberg, 2004; Bugos et al., 2007).

**and the insula; (B) effects in caudate nucleus shown in sagittal section from both left and right hemispheres.**

We did not find a dose-response relation between music practice and the development of processing speed and only found a trend for the development of reasoning ability. The positive effects on the secondary outcome of mathematics did not point to a causal effect of music on academic development, and were only seen as an association with music practice across the time points. This is in line with previous findings of associations from correlational studies between music practice and grades in mathematics (Schellenberg, 2006; Cabanac et al., 2013) and mathematical performance (Schellenberg, 2006). As most of the music players in our sample were already practicing an instrument at the beginning of the study, it is possible that potential *pre-study* music-related changes in the development of WM underlie the learning advantages seen for mathematics, given its strong relation to visuo-spatial WM in particular (Bull et al., 2008). We did not see an association between music practice and reading comprehension, a finding previously reported in a correlational study (Schellenberg, 2006). The reason for this is unclear and should be further studied.

Differences were also seen in the gray matter volume of the brain between music players and non-players in the temporo-occipital and insular cortex across the time points. One could argue that, since the music group outperformed the non-music group on cognitive measures at the beginning of the study, these differences should be controlled for in the brain imaging analysis. However, it is not certain and perhaps, given the literature (e.g., Schlaug et al., 2005), even unlikely that the cognitive differences are independent of music; controlling for them would therefore also remove cognitive differences associated with music in the brain. The differences observed in the brain are consistent with previous findings showing higher gray matter volume for musicians in the fusiform gyrus (James et al., 2013), an area typically involved in visual pattern recognition. Specifically, this area has been identified through fMRI investigations as important during musical notation decoding (Schon et al., 2002; Stewart et al., 2003). These regions do not include the fronto-parietal networks involved in top-down attention and WM (Olesen et al., 2004). However, the effect was seen in the caudate nucleus, which has also been implicated in WM training (Olesen et al., 2004; Dahlin et al., 2008; Backman et al., 2011). This demonstrates certain similarities between the neural correlates of cognitive training and music practice.

Over the past decade, research has shown that WM capacity is subject to training-induced improvements (Klingberg et al., 2002, 2005; Holmes et al., 2009; Thorell et al., 2009; Brehmer et al., 2012; Green et al., 2012) and that the size of the transfer effects is linearly related to practice time (Jaeggi et al., 2008; Bergman Nutley et al., 2011). It is therefore likely that the time spent on other WM-taxing activities may affect WM capacity and other abilities partly depending on the same brain regions. Reasoning has also previously been reported to improve by training on WM tasks (Klingberg et al., 2002, 2005; Jaeggi et al., 2008), non-verbal reasoning tasks (Bergman Nutley et al., 2011; Mackey et al., 2011), as well as in strategy training studies (Klauer et al., 2002; Hamers et al., 1998). These types of interventions target different aspects of reasoning and the performance improvements reported probably have different origins. Interestingly, it has been shown that it is possible to improve test performance in a single practice session by learning to, e.g., attend to one stimulus dimension at a time (Denney and Heidrich, 1990). This indicates an immediate effect, based on learned strategies, on how to best approach the visuo-spatial structure of the task and is independent of practice time.

**Table 3 | Significant association between playing a musical instrument and gray matter volume.**

*\*The labeling is based on both MNI structural and Harvard-Oxford Cortical Structural atlases provided by FMRIB Software Library (FSL) (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Atlases).*

The decoding of musical notation is one factor that may explain the effect of music on cognitive functioning as it requires abilities such as spatial–temporal reasoning and visual perception (Gromko, 2004). Given that the results indicate linear effects of music practice on WM and that previous studies have demonstrated effects of music practice on reasoning after as little as 1 year of practice (Rauscher et al., 1997; Schellenberg, 2004), it may be that the music-related processes taxing reasoning are trained while *learning* the musical notation code, whereas the processes taxing WM are trained while *reading* the code, keeping sections of code in WM and in control of attention. This would thus explain the linear relation with time practiced between WM and music, as two more years of music reading would further support development of WM. The effect of music training on reasoning development, on the other hand, may occur as soon as the foundations of the musical system are understood, rather than linearly and gradually. That being said, regardless of beneficial strategies or basic understanding of the task, WM is required to solve reasoning problems (Carpenter et al., 1990) and to read musical notation (which could explain the trend to linear effects on reasoning development).

Since the music players in this study showed superior performance on cognitive measures already at time point one, it is not possible to conclude that music practice caused the cognitive effects, although the dose-response relation indicates that this could be the case. Given that the groups differed in age and that the natural development of the cognitive functions assessed in this study is not linear throughout the age span of the sample, it was not possible to simply test an interaction between time and music in the mixed model. In order to address this, case-control matched analyses were attempted but resulted in too few control cases to enable the analyses at sufficient statistical power. Hence, we instead performed linear regressions with the outcome at later time points as the dependent variable while controlling for the outcome at earlier time points to inform about the change in cognitive ability related to music practice between time points.

Although prior research has suggested positive cognitive effects of physical activity (Hillman et al., 2008; Hotting and Roder, 2013) as well as of playing computer games (Green and Bavelier, 2012), we only found a positive effect for verbal WM development related to time spent on TV viewing/online and gaming activities. The reasons for the lack of a general effect on all cognitive tests are unclear, but it could be due to the crude measures of the covariates (a combined score for the gaming/online/TV viewing variable, with no distinction made between verbal and more visuo-spatial activities, and no quantification of aerobic fitness or activity type). It may be that the combined score for gaming/online and TV viewing represented activities relying on verbal information processing to a larger extent than on the visuo-spatial domain.

There may be alternative explanations to the different trajectories of WM development that in this study appear to be music-related, such as differences in personality. A recent study showed that personality type predicted the duration of music training (along with parental education) to a larger degree than IQ (Corrigall et al., 2013). This could be of importance as our results suggest that time spent on music practice is predictive of WM development and could perhaps partly be explained by the fact that certain personality types of high-functioning individuals tend to show longer durations of practice and may, independently of music, have a different trajectory of their WM development. Another possibility is that there could be an interaction between personality type and the cognitive training effects associated with music practice. This will be for future studies to investigate.

It is likely that part of the sample in our study is musically trained in the vocal domain, which is not accounted for. However, this should hardly affect the interpretation of the results as it is likely to add noise to the data, underestimating the strength of the effects, if anything. Future studies should continue to pursue the question of causality, the mechanism through which music improves cognitive ability as well as potential effects on academic outcome. Ideally, detailed information on musical notation reading skills should be recorded along with personality dimensions and compared with performance on cognitive tests to further investigate the mechanisms involved in music practice and its effects on cognitive ability.

#### **CONCLUSION**

The results from this study show that music training is associated with cognitive and mathematical benefits. This was made apparent through the superior level of performance across time for music players compared with non-players. There was also a difference in gray matter density within brain areas related to music notation decoding. Furthermore, the data suggest that music players show a steeper development of both visuo-spatial and verbal WM over time, supporting previously reported causal effects between music practice and cognitive performance. Time spent on music practice predicted both visuo-spatial WM and verbal WM development. More generally, these findings support the importance of practice and learning for the development of WM during childhood and adolescence.

#### **REFERENCES**


Wolinsky, F. D., Vander Weg, M. W., Howren, M. B., Jones, M. P., and Dotson, M. M. (2013). A randomized controlled trial of cognitive training using a visual speed of processing intervention in middle aged and older adults. *PLoS ONE* 8:e61624. doi:10.1371/journal.pone.0061624

**Conflict of Interest Statement:** Sissela Bergman Nutley is an employee of Pearson/Cogmed. Torkel Klingberg has had a consultancy agreement with Pearson/Cogmed.

*Received: 28 August 2013; accepted: 18 December 2013; published online: 07 January 2014.*

*Citation: Bergman Nutley S, Darki F and Klingberg T (2014) Music practice is associated with development of working memory during childhood and adolescence. Front. Hum. Neurosci. 7:926. doi: 10.3389/fnhum.2013.00926*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Bergman Nutley, Darki and Klingberg. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Neural implementation of musical expertise and cognitive transfers: could they be promising in the framework of normal cognitive aging?

#### *Baptiste Fauvel<sup>1,2,3,4</sup>, Mathilde Groussard<sup>1,2,3,4</sup>, Francis Eustache<sup>1,2,3,4</sup>, Béatrice Desgranges<sup>1,2,3,4</sup> and Hervé Platel<sup>1,2,3,4</sup>\**

*<sup>1</sup> INSERM, U1077, Caen, France*

*<sup>2</sup> Université de Caen Basse-Normandie, UMR-S1077, Caen, France*

*<sup>3</sup> Ecole Pratique des Hautes Etudes, UMR-S1077, Caen, France*

*<sup>4</sup> CHU de Caen, U1077, Caen, France*

#### *Edited by:*

*Merim Bilalic, University Tübingen, Germany*

#### *Reviewed by:*

*Sebastian J. Lipina, Unidad de Neurobiología Aplicada, Argentina Tilo Strobach, Humboldt University Berlin, Germany*

#### *\*Correspondence:*

*Hervé Platel, INSERM, U1077, EPHE-Université de Caen Basse-Normandie, Caen, France; U.F.R. de Psychologie, Université de Caen Basse-Normandie, Esplanade de la Paix, 14032 Caen Cedex, France e-mail: herve.platel@unicaen.fr*

Brain plasticity allows the central nervous system of a given organism to cope with environmental demands. Therefore, the quality of mental processes relies partly on the interaction between the brain's physiological maturation and individual daily experiences. In this review, we focus on the neural implementation of musical expertise at both an anatomical and a functional level. We then discuss how this neural implementation can explain transfers from musical learning to a broad range of non-musical cognitive functions, including language, especially during child development. Finally, given that brain plasticity is still present in aging, we gather arguments to propose that musical practice could be a good environmental enrichment to promote cerebral and cognitive reserves, thereby reducing the deleterious effect of aging on cognitive functions.

**Keywords: musical training, expertise, brain plasticity, cognitive transfer, aging, brain reserve**

#### **INTRODUCTION**

To adapt to changing environmental requirements, the brain has structural and functional dynamic properties that allow regularities to be encoded and skills to be learned and refined through regular practice (Draganski and May, 2008). In return, the efficiency of the cognitive system is partly influenced by environmental features and individual daily experiences (Plomin et al., 1999).

Mastering a musical instrument is a particularly complex multimodal and multiprocess behavior because it requires theoretical learning of the rules of solfeggio, as well as repeated training in translating visual codes (the written scores) into precise motor sequences and auditory events (Wan and Schlaug, 2010). Musical training and musicians have therefore often been used in studies focusing on practice-related brain plasticity. Interestingly, these investigations have provided empirical evidence that regular engagement in musical activities leads to behavioral improvements not only in perceptual-motor skills, but also in a wide range of non-specific cognitive functions (Schellenberg, 2006).

In this article, we review (1) the structural plasticity effects induced by musical practice, to point out that they can be located in general cognitive-related areas outside the auditory-motor domain; (2) the functional differences that exist between musically naïve and musically trained subjects when they perform auditory-motor tasks, to show that trained subjects can rely on general cognitive processes when the tasks become harder; (3) the transfer effects that occur from musical practice toward other cognitive functions, because they justify the use of music, notably in rehabilitation programs for developmental disorders or brain injuries. Finally, (4) we present works and arguments to suggest that musical practice could also be encouraged as a particularly good environmental enrichment to promote successful brain aging.

#### **NEURAL IMPLEMENTATION OF MUSICAL EXPERTISE**

#### **STRUCTURAL PLASTICITY INDUCED BY MUSICAL PRACTICE**

Structural brain plasticity phenomena (i.e., changes in cell anatomy and connections) can be recorded longitudinally and cross-sectionally in humans using morphometric methods of MRI data analysis (Draganski and May, 2008). In a longitudinal MRI study conducted by Hyde et al. (2009), 31 children aged around 6 years were scanned before being assigned to 15 months of either individual keyboard lessons or a non-instrumental music class. Whereas no structural differences were observed before the experiment, those children who had learned to play the keyboard exhibited significantly greater increases in gray matter volume after the lessons within the primary motor cortex (right pre-central gyrus) and motor-related areas (part of the corpus callosum), as well as within the primary auditory cortex (right Heschl's gyrus). The extent of the rearrangements in these motor and auditory areas was correlated with improvements in finger tapping and auditory frequency discrimination tasks, respectively. Interestingly, the authors also found a greater increase in gray matter volume in brain regions dedicated to the integration of multimodal information (bilateral frontal areas and left pericingulate

"fnhum-07-00693" — 2013/10/21 — 10:25 — page 1 — #1

gyrus). In the context of musical practice, these regions may sustain the translation of the visual score into motor sequences and auditory-induced emotional events (Hyde et al., 2009).

Cross-sectional studies conducted with adult expert musicians have confirmed this anatomical brain shaping on several occasions. Adult violinists have a more highly elaborated topographic representation of their left hand in the right somatosensory cortex (Elbert et al., 1995). Moreover, the gray matter volume of the primary motor cortex (pre-central gyri) has been shown to increase in relation with the degree of musical expertise (Gaser and Schlaug, 2003), with a left-hemisphere advantage for pianists and a right-hemisphere one for string players (Bangert and Schlaug, 2006), illustrating the close link between anatomical changes and training demands. Regarding the auditory cortex, Bermudez and Zatorre (2005) found greater gray matter volume in the right superior temporal gyrus of 43 adult musicians compared with non-musicians. Beyond the motor and auditory cortices, other local gray matter increases, possibly induced by score reading, have been found in the visuospatial and visuomotor areas (right fusiform gyrus, left intraparietal sulcus; James et al., 2013) and left inferior temporal gyrus (Gaser and Schlaug, 2003), as well as in multimodal association areas (right superior parietal gyrus; Gaser and Schlaug, 2003), and general cognitive-related areas (left inferior frontal gyrus; James et al., 2013).

Structural brain plasticity also appears to take place in white matter tracts. The corpus callosum exhibits enlargement in musicians because they engage more frequently in bimanual behavior (Öztürk et al., 2002; Hyde et al., 2009), and possibly because this structure is also involved in visuoauditory processes (Bengtsson et al., 2005). Bengtsson and colleagues also found an increase in the diffusivity value of the internal capsule of the corticospinal tract, and a positive correlation between hours of musical practice and fiber tract diffusivity in frontal areas and the right arcuate fasciculus, which are crucial for language processing, and have been shown to be larger and more structured in musicians, especially singers (Halwani et al., 2011).

To summarize, performing music involves elaborate, coordinated, and rule-based motor, auditory, and visual skills. Regular musical training therefore leads to an anatomical shaping of auditory, motor, and visual-related areas, but some results show that brain regions involved in more general non-musical cognitive processes, such as language, attention, or executive functions, are also affected by musical training-related structural plasticity (Hyde et al., 2009; James et al., 2013). This indirectly suggests that these latter functions could also play a crucial role in the achievement of musical expertise.

#### **FUNCTIONAL PLASTICITY INDUCED BY MUSICAL PRACTICE**

The impressive auditory skills and manual dexterity of musicians are not solely explained by the *anatomical* remodeling of auditory and motor-related areas, as they also exhibit *functional* brain differences with non-musicians when they perform tasks similar to music exercises (e.g., finger tapping tasks, tonal, or temporal auditory discrimination tasks).

fMRI studies have shown that when professional pianists and violinists are asked to perform complex uni- or bi-manual tapping tasks, they exhibit reduced activation of the primary and secondary motor areas (Jäncke et al., 2000; Lotze et al., 2003) as well as the supplementary and pre-motor areas (Krings et al., 2000), compared with matched controls. The authors of these studies concluded that motor skills are more automatized, and thus less costly, for musicians, who therefore need to recruit fewer neurons to perform as well as control subjects do. This automatization frees up resources that allow for better spontaneity and flexibility, as reflected in the greater activation of bilateral pre-frontal and parietal areas (Lotze et al., 2003; Landau and D'Esposito, 2006).

Regarding auditory skills, once again, besides the structural shaping of the auditory areas, particularities in neural responses have been recorded when expert musicians process musically relevant auditory stimuli. They have an enhanced cortical representation of the musical scale tones in the tonotopic map of the auditory cortex (Pantev et al., 1998) that leads them to exhibit greater expectancy and more effective attentional processes for musical sounds. Indeed, when they have to identify a sound with a deviant frequency (pitch) or rhythm embedded in a sequence of standard sounds, they display shorter latencies and/or higher amplitudes of the electrophysiological components known to reflect conscious sound discrimination and target detection [N2b and P3; Tervaniemi et al., 2005; Late Positive Component (LPC); Besson et al., 1994; P3, Ungan et al., 2013].

Musicians' enhanced cortical representations of musical rules also result in pre-attentive, memory-based processing for musical sound encoding, possibly through the auditory corticofugal pathway. Indeed, modulations in both cortical and subcortical electrophysiological responses to the perception of musical irregularities have even been recorded when musicians' attention is diverted away from the stimuli. This is reflected in the appearance or increase of the mismatch negativity (MMN) component in the auditory cortex of musicians versus non-musicians when they are exposed to a melodic contour or interval change (Koelsch et al., 1999; Fujioka et al., 2004), and in the modulation of the early right anterior negativity (ERAN) component when the changes consist of more complex musical irregularities (e.g., unexpected chords; Koelsch et al., 2002).

Concerning the subcortical level, musicians display faster neural synchronization and stronger brainstem encoding of chord arpeggios in both in tune and out of tune conditions. Interestingly, the magnitude of the brainstem response is predictive of the participants' performances in a pitch discrimination task (Bidelman et al., 2011).

Neuroimaging studies using fMRI have also been designed to explore whether musically trained individuals process the auditory aspects of music differently from non-musicians. When they are passively exposed to music or have to judge whether chords are consonant or dissonant, musically trained participants display a right to left shift in the activation of auditory temporal areas (Ohnishi et al., 2001) and frontal and inferior parietal areas (Minati et al., 2009). According to the authors, this suggests the use of more abstract and analytical strategies by the experts. Studies featuring pitch comparison tasks that involve greater working memory load (Gaab and Schlaug, 2003) or cognitive control and updating in working memory (Pallesen et al., 2010) report that, compared with non-musicians, musically trained participants recruit fewer early perceptual areas, and more auditory working memory-related areas (right posterior temporal and supramarginal gyri and bilateral superior parietal areas) or associative areas (right pre-frontal, parietal lateral, anterior cingulate, and dorsolateral frontal cortices). The more difficult the task is, the more the experienced musicians resort to these areas and the better their performance is relative to that of controls (Oechslin et al., 2012). Finally, regarding musical semantic memory, a study conducted in our laboratory revealed that when musicians have to rate their familiarity with a melody, several autobiographical episodic memory-related areas are activated more strongly compared with non-musicians (bilateral hippocampus, visual primary cortex, cingulate cortex, and bilateral superior temporal areas; Groussard et al., 2010). This suggests that, owing to their musical experience, musicians engage self-referential processes to perform the task and gain access to richer and more vivid sensory details.

Moreover, musical expertise also leads to auditory-motor coupling, as reflected by the activity displayed within the primary motor cortex when musicians hear a piece they have learned to play (Haueisen and Knösche, 2001; Bangert et al., 2006). Conversely, when they play on a muted keyboard, the temporal areas dedicated to hearing are activated. In the same way, musical expertise is also built on auditory-visual and visuomotor associations. Indeed, it has been shown that when an atonal event (a wrong note) is written on a score, musicians are able to anticipate it, as reflected in their behavioral performance and physiological response to this event (Schön and Besson, 2005). Finally, when motionless pianists imagine themselves playing a piece from a score, they display the same pattern of neural activity within the secondary and associative motor areas as they do during an actual instrumental performance, although the degree of activation is reduced and does not extend to the primary motor cortex (Meister et al., 2004).

To sum up so far, the expert auditory and motor performances of musicians are sustained by the functional reorganization of typical underlying neural processes, due to automation and better memory-based (top-down) processing. This makes the basic auditory and motor skills less costly for musicians, thereby allowing them to allocate resources to strategic processes when tasks become harder. Similar effects have also been observed for expert performance in other activities such as object and pattern recognition by chess master players (Bilalic et al., 2012) or working memory-related tasks (Guida et al., 2012).

#### **COGNITIVE TRANSFER OF MUSICAL TRAINING**

Musical hearing and practice do not seem to be sustained by specific brain areas, but rather draw on general pre-existing skills and cognitive functions. Because musical learning places high demands on these functions, it can result in the transfer of improvements from one activity (music making) to others (e.g., language skills, executive functions).

#### **LANGUAGE SKILLS**

There is a longstanding debate about the division or sharing of the brain substrates of language and music. Although some famous neuropsychological cases of double dissociation have been reported (Stewart et al., 2006), we know that music and language perception and production share many features, ranging from the basic sensorimotor level to auditory-cognitive processes. Indeed, it appears that humans perceive these two auditory stimuli using the same acoustic cues (rhythm, pitch, and timbre) and relying on similar resources for syntactic integration processing or memory. Furthermore, both language and music have a visual form of notation that allows them to be read or written, and producing either of these two behaviors involves auditory-motor coupling. According to Patel's OPERA hypothesis, transfers from musical training to language skills occur because (i) there is an anatomical overlap in their brain networks, (ii) musical practice implies more precision in processing features shared with language, (iii) this practice takes place in a repeated manner, and is associated with (iv) more focused attention, and (v) emotion (Patel, 2011).

Although they are less central to language than to music, pitch modulations are still important for speech in order to convey emotion. Cross-sectional (Magne et al., 2006), and longitudinal studies (Moreno and Besson, 2005) found that musically trained children were better at detecting pitch violations in a foreign or native language than their musically untrained counterparts. Moreover, the authors reported a cortical Late Positive Component (LPC) in response to pitch violations that was far less pronounced, or even absent, in the untrained children. Rather similar results have been reported for adults, with musicians displaying a shorter latency of this LPC when exposed to prosodic incongruities in foreign sentences (Schön et al., 2004; Marques et al., 2007).

Musicians also process the temporal features of speech sounds differently than non-musicians. Faster cortical neural responses to voice onset time, as well as to vowel or syllable duration, have been attested both attentively and pre-attentively for children who take part in music lessons (Chobert et al., 2012). As electrophysiological modulations have also been found at the brainstem level (Musacchia et al., 2007; Wong et al., 2007), Kraus and Chandrasekaran (2010) suggest that transfer effects from processing acoustic cues in a musical context to processing them in a speech context are mostly due to the reinforcement of the auditory corticofugal pathways (as in the reverse theory hypothesis that states that long-term cortically stored representations guide early perceptual encoding via descending pathways and top-down processes). The overtraining of musicians to detect, sequence, and encode relevant aspects of musical sound patterns shared with speech endows them with heightened phonological awareness.

Apart from the benefit of perceiving prosody in native and foreign languages, phonological awareness is also crucial for learning tone languages (Milovanov and Tervaniemi, 2011), hearing speech in noise, and reading skills (Strait and Kraus, 2011). This makes music lessons potentially well suited to the rehabilitation of children with reading impairments such as dyslexia (Strait and Kraus, 2011).

Another point that music and language have in common is that they consist of auditory elements that unfold over time according to complex rules referred to collectively as syntax. Although the neural representations of these regularities are stored in different parts of the brain, their processing requires the same limited neural resources, allowing for transfer effects (Patel, 2011). When exposed to syntactic incongruities in speech, musicians have different Event-Related Potential (ERP) responses compared with non-musicians. These responses are more bilateral in adult musicians (Fitzroy and Sanders, 2013), and appear earlier in development in children who take music lessons (Jentschke and Koelsch, 2009).

We have seen that regular musical training results in the reinforcement of auditory-motor coupling. This coupling is also essential for speech, and its stimulation through musical practice seems to contribute to the rehabilitation of stroke patients with language impairments (Rodriguez-Fornells et al., 2012).

Finally, it seems that the short-term storage and manipulation of speech relies partly on the same neural networks and cognitive mechanisms in working memory as music (Besson et al., 2011; Schulze et al., 2011; Strait and Kraus, 2011). Regarding long-term episodic storage, a study conducted in our laboratory showed that the brain areas engaged in musical episodic retrieval match those known to be activated for the retrieval of non-musical stimuli (bilateral middle and superior frontal gyri and precuneus; Platel, 2005). This could explain why musicians, who frequently use short- and long-term memory resources for processing music, often perform better than non-musicians on verbal working memory tasks (Bialystok and DePape, 2009), and sometimes on verbal episodic memory tasks, too (Chan et al., 1998).

To conclude, it appears that the brain areas thought to be dedicated to language and speech processing are in fact involved whenever a behavior calls for fine-grained auditory analysis and auditory-motor coupling. Because musical experts rely on these brain mechanisms to process and play music, language-related areas are recruited for music processing, and musical practice transfers to language processes.

#### **EXECUTIVE FUNCTIONS AND IQ**

The positive effects of musical practice on cognition are not restricted to auditory and language skills. According to the literature, playing or learning music is linked to better performance on an astonishing range of cognitive measures. Even after taking the effects of potential confounding variables (e.g., family income, parents' education, and involvement in non-musical activities) into account, Schellenberg (2006) observed a positive association between the duration of music lessons in childhood and full-scale IQ results, as well as academic achievement. There was no evidence of a particularly strong correlation between musical training and any specific IQ subtest. Moreover, all the correlations disappeared when the IQ score was held constant, arguing for a homogeneous and non-specific effect of musical training on intelligence. To provide some support for a causal link, the authors replicated their findings in a follow-up study where children were assigned either to music, drama, or no lessons groups (Schellenberg, 2004). After the lessons, the improvement in IQ performance was significantly greater in the groups that had taken music lessons than in the other groups. Given that Schellenberg (2006) subsequently failed to find any effect on IQ test results when full-time music students were compared with non-musicians studying psychology, law, or physics, he concluded that musical learning is an extracurricular but school-like activity that requires more concentration, attention, and discipline than other everyday leisure activities. Therefore, according to this theory, music lessons could enhance the ability to plan and make decisions, correct errors, ignore irrelevant or distracting information, produce novel responses and avoid habitual ones, and cope well in difficult situations. These skills, referred to as *executive functions*, serve all the other cognitive domains and all kinds of learning. Accordingly, just as school learning does, taking music lessons as an out-of-school activity should enhance IQ in a non-specific manner (Hannon and Trainor, 2007).

To conclude, according to some authors, taking music lessons during childhood seems to act as a particularly powerful form of environmental enrichment, promoting cognitive development as a whole through the potentiation of general cognitive resources such as executive functions.

#### **MUSICAL PRACTICE AS A SHIELD AGAINST AGING**

#### **CEREBRAL PLASTICITY AND COGNITIVE AGING**

With the ongoing increase in life expectancy in industrialized countries, cognitive and brain aging have become key issues in neuropsychology. It is known that interindividual differences in cognitive performance increase with aging and that the clinical consequences of aging-related cerebral atrophy differ considerably from one individual to another (Villeneuve and Belleville, 2010). This has led researchers to come up with the concepts of brain and cognitive reserve. *Brain reserve* is based on the brain's anatomical characteristics and the fact that a higher gray matter volume counteracts atrophy and delays the appearance of the first clinical symptoms. *Cognitive reserve* refers to neurocognitive mechanisms, such as enhancing the efficiency of the brain networks engaged to carry out a task, or using either supplementary or entirely alternative networks, reflecting recourse to compensatory strategies. These functional changes are thought to maintain efficient cognitive performance in the face of age-related physiological disturbances (Stern, 2009). Researchers have shown that the quality of such reserves is partly determined by early environmental features such as educational level, but also by current environmental variables, such as occupational level. In this framework, we think that musical activities could be a particularly suitable occupation for promoting brain and cognitive reserves. Indeed, (1) as we have seen, results from children and adults indicate that musical practice is both a multimodal and a multiprocess activity, leading to widespread cerebral plasticity and putting heavy demands on executive functions that might thereby help to promote cognitive development (Schellenberg, 2006; Hannon and Trainor, 2007). It is not obvious that researchers will find similar results in elderly subjects (Strobach et al., 2012a,b), but there are good reasons to hope that musical activities could be appropriate in the field of normal cognitive aging, where deterioration mostly affects executive functions (Verhaeghen, 2011), partly because of the neural disconnection that disrupts the functional integration of multiple systems (Madden et al., 2011). (2) Music (like other leisure activities) implicitly has several characteristics of rehabilitative paradigms (e.g., gradual increment in task difficulty, motivation and arousal states elicited by the exercise, useful performance-related feedback and rewards; Green and Bavelier, 2008) that greatly increase training efficacy, generalization, and flexibility. Moreover, (3) music is not only a cognitively costly activity, but also an artistic occupation. A new field of research is now trying to demonstrate that musical enjoyment and chills are sustained by the release of neurotransmitters (Blood and Zatorre, 2001; Salimpoor et al., 2011) and regulate some hormone levels (Boso et al., 2006; Fukui and Toyoshima, 2008). Interestingly, some of these neurochemical mechanisms could influence brain healing in a positive way by partly controlling neurogenesis (Fukui and Toyoshima, 2008) and synaptogenesis (Kuo et al., 2008), as well as blood pressure level (Ramchandra et al., 2005). However, the speculation that music could favor brain healing through the neuroendocrine mediators of esthetic experience remains to be firmly established empirically.

At this moment, to our knowledge, only two studies have focused on the influence of musical practice on the quality of cognitive aging. One study, comparing professional musicians with amateur musicians and control participants aged from 60 to 83 years, revealed that performance on the delayed recall of a geometrical shape, verbal naming, and executive function tasks differed significantly in favor of professional musicians, with the amateur musicians' performance lying midway between that of the professional musicians and that of the controls (Hanna-Pladdy and MacKay, 2011). As has been proposed for child development, the authors argued that musical practice has a strong effect on executive functions, indirectly improving a wide range of mental processes. In the other study, individualized piano lessons were given to non-musicians aged from 60 to 85 years (Bugos et al., 2007). After 6 months of lessons, those participants who had been assigned to piano lessons exhibited significant improvements in executive tasks (Trail Making Test and WAIS codes).

To summarize, socio-cognitive lifestyle behavior is one of the determinants that influence the quality of cognitive aging, through the potentiation of brain and cognitive reserve. We think that theoretical arguments (e.g., ecological nature of musical training and "chemical substrate" of musical experiences) as well as empirical results from studies conducted with children and adults should encourage work aimed at validating the use of musical practice in the field of aging [as done by Hanna-Pladdy and MacKay (2011) and Bugos et al. (2007)].

#### **CONCLUSION**

To conclude, music is built on and constrained by the human brain and its cognitive capacities. Becoming a musician places a heavy demand on these capacities by engaging them simultaneously and in a coordinated manner. Through regular practice, behavioral improvements are sustained by anatomical and functional brain changes, as well as by the application of strategic processes. Transfers occur from regular musical practice to non-musical skills, such as language, that rely upon the same neural resources and cognitive mechanisms. Moreover, some authors argue that, during childhood, music lessons act as a particularly comprehensive source of environmental enrichment for promoting general cognitive development, perhaps through the potentiation of executive functions.

During aging, the main cognitive difficulties seem to concern executive processes (Verhaeghen, 2011); there is therefore good reason to speculate that musical practice could also have a positive influence on cognition in this period of life. In fact, preliminary studies focusing on musical practice and cognition in elderly subjects tend to confirm this assumption (Bugos et al., 2007; Hanna-Pladdy and MacKay, 2011), but further well-controlled, comparative studies are needed to (1) confirm that engagement in musical activities during life has a positive influence on cognitive efficiency during aging, (2) specify when, how, and under which conditions this influence occurs, and (3) compare this influence with that of other leisure activities or cognitive intervention programs. If these investigations are conclusive, they would further support the status of music as a "transformative technology of the mind" that benefits the very brain which created it (Patel, 2010).

#### **REFERENCES**

of context dependant of objects and their relation. *Hum. Brain Mapp.* 33,

*Cogn. Neurosci.* 16, 1010–1021. doi: 10.1162/0898929041502706

C., and Lazeyras, F. (2013). Musical training yields opposite effects on grey matter density in cognitive versus sensorimotor networks. *Brain Struct. Funct.* doi: 10.1007/s00429-013-0504-z [Epub ahead of print].

professional musicians and nonmusicians by using in vivo magnetic resonance imaging. *J. Neuroradiol.* 29, 29–34. doi: 10.1093/cercor/bhs206

and electrophysiological study. *J. Cogn. Neurosci.* 17, 694–705. doi: 10.1162/089829053467532

133–146. doi: 10.1525/MP.2011.29.2.133

*Sci.* 34, 25–39. doi: 10.1007/s10072-012-0961-9


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 July 2013; accepted: 01 October 2013; published online: 22 October 2013.*

*Citation: Fauvel B, Groussard M, Eustache F, Desgranges B and Platel H (2013) Neural implementation of musical expertise and cognitive transfers: could they be promising in the framework of normal cognitive aging? Front. Hum. Neurosci. 7:693. doi: 10.3389/fnhum.2013.00693*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Fauvel, Groussard, Eustache, Desgranges and Platel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Neurophysiological constraints on the eye-mind link

#### *Erik D. Reichle<sup>1</sup>\* and Eyal M. Reingold<sup>2</sup>*

*<sup>1</sup> School of Psychology, University of Southampton, Southampton, UK*

*<sup>2</sup> Psychology, University of Toronto, Mississauga, ON, Canada*

#### *Edited by:*

*Merim Bilalic, University Tübingen; University Clinic, Germany*

#### *Reviewed by:*

*Raymond Bertram, University of Turku, Finland*

*Petar Milin, Eberhard Karls University Tuebingen, Germany*

#### *\*Correspondence:*

*Erik D. Reichle, School of Psychology, University of Southampton, Southampton SO17 1BJ, UK e-mail: erik.d.reichle@gmail.com*

Several current computational models of eye-movement control in reading posit a tight link between the eye and mind, with lexical processing directly triggering most "decisions" about when to start programming a saccade to move the eyes from one word to the next. One potential problem with this theoretical assumption, however, is that it may violate neurophysiological constraints imposed by the time required to encode visual information, complete some amount of lexical processing, and then program a saccade. In this article, we review what has been learned about these timing constraints from studies using ERP and MEG. On the basis of this review, it would appear that the temporal constraints are too severe to permit direct lexical control of eye movements without a significant amount of parafoveal processing (i.e., pre-processing of word *n* + 1 from word *n*). This conclusion underscores the degree to which the perceptual, cognitive, and motor processes involved in reading must be highly coordinated to support skilled reading, a par excellence example of a task requiring visual-cognitive expertise.

**Keywords: ERP, MEG, computational models, reading, saccades**

#### **INTRODUCTION**

Reading is one of the most complex tasks that we routinely perform. Part of this complexity reflects the fact that the visual acuity needed to identify the features of printed text seems to be largely limited to a 2° region of the central visual field, the fovea. Because of this limitation, readers must direct their eyes to the majority of words in a text (Rayner, 1998). And although the eyes normally move through the text so rapidly as to make the "task" of moving the eyes appear effortless, this impression is misleading because individual eye movements, called *saccades*, require time to program and execute (Becker and Jürgens, 1979) and are subject to both random and systematic motor error (McConkie et al., 1988). For those reasons, the programming and execution of saccades during reading is itself a highly skilled activity. Adding to this complexity is the fact that the "decisions" about when to move the eyes must be coordinated with the other cognitive processes involved in reading, such as the identification of words and the allocation of covert attention. Attempts to better understand these interactions have produced computational models that describe how lexical processing and attention are coordinated with the programming and execution of saccades to produce the patterns of eye movements that are observed with skilled readers (see Reichle et al., 2003).

Although the assumptions of these models are complex and varied, the two most successful of these models, *E-Z Reader* (Reichle et al., 1998, 2009) and *SWIFT* (Engbert et al., 2002, 2005), posit that the eyes are tightly coupled to the mind, with moment-to-moment "decisions" about when to move the eyes being controlled by lexical processing. For example, according to E-Z Reader, the completion of a preliminary stage of lexical processing (called the *familiarity check*) on a word initiates the programming of a saccade to move the eyes to the next word. And according to the SWIFT model, a saccadic program to move the eyes off of a word is initiated by an autonomous (random) timer that can be inhibited if the fixated word is difficult to process. In this way, both models can explain the ubiquitous finding that difficult (e.g., low frequency) words tend to be the recipients of longer fixations than easy (e.g., high frequency) words (Just and Carpenter, 1980; Inhoff and Rayner, 1986; Rayner and Duffy, 1986; Schilling et al., 1998; Rayner et al., 2004; Kliegl et al., 2006).
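The contrast between the two triggering schemes can be made concrete with a toy sketch. It is not the published E-Z Reader or SWIFT equations: every function name, parameter, and number below is a hypothetical placeholder, chosen only to show how each scheme can yield longer fixations on harder words.

```python
import random

SACCADE_PROGRAM_MS = 125.0  # hypothetical time needed to program a saccade


def ez_reader_like(difficulty, rng, base_check_ms=100.0):
    """Lexically triggered scheme (toy): saccade programming starts when a
    'familiarity check' finishes, and harder words take longer to check."""
    check_ms = base_check_ms * (1.0 + difficulty)  # hypothetical scaling
    jitter_ms = rng.uniform(-10.0, 10.0)           # hypothetical noise
    return check_ms + jitter_ms + SACCADE_PROGRAM_MS


def swift_like(difficulty, rng, inhibition_ms=60.0):
    """Timer-triggered scheme (toy): a random timer starts saccade
    programming, but a difficult fixated word delays (inhibits) it."""
    timer_ms = rng.uniform(80.0, 160.0)            # hypothetical timer range
    return timer_ms + inhibition_ms * difficulty + SACCADE_PROGRAM_MS


def mean_fixation(model, difficulty, n=1000, seed=0):
    """Average simulated fixation duration (ms) over n toy trials."""
    rng = random.Random(seed)
    return sum(model(difficulty, rng) for _ in range(n)) / n


for name, model in [("E-Z Reader-like", ez_reader_like), ("SWIFT-like", swift_like)]:
    easy = mean_fixation(model, difficulty=0.0)  # e.g., high-frequency word
    hard = mean_fixation(model, difficulty=1.0)  # e.g., low-frequency word
    print(f"{name}: easy {easy:.0f} ms, hard {hard:.0f} ms")
```

In both toy schemes the harder word receives the longer mean fixation, but for different reasons: in the first the lexical stage itself gates the saccade program, whereas in the second lexical difficulty only postpones a program that the timer would otherwise start on its own.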

Because models of eye-movement control have been used to both simulate the "benchmark" findings related to eye movements in reading and examine various theoretical issues related to reading (e.g., how aging affects readers' eye movements; Laubrock et al., 2006; Rayner et al., 2006), the models represent serious attempts to explain the *eye-mind link*, or interface between lexical processing, on one hand, and eye-movement control, on the other. To the extent that they are successful in this capacity, however, the models raise an important question: How can something as slow as lexical processing mediate the decisions about when to move the eyes? To fully appreciate the paradoxical nature of this question, consider that, although fixation durations are quite variable during reading, occasionally being as short as 50 ms or as long as 800 ms, most are 200–250 ms in duration (Rayner, 1998). Because of this, and because of the fact that some non-trivial amount of time is required to program a saccade (Becker and Jürgens, 1979), it is not immediately obvious how there can be enough time available during each fixation to allow lexical processing to intervene in the decisions about when to move the eyes.

In the remainder of this article, we will attempt to resolve this paradox by reviewing what has been learned from neurophysiology studies about the time course of those processes that are known to play important functional roles in reading and, according to models like E-Z Reader and SWIFT, in eye-movement control during reading. The studies that will be reviewed (see **Table 1**) employ two basic methods, *event-related potentials* (*ERP*s) and *magnetoencephalography* (*MEG*), to examine the times required to propagate visual information from the eyes to the brain, and to then visually encode and engage in lexical processing of printed words. It is important to note, however, that although these studies provide estimates of the times required to complete these processes, these estimates are inherently conservative because they correspond to the first statistically reliable effects of, for example, some variable (e.g., word frequency) on ERP markers of lexical processing. We will therefore provide the minimum, mean, and maximum values of each estimate.

#### **RESULTS**

#### **RETINA-BRAIN LAG**

The retina-brain lag is the time required for visual information to propagate from the eyes to the earliest cortical areas of the brain. The duration of this lag has been estimated using ERPs by having subjects attend to a visual stimulus (e.g., checkerboard pattern) that is suddenly displayed on a computer monitor and measuring when a *visual-evoked potential* (*VEP*) occurs relative to the onset of the stimulus. Because early cortical areas maintain a retinotopic mapping between spatial locations in the visual field and cortex (Courtney and Ungerleider, 1997), it is possible to localize the neural generators of the VEP and thereby confirm that it reflects early visual processing.

Using this procedure, Clark et al. (1995) found that the VEP had a mean latency of 40–45 ms post-stimulus onset. Using a similar paradigm but having subjects make recognition decisions about images of faces, George et al. (1997) found that VEPs differentiate previously seen versus novel faces as early as 50 ms post-stimulus onset, with these repetition effects peaking at around 80 ms. This finding has been replicated (Seeck et al., 1997; Mouchetant-Rostaing et al., 2000), providing additional evidence that 45–50 ms is sufficient for visual information to reach the brain. Finally, in an experiment designed to examine the time course of both early and late visual processing, Foxe and Simpson (2002) observed a 50–63 ms VEP onset latency when subjects viewed pairs of bilaterally displayed disks with the task of indicating whenever one was displaced from the other. Thus, as **Table 1** indicates, estimates of the retina-brain lag range from 47 to 73 ms across the studies that were reviewed, with a mean of 60 ms.

**Table 1 | Studies (listed chronologically) examining the time course of the retina-brain lag, visual encoding, and lexical processing, including their method, task and stimuli, and estimates (in ms) of when the processes occur.**

*Mean estimate for lexical processing: 147.6* [*126.6–171.8*]

#### **VISUAL ENCODING**

As with the retina-brain lag, the minimal time required to engage in visual encoding has been estimated by examining when differential effects related to the visual properties of stimuli that are suddenly displayed on a computer monitor are first discernable in the ERP record. For example, Van Rullen and Thorpe (2001) had subjects make categorization decisions about photographs of vehicles versus animals and found category-related differences in ERP components as early as 75–80 ms post-stimulus onset. Similarly, Foxe and Simpson (2002) found ERP components that could be localized to the infero-temporal, parietal, and dorsolateral-prefrontal regions (which have been implicated in high-level visual processing; Van Essen and DeYoe, 1995) were active by 70–85 ms post-stimulus onset, suggesting that these higher-level visual-processing regions can modulate visual processing in earlier regions via feedback in as little as 30 ms after processing begins in those earlier regions.

Similar estimates of the time course of visual encoding have also been reported in tasks involving (more global aspects of) lexical processing. For example, Hauk and Pulvermüller (2004) observed effects of word length on ERP components after 80–125 ms when subjects made lexical decisions about short and long letter strings. And using a similar methodology, Hauk et al. (2006) observed word-length effects within 90–100 ms. Two other studies (Assadollahi and Pulvermüller, 2001, 2003) demonstrated the generality of these results by examining the time course of visual encoding of printed words using MEG. In these studies, subjects first memorized a list of short and long high- and low-frequency words and then viewed a random sequence comprised of those words and new words with instructions to press a button whenever they saw a new word. The key findings related to visual encoding were effects of word length, which were evident after 90–120 ms in the first study and after 60–120 ms in the second, indicating that visual properties of the words (i.e., their length) are encoded in as little as 10–40 ms after visual information has been propagated from the eyes to the brain. Thus, as **Table 1** shows, the studies reviewed in this section collectively suggest that visual encoding occurs within 77.5–105 ms, with a mean of 91.3 ms.
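The summary range for visual encoding can be checked by hand from the six onset windows quoted in this section. The sketch below assumes, as the reported figures suggest, that the lower and upper bounds are averaged separately across studies and that the overall mean is the midpoint of those two averages; the variable and function names are illustrative only.

```python
# Onset windows (ms post-stimulus onset) quoted in the text for visual encoding.
visual_encoding_windows = {
    "Van Rullen and Thorpe (2001)": (75, 80),
    "Foxe and Simpson (2002)": (70, 85),
    "Hauk and Pulvermüller (2004)": (80, 125),
    "Hauk et al. (2006)": (90, 100),
    "Assadollahi and Pulvermüller (2001)": (90, 120),
    "Assadollahi and Pulvermüller (2003)": (60, 120),
}


def summarize(windows):
    """Average lower and upper bounds separately (assumed aggregation rule)
    and return (mean lower bound, midpoint, mean upper bound)."""
    lows = [low for low, _ in windows.values()]
    highs = [high for _, high in windows.values()]
    mean_low = sum(lows) / len(lows)
    mean_high = sum(highs) / len(highs)
    midpoint = (mean_low + mean_high) / 2
    return mean_low, midpoint, mean_high


mean_low, midpoint, mean_high = summarize(visual_encoding_windows)
print(f"range {mean_low:.1f}-{mean_high:.1f} ms, midpoint {midpoint:.2f} ms")
```

Under these assumptions the bounds come out at 77.5 and 105 ms with a midpoint of 91.25 ms, in line with the rounded 91.3 ms reported above.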

#### **LEXICAL PROCESSING**

For the purposes of this review, lexical processing will refer to mental operations that convert the visual representation of a word into its (abstract) orthographic form so that that information can be used to access that word's pronunciation and/or meaning. Unfortunately, attempts to determine the time course of lexical processing using neurophysiological methods have produced somewhat inconsistent results. For example, Sereno et al. (1998) conducted a seminal ERP experiment in which subjects made lexical decisions about letter strings that included high- and low-frequency words. (Because the frequency with which a word is encountered in text affects how rapidly its form and meaning can be accessed from memory, word-frequency effects are indicators that lexical processing is well underway; Hudson and Bergman, 1985.) Sereno et al. found that word frequency modulated ERP components after only 132–164 ms, suggesting that lexical processing occurs very rapidly. Hauk et al. (2006) observed word-frequency effects even earlier, by 100–120 ms.

This conclusion was bolstered by similar findings using both ERP and MEG. For example, the two MEG studies mentioned earlier (in relation to visual encoding) found a word-frequency effect (Assadollahi and Pulvermüller, 2001) and interaction between word frequency and length (Assadollahi and Pulvermüller, 2003) after 120–170 ms. Early lexical effects have also been observed in ERP experiments in which subjects read sentences that were displayed one word at a time: Penolazzi et al. (2007) observed a Frequency × Length interaction after 110–130 ms, and Sereno et al. (2003) observed a frequency effect after 132–192 ms. And similarly, Proverbio et al. (2004) observed word-frequency effects after 135–175 ms in an ERP experiment in which subjects detected target phonemes embedded in visually displayed words.

This evidence for rapid lexical processing must, however, be reconciled with results suggesting that such processing can be much less rapid. For example, Hauk and Pulvermüller (2004) had subjects make lexical decisions about letter strings that contained short and long high- and low-frequency words and found that ERP components were modulated by frequency after 150–190 ms. Similarly, Dambacher et al. (2006) recorded ERPs from subjects who read sentences and found word-frequency effects after 140–200 ms. Consequently, an important challenge for future investigations of the time course of lexical processing would be to isolate the methodological differences that produced the mixed pattern of results in the literature.

Because our goal is to better characterize the relationship between lexical processing and saccadic programming, the final two studies that will be reviewed are particularly important because they were explicitly designed to examine this relationship using ERP. In the first, Baccino and Manunta (2005) had subjects move their eyes to two peripherally-displayed words to make semantic-related judgments about those words. The key finding was that the frequency of the second word modulated ERP components after only 119–215 ms when the data were timelocked to the fixation on the first word, which was interpreted as evidence for rapid parafoveal lexical processing of the second word. In the second study, Reichle et al. (2011) had subjects move their eyes from centrally- to peripherally-displayed letter strings to indicate whether either was a non-word. The key finding was that the frequency of the central word modulated ERP components after only 102–162 ms when the data were time-locked to the onset of the saccade to the peripheral word, which was interpreted as evidence that an early stage of lexical processing initiates saccadic programming. These results, in combination with those of the other studies reviewed in this section, suggest that lexical processing is well under way by 126.6–171.8 ms, with a mean of 147.6 ms. However, it is important to note that, with the exception of Dambacher et al.'s (2006) experiment, these estimates were obtained using non-reading tasks that preclude normal parafoveal processing of upcoming words.

#### **DISCUSSION**

To better understand the theoretical implications of this review, it is instructive to superimpose the estimated process durations (**Table 1**) on a time-line corresponding to the amount of time available for cognitive processing during a single fixation of reading. **Figure 1A** thus shows the time courses of the retina-brain lag, visual encoding, and lexical processing aligned to the onset of a single 240-ms fixation. (Although this particular duration is arbitrary and ignores the issue of variability, it corresponds to the mean single-fixation duration observed on low-frequency words and is therefore a conservative estimate of the time available for lexical processing during most fixations; Reingold et al., 2012.)

As **Figure 1A** shows, the neurophysiological estimates suggest that, on average, 148 ms is required to visually encode a printed word and then complete some amount of lexical processing of that word. However, because the fixation is only 240 ms in duration, there is seemingly little time to complete all of the operations that are necessary to move the eyes off of the word 92 ms later. These operations (at a minimum) include the transmission of a signal to the oculomotor system to start programming a saccade, the actual programming of that saccade, and whatever afferent delay occurs in the brainstem circuitry prior to moving the eyes. The conclusion that so little time is available to complete these operations is seemingly at odds with eye-movement experiments suggesting that saccades require 125–200 ms to program (Becker and Jürgens, 1979; Rayner et al., 1983; Reingold et al., 2012). It is also at odds with models of eye-movement control in reading, which posit that lexical processing is the "engine" that causes the eyes to progress through the text (Reichle et al., 1998, 2009; Engbert et al., 2002, 2005). Our analysis of the time course of lexical processing thus poses a paradox if one is to maintain the position that the completion of some amount of lexical processing is what determines when the eyes move during reading.

**FIGURE 1 | The time course of processing during a single, 240-ms fixation on a word, including: (1) the propagation of information from the retina to brain (green); (2) visual encoding of the word features (blue); (3) lexical processing (purple); (4) saccadic programming (red); and (5) shifting attention from one word to the next (orange).** (Panel **A**) Neurophysiological estimates for the times required for the retina-brain lag, visual encoding, and lexical processing are indicated by the colored bars superimposed on the time line that is shown at the bottom of the panel, with the three numbers above each colored bar indicating the estimated minimum, mean, and maximal times to complete each respective process (e.g., the estimated minimal time needed for the retina-brain lag is 43 ms). Based on these estimates, there should be little time (92 ms) available for saccadic programming and whatever transmission delays are necessary, e.g., to transmit a signal about the state of lexical processing to the oculomotor system. (Panel **B**) If some amount of lexical processing of the fixated word is actually completed parafoveally, from the previously fixated word, then the amount of (foveal) lexical processing of the word being fixated is reduced (e.g., to 25 ms) and can thereby accommodate more realistic estimates of the time required to program saccades (e.g., approximately 124 ms). (Panel **C**) The time course of processing if one assumes direct lexical control of saccadic programming and the strict serial allocation of attention (e.g., see Reichle, 2011); as shown, the termination of whatever foveal lexical processing is necessary to initiate saccadic programming causes attention to shift to the next word, so that parafoveal lexical processing of that word can begin using visual information acquired from the fixated word. (Note that Panel **C** is meant to be theoretically neutral with respect to specific serial-attention models of eye-movement control, and lexical processing is thus shown as a single stage rather than, e.g., being divided into the two stages posited by E-Z Reader. However, the depicted time course maps onto the assumptions of E-Z Reader if: (a) the model's first stage of lexical processing corresponds to whatever lexical processing is completed prior to the initiation of a saccade, and (b) the model's second stage of lexical processing is subsumed in the time required to shift attention.)

The solution to this paradox is that a significant portion of the lexical processing of a word that must be completed to "trigger" saccadic programming is actually completed from the preceding word, using visual information that was acquired from the parafovea. How this happens is illustrated in **Figure 1B**, which is similar to **Figure 1A** except that lexical processing of the currently fixated word begins from the previously fixated word, so that only 25 ms of lexical processing of the fixated word is actually completed from that word. (Again, this precise value is arbitrary, ignores variability, and is only meant to provide an example.) Under this assumption, there is ample time (∼124 ms) for whatever neural transmission is required to signal the oculomotor system to program and then initiate a saccade. This hypothesis about the importance of parafoveal processing is consistent with survival analyses of fixation durations on high- and low-frequency words with versus without parafoveal preview: Word-frequency effects were discernable more than 110 ms earlier with than without preview (Reingold et al., 2012).
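The timing argument can be restated as simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the constants are the values quoted in the text (a 240-ms fixation, 148 ms for visual encoding plus some lexical processing, and 125 ms as the lower bound on saccadic programming). It computes how much of the fixation remains for saccadic programming and how much processing would have to be completed before the fixation begins, for example parafoveally, to close the gap.

```python
def time_left_for_saccade(fixation_ms, processing_ms):
    """Time remaining in the fixation once encoding and lexical processing end."""
    return fixation_ms - processing_ms


def shortfall(time_left_ms, saccade_program_ms):
    """Processing time that must be moved out of the fixation (0 if none)."""
    return max(0, saccade_program_ms - time_left_ms)


FIXATION_MS = 240      # single-fixation duration used in Figure 1
PROCESSING_MS = 148    # mean estimate quoted in the text
SACCADE_MIN_MS = 125   # lower bound on saccadic programming quoted in the text

left = time_left_for_saccade(FIXATION_MS, PROCESSING_MS)
gap = shortfall(left, SACCADE_MIN_MS)
print(f"time left: {left} ms; shortfall relative to {SACCADE_MIN_MS} ms: {gap} ms")
```

Under these numbers the remaining time falls short of even the fastest saccadic-programming estimate, which is the paradox that parafoveal preview is proposed to resolve.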

Finally, to make this hypothesis more concrete, **Figure 1C** shows the typical sequence of events that are posited to occur by eye-movement models in which attention is allocated to support the lexical processing of exactly one word at any given time (e.g., E-Z Reader or EMMA; see Reichle, 2011). As shown, the lexical processing of any given word is completed from two locations: from the previously fixated word and from a fixation on the word itself. Then, upon completing whatever lexical processing is necessary to initiate saccadic programming, attention shifts to the next word so that lexical processing of that word can begin using information acquired from the current fixation location. (Note that, according to the E-Z Reader model, the time required to shift attention also includes whatever additional time is needed to complete lexical access of the fixated word.)

Of course, the manner in which E-Z Reader instantiates eye-movement control places the most severe temporal constraints on lexical processing and its coordination with the oculomotor system because only one word is processed at a time. These constraints are significantly relaxed to the extent that multiple words are processed in parallel, as posited by the SWIFT model (Engbert et al., 2002, 2005). According to this alternative theoretical perspective, difficulty associated with processing the fixated word can inhibit the autonomous timer that otherwise initiates a saccadic program to move the eyes to a new viewing location. Recent studies on saccadic inhibition (Reingold and Stampe, 1999, 2000, 2002, 2003, 2004) and prior neurophysiological findings (Munoz et al., 1996) suggest that this type of hypothesized mechanism might produce a very rapid inhibitory effect in as little as 20–30 ms. Consequently, there seems to be ample time for an inhibitory mechanism to intervene in the decisions about when to move the eyes during reading (e.g., see Reingold et al., 2012 for a proposal of a hybrid eye-movement control mechanism incorporating both facilitatory and inhibitory lexical influences).

This review thus indicates that lexical processing is sufficiently rapid to permit direct control of the decisions about when to move the eyes during reading, but that such control also requires a substantial amount of lexical processing from the parafovea, perhaps more than has been acknowledged by reading researchers. This latter conclusion underscores the more basic claim about eye-movement control during reading being a highly skilled activity, one that requires a tremendous degree of coordination between the systems that support attention, word identification, and the programming and execution of saccades.

#### **REFERENCES**

visual evoked potential generators by retinotopic and topographic analyses. *Hum. Brain Mapp*. 2, 170–187. doi: 10.1002/hbm.460020306


linear analysis of ERP data. *Neuroimage* 30, 1383–1400. doi: 10.1016/j.neuroimage.2005.11.048


*Psychol. Rev*. 87, 329–354. doi: 10.1037/0033-295X.87.4.329


predictability on eye fixations in reading: implications for the E-Z Reader model. *J. Exp. Psychol. Hum. Percept. Perform*. 30, 720–732. doi: 10.1037/0096-1523.30.4.720


E-Z Reader to model the effects of higher-level language processing on eye movements during reading. *Psychon. Bull. Rev.* 16, 1–21. doi: 10.3758/PBR.16.1.1


effects in word recognition: evidence for early interactive processing. *Psychol. Sci.* 14, 328–333.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 May 2013; paper pending published: 28 May 2013; accepted: 23 June 2013; published online: 15 July 2013.*

*Citation: Reichle ED and Reingold EM (2013) Neurophysiological constraints on the eye-mind link. Front. Hum. Neurosci. 7:361. doi: 10.3389/fnhum.2013.00361*

*Copyright © 2013 Reichle and Reingold. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

### Assimilation of L2 vowels to L1 phonemes governs L2 learning in adulthood: a behavioral and ERP study

#### *Mirko Grimaldi<sup>1</sup>\*, Bianca Sisinni<sup>1</sup>, Barbara Gili Fivela<sup>1</sup>, Sara Invitto<sup>2</sup>, Donatella Resta<sup>1</sup>, Paavo Alku<sup>3</sup> and Elvira Brattico<sup>4,5</sup>*

*<sup>1</sup> Dipartimento di Studi Umanistici, Centro di Ricerca Interdisciplinare sul Linguaggio, Università del Salento, Lecce, Italy*

*<sup>2</sup> Laboratorio di Anatomia Umana e Neuroscience, Dipartimento di Scienze e Tecnologie Biologiche e Ambientali, Università del Salento, Lecce, Italy*

*<sup>3</sup> Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland*

*<sup>4</sup> Brain & Mind Laboratory, Department of Biomedical Engineering and Computational Science, Aalto University, Espoo, Finland*

*<sup>5</sup> Cognitive Brain Research Unit, Institute of Behavioral Sciences, University of Helsinki, Helsinki, Finland*

#### *Edited by:*

*Merim Bilalic, Alpen Adria University Klagenfurt, Austria*

#### *Reviewed by:*

*Antoine Tremblay, Dalhousie University, Canada*

*Olli Kalervo Aaltonen, University of Helsinki, Finland*

#### *\*Correspondence:*

*Mirko Grimaldi, Dipartimento di Studi Umanistici, Centro di Ricerca Interdisciplinare sul Linguaggio, Università del Salento, Piazza Angelo Rizzo 1, Lecce 73100, Italy e-mail: mirko.grimaldi@unisalento.it*

According to the Perceptual Assimilation Model (PAM), articulatory similarity/dissimilarity between sounds of the second language (L2) and the native language (L1) governs L2 learnability in adulthood and predicts L2 sound perception by naïve listeners. We performed behavioral and neurophysiological experiments on two groups of university students at the first and fifth years of the English language curriculum and on a group of naïve listeners. Categorization and discrimination tests, as well as the mismatch negativity (MMN) brain response to L2 sound changes, showed that the discriminatory capabilities of the students did not significantly differ from those of the naïve subjects. In line with the PAM model, we extend the findings of previous behavioral studies showing that, at the neural level, classroom instruction in adulthood relies on assimilation of L2 vowels to L1 phoneme categories and does not trigger improvement in L2 phonetic discrimination. Implications for L2 classroom teaching practices are discussed.

**Keywords: adult phoneme perception, mismatch negativity (MMN), foreign language acquisition, L2 classroom learning, event-related potentials, vowel perception**

#### **INTRODUCTION**

Learning a second language (L2) in adulthood challenges our brains. As mother tongue phoneme representations are formed in the brains of 6- to 12-month-old children (Werker and Tees, 1983; Kuhl et al., 1992; Cheour et al., 1998; Kuhl, 2008), non-native speech sounds become increasingly difficult to discriminate and L2 perception generally turns into a demanding task for learners (Iverson et al., 2003). This loss of sensitivity does not prevent L2 learning in adulthood (Flege, 1995). The extent of success may nonetheless depend on numerous variables, namely age of L2 learning, length of residence in an L2-speaking country, gender, formal instruction, motivation, language learning aptitude and amount of native language (L1) use (see Piske et al., 2001 for an overview). When L2 learners are immersed in an L2 environment, the contribution of age toward learning to perceive and produce L2 sounds occurs primarily through interactions with the amount of L1 use and the amount of L2 native speaker input received (Flege et al., 1995, 1997, 1999; Flege and Liu, 2001; Flege and MacKay, 2004; Tsukada et al., 2005; see Piske, 2007 for a critical review). However, when learners are immersed in an L1 environment and have reduced L2 exposure, primarily in a restricted setting (namely, with little or unsystematic conversational experience with native speakers), learning L2 phonemes at the native speaker level becomes very difficult if not impossible. According to Best and Tyler (2007: 16), the perception of L2 in these individuals receiving only formal instruction in adulthood may resemble that of L2 naïve listeners. In other words, they are functional monolinguals, not actively learning or using L2 when compared with L2-learning listeners, i.e., learners who are in the process of actively learning an L2 to achieve functional, communicative goals within a natural L2 context.

Cross-linguistic and L2 speech perception studies have shown that adult learners of L2 have difficulty with both the perception and production of non-native phonological segments, i.e., consonants and vowels that either do not occur or are phonetically different in their L1 (see Flege, 2003 for a discussion). Indeed, it is commonly thought that a major determinant of L2 foreign accent is the underlying problem associated with the perception of L2 phonological structures. In turn, acquisition of phonetic contrasts involves not only the detection of differences in the acoustic signal but also the accessing of internalized categories, which in the brain are most likely associated with definite neural representations. Within the behavioral literature, there are two major theoretical frameworks on L2 speech learning in adulthood, the Speech Learning Model (SLM, Flege, 1995) and the Perceptual Assimilation Model (PAM, Best, 1995). The SLM has been primarily concerned with the ultimate attainment of L2 production and perception and mainly deals with highly experienced L2 learners immersed in an L2 environment, whereas the PAM is mainly interested in explaining the initial L2 perception of L2 learners through the non-native perception of naïve listeners, who are in fact functional monolinguals (but see Best and Tyler, 2007, for an extension to L2 learning). Both SLM and PAM posit that the degree of success listeners will have in perceiving non-native L2 sounds depends on the perceived relationship between phonetic elements found in the L1 and the L2 systems. These models make predictions about performance in non-native segmental perception based on the perceived distance between L1 and L2 sounds (Guion et al., 2000).

This study investigated the thus far little-studied L2 perception in functional monolinguals, by behaviorally and neurally testing the predictions posed by the PAM framework. The PAM predicts that if two non-native sounds are perceived as acceptable exemplars of two distinct native phonemes (Two-Category assimilation), their discrimination will be easy, while if both non-native sounds are perceived to be equally poor/good exemplars of the same native phoneme (Single-Category assimilation), their discrimination will be difficult. An intermediate discrimination is predicted when the two non-native sounds are both perceived as the same native sound but differ in goodness rating (Category-Goodness assimilation). Finally, when an L2 category is perceived as more than one L1 phoneme and the other L2 category is perceived as a single native phoneme, a good discrimination is predicted (Uncategorized-Categorized assimilation). For predictions to be generated by PAM (or the SLM), cross-language phonetic distance data need to be obtained by means of behavioral experiments. The degree of perceptual distance between phonemes is usually examined using an identification and rating methodology. The foreign (or L2) sounds are first classified as instances of a phonetic category (or categories) in the listener's L1, then rated for goodness-of-fit to the L1 category.
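The four assimilation patterns just described amount to a small decision rule, which the following sketch spells out. It is an illustration only: the function name, the use of `None` for an uncategorized sound, and the `tolerance` threshold for deciding when two goodness ratings "differ" are assumptions introduced here, not part of the PAM as stated in the text.

```python
from typing import Optional, Tuple


def pam_prediction(cat_a: Optional[str], cat_b: Optional[str],
                   goodness_a: float = 0.0, goodness_b: float = 0.0,
                   tolerance: float = 0.5) -> Tuple[str, str]:
    """Return (assimilation type, predicted discrimination) for an L2 contrast.

    cat_a, cat_b: the L1 phoneme each L2 sound is assimilated to, or None if
    the sound is uncategorized (perceived as more than one L1 phoneme).
    tolerance: hypothetical cut-off for treating goodness ratings as different.
    """
    if cat_a is None and cat_b is None:
        # This case is not described in the passage above.
        return ("Uncategorized-Uncategorized", "not specified here")
    if cat_a is None or cat_b is None:
        return ("Uncategorized-Categorized", "good")
    if cat_a != cat_b:
        return ("Two-Category", "easy")
    if abs(goodness_a - goodness_b) > tolerance:
        return ("Category-Goodness", "intermediate")
    return ("Single-Category", "difficult")


print(pam_prediction("i", "u"))            # two distinct native phonemes
print(pam_prediction("a", "a", 4.5, 2.0))  # same phoneme, different goodness
print(pam_prediction(None, "a"))           # one sound uncategorized
```

In practice the inputs would come from the identification and goodness-rating data described above, and the threshold for a meaningful goodness difference would have to be justified empirically.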

Whereas the studies on L2 and non-native phoneme perception discussed above have used only behavioral techniques to address this question, we chose to adopt both behavioral (categorization and discrimination tests) and electrophysiological (event-related potential, ERP) techniques to examine the L2 perceptual abilities of our subjects. The ERP technique provides not only a millisecond precise measurement of information processing in the brain but also, depending upon the task, can allow one to disentangle automatic detection from attentional processes. ERP studies on L2 phoneme processing have used the oddball paradigm, alternating repetitive (standard) and infrequent (deviant) sounds (80–20% of occurrence respectively) while subjects are distracted from listening by a primary task (e.g., watching a silent movie), to measure the so-called mismatch negativity (MMN) response to L2 contrasts. The MMN is an ERP component, elicited by stimulus change at ≈100–250 ms, mainly generated in the auditory cortex and with additional generators in the inferior frontal cortex, reflecting the neural detection of a change in a constant property of the auditory environment (Picton et al., 2000; Näätänen et al., 2007). A large body of evidence supports the notion that the discriminative MMN process relies both on auditory sensory and categorical phonetic representations of speech stimuli and that these two codes are utilized in parallel by the pre-attentive change detection process reflected in the MMN component (Näätänen et al., 2001, 2011; Pulvermüller and Shtyrov, 2006). The MMN results from prediction violations on the basis of the repetitive standard presentation (Winkler and Czigler, 2012). It has been proposed that the standard presentation resembles perceptual learning during which hierarchical sensory levels of processing receive bottom-up sensory input from lower levels and receive top-down predictions from higher levels (Garrido et al., 2009). 
As a result of the repetition of the standard presentation, prediction errors are reduced by repetitive suppression or adaptation (Friston, 2005). A deviant presentation then leads to a violation of bottom-up prediction that is reflected in MMN generation (see also the discussion in Scharinger et al., 2012). Furthermore, the amplitude and peak latency of the MMN are directly correlated with the magnitude of the perceived change and, hence, the MMN is considered a measure of individual discrimination accuracy (see Amenedo and Escera, 2000; Näätänen, 2001; Sussman et al., 2013 for a critical discussion).
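A minimal sketch may help make the oddball logic concrete. It builds a stimulus sequence with 80% standards and 20% deviants and then measures the MMN as the most negative point of a deviant-minus-standard difference wave within the 100–250 ms window mentioned above. The function names, the plain-list data layout, and the difference-wave quantification itself are assumptions made for illustration; the passage does not describe the analysis pipeline, and real designs often add constraints (such as a minimum number of standards between deviants) that are omitted here.

```python
import random


def oddball_sequence(n_trials, p_deviant=0.2, seed=0):
    """Shuffled list of 'standard'/'deviant' labels with a fixed deviant share."""
    rng = random.Random(seed)
    n_dev = round(n_trials * p_deviant)
    seq = ["deviant"] * n_dev + ["standard"] * (n_trials - n_dev)
    rng.shuffle(seq)
    return seq


def mean_wave(epochs):
    """Average equal-length epochs sample by sample."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def difference_wave(deviant_epochs, standard_epochs):
    """Deviant average minus standard average."""
    dev = mean_wave(deviant_epochs)
    std = mean_wave(standard_epochs)
    return [d - s for d, s in zip(dev, std)]


def mmn_peak(diff, times_ms, window=(100, 250)):
    """(amplitude, latency) of the most negative sample inside the window."""
    candidates = [(v, t) for v, t in zip(diff, times_ms)
                  if window[0] <= t <= window[1]]
    return min(candidates)


seq = oddball_sequence(100)
print(seq.count("standard"), seq.count("deviant"))
```

Given per-trial epochs sorted by condition, `mmn_peak(difference_wave(dev, std), times_ms)` would return the amplitude and latency values that studies of this kind compare across groups and contrasts.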

The results of MMN studies, mainly focused on L2-learning listeners, are mixed. For instance, Winkler et al. (1999a) found that Hungarian adult late L2 learners who had been immersed for several years in the L2 context perceived non-native contrasts (in Finnish) as well as native speakers did, as evidenced by comparable MMN amplitudes elicited in both native Finns and fluent Hungarians in response to a Finnish across-category-boundary vowel contrast, as opposed to naïve Hungarians. The results by Winkler et al. (1999a) were not replicated in a population of advanced adult L2 learners (of English) who were not immersed: advanced Finnish students of English did not show an MMN to English phonemes comparable to the one elicited by native Finnish phonemes, suggesting that learning in the classroom environment may not lead to the formation of new long-term native-like memory traces (Peltola et al., 2003). These brain responses to new phonemes probably develop in children at a very fast pace, i.e., within three months of intensive exposure, as evidenced by MMN to L2 phoneme contrasts in Finnish children participating in French language immersion education (Cheour et al., 2002; Shestakova et al., 2003; Peltola et al., 2005). Again, however, subsequent works did not confirm these findings when the L2 was English, both for Finnish listeners (Peltola et al., 2007) and Japanese listeners (Bomba et al., 2011). Finally, Rinker et al. (2010) showed that, in bilingual Turkish–German kindergarten children growing up in Germany, the MMN response to the German vowel is less robust than in a German control group. Thus, immersion education and natural acquisition contexts did not guarantee native-like L2 vowel discrimination. Also, native-like L2 vowel discrimination is not guaranteed after a short training (50 min on 5 consecutive days) via associative/statistical learning, as shown by Dobel et al. (2009), who neurally investigated the perceptual acquisition of an L2 consonant (/φ/) in a group of adult German speakers using the MEG methodology. Instead of establishing a novel category, the subjects integrated /φ/ into the native category /f/, demonstrating that native categories are powerful attractors hampering the mastery of non-native contrasts. None of these studies, though, have tried to explain the L2 perceptual processes according to any of the well-established models for L2 learning. Hence they left open the question of which mechanisms govern the acquisition of L2 phonemes in adult learners from formal instruction and with restricted L2 exposure.

The present study aims at studying the behavioral and neural (MMN) correlates of L2 learning in adulthood while directly testing the hypothesis that these correlates index the perceptual mechanisms posited by the PAM. Specifically, our study addressed two questions: (i) Do the predictions generated by the PAM through behavioral methods hold when they are neurophysiologically investigated, namely, are the discrimination patterns predicted by the PAM for L2 naïve listeners also mirrored in MMN amplitudes or latencies? (ii) Is L2 classroom learning associated with the typology of L2 naïve listeners, as recently suggested by Best and Tyler (2007)? To answer these questions, we collected behavioral and electrophysiological data from two groups of Salento Italian (SI) undergraduate students of British English (BE) attending the first and the fifth year of the Foreign Languages and Literatures Faculty. Crucially, SI, the Italian variety spoken in Southern Apulia, has a system of five stressed vowels (i.e., /i, ε, a, ɔ, u/; Grimaldi, 2009; Grimaldi et al., 2010), in contrast to the richer vowel system of BE, which has, excluding diphthongs, eleven stressed vowels (see Stimuli). Therefore, for SI speakers, it could be relatively difficult to learn a complex L2 vowel system, supporting the idea that the L1 plays an important role and enables one to predict the relative difficulty of acquisition of a given L2 contrast (Iverson and Evans, 2007). First, we behaviorally tested the two groups of students by means of an identification test. On the basis of the results of this test, the contrasts /i:/-/u:/ and /æ/-/ʌ/ (for which the PAM framework predicted excellent and good discrimination, respectively) were selected for a behavioral discrimination test. In the ERP experiment, the groups of students were compared with a control group of listeners who were much less experienced with the L2, as their knowledge of English derived only from compulsory school studies. Moreover, as a control condition we introduced the L1 within-category contrast /ε/-[e], for which poor discrimination is predicted (cf. Phillips et al., 1995; Dehaene-Lambertz, 1997; Winkler et al., 1999b; see also Miglietta et al., 2013). These two vowels are phonologically contrastive in standard Italian and they are used to create lexical contrast (i.e., /ˈpεska/ "peach" vs. /ˈpeska/ "fishing") whereas SI has the phoneme /ε/ only. Consequently, for SI speakers these stimuli belong to the same category, as /ε/ is the underlying phoneme and [e] represents an *allophone* (generally transcribed between brackets), namely a within-category variant of the same phoneme.

#### **METHODS**

#### **BEHAVIORAL EXPERIMENTS**

#### *Subjects*

Two groups of 10 normal-hearing (tested prior to the experiment), right-handed, undergraduate male students of the Foreign Languages and Literatures Faculty voluntarily participated in the experiments. One group was enrolled in its first year (age 21.4 ± 1.71; 9.4 ± 1.34 years of English studies in formal context), whereas the other was in its fifth year (age 25.6 ± 1.98; 14.3 ± 2.11 years of English studies in formal context). As assessed by a questionnaire on language use, none of the subjects had participated in Erasmus programs in England or had been taught by native L2 teachers prior to attending university. University English classes are taught predominantly by native Italian speakers, although for at least 6 months per year (3–5 h per week) these students had also been attending lessons with native English lecturers. However, in the latter case, language classes amount to only a few hours per week and are based solely on formal lexical and morphosyntactic instruction; no systematic and explicit phonetic instruction or training is administered.

#### *Stimuli*

The stimuli consisted of the 11 BE monophthong vowels, i.e., /i:/, /ɪ/, /ε/, /æ/, /ʌ/, /ɑ:/, /ɒ/, /ɜ:/, /ɔ:/, /ʊ/ and /u:/ (Ladefoged, 2001). These sounds were produced by three male native BE speakers (age 47.3 ± 4.9; years in Italy: 22.3 ± 5.13), two of them coming from London, one coming from Birmingham. The speakers read a list of monosyllabic words with the phonemes /i:/, /ɪ/, /ε/, /æ/, /ʌ/, /ɑ:/, /ɒ/ and /ɜ:/ placed in a /p\_t/ context and the phonemes /i:/, /ɔ:/, /ʊ/ and /u:/ in an /s\_t/ context, for a total of 36 stimuli (3 speakers × 12 phonemes). Given that /i:/-/u:/ and /u:/-/ʊ/ were part of the discrimination task as control and target contrasts, respectively, /i:/, /ʊ/ and /u:/ needed to be recorded in the same consonant context. Thus, the extra context /s\_t/ was used for these three vowels because there is no English word with /u:/ in the /p\_t/ context. These stimuli were recorded in the CRIL soundproof room by a CSL 4500 at a sampling rate of 22.05 kHz and were segmented and normalized in peak amplitude using the software Praat 4.2. Each of the student groups performed two perceptual tests: the identification and the oddity discrimination test. All subjects were individually tested in the CRIL soundproof room using a computer and with sounds (set at a comfortable sound level) delivered via headphones, for a total duration of approximately 40 min.

#### *Identification test*

The aim of the identification test was to examine the perceived phonetic distance between the L1 and L2 sounds: i.e., to detect which L2 sounds are more similar/dissimilar to the L1 sounds and, consequently, are more difficult/easy to discriminate by perception (Flege and MacKay, 2004). The 36 stimuli were randomly presented 3 times, and subjects identified each of them in terms of one of the 5 SI vowels /i/, /ε/, /a/, /ɔ/ or /u/ by clicking on the computer screen. Students could not rehear a stimulus, but they were told to guess if they were unsure. Before performing the test, students received instructions orally and a training test of 10 stimuli was administered in the presence of the experimenter to ensure that the students understood the task. No subject was rejected on the basis of the training test because they all found the task easy to perform.
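Identification data of this kind are typically summarized as the percentage of trials on which each L2 vowel was labeled with each L1 vowel. The sketch below shows one way to tabulate such responses; the function name and the `(L2 vowel, chosen L1 vowel)` input format are assumptions for illustration, and the example responses are invented placeholders rather than data from the study.

```python
from collections import Counter, defaultdict


def identification_percentages(responses):
    """responses: iterable of (l2_vowel, chosen_l1_vowel) pairs.

    Returns {l2_vowel: {l1_vowel: percentage of that L2 vowel's trials}}.
    """
    counts = defaultdict(Counter)
    for l2, l1 in responses:
        counts[l2][l1] += 1
    return {
        l2: {l1: 100.0 * n / sum(c.values()) for l1, n in c.items()}
        for l2, c in counts.items()
    }


# Invented placeholder responses for a single L2 vowel.
demo = [("æ", "a"), ("æ", "a"), ("æ", "a"), ("æ", "ε")]
print(identification_percentages(demo))
```

A table built this way is what allows each L2 contrast to be assigned to one of the PAM assimilation types before discrimination is tested.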

#### *Oddity discrimination test*

The purpose of the oddity discrimination test was to measure the ability of listeners to discriminate L2 sounds. For each of the two contrasts, 8 change trials and 8 catch trials (32 total trials per student) were executed. The change trials were made up of 3 items, each one produced by one of the three BE speakers, with an odd item belonging to a different phonological category that subjects had to detect. The odd item was placed alternately in the first, second or third position in a nearly balanced way (Tsukada et al., 2005) to avoid response bias (Bion et al., 2006). Additionally, the three native English speakers produced the catch trials, where all of the items belonged to the same phonological category. These kinds of trials test subjects' ability to ignore the acoustical differences among stimuli belonging to the same phonological category. For instance, to test the contrast /i:/-/u:/ the change trials were /i:/-/i:/-/u:/, /i:/-/u:/-/i:/, /u:/-/i:/-/i:/, /u:/-/u:/-/i:/, /u:/-/i:/-/u:/ and /i:/-/u:/-/u:/, and the catch trials were /i:/-/i:/-/i:/ and /u:/-/u:/-/u:/. Subjects clicked on "1," "2," or "3" on the computer screen, corresponding to the position of the item they perceived as different, or on "none" if they perceived all items as equal. The results of this test, i.e., A scores, were calculated for each contrast by applying the formula of Snodgrass et al. (1985). These scores reduce the effects of response bias by combining the proportion of hits (i.e., correct selections of the odd item in the change trials) and the proportion of false alarms (i.e., incorrect selections of an odd item in the catch trials). An A score of 1.0 indicates perfect discrimination and an A score of 0.5 indicates null discrimination. Subjects were first given the instructions and then administered a training test in the presence of the experimenter to verify that they had understood the task. No subject was rejected on the basis of the training test because they all found the task easy to perform. This test was also executed by a control group of 10 male BE listeners (mean age: 20.5 ± 1.95), native speakers of the London variety.
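The formula itself is not reproduced in the text. As a minimal sketch, assuming the piecewise form in which this nonparametric sensitivity index (usually written A′) is commonly cited, the score can be computed from the hit and false-alarm proportions as follows. The function name `a_prime` and its argument names are illustrative, not taken from the study.

```python
def a_prime(hit_rate: float, fa_rate: float) -> float:
    """Nonparametric sensitivity index from hit and false-alarm proportions.

    Assumed form (commonly cited piecewise definition):
      H == FA: 0.5
      H >  FA: 0.5 + (H - FA)(1 + H - FA) / (4 H (1 - FA))
      H <  FA: 0.5 - (FA - H)(1 + FA - H) / (4 FA (1 - H))
    """
    h, fa = hit_rate, fa_rate
    if not (0.0 <= h <= 1.0 and 0.0 <= fa <= 1.0):
        raise ValueError("proportions must lie in [0, 1]")
    if h == fa:
        return 0.5  # chance-level (null) discrimination
    if h > fa:
        return 0.5 + ((h - fa) * (1 + h - fa)) / (4 * h * (1 - fa))
    return 0.5 - ((fa - h) * (1 + fa - h)) / (4 * fa * (1 - h))


# Hypothetical listener: 7 of 8 change trials correct, 1 of 8 catch trials wrong.
score = a_prime(7 / 8, 1 / 8)
```

Under this form, a listener with only hits and no false alarms obtains 1.0 and a listener whose hit and false-alarm proportions coincide obtains 0.5, matching the two anchor values given above.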

*Statistical analysis of oddity discrimination test results.* Discrimination accuracy (A score) was analyzed in a repeated-measures ANOVA with "contrast" (/æ/-/ʌ/ and /i:/-/u:/) as the within-subject factor and "group" (first and fifth year) as the between-subject factor. In all of the statistical analyses, the alpha level was set to *p* < 0.05, and type I errors were controlled for by decreasing the degrees of freedom with the Greenhouse–Geisser epsilon. *Post-hoc* tests were conducted by Fisher's least-significant difference (LSD) comparisons.

#### **ERP EXPERIMENT**

#### *Subjects*

The two groups of students involved in the behavioral experiments participated in the ERP sessions. Additionally, a third control group of normally hearing (tested prior to the experiment), right-handed subjects with only compulsory school education (10 subjects; age 25 ± 4.26; years of English studies in formal context 5 ± 2.9) performed the electrophysiological test. The control group was primarily composed of carpenters, plasterers, or unemployed, and each participant received a small monetary compensation for participating in the experiment. If one considers that in Italy a foreign language is usually taught starting from the last two years of primary school (when children are normally 8 years old), we can suppose that the student groups and the control group have a similar starting age of L2 exposure. However, the student groups have more formal exposure to the L2, particularly the fifth year group. In contrast, the control group's L2 exposure was limited to compulsory school, where they passively received impoverished lexical or morphosyntactic inputs by non-native L2 teachers for approximately 3 h per week. Additionally, in Italy foreign programs are dubbed, so that the exposure to foreign languages in informal contexts is very low. We also excluded that the ordinary listening of English music could represent an involuntary L2 training, as the acquisition of L2 in adulthood presupposes a strong motivation and a continuous use of L2 in different conversational contexts (cf. Gardner, 1991). All of the subjects signed the informed consent form. The local Ethics Committee approved the experimental procedure.

#### *Stimuli and procedure*

We used the same contrast pairs as in the oddity discrimination test, but the stimuli consisted of synthetic vowels whose duration was 350 ms (edited with Praat 4.2). Thus the contrasts tested were /i/-/u/ and /æ/-/ʌ/. A third contrast was added as a control, i.e., /ε/-[e], where the former is a mid-open vowel and the latter a mid-closed one. This is a within-category contrast for SI speakers and poor discrimination is predicted. In **Table 1**, we provide the acoustic characteristics of the stimuli. First formant frequency (F1) and second formant frequency (F2) are given in Hz.

To avoid confounding the effects of acoustic variations in natural utterances with the ERP responses, the stimuli for the ERP experiment were created using the Semisynthetic Speech Generation method (SSG, Alku et al., 1999), which mathematically models the functioning of the human voice production mechanism. To obtain raw material for the SSG synthesis for the ERP experiment, short words produced by a native male BE speaker (44 years old coming from London) and by a native male speaker of Standard Italian (45 years old, coming from Florence) were recorded in a soundproof room using a Sennheiser MKH 20 P48 high-frequency condenser, omnidirectional microphone, and a response frequency of 20–20,000 Hz, and further processed with a sampling frequency of 22050 Hz and a resolution of 16 bits. Signal sections corresponding to the desired vowels to be synthesized were cut from the recorded words. From these selected sections, the corresponding vocal tract filters were computed with SSG using digital all-pole filtering (Oppenheim and Schafer, 1989) of 22.

The three contrasts /æ/-/ʌ/, /i/-/u/ and /ε/-[e] were presented in separate blocks lasting 15 min each, each with 86% frequency of occurrence (582 trials) for the standard stimulus (the first vowel of each pair listed above) and 14% frequency (114 trials) for the deviant stimulus (the second vowel of each pair). The order of presentation was pseudo-randomized, in that a deviant stimulus was never presented before three standards. The interstimulus interval was 750 ms. During the EEG recording, participants sat in a comfortable armchair and were instructed to watch a silent movie while paying no attention to the stimuli, which were binaurally presented in a soundproof room through loudspeakers at 65/70 dB.
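One way to build such a block is sketched below. It assumes one reading of the pseudo-randomization constraint, namely that every deviant is preceded by at least three consecutive standards; the function name, the `min_gap` and `seed` parameters and the "S"/"D" labels are illustrative assumptions, not details reported for the experiment.

```python
import random


def oddball_sequence(n_standard=582, n_deviant=114, min_gap=3, seed=0):
    """Return a list of 'S'/'D' labels with >= min_gap standards before each deviant."""
    rng = random.Random(seed)
    spare = n_standard - n_deviant * min_gap
    if spare < 0:
        raise ValueError("not enough standards to satisfy the constraint")

    # Randomly distribute the spare standards over the slots before each
    # deviant, plus one trailing slot after the last deviant.
    extra = [0] * (n_deviant + 1)
    for _ in range(spare):
        extra[rng.randrange(n_deviant + 1)] += 1

    sequence = []
    for i in range(n_deviant):
        sequence.extend(["S"] * (min_gap + extra[i]))
        sequence.append("D")
    sequence.extend(["S"] * extra[n_deviant])
    return sequence


block = oddball_sequence()
```

With the trial counts given above, 342 of the 582 standards are needed to satisfy the constraint and the remaining 240 are placed at random, so the spacing between deviants stays unpredictable.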

#### *Electrophysiological recordings*

**Table 1 | Values of the first formant (F1) and the second formant (F2) given in Hz and Euclidean distances of the stimulus contrasts utilized in the ERP experiment.**

The EEG was recorded from the scalp using a 64-electrode Ag/AgCl cap (BrainCap, Brain Products) with a sampling frequency of 500 Hz. Eye movements were monitored with electrodes attached at the top and the bottom of the left eye and at the top of the right eye. The reference electrodes were attached on the ear lobes. Impedance was kept under 15 kΩ. The signal was off-line filtered (0.5–50 Hz, 24 dB), and the threshold for artifact rejection was set at ±125 μV. The numbers of trials accepted after artifact rejection are reported in **Table 2**. Each standard following a deviant was removed from the averaging. The ERP epochs included a pre-stimulus interval of 100 ms, used for baseline correction, and lasted until 450 ms.
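The epoching parameters above can be illustrated with a short NumPy sketch for a single channel. It is not the processing pipeline used in the study: the function name, the order of operations (baseline correction before amplitude rejection) and the input layout are assumptions made for illustration, and filtering and the removal of standards following a deviant are left out.

```python
import numpy as np

FS_HZ = 500          # sampling frequency reported above
PRE_MS = 100         # pre-stimulus baseline interval
POST_MS = 450        # end of the epoch relative to stimulus onset
REJECT_UV = 125.0    # absolute amplitude threshold for rejection


def extract_epochs(eeg_uv, onsets, fs=FS_HZ):
    """Cut baseline-corrected epochs from a 1-D signal (in microvolts).

    onsets: stimulus onsets as sample indices (assumed input format).
    Returns an array of shape (n_accepted, n_samples_per_epoch).
    """
    pre = int(round(PRE_MS * fs / 1000))
    post = int(round(POST_MS * fs / 1000))
    kept = []
    for onset in onsets:
        if onset - pre < 0 or onset + post > len(eeg_uv):
            continue  # epoch would run past the recording
        epoch = np.asarray(eeg_uv[onset - pre:onset + post], dtype=float).copy()
        epoch -= epoch[:pre].mean()          # baseline correction
        if np.max(np.abs(epoch)) > REJECT_UV:
            continue                         # artifact rejection
        kept.append(epoch)
    if not kept:
        return np.empty((0, pre + post))
    return np.vstack(kept)
```

At 500 Hz each epoch spans 275 samples, of which the first 50 form the baseline.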

#### *Statistical analysis of ERP data*

To quantify the MMN, we first identified the most negative peaks at Fz within the time interval 120–300 ms for each contrast and group from the grand-average difference waveforms. Subsequently, the individual MMN amplitudes were calculated by taking the mean values from the same 40-ms interval around the grand-average MMN peaks for each contrast and group obtained as described above. The significance of the individual MMN amplitudes at Fz was verified by paired *t*-tests against the zero baseline. To test our hypotheses on the effects of contrast types and language exposure on the MMN amplitudes measured at F3, F4, C3, C4, P3, and P4, we used repeated-measures ANOVAs and linear mixed-effect models with the between-subject factor Group (first year students, fifth year students and control group) and the within-subject factors Language (the within-category contrast /ε/-[e] and the English pairs /i/-/u/ and /æ/-/ʌ/), Contrast (/i/-/u/, /æ/-/ʌ/, and /ε/-[e]), Frontality (frontal, central, and parietal electrodes) and Laterality (right or left hemisphere). We also extracted the individual peak latencies of the MMN response recorded at Fz by searching for the most negative peak within the time interval 120–300 ms for each subject and each condition. For testing the hypotheses on the MMN peak latencies, an ANOVA similar to the one above (with Group, Language and Contrast as factors) was conducted, but without the two electrode factors. For all statistical tests, the alpha level was set to *p* < 0.05. Type I errors were controlled for by decreasing the degrees of freedom with the Greenhouse–Geisser epsilon (original degrees of freedom are reported) or by adding subjects as a random effect, as random intercepts or random slopes, when appropriate as assessed by the Bayesian information criterion in a linear mixed-effect model. The difference threshold for accepting or rejecting a more complex model was set to 4. *Post-hoc* tests were conducted by Fisher's least-significant difference (LSD) comparisons.
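The two-step amplitude measure described above can be sketched as follows for one group and one contrast. The sketch assumes that the 40-ms window is centered on the grand-average peak and that the difference waveforms (deviant minus standard, at Fz) are already available as an array; the function and argument names are illustrative.

```python
import numpy as np


def mmn_amplitudes(diff_waves, times_ms, search=(120, 300), window_ms=40):
    """Grand-average peak latency and individual mean amplitudes around it.

    diff_waves: array (n_subjects, n_samples) of difference waveforms.
    times_ms:   array (n_samples,) of sample times relative to onset.
    """
    diff_waves = np.asarray(diff_waves, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)

    # Step 1: most negative point of the grand average in the search range.
    grand = diff_waves.mean(axis=0)
    idx = np.flatnonzero((times_ms >= search[0]) & (times_ms <= search[1]))
    peak_ms = times_ms[idx[np.argmin(grand[idx])]]

    # Step 2: per-subject mean over a window centered on that peak.
    half = window_ms / 2
    in_window = (times_ms >= peak_ms - half) & (times_ms <= peak_ms + half)
    return peak_ms, diff_waves[:, in_window].mean(axis=1)
```

Because the window is fixed by the grand average, every subject in a group is measured over the same samples, whereas the individual peak latencies described above are searched separately for each subject.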

**Table 2 | The average number of accepted standard (stand) and deviant (dev) trials for each contrast and each group (control group, first year students, fifth year students).**


*The percentages with respect to the total number of trials are also given in parentheses.*

### **RESULTS**

#### **IDENTIFICATION TEST**

The identification test results were considered in terms of the percentage of identification of BE phonemes with respect to the SI ones. The percentages indicate the frequency with which L1 SI vowels were used to classify the L2 BE vowels. The percentages of identification obtained by first (I) and fifth (V) year students are summarized in **Table 3**.

The percentages of identification of the L2 phonemes with the L1 phonemes are very useful for understanding how the former are perceived and categorized with respect to the latter. The L2 phonemes associated with an L1 phoneme with an identification percentage ≥ 80% were considered consistently identified with that L1 phoneme, and only that identification was taken into account. Conversely, those L2 phonemes associated with two or more L1 phonemes (identification percentage < 80%) were considered as not consistently assimilated, and the first two identifications were taken into account.
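The consistency criterion can be written as a small decision rule. The sketch below is illustrative only: the function name and return labels are assumptions, and the percentages in the example are invented to show the two outcomes; they are not the values of Table 3.

```python
def assimilation_pattern(percentages, threshold=80.0):
    """Apply the 80% consistency criterion to one L2 vowel.

    percentages: dict mapping each L1 response category to its
    identification percentage for this L2 vowel.
    """
    ranked = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
    top_label, top_pct = ranked[0]
    if top_pct >= threshold:
        # Consistently identified: keep only the dominant L1 category.
        return "consistent", [top_label]
    # Not consistently assimilated: keep the two most frequent categories.
    return "not consistent", [label for label, _ in ranked[:2]]


# Invented percentages, for illustration only.
example_consistent = assimilation_pattern({"a": 92.0, "ε": 8.0})
example_split = assimilation_pattern({"a": 55.0, "o": 40.0, "u": 5.0})
```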

The data summarized in **Table 3** show that both the first and the fifth year students adopted the same assimilation strategies, albeit with slightly different percentages. According to the identification consistency threshold defined above, the results depict the following scenario: /æ/ was consistently assimilated to the native phoneme /a/; /ʌ/ was identified as /a/ or /o/, and so was not consistently assimilated to either of these two native phonemes. Finally, /i:/ and /u:/ were each consistently identified with the native phonemes /i/ and /u/, respectively. In fact, BE /i:/ and /u:/ (see **Table 1**) share some formant features with SI /i/ (F1 326, F2 2244) and /u/ (F1 368, F2 867) (Grimaldi, 2009) and consequently are perceived by SI listeners as their native counterparts.

According to the PAM typologies of assimilation, the vowels /æ/, /ʌ/, /i:/ and /u:/ can be grouped into two contrasts of L2 vowels (see **Table 3**): (i) the contrast /æ/-/ʌ/ falls into the Uncategorized-Categorized assimilation, for which good discrimination is predicted, as the non-native vowel /æ/ is consistently assimilated to a native phoneme (/a/), whereas the other vowel /ʌ/ is not categorized with any native phoneme; (ii) the contrast /i:/-/u:/ falls into the Two-Category assimilation, for which excellent discrimination is predicted, as the two vowels have been consistently identified with two different native phonemes, i.e., /i/ and /u/. The discrimination ability of the two groups of students for these contrasts was further tested with the oddity discrimination test.

#### **ODDITY DISCRIMINATION TEST**

The repeated-measures ANOVA on A scores (**Table 4** and **Figure 1**) did not yield differences between the two groups [*F*(1, 18) = 0.40, *p* > 0.05, η<sub>p</sub><sup>2</sup> = 0.02], but it yielded a significant effect of contrast [*F*(1, 18) = 18.24, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.50]. The *post-hoc* analysis revealed that the contrast /i:/-/u:/ was discriminated with a higher A score than the contrast /æ/-/ʌ/. The interaction Group × Contrast was not significant [*F*(1, 18) = 0.26, *p* > 0.05, η<sub>p</sub><sup>2</sup> = 0.01].

#### **ERPs**

**Figures 2**–**4** show the grand-average difference waveforms for all groups and for each stimulus contrast (see also **Figure S1** in the Supplementary Material). The mean MMN amplitudes and peak latencies are displayed in **Table 5** and **Figure 5**.

**Table 4 | The A scores obtained by the first year group (I) and the fifth year group (V).**

*Standard deviations are in parentheses.*

For all conditions and for all groups, we obtained a significant MMN response. In the ANOVA, the MMN amplitude was modulated by Contrast at the margin of significance [*F*(2, 52) = 3.02, *p* = 0.05, η<sub>p</sub><sup>2</sup> = 0.10; this result corresponded to only marginal significance in the linear mixed-effects model with by-subject random intercepts, where by-stimulus random intercepts and by-subject random slopes for Contrast were tested for inclusion: *F*(2, 54) = 2.9, *p* = 0.07]. The *post-hoc* tests showed that there was a significant difference between the L2 contrast /æ/-/ʌ/ and the within-category contrast /ε/-[e] (*p* < 0.05) and a tendency toward a significant difference between /i/-/u/ and the within-category contrast /ε/-[e] (*p* = 0.06). Namely, the within-category contrast /ε/-[e] had the lowest amplitude, while the L2 contrasts /i/-/u/ and /æ/-/ʌ/ showed similar amplitudes. The MMN amplitude was also modulated by Frontality [*F*(2, 52) = 112.16, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.81; also replicated in the linear mixed-effects model: *F*(2, 400) = 2.4, *p* < 0.0001], and the *post-hoc* tests showed that the amplitudes were highest in the frontal area, then in the central and finally in the parietal area. Additionally, we found a modulation of the frontal MMN amplitudes by group expertise, with a significant interaction Group × Frontality [*F*(4, 52) = 4.56, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.26; confirmed also in the linear mixed-effects model: *F*(4, 400) = 10.7, *p* < 0.001]. This interaction derived from the larger MMN amplitudes at frontal electrodes to any stimulus found in the control group as compared with the fifth year students (*p* = 0.06).

Moreover, the significant interaction Contrast × Frontality [*F*(4, 104) = 3.38, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.15; this result was replicated in the linear mixed-effects model: *F*(4, 400) = 4, *p* = 0.004] confirmed that in the frontal area the within-category contrast /ε/-[e] had lower amplitudes than /i/-/u/ and /æ/-/ʌ/ (/i/-/u/ vs. /ε/-[e]: *p* < 0.05; /æ/-/ʌ/ vs. /ε/-[e]: *p* = 0.01; /i/-/u/ vs. /æ/-/ʌ/: *p* > 0.05). The typical fronto-central MMN scalp distribution was also confirmed by the significant interaction Frontality × Laterality [*F*(2, 52) = 4.48, *p* = 0.01, η<sub>p</sub><sup>2</sup> = 0.14; this result was not replicated, though, in the linear mixed-effect model: *F*(2, 400) = 1.6, *p* = 0.2], and the *post-hoc* tests showed that this pattern was present in both the right and left hemispheres. The amplitude of the MMN presented a hemispheric difference in the frontal area only, where it was larger over the right than the left hemisphere (cf. **Table 6** for the repeated-measures ANOVA results).

The MMN peak latency differed according to the vowel contrasts, as shown by the significant main effect of Contrast [*F*(2, 52) = 10.35, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.28] (cf. **Table 7** for all statistical results).

This effect, obtained with a general linear model with fixed effects, was also confirmed in a linear mixed-effects model of MMN peak latency as a function of Contrast with by-subject random intercepts, where by-stimulus random intercepts for Contrast were tested for inclusion (by-subject random slopes were not included, since they did not improve the model fit according to the Bayesian information criterion). In this more generalizable mixed-effects model, too, the main effect of Contrast reached significance [*F*(2, 52) = 11.2, *p* < 0.001]. In *post-hoc* tests, the contrast /i/-/u/ evoked a faster MMN than the contrast /æ/-/ʌ/ (*p* = 0.01) and the within-category contrast /ε/-[e] (*p* < 0.001), and in turn the contrast /æ/-/ʌ/ evoked a faster MMN than the contrast /ε/-[e] (*p* < 0.05).

### **DISCUSSION**

This study tested whether the L2 discrimination patterns predicted by the PAM for L2 contrasts are mirrored in the MMN amplitudes and peak latencies to the same contrasts. The behavioral findings suggest that the first and the fifth year students did not differ in their discrimination processes, notwithstanding their different classroom and educational backgrounds. In particular, these two groups of subjects exhibited excellent discrimination of /i:/-/u:/ (belonging to Two-Category assimilation) and moderate to good discrimination of /æ/-/ʌ/ (belonging to Uncategorized-Categorized assimilation). The findings obtained in the behavioral experiments are in accordance with the PAM predictions, as the PAM framework foresees excellent discrimination of /i:/-/u:/ and moderate-to-good discrimination of /æ/-/ʌ/.

*[Figure captions, partially recovered: grand-average difference waveforms for the first year students, the fifth year students (red dashed line) and the control group (black solid line) in response to the contrasts /i/-/u/, /æ/-/ʌ/ and /ε/-[e]. Voltage maps for the groups are plotted at the MMN peaks of the grand-average waveforms, referenced to the algebraic mean of the electrodes.]*

*Standard deviations are given in parentheses.*

**Table 6 | Degrees of freedom (***df***),** *F* **and** *p* **values of the repeated measures ANOVA performed for the MMN amplitudes.**

**Table 7 | Degrees of freedom (***df***),** *F* **and** *p* **values of the repeated measures ANOVA performed for the MMN latencies.**

Notably, PAM assimilation types describe the possible perceptual outcomes of first contact with an unfamiliar phonological system and its phonetic patterns. Hence, PAM assimilation types predict how naïve listeners will identify and discriminate non-native phonological contrasts. When good or excellent discrimination is predicted, this does not mean that L2 listeners are able to differentiate phonetic and phonological patterns in non-native stimuli, but only that they can easily recognize the acoustic deviations of the unfamiliar phones from their L1 phonemes (Best and Tyler, 2007). According to Best and Tyler (2007), this is a starting condition that may or may not evolve into the formation of L2 phonetic and phonological categories during the acquisition process, depending on numerous variables, i.e., age of L2 learning, length of residence in an L2-speaking country, gender, formal instruction, motivation, language learning aptitude and amount of native language (L1) use (Piske et al., 2001). The current behavioral findings from both the identification and discrimination tests confirmed in perception those obtained in production in Suter's (1976) seminal work, according to which formal instruction was a factor that did not greatly contribute to the improvement of pronunciation. Suter's study showed that the pronunciation of students does not necessarily improve during their university education. Within the PAM and the SLM framework, supportive behavioral evidence concerning both perception and production was also provided by Simon and D'Hulster (2012). Indeed, L2 university experience in Dutch-speaking learners of English did not have an important effect on their production performance. That is, learners who were almost at the end of their university studies did not produce the English vowel contrast /ε/-/æ/ in a significantly more native-like way than learners who had only just begun their university studies in English. In parallel, and in line with PAM, Simon and D'Hulster (2012) found that in perception both inexperienced and experienced learners discriminated the vowel contrast /ε/-/æ/ similarly, since they displayed a Category-Goodness assimilation, for which intermediate discrimination is predicted (Best and Tyler, 2007).

In the ERP experiment we introduced a control group of listeners with English knowledge derived only from compulsory school, and thus much less experienced than the student groups. Furthermore, we introduced a third contrast as a control, i.e., the L1 within-category contrast /ε/-[e]. Based on the vowel space of SI, spoken by our subjects (cf. Grimaldi, 2009 and **Table 1**), we predicted that those two vowels should be perceived as good exemplars of the same native phoneme /ε/. Hence, we expected difficult discrimination for that contrast (Phillips et al., 1995; Dehaene-Lambertz, 1997; Winkler et al., 1999b). Indeed, our electrophysiological results confirmed that in all subjects the two L2 contrasts, /i/-/u/ and /æ/-/ʌ/, elicited larger MMN amplitudes than the L1 within-category contrast /ε/-[e] (cf. **Table 6**). In line with PAM predictions, this finding indicates that our subjects discriminated the two non-native contrasts well.

MMN peak latencies, on the other hand, were modulated by the contrast type: the contrast /i/-/u/ elicited a faster MMN than the contrast /æ/-/ʌ/ and the within-category contrast /ε/-[e]; in turn, the contrast /æ/-/ʌ/ evoked a faster MMN than the contrast /ε/-[e]. This result reflected the acoustic distances between the stimuli (see **Table 1**), which were smallest for the within-category contrast /ε/-[e] and largest for the L2 contrast /i/-/u/. Accordingly, the MMN peak latency steadily decreased with increasing acoustic deviation (cf. Näätänen et al., 1997). Moreover, the behavioral findings showed that the /i/-/u/ contrast is better discriminated than the /æ/-/ʌ/ contrast. Such fine mirroring of the behavioral discrimination performance by the MMN peak latencies suggests that the perceptual processes manifested by our subjects are influenced by stimulus representations containing mainly auditory (sensory) information.
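As an illustration of the acoustic distance measure, the Euclidean distance between two vowels in the F1–F2 plane can be computed as below. The sketch assumes an unweighted distance in Hz, and, since the stimulus values of Table 1 are not reproduced here, it uses the SI /i/ and /u/ formant values quoted in the Results (Grimaldi, 2009); the function and variable names are illustrative.

```python
import math


def formant_distance(vowel_a, vowel_b):
    """Unweighted Euclidean distance between two (F1, F2) pairs in Hz."""
    return math.hypot(vowel_a[0] - vowel_b[0], vowel_a[1] - vowel_b[1])


# SI native vowels as quoted in the Results: (F1, F2) in Hz.
SI_I = (326.0, 2244.0)
SI_U = (368.0, 867.0)

distance_i_u = formant_distance(SI_I, SI_U)
```

For this pair the distance is dominated by the F2 difference (1377 Hz against 42 Hz in F1), which is why a front-back contrast such as /i/-/u/ ranks as the acoustically largest of the three.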

Furthermore, the MMN peaked at frontal electrodes, was minimal over supra-temporal regions, and was right lateralized. This can shed further light on the nature of the perceptual processes of our subjects (cf. Näätänen et al., 1993; Rinne et al., 2000; Deouell, 2007). Indeed, the MMN generators are usually left lateralized over supra-temporal regions for speech stimuli, whereas the acoustical MMN is bilaterally generated, suggesting that the neural phoneme traces are located in the left auditory cortex (Näätänen et al., 1997; Rinne et al., 1997; Shestakova et al., 2002; Pulvermüller et al., 2003; Shtyrov et al., 2005; see Näätänen et al., 2007 for a discussion). Consequently, the similarity in MMN amplitudes between the groups and the predominant frontal right hemispheric activation suggest a discrimination of auditory sensory information rather than permanent phoneme traces.

Overall, these results confirmed our view based on PAM predictions, namely that both our student groups responded to L2 contrasts as they assimilate them to L1 phonemes, similarly to L2 naïve listeners. If native-like L2 perceptual abilities had emerged, we would have found significant differences in the MMN amplitude and peak latency responses between the three groups, which was not the case. However, we did find a slight difference in the MMN topography between the groups, although irrespective of the stimulus category: at the frontal electrodes the control group showed more negative MMN amplitudes than the fifth year group of students (**Figures 2**–**4**). This effect most likely derives from the overlap of the attention-related N2b component with the MMN response (Näätänen, 1992; Escera et al., 1998, 2000), so that the alternation of the L2 standard and deviant stimuli produced more attention-modulated neural processing in the less experienced subjects than in those more experienced with those speech sounds in general (Näätänen, 1990; Sussman et al., 1998). However, this effect was observed for all stimuli and was not modulated by the sound category; hence, it is not sufficient on its own to claim neuroplasticity to L2 sounds in the student groups.

Our findings suggest that the amount and the quality of classroom inputs received by our students might be insufficient to form long-term traces of the L2 sounds in their auditory cortex, as indexed by the MMN. This picture is consistent with earlier studies on Finnish children participating in English immersion education and on advanced adult classroom Finnish learners of English (Peltola et al., 2003, 2007) where no MMN traces were found for the development of a new L2 vowel category. Also, the same scenario emerged in studies on limited passive training (Dobel et al., 2009) where MEG data showed that L1 phonemic categories are powerful attractors in that they absorb the non-native stimulus, which is a considerable stumbling block on the path to the mastery of non-native contrasts. Based on these findings, the authors proposed that the maturation of new native-like memory traces is associated with the authenticity of the learning context. However, none of these studies have tested these processes within a theoretical framework on L2 speech learning in adulthood.

### **CONCLUSIONS AND IMPLICATIONS FOR FUTURE WORKS**

Our study provides, for the first time, an electrophysiological confirmation of the PAM predictions. Specifically, our results confirm that the PAM framework is able to make predictions on non-native speech perception by L2 listeners who have not actively learned an L2 to achieve functional, communicative goals, and that this type of learner includes L2 classroom learners (Best and Tyler, 2007: 16). In fact, foreign language acquisition usually happens in a pervasive L1 setting (where L2 pronunciation receives little attention) and does not extend much outside the classroom: it often employs formal instruction on lexical and grammatical information and lacks intensive perceptual and pronunciation training (Best and Tyler, 2007). When spoken in the classroom, the L2 is often uttered by L1-accented teachers or, at best, by speakers of diverse L2 varieties, which interferes with perception even for native listeners of the L2 (Bundgaard-Nielsen and Bohn, 2004). Thus, foreign language acquisition is a fairly impoverished context for L2 learning. Indeed, starting from Suter's (1976) work, behavioral studies examining the influence of formal instruction on the acquisition of L2 perception and production skills have not produced favorable results for language teachers (Flege et al., 1995). The amount of formal input received by L2 students has been shown to have a rather limited or null influence, except when specific training in the perception and production of L2 sounds, or a substantial amount of high-quality input over a period of many years, is administered (see Piske et al., 2001; Simon and D'Hulster, 2012, and the literature cited therein). Thereby, we confirmed and extended the findings of previous behavioral studies (Flege and Fletcher, 1992; Flege, 1995; Flege et al., 1999) by showing at the neural level that long-term L2 classroom learning has no influence on the degree of L2 perception and foreign accent. Further studies might, however, utilize novel methods of signal processing to investigate whether differences in neural processing depending on classroom learning might be hidden in narrow EEG frequency bands, in trial-to-trial variations or in cortico-cortical transfer of information (e.g., Choi et al., 2013; Lieder et al., 2013), which could not be detected with the conventional approach adopted here.

Overall, this and earlier studies support the hypothesis that students in a foreign language classroom should particularly benefit from learning environments only where they: (i) receive a focused amount of high-quality input from L2 native teachers; (ii) use the L2 pervasively to achieve functional and communicative goals; and (iii) receive intensive training in the perception and production of L2 sounds in order to reactivate the neuroplasticity of the auditory cortex (see the issues and studies discussed in Piske, 2007). In fact, recent behavioral and neurophysiological studies (Kraus et al., 1995; Pisoni and Lively, 1995; Tremblay et al., 1997, 1998; Tremblay and Kraus, 2002; Iverson et al., 2005; Ylinen et al., 2009; Zhang et al., 2009) suggest that the sensory resolution of phonetic features can be improved by targeted training, even in adults, and that new phonetic representations may be stably developed.

#### **ACKNOWLEDGMENTS**

The authors thank Maija Peltola, Anna Shestakova, Sari Ylinen and Friedemann Pulvermüller for their helpful suggestions. We also wish to thank Francesco Sigona and Chao Liu for their help with signal processing in various stages of the study and Jari Lipsanen and Enrico Ciavolino for their help in statistical analyses. Finally we thank David Ellison for his support in improving the English text. This research was co-financed by an E.C. grant within the National Operational Program "Scientific research, technological development, higher education" (D.D. MIUR 1312), the 3-year grant of the University of Helsinki (project number 490083), and the post-doctoral project of the Academy of Finland (project number 133673).

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00279/abstract

**Figure S1 | Power spectral density curves representing the EEG spectrogram recorded at the channel Oz for each subject and the three experimental contrasts.** To plot the curves, the Fourier Transform has been computed over the whole recording of the EEG time series for each subject and condition by using the function Matplotlib in Matlab environment.

#### **REFERENCES**


vowel contrasts. *J. Psycholinguist. Res.* 36, 15–23. doi: 10.1007/s10936-006-9030-y

Finnish second-language users of English. *J. Cogn. Neurosci.* 22, 1319–1332. doi: 10.1162/jocn.2009.21272

Zhang, Y., Kuhl, P. K., Imada, T., Iverson, P., Pruitt, P., Stevens, E. B., et al. (2009). Neural signatures of phonetic learning in adulthood: a magnetoencephalography study. *Neuroimage* 46, 226–240. doi: 10.1016/j.neuroimage.2009.01.028

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 July 2013; accepted: 15 April 2014; published online: 14 May 2014. Citation: Grimaldi M, Sisinni B, Gili Fivela B, Invitto S, Resta D, Alku P and Brattico E (2014) Assimilation of L2 vowels to L1 phonemes governs L2 learning in adulthood: a behavioral and ERP study. Front. Hum. Neurosci. 8:279. doi: 10.3389/fnhum.2014.00279*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Grimaldi, Sisinni, Gili Fivela, Invitto, Resta, Alku and Brattico. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Training of ultra-fast speech comprehension induces functional reorganization of the central-visual system in late-blind humans

#### *Susanne Dietrich\*, Ingo Hertrich and Hermann Ackermann*

*Department of General Neurology, Center for Neurology, Hertie Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany*

#### *Edited by:*

*Merim Bilalic, University Tübingen, Germany*

#### *Reviewed by:*

*Shanqing Cai, Boston University, USA Alessandro Guida, University of Rennes 2, France*

#### *\*Correspondence:*

*Susanne Dietrich, Department of General Neurology, Center for Neurology, Hertie Institute for Clinical Brain Research, University of Tübingen, Hoppe-Seyler-Str. 3, D-72076 Tübingen, Germany e-mail: susanne.dietrich@med.uni-tuebingen.de*

Individuals suffering from vision loss of a peripheral origin may learn to understand spoken language at a rate of up to about 22 syllables (syl) per second (s), exceeding by far the maximum performance level of untrained listeners (ca. 8 syl/s). Previous findings indicate that the central-visual system contributes to the processing of accelerated speech in blind subjects. As an extension, the present training study addresses the question of whether acquisition of ultra-fast (18 syl/s) speech perception skills induces de novo central-visual hemodynamic activation in late-blind participants. Furthermore, we asked to what extent subjects with normal or residual vision can improve understanding of accelerated verbal utterances by means of specific training measures. To these ends, functional magnetic resonance imaging (fMRI) was performed while subjects were listening to forward and reversed sentence utterances of moderately fast and ultra-fast syllable rates (8 or 18 syl/s) prior to and after a training period of ca. 6 months. Four of six participants showed, independently of residual visual functions, considerable enhancement of ultra-fast speech perception (about 70 percentage points of correctly repeated words), whereas behavioral performance did not change in the two remaining participants. Only subjects with very low visual acuity displayed training-induced hemodynamic activation of the central-visual system. By contrast, participants with moderately impaired or even normal visual acuity showed, instead, increased right-hemispheric frontal or bilateral anterior temporal lobe responses after training. All subjects with significant training effects displayed a concomitant increase of hemodynamic activation of left-hemispheric SMA. In spite of similar behavioral performance, trained "experts" appear to use distinct strategies of ultra-fast speech processing depending on whether the occipital cortex is still deployed for visual processing.

#### **Keywords: cross-modal plasticity, speech perception, residual vision, strategies, blindness**

#### **INTRODUCTION**

The acquisition of sensorimotor or perceptual skills is associated with functional brain reorganization, and these processes may emerge within a single or extend across several modality-specific cortical regions. For example, training-induced neuroplasticity of auditory association areas has been observed in normal-sighted subjects who learned to understand time-compressed speech (Adank and Devlin, 2010). By contrast, cross-modal mechanisms appear to contribute to the processing of non-visual (tactile or auditory) stimuli or language-related tasks in blind individuals. Striate cortex, e.g., has been found to show significant hemodynamic activation during Braille reading (e.g., Büchel et al., 1998; Sadato et al., 1998; Gizewski et al., 2003; Sadato, 2005; Burton et al., 2006), auditory motion detection (Poirier et al., 2006), syntactic and semantic speech processing (Röder et al., 2002), verb generation, production of mental images based upon animal names, and verbal episodic memory retrieval (Amedi et al., 2003; Lambert et al., 2004; Raz et al., 2005). As a further example, vision-impaired individuals may learn to understand accelerated spoken language at a rate of up to about 22 syllables (syl) per second (s)—exceeding by far the capacities of normal sighted listeners whose upper limit is at ca. 8 syl/s (Moos and Trouvain, 2007). This exceptional skill allows for the processing of large amounts of written materials using screen-reading text-to-speech devices and may help, e.g., to better cope with the demands of college or university education. As a first approach to the elucidation of the cerebral mechanisms underlying these intriguing perceptual/cognitive capacities, a previous fMRI study of our group (Dietrich et al., 2013) delineated the hemodynamic activation pattern of late-blind and sighted individuals while listening to sentence utterances of a moderately fast (8 syl/s) or ultrafast (16 syl/s) syllable rate. 
The proficiency of the blind subjects extended from low to high comprehension capabilities (up to *>*90%—in terms of the percentage of syllables in correctly reproduced words in a sentence repetition task) at 16 syl/s, whereas the performance level of sighted subjects fell consistently below 20%. Besides the classical perisylvian "language zones" of the left hemisphere [inferior frontal gyrus (IFG)/superior temporal cortex] and the supplementary motor area (SMA), blind people highly skilled in ultra-fast speech perception showed significant hemodynamic activation of right-hemispheric primary visual cortex (V1), contralateral fusiform gyrus (FG), and bilateral pulvinar (Pv) (Dietrich et al., 2013). In a recent Hypothesis and Theory paper (Hertrich et al., 2013b), an expanded model of speech perception was introduced to describe how blind subjects might use their visual system for ultra-fast speech perception. Thereby, right visual cortex enhances time-critical speech processing due to its cross-links to (i) the afferent auditory pathway (e.g., via Pv) and (ii) to frontal action-related representations (e.g., SMA, IFG). Experimental data have shown that the visual system can impact auditory perception at basic computational stages such as temporal signal resolution. For instance, magnetoencephalographic measurements revealed an early field component in right occipital cortex phase-locked to the syllable onsets of accelerated speech (Hertrich et al., 2013a). In normal sighted people, the "bottleneck" for understanding time-compressed speech seems related to higher demands on the buffering of phonological materials and is presumably linked to frontal brain structures. Thus, occipito-frontal interaction via SMA might be an important factor for overcoming this bottleneck.

Moos and Trouvain (2007) as well as our previous group study (Dietrich et al., 2013) compared blind performers with normal-sighted non-performers, so the factors *blindness* and *performance* were confounded. The question of whether vision limits ultra-fast speech comprehension therefore remains unanswered. Training studies can be expected to further elucidate the relationship between neuroimaging data and behavioral performance during ultra-fast speech perception. More specifically, we hypothesized, based upon our preceding work, that, first, the acquisition of ultra-fast speech comprehension skills translates into hemodynamic activation of the visual system, indicating that enhanced spoken language processing might be associated with the recruitment of occipital cortex. Second, the strength of training-induced responses of the visual system was expected to parallel the extent of vision impairment. Third, activation of left IFG and SMA, as components of inner speech representations, might reflect speech understanding. In particular, left SMA is hypothesized to play a role in coordinating prosodic features such as the timing of syllable onsets with phonetic representations in IFG, which is also assumed to strongly interact with the mental lexicon. Furthermore, the data will be interpreted in terms of different neuroplasticity patterns as suggested by Kelly and Garavan (2005), i.e., the distinction between reorganization and redistribution during the acquisition of a new skill. Functional reorganization, i.e., the recruitment of an additional area with training, associated with a shift in the cognitive processes underlying performance (see also Guida et al., 2012), is expected in blind subjects with strongly reduced vision. By contrast, subjects with residual vision may, after training, just increase activation within the already existing classical language network.

As a first and still preliminary test of these suggestions, five late-blind subjects varying in residual vision capacities and one normal sighted individual—all of them never exposed to accelerated spoken language before—were instructed to train ultra-fast speech comprehension capacities over a period of ca. 6 months. The participants underwent behavioral performance tests as well as fMRI measurements prior to and after the training sessions while listening to sentence utterances of a moderately fast (8 syl/s) and ultra-fast (18 syl/s) speaking rate. As a control condition, the same test materials were applied as time-reversed events to the participants, representing unintelligible signals of a matched distribution of spectral energy. Given the relatively small sample size of the training study, each of the six individuals (differing in their residual vision) was evaluated based on whole-brain analyses as a single case. Furthermore, the "activation spots" of our previous group study (Dietrich et al., 2013) which displayed a significant covariance between BOLD responses and ultra-fast speech perception capabilities (right V1, ipsilateral Pv, left SMA, and left IFG) were considered regions-of-interest (ROI) and analyzed in more detail at the group level.

#### **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Five blind subjects and a single normal-sighted individual (3 males; mean age = 34.3 years, *SD* = 11.99) participated in this functional imaging experiment (**Table 1**). All of them were right-handed (Edinburgh handedness inventory) native German speakers without a history of neurological problems or hearing deficits as determined by means of an audiogram. The study design had been approved by the ethics committee of the University of Tübingen. All blind participants received a set of written information (MRI guidelines, data protection, and consent form) as pdf-files by email. Prior to fMRI measurements, the experimenter, in addition, read the materials aloud to each blind individual, and the consent form was signed in the presence of a sighted witness. Since the blind participants were recruited from community organizations, a detailed clinical data bank was not available to the authors and, thus, information on etiology and follow-up of the ophthalmological disorders had to be drawn from personal interviews and previous medical records. In all instances, a peripheral origin of blindness could be established, but the participants represented a rather heterogeneous group with respect to residual vision capabilities, i.e., peripheral visual field and visual acuity (see **Table 1**). The evaluation of the hemodynamic response patterns of each participant, therefore, had to take into account his/her individual profile of visual functions. Three of the six participants (see **Table 1**) showed no or low visual capabilities (nos. 147, 151, 150), whereas the other three had residual or normal functions (nos. 144, 146, 142).

#### **TRAINING PROCEDURE**

After a baseline session (behavioral testing and fMRI measurements), the participants were instructed to use the screen reader JAWS (male voice, synthesizer "Eloquence," http://www.freedomsci.de) for at least 1 h per day. They received digital newspapers on a regular basis, but could also "read" other texts, e.g., e-books. Furthermore, the subjects were encouraged to progressively increase the syllable rate according to their current training level. When they themselves had the impression of understanding more than 80% of the texts at a rate of 13 syl/s, they were invited to a second series of fMRI measurements (i.e., first training target). A third functional imaging session was conducted as soon as a syllable rate of 18 syl/s (final training target) could be mastered. The time span between the recording sessions amounted to ca. 3 months, but could ultimately be determined by the subjects themselves. The present study primarily compares the pre- (baseline) and post-training (trained at 18 syl/s) measurements, whereas the results of the intermediate session (trained at 13 syl/s), including complex de-/activation patterns, are not the focus of the discussion.

**Table 1 | Clinical and behavioral data of the vision-impaired and healthy subjects.**

#### **STIMULI OF THE fMRI EXPERIMENT**

Three sets (to be used at different sessions) of 90 different text passages comprising one or two sentences each were collected from newspapers. All these test items had a duration of ca. 4 s after text-to-speech conversion (formant synthesizer "eloquence" implemented in the synthesizer JAWS), at the three speaking rates considered (30 stimuli each at 8, 13, or 18 syl/s; see **Supplementary files 1**, **2** for examples). Thus, the faster test sentences encompassed more text than the slower ones. In a calibration experiment prior to the training study, the relationship between the internal speed parameters of the JAWS system and the respective mean syllable rate was determined. In addition, all stimuli were converted into time-reversed speech signals, serving as spectrally matched, but unintelligible control items (**Supplementary files 3**, **4**). Altogether, thus, each stimulus set contained 30 (items) × 3 (rates) × 2 (forward/backward) = 180 stimuli that were presented in pseudo-randomized order within one session.
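The rationale for using time-reversed signals as spectrally matched controls can be illustrated with a small sketch. It assumes a toy two-tone signal (not real speech) and a naive discrete Fourier transform; the function name `dft_magnitudes` and all signal parameters are hypothetical. Reversing a real-valued signal in time leaves its magnitude spectrum unchanged, and the stimulus count follows directly from the design.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitude spectrum via a naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    mags = []
    for k in range(n):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


# Toy "utterance" (hypothetical): a decaying mixture of two tones.
n = 64
forward = [math.exp(-t / 20.0)
           * (math.sin(2 * math.pi * 5 * t / n)
              + 0.5 * math.sin(2 * math.pi * 12 * t / n))
           for t in range(n)]
backward = forward[::-1]  # time-reversed control item

# Largest difference between the two magnitude spectra (numerically ~0).
max_diff = max(abs(a - b)
               for a, b in zip(dft_magnitudes(forward),
                               dft_magnitudes(backward)))

# Items x rates x directions, as stated in the text.
n_stimuli = 30 * 3 * 2
```

Only the long-term magnitude spectrum is preserved by reversal; the temporal envelope and phase structure differ, which is what renders the reversed items unintelligible.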

#### **BEHAVIORAL DATA ACQUISITION AND ANALYSIS**

To obtain a quantitative behavioral measure of an individual's capability to understand time-compressed speech, each subject performed a sentence repetition task outside the scanner and prior to the fMRI measurements, encompassing a set of 33 sentences of 18 syllables each. These verbal utterances were played to the participants at different speaking rates ranging from 6 syl/s up to 22 syl/s, using a manual staircase procedure in order to determine the syllable rate at which repetition performance amounted to about 80% (total percentage of syllables across all correctly repeated words). The stimulus materials were presented via a loudspeaker (Fostex, 6301B) within a sound-attenuated room, and subjects were asked to repeat them "as good as possible," even when they failed to "grasp" all words (see **Supplementary file 5** for an example). The subjects' repetitions were digitally recorded (M-audio Microtrack 2496) and underwent subsequent quantitative evaluation of speech comprehension (percentage of correctly reproduced syllables as defined above, ignoring minor errors such as plural endings). Admittedly, this repetition task has a memory component that might interfere with spoken language intelligibility. However, in a previous study (Dietrich et al., 2013) we observed a strong ceiling effect for moderately fast speech, indicating that memory is not the limiting factor in the subjects' performance.
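The scoring rule described above (percentage of syllables contained in correctly repeated words) can be sketched as follows. This is a simplified illustration, not the authors' scoring procedure: the function name `repetition_score`, the English example sentence, and its syllable counts are hypothetical, and the sketch ignores word order and the tolerance for minor errors such as plural endings.

```python
def repetition_score(target_words, repeated_words):
    """Percentage of target syllables belonging to correctly repeated words.

    target_words:   list of (word, syllable_count) pairs of the target sentence
    repeated_words: list of words the listener reproduced
    """
    repeated = {w.lower() for w in repeated_words}
    total = sum(n for _, n in target_words)
    correct = sum(n for w, n in target_words if w.lower() in repeated)
    return 100.0 * correct / total


# Hypothetical target sentence with hand-assigned syllable counts (9 in total).
target = [("the", 1), ("committee", 3), ("approved", 2),
          ("the", 1), ("budget", 2)]
# The listener missed "approved" (2 syllables), so 7 of 9 syllables count.
score = repetition_score(target, ["The", "committee", "budget"])
```

Weighting by syllables rather than by words means that missing a long word lowers the score more than missing a short function word.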

#### **fMRI—DATA ACQUISITION**

Based upon an event-related design, the set of 180 stimuli of each functional imaging session (30 items at three speaking rates plus the respective backward signals) as well as 40 silent baseline events (scanner noise only) were presented in a randomized order, subdivided into five runs of 44 events each. The stimuli were delivered as binaural-symmetric signals via headphones (Sennheiser HD 570; modified by removal of the permanent magnet, see Baumgart et al., 1998). Since these headphones show sufficient dampening of environmental noise, it was not necessary to provide additional earplugs during the experiments. The inter-stimulus interval amounted to 9.6 s (jitter ± 1.4 s, steps of 0.2 s). Since the design did not allow for an explicit control of speech comprehension during the fMRI experiment, behavioral performance was evaluated offline, and during scanning subjects were just instructed to listen carefully to the auditory stimuli and to try to understand them. However, the brain structures sensitive to speech intelligibility have been found to "light up" even under listening-only conditions (Scott et al., 2000), and activation of language processing areas such as IFG has been considered an indicator of actual speech comprehension (Poldrack et al., 2001). Subjects were asked to close their eyes during scanning and to report to the experimenters whether they could adequately hear the test materials in the presence of scanner noise; otherwise, the sound amplitude of the stimuli was further adjusted.

The experiment was run on a 3 Tesla MRI system (Magnetom TRIO; Siemens, Erlangen, Germany), using an echo-planar imaging sequence (echo-time = 30 ms, 64 × 64 matrix with a resolution of 3 × 3 mm², 27 axial slices across the whole brain volume, TR = 1.6 s, slice thickness = 4 mm, flip angle = 90°, 270 scans per run). The scanner generated a constant background noise throughout fMRI measurements, serving as the baseline condition of the experimental design during the null events. Anatomical images required for the localization of the hemodynamic responses were obtained by means of a MDEFT sequence (T1-weighted images, TR = 2.3 s, TE = 2.92 ms, flip angle = 8°, slice thickness = 1 mm, resolution = 1 × 1 mm²) of a bi-commissural (AC-PC) orientation.
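The timing figures given above are mutually consistent, which a short sketch can make explicit. It assumes that the 9.6 s interval is measured from onset to onset and that jitter values are drawn uniformly from the 0.2 s grid; neither detail is stated in the text, and the function `make_run_onsets` is hypothetical.

```python
import random

TR = 1.6                # s, repetition time
SCANS_PER_RUN = 270
N_RUNS = 5
EVENTS_PER_RUN = (180 + 40) // N_RUNS   # stimuli plus null events per run
MEAN_ISI = 9.6          # s, assumed to be the onset-to-onset interval
# Jitter grid from -1.4 s to +1.4 s in steps of 0.2 s (15 values).
JITTER_STEPS = [round(-1.4 + 0.2 * i, 1) for i in range(15)]


def make_run_onsets(rng):
    """Draw event onsets for one run with uniformly sampled jitter (assumption)."""
    onsets, t = [], 0.0
    for _ in range(EVENTS_PER_RUN):
        onsets.append(round(t, 1))
        t += MEAN_ISI + rng.choice(JITTER_STEPS)
    return onsets


onsets = make_run_onsets(random.Random(0))

run_duration = TR * SCANS_PER_RUN            # scanning time per run in seconds
expected_span = EVENTS_PER_RUN * MEAN_ISI    # mean time needed for all events
```

With these numbers a run lasts 432 s of scanning, while 44 events at a mean interval of 9.6 s occupy 422.4 s on average, so the event sequence fits within a run provided the jitter is balanced.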

#### **fMRI—DATA ANALYSES**

Preprocessing of the data encompassed slice time and motion correction, normalization to the Montreal Neurological Institute (MNI) template space, and smoothing by means of an 8 mm full-width half-maximum Gaussian kernel (SPM5 software package; http://www.fil.ion.ucl.ac.uk/spm). For the sake of statistical analysis, the blood oxygen level-dependent (BOLD) responses were modeled by means of a prototypical hemodynamic function within the context of a general linear model, event durations being specified as 4 s epochs. Any low-frequency temporal drifts were removed using a 128 s high-pass filter.
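How a 4 s epoch is turned into a predicted BOLD time course can be sketched as a boxcar convolved with a hemodynamic response function and sampled once per scan. The sketch below is an illustration under assumptions: it uses a common double-gamma shape (gamma shapes 6 and 16, undershoot ratio 1/6), which may differ from the exact function used in the study, and the names `double_gamma_hrf` and `predicted_bold` are hypothetical.

```python
import math


def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for non-positive t."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1.0 / 6.0):
    """Assumed double-gamma response: early peak minus a late undershoot."""
    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, under_shape)


def predicted_bold(onsets, duration, tr, n_scans, dt=0.1):
    """Convolve an event boxcar with the HRF and sample it at every scan."""
    n_fine = int(round(n_scans * tr / dt))
    boxcar = [0.0] * n_fine
    for onset in onsets:
        start = int(round(onset / dt))
        stop = int(round((onset + duration) / dt))
        for i in range(start, min(stop, n_fine)):
            boxcar[i] = 1.0
    hrf = [double_gamma_hrf(i * dt) for i in range(int(32 / dt))]
    regressor = []
    for s in range(n_scans):
        idx = int(round(s * tr / dt))
        acc = 0.0
        for j, h in enumerate(hrf):
            if idx - j < 0:
                break
            acc += boxcar[idx - j] * h
        regressor.append(acc * dt)
    return regressor


# One 4 s event at time zero, sampled with TR = 1.6 s for 20 scans.
reg = predicted_bold(onsets=[0.0], duration=4.0, tr=1.6, n_scans=20)
peak_scan = max(range(len(reg)), key=reg.__getitem__)
```

In a general linear model, one such regressor per condition is fitted to each voxel's time series; the high-pass filter mentioned above is applied separately and is not part of this sketch.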

The evaluation of the functional imaging data encompassed the following steps of signal analysis:


*k* = 10 voxels; the SPM coordinates resulting from the three-way interaction *Mode* × *Speech rate* × *Training* are listed in **Supplementary file 10**).


#### **RESULTS**

#### **BEHAVIORAL PERFORMANCE AFTER THE TRAINING**

Prior to the training procedure, none of the subjects was able to understand ultra-fast speech at a rate of 18 syl/s. More specifically, the percentage of correctly reproduced syllables fell below 14% in all instances. At the (second) follow-up examination, four subjects (nos. 147, 150, 144, and 142) showed an improved behavioral performance of up to ca. 70% points (mean = 71%, *SE* = 2.74%). This subgroup included participants with low or high residual or even normal vision. A single participant (no. 146) failed to show any learning effects, and another one (no. 151) achieved only a modest improvement amounting to about 36% points (**Table 1**). Regarding moderately fast (8 syl/s) and accelerated speech at 13 syl/s, all subjects reproduced > 80% of the syllables correctly after training (8 syl/s: mean = 97%, *SE* = 6.94%; 13 syl/s: mean = 93%, *SE* = 5.71%; **Table 1**).

#### **WHOLE-BRAIN ANALYSES**

#### *Ultra-fast speech vs. baseline pre-training*

Prior to the training period, hemodynamic responses to ultra-fast verbal utterances emerged within primary auditory areas of either hemisphere as well as adjacent structures of the superior/middle temporal cortex in all subjects (SPM *T*-contrast "forward ultra-fast speech vs. baseline," *p* < 0.005 uncorrected with an extent threshold *k* = 10 voxels; **Figure 1**; **Supplementary file 6**). Additionally, some participants showed activation within left- and right-hemispheric frontal and parietal regions, the left cerebellum, and subcortical supratentorial areas such as the hippocampus, thalamus, or pallidum (**Supplementary file 6**).

#### *Ultra-fast speech vs. baseline post-training*

After completion of the training, all subjects displayed significant responses within primary auditory areas of either hemisphere as well as adjacent structures of the superior/middle temporal cortex (**Figure 1**; **Supplementary file 6**). Regarding well-trained participants (nos. 147, 150, 144, and 142), visual inspection of the activation clusters within the temporal lobe showed less extension toward the anterior part of the superior temporal gyrus (STG) and sulcus (STS) in subjects with no or low residual vision (nos. 147, 150) as compared to the others (nos. 144, 142). In particular, the normal-sighted subject (no. 142) displayed strong anterior temporal lobe activation (**Figure 1**). Furthermore, participants with low residual vision (nos. 147, 151, 150) activated the occipital lobe, e.g., primary and secondary visual areas (BA 17/18) (**Figures 1**, **2**; **Supplementary file 6**). The FG was found to "light up" in the subject with no residual vision (no. 147), whereas in other participants (nos. 151, 144, 142) BOLD signal changes were restricted to the inferior temporal gyrus (**Supplementary file 6**). All subjects showed activation within the left-hemispheric inferior frontal gyrus (IFG). Under the applied threshold, left IFG activation sometimes occurred as a secondary peak linked to the temporal cluster (**Supplementary file 6**). All well-trained subjects displayed hemodynamic responses of left SMA (**Figure 1**; **Supplementary file 6**), although in subject no. 147 SMA activation fell below the applied threshold (*x*, *y*, *z* = −3, 6, 69, *T* = 2.24). One subject with high residual vision (no. 144) showed conspicuous BOLD responses within the left- and right-hemispheric frontal cortex, i.e., precentral gyrus (PrCG), middle frontal gyrus (MFG), orbital gyrus (OrbG), and IFG (**Figure 1**; **Supplementary file 6**). Some participants presented with additional activation clusters within the parietal lobe and subcortical areas such as thalamus and caudate nucleus. In all subjects, left and/or right cerebellar activation was noted (**Figure 1**; **Supplementary file 6**).

**FIGURE 1 | Whole-brain analyses from six single subjects differing in residual vision, based upon the SPM** *T***-contrasts for forward ultra-fast speech vs. baseline pre- and post-training (red = condition vs. baseline, green = baseline vs. condition), the direct contrast post- vs. pre-training with respect to the ultra-fast condition as well as the training-induced baseline shift in the null-event (red = post- vs. pre-training, green = pre- vs. post-training).** Threshold at *p* < 0.005 uncorrected with an extent threshold *k* = 10 voxels (for peak coordinates see **Supplementary files 6**–**9**).

Regarding the reverse SPM *T*-contrast (baseline vs. ultra-fast speech condition), a widespread pattern involving the parietal, occipital, and frontal lobes occurred, primarily in subjects with normal or residual vision (nos. 144, 146, 142) and predominantly after the training period (**Figures 1**, **2**; **Supplementary file 7**; for occipital clusters).

#### *Post- minus pre-training regarding the ultra-fast speech condition*

Regarding the training effect on hemodynamic activation during ultra-fast speech perception, the six subjects showed different individual response patterns:

Direct comparison of hemodynamic activation after and before the learning procedure (SPM *T*-contrast post- vs. pre-training; **Figure 1**; **Supplementary file 8**) revealed significant BOLD signal changes at the level of the occipital lobe, i.e., right primary and bilateral secondary visual cortex (BA 17/18), and the cerebellum only in the sole subject with no residual vision and a high training effect (no. 147). Additionally, this participant displayed significant hemodynamic responses within the bilateral temporal lobe (STG, middle temporal gyrus = MTG, inferior temporal gyrus = ITG, temporal pole = Tp), and to a lesser extent within the frontal (bilateral PrCG, left IFG) and left parietal lobe (postcentral gyrus = PoCG, supramarginal gyrus = SmG). By contrast, subject no. 151, who also had significantly reduced visual acuity but only moderate training success (36% points), did not show enhanced occipital and cerebellar activation after training. An increase of BOLD responses in this subject occurred predominantly within parietal (left PoCG, bilateral angular gyrus = AG, bilateral inferior parietal lobule = IPL), frontal (left PrCG, left IFG, bilateral MFG), and left temporal (STG, Tp) regions. Subject no. 144, with residual vision, displayed increased hemodynamic responses (from pre- to post-training runs) during the forward ultra-fast condition primarily within right frontal areas (right PrCG, bilateral IFG, bilateral MFG, bilateral SMA, right SFG) and left temporal regions (MTG, STG, Tp). The normal-sighted individual (no. 142) showed a positive training effect within bilateral MTG, left PrCG and SMA, bilateral IFG, and left parietal lobe (SmG, IPL). Two further subjects (nos. 150, 146) did not display any significant response differences between pre- and post-training measurements. Most of the subjects (except for subject no. 147) showed a training-induced decrease in activation of temporal regions (MTG, STG; see **Figure 1** and **Supplementary file 8**).

**… in each subject (T1 MNI template) based upon the SPM** *T***-contrast for forward ultra-fast speech vs. baseline** … with an extent threshold *k* = 10 (for peak coordinates see **Supplementary files 6**, **7**).

#### *Post- minus pre-training regarding the null-event (baseline)*

Considering pre- and post-training runs in a concatenated sequence, the subject with no residual vision and a significant training effect (no. 147) also showed a baseline shift (regarding the null-event) in hemodynamic activity between post- and pre-training runs (SPM *T*-contrast, **Figure 1**; **Supplementary file 9** "post- vs. pre-training regarding the null-event (=baseline)"). Increased activation affected similar regions as described for the forward ultra-fast speech condition (see above paragraph: Post- minus pre-training regarding the ultra-fast speech condition): bilateral occipital cortex and cerebellum as well as distinct clusters within the bilateral temporal and frontal lobe. In the remaining subjects (nos. 151, 150, 144, 146, 142) an increase of baseline activation was found only within some small clusters: right STG/MTG, PrCG, SmG, IFG, and subcortical regions.

Decreased baseline activity after training was observed in subject no. 151 (weak training effect, low residual vision) within some regions of the occipital lobe whereas the remaining subjects showed such a decrease only in small frontal clusters (**Figure 1**; **Supplementary file 9**).

#### *Interaction of the experimental conditions (ANOVA)*

In order to delineate the network used (i) by a participant with no residual vision as well as (ii) a normal-sighted person during processing of ultra-fast speech (18 syl/s) after the training period, the three-way interaction *Mode* × *Speech rate* × *Training* was computed and found to achieve significance in both cases (**Figure 3A**; **Supplementary file 10**).
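For a design with three two-level factors, the three-way interaction corresponds to a single contrast whose weights are the products of the per-factor signs. The sketch below illustrates that construction only; the condition labels and the sign convention (first level = +1) are assumptions, and it does not reproduce how the contrast was specified in SPM.

```python
from itertools import product

modes = ["forward", "reversed"]
rates = ["8 syl/s", "18 syl/s"]
stages = ["pre", "post"]


def sign(level_index):
    """+1 for the first level of a factor, -1 for the second (assumed convention)."""
    return 1 if level_index == 0 else -1


# One weight per cell of the 2 x 2 x 2 design: product of the three signs.
weights = {
    (modes[m], rates[r], stages[s]): sign(m) * sign(r) * sign(s)
    for m, r, s in product(range(2), range(2), range(2))
}

for cell, w in weights.items():
    print(cell, w)
```

The weights sum to zero within every level of every factor, so the contrast is insensitive to main effects and two-way interactions. It asks whether the training-related change of the forward-minus-reversed difference differs between the two speech rates.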

In subject no. 147 (no residual vision), bilateral occipital activation clusters emerged, i.e., left secondary visual areas (BA18) and a right primary visual area (BA17). Further, left temporal (MTG), bilateral fronto-parietal (PrCG, PoCG, paracentral lobule = Pcl), and right frontal (superior frontal gyrus = SFG, OrbG) areas were found to be activated. Considering *post-hoc* hemodynamic effects (values of percentage signal change) of the forward and reversed moderately fast (8 syl/s) and ultra-fast (18 syl/s) conditions (vs. baseline) within the right-hemispheric primary visual area, the BOLD response increased after the training (compared with pre-training) during the forward ultra-fast, but not the moderately fast, condition, whereas both reversed speech conditions remained at the level of zero or showed deactivation (**Figure 3B**, left plot).

**FIGURE 3 | (A)** Three-way interaction of an ANOVA (conditions vs. baseline), including the factors *Mode* (forward/backward speech), *Speech rate* (8/18 syl/s), and *Training* (pre-/post-training), exemplified for two single subjects, the one with no residual vision and the normal-sighted one, both with comparable training success (*p* < 0.005 uncorrected, *k* = 10 voxels). Variance was calculated across five runs per subject and training stage. **(B)** Hemodynamic effects within one selected cluster from the ANOVA regarding each session (pre-, post-training as well as the intermediate measurement): (i) the right primary visual area (V1) of the blind individual and (ii) the right temporal pole (Tp) of the sighted subject. Values of percent signal change are shown separately for the moderately fast (8 syl/s) and ultra-fast (18 syl/s) condition vs. baseline (left plot) as well as separately for the conditions and the null-event vs. the implicit baseline each (right plot). In the blind subject, right V1 activation during the ultra-fast speech condition increased after the training, whereas the reversed condition showed a decrease (note that the null-event also increased). Since no V1 activation or deactivation was found at pre-training measurements, functional reorganization is suggested. By contrast, since the Tp activation of the sighted person already existed in the pre-training stage during moderately fast speech processing and increased post-training during ultra-fast speech perception, redistribution is assumed to be the neuroplastic mechanism.

In the sighted participant (no. 142), bilateral temporal (MTG, Tp), bilateral cerebellar (Cb), left frontal (SMA, IFG, PrCG), and parietal regions (left SmG, right Pcl) showed a significant three-way interaction. *Post-hoc* analysis found increased right temporal pole activation after training during the forward ultra-fast speech condition, but revealed this area also to be activated before training during the moderately fast speech condition. Additionally, this region showed selectivity to forward (compared with reversed) speech (**Figure 3B**, left plot).

Regarding the training-induced baseline shift (null-event; **Figure 3B**, right plot), hemodynamic activity during the null-event was stronger after the training period in the blind individual whereas the normal-sighted participant showed an opposite tendency. The baseline shift in the blind subject was even stronger during the intermediate measurement at a training stage of 13 syl/s (not included in the ANOVA).

#### **REGIONS-OF-INTEREST (ROI) ANALYSES**

#### *Right-hemispheric visual cortex*

Concerning right V1 (**Figure 4A**), the tests of percent signal changes addressed the difference of the values from zero and the differences between conditions. Only subject no. 147 (no residual vision) showed significantly positive values under the forward ultra-fast speech condition following the training sessions (*T* = 2.870, *p* < 0.05) and, additionally, higher values under the forward as compared to the reversed ultra-fast speech condition post-training (*T* = 2.479, *p* < 0.05). Subject no. 151 (low residual vision) achieved positive values under the reversed ultra-fast condition post-training (*T* = 2.248, *p* < 0.05), whereas forward ultra-fast speech did not yield any significant differences as compared to reversed utterances. Subject no. 150 (low residual vision) displayed neither significant positive/negative values nor significant differences between forward and reversed speech under the ultra-fast speech condition after the training sessions. Subjects nos. 144 (*T* = −3.123, *p* < 0.05), 146 (*T* = −1.637, *p* = 0.09), and 142 (*T* = −0.598, *p* = 0.291) showed negative values or did not differ from zero during the forward ultra-fast speech condition post-training. Furthermore, these latter subjects did not show any differences between forward and reversed ultra-fast speech post-training. In addition, significantly positive values under the moderately fast speech condition pre-training (*T* = 3.735, *p* < 0.05) and selectivity to forward compared with reversed speech (*T* = 1.944, *p* < 0.05) could be observed exclusively in subject no. 144.

… pre- and post-training. Participants are arranged according to their visual acuity. Lower plots (blue): Percent signal change during forward (left) and reversed (middle) ultra-fast speech (18 syl/s) as well as forward moderately fast (8 syl/s) speech (right) pre- and post-training (vs. baseline). Significant differences from zero (one-sample *T*-tests): ∗*p* < 0.05, ∗∗*p* < 0.01, ∗∗∗*p* < 0.001, ∼*p* < 0.1.
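The one-sample *T*-tests against zero reported for the ROI values follow the standard formula: the mean divided by its standard error. The sketch below assumes one percent-signal-change value per run (five runs); the text does not state which observations entered the ROI tests, and the numbers are invented for illustration only.

```python
import math


def one_sample_t(values, mu=0.0):
    """Return the t statistic and degrees of freedom for H0: mean == mu."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # unbiased variance
    se = math.sqrt(var / n)                               # standard error
    return (mean - mu) / se, n - 1


# Hypothetical percent-signal-change values, one per run (not study data).
psc = [0.30, 0.10, 0.45, 0.20, 0.25]
t_value, df = one_sample_t(psc)
```

A forward-versus-reversed comparison within a subject can be handled the same way by passing the run-wise differences between the two conditions to `one_sample_t`.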

Furthermore, a significant negative correlation between residual vision and the percent signal change during the forward ultra-fast speech condition (vs. baseline) could be detected before (ρ = −0.77, *p* < 0.05) and after the training sessions (ρ = −0.83, *p* < 0.05), with a tendency toward a steeper regression line at the post-training examination (*T* = −1.569, *p* < 0.1) (**Figure 5A**). By contrast, no significant correlations emerged during application of moderately fast speech or during the reversed ultra-fast speech condition, at neither the pre- nor the post-training measurements.
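The ρ values above relate the participants' residual vision to their percent signal change. Assuming that ρ denotes a rank correlation of the Spearman type (the text does not name the coefficient here), the computation can be sketched as follows. The acuity and signal change values are hypothetical placeholders, not the data of this study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values receive their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for six participants
acuity_percent = [0, 2, 10, 20, 60, 100]
psc_forward_ultrafast = [0.9, 0.5, 0.6, 0.1, -0.2, -0.4]
rho = spearman_rho(acuity_percent, psc_forward_ultrafast)
print(f"rho = {rho:.2f}")
```

A negative ρ in this sketch corresponds to the pattern described in the text: the higher the residual vision, the lower the percent signal change.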

All three subjects with no/low residual vision (rank 1, 2, 3) showed positive values of percent signal change compared to subjects with residual vision (rank 4, 5, 6) with respect to the post- and pre-training stages (**Figure 5A**). Therefore, a whole-brain covariance analysis was performed with the residual vision as non-linear covariate (rank 1, 2, 3, 4, 4, 4), based on the hypothesis that training-induced enhancement of primary visual activation occurs only in subjects with no or low residual vision. Indeed, the training effect for forward ultra-fast speech (SPM *T*-contrast post- vs. pre-training) was associated with hemodynamic responses of right V1 (**Figure 5B**) at a similar location as the activation cluster observed in our previous group study (Dietrich et al., 2013; **Figure 5C**).

#### *Right-hemispheric pulvinar*

As concerns right Pv, neither pre- nor post-training fMRI evaluation yielded any percent signal change values which significantly differed from zero (**Figure 4B**). Furthermore, no significant differences emerged between forward and reversed ultra-fast speech conditions post-training (**Figure 4B**). However, across the entire group of participants, a significant negative correlation between residual vision and percent signal change during forward ultra-fast speech was found at post-training measurements (ρ = −0.77, *p* < 0.05) in right Pv. Prior to the learning procedure, the correlation also showed a negative trend (ρ = −0.71, *p* = 0.055).

#### *Left-hemispheric SMA and IFG*

As concerns left-hemispheric SMA, all subjects with training effects (= all except no. 146) showed a positive change in percent signal change across follow-up under the forward ultra-fast speech condition (**Figure 4C**, left plot). By contrast, negative changes from the pre- to the post-training values of this measure could be observed at the level of SMA under the forward moderately fast condition across most subjects (descriptive; **Figure 4C**, right plot). Considering left IFG, significant positive values of percent signal change were found during the forward moderately fast speech condition across most subjects pre- as well as post-training (**Figure 4D**). By contrast, only two subjects (nos. 151, 142) showed significant positive values of percent signal change during the forward ultra-fast speech condition post-training, whereas during the backward ultra-fast speech condition negative values could be observed within one subject (no. 144) (**Figure 4D**).

Before training, a significant positive correlation between BOLD responses and behavioral performance (**Figure 6A**) was found across all three speech rates (8, 13, 18 syl/s) of the forward speech conditions and across all subjects within left SMA (*r* = 0.58, *p* < 0.05) as well as IFG (*r* = 0.72, *p* < 0.001). After training, most subjects (except for nos. 151 and 146 with respect to the ultra-fast speech condition) performed at all speech rates (8, 13, 18 syl/s) at a level of at least ca. 70% correctly reproduced syllables. However, post-training BOLD responses of these subjects with at least ca. 70% performance during the ultra-fast speech condition (vs. baseline, *n* = 4) were significantly higher than during moderately fast speech (*n* = 6) within SMA (*T* = −4.009, *p* < 0.05), but not within IFG (*T* = 1.639, *p* = 0.162) (**Figure 6B**). As expected, given that four of the six subjects reached comparable training effects, residual vision did not significantly correlate with BOLD responses in left SMA or IFG either pre- or post-training (*p* > 0.1 for all tests).

#### **DISCUSSION**

#### **BEHAVIORAL TRAINING EFFECTS**

Four of the six participants showed, independently of their residual visual functions, a significant improvement of ultra-fast (18 syl/s) speech comprehension, reaching a performance level of ca. 70% points after daily training sessions. The three vision-impaired subjects of this subgroup reported that they benefit in daily life from their ability to use screen-reading text-to-speech devices. By contrast, poor (36% points) or no training effects could be observed in the two remaining individuals. So far, none of the participants achieved the pre-specified target of more than 80% correctly repeated words at a speech rate of 18 syl/s. Since the participants underwent fMRI measurements as soon as they themselves had the impression of meeting this criterion, they lacked an external control over actual performance. The sighted control subject of the present study reported that she had "reached her limits" and that she felt additional training would not yield further improvements of ultra-fast speech perception. By contrast, a further extension of the training period (so far ca. 6 months) might have been beneficial in case of vision loss, since higher performance levels have been documented in other blind individuals (Moos and Trouvain, 2007). Most noteworthy, residual vision in terms of visual acuity in the four subjects with a significant and similar increase of post-training behavioral performance ranged from none to 100%. Thus, this clinical parameter evidently did not constrain the acquisition of ultra-fast speech perception capabilities. It remains open, however, whether and to what extent subjects with low residual vision might be able to further enhance ultra-fast speech comprehension with more extended training procedures.

#### **POST-TRAINING BRAIN NETWORKS OF ULTRA-FAST SPEECH COMPREHENSION (WHOLE-BRAIN ANALYSES)**

#### *Common activation "nodes" across subjects*

Clinical and functional imaging data indicate a contribution of left IFG to speech perception, at least when more demanding segmentation processes and/or working memory operations are involved (Burton et al., 2000). Furthermore, hemodynamic activation of language processing areas such as IFG has been assumed to represent an index of actual speech comprehension (Poldrack et al., 2001). The present study revealed post-training left-hemispheric IFG responses across all subjects, indicating intelligibility of ultra-fast spoken language even if the training effect was low (subject no. 146 showed no training effects; nevertheless, the performance level at baseline had amounted to 14%). Furthermore, left-hemispheric PrCG and the cerebellum displayed significant BOLD signal changes during forward ultra-fast speech post-training. These observations are in line with clinical and functional imaging data pointing at a contribution of those structures, under specific circumstances, to auditory speech perception. For example, the cerebellum has been found engaged in the encoding of specific temporal-linguistic information during word identification tasks (Mathiak et al., 2002).

#### *Descriptive analyses of individual response patterns*

Four participants showed a significant and similar increase of post-training behavioral performance (ca. 70% points), independent of residual vision capabilities (<2, 10, 20, 100% visual acuity). Most noteworthy, different individual hemodynamic response patterns could be observed in these cases, indicating that processing strategies of ultra-fast speech understanding vary across these subjects.

Training of ultra-fast speech comprehension was found to induce hemodynamic activation at the level of the occipital lobe in three subjects with no or low residual vision. The strongest effects (also significant in the post- minus pre-training comparison) emerged in subject no. 147, a participant lacking any residual visual functions, and such responses to accelerated spoken language could not be observed at the baseline examination. These findings provide tentative support for a relationship between ultra-fast speech perception and right-hemispheric V1 activation—in case the central-visual system is deprived of modality-specific afferent input. Based on our previous studies, it has been suggested that collaboration of right auditory cortex and ipsilateral primary visual area allows for the implementation of a signal-driven timing mechanism related to syllabic modulation of the speech signal (Dietrich et al., 2013; Hertrich et al., 2013a,b). More specifically, subjects able to recruit the visual cortex during spoken language comprehension appear to install an additional information channel conveying temporal cues on syllable structure to the frontal lobe which then help to trigger phonological encoding processes and, as a consequence, to support verbal working memory operations (Hertrich et al., 2013b). Since the visual system seems to compensate for the temporal limitations of the auditory system under these conditions, this strategy might allow for a further increase in ultra-fast speech comprehension capabilities beyond the level that was achieved in the present training study.

Besides increased V1 activation during the ultra-fast speech condition, subject 147 also showed significant changes in the null-event if the runs across the pre- and post-training sessions were concatenated for analysis. Interestingly, this increase was strongest in the session at an intermediate training level when the ultra-fast condition did not yet activate right V1 (vs. baseline). This "baseline shift" might result from actual activity during the null event. However, it could also reflect a discrepancy between the explicit (null-event) and the implicit baseline, which might indicate that the BOLD response in V1 is not strictly aligned with the stimuli across training stages. This inconsistency might be bound to the emergence of reorganization processes (rather than just redistribution of brain activity as reported by Kelly and Garavan, 2005), since occipital cortex was not active prior to the pre-training session and since a new area has been recruited during the learning procedure.

Interestingly, subject no. 151 (with low visual acuity similar to subject no. 147) showed a strong reduction of the null-event activity within the occipital lobe at the post- as compared to the pre-training session, but no increase of activation during the ultra-fast speech condition. The low training effect of this subject seems to be in line with the absence of significant V1 activation during the ultra-fast condition. However, the strong changes of the null-event activity might indicate that also in this subject training-induced reorganization is possible.

By contrast, subject no. 150 (10% visual acuity) did not show any training-induced changes in occipital cortex activity and, thus, visual system recruitment for the sake of ultra-fast speech perception seems to be impossible.

It should be noted that the post-training hemodynamic response pattern of subject no. 147, displaying the strongest response of right visual cortex, encroaches, in addition, upon left-hemispheric primary (BA17) and bilateral secondary (SOG, MOG) visual areas. As an explanation, the recruitment of occipital areas at either side of the brain might reflect central-visual support of several linguistic functions during ultra-fast speech comprehension, e.g., both segmental (left hemisphere) and suprasegmental/prosodic (right hemisphere) processing. Against the background of previous findings (Dietrich et al., 2013; Hertrich et al., 2013b), strong right-hemispheric lateralization of the primary visual area seems to be associated with a prosodic function, i.e., triggering syllable onsets within a metric pattern. Thus, a somewhat unfocused reorganization pattern might be further stabilized after getting more experienced with ultra-fast speech materials. Continuation of the training procedure at high rates could result, conceivably, in a less distributed occipital activation pattern centered around the right primary visual area.

In line with our preceding single-case and group studies on ultra-fast speech perception (Hertrich et al., 2009; Dietrich et al., 2013), post-training fMRI revealed hemodynamic activation of left FG in response to ultra-fast speech materials in subject no. 147, characterized by the strongest engagement of right-hemisphere visual cortex. FG is embedded into the so-called ventral route of the central-visual system which, especially, supports object recognition (e.g., Haxby et al., 1991), but may contribute to phonological operations as well (e.g., Cone et al., 2008). Thus, left FG as a secondary phonological area might expand the phonological network and help to cope with the higher processing demands during ultra-fast speech perception.

Presumably, right-hemispheric frontal lobe activation, including PrCG, MFG, OrbG, and IFG as observed in subject no. 144, might reflect a network associated with phonological word stress processing during ultra-fast speech comprehension. In previous studies, neural correlates of word stress as compared to vowel quality encoding (Klein et al., 2011) or auditory processing of bi-syllabic pseudo-words (Tervaniemi et al., 2009) were found within a widespread fronto-temporal activation pattern including, among others, right-hemispheric activation clusters resembling the response pattern of subject no. 144. Against this background, it may be assumed that a prosodic/metric pattern was generated, predicting continuous word stress or metrical stress sequences during listening to ultra-fast speech. Such a focus on stressed syllables, more or less ignoring the unstressed ones, might help to overcome the backward masking effects of high speaking rates. However, this strategy of ultra-fast speech comprehension, conceivably, poses higher demands on the auditory system with regard to the processing of word stress information, interpolation of unstressed syllables, and the mapping of these patterns onto articulatory motor plans. This assumption is in line with reports of increased activation of the auditory association cortices and left ventral premotor cortex after training to understand time-compressed speech in sighted individuals (Adank and Devlin, 2010). Although such a compensation strategy based upon prosodic/metric redundancies of verbal utterances might facilitate ultra-fast speech comprehension to a certain extent, it may not be able to really enhance the temporal resolution at the level of the syllable and, therefore, might require increased top-down effort of language processing.

Distinct hemodynamic activation of the temporal pole at either side was found after training of ultra-fast speech comprehension in the sighted subject no. 142. Combinatorial processes at sentence-level (Hickok and Poeppel, 2007) and the evaluation of affective semantics (Ethofer et al., 2006) have been associated with hemodynamic responses of the anterior temporal lobe, indicating holistic rather than compositional (i.e., hierarchical analysis of segments, syllables, morphemes, and phrases) semantic processing under these conditions. Such top-down mechanisms, based upon contextual/semantic pre-information, may facilitate sound-to-semantic mapping processes even in case of incomplete phonological decoding. Although the use of this strategy should overcome to a certain extent the backward masking effects of ultra-fast speech signals, we may speculate that subjects applying this strategy will have difficulty understanding the logical structure of more complex sentences. Moreover, the present (sighted) subject who might have used this strategy did not recruit any new area as in the case of the blind participant. Instead, activity existing pre-training during the moderately fast speech condition was found increased post-training during the ultra-fast speech condition. Thus, redistribution of activity as defined in Kelly and Garavan (2005) is assumed to be the underlying mechanism for plasticity.

Summarizing these descriptive analyses of distinct individual activation patterns, subjects appear to use different strategies during ultra-fast speech perception depending, among other things, on residual visual functions. In case of very low residual vision, the visual system seems to be recruited as a means to enhance actual temporal resolution of the acoustic signal via implementation of an extra-channel conveying information on syllabic structure to verbal working memory buffers. This new strategy concomitant with the recruitment of an additional area can be considered a variant of functional reorganization. If the visual system is not at a subject's disposal for spoken language comprehension, listeners may rely on already available semantic top-down strategies facilitating lexical access, based upon, e.g., redundant aspects of verbal materials. Under these conditions, a stronger engagement of (some components of) the classical language network might help to speed up the encoding processes, providing an example of neuroplasticity due to redistribution of brain activity.

#### **THE IMPACT OF RESIDUAL VISION AND BEHAVIORAL PERFORMANCE ON THE RESPONSES OF THE VISUAL SYSTEM (ROI ANALYSES)**

During forward ultra-fast speech perception, residual vision was found to have a significant impact on the activation of the right-hemispheric primary visual cortex (V1) and ipsilateral pulvinar (Pv), a subcortical structure projecting to the occipital lobes. More specifically, a negative correlation emerged in terms of decreased hemodynamic activation concomitant with increased residual vision. The respective regression line showed a trend toward a steeper decline at the post-training measurements, indicating a differential impact of residual vision on the training effects. Thus, residual visual capacities appear to constrain the resources of audiovisual reorganization during ultra-fast speech perception. The critical threshold seems to be centered around acuity values of 10 to 20%, whereas a more efficient use of the visual cortex such as in subject no. 147 may even require residual vision to fall below a level of only 2%.

A previous study investigating cross-modal reorganization in a vision-impaired subject who had suffered from severe visual acuity reduction with no evidence of visual field loss reported that populations of visual neurons not critical for the subject's remaining low vision were recruited for tactile information processing (Cheung et al., 2009). The observation of such a retinotopy-related functional segregation of neurons bound to residual vision, on the one hand, and cross-modal processing, on the other, might explain why audiovisual plasticity did not occur in the subjects of our group with normal or high residual vision. Furthermore, visual acuity rather than the peripheral visual field seems to be the crucial prerequisite of a recruitment of the visual system (Cheung et al., 2009). The present data confirm this suggestion since, e.g., subject no. 151 with no visual acuity, but intact peripheral visual fields showed increased hemodynamic activation of the visual system at post-training measurements although the subject showed a moderate training effect only. By contrast, participant no. 144 suffering from a strongly reduced visual field did not activate right visual cortex after the training sessions during the forward ultra-fast speech condition, presumably, because visual acuity was not markedly reduced. Consequently, the foveal representation within the visual system seems to be the target of cross-modal reorganization processes during the acquisition of ultra-fast speech comprehension. In line with this hypothesis, training of ultra-fast speech comprehension induced strong visual cortex activation in a subject with no residual vision (no. 147). In spite of good training effects, only low hemodynamic responses within the visual system could be observed in the subject suffering from residual 10% visual acuity (no. 150).

Interestingly, forward moderately fast speech did not induce any distinct hemodynamic activation of visual cortex after training. It can be assumed, thus, that the speech task has to be sufficiently challenging in order to induce significant activation in the visual system. In fact, passive listening to sounds does not give rise to occipital activation in blind subjects (Arno et al., 2001). Accordingly, visual areas in the blind were not activated by the mere presence of sound, but were involved in attentive perception of changes in the auditory environment (Kujala et al., 2005). Attention is suggested to play an important role during ultra-fast speech comprehension as well—in terms of the detection of almost entirely masked syllable onsets (Dietrich et al., 2013; Hertrich et al., 2013b). However, significant visual cortex activation was also found at pre-training examination during the moderately fast speech condition selectively to forward (compared to reversed) speech in a subject with high residual vision (no. 144). As an explanation, hemodynamic responses of visual cortex during speech perception might reflect actual visual imagery. The subjects' high familiarity with speech signals in association with speaking faces during language acquisition might facilitate such effects also in late-blind subjects. In a previous study, addressing the reverse relationship of the two sensory modalities, perception of visual lip movements during silent syllable production was found to evoke primary auditory BOLD responses, indicating strong audiovisual interactions during speech perception (Hertrich et al., 2010).

Taken together, the highest activation levels of the visual system were found in case of (i) absent residual vision and (ii) good training effects. Only subject no. 147 met these two criteria while the remaining subjects fell short on one or both of them. In subject no. 147, visual system recruitment was selectively limited to forward (and, thus, principally intelligible) utterances. As a rule, right Pv displayed a profile of BOLD responses similar to right visual cortex. Again, these findings point at a contribution of this subcortical structure to ultra-fast speech perception. Conceivably, the observed interactions of Pv and visual cortex reflect the operation of an audiovisual interface in that the secondary visual pathway provides the central-visual system with auditory input (see Hertrich et al., 2013b, for more details).

#### **VISION-INDEPENDENT AND PERFORMANCE-DEPENDENT CHANGES OF HEMODYNAMIC ACTIVATION OF LEFT SMA AND IFG (ROI ANALYSIS)**

Pre-training BOLD responses within left IFG and SMA were found to be significantly correlated, independently of residual vision, with the percentage of correctly understood speech materials, pooled across speech rates (8, 13, 18 syl/s). Consequently, both left IFG and SMA activation are associated with speech comprehension. More specifically, it is suggested that left IFG is involved in the speech production-related phonological reconstruction of perceived speech material (Burton et al., 2000). However, not all participants with a significant training effect showed significant left IFG activation after the training period with respect to the cluster resulting from Dietrich et al. (2013). Some subjects thus seem not to reach the threshold with respect to this relatively large cluster. Nevertheless, the whole-brain analyses revealed post-training left-hemispheric IFG activation in subjects with a significant training effect.

Prior to training, hemodynamic activation within left SMA during the "moderately fast condition" showed considerable variation across individuals. Recruitment of left SMA, thus, does not seem to be a prerequisite to speech comprehension *per se*. However, all subjects with training effects showed a positive change in percent signal change across follow-up under the forward ultra-fast speech condition. Moreover, left SMA, but not IFG activation post-training was significantly higher during the ultra-fast than the moderately fast speech condition in well-performing subjects. Neither the pre- nor post-training responses of SMA and IFG depended on residual vision, an observation supporting the view of a supra-modal function of these frontal structures, irrespective of input modality. Several studies indicate SMA to engage in the syllabic organization of verbal utterances during speech production (Ziegler et al., 1997; Riecker et al., 2005; Brendel et al., 2010). On a broader scale, this region appears to support timing processes across various sensorimotor and cognitive domains (Rubia and Smith, 2004; Paz et al., 2005). Furthermore, clinical as well as experimental data point at a contribution of SMA to spoken language perception and verbal working memory operations as well (Schirmer et al., 2001; Smith et al., 2003; Schirmer, 2004; Chung et al., 2005; Geiser et al., 2008). Given that SMA might act as a platform for timing operations related to verbal working memory functions, sensory systems might convey temporal information on syllable structure via left SMA into this short-term storage system.
More specifically, the joint activation of auditory and visual areas with left SMA could reflect a signal-driven timing mechanism facilitating the transformation of the perceived acoustic signal into an active pre-articulatory phonetic/phonological output structure in our speech generation system under time-critical conditions (see Hertrich et al., 2013b, for more details).

#### **CONCLUSIONS**

Several participants were found able to increase ultra-fast speech (18 syl/s) comprehension up to a level of ca. 70% points, irrespective of residual vision, across a training period of about 6 months. However, subjects seem to deploy distinct patterns of neuroplasticity in order to overcome the bottleneck of temporal resolution associated with high speaking rates: In case of very low residual vision, the right-hemispheric visual system (V1, Pv) was, apparently, recruited, indicating functional reorganization at the level of the occipital lobes. By contrast, participants with normal or high residual vision did not show any training-induced responses of the central-visual system. However, they displayed increased activation (redistribution) of several parts of the classical speech processing network. Although both patterns of neuroplasticity resulted in comparable training effects, the strategy of processing ultra-fast speech might significantly differ: In case of a deployment of the visual system, actual temporal resolution of the acoustic signal might increase via the representation of syllabic trigger cues. If, however, the visual system is not at a subject's disposal for spoken language processing, listeners may use, as an alternative, a strategy based on informational redundancies of the speech signal. Independent of residual vision, left-hemispheric SMA and IFG activation emerged as a common indicator of speech comprehension. In case of proficient ultra-fast speech comprehension, however, only SMA showed stronger activation during application of ultra-fast as compared to moderately fast test materials, pointing at a critical contribution to spoken language understanding under time-critical conditions.

#### **AUTHOR CONTRIBUTIONS**

Hermann Ackermann, Ingo Hertrich, and Susanne Dietrich delineated the rationale and developed the design of the study. IH and SD were engaged in data collection and development of analysis methods. SD performed the behavioral and fMRI data analyses, and drafted the first version of the paper. All authors contributed to the final version of the manuscript and approved its content.

#### **ACKNOWLEDGMENTS**

This study was supported by the German Research Foundation (DFG; AC 55 09/01) and the Hertie Institute for Clinical Brain Research, Tübingen. The authors would like to thank Maike Borutta for excellent technical assistance, recruiting and taking care of the participants.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2013.00701/abstract

#### **REFERENCES**


mesiofrontal cortex to the preparation and execution of repetitive syllable productions: an fMRI study. *Neuroimage* 50, 1219–1230. doi: 10.1016/j.neuroimage.2010.01.039


**Supplementary file 1 | An example for forward moderately fast speech (8 syl/s).** "Wegen den anstehenden wichtigen Prüfungen muss er viel lernen."

**Supplementary file 2 | An example for forward ultra-fast speech (18 syl/s).** "Die Mitfahrerin stellte sich als gute Gesprächspartnerin heraus."

**Supplementary file 3 | An example for reversed moderately fast speech (8 syl/s). See Supplementary file 1.**

**Supplementary file 4 | An example for reversed ultra-fast speech (18 syl/s). See Supplementary file 2.**

**Supplementary file 5 | An example of the repetition task for ultra-fast speech of blind listeners with ca. 70% comprehension capability.** "Die Karten für das Konzert nächstes Jahr sind bereits jetzt ausverkauft."

**Supplementary file 6 | Coordinates describing the SPM** *T***-contrasts "forward ultra-fast speech condition versus baseline" pre- and post-training separately.**

**Supplementary file 7 | Coordinates describing the SPM** *T***-contrasts "baseline versus forward ultra-fast speech condition" pre- and post-training separately.**

**Supplementary file 8 | Coordinates describing the SPM** *T***-contrasts "post- versus pre-training" during the forward ultra-fast speech condition (and vice versa).**

**Supplementary file 9 | Coordinates describing the SPM** *T***-contrasts "post- versus pre-training" (and vice versa) during the null-event (**=**baseline).**

**Supplementary file 10 | Coordinates describing the Three-Way interaction "***Mode* × *Speech rate* × *Training***" resulting from the ANOVA exemplified to subject nos. 147 and 142.**

motor area. *Am. J. Neuroradiol*. 26, 1819–1823.


(2003). Cross-modal plasticity for sensory and motor activation patterns in blind subjects. *Neuroimage* 19, 968–975. doi: 10.1016/S1053-8119(03)00114-9


the speech signal—time-locked MEG signals during perception of ultra-fast and moderately fast speech in blind and in sighted listeners. *Brain Lang*. 124, 9–12. doi: 10.1016/j.bandl.2012.10.006


Cerebellum and speech perception: a functional magnetic resonance imaging study. *J. Cogn. Neurosci*. 14, 902–912. doi: 10.1162/089892902760191126


musical expertise and attentional focus. *Eur. J. Neurosci.* 30, 1636–1642. doi: 10.1111/j.1460-9568.2009.06955.x


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 July 2013; accepted: 03 October 2013; published online: 23 October 2013.*

*Citation: Dietrich S, Hertrich I and Ackermann H (2013) Training of ultra-fast speech comprehension induces functional reorganization of the central-visual system in late-blind humans. Front. Hum. Neurosci. 7:701. doi: 10.3389/fnhum.2013.00701*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Dietrich, Hertrich and Ackermann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Neural correlates of verbal creativity: differences in resting-state functional connectivity associated with expertise in creative writing

**Martin Lotze<sup>1</sup>\*†, Katharina Erhard<sup>1</sup>†, Nicola Neumann<sup>1</sup>, Simon B. Eickhoff<sup>2,3</sup> and Robert Langner<sup>2,3</sup>**

<sup>1</sup> Functional Imaging Unit, Center for Diagnostic Radiology and Neuroradiology, University of Greifswald, Greifswald, Germany

<sup>2</sup> Institute of Clinical Neuroscience and Medical Psychology, Heinrich Heine University Düsseldorf, Düsseldorf, Germany

<sup>3</sup> Institute of Neuroscience and Medicine, Research Centre Jülich, Jülich, Germany

#### **Edited by:**

Merim Bilalic, Alpen Adria University Klagenfurt, Austria

#### **Reviewed by:**

Moritz F. Wurm, University of Trento, Italy Ulrike Halsband, University of Freiburg, Germany

#### **\*Correspondence:**

Martin Lotze, Functional Imaging Unit, Center for Diagnostic Radiology and Neuroradiology, University of Greifswald, Walther-Rathenau-Straße 46, D-17475 Greifswald, Germany e-mail: martin.lotze@uni-greifswald.de

†These authors have contributed equally to this work.

Neural characteristics of verbal creativity as assessed by word generation tasks have been recently identified, but differences in resting-state functional connectivity (rFC) between experts and non-experts in creative writing have not been reported yet. Previous electroencephalography (EEG) coherence measures during rest demonstrated a decreased cooperation between brain areas in association with creative thinking ability. Here, we used resting-state functional magnetic resonance imaging to compare 20 experts in creative writing and 23 age-matched non-experts with respect to rFC strengths within a brain network previously found to be associated with creative writing. Decreased rFC for experts was found between areas 44 of both hemispheres. Increased rFC for experts was observed between right hemispheric caudate and intraparietal sulcus. Correlation analysis of verbal creativity indices (VCIs) with rFC values in the expert group revealed predominantly negative associations, particularly of rFC between left area 44 and left temporal pole. Overall, our data support previous findings of reduced connectivity between interhemispheric areas and increased right-hemispheric connectivity during rest in highly verbally creative individuals.

**Keywords: creativity, expertise, resting-state-fMRI, functional connectivity, temporal pole, interhemispheric connectivity, basal ganglia, brain**

#### **INTRODUCTION**

Creativity is considered the ability to produce original and unexpected work that is appropriate for a given goal (Stein, 1953). Recent reviews (Dietrich, 2007; Abraham, 2013) emphasized the need for cognitive models that define different aspects of creativity and specify the underlying cognitive processes. In a preliminary framework to distinguish between branches of creativity, Abraham (2013) distinguished a problem-solving domain and an expression domain. In that framework, creative expression denotes the ability to express oneself in a unique manner. Within the expression domain, there are subgroups depending on the nature of the task (verbal, art, music, etc.). Referring to this framework, the present study represents a between-group approach, in which groups differ with regard to their ability to write creative texts (verbal expression domain).

Two previous studies have investigated the neural correlates of creative story writing. In a positron-emission-tomography study, Bechtereva et al. (2004) found left parieto-temporal regions (BA 39, 40) active during a difficult story generation condition in comparison to an easier one as well as conditions that controlled for syntactic and memory related aspects of the task. The authors concluded that these areas are required to provide the necessary flexibility for creative thinking. In contrast, Howard-Jones et al. (2005) using fMRI found right prefrontal areas as well as the anterior cingulate cortex associated with creative versus uncreative story generation. These activations were connected to episodic retrieval, monitoring, and higher cognitive control. Neither of the two studies investigating story generation, however, involved actual writing in the scanner.

When using a text continuation task, we recently demonstrated that creative writing involved bilateral hippocampi, temporal poles (BA 38) and the cingulate cortex (CC; Shah et al., 2013). These areas have been associated with episodic memory retrieval, free-associative and spontaneous cognition and semantic integration. In addition, there were correlations of the verbal creativity index (VCI; Schoppe, 1975) with activations in the left inferior frontal gyrus (BA 44), the left middle frontal gyrus (BA 9/46) and the left temporal pole (BA 38). In a recent study (Erhard et al., 2014), we compared functional activation during creative writing in groups of expert and non-expert writers using the same paradigm. Experts showed increased left-hemispheric activation in the caudate nucleus and superior medial prefrontal cortex.

Apart from task-related activation sites, studies on the interaction between brain areas in specific networks are informative, especially in creativity research (Jung et al., 2013). Functional connectivity is defined as the statistical association among two or more anatomically distinct time-series (Friston et al., 1993) and can *inter alia* be assessed with electroencephalography (EEG) coherence measures or fMRI resting-state functional connectivity. Generally, creative achievement has been connected to decreased cortical arousal, as demonstrated by an increased EEG alpha power (Martindale and Hines, 1975; Fink and Benedek, 2012) and reduced scores on "latent inhibition", the capacity to screen from conscious awareness stimuli previously experienced as irrelevant (Carson et al., 2003). EEG coherence measures determined during rest have indicated less cooperation between brain areas in more creative individuals. This decoupling of brain areas has been found equally distributed over the right and left hemispheres, showing also significant interhemispheric decoupling (Jausovec and Jausovec, 2000). fMRI resting-state functional connectivity (rFC) studies revealed that higher creativity scores were associated with an increased rFC between the medial prefrontal cortex (mPFC) and the posterior cingulate cortex (Takeuchi et al., 2012), which was linked to a stronger interaction within the default mode network (Raichle et al., 2001). Wei et al. (2014) found a positive correlation of rFC between the left mPFC and the left middle temporal gyrus with creativity scores, which was interpreted as representing another hub associated with the default mode network. Increased rFC between the mPFC and the posterior cingulate cortex, however, was not confirmed. Taken together, the two existing studies on rFC and creativity were not consistent, but both identified connections associated with the default mode network.

In contrast to the above-mentioned studies, in the present study we investigated rFC in a group of experts in creative writing and compared it to non-experts. rFC was not only correlated to creativity scores assessed by normative creativity tests, but also to the rating of the actual texts written inside the scanner. Seeds for FC analysis were selected from activation maxima calculated during a text continuation task performed by participants included in two previous studies (Shah et al., 2013; Erhard et al., 2014). These areas were the pars opercularis of the inferior frontal gyrus (area 44), the CC, the superior temporal gyrus (STG), the hippocampus, the caudate nucleus (caudate), and the intraparietal sulcus (IPS). In line with the findings of EEG resting-state studies, we hypothesized decreased interhemispheric and left-hemispheric FC and increased right-hemispheric FC in the expert writers.

#### **METHODS**

#### **PARTICIPANTS**

We investigated 43 native German participants. The expert group comprised twenty students of Creative Writing and Culture Journalism from the only two universities in Germany that offer academic courses in creative writing, the Universities of Hildesheim and Leipzig (8 females and 12 males; mean age: 25.2, standard deviation (±) 2.7; mean semester: 7.1 ± 3.9). These students can be considered well-selected and domain-specific talented people, because the selection criteria for the programs are extremely competitive and only 6% of applications are accepted. Twenty-three students from the University of Greifswald (non-experts in creative writing; 11 female and 12 male; mean age: 24.0 ± 1.9) formed the control group. For the non-expert group, we investigated 22 students of medicine, four students of the humanities (psychology, history, English, philosophy), one student from the faculty of law, and one student from the faculty of business. All participants were right-handed (as assessed by the Edinburgh Handedness Inventory; Oldfield, 1971) and reported no history of neurological or psychiatric disorders. Written informed consent was obtained from all participants before entering the study, which was approved by the Ethics Committee of the Medical Faculty of the University of Greifswald.

#### **EXPERTISE MEASURES**

All participants were asked about their experience and practice of creative writing. Experts reported writing experience of 11.7 ± 4.8 years on average, including their studies of creative writing, whereas the non-experts claimed an average of 3.1 ± 5.2 years. Weekly writing practice during the last three months before scanning amounted to 21.0 ± 10.2 h for the expert and 0.5 ± 0.8 h for the non-expert group (*t*(19) = 9.0; *p* < 0.001). Likewise, experts had more years of experience (*t*(46) = 5.80; *p* < 0.001). Adapting a method commonly used in music research, we calculated an individual "practice index" (PI) by multiplication of creative writing experience with weekly writing practice [(semester + years of writing practice) × practice of writing per week].
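
The practice index described above can be written out as a one-line computation. The following is a minimal sketch, assuming the bracketed formula is applied literally; the function name, argument names, and the example participant values are hypothetical and are not taken from the study data.

```python
def practice_index(semesters: float, years_of_practice: float, weekly_hours: float) -> float:
    """Practice index (PI) as stated in the text:
    (semester + years of writing practice) x practice of writing per week.
    """
    return (semesters + years_of_practice) * weekly_hours


# Hypothetical participant, values invented for illustration only.
example_pi = practice_index(semesters=7, years_of_practice=10, weekly_hours=20)
print(example_pi)
```

Because weekly practice enters as a multiplicative factor, a participant who reports no current writing practice receives a PI of zero regardless of past experience.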

#### **TASK**

All participants continued two different literary texts (text A written by Ror Wolf; text B written by Durs Grünbein), writing for 2 min and 20 s on each. In accordance with the consensual assessment technique (CAT; Amabile, 1996), all produced texts were typewritten and sent in randomized order to four independent judges who were generally familiar with the domain (two professors and two lecturers from the department of Creative Writing and Culture Journalism at the University of Hildesheim). All judges rated the creativity of each text on a 10-cm-long visual analog scale (VAS; from 0: not creative at all, to 10: extremely creative). The creative writing performance in the scanner (creativity rating; CR) was calculated for every participant as the mean "creativity" rating from all judges across texts A and B.

The verbal creativity test (Schoppe, 1975), which yields a summary VCI, consisted of nine subtests assessing the participants' verbal fluency and verbal production skills; some subtests also included aspects of flexibility and originality. We evaluated the test according to its standardized instructions.
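
The rating procedure above amounts to averaging VAS scores over judges and texts. A minimal sketch follows, assuming each text receives one score per judge; the function name and the example scores are hypothetical.

```python
from statistics import mean


def creativity_rating(ratings_text_a, ratings_text_b):
    """Mean VAS score (0 to 10) over all judges and both texts.

    Each argument holds one score per judge for the respective text.
    """
    if len(ratings_text_a) != len(ratings_text_b):
        raise ValueError("each judge must rate both texts")
    return mean(list(ratings_text_a) + list(ratings_text_b))


# Hypothetical scores from four judges, invented for illustration only.
cr = creativity_rating([4, 5, 6, 5], [6, 7, 6, 5])
print(cr)
```

With the same number of judges per text, pooling all scores gives the same value as averaging the two per-text means.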

#### **DEFINITION OF THE SEED REGIONS RELATED TO CREATIVE WRITING**

The regions of interest ("seeds") for the present investigation had previously been identified from fMRI activation observed during a text continuation task (Shah et al., 2013; Erhard et al., 2014). The main effects for both groups were calculated and thresholded at *p* < 0.05 (false discovery rate (FDR)-corrected for multiple comparisons across the whole brain). The following seeds showed significance and were therefore tested in the present study: left posterior area 44 (MNI coordinates: −57, 6, 27), right posterior area 44 (54, 6, 33), medial cingulate cortex (MCC; −9, 12, 42), left IPS (−36, −39, 48), right IPS (42, −33, 45), left hippocampus (−27, −9, −24), left temporal pole (−48, 9, −18), left posterior superior temporal sulcus (STS) (−54, −42, 6), left (−9, 3, 15), and right (15, 21, 15) caudate nucleus. The peak activation foci of each cluster were taken as centers of spheres with 5 mm radius to define the volumes of interest for the present analysis.
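
To illustrate how a 5 mm sphere around a peak coordinate translates into a set of voxels, the sketch below enumerates voxel centers on a regular grid. It assumes a 1.5 mm isotropic grid (the resampled voxel size reported for preprocessing) and, for simplicity, that the peak coincides with a voxel center; a real pipeline would use the image affine and a gray-matter mask instead. All names are hypothetical.

```python
import math

# Seed peaks (MNI, mm) as listed in the text.
SEEDS_MNI = {
    "left area 44": (-57, 6, 27),
    "right area 44": (54, 6, 33),
    "medial cingulate cortex": (-9, 12, 42),
    "left IPS": (-36, -39, 48),
    "right IPS": (42, -33, 45),
    "left hippocampus": (-27, -9, -24),
    "left temporal pole": (-48, 9, -18),
    "left posterior STS": (-54, -42, 6),
    "left caudate": (-9, 3, 15),
    "right caudate": (15, 21, 15),
}


def sphere_voxels(center, radius_mm=5.0, voxel_mm=1.5):
    """Return voxel-center coordinates (mm) lying within radius_mm of center."""
    n = int(math.floor(radius_mm / voxel_mm))  # maximum offset in voxel steps
    cx, cy, cz = center
    voxels = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            for k in range(-n, n + 1):
                # Keep the voxel if its center is inside the sphere.
                if (i * i + j * j + k * k) * voxel_mm ** 2 <= radius_mm ** 2:
                    voxels.append((cx + i * voxel_mm, cy + j * voxel_mm, cz + k * voxel_mm))
    return voxels


seed_masks = {name: sphere_voxels(xyz) for name, xyz in SEEDS_MNI.items()}
print({name: len(vox) for name, vox in seed_masks.items()})
```

Under these simplifying assumptions every seed contains the same number of voxels; restricting each sphere to gray matter, as described for the time-course extraction, would make the counts differ between seeds.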

#### **MAGNETIC RESONANCE IMAGING**

Data were acquired at a 3T Siemens Magnetom Verio (Siemens, Erlangen, Germany) with a 32-channel headcoil. Two-dimensional echo-planar images (EPI) were acquired with repetition time TR = 2000 ms, echo time TE = 30 ms, flip angle = 90°, field of view = 192 × 192 mm<sup>2</sup>. Each volume consisted of 34 slices with a voxel size of 3 × 3 × 3 mm<sup>3</sup> with a 1-mm gap between them. The first two volumes of each run were discarded to allow for T1 equilibration effects.

We used baseline scans interspersed in an experimental block design alternating task-related activation and rest. Overall, each of six different activation conditions was presented twice to each participant. Experimental blocks between baseline periods used for resting-state analysis were five blocks of 60 s duration (experimental conditions: reading, copying, silent speech, brainstorming, correcting) and one block of 140 s duration (creative writing). Participants were presented instructions on a scanner-adapted desk. A double mirror affixed on the headcoil enabled the view on the in-scanner desk with the instruction sheets, the text material, the writing sheet, and the fixation cross. During rest, participants were instructed to stop thinking about the task and fixate a fixation cross (eyes-open baseline). Rest periods had a total duration of 20 s (10 volumes). The first three scans of each baseline period were not used in order to reduce BOLD effects from the activation period. The procedure for using baseline scans from block design fMRI experiments was adapted for our purpose from Fair et al. (2007). For rFC analysis, 90 volumes for each participant were used (5 × 8 volumes before and between blocks, plus five volumes at the end of each scanning session). Details on fluctuations during baseline have been described more recently by Garrett et al. (2010). In total, 90 baseline scans of resting state data from two consecutive measurement runs were available for analysis.

#### **RESTING-STATE CONNECTIVITY ANALYSIS**

Data were jointly preprocessed using SPM8 (Wellcome Department of Cognitive Neuroscience, London, UK). Images were first corrected for head movement by affine registration using a two-pass procedure by which images were initially realigned to the first image and subsequently to the mean of the realigned images. Each participant's mean image was then spatially normalized to the Montreal Neurological Institute (MNI) single-subject template brain using the "unified segmentation" approach (Ashburner and Friston, 2005), and the ensuing deformation was applied to the individual EPI volumes. Hereby, volumes were resampled at 1.5 × 1.5 × 1.5 mm<sup>3</sup> voxel size. Images were then smoothed by a 5 mm full-width at half-maximum Gaussian kernel to increase the signal-to-noise ratio and compensate for remaining differences in individual anatomy.

rFC measures can be influenced by several confounds such as head movements and physiological processes (e.g., fluctuations due to cardiac and respiratory cycles; cf. Fox et al., 2009). In order to reduce spurious correlations, variance explained by the following nuisance variables was removed from each voxel's BOLD signal time series (for a detailed evaluation of this procedure see Satterthwaite et al., 2013): (i) the six motion parameters derived from the image realignment; (ii) the first derivatives of the six motion parameters; (iii) mean tissue-class specific signal intensity per time point (Cieslik et al., 2013). All nuisance variables entered the regression model as first- and second-order terms, resulting in a total of 30 nuisance regressors. After confound removal, data were band-pass filtered preserving frequencies between 0.01 and 0.08 Hz, as meaningful resting-state correlations will predominantly be found in these frequencies given that the BOLD response acts as a low-pass filter (Greicius et al., 2003).
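
The confound-removal and filtering steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the SPM8-based implementation used in the study: three tissue classes are assumed because 6 + 6 + 3 first-order terms doubled give the reported 30 regressors, the derivative is a simple backward difference, and the band-pass is an ideal Fourier-domain filter. All function names are hypothetical.

```python
import numpy as np


def build_confounds(motion, tissue):
    """Stack first- and second-order nuisance terms.

    motion: (T, 6) realignment parameters.
    tissue: (T, 3) mean signal per tissue class (three classes is an assumption).
    """
    # First derivative of the motion parameters (zero for the first volume).
    deriv = np.vstack([np.zeros((1, motion.shape[1])), np.diff(motion, axis=0)])
    first_order = np.hstack([motion, deriv, tissue])    # 6 + 6 + 3 = 15 columns
    return np.hstack([first_order, first_order ** 2])   # 15 * 2 = 30 columns


def remove_confounds(ts, confounds):
    """Return the residual of ts after least-squares regression on the confounds."""
    design = np.column_stack([np.ones(len(ts)), confounds])  # add an intercept
    beta, *_ = np.linalg.lstsq(design, ts, rcond=None)
    return ts - design @ beta


def bandpass(ts, tr=2.0, low=0.01, high=0.08):
    """Zero all Fourier components outside [low, high] Hz (ideal filter)."""
    freqs = np.fft.rfftfreq(len(ts), d=tr)
    spectrum = np.fft.rfft(ts)
    spectrum[(freqs < low) | (freqs > high)] = 0
    return np.fft.irfft(spectrum, n=len(ts))


# Synthetic data for illustration only: 90 volumes at TR = 2 s.
rng = np.random.default_rng(0)
motion = rng.normal(size=(90, 6))
tissue = rng.normal(size=(90, 3))
voxel_ts = rng.normal(size=90)
cleaned = bandpass(remove_confounds(voxel_ts, build_confounds(motion, tissue)))
print(cleaned.shape)
```

Note that the baseline volumes in this study come from short, non-contiguous rest periods, so how the filter was applied across period boundaries is not something this sketch can reproduce.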

The time course of each seed region's BOLD signal was then extracted for each participant as the first eigenvariate of activity in all gray-matter voxels located within the respective seed. For each participant, the time-series data of each seed region were correlated with each other, and the resulting Pearson correlation coefficients were transformed into Fisher's *Z* scores. Subsequently, the influence of age and sex was partialled out of both the resting-state correlations and the covariates of interest. Main effects of rFC (across the entire sample) were tested by one-sample *t*-tests, applying a significance threshold of *p* < 0.05 (adjusted for multiple comparisons by FDR correction). Median rFC in experts and non-experts was compared via a non-parametric approach using 10,000 realizations of the null hypothesis (group-label exchangeability) in a Monte-Carlo simulation to create an empirical null distribution of group differences (posterior-probability significance threshold: *p* > 0.95, uncorrected). Additionally, we applied effect-size criteria: first, differences were only considered potentially relevant if the rFC score in either group (or both) corresponded at least to a small effect (i.e., *r* ≥ 0.10). Second, the between-group difference in rFC itself needed to correspond to a large effect (i.e., Cohen's *d* ≥ 0.80) to be considered relevant here. Finally, creativity-related changes in interregional coupling were examined by rank-correlating participants' Fisher-*Z*-transformed rFC values with creativity scores across both the entire sample and the expert subgroup alone. The results of these Spearman correlation analyses were regarded significant if they passed a threshold of *p* < 0.05 (uncorrected). Again, we applied an effect-size criterion: accordingly, correlations were regarded as relevant if they were of at least medium size according to Cohen's effect-size categorization (i.e., *r* ≥ 0.24).
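
The group comparison described above combines a Fisher transform, a label-exchange Monte-Carlo test on the group medians, and an effect-size criterion. A minimal standard-library sketch follows. Several details are assumptions rather than statements from the text: the test is treated as two-sided, the reported probability is taken to be the fraction of null realizations with a smaller absolute median difference than the observed one, and Cohen's *d* uses the pooled standard deviation. Function names and example values are hypothetical.

```python
import math
import random
import statistics


def fisher_z(r):
    """Fisher transform of a Pearson correlation coefficient."""
    return math.atanh(r)


def cohens_d(a, b):
    """Cohen's d with pooled standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def permutation_test_median(a, b, n_perm=10_000, seed=0):
    """Compare group medians under group-label exchangeability.

    Returns the observed median difference and the fraction of permuted
    differences whose absolute value is smaller than the observed one.
    """
    rng = random.Random(seed)
    observed = statistics.median(a) - statistics.median(b)
    pooled = list(a) + list(b)
    smaller = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = statistics.median(pooled[:len(a)]) - statistics.median(pooled[len(a):])
        if abs(diff) < abs(observed):
            smaller += 1
    return observed, smaller / n_perm


# Hypothetical seed-pair correlations for two groups, invented for illustration.
experts = [fisher_z(r) for r in (0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.25, 0.28)]
controls = [fisher_z(r) for r in (0.55, 0.60, 0.62, 0.65, 0.70, 0.72, 0.75, 0.78)]
diff, prob = permutation_test_median(experts, controls)
relevant = prob > 0.95 and abs(cohens_d(experts, controls)) >= 0.80
print(diff, prob, relevant)
```

Requiring both a high permutation probability and a large effect size, as the text does, guards against reporting differences that are statistically detectable but small.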

#### **RESULTS**

#### **CREATIVITY SCORES**

Mean VCI (Schoppe, 1975) was 116.5 ± 9.9 for expert writers and 107.1 ± 8.8 for non-experts (*t*(46) = 3.42; *p* < 0.01). The creative performance of experts (creativity rating; CR) was generally judged higher than that of non-experts (*t*(46) = 3.36, *p* < 0.01). We observed a positive correlation between creative performance in the scanner and individual verbal creativity scores (CR and VCI: *r* = 0.38, *p* < 0.01). Expert writers had much more experience in writing creative texts (PI) than non-experts (*t*(19) = 6.24; *p* < 0.001), and this correlated positively with performance (PI and CR: *r* = 0.46, *p* < 0.01) and individual verbal creativity (PI and VCI: *r* = 0.43, *p* < 0.01).

#### **BASIC RESTING-STATE FUNCTIONAL CONNECTIVITY (rFC)**

Across both groups, there was significant positive rFC between the following regions: bilateral IPS, area 44, and caudate nucleus, respectively, were all highly interconnected between hemispheres (IPS: *z* = 5.37; areas 44: *z* = 3.09; caudate nuclei: *z* = 3.19). IPS was additionally coupled to ipsilateral areas 44 on both hemispheres (right: *z* = 4.87, left: *z* = 5.14), and left IPS was coupled with right area 44 (*z* = 4.91). The left hippocampus and the left temporal pole also showed high rFC (*z* = 3.77; see **Figure 1**). In addition, left area 44 was significantly interconnected with MCC (*z* = 2.82).

#### **RESTING-STATE FC DIFFERENCES BETWEEN EXPERTS AND NON-EXPERTS**

Experts showed significantly increased rFC between right IPS and right caudate nucleus (*p* > 0.99, *d* = 1.00; see **Figure 2A**). Furthermore, experts showed reduced rFC between hemispheres in bilateral area 44 (*p* > 0.99, *d* = 1.08; see **Figure 2B**), between right area 44 and left IPS (*p* = 0.99, *d* = 1.12), as well as between right area 44 and left caudate nucleus (*p* = 0.97, *d* = 0.83).

#### **CORRELATIONS BETWEEN CREATIVITY SCORES AND rFC ACROSS ALL PARTICIPANTS**

We observed positive associations of the creativity ratings of the texts (CR) with rFC between left IPS and right caudate (*r* = 0.31, *p* = 0.041). Negative correlations of CR with rFC were found between left area 44 and right IPS (*r* = −0.36; *p* = 0.018) as well as rFC between left area 44 and left aSTG (*r* = −0.35; *p* = 0.022). Furthermore, we observed several negative correlations between the VCI and FC, specifically interhemispheric rFC between bilateral area 44 (*r* = −0.43; *p* = 0.004) as well as rFC between left temporal pole and left caudate (*r* = −0.37; *p* = 0.015), left IPS and left hippocampus (*r* = −0.34; *p* = 0.03), and right IPS and MCC (*r* = −0.31; *p* = 0.041).

#### **CORRELATIONS BETWEEN CREATIVITY SCORES AND rFC IN EXPERTS**

In experts, rFC between left area 44 and left temporal pole correlated negatively with CR (*r* = −0.62; *p* = 0.004; **Figure 3A**), while a positive correlation of CR was observed with rFC between right area 44 and left posterior STS (*r* = 0.55; *p* = 0.013) as well as rFC between left IPS and right caudate (*r* = 0.45; *p* = 0.0498). Furthermore, rFC between left temporal pole and left caudate was negatively associated with VCI (*r* = −0.54; *p* = 0.014; **Figure 3B**). Conversely, rFC between left IPS and right caudate was positively correlated with VCI (*r* = 0.44; *p* = 0.0496) in experts. Comparisons of the correlations between creativity scores and rFC in experts and non-experts yielded no significant results.

#### **DISCUSSION**

We here compared rFC between experts and non-experts in the field of creative writing. Experts showed considerably higher creativity scores in the verbal creativity tests and the creativity ratings of their written texts. These differences in performance might therefore well be associated with differential connectivity during rest. For investigating rFC, we did not investigate the default mode network as recently done by others (Wei et al., 2014), but used seed regions based on our previous studies with the same story generation task (Shah et al., 2013; Erhard et al., 2014). Across both groups, interhemispheric rFC during rest was high between the intraparietal sulci, the caudate nuclei, and the areas 44. Additionally, the IPS was significantly connected to the ipsilateral area 44 on both hemispheres, as well as the left hippocampus to the left temporal pole. In addition, left area 44 was significantly interconnected with the MCC. Experts in creative writing differed from non-experts by an increased rFC between right caudate and right IPS and a reduced interhemispheric rFC between BA 44 and IPS and caudate.

**FIGURE 3 | Correlation of behavioral data and FC in the expert group. (A)** Negative correlation between verbal creativity rating (Amabile, 1996) and connectivity between left area 44 and left temporal pole (TP). **(B)** Negative correlation between creativity index (CI; Schoppe, 1975) and connectivity between left TP and caudate.

Across all participants, behavioral creativity scores were predominantly inversely correlated to interhemispheric and left-intrahemispheric rFC. Only rFC between the left IPS and the right caudate correlated positively with the creativity rating of the text. The same pattern was observed in experts, since left-hemispheric and interhemispheric rFC was negatively associated with creativity, apart from positive correlations with the rFC of the right caudate and left IPS and of the right area 44 and the left posterior STS.

#### **DECREASED LEFT- AND INTERHEMISPHERIC rFC IN EXPERTS OF CREATIVE WRITING**

rFC in experts was characterized by a reduced left- and interhemispheric integration. These findings corroborate previous results of EEG coherence measures that demonstrated less cooperation of brain areas related to creative thinking (Jausovec and Jausovec, 2000). In that study, verbal creativity scores were negatively associated with right-hemispheric and interhemispheric connectivity between cortical areas (Jausovec and Jausovec, 2000), supporting previous suggestions (Petsche, 1996) that the functional relations between brain regions, rather than localized power measures, are the better indicators of individual differences. In a recent review, Jung et al. (2013) described creative cognition as characterized by "blind variation" (idea generation) and "selective retention" (convergent thinking) processes (Simonton, 2013). Within this framework, the default mode network (Raichle et al., 2001) could serve as a system operating disinhibitory mental simulation processes, whereas specific associated cognitive control networks based on excitatory processes would initiate selection processes and refine ideas (Jung et al., 2013). Our data fit this model insofar as they provide evidence for a reduced left- and interhemispheric integration of language areas (especially left area 44) that may lead to a more autonomous and less constraining functioning of separate elements of the network. Previous rFC data (Takeuchi et al., 2012; Wei et al., 2014) have stressed the involvement of the default mode network in creative cognition.

#### **THE ROLE OF THE LEFT CAUDATE IN VERBAL CREATIVITY TASKS**

Concerning the role of the basal ganglia in creative cognition, Abraham et al. (2012a) found that a group of patients with basal ganglia lesions scored better in a creativity task that requires overcoming the constraining influences of salient examples. The fact that patients with lesions of the basal ganglia may perform better in problem-solving tasks that require ignoring pieces of information fits well with the literature on inhibitory control operations, which are considered a central function of the basal ganglia. Basal ganglia lesions thus result in poor inhibitory control, inattention and increased distractibility (Aron et al., 2003), which is advantageous in overcoming knowledge constraints. In the present study, verbal creativity of experts was correlated with reduced FC of the left caudate with the left temporal pole, which is in agreement with the idea that enhanced verbal creativity goes along with decreased inhibition. The left temporal pole is considered a semantic "hub" in the brain (Patterson et al., 2007) and has been found involved in verbal (Abraham et al., 2012b) and nonverbal (Ellamil et al., 2012) creativity tasks, as well as in figurative language comprehension, such as metaphors (Schmidt and Seger, 2009; Mihov et al., 2010). Disinhibition of the left temporal pole may thus contribute to excellent verbal skills in experts of creative writing.

#### **INCREASED rFC BETWEEN INTRAPARIETAL SULCUS AND RIGHT CAUDATE IN EXPERTS**

Remarkably, in our study experts showed decreased corticocortical left and interhemispheric connectivity but increased right-hemispheric interactions of the caudate with the intraparietal sulcus. Further, rFC between right caudate and left IPS was positively correlated with creativity measures. The IPS is involved in verbal short-term memory and functions as an attentional modulator of distant neural networks which themselves are specialized in processing language representations (Majerus et al., 2006). Bilateral caudate activation in turn has been observed during an untrained working memory task (Moore et al., 2013) as well as during long-term working memory training (Kühn et al., 2013) and seems to mediate changes in underlying working memory ability. Increased rFC between the right caudate and bilateral IPS in experts of creative writing may thus be connected to their special expertise and practice with handling verbal information and not be an expression of verbal creativity *per se*.

Although rFC between two regions need not be based on direct anatomical connectivity, the following white-matter fibers interconnecting our seeds might be relevant here: all interhemispheric connections (commissural fibers) pass through the corpus callosum. Such connections have been identified between the intraparietal sulci, the caudate nuclei, and the areas 44. As for *intra*hemispheric association fibers, the superior longitudinal fascicle III (Schmahmann et al., 2007) might be most relevant. This structure connects the IPS with ipsilateral area 44 on both hemispheres. Furthermore, the inferior longitudinal fascicle might be the relevant association fiber connecting the left hippocampus to the left temporal pole. Parts of the cingular bundle (Schmahmann et al., 2007) might interconnect left area 44 with the MCC. In addition, the right caudate and right IPS, whose functional interconnection was changed in the expert group, might be connected by the fronto-occipital fasciculus (Schmahmann et al., 2007).

#### **LIMITATIONS**

There are several limitations to the approach used in this study. One is that we selected baseline periods of a rest/activation blocked design study instead of measuring a single continuous resting-state period (cf. Fair et al., 2007). Since the number of scans used for our analysis is lower than in comparable resting-state analyses, our approach may have reduced statistical power, potentially leading us to miss relevant but smaller expertise effects. Therefore, further investigations of the effects of verbal creativity on brain networks using longer, continuous resting-state time series would be desirable.

In addition, it cannot be excluded that there may be a carryover of the BOLD response from the previous block. Since we did not investigate ROIs of the default mode networks, as did others (Takeuchi et al., 2012; Wei et al., 2014), we are not able to comment on their findings of increased medial prefrontal lobe rFC for creativity. Therefore, future studies might take a more exploratory approach to be able to encompass a wider set of regions associated with verbal creativity.

Finally, it has to be kept in mind that verbal creativity scores and practice in professional writing were associated in our participants. This association is inherent in the expertise approach chosen here to study neural correlates of verbal creativity. Therefore, our approach does not allow for disentangling influences of practice and innate predisposition (i.e., talent).

#### **CONCLUSION**

We here reported on the first comparison of rFC in an expert group in creative writing relative to a closely matched control group. Experts exhibited a reduced interhemispheric rFC and negative correlations of creativity scores with rFC between left caudate and left temporal pole which may indicate less inhibition and more autonomous functioning of language areas. On the other hand, rFC between the right caudate and IPS may reflect long-term experience with verbal information processing. Future studies might use modulation procedures to investigate changes in cortical interaction for different phases of creative verbal processes. In addition, a closer focus on basal ganglia cortical interaction, for instance in patients with basal ganglia lesions, could be an interesting direction for new research.

#### **ACKNOWLEDGMENTS**

We thank the University of Hildesheim and especially Florian Kessler and Hans-Josef Ortheil for the support and valuable help on this study.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 March 2014; paper pending published: 26 May 2014; accepted: 26 June 2014; published online: 15 July 2014.*

*Citation: Lotze M, Erhard K, Neumann N, Eickhoff SB and Langner R (2014) Neural correlates of verbal creativity: differences in resting-state functional connectivity associated with expertise in creative writing. Front. Hum. Neurosci. 8:516. doi: 10.3389/fnhum.2014.00516*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Lotze, Erhard, Neumann, Eickhoff and Langner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## Do neural correlates of face expertise vary with task demands? Event-related potential correlates of own- and other-race face inversion

#### *Holger Wiese\**

*DFG Research Unit Person Perception, Institute of Psychology, Friedrich Schiller University of Jena, Jena, Germany*

#### *Edited by:*

*Merim Bilalic, University Tübingen, Germany*

#### *Reviewed by:*

*Benjamin J. Balas, Massachusetts Institute of Technology, USA; Xin Zheng, York University, Canada*

#### *\*Correspondence:*

*Holger Wiese, DFG Research Unit Person Perception, Institute of Psychology, Friedrich Schiller University of Jena, Am Steiger 3/1, 07743 Jena, Germany e-mail: holger.wiese@uni-jena.de*

We are typically more accurate at remembering own- than other-race faces. This "own-race bias" has been suggested to result from enhanced expertise with and more efficient perceptual processing of own-race than other-race faces. In line with this idea, the N170, an event-related potential correlate of face perception, has been repeatedly found to be larger for other-race faces. Other studies, however, found no difference in N170 amplitude for faces from diverse ethnic groups. The present study tested whether these seemingly incongruent findings can be explained by varying task demands. European participants were presented with upright and inverted European and Asian faces (as well as European and Asian houses), and asked to either indicate the ethnicity or the orientation of the stimuli. Larger N170s for other-race faces were observed in the ethnicity but not in the orientation task, suggesting that the necessity to process facial category information is a minimum prerequisite for the occurrence of the effect. In addition, N170 inversion effects, with larger amplitudes for inverted relative to upright stimuli, were more pronounced for own- relative to other-race faces in both tasks. Overall, the present findings suggest that the occurrence of ethnicity effects in N170 for upright faces depends on the amount of facial detail required for the task at hand. At the same time, the larger inversion effects for own- than other-race faces occur independent of task and may reflect the fine-tuning of perceptual processing to faces of maximum expertise.

**Keywords: faces, event-related potentials, N170, own-race bias, inversion**

#### **INTRODUCTION**

Humans can typically recognize an immense number of previously seen faces and are therefore often considered to be experts in face recognition. Importantly, however, such expertise varies considerably depending on an individual's experience with a specific category of faces. Perhaps the best-known example of this is the so-called own-race<sup>1</sup> bias (Malpass and Kravitz, 1969; Meissner and Brigham, 2001), i.e., the finding that participants are typically more accurate at recognizing faces from their own compared to another ethnic group. This phenomenon has been explained by the substantially greater experience that most people have with faces from their own relative to other ethnic groups, resulting in a fine-tuning of face perception mechanisms (Rossion and Michel, 2011). For instance, it has been found that so-called holistic face processing, i.e., the merging of facial features into a single Gestalt-like representation, but also the processing of the features themselves, is more efficient for own- relative to other-race faces (Tanaka et al., 2004; Rhodes et al., 2006; Hayward et al., 2008).

Neural correlates of face perception have been extensively studied using event-related potentials (ERPs). Most researchers agree that the first component with a high degree of selectivity for faces is the N170 (Bentin et al., 1996; Rossion and Jacques, 2008), a negative component peaking at approximately 170 ms at occipito-temporal scalp sites. The N170 is larger for faces than for most objects (e.g., Rossion et al., 2000; Itier and Taylor, 2004), and it has been suggested to reflect the structural encoding (Eimer, 2011) or detection of a face-like pattern (Schweinberger and Burton, 2003; Amihai et al., 2011).

Following up on the suggestion that the own-race bias results from differences in perceptual face processing, a number of studies examined whether the ethnicity of a face affects the amplitude of the N170. At first sight, the results of these studies are rather discrepant, with roughly half of them reporting no significant difference in N170 for own- vs. other-race faces (James et al., 2001; Caldara et al., 2003, 2004; Wiese et al., 2009; Vizioli et al., 2010a,b; Herzmann et al., 2011; Ofan et al., 2011, 2013; Chen et al., 2013), and the other half finding larger amplitudes for other- relative to own-race faces (Herrmann et al., 2007; Gajewski et al., 2008; Stahl et al., 2008, 2010; Walker et al., 2008; He et al., 2009; Balas and Nelson, 2010; Brebner et al., 2011; Caharel et al., 2011; Wiese, 2012; Montalan et al., 2013; Wiese et al., 2013). Importantly, it has been suggested that varying task demands may contribute to these discrepant findings (e.g., Ito and Bartholow, 2009; Caharel et al., 2011). A larger N170 for other-race faces might occur when the identity of the face is relevant for the task, whereas N170 might be similar for own- and other-race faces when such detailed facial information is not task-relevant (as suggested by Ito and Bartholow, 2009).

<sup>1</sup>The term "race" is exclusively used to refer to visually distinct ethnic groups.

Broadly in line with this idea, the publications I am aware of to date can be roughly assigned to one of three categories (see **Table 1**): (i) studies with tasks that are based on *superficial information* in which specific facial detail was not directly task-relevant (e.g., orientation tasks, detection of target objects etc.), (ii) studies in which *categorical information* was task-relevant (e.g., race categorization, gender categorization etc.), and (iii) studies in which the encoding of *identity information* of individual faces was necessary to perform the task (e.g., recognition memory, identity-repetition tasks etc.).<sup>2</sup> As can be seen in **Table 1**, when studies are split into these categories the results are relatively consistent, with few exceptions from a clear-cut overall pattern: Whereas seven out of nine studies examining identity information reported an N170 ethnicity effect, with larger amplitudes for other- relative to own-race faces, only two out of ten studies using a task based on superficial information reported such an effect. When the task was based on categorical information, four out of six studies observed larger N170 amplitudes for other-race faces, with the remaining two either showing no effect or larger amplitudes for own-race faces. A crosstab χ<sup>2</sup> test (with studies classified by task category as dependent variables and coding a larger N170 for other-race faces as 1 and all other results as 0) resulted in a significant effect (χ<sup>2</sup> = 7.016, *p* = 0.030), suggesting a relevant influence of task demands on the occurrence of the N170 ethnicity effect. As is also evident from **Table 1**, other factors that might potentially contribute to the presence vs. absence of an N170 ethnicity effect (e.g., the use of color vs. grayscale images, or the specific other-race tested) do not offer a similarly straightforward explanation.
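The crosstab test can be retraced from the study counts quoted in the text. The following is a minimal sketch in Python, assuming the 3 × 2 table consists of exactly those counts (2 of 10 superficial, 4 of 6 categorical, 7 of 9 identity studies with a larger N170 for other-race faces); the function and variable names are illustrative and not part of the original analysis. For two degrees of freedom the χ<sup>2</sup> survival function reduces to exp(−χ<sup>2</sup>/2), so no statistics library is required.

```python
import math

# Counts taken from the text: (studies with a larger N170 for other-race faces,
# studies with any other outcome), per task category.
counts = {
    "superficial": (2, 8),
    "categorical": (4, 2),
    "identity": (7, 2),
}

def pearson_chi2(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

table = list(counts.values())
chi2 = pearson_chi2(table)
df = (len(table) - 1) * (len(table[0]) - 1)  # (3 - 1) * (2 - 1) = 2
# For df = 2 the chi-square survival function is exactly exp(-x / 2).
p_value = math.exp(-chi2 / 2)
print(f"chi2({df}) = {chi2:.3f}, p = {p_value:.3f}")
```

Under these assumptions the sketch yields χ<sup>2</sup>(2) = 7.016 and *p* = 0.030, in agreement with the reported values. Note that several expected cell counts are below 5, so the asymptotic *p* value should be read with some caution.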

In a recent study, Senholzi and Ito (2013) directly tested the influence of task on the N170 ethnicity effect. In line with the above-described pattern, they found that ethnicity effects were absent in a butterfly detection task, whereas N170 was larger for other-race faces in an identity task. In a categorization task, however, larger amplitudes for own-race faces were observed. In sum, the systematic review of the literature provided here points to an important influence of task on the presence vs. absence of the N170 ethnicity effect, but differences between studies may have also resulted from additional factors varying between participant groups (such as varying long-term expertise with other-race people in different countries, see Rossion and Michel, 2011). Moreover, in the study by Senholzi and Ito (2013) task was manipulated as a between-subjects factor, which introduced the possibility that group differences other than task that were not controlled in this study (such as differences in quality or quantity of contact with other-race people, or the distribution of participant gender in the three tasks) might have affected the results.

Furthermore, a number of the above-cited studies examined the so-called face inversion effect (FIE) for own- and other-race faces. It is well established that the picture-plane rotation of a face by 180° substantially impairs its recognition, and this effect is disproportionately stronger for faces relative to other objects (Yin, 1969). The FIE has been suggested to result from a substantial difficulty in processing configural or holistic information from inverted faces (Maurer et al., 2002; Rossion, 2008). Given that other-race faces are processed less holistically, one might assume that the FIE should be smaller for these faces, a finding which has indeed been observed repeatedly (e.g., Rhodes et al., 1989; Hancock and Rhodes, 2008). Moreover, it is known that face inversion affects the N170, which is increased and delayed for inverted relative to upright faces (e.g., Eimer, 2000; Rossion et al., 2000; Itier and Taylor, 2002). Accordingly, one might expect a larger N170 FIE for own- relative to other-race faces. However, results on this issue are also mixed, with some studies showing larger N170 inversion effects for own-race faces (Vizioli et al., 2010a; Caharel et al., 2011; Montalan et al., 2013), whereas others do not (Wiese et al., 2009; Chen et al., 2013). As can be seen in **Table 1**, the number of relevant studies is relatively small to date, but an enhanced N170 FIE for own-race faces has been shown repeatedly in categorization tasks, while the situation with more superficial tasks is less clear.

Finally, a number of ERP studies tested effects of face ethnicity on the amplitude of the occipito-temporal P2, a positive-going component subsequent to N170, which has been suggested to reflect the processing of second-order configurations (Mercure et al., 2008), i.e., the metric distances between facial features, or the typicality of a face relative to a prototype (Schulz et al., 2012). Previous studies have reported substantially larger P2 amplitudes at both left- and right-hemispheric electrode sites for own- relative to other-race faces in participants without particular expertise for other-race faces (Stahl et al., 2008; Lucas et al., 2011), whereas participants with substantial contact to people from the other ethnic background showed only small ethnicity effects in the P2 (Stahl et al., 2008; Wiese et al., 2013). Importantly for the present purpose, the P2 effect has also been found to be modulated by task demands (Stahl et al., 2010), as it was substantially smaller and restricted to right-hemispheric electrodes when participants were asked to rate own- and other-race faces for attractiveness as compared to categorizing them according to ethnicity. However, it is unclear whether the higher amount of facial detail necessary for the successful completion of the attractiveness task or the reduced salience of ethnicity information in this condition led to the reduced P2 effect.

The present study aimed at testing the following predictions. (i) If the presence vs. absence of the N170 ethnicity effect for upright faces depended on task demands, this should be detectable using a within-subjects manipulation, which excludes the possibility of confounds by uncontrolled group variables. N170 amplitude should be similar for own- and other-race faces in a more superficial task, in which detailed facial information is not task-relevant (orientation task). By contrast, N170 amplitudes should be larger for other-race faces when category information is task-relevant (ethnicity task). Only few studies on categorization tasks are available, and the present study aimed at adding further evidence to this least often tested task category.<sup>3</sup> (ii) A larger N170 FIE for own-race faces has been observed in tasks which either emphasized the processing of detailed facial information (Caharel et al., 2011; Montalan et al., 2013) or did not (Vizioli et al., 2010a). Moreover, two further studies with superficial tasks did not find a larger FIE for own-race faces (Wiese et al., 2009; Chen et al., 2013). I thus expected to find an increased N170 FIE for own-race faces in the ethnicity task, whereas it was less clear whether a corresponding effect would emerge in the more superficial orientation task or not. (iii) If the previously observed reduction of the P2 ethnicity effect in an attractiveness judgment relative to an ethnicity categorization task was related to the larger amount of detailed face processing required for the attractiveness decision, a more superficial orientation task should result in a similar (or even larger) P2 effect as a categorization task. If, however, salience of ethnicity information contributes to the P2 effect, it should be larger in the categorization relative to the orientation task. Finally, to test whether any potential task effects were selective for faces, I added non-facial control stimuli to the experiment (i.e., Asian and European houses).

<sup>2</sup>Please note that I do not consider the experiments by Ito and Urland (2005) in this review, as N170 is unusually small in this study, presumably due to the atypical use of an average mastoid reference.

<sup>3</sup>I chose not to test all three tasks, as this would have made the experiment inappropriately long.

#### **Table 1 | Previous studies on N170 ethnicity effects, sorted by task categories.**

#### **METHODS**

#### **PARTICIPANTS**

Twenty right-handed Caucasian students from the University of Jena (13 female, mean age = 22.0 years ± 2.1 *SD*) contributed data. All participants reported normal or corrected-to-normal vision and no history of neurological or psychiatric disorders. Participants received course credits or a monetary reward of 5 €/h for participating. All participants gave written informed consent and the study was approved by the ethics committee of the Faculty of Social and Behavioral Sciences at Jena University.

#### **STIMULI**

Color images depicting 50 Asian and 50 Caucasian full-frontal faces with neutral or moderately happy expressions (50% female respectively), as well as 50 Asian and 50 European houses were taken from various internet resources. The house stimuli consisted of traditional buildings only to ensure easy differentiation of their cultural origin. Faces and houses were cut out and pasted in front of a uniform black background, such that no clothing or background information was visible. All stimuli were cropped to a frame of 300 × 380 pixels, resulting in a visual angle of 6.7° × 8.5° at a viewing distance of 90 cm. Stimuli were matched for luminance and contrast using Adobe Photoshop. Inverted versions of all stimuli were created by picture-plane rotations of the images by 180°.
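The relation between stimulus size, viewing distance, and visual angle can be made explicit with a short calculation. The sketch below, in Python, uses the standard formula θ = 2·arctan(size / (2·distance)) and its inverse; the function names are illustrative, and the physical on-screen size is derived from the reported angles rather than stated in the text.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of a given size."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

def size_from_angle_cm(angle_deg, distance_cm):
    """Physical size (cm) that subtends a given visual angle."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

DISTANCE_CM = 90.0  # viewing distance from the text
width_cm = size_from_angle_cm(6.7, DISTANCE_CM)
height_cm = size_from_angle_cm(8.5, DISTANCE_CM)

# Implied on-screen size and aspect ratio, compared with the 300 x 380 pixel frame.
print(f"width = {width_cm:.1f} cm, height = {height_cm:.1f} cm")
print(f"aspect ratio = {width_cm / height_cm:.3f} (pixels: {300 / 380:.3f})")
```

The implied frame is roughly 10.5 cm × 13.4 cm, and its aspect ratio closely matches that of the 300 × 380 pixel frame, so the reported angles and pixel dimensions are mutually consistent.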

Five participants (all female, mean age = 22.2 years ± 2.8 *SD*), who did not take part in the main experiment, rated all face stimuli in upright orientation for emotional expressions on a 7-point scale (ranging from 1 = very angry to 7 = very happy; 4 = neutral). An item analysis showed that both Caucasian and Asian faces were rated as showing neutral expressions (Caucasian faces: *M* = 3.93 ± 0.71 *SD*; Asian faces: *M* = 4.29 ± 0.60 *SD*). At the same time, Asian faces were rated as happier than Caucasian faces [*t*(98) = 2.74, *p* = 0.007, Cohen's *d* = 0.55].
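The effect size and test statistic of this item analysis follow from the reported summary statistics. The sketch below, in Python, assumes an independent-samples comparison across items (50 faces per ethnicity, hence 98 degrees of freedom) and Cohen's *d* based on the pooled standard deviation; the function names are illustrative.

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d for two independent groups, using the pooled SD."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

def t_from_d(d, n_a, n_b):
    """Independent-samples t statistic implied by d and the group sizes."""
    return d * math.sqrt(n_a * n_b / (n_a + n_b))

# Item-level summary statistics from the text (50 faces per ethnicity).
d = cohens_d(4.29, 0.60, 50, 3.93, 0.71, 50)
t = t_from_d(d, 50, 50)
print(f"d = {d:.2f}, t(98) = {t:.2f}")
```

With the rounded means and standard deviations from the text, this gives *d* = 0.55 and *t*(98) = 2.74, matching the reported values.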

#### **PROCEDURE**

Participants were seated in a dimly lit, electrically shielded and noise-attenuated cabin (400-A-CT-Special, Industrial Acoustics, Niederkrüchten, Germany) with their heads in a chin rest. The experiment consisted of two practice blocks (one for each of the two tasks) using additional stimuli and eight experimental blocks. Each trial started with the presentation of a fixation cross that randomly varied in duration between 1000 and 1500 ms, followed by the presentation of a face or house stimulus for 1000 ms. In different experimental blocks, participants were asked to indicate the orientation (upright, inverted) or the ethnic/cultural background of each stimulus (Asian or European). Participants were asked to respond via key presses using their left and right index fingers as quickly as possible without compromising accuracy. The assignment of keys to response categories was counterbalanced across participants.

The experimental design varied the factors stimulus type (face vs. house), ethnicity (Asian vs. European), orientation (upright vs. inverted), and task (orientation task, ethnicity task) within-subjects. The task changed after each block, and task order was balanced across participants (ABABABAB vs. BABABABA). Each block contained 100 trials with either 12 or 13 trials per condition, which were presented randomly intermixed. Each individual image was presented twice in the course of the experiment, once in the ethnicity and once in the orientation task.
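The trial counts implied by this design can be checked by enumerating the cells of the fully crossed design. The following Python sketch is only an illustration of that arithmetic; the variable names are hypothetical, and the block and trial numbers are those given in the Procedure section.

```python
from itertools import product

factors = {
    "task": ("orientation", "ethnicity"),
    "stimulus_type": ("face", "house"),
    "ethnicity": ("Asian", "European"),
    "orientation": ("upright", "inverted"),
}

# Every cell of the fully crossed within-subjects design.
conditions = list(product(*factors.values()))

N_BLOCKS = 8            # experimental blocks, alternating between the two tasks
TRIALS_PER_BLOCK = 100

blocks_per_task = N_BLOCKS // len(factors["task"])
# The task is fixed within a block, so a block contains half of all cells.
cells_per_block = len(conditions) // len(factors["task"])
trials_per_cell_per_block = TRIALS_PER_BLOCK / cells_per_block
trials_per_condition = N_BLOCKS * TRIALS_PER_BLOCK // len(conditions)

print(len(conditions), trials_per_cell_per_block, trials_per_condition)
```

The 16 conditions receive on average 12.5 trials per block, which is why individual blocks contain either 12 or 13 trials per condition, and each condition comprises 50 trials over the whole experiment.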

After the main experiment, all participants completed a questionnaire (see Wiese, 2012) asking them to indicate the amount of contact they have with Asian and European people (in h/week), the number of contact persons (per week) from both ethnic groups, and the intensity of contact (0 = no contact, 1 = very superficial to 4 = very intense) with Asian and European people in daily-life situations (such as job/university, meeting friends/spare time activities, family, domestic circumstances). Total scores were calculated for each participant by summing up (h/week, number of persons/week) or averaging (contact quality) self-report measures from the different situations separately for Asian and European contacts.

#### **EEG RECORDING AND ANALYSIS**

EEG was recorded using a 64-channel BioSemi Active II system (BioSemi, Amsterdam, Netherlands). Active sintered Ag/AgCl electrodes were mounted in an elastic cap, and EEG was recorded continuously with a 512 Hz sampling rate from DC to 155 Hz. Note that BioSemi systems work with a "zero-ref" setup with ground and reference electrodes replaced by a so-called CMS/DRL circuit (see http://www.biosemi.com/faq/cms&drl.htm for further information).

Blink artifacts were corrected using the algorithm implemented in BESA 5.3 (MEGIS Software GmbH, Graefelfing, Germany). EEG was segmented relative to stimulus onset from −200 to 1000 ms, with a 200 ms baseline. Trials contaminated by non-ocular artifacts and saccades were rejected using the BESA 5.3 tool, with an amplitude threshold of 100 μV and a gradient criterion of 75 μV. Remaining trials were re-calculated to average reference, averaged according to experimental condition and digitally low-pass filtered at 40 Hz (12 dB/oct, zero phase shift).
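To illustrate the logic of three of these steps (baseline correction, threshold-based trial rejection, and re-referencing to the average reference), a minimal Python sketch is given below. It is not the BESA implementation: it assumes that the amplitude criterion refers to the absolute voltage and the gradient criterion to the absolute difference between successive samples, which may differ from BESA's definitions, and all function names and data are hypothetical. Blink correction, condition averaging, and low-pass filtering are omitted.

```python
def baseline_correct(channel, n_baseline):
    """Subtract the mean of the first n_baseline samples (pre-stimulus interval)."""
    baseline = sum(channel[:n_baseline]) / n_baseline
    return [v - baseline for v in channel]

def is_artifact(epoch, amp_uv=100.0, grad_uv=75.0):
    """Flag an epoch if any channel exceeds the amplitude or gradient limit.

    Assumption: amplitude = absolute voltage, gradient = absolute difference
    between successive samples. BESA's own definitions may differ.
    """
    for channel in epoch:
        if max(abs(v) for v in channel) > amp_uv:
            return True
        if max(abs(b - a) for a, b in zip(channel, channel[1:])) > grad_uv:
            return True
    return False

def average_reference(epoch):
    """Re-reference by subtracting the mean across channels at every sample."""
    n_channels = len(epoch)
    means = [sum(samples) / n_channels for samples in zip(*epoch)]
    return [[v - m for v, m in zip(channel, means)] for channel in epoch]

# Toy epochs in microvolts: two channels, five samples each (hypothetical data).
clean = [[1.0, 2.0, 3.0, 2.0, 1.0], [-1.0, 0.0, 1.0, 0.0, -1.0]]
noisy = [[1.0, 2.0, 150.0, 2.0, 1.0], [-1.0, 0.0, 1.0, 0.0, -1.0]]

kept = [e for e in (clean, noisy) if not is_artifact(e)]
rereferenced = [average_reference(e) for e in kept]
```

In this toy example only the first epoch survives rejection, and after re-referencing the channel values sum to zero at every sample, which is the defining property of the average reference.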

Latency of early ERP components (P1, N170) was analyzed at the electrodes of their respective maximum and the respective contralateral homologue (O1/O2 for P1; P9/P10 for N170). P1 peak amplitude was measured at electrodes O1 and O2 in a time window from 90 to 140 ms, N170 peak amplitude was measured at P7/P8, PO7/PO8, P9/P10, and PO9/PO10 in a time window from 140 to 210 ms. For P2, mean amplitudes were analyzed at electrodes P7/P8, PO7/PO8, P9/P10, and PO9/PO10 from 210 to 300 ms. Separate repeated-measures analyses of variance (ANOVAs) were calculated for each component. When appropriate, degrees of freedom were corrected according to the Huynh-Feldt procedure.
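The peak and mean amplitude measures described above can be sketched as follows in Python. The time windows are those given in the text; the waveform, its 10 ms sampling grid, and the function names are hypothetical and serve only to show how a positive peak (P1), a negative peak (N170), and a mean amplitude (P2) would be extracted from an averaged ERP at one electrode.

```python
def window_indices(times_ms, start_ms, end_ms):
    """Indices of samples whose time stamps fall inside [start_ms, end_ms]."""
    return [i for i, t in enumerate(times_ms) if start_ms <= t <= end_ms]

def peak(times_ms, erp_uv, start_ms, end_ms, polarity):
    """Peak amplitude and latency in a window; polarity is +1 (P1) or -1 (N170)."""
    idx = window_indices(times_ms, start_ms, end_ms)
    best = max(idx, key=lambda i: polarity * erp_uv[i])
    return erp_uv[best], times_ms[best]

def mean_amplitude(times_ms, erp_uv, start_ms, end_ms):
    """Mean amplitude in a window (used here for P2)."""
    idx = window_indices(times_ms, start_ms, end_ms)
    return sum(erp_uv[i] for i in idx) / len(idx)

# Hypothetical averaged waveform sampled every 10 ms from 0 to 300 ms.
times = list(range(0, 301, 10))
erp = [0.0] * len(times)
erp[times.index(110)] = 4.0     # toy P1
erp[times.index(170)] = -6.0    # toy N170
for t in range(210, 301, 10):   # toy P2 plateau
    erp[times.index(t)] = 3.0

p1_amp, p1_lat = peak(times, erp, 90, 140, +1)
n170_amp, n170_lat = peak(times, erp, 140, 210, -1)
p2_mean = mean_amplitude(times, erp, 210, 300)
```

Because the N170 is a negative deflection, a "larger" N170 corresponds to a more negative peak value in such an analysis.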

#### **RESULTS**

#### **CONTACT QUESTIONNAIRE**

Participants reported substantially more contact with European relative to Asian people [contact time: *M*<sub>European</sub> = 52.4 h/week ± 27.1 *SD*, *M*<sub>Asian</sub> = 3.0 h/week ± 7.5 *SD*, *t*(19) = 7.76, *p* < 0.001, Cohen's *d* = 2.638; number of contact persons: *M*<sub>European</sub> = 31.5 ± 22.9 *SD*, *M*<sub>Asian</sub> = 1.0 ± 1.6 *SD*, *t*(19) = 6.20, *p* < 0.001, Cohen's *d* = 1.879]. Moreover, participants indicated more intense contact with European relative to Asian people [*M*<sub>European</sub> = 2.6 ± 1.3 *SD*, *M*<sub>Asian</sub> = 0.7 ± 1.3 *SD*, *t*(19) = 4.42, *p* < 0.001, Cohen's *d* = 1.385].

#### **PERFORMANCE**

A repeated-measures ANOVA on mean reaction times for correct responses (see **Figure 1A**) with the within-subject factors task (orientation vs. ethnicity task), stimulus type (face vs. house), ethnicity (Asian vs. European), and orientation (upright vs. inverted) revealed significant main effects of task [*F*(1, 19) = 114.98, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.858], stimulus type [*F*(1, 19) = 15.10, *p* = 0.001, η<sup>2</sup><sub>p</sub> = 0.443], ethnicity [*F*(1, 19) = 28.73, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.602], and orientation [*F*(1, 19) = 5.15, *p* = 0.035, η<sup>2</sup><sub>p</sub> = 0.213]. These main effects were qualified by significant interactions of task × ethnicity [*F*(1, 19) = 44.58, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.701], reflecting faster responses for Asian stimuli in the ethnicity but not in the orientation task, task × orientation [*F*(1, 19) = 5.58, *p* = 0.029, η<sup>2</sup><sub>p</sub> = 0.227], with faster responses for upright stimuli in the ethnicity but not in the orientation task, and stimulus type × orientation [*F*(1, 19) = 16.57, *p* = 0.001, η<sup>2</sup><sub>p</sub> = 0.466], with faster responses to upright relative to inverted faces but not houses.
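The partial eta squared values reported throughout the Results can be recovered from the corresponding *F* ratios and degrees of freedom via η<sup>2</sup><sub>p</sub> = (*F* · df<sub>effect</sub>) / (*F* · df<sub>effect</sub> + df<sub>error</sub>). The short Python sketch below applies this identity to the four main effects on reaction times; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effects on reaction times reported in the text, all with F(1, 19).
reported = {
    "task": (114.98, 0.858),
    "stimulus type": (15.10, 0.443),
    "ethnicity": (28.73, 0.602),
    "orientation": (5.15, 0.213),
}

for effect, (f_value, eta_reported) in reported.items():
    eta = partial_eta_squared(f_value, 1, 19)
    print(f"{effect}: computed {eta:.3f}, reported {eta_reported:.3f}")
```

For these four effects the computed and reported values agree to three decimal places, which makes the identity a convenient consistency check for the remaining statistics.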

A corresponding analysis on accuracies (see **Figure 1B**) revealed significant main effects of task [*F*(1, 19) = 11.53, *p* = 0.003, η<sup>2</sup><sub>p</sub> = 0.378], ethnicity [*F*(1, 19) = 11.36, *p* = 0.003, η<sup>2</sup><sub>p</sub> = 0.374], and orientation [*F*(1, 19) = 5.94, *p* = 0.025, η<sup>2</sup><sub>p</sub> = 0.238], as well as significant two-way interactions of task × ethnicity [*F*(1, 19) = 15.10, *p* = 0.001, η<sup>2</sup><sub>p</sub> = 0.443], with more accurate responses for Asian than European stimuli in the ethnicity but not in the orientation task, and task × orientation [*F*(1, 19) = 13.84, *p* = 0.001, η<sup>2</sup><sub>p</sub> = 0.422], with more accurate responses for upright relative to inverted stimuli in the ethnicity but not in the orientation task. Finally, an interaction of stimulus type × orientation [*F*(1, 19) = 15.42, *p* = 0.001, η<sup>2</sup><sub>p</sub> = 0.448] was further qualified by a three-way interaction of ethnicity × stimulus type × orientation [*F*(1, 19) = 10.30, *p* = 0.005, η<sup>2</sup><sub>p</sub> = 0.352]. Follow-up analyses revealed significant inversion effects, with more accurate responses for upright than inverted stimuli for European faces [*F*(1, 19) = 6.79, *p* = 0.017, η<sup>2</sup><sub>p</sub> = 0.263], Asian faces [*F*(1, 19) = 6.99, *p* = 0.016, η<sup>2</sup><sub>p</sub> = 0.269], and Asian houses [*F*(1, 19) = 5.26, *p* = 0.033, η<sup>2</sup><sub>p</sub> = 0.217], but more accurate responses for inverted relative to upright European houses [*F*(1, 19) = 6.14, *p* = 0.023, η<sup>2</sup><sub>p</sub> = 0.244].

#### **EVENT-RELATED POTENTIALS**

ERPs are depicted in **Figures 2** and **3**. The number of trials per condition in an individual participant included in the statistical analyses ranged from 27 to 50 (*M* = 44 ± 5 *SD*). In the following paragraphs, main effects of hemisphere or site, as well as interactions containing only those factors, are not reported. In addition, main effects and interactions qualified by higher-order interactions are not described in the text. A complete list of all significant effects and all statistical indices for the omnibus tests can be found in **Table 2**.

P1. A repeated-measures ANOVA on P1 peak amplitude with the factors hemisphere (left vs. right), task, stimulus type, ethnicity and orientation revealed a significant interaction of stimulus type × orientation, reflecting similar amplitudes for inverted relative to upright houses, but larger amplitudes for inverted relative to upright faces. Additionally, an interaction of hemisphere × ethnicity × stimulus type reflected larger amplitudes for European relative to Asian houses but not faces, an effect which was slightly larger at electrode O2.

Analysis of P1 latencies at O1 and O2 revealed significant main effects of stimulus type, reflecting earlier P1 peaks for houses than faces, ethnicity, with slightly earlier peaks for European stimuli, and orientation, with slightly earlier peaks for upright than inverted stimuli.

N170. Analysis of N170 peak amplitude yielded a significant interaction of task × stimulus type, reflecting larger amplitudes for houses in the orientation than ethnicity task. Moreover, an interaction of site × ethnicity × stimulus type × orientation was observed. Follow-up tests for face stimuli revealed significant interactions of ethnicity × orientation, reflecting larger inversion effects for European relative to Asian faces, at electrodes PO9/PO10 [*F*(1, 19) = 11.87, *p* = 0.003, η<sup>2</sup><sub>p</sub> = 0.385], P9/P10 [*F*(1, 19) = 10.50, *p* = 0.004, η<sup>2</sup><sub>p</sub> = 0.356], P7/P8 [*F*(1, 19) = 5.89, *p* = 0.025, η<sup>2</sup><sub>p</sub> = 0.237], and PO7/PO8 [*F*(1, 19) = 6.41, *p* = 0.020, η<sup>2</sup><sub>p</sub> = 0.252; see **Figure 4**]. For houses a significant interaction of ethnicity × orientation was detected at PO7/PO8 [*F*(1, 19) = 10.76, *p* = 0.004, η<sup>2</sup><sub>p</sub> = 0.362], with larger inversion effects for European relative to Asian houses, but not at any of the other electrode sites (0.16 < *F*s < 4.06, 0.058 < *p*s < 0.898). The interactions of task × stimulus type × ethnicity [*F*(1, 19) = 0.56, *p* = 0.464, η<sup>2</sup><sub>p</sub> = 0.029] and task × stimulus type × ethnicity × orientation [*F*(1, 19) = 0.34, *p* = 0.569, η<sup>2</sup><sub>p</sub> = 0.017] were not significant in the omnibus ANOVA.

As the substantial between-category effects in N170 (i.e., houses vs. faces) may have obscured more subtle within-category effects (i.e., Asian faces vs. Caucasian faces) in the analyses described above, an additional ANOVA was carried out, in which only faces were used (see Wiese et al., 2009 for a similar approach). This analysis revealed significant main effects of site [*F*(3, 57) = 15.89, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.455] and orientation [*F*(1, 19) = 30.19, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.614], as well as significant two-way interactions of site × orientation [*F*(3, 57) = 2.60, *p* = 0.043, η<sup>2</sup><sub>p</sub> = 0.151], and ethnicity × orientation [*F*(1, 19) = 10.35, *p* = 0.005, η<sup>2</sup><sub>p</sub> = 0.353], reflecting larger inversion effects for Caucasian relative to Asian faces. Importantly, the five-way interaction of hemisphere × site × task × ethnicity × orientation was significant [*F*(3, 57) = 2.91, *p* = 0.042, η<sup>2</sup><sub>p</sub> = 0.133]. *Post-hoc t*-tests were calculated to see whether N170 amplitude differed between Asian and Caucasian faces, but I restricted these analyses to those electrode sites that had shown ethnicity effects in previous studies (P9/P10, PO9/PO10; see e.g., Wiese et al., 2013). For upright faces, N170 in the ethnicity task was larger for Asian relative to Caucasian faces at both P9 [*t*(19) = 2.27, *p* = 0.035, Cohen's *d* = 0.274]<sup>4</sup> and PO10 [*t*(19) = 2.28, *p* = 0.034, Cohen's *d* = 0.180], but not at PO9 [*t*(19) = 0.98, *p* = 0.339, Cohen's *d* = 0.108] or P10 [*t*(19) = 1.34, *p* = 0.197, Cohen's *d* = 0.119]. In the orientation task, no significant differences between upright Asian and Caucasian faces were detected (0.69 < *t*s < 1.94, 0.067 < *p*s < 0.499). For inverted faces, no significant differences were observed in the ethnicity categorization task (0.17 < *t*s < 1.32, 0.203 < *p*s < 0.864), whereas in the orientation task N170 was larger for Caucasian faces at PO10 [*t*(19) = 3.07, *p* = 0.006, Cohen's *d* = 0.226], but not at P9 [*t*(19) = 1.35, *p* = 0.193, Cohen's *d* = 0.134], PO9 [*t*(19) = 1.99, *p* = 0.061, Cohen's *d* = 0.208], or P10 [*t*(19) = 1.39, *p* = 0.179, Cohen's *d* = 0.155].

Thus, whereas ethnicity effects in N170 for upright faces were only evident in the ethnicity task (with larger amplitudes for other-race faces), ethnicity effects for inverted faces were only observed in the orientation task (with larger amplitudes for own-race faces). Finally, two ANOVAs were conducted to confirm that larger inversion effects for own-race relative to other-race faces occurred in both tasks. The critical interaction of ethnicity × orientation was significant both in the orientation [*F*(1, 19) = 7.73, *p* = 0.012, η<sup>2</sup><sub>p</sub> = 0.289] and in the ethnicity categorization task [*F*(1, 19) = 5.30, *p* = 0.033, η<sup>2</sup><sub>p</sub> = 0.218].
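The ethnicity × orientation interaction tested here amounts to comparing the inversion effect (inverted minus upright) between own- and other-race faces. For a 2 × 2 within-subjects design, the interaction *F*(1, *n* − 1) equals the squared one-sample *t* statistic on the per-participant difference of differences. The Python sketch below illustrates this with hypothetical amplitudes for four participants; it is not the original analysis, which additionally included hemisphere and site as factors.

```python
import math
from statistics import mean, stdev

def interaction_f(own_upright, own_inverted, other_upright, other_inverted):
    """F(1, n - 1) for a 2 x 2 within-subjects interaction.

    The interaction contrast per participant is the difference between the
    two inversion effects; F equals the squared one-sample t on that contrast.
    """
    contrast = [
        (oi - ou) - (ti - tu)
        for ou, oi, tu, ti in zip(own_upright, own_inverted, other_upright, other_inverted)
    ]
    n = len(contrast)
    t = mean(contrast) / (stdev(contrast) / math.sqrt(n))
    return t ** 2, n - 1

# Hypothetical N170 peak amplitudes (microvolts) for four participants.
own_up = [-5.0, -6.0, -4.0, -5.0]
own_inv = [-8.0, -9.0, -6.0, -8.0]
other_up = [-6.0, -6.5, -5.0, -5.5]
other_inv = [-7.0, -8.0, -6.0, -7.0]

f_value, df_error = interaction_f(own_up, own_inv, other_up, other_inv)
```

In the toy data the own-race inversion effect is more negative than the other-race inversion effect in every participant, mirroring the direction of the reported effect (a larger N170 increase with inversion for own-race faces).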

Analysis of N170 latency at P9/P10 revealed significant main effects of stimulus type, with delayed peaks for houses relative to faces, and orientation, reflecting delayed peaks for inverted relative to upright stimuli (see **Table 2**). Moreover, an interaction of ethnicity × stimulus type was observed, with delayed N170 responses for Asian relative to European faces [*F*(1, 19) = 34.28, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.643], but not houses [*F*(1, 19) = 2.60, *p* = 0.123, η<sup>2</sup><sub>p</sub> = 0.120].

<sup>4</sup>Please note that the *t*-tests in this section are not corrected for multiple comparisons, as they were calculated to clarify the significant interaction in the ANOVA.

P2. A repeated-measures ANOVA on P2 amplitude yielded significant interactions of ethnicity × stimulus type, with larger amplitudes for European relative to Asian faces but not houses, and site × task × ethnicity, reflecting particularly pronounced ethnicity effects in the categorization task at P9/P10. A further interaction of task × stimulus type × orientation was indicative of a larger inversion effect for houses in the orientation compared to the ethnicity task, whereas faces elicited similar inversion effects in the two tasks. Finally, a significant interaction of hemisphere × task × ethnicity × stimulus type was detected. *Post-hoc* analyses for face stimuli revealed significantly larger ethnicity effects over the left hemisphere in the ethnicity relative to the orientation task [interaction of ethnicity × task: *F*(1, 19) = 5.59, *p* = 0.029, η<sup>2</sup><sub>p</sub> = 0.227], whereas ethnicity effects over the right hemisphere did not interact with task (*F* < 1). At the same time, European faces elicited significantly more positive amplitudes than Asian faces in both tasks and over both the left and right hemisphere, which was reflected in significant main effects of ethnicity in all possible combinations of these factors [all *F*(1, 19) > 27, all *p* < 0.001, all η<sup>2</sup><sub>p</sub> > 0.592]. *Post-hoc* tests for house stimuli revealed no significant ethnicity effects, neither in the ethnicity nor in the orientation task (all *p* > 0.1).

#### **DISCUSSION**

The present study tested the effect of task demands on the neural processing of own- (i.e., European) and other-race (i.e., Asian) faces and non-facial control stimuli (i.e., houses). Concerning the predictions outlined in the introduction, the following main results can be summarized: First, task demands affected the N170 ethnicity effect for upright faces. More specifically, the ERP analysis revealed a larger N170 for upright other- relative to own-race faces in the categorization task, but no such effect in the orientation task. Second, a larger N170 FIE was observed for own- relative to other-race faces. This interaction of face ethnicity and orientation generalized across tasks. Finally, the P2 ethnicity effect observed over the left hemisphere was reduced in the orientation relative to the ethnicity task, suggesting that a high saliency of ethnicity information amplifies this effect, but is not an essential prerequisite for its emergence. These ERP results and additional behavioral findings are discussed in the following paragraphs.

Most importantly, the analysis of ERP data revealed larger N170 amplitudes for upright other- relative to own-race faces in the ethnicity categorization task but not in the orientation task. This finding is in line with the literature reviewed in the introduction (see **Table 1**), in which the majority of studies using superficial tasks found no N170 ethnicity effect, whereas the majority of studies using categorization or identity tasks found larger N170 amplitudes for other-race faces. Similarly, a recent study by Senholzi and Ito (2013) observed no N170 ethnicity effect in a superficial task and larger amplitudes for other-race faces in an identity task. The present experiment adds to this previous finding by showing that even tasks that do not explicitly require the processing of identity information result in N170 ethnicity effects. It may therefore be seen as a stronger test of the idea that the N170 ethnicity effect is not elicited by particularly superficial tasks. At variance with the present findings, however, Senholzi and Ito (2013) reported larger amplitudes for *own*-race than other-race faces in a categorization task. This pattern has not been reported previously, with the exception of one study (Ito and Urland, 2005), which used an average mastoid reference potentially obscuring any experimental effects at the nearby T5/T6 electrodes where N170 was measured. While the exact reason for this discrepancy remains unclear, the present results are in line with the majority of studies using categorization tasks.

It should be noted that the larger N170 for upright other- relative to own-group faces seems quite specific to face ethnicity, and does not reflect a more general mechanism differentiating between any social in-group vs. out-group faces. For instance, while the N170 ethnicity effect occurs in both Asian and European participants (Wiese et al., 2013), a similar interaction of stimulus category by participant group with larger amplitudes for other-group faces is not observed for own- vs. other-age or own- vs. other-gender faces (see e.g., Wiese et al., 2008, 2012b; Melinder et al., 2010; Wolff et al., 2013). In sum, the present findings support the idea that ethnicity effects in N170 amplitude critically depend on task demands.

Interestingly, although N170 was larger for inverted own-race faces in the orientation task only, the N170 FIE was larger for own-race faces in both tasks. The N170 inversion effect has been suggested to reflect perceptual expertise for a given class of stimuli (Rossion et al., 2002), and larger effects for own-race faces are therefore well in line with expertise accounts of the own-race bias (e.g., Tanaka et al., 2004; Rossion and Michel, 2011). The present finding is also in line with those previous studies that observed a larger N170 FIE for own-race faces (Vizioli et al., 2010a; Caharel et al., 2011), but not with others that did not (Wiese et al., 2009; Chen et al., 2013). While the reason for this discrepancy remains somewhat unclear, it may be related to the fact that Chen et al. (2013) tested exclusively Chinese participants. Recent evidence suggests that Asian participants show a similar degree of holistic processing for own- and other-race faces (Crookes et al., 2013), possibly reflecting a larger degree of variability tolerated by the face processing system of this participant group. Moreover, it has been suggested that the presence of non-face stimuli in our previous study (Wiese et al., 2009) may have affected the results, as those previous studies showing a larger own-race FIE did not use additional non-face stimuli (Caharel et al., 2011). This suggestion is not supported by the present experiment, which demonstrated a corresponding result even though house stimuli were randomly intermixed in all conditions.

The typical finding that N170 amplitudes are larger for inverted relative to upright faces has been explained by Itier et al. (2007) and Itier and Batty (2009) by suggesting that N170 to upright faces reflected the activation of face-sensitive neurons, whereas N170 to inverted faces was elicited by the combined activity of both face- and eye-sensitive neurons (see also Kloth et al., 2013). When upright faces are presented, activity of eye cells is inhibited. This framework offers a possible explanation of the present results (see **Table 3**), as the larger N170 for upright other-race relative to upright own-race faces may reflect less efficient inhibition of eye cells for other-race faces. At the same time, larger N170 amplitudes for inverted own-race faces may result from the fact that the eye region for faces from different ethnic groups substantially differs, and that eyes of other-race faces may not be able to elicit eye cell activity to the same extent as own-race faces. Importantly, the efficiency of inhibition may to some degree depend on top-down modulations such as task demands. A task that requires more in-depth face processing (such as a categorization task) may lead to a sharpening of neural processing to face cells, and thus to strong inhibition of eye cells. This sharpening, however, may be possible only for own-race faces as these are more prototypical. In the orientation task, neural processing may not be tuned as specifically toward face cells, resulting in an incomplete (and similar) inhibition of eye cells for upright own- and other-race faces. Moreover, for inverted faces eye cell activity may reach the respective maximum possible level for both own- and other-race faces, resulting in larger N170 amplitudes for inverted own-race faces. Importantly, this interpretation suggests that eye rather than face cells are responsible for ethnicity effects in N170. Please note, however, that this interpretation of the present results is speculative and needs further testing in future studies.

Of note, inversion effects in the N170 time range were also observed for house stimuli. As can be seen in **Figure 4**, these effects did not occur at sites where the N170 FIE was at maximum (P10/PO10) but had a clearly more dorsal scalp distribution, with largest effects occurring at PO7/PO8. This finding is in line with previous reports of inversion effects for houses at similar scalp positions (Eimer, 2000; Itier et al., 2006) and at lateral occipito-temporal intracranial electrodes (Rosburg et al., 2010). These findings may suggest that different neural populations elicit the house and face inversion effects. If so, these populations seem to respond differently under varying processing demands, as house inversion effects were larger in the orientation task, while FIEs were larger in the ethnicity task.

*Black crosses indicate actual activity, gray crosses indicate maximum possible activity.*

House inversion effects in the N170 time range were larger for European ("own-race") relative to Asian ("other-race") houses. Importantly, the larger FIE for own-race faces was detected at all tested electrode sites, whereas the enhanced effect for European houses was restricted to the more dorsal electrodes PO7/PO8. Generally in line with this finding, previous studies detected larger N170 amplitudes for objects of particular expertise relative to control stimuli at similar scalp sites (Tanaka and Curran, 2001). The present results suggest that larger inversion effects to own-culture stimuli are not completely face-selective. Instead, the more dorsal portion may reflect overall enhanced familiarity with own-culture stimuli, whereas the more ventral part may more selectively represent the fine-tuning of facial expertise. In sum, N170 inversion effects were observed to be larger for own- relative to other-race faces, which was independent of task and cannot be fully explained by generally enhanced familiarity with European stimuli. Instead, the larger N170 FIE for own-race faces appears to be at least partly related to the fine-tuning of processes selective for face stimuli.

Subsequent to N170, larger P2 amplitudes were observed for own- relative to other-race faces. This P2 ethnicity effect was modulated by task, with larger effects in the categorization relative to the orientation task, which is reminiscent of a previous study from our group (Stahl et al., 2010). In this previous experiment, we observed clearly bilateral P2 effects in an ethnicity categorization task, but only a small and right-lateralized effect in an individualization task, in which participants had to rate each of the faces for attractiveness. The finding in the present study, in which a reduced P2 effect in a superficial orientation task relative to a categorization task was observed, suggests that it is not the amount of facial detail necessary for a given task that affects the magnitude of the P2 ethnicity effect. Instead, it appears that the explicit processing of ethnicity information boosts the P2 effect over the left hemisphere. The right-hemispheric effect seems less affected by task demands, but is reduced by long-term expertise with other-race faces (Stahl et al., 2008). In contrast to N170, however, P2 effects were not correlated with the own-race bias in memory in a recent study (Wiese et al., 2013), and the role of P2 during the processing of own- and other-race faces therefore remains somewhat unclear.

In a recent study, Balas and Nelson (2010) presented own- and other-race faces with either consistent shape and pigmentation information (own-race shape + pigmentation, other-race shape + pigmentation) or inconsistent information (own-race shape/other-race pigmentation, other-race shape/own-race pigmentation). In a time window similar to the P2 in the present study (230–300 ms) the authors observed larger amplitudes for own- relative to other-race shape information. Interestingly, pigmentation information was observed to have the opposite effect, with more positive amplitudes for other-race pigmentation. These findings suggest that the P2 effects observed in the present and previous studies from our group (Stahl et al., 2008, 2010; Wiese et al., 2013) largely reflect differences in shape. It is noteworthy in this context that we used Asian and Caucasian own- and other-race faces, whereas Balas and Nelson (2010) used Caucasian and African-American faces. It is thus likely that pigmentation differences between own- and other-race faces were more pronounced in the latter study and relatively less perceptually salient in the present and our previous experiments.

It should be noted that each individual stimulus was presented twice in the present experiment, and it is thus possible that participants recognized some of the previously presented faces, even though the processing of individual identity was never task-relevant. One might suggest that the recognition of repeated faces may have been stronger in the categorization task, in which more detailed facial information was processed. Consequently, larger P2 ethnicity effects in the categorization relative to the orientation task may not have been related to ethnicity categorization *per se* but to more pronounced recognition of repeated faces. Although I cannot definitely exclude this possibility on the basis of the present data, it does not appear parsimonious when results of previous experiments are taken into account. If larger ethnicity effects in P2 were related to face recognition, this would suggest that in our previous study (Stahl et al., 2010) identity processing was stronger in an ethnicity categorization compared to an attractiveness rating task. This latter task, however, presumably required stronger processing of individual face information than the ethnicity task. In sum, while the suggestion that recognition of repeated faces enhanced the P2 effect in the present study appears less parsimonious than the alternative interpretation of stronger effects in case of ethnicity categorizations, further studies that avoid face repetition would be needed to definitely decide this question.

In addition to these ERP findings, two aspects of the behavioral results appear noteworthy. First, participants were faster to make ethnicity decisions for other-race than own-race stimuli. Similar findings have been reported by a number of previous studies, and have been interpreted to reflect the fast detection of an out-group defining feature in other-race faces (Levin, 1996, 2000). According to socio-cognitive theories of the own-race bias, this categorization advantage resulted in increased attention to general category compared to individuating information in other-race faces, which in turn led to less accurate memory (Hugenberg et al., 2010). The present results demonstrate that such a categorization advantage is not restricted to faces, but can also be observed for houses from a different culture. This general categorization advantage for "other-race" stimuli suggests that it is not face-selective, but may reflect an effect of overall familiarity extending to various stimulus classes.

Second, while no inversion effect was observed in the RT data of the orientation task, inversion slowed down participants' responses in the ethnicity task. Considering that the FIE is typically interpreted to reflect the disturbance of configural or holistic face processing (Rossion, 2008), this finding indicates that the categorization of facial ethnicity is not solely based on feature processing (for a recent demonstration of inversion effects in other categorization tasks, see Wiese et al., 2012a). This is at some variance with socio-cognitive accounts suggesting that the detection of race-specifying features drives ethnicity categorizations (Levin, 2000).

Finally, a potential limitation of the present study may be seen in the finding that Asian faces showed slightly but significantly happier expressions than Caucasian faces. Accordingly, this difference in expression may in principle have affected the present results, and may have led for instance to an increased N170 for happy rather than other-race faces. From my perspective, this assumption is not particularly likely for the following reasons: First, as noted above, a recent study from our group that used a similar face set found that effects of face ethnicity in the N170 and P2 interacted with participant ethnicity, and that both Caucasian and Asian participants demonstrated larger N170 amplitudes for the respective other-race category (Wiese et al., 2013). It is hard to see how this finding could be explained in terms of happier expressions in Asian faces. Second, a previous study did not detect differences in N170 amplitude for upright happy vs. neutral faces (Ashley et al., 2004). It should be further noted that this previous experiment used clearly emotional faces whereas in the present study, although a significant difference was detected, both Asian and Caucasian faces were rated as neutral on average.

In conclusion, the present results support the idea that differential processing of own- vs. other-race faces at early perceptual processing stages is modulated by task demands. More specifically, the necessity to process faces at a categorical or individual level appears to result in a larger N170 for other-race than own-race faces, while the processing of more superficial stimulus properties does not. At the same time, the N170 FIE is substantially larger for own-race than other-race faces in both tasks. This latter finding can only partly be explained by larger overall familiarity with more commonly seen stimuli and may thus reflect the fine-tuning of early perceptual processing stages to faces of maximum expertise.

#### **ACKNOWLEDGMENTS**

This work was supported by a grant of the Deutsche Forschungsgemeinschaft (DFG; Wi 3219/5-2). I am grateful to Carolin S. Altmann for help during stimulus preparation and for programming the emotion ratings, and to Kathrin Rauscher for support during data collection. I am furthermore indebted to Nadine Kloth for commenting on an earlier version of the manuscript.

#### **REFERENCES**


Calder, G. Rhodes, M. H. Johnson, and J. V. Haxby (Oxford: Oxford University Press), 215–243.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 July 2013; accepted: 09 December 2013; published online: 24 December 2013.*

*Citation: Wiese H (2013) Do neural correlates of face expertise vary with task demands? Event-related potential correlates of own- and other-race face inversion. Front. Hum. Neurosci. 7:898. doi: 10.3389/fnhum.2013.00898*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Wiese. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Stimulus familiarity modulates functional connectivity of the perirhinal cortex and anterior hippocampus during visual discrimination of faces and objects

#### *Victoria C. McLelland<sup>1</sup>\*, David Chan<sup>1</sup>, Susanne Ferber<sup>1,2</sup> and Morgan D. Barense<sup>1,2</sup>*

*<sup>1</sup> Department of Psychology, University of Toronto, Toronto, ON, Canada*

*<sup>2</sup> Rotman Research Institute, Baycrest, Toronto, ON, Canada*

#### *Edited by:*

*Wolfgang Grodd, University Hospital Aachen, Germany*

#### *Reviewed by:*

*Ute Habel, RWTH Aachen University, Germany*

*Birgit Derntl, RWTH Aachen University, Germany*

#### *\*Correspondence:*

*Victoria C. McLelland, Department of Psychology, University of Toronto, 100 St. George Street, Toronto, ON M5S 3G3, Canada e-mail: victoria.mclelland@utoronto.ca*

Recent research suggests that the medial temporal lobe (MTL) is involved in perception as well as in declarative memory. Amnesic patients with focal MTL lesions and semantic dementia patients showed perceptual deficits when discriminating faces and objects. Interestingly, these two patient groups showed different profiles of impairment for familiar and unfamiliar stimuli. For MTL amnesics, the use of familiar relative to unfamiliar stimuli improved discrimination performance. By contrast, patients with semantic dementia—a neurodegenerative condition associated with anterolateral temporal lobe damage—showed no such facilitation from familiar stimuli. Given that the two patient groups had highly overlapping patterns of damage to the perirhinal cortex, hippocampus, and temporal pole, the neuroanatomical substrates underlying their performance discrepancy were unclear. Here, we addressed this question with a multivariate reanalysis of the data presented by Barense et al. (2011), using functional connectivity to examine how stimulus familiarity affected the broader networks with which the perirhinal cortex, hippocampus, and temporal poles interact. In this study, healthy participants were scanned while they performed an odd-one-out perceptual task involving familiar and novel faces or objects. Seed-based analyses revealed that functional connectivity of the right perirhinal cortex and right anterior hippocampus was modulated by the degree of stimulus familiarity. For familiar relative to unfamiliar faces and objects, both right perirhinal cortex and right anterior hippocampus showed enhanced functional correlations with anterior/lateral temporal cortex, temporal pole, and medial/lateral parietal cortex. These findings suggest that in order to benefit from stimulus familiarity, it is necessary to engage not only the perirhinal cortex and hippocampus, but also a network of regions known to represent semantic information.

**Keywords: functional connectivity, perirhinal, hippocampus, perception, semantic memory, familiarity**

#### **INTRODUCTION**

The medial temporal lobe (MTL) is comprised of several highly interconnected structures including the hippocampus, entorhinal, perirhinal, and parahippocampal cortices. These regions have generally been thought to exclusively support functions related to long-term declarative memory (Squire et al., 2004; Squire and Wixted, 2011). Recently, however, it has become apparent that some of these structures play an important role in other cognitive functions, such as certain perceptual processes. For example, the perirhinal cortex is important for making perceptual discriminations among items that share a large number of overlapping features, particularly when it is necessary to process conjunctions of these features (Bussey and Saksida, 2002; Barense et al., 2005, 2007; Bartko et al., 2007; Devlin and Price, 2007; O'Neil et al., 2009). The involvement of the perirhinal cortex in perception has been demonstrated for a variety of stimulus classes with complex features, including inanimate objects, but also faces (Lee et al., 2005a,b, 2008; Barense et al., 2010a). Based on this evidence for a perceptual function of the perirhinal cortex, as well as on findings that the hippocampus is involved in the discrimination of three-dimensional scenes (Lee et al., 2012), it has been argued that the recruitment of structures within the MTL depends more on the nature of the items being represented, as opposed to whether the task explicitly targets long-term memory (Bussey and Saksida, 2007; Graham et al., 2010).

The perirhinal cortex, located caudally in relation to the temporal pole, but also exhibiting strong connectivity with more posterior inferior temporal visual regions (Suzuki and Amaral, 1994), is well-placed to serve as an interface between perception and semantic memory. Within the domain of semantic memory, the perirhinal cortex seems to be particularly important for storing and binding conceptual information about objects (Murray and Bussey, 1999; Taylor et al., 2006; Chan et al., 2011), and also for differentiating between similar members of a single category that share many conceptual features, particularly among living things (Moss et al., 2005; Tyler et al., 2013). It is one of several regions affected by semantic dementia—a neurodegenerative disease characterized by progressive loss of semantic knowledge and degeneration of the anterior temporal lobes (Hodges et al., 1992)—and perirhinal cortex atrophy in the disorder has been associated with deficits on a large battery of tasks assessing conceptual knowledge (Davies et al., 2004).

There is evidence that semantic memory and perceptual functions interact, such that the process of perceiving everyday items like faces and objects is influenced by stimulus familiarity (in this case, the term "familiarity" refers to the degree to which participants know and have previous experience with the stimuli, and should not be confused with the concept denoting a feeling of knowing without vivid recollection of contextual details; e.g., Yonelinas, 2002). For example, the perirhinal cortex mediates aspects of the interaction between perception and conceptual knowledge, as this structure is necessary for the perception and detection of familiar object feature configurations in figure-ground tasks (Barense et al., 2012; Peterson et al., 2012). In general, familiarity significantly affects the efficiency with which items such as faces and objects are recognized (Bülthoff and Newell, 2006), and familiarity can alter the default level at which classes of objects are categorized, such that participants are able to more quickly categorize familiar items at an individual level (e.g., Bill Clinton, the Eiffel Tower), but are slower to categorize these same items at the basic level (e.g., a human face, a building), which is the reverse of the pattern typically seen with unfamiliar stimuli (Anaki and Bentin, 2009). Perceptual expertise, which is a form of familiarity that results from the acquisition of experience with particular natural classes of objects outside of a laboratory setting, can alter even the neural representation of these object categories, such that they come to be processed more in lateral occipital cortex and fusiform gyrus instead of in earlier visual cortical regions (McGugin et al., 2012; Wong et al., 2012).

Items with which participants have had some previous experience automatically prompt the retrieval of more related semantic or general conceptual information than do novel items. For example, viewing unfamiliar faces will result in retrieval of some limited conceptual information about the individuals' age, sex, and emotional expression, whereas viewing familiar faces is accompanied by retrieval of more detailed identity-specific information (Bruce and Young, 1986). Moreover, faces appear to be a type of stimulus that is particularly associated with semantic information, relative to other personal characteristics such as voices (Barsics and Brédart, 2012). The retrieval of semantic material associated with a stimulus appears to be spontaneous, with semantic information retrieved automatically even when participants are engaged in another irrelevant task (Jung et al., 2013). Semantic processes are therefore likely to be involved in any task that contains elements with which participants are familiar.

The automatic retrieval of semantic knowledge associated with familiar stimuli affects performance on perceptual discrimination tasks, even when the completion of these tasks does not overtly require the use of semantic information. In Barense et al. (2010b), two patient groups with differing profiles of temporal lobe damage completed perceptual discrimination tasks that required choosing the odd-one-out among concurrently-presented complex stimuli. The stimuli were either everyday familiar objects (e.g., cars) or unfamiliar, novel objects (e.g., "greebles," Gauthier and Tarr, 1997). Results revealed differential effects of stimulus familiarity in amnesic patients with non-progressive MTL damage ("MTL amnesics") vs. patients with neurodegeneration of the temporal lobes caused by semantic dementia ("SD patients"). Whereas both healthy controls and MTL amnesics benefitted from stimulus familiarity, SD patients did not show this facilitation. The MTL amnesics did in fact perform significantly worse than controls when discriminating among both novel and familiar stimuli, but their deficit for familiar stimuli was attenuated by their relatively unimpaired access to semantic memory. However, as the lesions in both the MTL and SD patients were widespread and variable, and both had significant damage to the perirhinal cortex and temporal pole, it was difficult to draw conclusions about the specific brain regions responsible for these differential effects.

To further investigate the neural correlates of familiarity effects in perceptual discrimination tasks, Barense et al. (2011) scanned healthy control participants while they identified the odd-one-out among sets of objects and faces that varied in familiarity (see **Figure 1**) and found that a number of regions throughout the MTL were sensitive to stimulus familiarity. Specifically, the perirhinal cortex, temporal pole, and anterior hippocampus were all more active bilaterally during discriminations of familiar faces relative to unfamiliar faces. Likewise, the perirhinal cortex and anterior/posterior hippocampus were more active bilaterally for familiar objects relative to unfamiliar objects. This observed activity was not simply a reflection of successful encoding, as these effects were still evident when the analysis was restricted to trials in which participants did not later remember the stimuli in a surprise recognition memory test. However, the perirhinal cortex, hippocampus, and temporal poles cannot be the sole regions underlying the familiarity effects observed in patients with temporal lobe damage (i.e., facilitation from familiar stimuli in focal MTL amnesics, but no such facilitation in SD patients), because these MTL amnesics had suffered damage to all of these structures and yet still showed facilitation from the use of familiar stimuli (Barense et al., 2010b).

Given that brain damage in humans is rarely restricted to discrete anatomical areas of theoretical interest, insight into the differences between these patient groups can be gained by examining how stimulus familiarity affects the broader networks of regions with which the perirhinal cortex, hippocampus, and temporal poles interact. The measurement of functional connectivity is one method of acquiring such information. This technique involves the identification of regions throughout the brain in which changes in activation occur at the same time and at a similar magnitude to the changes in activation in specific regions of interest, or *seeds*. Any areas exhibiting such a correlation are broadly thought to be functionally interacting with the seed regions in some way (Rogers et al., 2007; Friston, 2011). Using these techniques, it has already been established that the functional connectivity of the MTL can be affected by changing task demands (Martin et al., 2011; O'Neil et al., 2012).
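The core idea of seed-based functional connectivity described above can be illustrated with a minimal Python sketch that correlates a seed time series with the time series of other regions. This is a simplified Pearson-correlation illustration on synthetic data, not the partial least squares analysis used in the present study; the region names, noise levels, and number of scans are assumptions made purely for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def seed_connectivity(seed_ts, region_ts):
    """Correlate the seed time series with every other region's time series."""
    return {name: pearson(seed_ts, ts) for name, ts in region_ts.items()}


# Synthetic signals (hypothetical): one region follows the seed, one does not.
rng = random.Random(0)
n_scans = 200
seed_ts = [rng.gauss(0, 1) for _ in range(n_scans)]
region_ts = {
    "coupled_region": [s + rng.gauss(0, 0.5) for s in seed_ts],
    "independent_region": [rng.gauss(0, 1) for _ in range(n_scans)],
}

connectivity = seed_connectivity(seed_ts, region_ts)
for name, r in connectivity.items():
    print(f"{name}: r = {r:.2f}")
```

In this toy example, the region whose signal tracks the seed yields a high correlation, whereas the independent region yields a correlation near zero. Comparing such correlations across conditions (e.g., familiar vs. unfamiliar stimuli) is the kind of contrast that a connectivity analysis formalizes.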

In the present study, we examined how the functional connectivity of regions in the perirhinal cortex, temporal pole, and anterior and posterior hippocampus identified by Barense et al. (2011) varied during perceptual discrimination, depending on the degree to which participants were familiar with the items to be discriminated. We hypothesized that for discriminations involving familiar stimuli (relative to unfamiliar stimuli), at least some of these four subregions would exhibit significantly greater interaction with structures thought to represent semantic information, including anterior temporal (e.g., Patterson et al., 2007; Binney et al., 2010; Visser et al., 2010), lateral temporal (e.g., Schmolck et al., 2002; Levy et al., 2004), and inferior parietal cortex (Binder and Desai, 2011; Fairhall and Caramazza, 2013). If confirmed, the findings would offer insight into the performance discrepancies reported between SD patients and MTL amnesics during perceptual discrimination of familiar stimuli (Barense et al., 2010b). To this end, we conducted a partial least squares (PLS) analysis of functional connectivity on the data described in Barense et al. (2011).

#### **MATERIALS AND METHODS**

This study involved a re-analysis of the data presented in Barense et al. (2011) using multivariate statistical imaging techniques, and thus, a full description of the methods can be found there. Accordingly, only the aspects of the experimental design that are relevant to the current analyses are presented here.

#### **PARTICIPANTS**

Eighteen young adult participants (12 female, *M* = 27.3 years old, *SD* = 5.5 years) were recruited. All were right-handed and did not suffer from any neurological abnormalities. All participants gave informed written consent, and this research received ethical approval from the Cambridgeshire Local Research Ethics Committee (LREC reference 05/Q0108/127).

#### **PROCEDURE**

An oddity discrimination paradigm was employed, with each participant completing 405 trials (81 trials per condition, 105 trials in each of the first three runs, and 90 trials in the fourth run). Participants were simultaneously presented with pictures of three items: two of these pictures were of the same stimulus and the other was a picture of a different stimulus. Participants were asked to identify the odd one out (see **Figure 1**), and indicated their responses by pressing one of three specified buttons on a four-button response box held in the right hand. The sets of stimuli always appeared in the same layout, with two items next to each other and a third item above the other two, though the location of the odd-one-out was counterbalanced across trials. All stimuli were trial-unique and therefore not repeated across trials. The stimuli in each trial belonged to one of four possible experimental conditions: *familiar (famous) faces*, *unfamiliar faces*, *familiar objects*, and *unfamiliar objects (greebles)*. In addition, there was a *size control* condition, in which the stimuli were black squares. The sets of stimuli were each displayed for 5.5 s, during which participants indicated the odd stimulus with a corresponding button press as quickly and as accurately as possible. The inter-trial interval was 0.25 s, except that on every 11th trial there was an additional 0.60 s delay, during which the experimental program checked to ensure that it was synced to the appropriate scanner pulse. Each condition was presented across "mini-blocks" of three trials, such that participants were shown three trials in a row belonging to a single condition before moving on to a mini-block of a different condition. The order of these mini-blocks was fixed for each participant and counterbalanced across participants. This ensured that a given trial type did not always follow the same trial type, allowing decoupling of signal across conditions.

In each of the two face conditions (*familiar* and *unfamiliar faces*), the items presented were grayscale photographs of White human faces displayed on a white background. On each trial, two of the three images were of the same face but shown from different viewpoints, while the third image was a different face shown from yet another viewpoint. The *familiar faces* were famous faces that were likely known to participants, while the *unfamiliar faces* were novel and not known to participants. Of the 81 trials in each face condition, 40 involved female faces and 41 involved male faces.

In the two object conditions (*familiar* and *unfamiliar objects*), the stimuli were color photographs of objects. On each trial, all stimuli belonged to the same basic category, with two of the stimuli being the same object depicted from different viewpoints, and the third image showing a second, different object from another viewpoint. The two objects for each trial were chosen so as to have as many overlapping features as possible, in order to prevent participants from being able to make their oddity judgments based upon a single distinguishing feature. The *familiar objects* were commonplace and inanimate items selected from a large database (Hemera Photo-Objects Volumes 1–3), while the *unfamiliar objects* were "greebles" (Gauthier and Tarr, 1997), which are well-studied stimuli that, like faces, have a homogeneous spatial configuration, but with which our participants had no prior exposure or expertise.

In the *size control* condition, the stimuli consisted of three black squares presented in slightly jittered positions, with two of the squares being the same size and the third square being slightly smaller or larger (by a range of 9–15 pixels per side) than the other two.

#### **MRI PARAMETERS**

MRI images were acquired on a Siemens 3.0 Tesla Tim Trio MRI scanner. Each participant's anatomical scan was collected using a magnetization-prepared rapidly-acquired gradient echo (MP-RAGE) sequence [repetition time (*TR*) = 2250 ms, echo time (*TE*) = 2.99 ms, flip angle = 9°, field of view = 256 × 240 × 160 mm, matrix size = 256 × 240 × 160, spatial resolution = 1 × 1 × 1 mm]. Functional images were acquired using a T2\*-weighted echo planar imaging (EPI) sequence with two echoes (spin echo and gradient echo; Schwarzbauer et al., 2010) in an attempt to avoid the loss of signal that often occurs when collecting images of the inferior temporal lobes and orbitofrontal cortex (slice thickness = 3 mm, gap = 1 mm, matrix size = 64 × 64, in-plane resolution = 3.5 × 3.5 mm, *TR* = 2000 ms). Spin-echo images are generally less prone to susceptibility artifacts, but in this case the spin-echo data did not reveal any effects that were not already evident in the standard gradient-echo sequence. Consequently, only data from the gradient-echo sequence are described here. Sixteen slices were acquired in an interleaved fashion (one spin-echo and one gradient-echo image per slice, resulting in an effective total of 32 slices), following the temporal lobes and parallel to the long axis of the hippocampus. Because half of the slices were devoted to the spin echoes and the time to acquire each volume was effectively doubled, the brain coverage was focused on temporal regions and our coverage of frontal regions was limited. Each participant completed four functional runs, with the first three runs lasting 630 s and the fourth run lasting 542 s. The first five scans of each run were discarded to allow the MRI signal to reach equilibrium.

#### **IMAGE PRE-PROCESSING**

Functional MRI images were preprocessed using a standard protocol within Statistical Parametric Mapping software (SPM5, www.fil.ion.ucl.ac.uk/spm/software/spm5). All functional images were realigned to the first image of the first run, and un-warped to correct for distortions in the main magnetic field. All participants' anatomical images were normalized to the Montreal Neurological Institute (MNI) template, and the resulting normalization parameters were then applied to all functional images (resampled at 3 × 3 × 3 mm voxels). The normalized images were then spatially smoothed with an 8 mm full-width half-maximum (FWHM) Gaussian kernel.
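For readers less familiar with the FWHM convention, the short sketch below converts the 8 mm kernel into the standard deviation of the corresponding Gaussian, using the standard relation FWHM = 2·sqrt(2·ln 2)·σ. It only illustrates the unit conversion and is not a description of how SPM5 implements smoothing; the variable names are illustrative.

```python
import math

fwhm_mm = 8.0        # smoothing kernel reported in the text
voxel_size_mm = 3.0  # resampled voxel size reported in the text

# Standard relation between FWHM and the Gaussian standard deviation.
sigma_mm = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
sigma_voxels = sigma_mm / voxel_size_mm

print(f"sigma = {sigma_mm:.2f} mm = {sigma_voxels:.2f} voxels")
```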

#### **PARTIAL LEAST SQUARES ANALYSES**

Spatiotemporal partial least squares correlation was used to analyze the functional MRI data (McIntosh et al., 1996, 2004; McIntosh and Lobaugh, 2004; Krishnan et al., 2011). PLS is a highly sensitive, covariance-based multivariate technique similar to principal component analysis (PCA). However, instead of operating on the total variance, PLS focuses solely on the covariance between the functional data and the task/experimental design, constraining solutions to be those related to the experimental conditions (McIntosh et al., 2004). Using PLS software (http://www.rotman-baycrest.on.ca/index.php?section=84), we first ran an initial mean-centered task PLS analysis, which identifies patterns of activation that optimally distinguish between experimental conditions. This was done in order to generate the voxel signal intensity data for our subsequent seed PLS functional connectivity analysis. Because our planned contrast for this seed PLS analysis was between the familiar vs. unfamiliar conditions, we were particularly interested in whether the mean-centered task PLS analysis would identify regions that discriminated between familiar and unfamiliar faces/objects. Subsequent seed PLS analyses were conducted to reveal networks functionally connected to the MTL regions found to be responsive to stimulus familiarity in Barense et al. (2011).

#### **MEAN-CENTERED TASK PARTIAL LEAST SQUARES ANALYSIS**

We initially conducted a mean-centered task PLS analysis (McIntosh et al., 1996, 2004; McIntosh and Lobaugh, 2004), from which we planned to extract signal intensity values from the MTL seed regions identified in the previously-published univariate analysis (Barense et al., 2011). Although the seed-based functional connectivity analyses were of primary interest in the current study, the mean-centered task PLS analyses also allowed us to determine whether an alternate, multivariate approach would also implicate the MTLs in the perceptual discrimination of familiar stimuli, as described by Barense et al. (2011). Mean-centered task PLS is an exploratory, data-driven form of PLS in which no *a priori* contrasts or hypotheses are specified, and a series of *latent variables* (LVs) that optimally account for the covariance between functional data and experimental conditions are derived using singular value decomposition (McIntosh et al., 2004). These LVs, which are similar to eigenvectors in PCA, have three components: a *singular value*, which describes the proportion of the covariance between the task and the functional data accounted for by that particular LV, *voxel saliences*, which identify the distributed spatial pattern of brain voxels that is most related to the effect characterized by that LV, and *task saliences*, which illustrate the extent to which each experimental condition is associated with that pattern of voxels. Effects were assessed over a 16 s temporal window from the onset of each trial. As we were primarily interested in the differences between the four experimental conditions (*familiar faces*, *unfamiliar faces*, *familiar objects*, and *unfamiliar objects*), as opposed to differences between the experimental and control conditions, the *size control* condition was excluded from these analyses.
The functional MRI data were transformed into a 72 × 282,440 matrix, in which the rows represented each of the four conditions for each of the 18 participants, while the columns contained the signal values for every voxel at each of the eight 2 s time lags contained within the 16 s temporal window. Singular value decomposition was performed on a mean-centered, columnwise averaged form of this matrix, yielding the three components mentioned above.
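The matrix bookkeeping and decomposition described above can be sketched as follows. This is a toy illustration with random data and far fewer columns than the real matrix, not the PLS software's implementation. It assumes that "columnwise averaged" means averaging across participants within each condition, and that the share of crossblock covariance for each LV is its squared singular value divided by the sum of squared singular values. The planted effect and all variable names are hypothetical.

```python
import numpy as np

# Dimensions reported in the text.
n_participants, n_conditions, n_lags = 18, 4, 8
n_columns_reported = 282_440
n_rows = n_participants * n_conditions           # 72 rows
n_voxels = n_columns_reported // n_lags          # voxels implied per time lag

# Toy stand-in for the real data (random noise, 200 columns instead of 282,440).
rng = np.random.default_rng(0)
toy_columns = 200
data = rng.normal(size=(n_participants, n_conditions, toy_columns))
data[:, 0, :20] += 1.0                           # hypothetical condition effect

# The 72 x columns layout: one row per participant-by-condition combination.
stacked = data.reshape(n_rows, toy_columns)

# Average over participants within condition, then mean-center each column.
condition_means = data.mean(axis=0)              # 4 x columns
centered = condition_means - condition_means.mean(axis=0, keepdims=True)

# Singular value decomposition yields the three components of each LV.
task_saliences, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
voxel_saliences = vt.T                           # columns x LVs
percent_covariance = 100 * singular_values**2 / np.sum(singular_values**2)

print(n_rows, n_voxels, stacked.shape)
print(np.round(percent_covariance, 2))
```

Because the four condition means are centered, at most three singular values can be non-zero, so a four-condition analysis of this kind yields at most three LVs whose covariance shares sum to 100%.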

The statistical significance of each LV was evaluated with 500 permutation tests, calculated with a threshold of *p* < 0.05 (McIntosh et al., 1996; McIntosh and Lobaugh, 2004). Each permutation test involves random reordering and reassignment of every participant's data to the specified experimental conditions, determining the likelihood of each LV's singular value occurring by chance. Additionally, the reliability of the voxel saliences (i.e., the clusters of brain regions identified by each LV) was measured by bootstrap estimation of their standard errors, entailing random sampling of participants with replacement 300 times, determining which voxel responses appear reliably (and therefore are not heavily influenced by which participants are included in the sample). Here, clusters with bootstrap ratios (BSR) exceeding ±3.5 (i.e., an absolute BSR greater than 3.5) were classified as reliable (a BSR is approximately equivalent to a *z*-score, and in this case corresponds to a *p*-value of roughly 0.001; McIntosh and Lobaugh, 2004). As PLS analyses are conducted in a single analytic step, correction for multiple comparisons is not required.
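The two resampling steps can be sketched on toy data as follows. This is a simplified illustration under stated assumptions, not the PLS software's procedure. Condition labels are shuffled within each participant, the permutation *p*-value uses the common (count + 1)/(n + 1) convention, and bootstrap samples are sign-aligned to the original solution rather than Procrustes-rotated. The function names and the planted effect are hypothetical.

```python
import math
import numpy as np

def task_pls_svd(data):
    """SVD of mean-centered condition means; data is (subjects, conditions, features)."""
    cond_means = data.mean(axis=0)
    centered = cond_means - cond_means.mean(axis=0, keepdims=True)
    return np.linalg.svd(centered, full_matrices=False)

def permutation_p(data, n_perm=500, seed=0):
    """Share of label-shuffled datasets whose first singular value reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = task_pls_svd(data)[1][0]
    n_cond = data.shape[1]
    exceed = 0
    for _ in range(n_perm):
        # Reassign each participant's data to conditions at random.
        shuffled = np.stack([subj[rng.permutation(n_cond)] for subj in data])
        if task_pls_svd(shuffled)[1][0] >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

def bootstrap_ratios(data, n_boot=300, seed=0):
    """First-LV saliences divided by their bootstrap standard errors."""
    rng = np.random.default_rng(seed)
    reference = task_pls_svd(data)[2][0]
    n_subj = data.shape[0]
    boots = np.empty((n_boot, reference.size))
    for b in range(n_boot):
        # Resample participants with replacement.
        sample = data[rng.integers(0, n_subj, size=n_subj)]
        saliences = task_pls_svd(sample)[2][0]
        if saliences @ reference < 0:            # SVD sign is arbitrary; align it
            saliences = -saliences
        boots[b] = saliences
    return reference / boots.std(axis=0, ddof=1)

# Toy data: 18 participants, 4 conditions, 20 features, effect planted in 5 features.
rng = np.random.default_rng(42)
toy = rng.normal(size=(18, 4, 20))
toy[:, 0, :5] += 3.0

p_value = permutation_p(toy)
bsr = bootstrap_ratios(toy)
p_for_bsr_threshold = math.erfc(3.5 / math.sqrt(2))  # two-tailed p for |z| = 3.5

print(f"permutation p = {p_value:.3f}")
print("features with |BSR| > 3.5:", np.flatnonzero(np.abs(bsr) > 3.5))
print(f"two-tailed p at |z| = 3.5: {p_for_bsr_threshold:.5f}")
```

Treating the BSR as a *z*-score, the two-tailed *p*-value at the 3.5 threshold is about 0.0005, which is of the same order as the approximate value of 0.001 cited in the text.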

#### **SEED PARTIAL LEAST SQUARES ANALYSES**

Following this preliminary mean-centered task PLS analysis, we then used a "seed" PLS analysis (McIntosh et al., 2004) to examine the functional connectivity of the four regions identified as being sensitive to stimulus familiarity in Barense et al. (2011). This form of PLS can be used to assess functional connectivity by pinpointing regions across the whole brain in which signal is correlated with that of user-specified seed regions of interest, and determining how this connectivity changes across experimental conditions. In this case, the seed regions were the peak MNI coordinates that had been identified in the univariate contrasts of the familiar vs. the unfamiliar conditions in Barense et al. (2011). These regions included the perirhinal cortex, the anterior hippocampus, the posterior hippocampus, and the temporal pole (one seed in each hemisphere). The voxel signal from each of these seeds was extracted from the mean-centered task PLS result using PLS software's multiple voxel extraction tool, centered on the following peak MNI coordinates identified by Barense et al. (2011): left perirhinal cortex (*x*, *y*, *z* = −33, −12, −27), left anterior hippocampus (*x*, *y*, *z* = −21, −9, −18), left posterior hippocampus (*x*, *y*, *z* = −33, −27, −15), left temporal pole (*x*, *y*, *z* = −36, 18, −27), right perirhinal cortex (*x*, *y*, *z* = 36, −9, −30), right anterior hippocampus (*x*, *y*, *z* = 27, −15, −18), right posterior hippocampus (*x*, *y*, *z* = 33, −33, −12), and right temporal pole (*x*, *y*, *z* = 63, 3, −18), averaging signal intensity across the two nearest neighboring voxels. Peak signal intensity values were extracted for each voxel from lag 3 (6 s from trial onset) of the mean-centered task analysis, as this timepoint corresponds to the typical peak of the hemodynamic response function (cf. O'Neil et al., 2012).

These signal intensity values for each participant were entered in matrix form as the behavioral values in a single non-rotated seed PLS analysis. This 72 × 8 matrix contained four rows (one per condition) for each of the 18 participants, and one column for each of the seed regions. Correlations were computed between this matrix and the matrix of functional MRI data containing all voxel signal values as described earlier for the mean-centered task PLS analysis. The resulting correlation maps were stacked and again analyzed with singular value decomposition. The non-rotated version of PLS, instead of being data-driven, constrains the possible solutions to allow for the explicit testing of hypotheses via specification of *a priori* contrasts. Non-rotated PLS analyses yield one LV corresponding to each specified contrast. Considering that we were primarily interested in the effects of stimulus familiarity, within this model we specified a contrast for each seed investigating whether the functional connectivity of that seed differed between the familiar and unfamiliar conditions (irrespective of whether the stimuli were faces or objects). The significance of the LVs and reliability of the voxel saliences were evaluated in the same manner as the mean-centered task analysis, with 500 permutation tests (*p* < 0.05) and 300 bootstrap estimations (BSR = ±3.5). Peak MNI coordinates from reliable clusters identified in both the mean-centered and seed PLS analyses are reported in **Tables 2**–**4**, with anatomical labels assigned using the Automated Anatomical Labeling atlas (Tzourio-Mazoyer et al., 2002). In all figures for both the mean-centered and seed PLS analyses, conditions and brain regions with positive saliences are always displayed in red and those with negative saliences are always displayed in blue.
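A minimal sketch of the seed correlation and contrast steps, for a single seed and toy data, is given below. It is an illustration under stated assumptions rather than the PLS software's implementation. Correlations are computed across participants within each condition, the condition order is assumed to be familiar faces, unfamiliar faces, familiar objects, unfamiliar objects, and the non-rotated LV is obtained by projecting the stacked correlation maps onto a unit-length familiar-minus-unfamiliar contrast. All names and the planted effect are hypothetical.

```python
import numpy as np

def seed_correlation_maps(seed, brain):
    """Correlate seed values with every voxel across participants, per condition.

    seed:  (participants, conditions)
    brain: (participants, conditions, voxels)
    Returns a (conditions, voxels) array of Pearson correlations.
    """
    seed_c = seed - seed.mean(axis=0)
    brain_c = brain - brain.mean(axis=0)
    seed_n = seed_c / np.linalg.norm(seed_c, axis=0)
    brain_n = brain_c / np.linalg.norm(brain_c, axis=0)
    return np.einsum("sc,scv->cv", seed_n, brain_n)

# Toy data: 18 participants, 4 conditions, 50 voxels.
rng = np.random.default_rng(1)
n_subj, n_cond, n_vox = 18, 4, 50
seed = rng.normal(size=(n_subj, n_cond))
brain = rng.normal(size=(n_subj, n_cond, n_vox))

# Hypothetical effect: the first 10 voxels track the seed in the familiar conditions only.
for cond in (0, 2):
    brain[:, cond, :10] = seed[:, cond, None] + 0.3 * rng.normal(size=(n_subj, 10))

corr_maps = seed_correlation_maps(seed, brain)   # stacked correlation maps, 4 x 50

# A priori contrast: familiar (+) vs. unfamiliar (-), scaled to unit length.
contrast = np.array([1.0, -1.0, 1.0, -1.0])
contrast /= np.linalg.norm(contrast)

projection = contrast @ corr_maps                # one value per voxel
singular_value = np.linalg.norm(projection)
voxel_saliences = projection / singular_value

print("voxels with the largest positive saliences:",
      np.sort(np.argsort(voxel_saliences)[-10:]))
```

In the analysis described above the behavioral matrix held all eight seeds (72 × 8), so the same correlation step would be repeated for each seed column before the contrasts are applied.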

#### **RESULTS**

#### **BEHAVIORAL RESULTS**

*RT, reaction time; standard deviation values are shown in parentheses.*

Accuracy and reaction time (RT) for each of the five conditions are shown in **Table 1** (reproduced from Barense et al., 2011). A repeated-measures ANOVA indicated that there was no main effect of stimulus familiarity on participants' accuracy in the discrimination task, *F*(1, 17) = 0.90, *p* = 0.357, indicating that differences in functional connectivity between the familiar and unfamiliar conditions did not stem from factors related to accuracy. However, there was a significant main effect of familiarity on RT, *F*(1, 17) = 41.47, *p* < 0.001, because the mean RT for the familiar conditions (*M* = 2813.13 ms, *SE* = 96.56 ms) was significantly shorter than for the unfamiliar conditions (*M* = 3048.00 ms, *SE* = 80.50 ms). Despite this effect, it is unlikely that the difference in RT underlies any observed differences in functional connectivity between the familiar and unfamiliar conditions. The mean RT discrepancy of 234 ms is far shorter than a single TR; consequently, peak correlations between functionally interacting regions during both the familiar and unfamiliar conditions would still fall within the same time lag in our PLS analyses.
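The size of the RT difference relative to the sampling interval can be checked directly from the reported means. The snippet below only restates that arithmetic, assuming the RTs are in milliseconds and using the functional TR of 2000 ms given in the MRI parameters; the variable names are illustrative.

```python
rt_familiar_ms = 2813.13    # mean RT, familiar conditions (from the text)
rt_unfamiliar_ms = 3048.00  # mean RT, unfamiliar conditions (from the text)
tr_ms = 2000.0              # functional TR, which is also the width of one PLS time lag

rt_difference_ms = rt_unfamiliar_ms - rt_familiar_ms
fraction_of_tr = rt_difference_ms / tr_ms

print(f"RT difference: {rt_difference_ms:.2f} ms "
      f"({100 * fraction_of_tr:.1f}% of one TR)")
```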

#### **MEAN-CENTERED TASK PLS ANALYSIS**

The mean-centered task PLS analysis yielded three significant LVs (see **Table 2**). The first LV explained 42.53% of the crossblock covariance (*p* < 0.001), and differentiated between the face and non-face conditions. Two negatively-correlated patterns of activation were identified, one associated with both the *familiar* and *unfamiliar objects* conditions (shown in red in **Figure 2A**), and another associated with the *familiar faces* condition (shown in blue in **Figure 2A**). The regions associated more highly with the *familiar faces* condition relative to the *familiar* and *unfamiliar objects* conditions included bilateral clusters in the anterior hippocampus and anterior lateral temporal cortex, which are areas that correspond well to the regions that responded to face familiarity from the univariate general linear model analysis in Barense et al. (2011). The regions associated more with the two object conditions relative to the *familiar faces* condition included large sections of occipital cortex and bilateral fusiform gyrus.

The second LV explained 33.08% of the crossblock covariance (*p* < 0.001) and appeared to primarily distinguish the *unfamiliar objects* condition from the *familiar objects* and *unfamiliar faces* conditions. The regions associated more with the *unfamiliar objects* condition relative to the *familiar objects* and *unfamiliar faces* conditions included right inferior temporal and superior/middle occipital gyrus (shown in red in **Figure 2B**). The regions associated with the *familiar objects* and *unfamiliar faces* conditions relative to the *unfamiliar objects* condition included the right anterior hippocampus and bilateral insular cortex (shown in blue in **Figure 2B**). The *familiar faces* condition was not significantly associated with either of these two patterns of activation in LV 2.

Finally, the third LV was of most interest to the current study, as it differentiated between the conditions involving familiar and unfamiliar stimuli (explaining 24.39% of the crossblock covariance, *p* < 0.048). Specifically, this LV highlighted one pattern of activation that was correlated more with the *familiar faces* and *familiar objects* conditions relative to the *unfamiliar faces* condition (shown in blue in **Figure 2C**), and another correlated with the *unfamiliar faces* condition relative to the two familiar conditions (shown in red in **Figure 2C**). The *unfamiliar objects* condition was not significantly associated with either of these two negatively-correlated activation patterns. The regions associated with the familiar conditions consisted of strong bilateral activation along the entire extent of the parahippocampal gyrus, extending into the anterior hippocampus, and also included the bilateral temporal poles. This activation corresponded well with the MTL regions identified as being sensitive to stimulus familiarity in Barense et al. (2011). In contrast, the *unfamiliar faces* condition was associated primarily with bilateral medial parietal activation.

#### **SEED PLS ANALYSIS**

Following this mean-centered task analysis, we conducted our non-rotated seed PLS analysis with signal intensity values extracted from the mean-centered task PLS analysis for each seed. This analysis revealed that the functional connectivity of two of the eight seeds (the right perirhinal cortex and the right anterior hippocampus) was modulated by stimulus familiarity, such that these regions were functionally interacting with different networks depending on whether the stimuli to be discriminated were familiar or unfamiliar. Specifically, the right perirhinal cortex seed (*x*, *y*, *z* = 36, −9, −30) displayed this pattern (16.43% of crossblock covariance, *p* < 0.016). **Figure 3** illustrates that during the two familiar conditions (shown in red), signal in the right perirhinal cortex was highly correlated with a bilateral network including lateral temporal, anterior temporal, and medial and lateral parietal cortex (see **Table 3**). In contrast, during the two unfamiliar conditions, the functional connectivity of this right perirhinal seed region was significantly different, and instead correlated with large sections of occipital cortex.

This same contrast of the familiar vs. unfamiliar conditions was also significant in the right anterior hippocampal seed (*x*, *y*, *z* = 27, −15, −18; 15.38% of crossblock covariance, *p* < 0.036). Like the right perirhinal cortex, the right anterior hippocampus showed differential functional connectivity for the familiar vs. unfamiliar conditions, and many of the regions identified in the two functionally-connected networks overlapped with those found in the two networks showing connectivity with the right perirhinal cortex (see **Table 4** and **Figure 4**). During the familiar conditions, the network associated with the right anterior hippocampus included right anterior lateral temporal and lateral parietal cortex, cuneus and precuneus. For the unfamiliar conditions, the functionally-connected network consisted of regions such as bilateral occipital cortex and fusiform gyrus. **Figure 5** shows the time course of the correlation between signal in the right perirhinal cortex and right anterior hippocampal seeds and a sample of the regions identified as functionally connected during the familiar conditions.

In these same two perirhinal and anterior hippocampal regions in the left hemisphere, this contrast did not reach significance. Functional connectivity was not significantly modulated by familiarity in either the left perirhinal cortex (*x*, *y*, *z* = −33, −12, −27; 11.14% of crossblock covariance, *p* < 0.317) or the left anterior hippocampus (*x*, *y*, *z* = −21, −9, −18; 14.05% of crossblock covariance, *p* < 0.078) seeds. Similarly, the contrast of connectivity for the familiar conditions vs. the unfamiliar conditions was not significant in either hemisphere for the other two remaining seeds in the posterior hippocampus and temporal pole. Permutation testing for the LVs in the right posterior hippocampus (*x*, *y*, *z* = 33, −33, −12; 8.41% of crossblock covariance, *p* < 0.884), left posterior hippocampus (*x*, *y*, *z* = −33, −27, −15; 12.73% of crossblock covariance, *p* < 0.158), right temporal pole (*x*, *y*, *z* = 63, 3, −18; 8.94% of crossblock covariance, *p* < 0.828), and left temporal pole (*x*, *y*, *z* = −36, 18, −27; 12.92% of crossblock covariance, *p* < 0.152) indicated that the functional connectivity of these four seed regions did not differ depending on stimulus familiarity.

**Table 2 | Regions associated with the latent variables from the mean-centered task PLS analysis.**

*Note: Only clusters evident during the peak timepoint (TR 3) with a bootstrap ratio of greater than +/−3.5 are reported. \*Cluster size (k) indicates the number of voxels comprising the cluster; only clusters with a minimum extent of 10 voxels are reported. BSR, Bootstrap ratio; LV, Latent Variable; L, left; R, right.*

**Frontiers in Human Neuroscience www.frontiersin.org** March 2014 | Volume 8 | Article 117 |

[…] brain images) and negatively-weighted (shown in blue on brain images) networks. These networks, in the form of voxel salience maps for each LV, are all shown for TR 3, superimposed on the ch2bet template in MRIcron […]

[…] unfamiliar conditions. Maps are thresholded at the equivalent of *p* < 0.05 for visualization purposes. BSR, bootstrap ratio. Error bars reflect 95% confidence intervals.

[…] **cortex seed.** A seed PLS analysis demonstrated that the functional connectivity of the right perirhinal cortex differed depending on stimulus familiarity. **(A)** Regions shown in red are those with which the right perirhinal cortex seed was functionally connected during the two familiar conditions in TR 3, whereas **(B)** displays regions shown in blue with which the perirhinal […] graph in **(C)** depicts the correlation of the signal in the perirhinal seed with the two networks during the four experimental conditions. Maps are thresholded at the equivalent of *p* < 0.05 for visualization purposes, and networks are overlaid on the ch2bet template in MRIcron (Rorden et al., 2007). BSR, bootstrap ratio. Error bars reflect 95% confidence intervals.

*Only clusters evident during the peak timepoint (TR 3) with a bootstrap ratio of greater than +/−3.5 are reported. \*Cluster size (k) indicates the number of voxels comprising the cluster; only clusters with a minimum extent of 10 voxels are reported. BSR, Bootstrap ratio; L, left; R, right.*

**Table 4 | Regions showing significant functional connectivity with the right anterior hippocampal seed.**

*Only clusters evident during the peak timepoint (TR 3) with a bootstrap ratio of greater than +/−3.5 are reported. \*Cluster size (k) indicates the number of voxels comprising the cluster; only clusters with a minimum extent of 10 voxels are reported. BSR, Bootstrap ratio; L, left; R, right.*

[…] **hippocampal seed.** The functional connectivity of the right anterior hippocampus also differed depending on stimulus familiarity. **(A)** Regions shown in red are those with which the right anterior hippocampal seed was functionally connected during the two familiar conditions in TR 3. **(B)** Regions shown in blue are those with […] in TR 3. The bar graph in **(C)** depicts the correlation of the signal in the anterior hippocampal seed with the two networks during the four experimental conditions. Maps are thresholded at the equivalent of *p* < 0.05 for visualization purposes. BSR, bootstrap ratio. Error bars reflect 95% confidence intervals.

#### **DISCUSSION**

The aim of this study was to examine the functional connectivity of several MTL structures during a complex perceptual discrimination task. Specifically, we were interested in whether MTL functional connectivity during the task would be affected by participants' prior familiarity with the stimuli to be discriminated. The perirhinal cortex, anterior and posterior hippocampus, and temporal pole were previously shown to be more active during perceptual discrimination of familiar, relative to unfamiliar, faces and objects (Barense et al., 2011). We anticipated that when participants discriminated between familiar stimuli, some or all of these areas would show increased connectivity with anterior temporal, lateral temporal, and inferior parietal regions known to be involved in semantic memory (Binder and Desai, 2011), relative to when participants discriminated between novel or unfamiliar stimuli. Our findings indicate that the functional connectivity of the right perirhinal cortex and right anterior hippocampus did in fact differ across familiar and unfamiliar conditions, while the connectivity of the left perirhinal cortex, left anterior hippocampus, and bilateral posterior hippocampus and temporal pole was unaffected by stimulus familiarity.

During the two familiar conditions, signal in the right perirhinal cortex covaried with signal in bilateral anterior portions of the middle and inferior temporal gyri, the right temporal pole, bilateral posterior aspects of the middle and superior temporal gyri, and bilateral angular gyrus, cuneus and precuneus. In contrast, during the unfamiliar conditions, the right perirhinal cortex instead showed connectivity with bilateral occipital cortex (**Figure 3**). The pattern of differential functional connectivity exhibited by the right anterior hippocampus was nearly identical to that of the right perirhinal cortex for both the familiar and unfamiliar conditions, though during the familiar conditions, the network associated with the right anterior hippocampus was slightly more right-lateralized (**Figure 4**).

#### **RELATIONSHIP OF OBSERVED FUNCTIONAL CONNECTIVITY TO ANATOMICAL CONNECTIVITY**

The regions identified as being functionally connected to the perirhinal cortex and anterior hippocampus during perceptual discrimination correspond well with what is already known about the anatomical connections of the MTL. Functional correlations between regions during resting states are thought to reflect intrinsic anatomical connections, and studies using such methods have demonstrated that the perirhinal cortex and anterior hippocampus have very similar intrinsic functional connectivity, while the parahippocampal gyrus and posterior hippocampus show functional correlations with a separate, more posterior network (Kahn et al., 2008; Ranganath and Ritchey, 2012). Given the similar resting-state connectivity of the perirhinal cortex and anterior hippocampus, it is therefore not surprising that the task-related functional connectivity of these two regions was similarly affected by stimulus familiarity. The perirhinal cortex and anterior hippocampus are anatomically associated with an anterior cortical pathway that encompasses anterior lateral temporal cortex, including the temporal poles and following along the middle temporal gyrus (Kahn et al., 2008). Moreover, the perirhinal cortex has intrinsic connectivity with anterior fusiform gyrus, anterior lateral and inferior temporal cortex, anterior hippocampus, temporoparietal cortex, and multiple aspects of prefrontal cortex (Libby et al., 2012).

#### **REGIONS FUNCTIONALLY CONNECTED TO THE PERIRHINAL CORTEX AND HIPPOCAMPUS**

The networks that correlated with the perirhinal cortex and anterior hippocampus during the familiar face and familiar object conditions included anterior temporal, lateral temporal, and inferior parietal regions, which are all areas thought to represent semantic information. In particular, parts of the inferior parietal lobe and significant portions of the ventral and lateral temporal lobes are thought to be high-level "convergence zones" in which represented information is abstract and independent of specific modalities (Binder and Desai, 2011; Fairhall and Caramazza, 2013). The anterior temporal lobes have also been proposed to serve as an amodal semantic hub (Binney et al., 2010; Visser et al., 2010), representing conceptual similarities among items that differ drastically in shape, color, and function (e.g., the similarities between an ostrich and a hummingbird; Rogers et al., 2004; Patterson et al., 2006, 2007). Additionally, it has been suggested that the anterior temporal lobes store representations of unique entities, as anterior temporal lobe activation is often seen in response to the recognition of specific familiar or famous faces (e.g., Gorno-Tempini et al., 1998; Leveroni et al., 2000; Damasio et al., 2004) and famous buildings (Gorno-Tempini and Price, 2001), though others have argued that a more accurate interpretation of anterior temporal lobe function is in the representation of abstract social knowledge (Olson et al., 2007, 2013).

Within these broad regions, the specific structures identified in the functionally-connected networks have already been associated with semantic tasks involving face and object stimuli. For example, retrieving non-lexical information about the professions associated with famous faces produced activation in anterior middle temporal gyrus, temporoparietal junction, and temporal pole, while successful naming of famous faces generated activation in left posterior middle temporal gyrus and left inferior parietal cortex (Gesierich et al., 2012). All of these regions were present in the functionally-connected networks associated with the familiar conditions in the present study. Also identified in these networks was the angular gyrus, which has been described as a heteromodal region that is capable of integrating a wide range of conceptual information, and is consistently activated by a variety of semantic concepts with different modality-specific associations (Bonner et al., 2013).

The connectivity of the perirhinal cortex and anterior hippocampus with the cuneus for the familiar conditions may have been driven primarily by participants automatically activating conceptual knowledge about the familiar objects, as cuneus activation is seen in semantic tasks requiring participants to retrieve knowledge about the proper use of objects (Ebisch et al., 2007) and when judging the semantic relatedness of words referring to tools (Tyler et al., 2003). The connectivity with the precuneus, which is a region most frequently associated with the act of episodic memory retrieval accompanied by rich visual imagery (Cavanna and Trimble, 2006), could reflect spontaneous retrieval of episodic material associated with the familiar stimuli. Similarly, the presence of the calcarine fissure in the familiar face and object networks may have resulted from a greater degree of mental imagery generated for items with which participants are familiar (Klein et al., 2000; Lambert et al., 2002).

#### **HEMISPHERIC DIFFERENCES**

As the intrinsic functional connectivity of the perirhinal cortex and anterior hippocampus is similar in both hemispheres (Libby et al., 2012), and both left and right perirhinal cortex and anterior hippocampus were significantly more active for familiar vs. unfamiliar discriminations (Barense et al., 2011), it is not entirely clear why the effect of stimulus familiarity only significantly impacted the functional connectivity for these regions in the right hemisphere. As mentioned previously, the effect did approach significance in the left anterior hippocampus. However, it is possible that the automatic retrieval of semantic information associated with non-verbal stimuli in the current perceptual discrimination paradigm is slightly more lateralized to the right hemisphere. Previous studies have shown that the recognition of familiar faces and subsequent retrieval of person-related conceptual knowledge has a tendency to show rightward lateralization, being particularly associated with activation in the right anterior temporal lobes (Gainotti, 2013).

#### **CATEGORY SELECTIVITY**

The functional connectivity of each of the eight seed regions was not impacted in the same way by stimulus familiarity. The reason for this differential functional connectivity likely stems from some degree of stimulus category selectivity within these seed regions. Although all seeds were shown in Barense et al. (2011) to be sensitive in some manner to stimulus familiarity during perceptual discrimination, this varied for each seed depending on the stimulus category (i.e., faces vs. objects). More specifically, while the bilateral perirhinal cortex and anterior hippocampus were more active for familiar vs. unfamiliar stimuli in general (irrespective of whether the stimuli were faces or objects), the temporal pole was only sensitive to the familiarity of faces, and the posterior hippocampus was sensitive only to the familiarity of objects. Therefore, it seems likely that while all of these regions are involved to some extent in representing semantic information about the stimuli to be discriminated, this involvement is modulated by stimulus category and therefore the functional connectivity of these regions will not necessarily be identical.

#### **MEAN-CENTERED TASK PLS LV1 AND LV2: EFFECTS OF STIMULUS CATEGORY**

The mean-centered task PLS analysis, conducted as a preliminary step from which to extract signal intensity values for the functional connectivity analyses, also highlighted the main factors explaining the covariance between the functional neuroimaging data and the experimental design. This analysis identified regions associated with optimal combinations of the *familiar faces*, *unfamiliar faces*, *familiar objects*, and *unfamiliar objects* conditions. The first two latent variables primarily distinguished between specific stimulus categories irrespective of their familiarity. LV1 highlighted differences between the face and object conditions, and the existence of such differences is unsurprising given the numerous dissimilarities between faces and objects. LV2 distinguished the *unfamiliar objects* (greebles) condition from the remaining three conditions involving more everyday items, suggesting that there is a fundamental difference between discriminations involving a completely novel type of stimulus that has never been encountered before compared to discriminations involving stimuli to which participants have had some exposure—whether it be to the stimulus category in general (e.g., faces in general) or to the actual exemplars themselves (e.g., specific famous faces).

In contrast to these first two LVs, LV3 revealed differences in neural representation that resulted from the varying levels of stimulus familiarity, irrespective of stimulus type, which was our main *a priori* factor of interest. The relative importance of these three LVs indicates that while differences among stimulus categories account for a greater degree of covariance than differences in stimulus familiarity, both of these constructs make significant and simultaneous contributions.
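The core computation behind a mean-centered task PLS can be illustrated with a minimal sketch. It is not the pipeline used in the present study: the function name `mean_centered_task_pls`, the simulated data, and the two toy effect patterns are all assumptions made for illustration, and the permutation and bootstrap steps normally used to assess latent variables are omitted. The sketch averages the data within each condition, removes the grand mean across conditions for every voxel, and applies a singular value decomposition, so that each latent variable pairs a contrast across conditions with a voxel pattern, and the squared singular values give the share of covariance each latent variable accounts for.

```python
import numpy as np


def mean_centered_task_pls(data, condition_labels):
    """Toy mean-centered task PLS (illustrative helper, not a library API).

    data: array of shape (n_observations, n_voxels)
    condition_labels: one condition name per observation
    """
    labels = np.asarray(condition_labels)
    conditions = sorted(set(labels.tolist()))
    # Average all observations within each condition: (n_conditions, n_voxels)
    cond_means = np.vstack([data[labels == c].mean(axis=0) for c in conditions])
    # Mean-centering: subtract the grand mean across conditions for every voxel
    centered = cond_means - cond_means.mean(axis=0, keepdims=True)
    # SVD yields design saliences (u), singular values (s), voxel saliences (v)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Proportion of covariance attributed to each latent variable
    explained = s**2 / np.sum(s**2)
    return conditions, u, s, vt.T, explained


# Simulated data (hypothetical): a strong category effect and a weaker
# familiarity effect expressed in separate sets of voxels.
rng = np.random.default_rng(0)
n_subjects, n_voxels = 20, 50
category = np.zeros(n_voxels)
category[:20] = 3.0
familiarity = np.zeros(n_voxels)
familiarity[20:40] = 1.0

design = {
    "familiar_faces": (+1, +1),
    "unfamiliar_faces": (+1, -1),
    "familiar_objects": (-1, +1),
    "unfamiliar_objects": (-1, -1),
}

rows, labels = [], []
for name, (cat_sign, fam_sign) in design.items():
    for _ in range(n_subjects):
        noise = rng.normal(0.0, 1.0, n_voxels)
        rows.append(cat_sign * category + fam_sign * familiarity + noise)
        labels.append(name)
data = np.vstack(rows)

conditions, u, s, v, explained = mean_centered_task_pls(data, labels)
for lv in range(len(s)):
    contrast = dict(zip(conditions, np.round(u[:, lv], 2)))
    print(f"LV{lv + 1}: explained={explained[lv]:.3f}, design saliences={contrast}")
```

In this toy example the larger simulated effect (category) dominates the first latent variable, mirroring the general logic by which latent variables are ordered by the amount of covariance they explain. Because mean-centering removes one degree of freedom, four conditions yield at most three informative latent variables.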

#### **MEAN-CENTERED TASK PLS LV3: EFFECTS OF STIMULUS FAMILIARITY**

The focus of the current study was on the familiarity-related effects identified in the third LV. Nevertheless, some of the regions identified in the first latent variable were also relevant to this issue. The first LV identified the anterior hippocampus and anterior lateral temporal cortex as being particularly involved in making perceptual discriminations among familiar faces, relative to discriminations among familiar objects. The fact that this activation was located more anteriorly in the temporal lobes supports the findings of previous studies demonstrating some degree of stimulus selectivity along the longitudinal axis of the MTL, such that anterior regions are more content-general, responding to face, object, and some scene stimuli, while posterior regions respond selectively to scenes (Litman et al., 2009; Liang et al., 2012). Moreover, the anterior hippocampal and anterior lateral temporal activation seen in response to familiar faces corroborates the findings of Trinkler et al. (2009), who found that activation in the anterior hippocampus and anterior middle temporal gyrus was associated with greater pre-experimental knowledge about face stimuli, and that anterior hippocampal activation increased with the degree to which the faces were personally-known to participants. The anterior hippocampal activation seen for the *familiar faces* condition in the present study may therefore reflect the retrieval of episodic autobiographical information associated with the famous individuals whose faces were presented. While the current implication of the anterior hippocampus in perceptual discrimination of familiar faces is consistent with previous findings (e.g., Lee et al., 2008), it is at odds with a recent suggestion that a bias toward pattern completion processes in the anterior hippocampus renders it of no use for fine perceptual discrimination tasks (Poppenk et al., 2013). 
The perirhinal cortex alone does in fact appear to be capable of making such discriminations among object and face stimuli (Barense et al., 2010a), and the involvement of the anterior hippocampus may therefore simply stem from its strong connectivity with the perirhinal cortex (Kahn et al., 2008; Libby et al., 2012). However, the repeated implication of the anterior hippocampus in the perceptual discrimination of faces suggests the possibility of an additional, as yet unspecified, role for the anterior hippocampus in such tasks.

The third latent variable implicated large sections of the MTL in perceptual discriminations of familiar stimuli in general. For the combination of the *familiar faces* and *familiar objects* conditions, activation was localized bilaterally along the entire extent of the parahippocampal gyrus, extending into the anterior hippocampus, perirhinal cortex, and slightly into the posterior hippocampus, as well as in the temporal poles. These regions are generally consistent with those identified as showing greater activity during the familiar vs. unfamiliar conditions in the univariate general linear model (GLM) used by Barense et al. (2011), suggesting that the multivariate PLS and univariate GLM methods detected similar effects. Widespread involvement of the MTL in perceptual discrimination offers support for the proposition that aspects of this region serve as an extension of a representational hierarchy in the ventral visual stream (Cowell et al., 2010; Lee et al., 2012). Moreover, the observed sensitivity of the MTL to stimulus familiarity in perceptual discrimination suggests that these regions also represent semantic information (Murray and Bussey, 1999; Taylor et al., 2006; Barense et al., 2011).

#### **IMPLICATIONS FOR SEMANTIC DEMENTIA AND MTL AMNESIA**

The current findings offer insight into an intriguing and previously unexplained pattern of results from Barense et al. (2010b). As mentioned previously, this study found that two patient groups—amnesics with non-progressive MTL damage ("MTL amnesics") and patients with semantic dementia ("SD patients")—were both impaired relative to controls at perceptual discriminations of complex and visually similar stimuli. However, only the controls and MTL amnesics showed a benefit from stimulus familiarity when making such discriminations. By contrast, the SD patients failed to show this facilitation from the use of familiar stimuli, likely because they were unable to engage support from their impaired semantic system. Nonetheless, the neuroanatomical correlates of these behavioral differences were unclear. The two patient groups had largely overlapping MTL damage, particularly in the perirhinal cortex. Additionally, although Barense et al. (2011) illustrated that multiple MTL subregions were sensitive to stimulus familiarity during perceptual discrimination, none of these regions were clearly more damaged in the SD patients than the MTL amnesics. The current results suggest that even though the ability to discriminate between items with overlapping features depends heavily upon the perirhinal cortex, this structure (and those structures with which it is closely connected, such as the anterior hippocampus) receives relevant input from other functionally-connected regions depending on the familiarity of the stimuli to be discriminated. Both the SD patients and the MTL amnesics had damage to the perirhinal cortex, which impaired their discrimination performance, but the SD patients' additional damage to anterior and lateral temporal cortex, identified here as being functionally connected with the perirhinal cortex and anterior hippocampus for familiar stimuli, may have prevented them from benefitting from the support provided by intact access to semantic memory.

#### **CONCLUSIONS**

In summary, during a perceptual discrimination task in which participants selected the odd-one-out from a set of three complex faces or objects with many shared visual features, the functional connectivity of the right perirhinal cortex and right anterior hippocampus was significantly modulated by the degree to which participants were familiar with the stimuli being discriminated. This was the case despite the fact that the task did not explicitly require the retrieval of any conceptual information about the items to be discriminated and could be completed without drawing upon semantic memory. For familiar relative to unfamiliar faces and objects, both the right perirhinal cortex and right anterior hippocampus showed enhanced functional correlations with a network of regions associated with semantic knowledge. These findings illustrate that experience and expertise with particular classes of objects influence not only the location of their neural representation (McKeeff et al., 2010; McGugin et al., 2012), but also the functional interactions of these representations with broader whole-brain networks. The results have potential implications for semantic dementia patients, as they suggest that it was the patients' inability to engage a network of regions including the lateral temporal cortex and the temporal pole, as opposed to localized damage in the perirhinal cortex or hippocampus, that impaired their ability to benefit from the use of familiar stimuli in non-semantic perceptual tasks (Binder and Desai, 2011). Future research in semantic dementia showing diminished interactions between perirhinal cortex, anterior hippocampus, and anterior/lateral temporal regions relative to normal controls during perceptual discrimination tasks would support this hypothesis.

#### **AUTHOR CONTRIBUTIONS**

Morgan D. Barense was responsible for the original experimental design, programming, and fMRI data collection. The data analysis was completed by Victoria C. McLelland with assistance from David Chan and Morgan D. Barense. Victoria C. McLelland interpreted the results and wrote the paper with input from Morgan D. Barense, Susanne Ferber, and David Chan.

#### **ACKNOWLEDGMENTS**

Morgan D. Barense and Susanne Ferber are supported by Discovery Grants from the Natural Sciences and Engineering Research Council of Canada and operating grants from the Canadian Institutes of Health Research. We thank Richard Henson and Kim Graham for support of the original univariate fMRI study, which was funded by the UK Medical Research Council (WBSE U.1055.05.012.00001.01). We also thank Michael J. Tarr (Carnegie Mellon University) for providing the greeble stimuli.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 October 2013; accepted: 17 February 2014; published online: 04 March 2014.*

*Citation: McLelland VC, Chan D, Ferber S and Barense MD (2014) Stimulus familiarity modulates functional connectivity of the perirhinal cortex and anterior hippocampus during visual discrimination of faces and objects. Front. Hum. Neurosci. 8:117. doi: 10.3389/fnhum.2014.00117*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 McLelland, Chan, Ferber and Barense. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Beyond perceptual expertise: revisiting the neural substrates of expert object recognition

#### **Assaf Harel\*, Dwight Kravitz and Chris I. Baker**

Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA

#### **Edited by:**

Merim Bilalic, University Tübingen, Germany

#### **Reviewed by:**

Guillermo Campitelli, Edith Cowan University, Australia

Elinor McKone, Australian National University, Australia

#### **\*Correspondence:**

Assaf Harel, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, 10 Center Drive, Bethesda, MD 20892, USA

e-mail: assaf.harel@nih.gov

Real-world expertise provides a valuable opportunity to understand how experience shapes human behavior and neural function. In the visual domain, the study of expert object recognition, such as in car enthusiasts or bird watchers, has produced a large, growing, and often-controversial literature. Here, we synthesize this literature, focusing primarily on results from functional brain imaging, and propose an interactive framework that incorporates the impact of high-level factors, such as attention and conceptual knowledge, in supporting expertise. This framework contrasts with the perceptual view of object expertise that has concentrated largely on stimulus-driven processing in visual cortex. One prominent version of this perceptual account has almost exclusively focused on the relation of expertise to face processing and, in terms of the neural substrates, has centered on face-selective cortical regions such as the Fusiform Face Area (FFA). We discuss the limitations of this face-centric approach as well as the more general perceptual view, and highlight that expert-related activity is: (i) found throughout visual cortex, not just FFA, with a strong relationship between neural response and behavioral expertise even in the earliest stages of visual processing, (ii) found outside visual cortex in areas such as parietal and prefrontal cortices, and (iii) modulated by the attentional engagement of the observer, suggesting that it is neither automatic nor driven solely by stimulus properties. These findings strongly support a framework in which object expertise emerges from extensive interactions within and between the visual system and other cognitive systems, resulting in widespread, distributed patterns of expertise-related activity across the entire cortex.

**Keywords: expertise, object recognition, visual perception, fMRI, review, visual cortex**

#### **WHAT IS EXPERTISE AND WHY IS IT IMPORTANT TO STUDY IT?**

Understanding the impact of experience on human behavior and brain function is a central and longstanding issue in psychology and neuroscience. One approach to this question has been to investigate people with exceptional skill, or expertise, in various domains (e.g., chess, wine-tasting, bird watching) and determine how expert processing and the neural substrates supporting it differ from those in novices. Most broadly, expertise is defined as consistently superior performance within a specific domain relative to novices and relative to other domains (Ericsson and Lehmann, 1996). For example, top soccer players, such as Cristiano Ronaldo, may excel at kicking soccer balls but not at pitching baseballs.<sup>1</sup> While there are many possible domains of expertise engaging diverse facets of human cognition, including perception, attention, memory, problem solving, motor coordination and action (Ericsson et al., 2006), they all provide an opportunity to study the effect of some of the most extreme and prolonged naturally occurring forms of experience on neural function.

In this article, we will focus on expert visual object recognition, which is an acquired skill certain people show in discriminating between similar members of a homogeneous object category, a particularly demanding perceptual task (Jolicoeur et al., 1984; Hamm and Mcmullen, 1998). Face recognition is, arguably, the quintessential example of object expertise, as almost all humans have extensive experience with faces and show remarkable face recognition abilities (Carey, 1992; Tanaka, 2001; although see evidence of significant individual differences: Bowles et al., 2009; Zhu et al., 2010; Wilmer et al., 2012). However, some individuals develop expertise for other very specific object categories. For example, ornithologists are very adept at identifying different types of birds, which all share common features (e.g., feathers, beak) but are distinct from other animals (Rosch et al., 1976; Johnson and Mervis, 1997; see **Figure 1A** for examples of different domains of object expertise). Such expertise may extend into even more homogeneous groups such as different kinds of wading birds (Johnson and Mervis, 1997; Tanaka et al., 2005). At an even more specific level, dog show judges have enhanced recognition of individual dogs only within the particular breeds they are familiar with (Diamond and Carey, 1986; Robbins and Mckone, 2007). Similarly, car experts can distinguish between different car models (Bukach et al., 2010; Harel et al., 2010), or make ad hoc distinctions, such as between Japanese and European cars (Harel and Bentin, 2013), even across variations in color, view and orientation. However, this car expertise does not extend to other similar domains, such as other modes of transportation (e.g., airplanes) (**Figure 1B**).

<sup>1</sup>http://www.washingtonpost.com/blogs/soccer-insider/wp/2013/08/01/soccer-and-society-cristiano-ronaldo-juggles-baseball-at-dodger-stadium/

**FIGURE 1 | Expert visual object recognition**. **(A)** Expertise in visual object recognition has been demonstrated in several domains, including cars (e.g., Kanwisher, 2000; Rossion and Curran, 2010; Harel and Bentin, 2013), dogs (Diamond and Carey, 1986; Robbins and Mckone, 2007), birds (e.g., Johnson and Mervis, 1997; Kanwisher, 2000), x-rays (Harley et al., 2009), fingerprints (Busey and Vanderkolk, 2005), and chess (e.g., Krawczyk et al., 2011; Bilalic et al., 2012). **(B)** Discrimination performance of car experts and car novices with cars and airplanes. Relative to naïve observers (novices), car experts are very good at telling whether two car images varying in color, view and orientation are of the same model or not. However, when these car experts have to perform a similar task with airplane images, their performance drops dramatically and is as poor as that of novices. This exemplifies the definition of expertise as consistently superior performance within a domain relative to other people and other domains. Figure adapted from Harel et al. (2010). **(C)** A schematic representation of the different levels of visual representation that may be modified by expertise (simple features, intermediate complexity features, holistic and conceptual representations). Here we highlight the interaction between these different representational levels in the visual system. There will be further interactions between visual representations and the higher-level conceptual system representing domain-specific knowledge.

In this article, we primarily focus on the mechanisms that support expert visual object recognition through an examination of their neural correlates. We argue that the neural substrates of expert object recognition are not discretely localized in visual areas but distributed (e.g., Haxby et al., 2001) and highly interactive (e.g., Mahon et al., 2007), with the specific regions engaged defined by the domain of object expertise and the particular information utilized by the expert (Op de Beeck and Baker, 2010a,b; Van Der Linden et al., 2014). Through experience, this information comes to be extracted and processed through specific observer-based mechanisms both within the visual system (e.g., tuning changes) and between visual regions and extrinsic systems, key amongst which are those supporting long-term conceptual knowledge and top-down attention (**Figure 1C**). More broadly, we suggest that such interplay between different neural systems is a common feature of all forms of expertise. This interactive framework contrasts with the view of expertise as a predominantly sensory or perceptual skill supported by automatic stimulus-driven processes localized within category-selective visual regions in occipitotemporal cortex (e.g., Bukach et al., 2006).

We will first describe the perceptual view of visual object expertise, contrasting it with an interactive view, before focusing on the face-centric account of expert object recognition. This account has had a large influence on the field of object expertise, but we will highlight its major theoretical and empirical limitations. Finally, we will discuss evidence in favor of an interactive account and conclude by suggesting how this account can be generalized to explain other forms of expertise.

#### **THE NEURAL SUBSTRATES OF EXPERT VISUAL OBJECT RECOGNITION**

#### **PERCEPTUAL VIEW OF EXPERTISE**

What underlies expertise in object recognition? Since the hallmark of expert object recognition is making very fine discriminations between similar stimuli, one intuitive possibility is that expert object recognition primarily entails changes to sensory or perceptual processing (Palmeri et al., 2004). Thus, attaining any form of visual expertise should be supported primarily by qualitative changes in processing within specific regions of visual cortex (Palmeri and Gauthier, 2004). We refer to this notion as the perceptual view of expertise. To the extent that any changes affect the bottom-up, sensory processing of visual information, expert processing under this perceptual view is automatic and stimulus-driven, with little impact of attention, task demands, or other higher-level cognitive factors (Tarr and Gauthier, 2000; Palmeri et al., 2004).<sup>2</sup>

This perceptual view of expertise is supported by the experience-dependent changes in neural tuning in areas of visual cortex reported in studies of perceptual learning (e.g., Karni and Sagi, 1991), that is, "practice-induced improvement in the ability to perform specific perceptual tasks" (Ahissar and Hochstein, 2004). For example, neurons in early visual areas (V1–V4) have been reported to show stronger responses and narrower orientation tuning curves following extensive training on orientation discrimination tasks (e.g., Monkey: Schoups et al., 2001; Yang and Maunsell, 2004. Human: Schiltz et al., 1999; Schwartz et al., 2002; Furmanski et al., 2004; Yotsumoto et al., 2008; for a recent review see Lu et al., 2011). Further, studies of long-term training with artificial objects in both human (e.g., Op de Beeck et al., 2006; Yue et al., 2006; Wong et al., 2009b; Zhang et al., 2010) and non-human primates (e.g., Kobatake et al., 1998; Op de Beeck et al., 2001; Baker et al., 2002; Woloszyn and Sheinberg, 2012) have revealed specific changes in the response of high-level visual cortex such as increases or decreases in response magnitude and increased selectivity for trained objects and task-relevant stimulus dimensions (for review, see Op de Beeck and Baker, 2010b). For example, Op de Beeck et al. (2006) trained human subjects for approximately 10 h to discriminate between exemplars in one of three novel object classes ("smoothies", "spikies", and "cubies"). Comparison of fMRI data before and after training revealed training-dependent increases and decreases in response across distributed areas of occipitotemporal cortex.

<sup>2</sup>Note that the perceptual view of expertise does not claim that the task used for the training is irrelevant (in fact it is critical for inducing expertise, see Tanaka et al., 2005), but rather, that real-world experts (who are superior in within-category discrimination) automatically try to individuate objects from their domain of expertise irrespective of the task at hand. In other words, once experts master the ability to individuate exemplars, they cannot view their objects of expertise without attempting to individuate them. Thus, one should distinguish between task-specific learning effects (e.g., Tanaka et al., 2005; Wong et al., 2009b) and task-dependence following expertise training (e.g., Rhodes et al., 2004).

#### **INTERACTIVE VIEW OF EXPERTISE**

While these perceptual learning and training studies demonstrate changes in visual cortex with experience, such visual perceptual experience is only one aspect of real-world object expertise. Objects, particularly real-world natural objects, embody rich information not only in terms of their appearance, but also in their function, motor affordances, and other semantic properties.<sup>3</sup> Given these extended properties, the cortical representations of objects can be considered conceptual and distributed rather than sensory and localized (Mahon et al., 2007; Martin, 2009; Carlson et al., 2014). Experts and novices are distinguished by differences in these conceptual associations, since long-term real-world expert object recognition is accompanied by the ability to access relevant and meaningful conceptual information that is not available to non-experts (Johnson and Mervis, 1997; Barton et al., 2009; Harel and Bentin, 2009; Gilaie-Dotan et al., 2012). However, conceptual properties of objects have not typically been manipulated in training studies such as those described above (but see Gauthier et al., 2003; Weisberg et al., 2007). Thus, in the acquisition of expertise, conceptual knowledge develops, along with other observer-based high-level factors (e.g., autobiographical memories, emotional associations), in conjunction with experience-dependent changes in perceptual processing (Johnson and Mervis, 1997; Johnson, 2001; Medin and Atran, 2004), leading to a correlation between discrimination ability and conceptual knowledge within the domain of expertise (Barton et al., 2009; Dennett et al., 2012; McGugin et al., 2012a).

A complete account of real-world expert object recognition cannot ignore these factors, and must specify how stimulus-based sensory-driven processing interacts with observer-based high-level factors. For example, the expert's increased knowledge and engagement may guide the extraction of diagnostic visual information, which in turn, may be used to expand existing conceptual knowledge. We refer to this experience-based interplay between conceptual and perceptual processing as the interactive view of expertise. This interactive view of expertise contrasts with the perceptual view of expertise (i.e., as automatic, domain-specific, and attention-invariant) and echoes a more general view of visual recognition as an interaction between stimulus information ("bottom-up") and observer-based cognitive ("top-down") factors such as goals, expectations, and prior knowledge (Schyns, 1998; Schyns et al., 1998; Lupyan et al., 2010). It is important to note that while the interactive view does not support a strict stimulus-driven view of expert processing, it also does not suggest that the effects of experience are driven solely by top-down factors that operate independently of the perceptual processing in sensory cortex (for such a view, see Pylyshyn, 1999). Rather, we argue that expertise arises from the interaction of sensory-driven and observer-based processing.

In terms of natural experience, faces perhaps best exemplify the combination of visual and conceptual properties that underlie object expertise. Faces are not only a distinct category of stimulus within which we make fine-grained discriminations, but are also typically associated with rich social, biographic and semantic information. Thus, faces seem the ideal domain to study real-world expert object recognition. And indeed, such considerations have led to an approach of studying expertise through the prism of face recognition. However, somewhat unfortunately, this approach has been dominated by the perceptual approach to expertise, focusing almost entirely on the visual aspects of processing while ignoring the influences of higher-level cognitive factors on the visual processing. We discuss this perceptually dominated face account of expertise in the following section, before presenting our interactive view of expertise in greater detail.

#### **THE FACE ACCOUNT OF EXPERT OBJECT RECOGNITION AND FUSIFORM FACE AREA (FFA)**

Face perception shows a number of specific behavioral markers (e.g., stronger effects of inversion; Yin, 1969) that are not typically observed for other categories of visual stimuli and are thought to reflect specialized processing mechanisms. However, it has been claimed that some of these same markers can be observed for expert object recognition, leading to the suggestion that face processing and expertise share a common mechanism. In their seminal paper, Diamond and Carey (1986) reported that dog experts display a similar decrement in recognition of inverted compared to upright dogs (but see Robbins and Mckone, 2007 for a failure to replicate). They reasoned that the inversion effect emerges if three conditions are met: (1) members of an object category must share a prototypical configuration of parts; (2) it must be possible to individuate the members of the category on the basis of second-order relational features (spatial relation of the parts relative to their prototypical arrangement); and (3) the observers must have the expertise to exploit such features. According to this perceptual theory of expertise, acquiring expertise in object recognition leads to a unique mode of perceptual processing, namely, transitioning from a feature-based mode of processing into what is often referred to as a "holistic" mode of processing.<sup>4</sup> Consequently, this processing strategy was suggested to underlie expertise with objects in general (Gauthier et al., 2003).

<sup>3</sup>One of the most striking examples of the importance of semantic information to object recognition comes from visual associative agnosia, in which patients show intact shape processing but are unable to connect it to visual knowledge of the object (McCarthy and Warrington, 1986; Farah, 2004).

In this context, many studies have compared expert and face processing to provide insight into the mechanisms of object expertise. When experts view objects from their domain of expertise, some studies have reported effects analogous to those found with faces. These include behavioral (Gauthier and Tarr, 1997, 2002), electrophysiological (Tanaka and Curran, 2001; Rossion et al., 2002; Gauthier et al., 2003; Scott et al., 2008) and neuroimaging (Gauthier et al., 1999, 2000) measures. However, other studies find conflicting results (Carmel and Bentin, 2002; Xu et al., 2005; Robbins and Mckone, 2007; Harel and Bentin, 2013) and much of the evidence supporting the face account of expertise is controversial. In particular, it has been argued that the data presented in these studies is not sufficient to conclude that object expertise engages the same mechanisms as face perception (for detailed discussion see McKone and Kanwisher, 2005; McKone et al., 2007; McKone and Robbins, 2011). Here, we will focus on the neuroimaging evidence on expertise, which has predominantly investigated the role of the Fusiform Face Area (FFA; Kanwisher et al., 1997), a region in ventral temporal cortex that responds more when people view faces compared to other objects.

Broadly, there are two possible accounts of the face selectivity in FFA: (i) Stimulus-driven: this region is specialized for processing faces only (Kanwisher, 2010)<sup>5</sup>; or (ii) Process-driven: this region is specialized for a specific computation (i.e., holistic processing) that is recruited when processing faces but can also be recruited for any object of expertise (Tarr and Gauthier, 2000). Under this process-driven account, any category of objects that share a prototypical configuration of features and require experience to discriminate between its members will engage the FFA (Gauthier and Tarr, 2002; McGugin et al., 2012b).

Supporting the process-driven account, Gauthier and colleagues reported that FFA showed a higher response to objects of expertise than to other everyday objects both in real-world experts (bird and car experts) (Gauthier et al., 2000; see also Xu, 2005) and in laboratory-trained experts with novel objects—"Greebles" (Gauthier et al., 1999). They suggested that FFA is recruited whenever expert fine discriminations among homogeneous stimuli are required. Thus, the expertise-enhanced response of FFA was suggested to be: (i) specific to categories with exemplars sharing a prototypical configuration of parts and (ii) independent of visual shape, as the increase in response was found for diverse objects of expertise (Greebles, cars, and birds). Later studies reported similar response enhancement in FFA (or in its vicinity) using chess configurations in chess experts (Bilalic et al., 2011; Righi et al., 2010). Response enhancement in FFA was also observed in children who were experts with Pokémon cartoon characters but not for Digimon characters with which they had no expertise (James and James, 2013).

However, the claim that the FFA supports expert object recognition is highly contested. In particular, many studies have failed to find an increased response to objects of expertise in FFA: with real-world expertise (Grill-Spector et al., 2004; Rhodes et al., 2004; Krawczyk et al., 2011), with short-term laboratory training (Op de Beeck et al., 2006; Yue et al., 2006) and even with the Greeble stimuli used in the original studies (Brants et al., 2011). Further, the presence of any expertise effect in FFA may reflect the perceived nature of the stimuli, particularly their resemblance to faces (Op de Beeck et al., 2006; for a discussion, see Sheinberg and Tarr, 2010).

Beyond these empirical concerns, it is important to note that, while this perceptual face-centric approach has generated a considerable body of research, it has major theoretical drawbacks for understanding the general nature of expert object recognition. These limitations are particularly evident in neuroimaging, where the theoretical discussion of the neural substrates of expert object recognition has seemingly been reduced to the question of whether FFA is critically engaged in expertise (Xu, 2005; Bilalic et al., 2011; McGugin et al., 2012b) or not (Grill-Spector et al., 2004; Rhodes et al., 2004; Krawczyk et al., 2011), largely ignoring any neural signatures of expert object recognition beyond FFA that are nonetheless unique to expertise. In fact, even faces themselves elicit selective activation in many more regions than just the FFA, recruiting a whole network of cortical regions including the Occipital Face Area (OFA), Superior Temporal Sulcus (STS), Anterior Temporal Lobe (ATL), Ventrolateral Prefrontal Cortex (VLPFC), and the amygdala (for a review see Haxby and Gobbini, 2011). Further, information about faces is not restricted to these face-selective regions but is distributed across the ventral occipitotemporal cortex (Haxby et al., 2001; Susilo et al., 2010). All these regions may be highly relevant to different aspects of face expertise, for example distinguishing facial expressions supported by STS (Said et al., 2010; Pitcher et al., 2011), accessing information about unique identity invariant to visual transformations supported by ATL (Quiroga et al., 2005; Simmons et al., 2010), and processing of specialized facial features, such as the eyes, supported by VLPFC (Chan and Downing, 2011; for a review see Chan, 2013).

Thus, there is little theoretical justification for focusing solely on the FFA when many other regions, including those outside visual cortex, show the ability to support expertise with faces. Indeed, while faces are certainly a central domain of human visual expertise, there are no a priori reasons why the unique characteristics associated with their perceptual processing (such as holistic processing or activation of the FFA) should serve as a benchmark for all domains of object expertise. More generally, as we discuss in the next section, there is ample evidence that the neural manifestations of object expertise can be found not only in visual cortex, but also in many other cortical areas.

<sup>4</sup>Broadly defined, holistic processing refers to the calculation of the relations between the parts of the object rather than the piecemeal processing of individual object parts (for a review see Maurer et al., 2002). The term holistic is notorious in the face perception literature for its many definitions and associations (Gauthier and Tarr, 2002). In the present article, we use the term holistic in its most general, inclusive sense subsuming first- and second-order configural representations as well as holistic (integral) processing.

<sup>5</sup>Although the stimulus-driven account has often been linked to the notion of innate face processing, this is a separate issue. This account does not reject a role of experience, but suggests that experience contributes to the formation of stimulus-driven representations.

#### **BEYOND FUSIFORM FACE AREA (FFA): EVIDENCE FOR THE BROADLY DISTRIBUTED NATURE OF EXPERTISE**

Despite the strong focus on FFA in the perceptual account of expertise, it is clear that expertise-related activations for nonface objects are found outside FFA and even outside other face-selective regions. In fact, even the early fMRI studies of Gauthier and colleagues revealed expertise-related activations in the face-selective OFA and in other regions of occipitotemporal cortex including object-selective Lateral Occipital Complex (LOC; Malach et al., 1995), and scene-selective Parahippocampal Place Area (PPA; Epstein and Kanwisher, 1998). Subsequent fMRI studies of expert object recognition also reported expertise-specific activity outside of FFA (Harley et al., 2009; Krawczyk et al., 2011), and long-term training with artificial objects has been reported to elicit changes in many parts of occipitotemporal cortex (Op de Beeck et al., 2006; Yue et al., 2006; Wong et al., 2009b; Brants et al., 2011; Wong et al., 2012) as well as in areas outside visual cortex such as STS (Van Der Linden et al., 2010), posterior parietal cortex (Moore et al., 2006) and prefrontal cortex (Moore et al., 2006; Jiang et al., 2007; Van Der Linden et al., 2014).

To test the full extent of the neural substrates of expert object recognition across the entire brain, Harel and colleagues presented car expert and novice participants with images of cars, faces, and airplanes while they performed a standard one-back task, requiring detection of image repeats (Harel et al., 2010, Experiment 1). Directly contrasting the car-selective activation (cars vs. airplanes) of the car experts with that of the novices revealed widespread effects of expertise, which encompassed not only occipitotemporal cortex, but also retinotopic early visual cortex as well as areas outside of visual cortex including the precuneus, intraparietal sulcus, and lateral prefrontal cortex (**Figure 2A**). These distributed effects of expertise suggest the involvement of non-visual factors, such as attention, memory and decision-making in expert object recognition (Harel et al., 2010; Krawczyk et al., 2011; Bilalic et al., 2012). Note that these patterns of activation represent the *interaction* between object category and group (experts/naïve observers), that is, they reflect car-selective activity that is greater in experts relative to novices. Thus the expert modulation of early visual cortex cannot be explained away by suggesting that low-level differences in the categories compared are driving the effect (McGugin et al., 2012b). Further, the lack of a difference in activation for faces between the experts and novices argues against a general motivational explanation.
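The logic of this group-by-category interaction can be sketched with a small numerical example. The Python snippet below is only an illustration: the function names and all response values are hypothetical and are not data or analysis code from Harel et al. (2010). It computes car selectivity (cars minus airplanes) per participant, contrasts the group means, and shows that a category difference shared by both groups, such as a low-level image difference, cancels out of the interaction.

```python
from statistics import mean


def selectivity(cars, airplanes):
    """Per-participant car selectivity: response to cars minus response to airplanes."""
    return [c - a for c, a in zip(cars, airplanes)]


def interaction_contrast(expert_cars, expert_planes, novice_cars, novice_planes):
    """Group x category interaction: mean expert selectivity minus mean novice selectivity."""
    return mean(selectivity(expert_cars, expert_planes)) - mean(
        selectivity(novice_cars, novice_planes)
    )


# Hypothetical response magnitudes (arbitrary units), one value per participant.
expert_cars, expert_planes = [2.0, 2.4, 2.2], [1.0, 1.2, 1.1]
novice_cars, novice_planes = [1.5, 1.4, 1.6], [1.3, 1.3, 1.4]

baseline = interaction_contrast(expert_cars, expert_planes, novice_cars, novice_planes)

# Assumed low-level boost that car images receive in every observer, expert or novice.
boost = 0.5
shifted = interaction_contrast(
    [c + boost for c in expert_cars], expert_planes,
    [c + boost for c in novice_cars], novice_planes,
)

print(f"interaction without boost: {baseline:.3f}")
print(f"interaction with boost:    {shifted:.3f}")
```

Because the boost is added to the car response in both groups, it raises selectivity equally in experts and novices and leaves the interaction unchanged, which is the reasoning behind the claim that low-level category differences alone cannot produce the expert effect.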

The work discussed so far has focused on the activation differences between experts and novices at a group level. However, recently it has also been suggested that the critical test of the involvement of a region in object expertise is whether its response to objects of expertise correlates with the degree of expertise (Gauthier et al., 2005; Harley et al., 2009). Using this criterion, McGugin et al. (2012b), in a high-resolution fMRI study at 7T, reported that car selectivity in FFA correlates with car expertise (but see Grill-Spector et al., 2004 for a conflicting result). While these data, if taken alone, would appear to support the process-driven account of FFA, the focus on FFA may again be misleading. Importantly, significant correlations were found in many areas outside occipitotemporal cortex including lingual gyrus and precuneus, strongly resembling the spatial distribution of expert activations of Harel et al. (2010; **Figures 2A, B**). Furthermore, within visual areas, significant correlations between car selectivity and expertise were found not only in face-selective voxels but also in non-selective voxels. Overall, if correlation between degree of expertise and response to objects of expertise is the critical marker for the neural substrates of expertise, these results suggest the involvement of a number of distributed regions and suggest no privileged status of face selectivity.

While the correlation findings of McGugin and colleagues suggest widespread effects of expertise, due to the nature of the high-resolution scanning the imaged volume was restricted to parts of occipitotemporal cortex. Importantly, data was not acquired from early visual cortex, a region implicated in expertise effects by Harel and colleagues. To replicate the findings of McGugin and colleagues and see if the correlation effects extend even to early visual cortex (suggesting task-based attentional modulation of visual activity: Watanabe et al., 1998), data from Harel et al. (2010) was re-analyzed by computing the correlation between a behavioral measure of expertise (pooled across car experts and novices) and the response to cars in a number of functionally defined regions in visual cortex (Harel et al., 2012). Not only was a positive correlation found in FFA, but also in scene-selective PPA and object-selective LOC. Critically, a positive correlation was also found in early visual cortex, highlighting a general tendency across cortex for car selectivity to correlate with behavioral expertise (**Figure 2C**). Together, these results suggest that even when considering the specific correlation between activity and level of expertise, the neural basis of visual expertise is not relegated to specific "hot spots" in high-level visual cortex such as FFA (or any other single localized region, for that matter), but is rather manifest in a widespread pattern of activity specific to the domain of expertise, which may reflect the engagement of large-scale top-down attentional networks (Downar et al., 2001; Corbetta and Shulman, 2002).
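A region-by-region correlation of this kind can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions: the expertise index, the per-region car responses, and the region labels used as dictionary keys are all hypothetical values invented for the sketch, and the plain Pearson coefficient stands in for whatever statistic the original re-analysis used.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [xi - mx for xi in x]
    dy = [yi - my for yi in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / denom


# Hypothetical behavioral expertise index per participant (experts and novices pooled).
expertise = [0.1, 0.3, 0.4, 0.9, 1.2, 1.5]

# Hypothetical car response per participant in each functionally defined region.
car_response = {
    "EVC": [0.2, 0.3, 0.3, 0.6, 0.7, 0.9],
    "LOC": [0.5, 0.4, 0.7, 0.9, 1.1, 1.2],
    "FFA": [0.3, 0.5, 0.4, 0.8, 1.0, 1.3],
    "PPA": [0.1, 0.2, 0.4, 0.5, 0.6, 0.8],
}

correlations = {roi: pearson_r(expertise, resp) for roi, resp in car_response.items()}
for roi, r in correlations.items():
    print(f"{roi}: r = {r:.2f}")
```

Pooling experts and novices, as in the text, treats expertise as a continuous variable; a positive coefficient in every region, rather than in FFA alone, is the pattern that would argue against a single expertise "hot spot".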

These findings of widespread expertise effects across the cortex argue strongly against the perceptual view of expertise and instead support a framework in which a wide variety of different regions and processes generate expert performance. This characterization is in keeping with the critical role that non-perceptual factors play in distinguishing experts from novices. Having discussed the evidence for the engagement of both stimulus-driven *and* high-level cortical regions, we now turn to studies demonstrating how their interaction supports expertise.

#### **BEYOND PERCEPTION: EVIDENCE FOR THE INTERACTIVE NATURE OF EXPERTISE**

The interactive view of object expertise proposes that expert object recognition depends on both sensory, stimulus-driven processing and higher-level cognitive factors, with a critical interaction between these processes, whereby the expert's increased knowledge and attention guide the extraction of diagnostic visual information. Indeed, we suggest that a theory of expert object recognition cannot be complete without taking both perceptual and top-down contributions into account. Evidence for this interaction comes from behavioral and neuroimaging studies from various domains of visual expertise that involve interactions among diverse high-level cognitive processes, particularly task-based attentional engagement and domain-specific conceptual knowledge. We first focus on two of the domains of expertise that have been most intensively investigated (cars, chess), followed by a brief review of other domains of expertise, focusing in particular on spatial navigation and reading.

#### **EXAMPLE OF INTERACTIONS WITH TASK-BASED ATTENTION IN CAR EXPERTISE**

As noted above, the expertise effects found in Harel et al. (2010), Experiment 1 are so widespread that it seems most plausible that they reflect some non-specific effect, such as the increased level of top-down engagement that the experts have with their objects of expertise. For example, experts may direct more attention to their objects of expertise (Hershler and Hochstein, 2009; Golan et al., 2014), leading both to the increased activation observed inside (Kanwisher, 2000; McKone et al., 2007) and outside (Harel et al., 2010) FFA. Thus, an alternative account is that the enhanced activation observed for objects of expertise reflects a top-down attentional effect rather than the operation of an automatic stimulus-driven perceptual mechanism (Harel et al., 2010).

To directly test the role of attention in expertise, Harel et al. (2010), Experiment 2 explicitly manipulated the attentional engagement of both car experts and novices. Participants were presented with interleaved images of cars and airplanes but were instructed to attend only to cars in one half of the trials, and to attend only to airplanes in the other half of the trials, responding whenever they saw an immediate image repeat in the attended category only. A purely perceptual view of expertise as an automatic process would predict that the spatial extent of expert car-selective activation would be similar in both conditions, that is, irrespective of the engagement of the experts (Gauthier et al., 2000; Tarr and Gauthier, 2000). Contrary to this prediction, experts showed widespread selectivity for cars only when they were task-relevant (**Figure 3**, top row). When the same car images were presented, but were task-irrelevant, the car selectivity in experts diminished considerably, to the extent that there were almost no differences between the experts and novices (**Figure 3**, bottom row). These findings strikingly demonstrate that the neural activity characteristic of visual object expertise reflects the enhanced engagement of the experts rather than the mandatory operation of perceptual, stimulus-driven expert recognition mechanisms.

Further support for the role of attention comes from a behavioral study showing that expert categorization of even car fragments involves top-down mechanisms (Harel et al., 2011). Specifically, when car experts categorized car fragments of intermediate complexity varying in their diagnostic value (Ullman et al., 2002; Harel et al., 2007), they did not utilize the information differently from novices, as might have been expected had their perceptual representations changed, but rather showed a general enhancement of response speed, indicative of a general bias or attentional effect. Further, when car experts search for cars among other common objects, they show a more efficient deployment of attention to cars relative to other object targets. The efficiency of visual search can be assessed by calculating search slopes, that is, estimating the linear increase in search time as a function of the number of distractors displayed, with less efficient search resulting in a greater increase in reaction times with increasing display size (Wolfe, 1994). Accordingly, car experts showed shallower search slopes for objects from their domain of expertise relative to objects they are not experts with, suggesting a more efficient search (Hershler and Hochstein, 2009; Golan et al., 2014). Interestingly, the search for objects of expertise was still much less efficient than that for faces, which often results in nearly flat search slopes (Hershler and Hochstein, 2005), indicative of automatic and preattentive processing. This difference between non-face objects of expertise and faces is another demonstration that expertise in object recognition does not involve automatic perceptual processing.
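The search-slope measure described above amounts to fitting a straight line to reaction time as a function of display size. The Python sketch below shows one way to do this with ordinary least squares; the function name, the set sizes, and the reaction times are hypothetical and chosen only to illustrate a shallower slope for objects of expertise, not to reproduce any reported result.

```python
def search_slope(set_sizes, reaction_times):
    """Least-squares fit of RT = intercept + slope * set_size.

    Returns (slope, intercept); the slope is in RT units per displayed item.
    """
    n = len(set_sizes)
    if n != len(reaction_times) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(set_sizes) / n
    mean_y = sum(reaction_times) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    if sxx == 0:
        raise ValueError("set sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, reaction_times))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical mean reaction times (ms) at each display size for one car expert.
set_sizes = [4, 8, 16, 32]
rt_cars = [560, 640, 800, 1120]            # targets from the domain of expertise
rt_other_objects = [620, 780, 1100, 1740]  # targets outside the domain of expertise

slope_cars, _ = search_slope(set_sizes, rt_cars)
slope_other, _ = search_slope(set_sizes, rt_other_objects)

print(f"cars:          {slope_cars:.1f} ms/item")
print(f"other objects: {slope_other:.1f} ms/item")
```

A smaller slope means that each additional distractor costs less time, so comparing slopes within the same observers separates search efficiency from overall response speed, which is captured by the intercept.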

While these neuroimaging and behavioral findings highlight the importance of top-down attention in expertise, experts not only direct more attention to objects of expertise, they also engage in a multitude of other unique cognitive and affective processes, including accessing domain-specific knowledge. Ironically, the central role of top-down cognitive factors in object expertise can be illustrated in car expertise, a domain that has been extensively studied from a perceptual perspective (e.g., Gauthier et al., 2000, 2003, 2005; Rossion et al., 2007; Bukach et al., 2010; but see Harel and Bentin, 2013). However, car experts are also more knowledgeable about cars, both about their shape and function, often possessing highly specialized domain-specific knowledge (e.g., acceleration, horsepower). We suggest that this domain-specific conceptual knowledge interacts with and guides the extraction of visual information (**Figure 1C**). Several behavioral studies show that car discrimination ability is highly correlated with conceptual knowledge of cars (Barton et al., 2009; Dennett et al., 2012; McGugin et al., 2012a). These behavioral studies converge on the conclusion that car expertise integrates both visual and conceptual knowledge (for a similar conclusion, see Van Gulick and Gauthier, 2013).

Finally, in addition to the fMRI studies discussed above which highlight the role of attentional engagement in car expertise, evidence for the involvement of non-visual factors can also be found in a structural MRI study. Gilaie-Dotan et al. (2012) showed that car discrimination ability is positively correlated with increasing gray matter density in prefrontal cortex. This finding is in contrast to the prediction of the perceptual view of expertise of specific changes to category-selective regions in visual cortex.

Taken together, the behavioral, structural and functional imaging studies suggest that when experts view objects from their domain of expertise, they differ from novices not only in their stimulus-driven perceptual processing of the objects, but they also direct more attention to them and access domain-specific knowledge. It is important to note that the interactive view of expert object recognition does not exclude the involvement of perceptual mechanisms in expertise that may or may not engage the FFA. Rather, changes in brain activity induced by expertise with objects reflect a multitude of interacting factors, both stimulus-driven and observer-based.

#### **EXAMPLES OF INTERACTIONS WITH TASK-BASED ATTENTION AND PRIOR KNOWLEDGE IN CHESS EXPERTISE**

So far we have discussed evidence for the involvement of attention and conceptual knowledge in expertise; studies of chess expertise suggest that these two factors may operate in tandem. Chess employs multiple cognitive functions, including object recognition, conceptual knowledge, memory, and the processing of spatial configurations (Gobet and Charness, 2006). And while chess expertise has been associated with selective activations in visual cortex, and in particular FFA (Bilalic et al., 2011, but see Krawczyk et al., 2011; Righi et al., 2010), a multitude of cortical regions are reported to be active in chess experts when viewing chessboards (Bilalic et al., 2010, 2012; Krawczyk et al., 2011). Expert-related activity was found to be widespread, extending beyond visual cortex to include activations in collateral sulcus (CoS), posterior middle temporal gyrus (pMTG), occipitotemporal junction (OTJ), supplementary motor area (SMA), primary motor cortex (M1), and left anterior insula. These regions have been suggested to support pattern recognition, perception of complex relations, and action-related functional knowledge of chess objects (Bilalic et al., 2010). The exact nature of the interactions between the different areas supporting chess expertise is yet to be determined, especially how visual information is utilized and accessed by higher-level cognitive processes ubiquitous in chess, such as problem solving and decision-making.

Critically, Bilalic and colleagues demonstrated that task context and prior knowledge play an essential role in driving cortical activations in chess experts (Bilalic et al., 2010, 2011, 2012). The expert-specific pattern of activation manifested only when the task was specific to the domain of expertise (e.g., searching for particular chess pieces), and not when a comparable control task was used (i.e., a task that did not require the recognition of particular chess pieces) with identical visual input. In other words, there was little activity that distinguished experts and novices when they were not engaged, directly echoing the findings of Harel et al. (2010). Further, activity in some of the visual areas that displayed task-specific expertise effects (e.g., CoS) was also modulated by prior knowledge, demonstrated in a lower magnitude of response when the chess displays represented random, impossible chess positions relative to possible ones.

#### **INTERACTIONS IN OTHER DOMAINS OF EXPERTISE**

The interactive view of expert object recognition can be expanded to account for the neural manifestations of other types of expertise involving visual information based on the totality of the cognitive processes they recruit. In essence, the interactive view suggests that expertise is supported by a multitude of brain areas, the identity of which is determined by the informational demands imposed by the particular domain of expertise. Critically, these different brain areas do not operate independently, as activity in one area is mutually constrained by activity in the others, reflecting the interactive nature of visual processing in general, and expertise in particular.

The interactive view is supported by the extensive and varied activations observed for many domains of expertise (e.g., architecture: Kirk et al., 2009; reading musical notation: Wong and Gauthier, 2010; archery: Kim et al., 2011; basketball: Abreu et al., 2012). Critically, the specific networks involved are defined by the diagnostic information for those domains. For example, professional basketball players also excel at anticipating the consequences of the actions of other players (i.e., success of free shots at a basket: Aglioti et al., 2008), reflected in activations in frontal and parietal areas traditionally involved in action observation, as well as in the extrastriate body area (EBA, a body-selective region in occipitotemporal cortex: Downing et al., 2001), probably due to their expert reading of the observed action kinematics (Abreu et al., 2012).

Whereas many examples of visual expertise involve recognition of objects or discrete stimuli, expertise can also be found for large-scale spatial environments, for example taxi drivers navigating London (Woollett et al., 2009). Structural MRI studies have reported an increased hippocampal volume in taxi drivers relative to controls (Maguire et al., 2000; Woollett and Maguire, 2011). Importantly, these changes in hippocampus were not observed in London bus drivers with equivalent driving experience, indicating that specific navigation strategies interact with experience in producing changes to neural substrates (Maguire et al., 2006). However, in accord with the interactive view, the hippocampus is not the only region involved in navigation expertise. For example, visual inspection of landmark objects in city scenes by London taxi drivers (Spiers and Maguire, 2006) results in widespread patterns of activation along the dorsal (Kravitz et al., 2011) and ventral (Kravitz et al., 2013) visual pathways, as well as parahippocampal cortex, retrosplenial cortex, and various prefrontal structures all strongly associated with scene processing (Epstein, 2008), navigation, and spatial processing generally (Kravitz et al., 2011). Of course, all of these areas are strongly interconnected with the hippocampus, and thus constitute a network wherein multiple types of information are integrated to support complex spatial behavior.

Reading is an example of a domain in which the neural substrates supporting the interaction between conceptual and perceptual processing may be more predictable. Reading is a means of accessing the language system through vision, and hence involves the activation of multiple brain regions and interconnections supporting the processing and representation of different types of linguistic information (phonological, lexical, semantic; for a review, see Price, 2012). The visual component of reading, word processing, has been primarily linked to experience-dependent activations in ventral occipitotemporal cortex (Baker et al., 2007; Wong et al., 2009a; Dehaene et al., 2010) in a region often referred to as the Visual Word Form Area (VWFA; for a review see Dehaene and Cohen, 2011). Exemplifying the interaction between orthography and other language systems, VWFA activity following training with novel orthography was found to represent not only visual form, but also phonological and semantic information (Xue et al., 2006). In contrast to face-selective activation, which is typically stronger in the right relative to the left hemisphere, VWFA shows the opposite lateralization, with stronger responses in the left hemisphere. To explain the relative locations of face- and word-selective regions, Plaut, Behrmann and colleagues proposed a competitive interaction between face and word representation for foveally biased cortex, constrained by the need to integrate reading with the language system that is primarily left-lateralized (Behrmann and Plaut, 2013; Dundas et al., 2013). This computational approach, which attempts to understand how higher-level, non-visual information constrains category specialization in visual areas, is likely to be a fruitful avenue for future research.

Together, these studies demonstrate that the neural substrates of visual expertise extend well beyond visual cortex, and are manifest in regions supporting attention, memory, spatial cognition, language, and action observation. Importantly, the involvement of these systems is predictable from their general functions, suggesting that expertise evolves largely within the same systems that initially process the stimuli. Overall, it is clear that more complex forms of visual expertise recruit broad and diverse arrays of cortical and subcortical regions. Visual expertise, in its broadest sense, engages multiple cognitive processes in addition to perception, and the interplay between these different cognitive systems is what unites these seemingly different domains of expertise. Notably, studying the different networks that form the neural correlates of expertise may inform us of the diverse cognitive processes involved in particular domains of expertise, as these processes are often not consciously accessible for the experts themselves (Palmeri et al., 2004).

#### **SUMMARY AND FUTURE DIRECTIONS**

Real-world expertise provides a unique opportunity to study how neural representations change with experience in humans. In this article, we focused on expertise in visual object recognition, reassessing its common view as a predominantly automatic stimulus-driven perceptual skill that is supported by category-selective areas in high-level visual cortex. We propose an interactive framework for expert object recognition, which posits that expertise emerges from multiple interactions within and between the visual system and other cognitive systems, such as top-down attention and conceptual memory. These interactions are manifest in widespread distributed patterns of activity across the entire cortex, and are highly susceptible to high-level factors, such as task relevance and prior knowledge.

While the interactive framework provides a more complete account of the neural correlates of visual expertise across its diverse domains, many questions are still open. Having established the involvement of multiple cortical networks in object expertise, the next natural question is what the relative contributions of each of these processes are to the unique behavior displayed by experts. For example, in examining the role of top-down attention in expertise, what is the precise effect of the high engagement of experts with their objects of expertise (inherent to real-world expertise) on the perceptual processing of these objects? Using experimental paradigms that are known to affect top-down attention, such as divided attention, will allow researchers to test the extent of the involvement of top-down attention in expertise. Further, given the modulation of activation by task relevance, how do different tasks affect the neural manifestations of expertise? Similar questions can be asked about the role of conceptual knowledge in guiding perceptual processing. Of particular interest here is how accumulating knowledge over time interacts with and affects the way experts extract information from their objects of expertise.

Finally, it should be noted that the great advantage provided by studying real-world expertise, its high ecological validity, also poses a real challenge. How can the perceptual elements be teased apart from the other high-level top-down factors in real-world experts, who possess both qualities? One potential way to address this challenge is by studying long-term expertise in more controlled settings, which allow the researcher to tease apart the different factors involved in a particular domain of expertise. For example, one can study the time course of intensive, relatively short-term training with real-world objects while controlling the visual input, the conceptual knowledge, and the level of engagement to manipulate the relationship between conceptual and sensory information. In one such study, Weisberg et al. (2007) showed that training participants to treat novel objects as tools engages action-related "tool" areas (left intraparietal sulcus and premotor cortex) that were not active before training or for objects not treated as tools. These findings demonstrate how a particular type of experience with objects is incorporated with perceptual visual information to form new object concepts. This approach can be extended to further our understanding of the complex and diverse cortical networks and interactions underlying real-world expertise.

#### **ACKNOWLEDGMENTS**

The authors thank Hans Op de Beeck, Marlene Behrmann, and Alex Martin for helpful discussions. This research was supported by the Intramural Research Program of the US National Institutes of Health (NIH), National Institute of Mental Health (NIMH).

**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 August 2013; paper pending published: 01 October 2013; accepted: 05 December 2013; published online: 27 December 2013*.

*Citation: Harel A, Kravitz D and Baker CI (2013) Beyond perceptual expertise: revisiting the neural substrates of expert object recognition. Front. Hum. Neurosci. 7:885. doi: 10.3389/fnhum.2013.00885*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2013 Harel, Kravitz and Baker. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

### Holding a stick at both ends: on faces and expertise

#### *Assaf Harel <sup>1</sup> \*, Dwight J. Kravitz <sup>2</sup> and Chris I. Baker <sup>1</sup>*

*<sup>1</sup> Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA*

*<sup>2</sup> Department of Psychology, The George Washington University, Washington, DC, USA*

*\*Correspondence: assaf.harel@nih.gov*

#### *Edited by:*

*Merim Bilalić, Alpen Adria University Klagenfurt, Austria*

#### *Reviewed by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

**Keywords: expertise, perceptual expertise, object recognition, visual perception, fMRI, review, visual cortex**

Ever since Diamond and Carey's (1986) seminal work, object expertise has often been viewed through the prism of face perception (for a thorough discussion, see Tanaka and Gauthier, 1997; Sheinberg and Tarr, 2010). According to Wong and Wong (2014, W&W), however, this emphasis has simply been a response to the question of modularity of face perception, and has not been about expertise in and of itself. It is precisely this conflation of questions of expertise and modularity, the consequent focus on FFA, and the detrimental effect this had on the field of object expertise research that we discussed as part of our original review (Harel et al., 2013).

We fully acknowledge that some recent works on visual expertise, particularly outside the domain of real-world object recognition (the focus of our article), have started to discuss object expertise beyond sensory cortex (e.g., Wong and Gauthier, 2010; Wong et al., 2012). However, at the same time, other high-profile works continue to focus on expertise solely in the context of FFA and face-selectivity (McGugin et al., 2012, 2014), arguing that their results are inconsistent with the notion that "learning effects are distributed throughout cortex with no relation to face selectivity" (McGugin et al., 2012, p. 17067).

Focusing on discrete regions when the question is modularity, but focusing on distributed effects when the question is expertise itself, comes across as holding the stick at both ends and leads to the widespread misconception that FFA plays a privileged role in expertise. Of course, one can show that expertise effects occur within FFA while simultaneously acknowledging the widespread effects of expertise across the cortex. However, the significance of the former result to the understanding of object expertise is greatly reduced by the latter. Put simply, the more distributed expertise effects are, the less significant is the role of one particular region for our understanding of the general mechanisms of object expertise. Take, for example, the widespread effects of car expertise, which include even early visual cortex (Harel et al., 2013, Figure 2). Thus, a continued focus on the relationship between expertise and face processing detracts from the study of the general principles underlying real-world object expertise.

Beyond the issue of modularity, W&W suggest we mischaracterized prior research by stating it often emphasized expertise as an automatic, stimulus-driven skill, with little impact of attention, task and high-level cognitive factors. However, we found this criticism rather surprising given (i) the extensive discussion of automaticity and interference in the expertise literature, with many studies suggesting that processing becomes more automatic with expertise (Tarr and Gauthier, 2000; Gauthier and Tarr, 2002; McCandliss et al., 2003; McGugin et al., 2011; Richler et al., 2011), and (ii) recent work explicitly testing the hypothesis that car expertise effects are invariant to modulations of attention or clutter (McGugin et al., 2014). In their response, W&W suggest that experts "*tend* to automatically process their objects of expertise in a certain way" but those processes can be "*overridden* by higher-level cognitive processing." It is unclear how an automatic process can be sometimes engaged and occasionally overridden. This comes across as another instance of holding the stick at both ends.

Despite the points of contention we have highlighted here, we are encouraged that W&W fully agree with the distributed interactive view of visual expertise we discussed (Harel et al., 2010, 2013). We are certain that future research fully focused on addressing the distributed and highly interactive nature of visual expertise will provide new insights into the cortical mechanisms underlying real-world object expertise.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 07 May 2014; paper pending published: 17 May 2014; accepted: 02 June 2014; published online: 20 June 2014.*

*Citation: Harel A, Kravitz DJ and Baker CI (2014) Holding a stick at both ends: on faces and expertise. Front. Hum. Neurosci. 8:442. doi: 10.3389/fnhum.2014.00442*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Harel, Kravitz and Baker. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Interaction between perceptual and cognitive processing well acknowledged in perceptual expertise research

#### *Alan C.-N. Wong <sup>1</sup> \* and Yetta K. Wong <sup>2</sup> \**

*<sup>1</sup> Perception and Experience Laboratory, Department of Psychology, The Chinese University of Hong Kong, Hong Kong, China*

*<sup>2</sup> Department of Applied Social Studies, City University of Hong Kong, Hong Kong, China*

*\*Correspondence: alanwong@cuhk.edu.hk; yetta.wong@cityu.edu.hk*

#### *Edited by:*

*Merim Bilalić, Alpen Adria University Klagenfurt, Austria*

#### *Reviewed by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

*Assaf Harel, National Institutes of Health, USA*

**Keywords: perceptual expertise, object recognition, modularity, visual training, review**

To understand the neural correlates of expert object recognition, Harel et al. (2013) proposed the use of an existing theoretical framework (Mahon et al., 2007; Martin, 2007) that emphasizes the interaction between different parts of the visual pathway as well as between the visual and other cognitive systems. While we agree that focusing more on the role of these interactions in expertise acquisition is a fruitful research direction, we would like to clarify the position of perceptual expertise researchers<sup>1</sup>. In fact, perceptual expertise researchers have never regarded face-selective areas as the only neural substrates important for expert object recognition. Nor do they deny the role of interaction between the visual system and other cognitive systems. Instead, perceptual expertise researchers have been considering the interaction between perceptual and cognitive processing as an important component in understanding perceptual expertise for different objects. It is therefore unnecessary to create a debate between the so-called "perceptual view" and "interactive view" of expert object recognition, as the interaction between perceptual and cognitive processing has been well accommodated in perceptual expertise research. We elaborate on this idea through the following two points:

#### **PERCEPTUAL EXPERTISE RESEARCHERS DO EMPHASIZE THE ROLE OF ATTENTIONAL AND HIGHER-LEVEL COGNITIVE FACTORS IN EXPERTISE ACQUISITION AND EXPRESSION**

Harel et al. (2013) state that according to perceptual expertise researchers, "expert processing. . . is automatic and stimulus-driven, with little impact of attentional, task demands or other higher-level cognitive factors" (p. 2). Unfortunately, this characterization of the views of perceptual expertise researchers is inaccurate. Palmeri and Gauthier (2004), for example, propose abandoning the strict distinction between perceptual and cognitive processes in understanding expert object recognition. Bukach et al. (2006), another landmark paper detailing the perceptual expertise framework, affirm that there are different kinds of perceptual expertise for different objects, and that to distinguish between them both physical and conceptual (e.g., functional knowledge) properties should be considered.

Research has shown the importance of task demands in the *development* as well as *expression* of expertise in object recognition. Task demand during training is always a major factor determining whether or what kind of expertise would be formed (e.g., Tanaka et al., 2005; Scott et al., 2006; Krigolson et al., 2009; Wong et al., 2009a,b, 2012). Task demand during testing also affects whether and how much expertise effects can be observed (Wong et al., 2009b, 2012, 2014). For example, in Wong et al. (2009b), two groups of observers learned to categorize the same set of artificial objects (ziggerins) in different ways, leading to different changes of neural selectivity patterns for the trained objects. Importantly, the neural changes were better observed when the testing task matched the training task. Even the FFA shows higher activity to different objects in tasks that require more attention, such as old/new recognition, than in passive viewing (Rhodes et al., 2004). Therefore, using Harel et al.'s (2013) terms (their footnote 2), both "task-specific learning effects" and "task dependence following expertise training" have been well identified among perceptual expertise researchers (see also Box 2 of Bukach et al., 2006).

Perceptual expertise researchers put a lot of emphasis on the engagement of non-visual factors and the involvement of visual and non-visual areas outside the FFA (James and Gauthier, 2003, 2004, 2006; James and Atwood, 2009; James and Cree, 2010; Wong and Gauthier, 2010a; Bilalić et al., 2010, 2011a, 2012; Behrmann and Plaut, 2013; Kersey and James, 2013). For example, Wong and Gauthier (2010a) found that expert perception of musical notes engages not only higher visual regions that are distinct from the face- or letter-selective regions, but also bilateral early retinotopic cortex, and a wide range of multimodal regions including auditory, audiovisual, somatosensory, motor, parietal, frontal, and various subcortical areas. Similarly, a distributed network of areas including the motor and inferior frontal cortices is also engaged selectively for visual judgments of letters (James and Gauthier, 2006). A wide range of brain regions in the occipital, temporal, and frontal regions has also been found to be more active when chess experts performed visual judgment of chess pieces on chessboards (Bilalić et al., 2010; JEPG).

<sup>1</sup>Here "perceptual expertise researchers" refers to those who investigate perceptual expertise in object recognition. To evaluate our interpretation of the position taken by perceptual expertise researchers in general, we encourage readers to refer to the papers we cited, including mainly but not limited to the works of researchers from the Perceptual Expertise Network (PEN).

Training studies also clearly show the engagement of a widespread neural network of areas for expert object processing. When comparing the neural training effects of two traditions of visual perceptual training protocols (namely perceptual learning and perceptual expertise training), a wide range of brain regions has been investigated, including the recruitment and disengagement of early retinotopic cortex, higher visual cortex, parietal cortex, and the superior temporal sulcus (Wong et al., 2012). James and Gauthier (2003, 2004) also found that participants who verbally learned to associate artificial objects with conceptual features showed activations in non-visual areas during subsequent perceptual judgments on these objects, including superior temporal gyrus (hearing), inferior frontal gyrus (semantics), etc.

Perceptual expertise researchers often emphasize that experts tend to automatically process their objects of expertise in a certain way (e.g., holistically, at a subordinate level of abstraction) or by recruiting certain brain areas even without explicit task instructions or requirements (e.g., Gauthier et al., 2000; Wong et al., 2009a,b). Importantly, however, this does not mean that such processes cannot be influenced or even overridden by higher-level cognitive processing<sup>2</sup>. On the contrary, as described above, both training studies and studies with real-world expertise demonstrate that cognitive processing (e.g., attention shaped by the current task demand, multimodal integration, and semantics) is often engaged even in tasks requiring only perceptual judgments.

Furthermore, it has been postulated that non-visual processing not only is engaged but also plays a crucial role in shaping neural selectivity for expert object categories. For example, writing training is found to be more effective than visual practice in contributing to the formation of letter selectivity in the fusiform gyrus, indicating a close interaction between motor and perceptual areas (James and Atwood, 2009; Kersey and James, 2013). Recently, Behrmann and Plaut (2013) proposed that the selective engagement of the left and right fusiform gyri for word and face processing, respectively, may originate from the constraint to keep the connections between visual word processing areas and language processing areas (both left-lateralized) as short as possible. Therefore, even when accounting for selectivity in visual areas, a distributed network of brain areas should be, and has been, considered.

#### **EARLY PERCEPTUAL EXPERTISE RESEARCH FOCUSES MORE ON THE FACE-SELECTIVE AREAS IN ORDER TO ADDRESS THE "FACE MODULARITY" DEBATE, BUT THAT DOES NOT NECESSITATE THAT RESEARCHERS REGARD FACE-SELECTIVE AREAS AS THE ONLY BRAIN REGIONS IMPORTANT FOR EXPERT OBJECT RECOGNITION**

Despite the abundant research on the interaction between visual and cognitive processing in expert object recognition, why might perceptual expertise researchers be regarded as face-centric, as in Harel et al. (2013, p. 4)? It has to do with the "face modularity debate" that heated up in the late 1990s in the field of face perception.

The face modularity debate concerns the nature of the fusiform face area (FFA) in face processing: Is the FFA a module specialized for face recognition, or is it responsible for expert subordinate-level recognition of any objects? As stated in Bukach et al. (2006) and McGugin et al. (2012), the degree to which FFA activity is exclusive for faces lies at the center of the debate. In support of the latter view, perceptual expertise researchers have shown that acquisition of expertise with various object categories (e.g., cars, birds, "Digimon" cartoon characters, chess, and artificial objects like greebles) either leads to or is associated with increased selectivity in the FFA (e.g., Gauthier et al., 1999, 2000; Grelotti et al., 2005; Xu, 2005; Bilalić et al., 2011b). The modularity vs. expertise debate, however, is still ongoing (e.g., McGugin et al., 2012; Rezlescu et al., 2014). Therefore, a more accurate depiction of the field of expert object recognition is that researchers (including those holding the modular and expertise views) have been focusing a lot on the FFA due to their research question concerning face modularity.

It is important to note that, although early perceptual expertise research focuses more on the face-selective areas in order to address the "face modularity" debate, that does not necessitate that researchers regard face-selective areas as the only brain regions important for expert object recognition. As an analogy, that one focuses on studying expert object recognition does not mean that one regards expert object recognition as the only important function of vision. As detailed in our first point, perceptual expertise researchers have been tackling issues other than the face modularity debate in recent years, and have expanded their investigations to different domains of perceptual expertise in a widespread network of brain areas.

#### **CONCLUSION**

Perceptual expertise researchers have been actively investigating the neural changes associated with expertise both inside and outside of the visual cortex. Tackling the face modularity debate, the majority of early effort has been put into clarifying the nature of the FFA. However, much work has since been devoted to studying the role of other high-level cognitive factors, including the effects of task demand on both the development and expression of expertise, the involvement of visual and non-visual areas in expert object recognition, the effects of conceptual associations, the way non-visual processes help determine the pattern of visual object selective activity, etc. In sum, there is no such thing as a rivalry between the so-called "perceptual view" and "interactive view" of expert object recognition, and the interaction between perceptual and cognitive processing has been well accommodated in perceptual expertise research.

<sup>2</sup>Take as an example the composite task frequently used to measure holistic processing in expert object recognition (e.g., Richler et al., 2008; Wong and Gauthier, 2010b; Wong et al., 2011). A typical observer would spontaneously process all parts of an object of expertise even though the task instruction requires the observer to focus on one part, leading to imperfect performance. However, the performance is mostly well above chance, indicating that one can override their natural tendency of holistic processing at least to a certain extent to fulfill task requirement.

#### **REFERENCES**

Behrmann, M., and Plaut, D. C. (2013). Distributed circuits, not circumscribed centers, mediate visual cognition. *Trends Cogn. Sci.* 17, 210–219. doi: 10.1016/j.tics.2013.03.007


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 22 January 2014; paper pending published: 21 February 2014; accepted: 25 April 2014; published online: 20 May 2014.*

*Citation: Wong AC-N and Wong YK (2014) Interaction between perceptual and cognitive processing well acknowledged in perceptual expertise research. Front. Hum. Neurosci. 8:308. doi: 10.3389/fnhum.2014.00308*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Wong and Wong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Expertise paradigms for investigating the neural substrates of stable memories

#### *Guillermo Campitelli\* and Craig Speelman*

*School of Psychology and Social Science, Edith Cowan University, Perth, WA, Australia*

*\*Correspondence: g.campitelli@ecu.edu.au*

#### *Edited by:*

*Robert Langner, Heinrich Heine University Düsseldorf, Germany*

#### *Reviewed by:*

*Vaibhav A. Diwadkar, Wayne State University School of Medicine, USA*

*Karl Friston, UCL, UK*

**Keywords: stable memories, long-term memory, expertise, experimental paradigm, brain imaging**

One of the hallmarks of acquiring expertise in any area of life is the ability to maintain relevant information over a long period of time (i.e., years or decades). Understanding the neural implementations of this ability requires the elucidation of two issues. First, the processes whereby recently acquired pieces of information become stable over time (i.e., memory consolidation); and second, the localization of these stable memories in the brain.

Unlike neurobiological and neuropsychological memory research, brain imaging research has paid little attention to these issues. Instead, most memory research in brain imaging has focused on the processes of memory encoding and retrieval. In this article we first succinctly present the current debate on the localization of stable memories in neurobiology and neuropsychology. We then discuss the difficulties in studying the localization of stable memories in human neuroimaging. After presenting the most traditional paradigm in studying long-term memory and the autobiographical memory paradigm, we present three expertise brain imaging paradigms. We also discuss how the latter help overcome the technical difficulties to investigate the neural localization of stable memories.

#### **NEUROBIOLOGY AND NEUROPSYCHOLOGY OF STABLE MEMORIES**

Memory consolidation has been intensively studied in mainstream neurobiology of memory with animal models (e.g., McGaugh, 2000; Dudai, 2004; Izquierdo et al., 2006). Moreover, neuropsychological studies of patients with brain lesions have been relevant to investigate the brain localization of stable memories (e.g., Milner et al., 1968; Squire and Alvarez, 1995; Nadel and Moscovitch, 1997).

A synthesis of these lines of research led to the development of two main models of the neural implementations of stable memories: the standard consolidation theory (SCT, Burnham, 1904; Squire and Alvarez, 1995; McGaugh, 2000; Dudai, 2004) and the multiple trace theory (MTT, Nadel and Moscovitch, 1997). A comparison between these theories is beyond the scope of this article (see Winocur et al., 2010, for further details). Suffice it to say that they agree that, after acquiring new information, there is a period of *synaptic consolidation* that lasts from seconds to days, and that the hippocampus is essential in this process. They also agree that there is a second type of consolidation, *system consolidation*, that lasts from months to decades, and that the result of this process is the formation of stable memories. However, there is no agreement on whether these stable memories are localized in the hippocampus, in associative areas of the cortex, or in both.

#### **DIFFICULTIES IN STUDYING THE LOCALIZATION OF STABLE MEMORIES IN BRAIN IMAGING**

Brain imaging studies are paramount to the testing of those theories in healthy humans. In their influential review on brain imaging, Cabeza and Nyberg (2000, p. 22) clearly indicated why this is a challenging endeavor:

"Encoding refers to processes that lead to the formation of new memory traces. Storage designates the maintenance of memory traces over time, including consolidation operations that make memory traces more permanent. Retrieval refers to the process of accessing stored memory traces. Encoding and retrieval processes are amenable to functional neuroimaging research, because they occur at specific points in time, whereas storage/consolidation processes are not, because they are temporally distributed (Buckner and Koutstaal, 1998)."

#### **TRADITIONAL PARADIGM**

Long-term memory tasks typically involve a learning phase and a test phase. In the former, participants are presented with a set of items, and they are requested to memorize them. In the test phase, participants are presented with items and they are requested to indicate whether or not each item was present in the previously presented set. The top row of **Table 1** illustrates this approach. After subtracting the brain activation of a perceptual-motor control task, the activation due to the long-term memory task is believed to represent the neural correlates of long-term memory (e.g., Duncan et al., 2012).

Brain imaging studies that use this traditional paradigm of long-term memory provide information on the neural implementations of how people learn new information, and how people retrieve information learned a few minutes or hours ago. However, the traditional paradigm fails to provide information on whether stable memories have a specific localization, and if so where such stable memories are localized in the brain.

#### **AUTOBIOGRAPHICAL MEMORY**

The field of research that aims at filling this gap is the field of autobiographical memory. Instead of using a learning phase and a test phase, autobiographical memory studies use personal information provided by participants to generate experimental situations in which past information or experiences are retrieved during the experiment. For example, the pre-scan interview paradigm (e.g., Maguire and Mummery, 1999; see second row of **Table 1**) uses information gathered in a previous interview with participants to generate cues that would trigger past personal experiences. As a control condition participants answer general knowledge questions. One of the problems of this paradigm is that the brain activity during the experiment may reflect aspects of the interview, rather than the targeted past personal memories. A number of techniques have been proposed to overcome this problem, but they also have drawbacks. For example, Cabeza et al. (2004) proposed the "photo paradigm," in which participants are given a camera to keep track of events of their lives. After a few days or weeks, participants are shown photos taken by them or other photos in a brain imaging session. The assumption is that the subtraction of the pattern of brain activation between these conditions would reveal the neural implementation of past personal memories. This paradigm solves the problem of contamination of memories from the interview, but it does not enable the study of remote memories.

**Table 1 | Experimental paradigms to investigate neural correlates of long-term memory (LTM).**

Below we present three paradigms—the expert archival paradigm, the expert memory paradigm and the expert vs. novice paradigm—used in expertise studies that shed light on the neural localizations of stable memories. Memory theories based on expertise research (e.g., chunking theory, Chase and Simon, 1973; template theory, Gobet and Simon, 1996) emphasize the role of stable memories acquired through a period of practice of years or decades. Therefore, it is not surprising that brain imaging studies with experts have focused on the neural localization of these stable memories. We illustrate these paradigms with studies using chess players as participants.

#### **EXPERT ARCHIVAL PARADIGM**

Given that experts learn domain-specific patterns, and that these patterns are stable memories, expertise studies aiming to uncover the brain localization of stable memories do not require a learning phase as in the traditional paradigm. Indeed, the learning phase occurred years ago. Moreover, if archival data is available, stimuli can be constructed, thus avoiding the interview of participants typical of autobiographical memory studies.

These features afford the possibility to design experiments with stimuli that would trigger the activation of well-consolidated memories. The third row of **Table 1** illustrates this paradigm. For example, Campitelli et al. (2008) used the expert archival paradigm, in which chess international masters were presented with positions of games they played in the past and positions belonging to other players. The task was to identify whether the positions belonged to their own games or not. In other words, this is a long-term memory task in which the learning phase occurred years before the experiment was conducted, and it is an autobiographical memory task in which a pre-scan interview was not necessary. The authors found a left-lateralized pattern of brain activation in the chess players. The pattern included activity at or near the left temporo-parietal junction, and a number of areas in the left frontal lobe, which is consistent with previous autobiographical memory studies (Maguire and Mummery, 1999; Maguire et al., 2000; Gilboa et al., 2004; Levine et al., 2004). The fact that the study with the expert archival paradigm showed similar results to the typical autobiographical memory paradigms provides evidence that the results found with the pre-scan interview are not an artifact of the paradigm.

#### **EXPERT MEMORY PARADIGM**

The previous paradigm sheds light on the neural localization of autobiographical stable memories. The expert memory paradigm helps in understanding the neural substrate of stable episodic and semantic memories. The expert memory paradigm also takes advantage of the fact that experts possess well-consolidated memories of domain-specific patterns. It involves the comparison of experts' brain activity while performing a task (e.g., a delayed-response task) using domain-specific stimuli with their activity during the same task using another type of stimuli. For example, Campitelli et al. (2007) compared the brain activity of chess experts performing a delayed-response task in two conditions: (1) stimuli were chess positions; and (2) stimuli were scenes with gray and white backgrounds and black and white shapes. This contrast is intriguing because it identifies the neural implementations of stable memories of domain-specific material by using a "working memory" task. This is possible because the working memory component of the delayed-response task is canceled out in the contrast. Incidentally, using the same task in this subtraction avoids the problem of "pure insertion" (i.e., the assumption that adding a process component to a task does not produce an interaction between the new component and other components of the task; Friston et al., 1996).
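
The logic of this contrast can be sketched with a toy additive model. This is only an illustration under stated assumptions: the component names and numerical values below are hypothetical and are not taken from Campitelli et al. (2007), and the strictly additive combination of components is itself a simplification that real data need not satisfy.

```python
# Toy additive model of a same-task contrast.
# All component names and values are hypothetical, chosen for illustration only.
COMPONENTS = {
    "perceptual_motor": 1.0,  # viewing the stimulus and responding
    "working_memory": 2.0,    # holding the stimulus over the delay
    "stable_memory": 1.5,     # access to consolidated domain-specific patterns
}


def condition_signal(engaged):
    """Sum the contributions of the components a condition engages."""
    return sum(COMPONENTS[name] for name in engaged)


# The same delayed-response task is used in both conditions; only the stimuli differ.
chess = condition_signal(["perceptual_motor", "working_memory", "stable_memory"])
scenes = condition_signal(["perceptual_motor", "working_memory"])

# Components shared by both conditions cancel out in the subtraction.
contrast = chess - scenes
print(contrast)
```

Under these assumptions, whatever remains in `contrast` is attributable to the only component that differs between the two conditions, which is the reasoning the paradigm relies on.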

With this paradigm, Campitelli et al. (2007) found activation in medial temporal areas. In a more localized study, Bilalić et al. (2011b) found activity in a medial temporal area, the fusiform face area in the fusiform gyrus. This study provides evidence in favor of the view that this area is involved in expertise acquired to differentiate between members of the same class (e.g., Curby and Gauthier, 2010), as opposed to the view that this is an area specialized in processing faces (e.g., Kanwisher and Yovel, 2006).

#### **EXPERT vs. NOVICE PARADIGM**

The expert vs. novice paradigm, popularized by Chase and Simon (1973), can also shed light on the neural localization of stable memories. It involves recruiting a group of non-experts, who are requested to perform the same tasks as experts (i.e., a simple task of the domain of expertise). For example, Bilalić et al. (2011a) asked experts and novices to determine whether the king was in check in a chess position (see also Bilalić et al., 2010, 2011b, 2012, for similar approaches). A comparison of the brain activity in experts to that of non-experts affords the possibility of identifying whether stable memories are located in the same areas as not-so-well consolidated long-term memories. For example, Guida et al. (2012, 2013) conducted a review of expertise and training studies, and they identified that, in comparison to non-experts, experts show less brain activity in working memory areas when performing "working memory" tasks. Furthermore, experts show more activity than non-experts in long-term memory areas. These results support a two-stage model of neural implementations of expertise; the first stage involves efficiency in working-memory processing, and the second comprises a restructuring of brain areas involved in the consolidation of domain-specific long-term memories.

#### **CONCLUDING REMARKS**

We have described three expertise paradigms that have contributed to investigating the neural localization of stable memories. The expert archival paradigm aims at investigating the localization of stable autobiographical memories, the expert memory paradigm investigates the localization of episodic and semantic memories, and the expert vs. novice paradigm is important when investigating whether stable memories are localized in the same areas as the not-so-well consolidated long-term memories, or whether they become stable in other areas of the brain. An additional advantage of expertise paradigms is that they typically show large effect sizes, which increase the probability of finding statistically significant results.

Given that the ability to maintain relevant information over years or decades is apparent both in domain-specific experts and in everyday-life experts, these paradigms have the potential to shed light not only on the neural implementations of expertise, but also on the neural implementations of long-term memory in general.

*Received: 15 July 2013; accepted: 15 October 2013; published online: 01 November 2013.*

*Citation: Campitelli G and Speelman C (2013) Expertise paradigms for investigating the neural substrates of stable memories. Front. Hum. Neurosci. 7:740. doi: 10.3389/fnhum.2013.00740*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Campitelli and Speelman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Sparse distributed memory: understanding the speed and robustness of expert memory

#### *Marcelo S. Brogliato<sup>1</sup>, Daniel M. Chada<sup>1,2</sup> and Alexandre Linhares<sup>1</sup>\**

*<sup>1</sup> Behavioral and Decision Sciences, EBAPE/Fundação Getulio Vargas, Rio de Janeiro, Brazil*

*<sup>2</sup> Computational Cognitive Science Lab., Department of Psychology, University of California, Berkeley, CA, USA*

#### *Edited by:*

*Merim Bilalić, Alpen Adria University Klagenfurt, Austria*

#### *Reviewed by:*

*Michael Harre, The University of Sydney, Australia*

*Eric Nichols, Google, USA*

#### *\*Correspondence:*

*Alexandre Linhares, Behavioral and Decision Sciences, EBAPE/Fundação Getulio Vargas, P. Botafogo 190, Rio de Janeiro, 22250-900, Brazil e-mail: alexandre.linhares@fgv.br*

How can experts, sometimes in exacting detail, *almost immediately and very precisely* recall memory items from a vast repertoire? The problem in which we will be interested concerns models of theoretical neuroscience that could explain the speed and robustness of an expert's recollection. The approach is based on Sparse Distributed Memory, which has been shown to be plausible, both in a neuroscientific and in a psychological manner, in a number of ways. A crucial characteristic concerns the limits of human recollection, the "tip-of-tongue" memory event—which is found at a non-linearity in the model. We expand the theoretical framework, deriving an optimization formula to solve this non-linearity. Numerical results demonstrate how the higher frequency of rehearsal, through work or study, immediately increases the robustness and speed associated with expert memory.

**Keywords: sparse distributed memory, non-linearity, critical distance, theoretical neuroscience, expert memory**

#### **1. INTRODUCTION**

*Szilard told Einstein about the Columbia secondary-neutron experiments and his calculations toward a chain reaction in uranium and graphite. Long afterward [Szilard] would recall his surprise that Einstein had not yet heard of the possibility of a chain reaction. When he mentioned it Einstein interjected, "Daran habe ich gar nicht gedacht!"-"I never thought of that!" He was nevertheless, says Szilard, "very quick to see the implications and perfectly willing to do anything that needed to be done."*

Meeting of July 16, 1939, between Leo Szilard and Albert Einstein concerning atomic weapons (Rhodes, 2012, p. 305).

How can experts—like Albert Einstein—immediately find meaning given very few cues? How can experts—like Leo Szilard—recollect, sometimes in exacting detail, memories that non-experts would find baffling? These abilities span wide across the spectrum of human activity: From full chess games played decades ago, to verses written by Dante, to exotic wines, or to the script and actors involved in movie scenes, experts can *almost immediately and very precisely* recall from a vast repertoire. How can neuroscience explain the speed and robustness of experts' recollection?

The work done herein can be related to the work done by Shepard (1957) and further developed by Nosofsky (1986) and Shepard (1987), in the sense that the models investigated here use conceptual approximation and distancing in what could be considered a psychological space. However, this work does not aim to continue these authors' approaches to identification, categorization, similarity and psychological distance. Here we aim at discovering the bounds and limits of conceptual retrieval in human memory via the Sparse Distributed Memory (SDM) proxy.

Recently, Abbott et al. (2013) explored a computational level (as defined by Marr, 1982) account of SDM as a model of inference. We provide here an initial exploration that may further the work done by these authors, providing a theoretical foundation for a computational account of the edges of recollection via Sparse Distributed Memory (and possibly other architectures, by means of the connectionist common ground).

Other approaches that are neurally plausible could include the template and chunk theory by Gobet et al. (Gobet and Simon, 2000; Gobet et al., 2001; Harré et al., 2012; Harré, 2013). Chunks are stored memory items, and templates include slots in which items can vary.

Recent findings by Huth et al. (2012) suggest that human semantic representation resides in a continuous multidimensional psychological space; the authors support this claim with fMRI evidence. The SDM model explored herein is consistent with these findings in that SDM permits hierarchical relationships between concepts, and instantiates a multidimensional conceptual space which holds attractors to memory items that are, in fact, continuous (as a function of their distance from the reading point).

Two of the concepts with which we will deal here are reflected in this 1939 meeting: *information content*, shown by Einstein's surprise at unexpected information; and *the ability to rapidly access memory, in detail*, shown by Szilard's "long afterward" recollection of the meeting. A third concept we will use is that evidence points toward memory being organized around cell assemblies, and Sparse Distributed Memory takes advantage of this concept.

#### **2. CELL ASSEMBLIES AND THE SPARSE DISTRIBUTED MEMORY MODEL**

#### **2.1. CELL ASSEMBLIES**

How is information encoded in the brain? We postulate that information is encoded by cell assemblies, not by individual neurons (Sakurai, 1996, 1998, 1999). There are at least five reasons leading to this position. (1) Neurons constantly die—yet the brain is robust to their loss. (2) There is large variability in the activity of individual neurons—as would be expected on anatomical and physiological grounds alone. (3) A single neuron does not participate in a single function; as Sakurai (1998) puts it:

Even the famous "face neurons" in the temporal cortex do not respond to single unique faces but to several faces or to several features comprising the several faces. (p. 2)

(4) Studies of activity correlation between neighboring neurons show very low, if not zero, correlation. (5) Finally, while the number of neurons is quite large, it is minute in comparison with the different combinations of incoming stimuli one experiences during one's lifetime.

Furthermore, recent literature suggests a connection between increased activation of the fusiform face area (FFA) and the acquisition of expertise (Gauthier et al., 1999; Xu, 2005; McGugin et al., 2012). Current results provide strong evidence that FFA activation is correlated with domain-specific expertise in naturalistic settings (Bilalić et al., 2011b). Additionally, expertise in object-recognition tasks has been shown to modulate activation in different areas of the brain (Bilalić et al., 2011a), including homologous right-left hemispheric activation in both object and pattern recognition expertise (Bilalić et al., 2010, 2012). This evidence and the preceding points serve to further emphasize the distributed role of activation in recognition and expertise.

Hence we subscribe to the hypothesis that the unit of information encoding is not the individual neuron, but groups of neurons, or cell assemblies (Sakurai, 1996, 1998, 1999). In this model, shown in **Figure 1**, a single neuron may participate in a large number of assemblies, and the possible number of assemblies is enormous. Cell assemblies, rather than being encumbered by such combinatorial explosions, actually *take advantage* of them, as we will see in Sparse Distributed Memory.

#### **2.2. SPARSE DISTRIBUTED MEMORY**

A promising research programme in theoretical neuroscience is centered around *Sparse Distributed Memory*, originally proposed by Kanerva (1988). SDM is a neuroscientifically and psychologically plausible model of human memory.

#### *2.2.1. A large space for memory items*

SDM introduces many interesting mathematical properties of *n*-dimensional binary space that, in a memory model, are psychologically plausible. Most notable among these are robustness against noisy information, the tip-of-the-tongue phenomenon, conformity to the limits of short-term memory (Linhares et al., 2011), and robustness against loss of neurons. The model has been explored in the study of vision and other senses (Olshausen et al., 1993; Laurent, 2002; Rao et al., 2002; Mazor and Laurent, 2005). In spite of the increasing number of neuroscientists displaying interest in Sparse Distributed Memory (Ballard et al., 1997; French, 1999; Ludermir et al., 1999; Silva et al., 2004; Laurent, 2006; Bancroft et al., 2012), we still have limited understanding of its properties.

As in some other neuroscientific models, inhibitory and excitatory signals are represented in binary form. In SDM, both the data and the storage space belong to {0, 1}<sup>*n*</sup>, hence a particular memory item is represented by a binary vector of length *n*, henceforth called a *bitstring*. These bitstrings are stored (as with most computational memory models) in *addresses*. In SDM, these also take the form of *n*-dimensional binary vectors.

The distance between two bitstrings is calculated using the Hamming distance. Hamming distance is defined for two bitstrings of equal length as the number of positions in which the bits differ. For example, 00110<sub>*b*</sub> and 01100<sub>*b*</sub> are bitstrings of length 5 and their Hamming distance is 2.
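The definition can be checked with a few lines of Python. This is a minimal sketch; the function name `hamming_distance` and the use of character strings to hold bits are illustrative choices, not part of the SDM literature.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bitstrings differ."""
    if len(a) != len(b):
        raise ValueError("bitstrings must have the same length")
    return sum(x != y for x, y in zip(a, b))


# The example from the text: the strings differ at the 2nd and 4th positions.
print(hamming_distance("00110", "01100"))  # 2
```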

The size of the {0, 1}<sup>*n*</sup> address space grows exponentially with the number of dimensions *n*; i.e., *N* = 2<sup>*n*</sup>. While Kanerva (1988) suggests *n* between 100 and 10,000, he has more recently postulated 10,000 as a desirable minimum *n* (Kanerva, 2009). This is, of course, an enormous space, infeasible to implement physically.

To solve the feasibility problem of implementing this memory, SDM takes a uniformly distributed random sample of {0, 1}<sup>*n*</sup>, having *N*′ elements (with *N*′ much smaller than *N*), and instantiates only these points of the space. These instantiated addresses in the sample are called *hard locations* and each hard location implements a set of *n* counters, which we will see in more detail. The hard locations allow SDM to use the entire (virtual) {0, 1}<sup>*n*</sup> space through distributed read and write operations (described in more detail below). A random bitstring is generated with equal probability of 0's and 1's in each dimension. Thus, the distance between two random bitstrings follows a binomial distribution with mean *μ* = *n*/2 and standard deviation *σ* = √(*n*/4). For large *n*, the vast majority of the space lies "close" to the mean (i.e., between *μ* − 3*σ* and *μ* + 3*σ*) and has few shared hard locations: as *n* grows, two bitstrings with distance far from *n*/2 are very improbable. We define two bitstrings to be *orthogonal* when their distance is close to *n*/2.

**Figure 2** provides a simplified view of the model, with a small space for hard locations and a large space for possible locations. The model instantiates a random sample of about one million hard locations, which is in fact a minute fraction of the space: for *n* = 100, only 100 · 10<sup>6</sup>/2<sup>100</sup> ≈ 8 · 10<sup>−23</sup> percent of the whole space "exists" (i.e., is instantiated), and for *n* = 1000 only 100 · 10<sup>6</sup>/2<sup>1000</sup> ≈ 9 · 10<sup>−294</sup> percent.
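As a rough numerical companion to these statements, the sketch below evaluates the mean and standard deviation of the distance between two random bitstrings, and the percentage of the address space covered by one million hard locations. The function names are hypothetical, and the default of one million hard locations simply mirrors the figure quoted in the text.

```python
from math import sqrt


def distance_stats(n: int) -> tuple:
    """Mean and standard deviation of the Hamming distance between two random n-bit strings."""
    return n / 2, sqrt(n / 4)


def instantiated_percent(n: int, hard_locations: int = 10**6) -> float:
    """Percentage of the 2**n address space that is instantiated as hard locations."""
    return 100 * hard_locations / 2**n


for n in (100, 1000):
    mu, sigma = distance_stats(n)
    print(n, mu, round(sigma, 2), instantiated_percent(n))
```

Python integers have arbitrary precision, so `2**1000` is computed exactly before the final division turns the ratio into a float.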

#### *2.2.2. Creating a cell assembly by sampling the space at μ − 3σ*

The activation of addresses takes place according to their Hamming distance from the datum. Suppose one is writing datum *η* at address *ξ*; then all addresses inside an *n*-dimensional circle with center *ξ* and radius *r* are activated. So, *η* will be stored in *all* of these activated addresses, which are around address *ξ*, as shown in **Figure 3**. An address *ξ*′ is inside the circle if its Hamming distance to the center *ξ* is less than or equal to the radius *r*, i.e., *distance*(*ξ*, *ξ*′) ≤ *r*. Generally, *r* ≈ *μ* − 3*σ*. The radius is selected to activate, on average, 1/1000th of the sample, that is, approximately 1000 hard locations for a model with one million hard locations. To achieve this, a 1000-dimensional memory uses an access radius *r* = 451, and a 256-dimensional memory, *r* = 103. This will generate a cell assembly to either store or retrieve a memory item. With this activation mechanism, SDM provides a method to write and read *any* bitstring in the {0, 1}<sup>*n*</sup> space.
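The choice of radius can be checked directly: the fraction of uniformly random addresses within Hamming distance *r* of a fixed address is a binomial tail probability. The following sketch computes it exactly with integer arithmetic; the function name `activation_fraction` is an illustrative assumption.

```python
from math import comb


def activation_fraction(n: int, r: int) -> float:
    """Probability that a uniformly random n-bit address lies within distance r of a fixed address."""
    inside = sum(comb(n, k) for k in range(r + 1))  # number of addresses in the circle
    return inside / 2**n


for n, r in ((1000, 451), (256, 103)):
    fraction = activation_fraction(n, r)
    # Expected number of activated hard locations in a sample of one million.
    print(n, r, fraction, fraction * 10**6)
```

Both parameter pairs should give a fraction close to one in a thousand, which corresponds to roughly 1000 activated hard locations out of one million.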

#### *2.2.3. Writing an item to the memory*

**Table 1** shows an example of a write operation being performed in a 7-dimensional memory.

One way to view the write and read operations is to visualize *neurons (hard locations) as vectors*, that is, vectors pointing to certain areas of the space. In the SDM model, the cell assembly (i.e., the set of active hard locations) works in unison, rather like a sum of vectors: as one writes bitstrings in memory, the counters of the hard locations are updated.

When a bitstring activates a set of hard locations, the active hard locations do not *individually* point to the bitstring that activated them, but, taken together, they point to a coordinate in space (that is, the bitstring that activated them). In this fashion, any one hard location can be said to simultaneously point to many different areas of the space, and any point in space is represented by the set of hard locations it activates.

In other words, both reading and writing depend on many hard locations to be successful. This effect is represented in **Figure 4**: where all hard locations inside the circle are activated and they, individually, do not point to *η*. But, as vectors, their sum points to the general direction of *η*. If another datum *ν* is written into the memory near *η*, the shared hard locations will have information from both of them and would not point (directly) to either. All hard locations, inside and outside of the circle, may also point elsewhere to other additional data points: as we have seen, even "face" neurons have multiple functions.

The write operation works as follows: suppose one is writing datum *η* at address *ξ*; then all hard locations inside an *n*-dimensional circle with center *ξ* and radius *r* are activated. So, *η* will be stored in all these *activated* addresses, which are close to address *ξ*. An address *ξ*′ is inside the circle if its Hamming distance to the center *ξ* is less than or equal to the radius *r*, i.e., *distance*(*ξ*, *ξ*′) ≤ *r*. The information will be written to the entire cell assembly: thus, *all hard locations within the circle* will be updated.

**Table 1 | Write operation example in a 7-dimensional memory of data** *η* **being written to** *ξ***, one of the activated addresses.**

Each hard location has both an *address* (given by its bitstring) and a *value*. The value is stored in *counters*. Each hard location has one counter for each dimension in the space. Each counter stores, for its dimension, the bit value that has been written more frequently (0's or 1's) to its hard location. So each counter, corresponding to each dimension, is incremented for each bit 1 and decremented for each bit 0 written to that hard location. Thus, if the counter is positive, the hard location has had more 1's than 0's written to it, if the counter is negative, more 0's than 1's, and if the counter is zero, there have been an equal number of 1's and 0's written to that particular dimension in that particular hard location.

Each datum *η* is written into the counters of every activated hard location inside the access radius, centered on the address *ξ* that equals the datum: *ξ* = *η*. If some neurons are lost, only a fraction of the datum is lost, and the memory remains capable of retrieving the right datum due to the high redundancy of the model.
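The write operation described in this section can be sketched as follows. This is a toy illustration, not the simulation code used for the results reported later: the class `HardLocation`, the function `write`, the three hand-picked 7-bit addresses and the radius of 2 are all hypothetical.

```python
from typing import List

Bits = List[int]


def hamming(a: Bits, b: Bits) -> int:
    """Hamming distance between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b))


class HardLocation:
    """An instantiated address with one signed counter per dimension."""

    def __init__(self, address: Bits):
        self.address = address
        self.counters = [0] * len(address)


def write(memory: List[HardLocation], address: Bits, datum: Bits, radius: int) -> int:
    """Store datum in every hard location within radius of address; return how many were activated."""
    activated = 0
    for loc in memory:
        if hamming(loc.address, address) <= radius:
            for k, bit in enumerate(datum):
                loc.counters[k] += 1 if bit == 1 else -1  # +1 for a 1-bit, -1 for a 0-bit
            activated += 1
    return activated


# Toy 7-dimensional memory with three hand-picked (hypothetical) hard locations.
memory = [
    HardLocation([1, 0, 1, 1, 0, 0, 1]),  # distance 1 from eta
    HardLocation([1, 0, 1, 0, 0, 0, 1]),  # distance 2 from eta
    HardLocation([0, 1, 0, 0, 1, 1, 0]),  # distance 6 from eta
]
eta = [1, 0, 1, 1, 0, 1, 1]
print(write(memory, eta, eta, radius=2))  # 2
print(memory[0].counters)  # [1, -1, 1, 1, -1, 1, 1]
print(memory[2].counters)  # [0, 0, 0, 0, 0, 0, 0]
```

Because the datum is written at the address equal to itself, the two nearby locations both receive the full pattern, while the distant location is left untouched.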

#### *2.2.4. Reading an item from memory*

**Table 2** illustrates a read operation over a 7-dimensional memory. The read operation is performed by polling each activated hard location and choosing the most-written bit for each dimension. A hard location is considered *activated* if it is within the access radius (in Hamming distance) of the activating bitstring cue. Activated hard locations are taken into account in calculating the result of a read operation, while others are ignored. Reading consists of adding the *n* counters of all activated hard locations, dimension by dimension, and, for each bit, setting it to 1 if the summed counter is positive, to 0 if it is negative, and randomly to 0 or 1 if it is zero. Thus, each bit of the returned bitstring is chosen according to all written bitstrings in the entire cell assembly (i.e., all active hard locations) and is equal to the bit value most written in that dimension. In short, the read operation depends on many hard locations to be successful. If another datum *ν* is written into the memory near *η*, the shared hard locations will hold information from both of them without pointing directly to either. In this way, any one hard location may, in a fashion, simultaneously "point" to multiple addresses.
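A single read can be sketched in the same spirit. In this illustration each hard location is represented as an `(address, counters)` pair; that representation, the function name `read` and the example counter values are assumptions made for the sketch.

```python
import random
from typing import List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Hamming distance between two equal-length bit sequences."""
    return sum(x != y for x, y in zip(a, b))


def read(memory: Sequence[Tuple[List[int], List[int]]], cue: List[int],
         radius: int, rng=random) -> List[int]:
    """Sum the counters of all hard locations within radius of cue and threshold each dimension."""
    totals = [0] * len(cue)
    for address, counters in memory:
        if hamming(address, cue) <= radius:
            for k, c in enumerate(counters):
                totals[k] += c
    # Positive sum -> 1, negative sum -> 0, tie -> random bit.
    return [1 if t > 0 else 0 if t < 0 else rng.randint(0, 1) for t in totals]


# Hypothetical 7-dimensional memory: (address, counters) pairs.
memory = [
    ([1, 0, 1, 1, 0, 0, 1], [2, -2, 2, 0, -2, 2, 2]),
    ([1, 0, 1, 0, 0, 0, 1], [1, -1, 1, 1, -1, -1, 1]),
    ([0, 1, 0, 0, 1, 1, 0], [-5, 5, -5, -5, 5, 5, -5]),  # too far from the cue, ignored
]
cue = [1, 0, 1, 1, 0, 1, 1]
print(read(memory, cue, radius=2))  # [1, 0, 1, 1, 0, 1, 1]
```

Only the first two locations fall inside the radius, so the summed counters are `[3, -3, 3, 1, -3, 1, 3]` and no tie has to be broken.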

An imprecise cue *η*<sub>*x*</sub> shares hard locations with the target bitstring *η*, so it should be possible to retrieve *η* correctly, even if additional reading operations become necessary to retrieve *η* exactly. When reading a cue *η*<sub>*x*</sub> that is *x* bits away from *η*, the cue shares many hard locations with *η* (see **Figure 5**). The number of shared hard locations decreases as the distance of the cue to *η* increases, in other words, as *x* = *d*(*η*<sub>*x*</sub>, *η*) increases. The target datum *η* is read in all addresses shared between *η* and *η*<sub>*x*</sub>, thus they will bias the read output toward the direction of *η*. If the cue is sufficiently close to the target datum *η*, the output of the read operation will be closer to *η* than *η*<sub>*x*</sub> originally was. Iterating the read operation will obtain results increasingly closer to *η*, until it is exactly the same. So *η*<sub>*x*<sub>0</sub></sub> will yield an *η*<sub>*x*<sub>1</sub></sub> that is closer, reading at *η*<sub>*x*<sub>1</sub></sub> yields an *η*<sub>*x*<sub>2</sub></sub> that is closer still, and so on until *η*<sub>*x*<sub>*i*</sub></sub> = *η*, if the iteration converges. Hence, performing a sequence of successive read operations will allow convergence onto the target datum *η*.
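The iterated read can be illustrated with a deliberately small memory. The sketch below is a toy model under stated assumptions: the class name `SDM`, the sizes (64 dimensions, 500 hard locations, radius 26), the seeds and the stopping rule (stop when a read returns its own input) are all hypothetical choices made so that the example runs quickly in pure Python; they are not the parameters studied in the text.

```python
import random


def hamming(a, b):
    """Hamming distance between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b))


def random_bits(n, rng):
    """Random bitstring with equal probability of 0 and 1 in each dimension."""
    return [rng.randint(0, 1) for _ in range(n)]


class SDM:
    """Toy sparse distributed memory with randomly sampled hard locations."""

    def __init__(self, n, hard_locations, radius, seed=0):
        self.n, self.radius = n, radius
        self.rng = random.Random(seed)
        self.addresses = [random_bits(n, self.rng) for _ in range(hard_locations)]
        self.counters = [[0] * n for _ in range(hard_locations)]

    def _active(self, address):
        """Indices of the hard locations inside the access circle around address."""
        return [i for i, a in enumerate(self.addresses)
                if hamming(a, address) <= self.radius]

    def write(self, datum):
        """Write datum at the address equal to itself."""
        for i in self._active(datum):
            for k, bit in enumerate(datum):
                self.counters[i][k] += 1 if bit else -1

    def read(self, cue):
        """Single read: sum counters over the activated locations and threshold."""
        totals = [0] * self.n
        for i in self._active(cue):
            for k in range(self.n):
                totals[k] += self.counters[i][k]
        return [1 if t > 0 else 0 if t < 0 else self.rng.randint(0, 1) for t in totals]

    def iterated_read(self, cue, max_reads=10):
        """Feed each output back in as the next cue until a fixed point or max_reads is reached."""
        current = cue
        for reads in range(1, max_reads + 1):
            nxt = self.read(current)
            if nxt == current:
                return nxt, reads
            current = nxt
        return current, max_reads


sdm = SDM(n=64, hard_locations=500, radius=26, seed=1)
data_rng = random.Random(2)
target = random_bits(64, data_rng)
sdm.write(target)
for _ in range(10):  # a few unrelated items acting as noise
    sdm.write(random_bits(64, data_rng))

cue = [1 - b for b in target[:6]] + target[6:]  # cue six bits away from the target
result, reads = sdm.iterated_read(cue)
print("start distance:", hamming(cue, target))
print("final distance:", hamming(result, target), "after", reads, "reads")
```

Moving the cue further from the target, or writing many more noise items, makes convergence progressively less likely, which is the behavior the critical distance is meant to capture.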

Since a cue *η*<sub>*x*</sub> near the target bitstring *η* shares many hard locations with *η*, SDM can retrieve data from imprecise cues (i.e., as an autoassociative memory). Given this characteristic, it is crucial to know how imprecise the cue can be while still converging. What is the maximum distance from our cue to the original data that still retrieves the right answer? There is a precise point at which a non-linearity occurs, and the qualitative behavior of the model changes.

A striking feature of this model is its reflection of the psychological "tip-of-tongue" phenomenon, which seems to reflect the limits of human recollection. It is the psychological state in which one knows that one knows some pre-registered memory item, yet one is unable to recollect it at a given time.

The tip-of-the-tongue phenomenon occurs when a person knows that he/she has been previously exposed to a certain stimulus, but is unable to recall some specifics. In SDM, a tip-of-tongue memory event occurs when the expected time to convergence (or divergence) approaches infinity, in other words, when successive read iterations fail either to converge or to diverge. Kanerva (1988) called this particular value of *x*, at which the output of the read operation is on average again *x* bits from the target, the *critical distance*. Intuitively, it is the distance from which smaller distances converge and greater distances diverge. In **Figure 6**, the circle has radius equal to the critical distance and every *η*<sub>*x*</sub> inside the circle should converge. The figure also shows an example of convergence in four readings. We posit that this is a proxy for the edge of human recall: a threshold up to which recollection occurs, and beyond which it no longer occurs.

Kanerva describes this critical distance as the threshold of convergence of a sequence of read words. It is "the distance beyond which divergence is more likely than convergence" (Kanerva, 1988). Furthermore, "a very good estimate of the critical distance can be obtained by finding the distance at which the arithmetic mean of the new distance to the target equals the old distance to the target" (Kanerva, 1988).

Kanerva has analytically derived this non-linearity for a very particular set of circumstances. His original book analyzed a specific situation with *n* = 1000 (*N* = 2<sup>1000</sup>), 1,000,000 hard locations, an access radius of 451 (with 1000 hard locations in each circle) and 10,000 writes of random bitstrings in the memory.

This is a very particular set of parameters, and doesn't shed light on questions of speed and robustness of expert recollection. In the next section we deal with this non-linearity and the issue of analyzing critical distance as an optimization problem.

In subsequent sections, we will derive an equation for the critical distance in terms of SDM's parameters. We will then present empirical results of the evolution of the critical distance under varying conditions, which shed light on the model's behavior. It is worth noting that, since SDM is itself a computer simulation, what we call *empirical results* refers to conclusions obtained over data from thousands of runs of the simulation. All data and conclusions (aside from theory) herein refer to trials over computer simulations.

#### **3. MATERIALS AND METHODS**

#### **3.1. DERIVING THE CRITICAL DISTANCE AS A MINIMIZATION PROBLEM**

Kanerva has shown that, when 10,000 items are stored in the memory and the number of dimensions is *n* = 1000, the critical distance is at a Hamming distance of 209 bits: if one reads the item at a distance smaller than 209 bits, one is able to iteratively converge toward the item. If, on the other hand, one reads the item at a distance higher than 209 bits, the memory cannot retrieve the item. Furthermore, at the juncture of about 209 bits, the expected time to convergence grows to infinity. This reflects the aforementioned tip-of-the-tongue phenomenon: when one knows that one knows a particular bit of knowledge, yet is unable to retrieve it at that point. Psychologically, this would entail some top–down mechanism which would force the iterated search to halt. We establish a maximum number of iterated reads, based on repeated simulations (see section 4.2).

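Kanerva thus fixed a number of parameters in order to derive this mathematical result:

- the number of dimensions, *n* = 1000;
- the number of hard locations, 1,000,000;
- the access radius, 451, which activates about 1000 hard locations per circle;
- the number of random bitstrings written to the memory, 10,000;
- a single write of the target bitstring.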


As Kanerva defined it, approximately half of the read operations 209 bits away from the target data will bring us closer to the target and approximately half will move us away from the target. His math could be simplified to this: each item will activate approximately 1000 hard locations, so writing 10,000 items randomly will activate a total of 10,000,000 hard locations, giving an average of 10 different bitstrings written in each hard location. When one reads from a bitstring *η*<sub>200</sub>, 200 bits away from the target *η*, *η*<sub>200</sub> will share a mean of 97 hard locations with the target (Kanerva, 1988, Table 7.1, p. 63). This way, it is possible to split the set of active hard locations into two groups: one group having 903 hard locations with 10 random bitstrings written into each; and another group having 97 hard locations, each with 9 random bitstrings plus our target bitstring *η*.

Let us analyze what happens to each bit of the read bitstrings. For each bit we have 903 · 10 + 97 · 9 = 9903 random bits out of a total of 10,000 bits. The total number of 1-bits among them is a random variable that follows the binomial distribution with 9903 trials and *p* = 0.5. It has a mean of 9903/2 = 4951.5 and a standard deviation of √(9903/4) ≈ 49.76. If our target bit is 0, we will choose correctly when the sum is less than half the total, or 10,000/2 = 5000. If our target bit is 1, the sum is the random number of 1-bits plus the 97 1-bits contributed by the copies of the target. Adding a constant changes only the mean and does not affect the standard deviation, so we will choose correctly when a variable with mean 4951.5 + 97 = 5048.5 and standard deviation 49.76 is greater than 5000. In both cases the probability of choosing the same bit as the target is about 83%. As we have 1000 bits, we can predict that the result of the read operation will be, on average, about 170 bits away from the target.
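The arithmetic of this example can be reproduced with a normal approximation to the binomial distribution. The sketch below is an illustration only: the function names are hypothetical, and the expression for the number of random bits assumes, as in the text, that the target was written exactly once.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def kanerva_read_example(shared=97, active=1000, per_location=10, n=1000):
    """Per-bit success probability for a read 200 bits from the target (normal approximation)."""
    # Random bitstrings: non-shared locations hold per_location each,
    # shared ones hold one fewer because one of their entries is the target.
    random_bits = (active - shared) * per_location + shared * (per_location - 1)
    mean = random_bits / 2
    sd = sqrt(random_bits / 4)
    half = active * per_location / 2
    p_if_0 = normal_cdf((half - mean) / sd)                    # P(sum < 5000)
    p_if_1 = 1.0 - normal_cdf((half - (mean + shared)) / sd)   # P(sum + 97 > 5000)
    expected_distance = n * (1.0 - p_if_0)
    return random_bits, mean, sd, p_if_0, p_if_1, expected_distance


print(kanerva_read_example())
```

Without intermediate rounding the per-bit probability comes out slightly above 83%, so the expected distance after one read is a little under the rounded figure of 170 bits; either way it is smaller than the starting distance of 200, so the read moves toward the target.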

The critical distance is the point where the expected distance of the read output from the target equals the distance from the cue *η*<sub>*x*</sub> to the target *η*, that is, *x* = *n*(1 − *p*), where *x* is the distance from the bitstring to the target, *p* is the probability of choosing the correct value of a bit (given by the above technique), and *n* is the number of dimensions.

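Given that we intend to study the critical distance as a theoretical proxy for the limits of human recollection, we would like to explore a larger number of possibilities and parameter settings of the model. Hence we compute the non-linearity of the critical distance as a minimization problem. Let:

- *n* be the number of dimensions;
- *d* be the Hamming distance from the cue to the target bitstring;
- *h* be the number of hard locations activated by a read or write operation;
- *s* be the total number of bitstrings stored in the memory;
- *H* be the number of instantiated hard locations;
- *w* be the number of times the target bitstring was written to the memory;
- *φ*(*d*) be the mean number of hard locations shared by two bitstrings *d* bits apart;
- *θ* be the total number of random bitstrings stored in the *h* hard locations activated by a read.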


Consider a memory in which a total of *s* bitstrings have already been stored via write operations. Each of these write operations would have activated approximately *h* hard locations. This way, on average, all write operations together activate a total of *sh* hard locations. This gives an average of *sh/H* random bitstrings stored in each hard location.

Knowing the average number of bitstrings stored in each hard location, it is simple to find an equation for *θ*. Each read operation performed for a cue *η*<sub>*d*</sub> has *φ*(*d*) hard locations shared with the target bitstring *η*, and *h* − *φ*(*d*) non-shared hard locations. The non-shared hard locations have only random bitstrings stored in them. However, the shared hard locations have the target bitstring written *w* times, resulting in fewer random bitstrings. As the average number of bitstrings written in each hard location is *sh*/*H*, we have:

$$\begin{aligned} \theta &= \frac{s \cdot h}{H} \cdot [h - \phi(d)] + \left(\frac{s \cdot h}{H} - w\right) \cdot \phi(d) \\ \theta &= \frac{s \cdot h^2}{H} - w \cdot \phi(d) \end{aligned}$$
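To make the step between the two lines explicit, the product can be expanded term by term; the second line below substitutes the parameter values quoted elsewhere in this text for Kanerva's case (*s* = 10,000, *h* = 1000, *H* = 1,000,000, *w* = 1) as a sanity check:

$$\begin{aligned} \theta &= \frac{s h^2}{H} - \frac{s h}{H}\,\phi(d) + \frac{s h}{H}\,\phi(d) - w\,\phi(d) = \frac{s h^2}{H} - w\,\phi(d) \\ \theta &= \frac{10{,}000 \cdot 1000^2}{1{,}000{,}000} - 1 \cdot \phi(d) = 10{,}000 - \phi(d) \end{aligned}$$

With *φ*(200) = 97 this gives *θ* = 9903, the same count of random bits obtained above as 903 · 10 + 97 · 9.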

Suppose the *k*-th bit of our target bitstring is zero. The read operation will correctly choose bit 0 if, and only if, more than half of the bitstrings from the activated hard locations have the *k*-th bit 0 (setting aside the case of an equal number of zeros and ones<sup>1</sup>). As each hard location has *sh*/*H* bitstrings and the read operation activates *h* hard locations, half of the bitstrings equals *h* · *sh*/(2*H*) = *sh*<sup>2</sup>/(2*H*). Then, to choose correctly, we should have Σ<sub>*i*=1</sub><sup>*θ*</sup> *X*<sub>*i*</sub> < *sh*<sup>2</sup>/(2*H*), where *X*<sub>*i*</sub> is the *k*-th bit of the *i*-th random bitstring stored in the activated hard locations.

Suppose the *k*-th bit of our target bitstring is 1. The read operation will choose bit 1 when more than half of the bitstrings from the activated hard locations have the *k*-th bit 1. We have already seen that half of the bitstrings is *sh*<sup>2</sup>/(2*H*). But here, as the bit equals 1 and there are *w* copies of the target bitstring in each of the *φ*(*d*) shared hard locations, we have to add *w* · *φ*(*d*) to the sum. In other words, we must account for the number of times the target was written into the hard locations which are activated by both the target and the cue at distance *d*. This gives us *w* · *φ*(*d*) + Σ<sub>*i*=1</sub><sup>*θ*</sup> *X*<sub>*i*</sub> > *sh*<sup>2</sup>/(2*H*).

Summarizing, we have:

$$P(wrong|bit = 0) = 1 - P\left(\sum\_{i=1}^{\theta} X\_i < \frac{sh^2}{2H}\right)$$

$$P(wrong|bit = 1) = P\left(\sum\_{i=1}^{\theta} X\_i < \frac{sh^2}{2H} - w \cdot \phi(d)\right)$$

We already know that *P*(*X*<sub>*i*</sub> = 1) = *P*(*X*<sub>*i*</sub> = 0) = 1/2. Since each *X*<sub>*i*</sub> corresponds to a Bernoulli trial, Σ<sub>*i*=1</sub><sup>*θ*</sup> *X*<sub>*i*</sub> ∼ *Binomial*(*θ*; 0.5), which has mean *θ*/2 and standard deviation √(*θ*/4).

The critical distance is the distance where the chance of convergence to the target equals the chance of divergence from the target. That is, at the critical distance, the probability of a wrong choice of the bit, times the number of bits, is equal to the original distance to the target. Then, the critical distance is the *d* that satisfies the equation *P*(*wrong*) · *n* = *d*, or *P*(*wrong*) = *d*/*n*.

Using the theorem of total probability, we have:

$$P(wrong) = P(wrong|bit = 0) \cdot P(bit = 0) + P(wrong|bit = 1) \cdot P(bit = 1)$$

If we let

$$\alpha = P\left(\sum\_{i=1}^{\theta(d)} X\_i < \frac{sh^2}{2H}\right)$$

and

$$\beta = P\left(\sum\_{i=1}^{\theta(d)} X\_i < \frac{sh^2}{2H} - w \cdot \phi(d)\right),$$

then

$$P(\text{wrong}) = \frac{1}{2} \cdot [(1 - \alpha) + \beta].$$

<sup>1</sup>The case of a tie, decided by a random coin toss, is negligible since, as *θ* becomes large, its probability tends toward 0 quickly.

This way, the equation to be solved is:

$$\frac{1}{2} \cdot [(1 - \alpha) + \beta] = \frac{d}{n}$$

Since *d* is an integer value and *θ* is a function of *d*, this equality may not be achievable exactly (there is then a *d* for which the left side exceeds the right side, while for *d* + 1 the left side is smaller than the right side). In these cases, the critical distance can be obtained by minimizing the following function subject to *d* ∈ ℕ and *d* ≤ *n*:

$$f(d) = \left\{\frac{1}{2} \cdot \left[\left(1 - \alpha\right) + \beta\right] - \frac{d}{n}\right\}^2$$

If *θ*, the number of random bitstrings in the cell assembly, is large enough, a good approximation to the *Binomial*(*θ*; 0.5) is the normal distribution. Let *N* denote the cumulative distribution function of the standard normal distribution (mean zero and variance one). We have:

$$\alpha \simeq N\left(z < \frac{sh^2/(2H) - \theta/2}{\sqrt{\theta/4}}\right) = \tilde{\alpha}$$

$$\beta \simeq N\left(z < \frac{sh^2/(2H) - w \cdot \phi(d) - \theta/2}{\sqrt{\theta/4}}\right) = \tilde{\beta}$$

Simplifying, we have:

$$N\left(z < \frac{sh^2/(2H) - \theta/2}{\sqrt{\theta/4}}\right) = N\left(z < \frac{w \cdot \phi(d)}{\sqrt{\theta}}\right)$$

$$N\left(z < \frac{sh^2/(2H) - w \cdot \phi(d) - \theta/2}{\sqrt{\theta/4}}\right) = N\left(z < \frac{-w \cdot \phi(d)}{\sqrt{\theta}}\right)$$
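The simplification above is specific to the parameter values of the case studied in Kanerva (1988), as stated in the text: *h* = 1000 and *H* = 1,000,000 give *h*²/*H* = 1, so *sh*²/(2*H*) = *s*/2; moreover, *θ* = *s* − *φ*(*d*) with *s* = 10,000, and *w* = 1. As a brief sketch of the intermediate step under these values:

$$\frac{sh^2}{2H} - \frac{\theta}{2} = \frac{s}{2} - \frac{s - \phi(d)}{2} = \frac{\phi(d)}{2}, \qquad \frac{\phi(d)/2}{\sqrt{\theta/4}} = \frac{\phi(d)}{\sqrt{\theta}}$$

$$\frac{sh^2}{2H} - w \cdot \phi(d) - \frac{\theta}{2} = \frac{\phi(d)}{2} - \phi(d) = -\frac{\phi(d)}{2}, \qquad \frac{-\phi(d)/2}{\sqrt{\theta/4}} = \frac{-\phi(d)}{\sqrt{\theta}}$$

With *w* = 1, these agree with the right-hand sides of the two simplified expressions.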

And we have to minimize the following function with restrictions of *d* ∈ N and *d* ≤ *n*:

$$\tilde{f}(d) = \left\{\frac{1}{2} \cdot \left[1 - \tilde{\alpha} + \tilde{\beta}\right] - \frac{d}{n}\right\}^2$$

In the case studied in Kanerva (1988), *n* = 1000, *h* = 1000, *H* = 1,000,000, *s* = 10,000, *w* = 1, and *θ* = 10,000 − *φ*(*d*). Substituting these values into the equation, we have to minimize:

$$\tilde{f}(d) = \left\{\frac{1}{2} \cdot \left[1 - \tilde{\alpha} + \tilde{\beta}\right] - \frac{d}{1000}\right\}^2$$

When *d* = 209, we have *φ*(*d*) = 87 and *f̃*(209) ≅ 0.00032, which is the global minimum.
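The value at *d* = 209 can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names `std_normal_cdf` and `f_tilde` are illustrative, the normal approximation is evaluated with the error function, and *φ*(209) = 87 is taken from the text rather than computed.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def f_tilde(d: int, phi_d: float, n: int = 1000, s: int = 10_000,
            h: int = 1000, H: int = 1_000_000, w: float = 1.0) -> float:
    """Squared gap between the approximate P(wrong) and d/n.

    Assumes theta = s - phi(d), as in the Kanerva (1988) case in the text.
    """
    theta = s - phi_d
    threshold = s * h ** 2 / (2 * H)          # s h^2 / (2H)
    scale = math.sqrt(theta / 4)              # std. dev. of Binomial(theta, 0.5)
    alpha = std_normal_cdf((threshold - theta / 2) / scale)
    beta = std_normal_cdf((threshold - w * phi_d - theta / 2) / scale)
    p_wrong = 0.5 * ((1 - alpha) + beta)
    return (p_wrong - d / n) ** 2


value = f_tilde(209, 87)
print(f"{value:.5f}")  # close to the 0.00032 reported in the text
```

Locating the minimum over all *d* would additionally require the full table of *φ*(*d*) values, which is not reproduced here.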

We note once again that the equations to calculate *φ*(*d*) have been derived in Kanerva (1988, Appendix B) and need not be repeated here (see also the derivations for higher *d* by de Pádua Braga and Aleksander, 1995). The example calculated above used Table 7.1 of Kanerva (1988), which has the values of *φ*(*d*) for *d* in a 1000-dimensional SDM with one million hard locations.

In the following section we briefly reiterate and discuss the contribution of the theoretical model, and turn to empirical results pertaining to the exploration of the critical distance. We vary parameters of the memory model in order to explore the changes to the critical distance. These empirical trials yield enlightening results pertaining to the critical distance as a parallel for the edge of human recollection and for human expertise in SDM.

#### **4. RESULTS**

In this text, we show that, given *α̃*, *β̃*, and *d*, minimizing the function (repeated here from the previous derivation):

$$\tilde{f}(d) = \left\{\frac{1}{2} \cdot \left[1 - \tilde{\alpha} + \tilde{\beta}\right] - \frac{d}{n}\right\}^2$$

solves the issue of non-linearity involved in the critical distance of the model, that is, the psychological limits of human recollection at a given point in time. Such a result should be valuable for assessing whether the memory is prone to convergence or divergence.

This result may help provide avenues of exploration in theoretical neuroscience and can be readily available to cognitive modelers. Yet, it still falls short of giving us an intuitive understanding of the speed and robustness of the memory of experts. Therefore, we will explore the critical distance behavior at different configurations. We have implemented the model and conducted a large set of computational experiments, whose visualizations illuminate the issue of expert memory.

#### **4.1. NUMERICAL SIMULATIONS: VISUALIZING THE MEMORY DYNAMICS**

So far we have seen a single particular case with set parameters, and our goal is to understand the speed and robustness of expert memory. Let us consider variations of these parameters, and compute, through simulations, the behavior of the critical distance. We vary the number of dimensions *N* ∈ {256, 1000}, we vary the number of stored items from the set {1000, 2000, ..., 50,000}, and we vary the rehearsal number: the number of times an item has been stored in the memory.

The following figures depict heat maps describing the behavior of the critical distance. In these simulations, all items are stored at their respective locations, that is, a bitstring *x* is stored at the location *x*. Generating each heat map proved computationally demanding: when *N* = 1000, approximately 305,000,000,000,000 bit-compares are required (storing items in memory: 5 · 10<sup>13</sup>; reading items from memory: 3 · 10<sup>14</sup> bit-compares). *Each individual pixel* demands an average of 7,000,000,000 bit-compares.

All figures presented below have three colored lines. The green line marks the *first occurrence of non-convergence* to the exact target bitstring. The red line marks the *last occurrence of convergence* to the exact target bitstring. Finally, the blue line marks the *estimated critical distance*, that is, the distance at which the read output, on average, equals the input distance to the target bitstring. It is an estimate because the critical distance is not exactly defined this way: the critical distance is the point or region in which divergence and convergence each have a 50% chance of occurring. That is, all points before the green line converged, all points after the red line diverged, and the points between these lines sometimes converge and sometimes diverge.

One may notice that, despite not having an exact convergence, almost all points between the green and the red line are near the target bitstring.
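The green and red lines can be extracted from simulation outcomes with a simple scan. The sketch below is only illustrative: it assumes a hypothetical list holding one Boolean per starting distance (whether the read ended exactly on the target), whereas the actual simulations may aggregate several trials per distance.

```python
def convergence_band(converged_by_distance):
    """Return (green, red) for one column of a heat map.

    converged_by_distance[d] is True when a read that started at distance d
    from the target ended exactly on the target bitstring.
    green: first distance that failed to converge (None if all converged).
    red:   last distance that still converged (None if none converged).
    """
    green = next(
        (d for d, ok in enumerate(converged_by_distance) if not ok), None
    )
    red = next(
        (d for d in range(len(converged_by_distance) - 1, -1, -1)
         if converged_by_distance[d]),
        None,
    )
    return green, red


# Hypothetical outcomes for distances 0..6: the band lies between 3 and 4.
outcomes = [True, True, True, False, True, False, False]
print(convergence_band(outcomes))  # (3, 4)
```

Distances between the two returned values form the band in which reads sometimes converge and sometimes diverge.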

#### **4.2. INFLUENCE OF ITERATIVE READINGS IN CRITICAL DISTANCE**

The number of iterative-readings is an important parameter of an SDM implementation. Simulations were run on a 1000-dimensional and a 256-dimensional SDM, both with one million hard locations, activating (on average) 1000 hard locations per operation, and varying the number of times the target bitstring *η* is written to memory.

For each write-strength of *η* (written once, twice, five times, nine times) we varied the saturation of the SDM, that is, the number of random bitstrings written (once each) along with *η* in the memory. We varied this from 1000 to 50,000 random bitstrings, in increments of 1000. Once the memory was populated with *η* plus the random bitstrings, we performed 1–40 iterative-readings at each possible distance from the target (from zero to the number of dimensions).
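To make the procedure concrete, here is a toy sketch of writing a bitstring at its own address and then performing iterative-readings from a noisy cue. It is not the authors' code: the class `ToySDM`, its parameters (64 bits, 1000 hard locations, radius 26), and the tie rule (keep the cue's bit instead of a coin toss) are all assumptions chosen to keep the example small and deterministic.

```python
import random


def hamming(a: int, b: int) -> int:
    """Hamming distance between two bitstrings stored as integers."""
    return bin(a ^ b).count("1")


class ToySDM:
    """Tiny counter-based SDM; sizes are far below those used in the text."""

    def __init__(self, n_bits, n_locations, radius, seed=0):
        rng = random.Random(seed)
        self.n_bits = n_bits
        self.radius = radius
        self.addresses = [rng.getrandbits(n_bits) for _ in range(n_locations)]
        self.counters = [[0] * n_bits for _ in range(n_locations)]

    def _active(self, address):
        # Hard locations within the activation radius of the address.
        return [i for i, a in enumerate(self.addresses)
                if hamming(a, address) <= self.radius]

    def write(self, address, data):
        for i in self._active(address):
            row = self.counters[i]
            for bit in range(self.n_bits):
                row[bit] += 1 if (data >> bit) & 1 else -1

    def read(self, address):
        sums = [0] * self.n_bits
        for i in self._active(address):
            for bit, count in enumerate(self.counters[i]):
                sums[bit] += count
        out = 0
        for bit, total in enumerate(sums):
            # Simplification: on a tie, keep the cue's bit (no coin toss).
            if total > 0 or (total == 0 and (address >> bit) & 1):
                out |= 1 << bit
        return out

    def iterative_read(self, cue, max_reads=6):
        current = cue
        for _ in range(max_reads):
            nxt = self.read(current)
            if nxt == current:   # fixed point reached
                break
            current = nxt
        return current


rng = random.Random(1)
sdm = ToySDM(n_bits=64, n_locations=1000, radius=26)
eta = rng.getrandbits(64)
sdm.write(eta, eta)                  # store eta at its own address
cue = eta ^ 0b11111                  # cue at Hamming distance 5 from eta
result = sdm.iterative_read(cue)
print(hamming(cue, eta), hamming(result, eta))
```

In this toy setting only *η* is stored, so there is no interference; reproducing the saturation effects discussed in the text would require also writing the random bitstrings and using the full-size memory.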

**Figures 7A–D** show, respectively, a 1000-dimensional SDM checked with a single read, 6, 10, and 40 iterative-readings. It is easy to see a huge difference from a single read to more reads, but a small difference from 6 to 10 and from 10 to 40 iterative-readings. These observations also apply to our tests with the 256-dimensional SDM. As compared to the 1000-dimensional SDM, we found a smaller, more gradual difference from a single read to more reads, yet a minute difference from 6 to 10 and from 10 to 40 iterative-readings. Following these results, and due to the number of computations needed in each simulation, all other simulations were done using 6 iterative-readings, since 40 iterative-readings yield only a slight improvement over six.

It is unexpected that, after 40,000 writes in the 1000-dimensional memory, the critical distance is so small. Kanerva (1988) showed that, under these parameters, the memory capacity is slightly less than 100,000 items. The author defines SDM capacity as saturated when its critical distance is zero. In the 256-dimensional memory, this behavior starts after 20,000 writes. This is unexpected, since Kanerva's estimate for *N* = 256 is between 112,000 and 137,000 random bitstrings stored.

**FIGURE 7 | Influence of number of iterative-readings in a 1000-dimensional SDM memory.**

Our principal hypothesis for the discrepancies between our empirical results and the original theory is that, while the hard locations are instantiated as samples from a uniform distribution and our simulations wrote bitstrings randomly, they do not saturate uniformly. Any write activates, on average, a fixed number of hard locations (around 1000 in our case), but the variance in this case is not insignificant. One bitstring read may activate 900 while another (in another area of the space, be it close or far) may activate 1100 hard locations. Thus, certain hard locations would become more noise than signal during activation sooner, rather than a uniform degradation occurring. This discrepancy would cause, in the aggregate, a saturation of the SDM with fewer bitstrings stored than expected in theory. This remains one possibility, though we hope the issue will be explored in future work.
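To give a sense of scale for this variability, one can model the number of activated hard locations as a binomial count. This is only a back-of-the-envelope sketch under a stated assumption, namely that each of the one million hard locations is activated independently with probability 1000/1,000,000; it is not a result from the simulations.

```python
import math

H = 1_000_000          # number of hard locations (from the text)
p = 1000 / H           # assumed per-location activation probability

mean = H * p                         # expected activations per operation
sd = math.sqrt(H * p * (1 - p))      # binomial standard deviation

print(f"mean = {mean:.0f}, sd = {sd:.1f}")
for count in (900, 1100):
    # How many standard deviations each example count lies from the mean.
    print(count, round((count - mean) / sd, 2))
```

Under this assumption the standard deviation is a few percent of the mean, so activation counts differ noticeably from one address to another, which is the kind of non-uniformity the hypothesis above relies on.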

#### **4.3. INFLUENCE OF THE NUMBER OF WRITES ON THE CRITICAL DISTANCE**

The influence of the number of writes on the critical distance was not analyzed by Kanerva. It is important because, when a random bitstring is seen only once, it is psychologically plausible that it will be gradually forgotten with new incoming information. What matters is not exactly the number of writes, but the proportion of the number of times a bitstring was stored in relation to others.

A remark on cognitive psychology is in order here. Consider, as an example, the aforementioned exchange between Szilard and Einstein. As an expert confronts unexpected information, it is reasonable to expect that additional memory writes will occur. If we presume that evolution brought the human memory close to optimality, as explored by the rational analysis approach (Anderson and Milson, 1989; Anderson, 1990; Anderson and Schooler, 1991), one would expect some mechanism akin to Shannon's idea of *information content* to be in play.

That is, suppose an expert is surprised by new, unforeseen information, say, an outcome *o* with information content *I*(*o*) = −log *P*(*o*). One would then expect the expert's memory to pay additional attention to the outcome, leading to: (1) additional writes to memory, or (2) amplification of the write operation's signals, or possibly (3) both effects.
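Shannon's information content is straightforward to compute. The sketch below uses base-2 logarithms (bits), which is a choice made here for illustration; the function name `information_content` is likewise hypothetical, and no particular mapping from surprise to the number of memory writes is implied.

```python
import math


def information_content(p: float) -> float:
    """Shannon information content, in bits, of an outcome with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log2(p)


print(information_content(0.5))        # 1.0 bit: an unsurprising outcome
print(information_content(1 / 1024))   # 10.0 bits: a highly surprising outcome
```

The rarer the outcome, the larger its information content, which is the quantity the argument above suggests might modulate additional writes or write amplification.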

**Figures 8A–D** show a 1000-dimensional SDM with 1, 2, 5, and 9 writes of the target bitstring *η*. It is easy to see a huge difference from 1 to 2 writes. Although the green line has a strange behavior near 50,000 items stored, the critical distance was much greater than with 1 write. From 2 to 5 to 9 rehearsals, the critical distance starts growing rapidly and slows down near six writes. This makes sense, since it should have a threshold smaller than 500 bits.

The 256-dimensional memory behaves similarly, but less abruptly. Its critical distance keeps growing, but more slowly than in the 1000-dimensional memory. It never crosses 50 bits on the x-axis, while the 1000-dimensional memory reaches 200 bits and almost hits 400 bits on the x-axis.

These figures display the immense power of reinforcement or rehearsal: additional writes of a memory item significantly raise the attractor basin (critical distance) for that memory item.

This behavior is plausible, as the human brain rapidly recognizes a pattern when it is used to it. Many times, the patterns appear in different contexts, giving cues far from the target concept, much like a chess player, who looks at a position and rapidly recognizes what is happening (Bilalić et al., 2009; Rennig et al., 2013).

#### **5. DISCUSSION**

**FIGURE 8 | Influence of number of target writes in a 1000-dimensional SDM memory.**

This is the first work focused on better understanding the critical distance behavior of a Sparse Distributed Memory (Kanerva, personal communication). Our future research intends to explore the rehearsal mechanisms in cognitive architectures for one of the most studied domains of expertise: chess (Linhares, 2005; Linhares and Brum, 2007; Linhares and Freitas, 2010; Linhares et al., 2012; Linhares and Chada, 2013; Linhares, 2014), and to attempt to bridge the low-level world of neurons and their assemblies with the high-level world of abstract thought and understanding of strategic scenarios. We have argued here that, as SDM remains both a psychologically plausible and a neuroscientifically plausible model of human memory, the study of its critical distance may provide insights into the edges of our own recollection. Without a precise understanding of the critical distance behavior, one cannot advance the theoretical model. Moreover, one cannot develop robust applications without knowing the limits of convergence.

The empirical tests shown here confirmed that the critical distance in SDM constitutes a "band" wherein both convergence and divergence become less and less likely. This is a palatable result because, intuitively, the Tip-of-the-Tongue phenomenon in humans seems like an attractor, something we sometimes "fall into." We argue this is a parallel between SDM and human recollection, and posit that our theoretical and empirical results provide evidence that the critical distance is a correlate to the edge of human recollection.

While humans sometimes fall into the TotT, there are also times when we *almost* fall into it and, after a bit of effort, are able to recall the desired information. In the model, this would mean we enter the critical band, but leave it after one or two iterations and converge. Likewise, it seems one can be very certain of what one is saying and, in mid-sentence, completely diverge from the next piece of information we wished to recall. In SDM this would amount to entering the critical band, but then diverging.

As **Figure 7** shows, the speed of convergence is a function of the number of read operations: additional read operations bring one closer to the memory item (assuming that the original cue was not past the critical point). We also see that this effect is greatly reduced after 6 to 10 read operations. As **Figure 8** shows, expertise can be correlated with providing additional writes to the memory, and we show that increasing the rehearsal number *greatly increases the margin for error or ambiguity*, and *greatly decreases the relevant information needed for convergence*, as the critical threshold is increased. In human terms, experts "know what you are talking about" with fewer cues. Their memory has much greater robustness.

Yet, it is *the combination of these two dynamics* that sheds light on experts' speed. Taking the SDM model as a plausible account of human memory, we can compare by saying that, for experts, having a much higher threshold may signify *being able to converge within fewer, or even a single, read operation*. As the hard locations have been reinforced with the original information, read operations converge faster. With very few cues and noisy, ambiguous, information, experts may still manage to recollect and understand almost immediately the object, situation, or event in question. It is no wonder Albert Einstein could immediately grasp Leo Szilard's concerns.

#### **5.1. DATA SHARING**

All the computational methods developed in this study are available as an open-source project, and can be found at https://github.com/msbrogli/sdm.

#### **FUNDING**

This work has been generously supported by grants from the Fulbright Foundation and the FAPERJ Foundation (grants E-26/110.540/2012 and E-26/111.846/2011), grants from the CNPq Foundation (grants 401883/2011-6 and 470341/2009-2), and the Pro-Pesquisa program of the FGV Foundation.

#### **ACKNOWLEDGMENTS**

The authors are enormously grateful for a series of discussions with Dr. Pentti Kanerva, Dr. Eric Nichols, Dr. Robert M. French, Dr. Rafael Godszmidt, Dr. Alexandre Mendes, Dr. Christian N. Aranha, Felipe Buchbinder, Ariston D. de Oliveira, and Manuel Doria. We would also like to thank the referees for their lucid and highly constructive reviews of our work.

#### **SUPPLEMENTAL MATERIAL**

The supplemental data file consists of five parts: (1) an introduction to the computational methods available at https://github.com/msbrogli/sdm; (2) a large set of additional heatmaps, documenting the behavior of the model in a 1000-dimensional memory; (3) the same tests on 256-dimensional memory; (4) all rehearsal results for 256-dimensional memory; and (5) the same tests for 1000-dimensional memory.

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00222/abstract

#### **REFERENCES**


Marr, D. (1982). *Vision*. New York, NY: W.H. Freeman.

Mazor, O., and Laurent, G. (2005). Transient dynamics versus fixed points in odor representations by locust antennal lobe projection neurons. *Neuron* 48, 661–673. doi: 10.1016/j.neuron.2005.09.032


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 September 2013; accepted: 28 March 2014; published online: 28 April 2014.*

*Citation: Brogliato MS, Chada DM and Linhares A (2014) Sparse distributed memory: understanding the speed and robustness of expert memory. Front. Hum. Neurosci. 8:222. doi: 10.3389/fnhum.2014.00222*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Brogliato, Chada and Linhares. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Functional cerebral reorganization: a signature of expertise? Reexamining Guida, Gobet, Tardieu, and Nicolas' (2012) two-stage framework

*Alessandro Guida<sup>1</sup>\*, Fernand Gobet<sup>2</sup> and Serge Nicolas<sup>3</sup>*

*<sup>1</sup> Département de Psychologie, Centre de Recherche en Psychologie, Cognition et Communication, Université Rennes 2, Rennes, France*

*<sup>2</sup> Department of Psychological Sciences, University of Liverpool, Liverpool, UK*

*<sup>3</sup> Institut de Psychologie, Université Paris Descartes, Boulogne Billancourt, France*

*\*Correspondence: alessandro.guida@univ-rennes2.fr; alessandro.guida.psychology@gmail.com*

#### *Edited by:*

*Merim Bilalić, University Tübingen, University Clinic, Germany*

#### *Reviewed by:*

*Guillermo Campitelli, Edith Cowan University, Australia*

*Robert Langner, Heinrich Heine University Düsseldorf, Germany*

**Keywords: expertise, working memory, functional cerebral reorganization, chunks, templates, retrieval structures**

In 2012, Guida, Gobet, Tardieu and Nicolas proposed a two-stage framework to explain how cognitive changes due to practice could shape experts' brains physiologically and thus explain neuroimaging data on expertise acquisition. In this paper, after presenting the motivations for such a framework and the framework itself, we examine the idea that functional cerebral reorganization (FCR) could be used as a signature for expertise.

#### **CHUNKS, TEMPLATES AND RETRIEVAL STRUCTURES**

In the mid-nineties, Ericsson and Kintsch (1995) and Gobet and Simon (1996) proposed Long-Term Working Memory theory (LTWMT) and Template Theory (TT), respectively, in order to account for behavioral data in the domain of expertise. These data were difficult to explain with the sole concept of chunk (Chase and Simon, 1973), given the severe limitations of working memory (WM) (7 ± 2 for an optimistic estimation, Miller, 1956; but for recent reevaluations, see Cowan, 2001; Gobet and Clarkson, 2004; Mathy and Feldman, 2012). For example, several experiments (e.g., Charness, 1976; Frey and Adesman, 1976; Glanzer et al., 1984) showed that interfering tasks had almost no effect on experts' WM performance or text comprehension. Yet, according to chunking theory (Chase and Simon, 1973), interfering tasks should wipe out the content of WM, where information is stored. This led Ericsson and Kintsch (1995) and Gobet and Simon (1996) to suggest that information was not stored in WM as initially proposed, but was rapidly and efficiently transferred to long-term memory (LTM), where interfering tasks have no effect. Both theories proposed that this was possible only if knowledge structures were built. These structures were called templates in TT and retrieval structures in LTWMT. Even if differences exist between the two theories (e.g., Ericsson and Kintsch, 2000; Gobet, 2000a,b), LTWMT and TT revolve around the same fundamental core idea: Fast and reliable transfer to LTM becomes possible with expertise via knowledge structures, which enables LTM to be used during WM tasks, thus giving the appearance of expanding individuals' WM capacity. These two cognitive theories have been used to explain not only behavioral but also neuroscientific data (e.g., Pesenti et al., 2001; Ericsson, 2003; Campitelli et al., 2007; Bilalić et al., 2010).

#### **EXPLAINING NEUROIMAGING DATA IN EXPERTISE ACQUISITION: A TWO-STAGE FRAMEWORK**

Recently, the core idea of the two theories has been used by Guida et al. (2012) to bridge, for the first time, (a) neuroimaging data acquired from novices undergoing practice in WM-related tasks and (b) neuroimaging data acquired from experts in WM-related tasks. The results of the two groups of studies, which belong to two separate domains of research, diverge. Neuroimaging studies of novices practicing from 2 h up to 5 weeks show mainly a decrease of activation in prefrontal and parietal areas (for similar conclusions, see Kelly and Garavan, 2005; Hill and Schneider, 2006; Buschkuehl et al., 2012). Conversely, neuroimaging studies of experts who are compared to novices are more compatible with FCR, viz., experts and novices use different brain areas and different mental operations to perform similar tasks (for similar conclusions, see Ericsson, 2003). Notwithstanding these divergent findings (brain activation decrease vs. FCR), the core idea behind LTWMT and TT allows bridging these two neuroimaging patterns into a coherent two-stage framework.

#### **FIRST STAGE: DECREASE OF ACTIVATION DUE TO CHUNK CREATION AND RETRIEVAL**

When novices start practicing, and if the activity is new, the first important process is chunk creation. While executing their new activity several times, novices will gradually start chunking separate elements together through binding, viz., encoding the relations among stimuli that co-occur (Cohen and Eichenbaum, 1993). Once chunks have been created and thus stored in LTM, they can be retrieved and therefore used, allowing multiple elements to be encoded in WM as one chunk (e.g., "f," "b," "i," can be encoded as one element in WM instead of three), using fewer resources.

At a physiological level, chunk creation (through binding) and chunk retrieval are two reasons to expect brain activation decrease. First, if the binding process occurs in prefrontal regions (Prabhakaran et al., 2000; Raffone and Wolters, 2001) and in parietal regions (Shafritz et al., 2002; Oakes et al., 2006), less activation should be observed in these regions after a period of training, because as training progresses, fewer chunks will be created. Second, the use of chunks through chunk retrieval makes it possible to encode information in WM with fewer resources, as elements are grouped. Several researchers have shown that, physiologically, there is a correlation between the number of elements in WM and brain activity in prefrontal and parietal areas<sup>1</sup> (Todd and Marois, 2004; Vogel and Machizawa, 2004; Cowan, 2011). Therefore, if less WM space is used through chunk retrieval, a decrease of brain activity should be expected in prefrontal and parietal WM areas (**Figure 1**).

#### **SECOND STAGE: FUNCTIONAL CEREBRAL REORGANIZATION DUE TO KNOWLEDGE STRUCTURES CREATION AND RETRIEVAL**

With practice (e.g., Cowan et al., 2004; Chen and Cowan, 2005) and expertise (e.g., Chase and Simon, 1973; Gobet and Simon, 1996), chunks get larger and more complex, and with years of training they become knowledge structures. For experts, the peculiarity of these structures, when used in their domain of expertise, is to allow rapid and reliable encoding in episodic LTM, even in WM-like conditions (fast presentation times of multiple elements) when usually elements can only be encoded reliably in WM (**Figure 1**).

In terms of brain activation, at this stage, not only a cerebral activation pattern compatible with WM activities is expected but conjunctly a pattern compatible with episodic LTM, that is, medial temporal lobe (MTL) activations (Gabrieli et al., 1997; Young et al., 1997; Lepage et al., 1998; for reviews, see Squire et al., 2004; Eichenbaum et al., 2007) due to the utilization of knowledge structures. From a longitudinal standpoint, this implies a FCR, which can be defined by two changes occurring with practice: (a) the decrease of brain activity undergirding cognitive processes that are used less with practice (here WM in stage 1), and (b) the emergence of brain activity in new areas supporting new cognitive processes (here episodic LTM in stage 2). Therefore, a FCR involving episodic LTM<sup>2</sup> is expected (**Figure 1**). Unfortunately, to our knowledge, nobody has followed the development of expertise in a WM-related task with neuroimaging long enough to test this hypothesis. Instead, what is possible is to compare novices against experts. This was the aim of Guida et al.'s (2012) review, which showed that most of the studies were compatible with FCR involving LTM.

**FIGURE 1 | Schematic representation of the Two-Stage Framework linking the cognitive and cerebral levels in expertise acquisition, through two examples.** The "Examples" section shows the evolution of the effect of knowledge on how items to-be-remembered are processed: at first, items are processed almost separately, later, items are regrouped in chunks, and finally in knowledge structures, which can be viewed as super-chunks that regroup multiple chunks into a high-level pattern. In the "Cerebral Level" section, the representation of brain activity is at an ordinal scale. SST stands for statistical significance threshold; if brain activity is beneath this threshold, it goes undetected. PFC stands for prefrontal cortex, PL for parietal lobe, and MTL for medial temporal lobe. The first MTL activity on the left is almost at the same level as the statistical significance threshold in order to indicate that for novices, brain activity is sometimes detected (see section "Concluding Remarks"). For novices, detection seems to vary according to the kind of experimental paradigm, the parameters and maybe the participants of the experiments. If one considers that the MTL activity is above the statistical significance threshold for novices then functional cerebral reorganization is better suited to describe expertise acquisition; if it is beneath, then functional cerebral redistribution is better suited.

<sup>1</sup> It is particularly the case in the intra-parietal sulcus (e.g., Majerus et al., 2010; Cowan et al., 2011).

<sup>2</sup> FCR, in this article, means a shift from one way of performing a task to another, without specifying the new way. FCR involving episodic LTM, in this article, means that the new way of performing the task is through episodic LTM.

#### **FUNCTIONAL CEREBRAL REORGANIZATION: A SIGNATURE OF EXPERTISE?**

Given that a link has been established between expertise and FCR, an important question is to know whether FCR could be used as a signature for expertise. A simple way to answer this question is to examine empirically whether the implication "expertise thus FCR" observed by Guida et al. (2012) could be reversed. In other words, when one looks for patterns compatible with FCR—viz. a decrease of brain activity concerning one cognitive process and the emergence of brain activity concerning new cognitive processes—is expertise found? If it is not the case then FCR does not imply expertise.

At first glance, this does not seem to be true. There are multiple examples showing that the simple utilization of different strategies can involve patterns of activation similar to FCR. For instance, the literature on WM-related tasks shows that when different groups of individuals spontaneously use different strategies, such as a verbal vs. a visual strategy (Burbaud et al., 2000) or a verbal vs. a spatial strategy (Glabus et al., 2003), completely different patterns of activation are detected. This seems to be true even when the strategies are dictated by the experimenter, as observed by Bernstein et al. (2002) when imposing different encoding strategies in a face recognition task. These three between-subject studies bring only indirect evidence, but they are confirmed by a within-subject study. When Reichle et al. (2000) asked the same individuals to perform a sentence-picture verification task with different strategies (linguistic vs. visual), completely different patterns of activity appeared: there was a decrease of brain activity concerning some cognitive processes (e.g., linguistic) and the emergence of brain activity concerning new cognitive processes (e.g., visual). In all these cases, a pattern consistent with FCR is present but no expertise is found. Therefore, the implication "expertise thus FCR" does not seem reversible.

However, when considering precisely FCR involving episodic LTM areas, the picture is different. First, we found only one study (Kondo et al., 2005); secondly, it is the only study where participants were taught how to use knowledge structures. Kondo et al. (2005) asked their novice participants to encode ten object pictures using the method of loci, basing themselves on the visuospatial knowledge of their house. When comparing neuroimaging before and after using the method of loci, they observed a pattern consistent with FCR at retrieval. Hence, if one argues that the method of loci is based on the utilization of expertise (Guida et al., 2009, 2013), the conclusion from Kondo et al. (2005) could be that for FCR involving episodic LTM areas, the implication "expertise thus FCR" can be reversed, making the proposal that FCR is a signature for expertise verisimilar (when involving episodic LTM).

However, when trying to relate expertise and FCR, and before one can be conclusive on the link between these two concepts, two elements need to be taken into consideration: functional cerebral redistribution and brain connectivity. These will constitute our concluding remarks.

#### **CONCLUDING REMARKS**

A recent and growing body of data suggests that in some cases, functional cerebral redistribution could also occur with practice. Both FCR and functional cerebral redistribution involve a combination of increases and decreases in activation (Kelly and Garavan, 2005); however, only FCR necessitates the emergence of new areas with practice. Recent evidence suggests that MTL could also be involved in WM tasks with no practice (e.g., Ranganath and Blumenfeld, 2005; Olson et al., 2006; Lee and Rudebeck, 2010; Campo et al., 2013). The debate is still ongoing and these results are considered artifactual by some, mainly because the tasks used seem more LTM-like than WM-like (Jonides et al., 2008). Squire and Wixted (2011) observed that if WM capacity were not exceeded, MTL was not involved (e.g., Shrager et al., 2008; Jeneson et al., 2012). Nonetheless, these data suggest that in some cases, MTL activation could be expected at the early stages of training. It is plausible that this activation could increase with expertise when knowledge structures are available, which means that this pattern would be better described by functional redistribution than FCR, because this last pattern implies no MTL activation at the initial stage of practice (**Figure 1**). However, the additional areas recruited by experts are sometimes the same structures as those of novices but in the opposite hemisphere (e.g., Bilalić et al., 2011, 2012), a.k.a. the "double take" phenomenon (e.g., Scalf et al., 2007), which sometimes complicates the distinction between functional redistribution and FCR. To conclude concerning MTL, more work needs to be done to ascertain its involvement, especially in practice-related studies where this kind of evidence is scarce (but see Dahlin et al., 2008); presently, therefore, these are only assumptions.

Finally, when considering expertise-related FCR, which constitutes a combined increase and decrease in activation across the brain, it is also crucial to understand how the different brain areas work together in terms of network connectivity. Fundamental in this respect is the idea of "neural context" proposed by McIntosh (1998; also, see Bressler and McIntosh, 2007), according to which the frame of activation (or the neural context of activation) around a determined brain area is at least as important as the activation of that brain area. If one relates this idea to practice, then the consequence is that even if the activation of a region does not change with practice, it can still be crucial, by influencing the increase or decrease of activation in other brain areas (Kelly and Garavan, 2005). The neural context could thus be important for the study of functional reorganization, and its application should be disseminated (Bressler and Menon, 2010).

#### **ACKNOWLEDGMENTS**

In memory of Hubert Tardieu, our dear friend, collaborator, and colleague. We feel privileged to have shared moments of his life. We will never forget his sense of humor, wit and genuine kindness.

#### **REFERENCES**

Bernstein, L. J., Beig, S., Siegenthaler, A. L., and Grady, C. L. (2002). The effect of encoding strategy on the neural correlates of memory for faces. *Neuropsychologia* 40, 86–98. doi: 10.1016/S0028-3932(01)00070-7


task: a logical sequel to Miller (1956). *Psychol. Sci.* 15, 634–640. doi: 10.1111/j.0956-7976.2004.00732.x


effect enthusiasm. *Mem. Cognit.* 41, 571–587. doi: 10.3758/s13421-012-0284-3


*Neurosci.* 18, 1087–1097. doi: 10.1162/jocn.2006.18.7.1087


reduces attentional blink. *J. Exp. Psychol. Hum. Percept. Perform.* 33, 298–329. doi: 10.1037/0096-1523.33.2.298


working memory capacity. *Nature* 428, 748–751. doi: 10.1038/nature02447

Young, B. J., Otto, T., Fox, G. D., and Eichenbaum, H. (1997). Memory representation within the parahippocampal region. *J. Neurosci.* 17, 5183–5195.

*Received: 23 July 2013; accepted: 02 September 2013; published online: 20 September 2013.*

*Citation: Guida A, Gobet F and Nicolas S (2013) Functional cerebral reorganization: a signature of expertise? Reexamining Guida, Gobet, Tardieu, and Nicolas' (2012) two-stage framework. Front. Hum. Neurosci. 7:590. doi: 10.3389/fnhum.2013.00590*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Guida, Gobet and Nicolas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Task decomposition: a framework for comparing diverse training models in human brain plasticity studies

### *Emily B. J. Coffey1,2 and Sibylle C. Herholz 1,2,3\**

*<sup>1</sup> Montreal Neurological Institute, McGill University, Montreal, QC, Canada*

*<sup>2</sup> International Laboratory for Brain, Music and Sound Research, Université de Montreal, Montreal, QC, Canada*

*<sup>3</sup> German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany*

#### *Edited by:*

*Merim Bilalic, University Clinic, University of Tübingen, Germany*

#### *Reviewed by:*

*Luca Turella, University of Trento, Italy Alessandro Guida, University of Rennes 2, France*

#### *\*Correspondence:*

*Sibylle C. Herholz, Deutsches Zentrum für Neurodegenerative Erkrankungen, Holbeinstr. 13-15, 53175 Bonn, Germany e-mail: sibylle.herholz@dzne.de; sibylle.herholz@gmail.com*

Training studies, in which the structural or functional neurophysiology is compared before and after expertise is acquired, are increasingly being used as models for understanding the human brain's potential for reorganization. It is proving difficult to use these results to answer basic and important questions like how task training leads to both specific and general changes in behavior and how these changes correspond with modifications in the brain. The main culprit is the diversity of paradigms used as complex task models. An assortment of activities ranging from juggling to deciphering Morse code has been reported. Even when working in the same general domain, few researchers use similar training models. New ways to meaningfully compare complex tasks are needed. We propose a method for characterizing and deconstructing the task requirements of complex training paradigms, which is suitable for application to both structural and functional neuroimaging studies. We believe this approach will aid brain plasticity research by making it easier to compare training paradigms, identify "missing puzzle pieces," and encourage researchers to design training protocols to bridge these gaps.

**Keywords: expertise, plasticity, training, MRI, multisensory learning**

### **INTRODUCTION**

The idea that the structure and function of the human brain remains somewhat open to alteration by experience over the lifespan is now well established (Wan and Schlaug, 2010; Zatorre et al., 2012), although researchers have not yet formed a comprehensive view of how – and under which conditions – this occurs.

In this paper, we focus on research on training-related plasticity in human subjects that uses complex skills as models, such as juggling (e.g., Draganski et al., 2004; Boyke et al., 2008; Scholz et al., 2009), golfing (e.g., Bezzola et al., 2011), or various aspects of making music (e.g., Lappe et al., 2008; Hyde et al., 2009). Work using such skills complements earlier and ongoing research on more basic aspects of brain–behavior relationships, such as learning a simple finger-tapping task (e.g., Ungerleider et al., 2002).

Complex tasks offer several advantages over simpler tasks as models: they involve more ecologically valid learning experiences; they offer an opportunity to study higher-order and domain-general aspects of learning; they are often inherently interesting to subjects, which benefits motivation and compliance, particularly in longitudinal studies; and most significantly, recent evidence suggests that the multisensory and sensorimotor nature of such tasks is particularly effective in inducing plasticity both in sensory and association cortical areas (Lappe et al., 2008; Paraskevopoulos et al., 2012).

The complex nature of the tasks also introduces major challenges for the comparison and integration of results across training studies. These studies usually produce complex results, including changes in activity or structure in many different brain regions. Strictly speaking, only a direct comparison can demonstrate specificity of plastic changes. Such comparisons are rarely available, though; the majority of training studies have used either control groups without training, or comparison with a within-subject baseline to assess the possibility of developmental or other non-specific changes. In a recent review of 20 studies on the structural effects of a range of cognitive and multisensory training paradigms, for example, only three compared the task of interest with a second task (Thomas and Baker, 2013). Since few direct comparisons are available, inferences regarding the specificity of task effects rest on arguments about the relevance of the brain structure to apparently related tasks, or on correlational evidence in the form of relationships with behavioral change. Typically, this works well when outcomes can be predicted beforehand, and indeed training studies report changes in brain areas (or other physiological measurements) that are known from previous work to be involved in related activities (e.g., in auditory and motor areas for musical training; Hyde et al., 2009). Findings which are not predicted, for example because relationships of higher-order cognitive systems to training are as yet unknown, can pose greater interpretation problems. This is due to the dissimilarity of the studies available for comparison – studies using unalike paradigms offer only very weak and caveated support for one another.

As well as complicating the explanation of unexpected results, the diversity of complex task paradigms makes it hard for researchers to draw general bigger-picture conclusions about brain plasticity from the aggregate results. As Erickson (2012) notes, "it is difficult to retrieve much homogeneity in the outcomes from such a heterogeneous set of studies and primary aims." There are basic questions about brain plasticity which remain unresolved. For example, why does acquiring some skills lead to increases in measures of brain structure or activity (generally interpreted as strengthening existing capacity or recruiting additional machinery), whereas others lead to decreases (generally interpreted as improved efficiency and requiring less processing effort)? What determines whether a skill is transferable to other behaviors or results in a highly specific behavioral gain? It will be difficult to piece together answers from such a heterogeneous group of studies.

"fnhum-07-00640" — 2013/10/5 — 17:43 — page 1 — #1

In sum, complex tasks are problematic because their study design space is vast. Because neuroimaging studies such as these are resource-intensive, particularly longitudinal studies which allow us to test causal hypotheses directly, systematically varying each aspect of the training paradigms is an impractical solution. The wide and sparse coverage of potential complex tasks that is already represented in the literature implicates nearly every cognitive system, and in various combinations. Even when studies ostensibly use similar tasks, they may differ on tens of potentially important training design parameters, among them the control condition used (i.e., none, between subjects, or within subjects), the sample size, population characteristics, the duration and intensity of training, and the subjects' attained proficiency. Concurrently, the rapid evolution of neuroimaging and analysis methods further reduces the comparability of studies.

One might argue for a return to simpler training paradigms until basic mechanisms of plasticity are more fully understood, were it not for the fact that a better understanding of complex task training-related plasticity and its underlying mechanisms is needed now. Important motivations fuelling the observed increase in research come from promising yet early attempts to improve neurological rehabilitation after injury or stroke (Altenmüller et al., 2009), to prevent cognitive decline in old age (Wan and Schlaug, 2010), to develop auditory training tools that target the brain to treat auditory processing disorder (Musiek et al., 2002; Loo et al., 2010), and possibly to transform the way the effectiveness of therapeutics and training techniques is evaluated (Erickson, 2012).

We must find new ways in which to integrate knowledge generated using many models. In the remainder of this paper, we propose one such approach.

#### **A FRAMEWORK TO CHARACTERIZE TRAINING TASKS**

In professional environments in which training of personnel must be both effective and efficient, instructional designers have refined the art of training; i.e., producing trainees with specific skills and knowledge. Briefly, one such instructional design process known as the Dick and Carey Systems Approach Model (Dick et al., 2004) begins with the definition of a set of concrete goals called "performance objectives" (POs). Flow charts are then used to illustrate the analysis of complex activities into smaller activities or functions. The POs and task breakdown serve as a reference when designers create evaluation measures, define an appropriate instructional strategy, develop and select instructional materials, and finally, evaluate the effectiveness of the training.

Whereas the goal of instructional designers is the successful and measurable transmission of skills and knowledge, the goals of researchers are usually either to design a training paradigm which provokes change in a certain brain structure or function, or to better understand what changes might have been caused by an existing training paradigm or naturalistic learning experience. In either case, two ideas can be borrowed from instructional design: the use of POs, and the task analysis. These are useful both when designing studies and when evaluating existing designs for comparability.

#### **PERFORMANCE OBJECTIVES**

A PO consists of a description of the desired outcome behavior; the circumstances under which the outcome should be met, including any equipment, instructions, and environmental variables such as the condition of the subjects and the availability of feedback or coaching; and the criteria used to judge the learner's performance. It is worthwhile to create POs for a training study because they help researchers to maintain coherence between the performed task, the subjects' instructions, and the behavioral dependent variables, and to consider addressing possible alternative explanations with additional controls or measures. In **Figure 1A**, we include some suggestions for writing and using POs.

The POs for some training studies are relatively straightforward and can be easily deduced from the methods description. For example, a recent study (Landi et al., 2011) investigated structural changes associated with motor adaptation. The PO for this task could have been written as follows:


Extracting POs from other training studies, particularly ones involving naturalistic designs on leisure activities, is sometimes less straightforward because the tasks are not always comprehensively described. This might be because a detailed account of a very popular activity seems unnecessary, because the training is not strictly under the experimenter's control, or because the aim of a study might be only to show that any change was caused by doing some activity.

In the study reports, conclusions are nevertheless almost always drawn about the relationships between many specific physiological findings and possible task-relevant cognitive activity. The details of how real-life complex tasks are taught and learned may be relevant for this interpretation. For example, a novice violinist who is encouraged to play entirely by ear will exercise a different set of cognitive skills than one who is learning by reading musical notation, which could explain differences in activity in visual and auditory areas. Making a clear statement about the intended focus of the training early in the study design phase makes it easier to identify supplementary measures and controls that might have explanatory value (e.g., a post-practice questionnaire to provide insight into the instructor's strategy).

#### **TASK DECOMPOSITION**

Task analysis has a long tradition in cognitive psychology and behaviorism where it has been applied to develop models for behavioral contingencies (e.g., Skinner, 1954), to create computational models (Newell and Simon, 1972), to analyze individual differences in reasoning (Sternberg, 1977), and to build computer models of cognitive architecture (Anderson et al., 2003, 2009; Qin et al., 2003).

Unlike previous work in which creating models of behavior and cognition was the goal, our motivation is to be able to compare the neuroplasticity results from multifaceted tasks and from researchers of different theoretical persuasions. We must therefore target a level of generality that can be linked to the functional networks and modules accessible to neuroimaging methods, rather than finer-grained analyses, such as specific thought processes. We must also prioritize training-related changes, and we must try to remain as theoretically neutral as possible, such that two researchers studying complex tasks need not first agree on cognitive and mechanistic models for each of many task components.

We propose a "task decomposition" in which a complex task is broken down into elements that are necessary to achieve the PO (see **Figure 1B** for suggested steps). To facilitate agreement and limit inherent theoretical assumptions, we suggest that the elements be formulated as potentially measurable behaviors (e.g., hold a sequence of notes in mind) rather than as cognitive constructs (e.g., auditory working memory). Choices must be made as to the generality of the elements, for example, whether a hand movement is broken down into finger movements. This will depend on the ability of the experimental design to resolve smaller elements, but if modeled hierarchically, elements could be expanded or collapsed to different levels of detail to accommodate different comparative goals. A common taxonomy of behavioral elements would aid task comparison. To the best of our knowledge a suitable taxonomy does not exist, but one readily observes multiple recurring elements when working through several decompositions. Enumerating and standardizing the wording of these would be a necessary step for any meta-analysis.
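The hierarchical expand-and-collapse idea can be illustrated with a small data structure. The following Python sketch is only one possible representation, not part of the proposed framework: the `Element` class, the `collapse` function, and the element wording are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Element:
    """One behavioral element of a task decomposition (hypothetical structure)."""
    name: str
    children: list[Element] = field(default_factory=list)


def collapse(element: Element, depth: int) -> list[str]:
    """List element names after expanding the tree `depth` levels below `element`.

    depth=0 returns the element itself; elements without children are
    returned as-is regardless of the remaining depth.
    """
    if depth == 0 or not element.children:
        return [element.name]
    names: list[str] = []
    for child in element.children:
        names.extend(collapse(child, depth - 1))
    return names


# Illustrative elements only; the wording is invented for this sketch.
hand_movement = Element(
    "move hand to target",
    [Element("flex index finger"), Element("extend thumb")],
)
task = Element("perform trial", [Element("observe cue"), hand_movement])

coarse = collapse(task, 1)  # hand movement kept as a single element
fine = collapse(task, 2)    # hand movement expanded into finger movements
```

Comparing two studies at the coarser level would treat the hand movement as one shared element, whereas the finer level would only be appropriate if the experimental design can resolve individual finger movements.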

We relate the elements temporally, which is straightforward and does not introduce many cognitive assumptions. Elements may occur in sequence or concurrently, and series of events may occur as a discrete unit or as a loop. We have included behaviors normally considered both lower and higher-order cognition as elements (e.g., visual observation vs. evaluating the success of an action). Metacognitive elements like the selection of different strategies could also be included, but this would add a level of complexity, for example, if one evaluation element switched between two possible structures. It should first be considered if the component is a focus of neuroplasticity; i.e., likely to have changed with the training in a way that was measurable.
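The temporal relations described here (sequence, concurrency, loops) could be encoded explicitly. The Python sketch below is a hypothetical encoding, assuming four node types (`Step`, `Sequence`, `Concurrent`, `Loop`) and a text notation chosen purely for this example; none of these names come from the source.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Step:
    """A single behavioral element."""
    name: str


@dataclass(frozen=True)
class Sequence:
    """Parts that occur one after another."""
    parts: tuple[Node, ...]


@dataclass(frozen=True)
class Concurrent:
    """Parts that occur at the same time."""
    parts: tuple[Node, ...]


@dataclass(frozen=True)
class Loop:
    """A body that repeats as a unit."""
    body: Node


Node = Union[Step, Sequence, Concurrent, Loop]


def describe(node: Node) -> str:
    """Render a decomposition as text: '->' for sequence, '||' for concurrency."""
    if isinstance(node, Step):
        return node.name
    if isinstance(node, Sequence):
        return "(" + " -> ".join(describe(p) for p in node.parts) + ")"
    if isinstance(node, Concurrent):
        return "(" + " || ".join(describe(p) for p in node.parts) + ")"
    return "loop" + describe(node.body)


# Invented example: observe and move concurrently, then evaluate, repeatedly.
trial = Loop(
    Sequence((
        Concurrent((Step("observe target"), Step("move cursor"))),
        Step("evaluate error"),
    ))
)
summary = describe(trial)
```

An explicit encoding of this kind would make competing structures for the same task easy to write down side by side, which is the situation discussed in the next paragraph.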

For some tasks, the selection of elements and their arrangement may lead to several competing structures which represent neurophysiologically relevant differences. In the case of a longitudinal study in which transient effects are observed over several measurement points (e.g., Taubert et al., 2010), different structures could usefully be related to expertise acquisition. Different structures could be caused by incorrect assumptions or inter-individual differences; we believe that even in these cases it will be valuable to document the task as a basis for discussion – problems can then be resolved empirically.

In the following section, we illustrate how task decomposition might be used to compare two tasks. We focus on multisensory training tasks in this paper, but this approach might also be applied to more purely cognitive training (see for example reviews by Buschkuehl et al., 2012; Guida et al., 2012).

#### **AN EXAMPLE OF USING TASK DECOMPOSITION**

Using this approach, we start to explore how changes in gray matter concentration as measured by voxel-based morphometry are related to two training tasks: the visuomotor tracking task described previously (Landi et al., 2011) and 40 h of amateur-level golf practice as an uncontrolled leisure activity (Bezzola et al., 2011). The conclusions we can draw from this two-task analysis might seem trivial as it is not difficult to compare two studies without decomposition. However, our goal here is to offer examples of task decomposition diagrams (TDDs) and a simple illustration of the principle of using them to compare tasks.

We have prepared a possible task decomposition for each study (see **Figure 2**) and highlighted the elements that differ between the tasks (bold font). We expect to find similar neurophysiological changes in tasks that share a component, and no change in this area with other tasks that do not have this component or do not stress adaptation and learning of this component.
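The comparison principle amounts to finding shared and differing elements between two decompositions. The Python sketch below shows this with plain sets; the function `compare_tasks` and all element names are assumptions for illustration and are not the contents of Figure 2.

```python
def compare_tasks(task_a: set[str], task_b: set[str]) -> dict[str, set[str]]:
    """Split two sets of task elements into shared and task-specific parts."""
    return {
        "shared": task_a & task_b,
        "only_a": task_a - task_b,
        "only_b": task_b - task_a,
    }


# Hypothetical element lists, loosely inspired by the two tasks discussed;
# they are NOT taken from the published task decomposition diagrams.
tracking = {"plan movement", "execute movement", "track cursor in two dimensions"}
golf_swing = {"plan movement", "execute movement", "maintain balance"}

result = compare_tasks(tracking, golf_swing)
```

Under the stated expectation, brain changes common to both studies would be attributed to the `shared` elements, and diverging findings to the task-specific ones. This only works if both decompositions use standardized wording for equivalent elements.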

**FIGURE 2 | Task decomposition diagrams for two training paradigms, (A) the visuomotor tracking task of Landi et al. (2011), and (B) a golf swing which we presume was a major part of golf training (Bezzola et al., 2011).** Sub-tasks of the main activity are shown with boxes. We have grouped similar elements into classes for the purposes of visualization (perception – green, motor – blue, evaluation/error calculation – red, memory – yellow), though the elements themselves are likely to be more useful for task comparison. Arrows show dependencies between sub-tasks and thick bars indicate concurrent activities. Components that differ between the visuomotor tracking task and the golf swing are in bold font.

The PO of the visuomotor tracking task was to minimize the average target–cursor distance over the session, whereas the PO of the golf practice was presumably to execute a golf swing so as to move the ball to the target location. The TDDs show overlap of task requirements relating to motor planning and execution. These can account for the convergent findings of changes in motor areas in the dorsal stream encompassing primary motor cortex (M1) contralateral to the (most) trained hand in both studies. In contrast, the divergence of the tasks in some aspects, in particular visuomotor control in two vs. three dimensions, hand vs. full body action including balance, and a tight coupling of action and outcome vs. integration of several separate movements into a larger sequence that involves more planning, could account for a discrepancy in findings in the frontal and parietal association areas, as these areas have been shown to be related to planning action sequences and visuomotor integration (Andersen and Buneo, 2002; Molnar-Szakacs et al., 2006) and the representation of one's own body in spatial reference frames (Vallar et al., 1999) – elements that are important parts of golf, but not visuomotor tracking.

This sort of analysis could then be used to investigate explanatory hypotheses. For example, based on a previous functional magnetic resonance imaging (fMRI) study using the same task (Della-Maggiore and McIntosh, 2005), Landi et al. (2011) had expected changes in a network including M1, posterior parietal cortex (PPC), and cerebellum, but found only the M1 result. Cerebellum and PPC are relevant functionally for online error correction, but the lack of structural changes might be due to the similarity of the manual tracking task to everyday tasks in these respects. The more novel kinds of whole-body and multisensory error corrections that are necessary for learning golf swings might stimulate greater neurophysiological adaptation. Next steps might be to compare these results with those from other manual tasks with these error correction requirements, or to design one.

#### **OTHER APPLICATIONS AND CAUTIONS**

Beyond uncovering patterns of task demands and neurophysiological effect across training studies, task decomposition could be used in other ways in plasticity research. Characterizing tasks used in human and animal research could facilitate cross-field comparisons from systems to circuit level (e.g., Sagi et al., 2012). Hypotheses as to the cause for divergent empirical findings can be tested by designing tasks in which only those sub-components suspected of causing the change are manipulated. It would also be possible to start with a brain region of interest, identify common characteristics or components of trainings that lead to enhancements, and design rehabilitative tasks emphasizing those elements.

There are several possible pitfalls to this approach: *post hoc* models could be biased toward task components that have known neural correlates that are in line with the results of the study; omitting crucial task components due to oversight or bias might result in incorrect assignment of neuroimaging results to task components that are included in the model; and since most brain regions are involved in multiple, different cognitive processes, changes in the same brain region may be due to different task components depending on context. These challenges parallel challenges interpreting neuroimaging data in general (e.g., Vul et al., 2009; Simmons et al., 2011), and can be partially addressed by *a priori* model setup and awareness of these limitations during interpretation.

#### **CONCLUSION**

In the rapidly evolving field of training-related plasticity, integration of results across studies will be crucial. For this, an approach like task decomposition could be useful to disentangle the respective influences of task demands on neuroplasticity, and increase the informational value and impact of each resource-intensive training study. By integrating across studies, we will be able to reveal specific and general mechanisms of plasticity within and across modalities such as the motor, visual, and auditory systems, and enhance our understanding of the role of higher-order functions and association areas in cortical plasticity. We argue that if researchers systematically consider what sub-tasks participants must perform in order to achieve training goals, and communicate them in the literature along with other aspects of their study design, it may turn the diversity in training studies into an advantage rather than an impediment by allowing us to extract meaning from aggregate results and to target future studies efficiently.

#### **ACKNOWLEDGMENTS**

We would like to acknowledge the support of the Canadian Institutes of Health Research (Vanier Canada Graduate Scholarship; Emily B. J. Coffey) and the Deutsche Forschungsgemeinschaft (HE6067/1-1 and 3-1; Sibylle C. Herholz).

#### **REFERENCES**

10, 241–261. doi: 10.3758/BF03196490


effects following working memory training. *Dev. Cogn. Neurosci.* 2(Suppl. 1), S167–S179. doi: 10.1016/j.dcn.2011.10.001

*Instruction*. Boston, MA: Allyn & Bacon.


benefits for children with language- and reading-related learning difficulties. *Dev. Med. Child Neurol.* 52, 708–717. doi: 10.1111/j.1469-8749.2010.03654.x


emotion, personality, and social cognition. *Perspect. Psychol. Sci.* 4, 274–290. doi: 10.1111/j.1745-6924.2009.01125.x


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 July 2013; accepted: 16 September 2013; published online: 08 October 2013.*

*Citation: Coffey EBJ and Herholz SC (2013) Task decomposition: a framework for comparing diverse training models in human brain plasticity studies. Front. Hum. Neurosci. 7:640. doi: 10.3389/fnhum.2013.00640*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Coffey and Herholz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The neural circuitry of expertise: perceptual learning and social cognition

#### *Michael Harré\**

*Complex Systems Research Group, School of Civil Engineering, The University of Sydney, Sydney, NSW, Australia*

#### *Edited by:*

*Merim Bilalic, University Tübingen, Germany*

#### *Reviewed by:*

*Guillermo Campitelli, Edith Cowan University, Australia Fernand Gobet, University of Liverpool, UK*

#### *\*Correspondence:*

*Michael Harré, Complex Systems Research Group, School of Civil Engineering, The University of Sydney, Building J05, Sydney, NSW 2006, Australia e-mail: michael.harre@sydney.edu.au*

The most significant questions we are confronted with today include the integration of the brain's micro-circuitry, our ability to build the complex social networks that underpin society and how our society impacts on our ecological environment. In trying to unravel these issues one place to begin is at the level of the individual: to consider how we accumulate information about our environment, how this information leads to decisions and how our individual decisions in turn create our social environment. While this is an enormous task, we may already have at hand many of the tools we need. This article is intended to review some of the recent results in neuro-cognitive research and show how they can be extended to two very specific and interrelated types of expertise: perceptual expertise and social cognition. These two cognitive skills span a vast range of our genetic heritage. Perceptual expertise developed very early in our evolutionary history and is a highly developed part of all mammals' cognitive ability. On the other hand social cognition is most highly developed in humans in that we are able to maintain larger and more stable long term social connections with more behaviorally diverse individuals than any other species. To illustrate these ideas I will discuss board games as a toy model of social interactions as they include many of the relevant concepts: perceptual learning, decision-making, long term planning and understanding the mental states of other people. Using techniques that have been developed in mathematical psychology, I show that we can represent some of the key features of expertise using stochastic differential equations (SDEs). Such models demonstrate how an expert's long exposure to a particular context influences the information they accumulate in order to make a decision. These processes are not confined to board games: we are all experts in our daily lives through long exposure to the many regularities of daily tasks and social contexts.

**Keywords: expertise, perceptual template, theory of mind, social cognition, neural networks, stochastic differential equations**

#### **1. INTRODUCTION**

Those who have spent decades mastering a complex task have been the subject of considerable research and popular interest throughout the twentieth century and it is now a mature and well established area of research for the twenty-first century. Much of the interest in popular culture comes from the remarkable feats of accuracy, memory and speed these experts demonstrate with such relative ease when compared with the rest of us. It is hard not to be impressed by a chess Grand Master who can play against a half dozen other Grand Masters with little effect on the quality of their play (Gobet and Simon, 1996). Even more impressive is the ability of some chess players to play dozens of simultaneous games while *blindfolded* (Saariluoma, 1991). These are remarkable cognitive feats, but the basis of much current research is the idea that experts are not naturally or genetically *privileged*. There are aspects that are important: the age at which training begins and the many hours of deliberate practice (Ericsson et al., 1993) play a vital role, but these are external factors. In principle at least, we all have the basic mechanisms that enable similar feats of excellence to be developed, up to some (possibly significant) degree of personal variation (Hambrick et al., 2013). On the other hand much of the scientific interest in this area stems from new ways in which neuro-imaging can be combined with expertise-specific tasks in order to analyze the basic neural processes that underpin these cognitive abilities and their development.

This article begins with the principle that there are universal mechanisms that exist in all of us but that experts have exploited these mechanisms to the very limits of their abilities. The significance of the universality of these mechanisms lies in the role "expertise" plays in all of our lives. For example, we are all experts in facial recognition: we have a specific part of the brain dedicated to this task called the fusiform face area (Sergent et al., 1992). We are also experts in social reasoning: we have dedicated regions of the brain [such as the precuneus (Huth et al., 2012)] that enable us to perceive some situations as specifically social in nature. From this point of view, board games are a highly focused, complex task that involves multiple interacting processes, the constituent parts of which may be compared to what is already understood regarding simpler and comparatively well understood systems that may be differentially integrated and activated in experts.

This article addresses two specific aspects of expertise in board games using stochastic differential equations (SDEs). This approach to expertise has not yet been explored in the literature, but it has been used extensively as a realistic model of neural dynamics in decision making and has led to significant insights into the theoretical and computational modeling of neural dynamics (Bogacz et al., 2006; McMillen and Holmes, 2006). The first is the categorization of board positions by a fast feedforward mechanism and its integration with other neural processes as a source of *contextual information* (Harré et al., 2012). This framework captures the rapidity with which an expert can unconsciously appreciate a game's gestalt (Simon, 1986) and generate good options for the next move without conscious deliberation. The second is the ability of experts to understand the perspective of their opponent in deciding their next move. While strategic perspective taking is a little studied aspect of board game expertise there is considerable neuro-imaging evidence suggesting that perspective taking in economic games has mechanisms in common with board game expertise. With this in mind, the key contributions of this article are threefold: the introduction of an SDE formulation of expertise, its application to perceptual categorization in board games and its application to perspective taking in board games.

#### **2. PERCEPTUAL CATEGORIZATION FOR EXPERTS**

It has been hypothesized that expert perception in complex tasks is based on implicit learning of the statistical regularities of the environment to which the expertise pertains (Kahneman and Klein, 2009; Kellman and Garrigan, 2009; Harré, 2013). For example, these statistical regularities allow a Grand Master chess player to rapidly categorize the current state of a game and generate good intuitive guesses as to what the next move might be. This requires a neural process of rapid consolidation of experience-weighted percepts into a single categorical "whole" (Serre et al., 2007; Kriegeskorte et al., 2008; Wan et al., 2011; Harré and Snyder, 2012; Huth et al., 2012). An important implication is that unconscious perceptual categorization of this sort needs no reward feedback: it is an unsupervised learning process, as suggested by the early visual processing model in Serre et al. (2007). Such processes have been the subject of psychological studies at least since the early work of Ratcliff (1978) on the brain's statistical accumulation of percepts to a decision boundary and Nosofsky's (1984) work on categorical similarity for decision making.

#### **2.1. STOCHASTIC PROCESSES AS MODELS OF DECISION-MAKING**

This section introduces the mathematical framework in which perceptual decisions are modeled by SDEs. While the focus of this work is on binary decisions due to their simplicity of exposition, real decisions are the selection of one option from many alternatives. So while the work on stochastic decision boundaries (the basis of what follows) has recently been extended to multiple alternatives (McMillen and Holmes, 2006), the focus here is on the binary case. The simplest form of these equations is that of a time series of the incremental changes in a noisy variable *x* that has a constant (fixed) "drift" component μ and a statistical "diffusion" term σ*dW* (*dW* is a standard Wiener process with unit variance). During a time interval *dt* the change in *x* is given by:

$$dx = \mu\, dt + \sigma\, dW. \tag{1}$$

With σ → 0 the noise term reduces to zero and *x* = μ*t* + *c* with integration constant *c*, i.e., a straight line with gradient μ, and so the drift is thought of as the linear change in the accumulation of a signal *x*. With σ ≠ 0 the path followed by *x* fluctuates around the expected increase μ*t* with the fluctuations proportional to the variance σ<sup>2</sup>. In the perceptual decision literature this drift diffusion model is often interpreted as a threshold decision process: if *x* > *Z*<sub>1</sub> or *x* < *Z*<sub>2</sub> then *x* has crossed a decision boundary, *Z*<sub>1</sub> or *Z*<sub>2</sub>, and a decision has been reached favoring one of two hypotheses, *H*<sub>1</sub> or *H*<sub>2</sub>, represented by these two boundary values. This accumulation to a decision threshold is shown in **Figure 1**. A common interpretation, and the one adopted here, is that a neuron (or an assembly of neurons) can be thought of as a noisy accumulator of signals from other neurons such that when the neuron is excited to a level *Z* it fires, thereby signaling a decision to those neurons to which it is (forward) connected. This is loosely described as a neuron's decision-making process or a neuron having "made a decision."
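As a minimal sketch of the threshold interpretation of Equation (1), the following Python code integrates the SDE with a simple Euler-Maruyama step until one of the two boundaries is crossed. The function names, the starting point *x* = 0, the step size and the parameter values are illustrative assumptions and are not taken from the studies cited above.

```python
import math
import random


def simulate_ddm(mu, sigma, z_upper, z_lower, dt=0.001, max_time=10.0, rng=None):
    """Integrate dx = mu*dt + sigma*dW (Equation 1) until a boundary is crossed.

    Returns (choice, time). choice is 1 if x > z_upper, 2 if x < z_lower,
    and None if neither boundary is reached within max_time.
    Assumption: the accumulator starts at x = 0.
    """
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    sqrt_dt = math.sqrt(dt)  # a Wiener increment has standard deviation sqrt(dt)
    while t < max_time:
        x += mu * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
        if x > z_upper:
            return 1, t
        if x < z_lower:
            return 2, t
    return None, t


def wrong_boundary_rate(mu, sigma, z, trials=2000, seed=0):
    """Fraction of decided trials that end at the lower boundary.

    Assumption: mu > 0, so the upper boundary is the one that would be
    reached without noise and the lower boundary is the "wrong" one.
    """
    rng = random.Random(seed)
    decided = 0
    wrong = 0
    for _ in range(trials):
        choice, _t = simulate_ddm(mu, sigma, z, -z, rng=rng)
        if choice is not None:
            decided += 1
            wrong += int(choice == 2)
    return wrong / decided if decided else float("nan")


if __name__ == "__main__":
    print(simulate_ddm(mu=1.0, sigma=0.0, z_upper=0.5, z_lower=-0.5))
    for sigma in (0.5, 1.0, 1.5):
        print(sigma, wrong_boundary_rate(1.0, sigma, 0.5, trials=500))
```

With σ = 0 the trajectory is the straight line μ*t* and always reaches the upper boundary; raising σ in this sketch raises the share of trials that end at the other boundary.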

There are three variations to this model that are interesting in the current context. The first is that the μ term can be split into as many components as needed to represent the dynamic microstructure within the interval *dt* providing these values are constants. For instance the composite term μ<sup>∗</sup> = μ<sub>1</sub> − μ<sub>2</sub> + μ<sub>3</sub> is the "net drift" in *x* as the term μ<sup>∗</sup> is still constant, but the component parts μ<sub>*i*</sub> may have neurological, psychological or perceptual interpretations that are useful to distinguish.

**FIGURE 1 | … one choice over another and the mean trajectory followed by the "average" time course of evidence accumulation.** μ is the average amount of information received during *dt* over the time course of evidence accumulation. Note that because there is some statistical variation in the time course of the accumulation of information, for a non-zero σ (cf. Equations 1–4) there is a finite chance that the signal will cross the *wrong* boundary (and therefore the neuron signals the wrong decision) in the sense that the underlying signal is distorted to the extent that the boundary that is crossed is different from the boundary that would have been crossed in the absence of noise. So as σ increases so too does the chance that the wrong boundary is crossed.

The second generalization is to split the process into two separate and independent stochastic variables: if *x*<sub>1</sub> and *x*<sub>2</sub> are independent stochastic processes with drifts μ<sub>*i*</sub> and fluctuations σ<sub>*i*</sub> then Equation (1) can be generalized:

$$dx_1 = \mu_1\, dt + \sigma_1\, dW_1 \tag{2}$$

$$dx_2 = \mu_2\, dt + \sigma_2\, dW_2 \tag{3}$$

This is often referred to as the race model: *x*<sub>1</sub> races against *x*<sub>2</sub> to reach their respective decision boundaries. In neurological modeling studies the *x*<sub>*i*</sub> are seen as *independent accumulators*: they collect noisy signals from other neurons until they cross their respective decision boundaries. The average signals they receive are the μ<sub>*i*</sub> plus fluctuations σ<sub>*i*</sub>*dW*<sub>*i*</sub>, so these are thought of as feedforward neural models: neurons that fire at a previous time feed an average signal of μ<sub>*i*</sub> that is aggregated in the decision variable *x*<sub>*i*</sub> encoded by a neuron.
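The race between independent accumulators in Equations (2) and (3) can be sketched in the same way. The code below is an illustration only: the function name, the common starting point of zero, the tie-breaking rule and all parameter values are assumptions made for the example.

```python
import math
import random


def simulate_race(mus, sigmas, thresholds, dt=0.001, max_time=10.0, rng=None):
    """Race model: each x_i follows dx_i = mu_i*dt + sigma_i*dW_i independently.

    Returns (winner_index, time), or (None, time) if no accumulator reaches
    its own threshold within max_time.
    Assumption: all accumulators start at zero.
    """
    rng = rng or random.Random()
    xs = [0.0] * len(mus)
    sqrt_dt = math.sqrt(dt)
    t = 0.0
    while t < max_time:
        t += dt
        for i, (mu, sigma) in enumerate(zip(mus, sigmas)):
            # Independent noise draw per accumulator (dW_1 and dW_2 are independent).
            xs[i] += mu * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        crossed = [i for i, x in enumerate(xs) if x >= thresholds[i]]
        if crossed:
            # Assumed tie-break: if several cross in the same step, take the
            # accumulator that is furthest past its own threshold.
            winner = max(crossed, key=lambda i: xs[i] - thresholds[i])
            return winner, t
    return None, t


if __name__ == "__main__":
    rng = random.Random(0)
    outcomes = [simulate_race([1.0, 0.5], [0.3, 0.3], [1.0, 1.0], rng=rng)[0]
                for _ in range(100)]
    print("accumulator 0 won", outcomes.count(0), "of 100 races")
```

Because the accumulators never interact, the accumulator with the larger drift wins more often but not always; the noise terms decide the remaining races.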

A third generalization is to introduce a dependency of the rate of change in *x* on the current state of *x*. In this case *dx* is driven by a constant drift term μ as well as the present state of *x*:

$$dx = (\mu + \lambda x)\, dt + \sigma\, dW \tag{4}$$

again noting that μ can be split into its (constant) constituent parts. This is called an Ornstein-Uhlenbeck (O-U) SDE, a common model of the neural processes involved in stochastic decisions (Bogacz et al., 2006). The interpretations of these models will be introduced as needed.
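To illustrate how the state-dependent term λ*x* in Equation (4) changes the dynamics, the sketch below integrates the O-U equation and compares the noise-free case with its closed-form solution *x*(*t*) = (*x*<sub>0</sub> + μ/λ)*e*<sup>λ*t*</sup> − μ/λ. The function names, initial condition and parameter values are assumptions chosen for the example.

```python
import math
import random


def simulate_ou(mu, lam, sigma, x0=0.0, dt=0.001, total_time=5.0, rng=None):
    """Euler-Maruyama path of dx = (mu + lam*x)*dt + sigma*dW (Equation 4)."""
    rng = rng or random.Random()
    n_steps = int(round(total_time / dt))
    sqrt_dt = math.sqrt(dt)
    x = x0
    path = [x]
    for _ in range(n_steps):
        x += (mu + lam * x) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def noise_free_solution(mu, lam, t, x0=0.0):
    """Closed-form solution of dx/dt = mu + lam*x, valid for lam != 0."""
    return (x0 + mu / lam) * math.exp(lam * t) - mu / lam


if __name__ == "__main__":
    path = simulate_ou(mu=1.0, lam=-1.0, sigma=0.0)
    print(path[-1], noise_free_solution(1.0, -1.0, 5.0))
```

For λ < 0 the noise-free state relaxes toward the fixed point −μ/λ, whereas for λ > 0 it accelerates away from it, so the sign of λ determines whether accumulated evidence leaks away or reinforces itself.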

#### **2.2. STOCHASTIC DECISIONS REGARDING CATEGORY MEMBERSHIP**

These SDEs can be used as a model of neural interactions and category formation in the early stage, feedforward perceptual systems of the brain (DiCarlo et al., 2012). From this point of view a neuron receives signals from other neurons at an earlier stage of perceptual processing that encode a simpler set of percepts and this later neuron then aggregates these signals into a more complex representation. This is a process with some statistical variation due to the inherently noisy nature of the external environment as well as neural activity, but over a large population of neurons the information a higher level "scene categorization" neuron receives will represent the correct category. **Figure 3A** shows an abstract representation of the set of neurons that either excite or inhibit one of two higher level category neurons as modeled by Equations (2, 3). **Figure 3B** shows how two neurons (*E*<sub>1</sub> and *I*<sub>2</sub>) can encode a signal that excites one categorical neuron while inhibiting the other.

The decision neurons *C*<sub>1</sub> and *C*<sub>2</sub> represent two different hypotheses regarding the state of the world. *H*<sub>1</sub>: the current scene can be represented as a category encoded by *C*<sub>1</sub>; *H*<sub>2</sub>: the current scene can be represented as a category encoded by *C*<sub>2</sub>. A decision regarding the state of the world is reached when evidence for one category accumulates to one boundary before an alternative boundary is crossed, and so either *C*<sub>1</sub> or *C*<sub>2</sub> fires, signaling the category for which there is more evidence. This process can be thought of as a stream of signals arriving at the retina and being processed by neurons that encode increasingly complex representations that span an ever larger range of the visual field as the signals pass through a feedforward neural network (Serre et al., 2007).

An SDE representing changes in the level of excitation of neurons *C*<sub>1</sub> and *C*<sub>2</sub> is given by (see the caption of **Figure 3** for the cell groups notation):

$$\begin{aligned}
dx_1 &= \big[\overbrace{(E_1 + E_1 E_2 + E_1 I_2)}^{C_1 \text{ excitatory signals}} - \overbrace{(I_1 + I_1 E_2 + I_1 I_2)}^{C_1 \text{ inhibitory signals}}\big]\, dt + \sigma_1\, dW_1 \\
&= \mu_1^* dt + \sigma_1\, dW_1
\end{aligned} \tag{5}$$

$$\begin{aligned}
dx_2 &= \big[\underbrace{(E_2 + E_2 E_1 + E_2 I_1)}_{C_2 \text{ excitatory signals}} - \underbrace{(I_2 + I_2 E_1 + I_2 I_1)}_{C_2 \text{ inhibitory signals}}\big]\, dt + \sigma_2\, dW_2 \\
&= \mu_2^* dt + \sigma_2\, dW_2
\end{aligned} \tag{6}$$

Note the similarities of Equations (5) and (6) to Equations (2) and (3), and that the μ<sup>∗</sup><sub>*i*</sub> are simply constants independent of the *x*<sub>*i*</sub> (also note the discussion of the net drift μ<sup>∗</sup> in section 2.1). In Equations (5) and (6) the μ<sup>∗</sup><sub>*i*</sub> terms represent the deterministic component of the net drift of the stochastic variable *x*<sub>*i*</sub> toward a threshold value *Z*; this threshold represents the evidence necessary to recognize a category. This drift in *dx*<sub>*i*</sub> is an aggregation of visual signals, and such aggregation is not perfect: some signal (such as the detection of a "spikey hat" on the top of a chess piece) might suggest a Queen or a King when in fact the piece is a Bishop. However, when aggregated over many different visual cues we often accurately differentiate Kings from Queens from Bishops. These stochastic infelicities occur at all levels of the processing of visual signals, so we model these noisy signals using SDEs just as is done in studies of simple perceptual decision making. The key difference here lies principally in how far along the perceptual hierarchy the neurons in question happen to lie; this perceptual hierarchy runs along the ventral stream from right to left in **Figure 2** and from top to bottom in **Figure 3**. So in order to categorize a whole board, a bishop in a certain position might inhibit the recognition of a certain game opening (because it never occurs in that position for that particular opening) while it excites the recognition of an alternative opening. Such recognition is again imperfect: the square on which the bishop is placed may be misidentified by the player or a critical nearby pawn might be overlooked.
Some of these infelicities might be addressed through slower and more deliberate analysis of the board position, but this may not always be effective as the perception of the game category can set the context on which further deliberate analysis of the board is based.
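The bookkeeping behind the net drifts in Equations (5) and (6) can be made concrete with a small sketch. It assumes that each term in the equations stands for the mean signal contributed by one cell group, using the intersection notation of the **Figure 3** caption (for example `"E1I2"` for the cells that excite *C*<sub>1</sub> and inhibit *C*<sub>2</sub>). The dictionary keys, the function name and the numerical values are hypothetical.

```python
def net_drifts(signal):
    """Return (mu1_star, mu2_star) from mean cell-group signals.

    Assumption: each dictionary value is the mean signal of one cell group.
    Shared groups appear in both drifts: "E1I2" adds to mu1_star and
    subtracts from mu2_star, "I1E2" does the reverse, "E1E2" adds to both
    and "I1I2" subtracts from both.
    """
    mu1_star = (signal["E1"] + signal["E1E2"] + signal["E1I2"]) \
        - (signal["I1"] + signal["I1E2"] + signal["I1I2"])
    mu2_star = (signal["E2"] + signal["E1E2"] + signal["I1E2"]) \
        - (signal["I2"] + signal["E1I2"] + signal["I1I2"])
    return mu1_star, mu2_star


if __name__ == "__main__":
    # Hypothetical mean signals for one visual cue, e.g. a bishop on a key square.
    cue = {"E1": 0.6, "E1E2": 0.1, "E1I2": 0.3,
           "I1": 0.2, "I1E2": 0.1, "I1I2": 0.1,
           "E2": 0.2, "I2": 0.1}
    print(net_drifts(cue))
```

In this illustration a stronger `"E1I2"` signal simultaneously raises the drift toward the first category and lowers the drift toward the second, which is the sense in which one cue can excite the recognition of one opening while inhibiting another.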

The visual processing of scene information goes from the perception of simple lines and angles in region V1 through to actual objects such as single chess pieces and ultimately to a representation of the category to which the board configuration belongs, if the player is experienced enough to have learned such categorical representations (see **Figure 2**, which summarizes some of the key ideas in the literature). This is different from recognizing every individual game element on the board: a *category* in the current sense means the broad strategic layout of the game, usually indicated by a number of key game pieces in particular key positions (Harré and Snyder, 2012; Harré, 2013). So minor variations on a particular chess opening can belong to a single category, indicated by key pieces in key positions that have frequently occurred in these same positions. This is what is meant by the statistical regularity of the environment: it enables a chess player to implicitly learn the strategic context of a game and then, as they grow in experience, to use this context in their search for good moves based on the implicitly recognized cues involving spatial relationships, color and some pieces. But an expert player will have acquired many thousands of such categories over their career, so the *i* in *dx*<sub>*i*</sub> of Equations (5) and (6) runs into the 1000's. Recent work has extracted these templates and enumerated them using an artificial neural network and real games of amateurs and professionals (Harré, 2013). And just as identifying a single game piece can be a statistically uncertain process, so too is identifying the current game's categorical membership out of the many thousands of possibilities.

**FIGURE 2 | A representation of the category formation mechanism.** Information arrives at the eye and passes to the posterior regions of the brain where these signals arrive at V1 (an early visual processing region of the cortex). These signals then pass along the ventral stream of the inferotemporal cortex (IT). Most people from a western country are likely to be able to recognize a single chess piece. More experienced players develop "Chunks" that are aggregates of game pieces as well as schematics for the whole board that include key pieces and their spatial relationships; these "Templates" enable the rapid comprehension of the current state of the game and provide "slots" into which the smaller chunks can be fitted [see for example Gobet et al. (2001) and Gobet and Lane (2010) who have modeled unsupervised template learning in their CHREST model and suggested a neural model for it (Chassy and Gobet, 2011), as do the perceptual templates in Harré (2013) and Harré and Snyder (2012)]. Some of the information (individual pieces) is shown obscured in the Template (far left): only information which is necessary to distinguish one category from the many other possibilities is needed, and such game pieces need to have frequently co-occurred in the same places over and over again through a player's training and experience of the game. In this diagram only one template is shown; however, an expert will have encoded thousands of such templates in their IT and each one "competes" to be recognized: signals that originate from earlier neurons encode simpler representations (a line detected by V1 is simpler than a game piece, a game piece is simpler than a chunk and a chunk is simpler than a template) that combine to make larger and more complex representations of the game, eventually signaling the most complex learned representation such as a template. The neuron that accumulates enough (typically noisy) information to cross that neuron's threshold is the winner [see the race model, Equations (2) and (3)], i.e., for two categories the competition would be between neurons *C*<sub>1</sub> and *C*<sub>2</sub> in **Figure 3** and the thresholds would be *Z*<sub>1</sub> and *Z*<sub>2</sub> in **Figure 1**.

These SDEs are similar in nature to those described in Bogacz et al. (2006) as well as the hierarchical structure recently proposed by Serre et al. (2007) and DiCarlo et al. (2012) where feedforward inhibitory and excitatory signals compete to accumulate evidence for one category over another. The key notion of this new approach is the use of SDEs to describe the neural mechanisms and to apply these ideas to expertise. So in this model a visual signal will elicit a combination of signals from neurons in precursor neural assemblies (of intermediate complexity) *E*<sub>1</sub>, *E*<sub>2</sub>, *I*<sub>1</sub>, and *I*<sub>2</sub> (Equations 5, 6). The solutions to these equations can be usefully expressed in terms of the probability that one of the category boundaries is crossed (Bogacz et al., 2006):

$$p(Z_j \text{ is crossed}) = \frac{e^{\beta \mu_j^*}}{\sum_k e^{\beta \mu_k^*}}, \quad j, k \in \{1, \dots, 1000\text{'s}\} \tag{7}$$

where β = 2*z*/σ<sup>2</sup> (for simplicity symmetrical decision boundaries are used: *j* ∈ {1, 2}, *Z*<sub>1</sub> = *Z*<sub>2</sub> = *z*); see **Figure 1** for a schematic of the binary categorization dynamics and the probability of crossing one threshold versus another. Equation (7) has a very simple interpretation: the probability of recognizing "board category" *j* is a function of the sum of the evidence μ<sup>∗</sup><sub>*j*</sub> in favor of that category (relative to the evidence for other categories μ<sup>∗</sup><sub>*k*</sub>) subject to some statistical variability parameterized by β.
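Equation (7) is straightforward to evaluate numerically. The sketch below computes the crossing probabilities for a list of net drifts; the function name and the example values of μ<sup>∗</sup>, *z* and σ are assumptions for illustration, and the maximum exponent is subtracted before exponentiating purely for numerical stability.

```python
import math


def crossing_probabilities(net_drifts, z, sigma):
    """Probability that each category boundary is crossed (Equation 7).

    beta = 2*z / sigma**2; subtracting the largest exponent leaves the
    ratios unchanged while avoiding overflow for large beta.
    """
    beta = 2.0 * z / sigma ** 2
    exponents = [beta * mu for mu in net_drifts]
    largest = max(exponents)
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical net drifts for three competing board categories.
    drifts = [0.6, 0.4, -0.1]
    for sigma in (0.5, 1.0, 2.0):
        probs = crossing_probabilities(drifts, z=1.0, sigma=sigma)
        print(sigma, [round(p, 3) for p in probs])
```

For two categories the expression reduces to the logistic function 1/(1 + *e*<sup>−β(μ<sup>∗</sup><sub>1</sub> − μ<sup>∗</sup><sub>2</sub>)</sup>), so only the difference in evidence matters, and a larger σ (smaller β) pushes the probabilities toward equality.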

#### **3. PERSPECTIVE TAKING AND OUR "STRATEGIC THEORY OF MIND"**

The previous section extended the well studied modeling paradigm of SDEs to the issue of expert board perception and rapid categorization. While this is a novel extension of recent work the goal is relatively modest in that it aims to connect the theoretical principles of two approaches to the modeling of both simple and complex perceptual decisions.

This next section has a more ambitious goal: to use an extension of these SDEs (a hybrid combining Equations 2, 3 with Equation 4) as a model of decision-making processes whereby the decision-maker has an internal representation of the perspective of another person such as a chess opponent. One of the assumptions made in what follows is that in order to understand another person's perspective an individual needs a representation of the other's internal mental states e.g., their constraints and goals, and that these might be different from those of the first person. At this point two concerns arise: *Is this a reasonable assumption?* and *What is the evidence for such an assumption?* The latter will be covered in sections 3.1–3.3 but a few words are needed first to justify the reasonableness of this approach.

When a skilled player looks at a game in progress there may be sufficient information available in the first few moments of viewing the game for the player to make a decision as to where to move their eyes in order to refine their search such that only the most promising areas of the board are explored (De Groot et al., 1996). This fast perceptual comprehension is an unconscious aspect of expertise and it is an important part of an expert's remarkable speed in selecting a good move from a very short exposure to a game position (de Groot and de Groot, 1978). The idea is that for exceptionally familiar positions the next move is so well understood that very little (if any) further analysis is necessary in order to know what the best move is. In such situations no comprehension of the other player's mental state is necessary: purely perceptual processes based on their extensive experience are sufficient to explain an expert's behavior and performance. If this is the case then very little planning and control beyond the early stages of perception is needed and a player's decision-making can circumvent the slow and computationally expensive sequential process of searching multiple alternative branches of play in order to find the best strategy and instead can move directly to organizing the motor pathways necessary to physically move the player's arm to make the move on the board. In **Figure 4** this is shown as the frontal region of the inferotemporal cortex providing a contextual signal directly to the primary motor area, i.e., a contextual signal generated by the activation of a single Template (see **Figure 2**) can contain sufficiently unambiguous information on which to base the next move. **Figure 4** also shows how such a contextual signal might connect directly to the frontal eye fields in order to signal an expert's eyes to orientate quickly to relevant regions of the board [an expert's Templates guide their visual search (Chun and Jiang, 1998)].

**FIGURE 3 | A feedforward schematic of how percepts of intermediate complexity either excite or inhibit more complex percepts.** *C*<sub>1</sub> and *C*<sub>2</sub> are neurons that encode high order scene categories that are composed of percepts of an intermediate complexity. **(A)** A conceptual framework of cell assemblies that act to excite and/or inhibit more complex categorical representations. The groups of neurons *E*<sub>1</sub>, *E*<sub>2</sub>, *I*<sub>1</sub>, and *I*<sub>2</sub> are each a set of neurons some of which have overlapping functionality between excitation and inhibition. For example the cells in the group *E*<sub>1</sub>*I*<sub>2</sub> = *E*<sub>1</sub> ∩ *I*<sub>2</sub> excite *C*<sub>1</sub> and inhibit *C*<sub>2</sub> whereas the cells in group *I*<sub>1</sub>*I*<sub>2</sub> = *I*<sub>1</sub> ∩ *I*<sub>2</sub> inhibit both *C*<sub>1</sub> and *C*<sub>2</sub>. **(B)** A more detailed model showing how a neuron in assembly *E*<sub>1</sub> receives noisy inhibitory and excitatory signals from other percepts. It can then directly excite percept *C*<sub>1</sub> while indirectly inhibiting *C*<sub>2</sub> via the excitation of intermediary inhibitory neurons that terminate on *C*<sub>2</sub>. The "Ventral Stream" is the feedforward direction of the neural signaling as shown in **Figure 4**.

Even for very experienced players skilled perception is often necessary but it is not always sufficient for strong play. In such cases the state of the game will have information that enables a player to refine their choices but still leaves ambiguous exactly which move to make. So templates need to be supplemented with more deliberate strategic planning and analysis. Such *executive control* for any task is thought to occur in the prefrontal cortex (PFC, **Figure 4**) (Koechlin and Summerfield, 2007). The PFC integrates information from diverse regions of the brain (Miller and Cohen, 2001); this is part of a bottom–up process but note that the PFC can be circumvented for rapid and automatic behaviors (Miller and Cohen, 2001). The PFC also exerts top–down control, for example modulating earlier perceptual signals (Bar et al., 2006) and even directly eliciting long term memories stored in the inferotemporal cortex (IT) (Tomita et al., 1999) (**Figure 4** only shows feedforward signals to the PFC, but feedback pathways exist from the PFC to the IT). From this point of view the PFC modulates and integrates perceptual signals with internally generated goals, plans future actions and acts as an informational switchboard for other regions of the brain. So it is in the PFC that we should expect planning, strategic analysis and forward search to be carried out for complex tasks such as chess (see sections 3.1–3.3 for a selection of the literature supporting this), which might include a representation of another player's state of mind.

**FIGURE 4 | A schematic representation of the major cortical pathways of expertise: feedforward visual processing (red), strategic Theory of Mind and associated reward mechanisms (green) and top–down planning and control (blue) discussed in the main text.** The anterior-most frontal regions integrate information from a strategic ToM network and a perceptual network for recognizing individual items of intermediate complexity. The categorical signal projects to a region posterior to the anterior regions of the frontal cortex that are associated with high level strategic planning and control processes. This allows for the possibility of the contextual signal to rapidly activate an eye saccade (and other motor activities) to a strategically relevant portion of the visual field without passing through the top–down planning areas in the PFC. Such search guided by implicitly learned visual cues was established by Chun and Jiang (1998, 1999) for large but relatively simple environments. V1, visual area 1 in the occipital cortex; OTJ, occipitotemporal junction; TPJ, temporoparietal junction; STS, superior temporal sulcus; IT, inferotemporal cortex; PC, precuneus; CN, caudate nucleus; ACC, anterior cingulate cortex; PFC, prefrontal cortex; Cont., visual context integration; FEF, frontal eye field; PM, primary motor cortex.

In terms of chess playing, when a player is planning their next sequence of moves, each player will associate a different "value" or place different constraints on the same move. For example, a Black Kingside castling is never a move that White can make, so the constraint on this move is "illegal" for White but "legal" for Black. A more sophisticated example is how a player represents the motivations on which another player bases their decisions. To illustrate this, imagine a very well known chess opening has been played out for the first three moves by each player. How does player 1 consider their choice of next move given that it is only a "good" move within the context of what player 2 might do in reply? It is not sufficient for player 1 to have a singular value of a move: a move's value is contextualized by the other player's likely next move, and player 2's likely next move is based on how player 2 values their move within the context of what player 1 will do following player 2's move. From this point of view player 1 needs to encode their estimate of the value of a move as well as the value player 2 will attach to the move player 2 is likely to make next. This regressive process of player 1 evaluating their choices in the context of player 2's likely choices is also the basis of economic game theory. It requires each player to be able to approximate the other's mental strategic space, which can include the other player's evaluation of the game, the strategy they seem to be following as well as the other's experience and habits, should these be known to player 1. These cognitive processes of the PFC are informed by the early perceptual signals the PFC receives from regions such as the IT but they are also based on a player's internal representation of their own strategy and how this strategy is contextualized by their internal representation of the other player's strategy.

With these ideas in mind, research into our "Theory of Mind" (ToM) focuses on the psychological and neurological mechanisms through which we understand the internal mental states and goals of other people (Lieberman, 2007) but it has not previously been connected with board games and expertise, and only in simple economic games such as those used in neuro-economics have SDEs been used to model these simple choices. ToM research covers a very broad range of topics, from psychological and neurological development through to genetic differences, traumatic brain injury and neuro-degenerative diseases. This breadth is due at least in part to the extensive interrelated cognitive processes that are involved and the very deep connection that our ToM has to the way in which we introspectively view ourselves, others and the choices we individually and collectively make.

In this section the goal is to first emphasize the overlapping neurological processes that play a role during ToM processing tasks, economic games (simple games), and board games (complex games). An important caveat is that the resultant model represents a strict subset of neurological processes that will be called our *strategic ToM*. Having illustrated the plausibility of a common neural mechanism, an SDE model of strategic social interactions will be introduced. The significant components are the cognitive ability to separate rewards received in the first person versus rewards received in the second person and a conscious perception of the relevant components in the external environment.

#### **3.1. NEURO-ECONOMIC GAME THEORY AND OUR "STRATEGIC IQ"**

The term *strategic IQ* was introduced in Bhatt and Camerer (2005) where the neural correlates of self-referential strategic reasoning, i.e., reasoning about someone else reasoning about you, in economic games were demonstrated using fMRI analysis. As discussed above, this is a minimal cognitive ability necessary to understand our actions in the context of other people's actions, but it is not the only necessary mechanism. In this section a sample of the fMRI literature on game theory, strategic IQ and ToM is explored and in the section that follows our strategic ToM is introduced and the literature from board game expertise supporting such an extension is surveyed.

A key point of interest in neuro-economics involves the neural regions that are active when we are thinking about our decisions in the context of other people's decisions. This entails at least some of the mechanisms that are active during ToM tasks and so there is an overlap between ToM research and neuro-economics. This has recently led Yoshida et al. to propose a *game ToM* (Yoshida et al., 2008) using the simpler economic games to motivate their basis of a ToM. ToM studies include a broadly defined and general purpose network that is activated in many situations in which inferences need to be made regarding the cognitive states of others. But there is also a specific sub-network that is activated in strategic interactions such as economic games where assessing another person's internal states is necessary for our performance in strategic decision making. As such, accurate predictions regarding how others think about their environment as well as how they think about us improve our outcome in the interaction. This second definition narrows our focus: the ToM networks of neuro-economics are concerned with strategies and expected rewards and this circumscribes the situations considered.

The broad definition of a ToM neural network frequently includes (Amodio and Frith, 2006) the medial prefrontal cortex (mPFC), the temporal pole, the superior temporal sulcus (STS), the anterior cingulate cortex (ACC) and the temporoparietal junction (TPJ). Many of these areas may or may not be active during strategic interactions as they might play a role in more general social cognition. The TPJ for example appears to be active in many different social contexts such as when a person is simply observing other people interacting (Saxe and Kanwisher, 2003).

Using a combination of results from game theory and fMRI studies a related network of neural activity can be identified. This article is not a review of this entire field, but there are two specific types of task to focus on. In the first type a subject plays a game against a computer or a human and differentiated neural activity shows brain regions that are active when we play strategically against another socially aware subject. This enables us to differentiate between "social" and "non-social" strategic interactions and the associated brain activations. The second type is one in which subjects play games that involve different levels of strategic thinking regarding their opponent's thought processes. In this second task, a key finding is the correlation between the reward earned and increases in brain activity in specific areas.

When playing strategic games against a human as opposed to a computer the brain regions that are differentially activated include the ACC (McCabe et al., 2001; Gallagher et al., 2002; Sanfey et al., 2003), the STS (Rilling et al., 2004; Fukui et al., 2006; Coricelli and Nagel, 2009), the TPJ (Krueger et al., 2008; Coricelli and Nagel, 2009; Carter et al., 2012), the mPFC (McCabe et al., 2001; Bhatt and Camerer, 2005; Coricelli and Nagel, 2009) and the caudate nucleus (CN) (Bhatt and Camerer, 2005; Delgado et al., 2005; Rilling et al., 2008). As these regions are differentially more active for human opponents than computer opponents it suggests that these brain regions play a role in social competitive situations. This does not preclude them from playing a role in other strategic and/or social tasks of course.

On the other hand a player's strategic IQ is correlated with a related network of brain regions. In one recent study stronger activity in the precuneus and the CN (Bhatt and Camerer, 2005) correlated with strategic IQ in games that differentiated between degrees of belief regarding the other player's strategy. In a similar study it was shown that the depth of interpersonal strategic reasoning co-varied with activity in the medial PFC (Coricelli and Nagel, 2009). In a third study, along with the reward prediction based activity of the medial PFC, it was shown that activity in the posterior STS was strongly correlated with the influence a player's actions had on another player (Hampton et al., 2008). These studies identify a network of key brain regions that are active in strategic situations involving economic rewards: the mPFC, the CN, the ACC, the posterior STS and the PC, all of which are strongly related to the strategic success of a player and overlap with those regions that also play a role in our ToM network.

#### **3.2. PERCEPTION, GAMES AND A STRATEGIC THEORY OF MIND**

Board game expertise activates a large neural network with many interacting brain regions that can be differentiated on the basis of the task involved. Recent work has shown that the neural networks activated in game experts involve a large number of brain areas, some belonging to the visual system and some to the ToM system. These findings have not yet been integrated in terms of their overlaps and differences, possibly because they come from different research areas. This section discusses four particular articles that have recently shed significant light on the different brain regions involved in board game expertise. These results are discussed in terms of a single system for board game expertise that encompasses both ToM and visual perception.

In two recent fMRI studies (Bilalić et al., 2010, 2012), Bilalić et al. have explicated the brain regions that are activated in expert chess play and their relationship to rapid eye movement toward areas of strategic importance. This is a purely perceptual body of work and so usefully isolates the perceptual mechanisms that generate eye movements without the need for considering the social context in which games are played (cf. the introductory remarks to section 3). Specifically they identified the ventral visual path (in the temporal cortex) as playing an important role in recognizing game pieces as well as familiar positional relationships between the game pieces, in support of the role the IT plays in generating eye movement signals (cf. **Figure 4**). In the dorsal visual path the region forming a conjunction with the parietal, occipital and temporal cortices was found to be related to specific game pieces and their functional roles. A further activation in the retrosplenial cortex was also observed, a region that has been identified with scene context, and the authors suggested this region plays a role in parsing the relationships between objects. Beyond the neurological findings, both of these studies highlighted the differences in eye movement between novices and experts. Experts focused quickly on the task relevant pieces in the scene and ignored irrelevant pieces whereas novices attended to irrelevant pieces much more often. In control tasks in which game piece relationships could not be used to guide the experts' behavior their performance decreased significantly but still maintained an advantage over novices.

A third article by Wan et al. (2011) considered three ranks of players, low ranked amateurs, high ranked amateurs and professionals of the Japanese board game Shogi. The players were required to generate the next move as quickly as possible while fMRI brain imaging followed the time-course of neural activity. The PC and CN were two regions that were strongly activated in professionals but not amateurs. As previous studies have shown that the PC is activated in understanding social contexts (Huth et al., 2012) and the CN is activated in strategic interactions with other people (Delgado et al., 2005; Rilling et al., 2008) as well as correlating with depth of strategic reasoning (Bhatt and Camerer, 2005), this suggests that the results of Wan et al. (2011) overlap with strategic reasoning, and as argued above strategic reasoning is a subset of the cognitive processes used during some ToM tasks. There was also considerable activity in the dorsolateral PFC for both amateurs and professionals when contrasts were made with control tasks. The authors concluded in part that this was not a direct "stimulus-response" activation as the players reported being unable to figure out a complete strategy before making their next move selection. Instead they concluded that "the generation of the next best move had to be based on perception of key features extracted from the pattern but not the pattern itself. In other words the mapping from inputs to outputs had to be categorical." Furthermore, the players were not able to picture all of the necessary intermediary moves required to complete the checkmate task they were given, instead they were only able to "get an idea of the arrangement of key pieces at the final checkmate." Beyond the activation of the PC and CN, an indicator of a socio-neural response, the conclusion that can be drawn is that categorical recognition and pattern completion are two key aspects of an expert's ability to quickly generate the next move in board games.

The fourth study was on the role of expertise in board games (Duan et al., 2012) in combination with the "default mode network" (DMN) (Raichle et al., 2001). The DMN is the resting activity of the brain and it plays a significant role in our understanding of ToM (Spreng et al., 2009). What is most interesting about this network is that it is significantly deactivated during goal directed tasks, presumably so that our cognitive functions can focus on the external environment rather than internal, reflective or introspective ruminations. In the study by Duan et al. (2012) Masters and Grand Masters (experts) of the game Chinese chess were imaged using fMRI for their resting state neural activity and for their task induced (Chinese chess problem solving) neural activity. There were two key findings: experts significantly deactivated the DMN relative to novices, and the CN was considerably more active in the experts' DMN than in the novices'. This should be contrasted with the areas commonly associated with ToM: the CN was excluded from the earlier list of ToM brain regions, so this finding by Duan et al. suggests that the DMN (and therefore the ToM network) is significantly different for experts.

The relevance of these findings is in the relationship between the neural networks activated during economic games, during expert task execution and our understanding of another person's cognitive state. With this in mind the CN plays a striking role: it is significantly active in the DMN of experts, in board game experts and in strategic economic games. The role commonly attributed to the CN relates to feedback based learning (Haruno et al., 2004), so it is not so surprising that it should be active in economic games when rewards are earned based on performance, and perhaps even in the case of board games where rewards tend to be more abstract (winning or losing after many moves are made as opposed to money received immediately). Its role in the DMN is less intuitive, but it might be understood to play a role in the encoding of another's payoffs as well as personal payoffs. Three recent papers have shown that this is a plausible role for the CN: the macaque monkey CN encodes its own rewards as well as social status (Santos et al., 2012), in humans the CN is active in cooperation between people where no rewards are forthcoming (Krill and Platek, 2012) and it also encodes another's "moral character" in economic games (Delgado et al., 2005). These studies point to the CN as potentially encoding value judgements regarding other players as well as our own. Such estimates of another person's evaluation of the situation seem a likely minimum for estimating the motivations for another person's choices, and so representing another person's estimated value in order to model their motivations may well co-opt the pre-existing system of reward feedbacks that play a critical role in motivating our own decisions in a non-social context.

Strategic IQ is a measure of the depth to which we are able to reason about other people thinking about us thinking about them thinking about us etc., as measured by payoffs in economic games (Coricelli and Nagel, 2009). This is potentially an infinite regress for which there is no stopping point. However, it has been shown that an *equilibrium point* can be reached in this dynamic, and this equilibrium has been measured in the neural activity of people playing economic games (Bhatt and Camerer, 2005), but to date no theoretical model of the neural mechanisms involved has been proposed. On the other hand, a ToM is a very general cognitive process (Lieberman, 2007) that enables people to build a representation of the cognitive states of another person, including their beliefs, constraints and perceptions, and potentially allows one person to build an internal representation of another person's representation of them. So the definitions of ToM and strategic IQ have commonalities but they are either too broad (ToM) or too narrow (strategic IQ) to capture the processes that are likely being used by decision-makers in complex, social-competitive situations. To address this, the term Strategic ToM is used here to encompass the psychological aspects, neural dynamics and subsequent equilibrium points necessary to finitely represent our representation of another person thinking about us, expanding on the strategic IQ notion to include the ToM components important to complex social-competitive decision-making.

Taken collectively, these studies have identified a network of activity that encodes visual perceptual cues and task specific objects as well as reward and social learning mechanisms in combination with aspects of a strategic ToM. Some of the most commonly cited and important brain regions in this network are identified in **Figure 4** for the perceptual aspects (red) and the strategic ToM aspects (green). This is necessarily a reduction to only the simplest functional roles and relationships, but it gives an indication of how these regions likely combine together to form a multifunctional network of interrelated brain regions. In the next section a theoretical model of rewards and strategic ToM is introduced, providing a theoretical approach to understanding some of the mechanisms discussed in this section.

#### **3.3. AN EXAMPLE OF PLAUSIBLE NEURAL ACTIVATIONS DURING GAME PLAY**

Before introducing the SDE dynamics of a strategic ToM we want to motivate what follows by illustrating the ideas presented so far with a worked example. To begin, a chess game opening is already in progress (as shown in **Figure 5**) and we ask what are the brain regions that might be activated in seeing this game and what roles do they play? We use the simplifying assumptions that most of the signals we are interested in will first pass through region V1 so that we are only considering the visual aspects of the game and that the players only ever consider two moves as they search forwards in the game looking for good moves to make. This second simplification makes the discussion much simpler, but the ideas can be readily extended to multiple base moves and multiple subsequent branches.

The social and reward related neural networks (green path in **Figure 4**) are activated when the player first sees the game. Generally the TPJ is activated if the situation requires understanding the internal mental states of another person (Saxe and Kanwisher, 2003), it is activated when strategically thinking about other people rather than computers (Krueger et al., 2008), its activation level correlates with the depth of strategic reasoning (Coricelli and Nagel, 2009) and it is associated with socially guided decisions (Carter et al., 2012). The PC is activated if the current situation is a social context involving people, movement, certain animals, cars, tools, equipment, talking etc. (Huth et al., 2012) as well as ToM tasks (Saxe and Kanwisher, 2003), but in this case it happens to be a chess game in progress (Wan et al., 2011), perceived visually by the chess board, game pieces and another player. The CN activation is strongly associated with activation of the PC in board games (Wan et al., 2011) and with the ACC when learning (reward feedback) in economic games (Sanfey, 2007) and in distinguishing between "me" and "not me" based rewards (Tomlin et al., 2006). Finally in this social/reward path the STS is activated during ToM tasks (Saxe and Kanwisher, 2003) and in the perception of intentional behavior in other people (Gallagher and Frith, 2003). Taken as a combination of activations, this neural network recognizes the social context of the board game and the tools of this social context (the board and chess pieces). It also recognizes that another person is involved in the situation and that they have internal mental states that will play a role in the decisions that will need to be made. Finally the feedback from the outcome needs to distinguish between rewards received by the first player ("me") and the second player ("not me") for their respective choices in order to accurately attribute each player's gain or loss to the relevant player and strategy (see section 3.4 for further discussion about each player's rewards and their influence on choices). This establishes the immediate social context in which decisions will be made.

**FIGURE 5 |** … first from the initial position and the tree analysis is simplified to considering only two possible branches at each stage. Knight to *f*6 and Pawn to *d*6 are two different "base moves" from which Black then begins their analysis of subsequent play. In order for Black to estimate the success of these two base moves they need to consider how White will reply, and the way White will reply depends on how White evaluates how Black will reply to White's moves, hence Black needs to evaluate how White will evaluate Black's reply to White's next move. Note that Black's final move choices shown here (pawn to *d*5 and Bishop to *c*5) are the same irrespective of what White chooses prior to these moves; this is a strategy that might occur in a real game. However, White values Black's possible responses differently depending on what move White is considering (either the Knight or the pawn) and subsequently this changes what Black might prefer to do when choosing either the Knight or the pawn as their very first move in this sequence. Such considerations in complex games have some commonality with economic games.

The visual perception of this opening position is also activated when first seeing the game. The occipito-temporal junction (OTJ) is differentially activated for chess experts when compared to both controls and non-expert chess players, and it plays a role in guiding the eye movements of experts when searching the board for their next move (Bilalić et al., 2010, 2012). The ventral stream along the inferotemporal cortex as a whole has been extensively studied in humans and other primates (Kriegeskorte et al., 2008) and it has been computationally modeled in terms of more and more complex visual representations of larger and larger portions of the visual scene (Serre et al., 2007). So this ventral pathway identifies visual objects such as individual chess pieces and constructs progressively more complex representations of these objects, including their familiar spatial relationships. As suggested by Wan et al. (2011), sufficiently complex representations of a board game are categorical representations; recognition is not an exact pattern matching process. Once the ventral path has aggregated the visual scene to the extent the player's experience makes this possible, a "context" signal can follow one of two paths. If there is sufficient information in the contextual signal to suggest a single move then the primary motor area (PM, **Figure 4**) is signalled to make that move. Alternatively a signal arrives at the frontal eye field (FEF, **Figure 4**) to tell the eyes where to move next in order to explore different regions of the board; this is the basis of expertise guided search based on the contextual cues embedded in scenes (Chun and Jiang, 1998) and games (Bilalić et al., 2012). This establishes the initial perceptual processing of the board as a visual "scene."

The final process shown in **Figure 4** is the activation of the PFC and the subsequent processes that lead to the player actually making a move on the board (blue path). In the chess example this is where the social context, the differentiated roles of the two players and the perceptual information are integrated so that a coherent strategy can be developed. If a move has not yet been made (i.e., the context was not sufficient to suggest a move to make immediately) then the eyes are searching the scene, providing more information to the PFC. This information needs to be integrated in terms of the constraints, plans, goals and incentives of the player as well as a representation of the same mental states of their opponent. With this in mind Koechlin et al. (2003) and Koechlin and Summerfield (2007) have proposed a model of cascading levels of processing that begins in the most anterior regions of the PFC and ends at the posterior region of the PFC just before the PM cortex; this is the anterior to posterior path shown in blue in **Figure 4**. Importantly, the most anterior regions of the PFC seem to play a role in the integration of the outcome of multiple cognitive processes when a person is pursuing a higher behavioral goal (Ramnani and Owen, 2004). In terms of exploring and planning possible moves in chess, the eyes foveate a potentially useful region of the board, guided by perceptual templates; this is called a "base move" (Gobet, 1997). From this region, branching strategies of potential moves the player and their opponent might make are then searched to find a good intermediate position in the game (either a piece captured or a strong strategic configuration). So in **Figure 5** the Black player's eyes initially saccade to the pawn at *d*7 and the player considers moving this pawn to *d*6, from which a number of alternatives are possible for White to then play and Black to then reply etc.; two of White's options are shown in **Figure 5**.
The Black player's eyes then saccade to the knight at *g*8 and the player considers the sequence of moves that begins with a move of the knight to *f*6; a sequence of possible plays is shown in **Figure 5**. Once the Black player has decided which of these two strategies to adopt (knight to *f*6 or pawn to *d*6), the neural activation *cascades* (Koechlin and Summerfield, 2007) from the anterior regions of the PFC, in which strategic planning and branching has been mapped out at the conceptual level, to the posterior regions of the PFC, where a motor plan leads to the moving of the relevant parts of the body (Ramnani and Owen, 2004) to shift a game piece in order to make the first move. While this style of reasoning is somewhat similar to that used in the much simpler (and strategically different) economic games (Bhatt and Camerer, 2005), it is far more complex in that perception, constraints, learning and uncertainty in evaluations play significant roles in the decision-making process over and above the purely strategic structure of the game and the payoffs of economic theory.
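The two-branch search described above can be written down as a small tree. The following is only an illustrative sketch: the move labels are taken from the worked example, while the dictionary layout and the helper `lines_of_play` are assumptions introduced here and are not part of the original model.

```python
# Search tree for the worked example (Black to move).
# Move labels follow the text; the data layout is a hypothetical choice.
# Assumption taken from the Figure 5 caption: Black's final replies are the
# same two moves on every branch.
BLACK_FINAL = ["pawn d5", "Bishop c5"]

TREE = {
    "Knight f6": {"Knight g5": BLACK_FINAL, "Knight c3": BLACK_FINAL},
    "pawn d6": {"pawn c3": BLACK_FINAL, "pawn h3": BLACK_FINAL},
}


def lines_of_play(tree):
    """Enumerate every (Black base move, White reply, Black reply) sequence."""
    return [
        (base, white, black)
        for base, white_replies in tree.items()
        for white, black_replies in white_replies.items()
        for black in black_replies
    ]


if __name__ == "__main__":
    for line in lines_of_play(TREE):
        print(" -> ".join(line))
```

With two base moves, two White replies each and two Black replies each, the tree contains eight lines of play, which is the full set of sequences Black has to evaluate in this simplified example.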

#### **3.4. STOCHASTIC DYNAMICS FOR A STRATEGIC THEORY OF MIND**

In order to model one player's internal representation of another player, and how they use this perspective to evaluate their own strategy, it is necessary to represent the neural processes that are used in evaluating multiple different strategic alternatives that both players might consider, as illustrated in **Figure 5**. In order to do so, the following borrows significantly from the economic game theory literature in a similar fashion to that of Yoshida et al. (2008) but within the context of strategically complex games and using SDEs to model the underlying neural dynamics within the PFC. To begin, we identify the neural encoding of a player's strategy with levels of neural activity: a strongly favored strategy is reflected in higher levels of neural activity, in an analogous fashion to other decision-making activations in other regions of the brain (Gold and Shadlen, 2001; Brown et al., 2009; Simen et al., 2009; Rorie et al., 2010). In this case *K<sub>b</sub>* represents the level of neural activity associated with Black's encoding of the choice *Knight to f6* and *P<sub>b</sub>* represents the level of neural activity associated with the choice *pawn to d6*. We will also use the notation *p(K<sub>b</sub>)* and *p(P<sub>b</sub>)* to represent the probabilities of Black choosing each of these two moves. Either of these two choices by Black leads to different choices by White when they move next (see **Figure 6**) and Black needs to estimate the "value" or "weight" (Greek letters in **Figure 6**) attributed to each of these outcomes for Black: if Black plays Knight, then either White knight to *g*5 (weight φ<sub>*kg*5</sub>) or White knight to *c*3 (weight φ<sub>*kc*3</sub>); alternatively, if Black plays pawn, then either White pawn to *c*3 (weight ϕ<sub>*pc*3</sub>) or White pawn to *h*3 (weight ϕ<sub>*ph*3</sub>).
So Black also needs to encode a representation of White's likely choices. We represent the neural activity in Black's PFC associated with White's choices using labels for White's four possible moves shown in the left of **Figure 6**: *K<sub>w</sub><sup>g5</sup>* and *K<sub>w</sub><sup>c3</sup>* if Black plays Knight, *P<sub>w</sub><sup>c3</sup>* and *P<sub>w</sub><sup>h3</sup>* if Black plays pawn. With this interpretation we can represent changes in the level of neural activity for each move's neural encoding as a function of the other player's strategy with some statistical variation:

$$d\mathcal{K}_b = \left(\phi_{kg5}\,\mathcal{K}_w^{g5} + \phi_{kc3}\,\mathcal{K}_w^{c3}\right) dt + \sigma_{\mathcal{K}}^{b}\, dW_{\mathcal{K}} \tag{8}$$

$$d\mathcal{P}_b = \left(\varphi_{pc3}\,\mathcal{P}_w^{c3} + \varphi_{ph3}\,\mathcal{P}_w^{h3}\right) dt + \sigma_{\mathcal{P}}^{b}\, dW_{\mathcal{P}} \tag{9}$$

These are a pair of Ornstein-Uhlenbeck (drift diffusion) equations in which the drift terms are not constant as they depend on other dynamic variables, in this case the levels of neural activity *K<sub>w</sub><sup>g5</sup>*, *K<sub>w</sub><sup>c3</sup>*, *P<sub>w</sub><sup>c3</sup>*, and *P<sub>w</sub><sup>h3</sup>*, discussed shortly. The rate of change in neural activity for Black's Knight move, *dK<sub>b</sub>*, is composed of a weighted sum (the weights are the constants φ<sub>*kg*5</sub> and φ<sub>*kc*3</sub>) of the current level of activity of Black's neural encoding of White's options of two different Knight moves. Just as in Equation (1) there are noise terms σ<sub>*K*</sub><sup>*b*</sup> and σ<sub>*P*</sub><sup>*b*</sup> representing non-systematic errors in the encoding of the strategies. Both *dK<sub>b</sub>* and *dP<sub>b</sub>* are independent of each other in that the level of activity of one variable does not influence the other (neither term appears in the expression for the other). The four weights that appear in Equations (8, 9) (φ<sub>*kg*5</sub>, φ<sub>*kc*3</sub>, ϕ<sub>*pc*3</sub>, and ϕ<sub>*ph*3</sub>) are based upon Black's previous experience of their own ability in playing these two strategies. For example Black believes they play *K<sub>b</sub>* with strength φ<sub>*kg*5</sub> against White's *K<sub>w</sub><sup>g5</sup>* and with strength φ<sub>*kc*3</sub> against White's *K<sub>w</sub><sup>c3</sup>*. These are subjective estimates a player has developed through experience and are subject to uncertainty in their estimation, particularly when the strategies in question are unfamiliar or the other player's ability is unknown.
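To make the structure of Equations (8) and (9) concrete, the following is a minimal numerical sketch using a simple Euler-Maruyama discretization. It is not the author's implementation: the function name `simulate_black`, the parameter values, and in particular the choice to hold White's activity levels constant are assumptions made purely for illustration (in the model these levels are themselves dynamic).

```python
import math
import random


def simulate_black(phi, varphi, white, sigma=(0.1, 0.1),
                   dt=0.01, steps=1000, seed=0):
    """Euler-Maruyama integration of Equations (8) and (9).

    phi    = (phi_kg5, phi_kc3): weights on White's two Knight replies
    varphi = (varphi_pc3, varphi_ph3): weights on White's two pawn replies
    white  = (K_w_g5, K_w_c3, P_w_c3, P_w_h3): White's activity levels,
             held constant here (an illustrative simplification)
    sigma  = (sigma_K_b, sigma_P_b): noise amplitudes
    Returns the two activity trajectories (K_b, P_b) as lists.
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    drift_K = phi[0] * white[0] + phi[1] * white[1]
    drift_P = varphi[0] * white[2] + varphi[1] * white[3]
    K, P = [0.0], [0.0]
    for _ in range(steps):
        # Independent Wiener increments dW_K and dW_P, each ~ N(0, dt)
        dW_K = rng.gauss(0.0, sqrt_dt)
        dW_P = rng.gauss(0.0, sqrt_dt)
        K.append(K[-1] + drift_K * dt + sigma[0] * dW_K)
        P.append(P[-1] + drift_P * dt + sigma[1] * dW_P)
    return K, P


if __name__ == "__main__":
    K, P = simulate_black((1.0, 0.5), (0.2, 0.2), (1.0, 1.0, 1.0, 1.0))
    print(f"K_b(T) = {K[-1]:.3f}, P_b(T) = {P[-1]:.3f}")
```

Because neither variable appears in the update of the other, the two trajectories evolve independently, as noted above; they are coupled only through what Black believes about White's activity levels.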

The Black player arrives at the decision to play either the Knight or the pawn when the absolute value of either *K<sub>b</sub>* or *P<sub>b</sub>* reaches a certain threshold value, thereby signaling a decision (in principle similar to **Figure 1**, but see Bogacz et al. (2006) for details), and this signal cascades from the anterior PFC to the posterior PFC where this first move in Black's strategy is then turned into a motor plan by the PM cortex. The rate at which the neural activity of either *K<sub>b</sub>* or *P<sub>b</sub>* reaches this threshold depends on the fixed weights and the neural activity associated with White's strategy. The fixed weights can be attributed to learned and reinforced behaviors and so do not change during the time-course of a single decision, but note that these feedback (reward) based weights need to be correctly attributed to each player, so the neural encoding needs to reflect which player did what and what each player received as feedback. Incorrectly attributing feedback to the actions of different players will result in misattributing the weights associated with each player's strategy in future games. The neural activity associated with White's choices is a dynamic quantity associated with Black's representation of what Black believes White will choose to do after Black has made their move. This requires Black to represent White's decision-making process, and White will choose the strategy that best advantages them given what White thinks Black will do following White's move. Just looking at *dK<sub>b</sub>* above, Black needs to encode the following decision-making processes in order to accurately represent what White is likely to do next:

$$d\mathcal{K}_w^{c3} = \left(\phi_{pd5}\,\mathcal{P}_b^{d5} + \phi_{bc5}\,\mathcal{B}_b^{c5}\right) dt + \sigma_{c3}^{w}\, dW_{\mathcal{K}} \tag{10}$$

$$d\mathcal{K}_w^{g5} = \left(\varphi_{pd5}\,\mathcal{P}_b^{d5} + \varphi_{bc5}\,\mathcal{B}_b^{c5}\right) dt + \sigma_{g5}^{w}\, dW_{\mathcal{K}} \tag{11}$$

Note that White's moves *K<sub>w</sub><sup>g5</sup>* and *K<sub>w</sub><sup>c3</sup>* have fixed weights on the right hand side of Equations (10, 11) representing the learned payoffs White has for Black's moves of pawn to *d*5 and Bishop to *c*5. While there are only two Black moves in Equations (10, 11) (see the caption to **Figure 5**), the weights White attributes to these two possible choices of Black's are different because White playing Knight to *g*5 first is strategically different to White playing Knight to *c*3 first; hence the payoff weights for White are different and this has a follow-on effect in Black selecting a move in Equations (8, 9). In this sense Black has an internal representation, encoded in their neural activity, of how White will make a decision, and this in turn depends on what Black will do in response to White's choices. Also note that Black needs to be able to encode the payoffs to White in order to estimate White's possible choices, so Black needs to distinguish between their payoffs and White's payoffs.
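The nesting of Equations (10) and (11) inside Equation (8), together with the threshold rule for signaling a decision, can be sketched numerically as follows. This is a hedged illustration rather than the model itself: only the Knight branch is simulated, Black's follow-up activities (pawn to *d*5 and Bishop to *c*5) are held fixed at the deepest level of the search, a single noise amplitude is shared by all three variables, and the function name, threshold and parameter values are all hypothetical.

```python
import math
import random


def first_passage_knight(w_c3, w_g5, phi_k, leaves, threshold=1.0,
                         sigma=0.05, dt=0.01, max_steps=100_000, seed=0):
    """Integrate Equations (10), (11) and (8) together (Euler-Maruyama).

    w_c3   = weights in Equation (10) on (P_b^d5, B_b^c5)
    w_g5   = weights in Equation (11) on (P_b^d5, B_b^c5)
    phi_k  = (phi_kg5, phi_kc3), the weights in Equation (8)
    leaves = (P_b^d5, B_b^c5), held fixed: the search stops at this depth
    Returns the time at which |K_b| first reaches the threshold,
    or None if it does not within max_steps.
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)

    def noise():
        # One independent Gaussian increment per variable per step
        return sigma * rng.gauss(0.0, sqrt_dt)

    Kw_c3 = Kw_g5 = Kb = 0.0
    for step in range(1, max_steps + 1):
        # Equation (8): Black's activity is driven by White's current activity
        dKb = (phi_k[0] * Kw_g5 + phi_k[1] * Kw_c3) * dt + noise()
        # Equations (10, 11): White's activity is driven by Black's replies
        Kw_c3 += (w_c3[0] * leaves[0] + w_c3[1] * leaves[1]) * dt + noise()
        Kw_g5 += (w_g5[0] * leaves[0] + w_g5[1] * leaves[1]) * dt + noise()
        Kb += dKb
        if abs(Kb) >= threshold:
            return step * dt
    return None


if __name__ == "__main__":
    t = first_passage_knight((0.5, 0.5), (0.5, 0.5), (1.0, 1.0), (1.0, 1.0))
    print("decision time for the Knight move:", t)
```

In this sketch, larger learned weights shorten the time taken to reach the threshold, which mirrors the statement that the rate of approach depends on the fixed weights and on the activity attributed to White.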

In such a situation it is not obvious that there is a solution to these dynamics that allows Black to settle on a decision as to which is the best move to make. Fortunately these types of SDEs have known solutions, particularly in the case where the drift term (the term that appears immediately before the *dt* in Equations (8), (9), (10), and (11)) is linear in the dynamic variables (Plastino and Plastino, 1998). These solutions take the form of probability distributions over the strategies that Black has available to them:

$$p(\mathcal{K}_b) = \frac{\exp\left[\beta_b\left(\phi_{kg5}\,p(\mathcal{K}_w^{g5}) + \phi_{kc3}\,p(\mathcal{K}_w^{c3})\right)\right]}{\exp\left[\beta_b\left(\phi_{kg5}\,p(\mathcal{K}_w^{g5}) + \phi_{kc3}\,p(\mathcal{K}_w^{c3})\right)\right] + \exp\left[\beta_b\left(\varphi_{pc3}\,p(\mathcal{P}_w^{c3}) + \varphi_{ph3}\,p(\mathcal{P}_w^{h3})\right)\right]} \tag{12}$$

$$p(\mathcal{P}_b) = \frac{\exp\left[\beta_b\left(\varphi_{pc3}\,p(\mathcal{P}_w^{c3}) + \varphi_{ph3}\,p(\mathcal{P}_w^{h3})\right)\right]}{\exp\left[\beta_b\left(\phi_{kg5}\,p(\mathcal{K}_w^{g5}) + \phi_{kc3}\,p(\mathcal{K}_w^{c3})\right)\right] + \exp\left[\beta_b\left(\varphi_{pc3}\,p(\mathcal{P}_w^{c3}) + \varphi_{ph3}\,p(\mathcal{P}_w^{h3})\right)\right]} \tag{13}$$

where *p(K<sub>w</sub><sup>g5</sup>)* and *p(K<sub>w</sub><sup>c3</sup>)* are probability distributions defined in an analogous fashion to Equations (12, 13) but for the White player, and a similar substitution for β<sub>*b*</sub> as for Equation (7) has been made in order to simplify the expression. Both *p(K<sub>w</sub><sup>g5</sup>)* and *p(K<sub>w</sub><sup>c3</sup>)* have a further dependence on the probabilities over Black's subsequent move choices after White's next move. Note that although Equations (7), (12), and (13) describe entirely different cognitive processes they have a very similar functional form:

$$p(x_i) = \frac{\exp\left[\beta f_i(\mathbf{c}, \mathbf{x})\right]}{\sum_k \exp\left[\beta f_k(\mathbf{c}, \mathbf{x})\right]} \tag{14}$$

in which β is a noise parameter and the *f<sub>i</sub>(c, x)* are linear functions of constants *c<sub>j</sub>* and dynamic variables *x<sub>i</sub>*, collected here into the vectors *c* and *x*. Such exponential forms of probability distributions over choices are common in models of bounded rationality in economics (McKelvey and Palfrey, 1995; Wolpert et al., 2011), in computational models of simple reinforcement learning (Williams, 1992; Sato et al., 2002) and as models of neural activity in theoretical psychology (Bogacz et al., 2006; McMillen and Holmes, 2006). However, to date the connection between the generic dynamics of SDEs, theoretical neuropsychology and expertise does not appear to have been made.
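As an illustration of the shared functional form, Equation (14) can be evaluated directly, and Equations (12) and (13) then follow as a two-option special case. The sketch below assumes that the probabilities of White's replies are already known and passes them in as plain numbers; in the model they are themselves given by expressions of the same form. The function names and the example values are hypothetical.

```python
import math


def softmax_choice(payoffs, beta):
    """Equation (14): p_i = exp(beta * f_i) / sum_k exp(beta * f_k)."""
    scaled = [beta * f for f in payoffs]
    m = max(scaled)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def black_move_probabilities(phi, varphi, p_white_knight, p_white_pawn, beta_b):
    """Equations (12) and (13) for given probabilities of White's replies.

    phi            = (phi_kg5, phi_kc3)
    varphi         = (varphi_pc3, varphi_ph3)
    p_white_knight = (p(K_w^g5), p(K_w^c3))
    p_white_pawn   = (p(P_w^c3), p(P_w^h3))
    """
    f_K = phi[0] * p_white_knight[0] + phi[1] * p_white_knight[1]
    f_P = varphi[0] * p_white_pawn[0] + varphi[1] * p_white_pawn[1]
    p_K, p_P = softmax_choice([f_K, f_P], beta_b)
    return p_K, p_P


if __name__ == "__main__":
    p_K, p_P = black_move_probabilities(
        (1.0, 0.5), (0.2, 0.2), (0.5, 0.5), (0.5, 0.5), beta_b=2.0
    )
    print(f"p(K_b) = {p_K:.3f}, p(P_b) = {p_P:.3f}")
```

When β is zero every option is equally likely, and as β grows the probability concentrates on the option with the largest weighted value, so β controls how strongly the learned weights determine the choice.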

#### **4. DISCUSSION**

The need to distinguish between that which motivates a person's choices and that which motivates another person's choices, and how these motivations interact in a single individual's decision making processes is a critical component to the way in which we interact socially. How such strategic considerations are then integrated with our perceptual understanding of the environment is a challenging question that can be discussed in terms of skill in competitive board games.

This article has developed a model of visual perception and opponent modeling within the framework of SDEs representing task specific neural activity and subsequent decision-making. The modeling of game-scene perception follows a well developed research paradigm of progressively more sophisticated representations of the scene culminating in the most complex representations that a player's experience allows for. In the highest ranking experts this can result in an initial, rapid perception of the board being sufficient to initiate a single good move almost immediately after being presented with a game. More generally this expert perception activates the eyes to rapidly and efficiently search the board for the most likely candidate regions and base moves from which to explore possible branches of play. Once one of a small set of good base moves has been selected an expert is able to search forward in the game tree with greater strategic depth than a nonexpert and to be able to more effectively estimate the likely replies of their opponent and how these replies influence the player's choice of the next move to make. While these general psychological results have existed for some time now, there appears to be no previous analysis of how SDEs, their dynamics and probabilistic outcomes in terms of neural activity are related to the psychological literature of expertise.

The formal representation of how a player might model their opponent's state of mind, particularly their strategy space, incentives, constraints, and the influence these aspects have on their decisions, is a challenging task. However, the formal techniques have been available for some time and the results in terms of probability distributions are not particularly divergent from some previously established theoretical models. The demanding task is in the integration of the vast amount of data available from the neuro-imaging literature into a coherent whole that is both consistent and convincing. Theoretical (Yoshida et al., 2008) and empirical (Bhatt and Camerer, 2005) arguments for a game theoretical basis of a ToM have appeared in the literature, but these have not previously been extended to more complex tasks or to expertise, nor have they used SDEs as the basis for their modeling. Furthermore, the relationship to the broader neuroscience literature has had almost no coverage in this respect, specifically how the different neural networks and their functional roles might be integrated as a whole in the modeling of expertise. The critical difference in the approach put forward here is that, unlike previous neural-connectionist-reinforcement paradigms, this model represents a player's internal representation of the other player's internal strategic state of mind, not just their own. This entails several important cognitive steps, principally recognizing that the task environment contains another cognisant entity that will dynamically adapt their choices according to their beliefs or expectations about the choices others will make. It also requires the motivations and constraints of this other entity to be internally represented and so we need to consider how our ToM and reward mechanisms interact with our strategic perspective (and how we model the strategic perspective of others).
This represents a significant step in showing how different cognitive processes might be integrated to help explain some of the prodigious skills we are all capable of expressing to some extent, and the role these skills might play in a broader context, such as our everyday social lives.

#### **FUNDING**

This work was supported in part by US Air Force Grant AOARD 104116.

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 06 August 2013; accepted: 21 November 2013; published online: 17 December 2013.*

*Citation: Harré M (2013) The neural circuitry of expertise: perceptual learning and social cognition. Front. Hum. Neurosci. 7:852. doi: 10.3389/fnhum.2013.00852 This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Harré. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*
