**DISSECTING THE FUNCTION OF NETWORKS UNDERPINNING LANGUAGE REPETITION**

**Topic Editors Marcelo L. Berthier and Matthew A. Lambon Ralph**

HUMAN NEUROSCIENCE

#### *FRONTIERS COPYRIGHT STATEMENT*

© Copyright 2007-2014 Frontiers Media SA. All rights reserved.

All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

Cover image provided by Ibbl sarl, Lausanne CH

**ISSN** 1664-8714 **ISBN** 978-2-88919-364-6 **DOI** 10.3389/978-2-88919-364-6

#### *ABOUT FRONTIERS*

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

### *FRONTIERS JOURNAL SERIES*

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing.

All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

#### *DEDICATION TO QUALITY*

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view.

By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

### *WHAT ARE FRONTIERS RESEARCH TOPICS?*

Frontiers Research Topics are very popular trademarks of the Frontiers Journal Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area!

Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

## **DISSECTING THE FUNCTION OF NETWORKS UNDERPINNING LANGUAGE REPETITION**

Topic Editors:

**Marcelo L. Berthier,** University of Malaga, Spain

**Matthew A. Lambon Ralph,** University of Manchester, United Kingdom

In the 19th century, ground-breaking observations on aphasia by Broca and Wernicke suggested that language function depends on the activity of the cerebral cortex. At the same time, Wernicke and Lichtheim also elaborated the first large-scale network model of language which incorporated long-range and short-range (transcortical connections) white matter pathways in language processing. The arcuate fasciculus (dorsal stream) was traditionally viewed as the major language pathway for repetition, but scientists also envisioned that white matter tracts travelling through the insular cortex (ventral stream) and transcortical connections may take part in language processing. Modern cognitive neuroscience has provided tools, including neuroimaging, which allow the in vivo examination of short- and long-distance white matter pathways binding cortical areas essential for verbal repetition. However, this state of the art on the neural correlates of language repetition has revealed contradictory findings, with some researchers defending the role of the dorsal and ventral streams, whereas others argue that only cortical hubs (Sylvian parieto-temporal cortex [Spt]) are crucially relevant.

An integrative approach would conceive that the interaction between these structures is essential for verbal repetition. For instance, different sectors of the cerebral cortex (e.g., Spt, inferior frontal gyrus/anterior insula) act as hubs dedicated to short-term storage of verbal information or articulatory planning and these areas in turn interact through forward and backward white matter projections. Importantly, white matter pathways should not be considered mere cable-like connections as changes in their microstructural properties correlate with focal cortical activity during language processing tasks.

Despite considerable progress, many outstanding questions await answers. The articles in this Research Topic tackle many different and critical new questions, including: (1) how white matter pathways instantiate dialogues between different cortical language areas; (2) what are the specific roles of different white matter pathways in language functions in normal and pathological conditions; (3) what are the language consequences of discrete damage to branches of the dorsal and ventral streams; (4) what are the consequences (e.g., release from inhibition) of damage to the left white matter pathways in contralateral ones and vice versa; (5) how these pathways are reorganised after brain injury; (6) can the involvement/sparing of white matter pathways be used in outcome prediction and treatment response; and (7) can the microstructure of white matter pathways be remodelled with intensive rehabilitation training or biological approaches.

This Research Topic includes original studies, and opinion and review articles which describe new data as well as provocative and insightful interpretations of the recent literature on the role of white matter pathways in verbal repetition in normal and pathological conditions. A brief highlight summary of each is provided below.

# Table of Contents

*The Anatomo-Functional Connectivity of Word Repetition: Insights Provided by Awake Brain Tumor Surgery*

Sylvie Moritz-Gasser and Hugues Duffau

*28 The Roles of the "Ventral" Semantic and "Dorsal" Pathways in Conduite D'approche: A Neuroanatomically-Constrained Computational Modeling Investigation*

Taiji Ueno and Matthew A. Lambon Ralph

*36 Articulation-Based Sound Perception in Verbal Repetition: A Functional NIRS Study*

Sejin Yoo and Kyoung-Min Lee

*46 Mapping a Lateralization Gradient within the Ventral Stream for Auditory Speech Perception*

Karsten Specht

*Dissociated Repetition Deficits in Aphasia Can Reflect Flexible Interactions Between Left Dorsal and Ventral Streams and Gender-Dimorphic Architecture of the Right Dorsal Stream*

Marcelo L. Berthier, Seán Froudist Walsh, Guadalupe Dávila, Alejandro Nabrozidis, Rocio Juarez y Ruiz de Mier, Antonio Gutiérrez, Irene De Torres, Francisco Alfaro, Natalia García-Casares and Rafael Ruiz-Cruces

*102 Sensory-To-Motor Integration During Auditory Repetition: A Combined fMRI and Lesion Study*

'Oiwi Parker Jones, Susan Prejawa, Tom Hope, Marion Oberhuber, Mohamed L. Seghier, Alex P. Leff, David W. Green and Cathy J. Price

*118 Dissecting the Functional Anatomy of Auditory Word Repetition*

Thomas Matthew Hadley Hope, Susan Prejawa, 'Oiwi Parker Jones, Marion Oberhuber, Mohamed L. Seghier, David W. Green and Cathy J. Price

## Dissecting the function of networks underpinning language repetition

#### *Marcelo L. Berthier <sup>1</sup> \* and Matthew A. Lambon Ralph<sup>2</sup>*

*<sup>1</sup> Cognitive Neurology and Aphasia Unit, Centro de Investigaciones Médico-Sanitarias, University of Málaga, Málaga, Spain <sup>2</sup> Neuroscience and Aphasia Research Unit, School of Psychological Sciences, University of Manchester, UK*

*\*Correspondence: mbt@uma.es*

#### *Edited and reviewed by:*

*John J. Foxe, Albert Einstein College of Medicine, USA*

**Keywords: repetition, arcuate fasciculus, dorsal stream, ventral stream, language, aphasia, conduction aphasia, temporal lobe**

In the nineteenth century, ground-breaking observations on aphasia by Broca (1865) and Wernicke (1906) suggested that language function depends on the activity of the cerebral cortex. At the same time, Wernicke (1906) and Lichtheim (1885) also elaborated the first large-scale network model of language which incorporated long-range and short-range (transcortical connections) white matter pathways in language processing. The arcuate fasciculus (dorsal stream) was traditionally viewed as the major language pathway for repetition, but scientists also envisioned that white matter tracts traveling through the insular cortex (ventral stream) and transcortical connections may take part in language processing. Modern cognitive neuroscience has provided tools, including neuroimaging, which allow the *in vivo* examination of short- and long-distance white matter pathways binding cortical areas essential for verbal repetition. However, this state of the art on the neural correlates of language repetition has revealed contradictory findings, with some researchers defending the role of the dorsal and ventral streams, whereas others argue that only cortical hubs (Sylvian parieto-temporal cortex [Spt]) are crucially relevant.

An integrative approach would conceive that the interaction between these structures is essential for verbal repetition. For instance, different sectors of the cerebral cortex (e.g., Spt, inferior frontal gyrus/anterior insula) act as hubs dedicated to short-term storage of verbal information or articulatory planning and these areas in turn interact through forward and backward white matter projections. Importantly, white matter pathways should not be considered mere cable-like connections as changes in their microstructural properties correlate with focal cortical activity during language processing tasks.

Despite considerable progress, many outstanding questions await answers. The articles in this Research Topic tackle many different and critical new questions, including: (1) how white matter pathways instantiate dialogues between different cortical language areas; (2) what are the specific roles of different white matter pathways in language functions in normal and pathological conditions; (3) what are the language consequences of discrete damage to branches of the dorsal and ventral streams; (4) what are the consequences (e.g., release from inhibition) of damage to the left white matter pathways in contralateral ones and vice versa; (5) how these pathways are reorganized after brain injury; (6) can the involvement/sparing of white matter pathways be used in outcome prediction and treatment response; and (7) can the microstructure of white matter pathways be remodeled with intensive rehabilitation training or biological approaches.

This Research Topic includes original studies, and opinion and review articles which describe new data as well as provocative and insightful interpretations of the recent literature on the role of white matter pathways in verbal repetition in normal and pathological conditions. A brief highlight summary of each is provided below.

#### **ARTICLES**

Opinion Article, Published on 29 Jul 2013

**The anatomo-functional connectivity of word repetition: insights provided by awake brain tumor surgery** Sylvie Moritz-Gasser and Hugues Duffau

In this opinion article, Moritz-Gasser and Duffau (2013) provide important insights on the role of the strong interaction of dorsal and ventral streams in word repetition. The authors discuss their pioneering results obtained through direct electrical stimulation of white matter pathways in awake patients during brain tumor surgery.

#### Review Article, Published on 12 Jul 2013

**Language repetition and short-term memory: an integrative framework**

Steve Majerus

Short-term maintenance of verbal information is crucially important for efficient language repetition of complex information. In a comprehensive review, Majerus (2013) presents an integrative framework aimed at bridging research in the language processing and short-term memory fields.

#### Hypothesis and Theory Article, Published on 02 Oct 2013

**Mapping a lateralization gradient within the ventral stream for auditory speech perception**

Karsten Specht

Specht (2013) analyses the results from several complementary functional neuroimaging studies with the aim of tracing the hierarchical processing network for speech comprehension within the left and right hemispheres. The author pays particular attention to the role of the temporal lobe and the ventral stream in auditory speech perception.

#### Original Research Article, Published on 05 Sep 2013

**Articulation-based sound perception in verbal repetition: a functional NIRS study**

Sejin Yoo and Kyoung-Min Lee

Using functional near-infrared spectroscopy, Yoo and Lee (2013) examine healthy subjects while they repeat pseudowords and words. This study reveals that both articulation and passive listening (without repetition) to various sounds (natural environmental sounds, animal vocalizations, and human non-speech sounds) activate neural circuits that include the inferior frontal regions of both hemispheres.

Original Research Article, Published on 26 Aug 2013

#### **The roles of the "ventral" semantic and "dorsal" pathways in conduite d'approche: a neuroanatomically-constrained computational modeling investigation**

Taiji Ueno and Matthew A. Lambon Ralph

In a computational modeling investigation of the dual dorsal-ventral pathway implicated in verbal repetition, Ueno and Lambon Ralph (2013) demonstrate that the successful phonetic approximations to target words (*conduite d'approche*), typically observed in patients with conduction aphasia and damage to the dorsal pathway (arcuate fasciculus), rely on the complementary activity of the ventral semantic stream.

#### Original Research Article, Published on 18 Oct 2013

#### **Repeating with the right hemisphere: reduced interactions between phonological and lexical-semantic systems in crossed aphasia?**

Irene De-Torres, Guadalupe Dávila, Marcelo L. Berthier, Seán Froudist Walsh, Ignacio Moreno-Torres and Rafael Ruiz-Cruces

In this study, De-Torres et al. (2013) show that repetition after subcortical lesions involving the dorsal and ventral streams in patients who are right-hemisphere dominant for language is not heavily influenced by lexical-semantic variables as is regularly reported in similar cases with left hemisphere damage.

#### Original Research Article, Published on 10 Dec 2013

#### **Predicting speech fluency and naming abilities in aphasic patients**

Jasmine Wang, Sarah Marchina, Andrea C. Norton, Catherine Y. Wan and Gottfried Schlaug

The identification of reliable biomarkers that predict the degree of chronic speech fluency/language impairment and potential for improvement after stroke is paramount. In this study, Wang et al. (2013) demonstrate that lesion load in the arcuate fasciculus (dorsal stream) is the anatomical marker that best stratifies patients into different outcome groups, with high accuracy for speech fluency and naming.

#### Original Research Article, Published on 31 Jan 2014

**Sensory-to-motor integration during auditory repetition: a combined fMRI and lesion study**

'Oiwi Parker Jones, Susan Prejawa, Tom Hope, Marion Oberhuber, Mohamed L. Seghier, Alex P. Leff, David W. Green and Cathy J. Price

On examining sensory-to-motor integration during auditory repetition in healthy subjects and aphasic patients, Parker Jones et al. (2014) find that normal and abnormal repetition of pseudowords correlates with activity in the arcuate fasciculus but is unrelated to the activity of different cortical areas.

Original Research Article, Published on 19 Dec 2013

#### **Dissociated repetition deficits in aphasia can reflect flexible interactions between left dorsal and ventral streams and gender-dimorphic architecture of the right dorsal stream**

Marcelo L. Berthier, Seán Froudist Walsh, Guadalupe Dávila, Alejandro Nabrozidis, Rocio Juarez y Ruiz de Mier, Antonio Gutiérrez, Irene De Torres, Francisco Alfaro, Natalia García-Casares and Rafael Ruiz-Cruces

Using multimodal neuroimaging, Berthier et al. (2013) evaluate the neural correlates of repetition performance in two aphasic patients matched for lesion volume (a female patient with preserved repetition and a male patient with impaired repetition). Dissociated repetition deficits in these cases are probably reliant on flexible interactions between left dorsal stream and left ventral stream and on gender-dimorphic architecture of the right dorsal stream.

#### Original Research Article

**Dissecting the functional anatomy of auditory word repetition** Thomas Matthew Hadley Hope, Susan Prejawa, 'Oiwi Parker Jones, Marion Oberhuber, Mohamed L. Seghier, David W. Green and Cathy J. Price

Hope et al. (2013) use a single, multi-factorial, within-subjects fMRI design to identify, and functionally distinguish, the multiple linguistic and non-linguistic processing areas involved in repeating back heard words. They find that repetition activates components of regions not hitherto implicated in word repetition. These novel findings thus challenge some commonly held views on the functional anatomy of language.

#### **REFERENCES**


Hope, T. M. H., Prejawa, S., Parker Jones, O., Oberhuber, M., Seghier, M. L., Green, D. W., et al. (2013). Dissecting the functional anatomy of auditory word repetition. *Front. Hum. Neurosci.* 8:246. doi: 10.3389/fnhum.2014.00246

Lichtheim, L. (1885). On aphasia. *Brain* 7, 433–484.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 August 2014; accepted: 29 August 2014; published online: 02 October 2014.*

*Citation: Berthier ML and Lambon Ralph MA (2014) Dissecting the function of networks underpinning language repetition. Front. Hum. Neurosci. 8:727. doi: 10.3389/fnhum.2014.00727*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Berthier and Lambon Ralph. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Language repetition and short-term memory: an integrative framework

#### *Steve Majerus 1,2\**

*<sup>1</sup> Department of Psychology - Cognition and Behavior, Université de Liège, Liège, Belgium <sup>2</sup> Fund for Scientific Research - FNRS, Brussels, Belgium*

#### *Edited by:*

*Marcelo L. Berthier, University of Malaga, Spain*

#### *Reviewed by:*

*Paul Hoffman, University of Manchester, UK Beth Jefferies, University of York, UK*

#### *\*Correspondence:*

*Steve Majerus, Department of Psychology - Cognition and Behavior, Université de Liège, Boulevard du Rectorat, B33, 4000 Liège, Belgium e-mail: smajerus@ulg.ac.be*

Short-term maintenance of verbal information is a core factor of language repetition, especially when reproducing multiple or unfamiliar stimuli. Many models of language processing locate the verbal short-term maintenance function in the left posterior superior temporo-parietal area and its connections with the inferior frontal gyrus. However, research in the field of short-term memory has implicated bilateral fronto-parietal networks, involved in attention and serial order processing, as being critical for the maintenance and reproduction of verbal sequences. We present here an integrative framework aimed at bridging research in the language processing and short-term memory fields. This framework considers verbal short-term maintenance as an emergent function resulting from synchronized and integrated activation in dorsal and ventral language processing networks as well as fronto-parietal attention and serial order processing networks. To-be-maintained item representations are temporarily activated in the dorsal and ventral language processing networks, novel phoneme and word serial order information is proposed to be maintained via a right fronto-parietal serial order processing network, and activation in these different networks is proposed to be coordinated and maintained via a left fronto-parietal attention processing network. This framework provides new perspectives for our understanding of information maintenance at the non-word-, word- and sentence-level as well as of verbal maintenance deficits in case of brain injury.

**Keywords: language, repetition, short-term memory, working memory, serial order, attention**

Short-term maintenance processes are a core ingredient of language repetition due to the inevitable temporal separation of input and output processes, implying a delay period during which the input has to be temporarily maintained, even if often for a very brief time, such as during repetition of short, single-word stimuli. However, the relationship between language repetition and maintenance processes remains poorly understood, partly due to the parallel and independent evolution of research in the language processing and verbal short-term memory (STM) domains: many cognitive and neural models of language processing remain vague about the nature and neural underpinnings of maintenance processes, and most models of verbal STM, although acknowledging links with the language system, do not consider these links in much detail. In her recent review, Friederici (2012) highlighted the need for language processing architectures to consider and integrate interactions with STM. We here provide a review of studies that have investigated the cognitive and neural networks of maintenance processes and their interaction with language repetition/reproduction processes from various theoretical and methodological perspectives. We will attempt to bridge the gap between language processing and STM architectures by proposing an integrative framework of verbal maintenance and language processing in which maintenance of verbal information is an emergent process, resulting from the temporary activation of both dorsal and ventral language processing pathways and their interaction with attentional control and sequence representation systems.

#### **THE ROLE OF DORSAL AND VENTRAL PATHWAYS IN STM MAINTENANCE**

Recent models of language repetition (Jacquemot and Scott, 2006; Hickok and Poeppel, 2007; Friederici, 2012; Hickok, 2012) assume that the posterior superior temporal gyrus (pSTG) plays a central function during language repetition, by providing, via the dorsal stream of speech processing, a sensorimotor interface linking acoustic codes in the superior temporal gyrus to articulatory codes in the posterior inferior frontal gyrus. This function has been considered to interface input and output phonological representations and to buffer verbal information via the temporary activation of these representations in language repetition tasks (Jacquemot and Scott, 2006; Hickok and Poeppel, 2007). The most compelling evidence supporting a buffer function for the pSTG region comes from patients presenting with lesions in the pSTG area whose language repetition deficit is most parsimoniously explained by difficulties in maintaining verbal information during repetition, such as in different cases of conduction aphasia or of the logopenic variant of primary progressive aphasia (Buchsbaum and D'Esposito, 2008; Gorno-Tempini et al., 2008; Buchsbaum et al., 2011). This is most clearly illustrated by patients with deep dysphasia, a rare but highly compelling form of conduction aphasia. These patients have severe difficulties in repeating single words, with marked lexicality and word imageability effects: repetition of familiar, concrete words is much less impaired than repetition of non-words or low-imageability words (Michel and Andreewsky, 1983; Duhamel and Poncet, 1986; Howard and Franklin, 1988; Martin and Saffran, 1992; Trojano et al., 1992; Croot et al., 1999; Majerus et al., 2001; Tree et al., 2001; Wilshire and Fisher, 2004). A further hallmark characteristic is the production of semantic paraphasias during single word repetition.
The most parsimonious account that has been proposed to explain this symptom constellation is a phonological decay impairment: activated phonological representations decay at an abnormally accelerated rate, with only some residual semantic activation left at the moment of production, explaining the strong influence of lexical and semantic variables on repetition performance (Martin and Saffran, 1992; Martin et al., 1994a). If abnormally increased decay of phonological representations is the defining feature of this syndrome, then the duration of the delay between input and output stages during language repetition should be a critical variable. This is supported by a case study of a deep dysphasic patient who had partially recovered from his language impairment, but who again showed semantic effects during repetition as soon as the delay between language input and output was increased (Martin et al., 1996). This interpretation has also been supported by connectionist implementations of the decay hypothesis within an interactive spreading activation model (Martin et al., 1996; Foygel and Dell, 2000). In sum, these patients provide compelling evidence for an STM-based repetition impairment and, given their lesion overlap in the left posterior temporo-parietal area, can be considered to show impairment to the pSTG hub region of the dorsal language repetition stream.
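The logic of the decay account can be illustrated with a minimal toy simulation. This is only a schematic sketch, not the interactive spreading activation implementations of Martin et al. (1996) or Foygel and Dell (2000); the exponential decay form, all rate parameters, and the response threshold are hypothetical choices made purely for illustration.

```python
import math

def activation(initial, decay_rate, delay):
    """Exponential decay of a representation's activation over the
    input-output delay (hypothetical parameterization)."""
    return initial * math.exp(-decay_rate * delay)

# Hypothetical decay rates: in deep dysphasia, phonological activation
# is assumed to decay much faster than semantic activation.
PHON_DECAY_NORMAL = 0.5
PHON_DECAY_IMPAIRED = 3.0
SEM_DECAY = 0.4
THRESHOLD = 0.5  # arbitrary response threshold

def repetition_outcome(delay, phon_decay):
    """Toy readout: surviving phonological activation drives correct
    repetition; if only semantic activation survives the delay, output is
    semantically driven (a semantic paraphasia); if neither survives,
    repetition fails."""
    phon = activation(1.0, phon_decay, delay)
    sem = activation(1.0, SEM_DECAY, delay)
    if phon > THRESHOLD:
        return "correct repetition"
    if sem > THRESHOLD:
        return "semantic paraphasia"
    return "omission"
```

Under these assumed parameters, lengthening the delay reproduces the qualitative pattern described above: with the accelerated phonological decay rate, very short delays still permit correct repetition, while longer delays leave only residual semantic activation and yield semantically driven errors.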

A further argument often invoked for localizing phonological maintenance processes in the pSTG/inferior parietal area is the documentation of patients with specific phonological STM deficits: these patients typically show relatively spared single word repetition, but a severe reduction of multi-word repetition abilities, in association with lesions in the posterior superior temporal area extending to the supramarginal gyrus and the arcuate fasciculus, i.e., the dorsal repetition pathway (e.g., Warrington et al., 1971; Vallar et al., 1990; Basso et al., 1982; Majerus et al., 2004a; Takayama et al., 2004). This was also supported by early neuroimaging studies of verbal short-term maintenance, locating the verbal short-term storage function to the same posterior temporo-parietal neural substrate (Paulesu et al., 1993; Salmon et al., 1996; see Becker et al., 1999; Chein and Fiez, 2001 for an exhaustive review of these studies). These data suggest a close association between STM deficits in the phonological domain and lesions in the posterior part of the dorsal repetition pathway.

A different type of patient has been described with difficulties in maintaining semantic information during language repetition and comprehension tasks (e.g., Martin and Romani, 1994; Martin et al., 1994b, 1999; Freedman and Martin, 2001; Martin and He, 2004; Hoffman et al., 2009; Barde et al., 2010). The lesion involved here is located in the more anterior part of the left inferior prefrontal cortex and/or the middle and inferior temporal cortex, which is part of the ventral stream of language processing (Hickok and Poeppel, 2007; Friederici, 2012), suggesting that the ventral pathway is also related to maintenance aspects during language reproduction, more specifically for semantic information [R. Martin et al., 1994b, 1999; however, see Barde et al. (2010) for further involvement of the left angular gyrus area in some patients]. In particular, these patients are generally poor at maintaining semantic information during sentence repetition/comprehension and show diminished lexicality and semantic effects during language repetition, as well as an increased rate of intrusion errors (Martin et al., 1994b, 1999). Although initially attributed to a semantic buffer deficit, their semantic maintenance difficulties have subsequently been linked to difficulties in inhibiting previously activated items, and have recently been related to a more general semantic control deficit<sup>1</sup> (Hamilton and Martin, 2005, 2007; Jefferies et al., 2007; Hoffman et al., 2009). A second type of intervention of the ventral language pathway in maintaining semantic information in STM is illustrated by patients showing loss of semantic information, as is the case in patients with semantic dementia (Hodges et al., 1992).
These patients present a progressive loss of semantic representations, with lesions typically involving the ventral speech stream: gray matter loss starts in the inferior, anterior and medial regions of the temporal lobe and also involves the anterior inferior prefrontal and orbito-frontal cortex (Mummery et al., 2000; Good et al., 2002; Desgranges et al., 2007). During language repetition, in both single and multiple word/nonword repetition tasks, patients with semantic dementia present a marked reduction of lexicality effects, with word spans being severely impaired but non-word spans often remaining in the normal range (Patterson et al., 1994; Knott et al., 1997; Majerus et al., 2007a). The data from patients with semantic dementia show that temporary activation of long-term memory lexico-semantic representations is a further critical determinant of language repetition and maintenance. In sum, the data from patients with selective semantic STM or semantic knowledge impairment suggest that the ventral repetition pathway is involved in language maintenance processes by providing the necessary substrate for activating and representing the semantic information to be maintained, and by supporting semantic control processes which protect semantic memoranda against semantic intrusions.

A straightforward conclusion of these results, and one which is a more or less implicit assumption of recent language processing models and language-based STM models (Martin and Saffran, 1992; Hickok and Poeppel, 2007; Acheson and MacDonald, 2009; Friederici, 2012; Hickok, 2012), is that temporary maintenance of phonological information during language repetition depends upon the dorsal pathway, and that temporary maintenance of semantic information during language repetition depends on the ventral pathway. In other words, the language processing networks could be considered sufficient for supporting short-term maintenance in language repetition tasks, via temporary

<sup>1</sup>Note that for the patient described by Hamilton and Martin (2007), interference control deficits were not limited to semantic information, but also involved phonological information. See page 22 for a discussion of the role of the inferior prefrontal cortex in interference resolution at the semantic versus phonological level.

activation of phonological, sensori-motor interface and semantic representations, from the encoding stage until the response is produced. On this view, verbal information is maintained via continuous activation, throughout the maintenance phase, of the underlying language representations initially activated during encoding (Martin and Saffran, 1992). This conclusion, although parsimonious, does not take into account the results of studies that have more directly explored the neural substrates of verbal short-term maintenance. In the STM research field, load effects are considered a core characteristic of maintenance processes: the higher the number of stimuli to be maintained, the higher the maintenance load, and the greater the demand on maintenance processes. This implies that regions involved in the temporary maintenance of verbal information should be sensitive to maintenance load (Postle, 2006). Neural substrates supporting load effects have indeed been identified, but they typically involve areas outside the dorsal and ventral repetition pathways: the superior parietal and intraparietal cortex as well as the dorsolateral prefrontal cortex have been shown to be sensitive to STM load (Awh et al., 1999; Rypma and D'Esposito, 1999; Leung et al., 2002, 2004; Rypma et al., 2002; Ranganath et al., 2004; Ravizza et al., 2004; Narayanan et al., 2005; Majerus et al., 2012). Similarly, Martin et al. 
(2003) investigated phonological and semantic maintenance processes by exploring load effects in phonological (rhyme judgment) and semantic (category judgment) STM tasks. For both tasks, load effects were observed outside the ventral and dorsal language processing pathways, involving the superior parietal cortex, the intraparietal sulcus and the dorsolateral prefrontal cortex in the left hemisphere. Activations in the language network were also observed, with specific recruitment of regions of the dorsal language pathway (left supramarginal cortex) for the rhyme judgment task, but these activations did not respond in a load-dependent manner. These results suggest that while the dorsal and ventral language pathways are specialized in representing phonological and semantic information, respectively, they do not reflect maintenance processes per se, if maintenance is defined by load effects.
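The operational definition of a load effect used above can be made concrete with a small sketch: a region counts as load-sensitive if its mean activation increases with the number of memoranda. All signal values below are invented for illustration; a real analysis would model fMRI time series with a parametric load regressor and proper statistics.

```python
# Illustrative sketch of the load-effect logic: is a region's mean
# activation an increasing function of memory load? All numbers are
# hypothetical "% signal change" values, not real data.

def load_slope(activation_by_load):
    """Least-squares slope of mean activation against memory load."""
    loads = sorted(activation_by_load)
    means = [sum(activation_by_load[l]) / len(activation_by_load[l]) for l in loads]
    n = len(loads)
    mx = sum(loads) / n
    my = sum(means) / n
    num = sum((x - mx) * (y - my) for x, y in zip(loads, means))
    den = sum((x - mx) ** 2 for x in loads)
    return num / den

# Hypothetical responses for loads of 2, 4 and 6 memoranda:
intraparietal = {2: [0.2, 0.25], 4: [0.45, 0.5], 6: [0.7, 0.75]}      # scales with load
superior_temporal = {2: [0.5, 0.55], 4: [0.5, 0.52], 6: [0.51, 0.5]}  # active but flat

print(load_slope(intraparietal))      # clearly positive: load-sensitive
print(load_slope(superior_temporal))  # near zero: activated, but not load-dependent
```

The second, flat profile captures the key dissociation reported by Martin et al. (2003): a region can be reliably activated by a task while still being insensitive to maintenance load.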

These data should, however, not be taken as a conclusive argument against a role for temporary activation of language representations during STM maintenance. Studies will need to assess load effects in the dorsal and ventral pathways using more sensitive neuroimaging methods such as multivariate voxel pattern analyses; these techniques have recently made it possible to demonstrate load effects in occipital visual processing areas during visual maintenance tasks where univariate analyses had failed to reveal any such effects (Emrich et al., 2013). Maintenance of six vs. two verbal items may not necessarily be associated with differential activation levels in language processing areas, as measured with traditional univariate analyses, but could rely on more subtle differences in activation patterns. Enhanced activation levels in fronto-parietal cortex may, on the other hand, reflect the increasing attentional demands of distinguishing increasingly overlapping activation patterns in the language processing areas when maintaining six vs. two items. Also, most patients with an apparently selective phonological STM impairment are likely to have difficulties at the level of processing and maintaining language representations. These

patients typically present lesions in the dorsal language network (posterior superior temporal and temporo-parietal cortex) rather than in the load-dependent fronto-parietal networks (Warrington et al., 1971; Basso et al., 1982; Vallar et al., 1990; Majerus et al., 2004a; Takayama et al., 2004). The vast majority of these patients have a history of aphasia, with residual phonological processing deficits in most cases (see Majerus, 2009). In a meta-analysis, Majerus (2009) showed a strong positive correlation between the severity of the phonological STM impairment and the residual language processing impairment, suggesting that residual difficulties in representing and processing phonological information may at least partially explain the phonological STM deficits in these patients (see also Buchsbaum and D'Esposito, 2008 and Postle, 2006).

Overall, the results from the language processing and STM research fields appear to be conflicting. On the one hand, the neuropsychological data reviewed here suggest that impairment of the dorsal and ventral language pathways is clearly associated with difficulties in tasks involving the maintenance of phonological and semantic information, respectively. On the other hand, studies from the STM research field highlight bilateral fronto-parietal networks as being related to core STM processes such as load effects. In order to understand this apparent paradox, we need to examine more deeply the nature of the representations and processes involved in the maintenance of verbal information. To this end, a three-component framework is proposed here. In this architecture, a first component is related to temporary activation of language representations in the dorsal and ventral language processing networks: to-be-repeated language stimuli have to be represented at the item level; that is, their phonological and lexico-semantic features need to be encoded, represented and maintained, and this will be achieved via continuous activation in the language pathways. Second, the serial order of the stimuli needs to be represented: this is particularly the case when the sequence of words or phonemes within the string of memoranda is unfamiliar. Third, although language repetition is a simple, straightforward task that will not require demanding executive control processes, at least in healthy, language-unimpaired individuals, attentional focalization on the target stimulus/stimuli will be required at a minimum, and these requirements will increase with an increasing number and decreasing familiarity of the verbal stimuli to be repeated. The latter two components are proposed to be supported by the load-dependent, fronto-parietal networks typically associated with STM tasks. 
In the following sections, we will discuss empirical support for this three-component architecture of temporary maintenance for verbal stimuli.

#### **LANGUAGE REPETITION PATHWAYS AND MAINTENANCE OF ITEM INFORMATION**

Dorsal and ventral language processing networks are proposed here to have one specific function during verbal maintenance: they provide the representational basis for encoding the phonological and semantic features of the items to be maintained and repeated. In other words, via its temporary activation, the language network ensures the encoding and representation of phonological and semantic item information during temporary maintenance of verbal information. Critically, this excludes the representation of novel serial order information, such as the arbitrary ordering of the words within a list to be repeated, or of the digits of a phone number. This distinction between the representation of item information and of serial order information in short-term maintenance tasks defines most recent verbal STM models and is supported by empirical evidence presented in this and the next section. These STM models consider that during maintenance of verbal information, verbal item information is directly represented within the language system, rather than by a copy in a dedicated STM buffer, while novel serial order is represented via a specific serial order processing system to which the language system is connected (Burgess and Hitch, 1999, 2006; Martin et al., 1999; Brown et al., 2000; Gupta, 2003). Verbal item information is considered to be maintained via sustained activation of the phonological and semantic representations along the dorsal and ventral repetition pathways, the same representations that served to process the target item during perception and encoding.

The assumption that language processing networks mainly serve to represent phonological and semantic item information is supported by a number of behavioral and neuroimaging studies. At the behavioral level, it is well established that linguistic variables, such as word frequency, word imageability and semantic valence and richness, determine the amount of item information that is correctly recalled in a word list immediate serial recall task (i.e., the number of items recalled independently of their serial position), but not the recall of serial order information (i.e., the number of items recalled in their correct serial position) (Hulme et al., 1991; Poirier and Saint-Aubin, 1996; Nairne and Kelley, 2004; Majerus and D'Argembeau, 2011). This shows that access to linguistic levels of representation affects the maintenance of verbal item information, but not of within-list serial position information. Second, neuroimaging studies have shown that, when maintaining items and their phonological characteristics, phonological processing areas in the pSTG and adjacent inferior parietal cortex are activated at least during the initial stages of maintenance (Collette et al., 2001; Martin et al., 2003; Majerus et al., 2006a, 2010; Pa et al., 2008; McGettigan et al., 2011). Furthermore, in a recent MEG study, Herman et al. (2013) showed that the processing of long non-word sequences, involving a delay between stimulus input and repetition, was associated with increased reverberating activity between posterior (temporo-parietal) and anterior (inferior frontal) sites of the dorsal pathway, suggesting sustained and synchronized activation of input and output phonological representations during maintenance of verbal stimuli. Similarly, when maintaining semantic item information, semantic processing areas in the inferior temporal lobe have been shown to present sustained activation over the maintenance interval (Fiebach et al., 2006, 2007). 
These data suggest that ventral and dorsal language pathways are involved in maintenance during language reproduction tasks, by providing the representational substrates necessary for encoding and representing the items, i.e., the phonological and semantic characteristics of the information to be maintained and repeated.
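The item/order distinction drawn above rests on two scoring conventions applied to the same recall protocol. The following sketch, using an invented target list and response, makes the two scores concrete; linguistic variables such as word frequency and imageability affect the first score but not the second:

```python
def item_score(target, response):
    """Items recalled regardless of serial position (item information)."""
    remaining = list(target)
    score = 0
    for word in response:
        if word in remaining:          # credit each list item at most once
            score += 1
            remaining.remove(word)
    return score

def order_score(target, response):
    """Items recalled in their correct serial position (order information)."""
    return sum(1 for t, r in zip(target, response) if t == r)

# Hypothetical trial: two items transposed, one omitted.
target = ["cat", "door", "rain", "lamp", "fish"]
response = ["cat", "rain", "door", "lamp"]

print(item_score(target, response))   # 4: all recalled words are list items
print(order_score(target, response))  # 2: only "cat" and "lamp" keep their position
```

The dissociation reviewed in the text amounts to the claim that lesions and linguistic manipulations can selectively move one of these two scores while leaving the other intact.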

This interpretation is further supported by patients with semantic processing deficits. These patients typically present difficulties in repeating semantic item information, but not serial order information, with serial order recall being perfectly preserved (Majerus et al., 2007a; Papagno et al., 2013). These patients produce a very specific item error pattern in repetition tasks, the so-called blending errors, whereby the phonological forms of different words are recombined into nonsense phonological forms. Although these errors could be considered to reflect syllable or phoneme ordering errors, they are in fact a direct consequence of the loss of semantic information: lexico-semantic knowledge normally binds the phonological segments defining a word form to its semantic referent, allowing for robust phonological item representations; if this knowledge is degraded, the lexical phonological representation of a given word degrades, the word is processed like a non-word, and this leads to the phonological recombination errors typically observed when healthy participants repeat sequences of non-words (Treiman and Danis, 1988; Patterson et al., 1994; Jefferies et al., 2005; Acheson and MacDonald, 2009). In support of this interpretation, Jefferies et al. (2006) have shown that healthy adults make the same type of phonological recombination errors in word list immediate serial recall tasks when word stimuli are no longer recognized as lexical items, for example when presented together with non-words in mixed and unpredictable word-non-word list repetition designs. More generally, these data also show that in the absence of long-term language knowledge, phoneme order information is difficult to maintain, and serial ordering errors appear during repetition.

Finally, syntactic information also supports the maintenance and recall of information, by binding item information and item order via long-term syntactic structures. This is again the case for familiar information, such as coherent sentences with canonical sentence structure. A number of studies have shown that verbal STM span can be significantly increased by presenting word lists organized as sentences; in this case word span increases to about 16 words (Brener, 1940; Baddeley et al., 1984, 2009). Although this finding has been attributed to increased opportunities for chunking, a straightforward interpretation is the intervention of syntactic and conceptual long-term memory structures which determine the syntactic and conceptual relations between the items, and therefore also their position in a sentence structure (Garrett, 1980). For example, the article "the" will always precede its corresponding noun, and in canonical sentence structures the subject will precede the verb while the object will follow it. At the conceptual level, the agent will generally precede the action and the beneficiary. This knowledge, embedded in the ventral pathway for the conceptual aspects and in the dorsal pathway for the syntactic aspects, supports both item and order recall in a sentence context (Friederici, 2012). However, if incoherent sentences are presented, with words in scrambled order, sentence span decreases and serial order errors in particular appear (Hoffman et al., 2012). In other words, the representations of the language system are able to support familiar item and order information, but not unfamiliar order information, as has already been shown for non-word repetition.

In sum, these data support the view that language representations in the dorsal and ventral speech streams provide the representational basis for the temporary maintenance of *item* information. Language processing models, such as those developed by Hickok and Poeppel (2007), Jacquemot and Scott (2006) and Friederici (2012), and the recent STM models mentioned in this section show strong theoretical convergence here, both considering that temporary activation of long-term representations in the language network is a critical step of verbal maintenance. However, temporary activation of representations in the dorsal and ventral language pathways is not the only process that intervenes during short-term maintenance of verbal information, and it is at this point that language processing and STM models start to diverge.

#### **THE ROLE OF FRONTO-PARIETAL NETWORKS IN VERBAL MAINTENANCE: SERIAL ORDER PROCESSING**

A hallmark characteristic of many recent verbal STM models is the inclusion of mechanisms that allow for the temporary maintenance and reproduction of arbitrary sequence information, that is, the ability to recall verbal items/phonemes as a function of their serial position during list/stimulus presentation (Henson, 1998; Burgess and Hitch, 1999, 2006; Brown et al., 2000; Gupta, 2003; Botvinick and Watanabe, 2007). Language processing models rarely consider this ability, but assume that serial order information is an inherent part of linguistic structure and is supported by linguistic structure during language reproduction (Acheson and MacDonald, 2009; see also Postle, 2006). This assumption is valid when linguistic structure knowledge is available. The ordering of phonemes for familiar word forms will be determined by phoneme and syllable co-occurrence and transition probabilities encoded in phonological representations; the same is true for non-words, where sublexical phonotactic knowledge and syllable structure knowledge determine output in non-word recall tasks (Treiman and Danis, 1988; Vitevitch and Luce, 1999; Dell et al., 2000; Majerus et al., 2004b; Acheson and MacDonald, 2009; Gupta and Tisdale, 2009). Likewise, during sentence repetition, syntactic and conceptual knowledge constrain the order of the words during output (Dell, 1986). However, when this knowledge is not available, serial order errors will occur during repetition. This is illustrated by non-word recall, where sublexical phonotactic knowledge is not sufficient to accurately encode and reproduce the serial ordering of the phonemes, especially if the underlying phonological pattern of the non-word is highly unfamiliar; in that case, errors during non-word repetition will start to appear, and these errors will be mainly phoneme order errors (Treiman and Danis, 1988; Gupta et al., 2005; Jefferies et al., 2006). 
This is also the case when the order of an arbitrary list of words needs to be maintained and repeated, such as when repeating a phone number, a list of unrelated words, a novel sequence of task instructions or a novel sequence of orally given directions.
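The positional-coding models cited above (e.g., Burgess and Hitch, 1999; Henson, 1998) share a common mechanism that can be illustrated with a toy sketch: each item is bound at encoding to a graded position marker, and recall cues each position in turn, retrieving the best-matching item. The code below is a deliberately simplified, hypothetical illustration in that spirit, not any published model's actual implementation; the marker shape and its parameter are invented.

```python
# Toy positional-coding sketch: items are bound to overlapping position
# markers at encoding; recall cues each position and retrieves the item
# whose stored marker best matches the cue (with response suppression).

def marker(pos, length, width=1.0):
    """Graded position code: similarity falls off with positional distance."""
    return [1.0 / (1.0 + abs(pos - p) / width) for p in range(length)]

def encode(items):
    """Bind each item to the marker for its input position."""
    n = len(items)
    return [(items[p], marker(p, n)) for p in range(n)]

def recall(memory, n):
    output = []
    available = list(memory)
    for pos in range(n):
        cue = marker(pos, n)
        # retrieve the available item whose stored marker best matches the cue
        best = max(available, key=lambda m: sum(c * v for c, v in zip(cue, m[1])))
        output.append(best[0])
        available.remove(best)  # response suppression after output
    return output

items = ["ba", "da", "pa", "ga"]
print(recall(encode(items), len(items)))  # noiseless markers: perfect serial recall
```

Because adjacent markers overlap, adding noise to the stored markers would produce mainly adjacent transpositions (order errors) while preserving item identity, which is precisely the error pattern these models were designed to capture.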

While many detailed models of the processes supporting the maintenance and reproduction of novel sequence information have been developed in the STM domain (e.g., Henson, 1998; Burgess and Hitch, 1999, 2006; Brown et al., 2000; Gupta, 2003; Botvinick and Plaut, 2006), the neural pathways associated with these processes have only recently been uncovered. Studies exploring the neural substrates associated with serial order maintenance and reproduction have observed a critical role of the inferior parietal cortex, and more specifically the intraparietal sulcus. Marshuetz et al. (2000) observed higher activation in the bilateral intraparietal sulci when maintaining the serial order of arbitrary letter sequences as opposed to maintaining letter identity (see also Marshuetz et al., 2006). When comparing serial order and item STM conditions with stricter control of task difficulty, Majerus et al. (2006a, 2010) observed that the maintenance and retrieval of serial order information for word lists as well as non-word lists is specifically associated with activation in the right intraparietal sulcus, together with activation in the bilateral superior frontal cortex and the right superior cerebellum; the superior frontal contribution to serial order processing has also been observed by Henson et al. (2000) and has been associated with serial regrouping. On the other hand, activation is stronger in the dorsal and ventral language networks when maintaining item identity information, such as in conditions where participants have to focus on and later recognize the phonological, orthographic or semantic characteristics of the memoranda (Majerus et al., 2006a). 
Furthermore, the fronto-parietal network supporting encoding and maintenance of serial order information appears to be domain general, the same network having been shown to be also involved in the short-term maintenance of serial order information for visual sequences such as sequences of unfamiliar faces (Majerus et al., 2007b, 2010).

The separation between language-based item maintenance processes and serial order maintenance processes is also confirmed by patients presenting verbal STM deficits. Case studies documenting double dissociations between item-based and order-based maintenance deficits have been reported. Attout et al. (2012) described two patients, MB and CG, with poor performance in verbal repetition and reproduction tasks and poor digit spans. An exhaustive exploration of MB's performance profile on STM tasks maximizing either the retention of verbal item information or of serial order information showed that he had difficulties mainly in recognizing and reproducing item information: word and non-word list repetition was characterized by a significantly increased rate of omission errors and phonological paraphasias, but his serial order recall was perfect, all correctly recalled words being reproduced in their correct serial position. His item-based STM impairment was furthermore associated with a mild residual phonological processing impairment, in the context of a left posterior peri-sylvian cerebro-vascular accident. On the other hand, CG, a patient with traumatic brain injury<sup>2</sup>, showed the reverse profile: he showed an abnormally high rate of serial ordering errors in verbal repetition tasks, while showing perfect item reproduction abilities; he recalled as many items as controls, but had substantial difficulties in outputting the items in their correct serial position. The existence of a double dissociation between item and order verbal maintenance deficits is also an important argument against unitary models of verbal STM, such as the model by Botvinick and Plaut (2006), which considers that item and order information are bound in a single representation during

<sup>2</sup>CG's specific brain lesion could not be clearly determined; a CT scan showed damage to the left anterior lobe, but other white matter lesions could not be excluded, since traumatic brain injury leads to diffuse axonal injuries which are often not visible via standard CT or MRI scanning protocols (Arfanakis et al., 2002).

maintenance and reproduction of verbal sequential information. These dissociations also contradict language-based serial order coding accounts, in which the maintenance of serial order is supposed to be achieved mainly via repeated cycling of the input sequence through the language production system (Page and Norris, 1998; Postle, 2006; Page et al., 2007; Acheson and MacDonald, 2009). As already noted, phonological, semantic and syntactic linguistic structures will support serial recall if the sequence information can be mapped onto existing long-term memory sequence structures (such as syllable frames, phonotactic constraints, lexical word form representations, or scripts); however, this will not be possible when the sequence information is novel, arbitrary and highly unfamiliar (Treiman and Danis, 1988; Gupta et al., 2005; Jefferies et al., 2006). The evidence presented here thus favors separate cognitive and neural substrates supporting item vs. order representation in language maintenance and reproduction tasks.

Importantly, the distinction between item and order maintenance capacities has further functional implications for language processing. The serial order maintenance capacities supporting novel sequence reproduction may be critical for language repetition and learning. A number of behavioral studies have shown that the repetition and learning of novel phonological sequences is most strongly associated with serial order maintenance capacities, as opposed to item maintenance capacities: children and adults with high serial order maintenance capacities, as measured by serial order reproduction and reconstruction tasks, have larger vocabulary knowledge bases and learn novel vocabulary faster; item STM tasks, involving item recall independently of serial position information, are more weakly associated with performance in novel word repetition and learning tasks (Majerus et al., 2006b,c, 2008a; Mosse and Jarrold, 2008; Leclercq and Majerus, 2010). A theoretical interpretation of these findings is that the ability to temporarily maintain sequence information via a dedicated short-term storage system for order information allows the unfamiliar phoneme sequence defining a novel word to be maintained and replayed in correct order during the repetition and learning process, thereby increasing the strength of the new lexical phonological representation being created in the language knowledge base. This entails that the language pathways (where item representations—phonemes/syllables/complete word forms—are stored, temporarily activated and learnt) and the order maintenance system are interconnected and in close interaction (Gupta and MacWhinney, 1997; Burgess and Hitch, 1999, 2006; Gupta, 2003).

In the light of these data, we should expect single novel word repetition and learning to also be associated, at the neural level, with the fronto-parietal serial order processing network. In support of this hypothesis, Majerus et al. (2008b) observed a correlation between novel word learning capacities in healthy adults and the recruitment of the frontal part of the fronto-parietal serial order processing network. In the same vein, an MEG study exploring the time course of brain activity associated with the repetition of non-word syllable sequences observed, in parallel to reverberating activity in the dorsal language pathway, an involvement of the right intraparietal sulcus; the non-word syllable repetition task used in that study had strong serial order processing requirements, since the different non-word sequences were each sampled from the same set of three syllables (ba, da, or pa), with syllable serial order being the distinguishing feature between the different non-word sequences (Herman et al., 2013); furthermore, the right IPS involvement was not merely coincidental, as it was associated with behavioral success on the syllable repetition task. On the other hand, other studies investigating the neural substrates of novel word repetition or maintenance have observed activation restricted mainly to the dorsal language pathway (e.g., Strand et al., 2008; Papoutsi et al., 2009; McGettigan et al., 2011). These studies, however, most often control for task-general factors by including baseline conditions which factor out neural activation related to serial order processing, such as tone sequence processing conditions (Strand et al., 2008; McGettigan et al., 2011). As we have noted, the fronto-parietal network supporting serial order processing in verbal tasks also supports ordinal processing in other modalities (Majerus et al., 2007b, 2010; Dormal et al., 2012). 
By using these control conditions, the intervention of fronto-parietal serial processing mechanisms may have been masked. Similarly, other studies contrasted non-word conditions that varied along a number of linguistic dimensions (such as articulatory constraints) and in which the reference non-word condition already included serial order processing, which may again have masked the potential intervention of fronto-parietal serial order processes (Papoutsi et al., 2009). In sum, the intervention of fronto-parietal serial order processing mechanisms has been established for the maintenance and reproduction of order information in word, non-word and letter sequences; the conditions under which these processes also intervene in single non-word processing remain to be investigated.

#### **THE ROLE OF FRONTOPARIETAL NETWORKS IN VERBAL MAINTENANCE: ATTENTIONAL FOCALIZATION**

A second hallmark feature of recent verbal STM models, and one considered even less by language architectures than serial order processing, is attentional processing. Many recent models of STM consider that the maintenance of verbal information does not only require temporary activation of language representations; the maintenance of this activation over time until task completion is further under the control of attentional focalization processes (Cowan, 1995; Oberauer, 2002; Barrouillet et al., 2004; Engle and Kane, 2004). Although the role of attention was acknowledged by early STM models, such as the working memory model by Baddeley and Hitch (1974), recent data show that attentional focalization intervenes not only in complex storage-and-processing tasks, but also in simple verbal tasks requiring only the maintenance and output of a set of stimuli, as is the case in language repetition tasks (Cowan et al., 2005; Majerus et al., 2009; Öztekin et al., 2010). In other words, temporarily activated representations are considered to remain activated as long as required by being placed in the focus of attention and by being re-activated each time they are the target of the focus of attention (Cowan, 1988, 1995). Direct neuroimaging evidence for this mechanism has been observed in the area of face processing, where Gratton et al. (2013) recently showed that items held in the focus of attention are characterized by an enhanced neural response in temporo-occipital face processing areas relative to items outside the focus of attention. This control of activation maintenance via attentional processes further ensures that activated input and output representations match, possibly via additional efference copies sent to the inferior parietal cortex, so that input information is correctly reproduced at output (Rauschecker and Scott, 2009). 
Attentional capacity is currently considered by many authors to be the core limiting factor of performance in verbal maintenance tasks, and the defining factor of maintenance capacity (Cowan, 1995; Oberauer, 2002; Barrouillet et al., 2004; Engle and Kane, 2004).

At the neuroimaging level, part of the fronto-parietal network that typically defines the neural substrates of verbal maintenance tasks has been associated with this attentional focalization function, more precisely at the level of the left intraparietal sulcus and the dorsolateral prefrontal cortex (Salmon et al., 1996; Nystrom et al., 2000; Ravizza et al., 2004; Cowan et al., 2011; Majerus et al., 2012). Although bilateral fronto-parietal activity is typically observed in verbal maintenance tasks, only the left intraparietal sulcus and dorsolateral prefrontal cortex appear to be activated irrespective of the type of information to be maintained, and are considered to have a domain-general attentional control and focalization function in STM tasks (Majerus et al., 2010; Cowan et al., 2011). The right intraparietal sulcus appears to have a more specific function and is activated more strongly when maintaining serial order information, as we have seen in the previous section. This fronto-parietal network is also considered to support attentional focalization processes rather than a verbal buffer function, since it is sensitive to load effects not only in the verbal domain, but also when temporarily maintaining other types of information such as faces, geometric stimuli, tactile stimuli or even social stimuli (e.g., Nystrom et al., 2000; Rämä et al., 2001; Hautzel et al., 2002; Todd and Marois, 2004; Brahmbhatt et al., 2008; Lycke et al., 2008; Meyer et al., 2012; Kaas et al., 2013). This has led to the currently dominant view in the STM research field that an important function of the fronto-parietal network is the control of task-related attention during the maintenance of verbal information, allowing attention to be directed and maintained on the target stimuli (Todd and Marois, 2004; Postle, 2006; Nee and Jonides, 2011, 2013). 
More specifically, this STM load-dependent fronto-parietal network has been shown to involve a well-known network from the attention research field, the dorsal attention network, which allows attention to be oriented toward target stimuli as a function of ongoing task requirements, in both verbal and visual domains (Todd and Marois, 2004; Majerus et al., 2012). This network has been shown to increase its activity with increasing STM load, while competing during verbal maintenance tasks with a second attentional network, the ventral attention network, which is involved in detecting novel, task-irrelevant stimuli. The ventral network, involving the temporo-parietal junction and the orbitofrontal cortex, is deactivated as a function of the number of verbal stimuli to be maintained, and this deactivation is associated with inattentional blindness for distractor stimuli presented while the verbal stimuli are being maintained (Todd et al., 2005; Fougnie and Marois, 2007; Majerus et al., 2012). These data demonstrate the central role of task-related attentional processes in defining left-hemisphere fronto-parietal activity during the maintenance of verbal stimuli.

In the light of these data, any model of language repetition and maintenance processes should consider interactions with these domain-general attention networks, since attentional control has been shown to be one of the main functions of the left-centered fronto-parietal network recruited during temporary maintenance of verbal information. Furthermore, repetition of multiple word or non-word sequences will particularly require attentional control processes in order to ensure that input and output match (Rauschecker and Scott, 2009). While this network is consistently observed to be involved in tasks requiring the maintenance and reproduction of multiple word or non-word stimuli (Ravizza et al., 2004; Majerus et al., 2006a, 2010; Öztekin et al., 2010; Cowan et al., 2011), this is less consistently the case for single word and non-word repetition. On the one hand, single word repetition will probably require attentional focalization processes only to a minimal extent, since a single target has to be processed and the maintenance delay is very short due to the quasi-immediate succession of input and output processes. This is also in line with neuroimaging studies of verbal maintenance showing no or minimal recruitment of left fronto-parietal networks in low-load conditions (e.g., when only one or two letters have to be maintained; Majerus et al., 2012). However, this may be different for single non-word repetition, especially if the phonological structure of the non-word is highly unfamiliar and multisyllabic, and therefore difficult to map onto existing sublexical phonological representations along the dorsal language pathway. In that case, fronto-parietal maintenance mechanisms are likely to be challenged to a greater extent. Studies that have investigated the neural substrates of single non-word repetition do not systematically observe activation of the left-centered fronto-parietal network (Pa et al., 2008; Strand et al., 2008; Papoutsi et al., 2009; McGettigan et al., 2011).
This could, however, be related to the control conditions used in these studies, which factor out domain-general processes such as attentional focalization. As already noted in the previous section, most of these studies aim at exploring neural activations specifically associated with linguistic processing and maintenance, and therefore use baseline conditions which remove more general cognitive variables, for example by presenting tone or gesture sequences to be processed and maintained as a reference condition (e.g., Pa et al., 2008; McGettigan et al., 2011). On the other hand, when considering activations shared with processing of the control conditions, activation in intraparietal areas can be observed, as was for example the case in the study by Pa et al. (2008) comparing speech and gesture maintenance. Also, in their recent MEG study exploring the time course of neural activation during language repetition, Herman et al. (2013) observed activation in the left fronto-parietal network during non-word repetition. Interestingly, this network reacted in a load-dependent manner, with higher recruitment for repetition of four-syllable as compared to two-syllable non-words. Furthermore, it is important to note here that the involvement of the left fronto-parietal network occurred at a relatively late time point after encoding, between 500 and 700 ms post-stimulus onset, while activation in the dorsal language network was present from about 40 ms post-stimulus onset. This later involvement of the fronto-parietal network is in line with its top-down attentional control function during language processing: language representations of to-be-repeated stimuli are first activated in the language processing networks, and their activation is then maintained and monitored via top-down task-related attentional control.
Finally, in a recent neuroimaging study exploring functional connectivity patterns during a sentence processing task, Makuuchi and Friederici (2013) found further evidence for the involvement of frontoparietal networks in language processing tasks. Using dynamic causal modeling, they observed functional connectivity between the left-hemisphere fronto-parietal network and core language processing areas during sentence processing, and the strength of this association increased as a function of the linguistic complexity of the verbal material and, by extension, of the amount of attentional focalization/control needed. These data show that the fronto-parietal network is not only co-activated during language processing tasks, but is an integral and integrated part of language processing networks.

#### **TOWARD AN INTEGRATIVE FRAMEWORK FOR MAINTENANCE PROCESSES DURING LANGUAGE REPETITION**

In this review, aiming at elucidating the cognitive processes and neural networks involved in the maintenance of verbal information during language repetition, we have shown that research in the language and STM domains converges on one important factor: the importance of language knowledge, supported by the dorsal and ventral pathways, and its temporary activation during maintenance of verbal information. In addition, research in the STM domain points to two additional processes: those involved in maintaining novel sequence information, and those involved in maintenance control via attentional focalization processes. Although language processing models have given no or very little consideration to the latter two processes, the studies reviewed here show that temporary maintenance of verbal information can depend on all three factors identified here, especially when multiple word stimuli or long non-word stimuli need to be processed. Similarly, cognitive architectures of STM consider interactions between either language processing and serial order processing (e.g., Brown et al., 2000; Gupta, 2003; Burgess and Hitch, 2006), or language processing and attentional processing (e.g., Cowan, 1995; Oberauer, 2002; Barrouillet et al., 2004), but no STM model currently considers all three components of verbal maintenance identified here at the same time.

**FIGURE 1 | Outline of the networks and processes proposed to support maintenance processes during single word and short non-word repetition.** Maintenance during short non-word repetition is mainly supported by the dorsal language pathway, linking the superior and posterior temporal cortex to the posterior inferior frontal cortex, and, at the cognitive level, reflects temporary activation and interfacing of input and output phonological item representations. Maintenance during single word repetition is also supported by the dorsal language pathway, but with the additional intervention of the ventral language pathway, linking the middle and anterior temporal cortex to a more anterior site of the inferior frontal cortex, and reflects temporary activation of semantic item representations. The frontal endpoints of each pathway are further involved in protecting to-be-maintained information against phonological and semantic interference, respectively. The numbers indicate the main Brodmann areas characterizing each functional region identified here.

The integrative framework of verbal maintenance processes during language repetition proposed here considers language, serial order and attention components within a single model. An overview of this functional architecture and its underlying neural networks is presented in **Figure 1**, for single word and non-word repetition, and in **Figure 2**, for word and non-word sequence repetition, including sentence repetition. The basis of this architecture is the dorsal and ventral language pathways, where long-term phonological and semantic representations are activated upon presentation of a word (see **Figure 1**). More precisely, in the dorsal network, sublexical phonological representations in the posterior superior temporal area and the superior temporal sulcus will be activated and temporarily maintained (Binder et al., 2000; Scott et al., 2000). Two different types of representation may be distinguished here: the posterior superior temporal area (planum temporale) has been proposed to support sensori-motor interface representations which, in direct connection with the inferior frontal cortex, allow target representations to be continuously reactivated and refreshed via subvocal articulatory rehearsal processes (the "doing" pathway; Hickok and Poeppel, 2007; Rauschecker and Scott, 2009), while the more anterior superior temporal areas and superior temporal sulcus have been proposed to keep track of the initial perceptual properties of the target information (Buchsbaum et al., 2005). In the ventral network, activations in the anterior, middle and inferior temporal areas will represent the lexical and semantic properties of the target information (Scott et al., 2000; Binder et al., 2000, 2009; Friederici, 2012). Importantly, at this stage only item representations will be activated and maintained, allowing individual words to be maintained and repeated on the basis of their underlying phonological, lexical and semantic representations. However, as we have seen, these representations will not be sufficient to maintain sequence information, i.e., to maintain the (arbitrary) serial order in which the different words have been presented.
Activation in the language pathways therefore needs to be synchronized with an additional system which allows for the coding of arbitrary sequence information (see **Figure 2**): this function is proposed to be supported by a fronto-parietal network centered on the right intraparietal sulcus, which will associate each activated item in the language network with a serial position marker ensuring that each item will be output in correct serial position at recall, as proposed by a number of computational models of serial order STM (Gupta and MacWhinney, 1997; Burgess and Hitch, 1999, 2006; Brown et al., 2000). Finally, attentional control will be needed to maintain the item and serial order representations activated over time and in the focus of attention, as a function of current task requirements. This function is proposed here to be supported by a fronto-parietal network centered around the left intra-parietal sulcus (**Figure 2**), in line with an increasing number of studies associating the fronto-parietal activations during verbal and non-verbal maintenance with the dorsal attention network (Todd and Marois, 2004; Cowan et al., 2011; Majerus et al., 2012). This network will interact with the other two networks in order to ensure synchronized activation and processing, which will lead to successful task performance and accurate reproduction of both item and order information.

This architecture of verbal maintenance is considered to be task-dependent: when repeating single words or short non-words with a familiar sublexical phonological structure in an immediate repetition task, processing is likely to be limited to the ventral and dorsal repetition pathways, respectively, since there will be no novel serial position information to be processed; requirements for extended maintenance via attentional focalization will also be minimal, since the perceptual input will be processed by the language repetition pathways in a quasi-instantaneous manner and the target item activation does not need to be protected against competitor stimuli. Accordingly, patients with a severe single word repetition impairment often have lesions restricted to these pathways, and more precisely to the posterior part of the dorsal pathway (Buchsbaum et al., 2011). Furthermore, recent studies exploring the influence of attentional processes on maintenance have shown that, at output, not all information will be in the focus of attention, and some information can be directly retrieved from activated long-term memory (Nee and Jonides, 2008, 2011, 2013; Lewis-Peacock et al., 2012). Similarly, when repeating multiple word sequences, the serial order processing network may not be extensively recruited if output in correct serial position is not required. Previous studies have shown that healthy subjects recruit the serial order processing network centered around the right intraparietal sulcus as a function of task demands: when processing of sequential aspects is stressed by task instructions, stronger recruitment of the right intraparietal sulcus is observed; but when task instructions focus on the maintenance of phonological and orthographic characteristics of the items, the dorsal and ventral language processing streams are activated more strongly (Majerus et al., 2006a, 2010).
The flexible recruitment of these different networks is supposed to be under the control of the fronto-parietal network centered around the left intraparietal sulcus involved in top-down attentional processing. For tasks with varying item and serial order processing demands, the left intraparietal sulcus has indeed been shown to be activated for both types of information but with differential functional connectivity patterns, connectivity being enhanced between the left intraparietal sulcus and language processing networks when item processing demands are high, and connectivity being enhanced between the left and right intraparietal sulci when serial processing demands are high (Majerus et al., 2006a, 2008b). These data suggest that attentional control by the left fronto-parietal network can be flexibly allocated to language processing and/or serial order processing networks, as a function of task demands.

A number of predictions can be derived from the framework proposed here. First, a strong prediction of this framework is the greater involvement of the serial order processing and attentional processing components during non-word repetition, especially when the non-word sequence is long, complex and cannot be easily mapped onto existing lexical and sublexical phonological structures, i.e., non-words with very low lexical neighborhood and phonotactic probability values. In that case, the sequence of phonemes cannot be represented via existing sublexical phonological structures, and the novel sequence information needs to be maintained via strong connections between the phonological item representations supported by the dorsal repetition pathway, the novel sequence representations supported by the fronto-parietal network centered around the right intraparietal sulcus, and attentional resources supported by the fronto-parietal network centered around the left intraparietal sulcus (see **Figure 2**). As already mentioned, previous studies exploring the neural substrates of non-word repetition typically focused on the linguistic networks and/or used control conditions factoring out any possible contribution of the serial order and attention processing components identified here (e.g., McGettigan et al., 2011). In support of this, studies looking directly at the time course of activation patterns during non-word repetition, without using any baseline condition, observed, in addition to involvement of the dorsal language pathway, activation in left and right inferior parietal areas that was stronger for longer non-word sequences. Moreover, if output is delayed, there will be additional requirements for short-term maintenance, and in that case the intervention of attention networks may be necessary for keeping the corresponding phonological representations active, even for short non-words.
Future studies will need to determine in a systematic manner the conditions in which serial order and attention processing networks intervene during single non-word repetition. In order to address this question, studies will need to use experimental designs that allow for the detection of domain-general attention and serial order processing networks instead of factoring them out.

A second prediction is related to sentence repetition. Repeating long sentences with delayed semantic integration, as is for example the case for sentences involving multiple adjectives or subordinate clauses, should put relatively high demands on temporary maintenance processes, and hence should rely on attentional support processes. Martin et al. (2003) showed that sentences where semantic integration is delayed put higher demands on semantic short-term retention abilities. Likewise, verbatim sentence repetition has been shown to be determined by phonological short-term retention abilities (Martin et al., 1999). Therefore, sentence repetition should involve the language repetition pathways as well as the fronto-parietal attention networks involved in short-term maintenance. With respect to the involvement of serial order representation mechanisms, syntactic structure knowledge will, on the one hand, constrain and determine word order, allowing word order to be represented via activation of existing word co-occurrence and syntactic structures in the language network. On the other hand, when this knowledge is not sufficient, as is the case, for example, for reversible sentence constructions in which both possible interpretations are semantically plausible (e.g., John is being pushed by Eaton vs. Eaton is being pushed by John), the specific coding of word order will be important, potentially requiring recruitment of the serial order representational system supported by the right intraparietal sulcus. In support of this, studies exploring the neural substrates of sentence repetition or generation have shown involvement of both left and right intraparietal sulcus areas (Haller et al., 2005; Tremblay and Small, 2011).
In both of these studies, this involvement was even stronger during sentence production than during sentence listening/reading: sentence production in particular requires detailed attention to both word identity and word order in order to allow for accurate reproduction, while sentence comprehension can be achieved via conceptual-level processes for which the retention of specific word order is less critical, except for the semantically plausible reversible sentence constructions mentioned above. Importantly, Segaert et al. (2013) explored brain activity associated with sentence repetition and observed specific involvement of the right intraparietal sulcus when varying syntactic structure, but not when varying verbs, pointing more directly to a specific role of the right intraparietal sulcus area in supporting the processing of syntactic order information; in the same study, the left intraparietal sulcus was involved in the processing of both syntactic structure and verbs, in line with its more general attention processing role.

The role of inhibitory and interference control processes during maintenance of verbal information also has to be briefly discussed. As noted in the first section of this review, the frontal part of the ventral language pathway, i.e., the ventrolateral prefrontal cortex, has been associated with resistance to semantic interference during maintenance of semantic information (Thompson-Schill et al., 2002; Martin et al., 2003; Hamilton and Martin, 2007). A similar mechanism has been proposed for the dorsal language pathway, with the posterior inferior prefrontal cortex associated with phonological interference control processes (Postle, 2005; Schnur et al., 2009; Barde et al., 2010). These studies raise the question of the networks that link these prefrontal phonological and semantic interference control areas with the fronto-parietal attention control networks, especially in the context of multi-word and sentence processing, where semantic and phonological interference is particularly likely to occur. The results of the dynamic causal modeling study by Makuuchi and Friederici (2013) are informative here, since they show that the left inferior parietal cortex is increasingly connected with the inferior frontal cortex (pars opercularis) as a function of the complexity of the sentences to be processed, indicating that the parietal regions involved in attentional control potentially interact with frontal areas supporting inhibitory/interference control processes during sentence processing. Future studies will need to determine the precise task and linguistic conditions in which these interactions between attentional control and interference control networks occur. Furthermore, these processes may also be important to support serial order recall. Hoffman et al. (2012) as well as Jefferies et al.
(2008) observed that patients with inhibitory/interference control deficits produced large numbers of order errors in sentence recall and word list recall. This is also in line with the neuroimaging studies discussed earlier, showing that the network activated when processing serial order information is not limited to the right intraparietal sulcus but also includes superior frontal and prefrontal areas, including the left inferior prefrontal cortex associated with control of interference/inhibition (Majerus et al., 2006a, 2010). Resolution of interference between items competing for the same serial position is likely to be a further important determinant of serial order maintenance and recall, especially if word order in an STM list conflicts with existing word order knowledge structures, as is for example the case when recalling incoherent sentences with words in unexpected sentence positions (Hoffman et al., 2012). This type of process is also often used to model serial order recall in computational models, via competitive cueing and winner-take-all mechanisms (e.g., Burgess and Hitch, 1999).
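As an illustration of this modeling principle, the competitive-cueing/winner-take-all scheme can be sketched in a few lines of Python. The sketch below is a deliberately minimal toy model, not an implementation of any of the cited architectures: the primacy-gradient decay, noise level and response-suppression scheme are illustrative assumptions. Items receive activations that decrease with input position; at each output step Gaussian noise is added, the most active unrecalled item wins the competition, and the winner is then suppressed.

```python
import random
from collections import Counter

def recall_trial(list_len, decay, noise_sd, rng):
    """One serial-recall trial: items carry a primacy gradient of
    activation (earlier items stronger); at each output step noise is
    added, the most active unrecalled item wins the competition
    (winner-take-all) and is then suppressed."""
    activations = [decay ** pos for pos in range(list_len)]
    recalled, remaining = [], set(range(list_len))
    for _ in range(list_len):
        winner = max(remaining,
                     key=lambda i: activations[i] + rng.gauss(0.0, noise_sd))
        recalled.append(winner)
        remaining.discard(winner)   # response suppression
    return recalled

def transposition_gradient(n_trials=2000, list_len=6,
                           decay=0.8, noise_sd=0.08, seed=42):
    """Count |output position - input position| across many trials."""
    rng = random.Random(seed)
    gradient = Counter()
    for _ in range(n_trials):
        for out_pos, item in enumerate(
                recall_trial(list_len, decay, noise_sd, rng)):
            gradient[abs(out_pos - item)] += 1
    return gradient

g = transposition_gradient()
# correct placements dominate, and transpositions fall off with distance
assert g[0] > g[1] > g[2]
```

With moderate noise, the winner-take-all competition mostly preserves order but occasionally lets neighbouring items swap, yielding a transposition gradient (adjacent items exchanged more often than distant ones) of the kind such models are designed to reproduce.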

A further central question relates to the nature of the serial order processing system proposed here and the representations used to represent sequence information. As already noted, this system is supposed to support representation of novel and arbitrary serial order information, with linguistic sequence knowledge as encoded in sublexical, lexical and semantic representations supporting processing of familiar or partially familiar sequences. For the representation of novel, arbitrary order information a number of mechanisms have been proposed, computational models considering that serial order is represented either via episodic context, time-based representations or positional vectors (Gupta and MacWhinney, 1997; Henson, 1998; Brown et al., 2000; Burgess and Hitch, 2006). All these models are able to reproduce the main characteristics of serial position coding such as serial position effects (primacy and recency effects) and transposition gradients during serial recall (items from adjacent positions tend to be exchanged more frequently than items from distant positions). A few models also consider that serial position information may be coded within item representations themselves, by considering that items are represented with different activation levels as a function of serial position or contain rank order information (Page and Norris, 1998; Farrell and Lewandowsky, 2002; Botvinick and Plaut, 2006; Botvinick and Watanabe, 2007). As we have seen, the dissociations observed between item and order processing, at both neuropsychological and neuroimaging levels, are difficult to reconcile with these latter accounts. However, this still leaves the question of the nature of serial order codes open. 
A possible hypothesis is that the right intraparietal sulcus area involved in serial order coding supports the creation of temporary, domain-general ordinal representations, allowing the encoding of relational information about items within a sequence. This is supported by data showing that this region also responds to ordinal information in other domains, such as number processing and alphabetic order processing (Pinel et al., 2001; Fias et al., 2007; Kaufmann et al., 2009; Dormal et al., 2012). The general principle of ordinal coding, that is, the assumption that serial order representations vary along a dimension that is organized in some ordinal manner (e.g., ordinal ranks, time-based ordinal information, a large-to-small primacy gradient), is also at the heart of many of the computational serial order STM models discussed here (Gupta and MacWhinney, 1997; Henson, 1998; Page and Norris, 1998; Brown et al., 2000; Farrell and Lewandowsky, 2002; Botvinick and Plaut, 2006).
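The ordinal-coding principle can also be sketched independently of item-based primacy gradients. In the illustrative Python toy model below (all parameter values are arbitrary assumptions, not taken from any cited model), each item is bound to a separate rank marker, and markers for nearby ranks overlap; recall cues memory with the marker for the current output position. Because order lives in the rank codes rather than in the item activations themselves, this sketch is closer to positional-marker accounts, yet it produces the same qualitative error pattern.

```python
import random
from collections import Counter

def ordinal_recall(list_len, overlap, noise_sd, rng):
    """Ordinal-marker account: each item is bound to a rank code, and
    codes for nearby ranks overlap (similarity falls off with ordinal
    distance). Cueing with rank p can therefore erroneously retrieve
    the item bound to a neighbouring rank."""
    def similarity(p, q):
        return overlap ** abs(p - q)   # graded overlap between rank codes
    recalled, remaining = [], set(range(list_len))
    for p in range(list_len):
        winner = max(remaining,
                     key=lambda q: similarity(p, q) + rng.gauss(0.0, noise_sd))
        recalled.append(winner)
        remaining.discard(winner)
    return recalled

# simulate many trials and tally transposition distances
rng = random.Random(7)
gradient = Counter()
for _ in range(2000):
    for out_pos, item in enumerate(ordinal_recall(6, 0.5, 0.2, rng)):
        gradient[abs(out_pos - item)] += 1
assert gradient[0] > gradient[1] > gradient[2]
```

That two quite different implementations of an ordinal dimension yield the same adjacent-transposition bias illustrates why behavioral serial position data alone cannot decide between these accounts.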

We also cannot exclude the possibility that the right intraparietal sulcus area identified here reflects an ancillary attentional function during the processing of serial order information, given that the bilateral intraparietal sulci have been shown to be linked to task-related attention (Todd and Marois, 2004; Duncan, 2010; Majerus et al., 2012). In the studies linking the right intraparietal sulcus to the processing and maintenance of serial order information, much care was taken to equate the item and order STM conditions with respect to task difficulty, as reflected by equal levels of task performance. However, this does not necessarily guarantee that attentional demands were exactly the same in the two conditions. It may even be the case that a specific form of attentional processing directly supports the representation of serial order information. Van Dijck et al. (2013) showed that serial position coding in STM and spatial attention actually interact: a rightward spatial attention bias in a dot detection task increased linearly as a function of the serial position of the items being retrieved in a concurrent STM task, the bias being largest when items from the end of the STM list were retrieved and non-existent when items from the start of the list were retrieved. These results give rise to a further hypothesis about serial order coding in STM, namely the involvement of spatial attention and spatial frames (i.e., a left-to-right reference frame) in supporting the coding of serial position; this hypothesis is in line with the greater involvement of the right vs. left intraparietal sulcus, since the right inferior parietal cortex is known to support this type of attentional process (Bricolo et al., 2002).
Finally, as already discussed, executive processes such as control of interference and inhibition supported by ventro-lateral prefrontal cortex are also an important factor associated with serial order maintenance and recall, and in some patients, deficits at this level may explain their serial order deficits (Jefferies et al., 2008; Hoffman et al., 2012). In sum, given the current co-existence of many alternative and not necessarily mutually exclusive hypotheses about the processing of novel, arbitrary serial order information, future studies will be needed to achieve a better understanding of the specific neural and cognitive codes and processes involved in serial order maintenance.

#### **CONCLUSIONS**

The account presented here considers that short-term maintenance of verbal information during repetition is not subtended by specific and dedicated storage buffers, contrary to a number of theoretical models of verbal maintenance (Martin et al., 1994b; Baddeley et al., 1998; Baddeley and Logie, 1999; Vallar and Papagno, 2002). Rather, short-term storage results from the synchronized and flexible recruitment of language, attentional and serial order processing systems. In this sense, short-term maintenance is an emergent function which depends on neural networks shared with other cognitive functions, including language processing networks (Cowan, 1995; Postle, 2006; Buchsbaum and D'Esposito, 2008). This account is similar to proposals by Postle (2006) and Cowan (1995), who also consider STM an emergent function, resulting from temporary activation of long-term memory knowledge bases in the language processing networks, and from attentional selection and control processes via fronto-parietal networks. Like language processing architectures, these proposals do not specifically consider the role of serial order processing and maintenance. On the other hand, serial order processing has been the focus of very detailed computational frameworks of verbal STM, with some additional consideration of interactions with linguistic representational systems, but no consideration of attentional processes. The present work is an attempt at providing a bridge between three core component processes of verbal short-term maintenance, taking the form of an integrative cognitive and neural framework of the language, attention and serial order processes supporting maintenance during language repetition. This framework provides new perspectives for the understanding of language repetition and maintenance deficits, by allowing for a nuanced and integrative assessment of the multiple components that can lead to a breakdown of verbal maintenance, including the non-linguistic, domain-general mechanisms involved in language repetition.

#### **ACKNOWLEDGMENTS**

This work was supported by grants F.R.S.-FNRS No. 1.5.056.10 (Fund for Scientific Research FNRS, Belgium) and PAI-IUAP P7/11 (Belgian Science Policy).



**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 April 2013; paper pending published: 03 June 2013; accepted: 21 June 2013; published online: 12 July 2013.*

*Citation: Majerus S (2013) Language repetition and short-term memory: an integrative framework. Front. Hum. Neurosci. 7:357. doi: 10.3389/fnhum.2013.00357*

*Copyright © 2013 Majerus. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

### The anatomo-functional connectivity of word repetition: insights provided by awake brain tumor surgery

#### *Sylvie Moritz-Gasser 1,2 and Hugues Duffau1,2\**

*<sup>1</sup> Department of Neurosurgery, Hôpital Gui de Chauliac, Centre Hospitalier Universitaire Montpellier, Montpellier, France*

*<sup>2</sup> Team "Plasticity of Central Nervous System, Stem Cells and Glial Tumors," INSERM U1051, Institute for Neuroscience of Montpellier, Hôpital Saint Eloi, Montpellier, France*

*\*Correspondence: h-duffau@chu-montpellier.fr*

#### *Edited by:*

*Matthew A. Lambon Ralph, University of Manchester, UK*

#### *Reviewed by:*

*Cornelius Weiller, Universität Freiburg, Germany*

*Lauren L. Cloutman, University of Manchester, UK*

Word repetition (WR) is the language process of immediately saying a word after hearing it. This skill plays a crucial role in language development by enabling the learning of new words. Clinically, repetition performance contributes to the classification of aphasic syndromes, and can inform prognosis (Hosomi et al., 2009) and guide rehabilitation (Schlaug et al., 2009). In addition, uncontrolled pathological repetition, termed echolalia, is observed in neurological disorders such as autism.

Despite improving knowledge of the cortical networks underlying WR, its subcortical connectivity has received less attention. Here, we investigate the white matter pathways involved in WR.

#### **NEUROPSYCHOLOGICAL CONSIDERATIONS**

WR involves several levels of linguistic processing. The conversion of phonological information into articulatory-based representations requires multiple steps: (1) phonological decoding and temporary storage within the verbal working memory system, using subvocal rehearsal (Baddeley, 1996); (2) semantic recognition of the item as a word versus a pseudo-word; (3) phonological encoding; and (4) planning for execution by the motor articulatory programs. These parallel and distributed levels of processing are supervised by initiation and inhibitory cognitive mechanisms.

The debate about the respective contributions of phonological and lexical-semantic representations to auditory WR remains open. WR impairment may be explained by a "dual-route" model in which the non-lexical and lexical-semantic pathways are distinct, with some degree of interaction (Hanley et al., 2002). An alternative explanation is the "single-route" model based on a "globality assumption," in which phonological and lexical-semantic representations interact in all aspects of repetition (Dell et al., 1997). In clinical practice, phonological errors during immediate repetition are regularly observed in patients harboring lesions involving the semantic system, while, to a lesser extent, semantic errors may be observed during immediate repetition in patients with damage to the phonological system. Even though these observations support the view that semantic and phonological representations interact within a large-scale network, the neural foundations of WR remain poorly understood.

#### **NEURAL BASES OF WORD REPETITION: THE CLASSICAL DORSAL STREAM**

Wernicke (1874) described a region within the left posterior temporal lobe devoted to auditory word processing and postulated the existence of a connection between this area and the inferior frontal gyrus (IFG). He suggested that a lesion of this connection induced, in addition to phonemic paraphasias with associated correction strategies, the inability to repeat. The white matter pathway underlying this connection has been referred to as the arcuate fasciculus (AF), and language disorders elicited by damage to the AF were termed "conduction aphasia" (Goodglass, 1992). Geschwind (1965) argued that this disconnection syndrome may be caused by a lesion to either the AF or the left inferior parietal lobe (IPL). Advances in neuroimaging combined with lesion studies have confirmed the involvement of the supramarginal gyrus (SMG) within the IPL in WR (Fridriksson et al., 2010), in addition to the IFG and cortical areas surrounding the posterior portion of the Sylvian fissure (Damasio and Damasio, 1980; Price et al., 1996; Baldo and Dronkers, 2006; Buchsbaum et al., 2011). In particular, a small temporal-parietal area ("area Spt," for Sylvian parietal-temporal), inferior to the SMG, is thought to be crucial in phonological working memory and WR (Baldo et al., 2011). According to the dual stream model of language processing, area Spt might represent an "interface" for the integration of sensorimotor representations of verbal sounds. Thus, lesions to Spt may result in an inefficient transfer of auditory representations into articulatory representations stored in the IFG and ventral premotor cortex. The described temporal-parietal-frontal network would constitute a dorsal language stream dedicated to phonological processing (i.e., the ability to link sound to articulation), while a ventral stream would be dedicated to semantic processing (linking sound to meaning) (Hickok and Poeppel, 2004).
It is worth noting that Wernicke himself assumed the existence of a ventral stream in addition to the dorsal one, running behind the insula and connecting temporal and frontal regions. However, this very modern view fell into oblivion until the recent re-emergence of such a dual stream organization of language (Weiller et al., 2011).

Recent studies combining functional MRI and diffusion tensor imaging show that this dorsal stream is subserved by the superior longitudinal fasciculus (SLF) (Catani et al., 2005; Saur et al., 2008). The SLF comprises three components: a medial direct pathway corresponding to the AF, and two indirect pathways running laterally: an anterior part connecting the ventral premotor cortex with the inferior parietal cortex, and a posterior part connecting the inferior parietal cortex with posterior temporal areas (Catani et al., 2005). In this dual stream model, the major role in WR of a left dorsal network connecting posterior temporal, inferior parietal, and inferior frontal areas via the AF/SLF complex is now widely accepted (Fridriksson et al., 2010).

The question remains as to whether the dorsal stream is sufficient for WR or whether the ventral pathway can impact WR, as was recently suggested (Berthier et al., 2012). Intraoperative language mapping performed during awake brain tumor surgery, combined with perioperative language evaluations, may provide significant insights into this question.

#### **INTRAOPERATIVE MAPPING AND PERIOPERATIVE EXAMINATIONS: DOES THE VENTRAL STREAM PLAY A ROLE IN WORD REPETITION?**

#### **INTRAOPERATIVE LANGUAGE MAPPING**

Awake surgery for tumor patients allows the surgeon to maximize the extent of resection while preserving brain functions (Duffau, 2005). Direct electrical stimulation (DES) is applied over the brain, temporarily inactivating restricted structures at both cortical and subcortical levels. Sensorimotor and language functions are continuously assessed throughout the resection. If a disorder is reproducibly observed during DES of a given area, that area is regarded as critical to the functional network being evaluated and is thus preserved. Based on experience with hundreds of procedures, awake surgery enables mapping of eloquent pathways in humans with a spatiotemporal resolution unmatched by other techniques (Duffau et al., 2008). In practice, language is evaluated using a picture naming test: the patient is asked to name black-and-white pictures of common objects presented on a computer screen. When transient language disorders (such as phonological paraphasias, semantic paraphasias, or anomias) are generated by DES, WR is assessed by simply asking the patient to repeat the correct word.

Here we report a summary of the most frequent disorders observed during intraoperative DES in 200 patients who underwent awake surgical procedures for brain tumors between 2009 and 2012. Reproducible disturbances were observed in approximately 80% of cases, demonstrating a reliable methodology despite interindividual variability and the brain reorganization induced by the glioma (**Table 1**).

At the cortical level, DES of the inferior part of the SMG induced anomias or phonological disorders during the picture naming test. During WR tasks, phonological paraphasias were observed. DES of the ventral premotor cortex always induced speech arrest, during both naming and WR. DES of the posterior part of the superior and middle temporal gyri induced anomias, with preservation of WR.

At the subcortical level, DES of the AF reliably induced phonological disorders during both naming and WR. Moreover, DES of the anterior part of the lateral SLF induced articulatory disorders during naming and WR. Finally, DES of the inferior fronto-occipital fasciculus (IFOF) produced semantic paraphasias related to the target word (i.e., belonging to the same category as the target word, or having an associative link with it) during naming, whereas during WR, the patient repeated normally or perseverated on the previous semantic paraphasia.

#### **Table 1 | Neural bases of word repetition.**

*Upper: Language disorders induced by intraoperative DES of brain areas of interest during picture naming, and then followed by word repetition. Lower: Performances on the immediate postoperative language evaluation, likely reflecting edema-induced disruption of the functional networks (phonological/semantic) adjacent to the surgical cavity. Abbreviations: DES, direct electrical stimulation; word repet, word repetition; sp. arrest, speech arrest; Pho P, phonological paraphasia; artic disord., articulatory disorder; Sem P, semantic paraphasia; Perse, perseveration; vPMC, ventral premotor cortex; inf SMG, inferior supramarginal gyrus; pSTG, posterior superior temporal gyrus; pMTG, posterior middle temporal gyrus; sup pAF, postero-superior arcuate fasciculus; lat vSLF, lateral ventral part of the superior longitudinal fasciculus; IFOF, inferior fronto-occipital fasciculus; Ø, absence of response; regul, regularization (transformation of a pseudo-word into a real word).*

Our observations support the dual stream model of language processing for both naming and WR. First, the dorsal phonological stream is supported by two components of the SLF: the longer, more medial AF, which is crucial for decoding and encoding phonological representations (i.e., the phonological buffer), and the antero-lateral SLF III (Makris et al., 2005), which is essential in linking phonological information to articulatory-based representations, i.e., the articulatory loop of verbal working memory (Duffau et al., 2003; Maldonado et al., 2011). In a recent study that used fiber dissection and tractography to isolate the subcomponents of the SLF and to identify their cortical terminations, it was demonstrated that (1) the AF connects the middle and inferior temporal gyri with the precentral and inferior frontal gyri, and (2) the anterior segment of the SLF (SLF III) connects the SMG and superior temporal gyrus with the precentral gyrus (Martino et al., 2013). The AF/SLF III complex thus connects the frontal, temporal, and parietal areas previously shown to be critically involved in WR, and the critical role of the dorsal stream in WR was validated by intraoperative DES. Concerning the ventral semantic stream, we have previously demonstrated the underlying role of the IFOF: stimulation of this fascicle systematically induced semantic disorders during naming (Duffau et al., 2005; Duffau, 2008). Interestingly, WR can be either unaffected or impaired (perseveration) during IFOF stimulation.

In summary, these intraoperative stimulation data indicate that when the AF/SLF III is transiently inhibited by DES, disrupting the connectivity of the dorsal phonological network, WR is inefficient. When the ventral semantic network is inhibited via IFOF stimulation, WR can be either completely unaffected or disrupted, resulting in perseveration. Therefore, it seems that although the dorsal phonological network is crucial for WR, the ventral semantic network may play a role in the control of semantic activations during WR.

#### **PERIOPERATIVE LANGUAGE ASSESSMENTS**

Per our standard protocol, every patient undergoing awake surgery for glioma resection has preoperative (the day before surgery) and postoperative (5 days and 3 months after the operation) language evaluations. These evaluations include:


Preoperative examination allows assessment of the impact of the tumor on language functions. The immediate postoperative evaluation is particularly informative, because the postsurgical edema surrounding the surgical cavity may induce transient functional disorders (lasting a few days), thereby illustrating the functional role of brain regions near the surgical cavity. Finally, the language examination 3 months after resection enables assessment of the efficiency of functional reorganization following speech rehabilitation.

Preoperative evaluations showed no significant deficit.

#### **POSTOPERATIVE EVALUATIONS**

During the immediate postoperative evaluation of picture naming, as during surgery, when a disorder was observed, the patient was asked to repeat the correct word. Furthermore, at the end of the assessment, all patients were asked to repeat, with only auditory input, a series of simple real words and pseudo-words (**Table 1**).

Patients in whom the phonological network was primarily involved by postoperative edema (parietal operculum, parietal-posterior temporal lesions) presented more often with WR disorders than patients in whom the semantic network was mostly involved (frontal-temporal, inferior temporal lesions). Typically, repetition of real words and pseudo-words gave rise to phonological or articulatory disorders, suggesting that when the connectivity underlying the phonological system is transiently damaged, the semantic network alone is generally not able to compensate for the phonological stream. Nonetheless, in some cases, repetition of real words was possible even in the presence of naming disorders. Therefore, we suggest some degree of participation of the semantic system in real-word repetition.

Patients in whom the semantic network was primarily involved by postoperative edema (frontal-temporal, inferior temporal lesions) rarely experienced repetition disorders. In these rare cases, the most frequent disorder was phonological paraphasia, indicating that when the connectivity underlying the semantic system is transiently damaged, the phonological network may sometimes be unable to link sound to articulation for accurate WR. Another observed disorder was the regularization of pseudo-words, i.e., the production of a real word with phonological similarities, suggesting that the inefficiency of lexical judgment due to semantic disorganization is not compensated for by phonological processing.

Three months after surgery, following intensive speech therapy, most patients recovered normal WR performance. The few patients with persistent WR disorders were those whose resection cavity or residual tumor involved the phonological network, but not the semantic network.

In summary, despite the crucial role of the phonological stream in the repetition of real words and pseudo-words, these observations underline strong interactions between the phonological and semantic systems. This is in agreement with previous studies highlighting, on the one hand, a *"division of labour"* between the two pathways (a division that is not absolute, especially in the context of post-stroke recovery) and, on the other hand, the *"interactive contributions of the two pathways"* to single-word comprehension and production, including WR (Nozari et al., 2010; Ueno et al., 2011).

#### **CONCLUSION**

The dorsal phonological pathway, supported subcortically by the left AF/SLF complex, is crucial to accurate WR. It enables the conversion of auditory input, processed in the verbal working memory system, into phonological and articulatory-based representations. Although this contribution is essential, the ventral semantic pathway, connected by the left IFOF, also contributes to the WR of real words and pseudo-words. Indeed, perioperative and intraoperative language evaluations of glioma patients undergoing awake surgery highlight the strong interaction of both pathways in WR, allowing auditory input to be converted efficiently into articulatory output.

Further studies are needed to validate this dual stream model of WR, and to bring insights into the possible role of the right hemisphere.



*Received: 23 April 2013; accepted: 10 July 2013; published online: 29 July 2013.*

*Citation: Moritz-Gasser S and Duffau H (2013) The anatomo-functional connectivity of word repetition: insights provided by awake brain tumor surgery. Front. Hum. Neurosci. 7:405. doi: 10.3389/fnhum.2013.00405*

*Copyright © 2013 Moritz-Gasser and Duffau. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

### The roles of the "ventral" semantic and "dorsal" pathways in *conduite d'approche:* a neuroanatomically-constrained computational modeling investigation

#### *Taiji Ueno1,2 and Matthew A. Lambon Ralph1 \**

*<sup>1</sup> Neuroscience and Aphasia Research Unit, School of Psychological Sciences, University of Manchester, Manchester, UK*

*<sup>2</sup> Japan Society for the Promotion of Science, Tokyo, Japan*

#### *Edited by:*

*Marcelo L. Berthier, University of Malaga, Spain*

#### *Reviewed by:*

*Jamie Reilly, University of Florida, USA Chris Code, University of Exeter, UK*

#### *\*Correspondence:*

*Matthew A. Lambon Ralph, Neuroscience and Aphasia Research Unit, School of Psychological Sciences, Zochonis Building (T7d, 3rd floor), University of Manchester, Oxford Road, Manchester, M13 9PL, UK e-mail: matt.lambon-ralph@manchester.ac.uk*

Ever since the 19th century, the standard model for spoken language processing has assumed two pathways for repetition, a phonological pathway and a semantic pathway, and this idea has gained further support in the last decade. First, recent *in vivo* tractography studies have demonstrated both a "dorsal" (via the arcuate fasciculus) and a "ventral" (via the extreme capsule and uncinate fasciculus) pathway connecting the primary auditory area to the speech-motor area, the latter of which passes through a brain area associated with semantic processing (the anterior temporal lobe). Second, neuropsychological evidence for the role of semantics in repetition comes from *conduite d'approche*, a progressive phonological improvement (though sometimes a non-improvement) in aphasic patients' responses when they repeat a target several times in succession. Crucially, conduite d'approche is observed in patients with neurological damage in/around the arcuate fasciculus. Successful conduite d'approche is especially clear in semantically-intact patients, and it occurs more often for real words than for non-words. These features have led researchers to hypothesize that the patients' disrupted phonological output is "cleaned up" by intact lexical-semantic information before the next repetition. We tested this hypothesis using the neuroanatomically-constrained dual dorsal-ventral pathway computational model. The results showed that (a) damage to the dorsal pathway impaired repetition; (b) in the context of recovery, the model learned to compute a correct repetition response following the model's own noisy speech output (i.e., successful conduite d'approche); (c) this behavior was more evident for real words than non-words; and (d) activation from the ventral pathway contributed to the increased rate of successful conduite d'approche for real words. These results suggest that lexical-semantic "clean-up" is key to this self-correcting mechanism, supporting the classic proposal of two pathways for repetition.

**Keywords: repetition, dual dorsal-ventral pathway, conduite d'approche, semantics, computational modeling**

#### **INTRODUCTION**

Classic 19th century neurologists explained various kinds of language impairment (aphasia) by associating specific language functions with different neural substrates, via accumulated human post mortem data and the construction of neuroanatomical models of language processing (see Weiller et al., 2011, for a direct translation). During this era, Wernicke (Eggert, 1977) and Lichtheim (1885) proposed two pathways from "auditory word images" to "motor word images" for repetition: a phonological pathway and a semantic pathway. Wernicke termed an inability to repeat despite preserved speech comprehension/production *conduction aphasia* and argued that it reflected a failure of the direct, phonological (non-semantic) pathway. Based on his dual-pathway model, Wernicke predicted a possible patient profile, comprising an inability to repeat pseudowords accompanied by semantically-related errors in real word repetition, which would be a signature of the impact of semantics on repetition (Eggert, 1977). In recent times, of course, this type of patient has been reported in many studies

(Michel and Andreewsky, 1983; McCarthy and Warrington, 1984; Butterworth and Warrington, 1995), and termed *deep dysphasia* by Michel and Andreewsky (1983). Another piece of evidence for the role of semantics in repetition is *conduite d'approche*, a progressive phonological improvement in conduction aphasic patients' responses when they repeat a target several times in succession (Shallice and Warrington, 1977; Joanette et al., 1980; Kohn, 1984; Köhler et al., 1998; Saffran, 2000; Nadeau, 2001; Jacqueline and Marta Sarolta, 2008; Berthier et al., 2012), a behavior that is observed most often in the post-recovery phase (Köhler et al., 1998; Franklin et al., 2002; Jacqueline and Marta Sarolta, 2008). Nadeau (2001), amongst others, suggested that this behavior reflects a "clean-up" self-correcting mechanism underpinned by intact lexical-semantic information. Consistent with this hypothesis, successful conduite d'approche sequences are observed in semantically-intact patients (Köhler et al., 1998; Franklin et al., 2002), but not in semantically-impaired jargon aphasia (Butterworth, 1979; Nadeau, 2001). Moreover, this behavior is more common for real words than for non-words (Kohn, 1984; Köhler et al., 1998; Nadeau, 2001; Franklin et al., 2002). Nadeau (2001) further elaborated Wernicke's dual-pathway model and called for an implemented computational model with as many neuroanatomical constraints as possible (O'Reilly, 1998), in order to understand how both intact and aphasic behaviors emerge from a neuroanatomical brain model. The current study answers Nadeau's request by simulating conduite d'approche using a model built under the constraints of contemporary neuroanatomical data (Ueno et al., 2011).

Recent developments in neuroimaging have contributed to more precise neuroanatomical models of language function than those of the 19th century. Historically, accounts of the neural correlates of auditory language processing have been dominated by the posterior superior temporal lobe (Wernicke's area), the inferior frontal lobe (Broca's area), and their connection via the direct *dorsal* pathway along the arcuate fasciculus, which subserves sound-motor mapping (Lichtheim, 1885; Geschwind, 1970; Eggert, 1977; but see Weiller et al., 2011, for the direct translation of Wernicke's work). In the contemporary literature, an increasing number of studies also implicate a *ventral* pathway connecting auditory regions, the anterior part of the temporal lobe, and prefrontal areas via the *extreme capsule complex* and/or the *uncinate fasciculus* (Hickok and Poeppel, 2004, 2007; Parker et al., 2005; Saur et al., 2008; Rauschecker and Scott, 2009; Binney et al., 2012). In contrast to the sound-motor mapping of the dorsal pathway, this ventral pathway has been associated with the realization of a sound-meaning-motor mapping (Parker et al., 2005; Hickok and Poeppel, 2007; Saur et al., 2008; Rauschecker and Scott, 2009). These dual dorsal-ventral pathway models indicate that both pathways work together to support language activities with a functional division of labor. Previously, we constructed a computational model that carefully mirrors this dual-pathway neuroanatomy (Ueno et al., 2011), while also assimilating important insights from past computational models of cognitive/language processing (McClelland et al., 1989; Seidenberg and McClelland, 1989; Plaut et al., 1996; Dell et al., 1997; Joanisse and Seidenberg, 1999; Plaut and Kello, 1999; Harm and Seidenberg, 2004; Rogers et al., 2004; Botvinick and Plaut, 2006; Woollams et al., 2007; Dilkina et al., 2008, 2010; Sibley et al., 2008; Nozari et al., 2010; Welbourne et al., 2011; Yang et al., 2013).
The resultant dorsal-ventral dual-pathway neurocomputational model (see **Figure 1**) successfully simulated both normal and impaired language behaviors within a single neurocomputational framework (Ueno et al., 2011). In line with the original Wernicke-Lichtheim ideas, the dorsal pathway of this model developed to be more crucial for phonological processing, whereas the ventral pathway was relatively dedicated to semantically-related processing (speech comprehension and speech production from meaning).

If this model is a successful one, then it should be able to simulate additional neuropsychological evidence regarding the claim for two repetition pathways. For this purpose, the current model was tested on its ability to simulate conduite d'approche. Given the division of labor across pathways in the model (Ueno et al., 2011) and the neuropsychological evidence noted above (Butterworth, 1979; Kohn, 1984; Köhler et al., 1998; Franklin et al., 2002; Jacqueline and Marta Sarolta, 2008), the following specific predictions can be made: (a) damage to the dorsal pathway of the model should result in conduction aphasia; (b) following recovery, the model should show more successful conduite d'approche sequences, indexed by the ratio of successfully repeated items in the second trial to the incorrectly repeated items in the first trial; (c) this should be more frequent for real words than for non-words; and finally, (d) successful conduite d'approche should be supported partly by intact lexical-semantic information from the ventral pathway. As such, a diagnostic lesioning analysis to the ventral pathway of the model should reduce the frequency of successful conduite d'approche, particularly for real words.
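The success index in prediction (b), the proportion of first-attempt failures that are repeated correctly on the second attempt, is simple to state in code. The following helper is illustrative only (the function name and example data are not taken from the published simulations):

```python
def conduite_dapproche_rate(first_correct, second_correct):
    """Proportion of items correct on the second repetition attempt
    among those that were incorrect on the first attempt."""
    failed_first = [i for i, ok in enumerate(first_correct) if not ok]
    if not failed_first:
        return 0.0  # no first-attempt errors, so no sequences to score
    recovered = sum(1 for i in failed_first if second_correct[i])
    return recovered / len(failed_first)

# Example: 5 items; items 1, 3, 4 fail on the first attempt,
# and items 3 and 4 succeed on the second attempt.
first  = [True, False, True, False, False]
second = [True, False, True, True,  True]
print(conduite_dapproche_rate(first, second))  # → 0.666...
```

Comparing this rate for real words versus non-words, and with versus without a lesioned ventral pathway, corresponds to predictions (c) and (d).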

**FIGURE 1 | Implemented neuroanatomically-constrained dual-pathway language model (left), and its exact architecture and activation flow in a simple-recurrent Elman network model (right).** Note. See Ueno et al. (2011) for the full implementation details.

#### **MATERIALS AND METHODS**

#### **MODEL ARCHITECTURE AND ACTIVATION FLOW**

The left half of **Figure 1** summarizes the architecture of the neuroanatomically-constrained model, whilst the right panel shows the exact computational architecture of the parallel distributed processing (PDP) model for Japanese spoken language processing (Ueno et al., 2011). This is a simple-recurrent Elman network model, in which upward arrows represent feedforward connections and downward ones denote recurrent connections (Elman et al., 1996; Plaut and Kello, 1999; Sibley et al., 2008). We used the Light, Efficient Network Simulator (LENS) to implement the model (Rohde, 1999).
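As an illustration only (the published model was implemented in LENS with a richer architecture and training regime; the layer sizes and class name here are arbitrary), the defining feature of a simple-recurrent Elman network, a hidden layer that receives a copy of its own previous state as context input, can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

class ElmanStep:
    """Minimal simple-recurrent (Elman) layer: the hidden state is fed
    back as context input on the next time step."""
    def __init__(self, n_in, n_hid, n_out):
        self.W_in  = rng.normal(0, 0.1, (n_hid, n_in))
        self.W_ctx = rng.normal(0, 0.1, (n_hid, n_hid))  # recurrent (context) weights
        self.W_out = rng.normal(0, 0.1, (n_out, n_hid))
        self.context = np.zeros(n_hid)

    def step(self, x):
        h = np.tanh(self.W_in @ x + self.W_ctx @ self.context)
        self.context = h                               # carry state to the next tick
        return 1 / (1 + np.exp(-(self.W_out @ h)))     # sigmoid output units

net = ElmanStep(n_in=10, n_hid=20, n_out=10)
seq = [rng.random(10) for _ in range(3)]   # e.g., a 3-element auditory sequence
outputs = [net.step(x) for x in seq]       # one output pattern per time event
print(len(outputs), outputs[0].shape)      # prints: 3 (10,)
```

Sequential inputs (such as the 3-mora auditory sequences described below) are presented one element per time step, with the context layer carrying information across steps.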

#### **TRAINING (DEVELOPMENT), TASKS, AND PARAMETERS**

The simulation was divided into two phases. The first was the developmental phase, in which the network was trained on three language tasks (see the upper half of **Table 1**) until its performance matched that of human adults. The implementation in this phase (e.g., items, parameters, materials) was exactly the same as that employed in Ueno et al. (2011), and readers are referred to that paper for further details (for both the procedure and the results). The focus of the current study was the post-recovery conduite d'approche data (see the next section).

One detail of the previous simulations is worth rehearsing here (the time course of the trained tasks during development), so that the new implementation method reported below (for simulating post-damage recovery) is easier to follow. As listed in the upper half of **Table 1**, the temporal dynamics of the trained tasks were divided into three or six discrete time events. First, in repetition, an auditory word input (a 3-mora<sup>1</sup> sequence) was presented sequentially, one mora per event (in the 1st–3rd time events). After this, the network was required to reproduce this time-varying auditory sequence in the same order, one mora per event (in the 4th–6th time events). In auditory comprehension, an auditory word input was presented in the same sequential manner (1st–3rd time events), during which the network was pressured to compute the correct semantic pattern as quickly as possible. In other words, the network was trained to compute the target semantic pattern at every time event. Finally, in speaking/naming, a semantic pattern was presented for three time events (1st–3rd), during which the network generated the correct 3-mora sequence (the phonological form for that meaning) in the correct order, one mora per time event. The development phase consisted of 200 epochs of training. During every epoch, each word appeared once for repetition, three times for comprehension, and twice for speaking/naming, in random order. The final performance after training was exactly the same as reported in Ueno et al. (2011).
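The per-epoch task mixture described above (each word once for repetition, three times for comprehension, twice for naming, shuffled into a random order) can be sketched as follows; the function name and word list are illustrative, not taken from the published code:

```python
import random

def build_epoch(words, seed=None):
    """One epoch of training trials: each word appears once for
    repetition, three times for comprehension, and twice for naming,
    all shuffled into a random order."""
    trials = []
    for w in words:
        trials += [(w, "repetition")] * 1
        trials += [(w, "comprehension")] * 3
        trials += [(w, "naming")] * 2
    random.Random(seed).shuffle(trials)
    return trials

epoch = build_epoch(["inu", "neko", "tori"], seed=1)
print(len(epoch))  # 3 words × (1 + 3 + 2) = 18 trials
```

Repeating this over 200 epochs reproduces the training frequencies described in the text.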

#### **SIMULATED LESIONING, RECOVERY, AND CONDUITE D'APPROCHE**

The second simulation phase targeted lesioning (damage) and the recovery process. Following the neuropsychological literature on conduction aphasia with conduite d'approche behavior, the simulated lesioning was applied to the dorsal pathway of the model. Given that conduite d'approche behavior can be observed without subcortical (arcuate fasciculus) damage (Berthier et al., 2012), simulated damage was applied not only to the connectivity in the model (analogous to the underlying white matter damage) but also to the unit outputs (reflecting the effects of cortical damage). Thus, 20% of the incoming links from the iSMG (supramarginal gyrus) layer to the speech-motor layer were randomly selected and removed, and Gaussian noise was added to the unit outputs of the iSMG layer (*SD* = 0*.*2). The resultant model became conduction aphasic, replicating the previous results of Ueno et al. (2011, **Figure 3**, p. 388).
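Schematically (a numpy sketch; the array shapes and function names are illustrative, not taken from the published code), the two lesion types amount to zeroing a random 20% of the iSMG-to-speech-motor connection weights and perturbing the iSMG unit outputs with Gaussian noise of SD 0.2:

```python
import numpy as np

rng = np.random.default_rng(42)

def lesion_weights(W, prop=0.20):
    """White-matter analogue: randomly remove (zero out) a proportion
    of the connection weights."""
    keep = rng.random(W.shape) >= prop   # keep each link with probability 1 - prop
    return W * keep

def noisy_outputs(h, sd=0.2):
    """Cortical-damage analogue: add Gaussian noise to unit outputs."""
    return h + rng.normal(0.0, sd, h.shape)

# Illustrative layer sizes: 50 iSMG units projecting to 30 speech-motor units.
W = rng.normal(0.0, 0.1, (30, 50))
W_lesioned = lesion_weights(W)
removed = float(np.mean(W_lesioned == 0.0))
print(round(removed, 2))                 # roughly 0.20 of the links removed

h = rng.random(50)                       # iSMG unit outputs on some time step
h_damaged = noisy_outputs(h)             # noisy outputs propagated downstream
```

Both manipulations together produce the conduction aphasic profile before the recovery-phase retraining begins.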

After lesioning to the dorsal pathway, the recovery phase simulation was commenced. The conduction aphasic model was trained on the three language tasks as before, with one addition in order to simulate conduite d'approche behavior. Specifically, training consisted of the following four tasks (see the lower half of **Table 1** for the temporal dynamics). One task was (standard) repetition, which was trained in the same way as during development (i.e., "listening" in the 1st–3rd time events → "reproducing" in the 4th–6th time events). In addition, the model was sometimes trained to reproduce the "heard" auditory sequence *once again* during the 7th–9th time events. In this conduite d'approche trial, the target output pattern was only applied during the 7th–9th time events. Since the model was trained for standard repetition as well (i.e., where the target was applied during the 4th–6th time events), the model actually learned to produce the phonological output twice. The focus of the investigation here was whether or not the model was able to produce the correct output in the second trial after the first repetition had been incorrect (i.e., a successful conduite d'approche sequence). We should note here that at least some patients, if permitted, will often continue to produce approximations of the target word, potentially ad infinitum. Under everyday circumstances, there are many potential sources of information to help with error monitoring, including kinesthetic/articulatory feedback, auditory feedback, and non-verbal feedback from a person's conversation partner. These important observations go beyond the scope of this simulation, which instead focused on the basis for the changes between successive approximations of the target word or non-word.

**Table 1 | Time course of the trained tasks in the development phase (upper half) and in the recovery phase (lower half).**

*\*Since the network was also trained on standard repetition, the network automatically produced the auditory output during the 4th–6th time events as well. This means the network was trained to repeat twice. If the second repetition was correct whilst the first one was incorrect, this was counted as a successful conduite d'approche behavior. See main text for details.*

<sup>1</sup>The mora is a subsyllabic spoken unit in Japanese. Morae include all of the following types of elements: a vocalic nucleus (V), a nucleus with onset (CV or CCV), a nasal consonant (N) in syllabic coda position, a geminate consonant, or a long vowel.

Next, a slight change was also made to the comprehension trials. Given that a conduite d'approche trial extended over nine time "ticks," the network was also trained to maintain the meaning of the "heard" auditory sequence for the same duration. Thus, in this recovery phase, the auditory input for comprehension was presented from the 1st to the 3rd time events, but the target semantic pattern was applied from the 1st to the 9th. A key issue for this study was whether or not the network recovered so as to use this semantic information for successful conduite d'approche as an *emergent property* (i.e., the model was not "forced" by the modeler to use this maintained semantic information for either the first or the second repetition). The speaking/naming trials were the same as in the development phase.

The duration of this recovery phase was 20 epochs (one-tenth of the development phase). In each epoch, each word appeared once for standard repetition, once for conduite d'approche, three times for comprehension, and twice for speaking. The network's performance was evaluated every five epochs (**Figure 3**). The learning rate was set to 0.1 and the weight decay rate to 0.0000001. We should note here that all simulation parameters, including learning rate and weight decay, are specific to this model; indeed, all computational investigations set model-specific values depending on the nature of the training, representations, etc. The crucial point is that these parameters were fixed for all the simulations generated with this model (both in this paper and in its sister publication: Ueno et al., 2011).
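The recovery-phase schedule just described (20 epochs; one, one, three, and two presentations per word for repetition, conduite d'approche, comprehension, and speaking; evaluation every five epochs) can be summarized as a sketch. All names below are ours, not taken from the authors' simulator.

```python
# Illustrative recovery-phase training schedule.
RECOVERY_EPOCHS = 20      # one-tenth of the development phase
EVAL_EVERY = 5            # performance evaluated every five epochs
LEARNING_RATE = 0.1
WEIGHT_DECAY = 1e-7       # i.e., 0.0000001

TASKS_PER_EPOCH = {       # presentations of each word per epoch
    "repetition": 1,
    "conduite_dapproche": 1,
    "comprehension": 3,
    "speaking": 2,
}

def epoch_trials(lexicon):
    """List the (word, task) trials making up one recovery epoch."""
    return [(word, task)
            for word in lexicon
            for task, n in TASKS_PER_EPOCH.items()
            for _ in range(n)]
```

Each word therefore contributes seven trials per epoch, and an outer loop over `RECOVERY_EPOCHS` would call an evaluation routine whenever `epoch % EVAL_EVERY == 0`.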

#### **MATERIALS FOR TESTING CONDUITE D'APPROCHE, LEXICALITY, AND SCORING METHOD**

Throughout the Results section below, we focus on conduite d'approche accuracy post recovery (see Ueno et al., 2011, **Figures 2**–**4**, for performance before damage). A 51-word set and a 108-non-word set (matched for bimora frequency and Japanese pitch-accent type; Tanida et al., 2010) were created to probe the network's ability to repeat twice. These item sets were used in Ueno et al. (2011, **Figure 4**, p. 389) to test the lexicality effect.

**FIGURE 2 | Rate of successful conduite d'approche post recovery.** Note: accuracy is expressed as the number of successfully repeated items at the second attempt (7th–9th time events, see **Table 1**) divided by the number of incorrectly repeated items at the first attempt (4th–6th time events). Thus, this is the rate of successful self-correction in the conduite d'approche attempts. *Y*-axis error bars indicate standard errors.

The reported accuracy for conduite d'approche in the subsequent figures refers to the number of successfully repeated items in the second attempt (7th–9th time events, see **Table 1**) as a proportion of the number of incorrectly repeated items in the first attempt (4th–6th time events). Note that the network was trained to repeat during the 4th–6th time events and during the 7th–9th time events with equal frequency. Therefore, there was no implementation-specific reason to expect a correct repetition to follow an incorrect one; rather, this successful conduite d'approche behavior was an emergent property.
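The scoring rule above can be expressed directly as a conditional rate; a minimal sketch (the function name is ours):

```python
def conduite_dapproche_rate(first_correct, second_correct):
    """Rate of successful self-correction: correct second repetitions
    (7th-9th time events) among items whose first repetition
    (4th-6th time events) was incorrect."""
    failed = [i for i, ok in enumerate(first_correct) if not ok]
    if not failed:
        return 0.0
    rescued = sum(1 for i in failed if second_correct[i])
    return rescued / len(failed)
```

For example, if three items failed at the first attempt and two of those were produced correctly at the second attempt, the rate is 2/3, independently of how many items were repeated correctly first time.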

#### **ROLE OF SEMANTICS (DIAGNOSTIC LESIONING TO THE VENTRAL PATHWAY)**

The role of semantics in conduite d'approche was probed by additional lesioning to the ventral pathway (**Figure 4**). The same approach was used in Ueno et al. (2011, **Figure 4**, p. 389). The study-specific parameters were as follows. The post-recovery model was tested on repeating twice as a function of increasing severity of diagnostic damage to the two layers in the ventral pathway (the ventral anterior temporal lobe (vATL) layer and the anterior superior temporal gyrus/sulcus (aSTG/STS) layer, see **Figure 1**). Twenty levels of damage severity were simulated by adding an increasing amount of noise to the output of these two layers (ranging from 0.01 to 0.2 for the vATL layer and from 0.005 to 0.1 for the aSTG/STS layer, in equal intervals) and by removing an increasing proportion of the incoming links to these layers (from 0.5 to 10% for the vATL layer and from 0.25 to 5% for the aSTG/STS layer).
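The twenty equally spaced severity levels can be generated as simple linear schedules; a sketch of the parameter sweep (array names are ours):

```python
import numpy as np

N_LEVELS = 20  # twenty equally spaced severity levels, as in the text

# SD of Gaussian noise added to layer outputs
vatl_noise = np.linspace(0.01, 0.20, N_LEVELS)
astg_noise = np.linspace(0.005, 0.10, N_LEVELS)

# proportion of incoming links removed
vatl_prune = np.linspace(0.005, 0.10, N_LEVELS)    # 0.5% .. 10%
astg_prune = np.linspace(0.0025, 0.05, N_LEVELS)   # 0.25% .. 5%
```

At each level *k*, the model would be damaged with `vatl_noise[k]`/`astg_noise[k]` and `vatl_prune[k]`/`astg_prune[k]` simultaneously, so the aSTG/STS layer always receives half the nominal damage of the vATL layer.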

#### **STATISTICAL ANALYSIS**

The procedure described above was conducted on 10 randomly initialized networks (i.e., with different initial weights), analogous to collecting 10 participants in a human experiment, and data from these 10 models were entered into the statistical analyses. Each damaged model was probed with random noise five times; as a result, the performance of each damaged model was measured 50 times in total and averaged for a stable outcome.
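The evaluation protocol (10 networks × 5 noise probes = 50 measurements, then averaged) can be sketched as follows, where `measure` is a hypothetical callable standing in for one probe of one network:

```python
def averaged_performance(measure, n_networks=10, n_probes=5):
    """Average measure(seed, probe) over every network/probe pair:
    10 differently initialized networks x 5 noise probes = 50 runs."""
    vals = [measure(seed, probe)
            for seed in range(n_networks)
            for probe in range(n_probes)]
    return sum(vals) / len(vals)
```

The averaging smooths out both the between-network variability (different initial weights) and the within-network variability (different random noise samples).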

#### **RESULTS**

#### **CONDUITE D'APPROCHE POST RECOVERY**

**Figure 2** shows the conduite d'approche accuracy after 20 epochs of recovery. Like real patients, successful conduite d'approche was observed mainly for words, rather than non-words, *F*(1, 9) = 34.39, *p* < 0.01. The rate of successful conduite d'approche for words (25%) was close to that of real patients (approximately 30% accurate; Köhler et al., 1998).

#### **THE EFFECT OF RECOVERY ON SUCCESSFUL CONDUITE D'APPROCHE**

**Figure 3** shows the conduite d'approche accuracy during recovery. Accuracy increased steadily during recovery and reached asymptote around the 15th–20th epochs. There was a significant difference in accuracy between the early recovery phase (5th epoch) and the later phase (20th epoch), *F*(1, 9) = 56.87, *p* < 0.01. Thus, as in real patients, there was a greater number of successful conduite d'approche attempts post recovery (Köhler et al., 1998; Franklin et al., 2002; Jacqueline and Marta Sarolta, 2008).

#### **THE ROLE OF THE "VENTRAL" SEMANTIC PATHWAY IN CONDUITE D'APPROCHE**

**Figure 4** shows the effect of simulated diagnostic lesioning to the ventral pathway, used to probe its contribution to conduite d'approche accuracy. The rationale behind this approach is that, if activation from the ventral pathway is crucial, then accuracy should decrease as a function of increasing additional damage to the ventral pathway. This analysis found that conduite d'approche accuracy for real words was particularly sensitive to damage to the ventral pathway. There was a significant difference in accuracy between the post-recovery model without additional ventral damage (left edge of **Figure 4**) and that with the severest additional damage (right edge of **Figure 4**) for real words, *F*(1, 9) = 16.03, *p* < 0.01, Cohen's *d* = 2.1. In contrast, the effect of this diagnostic lesioning for non-words was significant but smaller in effect size, *F*(1, 9) = 9.59, *p* = 0.012, Cohen's *d* = 1.2. The interaction between lexicality and the presence/absence of additional ventral damage was significant, *F*(1, 9) = 11.09, *p* < 0.01. Therefore, the contribution from the ventral semantic pathway was more crucial for real words than for non-words.

#### **DISCUSSION**

Since the work of Wernicke and Lichtheim, neuroanatomical models of spoken language processing have proposed two pathways for repetition: a phonological pathway and a semantic pathway (Lichtheim, 1885; Eggert, 1977). More recently, this notion has been supported both by contemporary neuroanatomical (tractography) data demonstrating a ventral white-matter pathway that passes through semantically-related brain areas (Hickok and Poeppel, 2004, 2007; Parker et al., 2005; Saur et al., 2008; Rauschecker and Scott, 2009; Binney et al., 2012) and by neuropsychological data regarding conduite d'approche in conduction aphasic patients (Shallice and Warrington, 1977; Joanette et al., 1980; Kohn, 1984; Köhler et al., 1998; Saffran, 2000; Nadeau, 2001; Franklin et al., 2002; Jacqueline and Marta Sarolta, 2008; Berthier et al., 2012). Nadeau (2001), amongst others, hypothesized that this conduite d'approche behavior is underpinned by a lexical-semantic mechanism which "cleans up" noisy phonological activation, and called for computational modeling to demonstrate this idea. The current study aimed to test this hypothesis using a neuroanatomically-constrained, dual dorsal-ventral pathway computational model (Ueno et al., 2011).

The current simulation closely mirrored data from conduction aphasic patients with conduite d'approche behavior. First, damage to the dorsal pathway of the trained (adult) model gave rise to conduction aphasia (Ueno et al., 2011). The model was then re-exposed to the learning environment in order to simulate post-recovery data (Welbourne and Lambon Ralph, 2007). Following this simulated partial recovery, the model acquired the ability to correct its own first repetition (successful conduite d'approche). As with real patients, successful conduite d'approche was observed mainly for real words rather than for non-words, and accuracy was comparable to that of real patients (approximately 30% of the attempts made were successful). Finally, a diagnostic lesion analysis demonstrated that activation from the ventral semantic pathway contributed to successful conduite d'approche, particularly for real words.

As noted above, previous researchers have accumulated detailed data regarding conduite d'approche and discussed this self-correcting mechanism in terms of a proposed lexical-semantic "clean-up" (Kohn, 1984; Köhler et al., 1998; Nadeau, 2001; Franklin et al., 2002). Our computationally-implemented demonstration of the impact of the ventral semantic pathway on conduite d'approche has made this theory more explicit. In repetition, once an auditory input is heard, the corresponding semantic pattern is activated in the ventral semantic layer (i.e., the input is automatically comprehended). This activation in the semantic system then gradually propagates to the other layers in the ventral pathway, and eventually to the dorsal pathway as well. Given that repetition relies primarily on the phonological system rather than the semantic system (as is clear from our ability to repeat a pseudoword), activation from the dorsal, phonological pathway has the main (yet not modular) role in immediate repetition in neurologically-intact people and models (Fridriksson et al., 2010; Buchsbaum et al., 2011; Ueno et al., 2011). However, as reproduction of the auditory input is attempted twice (or more) in conduite d'approche, activation from the ventral pathway has a chance to interact further with the dorsal pathway and, as a result, the errorful repetition output from the first attempt can be moved toward the correct target.

Our simulation can also explain why successful conduite d'approche in conduction aphasic patients is more readily observed in the post-recovery phase (Köhler et al., 1998; Franklin et al., 2002; Jacqueline and Marta Sarolta, 2008). The recent neuroimaging/neuropsychological/modeling literature has demonstrated that an impaired function can be at least partially re-acquired by reorganizing the function of the remaining brain areas (Leff et al., 2002; Saur et al., 2006; Welbourne and Lambon Ralph, 2007; Sharp et al., 2010). In terms of the dual dorsal-ventral pathway framework, Ueno et al. (2011) found that this recovery process could be supported, at least in part, by a changed interaction between the two pathways (see also Welbourne et al., 2011). Specifically, the intact model acquired a division of labor between the two pathways as an emergent property, such that the dorsal pathway was more crucial for phonological processing and the ventral pathway for semantic processing. Consequently, damage to the dorsal pathway impairs activities that heavily tap phonological processing, such as repetition, resulting in conduction aphasia. Then, as the model is allowed to reorganize the remaining intact resources in the ventral pathway, some computational aspects of repetition can be shifted to the ventral pathway (Ueno et al., 2011). This plasticity-related post-recovery shift in the division of labor (Welbourne and Lambon Ralph, 2007) might be a crucial foundation for lexical-semantic information to serve conduite d'approche.

Although not focused on rehabilitation per se, the current simulations provide some ideas for potential intervention strategies that could be formally assessed. Specifically, the model improves its repetition performance by (a) increasing the role of the semantically-imbued ventral pathway and (b) involving word meaning more in spoken production, a contribution that (c) increases over time (allowing semantically-driven activation to percolate through the remaining language system). For behavioral interventions, this would suggest that, when repeating, it might be beneficial (i) to encourage this type of aphasic patient to process the target's meaning as well as its phonological form and (ii) to discourage instantaneous, echolalic-like responding, which may not allow sufficient time for word meaning to influence spoken production. In addition, if transcranial stimulation techniques prove to be an effective method for modulating the functioning of the remaining language neural network, then it might be possible to enhance the relative contribution of the ventral (semantic) pathway, if still intact, and thus improve performance on real-word production. For both rehabilitation approaches, we should note that the consequence of an increased semantic contribution to repetition, and to phonological processes more generally, is that accuracy on real words will improve but the ability to generalize the acoustic-motor speech statistics, and thus to deal with non-word items, will diminish, leading to increasing lexicalization errors (see Ueno et al., 2011, for a formal demonstration of this effect during simulated recovery).

Whilst the current simulation successfully demonstrated conduite d'approche behavior within the dual dorsal-ventral pathways neurocomputational framework, there are other relevant phenomena to be explained in future studies. For example, the current simulation does not clarify why patients' conduite d'approche attempts sometimes deviate away from the target pattern, or why patients do not always make such attempts (Köhler et al., 1998; Saffran, 2000; Franklin et al., 2002). In addition, conduction aphasic patients show conduite d'approche in spontaneous speech as well, not just in repetition (Franklin et al., 2002). Explaining these phenomena is a target for future work.

#### **ACKNOWLEDGMENTS**

This work was supported by a Postdoctoral Fellowship for Research Abroad to Taiji Ueno from the Japan Society for the Promotion of Science (JSPS), and an MRC programme grant to Matthew A. Lambon Ralph (MR/J004146/1). The authors would like to acknowledge the assistance given by IT Services (formerly Research Computing Services) at The University of Manchester.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 May 2013; paper pending published: 19 June 2013; accepted: 14 July 2013; published online: 26 August 2013.*

*Citation: Ueno T and Lambon Ralph MA (2013) The roles of the "ventral" semantic and "dorsal" pathways in conduite d'approche: a neuroanatomically-constrained computational modeling investigation. Front. Hum. Neurosci. 7:422. doi: 10.3389/ fnhum.2013.00422*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Ueno and Lambon Ralph. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Articulation-based sound perception in verbal repetition: a functional NIRS study

#### *Sejin Yoo1 and Kyoung-Min Lee2,3\**

*<sup>1</sup> R&D Team, Health and Medical Equipment Business, Samsung Electronics, Suwon, South Korea*

*<sup>2</sup> Interdisciplinary Program in Cognitive Science, Seoul National University, Seoul, South Korea*

*<sup>3</sup> Department of Neurology, Seoul National University, Seoul, South Korea*

#### *Edited by:*

*Marcelo L. Berthier, University of Malaga, Spain*

#### *Reviewed by:*

*Stephen V. Shepherd, Princeton University, USA Shanqing Cai, Boston University, USA*

#### *\*Correspondence:*

*Kyoung-Min Lee, Interdisciplinary Program in Cognitive Science, Department of Neurology, Seoul National University, 101 Daehak-ro Jongno-gu, Seoul 110-744, South Korea e-mail: kminlee@snu.ac.kr*

Verbal repetition is a fundamental language capacity in which listening and speaking are inextricably coupled with each other. We have recently reported that the left inferior frontal gyrus (IFG) harbors articulation-based codes, as evidenced by activation during repetition of meaningless speech sounds, i.e., pseudowords. In this study, we aimed to confirm this finding and to further investigate the possibility that sound perception, as well as articulation, is subserved by neural circuits in this region. Using functional near-infrared spectroscopy (fNIRS), we monitored changes in hemoglobin (Hb) concentration at the IFG bilaterally while subjects verbally repeated pseudowords and words. The results revealed that the proportion of oxygenated hemoglobin (O2Hb) over total Hb was significantly higher at the left IFG during repetition of pseudowords than during repetition of words, replicating our functional MRI observation and indicating that the region processes articulatory codes for verbal repetition. More importantly for this study, hemodynamic modulations were observed at both IFGs during passive listening (without repetition) to various sounds, including natural environmental sounds, animal vocalizations, and human non-speech sounds. Furthermore, the O2Hb concentration increased at the left IFG but decreased at the right IFG for both speech and non-speech sounds. These findings suggest that both speech and non-speech sounds may be processed and maintained by a neural mechanism for sensorimotor integration using articulatory codes at the left IFG.

**Keywords: verbal repetition, inferior frontal gyrus, articulation-based codes, sound perception, functional near-infrared spectroscopy, hemoglobin concentration, sensorimotor integration**

#### **INTRODUCTION**

Verbal repetition is a form of vocal imitation frequently used in language learning. At first, such imitation amounts to mimicking sounds without analyzing them phonetically or phonologically. At a certain point in learning, however, imitation turns into a specifically linguistic process, i.e., speech processing that depends on a limited set of sounds in a specific phonetic domain (Kuhl, 2004). This language-specific sound learning happens in a continuous manner, so it is generally not easy to specify how sounds become speech through learning. Furthermore, sound processing and speech processing have much in common in terms of neural circuitry (Koelsch et al., 2009), which makes it even more difficult to study the difference between sounds and speech.

Categorical perception (Liberman et al., 1957) provides helpful insight into this problem. Speech sounds are not so different from other sounds, such as animal vocalizations and environmental sounds, while acoustic signals are processed along the central auditory pathways from the outer ear to the auditory cortex (Malmierca and Hackett, 2010). However, the situation changes when the incoming signals arrive at the auditory cortex and higher cortical regions, where a speech sound is perceived not only by its physical properties but also by various linguistic features. In this sense, it is worth asking how the brain extracts and deals with the linguistic information embedded in speech sounds; that is, it is important to know the speech codes generated and maintained by the brain.

Neuropsychological theories of speech perception suggest at least two kinds of speech codes: acoustic and articulatory. The former assumes that speech sounds are encoded in terms of their acoustic characteristics (Stevens and Blumstein, 1981; Massaro, 1987; Goldinger, 1997; Johnson, 1997; Coleman, 1998), such that neural activities representing speech sounds are directly modulated by the frequency and duration of the sound waves. The latter, by contrast, regards speech perception as a process in an articulatory domain rather than an acoustic one (Liberman and Mattingly, 1985; Fowler, 1986). In this view, for example, the neural circuits for speech sounds are tuned to vocal-tract gestures and hardly respond to changes in the acoustic sound itself. According to this second theoretical stance, listeners perceive articulatory movements, which are relatively invariant to acoustic changes, rather than acoustic features. In short, it is likely that speaking and listening are tightly coupled with each other, and that both are regulated by the same structural constraints and grammar.

In the same context, we have already found that speech codes can be differentially generated and maintained in distinct neural circuits according to whether the incoming acoustic waves are perceived as meaningful sounds or not (Yoo et al., 2012). We introduced novel sounds containing an ambiguous vowel, which could be perceived as either a word or a pseudoword depending on the interpretation of the vowel. In this way, we could examine how a higher linguistic factor modulates speech codes while the acoustic features of the speech sounds remained unchanged. Interestingly, the perception of meaningless sounds (pseudowords) was supported by articulatory codes separately maintained in the left inferior frontal gyrus (LIFG). This implies that, before learning, speech perception might be supported by articulatory circuits for movement imitation (Iacoboni, 2005; Iacoboni and Dapretto, 2006). Furthermore, if this were the case, articulation or motoric movements are likely to have a role in perceiving sounds other than speech.

In our previous study, neural activity was modeled via the vascular response observed by functional MRI, i.e., the blood-oxygenation-level-dependent (BOLD) signal that follows hemodynamic activity. Neuronal activation causes metabolic changes and, as a result, the amount of deoxyhemoglobin (HHb) changes. As HHb is paramagnetic, this change is observed in T2*-weighted MRI. However, the BOLD contrast is known to be a complex function of cerebral blood flow (CBF), cerebral blood volume (CBV), the cerebral metabolic rate of oxygen (CMRO2), and so on. To describe neural activity more exactly, therefore, we need to measure these parameters independently and investigate how they interact with each other.

Currently, CBF can be measured by perfusion MRI, e.g., arterial spin labeling (ASL) MRI. While the BOLD signal reflects changes in local HHb, CBF measured by perfusion MRI indicates the rate of delivery of metabolic substrates. For this reason, a regional change of CBF (rCBF) is closer to neural activity than a BOLD change. However, perfusion MRI is less sensitive than BOLD and has lower temporal resolution. For CBV measurement, a bolus injection is usually used. If we measure both CBF and CBV independently, we can estimate CMRO2, meaning that we can characterize neural activity in a comprehensive way. However, bolus injection is invasive and therefore impractical for wide use.

As an alternative to the above, we considered functional near-infrared spectroscopy (fNIRS). CBV is known to be proportional to the change in total hemoglobin (Takahashi et al., 1999), and relative CMRO2 is positively correlated with CBF and oxygen saturation (StO2) (Watzman et al., 2000). StO2 is measured as the proportional change of O2Hb over total Hb. This means that fNIRS is a simple way to obtain both CBV and StO2, by measuring O2Hb and HHb with high temporal resolution. It is possible to observe CBF with fNIRS (Elwell et al., 2005), but this sometimes requires injection of a tracer. Instead, we can estimate it from the non-linear relationship (a constant power law) between CBF and CBV (Brown et al., 2003). It is not yet clearly known how CBF and CBV change during neural activation, but greater increases in CBF than in CBV have been reported during neural activation (Ito et al., 2004).
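For illustration, the power-law route from a CBV-like (total-Hb) change to relative CBF can be sketched as follows. The exponent 0.38 is a commonly used Grubb-type value and is our assumption, not a parameter given in this paper.

```python
GRUBB_EXPONENT = 0.38  # commonly assumed value; not specified in this paper

def relative_cbf_from_cbv(cbv_ratio, g=GRUBB_EXPONENT):
    """Invert the power law CBV_ratio = CBF_ratio ** g to estimate
    relative CBF from a relative CBV (total-Hb) change."""
    return cbv_ratio ** (1.0 / g)
```

Because 1/g > 1, a modest CBV increase implies a proportionally larger CBF increase, consistent with the reports of greater CBF than CBV changes during activation.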

In this study, based on these findings, we continued to investigate speech (or sound) processing at the LIFG by observing O2Hb and HHb. Here, we designed a more natural situation of speech communication, i.e., freely listening to various sounds and responding to speech sounds only. The experimental design is somewhat similar to infants' word learning, in that infants selectively mimic human speech sounds out of various environmental sounds. Contrasting the time-varying regional differences in Hb concentration revealed how speech can be distinguished from other sounds, i.e., natural sounds, animal vocalizations, and human non-speech sounds. In addition, verbal repetition of words and pseudowords revealed how meaningful (word) and meaningless (pseudoword) speech are distinguished from each other in terms of Hb concentration changes.

#### **MATERIALS AND METHODS**

#### **SUBJECTS**

Fifteen native Korean adults (9 males and 6 females) aged 19–37 years (mean 25.3) participated voluntarily in this study. Informed consent was obtained from all participants before the experiment. The experimental procedure was approved by the Institutional Review Board of Seoul National University Hospital. All participants had normal hearing and reported no neurological deficits. The subjects completed a questionnaire to assess their handedness according to the Edinburgh Handedness Inventory (Oldfield, 1971), and all were strongly right-handed (scoring 80 or higher).

#### **STIMULI**

The auditory stimuli were prepared in five categories according to their linguistic structure: (1) natural sounds, (2) animal vocalizations, (3) human non-speech sounds, (4) pseudowords, and (5) words (**Table 1**).

The natural sounds were selected from the Pittsburgh Natural Sounds dataset recorded by the Laboratory for Computational Perception and Statistical Learning (CNBC Lab., Carnegie Mellon University, USA). The dataset consists of ambient sounds (rain, wind, streams) with acoustic transients (snapping twigs, breaking wood, rock impacts) recorded around the Pittsburgh region. Recording was carried out using an M-Audio MobilePre-USB 16-bit/48 KS/s USB-powered microphone pre-amp, with all recordings made at 44,100 Hz. Twenty sound files from the dataset were selected, cut to 2 s in length, loudness-normalized, and saved as *.wav* files.

The animal vocalizations were collected from Avisoft Bioacoustics, Germany. The collection covers various animal vocalizations, such as monkey, bird, sheep, horse, and frog. The recordings were made using SENNHEISER microphones (K3/ME80, ME88, K6/ME62, 64, 66, or MKH60) connected to a SONY DAT recorder TCD-D3, Marantz PMD 671, TASCAM DR-1, HD-P2, SONY PCM-M10, PCM-D50, or Fostex FR2-LE. We again selected twenty sound files from the dataset: monkey (4), sheep (1), horse (1), dog (4), wolf (1), mice (2), birds (3), frog (2), and bat (2). All files were cut to 2 s in length and normalized as *.wav* files.

The human non-speech sounds were collected from various web sites. We used twenty sound files: gasp (2), giggle (2), slurp (2), burp (1), cry (1), yawn (2), kiss (2), slurp (2), snore (2), breathe (1), scream (1), and cough (2). All were recorded as *.wav* files and normalized to the same duration (2 s).

The pseudowords were generated by randomly combining several consonants with the vowel /*a*/ in Korean, such that they have no meaning in the Yonsei Korean Corpus 1–9 (Yonsei Korean Dictionary, 1998). The words were selected from the same corpus, with balanced word frequency. All pseudowords and words were four syllables long. The pseudowords and words were spoken by a female native Korean speaker, recorded, and converted into computer files in *.wav* format (22,050 Hz, 16-bit, stereo). The loudness (average RMS level) of all stimuli was normalized (−60 to 0 dB) with sound software (SoundForge; Sony Creative Software Inc.).

The stimuli did not differ significantly in loudness and did not exceed 2 s in total length. As shown in **Table 1**, the stimuli were classified in terms of several linguistic features, i.e., whether they carry linguistic meaning, whether they contain linguistic segments, whether they are produced by the same species (i.e., humans), whether they are vocally produced, and whether they are acoustic sounds.

#### **EXPERIMENTAL PROCEDURES**

Lying on a comfortable table, the subjects were asked to repeat what they heard binaurally via earphones in the case of pseudowords and words, and otherwise simply to listen to the stimuli. The sound volume was adjusted appropriately for comfortable and clear listening. Twenty stimuli were used in each category, so 100 different stimuli across the five categories were presented to each subject. The stimuli from the five categories were pooled and then randomly presented to the subjects in four runs (twenty-five stimuli per run).

One trial consisted of 2 s of perception, 2 s of production (only for pseudowords and words), and 12 s of rest to avoid interference between trials (**Figure 1**). Therefore, the length of one run was 416 s, including an initial 16-s dummy period (6 min 56 s in total). Note that there was no production phase for natural sounds, animal vocalizations, and human non-speech sounds.
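The trial and run arithmetic above works out as follows (a sketch; following **Figure 1**, we assume every trial spanned the same 16 s, with the production slot unused for non-speech stimuli):

```python
PERCEPTION_S, PRODUCTION_S, REST_S = 2, 2, 12
TRIAL_S = PERCEPTION_S + PRODUCTION_S + REST_S   # every trial spans 16 s
TRIALS_PER_RUN = 25
DUMMY_S = 16                                     # initial dummy period

run_s = DUMMY_S + TRIALS_PER_RUN * TRIAL_S       # 416 s per run
minutes, seconds = divmod(run_s, 60)             # 6 min 56 s
```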

#### **DATA ACQUISITION**

During the tasks, the hemodynamic changes at the inferior frontal gyrus (IFG) were monitored bilaterally by functional near-infrared spectroscopy (fNIRS). The LIFG was identified as the locus of the articulatory code recruited during verbal repetition (Yoo et al., 2012), and its right homologue was selected as an experimental control. We used an Oxymon Mark III 8-channel system with a sampling rate of 250 Hz (Artinis, The Netherlands), which measures the oxygenated (O2Hb) and deoxygenated (HHb) hemoglobin concentration changes along the (banana-shaped) optical paths in the brain between the nearest pairs of transmitter and receiver.

The NIRS system emits continuous near-infrared laser light at two wavelengths (763 and 860 nm). We used a 4 × 1 optode configuration for measuring hemodynamic change (**Figure 2**), with each source modulated at a different frequency so that O2Hb and HHb could be detected at two different brain areas, i.e., the left and right inferior frontal gyri (BA47). The activated locus (LIFG, [−22 18 −22]) reported in Yoo et al. (2012) and its right homologue were selected, translated into coordinates of the 10-20 system (10/20 [−1.9 0.87]) on the scalp surface by the Münster T2T-Converter (NRW Research Group for Hemispheric Specialization, Münster University). Within the LIFG, this locus was selected so that stable NIRS signals with relatively high SNR could be measured bilaterally. In addition, we obtained two regionally separated signals in the left and right inferior frontal gyri and compared the results, which made the experimental results easier to interpret.

**FIGURE 1 | Experiment Design.** For speech sounds, i.e., words and pseudowords, the subjects were asked to repeat what they heard. For the other stimuli, i.e., natural sounds, animal vocalizations, and human non-speech sounds, they simply listened to the stimuli. One trial lasted 16 s in length, and each run consisted of 25 trials. Each subject had four separate runs.

**FIGURE 2 | Locus for NIRS monitoring (only the left side is shown here).** To detect Hb concentration changes at the IFG, we positioned one receiver and two transmitter optodes near the IFG (BA47) bilaterally. The transmitters and receiver were separated by 3.5 cm from each other. The travelling pathways of light are determined by the distance between transmitter and receiver, the source wavelengths, the characteristics of the medium (tissue), and so on. The detection depth was adjusted accordingly to focus on the deep gray matter in the inferior frontal gyri.

To detect the hemoglobin concentration changes at these loci, we separated the transmitter and receiver by 3.5 cm on the scalp surface (**Figure 2**) and used a differential path length factor (DPF) of 4, by which we could measure hemodynamic changes in gray matter deeper in the brain (Fukui et al., 2003). Using the modified Beer-Lambert law (Cope and Delpy, 1988), we calculated the concentration changes of oxy- and deoxygenated hemoglobin. In this study, it was difficult to calculate cerebral blood flow and blood volume exactly from the oxy- and deoxygenated hemoglobin concentrations because we did not have all the required parameters. However, we can safely assume that cerebral blood flow (CBF) is largely correlated with the concentration change of oxygenated hemoglobin, whereas cerebral blood volume (CBV) is correlated equally with oxy- and deoxygenated hemoglobin changes (Lammertsma et al., 1984; Edwards et al., 1988). We interpreted the experimental results on the basis of this assumption.
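The modified Beer-Lambert conversion from measured optical-density changes to concentration changes can be sketched as follows. The separation (3.5 cm) and DPF (4) are from the text; the extinction coefficients are illustrative placeholders, not the calibrated values of the Oxymon system:

```python
import numpy as np

# Modified Beer-Lambert law:
#   delta_OD(lambda) = (eps_O2Hb(lambda)*dO2Hb + eps_HHb(lambda)*dHHb) * d * DPF
# With two wavelengths this is a 2x2 linear system in (dO2Hb, dHHb).

D_CM = 3.5   # transmitter-receiver separation on the scalp (cm)
DPF = 4.0    # differential path length factor used in the study

# Illustrative extinction coefficients [1/(mM*cm)] at 763 and 860 nm
# (rows: wavelength; columns: O2Hb, HHb) -- placeholder values only.
EPS = np.array([[0.7, 1.6],    # 763 nm: HHb absorbs more
                [1.2, 0.8]])   # 860 nm: O2Hb absorbs more

def mbll(delta_od):
    """Convert optical-density changes at the two wavelengths into
    (dO2Hb, dHHb) concentration changes (mM) via the 2x2 system."""
    return np.linalg.solve(EPS * D_CM * DPF, np.asarray(delta_od, float))

d_o2hb, d_hhb = mbll([0.01, 0.02])  # example OD changes
```

Solving the two-wavelength system jointly is what lets one instrument report both O2Hb and HHb from the same optical path.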

#### **DATA ANALYSIS**

The acquired data were analyzed as follows. At the four optode sites, the NIRS system provided oxygenated (O2Hb) and deoxygenated (HHb) hemoglobin concentrations calculated by the modified Beer-Lambert law (Cope and Delpy, 1988). In one session, the time-series signal at each optode comprised twenty-five trials randomly drawn from the five conditions. For each subject, we collected one hundred trials (20 trials × 5 conditions) across the four sessions. The collected time-varying signals (2 signals × 4 optodes) were low-pass filtered with a cutoff frequency of 10 Hz (5th-order Butterworth filter) to remove high-frequency noise and motion artifacts. As we aimed to examine the difference in neural responses between the left and right IFG, the signals (O2Hb and HHb) at the two ipsilateral optodes in each hemisphere were averaged to obtain a higher signal-to-noise ratio (SNR). Accordingly, we obtained time-varying data consisting of 2 signals × 2 hemispheres per subject.
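A minimal sketch of this preprocessing step (filter parameters are from the text; the array layout, optode-to-hemisphere assignment, and the use of zero-phase filtering are assumptions):

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 250.0  # NIRS sampling rate (Hz)

def preprocess(signals):
    """signals: array of shape (4 optodes, 2 Hb measures, n_samples).
    Low-pass filter at 10 Hz (5th-order Butterworth) and average the
    two ipsilateral optodes per hemisphere to raise SNR.
    Returns an array of shape (2 hemispheres, 2 Hb measures, n_samples)."""
    b, a = butter(5, 10.0, btype="low", fs=FS)
    # filtfilt gives zero-phase output (an assumption; a causal filter
    # would shift the signal slightly in time).
    filtered = filtfilt(b, a, signals, axis=-1)
    left = filtered[:2].mean(axis=0)    # assume optodes 0,1 are left IFG
    right = filtered[2:].mean(axis=0)   # assume optodes 2,3 are right IFG
    return np.stack([left, right])
```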

Then, the O2Hb and HHb signals at the left and right IFG were aligned at stimulus onset to obtain event-locked responses, from which we calculated the hemodynamic response function (HRF). The HRF was estimated simply by averaging all event-locked trials in the five conditions for the O2Hb and HHb signals, respectively; given the differences in peak timing between categories, there was some jitter. We found that the time courses of our HRFs to auditory stimuli peaked between 5 and 6 s after stimulus onset, comparable to the canonical HRF specified in most fMRI studies (Friston et al., 1995). This implies that the hemodynamic responses observed in this study are reliable enough to serve as an indicator of neural activity, so we deemed the acquired data suitable for further analysis.
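The event-locked averaging used to estimate the HRF can be sketched as follows (the onset list, the 16 s epoch length, and the onset-value baselining are assumptions for illustration):

```python
import numpy as np

FS = 250       # sampling rate (Hz)
EPOCH_S = 16   # one trial lasts 16 s

def event_locked_average(signal, onsets_s):
    """Align a continuous Hb signal at the stimulus onsets and average
    across trials to estimate the hemodynamic response function (HRF)."""
    n = int(EPOCH_S * FS)
    epochs = [signal[int(t * FS): int(t * FS) + n] for t in onsets_s]
    epochs = [e - e[0] for e in epochs]   # baseline to the stimulus-onset value
    return np.mean(epochs, axis=0)

# Peak latency of the estimated HRF (the study reports 5-6 s post-onset):
# hrf = event_locked_average(o2hb, onsets)
# peak_s = np.argmax(hrf) / FS
```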

Next, we divided the O2Hb and HHb signals in the left and right hemispheres into the five categories. We averaged the twenty event-locked word trials in each hemisphere to produce a single, filtered time-point response for word repetition. The same averaging was applied to the twenty event-locked pseudoword trials. Similarly, we obtained the averaged neural responses for natural sounds, animal vocalizations, and human non-speech sounds at each optode. As there were two measures (O2Hb and HHb) in each hemisphere, this yielded twenty event-locked temporal responses per subject (2 Hb measures × 2 loci × 5 categories).

Before statistical analysis, we calculated the proportional change of O2Hb over total Hb (the sum of O2Hb and HHb), which is thought to be correlated with the CBF change. For the statistical analysis, we first contrasted this proportional change between the two speech-sound categories, i.e., words and pseudowords, across the fifteen subjects. Considering the hemodynamic delay of the neural responses (5–6 s after stimulus onset), we used time-binned signals spanning the HRF peaks, i.e., from 4 to 7 s (bin size = 3 s). The difference between words and pseudowords was tested by a two-way analysis of variance (ANOVA), with category and optode site as the two independent variables. Within the non-speech sounds, we contrasted the proportional change of O2Hb over total Hb in the same manner.
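The time-binning and the category-by-locus contrast can be sketched as below. This is a balanced fixed-effects two-way ANOVA for illustration only; the *F*(1, 14) values reported later suggest the authors actually used a repeated-measures design, and the factor names are assumptions:

```python
import numpy as np
from scipy.stats import f as f_dist

def peak_bin_mean(hrf, fs=250, t0=4.0, t1=7.0):
    """Mean proportional O2Hb response in the 4-7 s bin spanning the HRF peak."""
    return float(np.mean(hrf[int(t0 * fs):int(t1 * fs)]))

def two_way_anova(y, fac_a, fac_b):
    """Balanced fixed-effects two-way ANOVA; returns {effect: (F, p)}."""
    y, fac_a, fac_b = np.asarray(y, float), np.asarray(fac_a), np.asarray(fac_b)
    A, B = np.unique(fac_a), np.unique(fac_b)
    n_cell = len(y) // (len(A) * len(B))     # replicates per cell (balanced)
    grand = y.mean()
    ss_a = n_cell * len(B) * sum((y[fac_a == a].mean() - grand) ** 2 for a in A)
    ss_b = n_cell * len(A) * sum((y[fac_b == b].mean() - grand) ** 2 for b in B)
    ss_cells = n_cell * sum((y[(fac_a == a) & (fac_b == b)].mean() - grand) ** 2
                            for a in A for b in B)
    ss_ab = ss_cells - ss_a - ss_b           # interaction sum of squares
    ss_err = ((y - grand) ** 2).sum() - ss_cells
    df_a, df_b = len(A) - 1, len(B) - 1
    df_err = len(y) - len(A) * len(B)
    ms_err = ss_err / df_err
    results = {}
    for name, ss, df in [("category", ss_a, df_a), ("locus", ss_b, df_b),
                         ("interaction", ss_ab, df_a * df_b)]:
        F = (ss / df) / ms_err
        results[name] = (F, float(f_dist.sf(F, df, df_err)))
    return results
```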

As an indicator of CBV, the total Hb change was calculated from the averaged time-point responses of O2Hb and HHb for each category. Since the magnitude of the total Hb change varied considerably across categories, we normalized it by subtracting the mean of the total Hb and then dividing by its variance. In this way, the maximum value was kept below two for all categories and the dynamic range of the total Hb change was equalized across categories. With the normalized signals from the fifteen subjects, we again conducted a two-way ANOVA within the speech sounds (2 categories × 2 loci). The same statistical analysis was applied to the normalized signals within the non-speech sounds.
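The normalization described above can be sketched as follows (note that the text specifies division by the variance rather than the standard deviation, so the sketch follows the text):

```python
import numpy as np

def normalize_total_hb(hbt):
    """Normalize a total-Hb time course as described above:
    subtract the mean, then divide by the variance."""
    hbt = np.asarray(hbt, float)
    return (hbt - hbt.mean()) / hbt.var()
```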

Lastly, we contrasted the total Hb change for speech with that for non-speech. To this end, we calculated the averaged total Hb (HbT) change for words and pseudowords, and the HbT change for the non-speech sounds (natural sounds, animal vocalizations, and human non-speech sounds). As explained above, we normalized them before statistical analysis. Across the fifteen subjects, we conducted a two-way ANOVA (2 categories × 2 loci). All statistical tests used a significance level of α = 0.05.

#### **RESULTS**

We first examined the hemodynamic responses at the bilateral inferior frontal gyri (IFG, BA47) while subjects repeated speech (pseudowords and words) and listened to non-speech sounds (natural sounds, animal vocalizations, and human non-speech sounds). The results show that speech sounds evoked higher hemodynamic responses at the left inferior frontal gyrus than at its right homologue, in terms of the percent change of O2Hb concentration over total Hb change (**Figure 3B**). In contrast, non-speech sounds evoked relatively small hemodynamic responses at the same locus (**Figure 3A**). At the right inferior frontal gyrus, however, we found little hemodynamic response to either speech or non-speech sounds. Among the stimuli tested in the current study, it thus appears that only speech sounds evoke a regional increase in cerebral blood flow at the left inferior frontal gyrus.

To support this observation, we statistically analyzed the results and confirmed the difference between stimulus types and loci. In the hemodynamic responses to speech sounds, it is notable that the O2Hb concentration changes evoked by pseudoword repetition were higher than those evoked by word repetition during the 4–7 s time window (shaded areas) after stimulus onset (**Figure 3B**). This is reminiscent of the functional MRI finding that the left inferior frontal gyrus is reserved for articulatory speech codes of pseudowords (Yoo et al., 2012). The difference was statistically significant at α = 0.05 [*F*(1, 14) = 5.95, *p* = 0.0162; 4–7 s time window]. Consistently, the shape of the response at the left inferior frontal gyrus is more similar to the canonical HRF (Friston et al., 1995), indicating that repeating pseudowords might recruit more neural circuits in this region. This result also indicates that there was a large blood supply of O2Hb to compensate for the O2 consumed by neural activity. In contrast, compared with the value at stimulus onset, we found no such Hb concentration change at the right inferior frontal gyrus for either words or pseudowords.

**FIGURE 3 | (A)** … non-speech sounds, and natural sounds were shown. **(B)** The percent change of O2Hb over total Hb concentration by verbal repetition of …

For non-speech sounds, compared with the value at stimulus onset, no O2Hb change over total Hb concentration was found at either the left or right inferior frontal gyrus (**Figure 3A**). In comparison with the 1–4 s time window (before the shaded areas), the hemodynamic responses in the shaded areas (4–7 s after stimulus onset) were not significantly different at α = 0.05 [*F*(2, 14) = 2.13, *p* = 0.1501 for the left hemisphere; *F*(2, 14) = 0.19, *p* = 0.6627 for the right hemisphere]. In addition, for non-speech sounds, no main effects of sound type or optode position, and no interaction between them, were found at α = 0.05 for the same 4–7 s time window (shaded areas) after stimulus onset [*F*(2, 14) = 1.13, *p* = 0.2901 for sound types; *F*(2, 14) = 0.6, *p* = 0.5508 for optodes; *F*(2, 14) = 0.6, *p* = 0.5494 for the interaction].

Interestingly, non-speech sounds increased total Hb concentration at the left inferior frontal gyrus, whereas speech sounds did not change total Hb concentration (**Figure 4**). At the right inferior frontal gyrus, both speech and non-speech significantly decreased total Hb concentration. The difference between speech and non-speech at the left inferior frontal gyrus was significant during the 2–6 s time window (shaded areas) after stimulus onset [*F*(1, 14) = 6.58, *p* = 0.012 for sound types; *F*(1, 14) = 4.12, *p* = 0.0447 for optodes; *F*(1, 14) = 0.57, *p* = 0.4536 for the interaction]. The time-to-peak (TTP) of the hemodynamic response function is variable, i.e., about 5–6 s, and in this study the TTP for non-speech sounds was slightly shorter than that for speech sounds. For this reason, we shifted the time window for the non-speech sounds in the statistical analysis to ensure that the peak lay in the middle of the comparison window.

**FIGURE 4 |** … anm, animal vocalizations; hmn, human non-speech sounds; ntl, natural sounds; wrd, words; pwd, pseudowords.

Note that listening to non-speech sounds increased total Hb concentration at the LIFG, whereas the same task did not change the O2Hb change over total Hb concentration at that locus. For speech sounds, the opposite pattern was observed: speech sounds increased the O2Hb change over total Hb concentration, but did not change total Hb concentration at the LIFG. Interestingly, we found negative changes for both speech and non-speech sounds in terms of total Hb concentration, but little change in terms of the O2Hb change over total Hb concentration for either. All of this implies that regional CBV and CBF, which have time-varying characteristics, might be dissociated from each other by neural activity; they are hardly separable in fMRI measuring BOLD signals.

During verbal repetition of words and pseudowords, we found little change in total Hb concentration at the left inferior frontal gyrus (**Figure 5B**). Total Hb concentration decreased for both words and pseudowords at the right inferior frontal gyrus. The difference between words and pseudowords was not significant at either the left or right inferior frontal gyrus, and there was no interaction between sound type and optode position during the 2–6 s time window (shaded areas) after stimulus onset [*F*(1, 14) = 1.78, *p* = 0.1849 for sound types; *F*(1, 14) = 0.29, *p* = 0.5911 for optodes; *F*(1, 14) = 0.35, *p* = 0.5525 for the interaction].

In the case of non-speech sounds, we observed increases in total Hb concentration for natural sounds at the left inferior frontal gyrus (**Figure 5A**). Human non-speech sounds also evoked a small increase in total Hb concentration at the same locus, but there was no change for animal vocalizations. At the right inferior frontal gyrus, however, only human non-speech sounds increased total Hb concentration, while the other sounds decreased it. In other words, human non-speech sounds increased cerebral blood volume at the bilateral inferior frontal gyri, implying a change in cerebral blood flow (**Figure 3A**).

This result implies that non-speech sounds can increase cerebral blood volume at the left inferior frontal gyrus, but not at the right. Statistical analyses showed that the difference between the left and right inferior frontal gyri was significant at α = 0.05 during the 2–6 s time window (shaded areas) after stimulus onset. However, there was no main effect of sound type, nor any interaction between sound type and optode [*F*(2, 14) = 3.97, *p* = 0.0478 for sound types; *F*(2, 14) = 1.8, *p* = 0.1687 for optodes; *F*(2, 14) = 2.05, *p* = 0.1325 for the interaction].

#### **DISCUSSION**

We examined hemodynamic responses at the bilateral inferior frontal gyri (IFG, BA47) while subjects verbally repeated speech sounds. We observed that the percent change of O2Hb concentration over total Hb was significantly higher for pseudowords than for words at the left inferior frontal gyrus (LIFG). This result is consistent with the previous findings of Yoo et al. (2012). Interestingly, we also found significant increases in total hemoglobin concentration at the LIFG even during passive listening to various non-speech sounds, which provides new insight into sound perception in verbal repetition.

#### **ARTICULATION-BASED CODE AT LEFT INFERIOR FRONTAL GYRUS**

The main purpose of this study was to re-examine the findings of Yoo et al. (2012), i.e., whether pseudowords, in contrast to words, are differentially represented in the left inferior frontal gyrus (LIFG). In that fMRI study, we suggested that unfamiliar speech sounds such as pseudowords might use articulatory codes based on sound imitation at the LIFG, and that this is not the case in word repetition. In this context, we expected that the percent change of O2Hb concentration over total Hb at the LIFG, analogous to the BOLD signal change in fMRI, would be significantly higher for pseudowords than for words, implying greater changes in oxygen saturation level from neural activation and hence abrupt increases in CBF to compensate for this change (Watzman et al., 2000). This expectation was exactly borne out in the present study.

**FIGURE 5 |** … standard deviation. LIFG, left inferior frontal gyrus; RIFG, right inferior frontal gyrus; anm, animal vocalizations; hmn, human non-speech sounds; ntl, natural sounds; wrd, words; pwd, pseudowords.

The region investigated in this study is slightly displaced from the peak locus found in Yoo et al. (2012). Nevertheless, there were regional changes during the repetition of meaningless sounds, meaning that the change in regional cerebral blood flow (rCBF) estimated by fNIRS conformed to the regional BOLD signal change measured by fMRI. It also implies that the LIFG is likely reserved locally as temporary storage for the speech codes of pseudowords during verbal repetition (Yoo et al., 2012). In this respect, it is notable that BA47 is known as part of the speech production circuitry involved in fluency control (Brown et al., 2005; Kell et al., 2009). This is partly consistent with our finding, in which preparing articulatory codes for fluent speech before learning pseudowords would be critical.

It is also notable that there were relatively small but considerable increases in O2Hb concentration during word repetition at the LIFG. In this case, articulatory coding was likely initiated automatically at the LIFG while perceiving words. Unfortunately, owing to the limited number of fNIRS channels in this study, we could not measure the O2Hb change at the left middle temporal gyrus (LMTG), which is supposed to be a center of the acoustic-phonetic codes of words (Yoo et al., 2012). According to our previous results, however, it is more likely that the acoustic-phonetic codes at the LMTG took precedence over the articulation-based codes at the LIFG for words. That is, two distinct neural activities at the LIFG and LMTG seem to be evoked simultaneously when perceiving words.

This is partly because the LIFG serves as a speech parser that detects word segmentation in continuous speech sounds (McNealy et al., 2006). McNealy and colleagues observed left-lateralized signal increases in temporal cortices only when parsing continuous sounds with statistical regularities, a precursor of words. More importantly, they found that neural activities at the LIFG and LMTG were positively correlated with the implicit detection of word boundaries, i.e., the detection of speech cues. That is, the LIFG might act as a speech segmentation circuit recruited automatically before auditory lexical retrieval is completed at the LMTG (Marslen-Wilson, 1987).

On the other hand, the LIFG is known as part of the human mirror neuron system, supposed to be the neural correlate of the imitation mechanism (Iacoboni, 2005; Iacoboni and Dapretto, 2006). This notion fits well with the articulation-based sound perception discussed above, in that unfamiliar sounds tend to be imitated for verbal repetition. In the same context, the O2Hb change during word repetition observed in the 4–7 s time window at the LIFG likely originated from an analysis-by-synthesis facility for perceiving incoming speech sounds (Cohen et al., 1988).

The higher response to pseudowords might be accounted for by other causes, such as pseudowords being harder to memorize and repeat than words. To ensure that the subjects could hear the stimuli clearly, we carefully adjusted the loudness of the stimuli for each subject and minimized environmental noise during the task. No subject reported problems hearing the pseudowords in the practice sessions conducted before the experiment. The syllable length of the pseudowords was four, the same as that of the words, which is below the capacity of verbal short-term memory (Miller, 1956). Therefore, we assume that pseudowords were no more difficult to repeat than words in terms of syllable length. Repetition occurred 2 s after listening to the stimulus, which is well within the duration of verbal short-term memory.

Another possibility is that the novelty of pseudowords might enhance their hemodynamic response. The human brain can detect novel events at the sub-cortical level by encoding regularities in the recent auditory past (Slabu et al., 2012), but the pseudowords used in this study were not novel in this sense because each syllable was a legitimate syllable in current use in Korean. At the cortical level, articulating pseudowords might evoke novelty effects because there is no corresponding entry in the mental lexicon for these sounds. That is, at this level, novelty is introduced by generating articulatory codes, which is exactly what we expected in this study (Yoo et al., 2012).

Lastly, note that a second positive peak was observed in both word and pseudoword repetition (**Figure 3B**). The peak for pseudoword repetition occurred at about 11.39 s after stimulus onset, followed by that for word repetition at about 12.28 s. These second peaks seem to reflect speech production after listening to the sounds. Consistent with this notion, no second peaks were found for non-speech sounds, because the subjects passively listened to them without verbal repetition. The small phase difference of the second peaks between words and pseudowords might be due to the difference in the preceding perceptual events.

#### **PERCENT O2Hb CHANGE vs. TOTAL Hb CHANGE**

It is interesting that no peaks were found in the proportion of O2Hb change over total Hb concentration for non-speech sounds, even though we found significant O2Hb change at the LIFG (**Figure 3**). In terms of total Hb change, however, non-speech sounds evoked large peaks at the LIFG while speech sounds did not change total Hb concentration (**Figure 4**). It therefore seems that total Hb change, as well as percent O2Hb change, is important for describing neural activity, indicating a strong nonlinear relationship between neural activity and the hemodynamic response (Brown et al., 2003).

The above finding can hardly be observed with BOLD-fMRI. BOLD-fMRI measures mainly CBV-related changes, whereas fNIRS can estimate both CBV and CBF by measuring the hemodynamic changes of HHb and O2Hb at the same time. In practice, BOLD-fMRI is likely to show more artifacts as apparent neural activity than ASL-fMRI, which measures regional changes in CBF, despite their high congruency in activation patterns (Kemeny et al., 2005). The main reason BOLD-fMRI overestimates neural activity is that BOLD contrast results from neurovascular coupling determined by numerous physiological events, e.g., blood oxygenation, cerebral blood flow (CBF), and cerebral blood volume (CBV) (Buxton et al., 1998; Logothetis, 2002).

To overcome this technical limitation, it is highly desirable that fMRI methods based on BOLD contrast be used in combination with other methods, e.g., ASL-fMRI, to examine changes in blood oxygenation and CBF (Detre and Wang, 2002). Multi-modal imaging also helps overcome the spatial and temporal limitations of measurement, but it is usually very complex and not cost-effective. Huppert and his colleagues showed that the temporal dynamics of the BOLD response correlate well with the NIRS measure of HHb, indicating that fNIRS may be used as an alternative to fMRI (Huppert et al., 2006). In addition, fNIRS can estimate the cerebral metabolic rate so as to separate CBV and CBF (Boas et al., 2003).

The total Hb change measured by fNIRS is generally thought to reflect the change in regional cerebral blood volume (rCBV), i.e., it is proportional to rCBV (Villringer and Chance, 1997; Takahashi et al., 1999). In this study, we found that total Hb increased only while listening to non-speech sounds at the LIFG, compared with speech sounds (**Figure 4**). This replicates our previous BOLD-fMRI study, in that BOLD-fMRI tends to reflect regional changes in CBV. That is, neural circuits for non-speech sounds are subserved by changes in rCBV rather than rCBF; rCBV thus seems more important for generating and maintaining articulatory codes for non-speech sounds (Yoo et al., 2012).

This abrupt change in rCBV at the LIFG was not observed for speech sounds. Instead, the percent Hb change over total Hb was significantly higher for speech sounds (**Figure 3**). The percent Hb change over total Hb concentration is positively correlated with the cerebral metabolic rate of oxygen consumption (CMRO2), and rCBF and CMRO2 are coupled with each other during cognitive tasks (Hoge et al., 1999; Watzman et al., 2000). This means that repeating speech sounds evokes a greater rCBF change at the LIFG than listening to non-speech sounds does.

fMRI cannot discriminate an increase in rCBF from one in rCBV, because rCBF increases follow not only neural activation but also cerebral vasodilation during the systolic phase of the cardiac cycle (Lerch et al., 2012). However, the relation between changes in rCBF and rCBV seems to be decoupled during neural deactivation, indicating that different mechanisms may underlie them (Ito et al., 2004). This suggests that speech and non-speech sounds are differentially processed in neural circuits at the LIFG during the deactivated phase. In addition, discordant rCBF and rCBV responses are often reported in pharmacological MRI (Luo et al., 2009), which also suggests that the neural circuits at the LIFG operate differently for speech and non-speech sounds in terms of oxygen metabolism.

#### **SOUND PERCEPTION AT BILATERAL INFERIOR FRONTAL GYRI**

Speech perception has traditionally been considered in the sensory or acoustic domain. Recently, however, theories based on non-sensory domains have emerged to account for the neural mechanisms of speech perception. For example, the motor theory suggests that listeners perceive not the acoustic features but the abstract intended gestures required to articulate the sounds (Liberman and Mattingly, 1985). As another variant of the motor theory, direct realism tries to account for speech perception as the perception of actual vocal tract gestures using information in the acoustic signal (Fowler, 1986). These accounts all presuppose that perceiving sounds intrinsically involves motoric movements (Fadiga et al., 2002).

Since Broca's seminal discovery, the left inferior frontal gyrus (LIFG) has been reported as the center of fluent, articulated speech production as well as of speech comprehension (Caramazza and Zurif, 1976). This means that speech perception partly depends on the LIFG. Our results further suggest that the LIFG might also play a role in perceiving non-speech sounds. The non-speech sounds used in this study, e.g., natural sounds and animal vocalizations, are not articulable by the human vocal organs, so it is unlikely that the subjects subvocally articulated them during passive listening. Nevertheless, there were significant hemodynamic changes at the LIFG during the perception of non-speech sounds, comparable to those for speech sounds in terms of total Hb change (**Figures 4**, **5A**).

In this regard, it has been reported that stimulus expectancy can modulate the inferior frontal gyrus during passive auditory perception (Osnes et al., 2012). It is still debatable whether the LIFG plays an essential or merely modulatory role in auditory perception, but motoric involvement is at least important in the top-down control of auditory perception, such as emotional arousal (Scott et al., 2009). This notion is also supported by various sensorimotor integration mechanisms (Wilson et al., 2004; Pulvermüller et al., 2006; Wilson and Iacoboni, 2006). In addition, neural activity at the LIFG can predict individual differences in the perceptual learning of cochlear-implant patients (Eisner et al., 2010), indicating that learning sound perception partly depends on the LIFG.

However, the hemodynamic modulation by sound type at the LIFG is not easy to interpret (**Figure 5A**). It likely reflects the degree of internally simulated articulation used to perceive incoming sounds, but this is not certain. Nevertheless, it should be noted that human non-speech sounds, unlike the other sounds, uniquely modulated total Hb changes at the bilateral IFG. It is possible that both left and right inferior frontal gyri responded to emotional stimuli and, as a result, the autonomic nervous system (ANS) was activated; the activated ANS might then change blood pressure and flow. To investigate this further, we would need to cover the whole brain with more NIRS channels, which could specify whether the change arises from a local or a global hemodynamic response.

In line with the emotional-processing view, Hoekert and her colleagues revealed that both left and right inferior frontal gyri are involved in processing emotional prosody in speech (Hoekert et al., 2010). Another study, of patients with supranuclear palsy, reported that gray matter atrophy in the RIFG correlates significantly with deficits in voice emotion recognition and theory of mind, indicating that the RIFG is associated with prosodic auditory emotion recognition (Ghosh et al., 2012). That is, the bilateral changes in total Hb concentration evoked by listening to human non-speech sounds seem to be partly due to non-speech processing in speech perception.

Putting it all together, articulatory circuits at the LIFG are involved in sound as well as speech perception. Auditory-motor integration likely developed in parallel with the cognitive demands of organizing incoming sounds into perceptually meaningful elements (Westerman and Miranda, 2002; Kuhl, 2004). Auditory-motor integration is also essential in social communication, transferring the non-verbal emotional states of others (Warren et al., 2006). Therefore, the hemodynamic changes at the LIFG suggest that auditory perception is in part supported by motoric representation, namely articulation-based sound perception.

#### **ACKNOWLEDGMENTS**

This study was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (20120005823).

#### **REFERENCES**


Logothetis, N. K. (2002). The neural basis of the blood-oxygen-level-dependent functional magnetic resonance imaging signal. *Philos. Trans. R. Soc. Lond. B* 357, 1003–1037. doi: 10.1098/rstb.2002.1114


Villringer, A., and Chance, B. (1997). Non-invasive optical spectroscopy and imaging of human brain function. *Trends Neurosci.* 20, 435–442. doi: 10.1016/S0166-2236(97)01132-6


Yoo, S., Chung, J.-Y., Jeon, H.-A., Lee, K.-M., Kim, Y.-B., and Cho, Z.-H. (2012). Dual routes for verbal repetition: articulation-based and acoustic-phonetic codes for pseudoword and word repetition, respectively. *Brain Lang.* 122, 1–10. doi: 10.1016/j.bandl.2012.04.011

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 17 May 2013; paper pending published: 19 June 2013; accepted: 18 August 2013; published online: 05 September 2013.*

*Citation: Yoo S and Lee K-M (2013) Articulation-based sound perception in verbal repetition: a functional NIRS study. Front. Hum. Neurosci. 7:540. doi: 10.3389/fnhum.2013.00540*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Yoo and Lee. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Mapping a lateralization gradient within the ventral stream for auditory speech perception

#### *Karsten Specht1,2\**

*<sup>1</sup> Department of Biological and Medical Psychology, University of Bergen, Bergen, Norway*

*<sup>2</sup> Department for Medical Engineering, Haukeland University Hospital, Bergen, Norway*

#### *Edited by:*

*Matthew A. Lambon Ralph, University of Manchester, UK*

#### *Reviewed by:*

*Sophie K. Scott, University College London, UK*

*Steve Majerus, Université de Liège, Belgium*

#### *\*Correspondence:*

*Karsten Specht, Department of Biological and Medical Psychology, University of Bergen, Jonas Lies vei 91, 5009 Bergen, Norway e-mail: karsten.specht@psybp.uib.no*

Recent models of speech perception propose a dual-stream processing network, with a dorsal stream, extending from the posterior temporal lobe of the left hemisphere through inferior parietal areas into the left inferior frontal gyrus, and a ventral stream that is assumed to originate in the primary auditory cortex in the upper posterior part of the temporal lobe and to extend toward the anterior part of the temporal lobe, where it may connect to the ventral part of the inferior frontal gyrus. This article describes and reviews the results of a series of complementary functional magnetic resonance imaging studies that aimed to trace the hierarchical processing network for speech comprehension within the left and right hemisphere, with a particular focus on the temporal lobe and the ventral stream. As hypothesized, the results demonstrate a bilateral involvement of the temporal lobes in the processing of speech signals. However, an increasing leftward asymmetry was detected from auditory–phonetic to lexico-semantic processing and along the posterior–anterior axis, thus forming a "lateralization" gradient. This increasing leftward lateralization was particularly evident for the left superior temporal sulcus and more anterior parts of the temporal lobe.

**Keywords: ventral stream, fMRI, speech perception, auditory perception, temporal lobe**

#### **INTRODUCTION**

Research on speech perception, language, and human communication has a long history in science, has remained a topical subject through the centuries and, with the advent of neuroimaging methods, has grown into an even broader research field over the last two decades (Price, 2012). The first important contributions to our current view of the neuroanatomy of language came from the French physician, anatomist, and anthropologist Pierre Paul Broca (1824–1880) and the German physician, anatomist, psychiatrist, and neuropathologist Carl Wernicke (1848–1905). Broca was the first to describe an association between a language deficit and damage to a specific frontal brain area, now referred to as "Broca's area" (Dronkers et al., 2007), while Wernicke noticed that lesions of the posterior part of the left superior temporal gyrus (STG) could also cause language disorders, even though these differed substantially from the deficits caused by frontal lesions (Wernicke, 1874). In a review published in 1885, Lichtheim developed a model of aphasia, proposing that the posterior area of the temporal lobe is involved in the comprehension of language and the anterior area of the temporal lobe in its expression and production, while an anatomically less defined area was thought to process concepts (Lichtheim, 1885). This early model could thereby attribute various forms of lesion-induced aphasia to one of these areas, or to damaged connections between them. This model from the end of the nineteenth century was based mainly on clinical observations and neuroanatomical examinations. The majority of later neurological models of language processing focused on the arcuate fasciculus as the dominating fiber tract (Ueno et al., 2011; Weiller et al., 2011). With the advent of

functional *in vivo* measurements, such as electrophysiological and imaging techniques, this view has been revised, and the most recent models of speech perception propose a dual-stream processing network (Hickok and Poeppel, 2004, 2007; Scott and Wise, 2004), with a dorsal stream, comparable to the classical language network, and an additional ventral stream. The dorsal stream extends from the posterior temporal lobe of the left hemisphere through inferior parietal areas into the left inferior frontal gyrus, also including premotor areas. Anatomically, this hypothesized stream mainly follows the arcuate fasciculus, connecting the temporal and inferior parietal lobes with the inferior frontal gyrus, and possesses three distinct branches in the left hemisphere (Catani et al., 2007). The second stream is the ventral stream, which is assumed to originate in the upper posterior part of the temporal lobe and to extend toward the anterior part of the temporal lobe, where it also connects to the ventral part of the inferior frontal gyrus through the uncinate fasciculus and extreme capsule (Saur et al., 2008; Weiller et al., 2011). Confirming evidence for this dual-stream perspective comes from several neuroimaging studies, presented in a recent review by Price (2012) that summarizes the attempts over the last 20 years to map speech perception processes using different neuroimaging methods and paradigms. Furthermore, neurocomputational models deliver additional evidence for the dual-pathway model, with a dorsal pathway that maps sounds to motor programs and is thus important for repetition, and a ventral pathway that is important for the extraction of meaning (Ueno et al., 2011).

Building on the work above, this article describes and reviews the results from a series of complementary functional magnetic resonance imaging (fMRI) and positron emission tomography

"fnhum-07-00629" — 2013/9/30 — 17:41 — page 1 — #1

(PET) studies that aimed to trace the hierarchical processing network for speech comprehension within the left and right hemisphere, with a particular focus on the temporal lobe and the ventral stream. To achieve this goal, the work presented here starts with studies exploring pure auditory processing within the primary and secondary auditory cortex, continues with studies on the processing of vowels and consonants, and concludes with studies on the perception of syllables and the processing of lexical, semantic, and sentence information. These processes are the core processes for decoding speech and extracting its meaning and are thus essential for communicative abilities. They are assumed to be subserved by the ventral stream, which is accordingly an important part of the speech and language network, involved in both the perception and production of speech.

However, exploring auditory and, in particular, speech perception poses a specific challenge. Unlike visual information, auditory information is stretched over time, and spectro-temporal characteristics are the information carriers. Based on the resonance frequencies of the vocal tract, characteristic patterns emerge that are important for identifying a sound as a speech sound. Several parameters interact. For example, a vowel, e.g., an /a/, is dominated by a constant intonation and constant pitch of the voice. By contrast, an unvoiced stop consonant is dominated by a sound produced by the sudden stop of airflow within the vocal tract, and it is characterized by its place of articulation and its voice onset time (VOT; Benkí, 2001). Depending on the configuration of the vocal tract, this results in a very characteristic sound – or noise burst – for a stop consonant, e.g., a /t/. The voiced consonant /d/ has a very similar configuration of the vocal tract with respect to the placement of the tongue, the opening of the mouth, etc. However, a /d/ lacks the acoustically prominent stop of airflow of the /t/ and instead shows an earlier onset of voicing before a following vowel, thus making it possible to differentiate a /da/ from a /ta/. These two syllables therefore share the same place of articulation but differ in their VOT. A similar relationship holds for the syllable pairs /ba/–/pa/ and /ga/–/ka/. The described differences between, for example, the consonant–vowel (CV) syllables /da/ and /ta/ are easily visible in spectrograms. It is not only the spectro-temporal difference between, for example, a stop consonant and a vowel that is characteristic for a speech sound, but also the temporo-spectral sub-structure known as "the formants." All voiced speech sounds are characterized by these formants, which are resonance frequencies of the vocal tract.
In a spectrogram, the formants appear as distinguishable sub-structures in the lower frequency range and are the same for /da/ and /ta/. Since CV syllables are important building blocks in several languages, they are often used to study basic speech perception processes, for example in dichotic listening tasks (Rimol et al., 2006a; Sandmann et al., 2007; Hugdahl et al., 2009). Therefore, all or some of the six CV syllables /ba/, /da/, /ga/, /ka/, /pa/, and /ta/ were used as test stimuli in some of the studies presented here (Rimol et al., 2005; van den Noort et al., 2008; Specht et al., 2009; Osnes et al., 2011b).
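The spectro-temporal structure described above can be made concrete with a spectrogram. The following minimal Python sketch (assuming NumPy and SciPy are available) builds a crude synthetic stand-in for a /ta/-like syllable – a brief noise burst superimposed on two sinusoids at illustrative formant frequencies, not a recorded syllable – and computes its spectrogram:

```python
import numpy as np
from scipy.signal import spectrogram

rng = np.random.default_rng(0)
fs = 16000                       # sampling rate in Hz
t = np.arange(0, 0.3, 1 / fs)    # 300 ms of signal

# Crude stand-in for a voiced vowel: two sinusoids at illustrative
# formant frequencies (F1 = 700 Hz, F2 = 1200 Hz, ballpark values for /a/).
vowel = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

# A stop-consonant release is approximated by a short broadband noise burst.
burst = np.zeros_like(t)
burst[: int(0.01 * fs)] = rng.standard_normal(int(0.01 * fs))

cv = burst + vowel  # toy /ta/-like consonant-vowel signal

# The spectrogram exposes the spectro-temporal structure:
# f = frequency bins, seg_t = time bins, Sxx = power per (f, t) cell.
f, seg_t, Sxx = spectrogram(cv, fs=fs, nperseg=256)
print(Sxx.shape)  # (frequency bins, time bins)
```

Plotting `Sxx` (e.g., with `matplotlib.pyplot.pcolormesh`) would show the burst as a broadband transient and the formants as horizontal bands in the lower part of the spectrogram, mirroring the description in the text.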

#### **MAPPING THE VENTRAL STREAM**

The following section describes a series of complementary studies that aimed to disentangle the different processes and neuronal correlates involved in auditory speech perception. The section starts with studies on the basic auditory perception of phonetic signals, such as vowels and consonants, and proceeds to studies on sub-lexical, lexical, and semantic processing. These processes describe the function of the hypothesized ventral stream, which is predominantly mediated through sub-structures of the temporal lobes. The aim of these studies was not only to identify the different processes associated with the ventral stream and to map them onto the respective brain areas, but also to map the sensitivity of the contributing brain structures to the presence of phonetic information and to detect at which level a functional asymmetry between the hemispheres emerges. To this end, three of the studies presented here used a "dynamic" paradigm (in the following called "sound morphing"; Specht et al., 2005, 2009; Osnes et al., 2011a,b), an experimental setup different from that typically applied in fMRI studies. Studies on auditory perception often compare categories of stimuli, such as noise, music, or speech (see, e.g., Specht and Reul, 2003). However, in order to assess whether a brain structure responds uniformly to a sound, or whether it is sensitive to the presence of relevant phonetic features, dynamic paradigms have the advantage that they can keep some general acoustic properties constant while varying others. Thus, it is possible to differentiate brain areas that show constant responses from areas whose responses change with the manipulation, as seen, for example, in a study that gradually "morphs" a sound from white noise into a speech sound (Specht et al., 2009; Osnes et al., 2011a).
Similar approaches have been applied before, for example by using noise-vocoded speech (see, for example, Davis and Johnsrude, 2003), where the manipulated sounds originate from undistorted sounds, or by using a morphing procedure to probe categorical perception (Rogers and Davis, 2009). Some of the studies presented here used a similar approach by morphing sounds across sound categories, e.g., from non-verbal white noise into a speech sound, or from a flute sound into a vowel. These sound-morphing approaches provide additional information on perception processes, as they make it possible to differentiate brain areas that follow the manipulation from those that respond uniformly to the presence of a sound. Technically, a set of stimuli is generated in which the presence or intensity of a particular acoustic feature is varied. Played in the correct order, the respective feature becomes more and more audible. In this respect, it is important that the subjects are naïve to this manipulation and that the sounds are presented not in the correct, gradual order but randomly, since top-down and expectancy effects are known to influence the perception of distorted or unintelligible sounds (Dufor et al., 2007; Osnes et al., 2012).
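As an illustration of the general principle, the following Python sketch (NumPy assumed) interpolates the magnitude spectrum of one sound into that of another over seven steps and then shuffles the presentation order. The two endpoint signals are hypothetical stand-ins, not the actual flute, noise, or syllable recordings used in the studies:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000
t = np.arange(0, 0.5, 1 / fs)

# Hypothetical endpoint sounds: broadband noise and a simple tone.
source = rng.standard_normal(t.size)   # "white noise" endpoint
target = np.sin(2 * np.pi * 220 * t)   # "speech-like" endpoint

def morph(source, target, n_steps=7):
    """Linearly interpolate the magnitude spectra of two sounds over
    n_steps steps, keeping the source's phase - a simplified version of
    the sound-morphing procedure described in the text."""
    S, T = np.fft.rfft(source), np.fft.rfft(target)
    phase = np.angle(S)
    steps = []
    for w in np.linspace(0, 1, n_steps):
        mag = (1 - w) * np.abs(S) + w * np.abs(T)   # linear spectral mix
        steps.append(np.fft.irfft(mag * np.exp(1j * phase), n=source.size))
    return steps

stimuli = morph(source, target)

# Crucially, the stimuli are presented in random order, not as a
# gradual sequence, to avoid expectancy effects.
order = rng.permutation(len(stimuli))
```

Step 1 reproduces the source sound, while later steps carry progressively more of the target's spectral energy; randomizing the presentation order keeps listeners naïve to the gradual manipulation.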

The studies described below follow a simplified model of the ventral stream, as depicted in **Figure 1**, starting with the auditory–phonetic analysis of vowels and consonants and continuing to sub-lexical, lexical, and semantic processing. It should further be noted that in most studies, unless indicated otherwise, participants listened attentively but otherwise passively, with either no explicit task (Specht and Reul, 2003) or an arbitrary task unrelated to the content of the study (Rimol et al., 2005; Specht et al., 2009; Osnes et al., 2011a,b).


**FIGURE 1 | (A)** Summary of the described studies, displaying the results for the auditory processing of vowels in red, the auditory–phonetic analysis of consonants in blue, phonological and sub-lexical processes in green, and, finally, lexico-semantic processes in purple. For display purposes, all results were converted into *z*-scales and projected onto a standard brain. **(B)** The simplified working model for the ventral stream, displayed with the same colors as in **(A)**. In addition, a lateralization gradient indicates the increasing leftward asymmetry.

#### **AUDITORY–PHONETIC ANALYSIS**

It has been shown that non-verbal material, including pure tones and complex sounds, elicits asymmetric BOLD signals between the hemispheres, with stronger signals in the right posterior part of the STG and the right Heschl's gyrus, while the perception of speech elicits stronger responses on the left (Specht and Reul, 2003). But what happens when the differentiation between verbal and non-verbal content is not that clear, especially when the participant does not recognize a difference between them? This was the central question of the study by Osnes et al. (2011b), in which the sound of a flute was gradually changed into either the sound of a trumpet or an oboe, or alternatively into the vowel /a/ or /o/. This was achieved with a sound-morphing paradigm, where the vowel spectrum was linearly interpolated into the flute spectrum, resulting in a stepwise transition from a flute into a vowel sound over seven distinct steps. Step one was a sound consisting mainly of flute-sound features, while the presence of vowel-sound features increased over the subsequent steps two to seven. Non-phonetic control sounds were created in a similar manner, resulting in a stepwise transition from a flute into either an oboe or a trumpet sound. It is important to note that participants were not informed about this manipulation and – even after hearing the sounds – were not aware that the sounds contained phonetic features to a varying degree, as revealed by post-study interviews. This is an important and fundamental design feature that was also used in some of the following studies in order to reduce the effect of expectancy, since the expectation of hearing a speech sound can substantially change the way the sounds are perceived. This was, for example, impressively demonstrated in the study by Dufor et al. (2007) and recently replicated by Osnes et al. (2012) using the same stimuli described above.
In addition, the level of attention can influence the extent of activation in primary sensory areas (Jäncke et al., 1999; Hall et al., 2000; Hugdahl et al., 2000), and it also influences the within-subject reliability of the activation, as shown for the visual cortex (Specht et al., 2003b). Hence, participants were given an arbitrary task that was unrelated to the true aim of the study and, more importantly, did not involve any discrimination between the different sounds. Thus, the results particularly reflect the bottom-up, stimulus-driven brain response and make it possible to test whether the brain differentiates between such ambiguous sounds, which vary only in their degree of phonetic information without being obvious speech sounds. High sensitivity to the phonetic manipulation was expected in the primary and secondary auditory cortex. The results broadly confirmed this *a priori* hypothesis by demonstrating a clear differentiation between sounds with increasing phonetic information and sounds with unaltered phonetic information. Especially the STG and the planum temporale followed this manipulation logarithmically, while more medial areas, i.e., the core area of the auditory cortex, did not respond to the manipulation. This indicates that the BOLD response increased prominently already in the early phase of the sound-morphing sequence, when only little phonetic information was present, while increases in the BOLD response were less prominent in the later phase of the morphing sequence. In addition, no obvious lateralization effects were observed, indicating that the left and right posterior temporal lobes were equally sensitive to this manipulation (Osnes et al., 2011b).

Stop consonants are even more important building blocks of speech than vowels. As described above, stop consonants are consonants in which the sound is produced by stopping the airflow in the vocal tract, either with or without simultaneous voicing (voiced/unvoiced consonants), and they thus contain rapid frequency modulations. Rimol et al. (2005) explored the neuronal responses to unvoiced stop consonants. The results demonstrated bilateral activations in the temporal lobes with a clear leftward asymmetry for both consonants and CV syllables. This leftward asymmetry was further confirmed by a direct comparison with a matched noise condition. A leftward asymmetry for consonants, as opposed to vowels (Osnes et al., 2011b), could indicate a higher temporal resolution of the left primary and secondary auditory cortex (Zatorre and Belin, 2001; Zatorre et al., 2002; Boemio et al., 2005), which is then further reflected in a generally left-dominant processing of those speech-specific signals. This may to a certain degree support the asymmetric sampling in time hypothesis (AST; Poeppel, 2003), although the left–right dichotomy in temporal resolution may oversimplify the underlying processes (McGettigan and Scott, 2012).

Nevertheless, the results of these studies clearly indicate that the different sound structures of consonants and vowels, with rapid frequency modulations for stop consonants and a more constant tonal characteristic for vowels, are differently processed by the two


temporal lobes. More specifically, the left temporal lobe clearly has a higher sensitivity for consonants, while vowels are processed more bilaterally. This result was also confirmed by a study that used a dichotic presentation of CV syllables, in which the functional asymmetry was explored on a voxel-by-voxel level (van den Noort et al., 2008). Besides bilateral activations, the results indicated a functional asymmetry toward the left, with significantly higher activations in the left posterior STG, extending into the angular and supramarginal gyri.
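One common way to quantify such asymmetries is a lateralization index of the form LI = (L − R)/(L + R). The short Python sketch below (NumPy assumed) applies it to purely illustrative activation values for homologous left/right temporal regions; the numbers are invented to mimic the gradient described in this review and are not taken from the studies:

```python
import numpy as np

def lateralization_index(left, right):
    """Common laterality measure: LI = (L - R) / (L + R),
    ranging from -1 (right-dominant) to +1 (left-dominant)."""
    left, right = np.asarray(left, float), np.asarray(right, float)
    return (left - right) / (left + right)

# Hypothetical activation magnitudes (e.g., summed z-scores) in
# homologous left/right temporal ROIs for three processing levels;
# illustrative values only.
levels = ["vowels", "consonants/CV", "lexico-semantic"]
left = np.array([10.0, 14.0, 18.0])
right = np.array([10.0, 9.0, 6.0])

for name, li in zip(levels, lateralization_index(left, right)):
    print(f"{name}: LI = {li:+.2f}")
```

An LI near 0 for vowels and increasingly positive values for consonants and lexico-semantic stimuli would express the lateralization gradient summarized in **Figure 1**.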

Interestingly, these results are paralleled by behavioral investigations of VOT effects in the dichotic listening task. In such a task, two CV syllables are presented to the participant at the same time, and the participant is asked to repeat the syllable that is perceived most clearly. In most cases, this will be the syllable presented to the right ear (Hugdahl et al., 2009; Hugdahl, 2011), an effect termed the "right ear advantage" (REA). However, the strength of the REA depends on the VOT. The strongest REA was observed when a syllable with a long VOT was presented to the right ear (Rimol et al., 2006a; Sandmann et al., 2007). These are also the syllables with the most complex temporo-spectral characteristics, which thus likely benefit from the assumed higher temporal resolution of the left auditory cortex (Zatorre et al., 2002), since signals from the right ear are predominantly transmitted to the left auditory cortex.

#### **SUB-LEXICAL PROCESSING**

In order to explore phonological and sub-lexical decoding in more detail, the following study again used the sound-morphing procedure to investigate the dynamics of the responses in the posterior and middle parts of the STG. This was achieved by morphing white noise, i.e., a sound with a uniform spectral and temporal distribution, in seven distinct steps ("Step 1" to "Step 7") into either a speech sound or a short music sound. The latter served as a control stimulus. In order to obtain a comparable spectral complexity of the target sounds, the sounds were selected based on their spectral characteristics. The speech sounds were the CV syllables /da/ and /ta/, and the music sounds were a piano chord consisting of a major triad on a C3 root and an A3 guitar tone (see Specht et al., 2009 for technical details). It is important to note that the stimuli were presented in randomized order, i.e., the participants never heard the stimuli in sequential order from Step 1 to Step 7, to avoid the expectancy effects explained previously. As before, the participants performed an arbitrary task and were debriefed about the real aims of the study afterward. A parallel behavioral assessment was conducted in an independent sample of subjects so that the subjects were naïve to the stimulus material in both studies (Osnes et al., 2011a).

While the previously described studies on auditory–phonetic processing revealed a high sensitivity of the STG to phonetic cues and demonstrated no lateralization for vowels but a clear lateralization for stop consonants and CV syllables, the results of this study bridge the previous results by demonstrating an increasing lateralization toward the left as the sound became more and more a speech sound (CV syllable). Moreover, this increasing leftward asymmetry was particularly prominent outside of the auditory cortex. More precisely, a small area in the middle part of the left superior temporal sulcus (mid-STS) showed the strongest differentiation between the sounds, along with a significant interaction between the speech and music sound manipulations, and demonstrated an increasing response and increasing leftward asymmetry with increasing intelligibility of the speech sounds. Furthermore, this area (MNI coordinates −54, −18, −6) overlaps with the mid-STS area (MNI coordinates −59, −12, −6) detected in an earlier study that compared the perception of real words with complex sounds and pure tones (Specht and Reul, 2003). In contrast, when the sound morphed into a music sound, no lateralization was found, and activity in the left and right temporal lobes increased to a comparable extent. In addition, a parallel behavioral study in a naïve sample of participants demonstrated that participants were better able to identify distorted speech sounds as speech than distorted music sounds as music (Osnes et al., 2011a). Interestingly, at an intermediate step – the breaking point from which on subjects perceived the sounds as speech – there was additional activation in the premotor cortex, possibly indicating processes that facilitate the decoding of the perceived sounds as speech.
This link between speech perception processes and areas belonging to the dorsal stream has been described before in the case of degraded speech signals (Scott et al., 2009; Peelle et al., 2010; Price, 2010, 2012). Using dynamic causal modeling (DCM), Osnes et al. (2011a) were able to demonstrate that the connection between the premotor cortex and the STS was bidirectional, while the connection from the planum temporale to the premotor cortex was unidirectional (forward), possibly reflecting a directed flow of information. Note that the premotor cortex was involved only when the sound was morphed into a speech sound, and that there were no connections between the premotor cortex and the STS or planum temporale when the sound was morphed into a non-verbal sound.

It is important to emphasize that activations were always seen in both temporal lobes irrespective of the presented sound, but that only the left STS demonstrated an additional sensitivity to the sound-morphing manipulation. This, however, indicates only a higher sensitivity to the manipulation, but not necessarily a speech-specific activation.

Furthermore, there was no observable lateralization or exclusive processing of one stimulus category over the other at the level of the primary and secondary auditory cortex. This lack of lateralization in primary auditory processing is especially apparent in attentive but otherwise passive listening studies, while a leftward asymmetry has been observed in syllable discrimination tasks (Poeppel et al., 1996). Once a signal is identified as a speech stimulus, a stronger leftward asymmetry might emerge, indicating further phonetic and phonological processing (Specht et al., 2005). However, it is still an open question whether the identification of an acoustic input as a speech sound is a bottom-up and thus stimulus-driven effect, or a top-down process. The results presented here indicate, at least to a certain extent, a bottom-up effect.

#### **LEXICAL PROCESSING**

In a third study using the sound-morphing paradigm, only real words were used, but they were filtered in such a way that the sounds were identifiable as speech while varying in their degree of intelligibility (Specht et al., 2005). The results confirmed


that especially the left temporal lobe is sensitive to the intelligibility of a speech sound, while the right temporal lobe responds in a comparable way to all stimuli, irrespective of the sound category. This was seen in both the voxel-wise analysis and a region-of-interest analysis with *a priori* defined regions in the left and right temporal and frontal lobes. Note that once again the right temporal lobe responded to all stimuli but, in contrast to the left hemisphere, did not follow the manipulation. The increasing intelligibility of the words was also reflected in increased activity within the left inferior frontal gyrus, comprising the dorsal-posterior part of Broca's area [Brodmann area (BA) 44]. Since subjects had to indicate by button press when a sound was intelligible, this activity may be due to active processing of the distorted sounds and may thus reflect lexical processing of the stimuli.

These lexical processes were further explored with a lexical decision task, in which participants were asked to decide either between real words and phonologically incorrect non-words or, as a more demanding task, between real words and phonologically correct but otherwise meaningless pseudo-words. A high–low pitch decision served in both cases as an auditory control condition. The results from this PET study demonstrated that the easier non-word/real-word decision was made on the basis of a phonological analysis, relying only on the temporal lobe, in particular left temporal structures, without any involvement of frontal areas. By contrast, the more demanding pseudo-word decision also involved the left inferior frontal gyrus, including Broca's area (BA 44, 45), which is in line with other studies on lexical decision making that used, for example, visual presentation (Heim et al., 2007, 2009).

#### **SEMANTIC PROCESSING**

The last process examined by the study series described here was semantic processing, a processing step distinct from lexical processing. In order to separate these processes in the imaging data, both from each other and from auditory–phonetic processing, the respective study by Specht et al. (2008) used an independent component analysis (ICA; Calhoun et al., 2005, 2009; Keck et al., 2005; Kohler et al., 2008) rather than a univariate general linear model approach. The paradigm comprised three different linguistic levels. The first level was the passive perception of reversed words, which was used to control for auditory perception and, partially, for phonological processing. The second level was passive listening to real words, which aimed to control for phonological and lexical processing. Finally, the third level was a covert naming task following aurally presented definitions, which reflects in particular semantic processing but may to a certain degree be confounded by sentence processing. Hence, all three levels were expected to activate different processing stages of the ventral stream – or "what pathway" (Scott and Wise, 2004) – to different degrees.

An ICA is beneficial here because it can combine the involved brain areas into networks that show the same BOLD time course and share the same variance. Since auditory and phonological processing was present at all three levels, the ICA was able to separate the respective network from the network for semantic and sentence processing, which was required only in the naming task.
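The idea can be illustrated with a toy simulation. The Python sketch below (assuming NumPy and scikit-learn are available) mixes two hypothetical network time courses with distinct block designs into simulated voxel data and lets FastICA recover them; all signals are invented for illustration, and the setup is far simpler than an ICA of real fMRI data:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
n_scans = 120
t = np.arange(n_scans)

# Two hypothetical network time courses with distinct on/off block
# designs (15-scan and 20-scan cycles) - purely illustrative stand-ins
# for, e.g., an auditory-phonological and a semantic component.
auditory = ((t // 15) % 2 == 0).astype(float)
semantic = ((t // 20) % 2 == 0).astype(float)

# Mix the two components into 50 simulated voxels and add noise.
maps = rng.standard_normal((2, 50))   # hypothetical spatial weights
X = np.outer(auditory, maps[0]) + np.outer(semantic, maps[1])
X += 0.05 * rng.standard_normal(X.shape)

# ICA recovers components whose voxels share a time course and variance.
ica = FastICA(n_components=2, random_state=0, max_iter=1000)
sources = ica.fit_transform(X)   # shape (n_scans, 2)

# Match recovered components to ground truth by absolute correlation
# (ICA leaves sign and ordering undetermined).
for truth, name in ((auditory, "auditory"), (semantic, "semantic")):
    r = max(abs(np.corrcoef(truth, sources[:, k])[0, 1]) for k in range(2))
    print(name, round(r, 2))
```

In the actual study, the components were of course estimated from real BOLD data, where the component shared by all three conditions corresponds to the auditory-phonological network and the task-specific one to the semantic/sentence network.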

The two main components detected by the ICA confirmed that the auditory processing of phonological information is an almost bilateral process, while speech comprehension, comprising lexical and semantic processing, is often left lateralized (Hickok and Poeppel, 2004, 2007; Poeppel et al., 2012). In particular, the left anterior temporal lobe (ATL) has been identified as an important structure required for semantic and naming tasks (Schwartz et al., 2009; Binder et al., 2011). The areas of the second ICA component also overlap nicely with the ventral stream model, including mainly anterior portions of the temporal lobe, but also the temporo-parietal junction and a distinct area in the posterior part of the inferior temporal gyrus (ITG; Specht et al., 2008). An extension from posterior superior temporal areas toward the temporal pole, forming the ventral stream, is a typical finding (Scott et al., 2000). This posterior–anterior extension reflects the fact that the more the complexity of linguistic processing increases through the involvement of semantic processing and sentence comprehension, the further the activation extends into anterior and ventral parts of the temporal lobe. Also involved are inferior, posterior areas, including the ITG, as repeatedly reported in studies on sentence processing and semantic aspects of language (Rodd et al., 2005; Humphries et al., 2006, 2007; Hickok and Poeppel, 2007; Patterson et al., 2007; Binder and Desai, 2011; Poeppel et al., 2012).

Interestingly, a very similar pattern is often found when analyzing the loss of gray matter in patients suffering from primary progressive aphasia (PPA), an aphasic syndrome caused by neuronal degeneration that can occur in different clinical variants (Grossman, 2002; Mesulam et al., 2009; Gorno-Tempini et al., 2011). Its neuropsychological syndrome is characterized by slowly progressing, isolated language impairment without initial clinical evidence of cognitive deficits in other domains (Grossman, 2002; Mesulam et al., 2009). In particular, the clinical phenotype of semantic dementia, which may be a variant of fluent PPA (Adlam et al., 2006), is mainly associated with damage to the temporal lobe, with the left ATL being most severely affected in terms of gray matter atrophy (Mummery et al., 2000; Adlam et al., 2006; Mesulam et al., 2009) and white matter damage (Galantucci et al., 2011). Although less common and less pronounced, ATL pathologies, in combination with parietal lobe pathologies, have also been observed in the non-fluent, logopenic PPA sub-type (Zahn et al., 2005).

All results from the studies presented here are summarized in **Figure 1**. The summary depicts the ventral stream and displays in particular how the activation extends from the primary auditory cortex to anterior parts of the temporal lobe as the perceived sound becomes a meaningful speech stimulus, a real word, or a sentence. Furthermore, **Figure 1** indicates that the ventral stream is bilateral, but more extended in the left hemisphere. Only the left inferior frontal gyrus demonstrates a significant contribution to the processing.

#### **DISCUSSION**

Auditory speech perception is, as illustrated in this summary, a complex interaction of different brain areas that are integrated into a hierarchical network structure. To unravel the neuronal mechanisms of speech perception, it is of crucial importance to

"fnhum-07-00629" — 2013/9/30 — 17:41 — page 5 — #5

follow and to understand the organization of the information flow, particularly within the temporal lobes. Although auditory perception has been investigated by numerous functional imaging studies over the last decades, several aspects are still unresolved and not fully understood. One important contribution to the description of the processes behind auditory speech perception was the introduction of the concepts of the dorsal and ventral streams in recent models of speech perception (Hickok and Poeppel, 2004, 2007; Scott and Wise, 2004). On the neuroanatomical level, these two processing streams can to a certain degree be linked to two fiber tracts and their sub-branches (Catani et al., 2004, 2007; Saur et al., 2008; Weiller et al., 2011). However, one has to bear in mind that these theoretical "streams" do not necessarily have to follow neuroanatomical structures. Although the concept of two processing streams is compelling, the streams are difficult to capture with functional neuroimaging, since neuroimaging results typically provide "snapshots" of brain activations rather than dynamic processes. Therefore, the series of complementary studies presented above focused on two aspects in particular: first, to create a series of studies that overlapped with respect to mapping the different processing nodes within the hierarchical network that configures the ventral stream in the temporal lobe, and, second, to use dynamic paradigms in which stimulus properties were gradually changed in order to identify brain areas that were sensitive to the manipulation. Thereby, speech-sensitive areas could be separated from areas of general auditory perception, and lexical from sub-lexical areas.

The studies have consistently shown that speech perception is not a purely left hemispheric function. It is the interplay of different left and right temporal lobe structures that generates a speech percept out of an acoustic signal, and the left and right auditory systems process different aspects of the speech signal. Tonal aspects, such as vowels, do not exhibit a strong lateralization. In contrast, the perception of consonants demonstrates a leftward asymmetry, supporting the hypothesis of different processing capacities and properties of the left and right auditory cortex with respect to temporal and spectral resolution (Zatorre et al., 2002), as well as temporal integration windows, as proposed by the "asymmetric sampling in time" (AST) hypothesis (Poeppel, 2003). However, this simple dichotomy of higher versus lower temporal resolution in the left and right temporal lobe, respectively, may oversimplify the underlying processes as well as the characteristics of speech sounds. Thus, future models should take the specific nature of speech sounds into account, given the flexibility and limitations of the articulatory system that produces these sounds (McGettigan and Scott, 2012). Nevertheless, the differential processing within the left and right temporal lobe becomes particularly evident when comparing the study that used only vowels (Osnes et al., 2011b) to the studies that focused on the processing of stop-consonants (Rimol et al., 2005) or dichotically presented CV syllables (van den Noort et al., 2008; Specht et al., 2009). While the more tonal vowels did not exhibit a left–right asymmetry, consonants and CV syllables were processed more strongly by the left than the right auditory cortex and surrounding areas. Note that only asymmetries, not clearly unilateral processes, were detected at this level.
It is further important to note that the area of the planum temporale did not turn out to be speech specific, although it has long been discussed as an area important for phonological processing. In agreement with recent neuroimaging studies, this view has been challenged, and it has been shown that the planum temporale is also involved in early auditory processing of non-verbal stimuli, spatial hearing, and auditory imagery (Binder et al., 1996; Papathanassiou, 2000; Specht and Reul, 2003; Specht et al., 2005; Obleser et al., 2008; Isenberg et al., 2012; Price, 2012).

One area that repeatedly appears in the neuroimaging literature on vocal, phonological, and sub-lexical processing is the STS (Belin et al., 2000; Jäncke et al., 2002; Scott et al., 2009; Price, 2010, 2012). The importance of this structure was also supported by the studies presented here, which showed distinct, mainly left-lateralized responses within the middle part of the STS during passive listening to syllables and words, when compared to non-verbal sounds (Specht and Reul, 2003; Specht et al., 2005, 2009). However, it should again be emphasized that the results only indicate a high sensitivity to phonological signals and to sound-morphing manipulations, without necessarily implying that this is a speech-specific area. It is possible that a speech-specific involvement of the STS may emerge when required (Price et al., 2005). Interestingly, when the focus is on phonological processing, the left STS appears to be the dominating structure, while when voice aspects are in focus, the right STS is more dominant (Belin, 2006; Latinus and Belin, 2011). Moreover, a recent meta-analysis by Hein and Knight indicated that the STS of the left and right hemisphere is apparently involved in several different processes, including not only phonological processing but also theory of mind, audio-visual integration, and face perception (Hein and Knight, 2008). Thus, studies are required that examine these functions on a within-subject level in order to verify the neuroanatomical overlap of these different functions. Besides the areas in the STG and STS, several studies also pinpoint an area in the posterior part of the ITG, close to the border with the fusiform gyrus. This area is typically seen in visual lexical decision tasks (see, for example, Heim et al., 2009), but also in auditory tasks, such as word and sentence comprehension (Rimol et al., 2006b; Specht et al., 2008).
In general, there is reasonable evidence that this area serves as a supramodal hub in which the auditory and visual ventral streams converge. Thus, this area is independent of the input modality and has to be differentiated from an adjacent area, often referred to as the "visual word form area," which is located more posteriorly and medially (Cohen et al., 2004). The function of this inferior temporal area is still under debate, but several studies indicate that it is especially involved in lexical processing. In accordance with this, the model by Hickok and Poeppel (2007) calls this area the "lexical interface." Interestingly, the same or nearby areas seem also to play an important role in multilingualism (Vingerhoets et al., 2003) and show structural and functional alterations in subjects with dyslexia (Silani, 2005; Dufor et al., 2007).

Moving further along the ventral stream toward the anterior portion of the temporal lobe, the neuroimaging results presented here demonstrate, in agreement with the literature (Vandenberghe et al., 2002; Price, 2010, 2012; Binder and Desai, 2011), an increasing contribution of more anterior portions of the temporal lobe to lexical, semantic, and sentence processing (Specht et al., 2003a, 2008). This shift from acoustic and phonological processing in the posterior superior temporal lobe to semantic processing in the ATL characterizes the ventral stream (Scott et al., 2000; Visser and Lambon Ralph, 2011). Interestingly, neurocomputational models confirm this gradual shift within the ventral stream. Ueno et al. (2011) modeled a neuroanatomically constrained dual-stream model, with a dorsal and a ventral stream. They were able to demonstrate the division of function between the two streams, and also that a gradual shift from acoustic to semantic processing along the ventral stream improves the performance of the model (Ueno et al., 2011). However, this model was constrained to an intra-hemispheric network with one ventral and one dorsal stream only and did not consider any functional asymmetry. In contrast, neuroimaging data indicate a bilateral representation of some parts of the ventral stream (Hickok and Poeppel, 2007). This is reflected by different degrees of functional asymmetry along the ventral stream. While auditory and sub-lexical processing are more symmetrically organized, a stronger leftward asymmetry appears for lexical and semantic processes, in line with the notion that a leftward asymmetry for linguistic processes emerges only outside of the auditory cortex and adjacent areas (Binder et al., 1996; Poeppel, 2003). Furthermore, there is emerging evidence that semantic processing and conceptual knowledge crucially depend on the functional integrity of the ATL, together with other areas including the left ventrolateral prefrontal cortex and the left posterior temporal and inferior parietal areas.
This was demonstrated by, for example, TMS studies (Lambon Ralph et al., 2009; Pobric et al., 2009; Holland and Lambon Ralph, 2010), studies using direct cortical stimulation (Luders et al., 1991; Boatman, 2004), intracranial recording studies (Nobre and McCarthy, 1995), studies in patients with semantic dementia (Patterson et al., 2007; Lambon Ralph et al., 2010), and studies that combined TMS, fMRI, and patient data (Binney et al., 2010). In addition, one has to distinguish between the anterior STG/STS and the ventral ATL, which appear to host related but nevertheless distinct functions (Spitsyna et al., 2006; Binney et al., 2010; Visser and Lambon Ralph, 2011). The anterior STG/STS area is considered to be more related to the semantic and conceptual processing of auditory words and environmental sounds, while the ventral ATL is assumed to be a more heteromodal cortical region (Spitsyna et al., 2006). This might indicate a higher level of the ventral ATL within the processing hierarchy, since the unimodal visual and auditory language processing streams converge in this heteromodal area (Spitsyna et al., 2006). Furthermore, differential contributions of the left and right ATL have been identified by the demonstration that the left ventral ATL responds more strongly to auditory words, while visual stimuli and environmental sounds cause bilateral responses (Visser and Lambon Ralph, 2011). However, it is important to note that the ventral ATL in particular is difficult to access with fMRI, as susceptibility artifacts reduce the signal-to-noise ratio in this area. Thus, it is difficult to examine the specific function of this area, and many studies may overlook this structure or be "blind" to its responses (Visser et al., 2010, 2012).

Based on the neuroimaging data summarized in **Figure 1**, and in accordance with the literature, a "lateralization gradient" could be proposed for the ventral stream, which becomes more strongly left lateralized along the posterior–anterior axis (Peelle, 2012). However, this increasing leftward asymmetry, i.e., the increasing strength of the lateralization gradient, could also be induced or influenced by top-down control, since a lexical and semantic process implies an active processing of the perceived speech signals rather than simply passive listening. Accordingly, studies that are based on a more passive processing of the speech signals often show more bilateral results than studies in which subjects are asked to process the stimuli actively, thus influencing the steepness of the proposed lateralization gradient. Furthermore, the information and stimulus type can influence the steepness of the proposed lateralization gradient, since the strongest lateralization for ATL structures appears for aurally perceived information, such as that administered in the studies presented above, but might be less asymmetric for non-verbal, visual information, or figurative language (Binder et al., 2011; Visser and Lambon Ralph, 2011).
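The lateralization gradient described above can be quantified with the standard laterality index, LI = (L − R)/(L + R), computed per region of interest along the posterior–anterior axis. The following sketch is purely illustrative: the ROI names and activation magnitudes are hypothetical placeholders, not data from the studies summarized here.

```python
# Illustrative laterality-index computation (hypothetical values, not study data).
# LI = (L - R) / (L + R): +1 = fully left-lateralized, -1 = fully right-lateralized.

def laterality_index(left: float, right: float) -> float:
    """Standard laterality index from left/right activation magnitudes."""
    return (left - right) / (left + right)

# Hypothetical mean activations (arbitrary units) along the posterior-anterior axis.
rois = [
    ("auditory cortex (posterior)", 10.0, 9.5),  # near-symmetric early processing
    ("mid-STS (sub-lexical)",       12.0, 9.0),  # moderate leftward asymmetry
    ("ATL (lexical-semantic)",      14.0, 7.0),  # strongest leftward asymmetry
]

gradient = [(name, laterality_index(l, r)) for name, l, r in rois]
for name, li in gradient:
    print(f"{name}: LI = {li:+.2f}")

# A positive LI that increases from posterior to anterior ROIs is what a
# "steepening lateralization gradient" means in this framework.
assert gradient[0][1] < gradient[1][1] < gradient[2][1]
```

Under this formulation, top-down task demands or stimulus type would simply change the left/right magnitudes per ROI, and thereby the steepness of the LI curve along the axis.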

In contrast, the observed frontal activations were strictly left lateralized. As depicted in **Figure 1**, the activations extend bilaterally from the primary auditory cortex along the posterior–anterior axis of the temporal lobes as the sound becomes meaningful speech, with additional involvement of only the left inferior frontal gyrus for lexico-semantic processing. Anatomically, this connection from the anterior portion of the left ATL to the inferior frontal gyrus is most probably provided via the extreme capsule (Saur et al., 2008; Weiller et al., 2011). However, this inferior frontal contribution is likely to reflect top-down processing of the stimulus rather than a stimulus-driven bottom-up effect (Crinion et al., 2003), as these activations occurred only in studies using an active task on the lexical and semantic level, and they are thus not considered a fundamental part of the ventral stream.

In general, it is clear that the temporal lobe, in particular the left temporal lobe, is of crucial importance for speech perception and other language-related skills, such as reading and general lexical processing in the more posterior and inferior portions of the temporal lobe (Price, 2012). Furthermore, the middle part of the left STS has repeatedly been described as an area central to speech perception. This emphasizes the importance of the ventral stream in the larger speech and language network. The ventral stream, and in particular the ventral stream within the left temporal lobe, is thus important for both the perception and the production of speech. Rauschecker and Scott (2009) proposed a closed processing loop by incorporating the dorsal stream into their model. As demonstrated here, the ventral stream may terminate in the ATL or, perhaps, in the inferior frontal gyrus. In the latter case, this stream has a direct connection to the dorsal stream, providing an anatomical basis for the proposed processing loop. Furthermore, Rauschecker and Scott (2009) proposed a loop for forward mapping and inverse mapping. To some extent, the study by Osnes et al. (2011a), using DCM in combination with the sound-morphing paradigm, demonstrated a link between the dorsal and ventral streams through an involvement of the premotor cortex. The DCM results further demonstrated that the premotor cortex has a bidirectional connection with the STS, but only a forward connection from the planum temporale to the premotor cortex, resulting in a directed information flow, similar to the inverse loop proposed by Rauschecker and Scott (2009). Thus, this result helps to explain the perception processes in situations of degraded speech signals. It could also shed some light on disturbed


processing networks, as for example found in developmental stuttering, for which there is evidence of functional (Salmelin et al., 1998, 2000) as well as structural (Sommer et al., 2002) alterations of the dorsal stream. In line with this hypothesis, differential contributions of the dorsal and ventral streams to speech perception processes have recently been confirmed in an as-yet-unpublished fMRI study of developmental stutterers (Martinsen et al., unpublished), using the same sound-morphing paradigm as introduced here (Specht et al., 2009).

In summary, the body of data presented here, derived from a series of stepwise overlapping studies that included the use of dynamic paradigms, demonstrates that auditory speech perception rests on a hierarchical network that particularly comprises the posterior–anterior axes of the temporal lobes. It has further been shown that processing becomes increasingly leftward lateralized as sounds gradually turn into speech sounds. Still, areas of the right hemisphere are also involved in the processing, which might be beneficial in the case of a stroke. While a multitude of studies demonstrate that temporal lobe structures are essential for speech perception and language processing in general, it should not be neglected that the same areas are also involved in other, non-speech-related processes. Thus, new models are needed that can unify and explain such diverging results within a common framework.

#### **ACKNOWLEDGMENT**

The studies, performed between 2008 and 2012, were supported by a grant from the Bergen Research Foundation (www.bfstiftelse.no).




**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 May 2013; accepted: 11 September 2013; published online: 02 October 2013.*

*Citation: Specht K (2013) Mapping a lateralization gradient within the ventral stream for auditory speech perception. Front. Hum. Neurosci. 7:629. doi: 10.3389/fnhum.2013.00629*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Specht. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*


## Repeating with the right hemisphere: reduced interactions between phonological and lexical-semantic systems in crossed aphasia?

*Irene De-Torres 1,2, Guadalupe Dávila1,3, Marcelo L. Berthier <sup>1</sup> \*, Seán Froudist Walsh1,4, Ignacio Moreno-Torres <sup>5</sup> and Rafael Ruiz-Cruces <sup>1</sup>*

*<sup>1</sup> Unit of Cognitive Neurology and Aphasia, Centro de Investigaciones, Médico-Sanitarias, University of Málaga, Malaga, Spain*

*<sup>2</sup> Unit of Physical Medicine and Rehabilitation, Carlos Haya University Hospital, Malaga, Spain*

*<sup>3</sup> Psychobiology Area, Faculty of Psychology, University of Málaga, Malaga, Spain*

*<sup>4</sup> Department of Psychosis Studies, Institute of Psychiatry, King's Health Partners, King's College London, UK*

*<sup>5</sup> Department of Spanish Language I, University of Málaga, Malaga, Spain*

#### *Edited by:*

*Matthew A. Lambon Ralph, University of Manchester, UK*

#### *Reviewed by:*

*Paul Hoffman, University of Manchester, UK Peter Mariën, Vrije Universiteit Brussel, Belgium*

#### *\*Correspondence:*

*Marcelo L. Berthier, Unidad de Neurología Cognitiva y Afasia, Centro de Investigaciones Médico-Sanitarias, Universidad de Málaga, Campus Teatinos, C/Marqués de Beccaria 3, 29010 Málaga, España e-mail: mbt@uma.es*

Knowledge of the patterns of repetition amongst individuals who develop language deficits in association with right hemisphere lesions (crossed aphasia) is very limited. Available data indicate that repetition in some crossed aphasics experiencing phonological processing deficits is not heavily influenced by lexical-semantic variables (lexicality, imageability, and frequency), as is regularly reported in phonologically impaired cases with left hemisphere damage. Moreover, in view of the fact that crossed aphasia is rare, information on the role of right cortical areas and white matter tracts underpinning language repetition deficits is scarce. In this study, repetition performance was assessed in two patients with crossed conduction aphasia and striatal/capsular vascular lesions encompassing the right arcuate fasciculus (AF) and inferior fronto-occipital fasciculus (IFOF), the temporal stem, and the white matter underneath the supramarginal gyrus. Both patients showed lexicality effects, repeating words better than non-words, but manipulation of other lexical-semantic variables exerted less influence on repetition performance. Imageability and frequency effects, production of meaning-based paraphrases during sentence repetition, or better performance on repeating novel sentences than overlearned clichés were hardly ever observed in these two patients. In one patient, diffusion tensor imaging disclosed damage to the right long direct segment of the AF and the IFOF with relative sparing of the anterior indirect and posterior segments of the AF, together with fully developed left perisylvian white matter pathways. These findings suggest that striatal/capsular lesions extending into the right AF and IFOF in some individuals with right hemisphere language dominance are associated with atypical repetition patterns, which might reflect reduced interactions between phonological and lexical-semantic processes.

**Keywords: right hemisphere, language, crossed aphasia, conduction aphasia, language network, structural connectivity**

#### **INTRODUCTION**

It is well-established that the majority (95%) of right-handers have their left cerebral hemispheres dominant for language (Annett, 1998; Wada and Rasmussen, 2007). A minority (5%) of right-handers have right hemispheric specialization for language (Loring et al., 1990; Annett, 1998; Pujol et al., 1999; Knecht et al., 2002), and mixed language dominance (language production and reception represented in different hemispheres), which can occur in both normal (Lidzba et al., 2011) and brain-damaged right-handers (Kurthen et al., 1992; Paparounas et al., 2002; Kamada et al., 2007; Lee et al., 2008), is even more infrequent. The rarity of complete or incomplete lateralization of language to the right hemisphere explains why only a minority of right-handed individuals develop language deficits after right hemisphere injury (crossed aphasia) (Bramwell, 1899; Alexander et al., 1989a; Mariën et al., 2001, 2004). Although crossed aphasia is rare, analysis of language functioning in these subjects represents an ideal opportunity to examine whether their language performance and the neural architecture underpinning language functions in the right hemisphere are the same as those reported in subjects with left hemisphere language dominance (Catani et al., 2007; Turken and Dronkers, 2011; Catani and Thiebaut de Schotten, 2012). Here, we report the occurrence of fluent aphasia with severely abnormal repetition and deficits in sentence comprehension (conduction aphasia, CA) in two patients who suffered large right subcortical stroke lesions. This clinical-anatomical correlation is uncommon, but its description can further illuminate the neural organization of propositional language in the right hemisphere.
In an attempt to accomplish this, in the present study the localization of damage to white matter tracts underpinning language repetition was outlined in one patient with the aid of brain sections depicted in an atlas of human brain connections (Catani and Thiebaut de Schotten, 2012) and in the other patient with diffusion tensor imaging (DTI) of bilateral white matter tracts.

Knowledge of the organization of propositional language in the right hemisphere comes from the analysis of aphasic patients with damage to the right hemisphere (see Alexander et al., 1989a; Mariën et al., 2004) and from a case series study of intraoperative cortical-subcortical stimulation (Vassal et al., 2010). Vassal and coworkers (2010) performed intraoperative cortical-subcortical electrical functional mapping in three right-handed adults who had right-sided low-grade gliomas. Right hemisphere language dominance was variously demonstrated by the identification of language deficits during both partial epileptic seizures and preoperative formal testing, and by activations in functional magnetic resonance imaging (fMRI) (one patient). During surgical interventions, reproducible language disturbances were found by stimulating cortical sites in frontal and temporal cortices. Electrostimulation of the inferior fronto-occipital fasciculus (IFOF) elicited semantic paraphasias, whereas stimulation of the arcuate fasciculus (AF) caused phonemic errors, thus supporting in these cases the hypothesis of a mirror organization of white matter tracts between the right and left hemispheres (Vassal et al., 2010).

Studying patients with crossed aphasia, Alexander and colleagues defined two clinical-radiological correlations, which were named "mirror image" and "anomalous" (Alexander et al., 1989a; Alexander and Annett, 1996; Alexander, 1997; Mariën et al., 2004). The "mirror image" pattern assumes that the right language cortex has a similar structure and connections to the classical left language cortex, and therefore similar language deficits to the ones observed after left hemisphere injury can be expected when the same injury occurs in homologous areas of the right hemisphere (Henderson, 1983; Bartha et al., 2004). This pattern occurs in as many as 60% of patients, and all clinical types of aphasia have been described (see Mariën et al., 2001, 2004). By contrast, the "anomalous" pattern considers that the structural arrangements and functional organization of the language cortex in the right hemisphere are different from those of the left language cortex, so that atypical language deficits can occur after right hemisphere injury (e.g., Wernicke's aphasia associated with frontal damage). The anomalous pattern has been described in approximately 40% of patients, and it can be easily identified when patients present with relatively isolated phonological or lexical-semantic deficits associated with large lesions in the right perisylvian area (Alexander et al., 1989a; Mariën et al., 2001, 2004). Interestingly, the association of CA with an atypical lesion location is more commonly encountered with right hemisphere lesions (35%) than after left hemisphere involvement (13%) (Basso et al., 1985; Alexander et al., 1989a; Dewarrat et al., 2009).
Despite the relatively frequent occurrence of CA in cases of both "mirror image" (Henderson, 1983; Bartha et al., 2004) and "anomalous" crossed aphasia (Alexander et al., 1989a), comprehensive analyses of its main deficits (repetition, short-term memory, sentence comprehension) have been described in only three cases (patient ORL, McCarthy and Warrington, 1984; patient EDE, Berndt et al., 1991; and patient JNR, Berthier et al., 2011). Below, a brief summary of the main findings from patient EDE is provided. A further description of the other two cases is not given here because their personal and developmental histories (mixed handedness and perinatal left hemisphere injury in JNR and left-handedness in ORL) invalidate the diagnosis of crossed aphasia.

Berndt et al. (1991) described the case of a 56-year-old, strongly right-handed housewife (EDE) who acutely developed fluent aphasia with impaired auditory comprehension and rapid cycling mood changes in association with a right posterior cortical infarction. A formal evaluation of deficits in EDE was initiated 10 months after the stroke, and by that time her reading and writing deficits had improved more than repetition span and auditory sentence comprehension. Since then, language and cognitive deficits remained stable and were longitudinally evaluated during the next 5 years. An MRI performed approximately 4 years post-onset revealed a right temporal-parietal infarction compromising cortical regions (middle temporal gyrus and posterior superior temporal gyrus, temporal pole, and posterior insula) engaged in auditory comprehension. In retrospect, it could be argued that EDE probably had an acute Wernicke's aphasia which gradually resolved to CA in the chronic period (1 year post-onset) (Berndt et al., 1991). Berndt and colleagues interpreted the clinical-anatomical relationships observed in EDE as indicative of "mirror image" crossed CA (Alexander et al., 1989a; Alexander and Annett, 1996; Alexander, 1997), although her performance in repetition and short-term memory tasks was atypical in comparison with other patients presenting with short-term memory deficits after left hemisphere damage. Indeed, EDE had intact input phonological processing, a 1-item recency effect on list repetition, and absent meaning-based paraphrases during sentence repetition, which in the authors' view reflected an atypical interaction between the right and left hemispheres (Berndt et al., 1991). Berndt and her colleagues concluded that in EDE:

*"...there appears to be an unusual dissociation of functions such that the perception of auditory/phonetic information is separated from its storage, while access to semantic information from phonemic forms in connected speech is impaired ... some initial processing of auditory/phonetic information is carried out in EDE's intact left hemisphere, while language functions responsible for phonetic storage and lexical/semantic assignment to sentence constituents are lateralized to the right hemisphere"* (p. 277).

Analysis of repetition performance in the other two patients yielded mixed results. Evaluation of patient JNR replicated the results obtained in EDE (except for abnormal phonological input processing), but patient ORL had repetition deficits similar to the ones described in cases of CA with left hemisphere involvement (see further details in Berthier et al., 2011; McCarthy and Warrington, 1984). In light of the limited data available and the mixed results on the pattern of repetition in patients with crossed CA, analysis of further cases is clearly needed. In this study, we specifically investigated repetition deficits in two chronic stroke patients with crossed subcortical CA. We also examined for the first time the role of right white matter pathway involvement in repetition processes in crossed aphasia. Our results replicate findings from previous similar cases (Berndt et al., 1991; Berthier et al., 2011), showing that repetition deficits have atypical features in more demanding tasks (sentence repetition), reflecting more limited reliance on lexical-semantic processing than has been reported in typical CA associated with left hemisphere damage. Further, our neuroimaging findings suggest that subcortical lesions in the right hemisphere damaging perisylvian and commissural pathways may account for the observed language deficits by altering the interaction between the right and left hemispheres.

#### **METHODS**

#### **PARTICIPANTS**

We examined language deficits including repetition performance (digits, words/non-words, lists of word pairs and triplets, sentences and novel sentences/idiomatic clichés) in two monolingual Spanish speaking patients with chronic CA secondary to large right hemisphere stroke lesions. These two patients were the only ones referred to our unit from 1997 to 2011 with crossed subcortical aphasia.

#### **PATIENT JAM**

JAM was a 46-year-old man who suffered a large intracerebral haemorrhage in the right striatal/capsular region 1 year before referral to our unit. In the acute period, he had a dense left hemiplegia, left hemianopia, left hemisensory loss, and mild left hemispatial neglect. After a short-lived period of global aphasia, language testing revealed fluent jargon aphasia with impaired auditory comprehension, which gradually regressed to CA. Reading and writing were severely affected, with features of both deep dysgraphia and deep dyslexia. He also had mild dyscalculia, but he did not show ideomotor or buccofacial apraxia, as reflected by ceiling scores on the apraxia subtest (60/60) of the Western Aphasia Battery (WAB) (Kertesz, 1982). This latter finding is at variance with what is commonly observed in patients with CA associated with left hemisphere damage (Geschwind, 1965; Benson et al., 1973; Tognola and Vignolo, 1980). At the time of formal language evaluation, JAM was fully oriented and showed adequate insight into his deficits. His affect was flat and he tended to be isolated at home. He met diagnostic criteria for major depression, as has been reported in patients with left basal ganglia strokes (Starkstein et al., 1988). JAM was strongly right-handed without a history of perinatal injury, developmental delay, or familial left-handedness. On the Edinburgh Handedness Inventory (Oldfield, 1971) his score was +100. During the first 6 months after the stroke, JAM received conventional speech-language therapy<sup>1</sup> on an individual basis (2 h/week), showing improvement in spontaneous speech and auditory comprehension. No beneficial changes were reported on repetition deficits.
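A score of +100 on the Edinburgh Handedness Inventory corresponds to Oldfield's (1971) laterality quotient, LQ = 100 × (R − L)/(R + L), where R and L are the summed right- and left-hand endorsements across the inventory items. The minimal sketch below illustrates that computation; the one-point-per-item tally convention shown is an illustrative simplification, not a claim about the exact scoring protocol used here.

```python
# Edinburgh Handedness Inventory laterality quotient (Oldfield, 1971):
# LQ = 100 * (R - L) / (R + L), ranging from -100 (fully left-handed)
# to +100 (fully right-handed).

def ehi_laterality_quotient(right: int, left: int) -> float:
    """Laterality quotient from summed right- and left-hand endorsements."""
    return 100.0 * (right - left) / (right + left)

# A strongly right-handed respondent endorsing the right hand on all ten
# items (illustrative tally) yields LQ = +100, as observed for JAM and AFL.
print(ehi_laterality_quotient(right=10, left=0))  # -> 100.0
```

Any left-hand endorsement lowers the quotient below +100, which is why a score of exactly +100 indicates strong, exclusive right-hand preference.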

#### **PATIENT AFL**

AFL was a 63-year-old woman who developed fluent jargon aphasia with severely compromised auditory and written comprehension in association with a large right subcortical stroke. In the acute period, she had a dense left hemiplegia, left hemianopia, and left hemisensory loss, but no left hemispatial neglect. Apraxia scores on the WAB were only mildly impaired (49/60), most likely due to comprehension deficits, with similar performance on pantomime to verbal command and pantomime imitation, thus ruling out conduction apraxia (Ochipa et al., 1994). By that time, she was fully aware of her aphasic symptoms in spite of her severe jargon speech and comprehension deficits. She experienced despair, crying very frequently and also showing catastrophic reactions when physicians tried to interact with her (Berthier and Starkstein, 1994). She also met diagnostic criteria for major depression, as has been reported in patients with left basal ganglia strokes (Starkstein et al., 1988). AFL was strongly right-handed, without history of perinatal injury, developmental delay, or familial left-handedness. On the Edinburgh Handedness Inventory (Oldfield, 1971) her score was +100. Six months after the stroke, AFL began to receive conventional speech-language therapy<sup>1</sup> on an individual basis (2 h/week) over a 6-month period, showing improvement in spontaneous speech and auditory comprehension. No beneficial changes were reported for her repetition deficits.

#### **IMAGING**

#### *Methods*

MRI studies were performed on different scanners. AFL was studied in 1997 using a 1.5-T Signa scanner (General Electric Medical Systems, Milwaukee, WI). Head movements were minimized using head pads and a forehead strap. High-resolution T1-weighted structural images of the whole brain were acquired with a three-dimensional (3D) magnetization prepared rapid acquisition gradient echo (3D MPRAGE) sequence (acquisition matrix: 250 × 250; field of view: 240 mm; repetition time [TR]: 2250 ms; echo time [TE]: 238 ms; flip angle: 90°; turbo field echo [TFE] factor: 100). The MRI study in JAM was performed on a 3-T magnet (Philips Gyroscan Intera, Best, The Netherlands) equipped with an eight-channel Philips SENSE head coil. Head movements were minimized using head pads and a forehead strap. High-resolution T1-weighted structural images of the whole brain were acquired with a 3D MPRAGE sequence (acquisition matrix: 240 × 256; field of view: 240 mm; TR: 9.9 ms; TE: 4.6 ms; flip angle: 8°; TFE factor: 200; 1 × 1 × 1 mm<sup>3</sup> resolution). One hundred eighty-two contiguous 1-mm-thick slices (0 mm slice gap) were acquired; the total acquisition time of the sequence was about 4:24 min. In addition to the 3D MPRAGE, a standard axial T2-weighted/FLAIR sequence (TR = 11,000 ms; TE = 125/27 ms; 264 × 512 matrix; field of view [FOV] = 230 × 230 mm; 3-mm-thick slices with 1 mm slice gap) was obtained. A Short TI Inversion Recovery (STIR) sequence was used to produce 24 axial slices of 2.5 mm (interslice gap = 1 mm; TR = 4718 ms; TE = 80 ms; inversion time = 200 ms; 264 × 512 matrix; FOV = 230 mm; number of excitations = 2). In both patients the anterior commissure (AC) was identified in axial and coronal T1-weighted images at the level of the temporal stems (Warren et al., 2009).

<sup>1</sup>Conventional speech-language therapy in both patients followed a syndrome-specific standard approach. The therapeutic repertoire included exercises involving naming, repetition, sentence completion, following commands, spoken object-picture matching, and conversations on topics of the patients' own choice (see Pulvermüller et al., 2001; Basso, 2003; Basso et al., 2013).

#### *Results*

Lesion location was relatively similar in both patients (**Figure 1**). Axial MRI showed right basal ganglia lesions including the putamen, part of the external pallidum, and the anterior limb, genu, and posterior limb of the internal capsule, extending superiorly to the periventricular white matter (corona radiata). Tissue damage was also present in the white matter surrounding the hippocampus and the middle temporal gyrus, with posterior extension to the auditory and optic radiations in the temporal stem (**Figure 2**). The right posterior ventral and dorsal insular cortices and the periventricular white matter deep to the supramarginal gyrus were also damaged in both cases, but AFL showed more extensive involvement. No lesions were documented in the left hemisphere.

#### **DIFFUSION TENSOR IMAGING (DTI)**

DTI allows for *in vivo* measurement of the diffusive properties of water, yielding information about the microstructural organization of tissue (Basser et al., 1994). Tractography enables the orientation of white matter (WM) to be ascertained, thus making it possible to segregate WM into separate sections based on the paths of the distinct tracts (LeBihan, 2003). Data acquisition was performed using multi-slice single-shot spin-echo echo-planar imaging (EPI) with the following parameters: FOV = 224 mm; 2-mm-thick slices with 0 mm slice gap; *TE* = 117 ms; *TR* = 12,408 ms; b factor = 3000 s/mm<sup>2</sup>. The EPI echo train length consisted of 59 actual echoes reconstructed in a 112 × 128 image matrix. Sixty-four diffusion directions were used to allow precise construction of the diffusion tensor. Motion and eddy current correction were performed using FSL's FDT eddy current correction tool (http://www.fmrib.ox.ac.uk/fsl/) (Smith et al., 2004; Woolrich et al., 2009). Diffusion tensor estimation was carried out for each voxel using Diffusion Toolkit's least-squares estimation algorithm (Ruopeng Wang, Van J. Wedeen, TrackVis.org, Martinos Center for Biomedical Imaging, Massachusetts General Hospital). Whole-brain tractography used an angular threshold of 35 degrees and an FA threshold of 0.2. The tensor was spectrally decomposed to obtain its eigenvalues and eigenvectors. The fiber direction is assumed to correspond to the principal eigenvector (the eigenvector with the largest eigenvalue). This vector was color coded (green for anterior-posterior, blue for superior-inferior, and red for left-right) to help generate the color FA map. An FA map was also generated from the eigenvalues, again using Diffusion Toolkit.
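The spectral decomposition and FA computation described above can be sketched as follows. This is an illustrative reconstruction of the standard FA formula and FA-weighted direction color coding, not Diffusion Toolkit's actual implementation; the example tensor values are hypothetical.

```python
import numpy as np

def fa_and_color(tensor):
    """Spectrally decompose a 3x3 diffusion tensor; return the fractional
    anisotropy (FA) and an FA-weighted RGB code for the principal diffusion
    direction (red: left-right, green: anterior-posterior, blue: superior-inferior)."""
    eigvals, eigvecs = np.linalg.eigh(tensor)          # eigenvalues in ascending order
    l1, l2, l3 = eigvals[2], eigvals[1], eigvals[0]
    mean_d = (l1 + l2 + l3) / 3.0
    # Standard FA formula from the three eigenvalues (Basser et al., 1994)
    fa = np.sqrt(1.5 * ((l1 - mean_d)**2 + (l2 - mean_d)**2 + (l3 - mean_d)**2)
                 / (l1**2 + l2**2 + l3**2))
    principal = eigvecs[:, 2]                          # eigenvector of the largest eigenvalue
    rgb = np.abs(principal) * fa                       # sign-invariant, FA-weighted color
    return fa, rgb

# Hypothetical prolate tensor diffusing mainly along x (left-right -> red channel)
fa, rgb = fa_and_color(np.diag([1.7e-3, 0.3e-3, 0.3e-3]))
```

For an isotropic tensor (equal eigenvalues) the same function returns FA = 0, which is why the 0.2 FA threshold above excludes such voxels from tractography.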
Virtual dissections of the 3 parts of the AF and the IFOF were performed using a region of interest (ROI) approach, following the directions of a white matter tractography atlas (Catani and Thiebaut de Schotten, 2012). All virtual dissections were performed using TrackVis (Ruopeng Wang and Van J. Wedeen, TrackVis.org, Martinos Center for Biomedical Imaging, Massachusetts General Hospital).

**FIGURE 1 | Structural axial MRI of patients JAM (A) and AFL (B) showing the full extension of lesions.** A 3T MRI (Short TI Inversion Recovery—STIR—sequence) in JAM and a 1.5T MRI (T2-weighted sequence) in AFL show relatively similar lesion topographies involving the right striatocapsular region with inferior extension to the temporal stem, ventral insular cortex, and inferior fronto-occipital fasciculus. Note the superior extension of the lesions to the arcuate fasciculus and the white matter underneath the supramarginal gyrus. A schematic representation of the full extension of lesions **(C)** is depicted in axial MRIcron sections (Rorden, 2005) in JAM (blue lines) and AFL (red lines). The right side is shown on the left.

**FIGURE 2 | (A)** Uninflated surface of the cerebral hemispheres (FreeSurfer reconstruction) depicting gyri in green and sulci in red. The right image shows a small cortical component of the haemorrhage (red) involving the right anterior insula and superior temporal gyrus. The diffusion tensor imaging reconstruction of the arcuate fasciculus and inferior fronto-occipital fasciculus shows (left image) damage to the right long direct segment of the arcuate fasciculus (red) and the inferior fronto-occipital fasciculus (blue), with relative sparing of the short and long fibers of the anterior indirect segment (purple) and posterior segments (yellow), whereas the right image shows fully developed left perisylvian white matter pathways. **(B)** An anatomical axial MRI section (Short TI Inversion Recovery—STIR—sequence) shows the right striatocapsular lesion and perinecrotic tissue with degeneration of several white matter tracts (orange and blue arrows). AR indicates auditory radiations; TS, temporal stem; SMG, supramarginal gyrus; AG, angular gyrus; EmC, extreme capsule; vEmC, ventral extreme capsule; IFOF, inferior fronto-occipital fasciculus; AC, anterior commissure; AF-L, arcuate fasciculus-long segment.

#### *Results*

DTI was performed in patient JAM (**Figure 2**). DTI showed damage to the right long direct segment of the AF and the IFOF, with relative sparing of the anterior indirect and posterior segments of the AF, together with fully developed left AF and IFOF. Since DTI could not be performed in patient AFL, the white matter tracts affected by the lesion were identified with the aid of an atlas of human brain connections (Catani and Thiebaut de Schotten, 2012). The outline of white matter tracts in patient AFL suggested that both the right AF and the IFOF were damaged.

#### **LANGUAGE ASSESSMENT**

While both patients had Wernicke's aphasia in the subacute period, language deficits were more severe in AFL than in JAM. This was also reflected in the chronic period by the scores obtained on the Western Aphasia Battery (WAB) (Kertesz, 1982): JAM had an Aphasia Quotient of 79.6 (mild to moderate aphasia) and AFL of 56.4 (moderate to severe aphasia). JAM showed a combination of fluent and well-articulated spontaneous speech with rare phonemic paraphasias and occasional approximations to target words to repair errors (*conduite d'approche*), preserved auditory comprehension except for sequential commands, and impaired repetition of multisyllabic words and sentences. Naming was relatively preserved. His WAB scores (fluency: 9, comprehension: 7.4, repetition: 6.2, naming: 9.2) were consistent with the diagnosis of CA (Kertesz, 1982). AFL showed fluent and well-articulated speech with fragments of phonemic jargon mixed with occasional normal utterances. Comprehension of sequential commands, sentence repetition, and naming were moderately impaired. Her WAB scores (fluency: 7, comprehension: 6.6, repetition: 5.7, naming: 3.9) were consistent with the diagnosis of CA (Kertesz, 1982).

#### **EXPERIMENTAL ASSESSMENTS**

To explore the interaction between phonology and lexical-semantic processing, both patients were evaluated using selected subtests from the Psycholinguistic Assessments of Language Processing in Aphasia (PALPA) (Kay et al., 1992; Valle and Cuetos, 1995; Kay and Terry, 2004) and a battery of experimental tests (Berthier, 2001).

#### **PHONOLOGICAL PROCESSING**

#### *Word pair discrimination*

*Method.* Four PALPA subtests were used to evaluate auditory processing for discriminating minimal pairs. These included Non-word Minimal Pairs (PALPA 1), Word Minimal Pairs (PALPA 2), Word Minimal Pairs Requiring Written Selection (PALPA 3), and Word Minimal Pairs Requiring Picture Selection (PALPA 4). The minimal pairs tests from the PALPA required same/different judgments for pairs of monosyllabic words/non-words that differed by a single phonetic feature (e.g., "sol-col" [sun-cabbage]). In half the trials, the two stimuli were identical and in half they were different.

*Results.* Both patients had abnormal performance on auditory discrimination of non-word minimal pairs, with relatively similar scores on same and different pairs in AFL, and with significantly better performance on same pairs than on different pairs in JAM [χ<sup>2</sup>(1) = 25.2, *p* < 0.0001], which resulted from his tendency to classify most pairs as identical. Performance was significantly better when discriminating identical minimal word pairs than different word pairs in both JAM [χ<sup>2</sup>(1) = 9.68, *p* = 0.002] and AFL [χ<sup>2</sup>(1) = 9.24, *p* = 0.002]. AFL had impaired performance on auditory discrimination of word minimal pairs requiring written selection (this test was not administered to JAM). Scores on word minimal pairs requiring picture selection were relatively preserved in JAM and AFL (**Table 1**).
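The comparisons reported here are 2 × 2 chi-square tests on counts of correct and incorrect responses per condition. A minimal sketch of such a test (Pearson chi-square with 1 df, no continuity correction; the counts in the example are illustrative, not the patients' raw data):

```python
def chi_square_2x2(correct_a, total_a, correct_b, total_b):
    """Pearson chi-square statistic (1 df) comparing correct/incorrect
    counts between two conditions, without continuity correction."""
    table = [[correct_a, total_a - correct_a],
             [correct_b, total_b - correct_b]]
    grand = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical counts: 30/36 correct on "same" pairs vs. 18/36 on "different" pairs
print(round(chi_square_2x2(30, 36, 18, 36), 2))  # → 9.0
```

In practice the p-value for the statistic would be read from the chi-square distribution with 1 degree of freedom (e.g., via `scipy.stats.chi2_contingency`); the pure-Python version is shown only to make the computation explicit.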

#### **RHYME JUDGMENTS**

#### *Method*

Three PALPA subtests were used to evaluate rhyme judgment in auditory/written (PALPA 15) and picture (PALPA 14) presentations. In each rhyme judgment task, two words were presented in the corresponding modality and the patient was required to say whether or not they rhymed (e.g., "tarta-carta" [cake-letter]). There were 40 trials divided equally between rhyming and non-rhyming pairs.

#### **Table 1 | Auditory and reading processing.**

*<sup>γ</sup>Test numbers follow the nomenclature of the original English version of the PALPA (see Kay and Terry, 2004), which is slightly different from the Spanish version. Numbers in parentheses indicate proportion of correct responses. \*\*Normative data from Valle and Cuetos (1995). Abnormal results are highlighted in gray. See further information in text.*

#### *Results*

The ability of both patients to make rhyme judgments was abnormal in all modalities of presentation (auditory and written words and pictures) (**Table 1**).

#### **LEXICAL PROCESSING**

#### *Lexical decision*

*Methods.* Word/non-word discrimination was assessed with the Auditory Lexical Decision: Imageability × Frequency (PALPA 5) and the Visual Lexical Decision: Imageability and Frequency (PALPA 25) subtests. The two versions were administered 2 weeks apart to prevent learning. These tests use 80 words of high and low imageability and high and low frequency, and 80 non-words derived from the real words by changing one or more letters. All non-words followed Spanish spelling rules and were pronounceable (Valle and Cuetos, 1995).

*Results.* JAM's performance on Auditory Lexical Decision was preserved for words (77/80) and non-words (75/80) [χ<sup>2</sup>(1) = 0.13, *p* = 0.718]. Misses occurred on three low-imageability/low-frequency items ("anger," "dogma," "satire"), whereas false alarms on non-words occurred on items derived from low-imageability words (**Table 1**). Although AFL's performance on this task was abnormal, she recognized words (67/80) and non-words (61/80) with similar efficiency [χ<sup>2</sup>(1) = 0.97, *p* = 0.325]. Most of her misses (e.g., "irony," "method," "satire") and false alarms occurred on low-imageability words and on non-words derived from low-imageability words. On Visual Lexical Decision, JAM had better recognition of words (78/80) than non-words (67/80) [χ<sup>2</sup>(1) = 7.31, *p* = 0.007]. A similar dissociation was found in AFL (words = 71/80; non-words = 36/80) [χ<sup>2</sup>(1) = 32.4, *p* = 0.0001].

#### **SINGLE WORD COMPREHENSION**

#### *Method*

Single word comprehension was assessed with the Spoken Word—Picture Matching (PALPA 47) and Written Word—Picture Matching (PALPA 48) tasks. The two versions were administered 2 weeks apart to prevent learning. These tasks required the patient to match a spoken or written word to one of five pictures (the target noun and four distractors: one closely semantic, one distantly semantic, one visual, and one unrelated).

#### *Results*

The performance of both patients was relatively preserved on the auditory and written presentations (**Table 1**).

#### **SENTENCE COMPREHENSION**

#### *Method*

Sentence comprehension was assessed using the Auditory Sentence Comprehension (PALPA 55) and Written Sentence Comprehension (PALPA 56) tasks. The two versions were administered 2 weeks apart to prevent learning. These tasks require matching an auditorily or visually presented sentence to one of three pictures (the target and two distractors). Several types of sentences were examined, including reversible (e.g., "The dog is approaching the girl") and non-reversible (e.g., "The dog is washed by the girl") sentences, active and passive sentences, directional and non-directional sentences, and gapped sentences.

#### *Results*

Both patients showed severely impaired performance in both auditory and written modalities of presentation. Their performance was similar for reversible and non-reversible sentences (**Table 1**).

#### **DIGIT PRODUCTION AND MATCHING SPAN**

#### *Method*

This was assessed with the Digit Production/Matching Span (PALPA 13).

#### *Results*

Both patients had restricted digit production and matching spans (**Table 2**).

#### **REPETITION OF WORDS AND NON-WORDS**

#### *Method*

Length, frequency, and imageability of words can influence the accuracy of repetition among aphasic patients. Studies in CA suggest that repetition of short words is better than repetition of multisyllabic and grammatical words (Goodglass, 1992; Nadeau, 2001). Therefore, performance on output phonological tasks was assessed with two repetition subtests [Repetition: Syllable Length (PALPA 7) and Repetition: Non-words (PALPA 8)]. These tests contain 24 words and 24 non-words of increasing length (3–6 letters). To further evaluate potential dissociations in repetition performance between words and non-words, the Repetition: Imageability × Frequency (PALPA 9) subtest was also administered. This test contains 80 words and 80 non-words presented in a mixed fashion. Words were grouped in four lists (20 items each) varying in frequency and imageability: high-frequency/high-imageability, high-frequency/low-imageability, low-frequency/high-imageability, and low-frequency/low-imageability words. The lists were matched for syllable length; items contained between one and four syllables. The non-words were matched to the words for phonological complexity. Errors in all repetition tasks were analyzed by two of us (ID-T, GD).

#### *Results*

Word repetition (PALPA 7) was mildly impaired in JAM (0.88) and AFL (0.83). Scores on word repetition were marginally better than those on non-words (PALPA 8) in JAM [χ<sup>2</sup>(1) = 3.72, *p* = 0.054], whereas performance on the two was similar in AFL [χ<sup>2</sup>(1) = 0.46, *p* = 0.494] (**Table 2**). On PALPA 9, no differences were found in JAM [χ<sup>2</sup>(1) = 1.51, *p* = 0.22], but AFL repeated words significantly better than non-words [χ<sup>2</sup>(1) = 6.02, *p* = 0.014]. Regarding word repetition on the PALPA 9 test, both patients repeated items of the four lists with relatively similar efficiency. Repetition of low-imageability/low-frequency words in JAM (0.70) and AFL (0.80) was slightly poorer than repetition of the other lists, but the differences did not reach significance. It should be noted that most non-words in the Spanish version of the PALPA 9 (Valle and Cuetos, 1995) have high word-likeness (Gathercole and Marin, 1996) because they are derived from words by exchanging a single consonant (*n* = 30; "pierna" [leg] → *pierla*) or vowel (*n* = 22; "hospital" [hospital] → *hospitel*). While word-likeness increases the likelihood of lexicalization on repetition tasks in patients with typical CA and left hemisphere damage (Saito et al., 2003), this was not the case in our patients, as lexicalizations during non-word repetition (PALPA 9) were rare (JAM: 4/80 [0.05]; AFL: 5/80 [0.06]).

**Table 2 | Auditory processing: repetition of digits, single words, and non-words.**

*Numbers in parentheses indicate proportion of correct responses. \*\*Normative data from Valle and Cuetos (1995). Abnormal results are highlighted in gray. See further details in text.*

### **REPETITION: GRAMMATICAL CLASS AND MORPHOLOGY**

#### *Method*

Grammatical class (PALPA 10) and morphological endings (PALPA 11) were evaluated in both patients. PALPA 10 evaluates the effect of grammatical class. This test contains 80 words grouped in four categories (nouns, adjectives, verbs, and functors) of 20 items each. PALPA 11 evaluates whether repetition is affected by morphological endings. This test contains 60 words grouped in three lists (regular words and their controls, irregular words and their controls, and derived words and their controls).

#### *Results*

Scores on PALPA 10 ranged from mildly (0.80) to moderately (0.60) impaired in AFL and JAM, but repetition performance was not influenced by grammatical class. Repetition of words with different morphological endings was mildly impaired in AFL, but her performance was relatively similar regardless of the type of morphological ending. JAM showed low-average (0.90) repetition of irregular words and their controls, and moderately impaired (0.60) repetition of regular and derived words and their controls (**Table 2**).

#### **WORD PAIR REPETITION**

#### *Method*

To assess the influence of lexical-semantic information on repetition ability when the demands on auditory-verbal short-term memory are increased, both patients were asked to repeat word pairs (e.g., "house-flower") (*n* = 56). Patients were asked to repeat a total of 112 words immediately after auditory presentation, in a no-delay direct condition (Martin et al., 1996; Gold and Kertesz, 2001). The total list was composed of high-frequency/high-imageability (*n* = 28), high-frequency/low-imageability (*n* = 28), low-frequency/high-imageability (*n* = 28), and low-frequency/low-imageability (*n* = 28) words. Responses were scored for the number of word pairs repeated verbatim and for the number of words repeated accurately as a function of serial position (initial and final) in the pair, irrespective of whether the word pair was repeated accurately or not. The number of correct words, failures to respond, and semantic, phonological, formal, neologistic, perseverative, and unrelated lexical errors was evaluated.

#### *Results*

Performance on this task was moderately impaired in both patients. **Table 3** shows the number of word pairs that were repeated correctly. Further analyses disclosed that JAM correctly repeated 74 of the total 112 words (0.66). There was a serial position effect (initial = 43/56; terminal = 26/56) [χ<sup>2</sup>(1) = 9.58, *p* = 0.002], which may be attributable to his markedly reduced memory span (2 items). There were no effects of frequency/imageability. Abnormal responses, ordered by frequency of occurrence, included: failures to respond = 17 (0.44), phonological errors = 7 (0.19), neologisms = 5 (0.14), formal errors = 4 (0.11), unrelated errors = 4 (0.11), and perseverations = 1 (0.02). There were no semantic errors. AFL correctly repeated 67 of the total 112 words (**Table 3**). There was a marginally significant effect of frequency/imageability, since she showed better repetition of high-frequency/high-imageability word pairs than of high-frequency/low-imageability word pairs [χ<sup>2</sup>(1) = 3.51, *p* = 0.061], but there were no other differences. There were no serial position effects (initial = 30/56; terminal = 29/56) on word pair repetition, which may be attributable to her memory span (3 items). Her responses included phonological errors = 22 (0.49), neologisms = 11 (0.24), formal errors = 7 (0.16), failures to respond = 3 (0.07), unrelated errors = 1 (0.02), and perseverations = 1 (0.02). There were no semantic errors.

**Table 3 | Auditory processing: repetition of word pairs, word triplets, sentences, and clichés.**

*Numbers in parentheses indicate proportion of correct responses. Abnormal results are highlighted in gray. \*Taken from Berthier (2001) except test 12. \*\*Note that testing of word triplets in JAM was only partially administered. See further details in text.*

### **REPETITION OF WORD TRIPLETS**

#### *Methods*

Patients were also asked to repeat word triplets. This task is a modification of the one used by McCarthy and Warrington (1984, 1987) in patients with CA. In the present battery, two sets of 60 three-word strings (verb-adjective-noun) were created (Berthier, 2001). These were composed of word strings of increasing semantic richness, that is, from non-organized to organized semantic information. Two 20-triplet lists (List 1: 60 high-frequency words; List 4: 60 low-frequency words) consisted of random word combinations (e.g., "buy-sweet-country"). Two other 20-triplet lists (List 2: 60 high-frequency words; List 5: 60 low-frequency words) conveyed loosely constrained meaningful information (e.g., "defend-hero-gold"), and two further 20-triplet lists (List 3: 60 high-frequency words; List 6: 60 low-frequency words) conveyed closely constrained meaningful information (e.g., "cut-lovely-flower"). Words were read at a rate of one per second and patients were required to repeat them in the order given by the examiner. Responses were scored for the number of triplets repeated verbatim in each condition and for the number of words repeated accurately as a function of serial position (initial, medial, and final) in the triplet, irrespective of whether the whole triplet was repeated accurately or not. The number of correct words, failures to respond, and semantic, phonological, formal, neologistic, perseverative, and unrelated lexical errors was evaluated.
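The serial-position scoring described above can be sketched as follows. The function name and the example triplets are illustrative only; an empty string stands in for an omitted word.

```python
def serial_position_scores(targets, responses):
    """Proportion of words repeated correctly at each serial position
    (initial, medial, final), irrespective of whether the whole triplet
    was repeated verbatim. An empty string marks an omitted word."""
    n = len(targets)
    correct = [0, 0, 0]
    for target, response in zip(targets, responses):
        for pos in range(3):
            if pos < len(response) and response[pos] == target[pos]:
                correct[pos] += 1
    return [c / n for c in correct]

# Hypothetical trials: target triplets vs. a patient's responses
targets = [("cut", "lovely", "flower"), ("defend", "hero", "gold")]
responses = [("cut", "", "flower"), ("defend", "", "")]
print(serial_position_scores(targets, responses))  # → [1.0, 0.0, 0.5]
```

Note that neither example triplet is scored as correct on the verbatim criterion, yet the position-wise scores still recover partial performance, which is how the per-position percentages in the Results were obtained.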

#### *Results*

Performance on this task was severely impaired in both patients (**Table 3**). JAM failed to repeat any word triplet correctly (e.g., "read-new-book" → *read . . . don't know*). Since he became frustrated after repeated unsuccessful attempts, the task was discontinued after 10 consecutive failures in each list. Analysis of individual words during these interrupted trials indicated that JAM repeated more words in triplets rich in semantic relations than in the other lists, showing significantly better performance on high-frequency triplets than on low-frequency triplets [χ<sup>2</sup>(1) = 4.17, *p* = 0.041]. AFL could repeat only 8 out of 120 (0.07) word triplets correctly (List 2 = 3; List 3 = 4; List 6 = 1), and there was a trend toward better repetition of high-frequency than low-frequency word triplets [χ<sup>2</sup>(1) = 3.32, *p* = 0.068]. Analysis of semantic relatedness in the high-frequency lists (Lists 1, 2, and 3) showed a trend toward better repetition of List 3 (constrained meaningful information) than List 1 (random word combinations) [χ<sup>2</sup>(1) = 3.42, *p* = 0.064], but there were no further differences in the other high-frequency lists or in the low-frequency lists. Analyses of performance in repeating individual words disclosed that AFL produced more correct items while repeating high-frequency (75/180 [.42]) than low-frequency (25/180 [.12]) triplets [χ<sup>2</sup>(1) = 33.1, *p* < 0.0001]. For the sake of simplicity, error analysis in both patients was performed considering only the total number of errors, regardless of the list in which they occurred. Abnormal responses were ordered by frequency of occurrence. Note that since in JAM this task was interrupted after 10 consecutive failures in each list, only 180 words could be analyzed.
His responses were failures to respond = 144 (0.80), semantic errors = 5 (0.03), perseverations = 4 (0.02), phonological errors = 3 (0.02), unrelated errors = 2 (0.01), and neologisms = 1 (0.00). The total number of words (*n* = 360) could be analyzed in AFL, and her abnormal responses included phonological errors = 95 (0.36), neologisms = 50 (0.19), perseverations = 43 (0.16), failures to respond = 41 (0.16), formal errors = 13 (0.05), unrelated errors = 12 (0.05), and semantic errors = 6 (0.02). Although their responses contained a number of errors, neither patient produced meaning-based paraphrases. Performance according to serial position in the triplet was relatively similar for the initial (JAM = 0.3; AFL = 0.27), medial (JAM = 0.1; AFL = 0.23), and terminal (JAM = 0.7; AFL = 0.36) positions.

#### **REPETITION OF SENTENCES**

#### *Methods*

Sentence repetition was assessed with PALPA 12. This task evaluates the ability to repeat auditorily presented sentences (*n* = 36) of different lengths (from 5 to 9 words). It is composed of reversible (*n* = 20) and non-reversible (*n* = 16) sentences. Serial position curves were generated for all 7-word sentences (*n* = 18).

#### *Results*

Sentence Repetition (PALPA 12) was severely abnormal in both patients (**Table 3**). In fact, AFL failed to repeat a single sentence correctly, whereas JAM had less difficulty and could repeat some non-reversible sentences, yet his performance was severely abnormal (8/36 [.22]). Error analysis revealed that both patients omitted many words and mainly produced phonological errors; neologisms were also heard in AFL. Semantic paraphasias were not observed in AFL, but JAM produced rare semantic errors ("man" → *owner*) and semantic perseverations. Paraphrases of target sentences were conspicuously absent in AFL. In JAM there were no paraphrases in the strict sense, except for one difficult-to-classify sentence (sentence 17: "This dog has more cats to chase" → *This dog ... this cat, there are more to run*), in which the meaning of the original sentence was not fully replicated in the response (Saffran and Marin, 1975). Analyses of serial position curves for seven-word sentences revealed a tendency in JAM to repeat initial (items 1 and 2) and terminal (item 6) words correctly (range of correct responses for these positions: 60–80%), with frequent omissions (range of correct responses: 20–40%) of words in the mid-portion of sentences (items 3, 4, and 5). A more inconsistent pattern was seen in AFL, who tended to reproduce words in certain positions (items 1, 3, 5, and 6) more regularly (range of correct responses: 60–80%) than words in other positions (range of correct responses: 25–50%).

### **REPETITION OF CLICHÉS AND NOVEL SENTENCES**

#### *Method*

To explore a possible dissociation between the two types of sentences, JAM was asked to repeat familiar idiomatic Spanish sentences (clichés) (*n* = 40) taken from the 150 Famous Clichés of Spanish Language (Junceda, 1981), as well as a set of novel sentences (*n* = 40) constructed following the methodology described by Cum and Ellis (1999) and Berthier et al. (2011). For example, for the idiomatic cliché "Me lo dijo un pajarito" ("A little bird told me"), the novel control sentence "Me lo dijo mi compadre" ("My friend told me") was created. This task was not administered to AFL.

#### *Results*

JAM was moderately impaired on these tasks, obtaining relatively similar scores for both types of sentences. He rarely made paraphrases in novel sentence repetition (3/40 [.08]), and only one paraphrase (1/40 [.02]) was heard in repetition of idiomatic clichés ("Mess things up" → *Make a mess*).

#### **DISCUSSION**

We have described the profile of language deficits in two chronic aphasic patients. They performed poorly on input phonological tasks (minimal pairs, rhyme judgments) when stimuli were presented in the auditory and written modalities. Lexical-semantic processing of single words (lexical decision, comprehension) was relatively preserved in these input modalities, but both patients infrequently accessed meaning when asked to comprehend and repeat complex verbal messages. Indeed, relatively preserved single word repetition contrasted with severely impaired repetition of digits, non-words, word lists, sentences, novel phrases, and idiomatic clichés. In several instances, repetition was not significantly influenced by the frequency, imageability, or lexicality of stimuli. This atypical combination of language deficits could also be deemed uncommon because it occurred in two strongly right-handed patients with residual crossed CA associated with predominantly right striatal/capsular lesions also affecting the AF, IFOF, anterior commissure, and temporal stem. The distinctive features of this clinical-anatomical correlation are discussed below.

#### **CROSSED SUBCORTICAL APHASIA**

Crossed subcortical aphasia is a rare condition, to the extent that in a recent review of the literature only nine cases met criteria for "possibly reliable" or "reliable" diagnosis (De Witte et al., 2008). During the acute and early chronic periods, both JAM and AFL most likely had Wernicke's aphasia and left hemiplegia resulting from extensive right striatal/capsular lesions extending into the temporal stem/IFOF and supramarginal gyrus/AF. This clinical-anatomical correlation likely represents the right-sided analogue of the syndrome of Wernicke-type aphasia with right hemiparesis secondary to left subcortical injury originally described by Naeser et al. (1982). This syndrome, considered a rare entity (Wolfe and Ross, 1987) that usually occurs with atypical language deficits (Damasio et al., 1982), has not been well defined in crossed aphasic patients (Basso et al., 1985). In their original publication, Naeser and colleagues (1982) described three aphasic syndromes associated with left capsular/putaminal involvement and variable lesion extension to either anterior-superior, posterior, or both anterior-superior and posterior neighboring structures. Of these, the syndrome that best fits the one we found in JAM and AFL after right hemisphere injury is characterized by poor comprehension, fluent Wernicke-type speech, and lasting right hemiplegia in association with left capsular/putaminal damage and posterior lesion extension to the auditory radiations in the temporal stem (Cases 4, 5, and 6 in Naeser et al., 1982, pp. 8–10). In Naeser et al.'s (1982) case series, testing in the chronic period was possible in two patients and revealed improvement in all language modalities in one patient and no change in the other.

Our patients may be interpreted as presenting "mirror image" crossed CA (Alexander et al., 1989a) for two reasons: (1) surface symptoms and lesion topography similar to those of the syndrome described after left hemisphere involvement; and (2) gradual resolution of language deficits from receptive aphasia to a less severe CA, as is regularly described in cases with Wernicke's aphasia and left hemisphere lesions (Goodglass, 1992). Regrettably, in the aphasic patients with left "capsular/putaminal with posterior lesion extension" described by Naeser et al. (1982), language deficits (including repetition) were only succinctly described, making it hard to establish whether or not their intrinsic characteristics were typical. Increasing our understanding of this issue is desirable because evaluation of repetition deficits in patients with "mirror image" crossed CA has been performed only in patient EDE, who unexpectedly showed atypical performance on word list and sentence repetition (Berndt et al., 1991). This would mean that repetition deficits in some cases with right-hemisphere language dominance deviate from the classical pattern reported in similar cases with left hemisphere dominance because the neural organization of language in the former is different. Regrettably, the scarcity of similar well-studied cases and the reported heterogeneity in demographic and clinical-anatomic variables prevent further elaboration. It suffices to say that atypical neural organization of language in the right hemisphere may apply to patient EDE, with right temporal-parietal involvement (Berndt et al., 1991), but possibly not to ORF, a left-handed conduction aphasic patient with right parietal damage and good access to meaning during word list and sentence repetition (McCarthy and Warrington, 1984).

It is even more difficult to interpret the finding of atypical language deficits in our crossed aphasic patients with striatal/capsular involvement, both because atypical language deficits are common in left subcortical aphasia (Albert et al., 1981; Damasio et al., 1982; Fromm et al., 1985) and because the role of the left basal ganglia in language deficits is still controversial (Damasio et al., 1982; Naeser et al., 1982; Cappa et al., 1983; Nadeau and Crosson, 1997). Most studies evaluating subcortical stroke provided evidence against a prominent role of the basal ganglia in language and instead attributed language deficits to the deleterious effect of subcortical involvement on the overlying cortex (Nadeau and Crosson, 1997; Hillis et al., 2002; Radanovic and Scaff, 2003; de Boissezon et al., 2005; Choi et al., 2007). One study on vascular aphasia secondary to left subcortical lesions mainly affecting the striatum ascribed lexical-semantic deficits to dysfunction of the basal temporal language area and IFOF (de Boissezon et al., 2005). Anatomical data in our patients with crossed CA also suggest that the pattern of language deficits (impaired sentence comprehension, sentence repetition) may be linked to damage to the right basal temporal language area and white matter tracts rather than to the striatocapsular lesions themselves.

#### **DISSOCIATED STRUCTURE-FUNCTION RELATIONSHIPS IN CROSSED SUBCORTICAL APHASIA?**

There is some evidence that the AF is asymmetric, being larger in volume and having a higher fiber density in the left hemisphere than in the right (Parker et al., 2005; Powell et al., 2006; Vernooij et al., 2007; Catani and Mesulam, 2008; Axer et al., 2012; Catani and Thiebaut de Schotten, 2012). Combining DTI and fMRI in a small group of strongly right-handed healthy subjects, Powell et al. (2006) demonstrated for the first time that greater development of left hemisphere white matter tracts in comparison with their homologous counterparts correlated with left-sided lateralization of language function. Although this structure-function correspondence has been replicated in subsequent studies (Matsumoto et al., 2008; Saur et al., 2008), other studies variously combining DTI with fMRI, the Wada test, or other ancillary methods (resting-state functional connectivity analysis) have questioned the long-held assumption that leftward asymmetry in the volume of cortical areas (planum temporale) and white matter pathways underlies functional lateralization (see references in Vernooij et al., 2007; Turken and Dronkers, 2011). In complementary terms, differences in the intra- and inter-hemispheric architecture and function of perisylvian white matter tracts exist and might account for the distinct performance in verbal repetition in healthy subjects (Catani et al., 2007) and in patients presenting with contrasting aphasic deficits (conduction aphasia *versus* transcortical aphasias) (Catani et al., 2005; Berthier et al., 2012). In fact, DTI studies reveal intra- and inter-hemispheric variability of the white matter pathways underpinning repetition, most notably of the AF/superior longitudinal fasciculus (SLF) (Nucifora et al., 2005; Catani and Mesulam, 2008; Gharabaghi et al., 2009; Friederici and Gierhan, 2013).
Leftward-biased asymmetry of the AF/SLF predominates in males and usually coexists with absent or vestigial development of its long segment in the right hemisphere (Catani et al., 2005; Powell et al., 2006; Catani and Mesulam, 2008; Thiebaut de Schotten et al., 2011; Catani and Thiebaut de Schotten, 2012; Häberling et al., 2013), although at least one study found the left hemisphere architecture and connectivity reproduced in the right hemisphere (Gharabaghi et al., 2009). Another study found reversed asymmetry of the AF in healthy males with right hemisphere language lateralization (Häberling et al., 2013). More symmetric patterns (bilateral-left and bilateral) of the AF/SLF prevail in females (∼40%), and some researchers consider that other white matter bundles (IFOF) are also less lateralized than the dorsal stream, but this has not been confirmed in all studies (Cao et al., 2003; Rodrigo et al., 2007). Regarding the function of the AF/SLF, recent studies using the Wada test (Matsumoto et al., 2008) or fMRI (Saur et al., 2008) documented leftward lateralization in subjects with left hemisphere dominance for language; however, it has also been shown that left-handers with right hemisphere language dominance (as seen using fMRI) (Vernooij et al., 2007) actually have a left-lateralized AF. Taken together, these latter findings align with the hypothesis that lateralized hemispheric function is not always guided by structural asymmetry (Wada, 2009). In support of this view, we found a dissociation between structure and function in JAM. The extensive right subcortical lesion in JAM hindered not only the comparison of inter-hemispheric AF and IFOF architecture but also the possibility of ruling out a reversal of the anatomical asymmetry.
Nevertheless, DTI identified well-developed residual components (anterior indirect and posterior segments) of the right AF/SLF that had escaped tissue damage, together with fully developed AF and IFOF in the left hemisphere, suggesting symmetric or leftward lateralization. Despite this structural arrangement, JAM had right hemisphere dominance for language, as reflected by his severe and long-lasting repetition disorder resulting from damage to the right AF/SLF and IFOF. Our study did not provide direct evidence of the functional activity of the left white matter tracts (AF, IFOF), yet the persistence of severe deficits in repeating non-words, word lists and sentences and in accessing meaning during both sentence comprehension and repetition 1 year after stroke onset suggests that natural and therapy-based compensation of these deficits by means of the fully developed left white matter tracts was negligible. Further studies are clearly needed, however, to establish structure-function relationships among individuals with atypical language lateralization.

#### **IS REPETITION ATYPICAL IN CROSSED SUBCORTICAL APHASIA?**

In both JAM and AFL, word repetition scores ranged from normal to mildly impaired, but their performance in non-word repetition was markedly abnormal, a profile generally described in patients with CA and left hemisphere damage (Caplan and Waters, 1992; Goodglass, 1992). Functional neuroimaging in healthy subjects shows activation of superior temporal and premotor cortices bilaterally during single word repetition, whereas non-word repetition activates the same cortical regions mostly in the left hemisphere (Weiller et al., 1995; Saur et al., 2008). Studies combining fMRI with DTI reveal an interaction between superior temporal and premotor areas during sublexical repetition via the AF/SLF (Saur et al., 2008). Based on these observations, the likely mechanism accounting for the superior performance of JAM and AFL in repeating words over non-words may be the conjoint activity of residual areas of the injured right hemisphere and the intact left hemisphere (Weiller et al., 1995; Ohyama et al., 1996; Abo et al., 2004). Poor non-word repetition may be the expected consequence of right hemisphere damage with limited possibility of natural left hemisphere compensation. In support of this, lesion analysis in both patients and DTI findings in JAM showed massive involvement of the long direct segment of the AF, which is normally engaged in auditory/phonological transcoding (word and non-word repetition) (Catani et al., 2005; Saur et al., 2008; Catani and Thiebaut de Schotten, 2012; Cloutman, 2012; Friederici and Gierhan, 2013). It should be noted, however, that their performance in other repetition tasks differed in a number of important respects from typical CA associated with left hemisphere lesions (Saffran and Marin, 1975; McCarthy and Warrington, 1984, 1987; Martin, 1996; Martin and Saffran, 1997; Gold and Kertesz, 2001; Bartha and Benke, 2003).
Repetition in phonologically-impaired patients with left hemisphere involvement (e.g., CA) is generally reliant on lexical-semantic processing (McCarthy and Warrington, 1984, 1987; Martin and Saffran, 1997; Jefferies et al., 2007). The use of this alternative strategy increases the likelihood of producing word errors (formal paraphasias) and semantic errors, particularly in highly demanding tasks such as immediate serial repetition of word lists and sentences and delayed repetition (Martin et al., 1994; Martin, 1996; Gold and Kertesz, 2001; Jefferies et al., 2006). Additionally, reliance on lexical-semantic processing in some conduction aphasic patients with severely abnormal phonological processing is manifested by "part of speech" effects (e.g., nouns are repeated better than verbs) and production of semantic paraphasias ("necklace" → *gold*) during single word repetition (deep dysphasia) (Michel and Andreewsky, 1983; Katz and Goodglass, 1990; Butterworth and Warrington, 1995; Martin, 1996; Martin et al., 1996; Ablinger et al., 2007; Jefferies et al., 2007). Such overreliance on lexical-semantic processing allows CA patients to perform better on repetition tasks tapping these functions than on tasks taxing phonological processing. In this vein, patients with typical CA show better repetition of low-frequency words embedded as the last word in a sentence than when the same word is presented in isolation (McCarthy and Warrington, 1984). Abnormal performance in repeating meaningless word lists by conduction aphasics improves when the meaningfulness of the lists is increased (McCarthy and Warrington, 1987), and these patients are also better able to repeat novel sentences, which require access to meaning, than over-learned idiomatic clichés (McCarthy and Warrington, 1984; Berthier, 1999).
Finally, verbatim repetition of word lists and sentences poses serious difficulties for conduction aphasics because their impaired capacity to hold the phonological trace in auditory-verbal short-term memory forces them to process sentences by meaning and to produce paraphrases of the target sentence during repetition (Saffran and Marin, 1975; Martin, 1993; Bartha and Benke, 2003).

Our patients repeated words more accurately than non-words, and in one patient (JAM) stimulus length influenced the dissociation between word and non-word repetition more than frequency/imageability did, whereas the reverse pattern held for the other patient (AFL). Nevertheless, the other above-mentioned features of typical CA did not occur in all repetition tasks in our patients. Indeed, frequency/imageability and grammatical class had no influence on single word repetition performance, although we acknowledge that in one such task (imageability/frequency) both patients obtained high scores that may have attenuated differences due to ceiling effects. This effect was not observed in JAM in the other task (grammatical class), however. Word pair repetition was moderately impaired, and a marginal effect of frequency/imageability was found only in AFL. Moreover, during word pair repetition the patients produced more omissions and phonological errors than formal errors, and there were no semantic paraphasias, a pattern of performance that differs from the "lexical bias" (formal and semantic errors *>* phonological errors) reported in patients with typical CA and left hemisphere damage (Gold and Kertesz, 2001). Since word triplet repetition was extremely poor in both patients, we analyzed the accuracy of individual words within triplets. There was an influence of frequency in both patients, who produced more correct items while repeating high-frequency than low-frequency lists. Moreover, they accurately repeated more individual words in triplets containing meaningful semantic information than in the other conditions, implying that accurate repetition required semantic support.
However, reliance on lexical-semantic processes could be deemed incomplete because neither patient produced meaning-based paraphrases (e.g., "eat-delicious-apple" → *eat-juicy-fruit*), at variance with what is frequently reported in patients with typical CA during repetition of two- and three-word lists (Gold and Kertesz, 2001; Berthier et al., 2012). Repetition of sentences from PALPA 12 was severely impaired in both patients, and again paraphrases of target sentences were absent in AFL, while JAM rarely produced ill-formed paraphrases in this task, in novel sentences and in clichés. Limited lexical-semantic access during word triplet and sentence repetition is in accord with findings from the two previous cases of crossed CA (Berndt et al., 1991; Berthier et al., 2011). Moreover, the superior repetition of novel sentences over idiomatic clichés previously reported in typical CA patients (McCarthy and Warrington, 1984), reflecting overreliance on lexical-semantic processes, was not observed in JAM (this test was not administered to AFL). Finally, it should be noted that JAM relied more on lexical-semantic processes in other output modalities (reading and spelling) (De-Torres et al., *in preparation*), a dissociation already reported in other patients with "deep" disorders (e.g., Miceli et al., 1994; Jefferies et al., 2007). Analysis of further cases is clearly needed to examine whether or not the interactions between phonological and lexical-semantic systems in crossed CA are dysfunctional.

If one accepts that JAM, AFL and the two previously published cases, EDE (Berndt et al., 1991) and JNR (Berthier et al., 2011), had limited access to meaning at least during sentence comprehension and repetition, the question that arises is which neural mechanisms are dysfunctional. Analysis of the available brain images in these two previous cases, the outline of white matter tracts with the aid of a fiber tract atlas (Catani and Thiebaut de Schotten, 2012) in JAM and AFL, and DTI analysis in JAM revealed that cortical and subcortical lesions consistently compromised the right dorsal (AF) and ventral (IFOF) auditory processing streams in all cases. DTI in JAM disclosed damage to the right long direct segment of the AF and the IFOF with relative sparing of the anterior indirect and posterior segments, together with fully developed left AF and IFOF. Although DTI could not be performed in AFL (she was studied in 1997), her anatomical T1-weighted images were compared with a human atlas of fiber tract connections (Catani and Thiebaut de Schotten, 2012) and revealed compromise of the AF and IFOF. The role of the dorsal language stream (AF/SLF) is to monitor auditory-motor integration of speech by allowing fast and automated preparation of copies of the perceived speech input (Saur et al., 2008; Peschke et al., 2009; Rijntjes et al., 2012). Some components of this long-distance bundle have also been linked to attention and the short-term maintenance of phonological traces (Majerus, 2013). The ventral language pathways (inferior longitudinal fasciculus, IFOF and uncinate fasciculus) participate in comprehension by mapping sounds onto meaning (Saur et al., 2008; Peschke et al., 2009; Weiller et al., 2011; Cloutman, 2012), although the precise functional role of each tract is still controversial (Duffau et al., 2009; Harvey et al., 2013).
These white matter bundles are engaged in different language functions (Hickok and Poeppel, 2004; Rolheiser et al., 2011; Weiller et al., 2011; Cloutman, 2012; Friederici and Gierhan, 2013) but interact in a synergistic way (Rolheiser et al., 2011; Cloutman, 2012; Majerus et al., 2012; Majerus, 2013), so that phonological sequencing and articulation in the dorsal stream operate in concert with semantic information from the ventral stream to guarantee efficient production and comprehension of language (Turken and Dronkers, 2011; Cloutman, 2012; Rijntjes et al., 2012; Friederici and Gierhan, 2013). Therefore, impaired sentence comprehension and impaired repetition of non-words, word lists and sentences in JAM and AFL may be ascribed to simultaneous damage to the dorsal (AF) and ventral (IFOF) streams.

JAM, AFL and the two previous cases, EDE and JNR (Berndt et al., 1991; Berthier et al., 2011), also had variable cortical involvement, which certainly contributed to the observed deficits. Right temporo-parietal involvement (large in EDE and JNR, mild to moderate in JAM and AFL) was heterogeneous but consistently included the right ventral temporal cortex encompassing the temporal stem and its adjoining auditory and visual white matter tracts. Comprehension deficits in acute (Naeser et al., 1982; Kümmerer et al., 2013) and chronic aphasia (Alexander et al., 1989b; Sharp et al., 2004) have been correlated with dysfunction of the ventral temporal cortex and interruption of long-distance association (ventral stream—IFOF) and commissural (anterior commissure) cortico-cortical pathways (Sharp et al., 2004; Warren et al., 2009; Turken and Dronkers, 2011; Weiller et al., 2011; Cloutman, 2012; Friederici and Gierhan, 2013). Functional neuroimaging and brain stimulation studies have also found that the basal temporal cortex, the frontal operculum and the ventral stream are strongly engaged in lexical-semantic and syntactic processing (Nobre et al., 1994; Sharp et al., 2004; Warren et al., 2009; Rolheiser et al., 2011; Weiller et al., 2011; Koubeissi et al., 2012; Friederici and Gierhan, 2013). In consonance with these data, our patients and the two previously published cases (Berndt et al., 1991; Berthier et al., 2011) had preserved auditory and written comprehension for single words but not for sentences presented in these input modalities. The basal ganglia components of the lesions in our patients involved the anterior commissure (Warren et al., 2009; Catani and Thiebaut de Schotten, 2012) and probably interrupted functional connectivity between homologous regions of the anterior and medial temporal cortex, thus preventing access to meaning in the left temporal cortex during sentence comprehension/production (Umeoka et al., 2009; Warren et al., 2009).
In addition, tissue damage to the right basal temporal cortex is highly likely to disrupt its reciprocal connectivity with the posterior-superior temporal gyrus, further hampering phonological processing (Ishitobi et al., 2000; Koubeissi et al., 2012). Therefore, it seems that damage to these structures might have impeded, in our patients, the compensatory recruitment of the lexical-semantic system in the service of repetition that is usually observed in patients with chronic CA and left hemisphere damage.

#### **LIMITATIONS**

One important shortcoming of our study is that formal language evaluations could be performed only in the chronic period. This precluded determining whether some functions were spared (e.g., single word comprehension) because they were unaffected by tissue damage, or whether they were abnormal in the early stages and recovered later on, reflecting the action of compensatory mechanisms associated with either brain repair or the recruitment of alternative brain areas. Future studies in aphasic patients like the ones described here should be longitudinal, initiated soon after brain damage, and complemented with multimodal imaging (e.g., fMRI, arterial spin labeling, positron emission tomography) to evaluate the dissociation of language functions and also to rule out remote effects in the contralateral hemisphere.

#### **CONCLUDING REMARKS**

In conclusion, our findings reveal that patients with crossed CA and right striatal/capsular lesions extending inferiorly into the temporal stem and IFOF and superiorly into the AF and the white matter beneath the supramarginal gyrus may show limited access to lexical-semantic information during word list and sentence repetition. Interruption of the long direct segment of the right AF might account for the abnormal performance in word and non-word repetition. Damage to the right ventral stream (IFOF) running between the insular cortex and the putamen might be responsible for the impairment of the lexical-semantic and syntactic processing necessary for accurate sentence comprehension and repetition. In addition, the involvement of the right basal temporal cortex (temporal stem, basal language area) may have severed commissural pathways (anterior commissure), disrupting functional connectivity with its homologous counterpart and further limiting access to meaning during sentence comprehension/production (Umeoka et al., 2009; Warren et al., 2009), as well as connectivity with the posterior-superior temporal gyrus, disturbing phonological processing (Ishitobi et al., 2000; Koubeissi et al., 2012). Further analysis of individuals with right hemisphere language dominance is needed to enhance our understanding of the role of white matter tracts in language repetition.

#### **ACKNOWLEDGMENTS**

We appreciate the cooperation shown by the patients and their families. We thank Julian Hinojosa for his participation in the linguistic evaluation of patient AFL and Francisco Alfaro for performing MRIs.

#### **REFERENCES**




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 June 2013; accepted: 26 September 2013; published online: 18 October 2013.*

*Citation: De-Torres I, Dávila G, Berthier ML, Froudist Walsh S, Moreno-Torres I and Ruiz-Cruces R (2013) Repeating with the right hemisphere: reduced interactions between phonological and lexical-semantic systems in crossed aphasia? Front. Hum. Neurosci. 7:675. doi: 10.3389/fnhum.2013.00675*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 De-Torres, Dávila, Berthier, Froudist Walsh, Moreno-Torres and Ruiz-Cruces. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Predicting speech fluency and naming abilities in aphasic patients

#### *Jasmine Wang, Sarah Marchina, Andrea C. Norton, Catherine Y. Wan and Gottfried Schlaug\**

*Neuroimaging and Stroke Recovery Laboratory, Department of Neurology, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, MA, USA*

#### *Edited by:*

*Marcelo L. Berthier, University of Malaga, Spain*

#### *Reviewed by:*

*Karine Marcotte, Université de Montréal, Canada*
*Dorothee Kuemmerer, Universitaetsklinik Freiburg, Germany*
*Ana I. Ansaldo, Université de Montréal, Canada*

#### *\*Correspondence:*

*Gottfried Schlaug, Department of Neurology, Beth Israel Deaconess Medical Center and Harvard Medical School, 330 Brookline Avenue, Boston, MA 02215, USA e-mail: gschlaug@bidmc.harvard.edu*

There is a need to identify biomarkers that predict the degree of chronic speech fluency/language impairment and the potential for improvement after stroke. We previously showed that the arcuate fasciculus lesion load (AF-LL), a combined variable of lesion site and size, predicted speech fluency in patients with chronic aphasia. In the current study, we compared lesion loads of such a structural map (i.e., AF-LL) with those of a functional map [i.e., the functional gray matter lesion load (fGM-LL)] in their ability to predict speech fluency and naming performance in a large group of patients. The fGM map was constructed from functional brain images acquired during an overt speaking task in a group of healthy elderly controls. The AF map was reconstructed from high-resolution diffusion tensor images, also acquired in a group of healthy elderly controls. In addition to these two canonical maps, a combined AF-fGM map was derived by summing the fGM and AF maps. Each canonical map was overlaid with the individual lesion masks of 50 chronic aphasic patients with varying degrees of impairment in speech production and fluency to calculate a functional and a structural lesion load value for each patient, and these values were regressed against measures of speech fluency and naming. We found that both AF-LL and fGM-LL independently predicted speech fluency and naming ability; however, AF lesion load explained most of the variance for both measures. The combined AF-fGM lesion load did not have higher predictability than either AF-LL or fGM-LL alone. Clustering and classification methods confirmed that AF lesion load was best at stratifying patients into severe and non-severe outcome groups, with 96% accuracy for speech fluency and 90% accuracy for naming.
An AF-LL greater than 4 cc was the critical threshold that determined poor fluency and naming outcomes and defined the severe outcome group. Thus, surrogate markers of impairment have the potential to predict outcomes and can be used as stratifiers in experimental studies.

**Keywords: aphasia, fluency, outcome, therapy, lesion size/volume, diffusion tensor imaging, functional MRI**

#### **INTRODUCTION**

Aphasia is a common symptom after left hemisphere stroke, and affected individuals often experience incomplete recovery despite receiving intense speech therapy after the acute stroke phase (Kertesz and McCabe, 1977; Wade et al., 1986; Pedersen et al., 1995; Engelter et al., 2006). Most natural recovery and traditional speech therapy-facilitated recovery from aphasia occurs during the first 6 months following a stroke (Nicholas et al., 1993; Moss and Nicholas, 2006; Lazar et al., 2010), although significant improvements in language functions have been described in case studies and in chronic patients undergoing intense and experimental therapies (Meinzer et al., 2005, 2007; Fridriksson et al., 2012; Zipse et al., 2012). Factors that can determine a patient's recovery from aphasia include lesion size and lesion site (Lazar and Antoniello, 2008; Marchina et al., 2011), as well as the initial level of impairment (Lazar et al., 2010). Other factors such as age, gender, degree of hemispheric language laterality, and small vessel ischemic lesion burden are also likely to play a role, but their significance in explaining some of the variance in outcome has not been well examined in larger-scale studies.

Voxel-based lesion symptom mapping has been used in the past to relate lesions to particular language behaviors in aphasic patients (Borovsky et al., 2007; Turken et al., 2008; Baldo et al., 2012; Magnusdottir et al., 2012). Our previous work took this approach one step further and related lesion volume to a speech- and language-relevant anatomical structure, creating a lesion load variable for the AF, which proved to be a better predictor of speech production than lesion volume alone. Marcotte et al. (2012) also found that lesion volume *per se* was not a correlate of recovery in anomic patients. Introduced by Zhu et al. (2010), lesion load is a combined variable of lesion size and site that measures the effects of a lesion on easily definable and clinically relevant anatomical structures, such as white matter tracts derived from diffusion tensor imaging. The lesion load measure can serve both as a biomarker of speech fluency impairment and as a predictor of aphasia outcome after a stroke. The method entails overlapping canonical probabilistic maps of a white matter tract (derived from diffusion tensor imaging) with patients' stroke lesion masks. One such speech-related tract is the arcuate fasciculus (AF), known from previous studies to play a critical role in the feedforward and feedback control of speech production (Breier et al., 2008; Hosomi et al., 2009; Saur et al., 2010b). The AF may have direct components (i.e., connections between temporal and inferior frontal brain regions) as well as indirect components (i.e., connections between temporal and parietal regions, and then between parietal and frontal regions) (Catani et al., 2005). The horizontal portion of the AF mingles with the superior longitudinal fasciculus (SLF). Our understanding of the functional role of the AF in speech fluency/production and in language functions in general is still evolving.
It is thought that the AF is not only involved in auditory-motor mapping, including the feedforward and feedback control of speech-motor functions, but may also play a role in more domain-general functions (Dick and Tremblay, 2012), such as syntactic processing, comprehension, and perception (Glasser and Rilling, 2008; Rolheiser et al., 2011).

On the other hand, the contributions of the ventral white matter tracts, i.e., the extreme capsule (EMC) and the uncinate fasciculus (UF), to speech production remain unclear; despite fMRI and DTI evidence of the ventral stream's role in speech comprehension (Hickok and Poeppel, 2007; Saur et al., 2008, 2010b), lesion mapping and cortical and subcortical stimulation studies suggest that the ventral stream tracts (particularly the UF) do not play a dominant role in speech fluency and speech-motor functions (Duffau et al., 2009, 2013; Marchina et al., 2011; Moritz-Gasser and Duffau, 2013). Marchina et al. (2011) expanded upon our original lesion load approach, first applied to motor outcomes (Zhu et al., 2010), by overlaying lesions onto canonical probabilistic maps of the AF, EMC, and UF in 30 chronic patients with aphasia. They found that the arcuate fasciculus lesion load (AF-LL) best predicted speech fluency and naming, and that the EMC and UF lesion loads did not significantly correlate with measures of speech fluency and naming outcomes. In the present study, we re-examined possible contributions of the EMC and UF lesion loads using a larger group of patients; in addition, we updated the canonical maps of the AF, EMC, and UF tracts, which are now derived from probabilistic tracking in normal controls in lieu of the deterministic tracking used in Marchina et al. (2011).

Although the AF-LL has been shown to be a surrogate white matter marker of speech fluency after stroke (Marchina et al., 2011), speech production impairment and language recovery have also been related to the pattern of intact perilesional gray matter regions (Fridriksson, 2010; Fridriksson et al., 2012). In patients with relatively small left hemisphere lesions, particularly those sparing perisylvian regions of the temporal and inferior frontal cortices and allowing for reperfusion/recovery of those regions, recovery-related functional imaging changes are typically found in perilesional cortex (Heiss et al., 1997; Rosen et al., 2000; Crosson et al., 2007). Fridriksson et al. (2012) also found that the activation of perilesional areas within the language network was related to improvement in a naming task. While the contribution of contralesional homolog cortical activations to recovery remains unclear (Heiss et al., 1999; Baumhauer et al., 2008; Meinzer et al., 2008; Bantis et al., 2010; Saur et al., 2010a,b; Schlaug et al., 2010), functional imaging studies in healthy controls suggest that both hemispheres are involved in the production and control of speech output when the rate of production is slow (Ozdemir et al., 2006). However, various studies have shown that speech functions are mostly left-lateralized (Knecht et al., 2000; Turken and Dronkers, 2011). For the current study, we defined a functional gray matter (fGM) map comprising cortical brain regions active during speech production, and applied the lesion load method to this surrogate marker of lesion site and size. In addition, we tested the predictive ability of a combined AF-fGM map, created by summing the fGM and AF maps.

Since complex language functions such as fluency, conversation, and naming depend on a cortical network of brain regions and their connections through white matter tracts, lesion map variables can only serve as surrogate markers of normal or impaired language function and thus do not allow us to draw firm conclusions that specific language functions are associated with particular structures. While a surrogate lesion marker may implicate a structure as playing an important role in the network of brain regions, that structure should not be assumed to be the seat of the function.

The aim of the current study was to examine three surrogate biomarkers, a structural white matter lesion load (AF-LL), a functional gray matter lesion load (fGM-LL), and a combined structural and functional lesion load (AF-fGM-LL), in their ability to predict speech fluency and naming performance in a large group of chronic aphasic patients. In addition, we aimed to replicate the findings of our previous study comparing AF, EMC, and UF lesion load predictions of speech fluency and naming, using the updated probabilistic tracts. Lastly, we examined whether we could identify a lesion load threshold that differentiates severely affected patients from less severely affected patients using a receiver operating characteristic (ROC) approach.

#### **MATERIALS AND METHODS**

#### **PATIENT GROUP**

The patient group comprised 50 chronic stroke patients [mean age: 55 (*SD*: 11), 10F, 40M] (**Table 1**); thirty of these had been included in a previous study correlating AF-LL with measures of speech fluency (Marchina et al., 2011). All patients had some degree of non-fluent aphasia in the subacute stroke phase (according to a review of medical records), but showed varying degrees of recovery at their assessment timepoint (all patients were at least 6 months post-stroke, with a median of 16 months post-stroke). Demographic, language testing, and lesion data are presented in **Table 1**. Excluded from the study were patients with bi-hemispheric or brainstem infarcts, primary intracerebral hemorrhages, previous strokes identified either by MRI or medical record (besides the stroke that caused the aphasia), concomitant neurological diseases/disorders, other aphasic syndromes such as pure anomia or global aphasia with severe reduction in speech output and severe comprehension deficits [defined as scoring less than 20% correct on the Auditory Comprehension subtest of the Boston Diagnostic Aphasia Examination (BDAE) (Goodglass et al., 1983)], or significant cognitive impairment [defined as less than 50% correct on the Raven's Colored Progressive Matrices (RCPM) (Raven, 1995)]. Our local Institutional Review Board approved this protocol and all subjects gave informed consent.

#### **CONTROL GROUP**

Healthy subjects, age-matched with the patient group, were recruited in order to create canonical functional and structural maps.

**Table 1 | Patient Characteristics.**

Functional MR images from one group of 12 healthy controls [mean age: 52 (*SD*: 13.9), 7M, 5F] were acquired during a speech production task and used to create canonical maps of activated gray matter (fGM). High-resolution diffusion tensor images (DTI) from another group of 12 age-matched healthy controls [mean age: 58 (*SD*: 13.9), 8M, 4F] were used to create probabilistic, canonical maps of white matter tracts (AF, EMC, and UF) via probabilistic tracking. All healthy elderly control participants were right-handed, native speakers of English who scored within the normal range on the Shipley/Hartford Verbal and Abstraction subtests (Shipley, 1940), which have been shown to be a predictor of IQ (Paulson and Lin, 1970). Our group of normal healthy control subjects was not tested on any fluency measures or naming tests. However, published data from a healthy control group suggest that CIUs/min can range from 92 to 175 and words/min from 105 to 198 (Nicholas and Brookshire, 1993). Our patients, even the well-recovered ones, were well below those ranges (see **Table 1**).

#### **BEHAVIORAL MEASURES**

All patients underwent a battery of language tests to assess spontaneous speech production, naming, repetition, and comprehension, although the focus of this study was on speech and fluency measures. Conversational speech production was measured using the Correct Information Unit method (CIU) (Nicholas and Brookshire, 1993), and naming ability was assessed by the Boston Naming Test (BNT) (Goodglass and Kaplan, 1983).

In brief, speech fluency was assessed by transcribing videotaped conversational interviews with each patient, comprising questions about biographical information (e.g., "where do you live, who do you live with?"), medical history (e.g., "what happened when you had your stroke?"), daily activities (e.g., "what do you usually do on Sundays?"), and descriptions of complex pictures [e.g., the Cookie Theft picture from the Boston Diagnostic Aphasia Examination and the picnic picture from the Western Aphasia Battery (Shewan and Kertesz, 1980), as well as similar pictures]. Transcriptions of patients' speech output were timed and coded by independent raters not involved in patient assessments. Our two main measures of speech fluency were words per minute (words/min) and correct information units per minute (CIUs/min). For the current study, we rescored all transcriptions, including those of the previous 30 subjects, in order to ensure consistency across all 50 subjects reported here. Words/min is a common fluency measure, while CIUs/min, also referred to as "speech efficiency," combines informativeness and fluency (Nicholas and Brookshire, 1993) and had the highest correlation with AF-LL in Marchina et al. (2011). To be counted as a CIU, a word had to be intelligible, accurate, relevant, and informative with respect to the prompt. To control for variation in response length, coders timed a full minute of speech production after each question or task description, and averaged the scores across questions/task descriptions to produce a final overall score.

The BNT is a commonly used clinical assessment of naming ability in stroke patients. For this study, we used the 15-item Short Form published in the BNT 2nd edition (Kaplan et al., 2001). These 15 items correlate highly with the 60-item Standard Form (*R* > 0.9, *p* < 0.05). Other studies have confirmed the 15-item Short Form to be an accurate assessment of naming (del Toro et al., 2011). Patients were not timed in their responses, and the maximum score was 15.

#### **STRUCTURAL MR IMAGING**

All stroke patients were scanned on a 3-Tesla General Electric MR scanner using a standard radiofrequency head coil. T1-weighted MR images (voxel resolution 0.93 × 0.93 × 1.5 mm) were spatially normalized to the SPM T1 template (isotropic 2 mm voxel size) in SPM5 (Wellcome Trust Centre for Neuroimaging, London, UK) implemented in MATLAB (The MathWorks Inc., Natick, MA). Problematic normalizations were identified by visual inspection of the registration; these patients' T1 images were renormalized after excluding the chronic ischemic lesion from the registration algorithm (Brett et al., 2001).

Twelve age-matched subjects [mean age: 58 (*SD*: 13.9), 8M, 4F] underwent diffusion tensor imaging (DTI) using a single-shot, spin-echo EPI sequence with the following parameters: TR = 10 s; TE = 86.9 ms; resolution 2.6 × 2.6 × 2.6 mm³; 30 non-collinear diffusion directions with a *b*-value of 1000 s/mm²; and 6 acquisitions with a *b*-value of 0 s/mm². A total of 56 slices covered the entire brain including the brainstem. Post-processing of DTI images and fiber tracking were done in FSL (www.fmrib.ox.ac.uk). Images underwent eddy current and head motion correction, and skull stripping with the brain extraction tool (BET). Diffusion tensor modeling and fractional anisotropy (FA) images were generated with dtifit, and fiber orientation probability distributions were estimated with bedpostx. The AF, EMC, and UF tracts were traced according to anatomical guidelines described in detail in Marchina et al. (2011).

#### **RECONSTRUCTION OF THE WHITE MATTER TRACTS**

For the arcuate fasciculus (AF) tracts, we defined two regions of interest (ROIs) on the raw diffusion space FA maps in the white matter underlying the posterior middle temporal gyrus (approximately at *x* = −50 mm, *y* = −40, *z* = −4; MNI coordinates) and superior temporal gyrus (approximately at *x* = −50 mm, *y* = −40, *z* = 8; MNI coordinates). A third ROI was drawn on the same sagittal slice (approximately *x* = −50 mm, *y* = 14, *z* = 16) in the white matter underlying the pars opercularis of the posterior inferior frontal gyrus (IFG) as described in Marchina et al. (2011) (**Figure 1**). The AF was traced from the seed region in the IFG to the middle and superior temporal regions. Exclusion masks were drawn in the axial plane of the external capsule, in the coronal plane posterior to the temporal gyri, and in the sagittal plane of the region medial to the fiber bundle in order to exclude fiber projections that were not part of the AF.

For the EMC, a region of interest was drawn on a sagittal slice (*x* = −46, *y* = 30, *z* = 10) in the white matter underlying the pars orbitalis and triangularis in the IFG; a second region of interest was drawn on the same slice in the midportion of the white matter underlying the superior temporal gyrus (*x* = −46, *y* = −34, *z* = 8) (Marchina et al., 2011).

For the UF, we drew coronal ROIs in the anterior region of the corona radiata (*x* = −32, *y* = 38, *z* = 2), in the anterior part of the temporal lobe where the UF adjoins the inferior fronto-occipital fasciculus, and in the white matter underlying the inferior (*x* = −34, *y* = 2, *z* = −8) and middle temporal (*x* = −34, *y* = 2, *z* = −24) gyri (Marchina et al., 2011).

**AF-fGM maps. (A)** The first column shows the canonical probabilistic map of the AF tract derived from DTI, overlaid onto a normalized average T1 brain in MRIcron. **(B)** The second column shows the canonical functional gray matter (fGM) map. **(C)** The third column displays the combined AF-fGM map.

All tracts were thresholded at the 50th percentile to minimize extraneous fibers. The twelve resulting fiber tracts for each structure were normalized to standard T1 MNI space in SPM5, then binarized and summed to create separate canonical probabilistic maps of the AF, EMC, and UF.
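The map-construction steps just described (per-subject thresholding, binarization, and summation across the twelve controls) can be sketched as follows. This is a minimal illustration on synthetic arrays; the function name and the interpretation of the 50th-percentile threshold (taken here over each tract's nonzero voxel values) are my assumptions, and the SPM normalization and image I/O steps are omitted:

```python
import numpy as np

def canonical_probabilistic_map(subject_tracts, threshold_pct=50):
    """Sum per-subject binarized tract maps into one canonical map.

    Each subject's tract probability volume is thresholded at the given
    percentile of its nonzero values and binarized; the binary maps are
    then summed, so a voxel's intensity equals the number of subjects
    (0-12 here) in whom the tract passes through that voxel.
    """
    canonical = np.zeros(subject_tracts[0].shape, dtype=int)
    for tract in subject_tracts:
        cutoff = np.percentile(tract[tract > 0], threshold_pct)
        canonical += (tract >= cutoff).astype(int)
    return canonical

# Synthetic example: 12 subjects with tiny 4x4x4 "tract" volumes.
rng = np.random.default_rng(0)
tracts = [rng.random((4, 4, 4)) for _ in range(12)]
cmap = canonical_probabilistic_map(tracts)   # intensities in 0..12
```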

#### **FUNCTIONAL MR IMAGING**

A separate group of twelve age-matched healthy control subjects [mean age: 52 (*SD*: 13.9), 7M, 5F] participated in functional magnetic resonance imaging, performing an overt speaking task in a sparse temporal sampling fMRI design [for details of the overt speech task, see Ozdemir et al. (2006)] implemented on a 3-Tesla GE scanner (BOLD sequence characteristics: TR = 15 s, TE = 25 ms, voxel resolution = 3.75 × 3.75 × 5 mm³). The scanner task was synchronized with auditory stimuli via Presentation software (Neurobehavioral Systems, Albany, CA). The fMRI experiment consisted of 6 blocks of 20 trials; each block contained 15 overt speaking trials and 5 control non-speaking trials. The auditory stimuli were recorded by a trained individual articulating fifteen two-syllable phrases frequently used in everyday conversation (e.g., "goodbye," "thank you"), as determined by the Dutch Center for Lexical Information (CELEX; http://www.mpi.nl/world/celex) (average frequency = 4658.8). The fMRI task was chosen to match the speech-motor and speech-fluency capabilities of our moderately to severely impaired stroke patients, and to reveal brain regions involved in speech-motor functions. Subjects listened to an auditory cue and then overtly repeated the exact phrase back at the same pace, or remained silent when no cue was given during control trials. Subjects' responses were recorded to verify proper adherence to condition. Auditory stimuli were presented in randomized order, and total scan time for each subject averaged 55 min.

#### **FUNCTIONAL IMAGE ANALYSIS**

Functional scans were analyzed using SPM5 (Institute of Neurology, London, UK). The preprocessing steps included movement correction, spatial normalization to the SPM5 EPI template, and spatial smoothing with an isotropic Gaussian kernel of 8 mm.

The general linear model was used to estimate condition and subject effects; global differences in scan intensity were removed by scaling each scan in proportion to its intensity. A high-pass filter with a 128 s cutoff was used to eliminate low-frequency drifts, and a flexible finite impulse response model estimated the average BOLD response at each post-stimulus time point. Contrasts for speaking vs. silence for each subject were entered individually with a family-wise error (FWE) corrected significance threshold of 0.05.

#### **CANONICAL MAP CREATION**

Functional Gray Matter (fGM) maps were extracted from the fMRI analysis, and multiplied with a standard gray matter mask from SPM5 anatomy toolbox to restrict the functional activations to gray matter. The FWE-thresholded maps of the twelve control subjects were then binarized and summed to create a canonical, probabilistic map of functional gray matter activation patterns (fGM) (**Figure 1**). Canonical structural white matter and functional gray matter (fGM) maps were summed to create the canonical probabilistic AF-fGM map (**Figure 1**).

#### **LESION LOAD CALCULATION**

To assess lesion damage to relevant functional and structural speech regions, we manually delineated lesion masks from the anatomical magnetic resonance images of the 50 stroke patients. One rater who was blind to subjects' behavioral outcomes manually drew patient lesion masks. The drawings were made using MRIcro software (http://www.mccauslandcenter.sc.edu/mricro/mricro/) on stroke patients' normalized T1-weighted images, with the coregistered FLAIR images (0.5 × 0.5 × 5 mm³, 24 slices) as a guide. No part of the ventricular dilation or hemispheric atrophy that one can sometimes observe in chronic stroke patients was included in the lesion map. For verification, a second rater (also blind to patient behavioral scores) manually inspected and revised all lesion maps, and in addition drew lesion maps on a subset of patients. The inter-rater reliability for lesion map volume was >0.9. For the lesion load calculation, each stroke patient's lesion map was individually overlaid onto the canonical AF map, the canonical fGM map, and the combined AF-fGM map, as well as the EMC and UF maps.

Lesion overlap calculations for each patient were done as described by Zhu et al. (2010). In short, the canonical maps consist of voxel intensities ranging from *I* = 0 (the voxel is not part of the tract or functional gray matter map in any subject) to *I* = 12 (the voxel is part of the tract in all subjects); the probability that a voxel belongs to the structure is therefore 1/12 of that voxel's intensity. Lesion load was calculated by summing, over all voxels where the lesion map intersected a probabilistic map, the intensity-derived probability values of that map.
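Under these definitions, the per-patient computation reduces to a probability-weighted sum over lesioned voxels. The sketch below uses synthetic arrays; the function name and the scaling by voxel volume to express the load in cc (0.008 cc per voxel for the 2 mm isotropic normalized space) are my assumptions:

```python
import numpy as np

def lesion_load(lesion_mask, canonical_map, n_subjects=12, voxel_vol_cc=0.008):
    """Probability-weighted lesion load of a canonical map.

    canonical_map holds intensities I = 0..n_subjects; I / n_subjects is
    the probability that the voxel belongs to the tract (or fGM) map.
    The load sums those probabilities over lesioned voxels and scales by
    the voxel volume (2 mm isotropic -> 0.008 cc) to express it in cc.
    """
    probs = canonical_map.astype(float) / n_subjects
    return float(probs[lesion_mask > 0].sum()) * voxel_vol_cc

# Toy example: a lesion covering half of a region where the "tract"
# is present in all 12 subjects (probability 1 per voxel).
canonical = np.zeros((10, 10, 10), dtype=int)
canonical[2:6, 2:6, 2:6] = 12
lesion = np.zeros((10, 10, 10), dtype=int)
lesion[2:6, 2:6, 2:4] = 1                 # 32 overlapping voxels
ll = lesion_load(lesion, canonical)       # 32 * 1.0 * 0.008 = 0.256 cc
```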

#### **STATISTICAL ANALYSES**

All statistical analyses were completed with Predictive Analytics Software (PASW) SPSS (17.0.2). Linear and multiple regression analyses were run with AF, fGM, and combined AF-fGM lesion loads to predict behavioral measures of speech fluency and naming ability; age and time from stroke onset to assessment were controlled for in each analysis. In addition, multiple regression models were run to compare AF, EMC, and UF lesion loads in their ability to predict the behavioral outcomes, while controlling for lesion size. Two outliers, identified by case-wise residual diagnostics as lying beyond ±2.5 standard deviations, were excluded.

Curve estimation analyses determined that the relationships between speech fluency outcomes and lesion loads were not represented well by linear trends. Given the volumetric nature of lesion load, we used a cube root transformation for linearity and to reduce variance (Woo et al., 1999; van den Elskamp et al., 2011). Naming ability was linearly related to lesion loads, so transformations were not applied for those regressions.
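The transformation step can be illustrated as follows; the data here are simulated, and numpy's least-squares fit merely stands in for the SPSS regression:

```python
import numpy as np

# Simulated data: 50 patients with lesion loads (cc) and a fluency
# outcome that depends on the cube root of the lesion load.
rng = np.random.default_rng(42)
af_ll = rng.uniform(0, 10, size=50)                     # volumetric predictor
ciu = 60 - 15 * np.cbrt(af_ll) + rng.normal(0, 3, 50)   # simulated CIUs/min

# Cube-root transform of the volumetric predictor before an ordinary
# least-squares fit, mirroring the linearization described above.
x = np.cbrt(af_ll)
slope, intercept = np.polyfit(x, ciu, 1)   # slope recovered near -15
```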

To assess if the biomarker *lesion load* can classify severe impairments of speech production, two-step cluster analyses were run to separate patients into severe and non-severe groups, using the behavioral measures words/minute, CIUs/minute, and naming (BNT). With automatic cluster detection, 2 groups for each variable were formed with a range of behavioral cutoffs between non-severe and severe groups. Discriminant analyses were run on the resulting groups to determine accuracy of the cluster cutoff. ROC curves identified the most accurate predictor of behavioral outcome among AF, fGM, and combined AF-fGM lesion loads and lesion volume, and defined the best threshold for stratifying severe/non-severe outcome.
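The ROC-based threshold search can be sketched directly. The Youden's J criterion used here to pick the cutoff is an assumption (the study reports accuracy, sensitivity, and specificity but does not name the criterion), and the data are synthetic:

```python
import numpy as np

def best_threshold(scores, severe):
    """Pick the cutoff maximizing Youden's J = sensitivity + specificity - 1,
    treating higher scores (e.g., larger lesion loads) as predicting the
    severe group."""
    pos, neg = severe.sum(), (~severe).sum()
    best_j, best_t = -1.0, None
    for t in np.unique(scores):
        pred = scores >= t
        sens = (pred & severe).sum() / pos
        spec = (~pred & ~severe).sum() / neg
        if sens + spec - 1 > best_j:
            best_j, best_t = sens + spec - 1, float(t)
    return best_t, best_j

# Synthetic lesion loads: severe patients tend to have larger values.
rng = np.random.default_rng(1)
loads = np.concatenate([rng.normal(2, 1, 25), rng.normal(6, 1, 25)])
labels = np.array([False] * 25 + [True] * 25)
cutoff, j = best_threshold(loads, labels)   # cutoff falls between the groups
```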

#### **RESULTS**

All three lesion load measures (AF, fGM, and combined AF-fGM) significantly predicted the two fluency measures and the naming measure in linear regressions (**Table 2**), controlling for age and stroke-onset-to-assessment time. In multiple regression models, a comparison of AF-LL with EMC and UF lesion loads confirmed our previous finding that AF-LL was the only significant predictor of speech fluency and naming (**Table 2**); lesion volume was not significant in any multiple regression model relative to AF, EMC, and UF lesion loads (*p* > 0.05).

#### **LESION LOAD AND SPEECH FLUENCY (CIUs/min)**

In a multiple regression model, AF and fGM lesion loads significantly predicted the fluency measure CIUs/min (adjusted *R*² = 0.642, *p* < 0.01 for the overall model). However, AF-LL explained more variance in CIUs/min than fGM-LL (AF-LL partial *R* = −0.30, *p* < 0.01; fGM-LL partial *R* = 0.12, *p* > 0.05). The combined AF-fGM-LL also significantly predicted CIUs/min (*R*² = 0.59, *p* < 0.01), but did not explain more of the variance than either the individual AF or fGM lesion load models (**Figure 2**). In a separate analysis comparing AF-LL with EMC-LL and UF-LL while controlling for lesion size, AF-LL was the only significant predictor of speech fluency (AF-LL partial *R* = −0.55, *p* < 0.05) (**Table 2**). Age and onset-to-assessment time were not significant predictors of fluency (*p* > 0.05).

#### **LESION LOAD AND SPEECH RATE (Words/min)**

Words/min was predicted significantly in a multiple regression model by AF and fGM lesion loads (adjusted *R*² = 0.53, *p* < 0.01 for the overall model), as well as by the combined AF-fGM-LL (*R*² = 0.45, *p* < 0.01). AF-LL predicted words/min significantly better than fGM-LL (AF-LL partial *R* = −0.321, *p* < 0.05; fGM-LL partial *R* = −0.044, *p* > 0.05) (**Figure 3**). Controlling for lesion size, AF-LL was the only significant predictor of words/min compared to EMC and UF lesion loads (AF-LL partial *R* = −0.529, *p* < 0.05) (**Table 2**).

#### **LESION LOAD AND NAMING ABILITY (BNT)**

Naming ability was predicted significantly in a multiple regression model with AF and fGM lesion loads (adjusted *R*² = 0.501, *p* < 0.01). The combined AF-fGM lesion load did not predict better than the individual maps (*R*² = 0.418, *p* < 0.01). Naming was best predicted by AF-LL (partial *R* = −0.249, *p* < 0.05), and not by fGM-LL (partial *R* = −0.102, *p* > 0.05) (**Figure 4**). In the AF-, EMC-, and UF-LL regression model, AF-LL remained the only significant predictor of naming (AF-LL partial *R* = −0.274, *p* < 0.05) (**Table 2**).

**Table 2 | Linear regression coefficient values for AF, fGM, Combined AF-fGM, EMC, and UF lesion loads predicting CIUs/minute and BNT-Aphasia Short Form.**


*Table 2A presents regression models with AF, fGM, and combined AF-fGM lesion loads; Table 2B presents values from the AF, EMC, and UF regression models. Semi-partial correlation coefficients are shown from multiple regression models including both maps. Asterisks mark a significant unique contribution to predicting behavioral outcome, derived from partial R² significance.*

#### **OUTCOME GROUP CLASSIFICATION**

A range of behavioral cutoffs for dividing severely and moderate-to-mildly affected subgroups was assessed by cluster analyses. Two-step cluster analyses with automatic grouping were run to classify speech fluency into the two severity subgroups, and the most accurate classification threshold was chosen for the behavioral cutoffs. A discriminant analysis confirmed that cutoffs of 8–13 CIUs/min and 31–32 words/min were 98% correct at dividing behavioral outcomes into severely and moderate-to-mildly affected groups (**Figure 7**). Lower and higher ranges of cutoffs in words/min and CIUs/min were also tested (**Table 3**). For naming, automatic grouping in a two-step cluster analysis determined that patients scoring lower than 6 points out of 15 belonged to the severe impairment group; a discriminant analysis confirmed this clustering was 100% accurate at classifying all data points, and a range of naming cutoffs was also tested for cluster accuracy (**Table 3**).

*The table shows discriminant cluster classification accuracy of all data points, and the resulting ROC prediction of the AF-LL threshold and AUC. BNT measures are derived from the BNT Aphasia Short Form; the maximum score is 15.*

Classification ROC curves were run for lesion loads and lesion volumes in order to determine the best lesion-load threshold for predicting severe vs. non-severe speech fluency and naming (**Figure 4**). With the previously determined behavioral cutoff at each range, AF-LL, fGM-LL, and combined AF-fGM-LL were all significant predictors of CIUs/min (*p* < 0.01) (data not shown). AF-LL was the best predictive model of severely impaired fluency (CIUs/min), with 96% accuracy and the highest sensitivity (91%) and specificity (85%) (**Figure 5**); the lesion load threshold for classifying a patient as belonging to the severe group was approximately 3.75 cc of AF-LL (**Table 3**). Lesion volume was not as accurate a predictor as AF-LL, with 88% accuracy, 80% sensitivity, and 85% specificity, and a threshold for severe fluency at 105 cc of lesion volume. For naming, AF-LL was again the best predictor, with 90% accuracy and a threshold for severely impaired naming of 4.01 cc of AF-LL (91% sensitivity, 75% specificity), while lesion volume predicted naming with only 81% accuracy (**Figure 6**).

#### **DISCUSSION**

Similar to the findings in our previous study (Marchina et al., 2011), we found that AF-LL, in comparison to the novel fGM-LL and combined AF-fGM-LL, best predicted our two measures of speech fluency (words/min and CIUs/min) and naming ability (BNT) in a large sample of patients. In addition, AF-LL provided the best classification of speech fluency and naming outcomes, with >94% and 90% accuracy, respectively. An AF-LL beyond approximately 4 cc classified a patient as belonging to the group with severe speech fluency and naming impairments.

The reason that the AF-LL emerged as the best predictor of impaired speech production may be its significant role in the feedforward and feedback control of speech production, including naming and repetition (Damasio et al., 1996; Hickok and Poeppel, 2004; Borovsky et al., 2007; DeLeon et al., 2007; Tourville et al., 2008; van Oers et al., 2010). Previous studies have already reported that damage to the AF was predictive of speech repetition impairment (Fridriksson et al., 2009). These findings support AF-LL as a surrogate marker of AF impairment. The AF also converges with the EMC on the lexical-semantic "hub" region of the middle temporal gyrus (Catani et al., 2005; Glasser and Rilling, 2008; Lawes et al., 2008; Turken and Dronkers, 2011) and has been associated with syntactic, semantic, and phonological tasks in language production and perception (Glasser and Rilling, 2008; Rolheiser et al., 2011). The involvement of the AF in many speech functions suggests that the degree of AF impairment in the left hemisphere may be a pivotal determinant of aphasia recovery (Rolheiser et al., 2011).

**FIGURE 5 | Speech fluency ROC curve showing prediction of speech fluency from AF-LL and lesion volume.** AF lesion load (in red) was the most accurate predictor (96%) at separating the severe from the moderately/mildly affected group, at a threshold of 3.75 cc.

When we examined the lesion loads of the ventral stream, represented by the EMC and UF tracts, we found that although EMC-LL and UF-LL provided modest predictions of speech fluency and naming outcomes, AF-LL remained the most significant predictor in a multiple regression analysis. We also replicated our previous finding that lesion size was not a significant predictor relative to lesion loads, and our findings are consistent with those of Marchina et al. (2011); thus, we confirmed that ventral stream lesion loads, though significant independent correlates of naming and fluency, do not provide the best predictions relative to AF lesion load. These results support an emerging theory that dorsal and ventral stream contributions to speech are not easily separated into localized speech functions and could indeed be synergistic, as proposed by Rolheiser et al. (2011).

In the current study, we also replicated results from Marchina et al. (2011) and Marcotte et al. (2012) with regard to lesion volume and its marginal ability to predict outcome and recovery. Although lesion volume independently predicted speech outcomes, it did not remain significant in a multiple regression model with AF-LL. This may be because the predictive value of lesion volume derives from damage to relevant language structures such as the AF, so that it contributes no unique prediction of speech outcome. In aphasia research, methods for determining lesion size/location, stroke type, and behavioral tasks differ from study to study, so it is difficult to define a strict lesion cutoff that determines outcome. To our knowledge, no group has yet established a clear cutoff value for lesion volume that predicts speech outcome; however, our AF-LL variable may have the potential to provide such a value (e.g., an AF-LL of 4 cc or more seems to be associated with severe non-fluent aphasia). This would obviously have to be replicated and further tested in subsequent studies.

Even though the AF-LL was the best predictor of speech fluency among the three white matter tracts examined, a functionally defined gray matter template could conceivably have been more predictive of speech fluency impairment. Our rationale for choosing a functional gray matter map was based on previous studies showing that variations in perilesional activation are related to recovery from aphasia (Meinzer et al., 2008; Fridriksson, 2010; Fridriksson et al., 2010; Saur et al., 2010a; Hamilton et al., 2011). Although Saur et al. (2010a) combined fMRI with diffusion-weighted imaging (DWI)-derived lesion data, they did not find an improvement in their outcome predictions. We assume that the lesion load variable we combined with the fGM maps was more specific to the interconnected functional regions damaged in our sample of subjects. Furthermore, we found the lesion load of our fGM map to be correlated with speech fluency and naming abilities after stroke, but it did not explain as much of the variance as AF-LL. This difference could be due to the smaller size of the structural canonical map, which connects the core regions of the speech-motor network; in contrast, the fGM map encompassed the wider and more diffuse functional network necessary for word/phrase repetition, which may have included cortical regions beyond the critical core regions of the speech-motor network.

While various fMRI tasks could have been used to define the fGM map, the word/phrase repetition task used in the current study allowed us to exert a high level of control over the timing and duration of speech production, which is important in sparse temporal fMRI designs. Furthermore, the resulting pattern of activation revealed a speech-motor network that included premotor, SMA, inferior frontal, primary inferior sensorimotor, and posterior superior temporal regions, similar to the speech production activation patterns reported in other publications (Saur and Hartwigsen, 2012). Lastly, our choice of a word/phrase repetition task was also driven by overarching design considerations that were not directly related to the analysis in this particular study, but tied to several ongoing studies examining fMRI networks in age-matched normal controls and aphasic patients. Thus, in order to capture some degree of speech production in all participants, even the most severely impaired patients, a strictly controlled word/phrase repetition fMRI task was in our opinion the best option compared with other fMRI tasks such as conversation or picture naming, which introduce additional confounds, e.g., untimed responses and/or the use of visual stimuli. Despite the difficulty of finding a single fMRI task suitable for a wide variety of healthy age-matched controls and patients with various impairments, the fGM-LL was still a robust predictor of our speech fluency and naming outcomes, indicating that this method is promising for future investigation.

Although both AF-LL and fGM-LL predicted speech production individually, the lesion load of the combined AF and fGM maps did not provide a significantly better prediction than either of the variables alone. Although other studies have combined DTI and fMRI techniques to confirm functional connectivity between activated speech regions (Saur et al., 2010b), to the best of our knowledge our current study is the only one using a combined cross-modality model that included DTI, fMRI, and lesion load information for predicting aphasia outcome.

For predicting severe fluency and naming outcomes, the ROC classification model indicated that once AF-LL exceeds a ∼4 cc threshold, conversational fluency and naming in the outcome group are severely impaired; this threshold remained consistent across a range of behavioral cutoffs. Although other studies have correlated lesion size with speech outcome (Kertesz et al., 1979; Naeser et al., 1981), to our knowledge no other study has used a lesion-load threshold of functionally relevant gray or white matter to classify the severity of speech fluency impairment. This 4 cc AF-LL threshold could be very useful as a clinical predictor of outcome: patients in the severely impaired group could adopt alternative and intensive therapies designed to retrain or recruit right-hemisphere speech-motor networks, such as Melodic Intonation Therapy or non-invasive brain stimulation applied to the right hemisphere (Schlaug et al., 2008, 2010, 2011), whereas the less impaired group (i.e., those with a small left AF-LL) could focus on rehabilitating the ventral stream or supporting perilesional neural networks of speech and language function, with or without non-invasive brain stimulation. Although our model is relatively simple, the clustering method provides an objective grouping of the behavioral outcome, while AF-LL appears highly accurate for stratifying non-fluent aphasic stroke patients in the chronic stage, especially compared with overall lesion size.
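In essence, a lesion-load measure such as AF-LL is the volume of overlap between a patient's lesion mask and a canonical tract (or functional) map in a common template space, and the ∼4 cc threshold then dichotomizes patients. A minimal NumPy sketch of that computation, with the caveat that the function names, the 2 mm isotropic voxel size, and the binary-mask assumption are illustrative and not the authors' exact pipeline:

```python
import numpy as np

def lesion_load_cc(lesion_mask, canonical_map, voxel_vol_mm3=8.0):
    """Lesion load: overlap volume (in cc) between a binary lesion mask
    and a canonical tract/functional map, both assumed resampled to the
    same template space (2 mm isotropic voxels by default). Names and
    defaults are illustrative assumptions, not the authors' pipeline."""
    overlap = np.logical_and(lesion_mask > 0, canonical_map > 0)
    return overlap.sum() * voxel_vol_mm3 / 1000.0  # mm^3 -> cc

def severity_group(af_ll_cc, threshold_cc=4.0):
    """Dichotomize by the ~4 cc AF lesion-load threshold reported above."""
    return "severe" if af_ll_cc >= threshold_cc else "mild/moderate"

# Toy masks on a 10x10x10 grid: the lesion covers x < 6, the canonical
# map covers the z = 0 slice, so they overlap in 60 voxels (0.48 cc).
lesion = np.zeros((10, 10, 10)); lesion[:6, :, :] = 1
tract = np.zeros((10, 10, 10)); tract[:, :, :1] = 1
```

The same overlap computation applies unchanged to the fGM map or to a combined map; only the canonical image passed in differs.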

Although other behavioral measures might have been considered appropriate for determining the degree of speech-motor impairment and/or degree of improvement in post-stroke aphasia, we chose CIUs/min (Nicholas and Brookshire, 1993) and Words/min as measures of speech fluency, because each measure provides important information about a patient's impairment. Words/min reveals patients' articulatory agility, but lacks "informativeness" (accuracy of information) and efficiency of speech; CIUs/min was designed as an accurate, quantitative measure of functional speech, and quantifies both the informativeness and the efficiency of patients' speech output. However, without Words/min, CIUs/min does not always reveal the full nature of the impairment. Using both measures allowed us to capture multiple aspects of deficits and improvements in the speech output of non-fluent aphasic patients with a relatively wide range of impairments.

A few caveats apply to our findings. Although our model reveals a strong relationship between left AF lesion load and patient outcomes on measures of fluency and naming, we could not take into account remote effects of lesions on non-lesioned brain regions contributing to the behavioral phenotype through disconnection (Weiller et al., 1993), or the variable size of the right AF and homotopic speech regions in the right hemisphere, which can show plastic changes post-stroke and over time (Rosen et al., 2000; Crinion and Price, 2005; Saur et al., 2006; Raboyeau et al., 2008; Schlaug et al., 2009). Secondly, because 30 of the 50 patients presented here were included in a previous study, the current study is in part a replication and the results could be biased; however, the updated probabilistic canonical white matter tracts, the functional GM maps, and the combined structural and functional maps used to determine lesion loads were new for all patients. Thus, beyond a significantly larger patient population, our investigation has novel aspects that go beyond a simple replication of a previous study. Thirdly, the white matter tracts were reconstructed in age-matched healthy elderly controls using 30 diffusion directions. There is some debate in the literature regarding the optimal number of diffusion directions: while some argue that a higher number is better, others have argued that 30 may be adequate (Mukherjee et al., 2008). There is no accepted standard, although recent multicenter reliability studies have used 30 directions and found acceptable variation across sites (Magnotta et al., 2012).
Nevertheless, the number of DTI directions may be less problematic for our study, since we aggregated white matter tracts from healthy, matched elderly controls into a probabilistic canonical map for the purpose of calculating a lesion overlay, rather than pursuing an analysis that would require more directions for optimal DTI acquisition (e.g., fiber integrity in or surrounding an ischemic lesion in patients). Lastly, the generalizability of our predictive model may be limited, since we exclusively recruited patients with speech fluency impairments who were mainly classified as non-fluent aphasics in the acute stroke phase. Furthermore, it is possible that recovery from aphasia continues into the chronic stage, so that correlations between lesion markers and behavioral profiles could change over time and our predictions could show some dependency on time after stroke. Our current data do not necessarily support this, but larger numbers of patients would need to be tested to examine it in more detail. Our model of outcome prediction using the AF-LL can be tested in acute, subacute, and chronic stroke patients with a wider range of aphasia classifications.

Whereas our earlier publication (Marchina et al., 2011) established the importance of the AF-LL as a biomarker of the degree of impairment in both speech fluency and naming ability in chronic stroke patients, the present study both confirms the original findings in a larger patient sample and compares the predictive power of AF-LL with that of a new measure, fGM-LL. Furthermore, the AF-LL marker can help stratify patients by their level of impairment (e.g., mild/moderate and severe), which should improve outcome predictions and thus help (1) identify those who are likely to benefit from particular interventions and/or experimental treatment studies, (2) guide clinicians in the selection and implementation of such treatments, and (3) maximize treatment time with the goal of improving upon predicted outcomes. Two major advantages of the AF-LL marker are its simplicity and practicality, since no high-resolution MR imaging beyond what is typically acquired at the time of stroke onset is needed. Furthermore, this measure can easily be calculated and used in both research and clinical settings, making this potentially valuable tool more widely available to stroke professionals. Although we see great potential for such a neuroimaging biomarker, the predictive power of AF-LL should also be compared with that of other behavioral measures and combinations of imaging and behavioral measures in future studies. Future studies will also be needed both to test and refine the AF-LL as a surrogate marker of speech fluency and to investigate more deeply its value in longitudinal outcome studies of stroke survivors with aphasia.

#### **ACKNOWLEDGMENTS**

The authors gratefully acknowledge support from NIH (1RO1 DC008796, 3R01DC008796-02S1, R01 DC009823-01), the Grammy Foundation, the Richard and Rosalyn Slifka Family Fund, the Tom and Suzanne McManmon Family Fund, and the Matina R. Proctor Foundation.

#### **REFERENCES**




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

#### *Received: 01 July 2013; accepted: 18 November 2013; published online: 10 December 2013.*

*Citation: Wang J, Marchina S, Norton AC, Wan CY and Schlaug G (2013) Predicting speech fluency and naming abilities in aphasic patients. Front. Hum. Neurosci. 7:831. doi: 10.3389/fnhum.2013.00831*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Wang, Marchina, Norton, Wan and Schlaug. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Dissociated repetition deficits in aphasia can reflect flexible interactions between left dorsal and ventral streams and gender-dimorphic architecture of the right dorsal stream

*Marcelo L. Berthier<sup>1</sup>\*, Seán Froudist Walsh<sup>2</sup>, Guadalupe Dávila<sup>1,3</sup>, Alejandro Nabrozidis<sup>4</sup>, Rocío Juárez y Ruiz de Mier<sup>1</sup>, Antonio Gutiérrez<sup>3</sup>, Irene De-Torres<sup>1</sup>, Rafael Ruiz-Cruces<sup>1</sup>, Francisco Alfaro<sup>4</sup> and Natalia García-Casares<sup>5</sup>*

*<sup>1</sup> Unit of Cognitive Neurology and Aphasia, Department of Medicine, Centro de Investigaciones Médico-Sanitarias, University of Malaga, Malaga, Spain*

*<sup>2</sup> Department of Psychosis Studies, Institute of Psychiatry, King's Health Partners, King's College London, London, UK*

*<sup>3</sup> Department of Psychobiology and Methodology of Behavioral Sciences, Faculty of Psychology, University of Malaga, Malaga, Spain*

*<sup>4</sup> Unit of Molecular Imaging, Centro de Investigaciones Médico-Sanitarias, General Foundation of the University of Malaga, Malaga, Spain*

*<sup>5</sup> Department of Medicine, Faculty of Medicine, University of Malaga, Malaga, Spain*

#### *Edited by:*

*Matthew A. Lambon Ralph, University of Manchester, UK*

#### *Reviewed by:*

*Karsten Specht, University of Bergen, Norway*

*Cornelius Weiller, Universität Freiburg, Germany*

#### *\*Correspondence:*

*Marcelo L. Berthier, Unidad de Neurología Cognitiva y Afasia. Centro de Investigaciones Médico-Sanitarias, Universidad de Málaga, Campus Teatinos, C/Marqués de Beccaria 3. 29010, Málaga, España e-mail: mbt@uma.es*

Assessment of brain-damaged subjects presenting with dissociated repetition deficits after selective injury to either the left dorsal or ventral auditory pathways can provide further insight into their respective roles in verbal repetition. We evaluated repetition performance and its neural correlates using multimodal imaging (anatomical MRI, DTI, fMRI, and 18FDG-PET) in a female patient with transcortical motor aphasia (TCMA) and in a male patient with conduction aphasia (CA) who had small, contiguous but non-overlapping left perisylvian infarctions. Repetition in the TCMA patient was fully preserved except for a mild impairment in nonwords and digits, whereas the CA patient had impaired repetition of nonwords, digits, and word triplet lists. His sentence repetition was impaired, but he repeated novel sentences significantly better than clichés. The TCMA patient had tissue damage and reduced metabolism in the left sensorimotor cortex and insula. DTI showed damage to the left temporo-frontal and parieto-frontal segments of the arcuate fasciculus (AF) and part of the left ventral stream, together with well-developed right dorsal and ventral streams, as has been reported in more than one-third of females. The CA patient had tissue damage and reduced metabolic activity in the left temporoparietal cortex, with additional metabolic decrements in the left frontal lobe. DTI showed damage to the left temporo-parietal and temporo-frontal segments of the AF, but the ventral stream was spared. The direct segment of the AF in the right hemisphere was also absent, with only vestigial remains of the other dorsal subcomponents present, as is often found in males. fMRI during word and nonword repetition revealed bilateral perisylvian activation in the TCMA patient, suggesting recruitment of spared segments of the left dorsal stream and of the right dorsal stream, with propagation of signals to temporal lobe structures suggesting a compensatory reallocation of resources via the ventral streams.
The CA patient showed greater activation of these cortical areas than the TCMA patient, but these changes did not result in normal performance. Repetition of word triplet lists activated bilateral perisylvian cortices in both patients, but in the CA patient, whose performance was very poor, activation was restricted to small frontal and posterior temporal foci bilaterally. These findings suggest that the dissociated repetition deficits in our cases probably rely on flexible interactions between the left dorsal stream (spared segments, remains of short tracts) and the left ventral stream, and on the gender-dimorphic architecture of the right dorsal stream.

**Keywords: transcortical motor aphasia, conduction aphasia, repetition, dual dorsal-ventral pathways, diffusion tensor tractography, positron emission tomography, functional magnetic resonance imaging**

#### **INTRODUCTION**

In their pioneering studies on aphasia, Broca (1861, 1863) and Wernicke (1874, 1906, 1977) described distinct syndromes associated with involvement of anterior and posterior cortical areas of the left hemisphere, respectively. One syndrome was chiefly characterized by reduced speech fluency with relatively spared auditory comprehension (Broca's aphasia) (Lazar and Mohr, 2011), whereas in the other, spontaneous speech was abundant but abnormal in content and auditory comprehension was also impaired (Wernicke's aphasia) (Albert et al., 1981). At the same time, Lichtheim (1885) was investigating language deficits that apparently resulted from selective damage to major commissures linking cortical speech areas. Lichtheim (1885) was particularly interested in two contrasting syndromes which, in his view, resulted from selective damage to commissural pathways. One, first described by Wernicke, is what is nowadays known as conduction aphasia (CA) (Benson et al., 1973; Henderson, 1992); the other, termed "inner commissural aphasia" by Lichtheim, is currently known as transcortical motor aphasia (TCMA) (a term coined by Wernicke) or dynamic aphasia (Luria and Tsvetkova, 1968; Albert et al., 1981; Berthier, 1999; Robinson et al., 2008). From a clinical viewpoint, TCMA and CA are easily distinguished by the ability to repeat language. Abnormal repetition, fluent spontaneous speech, and preserved auditory comprehension are the key features of CA (Benson et al., 1973; Albert et al., 1981; Goodglass, 1992). By contrast, relative preservation of repetition in the face of non-fluent verbal output and preserved auditory comprehension defines TCMA (Albert et al., 1981; Freedman et al., 1984; Berthier, 1999). These two syndromes result from the involvement of different structures.
The typical TCMA syndrome has been linked with left frontal lesions beneath Broca's area that disconnect it from the supplementary motor area (Albert et al., 1981; Freedman et al., 1984; Cauquil-Michon et al., 2011). Typical TCMA may also occur in association with extensive damage to the left superior frontal gyrus involving the supplementary motor area (Alexander and Schmitt, 1980). Variants of TCMA (Freedman et al., 1984; Taubner et al., 1999) and DA<sup>1</sup> (see Robinson et al., 2008) usually involve the left anterior perisylvian language cortex (Brodmann's areas 44, 45, 4, 6, anterior insula), although DA can also follow bilateral striatocapsular lesions (Gold et al., 1997; patient BS in Berthier, 1999, pp. 69–74). Left temporoparietal or inferior parietal lesions with variable involvement of the AF and insular cortex induce CA (Benson et al., 1973; Damasio and Damasio, 1980).

Wernicke, Lichtheim, and their contemporary scholars described the surface features and pathological correlates of both syndromes. Wernicke noted that interruption of white matter pathways would lead to abnormal repetition, but in his original formulation of CA repetition deficits were not included as a hallmark component of the syndrome (Geschwind, 1967; Henderson, 1992; De Bleser et al., 1993). Wernicke (1874) provided the first two-route model of language repetition, and this diagram was further elaborated by Lichtheim (1885) (**Figure 1**). He envisioned that repetition could be impaired by disruption of the route linking the auditory images of words ("A") and motor images of words ("M"), which passes through the insular cortex. Lichtheim further contended that, despite their abnormal repetition,

**FIGURE 1 | Lichtheim's (1885) diagram of language pathways and predicted sites of lesions that would cause aphasia (left image).** In this diagram, "A" indicates the center of auditory images, "M" the center of motor images, and "B" the center of concepts. Lesions interrupting the commissure "A"–"M" cause conduction aphasia (CA) (red line), whereas interruption of the commissure interconnecting "B"–"M" causes transcortical motor aphasia (TCMA) (yellow line). Diagram depicting a two-route model of repetition (right image) (adapted from McCarthy and Warrington, 1984). The green circle represents the route that should be interrupted to induce CA, whereas the purple circle represents a lesion in the connection that should be interrupted to induce TCMA.

patients presenting with this disorder would maintain relatively preserved volitional speech because it could be mediated by direct connections between the concept center ("B") and the center of motor images ("M"). This syndrome corresponds to classical CA (Geschwind, 1965). In complementary terms, Lichtheim (1885) interpreted his "inner commissural aphasia" (TCMA) as resulting from the interruption of the connection between the concept center ("B") and the center for motor images of words ("M") (Berthier, 1999). Although further elaborations on the routes engaged in speech repetition were advanced (Kussmaul, 1877; Heilman et al., 1976), the cognitive mechanisms underpinning speech repetition in these syndromes remained unexplored until the seminal study by McCarthy and Warrington (1984) (hereafter McC & W). They performed a comprehensive evaluation of speech production deficits in two patients with CA (ORF and RAN) and in another (ART) with TCMA, which revealed a double dissociation in tasks that manipulated the semantic processing requirements of repetition. McC & W found that in tasks requiring active semantic processing, repetition performance was facilitated in the two patients with CA and hindered in the TCMA patient. Indeed, ORF and RAN repeated semantically meaningful word triplet lists better than similar lists with no semantic relatedness, and they also repeated novel sentences (e.g., "She went to buy some milk") better than clichéd sentences (e.g., "On top of the world"). The opposite pattern of dissociation was found in ART, in whom repetition performance was facilitated by tasks entailing little or no semantic processing (repetition of clichés) that rely more on automatic online strategies. Results from this investigation allowed McC & W to reappraise the

<sup>1</sup>Transcortical motor aphasia (TCMA) is a heterogeneous condition, and cases with atypical or "near variant" features have been described. These cases are atypical because some speech deficits (impaired articulation, stuttering) and language deficits (impaired auditory comprehension) deviate from the classical pattern, denoting extension of the typical left frontal subcortical lesion into its neighboring areas in the frontal operculum, motor area, basal ganglia/internal capsule, or insula (Freedman et al., 1984; Taubner et al., 1999). Dynamic aphasia (DA) is closely allied with TCMA and is also heterogeneous in its main features and lesion sites (Berthier, 1999). Two main DA types have been described: pure DA (reduced verbal output without other language impairments) and mixed DA (reduced verbal output with additional phonological, lexical, syntactic, and/or articulatory impairments), associated with mild and moderate involvement of the anterior perisylvian area, respectively (Robinson et al., 1998, 2005, 2008).

two-route model for speech production championed early on by Lichtheim (1885) (**Figure 1**) (see further details below). Nevertheless, McC & W concluded that "... the biological necessity and *modus operandi* of dual processing routes in speech production remain obscure" (p. 482). Regrettably, given the limited development of brain imaging at that time, McC & W were unable to dissect the neural architecture of the white matter pathways underpinning repetition.

With the advent of diffusion tensor imaging (DTI), accurate *in vivo* delineation of white matter tracts is being achieved (Mesulam, 2005; Catani and Mesulam, 2008; Saur et al., 2008; Turken and Dronkers, 2011; Weiller et al., 2011; Rijntjes et al., 2012; Cloutman, 2013), and this fruitful knowledge has prompted retrospective speculation on the respective roles of different components of the arcuate fasciculus (AF) in these two contrasting syndromes, CA and TCMA (Catani et al., 2005). Catani et al. (2005) attempted to provide an anatomical explanation for McC & W's findings by focusing their analysis on the involvement of the direct and indirect segments of the AF in CA and TCMA. These researchers noted that the temporoparietal lesions in both CA patients (ORF and RAN) extended deeply enough to injure the direct long segment of the AF (the classical AF), responsible for fast, automatic repetition of words and nonwords, but spared the indirect segment engaged in the active semantic processing required for novel sentence repetition. This may account not only for impaired repetition with sparing of spontaneous speech and auditory comprehension, but also for better repetition of novel sentences than of over-learned clichés. By contrast, Catani et al. (2005) also noted that the inferior parietal lesion in ART was more superficial than those in ORF and RAN and hence ideally suited to injure only the superior part of the indirect segment or its parietal cortical relay station, fully preserving the direct long segment. This strategically placed lesion may have hindered ART's repetition of novel sentences by preventing access to meaning during repetition, whereas his ability to repeat over-learned phrases (clichés), which require less cognitive processing and effort, was preserved because the direct segment of the AF remained intact.
Lesion locations in the patients reported by McC & W were established using computerized tomography scans, which hampered the establishment of reliable anatomo-functional relationships. Therefore, although the *a posteriori* interpretations of McC & W's findings by Catani et al. (2005) represent a step toward understanding the mechanisms underlying repetition, further studies using modern imaging methods are warranted.

Testing brain-damaged patients with different repetition deficits and selective injury to either the left dorsal or ventral white matter bundles can provide valuable insight not only into their respective roles in verbal repetition, but also into the vicarious capacity of right white matter tracts to compensate for repetition deficits. However, selective involvement of a single white matter pathway (Carota et al., 2007) or of discrete portions of the cerebral cortex (Vallar et al., 1997) after highly focal lesions is exceptional, because brain lesions are usually large and rarely respect anatomical boundaries. In the present study, we took advantage of the exceptional circumstance of examining two aphasic patients with dissociated speech production and repetition deficits and small, contiguous but non-overlapping perisylvian infarctions. We evaluated the anatomo-functional correlates of these dissociated deficits with multimodal imaging.

#### **MATERIALS AND METHODS**

#### **PARTICIPANTS**

#### *Patient RTP*

RTP was a 41-year-old female who had completed 8 years of formal education and worked as a secretary before the stroke. She was referred to our unit for evaluation and treatment of a chronic non-fluent post-stroke aphasia. Her past history was unremarkable until December 2007, when she suffered the spontaneous rupture of a saccular aneurysm of the left anterior choroidal artery. The aneurysm was successfully occluded with coiling, but 1 week after the procedure RTP suddenly developed mutism with preserved comprehension and right hemiparesis with sensory loss mostly affecting the hand. At that time, she also had dysphagia and oral apraxia. An angiography disclosed vasospasm of the left middle cerebral artery (M2 and M3 segments), and a brain MRI showed a small infarct involving the left sensorimotor cortex with extension into the middle and posterior insular cortex. After hospital discharge, she received speech-language therapy for 1 year. On formal evaluation performed in April 2009, RTP had a TCMA and a discrete right-hand sensorimotor deficit. Although her communication in activities of daily living was judged by her relatives to be relatively spared, she did not attempt to communicate spontaneously and spoke only when addressed. Spontaneous speech was sparse and hesitant. She had long latencies to initiate utterances, but her messages were devoid of paraphasias, perseverations, agrammatism, or articulatory deficits. Auditory comprehension was impaired only for complex sequential commands. Repetition and naming were preserved and considerably better than spontaneous speech. Writing was impaired. She also had moderate cognitive and motor slowness, depression, mild apathy, and reduced quality of life, particularly in the physical and communication domains.

#### *Patient JGG*

JGG was a 52-year-old male who had completed 12 years of formal education and ran his own business before the stroke. He was referred to our unit for evaluation and treatment of a chronic fluent post-stroke aphasia. He had a history of treated hypothyroidism and well-controlled hypertension. In January 2009, while on vacation in Bangkok (Thailand), he suddenly lost consciousness and fell. On awakening, he was admitted to a local hospital, where naming and short-term memory problems were identified. An MRI angiogram revealed a complete occlusion of the M4 segment of the left middle cerebral artery, and an anatomical MRI showed a small left temporoparietal infarct. A cavum septum pellucidum and a cavum vergae were also seen (DeLisi et al., 1993; Choi et al., 2008). On returning to Spain, he received 3 months of speech-language therapy, and some improvement in aphasia severity was noted. On a formal evaluation performed in August 2009, JGG showed language deficits consistent with a mild reproduction CA (Shallice and Warrington, 1977; Nadeau, 2001). His spontaneous speech was fluent and well-articulated, but he made unsuccessful self-corrective attempts on barely accessible words during spontaneous speech (*conduite d'écart*). Comprehension and object naming were virtually intact. Repetition was preserved for short words but not for polysyllabic words and sentences. He was depressed and mildly apathetic, showing reduced communication in activities of daily living. He also felt tired and developed a right fronto-parietal headache soon after starting aphasia testing or therapy, a set of symptoms resembling those reported in patients with minor strokes in left parietal, thalamic, or caudate regions (Staub and Bogousslavsky, 2001; Radman et al., 2012; Tang et al., 2013).

#### **HANDEDNESS**

Handedness was evaluated using the Edinburgh Inventory of Handedness (EIH) (Oldfield, 1971). Both patients were right-handed (EIH above +40). RTP was strongly right-handed (EIH +100), and JGG also performed most activities, including writing, with the right hand, although he used either hand for some activities (EIH +77). Both patients had a negative history for familial left-handedness, prenatal or perinatal injuries, learning disabilities, or developmental disorders.

#### **LANGUAGE ASSESSMENT: APHASIA PROFILE**

#### *Methods*

Language deficits were assessed with the oral subtests (spontaneous speech, comprehension, repetition, and naming) of the Western Aphasia Battery (WAB) (Kertesz, 1982) to obtain an Aphasia Quotient (AQ), a measure of aphasia severity. On the WAB-AQ, patients are considered to have aphasia when they score <93.8, and lower scores indicate more severe deficits. The Naming × Frequency subtest of the Psycholinguistic Assessments of Language Processing in Aphasia (PALPA 54) (Kay et al., 1992; Valle and Cuetos, 1995) was also administered.

#### *Results*

Both patients obtained WAB-AQs below the cut-off score for the diagnosis of clinically significant aphasia (Kertesz, 1982). RTP obtained a moderately impaired WAB-AQ (74.9/100), with mild impairment of auditory comprehension (8.7/10), repetition (8.4/10), and naming (8.3/10) and a severe reduction of verbal output (fluency in spontaneous speech: 4/10), a combination of deficits that characterizes the TCMA syndrome (Kertesz, 1982; Berthier, 1999). However, the profile of TCMA in RTP was atypical because she had deficits in nonword repetition and a lesion in the central perisylvian area (Freedman et al., 1984; Berthier, 1999)<sup>1</sup>. According to recent accounts (Robinson et al., 1998, 2005, 2008), the profile of aphasia in RTP could also be classified as *mixed dynamic aphasia*<sup>1</sup> because she had phonological impairments (impaired nonword repetition) in addition to severely reduced spontaneous speech (see below). JGG obtained a better WAB-AQ (84/100) than RTP, and his subtest scores were mildly impaired for fluency (8/10), comprehension (9.3/10), and naming (9.1/10). His repetition score (8/10), though impaired for polysyllabic words and sentences, was above the range (0–6.9) required by the WAB (Kertesz, 1982) for a diagnosis of CA. Although the WAB classified the language deficits in JGG as anomic aphasia, his object naming was intact; moreover, the WAB taxonomic criteria are not sensitive enough to distinguish different mild aphasic syndromes. Therefore, the aphasic deficits in JGG were classified as a mild reproduction CA (Nadeau, 2001). Picture naming was normal in both patients, who obtained normal scores on the WAB object-naming subtest (RTP: 59/60; JGG: 60/60) and PALPA 54 (both 59/60).

#### **SPEECH PRODUCTION** *Methods*

Since the WAB measures for rating spontaneous speech (fluency and information content) have limited reliability, other rules for rating verbal production during picture description in aphasia are commonly used (Nicholas and Brookshire, 1993; Berndt et al., 2000; Marchina et al., 2011; Zipse et al., 2012). Therefore, communicative informativeness and efficiency of connected speech were evaluated from speech samples obtained during description of the "Picnic Scene" from the WAB (time limit: 5 min). All descriptions were audiotaped. Speech samples were transcribed and analyzed for the percentage of correct information units (CIUs), defined as non-redundant content words that convey correct information about the stimulus (Nicholas and Brookshire, 1993), using the formula: number of CIUs/number of words × 100. According to Nicholas and Brookshire (1993), to be classified as CIUs words must be not only intelligible in context, but also accurate, relevant and informative with respect to the stimulus. Meaningless utterances, perseverations, paraphasias and other inappropriate information (exclamations) were counted as words but not classified as CIUs.
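
The %CIU formula above is a simple proportion; as a minimal sketch (the word and CIU counts below are invented for illustration, not taken from the patients):

```python
# Minimal sketch of the %CIU measure defined above; the counts are
# hypothetical examples, not patient data.

def percent_ciu(num_cius, num_words):
    """%CIU = (number of CIUs / number of words) x 100."""
    return 100.0 * num_cius / num_words

# A 50-word picture description in which 30 words qualify as CIUs:
print(percent_ciu(30, 50))  # -> 60.0
```

Because paraphasias and meaningless utterances enter the word count but not the CIU count, the measure penalizes uninformative output even when total verbal output is high.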

#### *Results*

RTP had reduced informativeness and overall efficiency of speech, as measured with CIU and %CIU, in comparison to JGG (**Table 1**). Spontaneous speech in RTP was hesitant and perseverative, as reflected by a high number of pauses (>3 s), which partly contributed to her reduced fluency. Spontaneous speech in JGG was fluent and more informative than in RTP, but his utterances were occasionally punctuated by phonemic approximations to the target word (*conduite d'approche*).

#### **Table 1 | Language, communication, and behavior.**


*WAB indicates Western Aphasia Battery; PALPA, Psycholinguistic Assessments of Language Processing in Aphasia.*

#### **COMMUNICATION**

#### *Method*

Communication in activities of daily living was assessed with the Communicative Activity Log (CAL) (Pulvermüller and Berthier, 2008). The CAL is composed of 36 questions divided into two parts that address quality of communication (e.g., "How well would the patient verbally express criticisms or make complaints?") and amount of communication (e.g., "How frequently would the patient verbally express criticisms or make complaints?"). The quality of communication score is obtained by summing the scores for items 1–18, and the amount of communication score by summing the scores for items 19–36. The total score ranges from 0 to 180, and higher scores indicate better everyday communication. The CAL was completed by a reliable family member in the presence of one member of the research team, who clarified potential misunderstandings about question content or scoring.
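
The scoring scheme above can be sketched as follows. The per-item rating scale is not given in the text; a 0–5 scale is assumed here because 36 items × 5 yields the stated 0–180 maximum, and the ratings used are invented:

```python
# Sketch of CAL scoring as described above. The 0-5 per-item scale is an
# assumption (36 items x 5 = the stated 0-180 total); ratings are invented.

def cal_scores(ratings):
    """ratings: 36 integer item ratings (items 1-18 quality, 19-36 amount)."""
    assert len(ratings) == 36
    quality = sum(ratings[:18])   # quality of communication, items 1-18
    amount = sum(ratings[18:])    # amount of communication, items 19-36
    return quality, amount, quality + amount

# A hypothetical informant rating: high quality, lower amount.
quality, amount, total = cal_scores([5] * 18 + [3] * 18)
print(quality, amount, total)  # -> 90 54 144
```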

#### *Results*

Assessment of communication in daily living revealed that both patients were impaired in both quality and amount of communication (**Table 1**). However, RTP obtained better scores than JGG: her quality of communication was rated much better than her amount of communication, whereas JGG obtained similarly low scores on both subscales.

#### **BEHAVIORAL EVALUATION: DEPRESSION** *Method*

Depression was assessed with the Stroke Aphasic Depression Questionnaire (SADQ) (Sutcliffe and Lincoln, 1998). The SADQ is a 21-item questionnaire based on observable behaviors commonly associated with depressed mood. Questions address mood (e.g., "Does he/she have weeping spells?"), social interaction, loss of interest, sleep-related problems, and motivation. The SADQ was completed by a reliable family member on behalf of the patient, and higher scores indicate more severe depression.

#### *Results*

Both patients had symptoms of post-stroke depression with JGG obtaining higher scores on the SADQ than RTP (**Table 1**).

#### **EXPERIMENTAL REPETITION TESTING**

Although both patients obtained relatively similar scores on the WAB-repetition subtest, qualitative differences were found. Repetition in RTP was flawless except for the longest sentence, which she reproduced incompletely, most likely due to a mild short-term memory deficit. Repetition in JGG was impaired for polysyllabic words and sentences. Therefore, further tests were administered. Patients' scores on repetition of words, nonwords and word triplets were compared with those from a group of 14 healthy controls [5 men and 9 women; mean age: 57.1 ± 6.6 years (range: 47–67 years); education: 10.2 ± 3.7 years (range: 6.5–18 years)] (Berthier, 2001).

#### **WORD AND NONWORD REPETITION** *Methods*

The two patients were asked to repeat a list of auditorily presented words (*n* = 80) taken from the Frequency Dictionary of Spanish Words (Juilland and Chang-Rodriguez, 1964) and nonwords (*n* = 80). The corpus of words (nouns) included 20 high-frequency/high-imageability nouns, 20 high-frequency/low-imageability nouns, 20 low-frequency/high-imageability nouns and 20 low-frequency/low-imageability nouns. Nonwords contained 3–8 letters, were pronounceable, and most of them (71%) were derived from real words. All responses were audiotaped for later transcription, and only exact repetitions were scored as correct.

#### *Results*

Repetition was normal for words (nouns) and impaired for nonwords [RTP: χ²(1) = 15.53, *p* < 0.0001; JGG: χ²(1) = 51.57, *p* < 0.0001]. RTP repeated nonwords significantly better than JGG [χ²(1) = 22.01, *p* < 0.0001] (**Table 2**).

#### **REPETITION: GRAMMATICAL CLASS** *Method*

In order to identify possible differences in the ability to repeat different grammatical categories, the patients were asked to repeat a list of 200 words composed of nouns (*n* = 60), verbs (*n* = 50), adjectives (*n* = 50), and functors (*n* = 40), selected from the Frequency Dictionary of Spanish Words (Juilland and Chang-Rodriguez, 1964).

*\*Experimental tests from Berthier (2001). See further details in text.*

#### *Results*

Both patients showed a near perfect performance on this test (**Table 2**).

#### **DIGIT PRODUCTION**

#### *Method*

This was assessed with the Digit Production/Matching Span task (PALPA 13), which assesses the immediate serial recall of digit sequences (2–7 digits) of increasing length. In the matching-span component, patients are required to indicate whether two digit sequences presented verbally by the examiner are identical or not; sequences of increasing length (2–7 digits) were presented.

#### *Results*

Both patients had restricted digit production (**Table 2**). Digit span matching was not evaluated.

#### **REPETITION OF WORD TRIPLETS**

#### *Method*

To assess the influence of lexical-semantic information on repetition when the demands on auditory-verbal short-term memory are increased, both patients were asked to repeat lists of word triplets. This task is a modification of the one used by McCarthy and Warrington (1984, 1987) in patients with CA. In our battery, two sets of 60 three-word (verb-adjective-noun) triplets were created (Berthier, 2001), organized into six lists of 20 triplets (60 words per list). The lists were composed of word strings of increasing semantic richness, that is, from unorganized to organized semantic information. Two lists (List 1: 60 high-frequency words; List 4: 60 low-frequency words) consisted of random word combinations (e.g., "buy-sweet-country"). Two further lists (List 2: 60 high-frequency words; List 5: 60 low-frequency words) conveyed loosely constrained meaningful information (e.g., "shake-full-bottle"), and two more (List 3: 60 high-frequency words; List 6: 60 low-frequency words) conveyed closely constrained meaningful information (e.g., "cut-lovely-flower"). Words were read at a rate of one per second and patients were required to repeat the words in the order given by the examiner. Responses were scored as the number of lists repeated verbatim in each condition.

#### *Results*

Performance on this task was normal in RTP and markedly impaired in JGG (**Table 2**). RTP repeated high-frequency word triplets significantly better than JGG [χ²(1) = 3.86, *p* = 0.049], whereas there were no significant differences in the repetition of low-frequency triplets [χ²(1) = 2.86, *p* = 0.091]. RTP repeated the different word triplet types with similar efficiency, and JGG performed better on repetition of high-frequency and low-frequency triplets with constrained semantic information than on the other word triplets, but these differences did not reach statistical significance. However, when high-frequency and low-frequency word triplets were analyzed together, JGG repeated word triplets containing constrained semantic information significantly better (27/40, 0.67) than word triplets with random semantic organization [12/40, 0.30; χ²(1) = 9.68, *p* < 0.005] or loosely constrained semantic organization [13/40, 0.32; χ²(1) = 8.34, *p* < 0.005].

#### **REPETITION OF CLICHÉS AND NOVEL SENTENCES** *Method*

To explore a possible dissociation between the two types of sentences, both patients were asked to repeat well-known Spanish clichés (*n* = 40) taken from the 150 Famous Clichés of Spanish Language (Junceda, 1981), as well as a set of novel sentences (*n* = 40) constructed following the methodology described by Cum and Ellis (1999) and Berthier et al. (2011).

#### *Results*

RTP repeated clichés significantly better than JGG [χ²(1) = 15.88, *p* < 0.0001], but there were no differences between them in the ability to repeat novel sentences. JGG repeated novel sentences significantly better than clichés [χ²(1) = 5.33, *p* = 0.021], whereas RTP repeated clichés and novel sentences with similar efficiency (**Table 2**).

#### **MULTIMODAL NEUROIMAGING STRUCTURAL MRI**

#### *Methods*

MRI studies in both patients were performed on a 3-T magnet (Philips Gyroscan Intera, Best, The Netherlands) equipped with an eight-channel Philips SENSE head coil. Head movements were minimized using head pads and a forehead strap. High-resolution T1-weighted structural images of the whole brain were acquired with a three-dimensional (3D) magnetization prepared rapid acquisition gradient echo (3D MPRAGE) sequence (acquisition matrix: 240/256; field of view: 240 mm; repetition time [TR]: 9.9 ms; echo time [TE]: 4.6 ms; flip angle: 8°; turbo field echo (TFE) factor: 200; 1 × 1 × 1 mm³ resolution). One hundred eighty-two contiguous slices, each 1 mm thick with no slice gap, were acquired. The total acquisition time of the sequence was about 4:24 min. In addition to the 3D MPRAGE, a standard axial T2-weighted/FLAIR sequence (TR = 11,000 ms; TE = 125/27 ms; 264 × 512 matrix; field of view [FOV] = 230 × 230 mm; 3-mm-thick slices with 1 mm slice gap) was obtained. A Short TI Inversion Recovery (STIR) sequence was used to produce 24 axial slices of 2.5 mm (interslice gap = 1 mm; TR = 4718 ms; TE = 80 ms; inversion time = 200 ms; 264 × 512 matrix; FOV = 230 mm; number of excitations = 2). Lesion volumes were manually drawn by one of us (SFW), who was blind to the patients' aphasic profiles. The drawings were made on the T1-weighted images using MRIcro software (Rorden, 2005; http://www.mccauslandcenter.sc.edu/mricro/mricro/).

#### *Results*

Anatomical MRI revealed contiguous but non-overlapping perisylvian infarctions (**Figure 2**). Lesions were small and had similar volumes (RTP = 16.5 cm³; JGG = 15.8 cm³). Tissue damage in RTP involved the left sensorimotor cortex (precentral gyrus and postcentral gyrus). There were minute foci in the left medial/posterior insula and superior temporal gyrus, but most of this gyrus and the whole supramarginal gyrus were spared. JGG had tissue damage centered in the left posterior temporal gyrus and supramarginal gyrus. The cortical lesion in RTP was more superficial than the one found in JGG, which extended deeply to reach the ventricular wall at the level of the white matter underlying the superior temporal gyrus and supramarginal gyrus. There were no lesions in the right hemisphere. The MRI in JGG additionally disclosed a cavum septum pellucidum and cavum vergae, but no other developmental malformations (**Figure 2**).

#### **DIFFUSION TENSOR IMAGING (DTI)** *Methods*

DTI allows for "*in vivo*" measurement of the diffusive properties of water in a way that allows information to be garnered about the microstructural organization of tissue (Basser et al., 1994). Tractography enables the orientation of white matter (WM) to be ascertained, thus making possible the segregation of WM into separate sections based on the paths of the distinct tracts (Le Bihan, 2003). Data acquisition was performed using multi-slice single-shot spin-echo echo-planar imaging (EPI) with the following parameters: FOV 224 mm, 2-mm-thick slices with no slice gap, TE = 117 ms, TR = 12408 ms, and b factor: 3000 s/mm². The EPI echo train length consisted of 59 actual echoes reconstructed in a 112 × 128 image matrix. Sixty-four diffusion directions were used in order to allow for precise construction of the diffusion tensor. Motion and eddy current correction were performed using the eddy current correction tool of FSL's FDT (http://www.fmrib.ox.ac.uk/fsl/) (Smith et al., 2004; Woolrich et al., 2009). Diffusion tensor estimation was carried out for each voxel using Diffusion Toolkit's least-squares estimation algorithm (Ruopeng Wang, Van J. Wedeen, TrackVis.org, Martinos Center for Biomedical Imaging, Massachusetts General Hospital). The whole-brain tractography used an angular threshold of 35° and an FA threshold of 0.2. The tensor was spectrally decomposed in order to obtain its eigenvalues and eigenvectors. The fiber direction is assumed to correspond to the principal eigenvector (the eigenvector with the largest eigenvalue). This vector was color coded (green for anterior-posterior, blue for superior-inferior and red for left-right) in order to help generate the color FA map. An FA map was also generated from these eigenvalues, again using Diffusion Toolkit.
Virtual dissections of the three parts of the AF, the corpus callosum and the inferior frontal-occipital fasciculus/extreme capsule (IFOF/EmC) were performed by using a region of interest (ROI) approach, following the directions of a white matter tractography atlas (Catani and Thiebaut de Schotten, 2012) and Catani et al. (2007). All virtual dissections were performed using TrackVis (Ruopeng Wang, and Van J. Wedeen, TrackVis.org, Martinos Center for Biomedical Imaging, Massachusetts General Hospital).
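
The per-voxel tensor computations described above (spectral decomposition, principal eigenvector as the assumed fiber direction, and fractional anisotropy) can be sketched as follows; the example tensor is illustrative and not drawn from the patients' data:

```python
# Sketch of the per-voxel tensor computations described above: spectral
# decomposition of a symmetric 3x3 diffusion tensor, the principal
# eigenvector (assumed to give the fiber direction), and fractional
# anisotropy (FA). The tensor below is illustrative, not patient data.
import numpy as np

def fa_and_direction(tensor):
    """Return (FA, principal eigenvector) for a symmetric 3x3 tensor."""
    evals, evecs = np.linalg.eigh(tensor)   # eigenvalues in ascending order
    l1, l2, l3 = evals[::-1]                # l1 = largest eigenvalue
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l1 - l3) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    fa = float(np.sqrt(0.5 * num / den))
    return fa, evecs[:, -1]                 # eigenvector of the largest eigenvalue

# Strongly anisotropic example tensor (diagonal, units of 1e-3 mm^2/s):
# diffusion is fastest along x, so the principal direction is the x-axis.
D = np.diag([1.7, 0.3, 0.2])
fa, direction = fa_and_direction(D)
print(round(fa, 3))  # -> 0.836
```

An FA of 0.2, as used for the tractography threshold above, would correspond to nearly isotropic diffusion; coherent white matter typically yields much higher values, as in this example.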

#### *Results*

Reconstruction of the three segments of the AF and the IFOF/EmC revealed differences between the patients in both the primarily affected (left) and unaffected (right) hemispheres. Reconstruction of the AF of the right hemisphere revealed that RTP had well-developed temporo-frontal, temporo-parietal and parieto-frontal sections (**Figure 3A**). The IFOF/EmC was also intact (**Figure 3A**). In the left hemisphere, however, we were only able to reconstruct the temporo-parietal segment of the AF (**Figure 3B**), with the parieto-frontal and temporo-frontal segments seemingly destroyed by the lesion (**Figure 3B**). We were also unable to reconstruct the entire path of the left IFOF/EmC in RTP (**Figure 3B**). The occipito-temporal and frontal sections were reconstructed separately, but the section of this tract that swings from the temporal lobe through the external capsule and connects both parts, shown in **Figure 3B**, was affected by the lesion and could not be reconstructed.

Analysis of the same tracts in JGG revealed a complete lack of the direct temporo-frontal section of the AF in the right hemisphere, as is often found in males (Catani et al., 2007; **Figure 4A**). The indirect temporo-parietal and parieto-frontal sections were present but very small (**Figure 4A**). Tractography of the left hemisphere tracts revealed that the fronto-parietal segment of the AF was intact, but the temporo-parietal and temporo-frontal segments were unreconstructable, and seemingly destroyed by the lesion (**Figure 4B**). The IFOF/EmC was reconstructed successfully in both hemispheres (**Figures 4A,B**).

Reconstruction of the corpus callosum of each patient revealed a large and completely intact tract in RTP (**Figure 2C**, top panel), while a large section of the midbody of the corpus callosum in JGG revealed only a sparse number of reconstructed streamlines, with the rostral midbody particularly affected (**Figure 2C**, bottom panel) (complementary data are shown in the positron emission tomography section).

**FIGURE 4 | DTI, MRI, and PET.** Uninflated surface of the right **(A)** and left hemispheres **(B)** (FreeSurfer reconstruction) of the patient with conduction aphasia (JGG) showing gyri colored in green with sulci shown in red. DTI of right hemisphere perisylvian pathways superimposed on JGG's anatomical MRI shows a complete lack of the direct temporo-frontal section of the AF in the right hemisphere. The indirect temporo-parietal (yellow) and parieto-frontal (purple) segments are present but very small. White matter tract reconstruction in the left hemisphere shows that the fronto-parietal segment of the AF (purple) was intact, but the temporo-parietal and temporo-frontal segments were unreconstructable, and seemingly destroyed by the lesion **(B)**. The ventral stream (inferior frontal-occipital fasciculus/extreme capsule) was reconstructed successfully in both hemispheres **(A,B)**. The surface image of the left hemisphere depicted in **(B)** also shows the infarction (red area) involving the posterior-superior and middle temporal gyri and part of the supramarginal gyrus. A parasagittal T1-weighted image shows a component of the infarction in the supramarginal gyrus (white arrow) surrounded by perinecrotic tissue (yellow arrow) **(C)**. A parasagittal 18FDG-PET image (MRIcroN) shows an area of reduced metabolic activity in the left temporoparietal region (red) **(D)**, which at the level of the posterior temporal gyrus is slightly larger than the area of infarction depicted in **(C)**. Less voluminous foci of reduced metabolic activity (all in red) are also shown in the left middle frontal gyrus (Brodmann's area 6), lateral orbitofrontal cortex (Brodmann's area 11), inferior temporal gyrus (Brodmann's area 20), and cerebellum **(D)**. See further details in **Table 4**.

#### **FUNCTIONAL MRI**

#### *Methods*

The fMRI included the following parameters: T2-weighted fMRI scans were acquired using a gradient echo FFE-EPI (fast-field echo-echo planar image) sequence (repetition time/echo time = 2500/30 ms, flip angle = 60°, field of view = 23.0 × 23.0 cm, matrix = 96 × 128; 40 axial slices aligned parallel to the anterior commissure-posterior commissure line, slice thickness = 2.5 mm; interslice gap = 0.5 mm). fMRI data preprocessing and processing were carried out using FEAT (FMRI Expert Analysis Tool) Version 5.98, part of FSL (Version 4.1.8) (Jenkinson et al., 2012). Functional datasets underwent pre-processing using the specifications outlined below. Data were corrected for movement during the scan using MCFLIRT (Jenkinson et al., 2002). BET brain extraction was used to delete non-brain tissue from functional datasets in preparation for registration to structural images (Smith, 2002). Data were "prewhitened" and smoothed with a 5 mm FWHM kernel. A high-pass filter cut-off of 60 s was applied (per the FSL recommendation of a filter equal to the total design cycle time). Low-resolution functional images were first registered to each individual's brain-extracted high-resolution structural image using a linear search (6 DOF). High-resolution structural images were then registered to the standard-space MNI-152 T1 2 mm template with a 12 DOF linear transformation followed by a non-linear warp. Manual denoising of the data was performed by examining the FSL MELODIC (Beckmann and Smith, 2004) output generated for each subject during pre-processing. Components were deleted if they demonstrated activations correlated with subject movement or other artifacts rather than authentic task activation, according to the recommendations set out in Kelly et al. (2010). Time-series statistical analysis was carried out using FILM with local autocorrelation correction (Woolrich et al., 2001).
Z (Gaussianised T/F) statistic images were thresholded using clusters determined by Z > 5 and a (corrected) cluster significance threshold of *p* = 0.05.

#### **STIMULI AND EXPERIMENTAL DESIGN**

Both patients performed the behavioral testing (repetition of words, nonwords and word triplets) and fMRI on the same day. All paradigms contained the same number of stimuli in the behavioral session and fMRI session, and stimuli were presented in the same order and with the same timing. The experimental paradigms included three covert repetition activation tasks. In the single-item repetition paradigm, one task contained 40 high-frequency, concrete Spanish nouns [e.g., "casa" (house)], whereas the other task contained 40 nonwords derived from real words ["piedra" (stone) → *pierla*] by substituting phonemes on the basis of Spanish phonotactic rules. The third paradigm was a word triplet repetition task containing 20 triplets of high-frequency words composed of semantically random word combinations (e.g., "buy-sweet-country"). Only high-frequency words were used, taken from the Frequency Dictionary of Spanish Words (Juilland and Chang-Rodriguez, 1964). All tasks were binaurally presented through headphones, and patients were requested to covertly repeat each item or triplet without delay. During the word and nonword paradigms, 4 baseline and 4 stimulus sequences were presented, and each period (baseline or stimulus sequence) lasted 30 s. Each stimulus sequence included 10 items with a presentation time of 3.0 s per stimulus. The same methodology was applied for word triplet repetition, but each stimulus sequence included 5 word triplets with a presentation time of 6.0 s per stimulus. No word, nonword or triplet was repeated between periods or time points. The number and proportion of errors (including errors and no responses) in each task during the behavioral testing was analyzed for each patient.

#### **BEHAVIORAL AND IMAGING RESULTS**

Behavioral assessment of single word repetition was flawless (40/40, 1.0) in RTP and almost perfect (38/40, 0.95) in JGG [χ²(1) = 0.51, *p* = 0.447], but nonword repetition was abnormal in both patients, with RTP performing significantly better (33/40, 0.80) than JGG (17/40, 0.42) [χ²(1) = 11.85, *p* < 0.001]. Word triplet repetition was normal in RTP (38/40, 0.95) and moderately impaired in JGG (22/40, 0.55) [χ²(1) = 14.81, *p* < 0.001]. The fMRI showed bilateral perisylvian activation in both patients in all three tasks (**Figure 5** and **Table 3**). JGG showed greater areas of activation, extending into motor, premotor and prefrontal areas in addition to the perisylvian areas activated by both patients during the word and nonword repetition tasks (**Figure 5**). In contrast, word triplet repetition activated a greater bilateral network in RTP than in JGG, with JGG exhibiting focal activation in left frontal and superior temporal areas and small right-sided activations in the superior temporal sulcus and inferior frontal gyrus (**Figure 5**).
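
The pairwise comparisons reported here are 2 × 2 chi-square tests on correct/incorrect counts. A minimal sketch using the nonword counts above (no continuity correction is applied, so the result can differ slightly from the published statistic, which may have used one):

```python
# Sketch of a 2x2 chi-square comparison of two patients' correct/incorrect
# counts, using the standard Pearson formula (1 degree of freedom).
# No Yates correction is applied, so values may differ slightly from
# the statistics reported in the text.

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Nonword repetition in the behavioral session:
# RTP 33/40 correct (7 errors) vs. JGG 17/40 correct (23 errors).
chi2 = chi_square_2x2(33, 7, 17, 23)
print(round(chi2, 2))
```

With 1 degree of freedom, values above 10.83 correspond to *p* < 0.001, consistent with the significant RTP-versus-JGG difference reported for nonwords.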

### **POSITRON EMISSION TOMOGRAPHY**

#### *Methods*

A [18F]-fluorodeoxyglucose positron emission tomography (18FDG-PET) scan was performed at rest using a GE Advance PET/CT scanner (GE Medical Systems). Preparation for the study included fasting for at least 6 h before the administration of 18F-FDG and oral hydration with water. Both patients and 25 healthy control subjects (female/male: 11/14; mean age ± SD: 56.9 ± 5.7 years; age range: 45–67 years) refrained from alcohol- and caffeine-containing drinks and from smoking for 12 h before the PET scan. The subjects received an approximate dose of 370 MBq of [18F]FDG under resting conditions with eyes closed, in an environment with dimmed ambient light. Forty minutes after the injection, PET acquisition was performed in a GE Discovery ST PET scanner over 20 min in 3D mode with a field of view of 15.7 cm and a pixel size of 2.3 mm, after CT for attenuation correction purposes. The images were reconstructed with iterative reconstruction, resulting in 47 sections with a slice thickness of 3.27 mm. Statistical parametric mapping software (SPM5, http://www.fil.ion.ucl.ac.uk/spm/software/spm5/), based on MATLAB v7.7 (The MathWorks Inc, Natick, MA), was used for realignment, transformation into standard stereotactic space, smoothing (6 mm FWHM), and statistical analyses. Individual global counts were normalized by proportional scaling to a mean value of 50 mg/100 ml/min.
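
The proportional-scaling step can be sketched as follows; the voxel values are toy numbers standing in for a full image volume, and real analyses perform this within SPM rather than on Python lists:

```python
# Sketch of proportional scaling as used in the SPM analysis: each
# subject's values are rescaled so that the global mean equals the
# reference value of 50 stated above. Toy values, not real data.

def proportional_scaling(voxels, target_mean=50.0):
    global_mean = sum(voxels) / len(voxels)
    return [v * target_mean / global_mean for v in voxels]

scaled = proportional_scaling([40.0, 60.0, 80.0, 100.0])  # global mean 70
# After scaling, the global mean is (up to floating-point rounding) 50.
print(sum(scaled) / len(scaled))
```

This normalization removes between-subject differences in global tracer uptake, so that the voxel-wise comparisons against controls reflect regional rather than global metabolic differences.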

#### *Results*

The areas of reduced metabolic activity in both patients are shown in **Table 4** and **Figures 2**, **3**, **4**, **6**. The PET in RTP showed areas of significantly decreased metabolic activity in the area of infarction and surrounding it in the left precentral gyrus, postcentral gyrus, insula, and middle frontal gyrus. Other cortical areas in both cerebral hemispheres together with the left caudate also showed decreased metabolic activity. In JGG significant decrements of metabolic activity were found in the area of structural damage and surrounding it in the left supramarginal gyrus, middle temporal gyrus, inferior temporal gyrus and in the middle frontal gyrus. Decreased glucose metabolism was also found in other cortical areas in both cerebral hemispheres, left posterior thalamus, left caudate, bilateral cerebellum and rostral body of the corpus callosum (**Figures 2**, **6**).

#### **DISCUSSION**

In this study, we examined the structural and functional correlates of speech production and repetition in two patients with contrasting aphasic syndromes, TCMA and CA. In their cognitive investigation of the two routes of speech production, McCarthy and Warrington (1984) reported a double dissociation in patients with CA and TCMA (see Introduction). They found that repetition relying on active semantic processing (novel sentences) was facilitated in CA and hindered in TCMA, whereas repetition that minimized the engagement of semantics (clichés) was privileged in TCMA and hampered in CA. We replicated and extended these behavioral findings and subsequent elaborations by other authors (Catani et al., 2005; Catani and Thiebaut de Schotten, 2012) on the plausible neural correlates of this double dissociation in the patients described by McCarthy and Warrington (1984). From a clinical and radiological standpoint, our cases are not fully comparable with theirs, because their patients had more severe aphasias and larger lesions than ours. Moreover, one of their two patients with CA (ORF) and their patient with TCMA (ART) were examined in the acute post-stroke period (∼1 or 2 months post-onset). By contrast, our patients were evaluated in the chronic stage, implying the possibility of recovery of certain functions. Nonetheless, we trust that our patients are comparable with McCarthy and Warrington's cases in other respects, since all of these patients showed a clear-cut double dissociation in tasks that manipulated the semantic processing requirements of repetition. This evidence may inform how selective damage or sparing of different segments of the dorsal stream (AF) and of the ventral stream (IFOF/EmC) can underpin stable deficits or preserved/compensated function.

Large-scale cortico-subcortical networks are engaged in language function (Mesulam, 1998), but the respective roles of cortical areas and white matter tracts, as well as their dynamic interaction in subserving language repetition, are still controversial (Saur et al., 2008; Bernal and Ardila, 2009; Rolheiser et al., 2011; Turken and Dronkers, 2011; Berthier et al., 2012; Dick and Tremblay, 2012; Rijntjes et al., 2012; Friederici and Gierhan, 2013). Verbal repetition is a multifaceted function involving multiple domains (attention, phonological working memory, and lexical-semantic, syntactic, phonemic and motor production processes) and requiring the concerted action of several cortical areas and white matter bundles in both cerebral hemispheres (Price et al., 1996; Castro-Caldas et al., 1998; Burton et al., 2001; Collete et al., 2001; Abo et al., 2004; Saur et al., 2008; Majerus et al., 2012; Hartwigsen et al., 2013). While performance on language repetition tasks in healthy subjects is almost perfect, differences between subjects have been described (Castro-Caldas et al., 1998; Catani et al., 2007) which may depend on demographic factors (e.g., gender, literacy) and on individual variability in the anatomy and function of cortical areas and white matter tracts (Catani et al., 2005; Berthier et al., 2012). The neural basis of repetition is currently interpreted within a dual dorsal-ventral pathway framework (Hickok and Poeppel, 2007; Rauschecker and Scott, 2009; Axer et al., 2013; Cloutman, 2013; Friederici and Gierhan, 2013; Kümmerer et al., 2013). The ventral pathway connects the frontal and temporal cortices via the ventral stream (IFOF/EmC and uncinate fasciculus) and has been associated with comprehension processes, mapping sounds onto meaning (Makris and Pandya, 2009; Turken and Dronkers, 2011; Rijntjes et al., 2012; Axer et al., 2013). The AF is formed by three segments: the long, direct pathway connecting the temporal cortex to the prefrontal cortex, and the indirect pathway composed of an anterior segment connecting the inferior parietal cortex and Broca's area and a posterior segment connecting temporal and parietal regions (Catani et al., 2005). The direct segment supports sensory-to-motor mapping, participating in speech perception and fast, automatic repetition (Catani et al., 2005; Hickok and Poeppel, 2007; Saur et al., 2008; Friederici, 2012), whereas the indirect segment has been linked to verbal comprehension (semantic/phonological transcoding, complex syntactic processing) (Catani et al., 2005; Friederici and Gierhan, 2013).

#### **Table 3 | Brain areas activated on functional Magnetic Resonance Imaging during repetition tasks.**

*Localization is based on MNI (Montreal Neurological Institute) stereotactic coordinates. These coordinates refer to the location of maximal activation indicated by the Z score in a particular anatomic structure. Distances are relative to the intercommissural (AC-PC) line in the horizontal (x), anterior-posterior (y) and vertical (z) directions.*

#### **Table 4 | Brain regions showing significant decreases of metabolic activity in RTP and JGG relative to 25 healthy control subjects.**

*FWE p < 0.05 (corrected for multiple comparisons). Localization is based on MNI (Montreal Neurological Institute) stereotactic coordinates. These coordinates refer to the location of maximal metabolic decrease indicated by the Z score in a particular anatomic structure. Distances are relative to the intercommissural (AC-PC) line in the horizontal (x), anterior-posterior (y) and vertical (z) directions.*

**FIGURE 6 | 18FDG-PET.** Axial PET images (MRIcroN) of patients with transcortical motor aphasia (blue) and conduction aphasia (red) showing significant reductions of metabolic activity in the left perisylvian areas and interconnected areas (see **Table 4**). Note that although the structural lesion in RTP was more superficial than the one in JGG (see **Figure 2**), decreased metabolic activity extended deeply to affect white matter in both cases. The left hemisphere is represented on the left side of the images.

While the ventral streams are symmetrical, the dorsal streams (AF) are more asymmetrical (Paus et al., 1999; Catani et al., 2005; Turken and Dronkers, 2011; Axer et al., 2013), and most DTI studies reveal a gender-dimorphic architecture of the dorsal stream (Catani et al., 2005; Catani and Mesulam, 2008; Häberling et al., 2013; but see Gharabaghi et al., 2009). Leftward-biased asymmetry of the AF predominates in males and usually coexists with absent or vestigial development of its long segment in the right hemisphere. By contrast, females tend to have more symmetrical patterns (Catani et al., 2005; Powell et al., 2006; Catani and Mesulam, 2008; Thiebaut de Schotten et al., 2011; Catani and Thiebaut de Schotten, 2012; Häberling et al., 2013). The leftward-biased asymmetry of the AF may depend on genetic influences (Häberling et al., 2013) and also on repeated practice (Halwani et al., 2011). The ventral streams are present at birth (Perani et al., 2011), but the segment of the AF connecting the temporal cortex and Broca's area is undetectable in newborns (Perani et al., 2011) and matures late, adopting an adult-like structure by the age of 7 years (Friederici, 2012). Thereafter, the macrostructure (tract length and width) and microstructure (fractional anisotropy) of the AF do not remain static; rather, the AF is gradually modeled by skill acquisition and repeated practice (e.g., in musicians and bilinguals) throughout the lifespan (Draganski and May, 2008; Halwani et al., 2011; May, 2011; Mackey et al., 2012; Schlegel et al., 2012) and also by intensive training tailored to remediate pathological conditions such as aphasia (Schlaug et al., 2009, 2010; Breier et al., 2011; Zipse et al., 2012).
Based on these behavioral and imaging data, we suggest that dissociated performances in speech production (nonfluent in TCMA and fluent in CA) and repetition tasks (preserved in TCMA and impaired in CA) in the present cases resulted from damage to the left dorsal stream and gender-dimorphic architecture of their right dorsal streams.

#### **DISSOCIATED SPEECH PRODUCTION DEFICITS**

Speech production deficits (fluency) in our cases were dissociated: slowness and hesitation characterized spontaneous speech in the TCMA patient (RTP), whereas fluent utterances occasionally punctuated by self-corrections were heard in the CA patient (JGG). Picture description was more informative in JGG than in RTP, though both produced meaningful utterances devoid of paraphasias and articulatory or apraxic deficits. Auditory comprehension and object/picture naming were largely preserved in both cases, but word list generation (animal naming) was poor, with RTP producing fewer exemplars than JGG. The fact that both patients had small lesions of similar volumes makes it unlikely that this variable could explain the differences found in speech production (Lazar et al., 2008; Marchina et al., 2011). Instead, we attribute the dissociated speech production deficits to differences in both lesion location and metabolic changes in regions nearby and distant to the areas of infarction. Anatomical MRIs in our patients showed small contiguous but non-overlapping left perisylvian infarctions. In RTP, there was involvement of the left sensorimotor cortex and medial insula relevant for the planning and execution of speech (Riecker et al., 2005; Sörös et al., 2006; Ackermann and Riecker, 2010; Price, 2010), whereas fluent speech in JGG could be related to involvement of the left temporoparietal cortex with sparing of more anterior cortical areas. Pervasive and sometimes long-lasting deficits in speech production can also emerge in association with damage to left hemisphere white matter tracts (medial subcallosal fasciculus, periventricular substance) (Naeser et al., 1989), and lesion load in the left AF can also impair speech production in aphasia (Marchina et al., 2011). Of note, DTI revealed damage to different segments of the AF in our cases, which might account for the dissociated speech production deficits.
RTP had damage to the left parietofrontal segment of the AF, previously related to impaired speech fluency in TCMA (Catani et al., 2005; Catani and Thiebaut de Schotten, 2012), whereas this segment was spared and could be identified in patient JGG, who had fluent CA.

An important caveat on the contribution of AF lesion load to speech production deficits in our cases is the concomitant cortical involvement. On the basis of recent research, it could be argued that functional and structural damage to the sensorimotor cortex, pars opercularis, and insula in RTP impacted fluency (Blank et al., 2002; Borovsky et al., 2007) more than the temporoparietal involvement did in JGG (Hickok et al., 2011). On the other hand, the cortical temporoparietal component of the lesion in JGG could actually be responsible for other deficits characteristic of CA, since it was strategically placed to disrupt phonological short-term memory (Vallar et al., 1997; Leff et al., 2009; Buchsbaum et al., 2011) and auditory-motor integration dependent upon the activity of area Spt, a small region recently implicated in the pathogenesis of CA (for review see Hickok et al., 2011). Indeed, some researchers argue that left temporoparietal involvement suffices to explain the language and short-term memory deficits in CA (Buchsbaum et al., 2011; Hickok et al., 2011), thus undermining the traditional "disconnection" mechanism in JGG (Geschwind, 1965). However, PET data in JGG showed significant metabolic decrements in frontal areas (Brodmann's areas 6, 9) remote from the temporoparietal region but connected with it via the AF (Rilling et al., 2008). Moreover, it should be kept in mind that the cortical damage encompassing area Spt and surrounding regions also involves the cortical origins of the AF, and this, coupled with the additional involvement of the AF at the subcortical level, may have impacted the activity of distant areas. In other words, damage to one component of a network may alter the function of the whole system (Gratton et al., 2012; Rijntjes et al., 2012).

#### **COMMUNICATION AND BEHAVIOR**

We also found reductions in everyday verbal communication, with the TCMA patient (RTP) unexpectedly attaining better scores on the quality and amount of communication subscales of the CAL (Pulvermüller and Berthier, 2008) than the CA patient (JGG). These findings appear paradoxical, particularly in light of recent brain imaging studies reporting that deficits in speech fluency and conversational speech co-occur because the responsible lesions overlap in cortical (Borovsky et al., 2007) and subcortical sites (Marchina et al., 2011). The notion that communication relies on language is intuitively appealing, but recent studies suggest that language and communication may be dissociable abilities by virtue of being reliant on the activity of different neural systems (Ellis et al., 2005; Willems and Varley, 2010; Coelho et al., 2012; Moreno-Torres et al., 2013). The processing of phonology, syntax and lexical-semantics depends on components (perisylvian cortex, temporal pole) of an extended neural network that dynamically interacts with other networks (e.g., medial prefrontal cortex, basal ganglia, cerebellum) engaged in complementary functions, including the motivation to communicate messages, understanding the intentions of others, and so forth (Willems and Varley, 2010). Therefore, if some components of these large-scale networks are lesioned whereas other components remain functional, evaluation of language and communication abilities in such cases will show dissociable deficits (Ellis et al., 2005; Willems and Varley, 2010; Moreno-Torres et al., 2013).

In our cases, RTP exhibited the predictable correspondence between diminished speech production and poor communication. Verbal communication deficits in RTP affected the amount of communication more than its quality, perhaps because the prefrontal areas (Brodmann's areas 9, 10, and 46) and the anterior insula implicated in narrative discourse (Alexander, 2006; Moreno-Torres et al., 2013) were only mildly affected. By contrast, the amount and quality of communication were equally affected in JGG, and neuroimaging (anatomical MRI, DTI, PET) disclosed involvement of structures implicated in communication, including the inferior parietal lobe (Geranmayeh et al., 2012), left anterior cingulate gyrus, rostral body of the corpus callosum, bilateral cerebellum, and right paravermis (Durisko and Fiez, 2010; Marvel and Desmond, 2010; Willems and Varley, 2010). The fact that both patients were depressed probably contributed to reduced functional communication in socially interactive contexts (Fucetola et al., 2006).

#### **DISSOCIATED REPETITION DEFICITS**

Multimodal brain imaging findings in our cases extend the interpretation of traditional models (Lichtheim, 1885; Wernicke, 1906; McCarthy and Warrington, 1984) and their ensuing elaborations (Catani et al., 2005) by incorporating the compensatory activity of other white matter tracts and cortical areas. Our results suggest that the dissociated repetition deficits in our cases depend on available interactions between the left dorsal stream (spared segments, remains of short tracts) and the left ventral stream, as well as on the gender-dimorphic architecture of the right dorsal stream. In the TCMA patient (RTP), damage to the left sensorimotor cortex and insula extending into the dorsal stream and part of the ventral stream did not alter repetition, except for a moderate impairment in nonword and digit repetition. The abnormal performance of RTP on nonword repetition after damage to the dorsal stream is compatible with its putative role in phonological transcoding (Saur et al., 2008; Rijntjes et al., 2012; Cloutman, 2013), but this function was only moderately impaired in RTP, implying additional mediation or compensation by other structures. Moreover, preserved performance on word and sentence repetition tasks, as documented in RTP, is highly unlikely after damage to the left dorsal stream unless other structures also contribute to these language functions. DTI of the left hemisphere showed that the temporo-parietal segment of the left AF was spared, as was its intertwining with the ventral stream in the posterior temporal lobe (Rijntjes et al., 2012). Complementary fMRI data during all repetition tasks showed consistent activation of the left middle and superior temporal regions where the temporo-parietal segment of the left dorsal stream and the ventral stream interact (Rolheiser et al., 2011; Rijntjes et al., 2012; Cloutman, 2013).
Repetition of words, word lists (triplets) and sentences was almost intact in RTP, with no influence of linguistic variables (word frequency, imageability, lexicality, meaningfulness of word triplets, and familiarity of sentences), raising the possibility that, in the face of unavailable left parieto-frontal and temporo-frontal AF segments, this verbal information was redirected via the spared temporo-parietal segment to the ventral stream to be repeated successfully (see Lopez-Barroso et al., 2011). Moreover, we also attribute the successful repetition performance of RTP to the contribution of the right dorsal and ventral streams (Berthier et al., 2012). In this regard, it should be noted that RTP had well-developed right dorsal and ventral streams and that during fMRI tasks there was bilateral activation in frontal areas, suggesting recruitment of the left and right dorsal streams, with transmission of signals to the superior temporal cortices indicating a shift of activation from the dorsal stream to the ventral stream. RTP was a female with apparently symmetric dorsal streams, an anatomical pattern which correlates with better verbal learning through word repetition in females than in males with leftward-biased asymmetry of the dorsal stream (Catani et al., 2007). Accordingly, it is possible that the topography of fMRI activation foci in RTP indicates a rather symmetric organization of repetition before the stroke (Berthier, 1999), or its reorganization in the right hemisphere after brain injury. The latter possibility seems unlikely because the left hemisphere lesion in RTP was small, and compensation by the right hemisphere after stroke usually takes place in cases with large left hemisphere lesions (Heiss and Thiel, 2006; Berthier et al., 2011; Turkeltaub et al., 2011).

Brain-behavior relationships in the CA patient (JGG) were different from those found in RTP. He was moderately impaired in repeating nonwords, digits, non-meaningful word triplets and clichés, although he could repeat words, meaningful word triplets and novel sentences fairly well. Multimodal brain imaging disclosed tissue damage and reduced metabolic activity in the left posterior temporal cortex/supramarginal gyrus, with additional metabolic decrements in the left frontal lobe and other structures (see below). DTI showed the left temporo-parietal and temporo-frontal segments of the AF interrupted by the lesion, but both ventral streams were spared. Importantly, the direct segment of the AF in the right hemisphere was also absent, with only vestigial remains of the other dorsal subcomponents present, an architecture prevailing in males (Catani et al., 2005; Catani and Mesulam, 2008). Although both patients had small lesions of similar volumes, the fMRI tasks showed larger areas of bilateral perisylvian activation in JGG than in RTP, extending into motor, premotor and prefrontal areas in addition to the perisylvian areas activated by both patients during the word and nonword repetition tasks. In contrast, word triplet repetition activated a greater bilateral network in RTP than in JGG, with JGG exhibiting focal activation in left frontal and superior temporal areas, and in small right-sided superior temporal sulcus and inferior frontal gyrus regions. This limited activation was not totally unexpected, as the word triplets were semantically unrelated and he could only repeat semantically related three-word strings. In the same context, JGG repeated novel sentences requiring active semantic processing significantly better than overlearned clichés, a dissociated performance suggestive of reliance on the ventral streams.

The activation of large areas in JGG was not observed in previous CA cases with small (Fernandez et al., 2004; case JVA in Berthier et al., 2012) or large structural lesions (Harnish et al., 2008). Rather, this activation resembles the patterns recently described both in normal children with a still undeveloped AF (Brauer et al., 2010) and in adolescents with early damage to the AF (Yeatman and Feldman, 2013). The MRI in JGG disclosed an enlarged cavum septum pellucidum/cavum vergae and involvement of the anterior corpus callosum, besides the poorly developed right dorsal stream. A small cavum septum pellucidum (Grades 0–2) is considered a normal neuroanatomical variation and can occur in around 30% of healthy control subjects (DeLisi et al., 1993; Hopkins and Lewis, 2000; Choi et al., 2008). By contrast, an enlarged cavum septum pellucidum (Grades 3 and 4) represents a midline malformation and a marker of arrested development of neighboring structures such as the hippocampus, septal nuclei, limbic system, or corpus callosum (Kim and Peterson, 2003; Brown et al., 2009). The presence of an enlarged cavum septum pellucidum has been associated with various disorders related to dysfunction of the aforementioned structures, such as schizophrenia (Degreef et al., 1992; Trzesniak et al., 2012), bipolar disorder (Kim et al., 2007), obsessive-compulsive disorder (Chon et al., 2010), and developmental disorders (macro/microcephaly, mental retardation, developmental delay, Tourette syndrome) (Schaefer et al., 1994; Kim and Peterson, 2003). JGG had a negative history for these disorders and the enlarged cavum septum pellucidum/cavum vergae was clinically unsuspected, yet their occurrence raises the possibility that other brain regions were abnormally developed as well. Embryologic development of the septum pellucidum is intimately associated with that of the corpus callosum, and we found that this commissural pathway was fully normal in RTP but abnormal in JGG.
DTI and PET in JGG showed a sparse number of reconstructed streamlines and reduced metabolic activity, respectively, in the rostral body of the corpus callosum. Since this part of the corpus callosum interconnects premotor, motor and supplementary motor regions (Witelson, 1989; Aboitiz and Montiel, 2003; Hofer and Frahm, 2006; Saur et al., 2010), it is possible that reduced inter-hemispheric interactions, coupled with the undeveloped right dorsal stream, explain the limited capacity of JGG to compensate for his repetition deficits. The results obtained in JGG should be interpreted with caution because he had minor developmental anomalies that probably interfered with the development and maturation of some brain regions. Although these malformations were clinically silent and might be interpreted as incidental MRI findings, their impact on the profile and evolution of aphasic deficits remains to be determined.

The study of the neural correlates of dissociated speech production and repetition deficits with multimodal imaging in these cases confronted us with a complex scenario characterized by reorganization of repetition in both cerebral hemispheres. Our findings are preliminary because they were documented only in two patients and because performing single-subject experimental research using neuroimaging (DTI, fMRI, PET) entails some disadvantages in comparison with case series studies and group studies (Kiran et al., 2013). Although further studies are clearly needed, our findings in two well-matched patients with contrasting aphasic syndromes suggest that dissociated repetition deficits in these aphasic syndromes are probably reliant on flexible interactions between spared components of the left dorsal and ventral streams and on gender-dimorphic architecture of the right dorsal stream.

#### **REFERENCES**


premotor cortex during pseudoword repetition. *J. Cogn. Neurosci.* 25, 580–594. doi: 10.1162/jocn\_a\_00342


Lichtheim, L. (1885). On aphasia. *Brain* 7, 433–484. doi: 10.1093/brain/7.4.433


more common in schizophrenia spectrum disorders? A systematic review and meta-analysis. *Schizophr. Res.* 125, 1–12. doi: 10.1016/j.schres.2010.09.016


**Conflict of Interest Statement:** Marcelo L. Berthier declares association with the following companies: Bayer, Eisai, Eli Lilly, GlaxoSmithKline, Janssen, Merz, Novartis, Nutricia, Pfizer, and Lundbeck. Rocío Juárez y Ruiz de Mier declares association with Pfizer. Seán Froudist Walsh, Guadalupe Dávila, Alejandro Nabrozidis, Antonio Gutiérrez, Irene De-Torres, Rafael Ruiz-Cruces, Francisco Alfaro, and Natalia García-Casares declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

#### *Received: 15 July 2013; accepted: 29 November 2013; published online: 19 December 2013.*

*Citation: Berthier ML, Froudist Walsh S, Dávila G, Nabrozidis A, Juárez y Ruiz de Mier R, Gutiérrez A, De-Torres I, Ruiz-Cruces R, Alfaro F and García-Casares N (2013) Dissociated repetition deficits in aphasia can reflect flexible interactions between left dorsal and ventral streams and gender-dimorphic architecture of the right dorsal stream. Front. Hum. Neurosci. 7:873. doi: 10.3389/fnhum.2013.00873 This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Berthier, Froudist Walsh, Dávila, Nabrozidis, Juárez y Ruiz de Mier, Gutiérrez, De-Torres, Ruiz-Cruces, Alfaro and García-Casares. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Sensory-to-motor integration during auditory repetition: a combined fMRI and lesion study

*'Ōiwi Parker Jones1,2†, Susan Prejawa1†, Thomas M. H. Hope1, Marion Oberhuber1, Mohamed L. Seghier1, Alex P. Leff<sup>3</sup>, David W. Green4 and Cathy J. Price1\**

*<sup>1</sup> Wellcome Trust Centre for Neuroimaging, University College London, London, UK*

*<sup>2</sup> Wolfson College, University of Oxford, Oxford, UK*

*<sup>3</sup> Institute of Cognitive Neuroscience, University College London, London, UK*

*<sup>4</sup> Cognitive, Perceptual and Brain Sciences, University College London, London, UK*

#### *Edited by:*

*Matthew A. Lambon Ralph, University of Manchester, UK*

#### *Reviewed by:*

*Zarinah Karim Agnew, University College London, UK*

*Branch Coslett, University of Pennsylvania, USA*

#### *\*Correspondence:*

*Cathy J. Price, Wellcome Trust Centre for Neuroimaging, University College London, London WC1N3BG, UK e-mail: c.j.price@ucl.ac.uk*

*†'Ōiwi Parker Jones and Susan Prejawa have contributed equally to this work.*

The aim of this paper was to investigate the neurological underpinnings of auditory-to-motor translation during auditory repetition of unfamiliar pseudowords. We tested two different hypotheses. First we used functional magnetic resonance imaging in 25 healthy subjects to determine whether a functionally defined area in the left temporo-parietal junction (TPJ), referred to as the Sylvian-parietal-temporal region (Spt), reflected the demands on auditory-to-motor integration during the repetition of pseudowords relative to a semantically mediated nonverbal sound-naming task. The experiment also allowed us to test alternative accounts of Spt function, namely that Spt is involved in subvocal articulation or in auditory processing that can be driven either bottom-up or top-down. The results did not provide convincing evidence that activation increased in either Spt or any other cortical area when non-semantic auditory inputs were being translated into motor outputs. Instead, the results were most consistent with Spt responding to bottom-up or top-down auditory processing, independent of the demands on auditory-to-motor integration. Second, we investigated the lesion sites in eight patients who had selective difficulties repeating heard words but with preserved word comprehension, picture naming and verbal fluency (i.e., conduction aphasia). All eight patients had white-matter tract damage in the vicinity of the arcuate fasciculus, and only one of the eight had additional damage to the Spt region, defined functionally in our fMRI data. Our results are therefore most consistent with the neurological tradition that emphasizes the importance of the arcuate fasciculus in the non-semantic integration of auditory and motor speech processing.

**Keywords: fMRI, lesions, language, speech, aphasia**

#### **INTRODUCTION**

Auditory repetition is a task that requires the immediate reproduction of an auditory stimulus. This involves auditory processing of a heard sound, and then translation of the auditory input into an articulatory output that reproduces the sound of the original auditory input as closely as possible. This paper is concerned with the neurological underpinnings of this auditory-to-motor "translation," "mapping," or "integration" process. At the cognitive processing level, we distinguish between semantically mediated and non-semantically mediated translation. Semantically mediated translation involves the production of speech from semantic representations, for example when naming the source of nonverbal sounds (e.g., "cat" in response to hearing a meow). Non-semantically mediated auditory-to-motor translation proceeds by prior learning of the mapping between auditory inputs and vocal tract gestures. This could be at the level of lexical representations (e.g., familiar words like "champion"), sublexical representations (e.g., sequences of syllables "cham-pi-on" or "chonam-pi"), or non-verbal auditory features (e.g., when the human vocal tract is used to mimic nonverbal sounds that have neither phonological nor semantic associations). Here we are specifically interested in the translation of non-semantic auditory inputs to motor outputs.

With respect to the neural underpinnings of auditory-to-motor integration, the classic neurological model of language identifies Wernicke's area (in the left posterior superior temporal cortex) as the site of "auditory images of speech" and Broca's area (in the left posterior inferior frontal cortex) as the site of "motor images of speech," with the arcuate fasciculus white-matter tract serving to integrate the auditory and motor images. According to this model, selective damage to the arcuate fasciculus that preserves Wernicke's and Broca's areas would impair auditory repetition in the context of intact speech comprehension and intact speech production (Geschwind, 1965). More recently, there have been claims that a cortical area on the left TPJ, known informally as Sylvian-parietal-temporal (Spt), is actively involved in integrating auditory inputs with vocal tract gestures (Hickok et al., 2003; Hickok et al., 2009; Hickok, 2012). According to this perspective, selective deficits in auditory word repetition are the consequence of cortical damage to Spt (Buchsbaum et al., 2011). We examine this possibility in the context of functional magnetic resonance imaging (fMRI) and lesion studies, which allow us to examine auditory-to-motor translation. We start by considering prior functional imaging evidence for the functional role of Spt.

Sylvian-parietal-temporal region is functionally defined as an area at the posterior end of the lateral sulcus (Sylvian fissure), around the anterior end of the TPJ, which responds in general to both auditory perception and silent vocal tract gestures (Hickok et al., 2009; Hickok, 2012). For instance, Spt responds to covert rehearsal in tests of phonological short-term memory (Jacquemot and Scott, 2006; Koelsch et al., 2009). As Spt is involved in humming music and silent lip reading (Pa and Hickok, 2008; Hickok et al., 2009), it is not specific to speech input or output. Instead, the auditory-to-motor integration process has been described as a mechanism by which sensory information can be used to guide vocal tract action (Buchsbaum et al., 2011). Here we make a distinction between an area that acts as an interface between two tasks (i.e., a shared level of processing) and an area that is involved in integrating one level of processing with another. In other words, an interface region may be activated independently by separate tasks (logical OR), given that they share a common processing level, whereas an integration region should only be active when multiple processing levels are present (logical AND), and brought together (i.e., transformed) into an integrated output. If Spt is an integration area, rather than just an interface, then it should be more activated when the task involves the translation of sensory inputs to motor outputs. Previous studies have reported greater Spt activation for covert repetition than listening, and argued that this reflects the greater demands on auditory-to-motor integration during repetition (Isenberg et al., 2012). However, covert repetition may also increase the demands on subvocal articulation and auditory imagery of the spoken response (i.e., an internal representation of how the spoken response, or any other auditory stimulus, would sound). 
If Spt is involved in either of these processes (see below for evidence) then activation that is common to listening and covert repetition may reflect a shared level of processing rather than an active auditory-to-motor integration process. Prior to concluding that Spt actively integrates auditory information with motor output, we therefore need to factor out explanations that are related to subvocal articulation (independent of sensory input) or auditory processing (independent of motor output).
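The interface-versus-integration distinction drawn above can be made concrete with two toy predicates (a minimal illustrative sketch; the function names and boolean framing are ours, not the authors'):

```python
# Hypothetical predicates contrasting the two accounts of Spt.
# An interface region is driven by either process alone (logical OR);
# an integration region requires both processes together (logical AND).

def interface_active(auditory_input: bool, motor_output: bool) -> bool:
    """Interface account: activation given either kind of processing."""
    return auditory_input or motor_output

def integration_active(auditory_input: bool, motor_output: bool) -> bool:
    """Integration account: activation only when both must be combined."""
    return auditory_input and motor_output

# Listening alone: an interface responds, an integrator does not.
assert interface_active(True, False) is True
assert integration_active(True, False) is False

# Overt repetition engages both processes, so both accounts predict
# activation -- which is why conditions isolating auditory input and
# motor output are needed to tell the accounts apart.
assert integration_active(True, True) is True
```

The sketch makes the experimental logic explicit: conditions where only one predicate argument is true are the ones that discriminate the two hypotheses.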

The association of TPJ with auditory processing and auditory imagery arose from early functional neuroimaging studies that observed left TPJ activation when subjects imagined hearing another person's voice in the absence of any auditory stimulation or motor activity (McGuire et al., 1996). Subsequent studies have also shown left-lateralized activation in the TPJ in response to: silently imagining speech (Shergill et al., 2001); imagining the auditory relative to visual associations of a picture of a scene (Wheeler et al., 2000); experiencing tones and visual stimuli (Xue et al., 2006); silence following familiar music, even when there was no instruction to remember the music (Kraemer et al., 2005); passively viewing finger tapping on a piano following keyboard training (Hasegawa et al., 2004); producing rhythmic finger sequences that had been learnt with an auditory cue (Bengtsson et al., 2005); and imagining heard speech, music or environmental sounds in the absence of any acoustic stimulus (Aleman et al., 2005; Bunzeck et al., 2005; Zatorre and Halpern, 2005). Without a functional localizer it is unclear which, if any, of these responses in TPJ was generated in area Spt. Nevertheless, an explanation of Spt responses in terms of auditory imagery would explain the overlap of activation during auditory perception, subvocal articulation (Paus et al., 1996a,b; Wise et al., 2001), and silent auditory short-term memory tasks (Buchsbaum and D'Esposito, 2009; Koelsch et al., 2009; McGettigan et al., 2011) without the need to account for Spt activation in terms of a function that integrates auditory and motor processing.

The association of TPJ activation with subvocal articulation that occurs automatically during speech perception, particularly when speech perception is challenging (Buchsbaum and D'Esposito, 2009; Price, 2010), comes from observations that TPJ activation increased when subjects articulated four versus two syllables during a task that involved delayed repetition and subvocal rehearsal of pseudowords (Papoutsi et al., 2009). This subvocal articulation/articulatory rehearsal account can explain activation in TPJ during auditory working-memory tasks (Buchsbaum and D'Esposito, 2009; Koelsch et al., 2009) but does not explain why TPJ activation has been reported for auditory imagery of sounds that cannot be articulated (see above). It is therefore possible that different parts of TPJ are involved in auditory-to-motor integration, auditory imagery, and subvocal articulation. Our interest is in testing whether there is more evidence that Spt, located in TPJ, is involved in auditory-to-motor integration than in articulation or auditory processing alone.

Using fMRI, we defined the Spt area of interest functionally as being activated by both auditory speech perception and subvocal articulation (Hickok et al., 2003, 2009; Hickok, 2012). We then investigated whether any part of this Spt area was responsive to the demands on (1) non-semantic auditory-motor integration, (2) semantic to motor integration, (3) auditory input, and/or (4) articulation. By manipulating these factors independently, we aimed to determine the most likely level of processing that drives Spt. Our fMRI experiment (Paradigm 1) had 16 conditions in a 2 × 2 × 4 factorial design: auditory input versus visual input; speech production responses versus finger press responses; and four types of stimuli that weighted semantic and phonologically mediated speech production differentially. Moreover, to broaden our interpretation of Spt, we will also discuss the results of a second fMRI experiment (Paradigm 2) reported by Parker Jones et al. (2012). Without this second experiment, we could not rule out the possibility that an increased response in Spt merely reflected the integration of any sensory input and speech output, regardless of whether this integration was semantically mediated or not, as we explain below (see Materials and Methods).

In addition to investigating whether fMRI activation in Spt reflected the demands on auditory-to-motor integration, we also investigated lesion sites that were consistently associated with auditory repetition deficits in the context of intact word comprehension and production (i.e., conduction aphasia). Unlike a recent lesion study that looked for lesions in patients who had deficits in both auditory repetition and picture naming (Buchsbaum et al., 2011), we were more interested in lesions that impaired auditory repetition while preserving the ability to name pictures. According to the neurological model, lesions associated with selective repetition difficulties were expected in the arcuate fasciculus, but according to functional neuroimaging data Spt involvement is also expected (Buchsbaum et al., 2011). We considered whether selective deficits in auditory repetition could occur following lesions to: (1) TPJ/Spt with minimal involvement of the underlying white matter; (2) the temporo-parietal white matter tracts (in the vicinity of the arcuate fasciculus) with minimal involvement of TPJ/Spt cortex; (3) both TPJ/Spt and the underlying white matter; and/or (4) neither TPJ/Spt nor the underlying white matter.

In summary, we used fMRI to test whether non-semantic auditory-to-motor translation during auditory repetition involved Spt or not, and then used lesion analyses to determine whether selective deficits in auditory repetition (i.e., conduction aphasia) were the consequence of lesions to Spt, the arcuate fasciculus, or both.

#### **MATERIALS AND METHODS**

The study was approved by the London Queen Square Research Ethics Committee. All subjects gave written informed consent prior to scanning and received financial compensation for their time.

#### **FUNCTIONAL MAGNETIC RESONANCE IMAGING**

#### *Participants, fMRI Paradigm 1*

In the fMRI study, the participants were 25 healthy, right-handed, native speakers of English, with normal or corrected-to-normal vision (12 females, 13 males, age range = 20–45 years, mean = 31.4 years, SD = 5.9 years). Handedness was assessed with the Edinburgh Handedness Inventory (Oldfield, 1971).

#### *Experimental design, fMRI Paradigm 1*

The conditions of interest were auditory word and pseudoword repetition. However, these were embedded in a larger experimental design with a total of 16 different conditions (see **Figure 1B**) that allowed us to tease apart the activation related to auditory-to-motor translation from nonverbal auditory processing, auditory word perception, semantic processing, covert (subvocal) articulation, and overt articulation (see below for details).

The 16 conditions conformed to a 2 × 2 × 4 factorial design. Factor 1 was "stimulus modality": auditory versus visual. Factor 2 was "task": overt speech production in response to the stimulus versus one-back matching, which involved a finger press response to indicate whether the current stimulus was the same as the previous stimulus. Factor 3 was "stimulus type," with four conditions that manipulated the presence or absence of phonological cues (i.e., words and pseudowords versus nonverbal stimuli) and the presence or absence of semantic content (i.e., words, pictures, and nonverbal sounds of objects and animals versus pseudowords, meaningless scrambled pictures, and baseline stimuli). In the auditory modality, the stimuli were words, pseudowords, nonverbal environmental sounds, and humming in either a male or female voice. In the visual modality, the corresponding stimuli were words, pseudowords, pictures of objects, and pictures of scrambled objects.
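For concreteness, the full condition set can be enumerated by crossing the three factors. This is an illustrative sketch only; the labels are ours, not the authors' condition codes.

```python
from itertools import product

# Hypothetical factor labels reconstructing the 2 x 2 x 4 design described above.
modalities = ["auditory", "visual"]
tasks = ["speech_production", "one_back"]
stimulus_types = ["words", "pseudowords", "objects", "baseline"]

conditions = [
    {"modality": m, "task": t, "stimulus": s}
    for m, t, s in product(modalities, tasks, stimulus_types)
]
# 2 modalities x 2 tasks x 4 stimulus types = 16 conditions
```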

In the speech production conditions, participants were instructed to: (a) repeat the auditory words and pseudowords which involves direct translation of auditory inputs to motor outputs; (b) name the source of the environmental sounds (e.g., "cat" in response to a meow), which involves semantically mediated auditory–motor translation; and (c) name the gender of the humming voice (male versus female), which served as the auditory baseline condition. The corresponding speech production conditions in the visual modality were: reading words and pseudowords (which involve direct visuo-motor translation); naming the objects in pictures (which involves semantically mediated visuo-motor translation); and naming the dominant color in meaningless pictures of nonobjects (the visual baseline condition).

In the eight silent one-back matching conditions (with exactly the same stimuli as the speech production conditions), participants were instructed to press a button on a button box in response to each stimulus to indicate whether the stimulus was the same as or different from the previous one. Half the subjects used their right middle/index fingers for the yes/no responses; the other half used their left index/middle fingers. The ratio of repeated to non-repeated stimuli was 1:8. To keep the stimuli identical across tasks, stimuli were also repeated once every eight trials in the speech production conditions.

#### *Stimulus selection/creation, fMRI Paradigm 1*

Stimulus selection started by generating 128 pictures of easily recognizable animals and objects (e.g., cow, bus, elephant, plate) with names of one to four syllables (mean = 1.59; SD = 0.73). Visual word stimuli were the written names of the 128 objects, with 3–12 letters (mean = 5 letters; SD = 1.8). Auditory word stimuli were the spoken names of the 128 objects (mean duration = 0.64 s; SD = 0.1), recorded by a native speaker of English with a Southern British accent approximating Received Pronunciation. Pseudowords were created using a non-word generator (Duyck et al., 2004) and matched to the real words for bigram frequency, number of orthographic neighbors, and word length. The same male speaker recorded the auditory words and pseudowords.

The nonverbal sounds associated with objects were available and easily recognizable for a quarter (i.e., 32) of the stimuli, and were taken from the NESSTI sound library (http://www.imaging.org.au/Nessti; Hocking et al., 2013). The duration of the nonverbal sounds needed to be significantly longer (mean length = 1.47 s, SD = 0.13) than the duration of the words (*t* = 37.8; *p* < 0.001) because shorter sounds were not recognizable. The auditory baseline stimuli were recorded by male and female voices humming novel pseudowords, thereby removing any phonological or semantic content (mean length = 1.04 s, SD = 0.43). Half of these stimuli were matched to the length of the auditory words, the other half to the length of the nonverbal sounds. The visual baseline stimuli were meaningless object pictures, created by scrambling both global and local features, and then manually edited to accentuate one of eight colors (brown, blue, orange, red, yellow, pink, purple, and green). Consistency of the speech production responses to all stimuli was verified in a pilot study with 19 participants.

#### *Stimulus and task counterbalancing, fMRI Paradigm 1*

The 128 object stimuli were divided into four sets of 32 (A, B, C, and D). Set D was always presented as nonverbal sounds. Sets A, B, and C were rotated across pictures, visual words, and auditory words



**FIGURE 1 | Experimental hypothesis testing and results. (A**; top) describes the results that would support an interpretation of Spt activation in terms of sensory-to-motor integration, auditory imagery, and subvocal articulation. Note that the different accounts have opposing predictions for the same conditions (e.g., greater activation for pseudoword repetition than sound naming versus less activation for pseudoword repetition than sound naming). P1 = Paradigm 1, P2 = Paradigm 2 (see Materials and Methods). **(B**; bottom) lists the 16 different conditions, the statistical contrast used to test the different effects described in the top part of the figure, and the *Z* scores associated with each effect (i.e., the result). Aud = auditory presentation, Vis = visual presentation, O-B = one-back task, Articul. = Articulation, dec. = decision, Sens. = sensory speech input (no speech production), cM. = covert mouth movements/articulation, oM. = overt mouth movements/articulation, nSem. = non-semantic, Sem. = semantic, S/nSem. = semantic and non-semantic, Dur. = auditory stimuli with long vs. short durations, ME. = main effect of auditory input, ns. = not significant.

in different participants. All items were therefore novel on first presentation of each stimulus type (for task 1) and the same items were repeated for task 2. Half of the subjects performed all eight speech production tasks first (task 1) followed by all eight one-back tasks (task 2). The other half performed all eight one-back tasks first (task 1) followed by all eight speech production tasks (task 2). Within each task, half of the subjects were presented auditory stimuli first, followed by visual stimuli; the other half were presented visual stimuli first, followed by auditory stimuli. The order of the four stimulus types was fully counterbalanced across subjects, and full counterbalancing was achieved with 24 participants.

Each set of 32 items was split into four blocks of eight stimuli, with one of the eight stimuli repeated in each block to make a total of nine stimuli per block (eight novel, one repeat). The stimulus repeat only needed to be detected and responded to (with a finger press) in the one-back tasks.
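The block structure described above can be sketched as follows. This is a toy reconstruction under stated assumptions: the item names, the random choice of which item repeats, and the placement of the repeat immediately after its first occurrence are ours.

```python
import random

def build_blocks(items, n_blocks=4, block_size=8, seed=0):
    """Split 32 items into 4 blocks of 8 and add one within-block repeat
    to each, giving 9 stimuli per block (8 novel + 1 repeat).
    A sketch of the scheme described in the text, not the authors' code."""
    assert len(items) == n_blocks * block_size
    rng = random.Random(seed)
    blocks = []
    for b in range(n_blocks):
        block = items[b * block_size:(b + 1) * block_size]
        repeat = rng.choice(block)
        # Place the repeat immediately after its first occurrence so that
        # it can serve as a one-back target.
        i = block.index(repeat)
        block = block[:i + 1] + [repeat] + block[i + 1:]
        blocks.append(block)
    return blocks

blocks = build_blocks([f"item{i:02d}" for i in range(32)])
```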

#### *Data acquisition, fMRI Paradigm 1*

Functional and anatomical data were collected on a 3T scanner (Trio, Siemens, Erlangen, Germany) using a 12-channel head coil. Functional images were acquired with a gradient-echo EPI sequence with 3 mm × 3 mm in-plane resolution (TR/TE/flip angle = 3080 ms/30 ms/90◦, FOV = 192 mm, matrix size = 64 × 64, 44 slices, slice thickness = 2 mm, interslice gap = 1 mm, 62 image volumes per time series, including five "dummies" to allow for T1 equilibration effects). The TR was chosen to maximize whole-brain coverage (44 slices) and to ensure that slice acquisition and stimulus onsets were not synchronized, which allowed for distributed sampling of slice acquisition across the study (Veltman et al., 2002).

For anatomical reference, a T1 weighted structural image was acquired after completing the tasks using a three-dimensional modified driven equilibrium Fourier transform (MDEFT) sequence (TR/TE/TI = 7.92/2.48/910 ms, flip angle = 16◦, 176 slices, voxel size = 1 mm × 1 mm × 1 mm). The total scanning time was approximately 1 h and 20 min per subject, including set-up and the acquisition of an anatomical scan.

#### *Procedure, fMRI Paradigm 1*

Prior to scanning, each participant was trained on all tasks using a separate set of training stimuli, except for the environmental sounds which remained the same throughout both training and experiment. All speaking tasks required the subject to respond verbally by saying a single object name, color name, or pseudoword after each stimulus presentation, whereas the one-back matching task required a button press (and no speech) after each stimulus presentation to indicate whether the stimulus was identical to the one immediately preceding it (yes with one finger/no with another finger). All participants were instructed to keep their body and head as still as possible, to keep their eyes open throughout the experiment, and to attend to a fixation cross on screen while listening to the auditory stimuli. Each of the 16 tasks was presented in a separate scan run, all of which were identical in structure.

Scanning started with the instructions "Get Ready" written on the in-scanner screen while five dummy scans were collected. This was followed by four blocks of stimuli (nine stimuli per block, 2.52 s inter-stimulus-interval, 16 s fixation between blocks, total

run length = 3.2 min). Every stimulus block was preceded by a written instruction slide (e.g., "Repeat"), lasting 3.08 s each, which indicated the start of a new block and reminded subjects of the task. Visual stimuli were each displayed for 1.5 s. The pictures subtended a visual angle of 7.4◦ (10 cm on screen, 78 cm viewing distance), with an image size of 350 × 350 pixels at a screen resolution of 1024 × 768. The visual angle for the written words ranged from 1.47◦ to 4.41◦, with the majority of words (with five letters) extending 1.84◦–2.2◦. The length of the sound files varied across stimuli and tasks, ranging from 0.64 to 1.69 s (see stimulus creation above). Auditory stimuli were presented via MRI-compatible headphones (MR Confon, Magdeburg, Germany), which filtered ambient in-scanner noise. Volume levels were adjusted for each subject before scanning. Each subject's spoken responses were recorded via a noise-cancelling MRI microphone (FOMRI IIITM Optoacoustics, Or-Yehuda, Israel), and transcribed manually for off-line analysis. We used eye-tracking to ensure participants paid constant attention throughout the experiment.
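The visual angles quoted above follow from the standard formula for the angle subtended by a stimulus at a given viewing distance; as a quick check, the picture size and viewing distance given in the text closely reproduce the reported 7.4◦.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by a stimulus of the given
    physical size at the given viewing distance (standard formula,
    shown for illustration)."""
    return 2 * math.degrees(math.atan(size_cm / (2 * distance_cm)))

angle = visual_angle_deg(10.0, 78.0)  # pictures: 10 cm on screen at 78 cm
```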

#### *Data Pre-processing, fMRI Paradigm 1*

We performed fMRI data preprocessing and statistical analysis in SPM12 (Wellcome Trust Centre for Neuroimaging, London, UK), running on MATLAB 2012a (MathWorks, Sherborn, MA, USA). Functional volumes were (a) spatially realigned to the first EPI volume and (b) un-warped to compensate for non-linear distortions caused by head movement or magnetic field inhomogeneity. The anatomical T1 image was (c) co-registered to the mean EPI image which had been generated during the realignment step and then spatially normalized to the Montreal Neurological Institute (MNI) space using the new unified normalization-segmentation tool of SPM12. To spatially normalize all EPI scans to MNI space, (d) we applied the deformation field parameters that were obtained during the normalization of the anatomical T1 image. The original resolution of the different images was maintained during normalization (voxel size 1 mm × 1 mm × 1 mm for structural T1 and 3 mm × 3 mm × 3 mm for EPI images). After the normalization procedure, (e) functional images were spatially smoothed with a 6 mm full-width-half-maximum isotropic Gaussian kernel to compensate for residual anatomical variability and to permit application of Gaussian random-field theory for statistical inference (Friston et al., 1995).

In the first-level statistical analyses, each pre-processed functional volume was entered into a subject specific, fixed-effect analysis using the general linear model (Friston et al., 1995). All stimulus onset times were modeled as single events, with two regressors per run, one modeling instructions and the other modeling all stimuli of interest (including both the repeated and unrepeated items). Stimulus functions were then convolved with a canonical hemodynamic response function. To exclude low-frequency confounds, the data were high-pass filtered using a set of discrete cosine basis functions with a cut-off period of 128 s. The contrasts of interest were generated for each of the 16 conditions of interest (relative to fixation).
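The discrete cosine basis used for high-pass filtering can be sketched as follows. This is a reconstruction of the standard SPM-style construction (cf. `spm_dctmtx`), not the authors' code; the scan count (62 volumes minus five dummies) and TR are taken from the acquisition details above.

```python
import numpy as np

def dct_highpass_basis(n_scans, tr, cutoff=128.0):
    """Discrete-cosine basis set for high-pass filtering: low-frequency
    regressors with periods >= `cutoff` seconds, excluding the constant
    term. A sketch following the construction used by SPM."""
    n = np.arange(n_scans)
    # Highest retained order: component k has period 2 * n_scans * tr / k,
    # so keep orders with period >= cutoff.
    order = int(np.floor(2.0 * n_scans * tr / cutoff + 1))
    basis = [np.sqrt(2.0 / n_scans)
             * np.cos(np.pi * (2 * n + 1) * k / (2.0 * n_scans))
             for k in range(1, order)]
    return np.column_stack(basis) if basis else np.empty((n_scans, 0))

X = dct_highpass_basis(n_scans=57, tr=3.08)  # 62 volumes minus 5 dummies
```

With these parameters only the two slowest cosine components have periods longer than 128 s, so the filter removes just those drift terms.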

#### *Effects of interest, fMRI Paradigm 1*

At the second level, the 16 contrasts for each subject were entered into a within-subject, one-way ANOVA in SPM12. From this analysis, we identified activation that increased in conditions that we hypothesized to tap the processing type of interest. A summary of the condition comparisons used to test our main hypotheses is provided in **Figure 1**. As with all imaging studies, the task analysis (i.e., the functional sub-processing involved in each task) involves a certain degree of *a priori* assumptions. Below, we try to make these assumptions and their bases explicit as well as testing their validity within the available data.

The effect of most interest was the location of activation associated with the non-semantic translation of auditory inputs to motor outputs. This was defined, *a priori*, as the area(s) where activation increased for repeating auditory pseudowords (that links auditory inputs to articulatory outputs) compared to naming nonverbal sounds (that accesses articulatory outputs from semantics). To control for auditory speech processing that is not integrated with a motor response, we also computed the interaction between stimulus (pseudowords > nonverbal sounds) and task (speech production that links the stimuli to articulation versus one-back matching that links the stimuli to a finger press response).
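The interaction described above corresponds to a simple set of contrast weights over the four conditions involved, with zeros elsewhere. A hypothetical sketch (condition labels are ours, not the authors' design-matrix names):

```python
# Weights for the stimulus (pseudowords > nonverbal sounds) by task
# (speech production > one-back) interaction; all other conditions
# in the 16-condition design would receive weight 0.
contrast = {
    ("pseudowords", "speech"):   +1,
    ("sounds",      "speech"):   -1,
    ("pseudowords", "one_back"): -1,
    ("sounds",      "one_back"): +1,
}
```

A valid interaction contrast sums to zero, so it tests the difference of differences rather than any overall activation level.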

#### **DEFINING OUR REGION OF INTEREST IN Spt/TPJ**

In addition to conducting a whole-brain search for areas that were more activated for pseudoword repetition than nonverbal sound naming, we also conducted a region of interest analysis, with a small volume FWE correction for multiple comparisons, focusing on the Spt area associated with sensory-motor integration by Hickok and Poeppel (2007), Hickok et al. (2009), and Hickok (2012), who define Spt functionally as an area at the posterior end of the lateral sulcus (Sylvian fissure), around the anterior end of the TPJ, which responds to both auditory perception and silent vocal tract gestures (i.e., subvocal articulation during speech tasks). We used the same functional definition, locating Spt in TPJ where activation increased during (a) auditory word perception, (b) covert (subvocal) articulation, and (c) overt speech production, with the assumption that areas associated with covert speech production should also be activated during overt speech production.

Areas associated with auditory word perception, when motor output was controlled, were identified by comparing activation for (a) one-back matching on auditory words and (b) one-back matching on colors. Areas associated with subvocal articulation were identified by comparing activation for (a) one-back matching on visual pseudowords and (b) one-back matching on colors. Areas associated with overt speech production were identified by comparing all eight speech production conditions to all eight one-back matching conditions. See **Figure 1B** for a summary.

Our reasons for using visual pseudoword matching to identify areas involved in subvocal articulation were fourfold. First, on the basis of cognitive processing models of reading (e.g., Seidenberg and McClelland, 1989; Coltheart et al., 1993), we hypothesized that accurate one-back matching on visually presented pseudowords could either be based on orthographic similarity or phonological similarity. Second, we hypothesized that phonological processing of orthographic inputs involves subvocal articulatory activity related to how the sounds associated with the inputs would be produced by the motor system. This hypothesis was based on prior work showing that articulatory areas are activated in response to visual pseudowords even when participants are performing an

incidental visual matching task (see Price et al., 1996). Third, evidence for articulatory processing during one-back matching of visual pseudowords in the current paradigm comes from the observation that a left premotor area (at MNI co-ordinates *x* = −51, *y* = −3, *z* = +33) is activated for the one-back task on pseudowords > words (*Z* score = 3.65), and, in turn, this region is activated during overt articulation (i.e., a main effect of speech > one-back tasks; *Z* score = 6.7). Thus, one-back matching on visually presented pseudowords covertly increased activation in areas that are indisputably associated with overt articulation, even though no overt articulation was involved. Fourth, by ensuring that our Spt area also responded to overt speech production, irrespective of stimulus type, we hypothesized that overlapping activation during silent one-back matching on visually presented pseudowords was more likely to be related to subvocal articulation than orthographic processing.

Consistent with the above hypotheses, we found activation (significant at *p* < 0.001 uncorrected) in TPJ for (i) one-back matching of auditory words relative to colors, (ii) one-back matching on visual pseudowords relative to colors, and (iii) all eight overt speech production conditions relative to all eight one-back matching conditions. The peak of this effect, at MNI co-ordinates [−51, −39, +21], corresponds closely to the location of the Spt area reported by Hickok et al. (2009), where the mean effect across multiple single-subject analyses was located at Talairach co-ordinates [−50, −40, +19], which is [−51, −42, +18] in MNI space. As in our study, the Spt activation reported in Hickok et al. (2009) cannot be related to orthographic processing because it was identified using auditory stimuli only. Specifically, Hickok et al. (2009) identified activation related to covert articulation by comparing (a) a condition where participants hear speech and then covertly rehearse it to (b) a baseline condition where participants hear speech without instructions to covertly rehearse it.

In short, our definition of Spt was consistent with prior studies. Therefore, our Spt-ROI for Paradigm 1 was defined as the 33 contiguous voxels [around MNI co-ordinates (−51, −39, +21)] that were significant at *p* < 0.001 for (a) one-back matching on auditory words > colors, (b) one-back matching on visually presented pseudowords > colors, and (c) all overt speech production conditions relative to all one-back matching conditions.
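The three-way overlap used to define the ROI amounts to a voxel-wise conjunction of thresholded statistical maps. A minimal array-based sketch with toy data (Z > 3.09 approximates *p* < 0.001 uncorrected; the maps here are random and purely illustrative):

```python
import numpy as np

def conjunction_roi(z_maps, z_thresh=3.09):
    """Return the boolean mask of voxels exceeding the threshold in
    every contrast map: a sketch of the three-way conjunction used to
    define an ROI (maps and threshold are illustrative)."""
    masks = [z > z_thresh for z in z_maps]
    return np.logical_and.reduce(masks)

# Toy example: three random 4 x 4 "Z maps" shifted so some voxels survive.
rng = np.random.default_rng(0)
maps = [rng.normal(size=(4, 4)) + 3.0 for _ in range(3)]
roi = conjunction_roi(maps)
```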

#### **EXPLORING THE RESPONSE IN OUR FUNCTIONALLY DEFINED Spt AREA**

After defining our Spt region of interest, and testing whether it was involved in non-semantic auditory to motor translation (i.e., for auditory repetition of pseudowords more than nonverbal sound naming), we also tested whether our Spt area was sensitive to auditory processing, when articulatory processing was controlled. We dissociated auditory processing and articulatory processing by comparing activation for overtly articulating animal and object names during (a) the nonverbal sound naming conditions (say "cat" when hearing a meow) and (b) the auditory word repetition conditions (say "cat" when hearing "cat"). Activation in auditory processing areas was expected to be higher for hearing nonverbal sounds than auditory words because the duration of all the nonverbal sound stimuli (mean = 1.47 s, SD = 0.13) was significantly longer (*t* = 37.8; *p* < 0.001) than the duration of all the word stimuli (mean = 0.64 s; SD = 0.1). We also expected that, if our Spt area was sensitive to changes in early auditory processing, then its response across conditions should mirror that seen in the early auditory cortex (e.g., Heschl's gyrus) and be greater during the auditory conditions than the corresponding visual conditions.

#### *Additional functional data, fMRI Paradigm 2*

In the fMRI design described above (Paradigm 1), all our speech production conditions involved the translation of sensory inputs to motor outputs in so far as the speech production output depended on the content of the sensory input. Therefore, as noted in the Introduction, we cannot fully exclude the possibility that an increased Spt response for speech production relative to one-back matching reflected the translation of any type of sensory input to speech outputs, irrespective of whether the sensory-to-motor translation was semantically or non-semantically mediated. We therefore report one further result from Parker Jones et al. (2012). The results we report were based on 36 native (monolingual) speakers of English. Full details of this second experimental paradigm can be found in Parker Jones et al. (2011). In brief, Paradigm 2 included eight different conditions that involved either speech production, semantic matching, or perceptual matching (PM) on four types of stimuli (pictures of familiar objects, written names of the same familiar objects, pictures of meaningless nonobjects, and meaningless strings of Greek letters); see **Figure 1B** for a list of the eight conditions.

The result of interest in Paradigm 2 concerned the level of Spt activation for two conditions that require speech production in response to sensory input (overt picture naming and reading) relative to two conditions that do not involve sensory-to-motor translation (saying "1-2-3" repeatedly to meaningless visual cues). In other words, if Spt is involved in semantically and non-semantically mediated sensory-to-motor integration, then activation in Spt should be higher for naming and reading than repeatedly saying "1-2-3," irrespective of the visual input.

For this paradigm, we functionally defined Spt where there was an overlap of activation, in the TPJ territory, for (a) silent semantic decisions on written words relative to fixation (*p* < 0.001 uncorrected) and (b) reading aloud relative to semantic decisions on the same words (*p* < 0.001 uncorrected). The former contrast tapped word comprehension, the latter contrast involved overt speech production. The peak MNI co-ordinates for the overlapping activation were identified in TPJ at [−54, −38, +22] with a second peak at [−56, −42, +18]. Both peaks overlap with the P1-Spt-ROI. All surrounding contiguous voxels that were significant at *p* < 0.001 for both (a) and (b) were saved as the P2-Spt-ROI.

#### **LESION STUDY**

#### *Patient selection*

Eight patients with selective deficits in auditory repetition were selected from the PLORAS database (Price et al., 2010), which includes lesion images and behavioral data from the Comprehensive Aphasia Test (CAT; Swinburn et al., 2004) for a continuously increasing population of stroke patients. The heterogeneity of patients in the database allows us to carefully select subsamples that are closely matched for all but one factor of interest. Patients are only excluded from this database if they have other neurological or psychiatric conditions, are unable to tolerate

2 h of speech and language assessments, or have implants or other contraindications to MRI scanning.

A neurologist (co-author Alex P. Leff) recorded whether the stroke resulted in left hemisphere, right hemisphere, or bilateral damage, and provided a comprehensive description of the lesion location. In addition, the lesion in each MRI scan was identified automatically as detailed below.

For the current study, we selected patients who were assessed 1–10 years after a left hemisphere stroke (ischemic or haemorrhagic) in adulthood (age range = 18–87 years), were right-handed prior to their stroke, had English as their first language, had complete behavioral data on the CAT, and had focal lesions that were 50 cm3 or less. They were assessed on auditory repetition of words and non-words (pseudowords), picture naming, verbal fluency, auditory and written word comprehension, and semantic picture matching, as described below. The inclusion criteria were scores in the aphasic range for word or pseudoword repetition and scores in the non-aphasic range for all other tasks: picture naming, verbal fluency, auditory and written word comprehension, and semantic picture matching.

*Auditory word repetition.* This required an immediate response to each heard word, presented one at a time. There were 16 words with 1–3 syllables. Correct responses were given a score of 2 if promptly produced, and 1 if production was accurate but delayed (>5 s) or if a self-correction or a repetition of the stimulus was required. There were no points for absent or incorrect responses, including "phonemic" (i.e., segmental), neologistic, and dyspraxic errors. Dysarthric errors were not penalized. We selected patients whose *t*-value was 52 or less (see **Table 1**), thereby excluding patients who had normal or mildly aphasic auditory word repetition.

*Auditory non-word repetition.* Auditory repetition of five heard non-words (syllable range 1–2). Scoring was as for word repetition. Unlike word repetition, repetition of non-words cannot be facilitated by word recognition or semantic processing; it is entirely reliant on phonological processing. The memory load may therefore be higher than that required for auditory word repetition. We selected patients whose *t*-value was 52 or less (see **Table 1**), thereby excluding patients who had normal or mildly aphasic auditory non-word repetition.

*Picture naming.* Patients were asked to generate the names of objects or animals in response to 24 black-and-white line drawings presented one at a time. Correct items were given a score of 2 if accurately and promptly named, and 1 if accurate but delayed (>5 s) or produced after a self-correction. Incorrect responses, or responses only obtained after a semantic and/or phonological cue, were given a score of zero. We excluded patients who had either mildly or severely aphasic responses.

*Verbal fluency.* This score is a sum of two component tests: category fluency ("Name as many animals as you can") and phonological fluency ("Name words beginning with the letter 's' "). Each subject was allowed 60 s for each test. Subjects were allowed to make articulatory errors but repeated items (perseverations) were not counted. There was no auditory perceptual component to this task (other than self-monitoring). It was designed primarily to test


#### **Table 1 | Patients with conduction aphasia.**

*The results of the Comprehensive Aphasia Test (CAT) used to select the eight conduction aphasics are presented along with their age, years since stroke, lesion volume, and gender (see Materials and Methods). For each of the CAT assessments, t-values provide a standardized metric of abnormality (the position that a patient would have relative to a population of aphasics) rather than performance per se. These t-values therefore account for the fact that different assessments are not all equally difficult (Swinburn et al., 2004; p. 103). Abnormally low scores on the auditory repetition tasks are highlighted in dark gray. Scores that are on the border of normal/abnormally low are highlighted in light gray. Patient numbers (e.g., PS401) correspond to those from the PLORAS database (Price et al., 2010). W = words, NW = nonwords = pseudowords, Aud = auditory, Vis = visual, Comp = comprehension, F = female, M = male.*

word retrieval and is commonly used as a test of central executive processing (Baddeley, 1996). In this paper, we report a composite measure of semantic and phonological fluency and excluded patients who had either mildly or severely aphasic scores.

*Single-word auditory comprehension.* Subjects were presented with four black-and-white line drawings and heard a spoken word. Subjects had to point to the corresponding target drawing. Alongside the target drawing there were three distractors: one was phonologically related to the target, one was semantically related, and one was unrelated. Subjects could request that the word be repeated without penalty. Subjects scored one point if they pointed to the correct target. There were 15 presentations in total. We excluded patients who had either mildly or severely aphasic responses.

*Single-word visual comprehension.* This subtest is constructed along the same lines as the single-word auditory comprehension test above, except that the phonological distractors are both phonologically and *visually* similar to the target when the words are written down (e.g., target: "pin"; distractors: "bin," "needle," "basket"). The rated semantic similarity of target and semantic distractor is equal in the two subtests, allowing a direct comparison of the relative degree of impairment in auditory and visual word comprehension. Different words were used in the auditory and visual versions of the task. We excluded patients who had either mildly or severely aphasic responses.

*Semantic memory.* The task involved visual presentation of an image in the center of a page surrounded by four other images. All images were black and white line drawings. Patients were instructed to point to the drawing that "goes best with," i.e., is most closely semantically related to the target object (e.g., hand). One of the four drawings was a good semantic match to the target (e.g., mitten), one was a close semantic distractor (e.g., sock), one more distantly related (e.g., jersey), and one was unrelated (e.g., lighthouse). One mark was awarded for each correct response. Successful performance on this task indicated that the patient had recognized the picture and accessed detailed semantic associations. We excluded patients who had either mildly or severely aphasic responses.

Images acquired from our Siemens 1.5 T Sonata (*n* = 5) had an image matrix of 256 × 224, with repetition time/echo time = 12.24/3.56 ms. Images acquired from our Siemens 3T Trio scanners (*n* = 2) had an image matrix of 256 × 256, with repetition time/echo time = 7.92/2.48 ms. Images acquired from our Siemens 3T Allegra (*n* = 1) had an image matrix of 256 × 240, with repetition time/echo time/inversion time = 7.92/2.4/530 ms.

The lesions were identified from the anatomical MRI images using a fully automated procedure described in Seghier and Price (2013). In brief, scans were pre-processed in SPM5/8 (Wellcome Trust Centre for Neuroimaging, London, UK), with spatial normalization into standard MNI space using a modified implementation of the unified segmentation algorithm that was optimized for use in patients with focal brain lesions. After segmentation and normalization, gray and white matter tissue images were smoothed and subsequently compared to control data from 64 healthy subjects. This identified abnormal voxels using an outlier detection algorithm that generates a binary image of the lesion site in standard MNI space (Seghier et al., 2008). Abnormal voxels in gray and white matter were finally grouped and delineated as lesions, creating a three-dimensional image of individual patients' lesions in MNI space. Individual lesions were then overlaid to create 3D lesion overlap maps, showing where patients shared damage at each voxel of the brain.
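The final overlay step can be sketched as a voxel-wise sum of binary lesion images in a common space. This is a toy illustration of the idea, not the Seghier and Price (2013) pipeline; array shapes and lesion locations are invented.

```python
import numpy as np

def lesion_overlap(lesion_masks):
    """Voxel-wise count of patients with damage, from binary lesion
    images already normalized to a common (e.g., MNI) space.
    A minimal sketch of the overlap maps described in the text."""
    stacked = np.stack([m.astype(int) for m in lesion_masks])
    return stacked.sum(axis=0)

# Toy example: three 2 x 2 x 2 binary lesion images.
masks = [np.zeros((2, 2, 2), dtype=bool) for _ in range(3)]
for m in masks:
    m[0, 0, 0] = True          # a voxel damaged in all three patients
masks[0][1, 1, 1] = True       # a voxel damaged in only one patient
overlap = lesion_overlap(masks)
```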

#### **RESULTS**

#### **IN-SCANNER BEHAVIOR**

Details of the in-scanner behavior are provided in **Figure 2**. Statistical analyses involved 2 × 4 ANOVAs in SPSS manipulating stimulus modality (visual versus auditory) with stimulus

**FIGURE 2 | In-scanner performance.** Accuracy (ACC) and response times (RT) for one-back (O-B) and speech production (SP) tasks are plotted in the top part of the figure for both visual (VIS) and auditory (AUD) modalities; error bars represent standard errors. Full details are provided in the bottom part of the figure. WPSH = words, pseudowords, sounds, and humming. WPPC = words, pseudowords, pictures, and colors. SD = standard deviation, Min = minimum, Max = maximum, n-a = not available. For technical reasons, data for three participants were excluded from all O-B tasks.

type (word, pseudoword, sound/picture, and gender/color). All ANOVAs were corrected for potential violations of sphericity by adjusting their degrees of freedom with the Greenhouse–Geisser correction (Greenhouse and Geisser, 1959). These corrections yield more conservative statistical tests (i.e., decreasing the risk of false positives while increasing the risk of false negatives) and account for the non-integer degrees of freedom reported below. Data from all 25 subjects were included for the speech production tasks (measuring accuracy in both visual and auditory modalities), while data from only 22 subjects were included for the one-back tasks [measuring accuracy and response times (RT) in both visual and auditory modalities]. Data from three subjects were lost in the one-back tasks for technical reasons.
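The Greenhouse–Geisser adjustment can be illustrated by scaling the uncorrected degrees of freedom by the sphericity estimate ε. In the sketch below, ε is back-computed from the reported numerator df (an assumption on our part, since ε itself is not reported); the helper then reproduces the degrees of freedom of the first F-test in the next paragraph.

```python
def gg_adjusted_df(epsilon, k, n):
    """Greenhouse-Geisser adjusted degrees of freedom for a repeated-
    measures factor with k levels and n subjects.

    The uncorrected dfs, (k - 1) and (k - 1) * (n - 1), are both
    multiplied by the sphericity estimate epsilon (<= 1), giving the
    smaller, more conservative non-integer dfs reported in the text.
    """
    return epsilon * (k - 1), epsilon * (k - 1) * (n - 1)

# Reproduce the dfs reported for the speech-production stimulus-type
# effect, F(1.38, 33.11), with k = 4 conditions and n = 25 subjects
# (epsilon = 1.38 / 3 = 0.46, inferred from the reported numerator df):
df1, df2 = gg_adjusted_df(0.46, k=4, n=25)
print(round(df1, 2), round(df2, 2))  # 1.38 33.12
```

The tiny mismatch with the reported error df (33.12 vs. 33.11) reflects rounding of ε.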

For speech production accuracy, we found a main effect across the four stimulus type conditions [*F*(1.38,33.11) = 29.14; *p* < 0.001, Greenhouse–Geisser] and a stimulus modality by condition interaction [*F*(1.52,36.41) = 3.82; *p* = 0.042, Greenhouse–Geisser] but no overall effect of stimulus modality [*F*(1.00,24.00) = 0.04; *p* = 0.84, Greenhouse–Geisser]. In the visual domain, accuracy was higher for words and colors than for pictures and pseudowords. In the auditory domain, accuracy was higher for words and gender than for sounds or pseudowords. Response time data were not available in the speech production task.

For accuracy in the one-back task (with partially missing data for three subjects), we found a main effect across the four stimulus type conditions [*F*(2.25,47.32) = 29.94; *p* < 0.001, Greenhouse–Geisser], a main effect of stimulus modality [*F*(1.00,21.00) = 4.89; *p* = 0.038, Greenhouse–Geisser] and a stimulus modality by condition interaction [*F*(2.08,43.65) = 6.54; *p* = 0.003, Greenhouse–Geisser]. In the visual domain, accuracy was higher for pictures, pseudowords, and words relative to colors. Likewise, in the auditory domain, accuracy was higher for words, pseudowords, and sounds than for gender. The lower accuracy for color and gender arose because some participants attempted to match these stimuli on their visual or auditory forms, rather than their color or pitch.

For RT in the one-back task, we found a main effect across the four stimulus type conditions [*F*(1.62,34.07) = 21.17; *p* < 0.001, Greenhouse–Geisser], a main effect of stimulus modality [*F*(1.00,21.00) = 150.51; *p* < 0.001, Greenhouse–Geisser], and a stimulus modality by condition interaction [*F*(1.81,38.00) = 6.68; *p* = 0.004, Greenhouse–Geisser]. For all conditions, participants were slower in the auditory modality than the visual modality. Within both stimulus modalities, RT mirrored accuracy on the one-back task, with faster response times and higher accuracy for words and pseudowords compared to the baseline conditions (gender and color).

#### **fMRI RESULTS**

#### *Non-semantic auditory-to-motor translation, fMRI Paradigm 1*

No brain areas, including Spt, were more activated by auditory repetition of pseudowords compared to sound naming. At the individual subject level, only one subject showed higher activation for pseudoword repetition than sound naming, but this did not approach significance (MNI *x* = −51, *y* = −45, *z* = +15; *Z* score = 2.1; *p* > 0.05 following small volume correction for multiple comparisons). This null result leaves us with two questions: (1) is auditory-to-motor translation a function of the white matter connections (see lesion analysis below)? and (2) what is the function of Spt in the TPJ?

#### *Auditory activation in area Spt, fMRI Paradigm 1*

There were highly significant increases in Spt activation when auditory input increased (see **Figure 1B**). Specifically, (1) Spt activation was higher (*Z* score = 6.6) for hearing and responding to nonverbal sounds of objects and animals than for their heard names, which had less than half the auditory duration of the sounds (mean 1.47 vs. 0.64 s, *t* = 37.8, *p* < 0.001); and (2) Spt activation was higher (*Z* score = 6.7) for the direct comparison of all auditory stimuli to all visual stimuli. A third relevant observation, illustrated in **Figure 3**, is that the pattern of activation in Spt over the eight auditory conditions mirrored that seen in Heschl's gyrus and the primary auditory cortex [compare the plots at (−51, −39, +21) and (−42, −27, +12)].

#### *Other types of sensory to motor activation in area Spt, fMRI Paradigm 2*

Activation in the P2-Spt-ROI was greatest for reading aloud but did not differ between object naming (semantically mediated sensory-to-motor translation) and repeatedly saying "1-2-3" (no sensory-to-motor translation); see lower right-hand corner in **Figure 3**. Therefore, we found no evidence that Spt was involved in either semantically or non-semantically mediated sensory-to-motor translation.

#### **LESIONS RESULTING IN SELECTIVE AUDITORY REPETITION DEFICITS**

At the time of analysis (May, 2013), eight patients in the PLORAS database met our inclusion criteria (see **Table 1** for details). The lesion overlap map (**Figure 4**) shows that six of the eight patients had damage to the temporo-parietal component of the superior longitudinal fasciculus, corresponding to the location of the arcuate fasciculus. The lesion extended ventrally, undercutting the left posterior superior temporal area (*z* = +8 in MNI space) associated with phonological processing during both speech perception and production (Wise et al., 2001). This is illustrated in **Figure 4** by sagittal, coronal, and axial MRI images, positioned at MNI co-ordinates [−40, −40, +10], which are medial to the pSTS area reported at [−63, −37, +6] by Wise et al. (2001). Cortical damage in the temporal lobe (at *z* = +8) was observed in 5/6 patients but only 1/6 patients had damage to Spt (at *z* = +20). There were no instances of Spt damage in the context of preserved temporo-parietal white matter tracts. However, three patients had damage to the white matter but not to the more lateral cortical regions. Therefore, our results show that temporo-parietal white matter damage, in the vicinity of the

arcuate fasciculus, was sufficient to cause selective auditory repetition difficulties but we do not know if selective damage to Spt would also cause auditory repetition difficulties.

The remaining 2/8 patients (including the patient with selective difficulty repeating non-words) had damage to a more anterior component of the superior longitudinal fasciculus at the level of the motor cortex (*y* = −10 in MNI space).

#### **DISCUSSION**

The aim of this paper was to investigate the neurological underpinnings of non-semantically mediated sensory-to-motor translation during auditory repetition. On the basis of prior literature, we tested two hypotheses. The first was that a functionally defined area (Spt) in the TPJ would respond proportionally to the demands on non-semantically mediated auditory input-to-vocal tract output. This was based on prior fMRI data (Pa and Hickok, 2008; Hickok et al., 2009), and tested with a new fMRI experiment that aimed to systematically tease apart activation related to auditory processing and articulation from activation related to semantically and non-semantically mediated sensory-to-motor integration. The second hypothesis was that selective deficits in translating auditory inputs to motor outputs during auditory repetition, when auditory comprehension and speech production were preserved (i.e., conduction aphasia), would be the consequence of damage to the arcuate fasciculus. This was based on the classic neurological model of language, where the arcuate fasciculus functions to connect auditory images of speech in Wernicke's area to motor images of speech in Broca's area (Geschwind, 1965). As discussed below, we found evidence for the second but not for the first hypothesis.

Evidence in support of the arcuate fasciculus being essential for auditory-to-motor integration during auditory repetition was provided by a lesion analysis which considered whether selective deficits in auditory repetition in patients who had preserved auditory comprehension, picture naming, and verbal fluency were the consequence of lesions to: (1) TPJ/Spt with minimal involvement of the underlying white matter; (2) the temporo-parietal white matter tracts (in the vicinity of the arcuate fasciculus) with minimal involvement of TPJ/Spt cortex; (3) both TPJ/Spt and the underlying white matter; or (4) neither TPJ/Spt nor the underlying white matter. The results from eight different patients provided consistent evidence (8/8) that selective difficulties with auditory repetition were the consequence of damage to white matter in the vicinity of the arcuate fasciculus. In 6/8 patients this was observed posteriorly in the temporal lobe, undercutting the left posterior superior temporal area associated with phonological processing during speech production. In the other two patients, the white matter damage was more anterior.
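The four-way classification above can be expressed as a small decision function over two booleans. The patient coding below is a hypothetical illustration consistent with the counts reported in this section (all eight patients had white matter damage; only one also had Spt damage), not the actual patient records.

```python
from collections import Counter

def lesion_category(spt_damage: bool, wm_damage: bool) -> int:
    """Map a patient onto the four lesion patterns considered in the text:
    1 = TPJ/Spt only; 2 = temporo-parietal white matter only;
    3 = both; 4 = neither."""
    if spt_damage and not wm_damage:
        return 1
    if wm_damage and not spt_damage:
        return 2
    if spt_damage and wm_damage:
        return 3
    return 4

# Hypothetical (spt_damage, wm_damage) coding of the eight patients:
patients = [(False, True)] * 7 + [(True, True)]
counts = Counter(lesion_category(spt, wm) for spt, wm in patients)
print(counts[2], counts[3])  # 7 1 -> white matter only vs. both
```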

Although all eight patients with selective deficits in auditory repetition had white matter damage in the vicinity of the arcuate fasciculus, only one had damage that extended into the cortex surrounding the peak MNI co-ordinates associated with Spt [−51, −39, +21] in our fMRI study. Thus the lesion results provide evidence that selective repetition difficulties can result from white matter damage in the vicinity of the arcuate fasciculus when Spt is intact, but we did not find evidence that selective repetition difficulties can be caused by damage to the cortical area Spt when the white matter tract is intact.

**FIGURE 3 | Activation for each condition in auditory and motor areas, and in Spt.** These results illustrate the mean activation responses across all conditions in primary auditory and motor areas as well as in Spt. The top plots show activation responses for one-back (O-B) and speech production (SP) tasks in both auditory (AUD) and visual (VIS) modalities in left Heschl's gyrus (top-left plot, labeled "auditory input") and left central sulcus (top-right plot, labeled "motor output"). In the AUD modality, the stimuli were words, pseudowords, environmental sounds, and humming (WPSH). In the VIS modality, the stimuli were words, pseudowords, pictures of objects, and colored patterns (WPPC). The center images locate our functionally defined mask for Spt at the TPJ. The bottom plots show activation responses in Spt in Paradigm 1 (P1; bottom-left plot) and in Paradigm 2 (P2; bottom-right plot). As both top plots use P1, the conditions in the bottom-left plot are the same. The bottom-right plot shows the primary and secondary peaks for Spt in P2, where the tasks were spoken response (SP), semantic matching (SM), and perceptual matching (PM), all in the visual (VIS) modality. Stimuli comprised pictures, words, nonobjects, and false-fonts (PWNF). In all five plots, error bars represent 90% confidence intervals. See Section "Materials and Methods." Note that in P1, activation in both Heschl's gyrus and Spt is lowest for the visual one-back task (O-B) because there was no auditory input in either the stimulus or the response. During the visual speech production conditions (VIS-SP), activation was observed in auditory areas because participants could hear the sound of their spoken response.

**FIGURE 4 | Lesion sites in patients with selective repetition difficulties.** These images illustrate the most consistent lesion sites in patients with selective repetition difficulties. The left column shows overlap maps. The first three rows of the left column show overlap maps of sagittal (*x* = −40), coronal (*y* = −40), and axial slices (*z* = +10) for six patients projected onto the canonical brain in MNI space. To the right, these six patients are represented by coronal sections of their individual anatomical brain images in normalized space (in the middle and right columns of rows 1, 2, and 3). The bottom row shows the coronal overlap map (left column) for two patients (middle and right columns). These bottom two patients (PS091 and PS597) have lesions more anterior (*y* = −15) than those of the six patients above (*y* = −40). In the top three overlap maps, yellow indicates a lesion overlap of 4/6 patients, red a lesion overlap of 5/6 patients, and dark maroon a lesion overlap of 6/6 patients (i.e., the maximum possible overlap). In the bottom overlap map, the maximum overlap is 2/2 patients, again indicated by dark maroon. Red arrows point to the area of overlap in each patient and in the coronal overlap maps.

The distinction between cortical and white matter damage is not provided in the lesion analysis reported by Buchsbaum et al. (2011), who show evidence that 12/14 of their patients with auditory repetition and picture naming difficulties had very extensive temporo-parietal damage that overlapped with the relatively small Spt cortical area identified in their fMRI experiment. In contrast, the lesion overlap was smaller in our patients, who were selected to have focal lesions and deficits in auditory repetition but not picture naming, word comprehension, or verbal fluency. Our finding that some of our conduction aphasics had focal white matter damage that spared the surrounding gray matter (see bottom row of **Figure 4**) suggests new directions for neurocomputational models of aphasia (Ueno et al., 2011). For example, Ueno et al. (2011) used a connectionist neural network to model conduction aphasia both by subtracting incoming links (simulating white matter damage) and by simultaneously adding noise to unit outputs (simulating gray matter damage), whereas our finding suggests that white matter damage alone may suffice. Furthermore, our finding that the maximum overlap of damage in 6/8 of our patients was at the level of the left posterior superior temporal sulcus also stands in contrast to previous suggestions of the involvement of the supramarginal gyrus (Ueno et al., 2011) or the supratemporal plane (Buchsbaum et al., 2011).

The lesion analysis is consistent with the importance of the arcuate fasciculus for auditory repetition (see **Figure 5**). In our sample of patients, we found no evidence for or against the importance of Spt in sensory-to-motor integration. For this we would need to find patients with selective damage to Spt who had minimal involvement of the underlying temporo-parietal white matter. Such damage would be highly unlikely following a stroke because of the underlying vascular anatomy. We turn now to our fMRI experiment, which set out to investigate whether activation in a functionally identified Spt area was sensitive to the demands on auditory-to-motor integration when auditory input and the demands on articulation were tested independently.

If Spt is involved in non-semantic auditory-to-motor integration then we would expect activation to be higher for auditory repetition of pseudowords than for naming the source of nonverbal sounds, where the motor response is (a) semantically mediated and (b) does not mimic the auditory input. In contrast, we found that Spt activation was higher for naming nonverbal sounds than repetition of words or pseudowords. Prior literature does not suggest that Spt is selectively involved in semantically mediated sensory-motor integration because Spt activation has been reported for humming music (Pa and Hickok, 2008). Likewise, our study found no evidence that Spt activation increased for semantically mediated sensory-motor integration. In fMRI Paradigm 1, we found that Spt activation was lower for semantically mediated speech production during (i) word repetition relative to pseudoword repetition and (ii) object naming relative to pseudoword reading. In fMRI Paradigm 2, Spt activation during picture naming (semantically mediated sensory-to-motor integration) did not differ from that during an articulation task that involved no sensory-to-motor integration (saying "1-2-3" to the same pictures).

The pattern of activation in Spt is also inconsistent with what would be expected from the motor control of articulation, because we would expect the demands on articulatory planning to increase with novelty (pseudowords relative to words) and not to differ when the articulatory output was matched across participants (object naming versus reading in both fMRI Paradigms 1 and 2). Strikingly, however, the pattern of activation in Spt is consistent with that associated with auditory processing in response to auditory stimuli (greatest for nonverbal sounds irrespective of task) and with auditory feedback from the sound of the spoken response (speech production relative to the one-back task). Indeed, the response pattern in Spt was very similar to that observed in the primary auditory cortex (left Heschl's gyrus), the main difference being that left Heschl's gyrus did not respond during the one-back task on visual pseudowords (fMRI Paradigm 1), nor did it respond during semantic decisions on written words (fMRI Paradigm 2). Thus, Spt distinguishes itself from the primary auditory cortex in that it appears to be an auditory site that is activated in conditions that might generate auditory associations in the absence of auditory stimuli. Such a conclusion is consistent with many prior studies that have reported TPJ activation during tasks that involve auditory imagery or auditory short-term memory (Paus et al., 1996a,b; Shergill et al., 2001; Wise et al., 2001; Buchsbaum and D'Esposito, 2009; Koelsch et al., 2009; McGettigan et al., 2011). In brief, we propose that TPJ/Spt activation during covert rehearsal of auditory words (Hickok et al., 2009) reflects internal representations of sounds (akin to auditory imagery). This may be involved in, and even contribute to, articulatory planning, irrespective of how speech production is driven (e.g., by sensory inputs, object recognition, counting, or verbal fluency).
The role of auditory imagery in speech production therefore contrasts with what is implied by the term "sensory-motor integration," in which the motor output is computed from the sensory input. We cannot unpack all the different cognitive and neural mechanisms that might be involved in speech production or integrate all the different labels and terminologies that have been used. Instead, we focus on our empirical results from this study, where we investigated whether any brain areas are more activated for non-semantically driven auditory-to-motor translation (i.e., during auditory pseudoword repetition) than for semantically mediated auditory-to-motor processing (i.e., during nonverbal sound naming). We hypothesized that TPJ/Spt might be involved but we found no evidence to support this hypothesis. Instead, we suggest that non-semantically mediated auditory repetition may be supported by white matter connections between auditory and motor areas (rather than by a cortical area that translates auditory to motor processing). Our findings allow us to provide further information about the functional response properties of the area commonly known as Spt.

Unraveling our own data, we propose that Spt activation is observed during (1) one-back matching of visual pseudowords because participants generate internal representations of the sounds associated with the pseudowords (i.e., their phonology); (2) semantic decisions on written words because, as proposed by Glaser and Glaser (1989), participants access the sounds of words (i.e., phonology) when making semantic decisions; and (3) all overt articulation conditions because participants generate speech sounds that are processed in the auditory cortex and beyond. One might argue that auditory processing during articulatory activity could loosely be defined as sensory-motor processing. However, we have not found evidence that Spt/TPJ is required to transform sensory inputs to motor outputs. Therefore, it is unlikely to be an "integration" area. Instead, we are claiming that Spt/TPJ might reflect sensory processing after motor output, which may or may not be fed back to influence the motor output.

Although we are arguing against a specific role for Spt in translating externally presented auditory stimuli into vocal tract gestures, it remains possible that auditory processing in Spt plays an important role in correcting speech production at a post-articulatory stage, perhaps by matching auditory imagery of the expected spoken response with auditory processing of the generated spoken response and then relaying corrective signals to the motor cortex. Within a predictive coding framework (Friston and Kiebel, 2009), expectations are modeled within a cortical hierarchy as top-down (or "backwards") connections to sensory processing regions, whereas the opposite bottom-up (or "forward") connections to higher-order predictive regions represent error propagation, which applies when the top-down predictions are not adequate to match the sensory input. A similar matching process is associated with auditory error cells in the DIVA model of speech production (see Guenther et al., 2006, for a formal description), which has suggested that these auditory error cells are located in Spt (Guenther and Vladusich, 2012). Support for this hypothesis comes from both behavioral and fMRI data. At the behavioral level, the importance of auditory feedback during speech production has been established in many prior experiments, for example showing that speech fluency is disrupted by delayed auditory feedback of one's own voice (Stuart et al., 2002) and showing rapid compensation of speech when the pitch of the auditory feedback is shifted (Burnett et al., 1997; Houde and Jordan, 1998). At the neural level, several prior studies have shown that altered auditory feedback increases activation in the posterior planum temporale region relative to unaltered speech (Tourville et al., 2008; Zheng, 2009).
The co-ordinates of this effect [i.e., (−66, −38, +22) in Tourville et al., 2008; (−66, −45, +15) in Zheng, 2009] are close to those associated with Spt (−50, −40, +20), although future studies are required to show that the location of the auditory feedback effect corresponds exactly to a functionally identified Spt.
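The proximity claim above can be quantified directly as Euclidean distances between the reported peaks (a simple sketch; `mni_distance` is our own helper name, and distance in MNI space is only a rough proxy for anatomical correspondence):

```python
import math

def mni_distance(a, b):
    """Euclidean distance (in mm) between two MNI co-ordinates."""
    return math.dist(a, b)

spt = (-50, -40, 20)
tourville = (-66, -38, 22)   # altered-feedback peak, Tourville et al. (2008)
zheng = (-66, -45, 15)       # altered-feedback peak, Zheng (2009)

print(round(mni_distance(spt, tourville), 1))  # 16.2
print(round(mni_distance(spt, zheng), 1))      # 17.5
```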

In conclusion, neither the results from our fMRI experiment nor those from our lesion analysis were consistent with Spt functioning as an area that is required to actively integrate auditory inputs into vocal tract gestures. The fMRI data were more consistent with an account in which Spt is highly responsive to bottom-up auditory processing, with weaker responses to internally generated sounds. Such activity does not necessarily drive the motor response even when it co-occurs with auditory-to-motor integration. In contrast, the lesion data provided clear evidence that the temporo-parietal white matter that connects the left posterior superior temporal sulcus to the motor cortex is needed for auditory-to-motor integration but not for word comprehension or for speech production during picture naming or verbal fluency. This is consistent with the neurological tradition that has attributed conduction aphasia to damage to a white matter tract – the arcuate fasciculus – which connects the two major language centers, Wernicke's area and Broca's area (Geschwind, 1965).

#### **ACKNOWLEDGMENTS**

This work was funded by the Wellcome Trust and by the James S. McDonnell Foundation. We thank Eldad Druks for providing the visual stimuli; Julia Hocking for providing the nonverbal sound stimuli; Jenny Crinion, Tom Schofield, and Sue Ramsden for setting up the patient database; and Nicola Wilson, Louise Lim, Deborah Ezekiel, and Zula Haigh for collecting the patient data.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 July 2013; accepted: 10 January 2014; published online: 31 January 2014.*

*Citation: Parker Jones ʻŌ, Prejawa S, Hope TMH, Oberhuber M, Seghier ML, Leff AP, Green DW and Price CJ (2014) Sensory-to-motor integration during auditory repetition: a combined fMRI and lesion study. Front. Hum. Neurosci. 8:24. doi: 10.3389/fnhum.2014.00024*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Parker Jones, Prejawa, Hope, Oberhuber, Seghier, Leff, Green and Price. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Dissecting the functional anatomy of auditory word repetition

#### *Thomas M. H. Hope1, Susan Prejawa1, ʻŌiwi Parker Jones1,2, Marion Oberhuber1, Mohamed L. Seghier 1, David W. Green3 and Cathy J. Price1 \**

*<sup>1</sup> Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, UK*

*<sup>2</sup> Wolfson College, University of Oxford, Oxford, UK*

*<sup>3</sup> Department of Cognitive, Perceptual and Brain Sciences, University College London, London, UK*

#### *Edited by:*

*Marcelo L. Berthier, University of Malaga, Spain*

#### *Reviewed by:*

*Steve Majerus, Université de Liège, Belgium; Takenobu Murakami, Fukushima Medical University, Japan; Annette Baumgaertner, Hochschule Fresenius, Germany*

#### *\*Correspondence:*

*Cathy J. Price, Wellcome Trust Centre for Neuroimaging, University College London, 12 Queen Square, London WC1N 3BG, UK; e-mail: c.j.price@ucl.ac.uk*

This fMRI study used a single, multi-factorial, within-subjects design to dissociate multiple linguistic and non-linguistic processing areas that are all involved in repeating back heard words. The study compared: (1) auditory to visual inputs; (2) phonological to non-phonological inputs; (3) semantic to non-semantic inputs; and (4) speech production to finger-press responses. The stimuli included words (semantic and phonological inputs), pseudowords (phonological input), pictures and sounds of animals or objects (semantic input), and colored patterns and hums (non-semantic and non-phonological). The speech production tasks involved auditory repetition, reading, and naming, while the finger-press tasks involved one-back matching. The results from the main effects and interactions were compared to predictions from a previously reported functional anatomical model of language based on a meta-analysis of many different neuroimaging experiments. Although the current experiment replicated many of the predicted findings, our within-subject design also revealed novel results by providing sufficient anatomical precision to dissect several different regions within the anterior insula, pars orbitalis, anterior cingulate, SMA, and cerebellum. For example, we found that one part of the pars orbitalis was involved in phonological processing and another in semantic processing. We also dissociated four different types of phonological effects in the left superior temporal sulcus (STS), left putamen, left ventral premotor cortex, and left pars orbitalis. Our findings challenge some of the commonly held opinions on the functional anatomy of language, and resolve some previously conflicting findings about specific brain regions, and our experimental design reveals details of the word repetition process that are not well captured by current models.

**Keywords: fMRI, language, auditory word repetition**

#### **INTRODUCTION**

Although auditory word repetition is amongst the simplest of language tasks, it involves many different brain regions whose functions are not yet fully understood. The aim of this paper was to dissociate the brain regions that support 10 different levels of processing, which are all thought to occur during auditory repetition. Importantly, all 10 levels of processing were investigated using a within-subject, fully-balanced factorial design that enables functional anatomy to be dissected at a spatial precision beyond that possible when results are compiled from multiple studies, conducted on different participant samples.

We start by (A) considering the sensorimotor and cognitive functions involved in auditory word repetition. We then (B) describe how our within-subjects experimental design is able to dissociate the brain areas supporting 10 different functions and (C) make predictions of the brain areas associated with each function based on hundreds of prior neuroimaging studies that each investigated only a small subset of the functions reported in the current study.

#### (A) *Functional models of auditory word repetition.*

Auditory word repetition requires the immediate reproduction of a word that has been spoken by someone else. In essence, it involves translating an auditory input into articulatory activity (i.e., the mouth movements, breathing, and laryngeal activity) that is required to produce an auditory output that matches the identity of the heard word. In most cognitive models, the mapping of auditory inputs to articulatory activity is mediated by previously-learnt representations of speech sounds (phonology), with further support from the semantic system when the speech has meaning (Hanley et al., 2002, 2004).

Standard cognitive models of speech and reading make a distinction between input phonology and output phonology (e.g., Patterson and Shewell, 1987; Ellis and Young, 1988; see Harley, 2001 for a review). Input phonology supports speech perception, when auditory speech inputs are linked to prior knowledge of speech sounds. Output phonology supports speech production when prior knowledge of speech sounds drives and monitors articulatory activity (Tourville and Guenther, 2011; Guenther and Vladusich, 2012). Neuroimaging studies have also distinguished an intermediate level of phonological processing that is actively involved in recoding auditory speech inputs into vocal tract gestures (Zatorre et al., 1992; Hickok et al., 2003). This is referred to as "articulatory recoding" or "sensori-motor integration."

The range of auditory repetition processes that we investigated in this study was determined by two considerations: (1) a priori predictions based on a single functional anatomical model of language that emerged from a review of 20 years of functional neuroimaging studies in healthy participants (Price, 2012); and (2) the limits of a single, within-subjects fMRI design. **Figure 1** illustrates the components of the functional-anatomical model of language reported in Price (2012) after removing the components that are not directly related to auditory word repetition or our experimental design. Our analysis focuses on 10 processes, extracted from this model. These are listed and described in **Table 1A**, for easy reference when describing the statistical contrasts (**Table 1B**), predictions (**Table 2A**), and results (**Table 2B**).

The 10 processing functions of interest were (P1) auditory processing of familiar and unfamiliar stimuli; (P2) recognition of familiar speech sounds, i.e., auditory phonological input processing; (P3) access to sublexical representations of speech sounds that can be recoded into, or integrated with, articulatory associations; (P4) covert articulatory processing; (P5) sublexical phonological processing of words and pseudowords that influences the motor execution of speech, for example because sublexical phonological cues increase the demands on articulation sequences; (P6) accessing semantic knowledge; (P7) retrieving the articulatory plans required to produce words from semantic concepts, as opposed to phonological cues; (P8) motor execution of speech output, including orofacial and laryngeal activity, breathing, and the timing of the response; (P9) auditory processing of the sound of the spoken response; and (P10) domain general processing that occurs for all types of stimuli and responses.

#### (B) *Our within-subjects fMRI design for teasing apart multiple processing areas.*

To tease apart the brain regions involved in the different processes underlying auditory repetition, we used fMRI to compare brain activation during auditory repetition to brain activation during tasks that each engage a subset of the functions of interest. Altogether, there were four experimental factors: (1) stimulus modality: auditory vs. visual stimuli; (2) sublexical phonological input: phonological vs. non-phonological stimuli; (3) semantic content: semantic vs. non-semantic stimuli; and (4) speech production: speech production vs. a one-back matching task (with finger press response). This resulted in 16 different conditions (i.e., 2 × 2 × 2 × 2), including auditory word repetition (a task involving auditory stimuli, with phonological and semantic content, and requiring a spoken response). The other 15 conditions are listed in **Table 3**. With this design, we first identified the regions active for auditory word repetition relative to fixation, and then dissected those regions according to our 10 functions of interest (see **Table 1** and Materials and Methods for further details). As with all experimental approaches, our rationale rests on assumptions about the level of processing engaged in each condition of interest. The data allow us to test these assumptions by comparing the observed effects against those expected from previous studies.

#### (C) *Predictions based on prior neuroimaging studies.*

In the last 20 years, hundreds of studies have used functional neuroimaging techniques, such as PET and fMRI, to reveal the brain areas that support different levels of language processing. The number of possible predictions for our 10 processing functions of interest therefore becomes unmanageable without constraints. To simplify the selection of predictions, we focus on the brain areas predicted by a single functional anatomical model of language based on a review of many hundreds of neuroimaging papers (Price, 2012). These predictions are provided in Figure 3 of Price (2012), with the anatomical components relevant to auditory word repetition shown in **Figure 1B** of the current paper (see **Table 4** for the list of abbreviations). The predictions specifically associated with our 10 processing functions of interest are listed in **Table 2A**. The results of each statistical contrast may then be considered according to whether or not they supported the predictions (see **Table 2B**). Our discussion focuses on novel findings that were not predicted *a priori*, and emphasizes the complexity of the brain networks that support auditory word repetition.

#### **MATERIALS AND METHODS**

The study was approved by the London Queen Square Research Ethics Committee. All participants gave written informed consent prior to scanning and received financial compensation for their time.

#### **PARTICIPANTS**

The participants were 25 healthy, right-handed, native speakers of English, with normal or corrected-to-normal vision; 12 females, 13 males, age range = 20–45 years, mean = 31.4 years, *SD* = 5.9 years. Handedness was assessed with the Edinburgh Handedness Inventory (Oldfield, 1971). A 26th participant was subsequently excluded from analyses because of data corruption in one condition.

#### **EXPERIMENTAL DESIGN**

The condition of interest was auditory word repetition. This was embedded in a larger experimental design with a total of 16 different conditions, which allowed us to tease apart the brain regions supporting the sub-functions underlying auditory word repetition. The 16 conditions conformed to a 2 × 2 × 2 × 2 factorial design (see **Table 3**). In brief, Factor 1 was stimulus modality: auditory vs. visual modalities; Factor 2 was the presence or absence of sublexical phonological cues; Factor 3 was the presence or absence of familiar semantic content in the stimuli; and Factor 4 was speech production in response to the stimulus vs. one-back matching which involved a finger press response to indicate if the current stimulus was the same as the previous stimulus.

**FIGURE 1 | The functional anatomy of auditory word repetition, based on the anatomical model of language in Price (2012), derived from a review of hundreds of fMRI experiments.** Parts that are not related to processes tested in the current experiment have been excluded. **(A)** defines the processing functions. **(B)** lists the brain areas that were associated with each function in Price (2012). Abbreviated names are explained in **Table 4**.

The stimuli with sublexical phonological cues and semantic content were auditory or visually presented words. The stimuli with sublexical phonological cues but no semantic content were auditory or visually presented pseudowords. The stimuli with semantic content but no phonological cues were pictures of objects and animals or their associated sounds. The stimuli with no sublexical phonological cues and no semantic content were colored meaningless scrambled pictures and human humming sounds.
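The crossing of the four factors, and the stimulus type occupying each cell, can be made concrete with a short sketch; the condition labels and stimulus names below are our own shorthand for the design described above:

```python
from itertools import product

# The four factors of the 2 x 2 x 2 x 2 design (labels are our own).
modalities = ["auditory", "visual"]
phonology = ["phonological", "non-phonological"]
semantics = ["semantic", "non-semantic"]
tasks = ["speech production", "one-back matching"]

# Stimulus type occupying each (phonology, semantics, modality) cell,
# following the assignments described in the text.
stimulus = {
    ("phonological", "semantic", "auditory"): "spoken words",
    ("phonological", "semantic", "visual"): "written words",
    ("phonological", "non-semantic", "auditory"): "spoken pseudowords",
    ("phonological", "non-semantic", "visual"): "written pseudowords",
    ("non-phonological", "semantic", "auditory"): "environmental sounds",
    ("non-phonological", "semantic", "visual"): "object pictures",
    ("non-phonological", "non-semantic", "auditory"): "humming",
    ("non-phonological", "non-semantic", "visual"): "scrambled colored pictures",
}

conditions = [
    (m, p, s, t, stimulus[(p, s, m)])
    for m, p, s, t in product(modalities, phonology, semantics, tasks)
]
print(len(conditions))  # 16
```

Note that each of the eight stimulus types appears in exactly two conditions, once per task, which is what licenses the stimulus-constant task comparisons below.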

#### *Participant instructions*

In the speech production conditions, participants were instructed to (a) repeat the auditory words, (b) repeat the auditory pseudowords, (c) name the source of the environmental sounds (e.g., "CAT" in response to "meow"), (d) name the gender of the humming voice ("MALE" or "FEMALE"), (e) read words, (f) read pseudowords, (g) name objects in pictures, and (h) name the dominant color in meaningless pictures of non-objects. The one-back matching task allowed us to compare the effect of the same stimuli in different tasks because exactly the same stimuli were presented in the eight speech production and eight one-back matching conditions.

#### **STIMULUS SELECTION/CREATION**

Stimulus selection started by generating 128 pictures of easily recognizable animals and objects (e.g., cow, bus, elephant, plate) with one to four syllables (mean = 1.59; *SD* = 0.73). Visual word stimuli were the written names of the 128 objects, with 3 to 12 letters (mean = 5 letters; *SD* = 1.8). Auditory word stimuli were the spoken names of the 128 objects (mean duration = 0.64 s; *SD* = 0.1), recorded by a native speaker of English with a Southern British accent approximating Received Pronunciation. Pseudowords were created using a nonword generator (Duyck et al., 2004), and matched to the real words for bigram frequency, number of orthographic neighbors, and word length. The same male speaker recorded the auditory words and pseudowords.

The non-verbal sounds associated with objects were available and easily recognizable for a quarter (i.e., 32/128) of the (word/picture) stimuli, and were taken from the NESSTI sound library (http://www.imaging.org.au/Nessti; Hocking et al., 2013). The duration of the nonverbal sounds needed to be significantly longer (mean length = 1.47 s, *SD* = 0.13) than the duration of the words [*t*(126) = 37.8; *p* < 0.001] because shorter sounds were not recognizable. The auditory baseline stimuli were recordings of a male and a female voice humming novel pseudowords, and therefore carried no lexical, phonological, or semantic content (mean length = 1.04 s, *SD* = 0.43). The male and female voices used to record the baseline stimuli were not used to record the auditory words and pseudowords. Half of these stimuli were matched to the length of the auditory words; the other half, to the length of the nonverbal sounds. The visual baseline stimuli were meaningless object pictures, created by scrambling both the global and local features of the original object pictures and then manually editing those pictures to accentuate one of eight colors (brown, blue, orange, red, yellow, pink, purple, or green). A pilot study with 19 participants confirmed that these stimuli elicited consistent speech production responses.

**Table 1 | (A) Task decomposition for auditory word repetition into 10 processing functions of interest (P1–P10); (B) The 10 statistical contrasts (C1–C9, with two variants of C8) used to identify regional responses to each of the 10 processing functions of interest in (A).**

*The dark boxes with ticks indicate the activation condition (top row) and baseline condition (second row) used in each statistical contrast. The third row indicates the stimulus modality or task that was kept constant across the activation and baseline conditions. Contrast 8 (SP > OB) appears twice, once with an additional tick in the first column (AUD-VIS OB) and once with a white box and cross. This was to dissociate speech production areas into those that were (A) in auditory processing areas and therefore likely to reflect auditory processing of the participants' own voices; and (B) not in auditory processing areas and therefore more likely to reflect the motor execution of speech.*

*Abbreviations: Phon, phonological inputs (words and pseudowords); Non-phon, non-phonological inputs; Sem, semantic inputs (words, pictures, and environmental sounds); Non-sem, non-semantic inputs; OB, one-back matching task; SP, speech production task; Both, SP and OB; Aud, auditory stimuli; Vis, visual stimuli; AV, auditory and visual stimuli.*

#### **STIMULUS AND TASK COUNTERBALANCING**

The 128 object stimuli were divided into four sets of 32 (A, B, C, and D). Set D was always presented as nonverbal sounds. Sets A, B, and C were rotated across pictures, visual words, and auditory words in different participants. All items were therefore novel on first presentation of each stimulus type (for task 1), and the same items were repeated for task 2. Half the participants (13/25) performed all eight speech production tasks first (task 1) followed by all eight one-back matching tasks (task 2). The other half (12/25) performed all eight one-back matching tasks first (task 1) followed by all eight speech production tasks (task 2). Within each task, half the participants (13/25) were presented with auditory stimuli first, followed by visual stimuli; the other half (12/25) were presented with visual stimuli first, followed by auditory stimuli. The order of the four stimulus types was counterbalanced across participants, with full counterbalancing achieved across 24 of our 25 participants.
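A note on the counterbalancing arithmetic: four stimulus types yield 4! = 24 possible presentation orders, which is why full counterbalancing is attainable with 24 participants. A minimal check (the stimulus-type labels are our own):

```python
from itertools import permutations
from math import factorial

# Four stimulus types -> 4! = 24 possible presentation orders, so
# 24 participants suffice for full counterbalancing.
stimulus_types = ["pictures", "written words", "auditory words", "environmental sounds"]
orders = list(permutations(stimulus_types))
print(len(orders))  # 24

# Sets A, B, and C rotated across the three verbal stimulus formats:
# 3! = 6 possible assignments (set D is always nonverbal sounds).
print(factorial(3))  # 6
```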

#### **Table 2 | (A) Brain areas that were predicted, a priori, for each of the 10 processing functions of interest (P1–P10) according to an extensive review of the literature (see Table 2 in Price, 2012). (B) Brain areas that were identified for each of the 10 statistical contrasts of interest (C1–C9, with two variants of C8).**


*Highlighted in bold font are those that were inconsistent with the predictions in Table 2A above. Regions in parentheses were predicted to respond, but did not. See Table 4 for region name abbreviations.*

**Table 3 | A schematic representation of the 16 tasks employed in this work, associating each task with the key factors: stimulus modality (auditory vs. visual); process (semantic and/or phonological content); and response modality (SP vs. OB).**


Each set of 32 items was split into four blocks of eight stimuli, with one of the eight stimuli repeated in each block to make a total of nine stimuli per block (eight novel, one repeat). The stimulus repeat only needed to be detected and responded to (with a finger press) in the one-back matching tasks.
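A sketch of this block structure, assuming the repeat is placed immediately after its first occurrence (as one-back detection requires) at a randomly chosen position within the block:

```python
import random

def make_blocks(items, n_blocks=4, block_size=8, seed=0):
    """Split 32 items into four blocks of eight and repeat one item per
    block, immediately after its first occurrence, to create the one-back
    target. The within-block position of the repeat is our assumption;
    the text only specifies that one stimulus per block is repeated."""
    rng = random.Random(seed)
    assert len(items) == n_blocks * block_size
    blocks = []
    for b in range(n_blocks):
        block = items[b * block_size:(b + 1) * block_size]
        i = rng.randrange(block_size)
        block = block[:i + 1] + [block[i]] + block[i + 1:]  # nine stimuli
        blocks.append(block)
    return blocks

blocks = make_blocks([f"item{i:02d}" for i in range(32)])
print([len(b) for b in blocks])  # [9, 9, 9, 9]
```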

#### **DATA ACQUISITION**

Functional and anatomical data were collected on a 3T scanner (Trio, Siemens, Erlangen, Germany) using a 12-channel head coil. To minimize movement during acquisition, a careful head fixation procedure was used when positioning each participant's head; as a result, no speech sessions had to be excluded after checking the realignment parameters. Functional images were acquired with a gradient-echo echo-planar imaging (EPI) sequence at 3 × 3 mm in-plane resolution (TR = 3080 ms, TE = 30 ms, flip angle = 90°, field of view = 192 mm, matrix size = 64 × 64, 44 slices, slice thickness = 2 mm, interslice gap = 1 mm, 62 image volumes per time series, including five "dummies" to allow for T1 equilibration effects). The TR was chosen to maximize whole-brain coverage (44 slices) and to ensure that slice acquisition onset was not synchronized with stimulus onset, which allowed for distributed sampling of slice acquisition across the study (Veltman et al., 2002).

For anatomical reference, a high-resolution T1-weighted structural image was acquired after completion of the tasks, using a three-dimensional Modified Driven Equilibrium Fourier Transform (MDEFT) sequence (TR/TE/TI = 7.92/2.48/910 ms, flip angle = 16°, 176 slices, voxel size = 1 × 1 × 1 mm). The total scanning time was approximately 1 h and 20 min per participant, including set-up and the acquisition of the anatomical scan.

#### **PROCEDURE**

Prior to scanning, each participant was trained on all tasks using a separate set of training stimuli, except for the environmental sounds, which remained the same in both training and experiment. All speaking tasks required the participant to produce a single verbal response after each stimulus presentation by saying the object name, color name, gender, or pseudoword. For the one-back matching task, participants had to use two fingers of the same hand (12 participants used the right hand, the other 13 the left) to press one of two buttons on an fMRI-compatible button box to indicate whether the stimulus was the same as the one preceding it (left button for "same," right button for "different"). This condition did not involve any overt speech but was expected to involve short-term memory, supported by "inner" (covert) speech. Participants were also instructed to keep their body and head as still as possible, to keep their eyes open throughout the experiment, and to attend to a fixation cross on the screen while listening to the auditory stimuli. An eye tracker was used to ensure that participants kept their eyes open and paid constant attention throughout the experiment.

Each of the 16 tasks was presented in a separate scan run, all of which were identical in structure.

The script was written with COGENT (http://www.vislab.ucl.ac.uk/cogent.php) and run in MATLAB 2010a (MathWorks, Natick, MA, USA). Scanning started with the instruction "Get Ready" written on the in-scanner screen while five dummy scans were collected. This was followed by four blocks of stimuli (nine stimuli per block, 2.52 s inter-stimulus interval, 16 s fixation between blocks, total run length = 3.2 min). Every stimulus block was preceded by a written instruction slide (e.g., "Repeat"), lasting 3.08 s, which indicated the start of a new block and reminded participants of the task. Visual stimuli were each displayed for 1.5 s. The pictures subtended a visual angle of 7.4° (10 cm on screen, 78 cm viewing distance), with a size of 350 × 350 pixels at a screen resolution of 1024 × 768. The visual angle for the written words ranged from 1.47 to 4.41°, with the majority of words (those with five letters) extending 1.84 to 2.2°.
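As a consistency check on these timing figures, the stated run length of 3.2 min follows from the 62 volumes acquired at a TR of 3.08 s (see Data Acquisition):

```python
# Run-length check: 62 EPI volumes at TR = 3.08 s.
tr_s, n_volumes = 3.08, 62
run_s = tr_s * n_volumes
print(round(run_s / 60, 1))  # 3.2 minutes, matching the stated run length

# Stimulus presentation time per run: 4 blocks x 9 stimuli x 2.52 s ISI.
stim_s = 4 * 9 * 2.52
print(round(stim_s, 2))  # 90.72 s
```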

Auditory stimuli were presented via MRI-compatible headphones (MR Confon, Magdeburg, Germany), which filtered ambient in-scanner noise. Volume levels were adjusted for each participant before scanning. Each participant's spoken responses were recorded via a noise-cancelling MRI microphone (FOMRI III, Optoacoustics, Or-Yehuda, Israel), and transcribed manually for off-line analysis.

#### **DATA PRE-PROCESSING**

We performed fMRI data preprocessing and statistical analysis in SPM12 (Wellcome Trust Centre for Neuroimaging, London, UK), running on MATLAB 2012a (MathWorks, Natick, MA, USA). Functional volumes were (a) spatially realigned to the first EPI volume and (b) unwarped to compensate for non-linear distortions caused by head movement or magnetic field inhomogeneity. We used the unwarping procedure in preference to including the realignment parameters as linear regressors in the first-level analysis because unwarping accounts for non-linear movement effects by modeling the interaction between movement and any inhomogeneity in the T2* signal. After realignment and unwarping, we checked the realignment parameters to ensure that participants moved less than one voxel (3 mm) within each scanning run.

The anatomical T1 image was (c) co-registered to the mean EPI image generated during the realignment step and then spatially normalized to Montreal Neurological Institute (MNI) space using the new unified normalization-segmentation tool of SPM12. To spatially normalize all EPI scans to MNI space, (d) we applied the deformation field parameters that were obtained during the normalization of the anatomical T1 image. The original resolution of the different images was maintained during normalization (voxel size 1 × 1 × 1 mm³ for the structural T1 image and 3 × 3 × 3 mm³ for the EPI images). After the normalization procedure, (e) functional images were spatially smoothed with a 6 mm full-width-half-maximum isotropic Gaussian kernel to compensate for residual anatomical variability and to permit the application of Gaussian random-field theory for statistical inference (Friston et al., 1995).

#### *First-level analyses*

In the first-level statistical analyses, each pre-processed functional volume was entered into a subject-specific fixed-effect analysis using the general linear model (Friston et al., 1995). All stimulus onset times were modeled as single events, with two regressors per run: one modeling the instructions and the other modeling all stimuli of interest (including the repeated and unrepeated items). Stimulus functions were then convolved with a canonical hemodynamic response function. To exclude low-frequency confounds, the data were high-pass filtered using a set of discrete cosine basis functions with a cut-off period of 128 s. Contrasts were then generated for each of the 16 conditions of interest (relative to fixation). The results of each individual were inspected to ensure that there were no visible artifacts (edge effects, activation in ventricles, etc.) that might have been caused by within-scan head movements.
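A minimal numerical sketch of this first-level model: onset sticks convolved with a canonical hemodynamic response function, plus a discrete cosine high-pass set with a 128 s cutoff period. The double-gamma HRF parameters and the onset times are illustrative assumptions, not values taken from the study:

```python
import numpy as np
from scipy.stats import gamma

tr, n_scans = 3.08, 57                     # volumes per run after 5 dummies
frame_times = np.arange(n_scans) * tr

def hrf(t):
    """Double-gamma canonical HRF (conventional SPM-style shape
    parameters, assumed here for illustration)."""
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

# Stick function for one block of nine stimuli (onset times are
# illustrative, not the study's actual timings).
onsets = 19.08 + 2.52 * np.arange(9)
stick = np.zeros(n_scans)
stick[np.searchsorted(frame_times, onsets)] = 1.0
regressor = np.convolve(stick, hrf(np.arange(0.0, 32.0, tr)))[:n_scans]

# Discrete cosine high-pass basis with a 128 s cutoff period.
cutoff = 128.0
order = int(2 * n_scans * tr / cutoff)     # number of low-frequency regressors
k = np.arange(1, order + 1)
dct = np.cos(np.pi * np.outer(frame_times / (n_scans * tr), k))
print(regressor.shape, dct.shape)
```

In the actual analysis, the DCT columns are included as confound regressors (or projected out), so that frequencies below 1/128 Hz cannot masquerade as task effects.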

#### **IDENTIFYING THE EFFECTS OF INTEREST**

At the second level, the 16 contrasts for each participant were entered into a within-subjects one-way ANOVA in SPM12. First, we identified areas that were activated for auditory word repetition relative to rest, using a statistical threshold of *p* < 0.001 uncorrected. The activated voxels were saved as a single binary image file. Second, we repeated the same second-level analysis, but this time we included the binary image file as a region of interest. By excluding any voxels that were not activated for auditory word repetition relative to rest, we ensured that all the effects we report are involved in auditory word repetition. The factorial analysis of the 16 different conditions was implemented at the level of statistical contrasts, as described below.

#### *Factor 1 (stimulus modality)*

Auditory processing areas were identified by the main effect of stimulus modality during the one-back matching task (Contrast 1). The statistical contrast compared activation for each of the four auditory conditions to each of the four visual conditions. To ensure that the observed effects were not driven by a subset of the auditory stimuli (e.g., those with sublexical phonology or semantic content), we used the "inclusive masking" option in SPM to exclude voxels that were not activated at *p* < 0.001 uncorrected in each of the four auditory one-back matching tasks relative to fixation. We did not use the speech production conditions in the main effect of auditory processing because this would bias the effects toward auditory speech processing, given that all speech production conditions result in auditory processing of the speaker's own voice, irrespective of whether the stimuli are auditory or visual.
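In terms of the design matrix, Contrast 1 amounts to a weight vector over the 16 condition regressors: +1 for the four auditory one-back conditions, −1 for the four visual one-back conditions, and 0 elsewhere. A sketch (the condition ordering is our own; only its alignment with the design-matrix columns matters):

```python
import numpy as np

# Weight vector for Contrast 1 over the 16 condition regressors.
conditions = [
    (task, mod, stim)
    for task in ("SP", "OB")
    for mod in ("auditory", "visual")
    for stim in ("words", "pseudowords", "nonverbal-semantic", "baseline")
]
weights = np.array([
    (1.0 if mod == "auditory" else -1.0) if task == "OB" else 0.0
    for task, mod, stim in conditions
])
print(weights.sum())              # 0.0: the contrast is balanced
print(int((weights != 0).sum()))  # 8 conditions enter the contrast
```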

#### *Factor 2 (sublexical phonological input)*

The effect of sublexical phonological input was tested by comparing stimuli with sublexical phonological content (words and pseudowords) to stimuli with no sublexical cues (pictures, environmental sounds, colored patterns, and humming sounds). In Contrast 2, this effect of phonology was computed for auditory one-back matching conditions only to identify areas associated with auditory recognition of speech. The two-way interaction of phonological input with stimulus modality was then computed to confirm whether any phonological effects were specific to the auditory modality.

In Contrast 3, the effect of phonology was computed across both tasks and stimulus modalities to identify activation related to abstract representations of speech sounds or articulatory recoding. In Contrast 4, the effect of phonology was computed for the one-back matching task only (across stimulus modalities) to identify areas associated with covert articulatory processing, which might occur for phonological stimuli during the silent one-back matching task but for all stimuli when speech production was required. When a phonological effect was specific to the one-back matching task (i.e., in Contrast 4), we checked (i) the two-way interaction of phonological input and task (one-back matching *>* speech production); and (ii) whether the same regions were activated in the main effect of speech (Contrast 8 below), as this would be consistent with a role for the identified areas in articulatory processing.

In Contrast 5, the effect of phonology was computed for the speech production task only (across stimulus modalities) to identify areas where the motor execution of speech was influenced by sublexical phonological processing. Any such effects were checked with the two-way interaction of phonological input and task (speech production *>* one-back matching).

#### *Factor 3 (semantic content)*

The effect of semantic input was tested by comparing all stimuli with semantic content (words, pictures, environmental sounds) to all stimuli with no semantic content (pseudowords, colored patterns, and humming sounds). In Contrast 6, this was computed across both tasks and stimulus modalities to identify activation related to accessing semantic associations. In Contrast 7, this was computed for speech production only, to identify semantic activation that drove speech production responses. When semantic effects were observed in Contrast 7, we tested whether they were significantly enhanced during speech production by computing the two-way interaction of semantic content with task, and the three-way interaction of semantic content with task and stimulus modality.

#### *Factor 4 (speech production)*

The effect of speech production was tested by comparing all eight speech production conditions to all eight one-back matching conditions (i.e., Contrast 8). We then separated activation related to the motor execution of speech (orofacial, larynx, and breathing activity) from activation related to auditory processing of the spoken response or to domain general processing, by examining which brain areas overlapped with those identified in the main effect of auditory relative to visual input (Contrast 1) or the main effect of all 16 conditions relative to fixation (Contrast 9). Areas associated with the motor execution of speech were those that were not activated (*p* > 0.05 uncorrected) in Contrast 1 or during any of the one-back matching tasks that required silent finger press responses (Contrast 9). Areas associated with auditory processing of the spoken response were those that were also activated for auditory relative to visual processing in Contrast 1, as well as for speech production relative to one-back matching in Contrast 8.
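This overlap logic can be expressed as boolean operations on thresholded voxel masks. A toy sketch, with random masks standing in for the thresholded SPM maps:

```python
import numpy as np

# Toy boolean voxel masks standing in for thresholded SPM maps.
rng = np.random.default_rng(0)
c8 = rng.random(1000) < 0.2   # speech production > one-back (Contrast 8)
c1 = rng.random(1000) < 0.1   # auditory > visual (Contrast 1)
c9 = rng.random(1000) < 0.1   # all conditions > fixation (Contrast 9)

# Motor execution: speech effect outside auditory and domain general areas.
motor_execution = c8 & ~c1 & ~c9
# Auditory processing of the spoken response: speech effect within
# auditory processing areas.
auditory_feedback = c8 & c1
print(int(motor_execution.sum()), int(auditory_feedback.sum()))
```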

#### *Domain general processing*

This was identified where activation was significant in all speech production and all one-back matching conditions relative to fixation (Contrast 9). We dissociated areas that were common to Contrasts 8 and 9 (i.e., domain general but enhanced during speech production) from those that were independent of speech production (i.e., Contrast 9 only).

The statistical threshold for all main effects was set at *p* < 0.05 after family-wise error correction for multiple comparisons across the whole brain, in either height or extent. Within the identified areas, we report interactions if they were significant at *p* < 0.001 uncorrected.

#### **RESULTS**

#### **IN SCANNER BEHAVIOR**

For technical reasons, button press responses were lost for three participants. In-scanner behavioral measures were therefore based on all 25 participants for speech production but only 22 participants for the one-back matching tasks. In-scanner accuracy was high (>95%) for all conditions except auditory repetition and reading of pseudowords (88 and 85%, respectively) and one-back matching of gender and colors (88 and 95%, respectively). The lower accuracy for color and gender matching arose because some participants attempted to match these stimuli on their visual or auditory forms, rather than their color or pitch. Response times were only available for the one-back matching task and were measured from stimulus onset to response onset. As the time to present each stimulus varied across conditions, we expected response times to be longer when the stimulus presentation time was longer. For example, in the visual conditions, all visual features are presented simultaneously and then remain on the screen throughout the stimulus duration; in the auditory conditions, by contrast, auditory features emerge over time. Consequently, response times were slower for the four auditory one-back matching tasks (range across tasks = 880–1125 ms) than for the four visual one-back matching tasks (range across tasks = 648–762 ms). Within the auditory conditions, response times were slower for sound and gender matching (1111 and 1125 ms) than for auditory word or pseudoword matching (880 and 959 ms). Within the visual modality, color matching (762 ms) was slower than visual word, pseudoword, or picture matching (655, 648, and 683 ms). We attribute this to participants being distracted by the shape of the stimulus, which changed on each trial irrespective of whether the color changed.

#### **fMRI ACTIVATION RESULTS**

#### *Factor 1: Auditory processing (see blue areas in Figures 2–4)*

As expected, activation was significantly greater for auditory than visual stimuli in the bilateral superior temporal gyri, including Heschl's gyri and the planum temporale. At a statistical threshold of *p* < 0.05, corrected for multiple comparisons, there were 960 voxels in the right auditory cortex and 913 voxels in the left auditory cortex. These areas are associated with auditory processing.

#### *Factor 2: Phonological inputs (see Table 5A and turquoise areas in Figures 2–4)*

There was no effect of phonology in the auditory one-back matching task (Contrast 2), which tested for areas that might be involved in recognizing auditory speech. However, four other phonological effects were dissociated, all of which were common to auditory and visual inputs. The first two were identified in the main effect across stimuli and tasks (Contrast 3), which was observed in the left superior temporal sulcus (STS) and the posterior putamen. We dissociate the function of these areas because, in the left STS, activation was additive with the main effect of auditory vs. visual inputs (Z scores for C1 = 5.7 in the anterior left STS and more than 8 (i.e., assessed as "effectively infinite" by SPM) in the posterior left STS); whereas in the left posterior putamen, the effect was additive with the main effect of speech production (**Table 6**). Third, during the one-back matching task only (Contrast 4), there was a main effect of phonology in the left pars orbitalis at the junction with the pars triangularis. At the same location, there was an additive effect of speech production (Z-score in C8 = 4.4). Fourth, during the speech production task only (Contrast 5), there was a main effect of phonology in the left ventral premotor cortex and the left anterior putamen. These effects were also additive with the main effect of speech production (see **Table 6** for details).

#### *Factor 3: Semantic content (see Tables 5B,C and pink areas in Figures 2–4)*

Three different semantic responses were dissociated. First, a ventral part of the left pars orbitalis was activated by semantic input across stimuli and tasks (Contrast 6). Second, during speech production (Contrast 7) but not one-back matching, the pars orbitalis activation extended more laterally and dorsally, bordering the area associated with phonological inputs during the one-back matching task. In addition, semantic inputs during speech production (Contrast 7) increased activation in the left posterior middle temporal gyrus, extending into the left angular gyrus and the left hippocampus. Third, there was a three-way interaction of semantic content, task, and stimulus modality, whereby auditory semantic inputs (environmental sounds and words) enhanced activation in left ventral frontal lobe regions (frontal operculum, pars triangularis, and the inferior frontal sulcus) and the right cerebellum (laterally in lobule VIIIA and medially in lobule VI).

#### *Factor 4: Overt speech production (see Table 6, green areas in Figures 2–6)*

Two different sets of speech production responses were dissociated from the main effect of speech production relative to one-back matching (Contrast 8) after areas associated with domain general processing (Contrast 9) were excluded. First, activation associated with the motor execution of speech was identified bilaterally in the SMA and anterior cingulate gyrus, the precentral gyri (including the left ventral premotor cortex activated in C5), many posterior and anterior regions of the insula and putamen, the amygdala, the temporal pole, the pars triangularis extending into the left pars orbitalis (*Z* = 4.4 at −45, +27, −3), and the cerebellum (green areas in **Figures 2**–**6**). Notably, the speech production effects in the left ventral premotor cortex, anterior and posterior putamen, and left pars orbitalis were additive with the main effect of phonology reported above and in **Table 5A**.

**FIGURE 2 | Illustrations of activations related to auditory processing (in blue), phonological processing (in turquoise), semantic processing (in pink), and the motor execution of speech (in dark green) on a single sagittal brain image at *x* = −54 in MNI space.** Plots show the response for each of the 16 conditions in each of the regions of interest, with the name of the brain region, the x, y, and z MNI co-ordinates of the effect, and the contrast number (e.g., C8) used to identify the effect. The order of the conditions is always the same, with abbreviations defined at the bottom of the figure. The conditions of interest for each effect are highlighted in the corresponding color. Red bars on each plot are the 90% confidence intervals generated in SPM. The height of the bar is the mean effect across subjects in arbitrary units, as generated in SPM.

Second, activation associated with auditory processing of the spoken output (i.e., in areas that were also activated for auditory inputs in Contrast 1) was most significant (*p <* 0*.*05 corrected for multiple comparisons) in the dorsal superior temporal gyri. These regions included the left anterior and posterior STS areas associated with the main effect of phonological input (C3; **Table 5A**). When the significance level was lowered to *p <* 0*.*001 uncorrected, the main effect of speech production (Contrast 8) was observed in 98% (938/960) of the right hemisphere auditory processing voxels (from Contrast 1) and 88% (801/913) of the left hemisphere auditory processing voxels (from Contrast 1). The absence of an effect of speech production in 12% of the left hemisphere auditory processing voxels suggests that these voxels do not respond to the sound of the spoken response. This apparent discrepancy between the effect of auditory processing of the speaker's own voice (in C8) and that of another's voice (C1) will be investigated in a subsequent paper.

#### *Domain general processing (Table 6, red and orange areas in Figures 3–6)*

Two different sets of domain general processing areas were dissociated from Contrast 9. Those that were enhanced by speech production (i.e., common to Contrasts 8 and 9) were observed bilaterally in the pre-SMA, anterior cingulate sulcus, dorsal precentral gyrus, dorsal anterior insula (around the frontal operculum), and lateral regions of the cerebellum (red in **Figures 3**–**6**). Those that were independent of speech production (i.e., Contrast 9 only) were observed in the middle of the anterior cingulate sulcus and a dorsal region of the supramarginal gyrus (orange in **Figures 3**, **5**).

In summary, the fMRI results replicated many previous findings (**Table 2A**) but also revealed many novel effects (**Table 2B**) which we now discuss.

#### **DISCUSSION**

In this paper, we have identified multiple regions activated during auditory repetition of words compared to resting with eyes open (i.e., fixation), and then used multiple different contrasts, within-subject, to try to assign functional roles to those regions. Our results extend, refine, and in some cases undermine prior predictions about which regions support auditory word repetition and what those regions actually do. In what follows, we discuss the results in the context of those prior expectations, focusing on phonological processing, semantic processing, and the motor execution of speech during auditory word repetition.

**(Figure caption continued.)** Effects colored yellow/orange are those related to domain general processing. Effects in red are those that show an effect of domain general processing enhanced by speech production. The effect in light green was identified for producing speech from auditory semantic stimuli (C7A).

#### **PHONOLOGICAL EFFECTS**

We found no regions that were specifically responsive to the auditory processing of speech (i.e., no effect of phonological inputs on auditory relative to visual stimuli in C2). However, we did find four effects of phonological input that were common to auditory and visual stimuli. Across tasks (C3), phonological inputs increased activation in the left STS and the left posterior putamen. In addition, during speech production, phonological inputs increased activation in the left ventral premotor cortex and the left anterior putamen (C5); and during one-back matching, phonological inputs increased activation in the left pars orbitalis (C4). Below, the role that each of these regions might play in phonology is discussed.

#### *Left STS*

Here we observed a main effect of phonological inputs (C3) that was additive with a main effect of auditory vs. visual processing (C1); see the lower left of **Figure 2**. We predicted such a response in the left posterior STS but not the left anterior STS. However, the effect we observed anteriorly (at −54, −18, −6) is consistent with a previous study that reported an anterior STS region (at −50, −22, −6) with an additive effect of (i) auditory vs. visual stimuli and (ii) reading and repetition of speech relative to non-speech (Price et al., 2003). What might this auditory processing area be doing during reading? Examination of the plot in the lower left corner of **Figure 2** shows that, although the effect of phonology on auditory stimuli was consistent across both tasks, the response to visual phonological inputs was primarily observed during speech production rather than one-back matching. This could arise either because (A) reading aloud increases access to auditory representations (phonology) more than naming pictures does (Glaser and Glaser, 1989); or (B) participants enhance auditory processing of their spoken response during reading relative to naming (even though the auditory input from the spoken response is matched in the reading words and naming pictures conditions). We exclude the latter explanation (B) because the common effect of auditory and visual phonology reported in Price et al. (2003) was observed in the context of silent speech production (moving the lips without generating any sound), which eliminated auditory processing of the spoken response. We therefore focus on explanation (A): the left anterior STS response reflects access to auditory representations of speech that are readily accessed during reading. Indeed, many previous studies have reported extensive STS activation in the context of audio-visual integration (Calvert et al., 1999; Noppeney et al., 2008; Werner and Noppeney, 2010).

#### *Left posterior putamen*

Here we observed an additive effect of phonological input (C3) and speech production (C8). The response in this region, from the same dataset, has been discussed at length in Oberhuber et al. (2013), which investigated differential activation for reading words and pseudowords and found higher left posterior putamen activation for reading and repeating words than reading or repeating pseudowords (see Figure 2 in Oberhuber et al., 2013). This was interpreted in light of other studies that have associated the posterior putamen with "well learnt movements" (Menon et al., 2000; Tricomi et al., 2009). As articulation is matched in the reading and picture naming conditions, increased left posterior putamen activation for reading must reflect pre-articulatory processing, particularly since left posterior putamen activation was also detected for phonological inputs during the one-back task that did not require overt articulation. We therefore speculate that the effect of phonological inputs on left posterior putamen responses reflected activation related to articulatory planning.

#### *Left ventral premotor cortex*

Here we observed an effect of phonological input during speech production (C5) and an additive effect of speech production (C8). Phonological inputs may increase the demands on overt articulation because they provide multiple sublexical phonological cues that need to be integrated (sequenced) into a lexical motor plan. This would be consistent with the left ventral premotor cortex playing a role in articulatory sequencing. Contrary to our expectations, we found no evidence for covert articulatory processing in this ventral premotor area during the one-back matching task.

**Table 5 | The effects of phonological vs. semantic inputs. Separate tables indicate the location and significance of activation for: (A) stimuli with phonological vs. non-phonological content; (B) stimuli with semantic vs. non-semantic content; and (C) auditory stimuli with semantic content.**

*The first column (C) reports the contrast number used in Table 1. The second column gives the anatomical name of the location of the activation, using abbreviations explained in Table 4. The third column indicates the hemisphere (H), either left (L) or right (R). The fourth column gives the x, y, and z co-ordinates of the peak activation in MNI space. The last five columns give the Z scores for the effect of interest over the task contrasts OB&SP, OB only, SP only, OB > SP, and SP > OB. Note that the latter two effects indicate the interaction of task with the effect of interest (phonology or semantics). The main effect of task (over semantic and phonological stimuli) is reported in Table 6. Effects that survived family-wise error correction for multiple comparisons are marked "\*" and "∧" for height and extent, respectively. Effects that were not significant at p < 0.01 uncorrected are marked ns.*

#### *Left anterior putamen*

Here we observed a pattern of response similar to that observed in the left ventral premotor cortex: i.e., an effect of phonological input during speech production (C5) and an additive effect of speech production (C8). In addition, using the same dataset, we have previously reported that the left anterior putamen was more responsive during pseudoword reading than word reading (see Figure 2 in Oberhuber et al., 2013). We interpreted this effect as in keeping with prior studies that have associated the left anterior putamen with "the initiation of novel sequences of movements" (Okuma and Yanagisawa, 2008; Aramaki et al., 2011; Wymbs et al., 2012), as opposed to the well-learnt movements associated with the posterior putamen. This conclusion yields an interpretation of left anterior putamen activation that is similar to that of the left ventral premotor cortex; i.e., both are involved in sequencing the articulation of sublexical phonological codes.

#### *Left pars orbitalis*

Here, we observed an effect of phonological input during one-back matching (C4) with an additive effect of speech production (C8). Neither of these effects was expected in the left pars orbitalis, which is more commonly associated with semantic processing (Dapretto and Bookheimer, 1999; Poldrack et al., 1999; Devlin et al., 2003; Gough et al., 2005; Vigneau et al., 2006; Mechelli et al., 2007; de Zubicaray and McMahon, 2009). Nevertheless, other studies have reported that the left pars orbitalis is sensitive to articulatory complexity during non-semantic pseudoword production (Park et al., 2011) and for reading pseudowords relative to words (Hagoort et al., 1999). Therefore, our study is not unique in highlighting a non-semantic articulatory response in this region.

We controlled for articulatory complexity in our speech production conditions because the speech outputs were the same object and animal names in auditory repetition, reading aloud and object naming. Interestingly in this context, we did not see differential activation in the left pars orbitalis for phonological vs. non-phonological inputs during speech production. Plausibly, the effect of phonological inputs that we observed in the left pars orbitalis during the one-back matching task might reflect covert articulatory processing that occurs automatically when stimuli have strong articulatory associations. Future studies should therefore consider the possible involvement of phonological processing when interpreting left pars orbitalis activation. For example, in Leff et al. (2008), we observed left pars orbitalis activation at MNI co-ordinates [−48, 28, −6] for listening to speech relative to reversed speech, and interpreted this effect as reflecting semantic processing. In light of the current study, the increased left pars orbitalis response to speech may instead have reflected covert articulatory processing.

*Columns 1–3 in each section correspond to columns 2–4 in Table 5. Column 4 reports the Z score (Zsc) for the main effect of speech production (Contrast 8) or domain general processing (Contrast 9). Activation in auditory processing areas (from Contrast 9) is excluded (see results section for details).*

To summarize this section, we have dissociated four different phonological effects. All were observed in the left hemisphere, with one in an auditory processing area (STS) and three in speech production areas: (i) during speech production only (left ventral premotor cortex and left anterior putamen), which we associate with sequencing sublexical articulatory plans; (ii) during one-back matching only (left pars orbitalis), which suggests covert articulatory processing that is engaged equally by phonological and non-phonological stimuli during overt speech production; and (iii) during both tasks (left posterior putamen), which is consistent with increases in both covert and overt articulatory activity.

#### **SEMANTIC PROCESSING**

Despite finding that part of the pars orbitalis responded, unexpectedly, to speech production and sublexical phonological inputs during the one-back matching task, we found that a different part of the left pars orbitalis responded to the main effect of semantic content (C7). This is consistent with the prior association of the left pars orbitalis with semantic processing mentioned above. The left-hand plots in **Figure 4** illustrate the strikingly different responses of two distinct but neighboring parts of the left pars orbitalis, with the semantic area (in pink) lying ventral to the phonological area (in turquoise).

**FIGURE 5 | As in Figures 2–4 but with effects located at** *x* **= +3 in MNI space to highlight the many different effects in the anterior cingulate and supplementary motor cortex.**

Semantic content also increased activation in the left posterior middle temporal gyrus, extending into the ventral part of the angular gyrus, with a separate peak in the left hippocampal gyrus. These regions were expected to respond during all semantic conditions but were only detected during the speech production conditions (**Table 5B**), suggesting that semantic associations were weak, or not engaged, during one-back matching. On the other hand, the demonstration that posterior left temporo-parietal areas were involved in auditory word repetition, as in other semantic speech production tasks, clearly illustrates that semantic processing is engaged during auditory word repetition, even when it is not theoretically needed.

Likewise, our data suggest that auditory word repetition increases the demands on phonological retrieval mechanisms in left-lateralized frontal and right-lateralized cerebellar regions that were activated during speech production in response to: (a) semantic relative to non-semantic inputs in the auditory modality; and (b) all conditions relative to fixation in the visual modality. Activation was therefore lowest when speech was generated from auditory pseudowords or auditory hums (which have no lexical or semantic representations); see the left middle plot in **Figure 3** and the lower middle plot in **Figure 6**. This pattern of response is interesting because auditory words could, in theory, be repeated like auditory pseudowords. The fact that auditory word repetition activated areas strongly involved in naming environmental sounds suggests that the extra processing either makes the task easier or occurs automatically, whatever its actual benefits.

#### **MOTOR EXECUTION OF SPEECH**

Turning now to the effects of speech production that were not influenced by semantic or phonological differences in the stimulus input: our experimental design was able, for what we believe is the first time, to segregate activation related to the motor execution of speech from that involved in domain general processing and in auditory processing of the spoken output.

Areas associated with the motor execution of speech included all those that were predicted, with the exception of the thalamus. In addition, we observed activation in the SMA, anterior cingulate, pars triangularis, and an extensive region around the ventral insula and ventral putamen that included the claustrum and spread into the amygdala and temporal pole. We have replicated the effects of speech production in the temporal pole in another study, and this will be discussed in future communications. Here we focus on the effects in the SMA, anterior cingulate, cerebellum, and anterior insula. Those in the SMA and anterior cingulate were not predicted to be specific to speech output because they have been reported to respond in numerous studies of movement with other effectors, such as the hand. Indeed, we found that other parts of the cingulate and the pre-SMA were involved in both our speaking and finger-press tasks. Thus, our study highlights the many different areas in the medial frontal cortex that are all involved in speech production but differ in their involvement in other tasks.

**Figure 5** illustrates the relative location of these areas: those in red and orange responded during one-back matching and speech production, whereas those in green did not respond during one-back matching and so appear to be primarily involved in the motor execution of speech. Likewise, **Figure 6** illustrates three different types of functional response in the cerebellum, with medial (paravermal) areas being most responsive during speech production and more lateral areas showing sensitivity to retrieval demands (lower middle plot) and one-back matching (top right plot). Future connectivity studies could investigate how the different cerebellar regions shown in **Figure 6** interact with the medial frontal regions shown in **Figure 5**.

Finally, we highlight an interesting dissociation in the left anterior insula. Previous studies have associated this area with either covert speech production (Koelsch et al., 2009) or overt speech production (Ackermann and Riecker, 2004; Shuster and Lemieux, 2005; Bohland and Guenther, 2006). Here we dissociate two different areas involved in speech production (see right panel of **Figure 4**): a dorsal area, close to the frontal operculum, that is activated during both silent and overt speech production conditions (shown in red), and a more ventral area that is specific to overt speech production (shown in green).

#### **IMPLICATIONS FOR THEORIES OF AUDITORY WORD REPETITION**

Our results illustrate the complexity of the response profiles in many regions involved in auditory word repetition, and distinguish subregions within some of those regions that have not been identified before. For example, though we found evidence consistent with the conventional view that the left pars orbitalis plays a semantic role, we also found a distinct phonological response in a different part of the same region. Similarly, we were able to dissociate three different regions in the cerebellum, and two different regions within the left anterior insula, all of which are implicated in the motor execution of speech but each of which responds differently across tasks. These results suggest that there are multiple, overlapping but at least partially independent circuits involved in auditory word repetition, circuits which are not addressed at all in contemporary models of the process (e.g., Ueno et al., 2011; Ueno and Lambon Ralph, 2013).

#### **CONCLUSIONS**

Auditory word repetition is a complex process, supported by a large network of widely distributed brain regions, many of which appear to have their own complex, task-dependent response profiles. As with many other language functions, our analyses of auditory word repetition are hampered first and foremost by that complexity: there are so many regions involved, with such complex response profiles, in such a potentially wide array of different functional circuits, that we cannot hope to explain it all here. Instead, we have sought to show how a multi-factorial, within-subjects design can be used to begin to dissect this complex network, and to illustrate some of the key lessons that can be learned—lessons which go beyond any simple tabulation of those regions that do or do not appear to be implicated by auditory repetition, and which were, or were not, expected to appear in that list (**Table 2**).

Perhaps the most important lesson of all here is that conventional models of auditory repetition, be they founded on the dual route hypothesis (e.g., Saur et al., 2008; Ueno et al., 2011), or even on our own analysis of many hundreds of previous imaging experiments (see **Figure 1**), are simply not detailed enough to account for the data we increasingly observe as our imaging methods improve. For example, we show here that there are several different types of phonological response that cannot be captured by a single concept. We have also shown how areas associated with semantic processing and phonological retrieval respond during auditory word repetition, even though they are theoretically not required. For anatomical models of auditory repetition, and language *per se*, we have shown that there are two different regions in the left pars orbitalis, one that responds to semantic content and one that responds to articulatory processing. At the speech output level, we have dissociated multiple different areas and responses that all need further investigation. Nevertheless, some of the results (e.g., those in the left anterior insula) allow us to reconcile previously conflicting reports.

We hope that these and other results will motivate future experiments that investigate, validate and interpret the vast array of neural responses that support even the simplest of language tasks. We also hope that our single experimental paradigm will be useful for dissociating language responses at the individual level, particularly in clinical settings.

#### **AUTHOR CONTRIBUTIONS**

Cathy J. Price, David W. Green, and 'Ōiwi Parker Jones were responsible for the study design. Thomas M. H. Hope created the paradigm and supervised the data acquisition. Mohamed L. Seghier supervised the data analysis. All authors contributed to and approved the final manuscript.

#### **ACKNOWLEDGMENTS**

This work was funded by the Wellcome Trust. We thank Eldad Druks for providing the picture stimuli and Julia Hocking for providing the environmental sound stimuli.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 August 2013; paper pending published: 20 October 2013; accepted: 03 April 2014; published online: 06 May 2014.*

*Citation: Hope TMH, Prejawa S, Parker Jones 'Ō, Oberhuber M, Seghier ML, Green DW and Price CJ (2014) Dissecting the functional anatomy of auditory word repetition. Front. Hum. Neurosci. 8:246. doi: 10.3389/fnhum.2014.00246*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Hope, Prejawa, Parker Jones, Oberhuber, Seghier, Green and Price. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*