# LANGUAGE BEYOND WORDS: THE NEUROSCIENCE OF ACCENT

EDITED BY: Ignacio Moreno-Torres, Peter Mariën, Guadalupe Dávila and Marcelo L. Berthier

PUBLISHED IN: Frontiers in Human Neuroscience

#### *Frontiers Copyright Statement*

*© Copyright 2007-2017 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88945-107-4 DOI 10.3389/978-2-88945-107-4


# **LANGUAGE BEYOND WORDS: THE NEUROSCIENCE OF ACCENT**

Topic Editors:

**Ignacio Moreno-Torres,** University of Malaga, Spain

**Peter Mariën,** Vrije Universiteit Brussel & ZNA Middelheim General Hospital Antwerp, Belgium

**Guadalupe Dávila,** University of Malaga, Spain

**Marcelo L. Berthier,** University of Malaga, Spain

Language learning also implies the acquisition of a set of phonetic rules and prosodic contours which define the accent in that language. While often considered as merely accessory, accent is an essential component of psychological identity as it embodies information on origin, culture, and social class. Speaking with a non-standard (foreign) accent is not inconsequential because it may negatively impact communication and social adjustment. Nevertheless, the lack of a formal definition of accent may explain why, compared with other aspects of language, it has received relatively little attention until recently. During the past decade there has been increasing interest in the analysis of accent from a neuroscientific perspective.

This e-book integrates data from different scientific frameworks. The reader will find fruitful research on new models of accent processing, how learning a new accent proceeds, and the role of feedback in accent learning in healthy subjects. Also included is information on accent changes in pathological conditions, including developmental and psychogenic foreign accent syndromes, as well as the description of a new variant of foreign accent syndrome.

It is anticipated that the articles in this e-book will enhance the understanding of accent as a linguistic phenomenon, the neural networks supporting it and potential interventions to accelerate acquisition or relearning of native accents.

**Citation:** Moreno-Torres, I., Mariën, P., Dávila, G., Berthier, M. L., eds. (2017). Language beyond Words: The Neuroscience of Accent. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-107-4

# Table of Contents


*Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation*

Briony Banks, Emma Gowen, Kevin J. Munro and Patti Adank

*Visual Feedback of Tongue Movement for Novel Speech Sound Learning*

William F. Katz and Sonya Mehta

# **SECTION 4: STUDIES IN PATIENTS WITH CHANGES IN ACCENT**

#### **Developmental Foreign Accent Syndrome**

*Developmental Foreign Accent Syndrome: Report of a New Case*

Stefanie Keulen, Peter Mariën, Peggy Wackenier, Roel Jonkers, Roelien Bastiaanse and Jo Verhoeven

*Mild Developmental Foreign Accent Syndrome and Psychiatric Comorbidity: Altered White Matter Integrity in Speech and Emotion Regulation Networks*

Marcelo L. Berthier, Núria Roé-Vellvé, Ignacio Moreno-Torres, Carles Falcon, Karl Thurnhofer-Hemsi, José Paredes-Pacheco, María J. Torres-Prioris, Irene De-Torres, Francisco Alfaro, Antonio L. Gutiérrez-Cardo, Miquel Baquero, Rafael Ruiz-Cruces and Guadalupe Dávila

#### **Psychogenic Foreign Accent Syndrome**

*Foreign Accent Syndrome As a Psychogenic Disorder: A Review*

Stefanie Keulen, Jo Verhoeven, Elke De Witte, Louis De Page, Roelien Bastiaanse and Peter Mariën

*Perceptual Accent Rating and Attribution in Psychogenic FAS: Some Further Evidence Challenging Whitaker's Operational Definition*

Stefanie Keulen, Jo Verhoeven, Roelien Bastiaanse, Peter Mariën, Roel Jonkers, Nicolas Mavroudakis and Philippe Paquier

*Psychogenic Foreign Accent Syndrome: A New Case*

Stefanie Keulen, Jo Verhoeven, Louis De Page, Roel Jonkers, Roelien Bastiaanse and Peter Mariën

#### **A New Variant of Neurogenic Foreign Accent Syndrome**

*Loss of regional accent after damage to the speech production network*

Marcelo L. Berthier, Guadalupe Dávila, Ignacio Moreno-Torres, Álvaro Beltrán-Corbellini, Daniel Santana-Moreno, Núria Roé-Vellvé, Karl Thurnhofer-Hemsi, María José Torres-Prioris, María Ignacia Massone and Rafael Ruiz-Cruces

# Editorial: Language beyond Words: The Neuroscience of Accent

Ignacio Moreno-Torres<sup>1</sup>\*, Peter Mariën<sup>2,3</sup>, Guadalupe Dávila<sup>4,5</sup> and Marcelo L. Berthier<sup>4</sup>

<sup>1</sup> Department of Spanish Language, University of Malaga, Malaga, Spain, <sup>2</sup> Department of Linguistics and Literary Studies, Clinical and Experimental Neurolinguistics, Vrije Universiteit Brussel, Brussels, Belgium, <sup>3</sup> Department of Neurology and Memory Clinic, ZNA Middelheim General Hospital Antwerp, Antwerp, Belgium, <sup>4</sup> Cognitive Neurology and Aphasia Unit and Cathedra ARPA of Aphasia, Centro de Investigaciones Médico-Sanitarias, Instituto de Investigación Biomédica de Málaga, University of Malaga, Malaga, Spain, <sup>5</sup> Department of Psychobiology and Methodology of Behavioural Sciences, Faculty of Psychology, University of Malaga, Malaga, Spain

Keywords: accent, foreign accent syndrome, neuroscience, psychiatric disorders, neuroimaging

**Editorial on the Research Topic**

**Language beyond Words: The Neuroscience of Accent**

#### INTRODUCTION

Speakers differ not only in the number of languages they master, but also in the accents they impart. Indeed, accent is an essential component of our identity, to the extent that in many cases our social ascription to specific groups and our judgements about others are based on accent. The relevance of accent was often dismissed by many linguists for a large part of the twentieth century. However, today it is widely acknowledged that spoken language does not exist without an accent. There are, however, some circumstances under which speakers are unable to modulate accent properly. In fact, many late learners of a second language are unable to acquire a native-like accent in the new language, and some individuals with discrete brain lesions in the speech production network or neuropsychiatric problems may change or lose their regional accent or acquire a peculiar accent, which gives rise to what we know as foreign accent syndrome (FAS).

Edited and reviewed by:

Srikantan S. Nagarajan, University of California, San Francisco, USA

#### \*Correspondence:

Ignacio Moreno-Torres imoreno@uma.es

Received: 16 September 2016 Accepted: 30 November 2016 Published: 20 December 2016

#### Citation:

Moreno-Torres I, Mariën P, Dávila G and Berthier ML (2016) Editorial: Language beyond Words: The Neuroscience of Accent. Front. Hum. Neurosci. 10:639. doi: 10.3389/fnhum.2016.00639

The social and linguistic relevance of accent is reflected in a growing interest among neuroscientists. An important step in advancing our understanding of the neuroscience of accent is to gain further knowledge of the large-scale set of neural structures that modulate the reception and production of accents. Several populations are germane for the study of accent. One special group is composed of healthy late learners of a second language; another group consists of subjects who are unable to acquire their native pronunciation and instead speak with a foreign accent. Finally, other cases of interest for this Research Topic are patients with accent changes due to psychiatric disorders (psychogenic FAS) or neurological conditions (neurogenic FAS and its variants). Although pathological changes in accent have been considered rare, a growing number of cases are appearing in the literature. In the past decade, neuroscience has made a huge leap in developing new instrumental techniques for studying speech production (e.g., computer-assisted pronunciation training systems) and many labs worldwide are also using modern neuroimaging to explore the neural mechanisms underlying accent.

# CONTRIBUTIONS TO THE CURRENT RESEARCH TOPIC

The 13 articles compiled in this Frontiers Research Topic bring together experimental and theoretical research that links the brain with the phenomena of accent processing and accent change in second language learning as well as in neurological and psychiatric patients. We believe that the full list of papers provides a comprehensive update on the state of the art in accent research in normal and pathological conditions.

# STUDIES IN HEALTHY SUBJECTS

### New Models of Accent Processing

The neural basis of accent is a hot topic in neuroscience research, yet so far there are only a handful of theories of how the brain learns a new accent. Simmonds takes advantage of the theory proposed by Jarvis (2004) on vocal motor learning in songbirds and humans. In this hypothesis and theory article, Simmonds proposes that accent learning depends on the early development of variability in the networks governing the production of each speech sound, an argument which favors accurate acquisition of native-like pronunciation of different languages. She concludes that the vocal learning pathway is less susceptible to variability in late learners of a second language (a sensitive period) than in early learners, because in the latter group this pathway is easily recruited, allowing rapid accent learning. In their contribution, Adank et al. examine the neural bases of accented speech perception. This mini-review aims to integrate the neural architecture of processing accented speech in a single model that incorporates key neural areas dealing with auditory and phonological processing, sensorimotor mapping, and cognitive control processes. Together, the provocative proposals of Simmonds and Adank et al. shed new light on the production and perception of accent. Their viewpoints may open new avenues in the study of accent acquisition in healthy late L2 learners as well as in the remediation of pathological changes in accent.

### Learning a New Accent

Christiner and Reiterer investigate the influence of mastering musical skills (as instrumentalists or as vocalists) on the ability to imitate a foreign accent. They show that both instrumentalists and vocalists outperform non-musicians and, not surprisingly, that vocalists outperform instrumentalists. This study suggests that intensive vocal and singing training may accelerate foreign accent acquisition processes. Two articles in the current Research Topic used functional magnetic resonance imaging (fMRI) or event-related potentials (ERPs) to assess the neural correlates of accent production and perception. Ghazi-Saidi et al. tested naming of phonologically and semantically similar words (cognates) across two languages (French: "piano"; Spanish: "piano") using experimental linguistic and neuroimaging methods. They show that native speakers of French struggled to produce cognates with the Spanish accent although L2 lexical learning was consolidated at the phonological and semantic levels. Notably, attempts to produce cognates with new accents were cognitively and anatomically demanding, as fMRI revealed upregulation of the left dorsal insula, a cortical region which plays a key role in accent processing. Romero-Rivas et al. evaluate changes in real-time processing of native- and foreign-accented speech with ERPs during language comprehension in healthy subjects. This study reveals fast compensatory processing modifications (lexical-semantic levels, linguistic reanalysis) in signal amplitude after brief exposure to foreign-accented speech.

### Feedback and Accent Learning

Banks et al. compare the role of audio-only and audio-visual speech recognition on perceptual adaptation in a large sample of healthy subjects. Contrary to predictions, and although recognition of the novel accent using audio-visual speech cues was better than recognition on the basis of audio-only cues, no differences were found in perceptual gains between the two modalities. Therefore, more is not always better, at least in novel accent perceptual adaptation. Katz and Mehta used real-time visual feedback of tongue movements with an interactive 3D visualization system based on electromagnetic articulography. They show that this method strengthens learning of non-native speech sounds in healthy speakers. Hopefully, "tongue reading" using computer-assisted pronunciation training opens new avenues for L2 accent learning as well as for the improvement of several speech production disorders (stuttering, apraxia of speech, FAS). The conclusions of these two studies, together with data from Christiner and Reiterer, seem to support the view that perception alone, be it auditory or audio-visual, does not suffice for successful accent learning. On the contrary, it seems that intensive vocal practice (e.g., as in vocalists) and visual feedback on the production of motor movements (e.g., through a tongue avatar) may facilitate accent learning and imitation.

# STUDIES IN PATIENTS WITH CHANGES IN ACCENT

The increased number of reports on FAS during the past decade (Gurd and Coleman, 2006; Moreno-Torres et al., 2013) has led to a better definition of the three different types of FAS (developmental, psychogenic, and neurogenic; Verhoeven and Mariën, 2010). Several cases describing these subtypes and included in the Research Topic are summarized below.

### Developmental Foreign Accent Syndrome

Keulen, Mariën, Wackenier, et al. report the case of an adolescent male with developmental FAS (DFAS) who had no family history of developmental disorders and no abnormalities on psychiatric evaluation or cognitive testing, except for impaired executive functions (non-verbal planning). A functional neuroimaging study with single photon emission computerized tomography (SPECT) showed a significant decrease of blood flow in the medial prefrontal and lateral temporal regions bilaterally. The authors examine the boundaries between DFAS and developmental apraxia of speech (DAS) and, since hypoperfusion approached statistical significance in the cerebellum as well, they suggest that both disorders might be related to dysfunction in cerebro-cerebellar connections. Berthier et al. describe two adult males who presented with long-standing mild DFAS and internalizing psychiatric disorders (obsessions, anxiety, social phobia), which may suggest a psychogenic origin. Nevertheless, both subjects showed structural brain anomalies (venous malformation and expanded perivascular spaces) and diffusion tensor imaging additionally disclosed microstructural abnormalities in speech and emotion regulation networks. These results emphasize the need to use modern neuroimaging methods to detect subtle brain abnormalities in cases with a provisional diagnosis of psychogenic FAS (PFAS).

### Psychogenic Foreign Accent Syndrome

Keulen et al. report three studies on PFAS (Keulen, Verhoeven, Bastiaanse, et al.; Keulen, Verhoeven, De Page, et al.; Keulen, Verhoeven, De Witte, et al.). In a review article, Keulen, Verhoeven, De Witte, et al. examine the extant literature (1907–2014) on the psychogenic subtype. This paper provides clues for its diagnosis in clinical practice and defends the relevance of classifying psychogenic cases as belonging to an independent category. Whitaker (1982) coined the term "foreign accent syndrome" and set out the initial criteria for its diagnosis in neurological cases. Keulen, Verhoeven, Bastiaanse, et al. consider that these early diagnostic recommendations were too restrictive and claim that a set of broader inclusion criteria is desirable to incorporate psychogenic cases. Finally, Keulen, Verhoeven, De Page, et al. report a new case of PFAS in a patient with head trauma. The absence of gross structural brain damage on neuroimaging, coupled with the presence of a complex neuropsychiatric disorder, led the authors to assign the label of psychogenic. Cases like this revive the debate centered on the psychogenic and organic origins of FAS.

### A New Variant of Neurogenic Foreign Accent Syndrome

Variants of FAS have been described, including changes in regional accent (e.g., from Parisian accent to Alsatian accent), stronger regional accent, and re-emergence of a previously learned and dormant regional accent. In this Research Topic, Berthier et al. describe a new variant of FAS in three adult males who, after recovering from Broca's aphasia, lose their regional accent. This study shows that focal lesions in the middle part of the left motor cortex and adjoining regions seem to be crucial in altering the neural processes implicated in the production of regional accent features.

# Synthesis and Directions for Future Research

The articles in this Research Topic reveal that accent is not merely accessory to language but rather a fundamental component of it. Acquiring a new accent at an early age is easy, partly thanks to the great flexibility of the neural networks supporting accent learning. Nevertheless, the acquisition of a new accent after childhood is very difficult because of the well-known reduction in plastic capability of the networks underpinning vocal learning. This means that healthy subjects who want to learn a new language, or those who suffer changes in their native accent (e.g., FAS) as a result of psychiatric and/or neurological disorders, will require external support to acquire or recover a native-like accent. Therefore, it is necessary to further increase our knowledge of the neuroscience of accent so as to develop new training strategies focused on accent.

# AUTHOR CONTRIBUTIONS

IMT and MB contributed to the design of the work and drafted the editorial. PM and GD revised the draft for important intellectual content and contributed to the interpretation of the work. IMT, PM, GD, and MB approved the final version to be published.

#### ACKNOWLEDGMENTS

The authors of this Editorial thank the contributing authors, who have worked hard to comply with deadlines, and the reviewers of this Research Topic for their efficient work. IMT has been partly supported by a grant from the Spanish Ministerio de Economía e Innovación (FFI2015-68498P).

#### REFERENCES

Whitaker, H. A. (1982). "Levels of impairment in disorders of speech," in Neuropsychology and Cognition, Vol. 1, eds R. N. Malatesha and L. C. Hartlage (The Hague: Nijhoff), 168–207.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Moreno-Torres, Mariën, Dávila and Berthier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A hypothesis on improving foreign accents by optimizing variability in vocal learning brain circuits

#### Anna J. Simmonds\*

Division of Brain Sciences, Computational, Cognitive and Clinical Neuroimaging Laboratory (C3NL), Imperial College London, London, UK

Rapid vocal motor learning is observed when acquiring a language in early childhood, or learning to speak another language later in life. Accurate pronunciation is one of the hardest things for late learners to master and they are almost always left with a non-native accent. Here, I propose a novel hypothesis that this accent could be improved by optimizing variability in vocal learning brain circuits during learning. Much of the neurobiology of human vocal motor learning has been inferred from studies on songbirds. Jarvis (2004) proposed the hypothesis that as in songbirds there are two pathways in humans: one for learning speech (the striatal vocal learning pathway), and one for production of previously learnt speech (the motor pathway). Learning new motor sequences necessary for accurate non-native pronunciation is challenging and I argue that in late learners of a foreign language the vocal learning pathway becomes inactive prematurely. The motor pathway is engaged once again and learners maintain their original native motor patterns for producing speech, resulting in speaking with a foreign accent. Further, I argue that variability in neural activity within vocal motor circuitry generates vocal variability that supports accurate non-native pronunciation. Recent theoretical and experimental work on motor learning suggests that variability in the motor movement is necessary for the development of expertise. I propose that there is little trial-by-trial variability when using the motor pathway. When using the vocal learning pathway variability gradually increases, reflecting an exploratory phase in which learners try out different ways of pronouncing words, before decreasing and stabilizing once the "best" performance has been identified. 
The hypothesis proposed here could be tested using behavioral interventions that optimize variability and engage the vocal learning pathway for longer, with the prediction that this would allow learners to develop new motor patterns that result in more native-like pronunciation.

Keywords: foreign accent, vocal learning, motor learning, non-native speech, language learning, variability, striatum

#### INTRODUCTION

## Vocal Learning

Vocal learning is the ability to imitate sounds that are heard, as opposed to producing innate vocalizations. Most mammals are not vocal learners and can only produce innate calls that remain unmodified throughout life (Petkov and Jarvis, 2012). Instead they are auditory learners and through experience can readily distinguish environmental sounds, making an appropriate response to what is heard, e.g., a command to "sit", without the ability to produce it (Jarvis, 2004, 2006). In contrast, humans are highly skilled auditory and vocal learners. We are not born with speech and must learn by listening and practicing. Much of the neurobiology of vocal learning has been inferred from studies on songbirds and there are clear anatomical parallels between song learning birds and humans (**Figure 1**). Humans and songbirds both have a direct projection from motor cortex to motor neurons in the brainstem controlling movements required for vocalizations (larynx in humans and trachea and syrinx in songbirds). This projection is absent in non-learning birds, such as chickens, and in non-vocal learning primates, such as macaque monkeys (Petkov and Jarvis, 2012; **Figure 1**). Vocal learning, and motor learning more generally, involves the basal ganglia, which is the focus of the hypothesis presented here. It has been shown that basal ganglia circuitry is involved to a greater extent in motor learning than in the performance of acquired behaviors (Hikosaka et al., 1999, 2002). Important distinctions have also been made between different regions within the basal ganglia at different stages of motor learning, with the anterior striatum being involved in learning and the posterior striatum in production of overlearned automatic movements (Miyachi et al., 1997; Jueptner and Weiller, 1998; Graybiel, 2008; Yin et al., 2009). The hypothesis presented here focuses on the learning of foreign speech, which requires novel motor movements rather than the previously acquired, familiar articulatory movements used for native speech.

FIGURE 1 | Direct and indirect vocalization pathways in complex-vocal learners, limited-vocal learners and vocal non-learners. Schematic of a songbird brain (A) and a human brain (B) showing the vocal motor pathway (blue arrow), the vocal learning pathway (white) and the laryngeal motorneurons (red). Also shown in (B) is the limbic vocal pathway for producing innate vocalizations (black). (C) Schematic of a vocal non-learning bird revealing the absence of forebrain song nuclei. (D) Schematic of limited-vocal learning monkeys showing presence of forebrain regions for innate vocalization and also of an indirect projection from a ventral premotor area (Area 6vr) to laryngeal motorneurons. Abbreviations: ACC, anterior cingulate cortex; Am, nucleus ambiguus; Amyg, amygdala; AT, anterior thalamus; Av, nucleus avalanche; DLM, dorsolateral nucleus of the medial thalamus; DM, dorsal medial nucleus of the midbrain; HVC, high vocal center; LMAN, lateral magnocellular nucleus of the anterior nidopallium; LMC, Laryngeal Motor Cortex; OFC, orbito-frontal cortex; PAG, periaqueductal gray; RA, robust nucleus of the arcopallium; RF, reticular formation; vPFC, ventral prefrontal cortex; VLT, ventro-lateral division of thalamus; XIIts, bird twelfth nerve nucleus. Figure as originally published in Petkov and Jarvis (2012), reproduced with permission.

#### Edited by:

Ignacio Moreno-Torres, University of Malaga, Spain

#### Reviewed by:

Erich David Jarvis, Duke University Medical Center, USA; Jon Sakata, McGill University, Canada

\*Correspondence: Anna J. Simmonds anna.simmonds08@imperial.ac.uk

Received: 18 June 2015 Accepted: 20 October 2015 Published: 04 November 2015

#### Citation:

Simmonds AJ (2015) A hypothesis on improving foreign accents by optimizing variability in vocal learning brain circuits. Front. Hum. Neurosci. 9:606. doi: 10.3389/fnhum.2015.00606

## Speech Acquisition in Infancy

Human infants begin speech acquisition by listening to speech in their environment. They are skilled both in auditory learning, memorizing the communicative sounds of people they interact with, and in vocal learning, from babbling and single word production to articulating well-formed sentences. Stages of speech development start at a universal level: an infant has the ability to learn any language and will start learning the language to which they are exposed. At around 7 months for perception and 10 months for production, speech becomes language-specific. Although infants produce non-speech sounds from birth and vowel-like sounds at around 3 months, canonical babbling does not appear until around 7 months. Language-specific speech production is observed at around 10 months and word production at around a year (Kuhl, 2004; Simmonds et al., 2011b).

## Speech Acquisition Later in Life

In contrast, when older children and adults begin learning a foreign language, they do not start with a perception phase, a period of listening to language without attempting production of speech sounds. Instead they begin producing speech early on in the learning process, at the same time as undergoing auditory learning. Unlike infants, older learners do not undergo a babbling phase but move straight to word meaning and phrase production, which is influenced by the native language. Using a listening task in bilinguals who learnt a second language after the age of 12, it has been shown that there is a strong tendency to translate a word in a foreign language (L2) into its native (L1) equivalent (Thierry and Wu, 2007; Wu and Thierry, 2010). Similarly, during L2 covert word production, both L2 and L1 phonological representations are retrieved (Wu and Thierry, 2011). Proficiency in vocabulary and grammar is essential, but can be learnt instructively, for example from books. However, acquiring a native-like accent requires repeated motor practice, with the accuracy of articulation dependent on repeated attempts to match auditory exemplars of correct pronunciation. Even then, there is considerable inter-individual variability in achieving accurate pronunciation, both in terms of learning strategies and in attainment (Bley-Vroman, 1990), and individual differences in performance have been shown to correlate with structural brain differences (Golestani and Pallier, 2007; Golestani et al., 2007). The challenge of speaking a foreign language is a problem faced by students and teachers of second language education around the world, and pronunciation errors substantially affect communication skills.

This challenge has effects on both the spoken performance in a foreign language, and the neural systems involved. The ''nativelikeness'' of an accent, as judged by native speakers, declines over time as the age at which the speaker starts using the foreign language increases. Italian immigrants arriving in the US were deemed to have a native-like accent if they arrived before the age of two, whereas those arriving as teenagers or young adults had accents that clearly marked them as non-native speakers (Flege, 1995). Perhaps one of the most famous examples of a marked foreign accent in a highly proficient user of a foreign language is Józef Teodor Konrad Korzeniowski, better known by his anglicized name, Joseph Conrad. As a late learner of English as a foreign language he mastered the language to such an extent that he was able to produce great works of fiction in English (his third language), yet was left with such a thick Polish accent that he was reported to be incomprehensible. Scovel (1988) coined the term the ''Joseph Conrad phenomenon'', referring to the mismatch between lexical, morphological and syntactic proficiency, and pronunciation. Even for highly proficient bilinguals, having learnt a language later in life results in differences in activation patterns during speech production. Speaking in a non-native, relative to native, language requires greater engagement of motor-sensory control systems (Simmonds et al., 2011a).

In addition to age at the time of learning, other factors claimed to affect the degree of foreign accent include gender, amount of time spent in an L2-speaking environment, amount of L1 and L2 use, formal instruction, motivation and language learning aptitude (Piske et al., 2001). Another explanation for the failure to acquire the native accent in a foreign language is that late bilinguals use the same syllable representation for both of their languages, which results in producing non-native L1-like patterns in their L2. In contrast, early bilinguals have separate representations for their two languages, even for syllables that are shared across the languages (Alario et al., 2010). The present article presents a novel hypothesis on what might explain the persistent accent in late language learners and considers how it could be improved. The hypothesis is informed by findings from vocal learning research in songbirds and motor learning more generally, as well as our previous work particularly focusing on the response of the anterior striatum during adult human vocal learning (Simmonds et al., 2014). Although the anterior striatum was initially active during production of unfamiliar foreign speech, activity in this region rapidly declined. The decline in the striatum happened over the course of the first scanning session, even before formal training. No decline was found for pronunciation of native non-word stimuli, indicating that the reduction was not an effect of novelty. These findings suggest that late language learners do not maintain use of the vocal learning pathway during learning. Although no direct comparison has been made between early and late language learners in terms of activity in the basal ganglia-forebrain-thalamic circuit, a likely finding would be that early learning of a native language would engage this circuit. However, without research on human infants during speech acquisition, this remains speculative.

# Parallels Between Song Learning Birds and Humans for Song and Speech

As discussed above, humans are highly skilled auditory and vocal learners. Vocal learning also exists in parrots and oscine songbirds (order: Passeriformes; Mooney, 2009; Petkov and Jarvis, 2012), hummingbirds (Jarvis et al., 2000), and to a far lesser degree, some of the traits associated with vocal learning also exist in mice (Arriaga and Jarvis, 2013). The hypothesis presented here is grounded in findings from the avian literature on song learning. There are a number of neural and behavioral parallels between humans and songbirds (see Doupe and Kuhl, 1999; Mooney, 2009; Fee and Goldberg, 2011; Sakata and Vehrencamp, 2012; Brainard and Doupe, 2013; Bertram et al., 2014; Woolley and Kao, 2015). In the same way as human infants learning speech, songbirds also begin vocal learning with a perception phase, during which they listen to songs from a tutor (Doupe and Kuhl, 1999; Brainard and Doupe, 2000; Konishi, 2004). Without exposure to adult song, production of accurate vocalizations is not possible. The production phase in songbirds begins with ''subsong'', (similar to human babbling), before moving onto ''plastic song'' (while they practice what they are learning), before ''crystallized'' song (the equivalent of human native speech) appears. During the plastic song stage, songbirds use trial-and-error learning to adjust their vocal performance until the auditory feedback from their vocal output matches the auditory templates acquired during the auditory learning phase (Brainard and Doupe, 2000; Mooney, 2009; Bolhuis et al., 2010).

As well as similarities in the developmental progression of learning, human speech learning and birdsong acquisition have parallels at the neural and genetic levels (Jarvis et al., 2005; Ölveczky et al., 2005; Bolhuis et al., 2010; Pfenning et al., 2014). A recent gene expression study examined transcriptional specializations in humans and song-learning birds and found that the songbird RA nucleus is most similar to layer 5 neurons of human laryngeal motor cortex (LMC; Pfenning et al., 2014). The songbird Area X in the striatum is most similar to a region within the human anterior striatum (Pfenning et al., 2014), and data from our recent vocal learning study on humans support this finding (Simmonds et al., 2014). The songbird HVC is similar to layers 2 and 3 neurons of primary motor cortex, and thereby possibly also to LMC; songbird LMAN has a weak similarity to Broca's area that requires further investigation for confirmation; DLM (dorsolateral nucleus of the medial thalamus) is most similar to the human anterior thalamus necessary for speech learning and production (Jarvis, 2004; Petkov and Jarvis, 2012).

In this article I present a hypothesis on how foreign accents could be improved by optimizing variability in vocal learning brain circuits, followed by support for the hypothesis, drawing on the literature on variability in songbird vocal learning and variability in motor learning. The article concludes with approaches for testing the hypothesis.

# HYPOTHESIS (FIGURE 2)

The hypothesis presented here is that, as songbirds do, humans have a vocal learning pathway that controls neural and behavioral variability and the influence of this pathway is reduced in older learners, which leads to an inability to master the native accent when learning new languages. Furthermore, if this variability can be optimized in late learners, vocal learning could perhaps be more complete and thereby reduce or eliminate the foreign accent. The focus here is on variability in the acoustic structure of speech, rather than sequencing or timing variability.

# Hypothesis Part 1: The Vocal Learning Pathway in Late Language Learners Becomes Inactive Too Early in the Learning Process and Prevents Accurate Pronunciation in a Foreign Language

In 2004, Erich Jarvis (Jarvis, 2004, 2007) put forward the hypothesis that as in songbirds there exist two pathways in humans: one for vocal learning, and one for production of previously learnt speech. Learning novel motor sequences that are necessary for accurately pronouncing foreign speech is a challenge, and in this article, I argue that for late learners of a foreign language, the vocal learning pathway becomes inactive too early in the learning process, engaging the motor pathway once again. Consequently these late learners do not acquire novel sequences of articulatory movements for the new speech; instead they adapt existing production sequences, which results in speaking the new language with an accent influenced by their own first language, rather than mastering the native-like accent of the target language.

**Figure 2A** presents a simplified diagram of the motor and vocal learning pathways in songbirds and humans. In both songbird pathways the HVC ultimately projects to motor neurons in the brainstem (the nXIIts), which then project to the vocal muscles for vocalization. In the vocal motor pathway, the HVC projects directly to the RA, which in turn makes a direct projection to brainstem vocal motor neurons (see **Figure 2A<sup>i</sup>**). The vocal learning pathway (anterior forebrain pathway, AFP) consists of a cortical-basal-ganglia-thalamic loop similar to that in mammals, involving Area X, the DLM and LMAN (Jarvis, 2004, 2006). This loop can be further segregated into lateral and medial loops, both receiving input from HVC into Area X, but with different outputs. The output of the lateral loop is from LMAN to RA; the output of the medial loop is from MMAN (medial magnocellular nucleus of the anterior nidopallium) to HVC (Jarvis, 2006). The HVC continues developing until month four post-hatch, near the end of the plastic-song stage (Alvarez-Buylla et al., 1992).

In songbirds the vocal learning pathway is involved during the acquisition of the song pattern and remains important for the modulation of song across social contexts. The vocal motor pathway is involved in producing the learned song (Nottebohm, 2005), and during the plastic song stage in juveniles both pathways interact (Ölveczky et al., 2005). Subsong in juvenile birds does not require HVC, a key premotor area for singing in adult birds, but does require activity in RA and LMAN, which is involved in learning but is not necessary for adult singing of an established song (Aronov et al., 2008). Therefore the relative contributions of the vocal motor and learning pathways seem to change across development in songbirds. It is likely that a similar shift in balance between the two pathways occurs in humans at different stages of learning. I suggest that in late learners of a foreign language, the vocal learning pathway is involved to a greater extent at the beginning of the learning phase but before learning is complete, the balance in activity between the two circuits shifts more to the motor pathway once again, which prevents accurate learning of pronunciation.

''Closed-ended learners'', such as the zebra finch, are unable to learn a new song in adulthood, even with an intact AFP (Brainard and Doupe, 2002; Funabiki and Funabiki, 2009), as the song they learn becomes crystallized at around 90 days post-hatch and remains stable throughout adulthood (Brainard and Doupe, 2002). An ''open-ended learner'', such as a canary, is able to repeat the learning process in adulthood (Nottebohm et al., 1976; Brainard and Doupe, 2002). If a region within the vocal learning pathway is lesioned in an adult open-ended learner, the bird can continue to produce song it had previously learnt, but is unable to learn a new song (Brainard and Doupe, 2000, 2002; Brainard, 2004). In humans, subcortical structures including the basal ganglia, similar to regions within the songbird AFP, modulate production of overlearned language (e.g., poems or quotations), automatic speech (e.g., counting or reciting the days of the week) and formulaic expressions or fillers (Bridges et al., 2013). Patients with lesions in these regions produce fewer examples of formulaic language than controls (Sidtis et al., 2009). This suggests that overlearned language relies more on subcortical structures than novel language does, perhaps reflecting less reliance on the vocal learning pathway in later language learning.

# Hypothesis Part 2: Variability in Neural Activity Within Vocal Motor Circuitry Generates Vocal Variability that Supports the Acquisition of Native-Like Pronunciation in a Foreign Language

Further, I suggest that prolonged random variation is an essential prerequisite for vocal learning, and optimal variability within the vocal learning pathway generates vocal variability and supports accurate pronunciation with a native-like accent. Activity within the vocal learning pathway in adult songbirds remains important for real-time generation of spectral variability necessary for adapting the song based on different social contexts. In songbirds, vocal variability is actively injected into the premotor song-control region RA (robust nucleus of the arcopallium) by the LMAN (lateral magnocellular nucleus of the anterior nidopallium), which is the output of the vocal learning pathway (Goldberg and Fee, 2011). The LMAN is not necessary for the production of song, only learning and modification to it. When LMAN neurons are inactive, the vocal motor pathway produces an accurate, established pattern. When the LMAN is active during song production, there is much more variability in the song. This variability is needed to reach accurate imitation of a pattern. I argue that in humans, strategies that increase the variability of neural activity in the vocal learning pathway may increase behavioral variability and exploration and promote more successful learning.

**Figure 2B** presents suggested levels of vocal variability when using the two pathways. I suggest that when using the motor pathway, production is stable, with little trial-by-trial variability. When using the vocal learning pathway, trial-by-trial variability gradually increases, reflecting an exploratory phase in which the learners try out different ways of pronouncing the words, before decreasing and stabilizing once the ''best'' performance has been identified.

# SUPPORT FOR THE HYPOTHESIS

In this article I argue that if variability can be optimized in late language learners, vocal learning could perhaps be more complete and thereby enable mastery of a native-like accent in the foreign language. It is not simply that variability in vocal learning needs to increase. Too much variability, or noise, prevents learning just as too little does (Faisal et al., 2008). Therefore, for effective learning it is necessary to optimize the amount of variability. By trying different versions of producing the target, a learner is able to monitor outcomes and refine the movement sequences that result in the most desired outcome. This is true in songbirds and is likely true in humans as well.
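The idea that learning fails with both too little and too much variability can be illustrated with a toy simulation. The sketch below is not a model from the literature cited here: it assumes a hypothetical one-dimensional motor plan, Gaussian exploratory noise with standard deviation `sigma`, and a simple rule that reinforces any attempt that beats a running average of recent errors. All names and parameter values are assumptions chosen for illustration.

```python
import random


def final_error(sigma, trials=2000, target=1.0, lr=0.2, seed=0):
    """Mean absolute output error over the last 200 trials of a toy learner.

    Hypothetical model: `plan` is a stored 1-D motor plan, and each trial
    adds Gaussian exploratory noise with standard deviation `sigma`.
    """
    rng = random.Random(seed)
    plan = 0.0                       # starting plan, far from the target
    baseline = abs(plan - target)    # running estimate of recent error
    errors = []
    for _ in range(trials):
        output = plan + rng.gauss(0.0, sigma)  # the "trial": plan plus noise
        err = abs(output - target)             # the "error": distance to target
        if err < baseline:                     # better than usual: reinforce it
            plan += lr * (output - plan)
        baseline += 0.1 * (err - baseline)     # update the running baseline
        errors.append(err)
    return sum(errors[-200:]) / 200


if __name__ == "__main__":
    # Compare no noise, moderate noise and very large noise.
    for sigma in (0.0, 0.3, 5.0):
        print(f"sigma={sigma}: final error {final_error(sigma):.2f}")
```

Under these assumptions, a learner with no noise never moves away from its starting plan, a learner with very large noise produces outputs that scatter widely around the target, and an intermediate noise level gives the lowest final error.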

## Variability in Songbird Vocal Learning

A critical amount of noise within the song production pathway is necessary during song learning (Doya and Sejnowski, 1995; Ölveczky et al., 2005) and song variability is generated by the AFP (Woolley and Kao, 2015). This variability in the AFP has been shown to correlate with performance variability (Kao et al., 2005; Woolley and Kao, 2015). Although a critical amount of noise appears essential for songbird learning, optimal learning will only occur within the appropriate level of noise for a given stage of learning. There is a reduction in variability within the AFP as the song crystalizes, although some neural and vocal stochastic variability is present even in adult songbirds with apparently stable song (Kao et al., 2005; Kao and Brainard, 2006; Andalman and Fee, 2009). Using altered auditory feedback in adult songbirds, Tumer and Brainard observed that birds were able to learn how their song changed as a result of small variations in vocal performance (Tumer and Brainard, 2007). They suggest that residual variability that persists in well-learned skills reflects motor exploration as part of the trial-and-error learning and monitoring processes, and that this helps to support continuous learning and optimization of performance.

Within the AFP song learning pathway, lesions to Area X have little or no effect on song variability during the vocal babbling stage (Goldberg and Fee, 2011), but when Area X is lesioned in juveniles, the song does not fully crystallize as they become adults and instead remains variable (Sohrabji et al., 1990; Scharff and Nottebohm, 1991). In contrast, LMAN inactivation results in reduced, almost absent variability in song in juveniles and adults (Kao et al., 2005; Ölveczky et al., 2005; Kao and Brainard, 2006; Aronov et al., 2008; Thompson et al., 2011). Young birds at an early stage of song development, which have the most variable song performance, show the greatest reduction in song variability following LMAN inactivation (Ölveczky et al., 2005). Similarly, during vocal babbling in juveniles a lesion to the DLM, part of the thalamus that receives output from the basal ganglia, almost completely removes variability and causes the birds to produce a stable stereotyped song (Goldberg and Fee, 2011).

A decrease in variability has also been observed following lesions to the dorsal arcopallium, adjacent to RA, by authors who suggest this to be an auditory region involved in song learning (Bottjer and Altenau, 2010). However, this region, along with other brain areas adjacent to the vocal systems of vocal learning birds, has been shown to be active during limb and body movements (Feenders et al., 2008). This suggests that the systems involved in vocalizations are controlled by a cerebral motor system. Although a similar auditory pathway exists in both vocal learners and non-learners, vocal learners have a specialized vocal motor system that enables auditory input to be translated into vocal signals (Feenders et al., 2008). A recent electrophysiology and lesion study supports this motor hypothesis, again showing motor behavior and movement control of this region (Mandelblat-Cerf et al., 2014). Further support for the motor hypothesis comes from Pfenning et al. (2014) who, using gene expression, found that the molecular profile of this region is similar to that of the motor and premotor cortex in primates, and not the auditory cortex. Therefore, the variability observed by Bottjer and Altenau (2010) may be similar to that found in RA and motor pathways.

In trial-and-error learning in juvenile songbirds the ''trial'' is represented by the variability in the song, reflecting the motor exploration phase, and the ''error'' is represented by evaluation of song performance, based on auditory feedback (Tumer and Brainard, 2007; Andalman and Fee, 2009; Sober and Brainard, 2009; Fee and Goldberg, 2011). Such variability is necessary for reinforcement-based trial-and-error learning, as the learning process requires exploration of a range of action sequences, evaluation of performance with each and modifications to behavior that result in improved performance (Ölveczky et al., 2005).

Even in crystallized song in adult birds, trial-by-trial variability persists. This variability supports ongoing motor exploration, which maintains performance and makes modifications when necessary (Tumer and Brainard, 2007). Song variability is also context-dependent. During ''directed'' song, in which a male sings a courtship song to a female, the sequencing and structure of syllables are much less variable than when the male sings alone (''undirected'' song; Kao et al., 2005; Ölveczky et al., 2005; Kao and Brainard, 2006; Teramitsu and White, 2006; Sakata et al., 2008; Kojima and Doupe, 2011; Woolley et al., 2014). This suggests that singing alone reflects a practice state of exploratory vocal learning, and directed singing reflects a performance state, in which the male produces the best rendition of their song they memorized during the sensitive period in development (Kojima and Doupe, 2011). LMAN activity is much greater and more variable during undirected song than during directed song (Hessler and Doupe, 1999; Kao et al., 2005; Kao and Brainard, 2006; Kojima and Doupe, 2011; Brainard and Doupe, 2013; Woolley and Kao, 2015) and a lesion to LMAN removes the variability and causes undirected singing to be much more consistent (Kao et al., 2005; Kao and Brainard, 2006; Hampton et al., 2009).

## Variability in Motor Learning

The hypothesis proposed here is also supported by findings from research on motor learning more generally. Noise in general motor learning (not just vocal learning) has been defined as a mismatch between expected and actual sensory feedback that is not necessarily related to performance errors (Faisal et al., 2008). Recent theoretical and experimental work suggests an important role for noise, termed stochastic facilitation, in motor learning, i.e., variability or noise in the motor movement is necessary for the development of expertise (McDonnell and Ward, 2011; Mendez-Balbuena et al., 2012). Stochastic processes, introducing variability in the execution of motor movements, permit a full exploration of the learning space. Motor learning involves an ''exploration'' phase, during which trial-and-error learning is performed to identify the optimal movement for a successful outcome. Once that is identified, the learner moves into the ''exploitation'' phase, in which they continue producing that movement until the necessary outcome is achieved. Motor learning therefore involves a tradeoff between performing multiple movements to find the one that most reliably produces the desired outcome, and continuing to produce that movement once it has been identified (Müller and Sternad, 2009; Ravbar et al., 2012). During the exploration phase performance is highly variable, and it becomes more consistent when the average performance is closer to the target outcome, suggesting that variance decreases with the bias (Müller and Sternad, 2009; Ravbar et al., 2012). The tradeoff between exploration and stabilization is not the same throughout the learning process. When learning continuous actions (such as dancing), different components of the action may need exploratory variability while others, which may be closer to the target, require stabilization (Doya, 2000). 
With this type of approach, breaking the movements down into segments would allow variability to be regulated locally so that only those parts of the action that need to change the most undergo exploration, i.e., learning based on the local bias (Doya, 2000; Ravbar et al., 2012).
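The notion of regulating variability locally, so that exploration is concentrated on the parts of an action that are furthest from their target, can be sketched as follows. This is a hedged illustration rather than the procedure used by Doya (2000) or Ravbar et al. (2012): the functions `explore_sd` and `learn_segments`, the linear relation between noise and bias, and the assumption that the learner can read its own per-segment error are all hypothetical simplifications.

```python
import random


def explore_sd(bias, gain=0.5, floor=0.01):
    """Exploration noise (standard deviation) grows with the local bias."""
    return floor + gain * abs(bias)


def learn_segments(plan, target, trials=500, seed=1):
    """Refine each segment of an action independently by trial and error.

    Segments far from their target explore widely; segments already on
    target receive only the small `floor` noise and stay stable.
    """
    rng = random.Random(seed)
    plan = list(plan)
    for _ in range(trials):
        for i, goal in enumerate(target):
            bias = plan[i] - goal                          # local error
            attempt = plan[i] + rng.gauss(0.0, explore_sd(bias))
            if abs(attempt - goal) < abs(bias):            # keep improvements only
                plan[i] = attempt
    return plan


if __name__ == "__main__":
    # Three segments; the middle one starts exactly on target.
    print(learn_segments([0.0, 1.0, 2.0], [1.0, 1.0, 1.0]))
```

In this sketch the segment that starts on target is never altered, while the two segments that start far from it explore broadly at first and narrow their variability as their bias shrinks.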

Individual differences in the amount of motor variability have been associated with the ability to learn or adapt motor skills (Sober and Brainard, 2012; Wu et al., 2014) and models of trial-and-error learning suggest that previous performance can predict the amount of variability in the motor output (Kao et al., 2008). This suggests that motor ''noise'', or variability, is a central component of motor learning (Herzfeld and Shadmehr, 2014). Neural variability is also an indicator of motor learning. As motor habits form, spike firing in the ventromedial striatum peaks at the beginning and end of the motor sequence, and changes to this firing have been suggested to be a sign of learning (Howe et al., 2011). In non-vocal motor learning in rodents, using a reward-based conditional T-maze task, spiking of striatal neurons has been shown to be highly variable at the initial stage of learning, but following training became more consistent (Barnes et al., 2005). The variable firing rate during learning is considered to represent ''neural exploration'', whereas the stable firing after learning reflects ''neural exploitation''.

# TESTING THE HYPOTHESIS

This converging literature from research on songbird vocal learning and more general motor learning motivated our previous work, which suggests that in late learners of a second language, the vocal learning pathway may become inactive too early, ending the motor learning phase prematurely. Instead, the motor pathway is recruited once more, which results in the learner producing the original native motor patterns for speech; this results in speaking with a foreign accent (Simmonds et al., 2014). The hypothesis proposed here could be tested using behavioral interventions that keep speakers in the learning phase (engaging the vocal learning pathway) for longer, with the prediction that this would allow them to develop new motor patterns that result in more native-like accuracy of pronunciation. This could be investigated using strategies that induce neural and behavioral variability, such as altering the auditory feedback that learners receive. Disrupting auditory feedback in songbirds results in rapid changes to learned song (Tumer and Brainard, 2007; Andalman and Fee, 2009; Hoffmann and Sober, 2014), although variability itself did not increase in these studies. This suggests that altering auditory feedback induces experimentally controlled ''errors'' and changes in song performance (Tumer and Brainard, 2007; Andalman and Fee, 2009; Fee and Goldberg, 2011). Sakata and Brainard (2008) have also found populations of neurons that appear to be sensitive to auditory feedback. Dramatic changes to auditory feedback can increase song variability and decrystallize the song (Leonardo and Konishi, 1999). Using Bengalese finches, Woolley and Rubel have demonstrated that temporary deafening leads to the rapid deterioration of syllable structure and an increase in vocal variability, but once hearing is restored, song is produced normally again (Woolley and Rubel, 1997, 2002). 
Therefore, although altered auditory feedback disrupts speech production, the auditory template of the acoustic target could remain intact. Assessing speech perception as well as production would identify whether the motor pattern or auditory target has been impaired.

Altered auditory feedback has also been shown to affect vocal production in humans (Houde and Jordan, 1998; Jones and Munhall, 2005; Tourville et al., 2008; Lametti et al., 2012; Kort et al., 2014; Ogane and Honda, 2014), although its role in language learning has not been explored. Different types of feedback could be used to investigate different ways of modulating variability during vocal learning, manipulating cognitive and motor processes to promote variability. Types of auditory feedback could include frequency-altered or delayed feedback, background noise or white noise. Behavioral variability could be assessed by analyzing the acoustic properties of participants' speech, including simple measures of intensity, duration and frequency, as well as correlations of the long-term spectra of specific words and characterization of formants. Somatosensory feedback could also be manipulated, for example by altering jaw movements during speech, which has been shown to result in a mismatch between the expected sensations and the sensory feedback actually received, causing somatosensory error signals that lead to compensatory movements (Tourville et al., 2005; Guenther et al., 2006). Some speakers rely more on auditory feedback information and others rely more on somatosensory feedback (Lametti et al., 2012). Investigating a range of alterations to feedback would allow optimization of variability.
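One simple way to quantify trial-by-trial behavioral variability from acoustic measurements of this kind is the coefficient of variation across repetitions of the same word. The sketch below assumes that duration and fundamental frequency have already been extracted for each repetition; the field names and the numbers are made up for illustration and do not come from any study cited here.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (a unitless spread)."""
    return stdev(values) / mean(values)


def trial_variability(trials):
    """Map each acoustic measure to its across-trial coefficient of variation.

    `trials` is a list of dicts, one per repetition of the same word,
    all sharing the same keys.
    """
    measures = trials[0].keys()
    return {m: coefficient_of_variation([t[m] for t in trials]) for m in measures}


# Hypothetical measurements for three repetitions of one word.
repetitions = [
    {"duration_ms": 400.0, "f0_hz": 200.0},
    {"duration_ms": 440.0, "f0_hz": 210.0},
    {"duration_ms": 360.0, "f0_hz": 190.0},
]

if __name__ == "__main__":
    for measure, cv in trial_variability(repetitions).items():
        print(f"{measure}: CV = {cv:.3f}")
```

Tracking such a measure across sessions would indicate whether a given feedback manipulation raises or lowers variability for a particular word, though the choice of measure and the number of repetitions needed would have to be established empirically.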

Using continuous speech at the sentence level would allow evaluation of performance to be carried out locally, focusing on specific words or phonemes. Rather than aiming to adapt a speaker's overall level of variability, altered feedback could be used to only induce motor exploration in sounds that need to change. By assessing an individual's speech, feedback manipulations could be developed to only occur for certain words. This type of approach has previously been investigated in zebra finches by manipulating song learning so that only a specific part of the song requires vocal exploration. Ravbar et al. (2012) found no apparent increase in the variability of one syllable when a second first appeared, demonstrating that the bird was able to rapidly switch between performing a highly stereotyped and a highly variable syllable.

The hypothesis proposed here could also be tested using neurobiologically-plausible computational simulations of the neural systems involved in vocal learning. The known neuroanatomy and structural connections of networks involved in speech production, defined using imaging studies, could be used to create a neuroanatomically-constrained model to simulate behavioral variability and learning effects. This type of model would help explain how neural and behavioral stochastic facilitation, with a focus on the striatum as a mediator, could affect vocal learning and allow us to explore, theoretically, the most effective amount of stochastic variability for successful learning. This would also allow for theoretical investigation of the influence of stochastic processes on learning and to simulate interventions in order to predict the optimal level of induced variability for best learning. Larger projects could then investigate the long-term benefits of these novel strategies for foreign language learning, which could lead to the development of new training materials with a strong evidence base, and discussions with educational policy-makers directing future strategies for improving foreign language learning outcomes.

# FUNDING

This work was supported by The Leverhulme Trust, award number RPG-2013-258.


**Conflict of Interest Statement**: The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Simmonds. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Neural bases of accented speech perception

#### Patti Adank<sup>1,2</sup>\*, Helen E. Nuttall<sup>1</sup>, Briony Banks<sup>2</sup> and Daniel Kennedy-Higgins<sup>1</sup>

<sup>1</sup> Division of Psychology and Language Sciences, Department of Speech, Hearing, and Phonetic Sciences, University College London, London, UK, <sup>2</sup> School of Psychological Sciences, University of Manchester, Manchester, UK

The recognition of unfamiliar regional and foreign accents represents a challenging task for the speech perception system (Floccia et al., 2006; Adank et al., 2009). Despite the frequency with which we encounter such accents, the neural mechanisms supporting successful perception of accented speech are poorly understood. Nonetheless, candidate neural substrates involved in processing speech in challenging listening conditions, including accented speech, are beginning to be identified. This review will outline neural bases associated with perception of accented speech in the light of current models of speech perception, and compare these data to brain areas associated with processing other speech distortions. We will subsequently evaluate competing models of speech processing with regard to neural processing of accented speech. See Cristia et al. (2012) for an in-depth overview of behavioral aspects of accent processing.

#### Edited by:

Guadalupe Dávila, University of Málaga, Spain

#### Reviewed by:

Antoni Rodriguez-Fornells, University of Barcelona, Spain Kristin Van Engen, Washington University in St. Louis, USA

#### \*Correspondence:

Patti Adank, Speech, Hearing and Phonetic Sciences, University College London (UCL), Chandler House, 2 Wakefield St., London WC1N 1PF, UK p.adank@ucl.ac.uk

Received: 13 April 2015 Accepted: 22 September 2015 Published: 06 October 2015

#### Citation:

Adank P, Nuttall HE, Banks B and Kennedy-Higgins D (2015) Neural bases of accented speech perception. Front. Hum. Neurosci. 9:558. doi: 10.3389/fnhum.2015.00558

Keywords: cognitive neuroscience, speech perception, accented speech, fMRI, speech in noise, noise-vocoded speech, time-compressed speech

# Processing Accent Variation at Pre- and Post-lexical Levels

Models outlining the neural organization of speech perception (Hickok and Poeppel, 2007; Rauschecker and Scott, 2009) propose that the locus of processing intelligible speech is the temporal lobe within the ventral stream of speech processing. Rauschecker and Scott suggest that intelligibility processing has its center of gravity in left anterior STS (Superior Temporal Sulcus), while Hickok and Poeppel propose that processing intelligible speech is bilaterally organized and located both anteriorly and posteriorly to Heschl's Gyrus. However, both models are based on intelligible speech perception and do not make explicit predictions about the cortical substrates that subserve speech perception under challenging listening conditions (cf. Adank, 2012a, for a discussion on processing of intelligible speech).

A handful of fMRI studies address how the brain processes accent variation. Listening to difficult foreign phonemic contrasts (e.g., /l/-/r/ contrasts for Japanese listeners) has been associated with increased activation in auditory processing/speech production areas, including left Inferior Frontal Gyrus (IFG), left insula, bilateral ventral Premotor Cortex, right Pre- and Post-Central Gyrus, left anterior Superior Temporal Sulcus and Gyrus (STS/STG), left Planum Temporale (PT), left superior temporal parietal area (Stp), left Supramarginal Gyrus (SMG), and cerebellum bilaterally (Callan et al., 2004, 2014). It is noteworthy that the neural bases associated with listening to foreign languages overlap with those reported for unfamiliar accent processing, including bilateral STG/STS/MTG, and left IFG (Perani et al., 1996; Perani and Abutalebi, 2005; Hesling et al., 2012).

For sentence processing (**Table 1**, **Figure 1**), listening to an unfamiliar accent involves a network of frontal (left IFG, bilateral Operculum/Insula, Superior Frontal Gyrus), temporal (left Middle Temporal Gyrus [MTG], right STG), and medial regions (Supplementary Motor Area [SMA]) (Adank, 2012b; Adank et al., 2012b, 2013; Yi et al., 2014). It is unclear how the accent processing network maps onto the networks in Rauschecker and Scott (2009) and Hickok and Poeppel (2007). The coordinates for accent processing in the left temporal lobe are located anteriorly and posteriorly to Hickok and Poeppel's proposed STG area for spectrotemporal analysis, while the coordinates in left IFG are located inside Hickok and Poeppel's left inferior frontal area assigned to the dorsal stream's articulatory network. In contrast, the temporal coordinates in **Table 1** fit well with Rauschecker and Scott's antero-ventral and postero-dorsal areas placed anteriorly and posteriorly to left primary auditory cortex, respectively, and the left IFG coordinates fall within their antero-ventral left inferior frontal area.

TABLE 1 | Reported brain regions in studies investigating processing of accented, time-compressed, or noise-vocoded speech, plus speech with added background noise vs. undistorted words or sentences.

Note that the list of papers is not exhaustive. Coordinates in Talairach space were converted to MNI space using the tal2icbm\_spm algorithm (www.brainmap.org/ale). Anatomical locations determined using the Anatomy ToolBox (Eickhoff et al., 2005, 2006, 2007) in SPM8 (Wellcome Imaging Department, University College London, London, UK). \*Original location as reported in the study. AG, Angular Gyrus; FFG, Fusiform Gyrus; FO, Frontal Operculum; IFG, Inferior Frontal Gyrus; IOG, Inferior Occipital Gyrus; IPL, Inferior Parietal Lobule; MCC, Middle Cingulate Cortex; MFG, Middle Frontal Gyrus; MTG, Middle Temporal Gyrus; PG, Precentral Gyrus; POp, Pars Opercularis; PT, Planum Temporale; PTr, Pars Triangularis; POrb, Pars Orbitalis; RO, Rolandic Operculum; SMA, Supplementary Motor Area; SMedG, Superior Medial Gyrus; SMG, Supramarginal Gyrus; STG, Superior Temporal Gyrus; STP, Superior Temporal Planum; STS, Superior Temporal Sulcus; TP, Temporal Pole.

# Accented Speech vs. Other Challenging Listening Conditions

As is the case with other types of distorted speech, understanding accented speech is associated with increased listening effort (Van Engen and Peelle, 2014). However, accent variation, i.e., phonetic realizations that differ from the listener's native realization of speech sounds, is conceptually different from variation in the acoustic signal resulting from an extrinsic source such as noise. Furthermore, in contrast to speech-intrinsic variation, noise compromises the auditory system's representation of speech from ear to brain. Accented speech also differs from distortions such as noise-vocoded or time-compressed speech, as the variation does not affect the integrity of the acoustic signal: only specific phonemic and suprasegmental characteristics vary.

Processing speech in noise involves areas also activated for speech in an unfamiliar accent (**Table 1**): left insula (Adank et al., 2012a), left MTG (Peelle et al., 2010), left Pars Opercularis (POp), and bilateral Pars Triangularis (PTr). Comprehension of time-compressed sentences activates left MTG (Poldrack et al., 2001; Adank and Devlin, 2010), right STG (Peelle et al., 2004; Adank and Devlin, 2010), SMA and left insula (Adank and Devlin, 2010), while noise-vocoded speech activates left insula (Erb et al., 2013), and left MTG/STG (Zekveld et al., 2014). However, it is clear from **Figure 1** that processing accented speech also activates areas outside the network activated for processing speech in noise, time-compressed speech, and noise-vocoded speech.

Another problem in identifying networks governing accent processing is that perceiving variation in an unfamiliar accent (i.e., in an accent that differs from one's own accent and that the listener has had little or no exposure to) is confounded with cognitive load. Note that such confounds also exist for other distortions of the speech signal, such as background noise. Listeners process speech in an unfamiliar accent more slowly and less efficiently (Floccia et al., 2006). It is thus unclear to what extent the network supporting accented speech perception is shared with the network associated with increased task/cognitive load processing. Notably, an increase in task difficulty/working memory load relates to increases in BOLD-activation in left insula (Wild et al., 2012), and in left MTG, SMA, left PTr, and right STG (Wild et al., 2012), and could therefore explain activations in these regions related to processing accented speech. Directly comparing the neural processing of familiar/unfamiliar accents may help distinguish between the two networks.

# Accounts of Accented and Distorted Speech Processing

The current debate regarding how listeners understand others in challenging listening conditions focuses on the location and nature of neural substrates recruited for effective speech comprehension. The three accounts discussed below offer specific predictions regarding the neural networks involved in processing accented speech.

First, auditory-only accounts (Obleser and Eisner, 2009) hold that speech perception includes a prelexical abstraction process in which variation in the acoustic signal is "stripped away" to allow the perception system access to abstract linguistic representations. The abstraction process is located predominantly in the temporal lobes (STS and STG). This account predicts that processing of accented speech takes place predominantly in the ventral stream, with minimal involvement of the dorsal stream.

Second, motor recruitment accounts suggest that auditory areas in the ventral stream and speech production areas in the dorsal stream are required to process unfamiliar speech signals (Wilson and Knoblich, 2005; Pickering and Garrod, 2013). These accounts assume that listening to speech results in the automatic activation of articulatory motor plans required for producing speech (Watkins et al., 2003). These motor plans provide forward models with information about articulatory mechanics, to be used when the incoming signal is ambiguous/unclear. Accented speech contains variation that can lead to ambiguities, and these accounts thus predict that perception of accented speech requires active involvement of speech production processes.

Third, executive recruitment accounts propose that activation of (pre-) motor areas during perception of distorted speech signals is not related to actual articulatory processing, but reflects the recruitment of general cognitive processes, such as increased attention, or decision processes (Rodríguez-Fornells et al., 2009; Venezia et al., 2012). Indeed, behavioral data suggest that executive functions are recruited for processing accented speech (Adank and Janse, 2010; Janse and Adank, 2012; Banks et al., 2015). This account also predicts activation of frontal regions including left frontal operculum, anterior insula, and precentral gyrus, as these regions have also been associated with executive functions such as working memory (Moisala et al., 2015).

The results in **Table 1** contrast with predictions made by the auditory-only account (Obleser and Eisner, 2009), as areas associated with processing accent variation in **Table 1** refer to a more widespread network than predicted. Instead, the network in **Table 1** converges with the latter two accounts, as activation is located across ventral and prefrontal areas in the dorsal stream. We propose that these three accounts be synthesized into a single mixed account for processing of accented speech that brings together neural substrates associated with increased involvement of auditory and phonological processing (e.g., bilateral posterior STG), (pre-)motor recruitment for sensorimotor mapping (e.g., SMA), and substrates associated with increased reliance on cognitive control processes (e.g., IFG, insula, and frontal operculum).

# Concluding Remarks

The neural mechanisms responsible for processing accent variation in speech are not clearly outlined, but constitute a topic of active investigation in the field of speech perception. However, to progress our understanding in this area, future studies should meet several aims to overcome previous design limitations.

First, experiments should be designed so that contributions from processing accented speech and effortful processing can be teased apart (Venezia et al., 2012). Second, studies should aim to distinguish between brain activity related to processing accent variation and other distortions, such as background noise. Adank et al. (2012a) contrasted sentences in a familiar accent embedded in background noise with sentences in an unfamiliar accent, to disentangle areas associated with processing accent-related variation from those associated with processing speech in background noise: Left posterior temporal areas in STG (extending to PT) and right STG (extending into insula) were more activated for accented speech than speech in noise, while bilateral FO/insula were more activated for speech in noise compared to accented speech, indicating that the neural architecture for processing accented speech and speech in background noise is not generic. Third, different accents vary in how much they deviate from the listener's own accent. Greater deviation between accents is associated with greater processing cost, but the neural response associated with variations in distance between accents has not been explored using fMRI. A recent study using Transcranial Magnetic Stimulation (TMS) showed a causal role for lip and tongue motor cortex in perceived speaker and listener distance processing (Bartoli et al., 2013). Another study used EEG to show that regional and foreign accents might be processed differently: processing sentences in an unfamiliar foreign accent reduces the size of the N400 compared to unfamiliar native accents (Goslin et al., 2012). It may be fruitful to use a wider variety of neuroscience techniques, including (combinations of) fMRI, EEG, MEG, and TMS, to investigate how the brain successfully accomplishes accented speech perception. 
Fourth, as processing effort, or cognitive load, is inevitably confounded with processing unfamiliar variation in accented speech, experiments should be designed to separate neural substrates associated with processing accent variation from those associated with increased cognitive load. One possibility would be to examine task difficulty and accent processing in a fully crossed factorial design to single out areas that show increased BOLD-activation for accented speech and for task difficulty. Finally, the contribution of production resources to processing accented speech should be examined, to explicitly test predictions from motor and executive recruitment accounts (e.g., Du et al., 2014).
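To make the logic of a fully crossed accent × task-difficulty design concrete, the following minimal sketch computes the two main effects and the interaction from four cell means. The function name, the condition labels, and the response values are hypothetical and serve only as an illustration; they are not data or analysis code from any of the studies cited here.

```python
# Minimal sketch of 2 x 2 factorial contrasts (accent x task difficulty).
# All names and numbers below are hypothetical illustrations.

def factorial_contrasts(cell_means):
    """Return (accent effect, difficulty effect, interaction) from four cell means."""
    fam_easy = cell_means[("familiar", "easy")]
    fam_hard = cell_means[("familiar", "hard")]
    unf_easy = cell_means[("unfamiliar", "easy")]
    unf_hard = cell_means[("unfamiliar", "hard")]

    # Main effect of accent: unfamiliar minus familiar, averaged over difficulty.
    accent_effect = ((unf_easy + unf_hard) - (fam_easy + fam_hard)) / 2
    # Main effect of difficulty: hard minus easy, averaged over accent.
    difficulty_effect = ((fam_hard + unf_hard) - (fam_easy + unf_easy)) / 2
    # Interaction: does the difficulty effect differ between accents?
    interaction = (unf_hard - unf_easy) - (fam_hard - fam_easy)
    return accent_effect, difficulty_effect, interaction


# Hypothetical mean responses (arbitrary units) for one region of interest.
cells = {
    ("familiar", "easy"): 1.0,
    ("familiar", "hard"): 2.0,
    ("unfamiliar", "easy"): 2.0,
    ("unfamiliar", "hard"): 4.0,
}
accent_effect, difficulty_effect, interaction = factorial_contrasts(cells)
print(accent_effect, difficulty_effect, interaction)
```

In such a design, a region responding only to accent would show an accent effect without a difficulty effect, whereas a non-zero interaction would indicate that the two factors are not simply additive in that region.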

# Acknowledgments

This work was supported by the Leverhulme Trust under award number RPG-2013-254.

#### References


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Adank, Nuttall, Banks and Kennedy-Higgins. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Mozart is not a Pavarotti: singers outperform instrumentalists on foreign accent imitation

#### Markus Christiner <sup>1</sup> and Susanne Maria Reiterer <sup>1,2</sup> \*

<sup>1</sup> Department of Linguistics, Unit for Language Learning and Teaching Research (SLLF), University of Vienna, Vienna, Austria, <sup>2</sup> Centre for Teacher Education, University of Vienna, Vienna, Austria

Recent findings have shown that people with higher musical aptitude are also better at oral language imitation tasks. However, whether singing capacity and instrument playing contribute differently to the imitation of speech has been ignored so far. Research has only recently begun to recognize that instrumentalists develop quite distinct skills compared to vocalists. In the same vein, the role of the vocal motor system in language acquisition processes has been poorly investigated, as most investigations (neurobiological and behavioral) favor examining speech perception. We set out to test whether the vocal motor system can influence the ability to learn, produce and perceive new languages by contrasting instrumentalists and vocalists. We therefore investigated 96 participants: 27 instrumentalists, 33 vocalists and 36 non-musicians/non-singers. They were tested for their ability to imitate foreign speech, namely an unknown language (Hindi) and a second language (English), and for their musical aptitude. Results revealed that both instrumentalists and vocalists have a higher ability to imitate unintelligible speech and foreign accents than non-musicians/non-singers. Within the musician group, vocalists outperformed instrumentalists significantly. Conclusion: First, adaptive plasticity for speech imitation is not reliant on audition alone but also on vocal-motor induced processes. Second, vocal flexibility of singers goes together with higher speech imitation aptitude. Third, vocal motor training, such as that of singers, may speed up foreign language acquisition processes.

Keywords: vocal motor system, memory, speech imitation, language acquisition device, singing ability, instrumentalists versus vocalists, vocal flexibility

# Introduction

Recent research has shown that musical expertise heightens the potential to memorize and reproduce foreign languages orally (Nardo and Reiterer, 2009; Reiterer et al., 2011; Hu et al., 2012; Christiner and Reiterer, 2013). This relatively new and steadily growing scientific field, however, has hardly ever differentiated between two aspects relevant to the faculty of language learning: (1) different kinds of musical aptitudes such as instrument playing vs. singing; and (2) language testing of intelligible and unintelligible utterances.

#### Edited by:

Ignacio Moreno-Torres, University of Málaga, Spain

#### Reviewed by:

Peter Schneider, Heidelberg Medical School, Germany

M.C. Fonseca-Mora, University of Huelva, Spain

#### \*Correspondence:

Susanne Maria Reiterer, Department of Linguistics, Unit for Language Learning and Teaching Research (SLLF) and Centre for Teacher Education, University of Vienna, Spitalgasse 2, Court 8.3, 1090 Vienna, Austria

susanne.reiterer@univie.ac.at

Received: 28 May 2015 Accepted: 17 August 2015 Published: 28 August 2015

#### Citation:

Christiner M and Reiterer SM (2015) A Mozart is not a Pavarotti: singers outperform instrumentalists on foreign accent imitation. Front. Hum. Neurosci. 9:482. doi: 10.3389/fnhum.2015.00482

While approved tests are already available for measuring musical expertise (e.g., Advanced Measures of Music Audiation, AMMA; Gordon, 1989), measuring speech imitation talent/aptitude remains a complex endeavour, as individual differences in language production and perception can even be noted in native speakers (Pakulak and Neville, 2010; Andringa, 2014), and giftedness, as raw material, is considered to be a natural and inherent ability, free of educational influence (Gagne, 2005). This predetermines speech imitation test design and makes unintelligible and geographically and linguistically distant languages ideal test stimuli for defining an individual's speech imitation aptitude.

In language acquisition, accent imitation is sometimes considered one of the most crucial aspects of L2 learning. For example, Seliger (1978) already proposed that there are many critical periods for different aspects of language, with the ability to master a native accent in a foreign language being the first to be lost, around the onset of puberty. Hence, "phonetic ability" has often been considered the first or only sub-ability in language learning which is ultimately subject to a critical period (Moyer, 2014). Furthermore, pronunciation/accent ensures adequate communication and illustrates the speaker's proficiency (Dalton-Puffer et al., 1997). Sounding like a native speaker is an ambitious goal that second language speakers want to achieve. For the language learner, foreign accent imitation can be a challenging task, as languages differ in multiple respects, and even typologically close languages such as English and German have many language-specific consonants, vowels or diphthongs that cause non-native speakers difficulties with pronunciation. Typologically more distant languages such as South Asian languages (e.g., Hindi) convey phonetic contrasts, such as retroflex vs. dental stop distinctions, which are scarcely found in European languages (Werker and Tees, 2005). In addition, languages vary in their rhythmic patterns and are divided into stress-timed, syllable-timed and mora-timed languages. Hindi, for instance, has been classified as a syllable-timed language, while German has been termed stress-timed. Characteristically, non-natives have difficulty discriminating and producing unusual non-native contrasts (Werker et al., 1981; Tees and Werker, 1984; Polka, 1991; Polka et al., 2001; Werker and Tees, 2005) and most often fail to recognize where a word begins or ends in a speech stream (Patel, 2008). This may go hand in hand with ageing, as specialization for the mother tongue comes at the expense of plasticity towards the acquisition of new languages in most adults.

This is not fully true for musicians, who are equipped with special skills and imitation abilities. Many recent studies have reported a significant relation between speech imitation and musical expertise, with musicians consistently outperforming non-musicians in language imitation tasks (Schön et al., 2004; Thompson et al., 2004; Wong and Perrachione, 2007; Pastuszek-Lipinska, 2008; Milovanov, 2009; Nardo and Reiterer, 2009; Reiterer et al., 2011; Hu et al., 2012). To our knowledge, studies have hardly ever differentiated between musical "sub-abilities" or investigated whether instrument playing and singing contribute differently to imitation skills.

In previous investigations on language aptitude, singing capacity and musical instrument playing yielded strong correlations to the ability to imitate speech (Nardo and Reiterer, 2009; Reiterer et al., 2011). In a follow up behavioral experiment on vocalists earlier results were replicated and singing capacity contributed significantly to the imitation of familiar and unfamiliar utterances, but surprisingly perceptual musicality measurements correlated significantly lower with speech imitation skills and were irrelevant for explaining participants' imitation abilities in regression models (Christiner and Reiterer, 2013). Singers' musical abilities may be based on different skills than those developed by or pre-existent in instrumentalists. Most investigations focus on a purely perceptual advantage of musicians' language capacity and ignore the role of production. In the present study we integrated both.

Pre-speech studies have shown that the melodic elements of infants' cry might form the basis for both music and speech (Wermke and Mende, 2006, 2009), which may derive from one source that later develops into separate but closely related faculties. On the perceptual and structural level, language, like music, contains rhythmic cues in the form of language prosody. Music's metrical organization resembles that of language, as beat, notes and "patterns of tense and relaxation" form higher units (Jackendoff and Lerdahl, 2006). Language prosody, for example, organises speech and consists of various elements such as "tonal, temporal and dynamic features" (Trofimovich and Baker, 2006). Prosodic variations of oral language also "share many acoustic features with tone transitions in musical melodies" (Oechslin et al., 2010). The ability to discriminate temporal or segmental information requires similar processes for perceiving both speech and music. Brain imaging studies found that prosodic information more strongly activated music-associated areas in the right hemisphere (Meyer et al., 2002), especially when linguistic information is reduced (Perkins et al., 1996). Evidently, music and language perception are not an either/or choice but highly interconnected, which may be one of the underlying reasons why musicians are advantaged in the oral acquisition of foreign languages: musical expertise leads not only to an improvement of both music and speech perception (Oechslin et al., 2010) but also to enhanced literacy and attentional skills (Seither-Preisler et al., 2014).

#### Neuronal Underpinnings of Vocalisations

What has widely been ignored so far is that vocalists train their vocal apparatus more precisely than instrumentalists do. Speech production is closer to the nature of singing than the nature of instrument playing. Hence, a singer's instrument is integrated into the body and already used for all forms of vocalization. Considering this, singing and instrument playing could be understood as separate musical abilities, especially when comparing musical expertise to language learning tasks.

Generally speaking, all forms of vocalization are based on the same principles and rely on the integration of multiple networks. While speech perception has been considered to be more left hemisphere dominated (Liégeois-Chauvel et al., 1999), vocal sound perception seems to be largely bihemispherically organized "along the upper bank of the superior temporal sulcus" regardless of whether participants were exposed to speech or non-speech sounds (Belin et al., 2000). All forms of vocalization such as song and speech require the control of the laryngeal system and the articulatory apparatus such as tongue, jaw and orofacial muscles (e.g., Zarate, 2013). Most of the aforementioned systems are largely bihemispherically organized. Orofacial and supralaryngeal movements show a largely bihemispherically organized specialization (Özdemir et al., 2006; Grabski et al., 2012a,b). This also applies to laryngeal processes. The neural correlates of the supralaryngeal movements include the "sensorimotor cortex [. . .], the supplementary motor area and the superior cerebellar hemispheres" (Ackermann, 2008; Grabski et al., 2012a) on both hemispheres as well as orofacial motor control in the central sulcus, rostral region of the precentral gyrus and the caudal areas of the postcentral gyrus bilaterally (Grabski et al., 2012b). Zarate (2013) proposes that, regardless of whether utterances are sung or spoken, vocalization relies on "M1, ACC, basal ganglia, thalamus, and the cerebellum". In addition, vocalization requires the integration of auditory cortex, insula, parietal and premotor regions and the integration of somatosensory feedback of the primary and secondary somatosensory cortex (Ackermann, 2008; Ackermann and Riecker, 2010; Zarate, 2013; Ziegler and Ackermann, 2013; Ackermann et al., 2014). A comparative study on singing vs.
speech found bilateral activation in the inferior pre- and postcentral gyrus, the superior temporal gyrus (STG) and the superior temporal sulcus during singing and speech production (Özdemir et al., 2006). However, singing, in marked contrast to speech production, shows further activation in the primary sensorimotor cortex and in the mid-portions of the STG (Özdemir et al., 2006). Speaking and singing share large parts of neural correlates suggesting that singing training, more than instrument playing, leads to higher language imitation abilities. This has been corroborated more recently in discrimination training and testing, suggesting that auditory training alone does not improve vocal accuracy in non-musicians (Zarate et al., 2010). Behavioral investigations, on the other hand, came to a similar conclusion demonstrating that changing vocal commands of specific utterances leads to perceptual shifts (Nasir and Ostry, 2009). In the light of the present findings it could be assumed that vocalists develop different networks when compared to non-musicians and instrumentalists.

#### Neural Correlates of Musicians vs. Singers

Neuroscientific evidence has shown that musicians show alterations of their brain structure when compared to non-musicians (e.g., Schneider et al., 2002, 2005; Gaser and Schlaug, 2003; Seither-Preisler et al., 2014). These alterations are thought to arise under specific conditions met through musical training. The OPERA hypothesis, proposed by Patel (2011), claims that five conditions are essential for music-induced neural plasticity. The model suggests that: (a) overlaps of speech and music processing; (b) more precise processing of music in general; (c) positive emotions of music; (d) musical activities which are repetitive in nature; and (e) focused attention, taken together, should lead to benefits in speech processing. Vocalists, on the other hand, have to train one more component which can be differentiated from more general musical training: their vocal motor apparatus.

Studies focusing on the difference between instrumentalists and vocalists are scarce, as it is challenging to find participants. An interesting study conducted by Schneider et al. (2005) focused on contrasting musical abilities. They found individual differences in pitch perception even within vocalists, where sopranos showed higher fundamental pitch discrimination ability in marked contrast to altos. They concluded that the size of Heschl's gyrus depends on musical ability. Musicians, that is to say instrumentalists and vocalists, show anatomical alterations of the gray matter (Schneider et al., 2002), Heschl's gyrus (Schneider et al., 2002, 2005; Gaser and Schlaug, 2003; Seither-Preisler et al., 2014), and the arcuate fasciculus (Halwani et al., 2011). More recently, Halwani et al. (2011) compared musicians, non-musicians, and vocalists. Results revealed that vocalists show additional structural adaptations in the arcuate fasciculus which have not yet been found in pure instrumentalists (Halwani et al., 2011). This suggests that vocal-motor training induces a change in the volume and complexity of white matter tracts (Halwani et al., 2011). These adaptations seem to improve the interplay between the auditory feedback system and the kinesthetic system (Kleber et al., 2010). Trained singers can rely more on somatosensory feedback compared to non-singers and instrumentalists. At the other extreme, our closest relatives, monkeys, lack a complex connectivity between the auditory system and the oromotor system (Rilling et al., 2012; López-Barroso et al., 2013), which might be one explanation why monkeys are unable to store rapidly occurring acoustic signals, although they have a high proficiency in mastering tactile and visual recognition (Schulze et al., 2012). Schulze et al. suggest that ". . . in audition alone, monkeys seem unable to store stimulus representations", concluding that the oromotor system assists memorization of speech sounds in humans. If the oromotor system is involved in memorization of speech signals, we hypothesize that vocalists, with their refined vocal motor ability, will outperform instrumentalists when imitating new foreign speech material. This would be additional evidence that: (A) the human vocal motor system is involved in laying down memory of utterances; and (B) vocal motor training should become an integral part of language learning settings (Fonseca-Mora et al., 2011).

# Materials and Methods

#### Speech Imitation: Hindi Stimuli

For testing the speech imitation skills of the participants we used Hindi stimuli which have already been tested in previous investigations (Nardo and Reiterer, 2009; Reiterer et al., 2011; Hu et al., 2012; Christiner and Reiterer, 2013). The participants had to repeat four simple sentences (statements) of equal syllable length (11 syllables) which consisted of either five or six words. The Hindi sentence material contained difficult retroflex consonants within the 11 syllables: retroflex n and retroflex r. Retroflex consonants are difficult both to perceive and to produce for German native speakers. The correct articulation requires the tongue to be curled back against the palate. Additionally, Hindi is syllable-timed and therefore rhythmically organised differently from the participants' mother tongue. Using Hindi thus aims at measuring the potential of a person's speech imitation aptitude by excluding all educational influence/benefits.

The recording was performed in a soundproof room where the participants were introduced to their tasks. For the familiarisation task the participants listened to the Hindi stimuli three times via loudspeakers (ADAM A7) and were asked to repeat the sentences after the third time without being recorded. After the familiarisation task, the participants used studio headphones (BEYERDYNAMIC DT-770 PRO) and repeated the Hindi stimuli in the best accent they could manage, while being recorded. The recordings were rated by seven naïve Hindi native speakers who evaluated the overall performance of "sounding like a native Indian" on an intuitive rating scale from 0 (min) to 10 (max) with no particular reference to individual phonetic characteristics. To ensure that raters understood their task correctly, we instructed them to think of characteristics such as word stress, the rhythm of the language, intelligibility and pronunciation.
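One simple way to turn seven ratings per participant into a single imitation score is to average them. The sketch below illustrates this under the assumption that ratings are averaged across raters; the text does not state how the ratings were aggregated, and the participant identifiers, function name, and rating values are hypothetical.

```python
# Minimal sketch: aggregate seven raters' 0-10 ratings into one score per participant.
# Averaging across raters is an assumption; identifiers and values are hypothetical.
from statistics import mean


def mean_rating(scores, low=0, high=10):
    """Average a list of ratings after checking they lie on the rating scale."""
    if not scores:
        raise ValueError("at least one rating is required")
    for score in scores:
        if not low <= score <= high:
            raise ValueError(f"rating {score} is outside the {low}-{high} scale")
    return mean(scores)


# Hypothetical ratings from seven raters for two participants.
ratings_by_participant = {
    "P01": [6, 7, 5, 6, 8, 7, 6],
    "P02": [3, 4, 2, 3, 4, 3, 2],
}
imitation_scores = {
    pid: mean_rating(scores) for pid, scores in ratings_by_participant.items()
}
print(imitation_scores)
```

Averaging is only one option; a median or a rater-standardized score would be alternatives if individual raters used the scale very differently.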

#### Musicality Test: AMMA

The AMMA test measures the participants' perceptual musical abilities. The test consists of 30 musical statements where the participants have to discriminate tonal and rhythmical changes of two musical statements or indicate that they are the same (Gordon, 1989). The AMMA test is suitable for high school students and for university music and non-music majors. For this task the participants first performed a familiarization task consisting of three conditions: the paired musical statements were either the same, included a tonal change, or included a rhythmic change. After familiarization, the participants used headphones (BEYERDYNAMIC DT-770 PRO) and were asked to perform the task within a single sitting and decide whether the paired musical statements were the same or included a tonal or a rhythmical change.

#### Participants

A sample of 96 participants was selected, comprising 33 vocalists, 27 instrumentalists and 36 non-singers/non-musicians (67 female and 29 male participants) across a wide age range (20–59). All participants gave informed consent, and none reported any hearing problems or other impairments. The participants were selected according to their musical abilities and their language knowledge. Hindi was chosen for testing German native speakers because they are largely unfamiliar with this language. The participants were asked to state whether or not they spoke Hindi, and people who had been exposed to the language were excluded from the research. This was to ensure that the participants had no prior experience of Hindi. The participants were all monolingual German native speakers. All participants reported speaking English; 59.4% additionally spoke French, 32.3% spoke Spanish and 22.9% had mastered Italian, while 25.0% spoke other languages. Further criteria were that vocalists did not play instruments professionally and defined themselves as singers. Additionally, the vocalists had received formal vocal education for at least 2 years and were rated and defined as advanced singers by professional singing educators. The instrumentalists actively played one or two instruments at an excellent level and had no history of vocal instruction or semi-professional singing activities (hobby singing). The control group had no experience in singing apart from traditional singing activities such as singing "Happy Birthday", had only little experience in instrument playing at a very basic level, and defined themselves as non-musicians. The strict selection of participants aimed at dissociating instrumentalists from vocalists and non-musicians/non-singers from musicians in general.

#### Results

#### Data Analysis

To analyze the results, we ran one-way ANOVAs. The first one-way ANOVA tested whether speech imitation ability in Hindi differed between the instrumentalists, the vocalists and the non-musicians. The second one-way ANOVA tested whether the musical ability to discriminate rhythmical and tonal differences differed significantly between the instrumentalists, the vocalists and the non-musicians. Age, gender and the number of languages spoken had no effect on the musicality test or the speech imitation task.

#### Results Speech Imitation ANOVA

The mean of Hindi imitation for vocalists was 4.35, SD = 1.15, for instrumentalists 2.28, SD = 1.31, and for non-musicians/non-singers 1.56, SD = 1.20.
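
As a rough illustration of the one-way ANOVA described under Data Analysis, the sketch below recomputes an F statistic from the summary statistics reported above. It assumes that the group sizes given in the Participants section (33 vocalists, 27 instrumentalists, 36 non-musicians/non-singers) apply to this analysis without missing data; the names `Group` and `one_way_anova_from_summary` are hypothetical. Because the reported means and SDs are rounded, the result can only approximate the value obtained from the raw ratings, and it should not be read as the authors' own analysis script.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Summary statistics for one participant group (hypothetical container)."""
    name: str
    n: int        # group size (assumed from the Participants section)
    mean: float   # reported group mean
    sd: float     # reported sample standard deviation


def one_way_anova_from_summary(groups):
    """Return (F, df_between, df_within) computed from per-group n, mean and SD."""
    total_n = sum(g.n for g in groups)
    grand_mean = sum(g.n * g.mean for g in groups) / total_n
    # Between-group sum of squares: weighted squared deviations of group means.
    ss_between = sum(g.n * (g.mean - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: recovered from each sample variance.
    ss_within = sum((g.n - 1) * g.sd ** 2 for g in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hindi imitation scores as reported in the text; group sizes are an assumption.
hindi = [
    Group("vocalists", 33, 4.35, 1.15),
    Group("instrumentalists", 27, 2.28, 1.31),
    Group("non-musicians/non-singers", 36, 1.56, 1.20),
]

f_stat, df_b, df_w = one_way_anova_from_summary(hindi)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

The same function could be applied to the musicality test means and SDs; pairwise group differences would still require post hoc tests on the raw data.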


#### Results Musicality Test ANOVA

The mean of the musicality test for vocalists was 60.27, SD = 8.21, for instrumentalists 61.19, SD = 10.57, and for non-musicians/non-singers 52.22, SD = 7.69.


# Discussion

The results indeed revealed that vocalists outperform instrumentalists on imitation tasks involving unfamiliar utterances. The present investigation supports the view that instrumentalists and vocalists are both significantly more sensitive than non-musicians/non-singers in perceiving and discriminating rhythmical and tonal changes in melodies. Their heightened ability to perceive musical stimuli, however, can only partly explain why numerous studies have reported a positive relationship between musical expertise and oral language acquisition processes. Based on this study, it is virtually impossible to distinguish instrumentalists from vocalists by their ability to perceive musical stimuli, while, on the other hand, the results clearly indicate that the ability to reproduce unintelligible and unfamiliar utterances adequately is significantly higher in vocalists than in instrumentalists. This supports the view that vocalists and instrumentalists should be dealt with as different categories in the field of language acquisition research. Furthermore, it stresses that the positive relation between musical talent and language talent in the realm of speech imitation relies not on audition alone but also on oromotor-induced functional processes.

*Figure caption: Instrumentalists and singers are significantly better than non-musicians/non-singers. However, there is no significant difference between instrumentalists and vocalists on the perceptual musicality test.*

#### Musicians' Enhanced Perception

This study's results are consistent with previous investigations on phonetic aptitude and musical expertise (e.g., Schön et al., 2004; Thompson et al., 2004; Wong and Perrachione, 2007; Pastuszek-Lipinska, 2008; Milovanov, 2009; Nardo and Reiterer, 2009; Reiterer et al., 2011; Hu et al., 2012; Christiner and Reiterer, 2013; Martínez-Montes et al., 2013). Musicians (instrumentalists and vocalists) were better at imitating unintelligible speech than non-musicians/non-singers. Furthermore, the ability to imitate unintelligible speech is significantly higher in people with higher musical aptitude, even though the language tested (Hindi) has a different rhythmic organisation and contains non-native features (e.g., retroflex sounds) that are largely unknown to German participants. The underlying mechanisms for musicians' better performance in speech imitation have been discussed in detail in the recent literature and approached from various angles. Research in neuroscience, for instance, has concluded that musicians possess a better working memory (e.g., Koelsch et al., 2009; Rota and Reiterer, 2009; Schulze et al., 2011; Rodríguez-Fornells et al., 2012; Schulze and Koelsch, 2012) and have anatomical endowments in the brain (Schneider et al., 2002; Kleber et al., 2010) that differentiate them from the average person, to name but a few findings. Behavioral research, on the other hand, has found that musicians may treat short and unintelligible speech streams like musical statements (Christiner and Reiterer, 2013), or rely on the "musical components of speech" when listening to new utterances (Milovanov, 2009). Others have demonstrated that musicians can incorporate new utterances more easily (Kraus and Chandrasekaran, 2010) and remember longer sound chunks (Pastuszek-Lipinska, 2008), leaving no doubt about the positive transfer of musical abilities to foreign language perception.

#### The Vocal Motor System and its Neural Underpinnings

The present results point to statistically significant differences between instrumentalists and vocalists in oral language imitation abilities, although the two groups are perceptually indistinguishable according to this study's musicality measure (AMMA). If the difference does not lie in their perceptual musical abilities, it can only concern vocal motor ability and its effect on language functions. The fact that vocalists outperformed instrumentalists shows that the oromotor system plays a crucial role in language acquisition processes. In this study, the Hindi stimuli were selected because they contain retroflex consonants, which are rather uncommon and difficult for native German speakers to produce. Singers' better performance indicates that singing and speech production are based on the same principles (García-López and Gavilán Bouzas, 2010). Both vocal behaviors are largely organised bihemispherically, cortically and subcortically (Özdemir et al., 2006; Ackermann, 2008; Grabski et al., 2012a; Ziegler and Ackermann, 2013; Ackermann et al., 2014) and draw on common ground.

Being able to imitate foreign accents at a native level is a highly valued ability. This requires the speaker's awareness of language-specific temporal, dynamic and tonal characteristics. When acquiring a language, however, beginners most often fail to recognise where a word begins or ends in a speech stream. Thus, language learners apply the segmentation strategy of their mother tongue to the foreign language (Patel, 2008). Language learners therefore need to adapt to the intonation, word stress, rhythm and melodic aspects of the target language as accurately as possible. This requires an enhanced perceptual ability, which has been the dominant view in the recent literature, although the importance of somatosensory information has been reported as well (e.g., Nasir and Ostry, 2009). However, the extent to which vocal motor ability and vocal motor flexibility play a major role has been poorly investigated. Recent research has shown that changes in vocal motor commands lead to perceptual shifts (Nasir and Ostry, 2009), demonstrating that language production and perception are ultimately linked with each other. Some decades ago, Liberman introduced the motor theory of speech perception and proposed that perceiving an utterance is "to perceive a specific pattern of intended gesture" (Liberman and Mattingly, 1985). Broadly speaking, this would mean that acoustic speech sounds are transformed gestures and would imply that speech perception and production are "in two ways motor" (Liberman and Mattingly, 1985). Another, more recent study provides evidence that the oromotor system may be involved in laying down memory (Schulze et al., 2012). This links speech perception, production and memory and would also explain why vocalists, with their refined vocal ability, can outperform instrumentalists despite equivalent perceptual musical ability. 
The vocal motor system assists memorization and an enhanced vocal flexibility may speed this process up.

Vocal motor training, however, has also been shown to affect brain structure. While structural changes in the brain caused by developmental factors through language use have received considerable attention (e.g., Dubois et al., 2006; Brauer et al., 2011), equivalent research on the influence of singing on brain development does not exist. Brain research in maturational language studies has shown particular interest in how the connections between the arcuate fasciculus, the superior longitudinal fasciculus and the extreme capsule fiber system influence language acquisition processes (Brauer et al., 2011). In this context, the arcuate fasciculus (AF) is most important, as its role in the motor execution of vocalization is well known (Basser, 1995). Vocal motor-induced alterations of the brain have recently been found in "the dorsal and ventral branches of the left AF" in vocalists, which differentiate them from pure instrumentalists and non-musicians (Halwani et al., 2011) and may be one marker of higher vocal ability. In the present study, the singers had several years of training. It could be speculated that the years of vocal motor training lead to structural changes and to higher connectivity between the auditory cortex and the somatosensory cortex. This has recently been supported by the finding that song-like training leads to structural adaptations in the arcuate fasciculus of people with brain lesions following successful speech recovery, with improved spontaneous speech production (Schlaug et al., 2009). This speech therapy, Melodic Intonation Therapy (MIT), uses tools similar to those of singing instruction, including syllable lengthening and intoning (singing; Norton et al., 2009). The reasons why MIT improves spontaneous speech production in aphasics are not yet fully understood. 
A recent study suggests that rhythm may be most relevant in speech recovery (Stahl et al., 2011), while others favour the intoned instructions as a whole (e.g., Schlaug et al., 2009). But while the auditory influence on speech recovery has received considerable attention, vocal motor-induced anatomical adaptations in the brain have been largely ignored. It may be more likely that the improvement in the spontaneous speech production of aphasics results from a combination of several processes. How exactly song-like instruction plays a role may be difficult to explain. However, singers' higher ability to imitate speech relies on their flexible vocal motor ability, which resembles the flexibility of infants, who can learn virtually all languages and phonemes within a particular time window (Kuhl, 2004). It may, however, also be input dependent. Language directed at infants is usually more song-like in its quality (Murphey, 1990) and more musical (Brandt et al., 2012). Pre-speech research has also demonstrated that the early vocalizations of infants contain "melodic constituents for both musical and prosodic structures" (Wermke and Mende, 2009). Singing integrates flexible vocal motor training and music-like instruction, which may simulate a language learning situation analogous to infants' L1 learning and may therefore activate one of the most important language acquisition devices.

### Conclusion

Based on this study, it is virtually impossible to distinguish instrumentalists from vocalists by their ability to perceive musical stimuli, while, on the other hand, the results clearly indicate that the ability to imitate foreign speech (speech production) is significantly higher in vocalists than in instrumentalists. This indicates that vocalists and instrumentalists should be regarded as separate categories in language acquisition research and not subsumed under the term "musicians". According to this study, adaptive plasticity for speech imitation relies equally on production and perception. The vocal flexibility of singers, in turn, has a positive transfer effect on the imitation of new and unusual utterances, demonstrating that singing and speech imitation/production are closely connected. The present findings also have implications for language learning and teaching. In light of these findings, it may be justified to rethink language teaching as such. As shown in this research, musical expertise enhances language functions in adults, which suggests that language teaching might benefit from the inclusion of musical input.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Christiner and Reiterer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# How native-like can you possibly get: fMRI evidence for processing accent

*Ladan Ghazi-Saidi<sup>1</sup>\*, Tanya Dash<sup>1</sup> and Ana I. Ansaldo<sup>1,2</sup>*

*<sup>1</sup> Centre de Recherche de l'Institut Universitaire de Gériatrie de Montréal, University of Montreal, Montreal, QC, Canada, <sup>2</sup> Faculté de Médecine, Université de Montréal, Montreal, QC, Canada*

Introduction: If ever attained, a native-like accent is achieved late in the learning process. Resemblance between L2 and the mother tongue can facilitate L2 learning. In particular, cognates (phonologically and semantically similar words across languages) offer the opportunity to examine the issue of foreign accent in quite a unique manner.

Methods: Twelve Spanish-speaking (L1) adults learnt French (L2) cognates and practiced their native-like pronunciation by means of a computerized method. After consolidation, they were tested on L1 and L2 oral picture-naming during fMRI scanning.

Results and Discussion: The results of the present study show that there is a specific impact of accent on brain activation, even if L2 words are cognates and belong to a pair of closely related languages. The results indicate that the insula is a key component of accent processing, which is in line with reports from patients with foreign accent syndrome following damage to the insula (e.g., Katz et al., 2012; Moreno-Torres et al., 2013; Tomasino et al., 2013), and from healthy L2 learners (Chee et al., 2004). Thus, the left insula has been consistently related to the integration of attentional and working memory abilities, together with fine-tuning of motor programming to achieve optimal articulation.

#### *Edited by:*

*Marcelo L. Berthier, University of Malaga, Spain*

#### *Reviewed by:*

*Mirko Grimaldi, University of Salento, Italy Fabio Richlan, University of Salzburg, Austria*

#### *\*Correspondence:*

*Ladan Ghazi-Saidi ladan.ghazi.saidi@umontreal.ca*

*Received: 19 July 2015 Accepted: 09 October 2015 Published: 30 October 2015*

#### *Citation:*

*Ghazi-Saidi L, Dash T and Ansaldo AI (2015) How native-like can you possibly get: fMRI evidence for processing accent. Front. Hum. Neurosci. 9:587. doi: 10.3389/fnhum.2015.00587*

Keywords: accent, bilingual, second language acquisition, brain, insula

# INTRODUCTION

Second language (L2) acquisition encompasses mastering many components, including syntax, semantics, pragmatics, phonology, and phonetics. Adopting a native-like accent is not always possible, and is mostly a function of age of acquisition (Long, 1990; Bongaerts et al., 1995; Birdsong, 1999; Singleton, 2003). The notion of accent is a complex one, as it concerns a number of features that range from phonological to motor and emotional dimensions. Hyman (2006) describes accent at the word and phrase level as "stress accent" and "pitch accent", respectively. Moreover, accent is also influenced by psychosocial factors, such as cultural background and education. In this regard, Crystal (1987, 2003) has defined accent as the way in which a specific language is pronounced, which allows the region and the social status of the speaker to be identified. From a neurolinguistic point of view, accent comprises processing phonology, prosody, intonation, as well as motor programming and planning. Phonetic and prosodic rules that characterize a specific language are crucial features of accent. Thus, accent concerns segmental (i.e., prosodic distinction) and suprasegmental units (i.e., loudness, pitch and duration). Prosodic distinction is considered segmental based on its position in the entire prosodic structure (Keating, 2006). For example, the phonetic realization of a consonant /p/ depends on the consonant's position in the prosodic structure (i.e., where the terminal node is going to come). As is the case with other domains of language development, accent in the mother tongue is acquired automatically (i.e., unconsciously). Hence, as it is acquired naturally as a lexical or a phrasal language component, it is hard to dissociate from language processing as a whole. 
However, in cases where the native speaker decides to change accent for pragmatic reasons (i.e., to mark social status, for political or academic purposes, under cultural influences, or in circumstances such as acting or dubbing), conscious control of accent is required, and changing accents may even be effortful.

In the context of second language (L2) learning, accent processing generally necessitates some level of cognitive control, particularly when the age of acquisition is beyond the critical period (Bongaerts et al., 1995; Birdsong, 1999; Singleton, 2003). Thus, there is evidence that children lose their capacity to perceive and distinguish phonemes as early as 6 to 12 months of age (Werker and Desjardins, 1995), and therefore, after that critical period, the production of phonemes in L2 will be influenced by the mother tongue. Consequently, it has been argued that foreign accent can be detected in both early and late L2 learners (Barlow, 2014), in particular if L2 has been learnt in a formal manner (Peltola et al., 2003, 2007; Best and Tyler, 2007; Jost et al., 2015). Perhaps this is one of the reasons why native-like accent has been used as an indicator of high proficiency in a second language. This L2 native-likeness depends on a number of factors, including L2 age of acquisition, frequency of L2 use and amount of exposure, gender, formal training, motivation, amount of continued L1 use (Flege et al., 1995; Piske et al., 2001), as well as the type of L2 learning approach and linguistic distance, which can eventually make perceptual representations inaccessible to L2 learners (Flege, 2003). Regarding learning approach, there is evidence that a native-like accent is more likely to be achieved when L2 acquisition occurs through informal exposure and interaction in a naturalistic L2 context, especially through interaction with friends, rather than through formal training such as taking lessons in a classroom (Polat, 2007). As for the influence of linguistic distance on accent processing, the greater the structural similarities between L1 and L2, the larger the overlap at the phonological and phonetic levels (Ringbom, 2007), and thus the smaller the load on attentional and motor-processing abilities. 
However, there is evidence of better acquisition of L2 phonemes when the latter are not part of the L1 phonology (Best and Tyler, 2007). Interestingly, there are models (PAM: Perceptual Assimilation Model, Best, 1995; SLM: Speech Learning Model, Flege, 1995) that interpret L2 sounds in terms of pre-existing L1 structures. In contrast, the Markedness Differential Hypothesis (Eckman, 1977) rests on typological markedness differences between L1 and L2. Using mismatch negativity experiments on speech sound discrimination, Grimaldi et al. (2014) confirmed the PAM framework, suggesting assimilation of L2 vowels to the pre-existing phonology in the absence of L2 phonetic discrimination. A lack of discrimination for vowel contrasts was reported by Peltola et al. (2007) in early language immersion programs, in the form of the absence of a native-like memory trace for L2 vowels. On the other hand, Zsiga (2003) reported an asymmetrical transfer of patterns in L1–L2 syllable sequences, thus supporting a default pattern of articulation uncharacteristic of both L1 and L2. Such work is in favor of the Markedness Differential Hypothesis, i.e., there are universal contrasts that favor less marked structures, hence the asymmetry in the transfer between L1 and L2.

The study of accent has been approached by different means: studies on healthy adult populations, including second-language learners, and studies on clinical populations presenting disordered speech motor control or other clinical conditions. Also, different methodologies have been used to explore the neural basis of accent, namely anatomical and functional neuroimaging, as well as computational modeling based on clinical data.

Different brain regions play a crucial role in speech motor production (Guenther, 2006; Guenther et al., 2006). The Directions Into Velocities of Articulators (DiVA) model is a computational model that argues for the role of the insula in articulation and highlights the similarity between this role and that of the premotor and motor cortices (Guenther et al., 2004). Thus, the DiVA model includes the insula as part of the motor speech control circuit, and hence as participating in accent processing.

The neural basis of speech motor control has been extensively approached by means of behavioral and functional neuroimaging tools, but very few studies have specifically focused on the neural basis of accent processing. Specifically with healthy populations, one ERP study (Romero-Rivas et al., 2015) and one fMRI study on the comprehension of foreign-accented speech (Adank et al., 2012) have been published. However, there are no neuroimaging studies on the production of accent in healthy populations. Most data on accent processing come from clinical reports, some of which include a comparison of functional neuroimaging in healthy control participants and patients with foreign accent syndrome (FAS; Fridriksson et al., 2005; Poulin et al., 2007; Katz et al., 2012; Moreno-Torres et al., 2013; Tomasino et al., 2013) or apraxia of speech (AOS; Moser et al., 2009). Other case reports include imaging reports on a variety of brain-damaged populations presenting speech disorders (e.g., Kent and Rosenbek, 1983; Wertz et al., 1984; Gurd and Coleman, 2006; Mariën and Verhoeven, 2007; Moreno-Torres et al., 2013; Tomasino et al., 2013). The next paragraphs elaborate on these accounts. Motor speech disorders include AOS, dysarthria and FAS, all of which are characterized by the disruption of phonetic-prosodic components of speech production, affecting the naturalness and native-likeness of speech. Specifically, AOS is characterized by impaired initiation, sequencing, timing, coordination and vocal tract shaping for speech sound production (Kent and Rosenbek, 1983; Wertz et al., 1984), and by disrupted fine-tuning of the balance between the production of phonetic and prosodic units of speech (Boutsen and Christman, 2002; Aichert and Ziegler, 2004), resulting in unnatural production of speech sounds. 
This accent pattern has been related to damage in the left inferior frontal gyrus and the anterior insula, both areas having been reported to play a role in novel speech production, particularly in regard to the facilitation of new motor plans for speech (Chee et al., 2004). Along the same lines, Moser et al. (2009) conducted an fMRI study with 30 healthy adults on a non-word repetition task with English (native) or non-English (novel) syllables and found greater activation in the anterior insula during the processing of novel syllable sequences than of native ones. A single-case study (Hiraga et al., 2010) provides further evidence on the role of the insula in accent processing, this time in the context of pure dysarthria. Dysarthria is characterized by deficient articulation resulting from reduced motor strength and motor coordination, and/or from defects of the articulatory apparatus (Goodglass, 1993). The authors reported on a 72-year-old male who suffered posterior insular damage with dysarthria and no aphasia. Although limited to a single case, this study points to the role of the insula in the processing of accent.

Studies on FAS have been very important in pointing to the role of specific networks in the processing of accent. FAS is characterized by pronunciation alterations that make the native speaker of a given language sound like a foreign speaker. These alterations, at least in English, include syllable-timed speech rhythm instead of stress-timed speech rhythm, the insertion of epenthetic vowels that change syllable structure, tense vowel systems in place of tense/lax systems, and sentential intonation patterns with rising contours (Blumstein and Kurowski, 2006). This condition does not include phonological errors, and it is distinct in both its characteristics and underlying mechanisms from AOS, dysarthria and aphasic speech output (Blumstein and Kurowski, 2006). Berthier et al. (1991) discussed four cases of FAS with reference to all 10 case reports since 1919 and concluded that, for the disorder to be diagnosed as FAS, the co-occurrence of segmental and prosodic deficits is essential. They associated FAS with damage to the precentral gyrus, with better recovery being observed following premotor damage. The collective evidence across a variety of clinical populations shows the role of a set of areas in accent processing; these include Broca's area (frontal operculum and posterior third of the inferior frontal gyrus), the premotor cortex, the striatum, the insula, the pallidum, the thalamus, as well as white-matter pathways of the internal capsule, all typically on the left side in right-handed patients (Kurowski et al., 1996; Gurd and Coleman, 2006; Mariën et al., 2006, 2009; Scott et al., 2006; Kuschmann et al., 2012).

Functional neuroimaging evidence on FAS comes from comparisons between FAS patients and controls performing a variety of language tasks (Fridriksson et al., 2005; Poulin et al., 2007; Katz et al., 2012; Moreno-Torres et al., 2013; Tomasino et al., 2013). Moreno-Torres et al. (2013) reported on a middle-aged bilingual woman with chronic FAS, characterized by deficits including changes in linguistic and emotional prosody, as well as a lack of motivation to communicate. Magnetic resonance imaging (MRI) showed bilateral lesions, particularly in the left deep frontal operculum and dorsal anterior insula. In addition, diffusion tensor imaging (DTI) and tractography suggested disrupted connectivity between the left deep frontal operculum and the anterior insula. Positron emission tomography (PET) showed decreased activation in Brodmann's areas 4, 6, 9, 10, 13, 25 and 47, in the basal ganglia, and in the anterior cerebellar vermis. The authors (Moreno-Torres et al., 2013) argue that the ensemble of the neurofunctional and neuroanatomical evidence from this single-case report suggests that FAS entails altered planning and execution of speech production, with both cognitive control and emotional communication dimensions. Moreover, this report shows the key role played by the insula-frontal operculum circuit in the processing of accent. In another study using functional MRI, Katz et al. (2012) reported the activation maps related to a picture-naming task in an English-speaking woman with FAS of unknown etiology. The activations included the left superior temporal and medial frontal structures, bilateral subcortical structures and thalamus, the left insula and the left cerebellum. Similarly, in their PET study, Tomasino et al. (2013) compared the accent of a patient suffering from FAS secondary to damage to the putamen with that of a group of healthy controls, in the context of counting, sentence and pseudoword production, and picture naming. Compared with healthy subjects, the patient showed increased activation in the pre/postcentral gyrus and ventral angular gyrus. The authors conclude that FAS results from an impairment of the feed-forward control commands, in particular of the articulator velocity and position maps (Tomasino et al., 2013). Another PET study, by Poulin et al. (2007), examined FAS in a case of bipolar syndrome and reported hypometabolism in the frontal, parietal and temporal lobes bilaterally, as well as focal damage in the left insular and anterior temporal cortex, thus pointing to the role of the anterior temporal gyrus and the left insula in accent processing. Finally, Fridriksson et al. (2005) report the case of a stroke patient with damage in the putamen and extending fiber tracts, showing symptoms of FAS. Concurrently with impaired motor speech regulation, fMRI results with an overt picture-naming task showed significant activation of the superior temporal and inferior frontal lobes, as well as of the inferior motor strip (face region) and the lateral occipital gyri. 
The authors (Fridriksson et al., 2005) argued that the lesion resulted in apraxia and FAS symptoms as a consequence of increased reliance on motor execution, as reflected by the activation of the motor cortex. Another possible interpretation is that damage to the fiber tracts disconnected this circuit from the insula, thereby leading to the reported FAS symptoms.

Despite the interest of the previous studies, it is difficult to draw any strong conclusions about the neural basis of accent from the activation patterns they report. The activation maps observed in these patients are not exclusive to accent processing, but reflect a variety of task processing components. Also, given that brain damage disrupts complex brain circuits and leads to symptoms that reflect both damage and compensation for damage, it is not possible to draw conclusions regarding the areas or set of areas specifically related to accent processing. In this regard, studies with healthy participants, and in particular studies with second language learners, could open a window onto the normal neural mechanisms underlying the production of a foreign accent. In particular, fMRI studies on cognate learning in healthy adults can shed light on the neural basis of accent processing. Cognates share phonological and semantic features across languages, and thus they are easier and faster to learn than non-cognates, which share semantics only, and clangs, which share phonology but not semantics (De Groot, 1992, 1993; Sánchez-Casas et al., 1992; Ellis and Beaton, 1993; Kroll and Stewart, 1994; De Groot and Keijzer, 2000; Hall, 2002; Sánchez-Casas et al., 2005; Christoffels et al., 2007). Moreover, once the learning of cognates is consolidated, they are processed almost like mother-tongue words (Perani et al., 1996; De Bleser et al., 2003). Still, subtle differences in the pronunciation of cognates at the level of intonation, prosody, and articulation placement lead to what we perceive as accent, which makes cognates good candidates for isolating the neural markers of foreign accent in the healthy brain.

In the present study, we examined 12 healthy Spanish-speaking adults, who learnt French cognates by means of a computerized training program. They were trained for 4 weeks, until they attained a perfect score in picture naming of L2 cognates, after which they were tested on picture naming of cognates during fMRI scanning, both in L2 and their mother tongue.

#### MATERIALS AND METHODS

#### Experimental Design

This was an event-related fMRI group study on cognate naming. Behavioral and event-related fMRI measures were collected after 4 weeks of cognate naming training, when participants attained a 100% success rate in naming. The study included a pre-experimental assessment of bilingualism and cognitive status, and the completion of a computerized vocabulary-learning program. Participants received a short training on the use of the program, followed by self-training, for 15 min a day during 30 days. When participants attained a 100% success rate on naming, they were tested on naming the object pictures in L2 (French), and in their mother tongue (Spanish). Accuracy rates (ARs), accent judgment ratings and response times (RTs) were collected, and activation maps relative to event-related BOLD responses were extracted. More details about participants, stimuli, training program and the task follow.

#### Participants

Twelve Spanish-speaking (L1) adults (40.7 ± 13.0 years, range 26–66; six males, six females), with no history of neurological or neuropsychological disorders, participated in our study (refer to **Table 1** for details). All participants were right-handed, as measured by the Edinburgh Handedness Inventory (Oldfield, 1971), and were homogeneous in terms of their educational background (15.8 ± 1.5 years, range 12–18). They were matched for an elementary level of French: all were recruited from the elementary-level immersion courses offered by the Quebec government for immigrants who were shown by a placement test to have no knowledge of French. Given that students pass standardized placement tests and a thorough interview to be admitted to these courses, this ensured an equal amount of exposure to L2 at recruitment and an equivalent level of L2 knowledge. In addition, baseline L2 proficiency was determined by means of a questionnaire based on the work of Paradis and Libben (1987), Flege (1999), and Silverberg and Samuel (2004), used in previous studies in our laboratory (Raboyeau et al., 2010; Scherer et al., 2012; Marcotte and Ansaldo, 2014). This modified version of the questionnaire gives information on the participants' level of proficiency, L2 language use as well as the duration of the L2 courses (refer to **Table 2** for details). All the participants reported minimal exposure to French at home as well as outside the home. The participants were highly motivated toward the French language course as a means of successful participation in different social settings and in the community through the use of language. Further, participants were tested on their knowledge of the experimental stimuli before they experienced any lexical learning in L2; being able to name more than 15% of the stimuli, thus five stimuli, was considered an exclusion criterion.

TABLE 1 | Participants' demographic data and neuropsychological test results including: Montreal Cognitive Assessment (MOCA) Memory test (Nasreddine et al., 2005); Memory and Learning Test (Grober and Buschke, 1987; Grober et al., 1988), and the Attention and inhibition Stroop test (Beauchemin et al., 1996).

TABLE 2 | Information on the participants' knowledge of L2 (French) at baseline.

Cognitive factors that may have an influence on L2 vocabulary learning were controlled by means of a series of tests. Mild cognitive impairment was ruled out by means of the Montreal Cognitive Assessment (MOCA, Nasreddine et al., 2005). Only participants with a score of 26 or more (29.8 ± 0.77) were included. The Memory and Learning Test (Grober and Buschke, 1987; Grober et al., 1988) controlled for memory and learning skills, namely participants' free recall skills (27.25 ± 4.24 raw score) and category recall skills (9.9 ± 2.0 raw score), and the nonverbal Stroop test (Beauchemin et al., 1996) controlled for attentional and executive function abilities: color-time (17.6 ± 6.01 s), word-time (16.2 ± 5.6 s) and word-color (22.3 ± 6.7 s). Participants who performed above cut-off levels for this battery were recruited to participate. After completing the pre-experimental assessment, participants were enrolled in a computerized lexical-training program in French.

#### TABLE 3 | List of Spanish–French cognates.

#### Stimuli

The experimental list included 35 Spanish-French Cognates (*N* = 35; e.g., Téléphone /telefOn/, French, and Teléfono /telefOnO/, Spanish; both words meaning 'telephone'), and a set of color photographs depicting each of them (refer to **Table 3** for complete list of stimuli). Stimuli were matched across languages for: visual complexity, object and word familiarity, lexical frequency, word length, and number of phonemes and syllables. An equal number of items were selected for animals, fruits and vegetables, clothes and accessories, stationery, and household objects to control for a possible category effect (Caramazza and Shelton, 1998). Twenty distorted images were used as the control condition for fMRI scanning.

#### Lexical Training Program

Similar to our previous studies (Raboyeau et al., 2010; Ghazi-Saidi et al., 2013), participants completed a computerized self-training to learn 35 cognates and their native-like pronunciation in Canadian French (Québécois). Participants trained for 15 min a day during 4 weeks. With the computer software, the target picture is displayed on the screen, followed by a series of phonological cues, displayed under the target picture when an icon is pressed. The first cue is the first sound of the target word, followed by the first and second sounds, and finally the whole target word. Participants were instructed to look at the picture, and listen to the first cue, then to the second cue, and then to the whole word. They were allowed to repeat this procedure as many times as necessary to learn the word. In their subsequent practice sessions, participants would first try to name the object when they saw the target picture; if unsuccessful, they would press on the icon and listen to the first cue; if they failed to recall the name of the object, they would listen to the second cue, and if necessary to the whole word. Participants were asked to make an effort to pronounce the word as similarly as possible to the native pronunciation. Thorough instructions were given to the participants at the beginning of the experiment; compliance with all instructions was checked with each participant, by phone and by e-mail, every 2–3 days. Participants were fully committed to respecting the 15-min training routine.

#### Experimental Task and Procedure

After the consolidation, participants were tested on an overt picture-naming task during fMRI scanning. The task instruction was to look at colorful photos of objects, name the object, and pronounce the target word as closely as possible to the French native model they had been practicing, and to say dido (a pseudo-word in Spanish, French, and English) in response to seeing distorted images. The task was performed both in L2 (French) and in L1 (Spanish). The event-related experimental design included two runs. In Run 1, participants were asked to name the cognate pictures in L2, and to do so as closely as possible to the native accent, whereas in Run 2, they were asked to name the same pictures in their mother tongue. The procedure and task were practiced in the fMRI simulator for optimal data acquisition conditions in the fMRI scanner. Stimuli were displayed by means of a computer equipped with Presentation software v.11.2<sup>1</sup>. Participants lay on their back with their head fixed by cushions and belts, and an fMRI-compatible microphone (MRConfon Optical microphone) was placed close to the participant's mouth to record responses. No bite-bars were used, to allow accurate articulation and also considering that the evidence does not support the use of this device, as it may add extra inconveniences for the participants and thus affect their attention and performance (Heim et al., 2006). Rigid-body head movements were corrected with online movement correction. Before the naming task, and as practiced in the simulator, participants were instructed to look at the computer screen and name aloud each of the pictures presented to them as accurately and as quickly as possible. These pictures were the same as those used in the training phase (*N* = 35 stimuli), presented randomly by means of Presentation v11.2. Each picture was presented for 4 s, after which there would be a blank screen for a randomized interval of 4600–8600 ms, then the next picture would be presented. Oral responses were acquired with the fMRI-compatible microphone and Sound Forge software (Sonic Foundry, Madison, WI, USA). Following our previous studies (Raboyeau et al., 2010; Ghazi-Saidi et al., 2013), we used a variable inter-stimulus interval (ISI) to assure a better sampling of the hemodynamic response and prevent attentional bias (Huettel et al., 2004).
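The randomized presentation with a jittered ISI can be illustrated with a minimal sketch. It assumes the values given in the task description (35 pictures, 4 s per picture, a blank interval drawn from 4600–8600 ms) and a uniform jitter distribution, which the text does not specify; the function name and the seed are hypothetical and are not part of the Presentation script used in the study.

```python
import random

def build_schedule(n_items=35, stim_ms=4000, isi_range_ms=(4600, 8600), seed=0):
    """Return (item_index, onset_ms, isi_ms) tuples for one run.

    Order is randomized and each ISI is drawn uniformly from isi_range_ms
    (the uniform distribution is an assumption made for illustration).
    """
    rng = random.Random(seed)
    order = list(range(n_items))
    rng.shuffle(order)

    schedule = []
    t = 0  # running clock in milliseconds
    for item in order:
        isi = rng.randint(*isi_range_ms)
        schedule.append((item, t, isi))
        t += stim_ms + isi  # next picture follows the blank interval
    return schedule

schedule = build_schedule()
last_item, last_onset, last_isi = schedule[-1]
run_length_s = (last_onset + 4000 + last_isi) / 1000
print(f"{len(schedule)} trials, run length about {run_length_s:.0f} s")
```

Because the ISI is not a multiple of the repetition time, stimulus onsets fall at varying points within the acquisition cycle, which is what allows the hemodynamic response to be sampled at different delays across trials.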

#### fMRI Parameters

Acquisition parameters were the same as in previous studies in our laboratory (Raboyeau et al., 2010; Ghazi-Saidi et al., 2013). The acquisition included 28 slices in the axial plane, so as to scan the whole brain, including the cerebellum. Sequential slices were collected, to avoid the striping that might happen because of certain types of head motion (Siemens 3T Scanner User Training: Supporting Information and FAQ).

Stimulus presentation time was 4500 ms, with a variable ISI (between 4325 and 8375 ms), TR = 3 s, TE = 40 ms, matrix = 64 × 64 voxels, FOV = 24 cm, and slice thickness = 5 mm. A high-resolution structural scan was obtained during the two functional runs (naming in L1 and naming in L2), using a 3D T1-weighted pulse sequence MPRAGE (TR = 2.3 ms, TE = 4.92 ms, angle = 25°, 76 slices, matrix = 256 × 256 mm, size = 1 mm × 1 mm × 1 mm, FOV = 28 cm).

#### Ethical Issues

This study was approved by the ethics committee of Réseau de Neuroimagerie du Québec (RNQ). All participants signed a consent form. The procedure was explained clearly to the participants. All data were recorded in the Unité neuroimagerie fonctionnelle (UNF) at the Institut de Gériatrie de Montréal (IUGM). Appendix 1 includes the UNF screening form and can be found in the online version, at http://dx.doi.org/10.1016/j.bandl.2012.11.008.

#### Data Analysis

#### Demographic and L2 Knowledge Related Patterns

A jury of three Canadian French native speakers rated the degree of native-likeness of word pronunciations following cognate learning. Six raters were initially recruited (two women and four men, aged between 22 and 48). All raters were born and raised in Quebec, and were native speakers of Canadian French. Raters were asked to answer a questionnaire on their demographic information and their French knowledge. However, only three raters (two men and one woman) provided a complete rating of the accent characteristics of the participants; the three others were excluded, as they had not rated all of the stimuli, or had found it difficult to listen to some of the recordings.

The rating procedure was based on the procedure used by Piske et al. (2001). All naming responses in L2 were recorded for each participant. Each rater was given a scale (see Appendix 2) and asked to listen to the responses and rate how native-like each participant's accent was. The instruction read as follows: please circle the value that you give to each participant, on the scale. On this scale 1 is very foreign and 9 is native-like. Raters rated each participant individually on a scale of 1–9, with one corresponding to a heavy foreign accent and nine to being perceived as a Canadian French native speaker. Appendix 2 contains the questionnaire and the scale filled in by the Canadian French native raters.

#### Behavioral Data Analysis

The event-related design allowed discriminating between correct and incorrect responses. Response times (RTs) and ARs were calculated. Non-responses, Spanish words, and phonological errors (e.g., /pi/ instead of /pie/) were considered wrong answers, and thus not included in further analysis. Statistical analyses of ARs and RTs for Cognates and for the pseudo-word were performed with SPSS, version 17.0.
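The scoring rule above can be sketched as follows. The trial records, item labels, and category names are hypothetical and serve only to illustrate how non-responses, Spanish words, and phonological errors would be excluded before computing the AR and the mean RT; this is not the authors' SPSS procedure.

```python
from statistics import mean

# Hypothetical trial records: (item label, response category, RT in seconds).
# RT is None when no response was produced.
trials = [
    ("item_01", "correct", 1.42),
    ("item_02", "phonological_error", 1.90),  # e.g., /pi/ instead of /pie/
    ("item_03", "correct", 1.75),
    ("item_04", "spanish_word", 1.30),
    ("item_05", "no_response", None),
]

def score(trials):
    """Return (accuracy rate in percent, mean RT over correct trials)."""
    correct = [t for t in trials if t[1] == "correct"]
    accuracy = 100.0 * len(correct) / len(trials)
    # Mean RT is undefined when no trial is correct.
    mean_rt = mean(t[2] for t in correct) if correct else None
    return accuracy, mean_rt

accuracy, mean_rt = score(trials)
print(f"AR = {accuracy:.1f}%, mean RT = {mean_rt:.3f} s")
```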

#### Neuroimaging Data Analysis

BOLD responses were analyzed for the correctly named items following the data analysis plan of our previous work (Raboyeau et al., 2010; Ghazi-Saidi et al., 2013; Marcotte and Ansaldo, 2014). Neuroimaging data were analyzed with Statistical Parametric Mapping-8 *(SPM-8, Wellcome Trust Centre for Neuroimaging, Department of Cognitive Neurology, London, UK)*, implemented in Matlab *(MathWorks Inc., Sherborn, MA, USA)*<sup>2</sup>. Data analysis was performed individually first, and then within the group of participants. Slice timing, realignment, normalization, and segmentation were performed first. Images were spatially smoothed with an 8-mm Gaussian filter. Only BOLD responses for correctly retrieved words were included in the analysis. For each participant and for the whole group, task-related BOLD changes were examined by convolving a vector of the stimulus onsets with a hemodynamic response function (HRF) and its temporal derivative. Statistical parametric maps were obtained for each individual subject by applying linear contrasts to the parameter estimates for the events of interest (the correct responses); this resulted in a t-statistic for every voxel.

<sup>1</sup>http://www.neurobs.com

<sup>2</sup>www.fil.ion.ucl.ac.uk/spm/

Group averages were calculated with one-sample *t*-tests for Cognates minus the control condition (i.e., Cognates – dido), with a cluster size (*k*) larger than 20 voxels and *p <* 0.001. Further, a direct contrast was performed to examine the neural substrate that characterized the processing of accent (Cognate L2 vs. Cognate L1); clusters were considered significantly activated when they were larger than 15 voxels (*k >* 15), with the *p*-value set at 0.001.
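The first-level regressors described in the data analysis plan (stimulus onsets convolved with an HRF and its temporal derivative) can be illustrated with a minimal sketch. It assumes a common double-gamma approximation of the canonical HRF (peak shape 6, undershoot shape 16, ratio 6) and a finite-difference derivative; these parameters, the function names, and the 0.1 s time grid are assumptions chosen for illustration and do not reproduce SPM's implementation. Only TR = 3 s is taken from the acquisition parameters.

```python
import math

TR = 3.0  # repetition time in seconds, as reported for the functional runs

def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

def hrf(t, peak_shape=6.0, under_shape=16.0, ratio=6.0):
    """Double-gamma HRF approximation (parameters are assumptions)."""
    return gamma_pdf(t, peak_shape) - gamma_pdf(t, under_shape) / ratio

def convolve(signal, kernel):
    """Causal discrete convolution truncated to len(signal)."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j >= len(out):
                break
            out[i + j] += s * k
    return out

def build_regressors(onsets_s, n_scans, tr=TR, dt=0.1, kernel_s=32.0):
    """Return (canonical, derivative) regressors sampled once per scan."""
    n_fine = int(round(n_scans * tr / dt))
    sticks = [0.0] * n_fine  # one unit impulse per stimulus onset
    for onset in onsets_s:
        idx = int(round(onset / dt))
        if 0 <= idx < n_fine:
            sticks[idx] += 1.0
    n_kernel = int(kernel_s / dt)
    kernel = [hrf(i * dt) for i in range(n_kernel)]
    # Temporal derivative approximated by a forward finite difference.
    dkernel = [(hrf(i * dt + dt) - hrf(i * dt)) / dt for i in range(n_kernel)]
    step = int(round(tr / dt))
    return convolve(sticks, kernel)[::step], convolve(sticks, dkernel)[::step]

canon, deriv = build_regressors([0.0], n_scans=12)
print([round(v, 3) for v in canon])
```

The two resulting columns would enter the design matrix for each run; the derivative column absorbs small shifts in response latency, which is relevant for overt naming, where response onset varies from trial to trial.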

# RESULTS

# Behavioral Results with Cognate Learning

The mean AR for naming cognates in L2 was 85.9 (*SD* = 1.4). Correct responses for naming L2 Cognates, in the scanner, included an average of 30 items (maximum = 33, minimum = 28). Further, there was no significant difference in the RTs (in seconds) for naming Cognates in L2 (*M* = 1.81, *SD* = 0.64) and Cognates in L1 (*M* = 1.61, *SD* = 0.4); *t* (0.93) = 0.21, *p* = 0.36.
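A comparison of L2 and L1 naming times within the same participants is naturally expressed as a paired *t*-test. The sketch below assumes a paired design (the text does not state which test variant was run in SPSS) and uses made-up RT values for illustration only; with 12 participants such a test would have 11 degrees of freedom.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # t = mean difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical per-participant mean RTs in seconds (not the study's data).
rt_l2 = [1.9, 1.6, 2.4, 1.2, 1.8, 2.0]
rt_l1 = [1.7, 1.5, 2.1, 1.3, 1.6, 1.9]
t_stat, df = paired_t(rt_l2, rt_l1)
print(f"t({df}) = {t_stat:.2f}")
```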

#### Accent Analysis Results

The jury of native speakers (raters) considered that participants showed a heavy foreign accent when producing learnt cognates (*M* = 3.1; *SD* = 1.4, on a scale of one to nine, where a score of 1 corresponds to the perception of a strong foreign accent, and a score of 9 corresponds to the perception of a native accent). Agreement among the three raters was 52% (κ = 0.346), indicating fair agreement (Landis and Koch, 1977).
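Chance-corrected agreement among more than two raters is often computed with Fleiss' kappa. The text does not say which kappa variant was used, so the sketch below is an assumption, and the ratings are made up for illustration. The interpretation bands follow Landis and Koch (1977).

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each rated by the same number of raters.

    Undefined (division by zero) when every rating falls in a single category.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    p_i_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        totals.update(counts)
        # Proportion of agreeing rater pairs for this subject
        p_i_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = p_i_sum / n_subjects  # observed agreement
    p_e = sum((c / (n_subjects * n_raters)) ** 2 for c in totals.values())  # chance
    return (p_bar - p_e) / (1 - p_e)

def landis_koch(kappa):
    """Verbal label for a kappa value (Landis and Koch, 1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"

# Hypothetical ratings: four participants, three raters, two rating categories.
ratings = [[1, 1, 2], [2, 2, 2], [1, 2, 2], [1, 1, 1]]
kappa = fleiss_kappa(ratings)
print(f"kappa = {kappa:.3f} ({landis_koch(kappa)})")
```

Under this labeling, the reported value of κ = 0.346 falls in the "fair" band, consistent with the interpretation given above.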

## Neuroimaging Results

The fMRI contrast between L2 Cognates and Dido (i.e., Cognate L2 – Dido) for naming images in L2 (French) showed significant activation in the left Middle occipital gyrus, the left Lingual gyrus, the left Inferior frontal gyrus, the left Precentral gyrus, the right Middle occipital gyrus, the right Parahippocampal gyrus and the right Cerebellar tonsil.

T-contrast fMRI analysis (i.e., Cognates L2-Cognates L1) showed a single significant activation, located in the left Insula.

**Table 4** summarizes the details of these activations and **Figures 1** and **2** show these activations.

# DISCUSSION

Studies on accent processing have mostly focused on clinical populations, with a variety of clinical conditions. The evidence from these studies provides some important clues regarding the neurobiology of accent. However, the complexity of the clinical conditions and the variety of lesion types, sizes and methods used to examine those cases preclude any generalization regarding the neural basis of accent. Moreover, given that clinical signs may reflect not only the effect of damage, but also compensatory mechanisms put into play after damage, and considering that most of this literature concerns single case reports, the results of this research require cautious interpretation.

The neural basis of accent processing can be further examined by looking at healthy populations learning a second language, in particular, learning cognates, words that share phonological and semantic features across languages, but still offer the opportunity to examine the impact of accent-related differences at the segmental or suprasegmental levels. In the present study, we examined a group of Spanish-speaking (L1) adults who learnt French (L2) cognates by means of an audiovisual computerized method; after cognate picture naming was consolidated, that is, when participants attained the maximum score on picture naming of cognates, a test on L1 and L2 oral cognate naming during fMRI scanning was performed. Participants were instructed to respect the native accent in each language as much as possible.

TABLE 4 | T-contrast fMRI analysis of Cognates L2 vs. Control condition (i.e., Cognates L2 - dido), and the direct contrasts of Cognates (i.e., Cognates L2-Cognates L1), (*k >* 15, *p <* 0.001).

*Significantly activated areas resulting from comparing naming Cognates L2 (French) with the control condition (dido) (k > 15, p < 0.001). BOLD responses showed activation of the left Middle occipital gyrus, the left Lingual gyrus, the left Inferior frontal gyrus, the left Precentral gyrus, the right Middle occipital gyrus, the right Parahippocampal gyrus and the right Cerebellar tonsil. T-contrast fMRI analysis (i.e., Cognates L2-Cognates L1) showed a single significant activation, located in the left Insula.*

FIGURE 1 | Significant BOLD signal increase (cluster size (k) superior to 20 voxels and *p <* 0.001) related to the simple contrast with naming Cognates, which activated the left Middle occipital and Lingual gyri (BA37 and BA19), the Inferior frontal gyrus (BA 46 and BA 47), the left Precentral gyrus (BA6 and BA9), the right Middle occipital gyrus and the right Parahippocampal gyrus (BA19), and the right cerebellum. Statistical parametric maps are overlaid onto the average T1-weighted anatomy of all subjects (*n* = 12). Activation related to only one layer is presented, thus many activations may not be seen on this image.

each (Cognates – Non-cognates) and (Cognates L2 – Cognates L1) yielded to significant activation in the left insula (BA 44).

Behavioral results showed that mean ARs and RTs did not differ across L1 and L2, which suggests consolidated learning of L2 cognates. However, a jury of native speakers perceived participants' L2 accent as foreign, as rated on a scale of 1–9, where nine corresponds to being perceived as a Canadian French native speaker (*M* = 3.1, *SD* = 1.4). This shows that regardless of the consolidation of L2 lexical learning at the phonological and semantic levels, participants' accent is perceived as foreign. Before cognate learning, participants perceived their accent in French as 'discrete' as opposed to 'heavy' or non-existent. The fact that participants did not find their accent heavy even before training, while raters perceived a heavy foreign accent following training, indicates that L2 speakers and native-speaker listeners may have different perceptions regarding accent (Yi et al., 2014). The reasons why this is so are difficult to tease apart, and may include motivation, awareness, and expectancy-related factors. However, given that the average age of participants in this study was 43 years, the results can be interpreted within the context of the critical period hypothesis (e.g., Long, 1990; Bongaerts et al., 1995; Birdsong, 1999; Singleton, 2003). According to this hypothesis, the capacity to discriminate novel sounds is limited to a critical period, which ends between 6 and 12 months of age (Kuhl et al., 2003; Houston et al., 2007), and after which learners become less sensitive to differences between their productions and the native accent (Long, 1990; Bongaerts et al., 1995; Birdsong, 1999; Singleton, 2003). Lack of awareness leads to persistence of a foreign accent, regardless of high proficiency in naming, as reflected in this study by equivalent RTs and ARs in naming L1 and L2 Cognates.

The fMRI data showed significant activations in a number of motor processing and control areas. Specifically, the contrast (Cognate vs. Dido) showed significant activation in the left Middle occipital gyrus, the left Lingual gyrus, the left Inferior frontal gyrus, the left Precentral gyrus, the right Middle occipital gyrus, the right Parahippocampal gyrus, and the right Cerebellar tonsil. These brain areas have been reported to sustain cognate processing in previous work by our team and others (De Bleser et al., 2003; Abutalebi, 2008; Raboyeau et al., 2010; Ghazi-Saidi et al., 2013; Marcotte and Ansaldo, 2014), and their role in motor processing (i.e., premotor cortex and supplementary motor areas; Raboyeau et al., 2010), attentional processing (i.e., anterior cingulate cortex, caudate nucleus, prefrontal cortex; Abutalebi, 2008), and word comprehension (i.e., anterior inferior temporal regions; De Bleser et al., 2003) has been consistently documented in healthy adult second language learners. Further, evidence from clinical data emphasizes the role of these areas in various lexical, motor and attentional processes. Interestingly, significant activations in a similar set of areas have been reported in studies on patients with FAS (Fridriksson et al., 2005; Poulin et al., 2007; Katz et al., 2012; Moreno-Torres et al., 2013; Tomasino et al., 2013), as has damage to these areas in FAS patients (Kurowski et al., 1996; Mariën et al., 2006, 2009; Gurd and Coleman, 2006; Scott et al., 2006; Kuschmann et al., 2012). Finally, in a review, Carbary et al. (2000) conclude that FAS is typically associated with damage in the left pre-central gyrus and inferior frontal gyri, the basal ganglia, and the insular cortex, which are similar to the areas reported in the fMRI studies on healthy participants, specifically focusing on the bilingual lexicon through cognate processing (Carbary et al., 2000; Scott et al., 2006; Ghazi-Saidi et al., 2013; Marcotte and Ansaldo, 2014). Our results provide a supplementary source of evidence for the role of these areas in healthy participants learning a second language vocabulary.

As for the contrast (L2 vs. L1 Cognate Naming), it highlighted the specific feature distinguishing L2 from L1 cognate naming, and this corresponded to a single significant activation in the left insula. Cognates share phonological and semantic features with the mother tongue, but they differ in terms of prosody, intonation and articulation placement, all of which are essential components of accent. Accordingly, the significant activation in the insula reported here specifically reflects the accent component of L2 picture naming. In the next section we discuss the specific role of the insula in accent processing.

#### The Role of Insula in Accent Processing

Given its location in the brain, the role of the insula in language processing was mostly examined in lesion studies. With the advancement of functional neuroimaging techniques, we now have access to literature that is not more than a decade old (Bamiou et al., 2003; Ackermann and Riecker, 2004; Menon and Uddin, 2010; Sterzer and Kleinschmidt, 2010; Damasio et al., 2013; Oh et al., 2014).

In line with Menon and Uddin (2010), the evidence reported in the present study stresses the role of the insula in detecting salient events, specifically L2 accent patterns, which requires coupling attention, working memory, and motor planning for L2 word production. Moreover, in line with previous accounts of the role of the insula in vocal tract manipulation for articulation and phonation (Ackermann and Riecker, 2004) and auditory processing (Bamiou et al., 2003), the significant activation of the insula observed in the present study can also be related to adjustments of the vocal tract with the purpose of optimizing accent in L2.

From a broader cognitive perspective, Damasio et al. (2013) attribute to the insula a role in higher order cognitive and emotional processing, including subjective feelings from the body, and processing uncertainty. In line with this view, we believe that the activation of the insula in the context of persistent foreign accent can be related to higher order processes (Moyer, 2013) involved in the ability to recognize, comprehend and integrate the segmental and suprasegmental levels of phonology with the purpose of achieving optimal word production. The insula's role in higher order speech and language processing can be related to its highly connected network, with speech, language, and executive function centers in the brain (Oh et al., 2014), which facilitates the integration of a large variety of cognitive processes, ranging from motor to executive function, that are put into play to achieve native-like pronunciation, even when target words are L2 cognates.

The significant activation of the left insula in the context of novel syllable processing has also been reported both with healthy populations (Chee et al., 2004) and in cases of AOS (Moser et al., 2009). Also with brain-damaged populations, the insula's role in accent processing has been documented in cases of FAS (Gurd and Coleman, 2006; Mariën and Verhoeven, 2007; Moreno-Torres et al., 2013; Tomasino et al., 2013), a finding that has led a number of authors to hypothesize on the role of the insula in accent processing (Moonis et al., 1996; Carbary et al., 2000; Ackermann et al., 2004; Gurd and Coleman, 2006; Scott et al., 2006). The present study provides direct evidence for this hypothesis with healthy second language learners.

Moreover, the activation of the insula in the context of L2 lexical learning has been reported in previous work by our team and others (De Bleser et al., 2003; Chee et al., 2004; Abutalebi, 2008; Ghazi-Saidi et al., 2013; Marcotte and Ansaldo, 2014), and its activation has been found to be positively correlated with decreased anterior cingulate and anterior frontal activation (Chee et al., 2004). Thus, Chee et al. (2004) argue that the insula plays a role in sensory-perceptual processing. Also particularly relevant to the present study is the evidence of the insula's role in sub-vocal rehearsal of speech contrasts (Fiez et al., 1996; Smith et al., 1998; Callan et al., 2003, 2004), a finding that is in line with the DiVA Model (Guenther, 2006), which highlights the role of the insula in the selection of speech sound maps, thanks to its connection with the premotor and motor cortices.

Finally, from a more general cognitive perspective, the significant activation of the insula has also been related to self-consciousness in the sense of agency, namely, experiencing oneself as being the cause of an action (Farrer and Frith, 2002). In the context of this study, the insula may as well be supporting awareness and regulation of accent features in L2 cognate production. Moreover, the insula is particularly involved in cognitive control (e.g., Menon and Uddin, 2010; Nelson et al., 2010; Tops and Boksem, 2011) and has been shown to be sensitive to salient events (Menon and Uddin, 2010); this finding, together with its strong connectivity to the motor cortex (Ackermann and Riecker, 2004), argues in favor of a particularly important role of the insula when trying to attain a native-like accent. The sensitivity to distinct accent features coupled with access to motor programming structures allows for feedback and feed-forward mechanisms at the core of L2 accent production.

# CONCLUSION

The results of the present study show that the production of a native-like accent remains challenging and cognitively effortful, even when L2 words share phonology and semantics with L1 equivalents, and despite the fact that vocabulary learning is consolidated. In line with clinical reports on FAS, and functional neuroimaging studies on accent production in healthy populations, the important role of the insula in accent processing may be related to a number of high-order and highly accent-specific processing features ranging from self-awareness and monitoring, to vocal tract control, and sub-vocal rehearsal of phonemic sequences.

The evidence provided by the present study is specific, as for the first time the role of the insula in accent processing is confirmed among healthy adults tested on a type of words whose only difference across mother tongue and second language is at the level of accent, namely cognates.

#### REFERENCES


#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnhum.2015.00587


bipolar patient: a case report. *Ann. Gen. Psychiatry* 6:1. doi: 10.1186/1744-859X-6-1


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Ghazi-Saidi, Dash and Ansaldo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Processing changes when listening to foreign-accented speech

Carlos Romero-Rivas<sup>1</sup>, Clara D. Martin<sup>2,3</sup> and Albert Costa<sup>1,4</sup>\*

<sup>1</sup> Speech Production and Bilingualism, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain, <sup>2</sup> BCBL – Basque Center on Cognition, Brain and Language, San Sebastian, Spain, <sup>3</sup> IKERBASQUE, Basque Foundation for Science, Bilbao, Spain, <sup>4</sup> Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain

This study investigates the mechanisms responsible for fast changes in processing foreign-accented speech. Event Related brain Potentials (ERPs) were obtained while native speakers of Spanish listened to native and foreign-accented speakers of Spanish. We observed a less positive P200 component for foreign-accented speech relative to native speech comprehension. This suggests that the extraction of spectral information and other important acoustic features was hampered during foreign-accented speech comprehension. However, the amplitude of the N400 component for foreign-accented speech comprehension decreased across the experiment, suggesting the use of a higher level, lexical mechanism. Furthermore, during native speech comprehension, semantic violations in the critical words elicited an N400 effect followed by a late positivity. During foreign-accented speech comprehension, semantic violations only elicited an N400 effect. Overall, our results suggest that, despite a lack of improvement in phonetic discrimination, native listeners experience changes at lexical-semantic levels of processing after brief exposure to foreign-accented speech. Moreover, these results suggest that lexical access, semantic integration and linguistic re-analysis processes are permeable to external factors, such as the accent of the speaker.

#### Edited by:

Ignacio Moreno-Torres, University of Malaga, Spain

#### Reviewed by:

Nandini Chatterjee Singh, National Brain Research Centre, India William F. Katz, University of Texas at Dallas, USA

#### \*Correspondence:

Albert Costa, Speech Production and Bilingualism, Center for Brain and Cognition, Universitat Pompeu Fabra, Carrer de Tànger, 122, 08018 Barcelona, Spain costalbert@gmail.com

Received: 22 November 2014 Accepted: 11 March 2015 Published: 25 March 2015

#### Citation:

Romero-Rivas C, Martin CD and Costa A (2015) Processing changes when listening to foreign-accented speech. Front. Hum. Neurosci. 9:167. doi: 10.3389/fnhum.2015.00167

Keywords: ERPs, foreign-accented speech, adaptation, perceptual learning, lexical-semantic processing, P200, N400, P600

# Introduction

Most of the studies addressing the processes behind spoken sentence comprehension have been conducted in the context of native speech. Although this is a reasonable strategy, conversations involving at least one foreign-accented speaker are becoming frequent due to increasing interest in foreign language learning and global mobility. In this context, two questions are fundamental. First, what are the acoustic/perceptual (or "bottom up") processing challenges that listeners face when presented with foreign-accented speech? Second, to what extent are higher-level, lexical-semantic (or "top-down") processes altered for these same types of foreign-accented stimuli? The present study aims at addressing some aspects of these two issues.

Early stages of speech comprehension, in which the incoming signal is acoustically mapped onto the listener's phonological repertoire, seem to be somewhat compromised when processing foreign-accented speech (Lane, 1963; Munro and Derwing, 1995a,b; Schmid and Yeni-Komshian, 1999; van Wijngaarden, 2001). This is because the phonological properties of foreign-accented speech often depart from those of the native listener. For instance, when a target phoneme in the second language (L2) does not exist in the speaker's native language, or when it is very similar to a native phoneme, foreign-accented speakers frequently substitute the L2 sound with a native sound (Wester et al., 2007). Moreover, variation in non-native speech is not restricted to segmental information, but it is also perceptible at the suprasegmental level; that is, variation is not only present at phonological levels, but also in the speaker's pitch and intonation contour (Gut, 2012). This is important, since word and sentence stress, as well as prosody and intonational deviations, are as important to intelligibility as segmental aspects (Fraser, 2000; Jilka, 2000). In addition, non-native speakers tend to be more variable than native speakers in their pronunciation (Nissen et al., 2007; Wade et al., 2007), meaning that sometimes they succeed in producing canonical sounds and sometimes they do not (Hanulíková and Weber, 2012). Such difficulty in language production could compromise conversational partners' semantic and syntactic processes (Goslin et al., 2012; Hanulíková et al., 2012). Finally, listeners use foreign accents as cues to categorize non-native speakers, modifying their mental representation about the speaker [e.g., modifying their expectations about the grammatical well-formedness of foreign-accented speech (Hanulíková et al., 2012); and relaxing their vowel categories more readily for foreign-accented speakers than for native speakers (Hay et al., 2006)].

In this article we address two main issues related to speech comprehension in foreign-accented contexts. First, we explore whether fast adaptations at the phonetic/acoustic and/or lexical level occur during speech comprehension. In particular, we ask whether a relatively short exposure to correct sentences (later in the experiment there will be semantic violations as well) pronounced with a foreign accent is sufficient to significantly improve the comprehension of the words in such utterances. As discussed below, we take the modulation of the P200 and N400 ERP components across the experiment as an index of improvement at phonetic/acoustic discrimination and word comprehension, respectively.

Second, we explore whether semantic processing is affected after listeners have gotten better at comprehending foreign-accented speech. We address whether the N400 (also associated with difficult semantic integration during the processing of semantic violations) and the P600 components (associated with the remapping of unexpected semantic features) are modulated by foreign-accented speech when semantic violations are present. This will inform us about whether, after adapting to foreign-accented speech, semantic integration and meaning construction processes are compromised during the comprehension of foreign-accented speakers.

#### Perceptual Learning of Foreign-Accented Speech

Despite the pervasive effects of foreign-accented speech on intelligibility (misidentification of words: Lane, 1963; Munro and Derwing, 1995a,b; van Wijngaarden, 2001) and comprehension<sup>1</sup> (detecting mispronunciations and during sentence verification tasks: Munro and Derwing, 1995b; Schmid and Yeni-Komshian, 1999), native speakers improve their understanding of foreign-accented speech after brief exposure. After training with accented speech, native speakers are more accurate with the accent they were trained with in subsequent word transcription tests (Clarke, 2000; Weil, 2001; Bradlow and Bent, 2003). Clarke and Garrett (2004) presented English native speakers with English sentences uttered either by a native speaker of English or by a Spanish foreign-accented speaker of English, in the context of a probe word matching task. They observed that listeners were initially slower to respond to the Spanish-accented speech than to the native speech, but this difficulty decreased after 1 min of exposure. More recent studies have shown that this effect appears regardless of the speaker's baseline intelligibility, although the amount of exposure needed to achieve significant improvements varies depending on the strength of the accent, as well as on the listener's experience with a particular accent (Bradlow and Bent, 2008; Witteman et al., 2013). Moreover, adaptation is even present when foreign-accented speakers are inconsistent in their pronunciations (Witteman et al., 2014), but not when the accented variant forms are arbitrary instead of genuine (Weber et al., 2014). Importantly, accent learning occurs both with sentence- and word-length utterances, which suggests that listeners are sensitive to the global properties associated with accent, such as prosody and intonation contours, and also to segmental properties of speech that vary with accent (Sidaras et al., 2009).

A potential mechanism behind this adaptation is perceptual learning, a process which helps listeners to categorize ambiguous phonemes using lexical information. In an influential study, Norris et al. (2003) presented listeners with an ambiguous phoneme [?], midway between [f] and [s]. Listeners were exposed to the ambiguous phoneme in one of three training conditions (first: [?] version of 20 [f]-final words and the natural version of 20 [s]-final words; second: [?] version of 20 [s]-final words and the natural version of 20 [f]-final words; third: [?] was presented as the last phoneme in experimental non-words). After training, subjects were asked to categorize a range of ambiguous fricatives on a five step [εf]–[εs] continuum. Results showed that the categorization of these sounds shifted as a function of the training condition. That is, there were more [f] responses after exposure to ambiguous [f]-final and natural [s]-final words than after exposure to ambiguous [s]-final and natural [f]-final words (and vice versa). Most importantly, perceptual learning seemed to be absent when training with the ambiguous phoneme happened in a non-words context (see also: Davis et al., 2005). Norris et al. (2003) concluded that their results "do not show an increase in the listeners' ability to make phonetic discriminations. Instead, the results show that there was a change in the way an ambiguous phoneme was categorized, with the direction of that change determined by information that was only available from the lexicon" (p. 229). In addition, these changes generalize to words that have not been presented during the training phase (Davis et al., 2005; McQueen et al., 2006; Sjerps and McQueen, 2010). Thus, listeners would learn that the ambiguous phoneme is a representative form of the original phoneme, and this processing would be driven by lexical information.

<sup>1</sup> In terms of classic models of lexical access (e.g., Lahiri and Marslen-Wilson, 1991; Pallier et al., 2001), comprehensibility refers to bottom-up activation processes from phonetic representations up to the lexicon to retrieve a possible lexical candidate. In contrast, intelligibility refers to top-down decision processes involving lexical and pragmatic knowledge, arising from the computation of these lexical candidates.

Therefore, the first purpose of our study is to explore the temporal dynamics of the perceptual learning of foreign-accented speech (that is, the interaction between bottom-up, acoustic/phonetic processing challenges and top-down, lexical-semantic processes). Following Norris et al.'s (2003) conclusions, if perceptual learning does not entail an increase in the listeners' ability to make phonetic discriminations, then adaptation to a foreign accent should not appear during phonetic/acoustic extraction processes, but during lexical processing. For this purpose, we will explore the P200 and N400 ERP components modulation after exposure to foreign-accented speech (see below). Interestingly, Norris et al.'s (2003) results have been replicated recently using natural foreign phonemes (instead of an ambiguous phoneme artificially created in a laboratory; Sjerps and McQueen, 2010), and also in a foreign-accented context (Reinisch and Holt, 2014). This would suggest that adaptation to foreign acoustical properties might be guided, at least partially, by perceptual learning.

#### The P200 Component as an Index of Phonetic/Acoustic Processing

The P200 component (a positive deflection in the ERP wave peaking at around 200 ms after target presentation) has been often associated with the extraction of important acoustic features used for phonological and phonetic processing (Reinke et al., 2003; De Diego Balaguer et al., 2007). Interestingly, the amplitude of this component is positively correlated with the relative ease of this extraction process. For example, normal speech elicits a more positive P200 than degraded speech (Strauß et al., 2013). Hence, to the extent that extracting phonetic information from foreign-accented speech is more difficult than from native speech, one would expect the amplitude of the P200 to be smaller in a foreign-accented speech context.

In this study, we will explore whether exposure to foreign-accented speech across two experimental blocks has an effect on the extraction of the acoustic, pre-lexical features used for phonological processing. Indeed, this strategy has recently been followed by Rossi et al. (2013) in a different but related context. Rossi et al. (2013) conducted a study in which native speakers of German repetitively listened to Slovakian phonotactic regularities at the onset of pseudo-words. The results showed an increase in the amplitude of the P200 component after 3 days. However, Rossi et al. (2013) did not find differences in the amplitude of the P200 component between the pre-tests and post-tests in any of the 3 days of exposure. Since the P200 component is associated with the detection of relevant auditory cues (Reinke et al., 2003; De Diego Balaguer et al., 2007), Rossi et al.'s (2013) results suggest that listeners do not improve at detecting pre-lexical foreign features after brief exposure, but only after repetitive exposure to these pre-lexical regularities day after day. Therefore, if Rossi et al.'s (2013) observations on the learning of foreign phonotactic rules can be extended to the learning of foreign-accented speech phonetic variations, then we expect that the P200 component will not be modulated after brief exposure to foreign-accented speech. This would suggest that listeners do not improve at detecting relevant auditory cues for phonetic processing after brief exposure to foreign-accented speech.

#### The N400 and P600 Components as Indices of Lexical and Semantic Processing

The N400 component is a negative deflection in the ERP wave peaking at around 400 ms, and that usually shows a centroposterior scalp distribution. The N400 component is sensitive to a range of features such as: (a) sublexical variables of words, like orthographic similarity to other words in the language [words with more orthographic neighbors elicit larger N400s (Holcomb et al., 2002; Laszlo and Federmeier, 2011)]; (b) lexical variables, such as word frequency, or concrete vs. abstract concepts (Kroll and Merves, 1986; Smith and Halgren, 1987; Van Petten and Kutas, 1991; West and Holcomb, 2000; Gullick et al., 2013); (c) semantic relationships among words (Neely, 1991; McNamara, 2005; Van Petten and Luka, 2006); and (d) cloze probability during sentence comprehension (Kutas and Hillyard, 1984; Kutas et al., 1984; DeLong et al., 2005, 2012; Thornhill and Van Petten, 2012; Wlotko and Federmeier, 2013).

In the present study—following Norris et al.'s (2003) and Davis et al.'s (2005) conclusions—we expect that a top-down mechanism will allow listeners to retune sublexical features during foreign-accented speech. As this mechanism is supposed to be driven by lexical information, we expect to capture a reduction in the N400 component for foreign-accented speech across experimental blocks.

Regarding semantic processing, there is scarce evidence and a lack of agreement on the effects of foreign-accented speech on the N400 component. Hanulíková et al. (2012) presented listeners with sentences uttered either by a native speaker of Dutch or by a Turkish foreign-accented speaker of Dutch. They found a similar N400 effect for semantic violations (e.g., "It was very cold last night, so I put a thick blanket/evening on my bed") during both native and foreign-accented speech, although it was more widely distributed over the scalp during foreign-accented speech comprehension. Hanulíková et al. (2012) concluded that native listeners had no problem understanding foreign-accented speech, as indicated by almost equivalent electrophysiological responses to semantic violations produced by native and foreign-accented speakers. On the other hand, Goslin et al. (2012) presented listeners with correct sentences uttered by native, regional (a different dialect), and foreign speakers of English. Final words uttered by foreign-accented speakers elicited reduced N400 components when compared to both native and regional accented conditions. Goslin et al. (2012) concluded that because of the degraded signal (due to foreign accent), native listeners hearing foreign speakers would rely on top-down processes (i.e., paying more attention and placing more effort on anticipating upcoming words) in order to understand the incoming speech. That is, Hanulíková et al. (2012) and Goslin et al. (2012) reached two different conclusions. While Hanulíková et al. (2012), proposed that global meaning was not affected by foreign-accented speech, Goslin et al. (2012) suggested that listeners had to use top-down processes in order to compensate for a comprehension deficit during foreign-accented speech.

The second purpose of this study is to clarify this issue, by including semantic violations in the second part of the experiment. More concretely, we will explore whether exposure to foreign-accented speech has an effect on further linguistic processes, such as semantic integration (as indexed by the N400) and meaning re-analysis (as indexed by the P600).

The P600 component is a positive-going deflection in the ERP wave which peaks at a later time point than the N400, lasting until approximately 900 ms after word onset. The P600 is considered an index of a second stage of processing, involving a continued analysis of the current word with respect to its context and to the information stored within long-term memory (Kuperberg et al., 2011). For instance, a P600 effect is observed for words that are highly semantically implausible with respect to their context (Kuperberg, 2007; Van de Meerendonk et al., 2010), or by words that require deeper causal inferences (Burkhardt, 2006, 2007).

The present knowledge regarding the modulation of the P600 component in foreign-accented contexts is limited to Hanulíková et al.'s (2012) study. Interestingly, in their study, the P600 component was sensitive to gender agreement errors only when sentences were presented in a native accent, but not when they were presented with a foreign accent. Hanulíková et al. (2012) concluded that listeners had "learned" to be tolerant to these grammatical mistakes when presented in foreign-accented speech<sup>2</sup>.

Summarizing, the present study aims to explore two main questions. First, what are the specific adaptations that native speakers perform to deal with foreign-accented speech? More concretely, we explored whether native speakers experience a change in acoustic/phonetic processing after brief exposure to foreign-accented speech or whether, instead, the usual improvement in comprehension observed during exposure to foreign-accented speech depends on top-down, lexical-semantic processes. The second question is whether, after these adaptations are acquired, further linguistic processes, such as semantic integration and meaning re-analysis, are affected by foreign-accented speech. To address these issues, Spanish native speakers were presented with a large set of sentences produced either with a native accent or with a foreign one. In the first block of the experiment we used standard (correct) sentences (meaningful and unsurprising sentences). In the second block, standard sentences were randomly mixed with sentences containing a semantic violation. The EEG was recorded during the experiment and time-locked ERPs were explored. We focused our analysis on the P200, N400, and P600 components elicited by the first, critical and final word of each sentence (see **Table 1** for examples).

Following Norris et al.'s (2003) conclusions, if perceptual learning does not entail an increase in the listener's ability to make phonetic discriminations, we expect a lower P200 amplitude for foreign-accented as compared to native speech across the whole experiment. Moreover, if listeners retune sublexical and/or supralexical features of speech using a top-down mechanism driven by lexical information (Norris et al., 2003; Davis et al., 2005), we expect that the N400 amplitude for foreign-accented speech will decrease across experimental blocks. Also, if linguistic processes such as semantic processing and meaning re-analysis are affected by foreign-accented speech even after exposure to the accented speakers, we expect that the N400 and P600 effects for semantic violations in the second block will be modulated by the speaker's accent. Based on previous literature (Hanulíková et al., 2012), we expect an N400 effect for semantic violations distributed more widely over the scalp during foreign-accented speech comprehension, as compared to native speech comprehension. In addition, Hanulíková et al. (2012) observed a P600 effect for grammatical mistakes during native speech comprehension, an effect that was missing during foreign-accented speech comprehension. Thus, we also expect a modulation of the P600 effect depending on the speaker's accent.

#### TABLE 1 | Examples of sentences with English translation.

| Spanish sentence | English translation |
| --- | --- |
| <u>Mi</u> desayuno favorito es tostadas con mermelada y un <u>*café/hospital*</u> con mucha <u>leche</u>. | "My favorite breakfast is a toast with marmalade and a coffee/hospital with a lot of milk." |
| <u>Cuando</u> mi sobrina duerme en mi piso siempre le leo un <u>*libro/pan*</u> por la <u>noche</u>. | "When my niece sleeps in my flat I always read to her a book/bread by the night." |
| <u>María</u> tuvo que imitar a un <u>*pirata/comercio*</u> en la <u>fiesta</u>. | "María had to imitate a pirate/store in the party." |
| <u>Para</u> ir a Barcelona siempre pasamos por un <u>*túnel/piano*</u> en la <u>autovía</u>. | "Coming to Barcelona we always cross a tunnel/piano in the highway." |

ERPs were obtained during the first, critical and final word of the sentences (underlined words). Critical words are in italics. Semantic violations were only introduced during Block 2.

# Materials and Methods

#### Participants

Twenty native speakers of Spanish (12 women, all right handed, mean age = 24.1 years, range = 19–35 years) participated in this experiment in return for monetary compensation (10 €/h). Participants were mostly from Catalonia (hence, from the same dialectal variation), and Spanish was their dominant language (they would speak Spanish to their parents, and they would use Spanish >70% of the time when interacting with other people). None of them reported any hearing or neurological impairments. Before the beginning of the experiment, subjects gave their written informed consent.

#### Stimuli

The experimental stimuli consisted of a set of 208 sentences uttered by four native and four foreign-accented speakers of Spanish. Each sentence was recorded four times: a standard version spoken by one of the native speakers, a standard version spoken by one of the non-native speakers, a version containing a semantic violation in the critical word spoken by one of the native speakers, and a version containing a semantic violation in the critical word spoken by one of the foreign-accented speakers (resulting in 832 sentences; for auditory samples of some experimental sentences, see Supplementary Material). Each quartet (the four versions of each sentence) was recorded by a native and a foreign-accented speaker. Critical words were always nouns in a mid-sentence position (range = 1–5 words between critical word and final word, mean = 2.34 words; SD = 0.9), balanced for phonological length [mean for the Standard condition = 6.66 phonemes (SD = 1.91); mean for the Semantic Violation condition = 6.48 phonemes (SD = 2.11); p = 0.37] and frequency [mean for the Standard condition = 3.08 (SD = 0.58); mean for the Semantic Violation condition = 3.09 (SD = 0.58); p = 0.35]. Logarithmic values for word frequency were extracted from the SUBTLEX-ESP corpus (Cuetos et al., 2011). In addition, the critical words of each sentence in the Standard condition and the Semantic Violation condition always started with a different phoneme. The first and final words of each sentence were also analyzed (see **Table 1**).

<sup>2</sup> When focusing on the processing of semantic violations in Hanulíková et al.'s (2012) study, a positive deflection is noticeable in the ERP signal after the N400 effect for semantic violations, over the posterior electrodes, only during native speech (cf. Figure 3). This suggests that listeners might also be more tolerant of semantic mistakes when these are produced by foreign-accented speakers. However, Hanulíková et al. (2012) did not report statistical analyses on this effect.

The native languages of the foreign speakers were French, Greek, Italian, and Japanese. The decision to use these speakers was rooted in the aim to test the main effect of foreign-accented speech, independently of the native language of the foreign speakers and the similarities between Spanish and those other languages. We looked for variability in the speakers' accent on purpose, so any effect on comprehension would be due to foreign-accented speech, and not to a specific error pronounced with a specific accent (for similar methodological choices, see Schmid and Yeni-Komshian, 1999; Floccia et al., 2009)<sup>3</sup>. Nevertheless, we controlled for stress patterns in the pronunciation of foreign-accented speakers, in order to avoid effects such as those of weird stress shifts or irregular metrics on the ERP components (Rothermich et al., 2012). In order to do so, foreign-accented speakers were presented with native-accented versions of the sentences before their recordings. In any case, Reinisch and Weber (2012) showed that native listeners can adapt to stress errors produced by foreign-accented speakers after brief exposure.

Accent strength and intelligibility of the eight speakers were tested by an independent sample of 27 native speakers of Spanish (19 women, mean age = 22.93 years, range = 18–38 years). Participants in the pre-tests were also mostly from Catalonia, and Spanish was their dominant language as well (they would speak Spanish to their parents, and they would use Spanish >70% of the time when interacting with other people). These pre-tests were run in order to ensure that native and foreign-accented speakers were perceived differently, and that, beyond this difference, they were all understandable. Participants carried out two tasks. During the first task, they had to listen to the experimental sentences and rate them from 1 (native speech) to 5 (the speaker has a very strong foreign accent). For the second task, subjects had to write down the final word of each sentence (comprehension task). Regarding the first task, we carried out a repeated measures ANOVA including the within-subject factors Accent (native, foreign) and Speaker (each of the eight speakers). A significant effect of Accent was obtained [F(1, 26) = 793.93, p < 0.001], revealing that foreign speakers' accents (mean = 3.58, SD = 0.2) were evaluated as stronger than native speakers' accents (mean = 1.22, SD = 0.07). We also obtained a significant effect of Speaker [F(3, 24) = 7.03, p < 0.01], and a significant interaction between the two factors [F(3, 24) = 30.82, p < 0.01]. Planned comparisons revealed that each native speaker was rated as significantly less accented than each non-native speaker. Also, among native speakers only speaker number 2 was rated as significantly more accented than the rest. Among foreign-accented speakers only the Japanese one was rated as significantly less accented than the rest (for further details, see **Figure 1**).

Regarding the second task, participants recognized the last word of the sentences 100% of the time for both the native and the foreign-accented speakers, and did not report any difficulties in understanding the sentences. Based on these results, we can conclude that native and foreign-accented speakers were perceived differently, although all of them were understood<sup>4</sup>.

Sentences were recorded and edited with Audacity (©Audacity Team), at 44.1 kHz, 32 bits and in stereo sound. Each speaker received a list containing the experimental sentences in both versions (standard and containing semantic violations), randomized in order to avoid undesirable effects, such as different speech rates and voice intensities for first and second presentations of the same sentence context. As mentioned before, foreign-accented speakers were presented with native-accented versions of the sentences before their recordings, in order to minimize possible differences in speech rate and prosody. Nevertheless, there were differences in the mean duration of the whole sentence (native speech = 3311.24 ms, SD = 542.34; foreign accented speech = 4149.21 ms, SD = 687.64; p < 0.001), mean duration of critical words (native speech = 358.36 ms, SD = 108.93; foreign accented speech = 450.14 ms, SD = 137.95; p < 0.001), and mean duration of final words (native speech = 474.21 ms, SD = 144.33; foreign accented speech = 521.93 ms, SD = 150.67; p < 0.001). Following Goslin et al. (2012), no attempt was made to control or adjust the temporal features of the stimuli, as longer productions are an inherent part of foreign-accented speech (see also Hanulíková et al. (2012) for differences in sentences and critical words durations across accents).

In addition, we analyzed the acoustic features of the critical words using Praat (Boersma and Weenink, 2001). More concretely, we measured intensity (dB), and f<sub>0</sub> mean and range (Hz) of the critical words of the 208 experimental sentences (see Supplementary Material Figure 5, for spectrographic representations of some sentences). We expected differences between native and foreign-accented speech in f<sub>0</sub>-related values, since variation in the speaker's pitch and intonation contour is a common feature of foreign-accented speech (Gut, 2012). Repeated measures ANOVAs were conducted for each measure, including the factors Accent (native, foreign) and Semantic Status (standard, semantic violation). In the intensity analysis we did not observe any significant effect or interaction. In the f<sub>0</sub> mean analysis, we observed a main effect of Accent [F(1, 207) = 6.50; p < 0.05], revealing a higher f<sub>0</sub> mean for foreign-accented speech (213.95 Hz) than for native speech (207.27 Hz). We also observed a main effect of Semantic Status [F(1, 207) = 4.15; p < 0.05], showing a higher f<sub>0</sub> mean for semantic violations (212.01 Hz) than for standard critical words (209.21 Hz). Importantly, the interaction between the two factors was not significant [F(1, 207) < 1; p = 0.91]. Finally, in the f<sub>0</sub> range analysis we observed a main effect of Accent [F(1, 207) = 64.87; p < 0.001], revealing a wider f<sub>0</sub> range during foreign-accented speech (96.03 Hz) than during native speech (77.13 Hz). These differences will be discussed later on (see Discussion).

<sup>3</sup> In addition, as a secondary objective, we wanted to explore accent-independent adaptation to foreign-accented speech (taking into account accent strength ratings, not a specific mispronunciation uttered by a specific speaker), along the same lines as Baese-Berk et al. (2013). However, we did not obtain reliable results for accent-independent adaptation (for further details, see Footnote 5).

<sup>4</sup> "Derwing and Munro (1997) reported no relations between intelligibility and comprehensibility for accented speech, using a range of accented varieties of English (Cantonese, Japanese, Polish, and Spanish). Similarly, Weil (2003) failed to find a correlation between reaction-time based measures of comprehensibility collected in a repetition task using Mandarin and Russian accented words, and measures of intelligibility of the same words obtained in a discrimination task. Therefore, an accented speech sample can be rated as highly intelligible, but difficult to process at the same time" (Floccia et al., 2009, p. 380).

Four experimental lists were created, each of them containing only one version of the 208 experimental sentences. There were two blocks in each list, although subjects were not warned about this characteristic. During the first block, subjects listened to only standard sentences (64 sentences, 32 spoken by the native speakers, 32 spoken by the non-native speakers, eight sentences per speaker, all sentences randomized). We chose to use only eight sentences per speaker because both improvement at foreign-accented speech comprehension (Clarke and Garrett, 2004) and perceptual learning (Norris et al., 2003) appear after very brief exposure to speech. During the second block, subjects listened to standard sentences and sentences containing semantic violations (144 sentences, 36 standard sentences spoken by native speakers, 36 standard sentences spoken by foreign-accented speakers, 36 sentences containing semantic violations spoken by native speakers, 36 sentences containing semantic violations spoken by foreign-accented speakers, with nine sentences per speaker and condition, all sentences randomized). The presentation of the standard sentences in the first or the second block was counterbalanced across subjects.
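The block structure described above can be illustrated with a minimal sketch. It is not the authors' list-generation procedure: the speaker labels, the function name `build_blocks`, and the random assignment of sentences to speakers are hypothetical, and the counterbalancing across the four lists is not modeled. The sketch only checks that eight sentences per speaker in Block 1 and nine sentences per speaker and condition in Block 2 add up to the 208 sentences reported.

```python
import random
from collections import Counter

# Hypothetical speaker labels: four native and four foreign-accented speakers.
NATIVE = ["N1", "N2", "N3", "N4"]
FOREIGN = ["F1", "F2", "F3", "F4"]
ACCENTS = (("native", NATIVE), ("foreign", FOREIGN))


def build_blocks(seed=0):
    """Build one illustrative list: Block 1 (standard only) and Block 2 (mixed)."""
    rng = random.Random(seed)
    sentence_ids = list(range(208))  # each sentence appears once per list
    rng.shuffle(sentence_ids)
    ids = iter(sentence_ids)

    # Block 1: 8 standard sentences per speaker -> 8 speakers x 8 = 64 trials.
    block1 = [
        {"sentence": next(ids), "speaker": spk, "accent": accent, "status": "standard"}
        for accent, speakers in ACCENTS
        for spk in speakers
        for _ in range(8)
    ]

    # Block 2: 9 sentences per speaker and condition -> 8 x 2 x 9 = 144 trials.
    block2 = [
        {"sentence": next(ids), "speaker": spk, "accent": accent, "status": status}
        for accent, speakers in ACCENTS
        for spk in speakers
        for status in ("standard", "violation")
        for _ in range(9)
    ]

    rng.shuffle(block1)  # trials are randomized within each block
    rng.shuffle(block2)
    return block1, block2


block1, block2 = build_blocks()
cell_counts = Counter((t["accent"], t["status"]) for t in block2)
print(len(block1), len(block2), dict(cell_counts))
```

Each accent-by-status cell in Block 2 contains 36 trials, which matches the counts given in the text.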

#### Procedure

Participants were seated in front of a computer screen, in a soundproof room. They were asked to listen carefully in order to comprehend all sentences during a passive listening task. We did not provide any information about the speakers or their accents, only telling the participants that they would be listening to people speaking in an everyday context. The experiment was run on E-Prime 2.0. Sentences were presented binaurally via headphones at a constant, comfortable listening level set by the experimenters. Each trial started with a fixation point, which appeared 1000 ms before the onset of the auditory sentence and remained on the screen until 1000 ms after sentence offset. Participants were asked to stare at the fixation point and to avoid blinking throughout the auditory sentence presentation. Participants controlled initiation of the next trial by pressing the space bar. They were told to rest between trials if needed. The whole experiment lasted approximately 25 min.

#### EEG Recordings and Analysis

The EEG signal was recorded from 64 active electrodes (impedance was kept below 10 kΩ) mounted in an elastic cap, at standard 10–20 locations. The on-line reference electrode was attached to the left mastoid, and the signal was re-referenced off-line to the mastoid average. Lateral eye movements were recorded with an electrode beside the right eye, and eye blinks were recorded with two electrodes, one above and the other below the right eye. The EEG signal was filtered on-line with a 0.1–100 Hz bandpass filter and digitized at 500 Hz. EEG epochs were time-locked to the first word and the final word of each sentence (either coming from a correct sentence context or from a context containing a semantic violation), as well as to the critical words (those words manipulated to elicit a semantic violation in the experimental condition). We thereby extracted segments starting 200 ms before and lasting until 1200 ms after the onset of each analyzed word. EEG waveforms were baseline corrected to a 200 ms pre-onset baseline, and averaged per participant and condition. Mean amplitudes in specific time windows were analyzed with repeated measures ANOVAs, analyzing three regions: frontal (F3, Fz, F4, FC5, FC3, FC1, FC2, FC4, and FC6), central (C3, C1, Cz, C2, C4, CP3, CP1, CP2, and CP4) and posterior (P5, P3, P1, Pz, P2, P4, P6, PO3, and PO4).
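The epoching and baseline correction described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the analysis pipeline actually used: the function names are hypothetical, the continuous signal is assumed to be a channels-by-samples array, and word onsets are assumed to be given as sample indices.

```python
import numpy as np

SFREQ_HZ = 500                 # digitization rate reported in the text
TMIN_MS, TMAX_MS = -200, 1200  # epoch limits relative to word onset
BASELINE_MS = 200              # length of the pre-onset baseline


def ms_to_samples(ms, sfreq=SFREQ_HZ):
    """Convert a duration in milliseconds to a number of samples."""
    return int(round(ms * sfreq / 1000))


def extract_epoch(signal, onset_sample):
    """Cut one baseline-corrected epoch around a word onset.

    signal: array of shape (n_channels, n_samples), continuous EEG.
    onset_sample: sample index of the word onset (assumed to be known).
    Returns an array of shape (n_channels, 700) at 500 Hz.
    """
    start = onset_sample + ms_to_samples(TMIN_MS)
    stop = onset_sample + ms_to_samples(TMAX_MS)
    if start < 0 or stop > signal.shape[1]:
        raise ValueError("epoch extends beyond the recording")
    epoch = signal[:, start:stop].astype(float)
    # Subtract the mean of the 200 ms pre-onset interval, channel by channel.
    n_baseline = ms_to_samples(BASELINE_MS)
    baseline = epoch[:, :n_baseline].mean(axis=1, keepdims=True)
    return epoch - baseline


def average_epochs(epochs):
    """Average a list of epochs (e.g., one participant, one condition)."""
    return np.mean(np.stack(epochs), axis=0)
```

At 500 Hz, the −200 to 1200 ms segment corresponds to 700 samples per channel, of which the first 100 form the baseline.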

Statistical analyses focused on three main time windows. For the P200 effect we established an early time window (150–250 ms, based on previous literature; see, e.g., Rossi et al., 2013; Strauß et al., 2013). We analyzed the P200 component only on the first word of the sentences, since this component wanes in the ERP signal of words embedded in spoken sentences (e.g., see the comparison of spoken vs. written sentence final words in Kutas and Federmeier's (2001), Figure 1). A similar strategy has been used by Strauß et al. (2013): in a study on lexical expectations under degraded speech, Strauß et al. (2013) analyzed the P200 component only on the first word of spoken sentences. For the N400 effect we established an intermediate time window (250–600 ms, based on previous literature; see, e.g., Lau et al., 2008). The N400 component was analyzed on the first, critical and final words of each sentence. Finally, for the P600 component we established a late time window (600–900 ms, based on previous literature; see, e.g., Brouwer et al., 2012). The P600 component was analyzed for the critical word of each sentence, since it indexes semantic re-analysis processes, and we did not consider any hypothesis about this component for the first or final words of the sentences.
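To make the dependent measure concrete, the sketch below computes a mean amplitude for one component and one region from an averaged ERP. It is an illustration only: the function names are hypothetical, the ERP is assumed to be a channels-by-samples array starting 200 ms before word onset at 500 Hz, and each window is treated as half-open (start included, end excluded), which is an assumption not stated in the text.

```python
import numpy as np

SFREQ_HZ = 500
EPOCH_START_MS = -200  # epochs begin 200 ms before word onset

# Time windows and regions as given in the text.
WINDOWS_MS = {"P200": (150, 250), "N400": (250, 600), "P600": (600, 900)}
REGIONS = {
    "frontal": ["F3", "Fz", "F4", "FC5", "FC3", "FC1", "FC2", "FC4", "FC6"],
    "central": ["C3", "C1", "Cz", "C2", "C4", "CP3", "CP1", "CP2", "CP4"],
    "posterior": ["P5", "P3", "P1", "Pz", "P2", "P4", "P6", "PO3", "PO4"],
}


def window_slice(t0_ms, t1_ms):
    """Map a post-onset time window (ms) to sample indices within an epoch."""
    def to_index(ms):
        return int(round((ms - EPOCH_START_MS) * SFREQ_HZ / 1000))
    return slice(to_index(t0_ms), to_index(t1_ms))


def mean_amplitude(erp, channel_names, component, region):
    """Mean voltage over the channels of a region within a component window.

    erp: array of shape (n_channels, n_samples), averaged per condition.
    channel_names: list of channel labels, in the row order of `erp`.
    """
    rows = [channel_names.index(ch) for ch in REGIONS[region]]
    return float(erp[rows, window_slice(*WINDOWS_MS[component])].mean())
```

One such value per participant, condition and region would then enter the repeated measures ANOVAs.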

All effects and interactions including a variable with three levels were corrected for sphericity using the Greenhouse-Geisser correction. In addition, we used the Bonferroni correction for post-hoc analyses.
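As a reminder of how the Bonferroni correction operates, each uncorrected p-value is multiplied by the number of comparisons in the family and capped at 1. The sketch below illustrates this standard rule; the function name is hypothetical, and how comparisons were grouped into families in the present analyses is not specified in the text.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values for one family of comparisons.

    Each p-value is multiplied by the number of comparisons and capped at 1.
    """
    n_comparisons = len(p_values)
    return [min(1.0, p * n_comparisons) for p in p_values]
```

Capping at 1 is one way in which adjusted values of exactly p = 1 can arise for clearly non-significant contrasts.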

#### Results

#### P200: Acoustic/Phonetic Processing

As argued in the introduction, modulations of the P200 component could be taken as an index of improvements in the extraction of acoustic/phonetic information during foreign-accented speech comprehension. To assess this issue we compared the amplitude of the P200 component for the first word of the sentences across the two experimental blocks. The repeated measures ANOVA for the P200 effect (150–250 ms), which only included the first word of the sentences, comprised the factors Topography (frontal, central, posterior), Block (first, second) and Accent (native, foreign). We obtained significant effects of Topography [F(2, 38) = 9.11; p < 0.01] and Accent [F(1, 19) = 6.99, p < 0.05], and a significant interaction between the three factors [F(2, 38) = 4.88, p < 0.05]. The post-hoc analysis for the interaction showed that the amplitude of the P200 component was similar for the two blocks in both native [t(38) = 0.59, p = 1] and foreign-accented speech [t(38) = 0.41, p = 1]. Furthermore, words produced with native speech elicited a more positive P200 amplitude than words produced with foreign-accented speech. This was the case for both block 1 [t(38) = 4.60, p < 0.001] and block 2 [t(38) = −3.82, p < 0.01]<sup>5</sup>. Native speech elicited a more positive mean amplitude than foreign-accented speech in the three topographic regions [frontal region: t(38) = 3.15, p < 0.01; central region: t(38) = 5.17, p < 0.001; posterior region: t(38) = 3.41, p < 0.01].

In sum, this analysis revealed that the mean amplitude of the P200 component for native speech was more positive than for foreign-accented speech across the experiment (**Figure 2**). This suggests that the extraction of phonetic/acoustic information was easier during native speech as compared to foreign-accented speech comprehension throughout the experimental session.

#### N400: Lexical-Semantic Processes

We carried out two repeated measures ANOVAs for the N400 component (250–600 ms). The first repeated measures ANOVA for the N400 component included the factors Topography (frontal, central, posterior), Position (first word, critical word, final word), Block (first, second) and Accent (native, foreign). Only correct sentences were included in this analysis. Our aim was to investigate whether listeners used specific lexical mechanisms in order to reach a better comprehension of foreign-accented speech across the experimental session.

In this analysis we obtained significant effects for Topography [F(2, 38) = 7.91, p < 0.01], Position [F(2, 38) = 12.90, p < 0.001], and Accent [F(1, 19) = 9.27, p < 0.01]. Significant interactions between Topography and Position [F(4, 76) = 13.16, p < 0.001], and between Block and Accent [F(1, 19) = 3.88, p < 0.05] were also obtained. Importantly, post-hoc analysis of the interaction between Block and Accent revealed differences between native and foreign-accented speech in the N400 mean amplitude only in the first block of the experiment [first block, t(19) = 2.79, p < 0.05; second block, t(19) = 0.29, p = 1]. Furthermore, while correct sentences in the native language elicited the same N400 amplitude across the experiment [t(19) = 0.49, p = 1], foreign-accented sentences elicited a less negative N400 amplitude in the second block as compared to the first block [t(19) = −2.85, p < 0.05].

Post-hoc analysis of the interaction between Topography and Position showed that in the frontal region there were no differences between word positions in terms of the N400 mean amplitude. In the central region there were significant differences between first and critical words [t(76) = −4.59, p < 0.001], first and final words [t(76) = −4.72, p < 0.001], and critical and final words [t(76) = −3.29, p < 0.05]. In the posterior region there were also significant differences between first and critical words [t(76) = −4.16, p < 0.001], first and final words [t(76) = −4.22, p < 0.001], and critical and final words [t(76) = −3.01, p < 0.05].

<sup>5</sup>We also conducted an analysis taking accent strength ratings into account. In order to do so, we organized these variations in a by-subject parametrical order (from the least to the most foreign accented). The purpose of this analysis was to check to what extent adaptation and semantic processing are sensitive to subtle variations in the intensity of the foreignness of the speakers' accents. However, due to the limitations of the current design and the constraints typical of EEG experiments (small amount of epochs per condition considering sub-division), we did not obtain any reliable result. This issue should be explored in future work with experimental designs specifically focusing on this important question.

Although we obtained a significant interaction between Block and Accent, we wanted to carry out a deeper exploration of these results. More concretely, we examined whether the adaptation on the N400 component was present in the three word positions. Crucially, it was. During native speech comprehension, there were no significant differences between block 1 and 2 [first word: t(19) = 0.13, p = 0.89; critical word: t(19) = 0.47, p = 0.64; final word: t(19) = 0.42, p = 0.68]. However, during foreign-accented speech comprehension, the mean amplitude of the N400 component was significantly reduced during block 2 as compared to block 1 [first word: t(19) = 2.11, p < 0.05; critical word: t(19) = 2.17, p < 0.05; final word: t(19) = 2.21, p < 0.05]. In addition, while foreign-accented speech elicited a more negative N400 amplitude than native speech in block 1 [first word: t(19) = 2.11, p < 0.05; critical word: t(19) = 2.15, p < 0.05; final word: t(19) = 2.21, p < 0.05], this difference disappeared in block 2 [first word: t(19) = 0.50, p = 0.62; critical word: t(19) = 0.19, p = 0.85; final word: t(19) = 0.97, p = 0.34].

In sum, this analysis revealed that during the first experimental block, foreign-accented speech elicited a more negative N400 mean amplitude than native speech comprehension did. However, during the second block, this difference disappeared: words uttered by native and foreign-accented speakers elicited similar N400 mean amplitudes (**Figure 3**). Importantly, this effect did not depend on word position. These results suggest that lexical-semantic processing of foreign-accented speech improved by the second experimental block.

In the second analysis performed on the N400 mean amplitudes, we included the sentences of the second experimental block, both correct sentences and sentences containing semantic violations (recall that no semantic violation was encountered during the first experimental block). Since the goal of this analysis was to explore the integration of semantic violations, we included only the critical words (those which could contain the semantic violation) of each sentence (and not the first and last words). The repeated measures ANOVA for the critical words of the second experimental block included the factors Topography (frontal, central, posterior), Accent (native, foreign), and Semantic status (standard, semantic violation). The motivation for this analysis was to explore whether foreign-accented speech affected semantic integration processes after the perceptual learning of the foreign accents. The analysis revealed a main effect of Semantic status [F(1, 19) = 4.25, p < 0.05], and significant interactions between Topography and Semantic status [F(2, 38) = 7.11, p < 0.05], Accent and Semantic status [F(1, 19) = 5.41, p < 0.05], and between the three factors, Topography, Accent and Semantic status [F(2, 38) = 3.69, p < 0.05]. Importantly, post-hoc analysis of the three-way interaction showed that in the frontal region, the difference between standard and semantic violation conditions was only significant for foreign-accented speech [t(38) = 4.43, p < 0.001]. The same effect was observed in the central region [t(38) = 4.49, p < 0.001]. In the posterior region, there were significant differences between the standard condition and semantic violations for both native [t(38) = 3.15, p < 0.01] and foreign-accented speech [t(38) = 4.51, p < 0.001]. The analysis also revealed that while there were no differences in the N400 mean amplitude in the standard condition for either accent over the three regions of analysis, the mean amplitude in the semantic violation condition was different for native and foreign-accented speech in the frontal [t(38) = 4.69, p < 0.001], central [t(38) = 4.89, p < 0.001] and posterior region [t(38) = 4.53, p < 0.001]. Thus, correct sentences in the second block were processed similarly regardless of the accent (regarding the N400 time window, consistently with results of the previous N400 amplitude analysis), and sentences containing semantic violations were processed differently in native and foreign-accented speech.

In sum, this analysis revealed that the N400 effect for semantic violations was significant all over the scalp distribution during foreign-accented speech comprehension. However, this effect was only significant over the posterior region during native speech comprehension. In addition, although the N400 mean amplitude for correct sentences was similar across accents, semantic violations elicited more negative N400 mean amplitudes in foreign-accented speech compared to native speech comprehension all over the scalp. These results suggest that semantic violations were harder to process during foreign-accented speech as compared to native speech comprehension.

#### P600: Re-Analysis Processes

We carried out a repeated measures ANOVA for the P600 component (600–900 ms). This ANOVA included the critical words of the standard and semantic violation conditions of the sentences in the second experimental block. The motivation for this analysis was to check whether foreign-accented speech affected meaning re-analysis processing taking place when listening to semantic violations.

The repeated measures ANOVA for the critical words of the second block included the factors Topography (frontal, central, posterior), Accent (native, foreign), and Semantic status (standard, semantic violation). In this analysis we obtained a significant effect of Topography [F(2, 38) = 18.39; p < 0.001], and significant interactions between Topography and Accent [F(2, 38) = 5.38; p < 0.05], and between Accent and Semantic status [F(1, 19) = 12.78; p < 0.01]. Importantly, the post-hoc analysis of the interaction between Accent and Semantic status revealed that the mean amplitude of the P600 component was more positive for semantic violations than for the standard condition only during native speech [native speech, t(19) = −3.73, p < 0.01; foreign-accented speech, t(19) = 0.92, p = ns]. This post-hoc analysis also revealed that the mean amplitude of the P600 component was more positive during native speech than during foreign-accented speech comprehension for sentences containing a semantic violation [t(19) = 2.77, p < 0.05]. No differences were observed for the standard condition [t(19) = −1.78, p = 0.36].

Post-hoc analysis of the interaction between Topography and Accent revealed that in the frontal and central regions, the mean amplitude of the P600 component was similar when comprehending native and foreign-accented speech, while in the posterior region the mean amplitude was significantly more positive for native speech as compared to foreign-accented speech [t(38) = 2.85, p < 0.05]. The post-hoc analysis of the interaction between Topography and Accent also showed that the mean amplitude of the P600 components in native speech comprehension was less positive over the frontal region than over central [t(38) = −4.54, p < 0.001] and posterior regions [t(38) = −4.73, p < 0.001], with differences also between central and posterior regions [t(38) = −4.96, p < 0.001]. The same was observed for foreign-accented speech, with significant differences between frontal and central regions [t(38) = −2.86, p < 0.05], frontal and posterior regions [t(38) = −4.85, p < 0.001], and central and posterior regions [t(38) = −5.30, p < 0.001].

In sum, this analysis revealed that a widely distributed positivity appeared after the N400 effect for semantic violations in the critical words, although this only occurred during native speech comprehension, not during foreign-accented speech comprehension (**Figure 4**).

FIGURE 4 | Grand average ERPs from critical words of Block 2 from Pz electrode. Averages were extracted for native speech during both standard (blue line) and semantic violation (dark blue line) conditions; and for foreign-accented speech also during standard (orange line) and semantic violation (red line) conditions. Grand average images were extracted at 200 ms before (baseline) and lasting until 1200 ms after the onset of the word. Below, topographic distribution of voltage differences between conditions between 250–600 ms and 600–900 ms after the onset of the critical words.

## Discussion

This study aimed at exploring two questions. First, what are the specific mechanisms that native speakers put into play to deal with foreign-accented speech? Previous literature has shown that listeners get better at comprehending foreign-accented speech after a very brief exposure to the accented speakers (Clarke and Garrett, 2004). We examined this issue by looking at the modulation of the P200 and N400 ERP components across two experimental blocks, to clarify whether this improvement takes place at phonetic/acoustic or lexical levels of processing, respectively. Secondly, we explored whether after these changes have taken place, further linguistic processes, such as semantic integration and meaning re-analysis, are affected by foreign-accented speech. This second issue was explored by analyzing the N400 and P600 effects during semantic violation processing in the second experimental block.

In brief, our results show that:


We will discuss the implications of these results in more detail below.

#### Phonetic/Acoustic Processing (P200)

In the introduction we argued that an improvement in the extraction of the phonetic/acoustic properties of foreign-accented speech should be indexed by a modulation of the P200 component. This hypothesis was based on previous observations (Reinke et al., 2003; Snyder et al., 2006; De Diego Balaguer et al., 2007; Paulmann et al., 2011), showing that this ERP component is related to the extraction of spectral information and other important acoustic features. Our results are congruent with this view, since the P200 was more negative for foreign-accented speech than for native speech, suggesting that the extraction of spectral information and other acoustic features (such as the information regarding f<sub>0</sub> mean and range) from the former was more difficult (as is also the case with degraded speech; Strauß et al., 2013).

More important for our present purposes is the fact that the amplitude of the P200 remained stable across experimental blocks for foreign-accented speech. That is, the extraction of such phonetic/acoustic information remained equally difficult across the experimental session.

A possible limitation of this result is that the P200 component was only analyzed in the first word of the sentences, because this component usually wanes later on at the onset of words embedded in spoken utterances (cf. Kutas and Federmeier (2001), Figure 1; Strauß et al. (2013) also used a strategy similar to ours in a study on lexical expectancies under degraded speech). This way, the P200 at sentence onset might index difficulties at identifying the speaker as a foreign speaker, but later on in the sentence the phonetic processing of foreign-accented speech might have improved across experimental blocks. However, since the N400 component already decreased across experimental blocks for the first word of the sentences in foreign-accented speech comprehension, this alternative explanation does not seem applicable.

Taking all this information into account, our results suggest that, at least in the current experimental conditions (and to the extent that the P200 amplitude indexes the extraction of phonetic/acoustic information), rapid improvements do not occur in the extraction of phonetic/acoustic information during foreign-accented speech comprehension. The next question is therefore whether lexical processes actually reveal some sort of adaptation that can help speech comprehension.

#### Lexically-Driven Perceptual Learning of Foreign-Accented Speech

As we mentioned in the introduction, most of the studies on perceptual learning propose that this processing is driven by lexical information, which helps listeners to categorize and retune ambiguous phonemes (Norris et al., 2003; Davis et al., 2005; McQueen et al., 2006; Sjerps and McQueen, 2010; Reinisch and Holt, 2014). This way, during lexical processing, listeners would process ambiguous phonemes as representative forms of the original phonemes. We explored this issue by analyzing the modulation of the N400 ERP component across the experiment.

We observed a modulation of the N400 component across the two experimental blocks for foreign-accented speech, which suggests an improvement in lexical-semantic processing. In particular, the fact that the N400 mean amplitude for foreign-accented speech decreases in the second experimental block as compared to the first one could be interpreted as revealing that listeners learned to use lexical information to achieve a better comprehension of foreign-accented speech. Crucially, the N400 mean amplitude for foreign-accented speech decreased across the experimental blocks for the first, critical and final words of the sentences. The fact that there were no differences in the magnitude of the N400 in the second experimental block between the native and the foreign-accented speech conditions is congruent with this interpretation. This interpretation is also consistent with recent studies indicating that listeners are able to use lexical and semantic information during foreign-accented speech comprehension in order to aid online word comprehension, as well as to guide the retuning of their phonetic categories (Trude et al., 2013; Reinisch and Holt, 2014).

Nevertheless, there could be alternative explanations for our observations. One possibility is that attention might have had an effect on the modulations of the N400 ERP component. The differences in the N400 between native and foreign-accented speech comprehension for standard sentences during the first experimental block might be due to more attention being deployed for the foreign-accented speakers. However, under such explanation, we should conclude that attention is devoted to the same extent to native and foreign-accented speech during the second experimental block, since no differences between accent conditions are found in the N400. However, if one takes this view, it remains to be explained why the N400 effect for semantic violations during the second experimental block is larger for foreign-accented than for native speech. Hence, although we cannot exclude differences in attentional processes driving some of our observations, we do not think that such explanation captures the whole set of results.

Importantly, we did not manipulate specific phonetic shifts in the foreign accents. Previous studies on perceptual learning (e.g., Norris et al., 2003) normally used a concrete ambiguous phoneme to which listeners had to adapt. Instead, we used a more general, "ecologically valid" accent scenario, thus suggesting that adaptation to a broader accented speech also occurs due to a lexically-driven top-down mechanism. It is interesting to note that Witteman et al. (2014) observed adaptation to foreign-accented speech even when foreign-accented speakers were inconsistent in their pronunciations (meaning that sometimes foreign-accented speakers produced utterances in a native fashion, whereas other times they produced utterances in a foreign fashion). This would suggest that even in a broad, "ecologically valid," and more natural scenario (as in our case), in which speakers do not have to produce utterances in the same way multiple times, adaptation is also possible, and it is still ruled by lexical processing. Moreover, since perceptual learning generalizes to words that have not been presented during the training phase (Davis et al., 2005; McQueen et al., 2006; Sjerps and McQueen, 2010), the improvement in processing foreign-accented speech observed in our study was possible even if listeners were presented with new words in new sentences during each trial.

In addition, the absence of any difference between accents in the magnitude of the N400 in the second experimental block suggests that the particular features of our recordings (longer durations for critical words and sentences during foreign-accented speech as compared to native) did not affect the processing of the correct sentences. Furthermore, the fact that the N400 mean amplitude for correct sentences during native speech comprehension did not differ across experimental blocks suggests that semantic violations did not have any effect (such as a surprise effect) on the processing of correct sentences.

Thus, our results, along with other results from previous literature (e.g., Norris et al., 2003; Davis et al., 2005; McQueen et al., 2006; Sjerps and McQueen, 2010; Janse and Adank, 2012; Trude et al., 2013; Banks et al., submitted; Reinisch and Holt, 2014), suggest that lexical information may aid listeners to identify certain pattern variations in the speech of accented speakers. Information at the lexical level would allow listeners to relate their knowledge about the sounds of their native phonological system to sounds that depart from their phonetic/acoustic repertoire (such as the phonetic/acoustic variations in the speech of accented speakers). Furthermore, this lexical information would allow listeners to map these variations onto lexical items, making it possible for the listeners to improve at recognizing, retrieving and integrating the incoming words after brief exposure to the foreign-accented speakers. The improvement in processing foreign-accented speech is reached quickly, and it remains stable in order to be applied to new words in new utterances<sup>6</sup>.

Nevertheless, our data provide no direct behavioral evidence supporting the idea that listeners get better at comprehending foreign-accented speech. Thus, although our results are compatible with previous literature on perceptual learning (Norris et al., 2003; Davis et al., 2005; Reinisch and Holt, 2014), further research combining behavioral results and EEG data would be very enlightening to the field.

#### Semantic Integration and Meaning Re-Analysis After Adaptation to Foreign-Accented Speech

The second issue that we investigated in this study was whether complex linguistic processes, such as semantic integration and meaning re-analysis, were affected in some way by foreign-accented speech after exposure to the accented speech. As we explained before, we took the modulations of the N400 and P600 ERP components elicited by semantic violations during native and foreign-accented speech comprehension as indices of these cognitive processes.

We observed instructive differences between the comprehension of foreign-accented and native speech when the sentences carried a semantic violation. In fact, these violations elicited a larger N400 effect in the context of the comprehension of foreign-accented speech compared to native speech. In addition, the N400 effect for semantic violations during foreign-accented speech comprehension was distributed all over the scalp, while for native speech it only appeared in the classical centro-posterior distribution. This might be due to the lexically-driven perceptual learning of foreign-accented speech. More concretely, a higher demand on lexical processing, needed for the identification and retuning of ambiguous phonetic/acoustic features, might have rendered the effort of accessing the implausible word (and also integrating it in the previous context) extremely difficult.

A potential limitation of our results is the fact that the f<sub>0</sub> mean for semantic violations was higher than for standard words. Nevertheless, since this difference was present both during native and foreign-accented speech, f<sub>0</sub> mean differences between semantic violations and standard words do not seem to account for our pattern of results.

Moreover, the results regarding the modulations of the N400 component during the processing of semantic violations contrast, to some extent, with previous observations by Hanulíková et al. (2012) and Goslin et al. (2012). These two studies have explored the modulations of the N400 component associated with foreign-accented speech in two different contexts. First, Hanulíková et al. (2012) observed similar N400 effects associated with semantic violations irrespective of the accent of the speaker. This is in clear contradiction with our results, in which we found a larger N400 effect for semantic violations during foreign-accented speech comprehension. The difference between these two studies might be explained by the fact that Hanulíková et al. (2012) only used one foreign-accented speaker, with a mild (and highly familiar) accent, possibly making lexical-semantic processing easier. However, it is remarkable that as in Hanulíková et al. (2012), we found a significant N400 effect over the anterior region of the scalp only for foreign-accented speech. That is, while semantic violations during foreign-accented speech comprehension elicited a widely distributed N400 effect, during native speech comprehension semantic violations "only" elicited the classical centro-posterior N400 effect (see e.g., Federmeier and Laszlo, 2009). This might mean that processing semantic violations during foreign-accented speech comprehension requires more cognitive resources than during native speech comprehension.

On the other hand, Goslin et al. (2012) found a less negative N400 component for the final word of sentences produced with a foreign accent as compared to both native and regionally-accented speech. Note that these sentences did not involve semantic violations. Goslin et al. (2012) concluded that listeners tried to anticipate upcoming words in order to avoid difficulties in comprehension. It is important to note a difference that Goslin et al. (2012) reported: their foreign-accented speakers were significantly less intelligible than the native speakers, which is not the case in our study or in that of Hanulíková et al. (2012). This way, different levels of intelligibility may lead to different strategies for comprehension, and, therefore, to different modulations of the N400 component, an index of lexical-semantic processing. Future research is needed to distinguish between the competing hypotheses of Goslin et al. (2012) and Hanulíková et al. (2012).

Regarding the P600 component, which can be related to a second stage of meaning analysis (Kuperberg et al., 2011; Brouwer et al., 2012), we observed a modulation of this ERP component only in the native speech condition. The presence of a P600 modulation during native speech comprehension replicates and extends previous studies in sentence reading (Kuperberg, 2007; Van de Meerendonk et al., 2010). Hence, to the extent that this modulation indexes some sort of meaning re-analysis (Regel et al., 2011; Sanford et al., 2011; Van Petten and Luka, 2012; Martin et al., 2013), our observations would suggest that such re-analysis is not carried out in the foreign-accented speech condition. Importantly, in Hanulíková et al.'s (2012) Figure 3, a large positive deflection over the posterior region of the scalp can be observed during native speech comprehension, following the N400 component elicited by the semantic violation. During foreign-accented speech, this positivity is not present. These results go in the same direction as ours.

A tentative explanation for this absence of the stage of meaning re-analysis is that listeners avoid trying to find an alternative meaning for a semantic violation when it is produced by a foreign-accented speaker. This would be because listeners may treat the semantic violation as an error right away, due to the potential lack of knowledge or fluency of the non-native speaker, hence blocking any re-analysis for alternative meanings. An alternative explanation would be that listeners lack the processing resources during foreign-accented speech comprehension (because of a higher demand on lexical processing) to carry out the meaning re-analysis online.

<sup>6</sup>Several brain imaging studies investigating phonetic ability (Golestani et al., 2002; Golestani and Zatorre, 2004; Golestani and Pallier, 2007; Golestani et al., 2007) have observed structural as well as functional neural differences correlating with the individual differences in learning speed and success at comprehending and producing second language phonemes. In order to inform about the individual differences in our study, we provide the individual mean amplitudes (over electrode Pz) for each experimental condition (see Supplementary Material, Table 2).

Nevertheless, it is important to remark that the classical P600 component usually has a distribution centered over the posterior areas of the scalp. In our case, the positive effect following the N400 effect was widely distributed. Thornhill and Van Petten (2012) observed that an anterior positivity was elicited by those words that were not highly predictable, independently of the semantic relationship with the expected word. This could mean that during native speech comprehension, semantic violations also elicited a frontal positivity. Thus, listeners would be able to have clear expectations about the upcoming words in an utterance when listening to a native speaker. However, during foreign-accented speech, expectations would not reach the same level of detail. These questions remain for future research on the topic.

#### Conclusions

The results of the present study suggest that listeners do not improve at extracting phonetic/acoustic features of foreign-accented speech after brief exposure to it. However, despite this lack of improvement at the extraction of acoustic features, native listeners seem to adapt to the foreign-accented speech due to perceptual learning driven by lexical information. More concretely, lexical information allows listeners to recognize and retune phonetic and acoustic variations onto lexical items, making it possible for the listeners to improve at recognizing, retrieving and integrating the incoming words after brief exposure to the foreign-accented speech. In addition, semantic violations uttered by foreign-accented speakers are harder to process, as compared to semantic violations during native speech comprehension. This is probably because of a higher demand on lexical processing in the retrieval of the non-expected words. Finally, native speech comprehension elicited some sort of meaning re-analysis when semantic violations were present. Such re-analysis seemed to be absent when processing foreign-accented speech, at least under the present experimental conditions.


#### Acknowledgments

This research was approved by the ethics committee of the Spanish Ministry of Economy and Finance, which funded this study. We thank Cristina Espinosa for her help with stimuli measurements; Silvia Blanch and Xavier Mayoral for their technical support; Jesús Bas and Meritxell Ayguasanosa for assistance in testing participants; and Xavier García, Joanna D. Corey, Cristina Baus, Anna Hatzidaki, Sara Rodríguez, and two anonymous reviewers for their thoughtful comments. This research was funded by an FPI grant (BES-2012-056668) and two project grants (PSI2011-23033 and Consolider INGENIO CSD2007-00012) awarded by the Spanish Government; by one grant from the Catalan Government (SGR 2009-1521); and by one grant from the European Research Council under the European Community's Seventh Framework (FP7/2007-2013 Cooperation grant agreement 613465-AThEME). CM is supported by the IKERBASQUE institution and the Basque Center on Cognition, Brain and Language. AC is supported by the ICREA institution and the Center for Brain and Cognition.

#### Supplementary Material

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2015.00167/abstract


Audio 7 | Native speaker 3.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Romero-Rivas, Martin and Costa. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation

#### *Briony Banks<sup>1</sup>\*, Emma Gowen<sup>2</sup>, Kevin J. Munro<sup>1</sup> and Patti Adank<sup>3</sup>*

*<sup>1</sup> School of Psychological Sciences, University of Manchester, Manchester, UK, <sup>2</sup> Faculty of Life Sciences, University of Manchester, Manchester, UK, <sup>3</sup> Speech, Hearing and Phonetic Sciences, University College London, London, UK*

Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker's facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants' eye gaze was recorded to verify that they looked at the speaker's face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation.

#### *Edited by:*

*Ignacio Moreno-Torres, University of Málaga, Spain*

#### *Reviewed by:*

*Jeroen Stekelenburg, Tilburg University, Netherlands; Matthias J. Sjerps, University of California, Berkeley, USA and Radboud University Nijmegen, Netherlands*

#### *\*Correspondence:*

*Briony Banks, School of Psychological Sciences, University of Manchester, 3rd Floor, Zochonis Building, Brunswick Street, Manchester M13 9LP, UK; briony.banks@manchester.ac.uk*

> *Received: 31 March 2015 Accepted: 10 July 2015 Published: 03 August 2015*

#### *Citation:*

*Banks B, Gowen E, Munro KJ and Adank P (2015) Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation. Front. Hum. Neurosci. 9:422. doi: 10.3389/fnhum.2015.00422*

Keywords: speech perception, perceptual adaptation, accented speech, audiovisual speech, multisensory perception

#### Introduction

When we encounter a speaker with an unfamiliar accent, we are able to 'tune in' to the new phonetic patterns of speech to understand what they are saying. This type of perceptual adaptation is regularly encountered in daily life and allows us to recognize speech in a variety of native and non-native accents (Clarke and Garrett, 2004; Bradlow and Bent, 2008; Maye et al., 2008). It is a robust ability that is present in all stages of life (for a review, see Cristia et al., 2012) and occurs even with relatively unintelligible accents, albeit at a slower rate (Bradlow and Bent, 2008). The relative success and speed of perceptual adaptation depend on external factors such as the amount and variety of exposure to the accent (Bradlow and Bent, 2008). However, less is known about how the modality of speech can influence the adaptation process – for example, whether adaptation to accented speech is greater when audiovisual speech cues are available, compared to only auditory speech cues. Identifying ways to improve or facilitate this process may benefit communication in certain populations who have difficulty adapting to accented speech, such as older adults (Adank and Janse, 2010), individuals with aphasia (Bruce et al., 2012), or non-native speakers (Munro and Derwing, 1995); for example, audiovisual speech could be incorporated into language-learning tools or rehabilitation therapies for aphasia.

Perceptual adaptation to accented speech can be seen as a three-stage process: the listener first perceives the new, unfamiliar input; secondly, maps this onto stored lexical items, and thirdly, generalizes these new mappings to other lexical items. Indeed, research has successfully shown that this type of adaptation involves the modification of perceptual phonemic boundaries in relation to perceived lexical items (Norris et al., 2003; Kraljic and Samuel, 2005, 2006); for example, listeners who perceive an ambiguous sound midway between /d/ and /t/ spoken within the word 'crocodile,' are more likely to then categorize the same sound as /d/ when heard in isolation.

An improvement in perceptual adaptation to accented speech could potentially be achieved by influencing any one of the three stages involved; for example, the first stage may be facilitated through the availability of audiovisual (multisensory) cues. The integration of multisensory input across different sensory modalities can facilitate perception (Stein and Meredith, 1993); for example, auditory perception of speech is improved when integrated with visual input from a speaker's facial movements. Indeed, being face-to-face with a speaker improves speech recognition in noisy environments (Sumby and Pollack, 1954; Erber, 1975; MacLeod and Summerfield, 1987; Grant et al., 1998; Ross et al., 2007), particularly when speech is non-native (Reisberg et al., 1987; Arnold and Hill, 2001; Hazan et al., 2006). Research has shown that audiovisual speech cues help listeners to identify fricative consonants (Jongman et al., 2003) and prosodic cues such as lexical prominence (Swerts and Krahmer, 2008). The benefits of audiovisual cues may also extend to accented speech, as several studies have shown that recognition of accented speech is better for audiovisual compared to audio-only input (Arnold and Hill, 2001; Janse and Adank, 2012; Yi et al., 2013; Kawase et al., 2014). The integration of auditory and visual cues may benefit recognition of accented speech by helping listeners to resolve the perceptual ambiguities of an unfamiliar accent; for example, if a speaker's pronunciation of a particular phoneme or word is unclear, observing their mouth movements may help to identify the correct item. Indeed, exposure to ambiguous audiovisual cues using McGurk stimuli has been shown to influence subsequent phoneme categorization (Bertelson et al., 2003; Vroomen et al., 2004). A listener who is face-to-face with an accented speaker may therefore be able to exploit the perceptual benefit from additional visual input, and adapt more successfully to the accented speech – that is, their recognition of the speech may improve more over time.

Although a large part of everyday communication is carried out face-to-face, most experimental work on accent perception is carried out in the auditory modality, and the use of visual speech information has gained relatively little attention in relation to perceptual adaptation to accented speech. Furthermore, much of the work regarding the potential benefits of audiovisual speech to perceptual adaptation has been carried out using noise-vocoded rather than accented speech. While both speech types are less intelligible than familiar speech, and listeners adapt to them both, variation in noise-vocoded speech stems from degrading the acoustical composition of the entire speech signal, whereas accented speech varies in terms of its phonemic patterns, is acoustically intact and only affects certain speech sounds. Although audiovisual cues have been shown to benefit perceptual adaptation to noise-vocoded speech (Kawase et al., 2009; Pilling and Thomas, 2011; Wayne and Johnsrude, 2012; Bernstein et al., 2013), the observed effects are relatively small and, furthermore, we do not know if such results generalize to accented speech.

Two previous studies have investigated the role of audiovisual cues in perceptual adaptation to accented speech. In a phoneme-recognition study, Hazan et al. (2005) demonstrated that long-term perception of individual non-native phonemes improved when listeners were exposed to audiovisual input, compared to audio-only input; however, this finding was not tested with longer items such as sentences, and it is thus unclear if the results can be generalized to non-native speech in general. Indeed, when Janse and Adank (2012) compared perceptual adaptation to unfamiliar, accented sentences with or without visual cues, they observed no difference in the amount of adaptation, although a small, non-significant trend of greater adaptation during the early stages was present for audiovisual speech. However, two confounding factors may have influenced their findings. The experiment was carried out on older adults, a population that can have particular difficulty with processing visual speech (Sommers et al., 2005); this factor, combined with a relatively difficult semantic verification task, may have rendered the task cognitively demanding for the older participants and negatively affected their performance. Two possible conclusions can therefore be drawn from the two studies described here: first, audiovisual speech cues are not beneficial to perceptual adaptation to longer items of accented speech, although they may improve learning of particular phonemes in isolation (as shown by Hazan et al., 2005); or, audiovisual speech cues do benefit perceptual adaptation to accented speech, but the confounding factors outlined above prevented this effect from being observed. Therefore, evidence from young, healthy adults, using whole sentences and a simple speech recognition task, may help to establish the possible benefits of audiovisual speech cues for perceptual adaptation to accented speech.

We investigated whether audiovisual speech cues do indeed facilitate perceptual adaptation to accented speech. We did this across two studies, each using a different accent and speaker and a different experimental design, but with the same sentences and task. In particular, Study 2 addresses a number of questions arising from Study 1 (see Discussion, Study 1 for details). Study 1 employed a training design similar to those used in studies of noise-vocoded speech (Kawase et al., 2009; Pilling and Thomas, 2011; Wayne and Johnsrude, 2012), and a novel accent to control for familiarity effects (Maye et al., 2008; Adank and Janse, 2010; Janse and Adank, 2012). Participants underwent training in the novel accent with audiovisual or audio-only stimuli, with or without background noise. A visual-only (speech-reading) training condition provided a control group; that is, we did not expect visual training to affect adaptation to the accented speech. For the pre- and post-training sessions, we presented our accented stimuli in background noise to avoid ceiling effects associated with rapid perceptual adaptation (Janse and Adank, 2012; Yi et al., 2013). We also included two training conditions with background noise for two reasons: firstly, the learning context can influence the outcome of learning (Godden and Baddeley, 1975; Polyn et al., 2009), and consistency between the training and subsequent testing sessions may therefore affect adaptation. As the stimuli in our pre- and post-training sessions were always presented in the context of background noise, we predicted that training with background noise would facilitate recognition of the accented speech in noise following the training. Secondly, we predicted that altering the clarity of the auditory signal (by adding background noise) would increase the use of visual cues during the training (cf. Sumby and Pollack, 1954), and that this would, in turn, increase subsequent adaptation.

If audiovisual cues are beneficial to perceptual adaptation to accented speech, we expected to observe the following: (1) greater adaptation after audiovisual training compared to audio-only or visual-only training; (2) greater adaptation after audiovisual training with background noise compared to audiovisual training in quiet; (3) a greater 'audiovisual benefit' (the difference in adaptation between audiovisual and audio-only training) for the groups trained with background noise, compared to the groups trained without background noise; (4) greater adaptation following all types of training in comparison to visual training (that is, we expected the visual training to have no effect on subsequent recognition of the accented speech). Based on previous evidence that audiovisual cues can benefit recognition of accented speech compared to audio-only cues (Arnold and Hill, 2001; Janse and Adank, 2012; Yi et al., 2013; Kawase et al., 2014), we also expected to observe the following during the training session: (1) better recognition of the accented training stimuli for both audiovisual groups compared to the audio-only groups; (2) poorer recognition of the training stimuli presented in background noise compared to quiet; and (3) poorer recognition of the visual training stimuli compared to all other groups.

In Study 2, participants listened to a non-native (Japanese) accent in the audiovisual or auditory modality to test whether a greater amount of *continuous* exposure to audiovisual stimuli (without separate training) would reveal a difference in adaptation between the two modalities. This design enabled us to examine the overall amount of adaptation, as well as adaptation at different time points in the experiment (for example, the presence of audiovisual speech cues may afford benefits to recognition of accented speech in earlier compared with later trials; Janse and Adank, 2012). In addition, participants' eye movements were recorded to verify that they were predominantly looking at the speaker's face. As in Study 1, if audiovisual cues *are* beneficial to perceptual adaptation to accented speech, we predicted that participants exposed to audiovisual accented speech would adapt to a greater extent than participants exposed to audio-only accented speech. Conversely, if audiovisual cues *are not* beneficial to perceptual adaptation to accented speech, we expected to observe no difference in perceptual adaptation for the audiovisual and auditory modalities in either study.

# Study 1

#### Methods

#### Participants

One hundred and five students (26 male, *Median* = 20 years, age range 18–30 years), recruited from the University of Manchester, participated in the study. All participants were native British English speakers with no history of neurological, speech or language problems (self-declared), and gave their written informed consent. Participants were included if their corrected binocular vision was 6/6 or better using a reduced Snellen chart, and their stereoacuity was at least 60 s of arc using a TNO test. Participants' hearing was measured using pure-tone audiometry for the main audiometric frequencies in speech (0.5, 1, 2, and 4 kHz) in both ears. Any participant with a hearing threshold level greater than 20 dB for more than one frequency in either ear was excluded and did not participate in the study. We excluded one male participant based on the criteria for hearing, and four (one male, three female) based on the criteria for vision. We provided compensation of course credit or £7.50 for participation. The study was approved by The University of Manchester ethics committee.

#### Materials

We used 150 Institute of Electrical and Electronics Engineers (IEEE) Harvard sentences (IEEE, 1969) for our stimuli, and a 30-year-old male volunteer provided all recordings for the experiment. We transcribed and recorded 135 sentences in the novel accent, and randomly divided them into three lists (A, B, and C). We recorded the remaining 15 sentences in the speaker's own British English accent to provide stimuli for a 'familiar accent' baseline test. We used a novel accent to avoid confounds from participant familiarity (that is, we could guarantee that none of our participants had ever encountered it before; see Adank et al., 2009), and to compare responses to the novel, unfamiliar accent with a familiar accent (our baseline measurement) from the same speaker (Adank and Janse, 2010). The novel accent (see Banks et al., 2015 for further details) was created by systematically modifying the vowel sounds of a Standard British English accent (**Table 1**). The accent was created using allophones from existing regional English accents (for example, Scottish or Irish) through an iterative process.

#### *Training stimuli*

Stimuli for the training sessions comprised six movies (three with and three without background noise), each comprising 45 video clips from one of the three novel-accented stimuli lists (A, B, and C). During recordings, the speaker looked directly at the camera with a neutral expression, and was asked to speak as naturally as possible. The recordings were made in a sound-treated laboratory with no natural light, using a High Definition Canon HV30 camera and Shure SM58 microphone. The camera was positioned ∼1 m from the speaker to frame the head and shoulders, with a blue background behind the speaker. Video recordings were imported into iMovie 11, running on an Apple MacBook Pro, as large (960 × 540) digital video (.dv) files. Each recorded sentence was edited to create a 6-s video clip, and the clips were then compiled in a randomized order to create the training videos. Between clips (sentences) there was a 7-s interval, during which the screen was black with a white question mark for 4 s (to indicate to participants they should respond) and a white fixation cross for 3 s (to indicate the next clip was imminent). Edited audio files (see Testing Stimuli, below) were re-attached to each video clip so that the normalized stereo tracks would be heard congruently with the video. For training conditions that included background noise, we added speech-shaped noise at a signal-to-noise ratio (SNR) of 0 dB to the audio files, using a custom script in Matlab software (R2010a, Mathworks, Inc.), before re-attaching them. Each movie was exported as a 960 × 540 MPEG-4 movie file with a bit-rate of 3269, in widescreen (16:9) ratio at 25 frames per second.

#### *Testing stimuli*

The audio track for each video clip (sentence) was extracted as an audio (.wav) file to be used for the auditory testing sessions. The experimenter checked all recordings and any that were not deemed suitable (for example due to mispronunciation or unnaturalness) were re-recorded in a second recording session. Audio files were normalized by equating the root mean square amplitude, resampled at 22 kHz in stereo, and cropped at the nearest zero crossings at voice onset and offset, using Praat software (Boersma and Weenink, 2012). The same procedure was used for the native-English recordings to produce stimuli for the familiar-accent baseline test.
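As an illustration of the two signal operations described in this section (equating RMS amplitude across recordings, and adding noise at a fixed SNR for the noisy training conditions), the following minimal Python sketch may be helpful. It is not the Praat or Matlab code used in the study: the function names (`rms`, `normalize_rms`, `mix_at_snr`) are hypothetical, signals are assumed to be plain lists of samples, and the SNR is assumed to be defined on RMS amplitudes.

```python
import math

def rms(samples):
    """Root mean square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def normalize_rms(samples, target_rms):
    """Scale samples so that their RMS amplitude equals target_rms."""
    gain = target_rms / rms(samples)
    return [x * gain for x in samples]

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech at the requested SNR (dB), defined on RMS amplitudes.

    The noise is scaled so that 20 * log10(rms(speech) / rms(scaled noise))
    equals snr_db; the speech itself is left untouched.
    """
    noise_gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + noise_gain * n for s, n in zip(speech, noise)]
```

At an SNR of 0 dB the scaled noise has the same RMS amplitude as the speech. The sketch assumes non-silent signals, since a silent signal would lead to a division by zero.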

We counterbalanced the presentation order of the novel-accented stimuli for the pre-training, training and post-training sessions across training groups; this was based on the sentence lists and followed the order ABC, CAB, and BCA. Each sentence was presented once per participant to avoid item-specific training effects. During the pre-training and post-training sessions, sentences were presented in a pseudo-random order per testing block and per participant, and the sentences used for the baseline and training sessions were presented in a fixed order.

#### Procedure

**Figure 1** shows the experimental design in full. Participants first listened to the 15 familiar-accented (baseline) sentences to habituate them to the task and to the background noise. This was followed by the pre-training session, after which participants underwent training in one of five randomly assigned conditions (*N* = 20 per group): audiovisual, audio-only, visual (speech-reading), audiovisual + noise, audio-only + noise. Each participant was exposed to training stimuli from one of the three lists (A, B, or C) presented on a laptop computer. However, for the two audio-only groups the screen was not visible, and for the visual group, participants were asked to remove their headphones and to speech-read each sentence. Each session (pre-training, training, and post-training) comprised 45 sentences.

#### *Speech reception thresholds*

For the baseline, pre-training and post-training sessions (but not for the training), we measured participants' recognition accuracy as speech reception thresholds (SRTs; Adank and Janse, 2010; Banks et al., 2015) in speech-shaped background noise, a sensitive measure which eliminates the need to equate starting accuracy between participants as it keeps recognition accuracy constant throughout. An adaptive staircase procedure (Plomp and Mimpen, 1979) varied the SNR per trial depending on the participants' response; that is, the SNR increased following an incorrect response, decreased following a correct response, or remained constant if a response was 50% correct. Thus, the SNR decreased as participants' performance improved (Baker and Rosen, 2001). The SNR varied in pre-determined steps of 8 dB for the first two changes and 2 dB thereafter, and maintained recognition accuracy (number of correctly repeated keywords) at 50%. The procedure was carried out using Matlab (R2010a). The mean SNR for all reversals indicated the SRT measurement for each participant, with an average of 21 reversals (SD = 5.4) per 45 trials.
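The logic of the adaptive procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Matlab implementation used in the study: the function names (`step_size`, `run_staircase`) and the starting SNR of 20 dB are hypothetical, and a reversal is assumed to be recorded at the SNR of the trial on which the direction of change flips.

```python
from statistics import mean

def step_size(n_changes, large=8.0, small=2.0):
    """8 dB for the first two SNR changes, 2 dB thereafter (values from the text)."""
    return large if n_changes < 2 else small

def run_staircase(keywords_correct, start_snr=20.0, n_keywords=4):
    """Return (snr_per_trial, reversal_snrs) for a sequence of trial scores.

    keywords_correct holds the number of keywords repeated correctly per trial.
    start_snr is a hypothetical starting level, not a value from the study.
    """
    snr = start_snr
    track, reversals = [], []
    n_changes = 0
    last_direction = 0  # +1 = SNR raised, -1 = SNR lowered, 0 = no change yet
    for k in keywords_correct:
        track.append(snr)
        proportion = k / n_keywords
        if proportion > 0.5:
            direction = -1  # correct response: lower the SNR (harder)
        elif proportion < 0.5:
            direction = 1   # incorrect response: raise the SNR (easier)
        else:
            direction = 0   # exactly 50% correct: SNR stays constant
        if direction != 0:
            if last_direction != 0 and direction != last_direction:
                reversals.append(snr)  # direction of change flipped here
            snr += direction * step_size(n_changes)
            n_changes += 1
            last_direction = direction
    return track, reversals

# The SRT estimate is the mean SNR across all reversals.
track, reversals = run_staircase([4, 4, 4, 1, 4, 2, 0, 4])
srt = mean(reversals)
```

Because the SNR moves toward the level at which half of the keywords are repeated correctly, the reversal points cluster around the 50% threshold, which is why their mean serves as the SRT.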

#### *Speech recognition task*

Throughout the experiment, we instructed participants to repeat out loud as much of each sentence as they could in their normal voice and without imitating the accent. The experimenter scored participants' responses immediately after each trial, according to how many keywords (content or function words) they correctly repeated out of a maximum of four (for example, "a pot of tea helps to pass the evening"). Responses were scored as correct despite incorrect suffixes (such as -s, -ed, -ing) or verb endings; however, if only part of a word (including compound words) was repeated this was scored as incorrect (Dupoux and Green, 1997; Golomb et al., 2007; Banks et al., 2015). If a participant imitated the novel accent rather than responding in their own accent this was also scored as incorrect, as we could not make a clear judgment as to whether they had recognized the correct word.

All tests and training were carried out in a quiet laboratory in one session lasting ∼50 min. Auditory stimuli for the baseline and testing sessions were presented using Matlab software (R2010a, Mathworks, Inc.), and training stimuli were presented using iTunes 10.5.1 on an Apple MacBook Pro. Participants wore sound-attenuating headphones (Sennheiser HD 25-SP II) for the duration of the experiment, except during the visual (speech-reading) training. The experimenter adjusted the volume to a comfortable level for the first participant and then kept it at the same level for all participants thereafter.

#### Data Analysis

Perceptual adaptation was defined as the difference in SRTs before and after the training. We carried out a mixed-design ANOVA on the SRTs, with a within-participant factor of testing session (two levels: pre- and post-training) and a between-group factor of training type (five levels: audio-only, audiovisual, visual-only, audio-only + noise, audiovisual + noise). To investigate recognition of the novel accent in the different training modalities, we also analyzed accuracy scores (% correct keywords) from *within* the training session by conducting a one-way ANOVA (five levels: audio-only, audiovisual, visual-only, audio-only + noise, audiovisual + noise). To verify that baseline and pre-training measurements were equal across all groups, we carried out a one-way ANOVA for each data set with the between-group factor of training group (five levels). All *post hoc t*-tests carried out were two-tailed and we applied a Bonferroni correction for multiple comparisons. We identified two outliers in the data (one for the novel-accented SRTs and one for the baseline SRTs) with standardized residuals *>* 3.29<sup>1</sup>, and these scores were modified to the value of the group mean SRT plus two standard deviations. Unless otherwise stated, our data met all other assumptions for the parametric tests that we used.
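The outlier treatment described above can be illustrated with a short Python sketch. It rests on assumptions that the text does not specify: standardized scores are computed within each group from the sample mean and sample standard deviation (with the extreme score included), the function name `adjust_outliers` is hypothetical, and the mirrored treatment of extremely low scores is added only for completeness.

```python
from statistics import mean, stdev

def adjust_outliers(scores, z_cutoff=3.29, n_sd=2.0):
    """Replace extreme scores with the group mean plus (or minus) n_sd SDs.

    Assumption: z-scores are computed within the group, using the sample
    mean and sample SD calculated with the extreme score included.
    """
    m = mean(scores)
    sd = stdev(scores)
    adjusted = []
    for x in scores:
        z = (x - m) / sd
        if z > z_cutoff:
            adjusted.append(m + n_sd * sd)  # rule stated in the text
        elif z < -z_cutoff:
            adjusted.append(m - n_sd * sd)  # mirror case, not described in the text
        else:
            adjusted.append(x)
    return adjusted
```

The sketch requires at least two scores that are not all identical, since the standard deviation must be non-zero.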

#### Results

**Table 2** shows the mean SRTs for the familiar-accented (baseline) speech, and mean pre- and post-training SRTs for the novel accent, per training group. As SRTs represent the SNR (dB) at which 50% recognition accuracy is achieved, higher levels reflect poorer performance. SRTs in all groups decreased following the training by ∼2 dB, indicating that participants' recognition of the accented speech improved over time and that perceptual adaptation took place. **Figure 2** shows the mean decrease in SRTs (amount of perceptual adaptation) following the training for each group. **Figures 3A–E** show a negative relationship between the amount of adaptation and pre-training SRTs; that is, participants who initially performed relatively worse adapted the most.

No significant differences were observed between groups for baseline SRTs (recognition of familiar-accented speech), or for pre-training SRTs (recognition of the novel-accented speech), confirming that the groups were equally matched for comparison. As expected, baseline SRTs across all five groups (*M* = 0.5 dB, SD = 1.68) were significantly lower than mean pre-training SRTs across all groups (*M* = 7.7 dB, SD = 2.50), *t*(99) = 29.19, *p <* 0.001, confirming that the novel accent negatively affected participants' recognition in comparison to the familiar accent. We observed a main effect of testing session, *F*(1,95) = 119.48, *p <* 0.001, η<sub>p</sub><sup>2</sup> = 0.56. Paired-sample *t*-tests (Bonferroni correction, *p <* 0.01) confirmed that decreases in SRTs following the training were statistically significant in every group (see **Table 2**); thus, participants' recognition of the accented speech significantly improved between the two sessions. Neither the main effect of training group nor the testing session × training type interaction was significant (*p*s *>* 0.05).

<sup>1</sup> In normally distributed data, z-scores would not be expected to be greater than 3.29.

TABLE 2 | Mean SRTs in dB per training group (Study 1).

*Training group N* = *20.*

A null finding may be interpreted in two ways: (1) that no effect is present in the population and the null hypothesis is true, or (2) that the data are inconclusive; however, significance testing cannot confirm these interpretations. Calculating Bayes factor (*B*) can, however, test whether the null hypothesis is likely, regardless of observed *p*-values. We calculated Bayes factor for differences in the amount of adaptation between all five groups (see **Figure 2** and **Table 3**). These analyses indicated that the null hypothesis (that there was no difference in adaptation between the groups) was supported for the following comparisons: audiovisual vs. audio, audiovisual vs. visual, audiovisual + noise vs. visual, audiovisual vs. audio + noise, and audio vs. audio + noise (*B <* 0.33; significant differences between these groups were predicted if our experimental hypotheses were true). All other comparisons indicated that data from this sample were inconclusive (0.33 *< B <* 3.0).
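To make the interpretation of *B* concrete, the following Python sketch computes a Bayes factor for an observed mean difference against a point null of zero. It is an illustration under explicit assumptions, not a reconstruction of the calculation reported here: the likelihood of the observed difference is assumed to be normal with a known standard error, the alternative hypothesis is assumed to place a uniform prior on the true effect between `lower` and `upper`, and the function name `bayes_factor_uniform` is hypothetical.

```python
import math

def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sd):
    """Cumulative distribution function of the same normal distribution."""
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

def bayes_factor_uniform(obs, se, lower, upper):
    """Bayes factor for H1 (effect uniform on [lower, upper]) over H0 (effect = 0).

    obs is the observed mean difference and se its standard error.
    """
    # Marginal likelihood under H1: the normal likelihood averaged over the prior.
    like_h1 = (normal_cdf(upper, obs, se) - normal_cdf(lower, obs, se)) / (upper - lower)
    # Likelihood of the observed difference if the true effect is exactly zero.
    like_h0 = normal_pdf(obs, 0.0, se)
    return like_h1 / like_h0
```

With the thresholds used in this article, a value below 0.33 is read as support for the null hypothesis and a value between 0.33 and 3.0 as inconclusive. The result depends on the range assumed for the prior, so the limits should be justified by the effect sizes that the experimental hypothesis predicts.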

#### Analysis of the Training Data

To further investigate how the presence of audiovisual cues affected participants' recognition of the novel accent, we analyzed recognition accuracy in the five groups *during* the training (**Figure 4**). Analysis of these data revealed a significant effect of training condition, *F*(4,95) = 331.47, *p <* 0.001, η<sub>p</sub><sup>2</sup> = 0.93. Pairwise comparisons (Bonferroni correction, *p <* 0.005) confirmed that recognition accuracy was significantly lower in the visual group (*M* = 1.4%, SD = 0.82) than in all other groups, *p <* 0.001. Recognition accuracy was also significantly higher in the audiovisual (*M* = 85.2%, SD = 7.67) and audio-only (*M* = 82.7%, SD = 7.17) groups compared to the audiovisual + noise (*M* = 60.4%, SD = 11.90) and audio-only + noise (*M* = 45.6%, SD = 10.01) groups, *p*s *<* 0.001. Recognition accuracy was significantly higher in the audiovisual + noise compared to the audio-only + noise group, *p <* 0.001. However, the marginal difference between the audiovisual and audio-only groups was not statistically significant, *p* = 0.289, and a Bayes factor calculation suggested that the data were inconclusive, *B* = 0.30 (uniform distribution, 0–30% limit).

#### Discussion

In Study 1, we investigated whether training with audiovisual or audio-only speech, with or without the presence of background noise, affected perceptual adaptation to a novel accent. As in previous studies of perceptual adaptation to accented speech (Clarke and Garrett, 2004; Bradlow and Bent, 2008; Maye et al., 2008; Adank and Janse, 2010; Gordon-Salant et al., 2010; Janse and Adank, 2012), we observed significant improvements in recognition of the novel accent over time, represented by a decrease in SRTs following the training.

Contrary to our predictions, there was no significant difference in the amount of adaptation between any of the groups; that is, the type of training had no effect on adaptation. Bayes factor suggested that non-significant differences in adaptation for four of the group comparisons (most importantly, audiovisual vs. audio-only) supported the null hypothesis. This would suggest that audiovisual cues do not benefit adaptation to accented speech better than audio-only or visual-only stimuli. However, for most of the group comparisons (particularly audio-only vs. visual), Bayes factor indicated that the data were inconclusive. We had included visual training as a control group, and predicted that training with audio-only stimuli would lead to greater adaptation in comparison – this would indicate that the training had been effective. However, the difference between these groups was inconclusive, and we therefore cannot ascertain whether the training was fully effective, or whether the lack of differences between groups was due to methodological reasons.

Analysis of data from the training session confirmed our predictions that recognition accuracy for the visual group would be considerably and significantly lower than all other groups, and that audiovisual cues would provide a benefit to recognition of the accented speech, as recognition accuracy was significantly higher in the audiovisual + noise group than in the audio-only + noise group. However, the same 'audiovisual benefit' was not present for participants carrying out training in quiet, although this null effect was inconclusive – perhaps because accuracy was almost at ceiling level for these groups (Ross et al., 2007). Nevertheless, any effects observed during the training did not transfer to subsequent auditory testing, again suggesting that the training was not fully effective.

There are several possible explanations for this. Firstly, the timing of the training, and the length of the pre-training session, meant that participants had already begun adapting to the novel accent before the training. The training may therefore not have been fully beneficial at this stage. With longer exposure to the audiovisual stimuli at an earlier time point, we may have observed an effect of greater adaptation for this group. Secondly, inconsistency between the training and subsequent testing sessions may have affected any benefits from the training, as consistency between training and subsequent testing can be beneficial to performance (Godden and Baddeley, 1975; Polyn et al., 2009). In fact, the switch to a separate training session may have been disruptive to adaptation. Thirdly, audiovisual cues from the particular speaker, or for the particular accent we used, may not have been sufficiently beneficial to improve perceptual adaptation. The relative benefit from audiovisual cues varies between different speakers (Kricos and Lesner, 1982, 1985), and this may also be the case for different accents. Indeed, Kawase et al. (2014) demonstrated that audiovisual cues vary in how much they can benefit recognition of non-native phonemes, in some cases even inhibiting recognition. Furthermore, Hazan et al. (2005) observed greater adaptation after audiovisual compared to audio-only training for *non-native* phonemes, whereas our novel accent was based on native (regional) English accents. We may therefore have observed a greater benefit to perceptual adaptation with audiovisual cues from a different speaker, and with a non-native accent.

TABLE 3 | Bayes factor (*B*) for comparisons of adaptation between groups in Study 1.

*N* = *20 per group. Calculations based on a uniform distribution with lower and upper limits of 0–6 dB change. B < 0.33 indicates results in favor of the null hypothesis; B > 3 indicates results in favor of the experimental hypothesis; intermediate values indicate that data are inconclusive. \*B < 0.33.*

To answer these remaining questions, we carried out a second study using a different experimental design, accent and speaker. In Study 2, we exposed participants to 90 sentences of unfamiliar accented speech in either the audiovisual or auditory modality without separate training, thus addressing concerns that the timing and length of the training, or inconsistency between training and testing sessions, affected the benefits gained from audiovisual cues in Study 1. Furthermore, this design allowed us to analyze the effects of audiovisual cues on adaptation at different stages of the experiment, for example during early compared with later trials, which may reveal more subtle effects (Janse and Adank, 2012). Secondly, we used a natural, non-native (Japanese) accent produced by a different speaker for our stimuli. Additionally, we recorded participants' eye movements using an eye-tracker to verify that they were continually looking at the speaker's face during testing. We increased the number of participants in each group to address any potential concerns that sample size prevented the effects in Study 1 from reaching statistical significance. By addressing these remaining questions, we hoped to clarify whether audiovisual speech cues can indeed benefit perceptual adaptation to unfamiliar accented speech.

# Study 2

# Methods

#### Participants

Sixty-five young adults (five male, *Median* = 20.55 years, age range 18–30 years) recruited from the University of Manchester participated in the study, following the same procedure and exclusion criteria as Study 1. Two participants were excluded (one male, one female) due to data loss during the eye-tracking procedure (see Data Analysis for full details), and one female participant was excluded due to technical issues during the experiment.

#### Materials

Stimulus material consisted of 120 of the IEEE Harvard sentences (IEEE, 1969) that had been used in Study 1. A 30-year-old male native Japanese speaker recited 90 of them in a soundproofed laboratory, and these were recorded and edited using the same equipment and procedure as for Study 1. Speech-shaped background noise was added to the audio files using a custom Matlab script to create stimuli at SNRs of +4 to −4 dB. Background noise was included throughout to avoid ceiling effects associated with rapid perceptual adaptation to an unfamiliar accent (for example, Clarke and Garrett, 2004). For the audiovisual condition, the audio files were combined with the corresponding video clips using Experiment Builder software (SR Research, Mississauga, ON, Canada) to create congruous audiovisual stimuli. For the audio-only condition, a different static image of the speaker, taken from the video recordings, was displayed on screen simultaneously with each audio recording; this was to ensure that participants were processing auditory and visual information in both conditions. All stimuli were presented in a randomized order for each participant.

The native-accent baseline stimuli comprised the same 15 standard British English sentences from Study 1, plus an additional 15 recorded by the same speaker. We used 30 sentences to ensure that participants habituated to the background noise and task, as the SRT from this test would be used to set the SNR for presentation of the non-native accented stimuli. The baseline sentences were presented in a fixed order for all participants.

#### Procedure

All tests were carried out in a soundproofed booth in one session lasting ∼40 min. The familiar-accented baseline stimuli were presented and scored using Matlab software (R2010a, Mathworks, Inc.), through Sennheiser HD 25-SP II headphones, in the same adaptive staircase procedure used in Study 1 (see Speech Reception Thresholds for details). An Eyelink 1000 eye-tracker with Experiment Builder software (SR Research, Mississauga, ON, Canada) was used to present the accented stimuli and to record participants' eye movements. Participants wore the same headphones for the duration of the experiment, and sat with their chin on a chin rest facing the computer monitor. The experimenter adjusted the chin rest so that each participant's eyes were level with the top half of the display screen, which was positioned 30 cm from the chin rest. Eye movements were recorded by tracking the pupil and corneal reflection of the right eye at a sample rate of 1000 Hz. Calibration was carried out using a standard 9-point configuration before the start of the experiment, and 5 min after the start time. A drift-check was carried out immediately before each trial and calibration was performed again if required.

Participants were randomly allocated to either the audiovisual (*N* = 32) or audio-only (*N* = 30) condition. The experimenter set the volume for all stimuli at a comfortable level for the first participant, and kept it at the same level for all participants thereafter. Participants first listened to the 30 native-accented baseline sentences. The SRT acquired for this test was then used to set the SNR at which the accented stimuli were presented in the background noise, for each individual participant. The SRT was rounded to the nearest whole number (for example, if a participant's SRT for the familiar-accented speech was −1.3 dB, the SNR for the accented stimuli was set at −1 dB). This was intended to equate baseline recognition for the audiovisual group at ∼50% accuracy; however, we expected recognition to be lower for the audio-only group. This would allow us to verify the amount of 'benefit' provided by the audiovisual speech. In both conditions, participants were requested to watch the screen and to repeat each sentence following the same task and scoring procedure as in Study 1. Oral responses were recorded using a Panasonic lapel microphone attached to the chin rest, and responses were scored retrospectively by the experimenter. All 90 accented sentences were presented consecutively, and participants pressed the space bar to trigger each trial at their own pace.
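The rounding rule described above can be expressed in a few lines. This is a minimal sketch rather than the authors' Matlab code: the function name is hypothetical, and the treatment of values exactly halfway between two whole numbers (rounded upward here) is an assumption, since the text does not say how such ties were handled.

```python
import math


def snr_from_srt(srt_db):
    """Round a baseline SRT (in dB) to the nearest whole number for use as the SNR (in dB)."""
    # floor(x + 0.5) rounds exact halves upward; this tie rule is an assumption.
    return math.floor(srt_db + 0.5)


# Example from the text: an SRT of -1.3 dB gives an SNR of -1 dB.
print(snr_from_srt(-1.3))
```

Python's built-in `round` would also return −1 for this example, but it rounds exact halves to the nearest even integer, so the explicit rule above makes the tie behaviour visible.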

#### Data Analysis

We measured recognition accuracy by calculating % correctly repeated keywords per sentence. To compare recognition accuracy between groups, and to analyze changes over time, we fitted a linear function to each participant's recognition data (Erb et al., 2012; Banks et al., 2015) using the equation *y* = *mx*+*b*, where *y* is the mean SRT, *x* is time (trial), *m* is the slope, and *b* is the intercept. The intercept of each participant's linear fit was used as the measurement of recognition accuracy, and the slope was used as the measurement of adaptation. We carried out *t*-tests and Bayes factor calculations to analyze effects of modality on recognition accuracy and perceptual adaptation. To confirm that participants in the audiovisual group were predominantly looking at the speaker's face, we created a semi-circular region of interest around this area, and calculated percent fixation time in this region for the duration of the stimulus presentation. We analyzed eye-tracking samples to check for data loss (for example due to blinks or head movements); trials with *>*20% data loss were excluded, and two participants who had *>*5 trials excluded were not included in our analyses (number of excluded trials: *M* = 1.24, SD = 3.06). For consistency, eye movement data were collected for both groups; however, as the data from the audio-only group are not relevant to this paper, these data will not be discussed further. All other analyses were conducted in the same way as in Study 1.
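The per-participant fit described above can be sketched as an ordinary least-squares line through a participant's scores over time, with the slope taken as the adaptation measure and the intercept as the accuracy measure. This is an illustrative sketch only: the function name and the example scores are made up, and the text does not specify whether scores were averaged into blocks before fitting or how time points were numbered, so numbering them from 1 is an assumption.

```python
def fit_line(scores):
    """Least-squares fit of y = m*x + b to scores at time points x = 1, 2, ..., n.

    Returns (slope, intercept): the slope m indexes change over time
    (adaptation) and the intercept b indexes starting accuracy.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("at least two time points are needed to fit a line")
    xs = range(1, n + 1)  # assumed numbering of time points
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical % keywords correct for one participant at six successive time points.
example_scores = [40.0, 43.0, 41.0, 46.0, 47.0, 50.0]
slope, intercept = fit_line(example_scores)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Fitting each participant separately yields one slope and one intercept per person, which can then be compared between groups with the *t*-tests and Bayes factor calculations mentioned above.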

#### Results

**Figure 5** shows mean recognition accuracy of the accented speech in the audiovisual and audio-only modalities, with linear fits. Recognition accuracy increased over time by a maximum of 10.8% (SD = 10.94) in the audiovisual group, and a maximum of 8.7% (SD = 13.61) in the audio-only group, suggesting that both groups adapted to the non-native accented speech. Recognition accuracy was consistently greater in the audiovisual group than the audio-only group, with a difference of ∼30% between the groups throughout the experiment. An independent-samples *t*-test confirmed that there was no significant difference in native-accented SRTs between the two groups, and that they were equally matched in their baseline ability to process non-native speech in background noise. **Figures 6A,B** show a negative relationship between the slope and intercept in each group indicating that, as in Study 1, participants with lower starting accuracy adapted the most.

There was a significant difference in the intercept for the audiovisual group (*M* = 45.32, SD = 9.52) and the audio-only group (*M* = 14.44, SD = 6.82); *t*(57) = 13.82, *p <* 0.001, *d'* = 3.58, confirming that recognition accuracy was significantly greater for the audiovisual group. However, there was no significant difference in slope between the audiovisual group (*M* = 1.78, SD = 1.91) and the audio-only group (*M* = 1.27, SD = 1.77), *t*(57) = 1.07, *p* = 0.291, *d'* = 0.28. A Bayes factor calculation confirmed that the null hypothesis (that there was no difference in adaptation between the two groups) was likely, *B* = 0.09 (based on a uniform distribution and upper and lower limits of 0–20% improvement). Finally, analysis of the eye-tracking data confirmed that participants primarily looked at the speaker's face during presentation of the audiovisual stimuli (% gaze time on the speaker's face: *M* = 100%, SD = 0.01%).

#### Discussion

Study 2 investigated whether perceptual adaptation to non-native accented speech differed when participants were exposed to audiovisual or audio-only stimuli. In comparison to Study 1, we exposed participants to the accented stimuli in either the audiovisual or audio-only modality without separate training. Participants were now exposed to twice as many audiovisual sentences as the training groups in Study 1, and could potentially benefit from the audiovisual cues at all stages of the experiment. Participants also performed the task in consistent conditions throughout the experiment without interruption, rather than in different modalities for testing and training. We used a Japanese accent and a different speaker for our stimuli to test whether audiovisual cues were more beneficial for recognizing a non-native accent (in comparison to the novel accent used in Study 1). Lastly, we recorded participants' eye gaze to confirm that they looked predominantly at the speaker's face.

As in Study 1, recognition accuracy of the accented speech significantly improved over time. We observed a maximum increase of ∼10%, which is similar to previous studies of perceptual adaptation to accented speech (Bradlow and Bent, 2008; Gordon-Salant et al., 2010; Janse and Adank, 2012). As predicted, participants exposed to audiovisual stimuli had better overall recognition of the foreign-accented speech in noise than those exposed to audio-only stimuli. This replicates previous findings that audiovisual speech cues can improve recognition of accented speech in noise (Janse and Adank, 2012; Yi et al., 2013). However, we found no significant difference in the amount of perceptual adaptation between the audiovisual and audio-only groups at any stage of the experiment. If audiovisual cues were beneficial to perceptual adaptation of accented speech (in comparison to audio-only cues), we expected to observe a statistically significant difference.

# Overall Discussion

In the two studies described here, we investigated differences in perceptual adaptation to accented speech with audiovisual or audio-only stimuli. Study 1 employed an offline training design and a novel accent, while participants in Study 2 were exposed to a non-native accent in either modality without separate training. In both studies, we observed a benefit from audiovisual stimuli to recognition of the accented speech in noise. However, neither study demonstrated that audiovisual stimuli can improve perceptual adaptation to accented speech when compared to audio-only stimuli; furthermore, findings from Study 2 supported the null hypothesis.

#### Audiovisual Cues do not Improve Perceptual Adaptation to Accented Speech

We predicted that listeners would perceptually adapt to accented speech more when exposed to audiovisual stimuli, compared to just audio-only stimuli. We hypothesized that listeners would benefit from improved overall perception of the accented speech when visual cues were present (Arnold and Hill, 2001; Janse and Adank, 2012; Yi et al., 2013; Kawase et al., 2014), and would therefore be better able to disambiguate the unfamiliar phonetic pattern of the accent, and map it to the correct lexical items more successfully.

In Study 1, there was no significant difference in adaptation between any of the groups. Bayes calculations indicated that there was indeed no effect present between the audiovisual and audio-only groups; however, much of the data was inconclusive and the training may therefore not have been fully effective. We argued that this may have been due to: (1) the length or timing of the training, (2) inconsistencies between the training and testing sessions, or (3) the specific accent or speaker. Nevertheless, after addressing these concerns in the design of Study 2, there was still no clear advantage for perceptual adaptation to accented speech with audiovisual cues. In fact, Bayes analyses suggested that the data from Study 2 support the null hypothesis – that is, the presence of visual cues does not benefit adaptation to accented speech.

Our results support previous findings by Janse and Adank (2012), who observed no significant difference in adaptation between audiovisual and audio-only accented sentences in older adults. However, our results conflict with the findings of Hazan et al. (2005), who observed that audiovisual cues *can* improve perceptual adaptation to individual non-native phonemes. These conflicting results suggest that, although audiovisual cues may help listeners to perceptually learn individual speech sounds (as in Hazan et al., 2005), this benefit does not generalize to longer items of accented speech such as sentences (as used in the present study), perhaps reflecting the increased difficulty of speech-reading longer items (Grant and Seitz, 1998; Sommers et al., 2005).

Our results suggest that perceptual adaptation to accented speech is a robust ability that is not necessarily affected by the perceptual quality of the speech, as our participants adapted to the accented speech equally in conditions with or without visual cues that improved intelligibility. Indeed, Bradlow and Bent (2008) have demonstrated that the relative intelligibility of an accent (and therefore the perceived quality of the perceptual input) does not necessarily influence the amount that listeners can adapt to it. Perceptual adaptation to accented speech may therefore be primarily driven by factors internal to the listener rather than the perceptual environment, for example statistical learning (Neger et al., 2014) or cognitive abilities (Adank and Janse, 2010; Janse and Adank, 2012; Banks et al., 2015). However, it is possible that audiovisual cues benefit listeners in ways that we did not measure in the present studies, for example in terms of listening effort – that is, the presence of audiovisual cues may have reduced the effort associated with processing accented speech (Van Engen and Peelle, 2014). A more sensitive measure such as response times may have revealed a benefit from the audiovisual cues, although this was not the case for older adults (Janse and Adank, 2012).

Some limitations to the present findings should also be acknowledged. Firstly, a benefit from audiovisual cues may be present with more exposure. Indeed, a significant benefit from audiovisual cues has been observed for perceptual adaptation to noise-vocoded speech after exposure to a greater number of stimuli than in the present two studies (Pilling and Thomas, 2011). Secondly, the audio-only group in Study 2 had a lower baseline level of recognition accuracy than the audiovisual group (15% compared to 45% accuracy); this was intentional and allowed us to confirm that the presence of audiovisual speech cues from our speaker was beneficial to performance. However, it left more room for improvement in the audio-only group and potentially impacted the amount of adaptation our participants achieved, as in both groups poorer performers adapted the most (see **Figures 6A,B**). A comparison of adaptation to audiovisual and audio-only accented speech, with baseline recognition equated in both groups, may produce different results.

#### Audiovisual Cues Benefit Recognition of Accented Speech in Noise

Results from both studies replicate previous findings that audiovisual cues can benefit recognition of accented speech in noise when compared to only auditory cues (Arnold and Hill, 2001; Janse and Adank, 2012; Yi et al., 2013; Kawase et al., 2014). We observed a difference in recognition accuracy of ∼30% between the two groups in Study 2, and 15% between the two groups in Study 1 (during training with background noise). It is likely that visual cues from a speaker's facial movements help the listener to identify ambiguous or unclear phonemes by constraining the possible interpretations, or perhaps helping to identify prosodic cues (Swerts and Krahmer, 2008). Nevertheless, in both studies, we only observed greater recognition accuracy for the audiovisual groups when background noise was present, suggesting that benefits may have been related to compensation for the background noise, rather than the accented speech *per se*. Particularly, in Study 1 we did not observe a significant difference in recognition accuracy between the audiovisual and audio-only training groups when the stimuli were presented in quiet. However, recognition accuracy for these training groups was almost at ceiling level and the additional perceptual input from the audiovisual cues may therefore have been redundant, as the perceived clarity of the auditory signal can influence the benefits gained from audiovisual speech cues (Ross et al., 2007).

Listeners can perceptually adapt to accented speech very rapidly, even after exposure to a few sentences (cf. Clarke and Garrett, 2004), and this poses a practical limitation to studies of perceptual adaptation to, or recognition of, accented speech. As in the present studies, the most commonly used method to avoid ceiling effects is to add background noise, and this is the context in which an audiovisual benefit to accented sentences has previously been observed (Janse and Adank, 2012; Yi et al., 2013). However, two studies have also demonstrated this effect with audiovisual stimuli presented in quiet. Kawase et al. (2014) investigated adaptation to audiovisual accented phonemes in quiet; however, removing any lexical or semantic information increases the task difficulty, but perhaps does not reflect an ecologically valid context. Arnold and Hill (2001) used longer speech passages and a semantic comprehension task to assess the contribution of audiovisual cues; but, the task may have reflected semantic memory processes rather than speech recognition *per se*, and the result has not since been replicated. The extent to which audiovisual cues can benefit recognition of accented speech in optimal, quiet listening conditions remains, therefore, to be confirmed.

Finally, we observed different amounts of audiovisual benefit between the two studies. This may be explained by differences in the speaker and accent used. Kawase et al. (2014) observed that audiovisual speech affects the perception of non-native phonemes to varying degrees; it is therefore likely that different accents result in varying benefits from visual speech cues. Furthermore, visemes (the visual equivalent of phonemes) from different speakers can vary in intelligibility (for example, Kricos and Lesner, 1982, 1985), possibly resulting in different benefits from our two speakers. Our results therefore add to existing evidence that being face-to-face with a speaker does not always benefit the listener to the same extent.

#### Conclusion

The present studies demonstrate that audiovisual speech cues do not benefit perceptual adaptation to accented speech – that is, observing audiovisual cues from a speaker's face does not lead to greater improvements in recognition of accented speech over time, when compared to listening to auditory speech alone. Audiovisual cues may still provide benefits to recognition of accented speech in noisy listening conditions, as we found a benefit to recognition of both types of accented speech in noise in comparison to audio-only speech. However, our results also demonstrate that the benefits obtained from audiovisual speech cues vary greatly, and the extent to which they benefit recognition of accented speech, as opposed to background noise, still needs to be clarified.

# Acknowledgments

This work was funded by a Biotechnology and Biological Sciences Research Council research studentship awarded to BB, and by The University of Manchester. The authors thank Stuart Rosen for supplying the adaptive staircase program and the speech-shaped noise.

#### References

comprehension in noisy environment. *Cereb. Cortex* 17, 1147–1153. doi: 10.1093/cercor/bhl024


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Banks, Gowen, Munro and Adank. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Visual Feedback of Tongue Movement for Novel Speech Sound Learning

#### William F. Katz \* and Sonya Mehta

Speech Production Lab, Callier Center for Communication Disorders, School of Behavioral and Brain Sciences, The University of Texas at Dallas, Dallas, TX, USA

Pronunciation training studies have yielded important information concerning the processing of audiovisual (AV) information. Second language (L2) learners show increased reliance on bottom-up, multimodal input for speech perception (compared to monolingual individuals). However, little is known about the role of viewing one's own speech articulation processes during speech training. The current study investigated whether real-time, visual feedback for tongue movement can improve a speaker's learning of non-native speech sounds. An interactive 3D tongue visualization system based on electromagnetic articulography (EMA) was used in a speech training experiment. Native speakers of American English produced a novel speech sound (/ɖ/; a voiced, coronal, palatal stop) before, during, and after trials in which they viewed their own speech movements using the 3D model. Talkers' productions were evaluated using kinematic (tongue-tip spatial positioning) and acoustic (burst spectra) measures. The results indicated a rapid gain in accuracy associated with visual feedback training. The findings are discussed with respect to neural models for multimodal speech processing.

Keywords: speech production, second language learning, visual feedback, audiovisual integration, electromagnetic articulography, articulation therapy

#### Edited by:

Marcelo L. Berthier, University of Malaga, Spain

#### Reviewed by:

Peter Sörös, University of Western Ontario, Canada Caroline A. Niziolek, Boston University, USA

> \*Correspondence: William F. Katz wkatz@utdallas.edu

Received: 13 June 2015 Accepted: 26 October 2015 Published: 19 November 2015

#### Citation:

Katz WF and Mehta S (2015) Visual Feedback of Tongue Movement for Novel Speech Sound Learning. Front. Hum. Neurosci. 9:612. doi: 10.3389/fnhum.2015.00612

# INTRODUCTION

Natural conversation is a multimodal process, where the visual information contained in a speaker's face plays an important role in decoding the speech signal. Integration of the auditory and visual modalities has long been known to be more advantageous to speech perception than either input alone. Early studies of lip-reading found that individuals with hearing loss could more accurately recognize familiar utterances when provided with both auditory and visual cues compared to either modality on its own (Numbers and Hudgins, 1948; Erber, 1975). Research on healthy hearing populations has also shown that audiovisual integration enhances comprehension of spoken stimuli, particularly in noisy environments or situations where the speaker has a strong foreign accent (O'Neill, 1954; Sumby and Pollack, 1954; Erber, 1975; Reisberg et al., 1987). Even under optimal listening conditions, observing a talker's face improves comprehension for complex utterances, suggesting that visual correlates of speech movement are a central component to processing speech sounds (Reisberg et al., 1987; Arnold and Hill, 2001).

Studies investigating how listeners process conflicting audio and visual signals also support a critical role of the visual system during speech perception (McGurk and MacDonald, 1976; Massaro, 1984; Summerfield and McGrath, 1984). For example, listeners presented with the auditory signal for "ba" concurrently with the visual signal for "ga" typically report a blended percept, the well-known "McGurk effect." A recent study by Sams et al. (2005) demonstrated that the McGurk effect occurs even if the source of the visual input is the listener's own face. In this study, subjects wore headphones and silently articulated a "pa" or "ka" while observing their productions in a mirror as a congruent or incongruent audio stimulus was simultaneously presented. In addition to replicating the basic McGurk (blended) effect, researchers found that simultaneous silent articulation alone moderately improved auditory comprehension, suggesting that knowledge from one's own motor experience in speech production is also exploited during speech perception. Other cross-modal studies support this view. For instance, silently articulating a syllable in synchrony with the presentation of a concordant auditory and/or visually ambiguous speech stimulus has been found to improve syllable identification, with concurrent mouthing further speeding the perceptual processing of a concordant stimulus (Sato et al., 2013; also see Mochida et al., 2013; D'Ausilio et al., 2014). Taken together, these studies indicate that listeners benefit from multimodal speech information during the perception process.

Audiovisual (AV) information also plays an important role in acquiring novel speech sounds, according to studies of second language (L2) learning. Research has shown that speech comprehension by non-native speakers is influenced by the presence/absence of visual input (see Marian, 2009, for review). For instance, Spanish-speakers exposed to Catalan can better discriminate the non-native tense-lax vowel pair /e/ and /ε/ when visual information is added (Navarra and Soto-Faraco, 2007).

Computer-assisted pronunciation training (CAPT) systems have provided a new means of examining AV processing during language learning. Many CAPT systems, such as "Baldi" (Massaro and Cohen, 1998; Massaro, 2003; Massaro et al., 2006), "ARTUR" (Engwall et al., 2006; Engwall and Bälter, 2007; Engwall, 2008), "ATH" (Badin et al., 2008), "Vivian" (Fagel and Madany, 2008), and "Speech Tutor" (Kröger et al., 2010), employ animated talking heads, most of which can optionally display transparent vocal tracts showing tongue movement. "Tongue reading" studies based on these systems have shown small but consistent perceptual improvement when tongue movement information is added to the visual display. Such effects have been noted in word retrieval for acoustically degraded sentences (Wik and Engwall, 2008) and in a forced-choice consonant identification task (Badin et al., 2010).

Whereas the visual effects on speech perception are fairly well-established, the visual effects on speech production are less clearly understood. Massaro and Light (2003) investigated the effectiveness of using Baldi in teaching non-native phonetic contrasts (/r/-/l/) to Japanese learners of English. Both external and internal views (i.e., showing images of the speech articulators) of Baldi were found to be effective, with no added benefit noted for the internal articulatory view. A subsequent, rather preliminary report on English-speaking students learning Chinese and Arabic phonetic contrasts reported similar negative results for the addition of visual, articulatory information (Massaro et al., 2008). In this study, training with the Baldi avatar showing face (Mandarin) or internal articulatory processes (Arabic) provided no significant improvement in a small group of students' productions, as rated by native listeners.

In contrast, Liu et al. (2007) observed potentially positive effects of visual feedback on speech production for 101 English-speaking students learning Mandarin. This investigation contrasted three feedback conditions: audio only, human audiovisual, and a Baldi avatar showing visible articulators. Results indicated that all three methods improved students' pronunciation accuracy. However, for the final rime pronunciation both the human audiovisual and Baldi condition scores were higher than audio-only, with the Baldi condition significantly higher than the audio condition. This pattern is compatible with the view that information concerning the internal articulators helps relay information to assist in L2 production. Taken together, these studies suggest that adding visual articulatory information to 3D tutors can lead to improvements for producing certain language contrasts. However, more work is needed to establish the effectiveness, consistency, and strength of these techniques.

At the neurophysiological level, AV speech processing can be related to the issue of whether speech perception and production are supported by a joint action-observation matching system. Such a system has been related to "mirror" neurons originally described in the macaque brain [for reviews see (Rizzolatti and Craighero, 2004; Pulvermüller and Fadiga, 2010; Rizzolatti et al., 2014); although see (Hickok, 2009, 2010) for an opposing view]. Mirror neurons are thought to fire both during goal-directed actions and while watching a similar action made by another individual. Research has extended this finding to audiovisual systems in monkeys (Kohler et al., 2002) and speech processing in humans (e.g., Rizzolatti and Arbib, 1998; Arbib, 2005; Gentilucci and Corballis, 2006).

In support of this view, studies have linked auditory and/or visual speech perception with increased activity in brain areas involved in motor speech planning, execution, and proprioceptive control of the mouth (e.g., Möttönen et al., 2004; Wilson et al., 2004; Ojanen et al., 2005; Skipper et al., 2005, 2006, 2007a,b; Pekkola et al., 2006; Pulvermüller et al., 2006; Wilson and Iacoboni, 2006; Zaehle et al., 2008). Similarly, magnetoencephalography (MEG) studies have linked speech production with activity in brain areas specialized for auditory and/or visual speech perception processes (e.g., Curio et al., 2000; Gunji et al., 2001; Houde et al., 2002; Heinks-Maldonado et al., 2006; Tian and Poeppel, 2010). While auditory activation during speech production is expected (because acoustic input is normally present), Tian and Poeppel's (2010) study shows auditory cortex activation in the absence of auditory input. This suggests that an imaginary motor speech task can nevertheless generate forward predictions via an auditory efference copy.

Overall, these neurophysiological findings suggest a brain basis for the learning of speech motor patterns via visual input, which in turn would strengthen the multimodal speech representations in feedforward models. In everyday situations, visual articulatory input would normally be lip information only. However, instrumental methods of transducing tongue motion (e.g., magnetometry, ultrasound, MRI) raise the possibility that visual tongue information may also play a role.

Neurocomputational models of speech production provide a potentially useful framework for understanding the intricacies of AV speech processing. These models seek to provide an integrated explanation for speech processing, incorporated in testable artificial neural networks. Two prominent models include "Directions Into Velocities of Articulators" (DIVA) (Guenther and Perkell, 2004; Guenther, 2006; Guenther et al., 2006; Guenther and Vladusich, 2012) and "ACTion" (ACT) (Kröger et al., 2009). These models assume as input an abstract speech sound unit (a phoneme, syllable, or word), and generate as output both articulatory and auditory representations of speech. The systems operate by computing neural layers (or "maps") as distributed activation patterns. Production of an utterance involves fine-tuning between speech sound maps, sensory maps, and motor maps, guided by feedforward (predictive) processes and concurrent feedback from the periphery. Learning in these models critically relies on forward and inverse processes, with the internal speech model iteratively strengthened by the interaction of feedback information.

Researchers have used neurocomputational frameworks to gain important insights about speech and language disorders, including apraxia of speech (AOS) in adults (Jacks, 2008; Maas et al., 2015), childhood apraxia (Terband et al., 2009; Terband and Maassen, 2010), developmental speech sound disorders (Terband et al., 2014a,b), and stuttering (Max et al., 2004; Civier et al., 2010). For example, DIVA simulations have been used to test the claim that apraxic disorders result from relatively preserved feedback (and impaired feedforward) speech motor processes (Civier et al., 2010; see also Maas et al., 2015). These neurocomputational modeling-based findings correspond with largely positive results from visual augmented feedback intervention studies for individuals with AOS (see Katz and McNeil, 2010 for review; also, Preston and Leaman, 2014). Overall, these intervention findings have suggested that visual augmented feedback of tongue movement can help remediate speech errors in individuals with AOS, presumably by strengthening the internal model. Other clinical studies have reported that visual feedback can positively influence the speech of children and adults with a variety of speech and language problems, including articulation/phonological disorders, residual sound errors, and dysarthria. This research has included training with electropalatography (EPG) (Hardcastle et al., 1991; Dagenais, 1995; Goozee et al., 1999; Hartelius et al., 2005; Nordberg et al., 2011), ultrasound (Bernhardt et al., 2005; Preston et al., 2014) and strain gauge transducer systems (Shirahige et al., 2012; Yano et al., 2015).

Visual feedback training has also been used to study information processing during second language (L2) learning. For example, Levitt and Katz (2008) examined augmented visual feedback in the production of a non-native consonant sound. Two groups of adult monolingual American English speakers were trained to produce the Japanese post-alveolar flap /ɽ/. One group received traditional second language instruction alone and the other group received traditional second language instruction plus visual feedback for tongue movement provided by a 2D EMA system (Carstens AG100, Carstens Medizinelektronik GmbH, Bovenden, Germany, www.articulograph.de). The data were perceptually rated by monolingual Japanese native listeners and were also analyzed acoustically for flap consonant duration. The results indicated improved acquisition and maintenance by the participants who received traditional instruction plus EMA training. These findings suggest that visual information regarding consonant place of articulation can assist second language learners with accent reduction.

In another recent study, Suemitsu et al. (2013) tested a 2D EMA-based articulatory feedback approach to facilitate production of an unfamiliar English vowel (/æ/) by five native speakers of Japanese. Learner-specific vowel positions were computed for each participant and provided as feedback in the form of a multiple-sensor, mid-sagittal display. Acoustic analysis of subjects' productions indicated that acoustic and articulatory training resulted in significantly improved /æ/ productions. The results suggest feasibility and applicability to vowel production, although additional research will be needed to determine the separable roles of acoustic and articulatory feedback in this version of EMA training.

Recent research has shown that 3D articulography systems afford several advantages over 2D systems: recording in x/y/z dimensions (and two angles), increased accuracy, and the ability to track movement from multiple articulators placed at positions other than tongue midline (Berry, 2011; Kroos, 2012; Stella et al., 2013). As such, visual augmented feedback provided by these systems may offer new insights on information processing during speech production. A preliminary test of a 3D EMA-based articulatory feedback system was conducted by Katz et al. (2014). Monolingual English speakers were asked to produce several series of four CV syllables. Each series contained four different places of articulation, one of which was an alveolar (e.g., bilabial, velar, alveolar, palatal; such as /pa/-/ka/-/ta/-/ja/). A 1-cm target sphere was placed at each participant's alveolar region. Four of the five participants attempted the series with no visible feedback. The fifth subject was given articulatory visual feedback of their tongue movement and requested to "hit the target" during their series production. The results showed that subjects in the no-feedback condition ranged between 50 and 80% accuracy, while the subject given feedback showed 90% accuracy. These preliminary findings suggested that the 3D EMA system could successfully track lingual movement for consonant feedback purposes, and that feedback could be used by talkers to improve consonantal place of articulation during speech.

A more stringent test of whether 3D visual feedback can modify speech production would involve examining how individuals perform when they must achieve an unfamiliar articulatory target, such as a foreign speech sound. Therefore, in the present experiment we investigated the accuracy with which healthy monolingual talkers could produce a novel, non-English, speech sound (articulated by placing the tongue blade at the palatal region of the oral cavity) and whether this gesture could benefit from short-term articulatory training with visual feedback.

### MATERIALS AND METHODS

This study was conducted in accordance with the Department of Health and Human Services regulations for the protection of human research subjects, with written informed consent received from all subjects prior to the experiment. The protocol for this research was approved by the Institutional Review Board at the University of Texas at Dallas. Consent was obtained from all subjects appearing in audio, video, or figure content included in this article.

#### Participants and Stimuli

Five college-age subjects (three male, two female) with General American English (GAE) accents participated in this study. All talkers were native speakers of English with no speech, hearing, or language disorders. Three participants had elementary speaking proficiency with a foreign language (M03, F02: Spanish; F01: French). Participants were trained to produce a novel consonant in the /ACA/ context while an electromagnetic articulograph system recorded lingual movement. For this task, we selected a speech sound not attested as a phoneme among the world's languages: a voiced, coronal, palatal stop. Unlike palatal stops produced with the tongue body, found in languages such as Czech (/c/ and /ɟ/), subjects were asked to produce a closure with the tongue anterior (tip/blade) contacting the hard palate. This sound is similar to a voiced retroflex alveolar /ɖ/, but is articulated in the palatal, not immediately post-alveolar region. As such, it may be represented in the IPA as a backed, voiced retroflex stop: /ɖ̠/. Attested cases appear rarely in the world's languages and only as allophones. For instance, Dart (1991) notes some speakers of O'odham (Papago) produce voiced palatal sounds with (coronal) laminal articulation, instead of the more usual tongue body articulation (see Supplementary Materials for a sample sound file used in the present experiment).

Stimuli were elicited in blocks of 10 /ACA/ production attempts under a single-subject ABA design. Initially, the experimental protocol called for three pre-training, three training, and three post-training blocks from each subject (for a total of 90 productions). However, because data for this study were collected as part of a larger investigation of stop consonant productions, there was some subject attrition and reduced participation for the current experiment. Thus, the criterion for completion of the experiment was changed to a minimum of one block of baseline (no feedback) probes, 2–3 blocks of visual feedback training, and 1–3 blocks of post-feedback probes, for a total of 40–80 productions from each participant. All trials were conducted within a single experimental session lasting approximately 15 min.

#### Procedure

Training sessions were conducted in a quiet testing room at the University of Texas at Dallas. Each participant was seated next to the Wave system, facing a computer monitor located approximately 1 m away. Five sensors were glued to the subject's tongue using a biocompatible adhesive: one each at tongue tip (∼1 cm posterior to the apex), tongue middle (∼3 cm posterior to apex), tongue back (∼4 cm posterior to the apex), and both left and right tongue lateral positions. Sensors were also attached to a pair of glasses worn by the subject to establish a frame of reference for head movement. A single sensor was taped on the center of the chin to track jaw movement.

#### Visual Feedback Apparatus

External visual feedback for lingual movement was provided to subjects using a 3D EMA-based system (Opti-Speech, Vulintus LLC, Sachse, Texas, United States, http://www.vulintus.com/). This system works by tracking speech movement with a magnetometer (Wave, Northern Digital Incorporated, Waterloo, Ontario, Canada). An interface allows users to view their current tongue position (represented by an image consisting of flesh-point markers and a modeled tongue surface) within a transparent head with a moving jaw. Small blue spheres mark different regions on the animated tongue (tongue tip, tongue middle, tongue back, or tongue left/right lateral). Users may adjust the visibility of these individual markers and/or select or deselect "active" markers for speech training purposes. Articulatory targets, shown on the screen as semi-transparent red or orange spheres, can be placed by the user in the virtual oral cavity. The targets change color to green when the active marker enters, indicating correct tongue position, thus providing immediate visual feedback for place of articulation (see Katz et al., 2014 for more information). The target size and "hold time on target" can be varied by the user to make the target matching task easier or harder. An illustration of the system is shown in **Figure 1**.
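The target-matching logic described above can be illustrated with a minimal sketch. This is not the Opti-Speech implementation; the names `Target`, `inside`, and `first_hit`, the millimetre coordinates, and the expression of hold time as a count of consecutive samples are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Target:
    """Hypothetical spherical target in the virtual oral cavity."""
    center: tuple   # (x, y, z) position, here in mm (assumed units)
    radius: float   # sphere radius in the same units


def inside(target, point):
    """True when a marker position lies within the target sphere."""
    return math.dist(target.center, point) <= target.radius


def first_hit(target, trajectory, hold_samples=1):
    """Index of the sample at which the active marker has stayed inside
    the target for `hold_samples` consecutive samples, or None.
    hold_samples=1 corresponds to no hold time."""
    run = 0
    for i, point in enumerate(trajectory):
        run = run + 1 if inside(target, point) else 0
        if run >= hold_samples:
            return i
    return None


def block_accuracy(target, attempts, hold_samples=1):
    """Percentage of attempts (marker trajectories) that register a hit."""
    hits = sum(first_hit(target, a, hold_samples) is not None for a in attempts)
    return 100.0 * hits / len(attempts)
```

Raising `hold_samples` or shrinking `radius` makes the task harder, which mirrors the adjustable target size and hold time mentioned in the text.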

#### Pronunciation Training

The backed palatal stop consonant /ɖ̠/ is produced by making a closure between the tongue tip and hard palate. Therefore, the tongue tip marker was designated as the active marker for this study. A single target was placed at the palatal place of articulation to indicate where the point of maximum constriction should occur during the production of /ɖ̠/. To help set the target, participants were requested to press their tongue to the roof of their mouth, allowing the tongue sensors to conform to the contours of the palate. The experimenter then placed the virtual target at the location of the tongue middle sensor, which was estimated to correspond to the palatal (typically, prepalatal) region. Based on previous work (Katz et al., 2014), we selected a target sphere of 1.00 cm in volume, with no hold time.

FIGURE 1 | Illustration of the Opti-Speech system, with subject wearing sensors and head-orientation glasses (lower right insert). A sample target sphere, placed in this example at the subject's alveolar ridge, is shown in red. A blue marker indicates the tongue tip/blade (TT) sensor.

The current experiment was conducted as part of a larger study investigating stop consonant production that employed visual feedback for training purposes. As such, by the start of the experiment each participant had received an opportunity to accommodate to the presence of the Wave sensors on the tongue and to practice speaking English syllables and words under visual feedback conditions for approximately 25–30 min. In order to keep practice conditions uniform in the actual experiment, none of these warmup tasks involved producing a novel, non-English sound.

For the present experiment, participants were trained to produce the voiced, coronal, palatal stop, /ɖ̠/. The investigator (SM) described the sound to subjects as "sound[ing] like a 'd,' but produced further back in the mouth." A more precise articulatory explanation was also provided, instructing participants to feel along the top of their mouth from front to back to help identify the alveolar ridge. Participants were then told to "place the tip of [their] tongue behind the alveolar ridge and slide it backwards to meet with the roof, or palate, of the mouth." The investigator, a graduate student with a background in phonetics instruction, produced three repetitions of /Aɖ̠A/ (live) for participants to imitate. Each participant was allowed to practice making the novel consonantal sound 3–5 times before beginning the no-feedback trial sessions. This practice schedule was devised based on pilot data suggesting 3–5 practice attempts were sufficient for participants to combine the articulatory, modeled, and feedback information to produce a series of successive "best attempts" at the novel sound. Throughout the training procedure, the investigator provided generally encouraging comments. In addition, if an attempt was judged perceptually to be off-target (e.g., closer to an English /d/ or the palatalized alveolar stop, /dʲ/), the investigator pointed out the error and repeated the (articulatory) instructions.

When the participant indicated that he/she understood all of the instructions, pre-training (baseline) trials began. After each block of attempts, participants were given general feedback about their performance and the instructions were reiterated if necessary. Once all pre-training sessions were completed, the participant was informed that the Opti-Speech visual feedback system would now be used to help them track their tongue movement. Subjects were instructed to use the tongue model as a guide for producing the palatal sound by moving the tongue tip upwards and backwards until the tongue tip marker entered the palatal region and the target lit up green, indicating success (see **Figure 2**). Each participant was allowed three practice attempts at producing the novel consonant while simultaneously watching the tongue model and aiming for the virtual target.

After completing the training sessions, the subject was asked to once again attempt to produce the sound with the visual feedback removed. No practice attempts were allowed between the training and post-training trial sessions. During all trials, the system recorded the talker's kinematic data, including a record of target hits (i.e., accuracy of the tongue-tip sensor entering the subject's palatal zone). The experiments were also audio- and video-recorded.

FIGURE 2 | Close-up of tongue avatar during a "hit" for the production of the voiced, retroflex, palatal stop consonant. The target sphere lights up green, providing visual feedback for the correct place of articulation.

### RESULTS

#### Kinematic Results

All participants completed the speaking task without noticeable difficulty. Speakers' accuracy in achieving the correct articulation was measured as the number of hit targets out of the number of attempts in each block. Talker performance is summarized in **Figure 3**, which shows accuracy at the baseline (pre-training), visual feedback (shaded), and post-feedback (post-training) probes.

All talkers performed relatively poorly at the baseline phase, with accuracy ranging from 0 to 50% (x̄ = 12.6%, sd = 14.1%). Each participant showed a rapid increase in accuracy during the visual feedback phase (shaded), ranging from 50 to 100% (x̄ = 74.9%, sd = 15.6%). These gains appeared to be maintained during the post-feedback probes, with scores ranging from 70 to 100% (x̄ = 85.3%, sd = 12.8%). Group patterns were examined using two-way paired t-tests. The results indicated a significant difference between pre-training and training phases, t(4) = 8.73, p < 0.001, and between pre-training and post-training phases, t(4) = 14.0, p < 0.001. No significant difference was found between training and post-training, t(4) = 1.66, ns. This pattern suggests acquisition during the training phase, and maintenance of learned behavior immediately post-training.
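For readers who wish to see how such comparisons are computed, the following is a minimal sketch of a paired t-test for five subjects. The per-subject accuracy values in the usage note are hypothetical (individual scores are not listed in the text), and the function names are illustrative. The p-value helper uses the closed-form Student's t distribution for 4 degrees of freedom and assumes a two-tailed test.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for matched scores."""
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def two_tailed_p_df4(t):
    """Two-tailed p-value for a t statistic with exactly 4 degrees of
    freedom, using the closed-form CDF of Student's t for df = 4."""
    t = abs(t)
    u = t * t / (1.0 + t * t / 4.0)
    cdf = 0.5 + 0.375 * (t / math.sqrt(1.0 + t * t / 4.0)) * (1.0 - u / 12.0)
    return 2.0 * (1.0 - cdf)


# Checking the reported statistics against the df = 4 distribution
for t_reported in (8.73, 14.0, 1.66):
    print(t_reported, two_tailed_p_df4(t_reported))
```

With hypothetical baseline scores `[0, 10, 20, 10, 20]` and post-training scores `[70, 80, 100, 80, 90]`, `paired_t` returns a t statistic with 4 degrees of freedom that can be passed directly to `two_tailed_p_df4`.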

An effect size for each subject was computed using the Percentage of Non-overlapping Data (PND) method described by Scruggs et al. (1987). This non-parametric analysis compares points of non-overlap between baseline and successive intervention phases, and criteria are suggested for interpretation (Scruggs et al., 1986). Using this metric, all of the subjects' patterns were found to be greater than 90% (highly effective) for comparisons of both pre-training vs. training, and pre-training vs. post-training.
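As a brief illustration of the PND metric, the sketch below counts the intervention-phase data points that exceed the highest baseline point, which is the usual formulation when the target behavior is expected to increase. The function name and the example scores are hypothetical and are not taken from the present data.

```python
def pnd(baseline, intervention):
    """Percentage of Non-overlapping Data for a behavior expected to
    increase: share of intervention points above the best baseline point."""
    ceiling = max(baseline)
    non_overlapping = sum(1 for value in intervention if value > ceiling)
    return 100.0 * non_overlapping / len(intervention)


# Hypothetical block accuracies (%) for one talker
baseline_blocks = [10]
training_blocks = [70, 80, 100]
print(pnd(baseline_blocks, training_blocks))
```

Because PND depends only on the single most extreme baseline point, it is sensitive to baseline outliers, which is worth keeping in mind when only one baseline block is available.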

#### Acoustic Results

In order to corroborate training effects, we sought acoustic evidence of coronal (tongue blade) palatal stop integrity. This second analysis investigated whether the observed improvement in talkers' articulatory precision resulting from training would be reflected in patterns of the consonant burst spectra. Short-term spectral analyses were obtained at the moment of burst release (Stevens and Blumstein, 1975, 1978). Although burst spectra may vary considerably from speaker to speaker, certain general patterns may be noted. Coronals generally have energy distributed across the whole spectrum (with at least two peaks between 1.2 and 3.6 kHz), a pattern termed "diffuse" in the feature system of Jakobson et al. (1952). Also, coronals typically result in relatively higher-frequency spectral components than articulations produced by the lips or the tongue body, and these spectra are therefore described as being "acute" (Jakobson et al., 1952; Hamann, 2003) or "diffuse-rising" (Stevens and Blumstein, 1978).

Burst frequencies vary as a function of the length of the vocal tract anterior to the constriction. Thus, alveolar constriction results in a relatively high burst, ranging from approximately 2.5 to 4.5 kHz (e.g., Reetz and Jongman, 2009), while velar stops, having a longer vocal tract anterior to the constriction, produce lower burst frequencies (ranging from approximately 1.5 to 2.5 kHz). Since palatal stops are produced with a constriction located between the alveolar and velar regions, palatal stop bursts may be expected to have regions of spectral prominence between the two ranges, in the 3.0–5.0 kHz span. Acoustic analyses of Czech or Hungarian velar and palatal stops generally support this view. For instance, Keating and Lahiri (1993) note that the Hungarian palatal stop /ca/ spectrum slopes up to its highest peak "at 3.0–4.0 kHz or even higher," but otherwise shows "a few peaks of similar amplitude which together dominate the spectrum in a single broad region" (p. 97). A study by Dart (1991) obtained palatographic and spectral data for O'odham (Papago) voiced palatal sounds produced with laminal articulation. Analysis of the burst spectra for these (O'odham) productions revealed mostly diffuse rising spectra, with some talkers showing "a high amplitude peak around 3.0–5.0 kHz" (p. 142).
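The inverse relation between front-cavity length and burst frequency can be illustrated with a common textbook idealization that is not part of the present analysis: treating the cavity in front of the constriction as a uniform tube closed at the constriction and open at the lips, whose lowest resonance is f = c / (4L). The speed of sound (35,000 cm/s) and the cavity lengths below are assumed values chosen only to show the trend.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, moist air


def front_cavity_resonance_hz(length_cm):
    """Lowest resonance of an idealized quarter-wavelength tube:
    f = c / (4 * L), with L the length anterior to the constriction."""
    return SPEED_OF_SOUND_CM_S / (4.0 * length_cm)


# Hypothetical front-cavity lengths (cm); longer cavities give lower bursts
for length in (2.0, 2.5, 3.5, 5.0):
    print(length, front_cavity_resonance_hz(length))
```

Real vocal tracts depart from this idealization (lip rounding, tongue shape, and coupling to the back cavity all shift the resonances), so the sketch indicates direction of change rather than predicting specific burst values.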

For the present experiment, three predictions were made: (1) palatal stop consonant bursts prior to training will have diffuse rising spectra with characteristic peaks in the 3.0–5.0 kHz range; (2) following training, these spectral peaks will shift downwards, reflecting a more posterior constriction (e.g., from an alveolar toward a palatal place of articulation); and (3) post-training token-to-token variability will be lower than at baseline, reflecting increased articulatory ability.

#### Spectral Analysis

Talkers' consonantal productions were digitized and analyzed using PRAAT (Boersma and Weenink, 2001) with a scripting procedure using linear predictive coding (LPC) analysis. A cursor was placed at the beginning of the consonant burst of each syllable and a 12 ms Kaiser window was centered over the stop transient. Autocorrelation-based LPC (24 pole model, +6 dB pre-emphasis) yielded spectral sections. Overlapping plots of subjects' repeat utterances were obtained for visual inspection, with spectral peaks recorded for analysis.
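The general shape of such an analysis can be sketched in plain Python as follows. This is a generic autocorrelation-LPC illustration, not the PRAAT script used in the study: the pre-emphasis coefficient (0.97), the Kaiser shape parameter (`beta=8.0`), the 50 Hz search grid, and all function names are assumptions.

```python
import cmath
import math


def bessel_i0(x):
    """Modified Bessel function I0 via its power series."""
    term, total, k = 1.0, 1.0, 1
    while term > 1e-12 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total


def kaiser_window(n, beta):
    """Kaiser window of n >= 2 samples."""
    denom = bessel_i0(beta)
    window = []
    for i in range(n):
        r = 2.0 * i / (n - 1) - 1.0
        window.append(bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / denom)
    return window


def pre_emphasize(x, coeff=0.97):
    """First-order high-pass filter (roughly +6 dB per octave)."""
    return [x[0]] + [x[i] - coeff * x[i - 1] for i in range(1, len(x))]


def autocorrelation(x, max_lag):
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... and the residual energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def lpc_spectrum_db(a, err, freqs_hz, rate_hz):
    """LPC envelope in dB at the requested frequencies."""
    out = []
    for f in freqs_hz:
        w = 2.0 * math.pi * f / rate_hz
        denom = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        out.append(10.0 * math.log10(err / abs(denom) ** 2))
    return out


def burst_spectrum_peak(samples, rate_hz, order=24, beta=8.0):
    """Frequency (Hz) of the highest LPC peak in a short burst segment."""
    x = pre_emphasize(samples)
    x = [s * w for s, w in zip(x, kaiser_window(len(x), beta))]
    a, err = levinson_durbin(autocorrelation(x, order), order)
    freqs = list(range(500, int(rate_hz / 2), 50))
    db = lpc_spectrum_db(a, err, freqs, rate_hz)
    return freqs[db.index(max(db))]
```

In use, `samples` would be the roughly 12 ms of audio centered on the stop transient. Returning only the single highest peak is a simplification; the analysis described above recorded multiple peaks per token.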

**Figure 4** shows overlapping plots of spectra obtained pre- and post-EMA training for 4/5 talkers. Plots containing (RMS) averages for pre-training (incorrect) and post-training (correct) spectra are also shown, for comparison. Spectra for talker F01 could not be compared because this talker's initial productions were realized as CV syllables (instead of VCV), and differing vowel context is known to greatly affect burst consonant spectral characteristics (Stevens, 2008).

Results revealed mixed support for the experimental predictions. Similar to previous reports (e.g., Dart, 1991), there were considerable differences in the shapes of the burst spectral patterns from talker to talker. Three of the four talkers' spectra (M01, M02, and M03) were diffuse, having at least two peaks between 1.2 and 3.6 kHz, while the spectra of talker F02 had peaks in a mid-frequency ("compact") range of 2.0–3.0 kHz. Patterns of spectral tilt for all speakers were generally falling (instead of rising, as expected).

The prediction that 3.0–5.0 kHz spectral peak frequencies would lower following training was not uniformly obtained. Because standard deviations were relatively high and there was much inter-talker variability, the data are summarized, rather than tested statistically.

Talker M01's data had six peaks pre-training (x̄ = 3967 Hz; sd = 596 Hz) and five peaks post-training (x̄ = 4575 Hz; sd = 281 Hz). Talker M02's productions yielded five peaks pre-training (x̄ = 3846 Hz; sd = 473 Hz) and nine peaks post-training (x̄ = 3620 Hz; sd = 265 Hz). Talker M03 had six peaks pre-training (x̄ = 4495 Hz; sd = 353 Hz) and nine peaks post-training (x̄ = 3687 Hz; sd = 226 Hz). The spectra of talker F02 had peaks in a mid-frequency ("compact") range of approximately 2.0–3.0 kHz. This talker's spectral peak values did not shift with training (pre-training: x̄ = 2359 Hz, sd = 139 Hz; post-training: x̄ = 2390 Hz, sd = 194 Hz). In summary, talkers M03 and M02 showed the expected pattern of spectral peak lowering, F02 showed no training-dependent changes, and M01 showed a pattern in the opposite direction.

Of the talkers with spectra data available, three (M01, M02, and M03) showed marked reduction in variability (i.e., reduced standard deviation values) from pre-training to post-training, suggesting that training corresponded with increased production consistency. However, this was not the case for talker F02, whose mid-range spectral peaks showed a slight increase in variability after training.
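The direction of these changes can be tabulated directly from the peak means and standard deviations reported above. The sketch below is only a bookkeeping aid: the values for M01, M02, and M03 are assumed to be in Hz (as stated explicitly for F02), and the dictionary layout and function name are illustrative.

```python
# Reported (mean, sd) of burst spectral peaks, pre- and post-training
REPORTED_PEAKS = {
    "M01": {"pre": (3967, 596), "post": (4575, 281)},
    "M02": {"pre": (3846, 473), "post": (3620, 265)},
    "M03": {"pre": (4495, 353), "post": (3687, 226)},
    "F02": {"pre": (2359, 139), "post": (2390, 194)},
}


def peak_changes(data):
    """Post-minus-pre change in mean peak frequency and in sd, per talker."""
    return {
        talker: (v["post"][0] - v["pre"][0], v["post"][1] - v["pre"][1])
        for talker, v in data.items()
    }


for talker, (d_mean, d_sd) in peak_changes(REPORTED_PEAKS).items():
    print(f"{talker}: mean shift {d_mean:+d} Hz, sd change {d_sd:+d} Hz")
```

Negative mean shifts correspond to the predicted lowering of spectral peaks, and negative sd changes correspond to reduced token-to-token variability.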

### DISCUSSION

Five English-speaking subjects learned a novel consonant (a voiced, coronal, and palatal stop) following a brief training technique involving visual augmented feedback of tongue movement. The results of kinematic analyses indicate that real-time visual (articulatory) feedback resulted in improved accuracy of consonant place of articulation. Articulatory feedback training for place of articulation corresponded with a rapid increase in the accuracy of tongue tip spatial positioning, and post-training probes indicated (short-term) retention of learned skills.

Acoustic data for talkers' burst spectra obtained pre- and post-training only partially confirmed the kinematic findings, and there were a number of differences noted from predictions. First, for those talkers who showed diffuse spectra (e.g., with two peaks between 1.2 and 3.6 kHz), the spectra were falling, instead of rising. This may have been due to a number of possible factors, including the current choice of a Kaiser window for spectral analysis. Some of the original studies, such as those which first noted the classic "diffuse rising" patterns in spectral slices, fitted half-Hamming windows over the burst to obtain optimum preemphasis for LPC analysis (e.g., Stevens and Blumstein, 1978). Second, talker F02 showed mid-range ("compact") spectral peaks ranging between 2.0 and 3.0 kHz. This may be due to tongue shape, which can affect the spectral characteristics of the stop burst. For example, laminal (tongue blade) articulation results in relatively even spectral spread, while apical (tongue-tip) articulation results in strong mid-frequency peaks (Ladefoged and Maddieson, 1996) and less spread (Fant, 1973). In the present data, the spectra of talker F02 fit the pattern of a more apical production.

Despite individual differences, there was some evidence supporting the notion of training effects in the acoustic data. Chiefly, the three subjects with diffuse spectra (M01, M02, and M03) showed decreased variability (lowered standard deviations) following training, suggesting stabilized articulatory behavior. Although the current data are few, they suggest that burst spectra variability may be a useful metric to be explored in future studies.

It was predicted that spectral peaks in the 3.0–5.0 kHz range would lower in frequency as talkers improved their place of articulation, with training. However, the findings do not generally support this prediction: Talker M03 showed this pattern, M02 showed a trend, F02 showed no differences, and M01 trended in the opposite direction, with higher spectral peaks after training. Since the kinematic data establish that all talkers significantly increased tongue placement accuracy post-training, we speculate that several factors affecting burst spectra (e.g., tongue shape, background noise, or room acoustics) may have obscured any such underlying spectral shifts for the talkers. Future research should examine how burst spectra may be best used to evaluate outcomes in speech training studies.

The current kinematic data replicate and extend the findings of Ouni (2013) who found that talkers produced tongue body gestures more accurately after being exposed to a short training session of real-time ultrasound feedback (post-test) than when recorded at baseline (pre-test). The present results are also consistent with earlier work from our laboratory which found that monolingual English speakers showed faster and more effective learning of the Japanese post-alveolar flap, /ɽ/, using EMA-based visual feedback, when compared with traditional Japanese pronunciation instruction (Levitt and Katz, 2008). Taken together with the experimental data from this study, there is evidence that EMA-provided articulatory visual feedback may provide a means for helping L2 learners improve novel consonant distinctions.

However, a number of caveats must be considered. First, the current data are limited and the study should therefore be considered preliminary. The number of subjects tested was small (n = 5). Also, since the consonant trained, /ɖ̠/, is not a phoneme in any of the world's languages, it was not possible to include perceptual data, such as native listener judgments (e.g., Levitt and Katz, 2010). Additional data obtained from more talkers will therefore be required before any firm conclusions can be drawn concerning the relation to natural language pronunciation.

Second, real-time (live) examples were given to subjects by the experimenter (SM) during the training phase, allowing for the possibility of experimenter bias. This procedure was adopted to simulate a typical second-language instruction setting, and care was taken to produce consistent examples, so as to not introduce "unfair" variability at the start of the experiment. Nevertheless, in retrospect it would have been optimal to have included a condition in which talkers were trained with prerecorded examples, to eliminate this potential bias.

Third, since articulatory training is assumed to draw on principles of motor learning, several experimental factors must be controlled before it is possible to conclude that a given intervention is optimal for a skill being acquired, generalized, or maintained (e.g., Maas et al., 2008; Bislick et al., 2012; Schmidt and Lee, 2013; Sigrist et al., 2013). For example, Ballard et al. (2012) conducted a study in which a group of English talkers was taught the Russian trilled /r/ sound using an EPG-based visual feedback system. In a short-term (five session) learning paradigm, subjects practiced in conditions either with continuous visual feedback provided by an EPG system, or were given no visual feedback. The results suggested that providing kinematic feedback continually throughout treatment corresponded with lower skill retention. This finding suggests that speech training follows the principle that kinematic feedback is most beneficial in the early phases of training, but may interfere with long-term retention if provided throughout training (Swinnen et al., 1993; Hodges and Franks, 2001; Schmidt and Lee, 2013). A pattern in the current data also potentially supports this principle. Three of the five participants (M01, M03, and F02) reached their maximum performance in the post-training phase, immediately after the feedback was removed. While this pattern was not statistically significant, it may suggest some interference effects from the ongoing feedback used. Future research should examine factors such as feedback type and frequency in order to improve speech sound learning.

The current findings support the notion of a visual feedback pathway during speech processing, as proposed in the ACT neurocomputational model of speech production (Kröger and Kannampuzha, 2008). Similar to the DIVA model, ACT relies on feedforward and feedback pathways between distributed neural activation patterns, or maps. ACT includes explicit provisions for separate visual and auditory information processing. In **Figure 5**, we present a simplified model of ACT (adapted from Kröger et al., 2009) with (optional) modifications added to highlight pathways for external and internal audiovisual input. Since people do not ordinarily rely on visual feedback of tongue movement, these modifications explain how people learn under conditions of augmented feedback, rather than serving as key components of everyday speech.

The external input route (dotted circle on the right) indicates an outside speech source, including speech that is produced while hearing/observing human talkers or a computerized training agent (e.g., BALDI, ARTUR, ATH, or Vivian). The input audio and visual data are received, preprocessed, and relayed as input to respective unimodal maps. These maps yield output to a multimodal phonetic map that also receives (as input) information from a somatosensory map and from a phonemic map. Reciprocal feedback connections between the phonetic map, visual-phonetic processing, and auditory-phonetic processing modules can account for training effects from computerized training avatars. These pathways would presumably also be involved in AV model-learning behavior, including lip-reading abilities (see Bernstein and Liebenthal, 2014 for review) and compensatory tendencies noted in individuals with left-hemisphere brain damage, who appear to benefit from visual entrainment to talking mouths other than their own (Fridriksson et al., 2012).

In the (internal) visual feedback route (dotted arrows), a talker's own speech articulation is observed during production. This may include simple mirroring of the lips and jaw, or instrumentally augmented visualizations of the tongue (via EMA, ultrasound, MRI, or articulatory inversion systems that convert sound signals to visual images of the articulators; e.g., Hueber et al., 2012). The remaining audio and visual preprocessing and mapping stages are similar between this internal route and the external (modeled) pathways. The present findings of improved consonantal place of articulation under conditions of visual (self) feedback training support this internal route and the role of body sense/motor familiarity. This internal route may also play a role in explaining a number of other phenomena described in the literature, including the fact that talkers can discern between natural and unnatural tongue movements displayed by an avatar (Engwall and Wik, 2009), and that training systems based on a talker's own speech may be especially beneficial for L2 learners (see Felps et al., 2009 for discussion).

The actual neurophysiological mechanisms underlying AV learning and feedback are currently being investigated. Recent work on oral somatosensory awareness suggests people have a unified "mouth image" that may be qualitatively different from other parts of the body (Haggard and de Boer, 2014). Since visual feedback does not ordinarily play a role in mouth experiences, other attributes, such as self-touch, may play a heightened role. For instance, Engelen et al. (2002) note that subjects can achieve high accuracy in determining the size of ball-bearings placed in the mouth, but show reduced performance when fitted with a plastic palate. This suggests that relative movement of an object between tongue and palate is important in oral size perception. We speculate that visual feedback systems rely in part on oral self-touch mechanisms (particularly for consonant production), by visually guiding participants to the correct place of articulation, at which point somatosensory processes take over. This mechanism may prove particularly important for consonants, as opposed to vowels, which are produced with less articulatory contact.

Providing real-time motor feedback may engage different cortical pathways than are recruited in learning systems that employ more traditional methodologies. For example, Farrer et al. (2003) conducted positron emission tomography (PET) experiments in which subjects controlled a virtual hand on a screen under conditions ranging from full control, to partial control, to a condition where another person controlled the hand and there was no control. The results showed right inferior parietal lobule activation when subjects felt least in control of the hand, with reverse covariation in the insula. A crucial aspect here is corporeal identity, the feeling of one's own body, in order to determine motor behavior in the environment. Data suggest that body awareness is supported by a large network of neurological structures including parietal and insular cortex, with primary and secondary somatosensory cortex, insula, and posterior parietal cortex playing specific roles (see Daprati et al., 2010 for review). A region of particular interest is the right inferior parietal lobule (IPL), often associated with own-body perception and other body discrimination (Berlucchi and Aglioti, 1997; Farrer et al., 2003; Uddin et al., 2006). Additional neural structures that likely play a role in augmented feedback training systems include those associated with reward dependence during behavioral performance, including lateral prefrontal cortex (Pochon et al., 2002; Liu et al., 2011; Dayan et al., 2014). As behavioral data accrue with respect to both external (mirroring) and internal ("tongue reading") visual speech feedback, it will be important to also describe the relevant neural control structures, in order to best develop more complete models of speech production.

In summary, we have presented small-scale but promising results from an EMA-based feedback investigation suggesting that augmented visual information concerning one's own tongue movements boosts skill acquisition during the learning of consonant place of articulation. Taken together with other recent data (e.g., Levitt and Katz, 2010; Ouni, 2013; Suemitsu et al., 2013) the results may have potentially important implications for models of speech production. Specifically, distinct AV learning mechanisms (and likely, underlying neural substrates) appear to be engaged for different types of CAPT systems, with interactive, on-line, eye-to-tongue coordination involved in systems such as Opti-Speech (and perhaps Vizart3D, Hueber et al., 2012) being arguably different from processing involved in using external avatar trainers, such as ARTUR, BALDI, ATH, or Vivian. These different processing routes may be important when interpreting other data, such as the results of real-time, discordant, crossmodal feedback (e.g., McGurk effect). Future studies should focus on extending the range of speech sounds, features, and articulatory structures trained with real-time feedback, with a focus on vowels as well as consonants (see Mehta and Katz, 2015). As findings are strengthened with designs that systematically test motor training principles, the results may open new avenues for understanding how AV information is used in speech processing.

# AUTHOR CONTRIBUTIONS

WK and SM designed the experiments. SM recruited the participants and collected the data. WK and SM performed the kinematic analysis. WK conducted the spectral analysis. WK and SM wrote the manuscript.

# ACKNOWLEDGMENTS

The authors gratefully acknowledge support from the University of Texas at Dallas Office of Sponsored Projects, the UTD Callier Center Excellence in Education Fund, and a grant awarded by NIH/NIDCD (R43 DC013467). We thank the participants for volunteering their time and Carstens Medizinelektronik GmbH for material support toward our research. We would also like to thank Marcus Jones, Amy Berglund, Cameron Watkins, Bill Watts, and Holle Carey for their contributions in apparatus design, data collection, data processing, and other support for this project.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnhum.2015.00612

**Conflict of Interest Statement:** This research was partially supported by a grant to Vulintus, LLC entitled "Development of a software package for speech therapy" (NIH-SBIR 1 R43 DC013467). However, the sources of support for this work had no role in the study design, collection, analysis or interpretation of data, or the decision to submit this report for publication. The corresponding author (William F. Katz) had full access to all of the data in the study and takes complete responsibility for the integrity and accuracy of the data. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Katz and Mehta. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Developmental Foreign Accent Syndrome: Report of a New Case

Stefanie Keulen<sup>1,2</sup>, Peter Mariën<sup>1,3</sup>, Peggy Wackenier<sup>3</sup>, Roel Jonkers<sup>2</sup>, Roelien Bastiaanse<sup>2</sup> and Jo Verhoeven<sup>4,5</sup>\*

<sup>1</sup> Clinical and Experimental Neurolinguistics, Vrije Universiteit Brussel, Brussels, Belgium, <sup>2</sup> Center for Language and Cognition Groningen, Rijksuniversiteit Groningen, Groningen, Netherlands, <sup>3</sup> Department of Neurology and Memory Clinic, ZNA Middelheim General Hospital, Antwerp, Belgium, <sup>4</sup> Department of Language and Communication Science, City University London, London, UK, <sup>5</sup> Computational Linguistics and Psycholinguistics Research Center, Universiteit Antwerpen, Antwerp, Belgium

This paper presents the case of a 17-year-old right-handed Belgian boy with developmental FAS and comorbid developmental apraxia of speech (DAS). Extensive neuropsychological and neurolinguistic investigations demonstrated a normal IQ but impaired planning (visuo-constructional dyspraxia). A Tc-99m-ECD SPECT revealed a significant hypoperfusion in the prefrontal and medial frontal regions, as well as in the lateral temporal regions. Hypoperfusion in the right cerebellum almost reached significance. It is hypothesized that these clinical findings support the view that FAS and DAS are related phenomena following impairment of the cerebro-cerebellar network.

Keywords: Developmental Foreign Accent Syndrome, FAS, Developmental apraxia of speech, speech disorder, constructional dyspraxia, SPECT

#### Edited by:

Srikantan S. Nagarajan, University of California, San Francisco, USA

#### Reviewed by:

Gianluca Serafini, University of Genoa, Italy Stéphane Poulin, Université Laval, Canada

#### \*Correspondence:

Jo Verhoeven jo.verhoeven@city.ac.uk

Received: 16 June 2015 Accepted: 09 February 2016 Published: 10 March 2016

#### Citation:

Keulen S, Mariën P, Wackenier P, Jonkers R, Bastiaanse R and Verhoeven J (2016) Developmental Foreign Accent Syndrome: Report of a New Case. Front. Hum. Neurosci. 10:65. doi: 10.3389/fnhum.2016.00065

## INTRODUCTION

Foreign accent syndrome (FAS) is a relatively rare motor speech disorder in which segmental and prosodic speech alterations cause patients to be perceived as non-native speakers of their mother tongue (Blumstein et al., 1987; Ingram et al., 1992; Lippert-Gruener et al., 2005; Pyun et al., 2013; Tran and Mills, 2013). In some cases, there is a reversion to a previously acquired language variety (Seliger et al., 1992; Kwon and Kim, 2006). In 2010, Verhoeven and Mariën provided a taxonomical classification of this speech disorder and defined three main types of FAS: a neurogenic, psychogenic and mixed type (Verhoeven and Mariën, 2010a). Neurogenic FAS is further subdivided into an acquired and a developmental<sup>1</sup> variant. The current article focuses on developmental FAS, which is one of the rarest etiological subtypes of FAS. To the best of our knowledge only two case studies have been published between 1907 and 2014 (Mariën et al., 2009). The first case was a 29-year-old female native speaker of Belgian Dutch who was diagnosed with FAS and developmental apraxia of speech (DAS). The second patient was a 7-year-old boy, who presented with FAS in the context of specific language impairment (SLI) of the phonological-syntactic type (Mariën et al., 2009).

Although the number of documented developmental FAS cases has remained low, accent change has been (anecdotally) reported in relation to neurodevelopmental disorders, especially autism of the Asperger-type (Attwood, 1998; Ghazziudin, 2005; Tantam, 2012). However, in these reports, the neurobiological relationship between the speech characteristics and the developmental disorder was not addressed in detail. Hence, it is possible that FAS is much more common in a population with developmental disorders than current statistics indicate. This article presents a new case of developmental FAS in combination with DAS: a neurologically based speech disorder that affects the planning/programming of phonemes and articulatory sequences as language develops, in the absence of any neuromuscular impairment (Crary, 1984; McNeill and Kent, 1990; Smith et al., 1994). The patient is a 17-year-old right-handed native speaker of Belgian Dutch (Verhoeven, 2005) who presented with articulatory problems and an accent, which was perceived as French or "Mediterranean" by family, medical staff and acquaintances. A neurological and neuropsychological assessment was carried out and both an MRI and a SPECT were performed. Furthermore, the patient's speech was analyzed phonetically. Since this occurrence of FAS is linked to a programming disorder, the hypothesis of FAS as a possible subtype of apraxia of speech will be addressed in detail.

<sup>1</sup>As the focus of the current article is the developmental subtype of foreign accent syndrome, the interested reader is referred to Verhoeven and Mariën (2010a) for a comprehensive discussion of the FAS taxonomy.

## BACKGROUND

The assessment presented in this article was carried out following the principles of the standard clinical neurolinguistic workup of patients with speech- and/or language disorders at ZNA Middelheim hospital in Antwerp (Belgium). The patient's parents provided written informed consent to report the patient's medical data.

A 17-year-old, right-handed, native speaker of Belgian Dutch consulted the department of Clinical Neurolinguistics of ZNA Middelheim Hospital because of persisting articulation difficulties resulting in accented speech. The patient indicated that listeners identified him as a non-native speaker of Dutch with a French or "Mediterranean" accent. He was born at term after normal gestation and labor, and there had been no perinatal or postnatal problems. Medical history was unremarkable. According to "WHO child growth standards" acquisition of gross motor milestones was normal. He could sit without support at 5.5 months (mean = 6.0; SD = 1.1), stand with assistance at 7 months (mean = 7.6; SD = 1.4) and walk independently at the age of 11 months (mean = 12; SD = 1.8 months). He was able to independently ride a bicycle without support at the age of 4.0 years. By the age of 4–5 years he had developed a clear right-hand preference.

Except for a deviant development of articulation skills, developmental milestones were normal, including non-motor speech and language ability. The patient did not present with any pervasive developmental disorder and no family history of developmental disorders or learning disabilities was reported. There were no clinical indications for a psychiatric disorder. The parents and close relatives stated that the patient was in perfect mental health. The patient was not on any medication at the time of examination. Speech therapy was started at the age of 5 years and discontinued at the age of 10 because of a lack of therapeutic progress. The parents were monolingual speakers of Dutch. The patient had successfully finished primary school and obtained above average results in the 3rd grade of secondary school. Neurological investigations, including EEG recordings, were normal. MRI of the brain revealed no lesions at the supra- and infratentorial level. There was no brain atrophy.

A quantified Tc-99m-ECD SPECT study was carried out. 740 MBq (20 mCi) Tc-99m-ECD was administered to the patient by means of a previously fixed butterfly needle while he was sitting in a quiet dim room, eyes open and ears unplugged. Acquisition was started 40 min after injection using a three-headed rotating gamma camera system (Triad 88; Trionix Research Laboratory, Twinsburg, Ohio, USA) equipped with lead super-fine fanbeam collimators with a system resolution of 7.3 mm FWHM (rotating radius 13 cm). Projection data were accumulated in a 128 × 64 matrix, pixel size 3.56 mm, 15 s per angle, 120 angles for each detector (3° steps, 360° rotation). Projection images were rebinned to parallel data, smoothed and reconstructed in a 64 × 64 matrix, using a Butterworth filter with a high cut frequency of 0.7 cycles/cm and a roll-off of 5. No attenuation or scatter correction was performed. Trans-axial images with a pixel size of 3.56 mm were anatomically standardized using SPM and compared to a standard normal and SD image obtained from ECD perfusion studies in a group of 15 normally educated healthy adults consisting of 8 men and 7 women with an age ranging from 45 to 70 years. This normal image was created by co-registration of each normal study to the SPECT template image of SPM using the "normalize" function in SPM. At the same time, the global brain uptake of each study was normalized. On the mean image, 31 ROIs were drawn and a 31 ROI template was created. Using the normalized studies and the 31 ROI template, the mean normal uptake and SD value (=1 Z-score) in each ROI was defined. Patient data were normalized using SPM in the same way and the perfusion uptake in each ROI was calculated. From this uptake, together with the mean uptake and SD value of the normal database, the Z-score for each region was calculated. A regional Z-score with an absolute value above 2.0 is considered significant. SPECT findings are illustrated in **Figure 1**.
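The regional Z-score computation described above can be illustrated with a minimal sketch. It is not the SPM-based pipeline used in the study: the ROI names and uptake values are hypothetical, the use of the sample SD (n − 1) is an assumption, and the threshold is applied to the magnitude of the Z-score on the assumption that hypoperfusion yields negative values.

```python
from statistics import mean, stdev


def roi_z_scores(patient_uptake, normal_uptakes):
    """Z-score per ROI: (patient uptake - normal mean) / normal SD."""
    scores = {}
    for roi, value in patient_uptake.items():
        controls = normal_uptakes[roi]
        # Assumption: sample SD (n - 1); the text does not specify which SD.
        scores[roi] = (value - mean(controls)) / stdev(controls)
    return scores


def significant_rois(scores, threshold=2.0):
    """Keep ROIs whose Z-score magnitude exceeds the threshold."""
    return {roi: z for roi, z in scores.items() if abs(z) > threshold}


# Hypothetical normalized uptake values for two made-up ROIs (not study data).
normal_uptakes = {
    "roi_a": [90.0, 100.0, 110.0],
    "roi_b": [90.0, 100.0, 110.0],
}
patient_uptake = {"roi_a": 75.0, "roi_b": 85.0}

scores = roi_z_scores(patient_uptake, normal_uptakes)
flagged = significant_rois(scores)
print(scores)
print(flagged)
```

With these made-up values, only the first ROI falls more than 2 SD below the normal mean and would be flagged.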

Significant bilateral hypoperfusion was found in the medial prefrontal regions (right: −3.48 SD; left: −4.97 SD) and in both lateral temporal regions (right: −3.17 SD; left: −2.17 SD). Decreased perfusion in the left inferior medial frontal region (−1.65 SD), the right inferior lateral frontal region (−1.62 SD) and the right cerebellar hemisphere (−1.52 SD) nearly reached significance.

#### Neuropsychological Investigations

In-depth neuropsychological assessment consisted of a range of formal tests including the Wechsler Adult Intelligence Scale, 4th Ed., Dutch version (WAIS-IV-NL) (WAIS-IV: Wechsler, 2008; WAIS-IV-NL: Kooij and Dek, 2012), the Bourdon-Vos Test (Vos, 1998), the Wisconsin Card Sorting Test (WCST) (Heaton et al., 1993), the Stroop Color-Word Test (Stroop, 1935; Golden, 1978), the Trail Making Test (TMT) (Reitan, 1958), the Rey-Osterrieth figure (Rey, 1941; Osterrieth, 1944), the praxis subtests of the Hierarchic Dementia Scale (HDS) (Cole and Dastoor, 1987), the Beery Developmental Test of Visual-Motor Integration, 5th Ed. (Beery and Beery, 2004) and the Test of Visual-Perceptual Skills, 3rd Ed. (TVPS-3) (Martin, 2006). Neurolinguistic assessment consisted of the Boston Naming Test (Kaplan et al., 1983; Belgian norms (Dutch): Mariën et al., 1998), the Clinical Evaluation of Language Fundamentals (Dutch version) (Semel et al., 2003) and the Dudal Spelling Tests (Dudal, 1998, 2004). Test results are summarized in **Table 1**.

FIGURE 1 | SPECT-findings demonstrating a significant decrease of perfusion bilaterally in the prefrontal and medial frontal regions, as well as in the lateral temporal regions.

General cognitive skills as measured by the WAIS-IV showed a high average full scale IQ level (FSIQ = 119) and average to above average results for each of the subscales. Problems primarily concerned abstract concept formation: shifting and maintaining goal-oriented cognitive strategies in response to changing environmental contingencies was abnormal, as the patient succeeded in completing only 1 category within 128 trials (WCST). The planning and construction of a complex geometrical form (Rey-Osterrieth Figure) was abnormal. On the Beery Developmental Test of Visual-Motor Integration the patient obtained borderline results for visual-motor integration skills (−1.4 SD) and for visual-motor coordination (−1.8 SD). Visual perception was normal. Articulation and prosody in conversational and spontaneous speech were clearly abnormal. The patient produced several substitution errors as well as omissions and additions during spontaneous conversation. Oral-verbal diadochokinesis was within normal limits, whereas rapid repetition of polysyllabic words was hesitant. Visual confrontation naming (BNT) and semantic verbal fluency were normal as well. Indices on CELF-IV-NL (Semel et al., 2003) were all above average. No grammatical errors or lexical retrieval difficulties were observed. Spelling of words and sentences (Dudal spelling) was normal. The isolated motor speech impairments consisted of substitution errors for consonants (affecting place and manner of articulation: e.g., "groepjen" instead of "groepjes": little groups, the use of a uvular trill instead of an alveolar trill) and vowels (affecting vowel distinctiveness), difficulties initiating words ("ra.. ra.. ra. . . geraak": get somewhere) and omissions of consonants ("geraa" instead of "geraak," "pagia" instead of "pagina": page). These errors are consistent with a diagnosis of DAS (see also "phonetic analysis" below).

#### Phonetic Analysis

A perceptual error analysis of a 1:36 min spontaneous speech sample consisting of 397 words was carried out. This was supplemented by an acoustic analysis of some key aspects of speech. As far as consonant production is concerned, occasional voicing errors were observed (stravde for strafte: past tense of "punished"). It was furthermore striking that the speaker used a uvular trill instead of the alveolar trill: although both are acceptable realizations of the trill in Dutch, the alveolar trill is the more common variant in the Brabantine geographical region of origin of this speaker. It is precisely the usage of a uvular trill that is typical of French non-native speakers of Dutch.

#### TABLE 1 | Overview of the neuropsychological test results.

With respect to vowel articulation, various distortions were observed. In order to quantify these deviations, the formant frequencies of the 358 peripheral vowels in the speech sample were measured by means of the signal processing software PRAAT (Boersma and Weenink, 2015). The instances of schwa were not analyzed. The mean formant values of the FAS vowels are illustrated in **Figure 2**. They have been compared with the vowel formants of a group of 5 male native control speakers of Dutch from the same geographical region as the FAS speaker. The formant values of the control speakers were obtained in a data collection independent of this investigation, which is described in more detail in Adank et al. (2004).

**Figure 2** shows that with respect to vowel production: (1) there is a significant degree of vowel reduction and (2) a substantial erosion of vowel distinctiveness particularly in the front vowels. The observed vowel reduction, i.e. the more central realization of the vowels with respect to the control vowels, can be accounted for by the fact that the vowels in the FAS speaker and the control group have been recorded in different communicative settings. The vowels of the control group were recorded in a structured reading task in which the vowels were positioned in a prominent utterance position in order to attract sentence stress. This leads to a more careful pronunciation of the vowels and gives rise to more peripheral formant values than in spontaneous speech. Hence, the vowel reduction observed in the FAS speaker is unlikely to be contributory to the impression of a foreign accent.

The erosion of the distinctiveness of some vowels in the FAS speaker is particularly noticeable in the close front region of the vowel space: there is very little qualitative difference between /i/, /y/ and /e/, and between /ɪ/, /ɛ/ and /ʏ/. This smaller distinctiveness cannot be explained by the regional accent of the speaker (Verhoeven and Van Bael, 2002): therefore, it is not unreasonable to assume that this lack of distinctiveness may have contributed to the perception of a foreign accent.

At the suprasegmental level, several dimensions were studied. First, speech rate was investigated from two perspectives, that is as speech rate and articulation rate. Speech rate is expressed as the number of syllables per second, including silent and filled pauses, while articulation rate is quantified as the number of syllables per second including filled pauses, but excluding silent pauses (Verhoeven et al., 2004). In this FAS speaker, speech rate was 3.83 syll/s and articulation rate amounted to 4.79 syll/s. This compares well to a control group of unimpaired native speakers of Dutch who had a speaking rate and articulation rate of 3.89 syll/s and 4.23 syll/s respectively (Verhoeven et al., 2004). From this, it can be concluded that this speaker's speech is generally very fluent and it is precisely the dissociation in fluency between FAS and AoS that has previously been mentioned as one of the hallmark features distinguishing both speech disorders from each other (Aaronson, 1990; Moen, 2000).
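The distinction between the two rate measures defined above can be sketched as follows. The function names and the syllable count and durations in the example are hypothetical and are not taken from the patient's speech sample.

```python
def speech_rate(n_syllables, total_time_s):
    """Syllables per second over the whole sample, all pauses included."""
    if total_time_s <= 0:
        raise ValueError("total_time_s must be positive")
    return n_syllables / total_time_s


def articulation_rate(n_syllables, total_time_s, silent_pause_time_s):
    """Syllables per second with silent pauses removed.

    Filled pauses stay in the denominator, following the definition
    given in the text.
    """
    speaking_time_s = total_time_s - silent_pause_time_s
    if speaking_time_s <= 0:
        raise ValueError("silent pauses must be shorter than the sample")
    return n_syllables / speaking_time_s


# Hypothetical sample: 400 syllables in 100 s, of which 20 s is silence.
sr = speech_rate(400, 100.0)
ar = articulation_rate(400, 100.0, 20.0)
print(sr, ar)
```

Because silent pauses are removed from the denominator only, articulation rate can never be lower than speech rate for the same sample.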

The next dimension that was investigated was the speaker's speech rhythm, which was quantified by means of the pairwise variability index (PVI) proposed by Low et al. (2000). This index is based on measures of vowel durations (vocalic PVI) and the duration of the intervocalic intervals (intervocalic PVI). In this speaker, the vocalic PVI amounted to 48: this is considerably lower than 65.5, which is the reference value for Dutch suggested in Grabe and Low (2002). However, it is very close to 43.5, which is the reference value for French. This suggests that the speaker's rhythm is more French-like (syllable-timed) than Dutch (stress-timed) and this may have contributed to the impression of a French accent.
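For readers unfamiliar with the index, the following sketch computes a vocalic PVI, assuming the rate-normalized form (nPVI), in which each difference between successive durations is divided by the mean of the pair and the average is scaled by 100. The function name and the example durations are hypothetical; no segment durations from the patient's sample are reproduced here.

```python
def normalized_pvi(durations):
    """Normalized pairwise variability index over successive intervals.

    nPVI = 100 * mean(|d_k - d_(k+1)| / ((d_k + d_(k+1)) / 2))
    """
    if len(durations) < 2:
        raise ValueError("at least two durations are required")
    terms = [
        abs(a - b) / ((a + b) / 2)
        for a, b in zip(durations, durations[1:])
    ]
    return 100 * sum(terms) / len(terms)


# Hypothetical vowel durations in milliseconds.
even_rhythm = [80.0, 80.0, 80.0, 80.0]       # no durational contrast
alternating = [60.0, 120.0, 60.0, 120.0]     # long-short alternation

print(normalized_pvi(even_rhythm))
print(normalized_pvi(alternating))
```

Equal successive durations give an index of 0, and larger durational contrasts between neighboring vowels raise the index, which is why lower values are read as more syllable-timed.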

Finally, the speaker's intonation was investigated along the same lines as Verhoeven and Mariën (2010a). As far as the mean pitch and the excursion sizes of the pitch movements in the contours are concerned, it was found that the speaker's mean pitch is 110.5 Hz while his pitch range amounts to 5.85 semitones. This agrees rather well with averages for male native speakers of Dutch suggested in 't Hart et al. (1990). The internal composition of the pitch contours was analyzed by means of the stylization method proposed by 't Hart et al. (1990). This method uses speech analysis and synthesis techniques to replace the original F0 contours by means of a minimal combination of straight lines which are perceptually equivalent. This method eliminates microprosodic variation and provides an insight into the internal structure of pitch contours. For more information about the application of this method to the analysis of speech pathology the interested reader is referred to Verhoeven and Mariën (2010b).
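Pitch range is expressed in semitones, a logarithmic unit in which 12 semitones correspond to a doubling of frequency. A minimal sketch of the conversion is given below; the function name and the frequency values are hypothetical, and the exact procedure by which the range of 5.85 semitones was derived is not specified in the text.

```python
import math


def semitones_between(f_low_hz, f_high_hz):
    """Interval between two frequencies in semitones: 12 * log2(f_high / f_low)."""
    if f_low_hz <= 0 or f_high_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12 * math.log2(f_high_hz / f_low_hz)


# Hypothetical F0 values: one octave apart, so the interval is 12 semitones.
octave = semitones_between(100.0, 200.0)
print(octave)
```

Expressing excursions in semitones rather than Hz makes ranges comparable across speakers with different mean pitch.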

Application of the stylization method revealed 4 different pitch contours. The first one consists of a prominence-lending rising pitch movement (symbolized as 1) immediately followed by a prominence-lending fall (symbolized as A) in the same syllable. This (1-A) pattern occurred 49 times (36.6%) in the patient's speech sample and it was always correctly associated with the most prominent syllable in the utterance. The second contour is one in which the rising and falling pitch movements 1 and A are aligned with two different prominent syllables: the two movements are connected by means of a stretch of high pitch. The occurrence of this contour is confined to the last two prominent syllables in sentences. This contour was used 13 times (9.7%) by the speaker: all instances were well-formed and agreed with the distributional restrictions of this contour. The third contour is another variant of 1-A in which the first sentence accent is realized by means of a prominence-lending rising pitch movement (1) and the last accent is marked by means of a prominence-lending falling pitch movement (A). Any intervening accents are marked by means of a half fall (symbolized as E) and this gives rise to a typical terrace contour. The speaker used this contour 8 times (6%). The fourth contour is a continuation contour in which the accent is realized by means of a prominence-lending rising pitch movement. The pitch remains high and is then reset to a lower level in order to mark a syntactic boundary (symbolized as B). This is the standard continuation contour, which indicates that the utterance is not finished yet. This contour was used 64 times (47.8%). The 1-B contour did not always coincide with syntactic boundaries: individual words within a larger syntactic unit were often realized with this contour.
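The reported percentages can be checked from the contour counts given above. The sketch below assumes that each percentage is taken relative to the sum of the four counts (134 contours), which the text does not state explicitly; the dictionary labels are shorthand introduced here for illustration.

```python
# Contour counts as reported in the text; labels are illustrative shorthand.
contour_counts = {
    "1-A (single syllable)": 49,
    "1-A (two syllables)": 13,
    "1-E-A (terrace)": 8,
    "1-B (continuation)": 64,
}

# Assumption: percentages are relative to the total of these four counts.
total = sum(contour_counts.values())
percentages = {
    name: round(100 * count / total, 1)
    for name, count in contour_counts.items()
}

print(total)
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

Under this assumption the computed shares agree with the reported values of 36.6%, 9.7%, 6%, and 47.8%.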

The frequencies of the contours in this speech sample were compared to reference frequencies for spontaneous Dutch reported in Blaauw (1995), who carried out a perceptual analysis of instruction dialogs in 5 speakers. This comparison revealed that the frequency of occurrence of all the speaker's contours was very similar to the reference values suggested in Blaauw (1995), except for the 1B contour, which was significantly more frequent than in unimpaired speech. A similar observation was reported in Verhoeven and Mariën (2010a) and Kuschmann (2010) for neurogenic acquired FAS.

# DISCUSSION

# Semiological Resemblances between FAS and DAS

This patient presented with isolated developmental motor speech problems consistent with a diagnosis of FAS and DAS. Previous research has shown that FAS may result from a compensation strategy by patients showing apraxia-like features in speech production (Whiteside and Varley, 1998). It is argued that the same can be assumed for DAS patients. Fluency has been mentioned as one of the key characteristics distinguishing AoS (Van der Merwe, 2009; Duffy, 2013) and FAS patients, and it seems that this semiological distinction also holds for DAS patients. Furthermore, DAS (and AoS) is often characterized by attainment of phonological sequences, whereas FAS is characterized by deviations of individual speech sounds (Moen, 2000).

This patient demonstrated many of the key features associated with DAS (Shriberg et al., 1997a,b; Nijland et al., 2003; Peter and Stoel-Gammon, 2005; McCauley and Strand, 2008; Morgan and Vogel, 2009; Terband et al., 2009; see also: neuropsychological investigations). Some of these errors are typical segmental errors which have also been observed in other FAS cases. However, this patient did not show the typical "trial-and-error" behavior which is regularly noted in DAS patients (Stackhouse, 1992; Moen, 2000, 2006; Ozanne, 2005; Hall et al., 2007; Terband et al., 2011). The analysis of suprasegmental features for this case provided supplementary evidence against the idea that FAS is primarily a prosodic deficit: the only remarkable feature was a syllable-timed speech rhythm and the excessive use of the 1B (continuation) contour. Speech and articulation rate, mean pitch (parameter of intonation) and the general shape of the intonation contours were normal.

# Planning Deficits: Crossing Speech Boundaries

The hypothesis of FAS as a subtype of AoS has previously been described from a physiological (Moen, 2000) and a cognitive perspective (Whiteside and Varley, 1998). This patient was also investigated from both perspectives. Cognitive assessment demonstrated (selective) executive disturbances (deviant scores on the Wisconsin Card Sorting Test and low results on the Stroop Task-card III) and distorted planning and organization in the visuo-spatial domain. However, the patient obtained average to above-average results on other executive tasks (such as the digit span and TMT-B, for instance). Comparison with the cognitive profile of the previously published cases of developmental FAS revealed a comparable discrepancy. The neuropsychological test results of the first patient published by Mariën et al. (2009) demonstrated a low average performance IQ as well as depressed scores for digit span and TMT-A and B. Scores for the WCST and Stroop task, on the other hand, were well within the normal range. In their second patient, only severe syntactic deficits affecting language processing were retained. All other cognitive test results were in the average range or above. The results were consistent with a diagnosis of SLI of the phonological-syntactic type. The results of both this patient and the first patient described by Mariën et al. (2009) go against the finding that WCST scores predict TMT-B performance, a finding taken to indicate that both tests reflect attentional set-shifting problems (Sánchez-Cubillo et al., 2009). Some studies have claimed that correlations between the Stroop interference and TMT-B constitute evidence of a shared expression of inhibitory control (Chaytor et al., 2006). Other studies have contradicted such a correlation. For instance, Sánchez-Cubillo et al. (2009) analyzed 41 Spanish-speaking healthy participants and found that TMT-A scores primarily tap visuo-perceptual abilities and visual search (a significant amount of the variance in multiple regression analysis was predicted by the WAIS-III Digit Symbol score), whereas the TMT-B was primarily informed by working memory and only then by task-switching ability (their correlation with the Stroop Interference Task was nulled in the multiple regression analysis).

Functional neuroimaging with SPECT in this patient revealed a decreased perfusion in the anatomo-clinically suspected brain regions involving the bilateral prefrontal cortex, the medial frontal regions and the cerebellum. On the basis of lesion studies, research has linked damage affecting the prefrontal cortex (PFC) to impaired executive functioning (Robinson et al., 1980; Yuan and Raz, 2014). Yuan and Raz (2014) carried out a literature survey about the anatomo-functional correlates of executive functions and showed that increased PFC volume in healthy subjects correlated (positively) with scores on the WCST. Buchsbaum et al. (2005) also found that perfusion in the bilateral PFC significantly increases during performance of tasks requiring executive planning and control. However, the value of the WCST as an exclusive indicator of frontal dysfunction remains a matter of debate. Chase-Carmichael et al. (1999) for instance, have contested the value of the WCST as an indicator of frontal pathology in a pediatric population (age 8–18). For their study, they classified children according to the affected brain area(s) (left hemisphere, right hemisphere, or bilateral frontal, extrafrontal, or multifocal/diffuse regions of brain dysfunction) regardless of the etiology (stroke, brain trauma, tumor, seizures, neurofibromatosis, lupus, myelomeningocele, and cognitive changes of unknown origin). Results did not support the assumption that WCST performance is more impaired in frontal lesions than extrafrontal or multifocal/diffuse lesions. However, they classified all patients with frontal lobe dysfunction together and did not take into consideration differences in the affected sub-regions. Moreover, they argue that dysfunction in certain sub-regions (e.g., medial frontal regions) of the frontal lobe in the left hemisphere leads to lower performance on the WCST (Drewe, 1974; Grafman et al., 1986). 
Still, their study confirmed that patients with left-hemisphere damage generally perform worse than patients with right-hemisphere damage. For adult stroke patients, the same conclusion holds (Jodzio and Biechowska, 2010).

This patient also obtained borderline scores on the motor integration and coordination subtests of the Beery-Buktenica Developmental Test of Visual-Motor Integration, a test administered to evaluate the integration of visual perception and coordination of fine motor skills in drawing (Beery, 1989). The patient also obtained a low score on the reproduction of the Rey Complex Figure (28/36). It was concluded from these results that the patient had spatial planning, visual structuring and copying (drawing) problems. The patient was diagnosed with a constructional dyspraxia following execution and planning problems of frontal origin.

Because the scores obtained for Block Design, Visual Puzzles (visuo-constructional tests), and the visual perception subtest of the Beery-Buktenica Test (perceptual skills) were in the average, unimpaired range, it is hypothesized that the main deficit occurs in the programming phase of the relevant motor movements prior to execution of grapho-motor tasks (Del Giudice et al., 2000). According to the model proposed by Grossi and Angelini (Grossi, 1991, see also: Grossi and Trojano, 1998), the copying of drawings requires (1) a visuospatial analysis of the geometrical and spatial aspects of the figure to be copied, as well as a scan of the repertoire of internalized figures drawn in the past; (2) the formulation of a drawing plan, stored in the working memory (visuospatial sketchpad, Baddeley and Hitch, 1974), containing the integration of visuo-spatial representations into the required motor actions (programming phase); (3) the execution of the grapho-motor movements; and (4) the control of these movements (see also: Denes and Pizzamiglio, 1998). Since this patient obtained a maximum score on the retention of visual material during neuropsychological testing, it is plausible that the impairment is situated after the instauration of the figure in the visuo-spatial sketchpad (working memory). This model is developed along the same lines as the speech sensorimotor control models (Van der Merwe, 2009). In short, the problem might be situated in the second phase of planning and programming. Furthermore, this patient did not demonstrate a hypoperfusion in the (superior) parietal region, where graphomotor plans are stored. Yet a significant hypoperfusion was found in the area circumscribing the (bilateral) prefrontal cortex, the area where graphomotor plans are programmed/integrated for execution (Mariën et al., 2013).
Disorders of skilled movements, as well as underdeveloped constructional abilities have been noted in the context of DAS (Yoss and Darley, 1974; McLaughlin and Kriegsmann, 1980; Maassen, 2002).

# The Hypothesis of a Cortico-Cerebellar Network Dysfunction

The frontal executive dysfunctions in conjunction with the SPECT findings lead to the hypothesis that the pattern of hypoperfusions reflects significant involvement of the cerebrocerebellar functional connectivity network (Meister et al., 2003; Mariën et al., 2006, 2013; Mariën and Verhoeven, 2007; Moreno-Torres et al., 2013). Cerebellar involvement in speech disorders, including FAS and AoS, has previously been proposed from the viewpoint of the cerebellum as a coordinator of speech timing (see also: De Smet et al., 2007). Also, the phonetic analysis of our patient's speech gave evidence for semiological resemblances between DAS and FAS. However, one of the most striking differences between both conditions, namely the fluency aspect, was equally confirmed for our patient. These findings provide support for the hypothesis that FAS may be a mild subtype of AoS, and that the same may hold for their developmental cognates (Whiteside and Varley, 1998; Moen, 2000, 2006; Fridriksson et al., 2005; Mariën et al., 2006, 2009; Kanjee et al., 2010).

In hindsight, diffusion tensor imaging (DTI) might be of added value to identify structural changes to the white matter tracts which make up and connect with the cortico-cerebellar tract. DTI voxel-based morphometry was unfortunately not carried out in this patient. However, it could help to further clarify the pathophysiological substrate of neurodevelopmental disorders and should be considered in future research on developmental FAS.

# CONCLUDING REMARKS

A new case of developmental FAS with DAS and a visuo-spatial planning disorder was presented. From a semiological as well as structural and physiological point of view, the hypothesis of a connection between FAS and DAS seems plausible in this case. Moreover, the conjunction between the speech impairment and frontal executive deficits, supported by SPECT findings, provides further evidence for a potentially primary role of the cerebrocerebellar network in both disorders. However, one of the main characteristics of DAS is trial-and-error behavior. This was not attested since the patient could adequately self-correct whenever production errors were made. Therefore, the hypothesis is put forward that FAS is a mild subtype of AoS, even when both are developmental in nature.

# AUTHOR CONTRIBUTIONS

All authors listed have made substantial, direct and intellectual contribution to the work, and approved it for publication.

#### ACKNOWLEDGMENTS

The authors thank Bastien De Clerq of the English Linguistics department at the Vrije Universiteit Brussel for correcting the English.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Keulen, Mariën, Wackenier, Jonkers, Bastiaanse and Verhoeven. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Mild Developmental Foreign Accent Syndrome and Psychiatric Comorbidity: Altered White Matter Integrity in Speech and Emotion Regulation Networks

Marcelo L. Berthier<sup>1\*†</sup>, Núria Roé-Vellvé<sup>2†</sup>, Ignacio Moreno-Torres<sup>3</sup>, Carles Falcon<sup>4</sup>, Karl Thurnhofer-Hemsi<sup>2,5</sup>, José Paredes-Pacheco<sup>2,5</sup>, María J. Torres-Prioris<sup>1,6</sup>, Irene De-Torres<sup>1,7</sup>, Francisco Alfaro<sup>2</sup>, Antonio L. Gutiérrez-Cardo<sup>2</sup>, Miquel Baquero<sup>8</sup>, Rafael Ruiz-Cruces<sup>1</sup> and Guadalupe Dávila<sup>1,6</sup>

#### Edited by:

Srikantan S. Nagarajan, University of California, San Francisco, USA

#### Reviewed by:

Katie A. McLaughlin, University of Washington Tacoma, USA

Juliana Baldo, VA Northern California Health Care System, USA

#### \*Correspondence:

Marcelo L. Berthier mbt@uma.es

†These authors have contributed equally to this work.

Received: 14 July 2015; Accepted: 26 July 2016; Published: 09 August 2016

#### Citation:

Berthier ML, Roé-Vellvé N, Moreno-Torres I, Falcon C, Thurnhofer-Hemsi K, Paredes-Pacheco J, Torres-Prioris MJ, De-Torres I, Alfaro F, Gutiérrez-Cardo AL, Baquero M, Ruiz-Cruces R and Dávila G (2016) Mild Developmental Foreign Accent Syndrome and Psychiatric Comorbidity: Altered White Matter Integrity in Speech and Emotion Regulation Networks. Front. Hum. Neurosci. 10:399. doi: 10.3389/fnhum.2016.00399

<sup>1</sup> Cognitive Neurology and Aphasia Unit and Cathedra ARPA of Aphasia, Centro de Investigaciones Médico-Sanitarias, Instituto de Investigación Biomédica de Málaga (IBIMA), University of Malaga, Malaga, Spain, <sup>2</sup> Molecular Imaging Unit, Centro de Investigaciones Médico-Sanitarias, University of Malaga, Malaga, Spain, <sup>3</sup> Department of Spanish Language, University of Malaga, Malaga, Spain, <sup>4</sup> Barcelonabeta Brain Research Center, Pasqual Maragall Foundation, Barcelona, Spain, <sup>5</sup> Department of Applied Mathematics, Superior Technical School of Engineering in Informatics, University of Malaga, Malaga, Spain, <sup>6</sup> Department of Psychobiology and Methodology of Behavioural Sciences, Faculty of Psychology, University of Malaga, Malaga, Spain, <sup>7</sup> Unit of Physical Medicine and Rehabilitation, Regional University Hospital, Malaga, Malaga, Spain, <sup>8</sup> Service of Neurology, Hospital Universitari i Politècnic La Fe, Valencia, Spain

Foreign accent syndrome (FAS) is a speech disorder that is defined by the emergence of a peculiar manner of articulation and intonation which is perceived as foreign. In most cases of acquired FAS (AFAS) the new accent is secondary to small focal lesions involving components of the bilaterally distributed neural network for speech production. In the past few years FAS has also been described in different psychiatric conditions (conversion disorder, bipolar disorder, and schizophrenia) as well as in developmental disorders (specific language impairment, apraxia of speech). In the present study, two adult males, one with atypical phonetic production and the other one with cluttering, reported having developmental FAS (DFAS) since their adolescence. Perceptual analysis by naïve judges could not confirm the presence of foreign accent, possibly due to the mildness of the speech disorder. However, detailed linguistic analysis provided evidence of prosodic and segmental errors previously reported in AFAS cases. Cognitive testing showed reduced communication in activities of daily living and mild deficits related to psychiatric disorders. Psychiatric evaluation revealed long-lasting internalizing disorders (neuroticism, anxiety, obsessive-compulsive disorder, social phobia, depression, alexithymia, hopelessness, and apathy) in both subjects. Diffusion tensor imaging (DTI) data from each subject with DFAS were compared with data from a group of 21 age- and gender-matched healthy control subjects. Diffusion parameters (MD, AD, and RD) in predefined regions of interest showed changes of white matter microstructure in regions previously related with AFAS and psychiatric disorders. In conclusion, the present findings militate against the possibility that these two subjects have FAS of psychogenic origin. Rather, our findings provide evidence that mild DFAS occurring in the context of subtle, yet persistent, developmental speech disorders may be associated with structural brain anomalies. We suggest that the simultaneous involvement of speech and emotion regulation networks might result from disrupted neural organization during development, or compensatory or maladaptive plasticity. Future studies are required to examine whether the interplay between biological trait-like diathesis (shyness, neuroticism) and the stressful experience of living with mild DFAS leads to the development of internalizing psychiatric disorders.

Keywords: developmental speech disorders, foreign accent, diffusion tensor imaging, personality, psychiatric disorders

#### INTRODUCTION

Foreign accent syndrome (FAS) is a stigmatizing disorder that is defined by the emergence of a peculiar manner of articulation and intonation which is perceived as foreign (Whitaker, 1982; Blumstein et al., 1987; Berthier et al., 1991). In the vast majority of individuals with FAS the condition is acquired (AFAS) after brain damage or it emerges during the course of psychiatric illnesses (e.g., schizophrenia). Even less frequently, FAS occurs during speech-language development (Mariën et al., 2009; Keulen et al., 2016a). In these cases, toddlers develop the language (lexicon and grammar), but not the pronunciation (accent) that is peculiar to the community to which they belong (Flege et al., 2006). On exploring this issue in our unit, we were confronted with two different situations. One was that of adolescents with autism spectrum disorder (Asperger's syndrome) born in families with strong local accents but who spoke with a clear standard Spanish accent.<sup>1</sup> A tentative explanation could be that these children absorb different words from different sources (other people, mass media)<sup>2</sup> and use them as "formulaic language" (Locke, 1997) that may be exact replicas of what they had heard during language learning (a form of delayed echolalia for accent), so that their accent may sound foreign or atypical.

The other situation, which motivates the present study, was the case of two adult males who claimed to have had FAS since their early adolescence [developmental FAS (DFAS)]. They realized they had DFAS after being alerted by classmates, but they remained uninformed about the nature of the condition until they accessed information on mass media. Both subjects reported on the negative emotional, social, and occupational consequences of speaking with a non-native accent. We selected these two subjects for a more in-depth evaluation according to the following criteria: (a) claims by both subjects that their speech had sounded foreign to naive listeners since their adolescence (Blumstein et al., 1987); and (b) presence of atypical phonetic features that might explain why their speech was perceived as foreign. This was confirmed by an expert phonetician with experience in FAS (IM-T). On the initial interview, one subject reported a family history of stuttering and the other one's speech resembled cluttering. Moreover, we noticed that they were excessively concerned with their foreign accent; this presumably resulted from the presence of long-standing co-morbid psychiatric conditions (neuroticism, obsessive-compulsive disorder, anxiety, depression, and social phobia) and from limited coping strategies which were heightened by impolite reactions of others. Thus, the presence of psychiatric symptoms raised the possibility that the origin of FAS in these two cases might be psychogenic (Reeves and Norton, 2001; Van Borsel et al., 2005; Verhoeven et al., 2005; Reeves et al., 2007; Keulen et al., 2016b). Nevertheless, changes in brain function and anatomy have been implicated in the pathogenesis of developmental speech disorders including cluttering (Ward et al., 2015) and stuttering (Neef et al., 2015) as well as in AFAS (Fridriksson et al., 2005; Katz et al., 2012; Moreno-Torres et al., 2013; Tomasino et al., 2013).
Furthermore, comorbid psychiatric disorders (e.g., obsessive-compulsive disorder, social phobia) can be associated with changes in regions reported to be altered in FAS (Pujol et al., 2004; Fan et al., 2012; Li et al., 2014; see further details below). This means that FAS should not be classified as psychogenic before performing detailed neuroimaging studies.

Since only three cases of DFAS have been reported up to now (Mariën et al., 2009; Keulen et al., 2016a), there is no information about the interaction between the neural systems underpinning speech and behavior in DFAS. Data from a well-studied developmental speech disorder (stuttering) indicate that concomitant disorders (e.g., neuroticism, anxiety, depression, and emotional/behavioral problems) depend on a complex interaction between biological (genetics), psychological vulnerabilities, temperament, cognitive styles, and familial and peer influences (see Bleek et al., 2012; Alm, 2014; Gunn et al., 2014; Iverach and Rapee, 2014). Therefore, we examined DFAS from a multidimensional perspective, which includes linguistic characteristics, accompanying cognitive and psychiatric deficits, and white matter microstructure with magnetic resonance imaging (MRI) and diffusion tensor imaging (DTI).

The current (limited) knowledge on DFAS cases may be illuminated by prior data on AFAS. Results from previous studies reveal that AFAS is highly heterogeneous from a phonetic viewpoint. Subjects with AFAS may show different patterns of suprasegmental (prosodic) and segmental (consonant and vowel) errors. Suprasegmental errors seem to affect very different aspects (intonation, stress, and emotional prosody), and within one single aspect, error patterns seem to be contradictory. For instance, while some studies reported excessive and atypical pitch contours (Blumstein et al., 1987; Berthier et al., 1991; Ingram et al., 1992; Takayama et al., 1993), others have observed reduced fundamental frequency (F0) ranges (Graff-Radford et al., 1986). Similarly, at the segmental level errors on both vowels and consonants have been observed. Errors in vowels include tensing (Whitaker, 1982; Van Lancker et al., 1983; Blumstein et al., 1987; Ingram et al., 1992), lengthening (Ardila et al., 1988), and schwa coloring (Whitaker, 1982; Gurd et al., 1988). Several studies have documented an overall reduction in the acoustic vowel space due to a restricted F1 range (Ingram et al., 1992; Kurowski et al., 1996; Moreno-Torres et al., 2013). The most commonly cited errors in consonants include manner changes (Ardila et al., 1988; Berthier et al., 1991; Ingram et al., 1992; Moreno-Torres et al., 2013) and voicing errors (Blumstein et al., 1987; Ardila et al., 1988; Gurd et al., 1988), while nasalization (Lippert-Gruener et al., 2005) and place of articulation changes (Whitaker, 1982; Ardila et al., 1988) are less common. Thus, as is the case for prosodic errors, no clear error pattern has emerged consistently in these subjects. The high heterogeneity of these patients explains the difficulty of characterizing them and of determining whether FAS is a syndrome or merely an epiphenomenon (Blumstein and Kurowsky, 2006). However, it has been proposed that the core deficits in these patients might be prosodic, with segmental deficits being secondary (Blumstein and Kurowsky, 2006). According to this proposal, the error patterns might be heterogeneous for three reasons: (1) patients with FAS might differ in the precise prosodic deficit and in its severity; (2) different subtypes of prosodic deficits might produce different patterns of segmental errors, which might vary cross-linguistically; and (3) many patients may have other deficits apart from the prosodic one (e.g., apraxia of speech, dysarthria). This emphasizes the need to examine how different prosodic deficits might disturb speech production. Two situations have been described. One group of patients seems to be characterized by slow articulation and staccato rhythm, which results in frequent consonant strengthening errors and a reduced vowel space (e.g., Ingram et al., 1992; Moreno-Torres et al., 2013). Note that strengthening errors vary cross-linguistically. In the case of Spanish, strengthening processes have been observed with the three voiced stop consonants (/b, d, g/). Typical speakers produce them as approximants (i.e., [β̞, ð̞, ɣ̞]) (Martínez-Celdrán, 1998). In contrast, FAS patients and many late L2 learners produce them as stops. In another group of patients the speech rhythm is not slow, but they may produce frequent pauses and variable consonant distortions (Gurd et al., 1988). However, not many studies have analyzed the interaction between prosodic and segmental errors. Because the number of subjects with DFAS described until now is scant (Mariën et al., 2009; Keulen et al., 2016a), it remains to be explored whether or not the same linguistic heterogeneity described in AFAS can be observed in developmental cases. It is hypothesized that the results of the present study will provide further information on this issue.

<sup>1</sup> Specifically, in the area where these children live, the phoneme /s/ is produced systematically as an interdental fricative [θ], a phenomenon which is known as ceceo. Ceceo is socially stigmatizing and unacceptable in most of the country. When examining children from areas in which ceceo is common, we found that the phenomenon was observable in typical children (i.e., they produced /s/ as [θ]) but not in those with Asperger syndrome (i.e., they produced /s/ as [s], as in standard Spanish; Moreno-Torres, I., and Rodríguez-Ibanéz, D., unpublished).

<sup>2</sup> http://www.wrongplanet.net/forums/viewtopic.php?t=184538

Information on the emotional consequences of AFAS is scarce (Miller et al., 2011; Moreno-Torres et al., 2013) and data reported on DFAS revealed normal behavior (Mariën et al., 2009; Keulen et al., 2016a). Therefore, since the two subjects included in this study complained of cognitive failure and psychiatric symptoms, a second aim of our study was to identify the type and severity of these complaints and their potential relationship to FAS. This is a key issue as the nosological status of these psychogenic cases is still controversial (Verhoeven et al., 2005), but note that some guidelines for the diagnosis of the syndrome in clinical practice have recently been proposed (Keulen et al., 2016b). The term "psychogenic" has always been applied to individuals who show symptoms despite lacking evidence of organic damage (see Vuilleumier, 2005). Nevertheless, recent neuroimaging studies have revealed functional and structural brain changes in archetypal psychogenic disorders such as motor conversion paralysis (Vuilleumier, 2005; Aybek et al., 2014), functional dysphagia (Suntrup et al., 2014), psychogenic amnesia (Botzung et al., 2007), and adult-onset stuttering triggered by stressful life events (Chang et al., 2010). Moreover, cases of FAS have been classified as psychogenic in patients with bipolar disorder and schizophrenia (Reeves and Norton, 2001; Reeves et al., 2007), which have a well-defined neural basis (Vargas et al., 2013; Wheeler and Voineskos, 2014), and even in a patient with mild traumatic brain injury (Cottingham and Boone, 2010). Finally, it is noteworthy that in two cases of AFAS the initial label "psychogenic" was changed to "neurogenic" after demonstrating structural changes (cerebral atrophy, infarcts) and metabolic abnormalities on neuroimaging (Poulin et al., 2007; Moreno-Torres et al., 2013). This is not an inconsequential matter because testimonies from persons with FAS reveal that obtaining a proper diagnosis helps to mitigate the negative consequences of speaking with a foreign accent (Miller et al., 2011), preventing dysfunctional adjustment and coping strategies (Moreno-Torres et al., 2013). The question that now arises is which brain regions participate in the coexpression of abnormal speech production and psychopathology in DFAS.

Brain imaging studies on DFAS have revealed normal gross anatomy but decreased activity of several components of the large-scale bilateral speech production network including the cerebellum, basal ganglia, and prefrontal-medial frontal regions (Mariën et al., 2009; Keulen et al., 2016a), a variety of regions which are also involved in cases of AFAS (see Carbary et al., 2000; Scott et al., 2006; Katz et al., 2012; Moreno-Torres et al., 2013; Tomasino et al., 2013, for reviews). Other affected regions in DFAS (left thalamus and lateral temporal regions and occipital cortex bilaterally; Mariën et al., 2009; Keulen et al., 2016a) have also been described in cases of AFAS, yet these areas are not integral components of the speech production network (Wise et al., 1999; Riecker et al., 2005; Sörös et al., 2006; Ackermann and Riecker, 2010). This latter finding increases the likelihood that abnormal activity in such regions could be the consequence of disorders other than DFAS, but which may coexist with it. Therefore, the third aim of the present study was to examine the neural substrate of DFAS and its cognitive and psychiatric comorbidity using DTI to identify changes in white matter microstructure. A comprehensive analysis of the neural substrate underlying developmental speech-language disorders potentially evolving to FAS (e.g., stuttering, specific language impairment) and their comorbid psychiatric disorders [anxiety, social phobia, obsessive compulsive disorder (OCD), neuroticism and so forth] observed in our subjects is beyond the scope of this study (see Servaas et al., 2013; van der Velde et al., 2013; Aghajani et al., 2014; LeWinn et al., 2014; Piras et al., 2015).

# CLINICAL CASE STUDIES

### Subject 1

Subject 1 was a 27-year-old right-handed male who contacted us via e-mail for evaluation of a possible FAS. The family history was unremarkable except for the presence of persistent developmental stuttering in the father. The parents were not related to each other. Developmental history was self-reported and there was no opportunity to obtain information from family members. He was the product of a normal pregnancy and delivery. Developmental milestones were apparently normal and he denied learning disability or specific problems with speech-language, reading and writing, but described himself as a shy child (distress in some social situations; Miskovic and Schmidt, 2012). He grew up in a bilingual environment and his mother tongue (L1) was Spanish. He had learned Valencian (a dialectal variant of Catalan) at home and English at school and in the USA, where he lived for 1 year at the age of 9 years. However, since adolescence he did not like to use either Valencian or English in casual conversations because he felt his articulation in these languages was abnormal. Subject 1 attended normal schooling and passed all grades uneventfully. He progressed adequately in high school and college and obtained a degree in Library Science (see further details in psychiatric status and social-occupational adjustment section). Neurological examination was normal, showing no motor coordination disorders. He had a digital anomaly (camptodactyly) in several toes but no other developmental malformations.

### Subject 2

Subject 2 was a 46-year-old right-handed male who contacted us via e-mail for evaluation of a possible FAS. The family history was unremarkable and his parents were not related to each other. As with subject 1, details of subject 2's developmental history were obtained from his own testimony because there was no opportunity to obtain information from family members. He was the product of a normal full-term pregnancy and delivery. Early development was apparently normal but he described himself as a clumsy child with a short attention span but no hyperactivity. Subject 2 was normal in language and reading acquisition, but he described occasional stuttering and deficient fine motor skills (e.g., difficulty tying shoe laces) with elements of motor and spatial dysgraphia (inappropriately sized and spaced letters, misspelled words). He grew up in a bilingual environment and his mother tongue (L1) was Spanish. Even though he was born in Catalonia, he learned Catalan at the age of 14 years when he entered a Catalan-speaking high school. However, he had difficulty mastering the Catalan accent to the extent that he disliked speaking this language. As an adult he lived in England and the USA for short periods and he reported problems learning English vowels, but otherwise his grammar and vocabulary were above average. He attended normal schooling and passed all grades uneventfully. He progressed adequately in high school and college and obtained a Law degree (see further details in psychiatric status and social-occupational adjustment section). Subject 2 commented that he received music lessons during childhood but had difficulty singing even the easiest melodies. Moreover, he reported having problems imparting affective intonation and that his linguistic prosody in everyday communication was abnormal. During adolescence, when he wanted to ask a question, he ended the phrase by saying "I am asking you."
Neurological examination revealed "soft neurological signs" (Hollander et al., 1990) including impaired finger-to-nose coordination on the right side and right/left confusion. He also had mild buccofacial apraxia.

#### Analysis of Foreign Accent Perception

The two subjects and the healthy control subjects signed an informed consent for participation after receiving an explanation of the aims and methodology of the study according to the Declaration of Helsinki. The study protocol was approved by the Ethical Committee of University of Malaga, Malaga, Spain. In order to rule out perception deficits, both subjects were evaluated with a battery of perception tests previously used in our laboratory with FAS patients (see Moreno-Torres et al., 2013). This battery includes both segmental (i.e., minimal pairs of words and non-words) and suprasegmental tasks (i.e., lexical stress → /'PA.pa/ vs. /pa.'PA/; intonation → exclamative vs. interrogative). Because the subjects scored at ceiling on both types of task, further analysis focused on production aspects.

#### Production

Since speech production deficits in these subjects were expected to be mild, spontaneous speech samples were considered more informative than repetition data. Thus, one 15 min sample of informal conversation was obtained for each subject. The conversation was audiotaped for later analysis using a FOSTEX-LE2 recorder and an Audiotechnica AT2035 microphone. The recording took place in a silent room in our laboratory. In addition, to allow for direct comparison with our database from four healthy males, the two subjects were required to produce speech in two conditions: sentence repetition and non-word repetition. The sentence repetition subtest from the Psycholinguistic Assessments of Language Processing in Aphasia (PALPA; Kay et al., 1992; Valle and Cuetos, 1995) was used. This task (PALPA 12) evaluates the ability to repeat auditorily presented sentences (n = 36) of different lengths (from 5 to 9 words). It is composed of reversible sentences (n = 20) and nonreversible sentences (n = 16). Aguado's task was used to explore non-word repetition. This task has a total of 80 tokens divided into two sets of 40, for frequent and infrequent syllables, respectively. The two sub-lists are balanced on the number of non-words of each syllable length (range: 2–5 syllables; Aguado, 2011).

# Analysis of Production Data


#### Segmental Errors

Segmental errors in consonants were grouped as: voicing, nasalisation, place of articulation and manner of articulation. For vowels, errors were grouped as: place of articulation and manner of articulation. Voicing and manner of articulation errors in consonants were further classified as strengthening (fortition processes) or weakening (lenition processes; see Introduction).

#### Suprasegmental Errors

The analysis was based on the sentence repetition task and on the informal conversation. Two aspects were of particular interest: intonation contours and rhythm. Both aspects were analyzed with the help of Praat (Boersma and Weenink, 2010). For intonation contours we obtained measures of F0 range and form by examining the contrast between interrogative and declarative sentences. In order to explore syllable rhythm, we calculated the speech rate (syllables per second) and its variability (standard deviation).
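The rhythm measure can be illustrated with a minimal sketch. It assumes that speech rate is computed per utterance (syllables divided by duration) and that variability is the sample standard deviation across utterances; the segmentation into utterances, the function name `speech_rate_stats`, and the example values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def speech_rate_stats(utterances):
    """Return (mean, SD) of speech rate in syllables per second.

    `utterances` is a list of (syllable_count, duration_seconds) pairs.
    Using the sample SD (n - 1 denominator) is an assumption; the text
    does not state which estimator was used.
    """
    if len(utterances) < 2:
        raise ValueError("at least two utterances are needed for an SD")
    rates = [syllables / duration for syllables, duration in utterances]
    return mean(rates), stdev(rates)


# Hypothetical utterances: (syllables, seconds)
example = [(12, 2.0), (9, 1.5), (20, 2.5)]
rate_mean, rate_sd = speech_rate_stats(example)
print(f"{rate_mean:.2f} ± {rate_sd:.2f} syllables/second")
```

In practice the syllable counts and durations would come from annotated Praat intervals; that export step is outside this sketch.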

#### Degree of FAS

Ten students of Speech Pathology rated the degree to which the accent of subject 1, subject 2 and two healthy controls might sound foreign or native. The judges heard a total of four imitated sentences. A Likert scale was used to judge the degree of foreignness, with "1" corresponding to definitely foreign speaker; "2": probably foreign speaker; "3": probably native speaker; and "4": definitely native speaker. Judges were blind to the purpose of the study, other than having to rate items for foreignness. Anticipating that, due to the mild manifestation of FAS, judges might not detect the presence of a foreign accent, another pool of 10 judges was asked to rate the degree of regional accent of the subjects and also of three control subjects. In each case the controls and the judges were from the same region as the subject, Valencian for subject 1 and Catalan for subject 2. For this task, the scores of the Likert scale were "1" corresponding to definitely speaker of another dialect; "2": probably speaker of another dialect; "3": probably speaker of my dialect; and "4": definitely speaker of my dialect.
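The way a panel's ratings on the four-point foreignness scale reduce to a single mean score can be sketched as follows. The label table reproduces the scale described above; the function name `mean_rating` and the example panel are illustrative assumptions.

```python
from statistics import mean

# Scale anchors as described in the text.
FOREIGNNESS_LABELS = {
    1: "definitely foreign speaker",
    2: "probably foreign speaker",
    3: "probably native speaker",
    4: "definitely native speaker",
}


def mean_rating(ratings):
    """Average a list of 1-4 Likert ratings, rejecting out-of-scale values."""
    for rating in ratings:
        if rating not in FOREIGNNESS_LABELS:
            raise ValueError(f"rating {rating!r} is not on the 1-4 scale")
    return mean(ratings)


# Illustrative panel of ten judges: seven answer 4 and three answer 3.
panel = [4] * 7 + [3] * 3
score = mean_rating(panel)
print(round(score, 1))
```

A mean near 4 therefore indicates that most judges heard a native speaker, while values closer to 3 reflect doubt rather than a clear impression of foreignness.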

# Cognitive and Language Testing

Handedness was assessed with the Edinburgh Handedness Inventory (Oldfield, 1971) and general intellectual abilities with the Mini Mental State Examination (Folstein et al., 1975). Executive functions were assessed with the Trail-Making Test (parts A and B), the Hayling test (Burgess and Shallice, 1996) and the Stroop Color-Word Test (Stroop, 1935). Verbal attention and memory for word lists were assessed with the Wechsler Memory Scale III (WMS-III; Wechsler, 1997) and visual memory with the Rey-Osterrieth Complex Figure (Rey, 1941; Peña-Casanova et al., 2009). Language functions (phonological, lexical, and semantic) were tested with several subtests of the PALPA (Kay et al., 1992; Valle and Cuetos, 1995) and the Boston Naming Test (short version) (Kaplan et al., 2001). The Controlled Oral Word Association Task (COWAT; Borkowski et al., 1967) and Category Fluency (animal naming) were used to examine phonemic and semantic fluency, respectively (Tombaugh et al., 1999). Communication in activities of daily living was assessed with the Communicative Activity Log (CAL; Pulvermüller and Berthier, 2008). Although the CAL was devised for persons with aphasia, recent data demonstrated that it is also useful to identify decrements in the amount and quality of communication in persons with AFAS (Moreno-Torres et al., 2013). The CAL was completed by the patients and, in the case of subject 1, also by his partner.

# Psychiatric Testing

Since the subjects had a long-lasting history of obsessive-compulsive symptoms, anxiety, and depression, several psychiatric rating scales were administered. Personality and the impact of living with a non-native accent were also examined.

#### Obsessive Compulsive Disorder

The presence of OCD was assessed with the Leyton Obsessional Inventory (LOI; Cooper, 1970) and the Yale-Brown Obsessive Compulsive Scale (Y-BOCS; Goodman et al., 1989). The LOI is a 69-item questionnaire to rate obsessive symptoms (questions 1–46) and traits (questions 47–69). Subscales of the LOI to rate resistance to and interference of symptoms were not administered and symptom severity was rated with the Y-BOCS. The Y-BOCS is a rating scale intended for use as a semi-structured interview. It rates the obsessions and compulsions and their severity. Scores of symptom severity are as follows: subsyndromal: 0–7; mild: 8–15; moderate: 16–23; severe: 24–31; and extreme: 32–40.
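The Y-BOCS severity bands listed above amount to a simple lookup. The sketch below encodes those bands; the function name `ybocs_severity` and the input validation are assumptions added for illustration and are not part of the scale itself.

```python
# Severity bands as listed in the text: (lowest score, highest score, label).
YBOCS_BANDS = [
    (0, 7, "subsyndromal"),
    (8, 15, "mild"),
    (16, 23, "moderate"),
    (24, 31, "severe"),
    (32, 40, "extreme"),
]


def ybocs_severity(total_score):
    """Map a Y-BOCS total score (0-40) to its severity label."""
    if not isinstance(total_score, int) or not 0 <= total_score <= 40:
        raise ValueError("Y-BOCS total must be an integer from 0 to 40")
    for low, high, label in YBOCS_BANDS:
        if low <= total_score <= high:
            return label


print(ybocs_severity(18))
```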

#### Non-OCD Anxiety

Non-OCD anxiety was assessed with the Hamilton Anxiety Scale (HAS; Hamilton, 1959), a 14-item clinician-rated scale that measures psychic anxiety (mental agitation and psychological distress) and somatic anxiety (physical complaints related to anxiety). Each item is scored on a scale of 0 (not present) to 4 (severe), with a total score range of 0–56. Scores < 17 indicate mild severity, 18–24 mild to moderate severity, and 25–30 moderate to severe. Since some patients with FAS tend to avoid social contacts and have reduced functional communication (Miller et al., 2011; Moreno-Torres et al., 2013), the presence and severity of social phobia were assessed with the Social Phobia Inventory (SPIN; Connor et al., 2000). The SPIN is a 17-item self-rating scale that includes items assessing symptom domains of social anxiety disorder (fear, avoidance, and physiologic arousal). The SPIN items are measured by a choice of five answers based on a scale of intensity of social phobia ranging from "not at all" to "extremely." Overall assessment is done by total score, and a total score higher than 19 indicates a likelihood of social anxiety disorder (scores ranging from 21 to 30 indicate mild severity, 31 to 40 moderate severity, 41 to 50 severe, and 51 or more very severe). Subject 1 also met diagnostic criteria of the Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5; American Psychiatric Association [APA], 2013) for posttraumatic stress disorder (PTSD). These symptoms were evaluated with the 17-item Davidson Trauma Scale (DTS; Davidson et al., 1997), a self-report measure that assesses the 17 DSM-IV symptoms of PTSD. Items are rated on 5-point frequency (0 = "not at all" to 4 = "every day") and severity scales (0 = "not at all distressing" to 4 = "extremely distressing"). Subject 1 was asked to identify the trauma that was most disturbing to him and to rate, in the past week, how much trouble he had had with each symptom. The DTS yields a frequency score (ranging from 0 to 68), a severity score (ranging from 0 to 68), and a total score (ranging from 0 to 136).
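The three DTS scores follow directly from the item structure described above: 17 items, each rated 0–4 for frequency and 0–4 for severity. A minimal sketch of that arithmetic is given below; the function name `dts_scores` and the validation rules are illustrative assumptions.

```python
N_ITEMS = 17   # DTS items, one per PTSD symptom
MAX_ITEM = 4   # each item is rated from 0 to 4


def dts_scores(frequency_items, severity_items):
    """Sum item ratings into DTS frequency, severity and total scores."""
    for name, items in (("frequency", frequency_items),
                        ("severity", severity_items)):
        if len(items) != N_ITEMS:
            raise ValueError(f"{name} needs exactly {N_ITEMS} item ratings")
        if any(not 0 <= rating <= MAX_ITEM for rating in items):
            raise ValueError(f"{name} ratings must lie between 0 and {MAX_ITEM}")
    frequency = sum(frequency_items)
    severity = sum(severity_items)
    return {"frequency": frequency,
            "severity": severity,
            "total": frequency + severity}


# Maximum possible ratings reproduce the upper bounds of the three ranges.
ceiling = dts_scores([MAX_ITEM] * N_ITEMS, [MAX_ITEM] * N_ITEMS)
print(ceiling)
```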

#### Mood, Motivation, and Emotions


Depression was evaluated with the Hamilton Depression Rating Scale (HDRS), a 17-item interviewer-rated scale that measures psychological and autonomic symptoms of depression (Hamilton, 1959). Scores range from 0 to 52, with higher scores representing greater depressive symptoms. The presence of hopelessness was examined with the Beck Hopelessness Scale (BHS), a 20-item self-report inventory designed to measure three major aspects of hopelessness: (i) feelings about the future, (ii) loss of motivation, and (iii) expectations (Beck et al., 1974). Apathy was measured with the Apathy Scale (AS; Starkstein et al., 1992). The AS consists of 14 items rated on a 4-point scale. The maximum total score is 42 and higher scores indicate more severe apathy. Difficulties in recognizing and verbalizing feelings (alexithymia) were assessed using the self-report 20-item Toronto Alexithymia Scale (TAS-20; Bagby et al., 1994). Scores range from 20 to 100 and scores above 61 are considered abnormal.

#### Personality

Personality was assessed with the Zuckerman–Kuhlman Personality Questionnaire (ZKPQ; Zuckerman, 2002). The ZKPQ assesses basic dimensions of personality or temperament, particularly those which describe personality traits with biological-evolutionary roots. The ZKPQ consists of 99 dichotomous (true-false response) items distributed across five content scales, namely Neuroticism-Anxiety (19 items), Activity (17 items), Sociability (17 items), Impulsive Sensation Seeking (19 items), and Aggression-Hostility (17 items), and an Infrequent Scale which is used as a validity index. The psychometric properties in a large sample of Spanish subjects demonstrated that the ZKPQ is a valid self-report measure of personality traits (Gomà-I-Freixanet et al., 2008).

#### Personal Experience of Living with Changes in Accent

The personal experience of living with a change in accent was explored in both subjects using an unstructured interview which was mainly based on their testimonies. Three main topics were addressed during the interview (see Miller et al., 2011). These included: (i) details of accent change (initiation, type of accent) and related problems with prosody and communication; (ii) adjustment and coping strategies to accent; and (iii) reactions of family members and others to accent.

#### NEUROIMAGING

Magnetic resonance imaging studies were performed in the subjects with DFAS and in a group of 21 healthy control right-handed males. Healthy controls were matched with DFAS subjects by gender (all controls were male), age (mean age ± SD: 33.05 ± 10.03 years; range: 22–59 years) and education (controls: from college to university degree). Healthy controls were Spanish speaking male residents in Malaga with variable knowledge of other languages (English and French).

#### Image Acquisition

The MRI studies were performed on a 3-T MRI scanner (Philips Gyroscan Intera, Best, The Netherlands) equipped with an eight-channel Philips SENSE head coil. Head movements were minimized using head pads and a forehead strap. High-resolution T-1 structural images of the whole brain were acquired with a three dimensional (3D) magnetization prepared rapid acquisition gradient echo (3D MPRAGE) sequence (acquisition matrix: 240/256; field of view: 240 mm; repetition time (TR): 9.9 ms; echo time (TE): 4.6 ms; flip angle: 8°; turbo field echo (TFE) factor: 200; 0.8 mm × 0.8 mm × 0.8 mm resolution). One hundred eighty-two contiguous slices, 0 mm slice gap, were acquired. The total acquisition time of the sequence was about 4:24 min. In addition to the 3D MPRAGE, a standard axial T-2 weighted/FLAIR [TR = 11,000 ms; TE = 125/27 ms; 264 × 512 matrix; field of view (FOV) = 230 mm × 230 mm; 3-mm-thick slices with 1 mm slice gap] was obtained. DTI data acquisition was performed using multi-slice single-shot spin-echo echoplanar imaging (EPI) with specific parameters as follows: FOV 224 mm, 2-mm-thick slices with 0 mm slice gap, TE = 117 ms, TR = 12408 ms, and b factor: 3000 s/mm<sup>2</sup>. The EPI echo train length consisted of 59 actual echoes reconstructed in a 112 × 128 image matrix. Thirty-two diffusion directions were used, and the acquisition was repeated twice in order to enhance the signal-to-noise ratio.

#### Image Processing

Motion and eddy current correction were performed on the DTI images of the two subjects and the 21 healthy controls using the eddy current correction tool (Smith, 2004; Woolrich et al., 2009) within FMRIB's Diffusion toolbox (FDT) of FSL<sup>3</sup>. The BET tool was used to remove non-brain tissue from the images, and diffusion tensor estimation was carried out using the DTIFIT tool within FDT, with the least-squares estimation algorithm. Maps of fractional anisotropy (FA), mean diffusivity (MD), axial diffusivity (AD), and radial diffusivity (RD) were then obtained.
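For readers less familiar with the four DTI scalars, the sketch below computes them for a single voxel from the three eigenvalues of the diffusion tensor, using the standard textbook definitions. It is not the DTIFIT implementation; the function name `dti_scalars` and the example eigenvalues are hypothetical.

```python
import math


def dti_scalars(l1, l2, l3):
    """Compute FA, MD, AD and RD from three tensor eigenvalues.

    Standard definitions (assumed here, not taken from the study):
      MD = mean of the eigenvalues
      AD = largest eigenvalue
      RD = mean of the two smaller eigenvalues
      FA = sqrt(3/2) * ||lambda - MD|| / ||lambda||
    """
    ev = sorted((l1, l2, l3), reverse=True)
    md = sum(ev) / 3.0
    ad = ev[0]
    rd = (ev[1] + ev[2]) / 2.0
    numerator = sum((lam - md) ** 2 for lam in ev)
    denominator = sum(lam ** 2 for lam in ev)
    fa = math.sqrt(1.5 * numerator / denominator) if denominator > 0 else 0.0
    return {"FA": fa, "MD": md, "AD": ad, "RD": rd}


# Hypothetical white matter voxel, eigenvalues in mm^2/s.
voxel = dti_scalars(1.7e-3, 0.3e-3, 0.2e-3)
print({name: round(value, 6) for name, value in voxel.items()})
```

FA is 0 for perfectly isotropic diffusion and approaches 1 when diffusion is confined to a single direction, which is why it is commonly read as an index of white matter microstructure.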

The T1 scans were AC-PC oriented and segmented into gray matter, white matter and cerebrospinal fluid with the New Segment tool within SPM12. Then the Diffeomorphic Anatomical Registration Through Exponentiated Lie Algebra (DARTEL; Ashburner, 2007; Bergouignan et al., 2009; Klein et al., 2009) toolbox was applied to the segmented tissue maps in order to register them to the stereotactic space of the Montreal Neurological Institute (MNI). The B0 maps were used for a rigid coregistration of the DWI maps to the T1 AC-PC oriented images. This was carried out with the FLIRT tool from the FSL toolbox. Then a non-rigid coregistration of the FA images to the space of the white matter native segment was performed with the FNIRT tool. The same transformation was applied to the other DWI maps. This was done in order to remove distortions of the DWI images and adapt them to the T1 scan before normalization. The DARTEL tissue deformations were then used to normalize the participants' FA, MD, AD, and RD maps to the MNI space. Finally, the resulting FA, MD, AD, and RD maps were written with an isotropic voxel resolution of 1.5 mm × 1.5 mm × 1.5 mm and smoothed with a 12 mm full width at half maximum (FWHM) Gaussian kernel.

<sup>3</sup>http://www.fmrib.ox.ac.uk/fsl/
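The size of the smoothing kernel can be made concrete by converting the FWHM into the standard deviation of the Gaussian, using the general relation sigma = FWHM / (2·sqrt(2·ln 2)). The sketch below applies it to the 12 mm kernel and 1.5 mm voxels reported above; the function name `fwhm_to_sigma` is illustrative, and the conversion is a property of Gaussians rather than a description of how SPM12 implements smoothing.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian full width at half maximum to its standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


fwhm_mm = 12.0   # smoothing kernel reported in the text
voxel_mm = 1.5   # isotropic voxel size reported in the text

sigma_mm = fwhm_to_sigma(fwhm_mm)
sigma_voxels = sigma_mm / voxel_mm
print(f"sigma = {sigma_mm:.2f} mm = {sigma_voxels:.2f} voxels")
```

The kernel therefore spans several voxels in each direction, which improves sensitivity to spatially extended effects at the cost of spatial precision.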

# Statistics

A voxel-based analysis on the FA, MD, AD, and RD maps was performed with SPM12. Given that each of the two subjects could present with specific abnormalities, each of them was separately compared to the group of healthy controls in a two-sample t-test. Moreover, to study whether there were effects shared by the two subjects with DFAS, a two-sample t-test was also performed between the group of the two subjects and the control group. A set of brain regions was pre-selected to perform voxel-based comparisons with small-volume correction (SVC). Regions-of-interest (ROIs) were selected from TD Labels, WFU PickAtlas Tool. Specific ROIs previously described to be involved in AFAS (Katz et al., 2012; Moreno-Torres et al., 2013; Tomasino et al., 2013) and in associated psychiatric disorders (OCD, social phobia, depression, neuroticism, and alexithymia; Servaas et al., 2013; van der Velde et al., 2013; Aghajani et al., 2014; Piras et al., 2015) were selected. The following regions were included for analysis: bilateral orbital cortex, left inferior frontal gyrus, left precentral gyrus, left insula, left inferior parietal lobe, left posterior cingulate, left inferior temporal gyrus, left orbital cortex, left anterior cingulate, left cingulate gyrus, left superior frontal gyrus, left caudate, left lentiform nucleus, left middle frontal gyrus, and right middle frontal gyrus. In this analysis, voxels were regarded as significant when falling below a corrected voxel threshold of 0.05 (family-wise error, FWE) adjusted for the small volume. In an initial analysis, differences were also assessed at the whole-brain level, in order to find any significant effects taking place in other areas. Voxels with p < 0.005 (uncorrected, cluster size ≥ 100 voxels) were considered as indicating a significant difference between subjects and the control group. Both SVC and whole-brain analyses were also carried out for four randomly selected control subjects to rule out false positives. In the whole-brain analysis, clusters above the selected level were found for control subjects in various cases; therefore whole-brain results were not considered significant for the patients.
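When one group contains a single subject, a pooled-variance two-sample t-test reduces to a simple expression in which all of the variance information comes from the controls. The sketch below shows that calculation for one voxel or one ROI value. It assumes the equal-variance form of the test and is not a reproduction of the SPM12 model; the function name `single_case_t` and the example values are hypothetical.

```python
import math
from statistics import mean, stdev


def single_case_t(case_value, control_values):
    """t statistic and degrees of freedom for one case vs. a control group.

    With a single case the pooled variance equals the control variance, so
    t = (case - control mean) / (control SD * sqrt(1 + 1/n)), df = n - 1.
    """
    n = len(control_values)
    if n < 2:
        raise ValueError("at least two controls are required")
    control_mean = mean(control_values)
    control_sd = stdev(control_values)  # sample SD, n - 1 denominator
    t_value = (case_value - control_mean) / (control_sd * math.sqrt(1.0 + 1.0 / n))
    return t_value, n - 1


# Hypothetical FA values at one voxel.
controls = [0.40, 0.42, 0.44, 0.46, 0.48]
t_value, dof = single_case_t(0.34, controls)
print(f"t({dof}) = {t_value:.3f}")
```

With 21 controls, as in the present sample, the same expression would be evaluated against a t distribution with 20 degrees of freedom.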

# RESULTS

# Linguistic Data

#### Subject 1

Results revealed that linguistically and phonologically subject 1 could be considered a typical speaker. In the speech perception tasks he scored at ceiling (words minimal pairs: 20/20; non-words minimal pairs: 19/20; lexical stress: 10/10; intonation: 20/20). As anticipated due to the mildness of FAS, subject 1 also scored at ceiling in the sentence imitation task (35/36) and in the non-word imitation task (77/80). Detailed linguistic analyses indicated that the production of subject 1 was phonologically very close to typical. At the prosodic level, there was no evidence of atypical production in any of the tasks: subject 1 produced the typical F0 contours (e.g., descending intonation for declaratives, ascending intonation for interrogatives), and the F0 range could be considered as typical (e.g., neither flat nor excessively variable). At the segmental level, subject 1 did produce some errors; however, they represented less than 3% in the non-word repetition task, and less than 1% in the spontaneous speech sample. Altogether, this suggests that subject 1 did not have relevant phonological difficulties. In contrast, phonetically his production was not fully typical, as observed particularly in the frequent production of consonant strengthening. The phenomenon was relatively frequent with the /b/ phoneme (19/58), and less frequent with the /d/ (4/21). Strengthening was not observed with the /g/. Fortitions were especially frequent in specific phonetic contexts: in /lb/ sequences (9/9), as in "alberto" (Albert); in /rb/ sequences as in "árbol" (tree; 2/7); in /sd/ sequences as in "desde" (from; 3/5); in /θd/ as in "en vez de" (instead of; 1/1). Acoustic analyses provided further evidence of atypical phonetic characteristics of subject 1. In typical Spanish speakers there is a clear contrast in terms of amplitude or intensity between the VOT of a voiceless stop and the following vowel, whereas in subject 1 the two sounds had similar amplitudes (**Figure 1A**). This might indicate that subject 1 was reinforcing the consonant, or that he was weakening vowels. In order to measure this phenomenon, the intensity contrast of 20 /kV/ sequences (in a VkV context; e.g., /k/ and /o/ in the sequence /pako/) was calculated. The same measure was obtained from four typical speakers. The results revealed that the mean contrast in the controls was 18.6 dB (range: 15.5–21.0 dB) whereas in subject 1 the contrast was 11.1 dB. An independent samples t-test confirmed that the difference was significant (p < 0.01) between subject 1 and each of the controls. Seven judges considered that subject 1 was definitely a native speaker and the remaining three expressed some doubts and could only state that subject 1 was probably a native speaker. Accordingly, the mean score was 3.7. This score was not significantly different from the one obtained by the controls (range: 3.8–4). A difficulty emerged with the evaluation of the dialectal origin by the naïve judges. The judges did not agree on the dialectal origin of the controls (i.e., some controls were classified as non-local). A close examination of the results revealed that the judges' ratings might be influenced by the fact that in Valencia there are two official languages (Valencian and Spanish). Some judges classified as Valencian only those control subjects whose Spanish showed a strong influence of Valencian (e.g., with a strong /l/) (Briz, 2004). For the rest of the controls, whose first language was Spanish, and also for subject 1, the judges disagreed, which indicates that the regional accent of subject 1 was not easily distinguishable from that of the control subjects.

#### Subject 2

Similarly to the other subject in this study, subject 2 scored at ceiling in various tasks. In the perception tasks the scores were 20/20 (words minimal pairs), 18/20 (non-words minimal pairs), 9/10 (lexical stress), and 20/20 (intonation). In the sentence imitation task the score was 36/36. However, the results of the non-word repetition task and the spontaneous speech sample indicated that his production was not fully typical, both at the suprasegmental and at the segmental levels. At the suprasegmental level, preliminary analyses indicated that while the speech rate and rhythm in the repetition tasks were normal, in the spontaneous sample the speech rate was atypically high, which seemed to reduce intelligibility. Also, the speech rate in the non-word and the sentence repetition tests was almost identical to the speech rate of one control subject (mean syllables/second ± SD; subject 2: 4.6 ± 8; control subject: 4.7 ± 9). In contrast, the rate in the speech sample was atypically high (8.7 ± 1.9 syllables/second). Data from the four healthy control subjects showed that the mean was 7.1 ± 2.1 syllables/second. This suggests that while in the imitation tasks subject 2 tended to use the same speech rate as the model, in free speech he accelerated his emissions. Segmental errors were observed in all the conditions. In the non-word repetition test, subject 2 incorrectly produced 11% of the consonants (i.e., these errors were four times more frequent than in subject 1). Errors were varied, including consonant distortions and substitutions, and syllable or word structure errors. Consonant errors included nasalizations (n = 2), denasalization (n = 1), place of articulation (n = 2), manner of articulation (n = 2), and l > R (n = 2). Syllable/word structure errors included vowel insertion (n = 2), consonant insertion (n = 1), consonant omission (n = 1), and metathesis (n = 3). Errors were also varied in the other two conditions (i.e., sentence imitation and spontaneous speech). However, in the spontaneous speech sample errors were more severe and also included diverse suprasegmental errors. For instance, 3% of the words were completely unintelligible. Further, subject 2 tended to omit consonants (>4%), syllables (>2%) and some grammatical words (e.g., determiners). An illustrative example of accelerated production in subject 2 is shown (**Figure 1B**). Altogether, these results indicate that the difficulties in speech production were clearly greater in the spontaneous speech samples than in the imitation tests. The mean score in the accent perception task was 3.5. This result indicates that five judges (50%) were unsure about the origin of subject 2's accent. This score was not significantly different from that of the controls. This result is compatible with the data presented above showing that while the speech of subject 2 was mostly typical, he did produce some errors which may have raised the doubts of the judges. The exploration of the dialectal origin of subject 2 raised the same problems that had been observed with subject 1 (i.e., the judges classified as dialectal only those controls who spoke Spanish with a very strong Catalan accent). In addition, the judges disagreed as to the origin of subject 2 and the other controls.

[…] were part of a long utterance ("Pues es complicado conocer gente allí; pues me relaciono con gente en general"; "So, it is not easy meeting people there. I interact with different people"). Subject 2 omits five out of eleven phonemes and distorts one vowel. The resulting sequence is non-intelligible and out of context.

# Cognitive Findings

Results from cognitive and language testing of both subjects are shown in **Table 1**. Both subjects were right-handed and they did not show general cognitive deficits. Both subjects had normal language production in the sense that they did not show word finding difficulties or grammatical and syntactic anomalies. Nevertheless, subject 2 had some disruption in the flow of verbal messages (fast rate of speaking, excessive collapsing, or deletion of syllables; see above) indicative of cluttering (St. Louis and Schulte, 2011). On the PALPA, both subjects obtained normal scores in subtests tapping phonology, lexical and semantic processing, although subject 2 had a slightly decreased performance on auditory lexical decision of non-words. Their performance on semantic and phonological fluency was normal. Executive functions were normal, but subject 2 showed impaired performance on part B of the Sentence Completion Hayling Test, showing greater response latencies and difficulty inhibiting incorrect words. Scores on verbal learning and memory for word lists ranged from low average to well above average, whereas visual memory was moderately impaired in both subjects, a finding already reported in subjects with OCD (Shin et al., 2014). Communication in activities of daily living (CAL) was decreased in both subjects, particularly in subject 1, most likely due to social phobia. Further item-by-item analysis of the CAL revealed that subject 1 had major problems in communicating with foreigners, with people he did not know, in offices, stores or public institutions, under stress or when he was tired. Also, he rarely responded verbally to criticisms. Self-reflections on communication skills measured with the CAL in subject 1 were more negative than the ones reported by his partner (a 26-year-old female) on the same scale (subject 1: frequency: 42; partner: frequency: 65, p = 0.0009, Fisher Exact Test, two-tailed; subject 1: quality: 53; partner: quality: 68, p = 0.0266, Fisher Exact Test, two-tailed). Scores on the CAL were also abnormal in subject 2, particularly in making statements or reports about facts, using the telephone, answering questions asked by others, under stress, or when he was tired.

# Psychiatric Status

Both subjects fulfilled DSM-5 (American Psychiatric Association [APA], 2013) criteria for OCD, generalized anxiety disorder, depression, and for other psychiatric conditions (**Table 2**). The two subjects obtained abnormal scores on obsessional traits (LOI-traits items) and also reported multiple obsessions and compulsions (LOI-symptoms items). On both the LOI and Y-BOCS symptom checklist they described aggressive and contamination obsessions as well as compulsions including checking, repeating, and hoarding rituals (subject 1) and cleaning/washing, checking, repeating, counting, and hoarding (subject 2). Subject 1 also described perfectionism and pathological doubting, whereas subject 2 reported "need to know" obsessions and engaged in making excessive lists and mental play. Both subjects had generalized anxiety disorder without panic attacks or agoraphobia and subject 1 additionally met diagnostic criteria for social phobia, PTSD, and alexithymia. Subject 2 had "possible" alexithymia and although he described discomfort in some social situations, he did not meet criteria for social phobia. Both patients also had depression, hopelessness, and apathy. Assessment of personality with the ZKPQ revealed increased scores in the Neuroticism-Anxiety factor in both subjects and decreased scores in the Sociability factor in subject 1. Neither subject met diagnostic criteria for hypochondriasis or other somatoform disorders.

# Social-Occupational Adjustment

Both subjects reported that, after learning of the existence of FAS through the media, they sought to confirm their provisional self-diagnosis. In interviews assessing their personal experience of living with FAS, both stated that they first realized they spoke with a foreign accent during adolescence, when classmates commented on their unusual intonation. Notably, none of their close relatives was aware of the accent change in either subject. The persistent developmental stuttering of subject 1's father might explain why mild speech production problems in subject 1 went unnoticed by his relatives. Subject 1 commented that entering high school was a terrifying and overwhelming experience because his new classmates repeatedly asked him whether he was a foreigner because of his manner of speaking. Most classmates (and, later on, other people) considered that his accent variously resembled French, American English, Argentinean or Mexican Spanish, Rumanian, or Italian, while others simply contended that his accent sounded foreign. One adverse consequence of the accent change in subject 1 was that his classmates used to call him "the foreigner." He was also deeply ashamed and unhappy that he was not recognized as a citizen of Valencia. He abstained from speaking in social situations to the extent that he remained silent when he met somebody on the street. By that time, subject 1 had developed intrusive and avoidant symptoms (PTSD) and had become obsessed with his foreign accent. Moreover, when somebody asked him about either his origin or his manner of speaking, he ruminated about these questions for several days. He described no close friends and reported that his atypical accent and social anxiety made it difficult for him to establish new social relationships. Subject 2 also realized that his speech sounded foreign when he was alerted by his classmates at high school. Over several years, naïve listeners (classmates and coworkers) considered that his accent variously resembled Spanish from Lerida (a region in western Catalonia), South America, or the Canary Islands, or French, while others simply contended that his accent sounded foreign or uncommon. The most negative consequence of his accent was that it prevented him from obtaining a qualified job as an attorney or commercial worker.

#### TABLE 1 | Cognitive testing.

<sup>∗</sup>Caplan and Mendoza, 2011; <sup>∗∗</sup>Valle and Cuetos, 1995; <sup>†</sup>Normative data from Tombaugh et al., 1999; <sup>§</sup>Normative data from Tombaugh, 2004; <sup>&gt;</sup>Normative data from Burgess and Shallice, 1996; <sup>¶</sup>Peña-Casanova et al., 2009.

#### TABLE 2 | Psychiatric testing.

<sup>∗</sup>The LOI was used to assess the presence of obsessive and compulsive symptoms, whereas the Y-BOCS was additionally used to rate symptom severity; <sup>†</sup>scores > 14 on the Hamilton Anxiety Scale are clinically significant (Bech, 1993); <sup>¶</sup>a cut-off score of 19 distinguishes well between clinical populations of phobia patients (Connor et al., 2000); <sup>§</sup>scores ≥ 61 = alexithymia, scores between 52 and 60 = possible alexithymia (Bagby et al., 1994); <sup>&gt;</sup>indicates that both patients were 1 standard deviation above the mean (Gomà-I-Freixanet et al., 2008), whereas <sup>‡</sup>indicates 1 standard deviation below the mean (Gomà-I-Freixanet et al., 2008).
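The cut-offs given in the footnote to Table 2 can be written out as a small sketch. The function names and the label for scores under 52 are illustrative assumptions, and the example scores are hypothetical rather than the subjects' data.

```python
def classify_alexithymia(score):
    """Label an alexithymia score using the cut-offs in the Table 2 footnote."""
    if score >= 61:
        return "alexithymia"
    if score >= 52:
        return "possible alexithymia"
    # The footnote does not name this band; the label is an assumption.
    return "below cut-off"


def hamilton_anxiety_significant(score):
    """Return True when a Hamilton Anxiety Scale score exceeds 14."""
    return score > 14


# Hypothetical scores for illustration only (not the subjects' data).
for score in (45, 52, 60, 61):
    print(score, classify_alexithymia(score))
```

Under this reading, subject 1 would fall in the first band and subject 2 in the second, which matches the "alexithymia" and "possible alexithymia" labels used in the text.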

Both subjects believed that the deviation of their accents from the native prototype hampered their chances of obtaining a job commensurate with their qualifications. In addition, subject 2 reported that on one occasion, while living in Madrid, he failed to obtain a job as a teacher of Spanish for foreigners after a telephone interview because of his distinct accent. Subject 1 had worked at home as a freelance website designer because he had difficulty obtaining other types of jobs. Both subjects had adjustment problems related to the consequences of living with FAS, yet the negative impact was more pronounced in subject 1. On the SPIN (a scale to assess social anxiety), subject 1 obtained his highest scores on questions related to verbal communication ("I avoid talking to people I don't know," "fear of embarrassment causes me to avoid speaking to people," "talking to strangers scares me," and "I avoid having to give speeches"), in part because he disliked his accent and tone of voice. As already stated, he also reported symptoms of increased psychological sensitivity and arousal as well as avoidant behavior consistent with PTSD. In answering the DTS question that enquires about the precipitating event(s) for PTSD, he considered that the only traumatic event that led to PTSD was the consequence of living with FAS.

# NEUROIMAGING FINDINGS

#### Structural MRI

Magnetic resonance imaging in subject 1 showed a venous angioma close to the head of the left caudate that crossed the medial frontal lobe white matter to drain into the superior sagittal sinus (Lee et al., 1996; Osborn et al., 2004; **Figure 2**). There was also a mild dilatation of the left lateral ventricle, but the MRI showed no other structural brain anomalies. In subject 2, the MRI showed expanded perivascular spaces (EPVS) mainly involving both insular cortices (Song et al., 2000; **Figure 2**).

#### Diffusion Tensor Imaging

Although changes of white matter microstructure were found in both subjects, comparison of the two subjects, taken together, with a group of 21 age- and gender-matched healthy controls revealed no differences. Individual analyses of DTI-based microstructural changes showed significant changes in both subjects in some predefined ROIs of the left hemisphere in comparison to the healthy control group. No changes were found in the right hemisphere with this methodology. **Table 3** shows MNI coordinates and cluster sizes of the different diffusion parameters in subject 1; these parameters for subject 2 are shown below. Voxel-based comparisons with SVC (p < 0.05, FWE corrected) in subject 1 showed significantly increased values of MD, AD, and RD in regions surrounding the venous angioma and affecting the superior frontal gyrus, medial frontal gyrus, and anterior cingulate gyrus. Increased AD and MD were also identified in subject 1 in the posterior cingulate gyrus (**Figure 3**). In subject 2, small clusters of decreased MD (superior frontal gyrus, seven voxels, x = −20, y = 52, z = −10) and AD (superior frontal gyrus, nine voxels, x = −18, y = 54, z = 10) were found in the left frontal lobe (**Figure 3**). Analysis of four healthy control subjects revealed false-positive results in only one of them, in two clusters. One cluster of 12 voxels was found in the left inferior temporal gyrus (x = 45, y = −66, z = −2) and another cluster of two voxels was found in the left superior frontal gyrus (x = −22, y = 4, z = 57).
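For readers less familiar with the diffusion parameters reported here, the sketch below shows the standard way MD, AD, and RD are derived from the three eigenvalues of the diffusion tensor in a single voxel. It is a minimal illustration, not the processing pipeline used in the study; the function name and the example eigenvalues are hypothetical.

```python
from typing import NamedTuple


class DiffusivityMetrics(NamedTuple):
    md: float  # mean diffusivity
    ad: float  # axial diffusivity
    rd: float  # radial diffusivity


def diffusivity_metrics(eigenvalues):
    """Return MD, AD and RD for one voxel from three tensor eigenvalues."""
    # Sort in descending order so that l1 is the principal eigenvalue.
    l1, l2, l3 = sorted(eigenvalues, reverse=True)
    return DiffusivityMetrics(
        md=(l1 + l2 + l3) / 3.0,  # average of all three eigenvalues
        ad=l1,                    # diffusion along the principal axis
        rd=(l2 + l3) / 2.0,       # average diffusion across that axis
    )


# Hypothetical eigenvalues in units of 10^-3 mm^2/s (illustrative, not study data).
example = diffusivity_metrics([1.7, 0.3, 0.4])
print(example)
```

Because MD is the mean of all three eigenvalues, a simultaneous rise in AD and RD, as reported for subject 1, necessarily raises MD as well.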

# DISCUSSION

In the present study, we have described the case of two adult subjects presenting with mild DFAS. Speech development was presumably normal in subject 1, yet his father had persistent developmental stuttering. Subject 2 also had a positive history of stuttering and now shows cluttering. Our findings suggest that DFAS can occur in the context of different speech and language deficits (Mariën et al., 2009), a finding previously reported in AFAS (Coleman and Gurd, 2006; Katz et al., 2012; Moreno-Torres et al., 2013). Although both subjects had reported that naïve listeners (classmates, friends, and acquaintances) frequently perceived their speech as foreign, naïve judges did not confirm this possibility. Given these results, it could be argued that these two subjects should not be described as FAS cases. However, detailed phonetic analyses revealed that both subjects produced errors that might explain why, under certain circumstances, they were perceived as foreigners. Subject 1 frequently produced consonant strengthening errors. These errors consist of clearly producing a sound that is typically weakened. Consonant weakening is very selective and is constrained by rhythmic factors. For this reason, whenever a speaker cannot use the local rhythm, he or she strengthens sounds that are normally weak. Typically, this happens in foreign speakers and in FAS cases, both of which tend to articulate too slowly for weakening processes to occur. In other words, the subtle phonetic errors observed in subject 1 might be caused by an underlying suprasegmental (i.e., rhythmic) deficit, similar to the ones observed in other FAS cases (Ingram et al., 1992).

#### TABLE 3 | Areas of abnormal DTI-derived parameters in subject 1 compared to 21 healthy control subjects.

<sup>∗</sup>FWE indicates family-wise error; <sup>∗∗</sup>peak coordinates represent the location of the maximum pixel values in standard Montreal Neurological Institute (MNI) space. MD+ indicates elevated mean diffusivity, AD+ elevated axial diffusivity, and RD+ elevated radial diffusivity. WM indicates white matter and L, left side of the brain.

As for subject 2, our results indicated that his speech was also rhythmically atypical, with a tendency to accelerate excessively, resulting in an increased percentage of consonant errors, particularly omissions, and even unintelligible production. Such a speech disorder is suggestive of a mild form of cluttering (St. Louis and Schulte, 2011). The current working definition of cluttering (St. Louis et al., 2007) is a fluency disorder characterized by a speech rate that is perceived to be abnormally rapid, irregular, or both. These rate abnormalities may further result in one or more of the following features: (1) excessive disfluencies; (2) abnormal pauses, syllable stress, or speech rhythm; and (3) inappropriate degrees of co-articulation among sounds, especially in multisyllabic words. This means that while both FAS and cluttering are prosodic deficits that cause segmental impairments, cluttering is characterized by atypical speech rhythm whereas FAS is characterized by slow rhythm and possibly atypical intonation patterns (Blumstein and Kurowski, 2006; Moreno-Torres et al., 2013). In light of this syndromic overlap, it is not surprising that the speech production of subject 2 was perceived as foreign by many listeners. Note also that these rhythmic errors were not observed in imitation, which suggests that when subject 2 has a model he can regulate the rhythm of his utterances. However, he might not be able to regulate it by himself during narrative.

Altogether, this suggests that there are similarities and differences between subjects 1 and 2. The clearest similarity is that in both subjects the underlying linguistic deficit seems to be related to prosody, and particularly to rhythm, a deficit that results in varied segmental errors. This result would support the proposal that the core deficit in some FAS cases is a prosodic one (Blumstein et al., 1987). The difference between the two subjects is that the prosodic deficits are qualitatively different. Subject 1 tended to articulate slowly in all task conditions (i.e., imitation and spontaneous speech), which might be related to a difficulty in planning or executing phonological programs (Moreno-Torres et al., 2013). Subject 2 instead tended to articulate too rapidly, and exclusively in spontaneous speech, a pattern reflecting his inability to regulate rhythm.

# Cognitive Deficits and Psychiatric Comorbidity

Comprehensive testing of cognitive functions in previous DFAS cases yielded mixed results (see details in Mariën et al., 2009; Keulen et al., 2016a). One such patient had discrepant performance (better verbal than non-verbal scores) on intelligence and memory tests, scoring in the lower range on visual search and sequencing (Patient TL, Mariën et al., 2009). Another patient had abnormal performance on abstract concept formation, set shifting, and maintaining goal-oriented strategies as well as abnormal visual memory with low average scores on visual-motor integration and coordination (Keulen et al., 2016a). However, the remaining patient had normal cognitive functioning (Patient KL, Mariën et al., 2009). Analysis of these data indicates not only that non-linguistic cognitive deficits are not a prerequisite for the occurrence of DFAS, but also that the detection of such deficits may point to the dysfunction of other neural networks or of a single network subserving more than one function. Cognitive testing in our subjects also revealed deficits beyond the speech-language domain, yet we found fewer and milder impairments than in previous DFAS cases (Mariën et al., 2009; Keulen et al., 2016a). Although further studies are required to investigate the nature of such deficits, we suggest that the pattern of impairment in our subjects might be related to comorbid psychiatric disorders rather than being a central constituent of DFAS. Indeed, the most consistent and severe deficit in our subjects was in visual memory (delayed reproduction of the Rey–Osterrieth Complex Figure), an alteration frequently reported among individuals with OCD (Shin et al., 2014). In addition, subject 2, who had attentional deficits during childhood, demonstrated poor inhibition of automatic responses in part B of the Hayling Sentence Completion Test.

Changes in communication have been described in previous FAS cases (Miller et al., 2011; Moreno-Torres et al., 2013) and our subjects were no exception. However, in the present cases communication deficits may be directly linked to DFAS, at least in subject 2. Both subjects had decreased communication, but the origin of the deficit may be different in subject 1, whose scores on the CAL were lowered mainly by increased anxiety and avoidance in social situations involving speech (specific social phobia; Stein and Deutsch, 2003), a combination previously reported among individuals with stuttering (Iverach and Rapee, 2014).

Mood and behavior in previous cases of DFAS were not affected (Mariën et al., 2009; Keulen et al., 2016a). By contrast, our subjects had psychiatric complaints, a warning sign that prompted a detailed psychiatric evaluation. The link between psychological factors (e.g., personality traits, anxiety, and stress) and developmental speech-language disorders has been repeatedly mentioned in the literature (Snowling et al., 2006; Dietrich et al., 2012; Gunn et al., 2014; Karukivi and Saarijärvi, 2014) and involves a complex and multifaceted cross-talk. For example, a central question is whether psychological symptoms, once established, persist in individuals with developmental speech-language disorders or whether they actually develop in later phases as a consequence of delayed communication (Alm, 2014; Iverach and Rapee, 2014). In the case of DFAS, note that even if accent changes emerge during early childhood, the emotional impact of this way of speaking may not be apparent until adolescence, when the use of language for social adjustment is more demanding (Mariën et al., 2009). Furthermore, studies of children with specific language impairment have shown that this diagnosis during childhood bears some relation to adult psychosocial outcomes (Snowling et al., 2006; Whitehouse et al., 2009; Karukivi et al., 2012). Data from the two subjects described herein illustrate the extent to which living with a mild form of DFAS may have consequences for social, emotional, and occupational adjustment. Psychiatric examination in subjects 1 and 2 revealed a variety of disorders, specifically internalizing disorders including OCD, non-OCD anxiety, social phobia (subject 1), depression, apathy, and personality features of obsessive-compulsive disorder, neuroticism-anxiety, reduced sociability (subject 1), and alexithymia.

This constellation of psychiatric disorders did not conform to previous reports of psychogenic FAS (Reeves and Norton, 2001; Verhoeven et al., 2005; Poulin et al., 2007; Reeves et al., 2007). Rather, the psychopathology associated with DFAS in our subjects was remarkably similar to that reported in children and adults with stuttering and cluttering (Alm, 2014; Gunn et al., 2014; Iverach and Rapee, 2014). This was not totally unexpected in our cases, as subject 1 had a positive family history of persistent developmental stuttering in his father and subject 2 had stuttering during childhood and now shows cluttering.

# Gross Structural Brain Anomalies: Incidental or Symptomatic?

In the present study, structural brain anomalies were identified on the MRI of both subjects: a venous angioma in the left frontal lobe in subject 1 and EPVS in both insular regions in subject 2. Some radiological studies and textbook descriptions have considered these gross anomalies to be incidental MRI findings lacking clinical relevance (Song et al., 2000; Osborn et al., 2004). Nonetheless, venous angiomas involving the frontal lobe can be symptomatic and present with psychiatric symptoms even in the absence of hemorrhagic complications (Nagaratnam et al., 1990; Watanabe et al., 2005). Cerebral venous angiomas are presumably secondary to a primary dysplasia of capillaries and small transcerebral veins or represent a compensatory mechanism caused by an intrauterine occlusion of a normal venous system (Lee et al., 1996). Parenchymal changes adjacent to the venous angioma have been described (Santucci et al., 2008) and correspond to demyelination, gliosis, leukomalacia, and neuronal degeneration (Noran, 1945) or result from compensatory adaptive changes secondary to hemodynamic disturbances (Kimura and Mitake, 2001; Watanabe et al., 2005; Santucci et al., 2008). Our findings suggest that the anatomical distribution of the venous angioma in subject 1 may have altered intrinsic connectivity within the frontal lobe, as is probably reflected by the significant increases of MD, AD, and RD in regions (left superior frontal gyrus, anterior cingulate gyrus) close to the venous angioma. In subject 2, structural MRI revealed multiple confluent EPVS involving anterior and posterior insular regions bilaterally. The nature and clinical significance of EPVS are still under debate (Dávila et al., 2010) and it has been suggested that EPVS involving the insular cortex lack clinical relevance (Song et al., 2000). Nevertheless, this subject had mild DFAS, poor motor coordination, and difficulties in sequencing complex motor tasks (mild buccofacial apraxia), all abnormal features previously linked to insular involvement (Tognola and Vignolo, 1980; Moreno-Torres et al., 2013). Thus, our findings align with results from previous studies linking EPVS with developmental disorders including autistic disorder (Boddaert et al., 2009), Tourette's syndrome (Dávila et al., 2010), and coordination disorder (Brockmann et al., 2009).

#### Abnormal White Matter Microstructure

Note that visual inspection of structural MRI in all three cases of DFAS reported so far failed to find structural brain anomalies (Mariën et al., 2009; Keulen et al., 2016a), indicating that the responsible pathological substrate is below the resolution of the naked eye and passes undetected unless more sophisticated neuroimaging methods (e.g., DTI, voxel-based morphometry, positron emission tomography) are used (Moreno-Torres et al., 2013). In fact, functional imaging in a case of DFAS disclosed decreased perfusion in several regions (Mariën et al., 2009) including areas (bilateral prefrontal cortex, medial frontal regions, and cerebellum; Keulen et al., 2016a) of the large-scale bilateral speech production network (Wise et al., 1999; Riecker et al., 2005; Guenther et al., 2006; Sörös et al., 2006; Eickhoff et al., 2009; Ackermann and Riecker, 2010; Bohland et al., 2010). Our study is the first to use DTI to examine microstructural changes in DFAS. Comparison of our two subjects together with the group of age- and gender-matched healthy controls revealed no differences in DTI parameters. However, individual DTI-based analysis restricted to predefined ROIs revealed subtle abnormalities. Subject 1 showed elevated MD, AD, and RD values relative to controls in key components of speech and emotion regulation networks. Altered DTI-based parameters were found in the left hemisphere involving the superior frontal gyrus, the middle frontal gyrus, and the anterior and posterior cingulate cortex. These regions have been reported to be involved in AFAS (see Moreno-Torres et al., 2013) and in previous neuroimaging studies of the psychiatric disorders presented by our subjects (e.g., OCD, social phobia, alexithymia, neuroticism; Pujol et al., 2004; Fan et al., 2012; Servaas et al., 2013; Karukivi and Saarijärvi, 2014). In subject 2, small clusters of decreased MD and AD were found in the left superior frontal gyrus.

Thus, the left superior frontal gyrus and its surrounding regions most likely played a role in the clinical symptoms of both subjects as well as in another case of DFAS (Keulen et al., 2016a). Nevertheless, MD, AD, and RD were increased in subject 1, whereas MD and AD were decreased in subject 2. This difference might arise because the altered white matter integrity around the venous angioma in subject 1 may have a different pathological substrate from the changes in subject 2. Changes in diffusion parameters in both subjects may variously result from disrupted neural organization during development, maladaptive neural plasticity resulting from a cognitive and emotional bias toward negative emotions (Koganemaru et al., 2012), or inhibition of emotional expression due to heightened reappraisal (Giuliani et al., 2011) triggered by the negative and stressful consequences of having speech production deficits (Dietrich et al., 2012).

Our findings also revealed that co-occurring mild FAS and developmental speech disorders (atypical phonetic production, cluttering) may be associated with abnormal emotion regulation. Some regions related to the emergence of foreign accent (insula and cingulate gyrus) are also important for the expression of the personality traits (neuroticism and alexithymia) and psychiatric disorders (OCD, anxiety, social phobia, PTSD, depression, apathy, and hopelessness) diagnosed in our subjects (Pujol et al., 2004; Servaas et al., 2013; van der Velde et al., 2013; Aghajani et al., 2014; LeWinn et al., 2014; Piras et al., 2015). Involvement of these regions, and also of frontal lobe regions (middle frontal gyrus) important for motor control of voice and speech production, might explain the dysfunctional interaction between personality traits, response to stress, and speech production in our subjects (Dietrich et al., 2012). Thus, an altered interplay between a biological trait-like diathesis (shyness and neuroticism) and the stressful experience of living with DFAS might explain the development of internalizing psychiatric disorders during late adolescence.

# AUTHOR CONTRIBUTIONS

All authors listed have made substantial, direct, and intellectual contributions to the work, and approved it for publication. MLB, NRV, IMT, and GD were involved in conception and design, acquisition of data, or analysis and interpretation of data. MLB, IMT, GD, MJTP, and IDT performed language, cognitive, and behavioral evaluations. NRV, CF, KTH, RRC, FA, and JPP interpreted neuroimaging data. MLB, NRV, IMT, and GD drafted the article and revised it critically for important intellectual content.

# ACKNOWLEDGMENTS

The authors thank subjects 1 and 2 and the healthy control subjects for their participation in the study. KT-H and JP-P are supported by "Universidad de Málaga. Programas de Becas de Iniciación a la Investigación 2014." The authors also thank César Ávila and Julio González for their advice on analyzing Valencian accent.

# REFERENCES





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Berthier, Roé-Vellvé, Moreno-Torres, Falcon, Thurnhofer-Hemsi, Paredes-Pacheco, Torres-Prioris, De-Torres, Alfaro, Gutiérrez-Cardo, Baquero, Ruiz-Cruces and Dávila. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Foreign Accent Syndrome As a Psychogenic Disorder: A Review

Stefanie Keulen<sup>1,2</sup>, Jo Verhoeven<sup>3,4</sup>, Elke De Witte<sup>1</sup>, Louis De Page<sup>5</sup>, Roelien Bastiaanse<sup>2</sup> and Peter Mariën<sup>1,6</sup>\*

<sup>1</sup> Department of Linguistics and Literary Studies, Clinical and Experimental Neurolinguistics, Vrije Universiteit Brussel, Brussels, Belgium, <sup>2</sup> Department of Linguistics, Center for Language and Cognition, Rijksuniversiteit Groningen, Groningen, Netherlands, <sup>3</sup> Department of Language and Communication Science, School of Health Sciences, City University London, London, UK, <sup>4</sup> Department of Linguistics, Computational Linguistics and Psycholinguistics Research Center, Universiteit Antwerpen, Antwerp, Belgium, <sup>5</sup> Department of Psychology, Clinical and Lifespan Psychology, Vrije Universiteit Brussel, Brussels, Belgium, <sup>6</sup> Department of Neurology and Memory Clinic, ZNA Middelheim General Hospital, Antwerp, Belgium

In the majority of cases published between 1907 and 2014, FAS is due to a neurogenic etiology. Only a few reports about FAS with an assumed psychogenic origin have been published. The present article discusses the findings of a careful database search on psychogenic FAS. This review may be particularly relevant as it is the first to analyze the salient features of psychogenic FAS cases to date. This article hopes to pave the way for the view that psychogenic FAS is a cognate of neurogenic FAS. It is felt that this variant of FAS may have been underreported, as most of the psychogenic cases have been published after the turn of the century. This review may improve the diagnosis of the syndrome in clinical practice and highlights the importance of recognizing psychogenic FAS as an independent taxonomic entity.

Keywords: foreign accent syndrome, psychogenic, non-organic FAS, speech disorder, review

# Edited by:

Srikantan S. Nagarajan, University of California, San Francisco, USA

#### Reviewed by:

Gianluca Serafini, University of Genoa, Italy Stéphane Poulin, Université Laval, Canada

#### \*Correspondence: Peter Mariën peter.mariën@vub.ac.be

Received: 27 July 2015 Accepted: 04 April 2016 Published: 27 April 2016

#### Citation:

Keulen S, Verhoeven J, De Witte E, De Page L, Bastiaanse R and Mariën P (2016) Foreign Accent Syndrome As a Psychogenic Disorder: A Review. Front. Hum. Neurosci. 10:168. doi: 10.3389/fnhum.2016.00168

## INTRODUCTION

For over a century, researchers have reported on a motor speech disorder most frequently referred to as "Foreign Accent Syndrome" (FAS). The first patient with FAS was anecdotally described by Marie (1907). The term "FAS" was later coined by Whitaker (1982), who also proposed a set of diagnostic criteria: (1) "the accent is considered by the patient, by acquaintances and by the investigator, to sound foreign"; (2) "it is unlike the patient's native dialect before cerebral insult"; (3) "it is clearly related to central nervous system damage (as opposed to an hysteric reaction, if such exist)"; (4) "(t)here is no evidence in the patient's background of being a speaker of a foreign language (i.e., this is not like cases of polyglot aphasia)" (Whitaker, 1982, pp. 196 and 198). These criteria only apply to one of the three FAS subtypes in the taxonomic classification recently developed by Verhoeven and Mariën (2010), who distinguished between a neurogenic (including a developmental subtype), a psychogenic, and a mixed variant of FAS.

Psychogenic FAS is defined by Verhoeven and Mariën (2010) as "the variant in which the foreign accent of the patient is grounded in underlying psychological issues" (p. 601). It is also referred to as "non-organic," "functional," or "psychosomatic" FAS. Aronson and Bless (1990) have expressed a clear preference for the term "psychogenic" because this term has "the advantage of stating positively, based on an exploration of its causes, that the [...] disorder is a manifestation of psychological disequilibrium such as anxiety, depression, personality disorder, or conversion reaction [...]" (p. 121). In general, this "sub-category" contains all the cases of FAS in which an organic substrate cannot be identified after careful clinical neurological, neuroradiological, and/or neurophysiological examination, and for which a clear psychological factor is identified (e.g., Verhoeven et al., 2005) as well as the cases for which it is hypothesized that a disclosed organic deficiency cannot be held responsible for the FAS (e.g., Gurd et al., 2001; Van Borsel et al., 2005). The latter is not uncommon.

According to Baumgartner (1999) several researchers in speech and language pathology have published cases in which a clear neurological impairment was identified, but the speech or voice disorder was convincingly argued to be of psychogenic origin (Tippett and Siebens, 1991; Baumgartner and Duffy, 1997). Baumgartner (1999) emphasizes the importance of carefully considering the patient's medical history, meticulously interpreting the symptoms, and evaluating the coherence between different observations. If medical history, onset of symptoms, symptom characteristics and their evolution, neurological examinations, neuroimaging, and cognitive workup do not unambiguously point toward a neurological disorder, an alternative interpretation should be considered.

This article presents a detailed review of FAS cases with an assumed psychogenic etiology published between 1907 and July 2014. The focus of the investigation is on the associated psychopathologies, the onset and remission of the accent, the type of accent, the segmental and suprasegmental characteristics contributing to the perception of the patient's accent as "foreign," as well as the comorbid speech and/or language symptoms.

The goal of this review is to analyze the main features of psychogenic FAS in order to shed more light on this taxonomic variant and facilitate the diagnosis in clinical practice.

### METHODS

The available literature on (psychogenic) FAS was identified by means of regular searches in online electronic databases (Web of Knowledge, ScienceDirect, PubMed, Medline, PsycINFO), using the following keywords in Boolean search: "foreign accent syndrome," "FAS," "psychogenic AND FAS," "psychogenic AND foreign accent syndrome." The reference sections of all relevant articles were scanned to identify additional references. All articles published between 1907 and July 2014 were included. Only original case descriptions were retained for this review, as some of the data were re-used by the same or other authors in later publications. Inclusion criteria for psychogenic FAS were: (1) the onset of a foreign accent, (2) the presence of, or indication(s) for, psychological/psychiatric symptoms, and (3) the absence of neurological damage that could explain the speech and/or language symptomatology.

#### RESULTS

# Demographic Characteristics and Associated Psychopathologies

The initial database search resulted in a corpus of 129 articles reporting instances of FAS (regardless of the etiology). However, at least 24 cases were published twice or more. Only original case reports were included for the counts in this section. Fifteen of the 105 (original) FAS cases published between 1907 and July 2014 matched the inclusion criteria of psychogenic FAS (see **Table 1**). The putative psychogenic FAS cases represent 14% of all published FAS cases (n = 15/105). Two case reports [cases 3, 8] were reported twice<sup>1</sup>. Sixty-seven percent of the included patients are women (n = 10/15), and 33% are men (n = 5/15). The mean age of patients with assumed psychogenic FAS is 48 years and 1 month (range: 30–74 years, SD: 12 years and 9 months). Men had a mean age of 56 years and 2 months (range: 30–74 years, SD: 17 years and 8 months) and women 44 years and 1 month (range: 32–54 years, SD: 7 years and 11 months). The patient's occupation was mentioned in only a few case reports (n = 5/15) [cases 3, 5, 8, 10, 12]. Education levels were never stated. Five patients are described as right-handed [cases 2, 5, 8, 11, 12]. However, handedness was only formally assessed in one case (case 5: right-handed; Edinburgh Handedness Test; Oldfield, 1971). For the remaining cases [1, 3, 4, 7, 8, 10, 13–15], handedness was not indicated. Two patients were self-proclaimed monolinguals [cases 8, 9], whereas two were definitely polyglots [case 5: Dutch-French-English, case 10: English-Spanish]. In case 5, FAS affected both Dutch and English, but French was perfect on all linguistic levels (suprasegmental, segmental, morphology, syntax). In case 10, however, it was not mentioned to what extent the patient's proficiency in Spanish was affected.

As far as the psychological disorder is concerned, 33% of the cases presented with conversion disorder (n = 5/15) [cases 5, 9–12], 13% with schizophrenia (n = 2/15) [cases 3, 6], 13% with bipolar disorder (n = 2/15) [cases 7, 8], 13% with obsessive-compulsive disorder (OCD) (n = 2/15) [cases 14, 15], 7% with post-traumatic neurosis (n = 1/15) [case 1], and 7% with mania (n = 1/15) [case 13]. In 13% of the cases, no clear psychological disorder was associated with the FAS (n = 2/15) [cases 2, 4] (see **Table 1**). However, for these cases neurological and neurophysiological examinations as well as neuroimaging were regarded as incompatible with a neurogenic etiology, and it was concluded that the FAS had to be non-organic in nature.
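The proportions and the pooled mean age reported above can be re-derived from the stated counts. The sketch below is only an arithmetic check on the figures given in the text; the variable and function names are illustrative, and rounding to whole percentages and whole months is an assumption about how the values were reported.

```python
def percent(count, total):
    """Return count/total as a whole-number percentage."""
    return round(100 * count / total)


total_original_cases = 105  # original FAS case reports, 1907 to July 2014
psychogenic_cases = 15      # cases meeting the psychogenic inclusion criteria

# Counts per associated disorder, as listed in the text.
diagnoses = {
    "conversion disorder": 5,
    "schizophrenia": 2,
    "bipolar disorder": 2,
    "obsessive-compulsive disorder": 2,
    "post-traumatic neurosis": 1,
    "mania": 1,
    "no clear psychological disorder": 2,
}

# The categories should account for every psychogenic case exactly once.
assert sum(diagnoses.values()) == psychogenic_cases

psychogenic_share = percent(psychogenic_cases, total_original_cases)
diagnosis_percentages = {
    name: percent(count, psychogenic_cases) for name, count in diagnoses.items()
}

# Pooled mean age from the subgroup means, computed in months.
men_count, men_mean_months = 5, 56 * 12 + 2
women_count, women_mean_months = 10, 44 * 12 + 1
pooled_months = (
    men_count * men_mean_months + women_count * women_mean_months
) / (men_count + women_count)
pooled_years, pooled_extra_months = divmod(round(pooled_months), 12)

print(psychogenic_share, diagnosis_percentages)
print(pooled_years, "years", pooled_extra_months, "month(s)")
```

The subgroup means for men and women combine to the overall mean age stated in the text, and the seven diagnostic categories sum to the 15 included cases.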

## Phonetic Characteristics

Neurogenic FAS has been associated with a very diverse set of segmental and suprasegmental pronunciation characteristics, often with great inter-patient variability. While some studies primarily investigated the phonetic and acoustic characteristics of FAS, others focused on the pathophysiological substrate of the syndrome (see also Ingram et al., 1992; Kanjee et al., 2010). This dissociation equally applies to psychogenic FAS: some researchers have focused on the identification of the associated psychopathology and the link between the psychological disorder and FAS (e.g., Reeves and Norton, 2001; Reeves et al., 2007), whereas others described the segmental and suprasegmental transformations in speech (Verhoeven et al., 2005; Haley et al., 2010). The speech characteristics are listed in **Table 2**.

All the speech characteristics in **Table 2** have been reported for patients with neurogenic FAS as well. It seems that in patients considered as psychogenic, vowels are more often affected than consonants, and this also seems to hold for neurogenic patients (Ingram et al., 1992; Miller et al., 2006; Katz et al., 2008; Van der Scheer et al., 2014). Moreover, the nature of the changes is different for vowels and consonants: consonants are mainly affected by substitutions, omissions, and additions, whereas vowel errors mostly consist of substitutions, vowel lengthening, and additions.

<sup>1</sup>The case reported by Reeves and Norton (2001) was reported again in Reeves et al. (2007; case 3) and the case reported by Poulin et al. (2007) is identical to the case reported by Roy et al. (2012, case 1). However, all the available information was used for further analyses.

#### TABLE 1 | Overview of the psychogenic case reports (literature review: 1907–July 2014).

Relevant information (from left to right) includes the age, gender and handedness of the patients, their medical history, the neurological and neuroradiological exams, the psychological or psychiatric disorder, the accent, and the comorbid speech and language disorders.

#### Accents Associated with Psychogenic FAS

**Table 3** shows the variety of accents associated with psychogenic FAS.

In 9 out of 15 cases (60%) the accent changed between geographical variants of the same language [cases 1, 3, 6, 7, 11–15]. In 9 cases (60%) the mother tongue was a variant of English (either American or British, or a regional variant) [cases 1–3, 6, 7, 10–13]. In four cases, other variables, such as pathological language mixing [case 5] and code switching [cases 3, 14, 15], might have created the impression of FAS.

## Onset and Remission of the Accent

An acute onset of FAS occurred in 7 cases [cases 3, 6–8, 13–15]. In these cases, FAS was associated with mania [case 13], bipolar disorder [cases 7, 8], and obsessive-compulsive disorder [cases 14, 15]. In the patients with schizophrenia [cases 3, 6], the accent change co-occurred with a psychosis. The patients who did not suffer psychiatric symptoms related the onset of their FAS to a motor vehicle accident [cases 1, 11], a "near-accident" [case 5], the possibility of MS [case 2], a whiplash trauma 9 years prior to consultation for FAS or a consultation with an otolaryngologist for a change of voice quality after a minor head trauma [case 4], or admission to hospital for the sudden onset of sensory and gait symptoms [cases 9, 10, 12]. In 47% of the FAS cases considered psychogenic, the onset of the accent was delayed in comparison to the occurrence of the adverse life event that the patients themselves held responsible for the FAS [cases 2, 4, 5, 9–12]. In 5 of these cases, the patients were diagnosed with a conversion disorder [cases 5, 9–12].

In 27% of the cases (n = 4/15) [cases 3, 6, 7, 13], the accent resolved simultaneously with the associated psychiatric disorder. In two cases (13%) [cases 4, 10] FAS resolved spontaneously. In all other patients [1, 2, 5, 8, 9, 11, 12, 14, 15], FAS remained present throughout follow-up. In case 5, scores on the Minnesota Multiphasic Personality Inventory (MMPI; Butcher et al., 1989) and Dissociation Questionnaire-Revised (DISQ-R; Vanderlinden et al., 2009) were near the accepted mean, but the accent persisted.

Only three patients received speech-language therapy to reduce FAS [cases 4, 10, 12]. Van Borsel et al. (2005) applied auditory masking and delayed auditory feedback (see also the comments of Moreno-Torres et al., 2013). However, these interventions did not resolve FAS. Case 10 received a symptomatic intervention for psychogenic voice and speech disorders (Duffy, 2005). However, progress did not transfer to conversational speech, and the accent suddenly resolved after the patient had quit outpatient therapy for several weeks. Case 12 agreed to behavioral speech therapy as well (targeting the production of individual speech segments), but she quit after one session for reasons that were not disclosed.

Cases marked by an asterisk are cases for which formal phonetic and acoustic analyses were carried out. For the remaining cases, the characteristics were noted based on perceptual (impressionistic) phonetic analysis.

#### TABLE 3 | Overview of the different accents associated with FAS.

For patients whose accent change resolved during follow-up [cases 3, 4, 6, 7, 10, 13], the period between accent onset and remission was about 63 days on average, i.e., 9 weeks (range: 6 days–6 months, SD: 71 days). The patient described by Reeves and Norton (2001) [case 3] was re-admitted to hospital three times, and this was taken into account in the calculation of the duration. In 60% of the cases [cases 1, 2, 5, 8, 9, 11, 12, 14, 15] the accent did not resolve. In these patients, the period between accent onset and last follow-up shows that the accent persisted for 45 months on average<sup>2</sup> (range: 15 months–8 years; SD: 28 months and 2 days).

## Psychodiagnostic and Neuropsychological Testing

Formal psychodiagnostic testing was carried out in three patients (see **Table 4**). In case 5, the results obtained on the MMPI-2 in 1995 showed a conversion V-pattern. The conversion V-form designates a markedly low score on the depression scale (scale D): the conversion suppresses depression, which explains lower scores on scale D. On the other hand, it is associated with increased physical sensations, thereby increasing scores on the hypochondriasis scale and hysteria scale (Leavitt, 1985). The second patient's profile showed an elevated degree of defensiveness (K: 70) and hysteria (Hys: 61). The restructured clinical scales revealed marginally elevated scores for depression (RC2: 66) and somatic complaints (RC1: 57). The elevated scores on the hysteria scale in conjunction with the somatic complaints (although only marginally elevated) are additional arguments to suspect conversion disorder, though the typical V-pattern was not found. Although exact scores were not provided, a conversion-V profile was also found on the MMPI-2 for case 12 (code type 1-3/3-1 is generally associated with conversion disorder). Scores on the neuroticism scale of the NEO-PI-R were low, which indicates stable personality and emotions, calmness, but also a decreased reactiveness to everyday situations (Nelson, 2014). The patient scored in the average range for the extraversion, agreeableness and conscientiousness scales. No mention was made of scores for openness to experience. The SCL-90-R is a "90-item self-report symptom inventory" (Derogatis and Savitz, 1999) in which the patient rates the severity of a series of psychiatric symptoms. These are grouped around nine dimensions: somatization, obsessive-compulsiveness, interpersonal sensitivity, depression, anxiety, hostility, phobic anxiety, paranoid ideation, and psychoticism (Domino and Domino, 2006). Only one clinical score was mentioned, i.e., for the somatization scale (T = 65). This agrees well with the profile elicited on the MMPI-2. The STAI is a self-report scale for anxiety consisting of two 20-item scales. The patient indicates (1) how he/she feels now (state) and (2) how he/she feels generally (trait) (Lam et al., 2005). Scores on the STAI were subclinical. Finally, the BDI-2 is a self-report inventory, which consists of a series of statements concerning complaints. The patient notes how he/she feels about the statements, taking into account his/her psychological status over the last week. Scores on the BDI-2 were equally subclinical.

<sup>2</sup>The exact duration is unknown. The calculated figure is entirely dependent upon the duration of the follow-up for reported case studies.

#### TABLE 4 | Overview of the patients subjected to psychodiagnostic tests.

MMPI-2, Minnesota Multiphasic Personality Inventory-2; DISQ-R, Dissociation Questionnaire Revised; BDI-2, Beck Depression Inventory-2; NEO-PI-R, Neuroticism Extroversion Openness Personality Inventory, Revised; SCL-90-R, Symptom Checklist-90-Revised; STAI, State Trait Anxiety Inventory.

Formal neuropsychological investigations were carried out in only a small number of case studies. General cognition, memory, attention, executive functioning, and language were assessed in 4 cases [cases 5, 8, 11, 12]<sup>3</sup> (see **Table 5**).

In case 9, only intelligence was investigated. In cases 3, 4, 6, 7, and 10 only language testing was performed. Neuropsychological examination consisted of a variety of tests (**Table 5**).

Cognitive performance was "within normal limits" (p. 715, Gurd et al., 2001) for case 2 and average to above average on all tasks in case 5. In case 8, memory and attention were normal, but the patient gave evidence of difficulties with short-term memory (Brown Peterson Task: mean of interference scores: 42%; norm: 97.22%, SD: 4.46), as well as with attention control and executive functions (Stroop test: Stroop effect: 249′′, norm: 142.4′′, range: 88–204′′; TMT-A: 61′′, norm: 41.3′′, SD: 15′′ and TMT-B: 253′′, norm: 111.4′′, SD: 72.2′′). In case 9, results on the WAIS-R were within the normal range (VIQ = 96, PIQ = 107, and FSIQ = 101). Case 11 presented poor executive functions (Stroop test, Interference <1 pc., and TMT-B: 83′′, mean: 56.0, SD: 21.2), problems with attention, and poor processing speed (TMT-A: 43′′, mean: 23.8, SD: 6.9; Stroop test A: 101′′, <1 pc.). Case 12 demonstrated impaired intelligence, memory, attention, executive functions and fine-motor skills: WAIS-III (FSIQ = 65, VIQ = 76, PIQ = 60); Trail Making Test (146′′); Grooved Pegboard (dominant hand: 149′′, mean = 85′′, range: 48′′–121′′; non-dominant hand: 130′′, mean = 101′′, range: 47′′–152′′); and Green Word Memory Test (immediate = 87.5, delayed = 77.5, consistency = 70.0).

#### TABLE 5 | Overview of the patients subjected to neuropsychological tests.

MMSE, Mini Mental State Examination; WAIS, Wechsler Adult Intelligence Scale; WMS, Wechsler Memory Scale; TMT, Trail Making Test; WRAT, Wide Range Achievement Test; CVLT, California Verbal Learning Test; RAVLT, Rey Auditory Verbal Learning Test; CLQT, Cognitive Linguistic Quick Test; BVMT-R, Brief Visuospatial Memory Test-Revised; HDS, Hierarchic Dementia Scale; ADAS, Alzheimer's Disease Assessment Scale; BNT, Boston Naming Test; PPTT, Pyramid and Palm Tree Test; MAE, Multilingual Aphasia Examination; BDAE, Boston Diagnostic Aphasia Examination; AAT, Akense Afasie Test (Dutch version); SAN-test, Stichting Afasie Nederland; DO-80, Test de Dénomination Orale d'Images; PENO, Protocole d'Evaluation Neuropsychologique Optimal.

2\*: possibly only two subtasks of the BDAE were administered: the non-verbal and the verbal agility test.

4\*: only written language via AAT; sentence comprehension and word retrieval (animals) SAN-Test.

8\*: letter and category fluency.

10\*: auditory word and sentence comprehension, sentence repetition, and oral and written spelling MAE.

11\*: word reading and spelling tests of the WRAT; sentence repetition task, as well as the aural and reading comprehension task MAE.

12\*: repetition skills, auditory comprehension, token task, and reading comprehension MAE.

<sup>3</sup>Gurd et al. (2001) (case 2) report that "Neuropsychological examination showed verbal and performance IQs, short- and long-term memory, naming, reading and spelling skills which were within normal limits" (p. 715). However, for IQ measures and evaluation of mnestic functions, it is not clear which tests were presented.

Frontiers in Human Neuroscience | www.frontiersin.org April 2016 | Volume 10 | Article 168 |

Most patients in whom language was assessed obtained average to above-average results [cases 3–7, 10]. Case 2, however, had impaired oral agility as demonstrated by the BDAE (nonverbal agility: 4/12 and verbal agility: 7/12). Case 8 presented with (severely) depressed scores on phonemic and semantic category fluency (letter fluency: 5, mean: 45.46, SD: 16.4; category fluency: 14, mean: 47.85, SD: 9.8). Case 11 obtained depressed scores on most tasks evaluating speech and language: WRAT (reading: 43, pc. 6; spelling: 43, pc. 37), MAE sentence repetition (A: 2, <pc. 1 and B: 3, <pc. 1), and verbal fluency (F, A, S: 19, pc. 2). Case 12 also demonstrated low average to impaired scores on most of the administered tasks: the BNT score was considered low average (41/60). On the MAE the following scores were obtained: repetition: 5 (impaired); auditory comprehension: 15 (borderline impaired); token test (as part of MAE): 40 (low average); and reading comprehension: 16 (borderline).
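To compare the norm-referenced scores quoted above on a common scale, one option is to express each as a standard score, z = (score − norm mean) / norm SD. The sketch below is an illustration only: the case reports do not give z-scores, the function and variable names are hypothetical, and only those tests are included for which the text states both a norm mean and an SD.

```python
def z_score(score: float, norm_mean: float, norm_sd: float) -> float:
    """Return the distance of a score from the norm mean in SD units."""
    return (score - norm_mean) / norm_sd


# (case, test, patient score, norm mean, norm SD), copied from the text.
# Timed tests are in seconds, so a positive z means slower than the norm;
# for the accuracy and fluency measures a negative z means a poorer result.
reported_scores = [
    ("case 8", "Brown-Peterson", 42.0, 97.22, 4.46),
    ("case 8", "TMT-A", 61.0, 41.3, 15.0),
    ("case 8", "TMT-B", 253.0, 111.4, 72.2),
    ("case 8", "letter fluency", 5.0, 45.46, 16.4),
    ("case 8", "category fluency", 14.0, 47.85, 9.8),
    ("case 11", "TMT-A", 43.0, 23.8, 6.9),
    ("case 11", "TMT-B", 83.0, 56.0, 21.2),
]

z_scores = {
    (case, test): z_score(score, mean, sd)
    for case, test, score, mean, sd in reported_scores
}

for (case, test), z in z_scores.items():
    print(f"{case}, {test}: z = {z:+.2f}")
```

Whether a given z-value counts as impaired depends on the cut-off used by the original authors, which the case reports do not always state; the values are therefore descriptive only.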

#### Comorbid Speech and Language Disorders

Five cases presented additional speech and/or language deficits [cases 4, 5, 8, 11, 12], apart from FAS. Case 4 (Van Borsel et al., 2005) and case 12 (Jones et al., 2011) went through a period of pre-FAS mutism. In case 4 mutism was only documented by self-report. Van Borsel et al. (2005) noted that the patient's language was characterized by grammatical anomalies. This was also the case for the patient of Poulin et al. (2007) [case 8].

Case 5 implemented French syntax in native Dutch speech. Non-fluent expressive output was characterized by mistakes typically made by French learners of Dutch. Oral output of case 11 was initially considered as dysarthria, later as "apraxia of speech" (p. 1010). As mentioned, the patient obtained lower scores for verbal fluency (F,A,S), but also for sentence repetition (MAE A&B: pc. <1) and the reading and spelling tasks of the WRAT (reading: 43, pc. 6; spelling: 43, pc. 37). It could have been expected that these symptoms are related to neurological damage. Indeed, apraxia of speech is caused by structural damage to the anterior insula of the language dominant hemisphere (Dronkers, 1996). Nevertheless, contrary to expectations, repeat structural imaging of the brain (CT and MRI) did not disclose any damage. In addition, FAS was accompanied by "telegraphic speech" (irregularly deleting prepositions, for instance). In this particular case, the comorbid symptoms and the language deficits were regarded as "not credible" because the extent of the deficit did not correspond to neuroimaging findings. The patient was diagnosed with FAS of a non-organic nature because of inconsistencies in the language symptoms.

# DISCUSSION

#### Demographic Data

Analysis of the available literature suggests that psychogenic FAS is quite rare (14%; n = 15/105). During the past decade FAS has increasingly attracted the attention of the scientific community: 93% of the psychogenic FAS cases (n = 14) were published in a time span of only 12 years (2001–2013). The finding that there are more women with psychogenic FAS than men (67% are women, 33% are men) might be partly explained by the increased predisposition of women to several of the associated psychopathologies. Most mental disorders are also more prevalent among women than men (see also: World Health Organization, 2014). For schizophrenia, prevalence figures are estimated to be equal, irrespective of gender, though symptoms occur earlier in men (Angermeyer and Kühnz, 1988; Saha et al., 2005; National Institute of Mental Health, 2015). On the other hand, the analysis of the neurogenic population revealed a similar demographic distribution: 68.6% of the authentic (neurogenic) FAS cases were women (n = 59/86). Interestingly, Baker (2003) points out that it should also be taken into account that women are twice as likely as men to seek medical attention. The explanation for this demographic distribution thus remains speculative.

#### Associated Psychopathologies

Several different psychopathologies have been associated with FAS. In patients with schizophrenia, all FAS episodes co-occurred with a discontinuation of anti-psychotic drugs, which caused exacerbations [cases 3, 6]. In the bipolar patients, FAS also co-occurred with positive symptoms [cases 7, 8]. Reeves et al. (2007) put forward the hypothesis of a direct link between the manic/psychotic exacerbations and FAS in their patients, using the Positive and Negative Syndrome Scale (PANSS; Kay et al., 1987). They also suggested that FAS could have been related to a temporary disruption of the inhibition of the bilateral superior temporal gyri (STG) during exacerbations. The STG is inhibited in healthy controls when the left dorsolateral PFC is activated for word generation. It is hypothesized that FAS may have been caused by the intermittently suppressed neural circuitry.

Moreno-Torres et al. (2013) observed that the dopaminergic system may be disrupted in FAS patients. The intake of dopamine antagonists (olanzapine, risperidone) in cases 3 and 6 could have restored the neurotransmitter balance and diminished the FAS. Particularly in schizophrenic patients, the so-called "dopaminergic hypothesis" (Meltzer and Stahl, 1976; McCutcheon and Stone, 2015) agrees well with this theory. This hypothesis claims that positive symptoms in schizophrenia can be reduced by the intake of dopamine antagonists or dopamine D2-receptor blockers. It has also been shown that modulation of the dopaminergic system influences the functionality of the (pre)fronto-striato-pallidal-thalamic network, which is hypothesized by Reeves and Norton (2001) to be implicated in the accent change, and has been related to the occurrence of psychosis (Honey et al., 2003).

The symptoms of case 13 might be explained along the same lines, as excess dopamine transmission has been suspected to incite manic symptoms (Swerdlow and Koob, 1987; Cookson, 2013). Nevertheless, the pathophysiology of the two psychiatric disorders is characterized by subtle differences. In schizophrenia, abnormal activity occurs in the striatum and the prefrontal cortex, whereas in mania the activity may be located more toward the dorsal nigrostriatal pathways (Cookson, 2013). Even so, Cookson (2013) reported that antipsychotic drugs such as risperidone and olanzapine (dopamine antagonists, and more specifically the ones administered to the schizophrenic FAS cases 3 and 6) work well on manic symptoms, such as pressured speech. The speech of case 13 was marked by excessive pressure, increased speed, loudness and forcefulness. The patient's FAS resolved simultaneously with resolution of mania after pharmacological treatment.

In case 8, a psychiatrist related the accent change and sudden Spanish and German sounding words to a psychological problem at a subconscious level. Poulin et al. (2007) performed an <sup>18</sup>F-FDG-PET scan which demonstrated metabolic changes in the area of the left insular and anterior temporal cortex and a diffuse hypoperfusion affecting the frontal, parietal, and temporal lobes bilaterally. MRI of the brain showed a slight asymmetrical atrophy. All imaging was performed in euthymic state. The possibility that both the language and psychological disorder were consistent with the neuroradiological findings was considered. However, the alterations at a linguistic level remain odd, even in the light of the attested neuroradiological findings. For instance, the output of the patient, contrary to what is expected in cases of agrammatism, was fluent, and despite a hypoperfusion affecting the insula, articulation was perceived as normal in every respect. There was no sign of apraxia of speech-, dysarthria-, or aphasic-like symptoms. All of the investigated linguistic functions were normal, except for a deficit in letter and category fluency.

Cases 14 and 15 suffered from refractory OCD and were treated by means of deep brain stimulation (DBS). They both developed hypomanic behavior and started experiencing accent changes afterwards. The hypothesis of FAS due to an undetected lesion induced by the electrode implantation was excluded, as the accent only developed after the actual stimulation by the electrode and post-operative CT confirmed the absence of any additional structural brain damage. Furthermore, Polak et al. (2013) argue that lesions caused by DBS are smaller than those generally associated with FAS, including the peri-sylvian area, (pre-)motor area, and insula of the language dominant hemisphere. However, dysfunction of the previously mentioned cortico-striato-pallidal-thalamic loop has frequently been suspected to be the pathogenic mechanism behind OCD, and the function of this circuit is altered when the nucleus accumbens is targeted for DBS.

"Hysteria," or "hysteric reaction," the term Whitaker (1982) used as an exclusion criterion for FAS, is an outdated term for "conversion disorder" [cases 5, 9–12]. Conversion disorder was subsumed under the concept of "hysterical neuroses" in the DSM-II [American Psychiatric Association (APA), 1968]. According to Aronson and Bless (2011) a conversion reaction can affect any system requiring sensory or voluntary motor control and hence, also voice and speech. DSM-IV-TR [American Psychiatric Association (APA), 2000] criteria allow for such an interpretation as well, although the concept has frequently been the object of debate and is regarded as insufficiently well defined to allow for a conclusive diagnosis (e.g., Delis and Wetter, 2007; Stone et al., 2011). In all psychogenic FAS patients with conversion disorder or those patients for whom the hypothesis of a conversion disorder was raised, the shift in accent was never the "first" conversion symptom to occur: all case studies report more general physical discomforts that preceded the FAS. Gait and balance disturbances were especially common [cases 5, 9–12], but a range of sensory and motor problems also occurred, including tinnitus [case 9], left-sided weakness affecting face and arm [case 10], blurred vision [case 10], altered hearing [case 10], abnormal sensations in arms and legs [case 10], facial numbness [case 11], weakness in the right arm [case 11], deafness in the left ear [case 11], give-way weakness [case 12], and a right-side sensory loss [case 12].

In cases 2 and 4 an associated psychological disorder was not obvious; rather, there was a range of clinical observations and findings from radiological and neurophysiological investigations which suggested a potential psychogenic origin of FAS. Gurd et al.'s (2001) patient [case 2] was qualified as "psychogenic," even though CSF analyses revealed oligoclonal bands, a bio-marker of Multiple Sclerosis (MS), and EEG revealed transient spikes over the left temporal lobe. T2 hyper-intensities were found on MRI (judged clinically insignificant). It is therefore unclear whether patients suffering from MS (Gurd et al., 2001; Villaverde-González et al., 2003; Bakker et al., 2004; Chanson et al., 2009) really develop FAS as a consequence of their neurological disorder or of accompanying psychological distress. Grazioli et al. (2008) note that over 50% of the MS patients suffer from depression. Case 2 obtained borderline results on the Hospital Anxiety and Depression Scale (Zigmond and Snaith, 1983). The case of Bakker et al. (2004) was noted to have very "labile emotions" (p. 271). The case of Villaverde-González et al. (2003) had a history of depression as well as an elevated irritability (p. 1035). For the other patients, psychological well-being was not indicated.

Van Borsel et al.'s (2005) patient [case 4] had no demonstrable lesions on CT, and displayed no symptoms apart from a change of accent and some articulatory and grammatical difficulties. She had sustained a head trauma and whiplash 9 years earlier and had suffered from chronic headaches ever since. Her accent change had occurred after a visit to the otolaryngologist, approximately 1 month after she had suffered another minor head trauma. Van Borsel et al. (2005) diagnosed the speech disorder as non-organic FAS because of a psychiatric history (depression and suicidal ideation) which was related to marital problems; a completely normal neurolinguistic assessment apart from mild grammatical anomalies, articulatory difficulties, and an accent change; the absence of an organic deficit; and a spontaneous resolution of the accent 5 months after the initial visit.

Case 11 suffered a minor head trauma as well but developed FAS only 3 years later, associated with intermittent, atypical expressive language deficits, and apraxic as well as dysarthric symptoms. Initially, she also claimed that she was deaf in her left ear, but a hearing loss was formally ruled out. The patient displayed an "inconsistent" agrammatism, characterized by deletions of function words. She would use and subsequently omit the same words in a series of successive utterances. She also made other implausible mistakes, such as splitting numbers into digits. Given the high degree of automaticity of such numerical output, these errors are highly unlikely to occur in the absence of other language deficits. Since she passed most of the symptom validity tests, she was considered not to be feigning or malingering and was ultimately diagnosed with conversion disorder.

#### Segmental and Suprasegmental Characteristics

Patients with FAS of an assumed psychogenic etiology present with a variety of segmental and suprasegmental errors. At the segmental level, the picture more or less corresponds to what is generally found in neurogenic patients, including a dissociation between vowels and consonants (e.g., Katz et al., 2008). At the suprasegmental level, slow speech rate is often seen [cases 5, 8, 10–12]. Slow speech rate can be linked to slow processing speed, which may occur as a consequence of psychological and psychiatric impairment (e.g., depression, post-traumatic stress disorder, bipolar disorder, and schizophrenia). Analysis of (psychogenic) FAS-related segmental and suprasegmental errors has been predominantly impressionistic, except for a few cases in which (acoustic) measurements (e.g., fundamental frequency, speech intensity, and speech and articulation rate) were also included [cases 5, 8, 10, 12, 13]. Deviant intonation [cases 3, 6–13] is a function of pitch variation. Intonation was deviant in most patients with a reduced speech rate [cases 8, 10–12], but also in patients who spoke at a normal or even fast pace [case 13]. In four cases [cases 3, 6, 7, 13], deviant intonation may be associated with a psychopathology. In schizophrenia [cases 3, 6], difficulties with receptive affective prosody have been described (Rossell et al., 2013). However, Hoekert et al. (2007) state that dysfunctional expressive affective prosody also characterizes the speech profile. The manic patient of Lewis et al. (2012) demonstrated fast speech [FAS: 229 wpm; baseline speech (BL): 173.9 wpm; average speech rate: 190 wpm, based on Yorkston et al. (1996)] and a pitch level that was considerably higher during FAS than during the baseline condition (conversational speech; FAS: 265.63 Hz, BL: 160.56 Hz; average F0 for a woman: 160–225 Hz, based on Baken, 1987; Titze, 1994) (see also: Hanwella and de Silva, 2011). Speech rate is negatively correlated with the size of the vowel space in non-brain-damaged subjects, i.e., a higher speech rate leads to a more compressed vowel space, which was exactly what Lewis et al. (2012) found in their patient. This compression could explain the reduced intelligibility of speech in comparison to the BL conversation sample (FAS: 73% vs. BL: 100% intelligible): contrasts between vowels diminish and vowel duration is shortened (Chen et al., 1983; Turner et al., 1995; Weinrich and Simpson, 2014).

#### Accent Change

The overview of the different accents of the analyzed cases shows that there does not seem to be any consistency. However, some interesting observations can be made. Firstly, it is striking that in 7 out of 15 cases (47%) the accent changed from the standard language variant to a regional one, or the other way round. In 9 cases (60%) the mother tongue was some variant of English: either British English [cases 1, 2] or American English [cases 3, 6, 7, 10–13]. FAS is frequently documented in Anglo-Saxon media<sup>4</sup>; as a result, the syndrome is more commonly known among lay people. For some cases more than just the accent gave the listeners the impression of a very specific foreign accent: language mixing (e.g., case 6) and code switching [cases 3, 14, 15] were also observed. Code switching can be defined as switching between language varieties or registers within a single conversation. For case 3, this involved the use of words such as "blokes" instead of the usual American variant "friend." Case 14 occasionally<sup>5</sup> used a dialectal variant of Dutch, while case 15 employed a vocabulary typifying a more formal register, e.g., the patient used words such as "public toilet" instead of the more informal "loo." The alterations in Polak et al.'s (2013) patients could be related to DBS, as such linguistic modifications can occur after stimulation. Verhoeven et al.'s (2005) 51-year-old female patient (case 5) occasionally used French words, made literal translations from French to Dutch, and adopted syntactic structures resembling the Dutch of second-language learners. It has to be mentioned that this patient had been a teacher of Dutch in a French company based in Holland, and this may have rendered her very conscious of mistakes generally made by French learners of Dutch. These symptoms constitute another point of difference between the neurogenic and psychogenic patient population, as the insertion of foreign words or regional expressions was previously only noted in a case of Ryalls and Whiteside (2006: insertion of British equivalents of American expressions) and a case of Laures-Gore et al. (2006, case 2: insertion of Spanish words in English speech). Both case reports, however, represent instances of mixed FAS (see also Verhoeven and Mariën, 2010). "Pure" neurogenic FAS patients who demonstrated such lexical borrowings have not been identified.

<sup>4</sup>Madlen, Davies, "The woman with Foreign Accent Syndrome: Mother goes to bed with broad Staffordshire accent and wakes up sounding POLISH," MailOnline, October 2nd 2014, accessed on March 23rd, 2015, http://www.dailymail.co.uk/health/article-2778297/The-woman-Foreign-Accent-Syndrome-Mother-goes-bed-broad-Staffordshire-accent-wakes-sounding-POLISH.html; "Embarrasing bodies, Conditions: Foreign Accent Syndrome," channel4embarrassingillnesses.com, accessed on February 2nd, 2015, http://www.channel4embarrassingillnesses.com/conditions/foreign-accent-syndrome/; Thomas, Emily, 'Sarah Colwill Speaks Out About Foreign Accent Syndrome In BBC Documentary "The Woman Who Woke Up Chinese"', Huffingtonpost.com, April 4th, 2013, accessed on 23rd March, 2015, http://www.huffingtonpost.com/2013/09/04/sarah-colwill-\_n\_3869077.html

<sup>5</sup>No examples were provided.

#### Psychodiagnostic and Neuropsychological Testing

Only three patients were tested with formal psychodiagnostic test batteries. In only two patients [cases 5, 12] was the pattern significant for a conversion disorder. In case 11, somatization and hysteria were (slightly) elevated and a diagnosis of conversion disorder was agreed upon based on the inexplicable symptom course and the presence of symptoms which could not be explained on the basis of neurological impairment (apart from the FAS, sensory and motor problems also occurred: see also Section Associated Psychopathologies). For case 9, who underwent a psychodiagnostic interview, family conflict was considered to have had such a profound effect on the patient's mental state that the symptoms could be related to psychological problems and a childhood trauma.

Additional symptom validity tests were administered only for case 11. Incorporation of these tests in psychodiagnostic testing is always recommended, not only when secondary gains are at stake [case 11], but also when the impact of traumatic experiences or psychological discomforts is (possibly) downplayed (Cima et al., 2003; Bush et al., 2005). In these cases, it is important to interpret neurocognitive test results with caution, as these too can be consciously manipulated (see also: "cogniform condition/disorder": a recently developed concept within the somatoform disorders; described by Delis and Wetter, 2007).

With respect to neuropsychological testing, results were diverse for scores on tasks evaluating memory, intelligence, executive functions and attention. Three out of the five patients diagnosed with conversion disorder had poor memory and/or attention and executive functions [cases 8, 11, 12] and in one instance, deficits in fine motor skills were also observed [case 12]. Deficits in learning and memory, but also in executive function, attention, processing skills and word finding have been associated with somatoform disorders (Niemi et al., 2002; Trivedi, 2006; Demir et al., 2013). Attention and executive functions in particular are often impaired in this patient group. One of the hypotheses that have been raised to explain cognitive impairment in this group is that these deficits relate to frontal brain dysfunction. However, Wall et al. (2013) point out that the studies claiming an association between cognitive deficits and conversion disorder did not include symptom validity tests in their test protocol for patient selection and therefore no generalizations can be made. Still, the authors argue that the incidence of neurologically inexplicable cognitive deficits in patients with conversion disorder is quite high. It remains unclear whether there is a fixed set of neurocognitive deficits specific to this population, or, as others argue, whether the deficits are related to the associated psychiatric distress (Lamberty, 2008).

### Remission of the FAS

In the neurogenic population, a late onset of FAS has only been noted when the FAS was "masked" by other speech or language disorders (mutism, Broca aphasia, apraxia of speech, or dysarthria). Apart from a pre-FAS muteness [cases 4, 12] and apraxic/dysarthric-like symptoms in one case [case 11], FAS was never "masked" by preceding speech/language deficits in the current group. Hence, a delayed onset might be indicative of a psychogenic origin. For 27% of the investigated patients (n = 4/15), FAS resolved simultaneously with the remission of the related psychopathology [cases 3, 6, 7, 13]. In those cases, FAS developed after psychosis or after a (hypo)manic attack and was associated with a sudden withdrawal of neuroleptic drugs, or an unbalanced drug intake. In two cases (13%), FAS resolved spontaneously [cases 4, 10]. Only three patients received speech-language therapy in order to reduce the FAS [cases 4, 10, 12], and case 11 received speech-language therapy before the accent appeared. Case 10 received the symptomatic speech therapy as proposed by Duffy (2005). According to the authors, the patient occasionally managed to accurately realize the target items, though she herself did not embrace her progress. Delayed auditory feedback and auditory masking did not improve the speech deficits in the patient reported by Van Borsel et al. (2005), although this approach has been advocated by other researchers as well (González-Álvarez et al., 2003; Moreno-Torres et al., 2013). Butcher et al. (2007) point out that there is a lack of evidence-based treatment strategies for psychogenic speech and language disorders, and that this is directly related to the uncertainty and lack of confidence on the part of the speech therapist to diagnose a disorder of psychogenic origin. To the best of our knowledge, no large-scale study has ever been carried out to evaluate the effectiveness of a treatment for psychogenic speech disorders.

#### Comorbid Speech and Language Deficits

**Table 1** shows that two patients [cases 4, 12] were mute before the onset of FAS. Psychogenic mutism is well-recognized [Salfield, 1950; DSM-V: American Psychiatric Association (APA), 2013]. For case 4, the mutism can be related to the impact of psychological issues (depression, suicidal ideation) as well as to severe anxiety problems (permanent fear that the patient's son might develop Huntington disease). Case 12 was diagnosed with a conversion disorder. Mutism has previously been diagnosed in patients with conversion disorder and, in those specific cases, it is also referred to as "conversion mutism" (Rothbaum and Foa, 1991; Aggarwal et al., 2010).

In three cases, language was also characterized by agrammatic output [cases 4, 8, 11]. McKenna and Oh (2005) note that Karl Kleist, as early as 1914, used both the terms agrammatism (non-fluent, as in Broca-like speech; mostly seen in catatonic patients) and paragrammatism (fluent, more as in Wernicke-like speech; mostly seen in paranoid patients) in a psychiatric context. In 1976, Norman Geschwind described the case of a patient with a "hysterical pseudo-agrammatism" (Geschwind, 1976). The patient had been locked up in prison for passing bad checks, after which he suddenly developed a strange speech disorder and was admitted to a mental institution. What struck Geschwind was that the patient produced agrammatic speech at a normal rate in combination with stuttering behavior, a combination of symptoms which, according to Geschwind, was "unique" (p. 81) and very unlike what is seen in agrammatic aphasic patients. In 1983, Levy and Jankovic published an experiment in which they induced a (placebo) conversion reaction in a female patient in her mid-twenties. The researchers set up a double-dissociation experiment: first, the patient received a saline injection, but she was told it contained phenytoin. Later, she received the phenytoin injection, but this time she was told it contained "a neutral substance." The patient's neurological symptoms worsened after each explicitly mentioned "raise" in phenytoin, as did her scores on the various neurolinguistic exams (among others: the BDAE; Goodglass and Kaplan, 1972). Her speech became slower, (moderately) slurred and hypophonic. She made several literal paraphasias, used a telegrammatic style in repetitions and spontaneous speech, and employed overgeneralizations in picture naming. After the medicine was said to "have worn off" completely, neurolinguistic testing demonstrated only one (!) naming error. De Letter et al. (2012) reported three cases with (non-fluent) agrammatism, overgeneralizations, and paraphasias which could not be attributed to an underlying organic cerebral pathology. All three patients presented with psychiatric conditions: case 1 suffered from bipolar disorder, case 2 had a "manipulative personality" (p. 877), and case 3 had quite an extensive psychiatric history marked by mood swings, depression, and aggressiveness. All patients produced non-fluent speech, characterized by excessively long pauses. Furthermore, the patients demonstrated hypophonia, persevered in their errors, and spoke with a reduced speech rate. As was the case for the patient of Levy and Jankovic (1983), the patients never produced frustrated reactions and never attempted self-correction. For De Letter et al. (2012) the fluctuating language problems and neurological symptoms were the primary reasons for considering the speech/language problems of their patients as psychogenic, although the patients did demonstrate organic anomalies. They argue that "the presence of a language disorder in patients with organic cerebral disease cannot demonstrate causation (e.g., Whitlock, 1967)" (p. 876).

Van Borsel et al. (2005) explicitly argue that "grammatical anomalies [...] did not conform to the pattern of agrammatism typical of Broca's aphasia or paragrammatism as seen in Wernicke's aphasia" (p. 424). In case 8, the agrammatism was likewise noted in a context of otherwise well-articulated, fluent speech. However, apart from verbal fluency deficits (category and letter fluency) in case 8, there were no other notable deficits that characterized the neurolinguistic profile of most of these agrammatic patients. For case 11, it was mentioned that the patient had an agrammatism that was typologically different from Broca's aphasia (Kean, 1977, 1985): e.g., the patient was fluent and speech was not consistently agrammatic, as she was able to rephrase sentences and use initially omitted prepositions or verbs.

The case described by Cottingham and Boone (2010) [case 11] also presented with dysarthria-like symptoms and a suspected apraxia of speech, for which no structural lesions were seen on CT or MRI. Hence, the speech and language symptoms of their patient were considered as "non-credible." There are other reports of patients demonstrating similarly non-credible language symptoms. Recently, De Witte and Mariën (2015) observed inexplicable post-operative language symptoms, which they considered psychogenic, in a 28-year-old male patient who had undergone awake surgery for the removal of a tumor in the left anterior inferior temporal gyrus. Post-operatively, the patient was able to repeat, read, write, and name high and middle frequency words, but auditory comprehension and naming of low frequency words were severely impaired and he displayed inconsistent comprehension deficits. It was noted that results on the CES-D (Center for Epidemiological Studies Depression; Eaton et al., 2004) and STAI (Spielberger et al., 1983) were higher than the cut-off, indicating a higher risk for depression or anxiety disorder. De Witte and Mariën (2015) hypothesize that the symptoms of their patient were non-organic because of the patient's sensitivity to stress and depression, the atypical (course of the) symptoms, and the fact that, despite the comprehension deficits, the patient had very good insight into the disorder as his aunt suffered from vascular aphasia. If the symptoms themselves, or the course of the symptoms, cannot be explained by attested neurological deficits, the possibility of a psychogenic etiology should at least be considered (see also: Baumgartner, 1999).

The case reported by Verhoeven et al. (2005) [case 5] presented with a form of "pseudo-paragrammatism." This patient's speech was characterized by mistakes typically made by French learners of Dutch. The patient did not speak in a telegraphic style, nor did she omit function words. She did, however, change the syntax in such a way that it no longer corresponded to what could be expected in her native language. She used French grammar in Dutch discourse, but not when speaking English. Paragrammatic speech is generally fluent and marked by complex sentences which contain function words, verbs (also finite ones), and nouns. In short, all elements required for the construction of a well-formed sentence are present, but the speakers do not apply the grammatical rules as expected.

# SHORTCOMINGS AND LIMITATIONS

The results of this review should be interpreted with caution. The scarcity of comparable measures characterizing the case reports compelled us to limit the quantitative analysis of FAS. With a view to future diagnostics, it is hoped that linguistic manifestations, medical findings, medical history, and psychiatric symptoms are documented in great detail, in order to enable a reliable FAS diagnosis and suitable therapeutic interventions.

# CONCLUSION

This paper explored psychogenic FAS as a subtype of FAS. The following conclusions can be drawn: firstly, psychogenic FAS is related to the presence of a psychiatric or psychological disturbance in the absence of demonstrable neurological damage or an organic condition that might explain the accent. Secondly, psychogenic FAS occurs more often in women than in men, in an age range which is likely to be prone to depression and mental problems (25–49 years). Thirdly, psychogenic FAS is characterized by both suprasegmental and segmental changes. A deviant intonation (variable pitch) and a slow speech and articulation rate are the most typical prosodic features. At a segmental level, vowels are more affected than consonants. Future research should report on segmental and suprasegmental changes in as much detail as possible, in order to aid diagnosis based on semiological distinctions between neurogenic and psychogenic FAS. Fourthly, the remission of FAS seems to be related to resolution of comorbid positive psychiatric symptoms. Fifthly, psychodiagnostic testing, including symptom validity tests, is highly recommended when psychogenic FAS is suspected, not only in view of adequate therapy, but also for the interpretation of cognitive deficits, which may be aggravated as well. Sixthly, patients with psychogenic FAS often demonstrate linguistic features in speech and language that are not consistent with neurogenic speech/language disorders, e.g., in psychogenic cases, FAS can co-occur with a form of isolated "pseudo-" agrammatism in unaffected fluent speech (different from agrammatism seen in non-fluent aphasic patients) and paragrammatism. Pre-FAS mutism has also been attested. Furthermore, language often shows code switching and language mixing, which rarely occur in polyglot aphasic patients.

Future research should work toward validation of a set of criteria for psychogenic FAS via an extensive comparison with the neurogenic cognate. Moreover, in view of an efficient therapeutic guidance and clinical diagnosis, future research should focus on the treatment of non-organic speech and language disorders in large populations. We believe that a combination therapy focusing on the cognitive-behavioral problems on the one hand, and the speech and language deficits on the other, may be beneficial in this population. The intricate symptomatology often gives proof of overlapping cognitive, psychological and speech problems, and the FAS is interpreted as an (indirect or direct) emanation of the underlying psychological disturbances.

#### AUTHOR CONTRIBUTIONS

Conception and design: SK, PM, JV, EDW; acquisition of data: SK, PM, JV, EDW; analysis and interpretation of data: SK, PM; drafting of the manuscript: SK and PM; critical manuscript revision: all authors; and final manuscript approval: SK and PM on behalf of all authors.

#### ACKNOWLEDGMENTS

EDW is a post-doctoral research fellow of the Research Foundation – Flanders (FWO).

#### REFERENCES

Zigmond, A. S., and Snaith, R. P. (1983). The hospital anxiety and depression scale. Acta Psychiat. Scand. 67, 361–370. doi: 10.1111/j.1600-0447.1983.tb09716.x

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Keulen, Verhoeven, De Witte, De Page, Bastiaanse and Mariën. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Perceptual Accent Rating and Attribution in Psychogenic FAS: Some Further Evidence Challenging Whitaker's Operational Definition

*Stefanie Keulen<sup>1,2</sup>, Jo Verhoeven<sup>3,4</sup>, Roelien Bastiaanse<sup>2</sup>, Peter Mariën<sup>1,5</sup>, Roel Jonkers<sup>2</sup>, Nicolas Mavroudakis<sup>6</sup> and Philippe Paquier<sup>1,6,7</sup>\**

*<sup>1</sup>Clinical and Experimental Neurolinguistics (CLIEN), Vrije Universiteit Brussel, Brussels, Belgium, <sup>2</sup>Center for Language and Cognition Groningen (CLCG), Rijksuniversiteit Groningen, Groningen, Netherlands, <sup>3</sup>Computational Linguistics and Psycholinguistics Research Center (CLIPS), Universiteit Antwerpen, Antwerp, Belgium, <sup>4</sup>Department of Language and Communication Science, City University London, London, UK, <sup>5</sup>Department of Neurology and Memory Clinic, ZNA Middelheim General Hospital, Antwerp, Belgium, <sup>6</sup>Department of Neurology, Erasme University Hospital, Université Libre de Bruxelles, Brussels, Belgium, <sup>7</sup>Unit of Translational Neurosciences, Universiteit Antwerpen, Antwerp, Belgium*

A 40-year-old, non-aphasic, right-handed, and polyglot (L1: French, L2: Dutch, and L3: English) woman with a 12-year history of addiction to opiates and psychoactive substances, and clear psychiatric problems, presented with a foreign accent of sudden onset in L1. Speech evolved toward a mostly fluent output, despite a stutter-like behavior and a marked grammatical output disorder. The psychogenic etiology of the accent foreignness was construed based on the patient's complex medical history and psychodiagnostic, neuropsychological, and neurolinguistic assessments. The presence of a foreign accent was affirmed by a perceptual accent rating and attribution experiment. It is argued that this patient provides additional evidence demonstrating the outdatedness of Whitaker's (1982) definition of foreign accent syndrome, as only one of the four operational criteria was unequivocally applicable to our patient: her accent foreignness was not only recognized by her relatives and the medical staff but also by a group of native French-speaking laymen. However, our patient defied the three remaining criteria, as central nervous system damage could not conclusively be demonstrated, psychodiagnostic assessment raised the hypothesis of a conversion disorder, and the patient was a polyglot whose newly gained accent was associated with a range of foreign languages, which exceeded the ones she spoke.

Keywords: foreign accent syndrome, psychogenic, speech disorder, agrammatism, perceptual experiment, bi- and multilingualism

#### *Edited by:*

*Srikantan S. Nagarajan, University of California San Francisco, USA*

#### *Reviewed by:*

*Michel Hoen, Oticon Medical, France; Yan Wang, North Carolina State University, USA*

> *\*Correspondence: Philippe Paquier philippe.paquier@ulb.ac.be*

*Received: 22 July 2015 Accepted: 08 February 2016 Published: 02 March 2016*

#### *Citation:*

*Keulen S, Verhoeven J, Bastiaanse R, Mariën P, Jonkers R, Mavroudakis N and Paquier P (2016) Perceptual Accent Rating and Attribution in Psychogenic FAS: Some Further Evidence Challenging Whitaker's Operational Definition. Front. Hum. Neurosci. 10:62. doi: 10.3389/fnhum.2016.00062*

# INTRODUCTION

Foreign accent syndrome (FAS) is a speech-output disorder that affects the segmental and suprasegmental characteristics of speech in such a way that a speaker is no longer able to make the correct phonetic/phonematic contrasts of his/her native language. The FAS speaker is interpreted by listeners to be a non-native speaker of his/her mother tongue, or – in some cases – as speaking a different dialectal variant. Numerous cases of FAS have been attested since Marie (1907) described the case of a Parisian man who started speaking with an Alsatian accent after having sustained an intracerebral hemorrhage, which Marie localized at the level of the left lentiform nucleus. Reeves and Norton (2001) were the first to explicitly link their schizophrenic patient's foreign accent (syndrome) to his psychotic exacerbations. Before them, Critchley (1962) and Gurd et al. (2001) had already hinted at a possible psychogenic etiology for the FAS their patients had developed. However, they did not label it as such, possibly due to a lack of objective proof, and because in the context of Whitaker's (1982) operational definition (**Table 1**), the possibility of a psychogenic FAS is excluded. According to Whitaker's criteria, indeed, FAS is strictly related to central nervous system damage.

#### TABLE 1 | Whitaker's operational definition of FAS (Whitaker, 1982; pp. 196 and 198).

In 2005, Van Borsel et al. defended the hypothesis of a psychogenic FAS in their 32-year-old female patient, who presented with FAS, as well as with subtle grammatical anomalies. Medical history revealed she had suffered from depression and suicidal ideation. A neurological and radiological work-up did not reveal any neurological deficit. Other psychogenic case studies would follow (Verhoeven et al., 2005; Poulin et al., 2007; Reeves et al., 2007; Cottingham and Boone, 2010; Haley et al., 2010; Jones et al., 2011; Lewis et al., 2012; Roy et al., 2012; Polak et al., 2013). Close inspection of the FAS case studies – irrespectively of etiological substrate – reveals that this disorder rarely occurs as a "stand-alone phenomenon." Rather, there is a rich spectrum of possible comorbid speech and language impairments that can accompany FAS. The most common comorbid speech and language disorders of neurogenic FAS are dysarthria, apraxia of speech, and aphasia, mostly of the non-fluent type, although the fluent type has also been reported. In addition, muteness has been reported as a speech disorder frequently preceding FAS (**Table 2**). Furthermore, specific language impairment (SLI), developmental apraxia of speech (DAS), and agrammatism (mainly in the context of aphasia) have been noted.

TABLE 2 | Overview of the comorbid speech and language disorders in neurogenic FAS cases.

In psychogenic FAS, the only comorbid speech/language impairments that have been attested over the years are a "pre-FAS"-muteness (Van Borsel et al., 2005; Jones et al., 2011), and grammatical anomalies (Van Borsel et al., 2005; Verhoeven et al., 2005; Poulin et al., 2007; Cottingham and Boone, 2010). The previously mentioned 32-year-old right-handed patient reported by Van Borsel et al. (2005) presented with grammatical errors, not explicable by any neurological damage. Aberrant realizations concerned substitution errors (mainly affecting nouns and verbs), omissions (especially affecting auxiliaries, prepositions and articles), and a dyssyntaxis. Importantly, the authors note that this pattern of grammatical errors did not conform to the pattern typically seen in Broca (agrammatism) or Wernicke (paragrammatism) aphasia. FAS in their patient found its expression through segmental alterations (e.g., "devoicing of voiced consonants," "cluster reduction of r-clusters," "initial consonant deletion of /h/"; p. 423) and suprasegmental alterations ("improper word stress," "improper sentence stress," "a tendency toward scanning speech"; p. 423). Two years later, Poulin et al. (2007; see also Roy et al., 2012) diagnosed FAS in a 74-year-old, bipolar patient. Although they doubted the psychogenic origin of FAS, consensus as to the psychogenicity in their patient was subsequently reached among other authors (Mariën et al., 2009; Haley et al., 2010; Jones et al., 2011; Lewis et al., 2012). Other instances of psychogenic FAS in bipolar patients would follow (Reeves et al., 2007; case 2). Poulin et al.'s (2007) patient demonstrated "mild agrammatism." In contrast to what is generally seen in Broca aphasia patients with a marked agrammatism, the speech of their patient was fluently<sup>1</sup> produced, though it was telegraphic in structure. Function words as well as bound grammatical morphemes were omitted. Unfortunately, the possible occurrence of a (multimodal) grammatical disorder in written language production was not investigated. In 2010, Cottingham and Boone described a 36-year-old woman involved in a motor vehicle accident (MVA), who developed an Eastern European-like accent 3 years after the MVA occurred. She too developed a "telegraphic style" of speech. Several arguments pleaded for the psychogenic etiology of her accent shift. First, there was the late onset of the accent (3 years post-MVA). In addition, there were no demonstrable anomalies on MRI and EEG. Furthermore, the patient exhibited a left-sided give-way weakness. Linguistically, she demonstrated difficulties in sentence repetition (10 days post-MVA), which were limited to the clinical test setting, as well as improbable error patterns (splitting numbers into separate digits).<sup>2</sup> There were irregularities in the (aberrant) intonation pattern and inconsistencies in the grammatical disorder (deleting a preposition in one sentence, using it in the following sentence and then deleting it again). Lastly, her answers on the Minnesota Multiphasic Personality Inventory (MMPI-2) (Butcher et al., 1989), although less conclusively than expected (possibly influenced by a defensive stance), indicated a hysterical personality orientation (suggesting conversion disorder).

<sup>1</sup>The American Speech-Language-Hearing Association (ASHA) defines fluency as "the aspect of speech production that refers to the continuity, smoothness, rate, and/or effort with which phonologic, lexical, morphologic, and/or syntactic language units are spoken" (American Speech-Language-Hearing Association, 1999).

<sup>2</sup>For instance, the number 11 was uttered as "one-one."

The current paper adds to the literature on psychogenic FAS by presenting a new case challenging Whitaker's (1982) operational definition. We report a 40-year-old, non-aphasic woman, with a 12-year history of addiction to psychoactive substances, who presented at the Erasme University Hospital neurology department between 2010 and 2013, with a complex and diverse set of symptoms mainly perturbing her gait and language. Alterations affecting her oral verbal output were stutter-like behavior, (atypical) grammatical errors,<sup>3</sup> as well as FAS. Based on an analysis of her complex medical history and the psychodiagnostic, neuropsychological, and neurolinguistic assessments, we advance the hypothesis of a psychogenic etiology. This case report also demonstrates that identifying the provenance of the perceived accent foreignness depends on the listener's subjective impression.

#### BACKGROUND

In a study on the pathophysiological mechanism of different speech disorders, Whitaker (1982) assigned four characteristics to the speech disorder, which he coined "foreign accent syndrome" (**Table 1**). As is clear from our introduction, many case reports have defied one or more criteria proposed by Whitaker. This has been an incentive for the conceptualization of a distinctive taxonomic variant of FAS, *psychogenic FAS*, which is instigated by psychological or psychiatric problems (Verhoeven and Mariën, 2010).

The aim of the current study is twofold: (1) based on the medical history, the symptoms at presentation, the neurocognitive work-up, as well as the psychodiagnostic and neurolinguistic assessments, we argue that the nature and evolution of the patient's speech/language symptoms are highly indicative of a psychogenic etiology and (2) we experimentally corroborate the hypothesis of FAS by performing an accent rating and attribution task (Verhoeven et al., 2013).

#### Patient and Medical History

The patient gave informed written consent to report her data according to the standards and regulations established by the ethics committee of the Erasme Hospital (ULB).<sup>4</sup>

The patient is a 40-year-old, right-handed, Belgian woman with 13 years of education. She is an unbalanced polyglot speaker: she was raised in French (L1) and Dutch (L2) as an early bilingual, and learned English (L3) at secondary school. French is her everyday language. She sustained a cerebral concussion after a fall at age 17. She had suffered from severe addiction to multiple opiates and psychoactive substances (cocaine, LSD, cannabis, etc.) for a period of 12 years (1988–2000). In 2003, she underwent a last inpatient withdrawal treatment, after which, by her own account, she remained clean. In 2005, she was admitted to the same psychiatric institution because of somatization, insomnia, anxiety, underfeeding, and abulia, probably resulting from an anxio-depressive decompensation. She was considered to exhibit a histrionic personality disorder. An EEG was normal. She was discharged after a month of intensive psychotherapy, and remained under antidepressant and anxiolytic medication. She underwent surgery for a C5–C6 cervical hernia in 2008.

In February 2010, the patient was readmitted to the psychiatric institution because of speech problems of sudden onset,<sup>5</sup> characterized by telegraphic speech, stuttering, and a change of accent, especially when speaking French. She also complained of attention problems, nuchal pain, and arthralgia. She presented with non-rhythmic myoclonic jerks in the lower limbs disturbing her gait, but clinical neurological examination revealed no motor or sensory deficits. Tendon reflexes were normal, and there was no cerebellar dysmetria. CT scan and MRI of the brain were normal, as was an EEG. Clinical biology tests revealed no abnormalities. Bone scintigraphy, cervical CT scan, and echography of the uterus were all normal. Incidentally, on one occasion, she was noticed to speak normally during a temper tantrum caused by a feeling of not being taken seriously by the nursing staff.

In June 2010, the patient was seen at the neurological outpatient clinic for complaints concerning gait and speech. The gait and language abnormalities could not be explained by any neurologically induced deficits, and the hypothesis of a conversion disorder as well as Münchausen syndrome was formulated. In July 2010, the patient was hospitalized for largely the same complaints as the month before: (unstable) gait, backache, as well as impaired speech and language. The most striking speech symptoms consisted of a telegraphic output and stuttering (affecting her French), along with a change of accent (all formally attested during neurolinguistic investigation). Clinical neurological examination, CT scan of the brain, and clinical biology tests (HIV, mycoplasma, HCV, HBV, syphilis, and *Borrelia*) were completely normal. An MRI of the brain, performed prior to the current admission, was reported to be normal except for a discrete cortico-subcortical atrophy. An EEG was inconclusive because of the presence of muscular artifacts. Because of the multiple complaints of cervical and joint pain, a second follow-up was initiated at the outpatient algologic clinic. In September 2010, she was initially seen at the neurological outpatient clinic but was hospitalized because she repeatedly fell (admitted after a fall out of a wheelchair) and had diffuse pain complaints (especially situated near the cervical disks). Psychiatric complaints were noted after admission. The patient showed a behavioral regression limiting her autonomy.

<sup>3</sup>Though the grammatical disorder is a noteworthy aspect of the patient's language profile, the analysis of its multimodal characteristics is beyond the scope of the current study and will be reported elsewhere.

<sup>4</sup>Regulations and procedures to be found at http://www.erasme.ulb.ac.be/page.asp?id=9536&langue=FR.

<sup>5</sup>These speech problems showed up when the patient was refused additional financial support by an official social insurance company.

In April 2011, the patient was again admitted for approximately 1 month to the psychiatric ward because of depression, insomnia, and regression of her physical state. When she was hospitalized, medical staff also noted a behavioral regression to an infantile state: the patient was incontinent (wore diapers), had cuddly toys in her hospital bed, used a pacifier, and kept herself in fetal position. Clinical neurological examination, CT scan of the brain, and an EEG were all normal. In May 2011, she received a full neurolinguistic work-up (*see below*), which demonstrated deficits affecting all language faculties. The foreign accent, articulatory efforts, and stuttering had diminished compared to June 2010. The grammatical output disorder, which affected her (fluent) speech as well as writing, was still perceptible. The neurolinguist concluded that the speech and language symptomatology was unlikely to be caused by a neurological disorder. The follow-up notes of the algologist until February 2012 did not mention any improvement of speech and language.

In August 2012, she was seen at the neurological outpatient clinic. At that time, she was wheelchair-bound due to sudden immobility of the lower limbs and hypoesthesia of the left hemicorpus. A final neurolinguistic work-up was performed, which demonstrated that the grammatical disorder was still present in writing, but no longer in speech. The foreign accent had also disappeared, and stuttering had remarkably diminished compared to May 2011. According to the patient, the language problems had spontaneously resolved after she woke up from an appendectomy under general anesthesia performed 1 month earlier in a peripheral hospital.

The last time the patient was seen at the neurological outpatient clinic in July 2013, oral language production was normal. The patient presented with a complex clinical picture associating a fibromyalgic syndrome, osteo-articulatory pain, arthrosis, and a cervical discopathy. Because of the spontaneous resolution of her speech and oral language problems, the patient no longer sought neurological advice at our institution.

#### Psychodiagnostic Assessment

Psychodiagnostic assessment was conducted in 2010 by means of a structured interview, the Rorschach Test (Rorschach, 1921; Rorschach and Oberholzer, 1923), and the Object Relations Technique (Shaw, 2002). Results revealed passive self-reflection and infantile tendencies in thought, which had not (yet) found expression in her actions (as they did by April 2011; see Patient and Medical History). The psychodiagnostic examination did not indicate a psychological dissociation. According to her Rorschach test results, the patient had regressed to an "archaic" stage, which caused her to be nervous and which could have been incited by a fear of entering "the adult world," possibly due to traumatic events she experienced as a child (tumultuous relationships with her parents and relatives). Based on the neurological and psychiatric examinations, and given the numerous somatic complaints for which no organic lesions could be demonstrated, the patient was considered to suffer most likely from a "hysterical conversion disorder," although this was not substantiated by formal psychodiagnostic testing (she refused to be administered the MMPI).

#### Neuropsychological Assessment

Standardized neuropsychological tests were carried out in 2010 (**Table 3**). The patient had an estimated premorbid IQ of 92 (Beauregard, 1971), which corresponded to an IQ of 91 as measured by the Raven Progressive Matrices (Raven et al., 1996). Verbal reasoning was normal according to the WAIS-Similarities subtest (Wechsler, 1970). The patient's short-term memory was normal in the visuospatial modality as measured by the Corsi block-tapping test (Milner, 1971) and the Violon Beehive Test (Violon and Wijns, 1984) but slightly defective in the verbal modality according to the WAIS-Digit Span (Wechsler, 1970). Delayed memory was impaired both in the visuospatial modality as assessed by the Benton Visual Retention Test (Benton, 1953) and the Rey–Osterrieth complex figure test (ROCF) (Rey, 1941), and in the verbal modality in agreement with the Rey Auditory-Verbal Learning Test (RAVLT) (Rey, 1964). Free verbal recall was normal according to the Wechsler Memory Scale-Logical Memory (Wechsler, 1969), but the RAVLT (Rey, 1964) showed decreased verbal learning. Visuoconstructive skills were normal, and there were no signs of visual neglect on the ROCF (Rey, 1941). The patient showed normal performance on Part A of the Trail Making Test (Reitan, 1992; Godefroy, 2008), but Part B indicated decreased speed for attention and sequencing. This could not be confirmed by the WAIS coding subtest (Wechsler, 1970). Of note, during the neuropsychological assessment, the psychologist also discerned a "German/Slavic"-like accent, along with a severe grammatical anomaly in spontaneous speech.

#### Neurolinguistic Assessment

Neurolinguistic assessments took place in July 2010, May 2011, and August 2012 by means of a series of standardized tests (**Table 4**), and repeatedly failed to show any evidence of aphasia.

#### Auditory Comprehension

Auditory comprehension was assessed using the French version of the Boston Diagnostic Aphasia Examination (BDAE) (Mazaux and Orgogozo, 1983) and the shortened Token Test (De Renzi and Faglioni, 1978). Except for the Token Test, results were well within the normal range on the three occasions. Both Token Tests administered (2010 and 2011) were slightly defective because of confusions between tokens in otherwise correctly executed commands.

#### Oral Expression

Oral expression was assessed by means of the French version of the BDAE (Mazaux and Orgogozo, 1983) and the Bachy 36-items naming test (Bachy-Langedock, 1989). In July 2010, performance on most oral language tasks was severely hampered by a complex speech disorder combining (a) a stutter-like behavior with articulatory efforts in initiating words, associated with spectacular facial synkinesias, (b) an impressive grammatical disorder in spontaneous speech and across all tests administered (including oral repetition and reading aloud tasks) that was observed in L1, but not in L2 and L3, and (c) a foreign accent, which was perceived by the neurolinguist as either English or Slavic, and which similarly only affected her native language. Of note, the patient did not produce a single paraphasia in the sentences generated during the entire oral language assessment, and obtained normal results on automatized sequences (counting days of the week and months of the year), a responsive naming task (word finding upon orally presented questions), a body-part naming task, and a semantic verbal fluency task (1 and 2 min generation of animal names).

#### TABLE 3 | Neuropsychological test results (September 2010).

*Pc., percentile.*

#### TABLE 4 | Neurolinguistic test results.

*NA, not administered; Pc., percentile.*

Overall performance in oral language was roughly similar in May 2011, though the foreign accent, articulatory efforts, and stutter-like behaviors had considerably diminished at that time. However, a prominent and paradoxically fluently produced grammatical disorder was still noticed in spontaneous speech and during all language tasks. Again, paraphasic errors were not observed.

In August 2012, 1 month after an appendectomy under general anesthesia, the patient was referred to the neuropsychological department by her neurologist, who was astonished by the unexpected and unexplained improvement of her oral language skills. The grammatical disorder in spontaneous speech and oral language tasks had completely disappeared, as had the foreign accent. Sporadically, a discrete and short-lasting stuttering was observed. Results on oral language tasks were well within normal limits, except for a persistently weak performance on visual confrontation naming and a decreased generation of animal names (paradoxical reduction of semantic verbal fluency in association with a spectacular improvement of oral expression) (**Table 4**).

#### Reading

Reading aloud in 2010 and 2011 (assessed by means of the BDAE) was effortful mostly because of the stutter-like symptoms and was characterized by a foreign accent. Moreover, reading sentences was contaminated by massive grammatical errors. The words composing the sentences, however, were correctly read. As was the case in spontaneous speech, in 2012 reading aloud had completely normalized. Reading comprehension of sentences and paragraphs was normal in 2010 and 2011. Unexpectedly, the patient performed worse at the time oral language and reading aloud had normalized (**Table 4**).

#### Writing

In written language production (assessed by means of the BDAE), graphomotor skills and writing words upon dictation were normal at the time of the three language evaluations. Writing sentences upon dictation was altered by grammatical errors (omissions of grammatical words and use of infinitive verbs), but the words themselves were written flawlessly. The written description of the Cookie Theft picture remained grammatically impaired over time, though, again, all individual words were spelled correctly.

#### Phonetic Assessment

The first author (Stefanie Keulen) performed a perceptual analysis of 5 min of spontaneous speech during which the patient explained her medical history, in order to seek which segmental and suprasegmental features could have induced or at least reinforced the impression of accent foreignness. To this purpose, the excerpt was transcribed into International Phonetic Alphabet. As the patient's foreign accent was judged to have diminished as of 2011, a sample was selected from the recordings made in 2010.

Perceptually, the patient appeared to realize the French uvular /R/ as an English diphthong. For instance, the verb *faire* (/fεR/) (to do) was pronounced as /feәr/. On other occasions, she used excessive alveolar trill (as, for instance, in Italian, Spanish, or Russian) instead of uvular rhoticity. Other segmental errors consisted of additions of [r] (devoir → dev*r*oir) and schwa (plus → p*e*lus) (epenthesis). The patient sometimes used a voiced velar fricative (/γ/) instead of the voiced velar plosive /g/ (e.g., /γrɑm/ for /*g*Rɑm/ or "gram" in English), which could have induced the impression of a Dutch/Flemish-like accent. Moreover, she produced voiceless and voiced ejective consonants as, for instance, in /k'ɒm/ (*comme*; like), /bRεIk'dæns/ (*breakdance*), and /beg'εje/ (*bégayer*; to stutter). Ejectives are highly uncommon in European languages and occur in some languages in the region of the Caucasus and the Americas (Hayward, 2013). The patient equally spoke with a strangled voice, probably reinforced by the repeatedly produced egressive, glottalic airflow which caused the realization of the ejectives, instead of the typical, expected pulmonic egressive airstream. Intonation of speech was aberrant. Word accent was sometimes wrongfully placed (e.g., *beau*coup; many). Melody of speech was equally altered in 2010, and there were sudden excursions of speech intensity.

#### EXPERIMENT

#### Aim

A perceptual accent rating and attribution experiment was set up with the purpose of disclosing (a) whether a group of French-speaking listeners judged the patient to speak with a foreign accent, (b) which accents could possibly be identified in the FAS speaker's speech, and additionally (c) how native and non-native speakers of French could be identified. Because of the severe speech impediment suffered by the patient, we decided to apply Dankovičová and Hunt's (2011) procedure to select the stimuli (*see below*).

#### Methods

#### Materials and Samples

This study consisted of a perceptual experiment in which 25 French-speaking students in French linguistics at a francophone university in Brussels – who were not formally acquainted with speech pathologies of any kind – blindly assessed the (foreign) accent and linguistic background of six speakers. One speaker was the FAS patient, whose stimuli were mixed with stimuli from five other speakers: one was a native French-speaking Belgian woman stemming from the same geographic area as the patient, and four others were non-native speakers of French with an audible foreign accent.

The selected stimuli were retrieved from a recorded informal interview, which took place in 2010 in the context of neurolinguistic testing. The patient explained her medical history, symptoms, and the chronology of events. Nine isolated words and six grammatically correct utterances were selected and edited so as to ensure full anonymity (Dankovičová and Hunt, 2011). Only correct utterances were chosen in order to avoid any possible artifacts in the listeners' judgments. In total, 90 stimuli were presented to the raters (15 stimuli × 6 speakers). Files were adjusted for the purpose of assessment using PRAAT, version 5.4 (PRAAT for Mac; Boersma and Weenink, 2014).

#### Control Speakers

Five female control speakers (**Table 5**) read the words and utterances selected from the patient's interview. Recordings were made with a Marantz Professional PMD 661 portable recorder and adjusted *via* PRAAT (Boersma and Weenink, 2014). The non-native speakers of French were, respectively, of Belgian (Dutch), English, German, and Chinese origin. In accordance with Verhoeven et al.'s (2013) methodology, their foreign accents had not been matched to those the medical staff had tentatively reported in the patient. It was assumed that most listeners would be acquainted with the control speakers' accents.

#### Stimuli and Assessment

Total sample time was 25 min. and 26 sec. The stimuli were separated from one another by a 15-s interval to allow for judgment. The sample consisted of 15 "blocks" in which each stimulus was uttered by all six speakers in pseudo-random order. Stimuli were presented only once, so each speaker recurred 15 times.
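The block design and timing described above can be sketched as follows. This is only an illustration under stated assumptions: the per-block shuffle, the fixed seed, and the idea that a 15-s interval follows every one of the 90 stimuli are not specified in the text, and the names `build_playlist`, `SPEAKERS`, and `PAUSE_S` are hypothetical.

```python
import random

SPEAKERS = ["FAS", "French", "Dutch", "German", "Chinese", "English"]
N_ITEMS = 15      # 9 isolated words + 6 utterances
PAUSE_S = 15      # judgment interval per stimulus (assumed to follow each one)

def build_playlist(seed=0):
    """Return (item_index, speaker) pairs: one block per item, speakers shuffled."""
    rng = random.Random(seed)          # fixed seed is an assumption, for reproducibility
    playlist = []
    for item in range(N_ITEMS):
        order = SPEAKERS[:]
        rng.shuffle(order)             # each speaker occurs exactly once per block
        playlist.extend((item, speaker) for speaker in order)
    return playlist

playlist = build_playlist()
total_s = 25 * 60 + 26                 # reported total sample time, in seconds
pause_s = len(playlist) * PAUSE_S      # time taken up by the judgment intervals
speech_s = total_s - pause_s           # time left for the stimuli themselves
```

Under these assumptions, most of the session consists of judgment intervals, leaving on average roughly 2 s of audio per stimulus, which is plausible for isolated words and short utterances.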

Before hearing the speech samples in open field at their institution, the listeners received the test instructions, and completed demographic information about themselves (age, gender, country of birth, time living in Belgium if not born there, mother tongue, and other spoken languages including an indication of proficiency in these languages) on a questionnaire. They were asked to rate the speakers' degree of French-speaking "nativeness" on a seven-point scale: 1 = "definitely *not* a native speaker of French" and 7 = "definitely a native speaker of French." In case the rating was <7, listeners were asked to identify the speaker's mother tongue.

#### Results

#### Demographic Results

Among the 25 raters (16–25 years old; mean age: 19 years and 3 months; 11 males and 14 females), 1 participant was born in England, 2 in Luxembourg, and 1 in Mali. However, they all were raised and educated in French, except for the English student (aged 17), who was raised bilingually (French–English) but had been living in the French-speaking part of Belgium for 16 years.

TABLE 5 | Demographic data of speakers (FAS and controls) in the perceptual accent rating experiment, including an indication of the level of French. CEFR, Common European Framework of Reference for Languages (Council of Europe, 2001).


#### Accent Rating Results

Results were loaded into SPSS version 22 for Mac OS X (IBM Corp., 2013). First, inter-rater reliability was calculated for each speaker. As we had 25 different raters, this was examined by means of an intraclass correlation coefficient (ICC). As each item was assessed by each rater, and raters were randomly selected (sample selection, not population), the two-way random model was applied, checking for absolute agreement, which means that systematic differences between raters were taken into account. Results demonstrated that for FAS ICC (2,25) = 0.77, for French ICC (2,25) = 0.798, for Dutch ICC (2,25) = 0.948, for German ICC (2,25) = 0.936, for Chinese ICC (2,25) = 0.936, and for English ICC (2,25) = 0.713. These are acceptable values.
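For readers who want to see what an ICC (2,*k*) involves, the following is a minimal sketch of the two-way random-effects, absolute-agreement, average-measures coefficient computed from the usual ANOVA mean squares. It is not the SPSS routine used in the study; the function name `icc2k` and the toy matrix are hypothetical and serve only as illustration.

```python
def icc2k(ratings):
    """ICC(2,k): two-way random effects, absolute agreement, mean of k raters.

    `ratings` is a list of rows, one per rated item, each holding k scores.
    """
    n = len(ratings)                 # number of rated items
    k = len(ratings[0])              # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between items
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                     # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    # Rater variance stays in the denominator: that is what "agreement" means here.
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)

# Toy data (not from the study): 6 items rated by 4 raters.
toy = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
       [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
toy_icc = icc2k(toy)
```

In the study's design, each speaker's matrix would have 15 rows (stimuli) and 25 columns (raters), which is what the "(2,25)" notation refers to.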

Mean scores, medians, SDs, minima, maxima, ranges, and interquartile ranges are provided in **Table 6**. Based on descriptive statistics, the French-speaking control appeared to be strongly associated with one extreme end of the continuum (*x̄* = 6.653, σ = 1.043, and *M* = 7; score 7 = "definitely a native speaker of French"), whereas the English-speaking control was clearly situated at the opposite extreme (*x̄* = 2.056, σ = 1.589, and *M* = 1; score 1 = "definitely *not* a native speaker of French"). The FAS patient, too, was associated more often with an elevated degree of foreignness (*x̄* = 2.288, σ = 2.166, and *M* = 1). The remaining speakers were situated in between. They were apparently the most difficult to qualify, as they were also associated with the greatest SDs (Dutch: *x̄* = 3.949, σ = 2.451, and *M* = 4; German: *x̄* = 3.880, σ = 2.422, and *M* = 3; and Chinese: *x̄* = 3.136, σ = 2.164, and *M* = 3).

As a Kolmogorov–Smirnov test of normality showed that data were not normally distributed (for all speakers: *p* < 0.1), non-parametric statistics were applied. A Kruskal–Wallis *H* test showed that there was a statistically significant difference among ratings for the different speakers [inter-speaker difference: *H*(5) = 778.751, *p* < 0.001]. Further analysis (Mann–Whitney *U* tests) was necessary to establish inter-speaker comparisons. All pairwise differences between speaker ratings were significant (**Table 7**) except two, whose *p*-values exceeded the Bonferroni-corrected threshold of 0.0033: the FAS patient (*M* = 1) versus the English-speaking control (*M* = 1), *U* = 68,166.50, *p* = 0.407, and the Dutch-speaking (*M* = 4) versus the German-speaking control (*M* = 4), *U* = 69,469.500, *p* = 0.771.
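The corrected threshold of 0.0033 follows from the number of pairwise comparisons among six speakers. The short sketch below reproduces that arithmetic, assuming a family-wise level of 0.05 (the text does not state it, but it is the value consistent with 0.0033); the variable names are illustrative only.

```python
from itertools import combinations

speakers = ["FAS", "French", "Dutch", "German", "Chinese", "English"]
pairs = list(combinations(speakers, 2))   # every unordered pair of speakers
alpha = 0.05                              # assumed family-wise error rate
threshold = alpha / len(pairs)            # Bonferroni-corrected per-test level

# The two contrasts reported as non-significant, with their p-values from the text.
reported = {("FAS", "English"): 0.407, ("Dutch", "German"): 0.771}
not_significant = [pair for pair, p in reported.items() if p > threshold]
```

Six speakers yield 15 pairwise Mann–Whitney comparisons, so each comparison is tested at 0.05/15, and both reported *p*-values lie far above that level.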

A correspondence analysis<sup>6</sup> (Clausen, 1998) in which the FAS patient and the control speakers represent the first categorical variable, and the ratings attributed to them (7 = "definitely a native speaker of French" and 1 = "definitely *not* a native speaker of French") the second categorical variable confirmed that the FAS speaker and the English-speaking control were more strongly associated with accent foreignness than the other non-native speakers of French (**Figure 1**; **Table 8**). The native French-speaking control was most strongly associated with French-speaking "nativeness" (**Figure 1**).

<sup>6</sup>Correspondence analysis is a technique that evaluates "the association between two or more categorical variables by representing the categories of the variables as points in a low-dimensional space. Categories with similar distributions are represented as points that are close in the space, and categories that have very dissimilar distributions are positioned far apart" (Clausen, 1998, p. 2).
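To make the idea of "points in a low-dimensional space" concrete, here is a minimal sketch of simple correspondence analysis via a singular value decomposition of the standardized residuals of a contingency table. It assumes NumPy is available and that no row or column of the table is empty; the function name and the toy table are hypothetical and do not reproduce the study's Table 8.

```python
import numpy as np

def correspondence_analysis(table, n_dims=2):
    """Simple correspondence analysis of a two-way contingency table."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                        # correspondence matrix
    r = P.sum(axis=1)                      # row masses (e.g., speakers)
    c = P.sum(axis=0)                      # column masses (e.g., ratings)
    expected = np.outer(r, c)              # independence model
    S = (P - expected) / np.sqrt(expected) # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]      # principal row coordinates
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]   # principal column coordinates
    inertia = sv ** 2                          # principal inertias per dimension
    return rows[:, :n_dims], cols[:, :n_dims], inertia

# Toy table (not study data): 3 speakers (rows) by 3 rating categories (columns).
toy = [[40, 5, 5],
       [5, 10, 35],
       [10, 30, 10]]
row_xy, col_xy, inertia = correspondence_analysis(toy)
```

Plotting `row_xy` and `col_xy` in the same plane gives the kind of map shown in a correspondence plot: speakers whose rating profiles resemble each other fall close together.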

#### Accent Attribution Results

TABLE 6 | Perceptual accent rating experiment: mean score, median, SD, minimum (Min), maximum (Max), range, and interquartile range for the patient and each of the control speakers.

13/25 (52%) raters tried to identify the origin of the accent in those control speakers they judged not to be a native speaker of French (score <7 on the rating scale) (**Table 9**). **Figure 2** graphically displays the accent attribution of the 13 raters for all 15 stimuli per speaker (195 stimulus judgments per speaker). The native French-speaking control was recognized as a true native speaker of French in 185/195 (95%) of stimuli, whereas the FAS patient, who was also a native speaker of French, was perceived as such in only 46/195 (24%) of stimuli. In the native Dutch-speaking control, the difference between an association with a presumed French-like accent (*n* = 67/195, 34%) and a Dutch accent (*n* = 71/195, 36%) was minimal. The German-speaking control was identified as a native German speaker in only 20/195 (10%) of stimuli, whereas in 67/195 (34%) of stimuli, she was considered a French speaker and in 59/195 (30%) of stimuli, a Dutch speaker. The Chinese-speaking control was identified as such in only 2/195 (1%) of stimuli. She was regarded as a native speaker of French in 34/195 (17%) of stimuli, and as a native speaker of Dutch in 55/195 (28%) of stimuli. Finally, the English-speaking control was properly identified as a native English speaker in 83/195 (43%) of stimuli, but was perceived as a Dutch speaker in 33/195 (17%) of stimuli. Speakers who were the least often associated with their native language (Dutch, German, and Chinese) were mostly given scores of 3 or 4 on the rating scale and had the greatest SDs.
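The percentages in this paragraph can be checked directly from the counts. The sketch below does so for the share of judgments in which each speaker was attributed her actual native language; the counts are those given in the text, while the dictionary layout and the helper `pct` are illustrative choices.

```python
N_JUDGMENTS = 13 * 15   # 13 raters x 15 stimuli = 195 judgments per speaker

# Judgments attributing to each speaker her actual native language (from the text).
own_language_hits = {
    "French control": 185,
    "FAS patient (French)": 46,
    "Dutch control": 71,
    "German control": 20,
    "Chinese control": 2,
    "English control": 83,
}

def pct(count, total=N_JUDGMENTS):
    """Percentage of judgments, rounded to the nearest whole number."""
    return round(100 * count / total)

shares = {speaker: pct(n) for speaker, n in own_language_hits.items()}
```

Laid out this way, the contrast between the two native French speakers (the control and the patient) is easy to read off against the non-native controls.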



FIGURE 1 | Perceptual accent rating experiment: correspondence analysis graphically displaying the accent dispersion and associated accent ratings in a two-dimensional space. The points represent a vector transformation of the data displayed in Table 8. The blue circles represent the accent rating and the green circles represent the speakers. Ratings were defined as column points, speakers as row points. The distances between the scores and speakers represent the strength of association between both values. Both FAS and English are more closely associated with "Definitely non-native speakers of French" (=rating 1). French is (correctly) associated with "definitely a native speaker of French" (=rating 7).

TABLE 8 | Perceptual accent rating experiment: correspondence table presenting the frequency of each response (1, 2, 3, 4, 5, 6, or 7) for the patient and each of the control speakers.

Correspondence table


*The data are transformed to vectors in a two-dimensional space (*Figure 1*).*

Assumptions about the native language of all six speakers were the least stratified in the French-speaking control. The stratification of the number of putative native accents perceived in the other speakers was fairly similar. Both the FAS patient and the Dutch-speaking control were associated with 13 different languages. The English-speaking control's utterances were associated with 14 different languages and those of the German-speaking participant with a total of 12 different languages. Accent attribution was most stratified in the Chinese speaker, her utterances being associated with no fewer than 15 possible languages (see **Figure 2**).

TABLE 9 | Perceptual accent attribution experiment: number of different accent origins associated with the patient and each control speaker.

The largest share of the raters' judgments attributed the FAS patient's utterances to a native speaker of a Romance language: 87/195 (45%) of stimulus judgments, divided into 46 French, 19 Romanian, 18 Spanish, 3 Italian, and 1 Portuguese. It should be noted, however, that the Romance language family was the most familiar to the raters, who were all students in French linguistics. When accent attributions to Germanic languages (43/195, 22%) and Slavic languages (39/195, 20%), the language families the neurolinguist associated with the FAS patient, are taken together (82/195, 42%), the difference between associations with Romance languages on the one hand and Germanic or Slavic languages on the other hand appears quite small.
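The comparison between language families comes down to a few sums over the counts quoted in this paragraph. A minimal tally, with variable names chosen here purely for illustration:

```python
# Attributions for the FAS patient's 195 stimulus judgments, as given in the text.
romance = {"French": 46, "Romanian": 19, "Spanish": 18, "Italian": 3, "Portuguese": 1}
romance_total = sum(romance.values())

germanic_total = 43          # family totals as reported
slavic_total = 39
combined = germanic_total + slavic_total

gap = romance_total - combined           # difference in number of judgments
gap_share = gap / 195                    # the same difference as a share of all judgments
```

Only a handful of judgments separate the Romance total from the combined Germanic and Slavic total, which is why the difference is described as small.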

#### DISCUSSION

The patient we report presented with a complex set of symptoms mainly affecting her gait and verbal output, which could not reasonably be explained by any neurologically induced deficits. The most striking speech/language symptoms consisted of a stutter-like behavior, a grammatical disorder, and a change of accent in the absence of aphasia. Interestingly, these speech/language anomalies particularly altered her native language (L1: French) and were hardly observed in L2 (Dutch) and L3 (English). Two years after the initial neurolinguistic assessment, the oral speech/language deficits unexpectedly disappeared right after the patient woke up from general anesthesia induced for an appendectomy. While acknowledging the peculiar interest of the co-occurrence of accent foreignness and grammatical anomalies in psychologically induced speech-output disorders (Van Borsel et al., 2005; Verhoeven et al., 2005; Poulin et al., 2007), in the present study, we purposely focused on the patient's change of accent<sup>3</sup>.

comparison of the FAS patient with the native French-speaking control clearly demonstrates that the raters identified the control's accent as their own in 95% of the stimuli versus a mere 24% for the FAS patient (see Accent Attribution Results).

When a change of accent inducing an impression of accent foreignness originates from a pathological condition, it is called FAS. FAS is "a motor speech disorder in which patients develop a speech accent which is notably different from their premorbid habitual accent" (Verhoeven and Mariën, 2010; p. 600). Verhoeven and Mariën (2010) classified FAS into three distinct taxonomical types: neurogenic, psychogenic, and mixed. In *neurogenic FAS*, the change of accent is associated with damage to the central nervous system. As such, it corresponds to the prototypical FAS as defined by Whitaker (1982). In *psychogenic FAS*, there is no evidence of neurological damage, and the accent change is ingrained in underlying psychological issues or psychiatric disorders. In *mixed FAS*, a neurologically induced accent change brings about psychological adjustments aiming at improving the authenticity of the newly acquired accent in order to create a more coherent new personality. This taxonomic differentiation has important implications for the management of treatment strategies.

In the current study, the patient's accent foreignness was affirmed by 25 independent, native speakers of French on the basis of an accent rating and attribution experiment. A majority of stimuli (76%) spoken by the FAS patient were assigned a non-French accent (whereas 95% of stimuli spoken by the French-speaking control were allocated a French accent). The stratification of the number of putative accents perceived in the FAS patient and the non-French-speaking controls was fairly similar (**Figure 2**). This finding also demonstrates the patient's strong accent foreignness, and corroborates the results of the phonetic assessment, which identified several segmental and suprasegmental transformations affecting the patient's speech output.

We consider the patient reported in the present study to represent an instance of psychogenic FAS for several, not mutually exclusive reasons:

1. The sudden and unexpected remission of all oral verbal output anomalies immediately after waking up from general anesthesia seems hard to explain on neurological grounds. Although the impact of general anesthesia on cognitive functions is still a matter of opinion (Guay, 2011), one would expect such an impact, if any, to induce (transitory) post-operative cognitive defects rather than improvements (Monk et al., 2008).


enjoyed the attention her speech disorder received and always willingly participated in the neurolinguistic assessments. She did not try to avoid social contacts.


spontaneously and dramatically resolved only 2 months later, right after the surgical intervention.

Given the above-listed arguments, we strongly believe the accent foreignness in the reported patient to be of psychogenic origin. Although a conversion disorder could not formally be confirmed by means of an MMPI, repeated neurological and psychiatric observations, and follow-ups all clearly pointed to psychogenic behavioral and speech/language disturbances.

#### CONCLUSION

In 1982, Whitaker proposed four criteria which a patient should meet in order to be diagnosed with FAS (**Table 1**). In the current paper, we report on a non-aphasic patient with FAS who only partly satisfied these criteria. The patient's accent was, in accordance with Whitaker's first criterion, perceived as "foreign" by medical staff, friends, and relatives, as well as by a group of 25 independent, native French-speaking listeners who rated her accent in a perceptual rating and attribution experiment. However, the two following criteria were challenged, in that we could not find any evidence of a *clear* cerebral insult to explain the sudden emergence of the accent. In addition, the patient was an unbalanced polyglot speaker of three languages, which defies Whitaker's fourth criterion. As regards this last criterion, in the current experiment, the patient's accent was associated with no fewer than 13 different languages, indicating that in polyglot FAS patients, listeners do not necessarily attribute the provenance of the perceived accent to one of the languages spoken by the patient. In fact, identifying the origin of the perceived foreign accent in FAS patients appears to depend on the degree of exposure of listeners to foreign accents (Di Dio et al., 2006; Miller et al., 2006; Verhoeven et al., 2013).

In the present study, it is also remarkable that the accent foreignness, along with the grammatical disorder and the stutter-like behaviors, particularly affected the patient's mother tongue (French), whereas these anomalies were hardly observed in L2 (Dutch) and L3 (English). Furthermore, the occasional loss of foreign accent (and other speech anomalies) when the patient was emotionally distressed was quite noteworthy. In addition, the sudden and unexpected resolution of the foreign accent after the surgical intervention remained quite puzzling, as was the modality-specific recovery from the oral grammatical disorder (while written expression remained grammatically altered). These and other behavioral observations in the patient all pointed to a psychogenic disorder that further contests Whitaker's third criterion. The latter was also called into question by at least five reports of FAS in association with a conversion disorder published between 2005 and 2011 (Verhoeven et al., 2005; Tsuruga et al., 2008; Cottingham and Boone, 2010; Haley et al., 2010; Jones et al., 2011).

As we conclusively demonstrated, the patient reported here suffered from a speech-output disorder which listeners perceived as foreign-accented. The origin of the patient's accent foreignness and her multilingualism led us to conclude that Whitaker's operational definition of what he called "foreign accent syndrome" (Whitaker, 1982; p. 195) is too restrictive and outdated. Whitaker's criteria appear not to offer enough space to include the currently accepted taxonomic variants of FAS (Verhoeven and Mariën, 2010). Even for the neurogenic subtype, the last criterion seems barely maintainable, as polyglot, brain-injured FAS patients have also been reported (Schiff et al., 1983; Avila et al., 2004; Paquier and Assal, 2007; Levy et al., 2011). These findings underscore the necessity for a solid clinical diagnosis with a view to further treatment.

#### AUTHOR CONTRIBUTIONS

Conception and design: SK and PP. Acquisition of data: SK, NM, and PP. Analysis and interpretation of data: SK, JV, RJ, and PP. Drafting the manuscript: SK and PP. Critical manuscript revision: all authors. Critical revision of reviewed manuscript: SK and PP. Final manuscript approval: SK and PP on behalf of all authors.

#### REFERENCES

Goodglass, H. (1993). *Understanding Aphasia*. San Diego, CA: Academic Press.


Rorschach, H. (1921). *Psychodiagnostik*. Bern: Bircher.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Keulen, Verhoeven, Bastiaanse, Mariën, Jonkers, Mavroudakis and Paquier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Psychogenic Foreign Accent Syndrome: A New Case

Stefanie Keulen<sup>1,2</sup>, Jo Verhoeven<sup>3,4</sup>, Louis De Page<sup>5</sup>, Roel Jonkers<sup>2</sup>, Roelien Bastiaanse<sup>2</sup> and Peter Mariën<sup>1,6</sup>\*

<sup>1</sup> Department of Linguistics and Literary Studies, Clinical and Experimental Neurolinguistics, Vrije Universiteit Brussel, Brussels, Belgium, <sup>2</sup> Department of Linguistics, Center for Language and Cognition Groningen, Rijksuniversiteit Groningen, Groningen, Netherlands, <sup>3</sup> Department of Language and Communication Science, School of Health, City University London, London, UK, <sup>4</sup> Department of Linguistics, Computational Linguistics and Psycholinguistics Research Center, Universiteit Antwerpen, Antwerp, Belgium, <sup>5</sup> Department of Psychology, Faculty of Psychology and Educational Sciences, Vrije Universiteit Brussel, Brussels, Belgium, <sup>6</sup> Department of Neurology and Memory Clinic, ZNA Middelheim General Hospital, Antwerp, Belgium

This paper presents the case of a 33-year-old, right-handed, French-speaking Belgian lady who was involved in a car accident as a pedestrian. Six months after the incident she developed a German/Flemish-like accent. The patient's medical history, the onset of the FAS and the possible psychological causes of the accent change are analyzed. Relevant neuropsychological, neurolinguistic, and psychodiagnostic test results are presented and discussed. The psychodiagnostic interview and testing will receive special attention, because these have been underreported in previous FAS case reports. Furthermore, an accent rating experiment was carried out in order to assess the foreign quality of the patient's speech. Pre- and post-morbid spontaneous speech samples were analyzed phonetically to identify the pronunciation characteristics associated with this type of FAS. Several findings were considered essential in the diagnosis of psychogenic FAS: the psychological assessments as well as the clinical interview confirmed the presence of psychological problems, while neurological damage was excluded by means of repeated neuroimaging and neurological examinations. The type and nature of the speech symptoms and the accent fluctuations associated with the patient's psychological state cannot be explained by a neurological disorder. Moreover, the indifference of the patient toward her condition may also suggest a psychogenic etiology, as the opposite is usually observed in neurogenic FAS patients.

Keywords: foreign accent syndrome, psychogenic FAS, speech disorder, psychodiagnostics, accent attribution experiment, accent rating experiment

#### *Edited by:*

Srikantan S. Nagarajan, University of California, San Francisco, USA

#### *Reviewed by:*

Katia Nemr, University of São Paulo, Brazil; Stéphane Poulin, Université Laval, Canada

*\*Correspondence:*

Peter Mariën peter.mariën@vub.ac.be

*Received:* 26 June 2015 *Accepted:* 21 March 2016 *Published:* 19 April 2016

#### *Citation:*

Keulen S, Verhoeven J, De Page L, Jonkers R, Bastiaanse R and Mariën P (2016) Psychogenic Foreign Accent Syndrome: A New Case. Front. Hum. Neurosci. 10:143. doi: 10.3389/fnhum.2016.00143

# INTRODUCTION

Foreign accent syndrome (FAS) is a rare motor speech disorder which causes patients to speak their native language with an accent which is perceived as non-native by speakers of the same speech community. This "non-nativeness" is the result of suprasegmental and/or segmental changes, which, according to the criteria proposed by Whitaker (1982), are the consequence of damage to the central nervous system. Often, the etiology is stroke or brain trauma affecting the language dominant areas of the brain, e.g., the left (pre)frontal, temporal and/or parietal region, the rolandic and perisylvian area, as well as the insular region. Nevertheless, FAS has also been associated with other etiologies including MS (Villaverde-González et al., 2003; Bakker et al., 2004; Chanson et al., 2009), neoplasms (Abel et al., 2009; Masao et al., 2011; Tomasino et al., 2013) and vascular dementia (Paquier and Assal, 2007). Verhoeven and Mariën (2010) argue that FAS is not only caused by (acute) neurological damage but can also result from psychogenic issues. In psychogenic FAS, the accent is associated with a psychological/psychiatric disorder. Furthermore, Verhoeven and Mariën (2010) also identified a mixed type in which FAS initially develops on the basis of a neurological disorder: this affects patients so profoundly that they further develop the accent in order to create the impression of a more authentic personality.

The current study focuses on psychogenic FAS. For most of the psychogenic cases reported so far, a psychogenic cause was assumed because it was not possible to unambiguously identify a neurological disorder. Some authors have discarded the idea of psychogenic FAS because of the difficulty of objectifying this condition diagnostically (Gurd et al., 2001; Poulin et al., 2007). In some patients diagnosed with psychogenic FAS, (repeated) brain imaging with CT or MRI revealed structural damage, but the speech problems were disproportionate in relation to the damage. Furthermore, in the majority of the psychogenic FAS cases symptoms fluctuated, increasing in certain (social/emotional) contexts and diminishing or even completely resolving in others (e.g., Van Borsel et al., 2005; Tsuruga et al., 2008; Haley et al., 2010; Jones et al., 2011). Such an atypical, fluctuating course of symptoms is characteristic of speech and voice disorders of psychogenic origin (Avbersek and Sisodiya, 2010). When FAS is typified by these phenomena and associated with identifiable psychological problems (e.g., depression, familial history, suicidal ideation), a non-organic origin may be expected (Roth et al., 1989; Tippett and Siebens, 1991; Baumgartner and Duffy, 1997; Baumgartner, 1999).

# BACKGROUND

In little over a century, counting from the first (anecdotal) FAS description by Pierre Marie in 1907 until July of 2014, only 15 FAS cases with a presumed psychogenic origin have been reported (Critchley, 1962; Gurd et al., 2001; Reeves and Norton, 2001; Van Borsel et al., 2005; Verhoeven et al., 2005; Poulin et al., 2007; Roy et al., 2012, case 1; Reeves et al., 2007; Tsuruga et al., 2008; Cottingham and Boone, 2010; Haley et al., 2010; Jones et al., 2011; Lewis et al., 2013; Polak et al., 2013). This study presents a new case of psychogenic FAS. Neuropsychological testing was carried out to assess a wide range of cognitive functions. The psychological state of the patient was evaluated by means of a series of psychodiagnostic tests, including symptom validity tests. Extensive neuropsychological investigations (Verhoeven et al., 2005; Poulin et al., 2007; Haley et al., 2010) and psychodiagnostic testing (Verhoeven et al., 2005; Cottingham and Boone, 2010) have only occasionally been reported in psychogenic case reports, although such an in-depth investigation is crucially important for accurate diagnosis and successful therapy (see also: Moreno-Torres et al., 2013). In addition, a perceptual analysis of the patient's most salient speech characteristics was carried out and an accent rating experiment was run to find out to what extent the patient's accent was considered non-native. Additionally, the listening panel was asked to indicate the mother tongue of the FAS speaker. Such experiments have previously only been reported in four other studies (Di Dio et al., 2006; Kanjee et al., 2010; Verhoeven et al., 2013: rating and attribution experiment; Dankovičová and Hunt, 2011: rating experiment). We are convinced that perceptual assessment reinforces the diagnosis of FAS and that it may provide new insights into the perceptual impression(s) created by FAS in the ear of the beholder (Verhoeven et al., 2013).

The patient gave written informed consent to report the medical data. All the tests reported below are part of the standard, clinical neurolinguistic work-up in patients with speech and language disorders at ZNA Middelheim general hospital. Speech recordings were also made to allow for better follow-up. The patient gave written consent to use recorded speech samples for the perceptual evaluation in a public environment.

#### Case Presentation and Medical History

SB is a 33-year-old, right-handed, monolingual French-speaking lady, originating from a village in the francophone Walloon part of Belgium near the Flemish border. She was raised in French and her parents were monolingual French-speaking Belgians. From a neurological perspective, growth and development were unremarkable. There was no family history of neurodevelopmental disorders or learning disabilities. She had always obtained normal school results and had an educational level of 12 years. She consulted the neurology department in November 2013 because of a "Dutch or German-like accent," which she acutely developed approximately 6 months after she was hit by a car while crossing the street to deliver orders from the bakery where she worked as a saleswoman. A few months after the accident occurred, the patient mentioned an "abrupt change of personality." She considered her behavioral change as the cause for her sudden dismissal at work. There had been serious disagreements with colleagues, customers, as well as with her line manager. She was dismissed in June 2012. It was shortly after her dismissal that she developed a foreign accent.

The accident happened in December 2011. There had been no loss of consciousness. Apart from some superficial subcutaneous hematomas in the frontal and right peri-orbital region, clinical examination on admission to the hospital was normal. CT scans of the brain and spinal cord were normal. A diagnosis of minor head trauma was made. One week later, the patient started suffering from increasingly painful headaches (possibly a post-traumatic migraine, see: Weiss et al., 1991) and a desensitization of the scalp. She complained of vertigo and was hospitalized for 3 days. The clinical neurological examination on admission was normal. Laboratory investigations (blood and urine), EEG and CT were normal as well. She was diagnosed with a post-concussion syndrome, benign paroxysmal vertigo (positive Hallpike test) and a cervical trauma. Approximately 1 month later the symptoms were still present. She identified several regions of hyperaesthesia and anesthesia in the facial area and the scalp. The vertigo had receded, but she complained of severe neck and shoulder pain. Approximately 4 months after the accident, she consulted a neurologist again. The clinical neurological examination and EEG revealed no abnormalities. During this visit, the patient mentioned that she felt she had become "someone else" after the accident, with regular aggressive outbursts toward family, friends, strangers, and clients. The patient complained about attention deficits and permanent fatigue. She also mentioned that the intensity of the accent was fluctuating: the accent was heavier when she was tired.

Due to the persistence of her complaints with respect to her accented speech and memory, the patient was referred to hospital for additional radiological examinations. In November 2012, she underwent a sagittal T1-weighted and axial FLAIR, diffusion, SWI, proton density and T2-weighted MRI of the head, a coronal FLAIR MRI perpendicular to the axes of the left and right hippocampi, as well as an angio-MRI of the brain and 3D TOF of the circle of Willis. The qualified radiologist reported that all acquisitions were normal.

In November 2013, she consulted our department because of the persistence of the accent change and cognitive complaints (attention problems and episodes of confusion). At a linguistic level she suffered from word-finding difficulties and morphological problems related to article-noun agreement (she did not differentiate between the masculine and feminine forms of the definite article). According to her, listeners had the impression that she spoke with a Dutch accent. Her previous customers, for instance, had perceived her as a native Dutch-speaking Belgian and repeatedly asked her why she spoke French instead of "Flemish" (the Belgian variant of Dutch; see: Verhoeven, 2005). She still suffered from behavioral changes and avoided social contact with her family and friends because of a lack of interest on her part. Yet, she was looking for more excitement in life, as well as a more frivolous, out-going lifestyle. She said she was deeply bored. In addition, a number of depressive symptoms were mentioned including apathy, loss of drive and initiative, and mood-swings.

#### Neuropsychological Testing

The first neuropsychological assessments were carried out approximately 1 year after the accident in January 2012 (see **Table 1** for an overview of the results). The test battery consisted of the Wechsler Adult Intelligence Scale-IV (WAIS-IV; Wechsler, 2011, French Ed.), the d2-test (Brickenkamp and Zillmer, 1998), the "Barrage de Zazzo" (Zazzo, 1974), the Stroop Test (Stroop, 1935), the Wisconsin Card Sorting Test (WCST; Grant and Berg, 1948), and the California Verbal Learning Task (CVLT; Delis et al., 2000). Repeated neuropsychological testing in 2014 consisted of the Wechsler Memory Scale-Revised (Wechsler, 1987), the Boston Naming Test (Kaplan et al., 1983), the Trail Making Test (Reitan, 1958), and the d2-test.

A full scale IQ (FSIQ) of 105 was found with a significant discrepancy of 24 IQ-points between the verbal (96) and performance IQ level (120). All subtest scores were within the normal range. Executive function (mental flexibility, frontal problem solving) was tested by means of the Stroop and the WCST. She obtained a normal result on the WCST, but depressed scores on the Stroop with slowed processing in the color naming condition (Z-score = −1.5 SD), interference condition (Z-score = −1.6 SD), and flexibility condition (Z-score = −1.7 SD). Tests measuring sustained visuo-motor and selective attention (d2-test in 2012/2014 and the "test de barrage de Zazzo") were performed at a slow pace. Scores for total items treated for the d2-test (2012: Z = −3.08 SD; 2014: Z = −2.44 SD) as well as the total items corrected (2012: Z = −2.94 SD; 2014: Z = −2.20 SD) were in the pathological range. As shown by the CVLT, verbal memory was intact, although the patient obtained borderline results for the "total recollection" (5 trials) of List A (Z-score = −1.49 SD). On other subtasks of the CVLT she obtained normal results (+1 SD: Cued recall A, Delayed recall A, Cued delayed recall A, Recognition).

In 2014, a significant discrepancy between a very superior visual memory index (= 133) and a clinically deficient verbal memory index (= 74; −1.7 SD) was found on the WMS-R. As reflected by a general attention index of 70 (−2 SD), scores on the WMS-R attention tasks were in the deficient range. The Trail Making Test (part A and B) disclosed low average visual search (< pct. 10) and mental flexibility (pct. 20). Sustained visuo-motor attention scores were within the defective range. Performance on the BNT was normal. Overall, the data for the test session in 2014 were in line with the results obtained in 2012.
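The SD values quoted alongside the WMS-R indices can be reproduced with a simple conversion. The following is a minimal sketch, assuming the conventional index-score norm of mean 100 and SD 15 (the text does not state the norm explicitly); the function name `index_to_z` is illustrative only.

```python
def index_to_z(index_score, mean=100.0, sd=15.0):
    """Convert a standardized index score to a z-score.

    Assumption: the index is normed to mean 100 and SD 15.
    """
    return (index_score - mean) / sd


# Index values as reported for the 2014 WMS-R session.
for label, score in [("verbal memory", 74),
                     ("general attention", 70),
                     ("visual memory", 133)]:
    print(f"{label}: index {score} -> z = {index_to_z(score):+.1f} SD")
```

Under this assumption the verbal memory index of 74 corresponds to about −1.7 SD and the attention index of 70 to −2 SD, which matches the values given in the text.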

#### Psychodiagnostic Assessment

The psychodiagnostic assessment consisted of an interview with an experienced clinical psychologist (LDP), which was followed some time later by a session during which the patient was asked to respond to a series of standardized questionnaires. These questionnaires were completed at the hospital, without the help of the examiner. Testing included the Minnesota Multiphasic Personality Inventory-2 (MMPI-2: Butcher et al., 1989), the Defense Style Questionnaire (DSQ-60: Thygesen et al., 2008), the Rotter Incomplete Sentences Blank (RISB-FR: Rotter et al., 1992), the Beck Depression Inventory-II (BDI-2: Beck et al., 1996), the Pathological Narcissism Inventory (PNI: Pincus et al., 2009; French version: Diguer et al., 2014), and the Narcissistic Personality Inventory-40 (NPI-40: Raskin and Hall, 1979).

Furthermore, symptom validity and self-presentation tests were carried out by means of the List of Indiscriminate Psychopathology (LIPP: Merten and Stevens, 2012), and the Supernormality Scale (SS: Cima et al., 2003). The LIPP is an experimental questionnaire, which measures calibration problems. It consists of questions addressing pseudo- and real symptoms (Merckelbach et al., 2013). Malingering participants are in doubt as to which symptoms they can report and which ones they cannot. The SS is a questionnaire, which evaluates deception or denial under the guise of giving socially desirable answers (Cima et al., 2003).

#### TABLE 1 | Neuropsychological test results for the years 2012 and 2014.

FSIQ, full scale IQ; WAIS-IV, Wechsler Adult Intelligence Scale-IV; WMS-R, Wechsler Memory Scale; Stroop, Stroop task.

During intake the patient gave evidence of disinhibition which mainly manifested itself as laughing without reason, Witzelsucht and inappropriate comments. The patient was reticent and maintained a (psychologically immature) defensive attitude throughout the entire interview. Her thoughts were preoccupied by frustration about her own situation. The interview was dominated by her feelings concerning her increased impulsiveness, aggressiveness and apathetic demeanor vis-à-vis her family, former boss, and colleagues. The examiner noticed that a topic which rendered her frustrated led to an emotional breakthrough during which she lost the "Dutch/Flemish-like" accent. The patient's interview contained numerous contradictions (e.g., stating at first that she was a very lively, out-going person, but when asked later what she did during the day, she answered that she sat in a chair as all personal contact bored her and conversations with others, even friends, were too difficult and tiring). The description of her emotional and family life remained superficial and prosaic. The interview revealed increasing relational problems. The relationship with her husband left her "unaffected" and relationships with friends, family and relatives were unstable, marked by serious rows in which she responded unpredictably.

She confirmed egocentric and narcissistic tendencies. It was not possible to detect signs of perceptual aberration or other florid psychotic symptoms. A few weeks after the interview, a series of standardized psychodiagnostic tests were administered. Symptom validity and self-presentation tests, such as the List of Indiscriminate Psychopathology and the Supernormality Scale, did not yield indications for (conscious or unconscious) manipulation. Personality testing indicated a wide, undifferentiated personality disturbance. Interestingly, scores on both narcissism measures (NPI and PNI) were at the extreme upper end, which is consistent with her answers during the clinical interview. A thymic disturbance and affective lability were objectified (APA, 2000; DSM-IV-TR, Axis I), but test results did not unequivocally point toward a well-defined personality disturbance. Clinically, however, the patient gave clear indications of highly dependent, histrionic and borderline personality characteristics (APA, 2000; DSM-IV-TR, Axis II). On a psychodynamic structural level, she was considered to have a borderline personality organization level of functioning (Kernberg, 1984), because of an immature defensive functioning, intact reality testing, but severe lack of personality integration. This is relevant in relation to (interpersonal) acting out and poor bodily representation. The overall clinical presentation seemed chronic, pervasive and well established throughout her psychic development.

#### Perceptual Analysis of Spontaneous Speech Sample

A post-morbid speech sample was recorded in November 2013. It consisted of 5 min of video-recorded spontaneous speech, which was selected from an interview with the patient. In this interview she talks about her accent change and her relational and professional problems. This sample consisted of 644 words (including filled pauses). The patient also provided two (short) pre-morbid speech samples consisting of 43 and 26 s of conversational speech dating from April and July 2011, i.e., approximately half a year before the accident. When comparing pre- and post-morbid speech samples a number of striking differences were found. The first was a very strong trill in the realization of the uvular [R] (36/644). The trill is too strong for French, and is more typical of German and some regional variants of Dutch. According to Van de Velde and Van Hout (1999, p. 178) "realizations of /r/ in standard Dutch until recently were the trilled realizations [R] and [r], with the uvular trill gaining in frequency and prestige especially in the Netherlands (Van Haeringen, 1924; Zwaardemaker and Eijkman, 1928; Blancquaert, 1934; Hol, 1951; Damsteegt, 1969; Mees and Collins, 1982; Vieregge and Broeders, 1993), but recently also in Flanders (Rogier, 1994)." For German this variant has been described as the most common allophone (Hall, 1993): the uvular trill-R constitutes a free (dialectal) variant of /r/, existing alongside the approximant /r/ (see Hall, 1993; Schiller, 1998). The excess trilling is particularly common in prevocalic position (raconter, renverse, traite, ...: 27x), less frequent in intervocalic position (direct: 1x) and postvocalic position (renverse, quart, ...: 8x).

On the suprasegmental level, speech rate and articulation rate were particularly slow (speech rate: 2.67 syll/s, articulation rate: 3.813 syll/s). Avanzi et al. (2012) found a mean speech rate of 4.7 syll/s (SD: 0.7) and an articulation rate of 5.6 syll/s (SD: 0.6) for Belgian French of the Tournai region, the region our patient originated from. Melody and intonation appeared normal. In order to analyze rhythm, the Pairwise Variability Index was calculated (Low et al., 2001). Vocalic PVI amounted to 54.3. This is considerably higher than the accepted value for French (43.5), and is more in the range of the stress-timed languages, such as English (57.2), or German (59.7). However, the value is substantially lower than 65.5, which is the reference value for Dutch (Grabe and Low, 2002). It is also worth mentioning that the patient did not realize any liaisons, a phenomenon by which a latent word-final consonant preceding a word starting with a vowel becomes audible. Our patient failed to realize this connection for "c'est arrivé" and "tout est important". Moreover, she did not realize the elision<sup>1</sup> in "j'entends" (pronounced as "je<sup>∗</sup> entends").
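The vocalic PVI mentioned above is usually computed in its rate-normalized form: the mean of the absolute difference between successive vocalic interval durations, each divided by the mean of the pair, multiplied by 100. The sketch below illustrates that calculation under the assumption that the normalized variant was used; the function name `npvi` and the duration lists are hypothetical and are not patient data.

```python
def npvi(durations):
    """Normalized Pairwise Variability Index for successive interval durations.

    Each adjacent pair contributes |a - b| / mean(a, b); the result is the
    mean of these terms scaled by 100.
    """
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical vocalic interval durations in milliseconds (illustration only).
even = [100, 100, 100, 100]        # no durational contrast between neighbors
alternating = [60, 140, 60, 140]   # strong long/short alternation

print(npvi(even))
print(npvi(alternating))
```

Equal neighboring durations yield an index of 0, whereas marked alternation between short and long vowels raises it. This is why a value of 54.3 places the patient's speech between the reference value for French and those for stress-timed languages.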

Grammar was perceived to be more simplistic than would be expected from a native speaker of French. Sentences were perceived to be very short. At the morphosyntactic level the patient omitted the article "le" (1/644) as well as "de" in "là dedans," which was realized as "là dans" (2/644). In addition, the patient made six morphological errors involving the definite article. In 5 instances, the patient used the masculine definite article instead of the feminine form (la même chose → le même chose; la tête → le tête; ma maison → mon maison; la pire chose → le pire chose; la chose → le chose).

# PERCEPTUAL ASSESSMENT OF THE FOREIGN ACCENT

#### Aims

The foreign accent of the patient was assessed by a listening panel, who listened to speech stimuli of the patient mixed with those of a native speaker of French and three non-native speakers with a clear foreign accent. The panel was asked to rate the degree of foreignness and to identify the mother tongue of each of the speakers. The ratings provide additional support for the diagnosis of FAS, whereas the accent attribution gives an indication of whether naive listeners are able to perceptually identify the mother tongue of native (including the FAS patient) and non-native speakers of French. Furthermore, we wanted to investigate whether there would be any differences between the FAS patient, the true non-native speakers and the native speaker of French.

#### Methods

#### Materials and Samples

Thirty students of French linguistics were recruited at the Université Libre de Bruxelles (ULB) in Brussels (age: 16–24, mean age: 20 years, 12 male and 18 female) and they were asked to rate the degree of "foreign-ness" of five speakers and to determine their native language. The students had no formal experience with speech and language pathology.

The stimuli for this experiment were taken from the intake interview, in which the patient explains what had happened to her (accident), and elaborates on her relational and professional problems. From this interview, 6 words, 3 phrases, and 6 sentences were chosen (see also: Dankovičová and Hunt, 2011). Care was taken that (a) the medical status of the FAS patient could not be derived from the stimuli and (b) the stimuli did not contain any morphological mistakes (as this could possibly influence the ratings of the listener panel). Stimulus selection was carried out by means of PRAAT, version 5.4 (PRAAT for Mac; Boersma and Weenink, 2014).

#### Speakers

The speakers in this experiment were the FAS patient and four control speakers (**Table 2**) who were matched for gender with the FAS patient. The mean age of the controls was 35 years and 10 months, with an age range from 27 to 48 years old. Two speakers were Belgian but one was French-speaking and the other was Dutch-speaking (or "Flemish"; see also: Verhoeven, 2005). A third control subject spoke both Dutch and (American) English, as she was born in the USA, but moved to the Netherlands 1 year later. She was raised in English, but her education as of the age of 3 had been entirely in Dutch (100% immersion; early bilingual; see also: Bhatia and Ritchie, 2013). She no longer had contact with relatives in the USA and lived alone in the Netherlands. She considered Dutch to be her dominant language. The fourth speaker was a Russian female. No attempt was made to match the accents to those that had been informally reported for the FAS patient. It was regarded likely that most listeners were familiar with the foreign accents of the control speakers. The control speakers read the 15 stimuli that had been selected from the speech of the FAS patient. The stimuli were recorded by means of a Marantz Professional PMD 661 portable recorder and manipulated for the purpose of this experiment via PRAAT (version 5.4, 2014).

#### Stimuli and Assessment

<sup>1</sup> In English, the term elision is sometimes used as a synonym for deletion (e.g., Miller et al., 2006). For the current article, we make a distinction between a "deletion" and an "elision" (French: "élision"), which is "the suppression of a word-final vowel preceding a word starting with a vowel" (in spoken French this can refer to actual vowels, or the latent word-initial "h" preceding a vowel, with a few exceptions; Schane and Filloux, 1967, p. 37, our translation).

TABLE 2 | Overview of the demographic characteristics of the FAS patient and the healthy, matched controls, including an indication of the level of French (CEFR, Common European Framework of Reference for Languages).

\*Control 3 moved to the Netherlands one year after she was born. She was raised in English and learned Dutch as of the age of 3. Her education (immersion, 100%; early bilingual) has been entirely in Dutch.

TABLE 3 | Overview of mean, median, standard deviations, minimum, maximum, range and interquartile range for the scores attributed to each speaker on a seven-point scale: 1, Definitely not a native speaker of French; 7, Definitely a native speaker of French.

The perception experiment contained a total of 75 stimuli, i.e., 15 stimuli × 5 speakers. Each presentation block consisted of one stimulus read by the five different speakers. The order of the speakers differed for each block (in pseudo-random order). The stimuli were separated by a 15 s pause to provide time for listeners to record their judgments. Total duration was 26 min 26 s. The stimuli were played to the listeners in open field at their institution. The instructions to the test were given orally to the listening panel, but they were also able to read them. Raters provided demographic information (age, gender, country of origin, time in Belgium if not born there, mother tongue, and other spoken languages including an indication of proficiency) in a short questionnaire. For the experiment, they were asked to first rate the "foreign-ness" of the speaker on a scale from 1 to 7. This scale is to be interpreted as a continuum ranging from "definitely not a native speaker of French" (= 1) to "definitely a native speaker of French" (= 7). If their response was anything other than 7, they were asked to indicate the mother tongue of the speaker (second part).
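The block structure of the perception experiment can be sketched as follows. This is an illustration only: the text does not say how the pseudo-random speaker order was generated, so the seeded shuffle, the function name `build_blocks` and the speaker labels are assumptions.

```python
import random

SPEAKERS = ("FAS", "French", "Dutch(Be)", "English/Dutch(Nl)", "Russian")


def build_blocks(n_stimuli=15, speakers=SPEAKERS, seed=0):
    """Return one block per stimulus; each block lists all speakers in shuffled order.

    Assumption: a seeded shuffle stands in for the unspecified pseudo-random order.
    """
    rng = random.Random(seed)
    blocks = []
    for stimulus in range(1, n_stimuli + 1):
        order = list(speakers)
        rng.shuffle(order)
        blocks.append([(stimulus, speaker) for speaker in order])
    return blocks


blocks = build_blocks()
n_trials = sum(len(block) for block in blocks)

# Assuming one 15 s response pause per stimulus presentation.
pause_seconds = n_trials * 15

print(n_trials)                   # 15 stimuli x 5 speakers
print(divmod(pause_seconds, 60))  # pause time as (minutes, seconds)
```

If one 15 s pause followed every presentation, the pauses alone would account for 18 min 45 s of the reported 26 min 26 s, leaving the remainder for the audio itself.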

#### Results

#### Statistical Analysis of the Accent Rating Experiment

The data were processed statistically in SPSS version 22 (IBM Corp., 2013). First, inter-rater reliability was tested for each speaker by calculating the intraclass correlation coefficient (ICC). A two-way random model was chosen, as each item was assessed by each of the 30 raters and raters represented a randomly selected sample. Data were checked for agreement, which implies that systematic differences between raters were taken into account. For FAS: ICC(2, 30) = 0.94, French: ICC(2, 30) = 0.903, Dutch(Be): ICC(2, 30) = 0.955, English/Dutch(Nl): ICC(2, 30) = 0.959, and for Russian: ICC(2, 30) = 0.523.
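For readers who want to see what an ICC(2, k) involves, the sketch below computes the two-way random-effects, absolute-agreement, average-measures coefficient from the ANOVA mean squares. It is not the SPSS procedure used in the study, and the small ratings matrix is a toy example, not study data; the function name `icc2k` is illustrative.

```python
def icc2k(ratings):
    """ICC(2,k): two-way random effects, absolute agreement, average of k raters.

    `ratings` is a list of rows: one row per rated item, one column per rater.
    """
    n = len(ratings)       # number of rated items
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for items (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # The rater mean square in the denominator penalizes systematic
    # differences between raters (absolute agreement).
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


# Toy matrix: 6 items rated by 4 raters (illustration only).
toy = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc2k(toy), 2))
```

In the study's design each speaker's matrix would have 15 rows (stimuli) and 30 columns (raters), which is what the notation ICC(2, 30) refers to.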

**Table 3** provides a summary of the descriptive statistics including means, standard deviations, minima and maxima, range as well as interquartile range for each of the five speakers. Based on the means (x̄) as well as the medians (M), it is clear that the FAS speaker is situated roughly in the middle of the seven-point scale (x̄ = 3.791; σ = 2.318 and M = 4). The standard deviation was high, which indicates that the raters may have experienced some difficulty identifying the accent.

Application of the Kolmogorov-Smirnov test indicated that the data were not normally distributed (Kolmogorov-Smirnov: p < 0.1). Hence, non-parametric testing was applied. A Kruskal-Wallis H test was carried out to test whether there was a significant difference between the scores attributed to the different speakers. Results for the Kruskal-Wallis H test indicated that this was the case: H(5) = 1393.60, p < 0.0001. Additional Mann-Whitney U tests (see **Table 4**) were then carried out to identify which speakers differed significantly from each other and which did not. All speaker differences were significant (p < 0.0001), except for one: Dutch (Be) and English/Dutch(Nl) (p > 0.003: Bonferroni correction; p = 0.290).

A correspondence analysis was performed to obtain a two-dimensional image of the strength (distance) of the associations between ratings and speakers, based on frequency counts (**Table 5**: correspondence table; **Figure 1**). This showed that the association between the native French speaker and rating "7" was particularly strong. The FAS speaker was situated more toward the higher ratings (4, 5, 6, 7) than, for instance, both native Dutch speakers and even markedly more so than the Russian speaker (strongly associated with rating "1"), who clearly occupied a more isolated position on the two-dimensional plot.

#### Mother Tongue Identification

It appeared that only 50% of the raters (n = 15/30) had indicated the mother tongue of each speaker for each stimulus. Nine raters were female, and six were male (age range: 16–23 years; mean age: 19 years). **Figure 2** shows the different accents associated with the different speakers. Exact numbers can be found in **Table 6**.

TABLE 5 | Correspondence table with frequency data for the different speakers.

1, Definitely not a native speaker of French; 7, Definitely a native speaker of French.

In general, the FAS patient was less often identified as "French" (n = 61/225; 27.1%) than as a speaker of other languages (72.9%). However, the other languages attributed to the FAS patient were most often Romance languages (Spanish: n = 21; Italian: n = 21; Portuguese: n = 5; Romanian: n = 4); together with the "French" attributions, Romance languages accounted for 112 of the 225 responses (49.8%). Still, she was identified as Dutch in 21.3% of the stimuli (n = 48/225), and as German in 4.4% of the stimuli (n = 10/225). The FAS patient was less often identified as "French" than the native French speaker (n = 214/225; 95.1%), which corroborates the findings for the first part of the study. Hence, there seemed to be a clear difference in the perception of the FAS patient and the nonimpaired French control speaker. The Dutch (Be) speaker was associated with "Dutch" in 53.3% (n = 120/225) of the stimuli, whereas for the English/Dutch(Nl) speaker this was 28% of the stimuli (n = 63/225). In 8.4% of the stimuli she was associated with "English" (n = 19/225). The Russian speaker was correctly associated with her native language in 28.4% (n = 64/225) of the stimuli.

Interestingly, the accent stratification was most diverse for the FAS patient (16 different mother tongues were associated with her stimuli). For the other speakers, the number of attributed accents was: English/Dutch (Nl): 15; Russian: 13; Dutch (Be): 12; and French: 5. Equally interesting is that the accent of the FAS patient could not be identified in 30 items, which is considerably more often than for the control speakers: French: 2; Dutch (Be): 14; English/Dutch (Nl): 16; Russian: 1.
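The percentages reported in this section follow directly from the counts given in the text. As a quick consistency check, a minimal Python sketch that recomputes them is given here; the dictionary labels and the function name are illustrative choices, not part of the original analysis.

```python
# Counts copied from the text: number of stimuli (out of 225 per speaker)
# on which listeners named the language indicated in the label.
TOTAL = 225

identified_as = {
    "FAS patient, labeled French": 61,
    "French control, labeled French": 214,
    "Dutch (Be), labeled Dutch": 120,
    "English/Dutch (Nl), labeled Dutch": 63,
    "Russian, labeled Russian": 64,
}

# Items for which listeners could not identify any mother tongue.
unidentified = {
    "FAS patient": 30,
    "French control": 2,
    "Dutch (Be)": 14,
    "English/Dutch (Nl)": 16,
    "Russian": 1,
}

def percentage(count, total=TOTAL):
    """Return count/total as a percentage rounded to one decimal."""
    return round(100.0 * count / total, 1)

for label, n in identified_as.items():
    print(f"{label}: {n}/{TOTAL} = {percentage(n)}%")
for label, n in unidentified.items():
    print(f"{label}, not identified: {n}/{TOTAL} = {percentage(n)}%")
```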

# DISCUSSION

This article discusses the case of a patient who developed FAS in the absence of demonstrable damage to the central nervous system. No structural damage was visible on repeat CT and MRI of the brain. Repeat neurological and neurophysiological examinations were normal. An in-depth psychodiagnostic workup was carried out (a) to confirm the existence of psychological issues and (b) to identify a possible psychiatric disorder. Unfortunately, testing did not reveal a clearly delineated disorder on either axis I or II of the DSM-IV-TR (APA, 2000). Test results were, however, indicative of a highly dependent, hysterical and borderline personality. Although psychological problems were considered persistent and chronic, there were several elements in the clinical interviews that could corroborate the hypothesis of a psychogenic origin of the accent change.


First, the accent diminished whenever there was a psychological breakthrough during the clinical interview (Avbersek and Sisodiya, 2010; see also: Keulen et al., 2016<sup>2</sup>). More specifically, these episodes occurred when the patient talked about her relational problems, issues at her former workplace and the fact that she no longer had a job at the moment the interview took place. Interestingly, the negative impact of emotional disequilibrium, feelings of stress and/or anxiety on the recovery process has previously been established for neurological speech and language disorders (see also: Cahana-Amitay and Albert, 2015). In contrast, the current patient seemed to benefit from these emotional triggers.

Second, there was a correspondence between the culmination of disputes with her line manager, which ultimately led to her dismissal, and the onset of the accent: both occurred approximately 6 months after the accident.

Third, and related to the previous argument, the increased emotional lability and hysterical symptoms may have been reinforced by the adverse life events that had marked her life in rapid succession: the car accident, the accent shift, the dismissal, and the relational problems. According to Avison and Turner (1988) the relationship between adverse life events and psychological distress is often underestimated. According to Charles et al. (2013) even naturally occurring daily stressors or minor affective experiences can have a far-reaching impact on mental health (p. 739). It is important to note that at the time we saw the patient, she had been unemployed for about a year and a half and had marital problems.

2 In this recently published article, another case of psychogenic FAS is presented. The patient suddenly lost her accent during a temper tantrum.

TABLE 6 | Overview of the mother tongues (rows) associated with each of the speakers (columns).


Languages pertaining to the same language family have been grouped together.

The patient had repeatedly complained about (sustained) attentional and amnestic problems, as well as slow cognitive processing. These complaints were confirmed by neuropsychological test results: the patient demonstrated impaired processing on the cognitive tasks appealing to working memory, attention, and executive function. Such complaints have been noted regularly in psychogenic FAS patients (Poulin et al., 2007; Cottingham and Boone, 2010; Jones et al., 2011; case Roy et al., 2012) and have more generally been associated with somatic disorders (Niemi et al., 2002; Trivedi, 2006; Demir et al., 2013). However, studies claiming such an association have been the subject of scientific scrutiny, because hardly any of them administered symptom validity tests to their participants prior to inclusion. Delis and Wetter (2007) suggest that patients with psychogenic disorders may exaggerate cognitive deficits because of external incentives (medico-legal reasons, treatment), internal/interpersonal incentives (in order to sustain a dependent relationship with a specialist or another person), or even for unspecified reasons ("not otherwise specified"). The current patient completed symptom validity tests, which turned out negative for malingering and feigning. Moreover, neurocognitive testing was carried out on two occasions (2 years apart, both post-morbid). This is crucial, as significant underperformance or inconsistencies in cognitive test scores or profiles across repeated evaluations would be considered indicative of a feigned cognitive deficit.

In the case of our patient the profile seems mostly consistent with a post-concussion cognitive syndrome after a minor head trauma. The objectively attested cognitive deficits and the neglect of the cognitive complaints after prior examinations might also have contributed to the development of the FAS.

On a linguistic level, the patient's speech was characterized by the realization of the uvular R with a marked, atypical trill, and she occasionally deleted phonemes. Furthermore, the patient spoke at a very slow speech rate and had a speech rhythm that was qualified as stress-timed, whilst French is a syllable-timed language (Grabe and Low, 2002). The segmental and suprasegmental characteristics noted for this patient do not seem to be restricted to a psychogenic population: all have been attested for neurogenic FAS patients as well. However, the isolated morphological deficits, which irregularly affected articles, and the occasional pronunciation deficits affecting liaisons and elisions (phenomena typifying French) seemed incredible. The grammatical deficit is very different from, and less substantial than, the agrammatism and paragrammatism encountered in aphasics, for instance. Some degree of conscious or subconscious manipulation cannot be ruled out. Incredible grammatical disorders of this kind have previously been reported in other psychogenic cases (e.g., Van Borsel et al., 2005; Cottingham and Boone, 2010).

Some speech characteristics might have been consistent with the impression of a Dutch or German accent. However, results of the listening experiment suggest that the patient was perceptually situated midway between a true non-native speaker of French and a native speaker of French. This finding is in line with what has been found in the experiments of Di Dio et al. (2006), Kanjee et al. (2010) and Verhoeven et al. (2013). However, the methodology in Verhoeven et al. (2013) and Kanjee et al. (2010) differed from the current one in the sense that they did not select words and sentences in pseudo-random order, but provided raters with spontaneous speech samples (Verhoeven et al., 2013) or elicited (read) sentences (Kanjee et al., 2010). For Di Dio et al. (2006) it is not clear what type of stimuli was used. The methodology in the present experiment was more comparable to the approach of Dankovičová and Hunt (2011), who used single words and phrases. As far as the identification of the linguistic background of the speakers is concerned, it was found that the FAS patient was associated with French in only 27.1% of the stimuli. The French-speaking control subject on the other hand was almost always recognized as French (95.1%). The patient also demonstrated the most diverse association patterns regarding her native language. The "uncertainty" expressed in the first part of the experiment (M = 4) compares well with the second part of the study: the patient was associated with 16 different possible native languages, and for 13% of the items, the mother tongue could not be identified. Furthermore, the hypothesis that the patient was perceived as Dutch or German was not entirely confirmed, as most listeners still perceived her as being a native speaker of a Romance language (for 49.8% of the stimuli, including French, Italian, Spanish, Portuguese, and Romanian; in comparison: Germanic languages, including English, Dutch and German: 28.4% of the stimuli).

Remarkably, the patient did not seem bothered by the accent change at all. Nevertheless, there were clear problems at the cognitive-behavioral and psychological level (mentioned above). Moreover, she was not keen to be treated for the condition. Rather, she wanted to show it off. She did not seem to be overtly concerned about her symptoms. This is unlike what is mostly seen in neurogenic patients, who are emotionally and psychologically affected by FAS (Miller et al., 2011). In fact, to the best of our knowledge, there are only two other reports (Laures-Gore et al., 2006; Tailby et al., 2013) in which it was mentioned that patients were almost completely indifferent to the negative implications. These cases were classified as "mixed FAS" by Verhoeven and Mariën (2010): these patients further optimize their accent and often start to use words of the language suggested by their accent, in order to create a more authentic personality. The use of foreign-sounding words or a more formal language variant has also been noted for psychogenic FAS patients (Reeves and Norton, 2001; Poulin et al., 2007; see also: Reeves et al., 2007, case 3; Polak et al., 2013). The processes which invoke this kind of change in language use still remain to be clarified. For neurogenic cases, some positive associations have also been noted. According to some patients, living with FAS opened new horizons. However, in the longer term, the negative perceptions from others, the hybrid identity, a loss of sense of belonging, a breakdown of relationships, and the incapacity of medical staff to explain the change all lead to frustrations (Miller et al., 2011).

Patient coping strategies, psycho-emotional and -social implications have generally been underreported in the literature about both psychogenic and neurogenic FAS (for neurogenic patients: Munson, 2005; Miller et al., 2006, 2011; Moreno-Torres et al., 2013). Future research should identify and study the effects of this syndrome at the personal and inter-personal level to allow for a full rehabilitation of both speech profile and psychological well-being.

#### CONCLUDING REMARKS

Only a handful of putative psychogenic FAS cases have been described in the literature and many researchers have been hesitant to conclude that the underlying etiology is psychogenic. Although it is hard to provide evidence for a direct causal link between the psychological factors at play and FAS, ample evidence exists that the FAS symptoms (and their course) in this patient are of a psychogenic nature: (1) clear absence of (visible) neurological damage or clinical evidence for a neurological disorder, in conjunction with (2) the presence of psychological and psychiatric factors, (3) the timing of the onset of the accent change, (4) the atypical and fluctuating symptom course, (5) irregular and incredible morphological mistakes occurring in a short sample of spontaneous speech, and the fact that (6) the patient was unconcerned by the change of accent. As most of the psychogenic FAS cases were published in the last decades, reports of cognitive-behavioral deficits such as the ones displayed by the current patient are becoming increasingly important with a view to the development of proper therapeutic approaches for this psychogenic FAS population.

#### AUTHOR CONTRIBUTIONS

Acquisition of data: SK, LDP, and PM. Analysis and interpretation of data: SK, JV, RJ, LDP, RB, and PM. Drafting the manuscript: SK and PM. Critical manuscript revision: all authors. Critical revision of reviewed manuscript: SK, JV, and PM. Final manuscript approval: SK and PM on behalf of all authors.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Keulen, Verhoeven, De Page, Jonkers, Bastiaanse and Mariën. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Loss of regional accent after damage to the speech production network

*Marcelo L. Berthier1\*, Guadalupe Dávila1,2, Ignacio Moreno-Torres1,3, Álvaro Beltrán-Corbellini1, Daniel Santana-Moreno1, Núria Roé-Vellvé4, Karl Thurnhofer-Hemsi4,5, María José Torres-Prioris1, María Ignacia Massone6 and Rafael Ruiz-Cruces1*

*<sup>1</sup> Cognitive Neurology and Aphasia Unit and Cathedra Foundation Morera and Vallejo of Aphasia, Centro de Investigaciones Médico-Sanitarias, University of Malaga, Malaga, Spain, <sup>2</sup> Department of Psychobiology and Methodology of Behavioural Sciences, Faculty of Psychology, University of Malaga, Malaga, Spain, <sup>3</sup> Department of Spanish Language I, University of Malaga, Malaga, Spain, <sup>4</sup> Molecular Imaging Unit, Centro de Investigaciones Médico-Sanitarias, General Foundation of the University of Malaga, Malaga, Spain, <sup>5</sup> Department of Applied Mathematics, Superior Technical School of Engineering in Informatics, University of Malaga, Malaga, Spain, <sup>6</sup> Centro de Investigaciones en Antropología Filosófica y Cultural, Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina*

Lesion-symptom mapping studies reveal that selective damage to one or more components of the speech production network can be associated with foreign accent syndrome, changes in regional accent (e.g., from Parisian accent to Alsatian accent), stronger regional accent, or re-emergence of a previously learned and dormant regional accent. Here, we report loss of regional accent after rapidly regressive Broca's aphasia in three Argentinean patients who had suffered unilateral or bilateral focal lesions in components of the speech production network. All patients were monolingual speakers with three different native Spanish accents (Cordobés or central, Guaranítico or northeast, and Bonaerense). Samples of speech production from the patient with native Córdoba accent were compared with previous recordings of his voice, whereas data from the patient with native Guaranítico accent were compared with speech samples from one healthy control matched for age, gender, and native accent. Speech samples from the patient with native Buenos Aires accent were compared with data obtained from four healthy control subjects with the same accent. Analysis of speech production revealed discrete slowing in speech rate, inappropriately long pauses, and monotonous intonation. Phonemic production remained similar to that of healthy Spanish speakers, but phonetic variants peculiar to each accent (e.g., intervocalic aspiration of /s/ in Córdoba accent) were absent. While the basic prosodic features of Spanish were preserved, features intrinsic to the melody of certain geographical areas (e.g., rising end F0 excursion in declarative sentences intoned with Córdoba accent) were absent. All patients were also unable to produce sentences with different emotional prosody. 
Brain imaging disclosed focal left hemisphere lesions involving the middle part of the motor cortex, the post-central cortex, the posterior inferior and/or middle frontal cortices, insula, anterior putamen and supplementary motor area. Our findings suggest that lesions affecting the middle part of the left motor cortex and other components of the speech production network disrupt neural processes involved in the production of regional accent features.

Keywords: speech production, regional accent, foreign accent, motor speech disorder, stroke

#### *Edited by:*

*Srikantan S. Nagarajan, University of California, San Francisco, USA*

#### *Reviewed by:*

*Peter Sörös, University of Western Ontario, Canada Caroline A. Niziolek, Boston University, USA*

> *\*Correspondence: Marcelo L. Berthier mbt@uma.es*

*Received: 18 May 2015 Accepted: 23 October 2015 Published: 05 November 2015*

#### *Citation:*

*Berthier ML, Dávila G, Moreno-Torres I, Beltrán-Corbellini Á, Santana-Moreno D, Roé-Vellvé N, Thurnhofer-Hemsi K, Torres-Prioris MJ, Massone MI and Ruiz-Cruces R (2015) Loss of regional accent after damage to the speech production network. Front. Hum. Neurosci. 9:610. doi: 10.3389/fnhum.2015.00610*

# INTRODUCTION

Regional accent (or *within-language accent*) is a manner of speaking peculiar to a location where its speakers reside (Wells, 1982; Cristia et al., 2012). Regional accent has coherent variations in phonetic, phonological, phonotactic, and prosodic information found within the standard language which allows it to be distinguished from other regional accents of the same language (Wells, 1982; Cristia et al., 2012). In general, the geographical, socio-economic, and ethnic background can be inferred by regional accent and dialect of speakers (Labov, 2006; Coupland, 2007). Traditionally, the study of regional accent and dialects pertains to the domains of applied psychology and socio-linguistics. Nonetheless, in recent years interest in studying accent has been expanded to the field of cognitive neuroscience to gain understanding of both the factors influencing accent perception (Adank et al., 2009; Bent and Holt, 2013; Trude et al., 2013; Smith et al., 2014) and neural mechanisms (Adank et al., 2012; Goslin et al., 2012; Callan et al., 2014). Interest in the analysis of accent in healthy subjects, however, is unbalanced since most studies have examined accent perception (Anderson-Hsieh and Koehler, 1988; Clarke and Garrett, 2004; Bradlow and Bent, 2008; Adank et al., 2009; Brunellière et al., 2009; Cristia et al., 2012; Goslin et al., 2012; Bent and Holt, 2013; Trude et al., 2013; Smith et al., 2014) rather than its production (Harrington et al., 2000; Golestani and Pallier, 2007; Golestani et al., 2007; Ventura-Campos et al., 2013). These studies have provided compelling evidence that understanding messages intoned with a non-native accent (*foreign accent*) entails more difficulty than processing regional accents, although non-familiar regional accents could also reduce the intelligibility, efficiency, and accuracy of linguistic processing (Floccia et al., 2006; Cristia et al., 2012; Van Engen and Peelle, 2014). 
Differences in comprehension of foreign and familiar regional accents have been interpreted as resulting from perceptual distance or different neural processing mechanisms (see Goslin et al., 2012).

Speaking with a foreign accent or unfamiliar regional accent imposes a processing cost on the listener, as these accents require additional cognitive support to optimize intelligibility, comprehensibility, and speed of processing (Floccia et al., 2006; Adank et al., 2009; Bent and Holt, 2013; Van Engen and Peelle, 2014). Speaking with a non-native or unfamiliar regional accent also carries negative connotations and even jeopardizes the credibility of the speaker (accent discrimination; Lippi-Green, 1994; Lev-Ari and Keysar, 2010; Akomolafe, 2013). There are some dramatic examples; for instance, the Nobel Prize laureate Mario Vargas Llosa narrates the story of the parsley massacre, which took place during the dictatorship of Rafael L. Trujillo (1891–1961) in the Dominican Republic (Vargas Llosa, 2000). To rapidly identify the nationality of Haitians, Dominican soldiers would hold up a sprig of parsley and ask "What is this?" Assuming that those who were incapable of correctly pronouncing the /r/ of the Spanish word "perejil" (parsley) were Haitians, soldiers assassinated more than 20,000 refugees living within the Dominican border, using their pronunciation as sufficient condemnation.

Interest in the study of accent change in pathological conditions is not new (Marie, 1907; Pick, 1919). Investigations on abnormal accent change have mostly been done in brain damaged individuals displaying a rare condition termed *foreign accent syndrome* (FAS; Whitaker, 1982). FAS is a motor speech disorder characterized by the development of speech patterns which are perceived as foreign (Marie, 1907; Pick, 1919; Monrad-Krohn, 1947; Whitaker, 1982; Berthier, 1994). Although the term "foreign" is generically used to designate the origin of accent change, it is noteworthy that in the original aphasic patient described by Marie (1907) speech changes occurred in "regional" accent (from Parisian French to Alsatian accent). Since then this case has been considered the original description of FAS (Whitaker, 1982) although subsequent cases of regional accent change (e.g., from native southern Ontario accent to Canadian east coast accent; Naidoo et al., 2008) have also been described (Critchley, 1962; Seliger et al., 1992; Dankovičová et al., 2001; Kwon and Kim, 2006; Ryalls and Whiteside, 2006) and reclassified as variants of FAS (Seliger et al., 1992; Verhoeven and Mariën, 2007) or foreign accent-like syndromes (Reeves and Norton, 2001). The analysis of changes in production or reception of accents using lesion-symptom mapping is providing fruitful insight into the linguistic, behavioral, and neural mechanisms underpinning the production of foreign and regional accents. These include studies in neurological patients with focal lesions (Kurowski et al., 1996; Blumstein and Kurowski, 2006) or during the early stages of degenerative conditions (Alzheimer's disease, primary progressive aphasia; Luzzi et al., 2008; Hailstone et al., 2012; Fletcher et al., 2013; Paolini et al., 2013).

Perhaps because potential changes in accent are masked by co-occurring aphasia (Critchley, 1962), the status of regional accent in patients with left hemisphere damage is largely unexplored. Hence, the study of purer cases of accent change in patients lacking prominent aphasic deficits is ideal for linguistic analysis and lesion-symptom mapping. In this context, one issue that has received little attention up to now is the loss of the segmental and suprasegmental features that characterize regional accents. Alexander et al. (1987) succinctly described the case of a person with aphasia who had lost his dense Bostonian accent as a result of a small infarct in the deep white matter near the left anterior capsular-putaminal region. This patient had a transient impairment of prosodic production yet the loss of regional accent was persistent. Here, we report loss of regional accent after rapidly regressive Broca's aphasia in three Argentinean patients who had suffered unilateral or bilateral focal lesions involving different components of the speech production network. All patients were monolingual speakers with three different native Spanish accents (Cordobés or central, Guaranítico or northeast, and Bonaerense; **Figure 1**).

The three patients reported here were studied almost three decades ago in Buenos Aires by two of us (MLB and MIM) when knowledge of the linguistic and neural underpinnings of pathological changes in accent was emerging (Graff-Radford et al., 1986; Blumstein et al., 1987; Gurd et al., 1988). The aim of the present study is to interpret our own data in the light of new discoveries about accents and neuroscience. In spite of the time elapsed between evaluation and the present report, our patients were studied with comprehensive methodology and we trust that, given the lack of previous similar cases, the description of these cases continues to be of interest. In the past few years, modern neuroimaging studies (Wise et al., 1999; Sörös et al., 2006; Eickhoff et al., 2009; Ackermann and Riecker, 2010) and computational models (Guenther et al., 2006; Bohland et al., 2010) have identified the large-scale bilateral network that mediates speech production and monitoring. Advances in the interpretation of modern cases of FAS within this neuroanatomical and computational framework (Fridriksson et al., 2005; Katz et al., 2012; Moreno-Torres et al., 2013; Tomasino et al., 2013) have led to the conclusion that virtually all cases with purported changes in accent result from discrete and selective involvement of one or more components of the speech production network. This means that different clinicopathological correlations in FAS are possible; thus, FAS could be deemed heterogeneous when clinical presentation is the focus of analysis. Damage to different nodes of the speech production network can induce FAS (Graff-Radford et al., 1986; Blumstein et al., 1987; Mariën and Verhoeven, 2007), paradoxically resolve it (Cohen et al., 2009; Bhandari, 2011), trigger changes in regional accent (e.g., from Parisian French to Alsatian accent; Marie, 1907; Ryalls and Whiteside, 2006; Naidoo et al., 2008; Polak et al., 2013), reawaken a previously learned and dormant regional accent (Critchley, 1962; Roth et al., 1997), make regional accent stronger only in one language in polyglots (Levy et al., 2011), or even produce a FAS restricted to one language in bilinguals (Avila et al., 2004). From a nosological viewpoint, FAS is also a heterogeneous condition as several types (neurogenic, psychogenic, developmental, and mixed) have been identified (see Reeves and Norton, 2001; Mariën et al., 2006, 2009; Reeves et al., 2007).

FIGURE 1 | The regional accent used in Córdoba is designated as Cordobés or central; the regional accent used in Corrientes is generically termed "Guaranítico" (due to the strong influence of Guaraní, one of the two languages spoken in the neighboring country Paraguay) or northeast; and the regional accent characteristic of Buenos Aires is known as Rioplatense – bonaerense. Rioplatense refers to the Río de la Plata (Silver River) and Bonaerense to the city of Buenos Aires.

# MATERIALS AND METHODS

### Participants

#### Patient OM

A 47-year-old, right-handed man suddenly noticed speech disturbances that rapidly progressed to mutism associated with right-sided weakness involving the face, arm, and leg. In the first 2 days post-onset, he was able to phonate but not to produce words. This situation lasted for 20 days until he was able to utter isolated words and the names of two neighbors with normal volume but monotonous voice. By that time, his auditory and written comprehension was normal, but he was unable to repeat words and sentences and displayed crying outbursts when unable to communicate verbally. Writing was moderately impaired and voluntary bucco-facial movements were abnormal. A computerised tomography (CT) scan revealed bilateral hemorrhages involving the left motor cortex and right insula-putamen region probably resulting from untreated hypertension (Sörös et al., 2013) or less probably from sporadic cerebral amyloid angiopathy (Samarasekera et al., 2012). His relatives reported that 1 year before the present episode, OM was admitted to hospital for 2 days for dizziness, instability, and impaired handwriting that lasted 2 days, but he did not present speech or language problems. He was referred for the present language evaluation 8 months after stroke onset. Neurological examination revealed a complete recovery of the right hemiparesis and improvement of language impairment. However, his discourse was ungrammatical and contaminated by occasional instances of mitigated echolalia (incorporation of part of the questions into his responses). He had moderate reading impairment, but writing was normal. His main complaint was a change in his speech. He commented that "Now, I don't speak as before... my voice has changed... my language is smooth and flat and at times words come close together." He also reported that his verbal emissions were slow and devoid of emotional coloring, and reported problems signaling emphasis in interrogative sentences. 
OM was a monolingual Spanish speaker, born in Córdoba (Argentina). Before the stroke he spoke with a strong regional accent which changed afterward. Nevertheless, the origin of his newly acquired accent was puzzling. OM considered that after the stroke his accent sounded Italian, and 7 months after onset, when he attended his father's funeral, some relatives thought that he was speaking with an Italian accent similar to the one used by his Italian father. However, his attending speech pathologist thought instead that OM had actually lost his previous regional accent. OM was a former soccer coach and at the time of the stroke he worked as a taxi driver. He suffered symptoms of mild depression and decreased motivation after the stroke.

#### Patient JF

An 18-year-old right-handed male was admitted to the emergency room with a 2-week history of fever, headaches, and vomiting. Upon admission, he developed focal seizures affecting the right face and tongue with secondary generalization, and on one occasion transient speech arrest was documented after a seizure. CT and magnetic resonance imaging (MRI) scans revealed a mature, encapsulated abscess involving the left sensorimotor cortex with mass effect over the insular cortex and basal ganglia and perilesional oedema. The cerebral abscess was surgically evacuated. After surgery the patient was mute and aphonic, and had swallowing problems. The remainder of the neurological examination also disclosed tongue deviation to the right, "pseudoperipheral" right facial and velum paresis secondary to an incomplete anterior opercular dysfunction (Foix-Chavany-Marie syndrome; Starkstein et al., 1988; Martino et al., 2012). He also had a mild right hemiparesis affecting the arm with spared sensation. There was no bucco-facial apraxia. Two weeks after surgery, he regained fluent speech and at that time his mother and an aunt reported that he began to speak with a strange accent that resembled Japanese. This newly acquired accent spontaneously remitted in a few days and according to the same relatives, JF's speech was then "flat" in several contexts. For instance, when he asked a question it was not possible to discern if he was actually asking something or not because he could not modulate intonation properly. The same thing happened when he was asked to impart angry intonation in sentences unless he was really irritated. He was referred for the present language evaluation 4 months after surgery. JF was a monolingual Spanish speaker with very limited knowledge of Brazilian Portuguese and English. He was born in Corrientes (Argentina) and lived there until 8 months before developing the cerebral abscess, when he moved to Buenos Aires to study computational engineering. 
Before the brain lesion, JF and his mother considered that he had a typical regional accent of Corrientes.

#### Patient RC

A 54-year-old man noticed inability to manipulate objects with his right hand just before developing a generalized seizure with transient loss of consciousness. On recovering consciousness he was mute and had right facial weakness. An emergency CT scan disclosed a hemorrhage in the left frontoparietal region secondary to the rupture of a frontal arteriovenous malformation (AVM). The hemorrhage was surgically evacuated and some feeding vessels of the AVM were occluded. During surgery it was confirmed that the premotor, motor, and sensorimotor cortices were damaged. After surgery, RC awoke with a right hemiparesis (it lasted 15 days) and mutism but he demonstrated preserved auditory comprehension. Recovery of speech production was gradual and his spontaneous speech was slow and monotonous with impaired grammar, naming, and writing. In the ensuing months, fluency continued to improve and word finding difficulties decreased. By that time, his speech was also monotonous to the extent that he was unable to signal proper tone in interrogative sentences. He was referred for the present evaluation 7 months after stroke onset. On the initial interview, RC could not convey emotion through words; he frequently said "the emotion was only noticed in my face... my speech was flat... it did not have inflections." By contrast, he made no comments on the loss of his premorbid regional accent, suggesting he was unaware of any change in accent. RC was forced as a child to write with the right non-dominant hand but he used the left hand for other activities. RC was a monolingual Spanish speaker, born in Buenos Aires (Argentina) and lived there until early adulthood. Although he had lived in other provinces (Salta and Jujuy) of Argentina for several years, he and his wife considered that he had a typical regional accent of Buenos Aires (see below), which was the place where he lived for the two decades before his stroke.

#### Cognitive and Language Assessment

In all patients, general intelligence was assessed with the Wechsler Adult Intelligence Scale-Revised (Wechsler, 1988), whereas non-verbal intelligence was tested with the Raven's Colored Progressive Matrices Test (Raven, 1965; see also Kertesz, 1982). Language competence was examined with the oral subtests (spontaneous speech, comprehension, repetition, and naming) of the Western Aphasia Battery (WAB; Kertesz, 1982) and an Aphasia Quotient (AQ) was obtained to rate the severity and type of aphasia. On the WAB, patients are considered to have aphasia when their AQ is <93.8 (range: 0–100), and lower scores indicate more severe aphasic deficits. By contrast, patients performing at or above this cut-off score (≥93.8) may actually have speech and subtle language deficits but they are not classified as having clinically significant aphasia. The Token Test (TT; De Renzi and Vignolo, 1962) was also administered to all patients. The TT is designed to assess verbal comprehension of commands of increasing complexity. The test employs a set of 20 plastic tokens consisting of two shapes (circles and squares) depicted in five colors and two sizes (small and big). The long version of the TT (62 items) was used for the present study. Phonological fluency was assessed with the Controlled Oral Word Association Task (COWAT; Borkowski et al., 1967). The study was performed according to the Declaration of Helsinki and the protocol was approved by the Ethical Committee of the Raúl Carrea Institute for Neurological Research (FLENI), Buenos Aires, Argentina. All patients provided written informed consent prior to the detailed analysis of their speech-language deficits.

#### Lesion Analysis

Lesion location in CT and MRI scans was examined by a neuroradiologist (RRC) blind to clinical information. Identification of damaged areas and the limiting fissures and sulci was performed using an atlas of neuroanatomy of language regions of the human brain (Petrides, 2014) and a brain atlas based on ultra-high field (7.0 Tesla) *in vivo* MRI and cadaver cryomacrotome sections (Cho et al., 2015). Identification of subregions in the motor cortex for phonation and articulation, including lip, tongue and jaw areas, was based on images from previous brain imaging studies (Fesl et al., 2003; Naidich et al., 2004; Brown et al., 2008; Grabski et al., 2012). Although two patients (OM and RC) had lesions in more than one region of the speech production network, the principal component of the lesions in every patient overlapped in the left pericentral region. Thus, those parts of the lesions involving the premotor, precentral, and post-central areas were manually traced by one member of the team (NRV) and verified for reliability by two members (RRC and KTH) with experience in neuroradiology. Since the lesion (cerebral abscess) in JF had marked mass effect and perilesional oedema, lesion analysis in this patient was done on a control CT scan performed 3 months after the surgical removal of the abscess. Lesions were drawn on representative axial T1-weighted MRI templates from the MRIcron software (Rorden, C., 2005. www.mccauslandcenter.sc.edu/mricro/mricron/). Lesion overlap in the left premotor and motor cortex was computed with ImCalc from the Statistical Parametric Mapping (SPM) software package version 8 (Wellcome Department of Cognitive Neurology, London, UK). The identification of involved cortical areas was done using the Automatic Anatomical Labeling (AAL) atlas (Tzourio-Mazoyer et al., 2002). All anatomical regions identified by AAL were verified by comparison with anatomical atlases of MRI (Petrides, 2014; Cho et al., 2015).
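To illustrate the lesion-overlap step, the following is a minimal sketch in plain Python, not the SPM ImCalc procedure itself (the exact ImCalc expression used is not reported here). It assumes each patient's traced lesion is available as a binary mask of identical shape; the function name `lesion_overlap` and the toy 3 × 3 masks are hypothetical and stand in for real image volumes.

```python
def lesion_overlap(masks):
    """Voxel-wise sum of equally shaped binary masks (nested lists of 0/1).

    Each cell of the result counts how many patients have a lesion there.
    """
    rows = len(masks[0])
    cols = len(masks[0][0])
    return [[sum(m[r][c] for m in masks) for c in range(cols)]
            for r in range(rows)]


# Hypothetical toy "slices" standing in for the three traced lesions.
mask_om = [[0, 1, 1],
           [0, 1, 0],
           [0, 0, 0]]
mask_jf = [[0, 0, 1],
           [0, 1, 0],
           [0, 0, 0]]
mask_rc = [[0, 1, 1],
           [0, 1, 1],
           [0, 0, 0]]

overlap = lesion_overlap([mask_om, mask_jf, mask_rc])
# Cells equal to the number of patients mark the region shared by all lesions.
shared_by_all = [(r, c)
                 for r, row in enumerate(overlap)
                 for c, n in enumerate(row) if n == 3]
```

In this toy example the cells shared by all three masks play the role of the area of greatest overlap.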

#### Analysis of Accent

Loss of regional accent in our three patients was studied through acoustic analysis, both segmental and prosodic, to determine the phonetic characteristics which distinguish this disturbance from typical regional accent. The absence of experimental data about regional accent in Argentinian speakers of Spanish led us to take into account the few impressionistic observations found in the literature (Vidal de Battini, 1964; Fontanella de Weinberg, 1971) when analyzing the data. In patient OM, a within-subject study was possible because a recording of his voice made before the stroke was available. In the two remaining patients, regional accents were compared with data from healthy controls. In the case of patient JF, his utterances were compared with data obtained from a gender-, age-, and original dialect-matched control speaker. Data from patient RC were compared with the results obtained from four healthy control subjects who had participated in the assessment of Buenos Aires Spanish (Manrique, 1980; Manrique and Signorini, 1983; Massone, 1988).

#### Segmental Analysis

Both the segmental and the prosodic analyses used a sentence-reading procedure in which speakers were asked to read eight declarative sentences that had been constructed to include particular variations in syntactic structure. A sample of spontaneous speech was also used for segmental analysis. A set of sentences from the mood and tonic accent tests was also included. The stimuli used for the analysis were recorded in a sound-treated room on an Ampex AG 440-2 tape recorder. Narrow phonetic transcriptions of this material were performed by two trained phoneticians. The acoustic description was accomplished with narrow- and wide-band spectrograms obtained by means of a Kay Elemetrics 7029. The following parameters were measured in the spectrograms: formant frequencies, voice-onset time (VOT) in voiceless stops, frequency position of noise bands, and segment duration.

#### Prosodic Analysis

The prosodic patterns were examined in F0 plots which were obtained from a computer program based on the FPRD (fundamental period; Cooper and Sorensen, 1981) run on a PDP 11 Digital computer. The following measurements were made on the plots for each utterance: (1) initial and final F0 values; (2) the contours were analyzed according to the subcontours each had. The highest and the lowest F0 values in each subcontour were measured as the peak and valley of that subcontour. The subcontour starts on the syllable carrying an F0 accent and includes all following unaccented syllables. The overall average F0 was obtained by averaging the values of peaks and valleys. (3) The top reference line was adopted (O'Shaughnessy, 1976; Cooper and Sorensen, 1981) in order to describe the general F0 pattern of declination in declarative sentences. These lines were traced across the F0 peak values. The rate of declination (Hz/s) was measured from the top line, which is the result of connecting the F0 peaks of the utterance. (4) For each contour, the F0 variation was defined as the difference between the F0 values for the lowest valley and for the highest peak. The range of F0 variation was the mean across the different contours. We also adopted an adjusted range measurement to counterbalance the effect of extreme points, by extracting the average values of all peaks (Ryalls et al., 1987). (5) Duration in ms of syllabic types (CV, CVC, CVVC) in stressed, unstressed and prepausal conditions was also measured. (6) Inter-stress intervals were measured from the consonant/vowel onset of the syllable bearing the stress to the consonant/vowel onset of the next syllable bearing stress. (7) Speech rate was measured based on the duration of the inter-stress intervals and the duration of utterances. The healthy control subjects underwent the same assessment procedure.
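The F0 measures in points (2) to (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original FPRD-based program: each utterance is represented as a list of subcontours given as `(peak_time_s, peak_hz, valley_hz)` tuples; the top line is approximated by a least-squares fit through the peaks (the text only says that the peaks were connected); and the adjusted variation is taken as the mean of the peaks minus the mean of the valleys, which is one possible reading of the averaging described above. All function names and numeric values are hypothetical.

```python
def overall_average_f0(subcontours):
    """Mean of all peak and valley values (Hz) in one utterance."""
    values = [hz for _, peak, valley in subcontours for hz in (peak, valley)]
    return sum(values) / len(values)


def f0_variation(subcontours):
    """Highest peak minus lowest valley (Hz) in one utterance."""
    peaks = [peak for _, peak, _ in subcontours]
    valleys = [valley for _, _, valley in subcontours]
    return max(peaks) - min(valleys)


def adjusted_variation(subcontours):
    """ASSUMPTION: mean of peaks minus mean of valleys (Hz)."""
    peaks = [peak for _, peak, _ in subcontours]
    valleys = [valley for _, _, valley in subcontours]
    return sum(peaks) / len(peaks) - sum(valleys) / len(valleys)


def declination_rate(subcontours):
    """Slope of the top line in Hz/s, positive when F0 peaks fall.

    ASSUMPTION: the top line is a least-squares fit through the peaks.
    """
    times = [t for t, _, _ in subcontours]
    peaks = [peak for _, peak, _ in subcontours]
    t_mean = sum(times) / len(times)
    p_mean = sum(peaks) / len(peaks)
    num = sum((t - t_mean) * (p - p_mean) for t, p in zip(times, peaks))
    den = sum((t - t_mean) ** 2 for t in times)
    return -num / den


def mean_across(utterances, measure):
    """Average a per-utterance measure over several utterances."""
    return sum(measure(u) for u in utterances) / len(utterances)


# Hypothetical values: (peak time in s, peak F0 in Hz, valley F0 in Hz).
utt1 = [(0.2, 180.0, 140.0), (0.8, 165.0, 130.0), (1.4, 150.0, 120.0)]
utt2 = [(0.3, 170.0, 135.0), (0.9, 160.0, 125.0)]

range_of_variation = mean_across([utt1, utt2], f0_variation)
```

Keeping each measure as a separate per-utterance function makes it straightforward to average any of them across utterances, as done here for the range of F0 variation.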

#### Prosodic Tests

All patients reported problems giving sentences emotional intonation. This finding coincides with descriptions of previous cases of FAS and linguistic and affective dysprosody (Berthier et al., 1991; Moreno-Torres et al., 2013). Therefore, a mood production task (Weintraub et al., 1981; Graff-Radford et al., 1986; Berthier et al., 1991) was used. The three patients were asked to read four sentences with angry, sad, happy, neutral, and interrogative intonation. Target sentences had a neutral propositional message (e.g., "Mañana voy a viajar a Mendoza", *"Tomorrow I'm leaving for Mendoza"*). Perceptual judgements of affective and linguistic prosody production in the five different categories were made by six undergraduate students of phonetics blind to the patients' identity. F0 curves were generated from these recordings. Raters reported no speech or hearing problems, and none of them had extensive experience in rating pathological voices. The raters were required to assign one of the four affective tones (happy, angry, sad, neutral) or the linguistic tone (interrogative) to each sentence produced by the patients. Patients were also asked to place tonic accents in sentences following the methodology described in previous studies (Weintraub et al., 1981; Graff-Radford et al., 1986; Berthier et al., 1991). Based on visual analysis of F0 increases, the expert phoneticians decided whether or not the emphatic stress was correctly signaled. Patients heard a series of 10 declarative sentences (e.g., "Los anteojos están encima de la mesa", "The glasses are on the table") and were then asked some questions (e.g., "¿Qué está encima de la mesa?", "What is on the table?"; "¿Dónde están los anteojos?", "Where are the glasses?"). Patients were informed that they should answer these questions by placing emphatic stress on the appropriate words (subject, object, verb) depending on the question. The full set was composed of 25 questions (10 base questions with two or three tonic accents each). Here, the patients were asked to mark the emphasis on the appropriate words by using appropriate word stress. Comprehension of affective prosody was also examined (Ross, 1981). Patients listened to 16 tape-recorded sentences intoned with angry, happy, sad and neutral intonation and were asked to identify the imparted intonation in a multiple choice format.
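The way rater judgements translate into identification percentages per intended category can be sketched as follows. This is an illustrative tally only; the scoring procedure is not specified beyond the description above, and the function name `identification_percentages` and the example judgements are hypothetical.

```python
from collections import Counter, defaultdict


def identification_percentages(judgements):
    """Map each intended category to the percentage of each perceived label.

    `judgements` is an iterable of (intended, perceived) pairs, one per
    rater and sentence.
    """
    counts = defaultdict(Counter)
    for intended, perceived in judgements:
        counts[intended][perceived] += 1
    return {
        intended: {label: 100.0 * n / sum(c.values()) for label, n in c.items()}
        for intended, c in counts.items()
    }


# Hypothetical judgements, not data from the study.
judgements = (
    [("interrogative", "interrogative")] * 4
    + [("sad", "neutral")] * 3
    + [("sad", "sad")]
)
percentages = identification_percentages(judgements)
```

The diagonal entries (intended equals perceived) give the percentage of correct identifications, and the off-diagonal entries show which categories a sentence was confused with.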

# RESULTS

#### Cognitive and Language Functions

General intellectual functions were preserved, with average or above-average intelligence quotient scores (**Table 1**). Non-verbal intelligence was also normal in the two patients tested (OM and RC; **Table 2**). On the WAB, all patients obtained AQs well above the cut-off score for the diagnosis of aphasia. All patients had fluent and very informative spontaneous speech with no word-finding difficulties or agrammatism. Although auditory comprehension of complex sentences (sequential commands subtest) was still mildly impaired in patients OM and RC, their performance on the TT was normal, as was patient JF's performance. Semantic category (animal naming) word generation, though impaired in all patients, was less affected than phonemic fluency (COWAT). Reading was mildly impaired in patients OM and RC and writing was additionally compromised in OM. Reading and writing could not be tested in patient JF.

#### Neuroimaging

A CT scan in OM revealed a resolving hemorrhage in the middle part of the left motor cortex surrounded by mild oedema and mass effect over the insular cortex and anterior corona radiata. There was also a small hemorrhagic focus involving the right anterior putamen, anterior insula, and dorsal part of the inferior frontal gyrus (**Figure 2**). CT and MRI scans in JF revealed a mature, encapsulated abscess involving the left sensorimotor cortex surrounded by vasogenic oedema and mass effect over the insular cortex and basal ganglia. Three months after surgery a control CT scan showed a small residual lesion involving the left primary motor area (**Figure 3**). In RC, a CT performed in the chronic period showed a hypodense lesion involving the left lower portion of the precentral gyrus and its adjacent pars opercularis of the inferior frontal gyrus, anterior insula, and superior temporal gyrus. There was also moderate shrinkage of the middle and posterior insular cortex and superior temporal gyrus (**Figure 4**). Some remnants of the AVM were seen involving the left middle and superior frontal cortices [pre-supplementary motor area (pre-SMA) and SMA].

#### TABLE 2 | Language and cognitive functions.

Lesion overlap in the left pericentral region showed involvement of the inferior frontal gyrus (pars triangularis and pars opercularis), Rolandic operculum, middle frontal gyrus, precentral gyrus, post-central gyrus, and dorsal insula (**Figure 5**). The greatest area of overlap was in the left precentral gyrus (**Figure 5**).

#### Analysis of Accent

#### Patient OM

#### *Acoustic analysis*

The overall impression in this patient was that he did not display a severe disturbance in speech articulation. Nevertheless, the production seemed asystematic as a multiplicity of variants occurred in those particular consonants that show a degree of variability among speakers of different Spanish dialects.

The Córdoba accent presents the aspirated variant of /s/ in intervocalic and preconsonantal positions. A voiced variant of /s/ is observed in some speakers, mainly in the north of the province of Córdoba. Thus, this should not be regarded as an instance of atypical production. After the stroke, the following realizations of /s/ in both intervocalic and preconsonantal positions were observed in OM: [h], [s], [z], [š], and [θ]. Furthermore, he occasionally mispronounced /s/, confusing this sound with a low intensity [ʃ]. The variants of [ʒ] in this region are [ʒ] and [λ], but OM alternated the variants [λ], [j], [ʃ], and [dʒ], realizations that are found in other Spanish dialects. In several regions of Argentina (including Córdoba and Corrientes) the trill is realized as [ʒ] and in clusters this sound is produced as a voiceless fricative of brief duration. OM produced this sound as it is usually pronounced by typical residents of Córdoba. Healthy Spanish speakers produce the approximant voiced variants [β̞, ð̞, ɣ̞] of the voiced stops /b, d, g/ in intervocalic position. Like some brain-damaged patients with FAS, OM produced stops in this context, thus indicating poor regulation of co-articulation (Berthier et al., 1991). Vowel shifts are relatively frequent in the accent from Córdoba as well as from other areas of Argentina. In contrast, vowel shifts were not observed in OM, which may have contributed to the perception of regional accent loss. However, entire voiceless segments including vowels were distinguished in the spectrograms at the end of utterances, a finding that represents a misproduction. This phenomenon, together with the occurrence of noise at 3000 Hz, may be indicative of poor phonatory control of the vocal folds.

#### *Prosodic analysis*

Availability of comparable pre- and post-stroke samples of speech production allowed for an acoustic comparison, achieved by analyzing portions of OM's speech recorded on one pre-stroke audiotape. The inspection of a few F0 curves obtained from OM in a recording made before the stroke showed some features that are presumably characteristic of the Córdoba accent and deviate from those of other regions of Argentina. In effect, whereas in Spanish speakers with a typical Buenos Aires regional accent the F0 contours present a general declining trend for both peaks and valleys in declarative sentences (Manrique and Signorini, 1983), the curves of OM showed a rising end contour; F0 rose at the last accented syllable and continued to rise in the following unaccented ones. The initial and final F0 average values were 138.3 ± 12.6 Hz and 195 ± 37.7 Hz, respectively. This phenomenon has been mentioned as "a kind of tonal shift" in previous studies of regional accent in Argentina. Average inter-stress interval duration was 447 ± 92.9 ms. Analysis of F0 curves after the stroke of the same declarative sentence ("Pensamos salir ganadores", "We aim to win") showed a gradual declination. The slope of the top reference line presented an average rate of declination of about 41.4 Hz/s, similar to the value obtained in Spanish speakers from Buenos Aires (43 Hz/s). In yes-no questions overall F0 level was relatively high with regard to declarative sentences (average peak value in interrogative sentences: 152.5 Hz, final value: 288.8 Hz). However, this end rising pattern differed from the one observed in recordings obtained before the stroke for declarative sentences (**Figure 5**). In the latter, the overall level was lower (185 Hz). Otherwise, similarities were found in other F0 measures when pre- and post-stroke samples were compared (overall average F0: pre-stroke = 152.5 Hz, post-stroke = 155 Hz; range: pre-stroke = 80 Hz, post-stroke = 95 Hz; adjusted range: pre-stroke = 65 Hz, post-stroke = 54.02 Hz). However, these similar values in F0 range variation were due to high F0 values observed in the last portion of the utterances recorded before the stroke. Except for this final portion of the utterances, the rest of the F0 contours showed a flat tendency with small F0 variation. Note that a fast speaking rate forces the F0 movements to be performed in less time and leads to less F0 variation (O'Shaughnessy, 1976). Therefore, the fast speaking rate in the pre-stroke recording may explain the small F0 variation. Measures of temporal organization after the stroke showed a lower rate of speaking. Average overall duration of utterances was 20% longer than that obtained in the pre-stroke condition. Due to the fast rate of speaking observed in OM before the stroke, segment duration in the spectrograms was difficult to measure. Nevertheless, in the final portion of the utterances identified as having a rising F0 contour, a lengthening of the entire syllable was observed, which coincides with results previously reported for the Córdoba accent (Fontanella de Weinberg, 1971). While the average inter-stress interval duration before the stroke was 447 ± 92.9 ms, it increased to 630 ± 250 ms after the stroke, an increment that can be explained by the insertion of pauses between words and the lack of co-articulation (**Figure 6**).

extending deeply to dorsal head of the caudate nucleus. Note in the left image the involvement of the anterior insula (red arrow) and shrinkage of the middle and posterior insular cortex (orange arrow) and part of the inferior precentral gyrus and superior temporal gyrus. The right image shows a hyperdense rounded image which corresponds to a remnant of the arterio-venous malformation (AVM) involving the left superior frontal gyrus and reaching the pre-supplementary motor area and the subcortical white matter (anterior centrum semiovale; yellow arrow). (Bottom) Drawings were made using MRIcro software (Rorden, C., 2005. www.mccauslandcenter.sc.edu/mricro/mricro/) on T1-weighted axial images. The lesion is drawn in red (left and middle images) and the AVM in brown (right image). Fissures and sulci are indicated with white arrows. ds indicates: diagonal sulcus; cis: central insular sulcus; iprs: inferior precentral sulcus; cs: central sulcus (sulcus of Rolando); and sfs: superior frontal sulcus. Terminology and abbreviations for fissures and sulci were taken from the atlas of neuroanatomy of language regions of the human brain (Petrides, 2014). R: indicates right and L: left.

#### *Prosodic tests*

Interrogative sentences were accurately perceived by raters (100%). The sentences uttered with sad, angry, and happy intonation were mostly perceived as neutral (65% each) and mainly confused with sad. Sentences produced with neutral intonation were correctly identified in 92% of cases and confused with happy in 8%. When OM was asked to place tonic accent on some content words of a sentence from the perceptual test, his performance was flawless in all but one utterance, in which he emphasized every content word. The acoustic analysis showed increased F0 on stressed syllables with respect to the adjacent unstressed syllable. His comprehension of affectively intoned sentences was normal (94%).

#### Patient JF

#### *Acoustic analysis*

The patient's speech production sounded similar to that of healthy Spanish speakers with no severe articulatory disturbances, although his production differed from the Guaranítico dialect, except in the case of the trill, which continued to be produced as [ʒ]. The /ʒ/ is realized in Corrientes as [λ], [ʒ], or [dʒ], but several variants were found in JF, including [λ], [ʒ], [ʃ], and [j], as well as the mispronunciation [li]. In the case of /s/, this fricative phoneme is normally aspirated in both intervocalic and preconsonantal positions and the /θ/ phoneme frequently appears in closed syllables. The patient produced either [s] or [h] in the positions mentioned. Moreover, his realizations of [l] were grossly under-articulated and the tongue failed to reach its normal target area in most occurrences. In Spanish, vowels are either oral or nasalized in nasal context and JF frequently produced normal oral and nasal vowels. Interestingly, there were similarities with the case of OM, as there were voiceless segments at the end of utterances and noise bands located at 3900 Hz, also probably due to the lack of phonatory regulation.

#### *Prosodic analysis*

There are no studies available of the Guaranítico regional accent. Therefore, speech production samples from JF were compared with those obtained in a well-matched healthy control subject, and this comparison revealed important differences that seem to demonstrate a loss of regional accent in JF. In fact, the range of intra-speaker F0 variation differed between the two subjects, showing that in JF the F0 had a smaller range of rise and fall movements (range = 34.1 Hz, adjusted range = 26.3 Hz) than in the control subject (range = 90.8 Hz; adjusted range = 48.4 Hz). These results in JF are consistent with those obtained in speakers of other languages after damage to the left hemisphere (Ryalls, 1982, 1984). The tracing of the top reference line in the F0 curves of both subjects showed a difference in the slope, with the control subject presenting a higher rate of declination (34.6 Hz/s) than JF (10.6 Hz/s). Thus, the general declining trend was much lower and more gradual in JF than in the control subject and his curves showed a flatter tendency with some resetting effects (the declination line reset to a higher F0 level). This phenomenon most likely contributed to the perception of monotone speech. Spectrograms and F0 curves showed that the overall duration in JF (1.85 s) was similar to that of the control subject (1.93 s), but the average duration of inter-stress intervals differed considerably between JF (700 ± 170 ms) and the control subject (570 ± 60 ms). It was apparent that JF showed less segment duration and hence a tendency to isosyllabicity, unlike the control subject, who manifested a rhythmic pattern based on stress alternation (**Figure 7**).
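The inter-stress interval figures reported as mean ± SD can be derived from stressed-syllable onset times along the following lines. This is a minimal sketch with assumptions: onsets are given in seconds, the sample standard deviation is used (the text does not state which estimator was applied), and the onset values and function names are hypothetical.

```python
from statistics import mean, stdev


def inter_stress_intervals_ms(onsets_s):
    """Durations (ms) between consecutive stressed-syllable onsets (s)."""
    return [(b - a) * 1000.0 for a, b in zip(onsets_s, onsets_s[1:])]


def summarize(intervals_ms):
    """Return (mean, sample SD) in ms. ASSUMPTION: sample SD, not population SD."""
    return mean(intervals_ms), stdev(intervals_ms)


# Hypothetical onset times of stressed syllables in one utterance (seconds).
onsets = [0.10, 0.65, 1.25, 1.80]
intervals = inter_stress_intervals_ms(onsets)
avg_ms, sd_ms = summarize(intervals)
```

A larger SD relative to the mean, as reported for JF compared with the control subject, indicates less regular spacing between stressed syllables.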

#### *Prosodic tests*

Raters accurately identified most interrogative and neutral sentences (93%), but sentences intoned with happy, sad, and angry emotion were identified most commonly as neutral (57, 65, and 82%, respectively). Happy and sad intoned sentences were rarely perceived as correct (26%). The patient performed correctly when asked to place tonic accents. Measurements of F0 peaks in all content words showed relatively higher values on lexically stressed syllables of the target stimuli. His comprehension of affectively intoned sentences was normal (94%).

#### Patient RC

#### *Acoustic analysis*

The patient did not show prominent articulatory deficits, yet his accent differed from the regional accent of Buenos Aires. The different variants of preconsonantal /s/ constitute a peculiar aspect of Buenos Aires Spanish (Manrique, 1980). The most frequent realization of /s/ in this position is the voiceless glottal fricative [h]. However, RC produced the dental [s] in most contexts. Moreover, given his sociolect, the expected variant would be [h] and not [s]. While Buenos Aires Spanish presents a short trill sound [r] with only one period in closed syllables (Massone, 1988), RC instead produced a longer trill with more than one period, a realization characteristic of CV syllables in Buenos Aires Spanish. As already indicated for patient OM, voiced stops are realized as approximants in intervocalic position (a feature common to the majority of Spanish dialects). In this context, as well as in initial position, RC produced stops instead of approximants, indicating the absence of co-articulation. RC also had frequent voiceless segments both in final and medial position of words and utterances, lack of co-articulation, inappropriate pauses, and vowel lengthening. All these impairments denoted a lack of phonatory regulation of the vocal folds.

#### *Prosodic analysis*

Data obtained in RC were compared with findings on intonation and rhythmic patterns of Buenos Aires Spanish described in four healthy subjects (Manrique and Signorini, 1983; Massone et al., 1983). The range of intra-speaker F0 variation differed between RC and the healthy controls (healthy controls: 64.3 Hz; RC: 82 Hz), but the adjusted range was similar (healthy controls: 42.7 Hz; RC: 43.6 Hz). However, the slow speaking rate of RC may account for the high range of F0 variation, since a slow rate allows sufficient time for all linguistic F0 movements. Inspection of the curves of declarative sentences through the tracing of the top reference line showed a declining trend (23.2 Hz/s) in RC which was considerably lower than that observed in healthy controls (43 Hz/s). Taking into account that the range of F0 variation in RC was close to normal values, the fact that the top line did not reflect a steeper slope may be due to the slow speaking rate and also to the presence of numerous pauses that increased the utterance's overall duration. Abnormal pauses may also be responsible for the length and variation observed in average duration of inter-stress intervals (RC: 820 ± 360 ms; healthy controls: 447 ± 92.9 ms; Manrique and Signorini, 1983). In healthy control subjects the production of interrogative yes-no questions had an overall F0 level that was relatively high with regard to declarative sentences, and a rising end contour. By contrast, this production in RC differed from normal because in the last portion of each declarative sentence F0 continued to rise even in unaccented syllables (**Figure 8**).

#### *Prosodic tests*

Interrogative sentences were accurately identified (100%). Most sentences (83%) intoned with angry, happy, sad, and neutral emotions were identified as neutral, while a low percentage (14%) of sentences intoned with angry, happy, and neutral emotions were consistently confused with sad. The patient had problems signaling tonic accent in 22 out of 25 sentences. All content words in these sentences were perceived as bearing accent. In fact, the inspection of the curves showed F0 peaks in these words. In the only three correctly perceived sentences every content word also presented an F0 peak, but the presence of a pause before the portion of the utterance which should receive accent may account for this correct perception. His comprehension of affectively intoned sentences was normal (100%).

# DISCUSSION

In the present case series study, we described loss of regional accent after selective damage to different components of the distributed neural network for speech production. Our patients were Argentinean monolingual speakers with three different native Spanish accents (Cordobés or central, Guaranítico or northeast, and Bonaerense). After suffering unilateral or bilateral focal lesions these patients presented with regressive Broca's aphasia. Recovery of fluent production of speech was rapid (weeks) and it probably occurred because short fibers (subcallosal fasciculus; Naeser et al., 1989) and long-distance white matter tracts (arcuate fasciculus; Wang et al., 2013; Basilakos et al., 2014) previously linked to the persistence of reduced fluency were not affected. Regaining verbal communication and the disappearance of typical features of Broca's aphasia (agrammatism), however, did not ensure the production of normal speech in these patients, in that velocity and rhythm were slow and phonetic features characteristic of premorbid regional accents were attenuated or appeared in a random way. By the time of formal evaluation none of the patients could be classified as having aphasia since on the WAB they obtained an AQ well above the cut-off score (>93.8) for this diagnosis (Kertesz, 1982). Likewise, their performance on the TT, a difficult test of auditory comprehension which is useful to identify even very mild aphasic deficits (De Renzi and Vignolo, 1962), was also within normal limits. Reading, writing, or both were mildly impaired in two patients (OM and RC) but testing of general cognition (verbal and non-verbal intelligence, and praxis) was within normal limits. Samples of speech production were analyzed by expert phoneticians and linguists. Analysis of speech production revealed discrete slowing in speech rate, inappropriately long pauses, and monotonous intonation, features already described in FAS (Blumstein et al., 1987). Phonemic production remained similar to that of healthy Spanish speakers, but some phonetic variants distinctive to each accent (e.g., intervocalic aspiration of /s/ in the Córdoba accent) were not present in any case. While basic characteristics of normal Spanish prosody were preserved, features intrinsic to the melody of certain geographical areas (e.g., rising end F0 excursion in declarative sentences intoned with the Córdoba accent) were absent. All patients managed to produce linguistic contrasts in sentences, but were impaired when producing affective prosody, a fact that rendered their speech monotonous. Brain imaging disclosed focal left hemisphere lesions (two hemorrhages and one abscess) mainly involving the middle part of the premotor/motor cortex with variable extension into their adjoining inferior and superior regions. The post-central cortex, posterior inferior and/or middle frontal cortices, anterior insula/putamen and SMA were also involved. Our findings suggest that lesions affecting the middle part of the left motor cortex and adjoining regions disrupt neural processes implicated in the production of regional accent features.

#### Loss of Regional Accent or Foreign Accent Syndrome?

One intriguing characteristic of these three patients is that, while their brain lesions mostly overlapped with those of many previous patients with FAS (Berthier et al., 1991; Takayama et al., 1993; Sakurai et al., 2015), their accent was not perceived as foreign. On the contrary, they were perceived as native Argentinean speakers who did not have the typical regional accent. Two possibilities might be considered to explain this situation. One is that the linguistic characteristics of this population are fundamentally different from those of typical FAS patients. The other possible explanation is that, even if they share some of the linguistic characteristics described in FAS patients, the saliency of their specific errors produced the effect of dialectal change.

A close inspection of the data from the three patients reveals that they did have some of the characteristics of FAS patients. Most importantly, the slower speaking rate which was observed in the three patients had a negative impact on the speech rhythm and was possibly associated with F0 disturbances. In the case of JF and RC we also found a tendency to produce approximants as stops. This phenomenon implies that, similar to other FAS patients, the participants in this study did not produce the lenition processes which are characteristic of native speakers. In sum, the speech of these patients was not fundamentally different from the speech of other FAS patients. Thus, we should explain what made other speakers consider them to be Argentinean speakers without regional accent. Importantly, none of the above-mentioned characteristics (e.g., slow speech rate, flat F0, and stopping of approximants) is distinctive of any Argentinean (or Hispanic) dialect. This suggests that other characteristics of their speech may have blurred the effect in these three patients. It seems relevant that the three patients made errors in the production of a group of consonants that are possibly the most salient characteristic of Argentinean dialects (e.g., /ʒ, ʃ/), as opposed to other Spanish dialects (e.g., Peruvian, Mexican, European). Note also that, while there are minor differences among the different Argentinean dialects, they all share some of these sounds. The fact that these patients retained some of these sounds, but with a distribution that did not coincide with that of their own dialect, may explain why they were perceived as Argentinean speakers who had lost their original accent.

It is highly conceivable that persons with FAS attenuate the segmental and suprasegmental features that characterize their native (regional) accent and additionally acquire deviant speech features that give rise to the impression of a person speaking with a foreign accent. Bearing in mind such complex phenomena, the question that now arises is whether our patients had lost their regional accent or whether they had developed speech changes consistent with a foreign accent. Moreover, another still unaddressed issue is whether one condition can evolve into the other or whether the two conditions can occur simultaneously, insofar as two patients (OM and JF) had moments of speaking with a foreign accent preceding or coexisting with loss of regional accent. Normal human language is not fixed, uniform, or unvarying (Akmajian et al., 2010) and performance in language tasks among brain-damaged individuals often fluctuates on a nearly hour-to-hour basis and also during the resolution of deficits. According to the testimonies volunteered by patients' relatives, foreign accent was transiently perceived during the early recovery period in JF, but intriguingly it was heard for the first time in OM several months after onset, when speech deficits are expected to be more stable. Before moving forward on this issue, it should be noted that the instances of foreign accent in our patients were based on the subjective impression of naïve listeners (patients' relatives). In the case of patient OM, while attending the funeral of his father, some relatives perceived that he was speaking with a foreign accent resembling the one used by his Italian father. This episode in OM could be reminiscent of previous cases classified as "reversion to a previously learned foreign accent" after brain damage (Seliger et al., 1992; Roth et al., 1997). 
Healthy children overhearing a foreign accent can automatically imitate its prosodic pattern (Nakazima, 1962) and persons with extensive exposure to a different accent spoken by their parents could adopt the accents of their families after suffering damage to the speech production network (Seliger et al., 1992; Roth et al., 1997). For example, a left-handed woman with premorbid accent from "Bronx" or "New York" developed an Irish brogue (spoken by her mother many years ago) after suffering an infarct in left centrum semiovale underlying the parietal cortex (Seliger et al., 1992). A citizen of the United States with typical American English developed a Dutch accent after a left parietal hemorrhagic stroke (Roth et al., 1997). It was noteworthy that this patient was born in the Netherlands and lived there until the age of 5 years. Thus, awakening of a dormant foreign accent should be entertained in OM's case. Nevertheless, the production of an Italian accent by OM was a short-lived phenomenon (hours) that came about during a stressful life event (bereavement). Thus, an emotional reaction accounting for an evanescent foreign accent cannot be ruled out (Van Borsel et al., 2005; Verhoeven et al., 2005). In the case of patient JF, his mother and an aunt reported that during the early recovery phase from aphasia, he spoke with a foreign accent resembling Japanese. This lasted a few days and afterward, according to the same relatives, JF's speech was "flat" to the extent that "when he volunteered a question it was not possible to know if was actually asking something or not." Assuming that both patients actually had instances of foreign accent, these were short-lived phenomena and these features were no longer present at the moment of formal linguistic evaluation. By contrast, loss of regional accent was a long-lasting disorder in OM and lasted at least 4 months in JF (he was lost to follow-up). 
Moreover, at the time of formal evaluation of speech production, three experts with knowledge of Italian (the Italian population in Argentina is the third largest in the world) did not find linguistic elements of such accent in OM's verbal productions. Although the same three experts did not find hints of foreign accent in the other two patients (JF and RC) they concurred that the patients' accents sounded "neutral" and devoid of the typical features of their respective regional accents.

### Neural Mechanisms

The interaction of different brain areas mediating speech production and monitoring is complex and, at present, is best conceptualized in the context of computational models like DIVA (Directions into Velocities of Articulators – Guenther et al., 2006) and GODIVA (Gradient Order DIVA – Bohland et al., 2010). These models of speech production control have been mapped onto anatomical regions in the human brain and their dysfunction has been invoked to explain FAS (Golfinopoulos et al., 2010; Tomasino et al., 2013). The role of brain regions involved in loss of regional accent will be briefly outlined in each of the next sections.

#### Motor, Premotor, and Sensorimotor Cortices

From a neuroanatomical viewpoint, one relevant finding of the present study is that our three patients had damage to different components of the bilaterally distributed neural network mediating speech production. Analysis of lesion topography in our patients disclosed overlap along the central sulcus involving the medial part of primary motor cortical region [Brodmann's areas (BAs) 4 and 6] with extension to its adjacent post-central cortex (BA 3), Rolandic operculum, middle frontal gyrus (BA 6), inferior frontal gyrus [pars triangularis (BA 45) and pars opercularis (BA 44)], and dorsal insula (BA 13). The only area consistently involved in all three patients was the left precentral gyrus. Recent studies using brain imaging, transcranial magnetic stimulation, and computational modeling have provided empirical evidence of the role played by different regions of the speech production network. The initiation and planning of speech, the control of the articulators, and the monitoring of one's own voice depend on the concerted activity of the primary motor and somatosensory cortices, auditory cortical areas, SMA, the precentral gyrus of the insula, and portions of the thalamus, basal ganglia, and cerebellum (Dronkers, 1996; Wise et al., 1999; Riecker et al., 2005; Sörös et al., 2006; Bohland and Guenther, 2006). Neuroimaging and brain stimulation studies reliably show that the activity elicited by both speech production and movements of the speech effectors is somatotopically organized with a dorso-ventral distribution (lip, jaw, vocalic/laryngeal, and tongue movements) in the motor cortex displaying an overlapping arrangement (Brown et al., 2008; Takai et al., 2010; Grabski et al., 2012) with great variability in the locations of activations among studies (see Brown et al., 2009 for a meta-analysis of phonation studies). 
This would imply that discrete damage to the primary motor cortex can induce different alterations of speech production (Eickhoff et al., 2009) and in the present cases it may have disrupted stored feedforward speech motor commands of phonetic features signaling regional accent (see Guenther et al., 2006). Indeed, circumscribed damage to the left primary motor cortex has been associated with different types of FAS affecting prosody (Takayama et al., 1993; Carbary et al., 2000; Katz et al., 2012), phonetics (Schiff et al., 1983; Kurowski et al., 1996), or both (Blumstein et al., 1987; Berthier et al., 1991; Scott et al., 2006). However, in the three patients described herein involvement of the middle part of the left primary motor/premotor cortices and their immediately adjacent areas did not induce FAS (but see below) as in the aforementioned cases with lesions restricted to these regions (Berthier et al., 1991; Takayama et al., 1993; Sakurai et al., 2015).

Since the cortical motor system is organized in a somatotopic fashion, features of articulatory/phonatory activities are controlled by different parts of motor and premotor cortex (Hauk et al., 2004; Pulvermüller et al., 2006). Previous brain imaging studies showed that the area with strongest activation in speech tasks corresponds to the region in the motor cortex underlying vocal fold activity (the larynx area; Brown et al., 2008; Simonyan, 2014). This cortical region has been repeatedly implicated in the pathogenesis of FAS (Berthier et al., 1991; Takayama et al., 1993; Tomasino et al., 2013; Sakurai et al., 2015). In one study, the left larynx area increased its activity to compensate for the involvement of other areas relevant for phonation/articulation (Tomasino et al., 2013). By contrast, very discrete damage to the larynx area correlated with abnormal prosodic production (Takayama et al., 1993) and prolongation of silent intervals between words (Sakurai et al., 2015) in previous FAS cases. We found that our three patients produced entire voiceless segments including vowels at the medial (RC) and final (OM and RC) position of words and utterances, a finding that reflects poor control of phonatory vocal vibration (Sakurai et al., 2015) most likely due to involvement of the larynx area in the left motor cortex.

The discrepancy between our patients and previous cases of FAS lies in the fact that the consistent damage to the left primary motor cortex in all three patients coexisted with involvement of other regions (insula, striatum, pre-SMA, and SMA) which were not simultaneously damaged in previous cases of FAS. Patient OM had a small hemorrhage involving the anterior insula/putamen region with minimal dorsal extension into the inferior frontal cortex, yet this lesion was right-lateralized and not placed in the left hemisphere as was described in previous cases of FAS (Scott et al., 2006; Moreno-Torres et al., 2013). By contrast, focal insular involvement in RC was in the left hemisphere, occupied the anterior sector, and extended into the anterior putamen. The rest of the left insular cortex, together with ventral premotor and motor cortices and superior temporal gyrus, showed post-stroke atrophic changes. In patient JF, the cerebral abscess impinged upon the left anterior insula and displaced it medially, but these pressure effects were no longer detected in a post-surgical CT, which disclosed only a small residual lesion in the left primary motor cortex. The role of involvement of these cortical and subcortical regions in accent change is described below.

#### Insula

The insula acts as a multimodal integration hub to coordinate the activity of a number of regions important for verbal and non-verbal processing (Mesulam and Mufson, 1985; Habib et al., 1995; Ackermann and Riecker, 2004, 2010). Moreover, the left anterior insula is a key component of the planning network for speech production, playing a role in the formulation of complex articulation (Wise et al., 1999; Ackermann and Riecker, 2004, 2010; Riecker et al., 2005; Baldo et al., 2011). It is also important for producing speech with a distinct rhythm/intonation structure (Scott et al., 2006) and for phonetic learning (Golestani et al., 2007; Ventura-Campos et al., 2013). Insular involvement alone (Scott et al., 2006) or associated with damage to other regions (Moreno-Torres et al., 2013) has been described in FAS. Nevertheless, insular involvement in our cases did not induce FAS but it may well have contributed to an alteration of regional accent characteristics by causing additional deficits in the production of emotional prosody and, to a lesser extent, of linguistic prosody. In fact, by virtue of its strong connections with limbic and paralimbic regions, the insula may play a key role in adjusting motor speech to the speaker's emotional state (Mesulam and Mufson, 1985; Ackermann and Riecker, 2004, 2010). Insular damage in OM and RC or dysfunction in JF would have reduced modulation of prosody, particularly in emotional contexts, eventually leading to monotonous emissions (Ackermann and Riecker, 2004, 2010). Monotonous speech along with attenuation of phonetic features of native (regional) speech may have greatly influenced the expert judges' decision to classify speech production as lacking regional accent, discarding the diagnosis of FAS.

#### Putamen

The right and left anterior putamen were involved in OM and RC, respectively. The role of the left putamen in speech articulation is supported by cases of FAS (Gurd et al., 1988) and by the emergence of a regional variant of the patient's native language (Naidoo et al., 2008), both occurring after lesions restricted to it. Stroke lesions close to the left putamen have also been associated with loss of regional accent in a single patient (Alexander et al., 1987), whereas a loss of premorbid talent to imitate several dialects followed a stroke lesion in the right putamen (patient 2 in Van Lancker Sidtis et al., 2006). The putamen is part of the cortico-striato-thalamocortical network (Alexander et al., 1986) and acts as a relay station between the pre-SMA and motor cortex, so that its damage may alter the interplay between planning and execution of speech motor acts.

#### Pre-SMA and SMA

The pre-SMA is related to linguistic processing (Catani et al., 2013), whereas the SMA proper participates in speech initiation, coordination and speech monitoring (Laplane et al., 1977; Crosson et al., 2001; Alario et al., 2006). Both the pre-SMA and SMA play a role in planning and motor initiation and interact with the executive motor cortex via the basal ganglia (motor loop) and thalamus (Bohland and Guenther, 2006; Bohland et al., 2010). Lesion mapping studies show that damage to medial frontal cortex (pre-SMA and SMA) interrupting (or not) the frontal aslant tract (FAT) has been associated with speech arrest (Martino et al., 2012), reduced speech fluency (Catani et al., 2013; Basilakos et al., 2014; Kronfeld-Duenias et al., 2014), and impaired morphological derivation of verbs (Sierpowska et al., 2015). The left pre-SMA and SMA were affected in patient RC and their involvement most likely disrupted connectivity through U-fibers with the precentral cortex and cingulate cortex and through long-distance tracts with the striatum and pars opercularis of the inferior frontal cortex via the FAT (Vergani et al., 2014). In RC, involvement of the left SMA may also have interrupted connectivity with the left Heschl's gyrus (both cortical areas were damaged), altering monitoring during overt speech production (McGuire et al., 1996; Christoffels et al., 2007; Toyomura et al., 2007; van de Ven et al., 2009) and hence contributing to RC's lack of insight about his accent change. On the other hand, he was fully aware of the flatness of his emissions, presumably because the right SMA and Heschl's gyrus needed for informing the speaker about the status of emotional prosody production remained functional. Thus, involvement of the left pre-SMA and SMA was in a position to disrupt their dynamic interplay with structures important for planning and execution of speech as well as speech monitoring.

However, the contribution of the left pre-SMA, SMA and the origins of FAT to symptomatology in patient RC should be interpreted with caution. These structures were invaded by the AVM from early brain development, thus suggesting reshaping of original functions in other cortical regions (Lazar, 2001). Even though the left primary motor cortex was not involved by the AVM, the fact that handedness had changed might have modified its somatotopic arrangement (Klöppel et al., 2007, 2010), perhaps influencing the effects of brain damage. There are some clues that brain reorganization of language could have taken place in RC. First, he was a forced left-hander, a condition defined as "innately left-handed individuals forced to write with the non-dominant right hand." It has been demonstrated that being a forced left-hander modifies the architecture of the primary motor cortex (Siebner et al., 2002; Klöppel et al., 2007, 2010), thus implying that damage to such region in the left hemisphere might not have produced the same deficits as reported in right-handed individuals. Second, RC showed a surprisingly rapid and complete recovery from aphasia despite having involvement of the left central perisylvian language cortex. Third, despite having a left hemisphere lesion, RC's performance on the WAIS was highly discrepant, with unexpectedly higher verbal (111) than performance (97) IQ scores. Taken as a whole, these findings favor the position that some language functions were transferred to the right hemisphere during development (Vikingstad et al., 2000; Berthier et al., 2011), although some intrahemispheric reorganization to regions adjacent to the medial frontal cortex cannot be excluded (Duffau et al., 2000). 
Early cross-hemispheric plasticity may have assured normal speech and language acquisition in RC (he had no history of speech-language disability) but possibly interfered with typical right hemisphere cognitive functions, the so-called "crowding effect" (Satz et al., 1994; Lidzba et al., 2006). Lazar et al. (2000), using functional MRI, showed that one of their patients (case 2) with a left frontal AVM had a mirror-reversed pattern of activation in the right hemisphere (insula, frontal operculum pars opercularis, and inferior frontal gyrus) during word-list generation.

## CONCLUSION

In summary, the results of the present case series suggest that damage to the left premotor/motor cortex and other nodes of the speech production network (insula, basal ganglia, pre-SMA, and SMA) can alter segmental and suprasegmental features that characterize regional accents. Loss of regional accent was long-lasting in the two patients who had additional damage to other structures involved in speech production, thus suggesting that these lesions exerted an additive negative effect precluding full spontaneous recovery. Further studies using functional neuroimaging are required to examine more fully the potential contribution of dysfunction of different components of the speech production network to both the emergence and the persistence of accent change. Moreover, studies examining the impact of losing regional accent on functional communication and psychosocial adjustment are strongly needed (see Lippi-Green, 1994; Miller et al., 2011; Moreno-Torres et al., 2013). Finally, we suggest that loss of regional accent should be added to the spectrum of disorders characterized by changes in accent after brain damage, like FAS and its variants.

### AUTHOR CONTRIBUTIONS

MB, GD, IM-T, and MM were involved in conception and design, acquisition of data, or analysis and interpretation of data. RR-C, NV, and KT-H interpreted neuroimaging data. MB, AB-C, DS-M, and MT-P drafted the article and revised it critically for important intellectual content; and all authors approved the final version for publication.

#### ACKNOWLEDGMENTS

The authors are grateful to Angela Signorini and Adelaida Ruiz for performing acoustic analyses and language testing, respectively. This study was partially funded by a grant from the Raúl Carrea Institute for Neurological Research (FLENI), Buenos Aires, Argentina to MB and MM. The authors are also grateful to Mary Griffith for critical reading of the final version.

#### REFERENCES

hypothesis. *Brain Lang.* 89, 320–328. doi: 10.1016/S0093-934X(03)00347-X


variation on phoneme perception. *Cognition* 111, 390–396. doi: 10.1016/j.cognition.2009.02.013


motor area in man. *J. Neurol. Sci.* 34, 301–314. doi: 10.1016/0022-510X(77)90148-4


fasciculus and other white matter pathways in recovery of spontaneous speech. *Brain* 112, 1–38. doi: 10.1093/brain/112.1.1


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Berthier, Dávila, Moreno-Torres, Beltrán-Corbellini, Santana-Moreno, Roé-Vellvé, Thurnhofer-Hemsi, Torres-Prioris, Massone and Ruiz-Cruces. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*