**WHAT WE LEARN AND WHEN WE LEARN IT: SENSITIVE PERIODS IN DEVELOPMENT**

**Topic Editors Virginia Penhune and Étienne de Villers-Sidani**

#### *FRONTIERS COPYRIGHT STATEMENT*

© Copyright 2007-2014 Frontiers Media SA. All rights reserved.

All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

Cover image provided by Ibbl sarl, Lausanne CH

**ISSN** 1664-8714 **ISBN** 978-2-88919-327-1 **DOI** 10.3389/978-2-88919-327-1


## **WHAT WE LEARN AND WHEN WE LEARN IT: SENSITIVE PERIODS IN DEVELOPMENT**

Topic Editors:

**Virginia Penhune,** Concordia University, Canada

**Étienne de Villers-Sidani,** Montreal Neurological Institute, McGill University, Canada

The impact of training or experience is not the same at all points in development. Children who receive music lessons or learn a second language before age 7–8 are more proficient as adults. Early exposure to drugs or trauma makes people more likely to become addicted or depressed later in life. Rat pups exposed to specific frequencies from 9 to 13 days post-partum show expanded cortical representations of these frequencies. Young birds must hear and copy their native song within 1–2 months of birth or they may never learn it at all. These are examples of sensitive periods: developmental windows where maturation and specific experience interact to produce differential long-term effects on the brain and behavior.

While still controversial, evidence for the existence of sensitive periods has grown, as has our understanding of the underlying mechanisms of brain plasticity. Behavioral evidence from studies of language, psychopathology or vision in humans has been complemented by evidence elucidating molecular, gene and hormonal mechanisms in animals. It has been proposed that sensitive periods can be both opened and closed by specific experience, and that there are multiple, overlapping sensitive periods that occur throughout development as functions come online. It is also likely that experience-dependent behavioral or brain plasticity accrued during one sensitive period can serve as a scaffold on which later experience and plasticity can build.

Based on current knowledge, there are a number of broad questions and challenges to be addressed in this domain. These include: generating new information about the neurobiological mediators of structural and functional changes; proposing models of brain development that will better predict when sensitive periods should occur and what functions are implicated; investigating the interaction between experience during a sensitive period and pre-existing individual differences; and examining the relationship between experience during a sensitive period and ongoing experience.

The goal of this Research Topic is to bring together scientists in different fields whose work addresses these issues, including animal and human developmental neuroscience, language and cognitive development, education, developmental psychopathology and sensory neuroscience.

# Table of Contents


*Musical Training Heightens Auditory Brainstem Function During Sensitive Periods in Development*

Erika Skoe and Nina Kraus

*The Relationship Between the Age of Onset of Musical Training and Rhythm Synchronization Performance: Validation of Sensitive Period Effects*

Jennifer A. Bailey and Virginia B. Penhune

## Time for new thinking about sensitive periods

## *Virginia Penhune<sup>1,2</sup>\* and Etienne de Villers-Sidani<sup>2,3</sup>*

*<sup>1</sup> Laboratory for Motor Learning and Neural Plasticity, Psychology, Concordia University, Montreal, QC, Canada*

*<sup>2</sup> International Laboratory for Brain, Music and Sound, Montreal, QC, Canada*

*<sup>3</sup> Neurology and Neurosurgery, Montreal Neurological Institute, Montreal, QC, Canada*

*\*Correspondence: vpenhune@gmail.com*

#### *Edited and reviewed by:*

*Maria V. Sanchez-Vives, ICREA-IDIBAPS, Spain*

**Keywords: brain plasticity, early experience, cognitive development, neuro-development, brain maturation**

The impact of training or experience is not the same at all points in development. Children who receive music lessons or learn a second language before age 7–8 are more proficient as adults. Early exposure to drugs or trauma makes people more likely to become addicted or depressed later in life. Rat pups exposed to specific frequencies from 9 to 13 days post-partum show expanded cortical representations of these frequencies. Young birds must hear and copy their native song within 1–2 months of birth or they may never learn it at all. These are examples of sensitive periods: developmental windows where maturation and specific experience interact to produce differential long-term effects on the brain and behavior.

While still controversial, evidence for the existence of sensitive periods has grown, as has our understanding of the underlying mechanisms of brain plasticity. Behavioral evidence from studies of language, psychopathology or vision in humans has been complemented by evidence elucidating molecular, gene, and hormonal mechanisms in animals. It has been proposed that sensitive periods can be both opened and closed by specific experience, and that there are multiple, overlapping sensitive periods that occur throughout development as functions come online. It is also likely that experience-dependent behavioral or brain plasticity accrued during one sensitive period can serve as a scaffold on which later experience and plasticity can build.

Research into sensitive periods—or the interaction between development and specific experience—has entered a new phase as evidenced by the range of contributions brought together in this volume. Until very recently, sensitive periods were considered to be relatively narrow phenomena, often associated with the acquisition of specific perceptual abilities. This narrow definition has now evolved into a broader concept suggesting that the timing of individual experience interacts with developmental changes in the brain to produce synergistic effects on perceptual, cognitive, and motor function.

The broad concept that the timing of individual experience interacts with brain development and might even guide it is illustrated by articles examining both lower and higher-level brain functions, such as the effect of age of start of music training on brain stem responses to speech sounds (Skoe and Kraus, 2013); the effect of age of language acquisition on discrimination of visual speech cues (Weikum et al., 2013) or novel language learning (Finn et al., 2013); and perceptual narrowing in infancy for cross-species voice perception (Friendly et al., 2013). This broader conceptualization of sensitive period effects is also illustrated by work examining the interaction of development and experience at different ages, including infancy (Bosseler et al., 2013; Weikum et al., 2013), early childhood (Bailey and Penhune, 2013; Putkinen et al., 2013) and even adulthood (Finn et al., 2013). Articles in this volume also examine sensitive period effects in the auditory and visual systems in relation to sensory loss or deprivation (Gordon et al., 2013; Voss, 2013). Sensitive period effects are being explored at a number of different levels of the nervous system, including work at the molecular and cellular levels. One study examines how the interaction of normal brain development and the timing of gene expression may explain pathology in developmental disorders (Kroon et al., 2013) and another paper reviews work using a rat model to study how the timing of a perinatal insult affects later auditory processing (Fitch et al., 2013). Finally, because complex experience impacts brain systems involved in multiple processes, a number of papers examine transfer across domains, especially the effect of musical training on language processing (Martínez-Montes et al., 2013; Putkinen et al., 2013; White et al., 2013).

Taken together, the articles selected for this Special Topic are outstanding examples of the range of questions and approaches that characterize the new approach to studying sensitive period effects today. We hope that they will provide both an empirical background and theoretical basis for future work.

Based on the research presented here, we see a number of broad questions and challenges to be addressed by future research into sensitive periods. These include: (1) generating new information about the neurobiological and experiential mediators of structural and functional brain changes; (2) proposing models of brain development that better predict when sensitive periods should occur and what functions would be implicated; (3) investigating the interaction between experience during a sensitive period and pre-existing individual differences; (4) examining the relationship between experience during a sensitive period and ongoing experience; and (5) determining the mechanisms by which sensitive period-like plasticity could be re-activated in the adult brain for the remediation of perceptual or cognitive impairments.

## **REFERENCES**


infant speech perception. *Front. Psychol.* 4:690. doi: 10.3389/fpsyg.2013.00690


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 21 January 2014; accepted: 24 March 2014; published online: 09 April 2014. Citation: Penhune V and de Villers-Sidani E (2014) Time for new thinking about sensitive periods. Front. Syst. Neurosci. 8:55. doi: 10.3389/fnsys.2014.00055*

*This article was submitted to the journal Frontiers in Systems Neuroscience. Copyright © 2014 Penhune and de Villers-Sidani. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Age-related sensitive periods influence visual language discrimination in adults

#### *Whitney M. Weikum<sup>1</sup>, Athena Vouloumanos<sup>2</sup>, Jordi Navarra<sup>3</sup>, Salvador Soto-Faraco<sup>4,5</sup>, Núria Sebastián-Gallés<sup>4</sup> and Janet F. Werker<sup>6</sup>\**

*<sup>1</sup> Department of Pediatrics, University of British Columbia, Vancouver, BC, Canada*

*<sup>2</sup> Department of Psychology, New York University, New York, NY, USA*

*<sup>3</sup> Parc Sanitari Sant Joan de Déu, CIBERSAM, Fundació Sant Joan de Déu, Barcelona, Spain*

*<sup>4</sup> Center for Brain and Cognition, Departament de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Barcelona, Spain*

*<sup>5</sup> Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain*

*<sup>6</sup> Department of Psychology, University of British Columbia, Vancouver, BC, Canada*

#### *Edited by:*

*Virginia Penhune, Concordia University, Canada*

#### *Reviewed by:*

*Natalie Phillips, Concordia University, Canada*

*Denis Burnham, University of Western Sydney, Australia*

#### *\*Correspondence:*

*Janet F. Werker, Department of Psychology, The University of British Columbia, 2136 West Mall, Vancouver, BC V6T 1Z4, Canada e-mail: jwerker@psych.ubc.ca*

Adults as well as infants have the capacity to discriminate languages based on visual speech alone. Here, we investigated whether adults' ability to discriminate languages based on visual speech cues is influenced by the age of language acquisition. Adult participants who had all learned English (as a first or second language) but did not speak French were shown faces of bilingual (French/English) speakers silently reciting sentences in either language. Using only visual speech information, adults who had learned English from birth or as a second language before the age of 6 could discriminate between French and English significantly better than chance. However, adults who had learned English as a second language after age 6 failed to discriminate these two languages, suggesting that early childhood exposure is crucial for using relevant visual speech information to separate languages visually. These findings raise the possibility that lowered sensitivity to non-native visual speech cues may contribute to the difficulties encountered when learning a new language in adulthood.

#### **Keywords: visual speech, language discrimination, sensitive period, adults, age of acquisition**

## **INTRODUCTION**

From the first days of life, language perception involves both auditory and visual speech information. The visual information available in talking faces contains linguistic cues often correlated with and complementary to the acoustic signal (e.g., Munhall and Vatikiotis-Bateson, 1998; Yehia et al., 1998). In adults, seeing talking faces enhances speech perception (Sumby and Pollack, 1954), and in some cases, can perceptually dominate overheard speech (see McGurk and MacDonald, 1976; Campbell, 2009). Similarly, there is evidence suggesting that very young infants can match heard speech with the corresponding talking faces (Kuhl and Meltzoff, 1982; Patterson and Werker, 2002), detect a mismatch between heard and seen speech (Kushnerenko et al., 2008; Bristow et al., 2009), and integrate mismatching audiovisual speech (Rosenblum et al., 1997; Burnham and Dodd, 2004; Desjardins and Werker, 2004). Moreover, both adults and young infants are able to discriminate between languages just from silent talking faces (Soto-Faraco et al., 2007; Weikum et al., 2007; Ronquest et al., 2010).

Sensitive periods in language development have been documented for both auditory and visual speech perception. Infants begin life with broad perceptual sensitivities that support learning phonetic properties from many of the world's languages (e.g., Saffran et al., 2006), but as their experience accumulates across the first year of life, their perceptual sensitivities become attuned to match the language(s) present in their environment (see Werker and Tees, 2005, for a review). This pattern is seen in age-related changes between 6 and 10 months of age for the discrimination of minimal pairs that are phonologically relevant to the infant's native language (e.g., Werker and Tees, 1984; Werker and Lalonde, 1988; Best et al., 1995; Bosch and Sebastián-Gallés, 2003; Tsao et al., 2006; Albareda-Castellot et al., 2011), in visual language discrimination (Weikum et al., 2007; Sebastián-Gallés et al., 2012), and even in auditory-visual matching (Pons et al., 2009). This tendency, often referred to as "perceptual narrowing" (Scott et al., 2007), seems to be extensively constrained by maturational factors, particularly in the domain of phonetic consonant discrimination (Peña et al., 2012).

An interesting case is when the listener is regularly exposed to more than one language (as is arguably the case for most of the world's population; see Brutt-Griffler and Varghese, 2004). Infants exposed to two different languages seem to maintain their sensitivity to the distinctions used in each of their languages. For example, at the end of the first year of life, bilingual infants can discriminate the heard speech sounds (Bosch and Sebastián-Gallés, 2003; Burns et al., 2003; Albareda-Castellot et al., 2011) and visual speech (Weikum et al., 2007) of both of their native languages. Thus, early life exposure to two languages results in a perceptual system that reflects, and is responsive to, the input from each language.

In stark contrast to the flexibility that "crib" bilinguals show, individuals who acquire a second language in adulthood have notorious difficulty learning to discriminate some of the phonological categories in their second language (L2). One of the best known examples is the difficulty Japanese learners often have in discriminating the English /r/ vs. /l/ contrast (Goto, 1971). It is equally hard for English speakers to learn to discriminate the dental /da/ vs. retroflex /Da/ sounds used in Hindi (Werker et al., 1981). In both cases, while intensive training can lead to some improvement, performance does not reach the level of native speakers (Tees and Werker, 1984; Lively et al., 1993; McClelland et al., 2002). Even highly proficient bilinguals, such as Spanish-native speakers of Catalan, can learn to discriminate contrasts specific to their L2 (i.e., /e/ vs. /ε/; Sebastián-Gallés and Soto-Faraco, 1999) but they nonetheless show poorer use of these distinctions in lexical decision and other higher level processing tasks (Pallier et al., 2001; Navarra et al., 2005; Sebastián-Gallés and Baus, 2005; Sebastián-Gallés et al., 2006; Díaz et al., 2008). Interestingly, the discrimination between Catalan sounds /e/ and /ε/ is enabled in Spanish-dominant Spanish-Catalan bilinguals who cannot otherwise distinguish these phonemes auditorily, when both the visual and the auditory speech information are available (Navarra and Soto-Faraco, 2007). This finding suggests that providing visual speech information can enhance discrimination of spoken L2 sounds.

Second language learners also show differences with regard to prosodic or supra-segmental language contrasts (e.g., Otake and Cutler, 1999). For instance, stress patterns on nonsense words are easily perceived by speakers of Spanish (a language in which stress can vary at the word level) but not speakers of French (a language in which stress is mostly invariant at the word level; Dupoux et al., 1997, 2008). Additionally, extensive training on some suprasegmentals (Mandarin tones) can lead to improvements in tone discrimination (Wang et al., 1999). However, in contrast to birth or very early bilinguals, adult L2 learners rarely achieve native-like performance.

Studies looking at the age of acquisition (AoA) of the second language suggest that the auditory phonemic system appears to start losing plasticity in early childhood. For example, among children who acquired a second language after age 7, auditory phonetic perception and production of accent-free speech are less precise than among children who acquired their second language before age 7 (e.g., Flege and Fletcher, 1992; Flege et al., 1995). Other studies indicate that even early bilinguals who learned their second language between birth and 6 years struggle on some phonological tasks in their second language (Pallier et al., 1997; Sebastián-Gallés and Soto-Faraco, 1999) and show, in general, poor sensitivity to phonetic distinctions from their non-dominant language when speech is presented acoustically (Navarra et al., 2005; Sebastián-Gallés and Baus, 2005). Early auditory language exposure thus seems important for achieving native-like phonological processing and accent-free fluency, though the age at which performance deteriorates can vary with the task.

Evidence concerning the importance of early experience for language acquisition also comes from studies of children and adults who, through adoption or immigration, had first language attrition to some degree while acquiring a second language. An influential series of studies tested adults who had been adopted from Korea between the ages of 3 and 9 into French homes and hence had little to no opportunity to speak or even hear Korean thereafter. These adults showed no savings from their early exposure to Korean, and were unable to recognize sentences or understand individual words in Korean (Pallier et al., 2003), or to discriminate the Korean 3-way distinction among plain, tense and aspirated voiceless Korean stops (not used in French; Ventureyra et al., 2004). Indeed, their performance on these speech contrasts was not significantly different from that of French speakers who had no exposure to Korean as children. In contrast, other studies have found lasting influences from the first language even years after it had attrited. For example, Korean adoptees to the U.S. were able to discriminate Korean words better than English listeners, particularly if they had some re-exposure to Korean (Oh et al., 2003). Moreover, studies following exposure to languages as diverse as Korean, Spanish, and Hindi—even just during the infancy period with subsequent loss of that first language—show a significant advantage in training studies or language learning classes for learning auditory phonetic contrasts from the attrited language (Tees and Werker, 1984; Au et al., 2002; Knightly et al., 2003; Oh et al., 2003, 2010; Hyltenstam et al., 2009). Thus, to the extent that retraining is seen as reactivation of old memory traces (e.g., Bjork and Bjork, 2006), one can say that exposure during the first few years of life can have a lasting effect on sensitivity to phonemic contrasts.

Despite all the research in speech perception, the vast majority of studies deal with auditorily presented materials. Much less is known about the development of visual speech perception capabilities. As previously mentioned, monolingual infants aged 4 and 6 months are able to discriminate their native language from an unfamiliar language just by watching silent talking faces, but no longer do so by 8 months unless they are growing up in a bilingual environment (Weikum et al., 2007; Sebastián-Gallés et al., 2012). Nonetheless, there is some latent sensitivity to visual information even among adults, but only if they know one of the languages. For example, Soto-Faraco et al. (2007) found that adult Spanish, Catalan, and Spanish-Catalan bilinguals were able to discriminate visual Spanish from visual Catalan significantly better than chance, whereas Italian and English speakers were not. Using two languages that were less similar, English and Spanish, Ronquest et al. (2010) reported similar results.

A question that these studies do not address is whether there is an influence of AoA for one of the test languages on visual language processing, in the same way that this variable plays an important role in auditory language perception. There is one suggestion in the literature of such an effect in a study of visual language discrimination of Finnish vs. Swedish where a trend was observed for better discrimination by participants' age of arrival in Sweden (Öhrström et al., 2009). The current study investigated precisely this question: Does age of acquisition of an L2 play a role in the ability to visually discriminate the L2 language from other languages? In order to investigate this issue, we tested adult participants from varied (non-French) language backgrounds who had acquired English at different ages (from birth to late childhood) on the visual French and English stimuli (used in previous work with infants, Weikum et al., 2007; Sebastián-Gallés et al., 2012). English and French differ both rhythmically and phonetically. Rhythmically, the two languages differ as English is a stress-timed language and French is a syllable-timed language (Pike, 1945; Abercrombie, 1967). Phonetically, segmental differences, such as more vowel lip-rounding and greater degree of lip protrusion in French, and the use of interdental articulations in English, exist between the two languages (Benoit and Le Goff, 1998).

On the basis of the literature reviewed above, showing age of acquisition effects on phonetic (segmental) and supra-segmental auditory speech perception, we hypothesized that visual language discrimination would also be influenced by the age at which the second language was learned. We therefore tested adults who had learned English at different ages. We divided the adults into three groups. The first group (Infant Exposure) was comprised of adults who had acquired English in infancy (by 2 years), either as a single language or in a dual language-learning environment. Because an effect has been found for visual language discrimination between 6 and 8 months (Weikum et al., 2007), we were interested to determine whether this decline in visual language discrimination provides evidence for an optimal period in infancy that has lifelong consequences, or whether it shows a (re)organization process that has begun, but has not yet become permanent. However, adults are not accurate in reporting precisely when input from a second language began (especially if it was early in life), so we decided to use a broad range (0–2) to cover infancy. Thus, although a cut-off at 6 months of age would have provided an ideal comparison for the perceptual change found in the infant work, to be conservative we used a 2-year cut-off. The second group (Early Exposure) was comprised of adults who had acquired English after age 2 and before 6 years. Previous studies examining auditory speech perception and production have suggested that age 6 may be an important cut-off for phonological processing and accent-free speech (e.g., Flege and Fletcher, 1992; Flege et al., 1995) and studies have also shown that even early bilinguals may show differences on difficult phonological tasks (Pallier et al., 1997; Sebastián-Gallés and Soto-Faraco, 1999). Thus, this middle age group was comprised of Early, but not "crib" bilinguals. From a theoretical perspective, this group would include individuals who acquired the second language once the perceptual reorganization for the first language had already been established. The third group (Late Exposure) was comprised of adults who had acquired English after age 6 and before age 15. We compared these three groups on their ability to discriminate English visual speech from French visual speech (a non-native language for all the participants).

We predicted that the adults' ability to discriminate English from French based on visual information alone would depend on the age at which they learned English. To control for the possibility that short-term familiarity with a speaker could enhance language discrimination, we showed all participants videos of three different bilingual speakers and tested participants under two conditions. In the random condition, paired sentences from all three speakers were presented in random order. In the blocked condition, the participants viewed all the sentence pairs from each of the three speakers in succession. If the blocked condition (where participants were able to see the same speaker over and over) conferred any short-term familiarity benefits, we would expect improved performance among the participants in the blocked condition.
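The difference between the two presentation conditions can be illustrated with a short Python sketch. It is not the authors' presentation software: the speaker labels, the even split of 24 sentence pairs across three speakers (8 each), the fixed speaker order in the blocked condition, and the function name `build_trials` are all assumptions made for illustration.

```python
import random

# Hypothetical speaker labels; the study used three bilingual speakers.
SPEAKERS = ["speaker_A", "speaker_B", "speaker_C"]
# Assumption: 24 sentence pairs split evenly across the three speakers.
PAIRS_PER_SPEAKER = 8


def build_trials(condition, seed=None):
    """Return (speaker, pair_index) tuples in presentation order."""
    rng = random.Random(seed)
    trials = [(s, i) for s in SPEAKERS for i in range(PAIRS_PER_SPEAKER)]
    if condition == "random":
        # Pairs from all speakers are intermixed.
        rng.shuffle(trials)
        return trials
    if condition == "blocked":
        # All pairs from one speaker are shown before the next speaker.
        # Speaker order is kept fixed here (an assumption); pairs are
        # shuffled within each speaker's block.
        ordered = []
        for s in SPEAKERS:
            block = [t for t in trials if t[0] == s]
            rng.shuffle(block)
            ordered.extend(block)
        return ordered
    raise ValueError(f"unknown condition: {condition!r}")
```

Calling `build_trials("blocked")` yields three contiguous runs of one speaker each, whereas `build_trials("random")` interleaves speakers, which is what allows any short-term familiarity benefit to be isolated.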

## **METHODS**

#### **PARTICIPANTS**

In accordance with the Behavioral Research Ethics Board at the University of British Columbia, all participants gave informed consent before participating. There were 120 adult participants (see **Table 1** for details). Sixty participants had learned English as a first language (L1) before age 2. In this group, 40 participants had learned only English and 20 participants had learned English in conjunction with another language (Infancy multilinguals). An additional group of 60 had learned English as a second language (L2) after the age of 2 years. These L2 participants were further divided according to the age at which they started to learn English. Thirty participants had learned English as a second language in early childhood (age 2–6 years; Early multilinguals), and 30 participants had learned English as a second language in late childhood (age 6–15 years; Late multilinguals). Although the first language (L1) of the L2 participants was quite varied, the majority of the languages were either Cantonese or Mandarin (see **Table 2** for participant language background information). None of the participants were fluent in French.<sup>1</sup>
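The grouping by age of acquisition can be sketched as a simple classification rule. The text gives the cut-offs (2, 6, and 15 years) but does not say how ages falling exactly on a boundary were assigned, so the inclusive upper bounds below are an assumption, as are the function name and the group labels.

```python
def exposure_group(age_of_acquisition):
    """Map age of English acquisition (years) to an exposure group.

    Boundary handling is an assumption: each cut-off is treated as the
    inclusive upper bound of the younger group.
    """
    if age_of_acquisition < 0:
        raise ValueError("age of acquisition cannot be negative")
    if age_of_acquisition <= 2:
        return "Infant"   # English from birth or by 2 years
    if age_of_acquisition <= 6:
        return "Early"    # after age 2, up to 6 years
    if age_of_acquisition <= 15:
        return "Late"     # after age 6, up to 15 years
    raise ValueError("outside the age range sampled in the study")
```

For example, a participant who began learning English at age 4 would fall in the Early group under this rule, and one who began at age 10 would fall in the Late group.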

All subjects were highly proficient in English. All courses at the university they were attending were in English, and all who had English as a second language had passed the mandatory TOEFL requirement. In addition, we asked participants who had learned English as a second language, or simultaneously with another language from birth, to rate themselves on their English proficiency. The first 11 participants rated their proficiency on a 7-point Likert scale where (1) represented native-like and (7) represented beginner. We switched to a more detailed questionnaire (Desrochers, 2003) for the remaining participants. This included 8 oral comprehension and 14 oral production questions. For each question, participants rated the difficulty of various speech activities on a 9-point Likert scale as very easy (1) to very difficult (9). The mean answer to these 22 questions was used as each participant's proficiency score. Proficiency in English was not available for 2 participants who had learned English simultaneously with another language.
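As a worked illustration of the questionnaire-based score, the following Python sketch averages the 8 comprehension and 14 production ratings. The function name and the input validation are assumptions added for illustration; only the item counts, the 1–9 scale, and the use of the mean come from the description above.

```python
from statistics import mean

N_COMPREHENSION = 8   # oral comprehension items
N_PRODUCTION = 14     # oral production items


def proficiency_score(comprehension, production):
    """Mean of the 22 ratings (1 = very easy, 9 = very difficult)."""
    if len(comprehension) != N_COMPREHENSION or len(production) != N_PRODUCTION:
        raise ValueError("expected 8 comprehension and 14 production ratings")
    ratings = list(comprehension) + list(production)
    if any(r < 1 or r > 9 for r in ratings):
        raise ValueError("ratings must lie on the 9-point scale (1-9)")
    return mean(ratings)
```

Because lower ratings mean less difficulty, a lower score on this scale indicates higher self-rated proficiency.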

<sup>1</sup>One of the subjects in the Late multilingual group whose first language was Mandarin subsequently became proficient in both French and English, but no longer uses French.


*\*Age at test was only available for 109 participants.*

#### **STIMULI**

The faces of three balanced bilingual (French/English) speakers were recorded while they recited sentences in both English and French. The French and English sentences were taken from the French and English versions of the book "The Little Prince," and were selected to overlap in content (same sentence translations) and to be roughly equivalent in length (see **Appendix** for examples). The sentences from each language were then individually digitized with the sound removed, to create 8–13 s silent video clips. There were no significant differences between sentence lengths for the English [average 37.24 (*SD* = 6.00) syllables] and the French [average 33.24 (*SD* = 5.88) syllables] video clips.

## **PROCEDURE**

Participants were tested in a sound-attenuated room and sat at eye level with the monitor (17″) of a Pentium 4 PC. From a distance of ∼75 cm, the participants watched 24 pairs of sentences, and each pair was played consecutively. For each pair of sentences, a white fixation point would first appear in the center of the black screen for 500 ms. Following this, a red frame with the speaker silently reciting one of the sentences would appear and was followed by a 1 s interval of black screen before the second sentence in the pair was played inside a green frame. Participants were asked to press the right mouse button (marked with an S) if they thought both clips were in the same language and the left mouse button (marked with a D) if they thought that they were from different languages. Participants were instructed to respond during the second sentence (green frame) as soon as they were sure of their judgment. If a response was not made during the second sentence, a white question mark appeared in the center of the black screen and was displayed until a response was made or 2000 ms elapsed. The language for each sentence clip was chosen pseudorandomly by the computer for each participant. The order and total number of sentences were set to be equiprobable, with each sentence appearing only once.

The two sentences in a given trial were spoken by the same person and were different in content. In the random condition, the clips used in a given trial were selected randomly from one of the three speakers. In the blocked condition, eight clip pairs from each individual speaker were presented consecutively before moving on to the eight pairs from the next speaker. This allowed for a test of potential improvement across exposure to each speaker. The order of the speakers was counterbalanced for each condition and the speaker order for the blocks was counterbalanced across participants.
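The blocked and random speaker orders described above can be illustrated with a minimal Python sketch. It is not the authors' presentation software: the speaker labels, the `build_trials` function, the even split between same-language and different-language pairs, and the choice to keep eight pairs per speaker in the random condition are all assumptions made only for illustration.

```python
import random

SPEAKERS = ["speaker_A", "speaker_B", "speaker_C"]  # hypothetical labels
PAIRS_PER_SPEAKER = 8  # 3 speakers x 8 pairs = 24 trials


def build_trials(condition, rng):
    """Return 24 (speaker, trial_type) tuples for one participant.

    condition: "blocked" (all pairs from one speaker before the next)
               or "random" (speakers intermixed across trials).
    trial_type: "same" or "different" language pair. The 50/50 split
    per speaker is an assumption, not taken from the article.
    """
    if condition not in ("blocked", "random"):
        raise ValueError("condition must be 'blocked' or 'random'")
    speaker_order = list(SPEAKERS)
    rng.shuffle(speaker_order)  # stands in for counterbalancing speaker order
    half = PAIRS_PER_SPEAKER // 2
    trials = []
    for speaker in speaker_order:
        types = ["same"] * half + ["different"] * half
        rng.shuffle(types)
        trials.extend((speaker, trial_type) for trial_type in types)
    if condition == "random":
        rng.shuffle(trials)  # intermix speakers across the 24 trials
    return trials


blocked = build_trials("blocked", random.Random(1))
print(len(blocked), blocked[0][0], blocked[8][0], blocked[16][0])
```

Passing an explicit `random.Random` instance keeps each participant's sequence reproducible, which is one simple way to obtain a pseudorandom but repeatable order.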

## **RESULTS**

Using group means, a series of one-sample *t*-tests revealed that across all ages of acquisition, both the English L1 group (English learned alone in infancy or simultaneously with another language) [*M* = 60%, *t*(59) = 6.84, *p* < 0.001] and the English L2 group (Early and Late multilinguals) [*M* = 54%, *t*(59) = 3.00, *p* < 0.05] discriminated the languages significantly better than chance, and did so in both the Random [*M* = 57%, *t*(59) = 4.56, *p* < 0.001] and Blocked [*M* = 58%, *t*(59) = 4.99, *p* < 0.001] speaker orders. A univariate analysis of variance (ANOVA) including sex, language background (English as L1 or English as L2), and speaker order (blocked or random) yielded only a significant main effect for language background [*F*(1, 119) = 8.08, *p* < 0.05; **Figure 1**]. Simple main effect analyses showed that the English L2 speakers performed significantly worse than the English L1 speakers [*F*(1, 119) = 5.40, *p* < 0.05].
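As an illustration of the comparison against chance used above, the following sketch computes a one-sample *t* statistic for accuracy scores against 50%, the chance level in a same/different judgment. The function name and the response counts are hypothetical and are not the study data; the sketch returns only the statistic and its degrees of freedom, not a *p* value.

```python
import math
import statistics


def one_sample_t(scores, mu):
    """Return (t, df) for a one-sample t-test of mean(scores) against mu."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("scores have zero variance")
    t = (mean - mu) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical counts of correct responses out of 24 trials per participant.
correct_counts = [15, 13, 16, 12, 14, 17, 11, 14]
accuracy = [c / 24 for c in correct_counts]

t_stat, df = one_sample_t(accuracy, mu=0.5)  # 0.5 = chance accuracy
print(f"t({df}) = {t_stat:.2f}")
```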

**FIGURE 1 | Accuracy (percentage correct) in identifying whether silent video clips were from the same or different languages in both Random and Blocked speaker orders.** The *y*-axis represents mean accuracy; the *x*-axis represents whether the adults had learned English before age 2 (L1) or after the age of 2 years (L2). Filled-in symbols represent the group means. Error bars represent the standard error of the mean. ∗*p* < 0.05.

To probe whether age of acquisition of English had an effect on visual speech discrimination, we ran additional analyses. An ANOVA analyzing the effect of age of English acquisition (age 0–2, 2–6, 6–15) yielded a significant effect [*F*(2, 117) = 5.55, *p* < 0.05]. Planned comparisons focusing on the multilingual participant groups revealed that the Infant and Early multilingual age groups did not perform significantly differently from each other [*F*(1, 48) = 0.24, *p* = 0.63], but did perform better than adults who acquired English in late childhood (6–15 years) [*F*(1, 78) = 3.90, *p* = 0.05]. In fact, performance was significantly better than chance for multilingual learners who acquired English in infancy [*M* = 56%, *t*(19) = 2.69, *p* < 0.02] and learners who acquired English in early childhood [*M* = 57%, *t*(29) = 3.53, *p* < 0.02], but not for participants who acquired English in late childhood [*M* = 52%, *t*(29) = 0.82, *p* = 0.417]. These results are graphically illustrated in **Figure 2**, which also reveals that the vast majority of subjects in the infancy and early childhood groups, but not in the late English acquisition group, performed better than chance.

**FIGURE 2 | Accuracy (percentage correct) in identifying whether silent video clips were from the same or different languages of multilingual adults who had learned English: simultaneously with another language before age 2 (Infancy), between age 2 and 6 (Early), and after the age of 6 (Late).** The *y*-axis represents mean accuracy and the *x*-axis represents the age at which English was learned. Filled-in symbols represent the group means. Error bars represent the standard error of the mean. ∗*p* < 0.05.

We performed several follow-up analyses with the multilingual groups in order to explore whether proficiency or number of years of experience, rather than age of acquisition (see Flege et al., 1997), could account for our findings. There was no significant correlation between discrimination performance and self-rated proficiency in English [*r*(77) = −0.18, *p* = 0.12]<sup>2</sup>. Correlations between discrimination performance and total years of experience with English [*r*(70) = 0.09, *p* = 0.48]<sup>3</sup>, and exposure to French [*r*(79) = 0.02, *p* = 0.84] also failed to reach significance. However, there were significant group differences between the means for proficiency scores, 1.16 (Infant multilinguals), 1.48 (Early multilinguals), and 1.95 (Late multilinguals) [*F*(2, 74) = 5.92, *p* < 0.01], as well as the group means for years of experience, 20.1 (Infant multilinguals), 16.5 (Early multilinguals), and 12.4 (Late multilinguals) [*F*(2, 65) = 40.14, *p* < 0.01].

<sup>3</sup>The data for this analysis were only available for 71 of the 80 participants.
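The correlation analyses above report Pearson *r* values with their degrees of freedom. A minimal sketch of how such a coefficient is computed, and how it maps onto a *t* statistic with *n* − 2 degrees of freedom, is given below. The function names are hypothetical, and this is a generic textbook calculation rather than the authors' analysis script.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two lists of equal length with n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the variables has zero variance")
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, df):
    """t statistic (df = n - 2) for testing a correlation against zero."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(df / (1 - r * r))


# The reported proficiency correlation, r(77) = -0.18, as a t statistic.
print(f"t(77) = {r_to_t(-0.18, 77):.2f}")
```

A correlation this small yields a *t* value of modest magnitude, which is consistent with the non-significant result reported for self-rated proficiency.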

To further probe the possibility that self-rated proficiency or years of experience with English may have contributed to our findings, we equated the Early and Late Multilingual groups by selecting subsets with equivalent proficiency scores or years of experience. We selected a subset of Late multilinguals who scored between 1 and 3 on the proficiency scale [mean score = 1.48 (0.67), equivalent to that of the Early multilinguals, 1.53 (0.60)]. The results from the full sample concerning the influence of AoA were replicated in the restricted Late multilingual sample, as the late-learning multilinguals again failed to perform significantly better than chance [*M* = 53.3%, *t*(22) = 1.47, *p* = 0.16].

Similarly, we also tested the effect of AoA by selecting a subset of English L2 speakers who had an equivalent amount of experience in total number of years (12–19 years), and then within this group, compared the effects of early and late AoA. This resulted in 2 groups: 20 early bilinguals with a mean of 15.3 (1.26) years of experience and 16 late bilinguals with a mean of 14.06 (2.17) years of experience, wherein the mean years of exposure were not significantly different. The results from the full sample concerning the influence of AoA were replicated in this restricted sample: early bilinguals performed significantly better than chance [*M* = 56.0%, *t*(19) = 2.79, *p* < 0.05] while the late-learning bilinguals did not [*M* = 52.6%, *t*(15) = 0.96, *p* = 0.35].

## **DISCUSSION**

The age at which a language is learned (in this case, English) during childhood influences the ability to visually discriminate this language from others in adulthood. Interestingly, this effect of AoA could be examined separately from the influence of years of exposure or proficiency (self-rated). When tested on a visual language discrimination task, most participants who had learned English as a second language in late childhood (after 6 years) failed to discriminate English from French, whereas most participants who had learned English earlier, as infants (0–2 years old) or in early childhood (2–6 years old), succeeded. Allowing the participants to view the speakers in a blocked vs. random speaker order did not seem to have an influence on discrimination performance.

According to prior research, infants who are familiar with both languages (French and English since early infancy) retain the capacity to continue discriminating the languages visually at 8 months, while their monolingual counterparts fail (Weikum et al., 2007). This benefit arising from bilingual exposure appears to confer an advantage in adulthood too, as adults familiar with both test languages perform visual language discrimination significantly better than those familiar with only one of the test languages (Soto-Faraco et al., 2007). Based on the infant research, one might argue that the successful discrimination of French and English by monolingual English infants at 4 and 6 months, followed by a decline at 8 months, predicts that monolingual English adults should also fail to discriminate English and French (Weikum et al., 2007). However, the present findings (see also Soto-Faraco et al., 2007 for converging results) show that monolingual participants do indeed successfully discriminate their native language from an unfamiliar language. One reason adults succeed where older infants do not may be that adults are able to use a wider and more sophisticated range of strategies to resolve the task. However, if strategy alone accounted for the monolingual adults' success in language discrimination, then the failure of our English L2 late-learning adults to tell apart French from English would be surprising. Instead, our results suggest that exposure to one of the languages any time before age 6 allows for continued discrimination in adulthood.

<sup>2</sup>The data for this analysis were only available for 78 of the 80 participants and 1 participant's data was removed as their proficiency score was more than 3 *SD* from the mean.

Sensitive periods have been previously identified for phonemic segment discrimination in auditory spoken languages (for a review see Werker and Tees, 2005) and for acquisition of syntax in signed languages (Newport, 1990). The results from this study further support these findings by showing that sensitive periods also exist for language discrimination based on visual speech cues alone. Although it was not the intention of this study to address what these cues may be (see Soto-Faraco et al., 2007; Ronquest et al., 2010; Navarra et al., submitted, for work investigating the role of visual phonetic and rhythmical cues), our results suggest that some visual language cues are subject to sensitive periods. On the other hand, some of the subjects in the late acquisition group did succeed at discriminating visual French from visual English. Thus, either some cues are subject to sensitive period effects and others are not, and the subjects differentially attended to these cues, or there are individual differences between the subjects such that some retain greater openness to non-native information than do others. Understanding this within-group variability more deeply will be an important focus for future research. It will provide insight into the speech perception limitations faced by both first and second language learners, and provide guidance for improvement.

#### **ACKNOWLEDGMENTS**

This research was supported by research grants from the Natural Sciences and Engineering Research Council (NSERC) and the Social Sciences and Humanities Research Council (SSHRC) to Janet F. Werker, by NSERC and SSHRC Fellowships to Whitney M. Weikum, and by grants PSI2009-12859, PSI2012-39149 and RYC-2008-03672 from Ministerio de Economía y Competitividad (Spanish Government), and the European COST action TD0904 to Jordi Navarra. Salvador Soto-Faraco was supported by ERC (StG-2010263145), MICINN (PSI2010-15426 and Consolider INGENIO CSD2007-00012) and AGAUR (SGR2009-092).

#### **REFERENCES**

Abercrombie, D. (1967). *Elements of General Phonetics*. Chicago, IL: Aldine.


variability in learning new perceptual categories. *J. Acoust. Soc. Am.* 94, 1242. doi: 10.1121/1.408177


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 April 2013; accepted: 25 October 2013; published online: 13 November 2013.*

*Citation: Weikum WM, Vouloumanos A, Navarra J, Soto-Faraco S, Sebastián-Gallés N and Werker JF (2013) Age-related sensitive periods influence visual language discrimination in adults. Front. Syst. Neurosci. 7:86. doi: 10.3389/fnsys.2013.00086 This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2013 Weikum, Vouloumanos, Navarra, Soto-Faraco, Sebastián-Gallés and Werker. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

#### **APPENDIX**

## **SENTENCE EXAMPLES FROM THE BOOK, LE PETIT PRINCE/THE LITTLE PRINCE BY ANTOINE DE SAINT-EXUPERY**

#### *Sentence 1*

English version- The little prince had watched very closely over this small sprout which was not like any other small sprout on this planet.

French version- Le petit prince avait surveillé de très près cette brindille qui ne ressemblait pas aux autres brindilles.

#### *Sentence 2*

English version- If the two billion inhabitants who people the surface were all to stand upright, all humanity could be piled up on a small Pacific islet.

French version- Si les deux milliards d'habitants qui peuplent la terre se tenaient debout et un peu serrés, on pourrait entasser l'humanité sur le moindre petit îlot du Pacifique.

## Investigating mechanisms underlying neurodevelopmental phenotypes of autistic and intellectual disability disorders: a perspective

## *Tim Kroon†, Martijn C. Sierksma† and Rhiannon M. Meredith\**

*Department of Integrative Neurophysiology, Centre for Neurogenomics and Cognitive Research (CNCR), Neuroscience Campus Amsterdam, VU University, Amsterdam, Netherlands*

#### *Edited by:*

*Virginia Penhune, Concordia University, Canada*

#### *Reviewed by:*

*Carlos Portera-Cailliau, University of California Los Angeles, USA*

*Marco Atzori, University of Texas at Dallas, USA*

#### *\*Correspondence:*

*Rhiannon M. Meredith, Department of Integrative Neurophysiology, Centre for Neurogenomics and Cognitive Research (CNCR), Neuroscience Campus Amsterdam, VU University, De Boelelaan 1085, Room C448, 1081 HV, Amsterdam, Netherlands*

*e-mail: r.m.meredith@vu.nl*

*†These authors have contributed equally to this work.*

Brain function and behavior undergo significant plasticity and refinement, particularly during specific critical and sensitive periods. In autistic and intellectual disability (ID) neurodevelopmental disorders (NDDs) and their corresponding genetic mouse models, impairments in many neuronal and behavioral phenotypes are temporally regulated and in some cases, transient. However, the links between neurobiological mechanisms governing typically normal brain and behavioral development (referred to also as "neurotypical" development) and timing of NDD impairments are not fully investigated. This perspective highlights temporal patterns of synaptic and neuronal impairment, with a restricted focus on autism and ID types of NDDs. Given the varying known genetic and environmental causes for NDDs, this perspective proposes two strategies for investigation: (1) a focus on neurobiological mechanisms underlying known critical periods in the (typically) normal-developing brain; (2) investigation of spatio-temporal expression profiles of genes implicated in monogenic syndromes throughout affected brain regions. This approach may help explain why many NDDs with differing genetic causes can result in overlapping phenotypes at similar developmental stages and better predict vulnerable periods within these disorders, with implications for both therapeutic rescue and ultimately, prevention.

**Keywords: neurodevelopmental disorders, critical periods, gene expression, phenotype, development**

#### **RATIONALE AND BACKGROUND**

Cognitive disorders, including intellectual disability (ID) and autism spectrum disorders (ASD), are genetically and phenotypically highly heterogeneous. To date, more than 450 candidate genes are associated with ID and many hundreds with ASD, with numbers predicted to rise with the routine usage of high-throughput sequencing technology (Mitchell, 2011; van Bokhoven, 2011; State and Sestan, 2012). Despite the heterogeneity of genes underlying both syndromic and non-syndromic forms of ID and ASD, they are often characterized by early onset of symptoms, overlapping developmental delays and prominent regression of acquired behaviors in ASD during early childhood (Geschwind and Levitt, 2007). However, the underlying mechanisms and early temporal dysregulation in neuronal signaling pathways that trigger neurodevelopmental disorder (NDD) onset and regulate symptoms are not fully understood.

Many known candidate genes for both ID and ASD are expressed synaptically, regulate synapse function and morphology or are themselves regulated by synaptic activity (Ramakers, 2000; Zoghbi and Bear, 2012). For known monogenic NDD syndromes, genetic mouse models such as the Fragile X mental retardation 1 knockout (Fmr1-KO) mouse for Fragile X syndrome (FXS; Bakker et al., 1994) or tuberous sclerosis protein 1/2 (TSC 1/2) models for tuberous sclerosis (Meikle et al., 2007; Ehninger et al., 2008) have enabled the functional study of these genes in the intact brain. For many such mouse models, the target gene is permanently disrupted early on in development, either globally or in a cell-type specific manner. Nevertheless, recent data reveal developmentally regulated and transient synaptic phenotypes in NDD models despite a permanent alteration in genotype (Meredith et al., 2012).

Here, we propose that key developmental aspects of NDD symptoms can be better understood by focusing on the interactions between synaptic NDD gene pathways and the underlying known critical periods in the neurotypical brain. Further, we propose that clustering NDD gene groups on their neuro-spatiotemporal expression profiles, rather than biological functions alone, may reveal novel NDD genes and explain the developmental regulation of specific symptoms. Combining knowledge of key gene networks dysregulated in NDDs and their role during critical periods may elucidate causal mechanisms for symptom onset and further our understanding of critical periods in neurotypical brain development. The ideas presented are formulated as three testable hypotheses for validation in known genetic NDD syndromes (**Box 1**).

#### **BOX 1 | Hypothesis**

#### **Hypothesis 1**

Dysregulation of synaptic pathways occurs at the subcortical level in NDDs at 'presymptomatic' stages.

#### **Hypothesis 2**

Disruption of critical periods in subcortical regions such as the brainstem precedes and consequently disrupts critical periods in the thalamus and then the cortex.

#### **Hypothesis 3**

No differences in synaptic networks or critical periods in NDDs occur prior to the neurotypical pre- or postnatal expression of the NDD gene in that brain region.

## **DEVELOPMENTAL DELAYS AND POSTNATAL ONSET IN NDDs**

Although ASDs and several forms of ID are heterogeneous, symptoms often emerge during early development. Initial symptoms such as hypotonia and developmental delay of motor activities, impaired social interactions, repetitive behaviors and epileptic seizures can manifest early in life (Zoghbi and Bear, 2012). Hypotonia during early neonatal periods is correlated with delayed motor skill development in infancy and is characteristic of many monogenic disorders including FXS, Angelman syndrome and syndromic Oligophrenin-1 mutation ID (OPHN1; Kau et al., 2000; Bergmann et al., 2003; Clayton-Smith and Laan, 2003; Williams et al., 2006). Additionally, there is high comorbidity of epilepsy in ID and autism, and seizure activity is often developmentally regulated (Gillberg and Billstedt, 2000; Amiet et al., 2008; Ramamoorthi and Lin, 2011). For OPHN1 ID, absence and myoclonic jerks often develop into seizures with increasing frequency in the first 12 months (Bergmann et al., 2003). In Rett Syndrome, developmentally regulated seizures also occur along with regression of behaviors after 6–18 months of neurotypical progress (Steffenburg et al., 2001; Weaving et al., 2005). In many such disorders, the earlier the onset of first symptoms, the more severe the locomotor dysfunction and impairments in language acquisition (Gratchev et al., 2001). Speech and social interactions are commonly reported to be delayed in syndromes such as FXS and Angelman, where these impairments may be characterized as core symptoms or as part of an ASD comorbid with ID (Gillberg and Billstedt, 2000; Amiet et al., 2008; Ramamoorthi and Lin, 2011). Altogether, the overlapping symptoms, their temporally restricted onset and an overall developmental delay suggest a common NDD etiology in brain development.

The impact of developmental delays is not just confined to symptom onset but could extend beyond the presentation period to disrupt subsequent developmental stages. This concept of "sleeper effects" is illustrated for permanent visual impairments emerging later on in life due to a lack of early sensory experience (Maurer et al., 2007). Early hypotonia and impaired motor skills, or aberrant sensory modulation and social avoidance are paired examples where earlier developmental impairments can have lasting consequences upon later behavior, despite the fact that the initial impairment was transient or lessened with age (Baranek et al., 2006; Ben-Sasson et al., 2009). Although these reports were not longitudinal, the correlations suggest that impairments of sensory or motor functions affect the acquisition of complex behaviors such as speech, language and social interaction. However, while the prevalence of sensory impairments is significantly greater in those with ID than in the general population (Carvill, 2001) it is important to note that not all pre- or early postnatal sensory impairments such as congenital blindness or deafness are associated with later diagnosis of ID or autistic syndromes. The strong association with sensory impairments may, in part, arise from infections or perinatal events that cause extensive neurological damage but for genetic conditions such as Usher syndrome, specific visual and auditory impairments can occur without cognitive or social disabilities. Regardless of the genetic and environmental heterogeneity in underlying NDDs, impaired development is characteristic for both syndromic and non-syndromic NDDs. Here, within the category of NDDs we focus on genetically identified IDs and ASDs as these disorders are widely studied in humans and investigated in animal models. Further, we speculate that the syndromic and nonsyndromic disorders converge on similar developmentally regulated mechanisms.

## **CRITICAL PERIODS AND NORMAL BRAIN DEVELOPMENT**

Critical periods are developmental time-windows during which external stimuli have a heightened influence on the proper development of an organism. While the early stages of development are largely based on hard-wired genetic and molecular cues (Chilton, 2006; Marin et al., 2010), at later stages neuronal activity becomes an important factor contributing to circuit development in the brain (Lendvai et al., 2000; Spitzer, 2006). This activity can be intrinsically generated (Golshani et al., 2009; Rochefort et al., 2009) or induced by sensory stimulation (Siegel et al., 2012). Although neuronal circuits remain malleable by external stimuli throughout life, most circuits are especially sensitive to external input during restricted time-windows, or critical periods (Knudsen, 2004; Hensch, 2005). Consequently, disruptions of external input have a much greater effect during the critical period than at other times and these effects can be irreversible. In the primary visual cortex (V1) of kittens, prolonged closure of one eyelid shifts V1 neuron responsiveness toward the open eye (Wiesel and Hubel, 1963). This effect is largely absent in adult cats. Since then, this shift in ocular dominance in juvenile mammals has become the most widely studied instance of a critical period. Subsequently, critical periods have been found in many cortical regions and sensory modalities, such as the somatosensory (Fox, 2002) and auditory systems (Barkat et al., 2011; Yang et al., 2012).

Development is typically a set of processes influencing both behavioral and biological characteristics which occur sequentially (Michel and Tyler, 2005). It is interesting to note that there seems to be a sequential hierarchical structure to the order in which different critical periods occur. In somatosensory cortex, restricted critical periods for thalamocortical and then corticocortical synapse connectivity and maturation occur in a regulated layer-specific sequence (Fox, 2002; Feldmeyer et al., 2013). In the visual cortex, layer IV receives subcortical input, which is subsequently processed in both superficial and deep layers. The critical period for ocular dominance in these cortical layers lasts longer than that of inputs to layer IV (Daw et al., 1992). This may explain why there seems to be a lack of clearly defined critical periods for higher order functions involving sensory cortical networks spread across different layers. In the visual condition amblyopia (lazy eye), treatment is most effective in young children, but it can also still be treated in adults (Polat et al., 2004). This phenomenon whereby sensory plasticity underlying acquired behaviors can occur in the adult nervous system, albeit at a less effective level, also applies to the auditory system. For example, in congenitally deaf children, cochlear implants are most effective when treatment starts at an early age. The earlier the implantation, the more likely these children are to develop spoken language (Nicholas and Geers, 2007). Children who receive cochlear implants after the age of seven do not develop normal cortical responses to auditory stimuli (Sharma et al., 2009). However, there is no clear cut-off when cochlear implantation ceases to be useful, as implantation after this age does improve hearing (Harrison et al., 2005). 
Similarly, although second-language acquisition is most effective when started before the age of 4, adults retain the ability to learn new languages, albeit less fluently (Werker and Tees, 2005). Furthermore, musicians who start musical training before age 7 on average ultimately perform better than those who start training at a later age (Penhune, 2011), but learning to play music is still possible during adulthood. Thus, developmental time-frames for plasticity exist at both the synaptic and behavioral levels within which the greatest periods of phenotypic change occur and where lack of sensory experience has the most significant effects. These timeframes are commonly referred to as "critical" periods when investigating mechanisms of synaptic and molecular changes. They are also referred to as "sensitive" periods for many behaviors, although the distinction of usage and the exact ending of these periods is not always clear-cut (Johnson, 2005; Michel and Tyler, 2005). Here, we use the term "critical period" to refer to both synaptic and behavioral phenotypes that occur during documented neurotypical developmental stages.

At the level of the synapse, development and formation of functional connections during neurotypical maturation follows an established sequence: initial axonal and dendritic outgrowth, excess formation of immature long thin filopodia-like spines and subsequent pruning of synaptic contacts accompanied by an activity-induced maturation of remaining synapses (Katz and Shatz, 1996; Ethell and Pasquale, 2005; West and Greenberg, 2011). Whilst synapse remodelling is a lifelong process (Holtmaat et al., 2005; Grillo et al., 2013), the peak of synapse development and synaptic connectivity is predominantly established during early postnatal periods in vertebrates (Pan and Gan, 2008). For primary sensory cortices, the network is shaped by sensory input during the critical period coinciding with a high level of synaptic and neuronal remodelling. Thus, during neurotypical development, critical periods for the greatest changes in synaptic circuits in the brain and behavior are defined when the system is most susceptible to change. As such, plasticity of specific phenotypes is heightened relative to earlier or later developmental stages.

## **MOLECULAR PATHWAYS INVOLVED IN MONOGENIC NDD CONVERGE ON SYNAPSE FUNCTION**

Aberrant spine morphology is characteristic for individuals with NDDs as post-mortem studies report an abundance of immature, long thin spines and in some cases, altered spine density (Kaufmann and Moser, 2000; Ramakers, 2002; He and Portera-Cailliau, 2013; Maynard and Stein, 2012). Morphological aberrations also occur in non-syndromic ID where dendritic spine impairments correlated with age and severity of developmental disability (Purpura, 1974; Ramakers, 2002). Thus, a body of evidence from human post-mortem studies indicates a strong correlation between altered structural development of synapses and NDDs.

Initial stages of synapse formation and neuronal connectivity require modulation of the cytoskeletal F-actin via the Ras homologue subfamily of Rho GTPases. Many genes underlying monogenic NDDs interact directly with Rho signaling protein pathways (**Figure 1**; Ramakers, 2002; Ethell and Pasquale, 2005). This family of small-GTPases includes ras homolog gene family, member A (RhoA), ras-related C3 botulinum toxin substrate (Rac1) and cell division cycle 42 (Cdc42), which dynamically regulate protrusion and retraction of spines via cytoskeletal actin remodelling (Tashiro et al., 2000; Ethell and Pasquale, 2005). Small guanosine-5′-triphosphate hydrolyzing enzymes (GTPases) typically cycle between active GTP-bound and inactive guanosine diphosphate (GDP)-bound states. These transitions are dynamically regulated by GTPase activating proteins (GAPs), guanine nucleotide exchange factors (GEFs), and by GDP dissociation inhibitors (GDIs), which inhibit the conversion to the active GTP-bound state (Sasaki and Takai, 1998). In syndromic OPHN1 ID, changes in spine morphology are caused by the absence of OPHN1, a RhoA-GAP (Govek et al., 2004). The ID Williams syndrome is linked to the LIM domain kinase 1 (LIMK1) gene, whose product mediates changes in actin and spine morphology via Cdc42 and Rac1 pathways (Edwards et al., 1999). Additionally, LIMK1 interacts with P21-activated kinases (PAKs) which also harbor mutations in many nonsyndromic human ID cases (Allen and Walsh, 1999). Rho family members are activated by extracellular stimuli via growth factors and neurotransmitter release. Brain-derived neurotrophic factor (BDNF), involved in synaptic maturation, activates Rac/RhoA-GEF proteins via TrkB tyrosine kinase (TrkB) receptors and induces spine head growth (Hale et al., 2011).
During synaptic activation, glutamatergic transmission activates 2-amino-3-(3-hydroxy-5-methyl-isoxazol-4-yl) propanoic acid receptor (AMPA) and N-Methyl-D-aspartic acid or N-Methyl-D-aspartate (NMDA) receptors and subsequently activates Rho proteins (Sin et al., 2002). Therefore, activity of the Rho proteins is sensitive to synaptic transmission and can regulate activity-induced maturation of the synapse. Synaptic maturation requires structurally modifying the synapse via cell-adhesion proteins including CNTNAP2, neuroligins 3 and 4, neurexin1, *δ*-catenin and associated Shank and Homer proteins, which are frequently implicated in ASD (Tu et al., 1999; Sala et al., 2001; Jamain et al., 2003; Sudhof, 2008; Matter et al., 2009; Anderson et al., 2012). These proteins ensure proper synapse formation by bridging the pre- and postsynaptic sites, acting as a scaffold and stabilizing the cytoskeleton of the synapse (Kosik et al., 2005; Takeichi and Abe, 2005; Penagarikano and Geschwind, 2012). Since the Rho signaling pathways and synapse-spanning complexes are enriched with NDD-related proteins, they provide a direct link between NDDs and aberrant synapse development.

**FIGURE 1 | Several NDD-associated genes function at the synapse**. Monogenic NDD genes (red) expressed in the synapse, illustrated here postsynaptically, mediate spine morphology changes via small GTPase-mediated signaling pathways and F-actin in response to synaptic activation via BDNF and glutamatergic excitation. Abbreviations: **4EBP1**, eukaryotic translation initiation factor 4E-binding protein 1; **AMPA**, 2-amino-3-(3-hydroxy-5-methyl-isoxazol-4-yl) propanoic acid receptor; **Cdc42**, cell division cycle 42; **CYFIP**, cytoplasmic binding partner of fragile X protein; **eIF4E**, eukaryotic translation initiation factor 4E; **FMRP**, fragile X mental retardation protein; **LimK1**, LIM domain kinase 1; **mGluR5**, metabotropic glutamate receptor subunit 5; **mTOR1**, mammalian target of rapamycin 1; **NMDA**, N-Methyl-D-aspartic acid or N-Methyl-D-aspartate Receptor; **OPHN1**, oligophrenin-1; **PAK**, P21-activated kinase; **Rac1**, ras-related C3 botulinum toxin substrate; **Rheb**, Ras homolog enriched in brain; **RhoA**, ras homolog gene family, member A; **TrkB**, TrkB tyrosine kinase; **Tsc 1**, tuberous sclerosis protein 1; **Tsc 2**, tuberous sclerosis protein 2.

In addition to direct modulation of the cytoskeleton, many NDD-related proteins are regulators of gene transcription, mRNA translation and ultimately protein synthesis (Nan et al., 1997; Bagni and Greenough, 2005; Kelleher and Bear, 2008; Guy et al., 2011). NMDA-dependent, metabotropic glutamate receptor (mGluR)-dependent and BDNF-induced synaptic plasticity mechanisms depend on protein synthesis via the Ras-mitogen-activated protein kinase (Ras-MAPK) pathway and directly or indirectly modulate TSC 1/2 complex activity (Sweatt, 2004; Banko et al., 2006; Gong and Tang, 2006; Kelleher and Bear, 2008). Misregulation of mRNA translation, particularly for synaptic proteins, is proposed to underlie many "synaptopathies", with impairment or loss of fragile X mental retardation protein (FMRP), TSC 1/2, ubiquitin-protein ligase E3A (UBE3A) and eukaryotic translation initiation factor 4E (eIF4E) all causing altered protein synthesis (Auerbach et al., 2011; Zoghbi and Bear, 2012; Santini et al., 2013). Furthermore, altered transcriptional regulation via methyl CpG binding protein 2 (MECP2) is linked to prominent impairments in Rett syndrome (Guy et al., 2011). Thus, the effects of many NDD-linked genes occur at the level of spine morphology, synapse function and regulation of local protein synthesis in the developing and adult mammalian brain.

## **TEMPORAL SYNAPTIC PHENOTYPES AND CRITICAL PERIODS IN NDD MOUSE MODELS**

Across different NDD mouse models, studies consistently report an abundance of thin immature filopodia-like spines and small spine heads (Meng et al., 2002; Galvez and Greenough, 2005; Cruz-Martin et al., 2010; Maynard and Stein, 2012; Powell et al., 2012) and/or an altered spine density (Dolen et al., 2007; Meikle et al., 2007; Yashiro et al., 2009; Sato and Stryker, 2010; Powell et al., 2012). In many models, alterations in synaptic phenotypes are reported at only one developmental stage, often corresponding either to adult symptomatic stages or to a period of 2–3 weeks postnatal age during which extensive refinement and plasticity of synapses occur in the rodent brain. However, data derived from longitudinal studies support the notion of developmentally regulated and transient phenotypes in NDD models.

In typically developing somatosensory cortex, spine morphology changes greatly between postnatal weeks 1–4, shifting from a high proportion of transient, thin "immature" spines to more mature, long-lasting stubby spines (Ethell and Pasquale, 2005). However, in Fmr1-KO mice, this transition is delayed at 2 weeks of age (Cruz-Martin et al., 2010) but both spine morphology and dynamic turnover are normalized around one month of age (Nimchinsky et al., 2001; Cruz-Martin et al., 2010). Intriguingly, the immature spine phenotype reappears in adult Fmr1-KO mice (Galvez and Greenough, 2005), similar to the pattern of transient changes in spine morphology observed in the Down syndrome cell adhesion molecule knockout (DSCAM-KO) mouse model for Down syndrome (Maynard and Stein, 2012). Critical periods in the somatosensory cortex occur in a sequential pattern, from subcortical to later cortico-cortical changes (Fox, 2002; Feldmeyer et al., 2013). Transient phenotypes are also observed in thalamocortical pathways: in Fmr1-KO mice, enhanced NMDA/AMPA synaptic ratios and altered plasticity are present during the first postnatal week but not by the end of the second, indicating developmental delays within the neurotypical critical period for this pathway (Harlow et al., 2010). In contrast, premature maturation of thalamocortical NMDA/AMPA ratios and plasticity occurs in mice heterozygous for SynGap1, a Ras GTPase-activating protein implicated in ID and ASD, but this also normalizes at the end of the first postnatal week (Clement et al., 2013). During the second postnatal week, after the cessation of thalamocortical plasticity, decreased connectivity strength and diffuse axonal branching occur in cortical circuits between layers 4 and 2/3 of Fmr1-KO mice. Again, these deviations from neurotypical development are restricted and normalize one week later (Bureau et al., 2008).
Thus, in somatosensory cortex, many transient changes occur during established critical periods for particular synaptic pathways. Such transient NDD phenotypes are not limited to sensory cortex but also occur in other brain regions including medial prefrontal cortex (Testa-Silva et al., 2012), amygdala (Vislay et al., 2013) and olfactory epithelium (Palmer et al., 2008).

In addition to aberrations in critical periods for synapse and circuit formation, dysregulated synaptic phenotypes occur during critical periods for adaptation to sensory deprivation. Ocular dominance and experience-dependent plasticity mechanisms in response to monocular deprivation (MD) are well documented for the mouse visual cortex and occur during a restricted postnatal period. In Fmr1-KO mice, a short MD period induced a significantly smaller reduction in response in the deprived cortex and an enhanced potentiation of input from the open eye compared to wildtype (WT) mice (Dolen et al., 2007). A lack of plasticity in the deprived cortex after MD was also observed in m-UBE3A-KO mice, a model for Angelman syndrome in which the maternal gene copy is lacking (Yashiro et al., 2009; Sato and Stryker, 2010). This effect was not due to a developmental shift in the critical period for m-UBE3A-KO mice, since no change in response to MD was observed whether the deprivation occurred before, during or after the neurotypical critical period (Sato and Stryker, 2010).

The closure of the critical period for ocular dominance can be manipulated by changes in inhibition or by sensory deprivation through rearing mice in the dark (Hensch, 2005). In heterozygous MECP2-KO female mice, ocular dominance plasticity in response to MD could be induced far beyond the neurotypical critical period into young adulthood, suggestive of a lack of maturation and normal closure of this plasticity mechanism (Tropea et al., 2009). Early synaptic development of the visual system in MECP2 null mice appears normal up to P21 but is followed by later impairments of retinogeniculate synapses (Noutel et al., 2011), increased cortical inhibition and ultimately, impaired visual acuity (Durand et al., 2012). These later developmentally regulated changes in the MECP2 mouse model reflect the protein's proposed role in synaptic maintenance during adult stages (Guy et al., 2007; Robinson et al., 2012) similar to late postnatal onset of impairments in the Cri-du-Chat mouse model (Matter et al., 2009) but in contrast to other NDD models displaying earlier synaptic phenotypic impairments.

What are the consequences of a dysregulated synaptic phenotype or altered critical period in the developing brain? During retinotopic map development, disruption of synaptic activity during an early critical period alters later neuronal connectivity within the visual system. Desynchronization of early retinal waves of neuronal activity in mouse pups lacking the *β*2-nicotinic acetylcholine receptor subunit is a transient phenotype restricted to the first but not second postnatal week of development. This altered activity results in an impaired fine-scale refinement of retinal axons in the brainstem (Grubb et al., 2003; Mclaughlin et al., 2003), altered geniculocortical projections (Cang et al., 2005) and a decrease in visual acuity at the cortical level (Rossi et al., 2001). Therefore, disruption or loss of an early critical period can influence both functional and structural connectivity not only in the affected region but in other areas of the sensory processing system and result in altered sensory perception. Applying this principle to NDDs, early or transient alterations in synaptic phenotypes during known critical periods could account for later aberrations in synaptic function, morphology and potentially even behavioral impairments of sensory information processing that characterize many of these disorders.

## **NEURAL CONNECTIVITY AND EXCITATION-INHIBITION BALANCE IN NDDs**

Abnormalities in connectivity of excitatory and inhibitory neurons in NDDs are documented at many different levels from whole-brain functional imaging studies to electron microscopic changes in synaptic morphology (Kaufmann and Moser, 2000; Belmonte et al., 2004; Belmonte and Bourgeron, 2006; Dinstein et al., 2011). Dysregulation of excitatory/inhibitory (E/I) balance is proposed to impair neural processing and underlie cognitive deficits in many ID and autistic syndromes (Rubenstein and Merzenich, 2003). E/I balance is aberrant in many NDD mouse models: some show increased excitability [FXS: (Hays et al., 2011; Testa-Silva et al., 2012; Goncalves et al., 2013); TSC: (Bateup et al., 2013); ASD models: (Peca et al., 2011; Penagarikano et al., 2011; Clement et al., 2012)], whilst others show increased inhibition [Down syndrome: (Fernandez et al., 2007; Chakrabarti et al., 2010; Kleschevnikov et al., 2012); Rett syndrome: (Dani et al., 2005; Noutel et al., 2011; Durand et al., 2012), but see Calfa et al. (2011) and Kron et al. (2012)]. Thus dysregulation of either excitation or inhibition can disrupt the correct E/I balance in NDDs.

The interaction between E/I balance and development of synaptic networks during critical periods is likely a complex and finely tuned set of processes. In visual cortex, maturation of inhibition triggers critical period onset accompanied by regulation of excitatory synapse strength via activity-dependent mechanisms (Hensch, 2005). Thus both timing and synaptic maturation during critical periods depend upon a delicate interplay of both excitatory and inhibitory transmission and as such, are vulnerable to NDDs affecting E/I balance directly. An indirect effect of NDDs upon E/I balance could also arise if perturbations occur to delay or disrupt a critical period, thereby altering the correct development of synaptic connectivity. Given the sequential nature of synapse development from thalamocortical to sensory cortical regions, an early aberration affecting E/I balance during one critical period could give rise to impairments in a subsequent critical period of a cortical network. This may occur either directly via the same E/I critical period mechanism or as a consequence of, for example, impairments in the outgrowth of axonal projections from one synaptic network to the next.

A prevailing hypothesis in NDD research proposes a weakening of long-range projections in addition to a strengthening of local-range connectivity in the brain (Belmonte et al., 2004; Just et al., 2004). Local hyperconnectivity of excitatory networks in neocortex is observed in mouse models for FXS (Testa-Silva et al., 2012; Goncalves et al., 2013) and ASD (Rinaldi et al., 2008; Qiu et al., 2011) but Rett syndrome models show local hypoconnectivity (Dani et al., 2005). However, significantly less is known about long-range connectivity at the synaptic level in NDD mouse models or whether developmental trajectories are misregulated. It is likely that impairments in long-range projections in NDDs are not global but rather synapse-specific: alterations in long-range projections occur at cortical but not thalamic inputs to the lateral amygdala in a mouse model for Rett syndrome (Gambino et al., 2010) and in the ID associated gene il1rapl1 mouse model, thalamo-amygdala projections differ only on to principal cells but not interneurons (Houbaert et al., 2013). Furthermore, the period for normal synapse elimination and maturation of long-range projections to lateral amygdala occurred after 3 months of age, indicating that refinement of this synaptic pathway occurs relatively late in postnatal development and could potentially be disrupted by many other early critical period impairments (Gambino et al., 2010). Given the tightly regulated growth of the brain and sequential patterns of development from one synaptic network to another (Ben-Ari and Spitzer, 2010), we propose that long-range connectivity may be particularly vulnerable in NDDs, especially where the NDD-linked genes are strongly expressed at prenatal or early postnatal time-windows in brain development (Meredith et al., 2012). 
In a recent study, preliminary data indicated that infants at high risk for ASD had higher long-range functional connectivity than those at low ASD risk at 3 months of age but lower connectivity at 12 months (Keehn et al., 2013). Thus longitudinal studies of interregional projections in the brain could reveal whether the key NDD hypothesis of weakened long-range connectivity is specific to the mature brain or applies also to early developmental stages, and how early brain connectivity relates to the onset of NDD symptoms.

## **MECHANISMS UNDERLYING CRITICAL PERIODS AND NDDs**

The existence of sensitive time-windows for the manifestation of symptoms in animal models of neurological and neuropsychiatric disorders has recently been proposed (Leblanc and Fagiolini, 2011; Marco et al., 2011; Martin and Huntsman, 2012; Meredith et al., 2012). Here, we hypothesize that the concept of critical or sensitive periods can be applied to underlying mechanisms of NDDs in two ways.

First, the underlying pathology of NDDs could arise through aberrant interactions during existing critical period mechanisms that are in place during neurotypical development (**Figures 2A, C**). For example, both ocular dominance plasticity and mapping of frequency representation during their respective critical periods are impaired in the Fmr1-KO mouse but can be restored by reduction of metabotropic glutamate receptor subunit 5 (mGluR5) expression or pharmacological blockade (Dolen et al., 2007; Kim et al., 2013). The Fmr1 gene product, FMRP, is activated following mGluR5 stimulation and regulates synaptic mRNA translation (Weiler et al., 1997), and mGluR5 activation is necessary for certain types of synaptic plasticity (Huber et al., 2000; Raymond et al., 2000). Attenuation of mGluR5 signaling dysregulates both experience-dependent NMDA receptor expression and synaptic plasticity in young and adult visual cortex, respectively (Tsanov and Manahan-Vaughan, 2009). Therefore, the absence of FMRP in FXS affects the level of synaptic plasticity via mGluR5-mediated signaling dysregulation, which in turn affects the level of response during the critical period for ocular dominance.

The timing aspects of known critical periods in NDDs could also be affected via GABAergic inhibition. GABAergic inhibition is significantly altered in many NDDs (Rubenstein and Merzenich, 2003; Chattopadhyaya and Cristo, 2012). Intact GABAergic inhibition is necessary for the critical period for ocular dominance to occur: KO mice lacking the 65 kD isoform of the GABA production protein glutamate decarboxylase (GAD65) have impaired GABA function and do not show a normal critical period for ocular dominance (Hensch et al., 1998). The critical period can be induced experimentally by pharmacologically increasing GABA<sub>A</sub> receptor function (Hensch et al., 1998; Fagiolini et al., 2004). This opening of the critical period can be achieved independently of the age of the mice, indicating that adequate GABAergic signaling is necessary for the critical period to occur, while other mechanisms that act during the critical period are already in place. Thus, an alteration in GABAergic inhibition during brain development in NDDs can lead indirectly to perturbations in the timing of critical periods.

The second concept to link NDDs and critical periods during development is that the expression profile of the gene underlying an NDD may in itself constitute a critical period during which the effects of the NDD are manifest (**Figure 2B**). This deviates slightly from the general definition of a critical period, as it does not necessarily pertain to external stimuli affecting network development. In this model, upregulation of a gene at a particular time is necessary for the network to develop normally. It is therefore a critical period in the sense that expression of the gene is necessary during a particular time-frame. This has been shown in a *Drosophila* model for FXS, where reintroduction of the *Drosophila* homologue of FMRP (dFMRP) in the knock-out model rescues certain aspects of synaptic morphology only during a 2-day time-window, but not during earlier development or later in the adult (Gatto and Broadie, 2009).

## **TEMPORALLY DYSREGULATED GENE EXPRESSION UNDERLYING NEURODEVELOPMENTAL BRAIN DISORDERS**

Gene expression is a dynamic process throughout life and is tightly regulated on both spatial and temporal dimensions. The transcriptome, the collective expression of multiple genes, differs significantly in a tissue-specific and brain region-specific pattern across both cortical and subcortical structures in mammals (Allen Brain Atlas,<sup>1</sup> Hawrylycz et al., 2012). Transcriptomic profiles reveal distinct layer-specific and non-layer-specific expression patterns for many thousands of genes in the sensory neocortex of adult mouse (Belgard et al., 2011). Similarly, robust genetic signatures for individual cortical layers and also specific brain regions are found in both human and non-human primates, with greater similarity in lamination between primate species than to rodents (Belgard et al., 2011; Bernard et al., 2012).

Given the protracted development of human brain over many years, it is not surprising that the spatial transcriptome varies considerably over time: in humans, more than 90% of detected genes in the brain are differentially regulated in a spatio-temporal manner from embryonic through to geriatric periods (Kang et al., 2011). The greatest changes in regional gene expression occur during prenatal and early postnatal periods (Colantuoni et al., 2011; Kang et al., 2011). In the mouse brain, cohorts of genes are differentially expressed in the subplate at specific developmental stages from late embryonic through to early and late postnatal periods (Hoerder-Suabedissen et al., 2013). Thus, the transcriptome is tightly regulated in the neurotypical mammalian brain and reveals both restricted expression windows and developmentally changing gradients of gene expression.

<sup>1</sup>www.brain-map.org

The developmental regulation of spatial patterns of individual gene expression in the neurotypical brain includes many known NDD candidate genes for monogenic syndromes (Allen Brain Developing Human and Mouse Brain Atlas<sup>2</sup>). Of interest, many genes linked to ASD show dynamic changes in expression in subplate layers of the mouse cortex, suggesting disruption of early developmentally regulated NDD candidates (Hoerder-Suabedissen et al., 2013). However, the direct functional effects of these gene changes are not yet known. Prominent genes underlying ID and ASD, including Fmr1, neurofibromin (NF1) and TSC 1/2, show strong developmental mRNA upregulation, particularly from late embryonic stages onwards (**Figure 3**). For Fmr1, this upregulation is transient, peaking between postnatal days (P) 4 and 14 in telencephalic and thalamic defined regions before decreasing by P28 (**Figure 3**). Given that transient phenotypes in thalamocortical and cortico-cortical synaptic pathways occur in the Fmr1-KO mouse model, it is plausible that these temporal impairments only arise during periods of peak expression for the Fmr1 gene. That is to say, irregularities in an NDD only result in a phenotype at the time when peak expression of the NDD gene would usually occur in neurotypical development. No synaptic NDD phenotype is observed if the gene is not prominently expressed in that brain region at that stage and, as such, there is no noticeable impairment in the KO mouse model at that stage.

Exome sequencing of many hundreds of families with individuals affected by ID and ASD reveals a high genetic heterogeneity and many *de novo* mutations (Neale et al., 2012; O'Roak et al., 2012; Sanders et al., 2012). Whilst changes in individual gene expression can be tracked throughout development of the brain, much insight can be gained from groupings of genes based on cell-type expression, synaptic location, similar cellular functions, or spatio-temporal expression patterns (Ruano et al., 2010; Kang et al., 2011; Hawrylycz et al., 2012; Lips et al., 2012). Clustering genes into such modules proves extremely useful for genetically heterogeneous disorders, such as ASD and ID, where individual genes explain, at best, a few percent of cases (Manolio et al., 2009; Ruano et al., 2010; Voineagu et al., 2011). In autistic brain samples, grouping many genes in network modules based on differential expression patterns revealed a downregulation in specific networks related to synaptic function. Additionally, gene networks for astrocytic/microglia function and immune function were enriched relative to neurotypical age-matched brain (Voineagu et al., 2011).
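The idea of grouping genes into modules by shared temporal expression pattern can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the gene names, time points and expression values are invented and are not drawn from any cited dataset, and the correlation threshold and greedy grouping rule are far simpler than the network methods used in the studies cited above.

```python
from math import sqrt

# Hypothetical expression values (arbitrary units) at five developmental
# time points; gene names and numbers are invented for illustration only.
EXPRESSION = {
    "geneA": [1.0, 3.0, 6.0, 3.0, 1.0],  # transient early peak
    "geneB": [0.5, 2.0, 4.5, 2.5, 0.8],  # transient early peak
    "geneC": [1.0, 2.0, 3.0, 4.0, 5.0],  # steady increase
    "geneD": [0.2, 0.9, 2.1, 3.0, 4.2],  # steady increase
}


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def group_into_modules(expression, threshold=0.9):
    """Greedily assign each gene to the first module whose seed gene it
    correlates with at or above `threshold`; otherwise start a new module."""
    modules = []
    for gene, profile in expression.items():
        for module in modules:
            seed = module[0]
            if pearson(profile, expression[seed]) >= threshold:
                module.append(gene)
                break
        else:
            # No existing module matched, so this gene seeds a new one.
            modules.append([gene])
    return modules


modules = group_into_modules(EXPRESSION)
for index, members in enumerate(modules, start=1):
    print(f"module {index}: {members}")
```

With these invented profiles, the two transiently peaking genes fall into one module and the two steadily increasing genes into another. Changes in expression can then be assessed per module instead of per gene.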

<sup>2</sup>https://molnar.dpag.ox.ac.uk/subplate

Many NDD gene products regulate expression of many other target genes and orchestrate a cascade of signaling proteins. In FXS, FMRP regulates over 800 mRNA targets (Brown et al., 2001; Vanderklish and Edelman, 2005; Darnell et al., 2011) and alters expression of many different synaptic proteins (Adusei et al., 2010; Klemmer et al., 2011). These FMRP targets are regulated in common throughout the nervous system (Ascano et al., 2012), occur both pre- and postsynaptically and can be grouped according to broad biological functions (Darnell et al., 2011). Thus, for complex disorders, a gene clustering approach based on differential expression patterns is likely to yield many new targets and therefore insights into the mechanistic basis of these NDD syndromes.

To date, much emphasis has been placed upon the individual signaling pathways dysregulated in specific monogenic NDDs. However, it is apparent that there may be key "hubs" that act as common points of dysregulation within the many signaling pathways in ID and ASD (Bill and Geschwind, 2009; Sakai et al., 2011; Voineagu et al., 2011; Zoghbi and Bear, 2012). Shared pathophysiological signaling pathways are of importance for rescue strategies of synaptic function, protein synthesis and behavioral impairments in mouse models of FXS, TSC and neuroligin-3 (Auerbach et al., 2011; Baudouin et al., 2012). The heterogeneity of NDDs such as ID and ASD makes it difficult for both researchers and the pharmaceutical industry to propose unifying mechanisms that underlie these disorders and, importantly, to find viable therapeutic targets. Clinical testing of multiple targets specific for each syndrome is costly both in time and money. Identification of "hub" NDD genes or their key targets with high expression relatively early in development could provide a new therapeutic angle to intervene in particular NDDs. This approach is by no means straightforward and, given the sequential development of critical periods in different brain regions, it would be difficult to restrict therapeutic actions to specific synaptic pathways. However, the current testing of mGluR5 inverse agonists in phase II and III clinical trials for cognitive and behavioral phenotypes in FXS is being extended to younger children (Levenga et al., 2010<sup>4</sup>). Whether developmental age in clinical trials affects outcome is not known, but in the Fmr1-KO mouse model, a greater effect of mGluR5 blockade was observed upon rescue of spine morphology in young compared to old neurons (Su et al., 2011). Furthermore, these findings will have implications for other NDDs with potential for early developmental dysregulation of mGluR5 signaling (Zoghbi and Bear, 2012).

<sup>3</sup>http://developingmouse.brain-map.org

<sup>4</sup>www.fraxa.org

## **TESTABLE HYPOTHESES FOR VALIDATION IN NDDs**

On the bases outlined above, we propose three testable hypotheses (**Box 1**) to guide further investigation into neurobiological mechanisms for pathology of NDDs:

During development of sensory systems in the neurotypical brain, critical periods occur in a sequential pattern from brainstem, to thalamus to cortical regions as synapses form, refine and mature. Given that critical periods at thalamocortical and corticocortical synaptic pathways are affected in NDDs, we propose that dysregulation of synaptic pathways occurs at the subcortical level in NDDs at earlier stages than are currently known, during "presymptomatic" stages **(Hypothesis 1)**. For human NDD syndromes, this could point towards prenatal and early neonatal changes in brain formation and function at stages not currently tested in the clinic. The implications of abnormalities in brain activity at such early developmental stages would be significant initially for detection and screening for NDDs in the fetus or newborn baby and raise possibilities for therapeutic interventions, technological challenges notwithstanding. It may also challenge the notion of the point up to which a child is considered to be presymptomatic, if changes in brain activity are found at increasingly younger developmental stages.

Many NDD genes exhibit prominent expression in subcortical brain regions as well as in more commonly studied cortical circuitry (Allen Brain Atlas<sup>5</sup>). Building on the observations of sequential disrupted critical periods in NDDs, we postulate that in sensory circuits of an NDD, dysregulation of a critical period in subcortical regions such as the brainstem precedes and consequently disrupts subsequent critical periods in thalamus and then cortex (**Hypothesis 2**). Thus, dysregulation and potential developmental delay for one known critical period would have a knock-on effect for synaptic circuits regulated at later timepoints at downstream synaptic pathways. Little is known regarding subcortical brain regions in NDD mouse models. However, alteration of GABAergic transmission and reduction of GABA-A receptor subunits is reported at postnatal day 7 in ventrolateral brainstem of MECP2 KO mice (Medrihan et al., 2008). Constitutive knock-out mouse models for genetic NDDs are valid experimental tools to detect such early changes; however, conditional knockout models, in which gene expression can be temporally controlled in specific cell types, would better enable proof of a causal relation between a disrupted critical period in subcortical regions and later cortical impairments. Combining knowledge of the critical periods for specific mouse brain regions in neurotypical development with the temporal expression profile of genes implicated in NDDs can guide the spatial and temporal parameters for designing these experiments.

Observations in mouse models of genetic NDD syndromes demonstrate that alterations in synaptic networks occur during early brain development. Taking the Fmr1-KO mouse model, for example, reported thalamocortical and cortico-cortical synaptic impairments correlate with FMRP expression that occurs in the normally developing brain (Harlow et al., 2010; Meredith et al., 2012). Although it may be purely coincidental that synaptic impairments in an NDD model co-occur with the normal time period for peak expression of that NDD gene, we believe these are directly linked and that the most prominent phenotypic impairments first occur during the period when the gene would be normally activated and most strongly expressed in the brain. Therefore, we propose that no differences in synaptic networks or critical periods in NDDs occur prior to the neurotypical pre- or postnatal expression of the NDD gene in that brain region **(Hypothesis 3)**. Thus, a gene with limited postnatal expression in the brain would not give rise to aberrant prenatal synaptic phenotypes since the gene is not normally activated in cells prior to birth. One upshot of this idea is that discovery of prenatal expression patterns of a gene implicated in NDDs may not only lead to detection of prenatal synaptic phenotypes but also highlight additional previously unknown functions of a gene during early developmental stages of the nervous system.

<sup>5</sup>http://www.brain-map.org/

## **COMPENSATORY MECHANISMS IN SYNAPTIC NETWORKS AND BEHAVIORAL PROCESSING**

Alterations in activity levels during early neuronal network development lead to remodelling and compensatory changes in synaptic strength, a phenomenon known as homeostatic plasticity (Turrigiano and Nelson, 2004). This plasticity mechanism enables a network to regulate its synaptic activity in response to the dynamics of the local environment changed by both intrinsic factors and external stimuli, such as sensory input during early postnatal periods (Marder and Goaillard, 2006). Lack of MeCP2, or loss-of-function mutations in it, disrupts homeostatic network plasticity in both developing cortex (Blackman et al., 2012) and hippocampal cultures (Qiu et al., 2012). Further, lack of FMRP disrupts one specific type of homeostatic plasticity dependent upon retinoic acid and protein synthesis in developing hippocampal networks (Soden and Chen, 2010). Thus, later symptomatic changes in brain networks in some NDDs could arise indirectly from impairments in network homeostasis rather than direct synaptic effects of the NDD protein itself.

The transience of synaptic impairments observed during sensitive time-windows (Meredith et al., 2012) could also be influenced by network compensation mechanisms acting to normalize synaptic phenotypes through homeostatic plasticity at that particular developmental stage. Although many NDD target proteins may play a key "hub" role in regulating transcription and translation processes in the cell or signaling at the synapse (Bill and Geschwind, 2009; Zoghbi and Bear, 2012), they are not the sole regulators, and residual function is likely to be mediated by additional candidates within a synaptic network. Indeed, the initial delays but not absences of key synaptic phenotypes observed in many NDDs (measured against the already known "developmental checkpoints", Ben-Ari and Spitzer, 2010) could be due to the extra time necessary for compensatory mechanisms to regulate and support the network, taking over residual functions not provided by the (missing) NDD gene.

Compensatory mechanisms may also operate during developmental stages of NDDs at the level of systems processing and behavior (Johnson, 2012). In an imaging study of young children with diagnosed ASD, fMRI revealed significant differences in brain activation patterns compared with neurotypical age-matched children during a simple motion perception task (Kaiser et al., 2010). However, more interestingly, the unaffected siblings of ASD participants with shared genes and an increased risk for later developing ASD showed significantly different activation patterns to both their siblings and neurotypical controls during the task. Increased activation occurred in ventromedial prefrontal cortex and right posterior superior temporal sulcus, two regions associated with motion processing and general executive function skills (Bechara et al., 2000). These neuro-"endophenotypes", characteristics reflecting susceptibility for a genetic disorder not manifesting as a clinically defined phenotype, could reflect compensatory processing in the brains of those individuals with higher genetic risks for NDDs but not sufficient alterations to warrant a diagnosis.

In conclusion, establishing the mechanisms that underlie early time windows for aberrations in synaptic circuits and impaired behavioral development in NDDs has the potential to reveal new approaches for pharmacotherapeutic correction of brain activity during early development or even new neurobiological gene targets (Levenga et al., 2010; Meredith et al., 2012). Furthermore, we believe this approach outlined in a set of testable hypotheses may reveal dysregulation of brain activity and neuronal circuit formation at significantly earlier presymptomatic stages in nervous system development than previously thought in both syndromic and nonsyndromic neurodevelopmental brain disorders.

#### **ACKNOWLEDGMENTS**

Our research is financially supported by NWO ZonMW VIDI grant (#917.10.372), Fondation Jérôme LeJeune and Hersenstichting NL to Rhiannon M. Meredith. Martijn C. Sierksma is supported by the Erasmus Life Long Learning Program of the European Commission. We thank Ioannis Kramvis and Tinca Polderman for their comments on an earlier draft of the manuscript.

#### **REFERENCES**


metabotropic glutamate receptor-dependent long-term depression. *J. Neurosci.* 26, 2167–2173. doi: 10.1523/jneurosci.5196-05.2006


opening. *Proc. Natl. Acad. Sci. U S A* 106, 15049–15054. doi: 10.1073/pnas.0907660106


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 May 2013; accepted: 15 October 2013; published online: 31 October 2013. Citation: Kroon T, Sierksma MC and Meredith RM (2013) Investigating mechanisms underlying neurodevelopmental phenotypes of autistic and intellectual disability disorders: a perspective. Front. Syst. Neurosci. 7:75. doi: 10.3389/fnsys.2013.00075 This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2013 Kroon, Sierksma and Meredith. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Musical expertise and foreign speech perception

#### *Eduardo Martínez-Montes<sup>1</sup>\*, Heivet Hernández-Pérez<sup>1</sup>, Julie Chobert<sup>2</sup>, Lisbet Morgado-Rodríguez<sup>1</sup>, Carlos Suárez-Murias<sup>1</sup>, Pedro A. Valdés-Sosa<sup>1</sup> and Mireille Besson<sup>1,2</sup>*

*<sup>1</sup> Cuban Neuroscience Center, Havana, Cuba*

*<sup>2</sup> Laboratoire de Neuroscience Cognitive, CNRS-Aix Marseille Université, Marseille, France*

#### *Edited by:*

*Virginia Penhune, Concordia University, Canada*

#### *Reviewed by:*

*Claude Alain, Rotman Research Institute, Canada Stefan A. Frisch, University of South Florida, USA*

#### *\*Correspondence:*

*Eduardo Martínez-Montes, Cuban Neuroscience Center, Ave. 25, esq 158, #15202, Cubanacán, P.O. Box: 6414, Havana, Cuba e-mail: eduardo@cneuro.edu.cu*

The aim of this experiment was to investigate the influence of musical expertise on the automatic perception of foreign syllables and harmonic sounds. Participants were Cuban students with a high level of expertise in music or in visual arts and with the same level of general education and socio-economic background. We used a multi-feature Mismatch Negativity (MMN) design with sequences of either syllables in Mandarin Chinese or harmonic sounds, both comprising deviants in pitch contour, duration and Voice Onset Time (VOT) or equivalent that were either far from (Large deviants) or close to (Small deviants) the standard. For both Mandarin syllables and harmonic sounds, results were clear-cut in showing larger MMNs to pitch contour deviants in musicians than in visual artists. Results were less clear for duration and VOT deviants, possibly because of the specific characteristics of the stimuli. Results are interpreted as reflecting similar processing of pitch contour in speech and non-speech sounds. The implications of these results for understanding the influence of intense musical training from childhood to adulthood and of genetic predispositions for music on foreign language perception are discussed.

**Keywords: musical expertise, auditory perception, speech perception, foreign language, pitch, duration, Voice Onset Time, Mismatch Negativity**

#### **INTRODUCTION**

Normally-developing infants can learn any of the world languages as nicely stated by Patricia Kuhl: "children are born citizens of the world" (Kuhl, 2002). Unfortunately, this wonderful ability seems to quickly disappear as shown in an elegant experiment by Cheour et al. (1998). These authors used the well-known Mismatch Negativity (MMN) paradigm (Näätänen et al., 1978) to test for the idea of a sensitive period in phoneme perception. Results showed that by one year of age, Finnish phonemes had acquired a special status for Finnish children compared to a phoneme (in Estonian) that did not belong to the Finnish phoneme repertory. These results are clear evidence in favor of an early critical period for phoneme acquisition.

Nevertheless, humans can learn new languages at any time in life even if factors like the starting age of learning (e.g., Flege et al., 1995; Birdsong, 2005, 2006), the amount of knowledge in the native language (e.g., Flege and MacKay, 2004) and the proximity between native (L1) and second language (L2) phonetic inventory (e.g., Flege, 1995; Best et al., 2001) are known to influence learning efficiency (e.g., Golestani and Zatorre, 2009), together with extra-linguistic factors such as motivation (Moyer, 1999), working memory (Miyake and Friedman, 1998; Majerus et al., 2008) and attention control (Segalowitz, 1997; Guion and Pederson, 2007).

Of most interest for our purposes, musical expertise has also been shown to influence foreign language perception and production (Chobert and Besson, 2013). Slevc and Miyake (2006) tested Japanese adults immersed in their L2 (English) after the age of 11 and controlled for the age of first L2 exposure, working memory and level of L2 use. Results of correlation analyses showed that musical abilities were predictive of phonological abilities tested through the perception and production of the English /r/-/l/ contrast that does not belong to the Japanese phoneme repertory. Tervaniemi and collaborators (Milovanov et al., 2008, 2010) also reported that Finnish children and young adults with advanced musical skills had better English pronunciation abilities than those with less-advanced musical skills. Studying prosody perception and recording ERPs together with behavior, Marques et al. (2007) showed that French adult musicians perceived subtle pitch changes at the end of Portuguese sentences, a language that they did not understand, better than French non-musicians. The onset latency of the late positivity to pitch deviants was 300 ms earlier in musicians than in non-musicians. Similar results were reported by Deguchi et al. (2012) with Italian musicians and non-musicians presented with pitch changes in syntactically correct or jabberwocky Italian and French sentences. Taken together, these results suggest an interconnection between musical expertise and foreign language perception and production.

In most of the experiments aimed at testing the influence of musical expertise on foreign language perception, lexical tones have been used as stimuli presented to native and non-native tone language speakers (e.g., Gottfried and Riester, 2000; Alexander et al., 2005; Delogu et al., 2006, 2010; Gottfried, 2007; Wong et al., 2007; Lee and Hung, 2008; Marie et al., 2011a). At the behavioral level, Gottfried and Riester (2000) and Gottfried (2007) showed that English musicians unfamiliar with tone languages identified the four Mandarin tones better than non-musicians. Moreover, Lee and Hung (2008) reported that English musicians were more accurate than non-musicians in identifying intact syllables produced on the four Mandarin tones among syllables modified in pitch height or pitch contour.

At the subcortical level, Wong et al. (2007) recorded the brainstem Frequency Following Response (FFR) to the contour patterns of Mandarin tones in English amateur musicians and non-musicians who were unfamiliar with tone languages. They reported higher quality of linguistic pitch encoding in the auditory brainstem responses of musicians compared to non-musicians. More recently, Bidelman et al. (2011) presented an iterated rippled noise homologue of a lexical tone to English amateur musicians and non-musicians and to Mandarin Chinese speakers. Pitch-tracking accuracy and the strength of the FFR were larger in musicians and in Chinese speakers than in English non-musicians. Finally, Chandrasekaran et al. (2012) demonstrated that the relationship between the efficiency of inferior colliculus pitch representations (assessed by fMRI-Adaptation) and the quality of neural pitch pattern representations (assessed by auditory brainstem recordings) was stronger in musicians than in non-musicians.

At the cortical level, Chandrasekaran et al. (2009) showed that deviants in pitch contour homologous to Mandarin tones elicited larger MMNs in English musicians than in English non-musicians, thereby showing increased automatic processing of pitch variations in both music and speech in the musician group. Using an active discrimination task on sequences of Mandarin Chinese monosyllabic words, Marie et al. (2011a) recorded behavioral measures and ERPs from French musicians and non-musicians, unfamiliar with tone languages. Musicians detected both supra-segmental tonal and segmental (consonant, vowel) variations more accurately than non-musicians. Moreover, tonal variations were categorized faster by musicians than by non-musicians, as reflected by the shorter latency of the N2/N3 components (see also Fujioka et al., 2006; Moreno et al., 2009). Finally, the decision that tone and/or segmental variations were different was associated with larger P3b components (e.g., Duncan-Johnson and Donchin, 1977; Picton, 1992) in musicians than in non-musicians.

Taken together, studies of lexical tone perception by non-native listeners showed that musicians discriminated and/or identified segmental and supra-segmental linguistic contrasts in a foreign language better than non-musicians. Results also revealed more reliable encoding of linguistic pitch patterns at the subcortical level as well as enhanced MMNs to pitch contour deviants and enhanced discrimination and decision-related ERP components at the cortical level in musicians compared to non-musicians. These results demonstrate that long-term musical training not only facilitates the processing of unattended and attended harmonic and musical sounds but also impacts on the processing of speech sounds. These findings have been taken as evidence that some aspects of music and speech involve common processing mechanisms and transfer effects (see Kraus and Chandrasekaran, 2010 and Besson et al., 2011, for reviews).

While several studies have examined the influence of musical expertise on non-native tone perception, only one study has, to our knowledge, examined the influence of musical expertise on non-native vowel duration, a phonemic contrast that is linguistically relevant in quantity languages such as Finnish or Japanese. Sadakata and Sekiyama (2011) recently examined categorical perception of both supra-segmental moraic features in Japanese and segmental vowel variations in Dutch by presenting Dutch and Japanese musicians and non-musicians with discrimination and identification tests. The mora is defined as a perceptual temporal unit and is used by Japanese listeners to segment speech signals (Cutler and Otake, 1994). Results of the same/different discrimination task with pairs of Japanese (e.g., kanyo-kannyo) and Dutch words (e.g., kuch-kech), differing in morae or vowel respectively, showed that musicians, Dutch and Japanese, outperformed non-musicians in the discrimination of supra-segmental and segmental variations in their own language, as well as in the foreign language. Moreover, after learning, identification performance for the moraic feature (in a Japanese stop contrast) was higher in musicians (Japanese and Dutch) than in non-musicians. These results are important because they demonstrate that musical expertise not only influences phoneme perception and discrimination but also categorical perception. As such, they raise the interesting possibility that musical expertise enhances the ability to build reliable abstract phonological representations (e.g., Slevc and Miyake, 2006; Degé and Schwarzer, 2011; Ott et al., 2011).

Based on these results, the overall aim of the present study was to use the MMN to determine whether musical expertise influences the perception of non-native supra-segmental and segmental speech variations when participants are not required to focus attention on the phonemic contrasts of interest. We examined three types of phonemic contrasts: pitch contour, vowel duration and Voice Onset Time (VOT). VOT is a phonological parameter acoustically defined as the interval between the noise burst produced at consonant release and the onset of the waveform periodicity associated with vocal cord vibration (Lisker and Abramson, 1967). Changes in VOT typically allow one to perceive stop consonants as voiced (e.g., /b/) or voiceless (e.g., /p/). To our knowledge, no studies have yet investigated the automatic processing of vowel duration and VOT contrasts in a language unknown to the participants. However, two recent studies have investigated the automatic processing of these phonemic contrasts in the participants' native language. Chobert et al. (2011) tested 9-year-old children and found larger MMNs to vowel duration and VOT deviants in musician compared to non-musician children. Very recently, Kühnis et al. (2013) reported enhanced MMNs to native contrasts of vowel frequency (fundamental frequency and second formant transition), vowel duration and VOT deviants in musician compared to non-musician adults. Thus, it was of interest to determine whether similar results would be obtained with non-native phonemic contrasts.

To this aim we used a multi-feature MMN design (Näätänen et al., 2004) with Mandarin Chinese syllables presented to Cuban musicians and visual artists. The syllable "Cha" was used as standard and deviant syllables were either close to or far from the standard (Small and Large deviants) on three dimensions: pitch contour, vowel duration and VOT. Based on the results summarized above (Chandrasekaran et al., 2009; Chobert et al., 2011; Kühnis et al., 2013), we hypothesized that musicians should be more sensitive than non-musicians to spectrally (pitch contour) and temporally (vowel duration and VOT) deviant Mandarin Chinese syllables, even if these syllables were unfamiliar to all participants. Moreover, based on previous results (e.g., Chobert et al., 2011; Marie et al., 2012), we also expected larger differences between musicians and visual artists for Small deviants, that are difficult to perceive, than for Large deviants, that are easy to perceive and can be detected by both groups of participants.

In addition, we checked whether musicians were more sensitive than non-musicians to manipulations of harmonic sounds similar to those created for Mandarin Chinese syllables. The note "Mi3" played on a clarinet was used as standard with Small and Large deviant sounds/notes on three dimensions: pitch contour, duration and an equivalent of VOT (see Methods). We hypothesized that Cuban musicians should be more sensitive than Cuban visual artists to the different manipulations of the harmonic sounds, again with larger between-group differences for Small (difficult to detect) than for Large deviants (easy to detect).

In sum, the originality of the present study was to compare spectral and temporal automatic processing for both syllables and harmonic sounds within the same participants, to use spectral manipulations of pitch contour that are linguistically relevant in Mandarin Chinese and to create manipulations of harmonic sounds as similar as possible to the linguistic sounds. Moreover, because this study was conducted in Cuba, all participants had very homogeneous socio-economic background and level of education.

#### **METHODS**

#### **PARTICIPANTS**

Twenty-six musicians and twenty-six visual artists from the "Instituto Superior de Arte" (ISA; Superior Institute of Art) participated in this experiment, which lasted for about 2 h. All subjects were native speakers of Spanish, with no experience of tone languages and without hearing or neurological disorders. None of the visual artists had any formal musical training other than that provided in elementary school and none of them played a musical instrument. Musicians started musical training around the age of 7 and had 16 ± 4 years of musical training on average at the time of testing. They played different instruments as detailed in **Table 1**. Visual artists started training in painting and/or sculpture around the age of 14 and had 7 ± 3 years of training on average. Four musicians and four visual artists were not included in the analyses because of too many artifact-contaminated trials in their EEG recordings. The final groups comprised 22 musicians (mean age 23.4 ± 3.6, 11 women; 17 right-handed, 2 left-handed and 3 ambidextrous) and 22 visual artists (mean age 23.4 ± 2.0, 8 women; 19 right-handed, 2 left-handed and 1 ambidextrous). All participants gave their informed consent to participate in the experiment, which was conducted according to the ethical guidelines of the Cuban Neuroscience Center.

**Table 1 | List of instruments or abilities of the musicians participating in the study.**


#### **STIMULI**

Linguistic stimuli were built from the natural syllables "Cha" (Tone 1), "Chá" (Tone 2), and "Zha" spoken by a Mandarin Chinese native speaker by using the Praat software (Boersma and Weenink, 2009). All stimuli had a Consonant-Vowel structure and a mean intensity of 70 dB SPL (except for intensity deviants<sup>1</sup>). For all stimuli (linguistic and non-linguistic), intensity was calibrated in dB SPL using a Brüel & Kjaer sound level meter (Investigator 2250 with microphone type 4189). The standard stimulus "Cha" comprised the consonant "Ch" (105 ms in duration) and the vowel "a" (Tone 1; 260 ms in duration) with a total duration of 365 ms and a fundamental frequency (F0) of 164.8 Hz (defined from the first to the last pulse of the vowel).

Deviants were chosen based upon the results of a pilot study with 12 Chinese native speakers, 12 Cuban musicians and 13 Cuban non-musicians. We first digitally created continuous changes in F0, vowel duration, consonant duration (VOT) and average vowel intensity from the standard syllable "Cha". For pitch deviants, the F0 was increased using a sigmoid function extracted from the original recording of "Chá" (Tone 2), going from standard F0 (164.8 Hz) in the first pulse of the vowel to different ending frequencies in the last pulse of the vowel (resulting in frequency changes of 5, 10, 20, 30, 50, 80, 100, 200, 300, 400, and 500 cents of semitones). For Duration deviants, the duration of the vowel was shortened by re-synthesis in steps of 10 ms, from 260 ms (standard) to 140 ms. VOT deviants were built by shortening the consonants of "Cha" and "Zha" by 10 ms (using the nearest zero-crossing point to avoid high frequency artifacts). A continuum was built from the standard "Cha" to "Zha." Finally, for intensity deviants, intensity was decreased in steps of 2 dB, from 70 dB (standard) to 62 dB.

Pilot participants listened to 48 pairs of sounds (standard-deviant stimuli) and had to decide whether they were the same or different. Based on these results, we selected the large deviant for each condition (frequency, duration, VOT and intensity) as the one similarly detected by all groups. By contrast, small deviants were the ones better detected by musicians and Chinese participants than by non-musicians. In the case of VOT, the large deviant was the unmodified "Zha" syllable (across phonetic category). The small deviant was a small change to the consonant "Ch" that sounded more similar to "Zh" but was still recognized as "Ch" by all groups in the pilot study.

Non-linguistic stimuli (harmonic sounds) were built by using procedures similar to those used for linguistic sounds. All stimuli were high-quality recordings of natural clarinet sounds with a mean intensity of 70 dB SPL (except for intensity deviants<sup>1</sup>). The standard sound had an F0 of 164.8 Hz (Mi3) and a total duration of 260 ms. Continuous changes in F0 (following the same sigmoid function as in syllables), duration and intensity were built in the same way as for the linguistic stimuli. For VOT deviants, the first part of the sound was removed (zeroed) in steps of 5 ms in order to obtain a different temporal relationship between the low frequency components of the clarinet timbre (starting at the beginning) and the higher frequency components (starting around 65 ms). This procedure is similar to the one used in previous studies (e.g., Chobert et al., 2011) to convert a "Ba" into a "Pa." The total duration of the sounds was kept to 260 ms to ensure that all stimuli (standard and deviants) were synchronized in time to the first pulse. Pilot participants were presented with 48 pairs in a same/different task as described above for syllables. In each condition (frequency, duration, VOT and intensity), we selected a large deviant that was similarly detected by all groups, while the small deviant was better detected by musicians than by non-musicians.

**Figure 1** illustrates the sound waveforms, whose acoustic properties are summarized in **Table 2**. In both syllables and harmonic sounds and for Large pitch contour deviants, the F0 increased from 164.8 to 185.0 Hz (a continuous increase of 2 semitones, that is, 20.2 Hz, 11% increase). For Small pitch deviants, the F0 increased from 164.8 to 169.6 Hz (50 cents of a tone, that is, 4.8 Hz, 2.9% increase) in harmonic sounds and from 164.8 to 168.7 Hz (40 cents of a tone, that is, 3.9 Hz, 2.4% increase) in syllables. For both types of stimuli, Large duration deviants were 120 ms shorter than the Standard (46.2% decrease) and Small duration deviants were 40 ms shorter than the Standard (15.4% decrease). For linguistic stimuli, the syllable "Zha" was used as Large VOT deviant. The Small VOT deviant comprised the first 60 ms of the consonant "Ch" joined to the vowel of the standard. In both cases, a period of silence was added at the beginning to keep a total duration of 365 ms (same as standard). For harmonic sounds, the Large and Small VOT deviants were built by zeroing the first 60 and 30 ms of the sound, respectively (using the nearest zero-crossing point to avoid artifacts in both cases).
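The end frequencies of the pitch glides follow from the usual relation between an interval in cents and a frequency ratio (100 cents = 1 semitone, 1200 cents = 1 octave). The short sketch below is only a consistency check on the reported end frequencies; the function name `glide_end_frequency` is an illustrative assumption and the code is not the synthesis procedure used to build the stimuli.

```python
def glide_end_frequency(start_hz: float, cents: float) -> float:
    """Return the frequency reached after rising `cents` above `start_hz`."""
    # 1200 cents correspond to one octave, i.e., a doubling of frequency.
    return start_hz * 2.0 ** (cents / 1200.0)


STANDARD_F0_HZ = 164.8  # F0 of the standard syllable and of the clarinet note

# Interval sizes reported for the Large and Small pitch contour deviants.
for label, cents in [
    ("Large (syllables and harmonic sounds)", 200),
    ("Small (harmonic sounds)", 50),
    ("Small (syllables)", 40),
]:
    end_hz = glide_end_frequency(STANDARD_F0_HZ, cents)
    print(f"{label}: {end_hz:.1f} Hz (+{end_hz - STANDARD_F0_HZ:.1f} Hz)")
```

The check covers only the start and end points of each glide; the sigmoid shape of the contour between them was extracted from the recording of "Chá" and is not modeled here.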

#### **MMN EXPERIMENT PROCEDURE**

The EEG was recorded while participants sat in a comfortable chair and watched a silent subtitled movie of their choice displayed on a screen at one meter distance. Participants were asked to watch the movie without paying attention to the sounds that were presented through headphones.

Syllables and harmonic sounds were presented in two separate blocks that lasted for 12.2 min each. Pitch, duration, VOT and intensity deviants were presented in a balanced pseudorandom order, always including one or two standard sounds between deviants. Five different sequences were created with a Stimulus Onset Asynchrony (SOA) of 700 ms and with a total of 1200 stimuli: 504 deviants (72 for each of the 7 deviant types; 6% probability) and 696 standards plus 15 standards at the beginning of the sequence. Sequences were balanced between subjects. Moreover, half of the participants started with the syllable sequences and the other half with the harmonic sound sequences. Participants were asked questions at the end of the experiment to ensure they had paid attention to the movie.
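One way to build a sequence with these counts can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the deviant labels, the function name `build_sequence`, the use of a seeded shuffle, and the choice to place one or two standards before every deviant (including the first one after the 15 leading standards) are all hypothetical. With 504 deviants and 696 standards, 192 gaps must hold two standards and 312 gaps one standard.

```python
import random

# Hypothetical labels for the 7 deviant types (Large/Small for pitch,
# duration and VOT, plus one intensity deviant).
DEVIANT_TYPES = [
    "pitch_large", "pitch_small",
    "duration_large", "duration_small",
    "vot_large", "vot_small",
    "intensity",
]
N_PER_DEVIANT = 72        # presentations of each deviant type
N_STANDARDS = 696         # standards interleaved with the deviants
N_LEADING_STANDARDS = 15  # extra standards at the start of the sequence


def build_sequence(seed: int) -> list[str]:
    """Return one pseudorandom multi-feature sequence of stimulus labels."""
    rng = random.Random(seed)

    deviants = DEVIANT_TYPES * N_PER_DEVIANT
    rng.shuffle(deviants)

    # Every deviant is preceded by one or two standards; the number of
    # two-standard gaps is fixed by the required total of standards.
    n_double = N_STANDARDS - len(deviants)
    gaps = [2] * n_double + [1] * (len(deviants) - n_double)
    rng.shuffle(gaps)

    sequence = ["standard"] * N_LEADING_STANDARDS
    for gap, deviant in zip(gaps, deviants):
        sequence.extend(["standard"] * gap)
        sequence.append(deviant)
    return sequence


# Five different sequences, one per (arbitrary) seed.
sequences = [build_sequence(seed) for seed in range(5)]
```

A plain shuffle equates the number of presentations per deviant type but does not constrain their order further; any additional balancing used in the experiment is not specified in the text and is not reproduced here.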

#### **ERP RECORDINGS**

The EEG was continuously recorded from 32 Biosemi pin-type active electrodes (Amsterdam University), mounted on an elastic head cap, and located at standard left and right hemisphere positions over frontal, central, parietal, occipital, and temporal areas (International 10/20 system sites: Fz, Cz, Pz, Oz, Fp1, Fp2, AF3, AF4, F3, F4, C3, C4, P3, P4, P7, P8, O1, O2, F7, F8, T7, T8, Fc5, Fc1, Fc2, Fc6, Cp5, Cp1, Cp2, Cp6, PO3, PO4). Moreover, to detect horizontal eye movements and blinks, the Electro-oculogram (EOG) was recorded from Flat-type active electrodes placed 1 cm to the left and right of the external canthi, and from an electrode beneath the right eye. Two additional electrodes were placed on the left and right mastoids. EEG was recorded at a sampling rate of 512 Hz using Biosemi amplifiers. The EEG was re-referenced offline to the algebraic average of the left and right mastoids and filtered with a bandpass of 1–30 Hz (12 dB/oct). Impedances of the electrodes never exceeded 5 kΩ. Data were segmented into single trials of 800 ms starting 100 ms before stimulus onset and were analyzed using the BrainVision Analyzer software (Brain Products, Munich). Trials containing ocular artifacts (75 µV threshold on vertical and horizontal EOG) and movement artifacts (75 µV threshold on all channels) were excluded from the averaged ERP waveforms. On average, the number of rejected trials for each deviant stimulus was less than 15% of the total number of trials.

<sup>1</sup>Note 1: This experiment also included an attentive listening condition (results not reported here) in which participants were asked to press a button in response to intensity deviants. As the intensity deviants were only included for this purpose, they are not analyzed in the present experiment.

*\*VOT for harmonic sounds is defined as the time set to zero from the beginning of the stimuli. For Syllables it represents the length of the first part of the consonant used. The dash (*−*) represents the same value as the standard.*
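The segmentation and artifact-rejection parameters reported for the ERP recordings (512 Hz sampling rate, 800 ms trials starting 100 ms before stimulus onset, 75 µV threshold) can be made concrete with a small sketch. It is not the BrainVision Analyzer pipeline: the function names are hypothetical, and treating the threshold as a limit on absolute amplitude is an assumption, since the text does not state which criterion (absolute or peak-to-peak) was applied.

```python
SAMPLING_RATE_HZ = 512
PRE_STIMULUS_MS = 100          # trial starts 100 ms before stimulus onset
EPOCH_MS = 800                 # total trial length
REJECTION_THRESHOLD_UV = 75.0  # threshold reported for EOG and EEG channels


def ms_to_samples(duration_ms: float, rate_hz: int = SAMPLING_RATE_HZ) -> int:
    """Convert a duration in ms to the nearest whole number of samples."""
    return round(duration_ms * rate_hz / 1000.0)


def cut_epoch(channel: list[float], onset_sample: int) -> list[float]:
    """Cut one trial from a single-channel recording around a stimulus onset."""
    start = onset_sample - ms_to_samples(PRE_STIMULUS_MS)
    stop = start + ms_to_samples(EPOCH_MS)
    if start < 0 or stop > len(channel):
        raise ValueError("epoch extends beyond the recording")
    return channel[start:stop]


def is_clean(epoch_channels: list[list[float]],
             threshold_uv: float = REJECTION_THRESHOLD_UV) -> bool:
    """Keep a trial only if no sample on any channel exceeds the threshold.

    Assumption: the threshold is applied to absolute amplitude in microvolts.
    """
    return all(
        abs(sample) <= threshold_uv
        for channel in epoch_channels
        for sample in channel
    )
```

At 512 Hz, the 100 ms pre-stimulus interval and the 800 ms trial do not correspond to whole numbers of samples, so the sketch rounds to the nearest sample.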

#### **DATA ANALYSES**

Artifact-free ERP trials (mastoid referenced) were averaged for each subject and for each experimental condition. Difference waveforms were obtained by subtracting the ERPs elicited by the Standard stimuli from those elicited by each deviant stimulus. The MMN was also computed by using the nose reference to ensure the typical MMN inversion between Fz/Cz and the mastoid electrodes (for review, see Näätänen et al., 2007). However, because mastoid-referenced averages typically show a better signal-to-noise ratio than the nose-referenced averages (e.g., Kujala et al., 2007), the former were used to quantify MMN amplitude.
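The averaging and subtraction steps described above amount to the following computation for one subject, one electrode and one deviant type. This is a minimal sketch with hypothetical function names and toy values, not the analysis software used in the study.

```python
def average_trials(trials: list[list[float]]) -> list[float]:
    """Average equally long single-trial waveforms sample by sample."""
    if not trials:
        raise ValueError("at least one trial is required")
    return [sum(samples) / len(trials) for samples in zip(*trials)]


def difference_wave(deviant_trials: list[list[float]],
                    standard_trials: list[list[float]]) -> list[float]:
    """Return the deviant ERP minus the standard ERP (the difference waveform)."""
    deviant_erp = average_trials(deviant_trials)
    standard_erp = average_trials(standard_trials)
    return [d - s for d, s in zip(deviant_erp, standard_erp)]


# Toy example: two deviant and two standard trials of three samples each (in µV).
deviant = [[0.0, -4.0, -1.0], [0.0, -2.0, -1.0]]
standard = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
print(difference_wave(deviant, standard))
```

A negative value in the difference waveform indicates that the deviant ERP was more negative than the standard ERP at that sample, which is the pattern expected for the MMN at frontal sites.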

For each condition, *Z*-tests were performed to compare each time point of the grand average traces of each group (musicians, MUS and Visual Artists, VA) against zero. The significant points were selected as those with *Z*-statistics higher than the threshold corresponding to a corrected significance level of 0.05 (using Bonferroni correction). Mean MMN amplitudes were measured in a 50 ms time window, centered at the peak negative value. The same procedure was used for the other two components: the early negativity to Large duration and VOT deviants (peaking around 100–120 ms) and the P3a to Large duration deviants (peaking around 500 ms). Maximum MMN amplitude developed between 200 and 400 ms for both pitch and duration deviants in harmonic sounds, and slightly later in syllables. Maximum MMN amplitudes for VOT deviants were between 150 and 250 ms in both harmonic sounds and Mandarin syllables (see next section).
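The two measurements described above can be sketched as follows. Several details are assumptions made for illustration only: the *Z*-statistic is taken as the across-subject mean divided by its standard error, the test is two-tailed, the Bonferroni correction divides alpha by the number of time points, and the negative peak is searched within a 200–400 ms range. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def bonferroni_z_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Two-tailed critical |Z| after Bonferroni correction over n_tests points."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_tests))


def z_against_zero(values: list[float]) -> float:
    """Z-statistic of one time point: subject mean over its standard error.

    Assumes at least two subjects with non-identical values.
    """
    return mean(values) / (stdev(values) / sqrt(len(values)))


def mean_amplitude_at_peak(wave: list[float], times_ms: list[float],
                           search_ms: tuple[float, float] = (200.0, 400.0),
                           window_ms: float = 50.0) -> tuple[float, float]:
    """Return (peak latency, mean amplitude in a window centered on the peak).

    The peak is the most negative sample inside the search range.
    """
    candidates = [i for i, t in enumerate(times_ms)
                  if search_ms[0] <= t <= search_ms[1]]
    peak = min(candidates, key=lambda i: wave[i])
    half = window_ms / 2.0
    inside = [wave[i] for i, t in enumerate(times_ms)
              if abs(t - times_ms[peak]) <= half]
    return times_ms[peak], sum(inside) / len(inside)
```

A time point would be flagged as significant when the absolute value returned by `z_against_zero` exceeds `bonferroni_z_threshold(n_time_points)`. For other components the search range would be moved accordingly, for example to around 100–120 ms for the early negativity.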

Given the different nature of the stimuli used for Pitch, Duration and VOT, and because the MMNs were largest at frontal electrodes, three-way ANOVAs were conducted at frontal sites for each Dimension separately that included Group (MUS vs. VA) as a between-subject factor and Deviance size (Small vs. Large) and Laterality (Left vs. Midline vs. Right) as within-subject factors. As the Group by Laterality interaction was not significant for syllables or for harmonic sounds, these results are not reported further. Results of exploratory analyses (ANOVAs) conducted for Large and Small deviants separately are also reported when they allow a better understanding of the effects of interest.

## **RESULTS**

#### **MMN IN EACH CONDITION AGAINST ZERO**

Results of two-tailed *Z*-tests vs. 0 using the Bonferroni correction across time (alpha = 0.05) showed that the MMNs to large deviants, whether in syllables or in harmonic sounds, were always significantly different from zero in both groups but the latency band within which these differences were significant varied. The MMNs to small pitch deviants in harmonic sounds were significantly different from zero in both groups but only for musicians for syllabic pitch. The MMNs to small duration deviants were only significantly different from zero for musicians and for harmonic sounds. Finally, the MMNs to small VOT (or equivalent) deviants were not significantly different from zero either for syllables or for harmonic sounds.

#### **MMN DIFFERENCES BETWEEN GROUPS FOR THE THREE TYPES OF DEVIANTS**

To verify that the MMN differences reported below are due to the processing of the deviants rather than of the standards, *t*-tests were first computed between both groups (Bonferroni corrected) on the standards in each condition. As illustrated in **Figure 2**, the ERPs to both syllables and harmonic sounds overlap closely in the two groups and results revealed no significant differences. Thus, the between-group differences that we report below for the MMN are more likely linked to the deviants than to the standards.

#### *Pitch contour deviants*

For pitch contour deviants in Mandarin syllables (see **Figure 3**, upper row), the MMN was marginally larger in MUS (−2.73 µV) than in VA [−2.06 µV; main effect of Group: *F*(1, 42) = 3.18, *p* = 0.08]. Large deviants (−3.42 µV) elicited larger MMNs than Small deviants [−1.36 µV; main effect of Deviance size: *F*(1, 42) = 40.59, *p* < 0.001]. The Group by Deviance size interaction was significant [*F*(1, 42) = 9.70, *p* < 0.001]. Separate analyses for Large and Small deviants revealed that while the MMNs to Large deviants were not significantly different between MUS (−3.58 µV) and VA (−3.28 µV; *F* < 1), the MMNs to Small deviants were larger in MUS (−1.88 µV) than in VA [−0.85 µV; *F*(1, 42) = 11.41, *p* < 0.002].

For pitch contour deviants in harmonic sounds (see **Figure 3**, lower row), the MMN was larger in MUS (−5.00 µV) than in VA [−3.40 µV; main effect of Group: *F*(1, 42) = 6.61, *p* < 0.02]. Large deviants (−5.53 µV) elicited larger MMNs than Small deviants [−2.87 µV; main effect of Deviance size: *F*(1, 42) = 94.88, *p* < 0.001]. The Group by Deviance size interaction was not significant [*F*(1, 42) = 2.60, *p* > 0.10], but subsequent exploratory analyses revealed that while the MMNs to Large harmonic pitch deviants were not significantly different between MUS (−6.11 µV) and VA [−4.94 µV; *F*(1, 42) = 2.41, *p* > 0.10], the MMNs to Small harmonic pitch deviants were larger in MUS (−3.90 µV) than in VA [−1.85 µV; *F*(1, 42) = 11.52, *p* < 0.001].

#### *Duration deviants*

For duration deviants in Mandarin syllables (see **Figure 4**, upper row), the main effect of Group was not significant (MUS = −1.62 µV; VA = −1.41 µV; *F* < 1). Large deviants (−1.89 µV) elicited larger MMNs than Small deviants [−1.14 µV; main effect of Deviance size: *F*(1, 42) = 12.79, *p* < 0.001]. The Group by Deviance size interaction was not significant [*F*(1, 42) = 2.54, *p* > 0.10]: MMNs to Large and Small Duration deviants were not significantly different between groups.

**FIGURE 3 | MMNs to Large and Small pitch contour deviants in Mandarin syllables and in harmonic sounds recorded at Frontal sites (F3, Fz, and F4) are overlapped for musicians (gray dashed line) and visual artists (black solid line).** On this and subsequent figures: ns means "not significant"; one (∗) and two stars (∗∗) represent significance with *p* < 0.05 and *p* < 0.01, respectively.

For duration deviants in harmonic sounds (see **Figure 4**, lower row), the main effect of Group was marginally significant [MUS = −2.33 µV; VA = −1.74 µV; *F*(1, 42) = 3.23, *p* < 0.08]. Large deviants (−2.87 µV) elicited larger MMNs than Small deviants [−1.20 µV; main effect of Deviance size: *F*(1, 42) = 41.88, *p* < 0.001]. The Group by Deviance size interaction was marginally significant [*F*(1, 42) = 3.04, *p* = 0.08]. Subsequent exploratory analyses revealed that while the MMNs to Large duration deviants were not significantly different between MUS (−2.94 µV) and VA (−2.80 µV; *F* < 1), the MMNs to Small duration deviants were larger in MUS (−1.72 µV) than in VA [−0.68 µV; *F*(1, 42) = 7.45, *p* < 0.01] thereby explaining that the main effect of Group was marginally significant.

For Large duration deviants, the early negativity (around 120 ms) was not significantly different between the two groups (*F* < 1) but the P3a component (around 500 ms) was significantly larger in MUS (0.81 µV) than in VA [−0.27 µV; main effect of Group: *F*(1, 42) = 5.56, *p* < 0.03].

#### *VOT (or equivalent) deviants*

For VOT deviants in Mandarin syllables (see **Figure 5**, upper row), the main effect of Group was significant [MUS = −1.31 µV; VA = −0.59 µV; *F*(1, 42) = 7.77, *p* < 0.008]. Neither the main effect of Deviance size (Large deviants: −0.97 µV and Small deviants: −0.92 µV; *F* < 1) nor the Group by Deviance size interaction was significant [*F*(1, 42) = 1.27, *p* > 0.20]. To better understand the main effect of Group, separate exploratory analyses were conducted for Large and Small VOT deviants. Results revealed that while the MMNs to Small VOT deviants were not significantly different in MUS (−1.14 µV) and in VA [−0.70 µV; *F*(1, 42) = 1.33, *p* > 0.20], the MMNs to Large VOT deviants were significantly different between MUS (−1.48 µV) and VA [−0.47 µV; *F*(1, 42) = 8.57, *p* < 0.006], which accounts for the significant main effect of Group.

The early negativity (around 100 ms) to Large and Small VOT deviants in syllables was not different between groups (*F <* 1).

For the equivalent of VOT deviants in harmonic sounds (see **Figure 5**, lower row), the main effect of Group was significant [MUS = −2.01 µV; VA = −1.48 µV; *F*(1, 42) = 4.47, *p* < 0.05]. Large deviants (−2.52 µV) elicited larger MMNs than Small deviants [−0.97 µV; main effect of Deviance Size: *F*(1, 42) = 26.53, *p* < 0.001]. The Group by Deviance size interaction was not significant [*F*(1, 42) = 1.23, *p* > 0.20]. Subsequent exploratory analyses revealed that while the MMNs to Large VOT-equivalent deviants were not significantly different in MUS (−2.62 µV) and in VA (−2.42 µV; *F* < 1), the MMNs to Small VOT-equivalent deviants were significantly larger in MUS (−1.40 µV) than in VA [−0.54 µV; *F*(1, 42) = 8.87, *p* < 0.005] thereby explaining that the main effect of Group was significant.

For Large deviants, the early negativity (around 100 ms) was larger in MUS (−0.19 µV) than in VA [0.47 µV; *F*(1, 42) = 4.72, *p* < 0.04].

## **DISCUSSION**

Results are discussed in turn for each type of deviant in Mandarin syllables and in harmonic sounds before being considered together in a general discussion.

#### **PITCH CONTOUR DEVIANTS**

One of the most interesting findings is that the MMNs to Small pitch contour deviants in Mandarin syllables were larger in musicians than in visual artists. Thus, in line with our hypothesis, musicians were more sensitive to small changes in the pitch contour of Mandarin syllables than visual artists. This conclusion is strengthened by the complementary finding that the MMNs to small pitch deviants were significantly different from zero for musicians but not for visual artists. This is taken to show that while musicians automatically perceived the small difference in pitch with the standard syllable "Cha," this difference was too small to be automatically detected by visual artists.

By contrast, no significant between-group difference in MMN amplitude was found for Large pitch contour deviants, most likely because they were easy to perceive for both musicians and visual artists. In line with this conclusion, the MMNs to Large pitch deviants were significantly different from zero in both groups. This is taken to show that the large pitch difference with the standard syllables was automatically detected in both groups. Importantly, these results extend to real Mandarin syllables previous results from Chandrasekaran et al. (2009) showing larger MMNs in English musicians than in English non-musicians to pitch contour deviants homologous to Mandarin tones. They also extend previous results from Kühnis et al. (2013) with pitch deviants in native vowels.

The finding that musicians are more sensitive to subtle pitch contour differences in Mandarin Chinese syllables than visual artists has interesting consequences for foreign language perception and learning. When immersed in a foreign language environment, automatic processing of subtle changes in pitch that are linguistically relevant (as in Mandarin Chinese) may largely contribute to the explicit learning of the language. Previous results have shown that musical expertise facilitates the attentive processing of pitch variations in a foreign language both at the segmental and supra-segmental levels (e.g., Marques et al., 2007; Marie et al., 2011b). However, to our knowledge the relationship between automatic and attentive processing has not yet been directly tested within the same participants. This may be an important aspect to explore in future experiments.

Results for harmonic tones are in line with many results in the literature showing larger MMNs to pitch deviants in pure tones, harmonic tones, and musical sounds in musicians than in non-musicians (e.g., Tervaniemi et al., 2001; Nikjeh et al., 2009; Marie et al., 2012). These results are typically interpreted as reflecting greater sensitivity to pitch contour in musicians than in non-musicians. Importantly, results for pitch contour deviants in harmonic tones were very similar to those reported above for Mandarin syllables. While the between-group difference was not significant for Large pitch contour deviants, which were easy to detect, the MMNs to Small pitch deviants in harmonic tones were larger in musicians than in visual artists. Thus, the effects of musical expertise are best seen when the differences between the deviants and the standard are difficult to perceive. However, and in contrast to Small pitch contour deviants in Mandarin syllables, for both types of deviants and in both groups, the MMNs were significantly different from zero so that both musicians and visual artists automatically detected Large and Small pitch deviants in harmonic sounds. Finally, the finding that musicians are more sensitive to Small pitch contour deviants both in harmonic sounds and in Mandarin syllables is in line with the hypothesis that speech and non-speech sounds rely on common pitch processing (Wong et al., 2007; Chandrasekaran et al., 2009; Kraus and Chandrasekaran, 2010; Besson et al., 2011; Bidelman et al., 2013).

#### **DURATION DEVIANTS**

Results were not as clear-cut for duration deviants in Mandarin syllables as for pitch contour deviants. As expected, the between-group differences were not significant for Large duration deviants that were easy to detect (120 ms shorter than the Standard) and the MMNs were significantly different from zero in both groups. However, in contrast to our hypothesis, the MMNs to Small duration deviants were not significantly different between musicians and non-musicians. It may be that the 40 ms difference with the Standard was too small to be automatically perceived. In line with this explanation, the MMNs to Small duration deviants were not significantly different from zero in either group. These results contrast with those reported by Milovanov et al. (2009) showing enhanced MMNs to speech duration deviants in 10- to 12-year-old children with high musical aptitudes and pronunciation skills compared with children who lacked these skills. Moreover, Chobert et al. (2011) also found larger MMNs to Large and Small duration deviants in 9-year-old musician children than in non-musician children. These different results are likely linked to the specific characteristics of the stimuli and the duration that was chosen for the Small duration deviants. In this respect, results of the pilot experiment showed that musicians were able to attentively detect the Small duration deviants better than non-musicians. Moreover, other results in active listening tasks point to enhanced processing of the metric structure of words presented in sentence contexts in musicians compared to non-musicians (Marie et al., 2011b) and to enhanced discrimination and identification of moraic features in musicians (Japanese and Dutch) compared to non-musicians (Sadakata and Sekiyama, 2011). Thus, taken together, these results suggest that while the 40 ms duration difference between the standard and the Small duration syllabic deviants was possibly too small to be automatically processed, it could be attentively detected. Such dissociation between pre-attentive and attentive processing has already been reported in several experiments (e.g., Tervaniemi et al., 2009; Marie et al., 2012) and needs to be further explored.

For harmonic sounds, results were in line with our hypotheses with significant between-group differences for Small duration deviants and no significant group differences for Large duration deviants. Thus, the MMNs to large duration deviants were significantly different from zero in both groups, but not significantly different between musicians and visual artists, possibly for the same reasons as detailed above for Mandarin syllables: the 120 ms duration difference with the Standard was easy to detect and automatically processed by both musicians and visual artists. Moreover, an early negativity that seemed larger in musicians than in visual artists developed around 120 ms post-stimulus onset but this difference was not significant. By contrast, the P3a was clearly larger in musicians than in visual artists thereby showing that attention was automatically attracted to the large duration deviants in musicians but not in visual artists (Courchesne et al., 1975; Squires et al., 1975).

For Small duration deviants, the MMN was significantly different from zero in musicians but not in visual artists and the MMN was significantly larger in musicians than in visual artists. Thus, while musicians automatically processed the 40 ms difference in duration between the standard and the small duration deviants, this difference was too small to be automatically perceived by visual artists. These results are in line with those reported by Marie et al. (2012) showing enhanced MMNs to duration deviants in harmonic sounds in French musicians compared to French non-musicians. Importantly, however, these results also reveal differences in the processing of duration in syllables and harmonic sounds since the 40 ms difference with the standard was automatically perceived by musicians in harmonic sounds but not in syllables.

#### **VOT DEVIANTS**

For Mandarin syllables, the syllable "Zha" was used as the Large VOT deviant (across phonemic category deviant). The finding of larger MMNs in musicians than in visual artists extends previous results related to the automatic processing of VOT in syllables belonging to the native phonemic repertory [Chobert et al. (2013) in children and Kühnis et al. (2013) in adults] to the automatic processing of VOT in foreign syllables. However, the MMNs to the syllable "Zha" were smaller and less well defined than those obtained for the syllables "Pa" (Chobert et al., 2013) or "Ta" (Kühnis et al., 2013). This probably reflects differences in acoustic features since the attack time for the syllable "Zha" is less marked than for stop consonants such as "P" and "T."

By contrast, the MMNs to small VOT deviants in Mandarin syllables were not significantly different from zero in either group and no significant differences were found between musicians and visual artists. As argued for Small duration deviants in syllables, the Small VOT deviants were probably too close to the standard to be automatically detected. These results contrast with those of the active listening task of the pilot study showing higher detection rates of Small VOT deviants in musicians than in non-musicians. Again, they point to some dissociation between automatic processing, as measured with the MMN, and attentive processing, as in the active listening tasks of the pilot study.

Finally, for harmonic sounds, the MMNs to large VOT-equivalent deviants were significantly different from zero in the two groups but not significantly different between musicians and visual artists. Thus, both musicians and visual artists were equally sensitive to the equivalent of a VOT manipulation, created by zeroing the first 60 ms of the harmonic tone. However, a frontocentral early negativity developed between 50 and 130 ms and was larger in musicians than in visual artists, thereby indicating that musicians were more sensitive to this manipulation than visual artists. For Small VOT-equivalent deviants, MMNs were significantly different between the two groups. However, this result should be considered with caution because the MMNs were not significantly different from zero in either group, which is taken to indicate that zeroing the first 30 ms of the harmonic sounds was too small a difference from the standard to be automatically processed either by musicians or by visual artists. Again, the choice of the value for small deviants was based on the results of the pilot study showing that these stimuli were attentively detected with higher detection rates by musicians than by non-musicians. As proposed for Small duration deviants in syllables, these results suggest some dissociation between automatic and controlled listening when the deviant stimuli are close to the standard. Clearly, these results point to the importance of choosing the right stimuli to test for the effects of interest.

## **CONCLUSIONS**

Results for duration and VOT deviants were not clear-cut in showing similar effects of musical expertise for Mandarin syllables and harmonic tones, possibly due to the specific characteristics of the stimuli that were chosen based on results in an attentive listening task (pilot study). As such, they reveal interesting dissociations between automatic and controlled attentive processing that will be examined further by requiring participants to actively discriminate the different types of deviants. However, an alternative interpretation needs to be considered. The differences between the musicians and the visual artists tested in the present experiment may be smaller than between the musicians and the non-musicians tested in the pilot study for at least two reasons. First, the non-musicians of the pilot study had no strong artistic background (they were mainly Master and PhD students in Neuroscience). By contrast, the visual artists were professional artists with more than 7 years of intensive training. Thus, musicians and visual artists may have developed "artistic brains," more similar to each other than to the "non-artistic brain" of the non-musicians tested in the pilot study. Second, visual artists typically spend 5–10 h a day working on their creations and they listen to music most of the time while working. Thus, even if they did not receive formal music education, they may have developed a "musical ear" through thousands of hours of passive exposure to music. To test this hypothesis, non-musicians who only occasionally listen to music (i.e., scientists who are not music-lovers) should be used as a control group.

In contrast to the results for duration and VOT deviants, results were clear-cut in showing that the processing advantage of musicians over visual artists for pitch contour deviants in harmonic sounds extended to pitch contour deviants in Mandarin syllables, specifically when the differences between the deviants and the standard are small. These findings are in line with the hypothesis that years of musical practice increase auditory processing abilities and confer an advantage to musicians not only for harmonic sounds but also for speech sounds (Kraus and Chandrasekaran, 2010; Strait et al., 2010; Besson et al., 2011). These abilities may turn out to be very important to facilitate the learning of foreign languages, specifically when pitch variations are linguistically relevant as in Mandarin Chinese and in many other languages of the world (e.g., most African languages).

An issue that has been hotly debated in the literature is whether the differences between musicians and non-musicians reflect genetic predispositions for music or are linked with extended musical training (e.g., Schellenberg, 2004; Hyde et al., 2009; Moreno et al., 2009; Corrigall et al., 2013). While genetic predispositions certainly play a role in the observed differences, Musacchia et al. (2008) showed, by using both sub-cortical and cortical measures, that processing the specific pitch features of the syllable "Ba" was correlated with the duration and the age of onset of musical training thereby pointing to the importance of musical training. Most importantly, results of longitudinal studies have demonstrated that effects similar to those found in cross-sectional studies comparing musician and non-musician children can be generated by training non-musician children with music (e.g., Moreno et al., 2009; Chobert et al., 2012; François et al., 2013). As these effects were found in children, it may be that there is a critical period for musical training so that different results would be obtained in adults trained with music. To our knowledge, such a study remains to be conducted to test for the hypothesis of a critical period for music learning.

Taken together, these results show that musical expertise positively influences the automatic processing of non-native suprasegmental contrasts. Musicians were more sensitive than visual artists to changes in syllabic pitch contours even if these changes did not belong to the phonemic repertory of their own language. These results raise the interesting possibility that, by being more sensitive to pitch and to lexical tone contrasts, musicians may learn tone languages more easily than non-musicians (e.g., Wong and Perrachione, 2007). This hypothesis will be directly tested in future experiments.

## **ACKNOWLEDGMENTS**

We are thankful to the former Cuban Minister of Culture, Mr. Abel Prieto and to the President of the Cuban Music Institute, Mr. Orlando Vistel Columbié for making this investigation possible. Special thanks to the Rector of the Instituto Superior de Arte (ISA), Mr. Rolando González Patricio, to the Dean of the School of Music, Mariana Hevia Román and her team, to the Dean of the School of Visual Arts, Mr. Jorge Braulio Rodríguez Quintana, and to many other teachers and personnel of ISA, for their help in organizing the practical aspects of this research. We also thank the PhD student Rui Zhang for generously lending us his voice for the preparation of the linguistic stimuli used in this study and Solvi Ystad for providing us with the clarinet sound; Lídice Galán and Agustín Lage-Castellanos for statistical advice, as well as our colleagues Yun Nan and Pavel Prado-Gutiérrez for advice on the selection/preparation of the stimuli and very thoughtful discussions. We would also like to thank the Laboratory of Excellence "Brain Language Research Institute" (BLRI) for institutional support, but most of all, we are really grateful to all the musicians and visual artists who participated in this experiment: it has been a pleasure working with them.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 24 May 2013; accepted: 25 October 2013; published online: 14 November 2013.*

*Citation: Martínez-Montes E, Hernández-Pérez H, Chobert J, Morgado-Rodríguez L, Suárez-Murias C, Valdés-Sosa PA and Besson M (2013) Musical expertise and foreign speech perception. Front. Syst. Neurosci. 7:84. doi: 10.3389/fnsys.2013.00084*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2013 Martínez-Montes, Hernández-Pérez, Chobert, Morgado-Rodríguez, Suárez-Murias, Valdés-Sosa and Besson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Learning language with the wrong neural scaffolding: the cost of neural commitment to sounds

*Amy S. Finn<sup>1,2</sup>\*, Carla L. Hudson Kam<sup>1,3</sup>, Marc Ettlinger<sup>1,4</sup>, Jason Vytlacil<sup>1</sup> and Mark D'Esposito<sup>1,5</sup>*

*<sup>1</sup> Department of Psychology, University of California, Berkeley, CA, USA*

*<sup>2</sup> Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA*

*<sup>3</sup> Department of Linguistics, University of British Columbia, Vancouver, BC, Canada*

*<sup>4</sup> Department of Veterans Affairs, Northern California Health Care System, Martinez, CA, USA*

*<sup>5</sup> Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA*

#### *Edited by:*

*Virginia Penhune, Concordia University, Canada*

#### *Reviewed by:*

*Narly Golestani, Université de Genève, Switzerland; Michael Ramscar, Tübingen University, Germany; Ruth De Diego-Balaguer, Institució Catalana de Recerca i Estudis Avançats, Spain*

#### *\*Correspondence:*

*Amy S. Finn, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar Street, 46-4037, Cambridge, MA 02139, USA e-mail: amyfinn@mit.edu*

Does tuning to one's native language explain the "sensitive period" for language learning? We explore the idea that tuning to (or becoming more selective for) the properties of one's native-language could result in being less open (or plastic) for tuning to the properties of a new language. To explore how this might lead to the sensitive period for grammar learning, we ask if tuning to an earlier-learned aspect of language (sound structure) has an impact on the neural representation of a later-learned aspect (grammar). English-speaking adults learned one of two miniature artificial languages (MALs) over 4 days in the lab. Compared to English, both languages had novel grammar, but only one was comprised of novel sounds. After learning a language, participants were scanned while judging the grammaticality of sentences. Judgments were performed for the newly learned language and English. Learners of the similar-sounds language recruited regions that overlapped more with English. Learners of the distinct-sounds language, however, recruited the Superior Temporal Gyrus (STG) to a greater extent, which was coactive with the Inferior Frontal Gyrus (IFG). Across learners, recruitment of IFG (but not STG) predicted both learning success in tests conducted prior to the scan and grammatical judgment ability during the scan. Data suggest that adults' difficulty learning language, especially grammar, could be due, at least in part, to the neural commitments they have made to the lower level linguistic components of their native language.

#### **Keywords: language learning, sensitive period, fMRI, plasticity, expertise**

## **INTRODUCTION**

Language is an exceedingly complex learned behavioral system. It is well-documented that children ultimately learn this system better than most adults (Snow and Hoefnagel-Höhle, 1978; Birdsong, 1999; Newport et al., 2001; Mayberry and Lock, 2003). However, age-related learning and memory differences usually go in the opposite direction, with young adults consistently outperforming children (Gathercole et al., 2004; Ghetti and Angelini, 2008)<sup>1</sup>. Why is learning language an exception?

One long-posed explanation is that adults' language learning difficulties are the consequence of diminishing neural plasticity (Penfield and Roberts, 1959; Lenneberg, 1967; Pulvermüller and Schumann, 1994). While the mechanisms of plasticity were underspecified in these early proposals, some support for this general idea comes from work showing that cortical sensitivity to different languages in bilinguals is spatially distinct (Whitaker and Ojemann, 1977; Ojemann and Whitaker, 1978; Lucas et al., 2004). These studies applied electric current to cortical regions prior to brain surgery in order to identify (and avoid) language-sensitive regions. Patients also showed more diffuse cortical sensitivity for their second-language (L2) as compared to their native-language. While no causal arguments can be made from these data, the L2 could have a spatially distinct and more diffuse representation because the native-language regions are optimized for (or tuned to) the native-language and therefore cannot process the L2 well. The very process of tuning to the native-language, while beneficial for processing that language, could result in being less open (or plastic) for tuning to the L2.

As compared with this patient work, imaging studies allow the analysis of many more individuals, and therefore permit the exploration of how later- vs. earlier-learned L2s are represented. While both early and late-learned languages are associated with the activation of classic language regions (Klein et al., 1995; Yetkin et al., 1996; Chee et al., 1999; Rüschemeyer et al., 2005, 2006; Indefrey, 2006; Abutalebi, 2008; Consonni et al., 2013), later-learned languages are associated with (1) a greater activation of language regions [especially the left Inferior-Frontal-Gyrus (IFG)] (Dehaene et al., 1997; Chee et al., 2003; Tatsuno and Sakai, 2005; Golestani et al., 2006; Rüschemeyer et al., 2006) and (2) the involvement of additional (contralateral and subcortical) regions (Klein et al., 1994; Perani et al., 1996; Abutalebi et al., 2013). Likewise, recruitment of the IFG overlaps more for early vs. late bilinguals (Kim et al., 1997) and for more vs. less proficient bilinguals (Perani et al., 1998; Wartenburger et al., 2003; Dodel et al., 2005; Tatsuno and Sakai, 2005; Golestani et al., 2006; Leonard et al., 2011). These studies all suggest that later-learned languages are represented differently, overlapping less with circuitry supporting the native-language.

<sup>1</sup>It is well-established that these very processes that increase during childhood (executive function and memory) also decrease with aging. See Luo and Craik (2008), for a full review.

Neural tuning could explain this. Studies in rats have shown that auditory neurons tune to environmental stimuli (Zhang et al., 2001; Chang and Merzenich, 2003) and that early exposure can lead to more efficient processing of a particular stimulus later on (Insanally et al., 2009). In human infants, behavioral work has shown that a similar tuning process most likely occurs with exposure to native-language phonetics; as infants learn more about the relevant contrasts in their native language they lose the ability (previously held) to distinguish phonetic contrasts not present in their language (Werker et al., 1981). A similar mechanism could be driving age-related differences in the neural representation of language.

Several recent theories of first language acquisition highlight this possibility. These propose that language learning is best viewed as a series of nested sensitive periods; tuning in one area (say to the phonetic categories of one's language) gives rise, in turn, to an ability to learn other aspects of language (Kuhl, 2004; Werker and Tees, 2005). Importantly, these theories suggest that the neural networks dedicated to processing nested aspects of language (i.e., phonetic categories for spoken languages) do not just influence learning at the same level of linguistic knowledge, but also promote (or inhibit) the brain's future ability to learn *other* aspects of language, such as grammar. In other words, the neural networks dedicated to the newly learned languages should differ not just in regions that are directly sensitive to phonetics or grammar, but across the network in terms of how these regions interact with one another.

While such interactions have yet to be explored in the brain, there is some modeling and behavioral evidence for this pattern of nested learning. Modeling work has shown that experience (or the number of training trials) is crucial for tuning: with more training, individual units are more committed (or tuned) to specific functions (see Ramscar et al., 2010). There is also behavioral evidence for this pattern of learning, both for facilitation in L1 acquisition and inhibition in adult L2 acquisition. For instance, Kuhl et al. (2005) found that infants who were good at phonetic contrasts in their native language and poor at irrelevant contrasts (and are therefore more "tuned" to the sound properties of their language) performed better, as compared to those who were less specifically tuned, when measured on other aspects of language processing later on. And Finn and Hudson Kam (2008) found that adult L2 learners' ability to segment words from running speech via statistical learning was compromised when L1 word formation patterns (phonotactics) conflicted with the L2 word boundaries. Since tuning to novel phones is known to be especially difficult for adults (Golestani and Zatorre, 2004; Zhang et al., 2005; Wong et al., 2007), the nesting hypothesis suggests that this may account for their difficulties with all other aspects of language as well. Moreover, and of particular relevance for the present paper, tuning should influence the neural representation of later-learned languages, both within and across regions, in terms of how they interact with each other.

## **METHODS**

To investigate this, we examine whether non-native L2 phonology (sounds and phonotactics)—defined here as the degree to which it is shared with native language—can affect where L2 *grammar* is processed in the brain. We created two miniature artificial languages (MALs) both with the same syntax but each with different sound systems, which we taught to two different groups of adult learners over the course of 4 days. After the language exposure, participants underwent fMRI scanning while making grammaticality judgments in the MAL they had learned and in English (their native language). Importantly, the shared grammatical structures of the MALs were distinct from English. Crucially, one miniature language was phonologically similar to English (English-Phonology; EP), the other was distinct (Non-English-Phonology; NEP).

If the ideas outlined above are correct, we should observe (1) less overlapping recruitment for the language with distinct phonology (NEP) and English than the EP language and English (Kim et al., 1997), (2) the recruitment of additional regions [including contralateral regions (Golestani and Zatorre, 2004; Perani and Abutalebi, 2005; Klein et al., 2006)] for the NEP vs. the EP language, and (3) more native-like connectivity within the network recruited for the EP language as opposed to the NEP language. Analyses are conducted across the brain and focused especially on the left Inferior Frontal Gyrus (IFG) and left Superior Temporal Gyrus (STG) as both are associated with processing of syntax (Friederici and Kotz, 2003; Musso et al., 2003; Opitz and Friederici, 2007; Herrmann et al., 2012) and speech perception/production (Hickok and Poeppel, 2000).

### **PARTICIPANTS**

Twenty individuals from the University of California, Berkeley were randomly assigned to learn one of the two languages. Since gender is related to differences in the neural representation of language (Harrington and Farias, 2008), this was balanced across groups: 5 of the 10 NEP learners and 5 of the 10 EP learners were male. Age was also matched (EP: mean: 24.5 yrs, *SD*: 4.99; NEP: mean: 24 yrs, *SD*: 5.27). All participants were right-handed native English speakers with no history of hearing loss and no more than 3 years of classroom-based exposure to another language. Participants were excluded if they had any previous exposure to an SOV language or any home-based exposure to a language other than English [since phonetic information can be retained after this kind of experience (Kit-Fong Au et al., 2002)].

### **STIMULI**

Both languages comprised 4 transitive verbs, 30 nouns, which were arbitrarily divided into two noun classes, and 4 suffixes. Sentences followed a subject-object-verb word order. All nouns were followed by one of two noun suffixes, which served to indicate noun class membership. There was also subject-verb agreement. The subject agreement suffix depended on the noun class of the subject noun, but was not the same form as the suffix on the noun itself (**Figure 1A**). Importantly, the two languages have exactly the same grammatical structure as each other, but one which is distinct from English and so requires learning.

Critically, however, the two MALs differ in their phonological inventories. The EP language is comprised of phones that occur regularly in English (**Figure 1B**). Individual token frequencies were matched to English in both syllable position frequencies and syllable structure frequencies as closely as possible. For example, if a phone occurs at the beginning of a word 5% of the time in English, this is also true for EP. Likewise, if 20% of English words follow a consonant-vowel-consonant pattern, 20% of EP words do as well<sup>2</sup>. In contrast, the NEP language is comprised mostly of phones that do not occur in English (**Figure 1C**) drawn from an inventory of phonemes from across the world's languages<sup>3</sup>. To construct words in the NEP language and develop the NEP phoneme inventory, non-native phones were substituted into EP words maintaining major manner and place features. For example, the word for truck in EP, /hIn/, starts with a glottal fricative while the word for truck in NEP, /xy /, starts with a velar fricative; the bilabial voiceless plosive, /p/, is replaced with a bilabial ejective /p'/, and so on. Thus, the NEP has the same number of phonemes as the EP and English.

<sup>2</sup>Following these constraints, 60 possible words were actually generated, of which 30 were chosen based on English-likeness ratings from native English speaking raters blind to the overall goals of the study (*n* = 10).

**FIGURE 1 | EP and NEP languages.** EP and NEP languages share the same grammar **(A)**, but have different phonological inventories **(B,C)**.

All stimuli from all three languages (English, EP, and NEP) were recorded in a sound booth by the same male native English speaker, who is a trained phonetician. To ensure parity of production fluency, the NEP language was practiced several times until speech rate and duration across EP and NEP were approximately equivalent.

The languages were created in conjunction with a small world of objects and actions. Even with the semantic restrictions imposed by the referent world, there are over 3600 possible sentences. This creates a wide scope for testing participants using novel sentences.

#### **TESTS**

There were 4 tests: vocabulary, verb agreement, noun class, and word order. Each of these tests was administered at various points during training. Here we present results from the final tests (end point) since that was integral to the design of this study<sup>4</sup>. To test vocabulary, participants viewed a picture, heard three possible labels for that picture and indicated which of the three labels they thought matched the picture with a button press. Verb agreement, noun class and word order were also tested. The tests of verb agreement and noun class were forced choice; learners were asked to indicate which of two sentences sounded like a better sentence in the language they just learned. For verb agreement, they chose between a correct subject-verb pairing and an incorrect pairing with every other aspect of the sentences being equivalent (and correct). For noun class, they chose between a sentence with a correct noun class suffix and an incorrect noun class suffix; everything else was equal. The word order test was also forced choice; individuals were presented with a scene and heard two possible sentences that could correspond to that scene. One sentence followed the correct subject-object-verb word order and one flipped this arrangement, having object-subject-verb word order.

<sup>3</sup>One hundred and fifty phones that do not occur in English were chosen from a list of phonemes from across the world's languages (Maddieson, 1984). Native English speaking participants blind to the study design rated these phones, presented individually, on their English-likeness (*n* = 10). The lowest ranked phones (13 vowels, 19 consonants) were chosen for constructing the words.

#### **PROCEDURE**

Learning occurred over the course of 4 days and the fMRI scan occurred on the 5th day. To learn, participants watched a series of short scenes on the computer, listened to their corresponding sentences, and repeated the sentences out loud. In order to better mimic naturalistic language learning (as opposed to classroom L2 learning), learners were not given any direct feedback during this training (Hudson Kam and Newport, 2005, 2009). Days 1–3 each consisted of one 90-minute session during which the 57 scenes (and their corresponding MAL sentences) that comprised the stimulus set were repeated three times.

4The results of earlier tests (not end-point tests presented here) are the subject of another paper currently in preparation. For the purposes of measuring neural activation, we were focused on equating for proficiency prior to scanning.

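**FIGURE 3 |** (caption only partially recovered) [...] learners recruit the superior temporal gyrus more than EP learners (NEP > EP) **(D)**. Heat maps indicate the *t*-statistic.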

Because we know that difficulty of processing and time on task can drive differences in the blood oxygen level dependent (BOLD) response (Whitaker and Ojemann, 1977; Huettel et al., 2009) and because language proficiency impacts neural representation (Perani et al., 1998), we felt that it was important to match participants' learning levels (and not necessarily the amount of exposure to the language) prior to participation in the scan. To ensure no differences, participants were tested on all measures at the end of day 3. If, after day 3, performance was below 75% on any test, participants were given the full 90-minute exposure on day 4 (the 57 scenes presented three times). If performance was above 75% on all measures, participants were given only 30 min of exposure on day 4 (the 57 scenes were presented only once). This design allowed us to control proficiency prior to the scan, allowing the direct comparison of neural responses across the languages even though the NEP should be harder to learn. Accordingly, four NEP and two EP learners received the 90-minute exposure on day 4, while all other learners received 30 min of exposure on day 4.
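The day-4 exposure rule can be written as a small decision function. The following is a minimal sketch, not the authors' code: the function name, the dictionary of test scores, and the treatment of a score of exactly 75% (counted here as meeting the criterion, since the text does not specify it) are illustrative assumptions.

```python
# Illustrative sketch of the day-4 exposure rule; names are hypothetical.
PROFICIENCY_CRITERION = 0.75  # 75% correct, as stated in the text
FULL_SESSION_MIN = 90         # the 57 scenes presented three times
SHORT_SESSION_MIN = 30        # the 57 scenes presented once


def day4_exposure_minutes(day3_scores):
    """Return day-4 exposure in minutes from day-3 proportion-correct scores.

    day3_scores maps a test name to proportion correct (0.0 to 1.0).
    Assumption: a score of exactly 0.75 counts as meeting the criterion.
    """
    if any(score < PROFICIENCY_CRITERION for score in day3_scores.values()):
        return FULL_SESSION_MIN
    return SHORT_SESSION_MIN


# Hypothetical learners (scores invented for illustration only).
strong_learner = {"vocabulary": 0.98, "verb_agreement": 0.85,
                  "noun_class": 0.80, "word_order": 0.95}
weak_learner = {"vocabulary": 0.94, "verb_agreement": 0.70,
                "noun_class": 0.80, "word_order": 0.90}

print(day4_exposure_minutes(strong_learner))
print(day4_exposure_minutes(weak_learner))
```

The point of the rule is that exposure, rather than proficiency, is allowed to vary across learners, so that proficiency is comparable at scan time.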

Neural recruitment was probed on day 5 while individuals determined whether a sentence was grammatical or not in alternating blocks of English or the MAL they learned. Blocks were counterbalanced across participants and conditions; half of the scans began with English and the other half began with the MAL they learned. These were presented in blocks so that learners were not required to switch between languages when making grammaticality judgments. This task was chosen in order to engage regions targeting grammatical processing, and not phonology (at least not directly). For each language, 15% of the items were not grammatical. This percentage was chosen to maximize the number of grammatical trials that can be used for data analysis, while having enough ungrammatical items to hold listeners' attention. Ungrammatical English items were modeled after Johnson and Newport (1989). Half of the ungrammatical MAL items were verb agreement errors and the other half were noun class errors. In this event-related design, each sentence was presented over noise-cancelling earphones for 4 s, after which participants had 2 s to indicate their response. Sentences across the three languages (English, EP, and NEP) were matched for length. Finally, there was a jittered rest period prior to the next trial (from 2 to 8 s; mean length: 5 s). Each trial lasted an average of 11 s; there were 160 trials of each condition, split into 4 runs of 80 trials each.
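The trial arithmetic above (4 s listening + 2 s response + 5 s mean rest = 11 s; 15% of 160 trials = 24 ungrammatical items per language) can be checked with a short sketch. This is an illustration under stated assumptions, not the presentation script used in the study: the function name is hypothetical, the jitter is drawn from a uniform distribution (the text gives only the 2 to 8 s range and the 5 s mean), and block structure is ignored.

```python
import random

LISTEN_S = 4.0     # sentence presentation, from the text
RESPONSE_S = 2.0   # response window, from the text
REST_MIN_S = 2.0   # jitter range, from the text
REST_MAX_S = 8.0
MEAN_REST_S = (REST_MIN_S + REST_MAX_S) / 2  # 5 s, matching the reported mean


def build_trials(n_trials, prop_ungrammatical=0.15, seed=0):
    """Build a hypothetical trial list for one language condition.

    Assumption: rest durations are uniform on [2, 8] s; the paper does not
    state the jitter distribution.
    """
    rng = random.Random(seed)
    n_bad = round(n_trials * prop_ungrammatical)
    labels = [False] * n_bad + [True] * (n_trials - n_bad)
    rng.shuffle(labels)

    trials, onset = [], 0.0
    for is_grammatical in labels:
        rest = rng.uniform(REST_MIN_S, REST_MAX_S)
        trials.append({"onset_s": onset,
                       "grammatical": is_grammatical,
                       "rest_s": rest})
        onset += LISTEN_S + RESPONSE_S + rest
    return trials


trials = build_trials(160, seed=1)
n_ungrammatical = sum(1 for t in trials if not t["grammatical"])
nominal_trial_s = LISTEN_S + RESPONSE_S + MEAN_REST_S
print(n_ungrammatical, nominal_trial_s)
```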

Functional MRI data were acquired on a Siemens MAGNETOM Trio 3T MR Scanner at the Henry H. Wheeler, Jr. Brain Imaging Center at the University of California, Berkeley. Anatomical images consisted of 160 slices acquired using a T1-weighted MP-RAGE protocol (*TR* = 2300 ms, *TE* = 2.98 ms, *FOV* = 256 mm, matrix size = 256 × 256, voxel size 1 × 1 × 1 mm). Functional images consisted of 27 slices acquired with a continuous gradient echoplanar imaging protocol (*TR* = 2000 ms, *TE* = 32 ms, *FOV* = 1380 mm, matrix size = 128 × 128, voxel size 1.8 × 1.8 × 3.5 mm).

#### **fMRI ANALYSIS**

Functional MRI data processing and analysis were completed using a Statistical Parametric Mapping program [SPM5 (Friston et al., 1995)]. Temporal sinc interpolation was used to correct for between-slice timing differences. Motion correction was accomplished using a six-parameter rigid-body transformation algorithm, and data were spatially smoothed using an 8 mm FWHM Gaussian kernel. A statistical parametric map was calculated for each participant based on linear combinations of the covariates modeling each task period (listening and response for English and the newly learned language separately; correct and incorrect trials were modeled separately and only correct trials were included in the final analyses). These individual results were then combined into a group analysis. All data presented refer to the listening (and not response) phase of the experiment.

#### **Table 1 | Univariate activity during new language processing.**

*In this and all other tables presenting univariate data, regions are listed where period-specific parameter estimates were significantly greater than baseline (with a t statistic of 3 and a minimum contiguous cluster size of 10 voxels) across scan times.*

Whole brain conjunction analyses were completed using SPM5, following the minimum statistic, conjunction null method in which all of the comparisons in the conjunction must be individually significant (Nichols et al., 2005). In all cases, the conjunction was conducted for the contrasts (1) English > implicit baseline, and (2) new language (EP or NEP) > implicit baseline. Regions of interest (ROI) were created for the left IFG [Broca's region (Amunts et al., 1999)], the left STG (Morosan et al., 2001), and anterior and posterior regions of the left Angular Gyrus [AGa and AGp (Caspers et al., 2006)] using the SPM Anatomy Toolbox (version 1.6; Simon Eickhoff). The number of overlapping voxels (from the conjunction analysis) was counted within these masks for each individual (normalized space). Voxels reaching a range of thresholds (from *t* = 3 to *t* = 5.5) were identified.
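The overlap count described above can be sketched as follows. Under the minimum-statistic conjunction, a voxel counts as jointly active only if the smaller of its two *t* values reaches the threshold. This sketch is an illustration and not the SPM5 implementation: the function name is hypothetical, the *t* values and ROI mask are invented toy data, and treating "reaching" a threshold as greater-than-or-equal is an assumption. The 0.5 step between thresholds follows the values listed in the Results.

```python
# Toy illustration of counting jointly active voxels inside an ROI mask.
THRESHOLDS = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5]


def conjunction_overlap(t_english, t_new_language, roi_mask, threshold):
    """Count ROI voxels whose minimum t value across both contrasts
    reaches the threshold (minimum-statistic conjunction)."""
    count = 0
    for t_eng, t_new, in_roi in zip(t_english, t_new_language, roi_mask):
        if in_roi and min(t_eng, t_new) >= threshold:
            count += 1
    return count


# Invented per-voxel t values for one hypothetical participant.
t_english = [5.8, 4.2, 3.1, 6.0, 2.0]       # English > implicit baseline
t_new_language = [5.6, 3.6, 4.9, 2.5, 5.9]  # MAL > implicit baseline
roi_mask = [True, True, True, True, False]  # last voxel lies outside the ROI

overlap_counts = [
    conjunction_overlap(t_english, t_new_language, roi_mask, thr)
    for thr in THRESHOLDS
]
print(dict(zip(THRESHOLDS, overlap_counts)))
```

In the study, per-participant counts of this kind were then compared across the EP and NEP groups at each threshold.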

#### **Table 2 | Univariate Across language comparisons.**


#### **Table 3 | Univariate data: EP and English.**



#### **Table 4 | Univariate data: NEP and English.**

In addition, the mean contrast values for processing in the new language (EP or NEP vs. implicit baseline) were extracted from these ROIs (in normalized space) using MarsBar (Brett et al., 2002) and correlated with behavior. Behavioral regressors (learning scores) were included in the second level analysis in order to identify regions across the brain most related to behavior. To measure functional connectivity, the magnitude of the task-related BOLD response was estimated separately for each of the experimental trials, yielding a set of beta values for each condition for every voxel in the brain (beta series). The extent to which two brain voxels interact during a task condition is quantified by the extent to which their respective beta series from that condition are correlated (Rissman et al., 2004).
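The core of the beta series approach is a correlation between trial-wise beta estimates. The sketch below shows only that correlation step on invented numbers. It is not the authors' pipeline: the function names, the voxel labels, and the beta values are all hypothetical, and the estimation of the per-trial betas themselves (done in the general linear model) is assumed to have happened already.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def beta_series_connectivity(seed_betas, target_betas):
    """Correlate a seed's beta series with each target voxel's beta series.

    seed_betas: one beta estimate per trial for the seed region.
    target_betas: dict mapping a (hypothetical) voxel label to its
    per-trial beta estimates for the same trials and condition.
    """
    return {label: pearson_r(seed_betas, series)
            for label, series in target_betas.items()}


# Invented per-trial betas for one condition (illustration only).
seed = [0.2, 0.5, 0.1, 0.9, 0.4]
targets = {
    "voxel_coupled": [2 * b + 1 for b in seed],  # rises and falls with the seed
    "voxel_anticoupled": [-b for b in seed],     # moves opposite to the seed
    "voxel_other": [0.3, 0.1, 0.4, 0.2, 0.5],
}

connectivity = beta_series_connectivity(seed, targets)
for label, r in connectivity.items():
    print(label, round(r, 3))
```

Because each beta reflects one trial's response magnitude, a high correlation indicates that the two locations vary together across trials of the same condition, independently of their mean activation.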

#### **RESULTS**

Due to technical errors during data collection, behavioral data during the scan is missing from one individual (an NEP learner). As expected, repeated measures analyses of variance (ANOVAs) reveal a main effect of language such that performance was better [discrimination sensitivity (*d*′): *F*(1, 17) = 23.130, *p* < 0.001] and faster [*F*(1, 17) = 5.215, *p* = 0.036] for English (mean reaction time from sentence onset = 4392 ms, *SD* = 514) as compared with the MAL (mean reaction time from sentence onset = 4715 ms, *SD* = 317). There was no main effect of learning group [*d*′: *F*(1, 17) = 0.014, *p* = 0.907; reaction time: *F*(1, 17) = 0.198, *p* = 0.662] and no group by language interaction [*d*′: *F*(1, 17) = 1.358, *p* = 0.260; reaction time: *F*(1, 17) = 0.127, *p* = 0.725; EP reaction time: mean = 4721 ms, *SD* = 426; NEP reaction time: mean = 4709 ms, *SD* = 146]. Thus, grammaticality judgments did not differ across groups for either English or MAL during the scan (**Figure 2A**). Likewise, performance across groups was matched prior to the scan overall [average performance on all tests on all test days: *t*(18) = 1.79, *p* = 0.090] and on each grammatical test (average performance on both days tested): noun class *t*(18) = 1.418, *p* = 0.173, verb agreement *t*(18) = 0.916, *p* = 0.372, word order *t*(18) = 0.551, *p* = 0.588; **Figure 2B**<sup>5</sup>.
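For readers unfamiliar with the sensitivity measure reported above, *d*′ is the difference between the z-transformed hit rate and false-alarm rate. The sketch below is illustrative only and does not reproduce the authors' computation. It assumes that a "grammatical" response to a grammatical sentence counts as a hit and a "grammatical" response to an ungrammatical sentence counts as a false alarm, and it applies a log-linear correction to avoid infinite z scores. The text specifies neither choice, and the response counts are invented.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Discrimination sensitivity d' = z(hit rate) - z(false-alarm rate).

    Assumption: a log-linear correction (add 0.5 to each count, 1 to each
    total) keeps rates of exactly 0 or 1 from producing infinite z scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Invented counts for one hypothetical participant: 136 grammatical and
# 24 ungrammatical items, i.e., 15% of 160 trials are ungrammatical.
good_discrimination = d_prime(hits=130, misses=6,
                              false_alarms=4, correct_rejections=20)
no_discrimination = d_prime(hits=68, misses=68,
                            false_alarms=12, correct_rejections=12)
print(round(good_discrimination, 2), round(no_discrimination, 2))
```

Unlike percent correct, *d*′ is not inflated by a bias toward responding "grammatical", which matters here because 85% of the items were grammatical.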

NEP and EP learners both recruited regions known to be critical for language processing while performing grammaticality judgments in English and the MAL they learned (**Figures 3A,B**; **Table 1**); all contrasts reported are during the listening period. One sample *t*-tests reveal that regions recruited by both groups for the newly learned language (vs. implicit baseline) include the left IFG (including Broca's region), the Insula (bilaterally), the STG (bilaterally; including posterior language regions), and the Angular Gyrus (**Figures 3A,B**; **Table 1**).

Across MALs, important differences were observed. Independent sample *t*-tests reveal that EP learners recruit posterior language regions to a greater extent (left temporoparietal region; EP > NEP; **Figure 3C**; **Table 2**), while NEP learners recruit bilateral superior temporal gyrus (STG) more than EP learners (NEP > EP; **Figure 3D**; **Table 2**; see **Tables 3** and **4** for differences between EP/NEP and English).

<sup>5</sup>While vocabulary performance differs across learning groups both overall [across all days tested: *t*(18) = 3.130, *p* = 0.006] and during the final test day [*t*(18) = 2.33, *p* = 0.032], it is very high for both EP (mean = 98.6% correct, *SD* = 0.023) and NEP (mean = 93.7% correct, *SD* = 0.064).

In the next set of analyses, we use overlap and connectivity methods to explore which recruitment profile (EP vs. NEP) is more similar to English, participants' native language. First, if experience-driven neural tuning contributes to sensitive period phenomena, we should observe less overlapping recruitment for the language with distinct phonology (NEP) and English than for EP and English. Both EP and NEP recruitment overlap with English in the IFG, AG, and STG (along with other regions including the Basal Ganglia; **Table 5**; **Figures 4A,B**). To investigate differences across the groups of learners, we counted the number of voxels that were jointly active for English and the new language (EP or NEP; **Figure 4C**) in the left IFG, left STG, and left AG (posterior and anterior) at multiple different thresholds (*t* = 3, 3.5, 4, 4.5, 5, and 5.5; **Figure 4D**). We then compared the means of these values across groups using independent samples *t*-tests (**Table 6**), and found that EP learners have more overlapping recruitment (of the language they learned and English) than NEP learners in the left IFG and AG (both anterior and posterior regions), but not in the left STG (**Table 6**).

For both EP and NEP learners, left IFG activity is related to behavioral performance whereas activity in the STG and AG is not. That is, the magnitude of recruitment within the left IFG while processing the newly learned language (EP or NEP > implicit baseline) is correlated with learning (average of all tests collected prior to the scan, *r* = 0.488, *p* = 0.029; **Figure 5A**) and performance on grammaticality judgments for the newly learned language in the scanner (percent correct: *r* = 0.507, *p* = 0.027; **Figure 5B**<sup>6</sup>; and a trend toward a relationship with *d*′: *r* = 0.418, *p* = 0.075; **Figure 5C**). These relationships were not observed in the STG (learning: *r* = 0.096, *p* = 0.687; percent correct: *r* = −0.098, *p* = 0.691; *d*′: *r* = −0.103, *p* = 0.674) or AG (anterior: learning: *r* = 0.169, *p* = 0.687; percent correct: *r* = −0.004, *p* = 0.986; *d*′: *r* = −0.146, *p* = 0.552; posterior: learning: *r* = 0.037, *p* = 0.875; percent correct: *r* = −0.116, *p* = 0.637; *d*′: *r* = −0.249, *p* = 0.304)<sup>7</sup>. Interestingly, this relationship between learning and performance in the IFG appears to be specific to the newly learned language (the MAL). The same relationship is not observed in the left IFG for making grammaticality judgments in English while processing English (percent correct: *r* = 0.155, *p* = 0.525; *d*′: *r* = 0.286, *p* = 0.235; reaction time: *r* = 0.194, *p* = 0.427). This was also true of the left STG (percent correct: *r* = −0.155, *p* = 0.525; *d*′: *r* = −0.137, *p* = 0.576; reaction time: *r* = −0.073, *p* = 0.766), left AGa (percent correct: *r* = −0.058, *p* = 0.815; *d*′: *r* = −0.122, *p* = 0.619; reaction time: *r* = 0.418, *p* = 0.075), and left AGp (percent correct: *r* = −0.134, *p* = 0.586; *d*′: *r* = −0.123, *p* = 0.615; reaction time: *r* = 0.434, *p* = 0.063). It is likely that such a brain-behavior relationship (with English) is not detectable when the language is well-established (due to ceiling effects and a lack of variability) and might be more detectable earlier in the learning process, as is observed in these data for MAL learners.

#### **Table 5 | Conjunction Analyses.**

*Regions are listed where conjunctions were significant after correcting for multiple comparisons (FDR, p < 0.01).*

In order to localize where within the left IFG the relationship between learning and neural recruitment while processing the MAL (MAL > baseline) is strongest, we entered learning scores as a regressor in the group level whole-brain analysis and found the strongest relationship in the left IFG (MNI peak coordinates: −40, 18, 12), which corresponds with the Pars Triangularis (note other relationships within the right IFG and Basal Ganglia; **Table 7**).

These data establish an important role of the left IFG in learning the MAL and in performance while making grammaticality judgments in the new language. Whole brain analyses also establish the importance of the STG while processing these newly learned languages, especially for NEP learners (left STG recruitment is greater for NEP than EP learners; **Figure 3D**). If this region is not important for making grammaticality judgments or overall learning, then why are NEP learners recruiting this region more so than EP learners? To address this question, we performed functional connectivity analyses by choosing seed regions in the left IFG and the left STG [the 10 most active contiguous voxels within the anatomical region while processing *English* (English > implicit baseline)] and searched for correlated fluctuations in activity (with the time series in the seed region: beta series analysis) across the brain while individuals were processing the MAL they learned (vs. implicit baseline) (Rissman et al., 2004). First, expected beta series correlations were observed in EP and NEP learners with classic language regions in both hemispheres (**Table 8**). Notably, the left STG seed was coactive with the left IFG (*t* = 4.34, *p* < 0.001; **Figure 6A**; **Table 8**) and the posterior left temporal-parietal-occipital region [also important for higher-order language processing (Poeppel and Hickok, 2004), *t* = 4.44, *p* < 0.001] in NEP but not EP learners (**Figure 6A**; **Table 8**). The STG appears to be more involved in the neural network involved in processing the MAL in the NEP learners, a finding that could shed light on why NEP learners recruit this region more.

<sup>6</sup>Notice there is one statistical outlier who has very low accuracy (55%). This subject's performance was also low on grammaticality judgments in English (60%) and so this low performance is likely due to factors other than not learning the new language. Only correct trials were included in the brain analyses and this brain-behavior correlation remains significant when this outlier is excluded (percent correct: *r* = 0.523, *p* = 0.026).

<sup>7</sup>Note that these relationships (between learning and recruitment of the IFG, and between percent correct and recruitment of the IFG) are only marginally significant when corrections for multiple comparisons are made (Bonferroni *p* for 3 tests per *DV* = 0.017).

Is this broader network recruited by NEP learners more similar to or distinct from English? To understand how networks differ from English (and thus what is more similar to native language recruitment), we conducted the same connectivity analysis (Rissman et al., 2004) in the same seed regions (IFG and STG) for a different contrast, newly learned language vs. English (MAL > English), to reveal regions that are more co-active for processing the MAL vs. English. For EP learners, the left IFG seed was more coactive with the contralateral (right) IFG (*t* = 6.12, *p* < 0.001), and the left STG seed was also more co-active with the contralateral (right) STG (*t* = 7.83, *p* < 0.001), for MAL processing as compared to English. For the NEP learners, the left IFG seed was more co-active with the bilateral STG (left: *t* = 6.50, *p* < 0.001; right: *t* = 3.77, *p* < 0.001), and the left STG seed was more coactive both with the contralateral (right) STG (*t* = 6.08, *p* < 0.001) and ipsilateral (left) IFG (*t* = 4.35, *p* < 0.001), for MAL processing as compared to English (**Figure 6B**; see **Table 8** for all comparisons with English). In sum, the EP network differs from English with greater recruitment of the contralateral hemisphere (both for the IFG and STG) and the NEP network differs from English with greater coactivity between the STG and IFG regions. Both connectivity profiles differ in important ways from English, with EP learners being less lateralized and NEP learners showing greater coactivity between the IFG and STG.

#### **DISCUSSION**

In this study, we asked whether tuning to the properties of one's native language can explain, at least in part, the sensitive period for language learning. In particular, we asked whether changing an earlier-learned (and tuned) aspect of language (sound structure) would have an impact on the neural representation of a later-learned aspect (grammar). The data clearly indicate that it does. EP learners' neural recruitment overlaps more with English in key language regions (including the left IFG and left AG). Likewise, the neural circuit recruited to process the EP language is similar to the neural circuit recruited during the processing of English, albeit less lateralized (including contralateral regions). EP learners also recruit the left temporo-parietal region more than the NEP learners, a finding that could reflect greater phonetic expertise and sensory-motor integration (Buchsbaum et al., 2001). NEP learners, on the other hand, recruit the STG (bilaterally) more than EP learners. Moreover, this region appears to be part of the broader and less lateralized neural circuit used to process the NEP language that involves greater STG/IFG connectivity. We review the implications of these findings with respect to the tuning hypothesis.

Native language regions were less involved in the processing of the NEP as compared to the EP language. This was evident in the left IFG and AG, where recruitment overlapped more for English and EP than English and NEP. This pattern of findings supports our tuning hypotheses: the NEP could overlap less with English simply because cortex used for processing English is tuned for English and therefore less able to process the NEP language.

Greater recruitment of STG in NEP learners also supports the idea that native language regions are not as capable of processing the NEP language. The STG is known to be involved in phonetic processing (Hickok and Poeppel, 2000), including the perception of speech sounds (Buchsbaum et al., 2001), is engaged to a greater degree bilaterally when individuals process non-native phonological distinctions (Zhang et al., 2005), and is associated with successful learning of non-native pitch patterns in speech (Wong et al., 2007). The greater recruitment of this region for NEP learners could therefore reflect the brain's ongoing tuning to the new sounds<sup>8</sup>. With more exposure to the language or perhaps more direct training on the sounds, we would expect NEP learners to recruit this region less over time.

Proficiency and fluency with language (Perani et al., 1998; Chee et al., 2002; Consonni et al., 2013) as well as cognitive demand (difficulty, more broadly construed) are important factors known to influence neural recruitment, especially in the prefrontal cortex, including the left IFG (Raichle et al., 1994; Rypma and D'Esposito, 2000; Crittenden and Duncan, 2012), both in terms of degree of recruitment (magnitude) and how the region interacts with other regions (Rypma et al., 2006; Rissman et al., 2008). Differences in recruitment across EP and NEP learners could therefore be related to these known factors. Importantly, EP and NEP learners did not differ in terms of reaction time or accuracy when assessing the grammaticality of sentences in the scanner. Likewise, we do not observe differences in the pure univariate contrast EP vs. NEP in the left PFC; rather, differences are observed in degree of overlap with English and connectivity with the STG. Observed differences across languages are therefore likely to reflect requirements imposed by phonological processing and attempts to process (and tune to) the new sounds.

*[Figure panel title: MAL recruitment and learning score (pre-scanner)]*

<sup>8</sup>The STG is of course not the only region in the brain that is associated with phonological processing. In fact, prefrontal regions (the IFG) are associated with phonological decoding and processing, and the Middle Temporal Gyrus (MTG) is also widely implicated along with more posterior superior temporal regions [see Poeppel and Hickok (2004) for a comprehensive review]. Likewise, successful learning of non-native contrasts is associated with recruitment of the same regions used for native contrasts: the left STG, the insula (frontal operculum), and left IFG (Golestani and Zatorre, 2004).

#### **Table 8 | Beta series correlations.**

While the STG appears to be involved in tuning to new sounds, recruitment of the left IFG appears to be more related to performance and learning. Indeed, recruitment of the left IFG (but not the left STG) was significantly correlated with performance in the scanner and, even more strikingly, with learning measured prior to the scan. NEP learners' greater recruitment of STG (independently and as part of the larger language network) does not directly relate to performance. Why then are they recruiting this region so robustly? It is likely that this recruitment reflects an attempt to process (and tune to) the new sounds (Zhang et al., 2005, 2009; Wong et al., 2007).

At present, however, we cannot know for certain whether this is the case. While differences in the STG across the learning groups are especially striking, training studies such as these are expensive and limited in size (only 20 learners overall), which limits the generalizability of the data. In addition, even though creating these productive MALs allows for strict control over the linguistic features of interest (both grammar and phonology), they are nonetheless still miniature and artificial. It is hard to know if differences we observe here would scale to real and larger languages. Along these lines, future research should investigate the relationship between the recruitment of the STG and IFG over time with growing phonological as well as grammatical expertise. By measuring changes in phonological expertise more directly, the "phonetic scaffold" could be characterized more fully and the influence of this learning on grammar learning (both behaviorally and in the brain) could be much better understood. Exposure is also likely to impact learning outcomes. It could be (and is very likely) that 4 days of exposure to novel phonology is not nearly enough to build the phonemic maps necessary to process new sounds, but increased exposure would result in overcoming this and developing the requisite "scaffolding." Delays in the making of this scaffold are likely to be part of the cause of adult language-learning difficulties, and further work needs to characterize this alongside grammatical learning during longer periods of time in adults.

Further work characterizing the anatomical and functional specificity of these scaffolds is also necessary. Much recent work aims to characterize the functional specificity of sub-regions both within the IFG (Fiebach et al., 2006; Fedorenko et al., 2011) and the STG (Indefrey and Levelt, 2004) and to more carefully specify the functional anatomy of language (Poeppel and Hickok, 2004). While this is not possible in the current sample (functional localizers were not employed and the sample is insufficient for extensive brain-behavior analyses), it should be an important goal of future investigation, especially for thinking about possible learning interventions.

Despite the need for further studies, our findings have implications for understanding the sensitive period for language learning. Neural recruitment, even when proficiency is matched, differs across EP and NEP learners. The ways in which this recruitment differs (additional STG, less overlap with English in the left IFG) are consistent with the nested tuning theory, which predicts that differences in more foundational aspects of language (such as sounds) should have implications for the neural representation of aspects of language that depend on the foundational ones (grammar). We show that they do. Adults' difficulty in learning language may therefore be due to the recruitment of the "wrong" neural scaffolding.

#### **AUTHOR CONTRIBUTIONS**

Amy S. Finn and Carla L. Hudson Kam developed the idea for the study. Marc Ettlinger and Mark D'Esposito contributed to the study design. Testing and data collection were performed by Amy S. Finn. Amy S. Finn, Jason Vytlacil, and Marc Ettlinger performed the data analysis and interpretation under the supervision of Carla L. Hudson Kam and Mark D'Esposito. Amy S. Finn drafted the paper, and all co-authors provided critical revisions. All authors approved the final version of the paper for submission.

#### **ACKNOWLEDGMENTS**

This research was funded by NIH [Grants MH63901 and NS40813 (Mark D'Esposito), HD04857 (Carla L. Hudson Kam)] and an NSF Graduate Research Fellowships Program Award (Amy S. Finn). We thank Ashley Smart, Joscelyn Daguna, and Polly Chen for their assistance.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 May 2013; accepted: 25 October 2013; published online: 12 November 2013.*

*Citation: Finn AS, Hudson Kam CL, Ettlinger M, Vytlacil J and D'Esposito M (2013) Learning language with the wrong neural scaffolding: the cost of neural commitment to sounds. Front. Syst. Neurosci. 7:85. doi: 10.3389/fnsys.2013.00085*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2013 Finn, Hudson Kam, Ettlinger, Vytlacil and D'Esposito. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Learning, neural plasticity and sensitive periods: implications for language acquisition, music training and transfer across the lifespan

## *Erin J. White<sup>1</sup>, Stefanie A. Hutka<sup>2</sup>\*, Lynne J. Williams<sup>3</sup> and Sylvain Moreno<sup>3</sup>*

*<sup>1</sup> Rotman Research Institute, Baycrest, Toronto, ON, Canada*

*<sup>2</sup> Department of Psychology, University of Toronto, Toronto, ON, Canada*

*<sup>3</sup> Centre for Brain Fitness, Baycrest, Toronto, ON, Canada*

#### *Edited by:*

*Virginia Penhune, Concordia University, Canada*

#### *Reviewed by:*

*Christopher I. Petkov, Newcastle University, UK*

*Aniruddh Patel, Tufts University, USA*

#### *\*Correspondence:*

*Stefanie A. Hutka, Rotman Research Institute, Baycrest, 3560 Bathurst Street, Toronto, ON M6A 2E1, Canada e-mail: shutka@research.baycrest.org*

Sensitive periods in human development have often been proposed to explain age-related differences in the attainment of a number of skills, such as a second language (L2) and musical expertise. It is difficult to reconcile the negative consequence this traditional view entails for learning after a sensitive period with our current understanding of the brain's ability for experience-dependent plasticity across the lifespan. What is needed is a better understanding of the mechanisms underlying auditory learning and plasticity at different points in development. Drawing on research in language development and music training, this review examines not only *what* we learn and *when* we learn it, but also *how* learning occurs at different ages. First, we discuss differences in the mechanism of learning and plasticity during and after a sensitive period by examining how language exposure versus training forms language-specific phonetic representations in infants and adult L2 learners, respectively. Second, we examine the impact of musical training that begins at different ages on behavioral and neural indices of auditory and motor processing as well as sensorimotor integration. Third, we examine the extent to which childhood training in one auditory domain can enhance processing in another domain via the transfer of learning between shared neuro-cognitive systems. Specifically, we review evidence for a potential bi-directional transfer of skills between music and language by examining how speaking a tonal language may enhance music processing and, conversely, how early music training can enhance language processing. We conclude with a discussion of the role of attention in auditory learning during and after sensitive periods and outline avenues of future research.

**Keywords: sensitive period, learning, plasticity, language, second language, music, transfer, attention**

### **INTRODUCTION**

The primary auditory cortex (A1) is shaped by our experience with sounds in our environment. Incoming sounds sum in the auditory nerve response. Yet, from this summed signal, the neural networks underlying auditory processing extract the features that segregate auditory objects and derive meaning (Bregman, 1994; Werner, 2012). Language and music are among the most cognitively complex uses of sound by humans; however, humans have the capacity to readily acquire both skills early in life as a result of exposure to and interaction with sound environments. A central question of neurobiology and human development is whether this learning is contingent on the developmental timing of exposure, that is, whether there may be sensitive periods in development during which learning and its corresponding neural plasticity occur more readily than at other points.

Sensitive periods are epochs in development during which specific experiences have enhanced, long-lasting effects on behavior and the brain (Knudsen, 2004; Penhune, 2011). During these times, there is increased sensitivity to regularities in sensory input that are readily extracted through exposure to and interaction with the environment. As such, they are an optimal time for learning (Werker and Tees, 2005). The term "critical period" is often used interchangeably with "sensitive period," although important distinctions exist between them. *Critical periods* posit short and sharply defined windows of opportunity during which exposure to environmental input causes irreversible changes in brain function and structure, whereas *sensitive periods* involve gradual shifts in sensitivity to environmental input, outside of which learning is still possible (Lamendella, 1977; Oyama, 1979). The broader term "sensitive period" will be used here to refer to periods in development in which experience has unusually strong effects on brain and behavior (Knudsen, 2004) and to underscore the potential for learning and brain plasticity to continue throughout the lifespan. Sensitive periods are thought to underpin the development of a variety of auditory skills, from the basic encoding of acoustic information in the primary auditory cortex (A1) (De Villers-Sidani et al., 2007; De Villiers-Sidani et al., 2008) to many higher-order aspects of language (e.g., Johnson and Newport, 1989; Kuhl, 2010) and music processing (e.g., Penhune, 2011).

The goal of this review is to better understand the mechanisms by which learning and plasticity occur both during and after sensitive periods in auditory development. In the following sections we first give an introduction to general mechanisms by drawing on animal models of auditory development and perceptual learning. Next, we examine three issues that are specific to human auditory development: (1) the role of language exposure versus training in initiating the formation of language-specific phonetic representations in infants and adult second language (L2) learners; (2) the outcome of training that begins at different points in development on neural and behavioral correlates of sensorimotor, motor and auditory processing using music as a platform; and (3) the extent to which childhood auditory experiences, be it with music or speech, result in domain-general enhancements in auditory and auditory-attentional processing. We conclude with critical considerations about the role of selective attention during and after sensitive periods and present directions for future research.

## **AUDITORY LEARNING AND PLASTICITY DURING A SENSITIVE PERIOD**

Although there may be multiple sensitive periods, each guiding different aspects of auditory development, the mechanism by which learning and plasticity occur is similar. At the beginning of a sensitive period, neural representations are rather broadly tuned to relevant environmental stimuli (Dahmen and King, 2007; Scott et al., 2007). Broad tuning is advantageous because it allows the developing brain to perceive and respond to the features of the sensory environment. Throughout the sensitive period, neural representations become increasingly refined and begin to preferentially respond to frequently encountered features (Scott et al., 2007), thereby allowing for more accurate and efficient processing of salient and frequently encountered information (Kuhl et al., 2008).

Across multiple sensory systems, learning and plasticity during sensitive periods is a "bottom-up" process, characterized by a perceptual narrowing in which perceptual discrimination and underlying neural representations become increasingly selective in their responsiveness to environmental input (Werker and Tees, 1984; Scott et al., 2006, 2007; Kuhl and Rivera-Gaxiola, 2008). It is this initial under-specification of neural systems that is thought to drive the rapid changes that are observed during this time in response to exposure to environmental stimuli (Knudsen, 2004). Within the auditory system, perceptual narrowing during specific sensitive periods in development characterizes how infants learn to group speech sounds into language-specific phonetic categories (Werker and Tees, 1984), process culture-specific musical rhythms (Hannon and Trehub, 2005a,b) and harmonic relationships (Lynch et al., 1990), as well as encode basic auditory features in the primary auditory cortex (A1) (Zhang et al., 2002).

Animal models of auditory development have informed our understanding of the time course over which auditory experience becomes represented in the primary auditory cortex (A1). In prenatal development, animal models show that spontaneous rhythmic sound pulses create rudimentary tonotopic maps (Lippe, 1994, 1995; Jones et al., 2007). Following birth, these underspecified tonotopic maps enhance their response specificity through exposure to complex sound streams in the environment, which results in the formation of highly organized maps that are dynamically regulated by environmental input (De Villers-Sidani et al., 2007; De Villiers-Sidani et al., 2008; Zhang et al., 2001, 2002). For example, De Villers-Sidani et al. (2007) exposed rat pups to a series of repetitive tones and found abnormal tonotopic map development. That is, in these rats more neurons were devoted to processing the frequencies of the repeated tones, with consequently fewer neurons devoted to processing other tone frequencies, relative to rat pups raised in a normal acoustic environment. Evidence for sensitive periods in audition also comes from studies of disrupted or altered auditory input at different ages (see e.g., Zhang et al., 2002; Chang and Merzenich, 2003; Chang et al., 2005; Takahashi et al., 2006). Zhang et al. (2002) exposed 9-day-old rat pups and adult rats to 20 days of pulsed white noise, disrupting the normal temporal patterns of neural discharge that represent specific auditory inputs. At 80 days postnatally, they found degraded tuning curves in A1 in noise-reared rat pups. The tuning curves were broader than in control pups, with multiple peaks in their receptive fields. Moreover, this disordered auditory representation was maintained, with the tonotopic map representing only a two-way distinction between high and low frequency sounds. Adult rats, by contrast, did not show any significant changes to their pre-existing auditory neural representations when exposed to prolonged noise pulses. The effects appear to result from exposure during key, and sometimes very narrow, developmental epochs (De Villers-Sidani et al., 2007; De Villiers-Sidani et al., 2008).
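The over-representation of repeated tones described above can be illustrated with a toy simulation. This is only a sketch under strong assumptions and not a model taken from the cited studies: the ten model units, the arbitrary frequency scale, and the names `expose` and `units_near` are hypothetical. Each repetition of the exposed tone pulls the preferred frequency of nearby units toward it, so more units end up tuned to that tone and fewer remain tuned to neighbouring frequencies.

```python
def expose(preferred, tone, rate=0.2, bandwidth=2.0, repeats=20):
    """Pull every unit tuned within `bandwidth` of `tone` toward the tone."""
    tuned = list(preferred)
    for _ in range(repeats):
        tuned = [
            f + rate * (tone - f) if abs(tone - f) <= bandwidth else f
            for f in tuned
        ]
    return tuned


def units_near(preferred, tone, window=0.5):
    """Count units whose preferred frequency lies within `window` of `tone`."""
    return sum(abs(f - tone) <= window for f in preferred)


# Ten model units with evenly spaced preferred frequencies (arbitrary units).
baseline = [float(f) for f in range(1, 11)]
# Hypothetical rearing condition: one tone at frequency 5.0, repeated.
reared = expose(baseline, tone=5.0)

print("units near exposed tone:", units_near(baseline, 5.0), "->", units_near(reared, 5.0))
print("units near neighbouring tone:", units_near(baseline, 3.0), "->", units_near(reared, 3.0))
```

With these illustrative settings the count of units near the exposed tone rises while the neighbouring frequency loses its unit, which mirrors the qualitative pattern reported for tone-reared pups, not its magnitude.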

## **AUDITORY LEARNING AND PLASTICITY AFTER A SENSITIVE PERIOD**

In contrast to other sensory systems, A1 appears to have an extended period of heightened developmental plasticity, with changes in cellular organization and connectivity continuing throughout childhood (for reviews see Kral and Eggermont, 2007; Penhune, 2011). Indeed, A1 shows considerable changes as a result of perceptual training even into adulthood (Recanzone et al., 1993; Feldman and Brecht, 2005; Polley et al., 2006; for reviews, see Fahle, 2009; Blundon et al., 2011; Chun et al., 2013). However, the conditions that induce plasticity appear to change with age and experience; namely, the bottom-up learning of the sensitive period becomes increasingly influenced and gated by top-down processes (Ahissar et al., 1992; Crist et al., 2001; Fritz et al., 2005, 2007; Polley et al., 2006; Froemke and Martins, 2011). Bottom-up and top-down processes are the two ends of a continuum that describes the relative weight of external environmental signals versus internal cognitive processes in driving cortical map plasticity. Bottom-up learning is largely a data-driven process, whereby exposure to frequently encountered stimulus features refines their corresponding neural representations (Scott et al., 2007). Once rudimentary representations and higher-order categories are formed, they begin modulating sensory feature processing in an increasingly top-down manner (Kral and Eggermont, 2007). Attention also provides top-down input that, with development, increasingly interacts with and shapes bottom-up signals (Jagadeesh, 2006). Although both processes interact throughout development, the close of a sensitive period may lie in a shift in the relative reliance on bottom-up versus top-down processing in learning.

For example, Polley et al. (2006) selectively trained two groups of adult rats to make a snout press to either the frequency or the intensity of the same auditory stimuli, which varied in both dimensions. They hypothesized that if bottom-up processes were primarily responsible for adult cortical plasticity, as in juvenile animals, mere exposure to frequency and intensity variation would be enough to elicit the same plastic changes in the representation of both frequency and intensity in both groups. Yet electrophysiological recordings revealed functional changes in primary and secondary auditory cortices that were associated with perceptual learning of task-relevant stimulus features and not of stimulus-general features. In other words, a double dissociation was observed between the groups, with no change in cortical map representations observed for task-irrelevant features. Different profiles of neural plasticity were observed despite exposure to the same auditory stimuli, which was taken as evidence that adult cortical plasticity may be modulated by top-down inputs that signal the importance and relevance of particular stimulus features.

Thus, while cortical maturation results in a progressive decline in the capacity for bottom-up processes to induce auditory plasticity, the concurrent development of higher-order auditory representations (e.g., categories) and other top-down influences such as attention regulation increasingly complements bottom-up processes to modulate the residual capacity for adult cortical reorganization (Kral and Eggermont, 2007). Although both processes may interact throughout the lifespan, sensitive periods and age-related changes in the propensity for learning from mere exposure may be associated with a developmental shift in the relative reliance on bottom-up versus top-down processes. Language acquisition provides a good illustration.

## **EVIDENCE FOR A SENSITIVE PERIOD IN THE PERCEPTION OF SPEECH SOUNDS**

Language is often taken as a classic example of sensitive periods in neurobiology and human development (Lennenberg, 1967; Hensch, 2004; Knudsen, 2004; Kuhl, 2010). However, not all aspects of language display the same temporally defined windows of opportunity. Vocabulary learning, for example, continues throughout life, though there is rapid growth around 18 months of age (Long, 1990; Kuhl, 2010). In contrast, the degree and timing of neuroplasticity for phonology and syntax are thought to be highly sensitive to the age at which language exposure occurs (Werker and Tees, 2005; Stevens and Neville, 2009). Although issues remain concerning the timing and extent to which sensitive periods may guide phonological development, the general consensus is that a sensitive period exists for phonetic learning (e.g., Kuhl, 2010).

#### **EARLY LANGUAGE EXPOSURE RESULTS IN A PERCEPTUAL SHIFT**

Language development during the first year of life is characterized by a shift from language-universal to language-specific phonetic perception (Werker and Tees, 1984, 2002, 2005). At birth, innate perceptual sensitivities allow young infants to categorically perceive and discriminate virtually any speech sound in any language, even those to which they have not been exposed (Eimas et al., 1971; Jusczyk and Luce, 2002). However, between 6 and 12 months of age, infants' auditory systems begin a dramatic perceptual shift that directs how they respond to speech sounds. During this time, which some view as the sensitive period for phonetic learning (e.g., Kuhl, 2010), exposure to the language(s) used in their environment is thought to guide infants' formation of language-specific phonetic representations that serve optimal processing of their native language(s) (Kuhl et al., 2003). Following Hebbian principles (neurons that fire together, wire together; Hebb, 1949), this exposure strengthens the neural representations for speech sounds in infants' native language(s), while neural representations of unused phonetic distinctions weaken (McClelland, 2001). Infants' progressive reduction in sensitivity to phonetic distinctions that are not used in the language(s) of exposure has been documented for a variety of non-native consonant (Werker et al., 1981; Werker and Tees, 1984), vowel (Polka and Werker, 1994; Bosch and Sebastian-Galles, 2003) and lexical tone (Mattock et al., 2008) contrasts.

However, more recently, research has shown that this phonetic shift also results in perceptual gains, conferring an enhanced sensitivity to frequently encountered, meaningful phonetic distinctions in the native language(s) that facilitates future language learning (Kuhl et al., 2005, 2008). For example, Kuhl et al. (2008) reported that event-related potential (ERP) correlates of phonetic discrimination (the mismatch negativity, MMN; Näätänen et al., 1997) measured at 7.5 months in response to native-language phonetic contrasts were positively correlated with measures of vocabulary and syntactic development up to 2 years later. By contrast, larger MMNs in response to non-native phonetic contrasts were associated with fewer words and less complex sentences 2 years later. The authors suggest that infants' discrimination of the native and non-native phonetic contrasts reflects important differences in brain development: better discrimination of non-native contrasts reflects an immature developmental stage in which the infant's auditory system has not yet committed to relevant native-language speech patterns, whereas enhanced native-language discrimination is associated with neural circuits that have already begun specializing to the speech patterns present in the input language. This underscores the importance of language experiences during a sensitive period: the earlier language-specific neural representations of phonetic categories are formed, refined and stabilized, the earlier and more efficiently they can guide other aspects of language learning.

What guides infants' shift in phonetic perception and the formation of language-specific neural representations? There is evidence that this perceptual shift is dynamically regulated by the statistical distribution of phonetic variation in the language(s) that the infant is exposed to, which suggests that a bottom-up learning mechanism also drives the development of speech perception. In their seminal study, Maye et al. (2002) examined infants' discrimination of a non-native phonetic contrast after a brief 2-min exposure to speech sounds from a phonetic continuum that displayed one of two frequency distributions: (1) bimodal, where tokens from endpoints of the continuum were presented relatively more often; or (2) unimodal, where tokens from the center of the continuum were presented relatively more often. In the test phase, only the infants exposed to the bimodal frequency distribution could discriminate the phonetic contrast, even though both groups were exposed to the same stimuli. The authors posited that sensitivity to the statistical distribution of speech sounds is one tool that infants use to determine which acoustic variations are more reliable and therefore more informative for differentiating phonetic categories in the language(s) they are learning. A bottom-up, domain-general statistical learning mechanism has been proposed to underpin other aspects of early language development, including the ability to accurately segment words (Saffran et al., 1996) and order them according to syntactical rules (Saffran and Wilson, 2003). Thus, the perceptual re-organization associated with the establishment of language-specific phonemic representations appears to develop in a bottom-up manner.
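The logic of distributional learning can be made concrete with a small calculation. The sketch below is not the procedure or stimulus set used by Maye et al. (2002); the eight-step continuum, the presentation counts, and the names `BIMODAL_COUNTS`, `UNIMODAL_COUNTS`, `expand` and `two_cluster_gain` are assumptions chosen for illustration. It asks how much of the variance in the exposure tokens is removed by splitting the continuum into two categories at its midpoint, which is one crude way to quantify how strongly each distribution supports a two-category analysis.

```python
from statistics import pvariance

# Hypothetical presentation counts for an 8-step phonetic continuum.
# Illustrative values only; both lists use the same 8 tokens and 32 trials.
BIMODAL_COUNTS = [1, 4, 8, 3, 3, 8, 4, 1]   # two peaks, one toward each end
UNIMODAL_COUNTS = [1, 3, 4, 8, 8, 4, 3, 1]  # single peak at the centre


def expand(counts):
    """Turn per-step counts into a flat list of token positions (1..8)."""
    return [step for step, n in enumerate(counts, start=1) for _ in range(n)]


def two_cluster_gain(counts, boundary=4.5):
    """Fraction of total variance removed by splitting tokens at `boundary`."""
    tokens = expand(counts)
    total = pvariance(tokens)
    low = [t for t in tokens if t < boundary]
    high = [t for t in tokens if t > boundary]
    within = (len(low) * pvariance(low) + len(high) * pvariance(high)) / len(tokens)
    return 1 - within / total


bimodal_gain = two_cluster_gain(BIMODAL_COUNTS)
unimodal_gain = two_cluster_gain(UNIMODAL_COUNTS)
print(f"bimodal: {bimodal_gain:.3f}  unimodal: {unimodal_gain:.3f}")
```

Under these assumed counts the bimodal exposure yields the larger gain, so a learner tracking token frequencies has more evidence for two categories even though both groups hear the same eight tokens.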

Work with near-infrared spectroscopy (NIRS) suggests that the developmental shift towards differentiating language-specific phonetic contrasts coincides with changes in the auditory network subserving phonetic processing, in particular the development of left-lateralization (for reviews see Minagawa-Kawai et al., 2008; Obrig et al., 2010). For example, Minagawa-Kawai et al. (2007) presented five groups of infants (aged 3–4, 6–7, 10–11, 13–14 and 25–28 months) with vowel duration contrasts that corresponded to across- or within-phonetic boundary changes in their native language (Japanese). Phonemic-specific responses (i.e., larger cerebral hemodynamic responses for across- compared to within-phonetic category changes) were transiently observed in 6- to 7-month-old infants, before stabilizing in infants 12 months and older. After 12 months, phonemic-specific responses also began showing a left-hemisphere dominance, as in adult native speakers. The authors interpret these findings as a developmental shift in the mechanisms used for phonetic discrimination, from more general auditory processing at 6–7 months to more linguistic-specific processing after 12 months.

In sum, language-specific left-dominant phonemic category representations appear to develop in a bottom-up manner as a result of language-specific experience during the first year of life. Once rudimentary phonemic category representations exist, they enter into a feedback relationship in which they increasingly guide speech perception in a top-down manner (Kral and Eggermont, 2007) and bootstrap further language development (Kuhl et al., 2008). Infants' period of heightened sensitivity to the distribution of phonetic cues in their language(s) of exposure (i.e., the sensitive period for phonetic learning) may end when the underlying neural representations of phonemic categories reach a finite point of specificity and stability (Kuhl et al., 2008). Although this may be advantageous for processing one's native language(s), it can have deleterious consequences for processing new stimuli with a different distribution of acoustic features. Such is the case for adult L2 learners.

#### **EXPOSURE VERSUS TRAINING IN SECOND LANGUAGE LEARNING AFTER A SENSITIVE PERIOD**

Examining the process and outcome of L2 learning at different points in development provides a unique perspective into sensitive period effects. In particular, examining L2 acquisition in adult learners allows us to examine the extent to which neural systems that were established for optimal processing of one set of inputs (i.e., a first language; L1) can later be adapted in order to process another set of language inputs (i.e., L2) more effectively. Moreover, L2 learning can occur at different ages, in a variety of L1 speakers and through different learning experiences (e.g., implicit learning through exposure vs. explicit training). Consequently, L2 acquisition provides a unique model for examining how experiential and maturational factors interact to facilitate or restrict learning throughout the lifespan.

The most controversial issues in the field of L2 acquisition are the extent to which a learner's age impacts his/her ultimate L2 attainment level and whether there may be one or more sensitive periods in language development that limit lifelong L2 learning (e.g., Singleton and Ryan, 2004; Birdsong, 2006). Successfully acquiring L2 phonology is highly sensitive to the age at which learning begins (for review, see Piske et al., 2001). For example, Flege et al. (1999b) examined the pronunciation skills of a large sample of native Korean speakers who had arrived in the United States between the ages of 1 and 23 years and who, upon arrival, began intensive English L2 learning. Results showed a positive correlation between degree of foreign accent and age of arrival (even after controlling for years of education, length of residence and L1/L2 use). In contrast, the correlation between age of acquisition and performance on a grammaticality judgement task was not significant after controlling for these confounding variables. The authors took this as evidence that age of acquisition may exert a greater impact on L2 pronunciation than on morpho-syntactic skills (cf. Johnson and Newport, 1989 for a discussion of how L2 morphosyntax acquisition may also be vulnerable to delays in acquisition). Age of acquisition effects have also been reported for the perception of non-native phonetic contrasts (Flege et al., 1999a).

What causes these age of acquisition effects in successful L2 phonological attainment? The difficulties that late L2 learners experience with L2 perception and production after years of regular L2 exposure have been taken as evidence that successful L2 phonetic learning and its corresponding neural plasticity may not be possible after a sensitive period has ended (see, e.g., Long, 1990; Pallier et al., 1997; Sebastián-Gallés and Soto-Faraco, 1999; Sanders et al., 2008). The close of sensitive period(s) for language development and the resulting decreased capacity for L2 learning with age have been tied to brain maturation (e.g., Lennenberg, 1967; Scovel, 1988; Johnson and Newport, 1989). Maturational declines in synaptic density, decreased levels of brain metabolism (Bates et al., 1992), and increased axon myelination (Pulvermuller and Schumann, 1994) may reduce the potential for successful late L2 acquisition. Alternatively, the act of L1 learning itself may also change the way L2 speech sounds are perceived, thus regulating L2 phonological attainment as a function of the developing L1 phonological system (Flege, 2003). According to this view, age of L2 acquisition predicts discrimination difficulty in so far as older learners tend to have had more L1 experience and thus more opportunity to develop refined and stabilized L1 representations that are neurally committed to L1 processing (Kuhl et al., 2003). These stabilized L1 representations then compete with the formation of L2-specific representations, making L2 learning more difficult (Hernandez et al., 2005). In effect, brain maturation and prior L1 experience likely co-occur, and the age-of-acquisition effect in L2 phonological attainment reflects a complex bidirectional interplay of both brain maturation and early language experience (Bates et al., 2002).

Once the L1 phonological system is firmly established, it may act as a perceptual filter that shapes how late L2 learners perceive L2 speech sounds. This can be maladaptive depending on the similarity and degree of acoustic overlap between the L1 and L2 phonetic categories (Flege, 1995a,b; Kuhl and Iverson, 1995; Strange, 2011). The classic example is the persistent difficulty that many native Japanese speakers have with perceiving and producing English /r/ and /l/. This contrast is challenging for many Japanese speakers (particularly those who began learning English later in life) because, unlike English, Japanese groups the phonetic units /r/ and /l/ into one phonemic category (Japanese /r/), thereby treating any acoustic differences between the units as irrelevant (Iverson et al., 2003; Aoyama et al., 2008). For example, Raizada et al. (2010) showed that native English speakers exhibit two distinct patterns of fMRI activity in right Heschl's gyrus when listening to the English syllables "ra" and "la", whereas native Japanese speakers tended to exhibit similar activation patterns for each syllable type. Moreover, the degree to which Japanese speakers showed separation between English "ra" and "la" predicted discrimination performance. The tendency for L2 learners to activate the same groups of auditory neurons for processing L1 and L2 speech sounds may explain why non-native phonetic discrimination is so challenging.

Following Hebbian rules (Hebb, 1949), the more neurons within one region fire in response to two different L2 phonemes, the more that pattern is reinforced (see McClelland, 2001 for a discussion). This makes late L2 learning after a sensitive period unlikely to occur through bottom-up processes triggered by exposure alone; that is, neural systems "optimized for performance, may not be optimal for learning" (Thompson-Schill et al., 2009, p. 260). As such, late L2 learners face more difficulties with accurate L2 phonetic perception, which subsequently affects the development of motor programs necessary to produce the subtle difference between L1 and L2 phonemes (Flege, 2003).
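The self-reinforcing character of this process can be sketched with a toy Hebbian model. Everything in the block is an illustrative assumption and not a model from the cited work: the five input units, the activity patterns `RA` and `LA`, the learning rate, and the weight normalization (added only to keep plain Hebbian growth bounded). Units 0 to 2 stand for neurons that respond to both L2 sounds because both fall within one L1 category; units 3 and 4 weakly carry the cue that distinguishes them.

```python
import math

# Hypothetical input patterns: units 0-2 fire for both L2 sounds (shared L1
# category); units 3 and 4 weakly signal the cue that tells the sounds apart.
RA = [1.0, 1.0, 1.0, 0.2, 0.0]
LA = [1.0, 1.0, 1.0, 0.0, 0.2]


def normalize(weights):
    """Rescale weights to unit length so Hebbian growth stays bounded."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]


def hebbian_exposure(weights, patterns, rate=0.1, sweeps=50):
    """Apply dw_i = rate * y * x_i for each pattern, renormalizing each step."""
    w = list(weights)
    for _ in range(sweeps):
        for x in patterns:
            y = sum(wi * xi for wi, xi in zip(w, x))  # postsynaptic response
            w = normalize([wi + rate * y * xi for wi, xi in zip(w, x)])
    return w


def distinguishing_share(weights):
    """Fraction of squared weight carried by the two distinguishing units."""
    return (weights[3] ** 2 + weights[4] ** 2) / sum(w * w for w in weights)


before = normalize([1.0] * 5)
after = hebbian_exposure(before, [RA, LA])
print(f"before: {distinguishing_share(before):.3f}  after: {distinguishing_share(after):.3f}")
```

In this sketch, passive exposure shifts weight onto the shared units and away from the distinguishing ones, which is one way to picture why exposure alone may entrench, not resolve, the overlap between the two L2 sounds.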

Does this mean that it is impossible for successful L2 learning to occur after a sensitive period has closed? Not necessarily. Although delayed L2 exposure may reduce the likelihood of successful learning and plastic changes occurring through exposure alone, many studies have shown that explicit L2 phonetic training can induce both functional changes in brain activity (Callan et al., 2003; Golestani and Zatorre, 2004; Zhang et al., 2009) and successful learning in adult learners (Guion and Pederson, 2007; Kondaurova and Francis, 2010). Phonetic training teaches learners to discriminate L2 speech sounds that are not used contrastively in the L1 and are, thus, difficult to differentiate, either because they activate a single L1 phonetic category or because they are filtered by the L1 phonological system and therefore do not effectively activate any category (Flege, 1995a,b; Kuhl and Iverson, 1995). Explicit training can induce learning by overtly specifying regularities in the signal or by directing learners' attention to particular forms (DeKeyser, 2003). Such training takes advantage of adults' propensity for top-down learning, which can allow L1 representations to adapt to the new L2 input (Archila-Suerte et al., 2012).

The method of phonetic training is also important. For example, Guion and Pederson (2007) tested monolingual English speakers on their discrimination of non-native Hindi contrasts before and after being randomly assigned to either a sound- or meaning-attending training group. The sound-attending group was instructed to listen for sounds of Hindi words, while the meaning-attending group was instructed to listen for the meaning of the same words. The sound-attending group showed greater improvement in a categorical discrimination task, particularly for the most difficult contrast.

Training that teaches learners to redistribute their attention to L2 speech sounds may be particularly effective in improving L2 phonetic perception. Kondaurova and Francis (2010) examined the impact of three phonetic training methods on native Spanish speakers' perception of an English-specific vowel contrast (/i/ versus /I/; as in *sheep* and *ship*) that is not used in Spanish. Native English speakers distinguish these vowels using two acoustic dimensions, spectrum (vowel quality) and vowel duration. Spanish speakers, by contrast, tend to rely predominantly on vowel duration, leading to difficulties discriminating the contrasting vowels. Kondaurova and Francis (2010) assigned Spanish speakers to one of three training conditions: vowel spectral enhancement, vowel duration inhibition, or natural correction (which resembled natural language exposure). Results on identification and discrimination tasks showed that while all three types of training improved Spanish speakers' relative use of vowel quality cues, the vowel duration inhibition training was the most effective in reducing reliance on duration cues (although vowel spectral enhancement training was also effective relative to natural correction training).

Several neuro-imaging studies also have reported functional changes in cortical activity during phonetic processing as a result of perceptual training (e.g., Callan et al., 2003; Golestani and Zatorre, 2004; Zhang et al., 2009), suggesting potential for cortical plasticity, even after a sensitive period. For example, Golestani and Zatorre (2004) trained monolingual English speakers to identify Hindi speech sounds as belonging to either dental or retroflex phonetic categories, a phonetic distinction that is not used in English. After only 5 h of training, results showed significant behavioral improvements and functional changes within cortical areas that are used during the classification of native language speech sounds, including within the left superior temporal gyrus (an area associated with phonemic perception; Liebenthal et al., 2005), the left inferior frontal gyrus, and the left caudate nucleus (areas associated with speech articulation; Hickok and Poeppel, 2007). Correlations between degree of success in learning to identify the contrasting phonetic units and changes in neural activity were also observed. These findings underscore how even relatively short periods of phonetic training can induce functional changes in L2 phonetic processing.

Most neural imaging studies of foreign-language phonetic training involve naïve listeners or relatively low proficiency L2 learners participating in short training periods (e.g., ranging from a few hours to a few weeks). Thus, the extent to which any behavioral or neural activity differences observed between learners and native speakers also characterize more proficient late L2 learners is unclear. More longitudinal training studies are needed to examine the extent to which explicit phonetic training, coupled with frequent and extended L2 use, changes L2 phonetic representation and processing in a way that ultimately resembles that of early learners and/or native speakers (for a discussion of how L2 proficiency may impact other aspects of L2 processing, see Steinhauer et al., 2009; White et al., 2012).

Adult cortical plasticity, unlike sensitive period related plasticity, requires a mismatch between the functions of an existing neural network and demands imposed by the environment to generate lasting functional and structural change (Lövdén et al., 2010). Purely bottom-up (implicit) learning mechanisms may not be sufficient for adult learners to change pre-existing L1 phonetic representations in order to better differentiate L2-specific contrasts (Archila-Suerte et al., 2012). By contrast, top-down processes evoked by explicit training that is goal-oriented, progressively adapts to participants' performance, provides feedback, and directs attention to the relevant L2 features that require encoding may enhance post-sensitive period L2 learning by allowing learners to attend to the mismatch between their current and goal-state performance and initiate plastic changes (see Ullman, 2001 for a similar argument about the relative role of declarative and procedural memory in initial stages of L2 syntax acquisition).

## **LEARNING MUSIC THROUGH TRAINING DURING A SENSITIVE PERIOD**

Like language, music relies heavily on auditory processing. However, unlike language, music training is a formal process where lessons typically occur early in life, and are quantifiable (Bengtsson et al., 2005; Wan and Schlaug, 2010; Penhune, 2011). This makes musicians an optimal population for studying the effects of sensitive periods on brain and behavior (Steele et al., 2013). Music training also allows us to examine the brain's capacity to learn and change as a result of training at different ages and examine the processes and skills that are differentially affected by this learning.

Within the last fifteen years, there has been a proliferation of studies examining the neural functioning of adult musicians as compared to non-musicians (e.g., Halpern and Zatorre, 1999; Blood and Zatorre, 2001; Koelsch et al., 2003; Zatorre, 2003). Music training has been associated with volumetric differences in the primary and secondary A1 (Schneider et al., 2002; Bermudez et al., 2009), planum temporale (Schlaug et al., 1995b), corpus callosum (Elbert et al., 1995; Schlaug et al., 1995a; Schmithorst and Wilke, 2002; Sluming et al., 2002; Lee et al., 2003), and motor areas associated with one's instrument of practice (Amunts et al., 1997; Pantev et al., 1998). Some of these differences have been shown to be functionally relevant. For example, Schneider et al. (2002) found that musicians showed bilateral differences in gray matter volume in the anteromedial portion of Heschl's gyrus that were 130% larger than in non-musicians. This size difference was correlated with melody discrimination performance, such that greater differences were associated with better performance, suggesting that volumetric increases are functionally relevant and enhance music processing abilities.

Collectively, studies examining cognitive and motor performance in musicians versus non-musicians provide a platform from which we can explore the developmental aspect of music training—does music training result in differences in brain structure and function or are there pre-existing structural differences that allow one to excel at music? As the majority of the studies that compared musicians to non-musicians did not report the age at which musicians started their training, they do not allow us to examine whether training that begins early in life is necessary to experience these changes. Of the studies that do report the age at which musicians began their training (Elbert et al., 1995; Schlaug et al., 1995a; Amunts et al., 1997; Sluming et al., 2002; Lee et al., 2003), only a few specifically test for age-related differences in neural structure and function. These studies demonstrate that, as compared to training that begins later in life, early music training is related to enhanced motor processing and representational plasticity (e.g., Elbert et al., 1995; Amunts et al., 1997), greater bimanual motor synchronization (e.g., Schlaug et al., 1995a), and sensorimotor integration (e.g., Watanabe et al., 2007; Steele et al., 2013), suggesting that sensitive periods also may exist in the domain of music acquisition. To facilitate a more nuanced understanding of the relationship between sensitive periods and auditory processing, we will discuss how early versus later music training can affect changes at the motor, sensorimotor, and cognitive levels.

#### **SENSITIVE PERIODS IN MOTOR PROCESSING**

Several studies used regression models to examine whether age of starting musical training could account for structural differences in the brain (e.g., see Elbert et al., 1995; Amunts et al., 1997). Elbert et al. (1995) examined string players who started musical training across a range of ages (from 5 to 19), and found that the earlier string instrument training began, the more extensive the cortical network responses to tactile stimulation. Similarly, Amunts et al. (1997) found that the age at which keyboard players began their music training was negatively correlated with the intrasulcal length of the precentral gyrus. Together, these findings suggest that the motor cortex can exhibit long-lasting structural adaptations that are induced by specific experience. The specificity of these effects is a function of the kind of experience musicians have with their instruments, which suggests that age of onset of training plays an important role in driving the structural and functional changes seen in adult musicians.

Bimanual motor performance also may be impacted by the age at which music training begins. In one of the earliest studies to directly test the effects of age of commencement of music training on neural structure, Schlaug et al. (1995a) found that the mid-sagittal anterior corpus callosum (maCC) was significantly larger in musicians who started music training before age 7 versus musicians who commenced training after that age. Moreover, the maCC in both musician groups was significantly larger relative to a control group of non-musicians. Similarly, Lee et al. (2003) found further evidence for a link between early commencement of music lessons (i.e., before age seven) and increased maCC size, which was related to continuous practice of bimanual motor training.<sup>1</sup>

<sup>1</sup>It is, however, important to note that these studies do not specify if the duration of musical training was the same for those who began music training before and after age seven. This means that the total number of years of musical training may also be important to maCC development, as well as age-of-onset of music training.

Further support for a sensitive period for bimanual performance comes from studies on the plasticity of the maCC. The maCC undergoes significant structural and functional changes between ages six and eight. These changes, in turn, may affect the possible degree of cortical plasticity and the extent to which training after this age results in the same degree of cortical reorganization (Chiang et al., 2009; Westerhausen et al., 2011; Kurth et al., 2012).

#### **SENSITIVE PERIODS IN SENSORIMOTOR PROCESSING**

Early music training may also impact sensorimotor integration, both neurally and behaviorally. Steele et al. (2013) tested whether music training might have a differential impact on plasticity in white-matter fibers connecting sensory and motor regions, resulting in better sensorimotor integration. Using diffusion tensor imaging, they found that early-trained musicians had greater connectivity in the posterior midbody/isthmus of the corpus callosum. Fractional anisotropy in this region was related to age of onset of training and to sensorimotor synchronization performance. From this, the authors posited that training before age seven results in changes in white-matter connectivity and that these changes "may serve as the scaffold upon which ongoing experience can build" (p. 1282).

Behaviorally, Watanabe et al. (2007) compared adult musicians who began music instruction early (before age 7) and late (after age 7) and who were matched for years of experience and amount of current practice. Participants were tested on their ability to tap in synchrony to a visually presented complex rhythm. Results showed that even though both groups had experienced many years of music training, the early training group showed better synchronization with the rhythms compared to the late training group. This suggests that early training may impact neural systems involved in sensorimotor integration and timing to a greater extent than later training. Likewise, Bailey and Penhune (2012) reported similar results on an auditory rhythm synchronization task, which was taken as evidence that there may be sensitive periods during which music training has long-lasting impacts on rhythm synchronization and other musical skills.

However, important considerations must be kept in mind when interpreting the results of cross-sectional studies (i.e., the studies on music training discussed thus far) and the conclusions they make about sensitive periods. Importantly, cross-sectional studies do not allow us to investigate the causality of differences between musicians and non-musicians. Differential innate predispositions for musical ability may confound these studies and could explain differences between those who began music training earlier and those who began later, independently of the brain's capacity to learn and change as a result of age of training onset. Additionally, musicians with early-onset training typically have more training than those who began later (see Watanabe et al., 2007; Bailey and Penhune, 2012) or are younger at the time of testing. Both of these factors could account for differences in brain structure and function and in behavioral performance. Finally, cross-sectional studies involve retrospective evaluation of the extent to which the nature, quantity and quality of training were similar across all participants, and therefore interpretations of a musical advantage may be somewhat unreliable.

The first longitudinal study to examine structural brain and behavioral changes in the developing brain as a result of music training was conducted by Hyde et al. (2009). They investigated whether 15 months of instrumental music training in 6-year-old children would provide benefits beyond participation in weekly school-based group music classes. Hyde et al. (2009) searched the brain for local differences in size between groups and found no behavioral or brain differences between the two groups of children at baseline. After 15 months, the children in the instrumental training group showed greater improvements on finger motor tasks and melody/rhythmic tasks at post-test, but, importantly, not on the non-musical tests. The instrumental training group also demonstrated greater relative voxel size change as compared to controls in motor regions (e.g., precentral gyrus), corpus callosum, and Heschl's gyrus. These findings are important because they suggest that the neuroanatomical differences seen in adult musicians relative to non-musicians may result from intensive music training rather than a biological predisposition to music (Norton et al., 2005; Schlaug et al., 2005). Moreover, Hyde et al. (2009) illustrate several key points: (1) early music training may indeed *lead* to substantial neural changes that were not apparent at the start of training, and are thus not due to pre-existing differences in brain structure; (2) the type of music training received may be an important factor in determining the degree and kind of structural changes observed in the brain; and (3) benefits conferred from music training can manifest in a relatively short time (15 months) in young children.

#### **EFFECTS OF EARLY MUSICAL TRAINING ON AUDITORY PROCESSING**

In addition to the evidence that musicians exhibit enhanced motor and sensorimotor processing relative to non-musicians, a number of studies demonstrate that early music training can impact multiple levels of auditory processing. For example, Pantev et al. (1998) measured the cortical representations of highly-skilled musicians using functional magnetic source imaging (single dipole model). The age-of-onset of musical training ranged from three to twelve. Dipole moments for piano tones, but not for pure tones of similar frequency, were enlarged by approximately 25% in the musician group, relative to the non-musician controls. Enlargement was inversely correlated with the age at which musicians started to practice, such that the younger the musicians were when they started to practice, the greater the cortical reorganization in response to piano tones. Pantev et al. (1998) suggested that use-dependent functional reorganization extends across the sensory cortices, reflecting the pattern of sensory input processed by the participant as his/her musical skills develop.<sup>2</sup>

<sup>2</sup>Monaghan et al. (1998) criticized Pantev et al. (1998) because of the statistical techniques used (i.e., using two-tailed instead of one-tailed tests) and the correlational nature of the data without controlling for genetic or environmental effects. Subsequent research has supported Pantev et al.'s (1998) interpretation (e.g., see Shahin et al., 2004).

Similarly, Shahin et al. (2004) measured auditory evoked potentials (AEPs) elicited by piano, violin, and pure tones in four- and five-year-old children enrolled in Suzuki music lessons and non-musician controls. AEPs reflect the development of mature synaptic connections in the upper neocortical laminae that occurs between ages 4 and 15. Results showed that music training affected the AEPs at multiple stages of auditory processing. Compared to controls, Suzuki students exhibited larger P1 and P2 components when listening to their instrument of practice (piano or violin). Moreover, the AEPs observed for piano tones in the music students were comparable to those found in non-musician children three years older. This suggests that musical training can influence and expedite the shaping of neural development.

This neural development, especially the sub-cortical auditory plasticity seen in young musicians, can persist into adulthood (Skoe and Kraus, 2012). Skoe and Kraus (2012) showed that adults who received formal music lessons as children (but who had not played in many years) had more robust brainstem responses to sound than those who had never received lessons. Neural response quality increased significantly from those who had no music lessons during childhood, to those who had 1 to 5 years, to those who had 6 to 11 years. Similarly, Zendel and Alain (2012) found that this benefit may persist into old age. They compared older amateur and professional musicians who started music lessons before age 16, all of whom had continued to play throughout their lives. They found that ongoing music playing mitigated the central auditory processing declines typically associated with aging. Collectively, these findings indicate that early training affects the brain, leading to life-long changes in brain function.

#### **PITCH MEMORY AND ABSOLUTE PITCH**

Although age of start of musical training is not generally the focus of most studies examining the cognitive benefits of musical training, absolute pitch (AP) is an exception. AP is the ability to identify or produce a specific pitch without a reference pitch (Baggaley, 1974). Levitin (1994) proposed a two-component theory of AP, which posits that AP is comprised of pitch memory and pitch labeling.

Pitch memory is the ability to maintain and access stable, long-term representations of specific pitches in memory (Levitin, 1994). It is a common ability found in both musicians and non-musicians, as a result of everyday exposure to music (Terhardt and Ward, 1982; Terhardt and Seewann, 1983; Halpern, 1989). For example, Levitin (1994) investigated pitch memory in participants with and without musical training. When instructed to sing several bars of their two favorite songs, both groups came within two semitones of the original recordings for both songs, suggesting that everyone, musicians and non-musicians alike, possesses pitch memory ability. In pursuit of a related question, Schellenberg and Trehub (2003) had non-musician adults hear a version of a familiar TV theme song played in the standard key and transposed by either one or two semitones. The participants identified above chance which excerpt was in its original key. Similar findings have been observed in children (9- to 12-year-olds; Schellenberg and Trehub, 2008) and infants (Volkova et al., 2006), who were also able to recognize the correct key of familiar recordings, suggesting that pitch memory develops early in life.

Trehub et al. (2008) indirectly addressed whether or not a sensitive period exists for AP by studying the effects of age and culture on children's memory for the pitch level of familiar music. English-speaking Canadian nine- and ten-year-olds were able to distinguish between the original pitch level of familiar television theme songs and foils that were pitch-shifted by one semitone, whereas five- to eight-year-olds could not make this distinction. Conversely, Japanese five- and six-year-olds could distinguish the pitch-shifted foils from the originals, performing significantly better than their same-age Canadian counterparts. Trehub et al. (2008) suggested that these differences may stem from Japanese children's use of a pitch-accent language rather than a stress-accent language (English), thus affording these children additional experience with musical pitch labels. These findings suggest that language type (e.g., pitch- versus stress-accent language) may determine when pitch memory abilities come online and that increased experience with pitch discrimination, whether through language or increased exposure to music, can improve pitch memory (as in the case of the improvement between the five- and six-year-old Japanese children's performance). The finding that five- to eight-year-old Japanese children performed better than their Canadian age-matched counterparts, and that the Canadian children could not discriminate the pitch change until ages nine and ten, suggests that experience with a pitch-accent language bootstraps pitch memory abilities earlier than experience with a stress-accent language.

Pitch labeling—the rare ability to attach a meaningful label, such as D#, A440, or Do, to pitches—is the hallmark of AP (Levitin, 1994). Because it requires knowledge of note names, its prevalence is restricted to those with music training (Schellenberg and Trehub, 2008). The probability of developing pitch labeling, and thus AP, substantially increases if music training begins prior to age 6 to 7 (Sergeant, 1969; Miyazaki, 1988; Baharloo et al., 1998; Gregersen et al., 1999; Brown et al., 2002; Deutsch et al., 2006; Miyazaki and Ogawa, 2006), suggesting that AP shows signs of having a sensitive period (Bachem, 1940; Sergeant, 1969; Miyazaki, 1988; Gregersen et al., 1999; Russo et al., 2003; Levitin and Rogers, 2005; Deutsch et al., 2009; Lee et al., 2011). For example, Schellenberg and Trehub (2008) found that early music training is the best predictor of pitch labeling. However, it is unclear whether these age-effects reflect some of the confounding factors that are related to age or maturational differences in the brain's capacity to reorganize its cortical representations of pitch as a result of music training at different ages.

## **TRANSFER OF AUDITORY SKILLS BETWEEN MUSIC AND LANGUAGE**

Like language, music appears to have sensitive periods. Although neural network differences exist between music and language (Zatorre et al., 2002), they both rely on many similar sensory and cognitive processes. They use the same acoustic cues (pitch, timing and timbre) to convey meaning, rely on systematic sound-symbol representations, and require analytic listening, selective attention, auditory memory, and the ability to integrate discrete units of information into a coherent and meaningful percept (Kraus and Chandrasekaran, 2010; Patel, 2011). This overlap in neuro-cognitive systems leads to the possibility that experience or training in one domain may enhance processing in the other (Patel, 2008; for a longer discussion, see Moreno, 2009).

Transfer between music and language is typically studied in the context of how childhood music training impacts language development (for reviews see Moreno, 2009; Strait and Kraus, 2011). In addition, there is new evidence that suggests language experience also may enhance music processing (Deutsch et al., 2006, 2009; Bidelman et al., 2013). Research into music-language transfer provides a unique perspective on sensitive period effects because it allows us to examine the extent to which early auditory experiences, be it with language or music, alter the functionality of sensory and cognitive systems in a domain-general way.

#### **THE CASE OF LANGUAGE TO MUSIC TRANSFER**

Although the increased prevalence of AP among certain Asian populations has been suggested to reflect genetic factors (Zatorre, 2003), it may also be related to their experience speaking a tonal language. For example, Mandarin and Cantonese use tone (i.e., pitch fluctuations, Deutsch et al., 2004, but see Burnham et al., 2004; Trainor, 2005) to express word meaning. Bidelman et al. (2013) compared adult Cantonese-speaking non-musicians, English-speaking non-musicians and English-speaking trained musicians on music-processing tasks (e.g., pitch discrimination and memory). They found that Cantonese speakers' performance was comparable to that of musicians and enhanced relative to the English-speaking non-musicians. Moreover, in a sample of native Mandarin and English speakers attending music schools in their respective countries, Deutsch et al. (2006) found that Mandarin speakers showed a higher incidence of AP than English speakers (but see Baharloo et al., 1998; Gregersen et al., 1999; Baharloo et al., 2000, for a discussion of AP and genetic influences). The greatest incidence of AP was in children who began music training before 8 years of age, regardless of their language background. However, only a small percentage of Mandarin speakers (and none of the English speakers) developed AP if music training began later, suggesting that previous experience with a tone language may gate the closure of a potential sensitive period.

#### **THE CASE OF MUSIC TO LANGUAGE TRANSFER**

Several studies have examined the transfer of skills from music to language. This transfer can be observed at multiple levels (Bidelman et al., 2013; Moreno and Bidelman, 2013), from perceptual (e.g., acoustic parameters, Chartrand and Belin, 2006; Bidelman et al., 2009, 2011; Bidelman and Krishnan, 2010), to cognitive (Anvari et al., 2002; Franklin et al., 2008; Moreno et al., 2009; Chobert et al., 2012; Francois et al., 2012; Marie et al., 2012), to domain-general (e.g., attention and inhibition, Bialystok and DePape, 2009; Moreno et al., 2011a). This work suggests that there may be an association between childhood music training and improved language processing for a variety of language skills, including pitch discrimination in speech (Moreno and Besson, 2006; Moreno et al., 2009), perception and neural encoding of speech in noise (Strait et al., 2009; Strait and Kraus, 2011), and a variety of reading-related measures, including phonological awareness (Bolduc, 2009; Tsang and Conrad, 2011), naming speed (Herrera et al., 2011), the ability to match visual symbols to words (Moreno et al., 2009, 2011b), spelling (Overy, 2003), vocabulary (Moreno et al., 2011a), and reading comprehension (Corrigall and Trainor, 2011). Moreover, relationships between early music training, enhanced language processing and increased attentional control (Moreno et al., 2011a; Strait et al., 2012) and auditory working memory (Strait et al., 2012) have been observed in children. The collective importance of these findings is underscored by studies that reported associations between children's music training and increased intelligence quotient (IQ; Schellenberg, 2006) and school performance (Wetter et al., 2009). Furthermore, the enhancements seen in language domains have been shown to correlate with length and intensity of musical training (e.g., enhanced subcortical auditory and audiovisual processing, Musacchia et al., 2007; subcortical processing of vocal expressions of emotion, Strait et al., 2009). These findings have also been demonstrated in music intervention studies (Besson et al., 2011; Bhide et al., 2013; Thomson et al., 2013). For example, Chobert et al. (2012) found that 12 months of active music training enhanced pre-attentive processing of syllabic duration and voice onset time in 8- to 10-year-olds.

Most studies to date have investigated the impact of music training on developing language and cognitive skills in children. Thus, the extent to which similar transfer effects might occur at different points in development is unclear. Whereas training in adults and older children modifies existing neural circuits, in young children it may still influence the initial formation of those circuits. Consequently, training could result in quantitatively and qualitatively different changes, depending on brain maturation and an individual's relative position on his/her language development trajectory (for a discussion see Jolles and Crone, 2012). For example, one might predict that music training may have a greater impact on emerging literacy and selective attention skills in younger children because the room for improvement is larger.

#### **MECHANISMS OF TRANSFER**

Examining the mechanisms by which training may enhance children's language and cognitive skills can enhance our understanding of how early auditory experiences shape auditory processing. This is important both practically and theoretically. Practically speaking, it is important for developing effective educational programs that maximize the potential for high-quality learning outcomes. Theoretically, it is tied to fundamental questions about the processes by which the brain generalizes and transfers learning from one domain to another (Gazzaniga, 2008). We suggest that transfer between music and language could occur via shared processing in both auditory and attention control systems (Kraus and Chandrasekaran, 2010; Patel, 2011; Strait and Kraus, 2011).

A neurocognitive model that has been used to illustrate music-to-language transfer is Patel's (2011) OPERA hypothesis. The OPERA hypothesis details how musical training facilitates recruitment of neural areas that are used in both music and language, such as Broca's Area (i.e., Overlap) through a learning process that involves precision (P), emotional-engagement (E), repetition (R), and attentional focus (A). The components of the model contribute to increased neural processing precision for all salient acoustic information, whether musical, linguistic, or other. A central proposition of the OPERA hypothesis is that transfer occurs because the basic encoding of acoustic features in speech and music relies on largely overlapping subcortical and cortical networks. Music-to-language transfer occurs because music processing requires acoustic features to be encoded with a higher degree of precision than is typically required when processing speech. High-precision training of particular acoustic features (e.g., frequency, duration) in music that rely on overlapping neural systems in speech leads to enhanced precision of those features in both domains. This enhanced precision of acoustic features can then feed forward to influence higher levels of language processing (e.g., phonemic categorization, phonological-lexical processing; Besson et al., 2011). Similarly, experience with particular acoustic features in language (e.g., lexical tone) may facilitate the neural encoding and processing of those same features in music. This potential bidirectionality of transfer between music and language was supported by Bidelman et al. (2013), who found that adult Cantonese-speaking non-musicians' performance on music-processing tasks was comparable to that of musicians and enhanced relative to English-speaking non-musicians.

Patel (2011) hypothesizes that transfer is possible via shared underlying neural networks mediated by enhanced attentional control. The mechanism of these processes may again lie in Hebbian principles (Hebb, 1949), such that stimulation in one network stimulates the complementary domain by nature of overlapping neural networks. The demands of music training reinforce the auditory and attentional networks which, in turn, transfer to other domains (e.g., language) and improve cognitive skills. Specifically, under OPERA, early music training promotes language development by allowing learners to allocate more attentional resources to shared auditory features, thereby enhancing processing of those features as well as the executive control systems that guide auditory attention and inhibition more generally (Kraus and Chandrasekaran, 2010; Patel, 2011; Strait and Kraus, 2011; Moreno et al., 2011a).

A second neurocognitive model that builds on the OPERA hypothesis has recently been proposed to explain music-language transfer effects (Moreno and Bidelman, 2013). According to this model, the degree to which transfer occurs and the neural systems affected can be conceptualized as a spectrum along two orthogonal dimensions: Sensory-Cognitive and Near-Far (**Figure 1**). The *Sensory-Cognitive* dimension characterizes the processing level affected and ranges from low-level sensory processing that is specific to the auditory domain, to high-level domain-general cognitive processes that support language and executive function (e.g., mechanisms that regulate, control and manage attention, working memory and planning). It is supported by research that shows benefits of music training at sensory levels (e.g., experience-dependent plasticity in brainstem AEPs, Kraus et al., 2009; Krishnan and Gandour, 2009; Krishnan et al., 2012) as well as cognitive levels (e.g., music training impacting cortical plasticity, e.g., Münte et al., 2002; Trainor et al., 2003; Zatorre, 2005; Moreno et al., 2011a; Herholz and Zatorre, 2012, and attention/inhibition control, e.g., Moreno et al., 2011a; Strait et al., 2012). The *Near-Far* dimension characterizes the "distance" of transfer (i.e., the degree of similarity) from the domain and context of training to the skills assessed. Examples of near transfer include findings that repeated exposure to the manipulation of auditory patterns leads to the subsequent development of analytic listening skills required for robust auditory stream segregation (Zendel and Alain, 2009), complex sound manipulation (e.g., musical transposition, Foster and Zatorre, 2010), and "cocktail party listening" (Parbery-Clark et al., 2009; Bidelman and Krishnan, 2010; as discussed in Moreno and Bidelman, 2013). Examples of far transfer include cases in which the auditory precision demanded by music training benefits auditory sensory encoding in unrelated domains such as speech and language (Wong et al., 2007; Moreno, 2009; Schlaug et al., 2010; Bidelman et al., 2011, 2013). According to this model, the amount of benefit (i.e., the extent of transfer and the processing levels affected) depends on the length and intensity of training and the degree to which training tunes general cognitive skills. This leaves open the possibility that the particular focus of a given training program and individual differences in attention control may differentially impact transfer outcomes.

Many studies demonstrate an effect of music training on both language and attentional control (see Kraus and Chandrasekaran, 2010; Strait and Kraus, 2011; Moreno and Bidelman, 2013). For example, Strait et al. (2012) compared the ability to encode speech in noise in children (ranging in age from 7 to 13) who had been receiving regular music training starting before the age of 5 versus those who had not received regular music instruction. The children who had received music training showed enhanced perception of sentences and greater brainstem response to speech sounds in noise. Moreover, this more accurate sentence perception in noise and more robust and faster brainstem encoding of key features of speech sounds were correlated with improved performance on measures of auditory attention. Thus, music training appears to improve the ability to rapidly detect, sequence and encode sound patterns that are deemed important, while suppressing and disregarding irrelevant and meaningless information (Kraus and Chandrasekaran, 2010). These abilities are arguably related to fine-tuning of executive control mechanisms in the brain and, specifically, selective attention mechanisms. Difficulty identifying speech sounds in noise has been argued to be a fundamental deficit for children with specific language impairment (Ziegler et al., 2005) and developmental dyslexia (Ziegler et al., 2009), raising the possibility that music training may provide a benefit for children who struggle with language (Kraus and Chandrasekaran, 2010).<sup>3</sup>

<sup>3</sup>Research into auditory scene analysis (i.e., how we form a meaningful auditory percept from multiple incoming auditory signals) also points to a facilitative developmental role of attention in audition and suggests a potential for music training for reading/language rehabilitation. Sussman and Steinschneider (2009) compared the amount of frequency separation that children and adults require to perceive two separate sound streams in active and passive listening conditions (i.e., with or without attention). In contrast to adults, who displayed similar ERP indices of sound segregation in both conditions, children required much larger frequency separation in passive compared to active listening conditions. This suggests that attention plays an important developmental role in shaping the neural networks underlying sound pattern organization used in passive listening conditions. Such findings may be particularly pertinent for children with language and/or reading impairment, who have been shown to have difficulty with sound segregation (i.e., require larger temporal or spectral differences to perceive segregated streams; Sutter et al., 2000) and who, according to some theories, suffer from impaired attention control (Petkov et al., 2005). To the extent that music training is associated with enhanced sound segregation (Zendel and Alain, 2009), music training may provide an important vehicle for reading/language rehabilitation.

However, many studies of music-to-language transfer employ cross-sectional designs that compare children who have or have not received music training, making it difficult to determine the extent to which differences in language processing reflect the effect of music training *per se* as opposed to pre-existing, innate capacities, motivation, parental involvement or other environmental factors (e.g., Penhune, 2011). To this end, longitudinal studies that randomly assign participants to music or other related training programs are important for understanding the mechanisms of transfer and the extent to which transfer may be sensitive period dependent. In a series of longitudinal studies, Moreno et al. (2009, 2011a,b) and Moreno and Besson (2006) examined the benefit of music training on multiple aspects of language processing by randomly assigning children to teacher-led, computer-based music listening or visual art training programs. For example, Moreno et al. (2009) found that eight-year-old children showed improvements in EEG correlates of pitch processing in speech after participating in six months of music training as compared to matched children who participated in visual art training (see also Moreno and Besson, 2006). Enhanced auditory processing of important acoustic features in speech may be particularly beneficial for speech perception under challenging listening conditions, as suggested by a musician advantage in detecting speech in background noise (Strait et al., 2012). Moreover, using an intensive (20 day) version of these training programs with younger children (age 4–6), Moreno et al. (2011a) found that music training led to significant enhancements in verbal intelligence (as measured by the Wechsler Preschool and Primary Scale of Intelligence – Third Edition, WPPSI-III), with over 90% of the children showing improvements. Significant changes to ERP indices of executive function in a visual Go/No-Go task were also observed, which positively correlated with improvements in verbal intelligence.<sup>4</sup> Crucially, neither verbal memory nor executive function was significantly enhanced in the control group of children who were randomly assigned to a visual art training group. Collectively, these findings provide causal evidence for the role of music training in enhancing children's developing language skills. They suggest that children's language performance may benefit from music training via two sources of transfer: the near transfer of skills within the auditory domain that enhance the encoding of speech and the far/broad transfer of skills between high-level cognitive activities, as mediated by enhanced attention control (for a discussion see Moreno, 2009; Moreno and Bidelman, 2013).

<sup>4</sup>Not all studies report superior visual attention skills in musicians relative to non-musicians. For example, Strait et al. (2013) report a significant difference between musicians and non-musicians in auditory, but not visual, attention, as assessed by reaction time to particular (visual or auditory) target stimuli. In contrast, Moreno et al. (2011a) found an advantage of musical training on visual attention using ERP indices of response inhibition in a go/no-go procedure. Differences between studies may be due to the use of behavioral versus electrophysiological measures, the assessment of attention to a target versus response inhibition to a distracter, and the way attention is operationalized.

#### **SENSITIVE PERIODS FOR MUSIC-LANGUAGE TRANSFER**

Empirical evidence indicates that some aspects of language and music are sensitive-period dependent. Given the bidirectionality of the transfer between music and language (i.e., Bidelman et al., 2013), we suggest that there may also be a sensitive period in transfer, such that the effects of training may be greatest during the overlap of the sensitive periods. We also believe that transfer is influenced by the interaction between genetics and environment (i.e., "nature" and "nurture"). AP is an example of this phenomenon. Genetic predispositions have been cited as a contributing factor to AP development (Baharloo et al., 2000; Drayna et al., 2001; Zatorre, 2003), conferring a general aptitude for frequency encoding. Yet, environmental influences are also important. For example, Schellenberg and Trehub (2008) found that early music training is the best predictor of pitch labeling. However, music training may not be the only "nurturing" auditory experience that contributes to pitch labeling skill. Speaking a tone language is also associated with higher rates of AP (Gregersen et al., 1999; Deutsch et al., 2006), suggesting that tone language experience may bootstrap the ability to meaningfully label sounds, as discussed in relation to pitch memory. Thus, AP appears to be a combination of "nature" and "nurture", such that some individuals may be born with a genetic predisposition that is more likely to develop into AP when music training and particular language experience are provided early in development. Cross-domain bootstrapping is one of many examples of transfer in and between the domains of language and music.

## **DISCUSSION: MECHANISM OF AUDITORY LEARNING AND TRANSFER DURING AND AFTER A SENSITIVE PERIOD**

We suggest that auditory learning and plasticity are possible both during and after a sensitive period; however, learning in the two periods differs in its relative reliance on two underlying mechanisms. The difference can be best considered as end points of a continuum between bottom-up and top-down processing mediated by attention (e.g., Strait et al., 2010). During a sensitive period learning is largely a bottom-up process that is triggered by exposure to auditory input. It is an optimal period for learning because underlying neural circuits have not yet been fully specified and are extremely sensitive to input received. Learning occurs through a process of perceptual narrowing that homes in on frequently occurring, and thus important, features in the input (Scott et al., 2007). This occurs gradually as input progressively directs the refinement and stabilization of neural circuits, until a threshold level of stability has been attained, corresponding to the gradual closing of the sensitive period for the skills sub-served by those circuits (Kral and Eggermont, 2007; Kuhl et al., 2008).

After a sensitive period, learning is largely a top-down process that depends on attention to enhance the salience of features in order to encode them. It is a process of changing the structure and efficiency of pre-existing circuits to more optimally process a new input source (Knudsen, 2004; Lövdén et al., 2010). In the case of L2 learning this may involve creating a completely new circuit. In the case of music training, this may involve dramatically improving the specificity of circuits that were created through earlier exposure to music. Both may require explicit training that teaches learners how to best direct their attention to relevant information to initiate plasticity. Indeed, animal studies demonstrate that acetylcholine (a neurotransmitter associated with sustained attention; Sarter et al., 2001) plays an important role in adult experience-dependent plasticity (Kilgard and Merzenich, 1998; Mercado et al., 2001). Acetylcholine is thought to gate learning and plasticity by enhancing the processing of relevant sensory stimuli and filtering out irrelevant noise and distracters (Sarter et al., 2001; Seitz and Dinse, 2007). The release of acetylcholine with attention may mark the importance of particular stimulus features by increasing the responsiveness of neurons, increasing the probability of synchronous firing and strengthening of synaptic connections (Jagadeesh, 2006). Therefore, learning, particularly after a sensitive period, appears to be a gated system, through which attention (via acetylcholine) can facilitate or restrict plasticity (Seitz and Dinse, 2007).

Although bottom-up and top-down processes can be considered as ends of a continuum, the difference between learning during and after a sensitive period can be viewed as one of degree rather than kind: age-related shifts in the relative reliance on each process may be a gradual, rather than an all-or-none, shift. Although bottom-up processes may predominate during a sensitive period, auditory learning may also be facilitated by top-down internal mechanisms and external cues that regulate attention. For example, Conboy et al. (2008) showed that individual differences in 8–11-month-old infants' cognitive control are inversely related to their discrimination of non-native phonetic contrasts (see also Lalonde and Werker, 1995). This suggests that even as early as the first year of life, the domain-general ability to ignore irrelevant information and focus on relevant information may promote early stages of language learning (Diamond et al., 1994). Moreover, infant-directed speech and maternal singing are thought to promote phonetic learning by directing arousal and attention to relevant speech cues (Werker et al., 1996; Trehub and Trainor, 1998). However, the protracted development of the prefrontal cortex and its associated executive functions (Gogtay et al., 2004) and the under-specification of higher-order categories (Kral and Eggermont, 2007) may place an upper limit on the extent to which top-down mechanisms mediate learning early in development. Similarly, although top-down processes may predominate after a sensitive period, bottom-up mechanisms (e.g., statistical learning of speech; Saffran et al., 1997) may continue to operate, although the extent to which they induce learning may depend on the level of specification of the existing neural network (Kuhl et al., 2008) and the efficiency with which the existing network processes new environmental input (McClelland, 2001). Thus, both bottom-up and top-down mechanisms influence learning and plasticity during and after a sensitive period, though the relative reliance on each may change across development.

Viewing learning and plasticity during and after sensitive periods as falling along a continuum between bottom-up and top-down processing mechanisms can help us understand why childhood training is so beneficial. Music training, for example, may be associated with such long-lasting benefits in music, language and attention processing because it strengthens emerging top-down processes at a time when bottom-up mechanisms are still available. Indeed, one benefit of music training may be to expedite the developmental trajectory of top-down control over speech processing (Strait et al., 2013). For example, early music training (i.e., before age 6 or 7) has been found to be associated with more precise encoding of speech and enhanced auditory attention, a benefit observed for both adult and child musicians (ages 7 to 13; began lessons before age 6) relative to age-matched non-musicians (Strait et al., 2013). Significant correlations between attention and neural encoding of speech throughout development support the view that strengthened top-down control may be one mechanism underlying musicians' more precise auditory processing for both music and speech. Moreover, enhancements may already be evident following relatively few years of continuous music training in young children (Strait et al., 2013). Future research on this topic should clarify the relative dependence of learning on bottom-up and top-down processes during and after sensitive periods and the extent to which this balance is impacted by training. This is an exciting new field of research that may lead to new training methods geared towards optimizing learning across the lifespan.

#### **ACKNOWLEDGMENTS**

We would like to thank Patrick Bermudez, Yunjo Lee, Aline Mossard, Tristan Watson and Bozena White for their helpful comments on earlier versions of this manuscript. This work was financially supported through grants awarded to Sylvain Moreno from the Canadian Federal Ministry of Economic Development, to Erin J. White from the Fonds de recherche du Québec - Société et culture (FQRSC) and to Stefanie A. Hutka from the Natural Sciences and Engineering Research Council of Canada (NSERC) - Create: Training in Auditory Cognitive Neuroscience.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 June 2013; accepted: 29 October 2013; published online: 20 November 2013.*

*Citation: White EJ, Hutka SA, Williams LJ and Moreno S (2013) Learning, neural plasticity and sensitive periods: implications for language acquisition, music training and transfer across the lifespan. Front. Syst. Neurosci. 7:90. doi: 10.3389/fnsys.2013.00090*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2013 White, Hutka, Williams and Moreno. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

**REVIEW ARTICLE** published: 21 October 2013 doi: 10.3389/fnsys.2013.00058

## Early neural disruption and auditory processing outcomes in rodent models: implications for developmental language disability

## *R. Holly Fitch<sup>1</sup>\*, Michelle L. Alexander<sup>2</sup> and Steven W. Threlkeld<sup>3</sup>*

*<sup>1</sup> Department of Psychology/Behavioral Neuroscience, University of Connecticut, Storrs, CT, USA*

*<sup>2</sup> Department of Pediatrics, University of Minnesota, Minneapolis, MN, USA*

*<sup>3</sup> Department of Psychology, Rhode Island College, Providence, RI, USA*

#### *Edited by:*

*Virginia Penhune, Concordia University, Canada*

#### *Reviewed by:*

*Etienne De Villers-Sidani, McGill University, Canada*

*Xiaoming Zhou, East China Normal University, China*

#### *\*Correspondence:*

*R. Holly Fitch, Department of Psychology/Behavioral Neuroscience, University of Connecticut, Box U-1020, Bousfield, Babbidge Road, Storrs, CT 06268, USA e-mail: Roslyn.h.fitch@uconn.edu*

Most researchers in the field of neural plasticity are familiar with the "Kennard Principle," which purports a positive relationship between age at brain injury and severity of subsequent deficits (plateauing in adulthood). As an example, a child with left hemispherectomy can recover seemingly normal language, while an adult with focal injury to sub-regions of left temporal and/or frontal cortex can suffer dramatic and permanent language loss. Here we present data regarding the impact of early brain injury in rat models as a function of type and timing, measuring long-term behavioral outcomes via auditory discrimination tasks varying in temporal demand. These tasks were created to model (in rodents) aspects of human sensory processing that may correlate—both developmentally and functionally—with typical and atypical language. We found that bilateral focal lesions to the cortical plate in rats during active neuronal migration led to worse auditory outcomes than comparable lesions induced after cortical migration was complete. Conversely, unilateral hypoxic-ischemic (HI) injuries (similar to those seen in premature infants and term infants with birth complications) led to permanent auditory processing deficits when induced at a neurodevelopmental point comparable to human "term," but only transient deficits (undetectable in adulthood) when induced in a "preterm" window. Convergent evidence suggests that regardless of when or how disruption of early neural development occurs, the consequences may be particularly deleterious to rapid auditory processing (RAP) outcomes when they trigger developmental alterations that extend into subcortical structures (i.e., lower sensory processing stations). 
Collective findings hold implications for the study of behavioral outcomes following early brain injury as well as genetic/environmental disruption, and are relevant to our understanding of the neurologic risk factors underlying developmental language disability in human populations.

**Keywords: rapid auditory processing, language disability, cortical lesion, timing effects, plasticity and injury, rodent models, medial geniculate nucleus**

### **INTRODUCTION**

The profound plasticity of the developing brain affords an adaptive and often advantageous quality that is no longer prominent in adulthood (though recent research shows that the adult brain retains a greater level of plasticity than once thought). This early plasticity reflects the unique capacity of the developing brain to rapidly respond to external input and functional demands by enhancing, rerouting, or eliminating underlying neural circuitry—thus promulgating a brain (organism) more precisely suited to its unique environment. One important implication of this early and transient "responsiveness and optimization" capability is that *the developing brain is also potentially much less vulnerable to the detrimental effects of injury*. As the word "potentially" suggests, however, this principle is not straightforward. In order to tease apart the critical mechanisms and consequences of early brain disruption as indexed by later cognitive outcomes, it is quite valuable to employ animal models that allow us to map out the relative impact of clinically relevant neural manipulations (such as induced injury or genetic manipulations) on more basic outcome measures (such as rapid auditory processing (RAP)). Initially, in order to fully understand how the plasticity of early systems might contribute to an enhanced capacity to respond in a beneficial way to injury and/or disruption, it is important to briefly review key neurodevelopmental events for the central nervous system (CNS) in general, and the central auditory system in particular.

#### **A BRIEF OVERVIEW OF CNS DEVELOPMENT IN MAMMALS**

During embryonic development, the CNS arises from a specialized subset of epithelial cells (the neural plate). As the neural plate expands, the lateral edges fold in and merge, separating from the rest of the epithelium to create the neural tube (Nowakowski and Hayes, 2002; Diaz and Gleeson, 2009). In humans, formation of the neural tube occurs around embryonic day 26–28, and in rodents, around embryonic day 10.5 (E10.5). Following closure of the neural tube, regional specification begins, with the emergence of forebrain (rostral), midbrain, hindbrain, and spinal cord. In general, development proceeds along a "bottom-up" (or lowest-to-highest) gradient, with the spinal cord and hindbrain (caudal) structures maturing first. Around the end of the first gestational month in humans (E10–12 in rodents), proliferation of neural stem cells (fated to become neuroglia or neurons) begins, with the timing of local neurogenesis temporally staggered (again proceeding caudal to rostral or "bottom-up"; Rice and Barone, 2000; Nowakowski and Hayes, 2002; see **Figure 1**). Within the rostralmost forebrain, a highly proliferative area (destined to become neocortex) emerges along the surface of the lateral ventricles: the ventricular zone.

During the early stages of cortical cell proliferation, progenitors in the ventricular zone undergo symmetrical self-renewing cell division that generates additional progenitors. During this phase, some progenitors differentiate into radial glial cells (characterized by a long radial process that extends from the ventricular surface to the pial surface). During subsequent phases of neural proliferation, some radial glial cells continue to divide symmetrically in the proliferating zones of the cortex, but some radial glial cells undergo asymmetric cell division that results in the creation of one clone radial glial cell and a committed neural cell (post-mitotic neocortical neuron). The process of asymmetric cell division at this stage is known as neurogenesis. As post-mitotic neocortical neurons are born, they begin to migrate radially toward the pial surface, following the scaffolding created by radial glia (Nowakowski and Hayes, 2002; Diaz and Gleeson, 2009; Rakic, 2009). This process is called neuronal migration, and occurs between 13–21 weeks gestation in humans (Chong et al., 1996), and approximately embryonic day 14 (E14) to postnatal day 3 (P3) in rats. As an aside, it is important to note that in many lower areas of the CNS (spinal cord, brainstem) newborn neurons are moved into final laminar patterns through a passive displacement rather than active migration (**Figure 1**).

During the initial stages of neuronal migration, the first postmitotic neurons born in the ventricular zone migrate a short distance to form the cortical pre-plate. As new neurons are generated they continue to accumulate in the pre-plate, ultimately forming the cortical plate, which will in turn give rise to neocortical layers II–VI. The emergence of the cortical plate splits the pre-plate into the superficial marginal zone (layer I in the mature cortex) and the sub-plate below. Thus at this stage, the cerebral wall is characterized by four layers, including (from the most interior to most superficial): (1) the ventricular/sub-ventricular zone; (2) the intermediate zone; (3) the cortical plate/sub-plate; and (4) the marginal zone. During the next phase of neuronal migration, the cortical plate gradually develops more defined layers. Waves of newly generated neurons continue to migrate from the ventricular zone, past the sub-plate and into peripheral regions of the cortical plate, stopping short of the marginal zone. As a result, early born neurons are found in the deeper layers of the neocortex (layers V–VI), while later born neurons migrate beyond earlier migrating neurons to form the more superficial layers of the cortex (layers II–IV). This produces an inside-out pattern of lamination of the six cortical layers, as seen in both rats and humans (Rice and Barone, 2000; Nowakowski and Hayes, 2002; Diaz and Gleeson, 2009). Importantly, this process describes migration of excitatory glutamatergic pyramidal neurons (the bulk of cortical neurons), but neurons also migrate tangentially to reach their respective locations. Specifically, neurons that are destined to become inhibitory GABAergic cortical neurons show tangential migration, moving from their site of origin in the lateral and medial ganglionic eminence to their appropriate destinations in the cortex. This process typically is completed *after* radial migration ends, consistent with an initial hyper-excitability of immature cortex (i.e., tangential migration of inhibitory GABA neurons is delayed; see Marín and Rubenstein, 2001). Moreover, even as early GABAergic neurons complete migration and begin to form synaptic connections, they are initially excitatory (due to maturational shifts in intra-cellular/extra-cellular Cl− gradients).

Once neurons settle into a permanent position in cortex, synaptogenesis begins. During this stage, neurons extend their axons (via dynamic growth cones) to locate a target region on another neuron. As with proliferation and neuronal migration, the window for peak synaptogenesis varies across the CNS, but again generally follows a "lowest to highest" (caudal to rostral) scheme. Notably, differences in the timing of synaptogenesis are seen even between cortical layers, yet the mechanisms remain largely the same (see Webb et al., 2001). In brief, growth cones on the leading edge of the growing axon contain receptors that detect local chemo-attractants, chemo-repellants, and cell adhesion molecules in the extracellular environment. The topographic pattern of these cues arises out of differential regional gene transcription and translation, leading to a complex extra-cellular pattern that "guides" axons to post-synaptic targets (e.g., see Rowitch and Kriegstein, 2010). Once a growth cone finds an appropriate postsynaptic target (soma or dendrite), the axon stops growing, and differentiates into a presynaptic terminal, while the target specializes into a postsynaptic site (Webb et al., 2001; Nowakowski and Hayes, 2002). Notably, although these early synapses are initially "functional," they do not always function in the same manner as in adults (e.g., as indicated above, GABA is excitatory in early neurodevelopment but inhibitory in the mature brain; Ben-Ari, 2002). The first functional synapses emerge at approximately 27 weeks gestational age (GA) in human neocortex, with a peak in density around postnatal month fifteen (Huttenlocher and Dabholkar, 1997; Webb et al., 2001). In rats, the first functional cortical synapses are observed around E16, with peak synaptic density seen at approximately 3–4 postnatal weeks (P21–28; König et al., 1975).

Notably, at the same time that cortical neurons begin to seek intra-cortical targets (around P5 in rats), projecting axons from thalamic nuclei (whose terminals have been "waiting" in the sub-plate) begin moving into the cortex, seeking their target neurons in layer IV. The establishment of reciprocal cortico-thalamic and other subcortical projections proceeds slightly later, as cortical neurons in layers V and VI begin to extend *their* axons downward into the sub-plate, also seeking respective neural targets (Diaz and Gleeson, 2009). One interesting feature of this early process has particular relevance to the plasticity of young brains, and that is the fact that initial thalamo-cortical and reciprocal cortico-thalamic connectivity tends to be highly distributed and cross-modal (Katz and Shatz, 1996). This cross-modal connectivity in the very young brain is thought to give rise to unique reorganizational capabilities, such as the ability of temporal cortex to respond to visual stimuli in the congenitally deaf, and visual cortex to respond to somatosensory input in the congenitally blind (Bavelier and Neville, 2002). Based on these and other findings, it is believed that the immature brain can re-organize *across modalities* by retaining connections otherwise destined for pruning (Innocenti and Price, 2005). This process may also come into play in response to injury, for example as seen in the maintenance of ipsilateral motor connections that are retained when contra-lateral motor regions that would normally control function are injured (Johnston, 2009).

In general, it is believed that initial patterns of synapse formation in early development reflect a genetically mediated "best guess" of optimal neural configuration (Katz and Shatz, 1996), coupled with a dramatic exuberance in the production of neurons and synapses. As the brain matures, environmental stimulation (i.e., neural activation via input and action) facilitates the addition, elimination, and strengthening of synapses, allowing for further modification and refinement of neurocircuitry. Specifically (according to classic work by Hebb), synaptic circuits that receive the most activation persist and are stabilized, while circuits that receive little or no activation regress and are eliminated (Webb et al., 2001; Nowakowski and Hayes, 2002). This active elimination (pruning) of synaptic circuits continues well into postnatal life, with some areas of cortex (e.g., prefrontal cortex) pruning well into adulthood (early twenties in humans; Huttenlocher and Dabholkar, 1997). Myelination (the formation of fatty sheaths around axons that increase the speed and efficiency of conduction) also begins postnatally in rats and humans, and continues quite late in life. In fact, the proliferation of oligodendrocytes (pre-oligos) begins in the ventricular and subventricular zones largely *after* neuronal proliferation and migration are complete, and includes a "re-purposing" of radial glia (once their role in neuronal migration is over) into other forms of glia, including astrocytes (which support neurons) and oligodendrocytes (which produce myelin).

Behaviorally, the emergence of psychomotor and sensory functions necessary to perform more complex cognitive behaviors parallels the neurodevelopmental trajectory of structures and systems sub-serving those functions. That is, as different structures and neural systems come "on-line," correlated behavioral capabilities simultaneously emerge (see discussion of these parallel trajectories in humans by Casey et al., 2005). As an example, P15 rat pups are unable to perform a rotarod task (a common behavioral tool to assess motor coordination), with adult-like patterns of performance on this task emerging around P20 (Bâ and Seri, 1995). Concurrently, the underlying functionality of *cerebellum* and *basal ganglia* also approaches an adult-like state around P21–28 (Bâ and Seri, 1995). Developmental changes in cognitive ability can also be observed as a function of structural maturation. For example, the hippocampus shows rapid maturation between P21–28 in the rodent (Bâ and Seri, 1995), and rodents also begin to show adult-like proficiency on spatial tasks such as the Morris water maze (a spatial learning and memory task) at this age (Bachevalier and Beauregard, 1993).

#### **EARLY DEVELOPMENT OF THE CENTRAL AUDITORY SYSTEM**

A bottom-up (lowest to highest) pattern of maturation is generally seen in the central auditory system, much as in other brain areas/systems. At the level of the inner ear, hair cells of the cochlea (which transduce sound waves into neural signals) undergo genesis and differentiation, and eventually form synapses with underlying spiral neurons of the auditory nerve. Once the cochlear apparatus and hair cells become functional, they are activated in response to stimulation of the tympanic membrane (or *in utero*, via bone conduction; Graven and Brown, 2008). In the immature system, spiral neuronal axons remain unmyelinated and of smaller diameter, accounting in part for initial long-latency responses to sound. The auditory nerve projects to the cochlear nucleus (CN), from which some ascending fibers cross to the contralateral superior olive (SO) and inferior colliculus (IC), and others synapse on the ipsilateral SO. The SO projects ipsilaterally to the IC, which projects to the medial geniculate nucleus (MGN) of the thalamus, and in turn to primary auditory cortex (AI). Development of functional connectivity between these structures appears to *precede* their peripheral activation by sound, with histologic evidence of synapses between hair cells, spiral ganglion neurons, CN and SO reported for rodents as early as E10–14 (Hoffpauir et al., 2009)—well before behavioral hearing onset (around P11–12, when the first indications of response to specific sounds are evident in rodents). Importantly, *spontaneous* activity—believed to be crucial to circuitry formation—can also be seen in these acoustic structures much earlier than P12 (i.e., before external activation, ascending propagation, and sound processing are evident; Tritsch and Bergles, 2010). 
Notably, core regions of these ascending structures are organized tonotopically (i.e., following an anatomic map characterized by progressive steps in the characteristic frequency producing maximal neural excitation, from low to high). This tonotopy is highly conserved in patterned ascending projecting systems in the adult brain, and is present in immature form (with initial representation mainly for mid-range frequencies) at the time of hearing onset (around P11 in rats; de Villers-Sidani et al., 2007).

In humans, the central auditory system reaches an initial milestone of maturity prenatally (onset of hearing), based on evidence of speech recognition for familiar voices in newborn infants, coupled with evidence of behavioral and auditory brainstem responses (ABRs) as early as 27–29 weeks of gestation (Sininger et al., 1997; Graven and Brown, 2008). However, the auditory system also continues to undergo considerable postnatal development, as evidenced by the high degree of behavioral plasticity, as well as changes in typical ABRs and auditory evoked potential response patterns (AERPs) across maturation. Indeed, while adult-like ABRs on some low-frequency resolution tasks have been reported as early as 6 months (with high-frequency resolution developing slightly later in humans since higher frequencies are blocked *in utero*), many studies do not report adult-like ABR and/or AERP responses to more complex stimuli such as speech until much older ages (up to 16–18 years on some tasks; Fischer and Hartnegg, 2004).

Consistent with the generally later neurodevelopmental scheme in rats as compared to humans (with birth on P1 approximating mid-human gestation; Clancy et al., 2001; Workman et al., 2013), hearing and associated detectable ABRs do not come online in the rat until P11–12, with adult-like patterns of ABR and AERP emerging around P22 (depending on stimuli used). And as in humans, the ongoing development of higher acoustic structures undergoes substantial postnatal maturation. Over the period from P11 (approximate hearing onset) to P14, which has been identified as a "critical period" for plasticity in response to sound exposure in rats, de Villers-Sidani et al. (2007) report substantial expansion of the A1 cortical field, extension of high and low frequency representation, and decreases in neural thresholds and latencies to respond to sound. Moreover, tonotopic representation and response field properties are *highly* affected by experience during this window. For example, exposure of rats to chronic white noise during the first month of life results in deteriorated tonotopy, broader tone frequency tuning and degraded cortical temporal processing (as shown by poor response to rapid tone trains; Zhang et al., 2002; Zhou and Merzenich, 2008, 2009). Moreover, early exposure to noise appears to extend the "critical window" for auditory development, effectively prolonging immaturity of the system (Chang and Merzenich, 2003). Conversely, enriched postnatal exposure to tonal stimuli can enhance developmental precision and behavioral discrimination of sounds, with beneficial effects seen in rats following post-weaning acoustic enrichment and musical exposure (Engineer et al., 2004; Xu et al., 2009). Similarly, exposure to pulsed tones during development in rats broadens tone frequency tuning and results in an expansion of the A1 representations of the familiar tone frequency. 
Interestingly, these latter effects appear to trigger an earlier closure of the "critical window" (Zhang et al., 2001; de Villers-Sidani et al., 2007).

These developmental features characterize typical development of the central auditory system, but may also come into play in the neural response to early CNS disruption—particularly, disruptions known to alter auditory processing outcomes later in life.

### **A BRIEF HISTORIC OVERVIEW ON TIMING OF BRAIN INJURY AND OUTCOMES**

Studies of the long-term behavioral consequences of brain injury as a function of age began in earnest with the extensive and seminal work of Margaret Kennard, who sought to assess the impact of early lesions in non-human primates on motor outcomes as a function of lesion timing, laterality, extent, and location. Although Kennard's research did support a view that earlier lesions were at times less deleterious than comparable lesions at later ages, she also demonstrated that early lesions tended to lead to more negative outcomes when they occurred in regions that were closer to "functional maturity" at the time of injury (more like adult brains) as compared to more immature (later developing) regions. She also reported that initial evidence of sparing of function following early lesions could give way to emergent deficits later in life, with injured subjects failing to maintain typical maturation trajectories (reviewed in Dennis, 2010). Kennard supplemented these findings with her investigations of brain lesions and cerebral palsy in children, concluding that in general, the young brain had a remarkable capacity for reorganization following injury. Regrettably, interpretations of her research became over-simplified after her death, leading to the promulgation of the "Kennard Principle" (which Kennard never directly espoused). This view professed the simplified idea that behavioral recovery from brain injury would *always* benefit by occurring at an earlier time-point (i.e., the earlier an injury, the less severe the impact). This view was not entirely supported by Kennard's own work, nor by subsequent research. For example, subsequent work has demonstrated that early lesions to *subcortical* areas can produce devastating behavioral consequences (e.g., Schneider, 1979). 
Thus early central auditory system disruptions that extend into sub-cortical structures (e.g., MGN, IC, CN) might exert more profoundly deleterious effects on long term acoustic outcomes as compared to higher order (cortical) disruptions.

More recently, Kolb and colleagues conducted a series of lesion studies on juvenile rats, assessing relative behavioral outcomes using both motor and learning tasks, as well as histologic measures taken *post mortem* (Kolb et al., 1983, 1984; Kolb, 1987; Kolb and Elliot, 1987; Kolb and Tomie, 1988). Results showed intriguing differences as a function of the timing of injury, as well as the effects of unilateral versus bilateral injury. Specifically, these researchers found that bilateral focal cortical lesions on P1 or P5 led to worse outcomes than those seen for adult rats with similar lesions. Interestingly, bilateral focal lesions on P10 led to greater sparing and improved performance relative to P1, P5, or adult lesions. Conversely, the effects of complete unilateral cortical ablation were relatively mild when performed < P14, with outcomes far better than were seen for adult rats with comparable ablation. Results clearly seemed to suggest that recovery from unilateral injury—even that of a dramatic nature (e.g., hemidecortication)—is better in developing animals as compared to disruption in which homologous regions of both hemispheres are injured (Kolb, 1995). This intriguing principle could have important significance for the study of language disabilities, wherein researchers have long been puzzled by the fact that massive unilateral temporal lesions in early years still allow for language recovery, while individuals with no discernable neuropathology (at least as identifiable by current neuroimaging technology) can nonetheless exhibit profound language deficits (e.g., in specific language impairment (SLI) and/or dyslexia). 
This paradox suggests that developmental disruptions that occur bilaterally and very early in development (e.g., whole brain genetic or other prenatal risk factors) may lead to profound but subtle alterations in neural circuitry that are difficult to characterize via current technology, and yet could underlie robust changes in behavioral performance.

Additional research studies focused on the impact of early lesion timing (as measured by cognitive outcomes) have been conducted by Stiles and colleagues. These researchers assessed cognitive outcomes in language and visuo-spatial domains among infants and children with focal lesions (reviewed in Stiles et al., 2002). Results showed that: (1) patterns of long-term deficits depend greatly on when childhood lesions are incurred; (2) although early lesions do tend to lead to less pronounced deficits than comparable lesions occurring later, subtle deficits can still be evidenced when the correct tasks are used; and (3) the pattern of outcomes differs (at least in humans) depending on whether lesions occur in the left versus right hemisphere, with children incurring early left lesions showing evidence of greater language preservation and recovery, but children incurring right lesions more likely to show persistent visuo-spatial deficits more comparable to (though not as severe as) effects seen in similarly injured adults. The authors suggest that these disparities could reflect unique aspects of language organization in cortex, such as theories that language is protected at the expense of other domains in the developmental re-organization process (i.e., "crowding effects"). Alternatively, it has been suggested that right hemisphere functions may be phylogenetically "older" and therefore more hard-wired (i.e., more difficult to shift to other uninjured cortical sites). Additional interpretations include the possibility that language functions show resilience to injury because of their more distributed nature, or that the relative timing of neural circuitry underlying language versus visuo-spatial functions may be "protective" to language. 
Overall, these findings have important implications for the study of outcomes in auditory processing following early brain injury, since aspects of auditory processing believed critical to language development (i.e., processing of rapid acoustic signals embedded in spoken language) may be more left-lateralized, while other aspects of auditory processing (e.g., processing of spectral components and music) could be preferentially sub-served by the right hemisphere (Okamoto et al., 2009)—at least in humans. The implications of such findings to small animal models where functional cortical lateralization is less evident remain unclear.

In summary, it is apparent that although long term outcomes following early brain disruption tend to be more adaptive following early injuries, many factors temper this phenomenon, including whether an injury is cortical or sub-cortical, unilateral or bilateral, left or right, and/or whether the incidence of injury occurs in a region and during a period of key neurodevelopmental events (e.g., window of peak proliferation or neuronal migration). In the following section we move to a discussion of research addressing specific neurodevelopmental mechanisms that might contribute specifically to anomalies in auditory processing and subsequent language development, with an emphasis on the possible role of differential plasticity as a function of the timing of early neural disruption.

## **AUDITORY PROCESSING DEFICITS AND LANGUAGE DISABILITY: HUMAN POPULATIONS, AUDITORY PROCESSING, AND ANIMAL MODELS**

## **A NEURAL SIGNATURE FOR DEVELOPMENTAL LANGUAGE DISABILITY?**

Given the "exceptions" to robust recovery from early neurodevelopmental disruption discussed, it is perhaps not surprising that developmental disorders—including developmental language disabilities—do occur, and with notable frequency. However, any early neural "plasticity" in response to underlying causal factors (such as genetic factors or undiagnosed injury or toxins) underlying these very early neurodevelopmental disruptions could reflect alterations in fundamental circuitry that occurred very early and are now hard to detect (i.e., via neuroimaging methods). In fact, a consistent underlying "neuropathological profile" or signature accompanying developmental disabilities of language tends to be subtle at best, and certainly very hard to identify—even by looking for "common denominators" across neurologic profiles of varied populations with developmental language impairment. For example, individuals with SLI and/or dyslexia show relatively subtle anatomic brain changes that require a large sample size for detection (e.g., alterations in asymmetry of the planum temporale and callosal cross-sectional area; see discussion by Leonard et al., 2006; Richardson and Price, 2009, for review). Similarly, populations with early hypoxic-ischemic (HI) injuries resulting from prematurity also exhibit poor language outcomes (among other anomalous cognitive measures; see Section Timing of Early Injury and Auditory Processing Outcomes in Rodent Models for further discussion), yet remarkably "normal" neural profiles (although subtle findings, such as abnormal fractional anisotropy in white matter circuits, appear to correlate with language outcomes in the preterm population; see Feldman et al., 2012). 
These results have puzzled researchers aiming to define the neural profile underlying developmental language disability, and to define specific neural substrates that might be studied in animal models (where experimental variables can be more easily and precisely controlled and studied). The remaining sections of this review further address this issue of a "neural substrate" for language disability, and our efforts to examine the relative impact of timing of "disruption of brain development" on behavioral outcomes relevant to the language domain (specifically, RAP) using rodent models.

### **SUBCORTICAL ANOMALIES AND RAPID AUDITORY PROCESSING DEFICITS IN LANGUAGE DISABLED HUMAN POPULATIONS**

In 1985, Galaburda and colleagues published a groundbreaking report of focal cortical anomalies found *post mortem* in the brains of four dyslexic individuals. Histologic characteristics of these malformations strongly suggested a genesis in prenatal development, since they revealed abnormal placement of neurons within cortical layers (i.e., malformations including ectopias and microgyria). More recently, similar findings have been reported for individuals with developmental language impairment (Oliveira et al., 2007; Brandão-Almeida et al., 2008; Boscariol et al., 2010, 2011). Initially, these findings were thought to implicate a relationship between clinical diagnosis and specific disruption of fronto-temporal regions critical to language processing (since the distribution of anomalies in the affected brains was substantially greater in left perisylvian areas). However, subsequent studies demonstrated additional—lower level—anomalies in the same brains. Specifically, cellular anomalies in the lateral geniculate thalamic nucleus (LGN) and MGN were reported, with an excess of small neurons and a paucity of larger neurons in the thalamic nuclei of the dyslexic brains (Livingstone et al., 1991; Galaburda et al., 1994). In the LGN, this effect was attributed to disruptions specifically to the magnocellular sub-division, although in MGN, similar functional/structural sub-divisions have not been clearly identified (but see Stein, 2001). Moreover, related studies indicate that the reduction in large magnocellular cells of the LGN in dyslexic brains was likely associated with concurrent functional evidence that dyslexic subjects exhibit deficits in processing temporally relevant (magnocellular) aspects of visual information (i.e., low-contrast motion; Lovegrove et al., 1990; Livingstone et al., 1991; Slaghuis et al., 1992; Lehmkuhle et al., 1993).

Evidence of thalamic disruption in dyslexic brains led to a novel conjecture about the relationship between neuropathology and dyslexia. Specifically, the findings suggested that early disruption of developing cortico-thalamic projections could exert a cascading deleterious impact on *lower-level sensory processing*, and thus disrupt initial language development, and/or subsequent online processing (in both cases, a "bottom-up" phenomenon). In fact, recent and intriguing new research has shown processing anomalies at the level of the MGN (auditory thalamus) using neuroimaging technology in adult dyslexics during a phonemic processing task (Díaz et al., 2012). In accord with these findings, evidence of a concurrent reduction in large cells of the MGN of dyslexic brains (Galaburda et al., 1994) has been suggested to relate to consistent and wide-spread evidence that developmentally language disabled populations (including dyslexics) show deficits in processing rapidly changing aspects of auditory information. In fact, an early seminal series of studies by Tallal and colleagues showed that children diagnosed with SLIs were significantly worse than controls in discriminating fast (but not slow) tone sequences, and also were significantly worse than controls in discriminating consonant-vowel syllables with short, rapidly changing formant transitions (e.g., /ba/, /da/, /pa/, /ta/; see Tallal and Piercy, 1973a,b, 1975; Tallal, 1980, 2004; Tallal and Newcombe, 1978; Tallal and Stark, 1981; reviewed in Fitch and Tallal, 2003). 
Ongoing behavioral and psychophysical studies continue to accumulate demonstrating core deficits in RAP in varied developmentally language-disabled populations (McCrosky and Kidder, 1980; Reed, 1989; Robin et al., 1989; Watson, 1992; Neville et al., 1993; Farmer and Klein, 1995; Hari and Kiesla, 1996; Kraus et al., 1996; McAnally and Stein, 1996, 1997; Wright et al., 1997; Witton et al., 1998; Sutter et al., 2000; Renvall and Hari, 2002; Edwards et al., 2004; Cardy et al., 2005; Corbera et al., 2006; Au and Lovegrove, 2007; Cohen-Mimran and Sapir, 2007; Gaab et al., 2007; King et al., 2007).

Notably, although some critics suggest that auditory deficits could be simply co-morbid (parallel but non-causal) to language deficits (McArthur and Bishop, 2001; Rosen and Manganari, 2001; Ramus, 2003), ongoing research has revealed compelling evidence of robust longitudinal prediction. For example, Benasich et al. (2002, 2006) found that infants with a family history of language impairment or dyslexia (i.e., at an elevated risk of developing language problems; Tallal, 1980) were impaired relative to controls in the ability to discriminate two-tone sequences incorporating a short inter-tone interval, but not a longer interval. Longitudinal follow-up of these children revealed a strong relationship between early auditory processing thresholds and language outcomes at 12–24 months in both at-risk and typical groups. More recently, a similar relationship was seen between early AERP/EEG scores (using the same two-tone sequences) and language outcomes (Choudhury et al., 2007; Choudhury and Benasich, 2011). Early auditory processing skills have also been found to predict language performance in typically developing samples. Trehub and Henderson (1996) reported that children who had performed above the median on a variety of acoustic gap detection tasks at 6 or 12 months were found to have larger productive vocabularies, use longer, more complex sentences, and produce more irregular words compared with children who had scored below the median. Such findings are supported by evidence from studies recording event-related potentials (ERPs) to auditory stimuli in infancy. Molfese and Molfese (1997) found that ERPs to consonant-vowel syllables recorded from infants within 36 hours of birth differed according to whether the children's verbal IQ was above the norm at 5 years. 
Similarly, infants with a family history of dyslexia showed different patterns of ERPs to consonant-vowel stimuli as compared to matched controls at 1 week and at 6 months (Leppänen and Lyytinen, 1997; Leppänen et al., 1999; Pihko et al., 1999; see also Leppänen et al., 2012).

Collectively, the data clearly support the notion that the ability to make fine grained auditory discriminations (RAP) is strongly related to later language development, and that deficits in this basic function may impair subsequent language development—with ultimate implications for higher-order processes (such as reading) that are seemingly distal to (i.e., far downstream/upstream from) basic acoustic processing. These and other findings argue convincingly for a relationship between early acoustic processing capabilities (such as might be affected by disruption to auditory thalamic structures such as the MGN), and long-term language outcomes. Based on these links, a theoretical "next step" was to examine the neurodevelopmental underpinnings for this functional language "pre-cursor"—RAP—in a non-human model.

#### **ANIMAL MODELS OF RAPID AUDITORY PROCESSING DEFICITS**

Initial efforts in developing an animal model for RAP deficits focused on evidence that induction of a focal freeze lesion to cortex of a 1-day-old rat pup (performed through the skull cap, which is very thin at this age) would lead to the subsequent formation of a microgyrus, a focal region of cortex characterized by anomalous cortical layers (thus indicative of abnormalities in migration; see **Figure 2**). Microgyri induced in this manner were found to be remarkably histologically similar to the microgyria identified by Galaburda et al. (1985) in *post mortem* human dyslexic brains (Dvorák and Feit, 1977; Humphreys et al., 1991; Rosen et al., 1992). Subsequent research revealed that rats with induced unilateral or bilateral microgyria consistently evidenced deficits in RAP that were remarkably similar to those seen in children and adults with language dysfunction (note that auditory processing deficits were greater for rats with bilateral microgyria and/or bilateral double microgyria; Fitch et al., 1994; Clark et al., 2000a,b; Rosen and Manganari, 2001; Peiffer et al., 2002, 2004b; Threlkeld et al., 2009). Moreover, these same microgyric rats showed anatomic disruptions in the MGN, also similar to those seen in human dyslexic brains (i.e., a shift in cell size distribution toward smaller cells as compared to sham MGN; Herman et al., 1997; see also Peiffer et al., 2002). It remains unknown whether the shift in cell size in the MGN associated with these induced cortical anomalies reflects a loss of large MGN neurons, or some other developmental mechanism.

Importantly, the behavioral RAP deficits found in microgyric rats were seen concomitantly with normal performance (comparable to shams) on easier acoustic tasks that *did not incorporate a temporal demand*, such as simple tone detection, long silent gap detection, or discrimination of two-tone sequences with longer inter-stimulus intervals. Moreover, additional research revealed that microgyria-induced RAP deficits were particularly evident in juvenile rats, as compared to these same subjects when tested in adulthood (Friedman et al., 2004; Peiffer et al., 2004a). Specifically, whereas young microgyric rats exhibited RAP deficits that could be elicited on relatively simple (but still temporally demanding) tasks such as short gap detection, more complex rapid processing tasks (such as discrimination of two-tone sequences with short intra-stimulus intervals) have been used to elicit more robust deficits in older microgyric rats. These findings may parallel similar developmental trends seen in child versus adult human dyslexic populations—specifically, that impairments in silent gap detection thresholds are seen in dyslexic children, but are no longer seen in dyslexic adults (Hautus et al., 2003). Also, these findings may be consistent with suggestions that while some of the more basic sensory processing deficits in language disabled populations may remediate with age, the long-term consequences of those early deficits (as measured by language processing) may persist.

Given these multiple parallels between the emergence of RAP skills in an animal model and human clinical data, we set out to explore more specifically the parameters governing the relationship between early brain disruption and auditory discrimination outcomes in a rodent model. This approach included a series of studies examining the relative impact on long-term processing of rapidly changing acoustic information following different *types* of early brain injury, as well as different *timing* of injuries, in efforts to provide insights about how and when the brain might respond to disruptions/injuries that are relevant, in human populations, to long-term language outcomes (for further discussion see Fitch et al., 1997a; Fitch and Tallal, 2003). Importantly, for all of the studies described below, easier versions of acoustic tasks were also used to ensure that impaired subjects could hear and process basic sounds. These distinctions are critical in pointing out that we are not modeling a generalized learning or sound processing disorder but rather a deficit specific to the processing of rapidly changing (short duration) acoustic stimuli.

## **TIMING OF EARLY INJURY AND AUDITORY PROCESSING OUTCOMES IN RODENT MODELS**

#### **A RAT MODEL OF CORTICAL NEURONAL MIGRATION ANOMALIES AND RAP OUTCOMES**

To further examine the underlying neurodevelopmental events that may contribute to functional RAP deficits, we investigated silent gap detection capabilities in juvenile and adult rats that received bilateral freezing lesions or sham surgery on P1, P3, or P5 (Threlkeld et al., 2006), following the procedure described earlier that leads to cortical microgyria when performed on P1 (as described in Section Animal Models of Rapid Auditory Processing Deficits, see **Figure 2**). The behavioral task was developed based on the widely held view that the ability to detect a very brief silent gap in a white noise background is a good measure of fine-grained temporal acoustic acuity, particularly at very short durations. As such, we employed gap durations between 0 and 10 msec (although easier/longer duration stimulus versions of the task were also used for comparison). This silent gap detection task was embedded in a pre-pulse inhibition paradigm (allowing us to assess rodent processing thresholds without the need for training, and thus without learning confounds; Fitch et al., 2008). The timing of the lesions on P1, P3 and P5 was selected based on evidence that, relative to human neurodevelopmental milestones, these dates would correspond roughly to human GAs of 20, 25, and 30 weeks (i.e., prenatal development; Clancy et al., 2001; Workman et al., 2013). Importantly, the critical neurodevelopmental events ongoing in the rat brain during this period include the end of neuronal migration to upper cortical layers, which is largely completed by P2–3 in rats (although cortical neuronal migration is entirely prenatal in humans). Consistent with this timeline, our histology revealed classic "microgyria" in P1 and P3 focal lesioned rats, but *not* in the P5 lesion group (which only showed evidence of glial cortical scarring). 
We also found a significant reduction in brain weight and neocortical volume in P1 and 3 lesioned (microgyric) brains relative to shams (Threlkeld et al., 2006), as well as graded reduction in the size of the corpus callosum that was most evident in P1 lesioned (microgyric) subjects (Threlkeld et al., 2007b). In terms of behavioral outcomes, RAP scores (on the 0–10 msec silent gap task) from subjects in the juvenile period revealed significant RAP deficits in *all three lesion groups* as compared to sham subjects, but adult (P60+) data revealed a persistent disparity *only between P1-lesioned (microgyric) rats and shams* (Threlkeld et al., 2006; **Figure 3A**).

Importantly, we have reported previously that the cortical location of lesion/microgyria induction is not a critical variable in eliciting later RAP deficits in rats (Herman et al., 1997). That is, focal bilateral lesions induced in *parietal, visual, or pre-frontal* cortex were all found to lead to RAP deficits in rats (as measured on silent gap detection tasks), regardless of lesion location. In fact, the standard microgyria induction protocol for the P1/3/5 timing study described above used lesion induction directed at *parietal* and not temporal cortex (Threlkeld et al., 2006). Thus convergent data suggest that some form of *generalized* pathology affecting overall neocortical and/or cortical/sub-cortical development is responsible for these emergent RAP deficits, rather than factors specific to the local formation of microgyria in auditory cortical areas *per se*. This is also consistent with evidence (described above) that the window for the induction of RAP deficits via focal disruption of cortical neuronal migration is constrained to *the window during which neuronal migration occurs* (i.e., < P3 in rats). We hypothesize that a disruption to the formation of cortical layers in a focal region of cortex (through ischemic necrotic death of middle cortical layers) may initiate a cascade of developmental changes impacting on cortico-thalamic connectivity—leading in turn to developmental changes in the thalamus itself. This latter view is consistent with evidence that cellular changes in the MGN induced by microgyria formation are *also* seen regardless of microgyria location in cortex (Herman et al., 1997). Interestingly, even though the development of cortico-thalamic projections is still ongoing in rats at P5 (Diaz and Gleeson, 2009), the fact that cortical layering is largely established at this time may minimize the subsequent disruption to the developmental cascade, with *transient* rather than permanent effects evident on functional auditory processing (RAP) in rats that received a focal lesion when cortical layers were largely in place (P3–5; Threlkeld et al., 2006; **Figure 3A**).

adulthood, only P1 lesioned (microgyric) subjects were significantly impaired as compared to shams. Data adapted from Threlkeld et al. (2006). **(B)** For subjects with HI induced on P1 and P7, both groups again showed deficits in the juvenile period, but only P7 HI remained impaired relative to matched shams in adulthood. Data adapted from McClure et al. (2006).

### **A RAT MODEL OF PREMATURE VERSUS TERM HYPOXIC-ISCHEMIC (HI) INJURY AND RAP OUTCOMES**

In addition to collective findings linking cortical neuronal migration anomalies with deleterious long-term language outcomes, other forms of early brain disruption are also associated with impaired long-term language outcomes. In particular, a major cause of brain injury among neonates involves HI injuries, reflecting compromised blood and/or oxygen delivery to the brain.

In premature/very low birthweight (VLBW) infants, brain injury can arise due to fragile cerebral vascular systems as well as poor auto-regulation. Specifically, blood pressure fluctuations can lead to ruptures, which in turn can result in intraventricular hemorrhage (IVH; bleeding within the ventricles) or periventricular hemorrhage (PVH; bleeding surrounding the ventricles; Volpe, 1997, 2009). Ischemic re-perfusion failure, characterized by collapse of capillaries during low blood pressure fluctuations followed by failure to re-perfuse, can also lead to nonhemorrhagic HI injury (e.g., periventricular leukomalacia, PVL; Volpe, 2001). PVL is associated with a loss of white matter surrounding the ventricles. Similarly, HI injuries can arise in term infants, typically following birth complications such as cord prolapse, placental disruptions/failure, and/or cord asphyxia (Johnston et al., 2001; Volpe, 2001; de Vries and Cowan, 2009; Lai and Yang, 2011). Due to the more global nature of these insults, full term infants with HI events are more commonly diagnosed with hypoxic ischemic encephalopathy (HIE), and show damage in predominantly gray matter areas such as cortex, hippocampus, basal ganglia, and thalamus (Huang and Castillo, 2008; Martinez-Biarge et al., 2011).

Not surprisingly, both preterm and term HI populations exhibit long-term disruptions in language abilities. For example, children born very prematurely are at elevated risk for early language delays (Foster-Cohen et al., 2007), and show deficits in spelling, reading, and writing, as well as receptive and expressive language (Ortiz-Mantilla et al., 2008; Luu et al., 2009; Van Lierde et al., 2009). Early language measures also predict later language scores in this population, for example with comprehension scores at 4 years correlating with later performance on language comprehension, naming, and auditory discrimination tasks (Jansson-Verkasalo et al., 2004). At age 6, these same subjects showed alterations in mismatch negativity during naming tasks and difficulty in pre-attentively discriminating changes in syllables (Jansson-Verkasalo et al., 2004). Full term infants with moderate to severe HIE also show receptive language, reading and spelling scores in childhood that are significantly lower than healthy full term control scores (Badawi et al., 2001), and correlations can be found between verbal IQ and degree of injury (Steinman et al., 2009). Importantly, researchers have also demonstrated that children diagnosed with severe PVL lesions at birth show deficits on RAP tasks later in childhood (Downie et al., 2002), opening the door to behavioral assessments of RAP in animal models of induced neonatal HI injury as a possible window to neuropathological underpinnings of language difficulties in this population.

Fortunately, animal models can provide further insight into the neuroanatomical and behavioral features of neonatal HI injury, for example using the Rice-Vannucci method (Vannucci and Vannucci, 2005). This model entails cauterization of the right common carotid artery followed by exposure to a less than normal oxygen environment for a period of time (typically 8% oxygen (as opposed to the normal 21% partial pressure) for 90–150 min; Vannucci and Vannucci, 2005). Induction of HI injury using this method in rodents between P1–5 can produce injuries that correspond roughly to those seen in premature/VLBW infants with HI, including ventriculomegaly and predominantly white matter damage (much like human PVL; Scafidi et al., 2009; **Figure 4**). Conversely, injury induced between P7–10 leads to neural anomalies that appear to correspond to term birth HI injury, with gray matter damage predominating (as in the case of HIE; Vannucci and Vannucci, 2005; **Figure 4**). These differential neuropathological profiles open the door to experimental assessment of the impact of timing of induced HI on neuropathology and associated long-term RAP profiles.

Recently, we performed a study to characterize the similarities/differences in RAP and other behavioral outcomes following early (P1–3) and late (P7) HI injury in rats. Male rats with comparable HI (same period of hypoxia) but induced on P1/P3 *or* P7, as well as sham controls, were tested on a variety of behavioral tasks in both juvenile and adult periods. Results showed that all groups could hear normally, and could comparably perform simple sound processing tasks (e.g., single tone detection and long-gap detection). However, P1/P3 HI animals showed only *transient* deficits on RAP tasks (in the juvenile period but not in adulthood) as compared to shams (McClure et al., 2006; Alexander et al., submitted-1; **Figure 3B**). P7 HI animals, conversely, exhibited *persistent* deficits in processing rapid acoustic information across both juvenile and adult periods (see **Figure 3B**). Also, P1–3 HI animals did not show any significant reductions in brain volume that we could detect, although substantial reductions in the volume of right cerebral cortex, hippocampus and striatum (as measured by stereologic reconstruction) were seen in P7 HI rats. Interestingly, P7 HI rats also showed a significant shift to more small and fewer large neurons in the MGN, an effect that was *not* seen in P3 HI subjects (Alexander et al., submitted-2).

Here, our results appear to contradict the findings of Threlkeld et al. (2006), where we found that a focal induced ischemic lesion leading to the formation of microgyria had the most significant effects on RAP, brain weight, and callosal area when performed on P1, rather than P3 or P5 (**Figure 3A**). Using an induced HI injury, we found virtually an opposite effect—that subjects with P1/3 HI had only transient RAP deficits, while those with P7 HI had *permanent* robust RAP deficits as well as significant loss of neural tissue in a variety of regions, and a shift in MGN cell size towards more small and fewer large neurons (an effect also seen in microgyric rats when lesions were induced on P1; Herman et al., 1997; **Figure 3B**). The possible implications of these combined findings are discussed further below.

## **AUDITORY EXPERIENCE AND AMELIORATION OF AUDITORY DEFICITS IN RODENT MODELS**

A key final note is that we have found an important role for age of testing in eliciting RAP deficits associated with early neural disruption, and we have also found an impact of prior experience on outcomes during later testing. Specifically, we performed a study in which male rats received bilateral induced microgyria (via focal ischemic cortical lesions on P1, see Section Animal Models of Rapid Auditory Processing Deficits), while comparable sham littermates were retained as baseline controls. In addition, a subset of these animals were tested on auditory processing tasks as juveniles, while their counterparts remained undisturbed until adult testing, when all animals again received a full battery of auditory discrimination assessments. Results were extremely intriguing. First, test results from naïve juvenile rats compared to naïve adult rats showed a small maturational improvement in auditory processing acuity (with better performance and lower thresholds in adults). Second, results showed that the performance of adult shams that *received juvenile testing* improved orders of magnitude more than was seen from endogenous (undisturbed) maturation alone. Third, we found that the microgyria-associated deficits in RAP, which were significant in our juvenile samples, were *no longer seen* when these same rats were tested as experienced adults. However, when examining the naïve adult cohort, significant deficits among the microgyric subjects on RAP tasks *were* found (Threlkeld et al., 2009). These results point to critical issues regarding the role of assessment in defining disorders specifically where prior experience has occurred, such that underlying deficits may be masked or even remediated. 
Normal human development entails substantial experience of varied and complex nature, and thus our ability to assess and define critical underlying processing deficits in older populations (as is necessary to disentangle the neurologic and behavioral underpinning of higher-order dysfunction) is called into question. In fact, the results described above may help to explain why evidence of basic deficits can be subtle or may even fail to be replicated across studies with clinical language disabled populations. On the other hand, the ability to test infants and small children is constrained by our inability to diagnose language difficulty until relatively late milestones fail to be achieved. This conundrum represents a huge issue in human clinical language disability research, and clearly highlights one reason that animal research is crucial to a complete understanding of the mechanisms at play in the complex process of emergent developmental disorders of language.

## **DISCUSSION**

The research reviewed here highlights several principles of developmental response to brain injury and the role of timing. First, it is clear from a vast literature that the young brain is indeed "plastic," and in many cases can respond more effectively to external input (as measured by learning) when compared to the adult brain. Moreover, these adaptive features of plasticity can extend to the response of the developing brain to disruption, where positive compensatory and/or adaptive responses to injury (leading to functional optimization) are often seen in the young brain. However, this latter extension of the beneficial effects of "early plasticity" must be qualified. Specifically, the parameters determining whether the developmental response to a disruption (injury, mutation, toxin) will be "adaptive" or "maladaptive," or what Giza and Prins (2006) call "good" versus "bad" plasticity, remain something of a mystery. Indeed, it is difficult to ascertain how brain mechanisms *can* be uniquely responsive to immediate cues in deploying patterns of reorganization that will provide an optimal compensatory outcome down the road (as opposed to an *even more deleterious* behavioral outcome) following a given disruption. And the answer may be that the brain is only *co-opting* existing mechanisms that evolved to support maximal early development and learning. Indeed, it seems unlikely that evolutionary pressures acted directly on mechanisms of response to brain injury *per se*, since strong reproductive contributions after such injuries seem unlikely. Accordingly, in some cases, *patterns of re-organization and/or compensation to early disruption may actually be worse than if no re-organization had occurred at all* (Schneider, 1979; Giza and Prins, 2006).

### **WINDOWS OF VULNERABILITY**

Certainly, experiencing injury or disruption during a window of heightened vulnerability (i.e., a period of critical processes) can be one impediment to optimal reorganization. In this case, disruption of a key process that cannot be reproduced or mended by following an "alternate route" appears to cause permanent deleterious consequences. Periods of peak neurogenesis, for example, represent windows of particular vulnerability for fetal exposure to radiation and toxins (Rice and Barone, 2000). Similarly, cortical disruption during peak periods of migration, or during later periods of neuronal maturation and critical synaptogenesis, may lead to long term deficits that might not be seen in response to the same disruption at a slightly earlier or later point. We suggest that this interpretation may explain why focal freezing lesions to the cortical plate that produce microgyria lead to lasting RAP deficits when induced on P1, but not on P5 after neuronal migration is completed (Threlkeld et al., 2006). These cortical anomalies may in turn lead to deleterious developmental changes that ultimately alter sub-cortical functions (e.g., MGN; Herman et al., 1997).

This interpretation is consistent with findings from other developmental manipulations we have performed that also produce RAP deficits in rodent models, for example the *in utero* knock-down of dyslexia risk genes. Specifically, evidence has identified both Kiaa0319 and Dyx1c1 as risk genes for dyslexia, and concurrent animal research shows that both genes are involved in regulating early neuronal cortical migration (E14–P3 in rats; Galaburda et al., 2006). Accordingly, the RNAi knock-down of the rodent homologs for these proteins (transfected into newborn ventricular zone neurons) would be expected to impair the cortical neuronal migration process—and in fact, migrational anomalies are seen in the cortex of both Kiaa0319 and Dyx1c1 RNAi rodent models (Galaburda et al., 2006). Importantly, we found also that these *in utero* manipulations led to later RAP deficits in these same rats (Threlkeld et al., 2007a; Szalkowski et al., 2012, 2013). Recently published related research has also demonstrated anomalies in neuronal encoding of speech stimuli in cortical neurons from rats transfected embryonically with Kiaa0319 (Centanni et al., 2013). And, in Dyx1c1 RNAi animals, a shift in cell size of the MGN (towards more small and fewer large cells) was also found (Szalkowski et al., 2013). Again, these findings point to the critical consequences of disrupting cortical neuronal migration.

#### **BILATERAL VERSUS UNILATERAL INJURY**

Another factor in interpreting the experimental data presented here is that induced cortical microgyria (though small and focal) were *bilateral* (see Section Animal Models of Rapid Auditory Processing Deficits for details), whereas our more severe HI injury was *unilateral* (noting that some injury to hemisphere contralateral to carotid ligation can occur, but most pathology measures fail to show significant cell death or tissue loss from the period of reduced oxygen alone). Thus our HI findings appear consistent with those of Kolb (1995), who showed that recovery from very early *bilateral* injuries is particularly poor, whereas rats showed remarkable preservation of cognitive skills following complete hemi-decortication during this same early window (Kolb, 1995). Indeed, the bilateral nature of our induced microgyria versus unilateral HI injury could account in part for different patterns of outcome on RAP tasks. In support of this view, a related study examined auditory outcomes as measured by A1 neuronal recordings in rats subjected to complete anoxia (0% oxygen) for about 15 min on P1 and again on P2. Here—unlike induced HI injuries that employ a coupling of unilateral carotid ligation and prolonged reduced oxygen to produce a unilateral injury—rats were subjected to a very severe anoxic incident impacting both hemispheres equivalently. Interestingly, the authors of this study found that after an anoxic incident on P1 and P2, acoustic responses from neurons in A1 as measured in adulthood (P90+) were significantly degraded (Strata et al., 2010). Changes included broader tuning curves, increased latencies, reduced response amplitudes, and a degraded capacity to follow high-rate repetitive stimuli. 
The authors suggest that although measures were recorded from the cortex, anomalies in processing may very well have arisen at lower levels of the auditory system (e.g., CN, IC, or medial geniculate; see also Strata et al., 2005), but anomalies did not appear to reflect direct damage to cochlear mechanisms (based on histology; Strata et al., 2010). Although Strata and colleagues did not perform behavioral assessments, these findings supplement our own in showing that a severe *bilateral* developmental disruption on P1 or P2 can produce lasting deficits in acoustic signal processing, even though a severe *unilateral* HI injury on P1 or P3 failed to exert permanent effects on RAP (McClure et al., 2006; Alexander et al., submitted-1).

### **CORTICAL VERSUS SUBCORTICAL ANOMALIES**

Here we return again to the evidence that—across various developmental rodent models we have successfully employed to elicit persistent behavioral RAP deficits (including P1 induced cortical microgyria, P7 HI, and *in utero* RNAi transfection with Dyx1c1)—animals demonstrating RAP deficits *consistently also* show significant cellular anomalies in the MGN. In the P1 microgyria model, these anomalies *must* be secondary to cortical disruption, since no direct injury was induced in thalamus. Similarly, in RNAi knock-down of Dyx1c1, MGN anomalies are seen along with migrational abnormalities in cortex (including ectopias, microgyria and band heterotopias) that reflect direct transfection of newborn cortical neurons in the ventricular zone. In the case of induced HI or anoxia, of course it is possible that damage to the MGN occurs through direct injury, but equally possible that the substantial injury to cortex exerts deleterious effects via corticothalamic developmental feedback. Overall, cumulative findings consistently point to the fact that, despite the relatively profound plasticity of the developing cortex (i.e., evidence that many early cortical injuries can be compensated through reorganization and/or other forms of plasticity), injuries that trigger developmental changes that *cascade into sub-cortical structures* may lead to profound and lasting deficits for which the developing system is unable to compensate. Such effects may be particularly profound when they occur in neural substrates upon which critical and distributed cognitive processes—such as language—are built.
This assertion is consistent with evidence that subcortical indices of speech/language processing are highly predictive of higher order language difficulties in children (Hornickel et al., 2009; Díaz et al., 2012), that language impairments associated with sub-cortical anomalies tend to be more severe (Aram and Eisele, 1994), and also that volumetric measures in subcortical regions can accurately predict language outcomes (Ortiz-Mantilla et al., 2010).

## **CONCLUSION**

Cumulative evidence presented here suggests that developmental neuronal reorganization triggered by disruption—regardless of when (within the early postnatal window we examined) or how disruption occurs—that alters *subcortical* development in some way may have particularly maladaptive consequences for later ability to process rapidly changing acoustic information. This latter point may explain why profound impairments in critical processes such as language can be evidenced even when a brain appears to be anatomically "normal" at a gross level. In effect, developmental "rescue" mechanisms may have been deployed in response to whatever underlying deviations occurred (i.e., genetic, toxins, injury), yet these mechanisms failed to prevent a deleterious functional outcome. In fact, re-organizational mechanisms as implemented may have produced *worse* outcomes.

Moreover, negative consequences may be particularly pronounced as measured by processes that are highly dependent on speed of processing (which requires optimal neural efficiency)—such as the discrimination of rapidly changing sensory input.

In closing, a review of our data—in combination with that of many others—supports the position that the developing brain responds very differently to injury as compared to the adult brain, and that in many cases this response is in fact adaptive. Indeed, infants and children show overall better cognitive outcomes following injuries and disruptions that would be devastating to an adult brain. On the other hand, some complex higher order processes—particularly the unique process of language (which requires optimal processing at *both* low levels of the auditory system (to encode speech), as well as optimal complex encoding at higher levels of cortical language-specific areas)—can be particularly vulnerable to developmental shifts that *alter critical subcortical processing stations*. And although RAP deficits may have minimal effect on species survival and evolutionary fitness in a non-lingual species such as rodents, humans—who have evolved complex higher order processes that are integral to the ability to function in society—show devastating behavioral impairments that include disruptions of critical language and reading development. Future research is needed to address how reorganizational mechanisms leading to alterations in subcortical morphology might be triggered, and how interventions might be employed to guide the developing CNS to an optimized neural "system" following disruption that would preserve RAP functions critical to later language development.

## **ACKNOWLEDGMENTS**

The authors wish to acknowledge Dr. Albert Galaburda for theoretical contributions to the work described here. Research was supported by NIH Grant HD049792, and by NIH grant P01HD57853.

#### **REFERENCES**

rodents, monkeys, and humans. *Hippocampus* 3, 191–201.

doi: 10.1016/j.neuropsychologia.2005.06.004

*Dev.* 33, 824–831. doi: 10.1016/j.braindev.2010.12.006

*Neurosci.* 9, 1213–1217. doi: 10.1038/nn1772

doi: 10.1097/00005072-199103000-00006

ablation produce similar behavioral sparing but opposite effects on morphogenesis of remaining cortex. *Behav. Neurosci.* 97, 154–158. doi: 10.1037/0735-7044.97.1.154

eds M. H. Johnson, Y. Munakata, and R. O. Gilmore (Baltimore, MD: Brooke Publishing), 57–82.

dysfunction? *Curr. Opin. Neurobiol.* 13, 212–218. doi: 10.1016/s0959-4388(03)00035-7

*Hear. Res.* 1–2, 27–38. doi: 10.1016/s0378-5955(96)00178-5

behavioral impairments in rats following in utero RNAi of candidate dyslexia risk gene Kiaa0319. *Int. J. Dev. Neurosci.* 30, 293–302. doi: 10.1016/j.ijdevneu.2012.01.009

mental cortical injury differentially alters corpus callosum volume in the rat. *BMC Neurosci.* 8:94. doi: 10.1186/1471-2202-8-94

species. *J. Neurosci.* 33, 7368–7383. doi: 10.1523/jneurosci.5746-12.2013

influences of early acoustic environments on primary auditory cortex. *Nat. Neurosci.* 4, 1123–1130. doi: 10.1038/nn745

cortical temporal processing restored by training. *Nat. Neurosci.* 12, 26–28. doi: 10.1038/nn.2239

**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 17 July 2013; accepted: 11 September 2013; published online: 21 October 2013.*

*Citation: Fitch RH, Alexander ML and Threlkeld SW (2013) Early neural disruption and auditory processing outcomes in rodent models: implications for developmental language disability. Front. Syst. Neurosci. 7:58. doi: 10.3389/fnsys.2013.00058*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2013 Fitch, Alexander and Threlkeld. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Plasticity after perceptual narrowing for voice perception: reinstating the ability to discriminate monkeys by their voices at 12 months of age

## *Rayna H. Friendly<sup>1</sup>, Drew Rendall<sup>2</sup> and Laurel J. Trainor<sup>1,3</sup>\**

*<sup>1</sup> Department of Psychology, Neuroscience and Behaviour, McMaster University, Hamilton, ON, Canada*

*<sup>2</sup> Department of Psychology, University of Lethbridge, Lethbridge, AB, Canada*

*<sup>3</sup> Rotman Research Institute, Baycrest Centre, Toronto, ON, Canada*

#### *Edited by:*

*Virginia Penhune, Concordia University, Canada*

#### *Reviewed by:*

*Minna Huotilainen, University of Helsinki, Finland*

*Psyche Loui, Wesleyan University, USA*

#### *\*Correspondence:*

*Laurel J. Trainor, Department of Psychology, Neuroscience and Behaviour, McMaster University, 1280 Main Street West, Hamilton, ON L8S 4L8, Canada e-mail: ljt@mcmaster.ca*

Differentiating individuals by their voice is an important social skill for infants to acquire. In a previous study, we demonstrated that the ability to discriminate individuals by voice follows a pattern of perceptual narrowing (Friendly et al., 2013). Specifically, we found that the ability to discriminate between two foreign-species (rhesus monkey) voices decreased significantly between 6 and 12 months of age. Also during this period, there was a trend for the ability to discriminate human voices to increase. Here we investigate the extent to which plasticity remains at 12 months, after perceptual narrowing has occurred. We found that 12-month-olds who received 2 weeks of monkey-voice training were significantly better at discriminating between rhesus monkey voices than untrained 12-month-olds. Furthermore, discrimination was reinstated to a level slightly better than that of untrained 6-month-olds, suggesting that voice-processing abilities remain considerably plastic at the end of the first year.

**Keywords: voice discrimination, perceptual narrowing, infant development, plasticity, cross-species experience, learning**

## **INTRODUCTION**

Human perception becomes specialized for socially relevant information in faces, voices, music, and language through a process of *perceptual narrowing,* whereby perception improves for native stimuli experienced in the environment, and becomes worse for foreign stimuli not experienced in the environment (for reviews, see Scott et al., 2007; Lewkowicz and Ghazanfar, 2009). Perceptual narrowing likely contributes to facilitating identification of individuals within one's group and becoming a fully functioning member of that group. This specialization enables people to discriminate between individuals, identify group and species members, and discern who is from one's own group and who is from outside one's group in order to help inform decisions such as whether to approach or withdraw from a situation. A number of studies indicate that, although perceptual narrowing appears to be largely accomplished by the end of the first year after birth, a certain amount of plasticity remains beyond this age. Here we examine whether around 12 months of age a relatively small amount of experience with voices from a foreign species with which infants have little prior experience can reinstate the ability to discriminate pairs of voices from that foreign species.

An advantage for processing differences in native compared to foreign stimuli by 12 months of age has been documented across a number of domains, including faces (e.g., Pascalis et al., 2002; Lewkowicz and Ghazanfar, 2006; Kelly et al., 2007; Pons et al., 2009; Simpson et al., 2010), voices (Friendly et al., 2013), music (e.g., Lynch et al., 1990; Hannon and Trehub, 2005a,b; Trainor, 2005; Trehub and Hannon, 2006; Hannon and Trainor, 2007), language (e.g., Werker and Tees, 1984; Kuhl et al., 1992, 2006; Polka and Werker, 1994; Tsao et al., 2000; Kuhl, 2004, 2008; Palmer et al., 2012; for reviews see Werker and Tees, 2005; Curtin and Werker, 2007), and even action (Loucks and Sommerville, 2012). For example, 6-month-olds are equally good at discriminating two monkey faces as they are at discriminating two human faces, but 9-month-olds and adults are much better with human faces (Pascalis et al., 2002). Similarly, in the language domain, 6-month-old infants are equally good at discriminating consonant speech sounds from two native or two foreign phonemic categories whereas 10- to 12-month-olds are much better with native categories (e.g., Werker and Tees, 1984; Kuhl et al., 2006).

Although less studied than speech and faces, voices are important social stimuli for infants. People can be identified by the unique characteristics of their voices, which is especially useful when visual cues are poor. Voices also provide cues to the listener about the size, gender, and age of a talker (e.g., Smith and Patterson, 2005). Recently, Johnson et al. (2011) found that by 7 months of age, infants exhibit an own-language effect when discriminating voices, just as they exhibit an own-race effect when discriminating faces, with better discrimination of voices speaking their native language compared to a foreign language. Subsequently, Friendly et al. (2013) found evidence of perceptual narrowing for voices toward the end of the first year after birth. Specifically, between 6 and 12 months of age, infants became significantly worse at discriminating between two foreign-species (rhesus monkey) voices, but marginally better at discriminating between two native-species (human) voices.

The important role of experience in perceptual narrowing is indicated by research showing that sensitivity to foreign stimuli can in some cases be *maintained* with exposure to a foreign stimulus during the period when loss typically occurs (e.g., Burns et al., 2003; Pascalis et al., 2005; Scott and Monesson, 2009, 2010) or *reinstated* with exposure to a foreign stimulus after the period of loss (e.g., Kuhl et al., 2003; Hannon and Trehub, 2005b; Anzures et al., 2012). For example, Pascalis et al. (2005) found that after 2 weeks of daily exposure to monkey faces at 6 months of age, followed by 2.5 months of less frequent exposure, 9-month-olds maintained the ability to discriminate a novel set of monkey faces at a level comparable to that shown at 6 months of age. With respect to reinstatement, infants who received training at 8- to 12-months of age with other-race faces (Anzures et al., 2012), musical (Hannon and Trehub, 2005b), or linguistic (Kuhl et al., 2003) stimuli foreign to their environment, demonstrated successful processing of those stimuli, whereas their untrained counterparts did not. In the present paper, we investigate the generality of reinstatement by testing whether experience with rhesus monkey voices around 12 months of age can reverse the decrement in ability to discriminate such voices that has been documented in previous research (Friendly et al., 2013).

In designing our training protocol, we considered two features that appear to be important for exposure or training in infancy to result in learning. The first is individual-level encoding (Archambault et al., 1999; Waxman and Braun, 2005; Scott et al., 2006, 2008; Scott and Monesson, 2009, 2010). In the visual domain, Scott and Monesson (2009) repeated the Pascalis et al. (2005) monkey-face maintenance study, but used three different types of training regimen. One group of infants was trained at the individual level, similar to the infants in Pascalis et al.'s (2005) study, with parents labeling each monkey face with its own unique name (e.g., Dario, Flora, Boris, etc.). A second group of infants was trained at the category level, with parents labeling all monkey faces with the identifier "monkey." The third group of infants was trained through passive viewing alone, with no label being read by parents during exposure. Interestingly, only the infants who received individual-level training with monkey faces showed maintenance of the ability to discriminate novel monkey faces at 9 months of age. Scott and Monesson (2009) concluded that individual-level exposure was critical for obtaining the effects found in Pascalis et al.'s (2005) study because it focused infants' attention on what features were unique in each face, rather than on what the monkey faces had in common. Similarly, Anzures et al. (2012) successfully reinstated sensitivity to foreign-race faces in 8- to 10-month-olds after exposing them to daily videos of foreign-race women who introduced themselves by name. However, this was not compared to a no-name condition, making it unclear if labeling the faces by name influenced reinstatement in this study. Nevertheless, considering the findings from Scott and Monesson (2009), we designed our training procedure such that particular monkey voices would be associated with unique monkey names and characters.

The second feature that has been found to promote improved learning is social interaction. Kuhl et al. (2003) demonstrated that English-learning infants who were exposed to 12 sessions of Mandarin Chinese training between 9 and 10 months of age showed a reinstatement in the ability to discriminate between Mandarin-specific phonemes at 10 to 11 months of age only if they interacted with the Mandarin speakers in person. Infants who received the same training in the form of an audiovisual video or who received audio-alone training did not show a reversal of the decline in the ability to discriminate Mandarin phonemes. Similar benefits of social interaction have been found for native-language training using Baby Einstein© videos, where infants required parental interaction in order to learn the words that were featured in the videos (Richert et al., 2010). Likewise, active musical interaction between infants and parents has been found to lead to earlier musical pitch enculturation compared to passive music listening (Gerry et al., 2012; Trainor et al., 2012). On the other hand, Anzures et al. (2012) found evidence of reinstatement, at 8 to 10 months of age, of sensitivity to foreign-race faces after 3 weeks of daily exposure to audio-visual videos of other-race females, and Hannon and Trehub (2005b) found reinstatement of sensitivity to foreign musical rhythms in 12-month-old infants who passively listened to these rhythms at home on a CD. However, while parents were instructed to avoid drawing the infants' attention to the music and instead to go about their regular routines, the CD may have been played during some type of social interaction between the parent and infant.

In the present study, we gave 11.5-month-old infants specific exposure to rhesus monkey voices under conditions that promoted individual-level encoding in the context of social interaction. In particular, we designed a storybook and accompanying audio CD narration that parents listened to with their infants twice a day for a 2-week period. The storybook contained a number of exemplars of each of four monkey characters' voices. Following this exposure, we tested infants' ability to discriminate a new set of monkey voices. We compared their discrimination of monkey voices to that of the 12- and 6-month-olds in Friendly et al. (2013) who did not receive any training. We aimed to determine whether this exposure would reinstate the ability to discriminate the monkey voices to the original level found at 6 months of age, before perceptual narrowing was fully underway.

## **MATERIALS AND METHODS**

#### **PARTICIPANTS**

"fpsyg-04-00718" — 2013/10/9 — 12:02 — page 2 — #2

Twenty-four infants (mean age = 12.0 months, SD = 0.19 months at the time of testing; 10 females) received 2 weeks of monkey-voice training prior to testing (Trained-12 month Group). They were compared to two groups of infants from our previous study that were tested in the identical procedure, but who did not receive any training (Friendly et al., 2013). One group was also 12 months of age (*n* = 24, mean age = 12.0 months, SD = 0.19 months; 9 females; Untrained-12 month Group) and the other group was 6 months of age (*n* = 24; mean age = 6.1 months, SD = 0.22 months, 11 females; Untrained-6 month Group). Parents gave informed consent and reported normal hearing for all infants. Parents also reported all infants as hearing English 98–100% of the time in their home environment. An additional 20 infants across the three groups were excluded from the final sample due to fussiness (*n* = 4), failure to pass the familiarization phase of testing (see Procedure section 2.3.2 below; *n* = 7), receiving fewer than 23 training sessions (*n* = 5), being too old at time of testing (*n* = 1), and hearing non-English languages more than 2% of the time in their home environment, as reported by parents (*n* = 3).

#### **STIMULI AND APPARATUS**

#### *Training stimuli*

A CD-narrated picture storybook was created for the monkey voice training. Entitled "Beach Day for the Monkey Family," it contained colorful illustrations of four members of the Monkey Family going out for a day at the beach. Each monkey (labeled as Daddy, Mommy, Sister and Brother Monkey) was shown individually on a separate page, in consecutive order, six times throughout the storybook (for an example, see **Figure 1**). The CD-narration for the storybook was read by a monolingual English-speaking adult female and spoken in an infant-directed manner. Parents were instructed to listen to the accompanying CD and turn the page only at the sound of the chime, which occurred 4 s after the last vocalization on each page. The CD was designed so that every time one monkey was being viewed in the storybook, infants heard two vocalizations produced by a real rhesus macaque (*Macaca mulatta*). Thus, on the CD, each monkey character in the storybook was associated with the vocalizations of only one particular rhesus monkey. Twelve rhesus voice recordings were heard for each of the four rhesus monkey characters on the CD (6 tokens of the "coo" call category, heard two times each). The rhesus monkey recordings were obtained from author DR (for methodology on obtaining these recordings, see Rendall et al., 1996; Owren and Rendall, 2003), and edited using Cool Edit Pro [Syntrillium Software; sampling rate = 44.1 kHz, (intensity) resolution = 16-bit] and normalized for peak intensity across the sample. On the CD, the recordings of each monkey were ordered randomly, with the stipulation that the same "coo" token was never presented twice in a row and that the two tokens heard on each page formed a unique pair. The four rhesus monkeys on the CD were different from the monkeys used for testing. Those on the CD formed two matched pairs (Daddy/Mommy monkey; Sister/Brother monkey) such that the set of tokens for each voice in the pair were matched for mean duration (mean = 0.367 s, SD = 0.065 s; mean = 0.477 s, SD = 0.080 s, for the two pairs, respectively) and minimum (mean = 218.78 Hz, SD = 116.45 Hz; mean = 469.45 Hz, SD = 24.78 Hz), maximum (mean = 458.48 Hz, SD = 70.31 Hz; mean = 535.57 Hz, SD = 20.50 Hz) and mean (mean = 340.12 Hz, SD = 86.48 Hz; mean = 514.46 Hz, SD = 15.55 Hz) F0 (analyzed using Praat software's autocorrelation algorithm, with F0 searched for between 100 and 600 Hz; Boersma and Weenink, 2009). An excerpt from the training storybook and CD (full version: 6 min 40 s in duration) can be found at: http://psycserv.mcmaster.ca/ljt/RSM/RSM.html.

**FIGURE 1 | An excerpt from the training stimulus storybook "Beach Day for the Monkey Family."** A CD that contained the story narration and vocalizations belonging to four individual rhesus monkeys (six different coo tokens per monkey) accompanied the storybook. Each rhesus voice was always associated with the same monkey character in the story (either Daddy, Mommy, Sister or Brother Monkey), and the monkey characters were presented in the same order six times throughout the book. A sample of the storybook and CD can be found at: http://psycserv.mcmaster.ca/ljt/RSM/RSM.html
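The ordering constraints described for the training CD can be illustrated with a short sketch. This is not the authors' procedure, which is not specified beyond the two stated constraints; it assumes simple rejection sampling, treats the "unique pair" constraint as meaning that no two pages for the same monkey share the same unordered pair of tokens, and uses a hypothetical function name (`order_tokens`) and integer token labels.

```python
import random
from typing import List, Optional


def order_tokens(n_tokens: int = 6, repeats: int = 2,
                 seed: Optional[int] = None,
                 max_attempts: int = 100_000) -> List[int]:
    """Return one presentation order for a single monkey's coo tokens.

    Constraints taken from the text: each token is heard `repeats` times,
    the same token never occurs twice in a row, and the two tokens on each
    page form a unique pair (interpreted here as unordered pairs).
    """
    rng = random.Random(seed)
    pool = [t for t in range(n_tokens) for _ in range(repeats)]
    for _ in range(max_attempts):
        rng.shuffle(pool)
        # Reject orders in which any token immediately repeats.
        if any(a == b for a, b in zip(pool, pool[1:])):
            continue
        # Two vocalizations per page: positions (0, 1), (2, 3), ...
        pages = [frozenset(pool[i:i + 2]) for i in range(0, len(pool), 2)]
        if len(set(pages)) == len(pages):
            return list(pool)
    raise RuntimeError("no valid order found within max_attempts")


if __name__ == "__main__":
    print(order_tokens(seed=1))
```

Rejection sampling is used only because the constraint set is small; with six tokens heard twice each, a valid order is typically found within a handful of shuffles.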

## *Testing stimuli*

The rhesus monkey vocalizations used for testing were identical to those used in our previous study (see Friendly et al., 2013), but different from those used during training. Six vocal samples of the "coo" call from each of four female rhesus monkeys were obtained from author DR (see Rendall et al., 1996; Owren and Rendall, 2003), edited using Cool Edit Pro [sampling rate = 44.1 kHz, (intensity) resolution = 16-bit] and normalized for peak amplitude across the sample (Pair 1: voice 1 mean = 49 dB, range = 46–54 dB, voice 2 mean = 49 dB, range = 47–51 dB; Pair 2: voice 1 mean = 55 dB, range = 53–57 dB, voice 2 mean = 56 dB, range = 55–57 dB). Two pairs of primate voices (6 different "coo" call tokens for each individual monkey) were paired based on acoustic analyses using Praat software, such that their sets of tokens were matched for mean duration (mean = 0.30 s, SD = 0.047 s; mean = 0.27 s, SD = 0.061 s, for the two pairs, respectively) and minimum (mean = 282.36 Hz, SD = 66.28 Hz; mean = 502.48 Hz, SD = 18.93 Hz), maximum (mean = 351.84 Hz, SD = 49.87 Hz; mean = 562.59 Hz, SD = 32.49 Hz) and mean (mean = 320.50 Hz, SD = 48.72 Hz; mean = 542.04 Hz, SD = 26.30 Hz) F0. Four conditions (1A, 1B, 2A and 2B) were created for testing infants so that, for each voice pair (1 and 2), one voice in the pair served as the "change" voice and the other as the "background" voice for condition A (see Procedure). The change and background voices were switched for condition B.

## **PROCEDURE**

#### *Training procedure*

Two weeks prior to testing, infants in the Trained Group were mailed a package containing the illustrated storybook and accompanying CD. The package also contained a music and language questionnaire, daily reading log and instructions for the infant's training schedule. The questionnaire asked what languages were spoken in the home and, for each, the proportion of time it was spoken, as well as whether infants attended music classes, whether parents played musical instruments, how often parents sang to their infants each week, and how often the infants listened to music each week. Parents were instructed to play the CD twice a day at home for 2 weeks, for a total of 28 training sessions, following along in the storybook with their infant. In order to ensure that the voice of each rhesus monkey on the CD was associated with a particular character in the book (labeled either Daddy, Mommy, Sister or Brother Monkey), parents were instructed to turn each page of the storybook only when they heard a musical chime sound. To make sure that infants were actively engaged during the monkey-voice training, parents were instructed to listen to the storybook together with their infant, interacting to engage their infant's attention as much as possible. Infants were reported to have received between 23 and 29 sessions of training (mean = 28 sessions, approximately 6.5 min per session). One day after the completion of the 2-week training period, infants were brought into the lab for testing.

## *Testing procedure*

Infants in the Trained Group were tested in the identical conditioned head turn (CHT) procedure as infants in the Untrained Groups from Friendly et al. (2013; also see Werker et al., 1998). Infants were assigned randomly to one of 4 stimulus conditions (1A, 1B, 2A or 2B), where the A and B conditions reversed which voice of the pair was the background and which the change voice.

During the testing phase of the CHT procedure, a loudspeaker located 90° to the infant's left played the six *"coo"* tokens from the background voice repetitively in a quasi-random order such that the same token was never repeated consecutively (stimulus onset asynchrony = 1750 ms). The parent sat across from the experimenter with the infant seated on his/her lap and listened to masking music through headphones in order to eliminate potential parental influence on the infant's behaviour. The experimenter likewise listened to masking music during testing. Throughout the experiment, tokens from the background voice were played continuously. The experimenter pressed one button when the infant was paying attention and facing forward (toward the experimenter), indicating to the computer that the infant was ready for a trial. There were 24 trials. Half (12) were control (no-change) trials that were indistinguishable from the repeating background. The other half (12) were change trials, on which the background voice was replaced by one of the six tokens of the change voice for one repetition. Across the 12 change trials, each of the six change-voice tokens was presented twice in a random order. The order of change and control trials was quasi-random, with the constraint that no more than two control trials were presented in a row. The experimenter pressed a second button when the infant made a head-turn response of 45 degrees or more to the left toward the speaker from which the sounds were played. Head-turn responses occurring on control trials (i.e., false alarms) were not rewarded by the computer. In contrast, head turns on change trials (i.e., hits) that occurred within 1.5 s of the onset of the changed voice were rewarded by the computer with 2 s of an animated light and toy display. The proportions of hits and false alarms were converted into d-prime (d′) scores for data analysis.
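As a rough illustration of the final step, the conversion from hit and false-alarm counts to d′ can be sketched as the difference between the z-transformed hit rate and false-alarm rate. The text does not say how proportions of 0 or 1 were handled, so the log-linear correction used below (adding 0.5 to each count and 1 to each trial total) is an assumption, as is the function name `d_prime`.

```python
from statistics import NormalDist


def d_prime(hits: int, false_alarms: int,
            n_change: int = 12, n_control: int = 12) -> float:
    """Compute d' = z(hit rate) - z(false-alarm rate).

    The trial counts default to the 12 change and 12 control trials
    described in the text. The log-linear correction is an assumption
    made here so that rates of exactly 0 or 1 stay finite.
    """
    hit_rate = (hits + 0.5) / (n_change + 1)
    fa_rate = (false_alarms + 0.5) / (n_control + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    # An infant who turns equally often on change and control trials
    # shows no sensitivity, so d' is 0.
    print(d_prime(hits=6, false_alarms=6))
    # More hits than false alarms gives a positive d'.
    print(d_prime(hits=9, false_alarms=3))
```

Under this definition, chance performance corresponds to d′ = 0, which is the reference value for the above-chance tests reported in the Results.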

Before infants began the testing phase of the CHT procedure, they first had to pass an initial training phase designed to familiarize them with the rule that when they made a head-turn response to a change from one monkey's voice to another, they would be rewarded with an animated toy display. In this phase, only two of the six change-voice tokens were used and there were no control trials. Furthermore, during training the change voice was played, on average, 8 dB louder than the repeating background voice (see Stimuli and Apparatus) so that the change was noticeable enough to attract the infant's attention toward the loudspeaker. In order to pass the familiarization phase and proceed to the testing phase, infants were required to make four correct head-turn responses in a row within 20 training trials. Infants who did not pass this training criterion were excluded from the final data set (see Participants). Once in the testing phase, all six "coo" tokens of the change voice were presented without the increase in intensity used during training.
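The familiarization criterion (four correct head turns in a row within 20 training trials) is simple enough to state as a small check. The sketch below is illustrative only; the function name `passes_familiarization` and the representation of each trial as a boolean are assumptions.

```python
from typing import Sequence


def passes_familiarization(responses: Sequence[bool],
                           run_length: int = 4,
                           max_trials: int = 20) -> bool:
    """Return True if `run_length` consecutive correct head turns
    occur within the first `max_trials` training trials."""
    streak = 0
    for correct in responses[:max_trials]:
        # Extend the current run on a correct turn, reset otherwise.
        streak = streak + 1 if correct else 0
        if streak >= run_length:
            return True
    return False


if __name__ == "__main__":
    print(passes_familiarization([False, True, True, True, True]))  # True
    print(passes_familiarization([True, True, True, False] * 5))    # False
```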


## **RESULTS**

Preliminary analyses revealed no significant differences in performance between male and female infants. As well, performance was not significantly related to whether or not infants attended music classes, whether parents played musical instruments, how often parents reported singing to their infants, and how often the infants were reported to listen to music. Thus these variables were not considered further in the following analyses. As can be seen in **Figure 2**, infants in the Trained-12 month group performed quite well at discriminating the monkey voices. Although all three groups performed significantly above chance levels (Untrained-12 month: *t*(23) = 3.53, *p* = 0.002; Untrained-6 month: *t*(23) = 9.70, *p* < 0.001; Trained-12 month: *t*(23) = 8.89, *p* < 0.001), a one-way ANOVA with group as the factor indicated a significant difference across groups in d′ scores, *F*(2,69) = 9.84, *p* < 0.001. A follow-up independent samples t-test indicated that infants in the Trained-12 month Group performed much better than age-matched infants in the Untrained-12 month Group, *t*(46) = 4.01, *p* < 0.001, Cohen's *d* = 1.16. As well, a comparison of infants in the Trained-12 month Group to the younger infants who had not yet achieved perceptual narrowing (Untrained-6 month Group) indicated that the trained 12-month-olds actually performed slightly better than the untrained 6-month-olds, *t*(46) = 2.00, *p* = 0.05, Cohen's *d* = 0.58. Furthermore, results from our previous research on the untrained infants showed that the ability to discriminate individual monkeys by voice decreased significantly between 6 and 12 months of age (Friendly et al., 2013). Together, these findings suggest that 2 weeks of exposure to monkey voices at 11.5–12 months of age (after perceptual narrowing has occurred) can reinstate sensitivity to voices from a foreign species to a level equivalent to or better than that observed at 6 months of age.
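The reported statistics are internally consistent, which can be checked with a few lines of arithmetic. The sketch below assumes that Cohen's *d* was computed with a pooled standard deviation, in which case *d* = *t* × sqrt(1/n1 + 1/n2) for an independent-samples t-test; the text does not state which formula was used, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cohens_d_from_t(t: float, n1: int, n2: int) -> float:
    """Recover Cohen's d from an independent-samples t statistic,
    assuming d was based on the pooled standard deviation."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


def anova_df(group_sizes: Sequence[int]) -> Tuple[int, int]:
    """Between- and within-group degrees of freedom for a one-way ANOVA."""
    k = len(group_sizes)
    n = sum(group_sizes)
    return k - 1, n - k


if __name__ == "__main__":
    # Trained-12 month vs. Untrained-12 month, n = 24 per group
    print(round(cohens_d_from_t(4.01, 24, 24), 2))  # 1.16
    # Trained-12 month vs. Untrained-6 month, n = 24 per group
    print(round(cohens_d_from_t(2.00, 24, 24), 2))  # 0.58
    # Three groups of 24 infants
    print(anova_df([24, 24, 24]))                   # (2, 69)
```

Both recovered effect sizes match the values reported above, and three groups of 24 infants give the reported *F*(2,69) degrees of freedom.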

## **DISCUSSION**

In previous work, we showed that the development of specialization for own-species voice discrimination follows a pattern of perceptual narrowing, with a decrease in infants' ability to individuate rhesus monkey voices between 6 and 12 months of age (Friendly et al., 2013). In the present paper, we found that after 2 weeks of twice-daily exposure to rhesus monkey voices in the form of a CD-narrated storybook, 12-month-old infants demonstrated significantly better discrimination of novel rhesus monkey voices not heard during training compared to 12-month-old infants who received no such training. Furthermore, the performance of the 12-month-olds who received exposure to the monkey voices was actually slightly better than that of untrained 6-month-old infants. The fact that monkey voice exposure enhanced discrimination of monkey voices in 12-month-olds to a level seen in 6-month-old infants indicates that the processes underlying perceptual narrowing for voice identification retain considerable plasticity at least until 12 months of age. This conclusion is consistent with studies in other domains that indicate that exposure to a socially relevant foreign stimulus can either maintain (Burns et al., 2003; Pascalis et al., 2005; Scott and Monesson, 2009, 2010) or reinstate (Kuhl et al., 2003; Hannon and Trehub, 2005b; Anzures et al., 2012) sensitivity past the period during which perceptual narrowing normally occurs.

Previous studies on face processing suggest that maintenance of sensitivity for individuating faces from a foreign species requires that the exposure to those faces be at an individual level, with different labels, such as names, being applied to the different faces in the exposure set (e.g., Scott et al., 2006; Scott and Monesson, 2009). These studies found that having infants simply observe foreign-species faces, or experience them in the context of a common label applied to all faces (e.g., "monkey"), does not lead to maintenance of the ability to discriminate these faces. As well, Anzures et al. (2012) found that exposure to labeled foreign-race faces at 8 to 10 months reinstates infants' ability to recognize foreign-race faces, although they did not test under conditions with no labeling. Scott and Monesson (2009) suggest that labeling faces by name might draw infants' attention to the differences between individuals, rather than to what the individuals have in common. On the other hand, phoneme categories in speech and metrical structures in music do not apply to individual people or individuals from other species (although changes in phonemes can signal changes in word meaning), and explicit individuation through the use of labels in these cases does not seem to be necessary for exposure to foreign categories or structures to disrupt perceptual narrowing (Kuhl et al., 2003; Hannon and Trehub, 2005b). The case of distinguishing individuals by their voice would seem to be similar to the case of distinguishing individuals by their face, suggesting that attention to differences between individuals during exposure might be critical for reinstating sensitivity to voices from a foreign species. This could be tested in future research with respect to distinguishing voices by using a training protocol that either applies no label at all or that applies the label "monkey" to each rhesus monkey voice sample during training.

Social interaction has also been identified as important for plasticity during the period of perceptual narrowing (Kuhl et al., 2003; Gerry et al., 2012). For example, Kuhl et al. (2003) found that interpersonal interaction between English-learning infants and Mandarin-speaking adults reinstated infants' sensitivity to Mandarin phonemic distinctions after narrowing had occurred, whereas exposure to audio-visual and audio-alone recordings of these adults speaking Mandarin did not. On the other hand, Hannon and Trehub (2005b) found that 2 weeks of twice-daily passive exposure to foreign musical rhythms was sufficient to reinstate 12-month-olds' ability to detect violations in foreign rhythmic structure. In the present study, infants received training in the social context of parental interaction. It is possible that passive exposure to foreign stimuli can affect perceptual narrowing, but that exposure in a social context is more powerful. It remains for future research to determine whether the social interaction during the training phase of the present study was a necessary condition for the reinstatement of sensitivity to monkey voices. In order to investigate this, future studies could test training conditions in which parents are instructed to avoid interaction with their infant while experiencing the audiobook (audio-visual condition) or instructed to listen to the narration without looking at the storybook (audio-alone condition). If social interaction is as important for plasticity during perceptual narrowing for voices as it is for phonemic categories, then the infants in the audio-visual and audio-alone conditions should show no (or less) reinstatement of ability to discriminate monkey voices compared to the trained 12-month-olds in the present study.

In interpreting the effects of different kinds of experience on reinstatement of abilities at 12 months of age, it is important to consider the results from a study by Fair et al. (2012), in which sensitivity to distinctions in foreign-species (monkey) faces at 12 months was observed without a prescribed at-home training period, simply by extending the length of the familiarization period at the time of testing. Specifically, Fair et al. (2012) demonstrated that 12-month-olds showed no evidence of discriminating unfamiliar monkey faces after 20 s of familiarization, but did discriminate them after 40 s of familiarization. This extended familiarization period could be considered a relatively brief form of training, but it is surprising that reinstatement could be achieved after such a brief training.

A final question concerns the age range over which the window of plasticity remains open with respect to learning to discriminate foreign voices. In other domains, there is evidence that some plasticity remains throughout the lifespan in that exposure to foreign stimuli in childhood (e.g., Feinman and Entwisle, 1976; Cheour et al., 2002; Shestakova et al., 2003; Wang and Kuhl, 2003; Sangrigoli et al., 2005; Macchi Cassia et al., 2009) or adulthood (e.g., Tees and Werker, 1984; Bradlow et al., 1999; McCandliss et al., 2002; Iverson et al., 2005; Pruitt et al., 2006; Scott et al., 2006, 2008; de Heering and Rossion, 2008; Kuefner et al., 2008; Zhang et al., 2009) results in improved processing of those stimuli, particularly if the training contains highly variable and numerous stimuli (Zhang et al., 2009), if differences between the stimuli are exaggerated (McCandliss et al., 2002), and if the person had some exposure to the stimuli earlier in life (e.g., Lenneberg, 1967; Tees and Werker, 1984; Newport et al., 2001; Sangrigoli et al., 2005; Macchi Cassia et al., 2009; Oh et al., 2010). However, completely native-like processing of foreign stimuli appears to be very difficult, if not impossible, to achieve in adulthood (e.g., Takagi and Mann, 1995; Flege et al., 1999; McCandliss et al., 2002; Takagi, 2002; Hannon and Trehub, 2005b; Iverson et al., 2005, for reviews see Birdsong, 2006; Hernandez and Li, 2007). Nevertheless, Sangrigoli et al. (2005) found native-like discrimination of Caucasian (French) faces by Korean adults who were adopted by French families between 3 and 9 years of age, suggesting that it is possible to demonstrate native levels of foreign-race face processing under certain conditions. In the present study, informal feedback from parents who participated with their infants suggests that parents had a difficult time distinguishing between the four monkey voices used in the storybook, even after listening to the story with their infant for the 2-week, twice-daily period. However, it is possible that with sufficient training adults could become proficient at discriminating foreign voices, as their ability to discriminate the monkey voices is poor, but above chance levels, without training. Again, this could be investigated in a future study.

In summary, perceptual narrowing achieved by the end of the first year after birth for discriminating voices can be modified by 2 weeks of exposure to voices from a foreign species, indicating a period of flexibility and plasticity following narrowing. It remains for future research to determine, (1) the time course of this plasticity across the lifespan, (2) effects of the social context, (3) whether individual-level training is important in perceptual narrowing for voice discrimination, (4) whether perceptual narrowing also occurs for other voice types (e.g., vocalizations from other species, sexes, races, and age groups), and (5) which particular acoustic characteristics of native and/or foreign voices, if any, promote the perceptual narrowing for voices observed in Friendly et al. (2013) and the reinstatement of foreign-voice discrimination in the present study.

#### **ACKNOWLEDGMENTS**

This research was supported by a grant to Laurel J. Trainor from the Natural Sciences and Engineering Research Council of Canada. We thank Ren Ee Choo, Aliyah Mohamed and Andrea Unrau for assistance in data collection.

#### **REFERENCES**

Anzures, G., Wheeler, A., Quinn, P. C., Pascalis, O., Slater, A. M., Heron-Delaney, M., et al. (2012). Brief daily exposures to Asian females reverses perceptual narrowing for Asian faces in Caucasian infants. *J. Exp. Child Psychol.* 112, 484–495. doi: 10.1016/j.jecp.2012.04.005

Archambault, A., O'Donnell, C., and Schyns, P. G. (1999). Blind to object changes: when learning the same object at different levels of categorization modifies its perception. *Psychol. Sci.* 10, 249–255. doi: 10.1111/1467-9280.00145

Birdsong, D. (2006). Age and second language acquisition and processing: a selective overview. *Lang. Learn.* 56, 9–49. doi: 10.1111/j.1467-9922.2006.00353.x

Boersma, P., and Weenink, D. (2009). *Praat: Doing Phonetics by Computer Version 5.1.04. Computer program*. Available at: http://www.praat.org/

Bradlow, A. R., Akahane-Yamada, R., Pisoni, D. B., and Tohkura, Y. (1999). Training Japanese listeners to identify English /r/ and /l/: long-term retention of learning in speech perception and production. *Percept. Psychophys.* 61, 977–985. doi: 10.3758/BF03206911


11, 466–472. doi: 10.1016/j.tics.2007.08.008


*Proc. Natl. Acad. Sci. U.S.A.* 100, 9096–9101. doi: 10.1073/pnas.1532872100


and screams: perceptual experiments with human (*Homo sapiens*) listeners. *J. Comp. Psychol.* 117, 380–390. doi: 10.1037/0735-7036.117.4.380


*Sci.* 16, 197–201. doi: 10.1111/j.1467-8721.2007.00503.x


hypothesis: perceptual learning of Mandarin tones in American adults and American children at 6, 10 and 14 years of age," *Poster Presented at the 15th International Congress of Phonetic Sciences*, Barcelona, 1537–1540.


of phonetic learning in adulthood: a magnetoencephalography study. *Neuroimage* 46, 226–240. doi: 10.1016/j.neuroimage.2009.01.028

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 April 2013; accepted: 18 September 2013; published online: 09 October 2013.*

*Citation: Friendly RH, Rendall D and Trainor LJ (2013) Plasticity after perceptual narrowing for voice perception: reinstating the ability to discriminate monkeys by their voices at 12 months of age. Front. Psychol. 4:718. doi: 10.3389/fpsyg.2013.00718*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Friendly, Rendall and Trainor. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Sensitive and critical periods in visual sensory deprivation

## *Patrice Voss 1,2\**

*<sup>1</sup> Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Montreal, QC, Canada*

*<sup>2</sup> International Laboratory for Brain, Music and Sound Research, Montreal, QC, Canada*

#### *Edited by:*

*Virginia Penhune, Concordia University, Canada*

#### *Reviewed by:*

*Catherine Y. Wan, Beth Israel Deaconess Medical Center and Harvard Medical School, USA Harold Burton, Washington University School of Medicine, USA*

#### *\*Correspondence:*

*Patrice Voss, Neuropsychology/Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Room 276, 3801 University Street, Montreal, QC H3A 2B4, Canada e-mail: patrice.voss@mcgill.ca*

While the demonstration of crossmodal plasticity is well established in congenital and early blind individuals, great debate still surrounds whether those who acquire blindness later in life can also benefit from such compensatory changes. No consensus has been reached, even though a proper understanding of the developmental time course of these changes, and of whether their occurrence is limited to specific time windows, is crucial to our understanding of crossmodal phenomena. An extensive review of the literature reveals that while the majority of investigations to date have examined the crossmodal plasticity available to late blind individuals in quantitative terms, recent findings suggest that this reorganization also likely changes qualitatively compared to what is observed in early blindness. This could have significant repercussions not only for the training and rehabilitation of blind individuals, but also for the development of appropriate neuroprostheses designed to aid and potentially restore vision. Important parallels will also be drawn with the current state of research on deafness, which is particularly relevant given the development of successful neuroprostheses (e.g., cochlear implants) for providing auditory input to a central nervous system that is otherwise aurally deafferented. Lastly, this paper will address important inconsistencies across the literature concerning the definition of distinct blind groups based on the age of blindness onset, and propose several alternatives to using such a categorization.

#### **Keywords: blindness, crossmodal plasticity, early and late blind, critical periods, sensitive periods**

The scientific literature has grown rich in research illustrating the remarkable ability of the brain to reorganize itself following sensory loss. In particular, visually deafferented regions within the occipital cortex of early blind individuals have been repeatedly shown to be functionally recruited to carry out a wide variety of non-visual tasks. While the demonstration of crossmodal plasticity is well established in congenitally (CB) and early blind (EB) individuals, significant debate surrounds whether those who become blind later in life can also benefit from such compensatory changes. For instance, several initial neuroimaging reports (e.g., Cohen et al., 1999; Sadato et al., 2002) suggested that the crossmodal plastic phenomena observed in the blind are likely regulated by a particular *critical period* beyond which no observable changes occur. However, a number of other studies (e.g., Büchel et al., 1998; Burton et al., 2002a,b; Voss et al., 2006) have demonstrated that such crossmodal plastic phenomena might instead be regulated by a *sensitive period,* as opposed to a more rigid critical period, where sensory experience has a relatively greater influence on behavioral and cortical development, but is not necessarily exclusive to that period. Consequently, work to date has focused on the amount of measurable crossmodal plasticity as a function of the age of blindness onset, thus more or less assuming that differences observed between groups of individuals with differing onsets are quantitative in nature (i.e., individuals with an earlier blindness onset will show more crossmodal recruitment of occipital cortex than those with a later onset). 
Nonetheless, in light of recent findings, one could argue that the plastic changes that occur following blindness do not only change quantitatively with increasing age of blindness onset, but also qualitatively in that crossmodal recruitment of occipital cortex might reflect different processes and purposes for EB and late blind (LB) individuals. For instance, the functional relevance of crossmodal plasticity observed in late-onset blindness has yet to be clearly established, whereas it has been clearly linked to behavior in EB; thus even if we were to observe similar levels of crossmodal recruitment of visual areas in both EB and LB, the observed occipital activations may not share the same functional or behavioral relevance for both blind groups. As a result, we should perhaps no longer simply investigate the presence or absence of plasticity in early and late-onset blindness, but more importantly ask ourselves how the plastic processes and mechanisms change with increasing age of onset. Evidence supporting this claim will be discussed in detail below, following a brief primer on some general plastic properties of the visual system and an in-depth review of findings depicting the crossmodal plasticity phenomenon observed in early blindness.

### **PLASTICITY IN THE VISUAL SYSTEM**

Much of what we know today about the brain and its plastic properties we owe in great part to the pioneering work of Nobel laureates David H. Hubel and Torsten N. Wiesel performed in the early 1960s. Their investigations on the effects of monocular deprivation revealed both an early innate period of development and a later critical period of experience-dependent plasticity. Indeed, their choice to deprive young kittens of vision in only one eye allowed them to directly compare the responses of both eyes, thus acting as an internal control for variations in the developmental stage of the animal. They showed that monocularly depriving newborn kittens for at least a month induced a dramatic shift in the primary visual cortex (V1) responses from the deprived eye to the non-deprived eye (over 98% of the recorded neurons were unresponsive to input to the formerly deprived eye) (Wiesel and Hubel, 1963). Follow-up studies revealed that when kittens were binocularly deprived from birth, more than half of the cells continued to respond to both eyes (Wiesel and Hubel, 1965), and that when eyes were kept from working together by alternating occlusion of the two eyes, nearly all of the cells stopped responding to both eyes and were instead driven by one eye or the other (Hubel and Wiesel, 1965). These findings led them to hypothesize that the loss of deprived-eye responses was a result of competitive processes with the non-deprived eye and not simply of disuse.

Of greater relevance to the current special topic, Hubel and Wiesel (1970) later investigated whether these physiological effects were governed by a period of susceptibility; that is, when these effects were greatest and how long they lasted, the duration of deprivation necessary to produce a change, as well as the relationship between the timing of deprivation and the ability to recover normal function. To address these issues, they deprived kittens for various periods of time at different ages and compared neural responses in the striate cortex from both monocular inputs. Importantly, they first showed that a period of susceptibility did in fact exist, starting early in the 4th week following birth and remaining high for approximately three weeks, only to slowly decline until the end of the third month. What really highlights the importance of this period is the fact that a monocular deprivation occurring during the first three months, even one as short as 3 or 4 days, leads to a lasting and largely irreversible decline in the proportion of cells responding to the deprived eye, whereas very long periods of monocular deprivation in the adult cat have little to no physiological effect (Hubel and Wiesel, 1970). This observation of a critical period of susceptibility to deprivation was among the first to reveal the high degree of sensitivity of the immature brain to an altered sensory state during a very restricted time period in life.

It is probably pertinent at this point to make an important distinction between two related concepts; that is the difference between a sensitive period and a critical period. While both concepts have at times been interchangeable to a certain extent in the literature, they are best segregated to explain distinct developmental phenomena. Sensitive periods generally refer to a limited time window in development during which the effects of experience on the brain are unusually strong, whereas a critical period is defined as a special class of sensitive periods where behaviors and their neural substrates do not develop normally if appropriate stimulation is not received during a restricted period of time (Knudsen, 2004). The above-mentioned studies on monocular deprivation are perfect examples of critical periods, where the absence of normal sensory input during a specific time window leads to irreversible changes in brain function and connectivity. Indeed, if normal binocular input is not achieved by three months of age in kittens, no cells will ever respond to input from the occluded eye, even if visual input to the occluded eye is restored after the critical period.

So far the focus has been on an animal model of monocular deprivation to illustrate the importance of time windows in development during which competitive processes determine the role played by individual cells in the primary visual cortex. An important question that has not been raised yet concerns what happens when no visual input reaches the visual processing centers of the brain (i.e., binocular deprivation). Does the lack of sensory experience lead to disuse-related atrophic processes within these regions? Or are there still competitive processes at play to gain control of occipital cortical regions despite the lack of visual input? Such questions have led to many investigations and revealed that the blind constitute excellent models for studying the plastic nature of the brain (Bavelier and Neville, 2002; Pascual-Leone et al., 2005). The following section will describe in detail what we currently know about the consequences of complete blindness in human adults, both in terms of brain and behavioral changes.

## **EXTREME CIRCUMSTANCES: THE CASE OF COMPLETE BLINDNESS**

#### **FUNCTIONAL AND BEHAVIORAL ADAPTATIONS**

We have a fairly good understanding of how the brain processes visual information and of the specific roles played by various regions throughout the visual system. However, until recently, we had very little knowledge concerning what happened to these regions when an individual was cut off from the visual world by peripheral lesions of the visual system (e.g., damage to the lens, retina or optic nerve) leading to complete blindness. Progress accelerated with the advent of specialized neuroimaging tools that allowed for the *in-vivo* investigation of the brain. The first neuroimaging studies used positron emission tomography (PET) to study the glucose metabolism of the occipital cortex at rest in both EB individuals, that is, those who became blind during the first few years of life (see **Box 1**), and sighted individuals (Wanet-Defalque et al., 1988; Veraart et al., 1990). It was shown that the glucose metabolism observed in occipital cortex of blind individuals was greater than that observed in blindfolded sighted subjects, but comparable to what was observed when the blindfold was removed. These initial observations raised important questions on the functionality of the EB's visual cortex. Subsequently, Uhl et al. (1991, 1993) were among the first to show task-related activations in response to tactile stimulation within occipital cortex of EB, and shortly thereafter came a multitude of brain imaging studies showing that their occipital cortex could be crossmodally activated by a variety of tactile (Sadato et al., 1996; Büchel et al., 1998; Burton et al., 2002a) and auditory (Weeks et al., 2000; Arno et al., 2001; Burton et al., 2002b) tasks.

Despite the impressive nature of the observed crossmodal activations in the occipital cortex, important questions still remained regarding their exact significance. Are they truly task-related or simply an epiphenomenon associated with the absence of visual input? Several findings suggest that the occipital cortex does indeed play a functional role in processing non-visual information following early blindness. The first line of evidence stems from research demonstrating strong correlations between brain activity in occipital cortex of EB and behavioral performance on a variety of tasks including verbal memory (Amedi et al., 2003), episodic retrieval (Raz et al., 2005) and sound localization (Gougoux et al., 2005). This is perhaps not so surprising given the wealth of evidence documenting the development of heightened compensatory perceptual and cognitive abilities in EB (see Voss et al., 2010). Auditory spatial abilities in particular have been heavily investigated in light of substantial questions concerning a blind person's ability to form adequate spatial representations in the absence of vision; consequently, an abundance of compelling evidence linking occipital functioning and sound localization in early blindness has been brought to light (see **Figure 1**; see also Collignon et al., 2009).

#### **Box 1:**

**Congenital blindness**: refers to individuals that were born blind, and as a result, were never exposed to visual stimulation.

**Early blindness**: refers to cases of blindness that occurred during the first few years of life, generally prior to the age of 5. However, there are multiple exceptions, with some studies including subjects up to 14 years of age in what are defined as early blind groups. Also, early blind groups often include congenitally blind individuals, unless otherwise specifically stated. While early and congenitally blind individuals are often pooled together, more recent studies have started segregating them into separate groups, as even a few years of visual experience could strongly alter the functioning and the anatomy of visual structures.

**Late blindness**: generally refers to cases of blindness that began after puberty (typically >16 years of age) or in adulthood. Again, there are exceptions to this, with some studies including individuals with ages of onset as low as 7 years of age.

The lack of consistency in defining blind groups across studies has had two major consequences. The first is the frequent omission of individuals with intermediate onsets of blindness (e.g., between 5 and 16 years of age), which introduces a strong sampling bias when attempting to relate the age of onset of blindness to a behavioral or neuroanatomical measure. The second is the undesired overlap between defined groups from different studies, where a given individual would be considered as "early blind" in one and as "late blind" in others.
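The two consequences of inconsistent group definitions noted in **Box 1** can be made concrete with a small sketch. The three sets of cut-offs below are hypothetical, assembled only from the age ranges quoted in Box 1 (early blindness generally before 5 years but sometimes up to 14; late blindness typically after 16 but sometimes from 7); they do not reproduce the inclusion criteria of any particular study, and the function and dictionary names are illustrative.

```python
from typing import Dict, Optional


def classify_onset(age_of_onset: float, early_max: float, late_min: float) -> Optional[str]:
    """Label an individual 'early' or 'late' blind, or None if the onset
    falls between the two cut-offs and the individual would be left out."""
    if age_of_onset <= early_max:
        return "early"
    if age_of_onset >= late_min:
        return "late"
    return None


# Hypothetical criteria (assumptions), loosely based on the ranges in Box 1.
CRITERIA: Dict[str, Dict[str, float]] = {
    "strict": {"early_max": 5, "late_min": 16},
    "lenient_early": {"early_max": 14, "late_min": 16},
    "lenient_late": {"early_max": 5, "late_min": 7},
}


def classify_across_studies(age_of_onset: float) -> Dict[str, Optional[str]]:
    """Apply every set of criteria to the same age of onset."""
    return {
        name: classify_onset(age_of_onset, **cutoffs)
        for name, cutoffs in CRITERIA.items()
    }


# An individual who lost sight at 10 years of age.
labels = classify_across_studies(10)
```

Under these assumed cut-offs, the same individual is excluded by the strict criteria, counted as early blind under the lenient early criteria, and counted as late blind under the lenient late criteria, which is the sampling gap and the group overlap described in the box.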

Additional evidence supporting the functional relevance of the crossmodal recruitment of occipital cortices in early blindness comes from the use of transcranial magnetic stimulation (TMS), which enables inferences on causality via the temporary disruption of cortical functioning within very specific brain areas. Indeed, the application of TMS to occipital areas significantly hampers the performance of EB in tasks assessing sound localization (Collignon et al., 2007), verbal memory (Amedi et al., 2004) and Braille identification (Cohen et al., 1997), while leaving the performance scores of sighted individuals unaffected. Perhaps the most striking form of evidence comes from a blind expert Braille reader, who completely lost the ability to read Braille following an ischemic stroke causing bilateral lesions to her occipital cortex (Hamilton et al., 2000). Similarly, a middle-aged blind individual was reported as having transient difficulties in reading Braille while he experienced temporary visual hallucinations (Maeda et al., 2003). The fact that his ability returned to normal following the hallucinations suggests a causal relationship between occipital functioning and Braille reading in this blind individual. Taken together, these findings suggest that occipital cortex might still serve some functional purpose following blindness. What is not clear at this point, however, is how these crossmodal plastic adaptations come to be. Properly understanding how non-visual sensory inputs are processed within occipital cortex is a challenging task and is discussed in the following section.

#### **CROSSMODAL PLASTICITY: UNDERLYING MECHANISMS**

As highlighted earlier, many neural processes and connections are the result of competitive interactions between different neurons and sensory inputs, and as previously suggested by Pascual-Leone and Hamilton (2001), visual inputs might actually gain access to occipital regions by means of such competitive processes with the other senses during early development. One popular hypothesis is that occipital cortex might be by design best suited to carry out predetermined specialized functions for which the visual system provides the most adequate sensory input. However, in the case of blindness, other senses providing potentially relevant sensory input could gain access to the "visual" regions of the brain for further processing. Such a view therefore assumes that the functional specialization of "visual" cortical regions is preserved in blindness, and indeed there are a growing number of findings that support it.

For instance, regions specializing in the spatial processing of sounds in blind individuals appear to map onto areas of the dorsal visual stream known for similar processing of visual stimuli (Collignon et al., 2009, 2011). Another area well known for its functional specialization is the lateral-occipital complex (LOC), typically involved in object/form recognition processes, which has been shown on several occasions to be responsive to non-visual form processing in EB (Amedi et al., 2007, 2010). Similarly, the visual word form area, which, as its name indicates, responds well to the visual presentation of words, has been shown to be highly responsive to tactually presented Braille words in EB subjects (Reich et al., 2011). Furthermore, Pietrini et al. (2004) had previously shown that the tactile exploration of faces activated different regions than those elicited by the exploration of objects in the blind, suggesting that the development of topographically organized, category-related representations in extrastriate visual cortex does not require visual experience. Similarly, distinct regions within the ventral visual pathway of blind individuals show neural specialization for non-living and living stimuli in the auditory modality, suggesting that the conceptual domain organization in the ventral visual pathway does not require visual experience to develop (Mahon et al., 2009). Lastly, another area well known for its functional specialization is the human extrastriate cortical region known as the middle temporal complex (hMT+), which is highly responsive to visual motion. Several studies have shown that this region in blind individuals becomes responsive both to tactile motion on the fingers (Ricciardi et al., 2007) and to moving sound stimuli (Poirier et al., 2006). These findings, taken together, provide compelling evidence that the functional specialization of occipital regions is preserved in early blindness, and that the operations subserved by each region need not depend on visual input to be solicited by a given task.

Although many higher tier visual areas seem to have preserved their functional specialization following blindness, it is still undetermined how the non-visual input reaches occipital cortex. Two obvious possibilities are either via already existing connections or through the establishment of new connections not present in sighted individuals. The former could result from the unmasking or strengthening of latent pre-existing pathways between sensory-specific cortices and/or between multisensory areas and occipital cortex. The latter, however, appears unlikely for at least two reasons. The first, as discussed later on, stems from a growing body of evidence demonstrating that crossmodal recruitment of occipital cortex is possible in normal sighted individuals after brief transient periods of visual deprivation, which suggests that already existing intermodal connections are at play [see reviews on potential multisensory pathways by Schroeder et al. (2003); Cappe et al. (2009)]. The second comes from animal work investigating the developmental synaptic pruning period in early infancy. It has been shown that corticocortical projections from auditory to visual cortex are present in infant kittens, only to be pruned away soon after due to competitive processes (Innocenti and Clarke, 1984; Innocenti et al., 1988). However, in kittens deprived of vision at birth, these extrinsic connections to the occipital cortex seem to remain (Berman, 1991; Yaka et al., 1999). These findings instead suggest that it is the strengthening of normally transient intermodal connections, and not the formation of new connections following blindness, that is likely to provide the substrate for the crossmodal innervation of occipital cortex following early blindness.

Research with animal models of blindness has illustrated several such pathways that could potentially mediate the crossmodal processing of sound in blindness. For instance, studies with blind rodents have shown the existence of connections between the inferior colliculus (an important auditory relay) and the lateral geniculate nucleus (LGN, an important visual relay) (Doron and Wollberg, 1994; Izraeli et al., 2002), suggesting that auditory information may reach the occipital cortex via the optic radiations ascending from the LGN. Alternatively, auditory information could be fed via direct connections between the medial geniculate nucleus (MGN, an important auditory relay) and the occipital cortex (Laemle et al., 2006). Furthermore, Karlen et al. (2006) have shown that the occipital cortex of CB opossums receives projections not only from the auditory (MGN), but also from the somatosensory (ventral posterior) nucleus of the thalamus, thus suggesting a possible route for tactile information to be conveyed toward the occipital cortex. More recently, the findings of Laramée et al. (2011) suggest that corticocortical pathways could also mediate the crossmodal input into deafferented visual areas by showing indirect connections between the primary auditory and the primary visual cortex in visually deprived mice.

Anatomical tracer studies in normally seeing primates have shown the existence of direct connections going from caudal auditory areas to peripheral V1/V2 (Falchier et al., 2002; Rockland and Ojima, 2003), suggesting that the necessary pathways to mediate crossmodal plasticity likely exist prior to visual deprivation. Evidence in humans is a little sparser, but several recent findings also support corticocortical pathways between auditory and visual areas as a likely source for streaming auditory input into the occipital cortex. For instance, a recent diffusion tensor imaging (DTI) tractography study in normal seeing humans has revealed the existence of connections between Heschl's gyrus and the calcarine sulcus (Beer et al., 2011). Whether this pathway is different in blind individuals has yet to be established, although it perhaps need not be to subserve the crossmodal recruitment of visual areas by sound. Moreover, a pair of recent studies used dynamic causal modeling (DCM) to investigate the effective connectivity between regions underlying auditory activations in the primary visual cortex of EB individuals. DCM is a powerful hypothesis-driven tool that allows for inferences on the causality between the activity observed in different brain areas and, analogously, to study how information flows in the brain (Friston et al., 2003). It was found that auditory-driven activity in V1 is best explained by direct connections with A1 (Collignon et al., 2013) and that the connectivity between both structures was stronger in the blind compared to sighted individuals (Klinge et al., 2010). 
A final argument in favor of corticocortical pathways underlying auditory recruitment of occipital areas stems from neuroanatomical investigations showing the optic radiations (geniculocortical tracts) of EB humans to be severely atrophied (Noppeney et al., 2005; Shimony et al., 2006; Pan et al., 2007; Park et al., 2007; Ptito et al., 2008), rendering them unlikely candidates for relaying auditory information to visually deafferented cortical areas.

#### **CROSSMODAL PLASTICITY IN BLINDNESS: BOUNDED BY CRITICAL OR SENSITIVE PERIODS?**

So far only research findings relating to early or congenital blindness have been covered (see **Box 1**), more or less ignoring the notion of critical periods. This is partly because most research has focused on the effects of early blindness, and partly because there is little consensus on the effects of late-onset visual deprivation. The following sections attempt to disentangle the different findings relating to late blindness and to contrast them with those relating to early blindness.

One of the first neuroimaging studies to investigate the occipital brain metabolism in EB individuals (Veraart et al., 1990) also examined a group of LB individuals. It was shown that occipital functioning in LB was different from that of EB: while EB were found to have higher occipital glucose metabolism relative to sighted individuals, LB showed a reduction. This finding obviously served as an early indication that the age of blindness onset was potentially a determining factor in the changes that occur in occipital cortex following visual deprivation. Indeed, a pair of early investigations of task-related activations showed that while crossmodal recruitment was observed in EB, no such observation was made in LB (Cohen et al., 1999; Sadato et al., 2002). This finding suggested the existence of a strict critical period for the development of crossmodal plasticity within the occipital cortex (14 years of age: Cohen et al., 1999; 16 years of age: Sadato et al., 2002), after which no crossmodal reorganization would take place if the onset of blindness occurred beyond this period. However, findings from a large number of other studies have since challenged this view. Kujala et al. (1997) first suggested the possibility of crossmodal reorganization in LB individuals by showing posterior event-related potential (ERP) responses similar to those observed in EB when they performed sound-change detection tasks. Subsequently, a PET study revealed activation of visual cortex, albeit manifesting somewhat different patterns, during Braille reading and auditory word processing in both EB and LB subjects (Büchel et al., 1998). This was later followed by a series of studies by Burton et al. in which LB were shown to activate occipital regions in response to a variety of tactile and auditory tasks (Burton et al., 2002a,b, 2003, 2004, 2006; Burton and McLaren, 2006). 
Similarly, several auditory spatial tasks elicited occipital activations in late-onset blind individuals (Voss et al., 2006, 2008, 2010). However, these crossmodal changes were not accompanied by behavioral enhancements, as is the case in EB individuals, raising questions concerning the functional relevance of the observed crossmodal plasticity in LB.

Despite some exceptions, there thus appears to be some agreement that crossmodal recruitment of deafferented visual areas is not exclusive to EB and can be observed in cases of late-onset blindness as well. While this is the case, the crossmodal recruitment in LB appears to be nonetheless generally reduced (both in terms of intensity and spatial extent) relative to EB, suggesting that while the development of crossmodal plastic processes might not be bound by a critical period, it is definitely modulated by a sensitive period in early development during which reorganization is likely to be more pronounced.

#### **CROSSMODAL CHANGES IN SIGHTED INDIVIDUALS**

Additional evidence supporting the existence of adult crossmodal plasticity stems from research investigating the effects of temporary visual deprivation in normal sighted individuals. One of the first studies to document such effects revealed that short-term light deprivation enhances the excitability of visual cortex. Indeed, a brief period of visual deprivation was shown to not only induce a reduction in the TMS thresholds required for eliciting phosphenes but also lead to an increase in visual cortex activation by photic stimulation (Boroojerdi et al., 2000). Subsequently, using a pharmacological approach in combination with TMS, it was shown that GABA, NMDA, and cholinergic receptors likely play an important role in rapid experience-dependent plasticity in visual cortex, as administering appropriate agonists/antagonists eliminated the TMS phosphene-threshold decrease associated with transient visual deprivation (Boroojerdi et al., 2001).

These findings were soon followed by research inspired by a school for the blind in Spain, which required that its instructors experience daily life without sight for an entire week during training (Pascual-Leone and Hamilton, 2001). The instructors reported having heightened awareness for sounds, being able to better distinguish different speakers and to better orient themselves in response to incoming sounds. To follow up on these reports, Pascual-Leone and Hamilton (2001) developed a protocol in which sighted volunteers would be blindfolded for 5 days. Preliminary findings revealed an increase in BOLD signal within the occipital cortex in response to tactile stimulation after 5 days of complete visual deprivation, and that this increase was no longer present the day following blindfold removal. These findings indicated that rapid crossmodal changes can occur in the occipital cortex of adults when temporarily deprived of vision, and were further documented in Merabet et al. (2008). Remarkably, such crossmodal deprivation-related effects were limited to the blindfolding period and were rapidly reversible.

Subsequent work has impressively shown that very short time periods of visual deprivation are sufficient to induce marked crossmodal changes in occipital cortex. For instance, Weisser et al. (2005) demonstrated that 2 h of visual deprivation was enough to induce the neural changes for the processing of tactile shapes within the occipital cortex of normally sighted individuals. In a recent study, we used a novel technique to determine whether occipital cortex processes auditory input in a similar manner to auditory cortex (Lazzouni et al., 2012). We developed a blindfolding protocol to assess the effects of short-term visual deprivation on the auditory steady state response (ASSR). The ASSR can be defined as an electrophysiological response to rapidly changing auditory stimuli, where neuronal populations respond at the same frequency as the modulation rate of an amplitude-modulated (AM) tone and, importantly, for which the sources of the activity can be extracted using dipole analyses. The ASSR therefore constitutes a powerful tool as it evokes a response that is intrinsically linked to the stimulus and can be tracked within the brain. The results showed that the two spectral peaks associated with the modulation rates of two dichotically presented stimuli (39 and 41 Hz) were observed only within auditory cortex prior to blindfolding. Following 6 h of visual deprivation, however, two peaks were also observed in occipital cortex (see **Figure 2**), thus shedding light on the timeline associated with short-term crossmodal recruitment of input-deprived sensory cortices. This finding also demonstrates that visual cortex can display auditory cortex-like functioning in response to auditory input during periods of deprivation.
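The frequency-tagging logic behind the ASSR can be illustrated with a toy signal-processing sketch. It is not the MEG and dipole-analysis pipeline of Lazzouni et al. (2012): the carrier frequency, sampling rate and duration are arbitrary assumptions, and simple rectification stands in, very loosely, for a neural population that follows the envelope of the sound. The sketch only shows why a response locked to the modulation rate produces a spectral peak at 39 or 41 Hz that identifies which stimulus drove it.

```python
import cmath
import math


def am_tone(carrier_hz, mod_hz, fs, duration_s, depth=1.0):
    """Amplitude-modulated tone: a carrier whose envelope varies at mod_hz."""
    n_samples = int(fs * duration_s)
    return [
        (1.0 + depth * math.cos(2 * math.pi * mod_hz * i / fs))
        * math.cos(2 * math.pi * carrier_hz * i / fs)
        for i in range(n_samples)
    ]


def envelope_peak_hz(signal, fs, lo_hz=30, hi_hz=50):
    """Integer frequency (Hz) with the largest envelope energy in [lo_hz, hi_hz]."""
    # Rectification is a crude envelope follower (an assumption of this sketch).
    rectified = [abs(x) for x in signal]

    def magnitude(freq_hz):
        # Single-frequency discrete Fourier transform of the rectified signal.
        return abs(sum(
            x * cmath.exp(-2j * math.pi * freq_hz * i / fs)
            for i, x in enumerate(rectified)
        ))

    return max(range(lo_hz, hi_hz + 1), key=magnitude)


FS = 4000  # sampling rate in Hz (assumed)
tone_39 = am_tone(carrier_hz=500, mod_hz=39, fs=FS, duration_s=1.0)
tone_41 = am_tone(carrier_hz=500, mod_hz=41, fs=FS, duration_s=1.0)
peaks = (envelope_peak_hz(tone_39, FS), envelope_peak_hz(tone_41, FS))
```

With these settings the two envelope spectra peak at 39 Hz and 41 Hz respectively. Because each stimulus carries its own modulation rate, a peak at one of those rates in a given cortical source can be attributed to that stimulus.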

#### **CROSSMODAL PLASTICITY: EARLY- vs. LATE-ONSET BLINDNESS**

The previous sections documented multiple demonstrations of the crossmodal processing that occurs in the mature occipital cortex. However, an important question to ask concerns whether the plasticity observed in the adult brain is similar to what is observed in the visually deprived immature brain. Aside from the typical observation of reduced crossmodal recruitment in LB (with the exception of Büchel et al. (1998) who reported greater activation in LB), the following sections will highlight four major distinctions between the crossmodal changes observed for early and late onset blindness that argue for the existence of important underlying functional differences between the two (see also **Figure 3**). Indeed these findings point not only to quantitative differences (i.e., the amount of crossmodal recruitment observed) between the compensatory reorganization that occurs following early and late onset blindness, but also to qualitative ones relating to, for instance, the underlying mechanisms of crossmodal recruitment and its functional relevance to behavior.

#### *Functional relevance of crossmodal processing*

As highlighted above, there is an abundance of evidence demonstrating the functional relevance of the crossmodal recruitment of occipital areas in EB. Several studies have shown strong correlations between behavioral performance and occipital activity (Amedi et al., 2003; Gougoux et al., 2005; Raz et al., 2005), whereas others have shown that the temporary (Cohen et al., 1997; Amedi et al., 2004; Collignon et al., 2007) and permanent (Hamilton et al., 2000) dysfunction of occipital neurons interferes with performance in non-visual tasks. Interestingly, there is little to no evidence of this in LB. This is likely in part due to the limited evidence of enhanced perceptual abilities in LB, as they are often found to be indistinguishable from sighted individuals in terms of performance. The observed crossmodal recruitment in LB therefore does not seem to lead to any behavioral gain as it does in EB. This assumption is supported by data provided by Cohen et al. (1999), where performance on a Braille reading task was unaffected in LB by the application of TMS over occipital cortex, whereas it reduced performance in EB. While there are a few exceptions where LB have demonstrated heightened perceptual abilities compared to sighted individuals (e.g., Voss et al., 2004), such instances have generally not been associated with increased crossmodal plasticity. Indeed, several other factors could explain increased performance (e.g., training, experience) without the involvement of occipital regions.

**Figure 2 | ... individuals.** This figure portrays a recent MEG finding that testifies to the impressive speed at which the visual cortex can display auditory cortex-like functioning following a short period of visual deprivation. The left graph shows that prior to blindfolding the two spectral peaks (left temporal in red; right temporal in green) associated with the modulation rates of the auditory stimuli presented to both ears (39 and 41 Hz) are clearly restricted to the temporal electrodes (auditory cortex). However, as shown in the right graph, the same peaks can now be found in visual cortex (purple peaks) following a 6 h visual deprivation period. Adapted with permission from Lazzouni et al. (2012).

One previously proposed hypothesis to explain the occipital activations observed in the late-blind held that they might be the result of mental imagery processes. Büchel et al. (1998) reported that their LB subjects immediately transformed tactile and auditory cues into a visual representation, implying that any occipital activation could be due to "visualization" of the task. While such visual imagery processes have been shown to activate components of the visual system in normally sighted individuals (Kosslyn et al., 1995), more recent paradigms have shown that occipital recruitment necessitates more active tasks that explicitly require subjects to use visual imagery (Kosslyn et al., 2001). Moreover, the visual imagery hypothesis loses traction when considering that occipital recruitment is seldom observed in the sighted when performing non-visual tasks that are also performed by the blind. The hypothesis would therefore imply the unlikely scenario in which LB individuals resort to visual imagery whereas sighted individuals do not. In fact, it is often reported that when sighted individuals perform non-visual tasks, cross-modal inhibitory mechanisms are engaged (e.g., occipital deactivation is observed) to reduce the functioning of cortices subserving the unattended (and potentially distracting) visual modality (e.g., Laurienti et al., 2002; Gougoux et al., 2005).

#### *Attentional mechanisms/processes*

One exception that has linked superior performance in LB to brain changes has done so using an auditory spatial change-detection task and ERP measurements (Fieger et al., 2006). LB participants were significantly more accurate than sighted participants at localizing/detecting deviant auditory stimuli in peripheral auditory space (performance for both groups was identical for central auditory positions). This was also a task at which the CB had previously been shown to excel (Röder et al., 1999), and important differences were observed when comparing the ERP results from both studies. The N1 ERP component displayed a more sharply tuned spatial gradient during peripheral attention in CB than in the sighted group, whereas the P3 component was identical in both groups (Röder et al., 1999). Conversely, the early N1 amplitude to peripheral standard stimuli displayed no significant spatial tuning in either the LB or the sighted controls, whereas the amplitude of the later P3 elicited by targets/deviants displayed a more sharply tuned spatial gradient during peripheral attention in LB compared to controls (Fieger et al., 2006). As such, it appears that CB persons possess a more sharply tuned early attentional filtering, manifested in the N1 component, while LB show superiority at deploying late attentional processes of target discrimination and recognition, indexed by the P3 component. These findings therefore strongly suggest that even when both CB and LB individuals show a behavioral advantage over sighted subjects on a given task, these enhancements are potentially mediated by different underlying cerebral mechanisms.

#### *Source of auditory input into the occipital cortex*

The potential role played by corticocortical connections in mediating the crossmodal recruitment of occipital cortex was specifically underlined in previous sections. For instance, a DTI tractography analysis has shown the existence of direct connections between primary auditory and visual areas in normally sighted individuals (Beer et al., 2011), whereas the use of DCM enabled researchers to establish that the functional connectivity between both structures is stronger in EB than in sighted individuals (Klinge et al., 2010). To address the possible differences between EB and LB individuals, we have recently shown, using DCM analyses, that the flow of auditory information into the occipital cortex might be mediated by a different pathway in LB (Collignon et al., 2013). Since it was recently demonstrated, using DCM, that crossmodal plasticity observed in CB individuals is more likely to be supported by corticocortical connections rather than thalamocortical connections (Klinge et al., 2010), we included only corticocortical connections in our models. Our findings indicated that the auditory activity observed in occipital cortex of CB individuals was best explained by direct feed-forward connections from primary auditory to primary visual cortex, whereas in LB, auditory information appears to rely more on an indirect feedback route using parietal regions as a relay between both primary sensory areas (Collignon et al., 2013). This strongly suggests that the crossmodal recruitment of visually deafferented areas is likely mediated by different pathways in EB and LB.

Indeed, it is highly likely that EB individuals have access to different pathways given the excessive connectivity between regions in early development. The synaptic density of visual cortex reaches levels greater than that of adults in early infancy through synaptogenetic processes, and then gradually decreases to adult levels by approximately 5 years of age through the pruning of exuberant connections (Johnson, 1997), a process that is interrupted by visual deprivation (Stryker and Harris, 1986). Moreover, as highlighted above, the existence of such corticocortical connections between auditory and visual areas has been shown in young infant animals, only for the majority of these connections to be pruned away during normal development (Innocenti and Clarke, 1984; Innocenti et al., 1988). Whether similar corticocortical connections are also more prominent during early development in humans is unclear, but if present, EB may utilize and strengthen these normally transient exuberant connections to compensate for the loss of sight through experience-dependent stabilization processes, whereas LB must rely on connections that develop within the normal visual brain.

#### *Preserved functional specialization*

More recent research has begun to examine whether the crossmodal takeover of occipital cortex due to blindness follows some sort of organizational principle. There are now several lines of evidence stemming from neuroimaging studies [reviewed in Voss and Zatorre (2012)] that illustrate how the pre-existing functional specialization of specific cortical regions appears to be preserved following visual deprivation. As discussed earlier, a well documented example of this concerns the LOC, notably involved in object/form recognition processes. Amedi et al. (2007, 2010) have shown on multiple occasions that this region is also recruited by auditory and tactile form recognition tasks in EB individuals. Similarly, the visual motion processing center (area MT) has been shown to be recruited by both tactile (Ricciardi et al., 2007) and auditory (Poirier et al., 2006) motion stimuli. Both of these examples convincingly suggest that visual deprivation does not alter the specialized modular organization of the visually deafferented occipital areas of the brain, and that the operations subserved by each region need not depend on visual input to be solicited by a given task. Importantly, with respect to the objectives of this paper, two recent investigations have comparatively investigated this topic in both CB and LB individuals. First, we have recently shown that while both recruit occipital regions for sound processing, the preferential activation of the right dorsal stream for the spatial processing of sounds (compared to spectral processing of sounds) was only observed in CB (Collignon et al., 2013). This suggests that these occipital regions maintain a functional specialization for spatial processing in other senses only if vision is lost early in life. A second example supporting such a claim was provided by Bedny et al. (2012) who investigated the role of the visual cortex in language processing in both CB and LB individuals. 
Again, while they observed that occipital cortex was recruited by general auditory input in both groups, a preferential response to speech stimuli in the left hemisphere (compared to non-speech) was only observed in CB, suggesting that early visual experience might be detrimental to the occipital cortex acquiring a role in language processing following blindness.

The above-mentioned points raise interesting questions concerning the role played by sensitive/critical periods. While the general observation of crossmodal recruitment in LB individuals suggests that it is subject to the influence of a sensitive period, the highlighted differences indicate that different processes might be mediating the observed crossmodal recruitment in EB and LB individuals. If this is indeed the case, it rather suggests that *critical periods* may play a role after all, with perhaps varying cutoff points with regard to the different processes in play. Consequently, future work would benefit from attempting to target these issues by relating the age of blindness onset with the development of specific particularities that so far have only been observed in early blindness (e.g., functional relevance of recruitment, corticocortical connectivity).

## **IMPLICATIONS FOR SIGHT RESTORATION**

What happens to the ability of the "visual" brain to process visual information once it "goes auditory?" Such a question has important repercussions when considering the potential outcomes of sight restoration procedures and prostheses. Over three centuries ago, the Irish philosopher William Molyneux posed an analogous question to one of his contemporaries, John Locke, on how long-term blindness would affect one's ability to see should sight be restored (Degenaar, 1996): *"Suppose a man born blind, and now adult, and then taught by his touch to distinguish between a cube and a sphere of the same metal, and the same bigness, so as to tell, when he felt one and the other, which is the cube, which is the sphere. Suppose then, the cube and the sphere placed on a table, and the blind man to be made to see. Query, whether by sight, before he touched them, he could distinguish, and tell, which is the globe, which is the cube?"* While this matter has since been debated for decades on end between various historical figures, there have been several case studies that have provided some insight into the matter, demonstrating for instance that visual acuity is severely reduced after cataract removal surgery following prolonged periods of deprivation (von Senden, 1960; Gregory and Wallace, 1963; Fine et al., 2003). Additionally, recent neuroimaging data allow for the investigation of potential underlying mechanisms. As already noted, the visual brain goes through drastic changes that might significantly alter an individual's ability to process visual information should sight be restored. The next section will, however, first examine research with deaf individuals, as technological advances for restoring hearing in profoundly deaf individuals have achieved a fair deal of success with the development of sophisticated cochlear implants (CI). 
Such progress has allowed researchers to ascertain the consequences of crossmodal plasticity in the deaf population on the success rate of CIs, and will therefore provide insight into how to approach the same issues in blindness.

#### **INSIGHTS FROM THE DEAF**

Once they have become responsive to a new input modality, can the auditory cortices still respond to their original source of input? This question bears special importance given that profound deafness can sometimes be reversed by auditory stimulation via a cochlear implant (CI) (Ponton et al., 1996). Put simply, the device replaces normal cochlear function by converting auditory signals into electrical impulses delivered to the auditory nerve (see Mens, 2007 for further details). Several studies have shown the existence of a critical period that cannot be exceeded for recovery of auditory functions following aural deprivation (Kral et al., 2005; Sharma et al., 2005). This time window is generally limited to the first few years of life, with evidence suggesting that if implanted before the age of 2, children can acquire spoken language in a comparable time-frame to normal hearing children (Waltzman and Cohen, 1998; Hammes et al., 2002).

Although it was initially thought that the duration of auditory deprivation should account for most of the variance of the implantation outcome, several lines of evidence clearly suggest other modulating factors (O'Donoghue et al., 2000; Lee et al., 2001; Sarant et al., 2001). In fact, a retrospective case review showed that the duration of deprivation only accounted for 9% of the variability in implant outcome (Green et al., 2007). An alternative predictor can be found in preoperative measures of cerebral metabolism. Lee et al. (2001), for instance, showed that the temporal cortex becomes hypometabolic following auditory deprivation, and that the level of hypometabolism is correlated with speech comprehension scores obtained postimplantation. In other words, the longer a person has been deaf, the less likely it is that their temporal cortex will be hypometabolic and the more likely their speech perception capacity will be compromised. In the same vein, it was later shown that speech perception performance was negatively associated with activity in occipito-temporal networks (Lee et al., 2005), even when factoring out the confounding effect of age of implantation (Lee et al., 2007). Furthermore, other important processes may also be at play, such as the level of crossmodal reorganization of the auditory cortex (see Giraud and Lee, 2007). For instance, one study compared cortical evoked potentials involved in the processing of visual stimuli in implanted subjects (Doucet et al., 2006). After the speech perception abilities of the implanted subjects were evaluated, the subjects were divided into two groups based on their performance. It turned out that the group with the poorest performers for speech perception was also the one where implanted individuals showed broader and more anterior scalp distributions when processing visual stimuli (i.e., likely the result of crossmodal processing of the visual stimuli in temporal auditory areas), and vice-versa. 
It thus appears that several interacting factors influence the outcome of cochlear implantation, an important one of which is crossmodal reorganization. Awareness of this fact will evidently have an important impact on how similar concerns are addressed in blindness.

#### **IS THE VISUAL SYSTEM STILL VISUAL FOLLOWING BLINDNESS?**

Knowing whether crossmodal plastic changes are reversible is crucial to the proper development of neuroprostheses designed to restore vision in blind individuals. Although significant progress has been made toward achieving such a goal, future research is extremely dependent on our understanding of how blindness affects the brain, and on how these effects are driven or modulated by the age of blindness onset. Indeed, the brain of a LB individual may be more apt to process visual input following a prolonged period of visual deprivation, whereas the brain of an EB individual has likely undergone permanent plastic changes rendering it unable to process visual information. For instance, the finding that the optic tracts and radiations are atrophied in EB (Noppeney et al., 2005; Shimony et al., 2006; Pan et al., 2007; Park et al., 2007; Ptito et al., 2008) raises serious questions about the integrity of the pathways and whether or not they could convey electrical information stemming from retinal, subretinal, or epiretinal implants (see Merabet et al., 2005), or even transmit retinal images obtained following cataract removal in individuals with congenital cataracts. Furthermore, the numerous reports of significant reduction of cortical gray matter in occipital cortex raise serious questions regarding the area's ability to process visual input (Pan et al., 2007; Ptito et al., 2008; Lepore et al., 2010). A compensatory approach more likely to provide a successful outcome in EB is the use of sensory-substitution devices, where one sensory modality is used to supply information normally gathered by the deprived sense. Perhaps the most well known example of this is Braille, which of course has been highly successful in providing information normally acquired through vision (e.g., reading material) via the tactile modality. 
Several more sophisticated devices, which transform visual information captured via cameras into spatially relevant tactile or auditory stimulation, have since been implemented (Meijer, 1992; Bach-y-Rita et al., 1998; Capelle et al., 1998) and have allowed blind individuals "to see" complex two-dimensional objects and shapes (e.g., Arno et al., 1999; Renier and De Volder, 2010), and more recently to even navigate around obstacles in a highly controlled environment (Chebat et al., 2011). While these devices are not at a point where they can be relied upon to successfully navigate in the real world, they provide nonetheless a very promising avenue for future research designed to aid visually deprived individuals.

Visual restoration, however, might still be possible for LB individuals. For instance, Pan et al. (2007) showed that white matter (WM) loss in the optic tract and radiation of EB individuals was modulated by the age of blindness onset, suggesting that a later onset would have less effect on the anatomical integrity of the visual pathways. Moreover, Schoth et al. (2006) found no evidence of WM loss in either visual cortex or in visual tracts in subjects that could be categorized as LB (with a mean age of blindness onset of twelve), suggesting that the visual pathways may still be able to communicate signals toward occipital cortex. Consequently, approaches that involve cataract removal and retinal implants are likely to be considerably more viable in individuals that benefitted from the normal development of the visual system.

#### **FUTURE CONSIDERATIONS**

Given the influence early development has on the emergence of crossmodal plastic phenomena in blind individuals, what steps need to be taken to further our understanding of the different processes at play? A crucial first step will be to address inconsistencies across the literature regarding how blind individuals are segregated into different groups based on their age at the onset of blindness (e.g., EB and LB individuals). This segregation is often done in a very arbitrary manner, as very few studies use the same definitions to classify and circumscribe early and late onset blind groups, and in fact the groups occasionally overlap across published reports. This is of course quite troublesome when wanting to compare findings across studies, and will require greater care and cooperation between research groups in order for future work to yield fruitful results.

As first highlighted in **Box 1**, the current lack of uniformity across studies in defining the range of onsets of blindness for EB and LB groups has yielded at least two substantial issues. The first relates to the frequent exclusion of a large group of blind individuals whose ages of blindness onset lie between the chosen cut-offs for early and late onset blind groups. This practice not only introduces a strong sampling bias, but also removes potentially important data when investigating blindness-induced crossmodal plasticity. Indeed, important developmental sensitive periods may take place during this gap in the ages of onset. The addition of one or more distinct blind groups covering this gap could help alleviate the loss of potentially important information. In this vein, Li et al. (2013) very recently addressed this issue by defining four distinct groups in investigating brain anatomical connectivity networks: CB, EB (onset after birth but prior to the age of 12), adolescent blind (onset between 12 and 15 years of age inclusively) and LB (onset after 15 years of age). While the chosen ranges could be debated, this nonetheless represents an important first step in directing future research. This work also highlights the fact that it might also be wise to divide CB and EB individuals into separate groups (which several groups have started doing), as even a few years of visual experience could have a significant impact on the functional architecture of the visual system and on the manner in which it is crossmodally recruited following blindness. Indeed, the use of a continuum of onsets of blindness will better allow for the direct investigation of the developmental time-course of processes that govern the emergence of crossmodal plasticity.

The second, potentially more serious issue arising from the inconsistent definitions concerns the frequent overlap of groups across different studies; i.e., a given blind individual could be categorized as an EB individual in one study and as an LB individual in another. For instance, Burton et al. (2002b, 2003, 2004, 2006) have often considered individuals with onsets of blindness occurring after the age of 7 as LB individuals; so have Fieger et al. (2006) and Bedny et al. (2012) for individuals with onsets occurring after the age of 9. This is in stark contrast with other reports that have considered individuals with an onset occurring prior to 13 years of age as EB (e.g., Cohen et al., 1999; Sadato et al., 2002; Voss et al., 2008). This is a clear indication that greater effort and care should be put into homogenizing blind group definitions in order to better understand the effects of sensitive and critical periods in sensory deprivation.
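The overlap problem can be made concrete with a minimal sketch that applies the cut-offs quoted above to a single individual. The cut-off ages come from the text; treating each as a simple binary rule, and the function and dictionary names, are assumptions made purely for illustration and do not reproduce the inclusion criteria of the cited studies.

```python
# Cut-off ages (years) as quoted in the text. Treating each one as a
# simple binary rule is an assumption made for this illustration.
LB_IF_ONSET_AFTER = {
    "Burton et al.": 7,
    "Fieger et al. / Bedny et al.": 9,
}
EB_IF_ONSET_BEFORE = {
    "Cohen et al. / Sadato et al. / Voss et al.": 13,
}


def labels_for(onset_age):
    """Label one individual under each set of published cut-offs."""
    labels = {}
    for study, cutoff in LB_IF_ONSET_AFTER.items():
        labels[study] = "LB" if onset_age > cutoff else "not LB"
    for study, cutoff in EB_IF_ONSET_BEFORE.items():
        labels[study] = "EB" if onset_age < cutoff else "not EB"
    return labels


def is_ambiguous(onset_age):
    """True when the same person is LB in one scheme and EB in another."""
    values = set(labels_for(onset_age).values())
    return "LB" in values and "EB" in values


# Hypothetical individual who lost sight at age 10.
example = labels_for(10)
```

Under these rules, any onset age above 7 and below 13 is labeled late blind by one scheme and early blind by another, which is exactly the range in which findings become difficult to compare across studies.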

Lastly, the above-mentioned concerns could also be significantly alleviated by simply moving away from creating groups altogether. Certainly, it could be argued that we should be looking to use the age of blindness onset more as a continuous variable and search for non-linearities in the resulting functions linking the age of onset with various dependent variables, which would be indicative of sudden changes in the occurrence of crossmodal plasticity, possibly resulting from important critical or sensitive periods. For instance, if crossmodal plasticity changes only quantitatively over time, then the relationship between the age of blindness onset and various dependent variables would be a linear one. However, as discussed above, there are several lines of work suggesting that the crossmodal plastic process also undergoes some qualitative changes with later onsets, suggesting that the relationship could in fact be non-linear. Such an approach would have multiple benefits, perhaps none greater than the removal of the group definitions, which are often highly arbitrary and the cause of discrepancies between studies. Moreover, treating the age of blindness onset as a continuous variable should allow for the extraction of important time-points during the development of crossmodal plastic phenomena in a data-driven way, rather than by the use of a-priori definitions of particular subgroups based on the age at blindness onset.
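One simple way to search for such a non-linearity is to compare a single straight-line fit with a two-segment fit whose breakpoint is chosen by the data. The sketch below is only an illustration of that idea: the onset ages and "recruitment" scores are synthetic values invented for the example (a plateau followed by an abrupt drop at age 8), not results from any study, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def best_breakpoint(xs, ys, min_points=3):
    """Split the age-sorted data into two separately fitted lines and
    return (breakpoint_age, total_sse) for the best-fitting split."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_points, len(pairs) - min_points + 1):
        left, right = pairs[:i], pairs[i:]
        sse = fit_line(*zip(*left))[2] + fit_line(*zip(*right))[2]
        if best is None or sse < best[1]:
            best = (right[0][0], sse)
    return best


# Synthetic, noiseless data invented for illustration only.
ages = list(range(20))
scores = [10.0 if a < 8 else 4.0 - 0.2 * (a - 8) for a in ages]

linear_sse = fit_line(ages, scores)[2]
break_age, piecewise_sse = best_breakpoint(ages, scores)
```

With these synthetic values the search recovers the built-in discontinuity at age 8. A two-segment model always fits at least as well as a single line, so with real and noisy data the comparison would need to penalize the extra parameters (for example with an information criterion or a nested-model test) before any breakpoint is interpreted as evidence of a sensitive or critical period.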




**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 April 2013; accepted: 05 September 2013; published online: 26 September 2013.*

*Citation: Voss P (2013) Sensitive and critical periods in visual sensory deprivation. Front. Psychol. 4:664. doi: 10.3389/fpsyg.2013.00664*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Voss. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

**REVIEW ARTICLE** published: 16 October 2013 doi: 10.3389/fpsyg.2013.00719

## Benefits and detriments of unilateral cochlear implant use on bilateral auditory development in children who are deaf

## *Karen A. Gordon<sup>1,2,3</sup>\*, Salima Jiwani<sup>1,2</sup> and Blake C. Papsin<sup>1,3</sup>*

*<sup>1</sup> Archie's Cochlear Implant Laboratory, The Hospital for Sick Children, Toronto, ON, Canada*

*<sup>2</sup> Institute of Medical Sciences, Faculty of Medicine, University of Toronto, Toronto, ON, Canada*

*<sup>3</sup> Department of Otolaryngology – Head and Neck Surgery, Faculty of Medicine, University of Toronto, Toronto, ON, Canada*

#### *Edited by:*

*Virginia Penhune, Concordia University, Canada*

#### *Reviewed by:*

*Minna Huotilainen, University of Helsinki, Finland; Heikki Lyytinen, University of Jyväskylä, Finland*

#### *\*Correspondence:*

*Karen A. Gordon, Archie's Cochlear Implant Laboratory, The Hospital for Sick Children, 555 University Avenue, Room 6D06, Toronto, ON M5G 1X8, Canada*

*e-mail: Karen.gordon@utoronto.ca*

We have explored both the benefits and detriments of providing electrical input through a cochlear implant in one ear to the auditory system of young children. A cochlear implant delivers electrical pulses to stimulate the auditory nerve, providing children who are deaf with access to sound. The goals of implantation are to restrict reorganization of the deprived immature auditory brain and promote development of hearing and spoken language. It is clear that limiting the duration of deprivation is a key factor. Additional considerations are the onset, etiology, and use of residual hearing as each of these can have unique effects on auditory development in the pre-implant period. New findings show that many children receiving unilateral cochlear implants are developing mature-like brainstem and thalamo-cortical responses to sound with long term use despite these sources of variability; however, there remain considerable abnormalities in cortical function. The most apparent, determined by implanting the other ear and measuring responses to acute stimulation, is a loss of normal cortical response from the deprived ear. Recent data reveal that this can be avoided in children by early implantation of both ears simultaneously or with limited delay. We conclude that auditory development requires input early in development and from both ears.

**Keywords: deafness, cochlear implantation, unilateral hearing, auditory brainstem responses, auditory evoked cortical potentials, auditory development, plasticity, binaural hearing**

## **INTRODUCTION**

A cochlear implant is an auditory prosthesis which is surgically implanted into the cochlea (inner ear), and allows children who are deaf to develop oral speech and language. Because the brain is most susceptible to changes in early life, providing access to sound at a young age is essential to promote auditory development (Papsin and Gordon, 2007; Kral and O'Donoghue, 2010). The implant cannot restore normal hearing. It provides only a crude representation of acoustic sounds, eliminates important cochlear processing, and may not be able to completely reverse the effects of deafness. In addition, cochlear implants were traditionally provided unilaterally (i.e., in only one ear) in children, leaving the opposite pathways deprived of input and susceptible to degeneration and reorganization (O'Neil et al., 2010; Gordon et al., 2013; Kral et al., 2013). Yet, despite these disadvantages, many children achieve excellent listening and oral communication abilities. In the present review, we share findings from studies exploring whether cochlear implantation can limit reorganization of the deprived immature auditory brain and promote appropriate and normal-like development along the auditory pathways.

## **THE AUDITORY SYSTEM REORGANIZES WHEN BILATERALLY DEPRIVED**

Prior to cochlear implantation, the absence of auditory input to the auditory system leaves the brain vulnerable to reorganization (Nishimura et al., 1999; Bavelier et al., 2000, 2006; Finney et al., 2001; Lee et al., 2001; Bavelier and Neville, 2002; Merabet and Pascual-Leone, 2010). Secondary and association auditory areas, including parts of the planum temporale, all of which respond to multi-sensory input including hearing, vision and touch (Pandya and Yeterian, 1985; Giard and Peronnet, 1999; Calvert et al., 2001; Calvert and Thesen, 2004), become recruited by the visual (Finney et al., 2001; Lee et al., 2001, 2007b; Lomber et al., 2010; Meredith and Lomber, 2011) and somatosensory (Levänen et al., 1998; Levänen and Hamdorf, 2001; Auer et al., 2007; Meredith and Lomber, 2011) systems to perform non-auditory functions. As a consequence of early auditory deprivation, processing of visual peripheral localization by the posterior auditory field (Lomber et al., 2010), visual motion detection by the dorsal zone of the auditory cortex (Lomber et al., 2010), and somatosensory sensation by the anterior auditory field (Meredith and Lomber, 2011) become enhanced in individuals who are deaf. These changes appear to result from a direct competition for resources in areas which receive multi-sensory input. If governed by principles of Hebbian processing (Hebb, 1949; Abbott and Nelson, 2000; Song et al., 2000), neurons in these areas might preferentially form viable connections with nonauditory inputs to the detriment of inputs carrying auditory information. We must be concerned by the reorganization of the deaf auditory cortex because, depending on how quickly these processes occur, they may be impossible to reverse and could impair outcomes after cochlear implantation. It is also becoming clear that these changes do not occur uniformly in children who are deaf and may be related to the heterogeneity in the onset and cause of pediatric deafness (Gordon et al., 2011a,c).

Limiting the period of bilateral deafness in early life is essential to drive maturation in the auditory pathways (O'Donoghue, 1999; Kral et al., 2001; Ponton and Eggermont, 2001; Sharma et al., 2005; Papsin and Gordon, 2007; Gordon et al., 2008a, 2010; Nikolopoulos et al., 2009), and promote optimal hearing and speech and language development (Beadle et al., 2005; Harrison et al., 2005; Nicholas and Geers, 2007; Geers and Sedey, 2011). Many studies investigating auditory development after cochlear implantation focus on children who are deaf in infancy, but do not examine the larger heterogeneity in etiology, onset and/or degree of deafness. These factors may each have unique effects on auditory activity in the brain prior to implantation. For example, biallelic mutations of the Gap Junction Beta-2 (GJB-2) gene cause deficits in the cochlea, likely at very early stages of development, with possible consequences for auditory function after implantation (Propst et al., 2006). The GJB-2 gene normally codes for the connexin-26 protein, which creates gap junctions in the cochlea necessary for the appropriate release and maintenance of electrochemical gradients. This, in turn, generates action potentials and stimulates the auditory nerve (Kelley et al., 1998; Cohn and Kelley, 1999; Gualandi et al., 2002). Electrophysiological recordings of auditory evoked cortical activity at initial cochlear implant activation in children with severe GJB-2 mutations revealed that responses from the cortex were more homogeneous in this cohort compared to those children who did not have such a mutation. Auditory evoked cortical responses in children with GJB-2 mutations were characteristic of earlier stages of cortical development, perhaps reflecting restricted spontaneous activity in the auditory system and more limited access to sound prior to implantation compared to their peers who did not have a GJB-2 related deafness (Gordon et al., 2011c).
This was further supported by poorer hearing sensitivity in the low frequencies in the GJB-2 group (Propst et al., 2006).

The degree of residual hearing is another important predictive factor for cochlear implant outcomes. Traditional candidacy criteria for cochlear implantation in children include a diagnosis of permanent severe-to-profound hearing loss bilaterally with little or limited access to acoustic input through hearing aids (Osberger et al., 2002). We recently reported that children who had better hearing at 250 Hz used their hearing aids for longer durations prior to receiving a cochlear implant (Hopyan et al., 2012). Of interest, these children performed significantly better on tests of music perception with their implants, particularly when detecting differences in rhythm, compared to children who did not have acoustical access to these low frequencies prior to implantation (Hopyan et al., 2012). Thus, there are advantages of acoustical input for auditory development which can be capitalized upon after cochlear implantation. In general, we are learning that the cause, onset and degree of deafness in any one child will be important to understand in order to ensure that he/she makes the best possible use of his/her device.

## **UNILATERAL COCHLEAR IMPLANTATION RESTORES HEARING AND PROMOTES AUDITORY DEVELOPMENT**

The cochlear implant was made available to children in North America in the early 1990s and works by stimulating the auditory pathways with electrical pulses. The implant contains an array of electrodes which is surgically placed in the scala tympani of the cochlea. These electrodes each deliver electrical pulses to stimulate the auditory nerve. External equipment is worn which takes in acoustic sound through the microphone, extracts frequency and intensity information in a speech processor and sends instructions to an internal device through an FM transmitting coil. The internal receiver-stimulator sends this information to the electrodes which are organized to mimic the normal cochlea; high frequency sounds are allocated to basal electrodes with lower frequencies being allocated to progressively more apical electrodes. In this way, the child receives an electrical representation of the acoustic world and learns to understand sounds including speech.
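The tonotopic allocation described above can be illustrated with a short sketch. It assumes logarithmically spaced analysis bands and a hypothetical 12-electrode array covering 200 to 8000 Hz. The function names, electrode count, numbering convention and frequency limits are illustrative assumptions and do not describe any particular device or the processing strategy used in the studies reviewed here.

```python
def allocate_bands(n_electrodes=12, f_low=200.0, f_high=8000.0):
    """Split [f_low, f_high] Hz into log-spaced bands, one per electrode.

    Assumption: electrode 1 is the most apical contact (lowest band) and
    electrode n_electrodes is the most basal contact (highest band).
    All numeric defaults are hypothetical, not device specifications.
    """
    ratio = (f_high / f_low) ** (1.0 / n_electrodes)
    edges = [f_low * ratio ** i for i in range(n_electrodes + 1)]
    return [(i + 1, edges[i], edges[i + 1]) for i in range(n_electrodes)]


def electrode_for(freq_hz, bands):
    """Return the electrode whose band contains freq_hz, or None if out of range."""
    for electrode, lo, hi in bands:
        if lo <= freq_hz < hi:
            return electrode
    return None


bands = allocate_bands()
low_site = electrode_for(250.0, bands)    # falls in an apical band
high_site = electrode_for(4000.0, bands)  # falls in a more basal band
```

Under these assumptions, a low-frequency component is routed to an apical electrode and a high-frequency component to a more basal one, mirroring the place coding of the normal cochlea.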

Auditory brainstem development, measured by decreasing latencies of evoked potential peaks, is largely complete by the first year of cochlear implant use in children with early onset deafness (Gordon et al., 2003, 2006), indicating increasing efficiency of neural conduction and improved neural synchrony with exposure to sound (Gordon et al., 2003). Similar changes have been reported from the auditory brainstems of normal hearing children over a similar time-course (Salamy and McKean, 1976; Starr et al., 1977; Jerger and Hall, 1980; Salamy, 1984; Hecox and Burkard, 2006). Data from Gordon et al. (2006) are shown in **Figure 1A**; on the left is an example of an electrically evoked auditory brainstem response. The stimulus artifact is shown at time 0 ms followed by waves eII, eIII and eV. On the right, the latency values of wave eV are plotted at initial device activation and over the first year following cochlear implant use in 44 children who had early onset deafness and were implanted unilaterally (Gordon et al., 2006). Recently, we recorded these same responses in two children who were in the original study once they had over a decade of unilateral cochlear implant experience. Their responses are shown in **Figures 1B,C** (Jiwani et al., 2011). In both cases, wave eV latency clearly decreases over the first year of cochlear implant use, with no further changes thereafter. This suggests that development of the auditory brainstem is largely complete by the first year (Gordon et al., 2006).

Further studies concentrated on the development of cortical auditory activity in children with time after cochlear implantation. Cochlear implants provided to children who are congenitally deaf within 3.5 years of bilateral deafness promote age-appropriate cortical responses over the first 3–6 months of implant use (Sharma et al., 2002a). After this initial period, these responses change at a rate which is similar to normal (Eggermont et al., 1997; Eggermont and Ponton, 2003). We recently assessed changes in cortical responses after longer term unilateral cochlear implant use in children who were implanted early (Jiwani et al., 2013a). Grand mean cortical evoked responses from 79 unilateral cochlear implant users (red waveforms) are plotted in **Figure 2** along with the grand mean responses from 58 normal hearing peers (black waveforms) for different intervals of hearing experience. **Figures 2A–C** show grand mean cortical evoked waveforms from children who have between 0 and 7 years (40 cochlear implant users; 11 normal hearing), 7 to 12 years (21 cochlear implant users; 18 normal hearing) and over 12 years (18 cochlear implant users; 29 normal hearing) of hearing experience, respectively. Cochlear implant users represented in these figures had limited durations of bilateral deafness prior to implantation (2.03 ± 1.36 years) with typical heterogeneity in their etiologies of deafness.

**FIGURE 1 | (A)** Example of an electrically evoked auditory brainstem response waveform is shown on the left. The onset of the cochlear implant artifact is shown at time 0 ms, followed by peaks eII, eIII and eV. Data from Gordon et al. (2006) are plotted on the right and show the mean wave eV latency values of 44 children recorded at initial activation of the implant, and at months 2, 6 and 12 following unilateral cochlear implantation. **(B,C)** on the right show the changes in the brainstem responses of two children who were in the original study (Gordon et al., 2006), recorded from initial activation of the device to different intervals over the first year of cochlear implantation use. New responses recorded after 10 years of unilateral cochlear implant experience are also shown (Jiwani et al., 2011), further confirming that little change in the eV latency occurs beyond the first year of implant use. The wave eV latencies at each time-point are represented on the right for each child.

As shown in **Figure 2A**, responses from children with up to 7 years of hearing experience with an implant or with normal bilateral hearing are dominated by a large and broad positive amplitude peak, labeled P1/P2. Comparison of peak latencies (*t*(47.3) = −1.63; *p* > 0.05) and amplitudes (*t*(42.1) = −0.64; *p* > 0.05) reveals no significant differences between the two groups. This positive-peaked response is believed to reflect either excitatory auditory activity from the thalamus to deep layers of the auditory cortex (Liegeois-Chauvel et al., 1994), or auditory driven activity from association auditory areas to the reticular activating system in the non-lemniscal auditory pathways (Kraus et al., 1992; Ponton et al., 2000; Ponton and Eggermont, 2001). As thalamo-cortical and cortico-cortical connections develop around 9 to 12 years of age in superficial layers of the auditory cortex, a small negative amplitude peak, labeled N1, develops in the cortical evoked response and bifurcates the large P1/P2 response into three peaks: P1, N1 and P2. Similar developmental changes to the cortical response are observed in early implanted cochlear implant users who have equal durations of hearing experience. Indeed, as shown in **Figure 2B**, with 7 to 12 years of auditory experience (9.38 ± 1.57 years in cochlear implant users; 9.92 ± 1.57 years in normal hearing individuals), the cortical response in both groups begins to develop into a polyphasic waveform. The grand mean response from all 21 unilaterally implanted children begins to bifurcate into a 3-peaked cortical response at this stage of implant use (**Figure 2B**). Differences in the wavepeak latencies (P1: *t*(10) = −0.88, *p* > 0.05; N1: *t*(10.18) = −1.3, *p* > 0.05; P2: *t*(10.77) = 1.43, *p* > 0.05) and peak-to-peak amplitudes (P1-N1: *t*(6.87) = 1.75, *p* > 0.05; N1-P2: *t*(10.67) = 2.2, *p* > 0.05) between both groups were not significant. This response continues to develop with time.
As auditory pathways mature in the auditory cortex, peaks P1-N1-P2-N2 become clearly present (**Figure 2C**) when auditory experience exceeds 12 years in all 18 cochlear implant users (13.81 ± 0.92 years of unilateral implant experience) and 29 normal hearing peers (15.30 ± 1.81 years of age and hearing) (Jiwani et al., 2013a).
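The fractional degrees of freedom reported above are consistent with an unequal-variance (Welch) *t*-test, although the test is not named here. The following sketch shows how such a statistic and its degrees of freedom could be computed under that assumption. The function name and the latency values are hypothetical and are not data from Jiwani et al. (2013a).

```python
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for a two-sample t-test that does not assume equal variances.

    df uses the Welch-Satterthwaite approximation, which is why it is
    generally not an integer.
    """
    na, nb = len(a), len(b)
    # Squared standard error contributed by each group
    va = variance(a) / na
    vb = variance(b) / nb
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical peak latencies in ms, for illustration only
implant_latencies = [98.0, 105.0, 110.0, 102.0, 95.0]
normal_latencies = [100.0, 112.0, 108.0, 120.0, 97.0, 115.0]
t_stat, dof = welch_t(implant_latencies, normal_latencies)
```

The resulting degrees of freedom lie between the smaller group size minus one and the pooled value, reflecting how unequal the two group variances are.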

The data from individuals with normal hearing shown in **Figure 2** are consistent with findings by Ponton, Eggermont and colleagues who suggested that peak N1 normally emerges around 9 to 12 years of age reflecting maturation of thalamo-cortical and cortico-cortical loops in superficial layers of the auditory cortex (Ponton et al., 2000; Eggermont and Ponton, 2003). These pathways mediate the transfer of primary auditory and multi-sensory input from the thalamus to various regions of the ipsilateral and contralateral auditory cortices (Winer et al., 2001, 2005; Razak et al., 2009), and the transmission of information from the auditory cortex to primary and secondary sensory areas in both hemispheres (Read et al., 2002; Lee and Winer, 2005; Klinge et al., 2010). The developmental trajectory of the electrically evoked cortical waveform suggests that similar development is taking place in children using cochlear implants (Jiwani et al., 2013a), perhaps establishing: (1) appropriate relay of auditory input from the ear to the cortex, via the thalamus; (2) communication between the two cortical hemispheres; and/or (3) connectivity between different sensory areas (Jiwani et al., 2013a). These normal-like developmental changes to the auditory cortex may underlie the impressive improvements in auditory function observed with cochlear implant use over time (Beadle et al., 2005; Nicholas and Geers, 2007; Geers and Sedey, 2011).

## **DIFFERENCES FROM NORMAL PERSIST IN AUDITORY PROCESSING DESPITE LONG DURATIONS OF UNILATERAL COCHLEAR IMPLANT USE**

Although early implantation of young children results in normal-like cortical response peaks, as shown in **Figure 2C**, the waveform has at least one abnormality. Specifically, the amplitude of the P2 peak in cochlear implant users is larger than in normal hearing peers (*t*(14.51) = 2.49, *p* < 0.05) (Jiwani et al., 2013a). The importance of this recent finding is that it suggests that deviations from normal cortical processing remain in these young people despite long-term unilateral implant use. Enhanced P2 peak amplitudes in normal hearing adults are known to reflect increases in selective attention (Picton and Hillyard, 1974; Hocherman et al., 1976; Rif et al., 1991; García-Larrea et al., 1992; Posner and Dehaene, 1994; Grady et al., 1997; Fujiwara et al., 1998; Tremblay et al., 2009) and increases in multi-sensory integration during auditory processing (Hari, 1990; García-Larrea et al., 1992; Levänen et al., 1998; Webster and Colrain, 2000; Moller and Rollins, 2002; Crowley and Colrain, 2004; Johnson and Zatorre, 2005). These processes cause a reduction in the primary network which becomes supplemented by the frontal and parietal areas through increased neural recruitment and synchrony (Tremblay et al., 2001, 2009; Tremblay and Kraus, 2002; Tremblay, 2007) from the non-primary and association auditory pathways (Hocherman et al., 1976; Kraus and McGee, 1993; Kraus et al., 1994; Grady et al., 1997; Busse et al., 2005). It is therefore possible that the larger than normal amplitude of peak P2 observed in children with long-term cochlear implant experience reflects increased cognitive demands for attention and multi-sensory system integration during hearing (Jiwani et al., 2013a).
This may reflect compensatory mechanisms to offset: (1) the reorganization in the auditory brain potentially occurring during the period of deafness prior to implantation; (2) the abnormal auditory input provided by the cochlear implant; and/or, (3) the absence of sound to the un-implanted ear which may lead to reorganization in the deprived pathways.

Cochlear implant users compensate for the abnormal input they receive through the device (Doucet et al., 2006; Giraud and Lee, 2007; Lee et al., 2007a,b; Hopyan-Misakyan et al., 2009; Strelnikov et al., 2010; Hopyan et al., 2011; Kral and Sharma, 2011; Lazard et al., 2011, 2012; Hopyan et al., 2012; Sandmann et al., 2012). We found that children using cochlear implants depend on visual cues more heavily than normal to listen for complex information embedded in speech (Hopyan-Misakyan et al., 2009). Emotion perception was tested using two subtests of the standardized Diagnostic Analysis of Nonverbal Behavior-2 (DANVA-2) in 18 cochlear implant users who received one implant by 2.9 ± 0.9 years, had 7.2 ± 1.3 years of cochlear implant experience at the time of the test, and had good speech perception skills. In the first test, children listened to the spoken sentence: "I'm going out of the room now and I'll be back later" (24 trials), and had to decide which one of four emotions (happy, sad, angry or fearful) was conveyed by the voice. In the second test, children watched pictures of other children's faces, each depicting one of the same four emotions, and had to decide which emotion was conveyed by the photographs. Performance accuracy was assessed for each task, and compared to 18 normal hearing controls who were matched for age (10.3 ± 1.5 years of age) (Hopyan-Misakyan et al., 2009).

Children using cochlear implants showed significantly poorer than normal performance on the emotion identification task in the auditory subtest (*F*(1,34) = 43.7, *p* < 0.01). This deficit does not reflect a general failure to identify emotions, however, since they performed as well as their peers with normal hearing when the emotions were presented in the visual modality (*F*(1,34) = 0.1, *p* > 0.05) (Hopyan-Misakyan et al., 2009). The inability of these children to perceive emotions in speech might reflect abnormal development of cortical representation of emotional prosody in speech without normal hearing (Nishimura et al., 1999; Lee et al., 2001, 2007b; Doucet et al., 2006; Meredith and Lomber, 2011; Sandmann, 2012; Sandmann et al., 2012).

In sum, unilateral cochlear implantation promotes the development of normal-like activity in the auditory pathways over the long-term, but functional abnormalities persist. These could reflect: (1) deleterious or irreversible neural reorganization which occurred during the period of auditory deprivation in early life, (2) abnormal representation of sound through electrical pulse stimulation of the auditory system, and/or (3) abnormal cortical development driven by the absence of auditory input to the deprived pathways from the opposite un-implanted ear. We have been studying effects of the latter issue in children.

## **BINAURAL HEARING IS NOT AVAILABLE TO TRADITIONAL UNILATERAL COCHLEAR IMPLANT USERS**

Hearing through only one cochlear implant eliminates access to binaural hearing, which is the ability of the auditory system to process and integrate auditory input from both ears. Binaural hearing is especially important for children because they are rarely in one place and listening to a single speaker at a time. Children need to attend to and discriminate between several sound sources when playing and learning. The noise, reverberation and distance, predominant in most situations including typical classrooms, make it challenging for children to listen and learn when binaural cues are not accessible. For children who are deaf in both ears, binaural hearing might be achieved with bilateral cochlear implantation (i.e., cochlear implants in both ears) (van Hoesel and Tyler, 2003; Litovsky et al., 2004, 2006; Brown and Balkany, 2007; Steffens et al., 2008b; Basura et al., 2009; Eapen and Buchman, 2009; Gordon et al., 2010, 2011b; Salloum et al., 2010; Chadha et al., 2011). Bilateral cochlear implantation is now being increasingly provided to children either in the same surgery (simultaneously) or in two different surgeries following a period of unilateral implant use (sequentially).

Bilateral cochlear implants attempt to restore binaural hearing by providing information to both ears. Normally, the auditory system compares, processes and integrates subtle differences in the level and timing of sounds reaching each ear. In this way, binaural hearing allows: (1) the identification/localization of sound sources in space (Batteau, 1967; Lorenzi et al., 1999; Van Deun et al., 2009b; Grothe et al., 2010); (2) increased perception of loudness through binaural summation (Bocca, 1955; Blegvad, 1975); and (3) improved hearing in quiet and in noisy environments through the head shadow and squelch effects (Hawley et al., 2004; Van Wanrooij and Van Opstal, 2004). Binaural hearing also makes communication less tiring, which makes listening and communication a more pleasant experience. Although restoring binaural hearing is the goal of bilateral implantation, this has not been completely realized in either adults or children (van Hoesel and Tyler, 2003; Seeber and Fastl, 2008; Grieco-Calub and Litovsky, 2010; Salloum et al., 2010).

Children who are deaf in both ears hear speech better with bilateral cochlear implants than unilateral implants (Litovsky et al., 2004; Brown and Balkany, 2007; Ching et al., 2007; Galvin et al., 2007; Peters et al., 2007; Seeber and Fastl, 2008; Steffens et al., 2008a; Basura et al., 2009; Eapen and Buchman, 2009; Gordon and Papsin, 2009; Van Deun et al., 2009a; Salloum et al., 2010; Chadha et al., 2011), but do not hear binaural cues normally (Grieco-Calub and Litovsky, 2010; Salloum et al., 2010). Outcomes improve when both implants are provided with limited delays and at young ages (van Hoesel and Tyler, 2003; Gordon and Papsin, 2009; Van Deun et al., 2009a; Gordon et al., 2010; Chadha et al., 2011). As the duration of inter-implant delay decreases, the two ears develop more symmetric speech perception abilities and children show increasing advantages of bilateral over unilateral implantation (Gordon and Papsin, 2009). Significant improvements on standardized speech perception tests are seen as early as 6 months following bilateral cochlear implant stimulation in children who receive their second implant simultaneously or within short delays (Gordon and Papsin, 2009). Furthermore, children implanted with both cochlear implants simultaneously derive significantly more benefit from spatial separation of noise compared to children who have longer delays between implants (Chadha et al., 2011). Sound localization improves in children who are provided access to sound early and in both ears (Van Deun et al., 2009a). By contrast, children who receive both cochlear implants sequentially after long inter-implant delays (>2 years) have persistent asymmetries in auditory function and compromised bilateral benefits for speech perception, even after 36 months of bilateral cochlear implant use (Gordon and Papsin, 2009). 
Sequentially implanted children also seem to depend more on their first implanted ear than their second for speech perception, and show less bilateral improvement (relative to unilateral implant use) on speech outcomes than children implanted simultaneously or with limited delay (Gordon and Papsin, 2009). These children localize sound inaccurately and rely heavily on level cues to do so (Grieco-Calub and Litovsky, 2010). The negative effect of inter-implant delay might be explained by underlying changes to the developing auditory pathways before and after unilateral and bilateral implantation.

## **EVIDENCE OF A SHORT SENSITIVE PERIOD FOR BILATERAL INPUT IN HUMAN AUDITORY DEVELOPMENT**

Data presented in **Figures 1** and **2** show that unilateral stimulation promotes development of the auditory pathways (Jiwani et al., 2013a), thus limiting effects of deafness. At the same time, this development might occur at the expense of pathways from the opposite and deprived ear. This might be explained by the absence of inhibition which would normally have come from input from the opposite ear during binaural hearing (Grothe et al., 2010). Without this inhibition, ascending projections from the stimulated ear may be abnormally strengthened in children who are deaf and use unilateral cochlear implants.

We studied bilateral auditory function in children who had different durations of unilateral exposure. We hypothesized that the stage of unilaterally driven brainstem development would be an important factor to consider. Perhaps changes occurring in the brainstem at earlier stages of unilaterally driven development would have less long lasting consequences on the bilateral pathways than after the unilaterally stimulated brainstem reached maturity. Development in the auditory brainstem is largely complete by 1 year of unilateral implant use (Gordon et al., 2006). Thus, children with >2 years of unilateral experience were categorized as having mature auditory brainstem function and long unilateral use. Children with <1 year of unilateral experience were considered to have short-term use with continuing auditory brainstem development. Auditory development in these children was compared to that of children who were deaf and had not yet used cochlear implants (i.e., limited to no auditory brainstem development). All children were implanted bilaterally, allowing us to assess auditory brainstem function evoked by stimulation from each ear. All children receiving bilateral implants sequentially showed brainstem responses which were faster when evoked by the experienced ear compared to the newly implanted ear at initial bilateral implant use (Gordon et al., 2008b). This was expected and confirmed earlier findings that the first implant promoted improved neural conduction through the brainstem. Repeated tests completed after 1.7 ± 1.65 years of bilateral implant use indicated that mismatches in response latencies persisted in a group of children receiving the second implant after a long delay (>2 years) (Gordon et al., 2012).
Increased response latencies in response to sound from the second implanted side could reflect decreased axonal myelination, longer neural conduction times, slower or weaker synapses or more asynchronous neural activity – all signs of more limited brainstem development. Abnormal mismatches between brainstem response latencies were never present in children receiving bilateral implants simultaneously and resolved with bilateral implant use in children who received both implants after a short inter-implant delay (<1 year) (Gordon et al., 2007, 2008b, 2011b, 2012). Thus, allowing the brainstem to develop unilaterally for >2 years compromises the later promotion of symmetrically functioning bilateral auditory brainstem pathways.

Mismatched bilateral auditory development in sequentially implanted children was not restricted to the brainstem. Effects of asymmetric activity in the pathways from the first stimulated ear were also found in the auditory cortex. Consistent with the brainstem findings, cortical abnormalities were not resolved by chronic bilateral implant use (3.57 ± 0.74 years) when unilateral experience exceeded 1.5 years in children who were implanted early (1.87 ± 1.25 years of age). These findings were recently reported by Gordon et al. (2013) and are shown in **Figure 3** (re-printed from that paper). We used a unique and validated "Time Restricted Artifact and Coherent Suppression" (TRACS) beamformer method (Wong and Gordon, 2009) to suppress the electrical artifact from the cochlear implant device and spatially localize areas of cortical activity in hemispheres ipsilateral and contralateral to stimulation. As in many imaging methods, the brain was divided into thousands of 3-dimensional coordinate spaces (voxels). Responses were recorded at 64 cephalic surface electrodes and the contribution of the dipole centered in each voxel to the measured field was assessed by the adaptive spatial filter of the TRACS beamformer. Dipole moments for a given voxel were calculated across latency (virtual sensor) and peak values were used for analyses.

**FIGURE 3 | Re-printed with permission from Gordon et al. (2013).** "**(A)** Per cent cortical lateralization (mean ± 1 standard error) is plotted for each participant group. Greater than normal contralateral lateralization to right/CI-1 stimuli was found in long delay and unilateral cochlear implant users (*p* < 0.05 and <0.0001, respectively) but not in short delay and simultaneous groups (*p* > 0.05). The long delay group showed a decrease in contralateral lateralization/increase in ipsilateral lateralization relative to those with normal hearing in response to left/CI-2 stimulation. This did not occur in the short delay and simultaneous groups. **(B)** Grand mean virtual sensor data for left and right hemispheric sources of P1 (normal hearing) and P1ci (cochlear implant users for stimulation from right/CI-1 and left/CI-2). Large peaks in responses to CI-1 (right) stimulation can be seen in the long delay and unilateral group data. **(C)** Left and right hemispheric dipole moments (mean ± 1 SE) for P1/P1ci in each group in response to right/CI-1 and left/CI-2 stimulation. In response to CI-1 (right) stimulation, there is a marked increase in left hemispheric dipole moments in participant groups with >2 years of unilateral hearing experience (long delay and unilateral; *p* < 0.05)." (Gordon et al., 2013; Brain, Figure 7, p. 11)

Cortical responses were evoked by unilateral electrical pulse trains delivered from one implant electrode in seven children with normal hearing, eight children who were implanted unilaterally in the right ear (2.32 ± 1.61 years) and had 7.21 ± 2.48 years of hearing experience and 26 children who used bilateral cochlear implants for 3.42 ± 0.59 years. Of the bilateral implant users, 10 children received both cochlear implants simultaneously and 16 were sequentially implanted (right ear implanted first with no hearing aid in the left ear). Bilateral deafness prior to implantation was limited (1.74 ± 0.90 years) in all children. The children in this study had less than 12 years of hearing experience, and therefore all produced a cortical evoked response which was dominated by an immature large amplitude positive peak, similar to the one shown in **Figure 2A**. The differences between the dipoles from the left and right auditory cortices were normalized as a percent lateralization [% lateralization = (dipole right − dipole left)/(dipole right + dipole left) × 100].
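The percent lateralization index defined above can be written as a small function. This is a minimal sketch: the function name and example values are hypothetical, and the inputs are assumed to be non-negative peak dipole moments from the right and left auditory cortices.

```python
def percent_lateralization(dipole_right, dipole_left):
    """Percent lateralization = (right - left) / (right + left) * 100.

    Positive values indicate larger activity in the right auditory cortex,
    negative values larger activity in the left auditory cortex.
    Assumes non-negative peak dipole moments in the same units.
    """
    total = dipole_right + dipole_left
    if total == 0:
        raise ValueError("both dipole moments are zero; index is undefined")
    return (dipole_right - dipole_left) / total * 100.0


# Hypothetical peak dipole moments (arbitrary units), for illustration only
example = percent_lateralization(dipole_right=1.0, dipole_left=3.0)
```

With this sign convention, a negative index for right-ear stimulation indicates contralateral (left hemisphere) lateralization, and an index of zero indicates equal activity in the two hemispheres.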

A larger than normal variability in the lateralization of cortical dipoles was found in children receiving bilateral cochlear implants sequentially. A factor analysis of multiple demographic variables identified the duration of unilateral implant use as the factor which best accounted for the spread of cortical responses. We thus further analyzed the cortical lateralization data for effects of duration of unilateral implant use occurring prior to bilateral implantation. When responses were evoked by the first (i.e., right) implant, there was an increase in lateralization of activity to the contralateral left auditory cortex with unilateral implant use. This became significantly larger than the percent of cortical lateralization in the simultaneously implanted group at 1.48 years of unilateral implant use. Consistent results were obtained in data evoked by the second (i.e., left) implant but, in this case, cortical lateralization changed from the normally expected contralateral direction to ipsilateral lateralization with unilateral implant use. This abnormal switch to larger activity in the ipsilateral auditory cortex became significantly different from responses in the simultaneously implanted group by 1.37 years of unilateral implant use. These analyses indicated that children with longer than approximately 1.5 years of unilateral implant use had experienced an abnormal strengthening of pathways from their first implanted right ear through the auditory brainstem (Gordon et al., 2008b, 2012) to their left contralateral cortex. This was not resolved by several years of bilateral implant use and was associated with poorer speech perception in the second than first implanted ear (Gordon et al., 2013).

The importance of restricting unilateral implant use to less than 1.5 years is further evident in **Figure 3** (reprinted from Gordon et al., 2013). Here, the grand mean lateralization of cortical activity is shown (**Figure 3A**), as well as the grand mean dipole moments identified from the virtual sensors in each hemisphere (**Figure 3B**). The group of 16 sequentially implanted children has been divided into two groups based on the cut-off of 1.5 years of unilateral implant use. The Short Delay group includes seven children who had 0.86 ± 0.1 years of unilateral implant experience at the time of testing. The other nine children, the Long Delay group, had more than 2 years of unilateral implant use (3.44 ± 1.27 years). The single positive peaked response is clear in all of the group averaged waveforms shown in **Figure 3B**. The maximum dipoles were marked and analyzed in each child. The left plot of **Figure 3C** shows that stimulation from the first/right implanted ear evoked significantly higher dipole moments in the left auditory cortex (blue bars) of children who had >1.5 years of unilateral implant use (Unilateral and Long Delay groups) than in other groups of children (*F*(4,36) = 3.52, *p* < 0.05). The similar findings for these two groups confirm that unilaterally driven strengthening of projections to the contralateral left auditory cortex was not reversed by the addition of a second cochlear implant. This was true despite the children in the Long Delay group having had several years of bilateral implant experience at the time of the test. The right plot in **Figure 3C** shows mean dipoles for each auditory cortex in response to left/second cochlear implant stimulation. The Long Delay group shows significantly higher dipole moments in the left auditory cortex than the other groups of children (*F*(3,29) = 5.31, *p* < 0.01).
Thus, regardless of which ear was stimulated, the left auditory cortex (contralateral to the first/right implanted ear) was the more active side of the brain in children who had used one implant for >1.5 years. One explanation for this finding is that the specialized processing of language in left auditory cortex (Zatorre and Belin, 2001; Zatorre et al., 2002; Tervaniemi and Hugdahl, 2003; Firszt et al., 2006) is abnormally increased in unilateral cochlear implant users. It is not clear, however, how such a network would have been recruited by the simple non-speech stimuli used in the present experiment. An alternate explanation is that unilateral stimulation allowed abnormal strengthening of pathways from that ear.

Further evidence that the cortical changes were due to unilaterally driven strengthening was found by assessing activity in the ipsilateral/right auditory cortex. We assessed which ear preferentially activated the hemisphere contralateral to the ear deprived during the period of unilateral implant use (i.e., the right auditory cortex). The right auditory cortex was expected to respond more strongly to input from the left than right ear because the majority of neurons from one ear normally cross to the contralateral brainstem and ascend ipsilaterally from there. This was confirmed in the group of children with normal hearing and in children with limited unilateral implant use prior to bilateral implantation (Short Delay and Simultaneous groups). By contrast, this pattern was reversed in children in the Long Delay group. This meant that this group of children had experienced a strengthening of pathways from their hearing ear to both the ipsilateral (right) cortex, as shown by the reversal of aural preference, and the contralateral (left) cortex, as shown by the data in **Figure 3**. The same reversal of aural preference in the cortex ipsilateral to the hearing ear has recently been reported in congenitally deaf white cats (Kral et al., 2013).

The abnormal strengthening of pathways from the unilaterally hearing ear to the immature brain seems to initially occur at the level of the brainstem. This is supported by evidence of mismatched brainstem latencies observed from children with long (>2 years) unilateral hearing experience (Gordon et al., 2012). The shorter wave eV latencies evoked from the more experienced ear suggest an increasing efficiency of activity from this side and a weakening of pathways from the opposite ear, as reflected by slower peak latencies on the second implanted side. This could result from a lack of inhibitory processes in the brainstem which are normally present during binaural hearing (Grothe et al., 2010). Listening from one side would allow auditory input from the first right implanted side to be projected to the cortex with abnormally high excitation during development thus strengthening pathways to the contralateral cortex. It appears that if this is allowed to occur until the brainstem is largely developed (i.e., >1 year of unilateral implant use), it establishes asymmetric activity in the auditory pathways which is not easily reversed by providing a second implant in the deprived ear. Limiting the period of unilateral hearing in children by providing bilateral cochlear implants with little or no delay appears to protect the bilateral pathways from this abnormal development. These findings thus suggest that there is a sensitive period of 1.5 years for binaural auditory development in children.

## **LONG-TERM UNILATERAL IMPLANT USE IN OLDER CHILDREN CAUSES LASTING ASYMMETRY IN THE BILATERAL AUDITORY PATHWAYS**

We make the case above that unilateral implant use in children who have been deaf since infancy should be limited to less than 1.5 years to promote normal-like symmetrical development of the auditory pathways from both ears. However, providing bilateral implants within this time frame may not always be possible. For example, many adolescents/young adults who were implanted as babies and have already had many years of unilateral hearing experience are now seeking a cochlear implant for their opposite ear in hopes of deriving benefits of bilateral implantation. These adolescents and young adults differ in several ways from our previous research cohorts of sequentially implanted children. They have had very long periods of unilateral cochlear implant use concurrently with long durations of deprivation in their non-implanted ear, and they are no longer children. We thus expect unique cortical development in this new group of bilateral implant users, relative to our previous study groups.

**Figure 4A** shows the cortical responses recorded at a midline cephalic location on the head (Cz) and evoked by cochlear implant stimulation from each ear on the first day of activation of the second implant in a child who had 15.95 years of hearing experience on the right side and was deprived of auditory input in the left ear. These measures were repeated after 1 month of bilateral implant use and then again after 9 months (Jiwani et al., 2013b,c). Responses from the latter two time points are shown in **Figures 4B,C**, respectively. The red waveform shows the grand mean response recorded from the side with long-term unilateral cochlear implant experience, and the blue is the cortical waveform evoked by stimulation of the newly implanted side (naïve side). The two responses are very different from one another at all time points. Consistent with previous findings, the cortical responses from the experienced side (red waveform) in **Figure 4** were dominated by a mature-like morphology, comprising the obligatory peaks P1-N1-P2-N2, similar to those expected in same-aged peers with normal hearing (Jiwani et al., 2013a). By contrast, responses recorded from the newly implanted ear (blue waveform) were characterized by different peaks occurring with much larger amplitudes than the responses from the side with long-term hearing experience; a large negative peak (*N*(ci)), followed by a large positive peak (*P*(ci)) can be seen (Jiwani et al., 2013b,c). Little change in either response occurred over the first months of bilateral implant use. Slight decreases in the latencies and amplitudes of the peaks evoked by the newly implanted ear were found after one month (**Figure 4B**), with almost no change in latency, amplitude or waveform morphology thereafter. This is shown by the response recorded at 9 months following activation of the second implant in **Figure 4C** (Jiwani et al., 2013c).

The lack of cortical development evoked by stimulation of the second implanted side is in contrast to the rapid developmental change expected to occur at early stages of unilateral cochlear implant use in young children (Sharma et al., 2002a; Sharma and Dorman, 2006), and is, rather, more similar to the limited change reported in older children implanted after long durations of bilateral deafness (Sharma et al., 2002b; Gordon et al., 2005, 2008a). This might reflect immaturity or abnormalities in auditory development from the second implanted side, driven by either long duration of auditory deprivation or by maturation of the auditory cortex from unilateral cochlear implant use. When a second implant is provided after an important period in cortical auditory development has been missed, the naïve cortical pathways may no longer be able to develop. The findings from our previous study (Gordon et al., 2013) (discussed above and shown in **Figure 3**) suggest that there is an early sensitive period for bilateral brainstem development (exceeded after 1.5 years of unilateral implant use) and a later cortical maturation promoted by unilateral use of over 10 years (Jiwani et al., 2013a), as shown by the data in **Figure 4** (Jiwani et al., 2013b,c). Together, these results suggest that there are multiple sensitive periods in the developing auditory system.

## **BILATERAL IMPLANTATION WITHIN A SENSITIVE PERIOD IMPROVES PERCEPTION OF BINAURAL TIMING CUES**

As reviewed above, several lines of investigation suggest that the potential for promoting binaural hearing in children who are deaf will be best realized by limiting the period of bilateral deafness and providing bilateral implants with little delay. We have been studying the perception of binaural level and timing cues in children who received bilateral cochlear implants because these cues are important for binaural hearing. Interaural level and timing cues arise because sounds coming from one side of the head reach the closer ear at higher intensities and/or sooner than they reach the other ear. Level and timing differences are coded in the auditory brainstem by the degree of inhibition (Grothe et al., 2010).

**FIGURE 4 | Example of cortical evoked responses from an adolescent in the Jiwani et al. (2013b,c) study cohorts.** She received a right unilateral cochlear implant (red waveform) within limited durations of bilateral deafness (3 years of age) and used it unilaterally to hear for 15.95 years. She then received a second implant in the opposite and deprived left side (naïve side; blue waveform). Cortical responses evoked from both implants are shown at: **(A)** the first day of activation of the second implanted ear (Jiwani et al., 2013b), **(B)** one month after bilateral implantation (Jiwani et al., 2013c) and **(C)** 9 months following bilateral cochlear implant experience (Jiwani et al., 2013c).

We found that 19 children receiving one implant at 2.1 ± 1.1 years of age and the second after 4.9 ± 2.8 years of unilateral implant use can hear changes in interaural level differences but have particularly poor abilities to detect interaural timing cues even after several years of bilateral cochlear implant use (Salloum et al., 2010). Poor detection of binaural timing
cues by sequentially implanted children was surprising given evidence from a similar group showing that the auditory brainstem integrates input from both implants as measured by the electrophysiological binaural interaction component (Gordon et al., 2012). This measure is a calculated difference between the sum of the left and right evoked auditory brainstem responses and the bilaterally evoked brainstem response. Peaks in the difference response reflect inhibition occurring with binaural processing (Dobie and Berlin, 1979; Dobie and Norton, 1980; Brantberg et al., 1999). Using this difference measure, we found that tonotopic organization is maintained in the bilateral brainstem of children who are deaf and that the pathways continue to code interaural level cues despite development driven from one ear before the other. There are, however, consequences of the mismatches in development resulting from unilateral implant use. Although the auditory brainstem codes interaural timing differences, this does not occur normally (Gordon et al., 2008b). A miscalculation of binaural brainstem interactions results from the mismatch in neural conduction (measured by shorter peak latencies of responses from the more experienced ear). More recent findings show that a sound arriving first to the more experienced ear by 1 ms, for example, reduces the binaural interaction component more than when it arrives first by the same amount to the second implanted ear (Gordon et al., in preparation). Nonetheless, coding of interaural timing remains (albeit abnormally calibrated); thus abnormal brainstem processing cannot account for the profound difficulties these children have detecting timing differences sent by their bilateral implants. This suggests a deficit for interaural timing processing in more central areas of the auditory system which likely occurred during the period before bilateral implantation.
In support, the numbers of cortical neurons specialized to respond to interaural timing cues are reduced in congenitally deaf white cats (Tillein et al., 2010) as are numbers of neurons in auditory cortices responsible for sound localization (Malhotra et al., 2008). In more recent work, we are asking whether binaural timing cues are better heard by children who received bilateral cochlear implants simultaneously. Preliminary findings suggest good potential for development of binaural hearing in children who have limited durations of bilateral and unilateral deafness, but this potential is compromised in children with long unilateral cochlear implant experience (>1.5 years).
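The binaural interaction component described above is a waveform subtraction: the sum of the left and right evoked responses minus the bilaterally evoked response. A minimal sketch follows, assuming the three responses are sampled at the same time points. The function name and the sample amplitudes are hypothetical, and the averaging and artifact handling needed for real recordings are not shown.

```python
from typing import List, Sequence


def binaural_interaction_component(
    left: Sequence[float],
    right: Sequence[float],
    bilateral: Sequence[float],
) -> List[float]:
    """Summed monaural responses minus the bilaterally evoked response."""
    if not (len(left) == len(right) == len(bilateral)):
        raise ValueError("all responses must have the same number of samples")
    return [(l + r) - b for l, r, b in zip(left, right, bilateral)]


# Hypothetical amplitudes (microvolts) at four time points.
left_abr = [0.0, 0.5, 1.0, 0.25]
right_abr = [0.0, 0.5, 0.75, 0.25]
bilateral_abr = [0.0, 1.0, 1.25, 0.5]

bic = binaural_interaction_component(left_abr, right_abr, bilateral_abr)
print(bic)  # non-zero only where the bilateral response is smaller than the sum
```

In this toy example the difference is non-zero at a single time point, which corresponds to the kind of peak in the difference response that the text attributes to binaural inhibition.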

## **CONCLUSION**

We have reviewed evidence showing that access to sound within limited durations of bilateral deafness in early life promotes normal-like development of activity along the auditory pathways in children who have many years of hearing experience with a unilateral cochlear implant. At the same time, however, the unilaterally driven stimulation leaves the opposite pathways deprived of input and susceptible to reorganization. We find that providing bilateral cochlear implants to children after a period of unilateral deafness of longer than 1.5 years drives abnormal mismatches in activity at the level of the brainstem and cortex. This is characterized by abnormal strengthening of activity to both the contralateral and ipsilateral auditory cortices from the first implanted ear. These abnormalities in auditory development are associated with more asymmetric speech perception, poorer hearing in noise, abnormal sound localization, and an inability to identify interaural timing cues. These skills are important for normal integration and processing of auditory input. We therefore suggest that binaural hearing is compromised in children who receive bilateral cochlear implants after a period of unilateral implant use exceeding 1.5 years. With that in mind, cochlear implants should be provided to children early as well as bilaterally with very limited or no delay between implants (i.e., simultaneously). Our current studies are now examining how much residual hearing is needed in the un-implanted ear to provide a potential protective effect against unilaterally driven reorganization and whether bimodal hearing (acoustic and electrical input) can be used to restore binaural hearing. Further, we are asking whether the sensitive period for bilateral input can be "reopened" by attempting to strengthen pathways from the second implanted ear to restore symmetric bilateral pathways and binaural hearing. Our findings suggest that both bilateral and unilateral deprivation should be limited to promote optimal binaural hearing in children who use cochlear implants, and enable them to function better and more naturally in challenging listening situations such as the playground or classroom environments.

## **ACKNOWLEDGMENTS**

The authors of this manuscript wish to thank all of the families who participated in our studies, as well as all of the members of the clinical and research teams of the Cochlear Implant Program at the Hospital for Sick Children.

## **REFERENCES**

brainstem-evoked responses. *Arch. Otolaryngol.* 105, 391. doi: 10.1001/archotol.1979.00790190017004


using cochlear implants. *Audiol. Neurotol.* 11, 7–23. doi: 10.1159/000088851


645. doi: 10.1097/AUD.0b013e3181e50a1d

an early sensitive period. *Brain* 136, 180–193. doi: 10.1093/brain/aws305


humans. *Neurosci. Lett.* 301, 75–77. doi: 10.1016/S0304-3940(01)01597-X


language development of children with severe to profound hearing loss. *J. Speech Lang. Hear. Res.* 50, 1048. doi: 10.1044/1092-4388(2007/073)


*Clin. Neurophysiol.* 111, 220–236. doi: 10.1016/S1388-2457(99)00236-9


of interaural time difference in congenital deafness. *Cereb. Cortex* 20, 492–506. doi: 10.1093/cercor/bhp222


sound localization in children with bilateral cochlear implants. *Audiol. Neurotol.* 15, 7–17. doi: 10.1159/000218358

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 May 2013; accepted: 18 September 2013; published online: 16 October 2013.*

*Citation: Gordon KA, Jiwani S and Papsin BC (2013) Benefits and detriments of unilateral cochlear implant use on bilateral auditory development in children who are deaf. Front. Psychol. 4:719. doi: 10.3389/fpsyg.2013.00719*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Gordon, Jiwani and Papsin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Theta brain rhythms index perceptual narrowing in infant speech perception

#### *Alexis N. Bosseler<sup>1,2</sup>, Samu Taulu<sup>3</sup>, Elina Pihko<sup>4</sup>, Jyrki P. Mäkelä<sup>5</sup>, Toshiaki Imada<sup>1</sup>, Antti Ahonen<sup>3</sup> and Patricia K. Kuhl<sup>1</sup>\**

*<sup>1</sup> Institute for Learning & Brain Sciences, University of Washington, Seattle, WA, USA*

*<sup>2</sup> Cognitive Brain Research Unit, University of Helsinki, Helsinki, Finland*

*<sup>3</sup> Elekta Oy, Helsinki, Finland*

*<sup>4</sup> Brain Research Unit, O.V. Lounasmaa Laboratory, School of Science, Aalto University, Helsinki, Finland*

*<sup>5</sup> BioMag Laboratory, HUS Medical Imaging Center, Helsinki University Central Hospital, Helsinki, Finland*

#### *Edited by:*

*Virginia Penhune, Concordia University, Canada*

#### *Reviewed by:*

*Virginie Van Wassenhove, Institut National de la Santé Et de la Recherche Médicale, France*

*Huan Luo, Chinese Academy of Sciences, China*

#### *\*Correspondence:*

*Patricia K. Kuhl, Institute for Learning & Brain Sciences, University of Washington, Portage Bay Building, Box 357988 Seattle, WA 98195-7988, USA e-mail: pkkuhl@u.washington.edu*

The development of speech perception shows a dramatic transition between infancy and adulthood. Between 6 and 12 months, infants' initial ability to discriminate all phonetic units across the world's languages narrows: native discrimination increases while non-native discrimination shows a steep decline. We used magnetoencephalography (MEG) to examine whether brain oscillations in the theta band (4–8 Hz), reflecting increases in attention and cognitive effort, would provide a neural measure of the perceptual narrowing phenomenon in speech. Using an oddball paradigm, we varied speech stimuli in two dimensions, stimulus frequency (frequent vs. infrequent) and language (native vs. non-native speech syllables) and tested 6-month-old infants, 12-month-old infants, and adults. We hypothesized that 6-month-old infants would show increased relative theta power (RTP) for frequent syllables, regardless of their status as native or non-native syllables, reflecting young infants' attention and cognitive effort in response to highly frequent stimuli ("statistical learning"). In adults, we hypothesized increased RTP for non-native stimuli, regardless of their presentation frequency, reflecting increased cognitive effort for non-native phonetic categories. The 12-month-old infants were expected to show a pattern in transition, but one more similar to adults than to 6-month-old infants. The MEG brain rhythm results supported these hypotheses. We suggest that perceptual narrowing in speech perception is governed by an implicit learning process. This learning process involves an implicit shift in attention from frequent events (infants) to learned categories (adults). Theta brain oscillatory activity may provide an index of perceptual narrowing beyond speech, and would offer a test of whether the early speech learning process is governed by domain-general or domain-specific processes.

**Keywords: speech perception, infants, magnetoencephalography, perceptual narrowing, brain rhythms**

#### **INTRODUCTION**

Exposure to language early in infancy produces a dramatic shift in speech perception between 6 and 12 months, a period that has been referred to as a "critical" or "sensitive" period for the development of native-language phonetic perception (Kuhl, 2011; Peña et al., 2012). Infants begin life perceiving phonetic differences used to distinguish words across all languages, but by 12 months of age infants show a narrowing in their speech perception abilities—performance on native sound discrimination increases while at the same time performance on non-native sound discrimination declines (Werker and Lalonde, 1988; Kuhl et al., 2006).

Perceptual narrowing during the second half of the first year of life is neither restricted to speech nor to auditory stimuli: It has been demonstrated for sign language (Palmer et al., 2012), for visual stimuli such as faces (Lewkowicz and Ghazanfar, 2006), and for non-speech auditory patterns in music (Saffran et al., 1999; Saffran and Griepentrog, 2001), suggesting that perceptual narrowing may involve a pan-sensory developmental process, with sensory and cognitive effects (Lewkowicz and Ghazanfar, 2006).

Developments in cognitive neuroscience have identified brain measures that can be used to provide different kinds of information about how listeners process auditory stimuli. For example, the classic mismatch response obtained with the oddball paradigm, measured using EEG or MEG signal changes (e.g., Näätänen et al., 1997), indexes a listener's ability to discriminate two auditory stimuli. In contrast, brain oscillatory activity, measured using either electrophysiological (EEG) or magnetic (MEG) data, is being used to track brain rhythms in distinct frequency bands—alpha, theta, gamma, and beta. Power and phase variations in these frequency bands are associated with psychological processes including attention, memory, emotion, and cognitive effort (see Saby and Marshall, 2012 for review). In the domain of speech perception, Poeppel and his colleagues report in several publications (e.g., Ghitza et al., 2013) that modifications in theta phase measured with magnetoencephalography (MEG) serve as a marker that "parses" syllables.

The goal of the present study was to use MEG, a brain imaging technique that records changes in magnetic fields stemming from neuronal activity, to measure brain oscillatory activity in the theta (4–8 Hz) frequency band during a speech discrimination task in 6- and 12-month-old infants, as well as in adults. Theta brain rhythms are prevalent in EEG during infancy. Adults show theta rhythms, as well as higher frequency brain rhythms such as alpha and beta, which are more prominent in adults than in infants (Saby and Marshall, 2012). Increased theta oscillatory activity has been linked to increases in cognitive demand (Inanaga, 1998) and emotion processing (Hagne, 1972). In infants, increased theta has been observed in response to vowels pronounced using a "motherese," as opposed to an adult-directed, speech style (Zhang et al., 2011), and with increases in attention (Stroganova et al., 1998; Orekhova et al., 1999; Berger et al., 2006; Bazhenova et al., 2007).

In the current study, we examined developmental changes in theta brain oscillatory activity during the perception of native and non-native speech contrasts, and hypothesized that theta brain rhythms would be sensitive to the transition that occurs in the latter half of the first year. In the sections that follow, we develop an argument that explains our prediction: we argue that the transition in infant speech perception is affected by a change in implicit attention and cognitive effort. Early in infancy, prior to learning which phonetic units are relevant in their particular language, infants implicitly attend to stimuli that are presented with high distributional frequency. This (implicit) strategy assists in the discovery of the specific phonetic units used in their particular language(s). In adulthood, attending to the raw frequency of incoming syllables is not efficient; instead, adults' implicit strategy is driven by their learned native language categories; after learning, non-native categories stand out, and require increased attention and effort, affecting theta rhythms.

To investigate our hypotheses, we utilized a traditional oddball discrimination paradigm to examine theta oscillatory activity using MEG both before (6 months) and after (12 months) the transition in speech perception, as well as in adulthood, in response to: (1) frequent as opposed to infrequent speech syllables, and (2) native as opposed to non-native speech syllables. In infancy, we hypothesized that increased theta relative to baseline activity would be related to the frequency of stimuli, regardless of their language status (native or non-native). In adulthood, we hypothesized that increased theta relative to baseline would be associated with language status (native or non-native), irrespective of the frequency of presentation of the stimuli.
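The quantity at the center of these hypotheses, theta power relative to baseline, can be illustrated with a small numerical sketch. The text does not specify the computation at this point, so everything below is an assumption made for illustration: power is taken from a plain discrete Fourier transform, theta is the 4–8 Hz band, and relative theta power is expressed as the ratio of theta power in a stimulus window to theta power in a baseline window. The function names, sampling rate, and synthetic signals are hypothetical.

```python
import cmath
import math
from typing import List


def band_power(signal: List[float], fs: float, f_lo: float, f_hi: float) -> float:
    """Sum of squared DFT magnitudes for frequencies in [f_lo, f_hi] Hz."""
    n = len(signal)
    power = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)
            )
            power += abs(coeff) ** 2
    return power


def relative_theta_power(stimulus: List[float], baseline: List[float], fs: float) -> float:
    """Ratio of theta-band (4-8 Hz) power in the stimulus window to baseline."""
    return band_power(stimulus, fs, 4.0, 8.0) / band_power(baseline, fs, 4.0, 8.0)


fs = 100.0  # assumed sampling rate in Hz
times = [i / fs for i in range(100)]  # one-second windows
# Synthetic 6 Hz oscillation whose amplitude doubles in the stimulus window.
baseline = [math.sin(2 * math.pi * 6 * t) for t in times]
stimulus = [2.0 * math.sin(2 * math.pi * 6 * t) for t in times]

rtp = relative_theta_power(stimulus, baseline, fs)
print(round(rtp, 3))  # doubling the amplitude quadruples the power
```

A ratio above 1 in this sketch corresponds to "increased theta relative to baseline"; other normalizations, such as a difference or a decibel scale, would serve the same illustrative purpose.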

#### **THE TRANSITION IN DEVELOPMENTAL SPEECH PERCEPTION**

What do we know about the mechanisms underlying the "sensitive period" for phonetic learning? Research has shown that at least two factors, one computational in nature and the other social in nature, alter phonetic perception during the sensitive period of development in typically developing infants (Kuhl, 2010). Evidence for a computational component stems from studies that show "statistical learning" in infants, a sensitivity to the statistical patterns in language input that have been shown to affect phonetic learning, either through natural language exposure (Kuhl et al., 1992) or through the experimental manipulation of the distributional frequency of phonetic units (Maye et al., 2002, 2008). Studies show that the potency of distributional frequency in affecting speech perception decreases as early as 10 months of age (Thiessen, 2010; Yoshida et al., 2010). In other words, there is evidence of a waning in the sensitivity to distributional frequency with age, which reduces the effect that a particular stochastic pattern in language has on speech perception, and the waning occurs at the age at which the transition in speech perception occurs. Clearly, in adulthood, exposure to a new pattern of distributional statistics of phonetic units, which occurs whenever adults move to a new country and are exposed to its novel language, does not induce robust phonetic learning, even after extensive training (Flege, 1995).

In addition to a computational component, there is evidence that a social component alters phonetic learning during the sensitive period. Infants exposed for the first time to the statistics of a new language in playful interactive sessions between 9 and 10 months of age showed robust learning of a new Mandarin Chinese phonetic contrast (Kuhl et al., 2003). Social interaction with a live tutor is necessary for this kind of natural language learning to occur: Exposure to the exact same material on the exact same schedule and in the exact same setting from a video resulted in no learning (infants performed equivalently to a control group exposed only to English) (Kuhl et al., 2003). These studies buttress arguments that a sensitive period for phonetic learning exists at about 9 months of age. At 9 months, exposure to a complex natural language via interaction with another human being induces learning (see Kuhl, 2007, for discussion).

Finally, the phonetic learning that occurs during this sensitive period may be critical to language learning. Infants' phonetic learning at this age is strongly linked to their future language growth. Longitudinal studies using behavioral (Tsao et al., 2004; Kuhl et al., 2005) as well as brain measures (event-related potentials, ERPs) (Rivera-Gaxiola et al., 2005; Kuhl et al., 2008), reveal that the ability to discriminate native language phonetic units predicts the growth of language to the age of 30 months, as well as reading readiness at the age of 5 years, independent of socioeconomic status of the child's family, and independent of the child's language scores at the age of 30 months (Cardillo Lebedeva and Kuhl, 2009; Cardillo, 2010). These data are correlational in nature; causal relations cannot be assumed. Nevertheless, the data support the idea that phonetic learning during a sensitive period in early development may be an important pathway for initial language learning that aids learning at higher levels of language.

#### **MECHANISMS OF DEVELOPMENTAL CHANGE AND BRAIN OSCILLATORY ACTIVITY**

Given the potential importance of early phonetic learning, recent research has been focused on identifying the mechanisms that underlie both the timing and the nature of the perceptual narrowing process. Regarding timing, premature infants, who by virtue of early birth have a longer period of exposure to speech by the end of the first year, do not show the transition in speech perception at an earlier age (Pons et al., 2012). This result, plus the data mentioned previously showing new phonetic learning from first-time exposure to a new language at 9 months (Kuhl et al., 2003), suggests that the perceptual narrowing process is not brought about by a prescribed amount of language exposure in the infant's environment, nor by a protracted period of exposure to a language. On the other hand, maternal prenatal exposure to antidepressants (serotonin reuptake inhibitors, SRIs) has been reported to accelerate phonetic development in infants, while untreated maternal depression is reported to slow the timing of the transition in infant speech perception (Weikum et al., 2012). These results, along with previously reviewed findings showing that social interaction promotes phonetic learning in typically developing infants (Kuhl et al., 2003; Conboy and Kuhl, 2011; Kuhl, 2011), and results showing an association between speech perception and the volume of the amygdala (infants with larger right amygdala at 6 months showed lower expressive and receptive language scores at age 2, age 3, and age 4; Ortiz-Mantilla et al., 2010), implicate the limbic system and the regulation of motivation and emotion in the transition in phonetic perception (see also Deniz Can et al., 2013).

One candidate mechanism for pan-sensory perceptual narrowing related to social/emotional development is the nascent set of abilities associated with infant attention. Evidence indicating that social interaction is a key skill in language development is of interest because social abilities such as eye gaze following come on line during the putative critical period for phonetic learning in typically developing infants. The ability to attend to another's gaze is a developmental ability that typically appears to emerge between 9 and 12 months of age (Childers and Tomasello, 2002; Gleitman et al., 2005; Hoff, 2006; Meltzoff et al., 2009; Naigles et al., 2009; Csibra, 2010). Posner and Raichle (1994) hypothesized that the onset of an initial nascent form of attentional control during infancy stems from maturation of an anterior attention network. In the domain of speech perception, infants' tendency to show a decline in the perception of non-native phoneme contrasts is significantly correlated with the growth of attentional control skills (Lalonde and Werker, 1995; Hespos and Spelke, 2004; Conboy et al., 2008).

Studies utilizing EEG and MEG suggest that oscillatory brain activity over time reflects changes in attention, learning, and memory. Neural rhythms are hypothesized to indicate synchronized global and local neural networks that operate at different frequencies (Varela et al., 2001; Buzsáki and Draguhn, 2004), and vary with sensory, motor and cognitive (attention and memory) task demands (Klimesch, 1999). Brain rhythms reveal selective activation and inhibition of neural networks involved in sensory and cognitive processing (Knyazev, 2007).

Theoretical work indicates that brain oscillations in the theta band (∼4–8 Hz) index the control of attention and cognitive effort in adults across a wide variety of verbal and non-verbal stimuli. In adults, theta measured with EEG or MEG is linked to focused attention and the encoding of new information (Klimesch, 1999), as well as novelty detection and increased memory load (Jensen and Tesche, 2002; Hsiao et al., 2009). Jensen and Tesche (2002) recorded MEG responses from 10 adult subjects who were asked to retain a list of 1, 3, 5, or 7 visually presented digits during a 3-s retention period. The authors showed that during retention, theta activity increased parametrically with the number of items on the list. The results were interpreted as reflecting an increase in the allocation of cognitive resources that are required as the demands on working memory increase. In adults, the increase in theta activity with working memory demands is interpreted as reflecting enhanced attention (Mizuki et al., 1980; Bruneau et al., 1993; Gevins et al., 1997).

Studies examining theta power are increasing, and theta brain rhythms have been recorded in infants as young as 2 months of age using EEG (see Saby and Marshall, 2012 for review). Existing studies are consistent with adult data and suggest that infant theta also increases when attention increases, either during cognitive (Stroganova et al., 1998; Orekhova et al., 1999; Bell and Wolfe, 2007) or social tasks (Stroganova and Posikera, 1993). Zhang et al. (2011) demonstrated increased theta for simple vowel sounds that reflected an infant-directed ("motherese") style of speaking as opposed to a more standard adult-directed style.

The topographical distribution of theta in infants is sensitive to the cognitive task or behavior under study (Saby and Marshall, 2012). In a study conducted using EEG, Bell and Wolfe (2007) showed that power in the theta band increases with memory load in 8-month-old infants across the entire scalp. However, when the children were tested again at age 4.5 with age-appropriate working memory tasks, increased theta activity was observed at frontal medial sites only. Previously mentioned studies linked frontal theta to the executive control of attention (Stroganova et al., 1998; Orekhova et al., 1999). Orekhova et al. (1999) measured theta in infants aged 8–11 months while they watched an object, while they anticipated a partner's appearance in a peek-a-boo game, and during the partner's subsequent appearance in the game. Theta increases were maximal at frontal electrode sites during anticipation of the person's appearance, which, the authors argue, supports the idea that theta activity in infants increases during tasks that require sustained attention and particularly the regulation of attention (see also Stroganova et al., 1998).

Orekhova et al. (2006) tested both 10-month-old infants and pre-school children aged 3–6 years old during new toy exploration and social stimulation, and reported increases in theta rhythm and suppression of mu rhythm at both ages during both conditions, arguing that these activities engage attentional networks as reflected by the theta rhythm increase.

Berger et al. (2006) showed increases in theta activity in 7-month-old infants when a violation of expectancy occurred in an arithmetic test involving puppets. The authors interpreted the data as revealing a nascent indicator of executive function, even before infants have real control over self-regulatory processes. In summary, the existing literature suggests that theta oscillations, both in adults and infants, are modulated by tasks that elicit increased attention or cognitive effort.

#### **DESIGN OF THE CURRENT STUDY**

Based on prior data and theory, we hypothesized that the measurement of theta oscillatory rhythms in response to speech syllables that vary in relative *frequency* (frequent vs. infrequent) and *language category* (native vs. non-native) would differ in a particular way across age. Specifically, we hypothesized that the two features of phonetic speech signals (frequency and language) would elicit differential attention and cognitive effort (and therefore, increased theta power) in infants as opposed to adults. In infancy, before phonetic learning has occurred, infant attention (and therefore, theta) would increase for highly frequent phonetic elements, as shown by studies of statistical learning (Maye et al., 2002). In adulthood, after phonetic categories are learned, the status of a phonetic element, that is, whether it is drawn from a native vs. non-native category, would be expected to drive attention and cognitive effort (and therefore, theta).

MEG brain imaging was used to investigate these hypotheses. We measured the brain's theta oscillatory rhythms in response to speech syllables at three ages: 6 and 12 months of age, as well as in adulthood. The traditional oddball paradigm allowed us to co-vary the two dimensions of interest (distributional frequency and native vs. non-native) to test our hypotheses. Confirmation of the hypotheses should show that theta brain oscillations vary depending on age. We expected a significant interaction between age and frequency and between age and language category: Infants (but not adults) were predicted to show theta increases for frequent over infrequent stimuli; adults (but not infants) were predicted to show theta increases for non-native relative to native syllables. More specifically, we predicted that early in infancy (6 months) increased relative theta power (RTP) would be observed for frequently presented speech stimuli, regardless of the category (native vs. non-native) from which the sounds were drawn. In contrast, we hypothesized that increased RTP would be observed for non-native speech stimuli in adults, regardless of the frequency with which they are presented. The prediction that non-native stimuli would produce increased theta responses stems from previous work showing that adults' processing of non-native speech stimuli increases the spread and duration of brain activation, effects associated with greater cognitive effort (Zhang et al., 2005, 2009). Twelve-month-old infants were expected to show a transitional pattern that more closely resembled that of adults than that of 6-month-old infants.

## **METHOD**

#### **PARTICIPANTS**

Seventeen healthy full-term Finnish-learning infants were tested at two ages: 6 months of age and 12 months of age in the MEG (see **Figure 1A**). The 6-month-old infants (*N* = 7) averaged 6.3 months at test (range = 5.15–7.27 months; 3 female); the 12-month-old infants (*N* = 11) averaged 12.27 months at test (range = 9.27–13.2 months; 3 female). An additional 17 infants were excluded due to failure to remain inside the MEG sensor array (2), insufficient data from head position sensors in the MEG (4), failure to complete the two required test conditions (11), or experimenter error (1). Infants were recruited by soliciting families at parent-infant groups and swim lessons in Helsinki, Finland. Written informed consent in accordance with the Research Ethics Board of BioMag Laboratory at Helsinki University Central Hospital and University of Washington was obtained from the parent.

Nine native Finnish-speaking adults (5 female) were tested (mean age = 38.3 years, range = 24.0–57.4 years). Adults were recruited with approved recruitment flyers posted at BioMag Laboratory and in public places in Helsinki, Finland. Adult participants gave written informed consent in accordance with the Research Ethics Board of BioMag Laboratory at Helsinki University Central Hospital and University of Washington.

#### **STIMULI**

Two sets of computer-synthesized syllables were used, one native (Finnish) and one non-native (Mandarin Chinese). The computer-synthesized native speech tokens were the Finnish alveolar stop /pa/ and /ta/ syllables created for use in the current experiment (see Bosseler, 2010 for full description). The computer-synthesized male tokens of Mandarin Chinese (alveolo-palatal /tchi/ and fricative /ci/ syllables) were originally created by Kuhl et al. (2003) and used by Kuhl et al. (2005), Tsao et al. (2006), and Kuhl et al. (2008). The stimuli were presented at 65 dBA sound pressure level via a loudspeaker located in front of the participant. Native and non-native sounds were tested in counterbalanced order. Sounds were presented in an oddball paradigm consisting of a standard (0.85 probability) and a deviant (0.15 probability) (see **Figure 1B**). For the native stimuli, /pa/ served as the standard and /ta/ the deviant; for the non-native stimuli, /tchi/ served as the standard and /ci/ the deviant. Sounds were presented with a 1200 ms inter-stimulus interval (ISI), onset-to-onset. During the experiment, adults were instructed to watch a self-selected silent movie on a screen and ignore the auditory stimuli that were being presented. Infants watched an assistant playing quietly with toys while a silent child-oriented video was displayed behind the assistant.
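The presentation scheme described above can be sketched as follows. This is a minimal illustration, not the authors' stimulus-delivery code: the function name `make_oddball_sequence` and its parameters are hypothetical, and the sketch assumes that the 0.15 deviant proportion is met exactly within a block and that deviant positions are drawn by a plain shuffle, neither of which is stated in the text.

```python
import random


def make_oddball_sequence(n_trials, standard, deviant,
                          p_deviant=0.15, soa_ms=1200, seed=0):
    """Return a list of (onset_ms, label) pairs for one oddball block.

    Assumptions (not stated in the text): the deviant proportion is met
    exactly, and deviant positions come from a plain shuffle with no
    constraint on consecutive deviants.
    """
    rng = random.Random(seed)
    n_deviant = round(n_trials * p_deviant)
    labels = [deviant] * n_deviant + [standard] * (n_trials - n_deviant)
    rng.shuffle(labels)
    # Onset-to-onset spacing of 1200 ms, matching the ISI described above.
    return [(i * soa_ms, label) for i, label in enumerate(labels)]


# Hypothetical native block: /pa/ as standard, /ta/ as deviant.
native_block = make_oddball_sequence(200, standard="pa", deviant="ta")
```

A non-native block would be built the same way with `standard="tchi"` and `deviant="ci"`; the block length of 200 trials is chosen only for illustration.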

Borrowed Russian words, which are heard frequently by Finnish people, contain the same alveolo-palatal vs. fricative contrast present in the Mandarin stimuli. Consequently, both the Finnish and Mandarin phonetic contrasts are discriminated by native-Finnish speakers. In pilot studies, we verified that Finnish-speaking adults identified only the Finnish sounds as native to the language using behavioral tests, and that adults could discriminate both the Finnish and Mandarin sounds (Bosseler et al., 2010). As expected, MEG measures of the mismatch negativity (reported in Kuhl et al., in preparation) confirmed that both the Finnish and the Mandarin Chinese contrasts were neurally discriminated by all subject groups: the 6-month-old, 12-month-old, and adult Finnish participants. Discriminatory brain activity to both the native and the non-native contrasts was shown by statistically significant mismatch responses at the source level in auditory cortex (full report in Kuhl et al., in preparation). The goal of the present study was to compare brain rhythms in response to phonetic units varying in frequency and language using phonetic contrasts that are discriminated by all groups of listeners. The ability to discriminate both contrasts allows us to attribute observed theta differences to psychological processes beyond simple discrimination of the stimuli.

**FIGURE 1 | (A)** Infant in MEG during measurement. **(B)** Stimuli presented in the oddball paradigm, in two conditions, native (upper) and non-native (lower). Bolding reflects infrequently presented stimuli. **(C)** Time-frequency plots showing the changes in RTP across conditions and groups. Left panel, Frequency: RTP to frequent and infrequent phonemes collapsed across native and non-native speech sounds as a function of age. Right panel, Language: RTP to native and non-native phonemes collapsed across frequent and infrequent stimuli as a function of age.

#### **MAGNETOENCEPHALOGRAPHY RECORDINGS**

Auditory evoked magnetic fields (AEF) were recorded with a whole-head array of 306 channels with 102 triple-sensor elements, each with two orthogonal planar gradiometers and one magnetometer (Elekta Oy, Helsinki, Finland) in a magnetically shielded room (Euroshield, Eura, Finland) at the BioMag Laboratory, Helsinki University Central Hospital. The MEG was continuously recorded with a bandpass filter of 0.03–200 Hz and sampled at 600 Hz. Prior to the recordings, four indicator coils were attached to the infant's nylon cap at known locations in an anatomical coordinate system defined by the nasion and the preauricular points. The signals from the coils were used to determine the position of the head inside the helmet. During the experiment, infants were sitting in either a high chair or a car seat. An adult (in addition to the assistant manipulating toys) was in the room sitting to the side of the infant in the MEG during data collection. The duration of each condition was approximately 15 min.

#### **DATA ANALYSIS**

MEG activity was averaged for each of the 4 stimuli (2 Finnish, 2 Mandarin) offline. Only the pre-deviant standards were used during analyses in order to match the number of deviant and standard epochs. The averaging epochs were taken from 100 ms prior to trigger onset to 1200 ms. Spatiotemporal signal space separation (tSSS) was used to eliminate artifacts arising from sources outside the sensor array such as the heartbeat, limb movements, and other ambient magnetic disturbances (Taulu et al., 2004; Taulu and Simola, 2006). After rejecting artifacts using tSSS, the head position registered for each epoch was used to convert the MEG signals to correspond to a virtual unified (standardized) position within the MEG sensor array for averaging across epochs (Taulu and Kajola, 2005).
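The selection of pre-deviant standards can be illustrated with a short sketch. The helper name `pre_deviant_standard_indices` is hypothetical, and the toy label sequence is invented for the example; the actual trial-selection code is not given in the text.

```python
def pre_deviant_standard_indices(labels, standard, deviant):
    """Indices of standards that immediately precede a deviant."""
    return [i - 1 for i in range(1, len(labels))
            if labels[i] == deviant and labels[i - 1] == standard]


# Toy sequence (invented): two consecutive deviants occur at positions 4 and 5.
labels = ["pa", "pa", "ta", "pa", "ta", "ta", "pa"]
standard_idx = pre_deviant_standard_indices(labels, "pa", "ta")
deviant_idx = [i for i, lab in enumerate(labels) if lab == "ta"]
```

In this sketch the epoch counts match exactly only when no two deviants are adjacent, since a deviant that follows another deviant has no pre-deviant standard. Whether the presentation order ruled out adjacent deviants is not stated here.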

#### **BRAIN RHYTHM CALCULATION**

The amplitude of single-trial oscillations was calculated using a Fourier transform variant, the Morlet wavelet function of time and frequency, on individual raw data files. A set of wavelets was used with the fundamental frequency ranging from 0 to 30 Hz in steps of 0.5 Hz, using a wavelet width of 7 cycles (Tallon-Baudry et al., 1996). The time-frequency response (TFR) window was symmetrically extended to 1200 ms pre- and post-stimulus onset. The subsequent analysis concentrated on the TFR segment from −100 to 600 ms with respect to stimulus onset to avoid artifacts at the ends of the TFR window related to the TFR computation. Changes in power as a function of time were calculated from the single-trial MEG signals. The power change value (ΔP) after the stimulus onset was obtained by computing the change in theta power relative to the 100-ms pre-stimulus baseline. The single-trial data were then averaged separately for the standard native, standard non-native, deviant native, and deviant non-native stimuli for each age group. The resulting values are expressed as the change in power relative to the power in a 100-ms pre-stimulus baseline period (**Figure 1C**). Mean area measurement windows were taken between 0–200 ms, 200–400 ms, and 400–600 ms. Measurement windows were based on previous studies and inspection of individual time-frequency averages and grand averages of each age group.
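The wavelet computation described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the analysis code used in the study: the function names are hypothetical, the wavelet is truncated at ±3 temporal standard deviations and scaled to unit energy, and relative power is taken as the ratio of power to its mean over the 100-ms pre-stimulus baseline. Only the 7-cycle width and the baseline window come from the text. The sketch uses the standard library only, so the toy signal is sampled at 100 Hz to keep the loops fast.

```python
import cmath
import math


def morlet_wavelet(freq_hz, sfreq, n_cycles=7.0, n_sigmas=3.0):
    """Complex Morlet wavelet with temporal SD sigma_t = n_cycles / (2*pi*f).

    Truncation at +/- n_sigmas and unit-energy scaling are assumptions.
    """
    sigma_t = n_cycles / (2.0 * math.pi * freq_hz)
    half = math.ceil(n_sigmas * sigma_t * sfreq)
    taps = []
    for k in range(-half, half + 1):
        t = k / sfreq
        envelope = math.exp(-t * t / (2.0 * sigma_t ** 2))
        taps.append(envelope * cmath.exp(2j * math.pi * freq_hz * t))
    norm = math.sqrt(sum(abs(w) ** 2 for w in taps))
    return [w / norm for w in taps]


def wavelet_power(signal, wavelet):
    """Squared magnitude of signal convolved with wavelet (zero-padded edges)."""
    half = len(wavelet) // 2
    n = len(signal)
    power = []
    for i in range(n):
        acc = 0j
        for j, w in enumerate(wavelet):
            m = i - (j - half)  # signal sample paired with wavelet tap j
            if 0 <= m < n:
                acc += signal[m] * w
        power.append(abs(acc) ** 2)
    return power


def relative_power(power, times_s, baseline=(-0.1, 0.0)):
    """Power divided by its mean over the pre-stimulus baseline window."""
    base = [p for p, t in zip(power, times_s) if baseline[0] <= t < baseline[1]]
    mean_base = sum(base) / len(base)
    return [p / mean_base for p in power]


# Toy single trial: a 6 Hz oscillation whose amplitude doubles at stimulus onset.
sfreq = 100.0  # illustration only; the recordings were sampled at 600 Hz
times_s = [k / sfreq for k in range(-120, 120)]
trial = [(2.0 if t >= 0 else 1.0) * math.sin(2 * math.pi * 6 * t) for t in times_s]
rtp_6hz = relative_power(wavelet_power(trial, morlet_wavelet(6.0, sfreq)), times_s)
```

Because a 7-cycle wavelet at theta frequencies spans well over a second, power estimates near the edges of the epoch mix in zero padding. This is one reason to compute the transform on a window extended beyond the segment that is analyzed.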

The SSS method was used to transform the signals of the MEG sensors into magnetostatic multipole moments which were then combined into a total current estimate as given by Equation 18 (Taulu and Kajola, 2005). The integral of this estimate was calculated over the whole brain volume to give us a single time-dependent scalar value, reflecting the power of all brain activity. This procedure helps alleviate the reconstruction noise effects related to head position transformations in movement-compensated sensor signals (Medvedovsky et al., 2007). A refinement of the documented SSS algorithm, Foster's optimal inverse (Foster, 1961), was used in the linear transformation of the physical sensor signals to multipole moments to further improve the signal-to-noise ratio.

The amplitude of the multipole moment based total current was then used to calculate activity in the time-frequency domain. In humans, theta recordings during learning and memory tasks have been shown to be widespread across cortical and sub-cortical brain areas (Caplan et al., 2001; Buzsáki, 2005). Although MEG source modeling of theta would have been useful, we were unable to obtain individual infant MRIs in Finland and therefore, completely accurate modeling of the head's conductivity profile was not possible. The aforementioned simple current estimate did not require a priori knowledge of the individual conductivity profile of the infant brain. Thus, the multipole moment method allowed us to avoid the bias associated with source modeling in cases in which the head does not fit a typical conductivity profile as well as estimate RTP across cortical and sub-cortical areas.

#### **RESULTS**

We tested two specific hypotheses: (1) RTP will be greater for Frequent vs. Infrequent stimuli in 6-month-old infants (but not in adults), regardless of language, because infant attention is drawn to high distributional frequency events; (2) RTP will be greater for Non-native vs. Native stimuli in adults (but not in 6-month-old infants), regardless of the frequency of presentation, because non-native stimuli require more attention and cognitive effort. We expected 12-month-old infants to show a transitional pattern that resembled more closely the pattern shown by adults.

Previous work on infants' theta rhythms using fast Fourier transform with narrow frequency bin analysis of EEG data suggests that the frequency range of theta for infants is 3.6–5.6 Hz (Orekhova et al., 1999, 2006). A 4–8 Hz range for theta is widely accepted for adults (see Knyazev, 2007 for review). We calculated RTP using both frequency ranges and similar results were obtained. We report the 4–8 Hz range results here.

Following previous studies (Zhang et al., 2011), we first examined RTP in the 0–200 ms time window in a mixed-model ANOVA with stimulus Frequency (Frequent vs. Infrequent) and Language (Native vs. Non-native) as within-subject factors, and Age (6 months, 12 months, and adults) as the between-subjects factor. Greenhouse-Geisser corrections were applied when appropriate and partial eta-squared ($\eta_p^2$) was calculated for main effects and interactions. Planned comparisons were reported as significant at the 0.05 level and Cohen's $d$ was calculated for effect sizes.

Theta brain rhythm results were consistent with our predictions. **Figure 2** shows the overall RTP at each of the three ages for the factors of Frequency (**Figure 2**, left column) and Language (**Figure 2**, right column). The three-way ANOVA revealed a significant main effect for the between-groups factor of Age, $F(2, 24) = 25.63$, $p < 0.001$, $\eta_p^2 = 0.68$. Tukey HSD revealed that RTP in adults was significantly higher than in the 6- and 12-month-old infants ($p < 0.001$), and that RTP did not differ significantly between 6- and 12-month-old infants ($p = 0.58$). A significant main effect was also obtained for Frequency, $F(1, 24) = 4.34$, $p = 0.048$, $\eta_p^2 = 0.15$, indicating higher RTP for Frequent ($M = 0.9$, $SE = 0.03$) vs. Infrequent ($M = 0.84$, $SE = 0.02$) stimuli. There was no main effect for Language, $F(1, 24) = 0.7002$, $p = 0.41$, $\eta_p^2 = 0.03$.

The statistical results of greatest interest to our hypotheses were the predicted significant 2-way interactions. An Age × Frequency interaction, $F(2, 24) = 5.061$, $p = 0.015$, $\eta_p^2 = 0.30$, confirmed our first predicted result: the effect of Frequency was significant only in 6-month-olds, $F(1, 6) = 23.301$, $p = 0.003$, $\eta_p^2 = 0.80$ (**Figure 2**). RTP did not differ significantly between the frequent and infrequent stimuli in 12-month-old infants, $F(1, 10) = 0.13$, $p = 0.73$, or in adults, $F(1, 8) = 0.06$, $p = 0.81$.

As predicted, we also obtained a significant Age × Language interaction, $F(2, 24) = 4.96$, $p = 0.016$, $\eta_p^2 = 0.29$, with the effect of Language significant in adults, $F(1, 8) = 5.66$, $p = 0.045$, $\eta_p^2 = 0.41$, and in 12-month-old infants, $F(1, 10) = 17.294$, $p = 0.002$, $\eta_p^2 = 0.63$. In adults, RTP was higher for the Non-native ($M = 0.993$, $SE = 0.017$) vs. Native ($M = 1.04$, $SE = 0.01$) stimuli, whereas in 12-month-old infants RTP was higher for the Native ($M = 0.872$, $SE = 0.04$) vs. the Non-native ($M = 0.753$, $SE = 0.03$) stimuli. Six-month-old infants did not show significant differences between the Native ($M = 0.769$, $SE = 0.47$) and Non-native ($M = 0.783$, $SE = 0.043$) stimuli, $F(1, 6) = 0.033$, $p = 0.86$. The three-way interaction was not significant, $p = 0.47$.
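The partial eta-squared values reported in this section can be cross-checked from the corresponding F statistics and degrees of freedom, using the standard identity $\eta_p^2 = (F \cdot df_{effect}) / (F \cdot df_{effect} + df_{error})$. The sketch below is a reader's consistency check, not part of the original analysis; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values taken from the ANOVA results reported above.
age_main_effect = partial_eta_squared(25.63, 2, 24)
age_by_frequency = partial_eta_squared(5.061, 2, 24)
frequency_at_6_months = partial_eta_squared(23.301, 1, 6)
```

Rounded to two decimals, these reproduce the reported effect sizes for the Age main effect, the Age × Frequency interaction, and the Frequency effect in 6-month-olds.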

#### **CHANGES IN THETA OVER TIME**

We had no a priori predictions about the pattern of change in RTP over time at each age, but we were interested in the patterns obtained. The change in RTP over time at each age is shown in **Figure 3** for three measurement windows (0–200, 200–400, 400–600 ms). Infants showed the largest changes over time in theta oscillatory activity.
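As an illustration of how a time course of RTP reduces to the three window measures, the sketch below averages values within each window. The function name `window_means` is hypothetical, the half-open window boundaries (start included, end excluded) are an assumption, and the toy time course is invented for the example.

```python
def window_means(values, times_ms, windows=((0, 200), (200, 400), (400, 600))):
    """Mean of `values` within each [start, stop) window of `times_ms`."""
    means = []
    for start, stop in windows:
        in_window = [v for v, t in zip(values, times_ms) if start <= t < stop]
        means.append(sum(in_window) / len(in_window))
    return means


# Invented time course sampled every 10 ms from -100 to 590 ms.
times_ms = list(range(-100, 600, 10))
rtp = [1.0 if t < 200 else 2.0 if t < 400 else 3.0 for t in times_ms]
per_window = window_means(rtp, times_ms)
```

Each window mean then serves as one level of the Window factor in the analyses that follow.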

At each age, we conducted a three-way ANOVA (Frequency × Language × Window). Looking first at RTP in response to Frequency (**Figure 3**, left), the results showed that in both 6- and 12-month-old infants, RTP increased from the 1st (0–200) to 3rd (400–600) window: 6 months, $F(2, 12) = 16.045$, $p = 0.001$, $\eta_p^2 = 0.73$; 12 months, $F(2, 20) = 4.08$, $p = 0.04$, $\eta_p^2 = 0.29$. For 6-month-olds, the increase was also significant for the change between the 2nd (200–400) vs. 3rd (400–600) window, $p = 0.004$, $d = 1.634$–$0.80$. The main effect of Frequency for 6-month-olds indicated higher RTP to the Frequent stimuli, $F(1, 6) = 5.95$, $p = 0.001$, $\eta_p^2 = 0.50$. The Window × Stimulus Frequency interaction, $F(2, 12) = 7.48$, $p = 0.023$, $\eta_p^2 = 0.56$, indicated greater increases over time for Infrequent stimuli, especially between the 2nd and 3rd windows. Follow-up tests showed that the increase to infrequent stimuli was due primarily to Native stimuli (1st to 2nd: $p = 0.15$, $d = 1.81$–$0.67$; 2nd to 3rd: $p = 0.01$, $d = 2.65$–$0.80$). The main effect of Frequency and the Window × Frequency interaction were not significant in 12-month-olds or in adults.

For Language (**Figure 3**, right), RTP increased from the 1st to the 3rd time window in both the 6- and 12-month-old infants, but not in adults, $F(2, 16) = 2.92$, $p = 0.10$, $\eta_p^2 = 0.27$. In 6-month-olds, the increase was significant for both the Native, $p = 0.006$, $d = 1.50$–$0.60$, and Non-native stimuli, $p = 0.045$, $d = 1.27$, $d = 0.54$. For the 12-month-old infants, significant changes in RTP as a function of time occurred for the Non-native stimuli only, indicated by the marginally significant Window × Language interaction, $F(2, 20) = 3.46$, $p = 0.056$, $\eta_p^2 = 0.26$.

## **DISCUSSION**

The current study focused on oscillatory brain rhythms and the transition in developmental speech perception that all typically developing infants experience between 6 and 12 months of age: a change in phonetic perception that alters the initial language-general mode of perception to one that is language specific. This transition in infant perception has been widely reported across cultures and occurs during a putative "critical" or "sensitive" period in the development of language (Kuhl et al., 2005; Kuhl, 2010; Peña et al., 2012; Weikum et al., 2012). In the current study, our approach was to employ a spectral analysis of MEG-derived neuromagnetic signals. The brain's oscillatory rhythms have been associated in previous studies with changes in stimuli that evoke increased attention and/or cognitive effort.

We focused on the theta brain rhythm (4–8 Hz) because it has often been recorded in infants, as well as adults, and because we hypothesized that changes in implicit attention and cognitive effort would be associated with the transition in phonetic perception. To our knowledge this is the first time that MEG brain imaging has been used in a study of oscillatory brain activity in response to speech in infants; it is also the first study of oscillatory activity using EEG or MEG in which developmental change from infancy to adulthood is examined.

We hypothesized that theta oscillatory rhythms would increase differentially as a function of age, and more importantly, that theta would vary as a function of two different aspects of the speech stimuli manipulated in the current experiment: (1) distributional frequency, and (2) status as native as opposed to non-native phonetic stimuli. Interactions between age and frequency and between age and language were predicted: 6-month-old infants (but not adults) were expected to show theta increases for frequent over infrequent stimuli, regardless of language; adults (but not 6-month-old infants) were expected to show significant theta increases for non-native relative to native syllables, regardless of frequency of presentation. 12-month-old infants were expected to show a transitional pattern that more closely resembled that of adults than that of 6-month-old infants.

Our results confirmed these hypotheses: Theta power in 6-month-old infants was higher in response to frequently presented stimuli, with no significant differences for native as opposed to non-native syllables. Attending to distributional frequency in an array of speech stimuli is very efficient in promoting phonetic learning during infancy, because this probability statistic reveals information that assists the identification of the phonetic units that signal meaningful differences between words. Studies show that language input contains robust distributional frequency information about the phonetic units that distinguish words in the language (for evidence from Japanese, see Werker et al., 2007). Increases in theta oscillatory brain activity to frequently presented syllables may thus be a biomarker of infant attention to this feature.

In adulthood, theta increases are *not* driven by the frequency with which syllables are presented, but instead by the category (native or non-native) of the syllables. In adults, we observed theta increases in response to non-native as opposed to native syllables, and we attribute this result to the fact that processing non-native phonetic syllables demands greater attention and cognitive effort (see Zhang et al., 2009). Adults' perception of speech is governed by learned categories, and processing is highly automatic. Attention is automatically drawn to syllables that do not belong to learned categories; they require more cognitive effort to process.

The pattern in 12-month-old infants more closely resembled that of adults in that theta change is driven not by the frequency of presentation but by language. Interestingly, however, 12-month-old infants showed higher theta rhythms to native syllables, whereas adults showed increased theta to non-native syllables. At 12 months of age, infants, for the first time, are attending to the detailed characteristics of native language phonetic units, and developing representations in memory of these phonetic units. We interpret this finding as evidence that during the initial transition to native-language processing, when native language learning has begun but is still incomplete, infants' nascent attentional network is directed to the acoustic events that signal meaningful word differences, requiring increased attention and cognitive effort. In adulthood, when phonetic processing has become more automatic, native syllables are processed without effort, and non-native syllables require more attention and effort.

Thus, our data suggest that development, as revealed by theta power, may involve two transitions in perception: the first involves a shift in attention from the acoustic speech signals that occur most frequently, to the acoustic events that have native-language status. Once native-language processing becomes more automatic, as it is in adults, then attention is drawn toward syllables that do not belong to a known category, in this case, a non-native syllable. The fact that adults do not attend to highly frequent events, as they once did in infancy, restricts their abilities to learn new phonetic material. Moving to a new country as an adult, and listening for months or even years to the distributional statistics of a new language, does not lead to robust learning of the phonetics of a new language (Flege, 1995).

The use of MEG in the current study allowed us to observe the temporal unfolding of theta oscillatory activity for the first time. Infants showed more dramatic changes over the 600 ms temporal window when compared to adults, who were remarkably stable in their responses over the 600 ms period. Comparing the two infant groups, 6-month-olds showed more change over time than 12-month-olds, with greater increases to infrequent stimuli, suggesting perhaps that more time is needed to attend to infrequent stimuli. We assume that changes in theta over the 600 ms temporal window we investigated could reflect the value for infants of a longer period during which the stimulus can be analyzed; therefore, increases in theta over time are observed rather than decreases over time. In the 12-month-olds, increases occurred to non-native stimuli over time, perhaps reflecting that additional time increases the attention to non-native stimuli for infants who are acquiring native-language categories.

It is of interest for future studies that work by Poeppel and his colleagues using MEG measures (e.g., Ghitza et al., 2013) indicates that, in adults, theta phase information "parses" syllables in a speech stream, which would aid infant speech processing. Future developmental studies can be directed to investigate theta phase, as well as to investigate other frequency bands, using both power as well as phase measurements, to more fully understand how brain oscillatory rhythms change with age and experience to language stimuli.

#### **AN IMPLICIT LEARNING PROCESS INVOLVING ATTENTION**

Collectively, the data from the present experiment suggest that theta oscillatory brain activity indexes an implicit learning process that entails a shift between infancy and adulthood in the patterns of speech stimuli that induce attention. The patterns we observed in theta oscillatory activity with age mirror the timing observed in behavioral studies on the transition in developmental speech perception. Theta oscillatory activity has been linked in previous studies outside the domain of speech to changes in attention and cognitive effort, and previous interpretations of theta brain rhythms are consistent with our findings. Infant theta increased for frequently presented stimuli, and this is consistent with literature showing that infants are statistical learners in the early period (see Maye et al., 2002). Infants are drawn to phonetic stimuli in their environment that occur frequently, and this has been shown experimentally to assist phonetic learning. In adults, attention to frequently occurring syllables could reduce the stability of the phonetic categories learned in infancy, which are expected to allow efficient speech processing all one's life. For adults, attention is driven by category knowledge—the recognition that a stimulus is a native vs. a non-native syllable. The results of the current study confirm that adults' theta increases are driven by the syllable category (native vs. non-native), rather than its frequency, with non-native syllables requiring more attention and cognitive effort. As expected, 12-month-old infants are in transition. Theta in 12-month-olds is no longer driven by frequency, and is instead driven by language; however, unlike adults, syllables representing the native category led to increased theta. Infants at 12 months are in the process of locking in learned phonetic categories, resulting in greater attention to native syllables. 
During the second 200 ms time window (200–400 ms), 12-month-olds begin to resemble adults, showing greater theta increases to the non-native syllables.

Thus far in the discussion, we have interpreted our results in terms of theta oscillatory activity indicating changes in attention brought about by development and experience. But the kinds of changes in attention that we propose are not under the control of the participant. The changes in attention we describe are part of an implicit learning process, one that is not under conscious control. Infants' attentional networks are not fully developed (see discussion by Diamond, 1990, 1994; Berger et al., 2006). Even in adults, theta oscillations do not reflect conscious attentional strategies. The learning process for language guides attentional shifts toward high probability events early in development because probabilistic information reveals structure in language input. In adulthood, attention to frequent stimuli would render learned categories unstable. Brain oscillatory activity indexes these higher-level psychological processes.

Infants' sensitivity to probabilistic events is highly conducive to language acquisition because, prior to learning, the elementary units (phonemes and words) that are critical to language are unknown to infants; attention to the probabilistic information in language input identifies the critical elements (phonemes and words), thus supporting learning. In adulthood, the brain has developed neural networks that specialize in the analysis of native language patterns of phonemes and words, making analysis more automatic. Attentional demands thus increase for non-native, rather than native, speech signals (see also Zhang et al., 2005, 2009). Continuing research on theta brain rhythms in typically developing infants and young children, as well as those with developmental difficulties, will advance our understanding of the relationship between attention, executive control, and language acquisition. The measurement of the brain's oscillatory rhythms goes beyond what behavioral studies can reveal.

#### **LIMITATIONS OF THE PRESENT STUDY**

In the present study, theta brain rhythms were measured using the overall amplitude of total current based on multipole moments, which provided a whole-brain assessment (a "virtual" sensor) that we submitted to spectral analysis. In future studies, improved noise-reduction algorithms that are currently being developed will improve our ability to track the temporal dynamics of brain activation in both infants and adults. In addition, ongoing work in optimizing the movement compensation algorithm with respect to reconstruction noise will ensure a robust sensor-level analysis even with subjects whose movements are larger than the current limits considered acceptable for reliable analysis (Medvedovsky et al., 2007). Furthermore, the use of magnetic resonance imaging (MRI) of infant participants, or the use of age-specific infant average head models, which we are developing (see Akiyama et al., 2013), would allow brain oscillatory activity to be localized in the infant brain with improved accuracy. Localization of theta brain activity would provide information about hemispheric differences in the generators of this activity across age, as well as the location of brain activation that accompanies the observed shift in theta related to stimulus characteristics, from frequent stimuli in infancy to non-native stimuli in adulthood.
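As a rough illustration of what a spectral analysis of a single "virtual" sensor time series involves, the sketch below estimates mean power in a frequency band with a plain discrete Fourier transform. It is not the analysis pipeline used in the study: the 4–8 Hz band limits, the Hann taper, the sampling rate, and the synthetic sine-plus-noise signals are all assumptions chosen for illustration, and `band_power` is a hypothetical helper name.

```python
import math
import random


def band_power(signal, fs, f_lo=4.0, f_hi=8.0):
    """Mean DFT power of `signal` between f_lo and f_hi (Hz).

    The 4-8 Hz default is an assumed theta band, not a value from the study.
    """
    n = len(signal)
    mean = sum(signal) / n
    # Remove the mean and apply a Hann taper to limit spectral leakage.
    tapered = [
        (x - mean) * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(signal)
    ]
    powers = []
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            re = sum(x * math.cos(2.0 * math.pi * k * i / n)
                     for i, x in enumerate(tapered))
            im = -sum(x * math.sin(2.0 * math.pi * k * i / n)
                      for i, x in enumerate(tapered))
            powers.append(re * re + im * im)
    if not powers:
        raise ValueError("no DFT bins fall inside the requested band")
    return sum(powers) / len(powers)


# Synthetic data only: a 6 Hz and a 10 Hz oscillation with added noise.
fs = 500.0                      # assumed sampling rate in Hz
n_samples = 1000                # 2 s of data
rng = random.Random(0)
theta_rich = [math.sin(2.0 * math.pi * 6.0 * i / fs) + 0.1 * rng.gauss(0.0, 1.0)
              for i in range(n_samples)]
alpha_rich = [math.sin(2.0 * math.pi * 10.0 * i / fs) + 0.1 * rng.gauss(0.0, 1.0)
              for i in range(n_samples)]

theta_power_a = band_power(theta_rich, fs)
theta_power_b = band_power(alpha_rich, fs)
```

With these synthetic inputs, the 6 Hz signal yields far more power in the assumed theta band than the 10 Hz signal; real MEG data would additionally require artifact handling and averaging across trials.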

#### **CONCLUSIONS**

In the present study, changes in theta brain rhythms were shown to index the universally observed developmental change in speech that narrows infants' perception, a transition in development that has been posited as necessary for language acquisition (Kuhl, 2010). We demonstrate that infant theta oscillations are driven by the distributional frequency of speech events, whereas adult theta oscillations are driven by category knowledge. We posit that these changes in brain oscillations reflect an implicit learning process in which a shift in attention occurs with development and experience: early in infancy, before phonetic learning has occurred, attention is driven by event frequency; once phonetic learning occurs, attention is driven by category knowledge. This learning algorithm, we posit, is based on the brain's implicit learning systems and attentional mechanisms, and the learning algorithm causes the transition in speech perception that occurs before the end of the first year. Whether the timing of the transition in speech perception is governed by maturation, experience, the emergence of other cognitive skills (e.g., social cognition), or a combination of these factors, remains to be determined. Regardless of these future questions, however, the present study demonstrates the utility of theta brain oscillatory activity as an index of a shift in attention from frequent events to learned category knowledge that will be a helpful tool in future studies.

Future studies could examine theta oscillatory rhythms using other stimuli, such as faces, music, and multi-modal speech stimuli (Pascalis et al., 2002, 2005; Hannon et al., 2005; Lewkowicz and Ghazanfar, 2006). All of these stimuli have been reported to show perceptual narrowing in the developmental timeframe during which the transition in infant speech perception occurs. Theta rhythm analyses will reveal either parallels or distinctions across stimulus domains, supporting either a pan-sensory domain-general learning process as the instigator of perceptual narrowing, or a learning process that is highly specific and dedicated to language learning.

#### **ACKNOWLEDGMENTS**

Funding for the research was provided by a National Science Foundation grant under the Science of Learning Program to the University of Washington's LIFE Center (PI, Patricia K. Kuhl).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 May 2013; accepted: 11 September 2013; published online: 01 October 2013.*

*Citation: Bosseler AN, Taulu S, Pihko E, Mäkelä JP, Imada T, Ahonen A and Kuhl PK (2013) Theta brain rhythms index perceptual narrowing in infant speech perception. Front. Psychol. 4:690. doi: 10.3389/fpsyg.2013.00690*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Bosseler, Taulu, Pihko, Mäkelä, Imada, Ahonen and Kuhl. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Do informal musical activities shape auditory skill development in preschool-age children?

## *Vesa Putkinen<sup>1</sup>\*, Katri Saarikivi<sup>1</sup> and Mari Tervaniemi<sup>1,2</sup>*

*<sup>1</sup> Cognitive Brain Research Unit, Cognitive Science, Institute of Behavioural Sciences, University of Helsinki, Helsinki, Finland <sup>2</sup> Finnish Centre of Excellence in Interdisciplinary Music Research, University of Jyväskylä, Jyväskylä, Finland*

#### *Edited by:*

*Virginia Penhune, Concordia University, Canada*

#### *Reviewed by:*

*Elyse S. Sussman, Albert Einstein College of Medicine, USA Elizabeth Hellmuth Margulis, University of Arkansas, USA*

#### *\*Correspondence:*

*Vesa Putkinen, Cognitive Brain Research Unit, Cognitive Science, Institute of Behavioural Sciences, University of Helsinki, P.O. Box 9, Siltavuorenpenger 1B, FIN-00014 Helsinki, Finland e-mail: vesa.putkinen@helsinki.fi*

The influence of formal musical training on auditory cognition has been well established. For the majority of children, however, musical experience does not primarily consist of adult-guided training on a musical instrument. Instead, young children mostly engage in everyday musical activities such as singing and musical play. Here, we review recent electrophysiological and behavioral studies carried out in our laboratory and elsewhere which have begun to map how developing auditory skills are shaped by such informal musical activities both at home and in playschool-type settings. Although more research is still needed, the evidence emerging from these studies suggests that, in addition to formal musical training, informal musical activities can also influence the maturation of auditory discrimination and attention in preschool-aged children.

**Keywords: music, brain development, event-related potential, training, auditory perception, informal musical activities**

## **INTRODUCTION**

A multitude of experimental evidence shows that musical expertise has functional and structural manifestations in the brains of musically trained adult individuals. These can be observed in cortical and subcortical neural architecture underlying uni- and cross-modal sensory, motor, and cognitive functions (Münte et al., 2002; Jäncke, 2009). While the pioneering findings were correlational, indicating merely an association between the starting age of music training and enhanced brain functions (Pantev et al., 1998; Bengtsson et al., 2005), very recent longitudinal studies in children demonstrate a causal link between musical training and changes in brain structure and function (Putkinen et al., 2013b; Hyde et al., 2009; Moreno et al., 2009; Chobert et al., 2012).

In the great majority of these studies, musical training consisted of formal studies aiming at the mastery of one or more musical instruments. However, for most young children, typical musical experiences consist of everyday musical activities such as singing, dancing, listening to recorded music, and musical play at home or in playschool-type settings. Furthermore, computer and console games (e.g., Singstar, Rockband, karaoke-based applications) increasingly draw children into "musical play" and could be regarded as additional informal environments for music learning. Thus, there is an evident need for studies that examine informal musical activities as possible learning platforms in childhood.

## **EARLY PERCEPTUAL PREREQUISITES FOR THE EFFECTS OF MUSICAL ACTIVITIES AND MUSICAL ENCULTURATION**

Behavioral studies have demonstrated that already at around the age of six months infants are equipped with many of the perceptual and cognitive prerequisites for the putative beneficial effects of a musically enriched environment. Not only do they display fairly accurate discrimination of musically important basic sound features such as pitch (Olsho et al., 1987) and duration (Morrongiello and Trehub, 1987), but they are also sensitive to some more abstract aspects of musical sounds. For example, infants appear to encode melodies and rhythms in terms of relative pitch (Trehub et al., 1984) and relative duration (Trehub and Thorpe, 1989), are able to group individual tones by pitch (Thorpe et al., 1988), and show long-term memory for musical pieces (Plantinga and Trainor, 2005). More recently, event-related potential (ERP) studies have shown that musically relevant auditory abilities such as the neural discrimination of different intervals (Stefanics et al., 2009), sound grouping (Stefanics et al., 2007), perception of the missing fundamental (He and Trainor, 2009), auditory stream segregation (Winkler et al., 2003), and detecting the beat of rhythmic sounds (Winkler et al., 2009) are present already before the age of six months or even at birth. These early perceptual skills are also put to heavy use during childhood: young children typically receive ample musical exposure (Trehub, 2006) and appear to find music both interesting and enjoyable (Nakata and Trehub, 2004; Zentner and Eerola, 2010). Therefore, everyday musical activities are a rich source of experiences that may have the potential to shape auditory skill development.

Indeed, in behavioral studies even musically untrained adults show (implicit) competence in processing some fairly nuanced aspects of music in a way that is consistent with learning through mere incidental exposure (Cuddy and Badertscher, 1987; David Smith et al., 1994; Bigand and Pineau, 1997; Honing and Ladinig, 2009). ERP studies indicate that the brains of non-musicians automatically process some aspects of Western tonality and harmony (Koelsch et al., 2000, 2003; Brattico et al., 2006; Krohn et al., 2007). Presumably, these idiosyncrasies of Western tonal music are internalized by non-musicians through everyday musical experiences. Native language learning offers a well-known parallel example of such an exposure effect where, during the first year of life, the auditory system starts to tune to the speech sounds of one's native language while simultaneously losing the sensitivity to non-native speech sound contrasts (Kuhl et al., 1992; Cheour et al., 1998). In infants, a development reminiscent of the tuning to native speech sounds appears to take place with regard to the processing of culturally typical vs. atypical metric (Hannon and Trehub, 2005) and scale structures (Trehub et al., 1999) in music. Consequently, adults show an advantage in processing music that follows the conventions of their culture (Kessler et al., 1984; Krumhansl et al., 2000; Drake and Ben El Heni, 2003; Demorest and Osterhout, 2012). At the very least, this enculturation process demonstrates that ambient exposure to music without specific training is sufficient for learning culture-specific implicit musical knowledge. More generally, these effects raise the question of whether informal exposure to music might also influence the development of auditory processing outside of the musical domain.

## **MUSICAL ENRICHMENT IN EARLY CHILDHOOD**

In light of the evidence reviewed above, it seems plausible that everyday musical activities during childhood such as exposure to parental singing and musical play by the child might influence auditory skill development. Yet so far only a few research endeavors have directly examined how variation in the amount of such activities is reflected in early auditory abilities. These will be introduced next.

Recently, Putkinen et al. (2013a) set out to investigate whether the amount of informal musical activities is related to electrophysiological correlates of auditory change detection and attention in 2–3-year-old children. They used the *multi-feature paradigm* (Näätänen et al., 2004) to record several auditory ERP responses in parallel: the mismatch negativity (MMN), P3a, late discriminative negativity (LDN), and reorienting negativity (RON). These brain potentials reflect successive stages of auditory processing particularly in childhood. The MMN is thought to reflect memory-based discrimination of sound changes (Näätänen et al., 2007) while the P3a is an index of an ensuing involuntary attention shift toward such changes (Escera et al., 1998). The LDN, which is often seen in children but less so in adults, is related to further processing of sound changes but its exact functional role remains unclear (Bishop et al., 2011). Finally, the RON reflects attentional reorienting after distracting sounds (Schröger and Wolff, 1998). These ERP responses were recorded within a single sound sequence to changes in frequency, duration, intensity, perceived sound-source location, and the temporal structure of the sounds (i.e., infrequent silent gaps in sounds) and to surprising novel sounds (e.g., animal sounds; see **Figure 1A**).

In addition to ERP recordings, parents were asked how often their children engaged in different types of musical activities at home (e.g., singing and dancing) and how often the parents interacted with them musically (e.g., how often they sang to their children). A composite score indexing the amount of such musical everyday activities was found to be significantly correlated with the ERP response amplitudes (see **Figures 1B,C**). First, a high amount of musical activities was associated with enlarged P3a responses to duration and gap deviants. This result suggests that the attention of the children from the more musically active families was more readily drawn toward the changes in duration and temporal structure of sound. Therefore, musical activities at home might affect how children consciously discriminate changes in temporal aspects of sounds. Second, high scores for the musical activities index were also associated with a diminished LDN across all five deviant types. The LDN is typical for immature auditory change detection since it decreases in amplitude with age as the brain matures (Hommet et al., 2009; Bishop et al., 2011). Therefore, the reduced amplitude of the LDN in children with more musical activities at home suggests more mature auditory processing in these children.

The P3a elicited by the novel sounds, in turn, correlated with paternal singing so that the more the fathers reported singing to their children, the smaller the P3a elicited by the novel sounds was. Finally, the RON responses to the novel sounds were smaller in amplitude in children with high scores in the musical activity index. The P3a and RON responses elicited by novel sounds are regarded as indices of distractibility in children. The large amplitude of these responses is associated with behavioral distraction by the eliciting sounds (i.e., prolonged reaction times and/or decreased hit rates in a concurrent task that requires responding to stimuli unrelated to the distracting sounds) and the P3a is enlarged in children with attention deficit hyperactivity disorder (ADHD; Gumenyuk et al., 2005). Therefore, the reduced P3a and RON responses found by Putkinen et al. (2013a) in the children with more musical activities at home suggest that these children were less easily distracted by the novel sounds than children from less musically active home environments.

These results are interesting since they suggest that in early childhood, informal musical activities might affect the development of auditory skills essential for normal language development as well as attention skills that may be related to later school performance. Obviously, the correlational data of Putkinen et al. (2013a) do not allow one to conclude that the relation between everyday musical activities and response amplitudes was causal. However, several lines of evidence suggest that such an interpretation of the results is at least plausible. Firstly, animal studies indicate that early enrichment of the sound environment affects the organization of auditory cortices (Zhang et al., 2001; Engineer et al., 2004). Secondly, emerging evidence from experimental studies reviewed below supports the notion that the kind of musically enriched early experience examined by Putkinen et al. (2013a) can influence the development of auditory processing.

In an ERP study, Trainor et al. (2011) randomly assigned 4-month-old infants either to a group that was exposed to recordings of melodies in a guitar timbre or to a group that heard the same melodies in a marimba timbre. After a week-long 20-min-per-day exposure to one of the two timbres, the guitar-exposed infants showed a larger obligatory response to guitar tones than to marimba tones whereas the opposite response pattern was found for the marimba-exposed group. Furthermore, occasional pitch changes in the guitar tones elicited a mismatch response only in the guitar-exposed infants whereas pitch changes in the marimba tones did not elicit a significant mismatch response in either group. These results suggest that in infants a relatively short exposure can strengthen the neural representations of a given timbre which is further reflected in enhanced processing of pitch in that timbre.

**FIGURE 1 |** […] et al. (2013a). In this paradigm standard tones (*P* ∼ 0.50) alternate with deviant tones (*P* ∼ 0.42) from five categories and novel sounds (*P* ∼ 0.08). See Putkinen et al. (2013a) for details. **(B)** and **(C)** illustrate difference signals for the deviant sounds and novel sounds, respectively, and the scatter plots illustrating the correlations between the P3a (diamonds) and LDN/RON (dots) […] thin dashed lines represent the difference signal at individual fronto-central channels and the thick solid line is the average of these signals. The gray bars indicate the latency windows that were used to calculate the response mean amplitudes. Panels **(B)** and **(C)** reproduced with permission from Putkinen et al. (2013a).
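The approximate stimulus probabilities given in the figure caption can be made concrete with a small sketch that generates a sequence of this kind. It reads "alternate" as strict alternation (every other sound is a standard) and assumes the five deviant types are equally likely; both are assumptions for illustration, as are the function name `make_sequence` and the labels used for the sound types. See Putkinen et al. (2013a) for the actual stimulus design.

```python
import random

# Labels for the five deviant categories named in the text (hypothetical names).
DEVIANT_TYPES = ["frequency", "duration", "intensity", "location", "gap"]


def make_sequence(n_pairs, p_novel_overall=0.08, seed=0):
    """Return a list of sound labels in which standards alternate with other sounds.

    Standards fill every other position (P ~ 0.50 overall). Each remaining
    position holds a novel sound or one of the five deviants.
    """
    rng = random.Random(seed)
    # Novel sounds are ~8% of all sounds, i.e. ~16% of the non-standard positions.
    p_novel_in_slot = p_novel_overall / 0.5
    sequence = []
    for _ in range(n_pairs):
        sequence.append("standard")
        if rng.random() < p_novel_in_slot:
            sequence.append("novel")
        else:
            # Assumption: deviant types are drawn with equal probability.
            sequence.append(rng.choice(DEVIANT_TYPES))
    return sequence


seq = make_sequence(5000)
p_standard = seq.count("standard") / len(seq)
p_novel = seq.count("novel") / len(seq)
p_deviant = 1.0 - p_standard - p_novel
```

Under these assumptions the observed proportions land close to the caption's values of roughly 0.50, 0.42, and 0.08, with each deviant type accounting for about a fifth of the deviant share.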

Gerry et al. (2012) randomly assigned 6-month-old infants either to infant-directed music classes based on the Suzuki method or to classes during which infants interacted with their parents while recorded music was played in the background, or to a control group with no lessons. After a 6-month follow-up, the infants took part in an experiment during which a piece of classical music and an atonal version of the same piece were played to them. Looking-time measurement indicated that the infants who had attended the Suzuki music classes preferred the original piece over the atonal version whereas the other two groups showed no preference. The authors interpreted this result as an indication of earlier learning of Western tonality in the Suzuki music group than in the control groups. Along the same lines, another study by the same team found that guided musical group activities with emphasis on infants and parents moving to music were associated with heightened sensitivity to the typical metric structure of Western music in 7-month-old infants (Gerry et al., 2010).

Although there are some clear differences between the aforementioned studies in the type of musical exposure, they all involve activities that parents spontaneously engage in with their children in order to musically enrich their auditory environment (i.e., moving to music, listening to recorded music, singing, etc.). One especially important common feature is that – in clear contrast to traditional studies looking at the neuroplastic effects of musical training – these studies examined musical experience that involved no formal training on a musical instrument (although in Gerry et al., 2012 the Suzuki classes reportedly involved playing percussive instruments). Also, in the studies of Putkinen et al. (2013a) and Gerry et al. (2012), social interaction between the parents and the child appeared to be a central mediator of the link between the musical experience and children's auditory skills: In both studies the effects were specific to active and interactive musical behaviors whereas no effects were observed for more passive exposure to music. Together these studies suggest that even without formal instrument training, musical exposure may shape auditory development not only in the musical domain but also more generally by influencing basic auditory discrimination abilities and auditory attention.

Importantly, the positive effects of informal music activities are not limited to children with typical development. Recently, Torppa et al. (2012) investigated children (aged from 4 to 13 years) who were born deaf but who had received a cochlear implant in childhood. MMN and P3a responses, particularly to timbre and pitch changes in a musical multi-feature paradigm consisting of instrumental sounds, were modulated by the singing activities of these children (Torppa et al., unpublished data). Cochlear-implanted children who sang regularly had earlier MMNs and P3as to frequency changes as well as earlier MMNs for changes from piano to cembalo timbre than cochlear-implanted children who did not sing. This finding is promising since it gives strong evidence against traditional views that cochlear-implanted individuals are not able to perceive and appreciate music. Instead, they can perceive and even produce music. Whether their readiness and willingness to sing facilitate neural sound change discrimination or whether it is the outcome of originally facilitated neural discrimination remains to be clarified in future studies.

#### **CONCLUSIONS AND FUTURE DIRECTIONS**

In conclusion, the studies reviewed above suggest that musically rich environments might have beneficial effects on auditory abilities in childhood. We suggest that these effects are not specific to the musical domain but that informal musical activities might promote more general enhancement of auditory processing. Enhancement of these skills may have important consequences, for example, for the later development of language and attention.

Social aspects of musical activities probably play a key role in some of the effects of musical experience on the development of auditory skills. Not only do most of the musical activities that young children engage in – for obvious reasons – take place in social situations, but social interaction *per se* is probably a profoundly important component of early musical experience. Kirschner and Tomasello (2010) directly compared drum tapping to a rhythm in social and non-social situations in preschool-aged children and found more accurate spontaneous synchronization when drumming together with an adult than with a machine or a prerecorded beat. Moreover, the finding that social interaction (vs. passive exposure) appears crucial for native speech sound learning (Kuhl et al., 2003) implies that social interaction could facilitate early perceptual learning in the musical domain as well.

Overy and Molnar-Szakacs describe music as a primarily social experience that involves understanding the intentions behind motor actions that are required to produce musical signals (Molnar-Szakacs and Overy, 2006; Overy and Molnar-Szakacs, 2009). They suggest that this process relies on the mirror neuron system (MNS) and that music making could promote positive social interaction precisely because of the engagement of this system. Mirror neurons are generally thought to encode motor goals and intentions and thereby support action understanding, social learning, and interaction (Van Overwalle and Baetens, 2009; Bonini and Ferrari, 2011; however, see Cook et al., in press; Hickok, 2013). Whether the MNS is involved or not, positive social behavior has indeed been connected to music-making. For example, Kirschner and Tomasello (2010) found that children more often showed prosocial behavior after joint music making than after non-musical cooperation. In contrast to the view that supposes inborn mirroring properties with evolutionary origin (Bonini and Ferrari, 2011), the associative learning account holds that the capability of mirror neurons to match observed and executed actions is only acquired through experience (Heyes, 2010; Catmur, 2013). Thus, interactive music-making between the parent and the child can either be thought to be supported by the MNS or as a naturally engaging learning platform for developing the matching properties of these neurons.

There is a growing interest in incorporating informal musical activities in clinical interventions for various conditions. Partly fuelling this interest, a pioneering randomized clinical study showed that everyday music listening can support cognitive recovery after stroke (Särkämö et al., 2008). In this study, stroke patients who were assigned to a group that listened to self-selected music for at least 1 h a day for 2 months showed greater recovery in verbal memory and focused attention than patients in an audio-book listening group or a control group and had lower depression and confusion scores relative to the control group. Clearly, these are highly encouraging findings with regard to stroke rehabilitation that again indicate that everyday musical activities can have deep and wide-ranging effects on the brain.

Perhaps informal musical activities in childhood could also be harnessed to tune basic auditory processing. This seems especially relevant for disorders of language development, since several studies indicate that auditory discrimination in infancy predicts later language skills (Molfese, 2000; Molfese et al., 2001; Benasich and Tallal, 2002; Guttorm et al., 2005) and that basic auditory dysfunction might be a key feature of dyslexia (e.g., Tallal and Gaab, 2006). Furthermore, the attention-related effects found by Putkinen et al. (2013a) have similar implications for the normal and disturbed development of attentional control while the results of Torppa et al. (2012) suggest that the recovery of hearing in children with cochlear implants might be supported with musical activities.

In contrast to some non-musical everyday activities that have been suggested to have beneficial effects on brain development (e.g., sports), music may have an especially strong motivational component already very early in life and might suit the perceptual and motor capabilities of young children particularly well. Musical activities might therefore be in a special position to shape the brain during the early period of heightened neuroplasticity. Future studies should investigate whether this is true only for early childhood, i.e., whether there is an early sensitive period for these effects and whether these periods are relatively fixed or malleable by experience (for example, Kuhl, 2004, 2011 has suggested that exposure to more than one language in infancy could extend the sensitive period for native speech sound learning). Thus, longitudinal or large-scale cross-sectional studies starting from infancy need to be carried out in order to map the stability of the association between informal musical activities and auditory skills in childhood as well as their implications for later auditory development. Importantly, experimental intervention studies are needed to disentangle the direction of causality in these associations and to test the feasibility of incorporating everyday musical activities in treating and preventing impairments of auditory processing.

Taken together, the current findings suggest that music as a part of daily informal activities may improve several neurocognitive functions and thereby encourage the use of music in various educational settings such as daycare and school for pupils with and without special needs (see, e.g., Uibel, 2012). By reviewing the benefits that music even without formal instrumental training may offer for the modulation of basic neurocognitive functions in children, the current paper hopefully opens new avenues for future studies on the effects of informal musical activities on brain development.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 May 2013; accepted: 11 August 2013; published online: 29 August 2013.*

*Citation: Putkinen V, Saarikivi K and Tervaniemi M (2013) Do informal musical activities shape auditory skill development in preschool-age children? Front. Psychol. 4:572. doi: 10.3389/fpsyg.2013.00572*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Putkinen, Saarikivi and Tervaniemi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Musical training heightens auditory brainstem function during sensitive periods in development

## *Erika Skoe<sup>1†</sup> and Nina Kraus<sup>1,2</sup>\**

*<sup>1</sup> Auditory Neuroscience Laboratory, Department of Communication Sciences, Northwestern University, Evanston, IL, USA*

*<sup>2</sup> Department of Neurobiology and Physiology, Department of Otolaryngology, Institute for Neuroscience, Northwestern University, Evanston, IL, USA*

#### *Edited by:*

*Virginia Penhune, Concordia University, Canada*

#### *Reviewed by:*

*Mireille Besson, CNRS, Institut de Neurosciences Cognitives de la Méditerranée, France; Joyce L. Chen, Sunnybrook Research Institute, Canada*

#### *\*Correspondence:*

*Nina Kraus, Auditory Neuroscience Laboratory, Northwestern University, 2240 Campus Drive, Frances Searle Bldg., Evanston, IL 60208, USA e-mail: nkraus@northwestern.edu URL: https://brainvolts.northwestern.edu*

#### *†Present address:*

*Erika Skoe, Department of Speech, Language, and Hearing Sciences; Department of Psychology Affiliate; Cognitive Science Program; University of Connecticut, Storrs, USA*

Experience has a profound influence on how sound is processed in the brain. Yet little is known about how enriched experiences interact with developmental processes to shape neural processing of sound. We examine this question as part of a large cross-sectional study of auditory brainstem development involving more than 700 participants, 213 of whom were classified as musicians. We hypothesized that experience-dependent processes piggyback on developmental processes, resulting in a waxing-and-waning effect of experience that tracks with the undulating developmental baseline. This hypothesis led to the prediction that experience-dependent plasticity would be amplified during periods when developmental changes are underway (i.e., early and later in life) and that the peak in experience-dependent plasticity would coincide with the developmental apex for each subcomponent of the auditory brainstem response (ABR). Consistent with our predictions, we reveal that musicians have heightened response features at distinctive times in the life span that coincide with periods of developmental change. The effect of musicianship is also quite specific: we find that only select components of auditory brainstem activity are affected, with musicians having heightened function for onset latency, high-frequency phase-locking, and response consistency, and with little effect observed for other measures, including lower-frequency phase-locking and non-stimulus-related activity. By showing that musicianship imparts a neural signature that is especially evident during childhood and old age, our findings reinforce the idea that the nervous system's response to sound is "chiseled" by how a person interacts with his specific auditory environment, with the effect of the environment wielding its greatest influence during certain privileged windows of development.

**Keywords: development, musical training, auditory brainstem response, sensitive periods, experience-dependent plasticity**

## **INTRODUCTION**

The auditory brain has an awesome capacity to change through experience. But are there limits to this plasticity throughout development? Are there biological guard rails that place limits on experience-dependent plasticity at some points in life or biological stimulants that promote plasticity at others? In this study, we examine these questions, focusing specifically on the auditory brainstem and what it can reveal about sensitive periods in the auditory brain and its ability to respond to sound.

Except in cases of brain death, the auditory brainstem is always "on" and metabolically active (Sokoloff, 1977; Chandrasekaran and Kraus, 2010). As evidence of this, the auditory brainstem response (ABR) is robust even under general anesthesia, during sleep, and while the participant's attention is directed elsewhere (Smith and Mills, 1989; Skoe and Kraus, 2010; Hairston et al., 2013). These steadfast qualities have made the ABR an invaluable clinical tool in the assessment and diagnosis of hearing and other auditory-related disorders (Hall, 2007). However, the fact that the ABR changes very little even under deep sleep or anesthesia has led to a stereotyping of the response, with many researchers and clinicians treating the ABR as merely a reflex that preserves many of the acoustic features of the stimulus. Yet through the analysis of large datasets and more complex stimulus conditions, a different picture has emerged (Galbraith, 2008). With this approach, we have learned that the auditory brainstem captures the physics of the sound (timing, fundamental frequency, harmonics, etc.) as well as the *meaning* (i.e., behavioral significance) attributed to that sound. In fact, recent data from developing, mature, and aging populations demonstrate that brainstem nuclei are refined by active interactions with sound occurring over brief (hours) or long (years) timescales (reviewed in: Krishnan and Gandour, 2009; Kraus and Chandrasekaran, 2010; Bajo and King, 2012; Kraus et al., 2012; Strait and Kraus, 2013). For example, across the lifespan, we observe differences in auditory brainstem function depending on the instrument a person plays or the language or languages a person speaks (Krishnan et al., 2010; Krizman et al., 2012a; Strait et al., 2012a), suggesting that the auditory brainstem's fundamental ability to capture sound is chiseled by idiosyncratic experiences with sound.

Despite ample evidence of experience-dependent plasticity in the auditory brainstem, we have an incomplete picture of how specific auditory experiences influence auditory brainstem development (Kraus and Chandrasekaran, 2010; Jeng et al., 2011; Kraus et al., 2012; Strait and Kraus, 2013). Auditory brainstem nuclei have been long considered to develop precociously, with adult-like function shown to emerge within the first two years of life (Salamy et al., 1975). However, this concept has recently been called into question by evidence that the auditory brainstem continues to develop beyond age 2 (Johnson et al., 2008b; Skoe et al., 2013). This new line of research suggests that the "adult-like" state that occurs around age 2 is only temporary, with each subcomponent of the response exhibiting a unique and more protracted developmental profile. We find generally that the ABR continues to change throughout childhood, ultimately overshooting the adult value, with the developmental inflection point (the point where the curvature of the trajectory changes sign) occurring around ages 5–11. After this inflection point the developmental trajectory "returns" to the adult value then stabilizes. Following this period of stabilization, aging-related changes begin to emerge, around the sixth decade of life. Taken together, these developmental processes manifest in a complex developmental trajectory with four main age-dependent features: (1) a steep initial gradient (∼neonatal to age 5), (2) an inflection point (ages ∼5 to 14), (3) a period of stabilization where the slope approximates zero (ages ∼14 to 50) and (4) a shallow gradient during senescence (ages ∼ 50+).

We theorize that this protracted development of the ABR creates greater opportunities for the sensory environment to influence neural function. We further theorize that the shape and time course of the developmental trajectory is biologically determined, with the trajectory providing a baseline on which experience-dependent processes can take root. Because of the undulating nature of the baseline, we posit that the influence of experience will wax and wane as the developmental trajectory changes slope over the life course, with the greatest effects coinciding with times when developmental changes are underway (Bengtsson et al., 2005; Fava et al., 2011). The inflection point may then, we speculate, reflect a "high point" within a sensitive window in development when experience-dependent plasticity is expected to be most pronounced (Kral et al., 2013).

Sensitive periods are restricted windows during development when a particular experience can have a profound and lasting effect on the brain and behavior. Knudsen has argued that sensitive periods are emergent "properties of neural circuits" (Knudsen, 2004), that is that they reflect points in development when a particular neural circuit is in a state of transition and therefore most labile. If the neural circuit receives heightened stimulation during that period of lability, this, we theorize, could exaggerate how the circuit responds during that window which, in turn, could affect how the circuit responds at a later point in time. Kral and colleagues have shown that sensitive windows in auditory cortical development coincide with transitory peaks in synaptic density in the cat (Kral and Eggermont, 2007; Kral and Sharma, 2012; Kral et al., 2013), which is consistent with the idea that sensitive periods reflect times of neural abundance (Jolles and Crone, 2012). Assuming the same holds for the auditory brainstem, then the inflection point in the developmental trajectory may reflect the height of synaptic overshoot, and therefore a critical turning point in the balance between synaptic proliferation and synaptic pruning (Kral and Sharma, 2012; Skoe et al., 2013). Synaptic overshoot has been argued to endow flexibility to the developing auditory system, allowing the system to be protected against sensory deprivation and primed to take advantage of sensory enrichment (Kral and Eggermont, 2007). This led us to ask whether the functional overshoot in auditory brainstem development represents a time of heightened interaction between nature and nurture, i.e., where the interaction between biologically-determined developmental processes and specific auditory experiences is most pronounced. 
We examine this question in a cross-sectional study of more than 700 participants spanning nearly 8 decades in age, by assessing how enriched auditory experience, resulting from extensive musical practice, affects auditory brainstem development.

Musical training comes in many forms. However, at their core, all pedagogies share the common feature of using music to engage sensory, motor, cognitive, emotional, and social skills. Through repeated practice these skills become more integrated and refined, resulting in a domain general enhancement. In the case of the auditory brainstem, the effects of musical training are not specific to musical stimuli (Musacchia et al., 2007; Lee et al., 2009; Bidelman et al., 2011; Strait et al., 2012a) but emerge in response to other complex sounds including speech and environmental sounds (Parbery-Clark et al., 2009a; Strait et al., 2009b, 2013a). This transfer of learning from one domain to another intimates a sharing of neural resources (Besson et al., 2011; Patel, 2011, in press): musical training fine-tunes how music is represented in the brainstem leading to the enhancement of acoustic features that are common to music and speech. These enhancements emerge as a distinctive neural signature, with musicians having earlier brainstem responses, more consistent responses and more robust amplitudes, especially at the high-frequency end of the response spectrum (reviewed in: Kraus and Chandrasekaran, 2010; Kraus et al., 2012; Strait and Kraus, 2013). Knowing that there is this transference between music and speech, we opted to use a short speech stimulus (40-ms "da" syllable) for the current study. We chose this particular speech stimulus because it is spectrotemporally complex yet short enough to capture many dimensions of the biological response to sound with minimal testing time (∼20 min), allowing us to more readily accumulate a large data pool. 
We have used this stimulus for nearly a decade as part of the standard protocol administered to all study participants and over time we have amassed a large dataset from a wide range of participants, enabling us now to provide the first comprehensive examination of how musicianship affects auditory brainstem function throughout life. It is important to note that although we have repeatedly demonstrated musician enhancements for longer speech stimuli (Wong et al., 2007; Parbery-Clark et al., 2009b, 2012b,c; Strait et al., 2012b, 2013a,b), we have not previously seen differences between "musicians" and "non-musicians" for this exact stimulus and collection protocol (unpublished data). However, previous analyses used small groups of participants within narrow age ranges. By examining the data en masse, from a broader, developmental perspective, we expected that musician effects would emerge as a consequence of increased statistical power. We predicted that the musician neural signature (earlier latencies, more robust high-frequency phase-locking, more consistent responses) would be evident throughout the lifespan but that the signature would be most pronounced during periods of developmental change.

## **MATERIALS AND METHODS**

All procedures were approved by the Northwestern University Institutional Review Board. Adult participants gave their written informed consent to participate. For infant and child participants, informed consent was obtained from the parent or guardian. Verbal assent was obtained from 3–7 year olds, and written assent was collected from 8–17 year olds using age-appropriate language. All participants were paid for their participation.

Auditory brainstem responses were recorded to a 40-ms speech syllable, /da/, following methodological conventions described previously (Skoe and Kraus, 2010) (**Figure 1**). We have adopted the terminology "cABR" to refer to ABRs to complex, naturalistic sounds such as speech and music, and will use it to refer to the neural recording throughout this report. cABRs reflect population neural responses from nuclei within the rostral brainstem, including the lateral lemniscus and inferior colliculus (IC) (Chandrasekaran and Kraus, 2010).

**FIGURE 1 | Characteristics of the cABR. (Top)** The complex stimulus [da] (gray) elicits a stereotyped cABR (black) with 6 characteristic peaks (V, A, D, E, F, O). V and A represent the onset response. D, E, and F occur within the frequency-following response (FFR), and O reflects the offset response. The stimulus waveform is shifted by ∼6.8 ms to maximize the visual coherence between the two signals in this figure. To obtain a measure of non-stimulus activity, the root-mean-square amplitude of the response to the 15 ms interval preceding the stimulus was taken. **(Bottom)** Frequency domain representation of the FFR (19.5–44.2 ms). Spectral amplitudes were calculated over three frequency ranges: low (75–175 Hz), mid (175–750 Hz) and high (750–1050 Hz). Waveforms represent the grand averages of the young adult group (21–40 year olds).

## **STIMULUS**

The /da/ stimulus is a five-formant synthesized syllable (Klatt, 1976) consisting of a high-frequency energy burst (occurring at 2500, 3500, and 4000 Hz) during the first 10 ms, followed by a voiced period with a gently ramping fundamental frequency (*F*<sub>0</sub>) (103–125 Hz). During the voiced period, the syllable transitions from a dental place of articulation, characteristic of /d/, to a place of articulation further back in the mouth associated with /a/. This shift in articulation is reflected by linearly changing formant frequencies: the first formant (*F*<sub>1</sub>) ramps up from 220 to 720 Hz, the second formant (*F*<sub>2</sub>) ramps down from 1700 to 1240 Hz, the third formant (*F*<sub>3</sub>) ramps down from 2580 to 2500 Hz, and the fourth and fifth formants are stable at 3500 and 4500 Hz, respectively.

## **PARALLELS BETWEEN THE STIMULUS AND RESPONSE**

One of the most striking features of cABRs is their fidelity to the stimulus (Skoe and Kraus, 2010). As seen in **Figure 1**, cABRs capture many of the temporal and spectral characteristics of the stimulus. The voiced /da/ stimulus evokes six characteristic response peaks (V, A, D, E, F, O) that relate to major acoustic landmarks in the stimulus, with each peak occurring roughly 6–8 ms after its corresponding stimulus landmark, a timeframe consistent with the neural transmission time between the cochlea and rostral brainstem. (For more information on the neural origins of the cABR we refer the reader to Chandrasekaran and Kraus (2010), where this topic is reviewed.) Peaks V and A are transient responses to the energy burst at the onset of the sound, peak O is an offset response that marks the cessation of sound, and the interval spanning D-E-F is the frequency-following response (FFR) to the *F*<sub>0</sub> of the stimulus and its harmonics. Within the FFR, the interval between the major peaks corresponds to the period of the syllable's *F*<sub>0</sub>. For natural speech, this interval represents the length of each glottal pulse. When air flows from the lungs through the vibrating glottis, a harmonically-rich sound is produced that is then filtered by the speech articulators to give rise to speech formants (concentrations of energy in the speech spectrum). In the cABR, smaller fluctuations between peaks D, E, and F reflect phase-locking to the harmonics of the *F*<sub>0</sub>, up until about 1000 Hz, where phase-locking in the IC drops off precipitously (Langner and Schreiner, 1988; Liu et al., 2006). Fourier analysis of the FFR (**Figure 1**, bottom) reveals spectral peaks at the *F*<sub>0</sub> and its harmonics, with an amplitude decay at higher frequencies.

The latency of cABR peaks (the lag between a specific stimulus feature, e.g., onset or offset, and the appearance of a peak) is affected by the stimulus spectrum, including frequencies above 1000 Hz (Johnson et al., 2008a; Skoe et al., 2011). Due to the tonotopic organization of the basilar membrane, higher frequencies yield slightly earlier peak latencies than lower frequencies. So while the neural delay between the stimulus and brainstem response generally falls between 6 and 8 ms, the exact latency is determined by the spectral composition of the stimulus at each particular point in time. Owing to the spectral makeup of the acoustically complex stimulus /da/, energy in the higher end of the spectrum diminishes over the duration of the syllable, such that peak V is driven more strongly by high frequencies than the other peaks.

#### **PARTICIPANTS**

The present report includes a total of 770 participants ranging in age from 0.25 to 72.41 years, 213 of whom were categorized as musicians, with the remainder representing the "general population" (**Table 1**). Infants were excluded from the analyses because of the lack of musicians in this age range and also because of the difficulty of defining musicianship at this age; however, they are included for reference in some of the figures. The youngest musician in the sample was 3.26 years old and the oldest was 70.12 years old. Data from many of these musician participants have been previously published for other stimuli. To create the musician group, we pooled data across multiple published and unpublished studies on musicians from our laboratory, adopting the categorization criteria of musicianship for each respective study (**Table 2**). The large majority of the musicians were "early musicians" (Penhune, 2011) who began training before the age of 7 and practiced on a regular basis.

None of the participants had a history of learning disabilities or neurological dysfunction and all participants had normal audiometric profiles. Normal hearing was confirmed by air-conduction thresholds (<20 dB HL for 500, 1000, 2000, 4000 Hz) for participants older than 5 years or an audiological screen (pass/fail based on distortion product otoacoustic emissions and/or behavioral response at 20 dB HL) for participants 5 and under. To further control for audiometric differences, click-evoked ABRs were recorded on all subjects (Hood, 1998) and confirmed to be within normal limits based on laboratory-internal norms.

Participants were divided into 9 groups by age (<1, 2–5, 5–8, 8–14, 14–17, 17–21, 21–40, 40–60, 60–73 years). Analyses included the 8 oldest groups. Throughout the paper, the age ranges are labeled "X-Y" where X refers to the youngest possible age in the group and Y refers to the next integer value after the maximum age cutoff for the group. For example, for the "2–5" year-old range, 2.00 is the youngest possible age and 4.99 is the oldest possible age. Therefore, there is no overlap between the 2–5 and 5–8 groups.
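The labeling convention can be made concrete with a minimal Python sketch. It assumes half-open intervals [X, Y) for the eight analyzed groups; the names `EDGES`, `LABELS`, and `age_group` are illustrative and are not taken from the study's analysis code.

```python
from bisect import bisect_right

# Lower bounds (years) of the eight analyzed groups, plus the upper cutoff of the last group.
EDGES = [2, 5, 8, 14, 17, 21, 40, 60, 73]
LABELS = ["2-5", "5-8", "8-14", "14-17", "17-21", "21-40", "40-60", "60-73"]


def age_group(age_years):
    """Return the 'X-Y' label for an age, treating each group as the interval [X, Y)."""
    if age_years < EDGES[0] or age_years >= EDGES[-1]:
        return None  # outside the analyzed range (for example, infants)
    return LABELS[bisect_right(EDGES, age_years) - 1]
```

Under this reading, a child aged 4.99 years falls in the "2-5" group and a child aged exactly 5.00 years falls in the "5-8" group, so no participant can belong to two groups.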

#### **ELECTROPHYSIOLOGICAL PROCEDURES**

We briefly summarize the protocol here; for a complete description of the specific protocol we refer the reader to Krizman et al. (2012b). During electrophysiological testing, participants sat in a recliner within a sound-treated chamber and were instructed to ignore the stimuli presented to their right ear via insert earphones (10.9/s, 80 dB SPL). cABRs were recorded using the Navigator Pro AEP System (Natus Medical, Inc.). Contact impedance was less than 5 kOhms for all electrodes. A total of 6000 trials were averaged, after excluding trials exceeding ±23.8 μV. To gauge the repeatability of the response over the course of the recording, two subaverages were collected (**Figure 2**).
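As a rough illustration of the averaging step, the sketch below rejects trials whose absolute amplitude exceeds the 23.8 μV criterion and then forms two subaverages. How the recording system assigns trials to the two subaverages is not described here, so the first-half/second-half split is an assumption, as are the function names.

```python
REJECT_UV = 23.8  # rejection criterion from the text, in microvolts


def average(trials):
    """Point-by-point mean of equal-length trials (each trial is a list of samples)."""
    n = len(trials)
    return [sum(samples) / n for samples in zip(*trials)]


def subaverages(trials, threshold=REJECT_UV):
    """Drop trials whose absolute amplitude exceeds the threshold, then average
    the first and second halves of the accepted trials (assumed split)."""
    kept = [t for t in trials if max(abs(x) for x in t) <= threshold]
    half = len(kept) // 2
    return average(kept[:half]), average(kept[half:])
```

The two returned waveforms are the inputs one would later correlate to quantify response consistency.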

### **ANALYSIS**

Analysis focused on four sets of measurements: peak latency (6 peaks), FFR amplitude (3 frequency ranges), response consistency, and non-stimulus activity (11 total dependent variables). Latency measurements were made manually via the AEP system, following guidelines described previously (Krizman et al., 2012b). All other data reduction occurred in the MATLAB programming environment (Mathworks, Inc.).

All of the cABR measurements included in the analyses are developmentally sensitive and exhibit age-dependent changes (Johnson et al., 2008b; Skoe et al., 2013). At least two different developmental patterns are expressed in the cABR in typically-developing populations, with the latency and FFR measures displaying a different developmental pattern than the response consistency and non-stimulus activity measures, which have similar but not identical developmental profiles (Skoe et al., 2013). The developmental trajectory for the latency and amplitude measures exhibits a transitory apex during school-age years that briefly overshoots the adult pattern, whereas the other two measures have a more symmetrical, broader trajectory with a more prolonged apex that extends from preadolescence into adulthood and lacks an overshoot period (Skoe et al., 2013).

*Participants were divided into 9 age groups. The number of participants and percentage of female participants is reported along with age statistics (mean, standard deviation, youngest age in group, and oldest age in group).*

#### **Table 2 | Musician definition by study.**

*Age range and musician definition as reported for each respective study.*

#### **PEAK LATENCY**

The response is characterized by 6 peaks, which are highly repeatable within and across participants (Russo et al., 2004; Song et al., 2011). These peaks are referred to as V, A, D, E, F, and O (**Figure 1**, top). Peak identification was confirmed by a team of experienced observers. Peaks that were not repeatable or did not exceed the noise floor were excluded from the analysis.

#### **FREQUENCY-FOLLOWING RESPONSE (FFR) AMPLITUDE**

The FFR (19.5–44.2 ms) reflects neural discharges that are phase-locked to the *F*<sub>0</sub> and its harmonics (Moushegian et al., 1973; Skoe and Kraus, 2010). To derive measures of phase-locking at different frequencies, the response was converted to the frequency domain by applying a fast Fourier transform (with zero padding) to the FFR of each participant, after first applying a Hanning ramp. The resultant response spectrum was averaged over three frequency bins: 75–175 Hz ("low"), 175–750 Hz ("mid"), and 750–1050 Hz ("high") (**Figure 1**, bottom). The bins were determined based on the acoustic features of the stimulus. The low bin encapsulates the *F*<sub>0</sub> of the stimulus, the mid bin encapsulates *F*<sub>1</sub>, and the high bin encapsulates harmonics above the *F*<sub>1</sub> that are still within the phase-locking limits of the rostral brainstem (Langner and Schreiner, 1988). The lower and upper boundaries of the analysis bins were set based on visual examination of the morphology of the response spectrum to ensure that the spectral peaks corresponding to the *F*<sub>0</sub> and *F*<sub>1</sub> were fully captured in the bin.
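The spectral measure can be sketched in a few lines of standard-library Python. This is an illustration rather than the laboratory's MATLAB routine: the sampling rate, the transform length `n_fft`, the use of a full Hann window in place of the "Hanning ramp", and the scaling by segment length are all assumptions, and a direct discrete Fourier transform stands in for the FFT (it gives the same coefficients, only more slowly).

```python
import cmath
import math

# Frequency bins (Hz) as given in the text.
BANDS_HZ = {"low": (75, 175), "mid": (175, 750), "high": (750, 1050)}


def hann(n):
    """Symmetric Hann window of length n (n must be at least 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def band_amplitudes(ffr, fs_hz, n_fft=4096):
    """Mean spectral magnitude of a windowed, zero-padded segment in each band."""
    windowed = [x * w for x, w in zip(ffr, hann(len(ffr)))]
    result = {}
    for name, (lo, hi) in BANDS_HZ.items():
        # DFT bins whose centre frequency falls inside [lo, hi].
        bins = [k for k in range(n_fft // 2 + 1) if lo <= k * fs_hz / n_fft <= hi]
        mags = []
        for k in bins:
            # Zero padding adds no terms to this sum; it only sets the
            # bin spacing to fs_hz / n_fft.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n_fft)
                        for i, x in enumerate(windowed))
            mags.append(abs(coeff) / len(ffr))  # scaled by segment length
        result[name] = sum(mags) / len(mags)
    return result
```

Passing the 19.5–44.2 ms segment of an averaged response, together with the recording's sampling rate, would return one amplitude per band, corresponding to the low, mid, and high measures.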

#### **RESPONSE CONSISTENCY**

To determine how consistent the FFR was over the course of the recording, we correlated the subaverages using a Pearson product-moment correlation calculation (**Figure 2**). Values were Fisher transformed to increase the normality of the data prior to analysis (Hornickel and Kraus, 2013).
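The consistency measure amounts to a correlation followed by Fisher's z transform, z = atanh(r). A minimal sketch follows, with illustrative function names and plain lists standing in for the two subaverage waveforms:

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def response_consistency(sub1, sub2):
    """Fisher z-transformed correlation between two subaverages."""
    # atanh is undefined at r = +/-1, so perfectly identical inputs raise ValueError.
    return math.atanh(pearson_r(sub1, sub2))
```

The transform stretches values near r = 1, where correlations between highly repeatable responses tend to cluster, which is why it helps normalize the distribution before an ANCOVA.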

#### **NON-STIMULUS ACTIVITY**

The magnitude of the response in the absence of stimulation was measured by calculating the root-mean-square amplitude of the averaged response to the 15 ms interval preceding the presentation of each stimulus.
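For illustration, the root-mean-square calculation can be written as follows. The sketch assumes that the averaged epoch begins exactly 15 ms before stimulus onset and that the sampling rate is known; both the epoch layout and the function names are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def prestimulus_rms(response, fs_hz, prestim_ms=15.0):
    """RMS over the first prestim_ms of an averaged response whose epoch
    is assumed to start prestim_ms before stimulus onset."""
    n = int(round(prestim_ms * fs_hz / 1000.0))
    return rms(response[:n])
```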

#### **STATISTICAL ANALYSES**

To determine how enriched auditory experience affects the developmental trajectory, we conducted an 8 × 2 ANCOVA in SPSS (version 21, IBM) using age group (8 levels) and musician group (2 levels) as the independent variables for each dependent measure, and covarying for the sex of the participant. For the latency measurements, we also covaried for the click-ABR peak V latency to factor out potential underlying differences in peripheral auditory function between groups (Hood, 1998). *F* and *p* statistics are reported, along with eta squared (η<sup>2</sup>), the estimated effect size. Following Cohen's conventions (Cohen, 1988), an effect size between 0.01 and 0.059 is considered small, between 0.059 and 0.138 is medium, and ≥0.138 is large.
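The effect-size conventions can be expressed as a small helper. The sketch also recovers an effect size from an *F* ratio under the assumption that the reported value is the partial eta squared that SPSS prints for this kind of model; the text says only "Eta squared", and the function names and the "below small" label are illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def cohen_label(eta_sq):
    """Classify an effect size using the cutoffs quoted in the text."""
    if eta_sq >= 0.138:
        return "large"
    if eta_sq >= 0.059:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "below small"
```

For example, an *F* of 4.469 on (1, 721) degrees of freedom gives about 0.006 under this assumption, which falls below the "small" cutoff.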

As a planned follow-up analysis, we examined whether the extent of the functional overshoot was larger in musicians compared to the general population. We operationally define *overshoot* to be a point on the developmental trajectory that exceeds the steady-state/stabilization point of the trajectory. To characterize the overshoot, we compared the 5–14 year olds to the young adults, an age range of presumed developmental maturity where the developmental trajectory is relatively stable. By comparing pediatric and adult brains, we adopt an approach similar to the landmark work of Huttenlocher and Dabholkar (1997), who studied synaptic overshoot by examining pediatric and adult human brains post mortem. For this analysis, we combined the 5–8 and 8–14 year-old groups into a single group because the 5–14 range appeared to represent a general period of overshoot across the various measures that we examined.

## **RESULTS**

#### **EFFECT OF MUSICIANSHIP ON THE DEVELOPMENTAL TRAJECTORY**

Musicians were found to differ from the general population for peak V latency [*F*(1, 721) = 4.469, *p* = 0.035, η<sup>2</sup> = 0.006], high-frequency phase-locking [*F*(1, 728) = 8.445, *p* = 0.004, η<sup>2</sup> = 0.011] and response consistency [*F*(1, 728) = 10.742, *p* = 0.001, η<sup>2</sup> = 0.015]. The main effect of group was trending for E and F latency, as was the interaction between age and group for the high-frequency phase-locking measure. No other main effects of group or group × age interactions were found (see **Table 3** for statistics, **Figures 3**–**5**).

#### **EFFECT OF MUSICIANSHIP ON THE DEVELOPMENTAL OVERSHOOT**

The developmental profile for the latency and FFR amplitude measures is characterized by a period of overshoot during childhood (occurring within the 5–14 year-old window), when the developmental trajectory briefly surpasses the adult value, as reflected by earlier latencies and larger amplitudes for children of this age compared to the adults (**Figures 3**–**5**). The overshoot is observed in both musicians and the general population; however, the extent of the overshoot is greater for the musicians for peak V latency and high-frequency phase-locking when comparing the 5–14 year olds to the 21–40 year olds [age × group interaction: *F*(1, 330) = 5.27, *p* = 0.02, η<sup>2</sup> = 0.016; *F*(1, 33) = 3.23, *p* = 0.05, η<sup>2</sup> = 0.011, respectively] (**Figure 6**). For the response consistency measure, the general population shows a u-shaped trajectory, with a prolonged (flat) apex and no overshoot. In contrast, musicians have a less symmetric trajectory for this measure that crests around age 8. Thus, whereas the trajectory is relatively flat for the general population between the child and adult values, musicians show a distinctive developmental pattern in which the musically-trained children have more consistent responses than musically-trained adults [*F*(1, 330) = 8.53, *p* < 0.005, η<sup>2</sup> = 0.025] (**Figure 6**).

*Omnibus F and p statistics are reported for each independent measure. P-values <0.1 appear in gray. In addition to the 6 peaks of the cABR, results are reported for peak V latency of the click-evoked ABR. The lack of musicianship effects for the click-evoked ABR reinforces that the effects of musicianship on peak V of the cABR are not driven by subclinical differences in peripheral audiometric function.*

## **DISCUSSION**

Auditory development can be experimentally controlled via deprivation or pharmacological manipulation, leading to the extension, delay, or re-opening of plasticity (Hensch, 2003; McLaughlin et al., 2010; Zhou et al., 2011). This raises the question of whether experiences incurred in the natural world can likewise alter the developmental timeline of the auditory system and manipulate sensitive windows in development (Shahin et al., 2004; Jolles and Crone, 2012). We examined this question by studying the interaction between experience-dependent plasticity and developmental plasticity, using musicians as a model of enriched auditory experience. We aimed to understand (1) which aspects of auditory brainstem development can be altered by musical training and (2) whether there might be windows in life when the effects of enriched experience are most pronounced. We theorized that the potential for experience-dependent plasticity exists throughout life but that experience-dependent processes will "ride" on top of developmental processes, resulting in a waxing and waning of experience-dependent plasticity that is constrained by the undulating developmental baseline for each subcomponent of the cABR. Based on previous reports, we also predicted that musical training would not affect all components of the response equally (reviewed in Kraus and Chandrasekaran, 2010; Kraus et al., 2012; Strait and Kraus, 2013). Consistent with our predictions, we observed differences between musicians and the general population for response latency, high-frequency phase-locking, and response consistency, three aspects of the cABR previously shown to be enhanced in musicians (Musacchia et al., 2007; Parbery-Clark et al., 2009a, 2012a,b; Strait et al., 2012b, 2013a,b). Across these different subcomponents of the response, the effect of musicianship appears most evident for younger and older age groups, with minimal differences for the adolescents and young adults (<40 years old). 
We also observe different developmental peaks and valleys for each component of the musician signature (onset latency, high-frequency phase-locking, response consistency), which lends support to the idea that musician advantages emerge in stages (Strait and Kraus, 2013).

#### **MUSIC EXPERIENCE MAXIMIZES FUNCTION DURING SENSITIVE PERIODS IN DEVELOPMENT**

The auditory brainstem undergoes at least two different developmental trajectories (Skoe et al., 2013). In the general population, latency and frequency-following components of the cABR have a similar developmental timeline that is marked by a transient period of functional overshoot. Response consistency and non-stimulus activity, on the other hand, exhibit a different developmental timeline that has a more prolonged apex and no overshoot. The current study allowed us to examine whether auditory enrichment, in the form of extensive musical training, alters these developmental profiles.

We find that the general morphology of the latency and frequency-following amplitude developmental trajectories is largely similar between musicians and the general population. In line with the theory that musical training is constrained by developmental trajectories (Trainor, 2005), this finding suggests that musical training does not speed up or radically alter the shape of the developmental profile for the latency and amplitude measures. Notably, however, while musical training does not change the *timeline* over which these developmental processes unfold, musical training does appear to interact with these developmental processes. In the case of peak V latency and high-frequency phase-locking, the expression of experience-dependent plasticity is greatest during the period of overshoot. Specifically, we found that the functional overshoot is more prominent in musicians for these latency and phase-locking measures, resulting in a bigger difference between the 5–14 year olds (i.e., the height of the overshoot in the general population) and the young adults in the musicians compared to the general population. We take this as evidence that experience-dependent plasticity is maximized during high points in development when neural resources are in abundance and the auditory system is undergoing a sensitive period for these measures.

For response consistency, the morphology of the musician trajectory is, however, rather different from that of the general population. The most notable difference is that the musician trajectory contains an overshoot whereas the general population has a more symmetric profile. Within this qualitatively different-looking trajectory, musicians appear to reach developmental high points earlier than the general population, perhaps suggestive of more rapid development of this aspect of the cABR in musicians. The effect can be seen most clearly for the 2–5 year-old musicians, whose response consistency is higher than that of the 5–8 year olds from the general population.

Taken together, in musicians we find that experience-dependent plasticity and developmental processes interact but that the nature of the interaction is different for different subcomponents of the musician signature. In the case of peak V and high-frequency phase-locking, the developmental curves are similar in shape between the musicians and general population, but the musician curve has a more pronounced overshoot. Thus, for these measures, it appears that developmental processes put constraints on how much of an effect the environment can have at each point along the trajectory, with a "soft spot" occurring around the period of overshoot, where the effects of musicianship are most amplified. Because the overshoot is not unique to the musician group, we argue that musical training is not triggering the overshoot or controlling the timing of the sensitive period for these subcomponents of auditory brainstem activity. In contrast, for the response consistency measure, musical training seems to change the shape of the developmental trajectory, leading to a period of overshoot that is not evident in the general population. This finding suggests that the environment can trigger changes in developmental processes that underlie the consistency of the response, but that the time points at which this can occur are developmentally constrained.

**than the general population (black).** This effect is present for the latency of peak V **(top)**, the amplitude of the high-frequency region of the frequency-following response **(middle)**, and response consistency **(bottom)**. Comparisons are made between the 5–14 year-old group and the 21–40 year-old groups. For all three measures, the difference between the child and adult values is greater in the musicians than in the general public. Error bars represent ± one standard error of the mean.

### **NEURAL MECHANISMS: CHANGES IN SYNAPTIC DENSITY RESULTING FROM ACTIVE ENGAGEMENT WITH AN ENRICHED ENVIRONMENT**

Developmental overshoot is thought to reflect a time of neural abundance in which synaptic density is heightened. In the auditory cortex, Kral and colleagues have recently demonstrated that experience-dependent cortical plasticity is increased when auditory experience, which in their case was cochlear implantation, coincides with the point when synaptic overshoot is at its developmental peak (Kral et al., 2013). Analogous to this, our findings suggest that music-related plasticity is heightened during the functional overshoot or, in the case of the response consistency measure, that musical training triggers an overshoot. We interpret this to mean that enriched auditory experience, in the form of musical training, amplifies neural proliferation in the auditory brainstem [for a related account see (Green et al., 2006)], manifesting in decreased cABR latencies, increased high-frequency amplitudes and increased response consistency relative to the general population during this time period. This early amassing of neural resources, which appears to be only temporary, may protect the nervous system later in life when aging-related processes set in and potentially lead the aging auditory system to operate as if it were biologically younger (e.g., earlier, more robust, and more consistent responses) (Luk et al., 2011; Zendel and Alain, 2011; Parbery-Clark et al., 2012a).

Research from developing and congenitally deaf animals suggests that overshoot in the auditory cortex emerges independent of auditory experience and that the mechanisms leading to the rise, overshoot, and fall of synaptic density are biologically preprogrammed (Kral and Sharma, 2012). We speculate that many of the same general developmental principles hold for the auditory brainstem and auditory cortex, while at the same time acknowledging potential differences between brainstem and cortical development. For example, based on their work with inborn deaf populations, Tillein et al. (2012) have argued that there is no sensitive period in auditory brainstem development. In deaf populations, the auditory brainstem (unlike the auditory cortex) remains in a state of arrested development until input is provided, after which auditory brainstem development proceeds along a similar trajectory relative to hearing populations no matter when implantation occurs, with the developmental trajectory being driven by "age in sound" instead of biological age (Gordon et al., 2011; Tillein et al., 2012). So while the auditory brainstem has the potential for normal development even if initially completely deprived of auditory input, once auditory input is provided, as is the case for the auditory cortex, developmental processes will ultimately depend on the *nature* and *quality* of that input (whether it be enriched or impoverished) (Moore, 2002; Gordon et al., 2011) as well as how the individual interacts with that input (Kuhl, 2003; Engineer et al., 2004; Kraus and Chandrasekaran, 2010). In addition to receiving an enriched soundscape that supports active and passive music listening, musicians interact with sound in many diverse ways.
We believe that it is through the combination of physically producing music, receiving multisensory feedback during musical performance, and engaging with music in socially and emotionally engaging ways that music is able to affect auditory development and transform how sound is processed by the brain.

#### **ARE THESE EFFECTS SPECIFIC TO MUSICAL TRAINING?**

The functional overshoot for onset latency and high-frequency phase-locking is maximized in musicians. This, combined with evidence that musicians exhibit a functional overshoot for response consistency but the general population does not, raises the question of whether we have discovered a sensitive window for musical training in the auditory brainstem. From our perspective, the answer is both yes and no.

We show here that musical training affects specific components of the cABR, which reinforces the concept that musical training produces a selective enhancement and not an overall gain across all components of the cABR. While the specific pattern of enhancements may be unique to musical training, we view functional overshoot as a general property of auditory brainstem development with the time window of the overshoot representing a period of great sensitivity in auditory brainstem development that is not specific to musical training. Thus, the fact that we observed a rather circumscribed effect of musicianship that was limited to a small set of measures does not necessarily mean that the other cABR measures are insensitive to enriched experience or that they lack a sensitive period. Instead we expect that auditory experience of any form, whether it be musical or linguistic, enriched or impoverished, would have an especially pronounced effect during times of developmental change but that different types of auditory experiences might have unique manifestations (reviewed in: Krishnan and Gandour, 2009; Kraus and Chandrasekaran, 2010; Kraus et al., 2012; Strait and Kraus, 2013). For example, musicians have a unique neural signature that can be distinguished from bilinguals (reviewed in Kraus and Nicol, 2014). As we have demonstrated here, musicians tend to have earlier brainstem responses and more robust amplitudes, especially at the high-frequency end of the response spectrum. Boosts in high-frequency phase-locking may reflect a musician's extensive experience with musical timbre, a perceptual feature of sound that is driven (at least in part) by the spectral shape of the harmonics. In contrast, when presented with the exact same stimulus, bilinguals show increased low-frequency phase-locking but no timing enhancements.
Increased low-frequency phase-locking may be the outcome of heightened attention to the fundamental frequency, a vocal feature that changes when a bilingual speaker switches languages (Altenberg and Ferrand, 2006; Krizman et al., 2012a). Because bilingualism appears to boost phase-locking to low frequencies in the cABR but not high frequencies (Krizman et al., 2012a), we predict that bilinguals will show a distinct developmental trajectory for low-frequency encoding in the auditory brainstem compared to monolinguals.

Thus, we theorize that each cABR subcomponent has the potential to change with auditory experience. We hope to use the current work as a canvas for examining the developmental trajectory of other populations, including bilinguals, to gain a deeper understanding of how developmental processes within the auditory brainstem are influenced by specific auditory experiences.

#### **FUNCTIONAL SIGNIFICANCE**

Music imparts a specific neural signature on the auditory brainstem. But what is the functional significance of this neural rewiring? For this large dataset, we are not in a position to directly answer this due to the lack of a common behavioral index that can be compared between groups or across ages. There are several reasons for this, with the first being that there is no single behavioral test of perceptual or cognitive function that can be applied to all age groups, from toddlers to older adults. This is in contrast to cABRs, where the exact same testing protocol can be used at all developmental stages. Second, these data were collected over the course of nearly a decade as part of smaller studies where the battery of tests was not entirely overlapping. So, even within an age group we do not have the same behavioral index on all subjects. That said, based on our specific pattern of results and the close mapping between stimulus and the response that characterizes the cABR, we are in a position to speculate on the behavioral significance of our findings. Of the various response peaks, peak V was the most different between the two groups. This peak, which signifies the neural response to the onset of sound, is driven by the initial high-frequency burst of the stop consonant of the stimulus. Earlier onset latencies and more robust high-frequency phase-locking are both indicators of greater neural synchrony in musicians. The combination of earlier latencies and greater high-frequency phase-locking also suggests that musicians might be especially sensitive to the high-frequency, timbral components of the stimulus. This boosting of the higher harmonics, we conjecture, may provide an alternative mechanism for capturing the fundamental frequency of the stimulus given that the harmonics, by definition, are integer multiples of the fundamental.
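The arithmetic behind this last conjecture can be made concrete. The sketch below is an illustration only: it is not a model of brainstem processing and not part of the analysis reported here. Given a set of harmonic frequencies, the largest frequency that divides all of them is the fundamental they imply, even when no energy is present at the fundamental itself. The function name and the example frequencies (upper harmonics of a hypothetical 100 Hz voice) are assumptions chosen for readability.

```python
from functools import reduce
from math import gcd


def implied_fundamental(harmonics_hz):
    """Return the largest integer frequency (Hz) that divides every harmonic.

    This equals the true fundamental only when the harmonic numbers share no
    common factor (e.g., harmonics 4, 5, 6, 7), which is an assumption of
    this illustration.
    """
    return reduce(gcd, harmonics_hz)


# Hypothetical upper harmonics; the 100 Hz component itself is absent.
upper_harmonics = [400, 500, 600, 700]
f0 = implied_fundamental(upper_harmonics)
print(f"Implied fundamental: {f0} Hz")
```

In this toy case the fundamental is recoverable from the higher harmonics alone, which is the sense in which enhanced high-frequency encoding could, in principle, also carry information about the fundamental frequency.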

We also know from previous studies that musicians outperform non-musician peers on a variety of behavioral tasks, including auditory working memory (Parbery-Clark et al., 2009a, 2011a), auditory attention (Strait et al., 2010), and perceiving speech in noise (Parbery-Clark et al., 2009a, 2011a), and that these behavioral advantages correlate with earlier latencies, larger high-frequency responses, and more consistent responses (Parbery-Clark et al., 2009a, 2012b; Kraus et al., 2012). To help further build the case that the neurophysiological differences we observe have functional consequences, our previous work has established that this same set of neural measurements is compromised in children with dyslexia (Wible et al., 2004; Banai et al., 2009; Hornickel et al., 2012). Taken together with our larger body of research, our pattern of findings therefore underscores the idea that the biological processes important for language and cognition are strengthened by musical experience (Patel, 2011; Strait and Kraus, 2011).

#### **DOES MORE EXPERIENCE TRANSLATE INTO MORE PLASTICITY?**

A large majority of our participants were "early musicians" (Penhune et al., 2005) who began music instruction before age 7 and continued to play for many years thereafter. Due to the small sampling of "late musicians" we are unfortunately not in a position to disentangle the effects of when musical training started from how long it lasted; however, given that most of the participants began training around the same age, our dataset can provide insight into how the developmental trajectory changes as more and more experience is accrued.

There are numerous examples indicating that increasing experience accentuates brainstem plasticity (Wong et al., 2007; Strait et al., 2009a; Parbery-Clark et al., 2011b); however, in our cross-sectional survey spanning nearly 8 decades, we find that the musician trajectory does not diverge further from the general population as experience mounts. Instead our findings indicate that musical experience is associated with an initial boost in auditory brainstem function during the first few years of practice, and that additional experience brings about a state of equilibrium in which the differences between musicians and the general population appear diminished when the developmental trajectory stabilizes but then re-emerge later in life when the developmental baseline begins to change. This finding is consistent with evidence that auditory-related plasticity emerges early during learning but "renormalizes" with additional training (Reed et al., 2011). Another, not mutually exclusive, interpretation is that musical training leads the auditory system to operate at its maximal biological capacity at each point in life, allowing the individual to achieve his/her genetic potential for a particular stimulus (Jolles and Crone, 2012). However, once the biological ceiling is met, additional plasticity cannot occur, even if more experience is amassed, unless, for example, there is a change in the underlying biology such as occurs through natural aging.

#### **WHEN DOES EXPERIENCE-DEPENDENT BRAINSTEM PLASTICITY FIRST EMERGE?**

Although experience-dependent plasticity may be maximized during the period of overshoot, our data suggest that experience-dependent plasticity is not limited to this time period. Experience-dependent brainstem plasticity is apparent in the 2–5 year olds, the youngest group of musicians we sampled, as well as the 60–73 year olds, the oldest group of musicians we sampled, with the caveat, however, that we cannot entirely rule out inherent, "baseline" differences between the musicians in our sample and those in the general population. But if our theory holds and experience-dependent effects undulate with age-dependent effects, then given the rapid developmental changes that occur prior to age 2, we predict that music-dependent plasticity could emerge earlier in the rare individuals who begin participating in musical activities before age 2. In the future, we hope to explore this prediction and to also study more generally how early auditory experiences including formal and informal language and music activities (Fava et al., 2011; Trainor et al., 2012; Putkinen et al., 2013) affect auditory brainstem development, interact with the neural mechanisms that give rise to sensitive periods, and lead to changes in auditory proclivities that affect auditory function later in life (McMahon et al., 2012; Yang et al., 2012).

#### **COMPARISONS, CAVEATS, AND GENERALIZATIONS**

This study was performed retrospectively on an existing dataset. While this gave us the benefits of a large dataset, and allowed us to examine the effect of musicianship across nearly 8 decades, the retrospective nature also placed limitations on the study. For example, compared with our previous studies, the groups of participants being considered here are more of a "mixed bag." To create the musician group, we carefully combed through our data pool to identify individuals with extensive musical training; however, due to the difficulty of establishing a single "musician" definition that applies to all ages, our musician group is not as precisely defined as in our previous work. For the general population, we also left in individuals with a nominal amount of musical training (<5 years). We took this approach because we wanted to understand how the developmental trajectory manifests under "normal" conditions; given that many individuals in the United States have had some small amount of musical training in school, an individual with zero years of music is in fact quite rare (Steinel, 1990), at least in middle-class and more privileged populations. So, due to how we created this group, we leave open the possibility that the general population could be displaying a lingering effect of past musical experience (Skoe and Kraus, 2012). We acknowledge that the "muddy" nature of the general population, combined with the uneven number of individuals per group, the greater number of females in the musician group, and the use of a short consonant-vowel stimulus, are all caveats that may have diminished group differences and dampened group by age interactions that appear upon visual inspection of the data (**Figures 3**–**5**) but do not emerge in the statistics. For example, in **Figure 3**, peaks E and F appear different between the two groups, but are only teetering on the verge of being statistically different (*p* < 0.1).
This leads us to predict that with more homogeneous populations, including uniform group definitions, more extensive latency effects would emerge.

We now turn to the question of whether our findings can generalize to other stimuli. With our short speech syllable, we are able to quickly (20-min paradigm) tap into developmental and experience-dependent processes that are common to speech and music processing. While the pattern of findings is expected to be largely similar regardless of the stimulus, greater effect sizes are anticipated for longer, more complex stimuli (Wong et al., 2007; Strait et al., 2009a), especially when those stimuli are presented in background noise (Parbery-Clark et al., 2009a). The diminishment of neurophysiological differences for the adult participants, we believe, can largely be explained by the stimulus. In young adults, we have previously observed neurophysiological differences between musicians and non-musicians for a similar, albeit longer, "da" speech stimulus, but only when that stimulus was masked by noise, not when it was presented alone (Parbery-Clark et al., 2009a). In contrast, for younger and older populations, musician effects have been seen in both noisy and "quiet" conditions (Parbery-Clark et al., 2012b; Strait et al., 2012b, 2013b). Thus, we treat the apparent lack of neurophysiological differences between musicians and the general population during adulthood as being reflective of the specific qualities of our stimulus and not as indicative of a lack of behavioral or other neurophysiological differences.

All caveats aside, the fact that we observed even modest differences between the musically-trained and general populations for a stimulus where musician effects have not previously been reported, we believe makes our findings all the more striking.

## **SUMMARY AND CONCLUSIONS**

This study examined the interaction between auditory development and enriched auditory experience. Our findings suggest that musical training can intensify neural function during sensitive periods in auditory brainstem development leading to enhancements for specific subcomponents of the cABR and not an overall boost in activity.

## **REFERENCES**


latency of auditory cortex neurons. *J. Neurophysiol.* 92, 73–82. doi: 10.1152/jn.00059.2004


## **ACKNOWLEDGMENTS**

We thank all of the members of the Auditory Neuroscience Laboratory, past and present, who helped in data collection and analysis. In addition we extend our thanks to Jennifer Krizman, Adam Tierney, Jessica Slater, Samira Anderson, Emily Spitzer, and Trent Nicol for their invaluable input on a previous version of this manuscript. This work was supported by the Northwestern University Knowles Hearing Center, The Mather's Foundation, NIH R01RO1 DC10016, NIH R01 DC01510, R01 HD069414, NSF 0921275, NSF 1057566, and NSF 1015614.


offsets age-related delays in neural timing. *Neurobiol. Aging* 33, 1483.e1–1483.e4. doi: 10.1016/j.neurobiolaging.2011.12.015


of the speech-evoked auditory brainstem response. *Clin. Neurophysiol.* 122, 346–355. doi: 10.1016/j.clinph.2010.07.009


*Cogn. Neurosci.* 6C, 51–60. doi: 10.1016/j.dcn.2013.06.003


representation of onset and formant structure of speech sounds in children with language-based learning problems. *Biol. Psychol.* 67, 299–317. doi: 10.1016/j.biopsycho.2004.02.002


processing. *Psychol. Aging* 27, 410–417. doi: 10.1037/a0024816

Zhou, X., Panizzutti, R., de Villers-Sidani, E., Madeira, C., and Merzenich, M. M. (2011). Natural restoration of critical period plasticity in the juvenile and adult primary auditory cortex. *J. Neurosci.* 31, 5625–5634. doi: 10.1523/JNEUROSCI.6470-10.2011

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 May 2013; accepted: 23 August 2013; published online: 19 September 2013.*

*Citation: Skoe E and Kraus N (2013) Musical training heightens auditory brainstem function during sensitive periods in development. Front. Psychol. 4:622. doi: 10.3389/fpsyg.2013.00622*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Skoe and Kraus. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The relationship between the age of onset of musical training and rhythm synchronization performance: validation of sensitive period effects

## *Jennifer A. Bailey\* and Virginia B. Penhune*

*Department of Psychology, Concordia University, Montreal, QC, Canada*

#### *Edited by:*

*Etienne De Villers-Sidani, McGill University, Canada*

#### *Reviewed by:*

*Nina Kraus, Northwestern University, USA Patricia Kuhl, University of Washington, USA*

#### *\*Correspondence:*

*Jennifer A. Bailey, Department of Psychology, Concordia University, Montreal, QC H4B 1R6, Canada e-mail: j.anne.bailey@gmail.com*

A sensitive period associated with musical training has been proposed, suggesting the influence of musical training on the brain and behavior is strongest during the early years of childhood. Experiments from our laboratory have directly tested the sensitive period hypothesis for musical training by comparing musicians who began their training prior to age seven with those who began their training after age seven, while matching the two groups in terms of musical experience (Watanabe et al., 2007; Bailey and Penhune, 2010, 2012). Using this matching paradigm, the early-trained groups have demonstrated enhanced sensorimotor synchronization skills and associated differences in brain structure (Bailey et al., 2013; Steele et al., 2013). The current study takes a different approach to investigating the sensitive period hypothesis for musical training by examining a single large group of unmatched musicians (*N* = 77) and exploring the relationship between age of onset of musical training as a continuous variable and performance on the Rhythm Synchronization Task (RST), a previously used auditory-motor task. Interestingly, age of onset was correlated with task performance for those who began training earlier; however, no such relationship was observed among those who began training in their later childhood years. In addition, years of formal training showed a similar pattern. However, individual working memory scores were predictive of task performance, regardless of age of onset of musical training. Overall, these results support the sensitive period hypothesis for musical training and suggest a non-linear relationship between age of onset of musical training and auditory-motor rhythm synchronization abilities, such that a relationship exists early in childhood but then plateaus later on in development, similar to maturational growth trajectories of brain regions implicated in playing music.

**Keywords: musical training, sensitive period, brain development, early training, working memory**

#### **INTRODUCTION**

A sensitive period is a window in development when specific training or experience produces long-term changes in behavior and the brain, above and beyond those associated with that same experience at a different time during development (Knudsen, 2004; de Villers-Sidani and Merzenich, 2011). Sensitive periods have been proposed for the visual and auditory systems, as well as for language learning (for reviews see Hensch, 2005; Hooks and Chen, 2007; de Villers-Sidani and Merzenich, 2011). A sensitive period for musical training has also been proposed based on evidence that early-trained musicians demonstrate advantages over late-trained musicians for musical tasks such as rhythm synchronization and pitch identification, as well as differences in brain structure, particularly in motor regions (Takeuchi and Hulse, 1993; Schlaug et al., 1995; Amunts et al., 1997; Steele et al., 2013). Recent studies from our laboratory have shown that musicians who begin training before age seven perform better on auditory and visual Rhythm Synchronization Tasks (RSTs) even when groups are matched for years of experience, formal training and hours of current practice (Watanabe et al., 2007; Bailey and Penhune, 2010, 2012). More recently, we have also shown that early trained musicians have greater gray matter in the premotor cortex and greater white matter integrity in the corpus callosum (Bailey et al., 2013; Steele et al., 2013). These studies compared early- and late-trained musicians using age seven as the dividing point between the groups. Although chosen based on previous findings (i.e., Schlaug et al., 1995), it can be argued that such a cut-point is arbitrary. Furthermore, sensitive periods arise due to the interaction between specific experience and the maturational trajectories of implicated brain regions. 
Most of the maturational trajectories of the auditory and motor regions implicated in musical training follow non-linear growth curves (Gogtay et al., 2004; Lebel et al., 2008). Previous studies of second language proficiency have shown that the age at which people acquire a second language shows a non-linear relationship with performance, likely mirroring the maturational trajectories of the relevant language regions (Johnson and Newport, 1989; Flege et al., 1999). Therefore, the purpose of the current study was to examine auditory-motor synchronization performance in a large sample of musicians who began training across a range of ages. Using these data, we then tested whether age of start of musical training and task performance followed a linear or a non-linear relationship. In addition, we examined the contribution of other factors, including years of formal musical training and individual differences in auditory working memory.

Previous studies support the sensitive period hypothesis for musical training by reporting differences in brain structure or task performance between groups of early- and late-trained musicians. One of the first studies reported greater corpus callosum surface area among musicians compared to non-musicians and showed that these differences were greater for those who began their training before age seven (Schlaug et al., 1995). In addition, differences in the corticospinal tract between early- and late-trained musicians have been reported (Imfeld et al., 2009), as well as a negative correlation between age of start of musical training and intrasulcal length of the precentral gyrus among keyboardists (Amunts et al., 1997). These results support the idea of a sensitive period for musical training; however, these studies did not control for the confounding fact that those who begin earlier likely have had more musical experience at the time of testing than their late-trained counterparts. Work in our laboratory has controlled for this by using a matching paradigm where groups of early- (ET; < 7) and late-trained (LT; > 7) musicians are matched for years of total playing experience, as well as years of formal training and hours of weekly practice. Evidence using this paradigm has provided more direct support for a sensitive period for musical training, such that ET musicians have consistently outperformed LT musicians on both visual-motor (Watanabe et al., 2007) and auditory-motor synchronization tasks (Bailey and Penhune, 2010, 2012). More recently, differences observed in the corpus callosum and the pre-motor cortex between ET and LT musicians have been identified using the matching approach (Bailey et al., 2013; Steele et al., 2013). As described above, a sensitive period arises when specific experience interacts with a particular phase of brain maturation.
Average anatomical maturational trajectories of gray matter and white matter in several regions of the brain follow non-linear growth curves, with peaks varying between ages 5 and 10 years old and with continued, but more subtle, change thereafter (Gogtay et al., 2004; Lebel et al., 2008). In other words, the maturation rates in these areas are not consistent across development: there are ages at which maturation is greatest, as well as ages at which maturation slows or plateaus. Evidence for the effects of musical training on brain structure is accumulating (for review see Jäncke, 2009; Wan and Schlaug, 2010); however, what remains to be investigated is whether this effect is linear or whether it mimics the maturational trajectories in the brain and is non-linear. The previous evidence cited compared groups of early- and late-trained musicians, and these findings could be explained by either a linear or a non-linear effect. Generally speaking, it may be the case that the earlier an adult musician begins their training, the better they will perform on a musical rhythm task, suggesting a linear relationship between age of onset of musical training and task performance. However, it may also be that the age at which an adult musician begins their training provides an advantage when performing a musical rhythm task only up to a certain age, after which the age of training commencement offers little advantage, depicting a non-linear relationship between age of onset of musical training and task performance. Using a single, large sample of musicians with a wider distribution of age of onset and years of formal training provides a complementary approach to the previously used group comparisons in order to empirically examine the type of relationship between age of onset of musical training and task performance.

If a non-linear effect is present in the data, the predictive strength of age of onset of musical training for task performance may vary as a function of when musical training begins. More specifically, if a sensitive period for musical training does exist, the age at which an individual begins within this window during development (e.g., 4 years old) is likely to predict auditory-motor synchronization performance; however, if an individual begins training outside of this window, the age at which they begin (e.g., 16 years old) may predict their task performance to a much lesser degree. In some cases, with large enough samples, a non-linear effect may be concluded when a third-order function accounts for more variability in the data than a first-order function. Of more relevance to the question at hand, comparing correlation values before and after a break point associated with a sensitive period has also been used. A similar question has been investigated in the domains of second-language acquisition and cochlear implant research. Of these studies, the most relevant to the current data set and question is the work of Johnson and Newport (1989), who investigated the relationship between age of arrival in the United States and English proficiency among second-language learners. They reported that prior to puberty (< age 15), a significant correlation between age of arrival and proficiency measures was observed, but no such relationship was observed for individuals arriving after age 15. In other words, age of arrival was only predictive of English proficiency if it took place prior to age 15, supporting a non-linear relationship between age of arrival and English proficiency. Flege et al. (1999) reported further evidence for a non-linear relationship between age of arrival and language proficiency measures in second-language learners.
They observed that simple correlations were present for certain ranges of age of arrival but that the correlation was not always consistent across the age range. One of the techniques they used to examine this was testing for discontinuities in their data by selecting specific ages as break points to divide the groups and then comparing the correlations in each group. The analyses in the current study have been largely based on the methods implemented in these studies in order to examine whether a linear or non-linear relationship exists between age of onset of musical training and auditory-motor synchronization performance. In addition to comparing correlations before and after the break point ages, slopes were also compared. Given that these analyses were designed to be exploratory in nature, there were no precise predictions for the current data set.

While age of onset of musical training has typically been the variable of interest for our lab, years of formal training and individual working memory scores are additional characteristics that have shown a relationship with auditory-motor synchronization performance on the Rhythm Synchronization Task (RST; Bailey and Penhune, 2010, 2012). This task requires participants to tap in synchrony with a series of auditory rhythms of varying metrical complexity (Chen et al., 2008). Previous studies using ET and LT musicians have revealed that performance on the RST is related to brain structure, musical training and cognitive abilities. ET musicians have been better able to reproduce the temporal structure of the rhythms, with no group differences on standard measures of global cognitive function (Vocabulary and Matrix Reasoning); however, individual working memory scores (Digit Span and Letter-Number Sequencing) have correlated with RST performance (Bailey and Penhune, 2010, 2012). A regression analysis confirmed that, even after considering individual working memory scores, early training accounted for additional variance in RST performance. Similar to working memory, individual years of formal musical training also related to RST performance, even though the musician groups were matched on this variable (Bailey and Penhune, 2010, 2012). Taken together, these results indicate that RST performance is predicted by the age at which musical training begins, the number of years of formal training and individual working memory abilities in musicians. Given that the sensitive period may result from an interaction between maturational trajectories of brain regions and experience, it stands to reason that the predictive value of all three of these variables for RST performance may change, depending on when musical training began.
One could expect that years of formal training would predict task performance more strongly among those who received years of training in their early childhood years compared to those who received their training in later years. Similarly, the predictive value of working memory scores for performance on the RST might change as a function of when musical training occurred during development. Using an unmatched large single sample of musicians, with a wider distribution of age of start and years of formal training, we have taken the opportunity to investigate these questions to further our understanding of a sensitive period for musical training.

In summary, the current study will explore the nature of the relationship between age of onset of musical training and performance on the RST in a large sample of musicians by first considering a linear correlation model, followed by break point analyses comparing correlation values to determine whether the relationship between age of onset and RST performance changes across development, similar to Johnson and Newport's (1989) approach and Flege et al.'s (1999) exploration of discontinuities in their data sets. Finally, changes in the predictive value of years of formal training and individual working memory scores for RST performance based on when musical training began will be explored. This is the first study to examine the nature of the relationship underlying the sensitive period effect for musical training using a single sample of musicians with a wide range in age of onset of musical training as well as length of formal training.

### **MATERIALS AND METHODS**

#### **PARTICIPANTS**

The current study uses a sample of 77 musicians between the ages of 18 and 37 (*M* = 24.91, *SD* = 4.97). This sample includes musicians previously tested in studies comparing early- and late-trained musicians using a matched samples design (Bailey and Penhune, 2010, 2012). For this study we tested additional musicians to cover a broader range of ages of start (3–17). The musical training and experience of each participant was determined through a Musical Experience Questionnaire (MEQ) that was developed within our laboratory (Bailey and Penhune, 2010, 2012). The MEQ quantifies the amount of instrumental and vocal training a musician has received, age of onset of this training, number of years of formal lessons and the amount of time dedicated to practicing on a weekly basis at the time of testing. Musicians had a range of musical experience (**Table 1**). All participants were neurologically healthy and were screened for significant head injuries, history of neurological disease or medication that could affect task performance. All participants gave informed consent and the Concordia University Research Ethics Committee had approved the protocol.

#### **TASKS**

Participants performed the RST (**Figure 1**), which was previously used in Bailey and Penhune (2010, 2012) and is a variant of the task used in Chen et al. (2008). In this task, participants are required to listen to and then tap in synchrony with a series of auditory rhythms of varying metrical complexity. The task consists of six woodblock rhythms varying in metrical structure and difficulty. Each rhythm lasts 6 s and is made up of 11 woodblock notes. Each rhythm contains five eighth notes (250 ms), three quarter notes (500 ms), one dotted quarter note (750 ms), one half note (1000 ms) and one dotted half note (1500 ms). Each trial has two parts: during the first part, participants listen to the rhythm without responding, and during the second part they listen and tap in synchrony using the computer mouse. Key press responses are recorded by the computer and used to score the data as described below. For a more detailed description of the RST, please see Bailey and Penhune (2010, 2012).

*Standard deviations are in brackets.*
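As a quick consistency check on the rhythm description, the listed note values can be tallied to confirm that they yield 11 notes and a 6 s rhythm. The sketch below is illustrative only; the names `NOTE_COUNTS_MS` and `rhythm_totals` are hypothetical and are not part of the original task software.

```python
# Note durations (ms) mapped to how many times each occurs in one rhythm,
# as listed in the task description. Names here are illustrative only.
NOTE_COUNTS_MS = {250: 5, 500: 3, 750: 1, 1000: 1, 1500: 1}


def rhythm_totals(note_counts_ms):
    """Return (number of notes, total duration in ms) for one rhythm."""
    n_notes = sum(note_counts_ms.values())
    total_ms = sum(duration * count for duration, count in note_counts_ms.items())
    return n_notes, total_ms


n_notes, total_ms = rhythm_totals(NOTE_COUNTS_MS)
print(n_notes, total_ms)  # 11 6000
```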

Participants completed the Digit Span and Letter-Number Sequencing subtests from the Wechsler Adult Intelligence Scale-III (WAIS) and the Vocabulary and Matrix Reasoning subtests from the Wechsler Abbreviated Scale of Intelligence (WASI; Wechsler, 1997, 1999). Digit Span requires individuals to recall strings of numbers and Letter-Number Sequencing requires individuals to recall and mentally manipulate strings of letters and numbers. Both of these subtests tap into working memory abilities; however, Letter-Number Sequencing imposes a heavier load on working memory, while Digit Span consists of a rote auditory memory recall section in addition to a mental manipulation section. Vocabulary assesses an individual's ability to orally define words and Matrix Reasoning assesses nonverbal reasoning and visual pattern recognition abilities. Both of these subtests are highly correlated with global IQ, yet are thought to represent different aspects of intelligence (Wechsler, 1999).

#### **PROCEDURE**

All participants followed the same procedure for data collection. Participants first completed one block of the RST followed by the Digit Span test. They then performed the second block of the RST, followed by Vocabulary, Letter-Number Sequencing and finally, Matrix Reasoning.

#### **MEASURES**

Information about musical training and experience from the MEQ was quantified for each participant to produce measures of years of experience, years of formal training and hours of weekly practice. Cognitive subtest results were scored according to standard procedure. A composite score for each participant's working memory abilities was created using their Letter-Number Sequencing and Digit Span scores and was used as the Working Memory variable. Performance on the RST was measured using three dependent variables: percent correct (PC), asynchrony (ASYN), and inter-tap-interval (ITI) deviation. A tap was considered correct if it was made within half of the onset-to-onset interval before or after a woodblock note (**Figure 2**). ASYN was defined as the absolute value of the temporal difference between the onset of each woodblock note and the associated mouse key press. ITI deviation was calculated by dividing the interval between each pair of the participant's taps by the interval between the corresponding pair of woodblock notes in the rhythms and subtracting this ratio from a value of one. This measure evaluates the extent to which the participant's tap interval deviates from the actual interval between each pair of woodblock notes and indicates how well participants reproduce the temporal structure of the rhythms.
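The three RST measures can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the scoring software used in the study: the function name `score_trial` is hypothetical, times are assumed to be in milliseconds, the first and last notes reuse their only neighbouring interval to define the response window, the tap closest to the note is kept when several fall in one window, and ITI deviation is summarized as a mean absolute value over consecutive correctly tapped notes.

```python
from statistics import mean


def score_trial(note_onsets, tap_times):
    """Score one trial (times in ms); returns (PC, mean ASYN, mean |ITI deviation|).

    Assumptions (not specified in the text): edge notes reuse their single
    neighbouring interval, and the closest tap wins within a window.
    """
    n = len(note_onsets)
    matched = []  # tap matched to each note, or None if the note was missed
    for i, onset in enumerate(note_onsets):
        # Window extends half the onset-to-onset interval before and after.
        prev_ioi = onset - note_onsets[i - 1] if i > 0 else note_onsets[1] - onset
        next_ioi = note_onsets[i + 1] - onset if i < n - 1 else prev_ioi
        lo, hi = onset - prev_ioi / 2, onset + next_ioi / 2
        in_window = [t for t in tap_times if lo <= t < hi]
        best = min(in_window, key=lambda t: abs(t - onset)) if in_window else None
        matched.append(best)

    # Percent correct: share of notes with a tap inside their window.
    pc = 100 * sum(t is not None for t in matched) / n

    # Asynchrony: absolute note-to-tap difference for matched taps.
    asyn = [abs(t - o) for t, o in zip(matched, note_onsets) if t is not None]

    # ITI deviation: one minus the ratio of tap interval to note interval.
    iti_dev = []
    for i in range(n - 1):
        if matched[i] is not None and matched[i + 1] is not None:
            tap_iti = matched[i + 1] - matched[i]
            note_ioi = note_onsets[i + 1] - note_onsets[i]
            iti_dev.append(abs(1 - tap_iti / note_ioi))

    mean_asyn = mean(asyn) if asyn else None
    mean_iti_dev = mean(iti_dev) if iti_dev else None
    return pc, mean_asyn, mean_iti_dev


# Made-up example: four notes, three taps (the last note is missed).
print(score_trial([0, 250, 750, 1250], [20, 240, 790]))
```

In this made-up trial, three of the four notes receive a tap within their window, so PC is 75%, while ASYN and ITI deviation are computed only from the matched taps.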

#### **DATA ANALYSIS**

In order to replicate findings from Bailey and Penhune (2010, 2012) that age of onset of musical training, individual working memory scores and amount of formal training contribute to RST performance in this larger and unmatched sample, one-tailed Pearson correlation analyses were conducted between the variables: ASYN, ITI Deviation, Age of Onset, Working Memory, and Formal Training. PC was not analyzed because it is a global measure of task performance and has not previously revealed group differences between early-trained musicians and late-trained musicians, nor is it informative about the exact timing of participant taps (Chen et al., 2008; Bailey and Penhune, 2010, 2012).

To test for evidence of break points in the data, the musicians were split using four different age of onset values. ET and LT groups were defined by using ages 6–9 (ET ≤ 6, *n* = 30, LT > 6, *n* = 47; ET ≤ 7, *n* = 38, LT > 7, *n* = 39; ET ≤ 8, *n* = 45, LT > 8, *n* = 32; ET ≤ 9, *n* = 50, LT > 9, *n* = 27). Correlation analyses were conducted between age of onset and RST performance for each of the ET and LT groups. Correlation coefficients were compared in each condition by calculating a *z*-test statistic according to the method designed by Fisher, and slopes were calculated using regression models and compared using *t*-test analyses. Subsequently, formal training and working memory variables were correlated with task performance in the ET and LT groups separately using the strongest break-point age of onset value. These analyses were conducted to investigate differences in predictive strength of task correlates as a function of age of onset of musical training.
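The break point procedure can be outlined in code. The sketch below is an illustration under assumptions rather than the analysis scripts used here: all function names are hypothetical, the Fisher comparison uses the standard r-to-z transform for two independent correlations with a two-tailed normal p-value, and the slope comparison uses one common formulation (difference in unstandardized slopes divided by the pooled standard error, with n1 + n2 − 4 degrees of freedom), since the text does not state the exact formula.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_at(ages, scores, break_age):
    """Split paired data into ET (age <= break_age) and LT (age > break_age)."""
    et = [(a, s) for a, s in zip(ages, scores) if a <= break_age]
    lt = [(a, s) for a, s in zip(ages, scores) if a > break_age]
    return et, lt


def fisher_z_compare(r1, n1, r2, n2):
    """z statistic and two-tailed p for two independent Pearson correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher r-to-z transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p under the normal curve
    return z, p


def slope_compare(b1, se1, n1, b2, se2, n2):
    """t statistic and degrees of freedom for two independent regression slopes.

    Assumed formulation; the source only states that t-tests were used.
    """
    t = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    df = n1 + n2 - 4
    return t, df


# Illustrative values only (not results from this study).
z, p = fisher_z_compare(0.5, 40, 0.1, 40)
print(round(z, 3), round(p, 3))
```

Repeating `split_at` for break ages 6 through 9 and passing the resulting group correlations and sizes to `fisher_z_compare` would reproduce the structure of the four break point conditions described above.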

### **RESULTS**

Age of onset of musical training did not significantly correlate with task performance measures using a linear correlation model with all musicians (**Table 2**), supporting the possibility of a non-linear relationship between age of onset of musical training and RST performance. In fact, using the four different break points in age of onset (i.e., ages 6–9) to split the musicians into ET and LT groups yielded results suggesting a non-linear relationship between age of onset of musical training and RST performance. All four break point conditions resulted in differential correlations between groups, with the ET group showing a positive correlation between age of onset and task performance (ASYN and ITI Deviation) and the LT group showing no correlation between age of onset and task performance. Of the four different conditions, when age 9 was used to divide the groups, the correlations between age of onset and task performance reached trend-level in the ET group (**Figure 3D**) and provided the strongest evidence supporting a non-linear relationship. However, the results from the Fisher transformation tests and slope comparison analyses suggest that the relationship between age of onset and task performance is most different when age 7 was used to divide the groups. The correlation results in each of the break point conditions are illustrated in **Figure 3** and the results from the Fisher transformation tests and slope comparisons can be found in **Tables 3** and **4**. Taken together, these results provide evidence that age of onset differentially predicts performance when training begins earlier as compared to later in development; however, there may exist a subtle change in effect around ages 7 through 9. These results do not support a discrete cut point, but instead an age range after which the predictive value of age of start of musical training plateaus.

**Table 2 | Pearson correlation analyses of musical demographics, working memory scores, and RST Performance.**

*A composite score for working memory was created from raw scores on the digit span and letter-number sequencing cognitive subtests.*

*\*p < 0.05.*

*\*\*p < 0.001.*

Linear correlation analyses across all musicians revealed a significant relationship between ITI Deviation and both working memory and formal training (**Table 2**), similar to previous findings. To investigate whether the predictive strength of task correlates differed based on age of start of training, years of formal training and working memory were examined in each musician group, using age 9 (ET ≤ 9, LT > 9) as the break point in the age of onset variable. A significant correlation between formal training and task performance (ITI Deviation) was observed for musicians who began training at age 9 or younger (**Figure 4**; *r* = −0.345, *p* < 0.01); however, this relationship was not significant among musicians who began training later (**Figure 4**; *r* = −0.161, *p* > 0.05). Working memory correlated with task performance in both groups (**Figure 5**). Finally, **Figure 6** illustrates a significant relationship between formal training and working memory among those who began training earlier but not among those who began their training later. It should be noted that similar patterns for task correlates were observed when age 7 was used as the break point to divide the groups into ET and LT musicians.

**Table 3 | Comparison of Pearson correlation coefficients of task performance and age of onset between early- and late-trained musicians in each age of onset break point condition.**


*†p = 0.1.*

**Table 4 | Comparison of slope values between early- and late-trained musicians in each age of onset break point condition.**


*Standard error values of unstandardized b coefficients (i.e., slope values) are in brackets.*

*\*p < 0.1.*

### **DISCUSSION**

The results from this study add to the growing body of evidence supporting a sensitive period for musical training. Moreover, this study is the first to examine whether a linear or non-linear relationship underlies the sensitive period effect for musical training. It is difficult to establish convincing evidence for a sensitive period associated with musical training because of the effects of additional variables. The current evidence does not point to a single specific age cut-off, but instead to a more subtle age range when the effect of age of start of training decreases or plateaus. The simple correlation break point analyses suggest that age of onset predicts rhythm synchronization performance if musicians begin training at or prior to age 9, but not afterwards. The results from the Fisher's z-transformation analyses and slope comparisons suggest that the relationship between age of onset and task performance differs the most when age 7 is used to divide the groups. Examining task correlates using age 9 to split musicians into Early-Trained and Late-Trained groups revealed that performance on the RST, as assessed by ITI Deviation, correlated with years of formal training only in the Early-Trained group. Working memory scores correlated with ITI Deviation in both groups; however, this correlation was stronger among those who began their training prior to or at age 9. It is important to note that when task correlates were compared between groups using age 7 as a dividing age, the same pattern of findings was observed. Overall, these results suggest that effects associated with age of onset or amount of formal training on the RST are stronger earlier in development, with a change occurring between ages 7 and 9, and may plateau thereafter. While these results are consistent with previous findings reporting a group difference between early- and late-trained musicians, they introduce the idea of a non-linear relationship between aspects of musical training (e.g., age of start and years of formal lessons) and auditory-motor synchronization skills across development, mirroring the maturational growth trajectories of the brain regions implicated in playing music.

**FIGURE 4 | Results from the correlation analyses between performance on the Rhythm Synchronization Task (RST) and years of formal training in each musician group using age 9 as the break point.**

Previous studies from our laboratory have investigated a sensitive period for musical training by comparing groups of early- and late-trained musicians (before and after age seven) who were matched for years of experience in an effort to isolate the effects of age of onset (Watanabe et al., 2007; Bailey and Penhune, 2010, 2012; Bailey et al., 2013; Steele et al., 2013). In contrast, the current study was designed to determine the nature of the relationship between age of onset of training and auditory-motor rhythm synchronization abilities in a single sample of musicians. The results from the correlation analyses support the hypothesis that the relationship between age of onset and task performance is not linear across development. These results are supported by previous research examining sensitive periods in the language and auditory domains showing that the relationship between age of acquisition of a second language or a cochlear implant and skill development is not linear across development, but instead reveals evidence for sensitive periods in development when this relationship is strongest (Johnson and Newport, 1989; Flege et al., 1999; Svirsky et al., 2004; Harrison et al., 2005). Furthermore, a non-linear relationship between age of onset and auditory-motor synchronization mirrors the maturational trajectories of the brain regions that comprise the auditory-motor neural network (Gogtay et al., 2004; Lebel et al., 2008). Maturational trajectories for gray and white matter in several brain regions follow a non-linear growth curve, with peaks varying between ages 5 and 10, depending on the region (Gogtay et al., 2004; Lebel et al., 2008). The primary motor cortices are among the first to mature (approximate peak at or prior to age 5; Gogtay et al., 2004), while the pre-motor cortex has a more protracted development (approximate peak at age 8.5; Gogtay et al., 2004). The posterior midbody of the corpus callosum connects the sensorimotor cortices of the two hemispheres (Hofer and Frahm, 2006; Chao et al., 2009) and this region undergoes significant developmental changes between the ages of 6 and 8 (Westerhausen et al., 2011). Interestingly, our previous studies using the matching paradigm reported differences between ET and LT musicians in the pre-motor cortex and the posterior midbody of the corpus callosum (Bailey et al., 2013; Steele et al., 2013).

**memory and years of formal training in each musician group using age 9 as the break point.**

Given that sensitive periods likely arise due to an interaction between maturational processes and experience, the current behavioral findings support and mimic the non-linear growth curves that have been observed during brain structure development. Previous studies provided evidence for a sensitive period around age seven, even though these studies were not designed to predict a specific age break point. The current results are not contradictory to these previous findings, but offer empirical evidence that ages 7 through 9 may be a period where a non-linear break in the relationship between age of start of musical training and auditory-motor synchronization skills takes place. The present findings suggest that the age of onset of musical training predicts auditory-motor synchronization abilities, if that training happens prior to a certain age range (7–9) but this effect stabilizes later in development, when brain networks are more mature and therefore less influenced by experience or training. Given the current exploratory nature of these analyses, it would be necessary to replicate these findings in a larger sample with more equal representation across age of onset as well as more accurate measures of age of onset (e.g., months), thus allowing the use of more stringent criteria in order to conclude the presence of a nonlinear relationship between age of onset of musical training and auditory-motor synchronization abilities in adulthood.

A secondary, but related, finding from the current study is that formal training relates to RST performance only in early starters. Given the strong correlation with age of onset of musical training (*r* = −0.534), it is not surprising that formal training shows a similar non-linear effect on RST performance. It may be that music lessons during the earlier years have a stronger influence on training auditory-motor synchronization skills that are implicated in the RST than music lessons during the later years. Alternately, there are potential differences in the type of formal instruction received in early childhood compared to during the later years. Musical training programs beginning before children are able to read focus on learning by listening to and reproducing music from an auditory model. These skills may be particularly relevant for training auditory-motor synchronization.

Unlike formal training, individual differences in working memory abilities were similarly related to RST performance across both musician groups. In addition, working memory scores were not significantly related to age of onset of musical training overall (*r* = −0.116, *p* > 0.1). This provides a good reminder that individual working memory abilities are important for RST performance, but do not seem to be related to age of onset of training. Furthermore, a relationship between individual working memory abilities and RST performance was also observed among a group of non-musicians (Bailey and Penhune, 2010, 2012), supporting the results that this relationship is unaffected by age of start of musical training.

Overall, the current study provides additional evidence for the sensitive period hypothesis for musical training and offers a more nuanced view of the relationship between age of onset of musical training and auditory-motor synchronization abilities. These results suggest the presence of a non-linear relationship between age of onset of musical training and auditory-motor synchronization, such that the age at which training begins is related to auditory-motor synchronization abilities in adults if that training begins in early childhood. This idea of a non-linear relationship is mirrored by growth trajectories of brain regions in the auditory-motor neural network and suggests that brain plasticity may decrease across development.

#### **REFERENCES**


*Ann. N.Y. Acad. Sci.* 1252, 163–170. doi: 10.1111/j.1749-6632.2011.06434.x


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 16 August 2013; accepted: 09 November 2013; published online: 29 November 2013.*

*Citation: Bailey JA and Penhune VB (2013) The relationship between the age of onset of musical training and rhythm synchronization performance: validation of sensitive period effects. Front. Neurosci. 7:227. doi: 10.3389/fnins.2013.00227*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2013 Bailey and Penhune. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*