# THE EFFECT OF HEARING LOSS ON NEURAL PROCESSING

EDITED BY: Jonathan E. Peelle and Arthur Wingfield

PUBLISHED IN: Frontiers in Systems Neuroscience

#### *Frontiers Copyright Statement*

*© Copyright 2007-2015 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-540-4 DOI 10.3389/978-2-88919-540-4

## About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

## Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

## Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

## What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

## **THE EFFECT OF HEARING LOSS ON NEURAL PROCESSING**

#### Topic Editors:

**Jonathan E. Peelle,** Washington University in St. Louis, USA

**Arthur Wingfield,** Brandeis University, USA

Pattern of noise-induced sensorineural hearing loss as a function of center frequency. Noise overexposure was associated with threshold elevation and increased tuning bandwidth. From Henry KS, Kale S and Heinz MG (2014) Noise-induced hearing loss increases the temporal precision of complex envelope coding by auditory-nerve fibers. Front. Syst. Neurosci. 8:20. doi: 10.3389/fnsys.2014.00020

Efficient auditory processing requires the rapid integration of transient sensory inputs. This is exemplified in human speech perception, in which long stretches of a complex acoustic signal are typically processed accurately and essentially in real-time. Spoken language thus presents listeners' auditory systems with a considerable challenge even when acoustic input is clear. However, auditory processing ability is frequently compromised due to congenital or acquired hearing loss, or altered through background noise or assistive devices such as cochlear implants. How does loss of sensory fidelity impact neural processing, efficiency, and health? How does this ultimately influence behavior?

This Research Topic explores the neural consequences of hearing loss, including basic processing carried out in the auditory periphery, computations in subcortical nuclei and primary auditory cortex, and higher-level cognitive processes such as those involved in human speech perception. By pulling together data from a variety of disciplines and perspectives, we gain a more complete picture of the acute and chronic consequences of hearing loss for neural functioning.

**Citation:** Peelle, J. E., Wingfield, A., eds. (2015). The Effect of Hearing Loss on Neural Processing. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-540-4

# Table of Contents



Jos J. Eggermont

Jennifer D. Gay, Sergiy V. Voytenko, Alexander V. Galazyuk and Merri J. Rosen

*Auditory training during development mitigates a hearing loss-induced perceptual deficit*

Ramanjot Kang, Emma C. Sarro and Dan H. Sanes

*Shaping the aging brain: role of auditory input patterns in the emergence of auditory cortical impairments*

Brishna Kamal, Constance Holman and Etienne de Villers-Sidani

*Neural correlates of moderate hearing loss: time course of response changes in the primary auditory cortex of awake guinea-pigs*

Chloé Huetz, Maud Guedin and Jean-Marc Edeline

*Noise-induced hearing loss increases the temporal precision of complex envelope coding by auditory-nerve fibers*

Kenneth S. Henry, Sushrut Kale and Michael G. Heinz

*Unilateral hearing during development: hemispheric specificity in plastic reorganizations*

Andrej Kral, Silvia Heid, Peter Hubka and Jochen Tillein

Sushmit Mishra, Thomas Lunner, Stefan Stenfelt, Jerker Rönnberg and Mary Rudner

*Auditory and cognitive factors underlying individual differences in aided speech-understanding among older adults*

Larry E. Humes, Gary R. Kidd and Jennifer J. Lentz

Antje Ihlefeld, Alan Kan and Ruth Y. Litovsky

Julia Erb and Jonas Obleser

*The effect of mild-to-moderate hearing loss on auditory and emotion processing networks*

Fatima T. Husain, Jake R. Carpenter-Thompson and Sara A. Schmidt

Robert Becker, Maria Pefkou, Christoph M. Michel and Alexis G. Hervais-Adelman

Claude Alain, Anja Roye and Claire Salloum

## The effects of hearing loss on neural processing and plasticity

Arthur Wingfield<sup>1</sup> \* and Jonathan E. Peelle<sup>2</sup> \*

<sup>1</sup> Volen National Center for Complex Systems, Brandeis University, Waltham, MA, USA, <sup>2</sup> Department of Otolaryngology, Washington University in St. Louis, St. Louis, MO, USA

Keywords: hearing loss, auditory cortex, cognition, aging, listening effort

Hearing loss—ranging from mild to severe—afflicts large numbers of individuals of all ages. It is estimated that 40–50% of adults over the age of 65 years have some degree of significant hearing loss, with this figure rising to 83% of those over the age of 70 (Cruickshanks et al., 1998). This makes hearing loss the third most prevalent chronic medical condition among older adults after arthritis and hypertension (Lethbridge-Cejku et al., 2004). Recent years have seen increasing appreciation for the downstream consequences of reduced hearing acuity, even when perception itself has been successful. In the case of speech, these consequences include negative effects of perceptual effort on encoding what has been heard in memory (Rabbitt, 1991; Surprenant, 1999; Pichora-Fuller, 2003; McCoy et al., 2005; Cousins et al., 2014) and comprehension of sentences whose processing is resource-demanding because of complex syntax (Wingfield et al., 2006). Beyond these short-term effects, there also appear to be small but statistically significant correlations between hearing acuity and the appearance of all-cause dementia (Gates et al., 2011; Lin et al., 2011b) and performance on standardized cognitive tests in non-demented individuals (Lin et al., 2011a). Strikingly, the relationship between hearing acuity and cognitive ability holds even when adjusted for sex, age, education, diabetes, smoking history, and hypertension (Lin, 2011; Lin et al., 2011a; Humes et al., 2013a).

The effects of impaired hearing thus go beyond difficulty in speech recognition. Speech comprehension in the face of mild-to-moderate hearing loss modifies patterns of neural activation in BOLD imaging, and analyses of structural MRI images have shown that poor hearing acuity is associated with reduced gray matter volume in auditory cortex (Peelle et al., 2011; Eckert et al., 2012; Lin et al., 2014). Findings such as these indicate a biological link between sensory stimulation and cortical integrity, consistent with animal models demonstrating neural reorganization when sensory input is disrupted. In humans, these effects on auditory cortex may have cascading influences throughout the hierarchical set of regions involved in speech processing (Davis and Johnsrude, 2003; Rauschecker and Scott, 2009; Peelle et al., 2010).

Understanding sensory-cognitive interactions represents an important research challenge, especially when changes in hearing acuity are compounded by declines in working memory resources and executive function that often occur in adult aging. One must also note claims of an increase in hearing loss among young adults (Shargorodsky et al., 2010), many of whom remain unaware of their hearing loss and the consequences of perceptual effort on cognitive performance (Widen et al., 2009; Le Prell et al., 2011). At the level of remediation, surgically placed cochlear implants have seen increasing use, including with older adults, when hearing acuity has declined to a point where standard hearing aids no longer yield significant benefit (Dillon et al., 2013). This emerging technology will call increasingly on the translational potential of basic research in auditory physiology currently active in human and animal studies.

This research topic presents a collection of original articles that explore the cognitive and neural consequences of hearing loss, including basic processes carried out in the auditory periphery, computations in subcortical nuclei and primary auditory cortex, and higher-level processes such as those involved in human speech perception. Together, these articles form a compelling body of work demonstrating numerous ways in which brain structure, neural function, and behavior are impacted by hearing loss.

#### Edited and reviewed by:

Maria V. Sanchez-Vives, Institució Catalana de Recerca i Estudis Avançats (ICREA) and Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Spain

#### \*Correspondence:

Arthur Wingfield and Jonathan E. Peelle, wingfiel@brandeis.edu; peellej@ent.wustl.edu

Received: 14 January 2015 Accepted: 19 February 2015 Published: 06 March 2015

#### Citation:

Wingfield A and Peelle JE (2015) The effects of hearing loss on neural processing and plasticity. Front. Syst. Neurosci. 9:35. doi: 10.3389/fnsys.2015.00035

We begin with seven review and theory articles. Rönnberg and coauthors offer a timely update of the Ease of Language Understanding (ELU) model in which they stress the importance of working memory for online spoken language processing, especially under poor listening conditions (Rönnberg et al., 2013). Heald and Nusbaum (2014) continue this theme, arguing that even early-stage speech recognition is an attentionally guided active process and not as automatic as some have suggested. Review articles by Guediche et al. (2014) and by Keating and King (2013) stress the flexibility in the perceptual system that allows for adaptation to auditory perturbations. Eggermont (2013) and Butler and Lomber (2013) focus primarily on animal models to explore effects of experience on auditory processing, while Bharadwaj et al. (2014) review human and animal studies demonstrating that precision in temporal coding may be poor even when hearing thresholds are normal. Taken together, these papers emphasize the view that auditory detection thresholds give only a limited picture of auditory and auditory-cortical processing.

Additional evidence bearing on plasticity and development appears in six research articles using animal models. Gay et al. (2014) and Kang et al. (2014) explore mechanisms underlying interactions between early conductive hearing loss and effects on detection tasks in adulthood, while Kamal et al. (2013) focus on impact and reversibility of noise exposure effects in auditory cortex. Huetz et al. (2014) examine functional modification to cortical cells in response to moderate hearing loss. Henry et al. (2014) report effects of noise-induced sensorineural hearing loss on complex temporal coding, and Kral et al. (2013) examine the implications of hemisphere asymmetries in cortical adaptation to unilateral hearing loss in development.

Studies in human listeners reveal many of the same aspects of plasticity in the perceptual system as seen in animal models. Avivi-Reich et al. (2014) illustrate the dynamic interaction between bottom-up input and top-down cognitive factors when older adults are challenged by listening to a target speaker in a background of multiple speakers and when listening in a second language. Mishra et al. (2013) continue this theme with an emphasis on the role of selective attention when listening to speech in noise. Humes et al. (2013b) examine individual difference factors that influence successful speech comprehension beyond peripheral hearing acuity. The value of in-depth studies of a single individual is illustrated by Firszt et al. (2013) who report neural and performance changes in an adult patient following successful surgery for a congenital unilateral hearing loss. Anderson et al. (2013) offer additional evidence bearing on plasticity in the sensory-cognitive system in a study of compensatory training through directed attention in hearing impaired older adults. McGettigan et al. (2014) address learning-related changes in speech recognition using noise-vocoded speech to simulate the acoustic input available from a cochlear implant. Finally, Ihlefeld et al. (2014) focus their research article on factors relating to cochlear implant recipients' decrements in the use of interaural time differences for localizing sound sources in space.

Considerable advances have been made using a number of human brain imaging techniques, as illustrated by a final eight articles in this collection that have examined effects of hearing loss using diffusion tensor imaging (DTI) to assess white matter integrity (Rachakonda et al., 2014), functional MRI to reveal patterns of neural reorganization and compensatory cognitive control with hearing loss and aging (Erb and Obleser, 2013; Husain et al., 2014), patterns of neural responses using electroencephalographic (EEG) recordings from scalp electrodes (Becker et al., 2013; Campbell and Sharma, 2013; Catz and Noreña, 2013; Tremblay et al., 2014) and magnetoencephalography (MEG) to examine contributory effects of reduced inhibitory control in older adults with hearing impairment (Alain, 2014).

Together, these collected articles reflect a valuable sample of current approaches to our understanding of the effects of hearing loss on neural and perceptual processing. A theme that emerges from both the human and animal studies in this collection is that of an adaptive plasticity in the sensory, perceptual and cognitive systems that regulates performance in the face of often seriously degraded input. Challenges for future research include better understanding the link between neural consequences of hearing loss and other modifications of acoustic input (Van Engen and Peelle, 2014) and a more direct linking of hearing ability, brain structure, neural function, and behavior.

## Acknowledgments

The preparation of this manuscript was aided by NIH grants R01AG019714 and R01AG03890 from the National Institute on Aging and The Dana Foundation.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Wingfield and Peelle. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Jerker Rönnberg<sup>1,2</sup>\*, Thomas Lunner<sup>1,2,3,4</sup>, Adriana Zekveld<sup>2,5</sup>, Patrik Sörqvist<sup>2,6</sup>, Henrik Danielsson<sup>1,2</sup>, Björn Lyxell<sup>1,2</sup>, Örjan Dahlström<sup>1,2</sup>, Carine Signoret<sup>1,2</sup>, Stefan Stenfelt<sup>2,3</sup>, M. Kathleen Pichora-Fuller<sup>2,7,8,9</sup> and Mary Rudner<sup>1,2</sup>*

*<sup>1</sup> Department of Behavioural Sciences and Learning, Linköping University, Linköping, Sweden*

*<sup>2</sup> Linnaeus Centre HEAD, Swedish Institute for Disability Research, Linköping University, Linköping, Sweden*

*<sup>3</sup> Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden*


*<sup>9</sup> The Rotman Research Institute, Baycrest Hospital, Toronto, ON, Canada*

#### *Edited by:*

*Arthur Wingfield, Brandeis University, USA*

#### *Reviewed by:*

*Natasha Sigala, University of Sussex, UK*

*Begoña Díaz, University Pompeu Fabra, Spain*

#### *\*Correspondence:*

*Jerker Rönnberg, Department of Behavioural Sciences and Learning, Linköping University, SE-581 83 Linköping, Sweden e-mail: jerker.ronnberg@liu.se*

Working memory is important for online language processing during conversation. We use it to maintain relevant information, to inhibit or ignore irrelevant information, and to attend to conversation selectively. Working memory helps us to keep track of and actively participate in conversation, including taking turns and following the gist. This paper examines the Ease of Language Understanding model (i.e., the ELU model, Rönnberg, 2003; Rönnberg et al., 2008) in light of new behavioral and neural findings concerning the role of working memory capacity (WMC) in uni-modal and bimodal language processing. The new ELU model is a meaning prediction system that depends on phonological and semantic interactions in rapid implicit and slower explicit processing mechanisms that both depend on WMC albeit in different ways. It is based on findings that address the relationship between WMC and (a) early attention processes in listening to speech, (b) signal processing in hearing aids and its effects on short-term memory, (c) inhibition of speech maskers and its effect on episodic long-term memory, (d) the effects of hearing impairment on episodic and semantic long-term memory, and finally, (e) listening effort. New predictions and clinical implications are outlined. Comparisons with other WMC and speech perception models are made.

**Keywords: working memory capacity, speech in noise, attention, long-term memory, hearing loss, brain imaging analysis, oscillations, language understanding**

## **BACKGROUND**

#### **OVERVIEW**

Some 30 years ago, we began a program of research to investigate the factors related to individual differences in speechreaders' ability to understand language. The findings underscored the importance of working memory capacity (WMC) for explaining those individual differences. In subsequent research, we extended our investigations to examine the associations between WMC and language understanding in other conditions, with the most recent focusing on audio-only speech understanding in adverse listening conditions by listeners using hearing aids.

The Ease of Language Understanding model (i.e., the ELU model, Rönnberg, 2003; Rönnberg et al., 2008) was developed, tested, and refined in an attempt to specify the role of working memory (WM) in a wide range of conditions in which people with normal or impaired hearing understand language. The language signal may be uni-modal or bi-modal speech or sign language and background conditions are realistic but provide contextual support or environmental challenge to differing degrees.

#### **SPEECHREADING AS COMPENSATION FOR HEARING LOSS**

Hearing loss leads to poorer perception of auditory speech signals and greater reliance on visual information available from the talker's face. Thus, we hypothesized, initially, that daily practice in visual speechreading by individuals with profound hearing loss or deafness would lead to superior, *compensatory* speechreading or speech understanding skills in comparison to normally hearing peers. One of the findings that motivated this hypothesis was that visual speechreading ability varies enormously between individuals (see Rönnberg, 1995 for a review). To test this hypothesis, we conducted several studies of speechreading in well-matched groups of individuals with normal hearing, moderate hearing loss and profound hearing loss. Contrary to the prediction, there were no significant group differences and thus no evidence of compensation for hearing loss by better use of visual speech information. Results were similar irrespective of type of presentation (video vs. real-life audiovisual; Rönnberg et al., 1983), type of materials (digits vs. discourse; Rönnberg et al., 1983), for just-follow conversation tasks (Hygge et al., 1992), for different durations of impairment, and for different degrees of hearing loss (e.g., Lyxell and Rönnberg, 1989; Rönnberg, 1990; Rönnberg et al., 1982, 1983). Spontaneous compensation for hearing loss through speechreading seemed, therefore, to be a cherished myth (Rönnberg, 1995).

#### **PERCEPTUAL AND COGNITIVE SKILLS**

These data prompted us to look for other ways to try to explain at least parts of the large variability in speech understanding observed across individuals (see Rönnberg et al., 1998a, for an overview). In a set of studies, we identified the following predictor variables: verbal inference-making (sentence completion, Lyxell and Rönnberg, 1987, 1989), context-free word decoding (Lyxell and Rönnberg, 1991), and information processing speed that relies on semantic long-term memory (LTM; e.g., lexical access speed, Rönnberg, 1990; as well as rhyme decision speed; Lyxell et al., 1994; Rönnberg et al., 1998b). Indirect predictors of sentence-based speechreading performance included the VN 130/P200 peak-to-peak amplitude measure in the visual evoked potential (Rönnberg et al., 1989); WMC measured by the reading span test (Lyxell and Rönnberg, 1989; Pichora-Fuller, 1996); and verbal ability (Lyxell and Rönnberg, 1992). Overall, the indirect predictors were found to be related to sentence-based speechreading *via* their relationships with the direct predictors. This set of results demonstrated that WMC is strongly related to verbal inference-making, which in turn is related to speechreading skill (Lyxell and Rönnberg, 1989); the amplitude of the visual evoked potential is related to speechreading via word decoding (Rönnberg et al., 1989); and verbal ability is related to speechreading via its relation to lexical access speed (Lyxell and Rönnberg, 1992).

#### **OTHER MODALITIES OF COMMUNICATION**

A more general picture emerged as evidence accumulated that many of the predictor variables also related to other forms of communication. Successful visual-tactile speech communication and cued speech (i.e., a phonemic-based system which uses hand shapes to supplement speechreading) are predicted by phonological skills (e.g., Leybaert and Charlier, 1996; Bernstein et al., 1998; Leybaert, 1998). The precision of a phonological representation assessed by text-based rhyme tests has been shown to be an important predictor of the rate of visual-tactile (Rönnberg et al., 1998b; Andersson et al., 2001a,b), and visual speech tracking (Andersson et al., 2001a,b). In the same vein, audio-visual speech understanding in cochlear implant (CI) users is predicted by both phonological ability and individual differences in WMC measured using a reading span test (Lyxell et al., 1996, 1998).

#### **IMPORTANCE OF WM**

Thus, about a decade ago, the data were pointing to an important role for WMC in predicting, directly or indirectly, the individual differences in speech understanding in one or more modalities. Testing participants who were hard-of-hearing or deaf provided clues as to how to re-conceptualize theories concerning speech understanding in individuals with normal hearing to take into account how their performance varies across a continuum from ideal to adverse perceptual conditions. However, more direct tests of the hypothesis concerning the importance of WMC for speech understanding in atypical cases and conditions were needed.

## **SPECIFYING THE ROLE OF WMC IN EASE OF LANGUAGE UNDERSTANDING**

#### **DEFINING WM**

WM is a limited capacity system for temporarily storing and processing the information required to carry out complex cognitive tasks such as comprehension, learning, and reasoning. An individual's WMC, or span, is measured in terms of their ability to simultaneously store and process information. Importantly, *complex* WMC is the crucial ability when it comes to understanding language, viz., being able to store *and* process information relatively simultaneously. Simple span tests, such as digit span, mainly tap storage functions in short-term memory, and tend to be poorer predictors of language comprehension, reading ability and speechreading ability [for an early review see Daneman and Merikle (1996); but see Unsworth and Engle (2007)]. We have usually assessed WMC using a *reading* span test (Daneman and Carpenter, 1980; Rönnberg et al., 1989; Just and Carpenter, 1992). In the reading span test procedure, the participant reads a sentence as quickly as possible and then performs a task to ensure that the sentence has been fully processed. After a small set of sentences has been presented, read and understood, the participant is asked to recall either the first or last word of each of the sentences in the set in the order in which they were presented. Set size gradually increases and the WM span is determined to be the largest set size for which the individual can correctly recall a minimum specified proportion of the words. We have found in our research that the total number of words correctly recalled in any order is a more sensitive predictor variable than set size (Rönnberg et al., 1989; Lunner, 2003; Rönnberg, 2003). The basic assumption is that, as the processing demands of the reading span task increase, there will be a corresponding decrease in how much can be stored in the limited capacity WM system. Total reading span score is used to gauge this trade-off between WM processing and storage.
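The two ways of scoring a reading span test described above can be contrasted in a short Python sketch. It is only an illustration under stated assumptions: the `Trial` structure, the function names, the example words, and the default threshold of 0.5 are all hypothetical, since the text specifies neither the minimum proportion nor any data format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Trial:
    """One set of sentences (hypothetical structure, not from the source)."""
    targets: List[str]   # to-be-recalled words, in presentation order
    recalled: List[str]  # words the participant reported, in reported order


def words_correct_in_order(trial: Trial) -> int:
    """Count words recalled in the same serial position as presented."""
    return sum(1 for t, r in zip(trial.targets, trial.recalled) if t == r)


def words_correct_any_order(trial: Trial) -> int:
    """Count target words recalled regardless of order (each credited once)."""
    remaining = list(trial.targets)
    hits = 0
    for word in trial.recalled:
        if word in remaining:
            remaining.remove(word)
            hits += 1
    return hits


def total_recall_score(trials: List[Trial]) -> int:
    """Total number of words recalled in any order across all sets."""
    return sum(words_correct_any_order(t) for t in trials)


def span_score(trials: List[Trial], min_proportion: float = 0.5) -> int:
    """Largest set size whose in-order recall proportion meets the threshold.

    min_proportion is an assumed value; the source only says
    "a minimum specified proportion". Returns 0 if no set size passes.
    """
    by_size: Dict[int, Tuple[int, int]] = {}
    for t in trials:
        size = len(t.targets)
        hits, total = by_size.get(size, (0, 0))
        by_size[size] = (hits + words_correct_in_order(t), total + size)
    passing = [s for s, (hits, total) in by_size.items()
               if total and hits / total >= min_proportion]
    return max(passing, default=0)


# Made-up example data: set sizes 2, 3 and 4.
example_trials = [
    Trial(["cat", "sun"], ["cat", "sun"]),
    Trial(["pen", "hat", "sea"], ["hat", "pen", "sea"]),
    Trial(["map", "ice", "oak", "jar"], ["map", "oak"]),
]

if __name__ == "__main__":
    print(span_score(example_trials))          # 2
    print(total_recall_score(example_trials))  # 7
```

In this made-up example the participant recalls every word of the three-item set but in the wrong order. The set-size score stops at 2, whereas the any-order total still credits those words, which gives some intuition for why the total might discriminate more finely between individuals.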

There is a strong cross-modal relationship between reading span scores (visual-verbal) and spoken communication skills (auditory-verbal), implying that it is supported by a *modality-general* ability (cf. Daneman and Carpenter, 1980; Just and Carpenter, 1992; Kane et al., 2004). This may explain why WMC has a predictive power that applies to several communicative forms (e.g., Ibertsson et al., 2009). Moreover, the reading span test (and other similar complex span tests) seems to tap into semantic *processes* such as inhibition of irrelevant information (in particular inhibition of context-irrelevant word-meaning; Gunter et al., 2003), the ability to *selectively attend to* one channel of information (Conway et al., 2001), the ability to *divide attention* between channels (Colflesh and Conway, 2007), and the ability to *store* and integrate signal-relevant prior *semantic* cues (Zekveld et al., 2011a, 2012). The similarity of results across the wide range of conditions applied in these studies supports the role of WMC in on-line language processing. Our research assessed the role of WMC during language understanding in various conditions such as when speech is processed visually by speechreaders, when auditory speech is heard in noise, by listeners with hearing impairment, when hearing aids are used, and when sign rather than speech is the signal used to convey language.

#### **EXTREME SPEECHREADING SKILL**

Case studies of extremely skilled speechreaders (Rönnberg, 1993; Lyxell, 1994; Rönnberg et al., 1999) demonstrated that bottom-up processing skills (e.g., lexical access speed and phonology) are only important up to a certain *threshold*, or level of language understanding. The threshold is assumed to be due to the efficiency of phonologically mediated lexical access, constrained by neural speed at different levels in the perceptual-cognitive system (Pichora-Fuller, 2003). We showed that to surpass this threshold, and to become speechreading experts, the individual has to be equipped with large complex WMC and related verbal inference-making and/or executive skills (see also Andersson and Lidestam, 2005). This seemed to be true irrespective of communicative habit: participant GS used tactile speechreading (Rönnberg, 1993), participant MM was bilingual in sign and speech (Rönnberg et al., 1999), and participant SJ used visual speechreading strategies only (Lyxell, 1994). The effects could not be explained in terms of age, degree of hearing loss, or even onset of the loss.

#### **NOISE, HEARING LOSS, AND HEARING AIDS**

The importance of predictions of individual differences in speech understanding based on reading span was crucially demonstrated when a strong association was found with spoken sentence recognition in noise by individuals with hearing loss irrespective of whether they were tested with or without hearing aids (Lunner, 2003). During conversation, the individual who has impaired hearing must orchestrate the interplay between distorted perceptual input, LTM, and contextual cues. We argue that the storage and processing abilities reflected in complex WMC tasks are essential for such compensatory interactions in people with hearing loss. WMC also seems to play an important role when people with normal hearing must understand language spoken in acoustically adverse conditions (for discussions see Mattys et al., 2012; Pichora-Fuller et al., 1995; McKellin et al., 2007).

#### **SIGN LANGUAGE**

Insight into the role of WM in sign language communication also led to a series of studies at our lab investigating the neurocognitive mechanisms of WM for sign language (Rönnberg et al., 2004; Rudner et al., 2007, 2010, 2013; Rudner and Rönnberg, 2008a,b). These studies demonstrate similar neurocognitive mechanisms across language modalities with some modality-specific aspects. These language modality-specific differences include a greater involvement of superior parietal regions in WM for sign language and a de-emphasis of temporal processing mechanisms.

Taken together, the speech understanding and sign language findings set the stage for formulating the Ease of Language Understanding model (ELU, see Rönnberg, 2003; Rönnberg et al., 2008) to extend existing more general models of WM in order to account for a wide range of communication conditions.

## **THE ORIGINAL WM SYSTEM FOR** *E***ASE OF** *L***ANGUAGE** *U***NDERSTANDING (ELU)**

The broader context of the ELU model is that of cognitive hearing science. Cognitive Hearing Science is the new field that has emerged in response to general acknowledgement of the critical role of cognition in communication (Arlinger et al., 2009). Characteristic of cognitive hearing science models is that they emphasize the subtle balancing act, or interplay, between bottom-up and top-down aspects of language processing (e.g., Schneider et al., 2002; Scott and Johnsrude, 2003; Tun et al., 2009; Mattys et al., 2012). The ELU model describes how and when WM is engaged to support listening in adverse conditions, and how it interacts with LTM. In the original version we did not distinguish between episodic and semantic LTM, but subsequent research and theoretical development have shown that this is an important distinction (see under EXTENDING THE ELU APPROACH). Episodic memory is memory of personally experienced events (tagged by time, place, space, emotion and context, see Tulving, 1983). Semantic memory refers to general knowledge, without personal reference (e.g., vocabulary and phonology).

In the original model (see **Figure 1**; Rönnberg, 2003; Rönnberg et al., 2008; Stenfelt and Rönnberg, 2009), we assumed that multimodal speech information is Rapidly, Automatically, and Multimodally Bound into a PHOnological representation in an episodic buffer (cf. Baddeley, 2000, 2012) called RAMBPHO. RAMBPHO is assumed to operate with syllables that feed forward in rapid succession (cf. Poeppel et al., 2008; Bendixen et al., 2009). If the RAMBPHO-delivered sub-lexical information matches a corresponding syllabic phonological representation in semantic LTM, then lexical access will be successful and there is no need for top-down processing. And, if RAMBPHO continues to provide matching syllabic information, lexical retrieval will continue to occur implicitly and at a rapid rate. The time-window for the assembly of the RAMBPHO information and for successful lexical retrieval is assumed to start when activation begins at a cortical level (superior temporal gyrus (STG)/posterior superior temporal sulcus; Poeppel et al., 2008), where the neural binding of syllabic auditory and visual speech seems to occur (around 150 ms after speech onset, Campbell, 2008), and then it generally takes another 100–250 ms before lexical access presumably occurs supported by neural mechanisms in the left middle temporal gyrus (MTG)/inferior temporal gyrus (Poeppel et al., 2008; see also Stenfelt and Rönnberg, 2009). If, however, the RAMBPHO information cannot be immediately related to phonological representations in semantic LTM or is not precise enough to match them unambiguously, lexical access is delayed, temporarily disrupting the feed-forward cycle of information flow. Explicit and deliberate WM processes are assumed to be invoked to compensate for this mismatch between RAMBPHO output and LTM representation. These explicit processes typically operate on another time scale, measured in seconds rather than milliseconds (Rönnberg et al., 2008).
Examples of such processes include inference-making, semantic integration, switching of attention, storing of information, and inhibiting irrelevant information. While the source of the mismatch is at the lexical level, later explicit compensation may involve other linguistic levels. WMC is assumed to be required for most explicit processing aspects/subskills.

Generally then, depending on the conditions under which the incoming speech signal unfolds (ambient noise, hearing impairment, signal processing in the hearing aid, etc.), as well as the precision and quality of the semantic LTM representation, the relative contributions of explicit and implicit processes will continuously fluctuate during a dialogue.
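The match/mismatch logic described above can be made concrete with a toy sketch. It is not an implementation of the ELU model: the three-word lexicon, the string-similarity measure, and the `match_threshold` value are hypothetical stand-ins, chosen only to show how a degraded input might fall below a matching criterion and so trigger explicit processing.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for syllabic phonological representations in semantic LTM.
LEXICON = ["ba-na-na", "ba-sket", "ca-bin"]


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; an illustrative proxy for phonological match."""
    return SequenceMatcher(None, a, b).ratio()


def processing_mode(rambpho_output: str, match_threshold: float = 0.9) -> str:
    """Return 'implicit' when the input matches a stored representation
    closely enough, otherwise 'explicit' (WM-dependent compensation)."""
    best_score = max(similarity(rambpho_output, entry) for entry in LEXICON)
    return "implicit" if best_score >= match_threshold else "explicit"


clear_input = "ba-na-na"      # undistorted syllable sequence
degraded_input = "b?-n?-n?"   # vowels lost, e.g. to noise or unfamiliar processing

print(processing_mode(clear_input))
print(processing_mode(degraded_input))
```

In this sketch the clear input is handled implicitly and the degraded input is routed to explicit processing. Lowering `match_threshold` would play the role of less precise stored representations tolerating a poorer signal, at the risk of false matches.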

### **INITIAL TESTS OF THE MODEL**

The experimental studies performed to test the model have principally used two types of manipulation, one based on hearing aid signal processing and one based on presentation of text cues in order to induce a mismatch between RAMBPHO and semantic LTM. Initial testing was done primarily with spoken language presented in auditory noise, but the model is also likely to be applicable to signed languages presented in visual noise (cf. Speranza et al., 2000). Brain imaging studies (e.g., Söderfeldt et al., 1994; Rönnberg et al., 2004; Rudner et al., 2007, 2013; Cardin et al., 2013) suggest that similar but not identical neural networks are active for processing sign language and speech, and that the close relation between semantics and phonology in sign language may influence the mismatch mechanism (Rudner et al., 2013).

#### **USING AUDITORY SIGNAL PROCESSING MANIPULATIONS TO INDUCE PHONOLOGICAL MISMATCH**

Wide Dynamic Range Compression (WDRC) is one of the technologies used in modern digital hearing aids to increase speech intelligibility by applying non-linear amplification of the incoming signal such that soft sounds become audible without loud sounds becoming uncomfortable. However, this non-linear signal processing can also have side-effects that distort the phonological properties of speech, especially when compression release is fast. We used this phenomenon to investigate the main prediction of WM-dependence in the ELU model in experienced hearing-aid users. Hearing aids were experimentally manipulated such that participants received WDRC for the first time. According to the model, given that a syllabic segment in the speech stream is processed with a new algorithm, the sound may seem different compared to the one delivered by the habitual algorithm, thus causing a relative RAMBPHO-induced mismatch with the phonological-lexical representation in LTM. Results showed that individual differences in WMC accounted for most of the variance in the threshold for 50% correct word recognition on speech-in-noise tests, irrespective of whether stationary or modulated noise backgrounds were applied (Foo et al., 2007; cf. Desjardins and Doherty, 2013). This means that as long as we disrupt the habitual processing mode, WMC is invoked.
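To illustrate what non-linear amplification means here, the following minimal sketch computes a static WDRC input-output curve. The gain, compression threshold, and compression ratio are illustrative assumptions, not values from the studies cited, and the sketch deliberately ignores attack and release times, which are the properties that distinguish fast from slow compression.

```python
def wdrc_output_level(input_db: float,
                      gain_db: float = 30.0,       # assumed linear gain
                      threshold_db: float = 45.0,  # assumed compression threshold
                      ratio: float = 3.0) -> float:  # assumed compression ratio
    """Static WDRC curve: linear gain below the threshold; above it,
    each extra dB of input adds only 1/ratio dB of output."""
    if input_db <= threshold_db:
        return input_db + gain_db
    return threshold_db + gain_db + (input_db - threshold_db) / ratio


for level in (40.0, 75.0):
    out = wdrc_output_level(level)
    print(f"input {level:.0f} dB -> output {out:.0f} dB (gain {out - level:.0f} dB)")
```

With these assumed settings a soft 40 dB input receives the full 30 dB of gain, whereas a loud 75 dB input receives only 10 dB, which is how soft sounds can be made audible without loud sounds becoming uncomfortable.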

A follow-up intervention study was conducted to investigate how the relationship between WMC and mismatch might change as the individual acclimatized to a new hearing aid algorithm. Again, participants who were habitual hearing aid users were switched to a new fast or slow signal processing algorithm in the hearing aid. After nine weeks of experience with one kind of hearing aid compression, participants were tested either with the same kind of compression ("matching" conditions), or with the other kind of compression of which they had no experience ("mismatching" conditions). As predicted, in one study conducted in Swedish (Rudner et al., 2009a,b) and in another conducted in Danish (Rudner et al., 2008), thresholds for 50% correct word recognition on speech-in-noise tests for mismatching compression conditions were correlated with WMC. WMC was not the main predictor of speech-in-noise thresholds for matching conditions. Independent studies support the notion that WMC is crucial to speech understanding in adverse conditions by hearing aid users (Gatehouse et al., 2003, 2006; Lunner, 2003; Akeroyd, 2008; Rudner et al., 2011; Mattys et al., 2012).

Using the visual letter monitoring task as an index of WMC, Lunner and Sundewall Thorén (2007) showed that WMC accounted for about 40% of the variance in the ability to perceive speech in modulated background noise with FAST compression. Pure-tone average hearing loss, on the other hand, accounted for less than 5% of the variance. Importantly, the pattern was reversed when compression by the hearing aid was SLOW and tests were conducted in steady-state noise conditions: WMC explained only 5% of the variance while pure-tone hearing loss explained 30%. Lunner and Sundewall Thorén (2007) suggested that FAST compression in modulated noise backgrounds better reflects the more rapid changes in the signal and noise characteristics of everyday listening conditions. Hence, using SLOW compression in steady-state noise conditions may underestimate everyday cognitive demands (cf. Festen and Plomp, 1990). The conclusions drawn in the studies using WDRC are consistent with recent findings in which other advanced hearing aid signal processing algorithms were used: Arehart et al. (2013) found that a high degree of frequency compression reduced intelligibility more for individuals with low WMC compared to individuals with high WMC, especially for older adults.

The emerging picture seems to be that advanced signal processing algorithms designed to improve intelligibility and listening comfort may also generate RAMBPHO-dependent mismatch due to distortions at the syllable level caused by unfamiliar amplitude or frequency compression. Thus, there is a benefit and a cost from such signal processing. Mismatches, or costs, are overcome more successfully by individuals with high WMC.

#### **USING TEXTUAL MANIPULATIONS TO CREATE PHONOLOGICAL MISMATCH**

Severe hearing loss can lead to phonological deterioration in semantic LTM (Andersson, 2002; Lazard et al., 2010; Rönnberg et al., 2011b). Classon et al. (2013a) undertook a study that tested the hypothesis that high WMC can compensate for poor phonological skills in individuals with hearing impairment. To avoid audibility problems, phonological mismatch was manipulated using text rather than speech. Classon et al. (2013a) showed that hearing impairment negatively affected performance on a text-based task in which participants decide if two words rhyme or not in RAMBPHO-dependent, mismatching conditions. Mismatch was created in conditions where the two test words rhymed but were orthographically dissimilar, or alternatively, did not rhyme but were orthographically similar (Lyxell et al., 1993, 1998; Andersson and Lyxell, 1998; Andersson, 2002). In the latter case, orthographic similarity may induce an incorrect "yes" response when words do not rhyme, if the phonological precision of representations in semantic LTM is compromised. The prediction based on the ELU model is that participants who have a high WMC will be able to compensate for poor phonological representations because they can keep representations in mind and double-check back and forth to ensure that the words really do not rhyme before they decide. The data confirmed this prediction. Hearing impaired participants with high WMC performed on a par with normal hearing participants, whereas hearing impaired participants with low WMC displayed higher error rates than the normal hearing subgroups with low WMC. Note that hearing impairment did not confound the results since the level of WMC was matched across groups with normal hearing and hearing impaired participants, and there was no difference in the degree of hearing impairment between the high vs. low WMC subgroups.

#### **SEMANTIC STRATEGY IN RHYME TASKS**

The effects of hearing impairment on the mismatching conditions in the yes/no rhyme task may be attributed to imprecise phonological representations in semantic LTM. This may lead automatically to a non-phonological orthographic bias, and perhaps even a semantic bias, when written words are presented in a rhyme task, especially for individuals with hearing impairment who have a low WMC. The plausibility of such an explanation was reinforced by the finding that participants with low WMC outperformed participants with high WMC on subsequent incidental recognition of items that had been correctly identified in the initial rhyme testing phase. Since semantic processing has been shown to promote episodic LTM (e.g., Craik and Tulving, 1975), a semantic interpretation of this seemingly paradoxical result may fall into place.

Connected to this semantic interpretation of the rhyming data is the fact that the test of WMC that we have been discussing so far, the reading span test, also measures important semantic interpretation processes. Although the semantic absurdity judgments typically demanded in this task (Rönnberg et al., 1989) were initially introduced to ascertain that the participants actually processed the whole sentence rather than strategically focusing only on the first or final words, semantic processing may in itself be an important component of the test. Indeed, sentence completion ability (tapping semantic integration and grammar) under time pressure is significantly correlated with performance in the reading span test (Lyxell and Rönnberg, 1989). Although the reading span test taps into several storage and processing components summarized by one measured variable, a semantic perspective on reading span may cast new light on old data in that rapid sense-making and semantic judgment are demanded in the reading span test as well as in the sentence-completion task.

#### **NEURAL SIGNATURES OF TEXT-SPEECH SEMANTIC MISMATCH**

WMC, again measured with the reading span task, has in recent studies been shown to modulate the ability to use semantically related cues and to suppress unrelated, "mismatching" cues to help understand speech in noise (Zekveld et al., 2011a, 2012). Interestingly, both the WMC of the participants and the lexicality of text cues modulated neural activation in the left inferior frontal gyrus (LIFG) and the STG during speech perception. Presumably, these areas are related to compensatory processes in semantic cue utilization. Independent data also suggest that LIFG is involved in semantic and syntactic processing networks (Rodd et al., 2010). Cortical areas beyond the temporal lobe are engaged in the processing of intelligible but degraded speech (Davis and Johnsrude, 2007). It is quite plausible that there is a functional connectivity between LIFG and STG, and that LIFG modulates STG via top-down connections when semantic processing is involved (Obleser and Kotz, 2011). In fact, the general picture is that there are ventral and dorsal pathways that connect pre-frontal and temporal language-relevant regions which support semantic and syntactic processes (Friederici and Gierhan, 2013).

#### **INTERIM SUMMARY**

Thus far, we can infer the architecture of a WM system (i.e., the ELU model) that is invoked when there is some kind of signal processing that changes the phonological structure of the speech signal, or when there is a combination of signal processing and fluctuating background noise that puts large demands on phonological processing. In addition, there is also new evidence to suggest that a semantic mismatch requires WM resources to help focus on the target-speech signal while inhibiting distracting semantic cue information, as will be discussed further below.

#### **EXTENDING THE ELU APPROACH: WMC RELATED TO ATTENTION, MEMORY SYSTEMS, AND EFFORT**

In the following sections, evidence is reviewed indicating that WMC plays a part in (a) "early" attention processes, (b) short-term retention of spoken information when the signal is processed by hearing aids, (c) inhibition and episodic LTM of masked speech, (d) the effects of hearing impairment on episodic and semantic LTM, and (e) listening effort. These data, both behavioral and physiological, have shaped a new version of the ELU model, which will be presented subsequently.

#### **WMC INFLUENCES EARLY ATTENTIONAL PROCESSES**

This section suggests that high WMC is associated with neural interactions that facilitate attention and which are important for further speech signal processing (Peelle and Davis, 2012). This kind of cognitive tuning of the brain does not seem to involve any explicit processing component, although it is dependent on WMC.

WMC is related to the ability to inhibit processing of irrelevant information and overrule undesired but pre-potent responses (e.g., Kane et al., 2001; Engle, 2002). More precisely, high-WMC individuals appear to have a superior ability to modulate attention span (i.e., how much information has access to cognitive processing). Where in the processing chain filtering out of irrelevant information takes place is still a subject of debate. Relations between WMC and early cortical auditory processing (as reflected in the amplitude of the N1 component of event-related potential measures) have been demonstrated with greater amplitudes for attended sound and lesser amplitudes for ignored sound in high-WMC individuals (Tsuchida et al., 2012). However, a recent experiment in our lab (Sörqvist et al., 2012a) suggests that WMC is involved in filtering processes at even earlier (sub-cortical) stages. Normally-hearing participants performed a visual-verbal *n*-back (1-, 2-, 3-back) task (Braver et al., 1997) while being presented with to-be-ignored background sound. In a control condition, the participants just heard the sound and did not perform any task. In the *n*-back task, WM load increased with increasing *n* and the control condition represented least load. The magnitude of the auditory brain stem response (ABR, wave V, on average 7 ms post-stimulus onset) was negatively associated with WM load. Moreover, higher WMC scores were related to a greater difference of the ABR between conditions. Thus, both the experimental load manipulation and correlational evidence converge on the same conclusion: early attentional processes interact with WM. Our interpretation is that cognitive load reduces resources at the peripheral level, and the relation with WMC suggests a relationship between central and peripheral capacity.
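For readers unfamiliar with the *n*-back paradigm, the sketch below shows how targets are defined: an item is a target when it matches the item presented *n* positions earlier, so a larger *n* requires more items to be held and updated in WM. The letter sequence and the function name are made up for illustration and are not taken from the cited experiment.

```python
def nback_targets(stimuli, n):
    """Return the positions i at which stimuli[i] equals stimuli[i - n]."""
    return [i for i in range(n, len(stimuli)) if stimuli[i] == stimuli[i - n]]


# Hypothetical letter stream (positions are 0-indexed).
sequence = list("ABABCACCB")

for n in (1, 2, 3):
    print(f"{n}-back targets at positions: {nback_targets(sequence, n)}")
```

The same stream yields different targets for each value of *n*, which is why load can be varied while the stimuli themselves stay comparable.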

One mechanism underpinning this relation might be the alpha rhythm. Alpha rhythms reflect the cognitive system's pre-stimulus preparation for incoming stimuli, enabling efficient processing (Babiloni et al., 2006), and have been associated with both WM load *and* processing of acoustically degraded stimuli (Obleser et al., 2011). Moreover, in a recent focused review of brain oscillations and WM, it was suggested that the alpha rhythm serves as an attentional gate-keeper to optimize the signal-to-noise ratio for WM-based processing, and that the number of gamma cycles that fit within one theta cycle may index WMC (Freunberger et al., 2011).
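The idea of gamma cycles fitting within a theta cycle can be written as a simple ratio of periods. The frequencies used in the example are illustrative assumptions and are not values reported in the cited review.

$$
N = \frac{T_{\theta}}{T_{\gamma}} = \frac{f_{\gamma}}{f_{\theta}}, \qquad \text{e.g.}\quad f_{\gamma} = 40\ \text{Hz},\; f_{\theta} = 6\ \text{Hz} \;\Rightarrow\; N = \frac{40}{6} \approx 6.7
$$

Under these assumed values, six complete gamma cycles fit within one theta cycle. On this account, a slower theta rhythm or a faster gamma rhythm would accommodate more cycles.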

However, single indices may only tell part of the story of how brain oscillations relate to WM. In a recent review, it has been argued that the correlations between oscillatory phases in different brain regions, so-called phase synchronization, affect the relative timing of action potentials. This is important for a memory system such as WM, which in turn depends on the interaction between different storage and executive processing components (and their corresponding phases), for example, phase correlations between pre-frontal and temporal regions (Fell and Axmacher, 2011).

Thus, a high WMC may facilitate neural fine-tuning at an early level of auditory processing (cf. Pichora-Fuller, 2003; Sörqvist et al., 2012a) but may also reflect a highly synchronized brain network (Fell and Axmacher, 2011). The conclusion about some kind of fine-tuning is further reinforced by the finding that WM processes are interconnected with the effects of practice on auditory skills (Kraus and Chandrasekaran, 2010) and their corresponding neural signatures (Kraus et al., 2012).

All in all then, data from independent labs suggest that WMC is related to several brain oscillation indices, and that WMC is related to early attention processes. This WMC-based top-down influence on speech-relevant attention processes may be part of the explanation as to why attending to a speaker in a multi-talker situation gives rise to dedicated neural representations (Mesgarani and Chang, 2012).

#### **WMC INTERACTS WITH SIGNAL PROCESSING AND SHORT-TERM RETENTION**

This section presents data showing for the first time that hearing aid signal processing can improve short-term memory in hearing-impaired individuals, and that this effect is modulated by WMC (Ng et al., 2013a,b). This may prove to have important clinical consequences (Piquado et al., 2012).

Even when audibility is controlled (e.g., by amplifying speech with hearing aids), individuals with hearing impairment still perform worse than young normal-hearing subjects, with cognitive factors accounting for residual variance in performance (e.g., Humes, 2007). Attentional resources may contribute to speech understanding, especially in effortful or divided attention tasks (Tun et al., 2009; Rönnberg et al., 2011a,b). For example, Tun et al. (2009) have shown poorer delayed recall for audible auditory stimuli in participants with impaired compared to normal hearing when encoding took place under divided attention conditions. Rönnberg et al. (2011a,b) also demonstrated that short-term memory performance under divided attention encoding conditions correlated with degree of hearing impairment (cf. Humes et al., 2006).

Hearing aid signal processing schemes may reduce attention costs while listening to speech in noise and thus improve speech understanding. It has been demonstrated that noise reduction signal processing reduces listening effort for people with normal hearing (Sarampalis et al., 2009). In a recent study (Ng et al., 2013a,b), we examined how hearing aid signal processing influences word recall in people with *hearing impairment*. The scheme under investigation was binary time-frequency masking noise reduction (Wang et al., 2009). Each participant listened to sets of eight sentences from the Swedish Hearing-In-Noise-Test (HINT) materials (Hällgren et al., 2006) in 4-talker babble or stationary noise, with and without noise reduction. To control audibility, SNRs were individualized such that performance levels were around 95% for word recognition in stationary noise with individual linear amplification and individually prescribed frequency response. Typical SNRs for 95% correct were around +5 dB. Each participant recalled as many sentence-final words as possible after each set of sentences had been presented. We found that participants performed worse in noise than in quiet and that performance was partially restored by noise reduction. In particular, individuals with high WMC recalled significantly more of the items from the end of the lists (recency position) presented in noise when noise reduction was used.

Thus, WMC interacts with signal processing in hearing aids and facilitates short-term memory. There is obviously room for improvement even when the audibility of the signal is good, a fact that offers a new perspective on how to conceptualize benefits from different kinds of signal processing in hearing aids.

#### **WMC—ESPECIALLY THE INHIBITORY ASPECTS—DETERMINE EPISODIC LTM FOR PROSE MASKED BY SPEECH**

This section is about how WMC relates to inhibition of an interfering talker during listening to sentences and to later long-term episodic recall.

We have recently shown that WMC seems to be related to long-term retention of information that is conveyed by masked speech (Sörqvist et al., 2012b). Young, normally-hearing students listened to invented stories (each about 7.5 min long) about fake populations and afterwards answered questions about their content (e.g., what did the lobiks wear in the kingdom of death?). The stories were spoken in a male voice and masked by another male voice (normal or spectrally-rotated; Scott et al., 2009).

Two types of complex WMC tests were administered separately: the reading span and the size-comparison (SIC) span test (Sörqvist et al., 2010). The SIC span is a WMC test that targets the ability to resist semantic confusion. It involves comparing the size of objects while simultaneously maintaining and recalling words taken from the same semantic category as the to-be-compared words. The distinguishing feature of the test is that the semantic interference between the comparison words and the to-be-recalled words must be resolved by inhibiting the potential semantic intrusions from the comparison words.

Ability to answer content questions was superior when the story was masked by a rotated as compared with a non-rotated speech signal. More importantly, SIC span was a better predictor variable than reading span of the magnitude of this difference (Sörqvist et al., 2012b). We argue that the inhibition ability tapped by SIC span is involved during resolution of the confusion between competing and target speech and that better resolution enhances episodic encoding and retrieval. This will, at least in part, determine an individual's ability to remember the important parts of a conversation.

Speech-in-speech processing studies have typically addressed speech perception as such (e.g., Bronkhorst, 2000). Our contribution is that we associate WMC, and the inhibition component in particular, with the encoding carried out during speech-in-speech comprehension, and how this type of WMC encoding relates to episodic LTM (cf. Hannon and Daneman, 2001; Schneider et al., 2010). There is some evidence of a relation between episodic LTM and cognitive spare capacity (Rudner et al., 2011; Mishra et al., 2013).

#### **DEGREE OF HEARING IMPAIRMENT IN HEARING AID USERS IS ASSOCIATED WITH EPISODIC LTM**

This section summarizes a recent cross-sectional study on a sample of hearing aid users and how their hearing thresholds are associated with the efficiency of different memory systems.

Despite the possibility of using hearing aids, hearing problems continue to occur in everyday listening conditions. Many people who own hearing aids do not use them on a regular basis. For those who do wear them regularly, signal processing algorithms in hearing aids cannot generally provide an optimal listening situation in noisy and challenging conditions (Lunner et al., 2009). By including hearing aid users (*n* = 160) from the longitudinal Betula study of cognitive aging (Nilsson et al., 1997), we made a conservative test of the hypothesis that hearing impairment is negatively related to episodic LTM. The basis of the prediction from the ELU model (Rönnberg et al., 2011a,b) is that mismatches will remain despite the use of a hearing aid, and hence fewer items will be encoded and retrieved from episodic LTM. Therefore, we assume a *disuse* effect on episodic LTM, leading to a less efficient episodic memory system. However, short-term memory (STM, here operationalized by the Tulving and Colotla lag measure; Tulving and Colotla, 1970) and WM (not explicitly measured in this study) should be increasingly active in mismatching conditions because both systems would be constantly occupied during retrospective disambiguation of what had been said in a conversation. Thus, both STM and WM would be *relatively less vulnerable to disuse*. It is also predicted that semantic LTM should be highly correlated with episodic LTM because the status of phonological representations in semantic LTM should be tightly related to the success of encoding into episodic LTM. These predictions have recently been confirmed by structural equation modeling. Episodic LTM decline is related to long-term hearing impairment, despite the use of existing hearing aid technology (Rönnberg et al., 2011a,b). One note of caution though is that exact measures of everyday hearing aid use were not available. Hence, any potential dose-response relationship among the hearing aid wearers could not be assessed.

Thus, hearing loss was independently related to episodic LTM (verbal recall tasks) and semantic LTM (initial letter fluency and vocabulary) but unrelated to STM, even when age was accounted for. Visual acuity alone, or in combination with auditory acuity, did not contribute to any acceptable structural equation model; it only made the prediction of episodic LTM decline worse by standard goodness-of-fit criteria (see also Lindenberger and Ghisletta, 2009). And finally, even when the episodic LTM tasks were of non-auditory nature (i.e., motor encoding of lists of imperatives and subsequent free recall of these actions, Nilsson et al., 1997) the association with hearing loss persisted (Rönnberg et al., 2011a,b).

Although the participants wore their hearing aids whilst completing the auditory episodic memory tasks, this negative result may be accounted for in terms of perceptual stress, or information degradation (cf. Pichora-Fuller et al., 1995). It has been argued and empirically demonstrated that once perceptual stress is equated, for example among different age groups, differences in performance on WM, associative memory and comprehension tasks (e.g., Schneider et al., 2002) tend to vanish. Nevertheless, the decreased performance in the non-auditory tasks reported in Rönnberg et al. (2011a,b) cannot be explained on the basis of information degradation, and it is possible that there are both information degradation and long-term deprivation effects. Central mechanisms involving attentional resources could also be affected by hearing impairment, which in turn would predict problems with memory encoding (Tun et al., 2009; Majerus et al., 2012; Peelle and Davis, 2012; cf. Sörqvist et al., 2012a) and possibly WMC (see also Schneider et al., 2010). Before we can reach definite conclusions about the selective effects of hearing impairment on memory systems, a broader spectrum of tasks assessing different memory systems must be employed.

From a more general and clinical perspective, we suggest that future longitudinal studies should evaluate the effects of the use of the hearing aids on cognition and memory systems, and in particular, the effects of certain kinds of signal processing on different tasks assumed to index different memory systems.

#### **WMC AND EFFORT**

In this section, we discuss recent work related to the ELU prediction about WMC and effort (cf. Hervais-Adelman et al., 2012; Amichetti et al., 2013). In particular, we focus on predictions based on recent data using pupillometry that contrast with the ELU prediction.

Apart from taxing cognitive capacity, listening under adverse conditions is often associated with subjectively experienced effort, especially in individuals with hearing impairment (Pichora-Fuller, 2006). The ELU prediction about effort, or the inverse notion of "ease" (Rönnberg, 2003), is that in effort-demanding listening situations, an individual with a high WMC will be better able to compensate for the distorted signal without exhausting WMC, and will therefore experience less effort than an individual with low WMC (cf. the neural efficiency hypothesis; e.g., Pichora-Fuller, 2003; Heitz et al., 2008), given that the task does not hit ceiling/floor (Rönnberg, 2003). Intermediate difficulty levels provide the best opportunity for explicit processes to operate in a compensatory fashion. Recent work by our group has confirmed that higher WMC is associated with lower perceived and rated listening effort for intermediate levels of difficulty, or "ease" of processing (Rudner et al., 2012). We suggest that subjective effort ratings may be useful for understanding the relative contributions of explicit WM processes to speech understanding in challenging conditions (Rudner et al., 2012; Ng et al., 2013b).

Some researchers have proposed that the pupillary response reflects cognitive processing load during the processing of sentences of different grammatical complexity (Piquado et al., 2010; Zekveld et al., 2010). This response is also sensitive to age, hearing loss, and the extra effort required to perceive speech in competing talker conditions compared to noise maskers (Zekveld et al., 2011b; Koelewijn et al., 2012a). Koelewijn et al. (2012b) observed that people with high SIC spans demonstrated larger pupil size, and that higher SIC span performance, in turn, was related to lower signal-to-noise ratios needed to perform at a certain threshold level in the competing talker condition. This pattern of findings may suggest that cognitive load is actually increased by high WMC, which can be viewed as a paradoxical result, but has support in the literature (Van der Meer et al., 2010; Zekveld et al., 2011b). Another interpretation of these data is that individuals with a high capacity solve difficult stimulus conditions by consuming more cognitive brain resources (more extensively or more intensively), thus exercising greater task engagement, and this is what is reflected in the pupil size variations (Koelewijn et al., 2012b; see Grady, 2012).

Pupil size seems to reliably capture cognitive load and associated effort under certain semantic or informational masking conditions. The exact mechanisms behind the empirical findings so far remain to be elucidated. But clinically, irrespective of explanatory mechanism, pupil size may become a complementary measure to subjective ratings of effort.

## **GENERAL DISCUSSION AND A NEW ELU-MODEL**

Phonological and semantic mismatches increase the dependence on WMC in speech-in-noise tasks. However, as we have seen in the current review of recent ELU-related WMC studies, the role of WMC is extended to include early attention mechanisms, interactions with memory systems under different multi-talker conditions, both for short-term and LTM, and a relationship to effort via subjective and objective measures.

Below we present the new empirical extensions that emerge from our recent data inspired by the old ELU model (Rönnberg et al., 2008). Then, we describe the new ELU model, based on these new empirical patterns, emphasizing in general and in detail the new features that differ from the old model. A section on predictions will close the presentation of the new model. In the following section, the new ELU model is compared to other relevant WM and speech perception models. The paper ends by addressing some important clinical consequences that follow from the model.

#### **NEW EMPIRICAL EXTENSIONS**

First, the data we have presented and discussed suggest that several kinds of signal processing in hearing aids (*i.e., fast amplitude compression, frequency compression, and binary masking*), designed to facilitate speech perception, are handled best by individuals with high WMC. This is the first extension from the original studies that informed the development of the ELU model (Rönnberg et al., 2008). At that time, we did not know whether WMC was important for just one kind of distortion induced by signal processing (i.e., fast amplitude compression) or not. Importantly, when some kind of distortion of the signal is introduced, the feed-forward mechanism (cf. Bendixen et al., 2009) of RAMBPHO that predicts yet-to-be-experienced (syllabic) elements in the unfolding sound sequence (cf. Poeppel et al., 2008; Bendixen et al., 2009) seems to be temporarily interrupted, allowing ambiguous information to enter an explicit processing loop before understanding can be achieved.

A second extension is related to the *pre-tuning or synchronization* by WMC, directly or indirectly, prior to or early on during stimulus presentation. One type of prior influence mediated by WMC relates to "early" attention processes (Fell and Axmacher, 2011; Freunberger et al., 2011; Sörqvist et al., 2012a), another is related to priming, or pop-out (e.g., Davis et al., 2005). Recent data suggest that the magnitude of the pop-out effect may be mediated by WMC (Signoret et al., 2012). A third kind of influence exerted by WMC relates to memory encoding operations (Sörqvist and Rönnberg, 2012), and subsequent influences on understanding, including turn-taking in a dialogue (Ibertsson et al., 2009). This kind of continuous feedback was not part of the old ELU model (Rönnberg et al., 2008). This means that the new model also acknowledges a *post-dictive*, explicit feedback loop, feeding into *predictive* RAMBPHO processing. This mechanism is akin to the hypothesis testing, analysis-by-synthesis aspect of the Poeppel et al. (2008) framework (see below under Relation to other models).

A third extension has to do with the role of WMC in processing text cues that generate *explicit semantic expectations* of what will come in the unfolding speech stream. WMC is particularly important when expectations are violated by the content of the speech signal (Zekveld et al., 2011a). This may be because individuals with high WMC have a superior ability to inhibit the cue-activated, *mismatched* representation in semantic memory (cf. Nöstl et al., 2012; Sörqvist et al., 2012b). The discovery of a semantic influence on RAMBPHO processing means that the theoretical assumption of the model must be revised (see further below). Further, research suggests that older people more frequently rely on semantic context. For older people, incongruent semantic context seems to impair identification of words in noise, although confidence levels are higher than in younger adults (Rogers et al., 2012). Older people have a smaller WMC than younger individuals while frontal-lobe based executive functions may remain intact and this may account for the false hearing effects (Rogers et al., 2012). Also, over many decades of greater reliance on context in the face of gradual age-related declines in sensory processing, there may be changes in brain organization with an anterior-posterior shift in the brain areas engaged in complex tasks (Davis et al., 2008).

A fourth extension is that *high WMC individuals can deploy more resources to both semantic and phonological aspects* of a task, depending on instruction. The versatility in types of processing (phonological and semantic) of high WMC people represents a feature that was lacking in the old model. For example, in the Sörqvist and Rönnberg (2012) study WMC contributed to inhibition of a competing talker while focusing on the semantic content of the target talker. A consequence of this is enhanced, or deeper, understanding (Craik and Tulving, 1975). The by-product is more durable episodic memory traces (Classon et al., 2013a). In a recent ERP study, Classon et al. (2013b) showed that hearing-impaired, but not normal-hearing, individuals demonstrate an amplified N2-like response in non-rhyming, orthographically mismatching conditions. This ERP signature of hearing impairment is suggested to involve increased reliance on explicit compensatory mechanisms such as articulatory recoding and grapheme-to-phoneme conversion and may prove to tap into some phonological WM function.

A fifth important extension encompasses the *negative relationships between hearing loss and episodic and semantic LTM*. These occur despite the use of hearing aids. However, STM is relatively unaffected, presumably because the demand to resolve speech understanding under mismatching, adverse conditions keeps this memory system in a more active state. Therefore, the overall selective effects on different memory systems are couched in terms of use/disuse (Rönnberg et al., 2011a,b). It should be noted that although the ELU prediction is in terms of relative effects of use/disuse, it does not exclude the possibility that either STM or WM may be affected by hearing impairment (cf. Van Boxtel et al., 2000; Cervera et al., 2009); it only predicts a relatively larger LTM impairment.

A sixth general fact to note is the *modality-generality* of memory systems in relation to language understanding. Reading span obviously taps modality-general verbal WMC as it predicts variance in the speech-in-noise tasks (Akeroyd, 2008; cf. Daneman and Carpenter, 1980; Just and Carpenter, 1992). Generality is also a key feature of the modulation of auditory attention (ABR) by manipulating visual-verbal WM load (Sörqvist et al., 2012a). Finally, a striking finding in the Rönnberg et al. (2011a,b) study is that the negative memory consequences that may be attributed to hearing loss also show an independence of encoding format, and are not uniquely related to auditory encoding: At the level of simple correlations, hearing loss showed the highest negative correlation to free recall performance on tasks which not only involved auditory encoding but also encoding of motor and textual representations, and the effects were still manifest after statistically correcting for age.

Seventhly, and finally, the effect of WMC on stimulus processing is pervasive in terms of the *time window*: from early brain stem responses to encoding into episodic LTM. Thus, the above generalizations have set the stage for a more analytical and general formulation of the ELU model.

#### **THEORETICAL CONSEQUENCES**

The new extensions result in a better specified ELU model that presents WM as the arena for interpreting the meaning of an ongoing dialogue. An individual with high WMC is more capable of using different levels/kinds of information and implicit/explicit strategies for extracting meaning from a message. The storage and processing operations that are performed by a high-capacity system are modality-general and flexible during multi-tasking. Implicit and explicit processes are assumed to run in parallel and interactively, but under different time windows (cf. Poeppel et al., 2008).

The successful listener disambiguates the signals on-line over time, due to successive semantic and lexical retrieval attempts, combined with contextual and dialogical constraints to narrow down the set of lexical candidates cued in the speech stream. Because of time constraints in dialogues, the listener may often settle for the gist without resolving all of the details of the signal-meaning mapping. It may even be the case that the context is so strongly predictive that very little if any information delivered by RAMBPHO is needed for successful recognition to occur (Moradi et al., 2013).

We now further assume that the information delivered by RAMBPHO is relayed by a fast-forward, matching mechanism that is nested under a slow, explicit feedback loop (cf. Poeppel et al., 2008; Stenfelt and Rönnberg, 2009). The mismatch in itself is determined either by poor RAMBPHO information and/or poor phonological representations in semantic LTM. We conceptualize the phonological representations in LTM in terms of multiple attributes. A minimum number of attributes are required for access to a certain lexical item. Above a certain threshold there is a sufficient number of attributes to trigger the lexical representation. Below threshold, we can expect a number of qualitatively different outcomes: (a) if the number of attributes is close to threshold, then some phonological neighbors may be retrieved (Luce and Pisoni, 1998); (b) if too few attributes match the intended target item, the matching process could be led astray by contextual constraints induced by "mismatching" semantic cues (Zekveld et al., 2011a); and (c), if no phonological attributes are present at the RAMBPHO level, it could still be the case that a sentence context is so predictive that an upcoming target word is very likely to be activated anyhow (Moradi et al., 2013).
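The threshold logic in the preceding paragraph can be illustrated with a toy sketch. It is only an illustration under stated assumptions, not part of the ELU model's specification: phonological attributes are represented as plain sets of labels, and the function name `lexical_access`, the `margin` parameter that defines "close to threshold," the `context_candidate` argument, and the toy lexicon are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy lexicon: each word maps to a set of phonological attributes.
TOY_LEXICON: Dict[str, Set[str]] = {
    "cat": {"k", "ae", "t"},
    "cap": {"k", "ae", "p"},
    "dog": {"d", "o", "g"},
}


def lexical_access(
    input_attrs: Set[str],
    lexicon: Dict[str, Set[str]],
    threshold: int,
    margin: int = 1,
    context_candidate: Optional[str] = None,
) -> Tuple[str, List[str]]:
    """Classify the outcome of matching input attributes against a lexicon.

    `margin` (an assumption of this sketch) defines how far below threshold
    still counts as "close to threshold". `context_candidate` stands in for a
    word made available by sentence context or a semantic cue.
    """
    # Count how many input attributes each lexical item shares.
    overlap = {word: len(input_attrs & attrs) for word, attrs in lexicon.items()}

    # At or above threshold: the lexical representation is triggered.
    matched = sorted(w for w, n in overlap.items() if n >= threshold)
    if matched:
        return ("match", matched)

    # Case (c): no phonological attributes at all; only context can
    # activate a word.
    if not input_attrs:
        if context_candidate is not None:
            return ("context_only", [context_candidate])
        return ("no_access", [])

    # Case (a): close to threshold, so phonological neighbors may be retrieved.
    near = sorted(w for w, n in overlap.items() if 0 < n and n >= threshold - margin)
    if near:
        return ("neighbors", near)

    # Case (b): too few matching attributes; contextual constraints may
    # steer the outcome, possibly toward a mismatching word.
    if context_candidate is not None:
        return ("context_led", [context_candidate])
    return ("no_access", [])


if __name__ == "__main__":
    print(lexical_access({"k", "ae", "t"}, TOY_LEXICON, threshold=3))
    print(lexical_access({"k", "ae"}, TOY_LEXICON, threshold=3))
    print(lexical_access({"k"}, TOY_LEXICON, threshold=3, context_candidate="dog"))
    print(lexical_access(set(), TOY_LEXICON, threshold=3, context_candidate="cat"))
```

The sketch treats every attribute as equally weighted and the threshold as a fixed count; both are simplifications chosen for readability rather than claims about how attributes are weighted in the model.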

The matching process is ultimately determined by the fidelity of the input and phonological representation. Fidelity is affected by external noise but also by internal noise (e.g., by poor phonological representations due to long-term hearing impairment; Classon et al., 2013a,b). These phonological attributes are primarily constrained at a syllabic level of representation. RAMBPHO information is based on rapid phonological extraction from the signal by means of a mix of visual, sound-based and motoric predictions (cf. Hickok, 2012). We still propose that the bottleneck of the system is the connection between RAMBPHO-delivered information and the phonological-lexical representation. However, we now also assume that the phonological attributes are embedded within domains of semantically related attributes; i.e., relations between the two types of attributes are assumed to be stored and represented together (cf. Hickok, 2012). Thus, these synergistic representations allow lexical access both via RAMBPHO and semantic cueing (Zekveld et al., 2011a, 2012), and give ground for a conceptualization of a versatile, multi-code capacity usage of high WMC participants. Furthermore, in the new ELU model, the implicit as well as the explicit processing mechanisms rely on phonological and semantic interactions. Semantic LTM can be used either for explicit "repair" of a distorted signal, for inference-making, or for implicit and rapid semantic priming. Mismatch will determine how time is shared between the explicit and implicit operations: the fast (implicit) RAMBPHO mechanism is always running until it is temporarily interrupted. When interrupted, the default situation is that it re-starts the analysis of the speech signal with whatever information is available (phonological/semantic). At the same time, mismatch will tune the system to use the explicit slow loop to repair violated expectations, again via semantic and phonological cues.

Under time pressure, and given that the listener is happy to settle for the gist of the message (see Pichora-Fuller et al., 1998), low-level RAMBPHO processing may be overruled by explicit functions. RAMBPHO is in principle "blind" to the overall meaning of a message, in the sense that its sole function is to "unlock" the lexicon. But it is conceivable that it can be modified in terms of attention to certain attributes depending on semantic knowledge about speaker identity and topic (Mesgarani and Chang, 2012). The crucial aspect is therefore not the specific kind of signal processing that temporarily interrupts RAMBPHO, but the modality-general explicit capacity to use and combine the available perceptual evidence and quality of the LTM knowledge. This takes place via different WMC-dependent executive mechanisms such as inhibition, focusing of attention, and retrieval of contextual and semantic information. The sooner the brain can construct an interpretation of the message, the easier language processing becomes, and the content of a dialogue is more rapidly committed to more permanent memory encodings.

In short, the new ELU model is a WMC-based model of a meaning prediction system (cf. Samuelsson and Rönnberg, 1993; Federmeier, 2007; Hickok, 2012). Specifically, the settings of the system are regulated either explicitly (by some semantic/contextual instruction or explicit feedback) or by the neural consequences of high WMC (in terms of, e.g., brain oscillations). Attention manipulations—seen as one way of pre-tuning the system—have recently proven to have cortical consequences in speech in noise tasks (Mesgarani and Chang, 2012; Wild et al., 2012).

In **Figure 2**, we illustrate how explicit/implicit processes interact over time. Each explicit "loop" is activated by a mismatch. The number of times the listener passes through an explicit loop depends on factors such as turn taking, competing speech, attention manipulations, or distortions from signal processing in the hearing aid.

## **NEW ELU PREDICTIONS**

We outline some new predictions that follow from the revised and updated ELU model.


number of phonological attributes (i.e., above threshold) in the mental lexicon and lexical access proceeds rapidly and automatically. RAMBPHO may be preset by expectations—modulated by WM—concerning the phonological characteristics of the communicative signal, e.g., the language or regional accent of the communicative partner or by semantic or contextual constraints. When there is a mismatch (as in suboptimal listening conditions), WM "kicks in" to support listening (Rönnberg et al.,

missing information, which also feeds back to RAMBPHO. The output of the system is some level of understanding or gist, which in turn induces a semantic framing of the next explicit loop. Another output from the system is episodic LTM, where information encoded into LTM is dependent on the type of processing carried out in WM. Explicit and implicit processes run in parallel, the implicit being rapid, the explicit is a relatively slow feedback loop.

in conditions of speech-in-speech maskers (Sörqvist and Rönnberg, 2012). In short, participants with high WMC are predicted to better adapt to different task demands than participants with low WMC, and hence are more versatile in their use of semantic and phonological coding and re-coding after mismatch.


#### **RELATION TO OTHER MODELS**

The new ELU model differs from models of speech perception (e.g., the TRACE model, McClelland and Elman, 1986; the Cohort model, Marslen-Wilson, 1987; and the NAM model, Luce and Pisoni, 1998) and also from the original notion of mismatch negativity (Näätänen and Escera, 2000) in its assumption that explicit WMC is called for whenever there is mismatch between language input and LTM representations. In this way, the mismatch mechanism, and with it the demand on WMC, is related to communication. Nevertheless, the ELU model is similar to the earlier speech perception models in that all acknowledge the importance of an interaction with LTM representations and that lexical access proceeds via some kind of model-specific retrieval mode. The ELU model especially focuses on how the perceptual systems interact with different memory systems. The cognitive hearing science aspect and the historical context of the ELU model have recently been reviewed elsewhere (Pichora-Fuller and Singh, 2006; Arlinger et al., 2009).

RAMBPHO focuses on the integration of phonological information from different sources and thus shares similarities with the episodic buffer introduced by Baddeley (2000). However, unlike Baddeley's model, the ELU model is geared toward the communicative outcome, i.e., language understanding, rather than WMC as such (Rudner and Rönnberg, 2008a,b). The fact that the need for explicit resources such as WMC is restricted to mismatch situations also represents a unique processing economy aspect of the ELU model.

The ELU model is inspired by the notions and models of WM for read text presented by Daneman and Carpenter (1980) and Just and Carpenter (1992) in that it emphasizes both storage and processing components of WM. This is why we originally adopted the reading span task as a potentially important predictor variable of speech-in-noise performance, without introducing audibility problems. The trade-off between storage and processing is particularly relevant for the ELU model in that hearing impairment typically puts extra pressure on the processing and inference-making that is needed to comprehend a sentence. Less storage and less encoding into episodic LTM are expected for participants with hearing impairment compared to participants without hearing impairment unless they have a high WMC. Of particular relevance is the fact that Just and Carpenter (1992) showed that WMC constrains sentence comprehension during reading such that individuals with high WMC are better than individuals with low WMC at coping with more complex syntactic structures (e.g., object-relative clauses), maintaining ambiguous representations of sentences, and resolving anaphora. Dealing with semantic or syntactic complexity is presumably very important for participants who are "mismatching" frequently during conversation. Here, the attention, inference-making, inhibition and storage abilities of individuals with high WMC play a crucial role.

The ELU has some interesting similarities with the speech perception model by Poeppel et al. (2008) in that both models assume parallel processes (streams) that operate within different time-windows (cf. Hickok and Poeppel, 2004, 2007). For ELU the first time window is when phonological representations are formed in RAMBPHO to match representations in semantic LTM; the second is the slower explicit loop function. RAMBPHO seems to be a concept very similar to the phonological primal sketch suggested by Poeppel et al., where syllables mediate lexical access. Also, audiovisual integration seems to occur around 250 ms, where visual information typically leads and affects the integration (van Wassenhove et al., 2007). We have speculated about the earlier (than syllabic) spectral-segmental kinds of analyses discussed in the Poeppel et al. (2008) paper (see Stenfelt and Rönnberg, 2009), primarily in terms of how different types of hearing impairment might affect perception of segmental features.

The explicit, slow processing loop is *postdictive* in the sense that mismatch, error-induced, signals may invoke some kind of WM-based inference-making. This was the main function for re-construction and inference-making in the old ELU model (Rönnberg, 2003; Rönnberg et al., 2008). However, as we have emphasized with the new ELU model, its *predictive* potential is now clearly spelled out in terms of the re-settings explicit processes may invoke, phonologically and semantically, and also because of the fine-tuning or synchronization by WM itself (Hickok, 2012). In keeping with Poeppel et al. (2008), the neural basis of syllabic processing is likely to involve STS, lexical access supposedly involves MTG, while our recent study (Zekveld et al., 2012) is a first indication of a frontal (LIFG) WMC-based compensation for the explicit effort involved in decoding words/sentences in noise. Poeppel et al. (2008) advocate an analysis by synthesis framework whereby initial segments of a spoken signal are matched against a hypothesis, "an internal forward model." The internal model is then updated against new segments of speech approximately at every 30 ms, feeding back to several levels of representation including the phonological primal sketch. The hypothesis-driven, analysis-by-synthesis framework of Poeppel et al. (2008) may in fact be understood in terms of WMC. A high WMC helps keep several hypotheses alive, allowing for top-down feed-back at several points in time and at segmental, syllabic, lexical and semantic levels of representation (cf. Figure 4 in Poeppel et al., 2008; cf. Poeppel and Monahan, 2011). The probability of entertaining or maintaining a hypothesis in WM may then in part be determined by Bayesian logic, "The quantity p(H|E) represents the likelihood of the hypothesis, given the sensory analysis; p(E|H) is the likelihood of the synthesis of the sensory data given the analysis" (p. 1080, Poeppel et al., 2008), where H represents the forward hypothesis and E the perceptual evidence. With an ELU perspective, this will also be modulated by the WMC to hold several hypotheses, at different levels in the cognitive system, in mind.
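For reference, the two quantities named in the quotation are linked by Bayes' rule. The prior p(H) and the normalizing term p(E) do not appear in the quoted passage; they are added here as the standard remaining terms. The index i over candidate hypotheses is likewise only notation introduced for this sketch, meant to express the idea that several hypotheses are held at once.

$$
p(H_i \mid E) \;=\; \frac{p(E \mid H_i)\, p(H_i)}{p(E)}, \qquad p(E) \;=\; \sum_{j} p(E \mid H_j)\, p(H_j)
$$

Read this way, the sum in the denominator runs over the hypotheses currently being entertained. One tentative reading of the ELU perspective is that WMC limits how many terms H_j can be kept active at once.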

In the general context of dual stream models, addressing the interaction between ventral and dorsal attention networks, Asplund et al. (2010) found that so-called surprise blindness, i.e., a profound deficit in the detection of a goal-relevant target (a letter) as a result of the presentation of an unexpected and task-irrelevant stimulus (a face), causes activity in the inferior frontal junction. This manipulation represents an interaction between stimulus-driven and goal-directed, hypothesis-driven attention and may be compared to the cueing manipulations by Zekveld et al. (2011a, 2012). Resolutions of ambiguity also involve interactions between stimulus-driven and knowledge-driven processes (Rodd et al., 2012), which demand the integrative functions of LIFG. These examples may in fact be related to the new predictive and postdictive (feedback) interactions postulated in the new ELU model.

As discussed by Arnal and Giraud (2012), implicit temporal predictions of spoken stimuli represent one mechanism that may be modulated by slow delta-theta oscillations, whereas in the case of top-down, hypothesis-driven transmission of content specific information, beta oscillations may index a complementary mechanism in speech comprehension. Similar kinds of dual mechanisms have been proposed by Golumbic et al. (2012) when tracking selective attention to a target voice while ignoring another voice in a cocktail party situation. Low frequency activity typically corresponds to the speech envelope at lower auditory cortex levels, whereas high gamma power activity is reflected in the entrainment to the attended target voice only at later stages of processing, which also were cortically spread out to, e.g., inferior frontal cortex and anterior temporal cortex. This general result connects nicely with the Zekveld et al. (2012) data of WMC based compensation localized in LIFG and MTG.

Finally, Anderson and colleagues demonstrated in a recent study (Anderson et al., 2013a), using structural equation modeling, that auditory WM, in combination with central auditory functions such as brain stem responses (e.g., pitch encoding), contributes to understanding speech in noise. Peripheral auditory measures *did not* account for any variance but musical experience reinforced the effect of auditory WM. This is in line with our research ascribing a central role to WM for speech understanding under adverse conditions. Interestingly, Anderson et al. (2013b) have also been able to show that brain stem responses to complex sounds, rather than hearing thresholds, predict self-reported speech-in-noise performance. These data agree with the Sörqvist et al. (2012a,b) data on the relationship between brain stem responses and WMC. Since WM is by definition an explicit processing and storage system, the data also fit with the fact that self-report, which taps into explicit awareness of speech processing (cf. Ng et al., 2013b), has the capacity to reflect brain stem responses.

In sum: although the ELU model shares underlying notions with other speech perception and WM models, its uniqueness lies in the connection between mismatch and WMC (explicitly and postdictively), and implicitly and predictively, between WM and RAMBPHO, and the roles played by the interaction between WM and other memory systems such as episodic and semantic LTM.

#### **LIMITATIONS**

One limitation of the new ELU model concerns the more exact definition of when a mismatch condition is at hand. We have seen a pattern of results suggesting that many kinds of signal processing actually demand a higher dependence on WMC, at least initially, before some learning or acclimatization has occurred (cf. the first prediction). This of course also holds true for the case when a person has acclimatized to a certain kind of signal processing and is then tested with another, thus breaking the habitual phonological coding schemes. However, a critique that can be launched is that we may have problems determining a priori the exact parameters for the mismatch induction. The problem of circularity is apparent. More empirical investigations into, e.g., determining whether it is the kind of signal processing or the artifacts caused by signal processing that determine mismatch and WM dependence will help clarify this issue. Another problem relates to the (so far) relatively few studies involving the neural correlates of WMC and speech understanding in noise. Future studies will also have to address the neural consequences of high vs. low WMC and how it modulates predictions at different linguistic levels (syllabic, lexical, semantic, and syntactic).

#### **CLINICAL IMPLICATIONS**

Given that WMC is crucially important for on-line processing of speech under adverse conditions as well as the ability to maintain its content for shorter or longer periods, hearing aid manufacturers, speech-language pathologists and hearing health care professionals must take that into account. First, clinically relevant WMC tests need to be developed; tests that tap into the processes that have proven to be modality-general and optimal for both on-line processing of speech and episodic LTM. This means normative data need to be collected to determine age-dependent and impairment-specific performance levels and provide a clinical instrument for assessing WMC. By using visual-verbal tests, audibility problems are avoided, thus disentangling potential perceptual degradation effects from WM performance. However, it is important to collect norms for different age-groups and levels of hearing impairment in combination, because there is also the possibility of more central, or cognitive side-effects of age and impairment.

Second, individuals with low WMC seem to be initially susceptible to signal processing distortions from "aggressive" signal processing (fast amplitude compression, severe frequency compression, binary masking), although this susceptibility may decline after a period of familiarization (Rudner et al., 2011). For all individuals, concrete options are at hand for manipulations of the signal in the hearing aid: increasing amplification, altering input dynamics, or removing some information (= noise reduction) to get a benefit. But these manipulations come at a cost that is different for different individuals. Thus, we advocate that the "dose of the medicine" (= the active ingredient), the intended benefit of signal processing and its side-effects (by-product of the medicine) must be tailored to the individual, such that individuals with high WMC can have a more active ingredient (= more aggressive signal processing) than individuals with low WMC, who may be more susceptible to side-effects. This reasoning could in principle also be applied to acoustic design of other technologies.

Third, the data we have presented suggest that many kinds of more advanced signal processing in hearing instruments demand WMC. The down-side of using advanced signal processing on a daily basis is that it demands effort and for any given individual with hearing loss, this may outweigh the benefit. Therefore, there is a need to develop new methods that assess effortful brain-work with more precision. Here, reaction time measures, pupil dilation indices or measures of evoked response potentials may prove to be useful signals for on-line adjustment of signal processing parameters in hearing instruments.

Fourth, with a new cognitive hearing science perspective, it would be equally important to evaluate memory and comprehension of the contents of a conversation in noise, as functional outcome measures, rather than only focusing on word recognition accuracy per se (Pichora-Fuller, 2007; Rönnberg et al., 2011a). This can actually be seen as an indirect measure of cognitive spare capacity, or the residual cognitive capacity that remains once successful listening has taken place (Pichora-Fuller, 2007, 2013; Mishra et al., 2013).

Fifth, to properly evaluate the effects of hearing aids and other interventions, a longitudinal study that also systematically manipulates the kind of signal processing would be quite informative. We know very little about the long-term effects of signal processing on cognition and how these may relate to or reduce the risk of dementia (Lin et al., 2011, 2013).

Finally, an intervention study that evaluated the effects of WM training on speech-in-noise understanding would put the causal nature of WMC to the test (cf. McNab et al., 2009). Additionally, studying the neural correlates of this putative plastic change would shed further light on the neural mechanisms involved.

#### **REFERENCES**

memory: executive control and listening effort. *Mem. Cognit.* doi: 10.3758/s13421-013-0302-0. [Epub ahead of print].

N. (2013b). Auditory brain stem responses to complex sounds predicts self-reported speech-in-noise performance. *J. Speech Lang. Hear. Res.* 56, 31–43.


a speechreading expert: The case of AA (JK023). *Ear Hear.* 26, 214–224.


Woll, B. (2013). Dissociating cognitive and sensory neural plasticity in human superior temporal cortex. *Nat. Commun.* 4:1473. doi: 10.1038/ncomms2463


types of masker noises. *Ear Hear.* 34, 261–272. doi: 10.1097/AUD.0b013e31826d0ba4


for measurement of speech recognition. *Int. J. Audiol.* 45, 227–237. doi: 10.1080/14992020500429583



comprehension: individual differences in working memory. *Psychol. Rev.* 99, 122–149. doi: 10.1037/0033-295X.99.1.122


et al. (2013). Hearing loss and cognitive decline in older adults. *JAMA Intern. Med.* 173, 293–299. doi: 10.1001/jamainternmed.2013.1868


speechreading. *Scand. Audiol.* 21, 67–72. doi: 10.3109/01050399209045984


in *Hearing Care for Adults*, eds C. Palmer and R. Seewald (Stäfa: Phonak), 71–85.


48, 1324–1335. doi: 10.1016/j.neuropsychologia.2009.12.035


and profoundly hearing-impaired group. *Scand. Audiol.* 12, 71–77. doi: 10.3109/01050398309076227


The neural processing of masked speech: evidence for different mechanisms in the left and right temporal lobes. *J. Acoust. Soc. Am.* 125, 1737–1743. doi: 10.1121/1.3050255


253–258. doi: 10.1037/0882-7974.15.2.253


et al. (2010). Resource allocation and fluid intelligence: Insights from pupillometry. *Psychophysiology* 47, 158–169. doi: 10.1111/j.1469-8986.2009.00884.x


hearing loss, and cognition on the pupil response. *Ear Hear.* 32, 498–510.

Zekveld, A. A., Rudner, M., Johnsrude, I. S., Dirk, J., Heslenfeld, D. J., and Rönnberg, J. (2012). Behavioural and fMRI evidence that cognitive ability modulates the effect of context on speech intelligibility. *Brain Lang.* 122, 103–113. doi: 10.1016/j.bandl.2012.05.006

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 12 March 2013; accepted: 24 June 2013; published online: 13 July 2013.*

*Citation: Rönnberg J, Lunner T, Zekveld A, Sörqvist P, Danielsson H, Lyxell B, Dahlström Ö, Signoret C, Stenfelt S, Pichora-Fuller MK and Rudner M (2013) The Ease of Language Understanding (ELU) model: theoretical, empirical, and clinical advances. Front. Syst. Neurosci. 7:31. doi: 10.3389/fnsys.2013.00031*

*Copyright © 2013 Rönnberg, Lunner, Zekveld, Sörqvist, Danielsson, Lyxell, Dahlström, Signoret, Stenfelt, Pichora-Fuller and Rudner. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Speech perception as an active cognitive process

## **Shannon L. M. Heald and Howard C. Nusbaum\***

Department of Psychology, The University of Chicago, Chicago, IL, USA

#### **Edited by:**

Jonathan E. Peelle, Washington University in St. Louis, USA

#### **Reviewed by:**

Matthew H. Davis, MRC Cognition and Brain Sciences Unit, UK Lori L. Holt, Carnegie Mellon University, USA

#### **\*Correspondence:**

Howard C. Nusbaum, Department of Psychology, The University of Chicago, 5848 South University Avenue, Chicago, IL 60637, USA e-mail: hcnusbaum@uchicago.edu

One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process, which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing by masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augmentation or therapy.

**Keywords: speech, perception, attention, learning, active processing, theories of speech perception, passive processing**

In order to achieve flexibility and generativity, spoken language understanding depends on active cognitive processing (Nusbaum and Schwab, 1986; Nusbaum and Magnuson, 1997). Active cognitive processing is contrasted with passive processing in terms of the control processes that organize the nature and sequence of cognitive operations (Nusbaum and Schwab, 1986). A passive process is one in which inputs map directly to outputs with no hypothesis testing or information-contingent operations. Automatized cognitive systems (Shiffrin and Schneider, 1977) behave as though passive, in that stimuli are mandatorily mapped onto responses without demand on cognitive resources. However, it is important to note that cognitive automatization does not have strong implications for the nature of the mediating control system, such that various different mechanisms have been proposed to account for automatic processing (e.g., Logan, 1988). By comparison, active cognitive systems have a control structure that permits "information contingent processing" or the ability to change the sequence or nature of processing in the context of new information or uncertainty. In principle, active systems can generate hypotheses to be tested as new information arrives or is derived (Nusbaum and Schwab, 1986) and thus provide substantial cognitive flexibility to respond to novel situations and demands.

#### **ACTIVE AND PASSIVE PROCESSES**

The distinction between active and passive processes comes from control theory and reflects the degree to which a sequence of operations, in this case neural population responses, is contingent on processing outcomes (see Nusbaum and Schwab, 1986). A passive process is an open loop sequence of transformations that are fixed, such that there is an invariant mapping from input to output (MacKay, 1951, 1956). **Figure 1A** illustrates a passive process in which a pattern of inputs (e.g., basilar membrane responses) is transmitted directly over the eighth nerve to the next population of neurons (e.g., in the auditory brainstem) and upward to cortex. This is the fundamental assumption of a number of theories of auditory processing in which a fixed cascade of neural population responses is transmitted from one part of the brain to the other (e.g., Barlow, 1961). This type of system operates the way reflexes are assumed to operate, in which neural responses are transmitted and presumably transformed but in a fixed and immutable way (outside the context of longer term reshaping of responses). Considered in this way, such passive processing networks should process in a time frame that is simply the sum of the neural response times, and should not be influenced by processing outside this network, functioning something like a module (Fodor, 1983). In this respect then, such passive networks should operate "automatically" and not place any demands on cognitive resources. Some purely auditory theories seem to have this kind of organization (e.g., Fant, 1962; Diehl et al., 2004) and some more classical neural models (e.g., Broca, 1865; Wernicke, 1874/1977; Lichtheim, 1885; Geschwind, 1970) appear to be organized this way. In these cases, auditory processes project to perceptual interpretations with no clearly specified role for feedback to modify or guide processing.

By contrast, active processes are variable in nature, as network processing is adjusted by an error-correcting mechanism or feedback loop. As such, outcomes may differ in different contexts. These feedback loops provide information to correct or modify processing in real time, rather than retrospectively. Nusbaum and Schwab (1986) describe two different ways an active, feedback-based system may be achieved. In one form, as illustrated in **Figure 1B**, expectations (derived from context) provide a hypothesis about a stimulus pattern that is being processed. In this case, sensory patterns (e.g., basilar membrane responses) are transmitted in much the same way as in a passive process (e.g., to the auditory brainstem). However, descending projections may modify the nature of neural population responses in various ways as a consequence of neural responses in cortical systems. For example, top-down effects of knowledge or expectations have been shown to alter low level processing in the auditory brainstem (e.g., Galbraith and Arroyo, 1993) or in the cochlea (e.g., Giard et al., 1994). Active systems may occur in another form, as illustrated in **Figure 1C**. In this case, there may be a strong bottom-up processing path as in a passive system, but feedback signals from higher cortical levels can change processing in real time at lower levels (e.g., brainstem). An example of this would be the kind of observation made by Spinelli and Pribram (1966) in showing that electrical stimulation of the inferotemporal cortex changed the receptive field structure for lateral geniculate neurons or Moran and Desimone's (1985) demonstration that spatial attentional cueing changes effective receptive fields in striate and extrastriate cortex. In either case, active processing places demands on the system's limited cognitive resources in order to achieve cognitive and perceptual flexibility. In this sense, active and passive processes differ in the cognitive and perceptual demands they place on the system.

Although the distinction between active and passive processes seems sufficiently simple, examination of computational models of spoken word recognition makes the distinctions less clear. For a very simple example of this potential issue, consider the original Cohort theory (Marslen-Wilson and Welsh, 1978). Activation of a set of lexical candidates was presumed to occur automatically from the initial sounds in a word. This can be designated as a passive process since there is a direct invariant mapping from initial sounds to activation of a lexical candidate set, i.e., a cohort of words. Each subsequent sound in the input then deactivates members of this candidate set, giving the appearance of a recurrent hypothesis testing mechanism in which the sequence of input sounds deactivates cohort members. One might consider this an active system overall with a passive first stage, since the initial cohort set constitutes a set of lexical hypotheses that are tested by the use of context. However, it is important to note that the original Cohort theory did not include any active processing at the phonemic level, as hypothesis testing is carried out in the context of word recognition. Similarly, the architecture of the Distributed Cohort Model (Gaskell and Marslen-Wilson, 1997) asserts that activation of phonetic features is accomplished by a passive system, whereas context interacts (through a hidden layer) with the mapping of phonetic features onto higher order linguistic units (phonemes and words), representing an interaction of context with passively derived phonetic features. In neither case is the activation of the features or sound input to linguistic categorization treated as hypothesis testing in the context of other sounds or linguistic information. Thus, while the Cohort models can be thought of as an active system for the recognition of words (and sometimes phonemes), they treat phonetic features as passively derived and not influenced by context or expectations.
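The deactivation scheme attributed above to the original Cohort theory can be illustrated with a minimal Python sketch. The toy lexicon, the use of letters in place of phonemes, and the function name `cohort_trace` are assumptions introduced here for illustration; they are not part of Marslen-Wilson and Welsh's (1978) formulation.

```python
def cohort_trace(lexicon, sounds):
    """Return the surviving candidate set after each input sound.

    The first sound activates every word that begins with it (a fixed,
    invariant mapping); each later sound removes candidates that mismatch.
    """
    history = []
    cohort = set(lexicon)
    for position, sound in enumerate(sounds):
        cohort = {
            word for word in cohort
            if len(word) > position and word[position] == sound
        }
        history.append(sorted(cohort))
    return history


# Hypothetical toy lexicon; letters stand in for phonemes.
LEXICON = ["cap", "cat", "captain", "captive", "dog"]

if __name__ == "__main__":
    for step, candidates in enumerate(cohort_trace(LEXICON, "capt"), start=1):
        print(step, candidates)
```

Every step in this sketch is a fixed function of the input sequence. Nothing about how the individual sounds are identified is revised in light of the surviving candidates, which is the sense in which the sublexical stage remains passive.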

This is often the case in a number of word recognition models. The Shortlist models (Shortlist: Norris, 1994; Shortlist B: Norris and McQueen, 2008) assume that phoneme perception is a largely passive process (at least it can be inferred as such from the lack of any specification to the contrary). While Shortlist B uses phoneme confusion data (probability functions as input) and could in principle adjust the confusion data based on experience (through hypothesis testing and feedback), the nature of the derivation of the phoneme confusions is not specified, in essence assuming the problem of phoneme perception is solved. This appears to be common to models (e.g., NAM, Luce and Pisoni, 1998) in which the primary goal is to account for word perception rather than phoneme perception. Similarly, the second Trace model (McClelland and Elman, 1986) assumed phoneme perception was passively achieved, albeit with competition (not feedback to the input level). It is interesting that the first Trace model (Elman and McClelland, 1986) did allow for feedback from phonemes to adjust activation patterns from acoustic-phonetic input, thus providing an active mechanism. However, this was not carried over into the revised version. The first model was developed to account for some aspects of phoneme perception unaccounted for in the second model. It is interesting to note that the Hebb-Trace model (Mirman et al., 2006a), while seeking to account for aspects of lexical influence on phoneme perception and speaker generalization, did not incorporate active processing of the input patterns. As such, just the classification of those inputs was actively governed.

This can be understood in the context schema diagrammed in **Figure 1**. Any process that maps inputs onto representations in an invariant manner or that would be classified as a finite-state deterministic system can be considered passive. A process that changes the classification of inputs contingent on context or goals or hypotheses can be considered an active system. Although word recognition models may treat the recognition of words or even phonemes as an active process, this active processing is not typically extended down to lower levels of auditory processing. These systems tend to operate as though there is a fixed set of input features (e.g., phonetic features) and the classification of such features takes place in a passive, automatized fashion.

By contrast, Elman and McClelland (1986) did describe a version of Trace in which patterns of phoneme activation actively change processing at the feature input level. Similarly, McClelland et al. (2006) described a version of their model in which lexical information can modify input patterns at the subphonemic level. Both of these models represent active systems for speech processing at the sublexical level. However, it is important to point out that such theoretical propositions remain controversial. McQueen et al. (2006) have argued that there are no data to argue for lexical influences over sublexical processing, although Mirman et al. (2006b) have countered this with empirical arguments. However, the question of whether there are top-down effects on speech perception is not the same as asking if there are active processes governing speech perception. Top-down effects assume higher level knowledge constrains interpretations, but as indicated in **Figure 1C**, there can be bottom-up active processing whereby antecedent auditory context constrains subsequent perception. This could be carried out in a number of ways. As an example, Ladefoged and Broadbent (1957) demonstrated that hearing a context sentence produced by one vocal tract could shift the perception of subsequent isolated vowels such that they would be consistent with the vowel space of the putative speaker. Some have accounted for this result by asserting there is an automatic auditory tuning process that shifts perception of the subsequent vowels (Huang and Holt, 2012; Laing et al., 2012). While the behavioral data could possibly be accounted for by such a simple passive mechanism, it might also be the case that the auditory pattern input produces constraints on the possible vowel space or auditory mappings that might be expected. In this sense, the question of whether early auditory processing of speech is an active or passive process is still a point of open investigation and discussion.

It is important to make three additional points in order to clarify the distinction between active and passive processes. First, a Bayesian mechanism is not on its own merits necessarily active or passive. Bayes' rule describes the way different statistics can be used to estimate the probability of a diagnosis or classification of an event or input. But this is essentially a computation theoretic description, much in the same way Fourier's theorem is independent of any implementation of the theorem to actually decompose a signal into its spectrum (cf. Marr, 1982). The calculation and derivation of relevant statistics for a Bayesian inference can be carried out passively or actively. Second, the presence of learning within a system does not on its own merits confer active processing status on a system. Learning can occur by a number of algorithms (e.g., Hebbian learning) that can be implemented passively. However, the extent to which a system's inputs are plastic during processing would suggest whether an active system is at work. Finally, it is important to point out that active processing describes the architecture of a system (the ability to modify processing on the fly based on the processing itself) but not the behavior at any particular point in time. Given a fixed context and inputs, any active system can and likely would mimic passive behavior. The detection of an active process therefore depends on testing behavior under contextual variability or resource limitations to observe changes in processing as a consequence of variation in the hypothesized alternatives for interpretation (e.g., slower responses, higher error rate or confusions, increase in working memory load).
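For reference, the first point can be made concrete by writing out Bayes' rule for classifying an input pattern into one of several categories. The symbols $x$ (the input pattern) and $c$, $c'$ (candidate categories) are notation introduced here for illustration and do not appear in the original text.

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{\sum_{c'} P(x \mid c')\,P(c')}$$

The equation specifies what is computed, not how. The likelihoods $P(x \mid c)$ and priors $P(c)$ could be fixed in advance, which would amount to a passive implementation, or they could be revised during processing in light of context, which would amount to an active one.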

## **COMPUTATIONAL NEED FOR ACTIVE CONTROL SYSTEMS IN SPEECH PERCEPTION**

Understanding how and why active cognitive processes are involved in speech perception is fundamental to the development of a theory of speech perception. Moreover, the nature of the theoretical problems that challenge most explanations of speech perception is structurally similar to some of the theoretical issues in language comprehension when considered more broadly. In addition to addressing the basis for language comprehension broadly, to the extent that such mechanisms play a critical role in spoken language processing, understanding their operation may be important both to understanding the effect of hearing loss on speech perception and to suggesting ways of remediating hearing loss. If one takes an overly simplified view of hearing (and thus damage to hearing resulting in loss) as an acoustic-to-neural signal transduction mechanism comparable to a microphone-amplifier system, the simplifying assumptions may be very misleading. The notion of the peripheral auditory system as a passive acoustic transducer leads to theories that postulate passive conversion of acoustic energy to neural signals and this may underestimate both the complexity and potential of the human auditory system for processing speech. At the very least, early auditory encoding in the brain (reflected by the auditory brainstem response) is conditioned by experience (Skoe and Kraus, 2012) and so the distribution of auditory experiences shapes the basic neural patterns extracted from acoustic signals. However, it appears that this auditory encoding is shaped from the top-down under active and adaptive processing of higher-level knowledge and attention (e.g., Nusbaum and Schwab, 1986; Strait et al., 2010).

This conceptualization of speech perception as an active process has large repercussions for understanding the nature of hearing loss in older adults. Rabbitt (1991) has argued, as have others, that older adults, compared with younger adults, must employ additional perceptual and cognitive processing to offset sensory deficits in frequency and temporal resolution as well as in frequency range (Murphy et al., 2000; Pichora-Fuller and Souza, 2003; McCoy et al., 2005; Wingfield et al., 2005; Surprenant, 2007). Wingfield et al. (2005) have further argued that the use of this extra processing at the sensory level is costly and may affect the availability of cognitive resources that could be needed for other kinds of processing. While these researchers consider the cognitive consequences that may be encountered more generally given the demands on cognitive resources, such as the deficits found in the encoding of speech content in memory, there is less consideration of the way these demands may impact speech processing itself. If speech perception itself is mediated by active processes, which require cognitive resources, then the increasing demands on additional cognitive and perceptual processing for older adults become more problematic. The competition for cognitive resources may shortchange aspects of speech perception. Additionally, the difference between a passive system that simply involves the transduction, filtering, and simple pattern recognition (computing a distance between stored representations and input patterns and selecting the closest fit) and an active system that uses context-dependent pattern recognition and signal-contingent adaptive processing has implications for the nature of augmentative hearing aids and programs of therapy for remediating aspects of hearing loss. It is well known that simple amplification systems are not sufficient remediation for hearing loss because they amplify noise as well as signal. Understanding how active processing operates and interacts with signal properties and cognitive processing might lead to changes in the way hearing aids operate, perhaps through cueing changes in attention, or by modifying the signal structure to affect the population coding of frequency information or attentional segregation of relevant signals. Training to use such hearing aids might be more effective by simple feedback or by systematically changing the level and nature of environmental sound challenges presented to listeners.

Furthermore, understanding speech perception as an active process has implications for explaining some of the findings of the interaction of hearing loss with cognitive processes (e.g., Wingfield et al., 2005). One explanation of the demands on cognitive mechanisms through hearing loss is a compensatory model as noted above (e.g., Rabbitt, 1991). This suggests that when sensory information is reduced, cognitive processes operate inferentially to supplement or replace the missing information. In many respects this is a kind of postperceptual explanation that might be like a response bias. It suggests that mechanisms outside of normal speech perception can be called on when sensory information is degraded. However, an alternative view of the same situation is that it reflects the normal operation of speech recognition processing rather than an extra postperceptual inference system. Hearing loss may specifically exacerbate the fundamental problem of lack of invariance in acoustic-phonetic relationships.

The fundamental problem faced by all theories of speech perception derives from the lack of invariance in the relationship between the acoustic patterns of speech and the linguistic interpretation of those patterns. Although the many-to-many mapping between acoustic patterns of speech and perceptual interpretations is a longstanding well-known problem (e.g., Liberman et al., 1967), the core computational problem only truly emerges when a particular pattern has many different interpretations or can be classified in many different ways. It is widely established that individuals are adept in understanding the constituents of a given category, for traditional categories (Rosch et al., 1976) or *ad hoc* categories developed in response to the demands of a situation (Barsalou, 1983). In this sense, a many-to-one mapping does not pose a substantial computational challenge. As Nusbaum and Magnuson (1997) argue, a many-to-one mapping can be understood with a simple class of deterministic computational mechanisms. In essence, a deterministic system establishes one-to-one mappings between inputs and outputs and thus can be computed by passive mechanisms such as feature detectors. It is important to note that a many-to-one mapping (e.g., rising formant transitions signaling a labial stop and diffuse consonant release spectrum signaling a labial stop) can be instantiated as a collection of one-to-one mappings.

However, when a particular sensory pattern must be classified as a particular linguistic category and there are multiple possible interpretations, this constitutes a computational problem for recognition. In this case (e.g., a formant pattern that could signal either the vowel in BIT or BET) there is ambiguity about the interpretation of the input without additional information. One solution is that additional context or information could eliminate some alternative interpretations as in talker normalization (Nusbaum and Magnuson, 1997). But this leaves the problem of determining the nature of the constraining information and processing it, which is contingent on the ambiguity itself. This suggests that there is no automatic or passive means of identifying and using the constraining information. Thus an active mechanism, which tests hypotheses about interpretations and tentatively identifies sources of constraining information (Nusbaum and Schwab, 1986), may be needed.
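The contrast between a fixed mapping and a context-contingent one can be sketched in Python. This is a minimal illustration under stated assumptions: the formant values, the single talker scaling factor, and the function names `classify_passive` and `classify_in_context` are all hypothetical and are not drawn from the studies cited above.

```python
import math

# Hypothetical reference prototypes as (F1, F2) in Hz; values are illustrative only.
PROTOTYPES = {"BIT": (400.0, 2000.0), "BET": (550.0, 1850.0)}


def classify_passive(pattern):
    """Fixed mapping: pick the stored prototype closest to the input pattern."""
    return min(PROTOTYPES, key=lambda label: math.dist(pattern, PROTOTYPES[label]))


def classify_in_context(pattern, talker_scale):
    """Context-contingent mapping: rescale the input by a talker factor first.

    talker_scale is an assumed ratio of this talker's formant frequencies to
    those of the reference talker (values below 1 mean lower formants overall).
    """
    normalized = tuple(value / talker_scale for value in pattern)
    return classify_passive(normalized)


if __name__ == "__main__":
    ambiguous = (500.0, 1900.0)
    print(classify_passive(ambiguous))
    print(classify_in_context(ambiguous, talker_scale=0.9))
```

With these illustrative values, the same input pattern is labeled BET under the fixed mapping and BIT once it is rescaled for a talker whose formants are 10% lower than the reference. The sketch takes the scaling factor as given. Deciding which source of constraint applies and estimating it from the signal is the step that the text argues cannot be carried out by a passive mechanism.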

Given that there are multiple alternative interpretations for a particular segment of speech signal, the nature of the information needed to constrain the selection depends on the source of variability that produced the one-to-many non-determinism. Variations in speaking rate, or talker, or linguistic context or other signal modifications are all potential sources of variability that are regularly encountered by listeners. Whether the system uses articulatory or linguistic information as a constraint, the perceptual system needs to flexibly use context as a guide in determining the relevant properties needed for recognition (Nusbaum and Schwab, 1986). The process of eliminating or weighing potential interpretations could well involve demands on working memory. Additionally, there may be changes in attention, towards more diagnostic patterns of information. Further, the system may be required to adapt to new sources of lawful variability in order to understand the context (cf. Elman and McClelland, 1986).

Generally speaking, these same kinds of mechanisms could be implicated in higher levels of linguistic processing in spoken language comprehension, although the neural implementation of such mechanisms might well differ. A many-to-many mapping problem extends to all levels of linguistic analysis in language comprehension and can be observed between patterns at the syllabic, lexical, prosodic and sentential level in speech and the interpretations of those patterns as linguistic messages. This is due to the fact that across linguistic contexts, speaker differences (idiolect, dialect, etc.) and other contextual variations, there are no patterns (acoustic, phonetic, syllabic, prosodic, lexical etc.) in speech that have an invariant relationship to the interpretation of those patterns. For this reason, it could be beneficial to consider how these phenomena of acoustic perception, phonetic perception, syllabic perception, prosodic perception, lexical perception, etc., are related computationally to one another and understand the computational similarities among the mechanisms that may subserve them (Marr, 1982). The fact that such a mechanism needs to flexibly respond to changes in context (and different kinds of context: word or sentence or talker or speaking rate) and constrain linguistic interpretations in context suggests that the mechanism for speech understanding needs to be plastic. In other words, speech recognition should inherently demonstrate learning.

#### **LEARNING MECHANISMS IN SPEECH**

While on its face this seems uncontroversial, theories of speech perception have not traditionally incorporated learning, although some have evolved over time to do so (e.g., Shortlist-B, Hebb-Trace). Indeed, there remains some disagreement about the plasticity of speech processing in adults. One issue is how the long-term memory structures that guide speech processing are modified to allow for this plasticity while at the same time maintaining and protecting previously learned information from being expunged. This is especially important as often newly acquired information may represent irrelevant information to the system in a long-term sense (Carpenter and Grossberg, 1988; Born and Wilhelm, 2012).

To overcome this problem, researchers have proposed various mechanistic accounts, and while there is no consensus amongst them, a hallmark characteristic of these accounts is that learning occurs in two stages. In the first stage, the memory system is able to use fast-learning temporary storage to achieve adaptability, and in a subsequent stage, during an offline period such as sleep, this information is consolidated into long-term memory structures if the information is found to be germane (Marr, 1971; McClelland et al., 1995; Ashby et al., 2007). While this is a general cognitive approach to the formation of categories for recognition, this kind of mechanism does not figure into general thinking about speech recognition theories. The focus of these theories is less on the formation of category representations and the need for plasticity *during* recognition, than it is on the stability and structure of the categories (e.g., phonemes) to be recognized. Theories of speech perception often avoid the plasticity-stability trade-off problem by proposing that the basic categories of speech are established early in life, tuned by exposure, and subsequently only operate as a passive detection system (e.g., Abbs and Sussman, 1971; Fodor, 1983; McClelland and Elman, 1986; although see Mirman et al., 2006b). According to these kinds of theories, early exposure to a system of speech input has important effects on speech processing.
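The two-stage idea can be sketched as a toy data structure in Python. This is only an illustration of the general scheme, not an implementation of any of the cited accounts. The class name `TwoStageLearner`, its methods, and the use of an exposure count as a stand-in for whether information is "germane" are all assumptions made here.

```python
class TwoStageLearner:
    """Toy two-stage memory: a fast temporary store plus a slow long-term store."""

    def __init__(self, relevance_threshold=2):
        self.long_term = {}   # stable category knowledge
        self.temporary = {}   # recent, rapidly learned mappings
        self.exposures = {}   # how often each recent mapping was encountered
        # Assumption: repeated exposure is used as a proxy for relevance.
        self.relevance_threshold = relevance_threshold

    def learn(self, pattern, category):
        """Stage 1: store a new mapping at once without altering long-term memory."""
        self.temporary[pattern] = category
        self.exposures[pattern] = self.exposures.get(pattern, 0) + 1

    def recognize(self, pattern):
        """Recent learning takes precedence; otherwise use long-term memory."""
        if pattern in self.temporary:
            return self.temporary[pattern]
        return self.long_term.get(pattern)

    def consolidate(self):
        """Stage 2 (offline): keep mappings judged relevant, then clear the buffer."""
        for pattern, category in self.temporary.items():
            if self.exposures[pattern] >= self.relevance_threshold:
                self.long_term[pattern] = category
        self.temporary.clear()
        self.exposures.clear()
```

In this sketch, new mappings are usable immediately, which gives adaptability, while previously stored knowledge is only overwritten during consolidation and only by mappings that pass the relevance check, which gives stability.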

Given the importance of early exposure for establishing the phonological system, there is no controversy regarding the significance of linguistic experience in shaping an individual's ability to discriminate and identify speech sounds (Lisker and Abramson, 1964; Strange and Jenkins, 1978; Werker and Tees, 1984; Werker and Polka, 1993). An often-used example of this is found in how infants' perceptual abilities change *via* exposure to their native language. At birth, infants are able to discriminate a wide range of speech sounds whether present or not in their native language (Werker and Tees, 1984). However, as a result of early linguistic exposure and experience, infants gain sensitivity to phonetic contrasts to which they are exposed and eventually lose sensitivity for phonetic contrasts that are not experienced (Werker and Tees, 1983). Additionally, older children continue to show developmental changes in perceptual sensitivity to acoustic-phonetic patterns (e.g., Nittrouer and Miller, 1997; Nittrouer and Lowenstein, 2007) suggesting that learning a phonology is not simply a matter of acquiring a simple set of mappings between the acoustic patterns of speech and the sound categories of language. Further, this perceptual learning does not end with childhood as it is quite clear that even adult listeners are capable of learning new phonetic distinctions not present in their native language (Werker and Logan, 1985; Pisoni et al., 1994; Francis and Nusbaum, 2002; Lim and Holt, 2011).

A large body of research has now established that adult listeners can learn a variety of new phonetic contrasts from outside their native language. Adults are able to learn to split a single native phonological category into two functional categories, such as Thai pre-voicing when learned by native English speakers (Pisoni et al., 1982) as well as to learn completely novel categories such as Zulu clicks for English speakers (Best et al., 1988). Moreover, adults possess the ability to completely change the way they attend to cues, for example Japanese speakers are able to learn the English /r/-/l/ distinction, a contrast not present in their native language (e.g., Logan et al., 1991; Yamada and Tohkura, 1992; Lively et al., 1993). While learning is limited, Francis and Nusbaum (2002) demonstrated that given appropriate feedback, listeners can learn to direct perceptual attention to acoustic cues that were not previously used to form phonetic distinctions in their native language. In their study, learning new categories was manifest as a change in the structure of the acoustic-phonetic space wherein individuals shifted from the use of one perceptual dimension (e.g., voicing) to a complex of two perceptual dimensions, enabling native English speakers to correctly perceive Korean stops after training. How can we describe this change? What is the mechanism by which this change in perceptual processing occurs?

From one perspective this change in perceptual processing can be described as a shift in attention (Nusbaum and Schwab, 1986). Auditory receptive fields may be tuned (e.g., Cruikshank and Weinberger, 1996; Weinberger, 1998; Wehr and Zador, 2003; Znamenskiy and Zador, 2013) or reshaped as a function of appropriate feedback (cf. Moran and Desimone, 1985) or context (Asari and Zador, 2009). This is consistent with theories of category learning (e.g., Schyns et al., 1998) in which category structures are related to corresponding sensory patterns (Francis et al., 2007, 2008). From another perspective this adaptation process could be described as the same kind of cue weighting observed in the development of phonetic categories (e.g., Nittrouer and Miller, 1997; Nittrouer and Lowenstein, 2007). Yamada and Tohkura (1992) describe native Japanese listeners as typically directing attention to acoustic properties of /r/-/l/ stimuli that are not the dimensions used by English speakers, and as such are not able to discriminate between these categories. This misdirection of attention occurs because these patterns are not differentiated functionally in Japanese as they are in English. For this reason, Japanese and English listeners distribute attention in the acoustic pattern space for /r/ and /l/ differently as determined by the phonological function of this space in their respective languages. Perceptual learning of these categories by Japanese listeners suggests a shift of attention to the English phonetically relevant cues.

This idea of shifting attention among possible cues to categories is part and parcel of a number of theories of categorization that are not at all specific to speech perception (e.g., Gibson, 1969; Nosofsky, 1986; Goldstone, 1998; Goldstone and Kersten, 2003) but have been incorporated into some theories of speech perception (e.g., Jusczyk, 1993). Recently, McMurray and Jongman (2011) proposed the C-Cure model of phoneme classification in which the relative importance of cues varies with context, although the model does not specify a mechanism by which such plasticity is implemented neurally.
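The notion of attention shifting among cues can be made concrete with a minimal sketch in the spirit of exemplar models such as Nosofsky (1986), in which similarity is computed over attention-weighted cue dimensions. The exemplar values, category labels, attention weights, and sensitivity parameter `c` below are illustrative assumptions and are not taken from any study cited here.

```python
import math

def similarity(x, y, weights, c=2.0):
    """Exponential-decay similarity over an attention-weighted city-block distance."""
    distance = sum(w * abs(a - b) for w, a, b in zip(weights, x, y))
    return math.exp(-c * distance)

def category_probabilities(probe, exemplars, weights, c=2.0):
    """Return P(category | probe) from summed similarity to stored exemplars."""
    totals = {}
    for label, cues in exemplars:
        totals[label] = totals.get(label, 0.0) + similarity(probe, cues, weights, c)
    norm = sum(totals.values())
    return {label: s / norm for label, s in totals.items()}

# Hypothetical exemplars: (category, (cue_1, cue_2)).
# The two categories differ only along cue_2.
EXEMPLARS = [
    ("A", (0.2, 0.1)), ("A", (0.8, 0.2)),
    ("B", (0.2, 0.9)), ("B", (0.8, 0.8)),
]

probe = (0.8, 0.15)
# Attending only to the uninformative cue leaves the categories indistinguishable.
attend_cue_1 = category_probabilities(probe, EXEMPLARS, weights=(1.0, 0.0))
# Attending only to the diagnostic cue separates them.
attend_cue_2 = category_probabilities(probe, EXEMPLARS, weights=(0.0, 1.0))
```

In this toy case the stored exemplars never change; only the distribution of attention does, which is sufficient to move the probe from chance-level to reliable classification.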

One issue to consider in examining the paradigm of training non-native phonetic contrasts is that adult listeners bring an intact and complete native phonological system to bear on any new phonetic category-learning problem. This pre-existing phonological knowledge about the sound structure of a native language operates as a critical mass of an acoustic-phonetic system with which a new category likely does not mesh (Nusbaum and Lee, 1992). New contrasts can re-parse the acoustic cue space into categories that are at odds with the native system, can be based on cues that are entirely outside the system (e.g., clicks), or can completely remap native acoustic properties into new categories (see Best et al., 2001). In all these cases however listeners need to not only learn the pattern information that corresponds to these categories, but additionally learn the categories themselves. In most studies participants do not actually learn a completely new phonological system that exhibits an internal structure capable of supporting the acquisition of new categories, but instead learn isolated contrasts that are not part of their native system. Thus, learning non-native phonological contrasts requires individuals to learn both new category structures, as well as how to direct attention to the acoustic cues that define those categories without colliding with extant categories.

How do listeners accommodate the signal changes encountered on a daily basis in listening to speech? Echo and reverberation can distort speech. Talkers speak while eating. Accents can change the acoustic to percept mappings based on the articulatory phonetics of a native language. While some of the distortions in signals can probably be handled by some simple filtering in the auditory system, more complex signal changes that are systematic cannot be handled in this way. The use of filtering as a solution for speech signal distortion assumes a model of speech perception whereby a set of acoustic-phonetic representations (whether talker-specific or not) are obscured by some distortion and that some simple acoustic transform (like amplification or time-dilation) is used to restore the signal.

An alternative to this view was proposed by Elman and McClelland (1986). They suggested that the listener can use systematicity in distortions of acoustic patterns as information about the sources of variability that affected the signal in the conditions under which the speech was produced. This idea, that systematic variability in acoustic patterns of phonetic categories provides information about the intended phonetic message, suggests that even without learning new phonetic categories or contrasts, learning the sources and structure of acoustic-phonetic variability may be a fundamental aspect of speech perception. Nygaard et al. (1994) and Nygaard and Pisoni (1998) demonstrated that listeners learning the speech of talkers using the same phonetic categories as the listeners show significant improvements in speech recognition. Additionally, Dorman et al. (1977) elegantly demonstrated that different talkers speaking the same language can use different acoustic cues to make the same phonetic contrasts. In these situations, listeners must learn to direct attention to the specific cues for a particular talker in order to improve speech perception. In essence, this suggests that learning may be an intrinsic part of speech perception rather than something added on. Phonetic categories must remain plastic even in adults in order to flexibly respond to the changing demands of the lack of invariance problem across talkers and contexts of speaking.

One way of investigating those aspects of learning that are specific to directing attention to appropriate and meaningful acoustic cues without additionally having individuals learn new phonetic categories or a new phonological system is to examine how listeners adapt to synthetic speech that uses their own native phonological categories. Synthetic speech generated by rule is "defective" in relation to natural speech in that it oversimplifies the acoustic pattern structure (e.g., fewer cues, less cue covariation) and some cues may actually be misleading (Nusbaum and Pisoni, 1985). Learning synthetic speech requires listeners to learn how acoustic information, produced by a particular talker, is used to define the speech categories the listener already possesses. In order to do this, listeners need to make use of degraded, sparse and often misleading acoustic information, which contributes to the poor intelligibility of synthesized speech. Given that such cues are not available to awareness, and that most of such learning is presumed to occur early in life, it seems difficult to understand how adult listeners could even do this. In fact, it is this ability to rapidly learn synthetic speech that led Nusbaum and Schwab (1986) to conclude that speech must be guided by active control processes.

### **GENERALIZATION LEARNING**

In a study reported by Schwab et al. (1985), listeners were trained on synthetic speech for 8 days with feedback and tested before and after training. Before training, recognition was about 20% correct, but improved after training to about 70% correct. More impressively, this learning occurred even though listeners were never trained or tested on the same words twice, meaning that individuals had not just explicitly learned what they were trained on, but instead gained generalized knowledge about the synthetic speech. Additionally, Schwab et al. (1985) demonstrated that listeners are able to substantially retain this generalized knowledge without any additional exposure to the synthesizer, as listeners showed similar performance 6 months later. This suggests that even without hearing the same words over and over again, listeners were able to change the way they used acoustic cues at a sublexical level. In turn, listeners used this sublexical information to drive recognition of these cues in completely novel lexical contexts. This is far different from simply memorizing the specific and complete acoustic patterns of particular words, and instead could reflect a kind of procedural knowledge of how to direct attention to the speech of the synthetic talker.

This initial study demonstrated clear generalization beyond the specific patterns heard during training. However, on its own it gives little insight into the way such generalization emerges. In a subsequent study, Greenspan et al. (1988) expanded on this and examined the ability of adult listeners to generalize from various training regimes, asking how acoustic-phonetic variability affects generalization of speech learning. Listeners were given training on either repeated words or novel words. When listeners memorize specific acoustic patterns of spoken words, there is very good recognition performance for those words. However, this does not afford the same level of perceptual generalization that is produced by highly variable training experiences. This is akin to the benefits of training variability seen in motor learning in which generalization of a motor behavior is desired (e.g., Lametti and Ostry, 2010; Mattar and Ostry, 2010; Coelho et al., 2012). Given that training set variability modulates the type of learning, adult perceptual learning of spoken words cannot be seen as simply a rote process. Moreover, even from a small amount of repeated and focused rote training there is some reliable generalization, indicating that listeners can use even restricted variability in learning to go beyond the training examples (Greenspan et al., 1988). Listeners may infer this generalized information from the training stimuli, or they might develop a more abstract representation of sound patterns based on variability in experience and apply this knowledge to novel speech patterns in novel contexts.

Synthetic speech, produced by rule, as learned in those studies, represents a complete model of speech production from orthographic-to-phonetic-to-acoustic generation. The speech that is produced is recognizable but it is artificial. Thus learning of this kind of speech is tantamount to learning a strange idiolect of speech that contains acoustic-phonetic errors, missing acoustic cues and does not possess correct cue covariation. However if listeners learn this speech by gleaning the new acoustic-phonetic properties for this kind of talker, it makes sense that listeners should be able to learn other kinds of speech as well. This is particularly true if learning is accomplished by changing the way listeners attend to the acoustic properties of speech by focusing on the acoustic properties that are most phonetically diagnostic. And indeed, beyond being able to learn synthesized speech in this fashion, adults have been shown to quickly adapt to a variety of other forms of distorted speech where the distortions initially cause a reduction in intelligibility, such as simulated cochlear implant speech (Shannon et al., 1995), spectrally shifted speech (Rosen et al., 1999) as well as foreign-accented speech (Weil, 2001; Clarke and Garrett, 2004; Bradlow and Bent, 2008; Sidaras et al., 2009). In these studies, listeners learn speech that has been produced naturally with coarticulation and the full range of acoustic-phonetic structure, however, the speech signal deviates from listener expectations due to a transform of some kind, either through signal processing or through phonological changes in speaking. Different signal transforms may distort or mask certain cues and phonological changes may change cue complex structure. These distortions are unlike synthetic speech however, as these transforms tend to be uniform across the phonological inventory. 
This would provide listeners with a kind of lawful variability (as described by Elman and McClelland, 1986) that can be exploited as an aid to recognition. Given that in all these speech distortions listeners showed a robust ability to apply what they learned during training to novel words and contexts, learning does not appear to be simply understanding what specific acoustic cues mean, but rather understanding what acoustic cues are most relevant for a given source and how to attend to them (Nusbaum and Lee, 1992; Nygaard et al., 1994; Francis and Nusbaum, 2002).

How do individuals come to learn what acoustic cues are most diagnostic for a given source? One possibility is that acoustic cues are mapped to their perceptual counterparts in an unguided fashion, that is, without regard for the systematicity of native acoustic-phonetic experience. Conversely, individuals may rely on their native phonological system to guide the learning process. In order to examine if perceptual learning is influenced by an individual's native phonological experience, Davis et al. (2005) examined if perceptual learning was more robust when individuals were trained on words versus non-words. Their rationale was that if training on words led to better perceptual learning than non-words, then one could conclude that the acoustic to phonetic remapping process is guided or structured by information at the lexical level. Indeed, Davis et al. (2005) showed that training was more effective when the stimuli consisted of words than non-words, indicating that information at the lexical level allows individuals to use their knowledge about how sounds are related in their native phonological system to guide the perceptual learning process. The idea that perceptual learning in speech is driven to some extent by lexical knowledge is consistent with both autonomous (e.g., Shortlist: Norris, 1994; Merge: Norris et al., 2000; Shortlist B: Norris and McQueen, 2008) and interactive (e.g., TRACE: McClelland and Elman, 1986; Hebb-Trace: Mirman et al., 2006a) models of speech perception (although whether learning can successfully operate in these models is a different question altogether). A subsequent study by Dahan and Mead (2010) examined the structure of the learning process further by asking how more localized or recent experience, such as the specific contrasts present during training, may organize and determine subsequent learning. 
To do this, Dahan and Mead (2010) systematically controlled the relationship between training and test stimuli as individuals learned to understand noise vocoded speech. Their logic was that if localized or recent experience organizes learning, then the phonemic contrasts present during training may provide such a structure, such that phonemes will be better recognized at test if they had been heard in a similar syllable position or vocalic context during training than if they had been heard in a different context. Their results showed that individuals' learning was directly related to the local phonetic context of training, as consonants were recognized better if they had been heard in a similar syllable position or vocalic context during training than if they had been heard in a dissimilar context.

This is unsurprising as the acoustic realization of a given consonant can be dramatically different depending on the position of a consonant within a syllable (Sproat and Fujimura, 1993; Browman and Goldstein, 1995). Further, there are coarticulation effects such that the acoustic characteristics of a consonant are heavily modified by the phonetic context in which it occurs (Liberman et al., 1954; Warren and Marslen-Wilson, 1987; Whalen, 1991). In this sense, the acoustic properties of speech are not dissociable beads on a string and as such, the linguistic context of a phoneme is very much a part of the acoustic definition of a phoneme. While experience during training does appear to be the major factor underlying learning, individuals also show transfer of learning to phonemes that were not presented during training provided that they were perceptually similar to the phonemes that were present. This is consistent with a substantial body of speech research using perceptual contrast procedures that showed that there are representations for speech sounds both at the level of the allophonic or acoustic-phonetic specification as well as at a more abstract phonological level (e.g., Sawusch and Jusczyk, 1981; Sawusch and Nusbaum, 1983; Hasson et al., 2007). Taken together, both the Dahan and Mead (2010) and the Davis et al. (2005) studies provide clear evidence that previous experience, such as the knowledge of one's native phonological system, as well as more localized experience relating to the occurrence of specific contrasts in a training set, help to guide the perceptual learning process.

What is the nature of the mechanism underlying the perceptual learning process that leads to better recognition after training? To examine if training shifts attention to phonetically meaningful cues and away from misleading cues, Francis et al. (2000) trained listeners on CV syllables containing /b/, /d/, or /g/ cued by a chimeric acoustic structure containing either consistent or conflicting properties. The CV syllables were constructed such that the place of articulation was specified by the spectrum of the burst (Blumstein and Stevens, 1980) as well as by the formant transitions from the consonant to the vowel (e.g., Liberman et al., 1967). However, for some chimeric CVs, the spectrum of the burst indicated a different place of articulation than the transition cue. Previously, Walley and Carrell (1983) had demonstrated that listeners tend to identify place of articulation based on transition information rather than the spectrum of the burst when these cues conflict. And of course listeners never consciously hear either of these as separate signals; they simply hear a consonant at a particular place of articulation. Given that listeners cannot identify the acoustic cues that define the place of articulation consciously and only experience the categorical identity of the consonant itself, it seems hard to understand how attention can be directed towards these cues.

Francis et al. (2000) trained listeners to recognize the chimeric speech in their experiment by providing feedback about the consonant identity that was either consistent with the burst cues or the transition cues depending on the training group. For the burst-trained group, when listeners heard a CV and identified it as a B, D, or G, they would receive feedback following identification. For a chimeric consonant cued with a labial burst and an alveolar transition pattern (combined), whether listeners identified the consonant as B (correct for the burst-trained group) or another place, after identification they would hear the CV again and see feedback printed identifying the consonant as B. In other words, burst-trained listeners would get feedback during training consistent with the spectrum of the burst whereas transition-trained listeners would get feedback consistent with the pattern of the transitions. The results showed that cue-based feedback shifted identification performance over training trials such that listeners were able to learn to use the specific cue (either transition based or spectral burst based) that was consistent with the feedback and generalized to novel stimuli. This kind of learning research (also Francis and Nusbaum, 2002; Francis et al., 2007) suggests shifting attention may serve to restructure perceptual space as a result of appropriate feedback.
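The logic of this feedback manipulation can be illustrated with a toy delta-rule learner. This is only a sketch under stated assumptions and is not the procedure or model used by Francis et al. (2000): place of articulation is reduced to a two-way contrast coded as +1 or -1, the function name `train_cue_weights`, the learning rate, and the number of epochs are hypothetical, and the initial weights simply assume a bias toward transition cues of the kind reported by Walley and Carrell (1983).

```python
def train_cue_weights(feedback_cue, epochs=50, lr=0.1):
    """Delta-rule learner over two cues; feedback_cue is 'burst' or 'transition'."""
    # Each stimulus is (burst, transition), coded +1 or -1 for a two-way place contrast.
    # The last two items are "chimeric": the two cues signal different places.
    stimuli = [(+1, +1), (-1, -1), (+1, -1), (-1, +1)]
    # Assumed starting point: listeners rely mostly on the transition cue.
    weights = {"burst": 0.2, "transition": 0.8}
    for _ in range(epochs):
        for burst, transition in stimuli:
            cues = {"burst": burst, "transition": transition}
            target = cues[feedback_cue]          # feedback follows the trained cue
            response = sum(weights[k] * cues[k] for k in cues)
            error = target - response
            for k in cues:
                weights[k] += lr * error * cues[k]
    return weights

burst_group = train_cue_weights("burst")
transition_group = train_cue_weights("transition")
```

Under these assumptions the same stimuli yield opposite weightings depending only on which cue the feedback tracks: the burst-trained learner ends with nearly all weight on the burst cue, and the transition-trained learner with nearly all weight on the transition cue.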

The standard view of speech perception does not explicitly incorporate learning mechanisms, in part because of a very static view of speech recognition whereby stimulus patterns are simply mapped onto phonological categories during recognition, and learning may occur, if it does, afterwards. These theories never directly solve the lack of invariance problem, given a fundamentally deterministic computational process in which input states (whether acoustic or articulatory) must correspond uniquely to perceptual states (phonological categories). An alternative is to consider speech perception as an active process in which alternative phonetic interpretations are activated, each corresponding to a particular input pattern from speech (Nusbaum and Schwab, 1986). These alternatives must then be reduced to the recognized form, possibly by testing these alternatives as hypotheses, shifting attention among different aspects of context, knowledge, or cues to find the best constraints. This view suggests that there should be an increase in cognitive load on the listener until a shift of attention to more diagnostic information occurs when there is a one-to-many mapping, either due to speech rate variability (Francis and Nusbaum, 1996) or talker variability (Nusbaum and Morin, 1992). Variation in talker or speaking rate or distortion can change the way attention is directed at a particular source of speech, shifting attention towards the most diagnostic cues and away from the misleading cues. This suggests a direct link between attention and learning, with the load on working memory reflecting the uncertainty of recognition given a one-to-many mapping of acoustic cues to phonemes.

If a one-to-many mapping increases the load on working memory because of active alternative phonetic hypotheses, and learning shifts attention to more phonetically diagnostic cues, learning to perceive synthetic speech should reduce the load on working memory. In this sense, focusing attention on the diagnostic cues should reduce the number of phonetic hypotheses. Moreover, this should not simply be a result of improved intelligibility, as increasing speech intelligibility without training should not have the same effect. To investigate this, Francis and Nusbaum (2009) used a speeded spoken target monitoring procedure and manipulated memory load to see if the effect of such a manipulation would change as a function of learning synthetic speech. The logic of the study was that varying a working memory load explicitly should affect recognition speed if working memory plays a role in recognition. Before training, working memory should have a higher load than after training, suggesting that there should be an interaction between working memory load and the training in recognition time (cf. Navon, 1984). When the extrinsic working memory load (to the speech task) is high, there should be less working memory available for recognition but when the extrinsic load is low there should be more working memory available. This suggests that training should interact with working memory load by showing a larger improvement of recognition time in the low load case than in the high load case. Of course if speech is directly mapped from acoustic cues to phonetic categories, there is no reason to predict a working memory load effect and certainly no interaction with training. The results demonstrated however a clear interaction of working memory load and training as predicted by the use of working memory and attention (Francis and Nusbaum, 2009). 
These results support the view that training reorganizes perception, shifting attention to more informative cues allowing working memory to be used more efficiently and effectively. This has implications for older adults who suffer from hearing loss. If individuals recruit additional cognitive and perceptual resources to ameliorate sensory deficits, then they will lack the necessary resources to cope with situations where there is an increase in talker or speaking rate variability. In fact, Peelle and Wingfield (2005) report that while older adults can adapt to time-compressed speech, they are unable to transfer learning on one speech rate to a second speech rate.

### **MECHANISMS OF MEMORY**

Changes in the allocation of attention and the demands on working memory are likely related to substantial modifications of category structures in long-term memory (Nosofsky, 1986; Ashby and Maddox, 2005). Effects of training on synthetic speech have been shown to be retained for 6 months, suggesting that categorization structures in long-term memory that guide perception have been altered (Schwab et al., 1985). How are these category structures that guide perception (Schyns et al., 1998) modified? McClelland and Rumelhart (1985) and McClelland et al. (1995) have proposed a neural cognitive model that explains how individuals are able to adapt to new information in their environment. According to their model, specific memory traces are initially encoded during learning *via* a fast-learning hippocampal based memory system. Then, *via* a process of repeated reactivation or rehearsal, memory traces are strengthened and ultimately represented solely in the neocortical memory system. One of the main benefits of McClelland's model is that it explains how previously learned information is protected against newly acquired information that may potentially be irrelevant for long-term use. In their model, the hippocampal memory system acts as temporary storage where fast-learning occurs, while the neocortical memory system, which houses the long-term memory category structures that guide perception, is modified later, presumably offline when there are no encoding demands on the system. This allows the representational system to remain adaptive without the loss of representational stability as only memory traces that are significant to the system will be strengthened and rehearsed. This kind of two-stage model of memory is consistent with a large body of memory data, although the role of the hippocampus outlined in this model is somewhat different than other theories of memory (e.g., Eichenbaum et al., 1992; Wood et al., 1999, 2000).
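The division of labor between a fast temporary store and a slowly changing long-term store can be sketched in a few lines. The example below is a deliberately minimal scalar illustration of the two-stage idea and not an implementation of the connectionist model of McClelland et al. (1995); the class name `TwoStageMemory`, the learning rates, and the number of replays are all assumptions chosen for illustration.

```python
class TwoStageMemory:
    """Toy two-stage learner: a fast store updated online, a slow store updated by replay."""

    def __init__(self, initial=0.0, fast_rate=0.9, slow_rate=0.05):
        self.fast = initial        # stands in for the fast-learning temporary store
        self.slow = initial        # stands in for the slowly changing long-term store
        self.fast_rate = fast_rate
        self.slow_rate = slow_rate

    def experience(self, observation):
        """Online encoding: only the fast store changes."""
        self.fast += self.fast_rate * (observation - self.fast)

    def consolidate(self, replays):
        """Offline replay: the slow store moves in small steps toward the fast store."""
        for _ in range(replays):
            self.slow += self.slow_rate * (self.fast - self.slow)

memory = TwoStageMemory()
memory.experience(1.0)
fast_after_one_trial, slow_after_one_trial = memory.fast, memory.slow
memory.consolidate(replays=100)
slow_after_replay = memory.slow
```

After a single trial the fast store has moved most of the way to the new observation while the slow store is untouched; only repeated offline replay brings the slow store into line, which is the sense in which stability is preserved for information that is never rehearsed.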

Ashby et al. (2007) have also posited a two-stage model for category learning, but they implement the basis for the two stages, as well as their function in category formation, very differently. They suggest that the basal ganglia and the thalamus, rather than the hippocampus, together mediate the development of more permanent neocortical memory structures. In their model, the striatum, globus pallidus, and thalamus comprise the fast learning temporary memory system. This subcortical circuit has greater adaptability due to the dopamine-mediated learning that can occur in the basal ganglia, while representations in the neocortical circuit are much slower to change as they rely solely on Hebbian learning to be amended.

McClelland's neural model relies on the hippocampal memory system as a substrate to support the development of the long-term memory structures in neocortex. Thus hippocampal memories are comprised of recent specific experiences or rote memory traces that are encoded during training. In this sense, the hippocampal memory circuit supports the longer-term reorganization or consolidation of declarative memories. In contrast, in the basal ganglia based model of learning put forth by Ashby a striatum to thalamus circuit provides the foundation for the development of consolidation in cortical circuits. This is seen as a progression from a slow based hypothesis-testing system to a faster processing, implicit memory system. Therefore the striatum to thalamus circuit mediates the reorganization or consolidation of procedural memories. To show evidence for this, Ashby et al. (2007) use information-integration categorization tasks, where the rules that govern the categories that are to be learned are not easily verbalized. In these tasks, the learner is required to integrate information from two or more dimensions at some pre-decisional stage. The logic is that information-integration tasks use the dopamine-mediated reward signals afforded by the basal ganglia. In contrast, in rule-based categorization tasks the categories to be learned are explicitly verbally defined, and thus rely on conscious hypothesis generation and testing. As such, this explicit category learning is thought (Ashby et al., 2007) to be mediated by the anterior cingulate and the prefrontal cortex. For this reason, demands on working memory and executive attention are hypothesized to affect only the learning of explicit based categories and not implicit procedural categories, as working memory and executive attention are processes that are largely governed by the prefrontal cortex (Kane and Engle, 2000).

The differences between McClelland and Ashby's models appear to be related in part to the distinction between declarative versus procedural learning. While it is certainly reasonable to divide memory in this way, it is unarguable that both types of memories involve encoding and consolidation. While it may be the case that the declarative and procedural memories operate through different systems, this seems unlikely given that there are data suggesting the role of the hippocampus in procedural learning (Chun and Phelps, 1999) even when this is not a verbalizable and an explicit rule-based learning process. Elements of the theoretic assumptions of both models seem open to criticism in one way or another. But both models make explicit a process by which rapidly learned, short-term memories can be consolidated into more stable forms. Therefore it is important to consider such models in trying to understand the process by which stable memories are formed as the foundation of phonological knowledge in speech perception.

As noted previously, speech appears to have separate representations for the specific acoustic patterns of speech as well as more abstract phonological categories (e.g., Sawusch and Jusczyk, 1981; Sawusch and Nusbaum, 1983; Hasson et al., 2007). Learning appears to occur at both levels as well (Greenspan et al., 1988) suggesting the importance of memory theory differentiating both short-term and long-term representations as well as stimulus specific traces and more abstract representations. It is widely accepted that any experience may be represented across various levels of abstraction. For example, while only specific memory traces are encoded for many connectionist models (e.g., McClelland and Rumelhart's, 1985 model), various levels of abstraction can be achieved in the retrieval process depending on the goals of the task. This is in fact the foundation of Goldinger's (1998) echoic trace model based on Hintzman's (1984) MINERVA2 model. Specific auditory representations of the acoustic pattern of a spoken word are encoded into memory and abstractions are derived during the retrieval process using working memory.
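How abstraction can arise at retrieval from a store of specific traces can be shown with a simplified sketch of the retrieval computations commonly described for MINERVA 2: each trace is activated by the cube of its similarity to the probe, and the echo is the activation-weighted sum of all traces. The feature vectors below are hypothetical and are not real speech data, and the split into "word" and "talker" features is an assumption made purely for illustration.

```python
def similarity(probe, trace):
    """Dot product normalized by the number of features nonzero in probe or trace."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant

def echo(probe, traces):
    """Return (echo intensity, echo content) for a probe against all stored traces."""
    activations = [similarity(probe, t) ** 3 for t in traces]
    intensity = sum(activations)
    content = [
        sum(a * t[j] for a, t in zip(activations, traces))
        for j in range(len(probe))
    ]
    return intensity, content

# Hypothetical traces: features 0-3 code the word, features 4-5 code the talker.
# Values are +1, -1, or 0 (unknown or absent).
TRACES = [
    [+1, +1, -1, -1, +1, -1],   # word A, talker 1
    [+1, +1, -1, -1, -1, +1],   # word A, talker 2
    [-1, -1, +1, +1, +1, -1],   # word B, talker 1
]

# Probe with word A features and the talker features left unspecified.
probe = [+1, +1, -1, -1, 0, 0]
intensity, content = echo(probe, TRACES)
```

In this toy store the echo returned for the probe carries the word features strongly and the talker features only weakly, even though no abstract word representation was ever stored; the generalization is produced by the retrieval computation itself.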

In contrast to these trace-abstraction models is another possibility wherein stimulus-specific and abstracted information are both stored in memory. For example, an acoustic pattern description of speech as well as a phonological category description are represented separately in memory in the TRACE model (McClelland and Elman, 1986; Mirman et al., 2006a). In this respect then, the acoustic patterns of speech, as particular representations of a specific perceptual experience, are very much like the echoic traces of Goldinger's model. However, where Goldinger argued against forming and storing abstract representations, others have suggested that such abstractions may in fact be formed and stored in the lexicon (see Luce et al., 2003; Ju and Luce, 2006). Indeed, Hasson et al. (2007) demonstrated repetition suppression effects specific to the abstract phonological representation of speech sounds given that the effect held between an illusory syllable /ta/ and a physical syllable /ta/ based on a network spanning sensory and motor cortex. Such abstractions are unlikely to simply be an assemblage of prior sensory traces given that the brain areas involved are not the same as those typically activated in recognizing those traces. In this way, memory can be theoretically distinguished into rote representational structures that consist of specific experienced items or more generalized structures that consist of abstracted information. Rote memories are advantageous for precise recall of already experienced stimuli, whereas generalized memory would favor performance for a larger span of stimuli in a novel context.

This distinction between rote and generalized representations cuts across the distinction between procedural and declarative memory. Both declarative and procedural memories may be encoded as either rote or generalized memory representational structures. For example, an individual may be trained to press a specific sequence of keys on a keyboard. This would lead to the development of a rote representational memory structure, allowing the individual to improve his or her performance on that specific sequence. Alternatively, the individual may be trained to press several sequences of keys on a keyboard. This difference in training would lead to the development of a more generalized memory structure, resulting in better performance on both experienced and novel key sequences. Similarly, declarative memories may be encoded as either rote or generalized structures, as a given declarative memory structure may consist of either the specific experienced instances of a particular stimulus, as in a typical episodic memory experiment, or the "gist" of the experienced instances, as in the formation of semantic memories or possibly illusory memories based on associations (see Gallo, 2006).

The argument about the distinction between rote and generalized or abstracted memory representations becomes important when considering the way in which memories become stabilized through consolidation. In particular, for perceptual learning of speech, two aspects are critical. First, given the generativity of language and the context-sensitive nature of acoustic-phonetics, listeners are not going to hear the same utterances again and again and further, the acoustic pattern variation in repeated utterances, even if they occurred, would be immense due to changes in linguistic context, speaking rate, and talkers. As such, this makes the use of rote-memorization of acoustic patterns untenable as a speech recognition system. Listeners either have to be able to generalize in real time from prior auditory experiences (as suggested by Goldinger, 1998) or there must be more abstract representations that go beyond the specific sensory patterns of any particular utterance (as suggested by Hasson et al., 2007). This is unlikely due to the second consideration, which is that any generalizations in speech perception must be made quickly and remain stable to be useful. As demonstrated by Greenspan et al. (1988) even learning a small number of spoken words from a particular speech synthesizer will produce some generalization to novel utterances, although increasing the variability in experiences will increase the amount of generalization.

The separation between rote and generalization learning is further demonstrated by the effects of sleep consolidation on the stability of memories. In the original synthetic speech learning study by Schwab et al. (1985), listeners demonstrated significant learning in spite of never hearing the same words twice. Moreover this generalization learning lasted for roughly 6 months without subsequent training. This demonstrates that high variability in training examples with appropriate feedback can produce large improvements in generalized performance that can remain robust and stable for a long time. Fenn et al. (2003) demonstrated that this stability is a consequence of sleep consolidation of learning. In addition, when some forgetting takes place over the course of a day following learning, sleep restores the forgotten memories. It appears that this may well be due to sleep separately consolidating both the initial learning as well as any interference that occurs following learning (Brawn et al., 2013). Furthermore, Fenn and Hambrick (2012) have demonstrated that the effectiveness of sleep consolidation is related to individual differences in working memory such that higher levels of working memory performance are related to better consolidation. This links the effectiveness of sleep consolidation to a mechanism closely tied to active processing in speech perception. Most recently though, Fenn et al. (2013) found that sleep operates differently for rote and generalized learning.

These findings have several implications for therapy with listeners with hearing loss. First, training and testing should be separated by a period of sleep in order to measure the amount of learning that is stable. Second, although variability in training experiences seems to produce slower rates of learning, it produces greater generalization learning. Third, measurements of working memory can give a rough guide to the relative effectiveness of sleep consolidation thereby indicating how at risk learning may be to interference and suggesting that training may need to be more prolonged for people with lower working memory capacity.

#### **CONCLUSION**

Theories of speech perception have often conceptualized the earliest stages of auditory processing of speech to be independent of higher level linguistic and cognitive processing. In many respects this kind of approach (e.g., in Shortlist B) treats the phonetic processing of auditory inputs as a passive system in which acoustic patterns are directly mapped onto phonetic features or categories, albeit with some distribution of performance. Such theories treat the distributions of input phonetic properties as relatively immutable. However, our argument is that even early auditory processes are subject to descending attentional control and active processing. Just as echolocation in the bat is explained by a corticofugal system in which cortical and subcortical structures are viewed as processing cotemporaneously and interactively (Suga, 2008), the idea is that descending projections from cortex to thalamus and to the cochlea provide a neural substrate for cortical tuning of auditory inputs. Descending projections from the lateral olivary complex to the inner hair cells and from the medial olivary complex to the outer hair cells provide a potential basis for changing auditory encoding in real time as a result of shifts of attention. This kind of mechanism could support the kinds of effects seen in increased auditory brainstem response fidelity to acoustic input following training (Strait et al., 2010).

Understanding speech perception as an active process suggests that learning or plasticity is not simply a higher-level process grafted on top of word recognition. Rather, the kinds of mechanisms involved in shifting attention to relevant acoustic cues for phoneme perception (e.g., Francis et al., 2000, 2007) are needed for tuning speech perception to the specific vocal characteristics of a new speaker or to cope with distortion of speech or noise in the environment. Given that such plasticity is linked to attention and working memory, we argue that speech perception is inherently a cognitive process, even in terms of the involvement of sensory encoding. This has implications for remediation of hearing loss either with augmentative aids or therapy. First, understanding a listener's cognitive abilities (e.g., working memory capacity, attention control, etc.) may provide guidance on how to design a training program by providing different kinds of sensory cues that are correlated or reducing the cognitive demands of training. Second, increasing sensory variability within the limits of individual tolerance should be part of a therapeutic program. Third, understanding the sleep practices of participants, using sleep logs and records of drug and alcohol consumption and exercise, is important to the consolidation of learning. If speech perception is continuously plastic but there are limitations based on prior experiences and cognitive capacities, this shapes the basic nature of remediation of hearing loss in a number of different ways.

Finally, we would note that there is a dissociation among the three classes of models that are relevant to understanding speech perception as an active process. Although cognitive models of spoken word processing (e.g., Cohort, TRACE, and Shortlist) have been developed to include some plasticity and to account for different patterns of the influence of lexical knowledge, even the most recent versions (e.g., Distributed Cohort, Hebb-TRACE, and Shortlist B) do not specifically account for active processing of auditory input. Although some models have attempted to account for active processing below the level of phonemes (e.g., TRACE I: Elman and McClelland, 1986; McClelland et al., 2006), these models have not been related or compared systematically to the kinds of models emerging from neuroscience research. For example, Friederici (2012), Rauschecker and Scott (2009), and Hickok and Poeppel (2007) have all proposed neurally plausible models largely around the idea of dorsal and ventral processing streams. Although these models differ in details, in principle the models proposed by Friederici (2012) and Rauschecker and Scott (2009) have more extensive feedback mechanisms to support active processing of sensory input. These models are constructed in a neuroanatomical vernacular rather than the cognitive vernacular (even the Hebb-TRACE is still largely a cognitive model) of the others. But both sets of models are notable for two important omissions.

First, while the cognitive models mention learning and even model it, and the neural models refer to some aspects of learning, these models do not relate to the two-process learning models (e.g., complementary learning systems [CLS]; McClelland et al., 1995; Ashby and Maddox, 2005; Ashby et al., 2007). Although CLS focuses on episodic memory and Ashby et al. (2007) focus on category learning, two-process models involving either hippocampus, basal ganglia, or cerebellum as a fast associator and cortico-cortical connections as a slower, more robust learning system have garnered substantial interest and research support. Yet learning in the models of speech recognition has yet to seriously address the neural bases of learning and memory except descriptively.

This points to a second important omission. All of the speech recognition models are cortical models. There is no serious consideration of the role of the thalamus, amygdala, hippocampus, cerebellum or other structures in these models. In taking a corticocentric view (see Parvizi, 2009), these models exhibit an unrealistic myopia about neural explanations of speech perception. Research by Kraus and colleagues (Wong et al., 2007; Song et al., 2008) demonstrates that there are measurable effects of training and experience on speech processing in the auditory brainstem. This is consistent with an active model of speech perception in which attention and experience shape the earliest levels of sensory encoding of speech. Although current data do not exist to support online changes in this kind of processing, this is exactly the kind of prediction an active model of speech perception would make but is entirely unexpected from any of the current models of speech perception.

#### **AUTHOR CONTRIBUTIONS**

Shannon L. M. Heald prepared the first draft and Howard C. Nusbaum revised and both refined the manuscript to final form.

#### **ACKNOWLEDGMENTS**

Preparation of this manuscript was supported in part by an ONR grant DoD/ONR N00014-12-1-0850, and in part by the Division of Social Sciences at the University of Chicago.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 12 September 2013; accepted: 25 February 2014; published online: 17 March 2014.*

*Citation: Heald SLM and Nusbaum HC (2014) Speech perception as an active cognitive process. Front. Syst. Neurosci. 8:35. doi: 10.3389/fnsys.2014.00035*

*Copyright © 2014 Heald and Nusbaum. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Speech perception under adverse conditions: insights from behavioral, computational, and neuroscience research

#### *Sara Guediche\*, Sheila E. Blumstein, Julie A. Fiez and Lori L. Holt*


#### *Edited by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

#### *Reviewed by:*

*Patti Adank, University College London, UK Conor J. Wild, Western University, Canada*

#### *\*Correspondence:*

*Sara Guediche, Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, 190 Thayer Street, Providence, RI 02912, USA e-mail: sara\_guediche@brown.edu*

Adult speech perception reflects the long-term regularities of the native language, but it is also flexible such that it accommodates and adapts to adverse listening conditions and short-term deviations from native-language norms. The purpose of this article is to examine how the broader neuroscience literature can inform and advance research efforts in understanding the neural basis of flexibility and adaptive plasticity in speech perception. Specifically, we highlight the potential role of learning algorithms that rely on prediction error signals and discuss specific neural structures that are likely to contribute to such learning. To this end, we review behavioral studies, computational accounts, and neuroimaging findings related to adaptive plasticity in speech perception. Already, a few studies have alluded to a potential role of these mechanisms in adaptive plasticity in speech perception. Furthermore, we consider research topics in neuroscience that offer insight into how perception can be adaptively tuned to short-term deviations while balancing the need to maintain stability in the perception of learned long-term regularities. Consideration of the application and limitations of these algorithms in characterizing flexible speech perception under adverse conditions promises to inform theoretical models of speech.

#### **Keywords: perceptual learning, plasticity, supervised learning, cerebellum, language, prediction error signals, speech perception**

Spoken language is conveyed by transient acoustic signals with complex and variable structure. Ultimately, the challenge of speech perception is to map these signals to representations (e.g., pre-lexical and lexical knowledge) of an individual's native language community. In real-world environments, this challenge is frequently exacerbated under adverse listening conditions arising from noisy listening environments, hearing impairment, or speech that deviates from long-term speech regularities due to talkers' accents, dialects or speech disorders. In circumstances where adverse conditions lead to systematic short-term deviations from the long-term regularities of a language, a listener can rapidly adjust the mappings from acoustic input to long-term knowledge. However, little is known about the mechanisms underlying adaptive plasticity in speech perception. Understanding such rapid adaptive plasticity may provide insight into how the perceptual system deals with adverse listening situations. Although there has been recent interest in investigating adaptive plasticity in speech perception, these studies have used different tasks and methodologies and remain mostly unconnected. It is one of the goals of this paper to review these findings and integrate the results within a potentially common framework.

To this end, we examine a number of factors that influence adaptive plasticity in speech perception and review behavioral, computational, and functional neuroimaging studies that have contributed to our current understanding of adaptive processes. In reviewing these mostly separate strands of research, we take the view that examining candidate neural systems that may underlie the behavioral changes could reveal a unifying framework for understanding how adaptive plasticity is achieved. We draw from domains outside of speech perception to consider supervised learning relying on sensory prediction error signals as a potential mechanism for uniting seemingly distinct behavioral speech perception phenomena. From this perspective, we propose that understanding the neural basis of adaptive plasticity in speech perception will require integrating subcortical structures into current frameworks of speech processing, which until now have largely focused on the cerebral cortex. Specifically, we examine the possibility that subcortical-cortical interactions may form functional networks for driving plasticity.

#### **INSIGHTS FROM BEHAVIORAL STUDIES**

We first examine two distinct behavioral literatures, each demonstrating adaptive changes in speech perception in response to signal distortions. One set of studies investigates improvements in *spoken word recognition* following experience with distorted signals. The other examines changes in *acoustic phonetic perception* following experience with distorted input in disambiguating contexts. Both sets of studies show changes at early stages of speech processing, which are facilitated by disambiguating contextual sources of information (e.g., lexical information). Across different studies and tasks, perceptual effects showing adaptive changes in speech perception have been variously termed "perceptual learning," "adaptation," "recalibration," and "retuning," with the choice of descriptor driven mostly by the associated task. Here, we use "adaptive plasticity" as a broader term to be inclusive of distinct literatures and different tasks that may tap into some of the same processes in adjusting speech perception to accommodate short-term deviations in speech acoustics.

#### **ADAPTIVE PLASTICITY IN WORD RECOGNITION TASKS**

Adults rapidly and effortlessly extract words from fluent speech. However, adverse listening conditions can affect the quality and reliability of the acoustic speech signal and negatively impact word recognition, reducing intelligibility (for review see Mattys et al., 2012). Under certain circumstances, brief experience with the adverse listening condition results in intelligibility improvements (e.g., Pallier et al., 1998; Liss et al., 2002; Clarke and Garrett, 2004; Bradlow and Bent, 2008). For example, several studies have shown that brief familiarization with natural foreign-accented speech can improve intelligibility of the accented talker (e.g., Clarke and Garrett, 2004; Bradlow and Bent, 2008) and, under some circumstances, generalize to intelligibility improvements for speech from other talkers with the same native language background (e.g., Bradlow and Bent, 2008). Such adaptive plasticity is observed across many acoustic speech signal distortions including synthesized text-to-speech (Schwab et al., 1985; Greenspan et al., 1988; Francis et al., 2000), dysarthric speech (Liss et al., 2002), and speech in noise (e.g., Cainer et al., 2008). It is also observed with more synthetic manipulations of the speech signal such as noise vocoding (Davis et al., 2005), spectral shifting (e.g., Fu and Galvin, 2003), and time compression (e.g., Altmann and Young, 1993). Many of these experimental manipulations relate to commonly occurring natural adverse listening experiences and some are intended to mimic the degraded experiences encountered by listeners with hearing deficits or cochlear implants. Overall, there is widespread evidence that intelligibility of distorted speech input improves with relatively brief experience or training across many different types of signal distortion.

Though the flexibility of perception under a variety of adverse listening conditions indicates the robustness of adaptive plasticity in speech perception, the use of different stimulus manipulations and different types of training and experience across studies makes it difficult to build an integrative model. However, several key characteristics merit special attention. One significant characteristic of studies in this literature is the supportive influence of information that disambiguates the acoustics of distorted words. This information may originate from external feedback indicating the appropriate interpretation of the signal. For example, intelligibility is improved when a distorted acoustic word is paired with the written form of the word during the initial presentation (e.g., Fu and Galvin, 2003) or following the response (e.g., Greenspan et al., 1988; Francis et al., 2000, 2007), and when the clear undistorted version of the signal precedes the distorted signal during training (Hervais-Adelman et al., 2008). Each of these approaches provides the speech system with information to support mapping the distorted speech signal to linguistic knowledge.

Adaptive plasticity in speech perception can even occur without explicit feedback. Mere exposure to nonnative-accented speech results in improvements in performance in the absence of explicit feedback or other explicit information about the correct interpretation (e.g., Altmann and Young, 1993; Mehler et al., 1993; Sebastian-Galles et al., 2000; Liss et al., 2002). Simply listening to time-compressed speech (Altmann and Young, 1993; Mehler et al., 1993; Sebastian-Galles et al., 2000) or natural speech from dysarthric patients (Liss et al., 2002) can lead to intelligibility improvements. Likewise, experience with distorted sentences containing real target words improves recognition of subsequent distorted sentences to a greater degree than experience with target nonwords (Davis et al., 2005). These findings suggest that internally generated lexical information may also contribute to adaptive plasticity. In sum, information that supports the disambiguation of speech, including externally provided information and internally generated lexical information, may promote adaptive plasticity (Davis et al., 2005; Hervais-Adelman et al., 2008).

A second significant characteristic of adaptive plasticity is that when external sources of information are unavailable to resolve the ambiguity of the distorted acoustic signal, the degree of adaptation appears to be dependent on the severity of the distortion (Bradlow and Bent, 2008; Li et al., 2009). For example, listeners show greater adaptation to relatively more intelligible foreign accented speech (Bradlow and Bent, 2008). Other studies have shown that adaptive plasticity is difficult for severe artificial speech distortions (e.g., Li and Fu, 2006, 2010), whereas gradually increasing the severity of the distortion (Guediche et al., 2009) or intermixing less severely distorted signals with more severe distortions (Li et al., 2009) facilitates adaptation. Indeed, adaptive plasticity can be readily observed for time-compressed speech, even without external feedback (e.g., Pallier et al., 1998). This may be because the degree of time compression generally used tends to result in more intelligible distortions (often 50–60% intelligibility or greater) (e.g., Pallier et al., 1998; Peelle and Wingfield, 2005; Adank and Janse, 2009) in comparison to text-to-speech or noise-vocoded speech distortions (e.g., Schwab et al., 1985; Francis et al., 2000; Davis et al., 2005; Hervais-Adelman et al., 2011) that typically employ feedback to promote adaptive plasticity.

A third key characteristic of adaptive plasticity is that improvements in intelligibility as a consequence of experience generalize to words not encountered during a training or exposure period (Schwab et al., 1985; Francis et al., 2000, 2007). In fact, in many studies all the words in the experiment are unique. Therefore, even though lexical knowledge can mediate adaptive plasticity by disambiguating the distorted signals (Davis et al., 2005), adaptive change must occur in the mapping of the distorted sounds to pre-lexical representations and not in the mapping from speech acoustics to any particular lexical item.

Overall, studies of adaptive plasticity in word recognition employ multiple stimulus distortions and various approaches to delivering experience with the speech distortion. Experience with the speech distortions can lead to improvements in intelligibility, through adaptive processes that retune the mapping of the distorted acoustic speech input to the speech processing system. The remapping seemingly plays out at an early stage of perception (e.g., pre-lexical). This adaptive plasticity is facilitated by the availability of disambiguating external information (such as explicit feedback or corresponding clear and undistorted speech), and also by signals that are relatively less distorted and, therefore, more intelligible. Disambiguating information and baseline intelligibility may have their influence on adaptive plasticity through a common means: each may impact the relative accuracy with which the distorted acoustics are mapped to established long-term regularities of the native language.

If both externally-provided and internally-generated information contribute to adaptive plasticity, the impact of external feedback on adaptive plasticity is likely to be greater for less intelligible signal distortions compared to more intelligible distortions. Indeed, when distortion intelligibility and the presence of external feedback are independently manipulated, the two factors interact to modulate the degree of adaptive plasticity observed (Guediche et al., 2009). Intelligibility serves as a metric for the accuracy with which listeners can map distorted signals to lexical knowledge. Greater intelligibility thus indicates greater success in mapping distorted acoustics, which may produce internal signals to guide adaptive plasticity that are less reliable or less available when intelligibility is low. In this latter case, external information that supports accurate mapping may serve to drive adaptive plasticity. We return to the implications of this possibility below.

#### **ADAPTIVE PLASTICITY IN ACOUSTIC PHONETIC PERCEPTION**

Adaptive plasticity has also been shown in other speech tasks that examine acoustic phonetic perception. Acoustic phonetic perception involves a complex mapping of acoustic speech signals that vary along multiple, largely continuous acoustic dimensions to long-term representations that respect the regularities of the native language (e.g., phonemes, words). This mapping is complicated by the fact that even when measured in quiet, well-controlled laboratory conditions, the acoustics conveying a particular phoneme or word are highly variable (e.g., Peterson and Barney, 1952).

Under adverse conditions more typical of natural listening environments, there are short-term deviations in speech acoustics introduced by sources like foreign accent, dialect, noise, different speakers, and speech disorder. These systematic deviations can distort the acoustic speech signal. A listener may encounter a native Spanish talker referring in English to a *fish* using a vowel with acoustics more typical of English /i/ (a *feesh*) than /ɪ/. The same listener might also encounter a native Pittsburgh talker chatting about the local football team, the Steelers, in the local dialect that produces English /i/ with acoustics more typical of /ɪ/ (the *Stillers*). Listeners would have little difficulty in either case as the perceptual system flexibly adjusts to such signal distortions.

A broad research literature with a long history demonstrates that ambiguous speech signals can be resolved using many sources of contextual information. Acoustic (Lotto and Kluender, 1998; Holt, 2005), lexical (Ganong, 1980), visual (McGurk and Macdonald, 1976; MacDonald and McGurk, 1978), and sentence contexts (Ladefoged and Broadbent, 1957), among others, each play a role in disambiguating speech signals. A sound with ambiguous acoustics between /g/ and /k/ is more likely to be perceived as /k/ in the context of *\_\_iss* (*kiss* is a real word, *giss* is not), but as /g/ in the context of *\_\_ift* (Ganong, 1980). Similarly, an ambiguous sound between /b/ and /d/ can be disambiguated by watching a video of a face articulating /b/ vs. /d/ (Bertelson et al., 1997). Relevant to the adaptive plasticity literature, repeated exposure to an ambiguous acoustic speech signal in a disambiguating context affects later perception of the ambiguous speech, even in the absence of a biasing context (Norris et al., 2003; Vroomen et al., 2007). This suggests an adaptive change in the way the ambiguous speech acoustics are mapped that remains even when the biasing context is no longer available.

Two such biasing contexts have been explored extensively: lexical context and visually-presented articulating faces (Bertelson et al., 2003; Norris et al., 2003; Vroomen et al., 2007). Lexically-mediated changes in acoustic phonetic perception can be achieved by exposing listeners to ambiguous speech sounds embedded in lexical contexts that only produce a valid lexical item for one of the phonemes (e.g., Norris et al., 2003; Kraljic and Samuel, 2005; Maye et al., 2008; for review see Samuel and Kraljic, 2009). For example, when an acoustically-ambiguous sound between /s/ and [ʃ] is presented in contexts for which only /s/ completes a real word (e.g., *legacy, Arkansas*), lexical knowledge provides a means of disambiguating the sound (Ganong, 1980). This experience affects subsequent [s]-[ʃ] perception such that the acoustically-ambiguous [s]-[ʃ] sound is more broadly accepted as [s] following exposure to [s]-consistent lexical contexts than following exposure to [ʃ]-consistent contexts (e.g., *pediatrician;* Kraljic and Samuel, 2005). This effect is observed even when the lexically-biasing context is no longer present. Many experiments have demonstrated such lexical tuning of acoustic phonetic perception across phonemes, languages, and talkers in adults (for review see Samuel and Kraljic, 2009) and even among 6- and 12-year-old children (McQueen et al., 2012).

Exposure to visual information from an articulating face that disambiguates an ambiguous speech sound produces similar changes in acoustic phonetic perception. Bertelson et al. (2003) examined phonetic perception of an acoustically-ambiguous token between /aba/ and /ada/. Following exposure to the ambiguous token paired with a video of a face clearly articulating /aba/, subsequent perception of the ambiguous /aba/-/ada/ stimuli was shifted such that the acoustic information was more often heard as consistent with /aba/.

Although lexical and visually-mediated adaptive plasticity have been most studied to date, other factors can also drive adaptive plasticity. Phonotactic probabilities (Cutler, 2008) and statistical regularities experienced across multiple tokens of speech exemplars (Clayards et al., 2008; Idemaru and Holt, 2011) can also result in adaptive plasticity. In the latter example, correlations among acoustic cues provide a disambiguating source of information for how acoustic dimensions relate to one another in signaling phonemes (Idemaru and Holt, 2011). These findings are consistent with a rich literature demonstrating that listeners make use of many sources of information to disambiguate inherently ambiguous acoustic speech input. The literature on adaptive plasticity extends these observations by demonstrating that upon repeated exposure, the effects of a disambiguating context can remain even in the absence of context.

Clarke-Davidson et al. (2008) argue that data demonstrating adaptive plasticity in acoustic phonetic perception are best fit by modeling adaptation at the level of perceptual (pre-lexical) processing rather than at a subsequent decision level. In general, the nature of this pre-lexical influence is to more broadly accept the ambiguous acoustics as consistent with the biasing context. In other words, the adaptive adjustments of acoustic-phonetic perception are in the direction of the disambiguating (lexical, visual, statistical) contexts. In this way, adaptive plasticity in acoustic phonetic perception bears resemblance to adaptive plasticity in word recognition reviewed above. Specifically, both examples of adaptive plasticity show that contextual information (e.g., lexical information) can drive changes in perception at a pre-lexical level.

#### **SUMMARY**

Two largely independent strands of research demonstrate rapid adaptive changes in the mapping of distorted acoustic speech signals. They have evolved in parallel, kept distinct primarily along paradigmatic lines, with little cross-talk (although Norris et al. (2003), Cutler (2008), and Samuel (2011) note commonalities). Motivated by results across these studies that show similarities, such as the contributions of both internal (e.g., lexical) and external (e.g., feedback) information sources, a common pre-lexical locus, and a similar influence of the severity of the acoustic distortion on the degree of adaptation, we explore the possibility that these commonalities reflect common mechanisms. We first review computational modeling efforts that account for adaptive plasticity, and then turn to cognitive neuroscience and neuroscience research in other domains for further insights.

## **INSIGHTS FROM COMPUTATIONAL MODELING**

Computational models assist in understanding adaptive plasticity by explicitly modeling outcomes of potential learning algorithms and relating these outcomes directly to behavioral evidence. Traditional computational models of speech perception are generally defined by hierarchically-organized layers that represent linguistic information at different levels of abstraction (e.g., perceptual/featural, pre-lexical, lexical). Two classes of hierarchical models—feedforward models (e.g., Norris, 1994; Norris et al., 2000) and interactive models (e.g., McClelland and Elman, 1986; Gaskell and Marslen-Wilson, 1997)—have been especially influential and each has provided an account of rapid adaptive plasticity, specifically focusing on lexically-mediated adaptive plasticity as measured by changes in acoustic phonetic perception (e.g., Norris et al., 2003). In the interactive model Hebb-TRACE, an unsupervised learning algorithm, Hebbian learning, is used to modify connection weights (Mirman et al., 2006), whereas a supervised learning algorithm (backpropagation) is proposed in the context of the feedforward MERGE model (Norris et al., 2003).

One influential debate between feedforward and interactive accounts is the degree to which different levels interact with one another. In feedforward models like MERGE, there is no direct feedback from lexical representations to influence online speech perception. Thus, in contrast to interactive models, adaptive plasticity arises from feedback that is dedicated solely to learning. Norris et al. (2003) propose that in this case, feedback from lexical to pre-lexical levels is used to derive an error signal that indicates the degree to which there is a discrepancy between the expected phonological representation activated by the lexical item and the one indicated by the acoustic speech signal. They propose backpropagation, first instantiated by Rumelhart et al. (1986), as an implementation of supervised learning to produce adaptive plasticity. Backpropagation uses error signals to drive changes in the weights of connections between the input speech signal and the pre-lexical information to reduce the discrepancy. Because the pre-lexical units mediate mapping between acoustic input and lexical knowledge, generalization to new words also results. While backpropagation provides a supervised learning mechanism that may capture the rapid nature of the observed behavioral effects, it is not neurobiologically plausible (Crick, 1989).
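The error-driven logic described above can be illustrated with a toy sketch. This is not the MERGE model or the implementation proposed by Norris et al. (2003); it is a single-layer delta rule (the form backpropagation takes when there are no hidden layers), and the two acoustic cues, the starting weights, the learning rate, and the function name `delta_rule_step` are all hypothetical choices made for illustration.

```python
def delta_rule_step(weights, acoustic, target, rate=0.5):
    """One supervised update of the acoustic-to-pre-lexical weights."""
    # Pre-lexical activation produced by the acoustic input alone
    prelexical = [sum(w * a for w, a in zip(row, acoustic)) for row in weights]
    # Error signal: lexically expected pattern minus the pattern the acoustics produced
    errors = [t - p for t, p in zip(target, prelexical)]
    # Adjust each weight in proportion to the error and the input that fed it
    new_weights = [[w + rate * e * a for w, a in zip(row, acoustic)]
                   for row, e in zip(weights, errors)]
    return new_weights, errors


# Rows: pre-lexical units for /s/ and /ʃ/. Columns: two hypothetical acoustic cues.
weights = [[1.0, 0.0], [0.0, 1.0]]
ambiguous = [0.5, 0.5]      # acoustics midway between the two fricatives
lexical_target = [1.0, 0.0]  # the lexical item (e.g., "legacy") expects /s/

errors_per_trial = []
for _ in range(10):
    weights, errors = delta_rule_step(weights, ambiguous, lexical_target)
    errors_per_trial.append(sum(abs(e) for e in errors))

# Present the ambiguous token again with no lexical context
after = [sum(w * a for w, a in zip(row, ambiguous)) for row in weights]
print("total error on first and last exposure:", errors_per_trial[0], errors_per_trial[-1])
print("response to ambiguous token after exposure (/s/, /ʃ/):", after)
```

In this sketch the discrepancy shrinks on every exposure, and because the change is stored in the input-to-pre-lexical weights rather than in any lexical entry, the shift toward /s/ persists when the token is later presented without a biasing word.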

Hebb-TRACE (Mirman et al., 2006) is a modification of the interactive TRACE model (McClelland and Elman, 1986) that has an added Hebbian learning algorithm. It models adaptive plasticity via adjustments in the weights mapping from input to pre-lexical representations. Lexical activation results in direct excitatory feedback from the lexical layer to pre-lexical information consistent with the word. Processing of a perceptually ambiguous sound (e.g., with acoustics between /s/ and /ʃ/) leads to partial activation of both consonants with lateral inhibitory within-level connections leading to competition between the two alternatives at the pre-lexical level. The biasing lexical context (e.g., *legacy, Arkansas*) increases the activation of the congruent phoneme (/s/) through direct excitatory feedback, granting it an advantage over the partially-activated /ʃ/. To achieve adaptive plasticity, the mapping of lower-level perceptual information to phonetic categories is adjusted via Hebbian learning such that subsequent perception of these consonants is more likely to activate the consonant consistent with the previous lexical context, even in the absence of the biasing context. By this account, the same lexical feedback that influences online acoustic phonetic perception also guides learning of the mapping of distorted speech onto pre-lexical representations. A difficulty for this account is its time course. Whereas adaptive plasticity effects can require as few as 10–20 trials to evoke, Hebbian learning has a much slower time course for learning (Norris et al., 2003; Vroomen et al., 2007).
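A minimal sketch of the Hebbian account follows. It is not the Hebb-TRACE implementation of Mirman et al. (2006): the two-unit network, the feedback strength, the inhibition constant, the learning rate, and the function names `hebbian_exposure` and `respond` are hypothetical simplifications chosen only to show how lexical feedback plus a co-activation rule can shift the input-to-pre-lexical mapping.

```python
def respond(weights, acoustic):
    """Bottom-up pre-lexical activation for (/s/, /ʃ/) with no lexical feedback."""
    return [sum(w * a for w, a in zip(row, acoustic)) for row in weights]


def hebbian_exposure(n_trials, rate=0.05, feedback=0.4, inhibition=0.5):
    """Expose a toy network to an ambiguous fricative in /s/-biasing words."""
    weights = [[1.0, 0.0], [0.0, 1.0]]   # cue-to-unit weights; row 0 = /s/, row 1 = /ʃ/
    ambiguous = [0.5, 0.5]               # acoustics midway between /s/ and /ʃ/
    lexical_feedback = [feedback, 0.0]   # excitatory feedback to /s/ only
    for _ in range(n_trials):
        bottom_up = respond(weights, ambiguous)
        net = [b + f for b, f in zip(bottom_up, lexical_feedback)]
        # Lateral inhibition: each unit is suppressed by its competitor
        active = [max(0.0, net[0] - inhibition * net[1]),
                  max(0.0, net[1] - inhibition * net[0])]
        # Hebbian rule: strengthen weights between co-active input and unit
        weights = [[w + rate * post * pre for w, pre in zip(row, ambiguous)]
                   for row, post in zip(weights, active)]
    return weights


before = respond(hebbian_exposure(0), [0.5, 0.5])
after = respond(hebbian_exposure(20), [0.5, 0.5])
print("before exposure (/s/, /ʃ/):", before)
print("after exposure  (/s/, /ʃ/):", after)
```

Unlike the supervised case, no error term appears in the update: the weights change only because the feedback-boosted /s/ unit is more active than its competitor while the ambiguous input is present. Plain Hebbian growth of this kind is unbounded, so a fuller model would need normalization or decay.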

Although the focus of traditional computational accounts has been on modeling the effect of lexical information on acoustic phonetic perception, the proposed learning mechanisms may be capable of accounting for adaptation to distorted speech input of the sort observed in the word recognition literature. Norris et al. (2003) explicitly make the connection between the mechanisms involved in adaptive plasticity of acoustic phonetic perception and those that underlie improvements in word recognition. The proposed mechanisms for lexically-guided adaptive plasticity in both MERGE and Hebb-TRACE also could be extended to accounts of other types of lexically-mediated adaptive plasticity and to effects of linguistic information at higher levels of abstraction [e.g., sentence context (Borsky et al., 1998)] or in different modalities (e.g., visual information; Vroomen et al., 2007). Nonetheless, to this point, these disparate strands of research have not been integrated and there have been few attempts to examine whether it may be possible to unite different phenomena of adaptive plasticity in speech perception on mechanistic grounds.

#### **A UNIFYING PERSPECTIVE?**

The behavioral and modeling literatures that investigate adaptive plasticity in speech processing have distinct approaches that make it challenging to draw direct comparisons. However, evaluating them together reveals that there are a few observations any account of adaptive plasticity must address. One is that information that disambiguates distorted or otherwise perceptually-ambiguous acoustic speech input rapidly adjusts the way that the system maps speech input at a pre-lexical level, such that later input is less ambiguous even when disambiguating information is no longer present to support interpretation. Long-term knowledge, external feedback, and overall intelligibility of the distorted input each seem to play a role in modulating the extent to which adaptive plasticity is observed.

A common feature among different forms of disambiguating information may be that they each provide a basis for generating predictions. This characteristic relates to recent work suggesting that predictive coding may be a useful framework for understanding speech processing. To this end, we use predictive coding as an illustrative approach for considering adaptive plasticity. Predictive coding models capitalize on the reciprocal connections between different levels of a hierarchically organized structure and provide a way for generating predictions from externally-provided context or from internally-accessed information induced by the stimulus itself (Bastos et al., 2012; Panichello et al., 2012). The idea is that feedback from higher levels in the hierarchical speech processing structure can modulate activity in lower levels. These predictions are compared with the actual sensory input such that any discrepancies result in an internally-generated prediction error signal. This error signal, in turn, drives adaptive adjustments of the internal prediction to improve alignment of future predictions with incoming input. Although there is still debate regarding the role of different sources of feedback in online perception compared to adaptive plasticity (Norris et al., 2000; McClelland et al., 2006), the generation of predictions and prediction error signals may be common to both processes.

In the domain of adaptive plasticity for acoustic phonetic perception, Vroomen and colleagues suggested that "crossmodal conflict" is responsible for driving rapid changes in perception and noted the possibility that it provides a common mechanism for both lexically-mediated and visually-mediated adaptive plasticity (Vroomen et al., 2007; Vroomen and Baart, 2012). They argued that in both cases, a discrepancy (i.e., error signal) between the information provided by different sources of information (lexical, visual) and the information provided by the input sensory modality (ambiguous acoustic speech signal) leads to adaptive plasticity. Bertelson et al. (2003), Vroomen et al. (2007), and Vroomen and Baart (2012) also noted the intriguing similarities between adaptive plasticity in speech perception and sensorimotor adaptation (e.g., adapting movements while wearing visually-distorting prism goggles; Martin et al., 1996b). Namely, each depends on discrepancies between expected and actual sensory outcomes. Although Vroomen et al.'s analogy has rarely been linked to the supervised learning algorithms that are posited as a mechanism of adaptive plasticity in the MERGE model (Norris et al., 2003), it is strikingly similar. Discrepancies between the input expected as a result of lexical activation and the *actual* activation from the input form the basis of the prediction error signals used in supervised learning accounts of adaptive plasticity, and they also relate closely to mechanisms attributed to sensorimotor adaptation in literatures outside of speech perception (see Wolpert et al., 2011 for review). Thus, consideration of the mechanisms underlying prediction error signals, generally, and sensorimotor adaptation, more specifically, may reveal a rapid and biologically-plausible neural mechanism for achieving adaptive plasticity in speech perception.

## **INSIGHTS FROM COGNITIVE NEUROSCIENCE**

#### **NEUROIMAGING EVIDENCE FOR PREDICTIVE CODING IN SPEECH PERCEPTION**

Although neuroanatomical models of speech perception differ in their details, the general consensus is that there are two or more hierarchically-organized streams that diverge from posterior superior temporal cortex (Hickok and Poeppel, 2007; Rauschecker, 2011). The popular dual-stream model by Hickok and Poeppel (2007) suggests a ventral stream that supports access to meaning and combinatorial processes, and a dorsal stream that supports access to articulatory processing. In the ventral stream, more posterior areas of temporal cortex are involved in perceptual and lower levels of speech processing, whereas more anterior temporal cortical regions are involved in more abstract higher levels of language processing (Hickok and Poeppel, 2007; Rauschecker and Scott, 2009; DeWitt and Rauschecker, 2012). In particular, superior temporal areas are recruited for sensory-based perceptual processes, posterior middle and inferior temporal areas are engaged in lexical and semantic processes, and anterior superior and middle temporal areas are involved in comprehension (Binder et al., 2004, 2009; Scott, 2012). Supporting evidence for a posterior (responding earlier) to anterior (responding later) ventral processing stream in temporal cortex comes from a variety of neuroimaging methodologies and analyses (e.g., Gow et al., 2008; Leff et al., 2008; Sohoglu et al., 2012). In the dorsal stream, parietal areas have been implicated in sensorimotor processing and frontal areas in articulatory processing. However, there is also evidence for parietal involvement in other aspects of speech processing, including semantic and conceptual processes (e.g., Binder et al., 2009; Seghier et al., 2010) and lexical and sound categorization (e.g., Blumstein et al., 2005; Rauschecker, 2012). Similarly, other functions have been attributed to frontal areas, such as suggestions that the inferior frontal gyrus (BA44/45) is engaged in syntactic and executive processes (Caplan, 1999, 2006; Binder et al., 2004; Fedorenko et al., 2012). Nonetheless, the view that multiple hierarchically organized neural streams support different aspects of perception has been established as a framework for understanding visual and auditory perception (Ungerleider and Haxby, 1994; Rauschecker and Tian, 2000), and is also becoming a widely accepted view for speech processing (e.g., Hickok and Poeppel, 2004, 2007; Rauschecker and Scott, 2009; Peelle et al., 2010; Price, 2012).

This kind of hierarchically organized system has formed the basis for understanding speech processing. For example, models that propose predictive coding also postulate a system that is hierarchically organized with reciprocal connections between different stages of processing. Although the focus of such models has been on online speech processing rather than adaptive plasticity, understanding how predictions affect changes in brain activity is essential for each of these processes. At the neural level, the predictive coding framework suggests predictions can serve to constrain perception through feedback signals from regions associated with processing information at higher levels of abstraction (e.g., frontal areas that are at higher levels in the speech hierarchy) that modulate activity in regions associated with perceptual processes (e.g., temporal areas that receive the top–down modulation) (for review see Davis and Johnsrude, 2007; Peelle et al., 2010; Wild et al., 2012b). Thus, the literature on predictive coding has focused largely on changes in frontal areas (associated with higher-level processes) and temporal areas (associated with perceptual processes). Based on hypothesized functions of different brain regions, neuroimaging studies have provided some evidence for predictive mechanisms in speech perception (e.g., Clos et al., 2012; Sohoglu et al., 2012; Wild et al., 2012) by examining effects of predictive contexts and stimulus distortions, as well as their interactions.

Consistent with a hierarchically organized predictive coding framework, manipulation of predictive contexts modulates activity in frontal areas, with greater activity typically observed for more predictive contexts (e.g., Myers and Blumstein, 2008; Gow et al., 2008; Davis et al., 2011; Clos et al., 2012; Wild et al., 2012). Not surprisingly, stimulus distortions modulate activity in temporal areas (Davis et al., 2011; Clos et al., 2012; Wild et al., 2012), which are associated with early perceptual processes. Findings from MEG provide supporting evidence that this modulation begins early in the speech processing time course (Sohoglu et al., 2012). Interestingly, effects related to manipulations of speech signal distortion seem to depend on stimulus intelligibility, with greater activity to distortion severity for intelligible stimuli and decreased response to distortion severity for unintelligible stimuli (Poldrack et al., 2001; Adank and Devlin, 2010). This U-shaped response function indicates that modulatory influences of signal distortion in temporal cortex may be dependent on multiple factors. Although not all of the studies examine or report modulatory influences of stimulus distortions on frontal areas, many studies do show increases in frontal activity associated with increases in the distortion severity (e.g., Poldrack et al., 2001; Adank and Devlin, 2010; Eisner et al., 2010).

Since the size of the prediction error signal depends on both the predictive context and the congruency of the acoustic input, one approach has been to examine the interaction between a predictive context and a stimulus distortion in order to determine potential regions that encode error signals (Spratling, 2008; Gagnepain et al., 2012; Clark, 2013). A number of studies have shown such interactions in both temporal and frontal areas (e.g., Obleser and Kotz, 2010; Davis et al., 2011; Obleser and Kotz, 2011; McGettigan et al., 2012; Sohoglu et al., 2012; Guediche et al., 2013). Davis et al. (2011) found an interaction between a semantic coherence manipulation that modulated the degree to which targets were predictable and an acoustic speech signal distortion of those targets in frontal and temporal areas, providing evidence for the involvement of the two regions in predictive coding. Sohoglu et al. (2012) examined the joint effects of the sensory distortion of a spoken word and the informativeness of preceding text resolving the distorted signal, suggesting that both factors modulate activity in temporal cortex albeit in opposing directions. That is, sensory detail evoked greater response, whereas relative alignment of the signal with top-down knowledge resulted in less response. Even more compelling evidence comes from an MEG study that demonstrated changes in activity in the superior temporal gyrus that were modulated based on differences between what was expected and what was heard. This study used a segment prediction error task, in which the beginning segment of a word predicted or did not predict the end segment (formula vs. formubo) (Gagnepain et al., 2012). That temporal areas are involved in early perceptual processes and are also sensitive to this interaction led the authors to conclude that these areas reflect the encoding of prediction errors in speech perception (Clos et al., 2012; Gagnepain et al., 2012; Sohoglu et al., 2012; Wild et al., 2012). Together, the studies suggest that predictive coding, which generates feedback signals (presumably from frontal areas), modulates temporal areas according to the predicted sensory input generated from the predictive coding context.

On the other hand, evidence from other studies suggests that the story may be more complex. For example, across studies, similar manipulations have produced different patterns of changes in BOLD signal (e.g., Davis et al., 2011 vs. Sohoglu et al., 2012). Since changes in BOLD signal may reflect different aspects of the error signal (e.g., degree, precision) (Friston and Kiebel, 2009; Hesselmann et al., 2010), there are still many open questions about the role of different regions in predictive coding. Furthermore, some interactions cannot be completely accounted for by a predictive coding framework (McGettigan et al., 2012; Guediche et al., 2013). In the predictive coding framework, activation within areas reflecting prediction error signals should increase as the degree of discrepancy between the expected and actual input increases. However, some studies have shown interdependent modulatory influences. For example, McGettigan et al. (2012) showed that responses to the *quality* or clarity of the acoustic stimulus depended on the *predictability* of the context as well as other factors associated with the stimulus properties (e.g., intelligibility). That a predictive context may lead to either increased or decreased activity as a function of the intelligibility of the stimulus (in temporal and/or parietal areas) (McGettigan et al., 2012; Guediche et al., 2013) suggests that the generation of prediction error signals may be informed by the integration of multiple sources of information, and not solely by the computation derived from a predictive context.

Above, we suggested that adaptive plasticity is guided by supervisory signals derived from discrepancies between expected and actual sensory input. The evidence we reviewed from recent studies in speech perception examining predictive coding in online speech perception is beginning to reveal the cortical networks engaged by tasks that manipulate signal distortions and predictive contexts. To date, the findings related to interactions between predictive contexts and stimulus distortions provide support for a dynamic speech processing framework where predictions can be generated from contextual sources of information and be used to derive prediction error signals. In the predictive coding framework the error signal presumably is used to optimize future predictions and drive learning mechanisms that lead to adaptive plasticity (Clark, 2013). Despite potential similarities between the mechanisms underlying these effects [although see Norris et al. (2003) for a different view], adaptive plasticity differs from the online effects of predictive context on interpreting distorted speech acoustics in that it impacts subsequent perception of speech even once disambiguating contexts are no longer available. While it is possible that predictive coding provides a means of generating prediction error signals that can be used to supervise adaptive plasticity, it is not clear how changes in activity related to predictive coding could give rise to the adaptive plasticity effects evident in the behavioral literatures reviewed above. Although many details about prediction-error-signal-driven learning remain to be discovered, it is uncontroversial that the brain integrates incoming sensory information with prior perceptual, motor, and cognitive knowledge to arrive at a unified perceptual experience.

#### **NEUROIMAGING EVIDENCE FOR ADAPTIVE PLASTICITY IN SPEECH PERCEPTION**

In an attempt to dissociate neural changes directly related to adaptive plasticity from modulatory effects of factors such as predictive context and stimulus distortions, we review studies that have specifically investigated changes in neural activity associated with adaptive plasticity (Adank and Devlin, 2010; Eisner et al., 2010; Kilian-Hutten et al., 2011a,b; Erb et al., 2013). Although tasks (word recognition and acoustic phonetic perception) and stimulus manipulations (noise-vocoded, time-compressed, ambiguous) vary across these studies, collectively they implicate the involvement of premotor, temporal, parietal, and frontal areas in adaptive speech perception.

In word recognition studies, evidence for the recruitment of temporal and premotor areas is consistent across studies. Adank and Devlin (2010) examined adaptive plasticity during exposure to time-compressed speech and showed increased activation in bilateral auditory cortex and left ventral premotor cortex associated with adaptation. They concluded that under adverse listening conditions, such as time compression, the dorsal motor stream is recruited to facilitate disambiguation of the speech signal. In a recent word recognition study, Erb et al. (2013) showed that greater changes in activity in precentral gyrus were associated with greater adaptive plasticity after exposure to a noise-vocoded speech distortion. The involvement of the motor system is consistent with prior work suggesting that motor recruitment may facilitate the resolution of perceptually ambiguous speech signals under difficult listening conditions (e.g., Davis and Johnsrude, 2003, 2007; Rauschecker, 2011; Szenkovitz et al., 2012).

The recruitment of other regions may also be important for adaptive plasticity. Eisner et al. (2010) examined adaptation to a speech distortion that simulated cochlear-implant speech input and found that activity in superior temporal cortex and inferior frontal gyrus corresponded with improvements in intelligibility with training. They also found that learning over the course of the experiment corresponded to modulation of activity in a parietal area, specifically the angular gyrus. The angular gyrus may be ideally suited for guiding the adaptation process, as its functional and structural connectivity with other brain regions suggests that it may provide a point of convergence for motor, sensory, and more abstract linguistic information (Binder et al., 2009; Friederici, 2009; Turken and Dronkers, 2011). Guediche et al. (accepted) also showed differences in frontal and temporal areas before vs. after adaptation to vocoded and spectrally-shifted speech. Taken together, changes in frontal, temporal, and premotor areas have been associated with manipulations of disambiguating contexts and the severity/intelligibility of the distorted stimuli.

Fewer studies have investigated visually- and lexically-mediated adaptive plasticity of acoustic phonetic perception using neuroimaging. One study examined visually-mediated adaptive plasticity using videos of articulating faces to disambiguate ambiguous acoustic speech stimuli (Kilian-Hutten et al., 2011a). As in the behavioral study by Bertelson et al. (2003), exposure to an ambiguous token paired with a video of a face clearly articulating one of the phonetic alternatives led the ambiguous token to be perceived more often as the alternative consistent with the articulating face in a later acoustic phonetic perception task. Kilian-Hutten et al. (2011a) showed that the perceptual interpretation of the ambiguous sounds could be decoded with multi-voxel pattern analysis in temporal areas (adjacent to and encompassing Heschl's gyrus). This demonstrates a change in the neural pattern of activity consistent with the perceptual change relatively early in auditory cortical networks caused by adaptive plasticity. In order to identify regions involved in learning, Kilian-Hutten et al. (2011b) examined how brain activity during adaptation was related to later perception of the ambiguous stimuli. They found that the visually-mediated adaptive plasticity of acoustic phonetic perception corresponded to changes in activity in a network of areas including frontal, temporal, and parietal areas (Kilian-Hutten et al., 2011b).

To our knowledge, only one neuroimaging study has examined lexically-mediated adaptive plasticity in acoustic phonetic perception (Mesite and Myers, 2012). Similar to the behavioral study by Kraljic and Samuel (2005), two groups of participants were exposed to ambiguous [s]-[ʃ] tokens in different biasing lexical contexts. They showed between-group changes in subsequent acoustic phonetic perception of the ambiguous tokens presented without lexically-disambiguating contexts. The behavioral changes in acoustic phonetic perception were associated with differences in the activity of right frontal and middle temporal areas. The limited data that exist thus suggest that, similar to the findings from word recognition studies, adaptive plasticity evidenced in acoustic phonetic perception of ambiguous phonetic categories engages a network of frontal, temporal, and parietal areas.

Because of the use of different stimuli, tasks (examining context effects vs. adaptation effects), and analyses (focusing on specific changes and sometimes specific regions) across studies, many questions remain open. Furthermore, even though there is a great deal of evidence supporting the multiple stream view of speech processing, there is still debate regarding the role of specific regions in speech and language processes. Despite these caveats, the current evidence is consistent with a view that frontal (e.g., inferior frontal and middle frontal gyrus) and temporal areas (e.g., superior temporal and middle temporal gyrus) are sensitive to context and stimulus properties. Frontal areas may provide the source of the predictive feedback, potentially involving different frontal areas for different sources of contextual information (Rothermich and Kotz, 2013) and may modulate activity in temporal areas associated with earlier perceptual processes (Gagnepain et al., 2012). Changes in brain activity related to adaptive plasticity may rely more specifically on the recruitment of higher association areas (e.g., parietal cortex) that seem to relate more directly to adaptive plasticity (Obleser et al., 2007; Eisner et al., 2010; Guediche et al., accepted). In all, the literatures investigating the neural basis of predictive coding and adaptive plasticity complement one another and can be leveraged for developing and refining a more detailed model of the dynamic, flexible nature of speech perception.

Despite these advances in our understanding of how specific cortical regions may contribute to a dynamically adaptive speech perception network, presently, there is no formal speech perception model that relates activity in the cortical regions identified via neuroimaging to the computational demands of adaptive plasticity in speech perception. Conversely, the classic computational models of speech perception that have attempted to differentiate how the system may meet the computational demands of adaptive plasticity have not made specific predictions of the underlying neural mechanisms. Next-generation models will need to bridge this divide to explain how adaptive changes in perception are reflected in brain activity and how they take place without undermining the stability of and sensitivity to long-term regularities.

We next examine literatures outside of speech perception for insight into how we may make progress toward meeting these challenges. Inasmuch as it relates to the dual demands of maintaining long-term representations that respect regularities of the environment while flexibly adjusting perception to short-term deviations from these regularities, adaptive plasticity is not unique to speech perception. Preserving the balance between stability and plasticity is important for perceptual, motor and cognitive processing in many domains. Consequently, research outside the domain of speech perception may provide insight regarding the development of a biologically plausible account of adaptive plasticity in speech processing that captures the significant behavioral characteristics we outlined above.

## **INSIGHTS FROM NEUROSCIENCE**

Thus far, research on the neural basis of adaptive plasticity in speech perception has been largely focused on cerebral cortical regions. In the section that follows, we argue that the cerebellum plays a role in adaptive plasticity in speech perception. Specifically, we review evidence from sensorimotor learning for cerebellar involvement in perception, predictive coding, and adaptive plasticity. We consider the potential importance of cerebro-cerebellar interactions in generating prediction errors derived from discrepancies between predicted and actual sensory input. Such a mechanism may provide a way to unite the seemingly distinct behavioral speech perception phenomena we reviewed above. Finally, we propose that such a mechanism may be especially relevant since it offers a means to achieve rapid adjustment of perception in response to short-term deviations without undermining the stability of learned long-term regularities.

It may seem surprising to consider the cerebellum as part of a network involved in perceptual plasticity as, historically, the cerebellum has been considered a primarily motor structure. Since many neuroimaging studies of speech perception are focused on changes in perisylvian areas, data collection and/or analyses often fail to consider the cerebellum. However, outside the domain of speech perception, there has been increased interest in the cerebellum's role in non-motoric functions, with some limited but compelling evidence that it is involved in cognitive functions, including language (Fiez et al., 1992; Desmond and Fiez, 1998; Thach, 1998; Strick et al., 2009; although see Glickstein, 2006 for debate). This perspective posits that the cerebellar system plays an important role in supervised learning across many different domains through the manipulation of internal models (Ito, 2008). We next briefly review evidence for cerebellar involvement in sensorimotor adaptation.

#### **CEREBELLAR-DEPENDENT SUPERVISED LEARNING IN SENSORIMOTOR TASKS**

In the sensorimotor domain, the underlying mechanisms of adaptation to sensory input distortions have been explored extensively, with multiple lines of evidence underscoring the significance of the cerebellum. A classic behavioral task demonstrating sensorimotor adaptation is visually-guided reaching while wearing prism goggles (e.g., Martin et al., 1996b). When prism goggles that shift the visual field several degrees distort sensory input, motor behavior in a visually guided reaching task is impacted. Initially, reaches are off-target. However, participants rapidly adapt to the distorted sensory input across 10–20 reaches, as evidenced by successful on-target reaching (Martin et al., 1996b). Such sensorimotor adaptation is observed across many stimulus distortions and motor behaviors (Kawato and Wolpert, 1998; Wolpert et al., 1998). Clinical studies examining performance on sensorimotor tasks in patients with cerebellar damage (Martin et al., 1996a; Ackermann et al., 1997), functional neuroimaging studies examining changes in neural activity in short-term adaptation tasks (Clower et al., 1996), and lesion studies with non-human primates (Kagerer et al., 1997; Baizer et al., 1999) all implicate the cerebellum as having an important role in such sensorimotor adaptation.

The role of the cerebellum in sensorimotor adaptation has been attributed largely to supervised learning mechanisms based on internally-generated sensory prediction errors (e.g., Doya, 2000; Shadmehr et al., 2010). Cerebellar-dependent supervised learning within the context of sensorimotor adaptation is thought to rely on the internal generation of sensory prediction error signals derived from discrepancy between the predicted and actual sensory input (Wolpert et al., 2011). The predicted sensory input is the expected outcome of a planned movement (a reach, for example) and can thus be derived from the "internal model" of the input-output relationship of sensory and motor information. With repeated visually-guided reaches while wearing prism goggles, for example, the sensory prediction errors reconfigure the relationship among visual, motor, and proprioceptive information sources to optimize future predictions and minimize error signals, leading to adaptation evidenced by more accurate reaching on subsequent trials (Kawato and Wolpert, 1998; Bedford, 1999; Desmurget and Grafton, 2000; Flanagan et al., 2003; Scott and Wise, 2004; Shadmehr et al., 2010; Clark, 2013).
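The error-driven account described above can be illustrated with a minimal sketch. It assumes a single scalar "internal model" (one correction term), a constant prism shift, and a fixed learning rate; the function name, parameter values, and the delta-rule update are illustrative assumptions and are not taken from any of the cited models.

```python
def simulate_prism_adaptation(shift_deg=10.0, learning_rate=0.3, n_reaches=20):
    """Delta-rule sketch of adapting reaches to a constant visual shift.

    All names and values here are hypothetical, chosen for illustration only.
    """
    correction = 0.0  # internal model's current estimate of the visual shift
    errors = []
    for _ in range(n_reaches):
        # Sensory prediction error: the part of the shift not yet compensated.
        error = shift_deg - correction
        errors.append(error)
        # Supervised, error-driven update of the internal model.
        correction += learning_rate * error
    return errors


errors = simulate_prism_adaptation()
for reach, err in enumerate(errors, start=1):
    print(f"reach {reach:2d}: error = {err:.3f} deg")
```

Under these assumptions the reaching error shrinks geometrically from trial to trial, which is qualitatively in line with the rapid adaptation over 10–20 reaches described for prism tasks. The sketch does not attempt to capture aftereffects or explicit compensation strategies.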

Such sensorimotor adaptation is also evident in the domain of speech. Adaptation is observed when speakers experience sensory input distortions while talking, such as through real-time manipulation of voice acoustics to alter acoustic feedback from one's own voice or via somatosensory perturbations that alter the feel of speech articulation (e.g., Houde and Jordan, 1998, 2002; Perkell et al., 2007; Villacorta et al., 2007; Shiller et al., 2009; Golfinopoulos et al., 2011; Chang et al., 2013). Speakers quickly adjust their production in a direction that compensates for the sensory input distortion (Houde and Jordan, 1998). In this way, speech production exhibits compensatory motor changes in response to distorted sensory input just as observed for other sensorimotor tasks (Houde and Jordan, 1998, 2002; Jones, 2003). A range of acoustic manipulations has been examined including shifts in fundamental frequency, vowel formant frequency, and the timing of auditory speech feedback (Houde and Jordan, 1998; Jones and Munhall, 2000; Perkell et al., 2007). These shifts can be quite extreme. In one study, participants produced a completely different vowel sound relative to the intended target after they were exposed to vowel formant shifts (Houde and Jordan, 1998).

Neuroanatomical models of speech production have incorporated the idea of internal models that represent the relationship between the sensory input and motor output (Guenther, 1995; Guenther and Ghosh, 2003; Kotz and Schwartze, 2010; Tian and Poeppel, 2010; Price et al., 2011). Guenther (1995) and Guenther and Ghosh (2003) developed a neuroanatomically-based computational model of speech production that incorporates expected relationships between a desired sensory outcome, the motor commands that should produce this outcome, and the actual sensory consequences of the produced speech. The DIVA (Directions Into Velocities of Articulators) model consists of several cerebral cortical areas that interact with the cerebellum, forming a network that guides sensorimotor adaptation in speech production. Through these interactions, internal models can be used to detect and correct errors under sensory input perturbations. Neuroimaging studies of sensorimotor adaptation in speech production have yielded results consistent with predictions from this model. In a study that investigated somatosensory perturbations by using a device to block jaw movement, increases in the BOLD signal were observed across left inferior frontal gyrus, ventral premotor cortices, supramarginal gyri, and the cerebellum, consistent with the model's predictions. These results provided support for the view that cerebro-cerebellar interactions are involved in sensorimotor adaptation in speech (Golfinopoulos et al., 2011). A recent study by Zheng et al. (2013) suggests that multiple interacting functional networks are involved in coding different aspects of the error signals.
As reviewed briefly above, although the speech production literature has focused largely on cerebral cortical areas (e.g., Price et al., 2011; but see Guenther and Ghosh, 2003), there is convergent evidence from other literatures that supervised prediction error learning involves cerebro-cerebellar interactions (Doya, 2000; Ito, 2008; Wolpert et al., 2011). In the current speech production models, generation of prediction error signals may relate to those in speech perception either through the sensory expectations that are generated from internal speech processes (e.g., Tian and Poeppel, 2010) or from phonological information (e.g., Price et al., 2011).

There is still debate regarding the role of the motor system in generating predictions during speech perception. Pickering and Garrod (2007) suggested that multiple levels of linguistic information (e.g., semantic, syntactic) engage speech production processes to generate predictions. More recently, Tian and Poeppel (2013) instructed participants to engage in overt speaking, covert/imagined speaking, or imagined hearing and found that there may be differences in how predictions are generated depending on the nature of the speaking tasks participants were engaged in. Tian and Poeppel (2013) suggest that linguistic information retrieved from memory, as well as inner speech processes, can be used to generate predictions and modulate activity in regions associated with perceptual processes. This is consistent with models of visual perception, which also suggest that multiple sources of information can provide feedback to early visual areas (Mumford, 1992; Rao and Ballard, 1999). Thus, cerebellar-dependent supervised learning mechanisms may contribute to adaptive plasticity in speech perception that may operate on prediction error signals derived directly from different sources of linguistic information, indirectly from inner speech motor processes, or both.

Although the focus of research has been, and continues to be, on cerebellar contributions to the adaptive control of movement through sensorimotor adaptation, there is mounting evidence that the cerebellum is also involved in many other perceptual (Ivry, 1996; Petacchi et al., 2005) and cognitive behaviors (Fiez et al., 1992; Desmond and Fiez, 1998; Thach, 1998; Strick et al., 2009). At the outset, we noted that the cerebellum is increasingly recognized to play an important role in supervised learning, across many domains, through the manipulation of internal models (Ito, 2008). In sensorimotor learning, sensory prediction errors realign internal models of sensorimotor relationships. If the role of the cerebellum is more general, it is possible that it is involved in supervised learning that serves to align sensory input with predictions arising from *nonmotor* sources, thus extending cerebellar-dependent supervised learning outside sensorimotor domains (e.g., Doya, 2000; Ito, 2008; Strick et al., 2009).

Indeed, in a nonmotor perceptual task, recent evidence points to cerebellar involvement in perception of spatiotemporal relationships. Roth et al. (2013) recently demonstrated that cerebellar patients are impaired in their ability to adapt to discrepancies in a nonmotor task that relies on spatiotemporal judgments about a visual target. This study provides direct evidence of cerebellar involvement in perceptual adaptation within an entirely nonmotor task that is not dependent on the consequences of one's own motor behavior. There is also evidence that the cerebellum is involved in encoding acoustic sensory prediction error signals in a nonmotor task. Activity in the cerebellum is modulated by sensory changes in an acoustically presented stimulus (Schlerf et al., 2012) and by different forms of predictive information (Rothermich and Kotz, 2013). In sum, intriguing recent results, even outside the domain of speech perception, suggest the possibility of cerebellar involvement in supervised learning that extends beyond sensorimotor interactions.

In light of known interactions between perception and production, a relationship between the mechanisms that underlie sensorimotor and sensory adaptation seems likely. In fact, even sensorimotor adaptation can evoke "purely" perceptual shifts that are unaccounted for by changes in motor output (e.g., Shiller et al., 2009; Nasir and Ostry, 2009; Mattar et al., 2011). For example, Shiller et al. (2009) demonstrated that after sensorimotor adaptation of speech production induced by altered auditory feedback of a listener's own /ʃ/ (as in *ship*) productions, subsequent perception of another talker's /s/-/ʃ/ (as in *sip* to *ship*) sounds was also shifted. Thus, the consequences of sensorimotor adaptation (attributed to cerebellar supervised learning mechanisms) may have a perceptual component that is unrelated to changes in motor output.

The link between sensorimotor adaptation and sensory adaptation, together with recent evidence implicating the cerebellum in purely perceptual adaptation (e.g., Roth et al., 2013), suggests that the supervised learning mechanisms posited for sensorimotor adaptation in speech (Houde and Jordan, 1998; Jones and Munhall, 2000; Guenther and Ghosh, 2003; Shiller et al., 2009) can also provide a framework for understanding adaptive plasticity in speech perception. In speech perception, predictions about sensory input may be derived from multiple sources of information (e.g., lexical, visual) that constrain listeners' interpretation of incoming acoustic signals.

Guediche et al. (accepted) recently examined the potential for cerebellar contributions to adaptive plasticity in speech perception. To this end, they examined neural activity linked to improvements in recognition of acoustically distorted words. Several cerebellar regions showed significantly different activation before, compared to after, adaptation to acoustically distorted words. Activity in one region, right Crus I (previously implicated in language tasks; Stoodley and Schmahmann, 2009; Keren-Happuch et al., 2012) was significantly correlated with behavioral improvement measures of adaptive plasticity during the adaptation phase of the experiment. A seed functional correlation analysis revealed that hemodynamic responses in right Crus I during adaptation significantly covaried with areas in parietal and temporal cortices. This evidence is consistent with prior functional neuroimaging findings implicating these cerebral cortical regions in adaptive plasticity (e.g., Eisner et al., 2010), and extends those prior findings to include the cerebellum as part of a cerebro-cortical functional network that contributes to adaptive changes in speech perception.

In sum, the recent theoretical development and empirical investigation of predictive coding and adaptive plasticity in speech processing, as reviewed above, offers a framework for understanding how prediction errors may be computed, represented, and used to optimize perception. Although prior neuroimaging studies of speech perception adaptation and predictive coding have specifically focused on changes in cerebral cortical areas, the converging lines of evidence described above are consistent with the involvement of cerebellar-supervised learning via cerebro-cerebellar interactions. We are proposing that the cerebellum plays a key role in adaptive plasticity and critically provides a mechanism that can allow for plasticity in the context of a stable perceptual system. In particular, the cerebellum provides an established neural mechanism known to be involved in rapid adaptive plasticity. More research will be needed to examine this issue but this hypothesis provides a working framework for examining the dual roles of stability and plasticity in cognitive systems generally, and in speech perception in particular.

Finally, with regard to maintaining stability it is notable that there is evidence for the possibility that the cerebellum (potentially through interactive loops with cerebral cortex) can maintain multiple adaptive adjustments to internal models (Cunningham and Welch, 1994; Martin et al., 1996b; Imamizu et al., 2003). This provides the means for rapid and short-term adaptive plasticity that can be implemented without catastrophically affecting the stability of long-term regularities. Most germane to adaptive plasticity in speech perception, it presents the opportunity for multiple relationships between acoustic input and linguistic information to be simultaneously represented, such as might be necessary to maintain adaptation to different speakers or different accents. Thus, future neuroimaging efforts should be attentive to including the cerebellum (and potentially other subcortical structures) in the network of regions investigated as contributing to adaptive plasticity in speech perception.
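The idea that several adaptive adjustments can coexist without overwriting one another can be sketched as follows. This is a toy illustration under strong assumptions: each context (for example, a talker or an accent) is given its own scalar correction, updated by a simple error-driven rule. The class name, context labels, and numerical values are hypothetical and do not correspond to any model in the cited literature.

```python
class ContextGatedModels:
    """Keeps one scalar correction per context (hypothetical sketch)."""

    def __init__(self, learning_rate=0.3):
        self.learning_rate = learning_rate
        self.corrections = {}  # context label -> learned correction

    def adapt(self, context, distortion, n_trials=20):
        # Start from whatever was previously learned for this context.
        correction = self.corrections.get(context, 0.0)
        for _ in range(n_trials):
            error = distortion - correction  # prediction error for this context
            correction += self.learning_rate * error
        # Only the entry for this context changes; others are left untouched.
        self.corrections[context] = correction
        return correction


models = ContextGatedModels()
models.adapt("talker_A", distortion=5.0)
models.adapt("talker_B", distortion=-3.0)
print(models.corrections)
```

Because each context has its own entry, adapting to a second talker leaves the adjustment learned for the first talker intact. How contexts would be identified and gated in a biological system is left open by this sketch.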

#### **CONCLUSIONS AND FUTURE DIRECTIONS**

Everyday speech communication largely takes place in suboptimal or even adverse listening conditions, at least relative to the pristine listening environments in which most research is conducted. The acoustic speech signals most often conveying meaning to listeners in everyday conversation carry the influence of noisy environments, foreign accented talkers, reduced conversational speech, and dysfluency (see Mattys et al., 2012). We have reviewed several parallel behavioral literatures that demonstrate that the perceptual system makes rapid adaptive adjustments in response to distorted acoustic speech input. We make the case that these largely unconnected behavioral literatures, which have focused on different aspects of speech processing (spoken word recognition and acoustic-phonetic perception) may, in fact, be linked by common factors. We have reviewed computational modeling in the speech perception and neuroscience literatures within and outside the field of speech communication. We have considered how these literatures speak to prospective mechanisms and their ability to unite the behavioral literatures on adaptive plasticity in word recognition and acoustic-phonetic perception. In addition, we considered two separate, but complementary, neuroimaging literatures on predictive coding and adaptive plasticity, with the goal of informing the mechanistic basis of adaptive plasticity in speech perception. Both predictive coding and adaptive plasticity models posit mechanisms for encoding error signals when there is a discrepancy between predicted and actual sensory input. Supervised learning mechanisms that rely on prediction error signals for rapid adaptive plasticity have been well-established in the sensorimotor literature, including speech production adaptation tasks, and have been attributed to cerebro-cerebellar interactions. More recently, they have been implicated in nonmotor, perceptual tasks including speech perception.
We posit that these findings suggest prediction error-driven learning orchestrated via cerebro-cerebellar interactions may play a role in adaptive plasticity in speech perception.

Based on the synthesis of these literatures, we argued that the generation of predictions, prediction error signals, and supervised learning may be significant in driving adaptive plasticity. In particular, we highlighted the potential for a cerebellar-dependent supervised learning mechanism to play a role in adaptive plasticity in speech perception and described preliminary evidence that supports this possibility. This perspective suggests some directions for future research that will better develop neurobiological models of speech communication that capture the dynamic, online flexibility of the system.

Although a great deal of evidence points to the importance of subcortical-cortical interactions in adaptive plasticity in other domains, the mainstream literature on speech perception has yet to make significant contact with the literature on subcortical contributions to adaptive plasticity. Neuroscience research relevant to adaptive plasticity in speech perception and, indeed, to speech perception more generally, has tended to be focused on the cerebrum. Although we know less about contributions of subcortical structures in speech perception, there have been a number of studies that have highlighted roles for the cerebellum, thalamus, caudate, and the brainstem that may be defined by specific functions, or interactions with specific regions in cerebral cortex (Ravizza, 2003; Tricomi et al., 2006; Song et al., 2008, 2011, 2012; Stoodley and Schmahmann, 2009; Anderson and Kraus, 2010; Stoodley et al., 2012; Erb et al., 2013).

In the broader neuroscience literature, developing perspectives have suggested that different types of learning mechanisms may be subserved by different neural systems. At least three types of potentially distinct and interacting learning circuits have been proposed for unsupervised, reinforcement, and supervised learning (see Doya, 2000; Hoshi et al., 2005; Bostan et al., 2010; Wolpert et al., 2011). Doya (2000) suggested that unsupervised learning algorithms depend mostly on long-term changes in cerebral cortex that can be incorporated over longer timecourses. Reinforcement learning, on the other hand, relies on information to predict reward outcomes. In speech perception, reinforcement learning has been examined in the context of non-native category learning. In a functional neuroimaging study, Tricomi et al. (2006) examined learning with performance feedback and found that basal ganglia activity was modulated by the presence of feedback during a non-native phonetic category perception task, just as it is in other reinforcement learning tasks (e.g., Delgado et al., 2000). Whereas reinforcement learning may optimize subsequent reward prediction error and engage the basal ganglia, supervised learning may optimize sensory prediction error signals by engaging the cerebellum.
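The contrast between the two error signals can be written compactly. As a hedged illustration only (the symbols are assumptions introduced here, not notation from the cited studies), a reward prediction error of the standard temporal-difference form and a sensory prediction error may be expressed as:

$$
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t), \qquad e_t = y_t - \hat{y}_t
$$

Here $r_t$ is the reward received, $V$ is the estimated value of a state, $\gamma$ is a discount factor, $y_t$ is the actual sensory input, and $\hat{y}_t$ is the sensory input predicted by an internal model. In both cases learning adjusts internal estimates so as to reduce the error; the two schemes differ in what is being predicted (future reward versus sensory input).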

In speech perception, both unsupervised and supervised learning mechanisms have been used to account for adaptive plasticity (Norris et al., 2003; Mirman et al., 2006). Outside the domain of speech perception, unsupervised learning mechanisms are generally used to model learning that arises over longer time courses (McClelland et al., 1995; O'Reilly, 2001; Grossberg, 2013) than the learning that characterizes adaptive plasticity. Supervised learning in models of speech perception has not accounted for many known behavioral and biological constraints. However, outside the domain of speech perception, recent models have explored a number of alternatives for achieving neurobiologically plausible supervised learning algorithms (e.g., Yu et al., 2008; Chinta and Tweed, 2012).

In speech, there is behavioral evidence that listeners can achieve greater levels of adaptation that go beyond those reached with rapid adaptation training paradigms, if they are exposed to multiple sessions with consolidation (Banai and Lavner, 2012). Improvements in word recognition for distorted acoustic input degrade over the course of a day-long retention interval, but are fully restored with sleep; sleep thus appears to stabilize what is learned in adaptation to distorted speech (Fenn et al., 2003), with word recognition improvements lasting as long as 6 months (Schwab et al., 1985). Thus, a fully mechanistic account of speech processing will require an understanding of how and to what extent different learning mechanisms interact with one another to influence speech processing. Some computational accounts of perception have begun to incorporate different types of learning algorithms within single systems (Hinton and Plaut, 1987; O'Reilly, 2001; Kleinschmidt and Jaeger, 2011; Grossberg, 2013). One challenge for models of speech processing is to account for the equilibrium that must be maintained between mechanisms involved in preserving stability while supporting plasticity.

In light of the parallels we have drawn between adaptive plasticity in speech perception and sensorimotor adaptation, it is interesting to note that research has demonstrated retention of sensorimotor adaptation effects over more than a year (Yamamoto et al., 2006), suggesting that cerebellar-dependent supervised learning can evoke changes in internal models that are maintained across long time periods. Yamamoto et al. speculate that the extent to which sensorimotor adaptation is retained depends on an interaction between the number of training trials and the magnitude of the distortion, with more subtle distortions leading to longer-lasting adaptation, perhaps because they evoke smaller errors and avoid engaging explicit compensation mechanisms (Redding and Wallace, 1996). These issues have not been investigated in the adaptive plasticity of speech perception, but have important implications for long-lasting adaptation in speech perception. Understanding the details of the interplay between the different types of learning mechanisms will be crucial for understanding how the system maintains balance between stability and plasticity in speech perception.

Beyond delineating the learning mechanisms available to guide adaptive plasticity in speech perception, there also are many open questions regarding the nature of putative prediction errors and how predictions may be derived from various information sources. The field has focused much attention on the role of lexical information in driving adaptive plasticity. Other sources of information, such as co-speech gestures from arm and hand movements associated with speech communication (Skipper et al., 2009), semantic or sentence context (e.g., Borsky et al., 1998; Zekveld et al., 2012), knowledge about the speaker (Samuel and Kraljic, 2009), or previously learned voice-face associations (von Kriegstein et al., 2008) may provide a basis for disambiguating distorted acoustic input via prediction errors and, potentially, may drive adaptive plasticity. Indeed, in more natural communication, many different information sources converge to constrain predictions and disambiguate acoustic speech input. The emerging framework we have begun to sketch unites the means by which these very different information sources drive adaptive plasticity in speech perception. These other sources of information provide a constraint on the predictions the system makes about the intended message and, in turn, affect the sensory prediction that is made and the prediction error that results. Moreover, since both internally-generated and external sensory input inform predictions, it becomes easier to reconcile seemingly distinct influences of acoustic sensory distortions and higher-level influences such as expectations about speaker- or context-specific factors that influence speech (Kraljic et al., 2008; Kraljic and Samuel, 2011).

In conclusion, evidence for a flexible speech perception system that rapidly adapts to accommodate systematic distortions in acoustic speech input is abundant. A review of behavioral, computational, and neuroscience research related to rapid adaptive mechanisms suggests that it may be informative to consider phenomena in literatures outside of speech communication to identify common and unifying principles of how the brain balances stability and plasticity. Here, we examined cerebellar-dependent supervised learning that relies on sensory prediction error signals as a potential mechanism for supervising adaptive changes in speech perception. The predictions used to derive the error signals may be generated from multiple interacting sources of external sensory and internally-generated information. By incorporating cerebral-subcortical interactions established in other literatures into neuroanatomical theories of speech perception, the mechanisms that contribute to stability and plasticity may be better understood.

#### **ACKNOWLEDGMENTS**

The authors acknowledge the sources of funding which facilitated this work including RO1 DC 004674 to Holt, RO1 MH 59256 and NSF 1125719 to Fiez, and RO1 DC 006220 to Blumstein.

#### **REFERENCES**


Davis, M. H., and Johnsrude, I. S. (2003). Hierarchical processing in spoken language comprehension. *J. Neurosci.* 23, 3423. doi: 10.1093/cercor/bhi009


Kawato, M., and Wolpert, D. (1998). Internal models for motor control. *Novartis Found. Symp.* 218, 291–304.


functionally differentiated networks underlying auditory feedback processing of speech. *J. Neurosci.* 33, 4339–4348. doi: 10.1523/JNEUROSCI.6319-11.2013

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 16 September 2013; accepted: 16 December 2013; published online: 03 January 2014.*

*Citation: Guediche S, Blumstein SE, Fiez JA and Holt LL (2014) Speech perception under adverse conditions: insights from behavioral, computational, and neuroscience research. Front. Syst. Neurosci. 7:126. doi: 10.3389/fnsys.2013.00126*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2014 Guediche, Blumstein, Fiez and Holt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Developmental plasticity of spatial hearing following asymmetric hearing loss: context-dependent cue integration and its clinical implications

## *Peter Keating\* and Andrew J. King\**

*Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK*

#### *Edited by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

#### *Reviewed by:*

*Michael Pecka, Ludwig-Maximilians University Munich, Germany; Daniel B. Polley, Massachusetts Eye and Ear Infirmary, USA*

#### *\*Correspondence:*

*Peter Keating and Andrew J. King, Department of Physiology, Anatomy and Genetics, University of Oxford, Parks Road, Oxford OX1 3PT, UK e-mail: peter.keating@dpag.ox.ac.uk; andrew.king@dpag.ox.ac.uk*

Under normal hearing conditions, comparisons of the sounds reaching each ear are critical for accurate sound localization. Asymmetric hearing loss should therefore degrade spatial hearing and has become an important experimental tool for probing the plasticity of the auditory system, both during development and adulthood. In clinical populations, hearing loss affecting one ear more than the other is commonly associated with otitis media with effusion, a disorder experienced by approximately 80% of children before the age of two. Asymmetric hearing may also arise in other clinical situations, such as after unilateral cochlear implantation. Here, we consider the role played by spatial cue integration in sound localization under normal acoustical conditions. We then review evidence for adaptive changes in spatial hearing following a developmental hearing loss in one ear, and show that adaptation may be achieved either by learning a new relationship between the altered cues and directions in space or by changing the way different cues are integrated in the brain. We next consider developmental plasticity as a source of vulnerability, describing maladaptive effects of asymmetric hearing loss that persist even when normal hearing is provided. We also examine the extent to which the consequences of asymmetric hearing loss depend upon its timing and duration. Although much of the experimental literature has focused on the effects of a stable unilateral hearing loss, some of the most common hearing impairments experienced by children tend to fluctuate over time. We therefore propose that there is a need to bridge this gap by investigating the effects of recurring hearing loss during development, and outline recent steps in this direction. We conclude by arguing that this work points toward a more nuanced view of developmental plasticity, in which plasticity may be selectively expressed in response to specific sensory contexts, and consider the clinical implications of this.

**Keywords: auditory localization, binaural, monaural, conductive hearing loss, adaptation, learning, cortex, midbrain**

#### **INTRODUCTION**

The ability to hear is of critical importance for a wide variety of species. Indeed, in many naturalistic situations, auditory input provides the only source of information about distant events. However, whilst the identity of a sound source is clearly important, its location also plays a critical role in guiding behavior. In noisy and reverberant acoustic environments, spatial hearing can additionally help to separate different sound sources, thereby enabling their subsequent identification (Yost, 1997; Kidd et al., 2005). For these reasons, numerous species have developed and refined the ability to localize sounds in space.

Unlike the visual and somatosensory systems, however, the auditory system does not contain an implicit map of space at the level of the receptor surface. Instead, the receptors that transduce sound are arranged along the cochlea according to their tuning for sound frequency. The brain must therefore actively construct a representation of auditory space by transforming and processing the acoustical inputs provided to each ear. In doing so, the brain takes advantage of the fact that specific aspects of the acoustical input tend to depend on the position of the sound source relative to that of the listener (Blauert, 1997). The challenge faced by the brain is therefore to interpret and combine the information provided by these different spatial cues in order to create a coherent representation of auditory space.

In many cases, the most effective way to localize sounds will depend on the precise properties of the acoustical environment. While this varies with the nature of the sound sources themselves and the acoustical conditions in which they are encountered, developmental changes in the relative dimensions of the ears and in the neural circuits that process sound will also cause the acoustical environment to change. In order to maintain stable representations of space, the auditory system must therefore adapt to these changes by processing auditory spatial cues dynamically in ways that are appropriate to the prevailing sensory conditions. To understand these adaptive mechanisms, one popular approach is to manipulate the acoustical input experimentally and study the consequences for the perception and processing of sound. Although the sounds reaching the ears can be altered in a variety of ways, important insights into developmental plasticity have been gained by introducing a hearing loss to one ear (Clopton and Silverman, 1977; Silverman and Clopton, 1977; Clements and Kelly, 1978; Moore and Irvine, 1981; Brugge et al., 1985; Popescu and Polley, 2010; Keating et al., 2013a; Polley et al., 2013). In this way, asymmetric hearing loss has become an important model system for understanding basic principles of neural development, complementing studies of monocular deprivation in the visual system (Daw, 2009).

However, whilst monaural occlusion provides a powerful method for studying basic aspects of developmental plasticity, the developmental effects of asymmetric hearing loss are also clinically important. This is because periods of unilateral hearing loss are extremely common during development. For example, otitis media with effusion, colloquially referred to as "glue ear," is experienced by approximately 80% of children before the age of 3, and is often associated with a temporary hearing loss in one ear (Engel et al., 1999; Whitton and Polley, 2011). In rarer cases, children may also experience a congenital hearing loss in one ear (Wilmington et al., 1994; Gray et al., 2009). Similarly, in situations where children with bilateral deafness receive a cochlear implant in only one ear, the auditory system may be exposed to long periods of unilateral stimulation. This can result in marked changes in auditory pathway circuitry, with important implications for the restoration of normal functions if the second ear is subsequently implanted (Gordon et al., 2013; Illg et al., 2013; Kral et al., 2013). From both a fundamental and clinical perspective, it is therefore extremely important to understand the developmental consequences of asymmetric hearing loss.

In this review, we briefly outline auditory spatial processing under normal acoustical conditions, highlighting the importance of integrating the information provided by the different cues to sound source location. We then review evidence for adaptive changes in spatial hearing following a developmental hearing loss in one ear, and argue that adaptation may be achieved either by learning to use altered cues correctly or by learning to change the way cues are integrated. Having outlined the positive aspects of developmental plasticity, we next consider this plasticity as a source of vulnerability, describing evidence for effects of asymmetric hearing loss that persist and become maladaptive when normal hearing is restored. We then ask whether the consequences of asymmetric hearing loss are mediated by its timing and duration, and suggest that spatial hearing may be particularly vulnerable to prolonged periods of imbalanced hearing early in development.

Although much of the experimental literature has focused on the effects of a stable hearing loss in one ear, some of the most common forms of hearing loss experienced by children tend to fluctuate over time (Hogan et al., 1997; Whitton and Polley, 2011). We therefore propose that there is a need to bridge this gap by investigating the effects of recurring hearing loss during development, and outline recent steps in this direction. In addition to its clinical relevance, we suggest that this approach may also provide a useful experimental tool for understanding how the brain learns the importance of sensory context in complex environments. We then conclude by arguing that this work points toward a more nuanced view of developmental plasticity following asymmetric hearing loss, in which plasticity may be selectively expressed in response to specific hearing conditions.

## **AUDITORY SPATIAL PROCESSING UNDER NORMAL ACOUSTICAL CONDITIONS**

#### **AUDITORY SPATIAL CUES**

In individuals with normal hearing, the location of a sound can be inferred from a variety of different cues. However, whilst a complete description of sound source location requires information about both distance and direction, relatively little is known about the neural mechanisms underlying distance perception (Zahorik et al., 2005; Kopčo et al., 2012). For this reason, this review will focus on how the direction of a sound source is computed. In this respect, it has long been known that sounds originating on one side of space tend to be more attenuated in the ear contralateral to the source (**Figure 1A**). This acoustical shadowing effect of the head produces an interaural level difference (ILD) that varies systematically with sound source direction (Blauert, 1997). Similarly, due to differences in path length between a sound source and the two ears, the acoustic waveform of a sound will often arrive at one ear slightly before the other (**Figure 1A**). This produces an interaural time difference (ITD) that also varies systematically with the angular direction of the source relative to the head (Blauert, 1997).
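As a rough illustration of how path-length differences translate into ITDs, the sketch below uses the classic spherical-head (Woodworth) approximation. This model is not taken from the studies reviewed here. The head radius and speed of sound are assumed, human-like values chosen only for illustration, and the function name `woodworth_itd` is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air (assumed value)
HEAD_RADIUS = 0.0875     # m, a commonly assumed human-like radius


def woodworth_itd(azimuth_deg, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Approximate ITD (seconds) for a distant source and a rigid spherical head.

    Positive azimuths (source to the right) give positive ITDs, meaning the
    right ear leads. The approximation is intended for |azimuth| <= 90 degrees.
    """
    if abs(azimuth_deg) > 90:
        raise ValueError("model is only intended for the frontal hemifield")
    theta = math.radians(azimuth_deg)
    # Extra path to the far ear: an arc around the head plus a straight segment.
    return (head_radius / c) * (theta + math.sin(theta))


if __name__ == "__main__":
    for azimuth in (0, 15, 30, 60, 90):
        print(f"{azimuth:3d} deg -> {woodworth_itd(azimuth) * 1e6:6.1f} us")
```

Under these assumptions the ITD grows monotonically from zero at the midline to several hundred microseconds at 90°. Smaller heads, as in many laboratory species, would produce proportionally smaller values.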

Because ITDs and ILDs both require a comparison of the input provided to the two ears, these cues are collectively referred to as binaural spatial cues. In many instances, however, the relative usefulness of these binaural cues depends on sound frequency (Strutt, 1907; Middlebrooks and Green, 1991; Blauert, 1997; Macpherson and Middlebrooks, 2002). For example, since low-frequency sound waves can diffract around the head, ILD cues tend to be relatively small at low frequencies, which limits the usefulness of these cues. This means that ILDs tend to be used primarily for localizing high-frequency sounds.

Conversely, for simple periodic sounds, ITDs become spatially ambiguous as the sound frequency is increased, as it becomes harder to tell at which ear the sound is leading and at which it is lagging (Schnupp et al., 2011). Moreover, ITDs can only be calculated in situations where information is preserved in the auditory system about the temporal structure of the auditory waveform, a feat that many species find difficult to achieve at higher frequencies. This is because auditory nerve fibers represent temporal structure by locking their activity to specific phases of the stimulus waveform. In many species, phase locking begins to decline at frequencies greater than 1 kHz (Sumner and Palmer, 2012), which produces a corresponding reduction in ITD sensitivity at these higher frequencies (Brughera et al., 2013). In such cases, any residual sensitivity to ITDs therefore depends on the envelope of a sound (Henning, 1974; Bernstein and Trahiotis, 2002), rather than its fine temporal structure. The frequency dependence of ILD and ITD sensitivity provides the basis for the duplex theory of sound localization (Strutt, 1907), which applies to humans as well as to at least some other mammalian species (Wakeford and Robinson, 1974; Brown et al., 1978; Houben and Gourevitch, 1979; Keating et al., 2013b).
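The ambiguity of ITDs for periodic sounds can be made concrete with a small sketch. For a pure tone, only the interaural phase is available, and phase is defined modulo one period, so a true ITD longer than half a period can appear as a lead at the opposite ear. The function names and the example values below (a 600 µs ITD and a 656 µs maximum ITD) are illustrative assumptions, not measurements from the studies cited.

```python
def apparent_itd(true_itd_s, frequency_hz):
    """ITD implied by the interaural phase of a pure tone.

    The phase difference is only known modulo one period, so the returned
    value is the equivalent delay closest to zero.
    """
    period = 1.0 / frequency_hz
    cycles = true_itd_s * frequency_hz
    return true_itd_s - round(cycles) * period


def ambiguity_frequency(max_itd_s):
    """Frequency above which the largest natural ITD exceeds half a period."""
    return 1.0 / (2.0 * max_itd_s)


if __name__ == "__main__":
    true_itd = 600e-6  # seconds; hypothetical source well off to one side
    for freq in (250, 500, 1000, 2000):
        shown = apparent_itd(true_itd, freq) * 1e6
        print(f"{freq:5d} Hz: apparent ITD = {shown:7.1f} us")
    print(f"ambiguity begins near {ambiguity_frequency(656e-6):.0f} Hz")
```

In this toy case the 600 µs ITD is reported faithfully at 500 Hz, but at 1000 Hz it is indistinguishable from a delay of about 400 µs in the opposite direction.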

In addition to these binaural cues, spatial information may also be inferred from the relative intensities of different frequency components present at one ear. This is because, at least in mammals, the filtering properties of the head and external ears serve to shape the spectrum of a sound in a direction-dependent way (**Figure 1B**). Commonly referred to as spectral cues, these monaural spatial cues are most pronounced at high frequencies, and are thought to be critical for distinguishing between locations that produce identical ITD and ILD values (Musicant and Butler, 1984; Musicant et al., 1990; Carlile et al., 2005). For this reason, spectral cues are thought to play an important role in determining whether sounds are located in the front or rear hemifields. In many mammalian species, including humans, spectral cues are also critical for determining the elevation of a sound (Parsons et al., 1999; Carlile et al., 2005; Tollin et al., 2013). Although the acoustical properties of spectral cues make them equally suitable for determining the lateral angle of sounds in the horizontal plane, these cues typically contribute very little to this process, with ITDs and ILDs instead dominating the perceived azimuth under normal listening conditions (Macpherson and Middlebrooks, 2002).

#### **IMPORTANCE OF CUE INTEGRATION**

The findings outlined in the previous section illustrate that the importance of different spatial cues can vary depending on the properties of the sound and the region of space in which it needs to be localized. Consequently, under a particular set of hearing conditions, different cues tend not to contribute equally to judgments of sound location. Determining the weight that should be given to each cue therefore represents a key aspect of sensory processing. Individually, monaural and binaural spatial cues typically provide only partial, and even potentially contradictory, information about stimulus location. This can occur because neural representations are often noisy and imprecise and the nature of these coding errors may be independent for different cues, thereby giving rise to cue conflict. In addition, whereas monaural spectral cues are influenced by the spectrum of a sound (Wightman and Kistler, 1997), binaural cues are much more robust with respect to the source spectrum. In certain situations, the auditory system may therefore misattribute the spectral properties of a sound to the filtering effects of the head and ears (Hofman and Van Opstal, 2002; Keating et al., 2013a), with the result that monaural and binaural cues may indicate that the sound originated from different directions. Over time, the reliability of each cue can also change. The challenge faced by the brain is therefore to determine the best way of combining these different cues to provide a coherent representation of the external world.

This need for cue integration, however, is not unique to the auditory system. Different sensory systems, for example, often provide complementary information about the location of a particular target object or event. By taking into account the information provided by each system, it is therefore possible to achieve a better estimate of object location than would be possible using either system in isolation (Knill and Pouget, 2004; Alais et al., 2010).

Previous studies have shown that this kind of cue integration may be described by a process that takes the weighted average of individual cues, with the weights given to each being proportional to the relative reliability of that cue. It can also be shown that this simple process is statistically optimal under certain conditions (Ernst and Banks, 2002; Alais and Burr, 2004). Although there has been much recent emphasis on multisensory cue integration, models of cue integration have also been applied to the combination of depth cues within the visual system (Jacobs, 1999), as well as the combination of speech cues within the auditory system (Clayards et al., 2008). It is therefore likely that similar models may apply to the integration of auditory spatial cues (Van Wanrooij and Van Opstal, 2007; Keating et al., 2013a).
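The weighted-average scheme described above can be sketched in a few lines. The sketch assumes that each cue gives an independent, unbiased estimate with Gaussian noise, and that "reliability" means inverse variance, which is the usual convention in this literature. The function name `integrate_cues` and the example numbers are hypothetical.

```python
def integrate_cues(estimates, variances):
    """Combine single-cue location estimates by reliability-weighted averaging.

    estimates: location estimate from each cue (e.g., azimuth in degrees)
    variances: noise variance of each estimate (same units, squared)
    Returns (combined_estimate, combined_variance, weights).
    """
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one variance per estimate")
    reliabilities = [1.0 / v for v in variances]   # reliability = 1 / variance
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]   # weights sum to 1
    combined = sum(w * e for w, e in zip(weights, estimates))
    combined_variance = 1.0 / total                # never exceeds the smallest input
    return combined, combined_variance, weights


if __name__ == "__main__":
    # Hypothetical example: a precise binaural estimate and a noisier spectral one.
    azimuth, variance, weights = integrate_cues([10.0, 20.0], [4.0, 16.0])
    print(azimuth, variance, weights)
```

With these example values the more reliable cue receives four times the weight of the less reliable one, and the combined variance is lower than that of either cue alone, which is the sense in which the scheme is optimal under the stated assumptions.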

#### **NEURAL BASIS OF SPATIAL HEARING**

Although the auditory system must ultimately combine the information provided by different spatial cues, monaural and binaural cues are initially processed separately prior to integration at higher levels of the neuroaxis. In mammals, for example, acoustical inputs are transduced into neural signals by cochlear hair cells before being passed via the auditory nerve to the cochlear nuclei, and it is within the dorsal cochlear nuclei that processing of monaural spectral cues is subsequently thought to occur (Young et al., 1992). Projections originating in the ventral divisions of the cochlear nuclei target the superior olive bilaterally (**Figure 1C**), allowing input from the two ears to converge for the first time. The nature of this convergence is such, however, that the processing of different binaural cues remains segregated to a large extent, with the lateral superior olive (LSO) involved primarily in ILD processing and the medial superior olive (MSO) more associated with the processing of ITDs (Yin, 2002).

Projections from these brainstem nuclei then ascend to the midbrain, where they are thought to target partially overlapping populations of neurons in the central nucleus of the inferior colliculus (ICc) (Loftus et al., 2010). From here, auditory signals are transmitted via the medial geniculate nucleus of the thalamus to the auditory cortex (**Figure 1C**), which has been shown by inactivation studies to play a critical role in sound localization (Heffner and Heffner, 1990; Malhotra et al., 2004; Nodal et al., 2012). Although cortical cells presumably integrate the information provided by different auditory spatial cues, there is currently very little evidence to suggest that a topographic map of auditory space is constructed at the level of the cortex (Recanzone and Sutter, 2008; Razak, 2011) or indeed at any subcortical level of the primary auditory pathway in mammals [reviewed in Grothe et al., 2010]. A crude map of auditory space has, however, been described in the nucleus of the brachium of the inferior colliculus (nBIC) (Schnupp and King, 1997), which receives a major source of input from the ICc. The nBIC projects topographically to the superior colliculus (SC), where a more refined map of auditory space is found that shows the same topographic order observed for the representation of other sensory modalities (Palmer and King, 1982; King and Hutchings, 1987). Whilst the precise neural architecture invariably differs across species, broadly similar organizational principles apply to the avian brain, with the optic tectum and the external nucleus of the inferior colliculus (ICx) both showing maps of auditory space in the absence of any topographical representation in the forebrain (Cohen and Knudsen, 1999). However, in contrast to mammals, topographic representations of ITDs and ILDs have been found in the brainstem nuclei where these cues are first computed (Singheiser et al., 2012).

## **DEVELOPMENTAL PLASTICITY OF SPATIAL HEARING FOLLOWING ASYMMETRIC HEARING LOSS**

#### **TYPES OF HEARING LOSS**

In attempting to understand developmental plasticity in the auditory system, numerous studies have investigated the effects of early hearing loss. Although hearing loss can be produced in a number of different ways, these can be broadly categorized as being either sensorineural or conductive in nature, each of which has distinct advantages from an experimental perspective. For example, experimental induction of sensorineural hearing loss, which can be achieved via cochlear ablation (Moore and Kowalchuk, 1988), tends to completely abolish both the transduction of sound as well as any spontaneous activity at the site of the lesion (Tucci et al., 1987). Although this form of hearing loss can completely eliminate binaural spatial cues, it is typically irreversible, making it very difficult to determine what happens to the behavioral and neurophysiological representation of the affected ear unless cochlear implants are used. It is also less appropriate as a model for the types of hearing loss that are typically experienced by children during development (Moore and King, 2004; Tollin, 2010; Whitton and Polley, 2011).

In contrast, conductive hearing loss is typically incomplete, producing only a partial attenuation of acoustic input to the affected ear (Moore et al., 1989; Gravel and Wallace, 2000; Kumpik et al., 2010; Lupo et al., 2010). Conductive hearing loss is also often fully reversible and represents an excellent model for the types of hearing loss that are most commonly experienced during development. This is particularly true of otitis media with effusion (Gravel and Wallace, 2000; Whitton and Polley, 2011), which is associated with an accumulation of fluid in the middle ear. In many cases, this prevents the normal transmission of sound by the middle ear, thereby producing a conductive hearing loss.

Although otitis media with effusion can occur either unilaterally or bilaterally (Hogan et al., 1997; Engel et al., 1999), a situation common to many other forms of hearing loss, experimental studies have shown that the effects of a unilateral hearing loss are typically more dramatic than those observed following a bilateral hearing loss (Silverman and Clopton, 1977; Clements and Kelly, 1978; Moore, 2002; Moore and King, 2004; Keuroghlian and Knudsen, 2007; Tollin, 2010; Whitton and Polley, 2011). In addition to its prominence as an experimental model, unilateral hearing loss therefore represents a major source of vulnerability in clinical populations.

From an experimental perspective, a partial hearing loss in one ear can be reversibly induced either by surgical ligation of the ear canal (Silverman and Clopton, 1977; Moore and Irvine, 1981; Brugge et al., 1985; Popescu and Polley, 2010) or by occluding the ear canal with a material that attenuates sound (Knudsen, 1985; Gold and Knudsen, 1999; Kacelnik et al., 2006; Kumpik et al., 2010; Polley et al., 2013). In addition to reducing the input amplitude, these manipulations tend to delay the transmission of sound to the affected ear. Monaural deprivation therefore has a profound effect on both the ITDs and ILDs available to the listener (Moore et al., 1989; Hartley and Moore, 2003; Kumpik et al., 2010; Lupo et al., 2010), which is likely to be shared by the effects of otitis media with effusion in children (Gravel and Wallace, 2000).

Unilateral hearing loss therefore alters the usefulness of auditory spatial cues as well as the precise relationship between specific cue values and spatial location, but leaves any monaural spatial cues available to the intact ear unchanged. The methods used to induce hearing loss also tend to act as low-pass filters, attenuating higher frequencies to a greater extent than lower frequencies (Moore et al., 1989; Kumpik et al., 2010; Lupo et al., 2010; Polley et al., 2013). Consequently, ILDs will typically be altered in a frequency-dependent manner. Although this frequency-dependent attenuation can have profound effects on tonotopic representations in the brain (Popescu and Polley, 2010), this review will focus on its impact on spatial hearing.

#### **CUE REMAPPING**

In principle, spatial hearing could adapt to a unilateral hearing loss in two distinct ways. First, in situations where the affected ear retains some acoustical sensitivity, the auditory system could utilize the cues that have been altered by hearing loss. In particular, this would require the brain to acquire new mappings between specific locations and the values of individual spatial cues (**Figure 2A**). In other words, the auditory system could adapt by learning to reinterpret the spatial meaning of particular acoustical inputs. Prior to the onset of hearing loss, for example, a sound located directly in front of an observer is likely to produce an ILD of zero, since the sound will be of equal intensity at each ear. However, if the sound transmitted to one ear is attenuated due to hearing loss, sounds located in front of the observer will no longer produce an ILD of zero. Instead, an ILD of zero may be produced by sounds originating from more peripheral locations ipsilateral to the hearing loss. In such circumstances, the auditory system could therefore adapt by adjusting the ILD sensitivity of neurons to compensate for the imbalance in inputs between the ears.
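A toy model may help to illustrate what remapping would involve. The sketch below assumes that ILD (right minus left, in dB) varies as a sine function of azimuth with a 20 dB maximum, and that an earplug in the left ear adds a fixed, frequency-independent 10 dB attenuation. All of these values and function names are hypothetical simplifications, since real attenuation is frequency dependent.

```python
import math

ILD_MAX_DB = 20.0  # assumed ILD for a source at 90 degrees (toy value)


def ild_db(azimuth_deg, left_attenuation_db=0.0):
    """Right-minus-left level difference for a source at the given azimuth."""
    natural = ILD_MAX_DB * math.sin(math.radians(azimuth_deg))
    return natural + left_attenuation_db  # attenuating the left ear raises the ILD


def decode_azimuth(ild, assumed_attenuation_db=0.0):
    """Azimuth implied by an ILD, given the listener's learned cue mapping."""
    s = (ild - assumed_attenuation_db) / ILD_MAX_DB
    s = max(-1.0, min(1.0, s))  # clamp to the range the model can represent
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    plug = 10.0                                   # dB loss in the left ear
    heard = ild_db(0.0, left_attenuation_db=plug)
    print("ILD for a frontal source:", heard)
    print("unadapted percept:", decode_azimuth(heard))
    print("remapped percept: ", decode_azimuth(heard, assumed_attenuation_db=plug))
    print("azimuth now giving 0 dB ILD:", decode_azimuth(0.0, plug))
```

In this model a frontal source is initially mislocalized toward the intact ear, a zero ILD now corresponds to a location on the side of the hearing loss, and shifting the learned mapping by the size of the attenuation restores the correct percept.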

Thus far, the clearest evidence for cue remapping has been obtained by studies of partial unilateral hearing loss in the developing barn owl. In particular, sound localization behavior in this species readily adapts to a unilateral hearing loss introduced early in development (Knudsen et al., 1984a). At a neural level, this adaptation is paralleled by shifts in ILD and ITD tuning, thereby changing the location of the receptive fields in ways that compensate for the effects of hearing loss (**Figures 2B,C**). For example, compensatory shifts in ITD sensitivity emerge at the level of ICx (Gold and Knudsen, 2000b) and, in turn, are observed in the optic tectum (Gold and Knudsen, 2001), thalamus (Miller and Knudsen, 2003) and forebrain (Miller and Knudsen, 2001). Similarly, adaptive shifts in ILD sensitivity initially appear in the brainstem nucleus that is the first site of ILD processing in the barn owl (Mogdans and Knudsen, 1994), prior to subsequent elaboration at the level of the ICx (Mogdans and Knudsen, 1993), optic tectum (Mogdans and Knudsen, 1992), and forebrain (Miller and Knudsen, 2001).

There are, however, key differences in the way in which barn owls and mammals localize sound, which likely reflect the independent evolution of mechanisms for sound localization in different groups of vertebrates (Grothe et al., 2010). Consequently, we should not necessarily expect the way the brain responds to unilateral hearing loss to be the same in birds and mammals. Indeed, in contrast to the experiments carried out in barn owls, very little evidence has been obtained for cue remapping in mammals. Thus, cats and rats reared with a partial unilateral hearing loss similar to that used in barn owl studies do not show compensatory shifts in the neural sensitivity to binaural spatial cues, either in the inferior colliculus (IC) (Clopton and Silverman, 1977; Silverman and Clopton, 1977; Moore and Irvine, 1981; Popescu and Polley, 2010) or the primary auditory cortex (Brugge et al., 1985; Popescu and Polley, 2010). Although it is conceivable that compensatory changes occur at higher levels of processing, these results highlight the possibility that cue remapping may not occur under all circumstances when a unilateral hearing loss is experienced.

Although there is currently very little evidence in mammals for experience-dependent adjustments in binaural cue sensitivity equivalent to those seen in barn owls, this does not necessarily mean that mammals are incapable of adapting in this manner to the altered cues produced by a unilateral hearing loss. In situations where normal hearing cannot be restored, it is therefore important to ask whether it might be possible to devise targeted intervention strategies that promote adaptive adjustments in ILD or ITD sensitivity. The mechanisms of plasticity revealed by research in barn owls therefore illustrate the potential for adaptive changes in the encoding of binaural spatial cues, even if the immediate clinical implications of this work remain unclear.

#### **CUE REWEIGHTING**

A compensatory adjustment in neural sensitivity represents one viable mechanism for adapting to changes in auditory spatial cues, but this could also be achieved by the auditory system becoming more dependent on other cues that remain unchanged. In the specific case of unilateral hearing loss, this would require the auditory system to ignore the affected binaural spatial cues and instead rely more on the monaural spatial cues available to the intact ear. For individuals with a complete loss of hearing in one ear, this may be the only way in which a recovery in sound localization accuracy can be achieved. Indeed, behavioral evidence in at least some unilaterally deaf humans supports the idea that monaural spectral cues might be used to localize sounds in the horizontal plane (Slattery and Middlebrooks, 1994; Van Wanrooij and Van Opstal, 2004). Similarly, in cases of partial hearing loss, binaural cues are eliminated when the sound level is insufficiently high to be transmitted to the affected ear, which appears to enable humans to use spectral cues at these lower sound levels (Van Wanrooij and Van Opstal, 2007).

In such cases, however, it is unclear whether this amounts to cue reweighting, since these individuals do not have access to binaural spatial cues. Nevertheless, in situations where altered binaural cues remain available, it is clear that spatial hearing can adapt to a partial unilateral hearing loss by relying to a greater extent on the spectral cues provided to the intact ear. This has been demonstrated in both ferrets and humans with a unilateral hearing loss experienced either during adulthood (Kacelnik et al., 2006; Kumpik et al., 2010; Agterberg et al., 2012) or development (Newton, 1983; Keating et al., 2013a).

**FIGURE 2 | […] the relationship between cue values and sound-source location. (A)** Under normal listening conditions (left), specific combinations of cue values correspond to particular locations in the external world. Small circles of the same color represent particular cue combinations and their corresponding locations in the external world. Under abnormal listening conditions, such as when one ear is occluded by an earplug, these relationships are distorted and altered (right). In order to use these abnormal cues for accurate sound localization, the brain must therefore learn that the same locations now correspond to different cue combinations. **(B)** At present, robust neurophysiological evidence for cue remapping has only been observed in barn owls reared with one ear occluded. Electrophysiological recordings from neurons in the optic tectum of these animals show that compensatory shifts take place in the neurons' auditory spatial tuning. Tectal neurons respond most strongly to visual and auditory stimuli presented from overlapping locations, with their receptive fields arranged systematically to produce topographically-aligned maps of visual and auditory space (represented by the contour lines superimposed on the optic tectum in the picture of the owl's brain). Recordings from the rostral region of the tectum in […] revealed little difference between the visual and auditory receptive field centers when the earplug was still in place ("Earplug"). Misalignment in elevation is plotted on the ordinate, with misalignment in azimuth plotted on the abscissa. When the ear was occluded, the data points cluster around the origin, demonstrating that auditory and visual receptive fields are broadly in register. Following earplug removal, however, the receptive fields became systematically misaligned in both azimuth and elevation, indicating that the neurons were tuned to binaural cue values that no longer corresponded to their preferred visual location. **(C)** Site of auditory plasticity in the ascending auditory pathway of the barn owl. Frequency-dependent shifts in ITD tuning are plotted for barn owls reared either with normal hearing or with a passive filtering device in one ear that delays and attenuates sound. Positive values indicate shifts in ITD tuning that compensate for the effects of the device. Bars and lines show medians and interquartile ranges. Data are shown for the optic tectum (OT), external nucleus of the inferior colliculus (ICx) and the lateral shell of the central nucleus of the inferior colliculus (ICcls). Shifts in ITD tuning emerge at the level of the ICx. Modified with permission from Knudsen (1985) and Gold and Knudsen (2000b).

At a neuroanatomical level, developmental studies of unilateral hearing loss in mammals have demonstrated a relative weakening of the pathways that convey input from the affected ear, which include reduced connectivity, reductions in the size and number of neurons as well as changes in dendritic morphology (Tollin, 2010). Changes consistent with a weakening of these pathways have been observed in a variety of brain regions, including the cochlear nucleus (Coleman and O'Connor, 1979; Webster and Webster, 1979; Blatchley et al., 1983; Webster, 1983b; Moore and Kowalchuk, 1988), superior olive (Webster and Webster, 1979; Webster, 1983a; Sanes et al., 1992; Russell and Moore, 1999) and IC (Webster, 1983a,b), although the precise nature of these changes varies across different brain regions and different types of hearing loss. Studies of unilateral conductive hearing loss have shown that the neurophysiological representation of the developmentally occluded ear is similarly weakened, with neurons in the IC (Clopton and Silverman, 1977; Silverman and Clopton, 1977; Popescu and Polley, 2010) and auditory cortex (Brugge et al., 1985; Popescu and Polley, 2010) becoming relatively more driven by acoustical input provided to the intact ear. Although these studies were unable to measure the relative weight given to different spatial cues, a change in the relative efficacy with which each ear can activate central auditory neurons is precisely what would be expected if the auditory system were to become more dependent upon the monaural spatial cues provided by the intact ear.

In a recent study, we therefore set out to test this cue reweighting hypothesis explicitly (**Figure 3**) (Keating et al., 2013a). We found that ferrets reared with a hearing loss in one ear were able to localize sounds accurately when tested as adults, despite wearing an earplug in the developmentally occluded ear (**Figure 3A**). Moreover, these behavioral experiments revealed that they did so by relying more on the monaural spectral cues provided to the intact ear (**Figures 3D–F**). At a neurophysiological level, this was paralleled by a corresponding reweighting of auditory spatial cues in the primary auditory cortex, with neurons carrying relatively less information about binaural spatial cues and relatively more information about the spectral cues that were unaffected by the hearing loss (**Figure 3C**). Thus, the animals were able to adapt to a unilateral hearing loss during the postnatal period when the auditory system is particularly plastic by giving greater weight to the spatial cues that remain unchanged. In conjunction with previous work in humans (Newton, 1983), these results therefore show that cue reweighting represents a viable strategy for adapting to developmental changes in sensory input.
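The logic of cue reweighting can be illustrated with a deliberately simple sketch. It assumes that an earplug biases the binaural estimate of azimuth by a fixed amount while the spectral estimate from the intact ear stays accurate, and that the percept is a fixed weighted mix of the two. The 30° bias, the weights, and the function name are hypothetical and are not values reported by Keating et al. (2013a).

```python
def weighted_azimuth(binaural_deg, spectral_deg, binaural_weight):
    """Perceived azimuth as a weighted mix of binaural and spectral estimates."""
    if not 0.0 <= binaural_weight <= 1.0:
        raise ValueError("binaural_weight must lie between 0 and 1")
    return binaural_weight * binaural_deg + (1.0 - binaural_weight) * spectral_deg


if __name__ == "__main__":
    true_azimuth = 0.0
    binaural_with_plug = 30.0   # hypothetical bias introduced by the earplug
    spectral_intact_ear = 0.0   # spectral cues at the intact ear are unchanged
    for w in (0.8, 0.5, 0.2):
        percept = weighted_azimuth(binaural_with_plug, spectral_intact_ear, w)
        error = abs(percept - true_azimuth)
        print(f"binaural weight {w:.1f}: localization error {error:.1f} deg")
```

In this sketch the localization error falls in direct proportion to the weight given to the altered binaural cues, without any change in how those cues are mapped onto space, which is what distinguishes reweighting from remapping.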

#### **CHOICE OF ADAPTIVE STRATEGY**

These studies demonstrate that the developing auditory system possesses the capacity to adapt to a partial unilateral hearing loss in one of two distinct ways. Whereas barn owls can adjust their sensitivity to the altered binaural cues (Keuroghlian and Knudsen, 2007) (**Figure 2**), mammals can accommodate the imbalance in inputs between the ears by becoming more dependent on the monaural spatial cues that remain intact (Newton, 1983; Keating et al., 2013a) (**Figure 3**). At present, however, it is unclear what determines which of these strategies is adopted. Perhaps the most obvious answer is that species differences may play an important role. Given that the barn owl possesses highly-specialized neural machinery for processing sound source location, the possibility of differences between species is certainly plausible.

The barn owl, for example, is capable of using fine-structure ITDs over the full range of frequencies to which it is sensitive, a truly remarkable feat that is made possible by phase-locking to much higher sound frequencies than is the case in mammals (Köppl, 1997). The acoustical properties of the head and ears also mean that this species uses ITDs and ILDs for localizing sound sources in azimuth and elevation, respectively (Moiseff and Konishi, 1981; Moiseff, 1989). In this respect, barn owls differ considerably from mammals, which primarily use both ITDs and ILDs for determining the azimuth of a sound (Macpherson and Middlebrooks, 2002). Moreover, as discussed in an earlier section, there are differences in the way in which ITDs are encoded between these species (Harper and McAlpine, 2004). For these reasons, it is possible that barn owls are capable of remapping binaural spatial cues onto abnormal spatial locations, whereas developing mammals are not.

On the other hand, barn owls may adapt to the altered binaural cues simply because this is the only viable strategy for adapting to a unilateral hearing loss in this species. In our experiments in ferrets that were monaurally deprived during development, we found that adaptive reweighting of auditory spatial cues is specific to the high sound frequencies (above approximately 16 kHz; **Figure 3D**) where spectral cues in this species are likely to be most informative (Keating et al., 2013a). Because their upper frequency limit of hearing is only approximately 10 kHz, it is possible that barn owls may not have access to the high-frequency spectral cues that would enable adaptation via cue reweighting, and may instead be forced to map altered binaural spatial cues onto their correct sound locations in order to adapt to a unilateral hearing loss. However, the frequency range over which the auditory periphery generates useful monaural spatial cues varies with the dimensions of the head and external ears. For example, studies in adult humans, where these structures are relatively large, have shown that subjects can learn to use abnormal spectral features available at much lower frequencies to make elevation judgments (Van Wanrooij and Van Opstal, 2005). Confirmation of whether barn owls are capable of experience-dependent cue reweighting will therefore require determining whether the spectral cues generated by the facial ruff in this species (Hausmann et al., 2009) contribute more to sound localization in birds that have adapted to a unilateral hearing loss.

Although these different viewpoints provide alternative, and perhaps complementary, explanations for the apparent differences between species, they remain speculative. As such, they serve to highlight important limitations in our understanding of the fundamental mechanisms underlying developmental plasticity. Under certain circumstances, for example, it is possible that both remapping and reweighting strategies could be utilized for adaptation in the same species. This would be consistent with the results of sound localization measurements in human listeners with acquired conductive hearing loss in one ear (Agterberg et al., 2012), although other work indicates that subjects with normal hearing can adapt to a temporary unilateral hearing loss without learning to use abnormal binaural spatial cues (Kumpik et al., 2010). Nevertheless, there is evidence that adult humans may be able to adapt to altered ILDs and ITDs following some acoustical manipulations (Javer and Schwarz, 1995; Shinn-Cunningham et al., 1998), so further research is needed, particularly at a neurophysiological level, to determine the relative contributions of spatial cue remapping and reweighting to the ability of the auditory system to accommodate the changes in input associated with hearing loss.

Factors likely to influence which adaptation strategy is used include the experience of the individual prior to the onset of hearing loss and the extent to which hearing is restored in the affected ear. It is conceivable, for example, that the ability to use binaural spatial cues, whether normal or abnormal, depends on normal binaural hearing during development (Seidl and Grothe, 2005; Grothe et al., 2010; Litovsky et al., 2010; Agterberg et al., 2012). Consequently, in the case of partial unilateral hearing loss, unravelling the factors that determine the nature of the adaptive mechanism represents a key goal for future work, with implications that are of interest from both a clinical and fundamental perspective.

#### **FIGURE 3 | Adapting to a unilateral hearing loss by changing the dependence of the auditory system on different spatial cues.**

**(A)** Performance on an approach-to-target sound localization task is shown for normally-reared, control ferrets fitted with an earplug in one ear for the first time, as well as ferrets reared with a unilateral earplug (juvenile-plugged) and tested with an earplug in the developmentally-occluded ear. The animals initiated a trial by waiting on a central platform and approached the source of a sound presented from one of 12 loudspeakers positioned at equal intervals around the periphery, as illustrated in the accompanying schematic (top right). Juvenile-plugged animals performed the task with much greater accuracy than controls. **(B)** When spatial cues are altered or degraded by hearing loss, the auditory system can adapt by becoming less dependent on the abnormal cues and more dependent on the cues that remain intact. **(C)** Cue reweighting in primary auditory cortex. Neural weighting index is shown for neurons in the primary auditory cortex of juvenile-plugged ferrets while a virtual earplug was experienced in the developmentally-occluded ear. Stimuli were presented over earphones so that individual cues could be manipulated independently, which enabled a weighting index to be constructed. Higher values indicate that relatively more weight was given to the spatial cues provided by the intact ear. Data are also shown for normally-reared, control animals experiencing a virtual earplug in one ear. Neural weighting index values are higher in juvenile-plugged animals than controls, indicating greater reliance on the unchanged spectral cues provided by the intact ear. **(D–F)** Behavioral reweighting of auditory spatial cues revealed using reverse correlation. If approach-to-target localization responses are determined by the spectral cues provided by the intact ear, it is possible to recover these cues using reverse correlation. Juvenile-plugged ferrets performed the task in **(A)**, but the stimulus spectra were randomized across trials. The mean stimulus spectrum across all trials was very close to zero (gray line). However, on the subset of trials on which behavioral responses were made to a particular location (60° in the example shown), the mean stimulus spectrum deviated considerably from zero, with distinct spectral features emerging at frequencies >16 kHz **(D)**. Repeating this analysis for each response location produced a reverse correlation map **(E)**, which closely resembled the directional transfer function (DTF) of the intact (right) ear **(F)**. These results indicate that localization behavior in juvenile-plugged animals is guided by spectral features that resemble those produced by the directional filtering properties of the intact ear. This was not the case in controls, indicating that the juvenile-plugged animals had developed a greater dependence on, and therefore adapted to the unilateral hearing loss by giving greater weight to, the spectral cues that are unaffected by an earplug. Modified with permission from Keating et al. (2013a).

#### **NEURAL BASIS OF CUE REWEIGHTING**

As previously discussed, the capacity of ferrets to adapt to a conductive hearing loss in one ear during postnatal development relies on their greater use of monaural spectral cues provided by the contralateral ear, with equivalent cue reweighting being displayed by auditory cortical neurons in these animals (Keating et al., 2013a). It is presently unclear, however, whether this reflects plasticity at the level of the cortex itself or at lower levels of processing. In mammals, for example, it is known that a unilateral hearing loss produces changes in the binaural tuning properties of IC neurons (Clopton and Silverman, 1977; Silverman and Clopton, 1977; Moore and Irvine, 1981; Popescu and Polley, 2010) (**Figures 4D,F**). Since IC neurons can combine the information provided by different auditory spatial cues (Chase and Young, 2005), it is possible that changes in cue integration are implemented at the level of the IC. More generally, this is consistent with the idea that the IC may be an important site of plasticity due to its role as a major site of convergence for many auditory pathways (Yu et al., 2007).

#### **FIGURE 4 | Maladaptive effects of asymmetric hearing loss during development.**

**(A)** Example of a binaural interaction matrix recorded from a unit in the central nucleus of the inferior colliculus (ICc) in a normally reared rat. Contralateral sound level is plotted against ipsilateral level, with color denoting the number of spikes fired for each combination. Firing rates typically increase as the contralateral level is increased, but are suppressed when the ipsilateral level exceeds that in the contralateral ear. For each interaural level combination enclosed by the blue box, binaural suppression was quantified by comparing it with the linear sum of its monaural intercepts (e.g., blue cross relative to the sum of the red and green crosses). **(B)** To investigate the developmental effects of monaural deprivation, rats were reared with a hearing loss in one ear that was induced by ligation of the ear canal, which was reversed prior to electrophysiological experiments. Bilateral recordings were then performed in the ICc and primary auditory cortex (A1) of these animals and compared with data from sham-operated controls reared with normal hearing. **(C,D)** Examples of binaural interaction matrices from A1 **(C)** and ICc **(D)** in sham operated controls (left), and in ligated animals. For ligated animals, data are shown for the hemisphere ipsilateral (middle) and contralateral (right) to the ligated ear. Color scales and axis labels are identical to **(A)**. **(E,F)** Ipsilaterally mediated suppression expressed as a function of ILD for A1 **(E)** and ICc **(F)** recordings. Data are shown for sham operated controls (open symbols) as well as ligated animals, both contralateral (gray) and ipsilateral (black) to the ligated ear. Asterisks denote significant differences between ligated animals and controls, with asterisk grayscale indicating the hemisphere in which the comparison was made. Error bars show SEMs. Ipsilaterally mediated suppression in A1 of monaurally deprived animals is increased in the hemisphere contralateral to the deprived ear **(E)**, but not in the corresponding hemisphere of the ICc **(F)**. Conversely, ipsilaterally mediated suppression is reduced in the ICc ipsilateral to the deprived ear **(F)**, but this effect is not apparent at the level of A1 **(E)**. These results suggest that monaural deprivation induces persistent changes in the strength of ipsilateral input, which acts to weaken the representation of the deprived ear at the level of the ICc and strengthen the representation of the intact ear at the level of A1. In both cases, this increases the relative strength of the intact ear, and produces maladaptive shifts in ILD sensitivity. Modified with permission from Popescu and Polley (2010).

On the other hand, some measures of spatial processing by IC neurons are immune to the effects of a unilateral hearing loss, despite undergoing changes at higher levels of processing (Popescu and Polley, 2010) (**Figures 4C,E**). Moreover, plasticity observed in the spatial response properties of IC neurons is often more extensive at higher levels of processing (Popescu and Polley, 2010). This suggests that additional changes may emerge either in the thalamus or the cortex (Popescu and Polley, 2010; Polley et al., 2013). Indeed, the involvement of multiple processing levels in adaptive plasticity has been demonstrated in adult ferrets. Thus, the ability of these animals to relearn to localize sound following a unilateral hearing loss depends on corticocollicular connections (Bajo et al., 2010) as well as the integrity of the cortex (Nodal et al., 2010). Similarly, plasticity induced by conditioning or by focal electrical stimulation in the tonotopic representation of the auditory cortex leads to changes in frequency tuning in the IC that depend on the relationship between activity in these two structures (Yan et al., 2005). There is therefore growing evidence that the effects of sensory experience on auditory perception may be mediated at the level of the cortex and then transmitted to subcortical sites via descending feedback connections (Bajo and King, 2013) (**Figure 1C**).

Whilst it is unclear whether similar principles apply to developing animals, these studies highlight the possibility that cue reweighting may be implemented by interactions between the midbrain and cortex. Consistent with this view, multisensory cue integration in the SC is thought to depend on descending input from "association" cortex, both during adulthood and development (Wallace and Stein, 1994; Jiang et al., 2006). However, since these results do not necessarily imply that cortical input influences unisensory integration at a subcortical level (Alvarado et al., 2007), a key step toward testing this model would be to determine whether IC neurons show evidence for cue reweighting.

#### **INCOMPLETE AND MALADAPTIVE CHANGES IN SPATIAL PROCESSING**

Although developmental plasticity can be beneficial, enabling an individual to adapt to a particular environment, these adaptive changes are often incomplete. For example, ferrets reared with an earplug in one ear localize sounds less well whilst wearing an earplug than controls with normal hearing (Keating et al., 2013a). Similarly, children who experience a unilateral hearing loss appear to be unable to fully adapt to the abnormal acoustical input available to them (Newton, 1983), and show clear deficits in sound localization (Viehweg and Campbell, 1960; Humes et al., 1980; Newton, 1983; Bess et al., 1986). In addition, whilst barn owls show very little residual bias in sound localization following adaptation to a unilateral hearing loss, the spatial precision of behavioral responses often remains slightly worse than controls (Knudsen et al., 1984a). Thus, despite clear differences between birds and mammals, an inability to fully compensate for the effects of asymmetric hearing loss is common to many different species. For tasks that are likely to require more complex processing, such as binaural unmasking in the presence of spatially discordant signals and noise, there is currently little evidence for any adaptation (Moore et al., 1999). The degree to which the developing auditory system can accommodate changes in input following hearing loss therefore depends on the nature of the task, and even where adaptation clearly occurs, such as in directional hearing tasks, this may not be enough to fully restore normal performance levels.

Perhaps more damagingly, however, any adaptive changes that do occur can become maladaptive if the environment is subsequently altered. This is particularly relevant to situations where a developmental hearing loss is resolved later in life, either spontaneously or through clinical intervention. In such cases, the auditory system may have altered the way in which spatial cues are processed, thereby compromising its ability to take full advantage of normal hearing when it becomes available.

Following a developmental hearing loss in one ear, for example, barn owls are, at least initially, unable to localize sounds correctly when normal hearing is restored (Knudsen et al., 1984b). This is because these animals have adapted to the imbalance in input between the two ears by remapping binaural spatial cues onto abnormal spatial locations. Although these mappings permit accurate localization when a hearing loss is experienced in one ear, they are not appropriate for localization under normal hearing conditions. The animals therefore exhibit systematic errors in sound localization when normal hearing becomes available, which can persist indefinitely if hearing is not restored early enough in development (Knudsen et al., 1984b). At a neural level, this is paralleled by persistent abnormalities in the tuning to binaural spatial cues (Keuroghlian and Knudsen, 2007) (**Figures 2B,C**).

Similarly, in mammals reared with a stable hearing loss in one ear, the neural representation of the developmentally occluded ear is typically [though not always (Moore and Irvine, 1981)] weakened, and does not immediately return to normal when the cause of the hearing loss is removed (Clopton and Silverman, 1977; Silverman and Clopton, 1977; Brugge et al., 1985; Popescu and Polley, 2010) (**Figure 4**), even when the period of monaural hearing loss is relatively brief (Polley et al., 2013). This is consistent with the cue reweighting mechanism of adaptation, which involves relying more on the input provided by the intact ear and less on the input from the occluded ear. Although this plasticity may help the auditory system to adapt to a hearing loss in one ear, it has been proposed that it may also be responsible for amblyaudia (Popescu and Polley, 2010; Whitton and Polley, 2011; Polley et al., 2013), a persistent condition in which the brain is unable to fully exploit the acoustical input provided to one ear. This is because the changes associated with asymmetric hearing loss can impair binaural processing of normal acoustical inputs, producing deficits in ILD sensitivity when normal hearing is restored (Moore and Irvine, 1981; Popescu and Polley, 2010; Polley et al., 2013).

In a recent study, for example, rats were reared with a conductive hearing loss in one ear and binaural interactions were quantified in the ICc and auditory cortex after the hearing loss was reversed (Popescu and Polley, 2010) (**Figures 4A,B**). In the ICc ipsilateral to the developmentally deprived ear, ipsilaterally-mediated suppression was weakened (**Figures 4D,F**), but this effect was not apparent at the level of A1 (**Figures 4C,E**). Conversely, ipsilaterally-mediated suppression in A1 of monaurally deprived animals was increased in the hemisphere contralateral to the deprived ear (**Figures 4C,E**), but not in the corresponding hemisphere of the ICc (**Figures 4D,F**). Together, these results show that binaural interactions may be altered by a developmental hearing loss in one ear, which can subsequently impair processing when normal hearing is restored.
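According to the legend of **Figure 4A**, binaural suppression was quantified by comparing the response to each binaural level combination with the linear sum of its monaural intercepts. A minimal sketch of such a comparison is given below. The matrix layout (index 0 standing for a silent ear), the normalization by the linear sum, the function name, and the example spike counts are all assumptions made for illustration. The exact index used by Popescu and Polley (2010) is not specified here.

```python
def binaural_interaction(rates, contra_idx, ipsi_idx):
    """Compare a binaural response with the sum of its monaural intercepts.

    rates[c][i] -- spike count at contralateral level index c and ipsilateral
                   level index i; index 0 is assumed to mean "ear silent", so
                   rates[c][0] and rates[0][i] are the monaural responses.
    Returns (binaural - linear_sum) / linear_sum, so negative values indicate
    suppression relative to linear summation. Returns None when the linear
    sum is zero. Spontaneous activity is not subtracted in this sketch.
    """
    binaural = rates[contra_idx][ipsi_idx]
    linear_sum = rates[contra_idx][0] + rates[0][ipsi_idx]
    if linear_sum == 0:
        return None
    return (binaural - linear_sum) / linear_sum


# Hypothetical spike counts: rows are contralateral levels, columns are
# ipsilateral levels. Responses grow with contralateral level and shrink
# as the ipsilateral level is raised.
example_rates = [
    [0, 1, 1],
    [10, 8, 4],
    [20, 18, 10],
]
strong_ipsi = binaural_interaction(example_rates, contra_idx=2, ipsi_idx=2)
weak_ipsi = binaural_interaction(example_rates, contra_idx=2, ipsi_idx=1)
```

With these made-up counts, the index is more negative when the ipsilateral level is higher, which corresponds to the ipsilaterally-mediated suppression described above.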

Consistent with this view, humans and guinea pigs with a developmental history of asymmetric hearing loss tend to show persistent deficits in behavioral measures of binaural and spatial hearing even after normal hearing is restored (Clements and Kelly, 1978; Beggs and Foreman, 1980; Hall and Derlacki, 1988; Pillsbury et al., 1991; Wilmington et al., 1994; Moore et al., 1999; Gray et al., 2009). Since spatial hearing is important for listening in noisy and reverberant environments (Yost, 1997), this can affect performance in social and educational settings. In this way, deficits in spatial hearing may impair or delay linguistic, cognitive and social development (Moore et al., 2003; Tollin, 2010; Whitton and Polley, 2011).

Nevertheless, whilst long periods of unilateral hearing loss may weaken the neural representation of the deprived ear, this representation does not appear to be eliminated entirely. For example, ferrets reared with an earplug in one ear eventually show normal levels of binaural unmasking, although this can take up to approximately 2 years after the cause of the hearing loss has been removed (Moore et al., 1999). Consistent with these results, bilaterally deaf individuals who receive a unilateral cochlear implant typically retain some sensitivity to the non-implanted ear when a second implant is received in that ear later in life, despite experiencing a long period of unilateral stimulation (Kral et al., 2013). Although generally inferior to that associated with the first implant, this residual sensitivity might provide a neural basis for rehabilitation. A key goal for future research will therefore be to determine whether recovery from hearing loss during development can be enhanced or accelerated by specific training regimens, an approach that has been successfully used to treat deficits in language comprehension (Merzenich et al., 1996; Tallal et al., 1996). In this respect, it is critical to identify individuals who are particularly at risk and determine the developmental time point at which intervention is likely to be most successful.

## **EFFECTS OF ASYMMETRIC HEARING LOSS THROUGHOUT THE LIFESPAN**

#### **DIFFERENCES BETWEEN DEVELOPMENTAL AND ADULT PLASTICITY**

One of the key findings to emerge from studies of spatial hearing following unilateral hearing loss is that the timing of hearing loss plays a critical role in mediating its effects. Early work in the barn owl, for example, emphasized the notion of sensitive or critical periods (Knudsen et al., 1984a,b), suggesting that auditory representations of space become relatively fixed at a particular stage of development. Initially, it was thought that the critical period for spatial hearing was determined by age rather than sensory experience (Knudsen and Knudsen, 1986). Subsequent work, however, suggested that critical periods for the visual recalibration of spatial hearing may be extended by environmental enrichment (Brainard and Knudsen, 1998). In a similar vein, the critical period for the effects of auditory experience on the organization of frequency maps in the auditory cortex (Zhang et al., 2001; De Villers-Sidani et al., 2007) can be extended by exposure to environmental noise (Chang and Merzenich, 2003).

A number of studies in which individuals have been exposed to altered auditory or visual experience have, however, demonstrated that spatial hearing remains plastic in adulthood [reviewed by Irvine and Wright (2005); Keuroghlian and Knudsen (2007); King (2009); King et al. (2011)]. In particular, adaptive changes in sound localization have been observed in adult ferrets (Kacelnik et al., 2006; Nodal et al., 2010; Irving et al., 2011) and humans (Kumpik et al., 2010; Irving and Moore, 2011; Strelnikov et al., 2011) after introducing an asymmetric hearing loss. While there are clear similarities with the experience-dependent changes that can take place during development, with adaptation in mammals likely to involve cue reweighting irrespective of age (Kacelnik et al., 2006; Kumpik et al., 2010; Keating et al., 2013a), this does not mean that the neurophysiological basis for this is necessarily the same. Indeed, plasticity in the young brain may differ from that in the adult in a number of important ways. Adult plasticity, for example, seems to depend to a greater extent on behavioral training (Bergan et al., 2005; Kacelnik et al., 2006), which may be mediated by increased focus and heightened arousal (Keuroghlian and Knudsen, 2007). In this respect, the plasticity of spatial hearing in adults parallels that of frequency and intensity processing (Polley et al., 2006).

Additional work, however, suggests that adult plasticity is possible even in the absence of behavioral training, with changes in frequency tuning induced by passive exposure to specific acoustic environments (Noreña et al., 2006; Zhou et al., 2008; Pienkowski and Eggermont, 2009, 2010; De Villers-Sidani and Merzenich, 2011; Pienkowski et al., 2011; Zhou et al., 2011; Pienkowski and Eggermont, 2012; Zheng, 2012; Zhou and Merzenich, 2012; Pienkowski et al., 2013). The results of one particular study, for example, suggest that it may be possible to reinstate more extensive plasticity in adults by exposure to moderate levels of acoustic noise (Zhou et al., 2011), a result that parallels findings of increased plasticity in visual cortex following a period of dark exposure (Duffy and Mitchell, 2013). This suggests that patterned sensory input may affect not only the development of neural processing but its subsequent maintenance later in life (Shepard et al., 2012).

On the other hand, the effects of attention and reward are not limited to adults, with developing circuits relevant to vocalization showing a similar susceptibility to these influences (Doupe and Kuhl, 1999). Together, these results question whether behavioral training plays a unique role in adult plasticity, and instead point toward the possibility that training enhances and accelerates plasticity throughout the lifespan. Indeed, juvenile animals may be even more susceptible to the effects of auditory training than adults. For example, Sarro and Sanes (2011) found that behavioral training induced greater plasticity in juvenile gerbils than an equivalent amount of training in adulthood. The key difference between juveniles and adults may therefore lie in their susceptibility to change, with developing animals capable of more rapid and extensive changes. The reasons for this difference remain to be identified, but a variety of factors are thought to contribute, including differences in neuromodulatory influences (Shepard et al., 2012) as well as the relative balance of excitation and inhibition (Dorrn et al., 2010).

#### **IMPORTANCE OF EARLY DEVELOPMENTAL EXPERIENCE**

Although the auditory system can adapt to a unilateral hearing loss both during development and in adulthood, the consequences of this form of auditory deprivation have been shown to vary depending on the point in development at which hearing loss first occurs. Earlier developmental onsets result in a greater weakening of the neurophysiological (Clopton and Silverman, 1977; Popescu and Polley, 2010) and neuroanatomical (Blatchley et al., 1983; Webster, 1983b) representations corresponding to the affected ear. This suggests that neural circuits responsible for integrating inputs from the two ears may be particularly labile early in development.

If spatial hearing were particularly vulnerable to hearing loss early in development, we might expect clinical intervention to be more successful early in life. Consistent with this view, one recent study used hearing aids to restore balanced hearing to children who had previously been diagnosed with a sensorineural hearing loss in one ear (Johnstone et al., 2010). When tested more than 1 year later, children who received a hearing aid before the age of 5 showed improved sound localization performance under aided hearing conditions, but children who received a hearing aid later in life did not. Although the authors of this study noted that the two groups were tested at different ages, which could have influenced the outcome, these results are consistent with the notion that early developmental experience is particularly important. Further evidence for this view has been observed in studies of patients fitted with bone conduction devices following either congenital or acquired asymmetric hearing loss. In particular, patients with a congenital hearing loss benefit less from wearing a bone conduction device than patients who had normal hearing during early development (Agterberg et al., 2011).

Recipients of cochlear implants who are congenitally deaf also provide a unique opportunity to study this question. In recent work, for example, sequential implantation of the two ears was found to impair the ability to make use of the second implant if it was implanted much later in life than the first (Graham et al., 2009; Gordon et al., 2013; Illg et al., 2013), with the associated neural impairments persisting for at least 3–4 years following the onset of bilateral stimulation. Similarly, unilateral cochlear implantation in congenitally deaf cats leads to greater functional dominance of the implanted ear, but only if animals experience unilateral stimulation early in life (Kral et al., 2013). This suggests that the auditory system may be particularly vulnerable to early experience of unilateral stimulation, which has profound implications for rehabilitation strategies following asymmetric hearing loss.

More recent work, however, points toward a more complex situation, in which different aspects of binaural sensitivity are vulnerable to hearing loss at different stages of development (Polley et al., 2013) (**Figure 5**). In particular, this study showed that cortical neurons tend to preferentially respond to contralateral locations with short-latency spikes and ipsilateral locations with longer latency spikes (**Figures 5A–C**). Hearing loss at specific ages can selectively impair one of these measures of binaural processing whilst leaving the other largely intact (**Figure 5D**). These results therefore point toward the possibility that vulnerability may be most pronounced when the neural circuits in question are maturing. Because different circuits are likely to mature at different times, this means that the consequences of hearing loss may depend on the precise developmental stage at which it is experienced.

#### **FIGURE 5 | Precise timing of unilateral hearing loss influences developmental outcome.**

**(A)** Poststimulus time histograms (PSTHs) of spikes recorded from neurons in the primary auditory cortex of the mouse can be divided into windows that show the strongest differential sensitivity to contralateral (black, left) and ipsilateral (red, right) ILDs. Greater contralateral ILD sensitivity is typically a feature of short-latency responses, whereas greater sensitivity to ipsilateral ILDs tends to be seen in longer latency responses. **(B)** Firing rate as a function of ILD at 5 different average binaural levels, with each plot corresponding to the response window shown immediately above. Heat map is scaled to the normalized firing rate within each time window, whereas circle diameter is normalized to the maximum firing rate across both time windows. **(C)** Faint lines show firing rates as a function of ILD for each of the average binaural levels shown in **(B)**. Thicker lines show the average of ILD functions obtained with different average binaural levels. **(D)** Slopes of linear fits were calculated for the thick lines in **(C)** and provide a measure of ILD sensitivity, with larger slope values indicating greater sensitivity. Mean slope values (±s.e.m.) are shown for sham operated controls as well as mice that experienced brief periods (1–2 weeks) of unilateral hearing loss beginning at different ages (either postnatal day 12, 16, or 20). Contralateral and ipsilateral ILD sensitivity are both reduced by asymmetric hearing loss, but these different aspects of spatial processing are vulnerable at different stages of development. Asterisk indicates significant differences relative to (sham-operated) controls (*post-hoc* tests following ANOVA, *P* < 0.05). Modified with permission from Polley et al. (2013).
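The ILD sensitivity measure described in the legend of **Figure 5** (the slope of a linear fit to the ILD function averaged across average binaural levels) can be sketched as follows. This is an illustrative reconstruction rather than the analysis code of Polley et al. (2013). The function names, the ILD values, and the firing rates are hypothetical, and an ordinary least-squares fit is assumed.

```python
def mean_ild_function(rates_by_abl):
    """Average ILD functions measured at different average binaural levels.

    rates_by_abl -- list of ILD functions, one per average binaural level,
                    each a list of firing rates sampled at the same ILDs.
    """
    n_levels = len(rates_by_abl)
    n_ilds = len(rates_by_abl[0])
    return [sum(f[k] for f in rates_by_abl) / n_levels for k in range(n_ilds)]


def ild_slope(ilds, rates):
    """Least-squares slope of firing rate against ILD (rate units per dB)."""
    n = len(ilds)
    mean_x = sum(ilds) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in ilds)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ilds, rates))
    return sxy / sxx


# Hypothetical data: ILDs in dB and normalized firing rates measured at two
# average binaural levels.
ilds = [-20, -10, 0, 10, 20]
rates_by_abl = [
    [0.0, 1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0, 6.0],
]
average_function = mean_ild_function(rates_by_abl)
sensitivity = ild_slope(ilds, average_function)
```

A steeper slope means that firing rate changes more per decibel of ILD. Under this measure, a flattened ILD function after hearing loss would appear as a smaller slope value. Which sign corresponds to the contralateral or the ipsilateral ear depends on how ILD is defined, which is left open here.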

## **EFFECTS OF A RECURRING ASYMMETRIC HEARING LOSS DURING DEVELOPMENT**

#### **INTERMITTENT PERIODS OF NORMAL EXPERIENCE MAY REVERSE AMBLYAUDIA**

Although the timing and duration of hearing loss are likely to have a significant impact on spatial hearing, the temporal pattern of the impairment may also be a critical factor. In this respect, most research in this area has been restricted to studying a particular type of auditory deprivation, namely a single episode of unilateral hearing loss that remains stable over time (Clopton and Silverman, 1977; Silverman and Clopton, 1977; Clements and Kelly, 1978; Moore and Irvine, 1981; Blatchley et al., 1983; Webster, 1983b; Brugge et al., 1985; Popescu and Polley, 2010; Polley et al., 2013). It is much less clear, however, what would happen if the auditory system were exposed to a recurring hearing loss in one ear separated by periods of normal hearing. From a clinical perspective, this is important because recurring periods of hearing loss are extremely common during infancy (Hogan et al., 1997; Whitton and Polley, 2011). Otitis media with effusion, for example, is typically experienced in discrete episodes separated by periods of normal hearing (**Figure 6A**). It is therefore of considerable interest to determine the impact of intermittent hearing loss on auditory development.

On the one hand, we might expect that the effects of hearing loss would be reduced by periods of normal hearing. Studies of the visual system, for example, have shown that the negative effects of occluding one eye during development can be at least partially (Mitchell et al., 2003, 2006, 2011), albeit not necessarily completely (Mitchell et al., 2009), reversed by providing cats with brief intermittent periods of normal visual experience. If the same were true of the auditory system, then the symptoms associated with amblyaudia might be similarly ameliorated by providing intermittent experience of normal hearing throughout development.

Alternatively, intermittent periods of hearing loss might be damaging, producing an unstable acoustical environment in which the auditory system is unable to develop properly. In one early study, for example, barn owls were reared with unilateral earplugs that alternated between the two ears every second day (Knudsen and Knudsen, 1986). In contrast to the effects of chronic occlusion of one ear, these animals failed to adapt to a hearing loss in either ear and therefore made large localization errors, suggesting that inputs need to remain stable for a minimum period of time for adaptive changes in spatial cue processing to take place.

We addressed this issue in a recent study in which ferrets were reared with a unilateral hearing loss but were provided with brief intermittent periods of normal hearing throughout development (Keating et al., 2013a). As shown in **Figure 3**, the sound localization behavior of these animals indicated that significant adaptation to the unilateral hearing loss had taken place. Instead of making systematic errors when normal inputs were provided in both ears, they were indistinguishable from controls in their ability to localize sounds (**Figure 7A**). Thus, although the animals showed clear evidence for experience-dependent changes in cue integration, these changes were selectively expressed, both behaviorally (**Figures 7B,C**) and neurophysiologically (**Figure 7D**), in the presence of a unilateral hearing loss, and disappeared whenever balanced binaural hearing was available (**Figures 7B–D**). These results contrast dramatically with the persistent changes observed following removal of the ear canal occlusion in animals reared with a stable hearing loss in one ear (Clopton and Silverman, 1977; Silverman and Clopton, 1977; Clements and Kelly, 1978; Moore and Irvine, 1981; Knudsen et al., 1984a; Brugge et al., 1985; Gold and Knudsen, 2000a; Popescu and Polley, 2010; Polley et al., 2013), which suggests that relatively brief periods of normal hearing may be able to preserve the ability to use binaural cues correctly.

This is consistent with clinical observations in humans, which show that the negative effects of a recurring hearing loss are reduced as the cumulative amount of abnormal hearing is decreased (Hogan and Moore, 2003) (**Figure 6B**). Together, these studies indicate that the symptoms of amblyaudia may be at least partially reversed by providing the developing auditory system with brief periods of normal hearing. Although it is currently unclear whether intermittent experience of normal hearing preserves the integrity of neuroanatomical pathways that are degraded by exposure to a stable unilateral hearing loss (Blatchley et al., 1983; Webster, 1983b), this is likely to be the case, and therefore remains an important question for future research.

**FIGURE 7 | Context-dependent reweighting of auditory spatial cues following a recurring developmental hearing loss in one ear. (A)** Sound localization performance of ferrets reared with an earplug in one ear, either in the presence or absence of an earplug. Each symbol represents data from an individual animal. Although juvenile-plugged ferrets adapt to an earplug (see **Figure 3**), their performance improves when the earplug is removed, and approaches the mean performance level of controls under normal listening conditions (dotted black line). This means that juvenile-plugged ferrets adapt to an asymmetric hearing loss without compromising their ability to localize accurately when normal hearing becomes available. Error bars denote bootstrapped 95% confidence intervals, with solid black lines showing group means. **(B)** Context-specific reweighting of auditory spatial cues. Randomizing stimulus spectra across trials is known to degrade the usefulness of spectral cues, since it becomes unclear whether spectral features arise from the filtering effects of the head and ears or are instead properties of the stimulus itself. Whilst wearing an earplug, sound localization performance in juvenile-plugged ferrets declined as the amount of spectral randomization was increased, but this effect largely disappeared once the earplug was removed. Each line shows data for an individual animal, either with an earplug in place (solid, dark blue), following earplug removal (solid, pink), or after the reintroduction of an earplug (dotted, dark blue). **(C)** To quantify the effects of spectral randomization, slope values were calculated for the lines in **(B)** and are plotted for different hearing conditions. Each symbol shows data from an individual animal, with solid black lines indicating group means. Performance of juvenile-plugged ferrets was only impaired by randomization (negative slope values) when one ear was occluded. 
This means that the localization behavior of juvenile-plugged ferrets became more dependent on the spectral cues available to the contralateral ear whenever a unilateral earplug was present. The lack of effect of spectral randomization in the absence of the earplug suggests that the animals were relying on binaural cues when normal inputs were available. **(D)** Neural weighting index values are shown for neurons in the primary auditory cortex of juvenile-plugged ferrets, either in the presence or absence of a virtual earplug. Higher values indicate greater reliance on the spectral cues provided by the developmentally non-occluded ear. In juvenile-plugged ferrets, neural weighting index values change depending on whether a virtual earplug is present or not. Controls do not show the same effect, and are indistinguishable from juvenile-plugged ferrets under normal hearing conditions (dotted black line shows mean neural weighting index values for controls). This means that recurring monaural deprivation during development leads to cortical neurons weighting auditory spatial cues differently depending on whether a hearing loss is experienced, providing a possible neural basis for the cue reweighting observed behaviorally. Modified with permission from Keating et al. (2013a).
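The two summary statistics mentioned in this caption, the slope of performance against spectral randomization in **(C)** and the bootstrapped 95% confidence intervals in **(A)**, can be illustrated with a minimal sketch. The function names, the percentile bootstrap, and all data values below are assumptions made for illustration; they are not taken from Keating et al. (2013a), whose exact procedure may differ.

```python
import numpy as np


def randomization_slope(randomization_levels, percent_correct):
    """Least-squares slope of performance against amount of spectral randomization."""
    slope, _intercept = np.polyfit(randomization_levels, percent_correct, deg=1)
    return slope


def bootstrap_ci_of_mean(values, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (assumed method)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement: one row of indices per bootstrap replicate.
    idx = rng.integers(0, values.size, size=(n_resamples, values.size))
    resampled_means = values[idx].mean(axis=1)
    lower, upper = np.percentile(
        resampled_means, [100 * alpha / 2, 100 * (1 - alpha / 2)]
    )
    return lower, upper


# Hypothetical data for one animal (arbitrary randomization units, % correct).
levels = np.array([0.0, 10.0, 20.0, 30.0])
with_earplug = np.array([80.0, 70.0, 60.0, 50.0])     # declines with randomization
without_earplug = np.array([90.0, 90.0, 90.0, 90.0])  # unaffected by randomization

slope_plugged = randomization_slope(levels, with_earplug)
slope_unplugged = randomization_slope(levels, without_earplug)
ci_low, ci_high = bootstrap_ci_of_mean([1.0, 2.0, 3.0, 4.0, 5.0])
```

In this toy example a negative slope with the earplug in place and a slope near zero without it would correspond to the pattern described for juvenile-plugged ferrets.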

Another important issue, however, concerns the amount of normal sensory experience that is required to avoid the negative effects of hearing loss. In humans, deficits in spatial hearing are observed if the proportion of abnormal hearing exceeds 50% (Hogan and Moore, 2003) (**Figure 6B**). In ferrets, however, normal sound localization abilities can develop if the proportion of normal hearing is approximately 20% (Keating et al., 2013a). Although there are many factors that could explain this difference, it is likely that different tasks may be more or less sensitive to abnormal hearing during development. Consequently, whilst relatively simple tasks, such as localizing single sound sources in quiet environments (Keating et al., 2013a), may require very little experience of normal hearing, performance of tasks that involve more complex processing (Hogan and Moore, 2003) may require relatively more experience of normal hearing for these abilities to be preserved following a period of asymmetric hearing loss (Wilmington et al., 1994). An additional possibility is that it may be beneficial to have access to at least one set of cues that remains stable over time. If this were the case, then this would explain why subjects reared with entirely normal hearing in one ear (Keating et al., 2013a) appear to be less affected by hearing loss than individuals who intermittently experienced abnormal hearing in either ear (Knudsen and Knudsen, 1986; Hogan and Moore, 2003). Future work, however, will be necessary to resolve this issue.

#### **CONTEXT-SPECIFICITY OF DEVELOPMENTAL PLASTICITY**

Although studies of recurring hearing loss are directly relevant to otitis media with effusion and its treatment (Hogan et al., 1997; Whitton and Polley, 2011), they also have broader implications for our understanding of sensory processing in complex multi-context environments (Qian et al., 2012). In naturalistic situations, for example, the sensory environment may change dramatically over time. This variability, however, may not be entirely random, but may instead have specific statistical properties. The acoustical properties of a classroom, for example, may be very different from those experienced beside a busy road, although each may remain similar over time. The acoustical environment may therefore transition between distinct contexts, each of which is characterized by relatively stable statistical properties (Qian et al., 2012). By alternating between normal and abnormal acoustical contexts, recurring forms of hearing loss therefore provide an excellent experimental model for studying a much wider class of problem faced by the brain.

Our finding that juvenile ferrets can adapt to a recurring hearing loss in one ear by relying more on the monaural spectral cues provided to the intact ear when the hearing loss is present, whilst maintaining sensitivity to binaural cues under normal hearing conditions, suggests that the developing auditory system can process spatial cues in different ways depending on the specific environmental context in which they are experienced (Keating et al., 2013a) (**Figure 7**). Such context-dependent processing therefore enables the auditory system to adapt to a unilateral hearing loss without compromising its ability to use normal spatial cues. On the grounds that bilingual individuals can learn to interpret the same acoustic tokens in different ways depending on the linguistic context in which they occur (Werker, 2012; Buchweitz and Prat, 2013), we can think of this type of plasticity as "spatial bilingualism."

By showing that developmental plasticity enables context-dependent cue integration, this result ties together two distinct lines of research on cue integration. On the one hand, developmental studies have demonstrated that the emergence of cue integration depends on prior sensory experience (Wallace and Stein, 2007; Xu et al., 2012; Yu et al., 2013). On the other hand, numerous studies in adults have shown that cue weights can be rapidly updated to reflect their relative reliability (Ernst and Banks, 2002; Alais and Burr, 2004). In particular, behavioral experiments in adult humans have demonstrated that cue weights can be selectively updated for certain contexts or object classes, but not others (Jacobs and Fine, 1999; Atkins et al., 2001; Seydell et al., 2010). Very little is known, however, about the neural mechanisms that might enable the developing brain to learn to weight different cues (Fiser et al., 2010; Berkes et al., 2011).
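For readers unfamiliar with the adult literature, reliability-based weighting is usually formalized with the standard maximum-likelihood model of cue combination. The expression below is a sketch of that textbook model, assuming independent Gaussian noise on each cue; the symbols $\hat{s}_i$, $\sigma_i$ and $w_i$ are introduced here for illustration and do not come from the studies reviewed above.

$$
\hat{s} = \sum_i w_i \,\hat{s}_i, \qquad
w_i = \frac{1/\sigma_i^{2}}{\sum_j 1/\sigma_j^{2}}, \qquad
\sigma_{\hat{s}}^{2} = \frac{1}{\sum_j 1/\sigma_j^{2}}
$$

Here $\hat{s}_i$ is the estimate provided by cue $i$ and $\sigma_i^{2}$ is its variance, so a cue that becomes less reliable (larger $\sigma_i^{2}$) receives a smaller weight. Under this description, context-dependent cue integration would correspond to holding a different set of weights $w_i$ for each sensory context.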

Similarly, in the auditory system, there is accumulating neurophysiological evidence for various forms of context-dependent processing. Cortical activity, for example, shows rapid changes when individuals are required to perform specific tasks, with the nature of these changes being determined by the specific task requirements (Fritz et al., 2003, 2005; David et al., 2012; Mesgarani and Chang, 2012). Although these effects are thought to be mediated by attention, sensory context is also known to influence auditory processing, since the tuning properties of neurons can be modified by prior acoustical stimulation (Dean et al., 2008; Dahmen et al., 2010; Wen et al., 2012; Yaron et al., 2012; Nelken and De Cheveigne, 2013; Stange et al., 2013). Some studies have investigated the emergence of context-specific plasticity in the auditory system (Diamond and Weinberger, 1989; Cohen et al., 2011), but this work has tended to focus on the adult. Consequently, it is less clear how context-dependent processing emerges during development. In this respect, developmental studies of recurring hearing loss may therefore provide a useful tool for future work.

#### **NEURAL TRACES OF CONTEXT-DEPENDENT PLASTICITY**

In the specific case of spatial hearing, a key goal for future research will be to characterize the neural circuitry that enables developmental plasticity to be selectively expressed in specific contexts. One possibility is that the brain maintains neural circuits that are appropriate for different sensory contexts but functionally silences circuits that are not appropriate for the prevailing sensory conditions. Consistent with this view, studies in barn owls have shown that prism rearing results in the emergence of novel connections between the ICX and the optic tectum, which provide an anatomical basis for adaptive shifts in ITD tuning that realign the tectal maps of auditory and visual space. These abnormal connections persist following the removal of the prisms and coexist with connections that are appropriate to normal sensory experience, which likely accounts for the capacity of the owls to readapt in later life to the same audiovisual mismatch encountered during development (Linkenhoker and Knudsen, 2002; Linkenhoker et al., 2005).

Interestingly, only one set of connections between the ICX and optic tectum appears to be functionally expressed at any given time, with GABA-mediated inhibition implicated in the selection and stabilization of a particular set of connections (Zheng and Knudsen, 1999, 2001). Similarly, following temporary monocular occlusion during development, mice acquire novel anatomical specializations that persist long after normal vision is restored (Hofer et al., 2009). Although these changes have very little impact on neurophysiological response properties under conditions of normal vision, they nevertheless appear to facilitate more rapid and extensive changes in the visual cortex following a subsequent period of monocular occlusion later in life (Hofer et al., 2006, 2009).

In this way, prior experience can produce changes in neural circuitry that appear to be functionally silenced unless the brain is exposed to specific sensory conditions. Although sensory systems may require relatively long periods of time to switch on functionally silenced circuits, it is equally possible that the brain might learn to rapidly transition between using different circuits. This is likely to be particularly true in situations, such as a recurring hearing loss in one ear, where the auditory system has plenty of experience in switching between different acoustical conditions (Keating et al., 2013a).

However, whilst a recurring hearing loss can produce dramatic changes in auditory input, acoustical conditions can also vary, albeit to a lesser extent, in naturalistic environments. Context-dependent updating of spatial processing is therefore likely to be a general feature of the auditory system, even in individuals without a history of hearing loss. Consistent with this view, two recent studies have shown that the mature auditory system can undergo rapid changes in binaural spatial processing in response to prior acoustical stimulation. In one of these studies, shifts in ILD sensitivity were observed in the ICc as a function of prior stimulus statistics (Dahmen et al., 2010), whilst the other demonstrated GABAB receptor-mediated adaptation in the population code for ITDs found in the MSO (Stange et al., 2013). Although dynamic processing of binaural cues cannot account for the context-dependent plasticity induced by unilateral hearing loss in developing animals (Keating et al., 2013a), similar principles may be involved. Indeed, the adaptive mechanisms that have been observed in normally reared adults may themselves be plastic. In this way, adapting to a recurring hearing loss, either in development or adulthood, may utilize mechanisms that contribute to flexible processing of acoustical inputs under normal hearing conditions. Consequently, studies of recurring hearing loss may provide important insights into normal as well as abnormal adaptive processes in the brain.

## **CONCLUSIONS**

Studies of asymmetric hearing loss have shown that the mechanisms underlying spatial hearing are remarkably plastic during development, with behavioral adaptation in response to altered sensory inputs reported in different species. The underlying basis for adaptation to asymmetric hearing loss appears, however, to vary across species. Thus, barn owls can learn with experience to use altered binaural spatial cues, whereas mammals appear to become more dependent on cues that remain intact and less on those that change in value as a result of the hearing loss. Further research is therefore needed to unravel the reasons for this difference. In addition, whilst the neural basis for behavioral plasticity has been well characterized in barn owls, corresponding studies in mammals are still at a comparatively early stage, highlighting the need for further work in this area.

Although this capacity for change enables the auditory system to adapt to abnormal acoustical inputs, it also represents a major source of developmental vulnerability. In many cases, prolonged periods of hearing loss may lead to amblyaudia, a condition in which the auditory system is unable to fully exploit acoustical input provided to the affected ear if normal hearing is subsequently restored. To the extent that binaural spatial hearing is important for speech comprehension in noisy, naturalistic environments, this may have secondary effects on linguistic, cognitive, social and educational development. It is therefore important to identify individuals who are particularly at risk and identify appropriate strategies for clinical intervention.

In this respect, a key finding to emerge from studies of asymmetric hearing loss is that spatial hearing may be particularly labile early in development. The timing and duration of hearing loss, as well as any necessary clinical intervention, are therefore of considerable importance. Although the stability of hearing loss is likely to play a similarly important role in determining whether intervention is necessary, as well as guiding how intervention might be successfully achieved, this issue has received much less attention in the experimental literature. Recent work, however, suggests that brief intermittent periods of normal hearing may play a protective role, reducing the longer term effects on auditory function when normal inputs become available. In this way, recurring forms of hearing loss may be less damaging than more stable deficits in hearing. Consequently, for individuals who experience a stable hearing loss in one ear, the provision of even relatively brief, intermittent periods of balanced hearing may help to protect against amblyaudia.

Investigation of how the auditory system responds to recurring hearing loss is likely to provide additional insight into a fundamental aspect of neural processing, namely the mechanisms that enable sensory inputs to be processed differently depending on the context in which they occur. In this respect, the context-dependent cue integration revealed by these studies may represent one of the mechanisms through which the auditory system maintains stable and efficient representations of auditory space in different acoustical environments.

To the extent that auditory processing and perception are influenced by sensory context, research in this field may also have important implications for rehabilitation strategies following hearing loss. It has been suggested, for example, that cochlear implants could make use of signal processing strategies that are optimized for specific acoustical contexts, and switch between those strategies depending on the prevailing acoustical conditions (Hu and Loizou, 2010). However, in addition to implementing context-dependence at the level of the device itself, stimulation and rehabilitation strategies might also be designed to exploit the ability of the auditory system to process identical inputs in different ways depending on the context in which they occur. In this way, it may be possible to leverage context-dependent processing for improving perceptual abilities following hearing loss.

## **ACKNOWLEDGMENTS**

The authors' research is supported by the Wellcome Trust through a Principal Research Fellowship (WT076508AIA) to Andrew J. King and was previously supported by a Newton Abraham Studentship to Peter Keating.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 27 October 2013; accepted: 12 December 2013; published online: 27 December 2013.*

*Citation: Keating P and King AJ (2013) Developmental plasticity of spatial hearing following asymmetric hearing loss: context-dependent cue integration and its clinical implications. Front. Syst. Neurosci. 7:123. doi: 10.3389/fnsys.2013.00123*

*This article was submitted to the journal Frontiers in Systems Neuroscience. Copyright © 2013 Keating and King. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## On the similarities and differences of non-traumatic sound exposure during the critical period and in adulthood

## *Jos J. Eggermont\**

*Department of Physiology and Pharmacology, Department of Psychology, University of Calgary, Calgary, AB, Canada*

#### *Edited by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

#### *Reviewed by:*

*Andrew J. King, University of Oxford, UK*

*Etienne De Villers-Sidani, McGill University, Canada*

#### *\*Correspondence:*

*Jos J. Eggermont, Department of Physiology and Pharmacology, Department of Psychology, University of Calgary, 2500 University Drive N.W., Calgary, AB T2N 1N4, Canada. e-mail: eggermon@ucalgary.ca*

There is an almost dogmatic view of the different effects of moderate-level sound stimulation in neonatal vs. adult animals. It is often stated that exposure in neonates results in an expansion of the cortical area that responds to the frequencies present in the sound, being either pure tones or frequency modulated sounds. In contrast, recent findings on stimulating adult animals for a sufficiently long time with similar sounds show a contraction of the cortical region responding to those sounds. In this review I will suggest that most neonatal animal results have been wrongly interpreted (albeit generally not by the original authors) and that the changes caused in the critical period (CP) and in adulthood are very similar. Thus, the mechanisms leading to the cortical map changes appear to be similar in the CP and in adulthood. Despite this similarity, the changes induced in the CP are occurring faster and are generally permanent (unless extensive training paradigms to revert the changes are involved), whereas in adults the induction is slower and a slow recovery (months) to pre-exposure conditions takes place.

**Keywords: auditory cortex, tonotopic map, plasticity, animal, human**

## **INTRODUCTION**

The still prevailing dogma on use-dependent adult cortical plasticity is expressed in the following (Keuroghlian and Knudsen, 2007, pp. 113–114; references removed from citation, italics are mine):

"To induce adaptive plasticity in the adult central auditory system, acoustic stimuli must be behaviorally relevant. Frequency tuning is the response property that has been used most often to document plasticity in adults. The plasticity of frequency tuning has been studied in a variety of species and with a variety of training paradigms. Most of these studies have focused on the auditory cortex, specifically the primary auditory cortex (A1), but some have studied the central nucleus of the inferior colliculus (ICC) and the medial geniculate nucleus (MGN). *The results from all of these studies agree that merely exposing adult animals to an environment dominated by a particular frequency has no effect on the representation of that frequency*. Instead, in order to alter frequency tuning in adults (without lesioning the cochlea), either the animal must be conditioned to attend to the frequency, usually accomplished through positive or negative reinforcement, or the frequency must be paired with electrical microstimulation of the brain applied directly to the circuit or to sources of modulatory input such as cholinergic and dopaminergic systems."

It is the purpose of this review to demonstrate that the italicized statement cannot be taken at face value, and that passive sound exposure in adults can, surprisingly, produce changes in auditory cortex that are similar to those in critical period (CP) animals. Yet differences remain, particularly in the time it takes to induce the changes and in the potential for spontaneous recovery from the induced changes.

#### **ADULT ANIMAL PLASTICITY**

In 2006 we reported on an experiment that exposed adult cats to an enhanced acoustic environment (EAE), a 4–20 kHz random multi-tone pip stimulus ensemble presented at 80 dB peSPL for ∼5 months continuously (Noreña et al., 2006). The dBA equivalent level was slightly lower at 78 dB, and could thus be considered safe for long-term exposure. This was confirmed by the normal auditory brainstem response (ABR) thresholds obtained at the end of the exposure period. To our surprise we found that the neurons in A1 had ceased to respond to the majority of tone pips with frequencies between 4 and 20 kHz, the exception being a narrow band around 10 kHz. This could thus be interpreted as a contraction of the representation of the 4–20 kHz range. This finding is illustrated in **Figure 1**.

The figure compiles post-stimulus-time histograms (PSTHs) over 0–100 ms, and at stimulus levels of −5 to 65 dB SPL. The PSTHs are arranged according to SPL so as to form a compound response area of AI. The color scale represents mean peak firing rates in 2 ms bins. **Figure 1A** shows the average data for 15 control cats, and indicates that the most sensitive frequency in cat AI is around 10 kHz, which fits for frequencies over 5 kHz with data for auditory nerve fibers (Liberman, 1978) and behavior (Fay, 1988). The relatively high thresholds (relative to the cited sources) for frequencies below 5 kHz are the result of a recording bias; lower frequency neurons can be hidden in the posterior ectosylvian sulcus, i.e., unreachable with our multi-electrode arrays (two rows of four electrodes with 0.5 mm between electrodes). Because of the averaging used in these graphs, the thresholds are elevated. It is assumed that this recording bias is the same for EAE cats as well.
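As an illustration of how a single PSTH of this kind can be built, the following minimal sketch bins spike times over 0–100 ms in 2 ms bins and converts the counts to a mean firing rate. The function name, the pooling of spikes across trials, and the spike times are hypothetical; the sketch is not the analysis code used by Noreña et al. (2006).

```python
import numpy as np


def psth(spike_times_ms, n_trials, window_ms=(0.0, 100.0), bin_ms=2.0):
    """Return bin edges (ms) and mean firing rate (spikes/s) per bin.

    spike_times_ms holds spike times pooled over all trials, measured
    relative to stimulus onset.
    """
    n_bins = int(round((window_ms[1] - window_ms[0]) / bin_ms))
    edges = np.linspace(window_ms[0], window_ms[1], n_bins + 1)
    counts, _ = np.histogram(spike_times_ms, bins=edges)
    # Spikes per bin -> spikes per second, averaged across trials.
    rate = counts / (n_trials * bin_ms / 1000.0)
    return edges, rate


# Hypothetical spike times (ms) pooled over two presentations of one tone pip.
spikes = np.array([11.0, 12.5, 13.0, 13.9, 40.0, 75.5])
edges, rate = psth(spikes, n_trials=2)
peak_rate = rate.max()                   # peak firing rate across bins
peak_bin_start = edges[np.argmax(rate)]  # left edge (ms) of the peak bin
```

Repeating this for every frequency and level, and arranging the resulting histograms by frequency and SPL, gives a compound response area of the kind shown in the figure.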

What can be seen in **Figure 1B**, showing the average PSTHs for four EAE cats, are the following dramatic points: (1) the boundary of reduced neural activity is very sharp and coincides closely with the sharp boundary of the EAE. (2) Some neural activity remains around 10 kHz with nearly normal thresholds. (3) For frequencies above and below the EAE range, thresholds are reduced by up to 20 dB, and peak amplitudes are strongly enhanced, compared to controls. (4) The transient response type found in the control cats is now replaced by a response lasting as long as the tone pips. This happened in the low frequency range particularly below 1.5 kHz and thus more than an octave below the low-frequency boundary of the EAE. On the high-frequency side the enhancement borders the EAE cut-off frequency. **Figures 1C,D** for control cats indicate that at high sound levels a broad (in frequency), short latency and short duration response occurs, and is followed by a profound post-activation suppression. The latter is at least partly the result of feed-forward inhibition via an interneuron activated by the thalamo-cortical afferents. For EAE cats the PSTHs showed long latency and sustained responses at both the high-frequency and low-frequency border of the EAE frequency range. The longer latency of these responses caused by horizontal fiber activation can be explained by the slow conduction velocity of these horizontal fibers (<0.5 m/s). The absence of post-activation suppression is a strong indicator of horizontal fiber input, as these fibers typically do not produce feed-forward inhibition (Noreña et al., 2006).

**FIGURE 1 |** [...] 10 kHz and that the largest responses were to frequencies between 2.5 kHz [...] and **(D)** mean latency per frequency-intensity bin. From Noreña et al. (2006).

We also found that the tonotopic map was profoundly changed by the EAE exposure (**Figure 2**). In the outlined region in the middle panel (same location as the pink area in the top panel) one observes a lack of responses to frequencies in the range of 5–20 kHz and an overrepresentation of higher frequencies, particularly those with CFs >25 kHz. By comparison, in a normal tonotopic map (bottom panel) one can see a more gradual change from low frequency to high frequency sites. Note also the outlined region for the corresponding part of the top panel cartoon. We argued that the reorganization of the tonotopic map could be due to weakened thalamo-cortical and strengthened horizontal cortico-cortical fiber synapses onto the pyramidal cells.

Are these changes in responsiveness and tonotopic map the result of the very long exposure times (>5 mo) and 80 dB SPL that may be potentially damaging to the inner hair cell ribbon synapses (Kujawa and Liberman, 2009) despite normal ABR thresholds? A very recent study by Maison et al. (2013) exposed animals to a band-pass noise at 84 dB SPL for one week and found that despite no OHC loss, normal DPOAEs, and normal ABR thresholds, there was a small but significant reduction in the ABR wave I amplitude and a corresponding small reduction in ribbons per IHC, and IHC synapse survival. Our ABR data at that point were only used to estimate threshold, so we could not evaluate that. Being aware of these potential effects we then started using exposure levels well below the effective quiet level (Ward et al., 1976): "Effective quiet, the highest SPL of a noise that will neither produce a significant temporary threshold shift nor retard recovery from a TTS produced by a prior exposure to a higher level, is shown to be about 76 dB for octave bands of noise centered at 250 and 500 Hz, and around 68 dB for those centered at 1000, 2000, or 4000 Hz." We decided to repeat these exposure experiments at more modest levels (68 dB peSPL) and shorter exposure times (6 weeks), and also checked if the response changes were genuinely plastic and could spontaneously return to normal (Pienkowski and Eggermont, 2009). Again, in these experiments we did find that ABR thresholds and amplitudes at 70 dB SPL were normal. Thus, although there was a quantitative difference, we concluded that ribbon synapse loss for the 80 dB peSPL exposure was unlikely to be the cause of the findings. The compound tuning results, again in the form of PSTHs, are shown in **Figure 3**.

**FIGURE 2 | Map of best frequencies at 65 dB SPL onto the cortical surface in a normal hearing adult cat reared in an enhanced acoustic environment (4–20 kHz) presented for at least 5 mo at 80 dB SPL (middle panel).** For comparison the map in an unexposed normal hearing cat is shown in the bottom panel. The cartoon of the auditory fields (top panel) indicates the region (pink) where the 4–20 kHz are normally represented. In the outlined region in the EAE cat there is a lack of responses to frequencies in the range of 5–20 kHz and an overrepresentation of frequencies above 25 kHz. From Eggermont (2008) and Pienkowski and Eggermont (2012).

The top row of **Figure 3** shows the compound responses (now rotated 90° compared to **Figure 1**) for **(C)** control, **(B)** EAE = 80 dB SPL, and **(A)** for 68 dB SPL. Although the suppression effect is less for the lower exposure level and duration, one can clearly see from the compound response areas and the histograms, showing the % CFs of neurons, below each panel that there are dips in the responsiveness just below 20 kHz and above 4 kHz. The percentage of neurons tuned to the 10 kHz region remained unchanged compared to control. Panels **D–E** in the bottom half of the figure show the effects of recovery in quiet after 6 weeks exposure to the EAE. They indicate that the histograms showing the percentage of neurons' CFs regain a close-to-normal shape only after 8–12 weeks (**Figure 3F**).

However, the cortical region affected by the EAE exposure is still not normal after 3 months of recovery in quiet, because the tonotopic map remained distorted in the 4–20 kHz region (**Figure 4**). In this Figure, panel **A** shows the CF of single units measured in 15 control cats with respect to recording sites in AI, plotted on one particular cat brain. One observes the gradual increase in CFs from left (caudal) to right (frontal, see inset of a cartoon of the cat's brain) as indicated by the dotted line. In panel **B**, the sites with CFs between 4 and 20 kHz are replotted and now color-coded for the lower (4–9.9 kHz region; green), and for the 10–20 kHz region (in orange). The two centers of gravity are indicated by the large filled circles. These are located quite some distance apart indicating the significant segregation of these two frequency regions. After 6–12 weeks of recovery from the EAE (panels **C**, **D**) the two frequency regions covering the EAE are overlapping as illustrated by nearly the same position of their centers of gravity. It is presently not clear if the map will return to normal after a longer waiting time, or if this would only happen after further rearing in a different acoustic environment, potentially accompanied by training (e.g., as in Zhou and Merzenich, 2007 for CP exposed animals).
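The center-of-gravity comparison described above can be illustrated with a minimal sketch. It assumes that each recording site is given as a hypothetical `(x_mm, y_mm, cf_khz)` tuple and that the center of gravity is the unweighted mean position of the sites in each CF band; the function names, the coordinates, and the unweighted averaging are assumptions made for illustration and are not taken from the original analysis.

```python
import math

def center_of_gravity(sites):
    """Unweighted mean (x, y) position of a list of recording sites."""
    n = len(sites)
    return (sum(x for x, _ in sites) / n, sum(y for _, y in sites) / n)

def band_separation(units, lo_khz=4.0, split_khz=10.0, hi_khz=20.0):
    """Distance between the centers of gravity of the lower and upper CF bands.

    units: iterable of (x_mm, y_mm, cf_khz) tuples (hypothetical layout).
    """
    lower = [(x, y) for x, y, cf in units if lo_khz <= cf < split_khz]
    upper = [(x, y) for x, y, cf in units if split_khz <= cf <= hi_khz]
    if not lower or not upper:
        raise ValueError("both CF bands need at least one recording site")
    (x1, y1), (x2, y2) = center_of_gravity(lower), center_of_gravity(upper)
    return math.hypot(x2 - x1, y2 - y1)

# Made-up coordinates for illustration only (not data from the study).
segregated = [(0.0, 0.0, 5.0), (1.0, 0.0, 8.0), (3.0, 0.0, 12.0), (4.0, 0.0, 18.0)]
overlapping = [(0.0, 0.0, 5.0), (4.0, 0.0, 8.0), (1.0, 0.0, 12.0), (3.0, 0.0, 18.0)]

print(band_separation(segregated))   # bands occupy distinct cortical positions
print(band_separation(overlapping))  # centers of gravity coincide
```

In this toy example a large separation corresponds to the segregated control map, whereas a separation near zero corresponds to the overlapping frequency regions seen after recovery from the EAE.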

Thus, qualitatively, the data for 6 weeks exposure to a 68 dB SPL EAE conform to those for the 5 months exposure to the same EAE presented at 80 dB SPL. The tonotopic maps contract initially and, whereas the CF-distribution of single units recovers to normal after 6 weeks in quiet, abnormalities in the tonotopic map persist. We subsequently tested several different EAEs: random multi-tone pip stimulus ensembles of (1) one-octave wide (2–4 kHz), and (2) two 1/3rd octave bands centered around 4 and 16 kHz (Pienkowski and Eggermont, 2010b), and (3) a 4–20 kHz filtered noise (Pienkowski et al., 2011), and found basically the same results (**Figure 5**, top two rows). In addition, the short-latency part of the averaged local field potentials (LFP), representing the thalamo-cortical input to AI, shows basically the same effect (**Figure 5**, bottom two rows). A comparison between the 4–20 kHz multi-tone EAE and the 4–20 kHz filtered noise EAE shows some differences in the response enhancements above and below the EAE frequency range, but the two otherwise produced the same results. An interesting effect was noted for the EAE consisting of two 1/3rd octave bands of multi-tone frequencies; here the region with decreased spike and LFP activity was nearly the same as for the contiguous 4–20 kHz EAE.

We (Pienkowski and Eggermont, 2010a) found that qualitatively similar effects of passive exposure occurred when the EAE presentation was limited to 12 h/day (**Figure 5**, third column). Compared to continuous exposure at the same SPL (**Figure 5**, second column) and over a similar duration (6–12 weeks), this intermittent exposure produced a smaller decrease in AI spike and LFP activity in response to sound frequencies in the exposure range, and an increase in LFP amplitude only for frequencies above the exposure range. As expected at these moderate exposure levels, cortical changes occurred in the absence of concomitant hearing loss (i.e., ABR threshold shifts). Since there is some overlap in the amount of change in neural activity between the intermittently exposed group and the continuously exposed group, it is expected that recovery from the effects of the intermittent exposure would also take a long time.

Recently, we addressed the use of low aggregate tone-pip presentation rates (Pienkowski et al., 2013). In that study we exposed cats to a pair of 1/3rd octave bands, but with the presentation rate reduced to 2.5 pips/s in the 4 kHz band. Finding a similar suppression profile in AI with this lower density EAE, we then reduced the rate in the 16 kHz 1/3rd octave band to just 0.5 pips/s, which again failed to affect the suppression profile. Thus, exposure with stimuli containing as few as 0.5 pips/s produced, or at least maintained, a degree of plasticity similar to that previously observed with much denser stimuli.

Zheng (2012) exposed 50-day-old, i.e., adult, rats to a 60–70 dB SPL, 4–45 kHz noise continuously for 30 days. The tonotopic map underwent a dramatic reorganization, i.e., the systematic change from low to high CF from caudal to rostral disappeared. Behavioral testing showed that fine pitch discrimination was impaired, whereas coarse pitch discrimination remained. Interestingly, the noise-exposed rats performed similarly in a quiet and a noisy acoustic testing environment, whereas control rats performed much more poorly in background noise. This suggests that noise-exposed adult animals have adapted to perception in a noisy living environment, potentially by reorganizing their tonotopic maps and frequency tuning properties.

Also extending our studies on passive sound exposure driven plasticity in adult AI, Zhou and Merzenich (2012) exposed 3-month-old, i.e., adult, rats to pulsed noise bursts delivered at 65 dB SPL for a 2-month period. This modulated broad-spectrum noise exposure was intended to model the noise environments encountered in the industrial workplace and other modern acoustic settings. Frequency tuning curve bandwidths were generally increased in pulsed noise-exposed (PNE) rats, but changes in tonotopic maps were not reported. Significant behavioral impairments and negative cortical changes in temporal and spectral sound processing were induced in these PNE adult rats. They first examined the behavioral performance of PNE versus age-matched control rats by using temporal rate discrimination tasks. The results showed that a 2-month-long exposure to moderate-level structured noises significantly degraded these adult animals' abilities to discriminate between sound stimulus rates. These post-exposure effects persisted for at least 6 weeks after the end of noise exposure. Statistical analysis showed no significant ABR-threshold differences between PNE and control rats at any frequency determined. Response thresholds and latencies recorded at cortical sites in PNE rats did not differ from those recorded in control rats. Note the strong similarity with our earlier data. Qualitatively similar post-exposure effects were also documented even when exposure was limited to 10 h/day (as in Pienkowski and Eggermont, 2010a), an exposure regimen that better models a noisy-work/quiet-living environment. This study thus provides evidence that chronic exposure to moderate levels of structured noise during adulthood can significantly and persistently impair central auditory processing and auditory-related perceptual abilities.

**[...] and allowed 1–3 weeks (D), 6 weeks (E) or 8–12 weeks (F) of recovery.** In the top panels, averaged SU responses to individually-presented tone pips (at [...] EAE. Reprinted from Pienkowski and Eggermont (2009), with permission from Elsevier.

## **ANIMAL CRITICAL PERIOD PLASTICITY**

de Villers-Sidani et al. (2008) exposed CP rats to a 5–20 kHz band-pass noise, a stimulus with a similar bandwidth to, but a different carrier from, that used by Noreña et al. (2006) in adult cats, and also found a compression of the 5–20 kHz frequency range in A1. So for this stimulus there appeared to be no difference in the effect of stimulation between critical-period rats and adult cats; we later obtained similar results for a 4–20 kHz multi-tone stimulus and a 4–20 kHz band-pass filtered noise (Pienkowski et al., 2011). These corresponding findings prompted a new look at critical-period studies with respect to the effects of non-traumatic sound exposure, tones or noise, on tonotopic map representation. I will use a chronological approach.

#### **TONAL ENVIRONMENTS**

Stanton and Harrison (1996) stimulated newborn kittens for 3 months using an 8 kHz (±1 kHz) FM stimulation at a level between 55 and 75 dB SPL. The exposure produced no hearing loss, as was determined from ABR recordings and cortical response thresholds in adulthood (>1 year old). At this time the cortical (AI) tonotopic maps were determined and compared with those in age-matched non-exposed controls. They found a significant expansion of the 6–12 kHz region. This has often been interpreted as an expansion of the area of stimulation; however, since the stimulation spanned only 7–9 kHz, the observed expansion range was much larger. Scrutinizing their Figure 1 and comparing the unexposed control CFs recorded in the 6–12 kHz range across the three animals suggests that the expansion affected predominantly units with CF >9 kHz, i.e., the region above the stimulation frequencies. This is similar to the large enhancement in responsiveness that we found in adults (cf. **Figure 1**).

Zhang et al. (2001) stimulated rats during the CP for 10–16 h/day with 25-ms tones (4 kHz or 19 kHz) at 60–70 dB SPL and at six pulses per s with 1-s intervals to minimize adaptation effects. In rats that were exposed to a pulsed 4-kHz tone, a low frequency (2–6 kHz) tuned sector emerged as early as post-natal day (P) 14, whereas, in naive rats, low-frequency representations did not appear until P18–P20. Thus, tonal stimulation did speed up maturation. At P22, the posterior zone of the exposed rat's cortex (the presumptive A1 precursor) was dominated by neurons responding selectively to frequencies clustered around 4 kHz. Another three litters of rat pups were exposed to 19-kHz pulsed tonal stimuli over the same time epoch, and with the same experimental schedule. Compared to non-exposed rats, this exposure resulted in a significant increase in the area of the posterior region in which neurons were sharply tuned to CFs centered at or near 19 kHz. The changes induced in the CP persisted into adulthood. This was later (Keuroghlian and Knudsen, 2007) interpreted as "the AI came to over-represent the experienced frequency and, in this sense became customized to the acoustic environment experienced by the individual during this sensitive period." From close study of Figure 6 in Zhang et al. (2001), one cannot escape the impression that the expansion related to 4 kHz stimulation is dominated by CFs from 2.6–9 kHz, whereas that to 19 kHz stimulation covers a wide range with CFs from 9–30 kHz. I interpret this as an expansion that is not at the stimulation frequency but in wide regions surrounding these frequencies.

A subsequent study from the Merzenich group (Nakahara et al., 2004) exposed rat pups through a period extending from P9 (hearing in rats starts to be functional at P12) to P30 (when the CP is presumed to be ending) to a tone sequence with two specific spectro-temporal patterns. This stimulation consisted of two sets of tone sequences with distinct temporal orders: a set of pulsed low-frequency tones presented in the order 2.8, 5.6, and 4 kHz; followed after a brief pause and a larger sound frequency jump by a set of pulsed high-frequency tones presented in the order 15, 21, and 30 kHz. Each tone lasted 30 ms with an intensity of 65 dB SPL. Interestingly, and in agreement with the non-selective expansion in rats exposed to isolated single tones (Zhang et al., 2001), the expanded representations in adulthood for low frequency stimulation here were not centered at 2.8 kHz, 4 kHz, and 5.6 kHz, but just below 2.8 kHz and just above 5.6 kHz (**Figure 6**). Again, the results can be explained as expansions occurring above the stimulated frequencies with a contraction for the stimulated frequencies (low frequency region) or only a contraction (for high frequency stimulation).

de Villers-Sidani et al. (2007) on the other hand found for pure tone stimulation with a 7 kHz tone presented at 70 dB SPL, that the cortical region corresponding to 7 kHz ± 0.3 octave was expanded by about 20% for exposure from P11–P13 and mapping at P60. The expansion of A1 activation at 65 dB SPL ranged from 3.5–14 kHz, i.e., 1 octave on both sides of the 7 kHz tone frequency, again not limited to the exposure frequency.
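The octave ranges quoted above follow from the fact that one octave is a factor of two in frequency. The short sketch below works through that arithmetic; the helper name `octave_range` is hypothetical and introduced only for illustration.

```python
def octave_range(center_khz, octaves):
    """Return the (low, high) frequencies lying `octaves` below and above the center."""
    factor = 2.0 ** octaves
    return center_khz / factor, center_khz * factor

# One octave on either side of the 7 kHz exposure tone.
low, high = octave_range(7.0, 1.0)
print(low, high)  # 3.5 14.0

# The 7 kHz +/- 0.3 octave region that was reported as expanded.
low03, high03 = octave_range(7.0, 0.3)
print(round(low03, 2), round(high03, 2))
```

The first call reproduces the 3.5–14 kHz range given in the text; the second shows that ±0.3 octave around 7 kHz corresponds to roughly 5.7–8.6 kHz.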

Thus, with this one exception, potentially related to the particular time slot of exposure, the general finding in CP animals can be interpreted as an expansion for units with CFs above and below the stimulated frequency region, combined with a potential reduction in the cortical representation for the frequencies of stimulation.

#### **NOISE ENVIRONMENTS**

Zhang et al. (2002) exposed rat pups to pulsed (65 ms duration, once per s) broad-band noise at 65 dB SPL during P9–P28, which resulted in broader-than-normal tuning curves, in multipeaked tuning curves, and in a discontinuous tonotopic map in A1. In addition, weaker than normal temporal correlations between the discharges of nearby A1 neurons were recorded in exposed rats. In contrast, pulsed-noise exposure of rats older than P30 did not cause significant changes in auditory cortical maps. Zhou and Merzenich (2012) corroborated this by showing that exposure of 60-day-old rats to pulsed noise did not affect the tonotopic map but still introduced profound behavioral changes (see above). Thus, synchronous activation of multiple frequencies appears to play a crucial role in shaping neuronal processing in A1 during a CP. One would have expected that these synchronous activations by the noise pulses would result in synchronous firing under spontaneous conditions; however, this did not happen. This may have been a result of the discontinuity of responses within the receptive fields, even though the bandwidths of tuning curves at 20 dB above the threshold at CF were significantly larger than in control rats.

Chang and Merzenich (2003) found that "[...] rearing infant rat pups in continuous, moderate-level noise delayed the emergence of adult-like topographic representational order and the refinement of response selectivity in the A1 long beyond normal developmental benchmarks. When those noise-reared adult rats were subsequently exposed to a pulsed pure-tone stimulus, A1 rapidly reorganized, demonstrating that exposure-driven plasticity characteristic of the CP was still ongoing". de Villers-Sidani et al. (2008) showed that exposure with a 5–20 kHz band-pass noise in critical-period rats delayed the closure of the CP for this particular frequency range, whereas other frequency ranges all showed signs of critical-period closure.

Ranasinghe et al. (2012) tested whether exposure to pulsed noise or speech sounds in P9–P38 rats would alter neural representations and behavioral discrimination of speech. Both groups of rats were trained to discriminate speech sounds from P50–P100, and neural responses were recorded from A1 under anesthesia. Pulsed noise changed the frequency representation in A1 such that the cortical area was extended for frequencies below 3 kHz and reduced for frequencies above 10 kHz, and it increased frequency-tuning bandwidth. Speech-rearing only reduced the frequency representation for frequencies above 10 kHz to some extent. The representation of speech in A1 and behavioral discrimination of speech were little affected after pulsed-noise exposure. Passive exposure to speech during early development did not change speech sound processing either. Speech training increased A1 neuronal firing rate for speech stimuli in naïve rats, but did not increase responses in rats that experienced early exposure to pulsed noise or speech. This suggests that speech sound processing is resistant to changes in cortical frequency tuning and tonotopic maps caused by manipulating the early acoustic environment.

The spectral, temporal, and intensive selectivity of neurons in the adult A1 is easily degraded in early post-natal life by raising rat pups in the presence of pulsed noise (Zhang et al., 2002). The non-selective frequency tuning recorded in these rats substantially endures into adulthood. By using a modified go/no-go training strategy, structured noise-reared rats were trained to identify target auditory stimuli of specific frequency from a set of distractors varying in frequency (Zhou and Merzenich, 2007). Tonotopicity and frequency-response selectivity returned to normal after this perceptual training. Changes induced by training were retained for at least 2 months after the end of training.

[...] receptive field. **(D)** Distribution of CFs (represented by solid dots) and [...] permission from National Academy of Sciences, USA.

#### **SUMMARY OF ANIMAL DATA**

Both adult and CP animals show plastic changes in auditory cortex following passive exposure to tonal or noise stimuli. The CP, in general, is considered a time period when the best neural representation of the environment is selected from among the many competing inputs that affect the maturing nervous system. The growth and function of lateral inhibitory circuits may be important for terminating the CP. The difficulty of this problem is highlighted by the fact that the closure of the early CP may be dependent on the input received (Zhang et al., 2001; Chang and Merzenich, 2003). Moreover, specific types of auditory experience can result in the CP remaining open in some parts of A1, but being closed in others (de Villers-Sidani et al., 2008), further emphasizing the fact that CPs are controlled by sensory inputs. Note that Zheng (2012) found a complete disappearance of tonotopic order in adult rats exposed to continuous noise, i.e., as if the rats had reentered a condition similar to that of critical-period rats. Pulsed noise stimulation in neonatal animals disrupts the tonotopic map and broadens frequency tuning, whereas in adult animals map changes do not occur but behavioral effects related to broader frequency tuning are evident. Tonal stimulation in CP animals either expands the region of single frequency stimulation and up to an octave wide region on either side, or contracts the region of multi-tone stimulation and expands the surrounding frequencies. In adult animals, the stimulated region contracts regardless of whether it is stimulated with band-pass tonal or noise stimuli, whereas the bordering regions dramatically expand. These changes in adults recover spontaneously; those in CP animals do so only in the case of continuous noise, which delayed closure of the CP. For the pulsed noise or tonal stimulation in CP animals spontaneous recovery does not occur. The relationship between map changes in A1 and behavior remains unclear.

Are the EAEs really "passive" for the animals? Although they did not have to make responses they may have started listening outside the stimulation band in order to better communicate or listen to other environmental sounds. This "attention change" might have affected the responses in auditory cortex. Albeit not extensively discussed in this review, but represented in **Figure 5**, in our EAE series that started with the 4–20 kHz exposures (both multi-frequency tone pip stimuli or noise) we also included a 2–4 kHz multi-tone stimulus that overlapped with the dominant vocalization formants of the cats, and a combination of two 1/3rd octave bands (centered at 4 and 16 kHz) that would minimally interfere with either hearing their own vocalizations or other important environmental sounds. Yet, as **Figure 5** showed, all of these EAEs produced strong suppression/enhancement effects. Furthermore, there was no enhancement for units with CFs between the two 1/3rd octave bands, as would be expected if the cats were listening outside the stimulated 4 and 16 kHz regions. This formed the basis for us to consider the exposures "passive."

A unifying mechanism would be that stimulation suppresses neural activity at the specific frequency(ies) of stimulation and, likely through loss of lateral inhibition, enhances activity up to one octave above or below that frequency (Pienkowski and Eggermont, 2012). The exception of the expanded tonotopic map at exactly the stimulus frequency reported by de Villers-Sidani et al. (2007) could imply that cortical lateral inhibition is not fully formed at this early age.

## **RELEVANCE FOR HUMANS**

#### **HUMAN NEONATES**

It is not exactly known whether there are similar CPs in human auditory development, but from the cochlear implant (CI) literature one may derive CPs for the necessity of auditory stimulation for binaural hearing [<2 years of unilateral hearing (i.e., one CI; Gordon et al., 2012)]; for the development of certain auditory evoked response components (i.e., N1; >3 years of deafness under the age of six; Ponton and Eggermont, 2001), and for normal language development (Svirsky et al., 2000). Conductive hearing loss in children is a major determinant of language delay and may potentially cause long-lasting deficits.

The human cochlea is fully developed by 24 weeks of gestation. A blink startle response can first be elicited (acoustically) at 24–25 weeks and is consistently present at 28 weeks. Hearing thresholds are 40 dB SPL at 27–28 weeks and reach the adult threshold of 13.5 dB SPL by 42 weeks of gestation (Birnholz and Benacerraf, 1983). Children born very preterm often end up in the neonatal intensive care unit (NICU), and quite often they show signs of auditory neuropathy and sensorineural hearing loss; however, even if they do not, they may have other neurological problems from which they recover only very slowly (Marlow et al., 2005).

A busy NICU is by default a noisy environment. Noise is also present in the confines of an isolette or incubator. A big issue is the so far largely unknown effect of prolonged noise exposure in the NICU on the neonatal brain. Whereas it has been established that this does not cause hearing loss, it may still have profound effects on hearing, as the animal studies suggest (Zhang et al., 2001; Chang and Merzenich, 2003). In neonatal and adult animals, band-pass noise exposure leads to contracting tonotopic maps surrounded by expanding tonotopic maps (Pienkowski and Eggermont, 2012). Potential extrapolations can be drawn that pertain to human auditory development. Several studies of long-term outcomes in NICU graduates cite speech and language problems (Stjernqvist and Svenningsen, 1999; Marlow et al., 2005; Kern and Gayraud, 2007). However, few studies have specifically linked them with noise type and levels.

#### **HUMAN ADULTS**

Would the adult auditory cortical plasticity induced by the noise- and tone-EAEs in animals also develop in humans exposed to moderately loud environments in the real world? Although our 4–20 kHz noise and tone stimuli have near-identical long-term power spectra, they sound different, as the tone ensemble has a much more variable short-term frequency spectrum and a lowpass modulation spectrum. Continuous exposure to either stimulus produced a comparable suppression of neural activity in AI, suggesting that mixes of tonal and noise sounds (i.e., a more realistic, real-world noise) could have similar effects. There are several caveats, however. All of our stimuli were sharply band-limited, whereas the power spectra of natural sounds would fall off more gradually; thus, the edge effect that was proposed to enhance suppression should be smaller for more realistic sounds. This was recently confirmed for both factory noise and multi-tone EAEs with only 12 dB/oct slopes (Pienkowski et al., 2013). Another potential factor was that our exposures were less structured (more random) than typical sources of real-world noise, and may thus have been easier to "habituate to" (Kjellberg, 1990). Perhaps the most important factor would be the duration of the exposure. As mentioned above, a decrease in the suppression effect was found when the exposure was reduced from 24 to 12 h/day; a further decrease might be expected from 12 to 8 h or less. A similar reduction in the amount of suppression was found after exposure to EAEs with 12 dB/oct slopes compared to those with very steep slopes. The very long recovery times will, however, still result in a demonstrable effect after several weeks of exposure. The reduced effect may, furthermore, be more than offset by an intermittent, real-world recreational noise exposure that occurs over years or decades, rather than weeks or months as in our laboratory. If so, would the time course of the reversal of plasticity also be more protracted than that observed in our studies? Would full reversal even be possible, given that longer-term exposure led to a complete reorganization of the tonotopic map in AI (Noreña et al., 2006)? This awaits further investigation.

Kujala et al. (2004) reported that long-term exposure to noise had a persistent effect on central auditory processing that underlies behavioral deficits. They found that speech and sound discrimination was impaired in noise-exposed individuals, as indicated by behavioral responses and the auditory mismatch negativity (MMN) brain response. These subjects were healthy individuals exposed to occupational noise for several years, with peripheral hearing (i.e., audiological status) that did not, however, differ from that of individuals in the control group not exposed to long-term noise. These results demonstrated that long-term exposure to noise had long-lasting detrimental effects on central auditory processing and attention control. They recorded auditory evoked potentials from 10 healthy noise-exposed workers (exposure duration >5 years) and 10 matched controls with 32-channel EEG in two conditions, one including standard and deviant speech sounds, the other non-speech sounds, with novel sounds in both. The MMN was larger to non-speech than speech sounds in control subjects, while it did not differ between the sound types in the noise-exposed subjects. Thus, subpathological changes in cortical responses to sounds may occur even in subjects without peripheral damage but continuously exposed to noisy auditory environments. Furthermore, long-term exposure to noise had a persistent effect on the brain organization of speech processing and attention control (Kujala and Brattico, 2009). These results indicate the need to re-evaluate which noise levels can be considered safe for brain functions and raise concerns about the speech and cognitive abilities of individuals living in noisy environments.

These combined animal and human studies thus demonstrated that several aspects of mature AI function remain impaired over the long term by an uninterrupted passive exposure to a moderate-level, spectrally enhanced acoustic environment. These results combined also argue strongly for the importance of more completely defining these potential hazards of moderate-level noise exposure that cannot be detected with the standard audiogram. This could have serious implications for persistently noisy work/living places, even at levels considerably below those at which sound protection is presently required by law.

#### **ACKNOWLEDGMENTS**

This work was supported by Alberta Innovates-Health Science, by the Natural Sciences and Engineering Research Council of Canada, and by the Campbell McLaurin Chair for Hearing Deficiencies.

#### **REFERENCES**

Fay, R. R. (1988). *Hearing in Vertebrates: A Psychophysics Databook.* Winnetka, IL: Hill-Fay Associates.

[...] cochlear nerve degeneration after "temporary" noise-induced hearing loss. *J. Neurosci.* 29, 14077–14085.

[...] years of age following extremely preterm birth. *N. Eng. J. Med*. 352, 9–19.

[...] moderate-level sound impairs central auditory function of mature animals without concomitant hearing loss. *Hear. Res.* 261, 30–35.

[...] passive, moderate-level sound exposure on the mature auditory cortex: Spectral edges, spectrotemporal density, and real-world noise. *Hear. Res.* 296, 121–130.

[...] R. T. (2000). Language development in profoundly deaf children with cochlear implants. *Psychol. Sci.* 11, 153–158.

Zhou, X., and Merzenich, M. M. (2012). Environmental noise exposure degrades normal listening processes. *Nat. Commun.* 3:843. doi: 10.1038/ncomms1849

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 March 2013; accepted: 17 April 2013; published online: 06 May 2013.*

*Citation: Eggermont JJ (2013) On the similarities and differences of nontraumatic sound exposure during the critical period and in adulthood. Front. Syst. Neurosci. 7:12. doi: 10.3389/fnsys.2013.00012*

*Copyright © 2013 Eggermont. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Functional and structural changes throughout the auditory system following congenital and early-onset deafness: implications for hearing restoration

#### *Blake E. Butler <sup>1</sup> \* and Stephen G. Lomber <sup>2</sup>*

*<sup>1</sup> Cerebral Systems Laboratory, Department of Physiology and Pharmacology, Brain and Mind Institute, University of Western Ontario, London, ON, Canada <sup>2</sup> Cerebral Systems Laboratory, Department of Physiology and Pharmacology and Department of Psychology, National Centre for Audiology, Brain and Mind Institute, University of Western Ontario, London, ON, Canada*

#### *Edited by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

#### *Reviewed by:*

*Shaowen Bao, University of California-Berkeley, USA; David R. Moore, University of Cincinnati College of Medicine, USA*

#### *\*Correspondence:*

*Blake E. Butler, Cerebral Systems Laboratory, Natural Sciences Centre, Brain and Mind Institute, 1151 Richmond Street North, London, ON N6A 5B7, Canada e-mail: bbutler9@uwo.ca*

The absence of auditory input, particularly during development, causes widespread changes in the structure and function of the auditory system, extending from peripheral structures into auditory cortex. In humans, the consequences of these changes are far-reaching and often include detriments to language acquisition, and associated psychosocial issues. Much of what is currently known about the nature of deafness-related changes to auditory structures comes from studies of congenitally deaf or early-deafened animal models. Fortunately, the mammalian auditory system shows a high degree of preservation among species, allowing for generalization from these models to the human auditory system. This review begins with a comparison of common methods used to obtain deaf animal models, highlighting the specific advantages and anatomical consequences of each. Some consideration is also given to the effectiveness of methods used to measure hearing loss during and following deafening procedures. The structural and functional consequences of congenital and early-onset deafness have been examined across a variety of mammals. This review attempts to summarize these changes, which often involve alteration of hair cells and supporting cells in the cochleae, and anatomical and physiological changes that extend through subcortical structures and into cortex. The nature of these changes is discussed, and the impacts to neural processing are addressed. Finally, long-term changes in cortical structures are discussed, with a focus on the presence or absence of cross-modal plasticity. In addition to being of interest to our understanding of multisensory processing, these changes also have important implications for the use of assistive devices such as cochlear implants.

#### **Keywords: hearing loss, brain development, auditory cortex, cochlear prostheses, hearing restoration**

## **INTRODUCTION**

A profound childhood hearing loss can have widespread, devastating consequences that impact a child and their family for a lifetime. Perhaps most importantly, hearing loss can prevent a child from acquiring spoken language, which has a number of subsequent developmental and psychosocial consequences (see Möeller, 2007 for review). Fortunately, interventions which bypass damaged peripheral structures have been developed that allow for the restoration of auditory input. In fact, if implanted within a sensitive period for normal development, children with cochlear implants typically go on to display expressive and receptive language skills similar to those of normal hearing children by the time they are school-aged (e.g., Svirsky et al., 2004). However, successful intervention requires that the remaining auditory structures are of sufficient anatomical integrity, and functional state. For example, while cochlear implants have been successfully applied in cases of cochlear degeneration, they require intact spiral ganglion neurons be present in order to function.

Much of what we know about the changes to auditory structures that result from deafness, and how these changes have informed the design of cochlear prostheses, has come from studies in animal models. Fortunately, the subcortical auditory system is highly conserved among mammals (e.g., Glendenning and Masterton, 1998), such that a number of animal models exist that can inform our understanding of its structure and function. Moreover, a number of *deaf* animal models exist which closely resemble common morphologies of human disease (e.g., BALB/c mice, *deafness* mice, deaf-white cats). However, it is important to note that changes in the anatomy and function of peripheral and central auditory structures depend highly upon a number of factors, including the time of onset of hearing loss and the specific nature of the impairment. This review aims to address changes that occur in response to bilateral, congenital or early-onset deafness. Other forms of deafness (e.g., late-onset, unilateral, frequency-specific, etc.) are associated with a wide variety of highly specific changes that are beyond the scope of this paper. Here we address the most common methods for acquiring deaf animal models, including some discussion surrounding whether the methods currently used to evaluate hearing impairment are sufficient. We then describe anatomical and physiological changes that occur following deafness, including structures within the cochlea, subcortical nuclei, and within auditory cortex. Finally, we discuss cross-modal reorganization that appears to follow hearing impairment with some consideration of potential mechanisms.

#### **DEAFENING METHODS**

The breadth and variability of phenotypes, mutations, mechanisms, and pathways associated with heritable deafness in humans are remarkable (Raviv et al., 2011). Thus, a high degree of variability in animal models is necessary to begin to understand the structural and functional changes associated with deafness. As a result, a number of methods have been used to produce animal models of profound deafness. While each has unique advantages and limitations, the development of any reliable technique requires that certain criteria be satisfied. Firstly, to minimize between-subject differences that might complicate the interpretation of post-deafening interventions, variability in the outcome of the procedure within a given species should be minimal. Ideally, this would include both variability in functional outcomes (i.e., threshold elevation) and variability in associated pathology. While there is some variability in the threshold elevation required for a deaf model, researchers typically seek models with ABR thresholds in excess of 80 dB nHL across the frequency range tested (see the section titled Measuring Deafness for more on outcome measurement). Secondly, in order to avoid frequency-specific complications, any pathology associated with the deafening procedure should be uniform along the length of the cochlea. Finally, in order to minimize trauma associated with the procedure, steps should be taken to ensure the general health of the animal both during the procedure and during post-procedural care. Here, we present a list of techniques that have been successful in generating animal models of hearing impairment, along with some commentary on their benefits and shortcomings.

#### **GENETIC MODELS OF HEARING LOSS**

Across mammalian species, a number of genes required for normal cochlear function have been identified. For example, mutations in at least six mouse genes (PAX3, SOX10, MITF, SLUG, EDN3, and EDNRB) cause hereditary auditory-pigmentary disorders that mimic Waardenburg syndrome in humans (Tachibana et al., 2003). Transgenic and knock-out mouse strains that over- or under-express these genes have provided useful models of heritable conditions relating to inner and outer hair cell dysfunction that result in hearing loss (see Avraham, 2003 for review). These models allow for examination of the auditory system in great detail, and help improve our understanding of how very small-scale anatomical changes are related to hearing loss. Unfortunately, the genetic heterogeneity of hearing loss in humans involves hundreds of genes, with the possibility of multiple mutations contributing to disease etiology in any given patient (see Raviv et al., 2011, for review). Thus, while often related to hereditary hearing disorders in humans, gene-targeted models can be overly specific [e.g., a mouse model of the rare X-linked genetic mutation leading to progressive hearing loss associated with Norrie Disease (Berger et al., 1996)], such that data from these animals may not be generalizable to a large population of people with hearing impairments of varied origins.

Other models take advantage of the high incidence rates of congenital deafness that have been observed in a number of mammalian species. Examples include white minks, which provide yet another model of deafness associated with Waardenburg syndrome (e.g., Sugiura and Hilding, 1970), as well as collies (e.g., Lurie, 1948), Dalmatians (e.g., Lurie, 1948; Niparko and Finger, 1997), and deaf white cats (e.g., Bergsma and Brown, 1971), each of which models congenital deafness associated with the complete breakdown of cochlear structures often seen in the Scheibe deformity in humans. Across modalities, there is evidence that even a small amount of patterned sensory input at a very young age can initiate a cascade of developmental changes that can drastically alter the subsequent function of sensory systems (e.g., Hubel and Wiesel, 1970; Chang and Merzenich, 2003). Blocking the auditory canals of hearing animals at an early age provides a model of deafness associated with malformation of the external ear. However, this method produces an insufficient model of complete auditory deprivation, as many sounds (particularly those with low-frequency energy) are still transmitted to the cochlea via bone conduction (e.g., Popescu and Polley, 2010). Thus, congenitally deaf models are necessary to study the auditory system in the absence of input. Moreover, models like the congenitally deaf cat are useful for the study of late onset hearing (e.g., as provided by a cochlear implant), as the auditory nerve is well-preserved compared to other methods of deafening (Shepherd and Martin, 1995; Leake et al., 1999).

#### **PHYSICAL DESTRUCTION**

A second method of obtaining deaf animal models involves the physical destruction of cochlear structures. This is typically accomplished by creating an opening in the cochlea either by drilling through the cochlear wall (e.g., Sanes et al., 1992; Illing et al., 1997, 2005; Vale and Sanes, 2002), or by penetrating the round window (e.g., Tierney et al., 1997). Once exposed, the contents of the cochlea can be ablated using one of a number of small tools, and aspirated with a hollow glass needle. In the most extreme cases, the entire cochlea may be crushed with forceps and the remains aspirated (e.g., Rubio, 2006; Alvarado et al., 2009). One advantage of physical ablation is that the basilar membrane can be selectively lesioned, such that degeneration of spiral nerve fibers is largely restricted to the damaged area (Leake-Jones et al., 1982). This allows for the generation of models of partial hearing loss. However, the interpretation of pathology associated with cochlear implantation following physical ablation (Xu et al., 1993) and the growth of new bone (Leake-Jones et al., 1982) make this type of model impractical for studies of electrical stimulation of auditory nerve fibers.

An alternative method of physically ablating cochlear structures is through exposure to high-intensity sound, often in excess of 100 dB SPL (e.g., Sullivan et al., 2011). Such exposure can cause permanent damage to the cochlea that may provide a good model of frequency-specific hearing loss, such as the high-frequency impairments common in aging populations. However, the hearing loss produced is highly variable between individual animals (Bredberg, 1973; Cody and Robertson, 1983), such that the utility of noise-induced hearing loss for deafening animal models is limited.

It is also possible to prevent auditory stimuli from reaching subcortical and cortical auditory structures via bilateral transection of the auditory nerve. Unfortunately, efforts to completely transect the auditory nerve often cause inadvertent damage to adjacent vestibular nerve fibers. Conversely, overly conservative transections may preserve some auditory fibers, maintaining a partial representation of the cochlea.

#### **OTOTOXIC DRUG ADMINISTRATION**

The final class of methods used to obtain deaf animal models takes advantage of the ototoxic side-effects of common drugs. For example, their impressive efficacy and low cost make aminoglycoside antibiotics the most widely used class of antibacterial drugs worldwide (Forge and Schacht, 2000). However, their nephrotoxic and ototoxic side-effects have been well-documented. While some of the aminoglycosides (e.g., gentamycin, tobramycin, streptomycin) have been shown to be predominantly vestibulotoxic, others (e.g., neomycin, kanamycin, amikacin, dihydrostreptomycin) exhibit toxicity primarily within the cochlea. Across a number of species, the onset of this toxicity has been shown to be related to the onset of auditory function. For example, both rats (O'Leary and Moore, 1992) and cats (Shepherd and Martin, 1995) receiving ototoxic drug administrations before the onset of hearing later showed normal auditory thresholds, while animals treated after the onset of hearing showed profound hearing losses. There also appears to be a sensitive period following the onset of hearing, during which animals are particularly sensitive to aminoglycoside toxicity (Henley and Rybak, 1995; Henley et al., 1996). During this period [post-natal days 11–16 in the rat (Henley et al., 1996)], decreased elimination rate constants and increased half-lives lead to much higher mean serum aminoglycoside levels in young animals than in old animals. Thus, ototoxicity is expressed 2–3 times more quickly in these younger animals (Osaka et al., 1979; Astbury and Read, 1982).
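The link between elimination rate, half-life, and mean serum level can be illustrated with a minimal sketch. It assumes a simple one-compartment model with first-order elimination, which is an assumption made here for illustration and is not stated in the studies cited; the function names and all numerical values are hypothetical.

```python
import math


def half_life(k_el: float) -> float:
    """Half-life (h) for first-order elimination with rate constant k_el (1/h)."""
    return math.log(2) / k_el


def mean_steady_state_conc(dose_mg: float, interval_h: float,
                           volume_l: float, k_el: float) -> float:
    """Average steady-state serum concentration (mg/L) under repeated dosing.

    Assumes a one-compartment model with complete absorption:
    clearance = k_el * volume, and C_avg = dose / (clearance * interval).
    """
    clearance = k_el * volume_l
    return dose_mg / (clearance * interval_h)


# Hypothetical values: the same dose in two animals that differ only in k_el.
k_old, k_young = 0.4, 0.2  # 1/h; the younger animal eliminates more slowly
for label, k in (("old", k_old), ("young", k_young)):
    t_half = half_life(k)
    c_avg = mean_steady_state_conc(dose_mg=10.0, interval_h=24.0,
                                   volume_l=0.1, k_el=k)
    print(f"{label}: half-life = {t_half:.2f} h, mean level = {c_avg:.2f} mg/L")
```

Under these assumptions, halving the elimination rate constant doubles both the half-life and the mean serum level for a fixed dosing schedule, which is the direction of the effect described for young animals.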

The exact mechanisms involved in aminoglycoside ototoxicity are not fully understood. Labeled aminoglycosides appear first in the stria vascularis (Wang and Steyger, 2009), suggesting that they enter the fluids of the inner ear via strial capillaries, and subsequently accumulate in hair cells. The point of entry of aminoglycosides into cochlear hair cells is also not clear. While there is some suggestion that endocytosis is the primary mechanism (Hashino and Shero, 1995; Richardson et al., 1997), others advocate for the mechano-electrical transducer channel located on the stereocilia (Marcotti et al., 2005; Waguespack and Ricci, 2005). Still others suggest transient receptor potential channels expressed in the cochlea and permissive to aminoglycosides in renal cells may play a role (see Huth et al., 2011 for review). Within the hair cells, there is some speculation that mitochondria are the target of aminoglycoside toxicity. A maternally-linked genetic predisposition to ototoxic susceptibility (Hu et al., 1991) and the potentiation of toxicity that results from the inhibition of mitochondrial protein synthesis (Hyde, 1995), suggest that the drugs target and impair the function of mitochondrial RNA. This might explain why the toxic effects of aminoglycosides are readily observed in mitochondria-rich tissues like the organ of Corti.

Regardless of the mechanisms involved, it is clear that ototoxic aminoglycosides, when administered in sufficient quantity, can be used to produce deaf animals across a number of mammalian species. Repeated intramuscular injections of an aminoglycoside result in bilaterally symmetric hearing loss that progresses from high to low frequencies (Simmons et al., 1960). The time course of this hearing loss has been described as biphasic, consisting of a dramatic reduction in high frequency hearing occurring within 48 h of the first injection, followed by a slow reduction that proceeds from high to low frequencies over several weeks (Shepherd and Clark, 1985). While effective, there is considerable variability in the extent of cochlear damage between individuals for any given drug dosage (Leake-Jones et al., 1982). Repeated drug administrations of this nature are also stressful to the animal, and time consuming for the experimenter (e.g., the cats deafened by Shepherd and Clark (1985) first showed profound low-frequency hearing loss 75 days following drug administration). Finally, the risk of kidney failure following repeated drug administrations is significant and thus, renal function must be constantly monitored during the deafening procedure.

In an effort to circumvent systemic effects, alternative methods involve aspirating the cochlear lymph, and administering aminoglycoside antibiotics directly to the cochlea (e.g., Leake-Jones et al., 1982; Zettel et al., 2003; Asako et al., 2005). This method causes the rapid-onset of profound hearing loss across the entire frequency range. While this method may prevent undesirable nephrotoxicity, the degree of destruction in the organ of Corti is extreme and may limit the types of deafness that can be modeled in this manner. Moreover, this method is not ideal for studies of cochlear implant function, as it can result in extensive fibrous tissue and bone growth within the scala tympani (Sutton and Miller, 1983).

A promising method for deafening animals involves administering an aminoglycoside antibiotic in combination with an infusion of a loop diuretic, such as furosemide or ethacrynic acid (West et al., 1973). Loop diuretics do not result in permanent ototoxicity when administered in isolation; rather, they are thought to act on the stria vascularis to reduce the endocochlear potential, causing a subsequent alteration of the ionic composition of the endolymph that fills the scala media. Typically, a single injection of an aminoglycoside is given enough time to accumulate in the cochlea. A loop diuretic is then infused, and the animal's hearing thresholds are either continuously monitored, or periodically monitored, usually via auditory brainstem responses (ABR). This method produces a rapid and dramatic bilateral hearing loss in guinea pigs (West et al., 1973; Brummett et al., 1979) and cats (Xu et al., 1993). Unfortunately, efficacy differs by species, with the same procedure resulting in only mild hearing loss, and acute renal failure in the macaque (Shepherd et al., 1994).

#### **MEASURING DEAFNESS**

Regardless of the procedure used, successful deafening depends on valid and reliable methods of measuring the degree of hearing loss achieved. For example, one early method of assessing deafness that is now rarely used involved producing a loud hand clap and observing whether an animal responded with reflexive movement of the pinnae or a startle reflex (Preyer, 1882). While the absence of Preyer's reflex has been shown to correlate well with profound hearing loss (Jero et al., 2001), the method relied heavily on subjective measure, and was incapable of distinguishing between conductive and sensorineural hearing loss.

Currently, an overwhelming majority of researchers rely on an auditory brainstem response (ABR) to define the endpoint of, or to measure the success of their deafening procedure. ABRs can be evoked using a variety of auditory stimuli. However, researchers typically rely on click-evoked ABRs to measure deafness in animal models (see **Figure 1** for an example). While clicks contain energy across a wide frequency band, click-evoked ABRs are not equally sensitive to hearing loss across this same range. Rather, the results of click-evoked ABRs in humans correlate best on average with audiometric thresholds in the range of 2–4 kHz (Watkins and Baldwin, 1999). Indeed, Shepherd and Martin (1995) noted that the click-evoked ABR is not a good predictor of high-frequency hearing loss in cats; such losses are better revealed by ABR audiograms that measure thresholds at different pure tone frequencies. Insensitivity of click-evoked ABRs to high frequency hearing loss is cause for concern when monitoring the auditory status of humans receiving aminoglycoside antibiotics, as this class of drugs is known to first impair hair cell function in the high frequency range (Simmons et al., 1960), and thus, early signs of hearing loss may be masked. However, when deafening an animal via ototoxic drug administration, the goal of the procedure is often profound hearing loss across all frequencies. Thus, the correlation between low-frequency hearing and click-evoked ABR results is less troublesome. That being said, pure tone-evoked ABR provides an alternative method for measuring hearing thresholds across a variety of frequencies, provided that the frequencies chosen represent the extent of the audible frequency range of the animal in question.

comprised of 5 peaks: wave I is thought to be generated by the peripheral auditory nerve; wave II by the central auditory nerve; wave III by the cochlear nucleus; wave IV by the superior olive and lateral lemniscus; and wave V by the lateral lemniscus and inferior colliculus. Each of these characteristic peaks shows a reduction in amplitude and an increase in latency as presentation level decreases. Auditory thresholds are typically considered to lie somewhere between the presentation level at which no discernible response is present and the level at which a response is first elicited (between 20 and 25 dB, respectively, in this example).

In humans, the click-evoked ABR is a widely used screening tool for hearing loss in neonates (see Hyde, 2005 for review). Typically, an automated screening device provides a pass/fail output with no need for subjective evaluation, but also produces false-positive rates between 3 and 8% (Barsky-Firkser and Sun, 1997; Mason and Herrman, 1998; Mehl and Thompson, 1998). Infants who fail this initial screen are referred for further audiological assessment. In addition to deafness, a number of disease conditions can result in abnormal ABRs, including posterior fossa tumors, vertebra-vascular pathology, demyelinating diseases, central nervous system infections, and polyneuropathies (Thomsen and Tos, 1990). In humans, diagnosis of the cause of ABR abnormality requires follow-up audiometric testing and imaging. Fortunately, follow-up measures can be avoided when using ABR to assess the success of deafening, provided the animal was shown to have a normal ABR at the onset of the procedure. When deafening animal models, hearing loss is typically considered to be complete when waves I through V of the ABR are absent at stimulus intensities of 80 dB nHL or greater. While this subjective evaluation may be a cause for concern, the complete absence of wave I reflects a lack of activity in the auditory nerve (e.g., Starr, 1976) and thus, should be expected to reflect profound hearing loss throughout the central and peripheral auditory structures.
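The threshold bracketing and the 80 dB nHL criterion described in this section can be sketched as follows. This is an illustrative sketch only: the function names, the representation of each tested level as a simple present/absent flag, and the example levels are assumptions introduced here, and deciding whether a response is present at a given level still depends on evaluation of the recorded waveform.

```python
from typing import Dict, Optional, Tuple

CRITERION_DB_NHL = 80.0  # level cited in the text for a deaf model


def threshold_bracket(
    responses: Dict[float, bool],
) -> Optional[Tuple[Optional[float], float]]:
    """Bracket the ABR threshold from per-level response flags.

    `responses` maps presentation level (dB) to True if any ABR wave is
    discernible. Returns (highest tested level below the first response,
    lowest level with a response), or None if no response was seen at all.
    """
    levels = sorted(responses)
    present = [lvl for lvl in levels if responses[lvl]]
    if not present:
        return None
    upper = present[0]
    below = [lvl for lvl in levels if lvl < upper]
    lower = below[-1] if below else None  # None: response at lowest level tested
    return (lower, upper)


def meets_deafness_criterion(
    responses: Dict[float, bool], criterion: float = CRITERION_DB_NHL
) -> bool:
    """True if at least one level at or above `criterion` was tested and
    no response was present at any such level."""
    high = [lvl for lvl in responses if lvl >= criterion]
    return bool(high) and not any(responses[lvl] for lvl in high)


# Hypothetical recordings before and after a deafening procedure.
baseline = {10: False, 15: False, 20: False, 25: True, 30: True}
deafened = {60: False, 70: False, 80: False, 90: False}
print(threshold_bracket(baseline))         # (20, 25)
print(meets_deafness_criterion(deafened))  # True
```

The baseline example mirrors the bracket of 20 to 25 dB given for Figure 1; the second function returns False when no level at or above the criterion was tested, so that an incomplete recording is not mistaken for evidence of deafness.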

In sum, the ABR represents a quick and inexpensive method of monitoring auditory system function during or following deafening procedures. Click-evoked ABRs may be insensitive to high-frequency hearing losses that often precede impairment at lower frequencies in aminoglycoside-induced deafness. However, when seeking to ensure a profound hearing loss across all frequencies, ABR is well-suited, provided that a baseline ABR is suggestive of normal hearing status prior to deafening.

## **EFFECTS OF DEAFNESS ON THE AUDITORY SYSTEM**

#### **THE COCHLEA AND COCHLEAR NERVE**

The nature of the cochlear damage involved with hearing loss in animal models differs widely depending on the nature of the deficit. For example, in the case of mechanical destruction of the cochlea, the impact to cochlear structures is decidedly dependent on the extent of damage (see **Figure 2** for an illustration of cochlear structures commonly affected). Conversely, the cochlear damage associated with genetic models of hearing loss is particular to the specific genes involved. For example, deaf white cats mimic the Scheibe deformity in humans, presenting with early-onset, progressive cochleosaccular degeneration and severe sensorineural hearing impairment (Scheibe, 1892). However, the rate and extent of pathology are widely variable between animals. The traditionally described course of pathology involves cochleosaccular degeneration that begins at the end of the first postnatal week with the sagging and ultimate collapse of Reissner's membrane, distortion of the tectorial membrane, and atrophy of the stria vascularis (Bosher and Hallpike, 1965, 1967; Ryugo et al., 1997, 1998, 2003). However, additional forms of pathology have been described involving excessive epithelial growth within the bony labyrinth either in isolation, or in addition to the collapse of Reissner's membrane (Ryugo et al., 2003; Baker et al., 2010).

In any case, these anatomical changes are typically followed by hair cell destruction that proceeds from the cochlear base toward its apex (e.g., Leake et al., 1997; Ryugo et al., 1998), mimicking the pattern of maturation in the organ of Corti (Pujol and Marty, 1970; Romand and Romand, 1982; Lim and Anniko, 1984). The extent of cell loss ranges from the basal 20% of the cochlea in cases of threshold elevation (single unit thresholds in excess of 60 dB SPL for tones below 10 kHz; Ryugo et al., 1998) to the eventual complete loss of inner and outer hair cells, as well as supporting cells along the entire length of the basilar membrane in cases of complete deafness (Rebillard et al., 1981; Ryugo et al., 1998).

Each inner hair cell of the cochlea has a direct, one-to-one connection with a spherical bushy cell in the cochlear nucleus, via a type I spiral ganglion neuron (SGN; Sento and Ryugo, 1989). The number of SGNs is drastically reduced in hearing impaired animals; the nature of the deficit appears to depend on the degree and duration of hearing loss, as well as the age at which it occurs. In animals with congenitally elevated thresholds (Ryugo et al., 1998), damage is often limited to those cells which innervate the most basal portion of the cochlea. Short-term deafened adult animals also show maximal cell loss in basal SGNs (Leake and Hradek, 1988), while animals deafened during development, before the onset of hearing, present maximal SGN degeneration in a region approximately 40–60% from the cochlear base (Leake et al., 1991, 1992). Finally, congenitally deaf (Ryugo et al., 1998) and long-term deafened animals (Shepherd et al., 2004; Hurley et al., 2007) present with a dramatic reduction in SGNs throughout the entirety of the cochlea. The process of SGN loss begins with the loss of unmyelinated peripheral processes in the organ of Corti, followed by a gradual degeneration of myelinated processes in the spiral lamina, and of the cell somata within Rosenthal's canal (Leake and Hradek, 1988; Heid et al., 1998; Hardie and Shepherd, 1999). Surviving SGNs are devoid of a perikaryal myelin sheath (Leake and Hradek, 1988; Shepherd and Hardie, 2001), which can lead to reduced temporal resolution (Zhou et al., 1995), increased refractory properties (Shepherd et al., 2004), and evidence of conduction block (Shepherd and Javel, 1997). Schwann cells within the deafened cochlea can survive for some time despite the degeneration of SGNs; however, there is some evidence that they revert to a non-myelinating phenotype (Hurley et al., 2007).

It has been suggested that alterations of SGN structure occur secondary to cochlear pathology in a variety of species, including deaf white cats (Bosher and Hallpike, 1965, 1967; Suga and Hattler, 1970; Mair, 1973; West and Harrison, 1973; Elverland et al., 1975), mice (Mikaelian et al., 1974), Dalmatians (Johnsson et al., 1973; Mair, 1976), and humans (Altmann, 1950). There is some evidence that the survival of SGNs depends on endogenous, pro-survival neurotrophin peptides that are normally provided by the hair cells and supporting cells of the cochlea (Springer and Kitzman, 1998; Landry et al., 2011). However, others have suggested that SGN pathology represents a separate degenerative process that can precede or follow cochlear damage (Pujol et al., 1977; Leake et al., 1997). Indeed, in some cases the pattern of SGN loss differs significantly from the pattern of cochlear pathology, lending support to this latter view. Furthermore, in congenitally deaf animals, a large number of unmyelinated SGNs are found before evidence of other sensory or epithelial deficits occur, and in some cases SGN degeneration precedes damage to the sensory cells of the cochlea (Pujol et al., 1977).

Auditory nerve fibers bifurcate in the ventral cochlear nucleus, sending an ascending branch rostrally in the anterior division (AVCN), and a descending branch caudally into the posterior division (PVCN) of the ventral nucleus, which ultimately innervates the dorsal nucleus. These branches terminate in a variety of structures including endbulbs of Held, modified endbulbs, and terminal boutons which may be accompanied by a series of en passant swellings. In normal hearing animals, endbulbs of Held typically exhibit a complex arborization, with multiple branches that stem from a single, thick trunk. These endings typically contact up to half of the soma of a spherical bushy cell (SBC; Ryugo et al., 1997). In contrast, the endbulbs of Held that remain following deafness exhibit less extensive arborization, giving rise to fewer en passant and terminal swellings that are larger in size, and which contain fewer synaptic vesicles on average than those of normal hearing animals (Ryugo et al., 1997, 1998; Limb and Ryugo, 2000; Lee et al., 2003; Baker et al., 2010; O'Neil et al., 2011). The fine, interconnected varicosities and branches present in the endbulbs of Held of normal hearing animals are absent in the deaf, leading to diminished contact with the target bushy cell (Ryugo et al., 1998). In fact, evidence of morphological differences between the endbulbs of Held of deaf animals and those of hearing animals are evident at birth both in deaf white cats (Baker et al., 2010) and mice (Oleskevich and Walmsley, 2002; McKay and Oleskevich, 2007). In contrast, the modified endbulbs of deaf animals, which typically contact globular bushy cells (GBCs) in the VCN, show a drastic reduction in size, but are not different from those of normal hearing animals in terms of complexity (Redd et al., 2000). 
Finally, the bouton endings that synapse on multipolar cells of the cochlear nucleus are significantly smaller in congenitally deaf animals than in normal hearing controls (Redd et al., 2002).

The highly-organized pattern of neurons projecting to the cochlear nucleus helps maintain the tonotopic organization initiated in the cochlea. In hearing animals, these projections are broad prior to the onset of hearing, and are refined during a sensitive period for development occurring shortly thereafter (Snyder and Leake, 1997). However, this refinement is activity-dependent; the tonotopic specificity of projections to the cochlear nucleus is significantly degraded in hearing impaired animals (Leake et al., 2006).

#### **SUBCORTICAL NUCLEI**

#### *Cochlear nucleus (Table 1)*

In many ways, the *pattern* of ascending auditory projections in the brainstem of congenitally deaf animals appears normal (Heid et al., 1997). However, anatomical and functional changes are present at most levels of this pathway (**Figure 3**). The precise nature of these changes depends on a number of factors, including an animal's age at the onset of deafness and the intervention used to induce deafness. Thus, a summary table is provided for each of the following subcortical and cortical sections to allow for direct comparison across age and methodology.

**Table 1 | Summary of changes in cochlear nuclei.**

Neonatal removal of the cochlea or blockade of cochlear nerve activity results in reduced cochlear nucleus volume resulting from decreases in the number of neurons (Nordeen et al., 1983; Tierney et al., 1997; Moore et al., 1998), in the size of neurons (Hulcrantz et al., 1991; Lustig et al., 1994; Saada et al., 1996; Hardie and Shepherd, 1999), or a combination of the two (Hashisaki and Rubel, 1989). The magnitude of these changes is dependent on a number of factors including the degree of ganglion cell loss (Moore and Kowalchuk, 1988; Hardie and Shepherd, 1999), and the duration of hearing impairment (Hardie and Shepherd, 1999). Additionally, changes in both neuronal size and number appear to be related to the time at which auditory input is removed; the greatest decrease occurs when animals are deafened long before the onset of hearing, while those animals deafened at or after hearing onset show no change (Tierney et al., 1997; Stakhovskaya et al., 2008). Finally, the decrease in volume is typically less severe in the dorsal division than in either of the ventral divisions (Moore and Kowalchuk, 1988; Anniko et al., 1989; but see Saada et al., 1996).

*PSD, Postsynaptic density; Day 0* = *Day of birth.*

In the cochlear nuclei of hearing animals, post-synaptic densities (PSDs) cover the somata of bushy cells. These PSDs are punctate and tend to present as distinct convexities that indent the presynaptic membrane (Redd et al., 2000). In contrast, bushy cells of congenitally deaf animals contain PSDs that are larger and appear flattened (Redd et al., 2000; Ryugo et al., 2010), while the cell somata themselves are smaller than those of hearing animals (West and Harrison, 1973; Moore and Kowalchuk, 1988; Saada et al., 1996). While no deafness-related changes in the size of PSDs or in synaptic vesicle density have been reported for multipolar cells in the cochlear nucleus, the cell bodies themselves are significantly smaller in deaf animals than in normal hearing controls (Redd et al., 2002). Additionally, the system of channels that exists in the synaptic cleft between these cells and the terminal boutons of ascending auditory nerve fibers (which likely functions to remove neurotransmitter from the synapse) is significantly less complex following hearing loss (Redd et al., 2002).

In addition to these anatomical differences, changes in the function of synapses in the cochlear nucleus appear to increase the likelihood of action potential generation under conditions of drastically diminished spike activity. For example, some deaf models show an increase in neurotransmitter release probability, relative to normal hearing controls (Oleskevich and Walmsley, 2002). Concurrently, hypertrophy of PSDs in the deaf cochlear nucleus may represent an upregulation of the neurotransmitter receptors in order to optimize potential responses (Redd et al., 2000). It has been suggested that the differential effects of deafness on PSD size and vesicle density between bushy cells and multipolar cells may be related to the baseline activity levels; those cells which are normally highly active (bushy cells) undergo large-scale compensatory changes following deafness, while cells with lower baseline rates of activity (multipolar cells) undergo little or no change (Redd et al., 2002). It has further been suggested that the changes occurring at bushy cell synapses may impair the ability of those cells to reliably preserve temporal coding information arriving from ganglion cells (Wang and Manis, 2006).

#### *Superior olive (Table 2)*

The superior olivary complex consists of three primary nuclei, the medial superior olive (MSO), lateral superior olive (LSO), and the medial nucleus of the trapezoid body (MNTB), along with several smaller periolivary nuclei. In hearing animals, the MSO, LSO, and MNTB contribute to sound localization in the azimuth, and are tonotopically organized. Rough frequency gradients in these nuclei are established by the differential expression of ion channels (e.g., Li et al., 2001) and currents (e.g., Leao et al., 2006) along the tonotopic axis, occurring before the onset of hearing. However, these physiological gradients are dependent on spontaneous activity in the cochlear nerve, and fail to develop in congenitally deaf models that lack spontaneous spiking (von Hehn et al., 2004; Leao et al., 2006).

The MSO receives input from the cochlear nuclei bilaterally. Within the MSO of hearing animals, excitatory inputs are segregated such that ipsilateral inputs terminate on lateral dendrites while contralateral inputs terminate medially (Russell and Moore, 1995; Kapfer et al., 2002). In the absence of auditory input, dendrites of MSO neurons have been shown to undergo selective atrophy, leading to a reduction in the number, but not in the overall area of dendritic profiles (Russell and Moore, 1999). Some researchers report age-related decreases in the size of MSO neurons, and the total volume of the nucleus of congenitally deaf animals (Schwartz and Higa, 1982), while others fail to find evidence for such changes (Tirko and Ryugo, 2012). Inhibitory inputs of normal hearing mammals specialized for low frequency hearing (e.g., gerbil, cat, chinchilla) tend to be confined primarily to MSO cell bodies (Werthat et al., 2008; Couchman et al., 2010). This spatial arrangement is thought to be crucial for processing the sub-millisecond interaural differences that allow for accurate sound localization, and is the result of neural activity-dependent developmental change. Deafness causes a bilateral disruption in the spatial segregation of MSO neurons, with a significant reduction in inhibitory input at the cell somata (Kapfer et al., 2002; Tirko and Ryugo, 2012) and along the dendrites (Tirko and Ryugo, 2012). While the density of terminations on MSO cell dendrites does not change following hearing loss, the terminal boutons of deaf animals are significantly smaller than those of normal hearing animals (Tirko and Ryugo, 2012).

*MSO, Medial superior olive; LSO, Lateral superior olive; MNTB, Medial nucleus of the trapezoid body.*

The LSO receives excitatory input from the ipsilateral cochlear nucleus, and inhibitory input from the contralateral cochlear nucleus, via the MNTB. In animal models of hearing loss, cochlear destruction leads to neural loss and shrinkage of the LSO (Moore, 1992) and a decrease in the size of cell somata in MNTB (Pasic et al., 1994). Within the MNTB, there is a large central synapse known as the calyx of Held that undergoes remarkable development to ensure the high-fidelity transfer of sound information. Interestingly, this development appears to be unrelated to both spontaneous and sound-evoked neural activity, such that the calyx of Held matures normally in deaf animals (Oleskevich and Walmsley, 2002; Oleskevich et al., 2004; Youssoufian et al., 2005). In hearing animals, the rough tonotopy established before the onset of hearing is later refined such that each neuron of the LSO receives excitatory and inhibitory inputs from neurons that respond to the same sound frequency (Kandler et al., 2009). However, the pruning that leads to this sophisticated tonotopy depends in large part on auditory-evoked activity (Gillespie et al., 2005; Kandler et al., 2009) and fails to occur following early-onset deafness.

#### *Inferior colliculus (Table 3)*

The inferior colliculus (IC) is comprised of dorsal and lateral cortices that collectively form a "rind" surrounding the central core (Winer, 2005). This central nucleus receives inputs from the cochlear nuclei, superior olives, and nuclei of the lateral lemniscus, as well as descending inputs from the auditory cortex and superior colliculus. Interestingly, the pattern of projections remains relatively unchanged following long-term auditory deprivation. For example, projections from the cochlear nucleus to the ipsilateral IC show no change in number following bilateral cochlear removal (Moore, 1990), while projections to the contralateral IC show either a small decrease (Trune, 1983) or no change at all (Moore and Kowalchuk, 1988; Moore, 1994). Similarly, the number of projections from the cochlear nucleus to IC is unaffected by congenital deafness (Heid et al., 1997). Projections from the superior olivary complex to IC are similarly unaffected by cochlear removal (Russell and Moore, 1995) or congenital deafness (Heid et al., 1997). Finally, a rudimentary representation of tonotopy persists in the IC following long-term deafness (Snyder et al., 1990, 1991; Heid et al., 1997; Shepherd and Javel, 1999), suggesting that frequency-based organization is established independent of patterned auditory activity.

While the number of projections to the IC appears unaffected by hearing loss, there appear to be substantial qualitative differences between IC neurons in hearing-deprived and normal hearing animals. The somata of IC neurons in bilaterally deafened animals undergo some degree of atrophy, resulting in a slight but significant decrease in soma area relative to normal hearing controls (Nishiyama et al., 2000). Moreover, early-onset hearing loss leads to a sharp reduction in synaptic density relative to normal hearing animals, and an apparent decrease in the number of presynaptic vesicles in many of the remaining synapses (Hardie et al., 1998). Developmental studies have shown that dramatic increases in synaptic density in the IC follow the onset of hearing, suggesting a role for stimulus-evoked neural activity in shaping connections in this nucleus (Aitkin et al., 1996, 1997).

Functionally, bilateral cochlear ablation causes a rapid loss of inhibitory (Vale and Sanes, 2000, 2002) and excitatory (Vale and Sanes, 2002) synaptic strength, as a result of changes to both pre- and post-synaptic mechanisms. Vale and Sanes (2002) have demonstrated that these changes are independent of deafferentation-induced cell death of neurons in the cochlear nucleus. IC neurons deprived of auditory input also demonstrate poor temporal resolution, with decreased maximum following frequencies and longer response latencies than IC neurons in normal hearing animals (Snyder et al., 1995; Shepherd et al., 1999; Vollmer et al., 2005).

**Table 3 | Summary of changes in inferior colliculi.**

*CN, Cochlear nucleus; SO, Superior olive.*

#### *Medial geniculate body*

The medial geniculate body (MGB) is the auditory thalamic processing station between the inferior colliculus and the auditory cortex. Across species, the MGB is typically subdivided into multiple subsections, each of which contains several nuclei that process both afferent and efferent neural activity (Winer, 1984; Clerici and Coleman, 1990). Despite its importance to the auditory system, there is a paucity of information on changes at the thalamic level following deafness; a single study has identified normal cortical projections to A1 from the MGB of neonatally deafened animals (Stanton and Harrison, 2000). There are a number of potential reasons for this lack of information, the most likely of which is difficulty accessing thalamic structures. Because of its location, the MGB is very difficult to target, both for neuroanatomical tracer injection, and for the type of *in vivo* electrophysiological studies that have measured function at other levels of the auditory system. Changes in the pattern of projection to auditory cortex could be revealed through cortical injections of retrograde tracers; however, these studies have not yet been undertaken.

#### **AUDITORY CORTEX (TABLE 4)**

The primary auditory cortex (A1) is the most extensively studied area of auditory cortex. In congenitally deaf animals, A1 has a similar laminar structure to that of hearing animals (Hartmann et al., 1997). Electrophysiological studies have suggested that the area occupied by A1 increases slightly following neonatal deafening (Raggio and Schreiner, 1999), while the size of A1 in congenitally deaf animals appears to be no different than in hearing animals (Kral et al., 2002). However, anatomical studies demonstrate that auditory cortex decreases in size following hearing loss, and that this decrease is correlated with the age of deafness onset (Wong et al., 2013a). In particular, the size of A1 appears drastically reduced following early-onset deafness (Wong et al., 2013a) as well as in congenitally deaf animals (Wong et al., 2013b). In addition, congenitally deaf animals present with reductions in both the number of primary dendrites and in the span of dendritic trees in primary auditory cortex relative to hearing controls (Kral et al., 2006). Thus, while gross-level anatomical similarities may exist between hearing and non-hearing animals, functional connectivity differs greatly between the two. For example, inputs to layers III/IV of A1 are present in congenitally deaf animals, as are subsequent inputs to more superficial, supragranular layers (Klinke et al., 1999). However, activity in deeper, infragranular layers is significantly decreased (Kral et al., 2000, 2002), and synaptic current latencies are significantly longer [after controlling for brainstem latency shifts (Kral et al., 2000; Klinke et al., 2001)], suggesting that connections between superficial and deeper layers do not mature. In hearing animals, the infragranular layers of A1 are the source of descending, feedback projections. Thus, inactivity in these layers following auditory deprivation suggests that subcortical feedback loops are likely non-functional.

In hearing animals, supragranular layers project to higher-order areas of auditory cortex. The presence of supragranular activity in electrically-stimulated deaf animals suggests that feedforward connections persist between A1 and secondary auditory areas in deaf animals, at least early in development. Feedback projections from these higher-order auditory areas primarily target the deep layers of A1 (Rouiller et al., 1991). Inactivity in the infragranular layers of deaf A1 suggests that these feedback projections and the associated top-down modulation of activity in A1 do not develop in deaf animals (Raizada and Grossberg, 2003). In support of this idea, *in vitro* electrophysiological examination of hearing-deprived auditory cortex has demonstrated that layer V neurons are incapable of undergoing the sort of long-term potentiation that normally underlies synaptic plasticity (Kotak et al., 2007).

*LTP, Long-term potentiation; A1, Primary auditory cortex.*

Functional changes in the primary auditory cortex of congenitally deaf animals have been explored using *in vitro* electrophysiological techniques, as well as through the introduction of peripheral electrical stimulation via a cochlear implant. Multiunit recordings from deaf A1 show slightly increased spontaneous firing rates when compared to hearing animals, which may reflect upregulated spontaneous activity in thalamic inputs (Kral et al., 2003). Additionally, the excitability of A1 neurons has been shown to increase following deprivation of afferent activity, while inhibition is decreased (Raggio and Schreiner, 1999; Kotak et al., 2005; Kral et al., 2005). Together, these results suggest that cortical neurons favor excitability, likely as a response to reduced cochlear excitation. However, when driven via electrical stimulation, evoked neural activity is decreased in congenitally deaf animals compared to hearing controls (Kral et al., 2005).

Despite changes in the rate of activity, the rudimentary features of A1 neuron responses appear to be present in congenitally deaf animals, even after a complete, and in some cases long-term, lack of stimulus-evoked neural activity. For example, the rate-intensity and latency-intensity functions of electrically-stimulated deaf A1 neurons are similar to those of hearing animals (Raggio and Schreiner, 1994, 1999). Additionally, A1 neurons from congenitally deaf animals demonstrate rudimentary binaural feature sensitivity (Tillein et al., 2010). Interestingly, there are no reports of changes in the temporal processing of electrically stimulated A1 neurons, despite changes in downstream structures (see above).

As in the IC, the auditory cortex of congenitally deaf animals maintains a rudimentary representation of tonotopy, even after extensive periods of hearing loss (Hartmann et al., 1997; Shepherd et al., 1997; Kral et al., 2001, 2002; Tillein et al., 2010), with an activation area similar to hearing controls (Kral et al., 2005). Conversely, neonatally deafened animals show a near-complete loss of tonotopic organization and a rostro-caudal spread of activation in A1 (Raggio and Schreiner, 1999; Fallon et al., 2009). Tonotopic organization of the IC remains following neonatal deafening, and thalamocortical projections to A1 have been shown to be relatively normal in deafened animals (Stanton and Harrison, 2000), suggesting that these differences in A1 tonotopy are the result of reorganization at the level of the thalamus or of A1 itself, serving to increase the overlap between adjacent basilar membrane representations.

#### **CROSS-MODAL REORGANIZATION FOLLOWING DEAFNESS**

Genetic blueprints play a significant role in the establishment of rudimentary organization throughout the auditory system prior to the onset of hearing. For example, molecular guidance cues establish tonotopy in the cochlear nucleus in the absence of stimulus-related activity (Kandler et al., 2009), and ectopic projections from the cochlear nucleus to the superior olive are established before the onset of cochlear function (Kitzes et al., 1995; Russell and Moore, 1995). In hearing animals, this organization undergoes stimulus-evoked, activity-dependent refinement, such that adult-like perception is achieved only after hearing onset. As with other sensory systems, congenital deprivation results in an immature system that appears to persist for some time following the normal point of hearing onset. However, if sensory input is not restored before the end of the sensitive period for normal development, many auditory structures may be recruited by another sensory modality. This cross-modal reorganization of cortical structures is thought to underlie behavioral enhancements observed in the remaining sensory modalities of both animal models (e.g., Lomber et al., 2010), and of humans (see Bavelier et al., 2006 for a review).

In hearing animals, the response properties of A1 neurons remain dynamic into adulthood, undergoing rapid changes in order to optimize auditory perception. For example, animals trained to detect a tone of a particular frequency within a complex soundscape show facilitated processing for that frequency in A1 (Fritz et al., 2003). Despite evidence that primary sensory areas are *capable* of processing information from remaining sensory modalities when that information is introduced via surgical manipulation of afferent inputs (Frost and Metin, 1985; Sur et al., 1988; Ptito et al., 2001), crossmodal reorganization in the primary auditory cortex following congenital and early-onset deafness remains a contentious issue. Rebillard and colleagues (1977) reported recording visually-evoked responses to flashes of a stroboscopic light in the primary auditory cortex of both congenitally deaf and early-deafened cats. However, other researchers report an absence of neurons in A1 that are responsive to light flashes or illuminated bars (Stewart and Starr, 1970; Kral et al., 2003). This has led to the belief that A1 is not susceptible to crossmodal reorganization following sensory deprivation. This is in accordance with research in the visual system; congenital blindness leads to the processing of auditory stimuli in areas of cortex which normally process visual information in both cats (Rauschecker and Korte, 1993) and humans (Röder et al., 2000). However, cross-modal reorganization is limited to higher-order visual areas, with no change in primary visual cortex (Yaka et al., 1999; Weeks et al., 2000). Kral and colleagues (2003) also investigated whether cells in deaf A1 were responsive to somatosensory stimulation, finding none that responded to direct stimulation by a cotton pad applied to various parts of the head and body, or to puffs of air directed toward the face of the animal. 
However, more recent studies in early- (Meredith and Allman, 2012) and late-deaf ferrets (Allman et al., 2009) have found evidence of neurons in core auditory areas, including A1, that are responsive to strokes and taps from brushes and Semmes-Weinstein filaments, as well as puffs of air. In these latter studies, crossmodally activated neurons tended to have large, bilateral receptive fields that were not somatotopically organized. Anatomical tracer injections demonstrated that the pattern of projections between somatosensory areas and A1 in reorganized deaf animals does not differ from the pattern present in hearing animals, suggesting that crossmodal activity does not rely upon the formation of novel projections (Meredith and Allman, 2012). Thus, contradictory data exist with respect to crossmodal reorganization between deaf A1 and both the visual and somatosensory systems. While a number of factors may be involved in these discordant data, a likely candidate involves the anesthetic regimens used. Studies failing to find crossmodal activation of A1 (Stewart and Starr, 1970; Kral et al., 2003) relied on halothane anesthesia, while those demonstrating A1 neurons responsive to non-auditory stimulation used infusions of pentobarbital (Rebillard et al., 1977), or ketamine and acepromazine (Allman et al., 2009; Meredith and Allman, 2012). Since anesthetics are known to vary in their physiological effects (e.g., Albrecht et al., 1977; Schettini, 1980), it is entirely possible that the presence of crossmodal activation in A1 may be differentially affected by the anesthetic used. Beyond the single animal examined by Rebillard and colleagues (1977), it remains to be seen whether visually-evoked activity can be recorded in deaf A1 under appropriate anesthetic conditions.

Unlike A1, there is convincing evidence that higher-order auditory areas process non-auditory stimuli in deaf animals. For example, it has been demonstrated that recruitment of auditory areas typically involved in sound localization, including the posterior auditory field (PAF; Lomber et al., 2010), and the auditory field of the anterior ectosylvian sulcus (FAES; Meredith et al., 2011), underlies enhanced peripheral localization of visual stimuli in deaf animals. In each of these cases, deaf cats were shown to more accurately detect the location of a small LED light source in the periphery of their visual field than did hearing cats. When PAF was reversibly deactivated (Lomber et al., 2010), deaf cats were no better than hearing cats at this task. Interestingly, when FAES was deactivated in the same manner (Meredith et al., 2011), the accuracy of deaf cats fell to well below that of normal cats, suggesting that deaf FAES is involved in visual target detection *in lieu of,* rather than in addition to the visual cortical area normally involved with this task. The dorsal zone (DZ) of the auditory cortex, which lies adjacent to the visual motion processing regions of the middle suprasylvian sulcus (Lomber, 2001), has been shown to mediate enhanced visual motion sensitivity in deaf animals (Lomber et al., 2010). Deaf cats outperformed hearing controls on a two-alternative forced choice task designed to determine their threshold for visual motion detection. However, the thresholds of the two groups were no different following deactivation of DZ. Finally, neurons in the anterior auditory field (AAF) of deaf animals have been shown to encode somatosensory cues from low-threshold hair receptors stimulated by a soft brush or calibrated filament, as well as movement characteristics of visual stimuli, including their velocity and direction (Meredith and Lomber, 2011).

How the sort of cross-modal reorganization described above might occur remains an issue of some debate (see Bavelier and Neville, 2002, for review). Rauschecker (1995) described several possible cortical mechanisms, including unmasking of silent inputs, stabilization of normally transient connections, sprouting of new axons, or some combination of these processes. Indeed, anatomical studies have demonstrated that cortical sensory areas are connected both directly (Falchier et al., 2002; Rockland and Ojima, 2003; Hall and Lomber, 2008; Allman et al., 2009; Meredith and Allman, 2012) and via multimodal cortical areas (Cappe and Barone, 2005). Thus, it is possible that intermodal connections that are normally latent or transient may underlie reorganization. Such reorganization is often examined using tracer injections designed to determine whether the number of axons connecting sensory areas is increased following deafness. However, intersensory connections might also be strengthened via increases in dendritic branching and synapse number (with or without a change in axonal number). Thus, anterograde tracing and analysis of changes in the number of terminal boutons would provide a fuller insight into the role of intracortical connections in cross-modal plasticity. Conversely, it has also been suggested that cortical reorganization may result from changes in subcortical inputs (Allman et al., 2009). For example, both the cochlear nucleus (Shore and Zhou, 2006) and inferior colliculus (Aitkin et al., 1981) have been shown to respond to somatosensory inputs in hearing animals, and this response is enhanced following hearing loss (Shore et al., 2008; Zeng et al., 2012). In the absence of auditory input, subcortical nuclei may respond to inputs from other sensory modalities, and the reorganization of auditory cortex may simply reflect upstream processing of these changes. 
Cortical and subcortical mechanisms for reorganization are by no means mutually exclusive; it is likely that cross-modal plasticity involves some combination of mechanisms that depends, at least in part, on the nature of the hearing impairment, the timing of auditory deprivation, and the replacement sensory modality involved.

#### **CONCLUSIONS**

The absence of auditory input that accompanies hearing impairment causes long-term changes to the structure and function of the auditory system. The exact nature of these changes depends upon factors such as the etiology and onset time of the impairment, and can have significant developmental and psychosocial consequences. Interventions including amplification and cochlear implantation may mitigate these consequences, but each depends critically on the integrity and function of remaining auditory structures. Studies undertaken in deaf animal models have provided much of what is known about the function of the deaf auditory system, and have informed the development and design of hearing prostheses. Perhaps most interestingly, these studies have informed our understanding of sensitive periods in development, and their role in functional recovery following the provision of a hearing aid and/or cochlear implant. The animal studies described herein illustrate the importance of early intervention both in minimizing structural and functional damage within auditory structures and in recovering auditory cortical areas that might otherwise be recruited by other sensory systems. However, the effects of deafness on higher-order cortical areas and the exact mechanism(s) underlying cross-modal plasticity are not yet fully understood. Thus, research using animal models will continue to inform our understanding of the far-reaching consequences of deafness as the field moves forward.

#### **ACKNOWLEDGMENTS**

This work was supported by grants to Stephen G. Lomber from the Canadian Institutes of Health Research, the Natural Sciences and Engineering Research Council of Canada, and the Hearing Foundation of Canada.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 August 2013; accepted: 03 November 2013; published online: 26 November 2013.*

*Citation: Butler BE and Lomber SG (2013) Functional and structural changes throughout the auditory system following congenital and early-onset deafness: implications for hearing restoration. Front. Syst. Neurosci. 7:92. doi: 10.3389/fnsys.2013.00092*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2013 Butler and Lomber. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Cochlear neuropathy and the coding of supra-threshold sound

#### **Hari M. Bharadwaj<sup>1,2</sup>, Sarah Verhulst<sup>1,3†</sup>, Luke Shaheen<sup>4,5</sup>, M. Charles Liberman<sup>3,4,5</sup> and Barbara G. Shinn-Cunningham<sup>1,2</sup>\***

<sup>1</sup> Center for Computational Neuroscience and Neural Technology, Boston University, Boston, MA, USA

<sup>2</sup> Department of Biomedical Engineering, Boston University, Boston, MA, USA

<sup>3</sup> Department of Otology and Laryngology, Harvard Medical School, Boston, MA, USA

<sup>4</sup> Eaton-Peabody Laboratories, Massachusetts Eye and Ear Infirmary, Boston, MA, USA

<sup>5</sup> Harvard-MIT Division of Health Sciences and Technology, Speech and Hearing Bioscience and Technology Program, Cambridge, MA, USA

#### **Edited by:**

Jonathan E. Peelle, Washington University in St. Louis, USA

#### **Reviewed by:**

Agnès C. Léger, Massachusetts Institute of Technology, USA; Stuart Rosen, University College London, UK; Hedwig Elisabeth Gockel, Medical Research Council Cognition and Brain Sciences Unit, UK

#### **\*Correspondence:**

Barbara G. Shinn-Cunningham, Center for Computational Neuroscience and Neural Technology, Boston University, 677 Beacon Street, Boston, MA 02215, USA

e-mail: shinn@cns.bu.edu

#### **†Present address:**

Sarah Verhulst, Cluster of Excellence Hearing4all, Department of Medical Physics and Acoustics, Oldenburg University, Oldenburg, Germany

Many listeners with hearing thresholds within the clinically normal range nonetheless complain of difficulty hearing in everyday settings and understanding speech in noise. Converging evidence from human and animal studies points to one potential source of such difficulties: differences in the fidelity with which supra-threshold sound is encoded in the early portions of the auditory pathway. Measures of auditory subcortical steady-state responses (SSSRs) in humans and animals support the idea that the temporal precision of the early auditory representation can be poor even when hearing thresholds are normal. In humans with normal hearing thresholds (NHTs), paradigms that require listeners to make use of the detailed spectro-temporal structure of supra-threshold sound, such as selective attention and discrimination of frequency modulation (FM), reveal individual differences that correlate with subcortical temporal coding precision. Animal studies show that noise exposure and aging can cause a loss of a large percentage of auditory nerve fibers (ANFs) without any significant change in measured audiograms. Here, we argue that cochlear neuropathy may reduce encoding precision of supra-threshold sound, and that this manifests both behaviorally and in SSSRs in humans. Furthermore, recent studies suggest that noise-induced neuropathy may be selective for higher-threshold, lower-spontaneous-rate nerve fibers. Based on our hypothesis, we suggest some approaches that may yield particularly sensitive, objective measures of supra-threshold coding deficits that arise due to neuropathy. Finally, we comment on the potential clinical significance of these ideas and identify areas for future investigation.

**Keywords: temporary threshold shift, frequency-following response, auditory steady-state response, individual differences, aging, auditory nerve, noise-induced hearing loss, temporal coding**

## **INTRODUCTION**

A significant number of patients seeking audiological treatment have normal hearing thresholds (NHT), but report perceptual difficulties in some situations, especially when trying to communicate in the presence of noise or other competing sounds (e.g., Hind et al., 2011). Such listeners are typically said to have "central auditory processing disorders", more recently known simply as "auditory processing disorders" (CAPD/APD; Catts et al., 1996; Chermak and Musiek, 1997), a catchall diagnosis testifying to how little we know about the underlying causes.

In some ways, the fact that having NHTs does not automatically predict good performance in these conditions is not particularly surprising. Audiometric thresholds measure the lowest intensities that a listener can *detect*. In contrast, the ability to *analyze* the content of sound requires a much more precise sensory representation of acoustic features across a large dynamic range of sound intensities. Specifically, current audiometric screenings test the lowest level of sound listeners can hear at various frequencies, but they do not test whether listeners can make judgments about the spectral or temporal content of the sound. This is analogous to seeing an eye doctor and being asked only whether you can tell that light is present, with no test of whether you can tell anything about the object the light is coming from.

Consistent with the idea that analysis of supra-threshold sound differs amongst NHT listeners, many APD patients seek help precisely because they notice difficulties in situations requiring *selective auditory attention* (Demanez et al., 2003), which places great demands on the auditory system. Moreover, recent laboratory evidence suggests that the prevalence of NHT listeners with APD-like symptoms may be greater than one might predict based on the number of people seeking audiological treatment. Specifically, in the lab, NHT listeners have vastly different abilities on the types of tasks that typically frustrate APD listeners. One recent study shows that when NHT subjects are asked to report spoken digits from one direction amidst otherwise similar speech, performance ranges from chance levels to nearly 90% correct, with the bottom quartile of listeners falling below 60% correct (Ruggles and Shinn-Cunningham, 2011). Crucially, when subjects made errors, they almost always reported a digit coming from a non-target direction rather than an unspoken digit, suggesting that differences were unlikely due to higher-level deficits involving language such as differences in speech intelligibility. Instead, the errors appeared to be due to failing to select the target stream from amidst the maskers. Yet none of the listeners in the study complained of hearing difficulties, even those at the bottom of the distribution; moreover, none had entertained the idea of seeking audiological treatment.

Differences in higher-order processing clearly contribute to individual differences in complex tasks such as the ability to selectively attend, process speech, or perform other high-level tasks (for instance see Surprenant and Watson, 2001). However, in this opinion paper, we focus on how low-level differences in the precision of spectro-temporal coding may contribute to differences in performance. We argue that poor sensory coding of supra-threshold sound is most likely to be revealed in complex tasks like those requiring selective attention, which helps to explain the constellation of symptoms that lead to APD diagnoses. Selective auditory attention hinges on segregating the source of interest from competing sources (object formation; see Bregman, 1990; Darwin and Carlyon, 1995; Alain and Arnott, 2000; Carlyon, 2004), and then focusing on that source based on its perceptual attributes (object selection; see Shinn-Cunningham, 2008; Shinn-Cunningham and Best, 2008). Both object formation and object selection rely on extracting precise spectro-temporal cues present in natural sound sources, which convey pitch, location, timbre, and other source features. Given this, it makes sense that listeners with poor supra-threshold coding fidelity notice problems communicating in crowded social settings, an ability that depends upon robust coding of supra-threshold sound features.

Here, we argue that the fidelity with which the auditory system encodes supra-threshold sound is especially sensitive to the number of intact auditory nerve fibers (ANFs) encoding the input. In contrast, having NHTs likely depends only on having a relatively small but reliable population of ANFs that respond at low intensities. Indeed, one recent study shows that, in animals, audiometric thresholds can be normal even with only 10–20% of the inner hair cells (IHCs) of the cochlea intact (Lobarinas et al., 2013). Our hypothesis is that the convergence of multiple ANFs, while possibly redundant for detecting sound, is critical for analyzing supra-threshold sound.

In this paper, we first consider how supra-threshold sound content is normally encoded, focusing particularly on temporal coding. We then review animal evidence for *cochlear neuropathy*, a reduction in the number of ANFs responding to supra-threshold sound. We argue that this neuropathy can help explain why some listeners have difficulty performing selective attention and other supra-threshold tasks, despite having NHTs. We discuss evidence that lower-spontaneous rate ANFs (lower-SR ANFs; i.e., those with rates below about 18 spikes/s) may be especially vulnerable to damage. We hypothesize that lower-SR ANFs may play a critical role in coding supra-threshold sound features, particularly under challenging conditions. We then discuss the use of the subcortical steady-state response (SSSR) to quantify temporal coding in the early portions of the auditory pathway, including the challenges inherent in interpreting the SSSR and relating it to single-unit neurophysiology. With the help of simple models of brainstem responses, we suggest measures that may emphasize the effect of neuropathy on the SSSR. Using these ideas, we suggest future experiments to (1) test our hypothesis that cochlear neuropathy contributes to the supra-threshold coding deficits seen in some listeners; and (2) develop sensitive, objective correlates of such deficits that may be clinically useful.

## **CODING OF SUPRA-THRESHOLD SOUND**

#### **THE DIVERSITY OF AUDITORY NERVE FIBERS**

ANFs comprise the sole conduit for information about the acoustic environment, carrying spike trains from the cochlea to the central auditory system. As schematized in **Figure 1A**, each ANF contacts a single IHC via a single synapse. At each synapse, an electron-dense ribbon sits near the pre-synaptic membrane surrounded by a halo of glutamatergic vesicles. Sound in the ear canal leads to cochlear traveling waves that deflect IHC stereocilia, causing the opening of mechanoelectric transduction channels and a graded change in the IHC membrane potential. At the IHC's synaptic pole, this sound-driven receptor potential drives an influx of calcium causing an increased probability of fusion of synaptic vesicles with the IHC membrane in the region of the ribbon. Glutamate released into the synaptic cleft binds to the AMPA-type glutamate receptors at the post-synaptic active zone, causing depolarization and action potentials in the ANF.

Between 10 and 30 ANFs synapse on each IHC, depending on species and cochlear location (**Figure 1B**), and there are roughly 3500 IHCs along the 35 mm cochlear spiral in humans. Thus, all the information we receive about our acoustic world is carried via the roughly 30,000 ANFs emanating from each cochlea. ANFs in the mammalian inner ear can be subdivided into three functional groups. The classification is based on spontaneous discharge rate (SR; i.e., the spike rate in the absence of sound), because it is easy to quantify, but the key functional differences are in the sensitivity to sound. High-SR fibers have the lowest thresholds, low-SR fibers have the highest thresholds, and medium-SR fibers have thresholds intermediate between the two (**Figure 2A**). The distribution of SRs is fundamentally bimodal (**Figure 2B**) with roughly 40% in the lower peak (SR < about 18 spikes/second), which includes both low-SR and medium-SR fibers (15% and 25% of all ANFs, respectively) and 60% in the higher peak (Liberman, 1978). In this paper, we shall use the term lower-SR ANFs to refer jointly to the low- and medium-SR groups, which are sometimes distinguished in the literature.

Anatomical studies suggest that all three ANF types can innervate the same IHC; however, lower-SR fibers have thinner axons, fewer mitochondria, and tend to synapse on the modiolar side of the IHC. In contrast, high-SR fibers have thicker axons, more mitochondria, and synapse on the pillar side (Liberman, 1982). There are also systematic differences in the sizes of presynaptic ribbons and post-synaptic glutamate-receptor patches (Liberman et al., 2011). All three ANF types send their central axons to the cochlear nucleus (CN), where they branch, sending collaterals to the anteroventral, posteroventral, and dorsal subdivisions. Although branches from all SR types are present in each CN subdivision, low- and medium-SR fibers give rise to more endings than high-SR fibers, especially in the small-cell cap of the anteroventral CN (Ryugo and Rouiller, 1988; Liberman, 1991). Hence, lower-SR fibers may have more downstream influence than suggested by the fact that they make up less than half of the population at the level of the auditory nerve (AN).

The diversity of ANF threshold sensitivity is believed to be important in intensity coding in the auditory system, where level discrimination abilities are near-constant over a range of 100 dB or more (Florentine et al., 1987; Viemeister, 1988). This large dynamic range may be mediated, at least in part, by the differing dynamic ranges of low-, medium-, and high-SR fibers. As represented in **Figure 2C**, high-SR fibers, whose response thresholds are at or near behavioral detection threshold, likely determine the ability to detect sounds in a quiet environment. However, 20–30 dB above threshold, their discharge rate saturates. By virtue of their higher thresholds and extended dynamic ranges, the lower-SR fibers may be particularly important for extending the dynamic range of hearing. Possibly more important is their contribution to hearing in a noisy environment. Activity of high-SR fibers is relatively easy to mask with continuous noise, as schematized in **Figure 2D**. Because they are so sensitive to sound, even near-threshold noise increases the background discharge rate of high-SR fibers. This continuous activation causes synaptic fatigue (i.e., vesicle depletion) and thus also decreases their maximum discharge rate to tone bursts or other transient signals that might be present (Costalupes et al., 1984; Costalupes, 1985). By virtue of their higher thresholds, the lower-SR fibers are more resistant to background noise. Thus with increasing levels of continuous broadband masking noise, lower-SR fibers likely become increasingly important to the encoding of acoustic signals, because they will increasingly show the largest changes in average discharge rate in response to transient supra-threshold stimuli (**Figure 2D**; also see Young and Barta, 1986).

#### **TEMPORAL CODING AND ITS IMPORTANCE FOR AUDITORY PERCEPTION**

As a result of cochlear filtering, each ANF is driven by a narrow frequency band of sound energy. Thus, the temporal information encoded by the ANFs can be logically separated into two parts: the temporal fine-structure (TFS), corresponding to the timing of the nearly sinusoidal narrowband carrier fluctuations, and the slower temporal envelope of that carrier, whose temporal fluctuations are limited by the bandwidth of the corresponding cochlear filter. For low-frequency cochlear channels, ANFs convey both TFS and envelope information; neural spikes are phase-locked to the carrier and the instantaneous firing rate follows the envelope. At higher frequencies, ANFs do not phase lock to the TFS; however, responses convey temporal information by phase locking to envelope fluctuations.

Although different perceptual attributes of natural sound are encoded by different spectro-temporal cues, many depend on reliable timing information. For instance, the computation of interaural time differences (ITD), important for spatial perception of sound, requires temporal precision on the order of tens of microseconds (Blauert, 1997). While TFS information at low frequencies is the dominant perceptual cue determining perceived location (at least in anechoic conditions; Wightman and Kistler, 1992), for broadband and high-frequency sounds, ITDs can be conveyed by the envelope alone. Moreover, high-frequency envelope ITDs can be perceived nearly as precisely as low-frequency TFS ITDs (Bernstein and Trahiotis, 2002). In addition, envelopes may play a significant role in space perception in everyday settings such as rooms, where reverberant energy distorts TFS cues (Bharadwaj et al., 2013; Dietz et al., 2013). The coherence of the temporal envelope across channels helps to perceptually bind together different acoustic constituents of an "object" in the auditory scene (Elhilali et al., 2009; Shamma et al., 2011). Coding of pitch and speech formants also may rely, at least in part, on both TFS and envelope temporal information, although the precision needed to convey this information is less than that needed to extract ITDs (see Plack et al., 2005 for a review). On an even slower time scale, speech meaning is conveyed by fluctuations in energy through time. Thus, a range of temporal features in both TFS and envelopes are necessary to enable a listener to parse the cacophonous mixture of sounds in which they commonly find themselves, select a sound source of interest, and analyze its meaning. Importantly, almost all of these tasks, when performed in everyday settings, require analysis of temporal information at supra-threshold sound intensities.

**Figure 2** (caption fragments recovered from extraction): Schematic rate-vs-level functions for high-, medium-, and low-SR fibers to [...] modulated (AM) at 100 Hz. Data from cat (Joris and Yin, 1992).

To exacerbate matters, everyday settings typically contain competing sound sources and reverberant energy. Both degrade the temporal structure of the sound reaching a listener's ears, reducing the depth of signal modulations and interfering with the interaural temporal cues in an acoustic signal. If amplitude modulation is weakly coded in a listener with cochlear neuropathy, degradations in the input signal modulations due to competing sound and reverberant energy may render spatial information diffuse and ambiguous, pitch muddy, and speech less intelligible (e.g., see Stellmack et al., 2010; Jørgensen and Dau, 2011). TFS cues convey information important for speech intelligibility in noise (Lorenzi and Moore, 2008). Envelope cues are important for speech-on-speech masking release (Christiansen et al., 2013). Given all of this, a listener with degraded coding of envelope and TFS is most likely to notice perceptual difficulties when trying to understand speech in challenging settings, even if they do not notice any other deficits and have no difficulty in quiet environments. Thus, we hypothesize that differences in the fidelity with which the auditory system encodes supra-threshold TFS and amplitude modulation account for some of the inter-subject differences that NHT listeners exhibit in tasks such as understanding speech in noise or directing selective auditory attention (also see Section Human Data Consistent with the Neuropathy Hypothesis). Based on this idea, we argue that a method for measuring supra-threshold temporal coding fidelity may have important clinical applications, enabling quantification of supra-threshold hearing deficits that affect how well listeners operate in everyday environments, but that are not commonly recognized today.

#### **CONSEQUENCES OF COCHLEAR NEUROPATHY FOR TEMPORAL CODING**

One consequence of cochlear neuropathy (i.e., a reduction in the number of ANFs conveying sound) will be a reduction in the fidelity of temporal coding of supra-threshold sound. For instance, convergence of multiple, stochastic ANF inputs leads to enhanced temporal precision in the firing pattern of many CN cells (e.g., see Joris et al., 1994; Oertel et al., 2000). Thus, a reduction in the overall number of ANFs will reduce the precision with which both TFS and envelope temporal information are conveyed to higher centers (see also Lopez-Poveda and Barrios, 2013). While the importance of TFS coding for various aspects of sound perception cannot be overstated, we only briefly discuss TFS coding here. We focus primarily on the implications of cochlear neuropathy on the fidelity with which envelope information is conveyed. This focus is motivated particularly by recent data from guinea pigs and mice that suggest that noise-induced neuropathy preferentially damages the higher-threshold, lower-SR cochlear nerve fibers (Furman et al., 2013), rendering envelope coding especially vulnerable, as explained below.
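The benefit of convergence can be illustrated with a deliberately simple toy simulation. The sketch below assumes that each ANF input marks the same acoustic event with independent Gaussian timing jitter, and it uses a plain average of the input spike times as a stand-in for the integration performed by a CN cell; both the averaging rule and the 1 ms jitter value are illustrative assumptions, not a model taken from the studies cited above. Under these assumptions, timing jitter at the output shrinks roughly as one over the square root of the number of converging fibers, so losing fibers directly costs temporal precision.

```python
import random
import statistics


def pooled_timing_jitter(n_fibers, jitter_sd_ms, n_events=2000, seed=0):
    """Return the SD (ms) of the mean spike time across n_fibers inputs.

    Toy assumption: every fiber fires once per event, with independent
    Gaussian jitter of jitter_sd_ms around the true event time (0 ms).
    """
    rng = random.Random(seed)
    pooled_times = []
    for _ in range(n_events):
        spike_times = [rng.gauss(0.0, jitter_sd_ms) for _ in range(n_fibers)]
        pooled_times.append(sum(spike_times) / n_fibers)
    return statistics.pstdev(pooled_times)


if __name__ == "__main__":
    # Hypothetical "intact" (16 inputs) versus "neuropathic" (4 inputs) cases.
    for n in (16, 4, 1):
        print(n, "inputs ->", round(pooled_timing_jitter(n, 1.0), 3), "ms jitter")
```

In this toy setting, reducing the number of inputs from 16 to 4 roughly doubles the output jitter, even though a single surviving input would still suffice to signal that a sound occurred.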

Damage to lower-SR ANFs is likely to be especially detrimental to supra-threshold coding of sound envelopes, as high-SR fibers cannot robustly encode envelope timing cues in sounds at comfortable listening levels. Specifically, the average firing rate of high-SR ANFs (ignoring the temporal pattern of the response) saturates at levels roughly 20–30 dB above threshold, around the sound level of comfortable conversation (see red solid line in **Figure 2E**). In addition, both measures of phase locking to the envelope in high-SR neurons drop off as sound levels approach and exceed comfortable listening levels: the modulated rate (the magnitude of the frequency domain representation of the post-stimulus time histogram of the ANF response, evaluated at the fundamental frequency of the input signal; see dashed red line in **Figure 2E**) and the synchronization index (also known as the vector strength, calculated as the modulated rate normalized by one half of the average rate; see red line in **Figure 2F**). This drop off is particularly detrimental for relatively intense sounds with shallow modulation depths, where both the crests and troughs of the envelope of the signal driving the high-SR ANFs fall in the saturation range of intensities, resulting in relatively poor modulation in the temporal response of these fibers (Joris and Yin, 1992). In contrast, lower-SR fibers are more likely to encode these envelope fluctuations because they are likely to be at an operating point where the firing rate (in the steady state) is still sensitive to fluctuations in the sound level. If noise exposure causes a selective neuropathy that preferentially affects lower-SR fibers, then the ability to analyze envelopes at conversational sound levels is likely to be impaired.
Both theoretical simulations and preliminary experimental evidence from envelope following responses (EFRs, described in Section Objective Measures of Subcortical Temporal Coding) recorded in mice and humans are consistent with this reasoning, as discussed in Section Evidence for Cochlear Neuropathy.
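The saturation argument can be made concrete with a static sketch. The code below assumes a sigmoidal rate-level function and invented parameter values for one high-SR and one lower-SR fiber (thresholds, dynamic ranges, and rates are hypothetical and chosen only to mimic the qualitative shapes described above). It ignores adaptation and all other temporal dynamics, and simply asks how much the steady-state rate changes between the crest and the trough of an amplitude-modulated envelope.

```python
import math


def rate_level(level_db, threshold_db, dynamic_range_db, spont_rate, max_rate):
    """Sigmoidal rate-level function in spikes/s (hypothetical parameterization)."""
    midpoint = threshold_db + dynamic_range_db / 2.0
    slope = dynamic_range_db / 8.0  # most of the rise falls within the dynamic range
    drive = 1.0 / (1.0 + math.exp(-(level_db - midpoint) / slope))
    return spont_rate + (max_rate - spont_rate) * drive


def rate_modulation(mean_level_db, mod_depth, **fiber):
    """Crest-minus-trough rate difference for an AM envelope (0 < mod_depth < 1)."""
    crest_db = mean_level_db + 20.0 * math.log10(1.0 + mod_depth)
    trough_db = mean_level_db + 20.0 * math.log10(1.0 - mod_depth)
    return rate_level(crest_db, **fiber) - rate_level(trough_db, **fiber)


# Illustrative, made-up fiber parameters (not fitted to any data set).
HIGH_SR = dict(threshold_db=0, dynamic_range_db=30, spont_rate=60, max_rate=250)
LOWER_SR = dict(threshold_db=20, dynamic_range_db=60, spont_rate=1, max_rate=200)

if __name__ == "__main__":
    for level in (10, 65):  # near threshold versus conversational level (dB)
        print(level, "dB:",
              "high-SR", round(rate_modulation(level, 0.25, **HIGH_SR), 2),
              "lower-SR", round(rate_modulation(level, 0.25, **LOWER_SR), 2))
```

With these assumed parameters, the high-SR fiber follows a shallow modulation well near threshold but barely at all at a conversational level, where the lower-SR fiber still operates on the sloping part of its rate-level function.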

#### **OBJECTIVE MEASURES OF SUBCORTICAL TEMPORAL CODING**

Many psychophysical studies have been devoted to the development and discussion of behavioral measures to assess temporal coding in both NHT and hearing-impaired listeners (see Moore, 2003; Strelcyk and Dau, 2009). On the other hand, SSSRs provide an objective window into how the subcortical nuclei of the ascending auditory pathway encode temporal information in sound. While behavioral characterizations are important indicators of everyday hearing ability, in order to limit the length and scope of this opinion paper and still provide substantial discussion, here we focus on objective, physiological measures that can quantify the temporal coding precision of supra-threshold sound in the individual listener. Such measures may also be helpful in identifying some of the mechanisms that lead to individual differences in behavioral ability.

SSSRs refer to the scalp-recorded responses originating from subcortical portions of the auditory nervous system. These responses phase lock both to periodicities in the acoustic waveform and to periodicities induced by cochlear processing (Glaser et al., 1976). SSSRs are related to auditory brainstem responses (ABRs; the stereotypical responses to sound onsets and offsets; Jewett et al., 1970); however, whereas ABRs are transient responses to sound onsets and offsets, SSSRs are sustained responses to ongoing sounds that can include responses phase locked to both the fine structure and the cochlear-induced envelopes of broadband sounds. SSSRs have been used extensively in basic neurophysiologic investigation of auditory function and sound encoding (e.g., Kuwada et al., 1986; Aiken and Picton, 2008; Gockel et al., 2011; also see Krishnan, 2006; Chandrasekaran and Kraus, 2010, for reviews). Given the frequency specificity possible with SSSRs, they have also been proposed as a potential tool for objective clinical audiometry (Lins et al., 1996). In addition, SSSRs have been shown to be sensitive to deafferentation in that IHC loss leads to degraded SSSRs, especially at moderate sound levels (Arnold and Burkard, 2002).

While there are many studies of SSSRs, confusingly, different branches of the scientific literature use different names to refer to the same kinds of measurements. Periodic responses to amplitude-modulated sounds originating from both the subcortical and cortical portions of the auditory pathway are often collectively referred to as auditory steady-state responses (ASSRs) (Galambos et al., 1981; Stapells et al., 1984; Rees et al., 1986). However, brainstem SSSRs can be distinguished from responses generated at the cortical level by virtue of their relatively high frequency content; practically speaking, cortical and SSSR responses can be extracted from the same raw scalp recordings by appropriate filtering (e.g., see Krishnan et al., 2012; Bharadwaj and Shinn-Cunningham, 2014). The responses that specifically phase lock to the envelope of amplitude modulated (AM) sounds have been referred to as EFRs or amplitude modulation following responses (AMFRs; Dolphin and Mountain, 1992; Kuwada et al., 2002). In the recent literature, SSSRs are most commonly referred to as frequency following responses (FFRs), a term originally used to denote responses phase locked to pure tones (Marsh et al., 1975). Since the term FFR hints that responses are phase locked to the acoustic frequency content of input sound (i.e., the fine structure of narrowband or locally narrowband sounds), here we will use the term "SSSR" to describe the sustained responses originating from subcortical portions (at frequencies >80 Hz or so in humans) of the auditory pathway. More specifically, we will focus on EFRs: SSSRs that are locked to the envelope.

While EFRs provide a convenient non-invasive measure of subcortical envelope coding, there are several difficulties in interpreting them. First, they represent neural activity that is the sum of a large population of neurons, filtered by layers of brain tissue, skull, and scalp. Depending on the stimulus parameters, thousands of neurons in each of multiple subcortical nuclei may contribute to the EFR (Kuwada et al., 2002). Neurons from several regions along the tonotopic axis could contribute to the EFR for high-level sounds due to spread of excitation, even for narrow-band sounds. Thus, relating EFR results to physiological responses of single neurons is not straightforward. ANF modulation frequency responses are uniformly low pass; fibers with high characteristic frequencies (CFs; >10 kHz) have cutoff frequencies around 1 kHz in cat (Joris and Yin, 1992). Below 10 kHz, cutoff frequency is dependent on CF, suggesting a limit imposed by an interaction between the content of the input signal and the bandwidths of cochlear filters (Joris and Yin, 1992). As signals ascend the auditory pathway, they are transformed from a temporal to a rate code, with the upper limit of phase locking progressively shifting to lower modulation frequencies (summarized in Figure 9 of Joris et al., 2004; see also Frisina et al., 1990; Joris and Yin, 1992; Krishna and Semple, 2000; Nelson and Carney, 2004). Modulation frequencies in the 70 to 200 Hz range elicit phase-locked responses in a cascade of subcortical auditory structures, from cochlear hair cells to inferior colliculus (IC) neurons, suggesting that many sources can contribute to the EFRs in this frequency range. Luckily, compared to the IC, the more peripheral EFR generators generate relatively weak responses, both because they drive smaller synchronous neural populations and because they are more distant from the measurement site.

Based on single-unit data, reversible inactivation studies, irreversible lesion studies, and studies analyzing EFR group delay, it has been argued that the dominant generators of the EFR move from caudal (AN and CN) to rostral (inferior colliculus or IC) as modulation frequency decreases (Sohmer et al., 1977; Dolphin and Mountain, 1992; Kiren et al., 1994; Herdman et al., 2002; Kuwada et al., 2002). These studies provide evidence that the IC dominates EFRs at modulation frequencies between about 70 and 200 Hz, in all species tested. Changes in the slope of the response phase vs. input modulation frequency can be used to calculate apparent latency of the sources and thereby infer changes in the relative strengths of different neural generators in the mixture (Kuwada et al., 2002); regions where the slope is constant indicate regions where the mixture of generators is constant. Above 200 Hz, the pattern of these changes varies across species, probably due to differing head sizes and shapes. Humans, rabbits, and mice exhibit regions of constant phase slopes out to 500, 700, and 1000 Hz, respectively (Kuwada et al., 2002; Purcell et al., 2004; Pauli-Magnus et al., 2007); in contrast, in gerbils, the phase slopes above 200 Hz are not constant (Dolphin and Mountain, 1992). These differences in phase slopes indicate that the specificity of EFRs is species-dependent. However, in all species it is clear that manipulation of modulation frequency can be used to bias responses towards more rostral or more caudal sources.
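As a minimal sketch of the phase-slope idea, the code below estimates an apparent latency from response phase measured at several modulation frequencies. It assumes the simplest possible case, a single generator that behaves as a pure delay, so that phase falls linearly with frequency and the latency equals the negative slope divided by 2π. The function names, the least-squares fit, and the 8 ms example delay are illustrative choices, not the exact procedure of the studies cited above.

```python
import math


def unwrap(phases_rad):
    """Remove 2*pi jumps so that successive phases differ by less than pi."""
    unwrapped = [phases_rad[0]]
    for phase in phases_rad[1:]:
        delta = phase - unwrapped[-1]
        delta -= 2.0 * math.pi * round(delta / (2.0 * math.pi))
        unwrapped.append(unwrapped[-1] + delta)
    return unwrapped


def apparent_latency(freqs_hz, phases_rad):
    """Apparent latency (s) = -(d phase / d frequency) / (2*pi), by least squares."""
    phi = unwrap(phases_rad)
    n = len(freqs_hz)
    f_mean = sum(freqs_hz) / n
    p_mean = sum(phi) / n
    num = sum((f - f_mean) * (p - p_mean) for f, p in zip(freqs_hz, phi))
    den = sum((f - f_mean) ** 2 for f in freqs_hz)
    return -(num / den) / (2.0 * math.pi)


if __name__ == "__main__":
    delay_s = 0.008  # hypothetical generator latency
    freqs = list(range(80, 201, 10))
    # Phases as they would be measured: wrapped into (-pi, pi].
    raw = [-2.0 * math.pi * f * delay_s for f in freqs]
    wrapped = [math.atan2(math.sin(p), math.cos(p)) for p in raw]
    print(apparent_latency(freqs, wrapped))
```

The unwrapping step only works if the modulation frequencies are sampled densely enough that the true phase changes by less than π between neighboring points; a change in fitted slope across frequency ranges would then point to a change in the mixture of generators.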

Despite these complications, all acoustic information is conveyed to the brain through the ANFs; moreover, deficiencies at the level of the ANF can be expected to have an effect downstream, in higher-order processing centers. Therefore, EFRs originating in the brainstem/mid-brain are likely to reflect the consequences of ANF neuropathy. Indeed, by using different stimuli, it may be possible to emphasize the contribution of different subcortical sources (by changing the modulation frequency of the input) or different portions of the cochlear partition (by changing the acoustic carrier of the signal). In particular, metrics such as the phase-locking value (PLV) can be calculated to quantify the robustness of temporal coding in the EFR, akin to using the vector-strength to assess temporal coding in single-unit physiology studies (Joris et al., 2004).

When analyzing the temporal precision of signals, the PLV has a straightforward interpretation. The details of the PLV computation and its statistical properties are described in a number of previous studies (e.g., see Lachaux et al., 1999; Bokil et al., 2007; Ruggles et al., 2011; Zhu et al., 2013). Briefly, the PLV quantifies the consistency of the response phase across repetitions of the stimulus presentation ("trials"). For a given frequency bin, the response to each trial can be represented as a unit vector (phasor) in the complex plane whose phase equals the response phase. The PLV then equals the magnitude (length) of the vector average of the phasors, averaged across trials (**Figure 3A**). If the response is consistently at or near a fixed phase, then the resulting average has a magnitude near one and the PLV is high (top panel, **Figure 3A**). On the other hand, if the response phase relative to the stimulus is random over the unit circle, the phasors cancel, the resultant vector has a small magnitude, and the PLV is near zero (bottom panel of **Figure 3A**). An example of the PLV spectrum (computed for EFRs from 400 repetitions of a 100 Hz transposed tone at a carrier frequency of 4 kHz and 65 dB SPL) is shown in **Figure 3C**. Strong peaks are evident at the fundamental and harmonic frequencies of the envelope. The PLV thus is one way of assessing the temporal coding fidelity of the EFR, and of subcortical encoding of supra-threshold sound.
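The PLV computation described above can be sketched in a few lines. The code below is an illustration under simplifying assumptions rather than the analysis pipeline of the cited studies: `bin_phase` (a hypothetical helper name) extracts the phase of one trial at a single frequency with a direct discrete Fourier sum, and `phase_locking_value` takes the magnitude of the mean unit phasor across trials. The simulated trial phases are synthetic.

```python
import cmath
import math
import random


def bin_phase(trial, fs_hz, freq_hz):
    """Phase (rad) of one trial at freq_hz, from a single-frequency Fourier sum."""
    coef = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / fs_hz)
               for n, x in enumerate(trial))
    return cmath.phase(coef)


def phase_locking_value(phases_rad):
    """Magnitude of the average unit phasor across trials (between 0 and 1)."""
    phasors = [cmath.exp(1j * p) for p in phases_rad]
    return abs(sum(phasors) / len(phasors))


if __name__ == "__main__":
    rng = random.Random(1)
    n_trials = 400
    # Consistent phase across trials (small jitter around 0.5 rad): high PLV.
    locked = [rng.gauss(0.5, 0.2) for _ in range(n_trials)]
    # Phase uniformly random over the unit circle: PLV near zero.
    unlocked = [rng.uniform(-math.pi, math.pi) for _ in range(n_trials)]
    print("locked PLV:", phase_locking_value(locked))
    print("random PLV:", phase_locking_value(unlocked))
```

Because each trial contributes a phasor of unit length, the PLV depends only on phase consistency and not on response amplitude in any single trial. Note that even purely random phases yield a small positive PLV with a finite number of trials, so a noise floor must be considered when interpreting low values.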

## **EVIDENCE FOR COCHLEAR NEUROPATHY**

#### **NEUROPATHY AND SELECTIVE LOSS OF LOWER-SPONTANEOUS DISCHARGE RATE FIBERS IN ANIMALS**

Recent studies in both mice and guinea pigs show that noise exposure that causes only a *temporary* elevation of thresholds (e.g., initial threshold elevations of as much as 40 dB that completely recover over 3–5 days) nevertheless can cause a rapid loss of 40–50% of the ANF synapses on IHCs as well as a slow death of the ANF cell bodies (spiral ganglion cells) and central axons (Kujawa and Liberman, 2009; Maison et al., 2013). Despite the extent of effects of such exposure on synapses and ganglion cells, it does not typically cause any loss of hair cells. Single-unit recordings in the guinea pig indicate that this noise-induced loss is selective for lower-SR fibers (Furman et al., 2013). Pharmacological studies suggest that this neuropathy is the result of a type of glutamate excitotoxicity, brought on by glutamate overload at particularly active synapses (Pujol et al., 1993). In the central nervous system, glutamate excitotoxicity is mediated by an increase in intracellular calcium concentration (Szydlowska and Tymianski, 2010). Since mitochondria comprise an important intracellular calcium buffering system, the relative paucity of mitochondria in the lower-SR fibers (Liberman, 1980) may contribute to their special vulnerability to glutamate excitotoxicity caused by noise exposure.

**Figure 3** (caption fragments recovered from extraction): [...] single-trial SNR of the measurement in the frequency bin of interest and [...] in the PLV at multiples of the envelope frequency.

In aging mice, there is a steady degeneration of ANFs. Indeed, 30–40% of IHC synapses are lost by roughly 3/4 of the lifespan, an age at which threshold elevation is modest (typically less than 10 dB) and there is no significant loss of hair cells (Sergeyenko et al., 2013). Previous neurophysiological studies of age-related hearing loss in the gerbil suggest that this neurodegeneration is also selective for lower-SR fibers (Schmiedt et al., 1996). Unfortunately, relatively little is known about how aging impacts ANF synapses in humans. The only study that counted IHC synapses in the human inner ear (**Figure 1B**) found relatively low numbers of IHC synapses; however, this low count may reflect a significant degree of age-related neuropathy rather than a species difference, given that the tissue was obtained from a relatively old individual (63 years of age). Indeed, counts of spiral ganglion cells in an age-graded series of human temporal bones show degeneration of 30%, on average, from birth to death, even in cases with no hair cell loss (Makary et al., 2011). The marked delay between synaptic death and spiral ganglion cell death (1–2 years in mouse, and possibly much longer in humans) suggests that the loss of cochlear nerve synapses on IHCs is almost certainly significantly greater than 30%, on average, in the aged human ear.

Considering that only a small number of sensitive, intact ANFs may be needed for detection in quiet (Lobarinas et al., 2013), it seems likely that even considerable neuropathy would not change thresholds for tones in quiet, and thus would not be detected by standard threshold audiometry. This is even more likely the case if the neuropathy is selective for ANFs with higher thresholds, which are not active near perceptual thresholds. It also seems likely that a loss of a large population of high-threshold ANFs could dramatically affect auditory performance on complex tasks that require analysis of supra-threshold sound content, such as those requiring the extraction of precise timing cues or extracting a signal in a noisy environment, as discussed above. Thus, we hypothesize that cochlear neuropathy in general—and possibly selective neuropathy of high threshold fibers in particular—is one of the reasons that aging often is found to degrade human performance on tasks requiring analysis of the content of suprathreshold sound.

#### **HUMAN DATA CONSISTENT WITH THE NEUROPATHY HYPOTHESIS**

While there are not yet human data that directly support the neuropathy hypothesis, a series of studies from our lab are consistent with the hypothesis that cochlear neuropathy causes difficulties with coding of supra-threshold sound for humans and accounts for some of the individual variability seen in listeners with normal audiometric thresholds. NHT listeners exhibit marked differences in how well they can utilize precise temporal information to direct selective attention, from near-chance levels to almost perfect performance (Ruggles and Shinn-Cunningham, 2011). As discussed in Section Consequences of Cochlear Neuropathy for Temporal Coding, cochlear neuropathy could result in degraded coding of both TFS and envelope information. In line with this hypothesis, differences in EFR phase locking account for some of this inter-subject variability in performance. **Figure 4A** shows the relationship between performance in a spatial attention task in reverberation and the PLV calculated from EFRs obtained separately (data from Ruggles et al., 2011, 2012). Pooled over age groups, listeners with higher EFR phase locking performed better in the selective attention task (Kendall *tau* = 0.42, *p* < 0.002). Though age by itself did not correlate with performance in anechoic conditions, when temporal cues in the acoustic mixture were degraded by adding reverberation, middle-aged listeners showed a bigger drop in performance than younger listeners (Ruggles et al., 2012), as if timing cues are encoded less robustly in middle-aged listeners than in young adults. In addition, as shown in **Figure 4B**, performance also correlated with thresholds for low-rate frequency modulation (FM) detection, a task known to rely on robust temporal coding of TFS (Kendall *tau* = 0.5, *p* = 0.001, data from Ruggles et al., 2011, 2012). Crucially, all listeners in these studies had pure-tone audiometric thresholds of 15 dB HL or better at octave frequencies between 250 Hz and 8 kHz.
The small differences in hearing threshold (within the NHT range) that did exist were not correlated with selective attention performance; similarly, reading span test scores (a measure of cognitive ability) were unrelated to performance. These results suggest that both TFS and envelope cues are important in everyday listening under challenging conditions, since individuals with poor TFS and envelope coding (as measured by FM detection thresholds and EFR phase locking respectively) perform poorly in a spatial attention task. (For a complete description of the spatial attention task, the FM detection task and the EFR measures, see Ruggles et al., 2011, 2012).

Several other studies have reported that some listeners with normal thresholds (particularly older participants) perform poorly on certain behavioral tasks, sometimes even on par with hearing-impaired subjects. Yet other studies show that temporal processing of both TFS and envelope degrades with aging and manifests independently of hearing loss (see Fitzgibbons and Gordon-Salant, 2010 for a review). In NHT listeners, sensitivity to ITD varies greatly across the population, with some listeners performing as poorly as older hearing-impaired subjects (see Grose and Mamo, 2010; Strelcyk and Dau, 2009). Recent studies have also demonstrated abnormal speech processing among hearing-impaired listeners even when the frequency content of the speech was limited to regions where thresholds are normal, pointing towards supra-threshold coding deficits (Horwitz et al., 2002; Lorenzi et al., 2009; Léger et al., 2012).

**Figure 4** (caption fragment). Listeners with good temporal coding of envelopes, as measured by the EFR PLV, were able to spatially segregate the competing speech streams and performed well. **(B)** Relationship between spatial attention task

Older listeners also have been shown to exhibit deficits specific to envelope processing across a range of tasks, including speech recognition in the presence of modulated noise maskers (Dubno et al., 2003; Gifford et al., 2007) and temporal modulation sensitivity (Purcell et al., 2004; He et al., 2008). Consistent with this, the highest modulation frequency to which EFRs exhibit phase locking decreases with age (Purcell et al., 2004; Leigh-Paffenroth and Fowler, 2006; Grose et al., 2009), supporting the hypothesis that the robustness of supra-threshold modulation coding is reduced with aging. Using measures of both gap detection and word recognition on a sizeable cohort of young and old listeners, Snell and Frisina (2000) concluded that age-related changes in auditory processing occur throughout adulthood. Specifically, they concluded that deficits in temporal acuity may begin decades earlier than age-related changes in word recognition. Though not direct evidence that neuropathy causes these perceptual difficulties, these results are consistent with our hypothesis, especially given animal data suggesting that both aging and noise exposure degrade ANF responses (especially lower-SR fibers) and degrade supra-threshold temporal coding without affecting thresholds (Schmiedt et al., 1996; Kujawa and Liberman, 2009; Lin et al., 2011; Furman et al., 2013). If neuropathy underlies deficits in temporal encoding that predict behavioral differences, it may be possible to develop even more sensitive physiological metrics to capture an individual listener's supra-threshold coding fidelity. Section Diagnosing Cochlear Neuropathy is devoted to the discussion of this idea.

#### **DIAGNOSING COCHLEAR NEUROPATHY**

The degree of deafferentation in cochlear neuropathy can be studied directly in animals using invasive methods in combination with histological evaluation, or in humans using post-mortem studies (e.g., Halpin and Rauch, 2009; Makary et al., 2011). However, assessment in behaving humans must be non-invasive, and therefore must employ indirect methods. Given that neuropathy should impact supra-threshold temporal coding, individual behavioral assessment of envelope and TFS coding of sound at comfortable listening levels may prove useful in assessing neuropathy. In order to expose supra-threshold deficits and individual differences, selective attention tasks in adverse conditions (e.g., in a noise background or in a complex, crowded scene) may be most effective. However, given that aging and noise exposure cause outer hair cell loss, elevated thresholds, and other (much-studied) effects, assessment of cochlear function is necessary to ensure that supra-threshold deficits are attributable to neuropathy. Measures of brainstem temporal coding, like the ABR and SSSR, may be helpful in assessing neuropathy objectively and passively; exploring these metrics at high sound levels and low modulation depths (which stresses coding of modulations akin to those important when listening in a crowded scene) may be particularly useful (see Section Emphasizing the Contribution of Lower-Spontaneous Discharge Rate Auditory Nerve Fibers to the Envelope Following Responses). In order to develop and interpret effective, sensitive tests using these types of non-invasive physiological measures, quantitative models that provide testable predictions will be vital. In this section, we consider some of these points, with a focus on objective measures.

#### **MEASURING BRAINSTEM CODING: AUDITORY BRAINSTEM RESPONSES VS. SUBCORTICAL STEADY-STATE RESPONSES**

In animal work, the preferential loss of higher-threshold (lower-SR) fibers leads to a decrease in the supra-threshold growth of the amplitude of wave I of the ABR, without a change in ABR threshold (Kujawa and Liberman, 2009; Furman et al., 2013). In both noise-exposed mice and noise-exposed guinea pigs, the proportional decrement in the magnitude of wave I at high levels (i.e., 80 dB SPL) closely corresponds to the percentage of loss of auditory-nerve synapses. Importantly, by limiting the analysis to animals without permanent threshold shifts in the noise-exposed ear, these experiments remove the confound that changes in hearing threshold are likely to affect wave I amplitude; by design, the supra-threshold changes in ABR amplitude found in these experiments cannot be due to differences in threshold sensitivity, but instead reflect differences in the number of fibers responding to supra-threshold sound. Even in populations with normal thresholds, inter-subject variability in ABR amplitudes complicates analysis. One past study showed that in age- and gender-matched mice, the variance in normal ABR amplitude measures is relatively low (Kujawa and Liberman, 2009); however, the mice in this study were genetically identical. In age- and gender-matched guinea pigs, the variance in ABR amplitude is significantly higher. In the genetically heterogeneous guinea pigs, neuropathy-related changes in ABR amplitude are revealed clearly only when data are analyzed within subject, measuring the effects of noise exposure by normalizing the post-trauma amplitude responses by the responses from the same ear before exposure (Furman et al., 2013). Of course, such a before-and-after approach is unlikely to prove useful for human clinical testing, except in extraordinarily rare circumstances.

The above studies suggest that the ABR may be useful for assessing neuropathy. However, there are a number of reasons why the electrophysiological responses to an AM carrier tone, i.e., the EFR, might be better suited to the assessment of lower-SR neuropathy than the ABR. For one thing, ABR wave I, generated by tone pips, is proportional to the size of the onset responses in the AN. Since, as schematized in **Figure 2C**, the onset responses of lower-SR fibers are small compared to high-SR fiber onset responses (Taberner and Liberman, 2005; Buran et al., 2010), they make a relatively small contribution to the total onset response, rendering the metric fairly insensitive to the integrity of the lower-SR population. In contrast, the steady-state rates of the three SR groups are of more similar magnitude; a loss of lower-SR fibers should thus cause a greater change in steady-state measures like the SSSR or EFR than transient responses like the ABR. Furthermore, as noted above (see **Figure 2F**), lower-SR ANFs synchronize more tightly to the envelope of an AM tone than their high-SR counterparts, especially at moderate and high sound intensities (Johnson, 1980; Joris and Yin, 1992). Synchronization in response to AM-tones can be assessed both by the modulated rate (the amplitude of the peri-stimulus time histogram at the stimulus modulation frequency) and synchronization index (or vector strength; see Joris et al., 2004 for a discussion about different measures of envelope coding). The synchronization index of lower-SR fibers can be larger than that of high-SR fibers of similar best frequency. Indeed, preliminary results suggest that in noise-exposed mice, amplitude decrements in EFR responses to an amplitude-modulated carrier tone presented at the frequency region of maximum cochlear neuropathy are a more sensitive measure of deficit than decrements in ABR wave I amplitude (Shaheen et al., 2013).
Perhaps more importantly, a phase-based analysis like the PLV can be used to analyze EFR strength, which can be a more robust and more easily interpreted metric than amplitude measures of these far-field potentials, which have a weak signal-to-noise ratio (SNR) and depend on factors such as tissue and head geometry.
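To make the contrast between phase-based and amplitude-based metrics concrete, the sketch below computes a PLV as the length of the mean unit phasor across trials at the modulation frequency. It is a minimal illustration on synthetic data, not the analysis pipeline used in the studies cited; the function names, sampling rate, trial count, and noise level are all assumptions made for the example.

```python
import cmath
import math
import random


def fourier_coefficient(samples, freq_hz, fs_hz):
    """Complex Fourier coefficient of one trial at a single frequency."""
    n = len(samples)
    return sum(v * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
               for k, v in enumerate(samples)) / n


def phase_locking_value(trials, freq_hz, fs_hz):
    """Magnitude of the mean unit phasor across trials (0 to 1)."""
    total = 0j
    for trial in trials:
        coeff = fourier_coefficient(trial, freq_hz, fs_hz)
        if coeff != 0:
            total += coeff / abs(coeff)  # keep the phase, discard the amplitude
    return abs(total) / len(trials)


# Synthetic "recordings": a 100 Hz response with fixed phase plus Gaussian noise.
# All parameter values here are assumptions chosen for illustration.
FS_HZ, F_MOD_HZ, N_SAMPLES, N_TRIALS = 2000, 100, 200, 50
rng = random.Random(0)
trials = [[math.sin(2 * math.pi * F_MOD_HZ * k / FS_HZ + 0.3) + rng.gauss(0.0, 1.0)
           for k in range(N_SAMPLES)] for _ in range(N_TRIALS)]

plv = phase_locking_value(trials, F_MOD_HZ, FS_HZ)
# Rescaling every trial (e.g., a different overall gain) leaves the PLV unchanged.
plv_scaled = phase_locking_value([[5.0 * v for v in t] for t in trials],
                                 F_MOD_HZ, FS_HZ)
```

Because each trial contributes only its phase, a common gain applied to signal and noise alike does not change the result, whereas a raw amplitude measure would scale with it. The PLV still depends on the ratio of signal to noise within each trial.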

#### **EMPHASIZING THE CONTRIBUTION OF LOWER-SPONTANEOUS DISCHARGE RATE AUDITORY NERVE FIBERS TO THE ENVELOPE FOLLOWING RESPONSE**

As previously discussed (Section Temporal Coding and Its Importance for Auditory Perception), one likely consequence of cochlear neuropathy is a reduction in the fidelity of temporal coding in the brainstem. The idea that cochlear neuropathy may preferentially target lower-SR fibers (Schmiedt et al., 1996; Furman et al., 2013) may be exploited to devise EFR measures that are more likely to capture the effects of neuropathy. Focusing on responses to high-frequency envelopes could prove to be an effective way to assess neuropathy, because envelope fluctuations cannot drive saturated high-SR fibers effectively. Even for "transposed tones" (a modulated high-frequency signal whose envelope mimics the rectified sinusoidal drive of a low-frequency tone operating at low-frequency portions of the cochlea; see van de Par and Kohlrausch, 1997), phase locking of high-SR fibers is reduced at mid to high sound levels (Dreyer and Delgutte, 2006). This effect is likely to be particularly strong for a relatively high-intensity modulated signal with a shallow modulation depth. For such signals, the input intensity of the driving signal will fall within the saturation range of high-SR fibers at all moments; the only fibers that could encode the shallow modulations are the lower-SR fibers. Thus, measures of EFR phase locking to high-frequency, high-intensity, amplitude-modulated signals with shallow modulation may be especially sensitive when assessing lower-SR-fiber status.

Here, we use a simple model of brainstem responses to illustrate why EFRs to shallow amplitude modulations and high sound levels are likely to emphasize the contribution of lower-SR fiber responses to the measurements. Given that EFR responses reflect responses at the level of the brainstem/midbrain, likely the IC, we built a model of IC responses (**Figure 5A**) by combining an established model of the ANF responses (Zilany and Bruce, 2006; Zilany et al., 2009) with previous phenomenological models of amplitude-modulation processing in the IC (Nelson and Carney, 2004). Updated, humanized, ANF model parameters were used for the simulation (Zilany et al., 2014). This model has been shown to predict ANF single-unit envelope response data quite well (Joris and Yin, 1992). Considering that the simulations included stimuli with high sound levels (as in Dau, 2003; Rønne et al., 2012), a tonotopic array of ANFs (and corresponding IC cells) was included to allow for off-frequency contributions. ANFs with 50 CFs uniformly spaced along the basilar membrane according to a place-frequency map were simulated. For each CF, lower- and high-SR fibers were simulated. In order to obtain a population response at the level of the IC, responses of IC cells driven by lower- and high-SR ANFs were averaged with weights proportional to known population ratios (40% lower-SR fibers and 60% high-SR fibers; see Liberman, 1978). At the level of the IC, the resulting population response is treated as a proxy for the signal driving the EFR. Responses were simulated for a sinusoidally amplitude modulated (SAM) tone with a carrier frequency of 4 kHz and a modulation frequency of 100 Hz. In order to attenuate the contribution of off-frequency neurons to the population response, a broadband noise masker with a notch centered at 4 kHz and extending 800 Hz on either side was added to the SAM tone, as can be done with real EFR measurements in the laboratory. The SNR for the simulations was fixed at 20 dB (broadband root mean square (RMS)). The IC model parameters were set to the values used in Nelson and Carney (2004), which ensured that the 100 Hz modulation frequency was within the band-pass range of the IC cells. Neuropathy was simulated by progressively attenuating the weights given to the IC population driven by lower-SR ANFs, leaving the high-SR population unchanged.
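As a concrete illustration of the stimulus described above, the following sketch generates a SAM tone with a 4 kHz carrier and a 100 Hz modulator. The sampling rate, duration, modulation depth, and function name are assumptions made for this example, and the notched-noise masker used in the simulations is deliberately omitted.

```python
import math


def sam_tone(carrier_hz, mod_hz, depth, duration_s, fs_hz):
    """Sinusoidally amplitude-modulated tone with unit carrier amplitude."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("modulation depth must lie in [0, 1]")
    n = int(round(duration_s * fs_hz))
    return [(1.0 + depth * math.sin(2 * math.pi * mod_hz * k / fs_hz))
            * math.sin(2 * math.pi * carrier_hz * k / fs_hz)
            for k in range(n)]


FS_HZ = 48000  # assumed sampling rate, not specified in the text
tone = sam_tone(carrier_hz=4000, mod_hz=100, depth=0.5,
                duration_s=0.05, fs_hz=FS_HZ)
peak = max(abs(v) for v in tone)  # approaches 1 + depth at the envelope maximum
```

With this parameterization the envelope swings between 1 - depth and 1 + depth, so the modulation depth directly sets how far the instantaneous level dips below its maximum.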

**Figure 5** shows the absolute population response magnitude following the 100 Hz modulation in logarithmic units. Results are shown for different amounts of neuropathy, both for different stimulus levels (**Figure 5B**) and for different modulation depths (**Figure 5C**). As seen from the figures, neuropathy has the greatest effect on the population response for stimuli at mid to high sound levels and relatively low modulation depths. This is consistent with the idea that the modulated firing rate of high-SR ANFs is drastically attenuated at moderate to high sound levels and low-modulation depths (Joris and Yin, 1992; Dreyer and Delgutte, 2006). Similar results were obtained (not shown) presenting "transposed" tones to this model as well as when using the Rønne et al. (2012) model, where the EFR is obtained by convolving the ANF population response with a "unitary-response" that is designed to aggregate and approximate all transformations of the ANF population response before being recorded in the EFR. In both model approaches, lower- and high-SR ANF driven IC responses were summed linearly to generate the population response. When the lower- and high-SR ANF responses were mixed non-linearly using a coincidence detection process (i.e., a geometric average instead of an arithmetic average) before being delivered to the IC model, the effects of the lower-SR fiber neuropathy were even larger (not shown).
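The intuition behind this pattern can be sketched with a toy calculation that is far simpler than the ANF and IC models used for **Figure 5**. Here each fiber class is reduced to a logistic rate-level function, the "modulated rate" is half the rate swing between envelope peak and valley, and the two classes are summed linearly with the 40%/60% weights given in the text. The midpoints and slopes of the rate-level functions are hypothetical values chosen only to make high-SR fibers saturate at low levels; none of the numbers produced should be read as predictions.

```python
import math


def rate(level_db, midpoint_db, width_db):
    """Normalized saturating rate-level function (logistic), 0 to 1."""
    return 1.0 / (1.0 + math.exp(-(level_db - midpoint_db) / width_db))


def modulated_rate(level_db, depth, midpoint_db, width_db):
    """Half the rate swing between the envelope peak and valley (depth < 1)."""
    peak_db = level_db + 20 * math.log10(1 + depth)
    valley_db = level_db + 20 * math.log10(1 - depth)
    return 0.5 * (rate(peak_db, midpoint_db, width_db)
                  - rate(valley_db, midpoint_db, width_db))


def population_response(level_db, depth, lower_sr_loss):
    """Linear weighted sum of lower-SR and high-SR modulated rates."""
    w_low, w_high = 0.4, 0.6  # population ratios quoted in the text
    # Rate-level parameters below are hypothetical, for illustration only.
    low = modulated_rate(level_db, depth, midpoint_db=60.0, width_db=12.0)
    high = modulated_rate(level_db, depth, midpoint_db=20.0, width_db=5.0)
    return w_low * (1.0 - lower_sr_loss) * low + w_high * high


def drop_db(level_db, depth, lower_sr_loss):
    """Reduction in population response (dB) caused by lower-SR loss."""
    intact = population_response(level_db, depth, 0.0)
    damaged = population_response(level_db, depth, lower_sr_loss)
    return 20 * math.log10(intact / damaged)


# Same 50% lower-SR loss, evaluated under two stimulus conditions.
drop_high_shallow = drop_db(level_db=80.0, depth=0.25, lower_sr_loss=0.5)
drop_low_deep = drop_db(level_db=30.0, depth=0.9, lower_sr_loss=0.5)
```

In this toy setting the same fractional loss of lower-SR fibers produces a much larger change for the high-level, shallow-modulation stimulus, because the saturated high-SR class contributes almost nothing to the modulated response there. Replacing the linear sum with a nonlinear combination would change the numbers but is not needed to see the qualitative effect.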

This analysis supports the idea that EFR responses to shallow amplitude modulation at high levels may provide a sensitive, objective correlate of neuropathy. Apart from emphasizing the contribution of lower-SR ANFs, high sound levels are more likely to reveal differences in the number of intact ANFs even if neuropathy is not specific to lower-SR fibers because larger populations of ANFs are recruited overall. These results are also consistent with the report that the ABR wave I amplitude in noise-exposed mice closely corresponds to the amount of neuropathy when the sound level is high (80 dB, Furman et al., 2013) as well as preliminary data from our lab that suggest that individual differences in the EFR are largest at high stimulus levels (Bharadwaj et al., 2013). In addition, inspection of **Figures 5B,C** suggests that the sizes of the change (i.e., slopes) in the population response with level and with modulation depth both reflect the level of neuropathy. Thus, either of these changes, along with behavioral measures, could be used to assess the ability of the listener to process supra-threshold sound. However, in practice, manipulating modulation depth with the level fixed at a high value may lead to more easily interpreted results than measuring how the EFR changes with overall level (see Section Using Envelope Following Responses to Assess Supra-threshold Coding Fidelity). As explained above, we suggest that individual listeners with normal audiometric thresholds could differ in the number of intact ANFs due to differences in noise exposure, genetic predisposition to hearing damage, and other factors.
Given the already-discussed importance of supra-threshold temporal coding for operating in everyday social settings (understanding speech in noise, directing selective auditory attention, etc.), assessment of neuropathy by measurement of EFRs may have a place in audiological practice, especially because such measures are objective and can be recorded passively (making them suitable for use with special populations in which behavioral assessment is not easy).

#### **ISOLATING COCHLEAR NEUROPATHY**

As noted above, in order to assess neuropathy, it is critical to rule out or otherwise account for cochlear dysfunction. One of the most basic characteristics of cochlear function is the frequency selectivity of the basilar membrane (BM). BM frequency selectivity is correlated with cochlear gain at low sound levels (Shera et al., 2002, 2010) and typically decreases with hearing impairment. BM frequency selectivity can be estimated psychophysically (Patterson, 1976; Glasberg and Moore, 1990; Oxenham and Shera, 2003); however, it is possible that such measures may include small contributions from extra-cochlear factors (such as neuropathy). Alternatively, distortion product otoacoustic emissions (DPOAEs) in response to fixed-level primaries (DPgrams; e.g., see Lonsbury-Martin and Martin, 2007) can be used to assess cochlear function. Because OAEs are generated within the cochlea as a consequence of outer-hair-cell activity and do not depend on afferent processing, measuring them may be preferable to measuring psychophysical tuning curves. Specifically, normal DPgrams can be used to establish that poor supra-threshold coding arises post transduction (e.g., via cochlear neuropathy) rather than from outer-hair-cell loss or other problems with cochlear amplification (an approach taken in the animal studies of Kujawa and Liberman, 2009; Furman et al., 2013). To test that cochlear compression is intact at the frequencies tested, either stimulus-frequency OAEs (SFOAEs; Schairer et al., 2006) or DPOAE growth functions can be used (Kummer et al., 1998; Neely et al., 2003). DPOAE suppression tuning curves (Gorga et al., 2011; Gruhlke et al., 2012) or SFOAE phase gradients at low stimulus levels (Shera et al., 2002) can provide estimates of cochlear filter tuning. Henry and Heinz (2012) recently demonstrated the importance of considering differences in cochlear function in order to interpret differences in measures of temporal coding fidelity properly.
As this work shows, establishing that participants have normal cochlear sensitivity by measuring both OAEs and audiometric thresholds is crucial when trying to attribute individual differences in SSSRs and psychoacoustic measures to deficits in supra-threshold coding of sound due to neuropathy.

## **FUTURE EXPERIMENTS**

A growing body of evidence suggests that (1) NHT listeners vary significantly in how well their auditory systems encode supra-threshold sound; and (2) noise exposure and aging can lead to considerable amounts of neuropathy without affecting audiometric thresholds. We have argued that cochlear neuropathy in general, and selective neuropathy of lower-SR ANFs in particular, may help explain some of the supra-threshold differences in NHT listeners. Although we believe that the diversity of evidence consistent with this hypothesis is compelling, further experiments are necessary to truly establish these ideas and to understand potential implications for audiological practice. Here, we propose a few key areas that we believe merit future investigation.

#### **ACCOUNTING FOR INDIVIDUAL DIFFERENCES IN COCHLEAR FUNCTION**

As discussed in Section Isolating Cochlear Neuropathy, experiments seeking to implicate cochlear neuropathy in human perception must account for individual differences in cochlear processing. There are a number of objective metrics of cochlear health including DPOAE and SFOAE growth functions (Kummer et al., 1998; Schairer et al., 2006), DPOAE suppression tuning curves (Gorga et al., 2011; Gruhlke et al., 2012), and SFOAE group delay measurements (Shera et al., 2002; Shera and Bergevin, 2012). However, there are practical concerns that may limit the utility of many of these methods. For instance, using OAE methods to study neuropathy in patients with elevated hearing thresholds may be difficult, as SFOAE amplitudes critically depend on cochlear gain (Shera and Guinan, 1999). DPOAE methods depend more on cochlear compression, rather than cochlear gain (Shera and Guinan, 1999), and thus may prove to be a more robust method for assessing contributions of cochlear function to perception in heterogeneous subject populations (Gruhlke et al., 2012). Experiments are needed to determine what tests best quantify cochlear function, enabling such factors to be teased out when appraising cochlear neuropathy, and developing such tests into clinically useful tools.

#### **DEVELOPING QUANTITATIVE MODELS OF ENVELOPE FOLLOWING RESPONSE GENERATORS**

Because any human measurements of EFRs only indirectly reflect the responses of ANFs, quantitative models of the subcortical generators of the measured response are critical for understanding results and using them to quantify supra-threshold envelope coding. Data suggest that EFRs primarily reflect responses from the mid-brain, and are dominated by responses in the IC (Smith et al., 1975; Sohmer et al., 1977; Dolphin and Mountain, 1992; Kiren et al., 1994; Herdman et al., 2002). However, further experiments are needed to assess if current physiological models capture the behavior of real EFRs. When applied to modulated high-frequency sounds, simple models of IC responses predict a graded loss in the population response with cochlear neuropathy (see **Figure 5**), consistent with the idea that the observed heterogeneity of EFR responses in NHT subjects reflects, in part, differences in ANF survival. Instead of modeling individual neurons, others have modeled brainstem responses (ABRs and FFRs) directly using a kernel method (e.g., Dau, 2003; Rønne et al., 2012). In this approach, all subsequent transformations of the AN responses are modeled by a linear system approximation; model AN responses are used to deconvolve click-ABRs to obtain a "unitary response" that aggregates all of the transformations occurring from the nerve through to the electrode (including processing within the midbrain nuclei and any summation and filtering influencing what is recorded on the scalp). Despite the obvious simplifying assumptions of such an approach, model predictions capture many of the observed properties of ABRs and FFRs in response to simple stimuli. A slightly more elaborate model of EFRs that combines both these approaches (taking into account single-unit level phenomena such as in the model in **Figure 5** as well as scalp-recording properties of the measurements as in Dau, 2003), may be considered.
For instance, one recent study explored the consequences of cochlear sensitivity and selective cochlear neuropathy on the latency of simulated ABR responses (Verhulst et al., 2013). Further development, testing, and refinement will ensure that results of EFR experiments are interpreted appropriately in the context of these models. Hence, we identify this as a key area for future efforts devoted to interpreting EFR measures.
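The forward step of the kernel ("unitary response") approach described above amounts to a linear convolution of the summed model AN response with a fixed kernel. The sketch below shows only that step; the sample values for both the AN population rate and the kernel are hypothetical placeholders, and the deconvolution of click-ABRs used to estimate a real unitary response is not attempted here.

```python
def convolve(signal, kernel):
    """Full discrete linear convolution of two sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


# Hypothetical summed AN firing rate (arbitrary units, one value per time step).
an_population_rate = [0.0, 1.0, 0.5, 0.25, 0.0]
# Hypothetical unitary response standing in for everything between the nerve
# and the scalp electrode (placeholder values, not an estimated kernel).
unitary_response = [0.2, -0.5, 0.3]

simulated_far_field = convolve(an_population_rate, unitary_response)
```

The linearity assumption is what makes this approach tractable: any nonlinearity in midbrain processing is either absorbed into the AN model or ignored, which is one of the simplifications noted in the text.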

#### **USING ENVELOPE FOLLOWING RESPONSES TO ASSESS SUPRA-THRESHOLD CODING FIDELITY**

A selective loss of lower-SR fibers would likely cause phase locking of the EFR to degrade at high sound levels, in line with the model results presented here (**Figure 5B**). As suggested in **Figure 5**, if neuropathy underlies some supra-threshold deficits, the rate of change of the EFR PLV with sound level (akin to the rate of change of ABR wave I in Furman et al., 2013) would correlate with perceptual abilities on tasks requiring analysis of the envelope of supra-threshold sounds, such as envelope ITD discrimination, spatial selective auditory attention, and related tasks. Preliminary data support this idea (Bharadwaj et al., 2013). Further experiments are needed to corroborate our hypothesis that neuropathy (especially neuropathy that preferentially affects lower-SR fibers) contributes to individual differences in the ability to analyze complex auditory scenes. The use of narrowband stimuli such as transposed tones (van de Par and Kohlrausch, 1997) with off-frequency maskers may allow for a frequency-specific assessment of EFR phase locking at different CFs (i.e., at different frequency channels of the auditory pathway). If the neuropathy hypothesis proves correct, this approach may allow for a frequency-specific diagnosis of cochlear neuropathy from non-invasive physiological measures.

Despite the potential of EFRs (especially the EFR-intensity slope) for assessing cochlear neuropathy, there are some limitations. The EFR is a measure of multi-source population activity and produces scalp potentials that are different mixtures of the source activity at different scalp locations. These measures depend on the geometry of the generators, properties of the recording electrodes, the volume conductor in between, the level of unrelated electrical activity from cortex and from muscles, and other subject-specific factors (Hubbard et al., 1971; Okada et al., 1997). All of these parameters cause inter-subject variability in the absolute magnitudes of the measured EFRs. This makes interpretation of the raw EFR magnitude difficult. While phase-based metrics such as the PLV are normalized and have a straightforward interpretation (Zhu et al., 2013), their absolute strength is still influenced by the same factors. Specifically, PLV estimates are biased by the within-band SNR in the raw responses that go into the PLV computation.

This is illustrated in **Figure 3B**, which shows the relationship between estimated and true PLVs for simulated data (signal phase drawn from a von Mises distribution with known concentration and additive noise) as a function of SNR, under the assumption that the noise phase in any trial is independent of the signal phase (something that can be guaranteed experimentally by jittering the stimulus presentation across trials). In **Figure 3B**, at sufficiently high SNRs, the estimated PLVs converge to the true PLV of the simulated signal, and are insensitive to absolute magnitudes of both signal and noise. However, at intermediate SNR values, the EFR PLV estimates are negatively biased (see Bharadwaj and Shinn-Cunningham, 2014). This has implications when trying to account for individual differences across subjects, whose raw responses may well have different SNRs. Even in within-subject comparisons, if two experimental manipulations produce responses with very different SNRs, the values of the EFR PLVs will have different biases. This is particularly important when assessing the change in PLV as a function of sound level, since high-level sounds are likely to produce stronger responses (higher SNR measurements) than low-level sounds. While an increase in response power at the stimulus modulation frequency is meaningful in itself, it is not easy to dissociate increases in PLV that result from increases in response synchrony (phase consistency) vs. from increases in response level. Minimally, using recordings in the absence of stimuli might serve to provide estimates of background noise and SNR that can then be used to extract metrics to compare fairly across subjects and conditions. How important and robust such corrections will prove depends in no small part on where on the SNR curve a particular experimental measurement falls (**Figure 3B**).
Additional experiments are needed to characterize these effects in human listeners across different types of stimuli and experimental procedures.
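The SNR-dependent bias described above can be reproduced qualitatively with a small simulation in the spirit of the one summarized in **Figure 3B**: the signal phase on each trial is drawn from a von Mises distribution, independent complex Gaussian noise is added, and the PLV is estimated from the noisy phasors. This is a minimal sketch under stated assumptions, not the code behind the figure; the function name, concentration parameter, trial count, and the definition of SNR as 20·log10(signal amplitude / noise standard deviation) are all choices made for this example.

```python
import cmath
import math
import random


def simulate_plv(kappa, snr_db, n_trials=2000, seed=0):
    """Return (true PLV, estimated PLV) for unit-amplitude signal phasors
    with von Mises phase and additive complex Gaussian noise."""
    rng = random.Random(seed)
    noise_sd = 10 ** (-snr_db / 20)        # signal amplitude is fixed at 1
    component_sd = noise_sd / math.sqrt(2)  # split equally into real and imaginary parts
    true_sum = 0j
    est_sum = 0j
    for _ in range(n_trials):
        signal = cmath.exp(1j * rng.vonmisesvariate(0.0, kappa))
        noise = complex(rng.gauss(0.0, component_sd), rng.gauss(0.0, component_sd))
        measured = signal + noise
        true_sum += signal
        est_sum += measured / abs(measured)  # unit phasor of the noisy measurement
    return abs(true_sum) / n_trials, abs(est_sum) / n_trials


true_hi, est_hi = simulate_plv(kappa=2.0, snr_db=40.0)   # high SNR
true_lo, est_lo = simulate_plv(kappa=2.0, snr_db=-10.0)  # low SNR
```

At high SNR the estimate tracks the true phase consistency, whereas at low SNR the noise phase dominates each trial and the estimate falls well below the true value, even though the underlying synchrony of the signal is identical in the two cases.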

Another limitation is that physiologically, the change in the basilar membrane excitation pattern with sound level also complicates the interpretation of both EFR and psychophysical results. In particular, when seeking to assess cochlear neuropathy within a specific frequency channel using PLV-level growth curves, effects of the spread of excitation are a confounding factor. Use of off-frequency maskers such as notched noise may ameliorate these effects. However, it has also been reported that at least for mid-frequency stimuli (around 1 kHz), the SSSR at the stimulus component frequency can be attenuated by noise even if the peripheral interaction between the signal and the masking noise is expected to be minimal (Gockel et al., 2012).

Alternatively, EFRs can be measured in response to narrowband stimuli with a fixed peak pressure presented at different modulation depths. For deep modulations, high-SR fibers can entrain to the modulation. At shallow modulation depths with a high sound level (carrier level), even the valleys in the signal will have sufficient energy to keep high-SR fibers saturated; thus, the strength of phase locking to shallow modulations may better reflect the contribution of lower-SR ANFs. By computing how the EFR PLV strength changes as the modulation depth is reduced, the spread-of-excitation confounds associated with manipulating the stimulus level may be avoided. Moreover, the approach of fixing the peak sound pressure and progressively decreasing the modulation depth serves to fix the point of operation on the ANF rate-level curve, so that any reduction in PLV with decreasing modulation depth can be interpreted as being related to a drop in synchrony rather than a change in average rate causing a lower SNR. The model results in **Figure 5C** are consistent with this notion. However, as discussed in Section Developing Quantitative Models of Envelope Following Response Generators, further work is needed to relate EFR results to physiological responses of single neurons. These issues further underscore the importance of combining electrophysiological, behavioral, and modeling approaches.
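The fixed-peak manipulation described above can be sketched as follows: the carrier amplitude is rescaled by 1/(1 + m) so that the envelope maximum stays constant as the modulation depth m changes, and the envelope minimum then sits 20·log10((1 - m)/(1 + m)) dB below the peak. The function names, sampling rate, and the particular depths listed are assumptions made for illustration.

```python
import math


def fixed_peak_sam(peak, depth, carrier_hz, mod_hz, duration_s, fs_hz):
    """SAM tone whose envelope maximum equals `peak` for any modulation depth."""
    carrier_amp = peak / (1.0 + depth)
    n = int(round(duration_s * fs_hz))
    return [carrier_amp * (1.0 + depth * math.sin(2 * math.pi * mod_hz * k / fs_hz))
            * math.sin(2 * math.pi * carrier_hz * k / fs_hz)
            for k in range(n)]


def valley_re_peak_db(depth):
    """Envelope minimum relative to the envelope maximum, in dB (depth < 1)."""
    return 20 * math.log10((1.0 - depth) / (1.0 + depth))


# Progressively shallower modulation at a fixed peak (depth values are illustrative).
depths = [0.8, 0.4, 0.2, 0.1]
valleys_db = [valley_re_peak_db(m) for m in depths]

tone = fixed_peak_sam(peak=1.0, depth=0.2, carrier_hz=4000, mod_hz=100,
                      duration_s=0.05, fs_hz=48000)
```

As the depth shrinks, the valley level approaches the peak level, so at a high peak level the whole envelope stays within the range where high-SR fibers are expected to be saturated.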

#### **SUMMARY AND CONCLUSIONS**

Human listeners with normal audiometric thresholds exhibit large differences in their ability to process supra-threshold sound features. These differences can be exposed in the laboratory by challenging behavioral tasks that necessitate the use of temporal information in supra-threshold sound (e.g., segregating and selecting one auditory object out of a complex scene). While some NHT listeners seek audiological help for difficulties of this sort (a population labeled as having APD), a significant percentage of ordinary, NHT listeners recruited for psychophysical studies in the laboratory, none of whom have known hearing problems, show similar deficits under carefully designed, challenging conditions. These observations hint that perceptual problems with supra-threshold sounds are more widespread than is currently appreciated and that there may be a continuum of abilities across NHT listeners, amongst those who seek audiological help and amongst the general population.

Recent animal work shows that noise exposure and aging can result in a loss of a significant proportion of ANFs without any permanent shift in detection thresholds. Moreover, this kind of neuropathy appears to preferentially affect lower-SR ANFs. Both physiological responses to AM stimuli in animals and simplistic computational model simulations suggest that lower-SR fiber loss will degrade temporal coding of sound envelopes at comfortable conversational levels, where high-SR fibers are saturated and therefore unable to entrain robustly to envelopes in input sounds.

A number of studies show that individual differences in the perception of supra-threshold sound are correlated with the strength of brainstem responses measured noninvasively on the scalp (especially SSSRs and EFRs driven by signal modulation). While the absolute strength of EFRs correlates with perceptual abilities, sensitivity of such physiological measures may be improved by using stimuli that mimic conditions akin to adverse listening conditions, such as high levels and shallow modulations. In addition, differential measures that consider how EFR phase locking changes with stimulus intensity or modulation depth may be especially sensitive when quantifying supra-threshold hearing status, helping to factor out other subject-specific differences unrelated to neuropathy. Interpretation of such measures requires assessment of cochlear function, as well as development of quantitative models of brainstem responses to establish the correspondence between population responses such as EFRs and single-unit physiology.

There are many challenges in trying to relate behavioral and EFR results to underlying physiological changes such as neuropathy, a number of which are due to gaps in current knowledge. However, converging evidence supports the hypothesis that deficits in supra-threshold coding fidelity are relatively common in the population of NHT listeners, and account for at least part of the important differences in how well these listeners can communicate in difficult everyday social settings. Here, we argue that the neuropathy seen in aging and noise-exposed animals may also be occurring in humans and that it may explain observed supra-threshold individual differences. We have also proposed some objective metrics that, based on our hypothesis, should be sensitive measures of the integrity of ANFs, allowing individual assessment of supra-threshold hearing status, and have discussed some of the limitations of the metrics. Still, there remains a large set of questions to be answered, ranging from what mechanisms cause synaptic loss that preferentially affects lower-SR fibers to what physiological or perceptual tests may be most sensitive for assessing neuropathy. We believe these questions should be addressed immediately, given the potential clinical significance of these ideas.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 16 September 2013; accepted: 05 February 2014; published online: 21 February 2014.*

*Citation: Bharadwaj HM, Verhulst S, Shaheen L, Liberman MC and Shinn-Cunningham BG (2014) Cochlear neuropathy and the coding of supra-threshold sound. Front. Syst. Neurosci. 8:26. doi: 10.3389/fnsys.2014.00026*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2014 Bharadwaj, Verhulst, Shaheen, Liberman and Shinn-Cunningham. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Developmental hearing loss impairs signal detection in noise: putative central mechanisms

#### **Jennifer D. Gay<sup>1,2</sup>, Sergiy V. Voytenko<sup>1</sup>, Alexander V. Galazyuk<sup>1</sup> and Merri J. Rosen<sup>1</sup>\***

<sup>1</sup> Department of Anatomy and Neurobiology, Northeast Ohio Medical University, Rootstown, OH, USA <sup>2</sup> Biomedical Sciences Program, Kent State University, Kent, OH, USA

#### **Edited by:**

Jonathan E. Peelle, Washington University in Saint Louis, USA

#### **Reviewed by:**

Peter Keating, University of Oxford, UK

Ramnarayan Ramachandran, Vanderbilt University Medical Center, USA

#### **\*Correspondence:**

Merri J. Rosen, Department of Anatomy and Neurobiology, Northeast Ohio Medical University, P.O. Box 95, 4209 State Route 44, Rootstown, OH 44272, USA e-mail: mrosen@neomed.edu

Listeners with hearing loss have difficulty processing sounds in noisy environments. This is most noticeable for speech perception, but is reflected in a basic auditory processing task: detecting a tonal signal in a noise background, i.e., simultaneous masking. It is unresolved whether the mechanisms underlying simultaneous masking arise from the auditory periphery or from the central auditory system. Poor detection in listeners with sensorineural hearing loss (SNHL) is attributed to cochlear hair cell damage. However, hearing loss alters neural processing in the central auditory system. Additionally, both psychophysical and neurophysiological data from normally hearing and impaired listeners suggest that there are additional contributions to simultaneous masking that arise centrally. With SNHL, it is difficult to separate peripheral from central contributions to signal detection deficits. We have thus excluded peripheral contributions by using an animal model of early conductive hearing loss (CHL) that provides auditory deprivation but does not induce cochlear damage. When tested as adults, animals raised with CHL had increased thresholds for detecting tones in simultaneous noise. Furthermore, intracellular in vivo recordings in control animals revealed a cortical correlate of simultaneous masking: local cortical processing reduced tone-evoked responses in the presence of noise. This raises the possibility that altered cortical responses which occur with early CHL can influence even simple signal detection in noise.

**Keywords: conductive hearing loss, masking, noise, signal detection, auditory cortex, intracellular, electrophysiology, gerbil**

#### **INTRODUCTION**

Listeners with hearing loss often struggle to understand speech in noisy environments. This difficulty is reflected in increased thresholds for detecting a simple signal in noise, i.e., simultaneous masking. Sensorineural hearing loss (SNHL) affects both the peripheral (cochlear) and central auditory system (cochlear nucleus and above), making it challenging to determine the mechanisms underlying impaired signal detection. We thus use a model of conductive hearing loss (CHL) that leaves the cochlea intact, allowing us to determine whether and how central auditory changes induced by hearing loss affect signal processing in noise.

CHL has effects on the central auditory system. In children, chronic middle ear infections (otitis media) produce a fluctuating CHL that can overlap with critical periods of neural development. Auditory deprivation during these periods alters intrinsic cellular and synaptic properties throughout the central auditory system (Vale and Sanes, 2000, 2002; Leao et al., 2004; Youssoufian et al., 2005; Leão et al., 2006). Developmental CHL is correlated with persistent perceptual problems that are presumably linked to changes in the central auditory system (Whitton and Polley, 2011). Support for this idea arises from animal developmental studies (Knudsen et al., 1984; King et al., 2000; Popescu and Polley, 2010). For example, binaural CHL leads to increased perceptual detection thresholds for slow amplitude modulations, and these behavioral deficits match the magnitude of neural shifts in auditory cortex (ACx; Rosen et al., 2012).

Acoustically demanding conditions, such as noisy environments, are particularly challenging for children who experience CHL. In multiple studies, children with a history of CHL have greater difficulty correctly identifying words or understanding speech in background noise than controls, requiring higher signal-to-noise ratios (SNRs) to attain equivalent performance (Gravel and Wallace, 1992; Schilder et al., 1994; Hall et al., 2003; Eapen et al., 2008; Hsieh et al., 2009; Keogh et al., 2010). These speech processing difficulties are likely due to changes in the central auditory system that arise from auditory deprivation (Sanes and Bao, 2009).

Detection of simple signals in noise should also be susceptible to CHL-induced central auditory system changes. Auditory percepts that reach mature performance levels gradually are susceptible to central changes that can arise due to hearing loss-induced deprivation (Moore, 2002). In particular, for the detection of brief signals in noise (simultaneous masking), thresholds do not reach adult levels until 10 years of age or later in humans (Hartley et al., 2000; Huyck, personal communication).

Neural elements that are affected by early hearing loss may contribute to these deficits. In auditory cortex, many hearing loss-induced effects are considered to involve modifications in local inhibitory networks (Calford et al., 1993; Kral et al., 2000; Chang et al., 2005; Razak et al., 2008; Takesian et al., 2010, 2012). Importantly, at least some of these changes persist into adulthood (Takesian et al., 2012). Thus, cortical neural changes during development as a result of hearing loss may contribute to signal detection problems that persist into maturity.

Deficits in simultaneous masking are also seen in SNHL listeners, but are usually attributed to peripheral mechanisms. In SNHL listeners, elevated simultaneous masking thresholds are ascribed to broadened filters and abnormal intensity perception that arise from cochlear damage, particularly of outer hair cells (Moore and Glasberg, 1986; Glasberg et al., 1987; Florentine, 1992; Kidd et al., 2002; Oxenham and Bacon, 2003). In normal listeners, psychophysical models attribute signal-in-noise detection thresholds to processing within the cochlea (Dai et al., 1991; Schlauch and Hafter, 1991; Moore, 2012). However, both psychophysical and neurophysiological data indicate that the central auditory system contributes to simultaneous masking performance.

Several psychophysical phenomena and auditory cortical responses implicate the central auditory system in simultaneous masking. One is overshoot, in which a signal presented in the middle of a noise masker is more detectable than a signal presented at or near the noise onset (Elliott, 1965; Zwicker, 1965). Peripheral mechanisms cannot explain this phenomenon (Smith and Zwislocki, 1975; Moore et al., 1987; Bacon and Smith, 1991), but neurons in primary ACx have response properties consistent with overshoot: signals presented at a delay relative to a background sound elicited more action potentials than those presented close to background sound onset, and this was directly attributable to inhibition from local cortical circuits (Volkov and Galazyuk, 1992). Another compelling indicator of central involvement is that a subject's expectation influences performance. In the presence of a continuous noise, detection of a tone can drop nearly to chance when the tone occurs at an unexpected frequency or duration (Scharf et al., 1987; Dai et al., 1991; Schlauch and Hafter, 1991; Wright and Dai, 1994). This susceptibility to stimulus variability has no peripheral correlate and implicates higher processing elements such as sensory memory and attention. A central correlate may exist in neurons from auditory cortical areas, which modulate their discharge rates in response to sound elements that deviate from expected values (Ulanovsky et al., 2003; Gill et al., 2008; Buran et al., 2014a).

Although it is difficult to pinpoint mechanisms underlying perceptual deficits, CHL, which does not result in hair cell damage, can disambiguate the contributions of peripheral and central elements to perception (Tucci et al., 1999; Lee et al., 2013). Early CHL induces central synaptic and cellular changes across the auditory neuraxis (Webster, 1983; Stuermer and Scheich, 2000; Tucci et al., 2001; Sumner et al., 2005; Xu et al., 2007; Takesian et al., 2010, 2012). Thus perceptual deficits resulting from CHL are presumably due to central contributions. Here we use an animal model of early CHL to demonstrate the effects of hearing loss on basic auditory perception, and to examine putative neural correlates in the central auditory system. Mongolian gerbils underwent surgery prior to the onset of hearing to induce a permanent moderate CHL. Animals with this permanent loss were then tested in adulthood and compared with normal-hearing controls. Performance on an operant conditioning task demonstrated that early CHL impaired the perception of tones in simultaneous noise maskers. Then, in normal-hearing animals, we used intracellular recordings to reveal a cortical correlate of simultaneous masking: local cortical processing reduced tone-evoked responses in the presence of noise. This raises the possibility that altered cortical responses affect simultaneous masking thresholds in animals with hearing loss.

## **METHODS**

## **ANIMALS**

All procedures relating to the maintenance and use of animals were approved by the Institutional Animal Care and Use Committee at the Northeast Ohio Medical University. Adult Mongolian gerbils (*Meriones unguiculatus*) aged postnatal day (P) 54–125 were tested in one of two procedures. Cortical responses to signals in noise were measured with intracellular recordings from a group of control (CTR) animals (*n* = 7). Perceptual detection thresholds were obtained from separate groups using an operant conditioning procedure. Control animals (*n* = 8) received normal auditory experience during development and were compared to animals with developmental CHL (*n* = 10). All animals were weaned at P30 and housed with litter mates in a 12 h light/12 h dark cycle. Groups comprised animals from multiple litters and included both males and females.

## **CONDUCTIVE HEARING LOSS INDUCTION AND MEASUREMENT**

#### **Malleus removal**

Bilateral CHL was induced at P10–11, prior to the onset of hearing, by tympanic membrane puncture and malleus extirpation (Tucci et al., 1999). Pups were anesthetized with methoxyflurane (Metofane, Ivesco Holdings) and the malleus was removed bilaterally through perforations in each tympanic membrane. At the conclusion of the study, hearing thresholds were measured via auditory brainstem responses (ABRs) from all animals. In addition, CHL animals were sacrificed and both ears examined to confirm malleus removal and to verify, by visual inspection of the bony labyrinth, that the cochlea was not damaged.

#### **Auditory brainstem response (ABR)**

After behavioral testing, ABRs from a subset of animals were measured to assess neural hearing thresholds. Tucker Davis Technologies (TDT) software and hardware were used to generate and present sounds (SigGen, BioSigRZ, RZ6), and to digitize and record neural responses (RZ6, RA4PA). Animals were anesthetized with ketamine and chloral hydrate and presented with auditory stimuli from a free-field calibrated speaker (TDT MF-1) positioned 6 cm in front of the animal. Responses were measured using stainless steel needle electrodes inserted subdermally at the dorsal midline between the eyes (non-inverting), posterior to each pinna (inverting), and at the base of the tail (common ground). Auditory stimuli were 3 ms pure tones with 1 ms rise/fall times, repeated at 15/s. Sound level was adjusted in 5 dB steps to obtain a threshold response (i.e., a visually detectable N1 potential).

## **BEHAVIORAL TRAINING AND TESTING**

#### **Experimental environment**

Gerbils were placed in a custom-built test cage in a double-walled room (ETS Lindgren Acoustic Systems) lined with echo-attenuating material, and were observed via closed circuit monitor. The test cage contained a stainless steel drinking spout and metal floor plate. Gerbil contact with both plate and spout completed a circuit that initiated water delivery via a syringe pump (New Era). A personal computer connected to a digital I/O interface (TDT) measured animal contact and controlled the timing of acoustic stimuli, water delivery (0.2–0.25 ml/min), and a small aversive current delivered at the end of warning trials. Auditory stimuli were generated by the TDT system and delivered via a calibrated custom speaker (Madisound) centered ∼60 cm in front of the lick spout. Sound level at the test cage was measured with a spectrum analyzer (Bruel and Kjaer 2690-OF2) via a 1/4 inch free-field condenser microphone (Bruel and Kjaer 4939) positioned at the location of the animal's head when in contact with the spout. Noise levels are in dB SPL, converted from RMS measurements.

#### **Auditory stimuli and operant conditioning task**

During training and testing, animals heard continuous repeated bursts of a 300 ms noise masker (30% BW noise centered linearly at 4 kHz) with 700 ms inter-burst intervals. Within this repeated background were intermittent SAFE and WARN trials (**Figure 1A**). SAFE trials contained only the noise masker, while WARN trials contained the masker and signal (4 kHz pure tone, 40 ms duration, 2 ms rise/fall; **Figure 1A**). Contact with the spout was monitored immediately prior to each trial and the trial only proceeded if the animal was in contact with the spout for 75% of the 50 ms pre-check window. The warning stimulus was followed by an aversive unconditioned stimulus (300 ms electrical current via the lick spout) delivered 300 ms after the end of the 40 ms signal. To determine if the animal detected the signal, contact with the spout was monitored during the 50 ms prior to delivery of the shock. Contact for ≤75% of this window was scored as a hit. For the same window during SAFE trials a contact time of ≤75% was scored as a false alarm (FA). WARN trials always occurred at the end of a block of 2–4 SAFE trials, randomized to avoid temporal conditioning.
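The trial structure and scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' TDT control code: the function names, the `miss` and `correct_rejection` labels, and the seeded random generator are assumptions made for the example. The block length of 2–4 SAFE trials and the 75% contact criterion are taken from the text.

```python
import random

# Fraction of the 50 ms scoring window; contact at or below this counts as withdrawal.
CONTACT_CRITERION = 0.75


def build_trial_sequence(n_warn, rng):
    """Return SAFE/WARN labels in which each WARN trial ends a block of 2-4 SAFE trials."""
    sequence = []
    for _ in range(n_warn):
        sequence.extend(["SAFE"] * rng.randint(2, 4))  # randomized block length
        sequence.append("WARN")
    return sequence


def score_trial(trial_type, contact_fraction):
    """Classify one trial from the fraction of the scoring window spent on the spout."""
    withdrew = contact_fraction <= CONTACT_CRITERION
    if trial_type == "WARN":
        return "hit" if withdrew else "miss"  # "miss" is a hypothetical label
    return "false_alarm" if withdrew else "correct_rejection"  # second label is hypothetical


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so the sketch is repeatable
    print(build_trial_sequence(3, rng))
    print(score_trial("WARN", 0.20), score_trial("SAFE", 0.95))
```

Randomizing the number of SAFE trials per block is what prevents the animal from timing its withdrawal to the trial count rather than to the signal.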

#### **Sound levels**

Control and CHL animals were tested at approximately equivalent sensation levels based on previous measures of hearing loss induced by malleus removal. This procedure typically produces an attenuation of ∼45 dB as assessed by ABR (Tucci et al., 1999; Rosen et al., 2012), and of ∼30 dB at 4 kHz as assessed by behavioral testing (Buran et al., 2014b; see Section Results for more detail). Stimuli were presented 35 dB higher in level for CHLs (85 dB SPL masker) than for CTRs (50 dB SPL masker). The signal levels began 23 dB above the masker level and were reduced to determine detection threshold. We determined that the stimulus was not distorted at the loudest levels presented.

#### **Procedural training**

Behavioral training and testing involved a conditioned avoidance procedure (Heffner and Heffner, 1995; Kelly et al., 2006). Animals were water deprived for 48 h prior to training and remained on controlled water access for the duration of the training and testing. Animals were introduced to the behavioral cage and trained to initiate water delivery via contact with the metal spout, during repeated presentations of gated noise maskers (**Figure 1A**). Animals learned to withdraw from the spout when an acoustic cue (tonal signal) was presented within the noise, in order to avoid a low AC current (0.25–2.5 mA, 300 ms, Coulbourn) delivered through the lick spout. Since animals display large between-subject variability in pain sensitivity (Mogil, 1999; Wasner and Brock, 2008; Nielsen et al., 2009), the shock level was adjusted continuously for each animal to produce reliable withdrawal from the spout without dissuading the animal from returning to the spout. For initial training, a long (200 ms) signal was presented and shortened in 20–30 ms steps until animals reliably detected the target duration of 40 ms (**Figure 1B**). To establish criterion performance at the target duration, warning trials were presented until performance reached 70% correct over 10 consecutive trials.

#### **Perceptual testing**

Once animals reached criterion on the conditioned avoidance procedure, they were tested with decreasing signal levels using the method of constant limits: five signal levels separated by 3 dB presented in decreasing order. Each day of testing used a range of sound levels that bracketed the previous day's threshold (Sarro and Sanes, 2010). Animals were tested for 4–5 days with increasing difficulty in order to determine thresholds for detecting the signal in noise (signal-to-noise ratio, SNR, quantified as the signal in dB minus the noise in dB). Pilot data indicated that this duration of testing produced reliable performance, while longer testing resulted in poorer performance, as the difficulty of the task induced animals to adopt strategies that resulted in increased thresholds (e.g., high FA rates or long lapses of attention). SNRs for the best day along with the best 3 days of performance were taken as perceptual thresholds.
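As a worked illustration of the level ladder, the sketch below lists five descending signal levels in 3 dB steps together with their SNRs (signal in dB minus noise in dB). The function name and its parameters are assumptions made for the example. The 50 dB SPL control masker, the 85 dB SPL CHL masker, and the starting point of 23 dB above the masker are the values reported in the Sound levels section. How each day's range was chosen to bracket the previous day's threshold is not specified in enough detail to reproduce, so it is left out.

```python
def signal_levels(masker_db_spl, top_snr_db, n_levels=5, step_db=3.0):
    """Return (signal level in dB SPL, SNR in dB) pairs in descending order."""
    snrs = [top_snr_db - i * step_db for i in range(n_levels)]
    # SNR is defined as signal level minus masker level, so signal = masker + SNR.
    return [(masker_db_spl + snr, snr) for snr in snrs]


if __name__ == "__main__":
    # Control animals: 50 dB SPL masker, first signal 23 dB above the masker.
    print(signal_levels(50, 23))
    # → [(73.0, 23.0), (70.0, 20.0), (67.0, 17.0), (64.0, 14.0), (61.0, 11.0)]
    # CHL animals: same SNRs, with masker and signal both raised by 35 dB.
    print(signal_levels(85, 23))
```

Because both masker and signal are shifted by the same 35 dB, the two groups are compared at identical SNRs.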

#### **Data analysis**

A performance value, *d*′ = *z*(hits) − *z*(false alarms), was obtained from z-scores that corresponded to right-tail *p*-values (Swets, 1973; Yanz, 1984), and was calculated for each signal level. Thresholds were defined as the signal level at which performance reached *d*′ = 1; only sessions in which animals performed on a minimum of 25 WARN trials were included in the analysis. Psychometric functions of *d*′ across signal level were constructed for each day of testing. Performance during the best 3 days of testing served as the assessment of practiced detection thresholds. Performance across groups was compared with two-sided Wilcoxon rank-sum tests for unpaired data. All values are given as mean ± standard error of the mean (SEM).
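The sensitivity index and the threshold criterion can be illustrated with Python's standard library. This is a sketch under stated assumptions, not the authors' analysis code. It uses the conventional inverse normal CDF, so the index is positive when the hit rate exceeds the false alarm rate. Clamping rates of 0 or 1 by half a trial, and finding the threshold by linear interpolation between the two levels that bracket the criterion, are both assumptions, because the text does not say how these cases were handled.

```python
from statistics import NormalDist

_Z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def d_prime(hits, n_warn, false_alarms, n_safe):
    """Sensitivity index z(hit rate) - z(false alarm rate)."""
    # Assumed correction: keep rates away from 0 and 1 so the z-scores stay finite.
    hit_rate = min(max(hits / n_warn, 0.5 / n_warn), 1 - 0.5 / n_warn)
    fa_rate = min(max(false_alarms / n_safe, 0.5 / n_safe), 1 - 0.5 / n_safe)
    return _Z(hit_rate) - _Z(fa_rate)


def threshold_at_criterion(snrs_db, d_primes, criterion=1.0):
    """Interpolate the SNR at which the index falls through the criterion.

    snrs_db must be in descending order (easiest level first).
    Returns None if the criterion is never crossed.
    """
    points = list(zip(snrs_db, d_primes))
    for (snr_hi, d_hi), (snr_lo, d_lo) in zip(points, points[1:]):
        if d_hi >= criterion > d_lo:
            frac = (d_hi - criterion) / (d_hi - d_lo)
            return snr_hi - frac * (snr_hi - snr_lo)
    return None


if __name__ == "__main__":
    print(d_prime(42, 50, 8, 50))
    # Hypothetical psychometric function, for illustration only.
    print(threshold_at_criterion([23, 20, 17, 14, 11], [2.5, 2.0, 1.5, 0.5, 0.1]))
    # → 15.5
```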

#### **Inclusion criteria**

All animals that performed behavioral testing were included in the analysis. In order to compare the effects of treatment group, no animals were excluded for poor performance, although such exclusion is common in behavioral studies examining best performance capability. The only animals excluded were those that did not reach criterion after 10 days of training with the long-duration signal. This removed three controls and six CHL animals from the study.

## **INTRACELLULAR RECORDINGS**

#### **Surgical preparation**

Mongolian gerbils (*n* = 7) were anesthetized through isoflurane inhalation (1.5–2.5% in oxygen) and held secure in a stereotaxic apparatus. A headpost was cemented to the skull using dental acrylic, and a small craniotomy was made over left auditory cortex, leaving the dura intact. Following this brief surgical procedure, animals were anesthetized with ketamine (30 mg/kg) and chloral hydrate (350 mg/kg) in preparation for recordings, and were maintained as necessary with supplemental doses.

#### **Acoustic stimuli**

Stimuli were generated using TDT software (SigGen) and hardware (RP 2.1) and delivered via a calibrated custom built speaker. The signal was a 40 ms FM downsweep with a 10 kHz bandwidth, extending through a range chosen to encompass the cell's best frequency (BF). This range was based on an initial extracellular recording that determined the general BF of cells within the region. The masker was a 200 ms broadband noise, with onset 100 ms before the FM signal. Signal and masker were presented at equal amplitudes.

#### **Electrophysiology**

Intracellular recordings were made using techniques described previously (Voytenko and Galazyuk, 2008). Animals were placed inside a single-walled acoustic chamber (Industrial Acoustics), and positioned on an air table 7″ from a free-field speaker. Sharp microelectrodes were pulled from 1.2-mm-diameter quartz glass (Sutter Instruments) on a Flaming-Brown micropipette puller and filled with 3 M potassium acetate. Impedance ranged between 40 and 90 MΩ. After placement on the dorsal surface of the brain, the exposure was filled with 4% agar and the electrode was advanced in 3-µm steps using a precision microdrive (Kopf, Model 660). Intracellular responses were amplified (Cygnus Technologies NeuroData IR183A), monitored with Pulse software (v. 8.65) and digitized using a data acquisition system (Heka model EPC-10) at a sampling rate of 100 kHz.

#### **Analysis**

Responses to masked and unmasked signals were compared within cells. For each stimulus, 10–20 repetitions were presented, and mean traces were calculated from membrane potentials (V<sub>m</sub>) after truncating spikes. Hyperpolarization within a given time window was measured from mean traces as the area under the curve in relation to baseline V<sub>m</sub>. For each cell, the time window in which a specific response occurred (either a hyperpolarization or rebound action potentials) was determined from the mean V<sub>m</sub> trace elicited by the signal alone. This time window was used to measure response magnitudes with and without a masker present; these were compared with two-sided Wilcoxon signed rank tests for paired data. All values are given as mean ± SEM.
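The area measure can be sketched as follows. This is an illustrative reading of the description rather than the authors' analysis code. The function names, the trapezoidal integration, and the choice to count only samples below baseline are assumptions. Spike truncation is assumed to have been applied to each trial before averaging. At the stated 100 kHz sampling rate, the sample interval `dt_ms` would be 0.01 ms.

```python
def mean_trace(trials):
    """Average equal-length membrane potential traces sample by sample."""
    return [sum(samples) / len(samples) for samples in zip(*trials)]


def hyperpolarization_area(vm_mv, dt_ms, baseline_mv, window_ms):
    """Area (mV*ms) between baseline and the trace where Vm lies below baseline.

    window_ms is a (start, stop) pair measured from the start of the trace.
    """
    start = int(round(window_ms[0] / dt_ms))
    stop = int(round(window_ms[1] / dt_ms))
    # Depth below baseline; samples above baseline contribute zero (assumption).
    depth = [max(baseline_mv - v, 0.0) for v in vm_mv[start:stop + 1]]
    # Trapezoidal rule over consecutive samples.
    return sum((a + b) / 2.0 * dt_ms for a, b in zip(depth, depth[1:]))


if __name__ == "__main__":
    # Toy trials sampled every 1 ms, for illustration only.
    trials = [
        [-60, -60, -64, -66, -65, -60, -60],
        [-60, -60, -66, -64, -65, -60, -60],
    ]
    avg = mean_trace(trials)
    print(hyperpolarization_area(avg, dt_ms=1.0, baseline_mv=-60.0, window_ms=(0, 6)))
    # → 15.0
```

Applying the same window to the signal-alone and masked-signal traces of one cell gives the paired values that a signed rank test would compare.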

#### **RESULTS**

#### **DEVELOPMENTAL CONDUCTIVE HEARING LOSS DEGRADES SIGNAL DETECTION IN NOISE**

Behavioral thresholds for signal detection in noise were obtained from two groups of adult gerbils: animals reared with CHL prior to hearing onset, and age-matched controls (CTR). Animals were trained to detect a brief 4 kHz signal embedded in a longer noise masker (30% bandwidth centered around the signal frequency). Once animals were reliably performing the task, they received several days of procedural testing to determine threshold (defined as the quietest tone level that could be reliably detected in the presence of the masker, reported here as SNR in dB). Thresholds were compared across groups for each animal's best day (**Figure 2A**) and mean across the best 3 days (**Figure 2B**). CTR animals displayed significantly lower detection thresholds compared to CHLs (Best day: *p* < 0.001; Best 3 days: *p* = 0.01, Wilcoxon rank-sum). Differences in thresholds were not due to differences in training or performance on the task. Groups received a comparable number of procedural training trials (**Figure 2C**: *p* = 0.97, Wilcoxon rank-sum) and learned the task at a similar rate (days of training: CTR 5.6 ± 0.9 vs. CHL 5.8 ± 0.6, *p* = 0.69, Wilcoxon rank-sum). Furthermore, the amount of procedural training on the task was not correlated with threshold for either group (**Figure 2D**: CTR: *R*<sup>2</sup> = 0.192, *p* = 0.28, CHL: *R*<sup>2</sup> = 0.021, *p* = 0.69).

One might expect that early hearing loss would affect learning or improvement on perceptual tasks. For this simple simultaneous masking task that was not the case. In addition to equivalent time-courses of training, both groups improved but did so equivalently across testing days (**Figure 2E**; First day minus last day threshold (dB SNR): CTR 7.8 ± 2.7 vs. CHL 4.7 ± 4.6, *p* = 0.15, Wilcoxon rank-sum). Furthermore, the rate of learning did not differ between groups, as animals reached best performance over an equivalent number of testing days (number of testing days to reach threshold (dB SNR): CTR 4.6 ± 0.2 vs. CHL 3.7 ± 0.4, *p* = 0.19, Wilcoxon rank-sum).

#### **PERFORMANCE VARIABLES DO NOT ACCOUNT FOR THE EFFECT OF EARLY HEARING LOSS**

Higher thresholds for the CHL animals did not appear to be due to poorer attention to the task. Poor attention can be indicated by high FA rates, measured here as withdrawal from the spout during SAFE trials (which minimizes the chance of receiving a shock during poor attention to the signal). CTR and CHL animals did not differ in their FA rates (**Figure 3A**: *p* = 0.26, Wilcoxon rank-sum) and this measure of attention did not predict threshold for either group (**Figure 3B**: CTR: *R*<sup>2</sup> = 0.172, *p* = 0.31, CHL: *R*<sup>2</sup> = 0.159, *p* = 0.25).

Attention can also be indicated by licking consistency. During testing, animals ideally maintain constant contact with the spout, and withdraw only upon detecting a signal. Another strategy is to drink continuously but make poor contact with the spout, with continuous micro-withdrawals to minimize the magnitude of shock received. This hesitant contact can be an indicator of poor attention or performance anxiety and can be quantified as the number of times an animal breaks contact with the spout between WARNs. CTR animals displayed significantly more breaks in contact between WARNs compared to CHLs (**Figure 3C**: *p* < 0.01, Wilcoxon rank-sum), indicating group differences in overall performance strategy. However, this was independent of signal detection thresholds (**Figure 3D**: CTR: *R*<sup>2</sup> = 0.06, *p* = 0.57, CHL: *R*<sup>2</sup> = 0.001, *p* = 0.93). Notably, this performance strategy would be likely to increase the false alarm rate in CTR animals, thus increasing their thresholds as calculated by the *d*′ measure. Despite this, CTR thresholds were significantly lower than those of CHLs, suggesting that, if anything, the threshold differences measured here may underestimate perceptual differences across groups.

**Figure 2.** Detection thresholds (SNR; see Section Methods) were significantly higher for conductive hearing loss (CHL; shaded orange) compared with control (CTR; cream-filled black) animals, as measured by **(A)** the best performance day and **(B)** the mean across the best 3 performance days. Thresholds from individual animals are depicted as circles or diamonds atop each bar. **(C)** Groups required similar numbers of training trials to reach criterion performance. **(D)** The amount of training did not predict final detection thresholds. Lines show non-significant linear fits. **(E)** Detection thresholds across testing days indicate gradual improvement which did not differ across groups. Thin lines are thresholds from individual animals, and thick lines are means. Abbreviations: CTR: controls, CHL: conductive hearing loss, n.s.: not significant, \*\*: p < 0.01, \*\*\*: p = 0.001.

Task proficiency and performance consistency can also affect behavioral thresholds. As an indicator of task proficiency, we measured performance at the easiest level on the best testing day (**Figure 4A**). As an indicator of performance consistency we measured the range of *d*′ scores for the easiest level presented across days; a wider range indicates increased variability in performance (**Figure 4B**). Neither the *d*′ score for the easiest level on the best performance day (CTR: *R*<sup>2</sup> = 0.35, *p* = 0.12, CHL: *R*<sup>2</sup> = 0.11, *p* = 0.34) nor the range of *d*′ scores at the easiest level (CTR: *R*<sup>2</sup> = 0.17, *p* = 0.30, CHL: *R*<sup>2</sup> = 0.42, *p* = 0.57) correlated with behavioral threshold for either group. Thus perceptual deficits in CHL animals are independent of their ability to perform the task.


#### **SENSATION LEVEL ALONE DOES NOT ACCOUNT FOR THE EFFECT OF EARLY HEARING LOSS**

In order to attribute deficits in masked thresholds to central changes, it is essential to demonstrate that the attenuation provided by CHL is not sufficient to explain these deficits. It is possible that increased signal detection thresholds for CHL animals could be attributed to the stimuli not being presented at sensation levels equivalent to those of CTRs. To account for this, the two groups were tested with stimuli that differed by 35 dB (for both masker and signal). This level difference was based on behaviorally measured level thresholds for separate groups of CHL animals in two tasks: AM detection (Rosen et al., 2012) and tone detection (Buran et al., 2014b), which indicated 35 and 30 dB shifts respectively. We then tested to ensure that the amount of hearing loss at 4 kHz did not predict detection thresholds for our 4 kHz masked tones. **Figure 4C** shows that hearing levels for a 4 kHz tone, as measured by ABRs for each animal, did not correlate with behavioral thresholds in either CTRs (*R*<sup>2</sup> = 0.01, *p* = 0.84) or CHLs (*R*<sup>2</sup> = 0.10, *p* = 0.55). As there was no differential difficulty in hearing the signal, nonequivalent sensation levels could not account for the increased masked thresholds seen in CHL animals. That is, the characteristics of the perceptual deficit are not what would be predicted based solely on the amount of signal attenuation caused by the CHL.

#### **AUDITORY CORTEX CONTRIBUTES TO REDUCED RESPONSES DURING SIMULTANEOUS MASKING**

Since the mechanisms by which central areas contribute to simultaneous masking are not resolved, intracellular responses to noise-masked signals were measured via sharp electrode recordings in auditory cortex. We focused on signal-evoked hyperpolarizations, as these are of cortical origin and reflect central processing (Somogyi et al., 1983; DeFelipe and Jones, 1985; Matsubara, 1988; Albus et al., 1991; Albus and Wahle, 1994; Tomioka et al., 2005; Higo et al., 2007). The spikes that occurred when hyperpolarizations returned to baseline were examined because they were likely to be generated by post-inhibitory rebound. Such spikes can be attributed to local cortical processes.

The subset of cells (*n* = 16) that exhibited hyperpolarization in response to the signal was examined. For each cell, the response to the signal alone was compared with the response to the masked signal. An example of overlaid traces in **Figure 5A** (top) shows an FM signal-evoked spiking onset response followed by a hyperpolarization, with rebound spikes upon return to baseline. The bottom trace shows a reduced hyperpolarization with no clear rebound spikes in response to the FM signal during a masker. For the sample of cells tested, the magnitude of the signal-evoked hyperpolarization (measured as the area under the curve relative to the membrane resting potential) was significantly reduced during presentation of the masker (**Figure 5B**; Wilcoxon signed rank, *p* = 0.049). Furthermore, those cells that spiked on rebound from the FM signal-evoked hyperpolarization fired significantly fewer rebound spikes during masker presentation (**Figure 5C**; Wilcoxon signed rank, *p* = 0.008). A reduced firing response to a masked signal is one measure of a neural correlate of perceptual masking. As this signal-evoked hyperpolarization must arise locally, from intrinsic cortical cellular mechanisms or local inhibitory circuitry, these data are evidence for a cortical contribution to simultaneous masking.

## **DISCUSSION**

This study demonstrates that simultaneous masking involves the central auditory system and is impaired by early CHL. Developmental auditory deprivation induces anatomical and physiological changes in the central auditory system that persist into adulthood in animals (Harrison et al., 1993; Chang and Merzenich, 2003; Kral et al., 2005; Sanes and Bao, 2009). Perceptions that involve the central auditory system should be affected. We thus asked whether animals with early CHL had increased detection thresholds for simple signals in noise. When tested with a conditioned avoidance procedure, adult animals reared with CHL displayed significantly poorer detection thresholds than CTRs for a brief tone embedded in a simultaneous noise masker, despite equivalent training (**Figure 2**). These masked threshold increases could not be attributed to the amount of attenuation induced by the hearing loss. Furthermore, they were not due to task proficiency, strategy, or attention, as these did not differ across groups (**Figures 3**, **4**). This type of developmental CHL maintains cochlear integrity and is known to alter intrinsic and synaptic properties in the central auditory system, particularly in ACx (Xu et al., 2007; Takesian et al., 2010, 2012). For such alterations to affect thresholds for simultaneous masking in deprived animals, neural areas central to the periphery should directly contribute to the responses to these signals, rather than merely inheriting responses from the auditory periphery. We therefore made intracellular *in vivo* recordings from ACx in control animals, and demonstrated a cortical contribution to simultaneous masking. This can be attributed to local sources, either intrinsic neural properties or local inhibitory synaptic inputs (**Figure 5;** Somogyi et al., 1983; DeFelipe and Jones, 1985; Matsubara, 1988; Albus et al., 1991; Albus and Wahle, 1994; Tomioka et al., 2005; Higo et al., 2007).

#### **TECHNICAL CONSIDERATIONS OF THE DEVELOPMENTAL CHL MODEL**

The CHL induced here, via malleus removal, attenuates sound transmission to the inner ear but leaves the cochlea intact: ossicle removal does not alter hair cell counts on the basilar membrane, and bone conduction thresholds are normal (Tucci and Rubel, 1985). Here, visual postmortem analyses excluded the possibility of accidental cochlear damage. In SNHL listeners, reduced cochlear nonlinearities are often used to explain raised thresholds for simultaneous masked signals (Moore and Glasberg, 1986; Oxenham and Bacon, 2003), but this should not be the source of the raised thresholds seen here. The possibility exists that the permanent CHL induced compensatory changes at the periphery, perhaps via efferent alterations of the middle ear reflex (Munro and Blount, 2009). However, signal detection thresholds were not correlated with hearing thresholds at the same frequency (**Figure 4C**). Another possible peripheral contribution would be the induction of a hearing loss that was not consistent across the frequency range used in the behavioral masking task. Hearing loss with malleus removal is relatively flat across the frequency spectrum (Rosen et al., 2012), mimicking the flat CHL that occurs during otitis media in children (Kokko, 1974; Anteby et al., 1986; Hunter et al., 1994). Here, signals of 4 kHz were masked with a noise spanning 3.4–4.6 kHz, which approximates the gerbil critical band at that center frequency (∼1 kHz; Kittel et al., 2002). Finally, it is worth noting that CHL induced via malleus removal is not the best mimic of otitis media, but is well-suited to examine central contributions to hearing loss deficits. Our CHL is permanent, whereas early otitis media typically produces a fluctuating CHL that resolves after childhood. Our model does not reproduce the effusion viscosity that occurs with otitis media, which alters sound transmission delays (Hartley and Moore, 2003). It therefore avoids introducing an element of altered peripheral processing that may affect temporal processing in listeners with otitis media.

#### **CORTICAL CONTRIBUTIONS TO SIMULTANEOUS MASKING**

Our evidence in combination with previous work indicates that responses to simultaneous masked signals evolve from the periphery to central regions. In the periphery, simultaneous masking functions in auditory nerve fibers are based upon cochlear nonlinearities. For example, with a continuous broadband noise masker, auditory nerve fiber responses to tones shift in a manner consistent with two-tone suppression (the reduction of the response to one tone by simultaneous presentation of another tone) (Costalupes et al., 1984; Ruggero et al., 1992). This suppression is attributable to outer hair cell interactions with the basilar membrane (Robles and Ruggero, 2001). SNHL reduces those nonlinearities, which affects simultaneous masked thresholds (Oxenham and Bacon, 2003). In comparison, CHL, which does not alter cochlear nonlinearities, also affects simultaneous masked thresholds, and this is presumably due to central processes.

**masked signals**. **(A)** An example of overlaid traces (gray traces with mean in red) from a cortical neuron in response to 10 presentations of a FM tone (top; blue bar) or a FM tone occurring during a broadband noise (bottom; blue bar within gray bar). The traces show changes in voltage over time. Top: This neuron has an onset spiking response to the FM tone followed by a hyperpolarization, with rebound spikes (arrow) upon return to baseline (black dotted line). Bottom: When a simultaneous masker is presented, there is a reduced hyperpolarization and no clear rebound spikes (arrow) in response to the masked FM signal. **(B)** During masker presentation compared with tone alone, there was a significant reduction in the magnitude of signal-evoked hyperpolarizations. **(C)** When a simultaneous masker was present (gray bar), there were significantly fewer spikes immediately following hyperpolarizations, as compared with the tone presented alone (blue bar). Since hyperpolarizations arise within cortex and indicate central processing, a reduction in their magnitude, and a reduction of the number of post-inhibitory rebound spikes, are attributable to central mechanisms. \*\*: p < 0.01, \*: p < 0.05.

Our intracellular data reveal a cortical correlate of simultaneous masking. In neurons that exhibited signal-evoked hyperpolarization, the responses to FM tones showed a reduced hyperpolarization and fewer subsequent spikes in the presence of a noise masker (**Figures 5B,C**). The action potentials that immediately follow the hyperpolarization are likely to be rebound spikes, resulting from intrinsic cellular properties and thus of cortical origin. Cortical inhibition is known to arise locally, via intrinsic horizontal, intralaminar, and some long-range projections (Somogyi et al., 1983; DeFelipe and Jones, 1985; Matsubara, 1988; Albus et al., 1991; Albus and Wahle, 1994; Tomioka et al., 2005; Higo et al., 2007). As the hyperpolarizations seen here arise locally, our data are an example of the cortex modifying inherited subcortical information. The two possible sources of this hyperpolarization are either inhibition from local interneurons, or intrinsic cellular properties that generate afterhyperpolarizations. Intracellular *in vivo* recordings of ACx pyramidal neurons show responses similar to those seen here: signal-evoked depolarization often followed by hyperpolarization. The hyperpolarization has characteristics of a chloride-mediated IPSP that is likely mediated by GABA<sub>A</sub> receptors (De Ribaupierre et al., 1972; Ojima and Murakami, 2002). Here, the hyperpolarization time courses were similar to those previously described. Therefore they are likely to arise from local inhibition rather than intrinsic hyperpolarizing currents activated by depolarization. If the hyperpolarization indeed arises from local cortical interneurons, the reduced hyperpolarization seen during masking may arise from reduced excitatory input onto those interneurons. Reduced neural responses are seen in subcortical regions during simultaneous masking, and may be a source of these excitatory feedforward inputs (Costalupes et al., 1984; Rees and Palmer, 1988; May and Sachs, 1992).

To our knowledge, the only other direct demonstration of inhibitory reduction of spiking during simultaneous masking in ACx is by Volkov and Galazyuk (1992), where a background sound evoked a local inhibitory current that reduced the response to an overlaid signal. However, they did not directly compare unmasked and masked conditions and only tested binaural interactions. Previous neurophysiological analyses of ACx and inferior colliculus have provided some evidence for central contributions to simultaneous masking involving local inhibition. In the inferior colliculus, tone-evoked responses were shifted during simultaneous noise in a manner consistent with local inhibition, providing responses that differed from auditory nerve fibers (Rees and Palmer, 1988). In primary ACx, monotonic cells (those whose discharge rate increases with sound level) responded like auditory nerve fibers, dominated by whichever sound (signal or masker) elicited a stronger response alone. In contrast, nonmonotonic cells (those with more complex level-sensitivity) were suppressed by a noise masker, suggestive of inhibitory influences (Phillips and Cynader, 1985; Phillips and Kelly, 1992). Consistent with this idea, nonmonotonic response profiles do not reflect inputs inherited from lower hierarchical areas. Instead, they arise from local excitatory and inhibitory interactions (Wu et al., 2006). Thus our result is expected: local cortical inhibition shapes responses to signals masked by simultaneous noise. We did not explicitly test for monotonicity, but would predict that the neurons analyzed here were nonmonotonic.

#### **NEURAL CORRELATES OF SIGNAL DETECTION**

A challenging element is connecting changes in neural discharge with altered auditory perception. The reduced spiking shown in this study indicates a reduced output to a masked signal and would be expected to increase signal detection thresholds. This idea is supported by recordings of cortical auditory evoked potentials (CAEPs) in humans, which show a correlate for the detection of masked signals in the P1/N1/P2 waveform. The amplitude of CAEP peaks is correlated with detectability of signals in noise (Martin et al., 1997). Since the CAEP primarily reflects synchronous activity arising from ACx (Näätänen and Picton, 1987; Eggermont, 2007), the reduction in ACx spiking seen here during the presence of a masker could manifest as reduced CAEP amplitude, and may thus contribute to poor perception in noise.

Cortical changes known to occur with hearing loss suggest a simple mechanism for impaired signal detection. Early CHL directly affects the central auditory system, generally reducing inhibition relative to excitation in order to maintain homeostasis (Sanes, 2013). For example, animals raised with early CHL have reduced IPSC amplitude, decreased firing rates in fast-spiking interneurons, and facilitating rather than the normal depressing short-term plasticity (Takesian et al., 2010, 2012). Inhibition is known to sharpen neural tuning curves at the level of cortex (Ojima, 2011). In normal listeners, inhibition would reduce the magnitude of noise-evoked responses. With reduced cortical inhibition seen from early hearing loss, the frequency components of a noise surrounding a neuron's BF would *increase* the response magnitude to noise. At the same time, early CHL reduces the magnitude of ACx firing to tones (Rosen et al., 2012). Together, these effects would reduce the SNR of tones presented in a noise masker. In this manner, changes to the cortical network may subserve the impaired simultaneous masked thresholds seen in this study. This prediction can be directly tested in our animal model. Future experiments that measure neural activity during perceptual detection tasks in CHL animals can connect changes in central auditory regions with perceptual impairments arising from developmental deprivation.

#### **ACKNOWLEDGMENTS**

This work was supported by R01DC013314 to Merri J. Rosen and R01DC011330 to Alexander V. Galazyuk. We thank Julia Huyck, Kyle Nakamoto, and Sharad Shanbhag for helpful ideas, discussions, and comments on the manuscript.

#### **REFERENCES**


of the central auditory pathways. *Acta Otolaryngol.* 113, 296–302. doi: 10.3109/00016489309135812


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 03 June 2014; accepted: 21 August 2014; published online: 09 September 2014*.

*Citation: Gay JD, Voytenko SV, Galazyuk AV and Rosen MJ (2014) Developmental hearing loss impairs signal detection in noise: putative central mechanisms. Front. Syst. Neurosci. 8:162. doi: 10.3389/fnsys.2014.00162*

*This article was submitted to the journal Frontiers in Systems Neuroscience*.

*Copyright © 2014 Gay, Voytenko, Galazyuk and Rosen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## Auditory training during development mitigates a hearing loss-induced perceptual deficit

#### **Ramanjot Kang<sup>1</sup>, Emma C. Sarro<sup>1</sup> and Dan H. Sanes<sup>1,2</sup>\***

<sup>1</sup> Center for Neural Science, New York University, New York, NY, USA

<sup>2</sup> Department of Biology, New York University, New York, NY, USA

#### **Edited by:**

Jonathan E. Peelle, Washington University in St. Louis, USA

#### **Reviewed by:**

Robert N. S. Sachdev, Yale University, USA Amy Poremba, University of Iowa, USA

Steven Eliades, University of Pennsylvania Perelman School of Medicine, USA

**\*Correspondence:** Dan H. Sanes, Center for Neural Science, New York University, 4 Washington Place, New York, NY 10003, USA

e-mail: dhs1@nyu.edu

Sensory experience during early development can shape the central nervous system and this is thought to influence adult perceptual skills. In the auditory system, early induction of conductive hearing loss (CHL) leads to deficits in central auditory coding properties in adult animals, and this is accompanied by diminished perceptual thresholds. In contrast, a brief regimen of auditory training during development can enhance the perceptual skills of animals when tested in adulthood. Here, we asked whether a brief period of training during development could compensate for the perceptual deficits displayed by adult animals reared with CHL. Juvenile gerbils with CHL, and age-matched controls, were trained on a frequency modulation (FM) detection task for 4 or 10 days. The performance of each group was subsequently assessed in adulthood, and compared to adults with normal hearing (NH) or adults raised with CHL that did not receive juvenile training. We show that as juveniles, both CHL and NH animals display similar FM detection thresholds that are not immediately impacted by the perceptual training. However, as adults, detection thresholds and psychometric function slopes of these animals were significantly improved. Importantly, CHL adults with juvenile training displayed thresholds that approached NH adults. Additionally, we found that hearing impaired animals trained for 10 days displayed adult thresholds closer to untrained adults than those trained for 4 days. Thus, a relatively brief period of auditory training may compensate for the deleterious impact of hearing deprivation on auditory perception on the trained task.

**Keywords: conductive hearing loss, plasticity, auditory training, frequency modulation, development, perceptual learning**

## **INTRODUCTION**

The developmental sensory environment can initiate lifelong modifications to central nervous system computations, an effect that has been demonstrated for a broad range of sensory systems (for review see (visual) Hubel, 1978; Wiesel, 1982; Hooks and Chen, 2007; (somatosensory) Feldman and Brecht, 2005; (auditory) Keuroghlian and Knudsen, 2007; Sanes and Bao, 2009). In the auditory system, these neural deficits are closely associated with impaired perceptual skills. For example, developmental auditory deprivation leads to diminished behavioral performance on frequency discrimination, amplitude modulation detection, and sound localization (Clements and Kelly, 1978; Kerr et al., 1979; Knudsen et al., 1984; Moore et al., 1999; Rosen et al., 2012). Consistent with this, developmental hearing loss in humans can lead to persistent deficiencies in sound localization and signal detection, as well as impairments in the acquisition of speech and language (Hall and Grose, 1994; Wilmington et al., 1994; Hall et al., 1995; Kidd et al., 2002; Halliday and Bishop, 2005, 2006). Even transient periods of conductive hearing loss (CHL), due to chronic otitis media with effusion, may cause perceptual deficits (Whitton and Polley, 2011).

In contrast to the detrimental impact of developmental hearing loss, a brief period of auditory training during development can enhance performance on the trained task when animals are tested in adulthood (Sarro and Sanes, 2011). The long-term effect, as assessed in adulthood, is similar to the impact of adult perceptual training (Wright et al., 1997; Wright and Fitzgerald, 2001; Wright and Sabin, 2007). However, the short term effect of training during development can be surprisingly limited (Sarro and Sanes, 2010; Huyck and Wright, 2011). These findings suggest that developmental auditory training could counteract the long-term perceptual deficits induced by early sound deprivation. Here, we asked whether the diminished performance skills of adult animals reared with CHL could be rescued with a brief period of auditory perceptual training during juvenile development.

In the present study, we examined the performance of gerbils on a frequency modulation (FM) detection task. Adults reared with CHL display much poorer performance on this task than adults reared with normal hearing (NH; Buran et al., 2014). Therefore, we provided both CHL and NH juveniles with a brief period of auditory training on the FM detection task, and reassessed their detection thresholds in adulthood. Juvenile training permitted animals with hearing loss to display superior FM detection thresholds and decreased variance (as measured with psychometric function slope) as compared to CHL animals without juvenile training. These results suggest that training induces a long-term compensation for the perceptual deficits on the trained task caused by early hearing loss.

## **MATERIALS AND METHODS**

#### **ANIMALS**

Gerbil (*Meriones unguiculatus*) pups were weaned from commercial breeding pairs (Charles River) at postnatal days (P) 23–30. Males and females were caged separately and maintained in a 12 h light/dark cycle. All procedures related to the maintenance and uses of animals were in accordance with the "Institutional Animal & Use Committee Handbook" and approved by the University Animal Welfare Committee (UAWC) at New York University.

#### **DEVELOPMENTAL HEARING LOSS**

Bilateral CHL was induced via surgical removal of a middle ear bone, the malleus, prior to the onset of hearing at P10. At postnatal day 10 (P10), pups were anesthetized with the halogenated ethyl methyl ether, methoxyflurane. Anesthetic induction occurred within 10 min and produced complete elimination of responses to nociceptive stimuli. CHL was induced by tympanic membrane puncture and malleus extirpation (Tucci et al., 1999). A postauricular skin incision was made, and the tympanic membrane was visualized and punctured with a forceps. The malleus was then removed through this opening. The postauricular wound was closed with cyanoacrylate glue, and the procedure was repeated on the other side. After surgery, animals were warmed on a heating pad and returned to the litter when respiration and motor activity had recovered. The age of surgery was chosen based on the finding that anteroventral cochlear nucleus cell number is unaffected by cochlear ablation after P9 in gerbils (Tierney and Moore, 1997). This manipulation induces an attenuation of ≈55 dB at 4 kHz, as assessed by auditory brainstem response (Tucci et al., 1999; Rosen et al., 2012), but behavioral measures indicate an attenuation of ≈40 dB at 4 kHz (Buran et al., 2014). We did not use sham controls for this study. However, previous work from our lab shows similar effects of CHL when compared to sham controls or non-sham controls (Takesian et al., 2012; Kotak et al., 2013), demonstrating that the neurophysiological effects of CHL were not due to the anesthesia or surgery procedures.

#### **EXPERIMENTAL GROUPS**

FM depth detection thresholds were obtained from gerbils as both juveniles and adults. Data from adults reared with NH, and adults reared with CHL were collected previously for a study on the critical period of vulnerability to hearing loss (Buran et al., 2014). Both of these groups received procedural training and 10 days of perceptual training (described below) on an FM detection task from ∼P70–P90. The new data collected in this study was obtained from the following groups: (1) Juvenile trained NH animals (*n* = 11 (2f, 9m)) received procedural training and 4 or 10 days of perceptual training on an FM detection task beginning on ∼P23, the earliest age at which animals could be weaned and placed on controlled water access; (2) Juvenile trained CHL gerbils (*n* = 13 (7f, 6m)) also received procedural training and 4 or 10 days of perceptual training on an FM detection task beginning at ∼P23. As adults (∼P70–P90), after the age at which gerbils reach sexual maturity (Field and Sibold, 1999) these animals were retested on the FM detection task; (3) Adult trained NH gerbils (*n* = 6 (6m)) received procedural training and 10 days of perceptual training on an FM detection task beginning at ∼P70. As older adults (∼P120–150) these animals were retested on the FM detection task. These animals were not exposed to handling or the testing context prior to adult testing; (4) An additional group (*n* = 6 (2f, 4m)) of adults reared with NH were obtained, and added to the data obtained previously and re-shown here (Buran et al., 2014). These animals were not exposed to handling or the testing context prior to adult testing.

#### **BEHAVIORAL TESTING APPARATUS**

Gerbils were placed in a testing cage of approximately 1 ft<sup>2</sup> that was housed within a sound isolation booth (Gretch-Ken Industries), and observed from a separate room via a closed circuit monitor. When the animal contacted both a metal footplate and lick spout, it completed a circuit that initiated water delivery via a syringe pump (Yale Apparatus). A personal computer, connected to a digital interface (Tucker-Davis Technologies, TDT RZ6), generated the acoustic stimuli, timed the water delivery (0.3 ml/min), and controlled a small current that was delivered through the metal lick spout. Auditory stimuli were delivered via a calibrated tweeter (KEF Electronics) positioned 1 m in front of the test cage at 0° elevation. Sound level was calibrated with a spectrum analyzer (Bruel and Kjaer 3550) via a 1/4″ free-field condenser microphone positioned at the head location when in contact with the lick spout.

#### **PROCEDURAL TRAINING**

All training used a conditioned avoidance Go-Nogo procedure to measure detection of FM stimuli (Heffner and Heffner, 1995; Kelly et al., 2006; Sarro and Sanes, 2010, 2011). Animals were placed on controlled water access and, upon introduction to the experimental cage, learned to obtain water from the metal lick spout. This training occurred in the presence of a continuous unmodulated 4 kHz tone. Sound level was set to 45 dB SPL for NH animals, and 95 dB SPL for animals with CHL to compensate for the elevated thresholds. These values were identical to those used in a published report on FM detection (Buran et al., 2014). Additionally, we previously found that sensation level was not correlated with FM detection threshold (Buran et al., 2014). Animals were then trained to withdraw from the spout when a FM stimulus was presented. To train the withdrawal response, a low AC current (0.5–1.0 mA, 300 ms; Lafayette Instruments) was delivered through the lick spout immediately after each FM stimulus. Since animals display between-subject variability in pain sensitivity (Mogil, 1999; Wasner and Brock, 2008; Nielsen et al., 2009) the strength of the shock was adjusted for each animal to reliably produce withdrawal from the spout, but not so great as to dissuade an animal from approaching the spout on subsequent trials (Sarro and Sanes, 2010). To train animals on the procedure, go trials (4 kHz center frequency; 5 Hz modulation rate; 500 Hz modulation depth) were presented until performance reached a criterion of ≥70% correct over 10 consecutive trials. All animals received procedural training to establish criterion performance when they were first introduced to the task as juveniles, and again when they were reintroduced to the task for assessment in adulthood.
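The procedural criterion above (≥70% correct over 10 consecutive trials) can be expressed as a sliding-window check. This is a minimal sketch rather than the acquisition software used in the study; the function name and the representation of trial outcomes as booleans are assumptions made for illustration.

```python
def reached_criterion(outcomes, window=10, min_fraction=0.70):
    """Return True once any run of `window` consecutive trials
    has at least `min_fraction` correct responses.

    `outcomes` is a sequence of booleans (True = correct trial).
    """
    for end in range(window, len(outcomes) + 1):
        recent = outcomes[end - window:end]
        if sum(recent) / window >= min_fraction:
            return True
    return False


# Hypothetical session: 3 misses, then 7 correct withdrawals in a row.
session = [False, False, False] + [True] * 7
print(reached_criterion(session))
```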

#### **PERCEPTUAL TRAINING AND ASSESSMENT OF FM DETECTION THRESHOLDS**

Once this criterion was reached, we tested animals on a range of at least 5 FM depths within each session, presented in descending order from largest to smallest. On each subsequent day of perceptual training, an animal's performance on the previous day determined the range of depths that were presented; these depths always bracketed the previous day's detection threshold. This protocol was used for all treatment groups. For juvenile perceptual training, animals were trained for either 4 or 10 days. These durations were chosen to remain consistent with the number of juvenile training days used in a prior study (Sarro and Sanes, 2011). Subsequently, we found that adults approach their best detection thresholds on the FM detection task when using at least 10 days of detection training (Buran et al., 2014). Adult animals were tested until their performance stabilized (i.e., did not improve for 3 consecutive days).
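One way to read the stabilization rule for adults (no improvement for 3 consecutive days) is sketched below. The interpretation is an assumption: performance is taken to have stabilized when none of the last three daily thresholds is lower (better) than the best threshold obtained before them. The function name and example values are hypothetical.

```python
def has_stabilized(daily_thresholds, n_days=3):
    """Return True if the last `n_days` thresholds did not improve on
    the best (lowest) threshold recorded before them.

    Lower thresholds indicate better FM detection.
    """
    if len(daily_thresholds) <= n_days:
        return False  # not enough history to compare against
    best_before = min(daily_thresholds[:-n_days])
    return min(daily_thresholds[-n_days:]) >= best_before


# Hypothetical daily thresholds (log10 Hz) for one adult animal.
print(has_stabilized([2.0, 1.8, 1.6, 1.6, 1.7, 1.65]))
```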

#### **TRIAL STRUCTURE**

Each trial was 1000 ms long, either containing the FM stimulus (go trials) or not containing a modulation of the 4 kHz center frequency (nogo trials). To determine if the animal detected the FM stimulus, contact with the spout was monitored during the final 100 ms of each go trial. A contact time of <50 ms was scored as a correct response (i.e., a hit), and a contact time of >50 ms was scored as a miss. For nogo trials, a contact time of <50 ms was scored as a false alarm, and a contact time of >50 ms was scored as a correct rejection. Go trials always occurred after a block of 3–5 nogo trials, randomized to avoid temporal conditioning.
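The scoring rules above map directly onto the four signal detection outcomes. The sketch below is illustrative only; the function name is hypothetical, and because the text does not say how a contact time of exactly 50 ms was scored, the sketch assumes it counts as remaining on the spout.

```python
def score_trial(is_go, contact_ms, cutoff_ms=50):
    """Classify one trial from spout contact time in the final 100 ms.

    Contact below the cutoff means the animal withdrew from the spout.
    A contact time equal to the cutoff is treated as staying on the
    spout (an assumption; the text leaves this case unspecified).
    """
    withdrew = contact_ms < cutoff_ms
    if is_go:
        return "hit" if withdrew else "miss"
    return "false alarm" if withdrew else "correct rejection"


print(score_trial(is_go=True, contact_ms=10))
print(score_trial(is_go=False, contact_ms=95))
```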

#### **DATA ANALYSIS**

Behavioral sensitivity, *d*′ = *z*(false alarm) − *z*(hit), was obtained for *z*-scores that corresponded to the right-tail *p*-values (Swets, 1973), and was calculated for each FM depth. Only sessions in which an animal performed a minimum of 5 trials per stimulus value were included in the analysis. Performance functions from sessions consisting of at least five presentations of five different depths were fitted using the open-source package psignifit as described in Buran et al. (2014). To ensure fits were of sufficient quality we discarded fits where the deviance of the fit to the original dataset exceeded the 95th percentile of the deviance of the fit to 1,000 simulated datasets (see Fründ et al., 2011 for details). Threshold was defined as the FM depth at which performance reached a *d*′ = 1, and slope was also calculated at a *d*′ of 1, as obtained from fitted psychometric functions (see examples shown in **Figures 1, 2**). Average threshold for each treatment group was determined by averaging the threshold for the best 3 days of testing for each animal.
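As an illustration of the sensitivity index defined above, the following sketch computes d-prime from hit and false alarm rates using right-tail *z*-scores, and averages an animal's best three daily thresholds. It uses only the Python standard library and does not reproduce the psignifit fitting step. The function names and the example rates are hypothetical, and rates of exactly 0 or 1 would need a correction that the text does not specify.

```python
from statistics import NormalDist, mean


def right_tail_z(p):
    """z-score whose right-tail probability under N(0, 1) equals p."""
    return NormalDist().inv_cdf(1.0 - p)


def d_prime(hit_rate, false_alarm_rate):
    """d' = z(false alarm) - z(hit), with right-tail z-scores.

    Both rates must lie strictly between 0 and 1.
    """
    return right_tail_z(false_alarm_rate) - right_tail_z(hit_rate)


def best_three_mean(daily_thresholds):
    """Mean of the three lowest (best) daily thresholds."""
    return mean(sorted(daily_thresholds)[:3])


# Hypothetical rates at one FM depth.
print(round(d_prime(hit_rate=0.80, false_alarm_rate=0.20), 2))
```

With right-tail *z*-scores, this expression is equivalent to the more familiar form Φ⁻¹(hit) − Φ⁻¹(false alarm), so a higher hit rate or a lower false alarm rate increases d-prime.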

plotted for NH juveniles (black bar, gray circles) and CHL juveniles (red bar; light red circles). Data points from individual animals are the average of the best 3 days of performance. Open circles indicate 4 days of perceptual training, and closed circles indicate 10 days of perceptual training. Insets: Example psychometric function fits from one NH juvenile and one CHL juvenile, taken from a single day of testing.

Statistical tests were first performed to determine whether the dependent variable was normally distributed for each treatment group (NH, CHL, juvenile-trained CHL), using a Shapiro-Wilk normality test. We found two departures from normality (slope of psychometric function for NH adults, *p* = 0.02 and juvenile-trained CHL adults, *p* = 0.007) and used Levene's test for equal variance (using the median value as an estimate of each group's center) since it is more robust when samples deviate from a normal distribution. Two of the dependent variables displayed unequal variance (Adult FM thresholds, *df* = 2, *F* = 4.6, *p* = 0.02; and slope of psychometric function, *df* = 2, *F* = 33.6, *p* < 0.001). For all multiple comparison tests, we used the nonparametric Kruskal-Wallis test followed by pairwise comparisons using a two-sided Wilcoxon test with Holm-corrected *p*-values.
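The Holm correction applied to the pairwise comparisons can be sketched as a step-down adjustment. This is a generic illustration of the Holm procedure, not the statistical software used for the analysis; the function name and the raw *p*-values in the example are hypothetical and are not results from this study.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order.

    The k-th smallest p-value (k = 0, 1, ...) is multiplied by (m - k),
    capped at 1, and adjusted values are forced to be non-decreasing
    along the sorted order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from three pairwise comparisons.
print(holm_adjust([0.01, 0.04, 0.03]))
```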

**FIGURE 2 | Training on FM detection as a juvenile improved detection thresholds assessed in adulthood**. Average FM detection thresholds (log10Hz; mean ± SEM) are plotted for NH adult (black bar, gray squares), CHL adult (red bar; red squares), CHL adult animals trained on the FM task as juveniles (red bar; red circles), and NH adult animals trained on the FM task as adults and retested at a later time (black bar, black squares). For the juvenile trained animals, open circles indicate 4 days of perceptual training, and closed circles indicate 10 days of perceptual training. Data points from individual animals are the average of the best 3 days of performance. Significant differences for between-group comparisons are presented in the text and indicated here by asterisks (**\*\*** significantly greater than all other groups; **\*** significantly smaller than all other groups). Insets: Example psychometric function fits from CHL adult animals trained on the FM task as juveniles (red circles, red line) and the NH adult animals trained on the FM task as adults and retested at a later time (black bar, black squares), taken from a single day of testing. Examples from the same set of NH adults and CHL adults are provided in a previous published report (Buran et al., 2014).

For normally distributed dependent variables (juvenile FM detection thresholds), we used a one-way ANOVA to compare the groups.

## **RESULTS**

#### **FM DETECTION ABILITY FOR NH AND CHL GERBILS IS SIMILAR DURING DEVELOPMENT**

To address whether developmental CHL influenced perception during the juvenile period, we obtained FM detection thresholds from NH and CHL animals at P23–P40. **Figure 1** plots the mean FM detection thresholds, and the performance of individual animals, with days of training indicated by the symbol type. An ANOVA reported a main effect of training duration (*F* = 35.0, *df* = 1, *p* < 0.0001), but there was neither an effect of hearing status (*F* = 3.3, *df* = 1, *p* = 0.08) nor an interaction between hearing status and training duration (*F* = 2.8, *df* = 1, *p* = 0.11). Therefore, it appears that CHL does not immediately diminish FM detection thresholds.

#### **ADULT FM DETECTION IS IMPROVED BY JUVENILE TRAINING**

We have previously shown that developmental CHL leads to degraded FM detection thresholds in adulthood (Buran et al., 2014). Here, we tested whether training on the FM detection task during juvenile development could rescue this perceptual skill in adulthood. NH and CHL animals were trained on the FM detection task from P23–P40, and retested as adults. **Figure 2** illustrates a main effect of treatment group on FM detection thresholds (Kruskal-Wallis: χ<sup>2</sup> = 31.03, *df* = 3, *p* < 0.0001). The average FM detection threshold for NH adults (1.44 ± 0.05 log10Hz) was significantly lower than for CHL adults (1.93 ± 0.06 log10Hz) (Wilcoxon test: χ<sup>2</sup> = 19.8, *df* = 1, *p* < 0.0001). Notably, the average FM detection threshold for juvenile-trained, CHL-reared animals (1.64 ± 0.07 log10Hz) was significantly better than that of CHL adults (Wilcoxon test: χ<sup>2</sup> = 10.73, *df* = 1, *p* < 0.001), and only slightly worse than the detection thresholds of NH animals (Wilcoxon test: χ<sup>2</sup> = 3.4, *df* = 1, *p* = 0.07). Moreover, the CHL animals that had been trained for 10 days (**Figure 2**, closed red circles; 1.54 ± 0.09 log10Hz) generally displayed better detection thresholds than those trained for only 4 days (**Figure 2**, open circles; 1.74 ± 0.09 log10Hz). This suggests that developmental training on FM detection rescued adult abilities for animals raised with moderate hearing loss.

Finally, we asked whether training on the FM detection task in adulthood could improve performance of NH animals when subsequently tested again. NH adults with prior adult training on the FM detection task displayed better detection thresholds (1.14 ± 0.12 log10Hz) than NH adults (Wilcoxon test: χ<sup>2</sup> = 4.77, *df* = 1, *p* < 0.05), untrained CHL adults (Wilcoxon test: χ<sup>2</sup> = 12.8, *df* = 1, *p* < 0.001), and the juvenile-trained CHL adults (Wilcoxon test: χ<sup>2</sup> = 10.1, *df* = 1, *p* < 0.01). Therefore, the effect of juvenile training on animals with CHL did not restore the highest level of performance that was observed in NH animals.

To determine whether developmental training modifies the variance of performance, the psychometric function slopes were obtained at *d*′ = 1 for the three best sessions. **Figure 3** shows a main effect of treatment group (Kruskal-Wallis: χ<sup>2</sup> = 34.3, *df* = 2, *p* < 0.0001). The average slope for NH adults (0.06 ± 0.01 *d*′/log10Hz) was significantly steeper than that found in CHL adults (0.02 ± 0.007 *d*′/log10Hz) (Wilcoxon test: χ<sup>2</sup> = 18.65, *df* = 1, *p* < 0.0001). However, the average slope for juvenile-trained, CHL-reared animals (0.17 ± 0.02 *d*′/log10Hz) was significantly steeper than that of CHL adults (Wilcoxon test: χ<sup>2</sup> = 20.4, *df* = 1, *p* < 0.0001), and also significantly steeper than that of NH animals (Wilcoxon test: χ<sup>2</sup> = 15.9, *df* = 1, *p* < 0.001). The CHL animals that had been trained for 10 days (**Figure 3**, closed red circles; 0.2 ± 0.03 *d*′/log10Hz) generally displayed steeper slopes than those trained for only 4 days (**Figure 3**, open circles; 0.13 ± 0.03 *d*′/log10Hz). This suggests that developmental training on FM detection decreases performance variability for animals raised with hearing loss.

#### **DISCUSSION**

Auditory deprivation during development can induce long-term changes to the central nervous system, and these are associated with perceptual deficits in adulthood (Sanes and Bao, 2009; Sanes and Woolley, 2011). For example, CHL that is induced prior to hearing onset results in poorer performance on both frequency and amplitude modulation detection tasks, even when audibility is compensated (Rosen et al., 2012; Buran et al., 2014). Our lab's prior work on FM detection in animals with hearing loss tested the theory that there is a critical period for the effect of hearing loss on adult perception (Buran et al., 2014). However, the developmental period also provides an opportune time to train animals on perceptual tasks. A brief period of auditory perceptual training during the juvenile period results in a long-term benefit to adult performance on the trained task (Sarro and Sanes, 2011). The current study extends this work by asking whether long-term deficits in FM detection could be rescued with a brief period of training during development. Here, we trained juvenile gerbils with CHL on an FM detection task, and found that their performance on this task in adulthood was better than that of untrained CHL animals.

The performance of CHL animals was initially assessed immediately at the termination of the training period while animals were still juveniles. At this time, the FM detection thresholds of CHL animals were similar to those displayed by NH juvenile animals (**Figure 1**). This suggests that the deficits in performance, as measured in adults (Buran et al., 2014), emerge gradually as the duration of deprivation accumulates. Because of inadequate sensory input, the animals with CHL may not display a normal trajectory of improvement. A similar phenomenon is found in children with language-based learning disorders, who display slower auditory perceptual maturation and do not attain adult levels of performance (Wright and Zecker, 2004). Consistent with this, children with hearing impairments display immature measures of cortical function compared to NH children, and these measures are only partially restored by cochlear implants, suggesting a delay in the developmental process that does not resolve following an extended duration of hearing (Ponton and Eggermont, 2001).

The results from the present study demonstrate that a brief period of FM detection training during development is sufficient to improve adult FM detection thresholds in animals with CHL, and thus possibly impede the accumulation of the deficit as a function of age. The two training regimens, 4 or 10 days, were not sufficient to restore a normal level of performance (**Figure 2**). However, our results indicate that the amount of developmental training can impact adult performance: adults that had received 10 days of developmental training had slightly better adult detection thresholds than those with only 4 days of training (**Figure 2**). The impact of juvenile training on psychometric function slopes (**Figure 3**) suggests that the limited amount of training given to the juveniles with CHL allowed these animals to become more consistent within a narrower range of FM depths. It is possible that, with more training on the FM detection task during development, these animals may have become more confident across a broader range of FM depths and eventually approached NH adult thresholds, while also modifying the slope closer to that of NH adults. Taken together, these findings suggest that perceptual learning that occurs during development can provide long-term benefits on perceptual abilities and may, in fact, be able to rescue deficits in perception.

The long-term enhancements to performance were the result of only a brief period of developmental training. Each daily session during development lasted about 10–15 min, and animals received a total of either 4 or 10 sessions. Following this training period, animals were not exposed to the training stimuli until reaching adulthood. This suggests that the practice required to improve performance may be small, at least for this task. In fact, the amount of perceptual training required to induce learning does differ, depending on the specific percept being trained (Wright and Fitzgerald, 2001; Wright and Sabin, 2007; Fitzgerald and Wright, 2011). For example, in adult humans, the number of sessions required to show dramatic improvement can be as low as one session for tasks such as amplitude modulation detection and interaural time difference detection, but as high as three sessions for interaural level difference detection (Wright and Fitzgerald, 2001; Fitzgerald and Wright, 2011). In the present study, the amount of training had an impact on the degree of improvement in adulthood (**Figure 2**), indicating that a further increase of practice could lead to even greater improvements (Hussain et al., 2009), and possibly full recovery of normal adult perception.

#### **DEVELOPMENTAL TRAINING MAY COMPENSATE FOR CENTRAL IMPAIRMENTS**

One basis for the ameliorative influence of juvenile training is that central impairments caused by CHL (Xu et al., 2007, 2010; Takesian et al., 2010, 2012, 2013) were either corrected or compensated for by learning-associated plasticity. Prior studies demonstrate that primary auditory cortex displays training-induced modifications for both frequency representation and cortical temporal processing (Recanzone et al., 1993; Weinberger and Bakin, 1998; Beitel et al., 2003; Bao et al., 2004). Moreover, behavioral training has been shown to restore impaired cortical processing in animals with developmentally induced hearing loss. For example, in prelingually deafened cats, behaviorally relevant training leads to enhanced temporal processing in auditory cortex that is close to normal levels (Beitel et al., 2011; Vollmer and Beitel, 2011). Consistent with this, congenitally deaf humans who receive cochlear prostheses early in life display improvements in speech perception and language skills, suggesting a long-term effect of exposure and use of auditory input (Busby et al., 1991; Dawson et al., 1992; Svirsky et al., 2004). Thus, modifications and improvements to an impairment of cortical function may accompany the behavioral improvements displayed by the CHL animals with developmental training.

#### **ACKNOWLEDGMENTS**

This work was supported by DC009237 (Dan H. Sanes). We thank Antje Ihlefeld and Brad Buran for help with data analysis.

#### **REFERENCES**


developmental hearing loss. *PLoS One* 7:e41514. doi: 10.1371/journal.pone.0041514


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 January 2014; accepted: 17 March 2014; published online: 04 April 2014.*

*Citation: Kang R, Sarro EC and Sanes DH (2014) Auditory training during development mitigates a hearing loss-induced perceptual deficit. Front. Syst. Neurosci. 8:49. doi: 10.3389/fnsys.2014.00049*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2014 Kang, Sarro and Sanes. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Shaping the aging brain: role of auditory input patterns in the emergence of auditory cortical impairments

## *Brishna Kamal, Constance Holman and Etienne de Villers-Sidani\**

*Department of Neurology and Neurosurgery, Montreal Neurological Institute, Montreal, QC, Canada*

#### *Edited by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

#### *Reviewed by:*

*Gregg H. Recanzone, University of California, USA Jos J. Eggermont, Hotchkiss Brain Institute, Canada*

#### *\*Correspondence:*

*Etienne de Villers-Sidani, Department of Neurology and Neurosurgery, Montreal Neurological Institute, 3801 rue University, Montreal, QC H3A2B4, Canada e-mail: etienne.de-villers-sidani@mcgill.ca*

Age-related impairments in the primary auditory cortex (A1) include poor tuning selectivity, neural desynchronization, and degraded responses to low-probability sounds. These changes have been largely attributed to reduced inhibition in the aged brain, and are thought to contribute to substantial hearing impairment in both humans and animals. Since many of these changes can be partially reversed with auditory training, it has been speculated that they might not be purely degenerative, but might rather represent negative plastic adjustments to noisy or distorted auditory signals reaching the brain. To test this hypothesis, we examined the impact of exposing young adult rats to 8 weeks of low-grade broadband noise on several aspects of A1 function and structure. We then characterized the same A1 elements in aging rats for comparison. We found that the impact of noise exposure on A1 tuning selectivity, temporal processing of auditory signal and responses to oddball tones was almost indistinguishable from the effect of natural aging. Moreover, noise exposure resulted in a reduction in the population of parvalbumin inhibitory interneurons and cortical myelin as previously documented in the aged group. Most of these changes reversed after returning the rats to a quiet environment. These results support the hypothesis that age-related changes in A1 have a strong activity-dependent component and indicate that the presence or absence of clear auditory input patterns might be a key factor in sustaining adult A1 function.

**Keywords: aging, auditory, noise, plasticity, inhibition, A1, parvalbumin, GABA**

### **INTRODUCTION**

Perceptual decline represents a universal component of the aging process across species, yet remains a poorly understood phenomenon. Perceptual deficits, particularly in the primary sensory cortices, can commonly result in difficulties identifying fine details of stimuli, as well as a reduced ability to detect signals in noise (Divenyi and Haupt, 1997; Strouse et al., 1998). However, in recent years, much progress has been made in discovering the cellular and molecular correlates of slowed sensory processing. In aged rodents, for example, deficits in auditory processing have been linked to reductions in inhibitory signaling (Seidman et al., 2002; Caspary et al., 2005), GABAergic transmission (Ling et al., 2005; Burianova et al., 2009), parvalbumin-positive (PV+) neurons (Ouda et al., 2008; Del Campo et al., 2012), and myelin (de Villers-Sidani et al., 2010; Tremblay et al., 2012). However, a recent upswing in research on cognitive training as therapy for age-related deficits has shown that many of these changes are reversible via specifically targeted training paradigms (de Villers-Sidani et al., 2010). Given its plastic nature, reduced inhibitory signaling is therefore unlikely to be the fundamental mechanism underlying age-related cognitive decline, and instead could represent a side effect of other complex neurological processes.

Hearing loss in younger animals has been extensively studied, and has led to many insights about the nature of such plasticity in the auditory system. Investigators have found strong evidence to suggest that peripheral damage leads to a downregulation of inhibitory synapses, stemming from a decrease in statistically meaningful sensory inputs from the environment (Robertson and Irvine, 1989; Kotak et al., 2005; Takesian et al., 2009, 2013). This process, termed "negative plasticity," serves to re-establish homeostasis in the auditory cortex, and may explain many of the features associated with altered processing accompanying damage to the auditory system (Syka, 2002; Takesian et al., 2009, 2011). Several authors have suggested that a similar mechanism may also contribute to age-related deficits in auditory processing, whereby altered input from the periphery (i.e., due to cochlear degeneration) leads to "noisy" information reaching the cortex (Mendelson and Ricketts, 2001; Caspary et al., 2005, 2008). Pioneering work from the Eggermont lab has shown that passive exposure to various forms of moderate-level noise is sufficient to cause dramatic changes in the A1 of adult cats, including disorganization of tonotopic indices and frequency tuning (Noreña et al., 2006; Pienkowski and Eggermont, 2010; Eggermont, 2013; Pienkowski et al., 2013). In rats, further work has shown that exposing young rats to continuous broadband noise could produce cortical patterns of broadened activation and temporal desynchronization in A1 similar to those seen in aged animals, yet with an absence of peripheral damage or degeneration (Zhou et al., 2011). Recently, similar exposure in adult rats has also been shown to distort tonotopic organization in A1 and compromise measures of pitch discrimination (Zheng, 2012).
Thus, it appears that a simple change in the statistics of sensory inputs, essentially masking the majority of auditory patterns, can powerfully induce plasticity in the adult rat A1. Little is known, however, about the consequences of chronic noise on inhibitory circuits and processes that are specifically altered by aging. Further, although Pienkowski and Eggermont (2009) have shown that certain elements of A1 frequency representation can be remediated by placing adult cats in a quiet environment after noise exposure, much remains to be known about the mechanisms of inhibition involved in attenuation of noise-induced changes through such a recovery paradigm. If negative plasticity in response to altered input is indeed responsible for age-related A1 impairments, then to what degree may other plasticity contribute to recovery of normal auditory function in the aged or damaged auditory system?

To explore these questions, we exposed a group of young adult (6 months) rats to 8 weeks of low-grade broadband random noise to chronically mask auditory patterns reaching A1. The sound intensity level chosen for this exposure (65 dB SPL) is well below the threshold for potential hair cell damage. We hypothesized that after this exposure, the animals would demonstrate impaired auditory cortex tuning selectivity and temporal processing akin to deficits previously found in older rats (Caspary et al., 2008; de Villers-Sidani et al., 2010), accompanied by changes in inhibitory neuron populations and myelin. Finally, we posited that a return to a normal auditory environment would result in a spontaneous recovery of these noise-induced deficits, due to their plastic nature.

## **METHODS**

All experimental procedures used in this study were approved by the Montreal Neurological Institute Animal Care Committee and follow the guidelines of the Canadian Council on Animal Care.

#### **SOUND EXPOSURE**

The young exposed rats were housed for 8 consecutive weeks in a sound-attenuated chamber in which continuous (24 h/day, 7 days/week) broadband random noise covering 0.1–80 kHz was presented at 65 dB SPL via a calibrated speaker. The noise was generated with custom MATLAB routines and played back via a MOTU UltraLite-mk3 Hybrid Interface sampling at 192 kHz.

#### **MAPPING THE AUDITORY CORTEX**

Fourteen female young (6 months old) and fourteen female aged (22 months old) Long-Evans rats were used for this study. For A1 mapping, the rats were pre-medicated with dexamethasone (0.2 mg/kg) to minimize brain edema. They were then anesthetized with ketamine/xylazine/acepromazine (65/13/1.5 mg/kg, i.p.) followed by a continuous delivery of isoflurane 1% in oxygen delivered via endotracheal intubation and mechanical ventilation. Vital signs were continuously recorded using a MouseOx device (Starr Life Sciences, Holliston, Massachusetts). Body temperature was monitored with a rectal probe and maintained at 37°C with a homeothermic blanket system. The rats were placed in a custom-designed head holder, holding the rat by the orbits, leaving the ears unobstructed. The cisterna magnum was drained of cerebrospinal fluid to further minimize cerebral edema. The right temporalis muscle was reflected, auditory cortex was exposed and the dura was resected. The cortex was maintained under a thin layer of silicone oil to prevent desiccation. Recording sites were marked on a digital image of the cortical surface.

Cortical responses were recorded with 32–64 channel tungsten microelectrode arrays (TDT, Alachua, FL). The microelectrode array was lowered orthogonally into the cortex to a depth of 470–600 µm (layers 4/5), where vigorous stimulus-driven responses were obtained. The extracellular neural action potentials were amplified, filtered (0.3–5 kHz), sorted, and monitored on-line. Acoustic stimuli were generated using TDT System III (Tucker-Davis Technology, Alachua, FL) and delivered in a free-field manner to the contralateral ear through a calibrated speaker (TDT). A software package (OpenEx; TDT, Alachua, FL) was used to generate acoustic stimuli, monitor cortical response properties on-line, and store data for off-line analysis. The evoked spikes of a single neuron or a small cluster of neurons were collected at each site.

Frequency-intensity receptive fields (RF) were reconstructed by presenting pure tones of 63 frequencies (1–48 kHz; 0.1 octave increments; 25 ms duration; 5 ms ramps) at eight sound intensities (0–70 dB SPL in 10 dB increments) to the contralateral ear at a rate of one tone stimulus per second. Ten minute-long trains of tone pips of 50 ms duration were presented at 5 pulses per second at a sound intensity of 70 dB SPL. Each train had a commonly occurring frequency (standard) with a probability of occurrence of 90% and pseudo-randomly distributed oddball frequency pips presented 10% of the time with no repetition. The two frequencies in the train had a constant separation of 1 octave and were chosen so they would be contained within the RF of the recorded neuron and elicit strong reliable spiking responses.
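As an illustration of how an oddball train of this kind could be laid out, the following minimal Python sketch places oddball pips pseudo-randomly at a 10% rate. It is not the authors' stimulus code: the function name, the seed, the example train length, and the reading of "no repetition" as "no two oddballs in a row" are all assumptions made for the example.

```python
import random

def oddball_sequence(n_pips, p_oddball=0.10, seed=0):
    """Return a list of 'S' (standard) and 'O' (oddball) labels.

    Assumption: "no repetition" is read as "no two consecutive oddballs".
    Exactly round(n_pips * p_oddball) oddballs are placed.
    """
    rng = random.Random(seed)
    n_odd = round(n_pips * p_oddball)
    # Draw n_odd distinct slots from a shortened range, then shift the
    # k-th smallest slot by k so that chosen positions are never adjacent.
    slots = sorted(rng.sample(range(n_pips - n_odd + 1), n_odd))
    positions = {slot + k for k, slot in enumerate(slots)}
    return ['O' if i in positions else 'S' for i in range(n_pips)]

# Hypothetical example: one minute of pips at 5 pips per second.
sequence = oddball_sequence(300)
```

The shift-by-index trick guarantees the spacing constraint without rejection sampling, so the oddball proportion is exact rather than approximate.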

The stimulus used to estimate spectro-temporal receptive fields (STRFs) is based on the stimulus used in Blake and Merzenich (2002), and was created by adding independent tone pip trains in each 1/6-octave frequency band between 0.75 and 48 kHz. Tone pips in each independent train were 50 ms long with 5 ms on and off ramps and occurred following a Poisson distribution with an average of 0.25 pip per second (average tone pip rate of 1 pip per 139 ms). The spectro-temporal density stimulus was presented continuously for 15 min.

#### **IMMUNOHISTOCHEMISTRY**

At the end of recording sessions, all rats received a high dose of pentobarbital (85 mg/kg i.p.) and were perfused intracardially with saline followed by 4% paraformaldehyde in 0.1 M phosphate-buffered saline (PBS) at pH 7.2. Their brains were removed and placed in the same fixative overnight for further fixation and then transferred to a 30% sucrose solution, snap-frozen, and stored at −80°C until sectioning. Fixed material was cut in the coronal plane along the tonotopic axis of A1, on a freezing microtome at 40 µm. Tissue was incubated overnight at 4°C in either monoclonal or polyclonal antisera (for anti-PV: #P3088; dilution: 1:10,000; Chemicon International, Temecula, CA; for anti-MBP: ab62631; dilution 1:500; ABCAM). The following day, sections were washed and incubated in secondary antisera (for PV+, Cy2: #715-545-151; dilution 1:800; Jackson ImmunoResearch; for MBP, Alexa 647; dilution 1:800, Invitrogen). Tissue from young and aged rats was always processed together in pairs during immunostaining procedures to limit variables related to antibody penetration, incubation time, and post-sectioning age/condition of tissue. A Zeiss LSM 510 Meta confocal microscope was used to assess fluorescence in the immunostained sections. Quantification of PV+ cells and MBP optical density (OD) was performed in ImageJ and MetaMorph imaging software (Molecular Devices Systems, Toronto, ON), respectively. To quantify myelin OD, digital images of A1 cortical sections were taken with a 40× objective (Zeiss LSM 510). All quantification was assessed in 300–400 µm wide A1 sectors (rostral, middle, and caudal) per hemisphere extending from layer 1 to the underlying white matter. Data were then recorded as an averaged value for each case. The experimenters performing the histological measurements reported in this study were blind to the age of the animals.

#### **ELECTROPHYSIOLOGICAL DATA ANALYSIS**

The characteristic frequency (CF) of a cortical site was defined as the frequency at the tip of the V-shaped tuning curve. For flat-peaked tuning curves, CF was defined as the midpoint of the plateau at threshold. For tuning curves with multiple peaks, CF was defined as the frequency at the most sensitive tip (i.e., with lowest threshold). Response bandwidths 10 dB above threshold of tuning curves (BW10) were measured for all sites. The CF, threshold, and BW10 were determined by using an automated routine developed in the MATLAB environment (The MathWorks Inc., Natick, MA). A1 was identified based on its rostral-to-caudal tonotopy, reliable low-latency tone-evoked neuronal responses and relatively sharp V-shaped RF (Polley et al., 2006).
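The definitions above can be made concrete with a small sketch. The Python function below is not the authors' MATLAB routine; it assumes a hypothetical boolean response grid (one row per sound level, in ascending order, one column per frequency), takes the plateau midpoint on a logarithmic frequency axis, and expresses BW10 in octaves. All of these choices are assumptions for illustration.

```python
import math

def cf_threshold_bw10(freqs_khz, levels_db, responsive):
    """Estimate CF (kHz), threshold (dB SPL) and BW10 (octaves).

    responsive[i][j] is True when the site responds to freqs_khz[j]
    presented at levels_db[i]; levels_db must be in ascending order.
    """
    hits, threshold_idx = [], None
    for i, row in enumerate(responsive):
        hits = [j for j, is_driven in enumerate(row) if is_driven]
        if hits:                      # lowest level with any response
            threshold_idx = i
            break
    if threshold_idx is None:
        return None                   # site never responded
    threshold = levels_db[threshold_idx]
    # Geometric midpoint of the responsive range at threshold; for a
    # single sharp tip the lowest and highest frequencies coincide.
    cf = math.sqrt(freqs_khz[hits[0]] * freqs_khz[hits[-1]])
    bw10 = None
    if threshold + 10 in levels_db:
        row10 = responsive[levels_db.index(threshold + 10)]
        hits10 = [j for j, is_driven in enumerate(row10) if is_driven]
        if hits10:
            bw10 = math.log2(freqs_khz[hits10[-1]] / freqs_khz[hits10[0]])
    return cf, threshold, bw10
```

For example, a site that responds only to 4 kHz at 10 dB SPL and to 2–8 kHz at 20 dB SPL would be assigned a CF of 4 kHz, a threshold of 10 dB SPL, and a BW10 of 2 octaves.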

To generate A1 maps, Voronoi tessellation (a MATLAB routine; The MathWorks) was performed to create tessellated polygons, with electrode penetration sites at their centers. Each polygon was assigned the characteristics (i.e., CF) of the corresponding penetration site. In this way, every point on the surface of the auditory cortex was linked to the characteristics experimentally derived from its closest sampled cortical site. The boundaries of the primary auditory cortex were functionally determined using published criteria (Bao et al., 2003). The normalized tonotopic axis of CF maps was calculated by rotating the map to make horizontal a linear function fit of the penetration coordinates using a least squares method. The tonotopic index (TI) was determined by computing the average minimum distance from each data point to the line connecting (0, 0) and (1, 1) after converting the logarithmic frequency range (1–48 kHz) to a linear range (0–1). We used the reverse correlation method to derive the spectro-temporal receptive field (STRF), which is the average spectro-temporal stimulus envelope immediately preceding a spike (STA) (Escabi and Schreiner, 2002). Only neurons with CFs well within the sound range of the stimulus were used. To enable comparisons between neurons, each STRF was normalized to the absolute value of peak activation of the STA. Total activation and inhibition strengths were then calculated as the integral of the positive or negative area of the STA more than 2 standard deviations away from the baseline.
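The tonotopic index described above can be sketched in a few lines. The Python function below is an illustrative reading of that description, not the authors' code: it assumes that each site's position along the tonotopic axis has already been normalized to 0–1 (a hypothetical input), maps CF onto 0–1 on a logarithmic scale, and averages the perpendicular distance of each point from the diagonal.

```python
import math

def tonotopic_index(positions, cfs_khz, f_min=1.0, f_max=48.0):
    """Mean distance of (position, CF) points from the line (0,0)-(1,1).

    positions: normalized coordinates along the tonotopic axis (0-1).
    cfs_khz:   characteristic frequencies of the same sites, in kHz.
    """
    span = math.log2(f_max / f_min)
    distances = []
    for x, cf in zip(positions, cfs_khz):
        y = math.log2(cf / f_min) / span          # log frequency mapped to 0-1
        distances.append(abs(x - y) / math.sqrt(2))  # distance to line y = x
    return sum(distances) / len(distances)
```

Under these assumptions a perfectly ordered map gives a TI of 0, and larger values indicate more scatter around the ideal logarithmic progression.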

To compute the degree of neural synchronization in A1, we computed cross-correlation (CC) functions from each electrode pair by counting the number of spike coincidences for time lags of −100 to 100 ms with 1 ms bin size. These were then normalized by dividing each bin by the square root of the product of the number of discharges in both spike trains (Brosch and Schreiner, 1999). Neural events occurring within 10 ms of each other in two channels were considered synchronous. The degree of synchronization may be correlated with spike rates in a non-linear manner. For each pair of spike trains, we estimated the number of synchronized events expected if the two spike trains were not correlated, using $N_A N_B \Delta / T$, where $N_A$ and $N_B$ are the numbers of spikes in the two spike trains, $\Delta$ (= 21 ms) is the bin size, and $T$ is the duration of the recording (Eggermont, 1992; Bao et al., 2003). The strength of the synchrony was then assessed using a *Z*-score of the number of synchronous events:

$$Z = \frac{\text{number of syn events} - N_A N_B \Delta / T}{\sqrt{N_A N_B \Delta / T}}$$

For neural synchrony recording, offline spike sorting using TDT OpenSorter (Tucker-Davis Technology, Alachua, FL) was performed to include only single units in the analysis.
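A minimal sketch of the synchrony measure follows, assuming spike times in seconds. It counts spike pairs that fall within 10 ms of each other and compares that count with the number expected for independent trains, N_A N_B Δ / T, scaled by the square root of that expectation. The function names are hypothetical, and the square-root scaling assumes Poisson-like counting statistics; this is not the authors' analysis code.

```python
import math

def count_coincidences(spikes_a, spikes_b, window_s=0.010):
    """Count spike pairs (a, b) with |t_a - t_b| <= window_s."""
    a, b = sorted(spikes_a), sorted(spikes_b)
    count, lo = 0, 0
    for t in a:
        # Advance the lower pointer past spikes that are too early.
        while lo < len(b) and b[lo] < t - window_s:
            lo += 1
        hi = lo
        while hi < len(b) and b[hi] <= t + window_s:
            hi += 1
        count += hi - lo
    return count

def synchrony_z(spikes_a, spikes_b, duration_s, window_s=0.010, delta_s=0.021):
    """Z-score of observed coincidences against the independent expectation."""
    observed = count_coincidences(spikes_a, spikes_b, window_s)
    expected = len(spikes_a) * len(spikes_b) * delta_s / duration_s
    return (observed - expected) / math.sqrt(expected)
```

Positive values indicate more coincident firing than the two firing rates alone would predict, which is what allows pairs with different spike rates to be compared.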

Normalized responses to standard and oddball tones were obtained by dividing the average firing rate recorded in the 50 ms after each tone presentation by the average firing rate observed during the 50 ms after the first standard or oddball tone in the sequence. Asymptotes for standard and oddball responses were calculated by fitting exponential functions with a least squares method to the normalized response data from each recorded neuron. The method used to quantify probability coding in A1 has been previously described in detail (Ulanovsky et al., 2003). Receiver operating characteristic (ROC) curves were calculated by plotting the true positive rate against the false positive rate for the classification of oddball vs. standard tones based on the distribution of normalized firing rates, as previously published (Britten et al., 1992; Dayan et al., 2001).
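To illustrate the ROC analysis, the sketch below sweeps a firing-rate criterion over two sets of normalized responses and records the resulting true and false positive rates. The function names and the trapezoidal area calculation are assumptions chosen for the example; the cited references should be consulted for the procedure actually used.

```python
def roc_curve(oddball_rates, standard_rates):
    """Return (fpr, tpr) lists obtained by sweeping a rate criterion.

    A response is classified as "oddball" when its normalized firing
    rate is at or above the criterion.
    """
    criteria = sorted(set(oddball_rates) | set(standard_rates), reverse=True)
    fpr, tpr = [0.0], [0.0]
    for c in criteria:
        tpr.append(sum(r >= c for r in oddball_rates) / len(oddball_rates))
        fpr.append(sum(r >= c for r in standard_rates) / len(standard_rates))
    return fpr, tpr

def area_under_curve(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))
```

An area of 0.5 corresponds to chance-level discrimination of oddball from standard responses, and an area of 1 to perfect separation.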

#### **STATISTICS**

Statistical significance was assessed using unpaired two-tailed *t*-tests with Bonferroni correction for multiple comparisons. Data are presented as mean ± standard error of the mean (s.e.m.).

#### **RESULTS**

#### **COMPARING THE EFFECTS OF NOISE EXPOSURE AND AGING ON FREQUENCY REPRESENTATION IN A1**

In the adult rat A1, neurons' RF are usually sharp, V-shaped, and possess tuning that follows a smooth rostro-caudal gradient known as the tonotopic axis (Kelly and Sally, 1988; Zhang et al., 2001; Polley et al., 2006). Natural aging is associated with a broadening of A1 RFs and a disruption of this tonotopic axis (Turner et al., 2005; de Villers-Sidani et al., 2010). We investigated here whether an 8-week low-grade broadband noise exposure was sufficient to induce similar changes in A1 frequency representation. To do so, we examined the frequency tuning characteristics of A1 neurons in young adult (Y, *n* = 6), aged (A, *n* = 7) and young adult rats exposed to low-grade noise (Y-NE, *n* = 7). Representative examples of A1 characteristic frequency (CF) tuning maps in each group are shown in **Figure 1**. Noise exposure caused a significant RF broadening, as measured with BW10 (bandwidth 10 dB above threshold, see Methods) of neurons across the frequency spectrum (18% increase in BW10 compared to young naive, *p* = 0.02–0.04, **Figure 1C**), with neurons tuned to low frequencies being slightly more affected. BW10 was also globally increased in the aged group across the frequency tuning range (38% increase on average compared to young naive, *p* = 0.002–0.01, **Figure 1C**). Low-frequency tuned neurons were also more affected in that group, with BW10 values similar to what has been previously reported in aged rats of a different strain (de Villers-Sidani et al., 2010). BW10 measures were not statistically different between the aged and noise-exposed groups (*p* > 0.2). The orderliness of frequency representation along A1's tonotopic axis was quantified using a tonotopic index (TI) that assesses the degree of scatter in frequency tuning around an ideal logarithmic tonotopic progression (Zhang et al., 2001) (see Methods). Higher TI values imply more scatter. The TI was significantly elevated in both the aged and noise-exposed groups compared to young controls (Y: 0.15 ± 0.008; Y-NE: 0.32 ± 0.03, *p* = 0.003; A: 0.26 ± 0.03, *p* = 0.03, **Figure 1D**). An examination of the frequency distribution reveals that in Y-NE, the increase in TI is primarily due to the emergence of neurons with relatively low tuning (*CF* < 6 kHz) in more rostral sectors of A1. This effect was not present in aged rats, which displayed a more homogeneous scatter in CF tuning. It should be noted that sound intensity thresholds in A1 were not significantly altered after noise exposure (*p* > 0.2). A few aged rats (<15% of those examined) showed a significant increase in cortical thresholds attributable to peripheral hearing loss (usually in the high frequency >20 kHz range). These animals were excluded from this study.

#### **SPECTRO-TEMPORAL INTERACTIONS IN THE NOISE EXPOSED A1**

The effect of natural aging on the sharpness of A1 neurons' tuning is thought to be partially due to reduced inhibitory influences from neighbouring A1 sectors. This phenomenon, also called "side-band inhibition," can be quantified by reconstructing the spectro-temporal receptive fields (STRFs) of A1 neurons using a dense broadband auditory stimulus (deCharms et al., 1998; Blake and Merzenich, 2002; Valentine and Eggermont, 2004; Noreña and Eggermont, 2005; de Villers-Sidani et al., 2010). The STRFs of neurons in each experimental group were computed using spike-triggered averaging (reverse correlation) of a "random chord" stimulus containing a spectrally and temporally dense sequence of random tone pips (see Methods). Representative STRFs obtained from each experimental group are presented (**Figure 2A**). Average inhibitory strength across each neuron population was computed after the activation peaks of the STRFs were aligned and response intensity was normalized according to the total strength of activation. Total STRF inhibitory area was, on average, 25% less in the naive aged group and young noise-exposed group, compared with the naive young group (**Figure 2B**; *p* = 0.0001–0.001). The ratio of activation to inhibition was also computed for each neuron and was found to be significantly elevated in the Y-NE and aged groups (*p* = 0.0002–0.001). This reduction in response inhibition in both groups was most apparent for stimulus frequencies less than 1 octave away from the neurons' best frequency and occurring 50–150 ms before neuron spiking. A clear broadening of the RF tuning was also observed and was reflected as an overall increase in area of activation in the STRF. This change, however, was relatively small compared to the change in inhibition (7 and 12% increase in NE and A, respectively, *p* = 0.02 and 0.03). A greater change in inhibition is also reflected in the activation to inhibition strength ratio, which was significantly increased in these two groups (**Figure 2B**). The latency to maximal activation and inhibition was also significantly reduced in the NE and A groups (average latency to maximal activation: Y: 27 ± 2 ms; Y-NE: 23 ± 2 ms, *p* = 0.002; A: 24 ± 2 ms, *p* = 0.02; average latency to maximal inhibition: Y: 113 ± 18 ms; Y-NE: 68 ± 12 ms, *p* < 0.001; A: 74 ± 12 ms, *p* = 0.02, **Figure 2C**).
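As an illustration of the reverse-correlation idea described above, the following minimal sketch computes a spike-triggered average from a binned stimulus and a binned spike train. It is not the analysis code used in this study: the function names, the bin layout, the mean subtraction, and the toy neuron are all assumptions made for the example, and the actual procedure is the one given in the Methods.

```python
import random


def spike_triggered_average(stimulus, spikes, n_lags):
    """Estimate a receptive field by reverse correlation.

    stimulus: list of time bins, each a list of per-frequency amplitudes.
    spikes:   spike count in each time bin (same length as stimulus).
    Returns strf[lag][freq]: the mean stimulus `lag` bins before a spike,
    minus the overall mean stimulus in that frequency channel.
    """
    n_bins = len(stimulus)
    n_freqs = len(stimulus[0])
    mean_stim = [sum(row[f] for row in stimulus) / n_bins for f in range(n_freqs)]
    sums = [[0.0] * n_freqs for _ in range(n_lags)]
    total_spikes = 0
    for t in range(n_lags - 1, n_bins):
        count = spikes[t]
        if count == 0:
            continue
        total_spikes += count
        for lag in range(n_lags):
            row = stimulus[t - lag]
            for f in range(n_freqs):
                sums[lag][f] += count * row[f]
    if total_spikes == 0:
        raise ValueError("no spikes available for averaging")
    return [
        [sums[lag][f] / total_spikes - mean_stim[f] for f in range(n_freqs)]
        for lag in range(n_lags)
    ]


def activation_inhibition_ratio(strf):
    """Ratio of summed positive (activation) to summed negative (inhibition) values."""
    activation = sum(v for row in strf for v in row if v > 0)
    inhibition = -sum(v for row in strf for v in row if v < 0)
    return float("inf") if inhibition == 0 else activation / inhibition


# Toy data (hypothetical): a random-chord-like stimulus with 4 frequency
# channels, and a model neuron that fires one bin after channel 2 is active.
rng = random.Random(0)
stimulus = [[1 if rng.random() < 0.2 else 0 for _ in range(4)] for _ in range(2000)]
spikes = [0] + [stimulus[t - 1][2] for t in range(1, len(stimulus))]
strf = spike_triggered_average(stimulus, spikes, n_lags=3)
print("peak at lag 1, channel 2:", round(strf[1][2], 3))
```

In this toy case the largest STRF value falls at a lag of one bin in channel 2, which is where the model neuron was built to respond. With real recordings, negative regions flanking the peak would correspond to the side-band inhibition discussed in the text.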

Corticocortical interactions in A1 were further studied by measuring CC functions and neural synchrony on the spontaneous discharge of individual A1 neurons at varying inter-electrode distances in all experimental groups. Higher CC and synchrony suggest stronger horizontal projections, indicative of more organized or efficient processing (Eggermont, 2007). The mean peak coefficient for all neuron pairs recorded at an inter-electrode distance of 0.5 mm or less was 14 and 27% lower in the noise-exposed and aged group, respectively, relative to the young controls (Y: 0.081 ± 0.008; Y-NE: 0.065 ± 0.01, *p* = 0.03; A: 0.058 ± 0.06, *p* = 0.006; **Figure 3A**). Furthermore, individual CC functions were on average 30% wider (width at half height of the peak) in the young control group compared to the noise-exposed and aged group (*p* < 0.001 in both cases). Neural synchrony measurements (see Methods) revealed that the impact of noise exposure was stronger on relatively short inter-neuronal distances (<1.25 mm), as it dissipated at distances of 1.75 mm or more (**Figure 3B**). In aged rats, however, the reduction in synchrony was significant at the longest inter-neuronal distances we could measure in A1 (2.75 mm). Noise exposure and aging both resulted in a slight but significant lag in the peak of the maximal correlation. This effect was seen for all inter-neuronal distances in the A group but only for neuron pairs separated by 1.25 mm or less in the NE group (average lag of CC function maximum: Y: 1.3 ± 0.1 ms; Y-NE: 1.6 ± 0.1 ms, *p* = 0.02; A: 1.9 ± 0.2 ms, *p* = 0.01, **Figure 3C**).
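The three quantities reported above (peak coefficient, width at half height, and lag of the peak) can be read off a normalized cross-correlation function. The sketch below shows one way to obtain them from two binned spike trains. It assumes a Pearson-style normalization and a simple contiguous-bin definition of half-height width, neither of which is specified in this section, and it uses synthetic spike trains and hypothetical function names.

```python
import math
import random


def cross_correlogram(a, b, max_lag):
    """Pearson-normalized cross-correlation of two equal-length binned spike trains.

    Returns a dict mapping lag (in bins, -max_lag..max_lag) to a coefficient.
    A positive lag means activity in `b` follows activity in `a`.
    Both trains are assumed to contain some variation (non-zero variance).
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((x - mean_b) ** 2 for x in b))
    cc = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                total += (a[t] - mean_a) * (b[u] - mean_b)
        cc[lag] = total / (norm_a * norm_b)
    return cc


def peak_lag_and_half_width(cc):
    """Return (lag of the peak, peak coefficient, width at half height in bins)."""
    peak_lag = max(cc, key=cc.get)
    half = cc[peak_lag] / 2.0
    left = peak_lag
    while (left - 1) in cc and cc[left - 1] >= half:
        left -= 1
    right = peak_lag
    while (right + 1) in cc and cc[right + 1] >= half:
        right += 1
    return peak_lag, cc[peak_lag], right - left + 1


# Synthetic example: train b is train a delayed by two bins.
rng = random.Random(1)
a = [1 if rng.random() < 0.1 else 0 for _ in range(500)]
b = [0, 0] + a[:-2]
lag, peak, width = peak_lag_and_half_width(cross_correlogram(a, b, max_lag=5))
print("peak lag (bins):", lag, "width at half height (bins):", width)
```

Multiplying the lag and width by the bin duration converts them to milliseconds, the unit used for the values reported in the text.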

#### **IMPACT OF NOISE EXPOSURE ON NOVEL STIMULUS DETECTION**

**FIGURE 3 | Reduced A1 neural synchrony in noise-exposed and aged groups. (A)** Mean cross-correlation functions for all A1 neuron pairs with inter-neuronal distances less than 0.5 mm in Y, Y-NE, and A groups. **(B)** *Z*-score of neuronal firing synchrony (see Methods) as a function of distance for site pairs for all experimental groups. **(C)** Average time lag in absolute values of the peak of the cross-correlation function for all recorded pairs. (Y, number of site pairs: *n* = 513; Y-NE, *n* = 237; A, *n* = 263).

A1 neurons have the capacity to increase the salience of tones deviating in frequency from a sequence of monotonous stimuli (Ulanovsky et al., 2003). A reduction in the responses to these "oddball" tones is thought to be directly linked to the impairment in novel stimulus detection that follows natural aging (Ulanovsky et al., 2003; de Villers-Sidani et al., 2010). We compared the effect of noise exposure on deviant stimulus detection in A1 by presenting trains of identical, repetitive tones (standard or "distractors") and introducing occasional deviant (oddball) frequencies. Exponential functions were fitted to the normalized response rates of A1 neurons to oddballs and standard tones in all experimental groups. This provided a quantitative measure of maximal suppression (asymptote, normalized units) of background tones and their separability from oddball tones (**Figure 4**). We found a significant reduction in the average suppression of responses to standard tones (mean standard asymptote of normalized response rate: Y, 0.20 ± 0.01; Y-NE, 0.38 ± 0.05, *p* < 0.001 relative to Y; A, 0.36 ± 0.07, *p* < 0.01 relative to Y) in both the noise-exposed and aged groups. No significant difference in the overall magnitude of responses to oddballs was found in either group (*p* > 0.2). Overall, the effect of noise exposure and aging translated into a diminished response gap between standards and oddballs (asymptote *difference* between oddballs and standards; Y: 0.43 ± 0.06; Y-NE: 0.21 ± 0.08, *p* < 0.01 relative to Y; A: 0.22 ± 0.06, *p* < 0.01 relative to Y).
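To make the notion of an asymptote concrete, the sketch below fits a single-exponential decay to a sequence of normalized response rates and returns the fitted asymptote. The model form r(k) = a + b·exp(−k/τ), the grid search over τ, and the synthetic data are assumptions chosen for illustration; the exact fitting procedure used in the study is described in the Methods and may differ.

```python
import math


def fit_exponential_decay(responses, taus):
    """Fit r[k] = a + b * exp(-k / tau) to responses indexed by tone position k.

    For each candidate tau, a (asymptote) and b (amplitude) follow from
    ordinary least squares; the tau with the smallest squared error wins.
    Returns (a, b, tau).
    """
    n = len(responses)
    mean_y = sum(responses) / n
    best = None
    for tau in taus:
        x = [math.exp(-k / tau) for k in range(n)]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, responses))
        b = sxy / sxx
        a = mean_y - b * mean_x
        sse = sum((a + b * xi - yi) ** 2 for xi, yi in zip(x, responses))
        if best is None or sse < best[0]:
            best = (sse, a, b, tau)
    return best[1], best[2], best[3]


# Synthetic, noise-free responses to a repeated standard tone that decay
# from 1.0 toward an asymptote of 0.2 (values chosen only for illustration).
responses = [0.2 + 0.8 * math.exp(-k / 5.0) for k in range(40)]
asymptote, amplitude, tau = fit_exponential_decay(responses, [float(t) for t in range(1, 21)])
print("asymptote:", round(asymptote, 3), "tau (tones):", tau)
```

Applied separately to the oddball and standard responses of a neuron, the difference between the two fitted asymptotes gives a response gap of the kind compared across groups above.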

The difference in novel stimulus detection was quantified in the experimental groups by performing receiver operating characteristic (ROC) analyses on response rates to oddballs and standards (see Methods, **Figure 4C**). This measure gives an indication of how an ideal observer would discriminate between the occurrences of an oddball or standard tone, solely based on the response magnitude of the recorded neuron. In young adult rats, the probability of a reliable discrimination (70% of correct classification, measured by the area under the ROC curve) was reached on average once 15 tones had been presented in the sequence (average successful classification rate probability: 0.71 ± 0.04), and was maintained thereafter. By contrast, for the aged and noise-exposed groups, this value remained significantly lower, under the detection criterion, even after the presentation of 200 tones in the oddball sequence (Y-NE: 0.60 ± 0.05, *p* = 0.04; A: 0.59 ± 0.07, *p* = 0.03).
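The area under the ROC curve has a simple interpretation that can be computed directly: it is the probability that a randomly chosen oddball response exceeds a randomly chosen standard response, with ties counted as one half. The sketch below implements that pairwise definition. The function name and the response values are hypothetical, and the study's own ROC procedure is the one described in the Methods.

```python
def roc_auc(oddball_rates, standard_rates):
    """Area under the ROC curve for separating oddball from standard responses.

    Computed as the fraction of (oddball, standard) pairs in which the
    oddball response is larger, counting ties as half a pair.
    """
    wins = 0.0
    for o in oddball_rates:
        for s in standard_rates:
            if o > s:
                wins += 1.0
            elif o == s:
                wins += 0.5
    return wins / (len(oddball_rates) * len(standard_rates))


# Hypothetical response rates (spikes/s) of one neuron.
oddballs = [5, 6, 7, 8]
standards = [1, 2, 6, 3]
auc = roc_auc(oddballs, standards)
print("AUC:", auc, "meets 0.70 criterion:", auc >= 0.70)
```

An AUC of 0.5 corresponds to chance discrimination and 1.0 to perfect separation, so the 70% criterion in the text sits between the two.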

#### **REDUCTION IN PV+ CELLS AND MYELIN IN THE A1 OF NOISE EXPOSED AND AGED RATS**

**noise-exposed group. (A)** Representative normalized responses of individual A1 neurons in the three experimental groups to standards (black line) and deviant tones or "oddballs" (gray line) as a function of tone position in the stimulus sequence. The red dotted line represents the asymptote of the progressively suppressed response to standards. **(B)** Average asymptote computed for the response to oddballs and standards in all groups. Note the reduction in the difference between groups (height of red vertical lines). **(C)** Average receiver operating characteristic curves computed from responses of individual neurons to oddball tone trains in all groups and for different time points in the tone sequence. AUC stands for "area under the curve." Note that Y-NE and A groups do not reach the discrimination criterion (70%) (Y, number of neurons recorded = 275; Y-NE = 208; A = 245). <sup>∗</sup>*p* < 0.05, ∗∗*p* < 0.01: *t*-test.

Parvalbumin positive (PV+) cortical neurons are part of a group of inhibitory inter-neurons that play an important role in stimulus selectivity and novel stimulus detection in sensory cortices (Beierlein et al., 2000; Fries et al., 2002; Lee et al., 2012), and aging is associated with a reduction in their numbers in the rat A1 (de Villers-Sidani et al., 2010). A progressive decline in brain myelin is also observed in the aging rat and human brain A1 (Itoyama et al., 1980; Steen et al., 1995; de Villers-Sidani et al., 2010), and is also thought to contribute to age-related cognitive decline (Peters, 2002). To determine if low-grade noise exposure can mimic the effect of aging on these structural elements of the cortex, we measured the density of PV+ and GABA+ cells and myelin basic protein (MBP) in A1 in our 3 experimental groups using standard immunofluorescence techniques (**Figure 5A**). Noise exposure resulted in a significant 30% reduction in PV+ cells in A1 (all layers pooled, *p* < 0.001), which was equivalent to the reduction observed in naive aged relative to young controls (25%, *p* < 0.01) (**Figure 5B**). The global count of inhibitory interneurons in A1 as evidenced by staining for GABA was also significantly lower in noise-exposed rats by 20% compared to young controls (*p* < 0.01, **Figure 5B**), but this was not the case for aged rats (Y: 8.73 ± 0.72 vs. 8.0 ± 0.58 GABA+/hpf, *p* = 0.2). Staining for MBP was also significantly reduced in A1 following noise exposure (44% relative to Y, *p* = 0.04), which was similar to what was observed in the aged group (38%, *p* = 0.002 relative to Y).

#### **EXPERIENCE-DEPENDENT REVERSIBILITY OF NOISE-INDUCED AND AGE-RELATED A1 CHANGES**

To examine the reversibility of functional and structural noise-induced changes in A1, we placed a group of young adult rats (6 months old, *n* = 5) previously exposed to noise for 8 weeks in a quiet, noiseless auditory environment for an additional 8 weeks. We then mapped this group's A1 responses to the same stimuli used in the noise-exposed group, and quantified PV+ cell populations and myelin density. Average breadth of RFs in A1 was significantly less than after noise exposure but still slightly higher than in young controls (average BW10 Young-R vs. Y: 1.21 ± 0.11 vs. 0.95 ± 0.14, *p* = 0.05; Y-NE: 1.46 ± 0.11, *p* < 0.01, **Figure 6B**). A similar finding was obtained for the orderliness of the frequency representation gradient in A1, where the time spent in the noiseless environment did not appear to be sufficient for the tonotopic axis to return to normal (average TI Young-R vs. Y: 0.21 ± 0.02 vs. 0.15 ± 0.03, *p* = 0.03; Y-NE: 0.27 ± 0.03, *p* = 0.02, figure not shown). For all other functional measures examined, however, we found a complete normalization of A1 responses after discontinuation of noise exposure (figure not shown). These include STRF inhibitory and excitatory areas (*p* = 0.11 and *p* = 0.47, respectively), local neural synchrony (inter-neuronal distances < 0.5 mm, *p* = 0.32) and the asymptote of the response to recurrent tones in oddball sequences (*p* = 0.45). PV+ counts and myelin density also recovered partially in this group as shown in **Figure 6**.

## **DISCUSSION**

Based on the results of our experiments, it appears that a prolonged masking of auditory input patterns with low-grade noise may be sufficient to cause numerous alterations in A1 normally associated with aging, such as tonotopic re-organization of A1 previously demonstrated by Zheng (2012) in a similar paradigm with younger animals. These changes, perhaps, represent the result of a form of negative plastic compensation akin to corticothalamic excitatory patterns accompanying induced hearing loss (Syka, 2002; Kotak et al., 2005), whereby inhibitory activity is reduced to compensate for "noisy" information from the environment. In older humans, it has been hypothesized that similar compensatory mechanisms may be linked to broadened patterns of cortical and subcortical activation observed by functional neuroimaging paradigms (Reuter-Lorenz and Cappell, 2008; Reuter-Lorenz and Park, 2010). However, these changes do not appear to be of a purely neurodegenerative nature in our experiment, as many aspects of normal auditory processing were recovered by young rats returned to a normal auditory environment.

Adult rats exposed to our broadband stimulus demonstrated significantly altered spectral and temporal processing profiles, as well as anatomical changes mirroring those previously found in aged animals (Mendelson and Ricketts, 2001; Caspary et al., 2008; de Villers-Sidani et al., 2010), and young adults exposed to continuous white noise (Zheng, 2012). To some extent, these patterns also mimic the diffuse excitatory nature of signaling of early critical periods (de Villers-Sidani et al., 2008; de Villers-Sidani and Merzenich, 2011). In fact, previous work using a very similar exposure paradigm in adult rats was shown to restore many characteristics of the critical period for noise exposure (Zhou et al., 2011), and extend it beyond its normal period of closure in young animals (Chang and Merzenich, 2003; Zhou et al., 2008; Zhou and Merzenich, 2009). In both cases, although the authors did not specifically test for abnormalities commonly associated with aging (i.e., response to oddball sequences), exposed animals demonstrated a BW10 profile very similar to that found in our results, pointing toward the capacity of unpatterned noise to provoke plasticity leading to widespread, non-specific excitation.

Despite several studies examining the impact of various forms of repetitive noise exposure on auditory processing, it is not yet clear by which mechanism this type of broadened excitation and diffuse responsiveness manifests itself. It appears that the phenomenon is primarily a cortical process (Ulanovsky et al., 2003), and thus, likely involves changes over several types of neuronal populations. For example, sustained excitation during recurrent sounds could stem from reduced suppression of recurrent excitatory input, perhaps from changes to the network of inhibitory interneurons in the cortex. Alternatively, the excitation could represent the failure of cells to reduce individual responses to incoming stimuli, as seen during stimulus-specific adaptation (Ulanovsky et al., 2003, 2004). Pienkowski and Eggermont (2012) have further suggested that such inhibitory changes could, in turn, impact inhibition on neighboring cortical fields, causing widespread changes in excitability across a major portion of A1. In either case, however, our STRF results are suggestive of large-scale plastic changes leading to deterioration of STRFs and weakening of strong post-activation suppression and sideband inhibition, previously found in aged rats (de Villers-Sidani et al., 2010). Similar mechanisms may also have played a role in our young rats' oddball response and BW10 profile, suggesting that during the course of exposure, large-scale, coordinated mechanisms of plasticity were mobilized. The reduction in postactivation suppression we observed after noise exposure is most likely due to a decrease in cortical inhibition (Eggermont, 1999; Wehr and Zador, 2005). 
Fast spiking interneurons (majority PV+) could be implicated more specifically in this deficit due to their influence on receptive field shape and strong gating of noncoincident inputs (Beierlein et al., 2000; Fries et al., 2002; Cardin et al., 2009; Lee et al., 2012), but the role of other inhibitory cells in mediating this noise-induced change cannot be excluded. For example, somatostatin positive inhibitory interneurons have been shown to modulate the gain of repetitive excitatory stimuli in hippocampal neurons (Kozhemyakin et al., 2013). Their exact role in A1 processing remains, however, elusive.

Along with the emergence of changes to spectrotemporal processing in A1, noise exposure also had an effect on the population of PV+ inhibitory interneurons and cortical myelin, which had both previously been shown to be reduced in the aged cortex (de Villers-Sidani et al., 2010). In the aged group, the total GABA+ cell count was not significantly decreased. Since all PV+ cells are also GABA immunoreactive, we interpret this as a reduction in PV expression with aging rather than cell death. An upregulation of PV+ expression in the aged A1 after auditory training also supports this idea (de Villers-Sidani et al., 2010). Noise exposure, however, led to a decrease both in GABA and PV expression, suggesting a role for reduced inhibitory neurotransmission in response to environmental noise. As with aging, noise also caused a reduction in MBP. The functional impact of reduced A1 MBP is unclear given the fact that we observed an overall reduction in tone-evoked latencies, which are also strongly influenced by inhibition (Calford and Semple, 1995). On the other hand, the increased lag in peak cortical synchronization we observed could in theory be secondary to a demyelination of cortico-cortical projections. The fact that PV and MBP expression normalized after the return to a quiet environment again indicates that this might occur here too through a down-regulation of protein production rather than actual cell death. Why GABA is reduced after noise exposure, yet not with aging, is still unclear, and the physiological consequences of reduced PV and MBP expression also remain to be clarified. However, in human populations, it is known that cortical inhibitory transmitters can change in complex ways throughout the lifespan (Sundman-Eriksson and Allard, 2006; Pinto et al., 2010), potentially contributing to changes in sensory processing in old age.
Thus, they represent an exciting target for research on aging, and experiments are currently underway in our laboratory to explore this area.

Apart from examining the link between effects of low-grade noise exposure and age-related auditory deficits, this work has also provided some important insight on how plasticity in A1 may be mobilized to favor recovery from, and even reversal of, such deficits. In humans, there is a growing literature centered around the use of training, or other enrichment strategies, to mediate age-related auditory cognitive decline (Chisolm et al., 2003; Alain et al., 2013; Anderson et al., 2013). However, many of the physical mechanisms of this recovery process remain to be elucidated, leaving room for experiments such as ours to determine what electrophysiological correlates are (and are not) subject to experience-dependent plasticity. Primarily, we have shown that noise-exposed young rats can regain A1 function more typical of their age group if returned to a more normal acoustic environment. However, it is still puzzling that despite improvements in BW10, oddball response and local firing synchrony, the TI of these animals failed to normalize. This result parallels the work of Pienkowski and Eggermont (2009), who had previously noted a similar disruption of tonotopic organization in the A1 of cats exposed to noise, despite 12 weeks in a quiet recovery environment. While the exact reasons for this discrepancy remain unknown, it is possible that the tonotopic axis is most sensitive to non-patterned broadband noise, and requires a spectrally enriched, rather than quiet, recovery environment to return to normal. In any case, more work needs to be done to better characterize flexibility of the tuning curve across different exposure paradigms, particularly examining the time course of changes in the tonotopic map during recovery.

Questions additionally remain about the effect of enrichment on the auditory cortices of aged rats. Previously, several studies examining the effect of auditory enrichment on young animals have found it to improve many aspects of spectral and temporal processing (Engineer et al., 2004; Percaccio et al., 2005; Jakkamsetti et al., 2012). If increased environmental "noise" can, indeed, induce a form of negative plasticity as suggested by our results in noise-exposed young rats, then perhaps many age-related auditory processing deficits can be linked to a similar, yet more robust plastic response in the aging brain. This possibility certainly merits further study, and is currently being examined via several new ongoing projects.

Finally, further experiments are needed to define the precise nature of events precipitating impaired cortical processing in the aging brain. It is still unclear whether reduction in incoming regular sensory patterns [i.e., from peripheral degeneration such as the well-documented degeneration of cochlear hair cells (Crowley et al., 1972, 1973; Soucek et al., 1987; Seidman et al., 2002)] or "internal" noise caused by reduced inhibitory neurotransmitter cell populations in the cortex and/or brainstem leads to more significant decline. In the case of the latter scenario, it is additionally possible that decreased inhibition could set off a vicious cycle, at first compensating in an adaptive manner for altered input but subsequently resulting in the widespread desynchronization and slower temporal processing seen in our experiment. Though the development of an "aged" brain is doubtless a complex process, our experiments have provided a novel method to disentangle these factors via selective exposure/enrichment paradigms, and future work in our laboratory will aim to further explore the many facets of neurological change and adaptation in the aging brain.

In conclusion, environmental noise appears to have a powerful influence on creating auditory processing deficits akin to those in aged animals. This work supports the hypothesis that age-related auditory decline represents a form of negative plasticity, compensating for low-quality information arriving at the cortex (i.e., from damage to the cochlea or auditory pathways), and in the process, fostering diffuse and maladaptive patterns of excitation. Unfortunately, our experiment did not examine the behavioral correlates, but using a similar noise exposure procedure, recent work by Zheng (2012) has demonstrated that exposed rats had significantly impaired fine pitch discrimination, yet could adapt to perform behavioral tasks in noisy environments better than non-exposed controls. In addition, a small body of literature in human populations exposed to long-term broadband noise has hinted that this type of exposure may be associated with a selection of cognitive impairments, ranging from difficulty with pitch discrimination to impairments in memory and attention (Gomes et al., 1999; Pawlaczyk-Łuszczyńska et al., 2005). Many other neuropsychiatric disorders are thought to result in a similar type of disorganized or "noisy" auditory processing, such as schizophrenia (Kim et al., 2009; Shin et al., 2012) or autism (Siegal and Blades, 2003; Kern et al., 2006), suggesting that similar mechanisms of plasticity may exist as in the aging brain. Fortunately, our work also suggests that these deficits are neither irreversible nor purely degenerative, which could have important implications for the remediation of brain processing impairments in these disorders. It is our hope that exposure-based paradigms such as ours may prove useful in modeling age- or disease-related deficits in cortical processing, and that we may continue to gain a better understanding of changes in plasticity, and provide strategies to promote cognitive health across the lifespan.

## **AUTHOR CONTRIBUTIONS**

Brishna Kamal and Etienne de Villers-Sidani designed the experiments. Brishna Kamal performed the experiments. Etienne de Villers-Sidani and Brishna Kamal performed analysis. Constance Holman, Brishna Kamal, and Etienne de Villers-Sidani wrote the paper.

## **ACKNOWLEDGMENTS**

The authors would like to thank Lydia Ouellet for her technical support during these experiments, as well as Sydney Lee, Zahra Kamal, and Salvador Vergara Lopez for their help with histological analysis. This work was supported by Canadian Institutes for Health Research (CSA phase two II), the Fonds de Recherche en Santé du Québec (Bourse d'établissement de Jeune Chercheur) and the Canadian Foundation for Innovation (CFI infrastructure grant 28121).

#### **REFERENCES**


adult auditory cortex. *J. Neurosci.* 23, 10765–10775.

cells induces gamma rhythm and controls sensory responses. *Nature* 459, 663–667. doi: 10.1038/nature08002

retards auditory cortical development. *Science* 300, 498–502. doi: 10.1126/science.1082163

*Ear Hear.* 18, 189–201. doi: 10.1097/00003446-199706000-00002

M., Szymczak, W., and Śliwińska-Kowalska, M. (2005). The impact of low-frequency noise on human mental performance. *Int. J. Occup. Med. Environ. Health* 18, 185–198.

aging mind: a new look at old problems. *J. Gerontol. Psychol. Sci.* 65B, 205–215. doi: 10.1093/geronb/gbq035

*Am.* 104, 2385–2399. doi: 10.1121/1.423748

*Neuroscience* 154, 390–396. doi: 10.1016/j.neuroscience.2008.01.026

Zhou, X., Panizzutti, R., de Villers-Sidani, E., Madeira, C., and Merzenich, M. M. (2011). Natural restoration of critical period plasticity in the juvenile and adult primary auditory cortex. *J. Neurosci.* 31, 5625–5634. doi: 10.1523/JNEUROSCI.6470-10.2011

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 06 August 2013; accepted: 27 August 2013; published online: 17 September 2013.*

*Citation: Kamal B, Holman C and de Villers-Sidani E (2013) Shaping the aging brain: role of auditory input patterns in the emergence of auditory cortical impairments. Front. Syst. Neurosci. 7:52. doi: 10.3389/fnsys.2013.00052*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2013 Kamal, Holman and de Villers-Sidani. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Neural correlates of moderate hearing loss: time course of response changes in the primary auditory cortex of awake guinea-pigs

## *Chloé Huetz, Maud Guedin and Jean-Marc Edeline\**

*Centre de Neurosciences Paris-Sud, CNRS, UMR 8195, Université Paris-Sud, Orsay, France*

#### *Edited by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

#### *Reviewed by:*

*Arnaud Norena, Université de Provence, France Simone Kurt, Hannover Medical School, Germany*

#### *\*Correspondence:*

*Jean-Marc Edeline, Centre de Neuroscience de Paris-Sud, CNRS, UMR 8195, Université Paris-Sud, Bâtiment 446, Orsay 91405, France e-mail: jean-marc.edeline@u-psud.fr*

Over the last decade, the consequences of acoustic trauma on the functional properties of auditory cortex neurons have received growing attention. Changes in spontaneous and evoked activity, shifts of characteristic frequency (CF), and map reorganizations have extensively been described in anesthetized animals (e.g., Noreña and Eggermont, 2003, 2005). Here, we examined how the functional properties of cortical cells are modified after partial hearing loss in awake guinea pigs. Single unit activity was chronically recorded in awake, restrained guinea pigs from 3 days before up to 15 days after an acoustic trauma induced by a 5 kHz 110 dB tone delivered for 1 h. Auditory brainstem response (ABR) audiograms indicated that these parameters produced a mean ABR threshold shift of 20 dB SPL at, and one octave above, the trauma frequency. When tested with pure tones, cortical cells showed on average a 25 dB increase in threshold at CF the day following the trauma. Over days, this increase progressively stabilized at only 10 dB above control value, indicating a progressive recovery of cortical thresholds, probably reflecting a progressive shift from temporary threshold shift (TTS) to permanent threshold shift (PTS). There was an increase in response latency and in response variability the day following the trauma but these parameters returned to control values within 3 days. When tested with conspecific vocalizations, cortical neurons also displayed an increase in response latency and in response duration the day after the acoustic trauma, but there was no effect on the average firing rate elicited by the vocalization. These findings suggest that, in cases of moderate hearing loss, the temporal precision of neuronal responses to natural stimuli is impaired despite the fact that the firing rate showed little or no change.

**Keywords: acoustic trauma, vocalization, animal, spike timing, tuning curve, single unit recording**

#### **INTRODUCTION**

Over the last decade, an increasing number of studies have described the reorganizations occurring in the adult auditory system after partial hearing loss induced either by traumatic noise or by partial lesions of the cochlea (e.g., Robertson and Irvine, 1989; Kamke et al., 2003; Noreña and Eggermont, 2003, 2005; Rosen et al., 2012). At the thalamocortical level, electrophysiological studies have documented that exposure to loud, traumatic sounds generating partial hearing loss produced alterations in frequency tuning (Rajan, 1998, 2001; Kimura and Eggermont, 1999; Noreña and Eggermont, 2003; Scholl and Wehr, 2008; Gourévitch and Edeline, 2011) and tonotopic map reorganizations (Noreña and Eggermont, 2005). These changes in functional properties may result from physiological modifications already occurring at subcortical levels (Wang et al., 1996, 2002; Kamke et al., 2003; Vale et al., 2003) and/or from morphological alterations of cortical cells such as modifications in dendritic morphology and in dendritic spine numbers (Fetoni et al., 2013).

In all but three cases (Sun et al., 2008; Noreña et al., 2010; Rosen et al., 2012), the electrophysiological experiments performed so far after hearing loss have assessed the functional properties of auditory cortex neurons under general anesthesia a few hours (Kimura and Eggermont, 1999; Noreña and Eggermont, 2003) to a few days (Rajan, 1998, 2001) or months (Robertson and Irvine, 1989; Gourévitch and Edeline, 2011) after hearing loss. The apparent discrepancy between all these results might be simply due to the use of different delays between the beginning of hearing loss and the time at which the recordings are collected. More precisely, if neuronal activity is collected shortly after hearing loss, it will reveal a neuronal correlate of temporary threshold shift (TTS), whereas recording neural activity months after hearing loss will reveal a correlate of permanent threshold shift (PTS) with a potential contribution of functional reorganizations occurring at the cortical and subcortical levels.

In the present study, we aimed at following the time course of cortical changes from day to day after an acoustic trauma. Single neuron activity was recorded from micro-electrodes chronically implanted in the primary auditory cortex of awake guinea pigs and the responses of cortical neurons were tested from 3 days before to 15 days after the acoustic trauma. On a subset of cells, we also recorded responses to conspecific vocalizations before and up to 3 days after the acoustic trauma.

## **MATERIALS AND METHODS**

#### **SUBJECTS AND SURGERY**

Experiments were performed on 10 adult (3–6 month old, 5 males, 5 females) pigmented guinea pigs (390–650 g) with a national authorization N° 91–271 to conduct animal research and protocol specifically approved by the CNRS and Paris-Sud University (CEEA, Ethic committee N° 59). The animals were housed in a colony room and were grouped by four or five animals in large plastic cages (75 × 55 × 25 cm; Tecniplast, Buguggiate, Italy) with large wire mesh doors (55 × 20 cm). All animals frequently emitted vocalizations during social interactions with the other animals of the same cage and loudly vocalized during animal care and feeding. We did not notice obvious changes in behavior or in amount of emitted vocalizations before vs. after the acoustic trauma.

Initially, the animals underwent a sterile surgery under anesthesia (atropine 0.08 mg/kg, diazepam 8 mg/kg, pentobarbital 20 mg/kg; see Evans, 1979). Three silverball electrodes were inserted between the bone and dura: one was used as a reference during the recording sessions; the other two, placed over the frontal and parietal cortices, served to monitor the cortical electroencephalogram (EEG) during the subsequent recording sessions. A large opening was made in the temporal bone and very small slits (200 μm) were made in the dura mater under microscopic control. A diagram of the vasculature pattern was drawn and the primary field (AI) location was estimated based on those observed in our previous studies (Edeline and Weinberger, 1993; Manunta and Edeline, 1999; Edeline et al., 2001; Huetz et al., 2009). A coarse mapping of the cortical surface was made to confirm the location of AI: neuronal clusters were recorded with low impedance (<1 MΩ) electrodes until a progression from low to high frequency was observed in the caudo-rostral direction (Wallace et al., 2000; Gaucher et al., 2012). An array of 5 tungsten electrodes (∼1.0 MΩ at 1 kHz, spaced 200–300 μm in the rostro-caudal axis) was slowly inserted in the auditory cortex under electrophysiological control. Starting from 600 μm below the pia, responses to pure tone frequencies were tested at regular depths to optimize the strength of evoked responses; the final placement depth of the electrodes ranged from 800 to 1250 μm, which corresponds to cortical layers III/IV (Wallace and Palmer, 2008).

A dental acrylic cement pedestal, including two cylindrical threaded tubes, was built to allow for atraumatic fixation of the animal's head during the subsequent recording sessions. An antiseptic ointment (Cidermex, neomycine sulfate, Rhone-Poulenc Rorer) was liberally applied to the wound around the pedestal. An injection of a non-steroidal anti-inflammatory drug (Tolfedine 0.1 mg/kg) was given at the end of the surgery and on the 2 following days. All surgical procedures were performed in compliance with the guidelines determined by the National (JO 887-848) and European (86/609/EEC) legislations on animal experimentation, which are similar to those described in the *Guidelines for the Use of Animals in Neuroscience Research of the Society for Neuroscience.* Regular inspections of our laboratory by accredited veterinarians designated by the CNRS and Paris-Sud University confirmed that care was taken to maximize the animals' health and comfort throughout the different phases of the experiment.

#### **RECORDING PROCEDURES**

Three days after surgery, each animal was adapted to restrained conditions in an acoustically isolated chamber (IAC, model AC2) for several days. The animal was placed in a hammock with the head fixed for increasing periods of time (from 10–20 min to 1–2 h per day). The animal was also accustomed to hearing sequences of pure tone bursts as well as the different vocalizations used subsequently to test the neuronal responses. At least 4 days of adaptation to restrained conditions were allowed before the collection of neuronal recordings.

The recording procedures were the same as in previous studies (Edeline et al., 2000, 2001; Huetz et al., 2009). The signal from the electrode was amplified (gain 10000; bandpass 0.6–10 kHz), then multiplexed in an audio monitor and a voltage window discriminator. The action potential waveforms and the corresponding TTL pulses generated by the discriminator were digitized (50 kHz sampling rate, Superscope, GW Instruments), visualized on-line and stored for off-line analyses. The pulses were sent to the acquisition board (PClab, PCL 720) of a laboratory microcomputer, which registered them with a 50 μs resolution and provided online displays of the neuronal responses. For each animal, the signal from each electrode was tested daily and data collection only started when, under 1–3 electrodes, action potential waveforms could be unambiguously attributed to a single neuron.

#### **AUDIOMETRY AND EXPOSURE TO THE TRAUMATIC SOUND**

Auditory brainstem responses (ABRs) were recorded as previously described (Gourévitch et al., 2009; Gourévitch and Edeline, 2011). Briefly, ABRs were recorded via subcutaneous electrodes (SC25, Neuro-Services), using a Centor-USB interface and software (DeltaMed, France). The signal was filtered (0.2–3.2 kHz, sampling rate 100 kHz), and waveforms were averaged (500–1000 waveforms depending on the stimulus intensity) and stored for off-line analyses on a computer. An artifact rejection procedure was used during averaging, the rejection criterion being ±40 μV. The stored waveforms were examined and the threshold was defined as the lowest level (dB re: 20 μPa) at which a clear waveform could be observed. Intensity levels were always below 90 dB in order to avoid inducing additional hearing loss in traumatized animals.
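The averaging step with artifact rejection can be illustrated with a minimal sketch. It assumes that a sweep is discarded entirely as soon as any of its samples exceeds the ±40 μV criterion; the text does not state how rejection was applied, and the function name `average_with_rejection` is hypothetical rather than part of the Centor-USB software.

```python
def average_with_rejection(sweeps_uv, reject_uv=40.0):
    """Average ABR sweeps (in microvolts), skipping artifact-contaminated ones.

    Assumption: a sweep is rejected if ANY sample exceeds +/- reject_uv.
    Returns the averaged waveform and the number of sweeps kept.
    """
    kept = [s for s in sweeps_uv if max(abs(v) for v in s) <= reject_uv]
    if not kept:
        return [], 0
    n_samples = len(kept[0])
    average = [sum(s[k] for s in kept) / len(kept) for k in range(n_samples)]
    return average, len(kept)


# Toy example: the third sweep contains a 100 uV artifact and is discarded.
sweeps = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [100.0, 0.0, 0.0]]
waveform, n_kept = average_with_rejection(sweeps)
```

The ABR threshold itself was determined by visual inspection of the stored averages, so it is not automated here.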

Hearing loss was induced by a single 1 h exposure to a loud sound. The animals, placed individually in a wire mesh cage (23 × 23 × 15 cm), were exposed to a traumatic tone (pure tone of 5 kHz at 110 dB SPL) in an acoustically isolated chamber (IAC model AC2). The pure tone was generated by a wave generator (Hewlett-Packard, model HP 8903B), amplified (Prism-Audio, model LA-150M) and sent to two piezoelectric tweeters (Motorola, model KSN 1005) located on each side of the cage. The sound delivery system was calibrated to obtain 110 ± 10 dB at various locations in the cage using a calibrated type I precision sound level meter (B&K model 2235). A video camera, installed in the acoustic chamber, allowed visualizing the animal during exposure and checking that there was no preferred orientation of the animal relative to the two speakers. During the first 5 minutes of exposure, freezing behavior was observed in most of the animals (*n* = 8), then the animals moved toward a corner of the cage and stayed there for the rest of the exposure. In the two other cases, the animal explored the cage and moved during the first 5 minutes, then stayed at the same location in the cage. During the rest of the 1 h exposure, we did not observe particular signs of stress, panic, or abnormal behavior. The amount of excretion (urine and feces) found in the cage at the end of the 1 h exposure session was not different from what is usually found when guinea pigs are placed for 2 h in a new environment (Manunta and Edeline, 1999; Tith and Edeline, unpublished data).

#### **AUDITORY STIMULI AND EXPERIMENTAL PROTOCOL**

All the cells included in the present study exhibited reliable tuning curves when tested with pure tone frequencies. The sound generating system used to deliver pure tone frequencies was the same as that previously described (Edeline et al., 2001; Manunta and Edeline, 2004; Huetz et al., 2009). Pure tones (100 ms, rise/fall time 5 ms) were generated by a remotely controlled wave analyzer (Hewlett-Packard model HP 8903B) and attenuated by a passive programmable attenuator (Wavetek, P557, maximal attenuation 127 dB), both controlled via an IEEE bus. Stimuli were delivered through a calibrated earphone (Beyer DT48) placed close to the ear canal. The system was calibrated using a sound level calibrator and a condenser microphone/preamplifier (Bruel and Kjaer models 4133 and 2639T) placed at the same distance from the speaker as the animal's ear (<5 mm). The whole sound delivery system (HP 8903B, attenuators, and speaker) was calibrated from 0.1 to 35 kHz and could deliver tones of 80 dB up to 20 kHz and of 70 dB up to 35 kHz. Harmonic distortion products were measured to be down about 50 dB from the fundamental. The EEG (bandpass 1–90 Hz) was displayed on a polygraph (Grass, model 79D) to make sure that the animal was awake during the entire recording session (the data collection was stopped when large voltage EEG signals characteristic of slow-wave sleep were present).

When the recordings were stable enough and the animal quiet enough after completion of the tuning curve determination, three typical guinea pig vocalizations (Berryman, 1976; Harper, 1976) used in our previous studies (Philibert et al., 2005; Huetz et al., 2009) were presented in their natural and time-reversed versions. These vocalizations were collected from five adult male guinea pigs recorded either in pairs or individually in a sound attenuated room. Calls were recorded using a Sennheiser MD46 microphone connected to a microcomputer and digitized using SoundEdit software (44 kHz sampling rate). The relationships between these calls and the animal behavioral repertoire have been previously described (Berryman, 1976; Harper, 1976). A "chirp" is a brief call (0.7–15 kHz, <100 ms) that is believed to be a low-intensity distress call or a warning signal. A "chutter" consists of a chain of five components (0.5–3.5 kHz, 150–250 ms separated from each other by 140–175 ms) emitted during discomfort. A "whistle" is a two-part call (250–400 ms, with the first part from 1–3 kHz and the second part rising steeply to 8–20 kHz) emitted when animals are isolated or in response to stimuli associated with caretaking. The time-reversed versions of the stimuli were generated by reversing the natural calls in the time domain, i.e., playing the call backward. Each call was presented at a peak intensity of 70 dB sound pressure level. The synchronization between the vocalizations' onset and the spike trains was made by a synchronization pulse triggered when the vocalization intensity crossed a voltage threshold. Therefore, the neuronal recordings to different vocalizations were not synchronized with the real onset of each vocalization but rather with a fixed sound pressure level reached by the vocalization. The natural and time-reversed versions of the three calls were presented in random order, with each call repeated 20 times with a 2-s period of silence between each vocalization. 
The whole protocol, i.e., testing the frequency tuning with pure tones and the responses to the four vocalizations (natural and time-reversed) lasted approximately 60 min. The EEG was displayed on a polygraph and a computer to make sure that the animal was awake during the entire recording session. The animals did not vocalize during the recording sessions. Neuronal activity was recorded during recording sessions separated by 24 h from 3 days before up to 15 days after the trauma. In all cases, the recording session was stopped each time the spike waveform became unstable. Systematic off-line examination of the digitized waveforms confirmed that spike trains of unambiguously isolated single units were recorded.
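The time reversal of the calls and the threshold-triggered synchronization pulse described above can be sketched as follows. This is an illustration only: the sample values, the threshold, and the function names `time_reverse` and `trigger_index` are hypothetical, and the trigger is assumed to fire on the first sample whose absolute amplitude reaches the threshold.

```python
def time_reverse(samples):
    """Return the call played backward (reversed in the time domain)."""
    return list(reversed(samples))


def trigger_index(samples, threshold):
    """Index of the first sample whose absolute amplitude reaches the threshold.

    Assumption: this stands in for the synchronization pulse, which was
    triggered when the vocalization intensity crossed a voltage threshold.
    Returns None if the threshold is never reached.
    """
    for k, value in enumerate(samples):
        if abs(value) >= threshold:
            return k
    return None


# Toy waveform with a slow rise and a fast decay.
call = [0.0, 0.1, 0.5, 0.9, 0.2]
natural_trigger = trigger_index(call, threshold=0.5)
reversed_trigger = trigger_index(time_reverse(call), threshold=0.5)
```

In this toy case the trigger falls at a different offset from the true sound onset for the natural and the reversed call, which illustrates why responses are aligned to a fixed sound level rather than to the real onset of each vocalization.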

#### **HISTOLOGICAL ANALYSES**

After the last recording session, the animals received a lethal dose of pentobarbital (200 mg/kg), and small electrolytic lesions were made by passing anodal current (10 μA, 10 s) through the recording electrodes. The animals were perfused transcardially with 0.9% saline (200 ml) followed by 2000 ml of fixative (4% paraformaldehyde in 0.1 M phosphate buffer, pH 7.4). The brains were subsequently placed in a 30% sucrose solution for 3–4 days; then coronal sections were cut on a freezing microtome (50 μm thickness) and counterstained with cresyl violet. The analysis of histological material was always done blind to the electrophysiological results. The sections were examined under several microscopic magnifications to find the electrode tracks corresponding to the implanted tungsten electrodes. Determinations of the relative thickness of cortical layers in the guinea pig auditory cortex (ACx) (Wallace and Palmer, 2008) were used to assign each recording to a cortical layer.

#### **DATA ANALYSIS**

For each cell, the frequency tuning was quantified from threshold up to 80 or 70 dB by 10 dB steps. At each intensity, the best frequency was determined as the frequency eliciting the largest evoked responses. The breadth of tuning was quantified by the Q20 dB (but the Q10 dB and the Q40 dB were computed as well). The latency of the tone-evoked responses was computed at each intensity used to test the frequency tuning curve. At a given intensity, the responses obtained for all the tested frequencies were considered, and the latency of the first spike after tone onset was computed (1-ms precision). For each cell, at each intensity, the variability of the latency was quantified by the standard deviation of the mean latency value.
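The tuning-curve measures above can be sketched in a few lines. The text does not spell out the Q20 dB formula; the sketch assumes the common definition, i.e., the characteristic frequency divided by the bandwidth of the tuning curve 20 dB above threshold. It also assumes the sample standard deviation for latency variability. All function names and numbers are hypothetical.

```python
import statistics


def best_frequency(freqs_khz, responses):
    """Frequency eliciting the largest evoked response at one intensity."""
    best = max(range(len(responses)), key=lambda k: responses[k])
    return freqs_khz[best]


def q_db(cf_khz, low_edge_khz, high_edge_khz):
    """Q value: CF divided by the tuning bandwidth at a given level above threshold.

    Assumption: the edges are the lowest and highest frequencies that still
    evoke a response at that level (e.g., 20 dB above threshold for Q20 dB).
    """
    bandwidth = high_edge_khz - low_edge_khz
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return cf_khz / bandwidth


def latency_stats(first_spike_latencies_ms):
    """Mean first-spike latency and its variability (sample SD, an assumption)."""
    return (statistics.mean(first_spike_latencies_ms),
            statistics.stdev(first_spike_latencies_ms))


bf = best_frequency([1.0, 2.0, 4.0, 8.0], [3, 10, 7, 1])
q20 = q_db(cf_khz=8.0, low_edge_khz=6.0, high_edge_khz=10.0)
mean_latency, sd_latency = latency_stats([10, 12, 14])
```

A larger Q value corresponds to sharper tuning, so a broadening of the tuning curve after trauma would appear as a decrease in Q20 dB.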

For each cell, the responses to the vocalizations were analyzed in terms of evoked firing rate and spike timing reliability. Since the three tested vocalizations had different lengths (from 90 ms for the "chirp," up to 1740 ms for the "chutter"), only the first 90 ms were analyzed to allow pooling of the responses to different vocalizations. The spike timing reliability was computed using the *CorrCoef* as in previous studies (Gaucher et al., 2013a). It corresponds to the normalized covariance between each pair of spike trains recorded during the presentations of a given vocalization and was calculated as follows:

$$CorrCoef = \frac{1}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \frac{\sigma_{x_i x_j}}{\sigma_{x_i} \sigma_{x_j}}$$

where *N* is the number of trials and $\sigma_{x_i x_j}$ is the normalized covariance at zero lag between spike trains $x_i$ and $x_j$, where *i* and *j* are the trial numbers. Spike trains $x_i$ and $x_j$ were previously convolved with a 10 ms half-width Gaussian window.
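A minimal sketch of this index follows, written as a literal reading of the formula and not as the authors' implementation. It assumes spike trains binned at 1 ms, and it assumes that the "10 ms half-width" corresponds to the standard deviation of the Gaussian (10 bins). All function names are hypothetical.

```python
import math


def gaussian_kernel(sigma_bins):
    """Unit-area Gaussian kernel truncated at +/- 4 sigma (assumed truncation)."""
    half = int(math.ceil(4 * sigma_bins))
    weights = [math.exp(-0.5 * (t / sigma_bins) ** 2) for t in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(train, kernel):
    """Convolve a binned spike train with the kernel, keeping the same length."""
    half = len(kernel) // 2
    out = [0.0] * len(train)
    for t, count in enumerate(train):
        if count:
            for m, w in enumerate(kernel):
                u = t + m - half
                if 0 <= u < len(train):
                    out[u] += count * w
    return out


def normalized_covariance(a, b):
    """Zero-lag covariance of a and b divided by the product of their SDs."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a flat (e.g., empty) train contributes zero
    return cov / math.sqrt(var_a * var_b)


def corr_coef(trains, sigma_bins=10.0):
    """CorrCoef as printed: sum over pairs i < j, divided by N(N - 1)."""
    n = len(trains)
    if n < 2:
        raise ValueError("at least two trials are required")
    kernel = gaussian_kernel(sigma_bins)
    smoothed = [smooth(t, kernel) for t in trains]
    total = sum(normalized_covariance(smoothed[i], smoothed[j])
                for i in range(n - 1) for j in range(i + 1, n))
    return total / (n * (n - 1))


# Three identical 90-bin trials with spikes at 10 and 40 ms.
trial = [0] * 90
trial[10] = 1
trial[40] = 1
value = corr_coef([trial, trial, trial])
```

Under this literal reading, the sum runs over N(N − 1)/2 pairs, so perfectly identical trials give 0.5 in this sketch.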

Onset PSTHs were constructed by summing up all the trials of all cells recorded on each day before and after the hearing loss. These PSTHs were constructed only for the onset response, i.e., for the first 90 ms after the beginning of each vocalization. They were made for the days on which a sufficient number of recordings was available (from day −3 to day +3). They were first computed for each vocalization, but as the main result was similar for all vocalizations, the figures show averaged onset PSTH over all vocalizations.
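The construction of an onset PSTH can be sketched as below, assuming spike times expressed in ms relative to the synchronization pulse and 1 ms bins (the bin width is not given in the text). The function name `onset_psth` and the spike times are hypothetical.

```python
def onset_psth(trials_ms, window_ms=90, bin_ms=1):
    """Sum spike counts per bin over all trials within the onset window.

    trials_ms: list of trials, each a list of spike times in ms.
    Assumption: 1 ms bins; spikes outside [0, window_ms) are ignored.
    """
    counts = [0] * (window_ms // bin_ms)
    for trial in trials_ms:
        for t in trial:
            if 0 <= t < window_ms:
                counts[int(t // bin_ms)] += 1
    return counts


# Two toy trials pooled together; the spike at 95 ms falls outside the window.
psth = onset_psth([[5.2, 12.0, 95.0], [5.9, 30.1]])
```

Pooling trials from all cells recorded on a given day amounts to passing their concatenated trial lists to the same function.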

In the following text and figures, days before and after trauma are labeled D−X or D+X (X being the number of days before −, or after + the acoustic trauma).

#### **RESULTS**

#### **EVALUATION OF THE HEARING DEFICIT BY ABR**

Since we did not record the ABRs of our animals (*n* = 10) before and after the acoustic trauma, hearing loss evaluation is only based on ABRs obtained a few weeks (between 4 and 8 weeks) after the last recording session. We compared the ABRs obtained from our hearing impaired animals with a large database of ABRs obtained (with exactly the same equipment) in control guinea pigs (*n* = 46) of the same age (4.7 ± 0.6 months) and weight (585 ± 113 g) as the ones used here. The control animals had typical ABR audiograms similar to those previously published (Gourévitch et al., 2009; Gourévitch and Edeline, 2011). Compared to control animals, the animals used in the present experiment had a consistent hearing deficit (15–20 dB on average) in the 4–8 kHz range (**Figure 1**). Statistical analyses confirmed that there was no hearing deficit in the low frequency range (unpaired *t*-tests, *p* > 0.10 at 0.5, 1 and 2 kHz), and a clear and significant hearing deficit in the mid frequency range (*p* < 0.01 for 4, 5, and 8 kHz). The hearing loss was modest but still significant at 16 kHz (*p* = 0.047); it was not significant at 24 and 32 kHz (*p* > 0.08 in both cases). Thus, the long-term effects of the exposure were mostly an increased threshold of 15 dB at, and one octave above, the trauma frequency. Most likely, these changes should be considered as a PTS since they were obtained 1.5–2.5 months after the acoustic trauma.

#### **CONSEQUENCES OF THE TUNING CURVES PARAMETERS**

Thirty-two cortical sites were studied in eight animals from 3 days before to 15 days after the acoustic trauma (for two animals none of the electrodes gave a satisfactory signal-to-noise ratio to record clear single unit activity over the 2 weeks of recording). On each recording session, special care was taken to make sure that the discharges of a single unit were actually recorded. From 1 day to the next, it was not possible to determine whether or not the same neuron was recorded and, rather than claiming that the same cells were recorded over time, we prefer to consider that it was the same cortical site from which cells were sampled across the 3 weeks of the protocol.

[Figure 1 caption, partially recovered: ... computed from a database of ABRs obtained in control guinea pigs (*n* = 46) of similar age and weight.]

On a given recording session, the neuronal responses were determined at 3–7 intensities (from 80 or 70 dB to threshold), thus allowing quantification of functional parameters. The scattergrams presented in **Figure 2** display the characteristic frequency (CF) derived at each cortical site before trauma vs. after trauma. Comparing the CF obtained 3 days vs. 1 day before the acoustic trauma (**Figure 2A**) indicates that there was a relatively good match between the CF values, which suggests that the CF was reasonably stable in control conditions. During all the following recording sessions after the trauma, the general tendency was the same: cortical sites with initial CF above 8 kHz displayed lower CF after the trauma (dots below the diagonal lines in **Figures 2B–F**). Statistical analyses indicated that there was no change in CF value between two recording sessions before the acoustic trauma (*p* > 0.32), whereas there was a significant decrease of the CF values from the first (D+1) to the last (D+15) recording session after the trauma (paired *t*-test, *p* < 0.05 in all cases). This effect is illustrated in **Figure 3**: For two different cortical sites, the tuning curves clearly display a shift of at least one octave in the low frequency range. Note that there was a partial recovery of the threshold between the first day and the third day after the trauma. Also, as shown in **Figure 4**, there was no correlation between the threshold shifts and the shifts in CF values: even at D+15, cells within the same frequency band can display either a large increase in threshold or a small decrease in threshold.

Based on the tuning curves obtained at each cortical site before and after the trauma, the group data clearly showed an increase in cortical threshold. **Figure 5A** shows the evolution of the mean threshold value from 3 days before up to 15 days after the acoustic trauma. There was a 20 dB increase (from 39.6 to 61.4 dB, *p* < 0.01) in threshold when comparing the day preceding the trauma and the day following the trauma. In the following days, this increase in threshold was less pronounced (10 dB on average) but the mean threshold remained significantly higher than before trauma (*p* < 0.05 at 9 and 15 days post-trauma). It tended to stabilize after the trauma since there was no threshold difference between 9 and 15 days post-trauma (*p* > 0.20).

As shown in **Figure 5B**, the quantification of the tuning width by the Q20 dB revealed neither systematic tuning curve enlargement nor systematic shrinkage after the acoustic trauma (all *p*-values > 0.18). Quantification of the Q10 dB or the Q40 dB did not reveal any effect that could have been masked by the arbitrary choice of quantifying the tuning curve at a particular intensity (*p* > 0.15 in all cases). Analyzing the tuning width according to the CF did not reveal more effects: There were no statistical differences in tuning width for cells with low CF (<5 kHz), middle CF (5 < CF < 10 kHz) and high CF (>10 kHz).

In contrast, quantification of the latency of the tone evoked responses revealed a marked effect (**Figure 5C**): At the highest intensity tested (70 or 80 dB), there was a large and significant (*p* < 0.01) increase in response latency the first day following the acoustic trauma (from 27.2 to 32.5 ms). This increase in latency was no longer present the subsequent days (*p* > 0.25 in all cases). Similarly, the variability of the response latency was increased the first day after the acoustic trauma (*p* < 0.05, **Figure 5D**) but not the following days (*p* > 0.27 in all cases). In fact, subsequent analyses revealed that this increase in response latency and latency variability was present for the cells exhibiting a CF above 5 kHz before the trauma (*p* < 0.01) but was less pronounced for the cells exhibiting lower CF (*p* = 0.07). Note that this increase in latency and latency variability was still significant (*p* < 0.05) when the responses obtained at all the tested intensities (80–20 dB) were pooled together.

To summarize, the quantification of the tuning curves indicated that there was a 20 dB increase in cortical threshold in the 24 h following the acoustic trauma accompanied by a 5 ms mean increase in response latency and in variability of response latency. Although attenuated, the increase in cortical auditory threshold was still present up to 15 days post-trauma, but the changes in latency and latency variability could no longer be detected after the first day post-trauma. Whatever the frequency band that was considered (below the trauma frequency, within 1.5 octave above, or more than 1.5 octave above it), there was no correlation between the ABR threshold shift and the cortical threshold shift (lowest *p*-value, *p* = 0.23). This lack of relationship was previously reported (see Figure 4 in Gourévitch and Edeline, 2011).

#### **ALTERATIONS OF THE RESPONSES TO CONSPECIFIC VOCALIZATIONS**

Each time the animal was still quiet after the completion of the tuning curve determination, a set of previously used conspecific vocalizations (Philibert et al., 2005; Huetz et al., 2009) was presented at 70 dB SPL (peak amplitude) in their normal and time-reversed versions. Based upon previous studies (review in Huetz et al., 2011; Gaucher et al., 2013b), two parameters were quantified to assess the effect of hearing loss on the responses to communication sounds: the firing rate and the spike timing reliability of the responses. On average, we could not detect any significant change in evoked firing rate during the presentation of the vocalizations at 70 dB. **Figure 6A** displays the evoked firing rate for the 3 days before and the 3 days after the acoustic trauma: There was a slight increase in evoked response on the first 2 days after the trauma, and a non-significant decrease on the third day. None of these variations were significant as they were in the range of the pre-trauma response fluctuations. Similarly, we could not detect any significant effect on the spontaneous firing rate. Even if an ANOVA across days was significant (*F* = 7.14, *p* < 0.01), the difference between D−1 (3.1 spikes/s) and D+1 (3.9 spikes/s) was smaller than the difference between D−2 (4.7 spikes/s) and D−1. Thus, day-to-day fluctuations in spontaneous firing rate before the acoustic trauma might have prevented the detection of significant effects after the acoustic trauma.

The *CorrCoef* index, i.e., the index quantifying the spike timing reliability of evoked responses, did not indicate significant changes (**Figure 6B**). There was a small decrease in spike timing reliability on the first 2 days after the acoustic trauma, but, as for the firing rate, this change was in the range of the pre-trauma fluctuations (ANOVA, *F* = 1.82, *p* = 0.10). For the cells with CF above 5 kHz, we did not detect significant changes in terms of evoked firing rate. However, for these cells the *CorrCoef* index showed a significant decrease on the first day after the trauma compared to the day before trauma (ANOVA, *F* = 2.34, *p* = 0.04; paired *t*-test between D−1 and D+1, *p* = 0.007), suggesting that the acoustic trauma transiently impacts the spike timing reliability of middle and high CF cells (**Figure 6C**).

Interestingly, analyzing the average latency obtained from the onset responses to the different vocalizations indicated clear effects. **Figure 7A** shows average onset PSTHs obtained when pooling the response to the different vocalizations across all the recorded cells. Over the 3 days before the trauma (blue curves), the latency was relatively stable with a mean latency of 10 ms<sup>1</sup>. During the 3 days after the acoustic trauma (red curves), the latency was increased to a value of 11 ms: Although small in absolute value, this increase in latency was systematic as attested by the shifts of the onset PSTHs. This increase was more pronounced on the first day following the trauma, then the response latency progressively moved back to control values. The overall shape of the response was also modified. The onset response was reduced and the response duration was increased, suggesting a lack of synchronization of the response at the onset of the vocalizations. Moreover, analyzing the duration of the onset response (by taking a threshold at 0.15 spikes/s, which corresponds to the baseline level plus 2 standard deviations) revealed that the trauma strongly increased the variability and/or the duration of the onset response: before the trauma, the responses returned to the background firing rate (0.15 spikes/s) after 9 ms, whereas on the first day after the trauma, the onset response lasted for 19 ms.
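The duration measure can be sketched as follows. The threshold is the baseline mean plus 2 standard deviations, as stated above. The sketch additionally assumes that the duration is the length of the first uninterrupted run of PSTH bins above that threshold and that bins are 1 ms wide; neither detail is given in the text. Function names and toy values are hypothetical.

```python
import statistics


def response_threshold(baseline_rates):
    """Baseline mean plus 2 standard deviations (sample SD, an assumption)."""
    return statistics.mean(baseline_rates) + 2 * statistics.stdev(baseline_rates)


def onset_duration_ms(psth_rates, threshold, bin_ms=1.0):
    """Duration of the first contiguous run of bins strictly above threshold."""
    start = next((k for k, r in enumerate(psth_rates) if r > threshold), None)
    if start is None:
        return 0.0
    end = start
    while end < len(psth_rates) and psth_rates[end] > threshold:
        end += 1
    return (end - start) * bin_ms


# Toy PSTH (spikes/s per bin) evaluated against the 0.15 spikes/s threshold.
toy_psth = [0.1, 0.1, 0.3, 0.5, 0.4, 0.1, 0.2]
duration = onset_duration_ms(toy_psth, threshold=0.15)
```

With this rule, the isolated bin at the end of the toy PSTH does not extend the onset response, because the response has already returned to the background level.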

To investigate how the acoustic trauma affects the responses of cells with different CFs, we split the population into three groups according to the cell's CF: the "low CF" group (CF < 5 kHz, *n* = 6), for which the CF was below the trauma frequency; the "middle CF" group (5 kHz < CF < 10 kHz), for which the CF was within one octave above the trauma frequency; and the "high CF" group (CF > 10 kHz), for which the CF was more than one octave above the trauma frequency. **Figure 7B** shows the onset PSTHs computed for these three groups the day before and the day after the trauma. Before the trauma, we observed that the higher the CF, the longer the latency and the smaller the onset response. This effect probably results from the fact that there was little energy in the vocalizations in the high frequencies, so the strength of the inputs was probably very weak for the neurons with high CF. This pattern was conserved the day following the trauma, but in the three groups, the response latencies were increased after the acoustic trauma (compare the different blue curves with the red-yellow curves). More importantly, for the low CF neurons, only the latency was shifted without any effect on the strength of the onset response. For the two other groups, the trauma not only induced a shift in latency but also a slight decrease of the strength of the onset response.
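The grouping rule can be written as a short sketch in which the distance from the 5 kHz trauma frequency is expressed in octaves. The text uses strict inequalities and does not say where a CF of exactly 5 or 10 kHz would fall; assigning them to the middle and high groups is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def octaves_above_trauma(cf_khz, trauma_khz=5.0):
    """Distance of the CF from the trauma frequency, in octaves."""
    return math.log2(cf_khz / trauma_khz)


def cf_group(cf_khz, trauma_khz=5.0):
    """Assign a cell to the low, middle, or high CF group.

    Assumption: boundary values (exactly 5 or 10 kHz) go to the upper group.
    """
    octaves = octaves_above_trauma(cf_khz, trauma_khz)
    if octaves < 0:
        return "low"
    if octaves < 1:
        return "middle"
    return "high"


groups = [cf_group(cf) for cf in (3.0, 7.0, 16.0)]
```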

<sup>1</sup>Note that this value corresponds to the latency of the neuronal response as measured from the time at which the vocalizations reached a fixed threshold in terms of SPL level (the same for all vocalizations; see Methods).

[Figure 6 caption, partially recovered: ... **(B)** Same for the spike timing reliability as computed from the *CorrCoef* (see Methods) for all recorded neurons. The *CorrCoef* is an index of the trial-to-trial reliability of the response. **(C)** Same as **(B)**, but only for neurons whose CF is above the trauma frequency (CF > 5 kHz). Stars correspond to statistically significant differences (*p* < 0.05) from control level at Day-1.]

#### **DISCUSSION**

We show here that an acoustic trauma producing moderate but permanent hearing loss triggers both transient (D+1) and long-lasting (D+15) events in the responses of auditory cortex neurons. When tested with pure tones 24 h after the trauma, auditory cortex neurons displayed an increase in threshold, accompanied by an increase in latency and latency variability. Only a modest increase in threshold persisted a few days later. When tested with conspecific vocalizations, the responses of cortical neurons displayed a decrease in spike-timing reliability, especially for the cells with CF above the trauma frequency, and there was a significant shift in response latency and response duration. These effects clearly suggest that moderate hearing loss impacts several temporal parameters of neuronal responses in the auditory cortex of awake animals. The immediate effects differ from those observed 15 days later, probably reflecting the difference between TTS and PTS.

#### **METHODOLOGICAL CONSIDERATIONS**

An obvious pitfall of our experiment is that the ABRs were only recorded at the end of the experiment. Therefore, the hearing loss was assessed by comparing the post-trauma ABRs with a large database of ABRs obtained from control animals of the same age and weight. Another consequence was that we could not follow the time course of the ABR changes in parallel with the changes in cortical responses. However, testing the ABR daily would have required anesthetizing the animals daily (or every other day), which was a risk we did not want to take given that (i) the animals were implanted with chronic micro-electrodes and (ii) they were already adapted to restrained conditions. Despite these limitations, there is clear evidence for hearing loss at, and within one octave above, the trauma frequency. This loss is weaker than in previous studies (Gourévitch et al., 2009; Gourévitch and Edeline, 2011) because of the shorter time of exposure. The increase in cortical thresholds observed here is modest, but it should be kept in mind that even with much larger ABR threshold shifts (about 40 dB), cortical threshold shifts were found to be 5–20 dB (Noreña and Eggermont, 2003; Noreña et al., 2003).

Other limitations are (i) that the size of our population is quite small compared with previous studies performed in anesthetized conditions, and (ii) that we cannot claim that the same neurons were tested before and after trauma. Nonetheless, we sampled the same cortical sites before and after trauma, as was the case when multiunit recordings were compared before and after trauma (Eggermont and Komiya, 2000; Noreña and Eggermont, 2003; Noreña et al., 2003), which led us to consider that comparison with previous studies is possible.

#### **COMPARISON WITH OTHER STUDIES TESTING AUDITORY CORTEX RECEPTIVE FIELDS AFTER HEARING LOSS**

Over the last two decades, several studies have provided compelling evidence for cortical reorganizations after hearing loss (review in Pienkowski and Eggermont, 2011). The origins of hearing loss can be quite diverse, ranging from a physical lesion of the cochlea (Robertson and Irvine, 1989; Rajan et al., 1993; Rajan and Irvine, 1996), to intense noise exposure (Eggermont and Komiya, 2000; Noreña and Eggermont, 2003, 2005; Tomita et al., 2004), ototoxic drugs (Harrison et al., 1991; Schwaber et al., 1993), or genetic pathology (Willott et al., 1993). When the consequences of hearing loss were assessed weeks/months after injury or acoustic trauma, large-scale reorganizations of cortical maps were reported (Robertson and Irvine, 1989; Rajan et al., 1993; Rajan and Irvine, 1996; Eggermont and Komiya, 2000; Noreña and Eggermont, 2005). In contrast, when immediate changes were evaluated, the authors mainly focused on alterations of frequency tuning curves and of spontaneous and evoked firing rates (Noreña and Eggermont, 2003; Noreña et al., 2003). Results obtained here in awake animals within a few days after acoustic trauma share some similarities with some of the immediate effects following a trauma. For example, the shift of CF toward lower values for cells having their original CF above the trauma frequency is similar to what was reported in the cat auditory cortex after a trauma at the same frequency as here (Noreña and Eggermont, 2003; Noreña et al., 2003). However, several other changes described in previous studies were not observed in the present data. For example, we could not detect systematic changes in evoked (**Figure 6**) or in spontaneous activity after acoustic trauma. In fact, as explained in the Results section, day-to-day fluctuations in evoked and spontaneous activity (either due to differences in the cell types recorded from 1 day to the next by a given electrode, or to other uncontrolled factors) may have masked potential increases in spontaneous and evoked firing rates. In addition, we should keep in mind that, based on cortical evoked potentials recorded in awake animals, there was no increase in evoked potential amplitude 1 day after an acoustic trauma despite the fact that increases were detected 4 h after trauma (Sun et al., 2008).

Also, we could not detect any effect on the width of tuning curves as reported in some studies (Noreña et al., 2003). This is not totally surprising given that, as shown by Scholl and Wehr (2008), a complex disruption of the excitation/inhibition balance occurs immediately after the acoustic trauma, which selectively increases and decreases the strength of inhibition at different positions within the receptive field. In addition, besides the small size of the neuronal population we have studied, it is possible that the lack of effect on the frequency tuning stems from differences between anesthetized and unanesthetized conditions. Changes in the excitation/inhibition balance converging onto a particular cell are obviously important for controlling the shape of the frequency and intensity tuning but, in awake animals, these effects can be masked by the state of vigilance, the animals' attention and/or the concentrations of neuromodulators in the vicinity of the recorded cells (review in Edeline, 2003, 2012). Despite the fact that the EEG was collected to make sure the recording sessions took place in waking conditions, it was possible neither to evaluate the animals' attention to the different stimuli nor to evaluate their level of arousal or alertness.

Conflicting results were previously described on the latency changes observed immediately after an acoustic trauma. On the one hand, Noreña and colleagues showed examples of shorter latencies and shorter response durations in the cat auditory cortex within the 2–4 h following trauma (see the example in Figure 12 in Noreña et al., 2003). On the other hand, in the rat auditory cortex, Scholl and Wehr (2008) showed that membrane potential responses were delayed and prolonged throughout the receptive field in the tens of minutes following the trauma. The methodological differences explaining this discrepancy remain unknown. However, when auditory cortex neurons were tested 2–5 months after the acoustic trauma, the mean response latency was 5 ms longer in traumatized cats than in normal cats (Eggermont and Komiya, 2000). In any case, this suggests that the time at which cortical neurons are tested after an acoustic trauma is potentially a crucial factor. This calls for additional studies in which the time course of cortical changes is evaluated. Several mechanisms, having different time courses, can underlie our results: the effects on the response latency were transient whereas the effects on the cortical thresholds, although attenuated, were long lasting, potentially reflecting that TTS and PTS have different consequences on the responses of auditory cortex neurons.

Whether or not the cortical alterations reported here reflect residual responses or a central (thalamo-cortical) plasticity is a challenging question. When testing the different levels of the auditory system after hearing loss, evidence for responses having the properties of residual responses comes from studies performed at the cochlear nucleus (Rajan and Irvine, 1998) and inferior colliculus levels (Irvine et al., 2003), whereas the map changes reported at the cortical and thalamic level were considered as real reorganizations (Robertson and Irvine, 1989; Rajan et al., 1993; Kamke et al., 2003). As shown in **Figure 3**, some of our recording sites recovered CFs with thresholds close to the pre-trauma level, which suggests that the changes described here result from plastic changes and not from residual responses.

#### **RESPONSES TO NATURAL STIMULI AFTER ACOUSTIC TRAUMA**

Many previous studies have stressed that the discrimination performance of auditory cortex neurons is better when based on the temporal organization of spike trains than when based on the average firing rate (e.g., see Schnupp et al., 2006; Huetz et al., 2009; Gaucher et al., 2013a). At presentation of conspecific vocalizations, we found here that there was no significant change in firing rate but a decrease of the *CorrCoef* (for the cells with CF > 5 kHz), an index quantifying the spike-timing reliability (**Figure 6**). This means that at the level of individual cells, the trial-to-trial reliability of evoked responses is impaired after the trauma. The latency shift, and more importantly, the increase in response duration (**Figure 7**) observed from the onset PSTH indicate that the synchronization of cortical cells is also impaired by the trauma. Altogether, these results point out that important aspects of neuronal responses to communication sounds are modified after hearing loss. A puzzling result is the fact that the response latency was only increased the day after trauma when tested with pure tones, whereas it was increased for several days when tested with communication sounds, suggesting that more pronounced effects can be detected with natural stimuli. Potentially, one explanation is that each pure tone activates relatively small sets of afferents converging on the recorded cell, whereas the onset of vocalizations recruits a much larger proportion of afferents converging on this cell. This is consistent with the fact that acoustic trauma induces an important loss of inhibition far away from the receptive field of the cell (Scholl and Wehr, 2008). In addition, because the response of cortical neurons to communication sounds does not depend only on the spectral content of these sounds (see Figure 1 in Gaucher et al., 2013b), one can consider that communication sounds can reveal effects that are undetectable with the test of classical tuning curves.

Often, alterations observed after hearing loss at the cortical level are interpreted as resulting from intracortical reorganizations. Based on recent results, it is unlikely that the effects described here result from an attenuation of intracortical inhibition. When reducing intracortical inhibition by pharmacological treatments, it was found that the trial-to-trial reliability and phasic inhibition were enhanced at presentation of communication sounds (Gaucher et al., 2013a). Therefore, the effects described here cannot be simply explained by a local alteration of cortical inhibition. Most likely, some of the alterations observed at the cortical level stem from effects already occurring at subcortical levels. For example, after an acoustic trauma, the acute loss of the IHC ribbon synapses connected to high-threshold, low spontaneous firing rate auditory nerve fibers may decrease the synchronization of auditory nerve fiber responses. At the central level, a down-regulation of inhibition can be detected days or weeks after a noise-induced hearing loss: the level of GAD-65/67 (the enzyme responsible for the conversion of glutamate into GABA) and the levels of GABAA receptor subunit α1 were found to be lowered in inferior colliculus and auditory cortex, respectively (Wang et al., 2002; Browne et al., 2012; Kou et al., 2013).

#### **CONCLUSIONS**

So far, very few studies have examined the consequences of hearing loss in non-anesthetized animals. Recording evoked potentials in auditory cortex 24 h after an acoustic trauma, Sun et al. (2008) did not detect an increase in evoked potential amplitude despite transient increases 4 h after the trauma. The cortical recordings performed in awake gerbils by Rosen et al. (2012) after developmental hearing loss are the only ones making the link between behavior and neuronal deficits. These authors examined behavioral and neural detection thresholds for sinusoidally amplitude modulated (SAM) stimuli. In animals with bilateral conductive hearing loss, behavioral SAM detection was impaired for slow modulation (<5 Hz), but not for fast modulation (100 Hz). Auditory cortex neurons displayed limited impairments for static stimuli but responded poorly to slow, but not to fast, SAM tones. Comparisons between psychometric and neurometric curves (based on firing rate) indicated similar impairment at the behavioral and neural levels (Rosen et al., 2012).

To the best of our knowledge, the present study is the first one describing the consequences of partial hearing loss on the responses to communication sounds. By analyzing the responses to conspecific vocalizations we found significant alterations of parameters related to the temporal synchronization of neuronal responses. For several reasons, analyzing the cortical responses to communication sounds can be a good model for understanding how speech stimuli are processed by cortical neurons. Many studies in humans now suggest that very subtle hearing loss, or sometimes undetectable hearing loss, can lead to abnormal speech processing (e.g., Léger et al., 2012). This study could be a starting point to link more tightly the deficit in speech intelligibility observed in humans and the alterations in cortical responses after modest hearing loss.

## **ACKNOWLEDGMENTS**

This work was supported by grants from the National Research Agency (ANR program Neuro2006, ANR-11-BSH2-004-01 HEARFIN) and from the Fédération pour la Recherche sur le Cerveau (FRC) to Jean-Marc Edeline. Special thanks to Nathalie Samson and Pascale Leblanc-Veyrac for taking care of the guinea-pig colony.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 October 2013; accepted: 07 April 2014; published online: 28 April 2014. Citation: Huetz C, Guedin M and Edeline J-M (2014) Neural correlates of moderate hearing loss: time course of response changes in the primary auditory cortex of awake guinea-pigs. Front. Syst. Neurosci. 8:65. doi: 10.3389/fnsys.2014.00065*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2014 Huetz, Guedin and Edeline. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Noise-induced hearing loss increases the temporal precision of complex envelope coding by auditory-nerve fibers

## *Kenneth S. Henry1†, Sushrut Kale2† and Michael G. Heinz 1,2\**

*<sup>1</sup> Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN, USA <sup>2</sup> Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN, USA*

#### *Edited by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

#### *Reviewed by:*

*Roland Schaette, University College London, UK Daniel Stolzberg, University of Western Ontario, Canada*

#### *\*Correspondence:*

*Michael G. Heinz, Department of Speech, Language, and Hearing Sciences, Purdue University, 500 Oval Drive, West Lafayette, IN 47907, USA e-mail: mheinz@purdue.edu*

#### *†Present address:*

*Kenneth S. Henry, Department of Biomedical Engineering, University of Rochester, Rochester, USA; Sushrut Kale, Department of Otolaryngology – Head and Neck Surgery, Columbia University, New York, USA*

While changes in cochlear frequency tuning are thought to play an important role in the perceptual difficulties of people with sensorineural hearing loss (SNHL), the possible role of temporal processing deficits remains less clear. Our knowledge of temporal envelope coding in the impaired cochlea is limited to two studies that examined auditory-nerve fiber responses to narrowband amplitude modulated stimuli. In the present study, we used Wiener-kernel analyses of auditory-nerve fiber responses to broadband Gaussian noise in anesthetized chinchillas to quantify changes in temporal envelope coding with noise-induced SNHL. Temporal modulation transfer functions (TMTFs) and temporal windows of sensitivity to acoustic stimulation were computed from 2nd-order Wiener kernels and analyzed to estimate the temporal precision, amplitude, and latency of envelope coding. Noise overexposure was associated with slower (less negative) TMTF roll-off with increasing modulation frequency and reduced temporal window duration. The results show that at equal stimulus sensation level, SNHL increases the temporal precision of envelope coding by 20–30%. Furthermore, SNHL increased the amplitude of envelope coding by 50% in fibers with CFs from 1–2 kHz and decreased mean response latency by 0.4 ms. While a previous study of envelope coding demonstrated a similar increase in response amplitude, the present study is the first to show enhanced temporal precision. This new finding may relate to the use of a more complex stimulus with broad frequency bandwidth and a dynamic temporal envelope. Exaggerated neural coding of fast envelope modulations may contribute to perceptual difficulties in people with SNHL by acting as a distraction from more relevant acoustic cues, especially in fluctuating background noise. Finally, the results underscore the value of studying sensory systems with more natural, real-world stimuli.

**Keywords: amplitude modulation, auditory nerve, sensorineural hearing loss, temporal envelope, temporal resolution, Wiener-kernel analysis**

## **INTRODUCTION**

People with sensorineural hearing loss (SNHL) commonly have difficulty understanding speech under real-world listening conditions, even with amplification from a modern hearing aid (Duquesnoy, 1983; Woods et al., 2010). Research conducted over the last several decades has uncovered changes in cochlear frequency tuning with SNHL that most likely contribute to speech perception problems in these listeners. While the normal-hearing cochlea decomposes broadband signals like speech into a number of sharply tuned "auditory filter" channels for central processing, SNHL causes an increase in the bandwidth of auditory filters that consequently decreases resolution of spectral features by the cochlea (Young, 2012). Furthermore, SNHL causes downward shifts in the best frequency of tuning, particularly in the base of the cochlea, which are also likely to contribute to perceptual impairment (Liberman, 1984; Henry and Heinz, 2013).

In addition to diminished spectral resolution, perceptual difficulties in people with SNHL might also reflect changes in auditory sensitivity to the temporal structure of sound. The effects of SNHL on sensitivity to temporal fine structure and slower varying temporal envelope cues are both topics of active debate and investigation (Lorenzi et al., 2006; Hopkins et al., 2008; Swaminathan and Heinz, 2012). In the current report, we focus on temporal envelope sensitivity. While several studies suggest that SNHL might cause an increase in the perceptual salience of temporal envelope structure that could adversely affect speech perception in fluctuating background noise (through loudness recruitment; Moore and Glasberg, 1993; Moore et al., 1995, 1996), other studies suggest that envelope sensitivity is relatively unaffected by SNHL (Bacon and Gleitman, 1992; Moore et al., 1992; Lorenzi et al., 2006) or even diminished (Bacon and Viemeister, 1985; Formby, 1987; Grant et al., 1998). Physiological data from nonhuman animals have the potential to clarify changes in temporal envelope sensitivity with SNHL.

Our current physiological knowledge of temporal envelope coding in the impaired cochlea is limited (e.g., Kale and Heinz, 2010, 2012). Consistent with enhancement of envelope coding, these studies showed that noise-induced SNHL amplifies phase locking to the temporal envelope of sinusoidally amplitude modulated (SAM) tones and single-formant stimuli in auditory-nerve fibers of anesthetized chinchillas (Kale and Heinz, 2010). Amplified envelope coding was observed over a wide range of modulation frequencies and did not appear to alter the shape of temporal modulation transfer functions (TMTFs) plotting response amplitude as a function of modulation frequency (Kale and Heinz, 2012). Hence, it appears from the SAM-tone data that SNHL may amplify the neural representation of envelope structure in the cochlea without altering the precision with which temporal modulations are encoded, i.e., without extending modulation coding to higher modulation frequencies.

In the present study, we extend this previous physiological work by studying temporal envelope coding using more complex, Gaussian noise stimuli with broad frequency bandwidth and a dynamic temporal envelope. Wiener-kernel analyses of auditory-nerve fiber responses were used to quantify the preferred spectral and temporal stimulus features driving the neuron (van Dijk et al., 1994; Lewis et al., 2002; Recio-Spinoso et al., 2005; Temchin et al., 2005). The 2nd-order Wiener kernel (*h*<sub>2</sub>; **Figure 1A**) is a time domain representation of the spectro-temporal receptive field (STRF; **Figure 1B**) or mean spectrogram in the 10–20 ms time window preceding a spike (the STRF is the 1-dimensional Fourier transform of *h*<sub>2</sub>; Lewis and van Dijk, 2004). Just as STRFs that are compact in frequency indicate sharp frequency tuning (i.e., spikes are driven by a narrow range of acoustic frequencies), STRFs and *h*<sub>2</sub> that are compact in time indicate high temporal precision, i.e., spikes are driven by acoustic energy falling in a short temporal window.

In previous studies, we used Wiener kernel analyses to quantify the effects of noise-induced SNHL on the frequency tuning of phase-locked responses to the temporal fine structure and envelope of broadband Gaussian noise stimuli (e.g., Henry and Heinz, 2013). Here, we quantify the effects of SNHL on the temporal precision, amplitude, and latency of envelope coding at the level of the auditory nerve in anesthetized chinchillas. The results show that at equal stimulus sensation level, noise-induced SNHL increases the temporal precision of envelope coding by 20–30%. Furthermore, SNHL is associated with a decrease in response latency of 0.4 ms and amplification (∼50%) of envelope coding in fibers with CFs from 1–2 kHz.

## **MATERIALS AND METHODS**

#### **ANIMALS**

All procedures were performed in chinchillas and approved by the Purdue Animal Care and Use Committee. The neurophysiological data presented here were collected from 10 normal hearing control animals (143 fibers), 6 animals exposed to a 50 Hz band of Gaussian noise with a center frequency of 2 kHz for 4 h at 115 dB SPL (76 fibers), and 5 animals exposed to an octave band of Gaussian noise with a center frequency of 500 Hz for 2 h at 116 dB SPL (46 fibers).

#### **NOISE OVEREXPOSURES**

Noise overexposures were performed in a sound-attenuating booth under anesthesia using either a pair of dynamic loudspeakers (Fostex FT28D; for 2 kHz exposures) or a single enclosed woofer (Selenium 10PW3; for 500 Hz exposures) suspended 25–30 cm above the animal. Anesthesia was induced with xylazine (1–2 mg/kg subcutaneous) followed after several minutes by ketamine (50–65 mg/kg intraperitoneal). Atropine (0.05 mg/kg intramuscular) was given to control mucous secretions and eye ointment was applied. Animals were held in position with a stereotaxic device, and body temperature was maintained at 37°C using a feedback controlled heating pad (Physitemp TCAT2LV or Harvard Apparatus 50–7220F). Supplemental injections of ketamine (20–30 mg/kg intraperitoneal) were given as needed to maintain an areflexic state.

#### **NEUROPHYSIOLOGICAL RECORDINGS**

Neurophysiological data were recorded from auditory-nerve fibers under anesthesia 3 or more weeks after the noise overexposure using standard procedures in our lab (e.g., Kale and Heinz, 2010, 2012). Anesthesia was induced with xylazine and ketamine as described above, but maintained with sodium pentobarbital (∼15 mg/kg/2 h intravenous) for neurophysiological recordings. Physiological saline (1–2 ml/2 h intravenous) and lactated Ringer's solution (20–30 ml/24 h subcutaneous) were also given, and a tracheotomy was performed to facilitate breathing. Animals were positioned in a stereotaxic device in a sound-attenuating booth. The skin and muscles overlying the skull were transected to expose the ear canals and bullae, and both ear canals were dissected to allow insertion of hollow ear bars. The right bulla was vented through 30 cm of polyethylene tubing. A craniotomy was opened in the posterior fossa and the cerebellum partially aspirated and retracted medially to expose the trunk of the auditory nerve bundle. Acoustic stimuli were presented through the right ear bar with a dynamic loudspeaker (Beyerdynamic DT48) and calibrated using a probe microphone placed within a few mm of the tympanum (Etymotic ER7C). Neurophysiological recordings were made using a 10–30 MΩ glass microelectrode advanced into the auditory nerve with a hydraulic microdrive (Kopf 640). Recordings were amplified (Dagan 2400A) and band-pass filtered from 0.03 to 6 kHz (Krohn-Hite 3550). Spikes were identified using a time-amplitude window discriminator (BAK Electronics) and timed with 10-μs resolution.

Single fibers were isolated by listening for spikes on a monitor speaker while advancing the electrode through the auditory nerve during periodic stimulation with broadband noise. When a fiber was encountered, a tuning curve was recorded (**Figure 1D**, black line) using an automated procedure that tracked, as a function of stimulus frequency, the minimum SPL of a 50-ms tone required to evoke at least 1 more spike than a subsequent 50-ms silent period (Chintanpalli and Heinz, 2007). CF (**Figure 1D**, cross) was identified as the frequency of best sensitivity or, in noise-overexposed fibers, as the frequency of the breakpoint in the high frequency slope of the tuning curve because this value provides a robust estimate of CF prior to cochlear damage (Liberman, 1984). The bandwidth of the tuning curve 10 dB above threshold was also quantified (**Figure 1D**, gray line). Next, a sequence of 9 broadband Gaussian noise stimuli was presented repeatedly for up to 10 min at 10–15 dB above the threshold for the noise stimulus until approximately 20,000 driven spikes were recorded. Due to noise-induced threshold elevation, the SPL of noise stimuli was on average ∼20 dB higher in noise-overexposed fibers than in unexposed controls (**Figure 2**). Noise stimuli were 1.7 s in duration with a bandwidth of 16.5 kHz and a silent interval between stimuli of 1.2 s.

**FIGURE 1 | Estimation of temporal envelope coding. (A)** 2nd-order Wiener kernel (*h*<sub>2</sub>), computed from 2nd-order cross correlation of a broadband Gaussian noise stimulus and spike train response of a normal-hearing auditory-nerve fiber. Time axes (**A,B,E**) indicate time relative to the occurrence of a spike, and are plotted from a lower limit of 1 ms rather than 0 ms to more clearly show the structure of the kernels. Normalized amplitude color scales **(A–C)** are drawn to the right of each panel. **(B)** Spectro-temporal receptive field (STRF), or mean stimulus spectrogram preceding a spike, calculated from *h*<sub>2</sub>. **(C)** Modulation tuning function, calculated from the STRF. **(D)** Tuning curve showing threshold and CF (black cross) and the 10-dB bandwidth of frequency tuning (gray line). **(E)** Temporal window of sensitivity (gray line), computed as the amplitude envelope of the 1st eigenvector of *h*<sub>2</sub>. Temporal windows were used to calculate the amplitude and latency of envelope coding (black cross) and duration of temporal sensitivity at 50, 40, 30, and 20% of peak amplitude (red dotted lines). **(F)** Temporal modulation transfer function (TMTF, black line), calculated from the modulation tuning function, and TMTF noise floor (gray dotted line). TMTFs were characterized based on roll-off rate (green line) and cut-off modulation frequencies measured 3, 4, 5, and 6 dB down from peak amplitude (red dotted lines).

#### **WIENER-KERNEL COMPUTATIONS**

$h_2$ was computed from 2nd-order cross-correlation between the Gaussian noise stimulus waveform $x(t)$ and the response train of $N \approx 20{,}000$ driven spikes (**Figure 1A**). Only spikes occurring more than 20 ms after stimulus onset and before stimulus offset were included in the cross-correlations, which were calculated with a sampling period of 0.02 ms and maximum time lag $\tau$ of 10.2 ms (512 points) or 20.4 ms (1024 points; for fibers with CF < 3 kHz). The basic computations for $h_2$ have been described previously in detail (van Dijk et al., 1994; Lewis et al., 2002; Recio-Spinoso et al., 2005; Temchin et al., 2005). Briefly, $h_2(\tau_1, \tau_2)$ is calculated as

$$h_2(\tau_1, \tau_2) = \frac{N_0}{2A^2}\left[R_2(\tau_1, \tau_2) - \phi_{xx}(\tau_2 - \tau_1)\right],$$

where $\tau_1$ and $\tau_2$ are time lags, $N_0$ is the mean driven spike rate, $A$ is the instantaneous power of the noise,

$$R_2(\tau_1, \tau_2) = \frac{1}{N}\sum_{i=1}^{N} x(t_i - \tau_1)\,x(t_i - \tau_2)$$

is the 2nd-order reverse-correlation function, and $\phi_{xx}(\tau)$ is the autocorrelation function of the stimulus. So computed, $h_2$ is a matrix with units of $\text{spikes}\cdot\text{s}^{-1}\cdot\text{Pa}^{-2}$ that represents the non-linear interaction or "cross-talk" between the responses to two impulses (Recio-Spinoso et al., 2005).
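The reverse-correlation estimate described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' analysis code: the function name `second_order_kernel`, its arguments, the way the stimulus autocorrelation is estimated from the waveform, and the toy "energy detector" spike generator are all hypothetical, and `n0` and `a` are simply passed in as the scaling terms named in the text.

```python
import numpy as np

def second_order_kernel(x, spike_idx, n_lags, n0, a):
    """Sketch of h2 = N0 / (2 A^2) * [R2 - phi_xx] by 2nd-order reverse correlation.

    x         : stimulus waveform (1-D array)
    spike_idx : sample indices of driven spikes
    n_lags    : number of time lags (tau = 0 .. n_lags-1 samples)
    n0, a     : scaling terms (mean driven rate and noise power term); supplied by the caller
    """
    x = np.asarray(x, dtype=float)
    spike_idx = np.asarray(spike_idx)
    # Keep only spikes that have a full pre-spike stimulus window.
    spike_idx = spike_idx[spike_idx >= n_lags - 1]
    lags = np.arange(n_lags)
    # Row i holds x(t_i - tau) for tau = 0 .. n_lags-1.
    segs = x[spike_idx[:, None] - lags[None, :]]
    # R2(tau1, tau2): mean over spikes of x(t_i - tau1) * x(t_i - tau2).
    r2 = segs.T @ segs / len(spike_idx)
    # Stimulus autocorrelation phi_xx(tau2 - tau1), estimated from the waveform (assumption).
    acf = np.array([np.mean(x[k:] * x[:len(x) - k]) for k in range(n_lags)])
    phi = acf[np.abs(lags[:, None] - lags[None, :])]
    return n0 / (2.0 * a ** 2) * (r2 - phi)

# Toy check with a hypothetical "energy detector": a spike occurs whenever
# the squared stimulus 3 samples earlier exceeds a threshold.
rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)
spikes = np.nonzero(x[:-3] ** 2 > 2.0)[0] + 3
h2 = second_order_kernel(x, spikes, n_lags=8, n0=1.0, a=1.0)
peak = np.unravel_index(np.argmax(h2), h2.shape)
```

For this toy neuron the largest kernel entry should fall on the diagonal at a lag of 3 samples, which is the only stimulus feature driving the spikes.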

In practice, *h*<sub>2</sub> can contain bands running parallel to the diagonal (separated in time by ∼1/CF), reflecting phase locking to the temporal envelope of the stimulus, and perpendicular to the diagonal, reflecting non-linearity in the phase-locked response to the temporal fine structure. In its 2-dimensional Fourier transform, the envelope component of *h*<sub>2</sub> falls into the 2nd and 4th quadrants while the fine structure component falls into quadrants 1 and 3 (Recio-Spinoso et al., 2005). Because our primary focus was envelope coding, we removed the fine structure component from *h*<sub>2</sub> by discarding the contents of the 1st and 3rd quadrants.
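The quadrant-based separation can be illustrated with a short NumPy sketch. It is a minimal example under assumptions that are not in the text: the helper name `keep_envelope_component` is hypothetical, and the treatment of the frequency axes and the Nyquist bins (both discarded here) is a choice made only to keep the output real.

```python
import numpy as np

def keep_envelope_component(h2):
    """Zero quadrants 1 and 3 of the 2-D spectrum of h2, keeping quadrants 2 and 4."""
    spectrum = np.fft.fft2(h2)
    f1 = np.fft.fftfreq(h2.shape[0])[:, None]
    f2 = np.fft.fftfreq(h2.shape[1])[None, :]
    # Quadrants 2 and 4: the two frequencies have opposite signs.
    # Axes (f = 0) and Nyquist bins are discarded as well (assumption).
    mask = (f1 * f2 < 0) & (np.abs(f1) < 0.5) & (np.abs(f2) < 0.5)
    return np.real(np.fft.ifft2(spectrum * mask))

# Toy kernel: bands parallel to the diagonal (a function of tau1 - tau2)
# plus bands perpendicular to it (a function of tau1 + tau2).
n = 64
tau1 = np.arange(n)[:, None]
tau2 = np.arange(n)[None, :]
f = 8 / n  # cycles per sample, chosen to fall exactly on an FFT bin
envelope_part = np.cos(2 * np.pi * f * (tau1 - tau2))
fine_part = np.cos(2 * np.pi * f * (tau1 + tau2))
recovered = keep_envelope_component(envelope_part + fine_part)
```

In this toy case the bands running parallel to the diagonal survive the masking while the perpendicular bands are removed, matching the assignment of envelope and fine-structure components to quadrants given above.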

#### **QUANTIFYING TEMPORAL ENVELOPE CODING**

We quantified the temporal precision of envelope coding using spectral and temporal analysis methods. For the spectral analysis, we computed the STRF as the 1-dimensional Fourier transform of *h*<sub>2</sub> (**Figure 1B**; Lewis and van Dijk, 2004). Next, the modulation tuning function was calculated as the 2-dimensional Fourier transform of the STRF (**Figure 1C**), and collapsed across spectral modulation frequencies to yield a TMTF (**Figure 1F**, black line; as in Woolley et al., 2005). This TMTF describes the temporal profile of the STRF. TMTFs were quantified based on roll-off rate (**Figure 1F**, green line; measured over the region of the function between peak amplitude and 6 dB below the peak) and cut-off modulation frequencies falling −3, −4, −5, and −6 dB from peak amplitude (red dotted lines). A noise floor for each TMTF (**Figure 1F**, gray dotted line) was generated from portions of the STRF occurring before and after the response (i.e., the first 2 ms and last 3 ms of the STRF in **Figure 1B**) using the same computations. Regions of the TMTF rising less than 3 dB above the noise floor were excluded as not statistically significant. In general, slower (less negative) roll-off rates and greater cut-off modulation frequencies correspond to greater temporal precision of envelope coding.
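As a rough illustration of the spectral analysis, the sketch below derives a TMTF from an STRF and reads off a cut-off modulation frequency. Several details are assumptions rather than the published procedure: the function names, collapsing across spectral modulation by summation, expressing amplitude as 20·log10 relative to the peak, and locating the cut-off by linear interpolation. The toy STRFs are hypothetical and use the 0.02 ms sampling period mentioned earlier.

```python
import numpy as np

def tmtf_from_strf(strf, dt):
    """strf: 2-D array (acoustic frequency x time); dt: time step in seconds."""
    mtf = np.abs(np.fft.fft2(strf))            # modulation tuning function (magnitude)
    tmtf = mtf.sum(axis=0)                     # collapse across spectral modulation (assumed: sum)
    fm = np.fft.fftfreq(strf.shape[1], d=dt)   # temporal modulation frequencies in Hz
    keep = fm >= 0
    fm, tmtf = fm[keep], tmtf[keep]
    tmtf = np.maximum(tmtf, np.finfo(float).tiny)  # avoid log of zero
    return fm, 20.0 * np.log10(tmtf / tmtf.max())

def cutoff_frequency(fm, tmtf_db, drop_db):
    """First modulation frequency above the peak where the TMTF is drop_db below peak."""
    peak = int(np.argmax(tmtf_db))
    below = np.nonzero(tmtf_db[peak:] <= -drop_db)[0]
    if below.size == 0:
        return float("nan")
    j = peak + int(below[0])
    # Linear interpolation between the two samples bracketing the crossing.
    f0, f1 = fm[j - 1], fm[j]
    y0, y1 = tmtf_db[j - 1], tmtf_db[j]
    return float(f0 + (-drop_db - y0) * (f1 - f0) / (y1 - y0))

# Toy STRFs that differ only in their temporal width.
dt = 0.02e-3
t = np.arange(512) * dt
f_profile = np.exp(-0.5 * ((np.arange(16) - 8) / 2.0) ** 2)

def toy_strf(sigma_t):
    return np.outer(f_profile, np.exp(-0.5 * ((t - 5e-3) / sigma_t) ** 2))

fm, wide_db = tmtf_from_strf(toy_strf(1.0e-3), dt)
_, narrow_db = tmtf_from_strf(toy_strf(0.5e-3), dt)
cut_wide = cutoff_frequency(fm, wide_db, 3.0)
cut_narrow = cutoff_frequency(fm, narrow_db, 3.0)
```

The temporally narrower toy STRF yields the higher cut-off modulation frequency, which is the sense in which greater cut-off frequencies indicate greater temporal precision.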

For the temporal analysis, we calculated the first eigenvector of *h*<sub>2</sub> as in previous work (**Figure 1E**, black line; Lewis et al., 2002; Recio-Spinoso et al., 2005). The amplitude envelope of this eigenvector (**Figure 1E**, gray line), determined using the Hilbert transform, represents the temporal window within which the fiber responds to acoustic stimulation with an increase in spike rate. For the fiber shown in **Figure 1**, for example, spikes are evoked by acoustic energy falling in a temporal window occurring 2.5–5 ms previously. The noise floor of the temporal window was calculated as the mean amplitude plus 3 standard deviations during the first 2 ms and last 3 ms of the waveform. Portions of the temporal window falling below the noise floor were excluded as not statistically significant. We quantified the duration of the temporal window at amplitude values corresponding to 20, 30, 40, and 50% of peak amplitude (**Figure 1E**, red dotted lines). In general, shorter durations at any given amplitude value reflect greater temporal precision.
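The temporal analysis can be sketched as follows. This is a hedged illustration rather than the authors' code: all function names are hypothetical, the "first" eigenvector is taken to be the one with the largest absolute eigenvalue, the Hilbert envelope is computed through an FFT-based analytic signal, and duration is measured as the contiguous span around the peak that stays above a fraction of peak amplitude.

```python
import numpy as np

def analytic_envelope(v):
    """Amplitude envelope of v via the FFT-based analytic signal (Hilbert transform)."""
    n = len(v)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(v) * weights))

def temporal_window(h2):
    """Envelope of the eigenvector of h2 with the largest absolute eigenvalue."""
    sym = 0.5 * (h2 + h2.T)
    eigvals, eigvecs = np.linalg.eigh(sym)
    return analytic_envelope(eigvecs[:, np.argmax(np.abs(eigvals))])

def noise_floor(env, dt, head=2e-3, tail=3e-3):
    """Mean plus 3 SD over the first 2 ms and last 3 ms of the window."""
    n_head, n_tail = int(round(head / dt)), int(round(tail / dt))
    base = np.concatenate([env[:n_head], env[-n_tail:]])
    return float(base.mean() + 3.0 * base.std())

def window_duration(env, dt, frac):
    """Duration of the contiguous region around the peak at or above frac * peak."""
    peak = int(np.argmax(env))
    thr = frac * env[peak]
    lo = hi = peak
    while lo > 0 and env[lo - 1] >= thr:
        lo -= 1
    while hi < len(env) - 1 and env[hi + 1] >= thr:
        hi += 1
    return (hi - lo + 1) * dt

# Toy kernel built from a Gaussian-windowed 2 kHz tone (hypothetical values).
dt = 0.02e-3
t = np.arange(512) * dt
sigma = 0.5e-3
v = np.exp(-0.5 * ((t - 4e-3) / sigma) ** 2) * np.cos(2 * np.pi * 2000.0 * t)
env = temporal_window(np.outer(v, v))
d50 = window_duration(env, dt, 0.5)
d20 = window_duration(env, dt, 0.2)
floor = noise_floor(env, dt)
```

For a Gaussian window of standard deviation 0.5 ms, the duration at 50% of peak should be close to the full width at half maximum (about 1.18 ms), and the duration at 20% of peak is necessarily longer.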

Finally, we used the amplitude envelope of *h*<sub>2</sub> to quantify the amplitude and latency of temporal envelope coding (**Figure 1E**, black cross). To facilitate comparison of data across fibers with varying driven rates and response threshold levels, we normalized *h*<sub>2</sub> by dividing by *N*<sub>0</sub>*A*. So normalized, the amplitude values relate to a modulation factor of the mean firing at the average SPL of the noise stimulus.

#### **STATISTICAL ANALYSIS**

Statistical analyses were conducted using local regression (LOESS procedure) and mixed models in SAS (MIXED procedure; SAS Institute Inc.). Dependent variables were log10-transformed in all cases except for the analysis of temporal window latency. Local regressions were performed with a smoothing parameter of α = 0.5. In general, mixed model analyses were conducted with two continuous independent variables, log10(CF) and [log10(CF)]<sup>2</sup>. [log10(CF)]<sup>2</sup> was dropped from the model when not statistically significant (*P* > 0.05). Categorical independent variables included hearing status (normal vs. noise-overexposed) and spontaneous rate group (low [≤20 spikes/s] or high). Statistical inferences were drawn based on *F*-tests and *t*-tests comparing least squares means.

## **RESULTS**

#### **PATTERN OF HEARING LOSS**

Acoustic overexposure was associated with increases in both the threshold and 10-dB bandwidth of auditory-nerve fiber tuning curves (**Figure 3**). Threshold elevation and increased tuning bandwidth were most pronounced in fibers with CFs from 1.5 to 4 kHz, but occurred to some extent across the entire CF range studied (0.2–10 kHz).

Patterns of hearing loss were similar between the 2 kHz narrowband and 500 Hz octave band overexposure paradigms (**Figure 3**, red triangles and blue circles, respectively). For example, mean threshold elevations at CFs of 0.5, 1, 2, 4, and 8 kHz were 8, 20, 33, 25, and 25 dB, respectively, for the 2 kHz exposure and 20, 25, 26, 21, and 16 dB, respectively, for the 500 Hz exposure. Temporal response patterns were also similar between the overexposed groups, and are therefore described together.

In basal auditory-nerve fibers with CFs > 2.5 kHz, noise overexposure was associated with a decrease in the proportion of fibers with low (≤20 spikes/s) spontaneous rates (SR; Chi-square = 4.79, *DF* = 1, *P* = 0.029; **Figure 4A**). Furthermore, noise-overexposed basal fibers exhibited an increase in mean driven rate in response to noise stimuli presented at 10–15 dB sensation level [mean difference ±*SE* = 31.9 ± 9.8 spikes/s, *t*(65) = 3.25, *P* = 0.002; **Figure 4B**], suggestive of an increase in the slope of rate-intensity level functions (for broadband noise). More apical fibers with CFs ≤ 2.5 kHz showed no significant variation with noise overexposure in the proportion of low-SR fibers (Chi-square = 0.08, *DF* = 1, *P* = 0.77) or driven firing rate [*t*(123) = −0.06, *P* = 0.95; **Figure 4**].

#### **TEMPORAL MODULATION TRANSFER FUNCTIONS**

TMTFs generated from Wiener-kernel analyses of auditory-nerve fiber responses to broadband Gaussian noise invariably exhibited a roll-off in response amplitude with increasing temporal modulation frequency (**Figure 1F**). In general, roll-off rate was slower (i.e., less negative) in fibers with higher CFs (**Figure 5**), consistent with expectations for the modulation spectra of cochlear-filtered narrowband noises with larger bandwidths (**Figure 3**) (Dau et al., 1999). Notably, the roll-off rate was also slower in noise-overexposed fibers than in normal-hearing control fibers. The mean difference in log10-transformed roll-off rate ±*SE* was −0.149 ± 0.033 [*t*(157) = −4.45, *P* < 0.001], which corresponds to a 29% reduction in roll-off rate in noise-overexposed fibers compared to normal-hearing controls. This pattern is consistent with increased temporal precision with SNHL.

Cut-off modulation frequencies were generally greater in noise-overexposed fibers than in normal-hearing control fibers (**Figure 6**). Mean differences (±*SE*) in log10-transformed cut-off modulation frequencies measured 3, 4, 5, and 6 dB down the TMTF were 0.0741 ± 0.0249 [*t*(154) = 2.98, *P* = 0.003], 0.0833 ± 0.02509 [*t*(136) = 3.32, *P* = 0.001], 0.0989 ± 0.0248 [*t*(110) = 3.99, *P* < 0.001], and 0.1036 ± 0.0270 [*t*(82) = 3.84, *P* < 0.001], respectively. These results are consistent with the analysis of roll-off rate, and correspond to a 20–25% increase in cut-off modulation frequencies with our noise-induced SNHL.
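The percentages quoted for roll-off rate and cut-off frequency appear to follow from back-transforming the mean differences on the log10 scale, since a difference d in log10 units corresponds to a ratio of 10^d. The short sketch below makes that arithmetic explicit; the helper name is hypothetical and this reading of the reported statistics is an assumption.

```python
def log10_diff_to_percent(d):
    """Percent change implied by a mean difference d between log10-transformed values."""
    return (10.0 ** d - 1.0) * 100.0

# Reported mean differences on the log10 scale (taken from the text above).
rolloff_change = log10_diff_to_percent(-0.149)
cutoff_changes = [log10_diff_to_percent(d) for d in (0.0741, 0.0833, 0.0989, 0.1036)]
```

Under this reading, the roll-off difference of −0.149 corresponds to a ratio of about 0.71, i.e., roughly a 29% reduction, and the cut-off differences correspond to increases of roughly 19 to 27%, in line with the rounded range given in the text.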

TMTF parameters were generally similar between subpopulations of auditory-nerve fibers with low and high SRs (**Table 1**; **Figures 5B,C**), except for the −3 dB cut-off modulation frequency, which was moderately greater in the high-SR group (mean difference in log10-transformed cut-off modulation frequency [±*SE*]: 0.0532 ± 0.0254; *t*(152) = 2.09, *P* = 0.038).

rate plotted as a function of CF in normal hearing and noise-overexposed auditory-nerve fibers. Trend lines are based on local regression analyses. At CFs above 2.5 kHz, noise overexposure was associated with an increase in driven rate and decrease in the proportion of low-SR fibers.

#### **TEMPORAL WINDOWS**

Temporal windows of auditory-nerve fiber sensitivity to acoustic stimulation decreased in duration with increasing CF and, moreover, were shorter in noise-overexposed fibers than in normal-hearing controls (**Figure 7**). Mean differences (±*SE*) in log10-transformed window duration measured at 50, 40, 30, and 20% of peak amplitude were −0.0982 ± 0.0189 [*t*(1830) = −5.20, *P* < 0.001], −0.0915 ± 0.0185 [*t*(181) = −4.94, *P* < 0.001], −0.0568 ± 0.0199 [*t*(172) = −2.86, *P* = 0.005], and −0.0537 ± 0.0204 [*t*(153) = −2.63, *P* = 0.009], respectively, consistent with the TMTF analyses. On average, noise-induced SNHL decreased the duration of temporal sensitivity by 10–20%, with greater reductions observed for measurements taken closer to peak amplitude (i.e., 40 and 50% of peak amplitude).

Temporal window duration and changes in temporal window duration with noise overexposure were similar between fibers with low and high SRs (**Table 1**).

**FIGURE 5 | TMTF roll-off rate. (A)** The roll-off rate of TMTFs plotted as a function of CF in normal hearing and noise-overexposed auditory-nerve fibers. Data from subpopulations of fibers with low and high spontaneous rate are shown in panels **(B)** and **(C)**. Trend lines are based on local regression analyses. Roll-off rate was 30% slower (less negative) in noise-overexposed fibers than in normal-hearing controls.

## **AMPLITUDE AND LATENCY OF TEMPORAL ENVELOPE CODING**

Noise-induced SNHL was associated with a moderate increase in the amplitude of envelope coding in fibers with CFs between 1 and 2 kHz (**Figure 8A**). The mean (±*SE*) increase in log10-transformed amplitude was 0.193 ± 0.044 [*t*(60) = 4.38, *P* < 0.001], which corresponds to an increase of 56%. Envelope coding at higher and lower CFs was similar between noise-overexposed fibers and normal-hearing controls [CF > 2 kHz: *t*(73) = −1.91, *P* = 0.06; CF < 1 kHz: *t*(46) = −0.57, *P* = 0.57].

**FIGURE 6 | TMTF cut-off modulation frequencies.** Cut-off modulation frequencies measured 3, 4, 5, and 6 dB down from the peak of the TMTF plotted as a function of CF in normal hearing and noise-overexposed fibers. Modulation cut-offs in dB are marked at the top left of each panel, and trend lines are based on local regression analyses. Cut-off modulation frequencies were 20–25% greater in noise-overexposed fibers than in normal-hearing controls.

**Table 1 | Statistical tests for effects of SR group on temporal precision.**

*DF lists the numerator followed by the denominator degrees of freedom.*

The latency of envelope coding decreased with increasing CF and, notably, was shorter in noise-overexposed fibers than in normal-hearing controls (**Figure 8B**). Across fibers with CFs greater than 0.6 kHz, noise-induced SNHL decreased latency by 0.434 ± 0.061 ms [mean ± *SE*; *t*(174) = −7.08, *P* < 0.001].

#### **DISCUSSION**

The results of the present study show that noise-induced SNHL increases the temporal precision of envelope coding in auditory nerve fibers by 20–30% at equal stimulus sensation level (i.e., 10–15 dB above threshold). Furthermore, SNHL decreases response latency by 0.4 ms and, in fibers with CFs from 1 to 2 kHz, amplifies the representation of envelope structure by 50%.

The increase in temporal precision with SNHL demonstrated here can most likely be attributed to broader cochlear frequency tuning. Broadly tuned systems have a short impulse response that increases sensitivity to rapid temporal envelope modulations of the input stimulus. Interestingly, tuning curve bandwidths of some noise-overexposed fibers were 100–200% greater than in normal-hearing controls, but temporal precision rarely increased by more than 20–30%. This difference suggests that while broader frequency tuning with SNHL may allow an increase in the temporal precision of envelope coding, temporal precision can only increase to a degree before it becomes constrained by additional limiting factors such as neural refractoriness and adaptation. A similar phenomenon appears to occur in the normal-hearing cochlea of cats, where temporal precision increases with increasing tuning bandwidth up to a CF of approximately 10 kHz, above which temporal precision remains constant despite further increases in tuning bandwidth (Joris and Yin, 1992).
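The inverse relationship between tuning bandwidth and impulse-response duration can be illustrated with a minimal sketch. It assumes a gammatone-style envelope, t^(n−1)·exp(−2πbt), as a stand-in for a cochlear filter; this is a common textbook model, not the Wiener-kernel analysis used in the study, and the bandwidth values and the function name `envelope_width_ms` are hypothetical.

```python
import math


def envelope_width_ms(bandwidth_hz, order=4, fraction=0.5, dt=1e-5, t_max=0.05):
    """Duration (ms) over which a gammatone envelope exceeds `fraction` of its peak."""
    n_samples = int(t_max / dt)
    env = [
        (i * dt) ** (order - 1) * math.exp(-2.0 * math.pi * bandwidth_hz * i * dt)
        for i in range(n_samples)
    ]
    peak = max(env)
    # Indices of all samples at or above the chosen fraction of the peak.
    above = [i for i, v in enumerate(env) if v >= fraction * peak]
    return (above[-1] - above[0]) * dt * 1e3


normal = envelope_width_ms(100.0)  # hypothetical bandwidth parameter (Hz)
broad = envelope_width_ms(200.0)   # hypothetical doubled bandwidth (Hz)
print(f"width at 50% of peak: {normal:.2f} ms vs {broad:.2f} ms")
```

In this idealized filter, doubling the bandwidth halves the window measured at 50% of peak. The much smaller 20–30% change observed in the fibers is therefore consistent with additional limiting factors beyond filter bandwidth, as argued above.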

Kale and Heinz (2012) examined the effects of SNHL on the temporal precision of envelope coding using TMTFs generated from auditory-nerve fiber responses to SAM tones. While the −3 dB cut-off modulation frequency showed similar CF dependence between SAM tone-based TMTFs and TMTFs from the current study, SAM tone-based TMTFs failed to show a consistent change in temporal precision with SNHL. Wiener kernel-based TMTFs may be more sensitive to the effects of SNHL because the broadband noise stimulus drives fibers over their entire frequency tuning bandwidth with a dynamically varying temporal envelope. This task may not only be more challenging, but also more representative of the fiber's behavior during processing of perceptually relevant signals such as speech in fluctuating background noise.

While our results show that temporal precision increases with noise-induced SNHL at equal sensation level, it remains unclear how temporal precision might compare between normal-hearing and noise-overexposed auditory-nerve fibers at equal SPL. If temporal precision in the normal-hearing cochlea increases with SPL, as might be expected based on increasing tuning bandwidth with level, temporal precision might converge somewhat between groups at equal SPL. Note, however, that outer hair cell dysfunction causes broader-than-normal frequency tuning up to at least 75 dB SPL (Ruggero and Rich, 1991), suggesting that increased temporal precision with SNHL should persist. Furthermore, previous studies using SAM tone stimuli have shown relatively limited variation in TMTF shape with stimulus level in normal-hearing animals (Joris and Yin, 1992). Greater knowledge of changes in temporal precision with sound level in both normal-hearing and impaired cochleae is a high priority for future research.

It should be noted that CF in noise-overexposed fibers was assigned based on the breakpoint in the high-frequency slope of the tuning curve (Liberman, 1984). While we have no reason to suspect that estimates of CF were biased in this group, significant overestimation would be expected to result in a finding of enhanced temporal precision because temporal precision increases with increasing CF (e.g., see **Figures 5**–**7**).

The decrease observed in the proportion of basal auditory-nerve fibers with low SRs is consistent with previous findings that noise overexposure causes selective degeneration of low-SR fibers. Underrepresentation of low-SR fibers was noted in an earlier study of noise-induced permanent hearing loss in cats (Liberman, 1978). More recent results from a study of guinea pigs suggest that even temporary threshold shifts associated with mild noise overexposure lead to selective degeneration of low-SR fibers (Furman et al., 2013).

While selective loss of low-SR fibers due to SNHL is expected to adversely affect perceptual abilities under real-world listening conditions based on particularly robust coding of signals in noise and high-SPL signals in this group (e.g., Costalupes et al., 1984), underrepresentation of low-SR fibers did not appear to contribute to the differences in temporal precision observed in the present study (see **Table 1**). Differences in temporal precision related to SR might be more prominent at higher SPL. Differences in temporal precision were also not obviously related to mean driven rate. Whereas increases in driven rate were limited to fibers with CFs above 2.5 kHz, changes in temporal precision spanned the entire CF range sampled.

Our finding of amplified envelope coding is consistent with the results of previous physiological work involving SAM tones and single-formant stimuli (Kale and Heinz, 2010). Amplified envelope coding with SNHL may be related to a variety of factors including a reduction in fast-acting cochlear compression with outer hair cell damage. Reduced compression, which has been hypothesized to underlie perceptual "loudness recruitment" or abnormal growth of loudness with increasing SPL, increases the slope of the input-output function of the basilar membrane and therefore leads to larger modulations of basilar membrane velocity (and hence, spike rate) for a given modulation of the stimulus amplitude envelope. Amplified envelope coding with SNHL may also reflect increases in the slope of auditory-nerve fiber rate level functions associated with partial inner hair cell damage (loss of component-1 due to loss of the tallest row of stereocilia; Liberman and Kiang, 1984; Heinz and Young, 2004; Kale and Heinz, 2010; but see **Figure 4**) and changes in auditory-nerve response temporal dynamics (Scheidt et al., 2010).
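The link between reduced compression and amplified envelope coding can be illustrated with a deliberately simplified model: a memoryless power-law input-output function applied to an amplitude-modulated envelope. The exponents below are hypothetical values chosen only to contrast compressive and linear growth; they are not estimates from this study, and the real basilar-membrane nonlinearity is neither memoryless nor a pure power law.

```python
def output_modulation_depth(m_in, exponent):
    """Modulation depth after a memoryless power-law input-output function.

    The input envelope swings between (1 - m_in) and (1 + m_in).
    """
    peak = (1.0 + m_in) ** exponent
    trough = (1.0 - m_in) ** exponent
    return (peak - trough) / (peak + trough)


m_in = 0.2
compressive = output_modulation_depth(m_in, exponent=0.3)  # illustrative compressed growth
linear = output_modulation_depth(m_in, exponent=1.0)       # illustrative loss of compression
print(f"output depth: compressive {compressive:.3f}, linear {linear:.3f}")
```

With compression, the same stimulus modulation produces a shallower modulation at the output; removing compression steepens the input-output slope and restores the full modulation depth, which is the direction of the effect described above.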

Taken together with other physiological data (Kale and Heinz, 2010, 2012), these new results help explain previous behavioral findings of enhanced perceptual salience of envelope structure with SNHL. In individuals with unilateral SNHL, more modulation depth must be applied to 1-kHz tones presented to the unimpaired ear than the impaired ear to evoke a sensation of equal modulation depth (Moore et al., 1996).

The 0.4 ms decrease in response latency with noise-induced SNHL found here is consistent with previous results showing reduced response latency to clicks and tone bursts with SNHL, at least at equal sensation level. Noise overexposure in chinchillas decreases the latency of auditory-nerve fiber onset responses to clicks and tones (Salvi et al., 1979; Scheidt et al., 2010), while kanamycin-induced damage in guinea pigs decreases compound action potential latency (Wang and Dallos, 1972). Similarly, studies employing scalp-recorded auditory evoked potentials have demonstrated reductions in response latency in chinchillas (Henry et al., 2011) and human subjects (Don et al., 1998; Strelcyk et al., 2009). The difference in response latency at equal (high) SPL is not known, but might be smaller in magnitude based on previous findings that the latency of second-order Wiener kernels decreases with increasing level in normal-hearing chinchilla auditory-nerve fibers (Recio-Spinoso et al., 2005).

In conclusion, the changes in envelope coding demonstrated here may contribute to speech perception problems in people with SNHL. Stronger coding of temporal envelope cues and coding of faster envelope modulations may serve as distractions from more relevant cues needed to perceive speech in environments with fluctuating background noise. Several studies have simulated loudness recruitment in normal-hearing listeners to examine the possible effects of enhanced temporal envelope structure on perception of speech in noise. Enhanced envelope structure was shown to increase speech discrimination thresholds by up to 6 dB in steady background noise and by 11–13 dB in single-talker babble (Moore and Glasberg, 1993; Moore et al., 1995). The development of new speech processing strategies aimed at restoring normal-hearing temporal envelope coding in the impaired cochlea may be a promising direction for future research.

#### **AUTHOR CONTRIBUTIONS**

Kenneth S. Henry, Sushrut Kale, and Michael G. Heinz designed the experiments and collected data. Kenneth S. Henry analyzed the data and wrote the manuscript with assistance from Sushrut Kale and Michael G. Heinz.

## **ACKNOWLEDGMENTS**

This research was supported by grants F32-DC012236 and R01-DC009838 from the National Institute on Deafness and Other Communication Disorders.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 16 November 2013; accepted: 26 January 2014; published online: 17 February 2014.*

*Citation: Henry KS, Kale S and Heinz MG (2014) Noise-induced hearing loss increases the temporal precision of complex envelope coding by auditory-nerve fibers. Front. Syst. Neurosci. 8:20. doi: 10.3389/fnsys.2014.00020*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2014 Henry, Kale and Heinz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Unilateral hearing during development: hemispheric specificity in plastic reorganizations

#### *Andrej Kral<sup>1</sup>\*, Silvia Heid<sup>1,2</sup>, Peter Hubka<sup>1</sup> and Jochen Tillein<sup>1,2</sup>*

*<sup>1</sup> Cluster of Excellence, Department of Experimental Otology, Institute of Audioneurotechnology, ENT Clinics, Hannover Medical School, Hannover, Germany <sup>2</sup> Department of Physiology and Otolaryngology, J. W. Goethe University, Frankfurt am Main, Germany*

#### *Edited by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

#### *Reviewed by:*

*Leonard M. Kitzes, University of California, Irvine, USA Martin Pienkowski, Salus University, USA*

#### *\*Correspondence:*

*Andrej Kral, Institute of Audioneurotechnology, Hannover School of Medicine, Feodor-Lynen-Str. 35, D-30625 Hannover, Germany e-mail: kral.andrej@mh-hannover.de*

The present study investigates the hemispheric contributions of neuronal reorganization following early single-sided hearing (unilateral deafness). The experiments were performed on ten cats from our colony of deaf white cats. Two were identified in early hearing screening as unilaterally congenitally deaf. The remaining eight were bilaterally congenitally deaf, unilaterally implanted at different ages with a cochlear implant. Implanted animals were chronically stimulated using a single-channel portable signal processor for two to five months. Microelectrode recordings were performed at the primary auditory cortex under stimulation at the hearing and deaf ear with bilateral cochlear implants. Local field potentials (LFPs) were compared at the cortex ipsilateral and contralateral to the hearing ear. The focus of the study was on the morphology and the onset latency of the LFPs. With respect to morphology of LFPs, pronounced hemisphere-specific effects were observed. Morphology of amplitude-normalized LFPs for stimulation of the deaf and the hearing ear was similar for responses recorded at the same hemisphere. However, when comparisons were performed between the hemispheres, the morphology was more dissimilar even though the same ear was stimulated. This demonstrates hemispheric specificity of some cortical adaptations irrespective of the ear stimulated. The results suggest a specific adaptation process at the hemisphere ipsilateral to the hearing ear, involving specific (down-regulated inhibitory) mechanisms not found in the contralateral hemisphere. Finally, onset latencies revealed that the sensitive period for the cortex ipsilateral to the hearing ear is shorter than that for the contralateral cortex. Unilateral hearing experience leads to a functionally-asymmetric brain with different neuronal reorganizations and different sensitive periods involved.

**Keywords: cochlear implant, plasticity, single-sided deafness, critical periods, development**

## **INTRODUCTION**

In developmental manipulations of the symmetry of auditory input, as occurs with unilateral deafness (Reale et al., 1987; Vale et al., 2004; Langers et al., 2005; Burton et al., 2012; Kral et al., 2013) or asymmetric moderate hearing loss (King et al., 2001; Popescu and Polley, 2010), the hemispheres can be differentiated in respect of the anatomical relationship to the (better) hearing ear. Plastic reorganizations are often reported in the hemisphere contralateral to the hearing ear. However, the ipsilateral cortex also receives asymmetric input and likely participates in behavioral consequences of unilateral hearing. The present study investigates whether the function of the primary field A1 of the ipsilateral and the contralateral cortex differs in unilateral deafness.

The primary auditory cortex contains mainly binaural neurons—neurons responsive to stimulation of only one ear are virtually absent (Zhang et al., 2004). Many different aural interaction patterns have been described in neuronal recordings, even in the same neurons, depending on the exact binaural temporal and intensity relations (Zhang et al., 2004). The majority of auditory neurons has receptive fields covering large portions of the contralateral hemifield (Middlebrooks et al., 1994). Stimulation of one ear most frequently leads to excitation in the neurons of the contralateral cortex but may cause excitation or inhibition in the ipsilateral cortex (Imig and Adrián, 1977). Because of the stronger responses at the contralateral cortex, and because of the shorter latency of the responses at the contralateral cortex, the term "aural dominance" (Imig and Adrián, 1977) or aural preference (Kral et al., 2013) has been introduced. Contralateral "dominance" is the consequence of the cortical representation of the contralateral acoustic hemifield.

Recently, effects of unilateral deafness have attracted clinical interest owing to the predominantly monaural therapy of prelingual deafness with one cochlear implant (Graham et al., 2009; Gordon et al., 2013) and the relatively high incidence of unilateral deafness (Eiserman et al., 2008; Watkin and Baldwin, 2012). Unilateral deafness is now considered an indication for cochlear implantation of the deaf ear, but so far mainly in cases of postlingual deafness due to tinnitus in the deaf ear (Vermeire et al., 2008; Buechner et al., 2010; Firszt et al., 2012). The effects of congenital or early onset of unilateral hearing are less well explored.

Many previous studies have investigated the plasticity of the brain following cochlear implantation and have described, both in humans and in an animal model, sensitive developmental periods for plasticity (review in Kral and Sharma, 2012). When hearing is imbalanced (as caused by early unilateral hearing loss following periods of binaural hearing), the auditory system reorganizes (Bilecen et al., 2000; King et al., 2001; Langers et al., 2005; Popescu and Polley, 2010; Kral et al., 2013). Central reorganizations have been suggested previously in cases including sequential cochlear implantations in children (Peters et al., 2007; Graham et al., 2009; Gordon et al., 2013; Illg et al., 2013). The outcomes for the implantation of the second ear critically depend on the age at implantation of the first ear and the delay between the implantations (Sharma et al., 2007; Graham et al., 2009; Illg et al., 2013). Recently, two studies elucidated the mechanisms behind such phenomena. In the first study, a sensitive period for the reorganization of aural preference at the cortex ipsilateral to the first implanted ear has been demonstrated using electrophysiological methods in cats (Kral et al., 2013). This study uncovered the mechanism behind the critical dependence of the outcome of the second implantation on age at first implantation. The second study confirmed and extended the findings using electroencephalographic recordings in sequentially-implanted children (Gordon et al., 2013).

The present study directly compares the hemispheric effects of unilateral hearing. Congenitally deaf (white) cats were selected from a colony of deaf white cats using an early hearing screening procedure described earlier (Heid et al., 1998). Some animals were implanted with a cochlear implant during early development, so that stimulation onset was possible from 2.5 to 6 months of age, covering the age of synaptic development of deaf cats (**Figure 1**, Kral and Sharma, 2012). Two animals were born unilaterally deaf (Kral et al., 2013) and were used as models of very early asymmetric hearing (single-sided deafness).

The present study demonstrates that reorganizations following unilateral hearing (deafness) show a specificity for the hemisphere. The cortex ipsilateral to the hearing ear demonstrates a functional shift toward the hearing ear (Kral et al., 2013). On the other hand, at the hemisphere contralateral to the hearing ear a reduction of the responses to the deaf ear was observed. Despite some ear-specific effects in onset latency, the morphology of the local field potentials (LFPs) showed more pronounced differences between hemispheres than differences observed on the same hemisphere when comparing the responses to stimulation of the deaf and the hearing ear. This demonstrates hemispheric specificity of the reorganizations. Finally, reorganizations observed in onset latency demonstrated a shorter sensitive period for plasticity in the hemisphere ipsilateral to the hearing ear when compared to the contralateral hemisphere.

#### **MATERIALS AND METHODS**

The present experiments complement a previous study and the methods are described in detail there (Kral et al., 2013). Here, we summarize the most important technical aspects of the method.

#### **ANIMALS**

Experiments were performed on 10 cats. In all animals, hearing was strongly asymmetric (**Table 1**): two animals had normal hearing in one ear (hearing thresholds being <40 dB SPL) and were unilaterally congenitally deaf in the other ear (hearing thresholds >110 dB SPL). The remaining animals were binaurally congenitally deaf cats (CDCs) monaurally implanted with a custom-made cochlear implant at the age of 2.5–6.0 months and subjected to chronic electrostimulation. The implantation ages of unilateral animals are given in **Table 1**. Not all stimulus/recording combinations could be investigated in each animal.

**FIGURE 1 | Onset of unilateral hearing in ten animals (arrows) compared with the developmental time-line of deaf cat auditory system as reflected in the auditory cortex.** The intensity of the color represents the extent of the given change. "Functional synaptogenesis" (development of synaptic function, measured by current source density analyses of evoked responses in the cortex, thus reflecting synaptic counts combined with synaptic function) in binaurally deaf animals has been shown to be delayed by 1.5 months compared to that in hearing animals, with a maximum of evoked synaptic currents around 3 months (Kral et al., 2005, red). Previous studies in congenitally deaf cats demonstrated two sensitive periods: one for reorganization reflected at cortex ipsilateral to hearing ear (green), terminating between 3.5 and 4.2 months (Kral et al., 2013), the other sensitive period for increase in activated volume of tissue ("active area") in the contralateral field A1 (and the corresponding response latencies; black), terminating between 5 and 6 months (Kral et al., 2002; Kral and Sharma, 2012). In the present study, distinction between early and late onset was based on sensitive period of aural preference (Kral et al., 2013), whereas sensitive period for expansion of the active area extends beyond this timeframe. Gray arrows: unilaterally congenitally deaf animals; black arrows: binaurally deaf animals, implanted with a unilateral cochlear implant at different postnatal times.

**Table 1 | Implantations, age at final experiments and investigated cortex for the animals in the study.**

All animals obtained from our colony of deaf white cats underwent hearing screening within the fourth week of life. The screening procedure was based on a longitudinal study of hearing in deaf white cats recorded every two days after birth and is described in detail elsewhere (Heid et al., 1998).

All experiments were approved by the local state authorities and were performed in compliance with the guidelines of the European Community for the care and use of laboratory animals (EU VD 86/609/EEC) and the German Animal Welfare Act (TierSchG).

To investigate developmental plasticity in animals with unilateral hearing, chronic stimulation by a cochlear implant was initiated at two different ages, reflecting the results of previous studies (**Figure 1**):


#### **IMPLANTATION AND CHRONIC STIMULATION**

Implantations were performed under sterile conditions in anesthetized animals as described previously (Kral et al., 2002). Animals were premedicated with 0.25 mg atropine i.p. and anaesthetized with ketamine hydrochloride (24.5 mg/kg Ketavet, Parke-Davis, Germany) and xylazine hydrochloride (1 mg/kg, Bayer, Germany) with supplementary doses when necessary. The animal's status was monitored by capnometry, electroencephalography, electrocardiogram and pulse oximetry. The bulla was exposed via a retrocochlear cutaneous incision and a subsequent soft separation of the muscles overlying it. The bulla was opened using a drill. The membrane of the round window was carefully removed with a sharp hook and the electrode carrier was implanted through the round window. The implant was fixed using a suture (non-absorbable thread) at the dorsal thickened part of the bulla. The bulla was tightly closed using dental acrylic. The contacting leads for the implant were led subcutaneously and additionally secured at the lambdoideal crista. The contact with the processor was transcutaneous in the interscapular line. At the end of the implantation, the electrical thresholds were determined using brainstem-evoked responses. The transcutaneous penetration site was covered with a jacket containing the processor. Animals were treated with ampicillin (100 mg s.c.) for 7 days post-surgery. Three days after the operation the electrical thresholds were behaviorally tested using pinna reflexes (details in Kral et al., 2002). The signal processor was activated after the animal's full recovery (∼7 days after implantation). The moment at which the implant was activated determined the age of onset of asymmetric hearing.

Chronic stimulation was performed using single-channel portable processors with a compressed analogue coding strategy in monopolar stimulation. Stimulation was applied continuously without interruption (on a 24/7 basis; for details see supplementary material in Kral et al., 2013). To exclude the effects of stimulation duration, three animals were stimulated for two months (∼1440 h of implant stimulation), one animal for three months (∼2160 h of implant stimulation) and four animals were stimulated for five months (∼3600 h of implant stimulation). The animals were trained to respond to a brief tone of 732 Hz frequency by picking up a reward at a specified location on 5 days a week (20 stimulus presentations per session). The success rate exceeded 50% after 7–21 days of training in all animals and confirmed that the animals used the acoustic input to actively control behavior.

#### **ACUTE EXPERIMENTS: STIMULATION AND RECORDING**

For acute experiments, all animals were premedicated with 0.25 mg atropine i.p. and initially anaesthetized with ketamine hydrochloride (24.5 mg/kg Ketavet, Parke-Davis, Germany) and propionylpromazine phosphate (2.1 mg/kg Combelen, Bayer, Germany) or xylazine hydrochloride (1 mg/kg, Bayer, Germany). The animals were then tracheotomized and artificially ventilated with 50% O2 and 50% N2O, with a 0.2–1.5% concentration of isoflurane (Lilly, Germany) added to maintain a controlled depth of anesthesia (Kral et al., 1999). Depth of anesthesia was monitored using heart rate, end-tidal CO2, muscle tone and EEG signals. End-tidal CO2 was maintained below 4%. Core temperature was kept above 37.5 °C using a homeothermic blanket. The animals' status was monitored further by blood gas concentration measurements, pH, bicarbonate concentration and base excess, glycaemia and oxygen saturation. A modified Ringer's solution containing bicarbonate (dosage based on the base excess) was infused i.v. The internal state was monitored by testing capillary blood every 12 h.

The animal's head was fixed in a stereotactic holder (Horsley-Clarke). Both bullae and ear canals were exposed. In order to record evoked auditory brainstem responses, a small trephination was drilled at the vertex and a silver-ball electrode (diameter 1 mm) was attached epidurally. Hearing status was tested at the beginning of the experiments. To prevent electrophonic responses, the hair cells in normal-hearing ears were destroyed by intracochlear instillation of 300 μl of 2.5% neomycin sulphate solution over a 5 min period and subsequent rinsing using Ringer's solution. The absence of hearing was subsequently confirmed by the absence of brainstem-evoked responses.

Stimulation in the final acute experiments was performed using cochlear implants inserted bilaterally into the cochleae. In chronically electrically stimulated animals, the chronic implant was used for stimulation at the "hearing" ear. The stimulus was a biphasic pulse (200 μs/phase) applied through the apical-most electrode contact at 10 dB above the lowest cortical threshold (see Kral et al., 2009).

For recording, a trephination above the auditory cortex was performed and the dura was opened. First, within a grid of 3 × 3 positions, LFPs were recorded using low-impedance electrodes to determine the lowest cortical threshold for stimulation with a biphasic pulse (200 μs/phase) applied through the apical-most electrode of the implant. Mapping of cortical responses was performed using glass microelectrodes (*Z* ≈ 6 MΩ) that were moved along the auditory cortex with a micromanipulator (1 μm precision) at the cortical surface. The stimulus was 10 dB above the lowest cortical threshold, the current level at which electrically-evoked LFPs reach the saturation point. By choosing that intensity the whole active volume of the cortex can be determined. The signals were amplified 5000–10,000 times, bandpass filtered (0.01–10 kHz), digitized (at a sampling rate of 25 kHz) and 50 responses were averaged to determine the mean LFP at each recording position. The contralateral cortex was investigated first, followed by the ipsilateral cortex. The tested combinations of recording site and stimulation site are shown in **Table 1**.

#### **DATA PROCESSING**

From the more than 100 recordings at the cortical surface within field A1 and adjacent fields, cortical activation maps were constructed (**Figure 2**).

The electrical artifacts from the stimulation occurred between 0.0 and 0.6 ms post-stimulus and did not influence the response (latency > 7 ms). The signal before the response (500 ms duration in each animal) was characterized by computing its mean and standard deviation. The threshold of mean ± 4 × standard deviation was then used for detecting neuronal responses. The threshold attained absolute values of 10–20 μV.

Using this measure, onset latencies were detected for the first positive response (*Pa* component, **Figure 3**) of the LFPs. For each recording position of the surface maps, these data were determined for crossed and uncrossed stimulation. Normality of the data was tested using the Jarque-Bera test (5% significance level) and if confirmed, a two-tailed *t*-test was carried out; if it failed, a Wilcoxon-Mann-Whitney test (both at 5% significance level) was used. Additionally, for each position a paired difference of the onset latency was computed. The medians were used as population measures since the latency values showed a significantly skewed distribution. Comparisons between the experimental groups were performed by applying the Wilcoxon-Mann-Whitney test (two-tailed, 5% significance level).
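The threshold criterion and onset-latency detection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `onset_latency_ms` and its parameters are hypothetical, only the upper (positive) threshold is used because the text refers to the first positive response, and the sample standard deviation is assumed.

```python
import statistics


def onset_latency_ms(lfp_uv, fs_hz, baseline_ms, artifact_ms=0.6, n_sd=4.0):
    """Latency (ms post-stimulus) of the first positive threshold crossing, or None.

    lfp_uv: samples in microvolts; the stimulus occurs `baseline_ms` into the trace.
    """
    n_base = int(round(baseline_ms * fs_hz / 1000.0))
    baseline = lfp_uv[:n_base]
    # Detection threshold: baseline mean plus n_sd baseline standard deviations.
    threshold = statistics.mean(baseline) + n_sd * statistics.stdev(baseline)
    # Skip the stimulus artifact window before searching for a response.
    start = n_base + int(round(artifact_ms * fs_hz / 1000.0))
    for i in range(start, len(lfp_uv)):
        if lfp_uv[i] > threshold:
            return (i - n_base) * 1000.0 / fs_hz
    return None


# Synthetic trace sampled at 25 kHz: a 10 ms baseline alternating between
# -2 and +2 microvolts (shortened from 500 ms for brevity), followed by a
# response that begins 8 ms after the stimulus.
fs = 25000
baseline = [2.0 if i % 2 else -2.0 for i in range(250)]
post = [0.0] * 200 + [100.0] * 50
latency = onset_latency_ms(baseline + post, fs, baseline_ms=10.0)
no_response = onset_latency_ms(baseline + [0.0] * 100, fs, baseline_ms=10.0)
print(latency, no_response)
```

Returning `None` when no sample exceeds the threshold keeps non-responsive recording positions distinguishable from positions with a measurable latency.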

Peak amplitudes of *Pa* components were determined using an automated procedure (based on the time derivatives of the signals). Amplitudes below 50μV were discarded from the processing to minimize the effect of noise on small amplitude signals.

Some analyses were performed exclusively on six positions within the area with the largest responses (the "hot-spot," Kral et al., 2009). These responses were averaged and used in comparisons. For this, a sliding window of 2 ms was used; comparisons were performed using the two-tailed Wilcoxon-Mann-Whitney test at α = 0.1%, adjusted for multiple comparisons using the false discovery rate procedure (Benjamini and Hochberg, 1995).
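The step-up procedure of Benjamini and Hochberg (1995) can be sketched as follows, assuming that one *p*-value is available per sliding window. The function name and the example *p*-values are hypothetical and serve only to illustrate the correction.

```python
def benjamini_hochberg(p_values, alpha=0.001):
    """Return a list of booleans marking which hypotheses are rejected."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank

    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from four windows, tested at alpha = 0.05 for clarity.
example = benjamini_hochberg([0.02, 0.03, 0.035, 0.9], alpha=0.05)
```

In this example the two smallest *p*-values do not pass their own rank thresholds, but the third one does, so all three are rejected. This step-up behavior is what distinguishes the procedure from a simple per-test cutoff.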

stimulation-recording configurations. In the inset, small loudspeaker indicates hearing side, and green spot the cortex in which recordings are made. Blue and red wires indicate ear stimulated with a cochlear implant. Largest peak amplitudes and peak latencies (color) are shown as a function of recording position. Responses below 50μV were not contralateral to hearing ear. Few responses above 50μV were observed. **(B)** in the same animal, stimulation of hearing ear extensively activated the whole contralateral primary auditory cortex. **(C,D)** at cortex ipsilateral to hearing ear, stimulation of both deaf (left) and hearing (right) ear resulted in strong activation of portions of auditory cortex.

**FIGURE 3 | Effect of stimulation duration (2 and 5 months) on morphology of mean local field potentials (LFPs) from hot spots in two early-implanted animals.** Inset shows configuration: recording was at the ipsilateral cortex (green spot), stimulation of hearing and deaf ear. Response color denotes stimulation site (red: uncrossed stimulation, blue: crossed stimulation); shaded area is the temporally smoothed region of one standard deviation around mean. Gray bar

The morphology of the mean LFPs was compared using the dissimilarity index (DI, Kral et al., 2009):

$$DI = 0.5 \cdot \sum_{T} \left| \frac{\text{LFP}_1(T)}{\sum_{t} |\text{LFP}_1(t)|} - \frac{\text{LFP}_2(T)}{\sum_{t} |\text{LFP}_2(t)|} \right|.$$

This index considers the morphology of the LFPs irrespective of the amplitude. The dissimilarity index can take values between 0 and 1, where 0 represents identical LFPs. Identical but time-reversed signals yield a high dissimilarity index due to the sample-by-sample comparison.
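The dissimilarity index can be computed directly from its definition. The following is a minimal sketch, assuming two LFPs sampled at the same time points; the function name and the toy waveforms are hypothetical.

```python
def dissimilarity_index(lfp1, lfp2):
    """Dissimilarity index of two equally long, equally sampled signals."""
    if len(lfp1) != len(lfp2):
        raise ValueError("signals must have the same number of samples")
    norm1 = sum(abs(v) for v in lfp1)  # sum over t of |LFP1(t)|
    norm2 = sum(abs(v) for v in lfp2)  # sum over t of |LFP2(t)|
    if norm1 == 0 or norm2 == 0:
        raise ValueError("signals must not be identically zero")
    # Sample-by-sample comparison of the amplitude-normalized signals.
    return 0.5 * sum(abs(a / norm1 - b / norm2) for a, b in zip(lfp1, lfp2))


wave = [0.0, 1.0, 3.0, 1.0, 0.0]
scaled = [2.0 * v for v in wave]           # same shape, doubled amplitude
ramp = [0.0, 0.0, 0.0, 1.0, 3.0]
reversed_ramp = list(reversed(ramp))       # identical but time-reversed

di_scaled = dissimilarity_index(wave, scaled)
di_reversed = dissimilarity_index(ramp, reversed_ramp)
```

Because each signal is divided by the sum of its absolute values, a pure change in amplitude leaves the index at 0, whereas the time-reversed ramp, which does not overlap with the original, reaches the maximum of 1.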

## **RESULTS**

The present comparisons concentrated on LFP morphology and onset latency. First, the response maps and morphology of the LFPs are described, followed by onset latency comparisons.

#### **TERMINOLOGY**

In the present paper, the terms *contralateral* and *ipsilateral* are always used with reference to the side of the "chronically" hearing ear. The *hearing ear* may be either normal hearing (in unilaterally congenitally deaf animals) or born deaf and implanted at a later age (in the chronically electrically stimulated animals). The ipsilateral cortex is the one on the same side of the brain as the hearing ear.

On the other hand, the term "*crossed*" is used relative to the side of a given recording or stimulation and always refers to the "opposite" side of the brain. The term "*uncrossed*" refers to the same side as the ear stimulated or the cortex in which recordings were made.

Thus, if the left ear was the hearing ear, and a probe stimulus was presented at the right (i.e., the deaf) ear, the crossed response refers to the response recorded at the left cortex and the uncrossed response refers to that at the right cortex. In this case, the ipsilateral cortex is then the left one and the contralateral cortex is the right one.

#### **SPATIAL AND TEMPORAL CORTICAL RESPONSE PATTERN**

Crossed and uncrossed responses compared at the same cortex tend to result in different LFPs in normal hearing animals, with crossed responses showing larger amplitudes and shorter latencies (Kral et al., 2009). In binaurally deaf animals, this difference was diminished in amplitudes and absent in latencies (Kral et al., 2009). The congenitally unilaterally deaf animal exhibited a remarkable pattern of activity when the hemispheres were compared directly (**Figure 2**). The cortex ipsilateral to the hearing ear showed responses to both the hearing as well as the deaf ear (Kral et al., 2013). The cortex contralateral to the hearing ear, on the other hand, showed very weak responses to the stimulation of the deaf ear following congenital unilateral deafness despite good responsiveness to the hearing ear.

To minimize the effect of different spatial position when comparing responses at the contralateral and the ipsilateral cortex, further analysis was concentrated on hot spots: the areas of largest responses. These were considered as being the cortical representation of the stimulated region in the cochlea and were therefore considered the functionally corresponding sites in the cortex. From a total area of 1.5 mm<sup>2</sup> within the hot spot, all LFPs were averaged (∼6 recording positions; **Figure 3**).

First, the effect of stimulation duration was compared in early-implanted animals (**Figure 3**, animals 3 and 4, **Table 1**): Longer experience with a unilateral cochlear implant increased the amplitudes of the responses in the hot spot, and more so for the hearing ear; however, responses to both the hearing and the deaf ear increased. At longer latencies (component *Pb* and later), more significant differences for the two stimulation sites were observed than at short latencies (**Figure 3**; see also below).

Crossed responses to stimulation of the deaf ear and to stimulation of the hearing ear were compared next (**Figure 4**). For crossed responses the maximum amplitudes were of a similar order of magnitude for both ears, although significant interindividual variability was noted in the maximum potential. However, in the congenitally unilaterally deaf animal, a marked difference in the morphology of the response was observed at longer latencies (**Figure 4**, top). In particular, component *Pb* was well separated from the *Pa* component when the deaf ear was stimulated (**Figure 4** top, red curve), whereas the components appeared fused when the hearing ear was stimulated (**Figure 4** top, blue curve). A morphological difference of this nature was less pronounced in late-implanted animals, particularly since *Pb* could not always be identified (**Figure 4**, lower panel).

blue, for deaf ear and recording at the ipsilateral hemisphere in red, shaded area is the temporally smoothed region of one standard deviation around mean. Gray bars below curve indicate statistical significance of a running two-tailed Wilcoxon-Mann-Whitney test (α = 0.001, corrected by false discovery rate procedure, 2 ms window). Ear-specific effects were observed in both animals; however, regions of statistical significance were larger in the congenital onset case. Maximum amplitudes in both compared animals were within 200–400 μV, as reported previously (Kral et al., 2009).

At the ipsilateral hemisphere early-implanted animals had uncrossed responses larger than crossed ones (**Figure 5A**). This was not observed in the late-implanted animals (**Figure 5C**, comp. Kral et al., 2013). The crossed and uncrossed mean response had similar morphology when considered at the same hemisphere (comparisons within the panels **A–D** in **Figure 5**). At the cortex contralateral to the hearing ear, a marked difference in amplitude of the crossed and uncrossed response was evident in the congenitally unilaterally deaf animal (**Figure 5B**; see also **Figure 2**). Nonetheless, even in this case it was apparent that the LFP morphology within a hemisphere was more similar than between hemispheres. In total, the responses at the same hemisphere (comparisons within panels **Figures 5A–D**) were more similar than those compared between hemispheres (comparisons between panels in **Figures 5A,B** or **Figures 5C,D**), irrespective of the age of onset.

To quantify this observation, the dissimilarity index for the *Pa*/*Pb* complex was computed for different intrahemispheric and interhemispheric comparisons (**Figure 6**). A Kruskal-Wallis test demonstrated a significant effect of the configuration (*p* = 0.01). For the crossed and uncrossed responses at the ipsilateral hemisphere, the LFP morphology was similar (a small *DI* of 0.16 ± 0.08, *n* = 6). However, comparing the crossed response of the contralateral hemisphere to the uncrossed response of the ipsilateral hemisphere resulted in a larger difference (same animals, *DI* = 0.38 ± 0.12, two-tailed Wilcoxon-Mann-Whitney test, *p* = 0.010). Further, crossed responses (compared between hemispheres) showed a high dissimilarity index of similar magnitude to the latter comparison (*DI* = 0.41 ± 0.15, *p* = 0.749). The dissimilarity index at the contralateral hemisphere for the crossed vs. uncrossed comparison was not different from that at the ipsilateral hemisphere (*DI* = 0.16 ± 0.08 vs. 0.16 ± 0.05, *p* = 1.000). Thus, interhemispheric comparisons resulted in larger *DI*s than intrahemispheric comparisons. Taken together, these results demonstrate that the cortical response shows hemispheric specificity in morphology irrespective of the ear that is stimulated. Although the largest *DI*s were found in the congenitally unilaterally deaf animal (**Figures 5A,B**), the correlation with onset age was not significant in *DI* measures (α = 5%).

#### **ONSET LATENCIES**

A previous study demonstrated high plasticity of LFP onset latency and its sensitivity to developmental modifications (Kral et al., 2013). This measure showed lower variance over cortical positions and animals than peak amplitudes, so that interhemispheric comparisons were statistically testable and the recording positions biased the comparisons less. For these reasons, further comparisons were performed using onset latencies of LFPs. In the previous study (Kral et al., 2013), we performed paired comparisons at the ipsilateral cortex; in the present study, we had to perform unpaired comparisons (between hemispheres and animals).

First, the effect of stimulation duration and implantation age on onset latencies of crossed responses was determined at the contralateral hemisphere for stimulation of the hearing ear (**Figure 7**). In animals implanted early (3.5 months), as expected, stimulation duration was associated with a decrease in median onset latency, demonstrating that onset latency does change as a result of the stimulation. Moreover, increasing implantation age decreased the effect of stimulation on contralateral onset latency after 5 months of chronic electrical stimulation, changing from 9.1 ± 1.89 ms (median ± absolute deviation of the median) at 3.5 months to 9.95 ± 1.00 ms at 5.0 months to finally reach 10.5 ± 2.63 ms at 6 months onset (significant increases with increasing age of onset, two-tailed Wilcoxon-Mann-Whitney test at α = 0.05). Therefore, onset latency reflects the amount of hearing experience and age of onset and thus represents a good measure of plastic adaptation caused by the stimulation.

Next, onset latencies of crossed responses were compared between hemispheres in four animals with a stimulation duration of 5 months (**Figure 8**). Here, stimulation of the hearing ear resulted in shorter crossed response latencies than that of the deaf ear. The effect was significant in early-onset animals and disappeared in late-implanted animals. However, when considering the crossed pathway of the deaf ear, the latencies of the congenital and the late-implanted animals were similar and we could not detect any consistent age-of-onset effect. Consequently, with regard to onset latency, the effect of stimulation in crossed responses is confined to the hearing ear; the influence on the crossed response of the deaf ear was weak. Note, however, that the amplitudes were similar for the crossed response of the deaf and the hearing ear (**Figure 4**).

Wilcoxon-Mann-Whitney test, ∗∗*p* < 0.01.

Finally, the onset latencies for the responses to the stimulated (hearing) ear were compared between the cortices (**Figure 9**). This comparison was performed on one congenitally unilaterally deaf and five implanted animals, two with early and three with late implantation. In the animals with early onset of asymmetric hearing, the latencies were short and not different at the contralateral and ipsilateral cortex, with a tendency toward shorter latencies for ipsilateral (uncrossed) response. This is highly unusual, as in all hearing cats the situation was the reverse and significant (Kral et al., 2013). However, in late-implanted animals, the latencies tended to be longer (see **Figure 9**), and in two out of three animals tested, contralateral onset latency was significantly smaller than the ipsilateral onset latency (in the remaining animal it was not significant, but with a tendency toward shorter contralateral response, **Figure 9**). That demonstrates that in the late implantation, there is some profit from the stimulation, but it is more confined to the crossed response. In early implantation, there is benefit to both the uncrossed and crossed response. In this respect, the afferent pathway to the contralateral cortex has a longer sensitive period than the one to the ipsilateral cortex.

To verify this outcome, the difference in medians of uncrossed and crossed responses to the hearing ear was compared between early and late onset of asymmetric hearing (**Figure 10**). In early-onset animals, the uncrossed response showed a shorter onset latency, resulting in a negative mean difference in all three animals, but the difference became positive in all three cases of late implantations (Wilcoxon-Mann-Whitney two-tailed test, *p* = 0.044, **Figure 10**). This further demonstrates that the benefit to the uncrossed response is confined to early-onset asymmetric hearing.

## **DISCUSSION**

The present manuscript describes hemispheric specificity of effects of unilateral hearing following congenital deafness. It demonstrates that in early-onset unilateral hearing, both the ipsilateral and the contralateral hemisphere reorganize and strengthen the responses to stimulation of the hearing ear, giving it an advantage over the deaf ear. In the early-onset animals, the ipsilateral hemisphere responded more strongly to stimulation of the hearing ear. The morphology of the LFPs demonstrated that the reorganizations following unilateral deafness were hemisphere-specific (**Figure 6**).

The uncrossed pathway appeared more susceptible to developmental alteration of hearing balance than the crossed pathway, although both the crossed and uncrossed responses to the hearing ear were changed by the stimulation. In late implantations, the uncrossed response did not show similar reorganization, and it was largely the crossed pathway of the hearing ear that still benefited from stimulation, although less than in early-implanted animals (**Figures 8**, **9**). The uncrossed pathway was more sensitive to age of onset than was the crossed pathway, demonstrating a shorter sensitive period. The mutual relationship between the hearing ear and the hemisphere investigated is critically important when assessing developmental auditory plasticity.

**FIGURE 8 | Onset latency for crossed responses; color denotes configuration.** Blue denotes the hearing ear. In animals with early onset of unilateral hearing, crossed response for hearing ear was significantly shorter than that for the deaf ear (two-tailed Wilcoxon-Mann-Whitney test). This was not the case for late-onset animals. Onset latency for stimulation of deaf ear did not show any systematic dependence on onset of asymmetric hearing, whereas onset latency for hearing ear became longer in cases of late-onset asymmetric hearing. ∗∗∗*p* < 0.001; ∗*p* < 0.05.

#### **METHODOLOGICAL DISCUSSION**

The present study compared the effects of stimulation at both hemispheres. There are several limitations to the present approach that merit discussion. First of all, comparisons between hemispheres preclude pairwise testing. Although the present study used an approach validated by several previous studies performing interindividual comparisons (Klinke et al., 1999; Kral et al., 2002, 2006), the present approach is more limited than the pairwise comparison (Kral et al., 2013). The present experiments took 48 h in most animals, with the possibility of a state change during the procedure. However, even though recordings at the ipsilateral cortex were performed later, the responses to stimulation were not systematically different in hemispheres exposed later than those exposed earlier (e.g., for the hearing ear or the amplitudes of crossed responses; see examples in **Figures 2**, **4** and **5**). Furthermore, onset latencies systematically changed depending on experience and on which ear was stimulated, and not on time of recording (later-exposed cortices showed shorter onset latencies in early-implanted animals and longer ones in the late-implanted animals), so that a change of state during the experiments can be ruled out. In total, we have no indication of any systematic shifts in the general state of the animals during recordings.

Degeneration of spiral ganglion cells is unlikely to have contributed to the findings here, as there was no significant spiral ganglion cell loss in the implanted (basal) region of the cochlea within the first 2 years of life in congenitally deaf cats, and even at an older age there was less degeneration in the basal cochlea (Heid et al., 1998). Moreover, stimulation of the deaf ear in unilateral animals resulted in much smaller responses at the hemisphere contralateral to the hearing ear, but at the ipsilateral cortex the responses to the deaf ear were comparable to the responses to the hearing ear (**Figures 2**, **6**), showing not only the hemispheric specificity of plastic reorganization, but also ruling out any significant influence of peripheral ear-specific effects.

Finally, age at final experiment did not significantly contribute to the present findings. The cortical developmental sequence in deaf and hearing cats with respect to electrically evoked responses terminates at ∼6 months (Kral et al., 2005; for similar data on acoustic stimulation, see Eggermont, 1996; for data on onset latencies, see supplementary material in Kral et al., 2013). Only two of the 10 investigated animals were younger (animals no. 3 and 6 in **Table 1**). Omitting these two animals from the comparisons did not affect any finding of the study. Overall, we can also rule out age at experiment as a confounding factor.

Finally, it has to be considered that the effects measured at the ipsilateral cortex need not necessarily arise in the ipsilateral hemisphere, but may have an origin in the contralateral hemisphere before pathway crossing. For the sake of simplicity, however, we will not complicate the considerations below by including this aspect.

#### **DISCUSSION OF RESULTS**

The present study is well in agreement with previous investigations on the subject. The notion of auditory sensitive periods in neuronal plasticity (review in Kral, 2013) has been established for deaf, cochlear-implanted animals (Kral et al., 2001, 2002, 2013) and cochlear-implanted children (Ponton and Eggermont, 2001; Sharma et al., 2002, 2005), as well as for hearing animals (Zhang et al., 2002; Nakahara et al., 2004; de Villers-Sidani et al., 2008). However, sensitive periods are not observed in all plastic reorganizations in the brain (Noreña et al., 2006; Eggermont, 2013; Pienkowski et al., 2013).

Early neonatal unilateral ablation studies suggested a reorganization of the auditory brain toward the hearing ear (Nordeen et al., 1983; Kitzes and Semple, 1985), although it has not been possible to stimulate the ablated ear and compare it with the hearing ear, and no developmental study has been undertaken.

The reorganization reported here and in previous studies is in accord with results from cochlear-implanted children (Peters et al., 2007; Zeitler et al., 2008; Graham et al., 2009; Gordon et al., 2013; Illg et al., 2013). Early second implantations are important in terms of retaining the potential to reverse the aural preference, whereas the effects were observed after more than 1 year of unilateral use (in early implantations at 1.74 years, Gordon et al., 2013; in single-sided deafness, see Scheffler et al., 1998; Bilecen et al., 2000; Langers et al., 2005; Burton et al., 2012; Maslin et al., 2013). As the greatest effects were observed in congenitally unilaterally deaf animals in the present study, before cortical synaptogenesis has set in in cats (Kral et al., 2013), implantation at ages of less than 1 year of life (peak of synaptogenesis between 1 and 4 years, comparison cat-human in Kral and O'Donoghue, 2010) can be expected to generate substantially larger effects in children than those described to date. It has to be stressed here, however, that although age at first implantation is important for the outcome in pediatric cochlear implantation, some benefit from the second ear is found even in cases of longer interimplant delay (Zeitler et al., 2008; Illg et al., 2013). This corresponds to the present observation that in no case was the response to the deaf ear eliminated completely (see also Kral et al., 2013). Nonetheless, the smaller responses for the deaf ear at the contralateral cortex support the suggestion that the deaf ear is placed at a disadvantage in competition for cortical resources, particularly in individuals with early-onset unilateral hearing (Kral et al., 2013).

Importantly, the outcome shows specificity for the ear that has received input (**Figures 2**, **4**, **8**). Both onset latency (**Figure 8**) and amplitudes (Kral et al., 2013) of the responses appeared different for different ears, but a more detailed analysis demonstrated that the morphology (disregarding amplitudes) is more hemisphere-specific than ear-specific (**Figures 5**, **6**). Nonetheless, a lesser response was observed for the deaf ear, particularly at the contralateral hemisphere in the congenital animal (**Figure 2**). At the ipsilateral hemisphere, a pairwise comparison demonstrated smaller responses for the deaf ear (Kral et al., 2013). Pairwise testing was unfortunately not possible for the present interhemispheric comparisons, so that small differences may have gone unnoticed.

Subcortical reorganization with deafness and cochlear implants has been described before (Snyder et al., 1991; Shepherd et al., 1999; Ryugo et al., 2005; O'Neil et al., 2010). Nevertheless, several studies indicate that cortical plasticity is higher than subcortical (Ma and Suga, 2005; Popescu and Polley, 2010), and that the former plays the controlling role via efferent systems (Ma and Suga, 2005). Interestingly, despite pronounced developmental effects observed in other measures, the dissimilarity index did not show a clear developmental pattern. For the present experiments, this means that it is more the balance of aural inputs that is developmentally modulated and less the way in which the inputs are processed after converging on the same neuronal elements. Further it shows that the plasticity mechanism is similar at all ages, only the extent of the effect fades with increasing age, and fades out faster at the ipsilateral hemisphere. The present study indicates a subcortical site for reorganization following unilateral deafness (before or at the point of binaural convergence).

The uncrossed response latency for stimulation of the hearing ear became shorter than the crossed response latency, so that the difference in medians was negative, but only in the early-implanted animals (**Figures 9**, **10**). This finding could indicate a loss of inhibition at the ipsilateral cortex in early-implanted animals, allowing uncrossed inputs to be pulled to much shorter latency (Zhou et al., 2012). In normal, binaural hearing animals, the uncrossed response evokes inhibition more frequently, whereas the crossed response is more excitatory (Zhang et al., 2004). The present observations can therefore be explained by a specific downregulation of inhibition in unilateral hearing at the hemisphere ipsilateral to the hearing ear (Vale et al., 2004), as it may explain the shorter-latency, larger uncrossed response for the hearing ear, as compared with the longer-latency, smaller uncrossed response for the deaf ear. Different recruitment of inhibition in the ipsilateral and contralateral hemisphere would then also affect the morphology of the LFPs differentially for the different hemispheres. The relatively large drop in onset latency in early-onset animals (uncrossed response in the ipsilateral hemisphere) can be alternatively explained only by an increase in synaptic conduction in many synapses; this, however, fails to explain the hemispheric specificity of the outcomes on LFP morphology. Finally, a downregulation of inhibition in (preferentially) the ipsilateral hemisphere can explain the rapid onset of the effect that is different from the process behind the slower plastic changes in the contralateral hemisphere (**Figure 7**, see also Kral et al., 2002). Previous experiments with unilateral cochlear ablation also suggest that the mechanism underlying ipsilateral adaptations in the midbrain differs from that underlying contralateral adaptations (Vale et al., 2004). Future studies on unit activity in congenitally unilaterally deaf and unilaterally implanted animals may demonstrate this by showing the reduction or absence of suppressive binaural interaction in the ipsilateral hemisphere and its presence in the contralateral hemisphere.

Inhibitory synaptic transmission matures later than excitation (review in Kral et al., 2013) and is likely one of the mechanisms explaining the shorter sensitive period for the uncrossed response. In this sense, the longer onset latency with increasing implantation age at the ipsilateral hemisphere could be related to the fact that in early-implanted animals, the developmental process of inhibitory transmission was not yet finalized when the hearing asymmetry started and could therefore be modulated more by unilateral hearing. In late-implanted animals, the unilateral hearing set in after the development of inhibition had terminated. Plasticity observed before this point is likely to involve changes in both inhibitory and excitatory transmission, whereas later plasticity likely depends more on excitatory synapses, with a smaller contribution of inhibitory synapses. That can explain the different sensitive periods at the ipsilateral and contralateral hemisphere.

Stimulation of the hearing ear generates strong responses both at the ipsilateral and the contralateral hemisphere (for human data, see Bilecen et al., 2000; Hanss et al., 2009; Burton et al., 2012; Gordon et al., 2013). Stimulation of the deaf ear activates the crossed pathway but does so only weakly for the uncrossed pathway. The contralateral preference of the deaf ear is strengthened due to reduction of the uncrossed responses, whereas the contralateral preference of the hearing ear is reduced because of strengthening of the uncrossed responses (**Figure 11**). The present findings therefore provide an explanation of the mechanism behind the outcomes of human imaging studies (Bilecen et al., 2000; Gordon et al., 2013).

## **CONCLUSION**

The present study supports the concept of several sensitive developmental periods by demonstrating a shorter sensitive period for reorganization at the ipsilateral hemisphere as compared with the contralateral hemisphere. It shows more extensive changes in uncrossed responses than in crossed responses in early-onset animals. Furthermore, it shows that unilateral deafness results in an asymmetric brain, with different hemispheres showing differential responses for both the deaf and the hearing ear. The hemisphere ipsilateral to the hearing ear most likely downregulates inhibition, thereby specifically decreasing the onset latency of the response to the hearing ear. This effect is not found in the contralateral hemisphere.

The deaf ear is, however, not completely 'disconnected' from the cortex following single-sided deafness. The hemisphere ipsilateral to the hearing ear preserves responsiveness to the deaf ear, although with a preference for the hearing ear. Finally, the present results support a greater 'separation' of the ears in early onset unilateral hearing.

## **ACKNOWLEDGMENTS**

The study was supported by the German Research Foundation (DFG; Cluster of Excellence Hearing4all and DFG Grant Kr 3370/1-3). The authors want to thank Peter Baumhoff, MSc., for designing **Figure 11** and the insets of **Figures 2**–**5**, **8** and **9**.

## **REFERENCES**


**Conflict of Interest Statement:** Dr. Jochen Tillein also works for MedEl Company, Innsbruck. His obligations in the company had no interference with the work, nor is there any direct financial interaction between MedEl and the research performed in this study. The other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 August 2013; accepted: 05 November 2013; published online: 27 November 2013.*

*Citation: Kral A, Heid S, Hubka P and Tillein J (2013) Unilateral hearing during development: hemispheric specificity in plastic reorganizations. Front. Syst. Neurosci. 7:93. doi: 10.3389/fnsys.2013.00093*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2013 Kral, Heid, Hubka and Tillein. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## How age and linguistic competence alter the interplay of perceptual and cognitive factors when listening to conversations in a noisy environment

## *Meital Avivi-Reich, Meredyth Daneman and Bruce A. Schneider\**

*Human Communication Laboratory, Department of Psychology, University of Toronto Mississauga, Mississauga, ON, Canada*

#### *Edited by:*

*Arthur Wingfield, Brandeis University, USA*

#### *Reviewed by:*

*Paul Miller, Brandeis University, USA; Mary Rudner, Linköping University, Sweden; Sophia E. Kramer, VU University Medical Center, Netherlands*

#### *\*Correspondence:*

*Bruce A. Schneider, Human Communication Laboratory, Department of Psychology, University of Toronto Mississauga, Room: CCT 4073, 3359 Mississauga Road N., Mississauga, ON L5L 1C6, Canada e-mail: bruce.schneider@utoronto.ca*

Multi-talker conversations challenge the perceptual and cognitive capabilities of older adults and those listening in their second language (L2). In older adults these difficulties could reflect declines in the auditory, cognitive, or linguistic processes supporting speech comprehension. The tendency of L2 listeners to invoke some of the semantic and syntactic processes from their first language (L1) may interfere with speech comprehension in L2. These challenges might also force them to reorganize the ways in which they perceive and process speech, thereby altering the balance between the contributions of bottom-up vs. top-down processes to speech comprehension. Younger and older L1s as well as young L2s listened to conversations played against a babble background, with or without spatial separation between the talkers and masker, when the spatial positions of the stimuli were specified either by loudspeaker placements (real location), or through use of the precedence effect (virtual location). After listening to a conversation, the participants were asked to answer questions regarding its content. Individual hearing differences were compensated for by creating the same degree of difficulty in identifying individual words in babble. Once compensation was applied, the number of questions correctly answered increased when a real or virtual spatial separation was introduced between babble and talkers. There was no evidence that performance differed between real and virtual locations. The contribution of vocabulary knowledge to dialog comprehension was found to be larger in the virtual conditions than in the real whereas the contribution of reading comprehension skill did not depend on the listening environment but rather differed as a function of age and language proficiency. The results indicate that the acoustic scene and the cognitive and linguistic competencies of listeners modulate how and when top-down resources are engaged in aid of speech comprehension.

**Keywords: age, nonnative listeners, speech comprehension, spatial separation, hearing, multitalker discourse, auditory-cognitive interaction, hearing loss**

#### **INTRODUCTION**

Conversations with friends, co-workers, healthcare providers, and others often occur in noisy environments (e.g., malls, restaurants, stores, offices) in which there are a number of different sound sources that could interfere with one's ability to communicate effectively. In particular, the presence of other talkers, who are not part of the conversation, can be particularly distracting when one is trying to follow a conversation between two or more people. Such multi-talker auditory scenes increase the complexity of both the perceptual and cognitive processes required for comprehension. To effectively follow a multi-talker conversation, the listener needs to perceptually segregate the talkers from one another, efficiently switch attention from one talker to another, keep track of what was said by whom, extract the meaning of each utterance, store this information in memory for future use, integrate incoming information with what each conversational participant has said or done in the past, and draw on the listener's own knowledge of the conversation's topic to extract general themes and ideas (Murphy et al., 2006; Schneider et al., 2010). In other words, fully comprehending what is going on in a conversation requires the smooth and rapid coordination of a number of auditory and cognitive processes. Hence, it is not surprising that people in general find such situations stressful, and that older individuals, whose auditory and cognitive systems may be in decline, and even young, healthy listeners who are operating in their second or third language, find such situations particularly devastating.

To experimentally determine the reasons why people may find it difficult to follow conversations in noisy situations, we need laboratory simulations of ecologically-relevant listening environments and tasks, where we can control and manipulate relevant variables such as the nature and number of competing sound sources, their spatial locations, and the signal-to-noise ratios (SNRs) under which they are presented. By far, most of the studies designed to evaluate the relative contribution of perceptual and cognitive factors in hearing involve simple word or sentence recognition (see review by Humes et al., 2012). In such studies the target stimuli are either words or sentences and the listeners are simply asked to repeat them (e.g., George et al., 2007; Francis, 2010). The ability to repeat target words is taken as indicating speech is understood (Humes and Dubno, 2010). Comprehending what is happening in a conversation, as we have argued above, requires much more than simply being able to repeat the words being spoken. Hence, there is a substantial difference between speech recognition and speech comprehension (Schneider, 2011). A review of studies taking a correlational approach to the relationships among perception, cognition and speech recognition suggests that individual differences in speech recognition cannot be explained fully by the auditory or the cognitive factors that have been considered (age, pure tone thresholds, spectral and temporal processing, intensity coding, and cognitive processing; see Houtgast and Festen, 2008 for a review). Such studies, while they provide important information that may shed light on some of the processes needed for speech recognition (e.g., stream segregation, morpheme identification, lexical access), are limited with respect to their ability to address the role played by the higher-order cognitive processes required to successfully comprehend a conversation (e.g., attention switching, information integration, memory).

Although following a conversation in complex auditory scenes is a challenging task for all listeners, this task seems to be disproportionally harder for older adults (Murphy et al., 2006), and most likely for people listening in their second language (L2). Difficulties experienced by older adults could reflect age-related declines in the auditory, cognitive, and/or linguistic processes supporting spoken language comprehension. Age related changes occurring at different levels of the auditory system (e.g., elevated hearing thresholds, loss of neural synchrony, see Schneider, 1997; Schneider and Pichora-Fuller, 2000 for reviews), may also be accompanied by age-related changes in the cognitive processes related to speech comprehension. The cognitive aging literature notes that there are age-related reductions in the ability to focus attention and inhibit irrelevant sources (Hasher and Zacks, 1988; for a review, see Schneider et al., 2007), as well as evidence suggesting a general slowing of cognitive processing (e.g., Salthouse, 1996). Recent studies show that stream segregation may take a longer time to emerge in older adults than in younger adults in the presence of speech and speech-like maskers (Ben-David et al., 2012). In addition, although linguistic knowledge has been found to be relatively preserved in older age (Burke and Shafto, 2008), it is possible that under stressful and difficult listening situations, older adults may experience reduced capability to utilize their linguistic knowledge and skills in order to enhance speech comprehension because of age-related declines in executive functions (Clarys et al., 2009). Such age-related changes in auditory and cognitive processes may require a reorganization of the way information is processed in the brain. A number of studies have shown that older adults often engage different neural circuitry than that employed by younger adults when performing a task (Harris et al., 2009; Wong et al., 2009). 
Hence it is likely the relative contribution of different auditory and cognitive processes will be affected by aging (Schneider et al., 2010; Wingfield and Tun, 2007). One of the objectives of the current research is to determine how age, linguistic status, and the nature of the auditory scene differentially engage the perceptual and cognitive processes that contribute to speech comprehension.

A couple of relatively recent studies examined age-related changes in speech comprehension using longer and more complex tasks such as monologs and dialogs in order to gain a fuller understanding of the contribution of both the cognitive and perceptual processes involved in speech comprehension, and possible interactions among them. Schneider et al. (2000) asked both younger and older adults to listen to monologs and answer multiple-choice questions regarding their content at the end of each one. The results showed that age-related declines in the ability to process and remember a monolog could be eliminated when individual adjustments are made to compensate for speech recognition thresholds in younger and older listeners. Murphy et al. (2006) adapted Schneider et al.'s methodology to study the ability of younger and older adults to follow two-talker conversations instead of single-talker monologs. They selected a series of engaging one-act plays, each involving dialog between two characters of the same gender. Participants listened to the dialogs either in quiet or in a background of multi-talker babble noise. After listening to a 10–15 min dialog, participants answered a set of 10 multiple-choice questions that tested their comprehension and/or memory of details about the conversation. In Experiments 1–3, the talkers were separated by 9° or 45° azimuth in order to simulate a typical conversation between two talkers who necessarily have different spatial locations. However, in a control experiment, no such spatial separation was present (equivalent to a radio play).
Their results indicated that older adults were able to answer fewer questions than younger adults in this listening-to-conversation task when both age groups were tested under identical stimulus conditions (Experiment 1) and that this age difference could be reduced but not eliminated when the listening situation is adjusted to make it equally difficult for younger and older adults to hear individual words when the two talkers are spatially separated (Experiments 2 and 3). The results of their last experiment (Experiment 4) showed that the age effect could be eliminated when listening situations were individually adjusted and there was no spatial separation present between talkers.

These results provided some evidence that older adults are indeed less skilled than younger adults at extracting and remembering information from a two-person conversation if adjustments have not been made for their poorer speech recognition thresholds. In addition, the consistent age difference, which was found as long as spatial separation between the two talkers was present, even after compensations had been made for the older listeners' deficits in hearing individual words, suggests that older adults might not be able to benefit as much from the full range of acoustic cues available with real spatial separation. A number of studies have shown that the improvement in speech reception thresholds that occurs when targets and maskers are spatially separated rather than co-located is significantly larger for young normal-hearing adults than for older, hearing-impaired adults (Neher et al., 2011). This inability of older and/or hearing-impaired adults to benefit from spatial separation could make it more difficult for them to establish and maintain stream segregation. In turn, increased stream segregation difficulties are likely to reduce the fidelity of the bottom-up, acoustic information, thereby requiring a greater degree of attentional investment at the perceptual level (Neher et al., 2009). As a result, a change in the balance between the contributions of bottom-up vs. top-down processes may occur to compensate for the loss of fidelity in the neural representation of the acoustic signal. It is reasonable to assume that as a result of the changes mentioned, more weight will be given to top-down processes as the use of bottom-up information becomes limited. In particular, when there is interference with the bottom-up, stimulus-driven processes leading to lexical access, the listener may come to depend more on those top-down processes, such as vocabulary knowledge, that could be used to aid lexical access.
Hence we might expect to find the correlation between vocabulary knowledge and speech comprehension to increase as the auditory scene becomes more complex. In addition, the deployment of attentional resources to aid lexical access could make it more difficult for listeners to engage higher-order modality-independent processes (such as those involved in reading comprehension) to help them to understand and retain what they have heard. However, to our knowledge, there have been no attempts to investigate the degree to which listening difficulties alter the relative balance between bottom-up and top-down processing in ecologically-valid listening situations, nor have there been attempts to determine the degree to which the contribution of different levels of processing to speech comprehension in such situations are modulated by the characteristics of the auditory scene, and the age and linguistic competence of the listener.

In order to further explore the sources of the age difference in the ability to comprehend and recall a dialog when two talkers are spatially separated, we used the precedence effect to change the virtual location of each talker. A number of studies have shown that if the same sound is played over two loudspeakers located to the left and right of the listener, with the sound on the left loudspeaker lagging behind that on the right, the listener perceives the sound as emanating from the right and vice versa (e.g., Rakerd et al., 2006). Because the sound is played over both loudspeakers, a virtual spatial separation is achieved without altering the SNR at each ear. Hence a perceived spatial separation can be achieved even though there is a substantial reduction in the auditory cues supporting spatial separation (e.g., no head shadow effect). Moreover, Cranford et al. (1990) have shown that when the precedence effect is used to specify the virtual locations of sounds, both younger and older adults experience the sound as emanating from the side where the leading loudspeaker is positioned. The precedence effect has been used in a number of studies related to informational masking as a way of achieving a perceived (virtual) separation between sound sources without substantially affecting the SNR at each ear of the listener (e.g., Freyman et al., 1999; Li et al., 2004) and it has been shown that both younger and older adults reap the same degree of benefit from perceived separation (Li et al., 2004). However, using the precedence effect to achieve and maintain a virtual spatial separation among sound sources may require that a larger proportion of attentional resources be allocated to stream segregation since the sound sources under the precedence effect are perceived as more diffuse and cannot be precisely located in space. This, in turn, may alter the balance between the top-down and bottom-up processes involved in speech recognition.

Nonnative listeners also constitute a group that experiences considerable difficulty when attempting to follow a conversation in their second language (L2) in the presence of background noise. Nonnative young listeners are not likely to differ from native young listeners using their first language (L1) with respect to basic auditory and cognitive abilities. However, nonnative listeners of a language tend to have lower scores than native listeners on a number of speech-perception measures (Mayo et al., 1997; Bradlow and Pisoni, 1999; Meador et al., 2000; Bradlow and Bent, 2002; Cooke et al., 2008; Rogers and Lopez, 2008; Ezzatian et al., 2010). This difference in performance is influenced by several factors, such as duration of exposure to L2, degree of similarity between L1 and L2, knowledge of the L2 vocabulary and grammatical structure, frequency and extent of L2 use, etc. The acoustic–phonetic characteristics of a second language, which was acquired or learned at a later age than their L1, may not be fully acquired (e.g., Florentine, 1985; Mayo et al., 1997), resulting in a reduced ability to discriminate fine phonemic information, such as phonetic contrasts and phonemic categories which are crucial for successful speech perception (Bradlow and Pisoni, 1999; Meador et al., 2000). In regard to listening in noise, previous literature on young nonnative listeners suggests that nonnative listeners are less able to make use of a language mismatch between masking and target stimuli to facilitate masking release (Brouwer et al., 2012). In the case of nonnative listeners, it is reasonable to assume that the difficulties they experience in L2 environments may be due to the fact their L2 semantic and linguistic processes may not be completely differentiated from the semantic and syntactic processes that are usually invoked when listening in their L1 (Kroll and Steward, 1994).

In the current study we chose to further explore the effect of age and linguistic status on the ability to successfully follow a two-talker conversation (dialog) conducted in a babble background noise with the locations of the sound sources being either virtual or real, and with the two talkers being spatially separated or co-located. Native-English younger and older listeners and young nonnative-English listeners were asked to listen to conversations played against a babble background noise and to answer questions regarding their content. Individual hearing differences were compensated for by creating the same degree of difficulty in identifying individual words in a babble background when there was little or no contextual support for word recognition. Two measures of individual differences in linguistic competence were included: a measure of vocabulary knowledge (the Mill Hill; Raven, 1965) and a measure of reading comprehension skill (the Nelson-Denny; Brown et al., 1981)<sup>1</sup>. If listening difficulty, linguistic status, or age alters the relative contribution of bottom-up and top-down processes, and differentially affects the various stages in the speech processing network, we might expect the degree to which these two measures are correlated with the ability to comprehend and remember dialogs to vary across groups and listening conditions.

<sup>1</sup>These two measures were chosen to tap two different levels of linguistic and cognitive competence. We might expect a vocabulary measure to be related to processes involved with lexical access, whereas reading comprehension would likely tap additional higher-order modality-independent processes involved in language comprehension. Other, more specific cognitive and linguistic skills could also be investigated in future studies.

## **MATERIALS AND METHODS**

#### **PARTICIPANTS**

The participants were 24 normal-hearing younger adults who were native-English listeners (mean age: 21.26 years; *SD*: 3.02), 24 normal-hearing older adults who were native-English listeners (mean age: 69.7 years; *SD*: 4.6), and 24 normal-hearing young adults who were nonnative-English listeners (mean age: 21.04 years; *SD*: 1.71). Native-English listeners were all born and raised in a country in which the primary language was English and were not fluent in any other language at the time of participation. Nonnative-English listeners were those who first became immersed in an English-speaking environment after the age of 14. One older listener, who found the noisy background in the study to be uncomfortable, withdrew from the experiment and had to be replaced. The young participants were volunteers recruited from the students and staff at the University of Toronto Mississauga. The older participants were volunteers from the local community. All participants were asked to complete a questionnaire regarding their general health, hearing, vision, and cognitive status. Only participants who reported that they were in good health and had no history of serious pathology (e.g., stroke, head injury, neurological disease, seizures, and the like) were included. All participants reported having normal or corrected vision and were asked to use their correcting lenses when necessary. None of the participants had any history of hearing disorders, and none used hearing aids. The studies reported here were approved by the Ethics Review Board of the University of Toronto.

During each participant's first session at the lab we measured audiometric thresholds and administered the Nelson-Denny reading comprehension test (Brown et al., 1981) and the Mill Hill test of vocabulary knowledge (Raven, 1965). The dialogs were presented, and the babble thresholds and low-context R-SPIN thresholds were measured, over the next two experimental sessions (three dialogs per session). Tests were administered in a double-walled sound-attenuating chamber. All participants were paid \$10/h for their participation.

#### **HEARING MEASURES**

#### *Audiometric testing*

Pure-tone air-conduction thresholds were measured at nine frequencies (0.25–8 kHz) for both ears using an Interacoustics Model AC5 audiometer (Interacoustics, Assens, Denmark). All younger participants were required to have pure-tone air-conduction thresholds of 15 dB HL or lower between 0.25 and 8 kHz in both ears. Young participants with a threshold of 20 dB HL at a single frequency were not excluded from the study. Older participants were required to have pure-tone thresholds of 25 dB HL or lower from 0.25 to 3 kHz and 35 dB HL or lower for frequencies <6 kHz. Participants who demonstrated unbalanced hearing (more than a 15 dB difference between ears at one or more frequencies) were excluded from participation. Older adults with hearing in the range described are usually considered to have normal hearing for their age. However, it is acknowledged that older adults' hearing changes and deteriorates with age and is not equivalent to that of younger adults (Wingfield et al., 1985; Fitzgibbons and Gordon-Salant, 1996; Wingfield, 1996; Schneider and Pichora-Fuller, 2001; Glyde et al., 2011). The average audiograms for the three groups of participants are shown for the right and the left ears in **Figure 1**. The two groups of young adults had equivalent hearing levels at all frequencies. Hearing levels for older adults were about 7 dB poorer than those of the younger adults at frequencies ≤3 kHz, with the younger-older difference increasing as a function of frequency for frequencies >3 kHz.

#### *Babble threshold test*

The adaptive two-interval forced-choice procedure employed by Schneider et al. (2000) was used to measure individual detection thresholds for the 12-talker babble masker used in this experiment. In this procedure, a 1.5-s babble segment was randomly presented in one of two intervals which were separated by a 1.5-s silent period. Two lights on the button box indicated the occurrence of each interval, and the listener's task was to identify the interval containing the babble segment by pressing the corresponding button. Immediate feedback was provided after each press. We used an adaptive two-down one-up procedure (Levitt, 1971) to determine the babble threshold corresponding to the 79% point on the psychometric function. Two different babble thresholds were determined for each individual. First, a babble threshold was determined when the babble was presented over a single central loudspeaker. The sound levels of the speech signals for the condition in which the voices of both talkers were presented only over the central loudspeaker (no separation, single loudspeaker condition) were individually set to be 45 dB above this babble threshold (sensation level, SL, of 45 dB). A second babble threshold was determined when the babble was played simultaneously over two loudspeakers located 45° to the right and left of the listeners. This babble threshold was used to adjust the intensity of the speech signal to 45 dB SL for the conditions in which voices were presented over both lateral loudspeakers. For a graphic illustration of the two babble conditions see **Figure 2**, first column on the left.
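The transformed up-down tracking rule named above can be sketched as follows. This is a minimal illustration rather than the software used in the study: the class name, the starting level, and the step size are hypothetical, and the number of consecutive correct responses required before the level is lowered (`n_down`) is left as a parameter. Under Levitt's (1971) analysis, an n-down one-up track converges on the level at which the probability of a correct response satisfies p^n = 0.5, which is what `convergence_point` computes.

```python
class UpDownStaircase:
    """Transformed up-down track (n-down, one-up) for a forced-choice task."""

    def __init__(self, start_level_db, step_db, n_down=2):
        self.level_db = start_level_db  # current stimulus level (hypothetical start)
        self.step_db = step_db          # hypothetical fixed step size
        self.n_down = n_down            # correct responses in a row needed to step down
        self.reversals = []             # levels at which the track changed direction
        self._correct_run = 0
        self._last_direction = 0        # -1 = last move was down, +1 = last move was up

    def update(self, correct):
        """Register one trial outcome and return the level for the next trial."""
        direction = 0
        if correct:
            self._correct_run += 1
            if self._correct_run == self.n_down:
                direction = -1          # make the task harder
                self._correct_run = 0
        else:
            direction = +1              # a single error makes the task easier
            self._correct_run = 0

        if direction != 0:
            if self._last_direction != 0 and direction != self._last_direction:
                self.reversals.append(self.level_db)
            self.level_db += direction * self.step_db
            self._last_direction = direction
        return self.level_db


def convergence_point(n_down):
    """Proportion correct targeted by an n-down one-up rule (p ** n_down == 0.5)."""
    return 0.5 ** (1.0 / n_down)
```

For reference, `convergence_point(2)` is approximately 0.707 and `convergence_point(3)` is approximately 0.794.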

#### *R-SPIN test*

In this test participants are asked to immediately repeat the last word of individual sentences presented to them in a multi-talker babble background. As in Schneider et al. (2000) and Murphy et al. (2006), we used the Revised Speech Perception in Noise (R-SPIN) test (Bilger et al., 1984) to individually determine the SNR producing 50% correct identification of final words of low-context sentences (e.g., Jane was thinking about the van.) presented in multi-talker babble under both real and virtual location conditions. In the real no-separation condition, both the babble and R-SPIN sentences were played over the central loudspeaker. In the real separation condition, the babble was presented over the central loudspeaker and the R-SPIN sentences were presented over the right loudspeaker. In the virtual no-separation condition, both the babble and R-SPIN sentences were presented simultaneously over both loudspeakers. In the virtual separation condition, the babble was presented over both lateral loudspeakers simultaneously, with the R-SPIN sentences also presented over both loudspeakers but with the sentences presented over the right loudspeaker leading the sentences presented over the left loudspeaker by 3 ms. Hence R-SPIN thresholds were determined for each of the conditions under which participants were tested. The estimated R-SPIN thresholds were rounded to units of 1 dB before being integrated into the calculation of the individually adjusted SNR which was used for presentation of the dialogs. If the estimated R-SPIN threshold was exactly halfway between two integer values, it was rounded in the direction that made the SNR easier (e.g., 3.5 dB SNR was rounded to 4 dB SNR and −3.5 dB SNR to −3 dB SNR). For a graphic illustration of the four R-SPIN conditions see **Figure 2**, second column from the left.

**FIGURE 2 | The left column specifies the two babble thresholds collected, and under what conditions they were used to adjust the signal level; the middle column specifies the conditions under which R-SPIN thresholds were obtained; and the right column illustrates the four dialog comprehension test conditions in the experiment.** The top row **(A)** specifies the babble, R-SPIN and dialog comprehension scenarios for the real no-separation condition. The three other conditions share the same babble threshold but differ from each other with respect to the R-SPIN and the dialog comprehension tasks. The second row from the top **(B)** illustrates the R-SPIN and dialog comprehension for the real spatial separation condition, the third row **(C)** illustrates the settings for the virtual no-separation and the fourth **(D)** specifies the settings for the virtual spatial separation. Orange stands for R-SPIN, red stands for talker 1, blue for talker 2, and gray for babble.
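The virtual-location stimuli described for the R-SPIN test, in which the same sentence is played over both lateral loudspeakers with one channel leading the other by 3 ms, can be sketched in a few lines. This is an illustration under stated assumptions and not the stimulus-generation code used in the study: the function name, the 48 kHz sample rate used in the example, and the zero-padding of the two channels to equal length are all hypothetical choices.

```python
def precedence_pair(mono, sample_rate_hz, lead_ms=3.0, leading="right"):
    """Return (left, right) channels carrying the same signal, one of them delayed.

    The channel named by `leading` starts immediately; the other starts
    `lead_ms` later. Both are zero-padded to the same length (an assumption).
    """
    if leading not in ("left", "right"):
        raise ValueError("leading must be 'left' or 'right'")
    lag = int(round(sample_rate_hz * lead_ms / 1000.0))  # delay in samples
    pad = [0.0] * lag
    lead_channel = list(mono) + pad   # undelayed copy, padded at the end
    lag_channel = pad + list(mono)    # same samples, shifted later by `lag`
    if leading == "right":
        return lag_channel, lead_channel
    return lead_channel, lag_channel


# Hypothetical three-sample signal at an assumed 48 kHz sample rate.
left, right = precedence_pair([1.0, 0.5, -0.5], 48000, lead_ms=3.0, leading="right")
```

Setting `lead_ms=0` yields two identical channels, which corresponds to playing the same signal simultaneously over both loudspeakers as in the virtual no-separation condition.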

#### *Dialog comprehension task*

Each participant was asked to listen to six dialogs presented in babble noise in a sound-attenuating chamber. These dialogs were created and previously used by Murphy et al. (2006). Each dialog was based on a published one-act play and had only two characters; in three of the six dialogs, the two characters were female, and in the other three, the two characters were male. At the end of each dialog, which was 10–15 min long, the participant was presented with a series of 10 multiple-choice questions regarding the contents of that dialog (for more information regarding the dialogs and the questions see Murphy et al., 2006). There were four conditions in the experiment: (a) Real spatial separation with the babble presented over the central loudspeaker, with the voice of one of the talkers presented over the left loudspeaker, and the voice of the other talker presented over the right loudspeaker; (b) Real, no spatial separation, with the babble, and the voices of the two talkers presented over the central loudspeaker only; (c) Virtual no spatial separation with the babble and two voices being played simultaneously over both lateral loudspeakers; (d) Virtual spatial separation with the babble presented simultaneously over both lateral loudspeakers, and the two voices also presented over both lateral loudspeakers with the perceived location of the two talkers manipulated using the precedence effect (the voice of talker one over the left loudspeaker playing 3 ms in advance of the same voice playing over the right loudspeaker, with the opposite timing arrangement for the second voice). For a graphic illustration of the four dialog comprehension conditions, see **Figure 2**, rightmost column. Half of the participants were tested in conditions a and d (separation conditions), the other half in conditions b and c (no-separation conditions). Hence in this design, virtual vs. real location was a within-subject factor and spatial separation was a between-subjects factor. The participants in each group were randomly assigned to one of the two spatial separation conditions. The dialogs were presented in babble at an SNR which was individually adjusted per participant and condition based on his or her babble threshold and R-SPIN results according to the following calculations:

Dialogs were presented at babble threshold + 45 dB; Babble was presented at babble threshold + 45 dB − R-SPIN threshold + 21 dB.
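The level calculation above, together with the half-way rounding rule given for the R-SPIN thresholds, can be written out as a short worked sketch. Several things here are assumptions: the function names are hypothetical, the example input values are invented for illustration, and the babble-level expression is evaluated left to right exactly as printed. Whether the 21 dB term was meant to be grouped with the R-SPIN threshold cannot be determined from the text, so the sign of that offset should be treated as an assumption of this sketch.

```python
import math


def round_half_toward_easier(threshold_db):
    """Round to the nearest 1 dB; exact halves go up (a higher SNR is easier)."""
    return math.floor(threshold_db + 0.5)


def presentation_levels(babble_threshold_db, rspin_threshold_db,
                        sensation_level_db=45, offset_db=21):
    """Return (dialog_level_db, babble_level_db) for one participant and condition."""
    rspin_db = round_half_toward_easier(rspin_threshold_db)
    dialog_db = babble_threshold_db + sensation_level_db
    # ASSUMPTION: literal left-to-right reading of
    # "babble threshold + 45 dB - R-SPIN threshold + 21 dB".
    babble_db = babble_threshold_db + sensation_level_db - rspin_db + offset_db
    return dialog_db, babble_db


# Hypothetical participant: babble threshold of 10 dB, R-SPIN threshold of 3.5 dB SNR.
dialog_db, babble_db = presentation_levels(10, 3.5)
```

The rounding helper reproduces the two examples given for the R-SPIN thresholds: 3.5 dB becomes 4 dB and −3.5 dB becomes −3 dB.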

At the end of each dialog, participants were asked to answer a set of 10 multiple-choice questions with four alternatives that were also constructed and previously used by Murphy et al. (2006). Each question referred to a specific item of information that was mentioned explicitly only once during the dialog.

#### **LANGUAGE PROFICIENCY MEASURES**

#### *Vocabulary knowledge*

Participants were asked to complete the Mill Hill vocabulary test (Raven, 1965), which is a 20-item synonym test. In this test participants were required to match each test item with its closest synonym from six listed alternatives. No time constraints were applied. The extent of a person's vocabulary represents knowledge that can be used to facilitate word recognition. When listening becomes difficult, and the fidelity of the bottom-up information contributing to word identification becomes questionable, we might expect the role played by top-down knowledge (e.g., the extent of an individual's vocabulary) to increase. Hence we might expect the correlation between vocabulary knowledge and the number of dialog questions correctly answered to increase with listening difficulty.

#### *Reading comprehension skill*

The Nelson-Denny test (Brown et al., 1981) was used to assess the reading comprehension skills of each participant. In this test the participants had to read through a series of eight independent passages and answer multiple-choice questions regarding the content of the passages. This test includes a total of 36 questions and was limited to 20 min. Participants were instructed to complete as many questions as possible within the time given.

The six dialogs were administered over two sessions, three dialogs per session. Sessions were typically 1.5–2 h in duration and were completed within 2 weeks.

#### **RESULTS**

**Table 1** presents the gender breakdown, mean age, educational level, Mill Hill test of vocabulary knowledge and Nelson-Denny test of reading comprehension results for each group. One of the nonnative-English listeners had an R-SPIN threshold of 22 dB SNR in the virtual no-separation condition. Since this value was more than three standard deviations above the mean for that group, it was identified as an outlier and was replaced by the average R-SPIN threshold of the nonnative-English listeners group after excluding the outlier (6 dB SNR)<sup>2</sup>.
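The outlier rule described above can be sketched as follows. This is an illustration only: the function name and the data are hypothetical, and the sketch assumes that the group mean and the sample standard deviation used for the three-standard-deviation criterion are computed over all values in the group, including the candidate outlier, which the text does not specify.

```python
from statistics import mean, stdev


def replace_high_outliers(values, criterion_sd=3.0):
    """Replace values more than `criterion_sd` SDs above the group mean.

    ASSUMPTION: mean and sample SD are computed over the full group,
    candidate outlier included. Flagged values are replaced by the mean
    of the remaining (non-flagged) values.
    """
    group_mean, group_sd = mean(values), stdev(values)
    if group_sd == 0:
        return list(values)  # no spread, so nothing can be flagged
    flagged = [(v - group_mean) / group_sd > criterion_sd for v in values]
    kept = [v for v, is_out in zip(values, flagged) if not is_out]
    fill = mean(kept)
    return [fill if is_out else v for v, is_out in zip(values, flagged)]


# Hypothetical thresholds (dB SNR) for a group of 12 listeners; not study data.
thresholds = [5.0, 6.0, 7.0, 6.0, 5.0, 7.0, 6.0, 6.0, 5.0, 7.0, 6.0, 22.0]
cleaned = replace_high_outliers(thresholds)
```

With small groups, a single extreme value inflates the standard deviation it is judged against, so whether a value crosses the criterion depends on the assumption noted above.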

#### **R-SPIN THRESHOLDS**

**Figure 3** plots the average 50% correct R-SPIN thresholds (dB) as a function of separation condition and group for the native-English younger listeners (dotted rectangles), the native-English older listeners (lined rectangles), and the nonnative-English young listeners (solid rectangles). The SNR levels required for 50% correct repetition of the last word in low-context sentences was highest when there was no spatial separation vs. when there was a separation between the target sentences and the babble background, and were on average lower for real than for virtual locations. The R-SPIN thresholds also appear to be higher for young nonnative-English listeners than for older native-English listeners, who, in turn, had higher thresholds than the younger native listeners. In addition, the advantage due to spatial separation is larger when spatial location is real than when it is virtual. The benefit of separation over no separation on average was larger


*A, A statistically significant difference between Younger*  *B, A statistically significant difference between Younger* 

*C, A statistically significant difference between Older* 

*Native-English*

*Native-English*

*Native-English*

 *and Young* 

 *and Young* 

 *and Older* 

*Native-English*

 *listeners.*

*Nonnative-English*

*Nonnative-English*

 *listeners.*

 *listeners.*

<sup>2</sup>Analyses were done for both the R-SPIN thresholds and the number of correct answers, in which the participant was excluded completely. All the results which were statistically significant in the analyses reported in this paper remained significant and all the results which were statistically insignificant remained as such. We have chosen to replace this one data point with the group average in order to minimize the loss of data.

for the young nonnative-English listeners (5.18 dB) than for both the younger (4.2 dB) and older native-English listeners (2.18 dB).

A repeated measures ANOVA with two separation conditions (yes/no) and three groups (younger and older native-English listeners and young nonnative-English listeners) as between-subjects factors, and with the two types of spatial location (real/virtual) as a within-subject factor, confirmed this description, showing significant main effects of Separation Condition [*F*(1, 66) = 97.613, *p* < 0.001], Group [*F*(2, 66) = 51.539, *p* < 0.001], and Type of Location [real vs. virtual; *F*(1, 66) = 30.451, *p* < 0.001], as well as a two-way Group by Separation interaction [*F*(2, 66) = 3.559, *p* = 0.034] and a Type of Location by Separation interaction [*F*(1, 66) = 46.367, *p* < 0.001]. No other effects were significant. *Post-hoc* tests with Sidak adjustment found that all three groups differed significantly from one another (*p* < 0.005 for all three pairwise comparisons).

To better illustrate the nature of the Separation by Type of Location interaction, **Figure 4** shows the advantage of real vs. virtual location cues (R-SPIN threshold virtual − R-SPIN threshold real) for the two types of separation (target sentence and babble separated vs. co-located). This figure clearly indicates that the advantage of real over virtual location is larger when the target sentence and babble were perceived to be spatially separated than when they were perceived to be co-located.
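The difference score plotted in Figure 4 is simple arithmetic on the condition means. The following sketch shows the computation under stated assumptions: the threshold values are invented for illustration (they are not the study's data), and the dictionary layout and the function name `real_advantage` are hypothetical.

```python
# Hypothetical mean R-SPIN thresholds (dB SNR), keyed by
# (type of location, separation condition). Lower is better.
mean_threshold = {
    ("real", "separated"): -8.0,
    ("virtual", "separated"): -5.0,
    ("real", "co-located"): -2.0,
    ("virtual", "co-located"): -1.5,
}


def real_advantage(thresholds, separation):
    """Advantage of real over virtual cues: virtual minus real threshold."""
    return thresholds[("virtual", separation)] - thresholds[("real", separation)]


adv_separated = real_advantage(mean_threshold, "separated")
adv_colocated = real_advantage(mean_threshold, "co-located")
print(adv_separated, adv_colocated)
```

A positive score means that listeners needed a less favorable SNR with real than with virtual cues; a Separation by Type of Location interaction corresponds to the two scores differing.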

**Figure 5** suggests that the Group by Separation interaction is due to the fact that the benefit due to spatial separation is smaller for older native listeners than for either of the younger groups. An examination of the group by spatial separation interaction for older vs. younger native listeners found the benefit due to spatial separation to be significantly smaller for older natives than for younger natives [Group × Separation interaction: *F*(1, 44) = 6.127, *p* = 0.017]. The Group × Separation interaction was also significant when the two groups were older natives and young nonnatives [*F*(1, 44) = 4.927, *p* = 0.032], but not when young native listeners were compared to young nonnative listeners [*F*(1, 44) < 1]. **Figure 5** also suggests that when there is no separation, the R-SPIN thresholds are equivalent for both younger and

**FIGURE 4 | Average advantage of real vs. virtual location cues (R-SPIN threshold virtual − R-SPIN threshold real) for the two types of separation (target sentences and babble separated vs. co-located).** Standard error bars are shown.


older native listeners. A separate ANOVA on the no-separation condition revealed a significant effect of group [*F*(2, 32) = 38.154, *p* < 0.001], with *post-hoc* tests with Sidak adjustment indicating that the nonnative-English listeners differed significantly from both native groups (*p* < 0.001 in both cases) but that the younger and older native-English listeners did not differ significantly from one another. A comparable analysis of the data from the Separation Condition revealed a significant Group effect [*F*(2, 33) = 20.842, *p* < 0.001], with the *post-hoc* tests with Sidak adjustment revealing that all three groups differed significantly from one another (*p* < 0.03 for all pairwise comparisons).

#### **DIALOG COMPREHENSION RESULTS**

The main finding of interest was that once the SNR levels were adjusted based on the R-SPIN results, only small differences were found between the three groups tested in the number of questions correctly answered (see **Figure 6**). In general, the native-English young listeners seemed to perform slightly better than either the native-English older listeners or the nonnative-English young listeners. However, a 2 within-subject (real vs. virtual) × 2 between-subject (separation vs. no-separation) × 3 between-subject (younger and older native-English listeners, nonnative-English young listeners) ANOVA revealed only a significant main effect of Separation [*F*(1, 66) = 4.671, *p* = 0.034], with performance being better when the voices were separated rather than co-located.

#### **THE CONTRIBUTION OF VOCABULARY KNOWLEDGE AND READING COMPREHENSION TO DIALOG COMPREHENSION**

We explored the degree to which the different levels of the factors (Separation, Group and Type of Location) were differentially associated with individual differences in vocabulary knowledge and in reading comprehension skill. First we examined whether individual differences in vocabulary knowledge were more predictive of the number of dialog comprehension questions answered correctly when the cues to spatial location were real, as opposed to virtual. **Figure 7** plots the percentage of questions correctly answered as a function of vocabulary score separately for the virtual location conditions (upper panel), and the real location conditions (lower panel). In this figure, the vocabulary scores were first centered within each group of participants to normalize the vocabulary scores across the six groups of participants. Percent correct answers were also normalized within each of the 12 conditions in the experiment to eliminate the contribution of any residual effects of conditions on performance. Before conducting these regression analyses, we first removed any effect that individual differences in reading comprehension had on performance (see Appendix A). This allowed us to evaluate the effects of individual differences in vocabulary once the effects of reading comprehension on performance had been removed. **Figure 7** shows that the slope of the line relating percent correct to vocabulary knowledge is considerably higher for virtual location

than real location. A regression analysis (see Appendix A) found vocabulary knowledge to be significantly related to dialog comprehension for the virtual conditions but not for the real location conditions, with the difference in correlation between the two (the interaction between vocabulary and Type of Location) also being significant [*F*(1, 142) = 4.70, *p* = 0.03]. However, similar regression analyses failed to find any evidence that the relationship of vocabulary knowledge to percent correct differed between the no-separation conditions and the separation conditions, or among the three groups of participants (native-English younger listeners, native-English older listeners, and nonnative-English young listeners).
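The two-step logic described above (first remove the effect of reading comprehension on performance, then relate what remains to centered vocabulary scores) can be illustrated with a generic residualization sketch. It is not the procedure of Appendix A, which is not reproduced here: the scores are invented, the helper names `center`, `ols_slope`, and `residualize` are hypothetical, and only the outcome is residualized, mirroring the description in the text.

```python
from statistics import mean


def center(xs):
    """Subtract the mean so that scores are centered (e.g., within a group)."""
    m = mean(xs)
    return [x - m for x in xs]


def ols_slope(x, y):
    """Least-squares slope of y regressed on x (intercept included)."""
    xc, yc = center(x), center(y)
    return sum(a * b for a, b in zip(xc, yc)) / sum(a * a for a in xc)


def residualize(y, covariate):
    """Return centered y with the linear effect of the covariate removed."""
    b = ols_slope(covariate, y)
    yc, cc = center(y), center(covariate)
    return [yi - b * ci for yi, ci in zip(yc, cc)]


# Hypothetical scores for six listeners in one group.
reading = [20.0, 24.0, 28.0, 30.0, 34.0, 38.0]
vocab = [12.0, 15.0, 11.0, 16.0, 14.0, 18.0]
pct_correct = [55.0, 62.0, 58.0, 70.0, 66.0, 78.0]

# Step 1: remove the effect of reading comprehension on performance.
adjusted = residualize(pct_correct, reading)
# Step 2: relate the adjusted scores to centered vocabulary scores.
vocab_slope = ols_slope(center(vocab), adjusted)
print(vocab_slope)
```

Comparing such slopes between the real and virtual location conditions is what an interaction test between vocabulary and Type of Location evaluates.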

A similar analysis was conducted to examine the contribution of reading comprehension skill to dialog comprehension. In particular, before evaluating the contribution of reading comprehension to performance, we first removed any effect that individual differences in vocabulary had on performance. The results of this analysis indicated that the contribution of reading comprehension skill to dialog comprehension did not differ between the virtual and real location conditions, nor did the contribution

differ between the no-separation and separation conditions. However, the contribution of reading comprehension skill to performance on the dialog comprehension task did differ across the three groups. **Figure 8** shows that reading comprehension was positively correlated with performance for the native-English younger listeners only. A regression analysis (see Appendix A) indicated that the strength of the relationship differed among the three groups, with Bonferroni-corrected pairwise comparisons confirming that the slopes differed significantly between young native-English and nonnative-English listeners.

We also examined whether either the vocabulary scores or the reading comprehension scores were related to R-SPIN in each of the 12 conditions (3 Groups × 2 Spatial Locations × 2 Types of Location). Similar analyses to those conducted for the percentage of correctly answered dialog questions failed to find any evidence that either vocabulary knowledge or reading comprehension skill could account for individual differences in the R-SPIN threshold values.

#### **COMPARISON OF CURRENT DATA TO PREVIOUS DATA**

Two of the conditions in the current study replicate two similar conditions which were found in the Murphy et al. (2006) study.

**FIGURE 8 | Percentage of correctly answered dialog questions plotted against the individual performance on the Nelson-Denny reading comprehension test after the contribution of vocabulary to performance had been removed.** Both the adjusted number of questions answered and the Nelson-Denny scores are centered within each group. A least squares regression line is presented for each of the three groups.

In both the Murphy et al. and the current study, younger and older native-English listeners were volunteers recruited from the University Community and local neighborhood community, respectively. No nonnative-English listeners were tested in Murphy et al. The real separation condition in the present study is comparable to the high babble noise condition where the dialogs were presented from loudspeakers at 45° azimuth in Experiment 3 of Murphy et al., and the real no spatial separation condition replicates the condition in which the dialogs were presented once the spatial separation was removed in the last experiment reported (Experiment 4) by Murphy et al. The R-SPIN thresholds and the dialog comprehension results of these two comparable conditions from both studies were analyzed separately using a Univariate Analysis of Variance with Age, Experiment and Separation as between-subjects variables. The results of this analysis revealed a significant main effect of Age [*F*(1, 88) = 4.296, *p* = 0.041] but not of Experiment [*F*(1, 88) = 0.963, *p* = 0.329] nor Separation [*F*(1, 88) = 0.210, *p* = 0.648] on dialog comprehension performance. In addition, there were no significant two- or three-way interactions among these three factors. Hence there is no evidence that the participants in this experiment differed in performance from those in Murphy et al.

A similar analysis, which compared the R-SPIN thresholds calculated under the two comparable conditions in the two studies, showed a significant effect of Age [*F*(1, 88) = 125.94, *p* < 0.001] and Separation [*F*(1, 88) = 504.45, *p* < 0.001]. As for the dialog comprehension performance, no significant effect of Experiment was found [*F*(1, 88) = 0.089, *p* = 0.89]; however, a significant interaction was found between Age and Experiment [*F*(1, 88) = 21.4, *p* = 0.039]. The latter interaction is probably a result of the younger adults in the current study performing slightly worse than the younger adults who participated in Murphy et al. and the older adults in the current study performing slightly better than the older adults who participated in Murphy et al. when there was no spatial separation. Overall, the evidence suggests that the current study successfully replicated the study conducted by Murphy et al. (2006) in regard to the two comparable conditions, and that participants in this study did not differ significantly from the participants in Murphy et al. with respect to their performance in both the dialog comprehension task and the R-SPIN word recognition task in those conditions which were comparable across the two experiments.

### **DISCUSSION**

#### **USING THE R-SPIN RESULTS AS AN INDEX OF SPEECH RECOGNITION DIFFICULTIES**

The R-SPIN results indicate that R-SPIN thresholds are lower for real than for virtual location (see **Figure 3**). This is consistent with the hypothesis that it is more difficult to parse the auditory scene when the location of the sources is virtual rather than real. There is also evidence (see **Figure 4**) that the advantage of real over virtual location of stimuli is considerably larger in the presence of spatial separation. The R-SPIN results also indicate that nonnative-English young listeners and native-English older listeners find it more difficult to recognize words in babble than young native-English listeners (see **Figure 3**). Moreover, the results show that older adults benefit less from spatial separation than do native-English and nonnative-English younger listeners (see **Figure 5**).

The results also show that when the target sentence and babble appear to be co-located, younger and older native-English listeners have similar R-SPIN thresholds that are significantly lower than those of the nonnative-English listeners. However, when the target sentences and the babble are perceived to be separated, younger native-English listeners have significantly lower thresholds than the older native-English listeners, who, in turn, have lower thresholds than the nonnative-English listeners. This suggests that when target and masker are co-located, normal-hearing older and younger native-English listeners do not differ with respect to speech recognition in a background of babble, but that both groups have lower speech-recognition thresholds than do young nonnative-English listeners. However, speech recognition in the presence of spatial separation is better for younger than for older native-English listeners.

Overall, the results of the R-SPIN word recognition task show that when there are cues to spatially separate the target from the masker, word recognition in older native-English listeners and young nonnative-English listeners is inferior to that of younger native-English listeners. When listeners of those two groups attempt to communicate in real-life situations where sound sources are often spatially separated, they will experience greater difficulty with respect to word recognition. In older native-English listeners, this increased difficulty most likely is due to the reduction in the quality of the bottom-up information leading to word recognition. In young nonnative-English listeners, difficulties in word recognition are likely due to the increased difficulty they experience in segregating language streams in their L2 (Cooke et al., 2008; Ezzatian et al., 2010).

#### **DIALOG COMPREHENSION RESULTS**

The dialog comprehension results demonstrated that when R-SPIN thresholds were used to adjust for individual differences in the ability to recognize words without supportive context, no effects due to Type of Location (real vs. virtual), or Group (young native-English listeners, older native-English listeners, young nonnative-English listeners) were found. However, the main effect of Separation was significant even though the SNRs were adjusted based on the R-SPIN thresholds. Listeners on average performed significantly better under spatial separation conditions than when sources were co-located. This suggests that spatial separation (real or perceived) facilitates the comprehension and retention of information obtained from the dialogs even when lexical access is presumably equated across the co-located and spatially separated conditions using the R-SPIN adjustment procedure. For example, spatial separation between the talkers may facilitate the association of the incoming information with a specific talker as well as facilitate its retention. However, it is not possible to determine from the present data the precise mechanisms responsible for this spatial separation effect. One possible reason why the R-SPIN adjustment in the spatial separation condition did not eliminate the Separation effect in the dialogs might be that R-SPIN thresholds were determined for a single voice emanating from the right of the listener with the babble occupying a central location. In the separation condition for the dialogs, however, a voice could be on the left or right depending on who was speaking, with the babble emanating from the center. Hence, spatial separation in the dialog condition could have facilitated switching attention back and forth from the right to left depending on who was speaking. 
The fact that the R-SPIN test did not require the listener to switch attention from one side to the other may explain why it was not successful in eliminating the Separation effect in the dialog portion of this study.

The results from the younger and older adults in the real location conditions of the present study were found to be consistent with the results from the equivalent conditions in the Murphy et al. (2006) study. The age difference found when we combined the real location conditions of these two studies supports the hypothesis that older adults might not be as good as younger adults at using the full range of interaural cues to either obtain or maintain stream segregation when sources are separated in space. Older adults frequently need to communicate in multi-talker daily situations taking place in a noisy environment, and naturally they have to do so without any SNR adjustments to accommodate any individual age-related changes in hearing. The results described here emphasize the notion that in addition to age-related difficulties in word recognition, older adults might have a limited toolbox of acoustic cues to assist them when attempting to meet a speech comprehension challenge in real-life situations (e.g., listening to a movie in a surround sound environment). This reduction in the ability to benefit from the acoustic cues provided by physical separation among sound sources is likely to have even greater implications in the presence of a hearing impairment (Neher et al., 2009).

#### **THE RELATIONSHIP BETWEEN AGE, LINGUISTIC COMPETENCE, AND SPEECH COMPREHENSION PERFORMANCE**

Given the differences in word recognition across groups and acoustic situations found in the current study, we might expect that different processes are differentially engaged in order to compensate for the specific individual deficits in word recognition when listening to dialog. To examine this, we took the two measures of linguistic competence and looked to see the extent to which those measures were correlated with performance under each of the acoustic settings used in the current study. This examination, which was done separately for the R-SPIN results and for the dialog comprehension results, can be used to help identify the relative importance of bottom-up and top-down influences on speech comprehension, and how the pattern of interactions among these factors is modulated by the nature of the acoustic scene.

Specifically, we looked at each of the six groups (12 younger native-English listeners in the separation condition; 12 younger native-English listeners in the no-separation condition; 12 young nonnative-English listeners in the separation condition; 12 young nonnative-English listeners in the no-separation condition; 12 older native-English listeners in the separation condition; and 12 older native-English listeners in the no-separation condition) to determine the contribution of the two linguistic measures to performance (number of questions correctly answered) within each group and condition (see Appendix A for further details). This analysis showed that both the vocabulary and the reading comprehension test results were related to the average number of questions correctly answered (*r* = 0.32, *p* = 0.007, and *r* = 0.38, *p* < 0.001, for reading comprehension skill and vocabulary knowledge, respectively), but that these slopes did not differ across groups. More interestingly, the results of the analysis indicated that there was a significant correlation between vocabulary knowledge and the number of dialog questions correctly answered under the virtual location conditions but not under the real spatial location conditions (see **Figure 7**), and that this interaction between the Type of Location and Vocabulary was significant (the slopes of the lines differed significantly from each other). We would like to suggest a hypothesis which could explain this finding. An early stage in speech comprehension involves obtaining lexical access to the meaning of words. It has been shown that both bottom-up and top-down processes are involved in word recognition. We can hypothesize that the degree to which top-down processes are engaged in lexical access is modulated by acoustic factors. When listening is relatively easy, we might expect successful lexical access with minimal assistance from top-down processes. 
However, when listening becomes difficult, we might expect the top-down processes involved with lexical access to be more fully engaged. When sources are located virtually in space using the precedence effect, they give the impression that their location is diffuse, and there are fewer acoustic cues that can be used to segregate the different acoustic streams than when source location is real (Freyman et al., 1999; Li et al., 2004). Therefore, it is reasonable to expect that under such conditions, obtaining and maintaining stream segregation will be more demanding, and it is possible that the bottom-up processes involved in lexical access will be less reliable because of occasional intrusions from the competing streams. The R-SPIN results indicate that word recognition is indeed more effortful under virtual as opposed to real location conditions (on average, 50% thresholds are 1.56 dB higher for virtual than for real location conditions), and might require a greater engagement of top-down lexical processes in order to maintain word-recognition accuracy. Hence we would expect the relative contribution of top-down processes to lexical access to be greater for virtual than for real spatial location conditions. When there is relatively little need to draw on top-down, knowledge-driven processes to obtain lexical access, we would expect a small or negligible contribution of individual differences in the efficacy of these processes to performance. However, when the draw on top-down processes is heavy, we would expect some of the variance in performance to be accounted for by individual differences in the efficacy of these processes. Hence we might expect the contribution of vocabulary knowledge to dialog comprehension performance to increase with level of listening difficulty, as it appears to do in this experiment.

It is also interesting to speculate on the reasons why the relationship between reading comprehension skill and dialog comprehension performance did not differ between real and virtual location conditions (see Appendix A). One possibility is that the linguistic and cognitive skills tapped by the reading comprehension measure are separate from those involved in lexical access. Bottom-up lexical access in these experiments is obtained exclusively through the auditory channel. We can hypothesize that reading comprehension taps higher-order processes that are modality independent and are related to the integration of information, extraction of themes, etc., and therefore are unlikely to be as affected by parameters of the acoustic scene such as whether the location type is real or virtual. On the other hand, we did find group differences with respect to the relationship between reading comprehension and the number of dialog questions answered correctly (**Figure 8**). Specifically, reading comprehension was found to be positively and significantly correlated with performance only in younger native-English listeners.

The fact that, in young native-English listeners, individual differences in reading comprehension skills account for a significant portion of the variance in the number of dialog questions correctly answered indicates that there are higher-order, modality-independent skills that contribute to both reading and listening comprehension in this population. The lack of correlation in older native-English listeners suggests that listening comprehension in difficult listening situations depends on a different set of modality-specific processes that are engaged to compensate for the loss of fidelity in the auditory processing system. In other words, in younger native-English listeners, listening comprehension is akin to reading comprehension once lexical access has been achieved. In older native-English listeners and in young nonnative-English listeners, the same degree of listening comprehension (the number of dialog questions correctly answered did not differ across groups) appears to be achieved in a different way. This is consistent with the notion that different neural circuitry supports speech comprehension in different populations (e.g., Harris et al., 2009; Wong et al., 2009).

There may be different reasons why reading comprehension appears to contribute little to individual differences in performance in the older native-English listeners and young nonnative-English listeners. The Nelson-Denny reading comprehension test was developed and standardized for younger native-English listeners and might not be as valid when testing other populations such as older adults or nonnative-English speakers. Because the Nelson-Denny is a time-limited task, it might not adequately reflect individual differences in reading comprehension in older adults whose reading speed is substantially slower than that of younger adults (Rodríguez-Aranda, 2003). It is also unlikely to be a good measure of individual differences in reading competence in young nonnative-English adults, either because they too are likely slower readers or because this test does not adequately gauge the linguistic and cognitive skills used by nonnative-English speakers in comprehending written language. In addition, the draw on attentional resources in young nonnative-English speakers may be higher than in younger and older native-English speakers because lexical access in nonnative speakers most likely requires the activation and integration of information from both their L1 and L2 lexicons (Kroll and Stewart, 1994). Moreover, the higher-order tasks involved in listening comprehension by L2 listeners, such as extracting themes, integrating information with past knowledge, and storing this information for future use, may partially be executed in their L1. With respect to older adults, a number of cognitive aging theorists hypothesize that they have a more limited pool of attentional resources than do younger adults (Craik and Byrd, 1982). Alternatively, age-related changes in hearing may place a greater demand on attentional resources in older than in younger adults. 
Either or both of these factors would result in a greater degree of attentional focus within the auditory domain in older native listeners compared to younger native listeners. As a result, speech comprehension in older adults may depend more on processes that are specific to the auditory modality when listening becomes difficult.

To determine whether the failure to find a relationship between reading comprehension and performance in older native-English listeners when listening becomes difficult reflects an increased dependence on modality-specific processes in listening comprehension tasks, we examined the contribution of reading comprehension performance to the number of correctly answered questions when older native-English listeners were asked to perform a similar task under less demanding perceptual conditions. As previously mentioned, in a study conducted by Murphy et al. (2006), both younger and older listeners were asked to listen to the same two-talker conversations under different acoustic settings, one of which was in quiet. We tested the contribution of vocabulary and reading comprehension to the dialog comprehension performance in quiet (see Appendix B for a detailed description of the analysis conducted) and found that the least squares regressions of the adjusted percentage correct scores against reading comprehension for both the younger and older participants were highly significant (see **Figure 9**). Hence, under perceptually easy listening conditions, reading comprehension is as strongly related to performance in older native-English listeners as it is in younger native-English listeners. The results of this analysis are consistent with the hypothesis that the lack of a significant contribution of reading comprehension to performance in older adults in noise reflects an increased dependence on modality-specific processes when listening becomes difficult.

#### **TOWARD A GENERAL MODEL OF RESOURCE ALLOCATION IN SPEECH COMPREHENSION**

The differential contribution of vocabulary to dialog performance under virtual vs. real location conditions suggests that difficult listening conditions require that attentional resources be deployed in aid of scene analysis and word recognition. In addition, the relative weight given to bottom-up and top-down processes contributing to lexical access may be shifted in favor of top-down influences when listening becomes difficult. Previous theories such as the Ease of Language Understanding Model (ELU) have proposed that lexical access is impeded or slowed when listening becomes difficult (Rönnberg et al., 2013). Hence, more attentional and working memory resources are required to support lexical access. The current results suggest that the demand on such resources is modulated by the nature of the acoustic scene, which, in turn, affects the engagement of the more central, modality-independent cognitive resources involved in language comprehension. Let us assume that the virtual location conditions require that additional attentional resources be deployed to locate the diffused sources in space, depleting the pool of resources available for phoneme identification and bottom-up lexical access. The notion here is that with real spatial location, the task of locating the stimuli is easy, whereas virtual localization requires a larger amount of attentional processing. Now consider the problem

facing the executive. When full auditory attentional resources can be devoted to lexical access and the bottom-up acoustic information is reliable and sufficient, the executive will trust the output from bottom-up lexical processing, and give less weight to top-down, knowledge-driven factors such as vocabulary knowledge. However, when lower-level attentional resources are required to locate the sound sources, this additional burden reduces the reliability of the information produced through bottom-up lexical processes. In that case, the executive may place more weight on the top-down processes involved in lexical access to compensate for the missing or corrupted bottom-up information.

The hypothesis presented here suggests that, depending on the acoustic scene, the listening strategy may change the relative engagement of the different processes involved in speech comprehension. The idea that listeners may systematically downplay the contribution of acoustic detail and increase their reliance on lexical-semantic knowledge has been previously suggested by Mattys et al. (2009, 2010) and Mattys and Wiget (2011). Mattys et al. (2009, 2010) and Mattys and Wiget (2011) demonstrated a shift, which they refer to as a cognitive-load-induced "lexical drift," in cases of high cognitive load (CL) due to an additional secondary task even when no actual energetic masking or additional auditory information is involved (e.g., a secondary visual task). For example, Mattys and Wiget (2011) measured the magnitude of lexical bias on phoneme identification under CL and no CL by adding a secondary visual search task to increase CL. Their results suggested that CL interferes with detailed phonetic analysis, which leaves the listener with impoverished encoding of the auditory input and a greater need to rely on lexical knowledge in order to compensate for the missing information. Taken together, the evidence provided by Mattys et al. (2009, 2010), by Mattys and Wiget (2011), and by the current study supports the existence of a dynamic rather than stationary processing strategy which changes depending on the listening situation, and the age and linguistic status of the listener.

Individual differences in top-down lexical knowledge, as indexed by the Mill-Hill vocabulary test, may be expected to account for a greater proportion of the variance in speech comprehension when the accuracy of the bottom-up processes involved in lexical access is compromised by listening difficulty or by age. Hence we would not expect Mill-Hill scores to be related to comprehension in good listening conditions with only a single talker. However, as the listening situation becomes more complex and harder to analyze (competing sound sources, diffused virtual locations rather than compact coherent ones, etc.), it becomes more likely that top-down lexical processes will be engaged and that individual differences in speech comprehension will be related to measures of top-down lexical processing. Note also that the complexity of the listening situation need not affect processes subsequent to word recognition. Hence measures indexing the contribution of higher-order processes involved in language comprehension (e.g., integration of information across talkers and with stored knowledge) might not be affected by the acoustic parameters of the auditory scene.

The behavioral evidence that the nature of the acoustic scene, and the age and linguistic competence of the listener, modify the engagement of different auditory and cognitive processes involved in speech comprehension is consistent with recent findings from brain-imaging studies. These studies demonstrate that the degree to which the different neural networks involved in speech comprehension are activated is modulated by the degree of stimulus complexity, type of task, and age. Previous neuroimaging studies which attempted to map the brain areas involved in speech perception and comprehension demonstrated a frontal-temporal network in which temporal regions subserve bottom-up processes, whereas frontal regions subserve top-down processes (Zekveld et al., 2006). This network seems to be differentially activated depending on the nature of the auditory stimuli and the complexity of the task (Benson et al., 2001; Zekveld et al., 2006). In addition, neural activation seems to be affected not only by the characteristics of the stimuli and task, but also by the characteristics of the listeners. Harris et al. (2009) examined the performance of both younger and older adults on a word recognition task in which the intelligibility of the stimuli was manipulated using low-pass filtering. Their results showed no age differences in the auditory cortex, but differences were found in the anterior cingulate cortex, which is presumed to be associated with attention. Age-related differences were also found in the Wong et al. (2009) study, in which younger and older adults were asked to identify single words in quiet and in two multitalker babble noise conditions (SNR = 20, −5). The fMRI results for older adults showed reduced activation in the auditory cortex but increased activation in the prefrontal and precuneus regions, which are associated with working memory and attention. The increased cortical activities in the general cognitive regions were positively correlated with the behavioral results in the older adults. Wong et al. interpreted this correlation, as well as a more diffused activation involving frontal and ventral brain regions found in the older adults, as an indication of a possible compensatory strategy used in older age. These studies provide evidence for possible age-related changes in the involvement of the different brain regions engaged in speech recognition in noise. As Scott and McGettigan (2013) note in their recent review of the neural processing of masked speech, "Further outstanding challenges will be to identify cortical signatures that are masker specific and that might be recruited for both energetic/modulation masking and informational masking, ... and address the ways that aging affects the perception of masked speech while controlling for intelligibility" (page 65, last paragraph).

In general, then, we would expect the auditory and cognitive processes that are engaged in speech comprehension to be modulated by a number of factors including but not limited to: (1) the complexity of the auditory scene, (2) the nature of the speech material, (3) the task demands placed on the individual, and (4) individual differences in the auditory, linguistic, and cognitive skills and knowledge available to the listener. Future studies are required to examine further how one or more of these factors modulate the contribution of auditory and cognitive processes. It could be interesting, for example, to conduct a similar study using other background noises such as speech spectrum noise or competing conversations, which differ in the levels of energetic and informational masking created, to further explore the effect masking type may have on the involvement of the different processes which support speech comprehension.

#### **ACKNOWLEDGMENTS**

This work was supported by Canadian Institutes of Health Research grants (MOP-15359, TEA-1249) and by a European Research Area in Ageing 2 grant. We would like to thank Jane Carey, Michelle Bae, and James Qi for their assistance in conducting these experiments.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 September 2013; paper pending published: 16 October 2013; accepted: 27 January 2014; published online: 25 February 2014.*

*Citation: Avivi-Reich M, Daneman M and Schneider BA (2014) How age and linguistic competence alter the interplay of perceptual and cognitive factors when listening to conversations in a noisy environment. Front. Syst. Neurosci. 8:21. doi: 10.3389/fnsys.2014.00021*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2014 Avivi-Reich, Daneman and Schneider. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## **APPENDIX A**

#### **EVALUATING THE CONTRIBUTION OF VOCABULARY AND READING COMPREHENSION TO R-SPIN THRESHOLD AND THE AVERAGE PERCENTAGE OF QUESTIONS CORRECTLY ANSWERED IN THE CURRENT STUDY**

We first checked to see whether vocabulary and/or reading comprehension was related to the average percentage of questions correctly answered by the 72 participants in the experiment. In this and the other tests conducted here we centered the vocabulary and reading comprehension scores within each of the six groups by subtracting the mean score of a group from each of the measures in the group. The six groups were: 12 younger native-English listeners in the separation condition; 12 younger native-English listeners in the no-separation condition; 12 young nonnative-English listeners in the separation condition; 12 young nonnative-English listeners in the no-separation condition; 12 older native-English listeners in the separation condition; and 12 older native-English listeners in the no-separation condition. We also centered the average percentage of questions answered correctly in each of these six groups. Centering in this fashion allowed us to evaluate the relative contribution of vocabulary to performance within each group of participants. A regression analysis found significant correlations between vocabulary and performance (*r* = 0.38, *p* < 0.001) and between reading comprehension and performance (*r* = 0.32, *p* = 0.007).
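The centering step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, group labels, and scores are hypothetical toy values chosen so that the effect of within-group centering is easy to see.

```python
from math import sqrt


def center_within_groups(scores, groups):
    """Subtract each group's mean from the scores belonging to that group."""
    totals, counts = {}, {}
    for s, g in zip(scores, groups):
        totals[g] = totals.get(g, 0.0) + s
        counts[g] = counts.get(g, 0) + 1
    means = {g: totals[g] / counts[g] for g in totals}
    return [s - means[g] for s, g in zip(scores, groups)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: two groups of three listeners each.
groups = ["group_a"] * 3 + ["group_b"] * 3
vocab = [10, 12, 14, 16, 18, 20]
percent_correct = [60, 65, 70, 50, 55, 60]

vocab_c = center_within_groups(vocab, groups)
correct_c = center_within_groups(percent_correct, groups)

r_within = pearson_r(vocab_c, correct_c)   # relationship inside groups
r_raw = pearson_r(vocab, percent_correct)  # confounded by group means
```

In this toy example the within-group relationship is positive while the raw correlation is negative, because the groups differ in their mean scores. Centering removes those between-group differences, which is why it isolates the within-group contribution of a measure.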

After finding that both vocabulary and reading comprehension were related to average percentage of questions correctly answered, we then investigated whether the relationship between the two language measures and the dependent variable interacted with one or more of the three factors. Specifically, we tested three hypotheses concerning the slopes of the lines relating these two measures to percent correct. Before conducting this analysis, we centered the percent correct scores within each of the 12 conditions (2-Separations × 3 Groups × 2 Types of Spatial Location) in order to be able to compare the relative contribution of vocabulary and reading comprehension across these 12 conditions. Because both of the measures of language ability and the dependent variable were centered in all conditions and groups, the linear relationship between the individual measures and the dependent variable in a condition was specified by a single parameter, namely the slope of the line relating the measure to performance. Hence, the full model is specified by

$$Y_{i,j,k,m} = A_{i,j,k} V_{i,j,m} + B_{i,j,k} R_{i,j,m} + e_{i,j,k,m} \tag{A1}$$

where $Y_{i,j,k,m}$ is the centered percent correct score in the *i*th level of Separation (voices separate vs. co-located), *j*th Group (younger native-English listeners, older native-English listeners, young nonnative-English listeners), *k*th Type of Location (real vs. virtual), of the *m*th individual in Group *j* and Separation Condition *i*. $V_{i,j,m}$ and $R_{i,j,m}$ are the centered vocabulary and reading scores for the *m*th individual from Group *j* and Separation Condition *i*, respectively, and the $e_{i,j,k,m}$ are assumed to be random normal deviates with mean = 0 and standard deviation = σ.

To test whether the relationships between the vocabulary and reading comprehension measures and the dependent variable were independent of the Type of Location (real vs. virtual), we defined and fit a model in which

$$Y_{i,j,k,m} = A_k V_{i,j,m} + B_k R_{i,j,m} + e_{i,j,k,m}.\tag{A2}$$

This model allows for the slopes relating the two language measures to the dependent variable to differ only between the real and virtual spatial location conditions. The best-fitting least-squares parameters of this model are the $a_k$ and the $b_k$. We then tested the null hypothesis that $A_{k=1} = A_{k=2}$ after adjusting the dependent measure for the estimated contribution of reading comprehension, *R*, to performance. In this adjusted model the dependent variable is $Y_{R,i,j,k,m} = Y_{i,j,k,m} - b_k R_{i,j,m}$. This null hypothesis was rejected [$F(1, 142) = 4.70$, $p = 0.03$], indicating that the relationship between the centered vocabulary scores and the dependent variable differed between the real and virtual spatial location conditions. However, when the dependent variable was adjusted for the contribution of vocabulary to performance ($Y_{V,i,j,k,m} = Y_{i,j,k,m} - a_k V_{i,j,m}$), the null hypothesis that $B_{k=1} = B_{k=2}$ could not be rejected [$F(1, 142) < 1$].
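The text does not spell out how the slope comparison was computed. The sketch below assumes a conventional nested-model *F* test: a model with a separate slope per level is compared against a model with one pooled slope, after the dependent variable has been adjusted for the other language measure. All function names and data values are hypothetical.

```python
def slope_through_origin(y, x):
    """Least-squares slope for y = slope * x (no intercept; data are centered)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def fit_two_slopes(y, v, r):
    """Least-squares (a, b) for y = a*v + b*r, from the 2x2 normal equations."""
    svv = sum(a * a for a in v)
    srr = sum(a * a for a in r)
    svr = sum(a * b for a, b in zip(v, r))
    svy = sum(a * b for a, b in zip(v, y))
    sry = sum(a * b for a, b in zip(r, y))
    det = svv * srr - svr * svr
    return (svy * srr - sry * svr) / det, (sry * svv - svy * svr) / det


def sse(y, x, slope):
    """Sum of squared residuals for y = slope * x."""
    return sum((b - slope * a) ** 2 for a, b in zip(x, y))


def slope_equality_f(levels):
    """levels: one (y_adjusted, x) pair of lists per level of the factor.

    Returns (F, df1, df2) comparing separate slopes with one pooled slope.
    """
    sse_full = sum(sse(y, x, slope_through_origin(y, x)) for y, x in levels)
    all_y = [val for y, _ in levels for val in y]
    all_x = [val for _, x in levels for val in x]
    sse_pooled = sse(all_y, all_x, slope_through_origin(all_y, all_x))
    df1 = len(levels) - 1
    df2 = len(all_y) - len(levels)
    return ((sse_pooled - sse_full) / df1) / (sse_full / df2), df1, df2


# Hypothetical toy data: the same five listeners under two levels of a factor.
v = [-2.0, -1.0, 0.0, 1.0, 2.0]   # centered vocabulary
r = [1.0, -1.0, 0.0, -1.0, 1.0]   # centered reading comprehension
y_by_level = [
    [-1.8, -1.3, 0.1, 0.7, 2.3],
    [-0.3, -0.9, 0.2, -0.4, 1.4],
]

adjusted = []
for y in y_by_level:
    a_k, b_k = fit_two_slopes(y, v, r)
    # Remove the estimated reading contribution before testing vocabulary slopes.
    adjusted.append(([yi - b_k * ri for yi, ri in zip(y, r)], v))

f_stat, df1, df2 = slope_equality_f(adjusted)
```

Under this assumption the denominator degrees of freedom equal the number of observations minus the number of slopes. With 72 participants each contributing a real and a virtual score (144 observations) and two slopes, this gives 142, which agrees with the reported *F*(1, 142).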

To test whether the relationship of both vocabulary and reading comprehension to performance differed between the no-separation and separation conditions, we first averaged over the Within-Subject factor to arrive at a full Between-Subjects model

$$
\overline{Y}_{i,j,m} = A_{i,j} V_{i,j,m} + B_{i,j} R_{i,j,m} + \overline{e}_{i,j,m}.\tag{A3}
$$

We then defined a model in which

$$
\overline{Y}_{i,j,m} = A_i V_{i,j,m} + B_i R_{i,j,m} + \overline{e}_{i,j,m}.\tag{A4}
$$

This model allows for the slope relating the language measures to the dependent variable to differ only between the situations in which the two voices were separated vs. when they were co-located. We then tested the null hypothesis that $A_{i=1} = A_{i=2}$ after correcting for the contribution of reading comprehension to performance in the same fashion as described above. We also tested the null hypothesis that $B_{i=1} = B_{i=2}$ after correcting for the contribution of vocabulary to performance. Neither null hypothesis could be rejected [$F(1, 70) < 1$ for $A_{i=1} = A_{i=2}$, and $F(1, 70) = 1.42$, $p = 0.24$ for $B_{i=1} = B_{i=2}$].

To test whether the contribution of vocabulary and reading comprehension to the dependent variable differed among the three groups, we first averaged over the Within-Subject factor and defined a model in which

$$
\overline{Y}_{i,j,m} = A_j V_{i,j,m} + B_j R_{i,j,m} + \overline{e}_{i,j,m}.\tag{A5}
$$

This model allows for the slopes relating the two language measures to the dependent variable to differ among the three groups (younger native-English listeners, older native-English listeners, young nonnative-English listeners). We then tested the null hypotheses that $A_{j=1} = A_{j=2} = A_{j=3}$ after correcting for reading comprehension, and $B_{j=1} = B_{j=2} = B_{j=3}$ after correcting for vocabulary. The null hypothesis that $A_{j=1} = A_{j=2} = A_{j=3}$ could not be rejected [$F(2, 69) < 1$], but the null hypothesis that $B_{j=1} = B_{j=2} = B_{j=3}$ was rejected [$F(2, 69) = 6.23$, $p < 0.01$]. Because the relationship between the reading comprehension score and percent correct (adjusted for the contribution of vocabulary) differed across the three groups, we tested three subhypotheses: (1) younger native-English listeners' slope = young nonnative-English listeners' slope; (2) younger native-English listeners' slope = older native-English listeners' slope; and (3) young nonnative-English listeners' slope = older native-English listeners' slope. Only the difference between younger native-English listeners' and young nonnative-English listeners' slopes was significant [$F(1, 69) = 12.37$, $p < 0.01$] at the 0.05 level after applying a Bonferroni correction.
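As a worked check of the correction mentioned above, assume the standard Bonferroni rule of dividing the familywise level by the number of comparisons, here the three subhypotheses ($m = 3$):

$$\alpha_{\text{per comparison}} = \frac{\alpha}{m} = \frac{0.05}{3} \approx 0.0167$$

Under this assumption, the reported *p* < 0.01 for the comparison of younger native-English and young nonnative-English listeners falls below the per-comparison threshold. This is consistent with the statement that the difference remained significant at the 0.05 level.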

It should be pointed out that we obtained exactly the same pattern of results when we independently examined the contributions of reading comprehension and vocabulary to performance, that is, without correcting for the effect of the other language variable on performance.

Recall that R-SPIN thresholds were determined in two conditions: (1) when the precedence effect was used to determine the locations of the voices and babble (virtual location), and (2) when the voices and babble were played over individual loudspeakers (real location). We examined whether either the centered vocabulary measures or the centered reading comprehension scores were related to both sets of centered R-SPIN thresholds. In conducting these tests, we eliminated the young nonnative-English individual in the no-separation group whose R-SPIN threshold for the virtual location condition was more than 3 standard deviations above the mean of that individual's group. None of these four correlations approached significance (*p >* 0*.*2 in all four cases).
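The exclusion criterion described above can be sketched as follows. The source does not say whether the standard deviation was computed with or without the candidate value. The sketch assumes the sample standard deviation of the whole group, candidate included, and uses hypothetical values in arbitrary units.

```python
from statistics import mean, stdev


def indices_above_k_sd(values, k=3.0):
    """Indices of values lying more than k sample SDs above the group mean."""
    m = mean(values)
    s = stdev(values)  # sample SD, computed with every value included
    return [i for i, v in enumerate(values) if v > m + k * s]


# Hypothetical group of 12 thresholds with one extreme value (arbitrary units).
group_thresholds = [0.0] * 11 + [100.0]
flagged = indices_above_k_sd(group_thresholds)
```

Under that assumption, in a group of 12 the largest attainable deviation is (n − 1)/√n ≈ 3.18 sample standard deviations. A value can therefore exceed the 3 SD criterion only if it is far removed from the rest of its group.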

## **APPENDIX B**

#### **EVALUATING THE CONTRIBUTION OF VOCABULARY AND READING COMPREHENSION TO PERFORMANCE IN THE QUIET CONDITIONS OF THE Murphy et al. (2006) STUDY**

The same vocabulary and reading comprehension measures taken on the participants in this study were also taken on the 96 participants (48 young, 48 old) in the Murphy et al. (2006) study, which used the same two-person plays. Four experiments (12 younger and 12 older participants in each experiment) were conducted in both quiet and babble. In this analysis, the dependent variable was the percentage of questions answered correctly in the quiet part of all four experiments. The vocabulary, reading comprehension, and number of questions answered correctly were first centered in each of the four experiments for the two age groups. The full model in the analysis of these data was

$$Y_{i,j,m} = A_{i,j} V_{i,j,m} + B_{i,j} R_{i,j,m} + e_{i,j,m} \tag{A6}$$

where $Y_{i,j,m}$ is the centered average percentage of questions correctly answered by the *m*th participant in the quiet conditions for Age Group *j* of Experiment *i*, and *V*, *R*, *A*, *B* and *e* are defined as above. We then defined and fit a model in which the *A* and *B* coefficients were the same for all four experiments and differed only with respect to the Age Group to which the participants belonged. Specifically, we defined and fit the model

$$Y_{i,j,m} = A_j V_{i,j,m} + B_j R_{i,j,m} + e_{i,j,m} \tag{A7}$$

We then tested and failed to reject the null hypothesis that $B_{j=1} = B_{j=2}$ after adjusting for the contribution of vocabulary to performance as described above [$F(1, 94) < 1$]. Least-squares regressions of the adjusted percentage correct scores against reading comprehension for both the younger and older participants were highly significant ($\text{slope}_{\text{young}} = 0.90$, $r_{\text{young}} = 0.47$, $p = 0.0006$; $\text{slope}_{\text{old}} = 0.98$, $r_{\text{old}} = 0.62$, $p < 0.0001$).

After adjusting the centered average percentage of questions correctly answered for the contribution of reading comprehension, there was no evidence of any relationship of the dependent variable to vocabulary ($\text{slope}_{\text{young}} = 0.04$, $r_{\text{young}} = 0.01$, $p > 0.5$; $\text{slope}_{\text{old}} = 0.72$, $r_{\text{old}} = 0.21$, $p > 0.15$).

## Seeing the talker's face supports executive processing of speech in steady state noise

## *Sushmit Mishra<sup>1</sup>\*, Thomas Lunner<sup>1,2,3</sup>, Stefan Stenfelt<sup>1,2</sup>, Jerker Rönnberg<sup>1</sup> and Mary Rudner<sup>1</sup>*

*<sup>1</sup> Linnaeus Centre HEAD, Swedish Institute for Disability Research, Department of Behavioural Sciences and Learning, Linköping University, Linköping, Sweden*

*<sup>2</sup> Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden*

*<sup>3</sup> Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark*

#### *Edited by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

#### *Reviewed by:*

*Esther Janse, Radboud University Nijmegen, Netherlands; Brent P. Spehar, Washington University School of Medicine, USA*

#### *\*Correspondence:*

*Sushmit Mishra, Linnaeus Centre HEAD, Swedish Institute for Disability Research, Department of Behavioural Sciences and Learning, Linköping University, SE-581 83 Linköping, Sweden*

*e-mail: sushmit.mishra@liu.se*

Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition) and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity (WMC). Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills.

**Keywords: cognitive spare capacity, executive processing, working memory, updating, inhibition, speech processing**

#### **INTRODUCTION**

Listening in noise challenges explicit cognitive abilities (Rönnberg et al., 2008, 2013). A substantial body of work has shown that once audibility has been accounted for, individual working memory capacity (WMC) accounts for a large part of the variance in the ability to understand speech in noise (Humes, 2007; Akeroyd, 2008). This is not surprising as speech understanding requires encoding of the speech input for temporary storage, inferring meaning and at the same time preparing for an appropriate response (Pichora-Fuller and Singh, 2006; Rudner and Lunner, 2013). Because WMC is limited, fewer cognitive resources will remain after processing of a message heard in noise compared to one heard in quiet (Pichora-Fuller, 2007; Lunner et al., 2009). This line of argument has sparked recent interest in measuring cognitive spare capacity (CSC), that is, the cognitive capacity that remains once successful listening has taken place (Rudner et al., 2011a). The concept of CSC is similar to the concept of WMC. However, whereas WMC often refers to a general capacity that is typically measured using a text-based visual task such as the reading span task (Daneman and Carpenter, 1980; Rönnberg et al., 1989), CSC refers specifically to a cognitive reserve that has been depleted by listening under adverse conditions such as against a background of noise or with a hearing impairment. It is likely that different cognitive functions do not tap into a single cognitive resource but have their own dedicated and distinct resources (Mishra et al., 2013). During speech understanding, a cognitive resource that is depleted in the act of speech perception may be compensated for fully or partially by another cognitive process. Hence, a measure of WMC may not provide adequate assessment of CSC. Being able to gauge individual CSC may be an important factor in designing and evaluating interventions for individuals with various communication difficulties.
For example, it is likely to assist in appropriate fitting of hearing aids (Rudner et al., 2011a; Mishra et al., 2013).

In a recent publication (Mishra et al., 2013), we evaluated a test of CSC (CSCT) that probes the ability to perform different executive tasks (updating and inhibition) under high and low memory load based on two-digit numbers presented in the auditory modality, with or without a video of the talker's face. We found that CSC, measured using the CSCT, did not correlate with WMC, measured using the reading span task, suggesting that CSC is quantitatively and qualitatively different from WMC. Rather surprisingly, we found that CSC was reduced when the speaker's face was visible when the to-be-remembered stimuli were presented in quiet conditions. We suggested that when speech is fully audible and there is no competing noise, seeing the talker's face may act as a distraction during performance of the executive tasks. This interpretation is in line with other recent work demonstrating that when the intelligibility of audiovisual (AV) stimuli is equated with that of auditory-only (A-only) stimuli, listening effort gauged using a dual task procedure increases in the AV modality (Fraser et al., 2010; Gosselin and Gagné, 2011). The purpose of the present study was to replicate the findings of Mishra et al. (2013) and to investigate CSC for speech presented in noise.

The Ease of Language Understanding model (ELU; Rönnberg, 2003; Rönnberg et al., 2008, 2013) assumes that in optimal listening conditions, the incoming language signal can be readily matched with lexical and phonological representations stored in Long Term Memory (LTM), making speech understanding implicit. But in the presence of adverse conditions which may include noise or signal degradation (Mattys et al., 2012), a mismatch may occur between the incoming signal and stored representations. In a mismatch situation, explicit or effortful conscious processing is required to infer meaning from the incoming fragments of information. Such processing may include the abilities to achieve linguistic closure and gain access to previous knowledge stored in LTM (Rönnberg et al., 2010; Besser et al., 2013). Also, it has been suggested that individuals may compensate for information lost during signal degradation by directing their attentional capacity towards understanding the signal (Mattys et al., 2012). The use of explicit processing or involvement of attentional capacity for speech perception involves the executive functions of updating and inhibition (Mishra et al., 2010, 2013; Rudner et al., 2011a; Sörqvist and Rönnberg, 2012; Rönnberg et al., 2013). Updating refers to the monitoring and coding of information which is relevant to the task at hand and inhibition involves deliberate, controlled suppression of prepotent responses (Miyake et al., 2000). For example, while listening in modulated noise, the executive function of inhibition is used to suppress distracting noise (Janse, 2012). The involvement of executive functions in the act of listening, especially under adverse listening conditions, may come at the cost of depleted cognitive resources for processing heard material. 
This means that modulated noise may be more disruptive of performance on the inhibition than the updating task of the CSCT and that individual executive ability may specifically predict the ability to process speech heard in noise (Rudner et al., 2011b).

The ability to perceive speech (Mattys et al., 2012) and to remember speech (Pichora-Fuller et al., 1995; Murphy et al., 2000; Sörqvist and Rönnberg, 2012) is influenced in different ways by different types of noise. It has been regularly observed that speech recognition scores are higher in modulated compared to steady state noise at similar signal-to-noise ratios (SNR) (e.g., Duquesnoy, 1983) at least for younger adults without hearing impairment (George et al., 2006; Zekveld et al., 2013). However, speech recognition in modulated compared to steady state noise is also more likely to be associated with individual cognitive abilities such as WMC (Rönnberg et al., 2010; Besser et al., 2013; Zekveld et al., 2013) and linguistic closure (Zekveld et al., 2007; Besser et al., 2013). It is also likely to be more effortful both in terms of physiological response such as pupil dilation (Koelewijn et al., 2012) and subjective self-ratings, even when performance is better (Rudner et al., 2012). Recently, Janse (2012) showed that listening to speech in speech noise demands inhibition. These findings suggest that better speech recognition in modulated compared to steady state noise is dependent on cognitive resources. This applies in particular to speech noise (Zekveld et al., 2013) and hence CSC may be reduced more by listening to speech in speech-like noise than in steady state noise. Noise has been shown to disrupt recall of spoken items even when intelligibility is fairly high (Murphy et al., 2000). Emerging evidence suggests that speech-like noise, compared to steady state noise, is more disruptive of short-term memory retention of fully audible items in older adults with hearing loss (Ng et al., 2013). Thus, in the present study, we expected lower CSCT scores in noise than in quiet, but higher in steady state than speech-like noise. 
Because speech recognition in speech-like noise seems to specifically tax inhibition resources, we speculated that listening in speech-like noise might actually reduce the inhibition resources available for performing the CSCT. Thus, we expected speech-like noise to lower CSCT scores more in the inhibition than updating task.

Speech perception in the presence of noise is usually enhanced by observation of the talker's face. Lips, teeth and tongue may provide disambiguating information that is complementary to less well specified auditory information, by helping to determine manner and more importantly place of articulation (Grant et al., 1998). This can provide a substantial benefit in signal to noise ratio (Campbell, 2009) for speech recognition in noise (see also Hygge et al., 1992). The advantage of AV presentation has even been observed when only a graphic representation of the movement of the articulators was shown during speech detection in noise (Tye-Murray et al., 2011). Such visual cues do not provide disambiguating information and thus this finding was interpreted as suggesting that visual cues help the listener to direct their attentional capacities to the incoming signal at the most critical time to encode the target (c.f. Helfer and Freyman, 2005). This causes better signal detection and fewer cognitive demands in anticipating target stimuli in AV compared to A-only presentation (Besle et al., 2004; Moradi et al., 2013). Thus, visual cues help segregate target stimuli from interfering noise. Notwithstanding, it has been found that visual cues reduce performance on executively demanding auditory tasks (Fraser et al., 2010; Gosselin and Gagné, 2011; Mishra et al., 2013).

In a recent article, Yovel and Belin (2013) suggested that despite sensory differences, the neurocognitive mechanisms engaged by perceiving faces and voices are highly similar, facilitating integration of visual and speech information. Indeed, recent investigations of the episodic buffer of working memory (Baddeley, 2000) have shown that contrary to predictions, multimodal integration in working memory is not executively taxing (Rönnberg et al., 2010; Baddeley, 2012). One of the characteristics of the episodic buffer is that it communicates with LTM and exploits the quality of representations stored there (Rönnberg et al., 2013). Moradi et al. (2013) found that AV speech recognition in the presence of noise for subjects with normal hearing is faster, more accurate and less effortful than A-only speech recognition and inferred that AV presentation taxes cognitive resources to a lesser extent by reducing working memory load. On the basis that AV presentation provides additional complementary visual cues and helps in anticipating target onset in the presence of noise, we predict that seeing the face of the talker while performing the CSCT in noise will enable participants to form richer representations of the target items that will better survive the executive processing that the task demands. We do not expect visual information to increase the cognitive burden when CSCT is performed in noise. However, in our previous study, CSCT scores in quiet were found to be higher without visual cues in participants with normal hearing (Mishra et al., 2013). This is in line with other work showing that visual cues may interfere with performance on executively challenging tasks (Fraser et al., 2010; Gosselin and Gagné, 2011) and may be due to difficulties in prioritizing task-related processing in the presence of low-priority stimuli (Lavie, 2005). In the quiet conditions of the CSCT, the numbers are fully audible and thus visual cues become low-priority stimuli.
Thus, the superfluous visual information in the AV modality may act as a distractor when it is not required to segregate the target stimuli from a noise background. Hence, in the present study, we predicted higher CSCT scores in the A-only compared to the AV modality in quiet and the opposite in noise.

The main purpose of the present study was to investigate how noise influences CSC. In summary, we predicted that noise would disrupt executive processing of intelligible auditory two-digit numbers, leading to lower CSCT performance than in quiet. Speech-like noise would be more disruptive than steady-state noise given the same SNR, particularly during the inhibition task. Seeing the talker's face would counteract the noise decrement by helping the listener segregate target from noise and generate richer cognitive representations. However, in quiet conditions, visual cues would act as a distractor and reduce performance.

We have argued that cognitive skills like WMC (Pichora-Fuller and Singh, 2006; Rudner and Lunner, 2013; Rönnberg et al., 2013), executive functions (Mishra et al., 2013), linguistic closure and LTM (Rönnberg et al., 2010) are engaged during the complex processing involved in speech understanding. Moreover, linguistic closure and LTM may play a specific role in solving the updating and inhibition tasks in the CSCT. Even if only part of a two-digit number has been perceived, it may provide task-relevant information. For example, in an updating task requiring retention of the highest number spoken by each speaker, a new list number in the twenties by a particular speaker can be discarded if the retained number by the same speaker is in the thirties or higher. In this particular case it is sufficient to achieve LTM access and linguistic closure for the part of the number that provides the necessary information. To assess the contribution of individual cognitive functions towards CSC and speech intelligibility, we administered a cognitive test battery in the present study along with the CSCT. This battery included the reading span test (Daneman and Carpenter, 1980; Rönnberg et al., 1989) as a measure of WMC, the Text Reception Threshold test (TRT; Zekveld et al., 2007) as a measure of linguistic closure, the Letter Memory Task (Morris and Jones, 1990; Miyake et al., 2000) as a measure of updating, the Simon task (Simon, 1969; adapted from Pratte et al., 2010) as a measure of inhibition, and delayed recall of the reading span stimuli to measure episodic LTM. We did not predict an overall association between CSCT performance and reading span because reading span measures WMC and not specifically the ability to deploy executive processing resources during listening in noise. However, we did expect an association with TRT and delayed recall of the reading span stimuli because linguistic closure, which TRT measures, and LTM, which delayed recall measures, may play key roles in CSC. We predicted that performance on the updating and inhibition tasks of the CSCT would be associated with the respective independent measures of executive function and that individual executive abilities would be more predictive of CSCT performance in noise than in quiet.

## **METHODS**

#### **PARTICIPANTS**

Twenty native Swedish speakers with either continuing or completed university education, including 11 females and 9 males, participated in the experiment. They were 19–35 years of age (M = 25.9; SD = 4.4). Hearing thresholds of the participants were within normal limits (better than 25 dB HL) in the frequency range of 125 Hz to 8 kHz. The participants did not report any psychological or neurological problems. Visual acuity after correction was normal as measured by the Jaeger eye chart (Weatherly, 2002). Ethical approval for the study was obtained from the regional ethical review board, Linköping, Sweden.

#### **MATERIAL**

The CSCT stimuli consisted of AV and A-only recordings of the Swedish two-digit numbers 13–99 spoken by a male and a female native Swedish speaker (Mishra et al., 2013). The levels of the numbers were equated for 50% intelligibility in steady state noise using a group of ten young adults with normal hearing. This was accomplished by increasing the SNR in steps of 1 dB for each number until a correct response was given in a procedure similar to that described in Hällgren et al. (2006). For more details of CSCT materials, see Mishra et al. (2013).

#### **NOISE**

The stationary noise was a steady-state speech-weighted (SSSW) noise, having the same long-term average spectrum as the recorded numbers. The modulated noise was the International Speech Testing Signal (ISTS; Holube et al., 2010). The ISTS noise is designed to be speech-like but unintelligible and is thus composed of concatenated speech segments of around 10 ms duration in six languages (American English, Arabic, Mandarin, French, German and Spanish) spoken by six different female speakers. The stimuli and noises were initially calibrated to the same root mean square (RMS) level, and the noise level was then changed, keeping the speech level constant, to obtain individualized SNRs.

#### **INDIVIDUALIZING SNR**

The stimulus materials (numbers) in A-only modality and the SSSW noise were used in an adaptive procedure to determine the individualized SNR for presentation of the CSCT. In this adaptive procedure, the first stimulus was presented at an SNR of 5 dB and the participants were instructed to repeat the numbers they heard and were encouraged to guess if they were unsure. For the first run, for each new presented number that was repeated correctly, the noise was increased in steps of 3 dB until the participant's response was incorrect. Thereafter, the step size was changed to 1 dB and 30 numbers were randomly selected and presented consecutively to determine the 84% intelligibility level adaptively in a four-down/one-up procedure (Levitt, 1971). In the second step, the SNR obtained for 84% intelligibility was increased by 0.5 dB. This increment yielded an approximate intelligibility level of 90% in SSSW noise, as verified in a pilot study with six participants with normal hearing. The 90% level was chosen so that the participants could perceive most of the numbers, and thus perform the CSCT tasks, but only with effort. To verify the intelligibility at this new SNR, 60 numbers, again randomly selected from the stimulus material, were used. These numbers were presented at the set SNR and the intelligibility with SSSW and ISTS noise was obtained independently. The same individualized SNR levels were applied in the SSSW and ISTS noise during CSCT presentation. The above tests were implemented in MATLAB (Version 2009b).
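
The adaptive logic described above can be illustrated with a minimal Python sketch. It is not the MATLAB implementation used in the study. The function names, the deterministic listener and the handling of the first error (one fine step back up) are assumptions made for illustration only. The helper `levitt_target` gives the proportion correct at which an n-down/one-up track settles, which is about 84% for four-down/one-up.

```python
from typing import Callable, List


def levitt_target(n_down: int) -> float:
    """Proportion correct at which an n-down/one-up track settles (Levitt, 1971)."""
    return 0.5 ** (1.0 / n_down)


def run_track(
    respond: Callable[[float], bool],
    start_snr: float = 5.0,
    coarse_step: float = 3.0,
    fine_step: float = 1.0,
    n_down: int = 4,
    n_fine_trials: int = 30,
) -> List[float]:
    """Return the SNR (dB) presented on each trial of the fine phase."""
    snr = start_snr
    # Coarse phase: raise the noise (lower the SNR) after every correct response.
    while respond(snr):
        snr -= coarse_step
    # Assumption: the first error is followed by one fine step back up.
    snr += fine_step

    presented: List[float] = []
    correct_in_a_row = 0
    for _ in range(n_fine_trials):
        presented.append(snr)
        if respond(snr):
            correct_in_a_row += 1
            if correct_in_a_row == n_down:  # n_down correct in a row: make it harder
                snr -= fine_step
                correct_in_a_row = 0
        else:  # a single error: make it easier
            snr += fine_step
            correct_in_a_row = 0
    return presented


# Hypothetical deterministic listener who is correct at -3 dB SNR and above.
track = run_track(lambda snr: snr >= -3.0)
```

With a real listener the responses are probabilistic, so the track fluctuates around the SNR giving roughly 84% correct. The 0.5 dB increment reported above would then be added to the SNR estimated from the track.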

#### **COGNITIVE SPARE CAPACITY TEST (CSCT)**

In the CSCT (Mishra et al., 2013), 48 lists of 13 two-digit numbers (13–99) are presented serially and after each list the participant is requested to report two specified list items, depending on the predetermined criteria. These were designed to elicit updating and inhibition. In the updating task, the participants are asked to recall either the highest (in one version) or the lowest (in the other version) value item spoken by the male and female speaker in the particular list. In the inhibition task, the participants are asked to recall either two odd (in one version) or even (in the other version) value items spoken by a particular speaker. In half the trials, two numbers only are reported; these are low memory load trials. These two numbers are never the first item in the list. In the other half, high memory load trials, the first number in the list is also reported along with the two specified items, i.e., three numbers in total need to be held in working memory but only two of them are subject to processing. The first number (dummy item) in the high memory load trials is not included in the scoring. Thus, all scoring in the CSCT is based on correct report, in any order, of two numbers. These tasks are performed with either AV or A-only stimulus presentation and in the present study presentation took place in quiet (no noise), SSSW noise and ISTS noise.
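
The target selection and scoring rules described above can be sketched as follows. This is an illustrative sketch only, not the scoring code used in the study. The function names, the speaker labels and the example list are hypothetical, and the sketch assumes that each list contains exactly two items satisfying the inhibition criterion.

```python
from typing import Sequence, Set, Tuple

Item = Tuple[str, int]  # (speaker, two-digit number)


def updating_targets(items: Sequence[Item], highest: bool = True) -> Set[int]:
    """Highest (or lowest) number spoken by each of the two speakers."""
    pick = max if highest else min
    return {pick(n for s, n in items if s == speaker) for speaker in ("male", "female")}


def inhibition_targets(items: Sequence[Item], speaker: str, parity: str) -> Set[int]:
    """Odd (or even) numbers spoken by the specified speaker."""
    remainder = 1 if parity == "odd" else 0
    return {n for s, n in items if s == speaker and n % 2 == remainder}


def score_trial(reported: Sequence[int], targets: Set[int]) -> int:
    """Number of target items reported, irrespective of order (0 to 2)."""
    return len(set(reported) & targets)


# Hypothetical list of 13 items; in high-load trials the first item (45)
# would be the dummy item, which is reported but not scored.
example = [
    ("male", 45), ("female", 62), ("male", 71), ("female", 38), ("male", 29),
    ("female", 84), ("male", 56), ("female", 17), ("male", 93), ("female", 40),
    ("male", 22), ("female", 75), ("male", 64),
]
```

For this example list, the updating targets in the "highest" version are 93 and 84, and the inhibition targets for odd numbers spoken by the female speaker are 17 and 75.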

The CSCT was administered using DMDX software (Forster and Forster, 2003; Mishra et al., 2013). The participants performed all conditions of each of the executive tasks in a balanced order in a single block. Thus, over both task blocks, there were a total of 24 conditions of presentation in the CSCT with two executive tasks, two memory loads, two modalities of presentation and three noise conditions in a 2 × 2 × 2 × 3 design. Two lists per condition were tested. The order of the conditions was pseudorandomized within the two executive task blocks and balanced across the participants. For the noisy conditions, the noise sound files were played together with the AV and A-only stimulus files in DMDX with noise onset 1 s prior to stimulus onset and offset of at least 1 s after stimulus offset. The lists of numbers were always presented at 65 dB SPL and the level of the noise was varied depending upon the individualized SNR level. Across all the conditions (noisy or quiet), the duration of presentation of each list was 33 s. The time from onset of one stimulus to the onset of the next was 2.5 s. The visual stimuli were presented using a computer with a screen size of 14.1 inches and screen resolution of 1366 × 768 pixels. The video was displayed in 720 × 576 pixels resolution in the center of the screen and the auditory stimuli were presented through Sennheiser HDA 200 headphones.
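
As a check on the design arithmetic, the factorial structure can be enumerated in a few lines of Python. The factor labels below are taken from the text; the variable names are illustrative and are not part of the DMDX scripts.

```python
from itertools import product

EXECUTIVE_TASKS = ("updating", "inhibition")
MEMORY_LOADS = ("low", "high")
MODALITIES = ("AV", "A-only")
NOISE_CONDITIONS = ("quiet", "SSSW", "ISTS")
LISTS_PER_CONDITION = 2

# Every combination of the four factors is one condition of presentation.
conditions = list(product(EXECUTIVE_TASKS, MEMORY_LOADS, MODALITIES, NOISE_CONDITIONS))
total_lists = len(conditions) * LISTS_PER_CONDITION
```

The enumeration gives 24 conditions and, with two lists per condition, the 48 lists of the CSCT.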

Before each of the executive task blocks, the participants were provided with written instructions for the particular executive task and the instructions were also elaborated verbally. Before each list, the participant was prompted on the computer screen as to which version of the executive task was to be performed, what the modality was and whether to remember two or three numbers (low or high load). This prompt remained on screen until the participant pressed a button to continue to the test. The order of the noise conditions was pseudo-randomized. At the end of each list, an instruction "Respond now" appeared on the screen and the participant was required to report the target numbers orally. The participants could make corrections in reported numbers and then pressed another button when they were ready to continue. The oral responses of the participants were audio recorded. The participants were specifically instructed to keep looking at the screen during stimulus presentation. This applied even during presentation in the A-only modality where a fixation-cross was provided at center screen. All the participants practiced each task with two lists before doing the actual test.

#### **COGNITIVE TEST BATTERY**

#### *Reading span test*

In the reading span test (Daneman and Carpenter, 1980; Rönnberg et al., 1989), the participant read series of sentences in Swedish consisting of three words which appeared on the computer screen one at a time. Each word was shown for 800 ms with an interval of 50 ms between words. Each series consisted of three to six sentences presented in increasing series length. Half of the sentences were coherent and half were nonsense. The participants' task was to make a semantic judgment within 1.75 s and respond "yes" (if the sentence was coherent) or "no" (if the sentence was nonsense) before the next sentence appeared. At the end of each series of sentences, the participants were prompted by an instruction on the screen to recall either the first or the last word of all the sentences in the series in the order they appeared on the screen. The participants were provided with written instructions about the test and they practiced with a series of three sentences before the actual testing. There were a total of 54 sentences and the total score obtained on the recall in any order of first and final words was used in the analysis.

#### *Text reception threshold (TRT)*

The TRT test (Zekveld et al., 2007) is a visual analogue of a test of speech recognition in noise and measures the ability to read partially masked text. A Swedish version of the TRT, using the Hearing In Noise Test (HINT) sentences (Hällgren et al., 2006), was used. The test consisted of presentation of three lists of 20 HINT sentences each, the first list being a practice list. The sentences appeared word by word on the screen in red, masked by vertical black bars, with the preceding words remaining on the screen until the sentence was completed. After presentation of the last word, the sentence remained visible for 3.5 s. The presentation rate of the words in each sentence was equal to the speaking rate in the corresponding speaker file. A one-up-one-down adaptive procedure with a step size of 6% was applied to target the percentage of unmasked text required to read 50% of the sentences entirely correctly. The average percentage of unmasked text from the two lists of sentences was used as the dependent variable.

#### *Letter memory task*

In the letter memory task (Morris and Jones, 1990; Miyake et al., 2000), the participants were presented with sequences of letters and asked to hold the four most recent letters in mind and then prompted to say them at the end of each sequence. Responses were audio-recorded. DMDX software was used to present lists of 5, 7, 9 or 11 consonants serially at the center of the computer screen. Two lists consisting of 7 and 9 letters were presented as practice and the testing consisted of 12 lists. Sequence length of test lists was randomized across trials to ensure that the participant followed the instructed strategy and continuously updated working memory until the end of the trial. The score was the number of consonants correctly recalled irrespective of order.
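
The scoring rule can be sketched as below. This is an illustration, not the scoring procedure used in the study. The function name and the example sequence are hypothetical, and the sketch assumes that no consonant is repeated within a list.

```python
def letter_memory_score(sequence: str, response: str, span: int = 4) -> int:
    """Count recalled letters that belong to the last `span` letters, in any order."""
    targets = set(sequence[-span:])
    return len(targets & set(response))


# Hypothetical 7-letter list: the four most recent letters are M, R, L and D.
example_score = letter_memory_score("KTBMRLD", "DLRX")
```

Here three of the four target letters are recalled, so the list contributes 3 to the total score.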

#### *Simon task*

A visual analogue of the go/no go task (Simon, 1969; adapted from Pratte et al., 2010) was administered, using red and blue rectangular blocks which appeared on the left or the right of the computer screen successively at intervals of 2 s. The participants were required to respond as quickly as possible by pressing a button on the right-hand side of the screen when they saw a red block and a button on the left-hand side of the screen when they saw a blue block. A total of 16 blocks were presented using DMDX without practice. The participant had to ignore the spatial position in which the block appeared. When the spatial position of the stimulus and the correct response key coincided, the trial was termed congruent; otherwise it was termed incongruent. The difference in reaction time between the incongruent and congruent trials was taken as a measure of inhibition.
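
The inhibition measure can be sketched in Python as follows. The trial format, the function name and the reaction times are hypothetical. The sketch uses mean reaction times over all supplied trials, since the text does not state how errors or outliers were treated.

```python
from statistics import mean
from typing import Iterable, Tuple

Trial = Tuple[str, str, float]  # (colour, stimulus side, reaction time in ms)

# Response mapping from the task description: red -> right key, blue -> left key.
RESPONSE_SIDE = {"red": "right", "blue": "left"}


def simon_effect(trials: Iterable[Trial]) -> float:
    """Mean incongruent RT minus mean congruent RT (larger means poorer inhibition)."""
    congruent, incongruent = [], []
    for colour, side, rt in trials:
        if RESPONSE_SIDE[colour] == side:
            congruent.append(rt)
        else:
            incongruent.append(rt)
    return mean(incongruent) - mean(congruent)


# Hypothetical reaction times for four trials.
example_trials = [
    ("red", "right", 400.0),   # congruent
    ("red", "left", 460.0),    # incongruent
    ("blue", "left", 420.0),   # congruent
    ("blue", "right", 480.0),  # incongruent
]
effect_ms = simon_effect(example_trials)
```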

#### *Delayed recall of reading span test*

Delayed recall of the items presented during the reading span test was used to assess episodic LTM. In this test the participants were asked to recall, without forewarning, words or whole sentences remembered from the reading span test after approximately 60 min. During the 60 min, the participants performed the other tests in the cognitive test battery. The score on delayed recall test was the total number of words correctly recalled by the participant irrespective of the order.

#### **PROCEDURE**

The participants, on arriving for the testing, were fully briefed about the study and a consent form was signed. All the participants underwent vision screening and audiometric testing in the audiometric booth. The testing was conducted in two sessions. All auditory testing took place in a sound-treated booth with the participants facing the computer screen. Each session took approximately 90 min. The reading span test was administered followed by the Simon task, the letter memory test and the TRT test in a separate room. Individual SNRs for the CSCT were determined and the delayed recall of the reading span test concluded the first session. In the second session, the CSCT test was conducted. The participants were allowed to take breaks after different tests. Written instructions were provided for all the tests and the participants were given the opportunity to request oral clarification.

#### **DATA ANALYSIS**

A repeated measures analysis of variance (ANOVA) on the CSCT scores was conducted. In order to test our a priori hypotheses, planned comparisons were carried out, and the simple main effect observed without an a priori hypothesis was investigated using post-hoc Tukey's Honestly Significant Difference (HSD) test. In order to assess the association between cognitive functions, speech intelligibility and CSCT performance, Pearson's correlations with the cognitive test battery were computed.
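
For readers unfamiliar with the correlation analysis, the following Python sketch shows how Pearson's *r* is computed and how it converts to a *t* statistic with *n* − 2 degrees of freedom. It is a generic textbook computation, not the statistical software used in the study. The function names are illustrative, and the sample size of 20 is assumed to equal the number of participants.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_from_r(r: float, n: int) -> float:
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * sqrt((n - 2) / (1.0 - r * r))


# A correlation of -0.43 with an assumed n of 20 gives t(18) of about -2.02.
t_example = t_from_r(-0.43, 20)
```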

## **RESULTS**

#### **INTELLIGIBILITY**

In the noise conditions, the mean SNR for CSCT presentation was −2.17 dB (SD = 0.85). The mean intelligibility level was 93.8% (SD = 3.0) and 92.3% (SD = 2.9) for the SSSW and ISTS noise respectively. There was no statistically significant difference in speech intelligibility performance in SSSW and ISTS noise (*t* (38) = 1.58, *p* = 0.12).

#### **COGNITIVE SPARE CAPACITY TEST (CSCT)**

**Figure 1** displays the mean scores obtained in the 24 conditions of CSCT. Since there were two lists presented per condition, the maximum score per condition was four. Performance in the inhibition task in the low memory load conditions approached ceiling, and so all analyses of CSCT data were conducted on the rationalized arcsine-transformed scores (Studebaker, 1985) to counteract data skewing. Mean recall of the dummy item in the high memory load conditions was 22.4 (SD = 1.3) out of 24 possible responses. This demonstrates that the participants carried out the CSCT task in the high memory load conditions according to the instruction.
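
The rationalized arcsine transform, as it is commonly stated following Studebaker (1985), can be sketched in Python as below. The function name is illustrative. Applying the transform to a score out of four per condition is an assumption based on the text, which does not specify the scale on which the transformed scores were analyzed.

```python
from math import asin, pi, sqrt


def rau(correct: int, n_items: int) -> float:
    """Rationalized arcsine units for `correct` out of `n_items`."""
    theta = asin(sqrt(correct / (n_items + 1))) + asin(sqrt((correct + 1) / (n_items + 1)))
    # Linear rescaling so that mid-range values resemble percentages.
    return (146.0 / pi) * theta - 23.0


# Scores of 0, 2 and 4 out of a maximum of 4 per condition.
rau_values = [rau(score, 4) for score in (0, 2, 4)]
```

The transform stretches scores near floor and ceiling, which is why it is used when performance approaches the maximum. A score of 2 out of 4 maps to 50 units, and the values for 0 and 4 are symmetric about that midpoint.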

The repeated measures ANOVA of the CSCT scores revealed main effects of all four factors: executive function, *F*(1, 19) = 10.01, MSE = 0.28, *p <* 0.01, memory load, *F*(1, 19) = 28.14, MSE = 0.23, *p <* 0.01, modality, *F*(1, 19) = 4.59, MSE = 0.10, *p <* 0.05 and noise, *F*(2, 38) = 18.25, MSE = 0.18, *p <* 0.01. The main effects of executive function and memory load showed higher CSCT scores in inhibition than updating conditions and in low than high memory load conditions, in line with the predictions. Also, CSCT scores were higher in AV than A-only conditions. In order to identify significant differences in performance between the three levels of noise, pair-wise comparisons with Bonferroni adjustment for multiple comparisons were conducted. They revealed that CSCT scores in quiet and ISTS noise were significantly higher than in SSSW noise (*p <* 0.05) but there was no significant difference between the performance in quiet and ISTS noise (*p* = 1.00), see **Figure 1**.
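
A Bonferroni adjustment multiplies each *p* value by the number of comparisons and caps the result at 1. This is one way in which an adjusted value of exactly 1.00 can arise. The sketch below is generic and uses hypothetical unadjusted *p* values, since the unadjusted values are not reported here.

```python
from typing import List, Sequence


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Bonferroni-adjusted p values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted p values for three pair-wise comparisons.
adjusted = bonferroni([0.004, 0.010, 0.600])
```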

The two-way interaction between noise and modality was significant, *F*(2, 38) = 6.78, MSE = 0.14, *p <* 0.01. We hypothesized that performance would be better in the A-only than in the AV modality in quiet but the opposite in noise. Planned comparisons revealed better performance in A-only compared to AV (*t* = 1.86, *p <* 0.05, one tailed) in quiet, and better performance in AV compared to A-only (*t* = 2.52, *p <* 0.05, one tailed) in SSSW, in line with our predictions, see **Figure 2**. However, in ISTS there was no significant difference between performance in AV and A-only conditions (*t* = 1.1, *p >* 0.05, one tailed).

Post-hoc Tukey's HSD tests assessing the two-way interaction revealed that in the A-only modality of presentation, scores were significantly higher in quiet (*p <* 0.01) and ISTS noise (*p <* 0.01) compared to SSSW noise and also that there was no significant difference between performance in AV modality in SSSW noise and A-only modality in ISTS noise, see **Figure 2**.

We had predicted that modulated noise would disrupt the inhibition task more than the updating task. However, there was no significant interaction between executive function and noise.

#### **COGNITIVE TEST BATTERY**

**Table 1** shows the mean performance and standard deviation in the cognitive test battery. In the reading span task, the mean performance on semantic judgment was 50.46 (SD = 3.20) out of 54 possible responses, demonstrating that the participants performed that part of the dual task in accordance with instructions.

The correlations between the tests in the cognitive test battery are shown in **Table 2**. The overall pattern of significant correlations without correction for multiple comparisons is shown.

Performance in the reading span test was significantly associated with performance in the letter memory test and the delayed recall of the reading span test. Performance in the TRT was significantly associated with performance in the reading span and letter memory tests (cf. Besser et al., 2013).

**Table 3** shows the correlations between cognitive tests and the SNR rendering 84% speech intelligibility for SSSW noise and actual speech intelligibility at estimated 90% intelligibility for SSSW and ISTS noise. The speech intelligibility scores in ISTS noise were associated with performance in the Simon task and TRT. Since the TRT score is the average percentage of unmasked text, a higher score in TRT indicates poorer performance. Similarly, in the Simon task, the score is the difference in reaction time between the incongruent and the congruent condition, so a greater difference indicates poorer inhibition ability.

The correlations between CSCT and the battery of cognitive tests are shown in **Table 4**. To gain a detailed picture of the association between CSCT and cognitive function, we looked at the CSCT overall as well as factor-wise. Performance on the letter memory test was associated with CSCT irrespective of how scores were split, except when CSCT was performed in SSSW noise.

**FIGURE 2 | Significant two-way interaction between modality of presentation and noise.** Raw scores for AV (unfilled bars) and the A-only modalities of presentation (filled bars) in the three noise conditions of CSCT are represented. Error bars represent standard errors. \* Indicates significance at 0.05 level (1-tailed).

**Table 1 | Mean performance in the cognitive test battery**.


There was no correlation between Simon and CSCT performance. However, TRT was associated with CSCT performance in inhibition conditions. TRT was also associated with performance in A-only modality and high load conditions of CSCT. Performance in reading span was associated with CSCT performance in quiet. Delayed recall of reading span test performance was not associated with performance in CSCT in any of the conditions. The correlation between overall performance in CSCT and TRT tended towards significance (*r* (20) = −0.43, *p* = 0.06).

## **DISCUSSION**

In the present study, CSC was investigated in young adults with normal hearing. The CSCT was administered in quiet as well as in steady state (SSSW) and speech-like (ISTS) noise at individual speech intelligibility levels approximating 90%. CSCT scores were higher in inhibition conditions compared to updating conditions and when memory load was low compared to when it was high, in line with previous findings (Mishra et al., 2013). SSSW noise reduced CSCT performance as predicted and seeing the talker's face counteracted the noise decrement. However, ISTS noise did not reduce CSCT performance. In quiet conditions, visual cues reduced performance as predicted.

There was no interaction between executive function and noise; hence our prediction that the scores in the inhibition subset of CSCT would be reduced more than in the updating subset in modulated noise was not supported. There was no overall association between CSCT performance and WMC but updating capacity predicted CSCT performance in all conditions.

#### **CSCT PERFORMANCE IN QUIET**

As predicted, we found higher CSCT scores in A-only modality compared to AV modality in quiet conditions. It has been argued that integration of visual information from the face of a talker with the speech produced by that talker does not consume cognitive resources (Campbell, 2009; Baddeley, 2012; Moradi et al., 2013; Yovel and Belin, 2013). However, it seems that when speech has to be processed executively, superfluous information carried in the visual stream may reduce performance (Fraser et al., 2010; Gosselin and Gagné, 2011; Mishra et al., 2013). This may be because executive load makes it difficult to prioritize task-related processing in the presence of low priority stimuli (Lavie, 2005). In the quiet conditions of the CSCT, the auditory cues alone provide young adults with normal hearing thresholds with sufficient information for performing the executive tasks. Thus, for these individuals, the visual cues may simply compete for cognitive resources without enhancing task performance.

#### **EFFECT OF NOISE AND MODALITY OF PRESENTATION ON CSCT PERFORMANCE**

As predicted, CSCT scores were lower in noise than in quiet. However, this applied only to performance in SSSW. Based on the findings of Ng et al. (2013), we expected the CSCT scores to be lower in speech-like noise compared to steady state noise. However, contrary to our expectation, we found that CSCT scores in ISTS noise were significantly higher than in SSSW noise, despite the fact that there was no difference in either SNR or speech intelligibility between the two noise conditions. Also, CSCT performance in ISTS noise did not differ significantly from performance in quiet even in the A-only modality, although it is conceivable that potential differences in CSCT performance in ISTS noise and quiet were concealed by near-ceiling performance. Preserved CSCT performance in ISTS noise may be explained by a selective attention mechanism that comes into play when speech stimuli are presented against a background of speech-like noise (Zion Golumbic et al., 2013); the target speech stimuli are dynamically tracked in the brain but interfering noise is not tracked. This finding suggests that selective attention at higher cortical levels suppresses interfering speech-like noise at the perceptual level and may provide richer representation of the target speech stimuli in memory. In SSSW noise, it is likely that selective attention to the speech stimuli could not be achieved due to the lack of modulation in the interfering noise, resulting in a failure to segregate the speech stimuli from the SSSW noise (cf. Helfer and Freyman, 2005). Pichora-Fuller et al. (1995) have argued that although speech information has been perceived, it may not be adequately encoded in memory for retrieval. In this study, the speech intelligibility performance in SSSW and ISTS noise was similar, but the representation of the target speech stimuli in memory may have been impoverished in SSSW noise due to lack of segregation from noise, but not in ISTS noise. This may have led to lower CSCT scores in SSSW noise than in quiet and ISTS noise.

**Table 4 | Coefficients of correlations (Pearson's** *r***) between collapsed CSCT and cognitive test scores**.

*\* Correlation is significant at the 0.05 level (2-tailed)*

*\*\* Correlation is significant at the 0.01 level (2-tailed)*

We also predicted that seeing the face of the talker while performing the CSCT would counteract the negative effects of noise by providing a richer representation of the to-be-remembered items and this is what we found in SSSW noise. Indeed, there was no significant difference in performance in the AV modality between any of the noise conditions including quiet. These findings demonstrate that when noise disrupts executive processing of speech, seeing the face of the talker counteracts the disruptive effect of the noise.

Significant correlations between performance on the Simon task and TRT and speech intelligibility in ISTS demonstrate that listening in ISTS noise is cognitively demanding. The correlation between speech performance in ISTS and performance in the Simon task suggests that inhibition skills may come to the fore to suppress irrelevant information during memory encoding (Janse, 2012) and the association with TRT suggests a role for linguistic closure during speech perception performance in noise when irrelevant cues have to be disregarded (Zekveld et al., 2012, 2013; Besser et al., 2013). However, these demands on cognitive resources while listening in ISTS do not seem to influence CSC, as CSCT scores were higher in ISTS compared to SSSW noise. Our interpretation is that while cognitive resources relating to inhibition and linguistic closure are employed for perceiving speech in the presence of noise, other higher cognitive processes selectively attenuate the interfering modulated noise such that richer representation of target items in working memory can be achieved. This leads to less load on CSC. This effect may be similar to the benefit afforded by visual cues in terms of generating a richer representation of the items. In the present study, we did not find a significant difference between the performance in SSSW noise in AV modality and ISTS noise in A-only modality. This finding suggests that for young adults with normal hearing, selective attention to speech in the presence of speech-like noise enriches the representation of the target stimuli in working memory in much the same way as the presence of visual cues.

#### **CORRELATIONS BETWEEN CSCT AND THE COGNITIVE TEST BATTERY**

#### *Working memory*

We have previously reported evidence that CSC is not related to WMC measured using the reading span test (Mishra et al., 2013). The reading span test is a well-established test of verbal WMC (Daneman and Carpenter, 1980; Rönnberg et al., 1989) that has proved to be a potent predictor of the ability to unravel linguistic complexity (Unsworth et al., 2009) and in particular the ability to understand speech under adverse conditions (Akeroyd, 2008; Besser et al., 2013). In particular, it has proved useful as a way of probing the simultaneous storage and processing capacity that characterizes WMC, through the unimpaired modality of vision in persons with hearing impairment (Classon et al., 2013; Ng et al., 2013). Thus, we consider it to be the most suitable measure of general WMC. Although the concept of CSC is related to WMC in the sense that both involve temporary maintenance and processing of information, it is not necessarily the case that CSC can be assessed by simply measuring WMC. This is because general WMC may be depleted in different ways under different sets of listening conditions and reduced ability to deploy one set of processing skills, such as inhibition, may be compensated for fully or partially by other skills, such as updating. Thus, we did not expect CSCT performance to correlate with reading span performance, our measure of WMC in the present study. Indeed, there was no overall correlation, but CSCT performance in quiet conditions did correlate positively and significantly with WMC although not in either of the two noise conditions. On the face of it, it would appear that the quiet conditions in the present study are those that most closely resemble the conditions in the previous study (Mishra et al., 2013). However, there were several methodological differences in the present study compared to Mishra et al. (2013) that may help explain the difference in the pattern of correlations between the two studies. 
Button-press responses were used in Mishra et al. (2013), whereas voice responses were used in the present study. Button-press responses are more cognitively demanding than vocal responses as they require the synchronization of motor response and visual scanning of buttons. Thus, in our previous study (Mishra et al., 2013) general cognitive resources were probably being engaged for motor planning during the response phase of the task, reducing CSC and leading to a lack of correlation with reading span performance. In the present study, however, CSC in quiet conditions is likely to be more similar to independently measured WMC, which may explain the intercorrelation. Notwithstanding, adjustment for multiple comparisons renders this isolated correlation non-significant.

#### *Executive functions*

It is notable that performance in none of the tasks in the cognitive test battery predicted CSCT performance in SSSW noise. This may be because the cognitive skills measured by the test battery in the present study do not contribute towards executive processing of items presented in SSSW noise. However, letter memory performance predicted performance in both quiet and ISTS conditions. In quiet, there is no interfering noise to disrupt representation of the target stimuli in working memory. We have argued that in ISTS noise, good representation in working memory can be achieved by selective attention to speech (Zion Golumbic et al., 2013) whereas this is not the case in the presence of SSSW noise. The association of letter memory performance and CSCT performance in quiet and ISTS noise suggests that updating skills play an important role during processing of encoded representations. This in turn suggests that CSC in young persons with normal hearing thresholds may capitalize on updating ability when cognitive representations are rich.

Because the CSCT taps executive functions, we expected CSCT performance to correlate with the independent tests of executive function (cf. Mishra et al., 2013). In particular, we expected updating ability to facilitate CSCT performance, especially in updating conditions, and inhibitory ability to facilitate CSCT performance in inhibition conditions. Performance on the letter memory task (Miyake et al., 2000), which was our independent test of updating, correlated positively and significantly with performance in both the updating and inhibition conditions of the CSCT. Performance on the Simon task (Simon, 1969; Pratte et al., 2010), which was our independent test of inhibition skill, did not correlate with CSCT performance under any conditions of the CSCT. Performance on letter memory and Simon tasks was not significantly related. In the present study, target items were presented in two different kinds of background noise for two out of three lists in an unpredictable manner. This means that the participants were probably always on the alert, at least at the beginning of each list, to cope with background noise, even when lists were presented in quiet. In other words, cognitive resources, most likely inhibition skills, were probably always allocated, even when not specifically needed. This may have meant that participants had fewer inhibition resources available to engage in executive processing of the numbers. We have argued that a cognitive resource that is depleted in the act of speech perception may be compensated for fully or partially by another cognitive function during further processing. The pattern of correlations between CSCT and the cognitive test battery suggests that consistent demands were made on updating skills while inhibition skills had less impact. The explanation may be that during executive processing of numbers, updating skills compensated for the unavailability of inhibition skills engaged in preparing for noise.

Inhibition skills were significantly related to the intelligibility of the stimuli in ISTS noise, which suggests that inhibition skills were required to suppress the irrelevant information present in the ISTS noise during speech perception although they did not enhance memory performance or influence CSC. This may be because the inhibition resources were reduced during perception of numbers in noise and other cognitive skills like linguistic closure and updating were used to perform the inhibition task of CSCT.

#### *Linguistic closure*

The association of TRT with the inhibition subset of CSCT suggests that the ability to make use of linguistic closure is related to the processing required to identify and keep in mind auditory two-digit numbers of a certain parity and voice. The common factor may be an underlying ability to generate a coherent response on the basis of diverse pieces of information. The main effects of memory load and modality in CSCT performance revealed that the CSCT scores were lower when memory load was high and when visual cues were absent, suggesting that under these conditions the CSCT task is more difficult. The fact that performance in TRT was associated with CSCT performance in these particular conditions suggests that linguistic closure ability enhances CSCT performance when the task is difficult. This interpretation is supported by evidence that TRT predicts recall of speech heard in noise together with irrelevant visual cues (Zekveld et al., 2012). However, when visual information was available, CSCT performance in the present study was not associated with the TRT performance. This is in line with our prediction that AV integration is not cognitively taxing (Baddeley, 2012; Moradi et al., 2013). Further, we predicted that overall CSCT performance would be associated with performance in TRT and a tendency towards this association was found suggesting that overall performance in CSCT is predicted by an ability to make use of linguistic closure.

#### *Episodic long-term memory*

Finally, the delayed recall of the reading span stimuli measuring episodic LTM was not associated with performance in CSCT. Thus, no support was found for an association between episodic LTM and CSC in young adults with normal hearing. Recent work shows that hearing loss is associated with decline in LTM and it has been suggested that the mechanism behind this association is that hearing loss leads to more mismatch due to poor audibility and distortion of the input signal and thus less access to LTM (Rönnberg et al., 2011). Hence, it can be expected that an efficient LTM may facilitate processing of speech in adverse conditions, even though no such evidence was found for the participants with normal hearing in the present study.

## **COGNITIVE SPARE CAPACITY**

Cognitive resources are consumed in the act of listening (Pichora-Fuller, 2009; Rönnberg et al., 2013). Individual WMC is associated with the ability to recognize speech in noise, and the reading span task has proved to be a particularly potent predictor (Akeroyd, 2008; Besser et al., 2013). However, because available cognitive resources may be deployed differently under different listening conditions (Pichora-Fuller and Singh, 2006; Pichora-Fuller, 2007), it is important to gain an understanding of CSC, which is the cognitive reserve that has been depleted by listening under adverse conditions. This requires an experimental approach. The results of the present study show that the CSCT may be a useful tool in this enterprise. They provide a baseline performance level for CSCT in quiet and in noise for young adults with normal hearing. The next step is to investigate CSCT performance in persons with hearing impairment. In the future, we aim to develop a simplified version of the CSCT which can be used for evaluation of hearing aid fitting and different signal processing strategies used in hearing aids. By using CSCT we will be able to show the influence of signal processing on memory for heard speech. We believe that CSCT performance can provide us with a snapshot of how hearing-aid signal processing influences cognitive demands in communicative situations (Rudner and Lunner, 2013).

## **CONCLUSION**

The results of the present study replicate the results of Mishra et al. (2013) by showing that CSC in young adults with normal hearing thresholds is sensitive to storage load and executive function but not generally related to WMC and that availability of visual cues may hinder executive processing of speech heard in quiet. They also extend these results by showing that even when speech intelligibility is high, steady-state noise may lower CSCT performance but that this decrement can be restored when the talker's face is visible, probably by aiding segregation of target items and thus enriching their cognitive representation. Speech-like noise did not reduce CSCT performance, which was contrary to our prediction. We suggest that selective attention was used to ignore the speech-like background noise to provide an enriched representation of target items similar to that obtained in quiet. The overall pattern of results suggests that updating skills play a key role in exploiting CSC and may provide high-level compensation when inhibition skills are engaged in low-level processing.

## **ACKNOWLEDGMENTS**

Supported by grant number 2007-0788 to Mary Rudner from the Swedish Council for Working Life and Social Research.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 August 2013; accepted: 09 November 2013; published online: 26 November 2013.*

*Citation: Mishra S, Lunner T, Stenfelt S, Rönnberg J and Rudner M (2013) Seeing the talker's face supports executive processing of speech in steady state noise. Front. Syst. Neurosci. 7:96. doi: 10.3389/fnsys.2013.00096*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2013 Mishra, Lunner, Stenfelt, Rönnberg and Rudner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Auditory and cognitive factors underlying individual differences in aided speech-understanding among older adults

## *Larry E. Humes\*, Gary R. Kidd and Jennifer J. Lentz*

*Department of Speech and Hearing Sciences, Indiana University, Bloomington, IN, USA*

#### *Edited by:*

*Arthur Wingfield, Brandeis University, USA*

#### *Reviewed by:*

*Bruce A. Schneider, University of Toronto, Canada Barbara Shinn-Cunningham, Boston University, USA*

#### *\*Correspondence:*

*Larry E. Humes, Department of Speech and Hearing Sciences, Indiana University, Bloomington, IN 47405, USA e-mail: humes@indiana.edu*

This study was designed to address individual differences in aided speech understanding among a relatively large group of older adults. The group of older adults consisted of 98 adults (50 female and 48 male) ranging in age from 60 to 86 (mean = 69.2). Hearing loss was typical for this age group and about 90% had not worn hearing aids. All subjects completed a battery of tests, including cognitive (6 measures), psychophysical (17 measures), and speech-understanding (9 measures), as well as the Speech, Spatial, and Qualities of Hearing (SSQ) self-report scale. Most of the speech-understanding measures made use of competing speech and the non-speech psychophysical measures were designed to tap phenomena thought to be relevant for the perception of speech in competing speech (e.g., stream segregation, modulation-detection interference). All measures of speech understanding were administered with spectral shaping applied to the speech stimuli to fully restore audibility through at least 4000 Hz. The measures used were demonstrated to be reliable in older adults and, when compared to a reference group of 28 young normal-hearing adults, age-group differences were observed on many of the measures. Principal-components factor analysis was applied successfully to reduce the number of independent and dependent (speech understanding) measures for a multiple-regression analysis. Doing so yielded one global cognitive-processing factor and five non-speech psychoacoustic factors (hearing loss, dichotic signal detection, multi-burst masking, stream segregation, and modulation detection) as potential predictors. To this set of six potential predictor variables were added subject age, Environmental Sound Identification (ESI), and performance on the text-recognition-threshold (TRT) task (a visual analog of interrupted speech recognition). These variables were used to successfully predict one global aided speech-understanding factor, accounting for about 60% of the variance.

#### **Keywords: presbycusis, speech recognition, amplification, psychoacoustics, aging**

## **INTRODUCTION**

The prevalence of hearing loss among adults over the age of 60 years has been estimated to be about 40% (e.g., Cruickshanks, 2010). A common consequence of the presence of such hearing loss is difficulty understanding speech in many everyday listening situations, but especially those situations involving backgrounds of competing speech or noise. For older adults, however, declining hearing sensitivity may not be the only factor contributing to the speech-understanding difficulties experienced. A review of the literature on factors contributing to the speech-understanding problems of older adults was provided by a working group of the Committee on Hearing, Bioacoustics, and Biomechanics (CHABA) of the National Research Council (CHABA, 1988). Basically, as noted in Humes (1996), this report reviewed the evidence available at that time and evaluated this evidence relative to its support for one of three site-of-lesion hypotheses: (1) peripheral, which focused on the cochlea, and included both a simple (audibility loss) and more complex (suprathreshold processing deficits associated with cochlear pathology) version of the hypothesis; (2) central auditory, which included auditory centers of the brainstem and cortex; and (3) cognitive, which involved non-auditory areas of the cortex used in various aspects of linguistic and cognitive processing. As noted in the CHABA report, these hypotheses were not mutually exclusive and any or all of them could apply to a given study or a given individual.

Since the publication of the CHABA report, there have been numerous studies conducted in an effort to better understand the factors contributing to the speech-understanding difficulties of older adults. Early post-CHABA studies emphasized peripheral and cognitive factors (e.g., van Rooij et al., 1989; van Rooij and Plomp, 1990, 1992) or peripheral, central-auditory, and cognitive factors (Jerger et al., 1989, 1991; Humes et al., 1994). In these and similar studies through the three decades following the CHABA report, the primary focus was on the understanding of unamplified speech by older adults; that is, speech presented at conversational levels of 60–70 dB SPL and without the use of amplification to compensate for the peripheral hearing loss. Repeatedly and consistently across studies, hearing thresholds were found to be the most significant contributor to individual differences in unaided speech understanding by older adults, often accounting for 30–80% of the total variance in speech-understanding performance [see review by Humes and Dubno (2010)]. This was especially true for listening in quiet and steady-state background noise, which were the two conditions that received the most attention initially from many researchers. Often, in studies involving speech stimuli presented in competing speech or speech-like (fluctuating) backgrounds, however, cognitive factors emerged as minor, but statistically significant secondary contributors to individual differences in speech-understanding performance [see reviews by Akeroyd (2008); Houtgast and Festen (2008)].

Over the past decade or so, there has been increased interest in the factors underlying individual differences in speech-understanding performance for older adults when listening to amplified speech. Amplified speech, appropriately implemented, has the capability of overcoming the inaudibility of speech resulting from peripheral hearing loss. As was the case for unamplified speech, some researchers over the past decade chose to evaluate peripheral, central auditory, and cognitive factors as potential explanatory factors (e.g., Humes, 2002) whereas many more focused exclusively on peripheral and cognitive factors (Akeroyd, 2008; Houtgast and Festen, 2008; Humes and Dubno, 2010). For amplified speech, the relative contributions of peripheral and cognitive factors differed from those found with unamplified speech, for which hearing sensitivity was clearly the dominant factor. Rather, cognitive factors were typically found to be at least as important as hearing loss and were often the predominant factor accounting for individual differences in *aided* speech-understanding performance (Akeroyd, 2008; Houtgast and Festen, 2008; Humes and Dubno, 2010). Again, this is most apparent for speech-understanding performance measured in a background of competing speech or speech-like stimuli and may also depend upon the complexity of the target speech materials, the response task, or both. For the most part, studies of individual differences in speech understanding for amplified speech by older listeners have had relatively small numbers of older subjects (typically fewer than 25), making any conclusions based on the examination of individual differences tenuous at best. Humes (2002) is an exception, having tested 171 older adults, but this study made use of clinical hearing aids and full restoration of speech audibility was not achieved (Humes, 2002, 2007). Perhaps this is why Humes (2002) also saw much greater relative importance of peripheral hearing loss compared to cognition as a predictor of individual differences in the perception of amplified speech.

The present study sought to remedy some of the weaknesses of prior studies of individual differences in the perception of amplified speech by older adults. A relatively large sample of older adults (*N* = 98) was studied. Moreover, to ensure sufficient audibility, at least through 4000 Hz, spectral shaping of speech was applied in a laboratory setting, rather than relying on the use of clinical hearing aids. In addition, psychophysical measures of auditory processing thought to be relevant to the perception of speech in competing speech were added to a test battery of peripheral and cognitive measures administered to all subjects. Several of these auditory psychophysical measures, all making use of non-speech stimuli, were designed to tap processes related to the encoding of temporal information, such as slow envelope fluctuations and faster periodicity information that may be important for the segregation of target and competing talkers. Others were designed to assess various aspects of energetic and informational masking. Finally, a wide array of speech-understanding measures was included in this study to provide a more complete assessment of the difficulties experienced by older adults. Most of the speech-understanding measures involved competing speech-like backgrounds whereas others were included to verify restoration of performance to high levels with amplification in the absence of competition. Additional details about the measures included can be found in the next section.

The focus of this paper is on the individual differences in the understanding of amplified speech by the 98 older adults included in this study. However, we thought it was important to also test a small group (*N* = 27) of young normal-hearing adults for comparison purposes because several of the measures described below had not been used in an identical fashion with either younger or older adults previously. In addition, for those prior studies with young adults using similar paradigms the sample sizes were often small (*N* < 10). Between-group comparisons could then inform us as to whether older adults with impaired hearing performed differently relative to a young normal-hearing comparison group. Further, it has often been observed that tests believed to be reliable, based on data from young normal-hearing adults, do not prove to be reliable when evaluated in older adults (e.g., Dubno and Dirks, 1983; Christopherson and Humes, 1992; Cokely and Humes, 1992; Humes et al., 1996). Since reliability data were not available for many of the measures developed for this study, the test-retest reliability was examined in a subgroup (*N* = 31) of the 98 older adults. After methodological details have been presented, the results will be presented for the reliability analyses, followed by the age group comparisons, and then the examination of individual differences among the 98 older adults.

## **METHODS**

#### **SUBJECTS**

The 98 older adults included in this study ranged in age from 60 to 86 years (*M* = 69.2 y). Fifty were female, 91 were not current hearing aid users, and 88 had never worn hearing aids. Their hearing was characterized by a bilaterally symmetrical high-frequency sensorineural hearing loss of varying degrees, and the median audiograms for each ear are shown in **Figure 1**. Most (91 of 98) subjects had their right ear tested and the remainder had their left ear tested for all monaural measures. The left ear was tested whenever the hearing loss in the right ear was too severe for inclusion in this study. In addition to meeting an inclusion criterion based on the severity of hearing loss, all subjects had no evidence of middle-ear pathology (air-bone gaps <10 dB and normal tympanograms bilaterally), no signs of dementia (Mini-Mental State Examination, MMSE, >25; Folstein et al., 1975), and English as their native language. Subjects were recruited primarily via newspaper ads in the local paper and were paid for their participation.

The 27 young normal-hearing adults ranged in age from 18 to 30 years (*M* = 22.7 y) and 21 were female. These subjects had hearing thresholds ≤25 dB HL (ANSI, 2004) from 250 through 8000 Hz in both ears, no evidence of middle-ear pathology, no signs of dementia (MMSE > 25), and English as their native language. Mean audiometric thresholds for every frequency and both ears were less than 10 dB HL. The test ear was always the right ear for the young subjects. Subjects were recruited primarily by flyers and university online postings. Young subjects were also paid for their participation.

#### **PSYCHOPHYSICAL NON-SPEECH MEASURES**

#### *Common procedures*

For the auditory psychophysical measures, except for streaming, thresholds were measured for a variety of tasks using a standard/two-alternative forced-choice method, with trial-by-trial signal strengths chosen according to a 2-down, 1-up adaptive staircase procedure estimating the 70.7% correct point on the psychometric function (Levitt, 1971). In this procedure, a comparison stimulus was always presented in the first interval, and the second and third intervals contained a target and a comparison stimulus presented in random order with equal likelihood. Listeners were provided a visual marker for each observation interval and indicated which interval contained the target (altered) stimulus by touching the appropriate area displayed on a touch-screen monitor. Correct-answer feedback was provided to the listener following each trial. The staircase continued until a total of 7 reversals of the direction of the track were obtained. The mean of the signal strengths (geometric mean for harmonic mistuning) at the last 6 reversal points was taken as threshold. Unless noted otherwise, all psychophysical measures were repeated five times, with those five replicates averaged to form a single threshold estimate. Analyses of performance across the five replicates failed to reveal consistent significant trends over time and, as a result, the five replicates were averaged for all subjects and tasks. As will be demonstrated below, most of these measures were found to be reliable in the older adults, the age group most likely to show changes in performance over time. It is acknowledged, however, that the average of five relatively brief estimates of performance on a given task, typically representing a total of 200–250 trials, is not likely to be representative of asymptotic levels of performance, for either age group, that could be obtained with a much larger number of trials. All testing was conducted in a sound-attenuating room.
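The tracking rule described above can be illustrated with a minimal Python sketch. It is not the authors' software. The callback `is_correct`, the function name, and the convention of recording a reversal at the level where the track changes direction are assumptions made for illustration, and the reduction of step size after three reversals is borrowed from the individual task descriptions. Two consecutive correct responses lower the signal strength and a single error raises it, so the track converges where the probability *p* of a correct response satisfies *p*<sup>2</sup> = 0.5, that is, *p* ≈ 0.707.

```python
def run_staircase(is_correct, start, big_step, small_step,
                  n_reversals=7, n_average=6, switch_after=3,
                  max_level=None, max_trials=1000):
    """2-down, 1-up adaptive track with additive steps (after Levitt, 1971).

    is_correct(level) -> bool is a hypothetical callback that runs one trial.
    Returns (threshold, reversal_levels); threshold is None if no reversal occurred.
    """
    level = start
    step = big_step
    correct_in_a_row = 0
    last_direction = 0            # -1 = last move was down, +1 = up, 0 = no move yet
    reversals = []
    trials = 0

    while len(reversals) < n_reversals and trials < max_trials:
        trials += 1
        if is_correct(level):
            correct_in_a_row += 1
            if correct_in_a_row < 2:
                continue          # two consecutive correct responses are needed to move down
            correct_in_a_row = 0
            direction = -1
        else:
            correct_in_a_row = 0
            direction = +1        # a single error makes the next trial easier

        if last_direction != 0 and direction != last_direction:
            reversals.append(level)          # assumed convention: level at the turn
            if len(reversals) == switch_after:
                step = small_step            # smaller steps after the first few reversals
        last_direction = direction

        level += direction * step
        if max_level is not None:
            level = min(level, max_level)    # e.g., a ceiling on presentation level

    if not reversals:
        return None, reversals
    last = reversals[-n_average:]
    return sum(last) / len(last), reversals


# Usage with an idealized listener who is correct whenever the level is at least 60.
threshold, reversals = run_staircase(lambda level: level >= 60.0,
                                     start=95.0, big_step=5.0, small_step=2.5)
```

The sketch averages reversal levels arithmetically and uses additive steps; the harmonic-mistuning task, which uses multiplicative steps and a geometric mean, would need a variant of this function.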

All stimuli were digitally generated and played through one or two channels of a 24-bit digital-to-analog converter (DAC; TDT System III RP2.1) with a sampling period of 4.096 × 10<sup>−5</sup> s (a sampling rate of about 24,414 Hz). The output was fed into Etymotic Research ER-3A insert earphones. Stimuli were presented monaurally in most of the tests, except for the informational-masking and masking-level-difference experiments, which included presentation to both ears. Each measure is described in more detail below.

#### *Informational masking*

These measures were developed from consideration of the work of Kidd et al. (1994). The task was to detect a series of fixed-frequency tone bursts (i.e., a signal train) embedded in a masker having similar temporal characteristics to the signal, but randomly selected frequency characteristics. Signal and masker stimuli were eight 60-ms tone bursts with 10-ms rise/fall times presented sequentially for a total stimulus duration of 480 ms. The interval between successive tone bursts was 0 ms as in Kidd et al. (1994). The signal contained pure-tone bursts at 500 or 1000 Hz, with a fixed frequency and phase across the eight bursts. In the masker stimulus, the bursts were 6-tone complexes with randomly selected frequencies. The frequencies were drawn from a range extending from 2 octaves below to 2 octaves above the signal frequency, with the restriction that no component fall within ±12% of the signal frequency. In this task, the target stimulus was generated by adding the signal and masker stimuli, whereas the comparison stimulus was the masker-alone stimulus. Two different masker types were tested: burst-same and burst-different. In the burst-same condition, the frequencies and phases of the masker bursts were held fixed across the eight bursts, and a new masker was generated for each interval. In the burst-different condition, the frequencies and phases were randomly selected across bursts and across intervals.

The masker level was fixed at 80 dB SPL per component. Stimuli were presented diotically to both ears, with a 700 ms interstimulus interval. Threshold is expressed as the signal level in dB SPL needed for detection. At the beginning of every track, the signal strength was set to 95 dB SPL. The initial step size of the tracking procedure was 5 dB, and after three reversals the step size was reduced to 2.5 dB. The signal level could never exceed 105 dB SPL, and conditions were ordered such that the burst-different conditions were always presented first and the signal frequency (500 or 1000 Hz) was selected at random for each adaptive run. A single threshold was measured in each condition prior to obtaining repeat thresholds in the various conditions.
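The difference between the burst-same and burst-different maskers can be made concrete with a short Python sketch of the frequency selection alone. The function names are hypothetical, and the log-uniform sampling distribution is an assumption, since the text specifies only the ±2-octave range and the ±12% protected region around the signal frequency. Phases, levels, and waveform synthesis are omitted.

```python
import math
import random


def draw_masker_components(signal_hz, n_components=6, exclusion=0.12, rng=random):
    """Draw masker frequencies within +/-2 octaves of the signal, outside the protected band."""
    lo, hi = signal_hz / 4.0, signal_hz * 4.0
    freqs = []
    while len(freqs) < n_components:
        # Log-uniform draw: an assumption, the sampling distribution is not stated in the text.
        f = 2.0 ** rng.uniform(math.log2(lo), math.log2(hi))
        if abs(f - signal_hz) > exclusion * signal_hz:
            freqs.append(f)
    return freqs


def make_masker(signal_hz, burst_same, n_bursts=8, rng=random):
    """Return one list of six component frequencies per burst."""
    if burst_same:
        # One draw, repeated across all eight bursts of this interval.
        components = draw_masker_components(signal_hz, rng=rng)
        return [list(components) for _ in range(n_bursts)]
    # A fresh draw for every burst.
    return [draw_masker_components(signal_hz, rng=rng) for _ in range(n_bursts)]


rng = random.Random(0)
same = make_masker(1000.0, burst_same=True, rng=rng)
different = make_masker(500.0, burst_same=False, rng=rng)
```

Calling `make_masker` once per observation interval reproduces the stated property that a new masker is generated for each interval in both conditions.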

#### *Modulation detection*

This task measured the just-detectable modulation depth of sinusoidal amplitude modulation imposed on broadband noise (e.g., Viemeister, 1979; Bacon and Viemeister, 1985). The carrier, broadband noise, was generated using a Gaussian random generator. When modulation was present, the modulated stimulus was generated by multiplying the carrier by a raised cosine with a modulation rate of *fm* (5, 20, or 60 Hz), an amplitude (modulation depth) of *m* (where *m* = modulator amplitude/carrier amplitude), and a random starting phase. The comparison stimuli were unmodulated carriers, and the target stimulus was the modulated carrier. A new random noise was chosen for each interval. Listeners detected which stimulus contained the modulation, and the just detectable modulation is expressed as 20log*m*.

Stimuli were calibrated such that the modulated and unmodulated stimuli were 80 dB SPL. Stimuli were 400 ms in duration with 40-ms rise/fall times, separated by a 600-ms interstimulus interval. The initial modulation depth was 0 dB (fully modulated; *m* = 1). The step size was 5 dB, decreasing to 2 dB after 3 reversals. Thresholds were tested using a random ordering of the modulation rates, with the caveat that all three modulation rates were tested before replicates were obtained.
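As an illustration of the stimulus and threshold metric described above, the following Python sketch multiplies Gaussian noise by the raised cosine 1 + *m* cos(2π*f<sub>m</sub>t* + φ) and converts between modulation depth and the decibel scale 20 log<sub>10</sub> *m*. It is a sketch under stated assumptions, not the authors' code: the function names are hypothetical, the sampling rate is rounded to 24,414 Hz, and the rise/fall ramps and the level equalization of modulated and unmodulated stimuli are omitted.

```python
import math
import random


def sam_noise(m, fm_hz, dur_s=0.4, fs_hz=24414, rng=random):
    """Gaussian noise multiplied by the raised cosine 1 + m*cos(2*pi*fm*t + phase)."""
    phase = rng.uniform(0.0, 2.0 * math.pi)      # random starting phase
    n = int(round(dur_s * fs_hz))
    samples = []
    for i in range(n):
        modulator = 1.0 + m * math.cos(2.0 * math.pi * fm_hz * i / fs_hz + phase)
        samples.append(modulator * rng.gauss(0.0, 1.0))
    return samples


def depth_to_db(m):
    """Modulation depth m (0 < m <= 1) expressed as 20*log10(m)."""
    return 20.0 * math.log10(m)


def db_to_depth(db):
    """Inverse of depth_to_db."""
    return 10.0 ** (db / 20.0)


target = sam_noise(m=db_to_depth(-5.0), fm_hz=5.0, rng=random.Random(0))
```

On this scale the starting value of 0 dB corresponds to *m* = 1, and one initial 5-dB step down gives *m* ≈ 0.56.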

#### *Modulation detection interference (MDI)*

This task was similar to modulation detection, but the carrier was a tone rather than noise, and the just-detectable modulation depth was measured in the presence of a high-frequency tone, which when modulated, interferes with modulation detection of the low-frequency tone (e.g., Yost et al., 1989). The standard (comparison) stimulus was an unmodulated 400-Hz tone added to a 1974-Hz tone (with random phases), which could be either unmodulated or modulated. The target stimulus was generated by modulating the 400-Hz carrier with a sinusoid of modulation depth, *m*, a modulation rate of 5, 10, or 20 Hz, and a random starting phase. In the no-interference conditions, the 1974-Hz tone was not modulated. In the interference conditions, the 1974-Hz tone was 100% amplitude modulated at a rate equal to the modulation rate imposed on the 400-Hz tone and the starting phase was random. Note that both target and comparison stimuli contain 400-Hz and 1974-Hz components. Listeners detected which interval contained the modulated 400-Hz tone, and thresholds are expressed as 20 log *m*.

The modulated and unmodulated tones were calibrated such that their average overall level was 80 dB SPL. Each stimulus was 400 ms with 40-ms rise/fall times, and the interstimulus interval was 600 ms. The initial signal strength was 0 dB (fully modulated; *m* = 1), and the adaptive tracking procedure used a step size of 5 dB which was decreased to 2 dB after 3 reversals. Thresholds were measured by randomizing the modulation rate and the interference types without blocking. Each rate and interference combination was tested before replicates were obtained.

#### *Masking level difference (MLD)*

In this task, listeners detected a tone added to Gaussian noise. The comparison stimuli were broadband noises presented diotically, with the same noise presented to each ear. The target stimulus was generated by adding a tone of either 250 or 500 Hz with random phase to a different noise. As with the comparison stimulus, the noise in the target stimulus was presented diotically. In the NoSo condition, the tone was presented diotically, in phase across the ears. In the NoSπ condition, the tone was presented dichotically, 180° out of phase across the ears (e.g., Hirsh, 1948). Listeners were asked to detect the tone added to the band of noise.

The noise was presented at an overall level of 80 dB SPL. The stimuli were 250 ms in duration with 40-ms rise/fall times, and the interstimulus interval was 500 ms. In the adaptive track, the initial signal level was 80 dB SPL, and the step size was 5 dB, decreasing to 2 dB after 3 reversals. The maximum permissible signal level was 105 dB SPL. Thresholds were obtained using a randomized block design in which the signal frequency was selected at random, and NoSo and NoSπ conditions were tested before moving on to the next frequency. A single threshold was obtained for each frequency/condition combination prior to obtaining repeat estimates.
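The two interaural configurations can be sketched in Python as follows. The function names, the unit amplitudes, and the rounded sampling rate are assumptions for illustration, and ramps and calibration are omitted. The last function uses the conventional definition of the masking level difference as the NoSo threshold minus the NoSπ threshold; the text above does not state how the authors derived it.

```python
import math
import random


def tone(freq_hz, dur_s=0.25, fs_hz=24414, amplitude=1.0, rng=random):
    """Pure tone with a random starting phase."""
    phase = rng.uniform(0.0, 2.0 * math.pi)
    n = int(round(dur_s * fs_hz))
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / fs_hz + phase)
            for i in range(n)]


def target_interval(freq_hz, antiphasic, rng=random):
    """Return (left, right) waveforms for a NoSo or NoSpi target interval."""
    signal = tone(freq_hz, rng=rng)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]      # identical noise at both ears (No)
    sign = -1.0 if antiphasic else 1.0                 # Spi inverts the tone at one ear
    left = [n + s for n, s in zip(noise, signal)]
    right = [n + sign * s for n, s in zip(noise, signal)]
    return left, right


def masking_level_difference(noso_threshold_db, nospi_threshold_db):
    """Conventional MLD: release from masking provided by the antiphasic tone."""
    return noso_threshold_db - nospi_threshold_db


rng = random.Random(0)
left_so, right_so = target_interval(500.0, antiphasic=False, rng=rng)
left_spi, right_spi = target_interval(500.0, antiphasic=True, rng=rng)
```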

#### *Anisochrony*

In this task, the listeners were asked to detect a lengthened inter-onset interval (IOI) embedded in an otherwise isochronous tone sequence. Comparison tone sequences consisted of eight 50-ms tones with 0° starting phase separated by 100 ms. Each tone had a 5-ms rise and fall time and was presented at 80 dB SPL. Target stimuli were identical to comparison stimuli, except for an increase (Δ*t*) in the IOI between one pair of tones. The pause between sequences was 1700 ms.

This task was patterned after the "rhythm discrimination" task in the Test of Basic Auditory Capabilities (TBAC; Watson, 1987; Kidd et al., 2007). Two conditions were included. In the fixed/fixed (F/F) condition, each tone in the sequence had a frequency of 1000 Hz, and the increased IOI (Δ*t*) always occurred between the 4th and 5th tones. In the variable/variable (V/V) condition, the frequencies of the tones within the sequence randomly varied between 500 and 2000 Hz (in logarithmic spacing), and the position of the increased IOI was variable among the seven possible IOI positions. For this condition, a new random selection of tone frequencies was chosen on each stimulus presentation. For the adaptive tracking procedure, the initial Δ*t* was 30 ms for the F/F condition and 200 ms for the V/V condition. The step size began at 20% of the initial Δ*t*, and decreased to 10% of the initial Δ*t* after 3 reversals. The IOI could not exceed 400 ms. The order of testing included random selection of fixed/fixed and variable/variable, with the caveat that both conditions were tested before replicates were obtained.
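The timing of a target sequence and the two step sizes can be worked through in a few lines of Python. This is a hedged sketch: the function names are hypothetical, and the base IOI of 100 ms is an assumption, because "separated by 100 ms" could also be read as a 100-ms silent gap (a 150-ms IOI), in which case `base_ioi_ms` should be changed accordingly.

```python
def onset_times_ms(lengthening_ms, position, n_tones=8, base_ioi_ms=100.0):
    """Tone onset times when the IOI at `position` (1..n_tones-1) is lengthened.

    position=4 lengthens the interval between the 4th and 5th tones.
    base_ioi_ms=100.0 is an assumed reading of the stimulus description.
    """
    onsets = [0.0]
    for k in range(1, n_tones):
        ioi = base_ioi_ms + (lengthening_ms if k == position else 0.0)
        onsets.append(onsets[-1] + ioi)
    return onsets


def step_sizes_ms(initial_lengthening_ms):
    """Step size before and after the third reversal (20% and 10% of the initial value)."""
    return initial_lengthening_ms / 5.0, initial_lengthening_ms / 10.0


ff_onsets = onset_times_ms(30.0, position=4)     # fixed/fixed starting condition
ff_steps = step_sizes_ms(30.0)                   # 6 ms, then 3 ms
vv_steps = step_sizes_ms(200.0)                  # 40 ms, then 20 ms
```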

#### *Harmonic mistuning*

The task was to detect a mistuned component from a harmonic stimulus (e.g., Moore et al., 1985). Comparison stimuli were the sum of 12 harmonically spaced tones generated at a fundamental frequency of 100 or 200 Hz with random phase. The target stimulus was generated by altering the frequency of the 3rd harmonic by Δ*f*, expressed in Hz, with all tones having a newly selected random phase.

Each harmonic component was presented at 80 dB SPL. Stimuli were 400 ms in duration with 40-ms rise-fall times and a 700-ms inter-stimulus interval. In the adaptive tracking procedure, the initial *f* s were 20% of the fundamental frequency. The initial step size of the tracking procedure was a factor of 2, and after three reversals the step size was reduced to a factor of 1.5. The order of testing involved randomly selected fundamental frequency (either 100 or 200 Hz) with the caveat that thresholds were obtained for each fundamental frequency prior to obtaining repeat estimates.
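A minimal sketch of how such a stimulus could be synthesized is given below. The component count, mistuned harmonic, duration, and ramp duration follow the description above; the sampling rate, the raised-cosine ramp shape, and the function name are assumptions, and calibration to 80 dB SPL per component is omitted.

```python
import math
import random


def harmonic_complex(f0_hz, delta_f_hz=0.0, n_harmonics=12,
                     mistuned_harmonic=3, duration_s=0.4, ramp_s=0.04,
                     fs_hz=48828, rng=None):
    """Return (component frequencies, samples) for a harmonic complex.

    fs_hz and the raised-cosine ramp are assumptions, not values taken
    from this section. Amplitudes are left uncalibrated.
    """
    rng = rng or random.Random()
    n_samples = round(duration_s * fs_hz)
    n_ramp = round(ramp_s * fs_hz)

    freqs = [k * f0_hz for k in range(1, n_harmonics + 1)]
    freqs[mistuned_harmonic - 1] += delta_f_hz      # mistune one component
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]  # new each call

    samples = []
    for i in range(n_samples):
        t = i / fs_hz
        s = sum(math.sin(2.0 * math.pi * f * t + p)
                for f, p in zip(freqs, phases))
        if i < n_ramp:                               # onset ramp
            s *= 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        elif i >= n_samples - n_ramp:                # offset ramp
            s *= 0.5 * (1.0 - math.cos(math.pi * (n_samples - 1 - i) / n_ramp))
        samples.append(s)
    return freqs, samples
```

With a 100-Hz fundamental and the initial mistuning of 20% of the fundamental, `harmonic_complex(100, delta_f_hz=20)` places the third component at 320 Hz; calling it with `delta_f_hz=0` yields a comparison stimulus.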

#### *Stream segregation*

Two tones (or harmonic complexes) were alternated and separated by quiet intervals to form a sequence of triplets: ABA\_ABA\_ABA\_ABA ... (Bregman and Campbell, 1971; van Noorden, 1977). Timing was constant, with A, B, and "\_" (a silent interval) fixed at 100 ms in duration. The A event was a 250-Hz tone, a 1000-Hz tone, or a 150-Hz harmonic complex consisting of the first 12 harmonics of 150 Hz. All tones (or complexes) had a 15-ms rise and fall time. When A was a pure tone, B was also a pure tone, and when A was a harmonic complex, so was B. B began at a frequency (or fundamental frequency) of 1.5 octaves above A. Each subsequent decreasing frequency (or fundamental frequency) of B ($f_B$) was chosen according to the following function:

$f_{B,n} = f_{B,n-1} \times (1/1.06)$, where $n$ is the triplet number. Each tone was presented at 80 dB SPL.
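The descending B sequence can be sketched as follows, reading the function above as a division of the previous B frequency by 1.06 on each triplet. The function name and the number of triplets generated are illustrative assumptions; the sequence in the experiment ended when the listener pressed the button.

```python
def b_frequencies(f_a_hz, n_triplets=10, octaves_above=1.5, ratio=1.06):
    """B frequency (Hz) for each successive triplet.

    B starts 1.5 octaves above A and is divided by 1.06 on every
    subsequent triplet. n_triplets is an arbitrary illustrative length.
    """
    f_b = f_a_hz * 2.0 ** octaves_above
    sequence = []
    for _ in range(n_triplets):
        sequence.append(f_b)
        f_b /= ratio
    return sequence
```

For the 250-Hz A tone, B starts near 707 Hz and falls by about 5.7% per triplet.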

Listeners were provided two different sets of instructions. In the first block, listeners were told to press a button when they could no longer hear two separate streams (i.e., the fission boundary). In the second block, listeners were told to press a button as soon as they heard a galloping sound (i.e., the "galloping" boundary). The second set of instructions was included because pilot testing suggested that, at times, listeners had difficulty understanding the first set. The frequency of the B stimulus when the subject pressed the button was recorded, and this process was repeated eight times for each set of instructions and frequency. The fission boundary was always tested first, with all eight replicates tested for each randomly selected frequency before a new frequency was tested. The "galloping" boundary was tested second, following a similar randomization procedure.

#### **ADDITIONAL AUDITORY TESTS**

Presentation of the stimuli for additional auditory tests, described below, was diotic, rather than monaural. In addition, a high-quality sound card (Digital Audio Labs Card Deluxe) was used instead of the TDT RP2 real-time processor, with a sampling rate of 44,100 Hz.

#### *The test of basic auditory capabilities (TBAC)*

The TBAC is a battery of auditory tests that has been under development since the early 1980s (see Watson, 1987; Kidd et al., 2007). The version used here was the TBAC-4, obtained from Communication Disorders Technology (CDT), Inc. The test battery includes six tests of auditory discrimination using single tones or groups of tones, and two tests using speech sounds. The eight tests are briefly described below. For additional details see Kidd et al. (2007) and the TBAC information available on the CDT web site (http://comdistec.com/new/TBAC.html).

Trials in each of the subtests, except for subtest 8, are structured in a modified 2AFC format in which a standard stimulus is followed by two test stimuli, one of which is different from the standard. The listeners use a computer keyboard to indicate which test stimulus was different from the standard. Trials are arranged in groups of six, and the level of difficulty is systematically increased from trial to trial, within each group, in logarithmic steps. Eight levels of difficulty are tested over 72 trials, presenting the six easiest levels in the first 36 trials, followed by an increase in difficulty of two log steps for trials 37–72. (This is slightly modified for subtest 7, which uses only five levels and 48 trials.)

The following eight subtests were included. (1) *Single-tone frequency discrimination*: the standard was a 1000-Hz 250-ms tone and frequency increments were used. (2) *Single-tone intensity discrimination*: the standard was a 1000-Hz 250-ms tone and intensity increments were used. (3) *Single-tone duration discrimination*: the standard was a 1000-Hz 100-ms tone and duration increments were used. (4) *Pulse-train discrimination* (rhythm): the standard consisted of six 20-ms pulses (1000-Hz tone) arranged in three pairs, with a 40-ms pause within a pair and a 120-ms pause between pairs. The "different" sequence included an increase in the duration within a pair with a corresponding decrease in the duration between pairs, thus altering the rhythm of the sequence while keeping the total duration constant. (5) *Embedded tone detection*: the standard consisted of a sequence of eight tones of differing frequency with a temporal gap (ranging from 10 to 200 ms) in the middle of the sequence. The "different" sequence had a tone (also ranging from 10 to 200 ms in duration) filling the temporal gap in the middle position. A different sequence of frequencies (ranging from 300 to 3000 Hz) was presented on each trial. The duration of the middle gap or tone was varied to manipulate task difficulty. (6) *Temporal-order discrimination for tones*: the standard was a four-tone pattern consisting of two equal-duration tones (550 and 710 Hz) preceded and followed by a 100-ms 625-Hz tone. The middle tones were presented in reverse order in the "different" interval. The duration of the tones varied from 20 to 200 ms in equi-log steps. (7) *Temporal-order discrimination for syllables*: this subtest is similar to subtest 6, but with consonant-vowel (CV) syllables comprising the sequence instead of tones. The task is to discriminate /fa/-/ta/-/ka/-/pa/ from /fa/-/ka/-/ta/-/pa/. The duration of the syllables was varied (by reducing the vowel duration) from 250 to 75 ms in five steps. (8) *Syllable recognition*: this was a test of the recognition of nonsense CVC syllables in broadband noise. A 3AFC paradigm was used, with foils created by altering the vowel or one of the consonants. Five speech-to-noise ratios (SNRs) were used with decreasing SNRs within each set of five trials. A set of 100 stimuli was presented twice in separate blocks, with a different random order for each block.

#### *Test of environmental sound identification*

This was a short version of the environmental sound identification (ESI) test described in Kidd et al. (2007). The number of different sounds was reduced from 25 to 20 to keep testing time under an hour. Subjects were asked to identify common non-speech sounds produced by animate or inanimate sources (e.g., a dog barking or a door closing) presented in broadband Gaussian noise. Subjects were initially familiarized with the full set of sounds by listening to them at a favorable SNR with the sound name displayed on the computer screen. During testing, a 3AFC response paradigm was used with the most confusable sounds from the set used as foils. The test consisted of two blocks of 120 trials with each sound presented at each of six overlapping SNRs in each block (8 SNRs in total). Trials were presented in groups of six, with increasing SNRs within each group. The sounds were presented in a different random order in each block of trials, using the highest 6 SNRs in the first block and the lowest 6 SNRs in the second block.

## **VISUAL COGNITIVE-LINGUISTIC MEASURES**

#### *Working-memory tests*

Three subtests from a Matlab-based working memory test battery developed by Lewandowsky et al. (2010) were administered. These subtests are described below. Additional details can be found in Lewandowsky et al. (2010). For all tests, there were no time constraints on the recall task at the end of each trial and no feedback was provided. Each test took ∼10 min to complete.

*Memory updating.* At the start of each trial, subjects were presented with a sequence of 3 to 5 digits. Each digit was surrounded by a square to mark its position on the screen. After all of the digits were presented, the squares remained on the screen and a different sequence of arithmetic operations (addition or subtraction, ranging from −7 to +7) appeared in each of the squares, one at a time. The subject's task was to remember the digits that appeared in each square and then perform the sequence of arithmetic operations presented in each of the squares. The subject was asked to indicate (using the keyboard) the final resulting value in each square after a sequence of two to six arithmetic operations. The test consisted of 15 trials with a randomly generated sequence of set size (3–5 co-occurring series of operations) and number of operations (2–6) on each trial.

Because this test was challenging for older adults, some adjustments were made to the procedures to ensure that the task was well-understood, and to make it a bit less challenging. The number of practice trials was increased from two (the default) to four and the time between items (to be added or subtracted) was increased from 250 to 500 ms. The first two practice trials used a 3-s inter-item time to allow the experimenter to explain the required operations during the trial. Also, the default instructions were supplemented with a verbal explanation of the task that included a subject-paced simulated trial using cue cards to present the stimuli.

*Sentence span.* The "easy" version of the sentence-span task was used for this study. In this task, subjects were presented with an alternating sequence of simple sentences (3–6 words in length) and single letters on the computer screen. Subjects judged whether the sentence was true or false on each presentation, with 4 s allowed for responding. The letters required no response. After four to eight sentence/letter presentations, subjects were asked to recall the letters in the order in which they were presented. The test consisted of 15 trials (after three practice trials) with three instances of each number of sentence/letter presentations.

*Spatial short-term memory.* This test assessed a subject's ability to recall the location of dots (filled circles) in a 10 × 10 grid. On each trial, an empty grid was presented and then a sequence of dots appeared in the grid. Each dot remained on the screen for ∼1 s before it was removed and the next dot appeared. From two to six dots were presented on each trial. After all of the dots had been presented (and removed), the subject was asked to indicate the relative position of the dots by touching (or pointing and clicking with a computer mouse) the cells within the grid. This test consisted of 30 trials (6 at each set size).

#### *A quick test (AQT)*

The AQT was used to provide a measure of cognitive abilities that often decline with age (or due to various types of dementia) (Wiig et al., 2002). The test is designed to measure verbal processing speed, automaticity of naming, working memory, and the ability to shift attention between dimensions of multidimensional visual stimuli. The test consisted of three timed subtests in which subjects named the color and/or the shape of symbols arranged on a page in eight rows of five. Test 1 required subjects to name the color (black, red, blue, or yellow) of colored squares. The second test required subjects to name each shape on a page of black circles, squares, triangles, and lines. The third test included colored shapes (the same shapes and colors used in tests 1 and 2) and subjects were asked to name both the color and the shape. Subjects were told to proceed as fast and as accurately as they could and the total time to complete each subtest was recorded.

#### *Text recognition threshold (TRT)*

The TRT is a test of the ability to recognize written sentences that are partially obscured by a vertical grating. The Dutch version of the test, developed by Zekveld et al. (2007), was obtained and modified to present English sentences from the revised Speech in Noise (R-SPIN) test (Bilger et al., 1984). No other properties of the test were changed. On each trial, a row of equally spaced vertical black bars appeared, and then a sequence of words forming a meaningful sentence appeared behind (obscured by) the bars. The words appeared sequentially (250 ms per word) and the complete sentence remained on the screen for 3.5 s. The subject's task was to read aloud as much of the sentence as he or she could identify. The difficulty of the task was varied adaptively (based on a subject's performance) by increasing or decreasing the width of the bars (i.e., the percentage of unobscured text). The test consisted of four adaptive runs of 13 trials, with four different sets of R-SPIN predictability-high (PH) sentences. The threshold for each run was computed as the mean percentage of unobscured text on trials 5–13 and the final TRT value was the mean of the four threshold estimates.
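The threshold computation described above reduces to two averages, sketched below. The function names are assumptions, and the adaptive rule that sets the percentage of unobscured text on each trial is not modeled; the input is simply the percentage recorded on each of the 13 trials of a run.

```python
from statistics import mean


def run_threshold(percent_unobscured):
    """Mean percentage of unobscured text on trials 5-13 of one run."""
    if len(percent_unobscured) != 13:
        raise ValueError("each adaptive run has 13 trials")
    return mean(percent_unobscured[4:13])   # trials 5..13 (1-indexed)


def trt(runs):
    """Final TRT: mean of the four run thresholds."""
    if len(runs) != 4:
        raise ValueError("the test has four adaptive runs")
    return mean(run_threshold(run) for run in runs)
```

Trials 1–4 of each run are excluded, so they serve only to bring the track toward the listener's threshold region.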

## **SPEECH-UNDERSTANDING MEASURES**

#### *Stimuli*

The test battery included a collection of tests to assess the ability to recognize or identify speech under a variety of difficult listening conditions. Four open-set speech-recognition tests utilized the R-SPIN sentences (Kalikow et al., 1977; Bilger et al., 1984). Subsets of these sentences were presented: (1) with 50% time compression; (2) with intermittent interruption; (3) mixed with the original SPIN-test babble; and (4) in quiet. Another set of speech-understanding measures was based on the closed-set identification Coordinate Response Measure (CRM) corpus (Bolia et al., 2000). These measures assessed the ability to identify two key words (color-number coordinates) in a spoken sentence in the presence of a similar simultaneous competing sentence. The competing message was: (1) the same voice as the target message; (2) a voice transposed 6 semitones higher in fundamental frequency (using STRAIGHT; Kawahara et al., 1999); (3) transposed and time reversed; or (4) not presented (quiet condition).

#### *General procedures for speech-understanding measures*

As with the other auditory measures, all testing was done in a sound-treated booth that met or exceeded ANSI guidelines for permissible ambient noise for earphone testing (American National Standards Institute, 1999). Stimuli were presented monaurally, using an Etymotic Research ER-3A insert earphone. A disconnected earphone was inserted in the non-test ear to block extraneous sounds. Stimuli were presented by computer using Tucker Davis Technologies System-3 hardware (RP2 16-bit D/A converter, 48,828 Hz sampling rate, HB6 headphone buffer). Each listener was seated in front of a touchscreen monitor, with a keyboard and mouse available.

*Presentation levels.* For the non-speech auditory measures described previously, audibility of the stimuli could be ensured by judicious selection of stimulus frequencies and levels. For broad-band speech stimuli, however, this is not as easily accomplished. Speech presentation levels were adjusted to ensure that speech information was audible for the older listeners and to provide comparable presentation levels for all listeners. For the older listeners, the long-term spectrum of the full set of stimuli was measured and a filter was applied to shape the spectrum according to each listener's audiogram. The shaping was applied with a 68 dB SPL overall speech level as the starting point, and gain was applied as necessary to each 1/3 octave band to produce speech presentation levels at least 13 dB above threshold from 125 to 4000 Hz. Because this often resulted in relatively high presentation levels, a presentation level of 85 dB SPL (without any spectral shaping) was used for the YNH listeners to minimize level-based differences in performance between groups. Previous work has shown that presentation levels above 80 dB SPL generally lead to somewhat poorer intelligibility, for both uninterrupted speech (e.g., Dubno et al., 2005a,b; Studebaker et al., 1999) and interrupted speech (Wang and Humes, 2010).
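The band-by-band gain rule can be sketched as below. It assumes that the listener's thresholds have already been converted to the same units as the speech band levels (dB SPL per 1/3-octave band); that conversion, the function name, and the example values are assumptions introduced for illustration and are not data from the study.

```python
def band_gains_db(speech_band_spl, threshold_band_spl, min_sensation_db=13.0):
    """Gain (dB) per band so speech is at least 13 dB above threshold.

    Both arguments map band center frequency (Hz) to a level in dB SPL.
    Gain is only ever added, never removed.
    """
    gains = {}
    for band_hz, speech_db in speech_band_spl.items():
        required_db = threshold_band_spl[band_hz] + min_sensation_db
        gains[band_hz] = max(0.0, required_db - speech_db)
    return gains


# Hypothetical band levels and thresholds, for illustration only.
example_speech = {500: 60.0, 4000: 45.0}
example_thresholds = {500: 30.0, 4000: 55.0}
example_gains = band_gains_db(example_speech, example_thresholds)
```

In this hypothetical case the 500-Hz band is already more than 13 dB above threshold and receives no gain, whereas the 4000-Hz band receives 23 dB.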

#### *Measures using the R-SPIN materials*

The R-SPIN stimuli are simple sentences, spoken by a male, that consist of five to eight words, ending with a common monosyllabic noun. The R-SPIN materials include 200 PH sentences in which the final word is highly predictable from the prior context, and 200 predictability-low (PL) sentences in which the same final words are presented in a neutral context. Subsets of these sentences were presented in four different stimulus conditions, as described below. The basic procedures and task were the same for all stimulus conditions. On each trial, the word "LISTEN" was presented visually on the monitor, followed by the presentation of a sentence 500 ms later. The subject's task was to type the final word of the sentence using the computer keyboard. Subjects were instructed to make their best guess if they were unsure. The next trial was initiated by either clicking on (with the mouse) or touching a box on the monitor labeled "NEXT." In addition to responses that were spelled correctly, homophones and equivalent phonetic spellings were scored as correct responses. The four stimulus conditions are described below.

*Time-compressed speech.* A random selection of 100 R-SPIN PL sentences was time-compressed using a 50% time-compression ratio. The time compression was performed using a uniform-compression algorithm, described by Gordon-Salant and Fitzgibbons (1993), applied to the entire sentence. (The time-compressed stimuli were provided by Dr. Gordon-Salant.)

*Interrupted speech.* Interrupted versions of the R-SPIN sentences were created by digitally replacing portions of each sentence with silence to create a regular pattern of speech fragments, or "glimpses," throughout the sentence. The glimpse patterns were based on eight equal-duration glimpses of the target word (always the final word in a sentence), with a total glimpsed duration equal to 50% of the total word duration (which ranged from 300 to 600 ms). The onset and offset of each glimpse was smoothed with a 4-ms raised-cosine function to minimize spectral artifacts. The first and last glimpses were always aligned with the beginning and ending of the target word, with the other glimpses equally spaced with a constant pause duration. As an example, consider a total word duration of 320 ms. This word would be interrupted such that eight 20-ms glimpses (50% of the total word) would be presented with 22.85 ms separating each glimpse. The interruption patterns created to fit the target words were applied to the entire sentence so that a consistent interruption pattern was maintained throughout the sentence, with the location and duration of glimpses determined by the glimpse alignment with the target word. Because of the variation in word duration, the glimpse parameters (8 glimpses and 50% of the word duration) yielded a range of glimpse durations, pause durations, and interruption rates. Because of variation in sentence duration, sentences often began with a single glimpse or silence (pause) duration that was shorter than the value used in the rest of the sentence. Speech-shaped noise was added to each sentence using broadband noise shaped to match the long-term spectrum of the full set of target words. A different randomly chosen section of a 10-s sample of noise was used for each sentence. The duration of the noise sample was adjusted for each sentence, with 250 ms of leading and trailing noise. 
The speech and noise were mixed at +10 dB SNR measured at the target word (based on rms values). The conditions chosen here were based on prior data for these materials from young and older adults (Kidd and Humes, 2012).
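The glimpse arithmetic for a given target word can be checked with a short sketch. It uses only the stated constraints (eight equal glimpses, 50% of the word glimpsed, first and last glimpses aligned with the word edges, equal pauses); the function name is an assumption, and the 4-ms raised-cosine smoothing and the extension of the pattern to the full sentence are not modeled.

```python
def glimpse_pattern(word_ms, n_glimpses=8, proportion=0.5):
    """Return (glimpse duration, pause duration, glimpse onsets) in ms.

    Onsets are measured from the start of the target word. The pauses
    fill the gaps between glimpses, so there are n_glimpses - 1 of them.
    """
    glimpse_ms = proportion * word_ms / n_glimpses
    silent_ms = word_ms - n_glimpses * glimpse_ms
    pause_ms = silent_ms / (n_glimpses - 1)
    onsets = [i * (glimpse_ms + pause_ms) for i in range(n_glimpses)]
    return glimpse_ms, pause_ms, onsets


glimpse_ms, pause_ms, onsets = glimpse_pattern(320)
```

For the 320-ms example this yields 20-ms glimpses and pauses of 160/7 ms (about 22.86 ms, given as 22.85 ms in the text), with the last glimpse ending exactly at the end of the word.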

Testing consisted of two blocks of 100 trials. One hundred PL sentences were presented first, followed by 100 PH sentences containing the same set of final words (with a different context sentence).

*Speech in 12-talker babble.* A different set of PL and PH sentences (100 of each) were then presented with the original R-SPIN 12-talker babble, using a +8 dB SNR. This SNR is common in everyday listening conditions (Pearsons et al., 1977). As with the interrupted speech, the PL sentences were presented prior to the PH sentences in two 100-trial blocks.

*Speech in quiet.* The final test with the R-SPIN materials presented 100 intact PL sentences in quiet. The sentences were the 100 that had not been presented in the time-compression test. To provide a break from the R-SPIN materials, subjects completed the AQT and the Speech, Spatial, and Qualities of Hearing Scale (SSQ) measures, described below, after the babble condition before returning to the R-SPIN materials for this test.

#### *Measures using the CRM*

The CRM corpus (Bolia et al., 2000) consists of a collection of sentences spoken by four male and four female talkers. All sentences are of the form "Ready [call sign] go to [color] [number] now." There are eight call signs (arrow, baron, charlie, eagle, hopper, laker, ringo, tiger), four colors (blue, green, red, white) and eight numbers (1–8) spoken in all 256 combinations by each talker. The test battery utilized a single male voice presented either in the original form or transformed (time reversed and/or transposed in fundamental frequency by 6 semitones). These conditions represent a subset of those described in Lee and Humes (2012). On each trial, two different sentences were presented simultaneously and the task was to listen to the voice that said "baron" as the call sign (always the original unaltered voice) and report the color and number spoken by that voice. Each trial began with the word "LISTEN" presented visually on the display, followed 500 ms later by presentation of the sentences. After each presentation, subjects responded by touching (or clicking with a mouse) virtual buttons on a touch screen display to indicate whether they heard the "baron" call sign (which was always spoken), and, if so, to indicate the color and number spoken by the same talker. All four colors and all eight numbers were included on the response display which remained in view throughout each block of trials. The next trial was initiated by either clicking on (with the mouse) or touching a box on the monitor labeled "OK." All trial blocks consisted of 32 trials. 
The sequence of trial blocks was as follows: (1) two trial blocks with no competing sentence, for familiarization with the task; (2) one practice trial block with examples of each of the different listening conditions; (3) four trial blocks of simultaneous competing sentences in the same male voice (unaltered); (4) four trial blocks with a 6-semitone shift of the fundamental frequency of the target or competing voice (50% each within each block); (5) a second set of four trial blocks with simultaneous competing sentences in the same voice; and (6) four trial blocks in which the competing sentences were both time-reversed and transposed (F0 by 6 semitones).

#### *Vowel-sequence identification*

This task used four speech stimuli that consisted of the center 40 ms of vowels produced naturally in a /p/-vowel-/t/ context by a male talker (Fogerty et al., 2010). The first vowel presented was randomly selected from the four alternatives and presented randomly to either the right or left ear. After a variable stimulus onset asynchrony (SOA), the second vowel, randomly selected from the remaining three vowels, was presented to the opposite ear. The SOA was randomly selected from a set of six SOAs (120–170 ms in 10-ms steps) to encompass the linear portion of the psychometric function relating identification performance to SOA. The subject's task was to identify the vowel pair presented, and only those responses identifying the two vowels in the correct sequence from the choices presented on the PC touchscreen were scored as correct. Responses were collapsed across SOA values, and overall percent-correct performance was recorded. A total of 144 trials were presented with equal distribution of left-ear leading and right-ear leading trials, SOAs, and the 12 possible vowel pairs.
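The trial count follows from crossing the 12 ordered vowel pairs with the six SOAs and the two leading ears (12 × 6 × 2 = 144). The sketch below builds such a list under the assumption that the design was fully crossed and then shuffled, which is one way to obtain the equal distribution described; the vowel labels are placeholders, since the specific vowels are not named in this section.

```python
import itertools
import random

VOWELS = ["V1", "V2", "V3", "V4"]   # placeholder labels, not the actual vowels
SOAS_MS = list(range(120, 171, 10))  # 120, 130, ..., 170 ms
EARS = ["left", "right"]             # ear receiving the first vowel


def build_trials(rng=None):
    """Fully crossed, shuffled trial list (an assumed design)."""
    rng = rng or random.Random()
    ordered_pairs = itertools.permutations(VOWELS, 2)   # 12 ordered pairs
    trials = [
        {"first": first, "second": second, "soa_ms": soa, "lead_ear": ear}
        for (first, second), soa, ear
        in itertools.product(ordered_pairs, SOAS_MS, EARS)
    ]
    rng.shuffle(trials)
    return trials
```

Each trial presents the second vowel to the ear opposite `lead_ear`, so only the leading ear needs to be stored.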

#### **SPEECH, SPATIAL, AND QUALITIES OF HEARING SCALE (SSQ)**

The SSQ is a questionnaire developed by Gatehouse and Noble (2004) to measure auditory disability through self-report of various aspects of hearing in a variety of common settings. The questionnaire covers speech-understanding difficulties with different competing backgrounds, as well as aspects of spatial hearing and sound quality. Version 3.1.2b was used for this study. This version included 14 questions about speech understanding, 17 questions about spatial hearing and sound localization, and 22 questions about sound quality (including sound segregation, music and voice identification, and sound source identification). Because the vast majority of subjects were not hearing-aid wearers, the approach described by Singh and Pichora-Fuller (2010) was followed in which 7 of the 53 SSQ items related to hearing aids were eliminated prior to scoring (Qualities subscale items 15, 16, 17, 20, 21, and 22; Spatial subscale item 14). All scales are arranged such that higher scores indicate fewer difficulties.

## **RESULTS AND DISCUSSION**

#### **RELIABILITY OF MEASURES**

As noted, the reliability of the various test measures included in this study was evaluated in a subsample of 31 of the 98 older adults who were tested twice with the entire test battery. **Table 1** displays the means and standard deviations for the test and retest conditions. The mean test-retest interval from the beginning of Session 1-test to the beginning of Session 1-retest was 98.8 days with a range of 78–129 days. There were a total of 9 sessions required for the full test battery and the order of the sessions, as well as the tests within each session, were identical in both the test and retest conditions.

*The actual sample size varied slightly across measures and is indicated in the second column (N). The p-value for paired-sample t-tests of these two means is also provided, as is the Pearson-r correlation coefficient between test and retest. In both cases, significant t-test p-values and r-values are marked with an asterisk. Given 50 measures, the criterion p-value for statistical significance using Bonferroni adjustment for multiple comparisons is p < 0.001 (i.e., 0.05/50). Units for each measure are dB unless indicated otherwise.*

**Table 1** also provides the significance levels (*p*) for paired-sample *t*-tests, for comparisons of performance from test to retest, as well as the Pearson-*r* correlation coefficients between test and retest. Given 50 variables in **Table 1**, a conservative Bonferroni adjustment for multiple comparisons was applied when interpreting the significance of both the *t*-tests and the correlations. As a result, the criterion for statistical significance is a *p*-value for both the *t*-statistic and the correlation coefficient that is less than 0.001 (i.e., 0.05/50). Using this criterion for statistical significance, significant *t*-statistics and Pearson correlations are marked with an asterisk in **Table 1**. For six of the measures listed in **Table 1**, there is a significant difference in performance from test to retest with five of the six measures (3 informational-masking measures, one harmonic-mistuning measure, and environmental sound identification) improving from test to retest and one measure (TBAC) showing worse performance at retest. These differences in mean performance were not as important as the consistency of the measures from test to retest among the group of 31 older adults. This consistency is, for the most part, represented by the test-retest correlations. A test-retest correlation of *r* = 0.60 was adopted as the minimum acceptable test-retest correlation and this corresponds well to the boundary between the statistically significant and nonsignificant correlation coefficients. Application of this criterion for minimally acceptable test-retest correlation resulted in the initial elimination of the 13 measures in **Table 1** for which the correlation was not significant. Only one significant correlation (modulation detection at 5 Hz, *r* = 0.59) was eliminated using a criterion of *r* ≥ 0.6. For the 14 measures with test-retest correlations less than 0.60, scatterplots of test and retest scores were reviewed. In two of these 14 cases, correlations were low because of ceiling effects in the data (SPIN-PH in babble and CRM in quiet) and these variables were retained for subsequent analyses. 
It should be noted that the SPIN-PL scores in quiet were very high during the test condition and would have likely shown a low test-retest correlation as well, but this measure was inadvertently omitted from the retest condition. Another of the 14 measures eliminated initially, CRM scores for the same-talker competition, was restored for subsequent analyses following examination of the test-retest scatterplots. This low test-retest correlation (*r* = 0.46) was due to two outliers among the 31 subjects who experienced an unusually large improvement in scores (15–20 percentage points) from test to retest. Without these two data points the test-retest correlation for this measure exceeded 0.80. Finally, the low test-retest correlation for the ESI test was examined and it too was found to exceed the *r* = 0.60 cutoff with two outliers removed (*r* = 0.50 before removal and 0.65 after removal). Although the impact of the outliers was not as great as with CRM, the ESI test was retained because of the previous demonstration of its reliability (Kidd et al., 2007) and because it is the only measure of this particular auditory ability in the test battery. In the end, a total of 10 measures were eliminated from further analyses due to poor test-retest reliability and all of these were auditory non-speech measures.
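The two screening criteria used in this section, the Bonferroni-adjusted *p*-value and the minimum test-retest correlation, amount to simple arithmetic, as sketched below. The function names are assumptions; the correlation values in the example are those quoted in the text, and the remaining entries of **Table 1** are not reproduced.

```python
def bonferroni_criterion(n_tests, alpha=0.05):
    """Per-test criterion p-value after Bonferroni adjustment."""
    return alpha / n_tests


def below_reliability_cutoff(test_retest_r, min_r=0.60):
    """Names of measures whose test-retest correlation is below min_r."""
    return sorted(name for name, r in test_retest_r.items() if r < min_r)


# Correlations quoted in the text (not the full table).
reported_r = {
    "modulation detection, 5 Hz": 0.59,
    "CRM, same-talker competition": 0.46,
    "ESI": 0.50,
    "ESI, two outliers removed": 0.65,
}

criterion_50 = bonferroni_criterion(50)   # 50 measures in Table 1
flagged = below_reliability_cutoff(reported_r)
```

With 50 measures the criterion is 0.05/50 = 0.001. The first three quoted correlations fall below the 0.60 cutoff, whereas the ESI value with two outliers removed does not.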

#### **GROUP DATA: OLDER ADULTS vs. YOUNG NORMAL-HEARING ADULTS**

**Table 2** provides the means and standard deviations for 41 measures, the 40 reliable measures identified in the previous section plus the SPIN-PL score in quiet, for the group of 98 older adults and 27 younger adults. Given 41 measures, a conservative Bonferroni adjustment of the criterion for statistical significance yields a criterion *p*-value of 0.00122 (i.e., 0.05/41). Using this criterion, the *p*-values for significant independent-groups *t*-tests have been marked in **Table 2** with asterisks. Of the 41 measures in **Table 2**, 15 revealed significant differences between young and older adults; 12 showing older adults performing worse than young adults and 3 showing older adults performing better than young adults. (The latter tests are marked with a plus sign by the older group's score.) For the three measures for which older adults outperformed young adults, two (TRT, SPIN-PH interrupted) made use of high-context sentences. Superior performance of older adults on such materials is consistent with superior verbally based cognitive scores in older adults (e.g., Salthouse, 2010) and is a phenomenon that has been observed frequently, but not universally, with the SPIN-PH test materials [see review in Humes et al. (2007), and note the slightly worse performance of older listeners with interrupted PH and PL materials observed by Kidd and Humes (2012)]. Superior performance of older adults on the SPIN-PH materials was not observed here for the babble test conditions, but this could be due, in part, to the ceiling effects observed in this condition. The lone remaining case in **Table 2** of superior performance of older adults was for modulation detection for a 20-Hz modulated broad-band noise and it is unclear why the older adults outperformed the young adults for this task.

For the 12 measures in **Table 2** for which older adults performed significantly worse than young adults, six are cognitive measures: the three working-memory tasks and the three AQT verbal speed-of-processing measures. For such processing-based measures of cognitive function, steady declines in performance throughout adulthood have been well-documented [see review by Salthouse (2010)] and the present data are entirely consistent with the literature. Of the remaining six measures for which the older adults performed significantly worse than the young adults, three involved non-speech auditory psychophysical measures (two informational-masking measures and the TBAC), one was a non-speech sound identification test (ESI), one involved speech-recognition performance (SPIN-PL, time compressed), and one was the speech subscale of the SSQ. The results for the SSQ are generally consistent with the effects of age and hearing loss observed for this self-report measure previously (Gatehouse and Noble, 2004; Singh and Pichora-Fuller, 2010; Banh et al., 2012). Likewise, the group differences for the time-compressed SPIN-PL items are similar to the results observed by Gordon-Salant and Fitzgibbons (1993) for the same test materials. In general, the performance by older listeners on psychoacoustic tasks observed here is consistent with earlier work. Prior research with the TBAC (Christopherson and Humes, 1992) has shown poorer performance on some tasks (primarily temporal-processing measures and speech tests), and many other investigations with psychoacoustic tasks have shown that older listeners often perform as well as younger listeners with simpler stimuli and non-temporal tasks, but are likely to have greater difficulty with temporal tasks and more complex stimuli (see Fitzgibbons and Gordon-Salant, 2010, for a review).

It is interesting that for the 11 measures of speech understanding included in **Table 2**, only one, the time-compressed SPIN-PL test, revealed significantly worse

## **Table 2 | Means (M) and standard deviations (SD) for the groups of young (***N* **= 27) and older (***N* **= 98) adults.**


*The t-value (t) and p-value (p) for independent-sample t-tests of these two means are also provided. Significant t-values are marked with an asterisk. Given 41 measures, the criterion p-value for statistical significance using Bonferroni adjustment for multiple comparisons is p < 0.00122 (i.e., 0.05/41). Units for each measure are dB unless indicated otherwise. Tests for which the older group performed significantly better than the younger group are indicated by a plus sign by the older group's score.*

performance for the older adults, despite the presence of varying degrees of high-frequency hearing loss in the older adults. It is important to recall, however, that the speech stimuli used in each of these speech-understanding measures had been spectrally shaped in this study to minimize the contributions of inaudibility. Nonetheless, the cochlear pathology presumed to underlie the hearing loss is still present, but, along with age, seems to have little impact on most of the measures of speech-understanding included in this study, with the exception of time-compressed low-context sentences.

#### **INDIVIDUAL DIFFERENCES AMONG OLDER ADULTS**

As a first step in examining individual differences in performance among the 98 older adults in this study, factor analysis was used to reduce the redundancy among the various sets of variables. There were three basic sets of variables in this study in which most of the measures could be placed: (1) auditory non-speech psychophysical measures; (2) cognitive/linguistic-processing measures; and (3) speech-understanding measures. There were 18 auditory non-speech psychophysical measures that, based on the information presented previously in **Table 1**, were considered sufficiently reliable for further analyses. To these 18 variables were added two measures of average hearing loss: (1) pure-tone average (PTA), which was the mean hearing loss at 500, 1000, and 2000 Hz in the test ear; and (2) high-frequency pure-tone average (HFPTA), which was the mean hearing loss at 1000, 2000, and 4000 Hz in the test ear. These 20 non-speech auditory measures were subjected to a principal-components factor analysis in an effort to reduce redundancy among the variables and, possibly, depending on the outcome of these analyses, also minimize collinearity among this set of variables. These factor analyses were exploratory and empirically motivated, not confirmatory or theoretically motivated. The primary objective was simply to capture the greatest amount of variance among this set of 20 measures and represent that variance with a smaller number of factors. In the initial factor analysis, and all subsequent factor analyses, the analysis made use of oblique rotation of factors (Promax rotation; κ = 4; Gorsuch, 1983) and the between-factor correlation matrix was examined if more than one factor emerged. If any between-factor correlations exceeded a value of 0.40, then the set of correlated factor scores was saved for each subject for a subsequent second-order factor analysis.
If the initial factor analysis failed to generate any between-factor correlations above 0.40, then an uncorrelated or orthogonal fit was considered appropriate and the set of orthogonal factor scores was saved for each subject.
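The rule for choosing between an orthogonal solution and a second-order analysis can be sketched as a small decision function. This is only an illustration under stated assumptions: the function names and the example matrix labeled hypothetical are invented here, and the cutoff is applied to the magnitude of each correlation, which is an assumption that appears consistent with the paper treating *r* = −0.44 as a moderate correlation.

```python
# Decision rule for the factor solutions, as described in the paragraph above.
CORRELATION_CUTOFF = 0.40  # cutoff stated in the text


def between_factor_correlations(matrix):
    """Return the unique off-diagonal entries of a square correlation matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def choose_factor_solution(factor_corr):
    """Classify a solution from its between-factor correlation matrix.

    Assumption: the cutoff is compared with the absolute value of r.
    """
    if len(factor_corr) < 2:
        return "single factor"
    if any(abs(r) > CORRELATION_CUTOFF
           for r in between_factor_correlations(factor_corr)):
        return "second-order"
    return "orthogonal"


# r = -0.44 and r = 0.50 are the between-factor correlations reported in the text.
cognitive = [[1.0, -0.44], [-0.44, 1.0]]
speech = [[1.0, 0.50], [0.50, 1.0]]
# Hypothetical low correlation, standing in for a solution with all r < 0.40.
hypothetical_low = [[1.0, 0.20], [0.20, 1.0]]

for label, corr in [("cognitive", cognitive), ("speech", speech),
                    ("hypothetical", hypothetical_low)]:
    print(label, "->", choose_factor_solution(corr))
```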

For the auditory non-speech psychophysical measures, the initial factor analysis showed low communalities for three measures, Modulation Detection at 20 Hz, the TBAC, and Stream Segregation-1 for a 150-Hz fundamental. This was reflected, in all cases, in low component weights for all factors included in the solution. As a result, these three variables were dropped and the factor analysis was repeated for the set of 17 auditory non-speech psychophysical measures. A good solution was then obtained, accounting for 74.6% of the variance with five factors. Low between-factor correlations (*r* < 0.40) in the initial oblique rotation supported the subsequent use of orthogonal rotation. Communalities were all in excess of 0.43 with most (13 of 17) ≥ 0.70 and the KMO sampling-adequacy statistic was reasonably high (0.67). The component weights of each of the 17 auditory psychophysical measures on each of the five resulting rotated (varimax) orthogonal factors are shown in **Table 3**. Based on the patterns of these component weights, Factors 1 through 5, respectively, were interpreted and labeled as follows: (1) Informational Masking (InfMask); (2) Signal Envelope Processing (ModDet); (3) Hearing Loss and Harmonic Mistuning (HLoss\_HM); (4) Dichotic Signal Detection (DicSigDet); and (5) Stream Segregation (StrmSeg). For the HLoss\_HM factor, the component weights for the hearing loss measures were clearly stronger than those for the harmonic-mistuning measures. The correlations between the four independent variables underlying this factor (two PTAs and two measures of harmonic mistuning,



*Weights > 0.4 are shown in bold typeface.*

each with a different fundamental frequency) ranged from *r* = 0.28–0.35, suggesting about 10% common variance between hearing loss and harmonic mistuning.
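The "about 10% common variance" figure follows from squaring the correlation coefficient. A minimal sketch of that conversion, using the two endpoint correlations from the text (the function name is an illustrative assumption):

```python
# Shared variance between two measures is the squared correlation coefficient.
def shared_variance(r: float) -> float:
    """Proportion of variance two measures have in common, given Pearson r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return r * r


# Endpoints of the range reported in the text (r = 0.28 to 0.35).
for r in (0.28, 0.35):
    print(f"r = {r:.2f} -> shared variance = {shared_variance(r):.3f}")
```

The two endpoints bracket 0.10, which is where the approximate 10% comes from.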

Next, the six cognitive measures (three measures of working memory and the three measures of verbal speed of processing from the AQT) were subjected to a similar analysis. A good solution was obtained (KMO sampling adequacy statistic = 0.79; all communalities ≥ 0.62) and 76.5% of the variance was explained by two moderately correlated (*r* = −0.44) factors. **Table 4** shows the rotated oblique component (pattern matrix) weights for the initial factor solution. The two factors were interpreted as working memory and verbal speed of processing based on the pattern of component weights across factors. A second-order principal-components factor analysis was then performed on these two correlated factor scores and a single global cognitive processing factor resulted (KMO = 0.50; communalities = 0.72; 72.1% of the variance accounted for). This single global cognitive-processing factor was labeled CogProc\_Global and saved for all subjects.

Next, the 10 speech-understanding measures shown in **Table 5** were subjected to a principal-components factor analysis. A good solution was obtained with two factors and the resulting component weights for the pattern matrix of the oblique-rotated solution are shown in **Table 5** (KMO sampling adequacy statistic = 0.84; all communalities ≥ 0.54). A total of 67.8% of the variance was explained by two moderately correlated (*r* = 0.50) factors. Based on the pattern of weights in **Table 5**, the two moderately correlated factors were interpreted as open-set recognition, with mainly the R-SPIN tests loading heavily on this factor, and closed-set speech identification, with CRM and vowel-sequence identification tests loading on this factor. A second-order principal-components factor analysis was then performed on these two correlated factor scores and a single global speech-understanding factor resulted (KMO = 0.50; communalities = 0.75; 74.8% of the variance accounted for). This single global speech-understanding factor was labeled SpeechUnd\_Global and saved for all subjects.

Finally, the three scales of the SSQ were subjected to a principal-components factor analysis. A single factor emerged,

**Table 4 | Component weights from the pattern matrix for each cognitive measure on each of the two oblique principal components identified via factor analysis.**


*Weights > 0.4 are shown in bold typeface.*

**Table 5 | Component weights from the pattern matrix for each speech-understanding measure on each of the two oblique principal components identified via factor analysis.**


*Weights > 0.4 are shown in bold typeface.*

accounting for 72.8% of the variance, with a good KMO sampling-adequacy statistic (0.67) and all communalities exceeding 0.67. This factor score was saved for all subjects as SSQ\_Global.

Ultimately, the focus of this study was to explain individual differences among older adults in aided speech understanding (SpeechUnd\_Global) or everyday self-reported speech perception (SSQ\_Global) using various predictor variables. The large set of non-speech psychoacoustic predictor variables was reduced to five orthogonal factor scores and the six cognitive measures were reduced to one global factor score (CogProc\_Global). ESI and TRT scores were added as additional potential predictors that were not necessarily represented well by any of the other predictor variables, but could be of significance. Correlations among the predictors and the two speech measures are shown in **Table 6**. Although Age is not a direct causal factor contributing to speech understanding, it is included in **Table 6** to show its correlation with other variables that may be causal factors in the observed relation between age and speech understanding. All but two of the measures in **Table 6** have significant correlations with SpeechUnd\_Global, with the highest correlations for Age, Cognition, and ESI (all greater than 0.5). Only three measures have significant correlations with SSQ\_Global (HLoss\_HarmMis, CogProc\_Global, and ESI).

To determine the relative contributions of the various measures in accounting for speech understanding (SpeechUnd\_Global), a dominance analysis (Azen and Budescu, 2003; Budescu and Azen, 2004) was performed with six of the nine predictor variables in **Table 6**. Two of the five psychoacoustic variables (ModDet and StrmSeg) were omitted because their correlations with SpeechUnd\_Global were non-significant and quite low. Age was not included in the dominance analysis, because it is not a direct causal factor, and because it did not account for any significant variance in SpeechUnd\_Global that was not accounted for by the other variables in **Table 6**. Results of the dominance analysis are shown in **Table 7**. Dominance analysis provides a measure of importance for each predictor variable, which is the average amount of variance accounted for by each variable when entered into the regression equation alone and after each of the possible subsets of the other predictor

**Table 6 | Correlation matrix of selected predictor variables and global speech understanding measures.**


*\*p < 0.01.*


*R<sup>2</sup>-values resulting from the entry of each variable in the presence of each possible subset of the other variables are computed, and the average R<sup>2</sup>-values across subsets of size k = 0 through 5 (the total number of variables − 1) for each variable are shown in the table. General dominance is the average R<sup>2</sup>-value across all levels of k. Rescaled dominance is the general dominance value rescaled as a percentage of the total variance accounted for by the full model.*

variables, with subset size (*k* in **Table 7**) ranging from 0 to 5 in the present case. A rescaled version of this importance measure is also provided by expressing the average variance for each variable as a percentage of the total variance accounted for by the full set of variables (59.5% in this case). Given the average test-retest reliability of the measures that make up the speech understanding factor (about *r* = 0.8), the maximum variance that one could expect to account for in this case would be 64% (0.8<sup>2</sup>). Thus, these measures account for roughly 93% (59.5/64) of the "systematic variance" in speech understanding. As seen in **Table 7**, ESI has the highest rescaled dominance value (25%), followed by Cognition (19.2%), with TRT, InfMask, and HLoss\_HM close behind (14.8–16.8%), and a much smaller value for the Dichotic measure (8.2%). This model is represented in **Figure 2**. The general dominance of ESI falls short of conditional dominance due to a single value of *k* (*k* = 0) at which another variable (Cognition) accounts for slightly greater variance. [See Azen and Budescu (2003), for a discussion of complete, conditional, and general dominance.]
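The averaging that defines general and rescaled dominance can be illustrated with a short sketch. It assumes the *R*<sup>2</sup> of every subset model is already known; the three predictors and all *R*<sup>2</sup> values below are hypothetical and are not the values in Table 7, and the function name is invented for this example.

```python
from itertools import combinations
from statistics import mean

# Hypothetical R^2 values for every subset of three made-up predictors.
# These numbers are illustrative only and are NOT taken from Table 7.
R2 = {
    frozenset(): 0.00,
    frozenset({"A"}): 0.30,
    frozenset({"B"}): 0.25,
    frozenset({"C"}): 0.10,
    frozenset({"A", "B"}): 0.45,
    frozenset({"A", "C"}): 0.35,
    frozenset({"B", "C"}): 0.30,
    frozenset({"A", "B", "C"}): 0.50,
}


def dominance_summary(r2, predictors):
    """Average the R^2 gain of each predictor within and then across subset sizes."""
    full_model_r2 = r2[frozenset(predictors)]
    summary = {}
    for target in predictors:
        others = [p for p in predictors if p != target]
        mean_gain_by_k = []
        for k in range(len(others) + 1):  # subset sizes 0 .. (p - 1)
            gains = [
                r2[frozenset(subset) | {target}] - r2[frozenset(subset)]
                for subset in combinations(others, k)
            ]
            mean_gain_by_k.append(mean(gains))
        general = mean(mean_gain_by_k)  # general dominance
        summary[target] = {
            "by_k": mean_gain_by_k,
            "general": general,
            "rescaled_pct": 100 * general / full_model_r2,
        }
    return summary


summary = dominance_summary(R2, ["A", "B", "C"])
for name, values in summary.items():
    print(name, round(values["general"], 3), round(values["rescaled_pct"], 1))
```

A useful check on any such computation is that the general dominance values sum to the *R*<sup>2</sup> of the full model, so the rescaled values sum to 100%.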

The importance of ESI in accounting for Speech Understanding is consistent with earlier findings by Kidd et al. (2007), who found that a similar ESI test was the only one of 16 non-speech measures to have its strongest loading on a factor defined by three speech-recognition measures (and the ESI measure). The present finding provides further support for the notion of a general familiar-sound recognition ability that is not specific to speech, and also reinforces the idea that this ability is distinct from a general cognitive or intellectual ability. It should also be noted that the other auditory measures in this test battery found to be related to speech understanding would not be considered to be measures of temporal or spectral resolution; areas of focus in many prior studies. Rather, they are largely measures that involve higher level processes such as selective auditory attention to complex sounds.

The dominance analysis also provides information about the relation between TRT and ESI. Both are measures of the ability to make use of partial information, but ESI is based on partial acoustic information about everyday sounds, and

TRT involves the use of partial information about written words in sentences. That the two measures have very little common variance (0.5%) indicates that they are not measures of a common ability to use partial information to recreate wholes. Both measures have relatively high correlations with CogProc\_Global, but ESI accounts for a greater proportion of variance in Speech Understanding independently of CogProc\_Global (and the other measures) than does TRT. Thus, despite the common linguistic component in both TRT and the recognition of masked (or interrupted) speech (i.e., the use of linguistic context), the ability to identify familiar masked nonspeech sounds is a better predictor of speech understanding in noise than is the ability to identify masked visually-presented words.

Measures from the test battery were less successful in accounting for individual differences in self-reported speech understanding difficulties, as measured by the SSQ. Only three of the predictors in **Table 6** (HLoss\_HarmMiss, CogProc\_Global, and ESI) were significantly correlated with SSQ\_Global. Together, these three variables accounted for 21.4% of the variance in SSQ\_Global, or 33.4% (21.4/64) of the systematic variance. (Only an additional 7.8% can be accounted for by including the other six predictors in **Table 6**, with roughly 41% of that increase due to a suppressive effect of Age on HLoss\_HarmMiss.) Dominance analysis was again used to examine the relative importance of the three significant predictors (see **Table 8**). Although these three measures were also important predictors of speech understanding, their relative importance is considerably different in this case. ESI is no longer dominant, accounting for considerably less of the explained variance than CogProc\_Global, which is now completely dominant (accounting for the most variance at each level of *k*, and all comparisons within each level of *k*), and HLoss\_HarmMiss, which has a much greater relative importance



*R<sup>2</sup>-values associated with the addition of each variable to regression models consisting of subsets of other variables are presented as described in Table 7.*

than it did in accounting for SpeechUnd\_Global. The results suggest that while cognitive abilities and hearing loss are important predictors of both aided speech understanding and self-reported speech understanding difficulties, the latter is more influenced by variables not included in the current test battery.
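The "systematic variance" percentages quoted for the two models come from dividing the explained variance by the ceiling implied by test-retest reliability. A minimal sketch of that arithmetic, using the values stated in the text (reliability of about 0.8, 59.5% and 21.4% explained); the function names are illustrative assumptions:

```python
# Explained variance expressed as a share of the reliability-limited maximum.
RELIABILITY = 0.8  # approximate average test-retest reliability (from the text)


def max_explainable_variance(reliability: float) -> float:
    """Ceiling on explainable variance, taken in the text as reliability squared."""
    return reliability ** 2


def systematic_share(explained_pct: float, reliability: float = RELIABILITY) -> float:
    """Explained variance (in percent) as a fraction of the explainable maximum."""
    return explained_pct / (100 * max_explainable_variance(reliability))


speech_share = systematic_share(59.5)  # SpeechUnd_Global, full set of predictors
ssq_share = systematic_share(21.4)     # SSQ_Global, three significant predictors
print(f"SpeechUnd_Global: {speech_share:.1%}")
print(f"SSQ_Global: {ssq_share:.1%}")
```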

#### **SUMMARY**

The main points of this study of individual differences in older adults follow. First, using the procedures and tasks in this study, it was possible to obtain reliable estimates of performance from older adults on many measures of non-speech auditory perception, visually based cognitive-linguistic processing, and speech understanding. Second, as a group, the older adults were outperformed by the group of young adults on about 25% of the measures used in this study. About half the time, however, these differences were in the cognitive domain and seldom were age-group differences observed in aided speech-understanding. The latter observation is undoubtedly due to the use of spectral shaping in this study to minimize the influence of stimulus inaudibility on speech-understanding performance. This suggests, however, that neither the group differences in age nor presence of cochlear pathology were critical for recognizing or identifying the

#### **REFERENCES**


transfer functions in normal-hearing and hearing-impaired listeners. *Audiology* 24, 117–134. doi: 10.3109/00206098509081545


spectrally shaped speech stimuli. Third, individual differences in aided speech-understanding performance (SpeechUnd\_Global) were well-explained by 5–6 predictor variables included in this study with significant contributions from visual measures of cognitive-linguistic processing (CogProc\_Global, TRT), and non-speech auditory measures (ESI, Informational Masking, Hearing Loss and Dichotic Signal Detection) that primarily assess auditory abilities more complex than basic spectral and temporal processing. Fourth, self-report measures of speech-understanding difficulty (SSQ\_Global), however, were less well accounted for by the array of predictor variables included in this study. The three primary predictors that emerged were CogProc\_Global, Hearing Loss, and, to a lesser degree, ESI, with about 33% of the systematic variance in SSQ scores explained. Given that the SSQ assesses auditory perception in many everyday listening situations and few of the subjects in this study were hearing aid wearers, it is to be expected, based on prior studies of unaided speech-understanding, that hearing loss and cognition would be the primary predictors of individual differences (e.g., Akeroyd, 2008; Humes and Dubno, 2010).

## **ACKNOWLEDGMENTS**

The authors thank Dan Fogerty, Charles Brandt, Dana Kinney, Megan Chaney, Hannah Fehlberg, Ellie Barlow, Nick Humes, and Thomas Gennaro for their assistance with various aspects of data collection for this study. This work was supported, in part, by a research grant from the National Institute on Aging (R01 AG008293).

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/Systems\_Neuroscience/10.3389/fnsys.2013.00055/abstract

*J. Exp. Psychol.* 89, 244–249. doi: 10.1037/h0031163


*Aging Auditory System*, eds S. Gordon-Salant, R. D. Frisina, A. N. Popper, and R. R. Fay (New York, NY: Springer), 259–274. doi: 10.1007/978-1-4419-0993-0\_9


aging humans - hearing sensitivity and psychoacoustics," in *The Aging Auditory System: Perceptual Characterization and Neural Bases of Presbycusis*, eds S. Gordon-Salant and R. Frisina (New York, NY: Springer), 111–134.


a new compact disc for auditory perceptual assessment in the elderly. *J. Am. Acad. Audiol.* 7, 419–427.


and sentence-onset differences on speech-identification performance of young and older adults in a competing talker background. *J. Acoust. Soc. Am.* 132, 1700–1717. doi: 10.1121/1.4740482


*Acoust. Soc. Am.* 86, 1294–1309. doi: 10.1121/1.398744


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 07 May 2013; paper pending published: 15 July 2013; accepted: 09 September 2013; published online: 01 October 2013.*

*Citation: Humes LE, Kidd GR and Lentz JJ (2013) Auditory and cognitive factors underlying individual differences in aided speech-understanding among older adults. Front. Syst. Neurosci. 7:55. doi: 10.3389/fnsys.2013.00055*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2013 Humes, Kidd and Lentz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Changes in auditory perceptions and cortex resulting from hearing recovery after extended congenital unilateral hearing loss

## *Jill B. Firszt1\*, Ruth M. Reeder1, Timothy A. Holden1, Harold Burton2,3 and Richard A. Chole1,4*

*<sup>1</sup> Department of Otolaryngology-Head and Neck Surgery, Washington University School of Medicine, St. Louis, MO, USA*

*<sup>2</sup> Department of Anatomy and Neurobiology, Washington University School of Medicine, St. Louis, MO, USA*

*<sup>3</sup> Department of Radiology, Washington University School of Medicine, St. Louis, MO, USA*

*<sup>4</sup> Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO, USA*

#### *Edited by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

*Reviewed by: Hamish Innes-Brown, Bionics Institute, Australia Kevin J. Munro, The University of Manchester, UK*

#### *\*Correspondence:*

*Jill B. Firszt, Department of Otolaryngology-Head and Neck Surgery, Washington University School of Medicine, 660 South Euclid Avenue, Campus Box 8115, St. Louis, MO 63110, USA e-mail: firsztj@ent.wustl.edu*

Monaural hearing induces auditory system reorganization. Imbalanced input also degrades time-intensity cues for sound localization and signal segregation for listening in noise. While there have been studies of bilateral auditory deprivation and later hearing restoration (e.g., cochlear implants), less is known about unilateral auditory deprivation and subsequent hearing improvement. We investigated effects of long-term congenital unilateral hearing loss on localization, speech understanding, and cortical organization following hearing recovery. Hearing in the congenitally affected ear of a 41-year-old female improved significantly after stapedotomy and reconstruction. Pre-operative hearing threshold levels showed unilateral, mixed, moderately-severe to profound hearing loss. The contralateral ear had hearing threshold levels within normal limits. Testing was completed prior to, and 3 and 9 months after surgery. Measurements were of sound localization with intensity-roved stimuli and speech recognition in various noise conditions. We also evoked magnetic resonance signals with monaural stimulation to the unaffected ear. Activation magnitudes were determined in core, belt, and parabelt auditory cortex regions via an interrupted single event design. Hearing improvement following 40 years of congenital unilateral hearing loss resulted in substantially improved sound localization and speech recognition in noise. Auditory cortex also reorganized. Contralateral auditory cortex responses were increased after hearing recovery and the extent of activated cortex was bilateral, including a greater portion of the posterior superior temporal plane. Thus, prolonged predominant monaural stimulation did not prevent auditory system changes consequent to restored binaural hearing. Results support future research of unilateral auditory deprivation effects and plasticity, with consideration for length of deprivation, age at hearing correction and degree and type of hearing loss.

**Keywords: unilateral hearing loss, congenital, conductive, stapedotomy, brain imaging, sound localization, speech recognition**

#### **INTRODUCTION**

Monaural hearing induces auditory system reorganization. This occurs in cases of unilateral sensorineural hearing loss, where there is sensory (inner ear) or neural dysfunction, as well as losses in the conductive pathway between the outer and inner ear. In animals, unilateral conductive hearing loss (UCHL) results in structural and functional changes within the auditory system. For example, on the UCHL side in rats, the size of neurons in the anteroventral cochlear nucleus was smaller (Coleman and O'Connor, 1979) and binaural interactions were absent in the inferior colliculus (Silverman and Clopton, 1977). Likewise in cats, UCHL reduced the inhibition from the ipsilateral ear on neurons in the inferior colliculus when the contralateral ear was deprived of sound (Moore and Irvine, 1981). Additionally, UCHL leads to decreased 2-deoxyglucose uptake bilaterally in the higher auditory medial and lateral superior olive nuclei even in silence, indicating decreased neuronal activity (Tucci et al., 2001). Interestingly, in a number of these studies, bilateral conductive hearing loss had little to no effect on neuronal size or binaural interactions, with normal maintenance of contralateral and ipsilateral projections. Together, these studies suggest that modifications in the balance of afferent activity alter binaural interactions and auditory system structures.

Imbalanced input from UCHL also degrades signal segregation for listening in noise and time-intensity cues for sound localization in humans (Wilmington et al., 1994; Gray et al., 2009). Amongst tasks requiring binaural processing, UCHL participants had lower interaural temporal difference limens, required higher intensities in the affected ear to perceive balanced loudness, had lower speech recognition in noise, and for masking level differences (MLDs), had signal detection affected by lower amplitude noise levels (Hall and Grose, 1993; Wilmington et al., 1994; Gray et al., 2009). Self-assessment questionnaires that probe hearing in different listening environments indicated numerous hearing difficulties in those with congenital UCHL (Priwin et al., 2007). While there have been studies of bilateral sensorineural auditory deprivation and later hearing restoration from implantable devices (e.g., cochlear implants), much less is known about unilateral auditory deprivation from conductive hearing loss and subsequent hearing improvement.

Effects of early abnormal auditory experience and later recovery have been reported in several animal studies. For example, UCHL created by plugging one ear in young owls initially altered localization abilities but abilities then recovered, that is, the owls made use of abnormal cues to accurately localize (Knudsen et al., 1984a,b). After the earplug was removed, these owls again made localization errors but after some weeks they regained accuracy. However, localization performance did not recover when ear plugging occurred in older owls or even in younger owls when the plug remained in place beyond a few weeks. Thus, the ability to recover was affected by the age at which UCHL occurred as well as the age at which hearing was restored. While it is well known that unilateral deprivation of the visual system early in life results in permanent impairment of binocular vision (Hubel and Wiesel, 1970), less is known about the developing auditory system's ability to recover binaural abilities. Undefined in humans is a sensitive period for binaural hearing, where behavior can adapt to abnormal experience and develop accurate abilities, and a critical period, after which the ability to adapt to altered (including restored) hearing is greatly diminished or nonexistent.

Binaural performance improved in some individuals who had UCHL correction when the loss was acquired after maturation (Hausler et al., 1983; Hall and Derlacki, 1986, 1988). Hall and colleagues showed, using measures of MLDs, that reduced binaural abilities may continue after restoration of hearing in adults with otosclerosis (Hall and Derlacki, 1986, 1988; Hall et al., 1990). However, in a later longitudinal study of adults with otosclerosis, MLDs returned to normal in many participants when tested a year after hearing restoration (Hall and Grose, 1993). Otosclerosis is typically diagnosed in young to middle adulthood and the hearing loss is often progressive in nature (and primarily conductive), thus these individuals had several years of normal hearing (NH) prior to hearing loss onset.

Congenital conductive hearing loss may occur for reasons of atresia or middle ear anomalies such as a fused malleus and incus, or fixation of the stapes. Degree of hearing loss and the presence or absence of other inner ear deficits vary with the abnormality, but the conductive aspect alone can result in 60 dB of hearing loss. Diagnosis of hearing loss and treatment to improve hearing often occur in childhood. When left untreated into adulthood, individuals will have extended periods of auditory deprivation.

A prior study assessed binaural abilities in patients aged 6–33 years, before and after surgery to correct congenital UCHL (Wilmington et al., 1994). Measures included interaural temporal difference limens, MLDs, sound localization and speech recognition in noise. Post-surgery binaural improvements were significant for some individuals and some tasks but not all; sound localization and speech recognition continued to be difficult especially when noise was towards the NH ear and the restored ear was relied upon for speech understanding. Abnormal auditory experience in early life may affect later binaural abilities; however, the nature and impact of these interactions are not entirely clear.

In the current case study, we investigated the effects of long-term congenital, unilateral mixed (conductive and sensorineural) hearing loss on sound localization, speech understanding, and cortical organization, both before and after the conductive component of the hearing loss was corrected in adulthood.

## **MATERIALS AND METHODS**

Written informed consent was obtained from the participant in accordance with the Declaration of Helsinki and guidelines approved by the Human Research Protection Office at Washington University School of Medicine (WUSM).

## **PARTICIPANT**

The participant (P1) was a 41-year-old female who had a history of unilateral hearing loss in the right ear that had been present since early childhood (probably since birth). Family history included a father and paternal grandmother with mixed unilateral hearing loss. P1's father had successful stapes surgery in adulthood as a treatment for otosclerosis-induced hearing loss. P1 and her family had believed P1's hearing loss to be sensorineural and non-correctable. A hearing aid was fitted around age 5 but discontinued after a brief trial due to lack of benefit. In adulthood, the participant had a comprehensive audiological evaluation that diagnosed the loss as mixed. P1 consulted an otolaryngologist to discuss treatment options. Audiological air-conduction results (see **Figure 1A**) for the right ear (red triangles) indicated a severe hearing loss at 0.25 and 0.5 kHz, dropping to a profound hearing loss at 1 kHz and rising to a moderate to moderately-severe hearing loss from 3 to 8 kHz. Bone-conduction results (red brackets) identified the loss as mixed with a conductive component (30–70 dB) at all frequencies and an additional sensorineural component from 0.5 to 2 kHz. The difference between the air- and bone-conduction thresholds identifies the conductive component of the participant's hearing loss, that is, the portion of the loss that could potentially be corrected through surgery. Hearing levels were normal in the unaffected left ear (blue symbols) resulting in a large hearing asymmetry between ears. With this level of asymmetry and the mixed nature of the hearing loss in the affected ear, audiological masking approaches become more complicated due to potential cross masking effects. One-third octave narrow bands of masking noise (centered at the test frequency) were presented to the unaffected ear to isolate the affected ear during testing. Testing was completed with TDH-49 headphones with 40–60 dB of interaural attenuation.
Although a 15 dB plateau was achieved at each frequency (increases in masking level at the unaffected ear did not increase the patient's response level in the affected ear), high levels of masking noise to the unaffected ear could possibly have been detected by the cochlea of the affected ear via bone conduction while attempting to identify accurate air-conduction thresholds of the affected ear.

Figure 1 caption (fragment): … triangles or circles for the right ear. Circles indicate unmasked right ear thresholds and triangles indicate thresholds obtained with masking noise presented to the better hearing left ear. Masked bone-conduction thresholds for the right ear are indicated with red left-facing brackets.

Pre-operative, high-resolution CT scanning revealed a slightly thickened stapes footplate with no other abnormalities suggestive of otosclerosis. The participant underwent a laser stapedotomy procedure with prosthetic reconstruction completed under general anesthesia through an aural speculum. This procedure is designed to restore mobility to the middle ear bones for improved sound conduction. A tympanomeatal flap was raised and the middle ear entered. The malleus and incus were mobile and the stapes was firmly fixed in the oval window, consistent with a congenitally fixed stapes. A CO2 laser was used to make a stapedotomy in the center of the stapes footplate and a Teflon piston was placed and connected to the incus. The meatal flap was returned into position.

#### **TEST MEASURES**

All testing occurred in double-walled sound booths with the participant comfortably seated. All test stimuli were from recordings and individually calibrated for the presentation equipment. Study measures were conducted pre-surgery and 3 and 9 months post-surgery.

Localization was measured with a roving-source Consonant-Nucleus-Consonant (CNC) word task (Potts et al., 2009; Firszt et al., 2012a). American English Melbourne CNC words (Skinner et al., 2006) were presented randomly from 15 loudspeakers (10° apart and numbered 1–15) along a horizontal plane and 140° arc at 60 dB SPL (±3 dB). Each test administration included 100 words, presented randomly from each of the 10 active loudspeakers with the carrier "Ready". The participant sat approximately three feet in front of the center loudspeaker, was unaware that five loudspeakers were inactive (#2, 4, 8, 12, 14), and was instructed to face the center loudspeaker between each presentation but was allowed head turns during each carrier-word presentation. Head turns were allowed since this more accurately simulates talker localization in conversational settings. After each presentation, the participant repeated the word and indicated the source speaker number. For localization, a root mean square (RMS) error score was calculated as the mean target-response difference, irrespective of error direction. The task was administered twice and scores from each administration averaged.
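As an illustration of the localization score, an RMS error over target-response pairs can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the mapping from loudspeaker number to azimuth (loudspeaker 1 at −70°, loudspeaker 15 at +70°) is assumed rather than reported, and the conventional RMS definition is used. The example responses are invented and are not study data.

```python
import math

def speaker_to_azimuth(speaker_number):
    """Map loudspeaker 1..15 to degrees azimuth (assumed: 1 -> -70, 15 -> +70)."""
    if not 1 <= speaker_number <= 15:
        raise ValueError("speaker_number must be between 1 and 15")
    return (speaker_number - 8) * 10

def rms_error_deg(target_speakers, response_speakers):
    """Conventional RMS error, in degrees, across target/response pairs."""
    if len(target_speakers) != len(response_speakers) or not target_speakers:
        raise ValueError("need equal-length, non-empty lists")
    squared = [
        (speaker_to_azimuth(r) - speaker_to_azimuth(t)) ** 2
        for t, r in zip(target_speakers, response_speakers)
    ]
    return math.sqrt(sum(squared) / len(squared))

def session_score(rms_first, rms_second):
    """Average the scores from the two administrations of the task."""
    return (rms_first + rms_second) / 2

# Hypothetical target and response loudspeaker numbers (not study data).
targets = [1, 7, 15, 9]
responses = [1, 9, 13, 9]
print(rms_error_deg(targets, responses))
```

Because errors are squared before averaging, the score ignores error direction and weights large misses more heavily than small ones.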

Sentence understanding in noise was evaluated with the Hearing In Noise Test (HINT; Nilsson et al., 1994) in the *R*-Space (Revit et al., 2002; Compton-Conley et al., 2004). The *R*-Space consists of eight loudspeakers equally spaced in a 360° array with each loudspeaker 24 inches from the center of the participant's head. All loudspeakers presented a recording of diffuse restaurant noise (60 dB SPL), replicating a challenging real-life listening situation. Two 20-sentence lists were administered and an average signal-to-noise ratio (SNR) for 50% accuracy (SNR-50) obtained. Sentences were presented from the front loudspeaker beginning at a +6 dB SNR and adapted to be easier or more difficult based on the participant's responses. An average of the final 17 SNRs resulted in an SNR-50 for each list.
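The SNR-50 calculation described above can be sketched as a simple average over the tail of each adaptive track. The function names are hypothetical, and the example tracks are invented to show the arithmetic only; the step size of the adaptive rule is not given in the text and is not modeled here.

```python
def snr50_for_list(presented_snrs_db, n_final=17):
    """Mean of the final n_final SNRs (dB) in one adaptive sentence list."""
    if len(presented_snrs_db) < n_final:
        raise ValueError("track is shorter than n_final")
    tail = presented_snrs_db[-n_final:]
    return sum(tail) / len(tail)

def session_snr50(list_one_snrs, list_two_snrs):
    """Average the SNR-50 values obtained from the two sentence lists."""
    return (snr50_for_list(list_one_snrs) + snr50_for_list(list_two_snrs)) / 2

# Hypothetical 20-sentence tracks starting at +6 dB SNR (not study data).
list_one = [6, 4, 2] + [-4] * 17
list_two = [6, 4, 2] + [-2] * 17
print(session_snr50(list_one, list_two))
```

Discarding the first three presentations keeps the initial descent from the easy +6 dB starting point out of the threshold estimate.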

An adaptive speech reception threshold (SRT) psychoacoustic task resulted in speech thresholds for quiet and nine competing noise conditions (modified from the task described by Litovsky (2005) and Johnstone and Litovsky (2006)). SRT determinations were for test spondees (two-syllable words with equal stress on both syllables) spoken by a male talker and always presented from a loudspeaker facing the front of the seated participant. Testing occurred with noise emanating from one of three loudspeakers each placed 1.5 m from the participant: directly in front, 90° to the right, or 90° to the left. There were three noise types: multitalker babble (MTB) and two types of single-talker noise (a female talker and a male talker each presenting Harvard IEEE sentences). Initial spondee presentations were at 60 dB SPL for each noise type and source location. A four-alternative forced-choice task, without feedback, determined subsequent presentation levels that continued through four reversals using an adaptive paradigm based on participant responses. Noise conditions varied randomly and were tracked independently, resulting in an SRT (average of the last three reversals) for each noise type and loudspeaker location (e.g., front, left, or right for MTB, female talker, and male talker noise).
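To make the reversal-based SRT concrete, the sketch below finds the levels at which an adaptive track changes direction and averages the last three. The function names, the reversal-detection logic, and the example track (including its step sizes) are assumptions for illustration; the text does not specify the step rule.

```python
def reversal_levels(levels_db):
    """Return the presentation levels at which the track changed direction."""
    reversals = []
    previous_direction = 0
    for previous, current in zip(levels_db, levels_db[1:]):
        step = current - previous
        if step == 0:
            continue  # no change in level, so no direction information
        direction = 1 if step > 0 else -1
        if previous_direction and direction != previous_direction:
            reversals.append(previous)
        previous_direction = direction
    return reversals

def srt_from_track(levels_db, n_last=3, n_required=4):
    """Average the last n_last reversal levels once n_required reversals exist."""
    reversals = reversal_levels(levels_db)
    if len(reversals) < n_required:
        raise ValueError("track has too few reversals")
    tail = reversals[-n_last:]
    return sum(tail) / len(tail)

# Hypothetical track starting at 60 dB SPL (not study data).
track = [60, 52, 44, 48, 44, 46, 44]
print(reversal_levels(track))
print(srt_from_track(track))
```

Tracking each noise condition independently, as described, would amount to keeping one such list of levels per noise type and loudspeaker location.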

A psychoacoustic measure determined the random spectrogram sound (RSS) that was most different from two others during each trial. RSS are noise-like stimuli created with independent control of temporal or spectral sound parameters, but with all stimuli based on summation across the same 6-octave bandwidth (250–16000 Hz) and with matching intensity and durations (Schönwiesner et al., 2005; Burton et al., 2012). Spectral RSS differed by dividing the bandwidth into 3–16 spectral bands and changing between bands at a fixed temporal rate of 3 Hz. Temporal RSS differed by changing between three spectral bands at temporal rates between 8 and 30 Hz. Spectral RSS had greater complexity with more spectral bands and temporal RSS were more complex with higher temporal rates. In a three-interval, three-alternative, forced-choice, "odd-man out" paradigm, one of the stimuli, identified as the "target", differed in complexity from two "standard" stimuli. "Standard" sounds always had the same RSS complexity, but differed in constituent random fields or in amplitude modulations. The outcome measure was the minimum detectable change, or just-noticeable difference (JND), in spectral or temporal complexity.

For spectral RSS, the number of spectral regions of the "standard" was 16, while the spectral regions for the "target" varied. The initial "target" had three spectral regions. After three consecutive correct responses, the "target" spectral regions increased with a "step-size" of three until an error occurred. After the first reversal, the "step-size" decreased to two spectral regions, and after three reversals, the "step-size" decreased to one spectral region. For temporal RSS, the temporal rate of the "standard" was 30 Hz, while the temporal rate varied for the "target" stimuli. The initial "target" temporal rate was 8 Hz. After three consecutive correct responses, the "target" temporal rate increased to 14 Hz. The same 6 Hz "step-size" in temporal rate repeated until an error occurred. After the first reversal, the "step-size" decreased to 3 Hz, and after three reversals, the "step-size" decreased to 1 Hz. After eight reversals, the "mean of target" values of the last four reversals provided an estimate of JNDs for spectral or temporal complexity. Feedback was provided for correct responses and each of four test runs concluded with eight reversals. Each of the JNDs for spectral and temporal RSS complexity was an average of the last four reversals.
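The step-size schedule and the JND estimate described above can be sketched as two small helpers. The function names are hypothetical and the reversal values are invented. How the "target" moves after an error is not stated in the text, so the sketch covers only the parts that are: the step size as a function of reversals so far, and the mean of the last four of eight reversals.

```python
def step_size(n_reversals, steps=(3, 2, 1)):
    """Step size given reversals so far: first value before any reversal,
    second after the first reversal, third after three reversals."""
    if n_reversals >= 3:
        return steps[2]
    if n_reversals >= 1:
        return steps[1]
    return steps[0]

def jnd_estimate(reversal_values, n_last=4, n_required=8):
    """Mean of the target values at the last n_last reversals of a run."""
    if len(reversal_values) < n_required:
        raise ValueError("run has not reached the required number of reversals")
    tail = reversal_values[-n_last:]
    return sum(tail) / len(tail)

SPECTRAL_STEPS = (3, 2, 1)  # spectral regions
TEMPORAL_STEPS = (6, 3, 1)  # Hz

# Hypothetical target values at eight reversals (not study data).
spectral_reversals = [6, 9, 7, 9, 8, 9, 8, 9]
print(step_size(0, SPECTRAL_STEPS), step_size(1, TEMPORAL_STEPS))
print(jnd_estimate(spectral_reversals))
```

The same helpers serve both tasks because only the step values differ between the spectral and temporal staircases.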

A questionnaire also evaluated the participant's perception of listening function. The Speech, Spatial and Qualities of Hearing scale (SSQ; Gatehouse and Noble, 2004) probes three listening domains (14–19 questions each). The Speech domain probes speech recognition in a variety of listening environments and a range of talker visibility. The Spatial domain examines awareness of sound direction, distance and movement. In the Qualities domain, respondents indicate sound naturalness, listening effort and the ability to segregate multiple sounds. Each item is ranked on a scale from 0 (least ability) to 10 (most ability). Individual SSQ items are grouped by processing demands to obtain 10 subscale scores (Gatehouse and Akeroyd, 2006; Dwyer et al., 2013), four subscales within the Speech domain, two within the Spatial domain, and four within the Qualities domain.

Previously described protocols (Burton et al., 2012, 2013) were used to perform functional magnetic resonance imaging (fMRI), preprocess the images, and analyze blood oxygen level-dependent (BOLD) responses to auditory stimulation of the unaffected left ear. BOLD responses to RSS stimuli of 2 s duration were recorded at three times: pre-surgery and at 3 and 9 months post-surgery. Presentation times for RSS stimuli varied during 9-s silent intervals in 11-s volume acquisitions, which enabled capture of BOLD at different stimulus delays with respect to the echo-planar images (EPI) in this interrupted single event design (Belin et al., 1999). Subsequent assembly of BOLD amplitudes relative to baseline (% change in response) at stimulus delays from 2 to 9 s prior to the EPI reconstructed an average BOLD response time course. The average was from 24 trials for each stimulus to EPI delay time. Separately for each imaging session, *F*-tests per voxel evaluated whether the variance of evoked BOLD responses was greater than variance due to baseline noise. *F*-statistics were transformed to equally probable *z*-scores (*F*-*Z*stats). The significance of *F*-*Z*stats was determined after multiple comparison corrections based on Monte Carlo simulations (Forman et al., 1995) and with a correction threshold of *z* = 4.0 across 12 face-connected voxels and for *p* = 0.05.
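The reconstruction of an average BOLD time course from per-delay trials can be illustrated as follows. This is a sketch only: the function names are hypothetical, the percent-change formula is the conventional one (signal relative to a single baseline value), and the numbers are invented. The actual preprocessing and baseline definition follow the cited protocols and are not reproduced here.

```python
from collections import defaultdict

def percent_change(signal, baseline):
    """Conventional percent signal change relative to baseline."""
    return 100.0 * (signal - baseline) / baseline

def average_time_course(trials, baseline):
    """Average percent change per stimulus-to-EPI delay.

    trials: iterable of (delay_s, signal) pairs, one pair per trial.
    Returns a dict mapping each delay (s) to its mean percent change.
    """
    by_delay = defaultdict(list)
    for delay_s, signal in trials:
        by_delay[delay_s].append(percent_change(signal, baseline))
    return {
        delay_s: sum(values) / len(values)
        for delay_s, values in sorted(by_delay.items())
    }

# Hypothetical signals in arbitrary units (not study data).
trials = [(2, 101.0), (2, 103.0), (4, 102.0)]
print(average_time_course(trials, baseline=100.0))
```

In the study, each delay would contribute 24 trials rather than the one or two shown here.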

The distribution of volume-based *F*-*Z*stats from each imaging session was superimposed on coronal slices through the patient's brain. Additionally, BOLD response time courses were extracted from regions of interest for peak *F*-*Z*stats identified with an automated search (Kerr et al., 2004). After registering the *F*-*Z*stats to the PALS-B12 surface-based atlas (Van Essen, 2005; Van Essen and Dierker, 2007), the activated auditory cortical fields were evaluated as previously described (Burton et al., 2012). Thus, the analyses evaluated activity in core primary auditory (Te1), planum temporale (Te2), and planum polare (Te3).

#### **RESULTS**

**Figure 1B** shows post-surgical audiometric thresholds as a function of frequency. (Results from the 3-month evaluation are shown. Note that the 9-month results were equivalent.) As expected, thresholds for the NH or left ear (blue symbols) were unchanged compared to pre-surgical results. The red line connects post-surgical air-conduction thresholds (obtained with insert earphones having approximately 75–90 dB interaural attenuation). Open circles indicate unmasked thresholds and open triangles indicate thresholds obtained with the unaffected ear masked. Post-surgical bone-conduction thresholds are indicated by red brackets. Air-conduction thresholds in the right ear improved 35–45 dB through the low and mid frequencies and 5–40 dB in the high frequencies. Surgery successfully eliminated the hearing loss conductive component from 0.5–3 kHz and most of the conductive component at 0.25 and 4 kHz.

**Figure 2** shows localization results. For each plot, the location of the loudspeaker source (in degrees azimuth) is indicated along the *x*-axis and of the reported loudspeaker along the *y*-axis. Means and standard deviations of reported loudspeakers are plotted for each source loudspeaker location for pre-surgery and post-surgery test intervals of 3 and 9 months, respectively, in Panels **A** to **C**. Correct identification of all presentations results in a straight diagonal line from the lower left- to the upper right corner. Each panel also includes the RMS error score for that test interval. A robust (Sandwich estimator) regression analysis method (with a Bonferroni adjustment for multiple comparisons) identified a significant effect of test interval [*F*(2,594) = 11.97, *p <* 0.003] and follow-up comparisons indicated significantly improved localization compared to baseline at 3 months [*F*(1,549) = 17.26, *p <* 0.003] and at 9 months [*F*(1,594) = 23.70, *p <* 0.001]. The improvement at 9 months compared to 3 months was not statistically significant (*p >* 0.05).

Figure 2 caption (fragment): … *x*-axis shows the sound-source loudspeaker locations in degrees azimuth that range from −70 degrees toward the NH ear side at axis' left to +70 degrees toward the hearing-impaired ear side at the axis' right. The *y*-axis shows the possible reported response loudspeaker locations in degrees azimuth that …

**Figure 3** displays results for *R*-Space sentence understanding in the presence of restaurant noise. The three test intervals are indicated along the *x*-axis and SNR score along the *y*-axis. Note that for this measure lower scores indicate better performance. Pre-surgery, P1 had 50% accuracy for sentences presented 3.5 dB softer than the surrounding restaurant noise (60 dB SPL). By 9 months post-surgery, P1 was able to understand sentences at even softer levels (4.9 dB softer than the restaurant noise). P1's performance improved (1.4 dB) from pre-surgery to the 9-month test interval. The 95% confidence interval for NH adults and noise from the front is ±1.2 dB (Nilsson et al., 1995).

**Figure 4** shows performance at the three test intervals for the Adaptive SRT test. A lower score indicates better performance. Panel **A** shows P1's performance in quiet. The softest level that P1 was consistently able to identify target speech from a closed set of four spondees in quiet did not differ substantially by test interval. Her SRT in quiet ranged from 12.9 dB pre-surgery to 10.3 dB at 3 months post-surgery. This is comparable to performance of a group of 24 NH adults (ages 22–67 years) on this same task (see Figure 6 of Firszt et al., 2012b). Panel **B** shows performance in the presence of the three noise types, female talker (green diamonds), male talker (orange squares) and MTB (purple triangles). Performance was more difficult with MTB than with each single-talker noise type and results were very similar in the presence of female and male talker noise. Performance improved at each successive test interval for each noise type except with female talker noise. Performance with MTB showed the greatest improvement over time (pre-surgery SRT = 48.4 dB, 3-month SRT = 45.6 dB and 9-month SRT = 44.0 dB). Panel **C** shows results by noise location: right side with hearing loss (purple triangles), front (orange squares) and left side with the intact ear (green diamonds). The task was easier with noise from the affected side (right) than noise from the front (orange squares) or from the intact ear side (left). P1's scores for noise from the front were similar to scores for noise from the intact ear side.

Figure 2 caption (fragment): … error bars represent ±1 standard deviation. Mean RMS Error in degrees is noted within the plot for each test interval. Asterisks in Panels **B** and **C** indicated significantly improved localization results compared to pre-surgery (*p <* 0.001).

Figure 3 caption (fragment): … understand sentences in the presence of higher noise levels (e.g., an SNR of −4 dB indicates the restaurant noise was 4 dB louder than the target sentence).

**Figure 5** shows results from the two psychoacoustic measures with RSS stimuli that differed in spectral or temporal complexity, with test interval along the *x*-axis and JND score along the *y*-axis. Performance for detecting differences from a standard of 16 spectral regions in the number of regions (purple squares) included in spectral RSS stimuli (i.e., spectral complexity) was very similar at all three test intervals (JND 8.8 pre-surgery to 8.2 at 9 months). P1's performance detecting differences from a standard temporal rate of 30 Hz in the rates (green circles) of temporal RSS stimuli (i.e., temporal complexity) was similar at pre-surgery (JND 10.8) and at 3 months (JND 10.4). By 9 months P1's JND for RSS temporal complexity differences had improved to 7.8. P1's performance on both tasks is similar to that of a group of 20 NH adults, ages 23–62 years, listening bilaterally across four test runs (**Figure 5**; Firszt et al., 2012b).

**Figure 6** shows SSQ questionnaire results describing self-perceived abilities in various listening scenarios by domain and subscale. The rating plots in Panel **A** are from four subscales in the Speech domain: Speech in Quiet (SiQ) as blue circles, Speech in Noise (SiN) as purple triangles, Speech in Speech Contexts (SiSCont) as orange squares, and Multiple Speech Stream Processing and Switching (MultStream) as green diamonds. Mean and SD subscale ratings by a group of 21 NH adults, ages 27–73, were reported by Dwyer et al. (2013). For reference, P1's ratings are compared to that NH group's ratings. P1's SiQ ratings were similar to the NH group's ratings at all three test intervals (9.3 pre-surgery to 10 at 9 months). P1's SiN ratings improved over time (5.1 pre-surgery to 7.3 at 9 months), but were poorer than the NH group's. The SiSCont and MultStream ratings improved over time and were similar to the NH group's ratings by 9 months post-surgery (SiSCont 5.3 pre-surgery to 7.4 at 9 months; MultStream 3.3 pre-surgery to 8.2 at 9 months).

The rating plots in Panel **B** are from two subscales in the Spatial domain: Distance and Movement (DisMov) as purple circles and Localization (Loc) as green triangles. P1's ratings for both subscales improved over time and were similar to the NH group ratings by 9 months (DisMov 5.7 pre-surgery to 8.7 at 9 months; Loc 4.0 pre-surgery to 8.0 at 9 months). The rating plots in Panel **C** are from four subscales in the Qualities domain: Segregation of Sounds (SegSnds) as purple triangles, Identification of Sound and Objects (IdSnd) as green diamonds, Sound Quality and Naturalness (Qlty) as orange squares and Listening Effort (Eff) as purple circles. SegSnds, IdSnd and Qlty ratings were all above nine and comparable to the NH group ratings at all three test intervals. Pre-surgery, Eff was poorer than that of the NH group but improved and was similar to that group's ratings by 9 months (6.0 pre-surgery to 8.3 at 9 months). In summary, P1's ratings were poorer than ratings of the NH adults reported by Dwyer et al. (2013) on 6 of 10 subscales prior to surgery. Each of those subscales improved and was more similar to the NH group's ratings by the 9-month test interval except for the SiN subscale.

The top row of **Figure 7** shows an inflated and laterally tilted view of the PALS-B12 atlas that reveals auditory cortex on the superior temporal plane. Drawn onto this plane are cytoarchitectonic borders for three auditory cortical fields (Te1, Te2, and Te3) (Glasser and Van Essen, 2011). The labeled purple spheres (A1, A2, B1 and B2) and comparable locations on the coronal sections indicate the locations of regions of interest strongly activated by the RSS stimuli (i.e., foci with peak *F*-*Z*stats). The most affected auditory cortical fields were Te1 and Te2. As shown in the leftmost column of coronal sections, prior to surgery the extent of significant activation was limited and bilaterally distributed, but slightly more extensive in the left hemisphere (LH), ipsilateral to the stimulated unaffected left ear. The activated zone was larger bilaterally at 3 months post-surgery, as shown in the middle column of sections. Activity was even more widely distributed by 9 months, as shown in the right-most column of sections. Especially prominent at 9 months was a substantial response distribution contralateral to the stimulated left ear that extended over most of the posterior superior temporal plane and spread more than 1 cm from anterior to posterior. The affected contralateral cortex included all Te auditory cortical fields at 9 months post-surgery.

The panels in **Figure 8** show response time courses from core (Te1.0 and Te1.1) and posterior belt (Te2) auditory fields. Response amplitudes were largest at 9 months post-surgery, especially in the belt area Te2 ipsilateral to the stimulated left ear and also in the anterior core Te1.0 field, contralateral to the left ear. Amplitudes were more nearly equal bilaterally before and after surgery in the center of primary auditory cortex located in Te1.1 fields of each hemisphere.

## **DISCUSSION**

Corrective surgery for the conductive portion of the hearing loss substantially improved hearing thresholds from 0.25 to 6 kHz. Prior to surgery, P1 could not detect sound in the mid frequencies and perceived loud sounds as soft in the lowest and highest frequencies of the affected right ear. After surgery, air-conduction thresholds were close to P1's bone-conduction thresholds, closing the air-bone gap. Sensorineural threshold levels remained, as expected, and thus hearing levels were essentially in the mild to moderately-impaired range, reaching 25–35 dB HL at 0.25, 3, 4 and 6 kHz.

Source localization significantly improved following surgery. The largest improvement was between pre-surgery and the 3-month post-surgery test interval, with stable performance between the 3- and 9-month test intervals. Wilmington et al. (1994) also showed significant localization improvements pre- to post-operatively when measured at 4 and 24 weeks after surgery to correct congenital UCHL in a group of patients aged 6–33 years; however, only three of the individuals were within the authors' normal confidence interval after hearing correction. Similar to our individual case, there were no significant changes in localization performance over time post-operatively for their study participants, that is, between the 4- and 24-week test intervals.

Considering speech recognition in noise, P1's pre-surgical abilities varied by measure, as did whether performance improved post-surgery. For example, on the *R*-Space task, P1's ability to understand sentences in the presence of restaurant noise improved between the pre-surgery and 9-month post-surgery test intervals. Noise type and location affected Adaptive SRT results. P1's SRT scores were similar when listening to words in the presence of single-talker noise regardless of talker gender. P1's poorest SRTs were in the presence of MTB noise, as were the greatest improvements. With respect to the noise source, it was not surprising that P1 was able to hear the words at softer levels when noise was toward the affected (right) ear, even after surgery, than when the noise was toward the unaffected (left) ear or from the front. These results for the affected and unaffected ear were similar to the results of Wilmington et al. (1994) using a speech-in-noise task with low and high context sentences and competing babble (revised SPIN test, Bilger et al., 1984). When noise was toward the atretic ear, the majority of participants were within their NH group's 95% confidence interval, but when noise was toward the NH ear, only three of the 14 participants were within the NH confidence interval. A reported case of corrected long-term UCHL (Anderson, 1985) also had more difficulty understanding speech in speech-shaped noise than 10 NH controls through 5 months post-correction, even at an advantageous SNR (+20 dB). The reduced performance was present for both the affected and unaffected ears. By 14 months post-correction, performance matched that of the NH control group at positive SNRs but continued to be poorer than NH controls at 0 dB SNR, a fairly common SNR in noisy environments.

For the psychoacoustic task using RSS stimuli, results showed no change in detecting spectral complexity differences after surgical correction of hearing to the right ear. RSS temporal complexity detection did improve at the 9-month test interval; however, performance at all test intervals (including pre-operative) was similar to that of NH individuals listening bilaterally (Firszt et al., 2012b). Thus, detection of spectral and temporal cues was not enhanced by unilaterally improved acoustic hearing for this participant. Perhaps this task is less reliant on binaural processing than listening in noise or localizing sound sources. It should be noted that for this task and the speech recognition measures, P1 completed the measures at three test intervals, introducing the possibility of a training effect, whereas the NH group data provided for comparison were from single test sessions.

Results of the SSQ were in general consistent with findings of the behavioral measures. For four of the subscales (SiQ and all of the subscales in the Qualities domain except Eff), P1's ratings were between 9 and 10 and similar to ratings of the NH group (*n* = 21) reported by Dwyer et al. (2013). For all other subscales except SiN, ratings were poorer than those of the NH group of Dwyer et al. (2013) pre-surgery but more similar after 9 months of binaural hearing experience. On the SiN subscale, P1 continued to have more difficulty than the NH group, even after 9 months of experience.

The self-assessment results from the SSQ were augmented by comments from P1. According to the participant, 1 week after surgery, all sounds were louder, at times exceptionally loud and even startling, leading to the need to depart from the immediate environment when this occurred and find a location that was quiet. Strategies for dealing with noise that had worked in the past, such as turning the impaired ear toward the noise source, no longer worked and led to confusion. Hearing from behind was a completely new sensation, and also created uncertainty as to where sound was coming from. P1 was involved with music, gave piano and violin lessons, and performed in a band. Sound quality and pitch changes were common during the early post-operative period. It was not always clear "what to make of all this sound".

Three months after surgery, sounds and sources were less confusing, sound was not as loud, and localization and sound quality had improved. P1 reported hearing new sounds that were not previously audible, even though P1 had one NH ear. The ability to hear at greater distances was notable, as was the ability to find and identify sounds in the surroundings without being able to see them. Prior to surgery, P1 often was unaware of a sound until the corresponding source came into view. This changed after surgery, when, for example, hearing a person before seeing them was described as something novel. Being in noisy environments was less distracting and the ability to overhear conversation while holding a different conversation was a new experience. P1 commented that while regaining hearing abilities in the right ear, there was a sensation that the left ear (the unaffected ear) was also changing at the same time, and the two ears were "starting to play nice together". The result was hearing in stereo, again a notion that P1 had not realized prior to hearing correction. Finally, P1 reported increased feelings of confidence and reduction of effort to hear and communicate in everyday life compared to pre-surgery. When questioned 9 months after surgery, the comments from P1 were similar to what had been previously reported. Overall, the descriptions mimic a period of over-excitation by sound followed by adaptation and shifting to allow both ears to contribute to improved hearing performance using enhanced binaural abilities.

A case study presented by Stange et al. (2001) was of a 62-year-old woman with a congenital maximum UCHL who reported hyperacusis (abnormal sensitivity to sound intensity) for more than 2 years following hearing restoration. Both the case reported by Stange et al. (2001) and the current case involved congenital onset and extensive time with UCHL, and both reported an increased sensitivity to sound. However, in the current case, the sensitivity diminished within the first 3 months following hearing restoration. One possible explanation for the difference may be that P1 continued to have some sensorineural hearing loss after surgery. Prior to surgery, P1 demonstrated a large asymmetry between ears; however, hearing thresholds were 50–70 dB HL at 3 kHz and above, providing some high-frequency input. Although this hearing was not considered useful based on participant report, some stimulation was present and may have contributed to quicker assimilation of the new auditory information. In this case, extended asymmetry in hearing did not prevent auditory system changes consequent to restored binaural hearing. Also promising was that P1 made significant improvements in binaural abilities despite hearing correction in adulthood at age 41. Gray et al. (2009) reported on combined data from two different studies (participants' age range 6–53 years, mean pre-operative hearing loss about 64 dB) and suggested that, after surgery for congenital aural atresia, older adults (38 years and greater) performed more poorly when noise was toward the corrected ear than younger adults. Assessments were made, however, only 1 month after hearing correction and participant ages were not provided.

There was also evidence of neuroplastic reorganization in auditory cortex that especially involved core and belt regions in the current case. The overall extent and amplitudes of contralateral auditory cortex responses increased after hearing recovery, resulting in increased bilateral activation. Vasama et al. (1994) also found enhancements in auditory cortex in six participants with congenital UCHL using magnetoencephalography (MEG). They showed the expected asymmetric larger amplitudes in five patients and shorter latencies in three patients over the hemisphere contralateral to the stimulated NH ear (Vasama et al., 1994). The six patients ranged in age from 7–28 years and had mean hearing threshold levels in the affected ear near 70 dB HL. Only one patient showed larger amplitudes over the ipsilateral hemisphere, a finding similar to reports of cortical responses in patients with profound unilateral sensorineural hearing loss (Khosla et al., 2003; Hanss et al., 2009; Burton et al., 2012, 2013; Maslin et al., 2013). Of note, this individual also had greater hearing loss (about 90 dB HL). A later study compared preoperative and 2-months post-operative auditory-evoked magnetic fields in seven adults ages 26–51 with UCHL who underwent surgery for otosclerosis or abnormal ossicular chains (Vasama et al., 1998). Duration of hearing loss was 6–14 years and hearing loss in the affected ears was 57 dB HL before surgery and 17 dB afterwards. Based on MEG recordings, all patients showed cortical response changes after surgery and correction of the UCHL. Changes took the form of earlier latencies and increased response strengths contralateral to the stimulated ear, with some ipsilateral changes. Although mean pre-operative values did not differ from NH controls, the post-operative changes differed. This finding parallels that of our present participant, suggesting modification to cortical responses following hearing improvement.

In summary, a re-established balance to binaural activity might explain the trend toward improved performance on measures after the stapedotomy procedure for the studied participant. The SSQ results showed that ratings of the participant's perceived ability were similar to the NH comparison group (Dwyer et al., 2013) by 9 months on all subscales except SiN. It is possible that both the pre-surgery unilateral auditory deprivation and the post-surgery continued asymmetry between ears (due to the sensorineural hearing loss in the affected ear) influenced the speech perception in noise deficits. Persistent sensorineural hearing loss might also explain retained symmetrical activation of auditory cortex as opposed to a normally asymmetrical contralateral activation pattern in NH.

The improvements noted in binaural abilities were clinically promising, because they indicated that auditory system connections could re-activate with elimination of conductive hearing problems despite being physiologically latent well beyond any imagined critical developmental period. Most encouraging was that the restored physiology in native connectivity, even if only partial, had substantial behavioral consequences for the participant. The mechanisms responsible for these changes are unknown. However, they might arise from increased balance to binaural neural firing rates with the reintroduction of crossed inhibition that leads to changes in central gain mechanisms. At the same post-surgery test intervals, the observed enhanced contralateral activity in core and adjacent auditory cortical fields is possibly a manifestation of these changes in central gain. In contrast, sudden sensorineural unilateral hearing loss following surgical resection of acoustic neuromas results in the opposite effect of reduced crossed inhibition from the affected ear together with possible reduced central gain (Burton et al., 2013; Maslin et al., 2013). Together, these results support the need for future research on unilateral auditory deprivation effects and plasticity, with consideration for length of deprivation, age at hearing correction, degree of hearing, and type of hearing loss (conductive versus sensorineural).

#### **ACKNOWLEDGMENTS**

The authors acknowledge our participant's time and effort in this study. This work was supported by R01DC009010 from the National Institute on Deafness and Other Communication Disorders.

**Conflict of Interest Statement**: The Associate Editor, Dr. Jonathan Peelle, declares that, despite being affiliated with the same institution as the authors (the Department of Otolaryngology of Washington University School of Medicine), he handled the review process objectively and that no conflict of interest exists.

*Received: 18 September 2013; accepted: 24 November 2013; published online: 13 December 2013*.

*Citation: Firszt JB, Reeder RM, Holden TA, Burton H and Chole RA (2013) Changes in auditory perceptions and cortex resulting from hearing recovery after extended congenital unilateral hearing loss. Front. Syst. Neurosci. 7:108. doi: 10.3389/fnsys.2013.00108*

*This article was submitted to the journal Frontiers in Systems Neuroscience*.

*Copyright © 2013 Firszt, Reeder, Holden, Burton and Chole. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## Training changes processing of speech cues in older adults with hearing loss

## *Samira Anderson<sup>1,2†</sup>, Travis White-Schwoch<sup>1,2</sup>, Hee Jae Choi<sup>1,2</sup> and Nina Kraus<sup>1,2,3,4,5</sup>\**

*<sup>1</sup> Auditory Neuroscience Laboratory, Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL, USA*

*<sup>2</sup> Department of Communication Sciences, Northwestern University, Evanston, IL, USA*

*<sup>3</sup> Institute for Neuroscience, Northwestern University, Evanston, IL, USA*

*<sup>4</sup> Department of Neurobiology and Physiology, Northwestern University, Evanston, IL, USA*

*<sup>5</sup> Department of Otolaryngology, Northwestern University, Evanston, IL, USA*

#### *Edited by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

*Reviewed by:*

*Preston E. Garraghty, Indiana University, USA*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

#### *\*Correspondence:*

*Nina Kraus, Auditory Neuroscience Laboratory, Northwestern University, 2240 Campus Drive, Evanston, IL 60208, USA e-mail: nkraus@northwestern.edu www.brainvolts.northwestern.edu*

#### *†Present address:*

*Samira Anderson, Department of Hearing and Speech Sciences, University of Maryland, College Park, USA*

Aging results in a loss of sensory function, and the effects of hearing impairment can be especially devastating due to reduced communication ability. Older adults with hearing loss report that speech, especially in noisy backgrounds, is uncomfortably loud yet unclear. Hearing loss results in an unbalanced neural representation of speech: the slowly-varying envelope is enhanced, dominating representation in the auditory pathway and perceptual salience at the cost of the rapidly-varying fine structure. We hypothesized that older adults with hearing loss can be trained to compensate for these changes in central auditory processing through directed attention to behaviorally-relevant speech sounds. To that end, we evaluated the effects of auditory-cognitive training in older adults (ages 55–79) with normal hearing and hearing loss. After training, the auditory training group with hearing loss experienced a reduction in the neural representation of the speech envelope presented in noise, approaching levels observed in normal hearing older adults. No changes were noted in the control group. Importantly, changes in speech processing were accompanied by improvements in speech perception. Thus, central processing deficits associated with hearing loss may be partially remediated with training, resulting in real-life benefits for everyday communication.

**Keywords: auditory plasticity, speech envelope, aging, hearing loss, speech perception, temporal fine structure, temporal coding**

## **INTRODUCTION**

Hearing loss is associated with reduced quality of life in older adults, affecting social and emotional well-being (Heine and Browning, 2002). The effects of hearing loss are especially noticeable in noisy backgrounds with multiple talkers (Dubno et al., 1984; Jin and Nelson, 2010). Many factors contribute to age-related deficits in speech-in-noise perception, including cochlear pathology (Dubno et al., 1984), impairments in central auditory processing (Phillips et al., 2000), and decreased cognitive resources (Pichora-Fuller, 2003; Wingfield et al., 2005; Peelle et al., 2010). Perceptual and neurophysiologic studies have demonstrated that central auditory processing is compromised by aging and hearing loss (Gordon-Salant et al., 2006; Clinard et al., 2010; Lister et al., 2011; Anderson et al., 2012). Hearing aid amplification improves audibility but cannot restore central auditory function (Tremblay et al., 2003). Training-driven neuroplasticity, which can partially reverse the effects of age-related deficits in temporal resolution in humans (Anderson et al., 2013c) and animals (de Villers-Sidani et al., 2010), may provide a means for treating abnormal central function processing associated with hearing loss.

One chief barrier to the wider use of amplification has been that hearing aids often make speech louder without improving clarity, especially in background noise (Johnson and Dillon, 2011). This phenomenon may arise from the effects of hearing loss on central processing of speech. Two acoustic aspects of speech, the slowly varying temporal envelope and the rapidly varying temporal fine structure (TFS), have been extensively studied (Ardoint et al., 2010; Hopkins and Moore, 2011). Although the envelope appears to be the dominant cue for understanding speech in quiet and in steady-state noise, both the envelope and TFS may play a role when listening to speech in fluctuating noise (Shannon et al., 1995; Moore, 2008). However, the role of TFS for understanding speech in fluctuating noise remains an area of debate (Oxenham and Simonson, 2009). In an animal model, sensorineural hearing loss is associated with enhanced envelope coding in the auditory nerve (Kale and Heinz, 2010) and midbrain (Anderson et al., 2013a). Perceptual studies also suggest exaggerated encoding of the envelope in humans with unilateral hearing loss (Moore et al., 1996) and potentially reduced ability to use TFS cues (Lorenzi et al., 2006). These effects might explain the hearing impaired listener's complaint that speech is loud yet unclear (Johnson and Dillon, 2011).

Because hearing aid amplification primarily addresses loss of audibility associated with cochlear pathology, there is a need to develop novel methods for counteracting the deleterious effects of hearing loss on central processing. Here, we assessed whether auditory-based cognitive training can be used to partially restore the imbalance of speech cue representation associated with exaggerated envelope encoding. Since both TFS (Sheft et al., 2008) and envelope (Swaminathan and Heinz, 2012) cues play an important role in consonant perception, we hypothesized that directed attention to the fast-changing consonant-vowel (CV) transition within an adaptive training paradigm results in reweighing of speech cue representation, such that the TFS becomes more salient. To test this hypothesis, we randomly assigned older adults with and without hearing loss to complete 8 weeks of computerized training. We evaluated subcortical representation of envelope and TFS cues before and after training in addition to speech perception in noise, auditory short-term memory, and auditory attention.

## **MATERIALS AND METHODS**

#### **PARTICIPANTS**

We recruited 58 (35 female) participants (ages 55–79) for a study examining training for speech-in-noise processing. Pure-tone audiometric thresholds were obtained bilaterally at octave intervals 0.125–8 kHz and at 3 and 6 kHz. No left/right asymmetries or interaural click-evoked auditory brainstem response Wave V differences (≥0.2 ms) were noted. No participants had a history of neurologic conditions and all participants had normal IQs [≥85 on the WASI (Zhu and Garcia, 1999)]. Participants provided informed consent for procedures that were approved by the Northwestern Institutional Review Board and were paid for their time.

Participants were divided into normal-hearing and hearing-impaired subgroups. The normal-hearing participants had hearing thresholds ≤25 dB HL through 6 kHz and the participants with hearing loss had hearing thresholds ≤80 dB HL through 8 kHz. See **Figure 1** for average thresholds in the normal-hearing and hearing-impaired groups. Participants from both hearing groups were randomly assigned to complete either auditory-based cognitive training or active control training (Smith et al., 2009). Both involved training on an in-home computer, 1 h/day, 5 days/week, for 8 weeks. The *auditory training* group completed Brain Fitness™ cognitive training (Posit Science Corporation, San Francisco, CA) consisting of six modules designed to increase the speed and accuracy of auditory processing: (1) time-order judgments of frequency-modulated sweeps, (2) discriminating between pairs of confusable syllables, (3) recognizing sequences of confusable syllables and words, (4) matching pairs of confusable syllables and words, (5) implementing sequences of commands, and (6) answering questions from stories (see Smith et al., 2009 for details). In the first module, an adaptively decreasing inter-stimulus interval challenges processing speed. In subsequent modules, focused attention is directed to adaptively expanding and contracting CV transitions. The overall goal of the training is to improve sensory function, based on the idea that an increase in the quality of neural information flowing through peripheral and central sensory systems may lead to better cognitive function (Schneider and Pichora-Fuller, 2000). Completion of training was verified through automated online logs. The *active control* group watched educational DVDs and completed multiple-choice questions about the content. See **Figure 2** for a schematic of the experimental design. Training groups (and hearing subgroups) were matched for sex, age, hearing, click-evoked wave V brainstem latency, IQ, and test-retest intervals (**Table 1**).
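The audiometric grouping criteria above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names and the example audiograms are hypothetical, and the choice of 0.5, 1, 2, and 4 kHz for the pure-tone average (PTA) is an assumption based only on the "0.5–4 kHz" range given for Table 1.

```python
from statistics import mean

# Assumption: PTA uses the octave frequencies within the 0.5-4 kHz range
# named in the Table 1 caption; the study may have used a different set.
PTA_FREQS_KHZ = (0.5, 1.0, 2.0, 4.0)


def pure_tone_average(thresholds_db_hl):
    """Mean threshold (dB HL) over the assumed PTA frequencies."""
    return mean(thresholds_db_hl[f] for f in PTA_FREQS_KHZ)


def hearing_group(thresholds_db_hl):
    """Classify one audiogram with the cut-offs quoted in the text.

    thresholds_db_hl maps frequency in kHz to threshold in dB HL.
    """
    through_6k = [t for f, t in thresholds_db_hl.items() if f <= 6.0]
    through_8k = [t for f, t in thresholds_db_hl.items() if f <= 8.0]
    if all(t <= 25 for t in through_6k):
        return "normal hearing"
    if all(t <= 80 for t in through_8k):
        return "hearing loss"
    return "outside study criteria"


# Hypothetical audiograms (not study data).
example_nh = {0.125: 10, 0.25: 10, 0.5: 15, 1.0: 15, 2.0: 20,
              3.0: 20, 4.0: 25, 6.0: 25, 8.0: 40}
example_hl = {0.125: 15, 0.25: 15, 0.5: 20, 1.0: 25, 2.0: 35,
              3.0: 45, 4.0: 50, 6.0: 60, 8.0: 70}
```

In this sketch the normal-hearing check ignores the 8 kHz threshold, because the criterion in the text extends only through 6 kHz.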

#### **BEHAVIORAL ASSESSMENTS**

#### *Perceptual—speech perception in noise*

The QuickSIN is a non-adaptive measure of speech perception in noise. Sentences are presented in a background of four-talker babble and the SNR decreases by 5 dB for each sentence. Participants receive a point for correctly repeating target words. The total number of points is subtracted from 25.5 to arrive at a final SNR loss [or the increase in SNR required to achieve 50% correct performance compared to normal performance of 0 dB SNR; see Killion et al. (2004)].
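The scoring rule described above can be sketched as follows. The subtraction from 25.5 comes from the text; the list structure (six sentences with five target words each) is an assumption drawn from the standard QuickSIN format rather than from this article, and the example scores are hypothetical.

```python
def quicksin_snr_loss(words_correct_per_sentence):
    """SNR loss (dB) for one list: 25.5 minus the total target words correct.

    Assumes six sentences per list with five target words each.
    """
    if len(words_correct_per_sentence) != 6:
        raise ValueError("expected scores for six sentences")
    if any(not 0 <= w <= 5 for w in words_correct_per_sentence):
        raise ValueError("each sentence score must be between 0 and 5")
    return 25.5 - sum(words_correct_per_sentence)


def mean_snr_loss(lists):
    """Average SNR loss across several lists."""
    losses = [quicksin_snr_loss(scores) for scores in lists]
    return sum(losses) / len(losses)


# Hypothetical scores for two lists (not study data).
list_a = [5, 5, 5, 4, 3, 1]   # 23 words correct
list_b = [5, 5, 5, 5, 4, 1]   # 25 words correct
```

Averaging over several lists, as in `mean_snr_loss`, reflects the use of four lists mentioned in the Results.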

#### *Cognitive*

Two subtests of the Woodcock-Johnson III Cognitive Test Battery (Woodcock et al., 2001) were used to obtain an age-normed cluster score for auditory short-term memory: Numbers Reversed and Memory for Words. The attention score was based on the sustained index of overall attention (visual and auditory) on the IVA+: Integrated Visual and Auditory Continuous Performance Test (BrainTrain, North Chesterfield, VA). Due to equipment malfunction, we are missing attention data from 3 participants in the *auditory training* group and 1 participant in the *active control* group.

## **ELECTROPHYSIOLOGY**

#### *Stimulus*

We duplicated the electrophysiologic methods previously used to document enhanced envelope coding in older adults with hearing loss (Anderson et al., 2013a), using the protocol that produced the greatest hearing loss effects. The stimulus was a 40-ms syllable [da] synthesized in a Klatt-based synthesizer (Klatt, 1980). It began with a 5-ms onset burst followed by a CV transition and was perceived as a full CV syllable although it lacked a steady-state vowel. After the initial onset burst, the fundamental frequency (F0) of the stimulus rose linearly from 103 to 125 Hz while the formants shifted as follows: F1: 220 → 720 Hz; F2: 1700 → 1240 Hz; F3: 2580 → 2500 Hz. The fourth (3600 Hz) and fifth (4500 Hz) formants remained constant for the duration of the stimulus. A spectrogram and Fourier transform of the stimulus waveform are presented in **Figures 3A,B**, respectively.

**Table 1 | Means and** *SD***s are provided for age, IQ, and hearing (PTA dB HL; average hearing threshold 0.5–4 kHz) for all participants in the** *Auditory Training* **and** *Active Control* **groups and the subgroups of normal hearing and hearing impaired.**

*The groups are matched on all listed variables (all p's* > *0.10).*

To partially equate for the effects of hearing loss, the [da] stimulus was individually amplified based on the National Acoustics Laboratory-Revised algorithm (NAL-R; Byrne and Dillon, 1986) using a custom program in MATLAB (The MathWorks, Inc., Natick, MA). The amplified presentation level did not exceed 90 dB SPL or the participants' loudness discomfort thresholds. We have previously found that the amplification procedure improves morphology without distorting spectral components of the response (Anderson et al., 2013b). In addition, although we found enhanced encoding in brainstem responses to both amplified and unamplified stimuli, the differences were clearest for the amplified stimuli (Anderson et al., 2013a); therefore, we expected to see the greatest treatment effects in response to the amplified stimuli. Repeated hearing tests verified that there were no changes in hearing after 8 weeks and therefore all participants received the same amplified stimulus before and after training.

#### *Recording*

The [da] was presented binaurally in pink noise at +10 dB SNR. Stimuli were presented at 80 dB SPL for normal hearing listeners and amplified stimuli for participants with hearing loss were presented at an intensity of up to 90 dB SPL, with most stimuli in the range of 80–83 dB SPL. Stimuli were presented via the Bio-Logic Navigator Pro System (Natus Medical, Inc., Mundelein, IL) at a rate of 10.9 Hz (inter-stimulus interval of 52 ms) through electromagnetically shielded insert earphones (ER-3A, Etymotic Research, Elk Grove Village, IL). A vertical montage of four Ag-AgCl electrodes (Cz active, Fpz ground, earlobe references) was used with all contact impedances <5 kΩ. A criterion of ±23 µV was used for online artifact rejection. Two blocks of 3000 artifact-free sweeps were collected in each condition for each participant and averaged using an 85.3-ms window, including a 15.8-ms prestimulus period. Responses were sampled at 12 kHz and were online bandpass filtered from 100 to 2000 Hz (Butterworth filter, 12 dB/octave, zero phase-shift) to minimize disruption by the low-frequency cortical response and to sample energy up to the phase-locking limits of the brainstem (Liu et al., 2006).
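The recording parameters above can be checked, and the rejection and averaging step illustrated, with a short Python sketch. The function names are hypothetical, the three-sample sweeps are toy data, and the whole-sweep peak rule used for rejection is an assumption about how the ±23 µV criterion was applied.

```python
FS_HZ = 12_000        # sampling rate from the text
EPOCH_MS = 85.3       # averaging window from the text
PRESTIM_MS = 15.8     # prestimulus period from the text
REJECT_UV = 23.0      # artifact criterion from the text
RATE_HZ = 10.9        # presentation rate from the text


def ms_to_samples(duration_ms, fs_hz=FS_HZ):
    """Number of samples spanning a duration, rounded to the nearest sample."""
    return round(duration_ms * fs_hz / 1000)


def average_accepted(sweeps_uv, reject_uv=REJECT_UV):
    """Point-by-point average of sweeps that stay within the criterion.

    Assumption: a sweep is rejected if any sample exceeds +/- reject_uv.
    Returns the averaged waveform and the number of accepted sweeps.
    """
    accepted = [s for s in sweeps_uv if max(abs(v) for v in s) <= reject_uv]
    if not accepted:
        raise ValueError("no artifact-free sweeps")
    n = len(accepted)
    return [sum(column) / n for column in zip(*accepted)], n


# Onset-to-onset period implied by the presentation rate, in ms.
period_ms = 1000 / RATE_HZ

# Toy sweeps in microvolts (not study data); the third exceeds the criterion.
toy_sweeps = [[1.0, -2.0, 3.0], [3.0, 0.0, 1.0], [30.0, 0.0, 0.0]]
toy_average, toy_count = average_accepted(toy_sweeps)
```

Under these assumptions the 85.3-ms window corresponds to about 1024 samples at 12 kHz, and the 91.7-ms period implied by the 10.9 Hz rate is consistent with a 40-ms stimulus followed by a 52-ms inter-stimulus interval.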

The [da] was presented in alternating polarities, allowing for the creation of responses comprised of both the sum and the difference of the two polarities (Campbell et al., 2012). When creating the summed frequency following response (FFR), the non-inverting envelope component of the response is enhanced while the inverting TFS component is minimized; conversely, for the subtracted responses the inverting TFS component is enhanced while the non-inverting envelope component is substantially reduced (Aiken and Picton, 2008). While this analysis is a somewhat different operationalization of envelope and TFS representation than is used in some other studies, the selection of measures of envelope and TFS representation is driven by the model organization and site of interest (Shamma and Lorenzi, 2013), and adding/subtracting brainstem responses to alternating polarities is a typical method used in analyzing human FFRs (Aiken and Picton, 2008; Gockel et al., 2011; Anderson et al., 2013a). Nevertheless, we note that the relationship between envelope/TFS representation reflected in the FFR, perceptual measures, and auditory nerve coding remains an avenue for future research. Spectral amplitudes were calculated using fast Fourier transforms (FFTs) over 60 Hz bins around the frequencies of interest, which included the fundamental frequency (F0) and its integer harmonics. The time region chosen for this calculation was 20–42 ms, corresponding to the most periodic time region of the FFR.
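The polarity arithmetic and spectral measure described above can be sketched in Python on a synthetic signal. This is an illustration under stated assumptions, not the analysis code used in the study: the function names are hypothetical, dividing the sum and difference by two is an assumed scaling, and the band measure evaluates the Fourier amplitude on a 5 Hz grid across the 60 Hz band instead of using an FFT with a particular length.

```python
import cmath
import math


def add_and_subtract(resp_pos, resp_neg):
    """Return (added, subtracted) waveforms from the two stimulus polarities."""
    if len(resp_pos) != len(resp_neg):
        raise ValueError("responses must have the same length")
    added = [(p + q) / 2 for p, q in zip(resp_pos, resp_neg)]
    subtracted = [(p - q) / 2 for p, q in zip(resp_pos, resp_neg)]
    return added, subtracted


def amplitude_at(signal, fs_hz, freq_hz):
    """Fourier amplitude of a real signal at a single frequency (2|X(f)|/N)."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
              for k, x in enumerate(signal))
    return 2 * abs(acc) / n


def band_amplitude(signal, fs_hz, center_hz, width_hz=60.0, step_hz=5.0):
    """Mean amplitude over a band of width_hz centered on center_hz."""
    n_steps = int(round(width_hz / step_hz))
    freqs = [center_hz - width_hz / 2 + i * step_hz for i in range(n_steps + 1)]
    return sum(amplitude_at(signal, fs_hz, f) for f in freqs) / len(freqs)


# Synthetic demonstration (hypothetical values, not study data).
FS_HZ = 12_000
N_SAMPLES = 240  # 20 ms at 12 kHz, chosen so both tones complete whole cycles
times = [k / FS_HZ for k in range(N_SAMPLES)]
envelope_part = [math.sin(2 * math.pi * 100 * t) for t in times]   # does not invert
tfs_part = [0.5 * math.sin(2 * math.pi * 400 * t) for t in times]  # inverts with polarity
resp_pos = [e + f for e, f in zip(envelope_part, tfs_part)]
resp_neg = [e - f for e, f in zip(envelope_part, tfs_part)]
added, subtracted = add_and_subtract(resp_pos, resp_neg)
```

In this toy case the added waveform retains only the 100 Hz component that keeps its sign across polarities, and the subtracted waveform retains only the 400 Hz component that inverts, which mirrors the envelope/TFS separation described in the text.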

**FIGURE 3 |** […] normal hearing has higher amplitudes in the fine structure dominated higher frequencies (H3–H6) than the group with hearing loss **(D)**. <sup>∗</sup>*p* < 0.05, <sup>∗∗</sup>*p* < 0.01.

#### **STATISTICAL ANALYSIS**

A Multivariate Analysis of Variance (MANOVA) was used to assess hearing group differences in the envelope and TFS. The F0 and H2 amplitudes from the added polarities (henceforth F0ADD–H2ADD) were entered as dependent variables in the MANOVA to represent the envelope, as these lower frequency spectral peaks dominate the envelope-following FFR. The H3–H6 amplitudes from the subtracted polarities (H3SUB–H6SUB) were entered as dependent variables to represent the TFS, which is more prominent in relatively higher frequencies in the FFR. A repeated measures ANOVA was used to compare envelope-dominated (F0ADD–H2ADD) and TFS-dominated (H3SUB–H6SUB) frequency encoding, QuickSIN SNR scores, and memory and attention measures before and after training.
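For the simplest case in this design, a single measure compared before and after training within one group, the repeated-measures session effect reduces to a squared paired *t* statistic with (1, *n* − 1) degrees of freedom. The Python sketch below illustrates only that special case; it does not reproduce the MANOVA or the mixed designs with training group and hearing group as between-subject factors, and the function name and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def session_effect_f(pre, post):
    """F statistic and degrees of freedom for a two-level within-subject factor.

    Equivalent to the square of the paired t statistic on post - pre.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need paired scores from at least two participants")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat ** 2, (1, n - 1)


# Hypothetical pre/post scores for five participants (not study data).
pre_scores = [3.0, 2.5, 4.0, 3.5, 2.0]
post_scores = [2.0, 2.0, 3.0, 3.0, 1.5]
f_value, dof = session_effect_f(pre_scores, post_scores)
```

The denominator degrees of freedom equal the number of participants minus one, which is why within-group contrasts in the Results carry smaller degrees of freedom than the full-sample interactions.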

#### **RESULTS**

#### **EFFECTS OF HEARING LOSS ON ENCODING SPEECH ENVELOPE/TFS**

The combined participants with hearing loss from both groups had greater representation of the envelope (F0ADD and H2ADD) than participants with normal hearing [*F*(1, 57) = 7.218, *p* = 0.002] (**Figure 3C**), replicating a previous study demonstrating greater subcortical representation of the envelope in response to speech in noise in older adults with hearing loss (Anderson et al., 2013a). In contrast to the Anderson et al. study (2013a), however, we also found reduced representation of the TFS (H3SUB–H6SUB) in the older adults with hearing loss [*F*(1, 57) = 3.066, *p* = 0.024] (**Figure 3D**). The current study comprised 58 participants vs. the 30 participants in the previous study; therefore, the new finding of significant differences in TFS representation in the current study may be attributed to increased power and is in fact consistent with Henry and Heinz (2012), who found a reduction in TFS coding in noise in auditory nerve fibers of chinchillas with noise-induced hearing loss.

#### **TRAINING: NEUROPHYSIOLOGICAL CHANGES**

In the group with hearing loss, we found a training group × test session interaction [*F*(1, 30) = 4.351, *p* = 0.023], with a significant reduction in envelope encoding (F0ADD to H2ADD) occurring within the *auditory training* group [*F*(1, 14) = 3.843, *p* = 0.049] but not the *active control* group [*F*(1, 14) = 0.381, *p* = 0.691] (see **Figures 4A,B**). We also analyzed the pre and post data of the groups with normal hearing and found no training group × test session interaction [*F*(1, 27) = 0.803, *p* = 0.459] and no changes in either group (all *p*'s > 0.1) (see **Figures 4C,D**). Although there was no training group × hearing group × test session interaction [*F*(1, 57) = 2.239, *p* = 0.117], there was a hearing group × test session interaction in the *auditory training* group [*F*(1, 29) = 3.573, *p* = 0.043] that was not present in the *active control* group [*F*(1, 29) = 0.136, *p* = 0.874] (see **Figures 4E,F**), suggesting that the auditory training effect was specific to the participants with hearing loss. Representation of the TFS (H3SUB–H6SUB) did not change for either hearing impaired or normal hearing participants of either training group (all *p*'s > 0.1) (**Figure 5**). **Figures 6**, **7** display mean F0ADD and H2ADD and mean H3SUB–H6SUB amplitudes, respectively, for individual participants. It is evident from these figures that individuals with hearing loss have greater variability than individuals with normal hearing.

#### **TRAINING: BEHAVIORAL CHANGES**

Summary: There were significant training-induced changes in speech-in-noise perception, memory, and attention across hearing groups in the *auditory training* group. The training effects specific to hearing status varied depending on the task. The improvement in speech-in-noise performance was specific to the hearing impaired group, the memory change was found only in the normal hearing group, and attention improved in both groups. There were no corresponding changes in the *active control* group.

**FIGURE 4 | (A,B)** A comparison of pre- (dotted lines) and post-training responses (solid lines) to speech in noise to the envelope (F0–H2) in the *auditory training* (red) and *active control* (blue) groups with hearing loss, demonstrating a significant reduction in response to the envelope in the *auditory training* group. **(C,D)** No reduction was seen in response to the envelope in either group with normal hearing. **(E)** A significant hearing × session interaction was noted in the *auditory training* group, demonstrating that the change was specific to the participants with hearing loss. **(F)** A significant group × session interaction in the groups with hearing loss indicating a reduction in the representation of the F0 in the *auditory training* group only. <sup>∗</sup>*p* < 0.05. Error bars: ± 1 SE.

**FIGURE 6 | Mean F0ADD and H2ADD amplitudes are displayed for individual pre- (open circles) and post-data (closed circles) for the auditory training (red) and active control (blue) groups.** Visual observation of the data reveals that there is greater pre-training variability in both groups with hearing loss and in the degree of change in the auditory training group with hearing loss.

**FIGURE 7 |** […] data for the envelope, the data demonstrates greater variability in both groups with hearing loss for pre-test data, but there is no systematic change with training as was found for the envelope.

For speech-in-noise performance (QuickSIN), there was a significant training group × test session interaction [*F*(1, 57) = 5.191, *p* = 0.027], with improvement noted in the *auditory training* group [*F*(1, 28) = 13.394, *p* = 0.001] but not in the *active control* group [*F*(1, 28) = 1.678, *p* = 0.206]. This change was largely driven by the improvement in performance in the auditory training group with hearing loss [*F*(1, 14) = 12.220, *p* = 0.004]; the improvement in the normal-hearing group was not significant [*F*(1, 14) = 3.041, *p* = 0.105]. Neither hearing group in the *active control* group changed with training (all *p*'s > 0.1). There was a decrease in the QuickSIN of 1.22 dB in the *auditory training* group with hearing loss; given four lists, this number is below the 1.9 dB necessary for a critical difference between conditions with a 95% confidence interval (Killion et al., 2004). However, a 1 dB decrease in SNR corresponds to approximately a 10% increase in word recognition, a difference that is likely to be noticeable to the listener (Middelweerd et al., 1990). It should also be noted that the participants with hearing loss did not have more than a mild SNR loss (less than 7 dB), so greater gains might be expected from individuals with greater deficits.
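The comparison made in this paragraph can be written out as a small worked calculation. The constants are the values quoted above; the function name is hypothetical, and scaling the 10% per dB figure linearly to a 1.22 dB change is an assumption made only for illustration.

```python
CRITICAL_DIFFERENCE_DB = 1.9   # four lists, 95% confidence (Killion et al., 2004)
PERCENT_PER_DB = 10.0          # approximate slope (Middelweerd et al., 1990)


def interpret_quicksin_change(change_db):
    """Compare an SNR-loss change with the critical difference quoted above.

    Assumption: word recognition scales linearly with SNR over this range.
    """
    magnitude = abs(change_db)
    return {
        "exceeds_critical_difference": magnitude >= CRITICAL_DIFFERENCE_DB,
        "approx_word_recognition_gain_pct": magnitude * PERCENT_PER_DB,
    }


# Reported change in the auditory training group with hearing loss.
reported = interpret_quicksin_change(-1.22)
```

For the reported 1.22 dB change, the result falls short of the 1.9 dB critical difference while corresponding to a gain of roughly 12% in word recognition under the linear assumption.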

Similarly, for short-term memory there was a significant training group × test session interaction [*F*(1, 57) = 6.042, *p* = 0.017], with improvements in the *auditory training* group [*F*(1, 28) = 9.800, *p* = 0.004] but not the *active control* group [*F*(1, 28) = 0.158, *p* = 0.694]. For memory, however, the changes were only significant in the group with normal hearing [*F*(1, 28) = 7.648, *p* = 0.016] but not in the group with hearing loss [*F*(1, 28) = 2.630, *p* = 0.127] and neither hearing group in the *active control* group changed (all *p*'s > 0.1).

Finally, for attention, there was a significant training group × test session interaction [*F*(1, 50) = 3.765, *p* = 0.043], with improvements in the *auditory training* group [*F*(1, 25) = 17.941, *p* < 0.001] but not the *active control* group [*F*(1, 24) = 0.623, *p* = 0.438]. In this case, there were significant improvements for both the subgroups with normal hearing [*F*(1, 11) = 8.182, *p* = 0.016] and with hearing loss of the *auditory training* group [*F*(1, 12) = 9.339, *p* = 0.009]. Again, there were no changes for members of either hearing subgroup of the *active control* group (all *p*'s > 0.1). See **Figure 8** for interaction plots of behavioral changes. Please refer to **Table 2** for means and standard deviations of pre- and post-training changes in behavioral measures for the normal hearing and hearing impaired participants of the *auditory training* and *active control* groups.

## **DISCUSSION**

Here we show that the imbalance in neurophysiological processing of speech cues associated with hearing loss is malleable and reversible with training in older adults. The results are summarized as follows: first, compared to normal hearing individuals, older adults with hearing loss have excessively large envelope encoding of speech in noise. Second, training that targets the CV transition reduces envelope representation in individuals with hearing loss to levels in line with those of normal-hearing individuals. Finally, the training-induced changes in neurophysiology are accompanied by gains in speech-in-noise perception, attention, and short-term memory, although it should be noted that the change in speech-in-noise perception was modest, and would not be considered clinically significant (Killion et al., 2004). Nevertheless, we note that an improvement of 1 dB SNR corresponds to an increase of approximately 10% in word intelligibility in noise (Middelweerd et al., 1990), so the changes may be perceived as beneficial to the listener.

Our results confirm previous findings of exaggerated representation of envelope cues in animal and human models of sensorineural hearing loss (Kale and Heinz, 2010; Henry and Heinz, 2012; Anderson et al., 2013a), providing a possible explanation for the observation that the hearing impaired listener perceives speech as loud but unclear (Jin and Nelson, 2010). At present, the mechanisms underlying this exaggerated representation in auditory brainstem are unknown; however, evidence from auditory nerve suggests a peripheral etiology arising from reduced outer hair cell compression in cases of mild to moderate hearing loss and from inner hair cell damage (steeper input-output functions) in cases of moderate to severe hearing loss (Kale and Heinz, 2010). The fact that envelope coding changed with training suggests that there may also be a top-down central gain effect resulting from auditory deprivation (Munro and Blount, 2009). However, there was no change in TFS coding in the training group. The effects of hearing loss on envelope and TFS representation are a function of complex interactions among cochlear function, stimulus presentation level, and SNR (Henry and Heinz, 2012). One possibility is that a change in TFS coding was too subtle to be observed at the single SNR we used. Another important consideration is the nature of the training: all of the training stimuli were presented in quiet, not in noise. It is possible that this presentation technique favored envelope coding, which is maladaptively enhanced even in quiet in listeners with cochlear hearing loss (Anderson et al., 2013a). Further investigation to establish the effects of hearing loss at different presentation levels and SNRs is warranted, as a better understanding of the mechanisms underlying abnormal stimulus encoding will help guide future treatment efforts.

**Table 2 | Means and** *SD***s are provided for pre- and post-test scores for speech-in-noise perception, auditory short-term memory and attention for the** *Auditory Training* **and** *Active Control* **groups, including the subgroups of participants with normal hearing and with hearing loss.**

The results of our study contrast with those of two previous studies that found training-induced increases in envelope coding in normal-hearing young adults using different types of training: pitch discrimination training (Carcagno and Plack, 2011) and recognition of speech presented in babble and other challenging conditions (Song et al., 2012). The key difference between these studies and ours is that our study population included individuals with hearing impairment. Given the differences between normal-hearing and hearing-impaired listeners in performance on perceptual tasks, neurophysiological encoding of sound, and reliance on cognitive mechanisms for speech intelligibility (Lorenzi et al., 2006; Anderson et al., 2013a; Humes et al., 2013), it is not necessarily surprising that there are different effects of auditory training. Another important difference is the training itself. The training in our study directed attention to fast-changing sounds in high memory load situations and occurred in quiet. It may be that training on recognition of speech in noise, such as in the Song et al. study, produces more robust gains in speech-in-noise perception and different outcomes when comparing pre and post envelope and TFS coding. As another example, FFR neural representation of the F0 is correlated with behavioral pitch discrimination (Krishnan et al., 2010; Marmel et al., 2013); therefore, enhanced neural representation of cues contributing to the perception of pitch may underlie the training-induced gains in pitch discrimination found by Carcagno and Plack (2011). Thus, outcomes may vary depending on the type of training and the targeted population.

We propose, therefore, that the training effects in this study were influenced by neural mechanisms specific to older adults with hearing loss. Both aging (Turner et al., 2005; Schatteman et al., 2008) and hearing loss (Vale and Sanes, 2002; Dong et al., 2009) appear to cause an imbalance in excitatory and inhibitory function, likely affecting stimulus encoding. However, this imbalance is at least partially reversed in animal models of auditory training (de Villers-Sidani et al., 2010) and acoustic experience (Turner et al., 2013). Although we are unable to verify these effects in humans, we speculate that the training-induced changes were facilitated by alterations in the balance of neurotransmitter levels to allow for precise encoding of subtle CV differences. That said, just as it is difficult to disambiguate between peripheral (i.e., outer hair cell loss and peripheral neuropathy) vs. central mechanisms of hearing loss, it is difficult to say with certainty which mechanisms were targeted by training.

Our training was designed to strengthen sensory function through attention to meaningful sound; specifically, focusing attention on CV transitions in speech, which occurred in five of the six training modules, may drive top-down modulation. Animal models have demonstrated that directed attention to behaviorally-relevant stimuli is necessary for neural and behavioral plasticity (Fritz et al., 2005, 2007). In our study, we found that neural response changes were accompanied by improvements in attention. A functional connection between prefrontal and auditory cortices provides a basis for efferent activation during difficult tasks that require focused attention (Raizada and Poldrack, 2007). A tentative connection has also been suggested between prefrontal cortex and auditory brainstem based on fMRI studies (Roelfsema et al., 2010). Although our methodology does not allow us to draw similar conclusions, we propose that rapid TFS cues may become more salient when sensory and cognitive demands drive a prefrontal-brainstem connection to adjust subcortical encoding of these cues.

A number of questions remain unanswered. We did not include anyone in our study who wore hearing aids, because hearing aids themselves may induce plasticity in auditory processing (Munro et al., 2007; Munro and Merrett, 2013). Therefore, the use of amplification itself can be considered a form of training. Hornickel et al. (2012) demonstrated that assistive listening devices engender improved trial-to-trial consistency in brainstem firing, presumably by directing attention to a more robust and noise-free representation of the stimulus. Future work should compare the neural effects of amplification alone vs. amplification plus training. In addition, more work should be done to determine the persistence of training effects and the time course of learning. A number of approaches were employed in our training protocol, involving strictly bottom-up discrimination training and combined memory and perceptual training exercises. It would be important to identify the aspects of training that were primarily responsible for engendering plasticity. We note that there is a great deal of individual variability in the participants with hearing loss, both in terms of pre-training spectral amplitudes and in the degree of change. In future work, it will be important to identify sources of variability, and the factors that contribute to success in individuals.

Finally, although neither envelope nor TFS representation changed in the normal-hearing group, this group did experience changes in attention and short-term memory. Therefore, we assume the normal-hearing participants experienced training-induced biological changes that were not observed using the current envelope/TFS technique. The lack of observed improvement in memory in the group with hearing loss may be due to the auditory nature of our assessment measure. Although the presentation level was adjusted to a comfortable volume for each participant, it may be that peripheral hearing loss limited performance, even in the post-training session, given the known effects of hearing loss on verbal short-term memory (McCoy et al., 2005; Verhaegen et al., 2013). The speech-in-noise perception scores (QuickSIN) of the normal-hearing group would be considered clinically normal (less than 3 dB SNR loss; Killion et al., 2004), and so the lack of improvement in this group probably stems from a ceiling effect. Further investigation is needed to determine the mechanisms of training-induced improvements in individuals with normal hearing.

These results have implications for management of hearing loss problems in older adults. Although provision of audibility is a necessary foundation for the treatment of hearing difficulties, we assert that auditory training is just as important. The combination of amplification and auditory training may lead to improved speech clarity in noise without distortion from overamplification.

#### **ACKNOWLEDGMENTS**

We thank Trent Nicol, Karen Chan, and Alexandra Parbery-Clark for their critical reviews of the manuscript. Supported by the NIH (R01 DC010016) and the Knowles Hearing Center.

## **REFERENCES**


after unilateral partial hearing loss. *Neuroscience* 159, 1164–1174. doi: 10.1016/j.neuroscience.2009.01.043


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 02 August 2013; accepted: 11 November 2013; published online: 28 November 2013.*

*Citation: Anderson S, White-Schwoch T, Choi HJ and Kraus N (2013) Training changes processing of speech cues in older adults with hearing loss. Front. Syst. Neurosci. 7:97. doi: 10.3389/fnsys.2013.00097*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2013 Anderson, White-Schwoch, Choi and Kraus. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Lexico-semantic and acoustic-phonetic processes in the perception of noise-vocoded speech: implications for cochlear implantation

#### *Carolyn McGettigan<sup>1,2</sup>\*, Stuart Rosen<sup>3</sup> and Sophie K. Scott<sup>2</sup>\**

*<sup>1</sup> Department of Psychology, Royal Holloway, University of London, Egham, UK*

*<sup>2</sup> Institute of Cognitive Neuroscience, University College London, London, UK*

*<sup>3</sup> Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK*

#### *Edited by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

#### *Reviewed by:*

*Alexis Georges Hervais-Adelman, University of Geneva, Switzerland Kristin Van Engen, Washington University in St. Louis, USA*

#### *\*Correspondence:*

*Carolyn McGettigan, Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham TW20 0EX, UK e-mail: Carolyn.McGettigan@rhul.ac.uk;*

*Sophie K. Scott, Institute of Cognitive Neuroscience, University College London, London WC1N 3AR, UK*

*e-mail: sophie.scott@ucl.ac.uk*

Noise-vocoding is a transformation which, when applied to speech, severely reduces spectral resolution and eliminates periodicity, yielding a stimulus that sounds "like a harsh whisper" (Scott et al., 2000, p. 2401). This process simulates a cochlear implant, where the activity of many thousand hair cells in the inner ear is replaced by direct stimulation of the auditory nerve by a small number of tonotopically-arranged electrodes. Although a cochlear implant offers a powerful means of restoring some degree of hearing to profoundly deaf individuals, the outcomes for spoken communication are highly variable (Moore and Shannon, 2009). Some variability may arise from differences in peripheral representation (e.g., the degree of residual nerve survival) but some may reflect differences in higher-order linguistic processing. In order to explore this possibility, we used noise-vocoding to explore speech recognition and perceptual learning in normal-hearing listeners tested across several levels of the linguistic hierarchy: segments (consonants and vowels), single words, and sentences. Listeners improved significantly on all tasks across two test sessions. In the first session, individual differences analyses revealed two independently varying sources of variability: one lexico-semantic in nature and implicating the recognition of words and sentences, and the other an acoustic-phonetic factor associated with words and segments. However, consequent to learning, by the second session there was a more uniform covariance pattern concerning all stimulus types. A further analysis of phonetic feature recognition allowed greater insight into learning-related changes in perception and showed that, surprisingly, participants did not make full use of cues that were preserved in the stimuli (e.g., vowel duration). 
We discuss these findings in relation to cochlear implantation, and suggest auditory training strategies to maximize speech recognition performance in the absence of typical cues.

#### **Keywords: speech perception, individual differences, cochlear implants**

## **INTRODUCTION**

A cochlear implant (CI) is a hearing aid that converts acoustic sound energy into electrical stimuli to be transmitted to the auditory nerve, via an array of electrodes arranged tonotopically along the basilar membrane of the inner ear (Rubinstein, 2004). Although the implant restores some degree of hearing to profoundly deaf individuals, the substitution of thousands of inner hair cells with, at most, tens of electrodes means that the transmitted signal is greatly impoverished in spectral detail. CI devices give a weak sense of voice pitch and transmit fewer discriminable steps in amplitude, and there is often a mismatch between frequencies being transmitted by the individual electrodes and those best received at the position of contact on the basilar membrane. Thus, particularly for post-lingual recipients of an implant (i.e., those who lost their hearing after the acquisition of language), the listener must learn to make sense of an altered and unfamiliar sound world. This process of adaptation and perceptual learning after cochlear implantation can take a long time, with widely varying levels of success (Pisoni, 2000; Sarant et al., 2001; Munson et al., 2003; Skinner, 2003). Much research relating to implantation has therefore been concerned with identifying predictive markers of success, and appropriate training regimes to optimize post-implantation outcomes.

A growing body of studies has employed acoustic simulations of CIs to model post-implantation adaptation in normal-hearing participants. Vocoding is an acoustic transformation that produces speech with degraded spectral detail by replacing the original wideband speech signal with a variable number of amplitude-modulated carriers, such as noise bands (noise-vocoding) or sine waves (tone vocoding). Here, the carriers simulate the electrodes of the CI to create a re-synthesized speech signal that is spectrally impoverished, yet maintains relatively intact amplitude envelope cues (Shannon et al., 1995). Increasing the number of bands (or channels) increases the spectral resolution, with a concomitant increase in the intelligibility of the transformed sounds. As with the studies in CI recipients, it has been shown that normal-hearing participants also exhibit considerable individual variability in performance with CI simulations (Nogaki et al., 2007; Stacey and Summerfield, 2007; Eisner et al., 2010).

An important issue for clinicians testing and training CI recipients is the selection of appropriate materials (Loebach and Pisoni, 2008; Loebach et al., 2008, 2009). The main day-to-day context for spoken communication is spontaneous face-to-face conversation, so it is natural to consider training paradigms similar to that situation. For example, connected discourse tracking (CDT), using face-to-face repetition of a story told by an experimenter, has been shown to yield significant improvements in the recognition of severely degraded speech in hearing participants (Rosen et al., 1999). However, delivery of this kind of training is very labor-intensive (though a recent study comparing live CDT with a computer-based approach showed equivalent training benefits from the two training routines; Faulkner et al., 2012). Thus, training and assessment routines typically involve the recognition of laboratory recordings of materials such as sentences, words and simple syllables. There is some evidence that improvements with one kind of test material can generalize to another. For example, Loebach and Pisoni (2008) found that training participants with exposure and feedback on either words, sentences or environmental sounds gave improvements in performance that generalized to the other tasks. In a very small group of three CI recipients, Fu et al. (2005) found that training with CV and CVC syllables (where "C" stands for Consonant, and "V" for Vowel) led to improved test performance on sentence recognition. However, the picture is not straightforward: Loebach and Pisoni (2008) found that generalization was most effective between materials of the same class (e.g., words to words, sentences to sentences), and that training on speech materials did not afford any improvements in recognition of environmental sounds. 
This may depend on the nature of the vocoding transformation, as this severely degrades spectral detail important for recognizing some environmental sounds—similarly, training on vocoded sentence materials gave poor generalization to the recognition of talkers (Loebach et al., 2009).

Imbalances in learning transfer may also be affected by the behavior of the listener. Loebach et al. (2008) found that training on talker identification afforded greater generalization to vocoded sentence transcription than training on talker gender identification. The authors suggest this is because the more difficult task of talker identification led to greater attentional engagement of the listeners with the acoustic properties of vocoded speech. Similarly, Loebach et al. (2010) and Davis et al. (2005) found that training participants with semantically anomalous sentences was just as effective as training with meaningful sentences, while Hervais-Adelman et al. (2008) found equivalent improvements in performance after training with vocoded nonwords as observed with real-word training. Loebach et al. (2010) suggest this is because the absence of semantic cues engages a more analytic listening mode in the listener, where attention is directed to the acoustic-phonetic aspects of the signal rather than "synthetic," higher-order processes focused on linguistic comprehension. Although they acknowledge that sentence and word materials afford greater ecological validity (Loebach et al., 2010), Loebach and colleagues suggest that encouraging analytic, acoustic-phonetic, listening can afford better generalization of learning across a range of materials.

Another way to view the data from these training studies is that there are potentially both "analytic" and "synthetic" factors at play when listeners adapt to degraded speech input. However, the extent to which these operate independently is not known, and it may be that some listeners would stand to benefit more from training on higher-order processes for employment in the recognition of ecologically valid, linguistic materials encountered in day-to-day life. A study by Grant and Seitz (2000) showed that use of "top–down" contextual information (what Loebach and colleagues would consider a "synthetic" process) varies across individuals. They presented 34 hearing-impaired listeners with filtered sentences from the IEEE corpus (e.g., "Glue the sheet to the dark blue background"; IEEE, 1969), and their constituent keywords in isolation, at three different intelligibility levels. Using Boothroyd and Nittrouer's (1988) equation explaining the relationship between word recognition in sentences and in isolation, Grant and Seitz (2000) calculated individual k-factor scores, which represent the listener's ability to use semantic and morpho-syntactic information in the sentence to identify the words within it, and observed considerable variability in this parameter across their listening population. In further support of multiple factors underlying speech recognition, Surprenant and Watson's (2001) large-scale study of individual variability in speech-in-noise recognition indicated that performance is far from identical across different linguistic levels: Pearson's correlation coefficients between speech-in-noise recognition of CV-units, words and sentences and a clear-speech syllable identification task ranged from only 0.25 to 0.47 in their experiment. Therefore, extracting patterns of covariance, and measuring how these change with learning, could offer additional insight into the underlying perceptual processes supporting adaptation to a CI or simulation. 
For example, close correlation of speech recognition at segment, word and sentence level may indicate a unified "analytic" strategy, whereas statistical independence of sentence stimuli from words and segments could reflect considerable importance for top–down syntactic and semantic processing strategies in recognizing vocoded sentences.
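The k-factor mentioned above can be illustrated with a short calculation. This is a minimal sketch, assuming the commonly cited form of Boothroyd and Nittrouer's relation, p_sentence = 1 − (1 − p_isolated)^k; the function name and the example proportions are hypothetical and are not taken from Grant and Seitz (2000).

```python
import math

def k_factor(p_isolated: float, p_in_sentence: float) -> float:
    """Solve p_in_sentence = 1 - (1 - p_isolated) ** k for k (assumed relation)."""
    if not (0.0 < p_isolated < 1.0 and 0.0 < p_in_sentence < 1.0):
        raise ValueError("proportions must lie strictly between 0 and 1")
    return math.log(1.0 - p_in_sentence) / math.log(1.0 - p_isolated)

# Hypothetical listener: 40% of keywords correct in isolation, 64% in sentences.
k = k_factor(0.40, 0.64)

# A listener who gains nothing from sentence context would have k equal to 1.
k_no_context = k_factor(0.50, 0.50)
```

Under this relation, larger values of k indicate greater benefit from sentence context, which is the sense in which the parameter is described as varying across listeners.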

In the current experiment, we tested a group of normal-hearing adults in the transcription of noise-vocoded sentences, words and segments at a range of difficulty levels (operationalized in terms of the number of vocoding bands), with the following objectives:


Davis et al. (2005) found that transcription of vocoded sentences improved significantly within 30 items of exposure, in the absence of feedback. Therefore, to minimize design complexity and optimize the exploitation of individual differences analyses, learning in the current experiment was operationalized as the improvement in performance from Session 1 to Session 2, without involvement of explicit training procedures.

• *To quantify the efficiency of "analytic" listening*—Loebach et al. (2010) suggested that failure to attend to critical acoustic properties of vocoded stimuli may limit the transfer of learning. Our final objective was to use information transfer (IT) analyses of consonant and vowel perception to quantify the reception of acoustic-phonetic features, and to directly assess the degree to which the acoustic cues present in the stimulus are being used by untrained listeners.

Previous work on the learning of vocoded speech has tended to train and test participants at a fixed level of degradation (i.e., number of bands; Davis et al., 2005; Hervais-Adelman et al., 2008; Loebach and Pisoni, 2008; Eisner et al., 2010). Given the considerable variation in individual performance with vocoded stimuli, this runs the risk of floor or ceiling effects in the data. For the current experiment, we adopted an approach used by Shannon et al. (2004), who tested across a range of difficulty levels (numbers of channels) and fitted logistic functions to describe performance on a range of noise-vocoded speech recognition tasks. In the current experiment, curves were fitted to the recognition data for each participant, by task and by session—estimates of 50% thresholds, representing performance across a range of difficulty, could then be extracted for use in further analysis of learning effects and covariation across tasks and time.

### **METHOD**

#### **PARTICIPANTS**

Participants were 28 monolingual speakers of British English (12 male), with no language or hearing problems. The participants were recruited from the UCL Department of Psychology Subject Pool using an age inclusion criterion of 18–40 years old (individual date of birth information was not collected). All participants were naïve to noise-vocoded speech.

#### **MATERIALS**

Listeners were tested on perception of 5 different stimulus types, all vocoded with 1, 2, 4, 8, 16, and 32 bands (where 1 is most degraded, and 32 the most intelligible). The items were also available in undistorted form. All materials were recorded by a female speaker of Standard Southern British English in a soundproof, anechoic chamber. Recordings were made on a Digital Audio Tape recorder (Sony 60ES) and fed to the S/PDIF digital input of an M-Audio Delta 66 PC soundcard. The files were then downsampled to 44100 Hz as mono .wav files with 16-bit resolution using Cool Edit 96 software (Syntrillium Software Corporation, USA). The vocoding algorithm followed the general scheme described by Shannon et al. (1995), with analysis and output filters between 100–5000 Hz and envelope extraction via half-wave rectification and low-pass filtering at 400 Hz.
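The envelope-extraction step described above can be sketched as follows. This covers only that one step for a single band, not the full vocoder: the text does not state the filter design, so the one-pole low-pass filter used here is an assumption, and the function name and test tone are hypothetical.

```python
import math

def extract_envelope(samples, sample_rate_hz, cutoff_hz=400.0):
    """Half-wave rectify, then smooth with a one-pole low-pass filter."""
    # Smoothing coefficient of a first-order low-pass (an assumed filter design).
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    envelope, state = [], 0.0
    for s in samples:
        rectified = s if s > 0.0 else 0.0   # half-wave rectification
        state += a * (rectified - state)     # low-pass smoothing
        envelope.append(state)
    return envelope

# Hypothetical input: 100 ms of a unit-amplitude 1 kHz tone sampled at 44100 Hz.
fs = 44100
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(4410)]
env = extract_envelope(tone, fs)

# Average envelope level once the filter has settled (second half of the signal).
steady = env[2205:]
mean_env = sum(steady) / len(steady)
```

In a noise vocoder, an envelope of this kind would then modulate a noise carrier filtered into the same band; that stage is omitted here.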

The stimulus sets were as follows:


#### **DESIGN AND PROCEDURE**

Twenty-seven listeners made two visits to the lab, separated by 7–15 days (*M* = 10.44 days, *SD* = 2.69), while the twenty-eighth participant could only return after 78 days. All stimulus presentation routines were programmed and run in MATLAB v7.1 (The Mathworks, Inc., Natick, MA).


In each session, the tasks were administered in the order: BKB sentences, IEEE sentences, words, consonants, vowels. All test materials were presented over Sennheiser HD25-SP headphones in a quiet room, at a fixed volume setting using QuickMix (Version 1.06; Product Technology Partners, Cambridge, UK). The sentences and words tasks were open-set recognition tasks. Each stimulus was played once and the participant gave a typed report of the item content. Responses were self-timed. Listeners were encouraged to type as much as possible of what they heard and were told that partial answers were acceptable, but also that it was fine to leave a blank response bar if the item was completely unintelligible. The consonants and vowels tasks each adopted a 17-alternative forced-choice paradigm. The response choices were presented on a printed sheet which remained in view for the duration of the task. In these two tasks, listeners were encouraged not to leave any gaps, even when they were completely unsure of the answer.

#### **ANALYSIS**

For each participant, performance on the tasks was scored as the proportion of keywords/items correct at each distortion level. For the sentences, a scoring system was adopted in which deviations in tense and number agreement on nouns (e.g., if the participant reported "men" when the actual keyword was "man") and verbs (e.g., if the participant reported "carries" or "carried" when the correct word was "carry") were allowed. The reasoning behind this approach was to allow for errors that may have resulted from the participant's attempts to report a grammatically correct sentence for each item. For example, if the participant hears the first keyword in "the cup hangs on a hook" as "cups," then he/she may choose to report "hang" as the second keyword, in order to maintain number agreement. For both the Sentences and Words, typographic errors that produced homophones of the target word (e.g., "bare" and "bear") were permitted.

#### *Psychometric performance curves*

Logistic curve-fitting was performed on group data (by task and session), and on each individual data set (by participant, task, and session) using the psignifit software package (Wichmann and Hill, 2001a,b). For superior fits, the distortion levels (number of bands) were converted into their log10 equivalents (as used by Shannon et al., 2004). Data from undistorted stimuli were not included. The equation used for fitting is shown in **Figure 1**.

$$f(x; \alpha, \beta, \gamma, \lambda) = \gamma + \frac{1-\gamma-\lambda}{1+e^{-(x-\alpha)/\beta}}$$

**FIGURE 1 | Equation used to estimate psychometric functions describing the relationship between number of bands and speech intelligibility.** α, alpha; β, beta; γ, gamma; λ, lambda. "x" in this study was the log of the number of channels in the noise vocoder.

In the output of the fitting procedure, the alpha parameter corresponds to the number of bands giving 50% of maximum performance, and was extracted from each fitted curve for use in subsequent analyses. Lower alpha values indicate better performance. Beta is inversely proportional to the curve steepness. The parameter gamma corresponds to the base rate of performance (or "guessing rate"), while lambda reflects the "lapse rate," i.e., a lowering of the upper asymptote to allow for errors unrelated to the stimulus level. The software takes a constrained maximum-likelihood approach to fitting, where all four variables are free to vary, but where, in this case, gamma and lambda are constrained between 0.00 and 0.05. For the forced-choice tasks (Consonants and Vowels), the gamma parameter was set to 1/17.
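The roles of the four parameters can be made concrete with a small sketch. It assumes the standard logistic form in which alpha is the midpoint and beta scales inversely with steepness, as the parameter descriptions above imply. The parameter values are hypothetical, and this is not the psignifit fitting routine itself, only an evaluation of the function.

```python
import math

def psychometric(x, alpha, beta, gamma=0.0, lam=0.0):
    """Logistic psychometric function with guess rate gamma and lapse rate lam."""
    core = 1.0 / (1.0 + math.exp(-(x - alpha) / beta))
    return gamma + (1.0 - gamma - lam) * core

# Distortion levels are converted to log10 units before fitting.
bands = [1, 2, 4, 8, 16, 32]
log_bands = [math.log10(b) for b in bands]

# Hypothetical fitted parameters for one listener on a forced-choice task.
alpha, beta, gamma, lam = math.log10(6.0), 0.15, 1.0 / 17.0, 0.02
predicted = [psychometric(x, alpha, beta, gamma, lam) for x in log_bands]

# alpha is on the log10 scale, so convert back to a number of bands.
threshold_bands = 10.0 ** alpha

# At x = alpha the curve sits halfway between the lower and upper asymptotes.
midpoint = psychometric(alpha, alpha, beta, gamma, lam)
```

With a non-zero gamma and lambda, the value at alpha is the midpoint of the attainable range rather than exactly 0.5, which is why the text describes it as 50% of maximum performance.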

#### *Information transfer analyses*

The forced-choice nature of the consonant and vowel tasks means that the data could be arranged into confusion matrices for use in an Information Transfer (IT) analysis (e.g., Miller and Nicely, 1955). IT analysis makes use of confusions (e.g., /b/ is mistaken for /d/) in speech identification tasks to measure the extent to which phonetic features (e.g., place of articulation, presence/absence of voicing) in the stimuli are transmitted accurately to the listener. The data are typically quantified in terms of the proportion or percentage of available bits of information in the stimuli that are accurately received by the listener. If no confusions are made in the participant's identification of a certain feature (e.g., vowel length), the IT score would be 1 or 100%, and correspondingly, if the participant's responses do not vary lawfully with the actual feature value, the score would be 0.
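The calculation behind an IT score can be sketched as the mutual information between stimulus and response feature values, divided by the information available in the stimulus. This is a minimal sketch in the spirit of Miller and Nicely (1955), not the FIX package's implementation; the function name, the four-consonant feature matrix, and the confusion counts are all hypothetical.

```python
import math
from collections import defaultdict

def relative_information_transfer(confusions, feature_of):
    """Proportion of feature information transmitted, from stimulus-response counts.

    confusions: dict mapping (stimulus, response) -> count
    feature_of: dict mapping each item label -> its feature value
    """
    # Collapse the item-level confusion matrix onto feature values.
    joint = defaultdict(float)
    for (stim, resp), count in confusions.items():
        joint[(feature_of[stim], feature_of[resp])] += count
    total = sum(joint.values())
    p_in, p_out = defaultdict(float), defaultdict(float)
    for (fi, fo), count in joint.items():
        p_in[fi] += count / total
        p_out[fo] += count / total
    # Mutual information between input and output feature values, in bits.
    transmitted = sum(
        (c / total) * math.log2((c / total) / (p_in[fi] * p_out[fo]))
        for (fi, fo), c in joint.items() if c > 0
    )
    input_entropy = -sum(p * math.log2(p) for p in p_in.values() if p > 0)
    return transmitted / input_entropy

voicing = {"b": "+", "d": "+", "p": "-", "t": "-"}   # toy feature matrix
# Hypothetical counts: place confusions only, voicing always preserved.
counts = {("b", "b"): 6, ("b", "d"): 4, ("d", "d"): 7, ("d", "b"): 3,
          ("p", "p"): 5, ("p", "t"): 5, ("t", "t"): 8, ("t", "p"): 2}
voicing_it = relative_information_transfer(counts, voicing)

# Responses unrelated to the stimulus: every response equally likely.
chance = {(s, r): 1 for s in voicing for r in voicing}
chance_it = relative_information_transfer(chance, voicing)
```

In the toy data the listener often confuses place of articulation but never voicing, so the voicing score is at ceiling even though item-level accuracy is well below 100%.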

Unfortunately, as the participants' responses were made by typing the answers, rather than by selecting onscreen response options, some participants in the current experiment deviated from the forced-choice response constraints. This could take the form of omitted responses (which often occurred at particularly difficult distortion levels) or responses from outside the closed list. As a consequence, all data sets that included any omissions or deviations from the forced-choice options were not included in the IT analysis.

*Consonants.* A total of 14 data sets were entered into the IT analysis for consonant recognition. The feature matrix used included voicing, place and manner, and is shown in **Table 1**.

Unconditional IT feature analyses were run within the FIX analysis package (Feature Information XFer, University College London, UK; http://www.phon.ucl.ac.uk/resource/software.html). The amount of information transferred for Voicing, Place, and Manner (as a proportion of the amount input for each of the features) was recorded for (i) group confusion matrices constructed at 1, 2, 4, 8, 16, and 32 bands, for Session 1 and Session 2 separately and (ii) for individual confusion matrices collapsed across 1–32 bands, for Session 1 and Session 2 separately. For the particular set of consonants used, there were 0.937 bits of information available for Voicing, 2.542 bits for Place of articulation and 2.095 bits for Manner of articulation.

*For Voicing, the '*+*' and '*−*' signs correspond to present and absent voicing, respectively. For Manner, plos, plosive; fric, fricative; aff, affricate; app, approximant; nas, nasal. For Place, bil, bilabial; alv, alveolar; lad, labiodental; paa, postalveolar; vel, velar; lav, labialized velar; pal, palatal.*

*Vowels.* A total of 14 data sets were entered into the IT analysis for vowel recognition. The feature matrix used included vowel height, backness, roundedness, length, and whether the vowel was a monophthong or diphthong (**Table 2**).

IT feature analyses were run, using the FIX analysis package, for (i) group confusion matrices constructed at 1, 2, 4, 8, 16, and 32 bands, for Session 1 and Session 2 separately and (ii) for individual confusion matrices collapsed across 1–32 bands, for Session 1 and Session 2 separately. In all analyses, there were 3.264 bits of information available for vowel Height, 2.816 bits for Backness, 1.452 bits for Roundedness, 0.874 bits for Length and 0.937 bits for Mono- vs. Diphthong status.
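The "bits available" for a feature correspond to the Shannon entropy of that feature's values across the stimulus set. The sketch below assumes each of the 17 items is presented equally often, and the particular splits are assumptions chosen for illustration; the actual feature assignments are given in the feature tables. Under those assumptions, an 11-to-6 split of a binary feature gives about 0.937 bits and a 12-to-5 split about 0.874 bits.

```python
import math
from collections import Counter

def feature_entropy_bits(feature_values):
    """Shannon entropy (bits) of a feature over equally likely stimuli."""
    n = len(feature_values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(feature_values).values())

# Assumed splits across 17 items, for illustration only.
split_11_6 = ["y"] * 11 + ["n"] * 6
split_12_5 = ["s"] * 12 + ["l"] * 5
bits_11_6 = feature_entropy_bits(split_11_6)
bits_12_5 = feature_entropy_bits(split_12_5)

# A binary feature can carry at most 1 bit, reached with an even split.
bits_even = feature_entropy_bits(["y"] * 8 + ["n"] * 8)
```

Features with more categories, such as Height or Place, can carry more than 1 bit, which is why IT scores are reported as a proportion of the information available for each feature.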

All other reported statistical analyses were carried out in SPSS (version 19; IBM, Armonk, NY).

## **RESULTS**

This section falls into two parts. In the first, psychometric performance functions are fitted to each individual's performance, and individual differences analyses of curve position used to characterize group performance across and within the two sessions. The second part uses IT analyses to explore the perception of consonants and vowels and relate this to recognition of sentences and words.

#### **MEASURING PROFILES OF LEARNING AND COVARIANCE**

**Figure 2** shows a plot of the group performance functions for the open-set [2(a)] and closed-set [forced-choice; 2(b)] tasks in each session.

For analysis, the alpha scores generated in the curve fitting procedure were operationalized as the Threshold Number of Bands (TNB) for each participant, in each task, in each session. **Figure 3** shows that there was an overall decrease in TNBs on the five tasks between Session 1 and 2. A repeated-measures ANOVA was run on the TNBs, with Session and Task as within-subjects factors. A between-subjects factor, Version (which coded the order of presentation of the item sets), was also included. There was a significant effect of Session [*F*(1, 26) = 35.094, *p* < 0.001], a significant effect of Task [*F*(4, 104) = 117.18, *p* < 0.001], and a non-significant interaction of these two factors (*F* < 1), indicating that the degree of improvement was not significantly different across tasks.

**FIGURE 2 | Logistic curves describing group performance on the speech recognition tasks for (A) open-set tasks (sentences and words) and (B) consonants and vowels.** Error bars show 95% confidence limits around α.


*For Height, o, open; no, near-open; om, open-mid; m, mid; cm, close-mid; nc, near-close; c, close. For Backness, b, back; nb, near-back; c, central; nf, near-front; f, front. For Roundedness, y, rounded and n, unrounded. For Length, s, short and l, long. For Diphthong, y, diphthong and n, monophthong. Dashes indicate the separation of the diphthong descriptions into monophthongal elements, in temporal order.*

The between-subjects effect of Version was non-significant (*F* < 1), as were the two-way interactions of Version with Session [*F*(1, 26) = 1.33, *p* = 0.260] and Version with Task [*F*(4, 104) = 1.16, *p* = 0.333]. There was, however, a significant three-way interaction of Version, Session and Task [*F*(4, 104) = 5.57, *p* < 0.001]; while most conditions across both versions showed a mean improvement from Session 1 to Session 2, Version A participants showed a trend in the opposite direction on the IEEE task, while the Version B participants showed a very small decrease in mean performance on the Words task from Session 1 to Session 2.

There was evidence of several significant relationships across tasks for the TNB scores. **Table 3A** shows the one-tailed Pearson's correlation matrix for TNB scores in Session 1. These show significant (and marginally significant) correlations between the two sentence tasks, and between the consonants and vowels tasks, while the words correlated reasonably well with all other tasks.

A common factor analysis was run on the threshold data, with maximum likelihood extraction and varimax rotation. The rotated factor matrix is shown in **Table 4A**, for those factors producing eigenvalues above 1. Two components were extracted. In the rotated matrix, the first component accounted for 22.6% of the variance, while the second component accounted for 19.2%. The pattern of correlations for TNB scores in Session 2 no longer fitted the processing framework suggested by the Session 1 data (see **Table 3B**), with the Words task now somewhat separate from the others. A common factor analysis was run on the data as for the Session 1 scores. This converged on two components—see **Table 4B**. In this analysis, Factor 1 accounted for 24.4% of the variance, where Factor 2 accounted for a further 20.4%.

#### **EXPLORING ANALYTIC LISTENING USING IT ANALYSIS**

#### *Consonants*

The results of the pooled group analysis are plotted in **Figure 4**, showing the proportion of Information transferred for each feature, across each difficulty level. The plots give a readily interpretable visual representation of the "cue-trading" behavior of the listeners as spectral information is manipulated, and as a result of perceptual learning.

**Table 3 | Pearson's correlation coefficients between the five tasks in the experiment, across the two testing sessions.**

*Cons, Consonants;* <sup>∧</sup>*p* < *0.10, \*p* < *0.05.*

**Table 4 | Results of factor analyses on individual TNBs (Threshold Number of Bands).**

*Only factor loadings over 0.3 are shown.*

**Figure 5** shows the results of the individual analyses for each feature and session, collapsed across difficulty level. A repeated-measures ANOVA gave significant effects of Session [*F*(1, 13) = 13.52, *p* = 0.003] and Feature [*F*(2, 26) = 64.13, *p* < 0.001]. A significant interaction of these two factors [*F*(1.38, 26) = 4.16, *p* = 0.046; Greenhouse-Geisser corrected] was explored using 3 *post-hoc t*-tests with Bonferroni correction (significance level *p* < 0.017). These indicated a significant increase in IT for Manner [*t*(13) = 2.97, *p* = 0.002] and Voicing [*t*(13) = 2.97, *p* = 0.011] from Session 1 to Session 2, but not for Place [*t*(13) = 2.17, *p* = 0.049].
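The corrected significance level quoted above follows from dividing the conventional 0.05 criterion by the number of comparisons. The short sketch below applies that threshold to the three *p*-values reported in the text; the function and variable names are illustrative only.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level under Bonferroni correction."""
    return alpha / n_tests

threshold = bonferroni_threshold(0.05, 3)

# p-values as reported for the three post-hoc comparisons.
reported_p = {"Manner": 0.002, "Voicing": 0.011, "Place": 0.049}
significant = {feature: p < threshold for feature, p in reported_p.items()}
```

This is why the Place comparison, although below 0.05, does not count as significant after correction.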

The individual-subject IT scores for voicing, place and manner in each session were entered as predictors in linear regression analyses on the TNB scores for the five tasks. In Session 1, a significant model with Place and Voicing as predictors offered the best account of consonant recognition [*R*<sup>2</sup><sub>adj</sub> = 0.750; *F*(2, 11) = 21.71, *p* = 0.001]. Performance on the vowels task was best predicted by Voicing [*R*<sup>2</sup><sub>adj</sub> = 0.295; *F*(1, 12) = 6.44, *p* = 0.026].
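For readers who want to relate the adjusted *R*<sup>2</sup> values to the accompanying *F* statistics, the sketch below applies the standard textbook identities for ordinary least-squares regression to the single-predictor vowels model. The sample size of 14 is an assumption inferred from the reported degrees of freedom, and the function names are hypothetical.

```python
def r2_from_adjusted(r2_adj: float, n: int, p: int) -> float:
    """Recover the unadjusted R^2 from adjusted R^2 (n observations, p predictors)."""
    return 1.0 - (1.0 - r2_adj) * (n - p - 1) / (n - 1)


def f_from_r2(r2: float, n: int, p: int) -> float:
    """Overall F statistic of the regression, with (p, n - p - 1) degrees of freedom."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


N_LISTENERS = 14   # assumption: inferred from the reported df of (1, 12)
N_PREDICTORS = 1   # Voicing as the sole predictor of the vowels task

r2 = r2_from_adjusted(0.295, N_LISTENERS, N_PREDICTORS)
f_stat = f_from_r2(r2, N_LISTENERS, N_PREDICTORS)

print(f"R^2 = {r2:.3f}, F({N_PREDICTORS}, {N_LISTENERS - N_PREDICTORS - 1}) = {f_stat:.2f}")
```

With these inputs the computed *F* agrees with the reported value of 6.44 to two decimal places.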

**FIGURE 4 | Results of the group IT analysis on consonant perception for (A) Session 1 and (B) Session 2.**

In Session 2, Manner and Place predicted TNB scores on the Consonants task [*R*<sup>2</sup><sub>adj</sub> = 0.580; *F*(1, 12) = 9.98, *p* = 0.003], while Manner scores predicted TNB scores on the IEEE sentences [*R*<sup>2</sup><sub>adj</sub> = 0.270; *F*(1, 12) = 5.80, *p* = 0.033]. There were no other significant models.

#### *Vowels*

The results of the pooled group IT (**Figure 6**) show that vowel Length information is the best transferred (as a proportion of the information input about this feature) of the five features at low spectral resolutions (1, 2, and 4 bands), with the other features more closely bunched. At greater spectral resolutions (16 and 32 bands), this discrepancy is reduced.

**Figure 7** shows the results of the individual analyses for each Feature and Session (collapsed across difficulty level). A repeated-measures ANOVA gave significant effects of Session [*F*(1, 13) = 23.17, *p* < 0.001] and Feature [*F*(1.12, 52) = 34.34, *p* < 0.001; Greenhouse-Geisser corrected], with no interaction of the two factors.

The individual-subject IT scores for each feature in each session were entered as predictors in linear regression analyses on the TNB scores for the five tasks. In Session 1, a significant model featured Height as the sole predictor of TNB scores on the Vowels task [*R*<sup>2</sup><sub>adj</sub> = 0.743; *F*(1, 12) = 38.68, *p* < 0.001]. In Session 2, a significant model with Height and Length [*R*<sup>2</sup><sub>adj</sub> = 0.772; *F*(1, 12) = 28.51, *p* < 0.001] gave the best prediction of TNB scores on the Vowels task, and a model with Length emerged as significant for scores on the BKB sentences [*R*<sup>2</sup><sub>adj</sub> = 0.308; *F*(1, 12) = 6.80, *p* = 0.023]. There were no other significant models.

**FIGURE 6 | Results of the group IT analysis on vowel perception for (A) Session 1 and (B) Session 2.**

Error bars show ±1 standard error of the mean.

## **DISCUSSION**


The current data showed evidence for improved recognition of noise-vocoded sentences, words and segments, when re-tested after a 1–2 week period of no exposure, and without any explicit training. Using individual differences as the starting point for analyses, we identified a pattern of covariance across levels of the linguistic hierarchy, which changed with learning. Analyses of confusion data revealed that participants in the experiment improved on the reception of acoustic-phonetic features by Session 2, but exhibited inefficient use of cues available in the vocoded signal. Further, these analyses suggested predictive roles for specific phonetic features in the perception of noise-vocoded stimuli.

#### **UNDERSTANDING NOISE-VOCODED SEGMENTS, WORDS AND SENTENCES: EFFECTS OF LEARNING AND TASK**

We found that performance improved significantly between Session 1 and Session 2 of the experiment, and by an equivalent amount across tasks. As the experiment was primarily designed to explore individual differences and patterns of covariance across tasks and time, we chose to run the five speech tests in a fixed order for both sessions. Given that the task order was not counterbalanced across the group, we therefore cannot conclude whether the observed improvements are due to within-session exposure or between-session consolidation—for example, the BKB test occurred at the start of each session, and so the improvement observed by Session 2 may reflect adaptation during the remaining four tasks in Session 1 and before the delay period. However, taken across all tasks, the significant improvement in performance is, at least, a demonstration of medium-term retention of adaptation to noise-vocoded speech in the absence of exposure or training. We also note that, for the sentences and words tasks, the improved performance reflects perceptual learning at an acoustic-phonetic level, as recognition in Session 2 was tested using novel tokens (Davis et al., 2005).

#### **UNDERSTANDING NOISE-VOCODED SEGMENTS, WORDS AND SENTENCES: EXPLORING COVARIANCE**

Simple correlations between the tasks in each session showed, like Surprenant and Watson (2001), rather modest evidence for covariation of individual thresholds across the tasks (**Table 3**). This once again demonstrates that there is no straightforward, unitary approach to recognizing degraded speech across levels of the linguistic hierarchy. However, a factor analysis of Session 1 TNB scores suggested some systematicity: it revealed two similarly-weighted, orthogonal factors in the Session 1 threshold data, with sentences and words loading on one factor, and words and segments loading on the other. This suggests two independent modes of listening: a "top–down" mode making use of lexical, syntactic and semantic information to generate hypotheses about stimulus identity, and a "bottom–up" mode concerned with acoustic-phonetic discriminations. Notably, the words task occupies an intermediate status, by loading on both "top–down" and "bottom–up" factors. By Session 2, when performance had improved, all tasks but one (Words) patterned together. It appears that once the initial learning of sound-to-representation mappings has taken place, the listener can begin to approach most stimulus types in a similar way. In this sense, we suggest that the *nature* of the underlying factors was different in Session 2, such that these could no longer be well described by a greater involvement of "top–down" or "bottom–up" processes. However, we note that, in both sessions, the factors only accounted for a proportion of the variance, and therefore we cannot rule out the influence of additional factors underlying performance.

The plot in **Figure 2** shows that the Words task was the most difficult of the open-set tasks in both sessions, with listeners requiring a greater amount of spectral detail (i.e., larger numbers of bands) in order to reach the 50% performance threshold. Within the open-set tasks, the overall amount of exposure to vocoded material across seventy sentences is much greater than for seventy monosyllabic words. However, Hervais-Adelman et al. (2008) showed that, even when matched for number of words of exposure, learning is still slower for noise-vocoded words than for sentences. These authors interpret such findings in terms of the relative richness of the "teaching signal" that assists learning. In the current experiment, the listener could draw upon many sources of knowledge against which to test hypotheses for sentence recognition: lexical, syntactic and semantic (Miller, 1947). Furthermore, the segment recognition tasks provided a learning framework through their forced-choice design. In contrast, the recognition of monosyllabic, degraded words could be constrained by the expectation of real lexical items, but with many monosyllables having several real-word neighbors (bat, cat, sat, fat etc.), any error in phonemic identification could lead to the participant making the wrong "guess" in their response to difficult items. In line with Hervais-Adelman et al., we argue that the nature of the Words task will have made it most difficult within each testing session, but also limited the potential for improved performance with learning. By the second testing session, listeners had established a sufficient level of acoustic-to-phonemic mapping that, in combination with expectancy constraints, allowed for improved performance on the sentences and segments tasks. However, the recognition of single words could not be performed using the same listening strategy or strategies.

#### **EXPLORING ACOUSTIC-PHONETIC PROCESSING: INFORMATION TRANSFER FOR CONSONANTS AND VOWELS**

The forced-choice design of the consonants and vowels tasks allowed us to explore performance in terms of the perception of phonetic features, using an IT analysis. The outcomes of IT analyses on the consonants and vowels recognition data are generally in agreement with the findings of several previous studies using noise-vocoded speech (Dorman et al., 1990, 1997; Shannon et al., 1995; Dorman and Loizou, 1998; Iverson et al., 2007). However, the current study enabled the assessment of two extra dimensions: the effect of perceptual learning on the extraction of feature information, and the relationship of feature processing to performance on the five speech recognition tasks.
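As a rough illustration of how a feature-level IT score can be obtained from forced-choice confusion data, the sketch below computes relative information transfer as the mutual information between stimulus and response categories, divided by the entropy of the stimulus categories. This is the standard formulation of the measure and is offered only as a sketch; the confusion counts and the function name are hypothetical, and the procedure used in the present study may differ in detail.

```python
import math


def relative_information_transfer(confusions):
    """Proportion of stimulus information transferred to responses.

    confusions[i][j] is the number of trials on which stimulus category i
    (e.g. "voiced") was reported as response category j.
    """
    total = sum(sum(row) for row in confusions)
    if total == 0:
        raise ValueError("confusion matrix contains no trials")

    n_resp = len(confusions[0])
    p_stim = [sum(row) / total for row in confusions]
    p_resp = [sum(row[j] for row in confusions) / total for j in range(n_resp)]

    # Mutual information between stimulus and response categories, in bits.
    transmitted = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count > 0:
                p_ij = count / total
                transmitted += p_ij * math.log2(p_ij / (p_stim[i] * p_resp[j]))

    # Entropy of the stimulus categories: the information available as input.
    input_entropy = -sum(p * math.log2(p) for p in p_stim if p > 0)
    if input_entropy == 0:
        raise ValueError("only one stimulus category was presented")

    return transmitted / input_entropy


# Hypothetical voicing confusions: rows = presented, columns = reported.
voicing_confusions = [[18, 2],   # voiced presented
                      [4, 16]]   # voiceless presented
voicing_it = relative_information_transfer(voicing_confusions)
print(f"Relative IT for voicing: {voicing_it:.2f}")
```

A score of 1 indicates that responses fully preserve the feature distinction, while a score of 0 indicates that responses carry no information about it, which is why the measure is reported as a proportion of the information input.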

The group IT analysis of the consonants data suggested that, numerically, place was the most poorly transferred feature, with no improvement across sessions. Dorman et al. (1990) tested identification of consonants by CI patients. They reasoned that, given the good temporal resolution by implants, envelope-borne information would be well transferred while the poor resolution offered by a small number of electrodes (6 in the device tested in their study) would limit the transfer of spectral information. Envelope information potentially cues listeners to voicing and manner, while transmission of place information is dependent on high-rate temporal structure (fluctuation rates from around 600 Hz to 10 kHz) cueing spectro-temporal dynamics, including the ability to resolve formants in the frequency domain (Rosen, 1992). In CIs and their simulations, frequency resolution can be very poor in the region of formant frequencies, such that both F1 and F2 may be represented by the output of only one channel/electrode. Even if the first two formants can be resolved, the ability to differentiate one speech sound from the other can depend on within-formant transitions in frequency, for example in the discrimination of /b/ and /d/. The ability to make discriminations based on formant-carried frequency information in noise-vocoded speech will depend on the ability of listeners to compare the relative amplitude outputs of the different bands. A study by Shannon et al. (1995) with normal-hearing listeners exposed to noise-vocoded speech demonstrated that, after several hours of exposure, voicing and manner were almost completely transferred from spectral resolutions of 2 bands and upwards, while place IT was around 30% with 2 bands and did not exceed 70% by 4 bands. It should be noted that the three phonetic features of voicing, place and manner are not completely independent of each other, and there is likely to be some degree of overlap in the corresponding acoustic features. Dorman et al. (1990) point out that the amount of transferred place information should vary with the amount transferred about manner, as some manner cues facilitate place recognition, e.g., frication manner (i.e., a wideband noise in the signal) potentially allows relatively easy discrimination between /s/ or /ʃ/ and /f/ or /θ/, as the former pair can be 15 dB more intense than the latter.

The slightly more marked improvement in reception of voicing information than for the other two features in the current task can perhaps be explained by considering the acoustic nature of the noise-vocoded stimulus. Voicing can be weakly signaled by relatively slow envelope fluctuations, for example through detection of the longer silent periods in voiceless than voiced plosives, or in the greater amplitude of voiced compared to voiceless obstruents. However, voicing is also signaled by periodicity, that is, temporal regularity in the speech waveform carried by fluctuations primarily between 50 and 500 Hz (Rosen, 1992). This information is reasonably well preserved after the vocoding scheme used in the present experiment, where the amplitude envelope was low-pass filtered at 400 Hz. The between-session improvement shown for voicing at low band numbers could reflect the participants' increased ability to use the available temporal information in the stimulus to assist performance in the absence of cues to place of articulation that are more dependent on spectral resolution. However, voicing information is also carried by cues to overall spectral balance, as voicing is weighted toward low frequencies. These cues become apparent as soon as at least a second band of information is added to the noise-vocoded stimulus. We note that the duration of the preceding vowel in naturally-produced VCV stimuli can be a cue to voicing in the upcoming consonant. However, the mean preceding vowel duration was not significantly different between tokens with voiced and voiceless consonants in our task [*t*(15) = 1.09, *p* = 0.292].

It is clear that the transmission of spectral shape information, as is required for identification of height, backness, roundedness, and diphthongs, is a limiting factor in recognition of noise-vocoded vowels, and that these four features are closely related in terms of recognition. However, as neither the amplitude envelope nor the duration of the signal is distorted by the noise-vocoding procedure, the information on vowel length should have been readily transmitted at all numbers of channels. Indeed, at lower numbers of bands, length was the best recognized feature in the vowels task. However, overall recognition of this feature was well below 100%, and variable across the participant group. Regression models identified length as a significant predictor of scores on the other speech tasks, suggesting that timing and rhythmic information are of importance in perception of noise-vocoded speech. Our findings are similar to those of Iverson et al. (2007), who measured IT for vowel length in CI users and normal-hearing listeners listening to a CI simulation. Both listening groups in the Iverson et al. study showed sub-optimal IT. The authors propose that, given the excellent preservation of durational information in noise-vocoding, participants should be able to show 100% IT for length, even at low spectral resolutions. Therefore, while the evidence suggests that timing and rhythm may be important for successful perception of some forms of noise-vocoded speech, listeners may require more guidance and training in order to make better use of durational cues.

#### **RECOGNITION OF NOISE-VOCODED SPEECH OVER PARTICIPANT, TASK, AND TIME—IMPLICATIONS FOR TRAINING AND COCHLEAR IMPLANTATION**

The current study presents a number of findings relevant to more applied settings such as training regimes for CI recipients. We identified that, on initial exposure to noise-vocoded speech, the pattern of covariance across tasks was suggestive of two different "levels" of processing—one lexico-semantic (or "top–down"), and the other more acoustic-phonetic (or "bottom–up"). The words task was implicated in both of these factors. Isolated monosyllabic items such as "mice" and "gas" have lexical and semantic content, and the expectation of meaningful tokens can constrain the listener's candidate pool of targets in the recognition task. However, with all the items bearing the same CVC structure, and without any higher-order syntactic and semantic cues, the listener must also engage an analytical, acoustic-phonetic approach in order to successfully identify the words. It may be this demand on both approaches that makes the words task the most difficult in the set. Loebach and colleagues (Loebach and Pisoni, 2008; Loebach et al., 2008, 2009, 2010) argue that training in analytical, acoustic-phonetic listening offers the most promising route for adaptation to distorted speech. In a test of vocoded sentence perception they observed equivalent transfer of learning after training with semantically anomalous sentences as from training with real sentences. They suggest that this reflects an increased demand for attention to acoustic-phonetic aspects of the signal (rather than higher-order syntactic or semantic cues) when listening to the anomalous stimuli—their implication is that learning at this level can then be readily transferred to other stimulus types (Loebach et al., 2010). However, they also find that transfer is greatest between stimuli of the same linguistic class (Loebach and Pisoni, 2008). We have identified two sources of variability, of similar explanatory power, underlying the recognition of noise-vocoded speech. 
We suggest that both listening "strategies" should potentially yield benefits for adaptation, and that the effects observed by Loebach and colleagues may be associated with more generalized attentional engagement rather than the superior effects of analytic listening. Nogaki et al. (2007) partly ascribe the variability within normal-hearing participants listening to CI simulations to variable levels of enthusiasm and involvement in difficult listening tasks, and contrast this with the keener sense of urgency shown by CI patients, for whom successful training has important consequences for their quality of life. An experimental modulation of attentional engagement with noise-vocoded speech might offer greater insight into how this might differentially affect top–down and bottom–up aspects of listening. In the context of cochlear implantation, it is important to recognize that most of what we hear in everyday speech takes the form of connected phrases and sentences. Therefore, identifying methods of engaging attention in higher-order aspects of linguistic processing, such as the use of semantic and syntactic cues to "fill in the gaps" in difficult listening situations, may yield benefits of a similar magnitude to more bottom–up strategies.

We also identified targets for improved bottom–up processing of noise-vocoded speech from the current dataset. The use of IT analyses to explore acoustic-phonetic processing produced findings unattainable from basic recognition scores. We identified significant predictive roles for voicing and vowel length information in recognizing noise-vocoded stimuli across the linguistic hierarchy. Both of these properties were well represented at low spectral resolutions in the current stimuli; in particular, vowel length information was fully present even with one band. However, although perception of these features showed marked improvement over time, **Figures 4**–**7** show that listeners' accuracy in recognizing these features was much less than 100% in both sessions. This suggests that, in the absence of specific guidance or instruction, listeners continue to rely on typically dominant cues to phoneme identification (e.g., formant frequencies in vowels) at the expense of other information that is more reliably preserved in the degraded signal. Based on this result, and similar findings from Iverson et al. (2007), we suggest that if CI recipients are to be trained in analytic listening to aid perceptual learning, this training should be targeted at the acoustic cues that are most likely to be preserved when transmitted through the device. We suggest that focused training on perception of duration, amplitude modulation, and spectral balance cues could be used to improve acoustic-phonetic processing from the bottom–up by maximizing the usefulness of the information in the acoustic signal.

#### **CONCLUSION**

The current study offers some insight into the existence of overlapping lexico-semantic and acoustic-phonetic processes underlying the adaptation to a CI simulation in normal-hearing participants. We suggest that both "top–down" and "bottom–up" listening strategies have potential validity in settings such as training for recipients of CIs. To improve "analytic" processing, we suggest that training should involve targeted attentional engagement with acoustic cues that are well preserved in the degraded stimulus. Further work is necessary to evaluate the benefits of such an approach. When considering all of the findings, however, we must acknowledge that the speech transformation used in the current study forms only a basic approximation to the signal perceived by most users of CIs. The process of implantation can result in incomplete insertion of the electrode array and damage to parts of the basilar membrane (yielding "dead regions"), both of which have consequences for the mapping of sound to the auditory nerve. Such effects can be simulated through additional transformations in the noise-vocoding technique (e.g., Rosen et al., 1999; Smith and Faulkner, 2006), and future work will need to determine whether the current findings are borne out for these more degraded signals.

## **ACKNOWLEDGMENTS**

This work was supported by an Economic and Social Research Council +3 Postgraduate Studentship (PTA-030-2004-01065) awarded to Carolyn McGettigan.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 September 2013; accepted: 24 January 2014; published online: 25 February 2014.*

*Citation: McGettigan C, Rosen S and Scott SK (2014) Lexico-semantic and acoustic-phonetic processes in the perception of noise-vocoded speech: implications for cochlear implantation. Front. Syst. Neurosci. 8:18. doi: 10.3389/fnsys.2014.00018*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2014 McGettigan, Rosen and Scott. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Across-frequency combination of interaural time difference in bilateral cochlear implant listeners

## *Antje Ihlefeld<sup>1,2</sup>\*, Alan Kan<sup>1</sup> and Ruth Y. Litovsky<sup>1</sup>*

*<sup>1</sup> Waisman Center, University of Wisconsin, Madison, WI, USA*

*<sup>2</sup> Center for Neural Science, New York University, New York, NY, USA*

#### *Edited by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

#### *Reviewed by:*

*Etienne Gaudrain, University Medical Center Groningen, Netherlands*

*Virginia Best, Boston University, USA*

#### *\*Correspondence:*

*Antje Ihlefeld, Center for Neural Science, New York University, 4 Washington Place, New York, NY 10003, USA*

*e-mail: ai33@nyu.edu*

The current study examined how cochlear implant (CI) listeners combine temporally interleaved envelope-ITD information across two sites of stimulation. When two cochlear sites jointly transmit ITD information, one possibility is that CI listeners can extract the most reliable ITD cues available. As a result, ITD sensitivity would be sustained or enhanced compared to single-site stimulation. Alternatively, mutual interference across multiple sites of ITD stimulation could worsen dual-site performance compared to listening to the better of two electrode pairs. Two experiments used direct stimulation to examine how CI users can integrate ITDs across two pairs of electrodes. Experiment 1 tested ITD discrimination for two stimulation sites using 100-Hz sinusoidally modulated 1000-pps-carrier pulse trains. Experiment 2 used the same stimuli ramped with 100 ms windows, as a control condition with minimized onset cues. For all stimuli, performance improved monotonically with increasing modulation depth. Results show that when CI listeners are stimulated with electrode pairs at two cochlear sites, sensitivity to ITDs was similar to that seen when only the electrode pair with better sensitivity was activated. None of the listeners showed a decrement in performance from the worse electrode pair. This could be achieved either by listening to the better electrode pair or by truly integrating the information across cochlear sites.

**Keywords: cochlear implant, interaural time difference, envelope ITD, across-frequency integration, spatial sound**

## **INTRODUCTION**

Interaural time differences (ITDs), which occur due to different arrival times of sound energy in the left and right ear, are paramount for localizing the direction of a sound source and for attending to speech in a mixture of surrounding sound (Darwin and Hukin, 1999; Kidd et al., 2005). However, bilateral cochlear implant (CI) listeners do not rely much on ITDs when localizing sound (e.g., Seeber and Fastl, 2008). Moreover, studies on speech intelligibility in situations with background sound demonstrate that binaural processing effects in bilateral CI users are either absent, or much smaller compared to normal-hearing (NH) listeners (Litovsky et al., 2006, 2009; Loizou et al., 2009). This suggests that when compared to NH listeners, CI listeners struggle to effectively utilize ITDs (Ihlefeld and Litovsky, 2012). One of the key challenges for CI listeners is to understand speech in situations with background sound, where ITDs can greatly aid speech understanding in NH listeners. Because speech is a broadband signal, binaural preservation of cues through multi-channel stimulation is ultimately required for restoring speech intelligibility in CI listeners in natural, multi-source environments. A first step toward the goal of restoring multi-channel binaural cues is to examine ITD sensitivity in CI listeners when multiple cochlear sites are simultaneously stimulated, and this is the focus of the current study.

Most CI listeners can resolve ITDs in the signal envelopes, at least for 100% modulated pulse trains in quiet, but are insensitive to fine structure ITDs transmitted by high carrier rates (Lawson et al., 1998; van Hoesel and Tyler, 2003; Laback et al., 2004). Previous work shows that CI listeners can discriminate envelope-ITDs with thresholds as small as 50 µs (van Hoesel et al., 2009), with an overall sensitivity that shows U-shaped tuning as a function of envelope modulation frequency and peak sensitivity around 100 Hz (Noel and Eddington, 2013). However, when stimulation rate is low, there is generally reduced potency of fine structure ITD cues in bilateral CI listeners, and it is common to observe idiosyncratic differences in ITD sensitivity across cochlear sites (Litovsky et al., 2012). One reason is that within each ear, electric fields from stimulating electrodes can spread to nearby sites, resulting in reduced sensitivity to stimulation at desired electrodes (e.g., Bierer, 2010). Moreover, within and across listeners, there are interaural differences in neural survival and electrode placement, which lead to reduced ITD sensitivity (Kan et al., 2013). Differences in envelope ITD sensitivity across stimulation sites could affect how CI listeners interpret broadband envelope ITD information.

In addition to these idiosyncratic factors, the acoustic environment could also influence across-electrode integration of envelope ITD. From an acoustic perspective, usefulness of envelope-ITDs is known to have limits. ITD robustness can be much reduced in everyday environments, where sound energy reflected from walls and ceiling, and energy from competing acoustic sources superpose with target sound. Reverberation often reduces the modulation depth of a target source relative to quiet anechoic conditions, decreasing both speech identification and sound localization performance (Houtgast and Steeneken, 1984; Watkins, 2005; Ihlefeld and Shinn-Cunningham, 2011; Ruggles and Shinn-Cunningham, 2011). Under anechoic conditions, potency of envelope-ITDs depends on the rise time of the slope of the envelope and on the duration of gaps in the envelope, both of which generally vary with modulation depth (Klein-Hennig et al., 2011; Laback et al., 2011; Dietz et al., 2013). This raises the question of how the auditory system can accommodate interference from reduced modulation depth in CI listeners. The effects of demodulation may be even more detrimental on performance for CI than for NH listeners. Indeed, when bilateral CI listeners localize sounds in a noisy background, the signal to noise ratio at which they can effectively utilize envelope-ITDs is much higher than for NH listeners (for a recent review, see van Hoesel, 2012).

Few previous studies have systematically investigated the role of modulation depth for ITD sensitivity in CI listeners. Early work on bilateral CI listening reported results for single-site stimulation in one CI listener, whose ITD sensitivity improved with increasing modulation depth (van Hoesel and Tyler, 2003). More recent work on CI listeners shows that, for trapezoidally amplitude modulated 1515 pps carriers, envelope-ITD sensitivity improves with decreasing duty cycle of the envelope (Laback et al., 2011). CI simulations with NH listeners confirm this effect of duty cycle and further suggest that envelope-ITD sensitivity improves monotonically at low modulation depths, then saturates beyond a critical modulation depth (Bernstein and Trahiotis, 2011; Klein-Hennig et al., 2011; Laback et al., 2011; Dietz et al., 2013).

It is notable that previous work on CI envelope-ITDs has focused on single-site stimulation (van Hoesel and Tyler, 2003; Laback et al., 2011; Noel and Eddington, 2013). However, most natural sounds have broadband energy. For CI stimulation it is therefore desirable to provide envelope-ITDs concurrently at multiple sites along the cochlear array. The current study examines broadband ITD sensitivity in bilateral CI listening, when stimulation occurs at two cochlear sites, and when ITDs are consistent across sites. The aim was to examine the integration of envelope-ITD cues that are spectrally remote due to their positioning at two anatomically different places along the cochlear arrays. Two experiments compared envelope-ITD sensitivity in CI listeners for stimulation that was at a single site vs. dual-site stimulation.

Previous work showed that some CI listeners depend heavily on onset ITD cues (Laback et al., 2007; van Hoesel, 2008). Unlike ongoing cues, due to their transient nature, onset cues do not offer multiple independent "looks" and thus do not provide the opportunity to combine sensory information across time. For an optimal listener, the availability of ongoing cues should improve performance relative to having only onset cues (Hafter and Dye, 1983). To gauge the relative importance of onset cues on integrating binaural cues across electrodes, Experiment 1 measured performance with onset cues; Experiment 2 tested four listeners with "good" binaural sensitivity as determined by data from Experiment 1, with attenuated onset cues.

Of interest was how bilateral CI users who are stimulated at two sites along the cochlear array extract binaural cues, in particular if, due to neural-electrode interface or neural survival issues, these listeners have different sensitivity to binaural cues at the two sites. In one potential scenario, these listeners might only process the binaural cue that is most salient. Thus, CI listeners may interpret binaural cues from the two cochlear sites as one aggregate spatial percept, and their performance with two binaural pairs of electrodes is not expected to increase relative to the single-site stimulation. We also consider the possibility that dual-site stimulation may increase a listener's uncertainty as to what cues to listen to, causing mutual interference and reducing ITD saliency. In this second scenario, ITD sensitivity would decrease in the aggregate sound compared to listening to the better single site alone. Another possible outcome would be that of enhanced performance with dual-site vs. single-site electrodes; this might occur if CI listeners can combine ITD cues from multiple pairs of electrodes that are treated as independent channels of information.
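The scenarios above can be made concrete with a small sketch based on standard signal detection theory. It assumes, purely for illustration, that each site contributes an independent, equal-variance Gaussian decision variable, in which case optimal combination yields a dual-site *d*′ equal to the root sum of squares of the single-site values. The function name and the example *d*′ values are hypothetical and are not taken from the data reported here.

```python
import math


def predicted_dual_site_dprime(d_site_a: float, d_site_b: float) -> dict:
    """Benchmark dual-site d' values under two of the scenarios in the text.

    Assumes independent, equal-variance Gaussian noise at each site
    (an illustrative modelling assumption, not a claim about the data).
    """
    return {
        # Listener attends only to the more salient site.
        "better_site_only": max(d_site_a, d_site_b),
        # Listener optimally combines two independent channels.
        "independent_channels": math.hypot(d_site_a, d_site_b),
    }


# Hypothetical single-site sensitivities for one listener.
preds = predicted_dual_site_dprime(1.5, 0.5)
for scenario, d_prime in preds.items():
    print(f"{scenario}: d' = {d_prime:.2f}")

# Mutual interference has no point prediction here: it corresponds to any
# measured dual-site d' that falls below the "better_site_only" benchmark.
```

Under these assumptions, when one site is much less sensitive than the other the two benchmarks lie close together, so integration and better-site listening can be hard to tell apart from performance alone.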

Here we provide behavioral evidence buttressing the early finding that envelope-ITD sensitivity improves with increasing modulation depth (van Hoesel and Tyler, 2003). Furthermore, we show that when two cochlear sites jointly convey ITD information, CI listeners perform no worse than when they listen to the better single site. Six out of eight tested bilateral CI listeners either showed an improvement in ITD *d*′ sensitivity when two cochlear regions were stimulated jointly, or their performance was similar to that with the better of the two electrode pairs. None of these six listeners showed consistent interference from the worse electrode pair. Two additional performers, with very poor ITD sensitivity, showed neither consistent improvement nor decrement in performance for dual- vs. single-site stimulation.

## **MATERIALS AND METHODS**

#### **LISTENERS**

Eight bilateral CI users with Nucleus devices participated in the study and were paid for their time. All testing was administered according to the guidelines of the Institutional Review Board of the University of Wisconsin-Madison. **Table 1** lists details of their clinical etiology.

#### **STIMULI**

Custom-written software (implemented in Matlab, The Mathworks, Natick, MA) was used to present the stimuli and record listeners' responses. Stimuli were delivered with two synchronized Nucleus Implant Communicators (Cochlear Ltd., Sydney, NSW, Australia). Prior to testing human listeners, we confirmed the proper function of the custom-written stimulation software, by projecting the output from two test implants to an oscilloscope across a range of ITDs. Moreover, as a precaution at the beginning of each testing day, we checked proper function of our equipment with an oscilloscope.

Each stimulus consisted of a 300-ms long, 1000 pps pulse train that was 100-Hz sinusoidally amplitude-modulated (**Figure 1**). Electrodes were activated in monopolar configuration. Pulses were biphasic, with phase duration of 25 or 50µs, depending on the comfortable loudness range of each CI listener (cf. **Table 2**). Envelopes (*E*) of each amplitude modulated pulse train were as follows:

$$E(t) = I_{\text{max}} - \frac{m}{200} \left(I_{\text{max}} - I_{\text{min}}\right) \left[1 + \cos(2 \pi f t)\right], \tag{1}$$

where *I*max and *I*min are the maximal and minimal amplitudes in clinical units, respectively, *m* denotes the modulation depth in percent, *f* equals 100 Hz, and *t* denotes time. *I*min equaled detection threshold (T-level) for a 1000 pps pulse train with 0% modulation depth. *I*max was set such that, for single-site stimulation, the overall stimulus loudness was the same across modulation depths (see Loudness Calibration Procedures below). Throughout testing, *I*max was held fixed for each electrode and modulation depth. As a result, component stimulation levels in the dual-site stimulus were identical to the single-site cases. The starting phase of the envelope equaled zero.
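As an illustration of the envelope definition above, the following minimal Python sketch evaluates the envelope at the pulse times of a 300-ms, 1000 pps train. It is not the authors' Matlab software; the function name `envelope` and the level values `I_MIN` and `I_MAX` are hypothetical and chosen only for the example. With the formula as written and a starting phase of zero, the envelope begins at its minimum and reaches *I*max half a modulation period later.

```python
import math

def envelope(t, i_max, i_min, m, f=100.0):
    """Envelope formula: t in seconds, m in percent, levels in clinical units."""
    return i_max - m / 200.0 * (i_max - i_min) * (1.0 + math.cos(2.0 * math.pi * f * t))

# Hypothetical levels (clinical units), chosen only for illustration.
I_MIN, I_MAX = 100.0, 200.0
PULSE_RATE = 1000    # pulses per second
DURATION_MS = 300    # stimulus duration in milliseconds

n_pulses = PULSE_RATE * DURATION_MS // 1000
pulse_times = [k / PULSE_RATE for k in range(n_pulses)]

# Per-pulse levels at full modulation and with no modulation.
levels_full = [envelope(t, I_MAX, I_MIN, m=100) for t in pulse_times]
levels_flat = [envelope(t, I_MAX, I_MIN, m=0) for t in pulse_times]

print(min(levels_full), max(levels_full))
print(set(levels_flat))
```

At 0% modulation depth every pulse sits at *I*max, and at 100% the levels span the full range between *I*min and *I*max.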

1000 pps pulse trains were multiplied with the envelope and waveforms were time-delayed across ears, generating 1000 or 2000µs ITDs. We initially presented 1000µs ITD, expecting that it would lead to perfect performance at 100% modulation depth (van Hoesel et al., 2009). However, some listeners were unable to do the task with 1000µs ITD during initial training. Those listeners were tested with 2000µs ITD instead.
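One consequence of delaying the whole waveform by 1000 or 2000µs is that the delay equals a whole number of carrier periods at 1000 pps. The sketch below, which assumes idealized pulse timing in integer microseconds and uses hypothetical helper names, counts how many carrier pulse times coincide across the two ears. The 500µs case was not used in the study and is included only for contrast.

```python
PULSE_PERIOD_US = 1000   # 1000 pps carrier: one pulse every 1000 microseconds
N_PULSES = 300           # 300-ms stimulus

def pulse_times_us(delay_us=0):
    """Pulse onset times in integer microseconds for one ear."""
    return [k * PULSE_PERIOD_US + delay_us for k in range(N_PULSES)]

def shared_pulse_count(itd_us):
    """Number of pulse times that coincide between the leading and lagging ear."""
    leading = set(pulse_times_us(0))
    lagging = set(pulse_times_us(itd_us))
    return len(leading & lagging)

for itd in (1000, 2000, 500):
    print(itd, shared_pulse_count(itd))
```

For the two ITDs used here, all but the first one or two pulses in the leading ear coincide in time with a pulse in the lagging ear, so under this idealization the ongoing carrier timing is nearly identical across ears and the interaural delay is carried by the envelope and by the stimulus onset and offset.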

#### **CALIBRATION PROCEDURES**

Three types of calibration procedure were performed. First, a bilateral pitch-matching task was conducted to select electrodes whose combined stimulation produced binaurally fused percepts (see Methods in Litovsky et al., 2010). Second, a loudness-balancing task was performed to ensure that all single-site stimuli were presented at a similar overall loudness. Third, an interaural level difference (ILD) centering was conducted so that the sounds were perceived roughly in the center of the head for an ITD of zero.

#### *Pitch matching*

For bilateral pitch matching, we initially selected two basal and two apical electrodes in the left ear. For each of these left-ear electrodes, we selected five right-ear electrodes whose clinical frequency stimulation ranges were close to that of the left ear. With a 1000 pps unmodulated pulse train stimulus, we sequentially stimulated the left ear electrode followed by one of its potential pitch matched electrodes in the right ear and asked the listener to categorize the perceived pitch of the second stimulus by determining whether, relative to the first stimulus, it was heard as: much lower, slightly lower, similar, slightly higher, or much higher. Twenty trials were collected per electrode pair for each listener. To reduce the possibility of range effects, all potential apical and basal pairs were presented in random order. This pitch comparison task resulted in one basal and one apical bilateral pair of electrodes that listeners most consistently rated similar in perceived pitch across ears. **Table 2** lists the electrode numbers for each listener comprising left-right pairs.

#### *Loudness balancing*

Each listener performed a series of loudness balancing calibrations (Landsberger and McKay, 2005). For each loudness-balancing track, two sounds, a fixed loudness reference and a level-adjustable target, were presented sequentially in two stimulus intervals. The level-adjustable target was initially set to a quiet level. The listener controlled the stimulus and increased the target level *I*<sub>max,Target</sub> until the target sounded louder than the reference, then decreased the target level until the target sounded softer than the reference, followed by another increase in level until both reference and target sounded equally loud. Two tracks were recorded, and subsequently the roles of the two sounds were reversed and testing was repeated for two additional tracks. The signed difference between the target and reference, averaged across all four tracks, was then added to the initial reference level, resulting in the loudness-balanced target level.

**Table 2 | The left and right (L/R) electrodes in each pair, the threshold level Imin in clinical units (cu), and the most comfortable level Imax, for 0% modulation depth at 1000 pps.**

This loudness balancing was initially performed with 1000 pps unmodulated pulse trains, once across the two left-ear electrodes, and once across the two right-ear electrodes. The resulting sounds were near most comfortable level (C-level) on all four electrodes. In each ear, the basal and apical electrodes were then presented jointly at current levels that were a few dB below these loudness balanced current levels, and their levels were gradually increased together until they sounded comfortably loud when presented jointly. Specifically, for the dB scaling, the output level in current level units (CLU) was converted into amperes, scaled in dB, and then converted back into CLU.
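The dB scaling step can be sketched as a round trip through the current domain. The text does not give the mapping between current level units and current, so the example assumes an exponential map of the form I = BASE_UA × RATIO^(CL/255); both constants and all function names are placeholders for illustration, and the actual mapping is device-specific.

```python
import math

# ASSUMPTION: exponential level-to-current map, I = BASE_UA * RATIO ** (CL / 255).
# The constants are placeholders and are not taken from the study.
BASE_UA = 17.5    # current in microamperes at level 0 (placeholder)
RATIO = 100.0     # current ratio spanned by levels 0..255 (placeholder)

def clu_to_ua(cl):
    """Convert a current level to microamperes under the assumed map."""
    return BASE_UA * RATIO ** (cl / 255.0)

def ua_to_clu(current_ua):
    """Invert the assumed map: microamperes back to a (fractional) current level."""
    return 255.0 * math.log(current_ua / BASE_UA) / math.log(RATIO)

def shift_clu_by_db(cl, db):
    """Scale the current by `db` decibels and express the result as a current level."""
    scaled_ua = clu_to_ua(cl) * 10.0 ** (db / 20.0)
    return ua_to_clu(scaled_ua)

print(shift_clu_by_db(180, -2.0))
```

Under these placeholder constants a 1 dB change corresponds to 6.375 level steps; a different device map would give a different step size. The sketch returns fractional levels and leaves any rounding to integer levels to the caller.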

#### *ILD centering*

Loudness balancing tracks were followed by ILD centering where all four electrodes were stimulated simultaneously, and the listener was asked to adjust the perceived intracranial location of the sound until it was perceived to be in the center of the head. The adjustment was made by lowering the stimulus level on the side that dominated perceived laterality of the sound image.

After ILD centering, another round of loudness balancing measurements followed. For each of the four electrodes, the modulated sound was adjusted with the loudness balancing routine described above, balancing unmodulated 1000 pps with each of 100, 20, and 60% modulation depth (in that order). Specifically, the reference sound was the unmodulated 1000 pps train presented at the C-level that would have produced an ILD-centered percept for dual-site stimulation. *I*min was set to the T-level for 1000 pps at that electrode. Modulation depth was held constant. Listeners balanced *I*max of the modulated sound. *I*max-values for 10, 40, and 80% modulation depth were then linearly interpolated. As a final verification, we presented all four electrodes jointly at 100% modulation depth, and asked listeners to report the intracranial perceived location. All listeners reported that they perceived a dominant intracranial image approximately near midline. Note that due to monaural loudness summation, the dual-site, ILD-centered C-levels were smaller than or equal to the C-levels for electrodes stimulated in isolation. Moreover, some listeners showed bilateral asymmetries, whereby they reduced the level of their better ear well below that ear's isolated C-level in order to obtain a centered intracranial image.

#### **TESTING PROCEDURES**

Pitch-matched, loudness-balanced, ILD-centered stimuli were used for ITD discrimination testing. Using a 2I-2AFC task, on each trial, left-leading and right-leading ITDs were presented in random order, separated by a 400 ms inter-stimulus interval (ISI). The listener's task was to identify whether the overall sound image moved from left to right or from right to left across the two stimulus intervals. The order of the intervals with left-leading and right-leading stimulus varied randomly from trial to trial. For training purposes, each listener performed the first two blocks of testing at each electrode condition with correct-response feedback. No feedback was given during the remainder of the experiment.

#### *Psychometric functions*

For each channel, we measured percent correct as a function of *m*, the envelope's modulation depth. On each trial, *m* was chosen randomly from one of the possible modulation depths. To prevent learning effects, all values of *m* were tested once before being presented again. We initially tested seven modulation depths: 0, 10, 20, 40, 60, 80, and 100%. Occasionally, listeners clearly performed at floor or ceiling levels for some of these modulation depths. For these listeners we focused data collection on the most informative modulation depths. We presented 15 trials per block, and held the electrode configuration and modulation depth constant within each block. Blocks were grouped in triplets. Each triplet consisted of one modulation depth, presented at one apical-only, one basal-only, and one dual-site apical-basal configuration; these configurations were interleaved in a Latin Square Balanced design across blocks. Modulation depth varied randomly across triplet groups and was balanced in Latin Square design. All modulation depths and electrode configurations were tested once before everything was repeated with new randomization. There were four overall repeats of all testing conditions, resulting in 60 trials per electrode configuration and modulation depth and CI listener. All listeners completed the testing within 1 day. During training, even in the easiest, 100% modulated condition, two listeners (IAJ, and IBR) struggled to discriminate between movement from +1000µs ITD to −1000µs ITD vs. the opposite direction. Those listeners were tested with ±2000µs ITD instead.
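The block structure can be illustrated with a short sketch. The text does not specify which Latin square was used or how its rows were assigned to triplets, so the cyclic square below is an assumption, and the names are hypothetical. The trial count follows directly from 15 trials per block and four repeats.

```python
CONFIGS = ["apical", "basal", "dual-site"]
TRIALS_PER_BLOCK = 15
REPEATS = 4

def cyclic_latin_square(items):
    """Each row is a rotation, so every item occupies every position exactly once."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

# ASSUMPTION: one row of the square gives the block order within one triplet.
square = cyclic_latin_square(CONFIGS)
for row in square:
    print(row)

# Trials per electrode configuration and modulation depth for each listener.
trials_per_condition = TRIALS_PER_BLOCK * REPEATS
print(trials_per_condition)
```

Each configuration appears once in every serial position across three successive triplets, which is the property a Latin square ordering is meant to provide.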

Time permitting, listeners who could perform the ITD discrimination task with ±1000µs ITD participated in a second ITD discrimination experiment with ramped onsets and offsets. Compared to Experiment 1, the only difference in Experiment 2 was that the stimulus onsets and offsets were ramped with raised-cosine ramps of 100 ms rise and 100 ms fall time. Four listeners completed Experiment 2.

#### **DATA ANALYSIS**

For each listener and experimental condition, percent correct scores were calculated and converted into *d*′-values. Probabilities of hits for each stimulus interval, *P*<sub>1</sub> and *P*<sub>2</sub>, were estimated and, to prevent numeric instabilities, bracketed within the range of 1/*N* and 1 − 1/*N*, where *N* equals 60, the number of trials. Response biases were removed and unbiased *d*′-values were calculated by averaging the normal deviates of these probabilities (Klein, 2001):

$$d' = \sqrt{2} \left[ z\left(P_1\right) + z\left(P_2\right) \right] / 2. \tag{2}$$
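A minimal sketch of this computation, using only the Python standard library, is given below. It is not the authors' analysis code; the function names are hypothetical, and *z* is taken to be the inverse of the standard normal cumulative distribution function.

```python
import math
from statistics import NormalDist

def clamp_rate(p, n_trials):
    """Bracket a hit rate within [1/N, 1 - 1/N] to keep z() finite."""
    lo = 1.0 / n_trials
    return min(max(p, lo), 1.0 - lo)

def d_prime(p1, p2, n_trials=60):
    """Unbiased d' from the hit rates of the two stimulus intervals."""
    z = NormalDist().inv_cdf
    z1 = z(clamp_rate(p1, n_trials))
    z2 = z(clamp_rate(p2, n_trials))
    return math.sqrt(2.0) * (z1 + z2) / 2.0

print(d_prime(0.5, 0.5))   # chance performance
print(d_prime(1.0, 1.0))   # perfect score, limited by the 1 - 1/N bracket
```

With *N* = 60, the bracketing limits the largest attainable *d*′ to roughly 3, and chance performance corresponds to *d*′ = 0.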

For an optimal listener, with independent internal peripheral noises and no central decision making noise, predicted performance in the dual-site conditions equals

$$d'_{\text{pred}} = \sqrt{{d'_b}^2 + {d'_a}^2} \tag{3}$$

where *d*′<sub>*b*</sub> and *d*′<sub>*a*</sub> are the *d*′ sensitivities in the basal and apical single-site conditions (e.g., Gockel et al., 2010).
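The optimal-listener prediction and the alternative of attending only to the better site can be compared with a small sketch. The function names and the example sensitivities are hypothetical and are not data from the study.

```python
import math

def predicted_dual_site(d_basal, d_apical):
    """Optimal combination of two independent channels (root sum of squares)."""
    return math.hypot(d_basal, d_apical)

def better_single_site(d_basal, d_apical):
    """Alternative strategy: rely only on the more sensitive electrode pair."""
    return max(d_basal, d_apical)

# Hypothetical single-site sensitivities, for illustration only.
d_b, d_a = 1.2, 0.5
print(predicted_dual_site(d_b, d_a))
print(better_single_site(d_b, d_a))
```

When one site is much less sensitive than the other, the two strategies predict nearly the same dual-site *d*′ (about 1.3 vs. 1.2 in this example), which illustrates why the two accounts can be hard to separate empirically. Note that the root sum of squares discards the sign of *d*′, so the sketch is only meaningful for non-negative sensitivities.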

## **RESULTS**

#### **PSYCHOMETRIC FUNCTIONS**

**Figure 2** shows ITD discrimination performance as a function of modulation depth for each of the eight listeners. Panels **A–F** show *d*′ performance for those listeners who could discriminate ±1000µs ITD, whereas panels **G,H** show data from the two listeners who could not discriminate ±1000µs and were therefore tested with ITDs of ±2000µs. Basal-only, apical-only, and dual-site electrode conditions are denoted by square, circle, and triangle symbols, respectively. Error bars, where large enough to be visible, show one standard error of the individual *d*′, assuming binomially distributed response rates.

For the six listeners with "good" sensitivity to ITDs (**Figures 2A–F**), performance generally improved with increasing modulation depth. An exception is IBP whose performance was near chance for the basal electrodes. Four of these listeners performed better for basal than for apical electrodes (squares fall above circles for IAZ, IBB, IBF, and IBK), while the other two listeners performed better at the apical electrodes (IBP and IBU).

None of the listeners performed consistently worse in the dual-site than in the single-site conditions (in **Figure 2**, triangles generally fall above or coincide with squares and circles). Occasionally, performance in the dual-site conditions fell slightly below that of the better single site. Specifically, for IBB at 40 and 80% modulation depth, and for IBP at 0, 40, and 60% modulation depth, *d*′ sensitivity at the better single-site marginally outperformed dual-site stimulation. However, this was not a consistent trend. Performance was then examined across listeners, to assess whether dual-site stimulation is better than single-site alone. We considered only those modulation depths where all "good" listeners performed the task for all electrode configurations, i.e., modulation depths smaller than or equal to 20%. Repeated measures ANOVA resulted in significant effects of electrode configuration and modulation depth [*F*(2, 10) = 11.487 and 9.128, *p* = 0.003 and 0.025 with Greenhouse–Geisser correction, for electrode configuration and modulation depth, respectively].

*Post-hoc* least significant difference (LSD) testing revealed significant differences between all pairwise comparisons of electrode configurations (*p* = 0.026, 0.025, 0.016 for basal vs. apical, basal vs. dual-site, and apical vs. dual-site). Specifically, performance in the basal condition was slightly better than in the apical condition (by an estimated marginal mean difference of 0.4 *d*′-units), indicating that the nominal high-frequency places of stimulation were more sensitive to ITD than the low-frequency places. Moreover, performance in the dual-site condition was slightly and consistently better than in the basal- and apical-only conditions (by estimated marginal mean differences of 0.2 and 0.5 *d*′-units). This suggests that as a group in the dual-site configuration listeners were not simply basing their decision on the apical pair or the basal pair.

The minimal required modulation depth to reach ceiling performance varied strongly across listeners, with IAZ and IBF performing at *d*′ = 3 with as shallow as 20% modulation depth, and IBP and IBU not approaching ceiling performance until 100% modulation depth.

As a noteworthy caveat, overall dynamic ranges varied across listeners, perhaps contributing some of the observed variance in the ITD discrimination performance. To address this potential confound, all dual-site-configuration curves were fitted with probit functions. To that end, the *d*′-values from Equation (2) were inverse transformed, and the resulting unbiased percent correct scores were then fit with probit functions. For each listener and electrode, dynamic ranges were calculated as the dB difference between *I*min and *I*max current amplitudes at 100% modulation depth. The slopes in the dual-site electrode conditions vs. each listener's smallest dynamic range in dB were not significantly correlated (*R*<sup>2</sup> = 0.01, *p* = 0.99). Probit mean and dynamic range were also not significantly correlated (*R*<sup>2</sup> = 0.56, *p* = 0.24). This suggests that overall dynamic range did not dramatically affect ITD performance.
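The inverse transform is not spelled out in the text. One reading, assuming the *d*′ formula given under Data Analysis with equal hit rates in both intervals (no response bias), is sketched below; the function name is hypothetical, and the probit fitting step itself is not shown because its details are not described.

```python
import math
from statistics import NormalDist

def unbiased_proportion_correct(d_prime):
    """Invert d' = sqrt(2) * z(P) for the bias-free case P1 = P2 = P."""
    return NormalDist().cdf(d_prime / math.sqrt(2.0))

for d in (0.0, 1.0, 2.0, 3.0):
    print(d, unbiased_proportion_correct(d))
```

Under this reading, *d*′ = 0 maps to 50% correct, and the proportion correct increases monotonically with *d*′.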

#### **CONSIDERING THE OPTIMAL LISTENER MODEL**

When apical and basal electrodes were presented jointly, all listeners tended to perform similarly to what would be expected based on optimal integration of the ITD information across the two electrode pairs [Equation (3)]; however, subsequent statistical analysis failed to differentiate between true integration and listening to stimuli presented only to the site with better ITD sensitivity.

Pooled across all listeners and the three lowest modulation depths (0, 10, and 20%), theoretically optimal *d*′ and observed dual-site *d*′ were highly correlated (*R*<sup>2</sup> = 0.931, *p* < 0.0001). Moreover, when contrasting predicted and observed performance in the dual-site conditions, repeated measures ANOVA found no significant differences between predicted values and observed values [*F*(1, 5) = 6.380, *p* = 0.53], but did find a significant effect of modulation depth [*F*(2, 10) = 7.758, *p* = 0.009]. However, when comparing better electrode with dual-site performance, those *d*′ values were also highly correlated (*R*<sup>2</sup> = 0.91, *p* < 0.0001), and repeated measures ANOVA also did not reveal a statistically significant difference between the two conditions [*F*(1, 5) = 0.14; *p* = 0.14, and *p* = 0.01, for better electrode vs. joint performance, and *F*(2, 10) = 7.32, *p* = 0.01 for modulation depth]. Thus, two different interpretations are consistent with these findings. One interpretation suggests that the six "good" listeners were indeed able to optimally combine information across pairs of electrodes. Alternatively, these listeners simply extracted information from the electrode pair with better ITD sensitivity.

#### **POORER PERFORMERS**

For the two more poorly performing listeners (subjects IAJ and IBR), *d*′ was overall close to chance (zero), even with ±2000µs ITD (**Figures 2G,H**). At 80 and 100% modulation depth, for the dual-site electrode conditions, subject IAJ's performance was qualitatively better than chance. A paired *t*-test comparing apical, basal, and dual-site performance showed a significant difference between 0 and 100% modulation depth [*t*(df = 2) = −4.5, *p* = 0.05]. However, paired *t*-tests did not reveal significant differences between the three electrode configurations [*t*(df = 3) = −0.2, *p* = 0.9 for basal vs. dual-site; *t*(df = 3) = −1.6, *p* = 0.2 for apical vs. dual-site]. Subject IBR's *d*′-data show that at 80 and 100% modulation depth this listener was somewhat able to discriminate left- and right-leading stimuli. However, *d*′ was negative for the basal conditions, indicating that this listener consistently reported the opposite directions for these sounds. IBR's results did not differ significantly across stimulus conditions [*t*(df = 2) = 0.2, *p* = 0.9, paired *t*-test for 0 vs. 100% modulation depth; *t*(df = 2) = 0.2, *p* = 0.9 for apical vs. dual-site; *t*(df = 2) = 0.4, *p* = 0.8 for basal vs. dual-site]. In summary, the results suggest that of the two more poorly-performing listeners, one was more sensitive to ITDs in the modulated condition than in the unmodulated condition.

#### **ONSET CUES**

For the four best-performing listeners (IAZ, IBB, IBF, and IBK, **Figures 2A–D**), even at 0% modulation depth, performance was better than chance; for each of the electrode configurations, *t*-tests for measured *d*′-values vs. *d*′ = 0 revealed statistically significant results [*t*(df = 5) = 2.6, *p* = 0.05 for basal; *t*(df = 5) = 4.1, *p* = 0.009 for apical; *t*(df = 5) = 4.0, *p* = 0.01 for dual-site]. These results suggest that the four best-performing listeners were able to exploit onset ITD information, consistent with previous findings (Laback et al., 2007; van Hoesel, 2008).

To estimate how well listeners could discriminate ITDs with attenuated onsets, Experiment 2 tested four of the "good" listeners with conditions that were identical to those tested in Experiment 1, except that the stimuli were ramped on and off with 100 ms windows. **Figure 3** shows *d*′-performance for the ramped conditions. The overall pattern in performance is similar to that in **Figure 2**. Performance increases monotonically as a function of modulation depth. Moreover, at each modulation depth, performance in the dual-site electrode configurations is similar to or better than that of the better electrode. Repeated measures ANOVA considering only those modulation depths that had been tested in all listeners and electrode configurations (0, 10, and 20%) found significant effects of modulation depth [*F*(2, 6) = 6.648, *p* = 0.030] and of electrode configuration [*F*(2, 6) = 6.740, *p* = 0.029]. However, *post-hoc* LSD indicated no significant differences between any of the pairwise comparisons across electrode configurations (*p* = 0.091 for basal vs. apical, *p* = 0.692 for basal vs. dual-site, and *p* = 0.055 for apical vs. dual-site). Pairwise comparisons across modulation depths also found no significant differences with *post-hoc* LSD (*p* = 0.137 for 0–10%; *p* = 0.070 for 0–20%; and *p* = 0.088 for 10–20%).

**Figure 4** shows performance in the ramped conditions as a function of performance in the un-ramped conditions (one data point plotted per modulation depth). Overall, performance in the ramped conditions was worse than that in the un-ramped conditions (points fall below the diagonal). Repeated measures ANOVA, considering only those three modulation depths that were tested for all four listeners in both the ramped and un-ramped conditions, found no significant effect of ramping [*F*(1, 3) = 7.619, *p* = 0.07]. Main factors of electrode configuration and modulation depth remained significant [*F*(2, 6) = 10.783, *p* = 0.01; *F*(2, 6) = 11.596, *p* = 0.009 for electrode configuration and modulation depth, respectively]. Note that unlike the data points in **Figure 4**, this analysis only considers performance at or below 20% modulation depth, where all listeners were tested. Although the effect of ramping was not significant, it appears that at least for some listeners ramping can affect performance (e.g., IAZ, where all points fall below the diagonal in **Figure 4A**).

**FIGURE 4 | ITD sensitivity for stimuli with and without ramp, from Experiment 1 and 2, respectively.** Each panel **(A–D)** shows results for one listener. Basal-only, apical-only, and dual-site electrode configurations are shown by the different symbols. Repeated symbols reflect the fact that performance is shown across a range of modulation depths.

## **DISCUSSION**

Natural sounds have broadband energy, giving rise to ITDs along the full spectral range. In order to present binaural cues with fidelity for broadband sounds, it is necessary for CIs to present ITDs concurrently at multiple electrodes that are placed along the cochlear array. Current bilateral CIs present sounds, and therefore ITDs, through envelope information. However, in everyday acoustic settings, sound envelopes are often de-modulated by competing sources and by reverberant energy. In these adverse listening conditions, both CI sound localization and speech intelligibility suffer compared to NH performance (e.g., van Hoesel, 2012). The current study examined how CI listeners were able to combine envelope-ITD information across two concurrent stimulation sites, when modulation depth was parametrically varied. We compared ITD sensitivity across apical and basal cochlear regions when these two regions were stimulated either alone or jointly.

Consistent with early findings for one CI listener (van Hoesel and Tyler, 2003), our results show that ITD sensitivity increased with larger modulation depth for six out of eight CI listeners. Previously, when presented with transposed stimuli, ITD sensitivity in NH listeners decreased and perceived intracranial laterality moved frontally with smaller modulation depth (Bernstein and Trahiotis, 2011). The current results are consistent with the interpretation that, for CI listeners, similar to NH listeners, the intracranial localization strength was more distinctly available with steeper modulations.

There is some disagreement in the literature on CI listeners about whether binaural performance varies with stimulation site (apical or basal). One previous study found no difference across stimulation sites, for high-rate pulse trains (6000 pps) that were sinusoidally modulated at 100 Hz (van Hoesel et al., 2009). The absence of a stimulation site effect is consistent with findings in NH listeners who show similar ITD sensitivity to transposed tones with high frequency carriers and low-frequency pure tones, which stimulate the basal and apical regions of the cochlea, respectively (van de Par and Kohlrausch, 1997). The current study supports those findings. In particular, while each listener had a better electrode pair, there was no consistent across-listener trend that would differentiate apical and basal results: four out of six "good" listeners performed worse at the apical than at the basal pair, whereas for two listeners the reverse was true. Similar findings have been reported for un-modulated low-rate pulsatile stimulation (Litovsky et al., 2010). That study examined basal, middle, and apical sites at 100 pps and found no effect of place. However, other data show that listeners performed better with basal than with apical stimulation, as tested with low-rate pulse trains (100 pps; Best et al., 2011).

Here, basal-alone and apical-alone levels at each electrode pair were usually below C-level for that pair when played in isolation. ITD sensitivity can decrease with decreasing stimulation level (van Hoesel, 2007). Here, we loudness-balanced the stimuli across cochlear stimulation sites. Therefore, it is unlikely that observed differences in ITD sensitivity across basal and apical stimulation sites are an artifact of the stimulus level choices. Instead, differential ITD sensitivity could be due to across-site differences in neural survival and biological or surgical factors, including proximity of electrodes and nerve (Litovsky et al., 2010, 2012). Our results show that in the dual-site electrode configurations, performance was always better than that of the electrode pair producing poorer performance. By loudness-balancing we ensured that performance was not simply dominated by the louder and therefore more salient pair of electrodes.

While previous work demonstrates that interactions across cochlear sites can influence performance for conflicting ITD cues for both acoustic and electric hearing (McFadden and Pasanen, 1975; Best et al., 2011), here, both pairs of electrodes were situated in a more basal position relative to where neurons with best frequencies in the 600–700 Hz range are typically located, and both provided non-conflicting envelope-ITDs, making it unlikely that binaural interference would affect listeners' performance. Indeed, to the extent that binaural interference may have influenced ITD sensitivity, performance in the dual-site conditions should have been worse than that of the single-site basal conditions. No such binaural interference was observed. In fact, the electrode pair with worse performance did not drive performance below that of the electrode pair with good performance in any of the dual-site stimulation conditions, showing that listeners do not suffer from interference even when one electrode pair provides poorly represented spatial information.

An important issue regarding many previous studies with bilaterally implanted CI users is the consideration of subject selection criteria. In a number of prior studies, listeners had been pre-selected based on their abilities to perform a binaural task with low-rate stimuli (e.g., Laback and Majdak, 2008; van Hoesel, 2008; van Hoesel et al., 2009). Here, listeners were not pre-selected according to performance criteria, which may help explain the fact that some listeners could not perform the task with 1000µs at 100% modulation depth, even though this has not been reported in the literature before.

A potential confound to note is that modulation depth was measured in percentage of the dynamic range between *I*min and *I*max, and this range differed across listeners (**Table 2**). A listener with a large difference between *I*min and *I*max may have performed better because the sounds were modulated over a wider dynamic range than for a listener with a small dynamic range. However, because the dynamic range was not significantly correlated with task performance, we deem this possibility unlikely.

Some listeners were able to perform the ITD discrimination task even at 0% modulation depth. This is consistent with previous work demonstrating that some listeners can utilize onset cues of high-rate pulse trains to discriminate ITDs, even in the absence of post-onset ITD information (Laback et al., 2007; van Hoesel, 2008; Noel and Eddington, 2013). Similarly, in another previous study using an ITD discrimination task, when tested with 800 and 1200 pps stimuli and 600µs ITD, CI listeners could perform the task above chance (Laback and Majdak, 2008). Note that in these prior studies, CI listeners were discriminating between 0 ITD and left or right-leading ITD in a 2AFC paradigm (Laback et al., 2007; Laback and Majdak, 2008; Noel and Eddington, 2013), or they were asked to indicate perceived intracranial position (van Hoesel, 2008). Here, CI listeners judged the direction of a change in ITD of 2000 or 4000µs, so the task was considerably easier than in those previous studies.

Experiment 2 tested the usefulness of onset cues by ramping the sounds on and off. Ramping altered the stimuli in two ways. It should have resulted in a more staggered stimulation of the neural population than with the non-ramped stimuli, reducing the potential usefulness of onset cues. In addition, the ramp shortened the perceived duration of the sound. Here, when stimuli were ramped, one listener (IAZ) performed clearly worse than when stimuli were not ramped, whereas other listeners showed less appreciable change in performance. Here, because the ±1000µs ITD was an integer multiple of the period of the 1000 pps carrier signal, ongoing temporal fine structure ITDs did not provide unambiguous information about source direction. The relatively robust performance with attenuated onsets is consistent with the previous finding that some listeners can utilize ongoing envelope-ITDs (Noel and Eddington, 2013). Moreover, it suggests that strategies for weighting the onset and running portions of each stimulus differ across listeners. These results are relevant when considering previous work on onset weighting. Using stimuli with a dichotic onset pulse followed by three diotic pulses, Laback et al. (2007) demonstrated that with increasing pulse rate CI listeners tend to rely more strongly on onset cues. Similarly, post-onset pulses exerted stronger effects on CI listeners' ITD sensitivity at 100 pps than they did at 300 and 600 pps (van Hoesel, 2008). These studies share that onset and post-onset pulses carried conflicting ITD information. In contrast, here, all pulses had the same ITD. Therefore, here, post-onset pulses are likely to have helped listeners perform the ITD discrimination task, even when onsets were attenuated.

Recent findings suggest that binaural sensitivity can be poorer when nearby electrodes cause energetic masking (Lu et al., 2010). Moreover, in an ITD discrimination task, the binaural bandwidth eliciting a robust ITD on a target electrode was estimated to be about five times greater in CI than in NH (Poon et al., 2009). Specifically, to decrease ITD sensitivity by a factor of 2 (binaural half-width), the mismatch in cochleotopic position across ears in CI users is 3.7 mm as opposed to 0.7 mm in NH listeners (Poon et al., 2009). This widened binaural bandwidth in CI users as compared to NH listeners could imply higher susceptibility to energetic masking also within one ear. Furthermore, in a binaural masking level difference task, CI listeners' abilities to detect a tone on a target electrode pair improved when fewer electrodes carried the masking signal, even when those additional masking electrodes were outside the nominal critical band for the target electrode (Lu et al., 2010). In addition, the extent of channel interaction, as estimated from auditory nerve evoked potentials in several listeners, was negatively correlated with binaural benefit (Lu et al., 2010). These studies show that peripheral interaction can impact binaural sensitivity, especially when spacing between neighboring electrodes is small. To limit the impact of performance asymmetries across the cochlea due to energetic masking, here, we chose electrodes that were spaced relatively widely. Still, in Experiment 1, each listener had a better electrode pair, similar to previous reports (Laback et al., 2011). Specifically, for three listeners who were tested in both experiments, that better pair was the same across both experiments. Often the differences between apical and basal stimulation within that listener were pronounced with a sizeable difference in midpoints of the underlying psychometric functions. 
An additional analysis using dB modulation depth along the abscissa (not shown) did not produce qualitatively different trends, nor could we identify a parameter in the level map of the CI listeners that could explain these within-listener differences.

#### **SUMMARY AND CONCLUSIONS**

The current study reinforces and extends findings from previous studies on sensitivity to envelope ITD in bilateral CI listeners. Specifically, when discriminating envelope-ITDs of 100 Hz modulated high-rate pulse trains, a CI listener's performance improves with increasing modulation depth. Moreover, consistent with previous work, most CI listeners performed clearly better at one of the two stimulation sites (Laback et al., 2007; van Hoesel, 2008). However, which of the two sites, apical or basal, produced higher percent correct scores was not consistent across listeners. Finally, some listeners could discriminate 1000µs ITDs for an unmodulated 1000 pps train, whereas others struggled with this task. Moreover, when comparing the ramped to the non-ramped conditions, listeners varied in their ability to utilize onset cues. Together, these findings add to the evidence from previous work that listeners differ in how strongly they weight onset and ongoing cues of the stimulus (Laback et al., 2007; van Hoesel, 2008).

Results show that CI listeners did not perform worse when two electrode pairs jointly transmitted ITD information, compared to listening to the better of the two pairs. This finding is consistent with the interpretation that CI listeners either truly integrated ITD information across the two stimulation sites or that they performed based on the electrode pair carrying more salient ITD information. Performance asymmetries between the two stimulation sites make it difficult to conclusively differentiate between these two explanations. Together these findings provide evidence that CI listeners can sustain ITD sensitivity for two-site stimulation compared to single-site stimulation. The current results are encouraging in that no interference from the worse electrode pair was observed.

#### **REFERENCES**


time and effect of interaural electrode spacing. *J. Acoust. Soc*. 126, 806–815. doi: 10.1121/1.3158821


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 September 2013; accepted: 29 January 2014; published online: 11 March 2014.*

*Citation: Ihlefeld A, Kan A and Litovsky RY (2014) Across-frequency combination of interaural time difference in bilateral cochlear implant listeners. Front. Syst. Neurosci. 8:22. doi: 10.3389/fnsys.2014.00022*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2014 Ihlefeld, Kan and Litovsky. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Diffusion tensor imaging in children with unilateral hearing loss: a pilot study

#### *Tara Rachakonda<sup>1</sup>, Joshua S. Shimony<sup>2</sup>, Rebecca S. Coalson<sup>2</sup> and Judith E. C. Lieu<sup>1</sup>\**

*<sup>1</sup> Department of Otolaryngology-Head and Neck Surgery, Washington University School of Medicine, St. Louis, MO, USA*

*<sup>2</sup> Department of Radiology, Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO, USA*

*<sup>3</sup> Departments of Neurology and Radiology, Washington University School of Medicine, St. Louis, MO, USA*

#### *Edited by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

#### *Reviewed by:*

*Fatima T. Husain, University of Illinois at Urbana-Champaign, USA; Pim Van Dijk, University Medical Center Groningen, Netherlands*

#### *\*Correspondence:*

*Judith E. C. Lieu, Department of Otolaryngology-Head and Neck Surgery, Washington University School of Medicine, 660 S. Euclid Ave., Campus Box 8115, St. Louis, MO 63110, USA e-mail: lieuj@ent.wustl.edu*

**Objective**: Language acquisition was assumed to proceed normally in children with unilateral hearing loss (UHL) since they have one functioning ear. However, children with UHL score poorly on speech-language tests and have higher rates of educational problems compared to normal hearing (NH) peers. Diffusion tensor imaging (DTI) is an imaging modality used to measure microstructural integrity of brain white matter. The purpose of this pilot study was to investigate differences in fractional anisotropy (FA) and mean diffusivity (MD) in hearing- and non-hearing-related structures in the brain between children with UHL and their NH siblings.

**Study Design**: Prospective observational cohort.

**Setting:** Academic medical center.

**Subjects and Methods**: Sixty-one children were recruited, tested and imaged. Twenty-nine children with severe-to-profound UHL were compared to 20 siblings with NH using IQ and oral language testing, and MRI with DTI. Twelve children had inadequate MRI data. Parents provided demographic data and indicated whether children had a need for an individualized educational program (IEP) or speech therapy (ST). DTI parameters were measured in auditory and non-auditory regions of interest (ROIs). Between-group comparisons were evaluated with non-parametric tests.

**Results**: Lower FA of left lateral lemniscus was observed for children with UHL compared to their NH siblings, as well as trends toward differences in other auditory and non-auditory regions. Correlation analyses showed associations between several DTI parameters and outcomes in children with UHL. Regression analyses revealed relationships between educational outcome variables and several DTI parameters, which may provide clinically useful information for guidance of speech therapy.

**Discussion/Conclusion**: Our data suggest that white matter microstructural patterns in several brain regions are preserved despite unilateral rather than bilateral auditory input, which contrasts with findings in patients with bilateral hearing loss.

#### **Keywords: hearing loss, unilateral hearing loss, diffusion tensor imaging, children, magnetic resonance imaging**

## **INTRODUCTION**

Although unilateral hearing loss (UHL) often goes undetected until children begin school, prevalence rates of UHL in newborns range from 0.04 to 3.4% (Mehl and Thomson, 1998; Widen et al., 2000). Studies have estimated the prevalence of UHL in school-aged children (ages 6–19 years) as high as 5% (Niskar et al., 1998). It was previously assumed that because children with UHL had one functioning ear, speech acquisition and language comprehension skills developed normally. However, several studies have suggested otherwise (Davis et al., 1981; Bess and Tharpe, 1984; Lieu et al., 2010). In one study, when compared with their normal hearing (NH) siblings, children (ages 6–12 years) with UHL had substantially worse oral language scores (Lieu et al., 2010). Other investigators have found evidence of increased rates of behavioral problems, a greater need for intensive educational plans and worse performance in school amongst children with UHL (Bess, 1982; Bess and Tharpe, 1984; Davis et al., 2002).

Presumably, problems with sound localization and the need for a higher signal-to-noise ratio for speech comprehension in children with UHL contribute to their delay in language skills, but they do not explain why some of these children experience behavioral challenges (Bess and Tharpe, 1984; Davis et al., 2002). These difficulties may reflect problems with executive functioning rather than difficulty with auditory processing, similar to children with bilateral hearing loss who may encounter executive functioning difficulties (Figueras et al., 2008; Beer et al., 2011).

Whether a "right ear advantage" (REA) exists for speech perception has been controversial (Hugdahl, 2011). Although children with UHL have performed worse on speech-language tests, it is less clear that the side of hearing impairment influences cognitive abilities. According to the REA hypothesis, children with right UHL should have greater difficulties with language skills than those with left UHL. Some have postulated that the right hemisphere preferentially processes spectrally complex sounds (e.g., music), whereas the left hemisphere processes temporally complex sounds (e.g., speech) (Penhune et al., 1996). Older studies point to a higher rate of grade failures and worse verbal test performance in children with right UHL (Bess and Tharpe, 1984; Hartvig et al., 1989; Niedzielski et al., 2006). However, a large case-control study did not find intellectual differences based upon side of hearing impairment (Lieu et al., 2010).

In this study, we sought to find a neuroanatomical basis for educational differences. In a prior investigation in this study cohort, differences in resting state functional connectivity MRI (rs-fcMRI) were found in areas associated with auditory processing, executive function and memory formation between children with UHL and NH controls (Tibbetts et al., 2011). MRI provides a non-invasive means by which to examine the brain for gross anatomical changes, functional changes in activation (fMRI), and microstructural changes (diffusion tensor imaging, DTI) in numerous nervous system pathologies (Rykhlevskaia et al., 2008), and more specifically in hearing loss (Chang et al., 2004; Firszt et al., 2006; Lin et al., 2008; Kim et al., 2009; Propst et al., 2010). DTI measures of microstructural damage in white matter have been associated with behavioral measures in numerous conditions, including blindness (Shimony et al., 2006), depression (Shimony et al., 2009), traumatic brain injury (Hulkower et al., 2013), and phenylketonuria (Antenor-Dorsey et al., 2013). DTI measures the diffusion of water molecules in brain tissues (Shimony et al., 2006). Water in brain tissues diffuses faster parallel to white matter tracts than perpendicular to them, a property known as anisotropy. Anisotropy has been used to investigate the integrity and course of white matter tracts in the brain. Common DTI parameters include fractional anisotropy (FA), which ranges from 0 for equal diffusion in all directions to 1 for diffusion along only one axis, and the mean diffusivity (MD), which measures how easily water diffuses averaged over all directions and is expressed in units of area per unit time (e.g., mm<sup>2</sup>/s). Changes in diffusivity or loss of directionality suggest lack of microstructural integrity (Mori and Zhang, 2006; Assaf and Pasternak, 2008; Neil, 2008). Restricted diffusion has been identified in several types of disease processes, including acute demyelination, certain brain tumors and acute ischemia (Mukherjee et al., 2008).
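The FA and MD definitions given above can be written out from the three eigenvalues of the diffusion tensor. The sketch below uses the standard textbook formulas; the function names are illustrative, and nothing here reproduces the authors' own software.

```python
import math


def mean_diffusivity(eigenvalues):
    """Average of the three diffusion-tensor eigenvalues (same units as input)."""
    l1, l2, l3 = eigenvalues
    return (l1 + l2 + l3) / 3.0


def fractional_anisotropy(eigenvalues):
    """Standard FA: 0 for isotropic diffusion, 1 for diffusion along one axis."""
    l1, l2, l3 = eigenvalues
    md = mean_diffusivity(eigenvalues)
    # Spread of the eigenvalues around their mean, normalized by their magnitude.
    numerator = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    denominator = l1 ** 2 + l2 ** 2 + l3 ** 2
    if denominator == 0:
        return 0.0  # no diffusion at all; treat as isotropic
    return math.sqrt(1.5 * numerator / denominator)
```

As an illustration with made-up eigenvalues, `(1.7e-3, 0.3e-3, 0.3e-3)` mm<sup>2</sup>/s gives an FA of about 0.80, a strongly directional profile of the kind expected in a coherent white matter tract, whereas three equal eigenvalues give an FA of 0.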

A few studies have used DTI to investigate the neuroanatomic properties of auditory processing regions in patients with hearing loss. These studies focused mainly upon subcortical structures and adults with hearing loss (Chang et al., 2004; Lutz et al., 2007; Lin et al., 2008; Kim et al., 2009; Wu et al., 2009). In light of DTI changes reported in prior studies of hearing loss, the goal of this study was to determine how UHL influences the development of white matter tracts in the brain. Our aims were to compare the microstructural integrity of white matter tracts using DTI in hearing-related and non-hearing-related brain regions between children with UHL and their NH siblings; and to examine relationships between various educational outcome variables and the DTI parameters (FA and MD) of hearing-related regions.

## **MATERIALS AND METHODS**

#### **STUDY PARTICIPANTS**

Sixty-one participants, comprising children with severe-to-profound UHL and their NH siblings, were enrolled. Participants with UHL were defined as having severe-to-profound sensorineural UHL by pure tone audiometry, with pure tone averages (PTA) ≥70 dB hearing level (HL) in the affected ear. Participants with NH had PTA < 20 dB HL in both ears. All subjects underwent audiometry, cognitive testing for IQ scores, language testing, and MRI scanning. Imaging data from 12 children had to be excluded from analysis, due to movement artifact (*n* = 1), inability to lie still in the scanner (*n* = 1), inadequate data acquisition (*n* = 7) and inadequate data due to the presence of metallic devices (orthodontic appliances, *n* = 2, and a bone-anchored hearing aid, *n* = 1). Thus, the analysis included 29 children with UHL (13 right UHL, 16 left UHL) and 20 siblings with NH. Children with both acquired and congenital UHL were included. All subjects were cognitively normal per parent report.
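The audiometric group definitions above can be expressed as a small decision rule. This is only a sketch under stated assumptions: the text does not say which frequencies enter the PTA, nor what criterion the unaffected ear of a child with UHL had to meet, so the code averages whatever thresholds are supplied and assumes the unaffected ear satisfies the NH criterion. All names are hypothetical.

```python
from statistics import mean

UHL_MIN_PTA_DB = 70.0  # affected ear: PTA >= 70 dB HL (from the text)
NH_MAX_PTA_DB = 20.0   # normal ear: PTA < 20 dB HL (from the text)


def pure_tone_average(thresholds_db):
    """Mean of the supplied pure-tone thresholds in dB HL."""
    return mean(thresholds_db)


def classify_hearing(pta_left_db, pta_right_db):
    """Return 'NH', 'left UHL', 'right UHL', or 'not eligible'."""
    left_normal = pta_left_db < NH_MAX_PTA_DB
    right_normal = pta_right_db < NH_MAX_PTA_DB
    if left_normal and right_normal:
        return "NH"
    # Assumption: the unaffected ear of a UHL participant meets the NH criterion.
    if left_normal and pta_right_db >= UHL_MIN_PTA_DB:
        return "right UHL"
    if right_normal and pta_left_db >= UHL_MIN_PTA_DB:
        return "left UHL"
    return "not eligible"
```

The participant counts reported above are also internally consistent: 61 enrolled minus 12 excluded leaves 49, which matches the 29 children with UHL plus 20 NH siblings that entered the analysis.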

This study protocol was approved by the Human Research Protections Office (HRPO) at the Washington University School of Medicine. Informed consent was obtained from all parents of subjects and pediatric assent obtained from all minor participants.

#### **BASELINE VARIABLES**

Parents provided demographic, health, and family data. Parents also indicated whether children required speech therapy, special educational accommodations or special services at school, or an individualized educational program (IEP).

#### **MEASURED OUTCOME VARIABLES**

Cognitive ability was assessed by the Wechsler Abbreviated Scale of Intelligence (WASI) (Wechsler, 1999). We used standardized performance IQ, verbal IQ, and full scale IQ scores along with vocabulary T-scores as educational outcomes. All scores were standardized to a mean of 100 with a standard deviation (SD) of 15, except for the vocabulary T-score, which is standardized to a mean of 50 and SD of 10. Two standardized language tests were administered to the participants: the Oral and Written Language Scales (OWLS) and the Clinical Evaluation of Language Fundamentals (CELF) (Carrow-Woolfolk, 1995; Semel et al., 2011). Because both language tests use a standardized scale, they were combined as language outcomes for analysis.
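The two scales mentioned above (mean 100, SD 15 for IQ and language scores; mean 50, SD 10 for the vocabulary T-score) are related through the z-score. The sketch below only illustrates that relationship; it does not imply that the authors converted any scores, and the function names are hypothetical.

```python
def to_z(score, mean, sd):
    """Express a score as standard deviations from the scale mean."""
    return (score - mean) / sd


def from_z(z, mean, sd):
    """Map a z-score back onto a scale with the given mean and SD."""
    return mean + z * sd


def t_score_to_standard(t_score):
    """Place a T-score (mean 50, SD 10) on the mean-100, SD-15 scale."""
    return from_z(to_z(t_score, 50.0, 10.0), 100.0, 15.0)
```

A vocabulary T-score of 60, for example, lies one SD above the mean and therefore corresponds to 115 on the IQ-style scale.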

#### **MRI SCANNING PROTOCOL AND IMAGE ACQUISITION**

Images were obtained on a 3.0 Tesla Siemens Trio MR scanner (Erlangen, Germany). The imaging protocol included a T1-weighted Magnetization Prepared Rapid Acquisition Gradient Echo (MPRAGE) sequence [time of repetition (TR)/inversion time (TI)/echo time (TE) = 2400/1000/3.16 ms, voxel size = 1 × 1 × 1 mm<sup>3</sup>] and a T2-weighted (T2W) fast spin echo (FSE) scan (TR = 4380 ms, TE = 94 ms, 1 × 1 × 4 mm). DTI data (1.5 × 1.5 × 1.5 mm voxels, TR = 9600 ms, TE = 95 ms) was collected in 25 different directions with *b*-values linearly distributed between 0 and 1400 s/mm<sup>2</sup>.

The images were preprocessed and transformed into a modified Talairach space (Talairach and Tournoux, 1988) using the following methods. A 9-parameter rigid body alignment registered all frames in all runs and was used for motion correction of each subject. Resampling was done by a 3-dimensional cubic spline interpolation, and images were transformed to Talairach atlas space via a warping mechanism using a single common atlas derived from adult and child brains (Burgund et al., 2002). After the registration steps, the diffusion data was converted into DTI parameter data with locally written software that fits the commonly used tensor model (Basser et al., 1994) using a log-linear algorithm.
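For readers unfamiliar with log-linear tensor estimation, the idea is that the tensor model S = S0 · exp(−b · gᵀDg) becomes linear in the unknowns after taking logarithms, so the six tensor elements and ln(S0) can be obtained by ordinary least squares. The following pure-Python sketch shows the generic technique for a single voxel. It is not the authors' locally written software, and all function names are hypothetical.

```python
import math


def design_row(b, g):
    """Row of the log-linear design matrix for b-value b and unit direction g."""
    gx, gy, gz = g
    return [1.0, -b * gx * gx, -b * gy * gy, -b * gz * gz,
            -2 * b * gx * gy, -2 * b * gx * gz, -2 * b * gy * gz]


def solve_linear(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def fit_tensor_log_linear(b_values, directions, signals):
    """Least-squares fit of ln(S) = ln(S0) - b * g^T D g for one voxel.

    Returns (S0, [Dxx, Dyy, Dzz, Dxy, Dxz, Dyz]). Needs at least seven
    measurements whose design rows are linearly independent.
    """
    rows = [design_row(b, g) for b, g in zip(b_values, directions)]
    y = [math.log(s) for s in signals]
    k = 7
    # Normal equations: (A^T A) x = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coeffs = solve_linear(ata, aty)
    return math.exp(coeffs[0]), coeffs[1:]
```

The eigenvalues of the fitted tensor then yield FA and MD. Solving the normal equations directly keeps the sketch dependency-free; a production pipeline would more likely use a numerically sturdier solver (QR or SVD) and often a weighted fit, since taking logarithms distorts the noise at low signal levels.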

#### **REGIONS OF INTEREST**

The 15 regions of interest selected for analysis are listed in **Table 1** and fall into two groups. The first are regions that are known to be involved in auditory processing. The second group are nonauditory regions that are considered important white matter areas that are commonly sampled in DTI studies of the brain.

Auditory regions (**Figures 1A,B**) include: Auditory radiation; Heschl's gyrus (both gray and subcortical white matter); Inferior colliculus; Lateral lemniscus; Superior temporal gyrus (gray matter only).

Non-auditory regions (**Figures 1C–F**, **2A–C**) include: Genu and splenium of the corpus callosum; Middle cerebellar peduncle; Globus pallidus; Putamen; Anterior corona radiata; Anterior limb of the internal capsule; Uncinate fasciculus; and Inferior longitudinal fasciculus. These regions are commonly sampled in DTI studies due to their importance in brain function and chosen as "control" regions where no differences would be expected. The latter four were also chosen due to their involvement in executive functioning (Cummings, 1993; Schmahmann et al., 2008).

After the MPRAGE scans were transformed into Talairach space, regions were located and sampled over three to four slices each by one independent rater under the supervision of an experienced neuroradiologist. ROI location was traced on the MPRAGE anatomy, guided by coregistered T2-weighted images to avoid cerebrospinal fluid and by FA images to capture fiber centers. The FA and MD values in the ROIs were measured and averaged over consecutive slices for a more accurate measurement than a single slice would give, thus minimizing the risk of type I error. The approximate MNI coordinates for these ROIs are listed in **Table 1** (each subject's regions are slightly different).

#### **Table 1 | MNI coordinates for regions of interest (ROI) used in the study.**


*Average size was measured in mm<sup>3</sup>.*

**FIGURE 1 | Regions of interest (ROIs) used for DTI measurements placed on anatomical T1-weighted MRI scan. (A)** Four auditory ROIs, Gray matter of Heschl's gyrus (GM-HG), White matter of Heschl's gyrus (WM-HG), Superior temporal gyrus (STG), and Auditory radiation (AR). **(B)** Two auditory ROIs, Inferior colliculus (IC), Lateral lemniscus (LL). **(C)** Three non-auditory ROIs, Putamen (PUT), Globus pallidus (GP), Posterior limb of the internal capsule (PLIC), Anterior limb of internal capsule (ALIC). **(D)** Two non-auditory ROIs, Genu of the corpus callosum (G-CC), Splenium of the corpus callosum (S-CC). **(E)** Non-auditory ROI, Middle cerebellar peduncle (MCP). **(F)** Non-auditory ROI, Middle cingulate gyrus (MCG).

#### **STATISTICAL ANALYSIS**

Statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS), Version 20.0 (IBM Corp., Armonk, NY, USA). Educational performance variables were compared with Mann-Whitney *U*-tests between the UHL and NH groups and between the right and left UHL groups. Pearson's chi-square test was used to compare the need for speech therapy and/or an IEP between groups. We used a Bonferroni correction of 0.05/6 = 0.008 for a two-sided *alpha* threshold for the six educational outcomes assessed; a *p*-value below 0.05 that did not meet this threshold was considered a trend.

Kruskal-Wallis tests provided between-group comparisons of DTI parameters. Paired Wilcoxon signed-rank tests of DTI parameters between sides (right and left) were performed within each of the groups. Correlations between educational variables and DTI parameters were explored using Spearman's rho coefficient. Because of concerns about multiple comparisons, we used a Bonferroni correction of 0.05/12 = 0.004 for a two-sided *alpha* threshold for auditory ROIs (6 ROIs × 2 hemispheres), and 0.05/18 = 0.003 for non-auditory ROIs (9 ROIs × 2 hemispheres); a *p*-value below 0.05 that did not meet the corrected threshold was considered a trend.
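The Bonferroni thresholds quoted in this section follow from dividing the nominal alpha of 0.05 by the number of comparisons in each family (6 educational outcomes, 12 auditory ROI measurements, 18 non-auditory ROI measurements). A brief sketch of that arithmetic and of the significant/trend labelling, with hypothetical function names:

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison alpha after Bonferroni correction."""
    return alpha / n_comparisons


def label_p_value(p, n_comparisons, alpha=0.05):
    """Label a p-value using the significance and trend rules described in the text."""
    if p < bonferroni_threshold(alpha, n_comparisons):
        return "significant"
    if p < alpha:
        return "trend"
    return "not significant"
```

The unrounded thresholds are 0.05/6 ≈ 0.0083, 0.05/12 ≈ 0.0042 and 0.05/18 ≈ 0.0028, which round to the 0.008, 0.004 and 0.003 reported in the text. Under these rules a *p*-value of 0.02 for an auditory ROI would be labelled a trend rather than a significant difference.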

Because multiple variables can predict outcomes, we used multiple linear or logistic regression modeling to evaluate the role of age and UHL simultaneously with the specific DTI parameters which were statistically significant or trending toward significance in bivariate analysis on the educational outcomes of interest. For the regression analyses, a *p*-value less than 0.05 was considered statistically significant.

## **RESULTS**

Demographic data is presented in **Table 2**. The age of study participants ranged from 7.4 to 17.6 years. There were no significant differences in gender, age, racial composition, handedness or rates of prematurity between the NH and UHL groups. Etiology of hearing loss did not differ significantly between the right UHL and left UHL groups. Age at identification of hearing loss was 3 months to 8 years with a mean of 4.6 years (*SD* 2.8 years); mean duration of UHL was 7.5 years (*SD* 3.3 years), with a range of 2.2–14.2 years.

Though NH subjects performed better on all aspects of the cognitive testing, these differences were not all statistically significant (**Table 3**). No differences in full scale IQ were observed between the right and left UHL groups. However, NH individuals trended toward higher scores on the verbal component of the cognitive tests (**Table 3**). In addition, children with UHL had a significantly greater need for IEPs (45%) and speech therapy (41%) than children with NH (5% for both IEPs and speech therapy).

#### **DTI PARAMETERS BETWEEN GROUPS**

DTI parameters were compared separately on the left and right side of the brain; for instance, the FA of the right auditory radiation of the UHL group vs. that of the NH group (**Table 4**). FA values trended toward being higher in the NH group than in the UHL group in two auditory regions and two non-auditory regions. The FA of the left lateral lemniscus was significantly higher for NH compared to UHL. The only trends between groups among mean diffusivities were in the MD of the right subcortical white matter of Heschl's gyrus and the left putamen. In general, FA was greater and MD lower in the NH group than in the UHL group, but this did not reach statistical significance in most regions.

We also compared DTI parameters of the right UHL and left UHL groups (**Table 4**). The FA of the left lateral lemniscus and the FA of the left subcortical white matter of Heschl's gyrus trended toward being lower in the left UHL group. The FA of the right centrum semiovale trended toward being higher in the left UHL group.

#### **DTI PARAMETERS BETWEEN SIDES**

Paired comparisons between the left and right sides of structures were performed for the UHL and NH groups. In all UHL subjects, the right side FA was greater than the left side FA in both auditory and non-auditory regions, with the exception of the superior temporal gyrus, middle cingulate gyrus, anterior limb of the internal capsule, centrum semiovale and the anterior corona radiata (**Table 5**). Amongst NH subjects, a similar pattern was observed with a few exceptions—superior temporal gyrus, Heschl's gyrus, anterior limb of the internal capsule, middle cingulate gyrus, and the anterior corona radiata. MD results generally corresponded to FA results but in the opposite direction. For both NH and UHL subjects, right sided MD was significantly lower than left sided MD in most structures.

#### **CORRELATIONS**

Correlations of educational outcome variables with DTI parameters were performed separately for the following groups: (1) All NH subjects; (2) All UHL subjects; (3) All Right UHL subjects; and (4) All Left UHL subjects.

In general, FA correlated positively and MD negatively with test scores for children with UHL (data not shown). Though there were many trends toward significance in the NH group, there were no significant correlations. In the UHL group, the FA and the MD of several ROIs correlated significantly with several different educational parameters, such as full IQ, verbal IQ, IEP, and speech therapy (**Table 6**). The FA of the left uncinate fasciculus was negatively correlated with Performance and Full IQ, the MD of the right posterior limb of the internal capsule was negatively associated with Verbal IQ, while the FA of the left middle cerebellar peduncle was associated with need for speech therapy. For children with right UHL, the MD of the right middle cerebellar peduncle was highly correlated with Verbal IQ, and the FA of both right and left uncinate fasciculus were highly correlated with both Performance and Full IQ. For children with left UHL, both the MD and FA of the left middle cerebellar peduncle were strongly correlated with speech therapy.

#### **REGRESSION ANALYSES**

As DTI parameters have been noted to change with age (Loenneker et al., 2011), we performed multiple linear regression analyses for regions with significant correlations with educational outcome variables as the dependent variable; age, hearing loss status and one DTI parameter (either FA or MD) served as independent variables. As with the correlation analyses, FAs were generally positively related to test scores while MDs were negatively associated with test scores (**Table 7**).

Multiple logistic regression analyses were performed similarly for IEP and speech therapy as the dependent variables. The FAs of the bilateral middle cerebellar peduncles were negatively related to the need for speech therapy (**Table 8**). Hearing loss status was associated with educational outcome variables in several models.

## **DISCUSSION**

Our study is one of the first to apply DTI analysis to children with UHL. Consistent with previous research, children with UHL had significantly worse speech and language scores and required speech therapy and IEPs more often than children with NH (Lieu et al., 2010). Some significant differences were found between DTI parameters of the NH and UHL groups, or between right UHL, left UHL, and NH groups of children. Left-right asymmetries noted in the NH children were retained in the children with UHL.

The main advantage and the most interesting results from our study come from the inclusion of multiple educational outcome variables. DTI parameters in several ROIs were significantly correlated with educational outcome variables. Interestingly, the greater the FA of left Heschl's gyrus, the less likely a child needed an IEP, indicating that the greater the organization in this region, the better the educational outcome (**Tables 6**, **8**). Notably, IEPs are provided through public schools only when children are diagnosed with an educationally significant problem, such as reading delays or behavioral problems; speech delays are only one diagnosis that would elicit an IEP. In contrast, speech therapy may be pursued privately, especially when a child attends a private school. Heschl's gyrus is tonotopically organized, and monaural deprivation disrupts this organization (Popescu and Polley, 2010). Although this area is anatomically crucial in auditory functioning, the association between test scores and the microstructure of this region requires further confirmation and validation in future studies. As neuroimaging is often an early step in the etiologic work-up of pediatric UHL, acquisition of DTI would be a feasible addition to the protocol and may provide clinically relevant information in regard to special educational needs.

**Table 2 | Demographic characteristics of normal hearing (NH) children and children with unilateral hearing loss (UHL).**

**Table 3 | Comparisons between unilateral hearing loss (UHL) group and the normal hearing group (NH) for educational outcome variables, and right unilateral hearing loss (right UHL) and left unilateral hearing loss (left UHL) groups.**

*\*Trend at p < 0.05; †Significant at p < 0.008 for multiple comparisons of educational outcomes.*

The dichotic listening paradigm proposed by Kimura (1963) held that the right ear was preferred for listening to speech, due to the predominant representation of a right ear stimulus in the left cerebral hemisphere, where language typically lateralizes. According to the right ear advantage hypothesis, children with left UHL should enjoy a speech-language advantage, but the evidence is currently inconclusive (Hartvig et al., 1989; Niedzielski et al., 2006; Lieu et al., 2010). The only DTI differences observed between right and left UHL were increased FAs for right UHL in the left lateral lemniscus and the subcortical white matter of the left Heschl's gyrus.


**Table 4 | DTI parameters in those with unilateral hearing loss (UHL;** *n* **= 29) and normal hearing (NH;** *n* **= 20) participants amongst six auditory regions of interest (ROI) and nine non-auditory ROIs.**

*Additional columns list values for right unilateral hearing loss (Right UHL; n* = *13) and left unilateral hearing loss (Left UHL; n* = *16); adjacent p-value column refers to p-values of Kruskal-Wallis tests between right UHL, left UHL and normal hearing (NH). Only parameters with uncorrected p-values < 0.05 are listed. \*Trend at p < 0.05 level; †Significant at p < 0.004 for auditory ROIs and <0.003 for non-auditory ROIs.*

**Table 5 | Comparison of fractional anisotropy (FA) and mean diffusivity (MD) between right and left hemispheres for unilateral hearing loss (UHL;** *n* **= 29) and normal hearing (NH;** *n* **= 20) participants in six auditory and nine non-auditory regions of interest (ROI).**


*Only parameters with uncorrected p-values < 0.05 are listed.*

*\*Trend at p < 0.05 level; †Significant at p < 0.004 for auditory ROIs and <0.003 for non-auditory ROIs.*

Instead, several DTI regions showed differential strengths in correlation with language and verbal IQ outcomes in children with and without UHL. These discordances suggest that the brains of children with UHL undergo reorganization in the white matter to help compensate for the lack of typical peripheral auditory stimuli to the contralateral hemisphere. Thus, the dichotic listening paradigm may not fit when one ear does not hear.

**Table 6 | Spearman rho correlation values between educational outcomes and DTI parameters (fractional anisotropy, FA; mean diffusivity, MD) in auditory and non-auditory regions of interest (ROI) in participants with normal hearing (NH), unilateral hearing loss (UHL), right UHL or left UHL.**


*Statistical significance was considered to be p < 0.004 for auditory ROIs and p < 0.003 for non-auditory ROIs.*

*\*p < 0.05; †p < 0.01; ††p < 0.001.*

*IQ, intelligence quotient; IEP, individualized education plan.*

Regardless of hearing status, rightward asymmetries were observed in FA in many auditory structures. In non-auditory structures, asymmetries in DTI parameters were mixed, making interpretation of our results in large structures such as the centrum semiovale and posterior limb of the internal capsule less clear (Bonekamp et al., 2007; Eluvathingal et al., 2007; Liu et al., 2010; Takao et al., 2011). In auditory structures, our findings indicate that despite lack of auditory input from one ear, patterns of microstructural integrity are preserved in children with UHL. These results contrast with stimuli-induced functional MRI findings. When sounds or speech are presented monaurally to a NH individual, there is increased activation in the contralateral primary auditory cortex (Scheffler et al., 1998). Adults with UHL have a more symmetrical fMRI activation, with decreased contralateral primary auditory cortex activation and increased ipsilateral primary auditory cortex activation, when presented with a speech stimulus to the hearing ear (Firszt et al., 2006). While we would have expected more white matter microstructural differences in auditory structures between children with and without UHL, the lack thereof may be due to compensatory plasticity in the face of monaural deprivation, with these structures transmitting signal from the good ear, or being recruited for other brain functions. It is unclear whether UHL affects attentional networks and other aspects of executive functioning. In another study looking at fMRI data in children with and without UHL, children with UHL were found to have less activation of secondary auditory areas compared to NH individuals and failure to activate attentional areas (Propst et al., 2010). In the present cohort, differences in rs-fcMRI were found in areas associated with auditory processing, executive function and memory formation between children with UHL and NH controls (Tibbetts et al., 2011). 
Since the neuroanatomical microstructure remains intact, it is possible that these underutilized auditory areas have been recruited by other systems in the brain, such as noted by Obretenova et al. (2010), or analogously as has been shown to occur in the occipital cortex of blind subjects (Burton et al., 2012). An early-deaf and early-blind individual who relied on tactile communication modalities was found to have enhanced occipital connectivity as well as greater activation of superior temporal and inferior frontal language regions on fMRI relative to a normally sighted and hearing person. This hypothesis could explain the discrepancy in speech-language outcomes between children with and without UHL despite the preservation of the microstructure in auditory regions. In contrast to our results here, early blind individuals were noted to have marked differences in the geniculocalcarine tract from normally sighted individuals; however, connections between the visual cortex and the orbital frontal and temporal cortices were preserved in early blind individuals (Shimony et al., 2006).

**Table 7 | Results of multiple linear regression analyses for all subjects (***n* **= 49).**

*Standardized scores for educational outcomes were the dependent variables, and the DTI parameters (FA, Fractional Anisotropy; MD, Mean Diffusivity) for the stated regions of interest, age and hearing status (i.e., unilateral hearing loss, UHL) were entered in the models as predictors.*

Contrary to expectations of finding differences in DTI parameters of auditory regions between children with and without UHL, most DTI parameters between the two groups were quite similar. Thus, given the results from prior studies on bilateral hearing loss, UHL seems to affect white matter development differently than bilateral hearing loss. There were no differences between left and right UHL groups. However, our sample size was small, so our negative findings do not preclude the existence of such differences. We included children with both acquired and congenital hearing loss, which may have influenced our results, although in all cases the hearing loss was at birth or in early childhood. Because most of the children were born before the era of newborn hearing screening, precise onset of UHL could not be determined. However, the severity of UHL and the universal lack of patient complaint about hearing indicate that the children were very young and unaware of the hearing loss when it occurred. Sampling ROIs manually may also be perceived as a limitation of this study; however, this is how many of the comparison studies were done, and given the small size of the structures we were sampling, alternative approaches, such as atlas-based approaches, were not feasible.

A few studies have used DTI to examine white matter microstructure in adults with hearing loss. There is a body of evidence indicating that adults with hearing loss have reduced FA in several structures. Chang et al. (2004) examined five ROIs (the lateral lemniscus and inferior colliculus, the trapezoid body, auditory radiation and superior olivary nucleus); reduced FA was observed in subjects with sensorineural hearing loss at all locations bilaterally with the exception of the trapezoid body (Chang et al., 2004). A study of 13 adults with early deafness revealed decreased FA within the temporal white matter, internal capsule, superior longitudinal fasciculus and the inferior frontal white matter (Kim et al., 2009). In adults with UHL, FA values of the inferior colliculus and the lateral lemniscus were significantly lower on the side contralateral to the hearing loss than those on the ipsilateral side (Lin et al., 2008). The study of Wu et al. looked at 12 subjects with unilateral congenital cochlear nerve deficiency and noted bilateral decrease in FA and increase in MD in the lateral lemniscus and inferior colliculus (Wu et al., 2009). However, the population of this study spanned a wide age distribution (age 8–29 years) and no correction was made for this wide variability in age. Unlike the results of previous studies, our findings indicate that microstructural integrity in children with UHL is not substantially altered from that of children with NH. The difference in our results could be related to the differences in our cohorts (unilateral vs. bilateral hearing loss in most other studies), to the age of our groups (children vs. adults, or mixed in other studies), and to our statistical methods.

**Table 8 | Results of multiple logistic regression analyses for all subjects (***n* **= 49).**

*Standardized scores for educational outcomes were the dependent variables, and the DTI parameters (FA, Fractional Anisotropy; MD, Mean Diffusivity) for the stated regions of interest, age and hearing status (i.e., unilateral hearing loss, UHL) were entered in the models as predictors. \*p < 0.05; †p < 0.01.*

Two studies have used DTI in children with bilateral prelingual deafness to investigate alterations in brain white matter tracts. Children (age 10–18 years) with bilateral prelingual deafness had lower FA values and increased radial diffusivity bilaterally in the superior temporal gyrus, Heschl's gyrus, planum polare, and splenium of the corpus callosum compared to controls with normal hearing (Miao et al., 2013). In addition, the mean radial diffusivity of the right superior temporal gyrus appeared to be correlated with the duration of sign language use. Chang and colleagues used DTI characteristics to compare children who had undergone cochlear implantation but were classified as having "good" or "poor" auditory performance outcomes (Chang et al., 2012). They found higher FA values in Broca's area, genu of the corpus callosum, and auditory tract among the children with "good" auditory performance compared to those with "poor" outcome. Strong correlations were found between FA and auditory performance scores in several brain areas: medial geniculate nucleus, Broca's area, genu of the corpus callosum, and auditory tract. They concluded that preoperative functional imaging prior to placement of cochlear implants might be helpful to understanding clinical outcomes. Although all of the brain areas from these two studies in children are different from the ones identified in adults with hearing loss, most but not all occurred in areas associated with auditory and language processing, and most showed bilateral changes. These findings in children with bilateral prelingual deafness are highly interesting, but they differ from the results of the current study. Whether they apply to the population of children with UHL who have preserved auditory stimulation in one ear would need further investigation.

## **CONCLUSION**

This study detected correlations between educational outcomes and microstructural integrity of brain structures in children with and without UHL that may have clinical relevance in the guidance of speech and language therapy. Our results imply that unilateral auditory input preserves many of the asymmetries in white matter microstructural patterns in children between right and left hemispheres. However, UHL results in functional and behavioral differences on language and educational measures, possibly due to recruitment of certain areas (e.g., middle cerebellar peduncle, superior temporal gyrus, as shown in **Table 8**) for other brain functions.

#### **AUTHOR CONTRIBUTIONS**

Judith E. C. Lieu conceived and supervised the project and wrote the paper. Joshua S. Shimony supervised the project and wrote the paper. Tara Rachakonda collected data, performed statistical analyses, and wrote the paper. Rebecca S. Coalson collected data, performed statistical analysis, and revised the paper.

#### **ACKNOWLEDGMENTS**

We would like to express our gratitude to Steve Petersen, PhD, and Bradley Schlaggar, MD, PhD, for their support and guidance in this project. Our work would also not be possible without the efforts of Banan Ead. Grants from the American Hearing Research Foundation Wiley H. Harrison Memorial Research Award and the National Institutes of Health (K23 DC006638— JECL) funded this research. Tara Rachakonda was also supported by a training grant (5T32DC000022-2) and a Clinical and Translational science award from Washington University through the NIH/NCRR/NCATS (UL1RR024992). The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official view of the NCRR, NCATS or NIH.

## **REFERENCES**


**Conflict of Interest Statement:** The Associate Editor, Dr. Jonathan Peelle, declares that, despite being affiliated with the same institution as the authors (the Department of Otolaryngology of Washington University School of Medicine), the review process was handled objectively. The rest of the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 10 September 2013; accepted: 25 April 2014; published online: 26 May 2014. Citation: Rachakonda T, Shimony JS, Coalson RS and Lieu JEC (2014) Diffusion tensor imaging in children with unilateral hearing loss: a pilot study. Front. Syst. Neurosci. 8:87. doi: 10.3389/fnsys.2014.00087*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2014 Rachakonda, Shimony, Coalson and Lieu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Upregulation of cognitive control networks in older adults' speech comprehension

## *Julia Erb\* and Jonas Obleser\**

*Max Planck Research Group "Auditory Cognition", Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany*

#### *Edited by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

#### *Reviewed by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA Stefanie E. Kuchinsky, Medical University of South Carolina, USA*

#### *\*Correspondence:*

*Julia Erb and Jonas Obleser, Max Planck Research Group "Auditory Cognition", Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1A, 04103 Leipzig, Germany e-mail: jerb@cbs.mpg.de; obleser@cbs.mpg.de*

Speech comprehension abilities decline with age and with age-related hearing loss, but it is unclear how this decline is expressed in central neural mechanisms. The current study examined neural speech processing in a group of older adults (aged 56–77, *n* = 16, with varying degrees of sensorineural hearing loss), and compared them to a cohort of young adults (aged 22–31, *n* = 30, self-reported normal hearing). In a functional MRI experiment, listeners heard and repeated back degraded sentences (4-band vocoded, where the temporal envelope of the acoustic signal is preserved, while the spectral information is substantially degraded). Behaviorally, older adults adapted to degraded speech at the same rate as young listeners, although their overall comprehension of degraded speech was lower. Neurally, both older and young adults relied on the left anterior insula for degraded more than clear speech perception. However, anterior insula engagement in older adults was dependent on hearing acuity. Young adults additionally employed the anterior cingulate cortex (ACC). Interestingly, this age group × degradation interaction was driven by a reduced dynamic range in older adults who displayed elevated levels of ACC activity for both degraded and clear speech, consistent with a persistent upregulation in cognitive control irrespective of task difficulty. For correct speech comprehension, older adults relied on the middle frontal gyrus in addition to a core speech comprehension network recruited by younger adults, suggestive of a compensatory mechanism. Taken together, the results indicate that older adults increasingly recruit cognitive control networks, even under optimal listening conditions, at the expense of these systems' dynamic range.

**Keywords: functional MRI, aging, degraded speech, neural adaptation, executive functions, noise-vocoding, cochlear implant simulation, temporal envelope**

## **INTRODUCTION**

Speech comprehension can become difficult with age and age-related hearing loss, especially when listening conditions are challenging. Normal-hearing young adults have the capacity to rapidly adapt to degraded speech (Davis et al., 2005; Samuel and Kraljic, 2009; Eisner et al., 2010; Erb et al., 2013). Such short-term perceptual adaptation is not well established in older adults, although it bears particular relevance as older adults are frequently affected by hearing loss. For example, patients with hearing aids or, more drastically, cochlear implants (CI) need to adapt to an altered and often distorted auditory signal delivered by their device.

In a previous short-term adaptation study in a cohort of young adults (Erb et al., 2013), we have shown that degraded speech processing elicits an increased blood oxygenation level-dependent (BOLD) response in an "executive" network (Eckert et al., 2009) comprising the anterior insula and anterior cingulate cortex (ACC). Also, adaptation to degraded speech was shown to be accompanied by hemodynamic down-regulation in a cortico-thalamic-striatal loop (Erb et al., 2013). In the current functional MRI experiment, we compare these results to a group of older adults with varying degrees of hearing loss to test: (1) whether older listeners are able to (behaviorally) adapt to spectrally severely degraded ("noise-vocoded") speech at a rate comparable to young listeners and (2) how neural processing of degraded speech differs between young and older adults.

There is evidence that rapid perceptual learning is preserved in older adults (Peelle and Wingfield, 2005; Golomb et al., 2007; Gordon-Salant et al., 2010). For example, older adults are able to quickly adapt to an unfamiliar accent (Adank and Janse, 2010) or a foreign accent (Gordon-Salant et al., 2010). Peelle and Wingfield (2005) showed that older adults adapted to temporally degraded ("time-compressed") and noise-vocoded speech at a similar rate as young adults, when starting accuracy was equated.

However, considerable inter-individual variability has been frequently observed in adaptation to vocoded speech (Shannon et al., 2002; Eisner et al., 2010). Especially in older adults, the degree of hearing loss and cognitive aspects might substantially impact adaptation to degraded speech. Working memory is one cognitive factor that has been implicated in degraded speech comprehension by a number of studies (Pichora-Fuller et al., 1995; Burkholder et al., 2005; Jacquemot and Scott, 2006; Eisner et al., 2010; Piquado et al., 2010; Obleser et al., 2012). For example, Pisoni and Cleary (2003) observed that working memory scores as measured by digit span significantly predicted speech comprehension in pediatric CI users. In older adults, cognitive factors might be even more closely related to degraded speech recognition (Janse and Adank, 2012), because cognitive decline has been shown to co-occur with sensory decline (Lindenberger and Ghisletta, 2009), which in turn leads to degraded auditory conditions. Thus, we expected working memory capacity in older adults to be related to vocoded speech comprehension.

A second factor which heavily affects comprehension is hearing loss. As age-related hearing loss is accompanied by auditory cortex atrophies (Harris et al., 2009; Peelle et al., 2011; Eckert et al., 2012), older adults likely draw on different neural resources for speech comprehension. It is a common observation that older adults recruit additional regions for speech comprehension compared to young adults, although it is unclear whether this reflects an age-related loss of specialization of cortical brain regions (Park et al., 2004) or a compensatory mechanism (Cabeza et al., 2002). Peelle et al. (2010b) noted that during processing of syntactically complex sentences, older adults engaged middle frontal regions in addition to a "core sentence-processing network" (comprising middle temporal gyrus (MTG) and inferior frontal gyrus (IFG); Peelle et al., 2004; Fiebach et al., 2005) recruited by young adults. The authors interpreted this engagement of additional areas as a compensatory process, whereby the older adults managed to maintain performance despite degeneration of the sensory cortices.

In line with this argumentation, older adults have been hypothesized to engage in more effortful processing during speech comprehension (Pichora-Fuller, 2003). Consistently, Eckert et al. (2008) observed an age-related upregulation of cognitive-control-related frontal areas during an easy word recognition task, while younger adults recruited these areas merely in difficult listening conditions. Harris et al. (2009) further showed that incorrect vs. correct word recognition elicited increased activity in the ACC, but more so in older than younger adults, possibly reflecting an age-related upregulation of error monitoring systems (Sharp et al., 2006). Hence, for solving auditory tasks the reliance on cognitive control appears to increase with age.

It is still largely unknown, however, how older adults adapt to degraded speech. In the current functional MRI study, we were primarily interested in how short-term adaptation to degraded speech and the involvement of cognitive control networks in speech processing change with age. Young and older listeners heard and repeated back 100 degraded (4-band-vocoded) sentences as well as a control set of interspersed clear-speech sentences. Thus, we could identify age differences in the neural processes related both to physical degradation of speech (degraded vs. clear sentences) and to trial-by-trial fluctuations in comprehension (covariation of BOLD responses with degraded speech comprehension success).

## **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Sixteen older adults (aged 56–77, mean 67.1 years, 8 female) participated in the study. Their data were analyzed jointly with a cohort of 30 young adults (aged 22–31, mean 25.9 years, 15 female) who had participated in the study reported in Erb et al. (2013). Participants were recruited from the participant database of the Max Planck Institute for Human Cognitive and Brain Sciences according to the following criteria: they were native speakers of German; had no language or neurological disorders; showed dominant right-handedness according to the Edinburgh inventory (Oldfield, 1971) and were naïve to noise-vocoded speech. Younger adults self-reported normal hearing, whereas older adults had normal hearing to moderate hearing loss based on their pure-tone averages which were audiometrically assessed (see below). Participants received financial compensation of €16, and gave informed consent. Procedures were approved by the local ethics committee (University of Leipzig).

#### *Audiometric evaluation*

Older adults' pure-tone thresholds were measured at conventional frequencies from 0.25–8 kHz using an Equinox 2.0 AC-440 audiometer (Interacoustics) in a sound-proof chamber. Older participants' pure-tone average (PTA; defined as the average thresholds in the listener's better ear at 1, 2 and 4 kHz) indicated normal hearing (< 25 dB HL) to moderate hearing loss (40–70 dB HL), whereas high-frequency hearing ranged from normal to severe hearing loss (70–95 dB HL; audiograms are shown in **Figure 1A**). Young participants' hearing acuity was not tested but all of them self-reported normal hearing.
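The PTA definition above can be sketched in a few lines of Python. This is only an illustration: the audiogram values are hypothetical, and the "better ear" is assumed here to be the ear with the lower PTA, which the text does not spell out.

```python
from statistics import mean

# Frequencies (in Hz) that enter the pure-tone average, as stated in the text.
PTA_FREQUENCIES_HZ = (1000, 2000, 4000)


def pure_tone_average(thresholds_db_hl):
    """Mean threshold (dB HL) across the PTA frequencies for one ear."""
    return mean(thresholds_db_hl[f] for f in PTA_FREQUENCIES_HZ)


def better_ear_pta(left, right):
    """PTA of the better ear; 'better' is assumed to mean the lower PTA."""
    return min(pure_tone_average(left), pure_tone_average(right))


# Hypothetical audiogram (dB HL per frequency in Hz), for illustration only.
left_ear = {250: 10, 500: 15, 1000: 20, 2000: 30, 4000: 55, 8000: 70}
right_ear = {250: 10, 500: 10, 1000: 15, 2000: 25, 4000: 50, 8000: 65}
pta = better_ear_pta(left_ear, right_ear)
```

In this made-up example the right ear has the lower average, so its PTA would be the value carried forward into the analyses.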

#### *Auditory forward and backward digit span*

To measure working memory capacity, all participants were tested with a digit span test, which is part of the revised Wechsler memory scale (WMS-R; Wechsler, 1987). The experimenter read out to the participant lists of single digits between 1 and 9 at a rate of approximately one digit per second. Participants had to immediately repeat the list of digits in the same order (forward digit span) or in the reverse order (backward digit span). The test had seven levels featuring list lengths starting from three digits increasing by one digit up to nine digits for forward digit span, and list lengths from two to eight digits for backward digit span. Each level comprised two items. The participants' response was marked as correct only if all digits were correctly repeated in the required order. The testing stopped when the participant reported none of the items of a level correctly or when all 14 items had been presented. The level at which the test was terminated was taken as the individual forward or backward digit span measure (Wechsler, 1987).

#### **STIMULI AND EXPERIMENTAL DESIGN**

Stimuli were German SPIN sentences, which control for the predictability of the final word (Kalikow et al., 1977; Erb et al., 2012). Only low-predictability sentences were used in the present study, such that semantic cues were limited and the listener had to rely primarily on acoustic properties of the sentence to understand it. A complete list of these sentences is available in Erb et al. (2012). The sentences were recorded by a female speaker of standard German in a sound-proof chamber. The length of the recorded sentences varied between 1620 and 2760 ms.

Sentences were degraded using 4-band noise vocoding. This procedure divides the speech signal into frequency bands, extracts the amplitude envelope from each band and reapplies it to bandpass-filtered noise carriers. Thus, the spectro-temporal fine structure is smeared while the temporal envelope remains preserved. For envelope extraction, we used a second-order, zero-phase Butterworth lowpass-filter with a cutoff frequency of 400 Hz. Noise-vocoding was applied to all sentences in MATLAB 7.11 as described in Rosen et al. (1999) using four frequency bands spanning 70–9000 Hz that were spaced according to Greenwood's cochlear frequency-position function (Greenwood, 1990); for the exact cut-off frequencies of the frequency bands see Erb et al. (2012).
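As an illustration of the band spacing only (not of the full vocoder, which was implemented in MATLAB), the following Python sketch divides 70–9000 Hz into four bands that are equally wide along the cochlea. It assumes the commonly cited human constants of Greenwood's function (A = 165.4, a = 2.1, k = 0.88, with position expressed as a proportion of cochlear length); the resulting edges are not guaranteed to match the exact cut-off frequencies listed in Erb et al. (2012).

```python
import math

# Assumed Greenwood (1990) constants for the human cochlea, with position x
# expressed as a proportion (0..1) of cochlear length:
#   F(x) = A * (10 ** (a * x) - k)
A, a, k = 165.4, 2.1, 0.88


def position_to_frequency(x):
    """Characteristic frequency (Hz) at relative cochlear position x."""
    return A * (10 ** (a * x) - k)


def frequency_to_position(f_hz):
    """Inverse of position_to_frequency."""
    return math.log10(f_hz / A + k) / a


def band_edges(f_low, f_high, n_bands):
    """Band edges (Hz) equally spaced in cochlear position."""
    x_low = frequency_to_position(f_low)
    x_high = frequency_to_position(f_high)
    step = (x_high - x_low) / n_bands
    return [position_to_frequency(x_low + i * step) for i in range(n_bands + 1)]


# Four bands spanning 70-9000 Hz, as in the text; yields five edge frequencies.
edges = band_edges(70.0, 9000.0, 4)
```

Each adjacent pair of values in `edges` would delimit one analysis band, whose envelope is then extracted and imposed on a noise carrier filtered into the same band.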

Each trial was approximately 9 s long, but actual trial length varied due to cardiac gating (see below). Trials started with a 1 s silent gap, after which participants heard a sentence lasting for approximately 2.5 s. Following stimulus presentation (3.5 s into the trial), a green light ("go"-signal for response) was visually presented and lasted for 3 s. During this time, participants were to respond by repeating the sentence, but were instructed to stop talking when the green light disappeared in order to avoid movement during scan acquisition. After approximately 1 s of silence, scan acquisition with a TR of 2 s was triggered using cardiac gating. Thus, the onset of auditory stimulation preceded the anticipated scan acquisition by approximately 6.5 s. Verbal responses were recorded for later off-line scoring (Eckert et al., 2009; Harris et al., 2009). Responses were scored as proportions of correctly repeated words per sentence ("report scores", Peelle et al., 2013). Scoring took into account all words of a sentence including function words. The marking scheme was liberal such that errors in declension or conjugation were accepted as correct.

Clear-speech trials were used as a high-level baseline. Clear speech can be assumed to be fully adapted, and therefore to be processed in a stable way over time. This ensured that neural adaptation would not occur in the baseline condition, whereas another type of artificial speech degradation (e.g., rotated speech) might have led to neural adaptation (even in the absence of behavioral adaptation).

Overall, the experiment comprised three conditions, (1) 4-band vocoded sentences ("degraded speech"; 100 trials), (2) clear (non-degraded) sentences ("clear speech"; 24 trials) and (3) trials lacking any auditory stimulation ("silent trials"; 20 trials), summing up to 144 trials in total. Clear speech trials were presented every 5th trial, whereas silent trials were randomly interspersed. Sentences were presented to each participant in one of four different orders; presentation order was counterbalanced across participants.

#### *Individual adaptation curves*

To measure individual learning rate, we modeled learning curves in two different ways: as a power law or as a linear performance increase. It has been claimed that learning curves for short-term adaptation to degraded speech are asymptotic such that perceptual learning follows a power law (Peelle et al., 2010a). However, we had previously shown in young adults that behavioral adaptation to noise-vocoded speech is better described by linear curves than by more complex, power-law fits (Erb et al., 2012, 2013).

Here, to test whether in older adults, a linear or a power law function would better describe the data, both functions were fitted to the individual performance scores over time using a least-squares estimation procedure in MATLAB 7.11 (cf. Erb et al., 2012; for an example linear and power law fit to the older group's average performance see **Figure 1B**). We compared goodness of fit by determining the Bayesian information criterion (BIC; Schwarz, 1978) of the linear and the power law fit within each subject. BIC values were defined as

$$BIC = n \times \ln(\sigma_e^2) + k \times \ln(n) \tag{1}$$

where *n* is the number of observations (*n* = 100), σ<sub>e</sub><sup>2</sup> is the error variance or sum of squared residuals, *k* is the number of fitted parameters (*k* = 2 for the linear fit, *k* = 3 for the power law fit). Smaller BIC values indicate a better fit (Schwarz, 1978; Priestley, 1981). The BIC increases as a function of σ<sub>e</sub><sup>2</sup> and of *k*. Models with higher error variance and more fitted parameters are therefore penalized.
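To make the penalty term concrete, here is a minimal Python sketch of Equation 1. It assumes that the error variance is computed as the mean squared residual, and the residuals are hypothetical values chosen so that both fits have identical error; it is not the MATLAB procedure used in the study.

```python
import math


def bic(residuals, k):
    """BIC = n * ln(sigma_e^2) + k * ln(n), following Equation 1."""
    n = len(residuals)
    # Assumption: sigma_e^2 is taken as the mean squared residual.
    sigma2 = sum(r * r for r in residuals) / n
    return n * math.log(sigma2) + k * math.log(n)


# Hypothetical residuals from two fits to the same four observations.
linear_residuals = [1.0, -1.0, 1.0, -1.0]      # k = 2 fitted parameters
power_law_residuals = [1.0, -1.0, 1.0, -1.0]   # k = 3 fitted parameters

bic_linear = bic(linear_residuals, k=2)
bic_power = bic(power_law_residuals, k=3)
preferred = "linear" if bic_linear < bic_power else "power law"
```

Because the two sets of residuals are identical here, the difference in BIC comes entirely from the extra parameter, so the simpler linear fit would be preferred.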

#### **EXPERIMENTAL PROCEDURE**

Before participants went into the scanner, they were familiarized with the task by listening to three 8-band vocoded sentences.

To prevent hearing damage due to scanner noise, participants wore Alpine Musicsafe Pro earplugs while in the bore, yielding approximately linear 14-dB reduction in sound pressure up to 8 kHz. Auditory stimuli were delivered through MR-Confon headphones using Presentation software. Presentation level was adjusted for each participant such that loudness was subjectively comfortable and equal across both ears. This ensured that stimuli were presented at a level well above participants' thresholds in the speech range frequencies such that all participants were able to perceive the sentences. Visual prompts were projected on a screen which participants viewed via a mirror.

#### **MRI DATA ACQUISITION**

MRI data were collected on a 3-T Siemens Verio scanner. Functional MR images were acquired with a 12-channel head coil using an echo-planar imaging (EPI) sequence [repetition time (TR) ≈ 9000 ms, acquisition time (TA) = 2000 ms, echo time (TE) = 30 ms, flip angle = 90°, 3 mm slice thickness, 30 axial slices (ascending), interslice distance = 1 mm, acquisition matrix of 64 × 64, voxel size = 3 × 3 × 3 mm]. The acquisition matrix was placed such that the *x*-axis was in line with the anterior–posterior commissure (AC–PC). We used a sparse-sampling procedure, where TR was longer than TA, allowing for silent periods to play stimuli and record responses (Hall et al., 1999).

Cardiac gating was applied to avoid movement artifacts caused by the heartbeat in subcortical structures (Von Kriegstein et al., 2008), in which we were especially interested (Erb et al., 2012, 2013). Participants' heartbeat was monitored using an MR-compatible pulse oximeter (Siemens) attached to their left ring finger. On each trial, after 9 s had elapsed, the scanner waited for the first heartbeat to trigger volume acquisition. Thus, the actual TR was variable.

Following functional imaging, young participants received a T1-weighted structural brain scan with a 32-channel head coil using an MPRAGE sequence [TR = 1300 ms, TE = 3.5 ms, flip angle = 10°, 1 mm slice thickness, 176 sagittal slices, acquisition matrix of 256 × 240, voxel size = 1 mm<sup>3</sup>].

Older adults' anatomical scans for registration with the functional images were available through the Institute's brain database. Scanning had been carried out on a 3T Siemens Trio TIM scanner using a T1-weighted MPRAGE sequence to acquire 176 sagittal slices, with an acquisition matrix of 256 × 240, yielding a resolution of 1 mm<sup>3</sup>.

For one older participant the scanner had become desynchronized with the presentation script, such that he had to be excluded from the functional MRI data analyses, resulting in a total of 15 older participants included in the analyses. In one young participant, we were only able to acquire 136 (as opposed to 144) scans due to technical problems with cardiac gating.

#### **DATA ANALYSIS**

Note that all behavioral and MRI analyses of single participants were closely matched to the procedures established previously in the young cohort and published in Erb et al. (2013).

#### *Preprocessing*

MRI data were analyzed in SPM8 (Wellcome Trust Centre for Neuroimaging, London, UK). Functional images were realigned and unwarped using a fieldmap, coregistered with the structural scan, segmented and normalized to standard space (Montreal Neurological Institute [MNI] space) using the segmentation parameters, and smoothed with an isotropic Gaussian kernel of 8 mm full-width at half-maximum.

#### *Statistical analyses*

MR images were statistically analyzed in the context of the general linear model (GLM). We set up one model to assess effects of speech degradation and effects of trial-by-trial fluctuations in comprehension. In this model, we included two conditions: degraded and clear speech. To avoid overspecification, silent trials were not modeled in the analyses, but solely used for an initial quality check of the data confirming that sound compared to silent trials yielded large clusters of activity in the auditory cortex. Additionally, a parametric modulator of the degraded speech trials was defined, representing the behavioral report scores. A regressor of no interest, containing report latencies, was added in order to account for differences in speech production (analysis explained in detail below). To assess effects of stimulus clarity, we contrasted degraded against clear speech trials. To reveal effects of trial-by-trial fluctuations in speech comprehension, we assessed correlations with the regressor representing report scores. To look for effects of hearing loss, we correlated PTA on the second level with the contrast degraded > clear speech.

Although the present study was designed to image degraded speech perception, parts of the observed activity may be related to speech production or preparation, because participants overtly repeated back what they had understood starting approximately 3.5 s prior to scan acquisition. In particular, participants' verbal responses might have been faster for clear relative to degraded speech trials, perhaps leading to partly imaging the BOLD response to speech production, but more so for clear speech trials. Therefore, differences in report latencies might confound the comparison between degraded and clear speech trials. To control for this potential confound, we calculated report latency relative to the onset of the visual response cue. This measure was included at the first level as one single regressor of no interest in the model. For trials where participants did not produce an overt response, the subject-specific mean report latency was entered instead.
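The handling of trials without an overt response can be illustrated with a short Python sketch. The latency values and the use of `None` to mark missing responses are assumptions made for this example; the original analysis was carried out in SPM8.

```python
from statistics import mean


def impute_report_latencies(latencies):
    """Replace missing latencies (None) with the mean of the observed ones."""
    observed = [x for x in latencies if x is not None]
    subject_mean = mean(observed)
    return [subject_mean if x is None else x for x in latencies]


# Hypothetical latencies (s) relative to the visual response cue for one
# participant; None marks a trial without an overt response.
latencies = [0.5, None, 0.75, 1.0, None]
regressor = impute_report_latencies(latencies)
```

Filling gaps with the subject-specific mean keeps every trial in the regressor while adding no extra variance for trials that lack a measurable latency.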

All described analyses were whole-brain analyses. Regressors were modeled using a finite impulse response (FIR) comprising one bin. A high-pass filter with a cutoff of 1024 s was applied to eliminate low-frequency noise. No correction for serial autocorrelation was necessary because of the long TR in the sparse-sampling acquisition.

Second level statistics were calculated using a one-sample *t*-test and group differences were assessed using a two-sample *t*-test. We recognize that comparisons between groups of different sample sizes (here: 15 older adults vs. 30 younger adults) are problematic; especially when variance differs between groups, the group with the larger variance should comprise more samples (Samanez-Larkin and D'Esposito, 2008). There is evidence, however, that BOLD signal variability actually decreases in older adults (Garrett et al., 2010). Further, we wanted to avoid discarding data which were already available for the 30 young adults, resulting in a larger sample size of the young compared to the older adults.

We are aware of the problem that hemodynamics likely change with age (e.g., due to vascular changes), such that simple group differences in the BOLD signal could possibly reflect differences in neurovascular coupling rather than actual differences in neural processing. To overcome this issue, we only tested for age group × condition interactions when assessing age effects on neural processing (Samanez-Larkin and D'Esposito, 2008).
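The logic of testing only the interaction can be sketched as follows. In a design with one between-subject factor (age group) and one within-subject factor (condition), the interaction is equivalent to a two-sample test on each participant's degraded-minus-clear difference, so any constant group offset in the BOLD signal cancels out. The function name, the pooled-variance (equal variance) assumption, and the example values are ours for illustration; the actual analysis was run as a second-level model in SPM.

```python
import math
from statistics import mean, variance


def interaction_t(young_deg, young_clear, old_deg, old_clear):
    """Pooled-variance two-sample t on per-subject (degraded - clear) differences.

    Returns (t, df). Assumes equal variances in both groups (illustrative only).
    """
    d_young = [d - c for d, c in zip(young_deg, young_clear)]
    d_old = [d - c for d, c in zip(old_deg, old_clear)]
    n1, n2 = len(d_young), len(d_old)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(d_young) + (n2 - 1) * variance(d_old)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(d_young) - mean(d_old)) / se, df


# Hypothetical contrast estimates for three participants per group.
t_value, dof = interaction_t(
    young_deg=[2, 4, 6], young_clear=[1, 2, 3],
    old_deg=[1, 1, 2], old_clear=[1, 0, 0],
)
print(t_value, dof)
```

Adding the same constant to both conditions of one group leaves the difference scores, and hence the statistic, unchanged, which is why the interaction is less sensitive to global group differences in neurovascular coupling.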

Group inferences are reported at a threshold of *p* < 0.001 and a cluster-extent of *k* > 20 to correct for inflated type I error at the whole-brain level, as based on a Monte Carlo Simulation (Slotnick et al., 2003). *T*-statistic maps were transformed to *Z*-statistic maps using spm\_t2z.m, and overlaid and displayed on the ch2 template in MNI space included with MRIcron (Rorden and Brett, 2000).

#### *Regions of interest analyses*

In order to extract measures of percent signal change in the regions identified by the whole-brain analyses described above, we defined regions of interest (ROIs) using the SPM toolbox MarsBar (Brett et al., 2002). ROIs were defined as spheres with a radius of 3 mm centered on the identified peak coordinates. Voxels within an ROI were aggregated into a single contrast estimate using the first eigenvariate.

## **RESULTS**

#### **BEHAVIORAL RESULTS**

#### *Vocoded speech comprehension*

Older adults reported on average 28.0 ± 2.8 (mean ± SEM) % words correctly per degraded sentence. Performance in clear trials was at 98.0 ± 1.4 % correct. In comparison, young adults' degraded speech recognition was substantially better (*t*(44) = 8.23, *p* < 0.001), with on average 51.9 ± 1.4 % words correct per degraded sentence and 99.7 ± 0.2 % correct per clear sentence (**Figure 1D**, left).

#### *Perceptual adaptation*

We compared whether a linear or power-law fit would better describe the report scores' increase over time by calculating BIC for each fit and each older participant (**Figures 1B, C**). The BIC scores for the linear fits (median 222.09, range 95.21–268.48) were smaller than those for power law fits (median 226.69, range 99.81–273.09), as shown by a Wilcoxon signed-rank test (*p* < 0.001), indicating that the linear curve better fit the behavioral data. In the young participants, we had also shown that linear fits were more adequate than the power law fits to describe participants' learning curves (see Erb et al., 2013). Thus, the slope of the linear fit (adaptation slope) was taken as a measure of individual perceptual adaptation to degraded speech.
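The model comparison can be sketched under stated assumptions. The exact power-law parameterization and BIC formula used in the analysis are not given here, so this sketch assumes a three-parameter curve `y = a + b * x**c`, fitted by trying candidate exponents, and the common Gaussian-error form of the BIC, `n * ln(RSS / n) + k * ln(n)`, which is defined only up to an additive constant. All function names and the example data are hypothetical.

```python
import math


def ols_line(xs, ys):
    """Closed-form least-squares fit of y = a + b*x; returns (a, b, rss)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def fit_power_law(xs, ys, exponents):
    """Fit y = a + b * x**c: try each candidate c, solve a and b by least squares."""
    best = None
    for c in exponents:
        a, b, rss = ols_line([x ** c for x in xs], ys)
        if best is None or rss < best[3]:
            best = (a, b, c, rss)
    return best


def bic(rss, n, k):
    """Gaussian-error BIC up to an additive constant; smaller is better."""
    return n * math.log(rss / n) + k * math.log(n)


# Hypothetical report scores over 40 trials (not data from the study).
trials = list(range(1, 41))
scores = [0.30 + 0.005 * t + 0.04 * math.sin(t) for t in trials]

linear = ols_line(trials, scores)
power = fit_power_law(trials, scores, [i / 10 for i in range(1, 21)])
bic_linear = bic(linear[2], len(trials), k=2)
bic_power = bic(power[3], len(trials), k=3)
print(bic_linear, bic_power)
```

Because the penalty term grows with the number of parameters, a power-law fit that reduces the residual error only marginally will receive the larger (worse) BIC; with identical residuals, one extra parameter costs exactly `ln(n)`.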

The BIC compares goodness of fit between different models but does not give an estimate of absolute goodness of fit. Absolute goodness of fit as measured by *R*<sup>2</sup> in the older adults amounted to *R*<sup>2</sup> = 0.061 ± 0.01 (mean ± SEM) for the power law fit and *R*<sup>2</sup> = 0.072 ± 0.016 for the linear fit. *R*<sup>2</sup> did not differ between the linear and the power law fit, as shown by a Mann-Whitney *U*-test (*p* = 0.95). Note however, that a direct comparison of *R*<sup>2</sup> between different models does not allow for a fair comparison, as *R*<sup>2</sup> does not take into account the number of fitted parameters.

In order to test whether goodness of fit differed between age groups, we compared *R*<sup>2</sup> of the linear fits in the two groups. *R*<sup>2</sup> in the older adults did not differ from *R*<sup>2</sup> in the younger adults (0.065 ± 0.009), as shown by a Mann-Whitney *U*-test (*p* = 0.92), indicating that the goodness of fit was comparable in older and younger adults. Although these single-subject *R*<sup>2</sup> values are relatively small (only approximately 7% of the variance is explained by the fitted model), when averaging over the 16 older adults, mean report scores correlated highly with sentence number (*R*<sup>2</sup> = 0.32, *p* < 0.001). Davis et al. (2005) have reported similar correlation coefficients for mean report scores with sentence number in their vocoded speech learning study (but did not report single-subject *R*<sup>2</sup>-values).

To confirm that the slopes of the linear fits were a sensible measure of learning, we used another, more traditional measure of learning. For each participant, we subtracted the mean performance over the first 20 trials from the mean performance over the last 20 trials (Bent et al., 2009; Eisner et al., 2010). This performance difference (Δ performance) between the beginning and end of the experiment amounted to 0.22 ± 0.03 proportion correct (mean ± SEM) in the older adults and 0.27 ± 0.02 proportion correct in the younger adults. Across age groups, Δ performance was highly correlated with the adaptation slopes (*r* = 0.91; *p* < 0.001).

To test whether older and younger adults differed in their rate of learning, we compared both the Δ performance and adaptation slopes between groups. According to a two-sample *t*-test, there was neither a difference in Δ performance between older and young adults (*t*(44) = −1.25, *p* = 0.22), nor in the adaptation slopes of older (2.5 ± 0.39 × 10<sup>−3</sup>, mean ± SEM) and young adults (2.7 ± 0.21 × 10<sup>−3</sup>; *t*(44) = −0.65, *p* = 0.52), indicating that both groups were comparable in the rate with which they adapted to degraded speech (**Figure 1D**, right).

Finally, to exclude the possibility that variability in the adaptation curves (shown in **Figure 1C**) was a consequence of the counterbalancing of material across participants (in four different presentation orders), we tested whether presentation order had an effect on adaptation slope. A Kruskal-Wallis test was not significant (χ<sup>2</sup>(15) = 4.34, *p* = 0.23), indicating that the fact that different listeners received different materials at different time points did not influence adaptation.

#### *Spearman's correlations*

In order to examine whether in older adults, adaptation slope was related to other factors (i.e., age, forward and backward digit span, PTA, mean performance), a two-tailed Spearman's correlation was calculated for all pairs of variables. We adjusted for multiple comparisons by controlling the false discovery rate (Benjamini and Hochberg, 1995), which resulted in a critical *p* = 0.019 at a false discovery rate of *q* = 0.05 ("significant") and *p* = 0.039 at *q* = 0.1 ("non-significant trend").
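How a critical *p* value follows from a chosen false discovery rate can be sketched with the Benjamini-Hochberg step-up rule: the sorted *p* values are compared against rank-specific thresholds (rank / number of tests × *q*), and the largest *p* value that passes its threshold becomes the critical value. The function name and the example *p* values are hypothetical and are not taken from Table 1.

```python
def benjamini_hochberg(pvalues, q):
    """Benjamini-Hochberg step-up procedure.

    Returns (critical_p, reject_flags); critical_p is None if nothing passes.
    """
    m = len(pvalues)
    critical_p = None
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= rank / m * q:
            critical_p = p  # keep the largest p that passes its own threshold
    if critical_p is None:
        return None, [False] * m
    return critical_p, [p <= critical_p for p in pvalues]


# Hypothetical p values from ten pairwise correlations.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
critical, rejected = benjamini_hochberg(pvals, q=0.05)
print(critical, rejected)
```

Running the same function with a more lenient `q` (for example 0.1) yields the second, higher critical value used to label trends.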

Within the older adults, adaptation slope correlated positively with backward digit span (**Figure 2A** and **Table 1**), indicating that listeners with a larger working memory capacity adapted faster to degraded speech. Note that this correlation remained significant when the outlier participant with a slope of 0.006 was removed (ρ = 0.62, *p* = 0.18). PTA and age showed a non-significant trend for a negative correlation with adaptation slope, meaning that older age and hearing loss were associated with slower adaptation rates. Similarly, older age and PTA were negatively related to average speech comprehension performance. Finally, age also correlated significantly with PTA, indicating that older adults had greater hearing loss (**Table 1**).

*Significant correlations are shown in bold: \* significant at q* < *0.05;* <sup>+</sup> *significant at q* < *0.1; Slope: adaptation slope, DSF: digit span forward, DSB: digit span backward, PTA: pure-tone average, Perform: Mean vocoded speech comprehension.*

To analyze more closely the relationship between adaptation slope, age, digit span and hearing loss, we calculated Spearman's partial correlation coefficients between these four variables of interest, with two variables partialled out at a time (**Figure 2B**). Adaptation slope still correlated significantly with backward digit span, even after partialling out age and hearing loss, indicating that the latter could not explain the relationship between working memory and adaptation. The correlation between PTA and age also remained significant in the partial correlation. On the other hand, the non-significant trend for a negative correlation of adaptation slope with hearing loss and age broke down in the partial correlation (**Figure 2B**).
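Partialling out two variables at a time can be sketched with the standard recursive formula for partial correlations, applied to rank-transformed data (a Spearman correlation is a Pearson correlation on ranks). The function names and the six-participant example are hypothetical; this is one common way to compute such coefficients and is not necessarily the routine used for Figure 2B.

```python
import math


def rank(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, controls):
    """Partial correlation of x and y given a list of control variables."""
    if not controls:
        return pearson(x, y)
    z, rest = controls[-1], controls[:-1]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def spearman_partial(x, y, controls):
    """Spearman partial correlation: rank-transform, then partial out controls."""
    return partial_corr(rank(x), rank(y), [rank(c) for c in controls])


# Hypothetical values for six participants (not data from the study).
slope = [1, 2, 3, 4, 5, 6]
dsb = [2, 1, 4, 3, 6, 5]
age = [1, 3, 2, 5, 4, 6]
pta = [3, 1, 2, 6, 4, 5]
print(spearman_partial(slope, dsb, [age, pta]))
```

If the coefficient stays high after removing the controls, as in the relationship between adaptation slope and backward digit span reported above, the controls cannot account for the association.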

#### **FUNCTIONAL MRI RESULTS IN OLDER ADULTS**

The results reported below refer to the group of older adults exclusively. For the cohort of young adults, activation clusters and coordinates of peak activity are described in detail in Erb et al. (2013).

#### *Degraded speech processing*

To reveal regions that are engaged in degraded speech processing, we compared degraded with clear speech trials. At a cluster-extent corrected threshold of *p* < 0.001, degraded compared to clear speech elicited an increased BOLD signal in the left anterior insula. On the other hand, clear compared to degraded speech yielded higher activations bilaterally in the precentral gyrus spanning the temporal cortices, supramarginal gyrus (SMG), right putamen, posterior cingulate cortex and angular gyrus bilaterally (**Figure 3A**).

#### *Trial-by-trial fluctuations in degraded speech comprehension*

To identify areas where the BOLD signal varied with trial-by-trial fluctuations in speech comprehension, we tested for correlations with the behavioral report scores. The hemodynamic response linearly increased with comprehension of degraded speech in bilateral temporal cortices comprising Heschl's gyrus, the middle temporal gyrus, the precentral gyrus bilaterally, the putamen bilaterally, left angular gyrus, and middle frontal gyrus (**Figure 3B**). Although scans might have been sensitive to both speech perception and production, we controlled for report latency (see Section Materials and Methods) to model the hemodynamic response to auditory input rather than speech production. Note, however, that including a regressor for report latency might not completely control for production-related activity, such that motor cortical activity (apparent in **Figures 3A, B**) could be due to more speech production during more intelligible compared to less intelligible trials.

#### *Correlation with hearing loss*

We further tested whether hearing acuity had an influence on neural processing. For the contrast degraded > clear speech, we found a negative correlation with PTA in the right and left anterior insula (**Table 2** and **Figure 3C**). The correlation showed the following pattern: Older adults with better hearing acuity had elevated activity for degraded compared to clear speech. Conversely, listeners with greater hearing loss had an increased BOLD signal for clear relative to degraded speech in the anterior insula (**Figure 3C,** right panel). This Pearson's correlation between insula dynamics and hearing loss was significant even after partialling out age (*r* = −0.79, *p* = 0.001), confirming that the correlation was not driven by age.

**Figure 3 (caption fragment) |** … on a given trial, activity increased in a network comprising the auditory cortices, precentral gyrus, left angular gyrus, putamen and middle frontal gyrus. There were no negative correlations with trial-by-trial report scores. **(C)** Negative correlation of hearing loss with … The correlation remains significant when removing the outlier participant with a PTA of 43 dB (*r* = −0.76, *p* = 0.002) and when partialling out age (*r* = −0.79, *p* = 0.001). Note that the y-axis is flipped such that better hearing acuity is plotted higher up.

**Table 2 | Overview of MRI activation clusters showing a correlation with hearing loss or significant group × condition interactions, thresholded at** *p <* **0.001 (cluster-extent corrected)**.


*L: left, R: right, ACC: anterior cingulate cortex, MFG: middle frontal gyrus, SMA: supplementary motor area.*

## **FUNCTIONAL MRI DIFFERENCES BETWEEN YOUNG AND OLDER ADULTS**

#### *Degraded speech processing*

Generally, older and younger adults showed largely overlapping activations during degraded speech processing (see Erb et al., 2013 for the activation clusters in young adults). However, the ACC exhibited a group difference: young adults showed a higher increase in ACC activity when comparing degraded to clear speech than older adults did. This age group × degradation interaction was driven by a reduced dynamic range in older adults, who displayed persistently elevated levels of ACC activity in both conditions (**Figure 4A** and **Table 2**).

Extracting individual percentage signal change values from the ACC region of interest (as identified by the age group × degradation interaction) showed that, across groups, activity dynamics in the ACC were related to performance: Individuals with an overall better degraded speech comprehension showed a higher ACC differentiation for degraded vs. clear speech (**Figure 4B**). However, this correlation was driven by age, because the correlation was not significant in both groups separately (Pearson's correlation in young adults: *r* = −0.07, *p* = 0.73; older adults: *r* = 0.39, *p* = 0.16) and the correlation broke down when controlling for age (*r* = 0.15, *p* = 0.34). Within older adults, however, the correlation reached a non-significant trend, when partialling out age (*r* = 0.46, *p* = 0.09). However, the age group × mean performance interaction in the ACC BOLD signal failed to reach significance (*t*(41) = 1.5, *p* = 0.14).

#### *Trial-by-trial fluctuations in degraded speech comprehension*

Following up on the comprehension-dependent fluctuations observed in the cohort of young listeners (Erb et al., 2013), we also tested for such fluctuations in the older adults. While observing again substantial overlap between groups, an age group × comprehension interaction, also in prefrontal cortex, was manifest: Older adults' BOLD signal positively correlated with comprehension (i.e., report scores) in the middle frontal gyrus (MFG). Young adults did not show such a correlation in MFG. Young adults, on the other hand, exhibited additional correlations with comprehension in the left fusiform gyrus, right cerebellum and posterior cingulate cortex, where older adults did not show a correlation (**Figure 5** and **Table 2**).

**Figure 4 (caption fragment) |** … change for both groups and conditions (upper panel). For comparison, signal change for silent trials is shown, illustrating that older but not younger adults upregulate ACC activity for clear compared to silent trials. **(B)** Correlation of ACC dynamics with performance. A post-hoc ROI analysis showed that across groups, individuals with a better overall degraded speech comprehension (mean report scores) showed a higher differentiation (degraded minus clear speech) in ACC activity.

#### **DISCUSSION**

The current study intended to compare degraded speech processing between young and older adults to characterize how underlying neural mechanisms change with age. The main results can be summarized as follows: Although degraded speech comprehension overall appeared deteriorated in older adults, both older and younger adults adapted to degraded speech at the same rate. Within older listeners, better working memory predicted faster adaptation rates and hearing loss predicted worse speech comprehension. Hearing loss was related to a distinct activation pattern in the anterior insula during degraded speech processing. Young listeners showed an expected modulation of ACC activity depending on task difficulty (i.e., degraded greater than clear speech), whereas older adults displayed elevated levels of ACC activity throughout, consistent with a persistent upregulation in this cognitive-control related area. Within the ACC, a greater dynamic range predicted better speech comprehension. Finally, for correctly comprehended degraded speech trials, older listeners recruited middle frontal regions in addition to a core speech comprehension network which younger listeners relied on, most likely reflecting a compensatory mechanism. We will now discuss these results in more detail.

#### **PERCEPTUAL ADAPTATION IS COMPARABLE IN YOUNG AND OLDER ADULTS**

Even though degraded speech comprehension was substantially reduced in older listeners, their ability to gradually adapt to degraded speech over the course of the experiment was preserved, as the slopes of their linear fits (adaptation curves) did not differ from those of young adults (**Figure 1C**). Yet, as the *R*<sup>2</sup>-values were relatively low (but cf. Davis et al., 2005, for comparable *R*<sup>2</sup>-values), it remains questionable what no difference between age groups in a parameter of a badly fitting model actually means. Importantly however, we have shown that a second, model-free measure of learning (Δ performance) is strongly correlated with adaptation slope, and does not differ between age groups either. We take this as evidence that adaptation is comparable in young and older adults.

This result is consistent with a finding by Peelle and Wingfield (2005) who had shown that older adults' (aged 65–78) perceptual learning of both time-compressed and spectrally shifted noise-vocoded speech was comparable to that of young adults: when older adults' starting accuracy was equated with young adults, both groups' speech comprehension improved at the same rate when listening to 20 degraded sentences. Thus, older and younger adults appear to show equivalent behavioral adaptation to degraded speech. However, this does not guarantee that the underlying neural mechanisms are identical.

Older listeners with a better short-term memory adapted faster to degraded speech. The correlation of adaptation with backward but not forward digit span reached significance. The latter demands simple maintenance and repetition. In contrast, backward digit span requires the listener to perform an operation on the items held in the memory buffer (i.e., invert the order of the digits) and has specific demands on central executive mechanisms (Baddeley, 2012), which are supposedly involved in effortful speech perception. Rabbitt (1968) suggested an effortfulness hypothesis, according to which working memory becomes a limiting factor when perceptual effort is required in degraded speech comprehension: Masked words disrupt the short-term memory buffer, because resources that would otherwise be available for encoding in short-term memory are diverted for perceptual effort (Piquado et al., 2010). Our result lends further support to the hypothesis that such cognitive and perceptual abilities become coupled more tightly at an older age (Baltes and Lindenberger, 1997).

#### **HEARING LOSS DETERIORATES DEGRADED SPEECH COMPREHENSION AND ALTERS INSULA DYNAMICS**

One aim here was to identify how individuals with hearing loss recognize and process degraded speech. Unsurprisingly, a decline in hearing acuity was associated with worse average degraded speech comprehension in older adults. This is best explained by the fact that on top of the exogenous signal degradation (i.e., 4-band vocoding), hearing loss endogenously distorts the signal and reduces audibility (Pichora-Fuller and Souza, 2003).

In addition, hearing loss was accompanied by changes in central neural mechanisms. While older adults with worse hearing activated the anterior insula more for clear than degraded speech, better-hearing older listeners had increased anterior insula activity during degraded relative to clear speech perception. Importantly, this activation pattern in the insula could not be explained by age. In Erb et al. (2013), we had shown that younger adults rely on the anterior insula in difficult listening. Here, it appears that only better-hearing older adults succeed in recruiting the anterior insula in adverse listening conditions, whereas older adults affected by substantial hearing loss show the inverse dynamics. These hearing loss effects were found even though speech was presented at an audible level for each participant, supporting the notion that these are centrally-driven changes.

The insula together with the ACC have commonly been termed the "cingulo-opercular" system (e.g., Harris et al., 2009). Here, an interesting dissociation between insula and ACC dynamics emerges: Anterior insula activation was affected by hearing loss (independent of age), while ACC activity was altered by age (see below).

There is accumulating evidence that the insula plays a crucial role in top-down, executive processes (Eckert et al., 2009; Menon and Uddin, 2010; Sterzer and Kleinschmidt, 2010; Adank, 2012). For example, Wild et al. (2012) showed that in young listeners, the insula exhibited an enhanced BOLD signal when listeners attended to speech (rather than a distracter), and the speech signal was degraded (vocoded rather than clear). Activation in the anterior insula further scaled with task-difficulty of a temporal non-speech auditory task, dependent on attention (Henry et al., 2013).

The current results are somewhat more complicated, as insula engagement depended on the extent of hearing loss; only those listeners with very mild, age-typical hearing loss (< 20 dB HL) exhibited such a pattern, while more hearing-impaired listeners did recruit insular cortex into clear-speech (i.e., already at milder task demands), but less so into degraded-speech processing (higher task demands). Intriguingly, this pattern of insula activation is reminiscent of the CRUNCH hypothesis, according to which older adults show a load-dependent inverted *U*-shaped pattern of activity in cognitive-control-related areas (Reuter-Lorenz and Cappell, 2008). While this finding deserves thorough follow-up experimentation, the insula seems to be crucial for cognitive control (Eckert et al., 2009) and the observed alteration of its hemodynamics might add up to declines of the peripheral auditory system and manifest in the observed deterioration in speech recognition.

#### **PERSISTENT UPREGULATION OF ANTERIOR CINGULATE ACTIVITY REFLECTS INCREASED COGNITIVE CONTROL IN OLDER ADULTS**

The main objective of the current study was to identify age differences in the neural systems supporting degraded speech processing. We found an age group × degradation interaction in the ACC, where young listeners showed a higher activation difference between degraded and clear speech than older adults. The latter displayed elevated levels of ACC activation during both conditions, indicating a reduced dynamic range in older adults. Thus, the older adults' BOLD signal in the ACC appears to be less informative and flexible, as it is differentiating less between a degraded and clear speech input. This is consistent with an age-related decrease in the variability of the hemodynamic signal (Garrett et al., 2010).

As we intended to compare the results in the older group to the ones obtained in young adults (Erb et al., 2013), we normalized older adults to a young adult (MNI) template. This can be problematic due to group differences in brain morphology. For example, partial volume effects, that is, sampling both gray and white matter in one voxel, may increase in older adults, who commonly exhibit gray matter atrophy in frontal regions. One solution to overcome this problem is to only test for the interaction (Samanez-Larkin and D'Esposito, 2008). As we found an age group × condition interaction in the ACC, it is unlikely that older adults' smaller dynamic range in ACC activity could be driven by the non-diffeomorphic normalization of older adults to a young adult (MNI) template.

Age-related functional and structural changes in frontal lobe systems supporting cognitive control have been previously noted (Cabeza et al., 2004; Colcombe et al., 2005; Sharp et al., 2006; Eckert et al., 2008; Harris et al., 2009). For example, Harris et al. (2009) showed that ACC activity increases for incorrect compared to correct word recognition, but more so in older adults. The authors linked this to auditory cortex architecture, showing that ACC recruitment correlates with age-related neurodegeneration of the auditory cortex. Along the same line, Sharp et al. (2006) showed that aging is accompanied by greater cognitive control, as indexed by higher ACC and prefrontal cortex activity during semantic and syllabic decisions on noise-vocoded words. Both Sharp et al. (2006) and Harris et al. (2009) observed that activity in ACC increased with age and was detrimental to performance. Therefore, they interpreted the age-related increase of ACC activation as upregulation of error monitoring systems (Dosenbach et al., 2006, 2007).

In contrast, the present findings show that a higher dynamic range of ACC activity (i.e., the degree to which the ACC became relatively engaged and disengaged in degraded and clear speech, respectively) was associated with better speech comprehension. However, this correlation was best explained by age, as the correlation broke down when correlating groups separately. The following picture emerges: Dynamic range of ACC activity decreases with age which in turn is detrimental to speech comprehension. Rather than playing a compensatory role for deficits due to aging (Cabeza et al., 2002), the observed ACC dynamics might reflect a generalized upregulation of cognitive control with age irrespective of task difficulty (see also the discussion of dedifferentiation vs. compensation hypothesis below).

#### **COMPENSATORY PREFRONTAL ACTIVITY DURING SPEECH COMPREHENSION IN OLDER ADULTS**

For successful speech comprehension, younger adults activated a perisylvian network (Erb et al., 2013), where the BOLD signal was tightly coupled to actual speech comprehension (i.e., behavioral report scores), rather than acoustic properties of the sentences. Older adults additionally showed a correlation with report scores in middle frontal gyrus. Eckert et al. (2008) similarly demonstrated that older adults engage the MFG when words are most intelligible. However, their design varied speech intelligibility by low-pass filtering words such that effects due to acoustic differences could not be disentangled from actual comprehension. In contrast, the current study held physical stimulus properties constant (i.e., 4-band vocoding) and thus identified regions where activation varied with actual speech comprehension (i.e., behavioral report scores). Therefore, the present data provide evidence that additional MFG activation observed in older adults is related to comprehension rather than physical speech clarity.

Age-related additional recruitment of middle frontal or lateral prefrontal cortex has been repeatedly observed, for example, during working memory tasks (Cabeza et al., 2002), visual attention (Cabeza et al., 2004), word recognition (Eckert et al., 2008), and for processing of syntactically complex sentences (Peelle et al., 2010b). Two hypotheses have been put forward to explain the frequently observed recruitment of additional brain regions by older adults not observed in young adults: The dedifferentiation hypothesis (Baltes and Lindenberger, 1997) interprets the extra activation as difficulties in recruiting specialized neural mechanisms for the relevant task, hence as a loss of neural specialization (Park et al., 2004). Such a generalized non-functional spread of activation has been attributed to a deficit in neurotransmission with a decrease in signal-to-noise ratio in neural firing (Li and Lindenberger, 1999). If this hypothesis is true, engagement of additional regions should not correlate with task performance. On the other hand, the compensation hypothesis suggests that recruitment of additional brain regions plays a compensatory role, for example in counteracting performance decline due to neurodegeneration in specialized brain areas (e.g., the auditory cortex; Harris et al., 2009; Peelle et al., 2011; Eckert et al., 2012), and should therefore improve performance (Cabeza et al., 2002; Heuninckx et al., 2008).

In the current study, engagement of the MFG in older adults covaried with report scores, that is, MFG activity did in fact increase with better performance. We take this as evidence for a compensatory mechanism in older adults, whereby the MFG is recruited in addition to the core speech comprehension network (Erb et al., 2013) when speech comprehension succeeds. In sum, our data contribute to, but cannot solve, the ongoing debate of dedifferentiation vs. compensation, in that the age-group differences observed in two prefrontal areas, ACC and MFG, are to be interpreted with opposing conclusions regarding the compensation hypothesis by Cabeza et al. (2002, 2004).

## **CONCLUSIONS**

Our results show distinct age-related changes of responses in prefrontal cortex. Higher anterior cingulate and middle frontal gyrus activities are associated with better performance in adverse listening conditions. However, unlike younger adults, older adults do not succeed in selectively modulating ACC activity depending on listening difficulty, but exhibit generalized upregulated levels of ACC activity also in easy listening conditions. In contrast, MFG activity appears to be truly compensatory, as older adults recruit frontal areas in addition to a speech comprehension network when comprehension succeeds. Moreover, more hearing-impaired older adults involve the anterior insula more even in clear speech comprehension. As all three structures have been linked to cognitive control, the results provide further evidence that older adults increasingly rely on cognitive control networks when adapting to challenging listening conditions, at the potential expense of these systems' dynamic range.

#### **AUTHOR CONTRIBUTIONS**

Julia Erb designed and conducted research, analyzed the data and wrote the paper and Jonas Obleser designed research, analyzed the data and wrote the paper.

#### **ACKNOWLEDGMENTS**

Research is supported by the Max Planck Society (Max Planck Research group grant to Jonas Obleser). The authors thank their colleagues in the Max Planck Research Group "Auditory Cognition" for initial discussions and two reviewers for their thoughtful comments. Sylvie Neubert and Nicole Pampus helped acquire the behavioral and MR data.

## **REFERENCES**


Wild, C. J., Yusuf, A., Wilson, D. E., Peelle, J. E., Davis, M. H., and Johnsrude, I. S. (2012). Effortful listening: the processing of degraded speech depends critically on attention. *J. Neurosci.* 32, 14010–14021. doi: 10.1523/jneurosci.1528-12.2012

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 09 August 2013; accepted: 05 December 2013; published online: 24 December 2013*.

*Citation: Erb J and Obleser J (2013) Upregulation of cognitive control networks in older adults' speech comprehension. Front. Syst. Neurosci. 7:116. doi: 10.3389/fnsys.2013.00116*

*This article was submitted to the journal Frontiers in Systems Neuroscience*.

*Copyright © 2013 Erb and Obleser. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## The effect of mild-to-moderate hearing loss on auditory and emotion processing networks

### *Fatima T. Husain<sup>1,2,3</sup>\*, Jake R. Carpenter-Thompson<sup>2,3</sup> and Sara A. Schmidt<sup>2,3</sup>*

*<sup>1</sup> Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign, IL, USA*

*<sup>2</sup> The Neuroscience Program, University of Illinois at Urbana-Champaign, Champaign, IL, USA*

*<sup>3</sup> Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Champaign, IL, USA*

#### *Edited by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

#### *Reviewed by:*

*Carolyn McGettigan, Royal Holloway University of London, UK*

*Conor J. Wild, Western University, Canada*

#### *\*Correspondence:*

*Fatima T. Husain, Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, 901 S. Sixth Street, Champaign, IL 61820, USA e-mail: husainf@illinois.edu*

We investigated the impact of hearing loss (HL) on emotional processing using task- and rest-based functional magnetic resonance imaging. Two age-matched groups of middle-aged participants were recruited: one with bilateral high-frequency HL and a control group with normal hearing (NH). During the task-based portion of the experiment, participants were instructed to rate affective stimuli from the International Affective Digital Sounds (IADS) database as pleasant, unpleasant, or neutral. In the resting state experiment, participants were told to fixate on a "+" sign on a screen for 5 min. The results of both the task-based and resting state studies suggest that NH and HL patients differ in their emotional response. Specifically, in the task-based study, we found slower response to affective but not neutral sounds by the HL group compared to the NH group. This was reflected in the brain activation patterns, with the NH group employing the expected limbic and auditory regions including the left amygdala, left parahippocampus, right middle temporal gyrus and left superior temporal gyrus to a greater extent in processing affective stimuli when compared to the HL group. In the resting state study, we observed no significant differences in connectivity of the auditory network between the groups. In the dorsal attention network (DAN), HL patients exhibited decreased connectivity between seed regions and left insula and left postcentral gyrus compared to controls. The default mode network (DMN) was also altered, showing increased connectivity between seeds and left middle frontal gyrus in the HL group. Further targeted analysis revealed increased intrinsic connectivity between the right middle temporal gyrus and the right precentral gyrus. The results from both studies suggest neuronal reorganization as a consequence of HL, most notably in networks responding to emotional sounds.

**Keywords: fMRI, hearing loss, resting-state fMRI, functional connectivity, emotion, IADS**

## **INTRODUCTION**

Hearing loss (HL) remains one of the most common chronic conditions affecting older adults (Cruickshanks et al., 1998), with the prevalence rate increasing from between 25 and 40% in adults above 65 years of age to greater than 80% in people older than 85 years (Yueh et al., 2003). In general, mild-to-moderately severe sensorineural HL, which is often untreated, affects about 23–33% of the adult population in the world (Stevens et al., 2013). HL has a significant impact on quality of life and general well-being of an older adult (Mulrow et al., 1990; Carabellese et al., 1993) and may be associated with depression and isolation, especially in those younger than 70 years of age (Gopinath et al., 2009). However, little is known about the consequences of mild-to-moderately severe HL on the neural architecture and functionality of the brain.

The majority of brain imaging studies that have explored HL have done so when HL has occurred in conjunction with tinnitus (e.g., Weisz et al., 2004; Husain et al., 2011b), other disorders (e.g., Yoneda et al., 2012) or in the context of sign language studies when the impairment has been profound (e.g., Petitto et al., 2000; Husain et al., 2009). A few brain imaging studies have investigated the impact of HL on aging (Wong et al., 2010; Peelle et al., 2011); these remain the best sources to understand the neural correlates of HL. These neural correlates are manifested as a decrease in gray matter in the frontal cortex (Wong et al., 2010; Peelle et al., 2011) and a decreased response in the superior temporal cortex, thalamus and brainstem in a speech comprehension task (Peelle et al., 2011). Our previous structural MRI study of HL in older adults (conducted as part of a larger study to investigate neural bases of tinnitus and HL) observed gray matter loss in the frontal cortices and disordered white matter tracts leading into and out of the auditory cortex (Husain et al., 2011a). A companion functional study noted increased response of regions in the frontal and parietal cortices, possibly related to increased attention processing (Husain et al., 2011b). In the latter fMRI study, participants had mild-to-moderately severe HL with an average age in the mid-fifties and were asked to perform a discrimination task with simple tones and tonal sweeps. When compared to rest trials, task trials resulted in greater response in the temporal, frontal and parietal cortices in the HL group relative to the normal hearing (NH) controls.

In the present study, we investigated the neural correlates of mild-to-moderate sensorineural HL in middle-aged adults without tinnitus or any other chronic physical or mental condition and compared them to age-matched NH controls. We concentrated on extra-auditory networks, specifically the one concerned with emotional processing. The limbic system is typically the main network of emotion processing. It consists primarily of the amygdala, parahippocampus, ventral medial prefrontal cortex, nucleus accumbens, and insula. Recently, studies have begun to map out the regions and connectivity of the auditory emotional network in adults with NH (Blood and Zatorre, 2001; Koelsch et al., 2006; Kumar et al., 2012) and in those with tinnitus (Giraud et al., 1999; Mirz et al., 2000; Seydell-Greenwald et al., 2012; Golm et al., 2013). Using dynamic causal modeling, Kumar et al. (2012) found that the negative valence of a sound modulates the backward connections from the amygdala to the auditory cortex, and the acoustic features of a sound modulate the forward connections from the auditory cortex to the amygdala. It is likely that such acoustic features, processing nodes and their connectivity may be susceptible to changes due to sustained loss of hearing acuity when listening to affective sounds. This may result in delayed processing or misclassification or both of the affective sounds. One of the goals of the present study was to investigate whether loss of hearing acuity affects the acoustic processing of affective sounds and whether this impacts their perception.

HL may also affect the perception of the valence of affective sounds, regardless of the processing of acoustic features. Tinnitus, in particular, has been established to have an altered auditory-limbic system link (Jastreboff, 1990; Rauschecker et al., 2010); behaviorally, there is greater prevalence of depression and anxiety in the tinnitus patient group compared to the general population (Bartels et al., 2008). Not surprisingly then, this disordered link is beginning to be studied in tinnitus. However, HL occurs in about 90% of the individuals with tinnitus (Davis and Rafaie, 2000), and the unique contribution of tinnitus to any changes in emotional processing is unknown. There are other reasons to study emotional processing in HL. As previously noted, prevalence of HL increases with age (Yueh et al., 2003) and may contribute to social isolation (Gopinath et al., 2009). This in turn may impact the emotion processing limbic network, as has been shown to occur with aging and with tinnitus (Mather and Knight, 2005; Rauschecker et al., 2010; St Jacques et al., 2010; Anticevic et al., 2012). Nevertheless, no brain imaging study has explicitly investigated the emotion processing network in older adults with HL.

We conducted both a task-based and a resting state functional connectivity study of the emotion processing network in middle-aged adults with bilateral sensorineural HL. The task consisted of classification of sounds as being "pleasant," "unpleasant," or "neutral." Our working hypothesis was that a loss of hearing acuity affects behavior: sounds may appear more unpleasant (Franks, 1982; Feldmann and Kumpf, 1988; Leek et al., 2008; Rutledge, 2009; Uys et al., 2012), and reaction times may be longer due to effortful listening (Hicks and Tharpe, 2002; Tun et al., 2009). Likewise, the neural network subserving emotion processing may be affected, specifically in the response patterns of the nodes. In order to assess the impact of HL on a baseline, resting state, without the distraction of any task, we measured the functional connectivity of a number of networks, including that connected to the amygdala (a primary node of the limbic system).

Our main focus was on auditory and limbic systems, but these systems interact with intrinsic networks devoted to attention processing and possibly the default mode network (DMN). Intrinsic networks, or resting state networks, are defined as spontaneous, low-frequency oscillations in brain activity that can be organized into coherent networks. In many cases, intrinsic networks mirror task-related networks. For example, the auditory resting state network closely resembles an auditory task network. However, instead of correlations between activated regions in the task-based network, the intrinsic network shows correlations between deactivated brain regions. The DMN is the quintessential resting state network and is unique in that it exhibits deactivation in a task-based state and is active during rest (Raichle et al., 2001). The DMN exhibits a push-pull type of relationship with other networks in the brain (Fox et al., 2005). The dorsal attention network (DAN), for instance, will show activations while the DMN is deactivated (in a task-based state). Its relationship with the DAN and other intrinsic networks warrants study of the DMN. Many disorders, including Alzheimer's disease, schizophrenia, and tinnitus, have been shown to affect the connectivity of the DMN (for reviews see, Greicius, 2008; Husain and Schmidt, 2013). It is also possible that intrinsic connectivity may differ in patients with HL, perhaps relating in particular to limbic areas. Alterations to resting state functional connectivity may result in decreased preparedness to perform a task. In particular, if limbic areas are shown to display irregular connectivity to other brain regions at rest, it may change how individuals process emotional stimuli.

Although we had provisional hypotheses, our study was exploratory because of a lack of brain imaging studies investigating the impact of HL on emotion processing and related extra-auditory networks. In the resting state portion of our study, general hypotheses were made regarding which networks may show altered connectivity, but we did not specify the nodes and the nature of these alterations. We had more constrained expectations about the role of amygdala and auditory processing areas in the emotion-task study, in that we expected reduced engagement of such regions in listeners with HL when processing affective stimuli. In sum, we conducted a comprehensive study, combining both task- and rest-based fMRI using multiple seed regions in order to establish a baseline for future studies.

## **METHODS**

#### **SUBJECTS**

Participants were recruited from the Urbana-Champaign area, were scanned under the UIUC IRB 10144 protocol, gave written informed consent, and were monetarily compensated. Subjects belonged to one of two groups: middle-aged adults with bilateral high-frequency sensorineural HL (*n* = 12), or their age- and gender-matched controls with NH (*n* = 15). Three NH subjects were excluded due to excessive motion artifact, and data from only 12 NH participants were included in the final analysis. During recruitment, we excluded subjects who presented with tinnitus or with asymmetric HL at the time of their audiological examination. We defined asymmetric HL as a difference of more than 15 dB HL between the right and left ears at one or more frequencies, or a difference of 10 dB HL between the right and left ears at two consecutive frequencies. The Beck Depression Inventory (BDI-II) and the Beck Anxiety Inventory (BAI) were used to assess depression and anxiety levels (Beck and Steer, 1984; Steer et al., 1993, 1999). All subjects scored in the minimal depression range and minimal anxiety range for the BDI-II and BAI, respectively. See **Table 1** for information about subject demographics.
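The asymmetry exclusion rule described above can be sketched in a few lines of Python. This is only an illustration: the function name and arguments are hypothetical, and we assume that the two-consecutive-frequency criterion means a difference of at least 10 dB HL at both adjacent test frequencies, which the text does not state explicitly.

```python
def is_asymmetric(right_db, left_db, single_limit=15, pair_limit=10):
    """Return True if the two ears meet the asymmetry exclusion rule.

    right_db, left_db: pure-tone thresholds in dB HL, ordered by
    ascending test frequency (hypothetical input format).
    Assumption: the consecutive-frequency criterion is read as
    "at least pair_limit dB" at both adjacent frequencies.
    """
    diffs = [abs(r - l) for r, l in zip(right_db, left_db)]
    # Criterion 1: more than 15 dB HL difference at one or more frequencies.
    if any(d > single_limit for d in diffs):
        return True
    # Criterion 2: a 10 dB HL difference at two consecutive frequencies.
    return any(a >= pair_limit and b >= pair_limit
               for a, b in zip(diffs, diffs[1:]))
```

A single 15 dB HL difference would not trigger exclusion under this reading, because the first criterion requires more than 15 dB HL.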

#### **AUDIOMETRIC EVALUATION**

A comprehensive audiometric evaluation was performed for each subject. Audiological testing took place inside a sound-attenuating booth and included pure tone testing, word recognition testing, and bone conduction testing. Additional tests included distortion product otoacoustic emissions and tympanometry measurements to eliminate any confounding peripheral hearing pathologies. For pure tone testing, the test frequencies included 0.25, 0.5, 1, 2, 4, 6, and 8 kHz. For all of the test frequencies, each subject in the NH group had pure-tone thresholds of 25 dB HL or lower. Participants in the HL group had a pure-tone threshold of 30 dB HL or lower for the test frequencies 0.25–2 kHz, with the exception of two HL subjects who had a slightly elevated threshold of 35 dB HL at 1 kHz. For the test frequencies 4–8 kHz, the HL subjects had pure-tone thresholds between 30 and 70 dB HL, ranging from mild to moderately severe HL. **Table 1** contains information about the average hearing at testing frequencies for each subject group. None of the HL participants relied on aided hearing.
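If the threshold ranges above are read as group membership rules, they can be expressed as simple checks. The sketch below is an assumption-laden illustration, not the screening procedure actually used: the function names and the dictionary input format are hypothetical, and the `low_limit` parameter is introduced only to show how the two subjects with 35 dB HL at 1 kHz could be accommodated.

```python
LOW_KHZ = (0.25, 0.5, 1, 2)   # low and mid test frequencies (from the text)
HIGH_KHZ = (4, 6, 8)          # high test frequencies (from the text)

def meets_nh_criterion(audiogram):
    """audiogram: dict mapping frequency in kHz to threshold in dB HL
    (hypothetical format). NH: 25 dB HL or lower at every test frequency."""
    return all(audiogram[f] <= 25 for f in LOW_KHZ + HIGH_KHZ)

def meets_hl_criterion(audiogram, low_limit=30):
    """HL: low_limit dB HL or lower at 0.25-2 kHz and 30-70 dB HL at 4-8 kHz.
    low_limit is a hypothetical knob for the reported 35 dB HL exceptions."""
    low_ok = all(audiogram[f] <= low_limit for f in LOW_KHZ)
    high_ok = all(30 <= audiogram[f] <= 70 for f in HIGH_KHZ)
    return low_ok and high_ok
```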

#### **DATA ACQUISITION**

A 3 Tesla Siemens Magnetom Allegra head-only scanner was used to acquire all MRI images. A series of two anatomical and two functional images was acquired: the first functional scan was of the emotional task and the second acquired data on resting state; order of acquisition varied across the subjects. Thirty-two low-resolution T2-weighted structural transversal slices (*TR* = 7260 ms, *TE* = 98 ms) were collected for each volume with a 4.0 mm slice thickness and a 0.9 × 0.9 × 4.0 mm<sup>3</sup> voxel size [matrix size (per slice), 256 × 256; flip angle, 150°]. We obtained 160 high resolution magnetization-prepared rapid-acquisition with gradient echo (MPRAGE) sagittal slices for each volume that were 1.2 mm in thickness with a 1.0 × 1.0 × 1.2 mm<sup>3</sup> voxel size [*TR* = 2300 ms; *TE* = 2.83 ms; matrix size (per slice), 256 × 256; flip angle, 9°]. The functional images were acquired using the following parameters: slice thickness, 4 mm; inter-slice gap, 0.4 mm; 32 axial or transverse slices, distance factor, 10%; voxel size, 3.4 × 3.4 × 4.0 mm<sup>3</sup>; field of view (FoV) read, 220 mm; TR, 9000 ms with 2000 ms acquisition time; TE, 30 ms; matrix size (per slice), 64 × 64; flip angle, 90°. Functional images were acquired separately for (a) an emotional task and (b) a resting state study.

*Pure tone audiogram information at each frequency is averaged across both ears and all subjects. BDI, Beck Depression Inventory; BAI, Beck Anxiety Inventory.*

#### *(a) Emotion task*

To prevent the loud noise generated by the radio-frequency gradients during image acquisition from interfering with the perception of the stimuli, we used clustered echo-planar imaging (EPI) (Hall et al., 1999; Gaab et al., 2003; Zaehle et al., 2004). Clustered EPI, or sparse sampling, was chosen particularly to improve the listening environment for the subjects with HL. Rather than acquiring images continuously, we presented each stimulus during a period of "relative quiet," when the radio-frequency gradients were switched off and the only source of ambient noise was the scanner pump, and collected one image volume (2 s acquisition time) after stimulus presentation. The repetition time was 9 s, and within each trial a 6 s stimulus was presented during a 7 s interval of reduced scanner noise. To optimize the scanning procedure for detecting neural responses from regions within the limbic system, a custom MATLAB (http://www.mathworks.com/products/matlab/) toolbox was used prior to data acquisition to fine-tune the timing of stimulus presentation relative to image acquisition (Dolcos and McCarthy, 2006). Stimuli were selected from the International Affective Digital Sounds (IADS) database and had normative scores for valence and arousal; sounds were rated on a scale of 1–9 (9 very pleasant, 1 very unpleasant for valence and 9 very arousing, 1 not at all arousing for arousal) (Bradley and Lang, 2007). Normative scores were as follows: pleasant (valence: 6.83 ± 0.54, arousal: 6.46 ± 0.56), unpleasant (valence: 2.78 ± 0.58, arousal: 6.9 ± 0.31) and neutral (valence: 4.81 ± 0.43, arousal: 4.85 ± 0.57). The normative valence ratings for pleasant and unpleasant sounds differed significantly (*p* < 0.00001), but their arousal scores did not differ. **Supplementary Table 1** contains a complete list of the affective sounds used in the present study.
We presented the sounds in the scanner at the maximum comfortable level indicated by each participant, during the relatively quiet intervals of image acquisition. Prior to data collection, subjects were given both written and verbal instructions. Additionally, subjects were trained using sounds from the IADS database, different from the stimuli chosen in the experiment, to familiarize the participants with the task. During the final experiment, Presentation 14.7 software (www.neurobs.com) on a Windows XP computer in the fMRI control room was used to deliver sounds and instructions via pneumatic headphones (Resonance Technology, Inc., Northridge, CA). To complete the task, subjects listened to 90 affective sounds [30 pleasant (P), 30 neutral (N), 30 unpleasant (U)], each 6 s in duration, and were instructed to rate the sound as P, N, or U as soon as they felt confident in their rating.
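The sparse-sampling trial structure described above (9 s repetition time, 6 s stimulus within a 7 s quiet interval, 2 s volume acquisition) can be laid out as a simple timeline. The following Python sketch is illustrative only. It assumes that each trial begins with the quiet interval and ends with the acquisition, and `stim_offset_s` is a hypothetical parameter, since the tuned stimulus delay is not reported.

```python
TR_S = 9.0               # repetition time (from the text)
ACQ_S = 2.0              # volume acquisition time (from the text)
STIM_S = 6.0             # stimulus duration (from the text)
QUIET_S = TR_S - ACQ_S   # 7 s interval of reduced scanner noise

def trial_schedule(n_trials, stim_offset_s=0.0):
    """Return (stimulus_onset, stimulus_end, acquisition_onset) per trial, in s.

    Assumptions: quiet interval first, acquisition last in each trial;
    stim_offset_s (delay of the stimulus within the quiet interval) is
    hypothetical because the fine-tuned value is not given in the text.
    """
    if stim_offset_s < 0 or stim_offset_s + STIM_S > QUIET_S:
        raise ValueError("stimulus must fit inside the quiet interval")
    schedule = []
    for i in range(n_trials):
        start = i * TR_S
        stim_on = start + stim_offset_s
        schedule.append((stim_on, stim_on + STIM_S, start + QUIET_S))
    return schedule

schedule = trial_schedule(90)       # 90 affective sounds
total_minutes = 90 * TR_S / 60      # run length if it holds only these trials
```

Under these assumptions the 6 s stimulus leaves at most 1 s of slack inside the quiet interval, which is the window within which the stimulus timing could be adjusted.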

#### *(b) Resting state*

To acquire resting state data, continuous scanning was employed instead of clustered image acquisition. During the resting scan, which lasted approximately 5 min, subjects were instructed to lie still and to look at a fixation cross for the scan duration. One hundred and fifty volumes were collected for each subject. The first four images were discarded prior to preprocessing, leaving 146 volumes for analysis.

#### **PREPROCESSING**

Preprocessing was similar for both types of functional scans. Statistical Parametric Mapping 8 (SPM, Wellcome Trust Centre for Neuroimaging, http://www.fil.ion.ucl.ac.uk/spm/software/spm8/) software was used to analyze the functional imaging data. The images were first realigned using a rigid body transformation to control for head motion. Next, the low resolution Axial T2 (AxT2) image was registered to the mean fMRI image generated during the first step. The high resolution MPRAGE image was then registered to the AxT2 image. To normalize the functional images to MNI space, the MPRAGE was normalized to match a standard T1 MNI template. The normalized images were then smoothed using a Gaussian kernel of 8 × 8 × 8 mm<sup>3</sup> full width at half-maximum. To account for artifacts created by head motion, data from 3 NH subjects who displayed excessive motion (defined as at or above ±1.5 mm translation and ±1.5° rotation in any direction) were not included in further analysis. We also included motion regressors as covariates of no interest in the general linear models created in the different statistical analyses, in order to (partially) remove motion-related artifacts. Further, *t*-tests of root-mean-square estimates of both rotational and translational movement showed no statistically significant difference between the two groups [mean ± standard deviation; translational motion: NH (0.63 ± 0.28), HL (0.67 ± 0.23); rotational motion: NH (0.01 ± 0.005), HL (0.01 ± 0.004)].
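The motion screening described above can be sketched as follows. This is a hedged illustration rather than the procedure used in SPM8: the function names and input format are hypothetical, we assume a subject is flagged when either a translation or a rotation reaches the limit (the text's "and" could also be read as requiring both), and the root-mean-square definition shown is one plausible choice because the exact formula is not given.

```python
import math

def exceeds_motion_limit(translations_mm, rotations_deg, limit=1.5):
    """translations_mm, rotations_deg: lists of (x, y, z) tuples, one per volume.

    Assumption: flag the subject if any translation OR any rotation is at
    or above the limit; requiring both is an alternative reading of the text.
    """
    max_trans = max(abs(v) for vol in translations_mm for v in vol)
    max_rot = max(abs(v) for vol in rotations_deg for v in vol)
    return max_trans >= limit or max_rot >= limit

def rms_motion(params):
    """Root-mean-square of the per-volume displacement magnitude
    (one plausible definition; the source does not specify it)."""
    squared = [x * x + y * y + z * z for x, y, z in params]
    return math.sqrt(sum(squared) / len(squared))
```

The per-subject values returned by `rms_motion` are the kind of summary that could then be compared between groups with a two-sample *t*-test.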

#### **DATA ANALYSIS**

#### *(a) Emotion task*

Behavioral data were obtained in the scanner during fMRI data acquisition. We collected individual subject ratings of each sound as P, N, or U along with reaction times. Subject ratings and reaction times were analyzed using separate two-way ANOVA tests in SPSS ver. 20 software (Statistical Package for the Social Sciences, IBM, http://www-01.ibm.com/software/analytics/spss/). Group (NH, HL) and condition (P, N, U) were set as independent fixed factors in a general linear model within SPSS, and significance was set at *p* < 0.05.

For data analysis, trials were coded based upon each individual's subjective rating of the affective sounds. We chose to employ the individual ratings to classify trials as "P," "N," or "U," rather than the norms reported in IADS. Using either the IADS classification or the individual ratings is a valid method of classifying the individual trials for further analysis. The trend in affective neuroscience literature is to move away from the normative classification to the individual classification, particularly when examining special populations where it is expected that emotional responses are altered (e.g., older adults or patient populations) (St Jacques et al., 2010). It should be noted that the ratings reported with the IADS were obtained from younger adults with NH (Bradley and Lang, 2007). In the present study, the subject population was older, with a mean age of 51.4 ± 9.9 for NH adults and 58.2 ± 9.5 for those with HL. Therefore, individual ratings were used rather than the norms for classifying the trials obtained during fMRI scanning. First level fixed effects analysis was performed on each subject's smoothed images to generate P > N and U > N contrast images. Motion parameters were included in the first level model as covariates of no interest. The contrast images were then included in the flexible factorial analysis and *post-hoc* two-sample *t*-tests at the second level. The P + U > N contrast was computed by performing a *t*-test on the condition vectors for each group separately in the flexible factorial model. The three-factor design included group, subject and condition. Group was assumed to be independent and have unequal variance, subject was assumed to be independent and possess equal variance, and condition was assumed to be a dependent factor and to have equal variance. To directly compare the NH and HL groups, we conducted *post-hoc* two-sample *t*-tests within the flexible factorial model.
Additionally, we performed a region-of-interest (ROI) analysis using the Wake Forest University (WFU) pickatlas toolbox (http://www.fmri.wfubmc.edu) within SPM8, with regions defined anatomically based on the human MNI atlas within the toolbox. Based on our *a priori* hypothesis about the involvement of auditory and limbic regions in affective sound processing, we created a single anatomically-defined mask via selecting regions including the amygdala, insula, parahippocampus, nucleus accumbens, ventral medial prefrontal cortex, inferior colliculus, medial geniculate body and primary auditory cortex (Brodmann areas 42, 41, 22). ROI analysis was performed on the NH (P + U > N), HL (P + U > N), and between group contrasts, and small volume correction (SVC) was employed. All clusters identified in the results were reported using a significance level set at *p* < 0.025 FWE at either the voxel or cluster level (threshold halved from the standard *p* < 0.05 to account for a two-tailed *t*-test).
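The choice to code trials by each participant's own rating rather than by the IADS norms can be illustrated with a short Python sketch. Everything here is hypothetical scaffolding: the dictionary keys, the function names, and the decision to drop trials without a valid response are assumptions made for illustration, not details reported in the text.

```python
def code_trials(trials):
    """Group stimulus ids by the participant's own rating.

    trials: list of dicts with hypothetical keys 'sound' (stimulus id),
    'normative' (IADS category: 'P', 'N' or 'U') and 'rating' (the
    participant's response, or None when no response was recorded).
    Assumption: trials without a valid response are left out.
    """
    coded = {"P": [], "N": [], "U": []}
    for t in trials:
        if t["rating"] in coded:
            coded[t["rating"]].append(t["sound"])
    return coded

def agreement_with_norms(trials):
    """Fraction of rated trials whose rating matches the normative category."""
    rated = [t for t in trials if t["rating"] is not None]
    return sum(t["rating"] == t["normative"] for t in rated) / len(rated)
```

The resulting per-category trial lists would define the condition vectors for the first level model, and the agreement fraction gives a quick sense of how far a participant departs from the norms.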

#### *(b) Resting state*

Preprocessing of the resting state data began with slice time correction for the interleaved, ascending data collection. Following that, the same preprocessing steps used for the emotion task were applied. The Functional Connectivity Toolbox (Conn) (Whitfield-Gabrieli and Nieto-Castanon, 2012) for MATLAB was used for data analysis. The smoothed fMRI data were band-pass filtered from 0.008 to 0.08 Hz. The average BOLD timeseries of the segmented white matter and cerebrospinal fluid as well as the realignment/motion parameters generated during preprocessing were regressed out of the data. Then, seed-to-voxel analysis was performed to analyze the auditory resting state network, the DMN, and the DAN. Connectivity was assessed using pairs of seed regions for both the DMN and auditory networks; correlations between each seed and the whole brain were measured and averaged across seed pairs. Seed locations are listed in **Table 2**. For the auditory network, seeds were located in the bilateral primary auditory cortices. For the DMN, they were located in the medial prefrontal cortex and the posterior cingulate cortex. The DAN was examined using four seeds in the left and right posterior intraparietal sulci and the left and right frontal eye fields (Burton et al., 2012). All seeds were created using Marsbar (Brett et al., 2002) and were 5 mm in radius. Coordinates of seed regions were the same as those used in Schmidt et al. (2013). The resting state data used in the present study were partially described in the Schmidt et al. (2013) study, but have been re-analyzed for the purpose of this study. Correlation maps of the whole brain were created for each seed and then averaged over all seeds for a specific network for each subject. These correlations were then z-transformed, group averages were computed, and across-group comparisons were made via two-sample *t*-tests in the Conn toolbox (Whitfield-Gabrieli and Nieto-Castanon, 2012).
Results were then exported to SPM8 for display purposes. After whole brain analysis at a *p* < 0.001 uncorrected threshold, clusters that were significant at *p* < 0.025 FWE at either the voxel or cluster level were selected (the threshold was halved to account for both tails of the *t*-test), with cluster extent set at 25 voxels.
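The seed-to-voxel step described above (correlate each seed with every voxel, average across the seeds of a network, then z-transform) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Conn toolbox implementation: the function names and list-based inputs are hypothetical, the time series are assumed to be already filtered and denoised, and the z-transform is taken to be the Fisher r-to-z transform.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def network_map(seed_series, voxel_series):
    """Return one Fisher z value per voxel for a network.

    seed_series: list of seed time series (one per seed in the network).
    voxel_series: list of voxel time series.
    Follows the order in the text: average r across seeds, then z-transform.
    Note: math.atanh raises ValueError if the averaged r is exactly +1 or -1.
    """
    z_map = []
    for vox in voxel_series:
        mean_r = sum(pearson_r(s, vox) for s in seed_series) / len(seed_series)
        z_map.append(math.atanh(mean_r))  # Fisher r-to-z transform
    return z_map
```

Per-subject maps of this kind are what would enter the group averages and the two-sample *t*-tests.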

**Table 2 | Seed regions for the resting state functional connectivity analysis, consisting of seeds for canonical resting state networks and for networks based on local maxima from the results of the emotion task study.**

*Coordinates are listed in Montreal Neurological Institute (MNI) coordinates.*

A seeding analysis designed to examine resting state network connectivity was performed using ROIs determined from published studies as stated earlier, as well as using ROIs identified from the task-based study. The latter ROIs included the left amygdala, left inferior parietal lobule, left superior frontal gyrus, right middle temporal gyrus, and the right superior parietal lobule (**Table 2**). The left amygdala seed was created based on the NH > HL (P + U > N) contrast from the task ROI analysis. The right superior parietal lobule and left inferior parietal lobule were both based on the HL > NH (P + U > N) contrast, and the right middle temporal gyrus and left superior frontal gyrus seeds were based on the NH > HL (P + U > N) contrast from the emotion task results. All of these ROIs were determined from group-level contrasts. Peak maxima were used as the centers for the spherical ROIs, and the mean BOLD timeseries of the voxels in the ROI was generated. Seed specification, data generation and statistical analyses were performed in the manner described earlier for the standard seeds.
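Building a spherical seed around a peak coordinate and averaging the BOLD signal inside it, as described above, can be sketched as follows. This is not the Marsbar implementation; the function names and input formats are hypothetical, and we assume a voxel belongs to the ROI when its center lies within the 5 mm radius.

```python
def sphere_voxels(center_mm, voxel_coords_mm, radius_mm=5.0):
    """Return indices of voxels whose centers lie within radius_mm of center_mm.

    center_mm: (x, y, z) of the peak in mm.
    voxel_coords_mm: list of (x, y, z) voxel centers in the same space.
    """
    cx, cy, cz = center_mm
    r2 = radius_mm ** 2
    return [i for i, (x, y, z) in enumerate(voxel_coords_mm)
            if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= r2]

def mean_timeseries(indices, data):
    """Average the time series of the selected voxels.

    data[i] is the BOLD time series of voxel i (hypothetical layout).
    """
    n_time = len(data[indices[0]])
    return [sum(data[i][t] for i in indices) / len(indices)
            for t in range(n_time)]
```

With the 3.4 × 3.4 × 4.0 mm<sup>3</sup> functional voxels reported earlier, a 5 mm sphere would contain only a handful of voxel centers, so the mean timeseries is an average over a small neighborhood of the peak.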

## **RESULTS**

#### **BEHAVIORAL DATA**

#### *(a) Emotion task*

Ratings were obtained in the scanner simultaneously with fMRI data acquisition. There was a statistically significant main effect of group [*F*(1, 23) = 69.53, *p* < 0.000001], main effect of condition [*F*(1, 23) = 17.59, *p* < 0.000001] and interaction between group and condition [*F*(1, 23) = 7.79, *p* < 0.000423] for the reaction time data. The NH group responded significantly slower to the neutral sounds compared to the pleasant and unpleasant sounds (**Figure 1A**). The HL group's reaction time for the three types of sounds did not significantly differ (**Figure 1A**). For between group comparisons, the HL group was significantly slower for both pleasant and unpleasant sounds compared to the NH group (**Figure 1A**). Concerning the type of responses, there was a significant main effect of condition [*F*(1, 23) = 7.162, *p* < 0.01]; however, the main effect of group [*F*(1, 23) = 0.118, *p* = 0.733] and the interaction [*F*(1, 23) = 0.109, *p* = 0.897] did not reach significance. Both groups rated significantly more stimuli as unpleasant compared to pleasant and neutral (determined using *post-hoc* within-group *t*-tests) (**Figure 1B**). Note that the experimental design used an equal number of sounds classified as P, N, and U according to the normative IADS scores (Bradley and Lang, 2007). Due to the observed variation from the normative ratings, we chose to code the trials during analysis according to each individual's rating.

#### *(b) Resting state*

No behavioral data were obtained for the resting state study.

#### **fMRI DATA**

#### *(a) Emotion task*

Within group comparisons: In the NH group, areas of increased activation for the contrast P + U > N were observed in the bilateral middle temporal gyri, right transverse temporal gyrus, left superior temporal gyrus, left postcentral gyrus, right medial frontal gyrus, left superior frontal gyrus, left middle frontal gyrus, left anterior cingulate and the left insula (**Figure 2**, **Table 3**). With respect to the HL group, increased response was obtained in the bilateral superior temporal gyri, bilateral transverse temporal gyri, right middle temporal gyrus, right superior frontal gyrus, left middle frontal gyrus, right medial frontal gyrus, right precuneus, left inferior parietal lobule, left precentral gyrus, left lentiform nucleus and corpus callosum for affective sounds compared to neutral sounds (**Figure 2**, **Table 3**). ROI analysis revealed increased response in bilateral transverse temporal gyri, bilateral superior temporal gyri, bilateral medial frontal gyrus, left insula, right middle temporal gyrus and right parahippocampus for the NH (P + U > N) contrast (**Table 3**). For the HL (P + U > N) comparison, increased response was observed in the bilateral transverse temporal gyri, bilateral superior temporal gyri, bilateral superior frontal gyri, right medial frontal gyrus and left insula (**Table 3**).

Between-group comparisons: For the NH > HL (P + U > N) comparison, we observed heightened response in the left superior frontal gyrus, right middle temporal gyrus, left superior temporal gyrus, and left superior occipital gyrus (**Figure 3**, **Table 4**). Concerning the HL > NH (P + U > N) comparison, elevated response was observed in the right superior parietal lobule, right precuneus, and left inferior parietal lobule (**Figure 3**, **Table 4**).

For the ROI analysis, no suprathreshold voxels were obtained for the HL > NH (P + U > N) contrast. However, for the NH > HL (P + U > N) comparison, increased response was observed in the left amygdala and left parahippocampus (**Figure 3**, **Table 4**).

**(A)** Mean reaction time data. For within group comparison, the HL group did not statistically differ between the P, N, and U reaction times. However, the NH group responded significantly slower to the N sounds compared to the P and U sounds. For between group comparison, the HL group was significantly slower for P and U sounds. **(B)** Mean number of responses. The NH and HL groups responded U significantly more than N and P. Statistical significance level *p* < 0.05 indicated by ∗.

patterns from both groups (MNI coordinate *z* = +14). The maps are displayed at *p* < 0.001 uncorrected level for better visualization, but the clusters in the circles are corrected for multiple comparisons (*p* < 0.05 FWE). (1) bilateral middle temporal gyrus, (2) medial frontal gyrus, (3) bilateral middle temporal gyrus, (4) medial frontal gyrus.

#### *(b) Resting state*

No significant differences were found between groups in the auditory resting state network. With respect to the DMN, a significant difference in the left middle frontal/precentral gyrus was observed in the HL *>* NH comparison. Analysis of the DAN revealed a significant difference in the left postcentral/precentral gyrus and left insula. These results of the typical intrinsic networks are listed in **Table 5** and displayed in **Figure 4**. For the connectivity analysis with task-based ROIs, no significant differences in connectivity with the left amygdala were seen. Placing a seed in the left inferior parietal lobule also did not reveal significant differences. With the seed in the left superior frontal gyrus, the NH *>* HL contrast showed significant differences in connectivity with the left middle occipital lobe. The right middle temporal seed showed a stronger correlation with the right precentral gyrus in the HL group compared to the NH group. Finally, no connectivity differences were seen with the seed in the right superior parietal lobule. The results of this analysis are also shown in **Table 5** and **Figure 4**.

#### **DISCUSSION**

We used a combined task- and rest-based fMRI study to identify the influence of HL on auditory and emotion processing. Results of the responses collected in the scanner revealed similar ratings for the unpleasant, pleasant, and neutral sounds—both groups tended to rate more sounds as unpleasant relative to the other types of sounds. However, the HL group differed from

## **Table 3 | Local maxima for the P + U** *>* **N contrasts from the emotion task.**




*The loci are listed for each group separately for the whole-brain analysis and the region-of-interest (ROI) analysis. Reported regions are listed in Montreal Neurological Institute (MNI) coordinates and in terms of Brodmann areas (before determining the Brodmann areas the MNI coordinates were converted to Talairach coordinates). The ROI mask comprised bilateral regions in the primary auditory cortex, medial geniculate body, inferior colliculus, amygdala, insula, parahippocampus, nucleus accumbens, and ventral medial prefrontal cortex, defined anatomically. Statistical threshold was set at p < 0.05 FWE corrected for multiple comparisons; all clusters noted here were significant at both voxel and cluster level. L, left; R, right.*

the NH group in their response times, which were significantly slower for the affective sounds. Overlapping patterns of fMRI activation were observed in both groups when processing affective sounds compared to neutral sounds. The main finding for the emotion task was increased activation in the left amygdala/parahippocampal gyrus complex for the NH *>* HL (P + U *>* N) comparison, via targeted ROI analysis. The reverse contrast, HL *>* NH (P + U *>* N), did not show increased activation within the limbic system, but rather revealed heightened responses in the right superior parietal lobule, right precuneus, and left inferior parietal lobule. Resting-state functional connectivity in the same group of participants focused on the canonical intrinsic networks and on seed regions obtained from the task-based activation patterns. Among the typical intrinsic networks, the default mode and DAN, but not the auditory network, revealed differences between the groups. Using seeds from the local maxima noted in the task-based analysis, decreased connectivity between the frontal cortex and other brain regions was noted, with the exception of stronger connectivity between the middle temporal gyrus and the right precentral gyrus in the HL group compared to the NH group. Our results suggest that HL may alter the emotional processing networks and lead to slower reaction times to affective stimuli.

## **EMOTION TASK**

HL may reduce the engagement of the emotional processing system, either because of disordered processing of acoustic or of valence features. Complex anatomical and functional connections exist between the auditory cortex and the limbic system, primarily with the amygdala (Amaral and Price, 1984; Blood and Zatorre, 2001; Koelsch et al., 2006; Tschacher et al., 2010; Kumar et al., 2012). Forward projections from the auditory cortex to the amygdala have been shown to be modulated by acoustic features, but backward projections appear to be modulated by the valence of sound (Kumar et al., 2012). The forward and backward projections work in concert to interpret incoming affective stimuli (Kumar et al., 2012). Sound deprivation may reduce the amount of acoustic or valence information available for this network. Reduced information may result in a dampened emotional response to affective stimuli, because individuals with HL may not receive all of the acoustic or valence information necessary to cause a robust emotional response. We investigated this hypothesis via whole brain analysis and a targeted ROI analysis of the auditory and limbic areas.

In our study, both positively and negatively valent sounds exhibited greater engagement of the limbic system and faster response times, compared to neutral sounds, in NH individuals. However, this pattern differed in the HL group, with a decreased response in the temporal and frontal cortices (whole brain analysis) and in the amygdala and parahippocampus (ROI analysis), and an increased response in the parietal cortices and precuneus compared to the NH group when processing affective sounds (**Table 4**). Similarly, in behavioral responses, the reaction times for the P and U sounds were significantly slower in the HL group (**Figure 1A**). The behavioral data suggest that the advantageous faster processing of affective sound found in the NH group does not occur in the HL group. The slower responses to pleasant and unpleasant sounds in the HL group may be due to lack of faster, bottom-up engagement of the amygdala and other limbic structures during auditory processing. However, HL does not appear


**Table 4 | Local maxima for the P + U** *>* **N contrasts from the emotion task for the group comparisons.**

*The loci are listed for each group separately for the whole-brain analysis and the region-of-interest (ROI) analysis. Reported regions are listed in Montreal Neurological Institute (MNI) coordinates and in terms of Brodmann areas (before determining the Brodmann areas the MNI coordinates were converted to Talairach coordinates). The ROI mask comprised bilateral regions in the primary auditory cortex, medial geniculate body, inferior colliculus, amygdala, insula, parahippocampus, nucleus accumbens, and ventral medial prefrontal cortex, defined anatomically. Statistical threshold was set at p < 0.025 FWE corrected for multiple comparisons (to account for two-tailed t-test); all clusters noted here were significant at both voxel and cluster level, unless noted with an \* which indicates significance only at cluster level. L, left; R, right.*

to affect the response to all sounds, with the reaction times for neutral sounds being nearly identical in the two groups. Instead, the highly valent sounds were most affected, indicating that it is identification of valence information rather than acoustic information that is impacted by HL. Another possible explanation of the differential processing of affective sounds by the HL group could be that there is more energy or information in the high frequency regions in the affective sounds and less so in the neutral sounds. Under this account, difficulty in processing high-frequency information in the affective sounds would have led to longer reaction times in the HL group. In order to maintain ecological validity, we chose not to low-pass filter the sounds to compensate for the HL of the HL group. In a previous study with mild-to-moderate HL (Husain et al., 2011b), we filtered sounds such that there was no energy in frequencies greater than 2 kHz. We found no difference in accuracy or reaction times for a discrimination task between HL and NH groups; nevertheless, the fMRI activation patterns were different (Husain et al., 2011b). In sum, regardless of the actual mechanism, our results suggest that HL may reduce engagement of the amygdala and result in slower reaction to positively and negatively valent sounds.

Another interesting aspect of the behavioral results was the finding that an increased number of sounds were classified as unpleasant, which differed from the published normative data (Bradley and Lang, 2007). There are at least two possible explanations for this finding. First, the normative scores were obtained in a young, NH population; therefore it is not surprising that both groups of older, middle-aged participants in our study differed from these ratings, which points to an effect of age. Second, discomfort in the scanner may have influenced our participants to be more negative in their ratings. In order to tease apart these explanations and better understand the effect of aging on emotional processing, we intend to conduct a follow-up study with both young and older participants using stimuli from the IADS database.

#### **RESTING STATE DATA**

Resting state functional connectivity demonstrated alterations in the frontal cortex in HL patients. Increased connectivity between frontal regions and seed regions for the DAN and DMN was seen in HL patients compared to NH controls. A decreased correlation was seen between the left superior frontal cortex and the left middle occipital gyrus in HL subjects compared to controls. The relationship seen between HL and alterations in frontal connectivity may not just be important at baseline, as engagement of frontal regions was also apparent during the emotion task. The cause of these network alterations is not clear. The alterations could be purely attentional in nature, but they may also be a factor of interactions between emotional and attentional systems. A study including an attentional task without an emotional component may help to clarify this.

Except for activation of the left temporal pole, we did not find evidence to support our expectation that the response of the auditory cortex would differ between the two groups when processing affective sounds. A separate resting-state functional connectivity analysis with seeds in the primary auditory cortices also failed to find significant connectivity differences at rest between the two groups. The lack of significant findings may relate to the mild-to-moderate severity of the HL in our study. To date, there have been no studies of resting state functional connectivity in patients with this level of HL. Deafness, however, has been investigated in this context. Intrinsic connectivity has been shown to be impacted by deafness, both within and outside of the temporal cortex (Li et al., 2013). The findings of Li et al. (2013) are similar to those in our own study. Deaf patients showed increased negative correlations between the middle superior temporal sulcus and


**Table 5 | Local maxima for results of the resting state functional connectivity analysis.**

*The seeds are listed for the canonical resting state networks first, followed by seeds from the task-based study. Reported regions are listed in Montreal Neurological Institute (MNI) coordinates and in terms of Brodmann areas (before determining the Brodmann areas the MNI coordinates were converted to Talairach coordinates). Statistical threshold was set at p < 0.025 FWE corrected for multiple comparisons (to account for two-tailed t-test). All regions were significant at the cluster level. Abbreviations: L, left; R, right; amyg, amygdala; inf pariet, inferior parietal; sup front, superior frontal; mid temp, middle temporal; sup pariet, superior parietal.*

the left middle occipital and right precentral gyri when compared to NH controls. In our study, the left middle occipital gyrus was shown to be less correlated to the left superior frontal gyrus in the HL group, whereas the right precentral gyrus was more correlated with the right middle temporal gyrus. The presence of altered connectivity in similar regions in both deaf patients and our mild-to-moderate HL patients warrants further resting state connectivity studies examining varying degrees of HL.

It is important to note that inferences about the directionality of the connections cannot be made from the present functional connectivity analysis. An effective connectivity analysis, perhaps implemented as structural equation modeling or dynamic causal modeling, would be needed to examine directionality (Horwitz, 2003). Our results suggest only a general alteration in connectivity between two related regions; differences in correlation may not be specifically due to coupling between the seed and a particular region, but may also arise from the influence of a third region or from changes in noise (Friston, 1994).
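The caveat about a third region can be illustrated numerically. The following is a minimal sketch on synthetic data, not the analysis pipeline used in this study: two hypothetical time series, `seed_ts` and `target_ts`, have no direct coupling but are both driven by a third series, `driver_ts`. All names, the noise level, and the number of time points are assumptions chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear influence of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_timepoints = 2000     # hypothetical number of samples

# Both regions inherit their signal from a common driver plus private noise.
driver_ts = [rng.gauss(0, 1) for _ in range(n_timepoints)]
seed_ts = [d + rng.gauss(0, 0.5) for d in driver_ts]
target_ts = [d + rng.gauss(0, 0.5) for d in driver_ts]

r_raw = pearson(seed_ts, target_ts)
r_partial = partial_corr(seed_ts, target_ts, driver_ts)
print(f"seed-target correlation: {r_raw:.2f}")
print(f"after partialling out the driver: {r_partial:.2f}")
```

In this toy setting the raw correlation is high even though the two series are not directly coupled, and it largely disappears once the common driver is partialled out. This is why a correlation difference alone does not establish a direct or directed connection.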

HL is positively correlated with age (Yueh et al., 2003); age is therefore a potential confound in our research. A study examining connectivity in resting state networks (Onoda et al., 2012) noted a significantly decreased correlation between the auditory resting state network and the salience network (which includes the insula, ventrolateral prefrontal cortex (VLPFC), thalamus and cerebellum) with age. In addition, connections between regions of the salience network also weakened with age. The regions of the salience network relate to the processing of emotional stimuli. It is possible that HL within the older population in this study is partially responsible for the observed results. The decreased correlation between regions of the DAN and the insula seen in our study fits well with the results of the aging study. In our own study, however, all participants were matched for age; therefore, we are unable to parse out the effects of age from those of HL. More fMRI studies specifically addressing the effects of HL of varying severity, in different age groups, should be performed to clarify its impact on intrinsic and task-based functional networks. Subject motion in the scanner is another notable confound, with the resting state data being particularly sensitive to its influence. Although we excluded data from participants who exhibited excessive head motion, included motion parameters as covariates of no interest in our statistical models, and statistical tests did not show any significant differences between

the groups, it is possible that motion-related artifacts continue to affect our results, as shown by recent publications (Kundu et al., 2012; Van Dijk et al., 2012; Kundu et al., 2013; Power et al., 2014). Future studies with more stringent data acquisition considerations and more advanced analytical methods will need to be conducted to fully account for the possibility of motion artifacts.


### **CONCLUSION**

Our results suggest that HL may affect emotional processing by decreasing amygdalar recruitment, resulting in slower reaction times to highly valent sounds. Although the HL group demonstrated slower response times to affective sounds, there was no difference in sound ratings between the HL and the NH group. Altered engagement of the frontal regions was also demonstrated in both emotion task-based subtraction and resting state functional connectivity analyses. HL is the third most common chronic condition in older adults and is highly comorbid with another hearing disorder, tinnitus. Our results in those with unaided, mild-to-moderately severe HL have implications for auditory rehabilitation for hearing impairment, reducing social isolation in the elderly, and management strategies for tinnitus.

#### **ACKNOWLEDGMENTS**

We wish to acknowledge the support of Tinnitus Research Consortium to Fatima T. Husain and of the NeuroEngineering NSF IGERT (Integrative Graduate Education and Research Traineeship) to Jake R. Carpenter-Thompson and Sara A. Schmidt. We are grateful to Kwaku Akrofi and Jaclyn Utz for their assistance in data acquisition.

#### **SUPPLEMENTARY MATERIAL**


The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnsys.2014.00010/abstract

**Supplementary Table 1 | Sounds included in the study.** Separated by column are 30 pleasant, 30 unpleasant, and 30 neutral sounds chosen from the IADS database to be included in the study.

## **REFERENCES**


Brett, M., Anton, J. L., Valabregue, R., and Poline, J. B. (2002). Region of interest analysis using an SPM toolbox. *NeuroImage* 16, 1140–1141.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 August 2013; accepted: 15 January 2014; published online: 04 February 2014.*

*Citation: Husain FT, Carpenter-Thompson JR and Schmidt SA (2014) The effect of mild-to-moderate hearing loss on auditory and emotion processing networks. Front. Syst. Neurosci. 8:10. doi: 10.3389/fnsys.2014.00010*

*This article was submitted to the journal Frontiers in Systems Neuroscience. Copyright © 2014 Husain, Carpenter-Thompson and Schmidt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Left temporal alpha-band activity reflects single word intelligibility

#### **Robert Becker<sup>1</sup>†, Maria Pefkou<sup>2</sup>†, Christoph M. Michel<sup>1</sup> and Alexis G. Hervais-Adelman<sup>2</sup>\***

<sup>1</sup> Functional Brain Mapping Lab, Department of Fundamental Neuroscience, University of Geneva, Geneva, Switzerland <sup>2</sup> Brain and Language Lab, Department of Clinical Neuroscience, University of Geneva, Geneva, Switzerland

#### **Edited by:**

Jonathan E. Peelle, Washington University in St. Louis, USA

#### **Reviewed by:**

Philipp Ruhnau, Università degli Studi di Trento, Italy Björn Herrmann, Max Planck Institute for Human Cognitive and Brain Sciences, Germany

#### **\*Correspondence:**

Alexis G. Hervais-Adelman, Brain and Language Lab, Department of Clinical Neuroscience, University of Geneva, 1 Rue Michel-Servet, CH – 1211, Geneva, Switzerland e-mail: Alexis.Adelman@unige.ch

†These authors have contributed equally to this work.

The electroencephalographic (EEG) correlates of degraded speech perception have been explored in a number of recent studies. However, such investigations have often been inconclusive as to whether observed differences in brain responses between conditions result from different acoustic properties of more or less intelligible stimuli or whether they relate to cognitive processes implicated in comprehending challenging stimuli. In this study we used noise vocoding to spectrally degrade monosyllabic words in order to manipulate their intelligibility. We used spectral rotation to generate incomprehensible control conditions matched in terms of spectral detail. We recorded EEG from 14 volunteers who listened to a series of noise vocoded (NV) and noise-vocoded spectrally-rotated (rNV) words, while they carried out a detection task. We specifically sought components of the EEG response that showed an interaction between spectral rotation and spectral degradation. This reflects those aspects of the brain electrical response that are related to the intelligibility of acoustically degraded monosyllabic words, while controlling for spectral detail. An interaction between spectral complexity and rotation was apparent in both evoked and induced activity. Analyses of event-related potentials showed an interaction effect for a P300-like component at several centro-parietal electrodes. Time-frequency analysis of the EEG signal in the alpha band revealed a monotonic increase in event-related desynchronization (ERD) for the NV but not the rNV stimuli at a left temporo-central electrode cluster from 420–560 ms, reflecting a direct relationship between the strength of alpha-band ERD and intelligibility. By matching NV words with their incomprehensible rNV homologues, we reveal the spatiotemporal pattern of evoked and induced processes involved in degraded speech perception, largely uncontaminated by purely acoustic effects.

**Keywords: speech intelligibility, degraded speech, noise-vocoding, alpha oscillations, left inferior temporal cortex**

## **INTRODUCTION**

Despite the great acoustic variability of the speech signal, normally-hearing listeners are able to understand speech seemingly effortlessly in day-to-day situations. In order for this to be possible, the human speech perception system must be robust in the face of both natural variability and acoustic degradations. Natural variability can arise from the different realization of speech sounds by different speakers, who can be understood even though they vary in terms of their size and sex (Peterson and Barney, 1952; Smith et al., 2005), accent (Clarke and Garrett, 2004; Clopper and Pisoni, 2004; Evans and Iverson, 2004), and speech rate (Miller and Liberman, 1979), or may suffer from articulation disorders such as dysarthria (Darley et al., 1969; Kent et al., 1989). Beyond signal variations related to the speaker, external factors also have an impact on the quality of speech that is heard, such as transmission through telecommunications systems, in which case it is heavily filtered, the presence of background noise masking part of the speech signal, the presence of echoes, or the presence of multiple talkers simultaneously.

In order to investigate the mechanisms of speech perception, researchers are increasingly turning to paradigms employing artificial acoustic degradations of speech, which allow fine-grained manipulation of the level of intelligibility of a speech signal, rendering the behavioral or neural correlates of speech intelligibility amenable to investigation. Several types of manipulations have been employed, such as interrupted speech (Heinrich et al., 2008; Shahin et al., 2009), masking with noise (Davis and Johnsrude, 2003; Golestani et al., 2013), time-compression (Altmann and Young, 1993; Mehler et al., 1993; Dupoux and Green, 1997; Pallier et al., 1998; Sebastian-Galles et al., 2000), or noise-vocoding (Shannon et al., 1995). We chose to employ the latter of these, as it has been the subject of intensive research both behaviorally (e.g., Davis et al., 2005; Stacey and Summerfield, 2007; Hervais-Adelman et al., 2008, 2011; Dahan and Mead, 2010) and increasingly in neuroimaging experiments (e.g., Scott et al., 2000; Davis and Johnsrude, 2003; Scott et al., 2006; McGettigan et al., 2011; Obleser and Kotz, 2011; Hervais-Adelman et al., 2012; Evans et al., 2013).

An increasing amount of evidence regarding the neural correlates of degraded speech perception is becoming available. Functional magnetic resonance imaging (fMRI) has been extensively used to detect brain regions active while listening to degraded speech (e.g., Scott et al., 2000; Davis and Johnsrude, 2003; Giraud et al., 2004; Davis and Johnsrude, 2007; McGettigan et al., 2011; Hervais-Adelman et al., 2012; Wild et al., 2012a), while electroencephalography (EEG) and magneto-encephalography (MEG) have been employed to study the timing of degraded speech processing through measures such as evoked response potentials and synchronized brain oscillations (e.g., Obleser and Kotz, 2011; Sohoglu et al., 2012; Peelle et al., 2013). Such experiments also provide insights into the cerebral processes that support speech perception in hearing-impaired listeners.

Neuronal oscillations are an important tool in the study of different brain functions, such as speech perception, as they address the issue of synchronization of neuronal networks within a given brain region as well as between regions, based on data obtained non-invasively and at the temporal scale at which neuronal activity occurs. Recent studies (reviewed by Weisz and Obleser, 2014) have focused on brain oscillations in auditory processing and speech perception. Among the different "brain rhythms", which differ from each other on the basis of their characteristic frequency (e.g., delta oscillations are of low frequency, 1–3 Hz, while gamma oscillations have a high frequency of 30–80 Hz) and which seem to play a role in speech processing, a major stream of research has focused on the alpha rhythm (the range of 8–13 Hz). Traditionally, alpha rhythm has been considered as being highly related to visual spatial attention (Thut et al., 2006), beginning with the reports of Hans Berger in the late 1920s (Berger, 1929). Berger (1929) observed that opening the eyes led to clear attenuation of previously visible oscillations in the 10 Hz range over occipital electrodes. Since then, numerous studies have reported a close relationship of alpha rhythm activity to the visual system (e.g., Makeig et al., 2002; Becker et al., 2008; Busch et al., 2009; Mathewson et al., 2009), to working memory tasks (Jokisch and Jensen, 2007), and to attentional and other cognitive paradigms (for a review, see Klimesch, 1999).

Despite this strong interest in alpha rhythm, much less is known about the relationship of alpha activity and auditory processing. This may be owing in part to the fact that resting alpha is much more prominent at occipital electrodes, which likely detect visual-cortical activity, than at temporal electrodes. This might be related to the larger volume of the visual cortex compared to the auditory cortex and to the fact that it is impossible to completely deprive a healthy, hearing individual of auditory input but easy to close one's eyes and thereby block visual input (for a review on auditory alpha see Weisz et al., 2011). The first reports of auditory-cortical oscillations in the alpha-band originate from MEG studies, in which the rhythm was initially named "tau" (Tiihonen et al., 1991; Lehtela et al., 1997). Lehtela et al. (1997) demonstrated that the presentation of short (500 ms) noise bursts was followed by decreases in alpha power, localized in the superior temporal lobes, suggesting a role for alpha as a marker of auditory processing similar to that shown in the visual system. Regarding alpha oscillations and speech perception, Krause et al. (1997) reported stronger alpha-band suppression for listening to a comprehensible text passage compared to listening to the same passage played backwards. Despite their scarcity, such studies point to a functional role of alpha oscillations in processing speech.

In a more recent study, Obleser and Weisz (2012) investigated the role of alpha suppression in speech comprehension using degraded speech. They employed noise-vocoding, a form of artificial degradation which reduces the spectral information in the auditory signal (Shannon et al., 1995), in a graded manner, permitting the generation of stimuli on a continuum from unintelligible to intelligible, by increasing the spectral fidelity of the vocoding. The method is described in Section Stimuli below. Obleser and Weisz (2012) used words at different levels of intelligibility and hypothesized that alpha suppression should be enhanced with intelligibility, reflecting less need for functional inhibition and less effortful speech processing. This idea is compatible with the conception of alpha oscillations as a mechanism of gating or suppressing task-irrelevant high frequency oscillations (Osipova et al., 2008; Jensen and Mazaheri, 2010). In other words, a decrease in alpha power in a task-relevant region would accompany the disinhibition of gamma oscillations in the same region, which are thought to reflect active processing. In general, alpha suppression is accompanied by an increase of oscillations in the gamma band, which reflects engagement/processing as it has been shown by intracranial studies in animals and humans (e.g., Paller et al., 1987; Fries, 2005; Kaiser and Lutzenberger, 2005; Crone et al., 2006; Womelsdorf et al., 2007).

Obleser and Weisz (2012) showed a linear increase in alpha suppression as a function of word intelligibility. They interpret this finding as reflecting the listeners' effort or ability to comprehend degraded speech. The alpha suppression was localized in superior parietal, prefrontal and anterior temporal areas. This may indicate that these brain regions are more active during the task, assuming that a local decrease in alpha power is a marker of activation of the region in question. This idea is backed by early findings of changes in resting-state alpha power as well as by numerous more recent EEG-fMRI studies which reported an inverse relationship between alpha-power and BOLD signal, under rest as well as during stimulation (Laufs et al., 2003; Moosmann et al., 2003; Goncalves et al., 2006; de Munck et al., 2007; Becker et al., 2011). Furthermore, a more recent study showed an even more direct inverse relationship between alpha-power and neuronal firing rate in monkeys (Haegens et al., 2011). The localization of the alpha-suppression effect reported by Obleser and Weisz (2012) is partially in agreement with previous fMRI studies seeking to identify the brain regions implicated in the processing of degraded but potentially comprehensible speech (Scott et al., 2000; Narain et al., 2003; Giraud et al., 2004; Eisner et al., 2010; Wild et al., 2012a,b), revealing recruitment of brain areas involved in speech processing in general, such as bilateral superior temporal gyrus (STG) and superior temporal sulcus (STS), together with a more left-lateralized inferior-frontal and motor network.
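Alpha suppression of this kind is often expressed as a percentage change in band power relative to a pre-stimulus baseline. The sketch below assumes that classical percentage definition; it is not taken from Obleser and Weisz (2012) or from the present study, and the power values and the function name `erd_percent` are hypothetical.

```python
def erd_percent(baseline_power, event_power):
    """Percentage change in band power relative to baseline.

    Negative values indicate a power decrease (desynchronization,
    i.e., alpha suppression); positive values indicate an increase.
    """
    if baseline_power <= 0:
        raise ValueError("baseline power must be positive")
    return (event_power - baseline_power) / baseline_power * 100.0


# Hypothetical alpha-band power values (arbitrary units), for illustration only
baseline = 4.0
conditions = [("less intelligible", 3.6), ("more intelligible", 2.8)]
for label, power in conditions:
    print(label, round(erd_percent(baseline, power), 1))
```

Under this convention, a stronger suppression for more intelligible words would appear as a more negative percentage.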

In the present study, our aims were twofold: (1) we aimed to dissociate the EEG signatures of intelligibility and spectral complexity while listening to degraded speech, using a paradigm similar to the one published by Obleser and Weisz (2012). To this end, we asked participants to listen carefully to noise-vocoded (NV) words with different levels of degradation (therefore different levels of spectral complexity), and importantly, to the spectrally rotated version of the same stimuli. Spectrally rotated speech (Blesser, 1972) retains the same amount of spectral complexity as its non-rotated equivalent but it is incomprehensible to the naïve listener [although listeners may adapt to it after extensive training (Green et al., 2013)]. In this way, we had a spectrally-matched control condition for each level of intelligibility in order to rule out the possibility that alpha suppression is purely driven by low-level acoustic properties of the signal. Importantly, we used monosyllabic words, which are very challenging for naïve listeners to understand when degraded, in order to increase task difficulty and limit the use of top-down processes. As lexical and prosodic information facilitate comprehension and perceptual learning of degraded speech (e.g., Davis et al., 2005), by using monosyllabic words our listeners would have to rely almost exclusively on the acoustic information they could extract from the degraded input. The nature of this acoustic input was therefore essential for comprehension, given the lack of other sources of information in the stimuli and (2) by using the approach described above, we wanted to examine both induced and evoked correlates for degraded speech processing, their timing, their topography as well as the putative cerebral sources underlying the observed effects. 
Although the work presented here employs artificial acoustic degradation to examine the neural underpinnings of degraded speech perception, we hope that the findings will also be relevant to the cerebral processes that help to compensate for degraded auditory input in hearing impaired listeners.

## **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Fourteen right-handed native French speakers (two male) participated in the study. Participants self-reported as being right-handed and having no history of hearing or visual impairment. They all gave written informed consent to participate in the study, which was approved by the local ethics committee of the University Hospital of Geneva, and received financial compensation of 40 CHF.

#### **STIMULI**

Stimuli were 360 monosyllabic French nouns, including 36 animal names, selected from the "Lexique 3" database for French words.<sup>1</sup> The animal names were used as targets in the detection task we employed. All words, apart from the animal names, were divided into three groups, matched for spoken word-form frequency (*F*(2,321) = 0.205, *p* = 0.82; Group 1 mean = 31.94/million, Group 2 mean = 27.43/million and Group 3 mean = 28.37/million). The animal names had a mean frequency of 15.41/million. Stimulus matching was achieved using the Match program.<sup>2</sup>

The words were recorded in a soundproof chamber by a male native French speaker and digitized at a 44.1 kHz sampling rate. Recordings were denoised using the algorithm implemented in Audacity software<sup>3</sup> and trimmed at zero-crossings preceding and succeeding the words. Root mean square (RMS) amplitude of each of the extracted stimuli was set to a fixed level, and the stimuli were then bandpass filtered between 50 and 5000 Hz using a sixth-order Butterworth filter. Stimuli were then noise-vocoded using 1, 4, 8 and 16 frequency bands following the procedure described by Shannon et al. (1995). Stimuli were filtered into quasi-logarithmically spaced frequency bands intended to mimic the tonotopic organization of the cochlea (Greenwood, 1990) between 50 and 5000 Hz, using second-order Butterworth filters, and full-wave rectified, producing the amplitude envelope of each band. Each envelope was convolved with noise band-pass filtered into the same frequency range as the source band. The amplitude-modulated carriers were then recombined to produce NV words. The number of frequency bands has been shown to be a crucial factor for the intelligibility of vocoded words, with higher numbers of bands producing more intelligible stimuli (Shannon et al., 1995). Four NV versions of each word were created, producing a continuum from completely unintelligible (1-band noise-vocoded, NV1) to relatively easily comprehensible (16-band noise-vocoded, NV16).
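The band spacing described above can be illustrated with a minimal sketch. It assumes the commonly used human parameters of the Greenwood (1990) frequency-position function (A = 165.4, a = 2.1, k = 0.88) and places band edges at equal steps of cochlear position between 50 and 5000 Hz. The function names are hypothetical, and the exact edges used for the stimuli are not reported, so the values produced here are illustrative only.

```python
import math

# Greenwood (1990) human cochlear map: F = A * (10 ** (a * x) - k),
# where x is the proportional distance along the basilar membrane (0..1).
# These constants are the usual human values and are an assumption here.
A, ALPHA, K = 165.4, 2.1, 0.88


def freq_to_position(freq_hz):
    """Invert the Greenwood function: frequency (Hz) -> proportional position."""
    return math.log10(freq_hz / A + K) / ALPHA


def position_to_freq(x):
    """Greenwood function: proportional position -> frequency (Hz)."""
    return A * (10 ** (ALPHA * x) - K)


def band_edges(n_bands, f_low=50.0, f_high=5000.0):
    """Return n_bands + 1 edge frequencies equally spaced in cochlear position."""
    x_low, x_high = freq_to_position(f_low), freq_to_position(f_high)
    step = (x_high - x_low) / n_bands
    return [position_to_freq(x_low + i * step) for i in range(n_bands + 1)]


# Example: edges for the four vocoder resolutions used in the study.
edges_by_condition = {n: band_edges(n) for n in (1, 4, 8, 16)}
```

Because the map is approximately logarithmic above a few hundred hertz, the resulting bands widen with increasing centre frequency, which is what "quasi-logarithmically spaced" refers to.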

Bandpass-filtered and NV words were also spectrally rotated, in order to create a control condition preserving the same amount of spectral information but rendering the input incomprehensible. Spectral rotation was based on the procedure defined by Blesser (1972) and implemented as described by Scott et al. (2000). The speech signal was rotated around the midpoint of the frequency range of the bandpass filtered input (i.e., 2.5 kHz) and then pre-emphasized using a finite impulse response (FIR) filter with the same long-term spectral distribution as the original input. Finally, the signal was modulated by a sinusoid at 5 kHz. NV1 words were not spectrally rotated, since their spectral profile is that of white noise, i.e., flat, and would not be altered by rotation. Example NV and spectrally-rotated NV stimuli are illustrated in **Figure 1**.
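Why modulation by a 5 kHz sinusoid mirrors the spectrum around 2.5 kHz follows from the product-to-sum identity. For a single component at frequency $f$ and a carrier at $f_c = 5$ kHz:

$$\cos(2\pi f t)\,\cos(2\pi f_c t) = \tfrac{1}{2}\left[\cos\big(2\pi (f_c - f)\,t\big) + \cos\big(2\pi (f_c + f)\,t\big)\right]$$

If the sum component at $f_c + f$ is removed by low-pass filtering at $f_c$ (an assumption here, since the filtering step is not spelled out in the text), each input frequency $f$ is mapped to $f_c - f$. A component at 1 kHz would thus end up at 4 kHz, and one at 4 kHz at 1 kHz, while 2.5 kHz stays in place.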

These manipulations yielded four levels of spectral detail, crossed with the factor spectral rotation: band-pass filtered words (BP), NV16 words, 8-band noise-vocoded words (NV8) and 4-band noise-vocoded words (NV4) and their spectrally rotated homologues (rBP, rNV16, rNV8, rNV4), as well as a 1-band NV (NV1, equivalent to signal-correlated noise) control condition.

The experiment was divided into three blocks (see Section Procedure, below), for which stimulus selection proceeded as follows. Words were randomly selected from the pool of 108 non-target stimuli of each stimulus group, and allocated without replacement to one of the four non-rotated conditions (NV4, NV8, NV16, BP). Each selected word was also allocated to the spectrally rotated condition at the same level of spectral detail. Twenty-seven words were allocated to each level of degradation per block. Due to the limited number of target stimuli (the 36 animal names) these were randomly allocated without replacement to each of the four levels of degradation, with a new randomization applied for each of the three blocks; thus each target stimulus appeared three times in a non-rotated condition and three times in a rotated condition over the course of the experiment. For each block, 27 words were randomly selected without replacement from the whole pool of stimuli and allocated to the NV1 condition. There were therefore 81 non-target stimuli per condition and 27 target stimuli in each condition barring NV1, for a total of 279 trials in each of the three blocks.

<sup>1</sup>http://www.lexique.org, New et al. (2001)

<sup>2</sup>http://www.mrccbu.cam.ac.uk/~maarten/match.htm, Van Casteren and Davis (2007)

<sup>3</sup>http://audacity.sourceforge.net/
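The allocation of non-target words within one block can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' script: the function name, the seed argument and the word labels are hypothetical, and the target words and the NV1 trials are deliberately left out.

```python
import random

LEVELS = ["NV4", "NV8", "NV16", "BP"]


def allocate_block(word_pool, words_per_level=27, seed=None):
    """Allocate the non-target words of one block to the four levels.

    Words are drawn without replacement. Each word yields two trials: one in
    the non-rotated condition and one in the rotated condition at the same
    level of spectral detail. Returns a shuffled list of (word, condition).
    """
    rng = random.Random(seed)
    needed = words_per_level * len(LEVELS)
    if len(word_pool) < needed:
        raise ValueError("word pool is too small for the requested allocation")
    chosen = rng.sample(word_pool, needed)  # sampling without replacement
    trials = []
    for i, level in enumerate(LEVELS):
        for word in chosen[i * words_per_level:(i + 1) * words_per_level]:
            trials.append((word, level))        # non-rotated version
            trials.append((word, "r" + level))  # spectrally rotated homologue
    rng.shuffle(trials)
    return trials
```

With a pool of 108 words and 27 words per level, this yields 216 non-target trials per block, 27 in each of the eight rotated and non-rotated conditions.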

#### **PROCEDURE**

Participants were seated in front of a computer screen, at a distance of approximately 1 m, in a sound-insulated Faraday cage. The overhead light in the booth was turned off while EEG was recorded. Stimuli were delivered through ER4 noise-isolating in-ear headphones (Etymotic Research Inc., Elk Grove Village, Illinois) at a comfortable sound level, adjusted at the beginning of the experiment.

The experiment was divided into three blocks of 12.5 min and 279 trials each. Participants were instructed to listen carefully to the words and respond each time they heard an animal name by pressing a button on a response-box placed on a table in front of them with their right index finger. Brain electrical activity was recorded during the whole period of each block. Short breaks took place after the end of the first two blocks in order to check, and if necessary reduce, the impedances of the electrodes.

Each trial began with the appearance of a fixation cross at the center of the screen. The stimulus was presented after a silent interval whose duration was randomly jittered on a uniform distribution between 400 and 600 ms after the onset of the fixation cross. The fixation cross remained on screen during the presentation of the word so as to encourage fixation and minimize eye movements. A symbol appeared on-screen 1000 ms after the offset of the stimulus and remained for 1000 ms, indicating to participants that they could blink if they wished. By cueing blinks we aimed to ensure that any blinking-related electrical artefacts would not contaminate the recording of EEG responses to the stimuli. The structure of a single trial is illustrated in **Figure 2**.

#### **EEG RECORDING**

Brain electrical responses were recorded with a 256-electrode Electrical Geodesics HydroCel system (Electrical Geodesics Inc., Eugene, Oregon). Signal was recorded continuously and digitized at a sampling rate of 1000 Hz. By default, the recording system sampled at 20 kHz before applying an analog hardware antialiasing filter with a cut-off frequency of 4 kHz and downsampling the signal to 1000 Hz, and applying a software low-pass Butterworth filter with a cutoff of 400 Hz. The reference electrode was Cz, situated at the vertex. Electrode impedances were checked at the beginning of the session and after the first and second recording blocks and were under 30 kΩ at the beginning of each block.

#### **DATA ANALYSIS**

#### **Behavioral data**

We computed and analyzed accuracy scores and response times for correct trials. We also analyzed the number of false alarms per condition.

#### **EEG data preprocessing and artifact removal**

EEG data were initially analyzed using custom Matlab (The Mathworks, Natick, MA) scripts and the freely available EEGLAB toolbox (Delorme and Makeig, 2004). Data were re-referenced to average reference and downsampled to 200 Hz (after lowpass filtering to avoid aliasing). For further analyses a set of 204 channels was analyzed (channels covering the cheeks were excluded). The three blocks per subject were concatenated and an Independent Components Analysis (ICA) was computed on the whole dataset using the Infomax routine from the Matlab-based EEGLAB toolbox (Delorme and Makeig, 2004) in order to remove blink artefacts and 50 Hz line-noise. The resulting data were filtered between 0.75 and 40 Hz using a fifth order Butterworth bandpass filter and separated into epochs starting 800 ms before the sound onset and finishing 1000 ms after. A baseline correction was applied by subtracting from each epoch the mean signal computed over 200 ms preceding the onset of the stimulus. The filtered and epoched data were scanned to detect epochs in which the amplitude between 300 ms pre-stimulus onset and the end of the epoch was higher than 75 µV or lower than −75 µV. These epochs were considered outliers and were excluded from further analysis.
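The baseline correction and amplitude-based rejection described above can be sketched in a few lines. This is a simplified illustration rather than the authors' Matlab code. It assumes epochs stored as lists of channels sampled at 200 Hz from −800 to 1000 ms (360 samples), and it assumes that an epoch is rejected as soon as any channel crosses the ±75 µV threshold. All function names are hypothetical.

```python
FS = 200                # sampling rate after downsampling (Hz)
EPOCH_START_MS = -800   # epoch onset relative to sound onset
THRESHOLD_UV = 75.0     # rejection threshold (µV)


def ms_to_index(t_ms):
    """Convert a latency relative to sound onset (ms) to a sample index."""
    return int(round((t_ms - EPOCH_START_MS) * FS / 1000))


def baseline_correct(channel):
    """Subtract the mean of the 200 ms preceding stimulus onset."""
    start, stop = ms_to_index(-200), ms_to_index(0)
    baseline = sum(channel[start:stop]) / (stop - start)
    return [v - baseline for v in channel]


def is_outlier(epoch):
    """True if any channel exceeds +/-75 µV from -300 ms to the epoch end."""
    start = ms_to_index(-300)
    return any(abs(v) > THRESHOLD_UV for ch in epoch for v in ch[start:])


def clean_epochs(epochs):
    """Baseline-correct every channel of every epoch, then drop outliers."""
    corrected = [[baseline_correct(ch) for ch in ep] for ep in epochs]
    return [ep for ep in corrected if not is_outlier(ep)]
```

Note that samples earlier than 300 ms before stimulus onset are not inspected, so a large deflection at the very start of an epoch would not by itself lead to rejection.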

Furthermore, only trials with non-target words and no button presses (i.e., false alarms) were kept for further EEG analyses.

#### **Analysis of evoked activity—ERP analysis**

Single-trial evoked responses were averaged for each condition separately to examine the component of neural activity that is phase-locked to the stimulation event, i.e., to obtain condition-specific event-related potentials (ERPs). Data were analyzed in a time window from −100 to 1000 ms relative to stimulus onset.

#### **Analysis of induced activity—time-frequency analysis**

Using the EEGLAB toolbox, we chose a Hanning-tapered short-term fast Fourier transform (stFFT) to compute time-frequency representations of our data in the range of the alpha band (8–13 Hz). These time-frequency power fluctuations (referred to as *TF*<sub>α</sub>, in µV<sup>2</sup>) were then converted to a logarithmic dB scale using the following equation:

$$dB_{\alpha} = 10 \times \log_{10}(TF_{\alpha}) \tag{1}$$

and finally averaged across the defined frequency band. As for the ERPs, data were analyzed in a time window from −100 to 1000 ms.
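A minimal sketch of this computation for a single analysis window is given below. It is not the EEGLAB routine: the window length, the power scaling and the function names are assumptions made for illustration. It applies a Hann taper, computes the discrete Fourier transform at the bins falling within 8–13 Hz, converts each power value to dB with Equation (1), and then averages across the band.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann (Hanning) window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def alpha_power_db(segment, fs=200, band=(8.0, 13.0)):
    """Hann-tapered DFT power of one segment, in dB, averaged over the band."""
    n = len(segment)
    tapered = [s * w for s, w in zip(segment, hann(n))]
    db_values = []
    for k in range(1, n // 2):
        freq = k * fs / n
        if band[0] <= freq <= band[1]:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(tapered))
            power = abs(coeff) ** 2 / n               # TF_alpha, arbitrary scaling
            db_values.append(10 * math.log10(power))  # Equation (1)
    return sum(db_values) / len(db_values)
```

Because the conversion is logarithmic, scaling the amplitude of the input by a factor of 0.1 lowers the result by 20 dB regardless of the absolute scaling chosen for the power estimate.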

#### **ANOVA**

The ERPs as well as the stFFT data were submitted to a repeated-measures ANOVA, including two factors: spectral detail (four levels: 4-bands, 8-bands, 16-bands and band-pass filtered), and spectral rotation (two levels: non-rotated and rotated), and including subject as a random factor. We were specifically interested in the interaction of spectral detail and spectral rotation to identify correlates of degraded word processing that are not confounded with the factor of spectral detail. The ANOVA was performed for each time point and electrode.
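For a single electrode and time point, the interaction term of such a design can be computed as in the sketch below. It implements the standard sums of squares for a two-way fully within-subject ANOVA, with the interaction tested against the interaction-by-subject error term. It is an illustration under these textbook assumptions, not the authors' analysis code, and the function names are hypothetical.

```python
def _mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def interaction_f(data):
    """F statistic for the A x B interaction in a two-way repeated-measures ANOVA.

    data[s][a][b] holds the value for subject s, level a of factor A (spectral
    detail) and level b of factor B (rotation).
    Returns (F, df_effect, df_error).
    """
    n, na, nb = len(data), len(data[0]), len(data[0][0])
    S, A, B = range(n), range(na), range(nb)

    grand = _mean([data[s][a][b] for s in S for a in A for b in B])
    m_s = [_mean([data[s][a][b] for a in A for b in B]) for s in S]
    m_a = [_mean([data[s][a][b] for s in S for b in B]) for a in A]
    m_b = [_mean([data[s][a][b] for s in S for a in A]) for b in B]
    m_ab = [[_mean([data[s][a][b] for s in S]) for b in B] for a in A]
    m_sa = [[_mean(data[s][a]) for a in A] for s in S]
    m_sb = [[_mean([data[s][a][b] for a in A]) for b in B] for s in S]

    # Interaction sum of squares
    ss_ab = n * sum((m_ab[a][b] - m_a[a] - m_b[b] + grand) ** 2
                    for a in A for b in B)
    # Error term: A x B x subject residual
    ss_err = sum((data[s][a][b] - m_sa[s][a] - m_sb[s][b] - m_ab[a][b]
                  + m_s[s] + m_a[a] + m_b[b] - grand) ** 2
                 for s in S for a in A for b in B)

    df_ab = (na - 1) * (nb - 1)
    df_err = df_ab * (n - 1)
    return (ss_ab / df_ab) / (ss_err / df_err), df_ab, df_err
```

With 14 subjects, four levels of spectral detail and two levels of rotation, the degrees of freedom are (3, 39), matching the *F*(3,39) values reported for the interaction.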

#### **Inverse solution modeling**

For the ERPs as well as the stFFT results we used the weighted minimum norm (WMN) approach (Lin et al., 2006), which belongs to the family of distributed inverse solution methods. We computed inverse solutions by applying the WMN inverse matrix to the data (5018 solution points, no regularization). The lead field model we employed was a LSMAC model (locally spherical model with anatomical constraints, for details about this approach see Brunet et al., 2011) which then in turn is applied to the MNI brain template (ICBM152) coregistered to EEG electrodes. Standard spatial electrode positions were used in all subjects, co-registered to the template MRI by adjusting the position of the nasion, inion, Cz and pre-auricular landmarks.

For the ERP data, condition and subject specific averages were inverted using the method described above and then subject to the same ANOVA procedure as the surface EEG data. To compute the inverse equivalent of the stFFT surface data in the alpha-band, we filtered the single EEG trials for each condition and subject in the alpha frequency band, inverted them and then averaged the normed (i.e., scalar) values of all single trials (similar to the approach used by de Pasquale et al., 2013). This results in a temporally resolved representation of alpha-band power distributed across 5018 solution points in the inverse space. Afterwards, these WMN inverted averages were subject to the ANOVA procedure as stated above. For visualization, results were projected onto the coregistered MNI brain template using the Cartool toolbox (Brunet et al., 2011).
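The order of operations for the alpha-band source estimate (invert each single trial, take the norm at each solution point, then average across trials) can be sketched as follows. The layout of the inverse matrix, with three consecutive rows (x, y, z) per solution point, is an assumption made for this illustration, and the function names are hypothetical.

```python
import math


def apply_inverse(inverse_matrix, scalp):
    """Multiply the (3 * n_points) x n_electrodes inverse matrix by one scalp map."""
    return [sum(w * v for w, v in zip(row, scalp)) for row in inverse_matrix]


def source_norms(inverse_matrix, scalp):
    """Norm of the 3-D current estimate at each solution point."""
    est = apply_inverse(inverse_matrix, scalp)
    return [math.sqrt(est[i] ** 2 + est[i + 1] ** 2 + est[i + 2] ** 2)
            for i in range(0, len(est), 3)]


def mean_source_amplitude(inverse_matrix, trials):
    """Average the per-trial norms across trials (one scalp map per trial)."""
    per_trial = [source_norms(inverse_matrix, scalp) for scalp in trials]
    return [sum(col) / len(per_trial) for col in zip(*per_trial)]
```

Taking the norm before averaging matters for induced activity: two trials with opposite polarity would cancel if the signed estimates were averaged first, whereas their norms do not.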

## **RESULTS**

#### **BEHAVIORAL DATA**

Mean accuracy values for the non-rotated potentially intelligible conditions were 26, 66, 80 and 91% for the 4-bands, 8-bands, 16-bands and bandpass-filtered stimuli respectively. Accuracy values were entered in a one-way ANOVA with spectral detail as the within-subject factor (four levels: 4-bands, 8-bands, 16-bands and bandpass-filtered), which revealed a significant main effect of spectral detail (*F*(3,39) = 339.931, *p* < 10<sup>−6</sup>, partial-η<sup>2</sup> = 0.96). The mean accuracy value per condition is displayed in **Figure 3A**.
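As a consistency check (not a computation reported by the authors), partial-η<sup>2</sup> for a single effect can be recovered from the *F* statistic and its degrees of freedom. Using the accuracy ANOVA above:

$$\eta_p^2 = \frac{F \cdot df_{\mathrm{effect}}}{F \cdot df_{\mathrm{effect}} + df_{\mathrm{error}}} = \frac{339.931 \times 3}{339.931 \times 3 + 39} \approx 0.96$$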

Response times for the intelligible correct trials were also computed and entered in a one-way repeated-measures ANOVA with spectral detail as the within-subjects factor (four levels, as described above). There was a significant main effect of spectral detail (*F*(3,39) = 26.531, *p* < 0.001, partial-η<sup>2</sup> = 0.67), with response latencies being shorter for more intelligible words and longer for more degraded words. Response time results are displayed in **Figure 3B**. We did not compute accuracy values or response times for the rotated trials, as no response could be categorized as "correct" in the face of incomprehensible stimuli. Those instances where participants did respond must be considered false alarms.

We also analyzed the number of false alarms, as we expected more false alarms in the unintelligible or potentially intelligible but difficult conditions. The 1-band NV words were not included in this analysis. The number of false alarms was entered into a repeated-measures ANOVA with spectral detail (four levels, as described above) and spectral rotation (two levels: spectrally rotated and non-rotated) as within-subjects factors. There was a significant main effect of spectral complexity (*F*(3,19) = 5.037, *p* = 0.021, partial-η<sup>2</sup> = 0.279) and a significant spectral complexity by rotation interaction (*F*(3,19) = 8.534, *p* = 0.001, partial-η<sup>2</sup> = 0.396). The mean number of false alarms per condition is displayed in **Figure 3C**.

#### **EVOKED ACTIVITY—ERP ANALYSIS**

Average ERPs were entered into a two-way, repeated measures analysis of variance with number of frequency bands (i.e., spectral detail) and rotation as within-subjects factors. ANOVA was performed for each channel and sample point. To correct for multiple comparisons, we employed the approach of controlling the false discovery rate, FDR (Benjamini and Yekutieli, 2001) using a threshold of *p*(FDR) < 0.05. In order to ensure we report robust results, we additionally disregard results involving fewer than two electrodes or having a duration of less than three consecutive sample points (15 ms). After FDR correction, we found the following main effects, at *p*(FDR) < 0.05 (uncorrected *p*-values are provided here): There was a significant main effect of rotation, temporally centered around 305 ms, involving up to eight suprathreshold electrodes. At the centroparietal peak electrode 132, *F*(1,13) = 57.59, *p* < 3.97 × 10<sup>−6</sup>, the effect ranged from 295 to 325 ms. A main effect of spectral detail was found in a set of seven left superior-parietal and right temporal electrodes, in a time-window ranging from 185 to 215 ms post stimulus-onset. At the centroparietal peak electrode 81, *F*(3,39) = 16.5, *p* < 4.45 × 10<sup>−7</sup>.
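The FDR step can be sketched as a step-up procedure over the *p*-values from all channels and sample points. The version below includes the harmonic-sum correction for arbitrary dependence described by Benjamini and Yekutieli (2001). Whether the analysis applied exactly this variant is an assumption, and the function name is hypothetical.

```python
def benjamini_yekutieli(p_values, q=0.05):
    """Return a list of booleans marking the tests rejected at FDR level q."""
    m = len(p_values)
    # Harmonic-sum correction for arbitrary dependence between tests
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * q / (m * c_m)
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / (m * c_m):
            k_max = rank
    # Reject every hypothesis up to and including that rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

In the analysis described above, the surviving tests would additionally have to form clusters of at least two electrodes and three consecutive sample points.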

The focus of our analysis was the interaction between the number of frequency bands and spectral rotation, which was intended to specifically reveal the EEG signatures of listening to degraded words while controlling for the impact of spectral complexity. We observed a significant spectral detail by rotation interaction located over a set of spatially coherent central parieto-frontal electrodes, in one distinct post-stimulus time-window, from 305–390 ms, peaking at 345 ms (comprising a cluster of up to five supra-threshold electrodes around electrode 186, with maximum *F*(3,39) = 21.96, *p* < 1.72 × 10<sup>−8</sup>, **Figures 4A, B**). It appears that the effect in this time-window is driven specifically by a significantly greater positivity in response to the bandpass-filtered words in comparison to all other categories of stimuli (see **Figure 4D**). In order to confirm this hypothesis, a post-hoc analysis was carried out, comparing the clear condition with all others, pooled, in a univariate ANOVA, including subject as a random factor. This showed that the clear condition was indeed significantly different from all the others, *F*(1,13) = 45.303, *p* < 0.001, partial-η<sup>2</sup> = 0.777.

#### **INDUCED ACTIVITY—TIME-FREQUENCY ANALYSIS**

ANOVA and correction for multiple comparisons were performed as for the ERP analysis reported above. After FDR correction, we found no significant main effects. We found a significant interaction in the alpha-band, at electrodes situated over the left temporal lobe, from 462–633 ms, with the peak of this effect found at 533 ms (comprising a spatially coherent cluster of eight supra-threshold electrodes around left-temporal electrode 75 with maximum *F*(3,39) = 12.12, *p* < 9.23 × 10<sup>−6</sup>, see **Figures 5A, B**). This spatiotemporally coherent effect was manifest as a decrease in alpha-band power (i.e., alpha suppression) as a function of stimulus clarity for the NV but not rNV conditions (see **Figures 5D, E**). In contrast to the earlier effect revealed in the ERP analysis, this effect showed a graded response, suggesting sensitivity to the difficulty of comprehending words at different levels of degradation. Sidak-corrected post-hoc pairwise comparisons of the alpha-band power averaged over the five electrodes showing the greatest interaction effect in the period from 462–633 ms for the four potentially-comprehensible conditions partly support this: NV4 > NV16 (*p* = 0.021), NV4 > BP (*p* = 0.011), NV8 shows marginally less alpha power than BP (*p* = 0.063), other comparisons not significant. Although not all pairwise differences between the levels are significant, this is suggestive of increasing alpha-suppression with increasing intelligibility.
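For reference, the Sidak adjustment for $m$ comparisons converts an unadjusted *p*-value into

$$p_{\mathrm{Sidak}} = 1 - (1 - p)^{m}$$

Assuming that all pairwise comparisons among the four conditions were considered, $m = \binom{4}{2} = 6$ (the number of comparisons is not stated in the text). As a purely illustrative value not taken from the data, an unadjusted $p = 0.004$ would become $1 - 0.996^{6} \approx 0.024$.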

#### **INVERSE SOLUTIONS**

For visualization of results in the inverse space, we performed the same statistical tests as on the surface data, confined to the two temporal windows showing FDR-corrected effects in the ERP and in the induced activity. The results are presented in **Figures 4C**, **5C**, showing the resulting *F*-values averaged over the time-window of interest.

For effects on evoked activity, the source of the corresponding effect (i.e., the interaction between spectral complexity and rotation) in the time window from 300–400 ms was principally located in the medial precentral gyrus (BA6, extending posteriorly into BA4), with a peak at approximately −13, −27, 63 (MNI co-ordinates). This effect is shown in **Figure 4C**.

The source corresponding to the interaction between spectral complexity and rotation in the alpha band (time window 450–650 ms) was localized to the left superior temporal lobe/inferior parietal lobe. The maximum effect was located at −46, −27, 20 (MNI co-ordinates), and spanned the posterior reaches of the left supramarginal gyrus and the posterior superior temporal gyrus. The effect is presented in **Figure 5C**.

## **DISCUSSION**

Behavioral data demonstrate that the rNV conditions were unintelligible to the participants in this study, while the intelligibility of the NV conditions increased as a function of the number of frequency bands used, as previously demonstrated (e.g., Shannon et al., 1995; Davis and Johnsrude, 2003). Although we did not obtain responses for every trial, we are confident that the d' measures established for the target-detection task reflect the mean level of intelligibility of each of the conditions employed.

Our analysis of the EEG data revealed two time-periods in which the brain response is significantly affected by the potential intelligibility of a stimulus. In the earlier one of these, we show an effect in evoked activity that appears to relate to the early identification of clear stimuli as a special category. Its timing and its topography suggest that we may be detecting modulation of the P300 component (for a review of the P300, see Polich, 2007). The P300 is usually apparent as a positivity in the midline parietofrontal electrodes and is observed in tasks, such as the animal name detection task used here, that require stimulus discrimination. Its appearance has been suggested to be an index of the occurrence of stimulus events sufficiently important to inhibit concurrent processes, and has therefore been associated with attention and memory processes (Polich, 2007). In the present case, it could reflect the fact that deciding whether the stimulus is a target or not is substantially easier when the word is more easily intelligible. Another possibility is that because the clear stimuli are categorically different from all others, this relatively early effect results from an oddball effect (Polich and Margala, 1997). This is also supported by the fact that the probability of a clear stimulus being presented was relatively low, at only 13%. The fact that this evoked effect seems to be central rather than left-lateralized could also be seen to support the view that this component is a general cognitive component rather than one specific to language processing.

We did not find effects in the evoked responses that we can more specifically ascribe to comprehension processes. This may be the result of the inherent difficulty of time-locking EEG responses to specific events during auditory word processing. Although we used the acoustic onset of the stimuli as our marker for the beginning of each epoch, given that we are interested in comprehension processes rather than auditory processes, this may not be the best point. Ideally, we might have used the recognition point determined using (for example) gating (Grosjean, 1980). This issue, of course, applies to all studies of spoken word perception that use measures with a high temporal resolution.

During a second time-window, we showed a significant interaction between spectral rotation and spectral complexity on induced activity, specifically in the alpha band, reflecting increased alpha suppression as a function of increasing spectral complexity in the NV (i.e., potentially comprehensible) and not the rNV (always unintelligible) stimuli. This finding is superficially similar to that reported by Obleser and Weisz (2012), who demonstrated a connection between alpha-band suppression and intelligibility of stimuli. However, by including the spectrally rotated stimuli as an unintelligible control condition we extended this finding by confirming that the relationship between alpha-suppression and intelligibility is almost certainly the result of brain mechanisms that extract meaning from words, rather than the result of purely acoustic analysis of more vs. less complex auditory input. Furthermore, the effects we find occur earlier than those reported by Obleser and Weisz (2012). This may be the result of our having employed exclusively monosyllabic words rather than the mixture of mono-, bi- and trisyllabic words they employed.

As discussed in Section Introduction, the functional role of auditory alpha is unclear, although it is increasingly becoming a focus of attention in the field of speech perception. At present, the relevant evidence seems to indicate that alpha-power suppression is related to comprehension, and it has been suggested that this could relate to the role of alpha-band activity in suppressing higher-frequency oscillations. It may be the case that alpha-band activity reveals an inhibitory process that suppresses processes whose occurrence is indicated by the presence of higher-frequency oscillations (Osipova et al., 2008). Thus, alpha-suppression may reflect the release of inhibition of processing. In the present case, where alpha-suppression reflects the intelligibility of auditory input, it is highly likely that the process indexed by this effect is related to word identification and comprehension. This effect is present in the time range of the N400 complex, which is related to semantic processing (Kutas and Hillyard, 1984).

Increased alpha-suppression for more easily comprehensible words during this time window is consistent with the idea that this effect reveals the on-average greater semantic processing in the easier compared to the harder conditions. However, a less prosaic explanation of this effect would be that the relatively higher alpha power in the more challenging conditions reflects increased functional inhibition related to the demands placed on attention and working memory by the challenge of comprehending acoustically degraded speech. Indeed, several studies have shown that the ability to comprehend degraded speech is influenced by working memory capacity (Ronnberg et al., 2008, 2010; Zekveld et al., 2009, 2011, 2012; Rudner et al., 2011) and selective attention (Shinn-Cunningham and Best, 2008; Wild et al., 2012b). It is possible that this effect is the manifestation of the inhibitory networks that are required to maintain selective attention and working memory processes to support the comprehension of degraded speech. A possibility raised by Obleser and Weisz (2012) is that the increase in alpha-suppression observed in response to less degraded stimuli is not necessarily increased suppression as a function of increased intelligibility, but increasing alpha-power as stimulus intelligibility falls. If this is the case, then the effect may be explained if alpha is considered to be a gating mechanism that inhibits other cognitive processes locally while challenging speech comprehension takes place.

Our data indicate that, in comparison to baseline, there is indeed an increase in alpha-power in response to the most challenging NV condition (confirmed by a one-tailed post-hoc *t*-test, *p* = 0.017), suggesting that the role of temporal alpha may be to actively regulate the engagement of downstream cognitive processes. In the present case, it is possible that suppressing the final stages of word-identification processes (i.e., lexical access or semantic access) is a way of improving word-identification accuracy by enabling additional time for processing of the acoustic input. The possibility that the identification processes are delayed as a function of degradation can be investigated in future studies.

The effect at this time was localized to a region incorporating the left supramarginal gyrus (SMG), left inferior parietal lobule, and the posterior left superior temporal gyrus. This approximate localization situates the effect in a region that has been considered by some authors to be part of Wernicke's region (Geschwind, 1972; Bogen and Bogen, 1976) and has also been associated with sensorimotor integration during speech comprehension (Hickok and Poeppel, 2007). A role for sensorimotor integration during degraded speech comprehension would be consistent with the notion that, when confronted with degraded speech, the speech comprehension system falls back on higher-level (top-down) mechanisms to help resolve the degraded acoustic input. In this case, since very little linguistic information is present in the degraded monosyllabic stimuli, such information could potentially be articulatory in nature (e.g., as suggested by Hervais-Adelman et al., 2012). Nevertheless, beyond implicating this region in degraded speech processing, evidence for its exact role is scant.

The localization of the source of this alpha-suppression effect using a weighted minimum norm estimate was rather different to that shown by Obleser and Weisz (2012) using a beamformer approach. While we show relatively localized sources, they showed more distributed sources incorporating anterior temporal regions, parietal lobes, right dorsolateral prefrontal cortex and left inferior frontal regions.

Another potentially fruitful line of enquiry concerns the link between alpha-suppression and the pulvinar nucleus of the thalamus. Although we do not directly observe any effects in the pulvinar in this study, an investigation by Erb et al. (2012) implicated this structure as being linked to individuals' ability to learn to comprehend NV speech. Activity in the pulvinar nucleus has been suggested as a driver of alpha suppression (Lopes da Silva et al., 1980; Liu et al., 2012), and is also linked to attentional control in vision (e.g., Kastner et al., 2004; Smith et al., 2009) and audition (Wester et al., 2001), while attention has been found to affect the cerebral processing of NV speech (Wild et al., 2012b). Although at this stage this potential link is highly speculative, we would suggest that the thalamo-cortical networks implicated in alpha-band activity are worthwhile targets for investigations of speech comprehension.

Interestingly, we do not find regions in which alpha-suppression shows the opposite response to intelligibility (i.e., alpha is more suppressed when stimuli are harder to understand). This is inconsistent with the fMRI literature, in which many studies have demonstrated the recruitment of additional brain areas when speech perception is challenging due to acoustic degradation compared to when it is clear. For example, several studies have demonstrated that left inferior frontal gyrus regions are engaged when acoustically degraded speech is presented (Davis and Johnsrude, 2003; Giraud et al., 2004; Obleser et al., 2007; Hervais-Adelman et al., 2012; Wild et al., 2012b), while others have shown increased recruitment of anterior and posterior temporal lobe structures in the face of acoustic degradation (Scott et al., 2000; Narain et al., 2003; Evans et al., 2013). We would have expected to see such effects reflected in the EEG signature of degraded speech processing. That we do not find effects in this direction in these data is puzzling and demands further investigation. However, generation of alpha oscillations can differ considerably from site to site, even within the visual system. This has been reported, for example, with regard to the originating layers of alpha and even the relationship of the measured alpha oscillations to behavioral performance (Bollimunta et al., 2008).

Another interesting detail is that the sites of our observed major effect in the alpha-band and the sites of highest spectral power in the alpha-band do not spatially coincide. As is commonly known, occipital electrodes exhibit much higher resting-state alpha power than the remaining electrodes. The left-temporal cluster identified in our study shows only moderate pre-stimulus alpha-amplitude. Yet its modulation is highly significant. This demonstrates that while such alpha modulation might be more difficult to identify, it can nonetheless be extracted.

In general, the link between alpha rhythm and language/word processing is relatively sparsely described. While there seems to be some consensus about auditory alpha/tau (see Lehtela et al., 1997), the idea of higher-level language-related alpha oscillations is still quite recent (e.g., Obleser and Weisz, 2012) and highly exciting. Whether this points to an even more universal role of alpha oscillations than previously envisioned remains to be seen and will require further investigations in more diverse experimental settings.

## **CONCLUSION**

Taken together, the data presented above suggest that the auditory system can rapidly differentiate easily-comprehensible speech (approximately 300 ms after stimulus onset), and that potentially-comprehensible degraded speech can begin to be processed differently to incomprehensible speech as early as 480 ms after stimulus onset. We demonstrate that alpha-band power in the left temporal lobe is modulated by stimulus intelligibility. The spatial and temporal features of this effect suggest that it indexes the ease of access to word meaning. Our paradigm ensures that the alpha-band desynchronization we observe is not simply related to increasing spectral detail in the stimuli, and thus confirms that alpha-band desynchronization at left temporal sites reflects word intelligibility. These findings provide insights into the neural mechanisms of degraded speech perception, and may be taken to suggest that one strategy in compensating for acoustically-degraded input may be to suppress recognition processes that ordinarily take place rapidly and automatically, in order to permit additional processing to be carried out on the degraded signal.

## **ACKNOWLEDGMENTS**

This work was supported by a Marie Curie fellowship to Robert Becker and received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 267171. Christoph M. Michel is supported by the Swiss National Science Foundation (Grant No. 310030\_132952). Alexis Hervais-Adelman and Maria Pefkou were supported by the Swiss National Science Foundation (Grant No. 320030\_122085 awarded to Professor Narly Golestani). The EEG equipment and Cartool software are supported by the Center for Biomedical Imaging (CIBM), Geneva and Lausanne, Switzerland. We would like to thank the reviewers for their careful reading of the manuscript and their helpful comments.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 September 2013; accepted: 10 December 2013; published online: 27 December 2013*.

*Citation: Becker R, Pefkou M, Michel CM and Hervais-Adelman AG (2013) Left temporal alpha-band activity reflects single word intelligibility. Front. Syst. Neurosci. 7:121. doi: 10.3389/fnsys.2013.00121*

*This article was submitted to the journal Frontiers in Systems Neuroscience*.

*Copyright © 2013 Becker, Pefkou, Michel and Hervais-Adelman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## Corrigendum: Left temporal alpha-band activity reflects single word intelligibility

#### *Robert Becker<sup>1</sup>, Maria Pefkou<sup>2</sup>, Christoph M. Michel<sup>1</sup> and Alexis G. Hervais-Adelman<sup>2</sup>\**

*<sup>1</sup> Functional Brain Mapping Lab, Department of Fundamental Neuroscience, University of Geneva, University Medical School, Geneva, Switzerland*

*<sup>2</sup> Brain and Language Lab, Department of Clinical Neuroscience, University of Geneva, University Medical School, Geneva, Switzerland*

*\*Correspondence: alexis.adelman@unige.ch*

#### *Edited and reviewed by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

**Keywords: speech intelligibility, degraded speech, noise-vocoding, alpha oscillations, left inferior temporal cortex**

#### **A commentary on**

#### **Left temporal alpha-band activity reflects single word intelligibility**

*by Becker, R., Pefkou, M., Michel, C. M., and Hervais-Adelman, A. G. (2013). Front. Syst. Neurosci. 7:121. doi: 10.3389/fnsys.2013.00121*

Figure 5 of the article by Becker et al. (2013) contained a minor error, which we hereby rectify. In the original figure at the bottom left of panel C the indication of the sagittal section used for display of the inverse solution is incorrect. We therefore re-submit **Figure 5** with the correct cross-section for subpanel C.


*Received: 27 February 2014; accepted: 14 March 2014; published online: 01 April 2014.*

*Citation: Becker R, Pefkou M, Michel CM and Hervais-Adelman AG (2014) Corrigendum: Left temporal alpha-band activity reflects single word intelligibility. Front. Syst. Neurosci. 8:47. doi: 10.3389/fnsys.2014.00047*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2014 Becker, Pefkou, Michel and Hervais-Adelman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

**FIGURE 5 | Results of ANOVA of induced activity in the alpha-band. (A)** Electrode-by-time plot of the *p*-values for the interaction of rotation × spectral detail with corresponding *F*- and *p*-values, thresholded at *p* = 0.001, revealing the time-window of interest (462–633 ms). The color bar indicates the corresponding *F*- and *p*-values; the threshold for *p*(FDR) < 0.05 is indicated. **(B)** Topography of this effect, using the same color scale as in **(A)** at the peak of the effect (533 ms), indicating a contribution of left-temporal sources. **(C)** Localization of this effect in the inverse space, the main source being in the left supramarginal gyrus extending into left inferior parietal and superior temporal structures, showing the average *F*-statistic over the time-window of interest. **(D)** Average time-course of this effect in a cluster of five contributing electrodes across NV conditions, demonstrating enhanced alpha-band suppression for more intelligible conditions. **(E)** Corresponding time-courses for the spectrally rotated conditions, where the effect of spectral detail is absent. **(F)** Alpha-band activity for each condition in the significant time-window; error bars represent standard error of the mean corrected to be appropriate for repeated-measures comparisons, as described in Loftus and Masson (1994).

## Compensatory changes in cortical resource allocation in adults with hearing loss

## *Julia Campbell<sup>1</sup> and Anu Sharma<sup>1,2</sup>\**

*<sup>1</sup> Department of Speech, Language and Hearing Sciences, University of Colorado at Boulder, Boulder, CO, USA*

*<sup>2</sup> Institute of Cognitive Science, University of Colorado at Boulder, Boulder, CO, USA*

#### *Edited by:*

*Arthur Wingfield, Brandeis University, USA*

#### *Reviewed by:*

*Preston E. Garraghty, Indiana University, USA*

*Teresa Mitchell, University of Massachusetts Medical School, USA*

#### *\*Correspondence:*

*Anu Sharma, Department of Speech, Language and Hearing Sciences, University of Colorado at Boulder, 2501 Kittredge Loop Road, Boulder CO 80309, USA e-mail: anu.sharma@colorado.edu*

Hearing loss has been linked to many types of cognitive decline in adults, including an association between hearing loss severity and dementia. However, it remains unclear whether cortical re-organization associated with hearing loss occurs in early stages of hearing decline and in early stages of auditory processing. In this study, we examined compensatory plasticity in adults with mild-moderate hearing loss using obligatory, passively-elicited, cortical auditory evoked potentials (CAEP). High-density EEG elicited by speech stimuli was recorded in adults with hearing loss and age-matched normal hearing controls. Latency, amplitude and source localization of the P1, N1, P2 components of the CAEP were analyzed. Adults with mild-moderate hearing loss showed increases in latency and amplitude of the P2 CAEP relative to control subjects. Current density reconstructions revealed decreased activation in temporal cortex and increased activation in frontal cortical areas for hearing-impaired listeners relative to normal hearing listeners. Participants' behavioral performance on a clinical test of speech perception in noise was significantly correlated with the increases in P2 latency. Our results indicate that changes in cortical resource allocation are apparent in early stages of adult hearing loss, and that these passively-elicited cortical changes are related to behavioral speech perception outcome.

**Keywords: adult, sensorineural hearing loss, cortical auditory evoked potential, cortical resource allocation, source localization**

## **INTRODUCTION**

Adults with hearing impairment have been shown to exhibit concomitant deficiencies in cognitive performance (see Craik, 2007; Tun et al., 2012, for a review). A possible reason for this interaction between hearing loss (HL) and cognition is an increase in cognitive load as greater attention is devoted to auditory signals in hearing impairment. For instance, when hearing-impaired adults allocate cognitive processing strategies to understand a degraded incoming auditory signal, the increased load at a basic processing level may detract from later cognitive performance downstream (Pichora-Fuller et al., 1995; Pichora-Fuller and Singh, 2006). As a result, cognitive processes such as memory and executive function are adversely affected in hearing impairment (Arlinger et al., 2009; Lunner et al., 2009; Rönnberg et al., 2010, 2011a,b; Lin, 2011; Rudner et al., 2012; Lin, 2013).

Studies using functional neuroimaging, neural models, and behavioral measures have demonstrated a strong relationship between auditory cortical integrity and the processing of challenging auditory information, such as degraded signals and complex speech in individuals with HL (Wingfield et al., 2006; Harris et al., 2009; Miller and Wingfield, 2010; Peelle et al., 2010a,b, 2011; Wong et al., 2010).

Recent research has shown a compelling correlation between degree of HL severity and all-cause dementia (including Alzheimer's disease), suggesting that increases in auditory deprivation may subsequently influence overall cognitive decline (Lin, 2011, 2012, 2013; Lin et al., 2011a,b). Lin et al. (2011a; Lin, 2013) discuss the decrease in cognitive reserve accompanying HL as a possible mechanism for the link between HL and dementia. Cognitive or neural reserve reflects the ability of the brain to compensate for the deleterious effects of sensory deprivation through the recruitment of alternative or additional brain networks to perform a specific task (Boyle et al., 2008). Sensory deprivation, as in HL, appears to tax the brain by altering normal resource allocation, thereby affecting neural reserve and cognitive performance. Given the relationship between degree of HL and cognitive decline, there appears to be a clear need for systematically examining changes in cortical resource allocation as HL progresses in severity from mild to profound, and to determine whether these changes are apparent at early stages of cortical auditory processing. Electroencephalography (EEG) is a useful measure to examine cortical changes associated with HL due to its non-invasive nature, widespread use in clinical settings and high temporal resolution important in measures of auditory processing.

In this study, we examined cortical re-organization resulting from HL in adult listeners with mild-moderate sensorineural hearing impairment using high-density EEG. We evaluated obligatory, passively-elicited P1, N1, and P2 components of the cortical auditory evoked potential (CAEP) using source localization. We correlated CAEP changes with performance on a clinical test of speech perception in noise to better understand the impact of cortical changes in early stages of hearing decline.

#### **METHODS**

#### **PARTICIPANTS**

Adults between the ages of 37 and 68 years participated in this study (*n* = 17). Subjects were recruited using fliers and recruitment letters. Consent was obtained through documentation approved by the University of Colorado at Boulder Institutional Review Board. Hearing acuity was measured using standard clinical audiometric procedures. Normal hearing (NH) thresholds [below 25 dB Hearing Level (HL)] for frequencies ranging from 0.25–8 kHz were observed for eight of the participants (*M* = 50.5 years, *SD* = ±6.2 years), while the remaining nine demonstrated HL (*M* = 56.9 years, *SD* = ±8.9 years). The HL group showed, on average, NH from 0.25 through 1 kHz and a mild-to-moderate sensorineural HL bilaterally from 2 to 8 kHz. Mean threshold audiograms for the two groups are shown in **Figure 1**. Participants in the HL group had received no clinical intervention, and many were unaware of their HL at the time of enrollment, consistent with the mild nature of their HL, and suggesting that their HL might have been fairly recent. Participants reported no history of neurological impairment. The NH and HL groups did not differ significantly in age [*t*(15) = −1.69, *p* = 0.537].

#### **SPEECH PERCEPTION IN NOISE**

The QuickSIN™, a clinical measure of auditory threshold for sentences in background noise, was used to determine acuity of speech perception in background noise (Killion et al., 2004). Stimuli were presented via a speaker placed at 0° azimuth. Standard clinical testing procedures were used: Listeners were instructed to repeat two sentence lists, consisting of six sentences each, presented at 65 dB HL. Background noise was increased for each consecutive sentence in 5 dB increments, so that the signal-to-noise ratio (SNR) began at 25 dB and ended at 0 dB for the last sentence. The SNR score from the two lists was averaged for each listener, providing the level necessary for each individual to correctly repeat 50% of the key words in each sentence. The lower the SNR score, the greater the level of background noise that could be tolerated by the listener, and the better the performance.

**FIGURE 1 | Average pure tone thresholds across clinical test frequencies (X-axis) for right and left ears, respectively.** Intensity of frequency presentation level is shown on the Y-axis. The normal hearing group (NH) thresholds are depicted in solid black, and the hearing loss (HL) group thresholds in dashed red. Vertical black bars indicate standard deviation. The solid black line illustrates the criterion for normal hearing, at 25 dB HL.
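The scoring step described above can be sketched as follows. This is a minimal illustration, not the procedure as implemented in this study: it assumes the commonly published QuickSIN scoring rule (five key words per sentence, SNR loss = 25.5 minus the total number of key words repeated correctly), which the text does not state explicitly, and the listener data are hypothetical.

```python
from statistics import mean

# Assumption (not stated in the text): the published QuickSIN scoring rule,
# with five key words per sentence and SNR loss = 25.5 - total key words correct.
KEY_WORDS_PER_SENTENCE = 5
SENTENCES_PER_LIST = 6  # presented at 25, 20, 15, 10, 5 and 0 dB SNR


def snr_loss(words_correct_per_sentence):
    """Return the SNR score (dB) for one six-sentence list."""
    if len(words_correct_per_sentence) != SENTENCES_PER_LIST:
        raise ValueError("a QuickSIN list has six sentences")
    if any(not 0 <= w <= KEY_WORDS_PER_SENTENCE for w in words_correct_per_sentence):
        raise ValueError("each sentence has between 0 and 5 key words correct")
    return 25.5 - sum(words_correct_per_sentence)


def averaged_score(list_a, list_b):
    """Average the SNR score over the two lists given to each listener."""
    return mean([snr_loss(list_a), snr_loss(list_b)])


# Hypothetical listener: key words correct per sentence, easiest SNR first.
example_score = averaged_score([5, 5, 5, 4, 3, 1], [5, 5, 4, 4, 2, 1])
```

Under this rule a lower score means that more key words were repeated at unfavorable SNRs, which matches the interpretation given above.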

#### **EEG AUDITORY STIMULI**

Participants were presented with a nonsense speech syllable, /ba/, at a level of 65 dB HL, via two speakers placed at 45° angles in relation to the subject (Sharma et al., 2005). Stimuli were presented at a similar intensity level to all subjects consistent with previous studies examining cortical functioning in HL listeners (e.g., Harkrider et al., 2009; Bertoli et al., 2011; Peelle et al., 2011). Subjects were asked to ignore the stimulus while watching a movie, with the sound off and subtitles on, to ensure that participants remained awake (Sharma et al., 2005). Each /ba/ stimulus was 90 ms in duration and was presented at an inter-stimulus interval of 610 ms. One block of 1200 sweeps was collected per subject.

#### **EEG RECORDING AND ANALYSES**

Participants were fit with a 128-channel electrode net (Electrical Geodesic, Inc.) and seated in a reclining chair in an electromagnetically shielded sound booth. Auditory stimuli were presented via stimulus software E-Prime 2.0. The recording sampling rate was 1 kHz, with a band-pass filter of 0.1–200 Hz.

EEG topographic map analysis was completed offline using Net Station 4 (Electrical Geodesic, Inc.). A two-dimensional voltage map was generated for each group grand average waveform for each of the three obligatory CAEP peak components (P1, N1, P2). Regions of interest (ROI) were identified based on the greatest group differences for each CAEP component. Four ROIs were determined to be present: the frontal region, central region, the left frontal hemisphere (LH), and the right frontal hemisphere (RH). Individual EEG data was then exported from Net Station and imported into the EEGLAB toolbox (Delorme and Makeig, 2004) supported by MATLAB (The MathWorks®, Inc., 2010). Epoched data was baseline corrected to the pre-stimulus interval of 100 ms and initial artifact rejection was performed at ±100 µV. The sampling rate was down-sampled from 1 kHz to 250 Hz in order to decrease processing time, resulting in a change of the post-stimulus time to 592 ms. Concatenated EEG sweeps were then pruned using an independent component analysis (ICA) statistical procedure (Debener et al., 2006, 2008). Additional artifacts, such as ocular and other extraneous muscle movements identified as separate components, were removed from the data. CAEP waveform peak components were visually identified and averaged after this step. For each subject, three electrodes were then grand averaged in each ROI, except for the central ROI where we averaged across four electrodes. Latency and amplitude values were determined for each participant CAEP waveform. All peak component amplitudes (P1, N1, P2) were measured from baseline to peak, or the midpoint of broad peaks. Latencies were chosen at the highest amplitude of the peak, or the midpoint of broad, flat peaks. Planned statistical comparisons were performed on the CAEP latency and amplitude components averaged within each ROI to determine significant differences between groups.
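The baseline correction, amplitude-based rejection and peak measurement described above can be illustrated with a short sketch. It is not the EEGLAB pipeline used in the study: the function names are hypothetical, down-sampling and ICA pruning are omitted, and the automatic peak search is only a stand-in for the visual peak identification reported here. For a negative component such as N1, the minimum rather than the maximum would be taken.

```python
from statistics import mean

SAMPLING_RATE_HZ = 1000   # recording rate reported in the text
BASELINE_MS = 100         # pre-stimulus interval reported in the text
REJECT_UV = 100.0         # rejection threshold reported in the text
N_BASELINE = SAMPLING_RATE_HZ * BASELINE_MS // 1000  # baseline length in samples


def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the first n_baseline samples from every sample."""
    offset = mean(epoch[:n_baseline])
    return [v - offset for v in epoch]


def average_clean_epochs(epochs, n_baseline):
    """Baseline-correct each epoch, drop those exceeding +/-100 uV, then average."""
    kept = []
    for epoch in epochs:
        corrected = baseline_correct(epoch, n_baseline)
        if max(abs(v) for v in corrected) <= REJECT_UV:
            kept.append(corrected)
    if not kept:
        raise ValueError("all epochs were rejected")
    return [mean(samples) for samples in zip(*kept)]


def peak(waveform, start, stop):
    """Return (sample index, amplitude) of the largest value in [start, stop)."""
    index = max(range(start, stop), key=lambda i: waveform[i])
    return index, waveform[index]


# Toy epochs in microvolts (hypothetical data, two baseline samples each);
# the third epoch contains a 500 uV artifact and is rejected.
toy_epochs = [[1.0, 1.0, 3.0, 5.0], [2.0, 2.0, 6.0, 6.0], [0.0, 0.0, 500.0, 0.0]]
toy_average = average_clean_epochs(toy_epochs, n_baseline=2)
toy_peak = peak(toy_average, 2, 4)
```

With real data the sample index returned by `peak` would be converted to latency by dividing by the sampling rate and subtracting the baseline duration.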

#### **CURRENT DENSITY RECONSTRUCTIONS**

ICA on concatenated EEG sweeps was performed to remove artifact, increase signal-to-noise ratio, and identify underlying components to be sourced. ICA results in multiple temporally independent components that underlie the evoked potential and are fixed in the spatial domain (Makeig et al., 1997; Delorme et al., 2012). These components allow for precise generator localization when used in cortical source modeling (Makeig et al., 2004; Hine and Debener, 2007; Debener et al., 2008). Concatenated EEG sweeps were pruned, as previously described, using ICA in order to remove noise artifact (Debener et al., 2006, 2008). This first pruning was followed by a second pruning to identify major components making up each CAEP peak component. Only independent components that accounted for the greatest percent variance underlying a CAEP peak of interest (P1, N1, P2) were retained for source localization analysis, or current density reconstruction (CDR). The individually pruned waveforms were grand-averaged for the NH and HL groups and exported into CURRY® Scan 7 Neuroimaging Suite (Compumedics Neuroscan™) for CDR. In CURRY®, another ICA was performed on each group average, and only components showing an SNR of at least 2.0 were accepted.

CDR was performed separately for each CAEP peak component using standardized low-resolution brain electromagnetic tomography (sLORETA), a statistical procedure that estimates a focal CDR with zero localization error using actual source and measurement variance (Pascual-Marqui, 2002; Grech et al., 2008). The selected head model utilized for source modeling consisted of the standardized boundary element method (BEM) (Fuchs et al., 2002). A color scale corresponding to the intensity of cortical activation, as estimated by sLORETA, illustrates the CDR on an average magnetic resonance image (MRI) derived from 100 people.

### **RESULTS**

#### **AUDITORY EVOKED POTENTIALS**

Based on the two-dimensional voltage maps for both groups and group differences between the waveforms, four ROIs were determined in the frontal, central, left frontal hemispheric (LH), and right frontal hemispheric regions (RH). Three obligatory CAEP components elicited by the speech sound were evaluated: the P1 (occurring at approximately 70 ms), N1 (at approximately 100 ms), and P2 (at approximately 180 ms). Group differences for the amplitude and latency of each component were analyzed using a One-Way ANOVA, and planned *post-hoc* comparisons were made between the groups at each ROI.

P2 amplitude was found to be significantly larger in the HL group (relative to the NH group) for the frontal ROI [*F*(1, 60) = 8.7, *p* = 0.005], the central ROI [*F*(1, 60) = 14.97, *p* < 0.001], and the LH ROI [*F*(1, 60) = 8.856, *p* = 0.004], but not at the RH ROI [*F*(1, 60) = 3.621, *p* = 0.062]. P2 latency was found to be significantly longer for the HL group in the frontal ROI [*F*(1, 60) = 5.34, *p* = 0.024], but not the central [*F*(1, 60) = 0.783, *p* = 0.380], LH [*F*(1, 60) = 1.054, *p* = 0.309], or RH [*F*(1, 60) = 3.832, *p* = 0.055] ROIs. P1 amplitude did not differ significantly between groups in any ROI [frontal: *F*(1, 60) = 2.149, *p* = 0.148; central: *F*(1, 60) = 3.715, *p* = 0.059; LH: *F*(1, 60) = 2.446, *p* = 0.123; RH: *F*(1, 60) = 1.661, *p* = 0.202]. P1 latency showed no significant difference [frontal: *F*(1, 60) = 1.163, *p* = 0.285; central: *F*(1, 60) = 0.234, *p* = 0.630; LH: *F*(1, 60) = 0.295, *p* = 0.589; RH: *F*(1, 60) = 0.251, *p* = 0.618]. Similarly, the N1 component did not differ significantly between groups in amplitude [frontal: *F*(1, 60) = 3.685, *p* = 0.060; central: *F*(1, 60) = 0.362, *p* = 0.549; LH: *F*(1, 60) = 3.322, *p* = 0.073; RH: *F*(1, 60) = 0.042, *p* = 0.838], or latency [frontal: *F*(1, 60) = 2.409, *p* = 0.126; central: *F*(1, 60) = 0.020, *p* = 0.887; LH: *F*(1, 60) = 1.625, *p* = 0.207; RH: *F*(1, 60) = 0.851, *p* = 0.360]. **Figure 2** shows the grand average waveforms from the frontal ROI, with mean amplitude bar graphs depicting the significantly larger P2 amplitude and longer P2 latency for the HL group compared to the NH group.
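For readers unfamiliar with the statistic reported above, the following sketch shows how a one-way ANOVA *F* value is formed from between-group and within-group variability. It is a generic textbook computation with hypothetical amplitudes, not a reconstruction of the analysis or the degrees of freedom reported in this study.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical P2 amplitudes (microvolts) for two small groups.
nh_amplitudes = [1.0, 2.0, 3.0]
hl_amplitudes = [2.0, 3.0, 4.0]
f_value, df_b, df_w = one_way_anova_f(nh_amplitudes, hl_amplitudes)
```

With two groups, as here, the *F* value equals the square of the independent-samples *t* statistic computed with pooled variance.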

It should be noted that we presented the auditory stimuli at a comfortably loud conversational level for our participants. The /ba/ stimulus is composed of spectral energy occurring mainly in the low-mid frequency region (0.5–2 kHz) (Sharma et al., 2002), and the HL listeners demonstrated average thresholds that were within the normal range at these frequencies. There was an average difference of approximately 10 dB HL between thresholds for the HL and NH group in the 0.5–2 kHz range; therefore, some HL listeners may have heard the stimuli at a sensation level (SL) that was, on average, 10 dB lower than for NH subjects. However, it is a well-established finding that CAEP amplitude decreases with lower intensity level for both NH and HL listeners (Bertoli et al., 2011), while the results of this study show increased P2 amplitude for the HL listeners. That is, if results were influenced by the decreased SL for HL listeners, we would have expected to observe a corresponding decrease in P2 amplitude for HL compared with NH listeners rather than a larger P2 amplitude for HL listeners (**Figure 2**). Furthermore, our results are consistent with those of Bertoli et al. (2011) and Harkrider et al. (2009), who reported larger P2 amplitudes for adults with mild-moderate HL compared with those for control subjects.

#### **CURRENT DENSITY RECONSTRUCTIONS**

Cortical source localization, or CDR, was conducted using the sLORETA algorithm provided by CURRY Scan 7 Neuroimaging Suite for the three CAEP peak components (**Figure 3A**). The activations were superimposed on an average MRI (axial slice view) and the MNI co-ordinates are shown beneath each slice. The scale of the F distribution, indicating the strength of the activations, is also shown. **Figure 3A** shows axial views of the CDR. For NH listeners, as seen in **Figure 3A**, the P1, N1, and P2 CAEP components activated temporal cortical regions including superior temporal gyrus (STG) and inferior temporal gyrus (ITG). Responses for the P1 and P2 components were restricted to the left hemisphere (LH), likely due to our use of a speech syllable (Stefanatos et al., 2008). See **Figure 3B** for a table describing the main activated regions. Cortical activation by speech stimuli in regions of temporal cortex is consistent with fMRI neuroimaging and intracranial electrocorticographic studies using speech stimuli (Stefanatos et al., 2008; Pasley et al., 2012). In contrast, for the HL group, clearly decreased activation of auditory areas such as STG and MTG within temporal cortex was apparent (see **Figure 3A**).

**Figure 4** shows sagittal views for the CDR. Consistent with the axial views shown in **Figure 3A**, as seen in **Figure 4A**, NH listeners showed activation of temporal cortical areas including STG and ITG. Conversely, for HL listeners, cortical responses to speech stimuli were localized to frontal cortex, in medial frontal gyrus (MFG), inferior frontal gyrus (IFG), and Brodmann Area 11 (BA 11). See **Figure 4B** for a table describing the main areas of activation. Frontal cortical activation was clearly the largest for the P1 and P2 CAEP components (**Figure 4A**).

#### **SPEECH PERCEPTION IN NOISE**

Behavioral testing of speech perception in noise acuity was measured for both groups using the QuickSIN™ clinical test (Killion et al., 2004). The higher the SNR score, the louder the signal has to be in order for the listener to perceive speech. As shown in **Figure 5A**, the HL group required the signal to be, on average, almost four decibels higher than the background noise for correct perception. Due to the non-parametric distribution of individual QuickSIN™ scores, a Mann-Whitney U Test was calculated to determine statistical significance between the groups (*U* = 10.5, *Z* = −2.46, *p* = 0.014). This difference in performance has been found in similar studies with NH listeners and listeners with HL (Killion et al., 2004; Wilson et al., 2007).

**FIGURE 3 | (A)** Current density reconstructions (CDR) showing cortical activation at the P1, N1, and P2 CAEP peak components on axial MRI slices for the normal hearing (NH) and hearing loss (HL) groups. The scale of the F Distribution is shown in the upper right corner ranging from red to yellow (yellow is highest level of activation), and the Montreal Neurological Institute (MNI) coordinates are listed below each MRI slice. **(B)** A table describing activated anatomical cortical areas for the CAEP components for each group, listed in approximate order of highest level of activation.

**FIGURE 4 | (A)** Current density reconstructions (CDR) showing cortical activation at the P1, N1, and P2 CAEP peak components on sagittal MRI slices for the normal hearing (NH) and hearing loss (HL) groups. The scale of the F Distribution is shown in the upper right corner ranging from red to yellow (yellow is highest level of activation), and the Montreal Neurological Institute (MNI) coordinates are listed below each MRI slice. **(B)** A table describing activated anatomical cortical areas for the CAEP components for each group, listed in approximate order of highest level of activation.

**FIGURE 5 | (A)** Mean QuickSIN™ scores for normal hearing (NH, in black) and hearing loss (HL, in red) groups. Standard deviations are shown as vertical bars. One asterisk reflects a significant difference at *p* < 0.05. **(B)** The correlation of the CAEP P2 component latency as a function of QuickSIN™ scores. The Spearman's rank order correlation coefficient value and significance level are indicated in the right upper corner.
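The group comparison of QuickSIN™ scores relies on the Mann-Whitney *U* statistic, which can be sketched by direct pairwise counting. The example below is a generic illustration with hypothetical scores; it does not reproduce the reported values and omits the normal approximation used to obtain *Z* and *p*.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) by pairwise comparison; tied pairs count one half."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # The two U statistics always sum to the number of pairs.
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical QuickSIN scores (dB) for two small groups.
nh_scores = [1.5, 2.5, 0.5]
hl_scores = [3.5, 4.5, 2.5]
u_nh, u_hl = mann_whitney_u(nh_scores, hl_scores)
reported_u = min(u_nh, u_hl)  # by convention the smaller U is reported
```

A small *U* relative to the number of pairs indicates that scores in one group are almost always lower than scores in the other.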

QuickSIN™ scores were correlated with P2 latency and amplitude. All participants were included in the correlation, as HL can be considered a gradual elevation in threshold starting at 0 dB HL. Frontal ROI P2 latency showed a significant positive correlation with speech performance in background noise (*r* = 0.494, *p* = 0.022), suggesting that increases in P2 latency were associated with greater difficulty in perceiving speech in noise. We did not see a significant correlation between QuickSIN™ scores and P2 amplitude.
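The rank-order correlation used here (Spearman's, as indicated in **Figure 5B**) can be sketched as a Pearson correlation computed on ranks, with tied values sharing their mean rank. The data below are hypothetical and the helper names are illustrative only.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    covariance = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    spread_x = sum((a - mx) ** 2 for a in rx)
    spread_y = sum((b - my) ** 2 for b in ry)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical P2 latencies (ms) and QuickSIN scores (dB) for five listeners.
latencies = [170.0, 175.0, 182.0, 190.0, 201.0]
scores = [1.5, 0.5, 3.5, 2.5, 4.5]
rho = spearman_rho(latencies, scores)
```

Because only ranks enter the computation, the coefficient is insensitive to the non-normal distribution of QuickSIN™ scores noted above.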

#### **DEGREE OF HEARING LOSS AND CAEP P2 AMPLITUDE**

Frontal P2 amplitude showed a significant positive correlation with high frequency Pure Tone Average (PTA), i.e., the degree of hearing impairment at 2–8 kHz for both ears (right ear: *r* = 0.538, *p* = 0.013; left ear: *r* = 0.474, *p* = 0.027). Thus, as HL increased across participants, there was a corresponding increase in P2 amplitude. No significant correlation was observed between P2 latency and high-frequency PTA.
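As a brief illustration, a high-frequency PTA can be computed per ear as the mean threshold across the high test frequencies. The sketch assumes an unweighted mean over 2, 4, and 8 kHz; the exact frequencies and weighting used in the study are not specified at this point in the text, and the audiogram is hypothetical.

```python
from statistics import mean

# Assumption: unweighted mean over 2, 4 and 8 kHz.
HIGH_FREQUENCIES_HZ = (2000, 4000, 8000)


def high_frequency_pta(audiogram):
    """Mean threshold (dB HL) across the high frequencies for one ear."""
    return mean(audiogram[f] for f in HIGH_FREQUENCIES_HZ)


# Hypothetical right-ear audiogram: frequency (Hz) -> threshold (dB HL).
right_ear = {250: 10, 500: 10, 1000: 15, 2000: 30, 4000: 45, 8000: 60}
right_pta = high_frequency_pta(right_ear)
```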

#### **DISCUSSION**

We examined cortical changes secondary to mild-moderate HL in post-lingually hearing-impaired adults. When tested using speech-evoked EEG in a passive stimulation paradigm, adults with mild to moderate sensorineural HL showed the following distinct cortical changes relative to age-matched NH controls: (1) increased P2 CAEP amplitude and latency, (2) reduced activation in temporal auditory cortical regions, (3) activation of frontal cortical regions in response to auditory stimulation, (4) significantly poorer speech perception in background noise that correlated with increased P2 latency and (5) a significant correlation between increased P2 amplitude and hearing thresholds at high frequencies (2, 4, and 8 kHz). Thus, even in relatively early stages of HL and early stages of auditory processing, adult subjects appear to show significant alterations in cortical activation.

Our finding of increased P2 amplitude for HL listeners is consistent with previous studies, which documented increased P2 CAEP amplitude in older adults who were long-time hearing aid users (Bertoli et al., 2011), and in young adults with mild-moderate HL (Harkrider et al., 2009). Bertoli et al. (2011) reported larger P2 amplitudes for adults with mild-moderate HL who were long-term hearing aid users, and attributed the larger auditory cortical responses in HL adults to an increase in "effortful listening." It is of further interest to note that larger P2 CAEP amplitudes have been reported after auditory training, possibly indicating increased utilization of auditory memory and perceptual resources (Naatanen and Picton, 1987; Shahin et al., 2003; Ross and Tremblay, 2009; Tong et al., 2009). Along these lines, our finding of an increase in P2 latency is also consistent with previous studies in adults with HL, which suggest that the increased latency reflects inefficient cortical processing as the central auditory system is required to process a degraded and/or challenging signal (Harkrider et al., 2005, 2009; Ross et al., 2007).

Aging has also been reported as a factor in increased P2 amplitude and latency, possibly due to decreased central inhibition. However, we included age-matched, NH listeners, making it unlikely that aging solely accounts for the differences in P2 amplitude and latency seen for the HL group (Harkrider et al., 2006; Ceponiene et al., 2008). Furthermore, Harkrider et al. (2009) observed increased P2 amplitude and latency in young adults with mild-moderate HL in response to nonsense speech syllables, suggesting that higher-order auditory processing is affected by auditory deprivation and not age alone, though an interaction between age and HL is likely. In the case of older listeners with HL, reduced central inhibition via an interaction between aging and HL may result in increased P2 amplitude (Dustman et al., 1996; Syka, 2002). Our results also showed a significant increase in P2 amplitude for the HL group relative to the NH group over the LH but not the right hemisphere (RH). Given our use of a speech stimulus, the larger P2 amplitude in the HL group over the LH may be due to the more active role of the LH in processing of speech information combined with a lack of inhibition due to HL (Syka, 2002; Stefanatos et al., 2008).

A major finding in our study was that listeners with mild-moderate sensorineural HL showed significant cortical re-organization. Current density reconstructions via sLORETA revealed that HL listeners showed decreased activation of auditory cortical areas (STG and MTG) relative to NH listeners (**Figure 3A**) and showed activation of frontal cortical regions (e.g., IFG, MFG, SFG) in response to passive auditory stimulation (**Figure 4A**). This change in cortical activation from temporal regions to frontal regions indicates a possible re-allocation of cortical processing in response to auditory stimuli, likely as a compensatory effect of HL. The finding of a shift of the auditory response to frontal areas is consistent with the fMRI studies of Peelle et al. (2010a, 2011) and Wingfield and Grossman (2006), who showed lower gray matter volume in temporal cortices in adults with HL, as well as greater activation in frontal cortices in response to challenging listening conditions for older adults. This frontal and pre-frontal activation was associated with increased listening effort, as these regions have been traditionally associated with tasks involving working memory and executive function (Collette et al., 2006; Eckert et al., 2008; Liakakis et al., 2011). Overall, our results are consistent with neuroimaging research, which has demonstrated a reliance on frontal regions involved in cognition and the processing of complex auditory stimuli in older adults (Sharp et al., 2006; Eckert et al., 2008; Tyler et al., 2010; Obleser et al., 2011). Thus, the present results of cortical re-organization in HL adults support recent hypotheses suggestive of an increased cognitive load in hearing impaired listeners, and may provide evidence for the taxation of the reserve of cognitive processes (Pichora-Fuller and Singh, 2006; Lin, 2011, 2012; Lin et al., 2011a,b).

It is surprising, however, that we observed that frontal cortical regions, typically associated with cognitive processing, were engaged in response to a passive auditory task that did not require the participants' attention. This finding suggests that compensatory processing may begin at early stages of central auditory processing in adult-onset HL (Harris et al., 2009; Anderson and Kraus, 2010). Indeed, another form of compensatory plasticity (i.e., recruitment of auditory cortical regions for visual processing) has been observed in adults with mild-moderate HL in whom passively viewed visual stimuli activated temporal cortical regions (Campbell and Sharma, in review). Recent studies have shown similar temporal cortical activation by visual stimuli in deaf adults fitted with cochlear implants (Doucet et al., 2006; Buckley and Tobey, 2011; Sandmann et al., 2012). Visual information becomes of greater importance in HL, especially in watching a speaker's face and lip movements for contextual cues (McCullough et al., 2005; Letourneau and Mitchell, 2011). These findings, taken together with the present results, suggest that increased frontal activation and reduced temporal activation to speech may occur in parallel with increased temporal activation to visual stimuli (likely due to reliance on faces and lipreading in everyday communication), even as early as in mild-moderate HL. Thus, cortical re-allocation during processing of auditory stimuli may result in increased cognitive load that usually occurs in higher-order processing, but that is now occurring for lower-level passive processing, resulting in degraded behavioral outcomes for challenging listening environments (Pichora-Fuller and Singh, 2006; Larsby et al., 2008). 
It is possible that various training paradigms using speech and music (possibly in conjunction with hearing aid rehabilitation) may allow for re-training of auditory cortices in HL listeners to re-activate normal neural networks during auditory processing (Petersen et al., 2009; Shahin, 2011; Turner et al., 2013).

Hearing loss is most consistently associated with poor outcomes in recognizing speech in background noise, a skill essential for everyday listening (Souza et al., 2007). Consistent with previous research in hearing-impaired adults, our results show that listeners with even mild-to-moderate HL demonstrate a significant deficit when listening to speech in background noise (Dubno, 1984; Vermiglio et al., 2012). HL listeners required a much larger SNR to accurately perceive sentences in noise (**Figure 5A**). Audibility does not appear to fully account for this decrease in performance (Hällgren et al., 2005; Souza et al., 2007; Léger et al., 2012; Vermiglio et al., 2012). In this study, speech perception in background noise was significantly correlated with increased P2 latency (**Figure 5B**). This increase in latency is consistent with previous studies suggesting that the increase in auditory processing time (as reflected by the P2 latency increase in the HL group) may be reflective of additional activated cognitive cortical regions, or compensatory cortical circuitry (Ross et al., 2007; Harkrider et al., 2009). In addition, larger P2 CAEP amplitudes were correlated with worse auditory pure tone thresholds at high frequencies (2, 4, and 8 kHz). Given that P2 amplitude has been associated with re-allocation of cognitive resources (Tremblay et al., 2003; Harkrider et al., 2005, 2009; Tong et al., 2009), it would appear that the degree of cortical re-organization increases with the severity of the HL.

Taken together, the observed increase in P2 CAEP amplitude and latency, decreased activation in temporal areas with increased activation of frontal cortical regions during passive listening, and poorer behavioral outcomes in the HL group, provide evidence of compensatory cortical plasticity occurring in mild-moderate HL (i.e., in early stages of hearing decline). The nature of this plasticity is observed as a re-allocation of cortical resources from temporal auditory areas to frontal cognitive areas, which appear to be recruited to assist with processing of auditory stimuli even at the level of passive listening. Overall, our results are consistent with the hypothesis that HL appears to initiate a process of resource re-allocation, which results in increased cognitive load (Pichora-Fuller et al., 1995; Pichora-Fuller and Singh, 2006; Peelle et al., 2010a,b, 2011; Lin, 2011, 2012, 2013; Lin et al., 2011a,b). Finally, measures of cognitive resource re-allocation in HL, both objective and behavioral, may become increasingly relevant in the clinical setting in order to determine patients at risk for cognitive decline. It would be of interest to determine whether hearing aids, auditory training, or a combination might possibly alleviate this cognitive resource re-allocation as reflected by a possible decrease in frontal activation and return to normal levels of temporal cortical activation (Lunner et al., 2009; Parbery-Clark et al., 2011; Rudner et al., 2012).

## **SUMMARY**

Our results demonstrate auditory cortical re-organization in the form of decreased temporal activation and increased frontal activation in early stage HL of mild-moderate severity using passively elicited EEG responses. Furthermore, increased latency and amplitude of the P2 component were associated with decreases in speech perception performance and increase in hearing threshold, respectively. Due to the strong relationship between HL and cognitive deficits, such as dementia, that arise later in life, it is important that clinical evaluation of cognitive reserve in HL be included as part of intervention services. Future research should focus on better understanding the relationship between the severity of cognitive re-allocation in relation to severity of HL as well as reversibility of re-organization as a result of intervention with amplification.

#### **ACKNOWLEDGMENTS**

We would like to acknowledge the assistance of Lauren Durkee, B.A. This research was supported by NIH grants R01DC06257 and F31DC011970.

## **REFERENCES**

and speech perception in pre- and postlingually deaf cochlear implant users. *Ear Hear*. 32, 2–15. doi: 10.1097/AUD.0b013e3181e8534c

*Psychophysiology* 45, 20–24. doi: 10.1111/j.1469-8986.2007.00610.x


loss. *Hear. Res.* 294, 95–103. doi: 10.1016/j.heares.2012.10.002


*Cogn. Brain Res*. 22, 193–203. doi: 10.1016/j.cogbrainres.2004.08.012


after cochlear implantation. *Ann. N.Y. Acad. Sci*. 1169, 437–440. doi: 10.1111/j.1749-6632.2009.04796.x


perception. *Front. Psychol*. 2:126. doi: 10.3389/fpsyg.2011.00126


Activation of human auditory cortex during speech perception: effects of monaural, binaural, and dichotic presentation. *Neuropsychologia* 46, 301–315. doi: 10.1016/j.neuropsychologia.2007.07.008


Preserving syntactic processing across the adult life span: the modulation of the frontotemporal language system in the context of age-related atrophy. *Cereb. Cortex* 20, 352–364. doi: 10.1093/cercor/bhp105


characteristics and speech perception in noise in older adults. *Ear Hear*. 31, 471–479. doi: 10.1097/AUD.0b013e3181d709c2

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 16 August 2013; paper pending published: 11 September 2013; accepted: 07 October 2013; published online: 25 October 2013.*

*Citation: Campbell J and Sharma A (2013) Compensatory changes in cortical resource allocation in adults with hearing loss. Front. Syst. Neurosci. 7:71. doi: 10.3389/fnsys.2013.00071*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2013 Campbell and Sharma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Enhanced representation of spectral contrasts in the primary auditory cortex

## *Nicolas Catz and Arnaud J. Noreña\**

*Laboratory of Adaptive and Integrative Neurobiology, Fédération de recherche 3C, UMR CNRS 7260, Université Aix-Marseille, Marseille, France*

#### *Edited by:*

*Jonathan E. Peelle, Washington University in St. Louis, USA*

#### *Reviewed by:*

*Christo Pantev, University of Muenster, Germany; Jean-Marc Edeline, Université de Paris, France*

#### *\*Correspondence:*

*Arnaud J. Noreña, Laboratory of Adaptive and Integrative Neurobiology, Fédération de recherche 3C, UMR CNRS 7260, Université Aix-Marseille, 3, Place Victor Hugo, 13331 Marseille, France e-mail: arnaud.norena@univ-provence.fr*

The role of early auditory processing may be to extract some elementary features from an acoustic mixture in order to organize the auditory scene. To accomplish this task, the central auditory system may rely on the fact that sensory objects are often composed of spectral edges, i.e., regions where the stimulus energy changes abruptly over frequency. The processing of acoustic stimuli may benefit from a mechanism enhancing the internal representation of spectral edges. While the visual system is thought to rely heavily on this mechanism (enhancing spatial edges), it is still unclear whether a related process plays a significant role in audition. We investigated the cortical representation of spectral edges, using acoustic stimuli composed of multi-tone pips whose time-averaged spectral envelope contained suppressed or enhanced regions. Importantly, the stimuli were designed such that neural response properties could be assessed as a function of stimulus frequency during stimulus presentation. Our results suggest that the representation of acoustic spectral edges is enhanced in the auditory cortex, and that this enhancement is sensitive to the characteristics of the spectral contrast profile, such as depth, sharpness and width. Spectral edges are maximally enhanced for sharp contrast and large depth. Cortical activity was also suppressed at frequencies within the suppressed region. Of note, the suppression of firing was larger at frequencies near the lower edge of the suppressed region than at the upper edge. Overall, the present study gives critical insights into the processing of spectral contrasts in the auditory system.

**Keywords: tinnitus, synaptic depression, inhibition, mach bands, artificial scotoma, hearing loss**

## **INTRODUCTION**

The main goal of the central auditory system is to organize the acoustic environment into a coherent auditory scene, namely to detect, localize, discriminate, segregate and identify the multiple sources composing a sound mixture (Bregman, 1990; Darwin, 1997; Griffiths and Warren, 2004; Shamma and Micheyl, 2010). At the initial steps of processing, the auditory system can rely on the fact that sensory objects are composed of spectral cues such as spectral edges or contours where the stimulus energy reaches a maximum or changes abruptly over frequency (Moore and Glasberg, 1983; Assmann and Summerfield, 2004; Palmer and Shamma, 2004). While spectral peaks produced by vocal tract resonances are known to play an important role in identifying conspecific vocalizations, in speech for example (Darwin, 1984; Assmann and Nearey, 1987; Henry et al., 2005), spectral troughs or notches produced by the head-related transfer function are also recognized as being critical for localizing sound sources on the sagittal plane (Carlile et al., 2005; Grothe et al., 2010).

A critical issue in auditory neuroscience is how the central auditory system represents acoustic stimuli, in particular the frequency-specific information that is critical for organizing the auditory scene. One possibility is that the central representation mirrors that found in the peripheral cochlear nerve, whereby peaks and troughs in the spectral envelope of the acoustic stimulus could be represented by peaks and troughs in the firing rate of neurons along the tonotopic axis (Sachs and Young, 1979; Blackburn and Sachs, 1990; Silkes and Geisler, 1991; Poon and Brugge, 1993; Conley and Keilson, 1995; May et al., 1998; Recio and Rhode, 2000). However, this peripheral "rate-place" representation has significant limitations. First, the rate-place representation strongly depends on the frequency resolution of the auditory system. In particular, the spectral decomposition carried out by the cochlea tends to smooth the internal representation of the spectral envelope of complex acoustic stimuli (Moore and Glasberg, 1987; Baer et al., 1993). Second, the rate-place representation of the spectral envelope in the cochlear nerve is degraded at high levels of stimulation where the firing rate of cochlear neurons tends to saturate and/or the auditory filters broaden (Sachs and Young, 1979; Glasberg and Moore, 2000; Palmer and Shamma, 2004; Oxenham and Simonson, 2006). Finally, the peak-to-valley ratio of the rate-place representation is further decreased by the presence of background noise, which fills in the spectral valleys (Baer et al., 1993; Assmann and Summerfield, 2004).

The limitations of the cochlear nerve representation suggest that the central auditory system may have developed a strategy to overcome them, in particular to enhance the representation of spectral edges and spectral contrasts (energy ratios between adjacent peaks and valleys). The visual system, for instance, is thought to rely heavily on this mechanism. This is suggested, in particular, by the phenomenon of "Mach bands" which refers to illusory bands perceived at the spatial boundaries where the stimulus luminance changes abruptly over space (Von Bekesy, 1967, 1969a,b). While there have been some attempts to investigate this issue in audition (Von Bekesy, 1967; Carterette et al., 1969; Houtgast, 1972), it is still unclear whether a related process plays a significant role in this modality. Interestingly, however, some psychoacoustic phenomena are consistent with a mechanism enhancing spectral edges. For example, neural enhancement at spectral edges may account for the pitch induced by noise bands at their spectral edges (Small and Daniloff, 1967; Bilsen, 1977), and for the dominant role played in pitch perception by the lowest and highest partials of a harmonic complex, especially when the low-numbered (resolved) partials are removed from the complex (Dai, 2000; Moore and Gockel, 2011).

The aim of the present study is to investigate the sensory representation of the stimulus spectrum in auditory cortex, and in particular whether the representation of spectral edges is enhanced. This was accomplished by employing acoustic stimuli composed of multiple pure tones of various frequencies and presented randomly over time. These stimuli can be thought of as mimicking acoustic environments with different spectral profiles when time-averaged over a few hundred milliseconds. Importantly, the fact that this particular stimulus was composed of a mixture of tone pips with non-synchronous onsets allowed for estimating the spectro-temporal receptive fields of cortical neurons for different time-averaged spectral envelopes (deCharms et al., 1998; Blake and Merzenich, 2002; Valentine and Eggermont, 2004; Norena et al., 2008). The present study extends earlier work in which the dependence of cortical neuronal responses on the spectro-temporal acoustic context was investigated (Gourévitch et al., 2009).

## **METHODS**

#### **ANIMAL PREPARATION**

The care and use of animals in this study were approved by the Animal Care Committee of Bouches-du-Rhône, France (# A 13-504). Ten guinea pigs weighing between 300 and 800 g were used for this study. All animals were deeply anesthetized with the administration of 50 mg/kg of ketamine hydrochloride (Imalgene 1000) and 3 mg/kg of xylazine (Rompun 2%), injected intramuscularly; 0.1 ml of Atropine methyl nitrate and an analgesic (Tolfedine) were also administered. Throughout the experiment, anesthesia was maintained with half the dose of ketamine and xylazine administered every hour. The tissue overlying the frontal lobe was opened and two screws were fixed to the top of the skull (on the antero-posterior axis) with dental cement, and used to fixate the animal's head. The tissue overlying the right or left side of the skull above the temporal lobe was removed. The skull was opened and the dura was cut back to expose the primary auditory cortex (AI). We used the location of the electrodes (Wallace et al., 2000) as well as the characteristic frequency of the neurons to ensure that the electrodes were located in AI (i.e., progression of best frequencies across electrodes). The body temperature was maintained at 37°C with a thermostatically controlled heating blanket. After the experiment, a lethal dose of sodium pentobarbital was administered.

#### *Acoustic stimulation*

Stimuli were generated in MATLAB and transferred to an RP2.1-based sound delivery system (Tucker Davis Technologies). Acoustic stimuli were presented in a sound booth through a headphone (Sennheiser HD595) placed 10 cm in front of the ear contralateral to the cortex where the recordings were carried out. The amplitude of each tone pip was adjusted to the transfer function of the sound delivery system so that the pips were presented at the desired level in dB SPL.

Spectro-Temporal Receptive Fields (STRFs) were obtained from 180-s multi-tone pip stimuli (**Figure 1F**) (deCharms et al., 1998; Blake and Merzenich, 2002; Valentine and Eggermont, 2004; Norena et al., 2008; Gourévitch et al., 2009). Tone pips (49 frequencies, 8 frequencies per octave covering 6 octaves) were presented randomly over time (independent Poisson process for each frequency with a rate of 2 Hz and a 50-ms dead time designed so that tones of the same frequency did not overlap in time). Tone pips of different frequencies could overlap in time. The envelope of the tone pips is given by $\gamma(t) = (t/4)^2 e^{-t/4}$ with $t$ in milliseconds (stimulus duration is 50 ms, maximum amplitude is reached at 8 ms). The average rate of tone pip presentation was around 16 Hz/octave (considering the number of tone frequencies present per octave, along with the average rate of presentation of each). Control STRFs were obtained from multi-tone stimuli with tone pips presented at 70 dB SPL (ctrl-70) or 40 dB SPL (ctrl-40) (**Figures 1A,B**). In the attenuated frequency band (AFB) conditions, all pure tones were presented at 70 dB, except those corresponding to the frequency band of the AFB where pure tones were omitted or presented at 40 dB SPL, producing a large or moderate spectral contrast, respectively (**Figures 1E** and **C**,**D**). The frequencies immediately outside of the AFB were called the edge-out frequencies, while the frequencies immediately inside of the AFB were called the edge-in frequencies (**Figure 1C**). The bandwidth of the AFB was varied (0.5, 1, and 2 octaves). 
The slope of the spectral contrast (transition in dB/oct between the edge-in frequency and the edge-out frequency) was 240 dB/oct in all conditions (namely a 30-dB difference between the edge-in frequency and the edge-out frequency), except in one AFB condition (with 1-oct bandwidth) where the slope was 80 dB/oct until the level of the tone pip frequencies around the center frequency of the AFB was 40 dB (**Figures 1C,D**). The center frequency of the AFB was set as follows. First, the BF for each cortical site was derived from the control stimulus (ctrl-70). The center frequency of the AFB was then set to the BF of a given cortical site. Cortical responses were obtained for all stimulus conditions (different widths, slopes and depths) for that specific center frequency of the AFB. Once a set of recordings was completed, another set of recordings was carried out with a different AFB stimulus (centered on the BF of another cortical site), and so on for all cortical sites with a significant STRF (see below). Note that, as we recorded from many cortical sites simultaneously, the BFs could correspond to the center frequency of the AFB, one edge frequency of the AFB, or a frequency remote from the AFB. An additional stimulus condition was investigated, which consisted of multi-tone pips where all pure tones were presented at 40 dB SPL, except at one frequency, which was presented at 70 dB SPL (**Figure 1G**). Some example sound files are provided in the supplemental material.
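The stimulus timing described above can be illustrated with a minimal Python sketch (the original stimuli were generated in MATLAB, and this is not the authors' code). Several details are assumptions made only for illustration: the lowest tone frequency (`F_LOW_HZ`, set to 500 Hz), the way the dead time is implemented (a fixed 50-ms gap plus an exponential wait whose mean keeps the overall rate at 2 Hz), and all function names.

```python
import math
import random

# Assumed lowest tone frequency; the text gives 49 frequencies at 8 per octave
# over 6 octaves but does not state where the STRF stimulus range starts.
F_LOW_HZ = 500.0

def pip_frequencies(f_low=F_LOW_HZ, n_octaves=6, per_octave=8):
    """Log-spaced tone frequencies, both ends included (6 * 8 + 1 = 49)."""
    n = n_octaves * per_octave + 1
    return [f_low * 2 ** (i / per_octave) for i in range(n)]

def envelope(t_ms):
    """Tone pip envelope gamma(t) = (t/4)^2 * exp(-t/4), with t in ms."""
    return (t_ms / 4.0) ** 2 * math.exp(-t_ms / 4.0)

def onset_times(duration_s, rate_hz=2.0, dead_time_s=0.05, rng=random):
    """Onset times (s) for one tone frequency.

    Assumption: each inter-onset interval is the dead time plus an
    exponential wait whose mean keeps the overall rate at rate_hz.
    """
    mean_wait = 1.0 / rate_hz - dead_time_s
    onsets = []
    t = rng.expovariate(1.0 / mean_wait)
    while t < duration_s:
        onsets.append(t)
        t += dead_time_s + rng.expovariate(1.0 / mean_wait)
    return onsets

freqs = pip_frequencies()
rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Independent onset schedule for each frequency over a 180-s stimulus.
schedule = {f: onset_times(180.0, rng=rng) for f in freqs}
# Envelope sampled at 1-ms steps over the 50-ms pip duration.
peak_ms = max(range(0, 51), key=envelope)
```

Because the schedules are drawn independently, pips of different frequencies may overlap in time, whereas the dead time prevents overlap between successive pips of the same frequency.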

#### *MUA and LFP recording procedure*

Each set of recordings was obtained with one array of 16 electrodes (Alpha-Omega Eng, Nazareth, Israel) arranged in an 8 by 2 pattern with 0.25 mm electrode separation within the long row and 0.5 mm separation between rows. The electrodes had impedances between 0.8 and 1.4 MOhm. The array was manually advanced using a Narishige microdrive into primary auditory cortex (according to the location provided by Wallace et al., 2000). The signals were then amplified 10,000 times with filter cutoff frequencies set at 2 Hz and 5 kHz. The amplified signals were processed by a TDT System 3 multichannel data acquisition system. Multi-unit activity was sampled at 24,414 Hz and was extracted from the 300-Hz high-pass filtered signal. Local field potentials (LFPs) were sampled at 1061 Hz and were extracted from the 300-Hz low-pass filtered signal. In this way, we were able to record spikes and LFPs simultaneously.

At an initial stage of the experiments, a "search procedure" was used and consisted of recording cortical activity induced by clicks, noise bursts and tone pips (from 500 Hz to 32 kHz, 1/8 octave step). This procedure provided a rough estimate of the tonotopy and the amplitude of LFPs. Moreover, electrodes were placed at a depth where the (negative) amplitude of stimulus-induced LFPs was near maximal (region of the border between layers III and IV; Szymanski et al., 2011).

#### *Data analysis*

All results were computed using custom MATLAB routines. Multi-unit activity or "spike events" were detected by using an amplitude threshold on the high-pass filtered data. The median was calculated on the negative values of the filtered signal; the threshold was then set to six times the median (see Quiroga et al., 2004 for a similar method). The spike waveforms were inspected visually throughout the experiments to ensure that they had a typical shape; inserts in **Figures 2**, **3** show the typical shape of multiunit activity.
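The thresholding rule can be sketched in Python as follows. This is only an illustration of the rule as stated (six times the median of the negative values of the filtered signal); the function names are hypothetical, and the choice to mark one event per downward threshold crossing, with no refractory handling, is an assumption not specified in the text.

```python
import statistics

def spike_threshold(filtered, factor=6.0):
    """Threshold = factor times the median of the negative samples (a negative number)."""
    negatives = [x for x in filtered if x < 0]
    return factor * statistics.median(negatives)

def detect_spikes(filtered, threshold):
    """Sample indices where the signal crosses below the threshold.

    Assumption: one event per downward crossing, no refractory period.
    """
    return [
        i for i in range(1, len(filtered))
        if filtered[i] < threshold <= filtered[i - 1]
    ]

# Toy high-pass filtered trace (arbitrary units) with one large negative deflection.
trace = [-1.0, 1.0, -1.0, 1.0, -1.0, -10.0, 1.0, -1.0, 1.0, -1.0]
thr = spike_threshold(trace)
events = detect_spikes(trace, thr)
```

In this toy trace the median of the negative samples is -1, so the threshold is -6 and only the large deflection is detected.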

The methodology for computing STRFs was similar to that used in previous studies (Valentine and Eggermont, 2004; Norena et al., 2008). Briefly, STRFs for MUA were determined by constructing poststimulus time histograms (PSTHs), with time bins of 1 ms for each tone pip frequency. In other words, spikes falling in the averaging time window (starting at the stimulus onset and lasting 100 ms) were counted. Because the average interstimulus interval in the stimulus ensemble is smaller than the averaging time window, a spike can be counted in the PSTH of several pip frequencies. STRFs for LFPs were obtained by a similar procedure, except that the LFP waveforms (0–80 ms after stimulus onset) were averaged for each appropriate tone pip frequency. The maximal MUA response (or the minimal LFP amplitude) within the 10–40 ms time window after stimulus onset and over all frequencies was obtained from the ctrl-70 STRF. All STRFs (including those obtained from the ctrl-70 condition) were then normalized by dividing the mean neural activity by this single value. This normalization was aimed at minimizing the firing rate variability across recording sites. By definition, the maximum neural activity for the ctrl-70 condition was 1 (at the best frequency), and usually lower than 1 for the ctrl-40 condition. Note that values above 1 are sometimes observed in the AFB conditions (i.e., at the edge frequencies of the AFB); this indicates that the maximum of absolute firing rate in the AFB conditions is larger than the maximum of absolute firing rate in the ctrl-70 condition (see **Figures 2**, **3**). This normalized mean neural activity is the dependent variable displayed in the STRFs (**Figures 2**, **3**).
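The MUA procedure above can be sketched in Python under a few stated assumptions: spike and pip onset times are given in milliseconds, each PSTH is divided by the number of presentations of that frequency (our reading of "mean neural activity"), and the 10–40 ms window is taken as bins 10 to 39, which matches the 30 time bins mentioned in the Statistics section. Function names are hypothetical.

```python
def compute_strf(spike_times_ms, pips, n_freqs, window_ms=100):
    """Per-frequency PSTH with 1-ms bins, averaged over presentations.

    pips: list of (onset_ms, freq_index). A spike may fall in the window
    of several pips and is then counted once for each of them.
    """
    strf = [[0.0] * window_ms for _ in range(n_freqs)]
    n_presentations = [0] * n_freqs
    for onset, f in pips:
        n_presentations[f] += 1
        for s in spike_times_ms:  # brute-force loop, fine for a sketch
            lag = s - onset
            if 0 <= lag < window_ms:
                strf[f][int(lag)] += 1.0
    for f in range(n_freqs):
        if n_presentations[f]:
            strf[f] = [c / n_presentations[f] for c in strf[f]]
    return strf

def peak_in_window(strf, t0=10, t1=40):
    """Maximum over all frequencies within bins t0..t1-1."""
    return max(max(row[t0:t1]) for row in strf)

def normalize(strf, reference_peak):
    """Divide every bin by the single peak value taken from the ctrl-70 STRF."""
    return [[v / reference_peak for v in row] for row in strf]

# Toy data: two pips of frequency 0, one pip of frequency 1, three spikes.
spikes = [15.2, 115.7, 230.0]
pips = [(0.0, 0), (100.0, 0), (200.0, 1)]
ctrl70 = compute_strf(spikes, pips, n_freqs=2)
ctrl70_norm = normalize(ctrl70, peak_in_window(ctrl70))
```

The same reference peak would then be applied to the STRFs of every other condition recorded at that site, so that values above 1 indicate responses exceeding the ctrl-70 maximum.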

In order to compare the STRFs obtained from control stimuli and from stimuli producing an AFB (and for display purposes), the differences between their frequency profiles were computed. The frequency profiles were obtained from the normalized STRFs by taking the maximum neural activity within a time window of 10–40 ms for each tone pip frequency. For the frequencies outside the AFB, which were presented at 70 dB, the responses were compared to the corresponding frequencies obtained from the ctrl-70. For the frequencies inside the AFB, which were presented at 40 dB, the responses were compared to the corresponding frequencies obtained from the ctrl-40.
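A minimal Python sketch of this comparison is given below. It assumes STRFs stored as one list of 1-ms bins per frequency, takes the 10–40 ms window as bins 10 to 39, and uses hypothetical function and variable names; the numerical values are toy data, not results.

```python
def frequency_profile(strf, t0=10, t1=40):
    """Maximum normalized activity within bins t0..t1-1 for each frequency."""
    return [max(row[t0:t1]) for row in strf]

def profile_difference(afb, ctrl70, ctrl40, inside_afb):
    """AFB profile minus the level-matched control profile.

    Frequencies inside the AFB (presented at 40 dB) are compared with ctrl-40,
    frequencies outside (presented at 70 dB) with ctrl-70.
    """
    return [
        a - (c40 if inside else c70)
        for a, c70, c40, inside in zip(afb, ctrl70, ctrl40, inside_afb)
    ]

# Toy STRF with one frequency: a response at 20 ms and an early bin outside the window.
row = [0.0] * 100
row[5], row[20] = 2.0, 0.75
profile = frequency_profile([row])

# Toy profiles for three frequencies; the middle one lies inside the AFB.
diff = profile_difference(
    afb=[1.5, 0.25, 1.25],
    ctrl70=[1.0, 1.0, 1.0],
    ctrl40=[0.5, 0.5, 0.5],
    inside_afb=[False, True, False],
)
```

Positive differences correspond to enhancement relative to the level-matched control, and negative differences to suppression.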

Finally, the patterns of excitation, namely the neural population activity over the tonotopic axis, were obtained for the AFB conditions: for each tone pip frequency, the averaged normalized activity is derived for all MUA. Then, instead of plotting neural activity (for a given cortical site) as a function of stimulus frequency, neural activity evoked by a given frequency is plotted as a function of the best frequency of neurons (obtained from many cortical sites). The pattern of excitation could be obtained and plotted for each tone pip frequency. Assuming that auditory information is represented as a "rate-place" code in the auditory cortex, the pattern of excitation may be closer to what downstream neurons read out during stimulation. In other words, the cortex may not directly detect changes in the neurons' best frequencies but rather read out the population activity, namely neural activity along the tonotopic axis. We were particularly interested in the pattern of excitation of the edge-in and edge-out frequencies of the AFB, as we suspected that the pattern of excitation would be modified at these frequencies in the AFB conditions compared to the control conditions.
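The change of axis described here (from stimulus frequency to best frequency) can be sketched in Python as follows. The data layout, one `(bf_index, frequency_profile)` pair per recording site, and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def excitation_pattern(sites, freq_index):
    """Mean normalized response to one tone frequency, grouped by site BF.

    sites: list of (bf_index, frequency_profile) pairs, one per recording.
    Returns a dict {bf_index: mean response at freq_index}, ordered by BF.
    """
    grouped = defaultdict(list)
    for bf_index, profile in sites:
        grouped[bf_index].append(profile[freq_index])
    return {bf: sum(v) / len(v) for bf, v in sorted(grouped.items())}

# Toy data: two sites with BF index 0, one site with BF index 1.
sites = [
    (0, [1.0, 0.25]),
    (0, [0.5, 0.75]),
    (1, [0.0, 1.0]),
]
pattern_f0 = excitation_pattern(sites, freq_index=0)
pattern_f1 = excitation_pattern(sites, freq_index=1)
```

Each returned dictionary is one pattern of excitation: the population response along the tonotopic axis to a single tone pip frequency.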

#### *Statistics*

Before applying any statistical test, we first verified the normality of the distributions in order to determine whether parametric tests were appropriate. As all distributions were normal, we used the parametric Student's *t*-test to compare two distributions or one distribution against zero. Significance thresholds were adjusted according to the number of comparisons (Bonferroni correction). First, the group analysis was carried out on sites with "significant" STRFs for the ctrl-70 condition: the maximal response of the ctrl-70 STRFs within the 10–40 ms time window had to be significantly larger than the "background activity" (computed from the neural activity over the 49 frequencies and 100 time bins, so 4900 values). As 1470 comparisons were made (30 time bins × 49 frequencies), the significance threshold was adjusted accordingly (Bonferroni correction, *p* = 0.05/1470). Second, all other comparisons between data sets and a reference value (or between data sets) were also Bonferroni corrected when needed. For instance, when the differences in firing rate between control and AFB conditions were compared to 0 for 32 different frequencies (±2 octaves on either side of the AFB center), the significance value was adjusted accordingly (*p* = 0.05/32). As the AFB was usually centered on a neuron's BF, the number of recordings was larger for sites with BF near the center frequency of the AFB than at remote frequencies. Overall, the number of recordings as a function of the distance from the center frequency of the AFB ranged between 24 (BFs remote from AFB center) and 117 (BFs at or near the AFB center).
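For reference, the two Bonferroni-adjusted thresholds quoted above follow from a one-line computation, shown here as a small Python sketch (the function name is hypothetical).

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons

# STRF significance: 30 time bins x 49 frequencies = 1470 comparisons.
alpha_strf = bonferroni_alpha(0.05, 30 * 49)
# Frequency-profile differences tested at 32 frequencies.
alpha_profile = bonferroni_alpha(0.05, 32)
```

The resulting thresholds are roughly 3.4 × 10⁻⁵ for the STRF significance test and about 0.0016 for the 32-frequency comparison.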

#### **RESULTS**

The aim of the present study was to investigate the cortical representation of spectral edges in auditory cortex. A total of 317 multi-unit activity (MUA) recordings were obtained from the primary auditory cortex of 10 anesthetized guinea pigs. The median for the distribution of best frequencies derived from the STRFs was 11,314 Hz (lower and upper quartiles were 7336 Hz and 20,749 Hz, respectively).

#### **CORTICAL REPRESENTATION OF A BROAD-BAND STIMULUS WITH AN ATTENUATED FREQUENCY BAND (AFB)**

Here, we investigated the cortical representation of the frequencies composing a multi-tone stimulus. In particular, we focused on the representation of spectral edges (edge-in and edge-out frequencies) of the AFB. As we were also interested in studying whether the representation of the edge frequencies is sensitive to their local acoustic context (the spectral shape around the edge frequencies), the width, depth and sharpness of the AFB were varied.

#### **INDIVIDUAL EXAMPLES**

**Figure 2** depicts a representative example of MUA and LFP responses obtained for the different conditions of multi tone stimuli (the long-term frequency spectrum of the stimuli is shown in the first row). For this example, the center frequency of the AFB was chosen to correspond to the best frequency (BF) of the MUA (around 2348 Hz). When comparing the responses at the edge-out frequencies with those obtained from the 70-dB control condition, one observes a dramatic increase. Remarkably, there was a clear neural response (in terms of multi-unit activity) at the upper edge-out frequencies for the 2-octave AFB condition (column 5), even though both spectral edges fall outside of the MUA receptive field recorded in control conditions. The increase of responses at the edge-out frequencies was larger for the sharp contrast (compare columns 4 and 6) and for the large contrast (compare columns 4 and 7) conditions. On the other hand, when comparing the responses within the AFB with the 40 dB control condition, one observes a dramatic decrease of responses in all conditions, especially for edge-in frequencies.

**Figure 3** shows an additional example, where the BF (4362 Hz) of MUA was almost 1 octave above the center of the AFB (2378 Hz). This example illustrates that responses are not modified when the frequency range of the AFB is far away from the MUA's receptive field (column 3). However, neural responses were broadly increased when the upper edge of the AFB was near (columns 4 and 7) or overlapped with the receptive field (column 5). Once again, this example shows that the neural enhancement is larger for the conditions with sharp (compare columns 4 and 6) and large (compare columns 4 and 7) spectral contrast.

#### **GROUP DATA**

The frequency profiles averaged over the recordings where the BF corresponded to the center of the AFB are shown in **Figure 4**. On average, the cortical responses are greatly enhanced at both the upper and lower edge-out frequencies, and decreased within the frequency range of the AFB. The enhancement of responses at both upper and lower edge-out frequencies was maximal for the sharp and the large contrast conditions.
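The caption of Figure 4 describes how each frequency profile is obtained: the maximal normalized firing rate (for MUA) or the minimal normalized amplitude (for LFPs) between 10 and 40 ms after stimulus onset. A minimal Python sketch of that step is given below. The function name, the dictionary layout and the assumption that sample 0 coincides with stimulus onset (with 1-ms bins) are illustrative choices, not details taken from the study.

```python
def frequency_profile(responses, bin_ms=1.0, window_ms=(10.0, 40.0), lfp=False):
    """Collapse each response time course to one value per tone frequency.

    responses: dict mapping tone frequency (Hz) to a list of normalized
    samples. Assumption: index 0 is stimulus onset, one sample per bin_ms.
    """
    first = int(round(window_ms[0] / bin_ms))
    last = int(round(window_ms[1] / bin_ms))
    # MUA: take the peak firing rate; LFP: take the most negative deflection.
    pick = min if lfp else max
    return {freq: pick(trace[first:last + 1]) for freq, trace in responses.items()}


# Hypothetical MUA time courses (60 samples each, values are made up).
mua = {2000.0: [0.0] * 60, 4000.0: [0.0] * 60}
mua[2000.0][22] = 0.8   # inside the 10-40 ms window
mua[2000.0][50] = 1.0   # outside the window, ignored
mua[4000.0][15] = 0.3
profile = frequency_profile(mua)
```

With these made-up traces, the late sample at 50 ms does not contribute because it lies outside the analysis window.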

We next computed the difference between the frequency profiles of neural responses obtained from the AFB stimuli and those obtained from the control stimuli (see methods). This comparison was carried out for three specific positions of BF relative to the center of the AFB: when BF corresponded to the lower edge (±1/8 octave), the upper edge (±1/8 octave) and the center of the AFB (±1/8 octave). The average differences between the frequency profile of AFB and control stimuli for three positions of BF relative to the center of the AFB are shown in **Figure 5**. The effects of the AFB stimuli relative to the control stimuli were tested statistically for both MUA and LFPs. As the results were generally not different between MUA and LFPs, we did not discriminate between these two signals in the rest of the manuscript. In other words, when a statistical difference is reported, this applies for both MUA and LFPs.
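The grouping and comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code of the study: the function names and data layout are hypothetical, the 1/8-octave tolerance is applied symmetrically to all three BF positions, and responses inside the AFB are compared with the 40-dB control while those outside are compared with the 70-dB control, as in the individual examples.

```python
import math


def octaves_from(freq_hz, ref_hz):
    """Signed distance of freq_hz from ref_hz, in octaves."""
    return math.log2(freq_hz / ref_hz)


def bf_position(bf_hz, afb_center_hz, afb_width_oct, tol_oct=0.125):
    """Label a recording by where its BF falls relative to the AFB."""
    half_width = afb_width_oct / 2.0
    offset = octaves_from(bf_hz, afb_center_hz)
    targets = (("center", 0.0),
               ("lower_edge", -half_width),
               ("upper_edge", half_width))
    for label, target in targets:
        if abs(offset - target) <= tol_oct:
            return label
    return None  # BF too far from the AFB to enter any of the three groups


def difference_profile(afb, ctrl70, ctrl40, afb_center_hz, afb_width_oct):
    """AFB response minus the level-matched control, per tone frequency.

    afb, ctrl70, ctrl40: dicts mapping tone frequency (Hz) to response.
    """
    half_width = afb_width_oct / 2.0
    diff = {}
    for freq, resp in afb.items():
        inside = abs(octaves_from(freq, afb_center_hz)) < half_width
        reference = ctrl40 if inside else ctrl70
        diff[freq] = resp - reference[freq]
    return diff
```

A positive value in the returned profile corresponds to enhancement relative to the control, and a negative value to suppression.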

The neural enhancement for the edge-out frequencies was statistically significant for all widths of AFB and for the three positions of BF relative to the center of the AFB (*p* < 0.0014). It is worth mentioning that cortical responses were increased by about 70% for the fully AFB condition (when BF corresponded to either the lower or the upper spectral edge). Moreover, the neural enhancement for the edge-out frequencies was sensitive to the sharpness and the depth of the contrast. Indeed, cortical responses at edge-out frequencies were significantly larger for the sharp contrast condition (vs. the shallow-slope condition, *p* < 0.05) and for the 1-octave large contrast condition (vs. the 1-octave moderate contrast condition, *p* < 0.05), for all 3 positions of BF. The enhancement at the edge-out frequencies was also sensitive to the width of the AFB; indeed, the enhancement was smaller for the 0.5-octave condition compared to the 1- and 2-octave conditions (*p* < 0.05; the 1- and 2-octave conditions were not different from each other).

Besides the neural enhancement at the edge-out frequencies, there was a significant suppression of responses at the edge-in frequencies for the three positions of BF and for all notch widths (*p* < 0.0014, except for the 2-octave condition, and when BF was positioned at the upper edge frequency of the AFB). Interestingly, one notes that this neural suppression at edge-in frequencies was asymmetric for the 2-octave condition: the suppression was stronger when BF corresponded to the lower edge of the AFB (versus when BF corresponded to the upper edge of the AFB) (*p* < 0.05). Finally, the suppression at frequencies around the center of the AFB was largest for the 0.5-octave condition, when the BF corresponded to the center of the AFB.

**FIGURE 4 | Average frequency profile for all stimulus conditions, for one BF position relative to the AFB center (neural BF corresponding to the AFB center) for MUA (first row) and LFPs (second row).** Frequency profiles are obtained by taking the maximal normalized firing rate (for MUA, or the minimal normalized amplitude for LFPs) between 10 and 40 ms after stimulus onset. Each column corresponds to an AFB condition. Schematics illustrating the long-term spectrum of the acoustic conditions are shown at the top of the figure. First column: half-an-octave partially AFB. Second column: one-octave partially AFB. Third column: two-octaves partially AFB. Fourth column: shallow-slope partially AFB condition. Fifth column: fully AFB. Both control conditions (at 70 dB SPL, continuous black line and at 40 dB SPL, dashed black line) are shown in all panels to permit a direct comparison between control and AFB conditions (red line). Neural responses were greatly enhanced at both edges of the notch, and reduced within the notch. The neural enhancement at both edges of the AFB was sensitive to the sharpness and the depth of the spectral contrast (compare 2nd and 4th columns, and 2nd and 5th columns, respectively).

While we found, on average, a clear (and significant) effect of the AFB on neural responses at the edge-in (suppression) and edge-out (enhancement) frequencies (**Figures 4**, **5**), the prevalence of these changes, namely whether they concern a majority of recording sites or not, is unknown. The percentages of recording sites showing at least 20% increase or decrease as a function of frequency for the three positions of BF relative to the center of the AFB are shown in **Figure 6**. Nearly 90% of the recording sites showed an increase of neural responses at the lower and upper edge-out frequencies, while a decrease of neural responses at edge-in frequencies was observed in around 60% of the recordings. This suggests that the cortical changes induced by the notched stimuli are very systematic. It is also worthwhile to mention that while the percentage of sites showing an increase at the edge-out frequencies is similar whether the BF corresponded to the lower edge or the upper edge of the AFB, the percentage of sites showing a decrease at the edge-in frequencies is larger when BF corresponded to the lower edge of the AFB (around 60% of the sites) than when BF corresponded to the upper edge of the AFB (around 20–30% of the sites). This result is consistent with the asymmetry in suppression observed from the averaged data (**Figure 5**), showing that neural suppression of edge-in frequencies is stronger at the lower edge of the AFB than at the upper edge (**Figure 6**) (see discussion for putative functional implications).
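The prevalence measure used for Figure 6 (the share of recording sites showing at least a 20% increase or decrease) could be computed along the following lines. This is a hedged sketch: the text does not state how the 20% change is defined, so the relative change with respect to the control response is an assumption, as are the function name and the decision to skip sites with a non-positive control response.

```python
def prevalence(control, test, threshold=0.20):
    """Percent of sites whose response rose or fell by at least `threshold`.

    control, test: equal-length lists with one response per recording site
    at a given tone frequency. Returns (percent_increase, percent_decrease).
    """
    increased = decreased = valid = 0
    for ctrl, resp in zip(control, test):
        if ctrl <= 0:
            continue  # relative change is undefined without a control response
        valid += 1
        change = (resp - ctrl) / ctrl
        if change >= threshold:
            increased += 1
        elif change <= -threshold:
            decreased += 1
    if valid == 0:
        return (0.0, 0.0)
    return (100.0 * increased / valid, 100.0 * decreased / valid)


# Four hypothetical sites: two increase, one decreases, one barely changes.
pct_up, pct_down = prevalence([1.0, 1.0, 1.0, 1.0], [1.5, 1.25, 0.7, 1.1])
```

Applying the function once per tone frequency and per BF position would give curves of the kind described for Figure 6.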

#### **POPULATION ACTIVITY OVER THE TONOTOPIC AXIS**

Thus far, neural data were analyzed with an emphasis on the characteristics of neural tuning. From a neural decoding point of view, on the other hand, a more relevant representation may be the spatio-temporal distribution of population activity. The cortex processes this dynamic and distributed population activity in real time over remote cortical regions. In order to provide a representation of neural activity closer to what may be relevant in the auditory cortex, we derived an excitation pattern (thought to approximate population activity) from MUA and LFP recordings. One notes that this representation was made possible by our matrix electrodes which allowed a relatively dense sampling of cortical responses over the tonotopic axis. The excitation patterns were obtained for each tone pip frequency presented in the multi-tone stimuli. Instead of representing the individual or average activity of cortical neurons as a function of frequency, the average neural activity was represented as a function of neural BF for each given stimulus frequency. This representation gives an estimate of the spatial representation (or population activity) of each tone pip frequency over the tonotopic axis (**Figure 7**). The resulting excitation patterns in the control condition (stimulus with a flat spectrum) were very homogeneous over frequency and resembled a Gaussian-shaped curve: for a given tone pip frequency, the activity is maximal for neurons with BF corresponding to that frequency (by definition), while neural activity decreases gradually for neurons whose BF is further from that frequency. More interestingly, the excitation patterns obtained from AFB stimuli were very different from those derived from control conditions. At the edge-out frequencies, the excitation patterns were not only increased (in terms of neural response amplitude, as already shown above) but they became broader. On the other hand, at the edge-in frequencies, the excitation patterns were decreased in amplitude and became narrower.

stimulus conditions and the location of BF relative to the AFB are shown at the top of the figure. The first and second rows represent the average MUA and LFP, respectively. First column: average data for neurons with BF corresponding to the lower edge of the AFB. Second

enhancement of neural responses was maximal for the fully AFB and minimal for the half-an-octave condition. The reduction of responses within the AFB was larger when the BF corresponded to the lower edge of the AFB (vs. upper edge).

In order to investigate whether the AFB stimuli modified the cortical representation of edge-in and edge-out frequencies, the width of the excitation patterns was derived (at the normalized neural activity of 0.2). The respective widths obtained from control and AFB stimuli were then compared statistically (**Figure 8**). During the stimulation with the AFB stimuli, the representation of the edge-out frequencies was expanded (*p* < 0.05), while the representation of the edge-in frequencies was narrowed (*p* < 0.05). These results suggest that the cortical representation of stimulus frequencies (in terms of the amplitude of the response and number of the neurons involved) is highly dynamic and depends heavily on the overall acoustic spectrum or acoustic context.
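To make the two steps concrete (building an excitation pattern for one tone pip frequency, then measuring its width at a normalized activity of 0.2), a minimal Python sketch follows. It is not the authors' implementation: the 1/8-octave BF binning, the linear interpolation of the crossing points and all names are assumptions, and the responses are taken to be already normalized.

```python
import math
from collections import defaultdict


def excitation_pattern(sites, stim_hz, bin_oct=0.125):
    """Mean response to one tone frequency as a function of site BF.

    sites: list of (bf_hz, {tone_hz: normalized response}) pairs.
    Returns a sorted list of (BF offset from the tone in octaves, mean).
    """
    bins = defaultdict(list)
    for bf_hz, profile in sites:
        offset = math.log2(bf_hz / stim_hz)
        key = round(offset / bin_oct) * bin_oct  # snap BF to the nearest bin
        bins[key].append(profile[stim_hz])
    return sorted((k, sum(v) / len(v)) for k, v in bins.items())


def _crossing(x0, y0, x1, y1, level):
    """Abscissa where the segment (x0, y0)-(x1, y1) reaches `level`."""
    return x0 + (level - y0) * (x1 - x0) / (y1 - y0)


def width_at(pattern, level=0.2):
    """Width in octaves of the pattern at the given activity level."""
    xs = [x for x, _ in pattern]
    ys = [y for _, y in pattern]
    above = [i for i, y in enumerate(ys) if y >= level]
    if not above:
        return 0.0
    lo, hi = above[0], above[-1]
    left, right = xs[lo], xs[hi]
    if lo > 0:
        left = _crossing(xs[lo - 1], ys[lo - 1], xs[lo], ys[lo], level)
    if hi < len(xs) - 1:
        right = _crossing(xs[hi], ys[hi], xs[hi + 1], ys[hi + 1], level)
    return right - left


# Two made-up triangular patterns: a broad one and a narrow one.
broad = [(-1.0, 0.0), (-0.5, 0.5), (0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
narrow = [(-0.5, 0.0), (-0.25, 0.5), (0.0, 1.0), (0.25, 0.5), (0.5, 0.0)]
width_change = width_at(narrow) - width_at(broad)  # negative: narrowing
```

The sign convention matches the caption of Figure 8: a positive difference indicates an expanded representation and a negative one a narrowed representation.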

#### **CORTICAL REPRESENTATION OF A BROAD-BAND STIMULUS WITH AN ENHANCED FREQUENCY (EF)**

In order to gain further insight into the properties of the firing rate reduction on either side of a spectral edge (later called "lateral suppression"), in particular its width and asymmetry, an additional experiment was carried out. In this experiment, cortical responses were obtained from a multi-tone stimulus where all pure tones were presented at 40 dB SPL, except at one frequency which was presented at 70 dB SPL; that frequency was referred to as the enhanced frequency (EF). The main purpose of this experiment was to investigate the cortical representation of tone pip frequencies surrounding the EF. Indeed, if a central mechanism exists that sharpens the neural representation of spectral edges, then one expects a decrease of neural responses at frequencies adjacent to the EF, as this would produce an increase of the peak-to-valley ratio.

The average differences between the frequency profiles of the EF stimulus and control stimuli for three positions of BF relative to the EF are shown in **Figure 9**. Neural responses obtained from the EF stimulus at the EF were compared to the ctrl-70 and those at other frequencies were compared to the ctrl-40. Cortical responses at the EF were largely increased in the EF condition compared to those in the control condition (*p* < 0.0014). On the other hand, neural responses were significantly decreased on either side of the EF (up to ¼ octave away from the EF, *p* < 0.0014, in the condition where the EF corresponded to BF). Interestingly, this decrease was largely asymmetric over frequency: the decrement of cortical responses was stronger (and slightly wider) towards high frequencies than towards low frequencies. In the condition where BF was lower than the EF, only one frequency below the EF was significantly suppressed (*p* < 0.0014). In the condition where BF was higher than the EF, frequencies up to 3/8 octave above the EF were suppressed (*p* < 0.0014). The width and the asymmetry of the suppressed sidebands observed in this stimulus condition are broadly consistent with the neural changes produced by the AFB stimuli (**Figures 5**, **6**) (see discussion).

## **DISCUSSION**

The present study was aimed at investigating whether there is an enhancement, in auditory cortex, of the representation of spectral edges in acoustic stimuli. Overall, we show that the cortical representation of the acoustic spectrum tends to enhance the spectral edges. As the stimuli used in this study have spectral contrasts or edges only when they are time-averaged over a few hundreds of milliseconds, our results imply that auditory centers integrate the stimulus spectrum over hundreds of milliseconds. More specifically, in the condition where a frequency band was attenuated, we observed that cortical responses were increased near the edge-out frequencies, whereas they were reduced for the edge-in frequencies. Interestingly, by estimating the neural population activity over the tonotopic axis, we also found that the cortical response profile following the presentation of a stimulus with an AFB was greatly altered: the relative number of sites responsive to the edge-out frequencies was increased, while the relative number of sites responsive to the edge-in frequencies was decreased (compared to the number of sites representing the frequencies remote from the AFB). These cortical changes were sensitive to the properties of the AFB, namely its width, depth and sharpness. These changes were highly systematic, being present in the majority of cortical recording sites. In the condition where the sound level of a single tone frequency was increased, neural activity was reduced at the neighboring frequencies of the enhanced frequency.

Neural activity was normalized to the maximal activity obtained in the ctrl-70 condition and was smoothed with a moving average. In the first (for MUA) and third (for LFPs) rows, excitation patterns for the ctrl-70 (first column) and the ctrl-40 (second column) are represented. In the first column, the second (for MUA) and fourth (for LFPs) rows represent the excitation patterns for the fully AFB condition. The "excitation patterns" for frequencies at the upper

the excitation patterns for the partially AFB condition. The excitation patterns for frequencies at the upper and lower edge-in frequencies are represented by thick blue lines (excitation pattern at center frequency of the AFB is represented by thick black line). This figure shows that the representations of frequencies at both edge-out frequencies of the AFB as well as at edge-in frequencies are greatly "distorted".

#### **THE SPECTRO-TEMPORAL INTEGRATION INVESTIGATED BY OTHER STUDIES**

**FIGURE 8 | Difference (in octaves) in the width of the "excitation patterns" between the two AFB conditions (one-octave and fully AFB) and the control conditions (40-dB within the AFB and 70-dB outside the AFB).** Positive values indicate increased width, while negative values indicate decreased width. The width of the "excitation patterns" gives an estimate of the number of neurons involved in the representation of a given frequency.

**EF and the control conditions for three positions of the EF relative to the neural BF, as a function of frequency.** Each column represents a position of the EF relative to the neural BF. At the top of the figure, schematics illustrate all stimulus conditions, namely the location of the receptive field relative to the EF (see text for more details). First column: average data for neurons with BF corresponding to the EF. Third column: average data for neurons with BF higher than the peak frequency. One notes the neural suppression at frequencies adjacent to the peak frequency and the asymmetry of this neural suppression (suppression is stronger from low to high frequencies).

The effects "at a distance" between frequencies presented in a given temporal sequence reported in the present study are reminiscent of those reported previously. In particular, presenting a pulsated tone pip at a given frequency for seconds to minutes has been shown to produce a decrement of cortical responses not only at the tone pip frequency but also at nearby frequencies (Condon and Weinberger, 1991; Ulanovsky et al., 2004). The spectrotemporal interactions of acoustic stimuli in auditory cortex have been investigated using two-tone sequences (Shamma, 1985; Shamma and Symmes, 1985; Calford and Semple, 1995; Rajan, 1998). A complex pattern of firing suppression and facilitation has been reported, which depends on the frequency separation and the delay between the two tones (Brosch and Schreiner, 1997, 2000; Brosch and Scheich, 2008; Sadagopan and Wang, 2010). One notes that little difference has been found between these effects in multi-unit and single unit activity (Brosch and Schreiner, 1997, 2000; Brosch et al., 1999). Our demonstration that cortical responses (multi-unit activity and local field potentials) are either suppressed or enhanced depending on the stimulus context is consistent with these studies. However, the above studies did not specifically address the cortical representation of spectral edges embedded in spectrally complex acoustic stimuli or the sensitivity of this representation to the characteristics of the spectral edges (width, depth, sharpness). Indeed, while neural enhancement at spectral edges has been predicted by various computational models (Shamma, 1985; Yost, 1986; Gerken, 1996; Parra and Pearlmutter, 2007), there is to our knowledge only one experimental study showing cortical enhancement near the cutoff frequency of 2-octave wide multitone stimuli (Gourévitch et al., 2009). The present study, however, extends the latter by reporting, for the first time, the effects of the physical characteristics of the spectral contrast (sharpness, depth and width) and by documenting neural responses for frequencies nearby and within spectral notches. Our study also provides new information about the width and asymmetry of lateral suppression that are crucial for computational models and the functional implications of these mechanisms (see below).
Finally, while we did not investigate specifically the effects of rate and spectral density of our acoustic stimuli on cortical responses (Blake and Merzenich, 2002; Valentine and Eggermont, 2004; Norena et al., 2008), it is likely that the central changes reported in the present study are sensitive to these parameters. Presentation rate and spectral density have to be high enough to fall within the spectro-temporal integration constants of cortical neurons. Indeed, a very low presentation rate should not produce any edge enhancement, even for spectrally dense stimuli, and vice versa.

#### **MECHANISMS OF NEURAL ENHANCEMENTS AND DECREMENTS AT THE SPECTRAL EDGES**

The malleability of cortical responses reported in the present study is produced by acoustic stimuli presented passively for only 3 min, in contrast to studies that reported rapid modifications in frequency tuning but during/after active listening (Edeline et al., 1993; Fritz et al., 2003, 2005; Elhilali et al., 2007) or after prolonged (on the order of weeks) stimulation (Norena et al., 2006; Kim and Bao, 2009). The rapidity of these cortical changes precludes the involvement of slow cortical changes such as those involved in homeostatic plasticity (Watt and Desai, 2010) or long-term depression and potentiation (Buonomano and Merzenich, 1998). Instead, they are likely the results of one or a combination of relatively fast mechanisms, occurring on the order of milliseconds to seconds or minutes.

The first (fast) mechanism that comes to mind to account for our results is lateral inhibition. Indeed, it has long been recognized that lateral inhibition could be used by sensory systems to sharpen/enhance the representation of stimulus contrasts (Hartline et al., 1956; Ratliff and Hartline, 1959; Von Bekesy, 1967, 1969a,b; Marr and Hildreth, 1980). The presence of lateral inhibition has been suspected at virtually all levels of the central auditory system using various methodologies such as whole-cell recordings (Wu et al., 2008), electrophysiology (two-tone sequences, effects of hearing loss) (Shamma and Symmes, 1985; Calford et al., 1993; Rhode and Greenberg, 1994; Calford and Semple, 1995; Suga et al., 1997; Rajan, 1998, 2001; Wang et al., 2002; Noreña et al., 2003) and pharmacology (Yang et al., 1992; LeBeau et al., 2001). Lateral inhibition is likely to contribute to our results when tones of different frequencies overlap in time. Moreover, tones are presented in random temporal sequences with a relatively short average inter-stimulus interval (500 ms for one frequency, or nearly 60 ms for a one-octave frequency band, roughly the width of the STRFs), suggesting that the cortical activity induced by a tone at a given time also depends on the tones presented shortly before it (Brosch and Schreiner, 1997, 2000; Brosch and Scheich, 2008; Sadagopan and Wang, 2010). These post-stimulatory effects on neural activity have been shown to result from synaptic inhibition up to 100 ms after stimulus presentation (Wehr and Zador, 2005). At longer delays, on the other hand, other mechanisms involved in synaptic depression, such as receptor desensitization, vesicle depletion and changes in presynaptic release probability, are thought to be at work.

In summary, we propose that the cortical changes reported in the present study are likely the results of different mechanisms such as synaptic inhibition and synaptic depression. Reduced synaptic inhibition and/or synaptic depression produced by frequencies falling in the AFB could result in a relative increase of responses at the edge-out frequencies, while the enhanced synaptic inhibition and/or synaptic depression produced by the (enhanced) edge-out frequencies could in turn reduce the responses at the edge-in frequencies. One further important question relative to the mechanisms enhancing spectral contrasts is whether they operate at the cortical level or are inherited from earlier stages of the auditory pathway. The very similar pattern of responses for MUA and LFPs (the latter known to represent mainly the thalamic inputs sent toward the cortex) (Mitzdorf, 1985; Steinschneider et al., 1992) suggests that the enhancement of spectral contrasts observed in cortex is largely inherited from lower levels. Consistent with a sub-cortical contribution to the cortical changes reported in the present study, a complex pattern of firing suppression has been evidenced in the cochlear nucleus (Rhode and Greenberg, 1994). Further studies will be needed to investigate this important question. Finally, the present study has been carried out on anesthetized animals (mixture of ketamine and xylazine); consequently, it is unclear whether the results reported here also apply to awake animals. However, a study carried out in the primary auditory cortex of macaques shows that anesthesia only marginally modifies the pattern of neural suppression and facilitation produced by two-tone sequences (Brosch and Scheich, 2008). The latter study suggests that the results obtained with our stimuli may also apply to awake preparations.

#### **SENSORY INPUT CONDITIONS MIMICKED BY OUR STIMULI WITH ATTENUATED FREQUENCY BAND**

The stimuli used in the present study can be thought of as producing acoustic environments with different spectral profiles when time-averaged over a few hundreds of milliseconds. One can wonder whether these synthetic stimuli mimic natural sensory input conditions for the auditory system.

One pattern of sensory inputs that may be mimicked by our notched stimuli is that produced by a complex broadband environment in presence of sharp notched hearing loss. Hearing losses restricted to a given frequency band (i.e., referred to as an audiogram with notches) have been reported in many studies (Gates et al., 2000; McBride and Williams, 2001; Rabinowitz et al., 2006; Nondahl et al., 2009; Etchelecou et al., 2011). Assuming that the time-averaged acoustic background is "flat", this particular shape of hearing loss is thought to result in an averaged pattern of (rate-place) activity in the cochlear nerve with a dip corresponding to the hearing loss region. More specifically, frequency regions outside hearing loss are evenly stimulated, while the frequency region of hearing loss receives only weak stimulation, if any (Gerken, 1996). In other words, our notched stimuli mimic the contrast in the averaged rate-place sensory inputs over the tonotopic axis in presence of hearing loss. As the pattern of sensory inputs provided by the AFB stimuli resembles the averaged pattern of sensory inputs in presence of notched hearing loss, the AFB stimuli can be interpreted as producing an acute "functional deafferentation" or "artificial hearing loss" (Pantev et al., 1999; Norena et al., 2000; Okamoto et al., 2007). In this context, our notched stimuli can be considered as an equivalent of the stimulus used in vision to produce an "artificial scotoma," i.e., moving lines or random dots stimulating the visual field around a small non-stimulated area (Ramachandran and Gregory, 1991; Pettet and Gilbert, 1992; Das and Gilbert, 1995; DeAngelis et al., 1995). 
One notes, however, that our stimuli do not model some typical characteristics accompanying cochlear damage, such as the decrease in spontaneous activity in the cochlear nerve within the frequency range of cochlear damage (Liberman and Dodds, 1984), the neural degeneration of cochlear fibers (Kujawa and Liberman, 2006, 2009) and/or the broadening of auditory filters (Glasberg and Moore, 1986).

The results of the present study may give some insights into the sensitivity of the auditory cortex to the characteristics of acute hearing loss. These properties are potentially important for the understanding of the functional implications of the cortical changes produced by acute hearing loss, such as tinnitus, for example (Norena, 2011; Noreña and Farley, 2013). One notes that the relationship between cortical changes and the characteristics of hearing loss is relatively difficult to study in practice as (noise-induced) hearing loss is generally variable (Loeb and Smith, 1967; Atherley et al., 1968). In conclusion, our study suggests that the cortical changes produced by acute hearing loss could be sensitive to the sharpness, depth and width of hearing loss. Moreover, while the cortical changes observed in the present study are short-term, it is possible that a more prolonged exposure to the AFB stimulus could induce long-lasting changes such as those produced by chronic hearing loss or reported in previous studies (Robertson and Irvine, 1989; Rajan et al., 1993; Norena and Eggermont, 2005; Norena et al., 2006; Pienkowski et al., 2013).

#### **PROPERTIES OF CENTRAL INHIBITION**

Our results also provide some information about the properties of lateral suppression of firing (whether it is produced by synaptic inhibition and/or synaptic depression) in the central auditory system. The bandwidth of suppressed sidebands derived from this study (0.25–0.4 octave) closely approximates the lateral inhibitory networks described by Shamma (1985, 0.3 octave) and Yost (1986, 0.2 octave). We also show that lateral suppression is asymmetric as a function of frequency with a stronger and wider suppression produced toward high frequencies (suppression was significant up to around 0.4 octaves above the spectral peak) than toward low frequencies (suppression was significant up to 0.25 octaves below the spectral peak) (**Figure 9**). This particular pattern of asymmetric inhibition is consistent with the results of Zhang et al. (2003) for high BF neurons. As the pattern of vibration of the basilar membrane is asymmetric (slope is shallower on the basal side of the cochlea compared to the apical side), leading to the corresponding asymmetric pattern of excitation in the cochlear nerve, it has been suggested that the asymmetry of central inhibition (stronger inhibition from low to high frequencies) may further refine the central representation of spectral edges (Suga, 1995; Okamoto et al., 2007).
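How an asymmetric suppression of this kind can sharpen a spectral notch may be illustrated with a toy subtractive model in Python. This is only a sketch and is neither the authors' model nor that of Shamma (1985) or Yost (1986). It assumes channels spaced 1/8 octave apart, a suppression reach of three channels toward higher frequencies and two channels toward lower frequencies (roughly the 0.4 and 0.25 octaves reported above), and arbitrary drive values and gain `k`.

```python
def lateral_suppression(drive, k=0.05, reach_up=3, reach_down=2):
    """Subtract from each channel a fraction of its neighbors' drive.

    drive: input strength per frequency channel, ordered low to high.
    A source channel suppresses targets up to `reach_up` channels above it
    and up to `reach_down` channels below it (toy asymmetric sidebands).
    """
    out = []
    for i, own in enumerate(drive):
        inhibition = 0.0
        for j, other in enumerate(drive):
            offset = i - j  # positive when target i lies above source j
            if 0 < offset <= reach_up or 0 < -offset <= reach_down:
                inhibition += other
        out.append(max(0.0, own - k * inhibition))
    return out


# Toy spectrum: one-octave notch (8 channels) in a 3-octave flat stimulus.
drive = [1.0] * 8 + [0.3] * 8 + [1.0] * 8
out = lateral_suppression(drive)

flat = out[4]           # channel far from the notch
lower_edge_out = out[7]
lower_edge_in = out[8]
notch_center = out[11]
upper_edge_in = out[15]
upper_edge_out = out[16]
```

In this toy setting the edge-out channels receive less suppression than channels in the flat region and therefore respond more, while the edge-in channels respond less than the notch center. Because suppression reaches further toward high frequencies, the lower edge-in channel is suppressed more than the upper one, which is qualitatively in line with the asymmetry described in the text.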

We have proposed that the "Zwicker tone," the tonal and faint illusory percept produced after the presentation of a notched noise (broadband noise containing a suppressed frequency band) (Zwicker, 1964; Lummis and Guttman, 1972; Wiegrebe et al., 1996; Franosch et al., 2003), could be interpreted as a model of transient tinnitus (Norena et al., 2000; Noreña and Eggermont, 2003; Parra and Pearlmutter, 2007). The "Zwicker tone" can also be induced by low-pass or high-pass noises, although the former is more effective at producing the sensation (Lummis and Guttman, 1972). It is interesting to note that this asymmetry for producing the "Zwicker tone" might be related to the asymmetry in neural suppression reported in the present study (larger neural suppression at lower edge-in frequencies vs. upper edge-in frequencies).

#### **IMPLICATIONS FOR NEURAL CODING**

The present study shows that the cortical representation of spectral edges is enhanced (more neurons are dedicated to the representation of spectral edges). A putative link between stimulus importance and its representational size in the primary auditory cortex has been suggested (Rutkowski and Weinberger, 2005). Our study further suggests that the representational size of spectral cues may be dynamically enhanced in cortex. This may improve the processing of relevant spectral cues (edges) within the ever changing acoustic environment.

It has been suggested that the responsiveness (gain) of subcortical and cortical neurons could be dynamically adapted to the statistics (mean and variance) of stimuli. This mechanism provides an elegant solution to the dynamic range problem (Viemeister, 1988) by adjusting the input–output function of neurons to the distribution of input levels (Dean et al., 2005, 2008; Watkins and Barbour, 2008, 2011; Rabinowitz et al., 2011, 2012). These studies varied the mean and variance of stimulus level across conditions but the mean stimulus level was fixed (for single pure tone) or uniform (for noise bursts or multi-tone pips) over frequency for a given condition. Our study can be considered as an extension of these studies as the mean level was varied over frequency (mean level was low in the AFB, and high elsewhere). While the hypothesis of gain control predicts a decrease of gain for high contrast stimuli (and the reverse for low contrast stimuli: neurons become more sensitive to small level variations), our results suggest the opposite: the firing rate difference between edge-out and edge-in frequencies is maximally enhanced for sharp and deep contrast. These results emphasize the importance of considering the effects of the spectral dimension (spectral envelope) in future studies investigating contrast gain control.

## **ACKNOWLEDGMENTS**

The authors wish to thank Brandon Farley, Yves Cazals, Olivier Macherey, and Lucas Parra, for their comments on an earlier version of this manuscript. This work was supported by the Tinnitus Research Initiative and the Agence Nationale de la Recherche (ANR) grant ANR-2010-JCJC-1409-1.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/Systems\_Neuroscience/10.3389/fnsys.2013.00021/abstract




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 17 March 2013; accepted: 23 May 2013; published online: 19 June 2013.*

*Citation: Catz N and Noreña AJ (2013) Enhanced representation of spectral contrasts in the primary auditory cortex. Front. Syst. Neurosci. 7:21. doi: 10.3389/fnsys.2013.00021*

*Copyright © 2013 Catz and Noreña. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

## Is the auditory evoked P2 response a biomarker of learning?

#### **Kelly L. Tremblay<sup>1</sup>\*, Bernhard Ross<sup>2,3</sup>, Kayo Inoue<sup>1,4</sup>, Katrina McClannahan<sup>1</sup> and Gregory Collet<sup>5,6</sup>**

<sup>1</sup> Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA

<sup>2</sup> Baycrest Centre, Rotman Research Institute, Toronto, ON, Canada

<sup>3</sup> Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada

<sup>4</sup> Department of Radiology, Integrated Brain Imaging Center, University of Washington, Seattle, WA, USA

<sup>5</sup> Life Sciences Department, Royal Military Academy, Brussels, Belgium

<sup>6</sup> Unité de Recherche en Neurosciences Cognitives, Centre de Recherches en Cognition et Neurosciences Université Libre de Bruxelles, Brussels, Belgium

#### **Edited by:**

Arthur Wingfield, Brandeis University, USA

#### **Reviewed by:**

Jonas Obleser, Max Planck Institute for Human Cognitive and Brain Sciences, Germany

Lisa Payne, Brandeis University, USA

#### **\*Correspondence:**

Kelly L. Tremblay, Department of Speech and Hearing Sciences, University of Washington, Eagleson Hall, 1417 NE 42nd St., Seattle WA 98105, USA e-mail: tremblay@uw.edu

Even though auditory training exercises for humans have been shown to improve certain perceptual skills of individuals with and without hearing loss, there is a lack of knowledge pertaining to which aspects of training are responsible for the perceptual gains, and which aspects of perception are changed. To better define how auditory training impacts brain and behavior, electroencephalography (EEG) and magnetoencephalography (MEG) have been used to determine the time course and coincidence of cortical modulations associated with different types of training. Here we focus on P1-N1-P2 auditory evoked responses (AEP), as there are consistent reports of gains in P2 amplitude following various types of auditory training experiences, including music and speech-sound training. The purpose of this experiment was to determine if the auditory evoked P2 response is a biomarker of learning. To do this, we taught native English speakers to identify a new pre-voiced temporal cue that is not used phonemically in the English language so that coinciding changes in evoked neural activity could be characterized. To differentiate possible effects of repeated stimulus exposure and a button-pushing task from learning itself, we examined modulations in brain activity in a group of participants who learned to identify the pre-voicing contrast and compared them with participants, matched in time and stimulus exposure, who did not. The main finding was that the amplitude of the P2 auditory evoked response increased across repeated EEG sessions for all groups, regardless of any change in perceptual performance. What's more, these effects were retained for months. Changes in P2 amplitude were attributed to changes in neural activity associated with the acquisition process and not the learned outcome itself. A further finding was the expression of a late negativity (LN) wave 600–900 ms post-stimulus onset, seen post-training exclusively in the group that learned to identify the pre-voiced contrast.

**Keywords: auditory, training, ERP, P2, exposure, learning, rehabilitation, electrophysiology**

#### **INTRODUCTION**

Long before the effects of auditory deprivation and stimulation on the brain were known, audiologists used auditory training exercises as a way to help people compensate for hearing loss (Carhart, 1960). The motivation for such exercises stemmed from the fact that adults and children with hearing loss often needed help in dealing with their speech perception deficits that remained after being fit with hearing aid amplification devices (Boothroyd, 2010). Some people reported training exercises to be helpful and others did not, so the use of auditory training exercises was questioned and slowly faded from clinical practice. By the year 2005, a mere 30% of audiology practices reported using auditory training type interventions in routine clinical practice (Kricos, 2006).

Advances in neuroscience reignited the interest in auditory training because of the plethora of research documenting the capacity of the human brain to change, depending on the type of sensory input or lack thereof. Here we focus on auditory perceptual training as a means of exploring the human capacity to learn so that brain plasticity can be optimized in ways that enhance the rehabilitation of people with hearing loss. Previous studies have shown that training-related changes in neural activity precede changes in auditory perception (Tremblay et al., 1998; Atienza et al., 2002); therefore, non-invasive physiological measures might provide an opportunity to monitor and optimize intervention efforts in people with different types of hearing loss.

Even though auditory training exercises in humans have been shown to improve certain perceptual skills of individuals with and without hearing loss (Boothroyd, 1997; Tremblay et al., 1997, 1998, 2001; Fu et al., 2004; Irvine and Wright, 2005; Sweetow and Sabes, 2006; Burk and Humes, 2007; Tremblay and Moore, 2012; Anderson et al., 2013; Chisolm et al., 2013; Sullivan et al., 2013), there is a lack of knowledge pertaining to which aspects of training are responsible for the perceptual gains, and which aspects of perception are changed (Amitay et al., 2006, 2013; Boothroyd, 2010; Henshaw and Ferguson, 2013; Jacoby and Ahissar, 2013). This lack of knowledge hinders the rehabilitation of people with hearing loss because individuals do not always respond as expected to the training program in which they participate. Even among normal hearing listeners, the effects of training can be highly heterogeneous. Without knowing which aspects of the training exercises are responsible for observed benefits, it is difficult to determine which components of the training paradigm are ineffective and what individual needs still require targeted intervention.

To better define how auditory training exercises impact brain and behavior, electroencephalography (EEG) and magnetoencephalography (MEG) have been used to determine the time course and coincidence of cortical and sub-cortical modulations in evoked activity associated with different types of auditory training (Tremblay et al., 1997, 2001, 2009, 2010; Brattico et al., 2003; Shahin et al., 2003; Bosnyak et al., 2004; Sheehan et al., 2005; Alain et al., 2010; Carcagno and Plack, 2011; Shahin, 2011; Anderson et al., 2013; Barrett et al., 2013). Here we focus on studies involving the P1-N1-P2 waves of the cortical auditory evoked response (AEP), as there are consistent reports of gains in P2 amplitude following various types of auditory training experiences, including music (Shahin et al., 2003; Kuriki et al., 2007; Seppänen et al., 2012; Kühnis et al., 2013) and speech-sound training. Despite converging evidence that increases in the amplitude of the P2 wave of the P1-N1-P2 complex coincide with improved perception, little is known about the functional meaning and neural generators of the auditory P2 response and whether or not it could serve as a biological marker of auditory learning. Our earlier studies show the center of activity for P2 to be in the anterior auditory cortex, but how this relates to learning is still unknown (Ross and Tremblay, 2009).

Speech sounds and acoustic elements thereof are represented in the neural activity patterns along the auditory pathway. One example is the representation of voice-onset time (VOT), as reflected through a sequence of onset responses recorded from primary auditory cortices in feline, primate, and human models (Eggermont, 1995; Steinschneider et al., 2005). Monotonic increases in VOT result in latency shifts and double onset responses involving the N1 peak of the P1-N1-P2 complex (Tremblay et al., 2003; Steinschneider et al., 2005). The N1 is often described as an "exogenous" response, meaning that it is sensitive to physical characteristics of the sound used to evoke the response (see Picton, 2013 for a recent review). As an example, the N1 reflects the detection of acoustic changes, including the onset of sound and acoustic changes within an ongoing sound (such as consonant-vowel transitions) (Ostroff et al., 1998; Wagner et al., 2013). The P1 wave is thought to reflect gating of auditory information to the auditory cortex (Alho et al., 1994), whereas the P2 may reflect auditory processing beyond sensation (Crowley and Colrain, 2004). For this reason, the P1-N1-P2 complex has been used to examine the neural representation of perceptually relevant temporal cues such as VOT.

In a series of past experiments, the effects of VOT training on the human P1-N1-P2 complex have also been studied (Tremblay et al., 2001, 2009, 2010; Sheehan et al., 2005; Alain et al., 2010). These experiments were used to determine if neural VOT codes could be altered through training. That is, could the perception of two within category VOT stimuli (e.g., identification and/or discrimination) that are perceived alike, and that evoke similar N1 peak latencies be altered with training? What's more, if perception changes, does the neural representation of VOT, marked by the latency of N1, change?

The VOT training studies described earlier did not reveal modifications in the latency of the N1 response. Instead, P2 amplitudes increased following VOT training. Training-related enhancements in P2 turned out not to be specific to VOT or VOT training. Enhanced P2 amplitudes appeared after various types of sound exposures (Tremblay and Ross, 2007; Tremblay et al., 2001, 2009; Atienza et al., 2002; Bosnyak et al., 2004; Sheehan et al., 2005), including identification or discrimination training; for different types of stimuli, including tones and speech sounds; presented in different types of event-related potential (ERP) contexts (homogeneous block or oddball paradigm, monaurally or binaurally); over different time courses (1 day vs. 1 year); using EEG or MEG. The P2 effect is robust, can be reliably seen in individuals, and is retained for months following initial exposure (Tremblay et al., 2010). This phenomenon is not limited to the laboratory either; enhanced P2 amplitudes appear to reflect lifelong learning such as musical training (Kuriki et al., 2006; Shahin, 2011).

Even though P2 amplitude gains have been reported to be physiological correlates of auditory learning, it is important to challenge this notion by recognizing that contributions of stimulus exposure, executive function, cognitive tasks, and memory are inherent in any auditory training paradigm. Any one or combination of these components, rather than learning itself, could be influencing P2 changes reported in the literature. In fact, our previous studies (Ross and Tremblay, 2009; Tremblay et al., 2010) and others (Sheehan et al., 2005) suggest that mere stimulus exposure during EEG and MEG recording sessions and behavioral baseline testing, in the absence of training or changes in perceptual performance, contributes to enhanced P2 amplitude.

Expanding this program of research by including different experimental designs, while involving the same stimuli, enables us to identify converging evidence across the studies. Therefore, the purpose of this study was to determine whether or not P2 amplitude changes represent biologic markers of auditory learning. To do so required examining modulations in brain activity in a group of participants who learned the task and comparing it to participants, matched in time, task, and stimulus exposure, that did not learn. Modulations in P2 amplitude could be viewed as a biomarker of auditory learning if P2 amplitudes increased only for the group that learned the VOT contrast, but not in the other groups.

We therefore recorded behavioral responses and brain activity, elicited by stimuli differing in VOT, from three groups of participants, who were tested within similar time windows (**Figure 1**). The first group served as a control group without intervening listening or training experience, so that quantifiable modulations in brain activity could be related solely to the passage of time. The remaining two experimental groups (Groups 2 and 3) participated in listening tasks during a 5 day intervening period between pre 2 and post sessions. Both groups heard the same number of stimulus sounds during these intervening days, but the two groups differed in the type of task and feedback they received. One facilitated learning whereas the other did not. Members of Group 2 were asked to click a mouse button (to proceed to the next sound) after hearing each sound, without receiving any feedback to facilitate learning the VOT contrast. Group 3 members were instructed how to label each sound (a two-alternative forced-choice task) by clicking a mouse button, and feedback about their performance followed so as to facilitate learning. In doing so, we were able to examine brain-related changes in activity in groups that did and did not learn the VOT contrast. We also looked beyond a typical P1-N1-P2 time window (<200 ms in latency) to determine if VOT training modulates more endogenous, higher-level aspects of sound processing.

## **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Thirty normal-hearing native-English speakers (18–39 years) were randomly assigned to one of three groups (10 in each group). Normal hearing was defined as pure tone thresholds ≤25 dB HL across frequencies between 250 and 8000 Hz. All participants were right handed and provided their written informed consent prior to participation. The Research Ethics Board of the University of Washington approved the study. Data from ten of these subjects (Group 1) were previously described in a publication that reported only the effects of repeated stimulus exposure (Tremblay et al., 2010).

**FIGURE 1 | Experiment design and time course**. EEG recording and behavioral testing was performed at similar points in time, across four sessions, and involved three groups. EEG data were acquired separate from the behavioral sessions. Whereas participants in Groups 2 and 3 were exposed to, and interacted with, the stimuli over a 5 day period between test sessions, Group 1 did not. The number of stimuli (amount of stimulus exposure) was identical across Groups 2 and 3, and participants were required to perform a similar task (click the mouse to advance to the next stimulus), but what differed between the two groups was the instructions and feedback. Participants in Group 3 received instructions and response feedback intended to improve their ability to correctly identify each of the two pre-voiced stimuli, but participants in Group 2 did not.

#### **STIMULI**

Two Klatt synthesized pre-voiced "ba" syllables, 180 ms in duration, were used in this experiment. They were the same stimuli used in a series of experiments designed to examine the neural encoding of VOT with training (see series of experiments by Tremblay et al., 1997 through 2010). Adult native English speakers consistently describe both pre-voiced stimuli as "ba" (McClaskey et al., 1983), but following training, they can learn to identify and label the −10 ms VOT stimulus as "ba" and the −20 ms VOT stimulus as "mba" (Tremblay et al., 1997).

#### **BEHAVIORAL TESTS**

The ability to correctly identify the two stimuli was tested in four sessions for all groups within the same time frame (**Figure 1**). The first two tests were performed on 2 subsequent days, termed pre 1 and 2, and provided baseline performance scores. A post-training test was administered 5–7 days later and a retention test more than 2 months later. All groups were involved in the identification task, which was the same for all sessions. Participants were presented with randomized trials of the "mba" and "ba" stimuli. Twenty-five of the "mba" and 25 "ba" stimuli were presented in each session binaurally at a level of 76 dB SPL using insert earphones (Etymotic Research ER3a). The test was self-paced and a response, entered via a computer mouse, triggered the presentation of the next sound. Feedback was not provided in any test. The instructions to all participants were: "*You will hear some sounds and I want you to label the sounds as you hear them using the left button on the computer mouse. You will label the sounds based on two choices that will be displayed on the computer monitor. There is no right or wrong answer; it is simply your perception of what you hear*". Two labels appeared on the computer screen as text: "mba" and "ba".

#### **BEHAVIORAL TRAINING**

Group 1 participated in the four behavioral tests only and served as a control group for examining changes in perception and physiology over the same time periods as Groups 2 and 3. Groups 2 and 3 participated in training sessions on 5 consecutive days, starting immediately following the pre 2 behavioral testing. Both groups heard four blocks of 50 randomized presentations of the "mba" and "ba" syllables, 25 of each on each day. Behavioral testing was self-paced and lasted approximately 20 min each day. Whereas the numbers of stimuli (amount of stimulus exposure) and the motor task of clicking the mouse were similar across the two groups, the instructions and feedback were different between groups. The task for Group 3 involved evaluating the stimulus they had just heard, making a decision about what label they would assign to each sound, and then clicking the mouse to indicate which sound they heard. Group 3 also received feedback, which was intended to motivate participants to "correctly" label each sound.

Participants in Group 2 were instructed: "*You will hear some sounds. After each sound press the button on the screen to continue to the following sound*". A button labeled "NEXT" was displayed on the computer screen to advance the task following each stimulus presentation.

Group 3 participants were instructed: "*Now, we're going to help you label one sound /ba/ and one /mba/. You will be given feedback following each trial. If you select the correct label, it will turn green. If you do not select the correct label, the next trial will begin*". Two text labels, "mba" and "ba", were displayed on the computer screen.

#### **ELECTROENCEPHALOGRAPHY (EEG) ACQUISITION**

EEG recordings and behavioral testing were completed in a sound-attenuated booth on 2 consecutive days (Session pre 1 and 2), 1 week following initial testing (post-training session), and 2 months to 1+ year following initial testing (retention session). Retention tests were staggered in time so changes in brain and behavior could be tracked over a large time window.

Similar to our previous experiments, stimuli were delivered monaurally via insert earphones to the right ear at 76 dB SPL; the same intensity was used for the behavioral tests. A passive EEG paradigm was used, meaning participants watched closed-captioned movies and were instructed to stay alert, but no particular attention to the stimuli was requested. No behavioral task took place during EEG recordings. Four hundred presentations of the same stimulus type ("ba" or "mba") were presented in a block with an inter-stimulus interval (ISI) of 1993 ms. Following a 5 min break, a block of the other sound stimulus ("mba" or "ba") was recorded. Stimulus order was counter-balanced across groups and test sessions. This particular ISI was used because our previous studies have shown that younger and older adults are differentially sensitive to stimulus presentation rates faster than 2 s, and in future studies we wish to compare these data to those of older adults (Tremblay et al., 2004).

Continuous EEG signals were recorded from 59 electrodes using an elastic cap (Electro-cap International, Inc.) and a PC-based Neuroscan system (SCAN, ver. 4.3.3) with SynAmps2 amplifiers. The electrode montage followed an extended 10–20 system, reported in more detail in Tremblay et al. (2010). Four additional electrodes were placed on the inferior and outer canthus of each eye to monitor eye blink activity. EEG signals were referenced to the Cz electrode, analog bandpass-filtered between 0.15 and 100 Hz (12 dB/octave roll off), amplified with a gain of 500, and digitized at a sampling rate of 1000 Hz.

For offline analysis, an artifact correction procedure using BESA (5.2) was applied to reduce the effects of contamination from eye-blinks and ocular movements. Eye-blink artifacts were identified by a threshold criterion and corresponding waveforms were averaged to obtain a template of ocular artifacts. A principal component analysis of these averaged recordings provided a set of components that best explained the eye movements. The scalp projections of these components were then removed from the EEG signal to minimize ocular contamination.
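The last step of this procedure, removing the scalp projection of an artifact component, can be sketched as a simple spatial projection. This is a hedged illustration and not the BESA implementation: the function `remove_component`, the three-channel montage, and the topography values are hypothetical, and only a single component is handled.

```python
import math

def remove_component(sample, topography):
    """Project one spatial component out of a multichannel sample."""
    norm = math.sqrt(sum(w * w for w in topography))
    unit = [w / norm for w in topography]
    # Amount of the artifact topography present in this sample.
    score = sum(u * x for u, x in zip(unit, sample))
    return [x - score * u for x, u in zip(sample, unit)]

# Hypothetical 3-channel montage: two frontal channels and one central channel.
blink_topography = [1.0, 1.0, 0.0]   # blink loads equally on the frontal channels
sample = [3.0, 3.0, 1.5]             # 3 uV blink frontally plus 1.5 uV centrally
cleaned = remove_component(sample, blink_topography)
print([round(v, 6) for v in cleaned])
```

A projection of this kind also removes any brain activity that shares the artifact topography, which is one reason the template is built from averaged blink events rather than from the whole recording.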

In BESA, the continuous EEG signal was parsed into stimulus-onset-related epochs of 1200 ms length, including a 200 ms pre-stimulus interval, which was used for baseline correction. The signals were averaged for each stimulus condition and re-referenced to the average across all electrodes. Waveforms were low-pass filtered at 32 Hz. The peak amplitudes and latencies of the N1 and P2 waves were measured as the signal maxima at electrode Cz in the latency intervals of ±50 ms around 100 ms and 200 ms for each participant, each stimulus type and each session.
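The baseline correction and peak measurement described above can be sketched on a synthetic waveform. The sampling rate, epoch length, baseline interval, and ±50 ms windows come from the text; the waveform itself, the function names, and the choice to take the most negative point for N1 and the most positive point for P2 are assumptions made for illustration.

```python
import math

FS = 1000        # sampling rate in Hz (from the text): 1 sample = 1 ms
PRE_MS = 200     # pre-stimulus baseline interval in ms (from the text)
EPOCH_MS = 1200  # total epoch length in ms (from the text)

def baseline_correct(epoch):
    """Subtract the mean of the pre-stimulus interval from every sample."""
    base = sum(epoch[:PRE_MS]) / PRE_MS
    return [v - base for v in epoch]

def peak_in_window(epoch, center_ms, half_width_ms=50, polarity=1):
    """Return (amplitude, latency in ms) of the extreme value in the window.

    polarity=+1 finds the most positive point, polarity=-1 the most negative.
    """
    start = PRE_MS + center_ms - half_width_ms
    stop = PRE_MS + center_ms + half_width_ms + 1
    idx = max(range(start, stop), key=lambda i: polarity * epoch[i])
    return epoch[idx], idx - PRE_MS

def synthetic_epoch():
    """Hypothetical averaged response: N1 at 100 ms, P2 at 200 ms, DC offset."""
    samples = []
    for i in range(EPOCH_MS):
        t = i - PRE_MS
        n1 = -2.0 * math.exp(-((t - 100) / 20.0) ** 2)
        p2 = 1.5 * math.exp(-((t - 200) / 30.0) ** 2)
        samples.append(0.3 + n1 + p2)
    return samples

epoch = baseline_correct(synthetic_epoch())
n1_amp, n1_lat = peak_in_window(epoch, 100, polarity=-1)
p2_amp, p2_lat = peak_in_window(epoch, 200, polarity=+1)
print(f"N1: {n1_amp:.2f} uV at {n1_lat} ms; P2: {p2_amp:.2f} uV at {p2_lat} ms")
```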

#### **RESULTS**

#### **BEHAVIORAL DATA ANALYSIS**

To assess perceptual performance across groups, *d*-prime (*d*′) scores (Macmillan and Creelman, 1991) were computed for each participant from the rates of hits, misses, false alarms, and correct rejections for each behavioral test. A response was scored correct (hit) if the participant assigned the label "mba" to the −20 ms VOT stimulus. A correct rejection involved choosing the label "ba" for the −10 ms VOT stimulus. A split-plot 3 (fixed between groups; "Group") × 4 (fixed within groups; "Session") mixed model ANOVA was used to test the effects of "Group" and "Session" as well as their interaction on the *d*′ scores. *F*-statistics for the within-group effects and interactions were adjusted to control for Type I error due to significance of Mauchly's test of sphericity. Follow-up pairwise comparisons were made using the Dunn-Sidak multiple comparisons procedure to control for Type I error. **Figures 2A–C** summarize the behavioral results. Significant improvement in identification performance was seen for Group 3 only, and was retained for as long as 1 year for some individuals.
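As a minimal sketch of the *d*′ computation described above, the snippet below converts response counts into a sensitivity score. The trial counts in the example listeners are hypothetical, and the log-linear correction for extreme rates is an assumption; the text does not state which correction, if any, was applied.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction (an assumption, not stated in the text): add 0.5
    # to each count so rates of exactly 0 or 1 do not give infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical listeners: 25 "mba" (-20 ms VOT) and 25 "ba" (-10 ms VOT) trials.
# A hit is "mba" chosen for -20 ms; a false alarm is "mba" chosen for -10 ms.
untrained = d_prime(hits=12, misses=13, false_alarms=12, correct_rejections=13)
trained = d_prime(hits=22, misses=3, false_alarms=4, correct_rejections=21)
print(f"untrained d' = {untrained:.2f}, trained d' = {trained:.2f}")
```

A listener who labels both stimuli at the same rate scores a *d*′ of zero regardless of response bias, which is why *d*′ rather than percent correct is used to track learning of the contrast.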

#### **PERFORMANCE EVALUATION—PRE 1, PRE 2 AND POST SESSIONS**

There was a significant main effect of "Session" on *d*′ scores (*F*(2.33,53.50) = 6.59, *p* < 0.01, partial ω<sup>2</sup> = 0.13), as well as a "Group" × "Session" interaction (*F*(4.65,53.50) = 7.17, *p* < 0.001, partial ω<sup>2</sup> = 0.28). Follow-up pairwise comparisons showed an increase in *d*′ between baseline (Pre 2) and the post-test for Group 3 only, i.e., for those participants in the training who received performance feedback (*p*-values < 0.05).

**FIGURE 2 |** [...] for Group 3. Members of Group 3 participated in the identification task and received feedback. **(B)** Changes in *d*′ over time for the 7 out of 10 individuals in Group 3 who participated in the retention sessions. **(C)** A [...] button-pushing task, but Group 2 did not receive instructions or feedback designed to facilitate learning. No significant changes in performance were seen for Group 2.

#### **PERFORMANCE RETENTION**

**Figure 2B** shows changes in *d*′ scores over time for Group 3. Three individuals were lost to attrition and were unavailable to return for retention testing. Analysis of *d*′ scores measured more than 2 months after the initial testing (Retention) revealed sustained improvements in performance for Group 3 (**Figure 2A**). Significant increases in *d*′ scores were seen between the baseline session (pre 2) and the post-training measures (*p* = 0.033), with significant differences between baseline and retention (*p* = 0.009) and no significant differences between post-training and retention measures (*p* = 0.697). An analysis of the *d*′ scores thus revealed that the improvements in performance found between pre- and post-training measures persisted in the retention measure.

#### **ELECTROENCEPHALOGRAPHY (EEG) ANALYSIS: AUDITORY EVOKED RESPONSES**

To compare to our previously published studies, grand averaged evoked responses at electrode Cz are shown for the three groups and the four recording sessions in **Figures 3**, **4**. All waveforms are in response to the −10 ms prevoiced "ba" stimulus and show prominent N1-P2 waves. The P1 wave is small. Although the response morphologies are quite different between groups as, for example, expressed in different ratios of the N1 and P2 amplitudes and variations at longer latencies beyond 300 ms, the effect of increasing P2 amplitudes between the first baseline recording and the post-training session is apparent in all three groups. Also, similar to our previous study (Tremblay et al., 2009), P2 amplitude measured across staggered retention sessions more than 2 months after the first recording, remained larger than the initially measured P2 amplitude. In contrast, changes in N1 amplitude over the time course of the experiment were small. Offset responses also appear to decrease over time, but we assume them to be driven by growth of P2.

The N1 amplitude showed smaller between-session changes than that of P2 (**Figure 4A**). N1 amplitude diminished over the time course of the first three recordings. A repeated measures ANOVA for the N1 amplitude revealed main effects of "Session" (*F*(3,81) = 4.67, *p* = 0.0046) and "Stimulus" (*F*(1,27) = 5.32, *p* = 0.029) and a "Session" × "Group" interaction (*F*(6,81) = 3.11, *p* = 0.0087). When averaged across sessions, the stimulus effect appeared to be driven by the slightly larger "ba" amplitude for all three groups (mba: 1.63 µV and ba: 1.85 µV). The interaction diminished when considering the first three recordings only, suggesting it was mainly caused by the continuing N1 decrease in the retention session in Group 3 only. It should be kept in mind that an N1 amplitude decrease means a positive voltage shift at the Cz electrode, which appeared in line with P2 amplitude increases; thus, cross talk from the P2 changes has to be considered when interpreting the N1 changes. No significant changes in N1 latency were found for either stimulus across sessions.

A repeated measures ANOVA on P2 amplitude with the between subjects factor "Group" (3 levels) and the within subjects factors "Session" (4 levels) and "Stimulus" (2 levels) revealed no main effect of "Group" (*F*(2,27) = 0.5), but there were main effects of "Session" (*F*(3,27) = 62.7, *p* < 0.0001) and of "Stimulus" (*F*(2,27) = 13.6, *p* = 0.001). No "Group" × "Session" or "Group" × "Stimulus" interaction was significant. For the mean across groups, the P2 amplitude increased from 0.90 µV to 1.61 µV between the pre 1 and 2 baseline recordings, continued to increase to 2.59 µV in the post-training session, and decreased to 1.86 µV in the retention session. Compared to the first baseline recording, the P2 amplitude increased by 79% at the second baseline recording, by 187% at the post-training session, and remained larger than twice the initial amplitude after more than 2 months.
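The percentage gains quoted above follow from the group-mean amplitudes, as the short calculation below shows. The means are taken from the text; the helper `percent_gain` is a hypothetical name. Because the published means are rounded to two decimals, the post-training gain computed here comes out slightly below 188%, close to the reported 187%, which was presumably derived from unrounded values.

```python
# Group-mean P2 amplitudes in microvolts, as reported in the text.
p2_mean_uv = {"pre 1": 0.90, "pre 2": 1.61, "post": 2.59, "retention": 1.86}

def percent_gain(value, reference):
    """Percentage increase of value relative to reference."""
    return 100.0 * (value - reference) / reference

reference = p2_mean_uv["pre 1"]
gains = {
    session: percent_gain(amplitude, reference)
    for session, amplitude in p2_mean_uv.items()
    if session != "pre 1"
}
for session, gain in gains.items():
    ratio = p2_mean_uv[session] / reference
    print(f"{session}: +{gain:.1f}% ({ratio:.2f} x the first baseline)")
```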

Gains in P2 amplitude between the pre-training sessions and between pre- and post-training sessions are illustrated in **Figure 4B**. An ANOVA revealed a main effect of "Session" (*F*(1,27) = 4.9, *p* = 0.035) and a "Session" × "Group" interaction (*F*(2,27) = 5.2, *p* = 0.030) because the P2 gain between pre- and post-training was larger than the P2 increase between the baseline sessions in Group 2 (*t*(19) = 2.21, *p* = 0.040) and in Group 3 (*t*(19) = 2.31, *p* = 0.035) but not in Group 1 (*t*(19) = 0.3). There were no differences in the amount of P2 gain between Groups 2 and 3.

Results of a spatio-temporal principal component analysis on the evoked response waveforms observed in Group 3 are summarized in **Figure 5** with the topographic distributions of the five largest components, which explain in total 98.4% of the variance, and the corresponding waveforms separately for the two baseline sessions and the post-training session. Overlaid are the responses to the "ba" and the "mba" stimuli. The aim of this analysis was to explore whether learning to identify the two stimuli would result in different responses to "ba" and "mba". Recognizing there are spatial precision limitations with EEG, we report the largest component, characterized by the N1-P2 waves, as being maximal at frontal midline electrodes, and the second largest component as predominant above the posterior parietal region. Smaller components were localized to left and right temporal and inferior frontal regions. Although the smallest component explained only 2.2% of the signal variance, the corresponding time series were clearly reproduced between sessions. Most importantly, no clear distinction between "ba" and "mba" responses became obvious. Accordingly, a formal multivariate test using PLS analysis showed a main effect of "Session" but no "Session" × "Stimulus" interaction. So far, the current data do not suggest that learning results in different cortical representation of the learned stimulus item beyond the statistical power of our analysis.

#### **LATE NEGATIVITY**

Changes in evoked neural activity in the 600–900 ms latency interval were also observed during post-training and retention sessions in the trained Group 3. Therefore, the mean amplitude in the 600–900 ms latency interval was measured and compared between groups and recording sessions. The repeated measures ANOVA for this late negativity (LN) revealed only a tendency toward significance for "Group" (*F*(2,27) = 2.71, *p* = 0.085); however, a main effect of the within-subject factor "Session" (*F*(2,54) = 6.92, *p* = 0.0021) and a "Session" × "Group" interaction (*F*(4,54) = 3.47, *p* = 0.0135) were observed. Pairwise comparisons help to explain the interaction because between-session differences in the LN were significant in Group 3 only (**Figure 6**). In Group 3, the LN was larger after the training compared to the pre 1 session (*t*(19) = 4.18, *p* < 0.0001) and compared to the pre 2 session (*t*(19) = 3.62, *p* = 0.0018). Despite the significant training-related changes in the LN latency range for Group 3, the magnitude of perceptual change did not correlate with the amount of LN amplitude change (*R*<sup>2</sup> = 0.07, *F* = 0.61, *p* = 0.46). Also, even though a visible LN can be seen in the retention data, the between-subject variability was large (likely because of the staggered test times) and thus the retention effect was not significant.
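Unlike the N1 and P2 peak measures, the LN was quantified as a mean amplitude over a fixed latency window. A minimal sketch of that measure follows, assuming a baseline-corrected epoch sampled at 1000 Hz with a 200 ms pre-stimulus interval as described in the methods; the epoch values and the function name `mean_amplitude` are hypothetical.

```python
FS = 1000          # sampling rate in Hz (from the text)
PRE_SAMPLES = 200  # samples before stimulus onset (200 ms at 1000 Hz)

def mean_amplitude(epoch, start_ms, stop_ms):
    """Mean voltage from start_ms up to (not including) stop_ms after onset."""
    first = PRE_SAMPLES + start_ms * FS // 1000
    last = PRE_SAMPLES + stop_ms * FS // 1000
    window = epoch[first:last]
    return sum(window) / len(window)

# Hypothetical baseline-corrected epoch: flat except for a sustained
# -1 uV deflection between 600 and 900 ms after stimulus onset.
epoch = [0.0] * 1200
for i in range(PRE_SAMPLES + 600, PRE_SAMPLES + 900):
    epoch[i] = -1.0

late_negativity = mean_amplitude(epoch, 600, 900)
earlier_window = mean_amplitude(epoch, 300, 600)
print(late_negativity, earlier_window)
```

A window mean is less sensitive to noise than a single peak value, which suits a slow, sustained deflection with no sharply defined peak.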

### **DISCUSSION**

There is a long history of using auditory training exercises as a part of auditory rehabilitation programs for people with and without hearing loss. One assumption is that listening training modifies the way sound is encoded and processed in the central auditory system; another is that listening exercises permit the person to make better use of existing neural codes. We still do not know what aspects of auditory training are responsible for perceptual gains (Boothroyd, 2010) and how coincident changes in neural activity relate to the auditory cue being trained. To address this issue, we compared training-related changes in perception and physiology, evoked by the same VOT stimuli, so that relationships between brain and behavior could be examined. N1 and P2 latencies are consistently reported to be important neural correlates of VOT (Wagner et al., 2013); however, when people are trained to alter the perception of VOT, changes in P2 amplitude, rather than in N1 or P2 latencies, are observed. Therefore, the purpose of this experiment was to determine if the auditory evoked P2 response is a biomarker of VOT learning.

#### **IS P2 A BIOMARKER OF LEARNING?**

The main finding was that increases in P2 amplitude were observed for people who did and did not learn the novel VOT contrast. Based on these data, the most obvious conclusion is that P2 amplitude is not a biomarker of learning. This conclusion is reinforced by the growing body of evidence suggesting it is the elements of training (exposure, task execution) that contribute to P2 enhancements, and not the learned product of a goal-directed act. It would also explain why no study has been able to establish a one-to-one relationship between the magnitude of P2 change and the magnitude of perceptual change (Tremblay et al., 2001; Sheehan et al., 2005) and why enhancements appear to generalize to other stimuli exposed to but not necessarily learned (Tremblay et al., 2009). However, it is also possible that the large training-related changes in P2 might overlay and obscure smaller effects of learning, or reflect other related processes not measured here. Therefore, to entirely dismiss a relationship between P2 and auditory learning would be to ignore converging evidence, from multiple laboratories, linking enhanced neuroplastic P2 activity to multiple forms of learned behaviors. When learning to discriminate the rate of frequency modulation in tones, for example, differences in performance gain were related to different learning strategies and were reported to be reflected in P2 amplitude increases (Orduña et al., 2012). In a study of pitch discrimination training, absolute P2 amplitude correlated with reaction time (Tong et al., 2009). Also, long-term experience in musicianship and effects of auditory training in musicians were expressed in larger P2 amplitudes and amplitude increments compared to non-musicians (Seppänen et al., 2012). Collectively, there is a growing body of literature linking enhanced P2 amplitudes to auditory learning, which makes it difficult to entirely reject some type of brain-behavior relationship. We therefore put forth an alternative hypothesis: changes in P2 amplitude reflect neural activity associated with the acquisition process, but not the learned outcome itself. What neural mechanisms are associated with the process, and driving modulations in P2 activity, still need to be defined.

Based on source modeling we can assume some degree of auditory cortex involvement. Ross and Tremblay (2009) showed N1 and P2 to originate from different anatomical structures that likely serve different functions. N1 sources lay in the posterior part of auditory cortex, the planum temporale, whereas the center of activity for P2 lay in anterior auditory cortex, the lateral part of Heschl's gyrus. P2 sources have also been identified in planum temporale, Brodmann's area 22, and auditory association cortices (Crowley and Colrain, 2004). Whereas the P2 increase exhibits a neuroplastic nature, with enhanced activity becoming evident only after a period of sleep (Atienza et al., 2002; Ross and Tremblay, 2009; Zhu, 2010) and persisting for months, decreases in N1 amplitude occur within an experimental session and return to baseline in subsequent recordings (Ross and Tremblay, 2009). This type of N1 behavior pattern is more in line with habituation and less so with the types of learning-related N1 changes exhibited during active EEG recordings, where modulations in brain activity are recorded while the participant is attending to and executing the training task (Alain et al., 2010). Then again, habituation is sometimes termed "non-associative learning" and may be facilitating the P2 effects reported here (Rankin et al., 2009). N1 suppression mechanisms may also help consolidation, resulting in an increase of P2 between sessions.

The stimuli and passive recording paradigms used in our original VOT studies were designed to determine if neural codes reflecting VOT, and reflected by the N1, could be altered through training. If so, these far-field AEP recordings could be used clinically to assess the temporal resolution and rehabilitation of populations with suspected temporal processing disorders. The passive EEG recording paradigm is ideal for difficult-to-test populations and avoids potential confounds that can interfere with perceptual performance. Moreover, the stimulus block design was chosen with future clinical applications in mind, as these types of recording paradigms are within the capacity of, and similar to, electrophysiological procedures used in audiology clinics today. However, to date, using this approach, no evidence of significant N1 latency shifts, reflecting perceptual changes in VOT over time, has been reported. One possibility is that N1 latencies do not reflect subtle differences in *pre*-voicing. Another is that mechanisms underlying N1 are resistant to training (Wagner et al., 2013), or that changes in synchronous activity are so modest that they cannot be detected using far-field recordings in humans. However, there is some evidence that N1 (and some subcomponents) can be modified with training, but these were all observed as amplitude rather than latency changes (Menning et al., 2000; Brattico et al., 2003; Bosnyak et al., 2004). An exception is Reinke et al. (2003), who reported decreased N1 and P2 latencies, as well as enhanced P2 amplitudes following training, but these latency changes were recorded using an active EEG task while listeners took part in a vowel segregation training task. This means that attention, auditory and visual sensory processing, memory, and executive function could have contributed to the observed latency changes. Thus, it is difficult to differentiate sensory vs. cognitive (top-down) contributions to learning, as well as the various types of top-down contributors.

The P1-N1-P2 responses recorded here were acquired in a passive way and as such are described as being mainly exogenous in nature, meaning they are highly dependent on the physical properties of the stimulus used to evoke them. However, these AEPs can be endogenous, and modulated by attention in certain circumstances (Hillyard et al., 1973; Woldorff and Hillyard, 1991; Woods, 1995). This point is important when considering potential contributors to enhanced P2 activity. In our design, participants heard stimuli during the AEP sessions and during each perceptual training and testing task. They saw visual instructions and text response options. In all instances, auditory and visual input tapped into memory sources because sessions were repeated on different days. So, as described by Tremblay et al. (2001) and others, it is possible that some of the training-related physiological changes reported here might reflect other top-down modulatory influences that are activated during AEP recordings as well as focused listening tasks. What's more, the P2 effects might not even be auditory specific. Similar to the auditory evoked P2, the visually evoked P2 is modulated by attention, language context information, and memory and repetition effects. It is also considered to be part of a cognitive matching system that compares sensory inputs with stored memory (Luck and Hillyard, 1994; Freunberger et al., 2007). Therefore, although our source modeling studies (Ross and Tremblay, 2009) showed involvement of primary and association cortical areas, we have not yet ruled out multisensory interactions from contributing to our results. Until future experiments are designed to disentangle the various multisensory top-down contributing components, such as attention, memory, and executive function, we are left to speculate about neural mechanisms and their contributions to the results reported here.

One possibility worth exploring in future studies is the concept of object representation (Näätänen and Winkler, 1999; Ross et al., 2013). If we view N1 and P2 as reflecting synchronous evoked auditory activity involved in the early stages of perceptual learning, where the neural representation of the sensory input takes place, we could speculate that the P2 indicates memory updating and consolidation, where the two similar sounding stimuli ("ba" and "mba") are stored in a buffer. This phase could be passive, not requiring engagement of the participants, which would explain enhanced P2 activity from session to session in the absence of training. With directions and feedback, it would become possible to separate this sensory information into two objects, "mba" and "ba". Within this framework, we suggest that P2 plays a role in stimulus familiarization and auditory object representation, which are critical processes for successful perception. The second phase of learning is likely mediated by top-down processes and probably involves many interactive aspects involving attention, motivation, reinforcement, etc. Whereas the first stage applies to the neural detection of sound, the second stage reflects how the brain makes use of the sound. To better understand later stages in sound processing, we expanded our prior analyses to determine if auditory training, and its components, result in recordable modulations in brain activity later in time. As seen in previous studies (Tremblay et al., 2009), there might be experience-related changes occurring outside the P2 latency region that are visible in different scalp locations.

#### **LATE NEGATIVITY (LN)**

A previously unreported finding was the presence of the LN in the post-training session, for the group that learned to identify the pre-voiced contrast. It appears to a lesser degree in the retention data as well, but brain-and-behavior scores do not correlate with each other. Like the P2, the magnitude of LN change does not predict a person's perceptual change score. So what does the LN reflect?

It is well established that distinct forms of cognitive control are associated with unique patterns of activation over a distributed network of regions. These networks can include the dorsolateral prefrontal cortex (DLPFC), ventrolateral prefrontal cortex (VLPFC), supplementary and pre-supplementary motor areas, the anterior cingulate cortex (ACC), and superior and inferior aspects of the posterior parietal cortex (Corbetta and Shulman, 2002; Cole and Schneider, 2007). What's more, many aspects of cognitive control have been shown to manifest themselves as negativities in ERP recordings (e.g., N2, Nd, MMN, N400, Late Difference Negativity (LDN) and Error Related Negativity (ERN)). However, these types of negativities are typically recorded when the task involves attention switching or other complex stimulus paradigms such as an oddball paradigm, and they often require active participation during the EEG recording. In the present experiment, participants were not engaged in a purposeful attention task and the stimuli were presented as a homogeneous train of equiprobable events with no salient deviant stimuli. Thus, our use of the term LN is descriptive and does not neatly fit a well-characterized ERP profile. If left to speculate, we hypothesize that members of Group 3 learned to identify subtle acoustic cues that separated the two pre-voiced stimuli prior to the final ERP session. It is possible then that the training sessions drew greater attention to the stimuli as being separate objects. At the time of post-training EEG sessions, these two stimuli were automatically recognized as two separate auditory objects, but members of Group 3 were the only ones who were taught to attach each object to a perceptual label.

#### **SUMMARY AND CONCLUSIONS**

The purpose of this study was to determine if enhanced auditory evoked P2 activity is a biomarker of learning. The question is relevant to the study of auditory rehabilitation in that neurophysiological correlates of auditory training are needed to better understand the mechanisms of action presumed to be involved when using training as an intervention approach for people with and without hearing loss. This study showed increases in P2 AEP amplitude following exposure to auditory stimuli as well as participation in tasks (with and without feedback). Enhanced P2 amplitudes were seen regardless of any change in perceptual performance and were therefore not interpreted to be a biomarker of learning. Instead, modulations in P2 amplitude were attributed to changes in neural activity associated with the acquisition process and not the learned outcome itself, a process that is robust enough to be retained for months. A further finding was the expression of an LN wave 600–900 ms post-stimulus onset, in the post-training session, exclusively for the group that learned to identify the pre-voiced contrast. Collectively, we conclude that being exposed to and interacting with sound alters the way those sounds are represented in the brain and that these changes in neural activity are part of the learning process. Consistent with our earlier findings (Tremblay et al., 1998, 2009), changes in neural activity appear to precede changes in auditory perception and are retained for months. The application of this information to the assessment and rehabilitation of people with hearing loss and other communication-based disorders will depend on future studies aimed at disentangling multimodal bottom-up and top-down neural mechanisms contributing to changes in the N1, P2 and LN. However, a final take-home point is that research directed at identifying neural mechanisms related to training and learning should take into consideration the contribution of repeated stimulus exposure as well as other possible coincident contributors to reported physiological changes.

#### **AUTHOR CONTRIBUTIONS**

All authors contributed substantially to the concept/design of the work; the interpretation of data; draft reviews including revisions, and approved the final version to be published. Collectively we are accountable for all aspects of the work, including accuracy and integrity. Additionally, Katrina McClannahan and Gregory Collet contributed to data collection.

#### **ACKNOWLEDGMENTS**

This work was supported by the National Institutes of Health (R01-DC007705; R01-DC012769), (P30-DC04661), (F30-DC010297), and the Fonds National de Recherche Scientifique (FRS-FNRS), Brussels, Belgium. Funding from the American Academy of Audiology awarded to Katrina McClannahan is also acknowledged. The authors would also like to thank Dr. Chris Bishop for his review of earlier drafts and Yu He for her assistance with data processing.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 October 2013; accepted: 06 February 2014; published online: 20 February 2014.*

*Citation: Tremblay KL, Ross B, Inoue K, McClannahan K and Collet G (2014) Is the auditory evoked P2 response a biomarker of learning? Front. Syst. Neurosci. 8:28. doi: 10.3389/fnsys.2014.00028*

*This article was submitted to the journal Frontiers in Systems Neuroscience.*

*Copyright © 2014 Tremblay, Ross, Inoue, McClannahan and Collet. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Effects of age-related hearing loss and background noise on neuromagnetic activity from auditory cortex

## **Claude Alain<sup>1,2,3</sup>\*, Anja Roye<sup>1</sup> and Claire Salloum<sup>1</sup>**

<sup>1</sup> Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, ON, Canada

<sup>2</sup> Department of Psychology, University of Toronto, Toronto, ON, Canada

<sup>3</sup> Institute of Medical Sciences, University of Toronto, Toronto, ON, Canada

#### **Edited by:**

Jonathan E. Peelle, Washington University in St. Louis, USA

#### **Reviewed by:**

Julia Stephen, The Mind Research Network, USA; Ediz Sohoglu, University College London, UK

#### **\*Correspondence:**

Claude Alain, Rotman Research Institute, Baycrest Centre for Geriatric Care, 3560 Bathurst Street, Toronto, ON M6A 2E1, Canada e-mail: calain@research.baycrest.org

Aging is often accompanied by hearing loss, which impacts how sounds are processed and represented along the ascending auditory pathways and within the auditory cortices. Here, we assess the impact of mild binaural hearing loss on older adults' ability both to process complex sounds embedded in noise and to segregate a mistuned harmonic in an otherwise periodic stimulus. We measured auditory evoked fields (AEFs) using magnetoencephalography while participants were presented with complex tones that had either all harmonics in tune or had the third harmonic mistuned by 4 or 16% of its original value. The tones (75 dB sound pressure level, SPL) were presented without, with low (45 dBA SPL), or with moderate (65 dBA SPL) Gaussian noise. For each participant, we modeled the AEFs with a pair of dipoles in the superior temporal plane. We then examined the effects of hearing loss and noise on the amplitude and latency of the resulting source waveforms. Results revealed that similar noise-induced increases in N1m were present in older adults with and without hearing loss. Our results also showed that the P1m amplitude was larger in the hearing-impaired than in the normal-hearing adults. In addition, the object-related negativity (ORN) elicited by the mistuned harmonic was larger in hearing-impaired listeners. The enhanced P1m and ORN amplitude in the hearing-impaired older adults suggests that hearing loss increased neural excitability in auditory cortices, which could be related to deficits in inhibitory control.

**Keywords: aging, MEG, hearing loss, auditory cortex, inhibition (psychology)**

#### **INTRODUCTION**

Hearing abilities diminish with age largely due to changes that take place in the cochlea. However, there is increasing evidence suggesting that changes in the peripheral system alone cannot adequately account for all hearing problems encountered by older adults. Rather, deficits in central auditory processing are also likely playing an important role (Martin and Jerger, 2005; Humes et al., 2012). Scalp recordings of auditory evoked potentials (AEPs) and auditory evoked fields (AEFs, the magnetic counterpart of AEPs) may be useful for differentiating an "aging" from a "hearing loss" basis for the auditory deficits observed in older adults. Furthermore, they may also help identify brain areas that are more susceptible to hearing loss and/or the aging process.

In healthy normal-hearing adults, AEPs are usually composed of a positive, a negative, and then a positive wave that peak at about 50 (P1), 100 (N1), and 180 ms (P2) after sound onset, respectively. Converging evidence from lesion studies in humans (e.g., Woods et al., 1987; Alain et al., 1998), magnetoencephalography (MEG) (e.g., Hari et al., 1980; Hari, 1991; Reite et al., 1994), and brain source modeling (e.g., Scherg and Von Cramon, 1986; Picton et al., 1999) is consistent with generators located in or near Heschl's gyrus. The P1, N1, and P2 waves are mainly stimulus-driven (i.e., exogenous) responses thought to index signal detection (Hillyard et al., 1971). The amplitude and latency of these responses are influenced by the signal-to-noise ratio (Martin et al., 1997, 1999; Whiting et al., 1998; Martin and Stapells, 2005). Martin and colleagues, for example, examined the effects of competing signals by presenting speech signals embedded in noise (Martin et al., 1997; Martin and Stapells, 2005). They found that speech identification abilities decreased at poorer signal-to-noise ratios, with the performance decrement paralleled by both increased N1 latencies and decreased N1 amplitudes. More importantly, the latency and amplitude of the N1 significantly correlated with behavioral assessments of signal detectability (Martin et al., 1997), with electrophysiological thresholds closely approximating behavioral thresholds (Lightfoot and Kennedy, 2006). These findings suggest that AEPs provide a sensitive measure of signal audibility, which may prove useful for evaluating the effects of age-related hearing loss on central auditory processing.

The effects of normal aging on AEPs and AEFs have been examined in numerous reports using a variety of paradigms and stimuli. In many studies, the P1 wave has been found to be larger in older adults than in younger adults (e.g., Smith et al., 1980; Pekkonen et al., 1995; Bertoli et al., 2005; Kovacevic et al., 2005; Fabiani et al., 2006; Alain and Snyder, 2008; Ross and Tremblay, 2009; Soros et al., 2009; Ross et al., 2010; Lister et al., 2011; Alain et al., 2012). Similar age-related increases in the N1 amplitude have been reported (e.g., Anderer et al., 1996; Chao and Knight, 1997; Alain and Woods, 1999; Amenedo and Diaz, 1999; Harkrider et al., 2006; Ross and Tremblay, 2009; Soros et al., 2009), albeit with less consistency (for a failure to find an age difference see Pfefferbaum et al., 1980; Smith et al., 1980; Picton et al., 1984; Barrett et al., 1987; Iragui et al., 1993; Bertoli et al., 2002; Tremblay et al., 2004; Kovacevic et al., 2005; Lister et al., 2011). The effect of age on the P2 amplitude is more equivocal, with some studies reporting no age difference (Ford et al., 1979; Picton et al., 1984; Barrett et al., 1987; Iragui et al., 1993; Tremblay et al., 2004) while others observed smaller (Goodin et al., 1978; Smith et al., 1980; Ross and Tremblay, 2009) or larger (Pfefferbaum et al., 1980; Ford and Pfefferbaum, 1991; Fabiani et al., 2006; Alain and Snyder, 2008) amplitudes in older adults. However, the effects of age on the P2 latency are more consistent, with most studies reporting an age-related increase in P2 latency (e.g., Goodin et al., 1978; Iragui et al., 1993; Tremblay et al., 2004; Alain and McDonald, 2007). Together these findings are often taken as evidence for an age-related change in central auditory processing. The implicit assumption is that the changes are specific to age rather than to other factors such as hearing loss. However, in most studies, the young and older adults do not only differ in terms of age; they often also differ in hearing thresholds. This potential confound is acknowledged in many studies and there have been some attempts to control for it, for example, by adjusting sound intensity using the mean audiometric thresholds (e.g., Ross et al., 2007) or by using hearing thresholds as a covariate (e.g., Alain and McDonald, 2007).

Another approach to separate the contribution of age and hearing loss on central auditory processing consists of comparing older normal-hearing adults with those who have mild or severe hearing loss. The few studies published so far using this approach have yielded unexpected and surprising results. For instance, Harkrider et al. (2006) compared young adults, older normal-hearing adults, and older adults with mild to moderate hearing loss. They found that the speech-evoked N1 wave was larger in hearing-impaired than in normal-hearing older adults. Tremblay et al. (2003) also reported a larger N1 in older adults with hearing impairment, but only for a subset of speech sounds with voice onset time greater than 40 ms. Conversely, Bertoli et al. (2005) tested young adults, normal-hearing older adults, and hearing-impaired older adults. They found reduced N1 amplitudes to pure tone stimuli in hearing-impaired older adults compared to older normal-hearing adults. Hence, it remains unclear whether hearing loss contributes to the age-related difference in N1 amplitude. The results for the P2 are slightly more consistent, with studies reporting no difference in P2 amplitude between normal-hearing and hearing-impaired older adults (Bertoli et al., 2005; Harkrider et al., 2006). In older adults, there is little evidence that hearing loss per se affects the P1, N1, or P2 latency (Tremblay et al., 2003; Bertoli et al., 2005; Harkrider et al., 2006).

The present study aims to further investigate the effect of hearing loss on central auditory processing using MEG. This study is an extension of an earlier study that examined the effects of age and background noise on listeners' ability to process a mistuned harmonic in an otherwise harmonic complex tone (Alain et al., 2012). The use of harmonic complex sounds with and without a mistuned harmonic provides the means to assess age-related differences related to processing sound onset as well as neural activity associated with encoding frequency periodicity. The study from Alain et al. (2012) revealed an age-related increase in P1m (magnetic counterpart of the P1 wave from EEG) as well as a delayed P2m in older adults compared to young adults. The mistuned harmonic generated an object-related negativity (ORN), which was comparable in amplitude between young and older adults. The ORN is a relatively new event-related potential (ERP) component that has been associated with the perception of concurrent sound objects induced by the mistuning of a low tonal element of a harmonic complex tone (e.g., Alain et al., 2001, 2002; Hautus and Johnson, 2005). It is usually illustrated by subtracting the ERPs elicited by tuned from those elicited by mistuned stimuli. The ORN provides a metric to assess whether age and/or hearing loss impair listeners' ability to segregate concurrent sounds based on periodicity cues, which is important for understanding speech in noisy situations. If hearing loss plays an important role in the age-related difference in central auditory processing, as previously reported, then we should observe a group difference in neuromagnetic brain activity elicited by sound onset and mistuning between older adults with normal hearing and those who are hearing-impaired.

#### **MATERIALS AND METHODS**

#### **PARTICIPANTS**

A total of 36 older adults were recruited for the study. Based on their hearing status (pure tone thresholds, see below), they were categorized into one of two groups, normal hearing vs. mildly hearing impaired. All were screened to exclude health or mental problems and/or medications that might affect cognitive function or brain activity.

Participants were recruited from the local community and provided informed consent in accordance with the guidelines established by the University of Toronto and Baycrest Centre. Two participants were excluded due to excessive head motion during the MEG recording (one from each group). A final sample of 17 hearing impaired adults (range: 62–82; mean age = 70.6, standard deviation (s.d.) = 6.3; 8 men) were compared with a sample of 17 normal hearing adults (range: 63–76; mean age = 67.8, s.d. = 3.6; 9 men). The two groups did not differ in their mean age (*t*(32) = 1.60; *P* = 0.12). All but one participant in each group were right handed.
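
As a rough consistency check on the reported age comparison, the *t* statistic can be recomputed from the summary statistics alone. The sketch below assumes a Student's independent-samples *t*-test with pooled variance; the text does not name the test, although 32 degrees of freedom is what that test gives for two groups of 17. The function name is hypothetical.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Independent-samples t statistic with pooled variance (assumed test)."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Summary statistics as reported in the text (hearing impaired vs. normal hearing).
t_value, df = pooled_t(70.6, 6.3, 17, 67.8, 3.6, 17)
```

With these inputs the statistic comes out at about 1.59 on 32 degrees of freedom, which agrees with the reported *t*(32) = 1.60 once rounding of the means and standard deviations is taken into account.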

#### **HEARING ASSESSMENT**

Our criteria for mild hearing loss were pure-tone thresholds greater than 25 decibels (dB) hearing level (HL) for octave frequencies from 250 to 2000 Hz in both ears. Participants with pure-tone thresholds less than or equal to 25 dB HL were included in the normal-hearing group. All participants completed a speech-in-noise (SIN) test. Four lists of six sentences were used from the Quick SIN (Etymotic Research, 2001; Killion et al., 2004) test. All sentences were spoken by a female in a background of four-talker "babble" at 70 dB sound pressure level (SPL). The babble in each list of sentences was increased in 5 dB steps in order to vary the signal-to-noise ratio (SNR) from 0 dB to +25 dB. Participants repeated back the target sentence. Each sentence included five "keywords". A point was awarded for each keyword, for a possible total of five points per sentence. The SNR loss was determined by subtracting the total number of correct words from 25.5. This number represents the SNR required to correctly identify 50% of the sentences (Killion et al., 2004).
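
The SNR loss rule above can be written out as a short sketch. The function names are hypothetical, and averaging across the four lists is an assumption: the text states the per-list formula but not how the lists were combined.

```python
def quicksin_snr_loss(keywords_correct):
    """SNR loss for one list: 25.5 minus the total number of correct keywords.

    `keywords_correct` holds one score (0-5) per sentence, six sentences per list.
    """
    if len(keywords_correct) != 6:
        raise ValueError("a list contains six sentences")
    if any(not 0 <= score <= 5 for score in keywords_correct):
        raise ValueError("each sentence has five keywords")
    return 25.5 - sum(keywords_correct)


def mean_snr_loss(lists):
    """Average SNR loss across lists (assumed way of combining the lists)."""
    losses = [quicksin_snr_loss(scores) for scores in lists]
    return sum(losses) / len(losses)


# Hypothetical listener: performance drops as the babble level rises.
example_loss = quicksin_snr_loss([5, 5, 4, 3, 2, 1])  # 25.5 - 20 = 5.5 dB
```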

All hearing impaired participants filled in a brief hearing checklist (self-report questionnaire) that was based on the Revised American Academy of Otolaryngology-Head & Neck Surgery's 5 min hearing test (Koike et al., 1994), to compare the amount of hearing loss with the hearing loss of other older adults.

#### **STIMULI AND TASK**

The stimuli used during the recording of neuromagnetic activity consisted of complex sounds created by combining ten pure tones of equal intensity (i.e., 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, and 2000 Hz). The fundamental frequency (*f* <sup>0</sup>) was 200 Hz and the third partial could either be tuned (i.e., 0% mistuning at 600 Hz) or mistuned upwards from its original value by 4% (to 624 Hz) or 16% (to 696 Hz). Stimuli were digitally generated at a sampling rate of 24,414 Hz using a System 3 Real-Time Processor (Tucker Davis Technologies, Alachua, FL) and presented binaurally via an OB 822 Clinical Audiometer using ER-3A transducers (Etymotic Research, Elk Grove, IL, USA) and reflectionless 2.5-m plastic tubes. All three stimulus types (0%, 4%, and 16% mistuning) had durations of 200 ms and rise/fall times of 5 ms. They were equiprobable and presented in random order with an inter-stimulus interval (ISI) that varied randomly between 800 and 1200 ms in 100 ms steps (rectangular distribution). All participants, regardless of their hearing status, were presented with stimuli at a fixed 75 dB SPL. There were three listening conditions corresponding to the background against which the stimuli were presented within a block of trials. That is, within the entire block of trials, the stimuli were either presented without background noise, or against a continuous low (45 dBA), or against a continuous moderate (65 dBA) level broadband Gaussian white noise. For each noise condition, participants were presented with a total of 300 trials (100 of each stimulus type: 0%, 4%, and 16% mistuning), for a grand total of 900 stimuli during the course of the experiment. The order of noise conditions was counterbalanced between participants. The intensity of the stimuli and the intensity of the noise were measured using a Larson-Davis SPL meter (Model 824, Provo, Utah). 
The plastic tubes from the ER-3A transducers were attached to a 2 cc coupler on an artificial ear (Model AEC100l) connected to the SPL meter. Separate measurements were taken for left and right ear channels.
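
The stimulus construction described above can be sketched in a few lines. The frequencies, duration, rise/fall time, sampling rate, and ISI values come from the text; the sine starting phase, the linear ramp shape, the amplitude normalization, and all function names are assumptions made for illustration.

```python
import math
import random

SAMPLE_RATE_HZ = 24414
F0_HZ = 200.0
N_PARTIALS = 10
DURATION_S = 0.200
RAMP_S = 0.005  # rise/fall time; a linear ramp is assumed here


def partial_frequencies(mistuning_pct):
    """Ten harmonics of 200 Hz with the third partial shifted upward."""
    freqs = [F0_HZ * k for k in range(1, N_PARTIALS + 1)]
    freqs[2] *= 1.0 + mistuning_pct / 100.0
    return freqs


def complex_tone(mistuning_pct):
    """Sum of equal-amplitude sine partials with onset and offset ramps."""
    n_samples = round(SAMPLE_RATE_HZ * DURATION_S)
    n_ramp = round(SAMPLE_RATE_HZ * RAMP_S)
    freqs = partial_frequencies(mistuning_pct)
    samples = []
    for i in range(n_samples):
        t = i / SAMPLE_RATE_HZ
        value = sum(math.sin(2.0 * math.pi * f * t) for f in freqs) / N_PARTIALS
        gain = min(1.0, i / n_ramp, (n_samples - 1 - i) / n_ramp)
        samples.append(value * gain)
    return samples


def draw_isi_ms(rng):
    """ISI drawn uniformly from 800 to 1200 ms in 100 ms steps."""
    return rng.choice([800, 900, 1000, 1100, 1200])


tuned = complex_tone(0)
isi_ms = draw_isi_ms(random.Random(0))
```

Calling `partial_frequencies(4)` and `partial_frequencies(16)` places the third partial at 624 Hz and 696 Hz, respectively, matching the values given in the text.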

All participants completed a behavioral task that was performed after the MEG recording in a double-walled sound attenuated chamber (IAC model 1204A, Electromedical Instruments, Mississauga, ON). The same stimuli were presented over Eartone ER-3A insert earphones using a System 3 Real Time Processor and a GSI 61 Clinical Audiometer. Following the presentation of each stimulus, participants were asked to indicate if they heard one sound (i.e., a buzz) or two sounds (i.e., a buzz plus another sound with a pure tone quality) by pressing one of two buttons labeled "1" or "2", respectively. These responses (i.e., 1 or 2) reflected participants' perceptions based on the mistuning manipulation. Responses were registered using a multi-button response box and the next stimulus was presented 1500 ms following the previous response. Participants did not receive feedback on their performance. Prior to the behavioral experiment, participants were presented with a sample of stimuli to familiarize themselves with the task and the response box. After this familiarization phase, participants completed six blocks of trials. For each noise condition, two blocks of 150 trials were presented for a total of 300 trials (100 of each stimulus type: 0%, 4%, and 16% mistuning). For each participant, we calculated the proportion of trials in which participants reported hearing two sounds as well as d-prime (*d*′) and beta (β) values. For the calculation of *d*′ and β, trials whereby participants were presented with 0% mistuning and responded "2" were treated as "false alarms", and trials whereby participants were presented with mistuned stimuli and responded "2" were treated as "hits" (Moore et al., 1986; Alain et al., 2012).
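
The sensitivity and bias measures can be sketched from the hit and false-alarm counts using the standard equal-variance Gaussian model. The function name and the example counts are hypothetical, and the log-linear correction (adding 0.5 to each count) is an assumption used here only to keep the *z* transform finite at rates of 0 or 1; the text does not say how extreme rates were handled.

```python
import math
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()


def dprime_beta(hits, n_mistuned, false_alarms, n_tuned):
    """Return (d', beta) from "2" responses to mistuned and tuned stimuli."""
    # Log-linear correction (assumption): avoids infinite z scores.
    hit_rate = (hits + 0.5) / (n_mistuned + 1.0)
    fa_rate = (false_alarms + 0.5) / (n_tuned + 1.0)
    z_hit = _STANDARD_NORMAL.inv_cdf(hit_rate)
    z_fa = _STANDARD_NORMAL.inv_cdf(fa_rate)
    d_prime = z_hit - z_fa
    # Likelihood ratio of the signal and noise densities at the criterion.
    beta = math.exp((z_fa ** 2 - z_hit ** 2) / 2.0)
    return d_prime, beta


# Hypothetical listener: 80 of 100 "two sounds" responses to 16% mistuned tones,
# 20 of 100 "two sounds" responses to tuned tones.
d_prime, beta = dprime_beta(hits=80, n_mistuned=100, false_alarms=20, n_tuned=100)
```

In this symmetric example the criterion is unbiased, so β is 1 and *d*′ is positive; equal hit and false-alarm rates would give *d*′ = 0.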

#### **NEUROMAGNETIC RECORDING AND ANALYSIS**

The MEG recording took place in a magnetically shielded room using a helmet shaped 151-channel whole cortex neuromagnetometer (OMEGA, CTF Systems, VSM Medtech Inc., Vancouver, Canada). AEFs were recorded in a passive listening session as participants watched a muted, subtitled movie of their choice. This design allowed us to examine the impact of noise on cortical activity elicited by stimuli while minimizing the influence of top-down processes on AEF amplitude. The use of muted subtitled movies has been shown to effectively capture attention without interfering with auditory processing (Pettigrew et al., 2004). To minimize movement, participants lay down throughout the recording.

The neuromagnetic activity was recorded continuously with a sampling rate of 625 Hz and an on-line, low-pass filter with a cutoff frequency of 200 Hz. The analysis epoch included 200 ms of pre-stimulus activity and 600 ms of post-stimulus activity. The epochs were scanned for artifacts using Brain Electrical Source Analysis (BESA) software (version 5.2). To account for individual differences in the amplitude of neuromagnetic brain activity, the maximum intensity for accepting single epochs of the MEG signals was adjusted for each participant and ranged from 1515 to 6607 fT/cm. AEFs were averaged separately for each stimulus type and noise condition (i.e., block of trials). For normal hearing adults, the number of trials included in the single-subject grand average (i.e., all stimulus types and noise conditions combined) ranged from 587 to 697 while for hearing impaired adults it ranged from 587 to 752. For each participant and for each noise condition, we computed the grand average of AEFs that comprised all stimulus types. This average was used to generate a dipole source model of the scalp recorded AEFs. The source waveforms for each experimental condition were computed from the resulting source model.
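As a rough illustration of the epoching and averaging steps, the sketch below cuts a single-channel recording into epochs of 200 ms pre-stimulus and 600 ms post-stimulus at 625 Hz, rejects epochs exceeding an amplitude limit, and averages the rest. Treating the acceptance criterion as a maximum absolute amplitude is an assumption; the actual artifact scan was done in BESA, whose criterion may differ, and all function names here are hypothetical.

```python
SAMPLE_RATE_HZ = 625       # sampling rate reported in the text
PRE_MS, POST_MS = 200, 600  # epoch limits relative to sound onset


def epoch_bounds(onset_sample):
    """Return (start, stop) sample indices for one epoch (stop is exclusive)."""
    pre = round(PRE_MS * SAMPLE_RATE_HZ / 1000)    # 125 samples
    post = round(POST_MS * SAMPLE_RATE_HZ / 1000)  # 375 samples
    return onset_sample - pre, onset_sample + post


def accept_epoch(epoch, max_abs):
    """Accept the epoch if no sample exceeds the per-participant limit."""
    return max(abs(v) for v in epoch) <= max_abs


def average_epochs(signal, onsets, max_abs):
    """Average all accepted epochs; return (average, number of epochs kept)."""
    kept = []
    for onset in onsets:
        start, stop = epoch_bounds(onset)
        if start < 0 or stop > len(signal):
            continue  # skip epochs that run past the recording
        epoch = signal[start:stop]
        if accept_epoch(epoch, max_abs):
            kept.append(epoch)
    if not kept:
        return [], 0
    n = len(kept)
    return [sum(column) / n for column in zip(*kept)], n
```

A per-participant value of `max_abs` corresponds to the individually adjusted acceptance threshold mentioned above.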

We used BESA software (version 5.2) for dipole source modeling. The analysis used the spherical head model of Sarvas (1987). Before dipole source modeling, the averaged data were low-pass filtered at 20 Hz (12 dB/octave; zero phase). First, we seeded a left and a right dipole in the temporal lobe near Heschl's gyrus using a magnetic resonance imaging template from BESA. Then, we fitted the location and orientation of each dipole to account for a 40 ms interval centered on the peak of the N1m wave. We chose to model the N1m wave because it was the largest and most reliable deflection from the AEF elicited by the harmonic complex tones. The analysis was performed on the grand average across stimulus types to enhance signal-to-noise ratio and because the differences in source location between the N1m elicited by tuned and mistuned stimuli were expected to be small (Arnott et al., 2011). Peak amplitude and latency were determined as the largest positivity (P) or negativity (N) in the individual source waveforms during a specific interval. The measurement intervals were 30–90 ms (P1m), 70–160 ms (N1m), 150–260 ms (P2m), and 120–220 ms (ORN). AEF amplitude and latency were analyzed using a mixed model repeated measures ANOVA with hearing status (normal, impaired) as the between-groups factor and mistuning (0%, 4%, 16%), noise condition (no, low, moderate), and hemisphere (left, right) as the within-group factors. When appropriate, the degrees of freedom were adjusted with the Greenhouse-Geisser epsilon (ε) and all reported probability estimates are based on the reduced degrees of freedom, although the original degrees of freedom are reported. Bonferroni corrections were applied for all post hoc, pairwise comparisons. For the behavioral data, we performed mixed model repeated measures ANOVA with hearing status (normal, impaired) as the between-groups factor and mistuning (0%, 4%, 16%), and noise condition (no, low, moderate) as the within-group factors.
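The peak measurement rule (largest positivity or negativity within a component-specific interval) can be sketched as follows. The windows are those listed above; the sign convention, in which the P components are positive and the N1m and ORN are negative in the source waveform, is an assumption about how the waveforms are oriented, and `find_peak` is a hypothetical helper.

```python
# (start ms, end ms, polarity): +1 = largest positivity, -1 = largest negativity
WINDOWS_MS = {
    "P1m": (30, 90, +1),
    "N1m": (70, 160, -1),
    "P2m": (150, 260, +1),
    "ORN": (120, 220, -1),
}


def find_peak(waveform, times_ms, component):
    """Return (latency_ms, amplitude) of the peak inside the component window."""
    start, end, polarity = WINDOWS_MS[component]
    candidates = [
        (polarity * value, time, value)
        for value, time in zip(waveform, times_ms)
        if start <= time <= end
    ]
    if not candidates:
        raise ValueError("no samples fall inside the measurement window")
    _, latency, amplitude = max(candidates)
    return latency, amplitude
```

Because the N1m and ORN windows overlap, the ORN would in practice be measured on a difference wave rather than on the original source waveform.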

## **RESULTS**

## **BEHAVIORAL DATA**

#### **Pure tone thresholds**

The two experimental groups significantly differed in pure tone thresholds (**Figure 1**) with normal hearing adults having lower average thresholds (*M* = 12.7 dB HL) than hearing impaired adults (*M* = 32.3 dB HL) for 250–2000 Hz, *F*(1,32) = 75.64, *P* < 0.001, η <sup>2</sup> = 0.70. Despite the difference, both normal hearing and hearing impaired groups showed typical age-related decline in pure tone thresholds, most prominent in the higher frequencies, *F*(3,96) = 11.62, *P* < 0.001, η <sup>2</sup> = 0.27. This decline was even more prominent in the hearing impaired group (interaction group × tone frequency, *F*(3,96) = 3.62, *P* = 0.002, η <sup>2</sup> = 0.01). There were no differences in hearing thresholds between the left and the right ear in either the normal hearing group or hearing impaired group (hemisphere *F* < 1; group × hemisphere *F* < 1; hemisphere × tone frequency, *F*(3,96) = 1.48, *P* = 0.23).
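For readers who want to relate the reported effect sizes to the *F* statistics, the sketch below uses the usual identity for partial eta squared, η² = (F · df_effect) / (F · df_effect + df_error). That the reported η² values are partial eta squared is an inference from the numbers, not something stated in the text, and `partial_eta_squared` is a hypothetical helper.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its df."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Group effect on pure tone thresholds, F(1,32) = 75.64
print(f"{partial_eta_squared(75.64, 1, 32):.2f}")  # prints 0.70

# Tone frequency effect, F(3,96) = 11.62
print(f"{partial_eta_squared(11.62, 3, 96):.2f}")  # prints 0.27
```

Both values agree with the effect sizes reported for these two tests.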

#### **Quick hearing check**

According to the quick hearing check, all participants classified as hearing impaired reached scores that were within the lower 10 to 35% of other older adults classified as hearing impaired (Koike et al., 1994). Furthermore, the hearing impaired group performed worse in the Quick SIN test as revealed by a higher SNR loss (*M* = 4.21, *s.e.* = 0.72) compared to the normal hearing group (*M* = 1.90, *s.e.* = 0.64; *t*(32) = −2.40, *P* = 0.02).

#### **Behavioral task**

Within the hearing impaired group, 5 of 17 participants did not finish the behavioral task and hence only 12 hearing impaired participants were included in the final analysis of the behavioral data. Overall, results showed that the proportion of trials in which participants indicated hearing two concurrent sounds increased with mistuning, *F*(2,56) = 64.33, *P* < 0.001, η <sup>2</sup> = 0.70, with the steepest slope for the low noise condition (interaction mistuning × noise, *F*(4,112) = 2.23, *P* = 0.07, η <sup>2</sup> = 0.07; **Figure 2**). The main effects of hearing status and noise were not significant, nor was the interaction between hearing status and noise. The three-way interaction between hearing status, noise and mistuning was not significant.


**Figure 3** shows the group mean sensitivity (i.e., *d*′) and response bias (i.e., β) in hearing impaired older adults compared to normal hearing older adults in the three noise conditions. There was an increase in *d*′ with mistuning (**Figure 3A**; *F*(1,27) = 14.23, *P* = 0.001, η <sup>2</sup> = 0.34), but a decrease with increasing noise (*F*(2,54) = 5.27, *P* = 0.008, η <sup>2</sup> = 0.16). The main effect of hearing status was not significant, nor was the interaction between hearing status and mistuning or noise.

Overall, the response bias varied as a function of mistuning (**Figure 3B**), *F*(1,27) = 5.03, *P* = 0.03, η <sup>2</sup> = 0.16, with smaller β values for the 16% mistuning condition. However, there was no significant difference between normal hearing and hearing impaired older adults in β, nor was the interaction between hearing status and mistuning or hearing status and noise significant.

#### **AUDITORY EVOKED FIELDS AND DIPOLE SOURCE LOCATION**

**Figure 4** overlays the time course of the AEFs recorded with all MEG sensors averaged over the three noise conditions. The AEFs comprise an initial P1m peak at 68 ms followed by the larger N1m and P2m responses at 125 and 220 ms, respectively. The magnetic field topography at the latency of N1m for two representative participants is consistent with bilateral sources in the auditory cortices along the superior temporal gyrus.

The group mean locations for the N1m are also shown in **Figure 4**. The effects of hearing status and noise on N1m source locations were examined by comparing source coordinates (x, y, and z coordinates) separately. For the lateral-medial axis, the main effects of hearing status, noise, and hemisphere were not significant nor was the interaction between any of the factors. For the anterior-posterior axis, there was the typical main effect of hemisphere (*F*(1,32) = 18.25, *P* < 0.001, η <sup>2</sup> = 0.36), indicating that the source in the right hemisphere was more anterior than the one in the left hemisphere. The main effect of hearing status was also significant (*F*(1,32) = 4.80, *P* = 0.04, η <sup>2</sup> = 0.13), with more posterior N1m source locations for the hearing impaired group. There was also a main effect of noise (*F*(2,64) = 6.84, *P* = 0.002, η <sup>2</sup> = 0.18). Pairwise comparisons revealed that the N1m source location was more anterior under moderate noise conditions compared to no and low noise conditions (*P* < 0.05 in both cases). There was no difference in source location between the no and low noise conditions (*P* = 0.50).

For the inferior-superior axis, the hearing impaired group showed more inferior N1m source locations than the normal hearing group (*F*(1,32) = 4.89, *P* = 0.03, η <sup>2</sup> = 0.13). Furthermore, the N1m source was located more superior in the left hemisphere compared to the right hemisphere (*F*(1,32) = 4.20, *P* = 0.049, η <sup>2</sup> = 0.11). No other effect reached statistical significance.

#### **SOURCE WAVEFORMS**

**Figure 5** shows the group mean source waveforms in each hemisphere for hearing impaired and normal hearing older adults as well as for each stimulus type as a function of background noise. In both groups, the source waveforms comprised a P1m, N1m, and P2m peaking at about 68 ms, 125 ms, and 220 ms after sound onset.

#### **Effects of hearing status, mistuning, and noise on auditory evoked field (AEF) latency**

Analysis revealed no main effect of hearing status on P1m latency (*F* < 1). For both groups the P1m peaked earlier in the right (*M* = 66.00, *s.e.* = 1.47 ms) than in the left (*M* = 71.45, *s.e.* = 1.47 ms) hemisphere, *F*(1,32) = 43.05, *P* < 0.001, η <sup>2</sup> = 0.57. There was a main effect of noise such that stimuli embedded in moderate noise levels generated longer P1m latency compared to lower or no noise conditions, *F*(2,64) = 8.75, *P* < 0.005, η <sup>2</sup> = 0.21 (**Table 1**). The interaction between hearing status and noise conditions was significant (*F*(2,64) = 4.36, *P* = 0.03, η <sup>2</sup> = 0.15). P1m latency increased with noise only in normal hearing older adults (*F*(2,32) = 13.03, *P* < 0.001). In hearing impaired older adults, the main effect of noise on P1m latency was not significant (*F* < 1). The main effect of mistuning was significant, *F*(2,64) = 5.50, *P* = 0.007, η <sup>2</sup> = 0.15. The 16% mistuned stimuli generated a longer P1m latency than the tuned or 4% mistuned stimuli (*P* < 0.05 in both cases). There was no difference in latency between the tuned and 4% mistuned stimuli nor did mistuning interact with any other factors.

**Table 1 | Group mean P1m latency and amplitude.**

**Table 2 | Group mean N1m latency and amplitude.**

**Table 3 | Group mean P2m latency and amplitude.**

As with P1m, there was no main effect of hearing status on N1m peak latency, *F*(1,32) = 1.78, *P* = 0.19 (**Table 2**), and the N1m peaked earlier in the right (*M* = 122.32, *s.e.* = 1.92 ms) than in the left (*M* = 124.96, *s.e.* = 1.84 ms) hemisphere, *F*(1,32) = 6.13, *P* = 0.02, η <sup>2</sup> = 0.16. N1m latency increased with increasing background noise levels, *F*(2,64) = 28.62, *P* < 0.001, η <sup>2</sup> = 0.47, and this effect of noise was more pronounced for normal hearing adults (noise × group interaction, *F*(2,64) = 3.55, *P* = 0.04, η <sup>2</sup> = 0.10). Furthermore, the effect of noise on the N1m latency was more pronounced in the left than in the right hemisphere (noise × hemisphere interaction, *F*(2,64) = 3.59, *P* = 0.04, η <sup>2</sup> = 0.10). The main effect of mistuning was significant, *F*(2,64) = 2.36, *P* < 0.001, η <sup>2</sup> = 0.13, such that the 16% mistuned stimuli generated a longer latency than the tuned or 4% mistuned stimuli (*P* < 0.001, in both cases). There was no difference between the tuned and 4% mistuned stimuli (*P* = 0.63).

The effects of hearing status and experimental condition on the P2m latency are summarized in **Table 3**. As with the P1m and N1m, the P2m peak latency was unaffected by hearing status (*F* < 1). The P2m peaked earlier in the right (*M* = 210, *s.e.* = 4.3 ms) than in the left (*M* = 221, *s.e.* = 4.0 ms) hemisphere, *F*(1,32) = 9.47, *P* = 0.004, η <sup>2</sup> = 0.23. The main effect of noise level was significant, *F*(2,64) = 11.86, *P* < 0.001, η <sup>2</sup> = 0.27. Stimuli embedded in a moderate level of noise generated a longer P2m latency than those in the no noise or low noise condition (*P* < 0.001, in both cases). There was no difference in P2m latency between the no noise and the low noise conditions (*P* = 0.99). The main effect of mistuning was significant, *F*(2,64) = 3.82, *P* < 0.03, η <sup>2</sup> = 0.11. The 16% mistuned stimuli generated a longer latency than the tuned stimuli (*P* = 0.03). There was no significant difference between the 16% and the 4% mistuned stimuli (*P* = 0.11), nor was the difference between tuned and 4% mistuned stimuli significant (*P* = 0.99).

In summary, the latencies of the P1m, N1m, and P2m were little affected by hearing loss. The source waveforms elicited by harmonic complex tones peaked earlier in the right than in the left hemisphere. Background noise and mistuning increased the latency of source activity from the auditory cortex. This effect of background noise on the latency of source waveforms was greater in normal hearing older adults than in hearing impaired older adults. The effect of mistuning on the latency of source waveforms was comparable in both groups.

#### **Effects of hearing impairment and noise on auditory evoked field (AEF) amplitude**

The main effect of hearing status on P1m amplitude was significant (*F*(1,32) = 5.76, *P* = 0.02, η <sup>2</sup> = 0.15), with hearing impaired older adults generating a larger P1m response (*M* = 14.45, *s.e.* = 1.41) than normal hearing older adults (*M* = 9.67, *s.e.* = 1.41). The main effect of noise was also significant (*F*(2,64) = 15.55, *P* < 0.001, η <sup>2</sup> = 0.33). The P1m amplitude was smaller in the moderate noise condition than in the no noise or low noise conditions (*P* < 0.005, in both cases). There was no difference in P1m amplitude between the no noise and the low noise condition (*P* = 0.63). The interaction between hearing status and noise condition was not significant (*F*(2,64) = 1.14, *P* = 0.32). There was a main effect of mistuning, *F*(2,64) = 25.23, *P* < 0.001, η <sup>2</sup> = 0.44, as well as a significant interaction between hearing status and mistuning (*F*(2,64) = 3.71, *P* = 0.04, η <sup>2</sup> = 0.10). Overall, the P1m was smaller for the 4% mistuned stimuli than for the tuned or the 16% mistuned stimuli (*P* < 0.001 in both cases) whereas the P1m generated by the 16% mistuned harmonic was larger than the one elicited by the tuned stimuli (*P* = 0.024). The interaction between hearing status and mistuning was due to greater changes in P1m as a function of harmonicity in older adults with mild hearing loss than in older adults with normal hearing.

The ANOVA on the P1m amplitude also revealed a significant interaction between mistuning and noise (*F*(4,128) = 22.09, *P* < 0.001, η <sup>2</sup> = 0.41) and between hearing status, mistuning and noise (*F*(4,128) = 3.96, *P* = 0.005, η <sup>2</sup> = 0.11). Post hoc testing revealed that the effect of mistuning on P1m amplitude was present only in the no noise condition and that this effect was larger for the hearing impaired group (no noise, main effect mistuning *F*(2,64) = 34.83, *P* < 0.001; no noise, hearing status × mistuning *F*(2,64) = 5.39, *P* = 0.007; no noise, effect of mistuning for hearing impaired *F*(2,32) = 24.25, *P* < 0.001; no noise, effect of mistuning for normal hearing *F*(2,32) = 10.61, *P* = 0.002; no significant effect of mistuning for low noise and moderate noise). Further, the interactions of noise by hemisphere (*F*(2,62) = 7.59, *P* = 0.002, η <sup>2</sup> = 0.19), mistuning by hemisphere (*F*(2,64) = 18.51, *P* < 0.001, η <sup>2</sup> = 0.37), and mistuning by noise by hemisphere (*F*(4,128) = 12.42, *P* < 0.001, η <sup>2</sup> = 0.28) were significant.

The N1m amplitude was not significantly affected by hearing status (*F*(1,32) = 1.46, *P* = 0.24). However, there was a main effect of noise (*F*(2,64) = 14.84, *P* < 0.001, η <sup>2</sup> = 0.32). The N1m was larger when stimuli were embedded in low or moderate background noise than when there was no background noise (*P* = 0.01 in both cases). There was no difference between the low and moderate noise condition (*P* = 0.27). The main effect of mistuning was also significant, (*F*(2,64) = 6.74, *P* = 0.005, η <sup>2</sup> = 0.17), and this effect is likely due to the ORN superimposed on the N1m thereby increasing its amplitude (see below). The interaction between hearing status and noise was not significant (*F* < 1) nor was the interaction between hearing status and mistuning (*F* < 1). The three-way interaction between hearing status, noise, and mistuning was not significant (*F* < 1). Lastly, the N1m amplitude was larger in the right than in the left hemisphere, *F*(1,32) = 5.06, *P* = 0.03, η <sup>2</sup> = 0.14.

For the P2m peak amplitude, the main effect of hearing loss was not significant (*F*(1,32) = 2.61, *P* = 0.12) nor was the main effect of noise (*F* < 1), or the interaction between hearing status and noise (*F* < 1). The main effect of mistuning was significant (*F*(2,64) = 17.29, *P* < 0.001, η <sup>2</sup> = 0.35), with the P2m amplitude decreasing with the presence of mistuning. Pairwise comparison revealed smaller P2m for the 4% and 16% mistuned stimuli relative to tuned stimuli (*P* < 0.001, in both cases). The difference between 4% and 16% was not significant (*P* = 0.28). Further, the interaction between noise and hemisphere was significant (*F*(2,64) = 6.15, *P* = 0.004, η <sup>2</sup> = 0.16). However, none of the pairwise comparisons reached statistical significance. No other main effects or interactions reached significance.

#### **Object-related negativity (ORN)**

The effects of hearing status and noise on concurrent sound segregation are best illustrated in the difference waves between source waveforms for the tuned stimuli and for the 4% and the 16% mistuned stimuli (cf. Alain et al., 2001). For both groups, this difference wave showed an ORN that peaked at about 170 ms after sound onset (**Figure 6**).

*[Figure caption fragment: "... 16% mistuned stimuli and the corresponding difference in source waveforms."]*

The ORN peaked earlier when the harmonic was mistuned by 16% (*M* = 165, *s.e.* = 1.7 ms) than when it was mistuned by 4% (*M* = 175, *s.e.* = 1.7 ms), *F*(1,32) = 18.71, *P* < 0.001, η <sup>2</sup> = 0.37. The main effect of hearing status on ORN latency was not significant, nor was the effect of noise (*F* < 1) or hemisphere (*F*(1,32) = 2.70, *P* = 0.11). The interaction between hearing status and mistuning was not significant (*F*(1,32) = 2.01, *P* = 0.17) nor was the interaction between hearing status and noise (*F* < 1).

The 16% mistuned harmonic stimuli generated a larger ORN amplitude than the 4% mistuned stimuli, *F*(1,32) = 65.66, *P* < 0.001, η <sup>2</sup> = 0.67. The effect of hearing status on the ORN peak amplitude was significant (*F*(1,32) = 9.70, *P* = 0.004, η <sup>2</sup> = 0.23), with hearing impaired older adults generating larger ORN than normal hearing older adults. The interaction between hearing status and mistuning was not significant (*F* < 1), nor was the three way interaction between group, mistuning and hemisphere (*F* < 1). No further main effects or interactions reached statistical significance.
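The ORN measurements reported in this section rest on a difference wave between the mistuned and tuned source waveforms. A minimal sketch of that computation follows; the subtraction order (mistuned minus tuned, so that the ORN appears as a negative deflection) is an assumed sign convention, the 120–220 ms window is the ORN interval given in the methods, and the function names are hypothetical.

```python
def difference_wave(mistuned, tuned):
    """Point-by-point difference: mistuned minus tuned source waveform."""
    if len(mistuned) != len(tuned):
        raise ValueError("waveforms must have the same number of samples")
    return [m - t for m, t in zip(mistuned, tuned)]


def orn_peak(diff, times_ms, window=(120, 220)):
    """Return (latency_ms, amplitude) of the most negative point in the window."""
    in_window = [
        (value, time)
        for value, time in zip(diff, times_ms)
        if window[0] <= time <= window[1]
    ]
    if not in_window:
        raise ValueError("no samples fall inside the ORN window")
    amplitude, latency = min(in_window)
    return latency, amplitude
```

Applied once to the 4% and once to the 16% mistuned waveform, this would yield the two ORN latencies and amplitudes that enter the statistical comparison.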

## **DISCUSSION**

The primary aim of this study was to assess the impact of mild hearing loss on cortical evoked responses in an effort to clarify age-related changes in auditory evoked responses. Specifically, this study examined whether the changes previously reported in the literature reflect age per se or whether they are partly due to age-related hearing loss. In the present study, there was a difference in pure tone thresholds of about 20 dB HL between older adults with normal hearing and those included in the mild hearing loss group. Moreover, older adults with mild hearing loss, as defined by pure tone thresholds, showed elevated thresholds for understanding speech in noise. Surprisingly, there was no significant difference between the two groups in the proportion of trials for which participants indicated hearing two concurrent sounds as a function of mistuning. That is, the likelihood of hearing the mistuned harmonic as a separate sound was little affected by mild hearing loss. The lack of differences between groups might be due to the dropout of poorer performers during the behavioral part of the experiment, which occurred in the hearing impaired group only. The lack of group difference could also be related to the forced choice procedure, which involved a subjective component and did not capture participants' thresholds in detecting mistuning or parsing the mistuned harmonic as a separate tone.

#### **EFFECT OF MILD HEARING LOSS ON AUDITORY EVOKED FIELDS (AEFs)**

In both groups, the source waveforms derived from modeling the N1m wave with bilateral dipoles in or near Heschl's gyrus comprised a P1m, N1m, and P2m deflection. The N1m source location was more anterior and more inferior in the right than in the left hemisphere, and this is consistent with prior MEG studies (e.g., Pantev et al., 1998; Ross et al., 2010). These differences between left and right N1m sources map onto the anatomical asymmetry that showed a more anterior auditory cortex in the right than in the left hemisphere (Penhune et al., 1996; Leonard et al., 1998; Rademacher et al., 2001). The N1m source location was more posterior and more inferior in older adults with hearing impairment than without. This group difference was unexpected and may reflect neuroplastic changes in auditory cortices associated with peripheral hearing loss. For example, the changes in N1m source location could reflect the recruitment of cortical neurons that have lost their afferent input due to peripheral hearing loss (Dietrich et al., 2001). The recruitment of de-afferented neurons could also contribute to the increased amplitude of neuromagnetic responses.

The P1m amplitude was larger in older adults with mild hearing loss than in older adults with normal hearing, which suggests that aging is not the sole contributor to increased sensory evoked response amplitude. Hearing loss, therefore, plays an important role in modulating cortical evoked response amplitude. Furthermore, since stimuli and noise SPL levels were identical for all participants, sound intensity cannot account for increased P1m amplitude between the two groups. However, identical sound intensity may not have yielded the same loudness and it is possible that older adults with hearing impairment experienced greater loudness recruitment than older adults with normal hearing. Loudness recruitment refers to the perceptual phenomenon of sounds becoming rapidly louder with increasing sound levels and has been proposed to contribute to increased N1m amplitude in patients with hearing impairment (Morita et al., 2003).

Prior research examining the effect of hearing impairment on auditory evoked responses has observed larger N1 amplitudes in hearing impaired than in normal hearing adults (Morita et al., 2003; Tremblay et al., 2003; Harkrider et al., 2006). In the present study, the main effect of hearing status on N1m amplitude was not significant, despite the N1m source waveforms appearing slightly larger in the hearing impaired. The difference between the present study and prior work could be related to the material used. Prior research used speech sounds (Tremblay et al., 2003; Harkrider et al., 2006) or pure tones (Morita et al., 2003) with relatively long rise/fall times whereas we used complex sounds that had a well-defined and abrupt rise time. Consequently, our stimuli were better suited to generating P1m and N1m responses whereas the material used in prior studies only generated a clear N1 response. Thus, our findings coupled with those of prior research provide converging evidence for increased neural excitability in primary and associative auditory cortices following sensory-neural hearing loss. Future experiments that manipulate the stimulus envelope are, however, needed to better understand the origin of these differences in the latency of the effect of hearing loss on cortical evoked responses.

In aging research, the enhanced amplitude of sensory evoked responses (i.e., P1 and N1 deflections) is often thought to index listeners' difficulty in filtering out task-irrelevant information (Chao and Knight, 1997; Alain and Woods, 1999; West and Alain, 2000; Gazzaley et al., 2005), which has been related to a decline in prefrontal function (West, 1996; Chao and Knight, 1997; Alain and Woods, 1999). This account emphasizes the role of top-down control processes mediated by higher brain functions and assumes deficits in descending auditory pathways. Empirical support for this proposal includes studies in humans that showed increased amplitudes of middle latency auditory evoked responses following lesions to the dorsolateral prefrontal cortex (Knight et al., 1989; Alho et al., 1994). An alternative to this top-down account focuses on impaired inhibitory functions along the ascending auditory pathways as well as a loss of frequency selectivity. Evidence from animal studies has revealed a relationship between sensory-neural hearing loss and increased neural excitability within the ascending auditory pathway (Willott and Lu, 1982) and primary auditory cortex (Kotak et al., 2005), which likely reflects deficits in inhibition (Willott and Lu, 1982; Caspary et al., 1995, 2005; Kotak et al., 2005). It is worth noting that we observed effects of mild hearing loss primarily on early exogenous evoked responses. This suggests that sensory-neural hearing loss impacts early cortical registration of acoustic information. Further research is needed to determine whether even earlier stages of cortical processing would be impacted by hearing loss and how this relates to auditory perception and attention.

As in our previous aging study, we also found that the ORN amplitude increased with increasing mistuning. Interestingly, the effect of mistuning on perception was comparable in older adults with or without mild hearing loss. However, the ORN elicited by the mistuned harmonic was enhanced in older adults with mild hearing loss compared to normal hearing older adults. The enhanced ORN amplitude in mildly hearing impaired older adults was unexpected and could be due to loudness recruitment and/or increased cortical reactivity. The effect of mild hearing loss on both transient onset responses and ORN provides converging evidence for a general decline in inhibitory processes along the ascending auditory pathway, which likely contributes to increased time-locked cortical activity.

#### **EFFECTS OF HEARING LOSS AND NOISE ON THE SPEED OF AUDITORY PROCESSING**

The peak latency of AEFs indicates the amount of time taken to generate the neuromagnetic response after sound onset and provides a means for assessing speed of auditory processing. In the present study, the P1m latency was comparable for the two groups. This suggests that conduction time in the ascending auditory pathways may be little affected by mild hearing loss. The N1m and P2m latencies were also comparable in older adults with and without mild hearing loss. Our findings are consistent with those of earlier EEG studies using speech sounds (Tremblay et al., 2003; Harkrider et al., 2006), which also found that mild hearing loss had little impact on the latencies of the N1 and P2 waves. This is important as it suggests that mild hearing loss does not impact the speed of processing. In a prior study using the same stimuli and listening conditions (Alain et al., 2012), we found age-related increases in N1m and P2m latencies, which was consistent with earlier research (e.g., Tremblay et al., 2002; Matilainen et al., 2010). The age-related increase in latency of auditory evoked responses may reflect a slowing in auditory processing and/or a broadening of the temporal integration window (Emmer et al., 2006; Gleich et al., 2007; Huang et al., 2009) with older adults reaching a saturation in the auditory evoked responses at a longer latency than young adults. The fact that the ORN latency was comparable between older adults with and without hearing loss suggests that mild hearing loss does not significantly delay the early computation needed to segregate the mistuned harmonic as a separate sound object.

Presenting auditory stimuli against moderate background noise increased latencies of P1m, N1m, and P2m deflections. The noise-related increase in latencies of exogenous components is consistent with prior studies using speech sounds (Whiting et al., 1998; Parbery-Clark et al., 2011). In the present study, the noise-related increase in P1m and N1m latencies was only present for the normal hearing group, suggesting that mild hearing loss and background noise do not independently affect central auditory processing. Further research is needed to characterize in more detail the impact of hearing loss on speed of processing and how it interacts with age.

## **NOISE-INDUCED INCREASE IN N1m AMPLITUDE**

Another goal of the present study was to assess whether the noise-induced increase in N1m amplitude previously reported for young adults would also be observed in older adults with mild hearing loss. For both groups, the N1m amplitude was larger when stimuli were embedded in low and moderate background noise than when there was no background noise. As such, this finding replicates and extends those of earlier studies (Alain et al., 2009, 2012) to older adults with mild hearing loss. This noise-induced increase in sensory evoked responses is not limited to harmonic complexes, but has also been observed for speech sounds in young adults (Shtyrov et al., 1999; Parbery-Clark et al., 2011) and children (Anderson et al., 2010). The increased N1 amplitude may be related to efferent feedback connections between the auditory cortex and the thalamus, inferior colliculus, and/or auditory brainstem nuclei, whose role would be to enhance the SNR in adverse listening situations (Alain et al., 2009). Such a proposal is supported by a positive correlation between the N1 amplitude for stimuli embedded in noise and the performance in a speech-in-noise task (Parbery-Clark et al., 2011). Another possibility is that low and moderate levels of background noise enhance states of arousal, which then similarly enhance the amplitude of sensory evoked responses to improve performance in working memory tasks (Han et al., 2013).

#### **CONCLUDING REMARKS**

The present study aimed to clarify the role of age-related hearing loss in the enhancement of cortical evoked responses previously reported in the literature. We found increased cortical evoked responses in older adults with mild hearing loss compared to age-matched controls with clinically normal hearing. Our findings suggest that age-related decline in hearing sensitivity plays an important role in modulating the amplitude of auditory evoked responses. More importantly, our findings highlight the importance of controlling for age-related differences in hearing sensitivity when investigating the impact of aging on central auditory processing.

### **ACKNOWLEDGMENTS**

This research was supported by grants from the Canadian Institutes of Health Research (MOP 106619) and the Natural Sciences and Engineering Research Council of Canada. We wish to thank Jeffrey Wong for his help in preparing the illustrations and for his helpful comments and suggestions regarding the manuscript.

## **REFERENCES**


a memory retrieval task. *Electroencephalogr. Clin. Neurophysiol.* 47, 450–459. doi: 10.1016/0013-4694(79)90161-5


responses to changes in interaural phase. *J. Neurosci.* 27, 11172–11178. doi: 10.1523/jneurosci.1813-07.2007


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 August 2013; accepted: 13 January 2014; published online: 31 January 2014*.

*Citation: Alain C, Roye A and Salloum C (2014) Effects of age-related hearing loss and background noise on neuromagnetic activity from auditory cortex. Front. Syst. Neurosci. 8:8. doi: 10.3389/fnsys.2014.00008*

*This article was submitted to the journal Frontiers in Systems Neuroscience*.

*Copyright © 2014 Alain, Roye and Salloum. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*
