# THE FUNCTIONAL ORGANIZATION OF THE AUDITORY SYSTEM

EDITED BY: Monica Muñoz-Lopez and Yukiko Kikuchi PUBLISHED IN: Frontiers in Neuroscience

#### *Frontiers Copyright Statement*

*© Copyright 2007-2017 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88945-061-9 DOI 10.3389/978-2-88945-061-9

## About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

## Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

## Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

## What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# **THE FUNCTIONAL ORGANIZATION OF THE AUDITORY SYSTEM**

Topic Editors:

**Monica Muñoz-Lopez,** University of Castilla-La Mancha, Spain

**Yukiko Kikuchi,** Newcastle University Medical School, UK

The front page is a detail of the facade of the Sage Gateshead building with some of the figures from the articles comprising this e-book. The curvature of the building represents the continuity of sound frequencies. The smooth glass surface used by the architect hints at the complexity and, at the same time, the simplicity of sound. This building is entirely dedicated to music performance and education. Photograph courtesy of Yamato Kikuchi with permission.

This eBook comprises a series of original research and review articles dealing with the anatomical, genetic, and physiological organization of the auditory system from humans to monkeys and mice.

**Citation:** Muñoz-Lopez, M., Kikuchi, Y., eds. (2017). The Functional Organization of the Auditory System. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-061-9

# Table of Contents



Maria Medalla and Helen Barbas


Jonathan M. Lovell, Judith Mylius, Henning Scheich and Michael Brosch

*Responses of neurons in the marmoset primary auditory cortex to interaural level differences: comparison of pure tones and vocalizations*

Leo L. Lui, Yasamin Mokri, David H. Reser, Marcello G. P. Rosa and Ramesh Rajan

## **Audition in Bats**

*Conjugating time and frequency: hemispheric specialization, acoustic uncertainty, and the mustached bat*

Stuart D. Washington and John S. Tillinghast

## **Auditory Research in Humans**

*Functional significance of the electrocorticographic auditory responses in the premotor cortex*

Kazuyo Tanji, Kaori Sakurada, Hayato Funiu, Kenichiro Matsuda, Takamasa Kayama, Sayuri Ito and Kyoko Suzuki


Li Su, Isma Zulfiqar, Fawad Jamshed, Elisabeth Fonteneau and William Marslen-Wilson


# Editorial: The Functional Organization of the Auditory System

Monica Munoz-Lopez <sup>1</sup> \* and Yukiko Kikuchi <sup>2</sup>

*<sup>1</sup> Human Neuroanatomy Laboratory, Medical Sciences, Medical School, University of Castilla-La Mancha, Albacete, Spain, <sup>2</sup> Institute of Neuroscience, Newcastle University Medical School, Newcastle Upon Tyne, UK*

Keywords: audition, anatomy, physiology, genetics, fMRI, plasticity, monkeys, rodents

**The Editorial on the Research Topic**

#### **The Functional Organization of the Auditory System**

Hearing, a central ability in our lives, allows us to distinguish the sound of birds from that of mosquitos, remember our favorite piece of music, have a conversation with a friend, and be alerted by a honk while crossing a road with our child. In other species, such as bats, this sensory modality is key for survival. Many researchers have dedicated their efforts to understanding the auditory system. This Research Topic on the Functional Organization of the Auditory System brings together some of the best scientific efforts to understand audition, from genetic studies in rodents and electrophysiological recordings in rodents, bats, and monkeys to fMRI and translational research in humans. The following paragraphs aim to give a glimpse of the contents of this topic.

## EXPERIMENTAL WORK IN RODENTS

Hearing loss among the aging or after injuries can be very debilitating. Fuentes-Santamaría et al. studied experimentally induced conductive hearing loss in rodents and found changes in microglial cells, but not astrocytes, suggesting that microglia may be the dynamic modulators of synaptic transmission in the cochlear nucleus immediately following unilateral hearing loss. Plasticity after injury of any component of the auditory system is of obvious clinical relevance. The thorough review presented by Gold and Bajo reveals the potential of plasticity after injury, but also the sequelae at different levels of the auditory hierarchy, from the brainstem to the cerebral cortex. Auditory-driven alerting behavior is critical across species and has been further characterized by Gomez-Nieto et al., who provide strong evidence that the cochlear root neurons-pontine reticular nucleus pathway mediates fast neurotransmission and binaural summation of the acoustic startle reflex in the rodent. Auditory hallucinations are a distinct symptom of schizophrenia. Nakao and Nakazawa present a rodent model of this disorder and show that the "paradoxically" high spontaneous LFP activity of the primary auditory cortex in the absence of external stimuli may contribute to the emergence of schizophrenia-related aberrant auditory perception. These authors suggest that NMDAR hypofunction in cortical GABAergic interneurons leads to two temporally distinct, brain state-dependent LFP deficits in the A1 cortex. Sotoca et al. present a surprising result in a transgenic P23H-1 rat model that opens up new strategies for the study of deaf-blindness, implicating a direct effect of the rhodopsin mutation.

Edited and reviewed by: *Robert J. Zatorre, McGill University, Canada*

\*Correspondence: *Monica Munoz-Lopez monica.munozlopez@uclm.es*

#### Specialty section:

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience*

> Received: *02 June 2016* Accepted: *10 June 2016* Published: *07 July 2016*

#### Citation:

*Munoz-Lopez M and Kikuchi Y (2016) Editorial: The Functional Organization of the Auditory System. Front. Neurosci. 10:290. doi: 10.3389/fnins.2016.00290*

## ELECTROPHYSIOLOGY, ANATOMY, AND fMRI IN MONKEYS

Closer to humans, non-human primates such as rhesus (*Macaca mulatta*) or cynomolgus (*Macaca fascicularis*) monkeys have a well-defined temporal lobe with a superior temporal plane, where the primary auditory cortex lies. This proximity of the monkey brain to the human brain offers an exceptional opportunity to better understand the human nervous system, and the auditory system in particular, and gives unique access to information on the evolution of audition toward speech. Research on the electrophysiological properties of neuronal responses within the primary and belt areas of the auditory cortex to simple and complex auditory stimuli (Kikuchi et al.) has shown that harmonic features, such as relationships between specific frequency intervals of communication calls, are processed at relatively early stages of the auditory cortical pathway, but preferentially in the lateral belt (LB). Detailed anatomical data on the feedforward and feedback arrangement of circuits of primary, belt, and parabelt areas presented by Hackett et al. showed the convergence of feedforward inputs into the rostral medial belt from the middle lateral belt (ML) and caudal parabelt (CPB). This was surprising given that CPB is at a higher stage of the processing hierarchy, with mainly feedback projections to all other belt areas. These data have stimulated a revision of the conventional model. Joly et al. address methodological issues of fMRI and anatomical mapping by combining information from cortical folding, micro-anatomy, a surface-based atlas, and tonotopic mapping. The fMRI study by Ortiz-Ríos et al. identifies the network for species-specific vocalization processing in the auditory ventral pathway and highlights the involvement of the anterolateral belt and parabelt areas. This fits well with our own anatomical data showing that the rostral superior temporal gyrus sends convergent input to the dorsal half of the temporal pole (Muñoz-López et al.). It is particularly in this region of the dorsal temporal pole that neurons show working memory-related responses (Bigelow et al.).
Furthermore, even if monkeys can hold auditory stimuli in working memory, their ability to retain them over longer delays is very limited. The anatomical work by one of the editors (Muñoz-López et al.) aims to explain this poor auditory long-term memory ability by demonstrating that the connections from the dorsal temporal pole to medial temporal cortex bypass most of area 36 of the perirhinal cortex, to directly reach area 35, entorhinal, and posterior parahippocampal cortices. This contrasts with the highly dense projections from the visual cortex to area 36 of the perirhinal cortex, and may explain, at least in part, why auditory memory in monkeys is poorer than visual memory. These anatomical data support the hypothesis that auditory stimuli can access the medial temporal cortex if intermingled with information from multimodal areas and that this may guarantee the persistence of memory traces that include auditory information (Muñoz-Lopez et al.).

Outside the temporal lobe, several prefrontal areas have been put forward as important for auditory processing. The review by Medalla and Barbas points to area 10 as the one receiving the densest auditory input in monkeys. On the other hand, Plakke and Romanski present a complementary view whereby different areas of the dorsolateral and ventrolateral prefrontal cortex participate in different cognitive tasks depending on task demands. Data from electrophysiological recordings by Bigelow et al. highlight the possibility that match-enhancement of neural responses observed in A1 and dorsal temporal pole during working memory may reflect top–down feedback from prefrontal cortex.

From the perspective of motor-auditory interactions, Lovell et al. show that the red nucleus appears to receive inputs from a very early stage of the ascending auditory system, suggesting that sounds might influence the motor control exerted by this brain nucleus.

The study by Lui et al. in marmosets shows that, although A1 neurons are sensitive to interaural level differences (ILDs), the ILD sensitivity depends on stimulus type (species-specific calls vs. pure tones). They also show that ILD sensitivity was heavily dependent on binaural levels, suggesting that, in the absence of binaural level-invariant encoding, A1 neurons seem to encode auditory space using population codes.

## AUDITION IN BATS

Washington and Tillinghast present findings in bats that support the idea that the hemispheric specialization for speech and music in humans is based on hemispheric specialization for temporal and spectral resolution.

## AUDITORY RESEARCH IN HUMANS

Patient studies are of critical importance when studying the function of the nervous system. The patient with a tumor resection in the ventral precentral gyrus presented by Tanji et al. demonstrates that a lesion restricted to this auditory-responsive area is enough to cause apraxia of speech. The relevance of this result lies in the fact that this lesion is sufficient to impair the execution of fluent speech, but leaves speech perception/comprehension intact. This result has important theoretical implications for the sensory vs. motor theories of speech, and the study sides with the sensory theory. Lahav and Skoe present their work in neonatal intensive care units with the aim of better understanding the impact of the noisy sound environment on the developing auditory system as an important first step in meeting the developmental demands of preterm newborns. Su et al. present a novel method to resolve the spatial organization of human auditory cortex using a combination of EEG, MEG, and fMRI data and multivariate pattern analysis.

The importance of individual differences in human primary auditory cortical maps is addressed by Moerel et al. by examining tonotopic maps at a single-subject level to detail the topography of primary and non-primary areas. They merge multiple maps indicative of anatomical (i.e., myelination) as well as functional properties (e.g., broadness of frequency tuning) to better identify auditory cortical areas in individual human brains. Individual differences are compounded by the dynamics of auditory representations in humans, as seen in fMRI data on attentive vs. inattentive auditory events. Amaral and Langers show a suppressive binaural interaction during dichotic listening, which illustrates the dynamic properties of our auditory processing.

Of clinical relevance is the study by Johannesen et al. Identifying the multiple contributors to the audiometric loss of a hearing-impaired listener at a particular frequency is becoming increasingly useful as new treatments are developed.

This Research Topic brings together a wealth of studies of diverse animal species, including humans, and it is our hope that they contribute to a better understanding of the mechanisms underlying audition and to real-life applications in clinical populations.

## AUTHOR CONTRIBUTIONS

MM was invited to prepare this Research Topic and invited Dr. YK to be co-editor. We both prepared and discussed a list of guest authors, invited them, revised their manuscripts and handled their revisions.

## FUNDING

This study was supported by NIMH/IRP grant BFI 2003-09581 and the Spanish Ministry of Science and Innovation grant BFU 2006-12964.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Munoz-Lopez and Kikuchi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Glia-related mechanisms in the anteroventral cochlear nucleus of the adult rat in response to unilateral conductive hearing loss

*Verónica Fuentes-Santamaría\*, Juan C. Alvarado, Diego F. López-Muñoz, Pedro Melgar-Rojas, María C. Gabaldón-Ull and José M. Juiz*

*Facultad de Medicina, Instituto de Investigación en Discapacidades Neurológicas (IDINE), Universidad de Castilla-La Mancha, Albacete, Spain*

#### *Edited by:*

*Monica Muñoz-Lopez, University of Castilla-La Mancha, Spain*

#### *Reviewed by:*

*Alino Martinez-Marcos, Universidad de Castilla-La Mancha, Spain*

*Ricardo Gómez-Nieto, Universidad de Salamanca, Spain*

#### *\*Correspondence:*

*Verónica Fuentes-Santamaría, Facultad de Medicina, Universidad de Castilla-La Mancha, Campus de Albacete, Calle Almansa, 14, 02006 Albacete, Spain*

*e-mail: veronica.fuentes@uclm.es*

Conductive hearing loss causes a progressive decline in cochlear activity that may result in functional and structural modifications in auditory neurons. However, whether these activity-dependent changes are accompanied by a glial response involving microglia, astrocytes, or both has not yet been fully elucidated. Accordingly, the present study was designed to determine the involvement of glia-related mechanisms in the anteroventral cochlear nucleus (AVCN) of adult rats at 1, 4, 7, and 15 d after removing middle ear ossicles. Quantitative immunohistochemistry analyses at the light microscopy level with specific markers of microglia or astroglia, along with immunocytochemistry at the electron microscopy level, were used. Also, in order to test whether trophic support by neurotrophins is modulated in glial cells by auditory activity, the expression and distribution of neurotrophin-3 (NT-3) and its colocalization with microglial or astroglial markers were investigated. Diminished cochlear activity after middle ear ossicle removal led to a significant ipsilateral increase in the mean gray levels and stained area of microglial cells but not astrocytes in the AVCN at 1 and 4 d post-lesion as compared to the contralateral side and control animals. These results suggest that microglial cells but not astrocytes may act as dynamic modulators of synaptic transmission in the cochlear nucleus immediately following unilateral hearing loss. On the other hand, NT-3 immunostaining was localized mainly in neuronal cell bodies and axons and was upregulated at 1, 4 and 7 d post-lesion. Very few glial cells expressed this neurotrophin in both control and experimental rats, suggesting that NT-3 is primarily activated in neurons and not as much in glia after limiting auditory activity in the AVCN by conductive hearing loss.

**Keywords: ossicle removal, microglial cells, astrocyte, cochlear nucleus, auditory pathways**

## **INTRODUCTION**

Conductive hearing loss is a condition that results in diminished cochlear nerve synaptic activity due to inefficient sound transmission from the middle to the inner ear (Conlee and Parks, 1981; Tucci and Rubel, 1985). Physiological studies in adult patients and children who suffer from unilateral conductive ear disease have demonstrated that this hearing impairment results in functional anomalies in brainstem auditory evoked responses (ABRs) (Fria and Sabo, 1980; Ferguson et al., 1998). In both adult and juvenile populations, ABR recordings showed significant delays in wave V and in the III-IV interwave intervals. This observation has led to the suggestion that conductive hearing loss is associated with alterations at the central level that might have a significant impact on auditory processing. In an attempt to improve the diagnosis and treatment of these patients, animal models of conductive hearing loss have been developed to elucidate the morphological and functional anomalies associated with these pathological responses. In this regard, a series of studies have demonstrated that unilateral restriction of peripheral inputs to central auditory nuclei leads to decreased activity in auditory neurons of the affected side (Tucci et al., 1999, 2001, 2002; Hutson et al., 2007), strengthening of the ipsilateral projection from the cochlear nucleus to the inferior colliculus (Moore et al., 1989), alterations in neurotransmitter release and uptake (Potashner et al., 1997; Suneja et al., 1998), redistribution of AMPA and glycine receptor subunits (Whiting et al., 2009), and modifications in the synthesis and composition of glutamate and glycine receptors (Wang et al., 2011).

Imbalance of neurotransmission after bilateral deprivation of cochlear activity results in long-term interactions between microglial cells and deafferented cochlear nucleus neurons (Campos Torres et al., 1999; Fuentes-Santamaria et al., 2012). In response to modifications in chemical and electrical signals from damaged neurons, microglia rapidly change to an active phenotype promoting the synthesis and release of cellular mediators, in an attempt to restore synaptic homeostasis and function (Bruce-Keller, 1999; Cullheim and Thams, 2007; Hanisch and Kettenmann, 2007). Growth factors, including neurotrophins, and cytokines are activity-dependent signaling molecules involved in the regulation of synaptic activity in the healthy and injured brain (Guthrie et al., 1995; Watt and Hobbs, 2000; Hanisch, 2002; Parish et al., 2002). Studies in different lesion models such as ischemia and traumatic brain injury have shown increased production of insulin-like growth factor 1 (IGF-1) and interleukin-1β (IL-1β) by glial cells (Touzani et al., 1999; Rothwell and Luheshi, 2000; Hwang et al., 2004; Madathil et al., 2010). Particularly in the auditory system, upregulation of IGF-1 and IL-1β levels occurs in neurons but not in either astrocytes or microglia within the ventral cochlear nucleus (AVCN) of adult rats at 1, 7, and 15 d after cochlear ablation (Fuentes-Santamaria et al., 2013). These findings support the idea that additional synthesis of IGF-1 and IL-1β by glial cells is not essential to re-establish damaged auditory synaptic connections and that other molecular mediators might be involved in this process. One possible candidate is neurotrophin-3 (NT-3), a neurotrophic factor expressed in the adult and postnatal auditory system (Hafidi, 1999; Tierney et al., 2001). Apart from its role as a survival factor that, when injected into the cochlea, increases the survival of spiral ganglion neurons after deafness (Wise et al., 2011), NT-3 also participates in the reestablishment of lost synaptic connections. Supporting this concept, increases in NT-3 levels have been found in adult guinea pigs at 7 d following unilateral cochlear removal, a time point at which degeneration and synaptogenesis processes take place in the cochlear nucleus (Suneja et al., 2005). It is not known, however, whether NT-3 is dynamically expressed in glial cells after auditory lesions like unilateral conductive hearing loss (UCHL).

*LM, light microscopy; CM, confocal microscopy; EM, electron microscopy.*

**FIGURE 3 | Images depicting Iba1 immunostaining in the AVCN in control and experimental animals.** In the ipsilateral side, Iba1 immunostaining increased at 1 d post-lesion (arrows in **C**) and peaked around 4 d (arrows in **D**) in comparison with the contralateral side and unoperated animals (arrows in **A,B**). Iba1 levels remained elevated at 7 d (arrows in **E**) and decreased at 15 d (arrows in **F**) post-lesion. The morphological features of these cells are shown in **G–K**. Particularly at 7 d post-lesion, activated microglial cells assumed very diverse phenotypes (compare asterisk and arrows in **E**). The inset in **A** indicates the location of the AVCN, and the square box indicates the approximate locations of the fields represented in **A–F**. Scale bar = 250 μm in **A**; 50 μm in **F**; 10 μm in **K**.

In addition to these changes, investigations in several species have provided insights into the role that reactive astrocytes might play in restoring synaptic homeostasis after sensorineural hearing loss (Lurie and Rubel, 1994; De Waele et al., 1996; Insausti et al., 1999; Lurie and Durham, 2000; Campos-Torres et al., 2005). Particularly in the cochlear nucleus, recent findings indicate that after cochlear ablation astrocytic activation is delayed (24 h) and less persistent (<30 d) relative to microglial responses, which appear earlier (16 h) and last longer (>90 d) (Fuentes-Santamaria et al., 2012). These observations support the idea that although these non-neuronal elements have different temporal patterns of activation, they both are implicated in reestablishing synaptic function following deafness. In the present study, we interrupted the conduction of sound from the middle ear to the inner ear to assess whether or not glial cells express NT-3 and contribute to the recovery of synaptic deficits associated with UCHL.

## **MATERIALS AND METHODS**

#### **ANIMALS**

All animal protocols were approved by the Institutional Animal Care and Use Committee at the University of Castilla-La Mancha (Permit Number: PR-2013-02-03). These protocols were in accordance with the guidelines of the European Communities Council (Directive 2010/63/EU) and current national legislation (R.D. 53/2013; Law 32/2007) for the care and use of research animals. For light and confocal microscopy, 16 experimental and 4 age-matched unmanipulated control rats were used. An additional group of 12 experimental and 3 age-matched unmanipulated control rats was used for electron microscopy. Two-month-old adult female rats were used for all experiments. Following the surgical procedure, the experimental animals survived for 1, 4, 7, or 15 d.

contralateral side and unoperated animals. The percentages of variation for each quantitative index are shown in **B,D,F**. The error bars indicate the standard errors of the mean.

#### **AUDITORY BRAINSTEM RESPONSES (ABR)**

Animals were anesthetized with isoflurane (4% for induction, 1.5–2% for maintenance with a 1 L/min O2 flow rate) and placed in a sound-attenuating, electrically shielded booth (Eymasa/Incotron S.L., Barcelona, Spain) which was located inside a sound-attenuating room. Subdermal needle electrodes (Rochester Electro-Medical, Tampa, FL, USA) were placed at the vertex (positive) and under the right (negative) and left (ground) ears. The stimulation and recording were performed with a Tucker-Davis (TDT) BioSig System III (Tucker-Davis Technologies, Alachua, FL, USA). As previously reported (Alvarado et al., 2012, 2014), the ABR recordings were performed the day before the surgical procedure and at the end of each survival time. The stimuli were digitally generated using the SigGenRP software (Tucker-Davis Technologies) and the RX6 Piranha Multifunction Processor hardware (Tucker-Davis Technologies) and consisted of 5 ms rise/fall time tones, with no plateau and a cos² envelope, delivered at 20/s at different frequencies across 7 octaves, from 0.5 to 32 kHz. They were delivered monaurally (right ear) using an EDC1 electrostatic speaker driver (Tucker-Davis Technologies) and the EC-1 electrostatic speaker (Tucker-Davis Technologies), which was placed into the external auditory meatus of the rat. Prior to the experiments, stimuli were calibrated using the SigCal software (Tucker-Davis Technologies) and the ER-10B+ low noise microphone system (Etymotic Research Inc, Elk Grove Village, IL, USA). The evoked potentials were filtered (0.3–3.0 kHz), averaged (500 waveforms) and stored for later analyses on a computer. Auditory thresholds, for each of the frequencies evaluated, were determined by comparing the evoked activity, recorded in 5 dB steps descending from a maximum stimulus intensity of 80 dB SPL, with the background activity measured before the stimulus onset. Auditory thresholds were defined as the stimulus intensity that evoked waves with a peak-to-peak voltage greater than 2 standard deviations above the background activity (Cediel et al., 2006; Garcia-Pino et al., 2010; Alvarado et al., 2012, 2014).

*Values are means* ± *standard errors. CSA, cross-sectional area of Iba1 immunostained cells; MGL, mean gray level of Iba1 immunostaining; ISA, immunostained area of Iba1. \*p < 0.05; \*\*p < 0.01; \*\*\*p < 0.001; NS, not significant.*
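The stimulus shape and threshold criterion described above can be illustrated with a minimal Python sketch. It is not the SigGenRP/BioSig implementation used in the study. The function names, the 200 kHz sampling rate, the octave spacing of the test frequencies, and the reading of the criterion as "post-stimulus peak-to-peak amplitude greater than 2 SD of the pre-stimulus baseline" are all assumptions made for illustration.

```python
import numpy as np


def octave_frequencies(f_min_khz=0.5, f_max_khz=32.0):
    """Octave-spaced test frequencies (kHz) from f_min to f_max inclusive (assumed spacing)."""
    n_octaves = int(round(np.log2(f_max_khz / f_min_khz)))
    return [f_min_khz * 2 ** k for k in range(n_octaves + 1)]


def tone_burst(freq_hz, fs_hz=200_000, ramp_ms=5.0):
    """Tone with equal rise and fall ramps and no plateau (hypothetical sampling rate)."""
    n = int(round(2 * ramp_ms * 1e-3 * fs_hz))  # total duration = rise + fall
    t = np.arange(n) / fs_hz
    # Squared half-cycle sine: rises for ramp_ms, then falls for ramp_ms.
    # This is a raised-cosine (cos^2-shaped) ramp on each side.
    envelope = np.sin(np.pi * np.arange(n) / n) ** 2
    return envelope * np.sin(2 * np.pi * freq_hz * t)


def abr_threshold(waveforms, n_baseline, n_sd=2.0):
    """Lowest level, scanning down from the maximum, that still meets the criterion.

    waveforms: dict mapping stimulus level (dB SPL) to an averaged waveform whose
    first n_baseline samples precede stimulus onset. Returns None if even the
    highest level fails the criterion.
    """
    threshold = None
    for level in sorted(waveforms, reverse=True):  # e.g. 80, 75, 70, ...
        w = np.asarray(waveforms[level], dtype=float)
        baseline, evoked = w[:n_baseline], w[n_baseline:]
        if np.ptp(evoked) > n_sd * baseline.std():
            threshold = level
        else:
            break  # stop at the first level with no detectable response
    return threshold
```

In this sketch the scan stops at the first level that fails the criterion, which mirrors a descending procedure in 5 dB steps from 80 dB SPL. Other readings of "2 standard deviations above the background activity" are possible and would change only the comparison line.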

#### **SURGICAL PROCEDURE FOR UNILATERAL CONDUCTIVE HEARING LOSS**

All surgical procedures were performed under aseptic conditions and unilaterally on the right ear. Rats were anesthetized with isoflurane as indicated above. Once the skin behind the ears was shaved, a retroauricular incision was made in order to identify the external auditory canal, which was followed to the tympanic membrane. Using fine forceps, the tympanic membrane was punctured and the malleus and incus were removed. During the surgical procedure and recovery from anesthesia, a heating pad was used to maintain normal body temperature. Once awake, animals were returned to their cages and maintained with free access to food and water for the survival period.

#### **PRIMARY ANTIBODIES**

The antibodies used in this study are listed in **Table 1**. Glial and neuronal antibodies included (1) mouse anti-glial fibrillary acidic protein (GFAP); (2) rabbit anti-glial fibrillary acidic protein (GFAP); (3) ionized calcium binding adaptor molecule 1 (Iba1); (4) mouse anti-CD11b; (5) mouse anti-neuronal nuclei (NeuN) and (6) rabbit anti-neurotrophin-3 (NT-3).

#### **IMMUNOPEROXIDASE STAINING PROCEDURE**

After the appropriate post-operative survival time, control and experimental rats were anesthetized with an intraperitoneal injection of ketamine (100 mg/kg) and xylazine (5 mg/kg) and perfused transcardially with a 0.9% saline wash followed by a fixative solution of 4% paraformaldehyde in 0.1 M phosphate buffer (PB, pH 7.3). The brains were removed from the cranium and cryoprotected for 48 h. Frozen sections 40 μm thick were cut on a sliding microtome in the coronal plane. After blocking for 1 h in a solution containing 10% normal goat serum (NGS) diluted in Tris-buffered saline (TBS, pH 7.4) with 0.2% Triton X-100 (TBS-Tx 0.2%), sections were incubated overnight at 4 °C in the same buffer solution with polyclonal primary antibodies for either Iba1 or GFAP or NT-3 (**Table 1**). The next day, sections were washed in a TBS-Tx 0.2% solution and incubated for 2 h at room temperature (RT) in biotinylated goat anti-rabbit secondary antibody (1:200, Vector Laboratories, Burlingame, CA, USA). Then, after several rinses in TBS-Tx 0.2%, sections were incubated in an avidin-biotinylated peroxidase complex (ABC) and rinsed in TBS. Sections were then exposed to 3,3′-diaminobenzidine (DAB) as the chromogenic peroxidase substrate. Care was taken to ensure that incubation times in DAB were identical across control and experimental cases. Finally, the sections were washed thoroughly, mounted on gelatin-coated slides, air-dried, dehydrated in progressive ethanol solutions, cleared in xylene, and coverslipped with Cytoseal® (Stephens Scientific, Wayne, NJ, USA). Three sets of control experiments were performed to test the specificity of the immunohistochemical detection system: (1) omission of the primary antibody by replacement with TBS-BSA; (2) omission of secondary antibodies; and (3) omission of ABC reagent. No immunostaining was detected under these conditions.

**FIGURE 5 | Confocal images showing appositions between microglial cells and neurons in the ipsilateral AVCN in control and deprived rats.** In control rats, microglial cells with round or fusiform cell bodies and ramified processes were in close proximity to cochlear nucleus neurons (arrows and asterisks in **A–C**). The frequency of these appositions was particularly increased at 1 and 4 d after the lesion, when enlarged microglial cell bodies with short processes were frequently observed apposing the soma and dendrites of cochlear nucleus neurons in the affected side (arrows and asterisks in **D–F**). At later survival times after ossicle removal, the occurrence of these cellular contacts decreased (arrows and asterisks in **G–L**). Scale bar = 25μm in **L**.

#### **DOUBLE IMMUNOFLUORESCENCE LABELING**

Sections were rinsed four times in TBS-Tx 0.2% and blocked for 1 h in the same buffer solution containing 10% NGS. Then, sections were incubated overnight with one of the following mixtures of primary antibodies: (1) NeuN and Iba1; (2) NeuN and GFAP; (3) NT-3 and CD11b; and (4) NT-3 and GFAP. Following four 15 min rinses in TBS-Tx (0.2%), sections were incubated in the corresponding cocktail of fluorescently labeled secondary antibodies for 2 h at room temperature (1:200; anti-mouse antibodies conjugated to Alexa 594 [A-11005] and anti-rabbit antibodies conjugated to Alexa 488 [A-11008]; Molecular Probes, Eugene, OR, USA). After several rinses in TBS, they were mounted, counterstained with DAPI, coverslipped, and maintained overnight at 4°C. Sections were examined with a laser scanning confocal microscope (LSM 710; Zeiss, Germany) with excitation laser lines at 405, 488, and 594 nm, using the ZEN 2009 Light Edition software. Maximum intensity projections of a z-stack were generated. For each dye, optical sections every 2.5 μm through the thickness of the tissue were captured with a 63X Plan Apo oil-immersion objective (1.4 NA), at fixed camera gain, pinhole size, and laser intensity. Then, images were merged and saved as TIFF files.
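For readers unfamiliar with the term, a maximum intensity projection keeps, at each pixel position, the brightest value found across all optical sections of the z-stack. The following is a minimal sketch of that operation, not the ZEN software routine used in the study; the function name `max_intensity_projection` and the toy stack values are hypothetical.

```python
from typing import List

Image = List[List[int]]  # one optical section: rows of pixel intensities


def max_intensity_projection(z_stack: List[Image]) -> Image:
    """Collapse a z-stack into one image by taking, for every (row, col)
    position, the brightest value found across all optical sections."""
    if not z_stack:
        raise ValueError("z_stack must contain at least one optical section")
    rows, cols = len(z_stack[0]), len(z_stack[0][0])
    for section in z_stack:
        if len(section) != rows or any(len(row) != cols for row in section):
            raise ValueError("all optical sections must have the same shape")
    return [
        [max(section[r][c] for section in z_stack) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical stack of three 2 x 3 optical sections (toy values, not study data).
toy_stack = [
    [[0, 5, 1], [2, 0, 0]],
    [[3, 1, 1], [0, 9, 0]],
    [[1, 1, 7], [0, 0, 4]],
]
projection = max_intensity_projection(toy_stack)
```

Because each output pixel depends only on the maximum along z, the projection is only comparable across cases when acquisition settings are held fixed, as they were here (camera gain, pinhole size, and laser intensity).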

#### **ELECTRON MICROSCOPY IMMUNOCYTOCHEMISTRY**

Animals were anesthetized as described above and perfused transcardially with 0.9% saline wash followed by a fixative perfusion of 4% paraformaldehyde and 0.5% glutaraldehyde in 0.1 M PB, pH 7.3. After fixation, the brains were removed, and sectioned at 40μm on a vibratome in the coronal plane. After several washes in PBS, sections were pre-incubated for 1 h in 10% NGS and then incubated overnight at 4°C with either Iba1 or GFAP polyclonal antibodies diluted in PBS. The following day, after several rinses in PBS, sections were incubated in a dilution of anti-rabbit secondary antibody (1:200) for 2 h at RT and after several rinses they were incubated in ABC for 1 h. Peroxidase activity was visualized with a nickel-intensified DAB reaction to produce a black reaction product. Sections containing the cochlear nucleus were post-fixed with osmium tetroxide (1% in 0.1 M PB) for 1 h, block-stained with 1% uranyl acetate for 30 min, dehydrated in a graded series of ethanol and embedded in Durcupan (Fluka) resin. Thin sections (∼75 nm) in the silver-gold range were cut on an ultramicrotome (Reichert Ultracut E, Leica, Austria) and collected on 200-mesh copper grids. Tissue was observed using a Jeol-1010 electron microscope.

#### **MEASUREMENTS OF THE CROSS-SECTIONAL AREA OF IBA1 IMMUNOSTAINED CELLS**


As glial cells could modify their phenotype in response to minor changes in their cellular environment, *the cross-sectional area* was used as an indicator of possible changes in the soma size of glial cells at the different time points after the UCHL. The cross-sectional area of Iba1 immunostained cells in both control and experimental animals was measured using the public domain image analysis software Scion Image for Windows (Scion, Frederick, MD; v beta 4.0.2). Using a 60× objective, three fields (25.16 × 10<sup>3</sup> μm<sup>2</sup>; dorsal, middle and ventral) were sampled randomly in every fourth section throughout the rostrocaudal extent of the cochlear nucleus. Only cells with a well-defined cell body, nucleus and nucleolus were measured and included in the analysis.

#### **ANALYSIS OF THE IMMUNOSTAINING**

Immunostained sections from control and experimental animals were examined with bright field illumination using a Nikon Eclipse photomicroscope with a 40× objective and images captured with a DXM 1200C digital camera attached to the microscope. Color images of each field were digitized and the resultant 8-bit image contained a grayscale of pixel intensities that ranged from 0 (white) to 255 (black). As previously described (Fuentes-Santamaria et al., 2003, 2005, 2007; Alvarado et al., 2004, 2005, 2007), the densitometric procedure for the evaluation of the immunostaining was performed by using the public domain image analysis software Scion Image for Windows. Cochlear nucleus subdivisions were defined in accordance with previous terminology (for review, see Cant and Benson, 2003). The analysis of Iba1, GFAP and NT-3 immunostaining was performed in 6 coronal sections, 120μm apart, through the rostrocaudal extent of the AVCN. In each section, three fields (55.25 × 10<sup>3</sup> μm<sup>2</sup>; dorsal, middle and ventral) were sampled using a 40× objective. In order to perform an appropriate comparison of the immunostaining among cases, a macro was designed to process and analyze the captured images (Alvarado et al., 2004). Briefly, images were normalized by using an algorithm, based on the signal-to-noise ratio, that normalizes each pixel by adjusting the grayscale range of the image. Following normalization, the threshold level was set as two standard deviations above the mean gray level of the field and immunostained cells exceeding this threshold were identified as labeled. Additionally, as both the intensity and area of the immunostaining could be affected by changes in activity (Caicedo et al., 1997), *the mean gray level of the immunostaining* and *the immunostained area* were used as indicators of changes in protein levels (Winsky and Jacobowitz, 1995; Benson et al., 1997).

**Table 3 | GFAP immunostaining in the AVCN in control and experimental animals.**

*Values are means* ± *standard errors. MGL, Mean gray level of GFAP immunostaining; ISA, Immunostained area of GFAP; NS, Not significant.*

It has been demonstrated that the intensity of the immunostaining is related to antigen concentration (Huang et al., 1996; Yao and Godfrey, 1997). Therefore, *the mean gray level* was used as an indirect measure of intracellular protein levels within cells after UCHL providing a general estimation of the effect of the unilateral deprivation on the immunostaining of neurons and glia. Additionally, *the immunostained area* was used as an indicator of the area occupied by microglial cells and astrocytes at each survival time in comparison to control rats. It was calculated as the summed area of all profiles (cells and processes) labeled above the threshold in each field.
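The thresholding rule and the two measures described above can be illustrated with a short sketch. It assumes an already normalized field, uses the population standard deviation of the field (the text does not state which estimator was used), and takes a hypothetical `pixel_area_um2` calibration value. It is not the Scion Image macro of Alvarado et al. (2004) and does not reproduce its normalization step.

```python
from statistics import mean, pstdev
from typing import Dict, List


def quantify_field(pixels: List[List[int]], pixel_area_um2: float) -> Dict[str, float]:
    """Threshold one normalized field at mean + 2 SD and summarize the label.

    Gray values follow the convention in the text: 0 = white, 255 = black,
    so more intensely stained pixels have higher values.
    """
    flat = [value for row in pixels for value in row]
    field_mean = mean(flat)
    field_sd = pstdev(flat)  # assumption: population SD of the whole field
    threshold = field_mean + 2 * field_sd
    labeled = [value for value in flat if value > threshold]
    return {
        "threshold": threshold,
        # mean gray level of the pixels identified as labeled
        "mean_gray_level": mean(labeled) if labeled else 0.0,
        # summed area of all labeled profiles in the field
        "immunostained_area_um2": len(labeled) * pixel_area_um2,
    }


# Hypothetical 4 x 5 field: pale background with two dark pixels (toy values).
toy_field = [
    [10, 10, 10, 10, 10],
    [10, 200, 200, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
]
result = quantify_field(toy_field, pixel_area_um2=0.25)
```

In this toy field the mean is 29 and the standard deviation is 57, so the threshold is 143 and only the two dark pixels count as labeled. Because the threshold is derived from each field's own statistics, normalizing the images beforehand is what makes the resulting values comparable across cases.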

#### **PREPARATION OF FIGURES AND STATISTICAL ANALYSIS**

Photoshop (Adobe v5.5) and Canvas (Deneba v6.0) were used to adjust size, brightness and contrast of publication images. All the data were expressed as means ± standard error of the mean. Comparisons among groups were analyzed statistically using a one-factor analysis of variance followed by Scheffé's *post-hoc* test to evaluate the effect of the survival time after unilateral conductive hearing loss on the immunostaining in the cochlear nucleus. Statistical significance was set at a level of *p* < 0.05.
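As an illustration of this statistical design, the sketch below computes the one-factor ANOVA *F* ratio and the Scheffé statistic for a pairwise comparison. The function names and toy values are hypothetical and the statistical package actually used by the authors is not stated. The critical *F* value for (*k* − 1, *N* − *k*) degrees of freedom at *p* < 0.05 is assumed to be taken from a table or a statistics package.

```python
from statistics import mean
from typing import Sequence, Tuple

Groups = Sequence[Sequence[float]]


def one_way_anova(groups: Groups) -> Tuple[float, float, int, int]:
    """Return (F, MS_within, df_between, df_within) for a one-factor ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, ms_within, df_between, df_within


def scheffe_pairwise_f(groups: Groups, i: int, j: int) -> float:
    """Scheffe statistic for the contrast between groups i and j.

    The pair differs significantly when this value exceeds the critical
    F(df_between, df_within) used for the overall ANOVA.
    """
    _, ms_within, df_between, _ = one_way_anova(groups)
    diff = mean(groups[i]) - mean(groups[j])
    se_squared = ms_within * (1 / len(groups[i]) + 1 / len(groups[j]))
    return diff ** 2 / (se_squared * df_between)


# Hypothetical measurements for three groups (toy values, not study data).
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, ms_within, df_between, df_within = one_way_anova(toy_groups)
scheffe_0_vs_2 = scheffe_pairwise_f(toy_groups, 0, 2)
```

Dividing the squared contrast by the between-groups degrees of freedom is what makes Scheffé's procedure conservative: every pairwise comparison is held to the same critical value as the omnibus test.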

## **RESULTS**

### **AUDITORY BRAINSTEM RESPONSES (ABR)**

To evaluate alterations in auditory function following UCHL, ABR recordings were performed in rats before (pre-lesion ABR) and after (post-lesion ABR) unilateral ossicle removal for each of the time points described in the Materials and Methods Section. Similar to the control condition, the pre-lesion recordings showed a distinctive wave pattern characterized by four to five positive peaks generated after a stimulus (**Figure 1A**). Meanwhile, the post-lesion ABR in the ear ipsilateral to the lesion (**Figures 1C–F**) showed differences in the wave amplitudes at all the frequencies tested when compared to the contralateral side (**Figure 1B**) and control animals (**Figure 1A**). Experimental rats had significant threshold elevations at all frequencies and time points studied after UCHL, which were indicative of decreased activity in the ipsilateral auditory nuclei (**Figure 2**).

#### **MICROGLIAL RESPONSE TO UCHL**

In the control condition and in the side contralateral to the lesion, microglial cells in the AVCN had round or oval cell bodies and long ramified processes (**Figures 3A,B,G**). A microglial reaction in the ipsilateral AVCN of experimental animals was first detectable 1 day after the lesion, at which time glial cells had larger cell bodies along with thicker and less branched processes than resting microglia (**Figures 3C,H**). At 4d post-lesion, microglial expression was maximal (**Figures 3D,I**) when compared to that in the intact side (**Figure 3B**) and control (**Figure 3A**) animals. Cells were larger, darker and occupied a larger area when compared to those observed at 1d post-lesion. These qualitative observations were confirmed by significant increases in their cross-sectional area, mean gray level of Iba1 immunostaining and immunostained area (**Figure 4**; **Table 2**). At 7d post-lesion, glial cells displayed a remarkable heterogeneity in their morphology. Some cells had smaller cell bodies and longer processes (arrowheads in **Figures 3E,J**) while others had morphologies resembling those seen at 1 and 4 d post-lesion (compare asterisk and arrows in **Figure 3E**). At this time point, the microglial cross-sectional area was significantly decreased when compared with the other survival time points. However, it was increased when compared with the unmanipulated side and control animals (**Figures 3**, **4**; **Table 2**). Glial cells were still darkly immunostained as confirmed by significant decreases in the mean gray levels when compared with experimental animals at 4d post-lesion, but no differences were found when compared with 1d post-lesion animals (**Figure 4**; **Table 2**). At longer survival times (15d post-lesion), microglial cells adopted the typical ramified structure usually seen in the normal brain (**Figures 3F,K, 4**).

Microglial-neuronal appositions in the ipsilateral AVCN were predominantly observed at 1 and 4 d post-lesion (**Figures 5D–F**), when microglia assumed a more amoeboid phenotype (arrows in **Figure 5E**) and surrounded injured cochlear neurons (asterisks in **Figure 5D**). These close appositions were less frequently seen at day 7 (**Figures 5G–I**) and were almost absent at day 15 (**Figures 5J–L**) when microglial cells transformed into the ramified phenotype similar to that seen in the contralateral side and control animals (**Figures 5A–C**). The ultrastructural features of Iba1 immunostained cells in the cochlear nucleus of control and experimental rats are shown in **Figure 6**. Iba1 immunostained microglial cells were identified by the presence of electron-dense DAB reaction product within their cell body cytoplasm and processes. In the control condition and the side contralateral to the lesion, microglial cells in the resting ramified state had a nucleus with dense heterochromatin packed against the nuclear membrane, a cytoplasm with numerous organelles and inclusion bodies and multiple labeled processes of different sizes and shapes in the neuropil (Shapiro et al., 2009; **Figures 6A,B**). In the ipsilateral AVCN of experimental animals, activated microglia were characterized by an enlarged cytoplasm and thicker processes rich in vacuoles and multi-vesicular bodies, which were observed surrounding neuronal processes and contacting nearby synaptic elements (**Figures 6C,D**).

#### **ASTROGLIAL RESPONSE TO UCHL**

GFAP immunostaining in unmanipulated animals was observed mostly as branched astroglial processes heterogeneously distributed through the AVCN (**Figure 7A**). Following UCHL, the morphology and staining features of astrocytes were similar to those observed in control animals (**Figures 7A–F**). Quantification of the mean gray levels and immunostained areas of GFAP immunostaining in experimental animals indicated that there were no differences at any survival time when compared to either the contralateral side or normal control animals (**Figures 7G–J**; **Table 3**). Similar to the control condition, astrocytic processes were found in the neuropil or closely associated with cochlear nucleus cell bodies at all the time points studied (**Figures 8A–C**). The ultrastructural features of astrocytes in the AVCN of control and experimental rats are shown in **Figures 8D,E**. These macroglial cells contacted surrounding cellular elements in the neuropil (asterisks in **Figures 8D,E**).

#### **UPREGULATION OF NT-3 IMMUNOSTAINING FOLLOWING UCHL**

In control and experimental animals, NT-3 immunostaining was localized within the cell cytoplasm and also in the neuropil in the AVCN (**Figure 9**). At 1, 4, and 7 d post-lesion (**Figures 9C–E**), immunostaining in the ipsilateral side was significantly increased compared to the contralateral side and to normal control animals (**Figures 9A,B**). These observations were corroborated by significant increases in the mean gray level of the immunostaining at the above-mentioned survival times (**Figures 9G,H**; **Table 4**).

To determine whether glial cells expressed NT-3 following UCHL, double-labeling experiments with neuronal and glial markers were performed in both control and experimental animals. Few microglial cells (**Figure 10**) and astrocytes (**Figure 11**) colocalized with NT-3, demonstrating that neurons and nerve terminals are the main sources of NT-3 within the AVCN (**Figures 10**, **11**). The colocalization of activated microglia with NT-3 in the ipsilateral side in comparison to the contralateral side and control animals is shown in **Figure 10** (yellow arrowheads). Note the close spatial appositions between NT-3-containing neurons (asterisks in **Figure 10**) and microglial cells (white arrows in **Figure 10**). Regarding astrocytes, as GFAP immunostaining in the ipsilateral side was similar at all the time points studied after the lesion in comparison to the contralateral side and control animals, a representative example of the colocalization of GFAP and NT-3 in the ipsilateral AVCN at 7d post-lesion (yellow arrows) is shown in **Figure 11**.

**Table 4 | NT-3 immunostaining in the AVCN in control and experimental animals.**

*Values are means* ± *standard errors. MGL, Mean gray level of NT-3 immunostaining. \*p < 0.05; \*\*p < 0.01; \*\*\*p < 0.001; NS, Not significant.*

## **DISCUSSION**

The present study demonstrates that UCHL, which causes a progressive decline in cochlear nerve activity, leads initially to an increase in microglial but not astroglial activation in the AVCN of adult rats, at least up to 15d after the lesion. At 1 and 4 d following unilateral ossicle removal, Iba1 immunostained cells were larger and darker in the ipsilateral AVCN when compared with those in the contralateral side and control animals. These observations were confirmed by significant increases in the mean cross-sectional areas of microglial cells and increases in the mean gray levels of Iba1 immunostaining. When the expression and distribution of NT-3 and its colocalization with microglial and astroglial markers were investigated, we observed that NT-3 levels peaked by day 4 post-lesion, and that most labeling was concentrated in neuronal cell bodies and axons. Only a few scattered glial cells expressed this neurotrophin in control animals and following monaural hearing loss. These findings suggest that microglial cells contribute to restoring impaired synaptic function following UCHL.

Interrupted conduction of sound waves to the inner ear by means of unilateral middle ear ossicle removal has been reported to modify the activity of auditory neurons (Woolf et al., 1983; Tucci et al., 1999, 2001, 2002). In this regard, UCHL in gerbils results in either increases or decreases in 2-deoxyglucose (2-DG) uptake in the ipsilateral cochlear nucleus depending on whether the animals are maintained in silence following the 2-DG injection (Woolf et al., 1983) or exposed to sound (Tucci et al., 2001). Also, cytochrome oxidase activity has been shown to decrease in the ipsilateral cochlear nucleus and to increase on the contralateral side of experimental animals (Tucci et al., 2002). In agreement with these studies, our results also demonstrate decreased activity in the ipsilateral cochlear nucleus after UCHL. In this regard, increases in auditory thresholds and decreases in wave amplitudes on the side ipsilateral to the lesion were indicative of reduced levels of cochlear input to auditory nuclei, leading to the suggestion that auditory circuitry is altered following unilateral hearing loss. Supporting this idea, increases in the number of cochlear nucleus neurons projecting ipsilaterally to the inferior colliculus of the unmanipulated side (Nordeen et al., 1983; Moore and Kitzes, 1985; Moore and Kowalchuk, 1988; Moore, 1994) and increased calcium influx in deafferented auditory brainstem nuclei that project to the inferior colliculus contralateral to the lesion (Fuentes-Santamaria et al., 2003, 2005; Alvarado et al., 2004) have been observed after experimental hearing loss in gerbils and ferrets.

The magnitude of the effects of UCHL on synaptic transmission varies depending on the species, the age of the animal when hearing loss occurs and the lesion paradigm. In the cochlear nucleus of adult guinea pigs, long-term plastic changes in GABA, glycine and aspartate uptake and release take place after middle ear ossicle removal (Potashner et al., 1997; Suneja et al., 1998). Previous studies have demonstrated that short and long-term alterations in receptor trafficking at synapses also occur in the cochlear nucleus of adult rats following monaural earplugging (Whiting et al., 2009; Wang et al., 2011). In particular, by 1d after unilateral hearing loss, GluA3 subunits of the AMPA receptor are upregulated at auditory synapses on cochlear nucleus projection neurons while glycine receptor α1 subunits are downregulated at inhibitory synapses, suggesting that decreased auditory nerve activity compromises synaptic function.

Over the years, a number of studies have demonstrated that glial cells play a pivotal role as regulators of synaptic stability in response to brain damage (Bruce-Keller, 1999; Hanisch and Kettenmann, 2007). Particularly in the cochlear nucleus, bilateral cochlear ablation triggers a microglial activation process as early as 16 h post-lesion. This response peaks at 24 h when the intracellular levels of Iba1 are increased and microglial cells have adopted a phenotype characterized by irregularly shaped hypertrophic cell bodies with very few short processes (Fuentes-Santamaria et al., 2012). The present findings indicate that reduced sound transmission to the cochlea also triggers a microglial reaction in the cochlear nucleus that is already present by 1d following the lesion and reaches maximal levels by 4d post-lesion, a time point at which activated microglia are seen in apposition to activity-deprived auditory neurons. This functional relationship has also been corroborated by electron microscopic observations demonstrating that activated microglial processes are localized in the neuropil contacting nearby cellular elements following the lesion. These ultrastructural observations are in agreement with previous studies suggesting that microglia-neuronal interactions are critical to regulate synaptic activity (Skibo et al., 2000; Shapiro et al., 2009).

**[…] control animals.** NT-3 immunostaining (blue) and cell nuclei stained with DAPI (pseudo-colored red) are shown in **A–F** while CD11b […] nuclei were stained with DAPI (blue). Scale bar = 20μm in **F** (also applies to **A–R**); 20μm in **Y** (also applies to **S–X**).

One of the mechanisms used by glial cells to facilitate the exchange of cellular signals and restore synaptic homeostasis is to increase the production and release of growth factors and cytokines (Cullheim and Thams, 2007). In this regard, upregulation of IGF-1 and IL-1β levels in cochlear nucleus neurons but not in glial cells has been observed in adult rats at 1, 7, and 15 d post-ablation, suggesting that deprived auditory nucleus neurons do not require additional IGF-1 and IL-1β synthesis by glial cells to re-establish affected synaptic circuits (Fuentes-Santamaria et al., 2007, 2013). Neurotrophins are also signaling molecules expressed by neurons and microglia that serve trophic roles in the normal brain (Elkabes et al., 1996; Zhang et al., 2007). In agreement with our findings, NT-3 immunostaining in rats and gerbils is localized within the cell body cytoplasm as well as in the proximal dendrites and axon hillock of cochlear nucleus neurons in the adult and developing brain (Burette et al., 1998; Hafidi, 1999; Tierney et al., 2001; Hossain et al., 2002). Interestingly, recent findings in mice have provided evidence that NT-3 might also be expressed in primary cochlear afferents which are intermingled with the principal cells of the cochlear nucleus (Feng et al., 2010, 2012). Feng et al. (2010) observed that NT-3 staining was almost absent in the cytoplasm of cochlear nucleus neurons, and hence, they hypothesized that NT-3 might be released by supporting cells and inner hair cells in the inner ear, taken up by spiral ganglion neuron peripheral processes and transported anterogradely to their endings in the cochlear nucleus. It is possible that the dissimilar findings between the study of Feng et al. (2010) and the aforementioned studies in rats and gerbils, together with our observations, might be due to species differences or to the antibodies used by the different authors, which might recognize different epitopes.
Although in the current study we have not evaluated whether NT-3 is also expressed in synaptic endings from spiral ganglion neurons, we cannot rule out the possibility that the punctate staining observed in the neuropil and surrounding cochlear nucleus neurons is presynaptic in nature.

Neurotrophic factors are activity-dependent molecular signals that play an important role in promoting auditory nerve fiber growth and spiral ganglion neuron survival and in restoring synaptic function and structure in response to hearing loss (Suneja et al., 2005; Fukui et al., 2012). Increased neurotrophin levels have been found in the adult cochlear nucleus of guinea pigs at 7d after unilateral cochlear ablation and have been shown to contribute to synaptogenesis following cochlear damage (Suneja et al., 2005). Our results show that upregulation of neurotrophin levels within neurons occurs at 4d following UCHL, a time point at which the microglial response reached maximal levels. This upregulation, together with the fact that microglial cells did not increase NT-3 levels at any of the time points studied following the lesion, indicates that transient increases in NT-3, likely contributing to the preservation of synaptic function in the cochlear nucleus after UCHL, are part of activity-modulated neurotrophic mechanisms involving cochlear neurons, but not microglial cells.

**FIGURE 11 | Confocal images depicting the colocalization of NT-3 with the macroglial marker GFAP in the ipsilateral cochlear nucleus at 7d post-lesion in comparison to the contralateral side and control animals.** NT-3 immunostaining (green) and cell nuclei stained with DAPI (blue) are represented in **A–C**, while astroglial staining (red) is shown in **D–F**. Colocalization of astrocytic processes with NT-3 is shown in **G–I** (yellow arrows). Colocalizing cellular elements in **A–F** are indicated by white arrows. Scale bar = 20μm in **I** (also applies to **A–H**).

We did not find evidence of astrocyte activation at any of the time points included in this study. This suggests that these glial cells may not contribute directly to neuronal and synaptic adaptations to diminished activity in the cochlear nucleus after conductive hearing loss. The ultrastructural features of astrocytes described in this study are in agreement with those presented under normal conditions in different brain structures (Aoiki, 1992; Novikov et al., 2000). Astroglial reaction has been described in the cochlear nucleus after uni- or bilateral cochleotomy (Lurie and Rubel, 1994; De Waele et al., 1996; Campos Torres et al., 1999; Insausti et al., 1999; Lurie and Durham, 2000; Fredrich et al., 2013). In the adult rat, it reaches maximal levels by 7d post-ablation when degeneration and reactive synaptogenesis are in full progress. The fact that astrogliosis seems to be more protracted and less persistent than microglial activation has led to the hypothesis that differences in the temporal pattern of activation of both cell types may reflect cooperative interactions to facilitate adult synaptogenesis following deafness (Fuentes-Santamaria et al., 2013). The fact that we do not see an astroglial reactive component, at least up to day 15 post-lesion, may complement this hypothesis by providing evidence that perhaps neuronal degeneration, as occurs after cochleotomy, is required to unleash an astroglial reaction, whereas microglial cells react to adapt neurons and circuits to a broader range of lesion situations, ranging from attenuated activity to full-scale neuronal degeneration. Finally, although previous studies have shown that astrocytes might express NT-3 (Burette et al., 1998; Feng et al., 2012), our observations indicate that only a small subpopulation expresses this neurotrophin, both in control and experimental animals.

In summary, our results provide evidence that microglial cells, but not astrocytes, are involved in a transient response to diminished activity in the cochlear nucleus, likely aimed at preserving or adapting synaptic function. Although further studies are still required, these findings suggest that modulation of the microglial responses could be a pharmacological target of interest for the treatment of pathologies that induce hearing loss.

## **AUTHOR CONTRIBUTIONS**

All authors had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: Verónica Fuentes-Santamaría, Juan C. Alvarado. Acquisition of data: Verónica Fuentes-Santamaría, Juan C. Alvarado, Diego F. López-Muñoz, Pedro Melgar-Rojas and María C. Gabaldón-Ull. Statistical analysis and interpretation of data: Verónica Fuentes-Santamaría and Juan C. Alvarado. Drafting of the manuscript: Verónica Fuentes-Santamaría and Juan C. Alvarado. Critical revision of the manuscript for important intellectual content: Verónica Fuentes-Santamaría, Juan C. Alvarado, and José M. Juiz. Obtaining funding: Verónica Fuentes-Santamaría, Juan C. Alvarado, and José M. Juiz.

## **ACKNOWLEDGMENTS**

This study was supported by Programa I3 del Ministerio de Ciencia e Innovación (I320101590 to Verónica Fuentes-Santamaría and I320101589 to Juan C. Alvarado) and Ministerio de Ciencia e Innovación (BFU2012-39982-C02-02 to José M. Juiz).

## **REFERENCES**


in inducing growth of neuronal terminal arbors in mice. *J. Neurosci.* 22, 8034–8041. Available online at: http://www.jneurosci.org/content/22/18/8034.full


**Conflict of Interest Statement:** The Associate Editor, Monica Muñoz-Lopez, and the Review Editor, Alino Martinez-Marcos, declare that, despite being affiliated to the same institution as the authors, the review process was handled objectively and no conflict of interest exists. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 21 July 2014; accepted: 19 September 2014; published online: 13 October 2014.*

*Citation: Fuentes-Santamaría V, Alvarado JC, López-Muñoz DF, Melgar-Rojas P, Gabaldón-Ull MC and Juiz JM (2014) Glia-related mechanisms in the anteroventral cochlear nucleus of the adult rat in response to unilateral conductive hearing loss. Front. Neurosci. 8:319. doi: 10.3389/fnins.2014.00319*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2014 Fuentes-Santamaría, Alvarado, López-Muñoz, Melgar-Rojas, Gabaldón-Ull and Juiz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Insult-induced adaptive plasticity of the auditory system

## *Joshua R. Gold and Victoria M. Bajo\**

*Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK*

#### *Edited by:*

*Yukiko Kikuchi, Newcastle University Medical School, UK*

#### *Reviewed by:*

*Gregg H. Recanzone, University of California, USA William Sedley, Newcastle University, UK*

#### *\*Correspondence:*

*Victoria M. Bajo, Department of Physiology, Anatomy and Genetics, Sherrington Building, Parks Road, Oxford OX1 3PT, UK e-mail: victoria.bajo@dpag.ox.ac.uk*

The brain displays a remarkable capacity for both widespread and region-specific modifications in response to environmental challenges, with adaptive processes bringing about the reweighing of connections in neural networks putatively required for optimizing performance and behavior. As an avenue for investigation, studies centered around changes in the mammalian auditory system, extending from the brainstem to the cortex, have revealed a plethora of mechanisms that operate in the context of sensory disruption after insult, be it lesion-, noise trauma, drug-, or age-related. Of particular interest in recent work are those aspects of auditory processing which, after sensory disruption, change at multiple—if not all—levels of the auditory hierarchy. These include changes in excitatory, inhibitory and neuromodulatory networks, consistent with theories of homeostatic plasticity; functional alterations in gene expression and in protein levels; as well as broader network processing effects with cognitive and behavioral implications. Nevertheless, there abounds substantial debate regarding which of these processes may only be sequelae of the original insult, and which may, in fact, be maladaptively compelling further degradation of the organism's competence to cope with its disrupted sensory context. In this review, we aim to examine how the mammalian auditory system responds in the wake of particular insults, and to disambiguate how the changes that develop might underlie a correlated class of phantom disorders, including tinnitus and hyperacusis, which putatively are brought about through maladaptive neuroplastic disruptions to auditory networks governing the spatial and temporal processing of acoustic sensory information.

#### **Keywords: cochlea, auditory cortex, hearing loss, neural plasticity, tinnitus, peripheral insult**

The adult mammalian auditory system shows a remarkable degree of plasticity in a variety of contexts, and at a number of processing levels, manifesting as changes in the central representation of acoustic stimuli. These modulatory processes in the adult auditory brain are thought to be crucial to the performance and learning of ecologically relevant behaviors (Froemke and Martins, 2011; King et al., 2011), as well as being differentially affected during active and passive listening (Pienkowski and Eggermont, 2011). The types of changes observed in these contexts tend to be adaptive if behavior is assessed. However, as previously reviewed (Salvi et al., 2000; Syka, 2002), the mechanisms that develop following a sensorineural auditory insult may represent a unique assemblage, specific to abnormal or damaged sensory input. For example, cholinergic modulation of auditory cortical plasticity plays a key role in driving tonotopic reorganization (Kilgard and Merzenich, 1998) or training-related plasticity (Reed et al., 2011), including regaining acoustic spatial localization after unilateral conductive hearing loss (Leach et al., 2013). However, the cholinergic system is seemingly extraneous to cortical tonotopic plasticity following permanent cochlear damage (Kamke et al., 2005). Therefore, the various outcomes of auditory insults that affect normal cochlear function may constitute an important route for investigating the capacity of the mammalian brain for dealing with disrupted sensory inputs.

The burden posed by hearing-related trauma represents a significant challenge to healthcare services globally, with the rate of hearing impairment approaching ∼40% in some adult populations (Agrawal et al., 2008; Nondahl et al., 2011), while the risk of hearing impairment appears to be climbing amongst younger cohorts (Niskar et al., 2001). Each of these epidemiological observations suggests that hearing loss and auditory trauma-related symptoms are liable to affect larger proportions of the future population. There is thus impetus for investigating the underlying mechanisms that might be responsible for the peripheral and central changes that cause abnormalities in spectrotemporal processing, which in day-to-day life can have a substantial impact on speech perception and auditory scene analysis.

Furthermore, a link between auditory disruption and phantom percept generation has been appreciated for years (e.g., Axelsson and Sandh, 1985). More recently, interest in the possible central origins of such phantoms has been sparked by suggestions that maladaptive neuroplasticity may be responsible for these percepts (Eggermont and Roberts, 2004), even in the absence of any audiometrically identifiable hearing loss (Schaette and McAlpine, 2011). Clearly, the instantiation of tinnitus represents a problem: both for its human sufferers, substantially decreasing quality of life in some patients, particularly given the dearth of effective, broadly applicable treatments available (Baguley et al., 2013; Langguth et al., 2013); but also for researchers investigating the neurobiological mechanisms of the disease, particularly given the little consensus regarding the factors that induce and then maintain the disease chronically (Kaltenbach, 2011; Knipper et al., 2013; Noreña and Farley, 2013; Roberts et al., 2013).

To this end, the current review will seek to examine the confluence of factors related to auditory insults from a bottom-up perspective, addressing changes in peripheral function associated with mechanically-, acoustically-, pharmacologically-, and aging-mediated disturbances. The central consequences of these changes to hearing receptor function will then be evaluated: at the level of the single neuron, its local circuitry, as well as the global systemic effects likely to be responsible for behavioral changes that have been categorized in animal models of auditory trauma (**Table 1**). By occupying a non-hypothesis-driven position in considering the available data across a variety of fields, certain important functional commonalities and/or exceptions will hopefully come to light.

## **CHANGES IN COCHLEAR FUNCTION FOLLOWING PERIPHERAL INSULT**

Studies of hearing loss predominantly focus on damage inflicted upon the organ of Corti, with the outcomes of direct mechanical (e.g., Robertson and Irvine, 1989; Rajan et al., 1993) or thermal (e.g., Snyder et al., 2008) lesions of the cochlea and its afferents clear in terms of the extent of damage induced. The time scale of damage is also variable: it depends upon the idiosyncrasies of other deafening protocols, which lead to temporary (e.g., Calford et al., 1993; Syka and Rybalko, 2000; Dehmel et al., 2012b; but see Kujawa and Liberman, 2009) or permanent threshold shifts, or is indeed related to natural aging. Nonetheless, different experimental approaches appear to share certain commonalities that may be responsible for the induction of stereotypical responses in central processing stations.

#### **THE ANATOMY OF COCHLEAR INJURY**

Following an acutely induced acoustic overexposure, the degree of hair cell loss generally scales with the amplitude and duration of the insult. This usually manifests as extensive outer hair cell (OHC) death, with frequency-delimited loss of inner hair cells (IHCs) scaling with the trauma severity (Spongr et al., 1992). Lesion patterns following similar exposure protocols are variable, however. Descriptions have been made of equivalent outer and inner hair cell degeneration (Dolan et al., 1975; Kiang et al., 1975), as well as consistent damage to basally-located OHCs and IHCs regardless of the spectral content or duration of the acoustic trauma (Hawkins et al., 1976; Mulders et al., 2011).

By comparison, disruption of hair cell stereocilia has been observed to occur without inducing apoptosis of the affected hair cells (Liberman and Beil, 1979). This heterogeneous susceptibility to trauma appears to be conserved in certain models of ototoxicity, such as aminoglycoside antibiotic exposure (Forge, 1985), or that induced by platinum-derived cancer treatment drugs (specifically cisplatin and carboplatin) (Yorgason et al., 2006). Certainly, these treatments are recognized for their potentially devastating side-effects in human patients. Interestingly, while each of these latter, platinum-based agents leads primarily to OHC destruction, a lack of dose dependency has been reported, often leading to variable degrees of OHC damage for a range of exposure concentrations (Kaltenbach et al., 2002). Notable is the unique case of the chinchilla, which suffers a severe, and often complete, ablation of the IHC population by carboplatin (Wake et al., 1993; Takeno et al., 1994), affecting OHCs only when doses verge on systemic toxicity, at which point total hair cell losses are substantial (Kraus et al., 2009).

It is constructive to highlight an equivalency between losses developed following acute trauma, and those acquired during aging. Age-related changes affect the cochlea in a stereotyped, multifactorial manner (Schuknecht, 1964; Schuknecht and Gacek, 1993; Ohlemiller, 2004), with hearing deterioration being characterized according to the underlying pathology of the receptor organ. According to Schuknecht's revised categories of the pathology, specific cochlear components may undergo degradation. The types of presbyacusis typically referred to include sensory presbyacusis—a loss of hair cell or organ of Corti integrity; metabolic presbyacusis, indicative of aberrant strial physiology; neural presbyacusis, involving atrophy and/or apoptosis of the afferent spiral ganglion cell fibers in the cochlea; or some combination thereof.

In models of presbyacusis, the relative distribution of hair cell damage appears to vary between species and is often defined by greater OHC impairment (Keithley and Feldman, 1982; Spongr et al., 1997). During the onset of age-related peripheral pathology, progressive deterioration of a number of cochlear structures has also been described: in the stria vascularis (Di Girolamo et al., 2001; Hequembourg and Liberman, 2001) and in the spiral ligament (Ichimiya et al., 2000), as well as a metabolic deterioration of the organ of Corti, which may exacerbate the functional consequences of OHC prestin reduction (Buckiova et al., 2007; Bielefeld et al., 2008; Chen et al., 2009). Ultimately, each of these various cochlear pathologies seems to converge upon auditory neuropathy (Keithley and Feldman, 1979; Cohen et al., 1990; Dazert et al., 1996; White et al., 2000; Engle et al., 2013). This putatively common endpoint, involving complete deafferentation, is likely preceded by functional disturbance of the synaptic interface between spiral ganglion afferents and IHCs (Stamataki et al., 2006; Sergeyenko et al., 2013).

A similar template of neuropathy, in particular concerning damage at the interface between IHCs and auditory nerve fibers (ANFs), has been observed in acute experimentally-induced hearing loss, with regions of IHC loss correlated with reactive swelling and peripheral deafferentation of an equivalent spiral ganglion cell (SGC) population (Feng et al., 2012). Akin to the description above, degeneration of SGC perikarya occurs over the weeks to months following trauma, as does selective stria vascularis pathology (Wang et al., 2002). Similarly, inspection of drug ototoxicity has unveiled the development of afferent fiber terminal vacuolization in advance of any discernible IHC damage (Wang et al., 2003). Systemic toxicity is probably responsible for certain aspects of peripheral trauma, although the retraction of peripheral afferents following trauma-mediated terminal swelling has been postulated to be associated with disruption of the IHC ribbon synapse, a specialized glutamatergic structure necessary for synchronized auditory nerve activity (Buran et al., 2010). Importantly, under traumatic conditions, the ribbon synapse structure is capable of mediating an excitotoxic injury of afferent dendrites (Puel et al., 1998; Pujol and Puel, 1999).

Ribbon synaptopathy has been noted in the absence of overt cellular morphological disruption (Liu et al., 2013), with recent evidence indicating that it may well be a predictable consequence of cochlear trauma (Rüttiger et al., 2013; Singer et al., 2013).









Observations of ribbon synaptopathy following even mild acoustic exposure (Maison et al., 2013) are particularly compromising from a functional perspective, since cochlear degeneration may have unwanted consequences at much later time points after the initial insult (Kujawa and Liberman, 2006, 2009; Sergeyenko et al., 2013) (**Figure 1**). How, then, are the various structural consequences of cochlear trauma to be reconciled with functional changes occurring in auditory nerve afferent signaling?

## **THE ASSESSMENT AND OUTCOMES OF PERIPHERAL TRAUMA**

Audiogram-based assessment is often capable of detecting hearing loss defined in terms of OHC dysfunction and/or threshold elevation, and is used extensively to define the consequences of cochlear trauma as "permanent" or "temporary" threshold shifts. Typically, this is via inspection of waveforms of the auditory brainstem response (ABR) or auditory nerve fiber compound action potential (CAP), in conjunction with distortion product otoacoustic emissions (DPOAEs), or simply via self-reporting in human patients. By contrast with the threshold losses mediated by mixed OHC/IHC lesions (Cody and Johnstone, 1980; Liberman and Dodds, 1984a) or OHC loss alone (Dallos and Harris, 1978), the pattern of extensive cochlear damage induced by carboplatin administration in the chinchilla manifests as a time-dependent reduction in the CAP amplitude, often without pathological modulation of DPOAEs or ABR synchrony (El-Badry and McFadden, 2009), and only minor elevation (∼9 dB) of electrophysiological thresholds (El-Badry and McFadden, 2007). This is reflected functionally in the preservation of behavioral audiograms collected from carboplatin-exposed chinchillas (Lobarinas et al., 2013b). It is certainly concerning that IHC pathology can extend to include up to 80% of the cochlear population, without similarly drastic changes in measures provided by commonly used diagnostic tools.

Similarly, long-term degeneration of SGCs has been noted in aged mice with intact IHCs and permanently raised ABR thresholds (Kujawa and Liberman, 2006). More recently, a type of insidious pathology has been characterized following an apparently limited, acute hearing loss, previously thought to have minimal ongoing ramifications at the periphery (**Figure 1**). Inspection of mouse cochleae without evidence of raised thresholds (on ABRs or DPOAEs), and without any overt structural damage to OHC or IHC populations, revealed IHC-specific ribbon synaptopathy within 24 h of acoustic trauma, manifesting as suprathreshold reductions in ABR amplitude at the affected cochleotopic frequencies (Kujawa and Liberman, 2009). More surprisingly, late-stage (>2 years) cochleography exposed a reduction of up to 50% in SGC density, regardless of the apparent health of the organ of Corti. In addition to these findings being replicated in the guinea pig (Lin et al., 2011), the afferent degeneration that arises in this "silent" deafness seems only to affect those auditory nerve fibers encoding high-intensity stimuli (Furman et al., 2013).

Diagnostic issues would undoubtedly be associated with detecting this type of selective insult to high-threshold, low-spontaneous-rate fibers, especially since classical studies have correlated hearing loss with reductions in afferent spontaneous and evoked firing rates (Lonsbury-Martin and Meikle, 1978; Liberman and Dodds, 1984b). Recent demonstration that the phase-locking capacity of the auditory nerve fibers is diminished in the presence of low-level noise following acoustic trauma (Henry and Heinz, 2012, 2013) suggests that the functional ramifications of high-threshold fiber pathology are liable to be complex and context-specific. Given the multifaceted peripheral sequelae of cochlear trauma, it is important that comprehensive evaluation of auditory afferent function is performed to fully characterize the extent of cochlear pathology following auditory trauma, both clinically and experimentally (e.g., Ahlf et al., 2012; Dehmel et al., 2012a; Gu et al., 2012; Rüttiger et al., 2013).

Of particular use would be the routine implementation of suprathreshold evoked potential-based measurements of peripheral and brainstem auditory function. The collection of ABR data at multiple intensities for each subject (prior to and following trauma in animal experiments) would allow auditory thresholds to be established (either generally, in response to broadband click-train stimuli; or at greater spectral resolution by using tonal or narrowband-passed noise stimuli). Moreover, possible sites of lesion-mediated plasticity could be ascertained by evaluating differences in waveform amplitudes and latencies (either between the trauma population and matched controls, or in repeat measurements of the same subjects under different treatment conditions). Given that ABRs can be obtained repeatedly, they represent a convenient way of tracking, longitudinally, the immediate and longer term effects of commonly-implemented trauma protocols. The collection of such data would greatly improve our understanding of the impact of trauma on brainstem neural populations. For example, data of this kind have been reported in human tinnitus patients with and without audiometric threshold elevations, and have been integral to our ongoing understanding of that pathology (Attias et al., 1996; Kehrle et al., 2008; Schaette and McAlpine, 2011; Gu et al., 2012) (see below).
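The logic of collecting ABR data at multiple intensities can be illustrated with a minimal sketch. It assumes a simple threshold criterion (the lowest level at which a wave amplitude reaches a fixed multiple of the noise floor) and summarizes suprathreshold behavior as the least-squares slope of the amplitude growth function. The function names, the criterion value, and all amplitudes are hypothetical and synthetic, chosen only to show how a "hidden" loss could leave the threshold unchanged while flattening amplitude growth; they are not taken from any of the cited studies.

```python
from statistics import mean


def abr_threshold(levels_db, amplitudes_uv, noise_floor_uv, criterion=2.0):
    """Return the lowest level whose amplitude reaches criterion x noise floor.

    Returns None when no tested level meets the (assumed) criterion.
    """
    for level, amplitude in sorted(zip(levels_db, amplitudes_uv)):
        if amplitude >= criterion * noise_floor_uv:
            return level
    return None


def growth_slope(levels_db, amplitudes_uv, from_level_db):
    """Least-squares slope (uV per dB) of amplitude growth at and above a level."""
    points = [(l, a) for l, a in zip(levels_db, amplitudes_uv) if l >= from_level_db]
    if len(points) < 2:
        raise ValueError("need at least two suprathreshold points")
    mean_level = mean(l for l, _ in points)
    mean_amp = mean(a for _, a in points)
    numerator = sum((l - mean_level) * (a - mean_amp) for l, a in points)
    denominator = sum((l - mean_level) ** 2 for l, _ in points)
    return numerator / denominator


# Synthetic wave amplitudes (uV) for one hypothetical subject, before and after trauma.
LEVELS_DB = [20, 30, 40, 50, 60, 70, 80]
PRE_UV = [0.05, 0.12, 0.30, 0.50, 0.70, 0.90, 1.10]
POST_UV = [0.05, 0.12, 0.22, 0.32, 0.42, 0.52, 0.62]
NOISE_FLOOR_UV = 0.05

pre_threshold = abr_threshold(LEVELS_DB, PRE_UV, NOISE_FLOOR_UV)
post_threshold = abr_threshold(LEVELS_DB, POST_UV, NOISE_FLOOR_UV)
pre_slope = growth_slope(LEVELS_DB, PRE_UV, pre_threshold)
post_slope = growth_slope(LEVELS_DB, POST_UV, post_threshold)

print(f"threshold pre/post: {pre_threshold} / {post_threshold} dB")
print(f"growth slope pre/post: {pre_slope:.4f} / {post_slope:.4f} uV/dB")
```

In this toy example the threshold is identical before and after exposure, whereas the growth slope is roughly halved, which is the kind of dissociation that threshold-only audiometry would miss.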

## **NEUROCHEMICAL AND STRUCTURAL CHANGES INDUCED BY AUDITORY DAMAGE**

Induced damage to the peripheral hearing organ causes physiological disruption of neurons at various levels along the auditory neuraxis, affecting not only neurotransmitter communication mechanisms, but also intracellular signaling pathways and metabolism. Although the precise functional impact is yet to be fully elucidated, given the complexity of the central auditory system, plus the extent and temporal features of the various insults examined experimentally, examining how cellular-level processes are affected may enhance our understanding of the higher-order systemic or perceptual outcomes of auditory trauma.

#### **ACTIVITY-RELATED GENE MODULATION**

The expression of certain immediate early genes (IEGs), as a surrogate for neuronal metabolic state, has revealed time-varying modulation patterns in the auditory cortex following bilateral cochlear ablation that indicate the short time course necessary for insult-induced genetic mobilization in auditory neurons (Oh et al., 2007). *c-fos* is an IEG induced by stimulus novelty (e.g., Ehret and Fischer, 1991; Rouiller et al., 1992), thought to link neuronal activity with gene transduction associated with plasticity and learning (Carretta et al., 1999; Tischmeyer and Grimm, 1999). In the auditory cortex, *c-fos* is expressed within 1 h following acoustic trauma (Wallhäusser-Franke et al., 2003), in a diffuse manner indicating an acute, widespread (non-tonotopic) hyperreactivity within the cortex (Mahlke and Wallhäusser-Franke, 2004). Enhancements of *c-fos* expression levels were also detected over similar time frames at key subcortical auditory centers (including IC and DCN) following acoustic trauma, suggesting a feedforward upregulation of neural activity throughout the central auditory system.

As an alternative method of modifying cochlear activity, salicylate administration at high doses has been noted to affect hearing thresholds (Cazals, 2000), simultaneously elevating spontaneous activity (Evans and Borerwe, 1982) and diminishing driven response rates (Sun et al., 2009a; Stolzberg et al., 2011) within the auditory nerve. Appropriately, *c-fos* labeling after systemic salicylate injection yielded limited labeling in the non-lemniscal dorsal cortex and external nucleus of the IC (Wallhäusser-Franke, 1997; Mahlke and Wallhäusser-Franke, 2004), with more consistent, dose-dependent expression found at the primary auditory cortex (A1) and anterior auditory field (AAF) (Wallhäusser-Franke et al., 2003).

More recently, region-specific insult-induced changes in the expression of another IEG, Arg3.1/Arc (activity-regulated gene/activity-regulated cytoskeleton-associated protein), have been investigated. Arg3.1/Arc mobilization putatively occurs through brain-derived neurotrophic factor (BDNF)-mediated activation of the MEK-ERK signaling pathway (Ying et al., 2002). Components of this pathway are upregulated in the auditory brainstem following cochlear ablation (Suneja and Potashner, 2003; Suneja et al., 2005) and acoustic trauma (Tan et al., 2007). On examining the effects of salicylate and acoustic trauma on Arg3.1/Arc expression in the adult gerbil, marked divergence from the pattern of *c-fos* labeling is seen, with Arg3.1/Arc upregulation occurring in cortex only (Mahlke and Wallhäusser-Franke, 2004). Furthermore, dissimilarities between the tonotopic and laminar arrangements of Arg3.1/Arc expression were qualitatively related to the type of insult. Systemic salicylate injection enhanced Arg3.1/Arc expression in the high-frequency domain of A1, but predominantly outside of layer IV, thereby suggesting its effects are less likely to be mediated by substantial thalamic input. By contrast, moderate acoustic trauma (at an intensity sufficient to produce long-term cochlear synaptic and afferent dysfunction) yielded a non-layer-specific Arg3.1/Arc expression pattern in a tonotopically constrained manner, effectively matching the cochlear trauma profile (Mahlke and Wallhäusser-Franke, 2004).

In more recent studies of acoustic trauma, it was intriguing that those exposed subjects with demonstrable IHC ribbon synapse degradation displayed Arg3.1/Arc mobilization that appeared to be downregulated rather than upregulated (Tan et al., 2007; Rüttiger et al., 2013; Singer et al., 2013) (**Figure 2**). Indeed, this bidirectional Arg3.1/Arc modulation is inversely correlated with the degree of BDNF and *c-fos* expression in the spiral ganglion, for periods spanning 3 h to 6 days after acoustic exposure (Tan et al., 2007); Arg3.1/Arc was downregulated in the cerebral cortex for at least 14–30 days thereafter (Rüttiger et al., 2013; Singer et al., 2013). Although the functional consequences of long-term up- or downregulation following auditory trauma remain uncertain, Arg3.1/Arc is thought to play a key role in controlling AMPA-type glutamate receptor distribution during homeostatic plasticity (e.g., Shepherd et al., 2006; Bramham et al., 2010; Gao et al., 2010; Béïque et al., 2011).

#### **EXCITATORY SIGNALING FUNCTION**

On the basis of those data, it might be expected that significant alterations in glutamatergic function would be observed after the auditory insult; however, characterization of glutamate receptor pharmacology in the deafferented auditory cortex remains conspicuously underrepresented in the literature. As such, the observed modulations in Arg3.1/Arc mobilization (reviewed in Knipper et al., 2010, 2013) might be functionally linked with those changes in cortical neurotransmission and neuronal response properties under similar exposure conditions, or rather might simply be correlated with the induction of certain forms of hearing loss, without attendant regulation of homeostatic plasticity mechanisms (Bramham et al., 2010).

**FIGURE 2 | Changes in the auditory cortex expression of Arg3.1/Arc and cochlear ribbon synapse density related to the degree of acoustic trauma. (A)** Following exposure to acoustic trauma that renders non-permanent threshold elevation (100 dB SPL, 2 h), reverse-transcriptase polymerase chain reaction (RT-PCR) analysis reveals Arg3.1/Arc expression is significantly upregulated in the auditory cortex; **(B)** by contrast, animals overexposed at 115–119 dB SPL for 2 h displayed marked downregulation of Arg3.1/Arc transcripts. Modified with permission from Tan et al. (2007). <sup>∗</sup>*p* ≤ 0.05. **(C,D)** This relative downregulation is illustrated by immunohistochemical labeling of Arg3.1/Arc mRNA in the auditory cortex of animals exposed to a similar trauma (120 dB SPL, 2 h). Glutamate decarboxylase puncta are also indicated. Modified with permission from Singer et al. (2013). When animals were acoustically overexposed for either **(E)** 1 h or **(F)** 1.5 h, staining in the cochlea for IHC ribbon synapses (CtBP2, green, open arrows) indicated a subset of animals with marked reduction in ribbon synapse density, illustrated here for the mid-basal turn of the cochlea. IHC nuclei are labeled with DAPI (blue, circled); glutamate receptor subunit GluR4 protein labeled (red, open arrowheads). Scale bars 10 μm. **(G,H)** For each exposure protocol, a subset of animals could be categorized according to significant proportional reductions in midbasal and basal cochlear IHC ribbon synapse densities, relative to exposed animals without this pathology. Open bars = controls; filled bars = exposed. Numbers in each bar correspond to respective *n*. Modified with permission from Rüttiger et al. (2013). <sup>∗</sup>*p* ≤ 0.05; <sup>∗∗</sup>*p* ≤ 0.01; <sup>∗∗∗</sup>*p* ≤ 0.001.

Changes in glutamatergic metabolism have been better documented subcortically, such as the small, non-significant reduction of glutamate concentrations in the medial geniculate body (MGB), inferior colliculus (IC) and cochlear nucleus (CN) of a rodent model of presbyacusis (Banay-Schwartz et al., 1989b). Cochlear ablation, by contrast, induces enhanced glutamate metabolism, manifesting as elevated release and uptake, and concomitant with a putative compensatory strengthening of excitatory processing following extensive afferent fiber death (Potashner et al., 1997; Illing et al., 2005), as levels of glutamate and aspartate in the CN correlate with the degree of cochlear damage (Godfrey et al., 2005, 2014). While the proportional expression of vesicular glutamate transporter (VGLUT) subtypes has been reported to undergo adjustment in the insult-exposed CN (Fyk-Kolodziej et al., 2011), work specifically directed at such changes in the dorsal cochlear nucleus (DCN) has unveiled marked crossmodal reweighing. By assigning VGLUT subtypes 1 and 2 to auditory and non-auditory inputs, respectively (Zhou et al., 2007), the upregulated representation of non-auditory afferents in the fusiform layer of the DCN following cochlear trauma was revealed (Zeng et al., 2009, 2012).

Furthermore, in the short term, mRNAs encoding glutamatergic AMPA and NMDA-type receptor subunits are reduced in the central nucleus of the IC (CNIC) contralateral to a partial mechanical lesion of the cochlea (Dong et al., 2009, 2010a). Although these changes are only transient, they may be sufficient to more chronically disrupt cellular excitability elsewhere in the central auditory system. Such an outcome might follow in parallel within the auditory brainstem, given the reported remodeling of AMPA binding and NMDA-receptor subunit expression in the ventral cochlear nucleus (VCN) (Förster and Illing, 1998; Suneja et al., 2000). In spite of these efforts, many details of post-traumatic glutamate pharmacology remain to be elucidated.

#### **INHIBITORY SIGNALING FUNCTION**

A more comprehensive understanding has been gained of inhibitory changes, which, in addition to the reactive changes in glutamate release and signaling subcortically, may precipitate a disruption of the balance of excitation and inhibition. Even at the earliest stages of auditory processing, inhibitory inputs are affected, failing to recover to pre-lesion levels in the ventral CN (Hildebrandt et al., 2011). In line with these disruptions, reductions in the density of glycine-immunoreactive puncta (Asako et al., 2005) and glycine receptor subunit protein (Wang et al., 2009a) have each been reported throughout the cochlear nucleus (CN). Such changes develop alongside altered strychnine binding and glycine metabolism in the ipsilateral CN and superior olivary complex, earlier described in the deafened guinea pig (Suneja et al., 1998a,b) and rat (Buras et al., 2006), suggesting a sustained post-deafening hyperexcitability (Potashner et al., 2000). These inhibitory disruptions occurring in a number of auditory brainstem nuclei are, perhaps, not unexpected, since previous work has demonstrated significant reductions in both the levels of glycine (Banay-Schwartz et al., 1989a; Willott et al., 1997; Wang et al., 2009b) and glycine/strychnine binding sites (Milbrandt and Caspary, 1995; Willott et al., 1997) in rodent models of presbyacusis.

Possibly the most heavily scrutinized locus for such representative disruptions to the auditory pathway is the inferior colliculus (reviewed in Caspary et al., 2008; Ouda and Syka, 2012). Aged rodents display decreases in free GABA concentrations throughout the IC (Banay-Schwartz et al., 1989b), as well as in evoked GABA release and GABAergic cell density (without modification of GABAergic cellular morphology) (Caspary et al., 1990), and reduced labeling for GAD65 and GAD67 (Burianova et al., 2009). GAD67 is also reduced following acute, complete deafferentation (Argence et al., 2006), although when the extent of damage is limited to certain frequency domains, the downregulation of transcripts coding for GABAA receptor subunit 1 and GAD1 was matched bilaterally by an effective reduction in GABAergic receptor protein expression in a tonotopically limited fashion (Dong et al., 2010b). It is noteworthy that GAD disruptions may occur only transiently (Milbrandt et al., 2000), indicating the possibility for the metabolic recovery of GABAergic signaling if trauma severity is limited. Indeed, enhanced collicular GABA immunoreactivity (Tan et al., 2007) (**Figure 2**) and GAD65 levels (Bauer et al., 2000) each have been described, possibly compensating for elevated network activity following auditory trauma.

A comparable GABAergic dysregulation is also seen cortically. A reduction in free GABA concentration is found in aged rat cortex (Banay-Schwartz et al., 1989b), and is possibly related to reductions in the levels of GAD65/67 mRNA, and GAD67 protein across cortical layers. Those observations were reported in two different rat strains that, importantly, display divergent peripheral symptoms as a function of aging but shared central inhibitory sequelae (Ling et al., 2005; Burianova et al., 2009). Indeed, lower GABA density throughout the auditory cortex appears to be codetermined by age and hearing loss, since GAD65 expression is also reduced after acoustic trauma in younger adult animals (Yang et al., 2011). Moreover, the receptor subunit composition of GABAA receptors is heterogeneously modified in aged rats (Schmidt et al., 2010), with a 30% reduction in the wildtype α1 subunit compared with young controls, as well as in certain subtypes of β- and γ-subunits (Caspary et al., 2013). In total, then, despite the picture of post-traumatic auditory neurochemistry standing broadly incomplete, certain characteristic changes appear to occur regardless of insult type. The extent to which these affect physiological function and neural response properties will therefore require examination. Additionally, an improved understanding of the neuropharmacological correlates of auditory trauma would be beneficial in the development of patient treatment related to the compensation for those changes (Schreiber et al., 2010; Mukherjea et al., 2011).

## **NEURONAL SPONTANEOUS AND EVOKED RESPONSE PROPERTIES UNDERGO PLASTIC MODULATION DURING AND AFTER AUDITORY INSULTS**

It is worthwhile considering that, even in the absence of acoustic stimulation, the net firing rate of the auditory nerve is particularly high in healthy animals (e.g., Kiang et al., 1975; Liberman and Dodds, 1984b); as such, cochlear exposure to various insults affecting peripheral and/or central function (particularly pharmacological and age-related) might be predicted to have profound effects on baseline levels of activity throughout the auditory system. Consequently, it is important to evaluate how single cells in the auditory system, when faced with abnormal synaptic inputs, may dynamically modify their activity and, by association, their properties related to the processing of acoustic stimuli.

#### **AUDITORY NERVE**

The activity of auditory nerve fibers (ANFs) changes in response to various experimental insults. Interpretation of data concerning post-traumatic amplitude transfer functions and temporal coding under diminished hearing conditions has been informed by classical physiological studies. In early experiments, wholesale mechanical cochlear ablation was shown to yield a complete loss of spontaneous activity within the auditory nerve and its afferent terminals within the ventral cochlear nucleus (VCN) (Koerber et al., 1966). This ultimately produced substantial fiber loss and density reduction throughout the VCN (Gentschev and Sotelo, 1973; Morest and Bohne, 1983; Morest et al., 1997). Where cochlear damage is less than complete, it is apparent that some ANFs remain functionally normal, even if the likelihood of recording from an active fiber declines with the degree of IHC damage (Wang et al., 1997). In such cases, functional normality is apparently contingent upon modification of the electrical properties of individual axons (Fryatt et al., 2011), and the survival of the OHC population. The latter is particularly necessary in maintaining sharp, low threshold tuning at the fibers' respective characteristic frequencies (CF) (Kiang et al., 1975; Dallos and Harris, 1978).

Nerve response properties may nevertheless be affected in other ways. By chronically monitoring the CAP over a period of weeks, long term (≤5 weeks post-exposure) reductions in the maximum evoked CAP magnitude can be observed for chinchillas with selective IHC ablation (Qiu et al., 2000) (**Figure 3**). Since the CAP represents the summed activity of multiple fibers, reduction in the steepness (or gain) of the nerve's rate-level function (RLF) may ostensibly derive from either, or each, of two sources: a similar reduction in gain of the RLFs of individual fibers; or a loss of fibers encoding high intensity sounds (Liberman, 1982; Taberner and Liberman, 2005). Each is likely, particularly in conditions of permanent threshold elevation and the associated cochlear pathophysiology. Certainly, acoustic exposure, yielding a range of threshold elevations, was reported to mediate a reduction in the gain of individual fiber responses to a spectrally diverse range of stimuli, regardless of the fiber's baseline spontaneous firing rate (SFR) (Heinz and Young, 2004; Heinz et al., 2005).
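The second of these two sources, a selective loss of fibers encoding high intensities, can be made concrete with a toy population model. The sketch below assumes piecewise-linear rate-level functions and uses the summed driven rate as a crude stand-in for CAP magnitude (the real CAP reflects synchronized onset activity, which this ignores). The `Fiber` class, the two fiber classes, and every parameter value are illustrative assumptions, not measurements from the cited work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fiber:
    """Hypothetical auditory nerve fiber with a piecewise-linear rate-level function."""
    threshold_db: float
    dynamic_range_db: float
    spont_rate: float  # spikes/s at rest
    max_rate: float    # spikes/s at saturation

    def rate(self, level_db: float) -> float:
        # Fraction of the dynamic range covered, clipped to [0, 1].
        fraction = (level_db - self.threshold_db) / self.dynamic_range_db
        fraction = min(1.0, max(0.0, fraction))
        return self.spont_rate + (self.max_rate - self.spont_rate) * fraction


def population_driven_rate(fibers, level_db):
    """Summed firing above spontaneous rate: a crude proxy for CAP magnitude."""
    return sum(f.rate(level_db) - f.spont_rate for f in fibers)


def population_gain(fibers, low_db, high_db):
    """Average slope of the population rate-level function between two levels."""
    delta = population_driven_rate(fibers, high_db) - population_driven_rate(fibers, low_db)
    return delta / (high_db - low_db)


# Assumed fiber classes: low-threshold/high-spontaneous vs high-threshold/low-spontaneous.
low_threshold = [Fiber(0.0, 30.0, 60.0, 250.0)] * 6
high_threshold = [Fiber(30.0, 50.0, 1.0, 201.0)] * 4

intact = low_threshold + high_threshold
lesioned = low_threshold  # selective loss of the high-threshold fibers

near_threshold_intact = population_driven_rate(intact, 10.0)
near_threshold_lesioned = population_driven_rate(lesioned, 10.0)
gain_intact = population_gain(intact, 40.0, 80.0)
gain_lesioned = population_gain(lesioned, 40.0, 80.0)

print(f"driven rate at 10 dB: {near_threshold_intact:.1f} vs {near_threshold_lesioned:.1f}")
print(f"gain from 40 to 80 dB: {gain_intact:.1f} vs {gain_lesioned:.1f} spikes/s per dB")
```

Under these assumptions the near-threshold response is unaffected by the lesion, while the population response stops growing at high levels. A uniform reduction in the gain of every fiber would also flatten the summed function, so the two sources cannot be separated from the population response alone.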

By contrast, even if cochlear trauma is minimal, such as following recovery from an acoustically-induced "temporary" hearing loss, a frequency-selective deactivation of ANFs responsive up to high intensities may occur, causing a shift in the distribution of ensemble spontaneous activity toward higher firing rates (>20 sp/s) (Furman et al., 2013) (**Figure 1**). This pattern of deafferentation has been associated with a depression in IHC ribbon synapse density (see above), and has been observed to arise in presbyacusis (Schmiedt et al., 1996). Thus, cochlear trauma is potentially linked with a degradation of high intensity coding beginning at the level of the auditory nerve, in spite of possible threshold preservation. As a matter of comparison, although the ototoxic and threshold-elevating aspects of salicylate administration are well-documented (Cazals, 2000), its effects on spontaneous rate are ambiguous. Auditory nerve hyperactivity has been reported following salicylate infusion (e.g., Ruel et al., 2008) or systemic treatment (Evans and Borerwe, 1982; Cazals et al., 1998). From the available data, however, this phenomenon may simply be a by-product of extremely high systemic concentrations, far beyond those generally observed in human patients (Stolzberg et al., 2012).

### **VENTRAL AND DORSAL COCHLEAR NUCLEUS**

As a major site of ANF afferent termination, neurons in the VCN would be expected to display profound changes in activity, putatively compensating for the proposed reduction in peripheral drive. Given the high SFRs of ANFs, even the "silent" loss of high-threshold (low spontaneous rate) fibers might have profound effects on the baseline activity of VCN cells. SFRs of cells in the VCN appear to be fundamentally determined by ANF activity (Koerber et al., 1966). Although it is marked by a frequency-specific increase in threshold to tonal stimulation and apparent loss of evoked sensitivity in mouse presbyacusis (Willott et al., 1991), the VCN nevertheless may display an increase in contralaterally driven activity (Bledsoe et al., 2009).

The idea that the VCN expresses post-lesion response enhancement through plastic remodeling is intriguing. In the case of a partial cochlear lesion achieved mechanically or acoustically, a significant elevation of spontaneous discharge developed (Vogler et al., 2011). Interestingly, only in the former case was this increase limited cochleotopically, and in terms of the cell subpopulations concerned, with primary-like and onset response-type neurons displaying the largest enhancements (Vogler et al., 2011). This seems to confirm the reported enhancement of tone-evoked activity by transient acoustic trauma in those neural types (Boettcher and Salvi, 1993). By comparison, when identified units of the VCN were recorded following permanent hearing loss, primary-like neurons in particular displayed diminished rate-level function (RLF) gain, whereas non-primary chopper type cells consistently exhibited monotonic steep RLFs (Cai et al., 2009). This remodeling of neuronal input/output functions is potentially a consequence of diminished inhibitory function, in conjunction with adaptive hyperactivity after diminished afferent drive. If this is the case, the VCN may be a major station for providing an augmented feedforward representation of cochlear activity to higher levels of the auditory pathway (Noreña, 2011; Noreña and Farley, 2013).

As the other major subdivision of the CN, and thus in receipt of ANF afferent terminals, neurons in the dorsal cochlear nucleus (DCN) have been extensively investigated for changes in excitability under a variety of traumatic conditions. A significant SFR increase in DCN cells develops within a few (2–5) days following cochlear trauma, and persists for weeks thereafter (Kaltenbach and McCaslin, 1996; Zhang and Kaltenbach, 1998; Kaltenbach and Afman, 2000; Kaltenbach et al., 2000; Chang et al., 2002; Finlayson and Kaltenbach, 2009; but see Ma and Young, 2006). The intrinsic excitability of fusiform cells (the major excitatory output from the DCN) is elevated bilaterally even after unilateral noise exposure (Brozoski et al., 2002). This activity change mirrors those seen in the response profiles of fusiform cells (Caspary et al., 2005) and glycinergic cartwheel cells (Caspary et al., 2006) during presbyacusis onset in rats. Although amplified responses may be derived from systemic inhibitory dysfunction (Middleton et al., 2011), the existence of a correlation between OHC disruption and DCN spontaneous discharge indicates that the onset of hyperactivity within the DCN is probably the result of a confluence of factors (Kaltenbach et al., 2002; Rachel et al., 2002).

### **INFERIOR COLLICULUS**

At the level of the inferior colliculus (IC) in the auditory midbrain, the predictability of trauma-related neuronal excitability modulations becomes far less concrete, with outcomes varying as a function of exposure mechanism and recovery time. Spontaneous firing rate (SFR) elevations have been characterized in the days-to-weeks following an acoustic trauma (Dong et al., 2009; Mulders and Robertson, 2009; Manzoor et al., 2013; Robertson et al., 2013), complete cochlear deafferentation (Moore et al., 1997) and acutely after high-dose salicylate injection (Jastreboff and Sasaki, 1986; but see Ma et al., 2006). Generally, SFR elevation develops in a manner that is correlated with those cochlear regions displaying equivalent bandpassed CAP threshold increases, with IHC and/or OHC loss (Willott et al., 1988a; Mulders et al., 2011). The origin(s) of this hyperexcitability remain under debate, however. First, the peripheral origin of spontaneous rate enhancement appears to be contingent on some degree of OHC ablation only. This is suggested by carboplatin injection in chinchillas (selectively lesioning the IHC population), which had no effect on SFR of IC neurons 7–9 months after trauma. By contrast, such elevations did emerge following acoustic or cisplatin trauma, each of which produces some degree of OHC damage (Bauer et al., 2008).

Moreover, sectioning of the cochlear nerve has been reported to normalize trauma-related hyperactivity in the IC central nucleus (CNIC) (Mulders and Robertson, 2009), as has lesion of the DCN (Manzoor et al., 2012). These data suggest that the IC may chiefly adopt the evolved, persistent hyperactivity of the DCN (Zacharek et al., 2002; Zhang et al., 2006). Nevertheless, longer term observations have revealed that the IC may transition through a period of SFR lability prior to a more permanent acquisition of hyperactivity. By contrast with ipsilateral cochlear nerve lesions performed at 2–6 weeks after acoustic trauma, when identical auditory neurectomy was repeated at 8–12 weeks, IC hyperactivity remained (Mulders and Robertson, 2011). Evidently, then, the IC represents another auditory centre that displays compensatory upregulation of resting activity in response to trauma (Robertson et al., 2013).

Available data remain controversial regarding the evoked input/output gain function changes in IC neurons. In the extreme case of complete cochlear ablation, little-to-no effect upon the response properties of the contralateral CNIC during acoustic stimulation of the intact ear was initially reported (Nordeen et al., 1983). Strong response enhancement has, however, more recently been demonstrated, when the impacts of similar cochlear lesion models were investigated acutely and after three months (Popelár et al., 1994; McAlpine et al., 1997). In addition to an enhancement of the CNIC neural responses to ipsilateral acoustic stimulation, an increase in evoked activity on the contralateral side develops in the next three months after the cochlear insult. The dynamic range of recorded units was also found to be selectively broadened. This adaptive reweighing of collicular responses to favor input from the remaining ear is thought to be motivated by contralateral deafferentation removing the afferent drive to which these cells are predominantly responsive.

How, then, do the collicular cells respond to acoustic stimulation following an incomplete lesion? While response thresholds may be transiently elevated, largely as a function of OHC damage or apoptosis (Salvi et al., 1990), in the case of OHC preservation, threshold shifts are undetectable (McFadden et al., 1998; Alkhatib et al., 2006). When suprathreshold stimulation levels are considered, whereas maximum response amplitude enhancement has been documented (Willott and Lu, 1982; Salvi et al., 1990; McFadden et al., 1998), more often the loss of IHC integrity appears to engender a reduction in neuronal suprathreshold responsivity (McFadden et al., 1998; Qiu et al., 2000; El-Badry and McFadden, 2007) (**Figure 3**). This is akin to the effects of cochlear administration of salicylate (Sun et al., 2009a), which is thought to be active against inhibitory pharmacology (Su et al., 2009). In addition, there is a reduction in the proportion of non-monotonic-type rate-level functions (Alkhatib et al., 2006), a response pattern that is characteristically reliant upon normal inhibitory activity. In presbyacusis-affected C57BL/6 mice, similar effects were dependent on cochlear threshold elevation (Willott et al., 1988b), since rate-level function steepening did not emerge in aged CBA/J mice that displayed less marked cochlear degeneration (Willott et al., 1988a). This correlation between peripheral disruption and alterations of rate-level function statistics, plus the preponderance of depressed suprathreshold evoked potential amplitudes, seems to support the conclusion that loss of peripheral normality may selectively depress inhibitory function within the IC.

### **AUDITORY THALAMUS**

The auditory thalamus (medial geniculate body, MGB) is a major lynchpin in the transmission of sensory information to the auditory cortex, and a key mediator of auditory receptive field properties of cortical neurons. Nevertheless, the effects of trauma upon MGB neurons have been little explored thus far, particularly in terms of categorical shifts in response properties following auditory insult.

Suprathreshold neural response latencies are predominantly retained across a range of characteristic tuning frequencies following partial cochlear ablation (Kamke et al., 2003). Yet, indications of excitability shifts were provided by evaluating shifts in the strength of synchronous activation, specifically in the ventral division of the MGB (Sun et al., 2009b). These latter recordings were only reported for a single animal at each of 4 and 24 h after trauma. Nevertheless, they are congruent with the only comprehensive evaluation of insult-related response plasticity in the auditory thalamus (Richardson et al., 2012, 2013). In patch recordings made in the MGB of aged rats, substantial reductions in the amplitude of the evoked phasic and tonic GABAergic current density were seen throughout the MGB. Moreover, region-specific modulations of spontaneous inhibitory activity appeared to develop during the aging process. This was seen in comparisons of the dorsal (non-lemniscal) division of the MGB, which underwent significant reductions in spontaneous inhibitory postsynaptic current, and the ventral (lemniscal) MGB, where a paradoxical enhancement developed in parallel (Richardson et al., 2013). On the basis of these limited but intriguing observations, it appears that the MGB, as a whole, displays the characteristic perturbation of GABAergic inhibitory signaling seen in other auditory centers.

### **AUDITORY CORTEX**

By the level of primary auditory cortex (A1), basal activity and suprathreshold responses might each be conceivably elevated, based on its afferent inputs. Certainly, aging has been demonstrated to bring about spontaneous rate increases in cortical units in a number of species, including rat (Hughes et al., 2010) and rhesus macaque (Juarez-Salinas et al., 2010). A similar effect appears to manifest after acoustic overexposure (Noreña and Eggermont, 2003, 2006), though at a delay of a few hours after trauma, leading to sustained augmentation across A1 for weeks to months (Seki and Eggermont, 2003; Engineer et al., 2011).

In the case of acoustic trauma, cross-correlation analysis revealed coincident enhancement of neural spiking synchrony. This elevation in synchrony was particularly prevalent in those regions that displayed some amount of threshold elevation posttraumatically (Noreña and Eggermont, 2006). Based on the observation that this aspect of neural activity was enhanced almost instantaneously following trauma (Noreña and Eggermont, 2003), and prior to the onset of spontaneous rate enhancements, correlated network spiking may be acting as a precursor to the ongoing plastic modulation of single cell and network response properties.
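As an illustration of the kind of cross-correlation analysis referred to above, the following minimal sketch bins two spike trains and computes a normalized cross-correlation coefficient at a range of lags. The bin width, normalization, spike times and function names are assumptions made for illustration; the cited studies may have used different binning and correction procedures.

```python
import math

def bin_spikes(spike_times_ms, duration_ms, bin_ms):
    """Count spikes in consecutive bins (integer millisecond times assumed)."""
    n_bins = duration_ms // bin_ms
    counts = [0] * n_bins
    for t in spike_times_ms:
        idx = t // bin_ms
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts

def cross_correlation(x, y, max_lag):
    """Pearson-normalized cross-correlation at integer bin lags.
    A positive lag means activity in y follows activity in x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += (x[i] - mx) * (y[j] - my)
        result[lag] = total / (sx * sy) if sx > 0 and sy > 0 else 0.0
    return result

# Hypothetical spike times (ms): unit B fires 10 ms after unit A.
unit_a = [5, 25, 47, 80, 112, 150, 171]
unit_b = [t + 10 for t in unit_a]

counts_a = bin_spikes(unit_a, duration_ms=200, bin_ms=10)
counts_b = bin_spikes(unit_b, duration_ms=200, bin_ms=10)
correlogram = cross_correlation(counts_a, counts_b, max_lag=5)
peak_lag = max(correlogram, key=correlogram.get)
print(f"peak at lag {peak_lag} bins, coefficient {correlogram[peak_lag]:.2f}")
```

A larger peak coefficient near zero lag would correspond to greater spiking synchrony between the two units; in practice, a correction for stimulus-locked or rate-driven correlations would also be needed.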

In each of the studies to have evaluated the effects of salicylate administration on auditory cortex basal activity, the route of administration appears to affect the outcome, even if salicylate may qualitatively raise thresholds, like certain other forms of trauma. Depression of neural activity ensues following direct application of salicylate to the cortical surface (Lu et al., 2011). Systemic administration has, by contrast, been associated with wholesale depressions in spontaneous rate (Yang et al., 2007; Zhang et al., 2011) or simultaneous elevations and depressions in spontaneous discharge (Ochi and Eggermont, 1996; Lu et al., 2011). This latter phenomenon appears to indicate a process of normalizing the mean ensemble discharge by augmenting the spontaneous rate of minimally active units, and depressing the basal discharge of those units with high spontaneous activity. These shifts in spontaneous rate develop in the absence of short-term modulation of intracortical spike synchrony (Ochi and Eggermont, 1996). This is certainly intriguing, given that synchrony is, apparently, enhanced almost immediately after (non-salicylate-based) cochlear trauma.

The idiosyncratic nature of these perturbations, relative to other experimentally induced insults, is further evidence that salicylate is likely to have complex, multifactorial effects throughout the auditory system, both peripheral and central. For example, enhanced driven neural activity is consistently observed across the tonotopic extent of the cortical surface even when salicylate is administered by different routes, such as systemically (Qiu et al., 2000; Yang et al., 2007; Sun et al., 2009a; Deng et al., 2010; Noreña et al., 2010; Lu et al., 2011; Zhang et al., 2011) (**Figure 3**), or directly to the cortical surface (Lu et al., 2011), the latter approach being unlikely to bring about changes within the cochlea itself. The likelihood that this stably inducible effect is mediated via a suppression of GABAergic signaling efficacy is strongly suggested by the absence of any such enhancement when agents that positively modulate GABA pharmacology were coadministered (Sun et al., 2009a; Lu et al., 2011).

If, on the other hand, cortical evoked activity patterns are contrasted with those in the cochlear nuclei or the inferior colliculi induced by peripheral pathology, A1 appears to display changes that are uniquely indicative of early adaptation to deprived or modulated sensory input. In one case, recordings from cortical neurons were collected during the 50 h following cochlear ablation (Moore et al., 1997). In that experiment, there was a gradual decline in response thresholds to best-frequency stimulation of the ipsilateral (unlesioned) ear, suggesting adaptive reweighing to favor conserved inputs was occurring.

The evolution of this reweighing appears to vary according to the cortical region affected most severely by deafferentation, leading to cases in which driven activity may simultaneously increase and decrease within the same animal (Noreña et al., 2010). Still, a greater proportion of monotonic rate-level functions are seen in neurons tuned to frequencies outside of the region of cortex responding to the lesioned cochlea (Noreña and Eggermont, 2003) (this region of the auditory lemniscal pathway has been dubbed the "lesion projection zone," or LPZ; Schmid et al., 1996; Calford, 2002). This observation, alongside that of enhanced evoked (Engineer et al., 2011) or maximum firing rates, may well be illustrative of enhanced excitatory activity as well as reduced inhibitory regulation (e.g., Rajan, 2001).

Evidence that the mechanistic underpinnings of this putatively homeostatic reweighing may operate for both excitatory and inhibitory synaptic signaling has been suggested in single cell recordings from animals with noise-induced threshold elevations. Whereas cells outside of the LPZ displayed proportional enhancement of excitatory and inhibitory activity, those neurons that underwent peripheral deafferentation instead displayed reweighing toward more excitatory responses (Yang et al., 2011) (**Figure 4**). This effect was correlated with reduced tonic GABAA signaling and GAD65 expression in the LPZ. Reduced inhibitory signaling efficacy is common to a rodent presbyacusis model (Llano et al., 2012), as is the description of significantly depressed tonic GABAA activity at the level of the ventral MGB (Richardson et al., 2013). As such, the elevation of response magnitudes in the auditory cortex, rather than simply being the result of passive unmasking of pre-existing excitatory inputs, may be the result of homeostatic reweighing of physiological excitatory/inhibitory balance toward network hyperexcitability. In future work, it will be important to disentangle the degree to which this homeostatic plasticity is endogenous to each auditory centre recorded from in a trauma model, or if the apparent excitability changes are simply inherited from elsewhere. It is likely that a combination of each is the case, though recent data nevertheless indicate the existence of categorical differences between centers with respect to neuronal excitability (e.g., Cai et al., 2014).

## **INSULT-MEDIATED CHANGES IN NEURAL SPECTROTEMPORAL RECEPTIVE FIELDS AND NETWORKS**

Insults to the auditory system are capable of disrupting or even abrogating the network response features to stimuli that vary in spectral and temporal content. Efforts at characterizing these network-level modulations have typically taken two approaches: defining the spectral bandwidth characteristics of auditory cells in the tonotopically organized centers of the lemniscal auditory pathway, and their temporal tuning characteristics in response to more complex, time-varying stimuli, such as amplitude- or frequency-modulated sounds.

**FIGURE 4 | The balance of excitation and inhibition is differentially modified by band-limited acoustic trauma according to the frequency tuning of cortical neurons.** Following acoustic trauma (123 dB SPL, 4 kHz, 7 h) of adult rats, *in vitro* patch recordings from primary auditory cortex layer II/III pyramidal neurons were performed to evaluate relative changes in excitability between neurons with low- or high-categorized characteristic frequency tuning. In response to square current pulse injection, low-CF neurons showed no enhancement in the number of spikes fired **(A)**, whereas high-CF cells displayed significantly elevated excitability **(B)**. Modified with permission from Yang et al. (2012). On recording miniature excitatory postsynaptic currents (mEPSCs), there was a significant increase in the amplitude **(C)** and frequency **(D)** of mEPSCs in low-CF neurons only following trauma. Similar analysis of miniature inhibitory postsynaptic currents (mIPSCs) revealed a significant trauma-driven elevation of mIPSC amplitude in low-CF neurons **(E)**, while the frequency of mIPSCs was significantly depressed in high-CF cells **(F)**. Modified with permission from Yang et al. (2011). ∗*p* ≤ 0.05; ∗∗*p* ≤ 0.01.

### **FREQUENCY RECEPTIVE FIELD REMAPPING**

Among the earlier explorations of ensemble responses to deafferentation and trauma are the systematic characterizations conducted of the tonotopic remapping that develops following the insult. This phenomenon (reviewed in Calford, 2002; Irvine, 2007; Kilgard, 2012) concerns the representation of the cochlear frequency domain along an axis of a lemniscal auditory centre that is modified to produce an enhanced sensitivity to one or more frequency bands, effectively to the exclusion of others. Pioneering work from Irvine and colleagues in the guinea pig (Robertson and Irvine, 1989), later extended to the adult cat (Rajan et al., 1993), unveiled a systematic flattening of the primary auditory cortex tonotopic map that was contralateral to partial cochlear damage (**Figure 5**). This damage was characterized by complete hair cell destruction with limited (<5%) retention of spiral ganglion afferents. In these experiments, a notable change in neural frequency receptive field properties emerged upon comparing acoustic stimulation to the ipsilateral and contralateral ears: whereas in the naive animal the monaural tonotopic maps were effectively in register for each form of tonal stimulation (within ±0.03 kHz), an asymmetry of responsiveness following peripheral lesion developed such that only the contralateral frequency response map was flattened.

**FIGURE 5 | Remapping of the tonotopy in the primary auditory cortex following a variety of peripheral insults. (A)** A flattening of the linear progression of characteristic frequency (CF) tuning when units were recorded along the dorsoventral axis of primary auditory cortex is observed following partial mechanical lesion of the cochlea in adult cats (individual subjects, upper and lower). Each datapoint represents a single unit CF. Modified with permission from Rajan et al. (1993). **(B)** A similar flattening of tonotopic progression is demonstrated as a function of distance along the cortical surface dorsoventrally in adult cats exposed to narrowband-passed acoustic trauma (crosses), compared with naive controls (circles). Modified with permission from Seki and Eggermont (2002). **(C)** In individual adult chinchillas (left, right) exposed to amikacin-induced basal cochlear hair cell lesions, there is a rearrangement of cortical responses to favor low characteristic frequency responses at regions of the cortex normally responsive at threshold to higher frequency stimuli (shaded region). Scale bars 1 mm. Modified with permission from Kakigi et al. (2000). **(D)** When individual multiunits were isolated in the adult rat primary auditory cortex (each unit is a different color), together spanning the hearing range of the animal, **(E)** systemic injection of salicylate (250 mg/kg intraperitoneal) produced a dynamic retuning of multiunits toward a CF range of 10–20 kHz by 2.5 h after injection. Modified with permission from Stolzberg et al. (2011).
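One simple way to put a number on the tonotopic flattening described above is to regress log-transformed CF against recording position and compare slopes. The sketch below is illustrative only: the positions, CF values and function name are hypothetical, and a single linear fit is an assumption that ignores the piecewise shape of a real remapped gradient.

```python
import math

def tonotopic_slope(distance_mm, cf_khz):
    """Least-squares slope of log2(CF) against position, in octaves per mm."""
    y = [math.log2(f) for f in cf_khz]
    n = len(y)
    mean_x = sum(distance_mm) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in distance_mm)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(distance_mm, y))
    return sxy / sxx

# Hypothetical recording positions along the tonotopic axis (mm).
distance = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
# Hypothetical CFs (kHz): a smooth gradient of about one octave per mm.
naive_cf = [2.0, 2.83, 4.0, 5.66, 8.0, 11.3, 16.0]
# Hypothetical CFs after a high-frequency lesion: CF plateaus at 8 kHz.
lesioned_cf = [2.0, 2.83, 4.0, 5.66, 8.0, 8.0, 8.0]

naive_slope = tonotopic_slope(distance, naive_cf)
lesioned_slope = tonotopic_slope(distance, lesioned_cf)
print(f"naive: {naive_slope:.2f} oct/mm, lesioned: {lesioned_slope:.2f} oct/mm")
```

A shallower slope in the lesioned case reflects the plateau in CF over the deafferented region; fitting the intact and deafferented regions separately would localize the flattening more precisely.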

Similar, high-frequency, chemical spiral ganglion lesions may also induce a tonotopic remapping process that is refined over the course of months to yield enhanced cortical representation of low frequency sounds (Schwaber et al., 1993; Kakigi et al., 2000) (**Figure 5**). Notably, however, the phenomenon may be observed within days to weeks following selective deafferentation (Seki and Eggermont, 2002; Noreña et al., 2003; Engineer et al., 2011; Yang et al., 2011). Frequency reweighing in the cortex following salicylate has also been described (Stolzberg et al., 2011). Rather than displaying shifts toward the low-frequency edge of the retained normal inputs, neurons became abnormally sensitive to stimuli between 10 and 20 kHz at physiological thresholds. In these experiments, recorded units displayed substantially broadened bandwidths, implying that a large degree of their sensitivity to pre-exposure CFs was presumably retained (Stolzberg et al., 2011). It is also rather striking that salicylate exposure (Deng et al., 2010) and acoustic trauma (Yin et al., 2008) each yield an apparently equivalent reduction in the cortical gap-detection threshold for stimulation off-channel, i.e., completely outside of the region displaying peripheral threshold elevation. This might suggest that each insult commonly disrupts a frequency-insensitive mechanism, or mechanisms, critical to the precise encoding of temporally salient information.

In cases of acute remapping like those described above, the loss of feedforward drive may simply intensify the *relative* expression of certain frequencies that had previously occupied peripheral regions of each unit's frequency response area. While evoked response thresholds are elevated, albeit moderately, by the induction of trauma, the overexpression of particular bandwidths tends to occur outside of the LPZ. Elevated driven response magnitudes, in addition, would imply the existence of a short-term plastic unmasking of afferent inputs that affects the relative interaction of excitatory and inhibitory receptive fields (Rajan, 2001). This phenomenon may develop even in the absence of CF shift (Rajan, 1998). In the chronic phase, however, the redevelopment of normalized peak thresholds and sharp tuning, with little variance of frequency receptive field width across the cortical surface, advocates for intrinsic plastic phenomena, adaptively reweighing neural sensitivity in favor of the remaining inputs.

Furthermore, there exists the possibility that cortical remapping may be inherited, in part, from plasticity in subcortical centers. Recordings made in the ventral division of the MGB contralateral to the cochlear lesion indicate the existence of putative tonotopic remapping in that centre (Kamke et al., 2003). The importance of the thalamocortical circuitry in potentially mediating high-level remodeling of receptive fields after peripheral receptor organ damage, as distinct from simple rostral transmission of deafferented, occasionally silenced (Rajan and Irvine, 1998), projections can be hypothesized.

As a point for comparison, the smooth, linear progression of tonotopy along the dorsoventral axis of the CNIC was predictably flattened after a cochlear lesion, producing an expanded region in which the characteristic frequency of collicular cells matched that of the lower edge of the LPZ (Irvine et al., 2003). However, there was a pronounced dichotomy among those neural units measured, in terms of the stimulus intensity required to drive activity. Although many neurons displayed markedly raised thresholds, akin to those observed in the VCN, others presented driven onset activity at ecologically relevant sound levels. While there exists the possibility of selective remodeling of neural connectivity, the fact that similar tuning patterns were observed in the acute phase following high-frequency spiral ganglion lesions (Snyder et al., 2000, 2008; Snyder and Sinex, 2002) might suggest a release from inhibition of previously subthreshold inputs coding for infra-LPZ frequencies. Indeed, these inputs are highly convergent, and partially comprise suprathreshold ipsilateral projections, which in the lesioned animal are enhanced approximately 5-fold (Irvine et al., 2003; Izquierdo et al., 2008).

Interestingly, such adaptive remodeling may not be universal. In recordings from A1 in the short term, tonotopic remapping has been found to normalize over the course of a week posttrauma (Ahlf et al., 2012). Moreover, following a low-frequency permanent deafferentation, ipsi- and contralateral cortical tuning were found, by 6 months post-insult, to be brought back into register (in contrast with the conserved tuning described above) (Cheung et al., 2009). The mechanism by which this occurred involved an elevation of the thresholds and reweighing of the receptive field characteristics of the cortex ipsilateral to the lesion. This is of particular concern for studies into the longer-term perceptual effects of monaurally limited hearing loss, given that this process develops despite the apparent disadvantages involved with accommodating for the lesioned ear bilaterally, and may be important in considering the utility of fitting hearing aids as early as possible in unilaterally impaired audiological patients. It is also notable that the frequency receptive fields of infragranular cortical neurons were acutely unstable in aged rats (Turner et al., 2005). Concurrent with a remapping of frequency representations at the cortical surface associated with presbyacusis-mediated cochlear threshold shifts (Willott et al., 1993), an age-related change to the distribution of receptive field shapes of infragranular cortical neurons appears to develop, with overrepresentation of "complex," non-V-shaped fields (Turner et al., 2005). How this may affect frequency-dependent perception remains to be seen.

### **CHANGES IN PROPERTIES OF NETWORK RESPONSES TO TEMPORALLY COMPLEX STIMULI**

In the context of complex stimulus processing, it is possible that more fundamental shifts in neural output may distort the representation or encoding of stimuli in "higher" cortical centers. Recordings in the rhesus monkey have highlighted an age-related loss of hierarchical abstraction across cortical fields, such that in young monkeys the spatial, directional tuning acuity of neuronal responses is amplified from A1 to the more secondary caudolateral field (area CL), whereas such refinement is absent in older animals (Juarez-Salinas et al., 2010). These effects derive from reductions in inhibition to off-target locations in both CL and A1 (yielding broader receptive fields), as well as a reduction in onset latency in CL, suggesting that plasticity may selectively arise in the primate corticothalamic system, thus enhancing the representation of primary afferent stimuli in non-primary cortical fields (Engle and Recanzone, 2012). To the extent that such cortical remodeling may have the capacity for normalization, recent experiments have demonstrated that following auditory-driven behavioral training, aged rats displayed rectification of the tonotopic map found in A1, as well as in neuronal spectrotemporal response properties (De Villers-Sidani et al., 2010). The correlation between these functional improvements in A1 network activity and a post-training enhancement in parvalbumin levels, a marker of fast spiking inhibitory neurons important to perception- and learning-derived network plasticity (e.g., Donato et al., 2013) that undergoes downregulation following age-related hearing loss (Martin del Campo et al., 2012), is further indicative of the possible maladaptive effects generated by inhibitory dysregulation.

The suggestion that temporal processing deficits may endure at the level of the IC, having also been documented in the cochlear nuclei, is apparent even from ABR analysis, which provides a surrogate for synchronous network activation (Buchwald and Huang, 1975). In long-latency ABR waveforms, which correlate with auditory midbrain activity (Melcher and Kiang, 1996; Melcher et al., 1996), clear age-dependent modulations in network response timing emerged among older animals (Nozawa et al., 1996). Although no such age-related differences were found during frequency-modulated stimulation of collicular cells (Lee et al., 2002), markedly worsened synchrony to modulated stimulation in aged animals was effected by the addition of background noise (Parthasarathy et al., 2010) and by varying the stimulus modulation depth (Parthasarathy and Bartlett, 2011). Regarding modulation rate, substantial deficits are evidently present at higher modulation rates (Parthasarathy and Bartlett, 2012), and indeed for collicular multi-units, gap-detection thresholds were significantly elevated in aged CBA/CaJ mice (Walton et al., 1998).
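Synchrony of spiking to a modulated stimulus is commonly summarized by vector strength, which ranges from 0 (no phase locking) to 1 (perfect phase locking). The following minimal sketch computes it for two hypothetical spike trains; it is offered only as an illustration of the general idea, since this section does not state which synchrony metric the cited studies used, and the spike times and 50 Hz modulation frequency are invented.

```python
import math

def vector_strength(spike_times_s, mod_freq_hz):
    """Length of the mean resultant vector of spike phases relative to
    the modulation cycle (0 = no phase locking, 1 = perfect locking)."""
    if not spike_times_s:
        return 0.0
    phases = [2.0 * math.pi * mod_freq_hz * t for t in spike_times_s]
    cos_sum = sum(math.cos(p) for p in phases)
    sin_sum = sum(math.sin(p) for p in phases)
    return math.hypot(cos_sum, sin_sum) / len(phases)

mod_freq = 50.0  # hypothetical modulation frequency (Hz)

# One spike per cycle at the same phase: strongly phase-locked.
locked = [k / mod_freq for k in range(20)]
# Spikes spread evenly over four phases of the cycle: no net phase preference.
spread = [k / (4.0 * mod_freq) for k in range(20)]

vs_locked = vector_strength(locked, mod_freq)
vs_spread = vector_strength(spread, mod_freq)
print(f"locked: {vs_locked:.3f}, spread: {vs_spread:.3f}")
```

Under this measure, a loss of synchrony with added background noise or reduced modulation depth would appear as a fall in vector strength at the modulation frequency.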

These temporal processing deficits align conceptually with observations from a two-tone suppression protocol that indicated significant differences in post-stimulus suppression and facilitation among collicular neurons (Finlayson, 2002). Here, some aged cells displayed abnormally long suppression time constants, indicative of altered encoding of temporally precise stimuli. Since a reduction in rise times to sinusoidally amplitude-modulated (SAM) stimuli also developed in aged CBA mice (Simon et al., 2004), inhibitory dysregulation affecting IC information processing may be a factor common to auditory trauma phenotypes.

It is, however, important to note that specific changes appear to develop across the different IC subdivisions. External cortex (ECIC) neurons in F344 rats displayed a proportional shift toward non-monotonic response when presented with contralateral stimuli (Palombi and Caspary, 1996a), although response bandwidths and net dynamic range of ECIC and CNIC cells failed to change over time with aging. This absence of effective age-related response modulation in F344 rats was also reported during binaural stimulation, typically associated with ipsilaterally-derived suppression of contralaterally-derived excitation (Palombi and Caspary, 1996b). However, when the same authors investigated responses to temporally-complex SAM stimuli, both ECIC and CNIC neurons showed clear divergence from those response distributions recorded in young rats (Shaddock Palombi et al., 2001). Such data concur with experiments in the mouse, in which dramatic differences in preferred modulation rate emerged in aged animals, reducing from 200 to 70 Hz in the upper quartile of units recorded (Walton et al., 2002). Despite failing to express consistent tonotopic changes, used elsewhere as a metric for neuroplastic rearrangement, the clear changes to the temporal coding operations of the IC, which may have behavior- and context-specific impacts with age (Harrison, 1981; Brown, 1984), are strong evidence of cellular- and network-level imbalances with functional consequences.

## **ALTERED CROSS-MODAL SENSITIVITY OF THE AUDITORY SYSTEM FOLLOWING TRAUMA**

The occurrence of receptive field expansion in the trauma-exposed auditory system is therefore interesting, since integration across neuronal computational modes may have unexpected network or perceptual consequences. Indeed, aside from displaying compromised acoustic responsivity, the bilaterally deafferented auditory cortex comprised a significant proportion of neurons that exhibited salient responses to somatosensory inputs (Allman et al., 2009). However, the proportion displaying verifiable multisensory integration, under conditions of selectively preserved auditory input, is in fact diminished relative to control animals (Meredith et al., 2012).

In related work from Shore and colleagues, an enhancement of somatosensory input into the DCN driven by traumatic threshold elevations has been described (Shore et al., 2008). This modified sensory afferent input in the DCN is accompanied by alterations of the typical stimulus timing-dependent plasticity rules for bimodal integration during successive auditory/somatosensory stimulation (Dehmel et al., 2012b; Koehler and Shore, 2013a,b) (**Figure 6**), generally producing a long-term enhancement of unimodal acoustically evoked activity. Given that earlier descriptions of the effects of cochlear trauma concluded the absence of plastic remapping in the DCN (Kaltenbach et al., 1992; Rajan and Irvine, 1998), it is worthwhile, in future work, to consider alternative plasticity mechanisms, including cross-modal reorganization, which may develop post-traumatically throughout the auditory neuraxis.

## **THE FUNCTIONAL AND BEHAVIORAL IMPLICATIONS OF TRAUMA-DRIVEN AUDITORY PLASTICITY**

A constellation of neural changes evolves in parallel throughout the auditory system in response to the challenge posed by the introduction of mechanically-, pharmacologically-, acoustically-, or aging-derived insults (**Table 1**). These changes can be classed as dynamic modulations of spontaneous and driven neural activity, the underpinnings of which appear to rely upon putatively homeostatic gain-modulation mechanisms that affect the extent of neural excitability, and selectively modulate the balance of excitation and inhibition. Changes at the single neuron and network processing levels can be demonstrated *in vivo* (e.g., Noreña, 2011; Yang et al., 2011, 2012) and by *in silico* modeling (Dominguez et al., 2006; Schaette and Kempter, 2006, 2008, 2012; Chrostowski et al., 2011; Tass and Popovych, 2012; Schaette, 2013). Although these neurophysiological sequelae are becoming better understood, it remains an open question whether the various components of central modulation, at different levels of the auditory pathway, might translate into perceptual abnormalities, like tinnitus, that could be interrogated by behavioral testing.

A major point of inquiry in current auditory neuroscience thus seeks to explain how it is that (subjective) tinnitus, being the perception of an auditory stimulus in the absence of an environmental equivalent, may come about in the wake of auditory trauma of a variety of forms, and manifest in only a proportion of any tested cohort (e.g., Eggermont and Roberts, 2004; Roberts et al., 2010; Noreña and Farley, 2013). In addition, concerns have recently been raised regarding the capacity to behaviorally test for the presence/absence of some abnormal positive percept. A possible division might be drawn between tinnitus as distinct from other correlated perceptual abnormalities, such as hyperacusis (Baguley, 2003), or from the predicted outcomes of hearing loss in the absence of any positively generative perceptual changes (Eggermont, 2013).

To that extent, a number of questions ought to be addressed: what classes of behavioral changes have been conclusively shown to develop following the introduction of some specific auditory insult (in the absence of *a priori* assumptions regarding the perceptual correlates of these behavioral data)? What are the neurobiological changes that have been shown to develop *in parallel* with these behavioral changes? And is it possible to conclude some effective *functional* link between certain aspect/s of the underlying insult-affected neurobiology and a categorical class of phantom percept?

#### **COCHLEAR LESION-INDUCED BEHAVIORAL THRESHOLD MODULATIONS**

Of the behavioral studies that have been conducted related to sensorineural auditory insults in adult mammals, the range of observed effects correlates well with predictions made according to early physiological studies. In particular, those studies aimed at disambiguating the changes that develop in the periphery and at the level of the auditory nerve have been informative. Some insults have little effect on detection thresholds; for example, partial section of the auditory nerve in cats was found to have no impact on intensity thresholds except in the most severe of cases (Neff, 1947; Schuknecht and Woellner, 1953, 1955). Clearly, in the absence of complete destruction of afferent cochlear transmission, a small population of surviving neurons is sufficient to provide a basis for accurate and sensitive detection profiles. Conversely, damage at the cochlea is associated with behavioral threshold elevation in a manner related primarily to the extent of OHC loss (Ryan and Dallos, 1975; Hawkins et al., 1976; Ryan et al., 1979). After that kind of trauma, auditory nerve fiber tuning curves display marked loss of both sharp peak tuning and sensitivity by approximately 40 dB SPL at maximum.

In the absence of OHC loss, there is relatively good preservation of audiometric thresholds (Lobarinas et al., 2013b), whereas frequency discrimination is apparently reliant on effective preservation of the IHC population (Nienhuys and Clark, 1978). It is likely that the majority of peripherally-mediated insults affect one or both subtypes of hair cell population, even in cases in which the animal's electrophysiologically characterized thresholds are normalized following trauma (Kujawa and Liberman, 2009). As such, it might be predicted that the perceptual experience of an adult mammal subject would be affected in some fashion.

**Figure 6 caption (fragments):** [...] exposed but unimpaired animals. Sham = gray; GPIAS-unimpaired = pink; GPIAS-impaired = red. Modified with permission from Koehler and Shore (2013a). **(C–E)** Following exposure to acoustic trauma (octave band noise centered at 16 kHz, 115 dB SPL, 1 h), adult rats underwent daily pairing of tonal stimuli outside the 8–10 kHz frequency bandwidth, either with vagal [...] bandwidth, evoked spike response number, and neural spike-timing synchrony. For each bar plot, leftmost bar = naive; middle bar (gray) = sham-tone pairing therapy; rightmost bar = VNS-tone pairing therapy. Asterisk = significantly different (*p* < 0.05) compared with controls. Modified with permission from Engineer et al. (2011).

More recently, the focus of behavioral characterization has included the evaluation of changes in temporal processing ability, primarily (though not exclusively) in the form of testing gap-detection thresholds following insult. Electrophysiological data have demonstrated that neurometric gap detection thresholds deteriorate in the awake animal with aging (e.g., Recanzone et al., 2011) or acoustic trauma (e.g., Yin et al., 2008) (cf. unilateral cochlear ablation, in which acoustic stimulation of the intact ear produced normal neurometric functions; Kirby and Middlebrooks, 2010). Such neural changes appear to be behaviorally reflected by reductions in psychometric discrimination performance using gap-in-noise (Giraudi-Perry et al., 1982; Salvi and Arehole, 1985; Rybalko and Syka, 2005; Gold et al., 2013) or amplitude-modulated stimuli (Henderson et al., 1984). Performance of these operant behavioral paradigms is certainly reliant upon the normal function of the central auditory system for effective fine-scale perceptual discrimination of temporally-modulated stimuli. This can be reasoned from the effects of severe bilateral decortication/auditory cortical lesioning, which do not abolish task performance, but nonetheless significantly raise the detection thresholds of adult animals with otherwise normal hearing function (Ison et al., 1991; Kelly et al., 1996; Threlkeld et al., 2008). On the basis of these ablation/inactivation studies, it is not unreasonable to conclude that the accurate perception of the temporal fine structure of the stimuli, and so the performance of operant tasks reliant upon that perception, is expected to be affected following sensorineural insults that perturb normal auditory functionality.

As an additional measure of temporal processing abnormalities, a number of studies have leveraged various paradigms concerning the prepulse inhibition (PPI) of the acoustic startle response in small mammals. The behavior is reflexive, and thus can be recorded in the absence of prior training. It is notable, however, that there are indications of species-related efficacy, or rather lack thereof, of the response (Gruner, 1989; Pilz and Leaton, 1999). Nevertheless, it can be optimized for investigating aspects of an animal's perceptual experience. In particular, suppression of the acoustic startle using a gap-in-noise prepulse (gap-mediated prepulse inhibition of the acoustic startle, or GPIAS) can be modulated according to the specific experimental conditions imposed. These include the temporal interval between the prepulse and the startle stimulus, the amplitude of the startle stimulus, and the spectral content of the background noise masker (Longenecker and Galazyuk, 2012; Lobarinas et al., 2013a; Hickox and Liberman, 2014).
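As a rough illustration of how gap-mediated suppression of the startle might be quantified, the following Python sketch computes a suppression ratio from startle amplitudes recorded on gap and no-gap trials. The function name `gpias_ratio`, the amplitudes, and the ratio convention (mean gap-trial amplitude divided by mean no-gap amplitude) are assumptions made for illustration only; they are not taken from any of the studies cited here.

```python
from statistics import mean


def gpias_ratio(gap_trials, no_gap_trials):
    """Mean startle amplitude on gap trials divided by that on no-gap trials.

    Under this assumed convention, values near 0 indicate strong suppression
    of the startle by the gap, and values near 1 indicate little suppression.
    """
    if not gap_trials or not no_gap_trials:
        raise ValueError("both trial lists must be non-empty")
    baseline = mean(no_gap_trials)
    if baseline <= 0:
        raise ValueError("mean no-gap startle amplitude must be positive")
    return mean(gap_trials) / baseline


# Hypothetical startle amplitudes (arbitrary units), not data from any study.
no_gap = [100.0, 110.0, 90.0]
gap_before_insult = [40.0, 50.0, 30.0]
gap_after_insult = [80.0, 90.0, 70.0]

ratio_before = gpias_ratio(gap_before_insult, no_gap)
ratio_after = gpias_ratio(gap_after_insult, no_gap)

# A ratio that moves toward 1 after the insult would be read as impaired
# gap-mediated suppression in this simplified scheme.
print(f"before: {ratio_before:.2f}, after: {ratio_after:.2f}")
```

In practice the ratio would be computed separately for each prepulse-startle interval, startle amplitude, and background noise band, since each of these conditions can modulate the suppression.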

The effects of insults akin to those that produce deficits of gap detection under operant conditioning may also be interrogated by way of the acoustic startle response. From recordings performed either in the same subjects, or subjects exposed to identical stimuli which impaired GPIAS, aspects of subcortical temporal processing plasticity were found to be selectively affected in animals with gap-detection impairments, as distinct from subjects which have undergone comparable insult exposure, or naive controls (Koehler and Shore, 2013a) (see **Figures 6A,B**). Also, recordings in the auditory cortex have suggested that GPIAS may be a useful metric for evaluating insult-mediated changes in cortical activity. In particular, there was a correlation between levels of neural synchrony and frequency tuning, and remediation of behavioral deficits observed to occur following a paired vagal nerve stimulation protocol (Engineer et al., 2011) (see **Figures 6C,D**).

Recent data have suggested that the susceptibility of subjects to impaired gap-mediated startle suppression following acoustic trauma might be related to baseline levels of excitability of the auditory neuraxis. Differential outcomes were found among subjects with variable levels of resting cortical firing rates and evoked activity, manifesting in addition as instability of cortical tonotopy (Ahlf et al., 2012). This variable expression of neural excitability is almost certainly related to the plasticity phenotype of animals in various insult conditions. Indeed, systemic salicylate injection, which is understood to affect neural excitability (see above), has been recognized as effective in similarly modulating the suppression efficacy, largely in parallel with upregulation of neural gain in the auditory cortex (Deng et al., 2010; Sun et al., 2014).

Given the extensive excitatory/inhibitory remodeling that is associated with aging (Caspary et al., 2008), it is possibly unsurprising that GPIAS impairment has also been demonstrated in a mouse model of presbyacusis. In this model, behavioral dysfunction was correlated with cortical inhibitory dysfunction demonstrated using novel functional imaging techniques (Llano et al., 2012). GPIAS behavioral impairment is undoubtedly a useful indicator of certain aspects of the possible underlying pathophysiology that affects neuroplastic changes after trauma. It may not be premature to suggest that the GPIAS represents an effective metric of auditory temporal processing deficits that occur subcortically. Indeed, cortical deficits may also modulate subcortical processing of the reflex circuitry by way of strong corticofugal projections (Bajo and King, 2012), which may also undergo insult-driven plastic remodeling. The role of these projections in the context of insult-mediated plasticity is largely unknown, and requires exploration in future experiments.

#### **PERCEPTUAL PHANTOMS AND PROPOSED BEHAVIORAL CORRELATES**

The development of methods for reliably demonstrating a tinnitus-like percept in experimental animals represents another major research topic still in progress. The existence of tinnitus is, by convention, categorically different from the perception of silence. It is thus not unreasonable that a popular approach has sought to detect the phantom percept using operant conditioning paradigms to test perception. Often—but not always—under negative motivation, these require the animal to choose between a background noise stimulus, and either the absence of an acoustic stimulus, or the presence of some acoustic stimulus whose spectral content is deemed similar to the tinnitus percept (Jastreboff et al., 1988; Bauer and Brozoski, 2001; Heffner and Harrington, 2002; Guitton et al., 2003; Rüttiger et al., 2003; Yang et al., 2007, 2011; Sederholm and Swedberg, 2013).

Intuitively, experimental approaches of this kind are likely to be successful in revealing a subjective abnormality in the animal's perceptual state. The evaluation of the decision criterion, which is contingent upon the subject's integration of stimulus- and network-driven auditory activity, is likely to take into account the possible existence of a phantom percept. Indeed, the various demonstrations that long-known tinnitus inducers in human patients, such as high-dose salicylate (Cazals, 2000), bring about behavioral disruptions under such operant conditioning paradigms, indicate that the pursuit of phantom percepts in animals may not be wholly intractable.

More recently, the GPIAS paradigm has been proffered as a potential tool for behavioral interrogation of the phantom percept. Under the working framework that the presence of the tinnitus will internally mask, or "fill-in," the presence of the gap in noise acting to suppress the startle response, a cohort of animals with diminished startle suppression as a function of some auditory insult might be revealed (Turner et al., 2006, 2012). The paradigm has definitely yielded interesting and promising data regarding the behavioral correlates of neurophysiological modulation—related to changes in tuning, evoked and spontaneous single unit activity, and broader network correlational aspects of auditory function—and has been validated in relation to other reported tinnitus detectors (Bauer and Brozoski, 2001; Yang et al., 2007). However, the likelihood that the technique acts as a reliable indicator of tinnitus proper remains under debate (Eggermont, 2013).

If indeed tinnitus "fills in the gap," it is puzzling how it is that modifying the prepulse-startle inter-stimulus interval can renormalize GPIAS activity in animals whose suppression behaviors are selectively compromised (Hickox and Liberman, 2014). In those cases, IHC ribbon synapse pathology was comparable with the pathology displayed by subjects with operant-demonstrated behavioral deficits (Rüttiger et al., 2013; Singer et al., 2013). Moreover, gap-mediated startle suppression was absent during salicylate overdose when the gap-in-noise contained slowly ramped offset windows (Sun et al., 2014). This is despite the same animals displaying effective startle suppression when the gap on/offset ramp characteristics are otherwise modified (Sun et al., 2014). Observations of this kind may indicate that tinnitus, if present, fails to fill in the silent period consistently in the manner suggested in other reports of tinnitus-like behavior.

A further worry is the apparent susceptibility of GPIAS to modulation by unilateral earplug insertion (Lobarinas et al., 2013a) (**Figure 7**). This result has been interpreted as a false-positive detection of tinnitus in non-tinnitus animals, on the grounds that transient earplugging failed to reveal a tinnitus-like behavior under operant conditioning detection (Bauer and Brozoski, 2001). According to recent human data, chronic (>7 days) unilateral earplug insertion can induce positive phantom percepts in a majority of young, healthy listeners (Schaette et al., 2012). Animal behavioral studies have thus far only investigated transient earplugging effects, and so are not wholly comparable to the chronic earplugging condition of Schaette and colleagues. Yet, it is notable that enhanced neural synchrony has been postulated as a potential correlate of tinnitus in the auditory cortex (Noreña et al., 2003; Eggermont and Roberts, 2004; Roberts et al., 2010), developing more or less instantaneously following sensorineural auditory trauma (but not, it seems, following salicylate; Ochi and Eggermont, 1996). If the phantom perceptual outcome of plugging operates by way of a similar functional framework to that implicated in "sensorineural" tinnitus (Schaette and Kempter, 2006; Schaette et al., 2012), it is not unlikely that transient tinnitus-like percepts may be inducible in a subset of subjects with a unilateral earplug. This may (Lobarinas et al., 2013a) or may not (Bauer and Brozoski, 2001; Turner et al., 2006) manifest during "tinnitus-sensitive" behavioral testing.

The emergence of non-specific gap-detection deficits has recently been demonstrated among a subset of tinnitus patients (Fournier and Hébert, 2013; but see Campolo et al., 2013), and thus such perceptual abnormalities may well form part of the tinnitus syndrome in animal models thus far reported. It is, however, insufficient to rely on deficits of this kind to categorically define the presence of tinnitus, even in the context of physiological and anatomical features that have been related to the disease in the past. What is required is the development of behavioral paradigms that more specifically disentangle the presence of a phantom percept from the spatiotemporal processing deficits that may arise from auditory trauma, neuroplastic changes, or otherwise. Almost certainly, the implementation of these paradigms would rely upon baseline measurements being obtained in the same animal population as is subsequently exposed to trauma, enabling repeated-measures statistical tests to be leveraged. In addition, the exploration of novel behavioral and pathological models will expand our capacity to interrogate the perceptual effects of central auditory changes which develop following trauma. Prior to causally relating behavioral phenotypes to particular neurophysiological modulations that have been labeled "tinnitus-like," it would be constructive to develop a fuller appreciation of the neurobiological changes that consistently develop in the wake of tinnitus-related effectors, irrespective of the proposed instantiation of the percept itself.
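The value of within-animal baselines can be made concrete with the simplest repeated-measures case, a paired comparison of one pre-trauma and one post-trauma measurement per animal. The Python sketch below computes the paired *t* statistic from the per-animal differences; the function name, the threshold values, and the choice of a paired *t*-test (rather than a repeated-measures ANOVA or mixed model, which a real design with several time points would more likely need) are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(baseline, post):
    """Return (t, degrees of freedom) for paired baseline/post measurements.

    Each position in the two lists must refer to the same animal, so the
    statistic is computed on within-animal differences.
    """
    if len(baseline) != len(post):
        raise ValueError("each animal needs one baseline and one post value")
    diffs = [p - b for b, p in zip(baseline, post)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two animals are required")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / sqrt(n)), n - 1


# Hypothetical gap-detection thresholds (ms) for the same five animals.
baseline_ms = [2.0, 3.0, 2.5, 3.5, 3.0]
post_ms = [3.0, 5.0, 3.5, 5.5, 4.0]

t_stat, dof = paired_t_statistic(baseline_ms, post_ms)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

Because each animal serves as its own control, between-animal variability in baseline thresholds does not inflate the error term, which is the practical advantage of collecting baselines in the same population.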

#### **THE FUTURE OF INSULT-RELATED PLASTICITY AND PERCEPTUAL CHANGES**

Recent circumspection has called into question the proposed relationship between certain neurophysiological changes and the development of auditory phantoms (Eggermont, 2013). Contrasting changes to spontaneous discharge rates were found following different insults that each yielded equivalent behavioral results (e.g., Bauer et al., 2008). There is a lack of correspondence between remediation of behavioral changes and spontaneous rate elevations, despite other neurophysiological features undergoing normalization (Engineer et al., 2011), and an apparently paradoxical depression or amplification of spontaneous rate in certain studies of salicylate administration (Eggermont and Kenmochi, 1998; Yang et al., 2007; Zhang et al., 2011). Similarly, the use of tonotopic remapping as a tinnitus-specific metric can be problematic due to the contrast between short term reversals in tonotopic map plasticity (Ahlf et al., 2012) compared with therapy-related tonotopy restoration (Engineer et al., 2011) in animals with "tinnitus-like" behavior. Indeed, in human patients, both the presence (Mühlnickel et al., 1998; Weisz et al., 2005) and the absence (Langers et al., 2012) of tonotopic remapping have been reported, further frustrating the interpretation of animal data.

While historically less focused upon in research than tinnitus, hyperacusis is gaining traction as a potential sequela of the trauma-induced maladaptive plasticity syndrome. Recent reports have linked the state of enhanced auditory sensitivity to the development of tinnitus in human patients (Dauman and Bouscau-Faure, 2005; Fournier and Hébert, 2013; Hébert et al., 2013). Behavioral data have suggested that the possibility for detection of a hyperacusis-like state in animal models may exist via the emergence of enhanced gain of the acoustic startle reflex, and its attendant prepulse suppression. This type of behavior has been noted following acoustic trauma (Dehmel et al., 2012a; Sun et al., 2012; Chen et al., 2013; Pace and Zhang, 2013; Hickox and Liberman, 2014), salicylate treatment (Turner and Parrish, 2008; Sun et al., 2009a), or aging (Ison et al., 2007).

It is possible that the neurobiological factors mediating the maintenance of hyperacusis are distinct from, though likely related to, those inducing tinnitus. Computational (Zeng, 2013) and molecular models (Knipper et al., 2013) each have sought to assign specific network modulation patterns, particularly those associated with aberrant central signaling gain changes, to the occurrence of post-traumatic hyperacusis. Certainly, evidence of enhanced driven neural activity has been noted subcortically (Cai et al., 2009) and cortically (Qiu et al., 2000; Yang et al., 2007; Sun et al., 2009a, 2012) following various forms of auditory overexposure (see above). It is intriguing to ask how it is that the auditory system's innate gain control systems may be perturbed to potentially generate an abnormal perceptual experience. Indeed, the observed disparities between cortical and subcortical gain functions may in part be explained by hierarchically distinct, physiological differences in gain control (Rabinowitz et al., 2011, 2012). As lines of inquiry, these certainly warrant further examination in integrated behavioral and neurophysiological models, chiefly to answer key concerns regarding the development of the condition, observed with (Dauman and Bouscau-Faure, 2005) and without (Gu et al., 2010) peripheral threshold elevations.

Fruitful investigation is likely to be achieved by evaluating the causes underlying changes to neural gain and network spiking synchrony following some tinnitus-related insult, particularly with respect to rebalancing of excitatory/inhibitory signaling mechanisms (Knipper et al., 2013). A broader consideration of multiregional network effects throughout the auditory pathway is required. It is necessary to highlight the temporally-defined differential effects of post-insult regional disruption, particularly at the level of the auditory brainstem and midbrain (Brozoski and Bauer, 2005; Mulders and Robertson, 2009, 2011; Brozoski et al., 2012). In addition, the necessary involvement of ascending and descending projections throughout the auditory neuraxis in mediating physiological and behavioral changes following auditory insult remains underexplored (Bajo et al., 2010; Bajo and King, 2012). In the absence of any effective and reliable therapeutic measures being available for tinnitus treatment in human sufferers (Baguley et al., 2013; Langguth et al., 2013), it is clear that the insult-related plastic processes related to, and distinct from, perceptual phantom generation must be unraveled for clinical progress to proceed.

## **ACKNOWLEDGMENTS**

We are grateful to Dr. F. R. Nodal and Prof. A. J. King for fruitful comments on a previous version of this review. Our research is supported by the Wellcome Trust (WT07650AIA to A. J. King) and by Action on Hearing Loss (S72\_Bajo).

## **REFERENCES**


Kamke, M. R., Brown, M., and Irvine, D. R. F. (2003). Plasticity in the tonotopic organization of the medial geniculate body in adult cats following restricted unilateral cochlear lesions. *J. Comp. Neurol.* 459, 355–367. doi: 10.1002/cne.10586


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 07 March 2014; accepted: 28 April 2014; published online: 23 May 2014. Citation: Gold JR and Bajo VM (2014) Insult-induced adaptive plasticity of the auditory system. Front. Neurosci. 8:110. doi: 10.3389/fnins.2014.00110*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2014 Gold and Bajo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Origin and function of short-latency inputs to the neural substrates underlying the acoustic startle reflex

## *Ricardo Gómez-Nieto<sup>1,2,3</sup>, José de Anchieta C. Horta-Júnior<sup>1,4</sup>, Orlando Castellano<sup>1,2,3</sup>, Lymarie Millian-Morell<sup>1,3</sup>, Maria E. Rubio<sup>5</sup> and Dolores E. López<sup>1,2,3</sup>\**

*<sup>1</sup> Neuroscience Institute of Castilla y León, University of Salamanca, Salamanca, Spain*

*<sup>2</sup> Department of Cell Biology and Pathology, University of Salamanca, Salamanca, Spain*

*<sup>3</sup> Institute of Biomedical Research of Salamanca (IBSAL), University of Salamanca, Salamanca, Spain*

*<sup>4</sup> Department of Anatomy, Biosciences Institute, São Paulo State University Botucatu, São Paulo, Brazil*

*<sup>5</sup> Department of Otolaryngology, University of Pittsburgh, Pittsburgh, PA, USA*

#### *Edited by:*

*Monica Munoz-Lopez, University of Castilla-La Mancha, Spain*

#### *Reviewed by:*

*Alicia Mohedano-Moriano, University of Castilla-La Mancha, Spain*

*Antonio Viñuela, SECUGEN SL, Spain*

#### *\*Correspondence:*

*Dolores E. López, Neuroscience Institute of Castilla y León, University of Salamanca, C/Pintor Fernando Gallego, 1, 37007 Salamanca, Spain e-mail: lopezde@usal.es*

The acoustic startle reflex (ASR) is a survival mechanism of alarm, which rapidly alerts the organism to a sudden loud auditory stimulus. In rats, the primary ASR circuit encompasses three serially connected structures: cochlear root neurons (CRNs), neurons in the caudal pontine reticular nucleus (PnC), and motoneurons in the medulla and spinal cord. It is well-established that both CRNs and PnC neurons receive short-latency auditory inputs to mediate the ASR. Here, we investigated the anatomical origin and functional role of these inputs using a multidisciplinary approach that combines morphological, electrophysiological and behavioral techniques. Anterograde tracer injections into the cochlea suggest that CRNs somata and dendrites receive inputs depending, respectively, on their basal or apical cochlear origin. Confocal colocalization experiments demonstrated that these cochlear inputs are immunopositive for the vesicular glutamate transporter 1 (VGLUT1). Using extracellular recordings *in vivo* followed by subsequent tracer injections, we investigated the response of PnC neurons after contra-, ipsi-, and bilateral acoustic stimulation and identified the source of their auditory afferents. Our results showed that the binaural firing rate of PnC neurons was higher than the monaural, exhibiting higher spike discharges with contralateral than ipsilateral acoustic stimulations. Our histological analysis confirmed the CRNs as the principal source of short-latency acoustic inputs, and indicated that other areas of the cochlear nucleus complex are not likely to innervate PnC. Behaviorally, we observed a strong reduction of ASR amplitude in monaural earplugged rats that corresponds with the binaural summation process shown in our electrophysiological findings. 
Our study contributes to a better understanding of the role of neuronal mechanisms in auditory alerting behaviors and provides strong evidence that the CRNs-PnC pathway mediates fast neurotransmission and binaural summation of the ASR.

**Keywords: alertness system, binaural summation, cochlear root neurons, extracellular recordings, neuronal tracers, pontine reticular formation, rat, vglut1-auditory nerve**

## **INTRODUCTION**

**Figure 1 caption (fragments):** [...] fibers synapse onto cochlear root neurons (CRNs), which exhibit a secure response with first-spike latencies of approximately 2.2 ms (López et al., 1999; Sinex et al., 2001; Gómez-Nieto et al., 2010, 2013). This short-latency acoustic input is rapidly conducted to giant neurons in the pontine reticular formation (PnC), which respond with short latencies of 5.2 ms (Lingenhöhl and Friauf, 1994; Lee et al., 1996; López et al., 1999). Finally, the acoustically driven PnC neurons send axons that contact motoneurons in [...] which is shown in coronal brainstem sections. Notice the axons of CRNs course through the trapezoid body (TB) to target preferentially contralateral PnC neurons (López et al., 1999; Nodal and López, 2003). Arrowheads indicate the flow of acoustic information within the circuit. Projections from CRNs to other non-auditory nuclei which participate in the full expression of the acoustic and pinna reflexes are not shown in this scheme (see details in López et al., 1999; Horta-Júnior et al., 2008).

The acoustic startle reflex (ASR) is a survival mechanism of alarm, which rapidly alerts and arouses organisms to a sudden loud auditory stimulus. Behaviorally, the ASR involves a rapid and sequential activation of muscles along the length of the body as well as an autonomic physiological response (Prosser and Hunter, 1936; Szabo, 1964; Hoffman and Ison, 1980). The ASR and its modulations, which are sensitive to a variety of experimental approaches, can be easily tested in humans and rodents (Braff and Geyer, 1990; Lehmann et al., 1999). Thus, the ASR has been consolidated as an important research tool for studying brain mechanisms of learning, memory, emotions, sensory gating, and movement control (Davis and Sheard, 1974; Davis, 1990; Lang et al., 1990; Yeomans and Frankland, 1996; Koch, 1999; Swerdlow et al., 2001). The rat is an excellent animal model to study the ASR, and hence, the neuronal circuits underlying the ASR in rats are of great interest. It is well established that a relatively simple pathway in the brainstem mediates the ASR (**Figure 1**). The cochlear root neurons (CRNs), true sentinels of the rodent auditory pathway, are the first brainstem neurons receiving direct input from spiral ganglion cells (Harrison et al., 1962; Merchán et al., 1988; Osen et al., 1991; López et al., 1993, 1999). Collectively, they comprise the so-called cochlear root nucleus, and are morphologically characterized by their large cell body and thick dendrites that distribute among the eighth nerve fibers (Merchán et al., 1988; López et al., 1993). The thick myelinated axons of CRNs course through the trapezoid body (TB) to innervate neurons in the caudal pontine reticular nucleus (PnC) on both sides of the brainstem, with a clear contralateral predominance (López et al., 1999; Nodal and López, 2003).
Finally, the acoustically driven PnC neurons project to facial, cranial and spinal motoneurons that rapidly activate the muscle contractions (Lingenhöhl and Friauf, 1994; Lee et al., 1996; Yeomans and Frankland, 1996; Koch and Schnitzler, 1997). The electromyographic latencies of the muscles during the ASR are extremely short (6–10 ms) (Ison et al., 1973; Davis et al., 1982; Caeser et al., 1989). Such a short-latency motor response is consistent with the neuronal latencies observed in the primary ASR mediating circuit (**Figure 1**). Thus, CRNs exhibit a secure electrophysiological response to tone bursts, with first-spike latencies of approximately 2.2 ms (Sinex et al., 2001; Gómez-Nieto et al., 2010, 2013), and giant PnC neurons that receive acoustic inputs show first-spike latencies of 5.2 ms (Lingenhöhl and Friauf, 1994). The main purpose of this study is to reappraise the anatomical origin of short-latency auditory inputs to CRNs and PnC neurons and investigate whether the CRNs-PnC neuronal pathway is responsible for the binaural summation that occurs during the ASR. Although much is currently known about the cochlear nerve projection to the cochlear root nucleus (Harrison et al., 1962; Merchán et al., 1988; Osen et al., 1991; López et al., 1993, 1999), the neuroanatomical and neurochemical mechanisms by which CRNs integrate fast and frequency-specific acoustic information have not been clearly established. Rats are extremely sensitive to high frequency sounds, and the amplitude of the ASR depends on sound features such as intensity, duration and frequency (Gourevitch and Hack, 1966; Błaszczyk and Tajchert, 1997). Since the characteristic frequency of CRNs is approximately 30 kHz (Sinex et al., 2001; Gómez-Nieto et al., 2013), we aimed to determine whether auditory nerve afferents tonotopically innervate CRNs. The primary auditory afferents use glutamate as a neurotransmitter (Hackney et al., 1996; Furness and Lawton, 2003; Rubio and Juiz, 2004) and their endings contain the vesicular glutamate transporter 1 (VGLUT1), as has been shown in other areas of the cochlear nucleus complex (Zhou et al., 2007; Gómez-Nieto and Rubio, 2009).
Here, we demonstrated that VGLUT1 is also associated with auditory nerve afferents that terminate onto CRNs, which suggests a fast mechanism of controlling and securing excitatory transmission in the cochlear root nucleus. Additionally, there are two questions that have been raised about the primary ASR circuit at the level of the PnC. One question is whether, in addition to the CRNs, other neuronal types within the cochlear nucleus complex provide short-latency acoustic inputs to PnC neurons, as proposed by other researchers (Davis et al., 1982; Kandler and Herbert, 1991; Lingenhöhl and Friauf, 1994; Meloni and Davis, 1998). The other question relates to the neuronal bases underlying the binaural summation of the startle reflex, which have yet to be elucidated (Marsh et al., 1973; Li and Frost, 1996; Yeomans et al., 2002). To address these issues, we used extracellular recordings *in vivo* followed by tracer injections to investigate the sound-evoked responses of PnC neurons after contra-, ipsi-, and bilateral stimulation, and identify the source of their auditory afferents. Our electrophysiological study combined with neuronal tracing indicated the existence of a binaural summation process in the pontine reticular formation, and demonstrated that CRNs are the main source of short-latency acoustic inputs to PnC neurons. Additional anterograde tracing experiments were also performed to verify whether neurons of the dorsal and ventral cochlear nucleus (DCN and VCN, respectively) innervate PnC neurons. Furthermore, we correlated our electrophysiological findings with changes in the ASR behavioral performance of monaurally earplugged rats. Monaural earplugging reduces auditory nerve activity and affects processing through the auditory pathways (Potash and Kelly, 1980). 
Since the startle depends on normal binaural auditory interaction (reviewed in Yeomans et al., 2002), we found a reduction of the ASR amplitude in rats under monaural sound-deprivation conditions, which was consistent with the PnC neurons' response to monaural acoustic stimulation. Our study provides convergent anatomical, electrophysiological and behavioral evidence for the role of the CRNs-PnC projection in fast neurotransmission and binaural summation of the ASR and offers new insights into the structure and function of the primary ASR neuronal circuit.

#### **MATERIALS AND METHODS**

#### **EXPERIMENTAL ANIMALS**

In total, 36 adult female Wistar rats (Charles River Laboratories) weighing 165–320 g were used in this study. The animals were randomly housed and maintained under a normal 12/12 h light/dark cycle (lights on at 07:00 h) in a temperature- and humidity-controlled environment. The rats were given *ad libitum* access to food and water over the study period. The experiments were conducted in compliance with the guidelines for the use and care of laboratory animals of the European Communities Council Directive (2010/63/EU), the current Spanish legislation (RD 1201/05), and with those established by the Institutional Bioethics Committee. All efforts were made to minimize the number of animals used. For the surgical procedures, the animals were deeply anesthetized with a mixture of ketamine (40 mg/kg body weight) and xylazine (7 mg/kg body weight). The suffering of the animals was minimized during the surgery by monitoring the depth of anesthesia carefully. Supplementary doses of anesthetic were administered as required to maintain deep anesthesia throughout the duration of the experiment.

#### **NEUROANATOMICAL EXPERIMENTS: SURGERY, TRACER INJECTIONS, AND TISSUE PROCESSING**

A total of 17 animals were used in the neuroanatomical experiments with the aim of studying the source of acoustic inputs to CRNs and PnC neurons. To investigate the distribution of cochlear nerve inputs on CRNs, 4 rats received anterograde horseradish peroxidase (HRP, type VI, Sigma, St Louis, MO) injections into basal and apical regions of the cochlea following the procedure described by Osen et al. (1991). Once the animal was anesthetized, the tympanic bulla was exposed and a small orifice was made in the lateral wall of the cochlea at various apico-basal levels. HRP (20% in distilled water) was pressure-injected through a glass micropipette for 10–20 min, and a second micropipette was placed in the oval window for fluid removal. After the wound was washed and sutured, the animals were allowed to recover for 2 days. In cases with electrophysiological recordings in the right PnC (see below), we injected the bidirectional tracer, biotinylated dextran amine (BDA, 10,000 MW; #D-1956; Molecular Probes, Eugene, OR) to verify the recording site and investigate the source of cochlear nucleus inputs to PnC neurons. In another set of experiments, 6 rats received unilateral injections of BDA into the left DCN and VCN. All surgical and stereotaxic procedures for injecting the BDA were identical to those used in our previous studies (Gómez-Nieto et al., 2008a, 2013; Horta-Júnior et al., 2008). BDA (10% in distilled water) was injected iontophoretically via a glass micropipette (25 μm tip diameter), with 3μA positive current pulses (7 s on/7 s off) for a period of 15 min. The stereotaxic coordinates for the right PnC were based on previous reports which studied afferent projections from auditory brainstem nuclei into the PnC (Lingenhöhl and Friauf, 1994; Lee et al., 1996; López et al., 1999; Nodal and López, 2003). The coordinates for the left VCN were precisely the same as those used by Gómez-Nieto et al. (2013). 
All stereotaxic coordinates, including those for the DCN, were obtained from the atlas of the rat brain (Paxinos and Watson, 2009), using an electrode angle calibrator (David Kopf Instruments). After the tracer injection, the scalp was sutured and the animal was allowed to recover for a minimum of 7 days. Tissue preparation for light microscopy included: perfusion of the animals, brain dissection and subsequent cryoprotection with sucrose, freezing and serial slicing at 40μm thickness along parasagittal and coronal planes, visualization of HRP and BDA neurotracers, and calbindin protein-D28K (CaBP) immunohistochemistry. All these techniques were applied in a manner identical to that used in our previous reports (Osen et al., 1991; López et al., 1993, 1999; Gómez-Nieto et al., 2008a, 2013). In cases with HRP injections, we followed a standard immunostaining protocol to visualize CaBP as described by Gómez-Nieto et al. (2008a). CaBP immunoreactivity has been extensively utilized for detecting the cell body and dendrites of the CRNs (López et al., 1993; Gómez-Nieto et al., 2008a, 2013). Thus, a nickel-intensified peroxidase reaction was developed for HRP visualization in order to distinguish it from CaBP immunohistochemistry, which was developed without heavy metal intensification of diaminobenzidine (Gómez-Nieto et al., 2008a). Detailed information on the antibodies and dilutions used for CaBP immunohistochemistry is shown in **Table 1**. All sections processed for light microscopic analysis were mounted on slides and 4 alternate series were counterstained with cresyl violet to highlight cytoarchitectonic divisions; the other sections were dehydrated in ethanol and coverslipped with Entellan Neu (#107961; Merck, Darmstadt, Germany).

#### *Double fluorescent tract-tracing experiments*

To label CRNs and the cochlear nerve, we used identical procedures to those used in our previous studies (Gómez-Nieto et al., 2008b; Gómez-Nieto and Rubio, 2009). Three animals received a unilateral injection of the fluorescent tracer, dextran fluorescein isothiocyanate (D-FITC; #D-1820; Molecular Probes, Eugene, OR), into the left TB to label CRNs retrogradely. The stereotaxic coordinates targeted the course of CRN axons which project to PnC neurons via the TB (López et al., 1999). A volume of 0.15–0.2μl of D-FITC (10% in distilled water) was pressure-delivered with a Hamilton syringe (#710; Hamilton Co., Reno, NV) attached to a Stereotaxic Injector (Stoelting Co., Wood Dale, IL). After the scalp was sutured, the animals were allowed to recover for 4–7 days. Then, the animals were deeply anesthetized and perfused through the heart with 0.5% paraformaldehyde (PFA) in 0.1 M phosphate buffer, pH 7.4. The cochlea was removed, and crystals of the lipophilic dye DiI (#D-3911; 1,1′-dioctadecyl-3,3,3′,3′-tetramethylindocarbocyanine perchlorate; Molecular Probes) were placed on the exposed auditory nerve. The brains were stored in 4% PFA for approximately 30 days at room temperature protected from light and were processed for confocal microscopy as described by Gómez-Nieto and Rubio (2009).

**Table 1 | List of antibodies and dilutions used in the immunohistochemistry techniques.**

*\*Antibody manufacturers: JI, Jackson Immunoresearch, West Grove, PA, USA; SCB, Santa Cruz Biotechnology (Santa Cruz, CA, USA); Synaptic Systems, Göttingen, Germany; Sigma, Sigma-Aldrich, St Louis, MO, USA; Vector, Vector Laboratories, Burlingame, CA, USA.*

*†Specificity and immunogen sequence of primary antibodies: (1) The anti-CaBP was a mouse IgG isotype produced by hybridization of mouse myeloma cells with spleen cells from mice immunized with calbindin D-28k purified from chicken gut. The antibody specifically stains the 45Ca-binding spot of the calbindin D-28k (MW 28 kDa) of tissue originating from rat brain in two-dimensional immunoblots; (2) The anti-VGLUT1 is a polyclonal antibody generated in rabbits against Strep-Tag fusion protein containing amino acid residues 456–560 of rat VGLUT1. This antibody has been tested in preadsorption experiments that blocked efficiently and specifically the corresponding signals (manufacturer's technical information; see also Zhou et al., 2007); (3) The anti-c-Fos is a rabbit polyclonal antibody produced against an epitope mapping at the N-terminus of c-Fos of human origin (see details in manufacturer's technical information).*

*‡ Tissue processed for light microscopy (LM) or confocal microscopy (CM).*

#### *VGLUT1 immunofluorescence and colocalization experiments*

Two rats were used to identify VGLUT1 immunopositive terminals in the cochlear root nucleus. The immunofluorescence was performed following the same protocol described by Gómez-Nieto and Rubio (2009). After obtaining the brain tissue, free-floating sections (40μm in thickness) were pretreated with phosphate buffer saline (PBS) 0.3% TritonX-100, and blocked for 1 h at room temperature with 6% normal goat serum (#S-1000, Vector Laboratories) in PBS. Subsequently, the brain slices were incubated overnight at 4°C with the primary antibody (see **Table 1** for complete information on antibodies). Thereafter, the sections were rinsed extensively in PBS and incubated for 1 h at room temperature with their corresponding fluorophore-conjugated secondary antibody (**Table 1**). In another set of experiments, we studied the codistribution of cochlear nerve terminals and VGLUT1 on CRNs retrogradely labeled with D-FITC. For this purpose, we analyzed 2 animals from our previous studies in which a triple-labeling procedure was designed to detect colocalization of VGLUT1 with auditory nerve terminals on bushy cell dendrites (Gómez-Nieto and Rubio, 2009). These animals received an injection of D-FITC in the TB and insertions of DiI crystals in the cochlear nerve root, and finally the brainstem sections were processed for VGLUT1 immunofluorescence as described above. In these colocalization experiments, we used Cy5 goat anti-rabbit (blue emission) to distinguish the signal from the DiI (red emission)-labeled auditory nerve endings. Finally, all sections were rinsed extensively in PBS, dipped briefly in distilled water, mounted onto fluorescence-free slides, air-dried, and coverslipped with ProLong Antifade Kit (#P7481; Molecular Probes) to prevent photobleaching. In all immunofluorescence experiments, negative controls were not treated with primary antibodies, and this resulted in the complete absence of immunolabeling.

#### **ELECTROPHYSIOLOGY EXPERIMENTS COMBINED WITH BDA INJECTIONS**

Seven rats were used for extracellular multi-unit recordings of acoustically driven PnC neurons, and a subsequent neuronal tract-tracing study. The animal's body temperature was monitored and maintained at 38°C by a thermostatically controlled electrical blanket. Once anesthetized, each animal was placed inside a sound-attenuated room in a stereotaxic frame in which the ear bars were replaced by a hollow speculum coupled to a sound delivery system (Sony MDR E-868 earphones). The output of the system at each ear was calibrated *in situ* using a DI-2200 spectrum analyzer and a Brüel and Kjær 4134 microphone. This standard calibration was used to set the levels of tones and white noise in all experiments. Both pure tones and bursts of white noise were generated using a waveform generator (Hewlett Packard-8904A multifunction synthesizer) controlled by a computer. Acoustic stimuli were delivered to the ears monaurally, ipsi- and contralateral to the recording site, as well as binaurally. After craniotomy and brain surface exposure, a glass micropipette with the tip broken to a diameter of 10μm was filled with 10% BDA in 2 M NaCl (15–25 MΩ) and lowered through the brainstem using a piezoelectric microdrive (Burleigh 6000 ULN, Burleigh Instruments, Fishers, NY) with a resolution of 1μm. The electrode was aimed at PnC neurons in the right side of the brainstem following the same stereotaxic approach described above. As the micropipette was advanced into the brainstem, we used white noise as the search stimulus. Extracellularly recorded action potentials were amplified (×10,000; BAK MDA-4I), filtered (0.3–3 kHz), discriminated (BAK DIS-I) and time-stamped with an accuracy of 10μs by a CED-1401plus Laboratory Interface (Cambridge Electronic Design). 
Once significant evoked activity was detected, bursts of white noise and pure tones were used as experimental stimuli to assess the modifications of PnC neural responses under different conditions (intensity, frequency and side of stimulus presentation). Stimuli were 100 presentations of 75 ms pure tones (5 ms rise/fall time) or noise bursts at a rate of 2/s. Sound level was varied from 60 to 90 dB in 10-dB steps and frequency from 0.5 to 35 kHz. These intensity and frequency ranges were chosen based on the high firing thresholds and broad frequency tuning of PnC neurons (Lingenhöhl and Friauf, 1994). The spike times evoked by the different stimuli were stored and used to calculate the response magnitude (spikes per ms) and first-spike latencies. The integration window to calculate the mean firing rate was the first 25 ms after stimulus presentation. For each recording, we obtained peristimulus time histograms to display the responses to the stimuli. Data on first-spike latencies and responses of PnC neurons were pooled for statistical comparison between stimulation with pure tones and noise bursts as well as the side of stimulus presentation. At the end of each experiment, BDA was injected iontophoretically following the same procedure described above. Finally, the animal was allowed to recover for a minimum of 7 days and was then processed following the histological procedure to visualize the neuronal tracer by light microscopy (López et al., 1999; Gómez-Nieto et al., 2008a, 2013).
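
The three measures named above (response magnitude over the first 25 ms, first-spike latency, and the peristimulus time histogram) can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names, the 1 ms bin width, and the convention that spike times are given in ms relative to stimulus onset are all assumptions made for illustration.

```python
from typing import List, Sequence

Trial = Sequence[float]  # spike times in ms, relative to stimulus onset (assumed convention)


def response_magnitude(trials: Sequence[Trial], window_ms: float = 25.0) -> float:
    """Mean firing rate in spikes per ms over the first `window_ms` after onset."""
    if not trials:
        raise ValueError("at least one trial is required")
    n_spikes = sum(1 for trial in trials for t in trial if 0.0 <= t < window_ms)
    return n_spikes / (len(trials) * window_ms)


def first_spike_latencies(trials: Sequence[Trial]) -> List[float]:
    """Latency of the first post-onset spike in each trial; silent trials are skipped."""
    latencies = []
    for trial in trials:
        post_onset = [t for t in trial if t >= 0.0]
        if post_onset:
            latencies.append(min(post_onset))
    return latencies


def psth(trials: Sequence[Trial], bin_ms: float = 1.0, duration_ms: float = 75.0) -> List[int]:
    """Peristimulus time histogram: spike counts per bin, summed over trials."""
    n_bins = int(round(duration_ms / bin_ms))
    counts = [0] * n_bins
    for trial in trials:
        for t in trial:
            if 0.0 <= t < duration_ms:
                # Clamp guards against floating-point edge cases at the last bin.
                counts[min(int(t // bin_ms), n_bins - 1)] += 1
    return counts


# Hypothetical spike times for three stimulus presentations.
example_trials = [[5.0, 7.5, 30.0], [5.5, 12.0], [40.0]]
magnitude = response_magnitude(example_trials)   # 4 spikes / (3 trials * 25 ms)
latencies = first_spike_latencies(example_trials)
histogram = psth(example_trials)
```

In this sketch the 25 ms window and the 75 ms histogram duration follow the values stated in the text; everything else, including the example spike times, is invented.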

#### **BEHAVIORAL EXPERIMENTS**

A total of 12 rats were used to study the behavioral paradigm of the ASR under monaural sound-deprivation conditions. The animals were randomly divided into one experimental group with monaural earplugging (*n* = 6) and an intact control group (*n* = 6). The ASR amplitude was assessed in rats from the experimental group prior to and following 4 days of monaural earplugging, and their ASR was compared with that of the control group. After the behavioral test, the animals were kept at rest for 60 min, and their tissue was then prepared to detect c-Fos immunoreactivity by light microscopy (see below).

#### *ASR apparatus (system) and procedure*

Before testing, the rats were habituated to the experimental conditions, especially to their placement into the ASR apparatus. All testing was carried out between 9:00 A.M. and 11:00 A.M, using the SR-LAB system (SDI, San Diego, CA, USA) as described by Castellano et al. (2009). The acoustic startle reflexes were measured in six identical startle response systems (SR-LAB). Acoustic stimulus intensities and response sensitivities were calibrated (using an SR-LAB Startle Calibration System) to be nearly identical in each of the six SR-LAB systems (maximum variability <1% of stimulus range and <5% of response ranges). Each system consisted of a nonrestrictive Plexiglas cylinder, 8.7 cm in internal diameter and 20.5 cm long, mounted on a platform which was located in a ventilated, sound-attenuated chamber. Cylinder movements were detected by a piezoelectric accelerometer mounted under each platform and were digitized and stored by an interfacing computer assembly. A loudspeaker mounted 14.5 cm above the cylinder provided the broadband background noise and acoustic stimuli. Each testing session consisted of an acclimatization period of 5 min and 64 trials with a 30 s interval between them. The trials were a single noise pulse (20 ms bursts of white noise) presented at different intensities (95, 105, and 115 dB SPL) in a random manner. Whole body ballistic movements corresponding to startling responses were collected and analyzed by the SR-LAB system providing two main values of interest: *V*max and *T*max. The *V*max represents the peak startle response (ASR amplitude) that occurs during each trial while *T*max is the time from stimulus to the peak startle response (ASR latency). The background noise of 65 dB SPL was generated throughout the entire session in order to avoid interference from external noise and ensure equal experimental conditions.
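
As an illustration of the two values described above, the following Python sketch extracts a peak amplitude and its time from a single digitized trial. It is a simplified stand-in, not the SR-LAB algorithm: the function name, the 1 ms sampling interval, the use of the rectified signal, and the assumption that the first sample coincides with stimulus onset are all hypothetical.

```python
from typing import Sequence, Tuple


def startle_peak(samples: Sequence[float], sample_interval_ms: float = 1.0) -> Tuple[float, float]:
    """Return (v_max, t_max_ms) for one trial.

    v_max is the largest absolute accelerometer reading and t_max_ms is the
    time of that reading, assuming sample 0 is taken at stimulus onset.
    """
    if not samples:
        raise ValueError("trial contains no samples")
    magnitudes = [abs(v) for v in samples]
    v_max = max(magnitudes)
    # index() returns the first occurrence, i.e. the earliest peak if tied.
    t_max_ms = magnitudes.index(v_max) * sample_interval_ms
    return v_max, t_max_ms


# Hypothetical accelerometer trace (arbitrary units), one sample per ms.
trace = [0, 3, -12, 40, 25, 5]
v_max, t_max_ms = startle_peak(trace)
```

For the invented trace above, the peak is the fourth sample, so the sketch reports an amplitude of 40 at 3 ms after onset.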

#### *Monaural earplugging*

After ASR testing, animals from the experimental group were anesthetized and put on a warm blanket under a stereomicroscope. The skin was disinfected and foam earplugs (Moldex®, Culver City, CA) were cut to appropriate size and introduced into the right external ear canal as described by Whiting et al. (2009). Once the ear canal was sealed, the animal was maintained under normal conditions for 4 days until the ASR was measured again. After earplugging, the animals showed no symptoms of stress or infection, and kept their weight steady (see table in **Supplemental Figure 6**). Acoustic effects of monaural earplugging in rats include attenuation of approximately 40 dB on the same side as the earplug as well as modifications of the threshold, amplitude and latency of acoustic brain response waveforms (Whiting et al., 2009; Popescu and Polley, 2010; Wang et al., 2011).

#### *Fos immunohistochemistry*

Once the behavioral tests were completed, animals from the control and the monaural experimental group were processed to detect c-Fos immunoreactivity in the auditory pathway, and more specifically in the inferior colliculus (IC). The noise bursts used to measure the ASR served to induce c-Fos expression in the auditory pathway. Since c-Fos protein has been widely used as a marker of early neuronal activation (Sagar et al., 1988; Dragunow and Faull, 1989; Murphy and Feldon, 2001), the quantification of c-Fos immunoreactivity in the IC allowed us to check the efficiency of the earplugging. Perfusion of the animals, serial brain slicing into 40μm coronal sections and the c-Fos immunohistochemical staining protocol have been described in detail elsewhere (Castellano et al., 2009, 2013). Sections were incubated in primary antiserum against c-Fos protein for 72 h at 4°C. The tissue was then washed and incubated with its corresponding biotinylated secondary antibody for 2 h at room temperature, and finally visualized with the avidin-biotin-peroxidase complex procedure (Vectastain, Vector Labs.) and histochemistry for peroxidase without heavy-metal intensification. Complete information on the primary and secondary antibodies used for c-Fos immunohistochemistry is provided in **Table 1**. Immunolabeling was abolished by omission of the primary antibody. For each brain, all sections were mounted on slides, dehydrated and coverslipped as described above.

#### **IMAGE, DATA, AND STATISTICAL ANALYSIS**

The sections processed for light microscopy were examined on an upright brightfield microscope (#BX5; Olympus, Center Valley, PA, USA) equipped with a digital camera (SpotRt®; Diagnostic Instruments, Sterling Heights, MI, USA). Low-magnification images were taken with the 4×, 10×, or 20× objective lens, and high magnification images were taken with a 40× or 100× objective lens (oil immersion) for morphometric measures of labeled structures. The morphometric analysis was carried out with ImageJ (version 1.42; Rasband W.S., ImageJ, Bethesda, Maryland, USA, U.S. National Institutes of Health; 1997–2011. http://imagej.nih.gov/ij/). In the behavioral experiments, we estimated the number of c-Fos-immunolabeled neurons in the IC using a similar quantitative analysis as previously described by Castellano et al. (2013). High magnification micrographs (×40) of the ipsilateral and contralateral IC were taken with a Leica microscope (model # DMRB) coupled to a motorized *x*–*y*–*z* motor stage, and connected to a PC with Stereo Investigator software (MicroBrightField, VT, USA). The stereological analysis was accomplished with the aid of the optical fractionator method (Gundersen et al., 1988). We selected one section containing the left and right IC in each one of the 12 animals used in the behavioral experiments. To guarantee the homogeneity and the reproducibility of the area of interest, the contour of the IC was drawn at low magnification (2x), using transparent templates taken from the atlas of the rat brain (Paxinos and Watson, 2009) at 0.2 mm anterior from the interaural plane. The templates were placed on the screen of the computer and served to select the same area of interest in all histological sections. After that, the Stereo Investigator system automatically estimated the size of the counting grid (optical dissector) and the number of zones to be sampled. 
The neurons were counted according to the optical dissector counting rules as they first came into focus from 3μm below the upper surface of the section and did not touch the right inferior and superior edges of the dissector. Quantification was carried out in a single-blind assessment by two different investigators and only neurons that displayed clear staining above the background were counted. For the immunofluorescence and colocalization experiments, the sections were examined with a conventional brightfield microscope equipped with epifluorescence. In selected slides, the sections were analyzed with a Leica TCS SP2 confocal laser scanning microscope (Leica Microsystems, Mannheim, Germany) coupled to a Leica DM IRE2 inverted microscope and equipped with argon and helium neon lasers with excitation wavelengths of 458, 476, 488, 543, 568, and 633 nm. The fluorochromes FITC, Cy3, DiI, and Cy5 were detected sequentially, stack by stack, with the acousto-optical tunable filter system and triple dichroic mirror TD488/543/633. The background was controlled, and the photomultiplier voltage (800 V) was selected for maximal sensitivity in the linear range. The objectives used were oil immersion ×40 and ×63/numerical aperture 1.30, providing a resolution of approximately 150 nm in the xy plane and approximately 300 nm along the z-axis (pinhole 1 Airy unit), as well as several electronic zoom factors up to ×1.58. To determine the codistribution of the immunolabeled terminals, series of 25–50 confocal images were obtained to generate a maximal-intensity z projection of stacks and an orthogonal projection (= xy, xz, yz planes, for z stacks series). Colocalization of the fluorochromes within positive terminals was verified in the orthogonal view. All photomicrographs shown in the figures were processed with minor adjustments to brightness and contrast and removal of the tissue-free background using Adobe Photoshop® CS3 Extended (Version 10.0) and assembled in Canvas 7.0 software. 
Statistical analyses were performed using the SPSS software, version 18.0 (SPSS Inc., Chicago, IL, USA). All mean values were expressed ± the standard error of the mean. Comparisons between groups were made by analysis of variance (mixed split-plot ANOVA), with Scheffé pairwise comparisons (between-subjects analysis) and Bonferroni post-hoc tests (within-subject analysis). To compare differences between two means, we used Student's t-test, taking into account Levene's test for equality of variances. The differences between groups were regarded as statistically significant when *p* ≤ 0.05.
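
To make the two-sample comparison concrete, the sketch below computes a mean ± standard error and the t statistic in its two common forms: the pooled-variance (Student) form, appropriate when variances can be treated as equal, and the unequal-variance (Welch) form otherwise. It is only an illustration in plain Python, not a reproduction of the SPSS procedure: the function names and the example data are hypothetical, and neither Levene's test itself nor the conversion of t to a *p* value is implemented here.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard error of the mean) using the sample SD."""
    if len(values) < 2:
        raise ValueError("at least two values are required")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def t_statistic(a: Sequence[float], b: Sequence[float],
                equal_variances: bool = True) -> Tuple[float, float]:
    """Return (t, degrees of freedom) for two independent samples.

    equal_variances=True  -> Student's t with pooled variance.
    equal_variances=False -> Welch's t with Welch-Satterthwaite df.
    """
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    diff = statistics.mean(a) - statistics.mean(b)
    if equal_variances:
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = math.sqrt(pooled * (1 / na + 1 / nb))
        df = float(na + nb - 2)
    else:
        se = math.sqrt(va / na + vb / nb)
        df = (va / na + vb / nb) ** 2 / (
            (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
        )
    return diff / se, df


# Hypothetical measurements for two groups (arbitrary units).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_pooled, df_pooled = t_statistic(group_a, group_b, equal_variances=True)
t_welch, df_welch = t_statistic(group_a, group_b, equal_variances=False)
```

With equal group sizes the two forms give the same t value and differ only in their degrees of freedom, which is why the choice guided by the variance test matters mainly for the resulting *p* value.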

## **RESULTS**

#### **HRP INJECTIONS INTO THE BASAL AND APICAL COILS OF THE COCHLEA**

To determine whether cochlear nerve terminals innervate CRNs in a tonotopically specific manner, we injected HRP into the basal and apical regions of the cochlea. In parasagittal sections of the cochlear nucleus complex, we observed HRP labeling as an intense black reaction product in which a dense bundle of axons was identified (**Figure 2**). These HRP-labeled axons followed the characteristic V-shaped branching pattern of the primary cochlear afferents (**Figures 2A,B**). To define the injection site, we followed the criteria established by Merchán et al. (1988) and examined the areas of the cochlear nucleus complex in which the HRP-labeled fibers decussated and terminated. In cases with HRP injections in the basal coil of the cochlea, the cochlear nerve fibers bifurcated dorsally and gave rise to terminals in dorsal parts of the cochlear nucleus complex (**Figure 2A**). We examined the course of HRP-labeled axons within the cochlear root nucleus, and found that they gave off collaterals of various diameters (ranging from 0.4 to 1.6μm) and numerous endings that terminated on CRN somata, but not exclusively (**Figures 2A1,A2**). To verify that HRP-labeled fibers terminate on the cell body of CRNs, we immunostained brainstem sections for CaBP. The cochlear nerve collaterals were oriented perpendicular to the parent fibers, and extended approximately 70μm in length to innervate the cell body of CRNs immunostained for CaBP (**Figures 2A3,A4**). In cases with HRP injections in the apical coil of the cochlea, the cochlear nerve fibers divided more ventrally and formed axonal terminals in ventral parts of the cochlear nucleus complex (**Figure 2B**). In the cochlear root nucleus of these cases, we observed numerous HRP-labeled fibers that spanned away from the CRN cell body and followed the course of CRN dendrites (**Figure 2B1**). 
Although we also observed that HRP-labeled fibers give off terminals on cell bodies of CRNs (**Figure 2B2**), a greater number of these terminals appear to establish contact on primary and distal CRNs dendrites (**Figures 2B3,B4**). This set of data suggests that auditory nerve inputs could be distributed preferentially on the cell body and dendrites of CRNs depending, respectively, on their basal or apical cochlear origin.

**FIGURE 2 | Injections of HRP in the basal and apical regions of the spiral ganglion generate respectively labeled auditory nerve endings on the cell body and dendrites of cochlear root neurons (CRNs). (A)** Parasagittal section of the cochlear nucleus complex shows labeled primary auditory afferents after injection of HRP in the basal coil of the cochlea. Notice that auditory nerve fibers divide dorsally and terminate in the dorsal part of the anteroventral cochlear nucleus (AVCN) and posteroventral cochlear nucleus (PVCN). **(A1,A2)** Higher magnification of the cochlear root nucleus corresponding to the frames in **(A)**. Note HRP-labeled endings outlining the unlabeled neuronal somata of CRNs (asterisks). **(A3,A4)** High magnification micrographs of the cochlear root nucleus show HRP-labeled collaterals of primary auditory fibers (arrowheads) that terminate on the cell body of CRNs immunolabeled for CaBP. **(B)** Parasagittal section of the cochlear nucleus complex shows labeled primary auditory afferents after injection of HRP in the apical coil of the cochlea. Notice that auditory nerve fibers bifurcate ventrally and terminate in the ventral part of the AVCN and PVCN. **(B1)** High magnification of the cochlear root nucleus corresponding to the frame in **(B)** shows HRP-labeled endings (arrowheads) outlining the unlabeled dendrite of a CRN. **(B2)** High magnification micrograph shows HRP-labeled auditory nerve endings (arrows) outlining the cell body of a CRN. **(B3,B4)** High magnification of the cochlear root nucleus shows details of HRP-labeled endings (arrowheads) outlining perpendicular **(B3)** and parallel **(B4)** unlabeled dendrites of CRNs after HRP injection in the apical coil of the cochlea. DCN, dorsal cochlear nucleus. Scale bars = 1 mm in **A** and **B**; 25μm in **A1–A4**, **B1–B4**.

#### **COCHLEAR NERVE ENDINGS ON CRNs COLABEL WITH VGLUT1**

Our HRP injections in the apical regions of the spiral ganglion gave evidence that primary auditory afferents might innervate dendrites of CRNs. To confirm this result, we retrogradely labeled CRNs by injecting D-FITC in the TB and inserted DiI crystals into the cochlear nerve root (**Figure 3**). The D-FITC injection sites in the TB were the same as those reported in our previous studies (Gómez-Nieto et al., 2008b; Gómez-Nieto and Rubio, 2009). Our D-FITC injections filled axons of approximately 3μm in diameter that coursed through the contralateral TB (**Figure 3A**, see also **Figure 4A** below). We followed the directions of these labeled axons and found that they emerged from retrogradely labeled cell bodies of CRNs (**Figure 3A**). The DiI crystals inserted into the cochlear nerve root led to diffusion along auditory nerve fibers (**Figure 3A**). In the cochlear root nucleus, we observed thin DiI-labeled collaterals branching perpendicularly from the main auditory nerve fibers (**Figure 3B1**). The DiI-labeled collaterals gave rise to small endings that terminated on dendrites of CRNs (**Figures 3B1–B3**). The orthogonal view of the confocal z-stack confirmed that those cochlear nerve endings were in close apposition to dendrites of CRNs (**Figure 3C**).

**FIGURE 3 | Auditory nerve projections to cochlear root neurons (CRNs) dendrites. (A)** Epi-fluorescence micrograph of a coronal section shows a representative case with an insertion of DiI crystals into the cochlear root (asterisk) and retrogradely labeled CRNs (arrows) after D-FITC injection into the trapezoid body (TB). Notice thick CRNs axons (arrowheads) labeled with D-FITC coursing through the TB. **(B1–B3)** Confocal micrographs show auditory nerve terminals on dendrites of CRNs. DiI-labeled auditory nerve fibers which send collaterals (arrows) into the region of CRNs are shown in **(B1)** (in red). CRNs retrogradely labeled with D-FITC are shown in **(B2)** (in green). The image **(B3)** is the merge of the two maximum z-series projections shown in **(B1,B2)**. **(C)** Detail of the boxed area in **(B3)** shows a dendrite of a CRN decorated with auditory terminals. The orthogonal view in **(C)** confirms that auditory nerve terminals are in close apposition with the dendrite (arrowhead). Scale bars = 200μm in **A**; 75μm in **B1–B3**; 25μm in **C**.

The excitatory synaptic marker VGLUT1 has been associated with glutamate transport at cochlear nerve fibers in other areas of the cochlear nucleus complex (Zhou et al., 2007; Gómez-Nieto and Rubio, 2009). As expected from our previous report (Gómez-Nieto et al., 2008b), we found strong immunoreactivity for VGLUT1 in the cochlear root nucleus (**Supplemental Figure 1**). VGLUT1-labeled endings were equally distributed throughout the dorsal and ventral portions of the cochlear root nucleus (**Supplemental Figure 1**). They were quite numerous and fully decorated cell bodies as well as primary and distal dendrites of CRNs (**Supplemental Figure 1**). To determine whether these VGLUT1-immunopositive endings colabel with terminals of the auditory nerve, we used a triple-labeling method consisting of double tract-tracing with D-FITC for CRNs and DiI for cochlear nerve endings, combined with immunofluorescence for VGLUT1. As described in the above experiments, D-FITC injection in the TB retrogradely labeled the cell bodies and dendrites of CRNs (**Figures 4A,B**). DiI from the crystals inserted into the cochlear nerve root diffused through cochlear nerve fibers and collaterals, allowing us to label cochlear nerve terminals (**Figure 4C**). We also found many VGLUT1-immunopositive endings outlining cell bodies and dendrites (**Figure 4D**). Our confocal analyses identified cochlear nerve terminals as VGLUT1-immunopositive endings that were closely apposed to CRNs (**Figure 4E**). The orthogonal view of cochlear nerve endings confirmed the colocalization of DiI-labeled endings with VGLUT1 immunolabeling (**Figure 4F**).

#### **ELECTROPHYSIOLOGICAL RESPONSES OF PnC NEURONS TO CONTRA-, IPSI-, AND BILATERAL ACOUSTIC STIMULATIONS**

**FIGURE 4 | Auditory nerve endings onto cochlear root neurons (CRNs) colabel with VGLUT1. (A)** Epi-fluorescence micrograph of a coronal section shows retrogradely labeled CRNs after D-FITC injection into the trapezoid body (TB) and the insertion site of DiI crystals into the cochlear root (asterisk). Notice thick CRNs axons (arrowheads) labeled with D-FITC coursing through the TB. **(B)** Confocal image of retrogradely labeled CRNs with D-FITC (position denoted by an arrow in **A**). **(C)** Confocal image of DiI-labeled endings which arise from auditory nerve collaterals (arrows). **(D)** Confocal image shows VGLUT1-immunolabeled endings. **(E)** Confocal image shows VGLUT1 colabeled with auditory nerve terminals on CRNs. This image is the merge of the three maximum z-series projections shown in **(B–D)**. **(F)** Detail of the boxed area in **(E)** shows VGLUT1-auditory nerve terminals (arrowheads) on dendrites of CRNs. Colocalization of VGLUT1 puncta and DiI is confirmed by the orthogonal view. Scale bars = 500μm in **A**; 40μm in **B–E**; 20μm in **F**.

**FIGURE 5 | Electrophysiological responses of PnC neurons to contra-, ipsi-, and bilateral acoustic stimulations. (A)** PST histogram shows the PnC neuronal activity (spikes/ms) evoked by noise bursts of 90 dB SPL and 75 ms duration (black line) delivered contralaterally to the recording site. Notice that the mean first spike latency is very short (4.39 ± 0.12 ms). The shaded area displays the integration window (first 25 ms after stimulus presentation) used to calculate the firing rate. **(B)** Histogram shows mean spike latencies of PnC neurons after contra-, ipsi-, and bilateral acoustic stimulations with noise bursts and pure tones. Notice that the spike latencies are shorter following pure tone than noise burst stimulations, with significant differences when stimuli were presented monaurally (both contra- and ipsilateral to the recording site). No significant difference (ns) was found after binaural presentations. Bars represent mean values and SEM, <sup>∗</sup> [contralateral presentation, *F*(1,216) = 4.399, *p* = 0.037; ipsilateral presentation, *F*(1,216) = 8.137, *p* = 0.005]. **(C)** Histogram shows the mean firing rate (spikes/ms) of PnC neurons within the first 25 ms after contra-, ipsi-, and bilateral acoustic stimulations with noise bursts and pure tones. Binaural presentations showed significantly higher firing rates than monaural presentations [<sup>∗</sup>, *F*(2,216) = 14.852, *p* < 0.001]. Significant differences were found between noise bursts and pure tones following binaural presentations [<sup>∗</sup>, *F*(2,216) = 4.037, *p* = 0.019], whereas no significant differences (ns) were found in monaural presentations. Bars represent mean values and SEM. **(D)** Firing rate-intensity function of PnC neurons in response to pure tones following contra-, ipsi-, and bilateral stimulus presentations. Binaural presentations showed significant differences with monaural presentations (both contra- and ipsilateral to the recording site) for tones of 60, 70, 80, and 90 dB SPL. Data points represent the means with SEM [<sup>∗</sup>, *F*(6,216) = 3.014, *p* = 0.015]. **(E)** Low magnification micrograph shows a representative case of a recording site labeled with BDA. **(F)** Micrograph shows CRNs retrogradely labeled with BDA (arrowheads) after the injection site shown in **(E)**. Notice thick BDA-labeled axons of CRNs in the trapezoid body (TB, arrows). CRNs, cochlear root neurons; PnC, caudal pontine reticular nucleus. Scale bars = 1 mm in **E**; 200μm in **F**.

To determine whether acoustically driven PnC neurons are involved in binaural summation processing, pure tones (frequency range from 0.5 to 35 kHz) and noise bursts of 75 ms duration were presented contra-, ipsi-, and bilaterally to the recording site at four different intensities (60, 70, 80, and 90 dB SPL). A total of 169 and 49 multi-unit recordings were obtained using pure tones and noise bursts, respectively. PnC neuronal activity exhibited an onset response with very short first spike latencies (**Figure 5**). For example, in **Figure 5A** noise bursts of 90 dB SPL evoked PnC neuronal activity with a mean first spike latency of 4.39 ± 0.12 ms. The highest firing rate was observed within the first 25 ms after stimulus presentation (**Figure 5A**). Using this integration window, we examined the mean spike latency and firing rate of PnC neurons depending on the acoustic stimulus (noise bursts and pure tones) and on the side of stimulus presentation (**Figures 5B,C**). We found that the mean spike latency was shorter with pure tone (3.94 ± 0.39 ms) than with noise burst stimulations (4.83 ± 0.58 ms), showing significant differences when they were presented monaurally (both contra- and ipsilateral to the recording site). However, no significant difference was found between pure tone and noise burst stimulations after binaural presentations (**Figure 5B**). There was also no significant difference in spike latencies when comparing binaural with monaural presentation of pure tones or noise bursts. Regarding the PnC neuronal activity, we found that the mean firing rate (spikes/ms) was significantly greater with binaural than with monaural presentations (**Figure 5C**). Thus, we consistently observed that either pure tones or noise bursts evoked higher responses when they were presented binaurally than when delivered monaurally. Significant differences were also found between pure tone and noise burst stimulations following binaural presentations, whereas monaural presentations showed no significant differences (**Figure 5C**). We further studied the neuronal activity of PnC neurons by presenting contra-, ipsi-, and bilateral pure tones at four intensities (60, 70, 80, and 90 dB SPL; **Figure 5D**). With contralateral presentations of the tones, we observed that firing rates increased with increasing stimulus intensity. In contrast, this effect of intensity was not observed with ipsilateral presentations of the tones (**Figure 5D**). For all the intensities tested, the firing rate was significantly greater with binaural than with monaural presentations (**Figure 5D**). In all the experiments, the bilateral evoked responses in PnC were almost equal to the sum of the contralateral and ipsilateral evoked responses, suggesting the existence of a binaural summation process (**Figures 5C,D**). To identify the origin of short-latency acoustic inputs to the acoustically driven area of PnC, we injected BDA at the end of each experiment. The BDA injections were small (∼0.3 mm), round in shape, and restricted to ventrocaudal regions of the PnC (**Figure 5E**, see also **Supplemental Figures 2**, **3**). In all cases, we found BDA-labeled axons in the TB that emerged from retrogradely labeled CRNs (**Figure 5F**). The number of BDA-labeled neurons located in the contralateral cochlear root nucleus (75.6%) was considerably higher than in the ipsilateral nucleus (24.4%). This result indicates that CRNs provide short-latency acoustic inputs to PnC with a clear contralateral preference.
Because BDA is an effective bidirectional pathway tracer (Rajakumar et al., 1993), we also observed retrogradely labeled neurons and anterogradely labeled terminals in the PnC contralateral to the recording/injection site (**Supplemental Figures 2**, **3**). The labeled fibers from PnC neurons of the right side crossed the midline and entered the opposite (left) reticular formation, giving off BDA-labeled terminals onto retrogradely labeled PnC neurons (**Supplemental Figures 2**, **3**). This result supports the idea that reciprocal connections might exist between the left and right PnC. In cases with slightly larger injections, we also found very few neurons retrogradely labeled by BDA in the DCN and VCN (**Supplemental Figure 4**). However, the number of retrogradely labeled neurons in the DCN and VCN was considerably lower than that in the cochlear root nucleus (see table in **Supplemental Figure 4**).
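
As an illustration of the analysis described in this section, the following minimal Python sketch shows how a firing rate (spikes/ms) within the 25 ms integration window could be computed, and how a bilateral response could be compared with the sum of the two monaural responses. The function names, the per-trial normalization, and the example spike times are hypothetical; they are not taken from the recordings or from the analysis software used in this study.

```python
WINDOW_MS = 25.0  # integration window stated in the text (first 25 ms after onset)


def firing_rate(spike_times_ms, n_trials, window_ms=WINDOW_MS):
    """Mean firing rate in spikes/ms within [0, window_ms) after stimulus onset.

    spike_times_ms: spike times pooled over trials, relative to stimulus onset.
    n_trials: number of pooled stimulus presentations (assumption: the rate is
    averaged per trial; the original analysis may have normalized differently).
    """
    if n_trials <= 0 or window_ms <= 0:
        raise ValueError("n_trials and window_ms must be positive")
    n_spikes = sum(1 for t in spike_times_ms if 0.0 <= t < window_ms)
    return n_spikes / (n_trials * window_ms)


def summation_ratio(bilateral, contra, ipsi):
    """Bilateral rate divided by the sum of the two monaural rates.

    A value close to 1 corresponds to a bilateral response that is roughly
    the sum of the contralateral and ipsilateral responses.
    """
    monaural_sum = contra + ipsi
    if monaural_sum == 0:
        raise ValueError("monaural responses sum to zero")
    return bilateral / monaural_sum


if __name__ == "__main__":
    # Hypothetical spike times (ms) for one trial each, NOT data from this study.
    contra = firing_rate([4.2, 5.1, 9.8, 30.0], n_trials=1)
    ipsi = firing_rate([4.9, 12.3], n_trials=1)
    bilateral = firing_rate([4.0, 4.8, 6.5, 10.1, 15.7], n_trials=1)
    print(contra, ipsi, bilateral, summation_ratio(bilateral, contra, ipsi))
```

In this sketch, spikes falling outside the integration window are ignored, so late or sustained activity does not contribute to the comparison between monaural and binaural conditions.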

#### **BDA INJECTIONS IN THE DORSAL AND VENTRAL COCHLEAR NUCLEUS**

The results described above indicated that acoustically driven PnC neurons receive bilateral, but mainly contralateral, inputs from CRNs. However, the presence of very few retrogradely labeled neurons in other areas of the cochlear nucleus complex led us to investigate whether these regions innervate PnC neurons. This was done by injecting BDA into the left DCN and VCN. All BDA injection sites in the DCN and VCN were restricted to the corresponding nucleus and did not spread to adjacent areas of the cochlear nucleus complex (**Figures 6A, 7A**). BDA injections in the DCN were small (0.5–0.7 mm in diameter) and round in shape (**Figure 6A**). To check the efficiency of BDA injections in the DCN, we analyzed the distribution of anterograde labeling in nuclei that are well known to receive DCN inputs (Cant and Gaston, 1982; Malmierca et al., 2002; Cant and Benson, 2003). Our material showed BDA-labeled fibers and terminals in the contralateral DCN (**Figure 6B**), the ipsilateral (data not shown) and contralateral VCN (**Figure 6C**), the contralateral IC (**Figure 6D**), and the contralateral medial geniculate body (**Figure 6E**). We also observed thin and thick labeled fibers of approximately 0.7 and 2.8μm in diameter, respectively, in the pontine reticular formation (**Figures 6F,G**). We examined both types of BDA-labeled fibers and found that they do not innervate giant PnC neurons (**Figure 6G**). Because BDA is transported bidirectionally, the cochlear nerve terminals that innervate DCN neurons took up the tracer, which retrogradely filled cochlear nerve fibers (**Supplemental Figure 5**). We followed these retrogradely labeled fibers and found that they innervate CRNs somata in a pattern similar to that observed after our HRP injections in the basal coil of the cochlea (for comparison see **Figures 2A1–A4** and **Supplemental Figure 5**).
In cases with BDA injection in the VCN, we obtained small (0.3–0.8 mm in diameter) and slightly elongated injection sites (e.g., in **Figure 7A**). These injections generated anterograde labeling in nuclei that are known to be targeted by VCN projecting neurons (Cant and Gaston, 1982; Friauf and Ostwald, 1988; Doucet and Ryugo, 1997, 2003; Cant and Benson, 2003). Thus, we observed a thin band of axons and swellings in the ipsilateral lateral superior olive (**Figure 7A2**) as well as BDA-labeled terminals in the contralateral medial nucleus of the TB (**Figure 7B**), the contralateral DCN (**Figure 7C**), and the contralateral anteroventral cochlear nucleus (**Figure 7D2**). In the pontine reticular formation, we found BDA-labeled fibers that passed in close proximity to giant PnC neurons, but without giving off any terminal fields (**Figure 7D1**). In sum, these tract-tracing experiments suggest that other areas of the cochlear nucleus complex are not likely to innervate PnC neurons.

#### **THE ACOUSTIC STARTLE REFLEX IN RATS WITH MONAURAL SOUND-DEPRIVATION**

Our electrophysiological and morphological studies indicated the existence of a binaural summation process in the neuronal activity of PnC neurons that receive acoustic inputs from CRNs. To determine whether these results are consistent with the binaural summation of the behavioral response, we investigated the ASR in rats prior to and following monaural earplugging. Attenuating the sound reaching the right ear reduced the acoustic input to the CRNs on the right side and hence affected the bilateral afferent processing within the ASR pathway. Changes in the ASR amplitude for each stimulus intensity (noise bursts of 95, 105, and 115 dB SPL) are shown in **Figure 8A** and **Supplemental Figure 6**. In general, an increase in the acoustic stimulus intensity resulted in a proportional increase in the ASR amplitude. Compared with control animals, the ASR amplitude was significantly lower in monaural earplugged rats [*F*(1,12) = 6.32; *p* = 0.04]. A similar reduction was found when the experimental group was compared prior to and following monaural earplugging, with significant differences in the tested range of stimulus intensity [*F*(5,6) = 8.45; *p* = 0.008]. In contrast, there were no significant differences in ASR amplitude between control and pre-earplugged animals (**Figure 8A**). We also found no significant differences in ASR latency between normal hearing (control and pre-earplugged animals) and monaural earplugged conditions at all stimulus intensities tested (**Supplemental Figure 6**). As a histological control, c-Fos immunoreactivity in the IC was assessed to check the efficiency of the monaural earplugging (**Figures 8B–E**). On the side contralateral to the plug, the number of c-Fos immunolabeled neurons was significantly lower in earplugged rats than in controls (*p* < 0.05), whereas a similar number of neurons was observed on the ipsilateral side (**Figure 8B**). When the sides ipsi- and contralateral to the earplugging were compared, the contralateral side showed a comparatively larger decrease in c-Fos immunolabeling (**Figures 8C–E**). This indicates that the earplug attenuated the sound reaching the right ear and hence reduced the neuronal activity of the contralateral auditory pathway, including the left IC (**Figures 8C–E**). In sum, we showed that the ASR amplitude was higher when the acoustic startling stimulus was processed binaurally than when it was processed monaurally, indicating that there is strong binaural summation for the ASR. Comparing the electrophysiological and behavioral experiments, we found that the reduction of PnC neurons' responses after monaural acoustic stimulation is consistent with the reduction in the ASR amplitude following monaural sound-deprivation (for comparison see **Figures 5D, 8A**).

[...] contralateral inferior colliculus (IC; **D**) and the contralateral medial geniculate body (MG; **E**) show anterograde labeling after the BDA injection shown in **(A)**. **(B1–E1)** Higher magnification corresponding to the frames in **(B–E)**, respectively, shows details of BDA-labeled terminals in those nuclei. **(F)** [...] arrows and arrowheads, respectively) crossing the caudal pontine reticular formation. Notice that DCN projecting axons do not give off endings onto Nissl-stained PnC neurons. Scale bars = 500μm in **A** and **B**; 50μm in **B1**, **C1**, **D1**, **E1**, and **G**; 1 mm in **C–F**; 200μm in **F1**; 25μm in the **F1** inset.

**FIGURE 7 | BDA injections in the ventral cochlear nucleus (VCN) do not generate labeled terminals on giant neurons of the caudal pontine reticular nucleus (PnC). (A)** Micrograph of a Nissl-stained coronal section shows a case with BDA injections in the VCN. **(A1)** A higher magnification (corresponding to the white frame in **A**) shows a labeled VCN projection neuron (arrow) in the proximity of the BDA injection sites (IS). **(A2)** Nissl-stained section of the ipsilateral lateral superior olive (LSO), corresponding to the black frame in **(A)**, shows the characteristic projection pattern of VCN neurons. The inset in **(A2)** (position denoted with an arrowhead) illustrates a higher magnification of BDA-labeled terminals in the LSO. **(B,C)** Micrographs of the contralateral medial nucleus of the trapezoid body (MNTB; **B**) and the contralateral dorsal cochlear nucleus (DCN; **C**) show anterograde labeling after the BDA injections shown in **(A)**. **(B1–C1)** Higher magnification of BDA-labeled terminals corresponding to the frames in **(B,C)**, respectively. The inset in **(C1)** (position denoted with an arrowhead) illustrates a higher magnification of BDA-labeled terminals on a Nissl-stained DCN neuron. **(D)** Micrograph of a Nissl-stained section containing the contralateral PnC and the anteroventral cochlear nucleus (AVCN). **(D1)** High magnification (corresponding to the black frame in **D**) shows the absence of labeled terminals on PnC neurons (asterisk). Note that labeled fibers (arrow) do not give off terminals. **(D2)** High magnification (corresponding to the white frame in **D**) shows anterograde labeling in the contralateral AVCN. The inset in **(D2)** (position denoted with an arrowhead) illustrates details of BDA-labeled terminals on Nissl-stained AVCN neurons. SOC, superior olivary complex. Scale bars = 1 mm in **A**, **C**, and **D**; 50μm in **A1**, **C1**, **D1**, and **D2**; 200μm in **A2**; 10μm in **A2** inset; 500μm in **B**; 25μm in **B1** and insets of **C1** and **D2**.

## **DISCUSSION**

In this study, we showed key morphofunctional aspects of the ASR primary neuronal circuit. In the first central relay station, we found that dendrites and cell bodies of CRNs receive selective inputs from specific regions of the cochlea. We also verified that these cochlear nerve inputs are immunopositive for VGLUT1, as occurs in other areas of the cochlear nucleus complex (Zhou et al., 2007; Gómez-Nieto and Rubio, 2009). In the second central relay station, we demonstrated that the short-latency acoustic inputs to PnC neurons are provided mainly, if not exclusively, by CRNs. Our electrophysiological results indicated that there is a strong binaural summation in the neuronal activity of PnC neurons, which can be linked to the bilateral projections from the cochlear root nucleus to the PnC (López et al., 1999; Nodal and López, 2003). Finally, and in accordance with our electrophysiological findings, we showed that the overall response of the behavioral paradigm is higher when the acoustic startling stimulus is processed binaurally than when processed monaurally. Our study supports the view that the functional connections of the cochlear root nucleus with the PnC constitute the neuronal bases underlying the short latency and binaural summation of the ASR.

#### **TONOTOPIC-SPECIFIC DISTRIBUTION THROUGH THE CELL BODY AND DENDRITES OF CRNs**

Alerting and escape behaviors depend strongly on the frequency of sounds, which are of great importance in the life of rodents. For example, rats emit ultrasounds in the frequency range between 18 and 30 kHz in response to a threatening stimulus (predator exposure), which serve as alarm calls to warn other individuals (Cuomo and Cagiano, 1987; Blanchard et al., 1991). CRNs provide acoustic information to a wide variety of non-auditory nuclei involved in the startle reflex, orientation of head and ears toward a novel sound, vocalization, emotional information, and escape (López et al., 1999; Nodal and López, 2003; Horta-Júnior et al., 2008). Since these sensory events and escape behaviors are initially mediated by CRNs (Lee et al., 1996; López et al., 1999), it is likely that CRNs receive selective inputs from specific regions of the cochlea. Our HRP injections in apical and basal coils of the cochlea indicated that the cell body and dendrites of CRNs are preferentially innervated by different portions of the cochlea. Fibers from basal, high-frequency parts of the cochlea terminated preferentially on the cell body, whereas those from more apical, low-frequency parts of the cochlea mainly terminated on dendrites. These qualitative observations are in agreement with the study of Osen et al. (1991), which showed that the base of the cochlea is the source of most primary inputs to CRNs somata. Our study further demonstrated that dendrites of CRNs, which are quite extensive, are highly innervated by cochlear nerve inputs. The technical challenge of labeling dendrites of CRNs while analyzing the primary afferents might explain why this result was not shown in our previous reports (Merchán et al., 1988; Osen et al., 1991; López et al., 1993). In the present study, the experimental approach designed by Gómez-Nieto and Rubio (2009), combining two neuronal tracers (D-FITC in the TB and DiI in the cochlear nerve root), allowed us to confirm that CRNs dendrites also receive cochlear nerve inputs. The frequency-specific distribution described in our study provides an explanation for the frequency tuning curves observed in extracellular recordings of CRNs (Sinex et al., 2001; Gómez-Nieto et al., 2013). Thus, our results showed that CRNs somata and primary dendrites receive massive high-frequency inputs that might be responsible for the high characteristic frequency of CRNs (∼30 kHz). Since the basal regions of the cochlea also have the shortest latencies, these inputs might provide a specialization to minimize the response latency in CRNs (Sinex et al., 2001). Furthermore, it is interesting to point out that CRNs responded to a frequency range from 1.5 to 40 kHz, which covers nearly the entire audiogram of the rat (Kelly and Masterton, 1977; Sinex et al., 2001) and is in line with our observations showing high and mostly low frequency inputs on CRNs dendrites. Our data, together with the fact that the synaptic strength becomes larger as one moves along the dendrite (Spruston, 2000; London and Segev, 2001), suggest that CRNs, which are high-frequency specialized neurons, might also integrate a wide range of frequencies in their dendrites.

**FIGURE 8 | The acoustic startle reflex (ASR) in rats with monaural sound-deprivation. (A)** Amplitude-intensity function of the ASR in response to noise bursts shows differences between the control (*n* = 6) and the experimental group (*n* = 6), before and following monaural earplugging. Notice that ASR amplitude was significantly higher in control and pre-earplugging rats than in earplugged animals at all intensities tested (95, 105, and 115 dB SPL). Data are expressed as mean values and error bars represent the standard error of the mean (SEM), ∗*p* < 0.05 [control vs. monaural earplugged animals, *F*(1,12) = 6.32; *p* = 0.04; experimental group, pre-earplugging vs. monaural earplugging, *F*(5,6) = 8.45, *p* = 0.008]. The source of variability is the number of animals and trials. **(B)** Histogram shows the number of c-Fos immunopositive neurons per mm<sup>2</sup> in the inferior colliculus (IC) of control and monaural earplugged animals. Notice that the contralateral plugged side (left IC) contains significantly fewer immunopositive neurons than the left IC of control animals. However, the right IC of control and monaural earplugged animals showed no significant differences (ns). Bars represent mean values and SEM, ∗*p* < 0.05. The variability observed in the stereological study comes from sampling one section containing the left and right IC in each one of the 12 animals used in the behavioral experiments. **(C)** Low magnification micrograph shows c-Fos immunoreactivity in the IC of a monaural earplugged animal. **(D,E)** High magnification micrographs corresponding to the frames in **(C)** show c-Fos immunolabeling in the contralateral (**D**, left IC) and ipsilateral plugged side (**E**, right IC). Note that the number of c-Fos immunolabeled neurons is considerably lower in the contralateral than in the ipsilateral plugged side. Scale bars = 1 mm in **C**; 50μm in **D** and **E**.

#### **RAPID GLUTAMATERGIC TRANSMISSION IN THE COCHLEAR ROOT NUCLEUS**

The ASR behavioral paradigm is defined by its rapid and short-latency reflex actions (Szabo, 1964; Hoffman and Ison, 1980). To this end, CRNs have specific morphological, physiological and neurochemical properties that give them the capacity to mediate fast and secure neurotransmission of short-latency auditory cues (López et al., 1993, 1999; Sinex et al., 2001). An important outcome of our study was to demonstrate that cochlear nerve inputs on CRNs colabeled with VGLUT1. This result, together with the fact that VGLUT1 terminals massively covered CRNs on the z axis (Gómez-Nieto et al., 2008b; see also our results), indicates that auditory nerve terminals on CRNs are far more numerous than reported in previous studies (Merchán et al., 1988). Since cochlear nerve fibers use glutamate as neurotransmitter (Hackney et al., 1996; Rubio and Juiz, 2004; Rubio, 2006), VGLUT1 might contribute to the synaptic efficacy by regulating vesicle cycling and filling (Wilson et al., 2005). Furthermore, the relatively low affinity of VGLUT1 for its substrate allows it to transport large amounts of glutamate more rapidly (Bergles and Edwards, 2008). Thus, our results showed that CRNs somata and dendrites are fully covered by VGLUT1-cochlear nerve terminals, which provide fast synaptic signaling in the first component of the ASR circuit. This VGLUT1-cochlear nerve colabeling has also been found on bushy cells of the VCN (Gómez-Nieto and Rubio, 2009), a neuronal type that encodes features of the acoustic waveform and conveys precise temporal information to upper auditory structures (Friauf and Ostwald, 1988; Cant and Benson, 2003). VGLUT1 terminals on CRNs (ASR pathway) and bushy cells (auditory pathway) exhibit an altered morphology in mice with targeted deletion of the gene coding for the auxiliary subunit α2δ3 of voltage-gated calcium channels (Pirone et al., 2014).
Both neuronal types express the α2δ3 subunit, and its absence leads to neuronal deficits in the auditory and the ASR pathways, including the inability to discriminate temporal dimensions of sounds (Pirone et al., 2014). Thus, CRNs might provide fast acoustic information with accurate temporal precision to the ASR pathway, just as bushy cells do for the ascending auditory pathway. This idea is in line with our findings that support the CRNs-PnC projection as the neuronal pathway underlying the binaural summation of the ASR (discussed below).

#### **ORIGIN OF SHORT-LATENCY ACOUSTIC INPUTS TO PnC NEURONS**

A conclusive demonstration that one nucleus is not innervated by another is difficult to accomplish in any region of the central nervous system. This becomes more difficult if that nucleus, as occurs in PnC, receives a large number of inputs and contains numerous crossing fibers. Bearing this argument in mind, our study provided evidence that supports the CRNs as the sole source of short-latency acoustic inputs to giant PnC neurons. Our electrophysiological experiments with subsequent neuronal tracing showed retrogradely labeled CRNs after recording acoustically driven PnC neurons. This result is consistent with our previous reports that demonstrated the projections from the cochlear root nucleus to PnC neurons (López et al., 1999; Nodal and López, 2003). In addition, our BDA injections in PnC also generated very few retrogradely labeled neurons in the DCN and VCN, which in principle suggests that other areas of the cochlear nucleus complex might provide short-latency acoustic inputs to PnC, as proposed by other studies (Davis et al., 1982; Kandler and Herbert, 1991; Lingenhöhl and Friauf, 1994; Meloni and Davis, 1998). Lingenhöhl and Friauf (1994) reported retrogradely labeled somata in DCN and VCN after injecting a pure retrograde tracer (FluoroGold) in PnC. By contrast, our restricted BDA injections into the DCN and VCN were unable to demonstrate the connection of these two areas with giant PnC neurons. Although we filled a great number of DCN and VCN projecting neurons, as shown by the labeled terminal fields in many nuclei known to receive innervation from the DCN and VCN (reviewed in Cant and Benson, 2003), we did not find any terminal fields on PnC neurons. A possible explanation for these contradictory results is that DCN and VCN neurons were retrogradely labeled after tracer injections in PnC as a consequence of tracer uptake by fibers of passage rather than by terminals on PnC neurons.
This idea seems to be consistent with previous studies reporting that DCN and VCN projections to PnC were very weak, and not as dense as those from the cochlear root nucleus (Friauf and Ostwald, 1988; Kandler and Herbert, 1991; Meloni and Davis, 1998). In accordance with these morphological data, behavioral studies showed that chemical lesions of CRNs drastically reduced the startle response at all intensities (Lee et al., 1996), whereas electrolytic lesions of the DCN did not (Davis et al., 1982; Meloni and Davis, 1998). Meloni and Davis (1998) found that electrolytic lesions of the DCN led to a significant reduction in ASR amplitude at 110 and 115 dB SPL startle-eliciting intensities and normal responses at all other intensities. Interestingly, one finding of our BDA injections in DCN was that some auditory nerve terminals on CRNs and DCN neurons arise from the same parent auditory nerve fiber. Accordingly, electrolytic lesions of the DCN might damage auditory nerve terminals on CRNs, and might thereby reduce the ASR amplitude at high intensities. It is well established that auditory nerve fibers that reach the dorsal part of the cochlear nucleus complex originate from the basal (high-frequency) coils of the cochlea (Saint Marie et al., 1999). Thus, our BDA injections in the dorsal part of the DCN generated retrogradely labeled terminals on the cell body of CRNs in a pattern similar to that observed after our HRP injections in the basal coil of the cochlea. This result verifies the tonotopic-specific distribution through the cell body and dendrites of CRNs and suggests that both CRNs and DCN neurons receive similar acoustic information. DCN projects to the IC (Beyerl, 1978; Oliver and Shneiderman, 1991; Oliver et al., 1999; Cant and Benson, 2003), which is known to participate in the modulation of the ASR (Leitner and Cohen, 1985; Fendt et al., 2001; Yeomans et al., 2006; Gómez-Nieto et al., 2008a, 2013).
It is, therefore, likely that DCN might provide acoustic information to ASR modulation pathways rather than being necessary for the initiation and elicitation of the ASR.

#### **NEURONAL BASES UNDERLYING BINAURAL SUMMATION OF THE ACOUSTIC STARTLE REFLEX**

Our electrophysiological data showed that PnC neurons, which are innervated by CRNs, responded with very short spike latencies to noise bursts. The fact that sounds that contain every frequency activate PnC neurons is consistent with our hypothesis that frequency integration occurs in CRNs dendrites, and implies that CRNs provide precise and rapid acoustic information to PnC neurons. Our previous studies reported that CRNs have thick myelinated axons and contain calcium binding proteins that confer on the cochlear root nucleus the necessary specializations for sending fast electric signals to PnC (López et al., 1993, 1999). Accordingly, more recent studies have demonstrated that proteins such as the potassium channel subunit Kv1.1 and the transcription factor Math5-lacZ are highly expressed in CRNs (Oertel et al., 2008; Saul et al., 2008). These proteins have been found to participate in fast neurotransmission, temporal synchrony, and processing of binaural information (Saul et al., 2008; Allen and Ison, 2012). Consistent with the molecular specializations of CRNs, an important conclusion that we draw from our electrophysiological experiments is that the CRNs-PnC projections determine the binaural summation of the ASR. Binaural summation means that a startling stimulus presented to both ears is perceived as more intense than if it were presented monaurally. Therefore, there is strong binaural summation for startle, with a preference for acoustic stimuli delivered near the midline to activate both ears simultaneously (reviewed in Yeomans et al., 2002). Accordingly, we showed that the binaural evoked responses of PnC neurons were almost the sum of those evoked monaurally. The fact that the bilateral CRNs-PnC projections have a clear contralateral predominance (López et al., 1999; Nodal and López, 2003) might explain our result showing higher PnC neurons' responses evoked with contralateral than with ipsilateral acoustic stimulations.
Our electrophysiological analysis also revealed that PnC neurons' responses increased as the contralateral stimulus intensity increased, a result which is in line with our behavioral data and the contralateral predominance of the CRNs-PnC pathway. Wagner et al. (2000) suggested that the superior olivary complex, which is involved in binaural processing, is necessary for the full expression of the ASR. However, the possibility that neuronal circuits beyond the cochlear nucleus complex contribute to the binaural summation of the ASR seems very limited. The ASR mediated via the superior olivary complex involves too many synapses to accomplish the short latency of the ASR, and this led us to restrict the binaural summation to the CRNs-PnC pathway. In accordance with two reports in rats (Lingenhöhl and Friauf, 1994) and cats (Walberg, 1974), our study suggests the existence of reciprocal connections between the left and right PnC, which need to be taken into account when interpreting the final evoked response of PnC neurons. Further research is required to learn more about these crossed reticulo-reticular connections and their possible functional role in the ASR. It is also relevant to note that our electrophysiological results were consistent with our behavioral experiments showing that the ASR amplitude was higher when the acoustic startling stimulus was processed binaurally than when it was processed monaurally. We came to this conclusion by comparing the behavioral response of the control animals with that of monaurally earplugged animals. Therefore, it was essential in this experimental design to verify the effectiveness of the earplugging. Since c-Fos protein has been widely used as a marker of early neuronal activation (Sagar et al., 1988; Dragunow and Faull, 1989; Murphy and Feldon, 2001), we quantified the c-Fos immunolabeling in the IC.
The IC was selected for c-Fos quantification because it is an obligatory relay center for most ascending auditory tracts (Beyerl, 1978; Oliver and Shneiderman, 1991) and plays an important role in the prepulse inhibition of the ASR (Leitner and Cohen, 1985; Fendt et al., 2001; Li and Yue, 2002; Yeomans et al., 2006; Gómez-Nieto et al., 2008a, 2013). Our results showed that the number of c-Fos immunolabeled neurons in the contralateral IC was drastically reduced by the earplugging. This reduction was not found on the side ipsilateral to the earplugging because the IC receives auditory inputs from the contralateral cochlear nucleus (Beyerl, 1978; Oliver and Shneiderman, 1991). Since monaural conductive hearing loss reduces auditory nerve activity and affects sound processing along the central auditory pathway (Potash and Kelly, 1980), the reduction of c-Fos immunoreactivity in the IC indicates that bilateral afferent processing within the ASR pathway was affected by the earplugging. As expected from the report of Davis and Wagner (1969), we found that the ASR amplitude increased with increasing stimulus intensity in control and pre-earplugged animals. In contrast, earplugged animals showed a much lower ASR amplitude, suggesting that the binaural ASR pathway is required to elicit a full startling response. Our study suggests that the CRNs-PnC pathway is an anatomical and physiological specialization that determines the binaural summation of the ASR. Since CRNs also project to non-auditory nuclei involved in other startling reflex modalities (López et al., 1999; Horta-Júnior et al., 2008), it is reasonable to propose that the CRNs projections might also participate in cross-modal summation of the ASR (Yeomans et al., 2002). In conclusion, our study consolidates the CRNs as the "early warning system" responsible for the execution and propagation of bilateral acoustic startling signals at very short latencies, which is what defines the ASR itself.

## **ACKNOWLEDGMENTS**

We thank Sonia Hernández and David Sánchez for excellent technical assistance in the quantitative analyses of c-Fos immunoreactivity. This research was supported by grants from the Spanish Ministry of Science and Innovation (MICINN, #BFU2010-17754) to Dr. Dolores E. López; the São Paulo State Research Foundation (FAPESP, #2008/02771-6) to Dr. José de Anchieta C. Horta-Júnior; and grant 1R01DC013048-01 to Dr. María E. Rubio.

### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnins.2014.00216/abstract

**Supplemental Figure 1 | Distribution of VGLUT1-immunolabeled endings in the cochlear root nucleus. (A)** Epi-fluorescence micrograph of a coronal section shows VGLUT1-immunolabeled endings (Cy3 fluorochrome) in the cochlear root nucleus. **(B–E)** Epi-fluorescence micrographs of the boxed areas in **(A)** show distribution of VGLUT1-immunolabeled endings from dorsal to ventral regions of the cochlear root nucleus. Note that numerous VGLUT1-immunolabeled endings decorate unlabeled cell bodies (arrows) and dendrites (arrowheads) of cochlear root neurons (CRNs). TB, trapezoid body. Scale bars = 200μm in **A**; 25μm in **B–E**.

**Supplemental Figure 2 | BDA injections in the caudal pontine reticular nucleus (PnC) generate anterograde and retrograde labeling in the contralateral PnC. (A)** Low magnification micrograph of a coronal section of the brainstem shows a representative case with BDA injection sites (IS) in the right PnC. The tracer was injected after recording of acoustically driven PnC neurons. Notice numerous BDA-labeled axons (arrowheads) crossing the midline toward the contralateral PnC. Also, thick axons of cochlear root neurons (position denoted with an arrow) were retrogradely labeled with BDA in the trapezoid body (TB). **(B)** Higher magnification corresponding to the frame in **(A)** shows details of BDA-labeled terminals onto retrogradely labeled giant PnC neurons (arrows). Scale bars = 1 mm in **A**; 200μm in **B**.

**Supplemental Figure 3 | Reciprocal connections between the left and right caudal pontine reticular nuclei (PnC). (A)** Micrograph of a Nissl-stained coronal section shows a BDA injection site (IS) in the right PnC. The tracer was injected after recording of acoustically driven PnC neurons. Arrowheads indicate crossed reticulo-reticular projections labeled with BDA. Also, thick axons of cochlear root neurons (position denoted with an arrow) were retrogradely labeled with BDA in the trapezoid body (TB). **(B,C)** Higher magnification corresponding to the frames in **(A)** shows details of BDA-labeled terminals (arrowheads) onto Nissl-stained giant PnC neurons. Notice a PnC neuron retrogradely labeled by BDA (asterisk in **C**). Scale bars = 1 mm in **A**; 25μm in **B** and **C**.

**Supplemental Figure 4 | Retrograde labeling in the cochlear nucleus complex after BDA injections in the acoustically driven area of the caudal pontine reticular nucleus (PnC). (A)** High magnification micrograph shows a coronal section of the contralateral dorsal cochlear nucleus (DCN) in which a BDA-retrogradely labeled neuron (arrow) was found after the injection site shown in **Supplemental Figure 2A**. **(B)** High magnification micrograph shows a BDA-retrogradely labeled neuron (arrow) in the contralateral ventral cochlear nucleus (VCN). **(C)** Quantitative distribution of retrogradely labeled neurons in the cochlear nucleus complex after BDA injections in the acoustically driven area of PnC. The table shows the retrograde labeling obtained in the seven cases used in the electrophysiological experiments. Notice that the number of retrogradely labeled neurons in the DCN and VCN is considerably less than those labeled in the cochlear root nucleus. Scale bar = 200μm in **A** and **B**.

**Supplemental Figure 5 | BDA injections in the dorsal cochlear nucleus (DCN) generate retrogradely labeled terminals on the cell body of cochlear root neurons (CRNs). (A)** Nissl-stained coronal section of the cochlear root nucleus shows retrogradely labeled fibers after the BDA injections shown in **Figure 6A**. **(B)** Higher magnification corresponding to the frame in A shows details of BDA-labeled terminals (arrowheads) on CRNs somata. Notice that the morphology and distribution of these BDA-labeled terminals resemble those of the cochlear nerve after HRP injections in the basal coil of the cochlea (see **Figures 2A1–A4**). VCN, ventral cochlear nucleus. Scale bars = 400μm in **A**; 50μm in **B**.

**Supplemental Figure 6 | ASR amplitude, ASR latency and weight in rats prior to and following monaural earplugging. (A)** Histograms show the ASR amplitude in response to noise bursts of 95, 105, and 115 dB SPL in rats before and following 4 days of monaural earplugging. Notice that ASR amplitude was significantly higher in normal hearing than following monaural earplugging at all intensities tested (95, 105, and 115 dB SPL). Data are expressed as mean values and error bars represent the standard error of the mean (SEM), ∗*p* < 0.05 [experimental group, pre-earplugging vs. monaural earplugging, *F*(5,6) = 8.45, *p* = 0.008]. **(B)** Data-set tables display the ASR amplitude (mean values of *V*max within the testing session), ASR latency (mean values of *T*max within the testing session) and weight of each animal in normal hearing conditions (pre-earplugging) and following monaural earplugging. There were no significant differences in the ASR latency between the normal hearing and earplugged conditions for all stimulus intensities tested. Notice that the animals maintained a steady weight throughout the behavioral experiment.

#### **REFERENCES**


primary acoustic startle circuit in rats. *Brain Struct. Funct*. doi: 10.1007/s00429-013-0585-8. [Epub ahead of print].


response in rats. *Behav. Brain Res*. 108, 181–188. doi: 10.1016/S0166-4328(99)00146-1


*Neurosci. Biobehav. Rev*. 26, 1–11. doi: 10.1016/S0149-7634(01)00057-4

Zhou, J., Nannapaneni, N., and Shore, S. (2007). Vesicular glutamate transporters 1 and 2 are differentially associated with auditory nerve and spinal trigeminal inputs to the cochlear nucleus. *J. Comp. Neurol*. 500, 777–787. doi: 10.1002/cne.21208

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 March 2014; accepted: 03 July 2014; published online: 25 July 2014. Citation: Gómez-Nieto R, Horta-Júnior JAC, Castellano O, Millian-Morell L, Rubio ME and López DE (2014) Origin and function of short-latency inputs to the neural substrates underlying the acoustic startle reflex. Front. Neurosci. 8:216. doi: 10.3389/fnins.2014.00216*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2014 Gómez-Nieto, Horta-Júnior, Castellano, Millian-Morell, Rubio and López. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Brain state-dependent abnormal LFP activity in the auditory cortex of a schizophrenia mouse model

## *Kazuhito Nakao1,2 and Kazu Nakazawa1,2\**

*<sup>1</sup> Department of Psychiatry and Behavioral Neurobiology, University of Alabama at Birmingham, Birmingham, AL, USA*

*<sup>2</sup> Unit on Genetics of Cognition and Behavior, Department of Health and Human Services, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA*

#### *Edited by:*

*Yukiko Kikuchi, Newcastle University Medical School, UK*

#### *Reviewed by:*

*Piia Astikainen, University of Jyväskylä, Finland Huan Luo, Chinese Academy of Sciences, China*

#### *\*Correspondence:*

*Kazu Nakazawa, Department of Psychiatry and Behavioral Neurobiology, University of Alabama at Birmingham, Shelby Building 1105, 1825 University Boulevard, Birmingham, AL 35294, USA e-mail: nakazawk@uab.edu*

In schizophrenia, evoked 40-Hz auditory steady-state responses (ASSRs) are impaired, which reflects the sensory deficits in this disorder, and baseline spontaneous oscillatory activity also appears to be abnormal. It has been debated whether the evoked ASSR impairments are due to the possible increase in baseline power. GABAergic interneuron-specific NMDA receptor (NMDAR) hypofunction mutant mice mimic some behavioral and pathophysiological aspects of schizophrenia. To determine the presence and extent of sensory deficits in these mutant mice, we recorded spontaneous local field potential (LFP) activity and its click-train evoked ASSRs from primary auditory cortex of awake, head-restrained mice. Baseline spontaneous LFP power in the pre-stimulus period before application of the first click trains was augmented at a wide range of frequencies. However, when repetitive ASSR stimuli were presented every 20 s, averaged spontaneous LFP power amplitudes during the inter-ASSR stimulus intervals in the mutant mice became indistinguishable from the levels of control mice. Nonetheless, the evoked 40-Hz ASSR power and its phase locking to click trains were robustly impaired in the mutants, although the evoked 20-Hz ASSRs were also somewhat diminished. These results suggested that NMDAR hypofunction in cortical GABAergic neurons confers two brain state-dependent LFP abnormalities in the auditory cortex: (1) a broadband increase in spontaneous LFP power in the absence of external inputs, and (2) a robust deficit in the evoked ASSR power and its phase-locking despite normal baseline LFP power magnitude during the repetitive auditory stimuli. The "paradoxically" high spontaneous LFP activity of the primary auditory cortex in the absence of external stimuli may possibly contribute to the emergence of schizophrenia-related aberrant auditory perception.

**Keywords: auditory steady-state responses, GABAergic interneurons, gamma oscillation, local field potentials, NMDA receptors, parvalbumin, schizophrenia, mouse models**

## **INTRODUCTION**

Neural oscillation and synchronization abnormalities have been suggested to play a role in the information and sensory processing deficits commonly seen in schizophrenia (Ford and Mathalon, 2008; Uhlhaas and Singer, 2010; Gandal et al., 2012a). Periodic auditory stimulation entrains the electro-encephalogram (EEG) to a specific phase and frequency, often referred to as the auditory steady-state response (ASSR). In both human and animal models, the ASSR has been used to assess the functional integrity of neural circuits that support synchronization (Picton et al., 2003; Brenner et al., 2009; O'Donnell et al., 2013). In schizophrenia, reduced ASSR power (magnitude) and phase locking (phase consistency across trials), particularly at 40 Hz, are observed in EEG (Kwon et al., 1999; Brenner et al., 2003; Light et al., 2006; Spencer et al., 2008, 2009; Vierling-Claassen et al., 2008; Krishnan et al., 2009) as well as in magneto-encephalogram (MEG) (Teale et al., 2008; Maharajh et al., 2010; Tsuchimoto et al., 2011) studies. Since cortical parvalbumin (PV)-positive fast-spiking interneurons have an intrinsic resonance near this range (Tateno et al., 2004; Golomb et al., 2007), the reduction in 40-Hz ASSRs may reflect functional deficits of these fast-spiking neurons in schizophrenia.

Earlier studies of gamma synchrony deficits in schizophrenia reported the *relative* changes in gamma band activity in response to task stimuli, by assessing stimulus-evoked responses in synchrony compared with a pre-stimulus baseline (Kwon et al., 1999; Haig et al., 2000; Lee et al., 2003). Thus, in these studies *relatively* less evoked gamma synchrony could be a reflection of greater baseline spontaneous gamma phase synchrony under pre-stimulus conditions. However, in schizophrenia the evidence regarding baseline gamma activity abnormalities is inconsistent. Both increases (Jalili et al., 2007; Venables et al., 2009; Kikuchi et al., 2011; Spencer, 2012) and decreases (Yeragani et al., 2006; Rutter et al., 2009) in *baseline* spontaneous gamma power during pre-stimulus period or "resting state" have been reported. The reason for these contradictory results has yet to be clarified.

To measure the baseline spontaneous gamma band power with high precision, it would be useful to directly record local field potentials (LFPs), necessitating the use of animal models. To that end, we recorded LFPs directly from the primary auditory (A1) cortex of GABAergic interneuron-specific NMDA receptor (NMDAR) hypofunction mice (*Ppp1r2*-cre/fGluN1 KO mice). Previous studies using this mutant mouse revealed that the selective deletion of GluN1, an indispensable subunit of NMDARs, in cortical and hippocampal interneurons during early postnatal development recapitulates several schizophrenia-like behavioral and pathophysiological phenotypes (Belforte et al., 2010; Jiang et al., 2013). In the present study, we subjected these mutant mice to an ASSR paradigm similar to the one used in human studies (Krishnan et al., 2009). We assessed the auditory click train-evoked ASSRs and baseline LFP fluctuations in the pre/post-stimulus periods and at baseline (i.e., between stimulus presentations).

## **MATERIALS AND METHODS**

All experimental procedures were in accordance with National Research Council guidelines for the care and use of laboratory animals, and were approved by the National Institute of Mental Health Animal Care and Use Committee. Data analysis was conducted at the University of Alabama at Birmingham.

### **ANIMAL**

*Ppp1r2*-cre(+/−) /fGluN1(f/f) mice (henceforth referred to as KO mice or mutants) were generated as previously described (Belforte et al., 2010). Briefly, the protein phosphatase 1, regulatory subunit 2 (*Ppp1r2*)-cre line and a floxed-GluN1 (fGluN1) line were used to delete exons 9 and 10 of the GluN1 gene from the second postnatal week in a subset of cortical and hippocampal Ppp1r2-cre positive interneurons, the majority of which are PV-positive. Female mutant mice were bred to homozygous fGluN1 male mice to generate mutant and fGluN1 control mice with a 50% probability each. In the present study, 65 male mice received chronic survival surgery for the microwire array implantation. After successful detection of the auditory-evoked potentials 1 week after the surgery, 7 fGluN1 control (13–16 week-old, 30.6 ± 0.65 g body weight) and 6 mutant (12–14 week-old, 28.1 ± 0.8 g) mice were subjected to in-depth analysis of ASSRs, as described in the Results section.

#### **SURGICAL PROCEDURES**

Animals were anesthetized with isoflurane to surgical levels and were mounted in a stereotaxic instrument with non-rupture ear bars (Zygoma ear cups, David Kopf Instruments). A custom-made plastic headpost was secured to the occipital bone at the midline with superglue and dental acrylic, and was used to fix the animal's skull to the stereotaxic instrument. This was done to prevent physical occlusion of the external ear canals by stereotaxic ear bars in order to obtain tone-evoked LFP responses. A unilateral craniotomy was made over the right temporal bone from 1.5 to 3.5 mm posterior to bregma and from 3.5 to 4.5 mm lateral to midline. The vasculature was inspected. The microwire multielectrode array consisted of six tetrodes, which were custom-configured in a 2 × 3 matrix with inter-electrode distance of ∼200 µm, covering 0.6 × 0.8 mm<sup>2</sup>. The impedance of each electrode was between 0.2 and 0.3 MΩ. The microwire array was inserted into the superficial layers of A1 cortex with the aid of cortical vascular patterns, and two stainless steel screws in the frontal cortex served as ground and reference electrodes. After the dosage of isoflurane was reduced to 1%, a single white noise pulse (1 ms, duration; 80 dB, SPL) was applied to activate the A1 cortical area. In order to allow for the tone-evoked responses it is critical to maintain the isoflurane concentration at 1% (Santarelli et al., 2003). The animal was held in place with adhesive tape to prevent head twitching or grooming. An analgesic (buprenorphine, 0.1 mg/kg *s.c.*) was given to diminish pain sensation during the surgery. If single-tone evoked potentials (over 0.1 mV magnitude) were detected in at least one electrode of the microwire array, the electrodes were inserted further until maximal responses were obtained. The anesthetic dose was then returned to surgical levels, and the microwire array was fixed to the skull with dental acrylic.

#### *IN VIVO* **RECORDING**

Seven days after surgery, LFP recording was performed from A1 cortex of awake, head-restrained mice. The mice were briefly anesthetized with 1% isoflurane to hold the animal head fixed to the stereotaxic instrument using the headpost, and the body was covered with adhesive paper tape to limit body movements. The micro-array electrodes were directly connected, via an EIB-27-Micro headstage pre-amplifier, to a Cheetah-64 recording system (Neuralynx Inc.), where LFP signals were filtered (bandwidth from 0.1 to 475 Hz), digitized, and acquired at a sampling rate of 1.56 kHz per channel. Thirty minutes after the cessation of anesthesia, LFP recording began from A1 cortex of awake, head-restrained mice in a custom-made auditory isolation chamber (background sound level, 40 dB SPL).

In the first session, spontaneous LFP activity during a pre-stimulus period was recorded from A1 cortex for 2–25 min. Subsequently, in the second session, 500-ms long click trains consisting of 80 dB white-noise pulses presented at 40 Hz (40-Hz ASSR stimuli) were applied 50 times with an inter-stimulus interval of 20 s, which mimics the ASSR protocol used in human studies (Krishnan et al., 2009). Auditory click stimuli, consisting of white noise pulses (1 ms, duration; 80 dB, SPL), were generated in Labview (National Instruments Inc.), and presented using a speaker with a 35 Hz–20 kHz frequency response (Z3, Logitech Inc.) placed 30 cm above the mouse head. In the third session, which began 10 min after cessation of the second session, 1000 ms long click trains consisting of 80 dB white-noise presented at 20 Hz (20-Hz ASSR stimuli) were applied 50 times with an inter-stimulus interval of 20 s. In the last session, spontaneous LFP activity was recorded for 25 min as a post-stimulus period. When no auditory evoked LFP responses were detected in any channels during the second session, the experiment was terminated and the animal was euthanized.
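The click-train timing described above can be illustrated with a short sketch. This is not the Labview routine used in the study; the audio sampling rate (44.1 kHz), the function names, the uniform noise amplitude, and the convention that the first click falls at time 0 are all assumptions made for illustration.

```python
import random


def click_onsets(rate_hz, train_ms):
    """Onset times (ms) of clicks in one train, assuming the first click is at 0 ms."""
    period_ms = 1000.0 / rate_hz
    n_clicks = int(train_ms / period_ms)
    return [i * period_ms for i in range(n_clicks)]


def click_train(rate_hz, train_ms, pulse_ms=1.0, fs_hz=44100, seed=0):
    """Waveform of one train: 1-ms white-noise pulses separated by silence.

    fs_hz is a hypothetical audio sampling rate, not a value from the text.
    """
    rng = random.Random(seed)
    n_samples = int(round(train_ms * fs_hz / 1000.0))
    pulse_len = int(round(pulse_ms * fs_hz / 1000.0))
    wave = [0.0] * n_samples
    for onset_ms in click_onsets(rate_hz, train_ms):
        start = int(round(onset_ms * fs_hz / 1000.0))
        for k in range(start, min(start + pulse_len, n_samples)):
            wave[k] = rng.uniform(-1.0, 1.0)  # white-noise sample
    return wave


onsets_40 = click_onsets(40, 500)    # 40-Hz ASSR stimulus, 500 ms
onsets_20 = click_onsets(20, 1000)   # 20-Hz ASSR stimulus, 1000 ms
wave_40 = click_train(40, 500)
```

Under these assumptions both stimulus types contain the same number of clicks per train, because the 20-Hz trains are twice as long as the 40-Hz trains.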

#### **LFP ANALYSIS**

Only channel data in which the amplitude of the initial N1 response in the 40-Hz ASSRs exceeded 0.1 mV (∼4 times the standard deviation) were used for subsequent analyses. Neuralynx LFP files were first converted to Spike2 format to visually inspect the raw data. Next, LFP voltage values in the Neuralynx files were converted to Matlab (Mathworks) files, and these values were normalized to z-scores by subtracting the mean and dividing by the standard deviation of the LFP voltages during the entire recording epoch (∼20 min). The Matlab files with the z-scores were then converted to NeuroExplorer (Nex Technologies) files to calculate the power.

In order to assess the oscillatory component of evoked ASSRs, z-score normalized LFPs during the last 200-ms of each ASSR were analyzed with a fast Fourier transform (FFT) algorithm in the range of 0–100 Hz using 256 frequency bins and presented as total ASSR power (e.g., **Figure 1C**). Relative power amplitudes were calculated by subtracting a baseline spontaneous power, which was from the 200-ms inter-stimulus segment 10 s prior to each click-train onset, from the total ASSR power (see **Figure 1E**).
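The normalization and baseline subtraction described above can be sketched as follows. This is a simplified illustration, not the NeuroExplorer computation: it evaluates a direct discrete Fourier sum at a single frequency instead of a 256-bin FFT, and the synthetic signals, function names, and amplitudes are assumptions.

```python
import cmath
import math
import statistics

FS_HZ = 1560.0  # sampling rate reported in the text (1.56 kHz)


def zscore(samples):
    """Subtract the mean and divide by the standard deviation of the epoch."""
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    return [(x - mean) / sd for x in samples]


def power_at(segment, freq_hz, fs_hz=FS_HZ):
    """Squared magnitude of the Fourier sum at one frequency (arbitrary units)."""
    n = len(segment)
    coeff = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
                for k, x in enumerate(segment))
    return abs(coeff / n) ** 2


def relative_power(evoked, baseline, freq_hz, fs_hz=FS_HZ):
    """Total evoked power minus baseline spontaneous power at the same frequency."""
    return power_at(evoked, freq_hz, fs_hz) - power_at(baseline, freq_hz, fs_hz)


n_win = round(0.2 * FS_HZ)  # 200-ms analysis window
# Synthetic data (assumption): weak 40-Hz baseline followed by a strong 40-Hz response.
baseline_raw = [0.1 * math.sin(2 * math.pi * 40 * k / FS_HZ) for k in range(n_win)]
evoked_raw = [math.sin(2 * math.pi * 40 * k / FS_HZ) for k in range(n_win)]
recording = zscore(baseline_raw + evoked_raw)  # normalize the whole epoch first
baseline, evoked = recording[:n_win], recording[n_win:]
rel_40 = relative_power(evoked, baseline, 40.0)
rel_70 = relative_power(evoked, baseline, 70.0)
```

In this toy case the relative power is large at 40 Hz and essentially zero at 70 Hz, which mirrors how a peak in the difference spectrum marks the entrained frequency.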

For spontaneous LFP power during a pre-stimulus period, LFP data (200-ms bin) during the last 10 s prior to the first click-train administration were analyzed with an FFT algorithm in the range of 0–200 Hz using 256 frequency bins (**Figure 3**). To compare the baseline power magnitudes in-between ASSR sessions with the spontaneous power during the pre- or post-stimulus period, z-score normalized LFP from 5 to 15 s (200-ms bin) after the 1st, 25th, and 50th stimuli were analyzed with an FFT algorithm in the range of 0–100 Hz using 256 frequency bins. For the pre-stimulus period, the baseline spontaneous power during the last 10 s before click train onset was analyzed in the range of 0–100 Hz using 256 frequency bins. For power spectral analysis during the post-stimulus period, LFP data obtained from a 10-s period (200-ms bin) 20 min after the cessation of all ASSR stimuli were analyzed with an FFT algorithm in the range of 0–100 Hz using 256 frequency bins (see **Figure 4A**).

To calculate phase locking to auditory click trains, phase-locking analysis was performed in a frequency range of 0–100 Hz with a 60% overlapping window after applying Hanning tapering to the normalized LFP data, which were further analyzed with an FFT algorithm. To plot a scalogram (wavelet spectrogram), Matlab z-score files of LFP were wavelet transformed using a Complex Gaussian wavelet from the Matlab wavelet toolbox.
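One common way to quantify phase locking across trials is the phase-locking factor, i.e., the length of the mean unit phase vector at a given frequency. The sketch below assumes this definition and is not the authors' implementation; it applies a single Hann taper per trial and does not reproduce the 60% overlapping windows. The simulated trials, noise level, and function names are hypothetical.

```python
import cmath
import math
import random


def hann_window(n):
    """Hann (Hanning) taper of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def phase_locking_factor(trials, freq_hz, fs_hz):
    """Length of the across-trial mean of unit phase vectors (0 = random, 1 = locked)."""
    window = hann_window(len(trials[0]))
    resultant = 0j
    for trial in trials:
        coeff = sum(w * x * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
                    for k, (w, x) in enumerate(zip(window, trial)))
        resultant += coeff / abs(coeff)  # keep the phase, discard the amplitude
    return abs(resultant) / len(trials)


def simulate_trials(n_trials, jitter_rad, rng, fs_hz=1560.0, freq_hz=40.0, n=312):
    """Noisy sinusoidal trials whose phase varies uniformly within +/- jitter_rad."""
    trials = []
    for _ in range(n_trials):
        phase = rng.uniform(-jitter_rad, jitter_rad)
        trials.append([math.sin(2 * math.pi * freq_hz * k / fs_hz + phase)
                       + rng.gauss(0.0, 0.5) for k in range(n)])
    return trials


rng = random.Random(1)
plf_locked = phase_locking_factor(simulate_trials(50, 0.1, rng), 40.0, 1560.0)
plf_random = phase_locking_factor(simulate_trials(50, math.pi, rng), 40.0, 1560.0)
```

Because each trial contributes only its phase, the measure is insensitive to response amplitude, which is why power and phase locking are reported as separate quantities.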

#### **STATISTICS**

Given that between-animal variability may be larger than within-animal variability in per-channel (i.e., per electrode) design, we mainly presented the data with per-animal design in the Figures and some data with per-channel design in the Supplemental Figures (see Lazic and Essioux, 2013). Differences between groups were assessed for normally distributed data using a Student's *t*-test (Statcel 2nd ed., OMS, Tokyo, Japan). The effect size was assessed as Cohen's *d*. For the graph data in **Figures 4**, **5**, differences were assessed by repeated measures ANOVAs followed by Bonferroni *post-hoc* analysis (SPSS, IBM). Data were presented as mean ± s.e.m.
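As an illustration of the group comparison, the sketch below computes an unpaired Student's *t* statistic and Cohen's *d* with a pooled standard deviation. The pooled-variance form of *d* is an assumption (the text does not state which variant was used), and the sample values are hypothetical, chosen only to match the group sizes of 7 controls and 6 mutants.

```python
import math
import statistics


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_sd(a, b)


def student_t(a, b):
    """Unpaired equal-variance t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    diff = statistics.fmean(a) - statistics.fmean(b)
    t = diff / (pooled_sd(a, b) * math.sqrt(1.0 / na + 1.0 / nb))
    return t, na + nb - 2


# Hypothetical per-animal values (arbitrary units), not data from the study.
control = [1.9, 2.4, 2.1, 2.8, 2.2, 2.6, 2.0]
mutant = [1.2, 1.6, 1.1, 1.8, 1.4, 1.5]
t_value, dof = student_t(control, mutant)
d_value = cohens_d(control, mutant)
```

With these definitions the two statistics are linked by t = d / sqrt(1/n1 + 1/n2), so for a per-animal comparison of 7 versus 6 animals the degrees of freedom are 11.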

## **RESULTS**

#### **ROBUST REDUCTION OF 40-Hz AUDITORY STEADY-STATE RESPONSES (ASSRs)**

Seven days after surgery, 40-Hz click train-evoked initial N1 responses (i.e., the transient auditory evoked potentials to click train onset, more than 0.1 mV) were detected in A1 cortex from 13 mice (7 fGluN1 control and 6 mutant mice), out of a total of 65 animals in which the click-train-evoked responses had been detected during the electrode implantation surgery. The relatively high number of animals that displayed no evoked LFPs was most likely due to a shift of or damage to the electrode microarray placed on the temporal bone. Thirty-one LFP recordings from 7 control mice [animal #1: 2 (number of recording sites to be analyzed)/6 (total channel number), #2: 6/6, #3: 3/6, #4: 6/6, #5: 6/6, #6: 3/6, #7: 5/6], and 26 LFP recordings from 6 mutant mice (animal #1: 5/6, #2: 4/6, #3: 6/6, #4: 5/6, #5: 3/6, #6: 3/6) were subjected to subsequent in-depth LFP analysis.

Fifty 40-Hz click trains (duration, 500 ms) were delivered to click train-naïve animals with an inter-stimulus interval of 20 s. **Figure 1A** depicts representative examples of the averaged ASSRs (middle) and scalogram (wavelet spectrogram, bottom) evoked by 40 Hz stimulation (upper) in the floxed-control (left) and mutant mouse (right). Robust click train-evoked N1 potentials were elicited within the first 100 ms after click-train onset in both genotypes, and there were no differences in the averaged N1 amplitudes between genotypes per animal (**Figure 1B**). However, the N1 amplitudes averaged per channel were lower in the mutants compared to the floxed-control mice (*p* < 0.05, Student's *t*-test, Supplemental Figure 1A). To assess the subsequent ASSRs coherent to the 40-Hz click trains without any impact of evoked N1 potentials on the steady-state responses, LFP data (z-score) during the last 200 ms before click-train cessation (a dashed line period in **Figure 1A**) were analyzed with an FFT algorithm. We found that the amplitudes of 40-Hz ASSRs were smaller in the mutants compared to the controls per animal [**Figure 1C**, *t*(11) = 2.8, *p* < 0.05, Cohen's *d* = 1.60 (large effect size)] and per channel [Supplemental Figure 1B, *t*(55) = 5.23, *p* < 0.01, *d* = 1.43 (large effect size)]. The difference in evoked ASSR power from baseline spontaneous power during inter-stimulus intervals, which was obtained by subtracting the spontaneous power amplitudes in-between ASSR stimuli from total ASSR power (**Figure 1E**), also peaked at 40 Hz and, to a lesser degree, at 30 Hz in the controls. Conversely, only small differences were detected in the mutant mice (**Figure 1D** and Supplemental Figure 1C).
**Figure 1F** and Supplemental Figure 1D show the power spectrum density difference from the baseline at 35–44 Hz for each animal [*n* = 7 controls, *n* = 6 mutants, *t*(11) = 4.57, *p* < 0.01, *d* = 2.64 (large effect size)] and for each channel [*n* = 31 sites from 7 controls, *n* = 26 sites from 6 mutants, *t*(55) = 7.79, *p* < 0.01, *d* = 2.14 (large effect size)], indicating that average low gamma power for the evoked ASSRs was lower in the mutants compared to the controls. In addition, phase locking analysis of the 40-Hz ASSRs (z-score) revealed two peaks at 40 Hz (35–44 Hz) and at 80 Hz (75–84 Hz) for both controls and mutants (**Figure 1G**), but only phase locking at 40 Hz in the mutants was lower in comparison to controls for each animal [**Figure 1H**, *t*(11) = 4.93, *p* < 0.01, *d* = 2.75 (large effect size)] and for each channel [Supplemental Figure 1E, *t*(55) = 9.42, *p* < 0.01, *d* = 2.47 (large effect size)]. These findings suggest that mutants are severely impaired in 40-Hz ASSR for both amplitude and phase locking, both of which are reminiscent of ASSR deficits in schizophrenia patients.

#### **DIMINISHED 20-Hz ASSR POWER AND PHASE-LOCKING**

We next examined 20-Hz ASSRs (duration, 1000 ms) to explore whether ASSR deficits are specific to 40-Hz stimuli. **Figure 2A**

Schematic diagram indicates the analysis periods of baseline LFP power (green, ISI spontaneous power) and the evoked ASSR power (red). Relative ASSR power amplitudes shown in Panel **(D)** were calculated by subtracting an ISI power (in green) from the evoked ASSR power (in red) for each channel, and averaged per animal. **(F)** The difference in the magnitude between 35 and 44 Hz spectral power (arrowheads in Panel **D**) and the baseline for 40-Hz ASSRs in mutants (red) was lower than controls (blue). ∗∗*p* < 0.01, unpaired Student's *t*-test. **(G)** Phase locking to 40-Hz steady-state tone stimuli in control (blue) and mutant (red) mice. Dotted lines: mean ± s.e.m. **(H)** Magnitude of 35–44 Hz phase locking for 40-Hz ASSRs (arrowheads in Panel **G**) in mutants (red) was lower than controls (blue). ∗∗*p* < 0.01, unpaired Student's *t*-test. Each dot represents individual animals. Dotted lines in Panels **(D,G)** are s.e.m.

**FIGURE 2 | Diminished power and phase-locking of 20-Hz ASSRs. (A)** Representative examples of the averaged 20-Hz ASSR (middle, z-score) and spectrogram (bottom), in response to 20-Hz click trains (upper; 80 dB intensity, 1000 ms duration). Time 0 is tone onset. **(B)** No difference in the averaged N1 amplitudes (z-score) evoked by 20-Hz click trains between genotypes (blue for 7 fGluN1 control mice; red for 6 mutant mice). *p* = 0.54, unpaired Student's *t*-test. **(C)** The mean difference (A.U.) from ISI spontaneous power in click train-evoked ASSR power during last 200 ms before cessation of 20-Hz click trains (red

square in **A**) (blue for 7 fGluN1 controls; red for 6 mutants). Dotted lines: mean ± s.e.m. **(D)** The difference in the magnitude between 15 and 24 Hz power (arrowheads in **C**) from ISI spontaneous power for 20-Hz ASSRs was lower in mutant mice (red). ∗*p* < 0.05, unpaired Student's *t*-test. **(E)** Phase locking to 20-Hz ASSR stimuli in control (blue) and mutant (red) mice. Dotted lines: mean ± s.e.m. **(F)** Magnitude of 15–24 Hz phase locking for 20-Hz ASSRs (arrowheads in **E**) was lower in mutant mice (red). ∗*p* < 0.05 unpaired Student's *t*-test. Each dot represents individual animals. Dotted lines in **(C,E)** are s.e.m.


depicts a representative example of the averaged evoked potentials (middle) and spectrogram (bottom) evoked by 20-Hz click trains (upper) in control (left) and mutant mice (right). First, we found no difference in the averaged N1 amplitudes between genotypes analyzed per animal (**Figure 2B**) or analyzed per channel (Supplemental Figure 2A). The difference in the evoked ASSR power, which was obtained by subtracting the spontaneous power amplitudes in between ASSR stimuli from the total ASSR power during the last 200 ms before cessation of click-trains (dashed period in **Figure 2A**), also peaked at 20 Hz with a smaller peak at the 40 Hz harmonic in both genotypes (**Figure 2C** and Supplemental Figure 2B). However, the relative power of the dominant peak at 20 Hz (15–24 Hz) was lower in the mutants compared to controls per animal [**Figure 2D**, *t*(11) = 2.59, *p* < 0.05, *d* = 1.47 (large effect size)] and in per-channel design [Supplemental Figure 2C, *t*(55) = 4.58, *p* < 0.01, *d* = 1.25 (large effect size)]. Furthermore, phase locking of the 20-Hz ASSR consisted of a dominant peak at 20 Hz with several spectral peaks at harmonics of 20 Hz (**Figure 2E** and Supplemental Figure 2D). The dominant peak of phase locking factor at 15–24 Hz in the mutants was lower than the controls analyzed per animal [**Figure 2F**, *t*(11) = 2.29, *p* < 0.05, *d* = 1.3 (large effect size)] and analyzed per channel [Supplemental Figure 2E, *t*(55) = 3.77, *p* < 0.01, *d* = 1.01 (large effect size)], but other spectral peaks in the mutants were similar to those in controls per animal (*p* = 0.63 for 35–44 Hz, *p* = 0.37 for 55–64 Hz, *p* = 0.40 for 75–84 Hz, unpaired Student's *t*-test) and per channel (*p* = 0.44 for 35–44 Hz, *p* = 0.27 for 55–64 Hz, *p* = 0.50 for 75–84 Hz, unpaired Student's *t*-test). These results indicate that both ASSR power and phase locking evoked by 20-Hz ASSR stimuli are also diminished in the mutant mice, while the magnitudes of auditory-evoked potentials triggered by 20-Hz stimuli are largely unaffected.
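The two measures compared above, baseline-subtracted band power and the phase-locking factor, can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: it assumes that phase locking is taken as the length of the mean unit phase vector across trials, and all function names, the sampling rate, and the frequency list are hypothetical.

```python
import cmath
import math


def fourier_coefficient(signal, fs, freq):
    """Complex Fourier coefficient of `signal` at `freq` (Hz); `fs` is the sampling rate (Hz)."""
    n = len(signal)
    return sum(x * cmath.exp(-2j * math.pi * freq * k / fs)
               for k, x in enumerate(signal)) / n


def band_power(signal, fs, freqs):
    """Mean squared magnitude of the Fourier coefficients over the listed frequencies."""
    return sum(abs(fourier_coefficient(signal, fs, f)) ** 2 for f in freqs) / len(freqs)


def relative_power(evoked, baseline, fs, freqs):
    """Evoked band power minus the spontaneous (inter-stimulus) band power."""
    return band_power(evoked, fs, freqs) - band_power(baseline, fs, freqs)


def phase_locking_factor(trials, fs, freq):
    """Length of the mean unit phase vector across trials (0 = random phase, 1 = identical phase)."""
    coeffs = [fourier_coefficient(trial, fs, freq) for trial in trials]
    # Keep only the phase of each trial; skip trials with no energy at `freq`.
    units = [c / abs(c) for c in coeffs if abs(c) > 0]
    if not units:
        return 0.0
    return abs(sum(units) / len(units))
```

With a 200-ms analysis window the frequency resolution is 5 Hz, so a band such as 15–24 Hz would be represented here by the frequencies 15 and 20 Hz; how the bins were actually combined is not stated in the text.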

### **ENHANCED SPONTANEOUS LFP POWER IN AWAKE QUIESCENT PERIOD**

To systematically explore the levels of spontaneous power throughout the inter-ASSR stimulus intervals and the post-ASSR period, we further assessed the transition of spontaneous LFP power (z-score) from the pre-stimulus period to the inter-stimulus periods following the first, 25th and 50th 40-Hz click-train administration, and to the post-stimulus period 20 min after the cessation of the last (50th) 40-Hz click-train (**Figure 3A**). First, we assessed the power spectra of z-score normalized LFPs during the pre-stimulus period from awake head-restrained animals. We found that baseline spontaneous power during the last 10-s pre-stimulus period prior to the first click-train administration was augmented in the mutants compared to the controls regardless of the spectral frequency, in both the per-animal (**Figure 3B**) and per-channel (Supplemental Figure 3A) designs. The averaged power of baseline LFPs in the low gamma (30–50 Hz) and high gamma (50–100 Hz) ranges was higher in the mutant mice than in the controls per animal [**Figure 3C**, *t*(11) = 3.00, *p* < 0.01, *d* = 1.67 for low gamma; *t*(11) = 3.13, *p* < 0.01, *d* = 1.74 for high gamma] and per channel [Supplemental Figure 3B, *t*(55) = 6.41, *p* < 0.01, *d* = 1.66 for low gamma; *t*(55) = 5.56, *p* < 0.01, *d* = 1.46 for high gamma]. This elevation of LFP fluctuation continued even at super gamma frequency (100–120 Hz) per animal [*t*(11) = 2.41, *p* < 0.05, *d* = 1.33 (large effect size)].
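The effect sizes reported alongside each *t* statistic can be related to it with a short calculation. The sketch below assumes Cohen's *d* based on the pooled standard deviation of two independent groups, in which case *d* = *t* · √(1/*n*₁ + 1/*n*₂); the text does not state which estimator was used, so this is an illustration only, and the function name is hypothetical.

```python
import math


def cohens_d_from_t(t, n1, n2):
    """Cohen's d (pooled SD) implied by an unpaired Student's t statistic.

    Assumes two independent groups of sizes n1 and n2 with equal variances.
    """
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


# Per-animal design: 7 controls and 6 mutants, i.e. 11 degrees of freedom.
low_gamma_d = cohens_d_from_t(3.00, 7, 6)   # about 1.67
high_gamma_d = cohens_d_from_t(3.13, 7, 6)  # about 1.74
```

Under this assumption the low and high gamma values match the reported *d* = 1.67 and 1.74. Other reported values differ from this formula by a few hundredths, which may reflect rounding of *t* or a different estimator.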

## **SPONTANEOUS LFP POWER RETURNS BACK TO NORMAL UPON ASSR STIMULI**

After the 40-Hz ASSR session began, we found a clear trend of a gradual reduction in mutant spontaneous LFP power amplitudes during inter-stimulus intervals with the increasing number of ASSR stimuli (**Figures 4A–C**, **5A–C**). For example, spontaneous LFP power per animal was reduced in the inter-stimulus period following the 25th ASSR stimulus, compared to the pre-ASSR period, at 35–44 Hz [**Figure 4B**, *F*(1, 11) = 1.389, *p* = 0.263 for genotype, Bonferroni *post-hoc* test, *p* < 0.05]. In the per-channel design, spontaneous LFP power at 21–30 Hz [**Figure 5A**, *F*(1, 54) = 1.326, *p* = 0.255 for genotype, Bonferroni *post-hoc* test, *p* < 0.05], 35–44 Hz [**Figure 5B**, *F*(1, 54) = 6.392, *p* = 0.014 for genotype, Bonferroni *post-hoc* test, *p* < 0.05], and 71–80 Hz [**Figure 5C**, *F*(1, 54) = 2.707, *p* = 0.106 for genotype, Bonferroni *post-hoc* test, *p* < 0.05] was decreased by the 25th ISI in the mutants. On the other hand, the spontaneous LFP power in the control mice tended to increase after the 1st ASSR stimuli, particularly in the beta frequency range (**Figure 4A**). This power increase upon ASSR stimuli in the control mice was prominent in the per-channel design (**Figures 5A,B**, *p* < 0.05, respectively). Consequently, no genotypic difference was detected in spontaneous LFP power magnitudes during 10-s inter-stimulus intervals (combined data of first, 25th and 50th ISIs) at any frequency examined, either per animal (**Figure 4D**) or per channel (**Figure 5D**). Interestingly, 20 min after the last ASSR, the spontaneous LFP power in the mutants was significantly augmented [**Figure 4A**, *F*(1, 11) = 1.104, *p* = 0.335 for genotype, Bonferroni *post-hoc* test, *p* < 0.05] to the level of the pre-stimulus period. The elevation of LFP power amplitudes was more prominent in the per-channel analysis (**Figures 5A–C**, *p* < 0.05).
These results suggest a brain state-dependent abnormality of baseline spontaneous LFP power in the mutant mice, i.e., an abnormally high spontaneous LFP power in an awake quiescent period, which disappears upon receiving external auditory stimuli. Our findings also strongly suggest that the evoked ASSR deficit found in our mutants is not due to greater baseline spontaneous gamma power; rather, it reflects a deficit in evoking responses to external stimuli.

## **DISCUSSION**

We demonstrated abnormal oscillatory LFP power and impaired auditory-evoked LFP responses from the auditory cortex of awake, head-restrained GABA neuron-specific NMDAR hypofunction mice. Specifically, we found (1) a profound reduction of ASSR power and phase locking at 40 Hz and, to a lesser degree, at 20 Hz, and (2) a broadband increase in spontaneous LFP power during the pre-stimulus period, but not during the inter-ASSR stimulus intervals. Interestingly, the abnormal elevation of baseline spontaneous LFP power during the pre-stimulus period disappeared after the ASSR stimuli were presented. These findings suggest that NMDAR hypofunction in cortical GABAergic interneurons leads to two temporally distinct, brain state-dependent LFP deficits in A1 cortex: (1) the evoked ASSR deficits with a normal level of spontaneous LFP power and (2) an abnormal broadband elevation of spontaneous LFP power when no auditory stimuli are presented. Our study also showed no obvious contribution to the evoked ASSR deficits of augmented spontaneous LFP fluctuation following NMDAR hypofunction in GABAergic interneurons.

**FIGURE 3 | Broadband elevation of mutant spontaneous LFP power during pre-stimulus period. (A)** Schematic diagram indicates the analysis periods of spontaneous LFP power during pre-stimulus period (Pre), following the first ASSR stimuli [1st inter-stimulus interval (ISI)], following the 50th ASSR stimuli (50th ISI), and during post-stimulus period (Post). For a pre-stimulus period, z-score normalized LFPs (top line) during last 10 s (upper left green square) before the first click train onset were analyzed with FFT algorithm with every 200-ms bin (each box). During ASSR sessions, LFP data from 5 to 15 s (200-ms bin) after the 1st stimuli (upper right green square), the 25th stimuli (not shown), and 50th stimuli (bottom left green square) were analyzed with FFT algorithm. For a post-stimulus period, LFPs obtained from a 10-s period (bottom right green square) 20 min after cessation of the last click trains were analyzed with FFT algorithm (200-ms bin). **(B)** Z-score normalized spectral density power during pre-stimulus period from control (blue) and mutant (red) mice (control: *n* = 7, mutant: *n* = 6). Dotted lines: mean ± s.e.m. A 60-Hz bump in control LFP power spectra was due to power line noise contamination. **(C)** Averaged spontaneous LFP powers at low gamma (30–50 Hz) and high gamma (50–100 Hz) frequency range were higher in mutant (red) mice compared to control mice (blue). ∗∗*p* < 0.01, unpaired Student's *t*-test.

#### **POTENTIAL MECHANISMS UNDERLYING EVOKED ASSR DEFICITS**

We demonstrated robust ASSR deficits in the mutant mice in which NMDARs were selectively eliminated from 75 to 84% of PV-containing interneurons in neocortex (Belforte et al., 2010). This suggests that NMDARs in the PV-positive fast-spiking neurons are crucial for emergence of ASSRs. Optogenetically evoked gamma oscillations have also been shown to be defective in mice in which NMDARs are genetically ablated from all PV-positive neurons (Carlén et al., 2012). The mechanism by which NMDAR deletion from PV neurons results in the ASSR deficits is not fully known. However, activation of cortical PV-positive interneurons in the thalamorecipient circuit is known to enhance acoustic information flow by feed-forward inhibition, which contributes to improved signal-to-noise ratio (Hamilton et al., 2013). In particular, the firing rate of fast-spiking neurons, likely PV-positive, appears to increase with increasing attention to external stimuli (Mitchell et al., 2007; Chen et al., 2008). It is noted that although selective genetic GluN1 deletion also occurs in ∼30% of Reelin-positive interneurons in the mutant cortex, Reelin-positive neurons are located mostly in the supra-granular layers. Therefore, the most likely mechanism for our observation is a functional deficit in the NMDAR-deleted PV neurons that receive thalamocortical afferents. Presumed impairment in their feed-forward inhibition in response to acoustic stimuli may attenuate the generation of auditory evoked potentials followed by gamma oscillations. Further research exploring whether NMDAR hypofunction in cortical PV-neurons disturbs feedforward information flow elicited by auditory stimuli in A1 cortex is warranted.

**FIGURE 4 | Normal magnitude of baseline LFP power during periods of periodic ASSR stimuli, by per-animal design. (A)** Transition of z-score normalized spontaneous LFP powers per animal at 21–30 Hz frequency in control (blue) and mutant (red) mice during Pre (pre-stimulus period), 1st ISI (first inter-stimulus interval), 25th ISI, 50th ISI, and Post (post-stimulus period). ∗*p* < 0.05. **(B)** Transition of spontaneous LFP powers at 35–44 Hz in control (blue) and mutant (red) mice. ∗*p* < 0.05. **(C)** Transition of spontaneous LFP powers at 71–80 Hz in control (blue) and mutant (red) mice. ∗*p* < 0.05. Repeated-measures ANOVA followed by *post-hoc* Bonferroni testing. **(D)** No differences in averaged spontaneous LFP power amplitudes in the first, the 25th and the 50th ISIs, across frequencies between control (blue, *n* = 7) and mutant (red, *n* = 6) mice. The inset shows no difference in average LFP power amplitudes at low gamma (30–50 Hz) and high gamma (50–100 Hz) frequency. A 60-Hz bump in control LFP power spectra was due to power line noise contamination. Dotted lines: mean ± s.e.m.

## **POTENTIAL MECHANISMS UNDERLYING THE ENHANCED BASELINE LFP FLUCTUATION**

We also observed elevated spontaneous LFP oscillatory power in the pre-stimulus period before the animal attends to the auditory stimuli. Since genetic ablation of NMDARs selectively from PV neurons in awake mice also results in increased baseline power (Korotkova et al., 2010; Carlén et al., 2012), this finding is most likely due to NMDAR hypofunction in PV neurons. Similar results were also obtained from GluN1 hypomorph mice (Dzirasa et al., 2009; Gandal et al., 2012b) and from the acute administration of NMDAR antagonists (phencyclidine, ketamine, or MK-801) to rodents (Leung, 1985; Ma and Leung, 2000, 2007; Pinault, 2008; Ehrlichman et al., 2009; Hakami et al., 2009; Páleníček et al., 2011; Kulikova et al., 2012; Wood et al., 2012; Caixeta et al., 2013; Molina et al., 2014), to humans (Maksimow et al., 2006; Hong et al., 2010), and in *in vitro* slice preparation (McNally et al., 2011). The most likely mechanistic explanation for these effects is that cortical disinhibition elicited by NMDAR deletion from local PV neurons renders the cortical glutamatergic neurons hyperexcitable (Olney and Farber, 1995; Homayoun and Moghaddam, 2007; Lisman et al., 2008; Nakazawa et al., 2012). However, the NMDAR hypofunction-induced baseline power increase is unlikely to be caused by hyper-synchrony of spiking activity. A recent *in vivo* unit/LFP recording study revealed that cortical disinhibition elicited by MK-801, an NMDAR antagonist, evoked an increase in the number of random spike trains of individual units and consequently a reduced synchronized firing of action potentials in mPFC of freely moving rats, despite a robust increase in LFP power at gamma frequency (Molina et al., 2014). This finding suggests a decoupling of gamma band LFP power from neuronal spiking synchrony. Similarly, we previously reported a disruption of *in vivo* spike synchrony among pyramidal neurons in somatosensory cortex in the same mutant mice used in this study (Belforte et al., 2010). Therefore, the spontaneous LFP power increase following cortical NMDAR hypofunction may simply reflect a robust increase in synaptic inputs with aberrant or "noisy" spike firing. It is also plausible that NMDAR antagonism on GABAergic neurons in the basal ganglia and/or thalamic reticular nucleus causes disinhibition of thalamocortical neurons, leading to massive stimulation of cortical neurons at gamma frequency (Llinás and Ribary, 1993; Santana et al., 2011). However, this is unlikely in our model because the genetic manipulation is largely confined to the cortex and hippocampus.

**FIGURE 5 | Normal magnitude of baseline LFP power during periods of periodic ASSR stimuli, by per-channel design. (A)** Mean normalized powers in per channel design for 21–30 Hz frequency LFP fluctuation in control (blue, *n* = 31 sites from 7 animals) and mutant (red, *n* = 26 sites from 6 animals) mice during Pre-ASSR, 1st ISI, 25th ISI, 50th ISI, and post-ASSR. ∗∗*p* < 0.01, repeated-measures ANOVA followed by *post-hoc* Bonferroni testing. **(B)** Mean normalized powers for 35–44 Hz frequency LFP fluctuation in control (blue) and mutant (red) mice during Pre-ASSR, 1st ISI, 25th ISI, 50th ISI, and Post-ASSR. ∗∗*p* < 0.01 and ∗*p* < 0.05, repeated-measures ANOVA followed by *post-hoc* Bonferroni testing. **(C)** Mean normalized powers for 71–80 Hz frequency LFP fluctuation in control (blue) and mutant (red) mice during Pre-ASSR, 1st ISI, 25th ISI, 50th ISI, and Post-ASSR. ∗∗*p* < 0.01, ∗*p* < 0.05, repeated-measures ANOVA followed by *post-hoc* Bonferroni testing. **(D)** No differences in averaged spontaneous LFP power amplitudes in the first, the 25th and the 50th ISIs, across frequencies between control mice (blue, *n* = 31 sites from 7 animals) and mutant mice (red, *n* = 26 sites from 6 animals). The inset shows no difference in average LFP power amplitudes at low gamma (30–50 Hz) and high gamma (50–100 Hz) frequency. A 60-Hz bump in control LFP power spectra was due to power line noise contamination. Dotted lines: mean ± s.e.m.

#### **BRAIN STATE-DEPENDENT ELEVATION OF SPONTANEOUS LFP POWER**

Unexpectedly, we found that spontaneous LFP power amplitudes tend to decrease during the repeated ASSR stimuli, a phenomenon that was more robust in the per-channel design (**Figure 5**). Accordingly, the broadband elevation of spontaneous LFP power in the pre-stimulus period (**Figure 3B**) disappeared during the inter-ASSR stimulus periods (**Figure 4D**). The structure of cortical spontaneous activity is known to vary with cortical state or behavioral state (Steriade et al., 2001; Harris and Thiele, 2011). During the slow-wave sleep period and awake quiescent period, auditory cortex exhibits fluctuations of global activity between "synchronized" states of larger low frequency waves known as up and down state (Steriade et al., 1993; Harris and Thiele, 2011). In active wakefulness during tone presentation, these fluctuations are replaced by the "desynchronized" state characterized by low amplitude, high frequency LFPs (Castro-Alamancos, 2004). It has been reported that superficial pyramidal cells and putative fast-spiking neurons in rat A1 cortex dominate in the awake quiescent period, and that their activity was largely suppressed during auditory stimuli-induced cortical desynchronization (Sakata and Harris, 2012). The firing of fast-spiking neurons in rat somatosensory cortex, which is highly active during quiet wakefulness, is also dramatically suppressed during active whisking behavior (Gentet et al., 2010). Considering that the majority of cell-types in which NMDAR elimination occurred in our mutant mice are PV-positive fast-spiking neurons, it is conceivable that the state-dependent elevation of spontaneous LFP power reflects the dysfunction of mutant A1 cortex fast-spiking neurons during the awake quiescent period. However, a recent study showed a dramatic increase in the activity of putative fast-spiking neurons in visual cortex during active running in a head-restrained condition, which may elicit a desynchronized state (Niell and Stryker, 2010). Further study is necessary to clarify the mechanisms of the state-dependent elevation of spontaneous LFP power observed in our mutant mice.

#### **COMPARISON TO CLINICAL DATA**

Overall, the present results were consistent with the clinical EEG data showing reductions in the onset of auditory evoked responses (P50, N100) and of 40-Hz ASSR power and phase-locking in the cortex of individuals with schizophrenia, supporting the face validity of our mouse model. Furthermore, our findings argue against the possibility that 40-Hz ASSR deficits in patients with schizophrenia may reflect antipsychotic effects (Woo et al., 2010). However, we also found several findings inconsistent with the human data. First, in human adult subjects 40-Hz click trains induce the maximal ASSR at 40 Hz and the effects of 40-Hz stimuli at 20 and 30 Hz are smaller compared to 40 Hz (Galambos et al., 1981; Pastor et al., 2002; Picton et al., 2003). Since the optimal input frequency of fast-spiking neurons for action potential generation is known to be 30–50 Hz in rats (Pike et al., 2000), the ASSR impairment selectively at 40 Hz stimulation may suggest unequivocal deficits of fast-spiking neurons in patients with schizophrenia. In our study, however, 40-Hz stimuli induced a resonance peak at 30 Hz in addition to the 40-Hz peak (**Figure 1D**) whereas the phase locking spectrum showed a peak only at 40 Hz (**Figure 1G**). This may suggest that the murine A1 cortex exhibits a broader resonance frequency (30 Hz as well as 40 Hz) than in humans; although no power peak at 30 Hz was detected when stimulated at 20 Hz (**Figure 2B**). Further study is warranted to determine whether the resonant frequency to auditory stimuli is varied depending on the species.

Second, 20-Hz ASSRs are usually unaffected in schizophrenia (Kwon et al., 1999; Light et al., 2006; Vierling-Claassen et al., 2008); however, some human ASSR studies also showed attenuation in 20-Hz ASSRs (Krishnan et al., 2009). In contrast, in our model 20-Hz ASSRs are reduced in power and phase locking (**Figure 2**). Nonetheless, the mutant ASSR peak at 20 Hz was still visible (**Figure 2B**) and attenuation of phase locking at 20 Hz was modest (**Figure 2E**), compared to the robustness of the 40-Hz ASSR deficits. Furthermore, the initial N1 responses triggered by 20-Hz ASSR stimuli were normal in the mutant mice. Therefore, the degree of evoked ASSR deficits appears to be more robust at 40 Hz than at 20 Hz in our mutant mice. It is conceivable that LFP recording directly from A1 cortex is more sensitive in detecting ASSR impairment than clinical skull-EEG recording.

Third, broadband enhancement of baseline EEG may not be characteristic of studies of resting EEG in patients with schizophrenia (Winterer et al., 2004; Kikuchi et al., 2011; Silverstein et al., 2012). However, Spencer (2012) re-analyzed their previous data which showed deficits in auditory evoked gamma oscillations and found that the pre-stimulus baseline gamma power was increased in the left auditory cortex of chronic patients. Interestingly, in his study, the baseline power increased across a wide frequency band (15–100 Hz) and this broadband increase was marginally significant, which is consistent with our finding.

Finally, this study involved relatively small sample sizes under per-animal analysis design (7 control and 6 mutant mice), which could be a confounding factor. However, nearly the same results were obtained by per-channel analysis (for example, **Figure 4** vs. **Figure 5**), which further supports our conclusion.

#### **CLINICAL MANIFESTATION OF BASELINE LFP POWER INCREASE**

Given that sensory-evoked gamma oscillation deficits are presumably linked to the cognitive deficits (Spencer et al., 2004; Cho et al., 2006), there are several possible clinical manifestations of the baseline power increase. Increased baseline gamma oscillations have been reported in patients during psychotic episodes, including visual and auditory hallucinations (Baldeweg et al., 1998; Ropohl et al., 2004; Lee et al., 2006; Becker et al., 2009). Other studies suggest a link between baseline gamma oscillations and negative symptoms (Suazo et al., 2012), working memory (Winterer et al., 2004; Suazo et al., 2012), or synaptic plasticity (Bikbaev et al., 2008; Kulikova et al., 2012). A recent meta-analysis of functional neuroimaging in schizophrenia patients with auditory hallucinations revealed "paradoxical" engagement of A1 cortex, such that left A1 cortex displayed increased activation in the absence of external auditory stimuli (but with auditory verbal hallucinations), and decreased activation when an external stimulus was actually present (Kompus et al., 2011). Consistent with this, our mutant mice also exhibited an increase in baseline spontaneous LFP power in the absence of external stimuli, which tended to decrease during repetitive ASSR stimuli and to return to the elevated level 20 min after the last ASSR stimuli. This remarkable similarity between human patient studies and the finding in the present study may suggest that the baseline LFP power increase is a signature of "paradoxical" A1 cortex activation in the absence of external stimuli. Further studies are warranted to assess the clinical and neurobiological significance of oscillations and synchrony deficits in schizophrenia.

#### **AUTHOR CONTRIBUTIONS**

Kazuhito Nakao conceived and designed the study, performed the experiments, assembled, analyzed and interpreted the data, and wrote the manuscript. Kazu Nakazawa conceived and designed the study, interpreted the data, and wrote the manuscript.

#### **ACKNOWLEDGMENTS**

This work was supported by an NIH grant K22MH099164 (Kazu Nakazawa) and by the NIH Intramural Research Program. We thank Stefan Kolata and Kentaroh Takagaki for their advice on an earlier version of the manuscript.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnins.2014.00168/abstract

#### **REFERENCES**


performance of EEG spectral entropy monitor during S-ketamine anesthesia. *Clin. Neurophysiol.* 117, 1660–1668. doi: 10.1016/j.clinph.2006.05.011


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 April 2014; accepted: 02 June 2014; published online: 01 July 2014.*

*Citation: Nakao K and Nakazawa K (2014) Brain state-dependent abnormal LFP activity in the auditory cortex of a schizophrenia mouse model. Front. Neurosci. 8:168. doi: 10.3389/fnins.2014.00168*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2014 Nakao and Nakazawa. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Hearing impairment in the P23H-1 retinal degeneration rat model

#### *Jorge V. Sotoca1,2, Juan C. Alvarado1, Verónica Fuentes-Santamaría1, Juan R. Martinez-Galan1 and Elena Caminos <sup>1</sup> \**

*<sup>1</sup> Department of Medical Sciences, School of Medicine and Institute for Research in Neurological Disabilities (IDINE), University of Castilla-La Mancha, Albacete, Spain*

*<sup>2</sup> Barn och Ungdomsmedicin, Eskilstuna, Sweden*

#### *Edited by:*

*Monica Munoz-Lopez, University of Castilla-La Mancha, Spain*

#### *Reviewed by:*

*Neil M. McLachlan, The University of Melbourne, Australia Avril Genene Holt, Wayne State University, USA*

#### *\*Correspondence:*

*Elena Caminos, Department of Medical Sciences, Faculty of Medicine, Institute for Research in Neurological Disabilities (IDINE), University of Castilla-La Mancha, C/Almansa 14, 02006 Albacete, Spain e-mail: elena.caminos@uclm.es*

The transgenic P23H line 1 (P23H-1) rat expresses a variant of rhodopsin with a mutation that leads to loss of visual function. This rat strain is an experimental model usually employed to study photoreceptor degeneration. Although the mutated protein should not interfere with other sensory functions, observing severe loss of auditory reflexes in response to natural sounds led us to study auditory brain response (ABR) recording. Animals were separated into different hearing levels following the response to natural stimuli (hand clapping and kissing sounds). Of all the analyzed animals, 25.9% presented auditory loss before 50 days of age (P50) and 45% were totally deaf by P200. ABR recordings showed that all the rats had a higher hearing threshold than the control Sprague-Dawley (SD) rats, which was also higher than any other rat strains. The integrity of the central and peripheral auditory pathway was analyzed by histology and immunocytochemistry. In the cochlear nucleus (CN), statistical differences were found between SD and P23H-1 rats in VGluT1 distribution, but none were found when labeling all the CN synapses with anti-Syntaxin. This finding suggests anatomical and/or molecular abnormalities in the auditory downstream pathway. The inner ear of the hypoacusic P23H-1 rats showed several anatomical defects, including loss and disruption of hair cells and spiral ganglion neurons. All these results can explain, at least in part, how hearing impairment can occur in a high percentage of P23H-1 rats. P23H-1 rats may be considered an experimental model with visual and auditory dysfunctions in future research.

**Keywords: ABR, cochlea, cochlear nucleus, hearing loss, immunocytochemistry, retinitis pigmentosa**

#### **INTRODUCTION**

The P23H mutant rhodopsin transgenic rat is an experimental model of retinal degeneration that exhibits gradual, fast photoreceptor loss with similar properties to human autosomal dominant retinitis pigmentosa (Berson et al., 1991; Lewin et al., 1998). Transgenic rats have a proline-23 to histidine (pro23His) *rhodopsin* mutation (LaVail et al., 2000; Aleman et al., 2001). Numerous studies have characterized the morphological, functional and molecular features of the retina of these animals in order to advance gene therapies, retina transplantation and alternative therapeutic approaches to slow photoreceptor degeneration (Machida et al., 2000; Aleman et al., 2001; Green et al., 2001; Zhang et al., 2003; Cuenca et al., 2004; Salzmann et al., 2006; García-Ayuso et al., 2010; Gorbatyuk et al., 2010; Kolomiets et al., 2010; Fernández-Sánchez et al., 2012; Jensen, 2012; Lu et al., 2013; Rahmani et al., 2013). There are three P23H mutant rat lines with different photoreceptor degeneration rates (http://www.ucsfeye.net/mlavailRDratmodels.shtml). The Line 1 animals have a higher level of transgene expression than the Line 3 animals and degenerate faster than the rats of Lines 2 and 3. Moreover, homozygous animals degenerate faster than heterozygotes. The retina of these animals has been studied in depth, but the rest of the central and peripheral nervous systems remains completely unexplored. In the present study, we evaluate the auditory capacity of the homozygous P23H line 1 (P23H-1) rats at different ages by using physiological and morphological techniques. Recent studies have found a direct relationship between rhodopsins and the auditory system (Shimano et al., 2013; Coleman et al., 2014). Rhodopsins virally targeted within auditory neurons of the dorsal cochlear nucleus had no detrimental effects on hearing and may be useful to modulate the activity of specific auditory neurons (Shimano et al., 2013).

The idea of testing the auditory system in P23H-1 rats emerged after observing a low response of these animals to natural stimuli (one single clap and kissing sounds). We decided to define the functional capabilities of specific components of the auditory system using auditory brain response (ABR) recordings. The components of the auditory pathway analyzed by ABR were those generally accepted in rodents: Cochlear Nerve (wave I); Cochlear Nuclei (wave II); Superior Olivary Complex (wave III); Lateral Lemniscus and/or Inferior Colliculus (wave IV); and Inferior Colliculus and/or Geniculate Body (wave V) (Simpson et al., 1985; Chen and Chen, 1991; Alvarado et al., 2012).

The ABR results prompted us to evaluate whether physiological auditory alterations are also reflected in the morphological features of the central and peripheral auditory pathways. The cochlear nucleus (CN) of the auditory brainstem, the first relay station of the central auditory pathway, is a useful model system to achieve these goals (Benson et al., 1997). In particular, we examined the anteroventral subdivision of the CN (anteroventral CN, AVCN), where the cell bodies of the spherical and globular/bushy cells are localized. Spherical bushy cells receive direct input from the auditory nerve, called the endbulb of Held. One or two endbulbs reach the cell body of a spherical bushy cell which, in turn, transmits high-fidelity temporal information to structures in the superior olivary complex (Malmierca, 2003; Ryugo and Parks, 2003). Cochleae of P23H-1 rats were also analyzed. The organ of Corti, the modiolus and the stria vascularis are the three main cochlear structures. The organ of Corti contains inner and outer hair cells that amplify the acoustic signal and transform the mechanical signal into an electrical one. The modiolus contains bipolar spiral ganglion neurons which connect hair cells with the cochlear nucleus neurons in the brainstem through the auditory nerve. Finally, the stria vascularis contains epithelial and endothelial cells that maintain the ion composition of the endolymph (Raphael and Altschuler, 2003). These three cochlear regions are involved in hearing loss due to noise, age, ototoxic substances or inherited deafness (Li et al., 2013; Alvarado et al., 2014; El-Amraoui and Petit, 2014; Sun et al., 2014).

The main goal of the present study was to evaluate the hearing capabilities of mutant rhodopsin transgenic rats P23H-1. Accordingly, we hypothesize that severe loss of auditory reflexes in response to natural sounds in P23H-1 rats might be the result of structural and functional alterations along peripheral and central pathways. To investigate this possibility, the integrity of the inner ear and the cochlear nucleus, the first relay station of the auditory pathway, was investigated in P23H-1 rats in comparison to Sprague-Dawley animals. Our results provide evidence that the P23H-1 homozygous transgenic rat represents an excellent animal model to study visual and auditory dysfunctions. While P23H rats are widely used to study photoreceptor degeneration, in the auditory system, they could be used to understand the specific function of neurons in hearing because it is known that rhodopsin can mediate neuronal activity in auditory neurons (Shimano et al., 2013).

## **MATERIALS AND METHODS**

#### **ANIMALS**

Transgenic P23H-1 homozygous albino rats for breeding were kindly provided by Dr. Matthew LaVail (UCSF School of Medicine, Beckman Vision Center, San Francisco, CA, USA), and were bred in a colony at the University of Castilla-La Mancha. Sprague-Dawley rats (SD; Charles River Laboratories, Barcelona, Spain) were used as wild-type controls. All the animals were housed and handled according to the authorization and supervision of the Animal House Facility of the University of Castilla-La Mancha, after receiving approval from the Ethics Committee for Experimental Animal Welfare of the University of Castilla-La Mancha (CEEA). These studies were conducted in accordance with the guidelines of the European Council (Directive 2010/63/EU) and current Spanish regulations (R.D. 1201/2005 and P.L. 121/000123/2007) on the use and care of animals in research.

#### **AUDITORY TESTING**

Auditory stimuli were presented in a normal laboratory environment in the animal house. Rats were stimulated with two natural sounds: single clap (SC) and kissing sounds (KS).

Singly housed animals were tested while still in their cages with the lid removed and were exposed to one SC. Then, the rats were exposed to two short KS. Approximately 2 min were spent evaluating each animal once per week. Animals were scored in the ears depending on the observed response.

In order to generate a standard test battery, these sounds were recorded with a sound level meter, model VOLTCRAFT SL-200 (Ventus, Madrid, Spain). Stimuli were always produced by the same person and in the same room with no other sound disturbance. SD and P23H rats lived and were stimulated under the same conditions. The intensity of an SC fell within the 85–95 dB range, with a frequency range between 1 and 2.5 kHz, while that of the KS ranged between 60 and 65 dB, with a frequency range between 1.1 and 7.2 kHz. The response to these sounds was evaluated in 54 P23H-1 rats (from P20 to P200, or older) and in two SD rats (P80 and P112). All the P23H-1 rats came from one breeding pair of homozygous P23H-1 rats and from crossings of first generation members. The auditory response was assessed by observing the whole body startle response and Preyer's reflex (Jero et al., 2001). Animals responded to the KS by displaying a rapid movement of the whole body and then remained motionless with their head up for a split second ("alert movement") before they continued with their exploratory movements. The response to the SC was a rapid movement of the whole body, but the animal did not remain still and instead continued making exploratory movements, with its head down and smelling the box. Each animal was ranked based on its response to these gross auditory assessments at three auditory capacity levels. Level I: rats responded to both the SC and KS with fast reflexes (considered to be normal audition); Level II: rats responded only to the KS by displaying slow alert movements; Level III: animals did not respond to either stimulus.
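The three-level ranking described above can be written down as a simple decision rule. The following Python sketch is only an illustration of that rule, not the scoring procedure actually used in the study; the names `HearingLevel` and `classify_hearing_level` are hypothetical, and the treatment of a clap-only response (which the ranking scheme does not define) as an error is an assumption.

```python
from enum import Enum


class HearingLevel(Enum):
    """Auditory capacity levels as defined in the gross auditory assessment."""
    LEVEL_I = "I"      # response to both the single clap (SC) and kissing sounds (KS)
    LEVEL_II = "II"    # response only to the KS
    LEVEL_III = "III"  # no response to either stimulus


def classify_hearing_level(responds_to_sc: bool, responds_to_ks: bool) -> HearingLevel:
    """Map the observed responses to the SC and KS onto Levels I-III."""
    if responds_to_sc and responds_to_ks:
        return HearingLevel.LEVEL_I
    if responds_to_ks:
        return HearingLevel.LEVEL_II
    if not responds_to_sc:
        return HearingLevel.LEVEL_III
    # Assumption: a response to the SC alone is not covered by the scheme,
    # so it is flagged rather than silently assigned to a level.
    raise ValueError("Response to SC only is not defined in the ranking scheme")


if __name__ == "__main__":
    print(classify_hearing_level(True, True).value)
    print(classify_hearing_level(False, True).value)
    print(classify_hearing_level(False, False).value)
```

Raising an error for the undefined combination keeps such observations visible for manual review instead of forcing them into one of the three levels.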

#### **AUDITORY BRAINSTEM RESPONSE (ABR)**

To apply more objective measures to define the functional capabilities of the P23H rat auditory system, ABR measurements were analyzed in 17 rats after the first assessment of the auditory response to the above-described natural sounds. Fifteen P23H-1 transgenic rats were selected which presented different levels of audition according to the auditory test results (Levels I–III) and at several postnatal ages (P20; P52; P56; P80; P100; P133; P180). Two control SD rats (aged P80 and P112), classified as Level I during the gross hearing assessment, were also used.

The ABRs were performed as previously described (Alvarado et al., 2012; Lamas et al., 2013). Briefly, animals were placed in a sound-attenuating, electrically shielded booth (EYMASA/INCOTRON S.L., Barcelona, Spain) located inside a sound-attenuating room. Anesthesia was induced (4%) and maintained (1.5–2%) with isoflurane (1 L/min O2 flow rate). Subdermal electrodes (Rochester Electro-Medical, Tampa, FL, USA) were placed at the vertex (non-inverting) and under the right (inverting) and left (ground) ears. Acoustic stimulation and recordings were performed with a Tucker-Davis (TDT) BioSig System III (Tucker-Davis Technologies, Alachua, FL, USA). Stimuli consisted of tone bursts (5 ms rise/fall time, no plateau, cos² envelope, at 20/s) at seven different frequencies (0.5, 1, 2, 4, 8, 16, and 32 kHz), which were generated digitally with the SigGenRP software (Tucker-Davis Technologies) and the RX6 Piranha Multifunction Processor hardware (Tucker-Davis Technologies). Stimuli were delivered into the external auditory meatus of the right ear using an EDC1 electrostatic speaker driver (Tucker-Davis Technologies) through an EC-1 electrostatic speaker (Tucker-Davis Technologies). Throughout the procedure, rat temperature was monitored by a rectal probe and was maintained at 37.5 ± 1 °C with a non-electrical heating pad. Prior to the experiments, stimuli were calibrated with the SigCal software (Tucker-Davis Technologies) and an ER-10B+ low-noise microphone system (Etymotic Research Inc., Elk Grove Village, IL, USA). The evoked potentials were filtered (0.3–3.0 kHz), averaged (500 waveforms) and stored for offline analyses.
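To make the stimulus shape concrete, a tone burst with a 5 ms rise, a 5 ms fall, no plateau and a cos² envelope can be sketched in a few lines of Python. This is an illustrative reconstruction from the parameters stated above and not the SigGenRP implementation; the function names and the 100 kHz sampling rate are assumptions introduced only for the example.

```python
import math


def cos2_envelope(n_samples: int) -> list[float]:
    """Squared-cosine envelope with no plateau.

    It rises from 0 to 1 over the first half and falls back to 0 over the
    second half; sin^2 is used because it equals a time-shifted cos^2.
    """
    return [math.sin(math.pi * i / n_samples) ** 2 for i in range(n_samples)]


def tone_burst(freq_hz: float,
               rise_fall_s: float = 0.005,
               sample_rate_hz: int = 100_000) -> list[float]:
    """Tone burst of duration 2 * rise_fall_s shaped by a cos^2 envelope.

    sample_rate_hz is a hypothetical value chosen for this sketch.
    """
    n_samples = 2 * round(rise_fall_s * sample_rate_hz)
    envelope = cos2_envelope(n_samples)
    return [
        envelope[i] * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
        for i in range(n_samples)
    ]


if __name__ == "__main__":
    # The seven test frequencies listed in the text, in Hz.
    for f in (500, 1000, 2000, 4000, 8000, 16000, 32000):
        burst = tone_burst(f)
        print(f, len(burst), round(max(abs(s) for s in burst), 3))
```

With these assumed settings each burst lasts 10 ms, so a presentation rate of 20/s leaves 40 ms of silence between successive bursts.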

In order to determine the auditory threshold level, evoked responses were recorded in 5 dB steps, which descended from a maximum stimulus intensity of 80 dB SPL. The auditory threshold was defined as the stimulus intensity that evoked waveforms with a peak-to-peak voltage higher than 2 standard deviations of background activity (measured before stimulus onset).

#### **IMMUNOFLUORESCENCE PROCEDURE IN THE COCHLEAR NUCLEUS**

The integrity of the auditory synapses was analyzed by immunocytochemistry in the first relay station of the central auditory pathways, the AVCN, where output and input from ascending and descending pathways are present. At the end of the recording sessions, six animals (2 SD and 4 P23H rats) were anesthetized with ketamine (100 mg/kg, Parke-Davis, Alcobendas, Spain) and 2% xylazine (10 mg/kg, Dibapa, Barcelona, Spain) and were transcardially perfused with 0.9% saline and 2% paraformaldehyde in 0.1 M phosphate buffer (PB), pH 7.3. Brains were removed, postfixed for 4 h in the same fixative, washed in 0.1 M PB, transferred into PB containing 30% sucrose, and embedded in Tissue Tek (Leica, Wetzlar, Germany). Serial sections were obtained at 16 µm with a cryostat (Leica), mounted onto Super Frost slides (Kindler, Freiburg, Germany) and used for immunocytochemistry.

Single immunofluorescence procedures were performed as previously described (Caminos et al., 2007) to test the presence of VGluT1 and Syntaxin in the AVCN synapses of the control rats (SD, *n* = 2, P80 and P112), P23H rats with ABR waves (Level I, *n* = 2, P52 and P80), and P23H rats with profound deafness (Level III, *n* = 2, P52 and P100). Cryosections were washed in phosphate-buffered saline, pH 7.3 (PBS), containing 0.25% Triton X-100 (PBST), and were pre-incubated for 1 h at room temperature (RT) with blocking solution containing PBST and 1% BSA (Fraction V, Sigma-Aldrich, Steinheim, Germany). Sections were then incubated with primary antibodies to recognize type I vesicular glutamate transporter (mouse anti-VGluT1 monoclonal antibody, 1:500, clone N28/9, NeuroMab, Davis, CA, USA) and Syntaxin (mouse anti-Syntaxin monoclonal antibody, 1:1000, clone HPC-1, Sigma, Germany). Antibodies were diluted in PBST-BSA and incubated overnight at RT in a humid chamber, after which sections were washed with PBST. Immunoreactivity was visualized using a secondary anti-mouse antibody coupled to Cy5 (1:200; Jackson ImmunoResearch, West Grove, PA, USA) for 1 h at RT. Finally, brain sections were washed in PBST, air-dried in the dark and mounted with Duolink *In Situ* Mounting Medium with DAPI (Olink Bioscience, Uppsala, Sweden). Immunofluorescence sections were examined under a Zeiss LSM 710 laser scanning confocal microscope (Zeiss, Germany). Images were analyzed using the ZEN 2009 Light Edition software (Zeiss) and the Image-J software (Rasband, W.S. National Institutes of Health, Bethesda, MD, USA, http://rsb.info.nih.gov/ij/). The immunolabeling controls included (1) incubation with a primary antibody followed by a conjugated secondary antibody which did not recognize the host species in which the corresponding primary antibody was obtained; (2) omission of the primary antibody. Neither control showed specific labeling.

For the statistical analysis, digital Z-axis image stacks of the VCN were acquired from each brain section (14–16 sections/animal) under the confocal microscope using a Plan-Apochromat 20x/0.8 M27 objective. Image size was 424.68 × 424.68 µm and resolution was 1024 × 1024 pixels (8-bit). All the confocal settings were kept the same across all the analyzed images. From each stack, the image with the greatest red-fluorescence intensity was automatically selected by the confocal software. Each confocal microscope image was separated into two images: one with red immunostaining and another with blue nuclei stained with DAPI. These selected images were used for the quantitative assessment of immunoreactive profiles for VGluT1 or Syntaxin with the Image-J software. In a first step, the scale bar was calibrated at a known distance to select the size of the immunofluorescence particles to be counted. Then each image was converted into an 8-bit grayscale image and was then transformed into a "binary image." The red-fluorescence particles with a minimum size of 5 or 20 µm were counted. In addition, the nuclei or the blue-fluorescence particles with a minimum size of 5 µm were counted to determine the total number of cells. The statistical analysis was performed on the ratio of fluorescent particles to the number of nuclei per section. Data are presented as the mean ± standard deviation. Statistical comparisons were made with a Kruskal-Wallis test followed by pair-wise comparisons using the Mann-Whitney *U*-test.
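The particle count and the particles-per-nucleus ratio described above can be illustrated with a small, dependency-free Python sketch. It is not the Image-J procedure used in the study: the flood-fill labeling, the 8-connectivity, and the expression of the minimum particle size in pixels are all assumptions made for the example, and the function names are hypothetical. The only value taken from the text is the pixel calibration implied by a 424.68 µm field imaged at 1024 pixels.

```python
from collections import deque

# Pixel calibration implied by the stated image size and resolution.
UM_PER_PIXEL = 424.68 / 1024


def count_particles(binary: list[list[int]], min_pixels: int) -> int:
    """Count connected foreground particles with at least min_pixels pixels.

    Assumption: pixels touching by an edge or a corner (8-connectivity)
    belong to the same particle.
    """
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one particle, tracking its size.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if size >= min_pixels:
                count += 1
    return count


def particles_per_nucleus(n_particles: int, n_nuclei: int) -> float:
    """Ratio of immunofluorescent particles to DAPI-stained nuclei."""
    if n_nuclei == 0:
        raise ValueError("No nuclei counted in this section")
    return n_particles / n_nuclei


if __name__ == "__main__":
    image = [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 1, 0, 0],
    ]
    print(count_particles(image, min_pixels=2))
    print(round(UM_PER_PIXEL, 4))
```

Normalizing the particle count by the number of nuclei in the same section makes sections with different cell densities comparable before the non-parametric tests are applied.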

#### **COCHLEAR HISTOLOGY AND IMMUNOCYTOCHEMISTRY**

Animals were anesthetized and perfused as indicated above. Cochleae were quickly removed, postfixed in 4% paraformaldehyde for 2 h and decalcified in 10% ethylenediamine tetraacetic acid (EDTA; pH 6.5) solution for 10 days. The left side was used for cochlear sensory epithelia surface preparations and the right side for myosin VIIa immunocytochemistry and Nissl staining.

#### *Whole mount immunocytochemistry*

Each turn of the organ of Corti was detached from the modiolus and the sensory epithelium was incubated overnight with rabbit anti-myosin VIIa primary antibody (1:100; Proteus Biosciences Inc., Ramona, CA, USA). The following day, the tissue was incubated for 2 h at RT in an anti-rabbit Alexa Fluor 594 secondary antibody and Alexa Fluor 488-conjugated Phalloidin (Invitrogen-Molecular Probes, Carlsbad, CA, USA) and was mounted with DAPI nuclear staining.

#### *Myosin VIIa immunocytochemistry*

Cochleae were embedded in 10% gelatin, oriented in such a way that the modiolus was parallel to the base, and frozen at −70 °C by immersion in a solution of 2-propanol. Cochleae were sectioned at 20 µm on a cryostat and mounted onto Super Frost slides. To detect myosin, sections were rinsed in PBS containing 0.3% Triton X-100 (Tx) and blocked for 1 h in PBS-Tx (0.2%) containing 10% normal goat serum (NHS). The first series of sections was incubated overnight at 4 °C with myosin VIIa antibody in a solution containing PBS-Tx (0.2%), pH 7.4. Then, sections were washed 4 × 15 min in PBS-Tx (0.2%), and incubated for 2 h in biotinylated anti-rabbit secondary antibody (1:200; Vector Laboratories, Burlingame, CA, USA) and for 1 h in the avidin-biotin-peroxidase complex solution (ABC, Vector Laboratories). Finally, sections were mounted onto gelatin-coated slides and coverslipped using Cytoseal (Richard-Allan Scientific, Kalamazoo, MI, USA). The second series of sections was stained with cresyl violet.

## **RESULTS**

#### **RESPONSE TO GROSS AUDITORY ASSESSMENT**

The first aim of this study was to determine whether the response of P23H rats to natural auditory stimuli differed from that of the control SD rats. While all the SD rats responded to the SC and KS, not all the P23H rats responded to these stimuli. We classified animals into three levels based on the observed response to this gross auditory assessment: Level I, a normal hearing level, where rats responded to both the SC and KS stimuli; Level II, an intermediate hearing level, where rats responded only to the KS stimulus, and never to the SC one (a lower frequency and higher intensity noise than the KS). Finally, Level III included animals with profound deafness, which meant that they did not respond to any sound.

We analyzed the response to natural stimuli in all the offspring of five breeding pairs of P23H-1 rats: (1) one breeding pair with a deaf male and a deaf female at age P100 (both Level III); (2 and 3) two breeding pairs of a normal male and a normal female at age P100 (Level I); (4) one male and female pair with Level II at P100; (5) one pair with a Level I male and a Level II female at P100. All the pairs, except (3), had offspring with auditory responses at Levels I–III, regardless of their parents' hearing status. All the litters of pairs (1), (2), and (5) were totally viable and all the offspring survived. Nevertheless, only one litter of pair (4) survived, while no offspring of pair (3) survived. Thus, we cannot state that the survival of these P23H-1 rats depended on their parents' auditory capacity.

The auditory test was applied to all the descendants from P20 onward and was repeated several times during their life until the animal was sacrificed (at P20–P60, P90–P110, P160–P200 and older). We found that most rats responded to both the KS and SC (Level I), some responded to only one stimulus and displayed slow alert movements (Level II), while others did not respond to any sound (Level III), even at P20 (**Table 1**). As these responses changed during the animals' life, we can state that auditory capacity diminished with age. **Table 1** includes the percentage of rats at the different auditory levels based on their response to natural stimuli. These data reflect progressive auditory capacity loss during the life of P23H-1 rats. Progression to deafness was evident from P100 onward, because 55% of normal rats (Level I) at P20 belonged to Level II at P100, and 45% of the oldest animals were profoundly deaf (Level III) at the age of P200.

Of all the young (P20–P60) rats, 25.9% presented moderate hearing loss and 11.5% belonged to the profound deafness group (**Table 1**). These numbers increased with age, mainly from P100, with 45% of the older adult rats not responding to either of the sounds tested. The results of this hearing test were corroborated when some animals were subjected to ABR recordings.

#### **AUDITORY BRAINSTEM RESPONSE (ABR)**

ABRs were recorded in 15 selected P23H-1 rats of different ages (P20; P52; P56; P80; P100; P133; P180) which corresponded to the three audition levels based on their responses to natural stimuli (see above), and in two control SD rats (P80 and P112). The P23H recordings were always compared with the ABR recordings of the control SD rats, which had the lowest auditory thresholds and were considered to have normal audition (Burkard et al., 1990; Newton et al., 1992), as seen in **Figure 1**. Level I P23H-1 rats had the lowest auditory thresholds among the P23H-1 rats, although these were always higher than those of the control rats. Level II P23H rats could hear low frequencies, but were totally deaf at high frequencies. P23H rats with audition Level III presented profound deafness at all frequencies (**Figure 1**).

Consistent with previous studies in rodents, the ABR of the SD rats displayed the five typical evoked waves at the different frequencies evaluated (**Figure 2A**), with wave II being the largest of all the waves comprising the ABR recordings (Overbeck and Church, 1992; Church et al., 2010, 2012; Alvarado et al., 2012, 2014). Although morphology in Level I P23H rats was similar to that observed in SD rats at the low and intermediate frequencies, the amplitude of all waves diminished at higher frequencies

**Table 1 | Percentage of the P23H-1 rats classified into the three hearing levels based on their response to natural sounds at different ages.**


*Level I, considered "normal hearing" (response to the KS and SC sounds); Level II, a response only to the KS; Level III, considered deaf (no response to either stimulus); n, number of animals analyzed.*

**FIGURE 1 | Line graph showing the relationship between auditory thresholds and the frequency of the stimuli evaluated in the control SD rats (*n* = 2) and P23H rats.** P23H rats were distributed according to their responses to natural stimuli into: Level I (*n* = 2), Level II (*n* = 3), and Level III (*n* = 10). The auditory threshold in P23H-1 rats was always higher than in SD rats. Data are expressed as mean ± standard deviation.

(**Figure 2B**). At Level II, the amplitude of the waves also diminished across the frequencies in P23H rats, and this reduction became more evident at the higher ones (**Figure 2C**). Finally, in Level III P23H rats, which did not respond to either gross auditory stimulus, the evoked waves were totally absent at all the frequencies evaluated (**Figure 2D**). These results corroborate those obtained by applying the gross auditory assessment and show limitations in the normal auditory function of P23H rats.

#### **VGluT1 AND SYNTAXIN IMMUNOFLUORESCENCE IN THE VCN**

VGluT1 immunostaining under confocal microscopy revealed dense punctate immunostaining of the glutamatergic presynaptic terminals in the AVCN of both SD and P23H-1 transgenic rats (**Figures 3A,B**). A visual assessment of VGluT1 labeling showed an apparent difference in the distribution of VGluT1 between SD and P23H rats. The endbulbs of Held in SD rats were completely labeled and surrounded the bushy cell soma, whereas only punctate VGluT1 immunolabeling was found in the endbulbs of the deaf P23H-1 rats. Accordingly, a statistical analysis was performed on the binary images of the VCN of the control SD rats (**Figures 3A′,A″**), the P23H rats with normal hearing and the deaf P23H rats (**Figures 3B′,B″**). Significant differences were found in the number of immunofluorescence particles larger than 5 µm between the SD and deaf P23H rats (*p* < 0.001) (**Figure 3C**). Differences were also observed between the SD and P23H rats with normal hearing (*p* < 0.1), and between the deaf and normal P23H rats (*p* < 0.2), but at lower levels of significance (**Figure 3C**). When the fluorescent particle size was bigger than 20 µm (**Figure 3D**), differences were detected not only between the SD and deaf P23H rats (*p* < 0.001), but also between the SD and P23H-1 rats with normal hearing (*p* < 0.05) and between the deaf P23H and normal P23H rats (*p* < 0.1). The number of cell nuclei (neurons and glial cells) labeled with DAPI in the AVCN was similar in the control SD rats, the P23H rats with normal hearing and the profoundly deaf P23H rats, and no statistically significant differences were seen. It is common not to find remarkable morphological changes in the CN if we consider that when the auditory system approaches maturity

rat at P52 that responded to both natural stimuli (kissing sounds and one single clap; Level I). **(C)** A P23H rat at P80 that responded only to the kissing sound as a natural stimulus (Level II). Note that the waves are evident at low and intermediate frequencies in both examples **(B,C)**, but these waves lost their morphology at higher frequencies (4–8 kHz). **(D)** A P23H rat at P180, when the ABR totally lost the waveform, even at the lowest frequencies. Dashed lines indicate stimulus onset.

after hearing onset, and even in the adult brain, little or no neuronal loss occurs following cochlear injury (Mostafapour et al., 2000; Hildebrandt et al., 2011).

Syntaxin immunolabeling distribution was also analyzed in the AVCN of the control SD and P23H-1 rats with normal hearing (Level I) and in the deaf (Level III) ones (**Figures 4A,B**). A statistical analysis was done with the binary images as in the VGluT1 analysis. No statistical differences were found in the number of syntaxin immunoreactive particles in any comparison made (**Figure 4C**).

#### **COCHLEAR ABNORMALITIES IN P23H-1 RATS**

To determine whether the functional deficit observed in the ABR recordings of P23H-1 rats was accompanied by a cochlear pathology, we first evaluated the integrity of hair cells in these animals (**Figures 5A–F**). The results demonstrate loss (yellow asterisks in **Figures 5B–D**) and disruption (arrows in **Figure 5E**) of the outer hair cells in the hypoacusic P23H-1 rats when compared to the control animals (P23H-1 rats with normal ABR recordings) (**Figure 5A**). Although there was no apparent loss of inner hair cells in the hypoacusic animals, these cells were disorganized

**FIGURE 3 | VGluT1 immunolabeling in the coronal sections of the ventral cochlear nucleus of the control SD and P23H-1 rats.** Stained endbulbs of Held appear in red around cell bodies (arrows) in SD rats **(A)** and in deaf P23H-1 rats **(B)**. **(A′,A″,B′,B″)** Rendering of the labeling for the binary image analysis obtained with the ImageJ software, with the labeled subcellular particles counted by size [from 5 µm to infinity in **(A′,B′)**; and from 20 µm to infinity in **(A″,B″)**]. **(C,D)** Comparison of the VGluT1 positive particles in the ventral cochlear nucleus of P23H rats (deaf and with normal hearing) with the control group (SD). Statistical differences were found when the counted particles were larger than 5 µm **(C)**, and significant differences also appeared when the immunolabeled particles were larger than 20 µm **(D)**. Data are expressed as means ± standard deviations, and the levels of significance are indicated as ∗∗∗*p* ≤ 0.001; ∗∗*p* ≤ 0.05; ∗*p* < 0.1 or 0.2. Scale bar: 50 µm.

and had shortened cell bodies (asterisks in **Figures 5D,F**). The cytoarchitecture of the spiral ganglion cells was also investigated and revealed that these cells were larger and fewer in number (arrows in **Figures 5G,H**). Along with these changes, myosin VIIa expression was investigated in the cochlea of P23H-1 rats to determine possible alterations in the function of this protein. Myosin VIIa immunocytochemistry revealed that myosin VIIa levels were low in the inner and outer hair cells of hypoacusic rats (**Figures 6A,B**) and also in other cochlear structures including the spiral limbus (**Figures 6C,E**) and the stria vascularis (**Figures 6D,F**), when compared to the control animals (**Figures 6A,C,D**). Additional anatomical defects in the inner ear of P23H-1 rats included a physical attachment of the tectorial membrane to the sensory epithelium (**Figure 6B**), as has been observed in animal models of sensorineural deafness (Camarero et al., 2001).

## **DISCUSSION**

All the assays in this study confirm that P23H-1 rats have a significant hearing deficit. It is noteworthy that the auditory threshold in all P23H-1 rats was higher than that in the control SD rats and in any other rat strain generally used in animal research. Our observations are supported by other studies conducted with ATP8A2 transgenic mice. Interestingly, the rhodopsin content in these mice is decreased, the ABR threshold is higher and the spiral ganglion cells appear to be degenerated (Coleman et al., 2014).

#### **PHYSIOLOGICAL FINDINGS**

The transgenic rhodopsin P23H mutant rat is an experimental retinal degeneration model that is widely used to study retinitis pigmentosa. These rats are on the albino Sprague-Dawley (SD) background, and thus SD rats are employed as wild-type controls. Homozygous P23H-1 rats are mainly used to provide early information about retinal degeneration. Here we evaluate the auditory capacity of this strain. Our preliminary observations detected a considerable number of P23H-1 rats with an impaired response to gross auditory stimuli such as the KS and SC. In some cases, such deficiencies appeared at an early age and progressed during the rat's lifetime to reach profound deafness. The first auditory test was done at P20, when the central auditory system is considered mature. In other cases, auditory

**FIGURE 6 | Myosin immunostaining in the inner ear of P23H-1 rats.** Myosin immunostaining was reduced in the organ of Corti **(B)**, spiral limbus **(E)** and stria vascularis **(F)** of the hypoacusic P23H-1 rats in comparison to P23H-1 rats with normal ABR recordings **(A,C,D)**. IHC, inner hair cells; OHC, outer hair cells. Scale bars = 50 µm in **(A,B)**; 25 µm in **(C–F)**.

alterations began later, at around P100, and some animals were totally deaf at P200. In order to organize the hearing test results, we divided offspring into three audition levels based on their responses to natural stimuli: Level I corresponds to P23H-1 rats, which responded to the KS and SC stimuli; Level II corresponds to those animals which responded only to the KS; Level III corresponds to the animals that responded to neither the KS nor the SC. With these preliminary results, we were able to predict progressive hearing loss in some P23H-1 rats.

The auditory test analysis was corroborated by the ABR recordings. The ABR is a commonly used technique to study auditory function under either normal or pathological conditions. In all the P23H rats evaluated, the auditory thresholds were always higher than those observed in SD rats, even when P23H rats responded to both KS and SC sounds (Level I). The amplitude of all the waves also diminished, and this reduction became more evident as the hearing Level increased; thus, in the P23H-1 rats at Level III, the obtained ABR recordings were totally flat. Since it is generally accepted that waves I–V correspond to activity from the cochlea to the IC (Overbeck and Church, 1992; Church et al., 2010, 2012; Alvarado et al., 2012), the fact that the amplitude of all waves diminished in P23H rats suggests diverse anatomical abnormalities along the auditory pathway in this rat model. The alterations noted in the organ of Corti, the spiral ganglion and the CN could explain the reduction noted in the amplitude of waves I and II in P23H-1 rats. Based on the high auditory thresholds and the differences in the ABR waves, we suggest that P23H-1 rats have altered auditory functional capabilities compared to other rat strains, like SD, Long-Evans, Wistar or Fischer rats (Burkard et al., 1990; Newton et al., 1992; Overbeck and Church, 1992; Polak et al., 2004). Similarly, the retinal ganglion cells of the degenerated retina of P23H-1 rats have higher thresholds than those of normal retinas (Jensen and Rizzo, 2011; Jensen, 2012).

Progression of deafness with age in the homozygous P23H-1 rats resembled the progression of photoreceptor degeneration (Machida et al., 2000; Cuenca et al., 2004; http://www.ucsfeye.net/mlavailRDratmodels.shtml; our own studies). Photoreceptor cell loss begins at around P20. At P90, there are 2–3 layers of photoreceptors, and between P180 and P200, almost all the photoreceptors have disappeared and the rat is blind. In our study, hearing loss was also progressive in most rats, with more rapid loss from P100, and some animals became completely deaf by P200. In general, all mammals present age-associated hearing loss. Nonetheless, a rat aged P180 is considered too young to show such auditory disabilities under normal conditions when compared with other rat strains and other murine models (Borg, 1982; Schweitzer, 1987; Burkard et al., 1990; Cooper et al., 1990; Chen and Chen, 1991; Newton et al., 1992; Overbeck and Church, 1992; Church and Kaltenbach, 1993; Polak et al., 2004; Alvarado et al., 2012). Current optogenetic studies support the possibility that hearing impairment in P23H-1 rats is a direct effect of the rhodopsin mutation. Rhodopsin-mediated neuronal activity has been demonstrated in the dorsal cochlear nucleus neurons of mice transfected with opsins by means of adeno-associated viral vectors (Shimano et al., 2013). In these mice, optical stimulation *in vivo* resulted in neuronal activity in the dorsal cochlear nucleus. Thus, the incorporation of rhodopsin into auditory neurons can be used as a tracer of auditory pathways to understand the specific function of neurons in hearing (Shimano et al., 2013). It is important to note that not all the P23H-1 rats in our study developed deafness, even if their hearing thresholds were higher than those in SD rats. In contrast, some were deaf at P20. Given the technical limitations of the ABR equipment, it is not possible to know whether these last animals were born deaf.

#### **INTEGRITY OF THE COCHLEA AND THE COCHLEAR NUCLEUS SYNAPSES**

Initially, we considered that auditory alterations in P23H-1 rats could be of both a peripheral and central origin. We evaluated both possibilities by running an immunocytochemistry analysis of two proteins, which are widely distributed in the synaptic endings in the CN. Syntaxin is a protein located in all the synapses of both the upstream and downstream pathways reaching the CN, while VGluT1 is restricted to excitatory terminals on CN neurons (Garcia-Pino et al., 2010; Fyk-Kolodziej et al., 2011), mainly in the endbulbs of Held in the AVCN. In this study, we found no significant differences in the immunodistribution of syntaxin between the control SD and P23H-1 rats. However, the statistical analysis of anti-VGluT1 immunolabeling showed significant differences between SD rats and P23H rats, whether deaf or not. In support of these observations, VGluT1 expression was not detected in large auditory nerve terminals of VCN neurons 3 days after cochlear injury in the rat (Fyk-Kolodziej et al., 2011). Our results suggest that the anatomical and/or molecular organization of the endbulb of Held in P23H-1 rats could differ from that of other rats with no hearing loss and, consequently, that the anatomical origin of hearing loss can lie in the peripheral auditory pathway. Our results demonstrate that the hypoacusic P23H-1 rat displays multiple alterations in the inner ear, including loss and disruption of hair cells and spiral ganglion neurons, when compared with rats with normal ABR recordings. These morphological anomalies, along with the functional alterations described herein, namely an increase in auditory thresholds and a reduction in the amplitude of all the waves, support the notion that these animals suffer from auditory dysfunction which results in affected neurotransmission in the cochlear nucleus.

The P23H-1 rat experimental model is very important to study retinal degeneration and can imply progressive loss of hearing function, as we show herein. We cannot assume a compensatory mechanism to visual impairment which promotes sense of hearing, unlike what happens in mice and cats with visual deprivation (Rauschecker et al., 1992; Rauschecker and Korte, 1993). This work opens up other fields of study into pathologies in which both the visual and auditory systems are implicated.

#### **SURVIVAL CONSIDERATIONS**

P23H mutant rats showed no breeding difficulties in the animal house, following the guidelines provided by Dr. M. LaVail. However, there were some cases of cannibalism of pups by the breeding pair. Deafness can be a barrier for parenting, as has been demonstrated in some strains of mutant mouse models. For example, mice with a homozygous mutation in cadherin 23 (C57BL/6J-Cdh23v-2J/J) are deaf from birth, have no maternal instinct and their pups usually die (information from The Jackson Laboratory, and our own observations). For this reason, we wondered whether there was a relation between the survival of pups and their parents' auditory capacity. In this study, we found that parents with normal hearing can have offspring with impaired hearing, and that totally deaf parents can bear offspring with normal hearing. Therefore, using breeding pairs with normal hearing does not guarantee that all offspring will have normal hearing. Although the success of offspring survival does not depend on parents being deaf, it was commonplace to find old breeding pairs with profound deafness when cases of cannibalism occurred.

## **CONCLUSIONS**

In the present report, we demonstrate that the transgenic P23H-1 rats employed to study retinal degeneration also have hearing deficiencies, which may appear at an early age and can progress over the lifetime to profound deafness. P23H-1 rats have a higher hearing threshold than the control SD rats and than other rat strains commonly used in research. Rhodopsin deficiency can be the molecular origin of auditory impairment in P23H-1 rats. Morphologically, the Organ of Corti and the spiral ganglion are the main affected structures. Consequently, these anatomical defects would result in altered excitatory inputs to the CN, which is reflected in the ABR recordings as abnormal auditory thresholds and reduced wave amplitudes. The results presented herein provide sufficient data to consider that a detailed morphological, molecular, physiological and genetic characterization of P23H-1 homozygous transgenic rats as an experimental model with visual and auditory dysfunctions is necessary.

## **AUTHOR CONTRIBUTIONS**

All the authors have made substantial contributions to the paper to merit credit as authors. The individual tasks of the authors were as follows: conception of the work (Elena Caminos); design of the experiments, acquisition, analysis or interpretation of the data (all the authors: ABR experiments: Juan C. Alvarado; Immunocytochemical experiments in the cochlear nuclei: Jorge V. Sotoca and Elena Caminos; Auditory test: Jorge V. Sotoca and Elena Caminos; Statistical data analyses: Juan R. Martinez-Galan and Elena Caminos; Histology and immunocytology of cochleae: Verónica Fuentes-Santamaría); drafting the work (Jorge V. Sotoca and Elena Caminos); revising it critically for important intellectual content (all the authors); final approval of the version to be published (all the authors); agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work have been appropriately investigated and resolved (Elena Caminos).

### **ACKNOWLEDGMENTS**

The authors wish to thank Dr. Matt LaVail and Kelly Ahern for supplying the P23H rats; José Julio Cabanes and Mari Cruz Gabaldón for technical assistance. This work has been supported by grants from Junta de Castilla-La Mancha, Consejería de Educación (PPlll 0-01 39-61 56) to Elena Caminos.

#### **REFERENCES**


response (ABR) waveforms: a study in Wistar rats. *Neurosci. Res.* 73, 302–311. doi: 10.1016/j.neures.2012.05.001


rhodopsin transgenic rats by gene delivery of BiP/Grp78. *Proc. Natl. Acad. Sci. U.S.A.* 107, 5961–5966. doi: 10.1073/pnas.0911991107


Zhang, Y., Arnér, K., Ehinger, B., and Perez, M.-T. R. (2003). Limitation of anatomical integration between subretinal transplants and the host retina. *Invest. Ophthalmol. Vis. Sci.* 44, 324–331. doi: 10.1167/iovs.02-0132

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 16 June 2014; accepted: 31 August 2014; published online: 17 September 2014.*

*Citation: Sotoca JV, Alvarado JC, Fuentes-Santamaría V, Martinez-Galan JR and Caminos E (2014) Hearing impairment in the P23H-1 retinal degeneration rat model. Front. Neurosci. 8:297. doi: 10.3389/fnins.2014.00297*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2014 Sotoca, Alvarado, Fuentes-Santamaría, Martinez-Galan and Caminos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Processing of harmonics in the lateral belt of macaque auditory cortex

## *Yukiko Kikuchi<sup>1,2,3</sup>\*, Barry Horwitz<sup>2</sup>, Mortimer Mishkin<sup>3</sup> and Josef P. Rauschecker<sup>1</sup>*

*<sup>1</sup> Department of Neuroscience, Georgetown University Medical Center, Washington, DC, USA*

*<sup>2</sup> Brain Imaging and Modeling Section, Voice, Speech and Language Branch, National Institute on Deafness and Other Communication Disorders, National Institutes of Health, Bethesda, MD, USA*

*<sup>3</sup> Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA*

#### *Edited by:*

*Monica Munoz-Lopez, University of Castilla-La Mancha, Spain*

#### *Reviewed by:*

*Tobias Overath, Duke University, USA Bethany Plakke, University of Rochester School of Medicine and Dentistry, USA Amy Poremba, University of Iowa, USA*

#### *\*Correspondence:*

*Yukiko Kikuchi, Institute of Neuroscience, Newcastle University Medical School, Henry Wellcome Building, Framlington Place, Newcastle Upon Tyne NE2 4HH, UK e-mail: yukiko.kikuchi@newcastle.ac.uk*

Many speech sounds and animal vocalizations contain components, referred to as complex tones, that consist of a fundamental frequency (F0) and higher harmonics. In this study we examined single-unit activity recorded in the core (A1) and lateral belt (LB) areas of auditory cortex in two rhesus monkeys as they listened to pure tones and pitch-shifted conspecific vocalizations ("coos"). The latter consisted of complex-tone segments in which F0 was matched to a corresponding pure-tone stimulus. In both animals, neuronal latencies to pure-tone stimuli at the best frequency (BF) were ∼10 to 15 ms longer in LB than in A1. This might be expected, since LB is considered to be at a hierarchically higher level than A1. On the other hand, the latency of LB responses to coos was ∼10 to 20 ms shorter than to the corresponding pure-tone BF, suggesting facilitation in LB by the harmonics. This latency reduction by coos was not observed in A1, resulting in similar coo latencies in A1 and LB. Multi-peaked neurons were present in both A1 and LB; however, harmonically-related peaks were observed in LB for both early and late response components, whereas in A1 they were observed only for late components. Our results suggest that harmonic features, such as relationships between specific frequency intervals of communication calls, are processed at relatively early stages of the auditory cortical pathway, but preferentially in LB.

**Keywords: communication calls, harmonics, macaques, auditory cortex, single-unit, multi-peaked neurons**

## **INTRODUCTION**

Harmonics, one of the essential acoustic structures observed in a natural environment, consist of integer multiples of a sound's fundamental frequency (F0). Natural harmonic sounds include species-specific vocalizations, a sound category with biological relevance for most species including humans (Fitch, 2006), and most musical instrument sounds. Simultaneous presentations of tonal sounds are referred to as chords; if their harmonics have a simple frequency interval ratio of 2:1 ("octave") or 3:2 ("perfect fifth"), they are perceived as consonant by both humans and nonhuman primates (Schellenberg and Trainor, 1996; Izumi, 2000). By contrast, complex frequency interval ratios, as in a "minor second" (16:15), create roughness of sound and are perceived as dissonant. Studies have shown that human infants as young as 4 months have a preference for consonance over dissonance (Zentner and Kagan, 1996, 1998), suggesting a possible innate bias toward harmonic structure like that contained in communication sounds (but see Terhardt, 1974).

We perceive harmonically related sounds as a whole rather than as components of a spectrum. Macaque monkeys, whose architectonic structure of cortical auditory regions closely resembles that in humans (Hackett et al., 2001), judge two melodies to be the same when they are transposed by one or two octaves, but only if the melodies are tonal (Wright et al., 2000). They also perceive the pitch of harmonic sounds with a "missing fundamental" (Tomlinson, 1988), suggesting they experience gestalt perception of tonal structures just as we do.

Neurophysiological and fMRI studies of primary auditory cortex (A1) in several species have reported neurons with multiple peaks of their response rates in the frequency domain, including ones at harmonically-related intervals (monkey: Brosch et al., 1999; Kadia and Wang, 2003; cat: Oonishi and Katsuki, 1965; Sutter and Schreiner, 1991; Eggermont, 2007; Noreña et al., 2008; human: Moerel et al., 2013). Meanwhile, neuronal populations in monkey A1 show greater evoked responses to dissonant than to consonant chords (Fishman et al., 2001) and enhanced responses to mistuned harmonics compared to harmonics (Fishman and Steinschneider, 2010), perhaps due to the greater salience of the mistuned component. However, the neural basis of harmonic processing, especially outside primary auditory cortex, remains unclear.

Generally, neurons in the lateral belt (LB), located laterally adjacent to the auditory core areas, respond preferentially to complex sounds, including band-passed noise and frequency-modulated sweeps (Rauschecker et al., 1995; Rauschecker and Tian, 2004; Tian and Rauschecker, 2004). Within LB, selectivity for conspecific calls is highest in its anterolateral division (AL) (Tian et al., 2001). In the present study, the stimulus preference of auditory neurons for pure tones vs. harmonic vocalizations ("coo") was compared in A1 and LB [including the middle lateral (ML) and anterolateral (AL) areas of auditory cortex] in behaving rhesus monkeys. We hypothesized that spectral integration of harmonically related intervals takes place preferentially in LB. Due to this integration ("spectral combination sensitivity"; Suga et al., 1979; Rauschecker et al., 1995) a sound with harmonic structure should be more effective than a pure tone, even one at the neuron's BF, in evoking a response in this region.

#### **MATERIALS AND METHODS**

#### **ANIMAL PREPARATIONS**

Two adult male rhesus monkeys (*Macaca mulatta*), weighing 7.5–11.5 kg, were prepared for chronic awake electrophysiological recording. Animal care and all procedures were conducted in accordance with the National Institutes of Health guidelines, and all experimental procedures were approved by the Georgetown University Animal Care and Use Committee. Each animal was anesthetized, and a head post and recording chamber were attached to the dorsal surface of the skull under aseptic conditions. With guidance from MRI images obtained with a 3T scanner (0.5 mm voxel size, Siemens Tim Trio), a cylindrical chamber (65° angle, 19 mm diameter, Crist Instruments, Hagerstown, MD) was positioned stereotaxically over the left hemisphere of Monkey H, and a custom-made oval chamber (20 × 40 mm, Crist Instruments) was positioned over the left hemisphere so as to cover most of the supratemporal plane of Monkey P. Monkey H had previously been used to acquire data from the rostral supratemporal plane through a rostrally positioned chamber (Kikuchi et al., 2010); therefore, for this experiment, the original chamber was removed and re-implanted over a more caudal auditory region to permit access to the middle lateral (ML) and anterolateral (AL) auditory areas in addition to the auditory core cortex. A post-operative MRI scan confirmed that the chambers were positioned correctly. The skull disc within the chamber was then removed under aseptic conditions before recording was begun. While awake, Monkey H received audiological screening, which included DPOAE (distortion product otoacoustic emission) measurements to assess cochlear function, and tympanometry to evaluate middle-ear function. The hearing ability of Monkey H was found to be normal.

#### **BEHAVIORAL TASK**

Behavioral testing and recording sessions were conducted in a single-walled acoustic chamber (Industrial Acoustics Company, Bronx, NY) installed with foam isolation elements (AAP3, Acoustical Solutions). The animal sat in a monkey chair with its head fixed, facing a speaker located one meter directly in front of it in a darkened room. The animal was trained to perform an auditory discrimination task. A single positive stimulus (S+), consisting of a 300-ms pink-noise burst (PNB), was pseudorandomly interspersed among negative stimuli (S−) for 20% of the trials. The (S−) consisted of all other stimuli. The animal initiated a trial by holding a lever for 500 ms, triggering the presentation of one of the acoustic stimuli. Lever release within a 500-ms response window after offset of the S+ led to a water reward (∼0.2 ml) followed by a 500-ms inter-trial interval (ITI). Lever release in response to a negative stimulus prolonged the 500-ms ITI by 1 s (timeout). The average inter-onset-interval, including correct and incorrect trials, was 2.3 ± 0.45 s (mean ± SD). In this report, all electrophysiological analyses are based on correct trials only.
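The trial outcomes implied by this task description can be sketched as a small classifier. This is only an illustrative sketch: the function name, the outcome labels, and the rule that any lever release on an S− trial counts as a false alarm are assumptions, not details taken from the task software.

```python
from typing import Optional

STIM_MS = 300      # stimulus duration stated in the text
WINDOW_MS = 500    # response window after S+ offset stated in the text


def classify_trial(is_target: bool, release_ms: Optional[float]) -> str:
    """Label one trial.

    release_ms is the lever-release time measured from sound onset,
    or None if the lever was not released (hypothetical convention).
    """
    if is_target:
        # No release, or a release after the response window, is a miss.
        if release_ms is None or release_ms > STIM_MS + WINDOW_MS:
            return "miss"
        # Release before sound offset is a premature response.
        if release_ms < STIM_MS:
            return "premature"
        return "hit"
    # S- trial: any release is treated as a false alarm (assumed scoring rule).
    return "correct_rejection" if release_ms is None else "false_alarm"
```

Under these assumptions, only trials labeled `hit` or `correct_rejection` would count as correct trials for the electrophysiological analyses.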

#### **SOUND PREPARATION**

The sound waveform signals were sent through a 12-bit D/A converter (CIO-DAS1602/12, ComputerBoards) using the CORTEX dual-computer system and then amplified, attenuated, and delivered through a free-field loudspeaker (Reveal 6, Tannoy), which had a flat (±3 dB) frequency response from 63 Hz to 51 kHz.

All stimuli, including the monkey vocalizations ("coo" calls), had a 300-ms fixed duration, gated with a 5-ms rise/fall linear ramp. This vocalization was recorded under natural conditions in Morgan Island using a directional microphone (ME66 with K6 powering module, Sennheiser, CT, USA, frequency response at 40–20,000 Hz ± 2.5 dB) with a solid-state portable recorder (PMD670, Marantz Professional, London, UK) at a sampling rate of 48 kHz (Laboratory of Neuropsychology, NIMH). The vocalization consisted of harmonic structures with asymmetrical spectral contours (**Figure 1**). Pure tones (PTs) and PNBs were generated at a sampling rate of 48 kHz (32 bit) using Adobe Audition 1.5. The stimuli were normalized by recording the stimuli played through the stimulus presentation system, filtering the recorded signal on the basis of Japanese macaque audiograms (Jackson et al., 1999), and using the maximum root-mean-square (RMS) amplitude during a sliding window of 200 ms duration; they were presented at ∼70 dB SPL. Details of the sound equalization method were described by Kuśmierek and Rauschecker (2009).
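The sliding-window RMS step of this normalization can be illustrated with a minimal sketch. It covers only the maximum-RMS computation and a gain adjustment; the audiogram-based filtering and the calibration to dB SPL are omitted, and the function names and the target value are hypothetical.

```python
import math


def max_sliding_rms(samples, fs_hz, window_s=0.2):
    """Largest RMS amplitude over all windows of window_s seconds."""
    if not samples:
        raise ValueError("empty signal")
    n = len(samples)
    win = min(n, max(1, int(round(window_s * fs_hz))))
    # Prefix sums of squared samples give each window's energy in O(1).
    prefix = [0.0]
    for s in samples:
        prefix.append(prefix[-1] + s * s)
    best = 0.0
    for start in range(n - win + 1):
        mean_sq = (prefix[start + win] - prefix[start]) / win
        best = max(best, mean_sq)
    return math.sqrt(best)


def scale_to_rms(samples, fs_hz, target_rms, window_s=0.2):
    """Scale a signal so that its maximum sliding-window RMS equals target_rms."""
    current = max_sliding_rms(samples, fs_hz, window_s)
    if current == 0.0:
        raise ValueError("silent signal cannot be scaled")
    gain = target_rms / current
    return [s * gain for s in samples]
```

Applying `scale_to_rms` with the same target to every stimulus would equate the stimuli on this measure; mapping the target to ∼70 dB SPL depends on the playback calibration, which is not modeled here.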

#### **STIMULI**

**FIGURE 1 | (A)** Spectrogram and **(B)** power spectrum of the coo call used as a naturalistic harmonic sound in this study. The coo consists of the first harmonic or fundamental (h1) and five prominent harmonics with amplitudes above −50 dB (2nd to 6th harmonics, h2–h6; FFT size: 1024, Hann window). The interval between h1 and h2 is one octave and that between h2 and h3 is a perfect fifth.

The experiment consisted of several blocks of sessions, each block with a different stimulus set. After isolating a neuron, we determined the neuron's best frequency (BF) and receptive field (RF), using 23 PTs ranging from 134 Hz (C3) to 21 kHz (E10) at fifth and tritone intervals in a diatonic scale to obtain a rough tuning curve, and/or 84 PTs at semitone steps in a chromatic scale, spanning 7 octaves between 110 Hz (A2) and 13.3 kHz (G#9), to obtain a fine tuning curve. We then used a set of PTs and coos with various pitches using digital recordings of natural coo calls to test responses to complex tones (see **Figure 1**, Supplementary material 1). The fundamental frequency (F0) of the coo was varied using the pitch-shift function in Adobe Audition 1.5. Neural responses to PT and coo stimuli were compared using a stimulus set comprised of 10 PTs and 10 pitch-matched coo calls in either the same block (76 sessions) or in separate blocks (48 sessions). The frequency of PTs and the F0 of the coos ranged from G3 (196 Hz) to C#8 (4435 Hz) in 6-semitone steps. In each recording session, the stimuli were presented in pseudorandom order with at least 15 trials per stimulus.
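The frequency grids of the fine-tuning set and the pitch-matched set follow from equal-tempered semitone spacing, which is assumed here because the stated endpoints are consistent with it. The function and variable names in this sketch are illustrative only.

```python
def semitone_series(start_hz, count, step_semitones=1):
    """Frequencies spaced by a fixed number of equal-tempered semitones."""
    return [start_hz * 2.0 ** (k * step_semitones / 12.0) for k in range(count)]


# 84 pure tones in semitone steps starting at A2 (110 Hz).
fine_tuning_set = semitone_series(110.0, 84)

# 10 frequencies in 6-semitone (tritone) steps starting at G3 (196 Hz).
pitch_matched_set = semitone_series(196.0, 10, step_semitones=6)
```

With these inputs the last entry of `fine_tuning_set` is about 13289.75 Hz and the last entry of `pitch_matched_set` is about 4435 Hz, which agrees with the ranges given in the text.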

## **ELECTROPHYSIOLOGICAL RECORDINGS**

Prior to each recording session, a reference point above the lateral sulcus was calculated based on the preoperative MRI scan. The position of the supratemporal plane was calculated based on the MRI images, and coordinates were mapped onto the chamber. A guide tube for up to 4 tungsten microelectrodes (0.5–3.0 MΩ, epoxylite insulation, FHC, Bowdoin, ME) was then lowered into the brain to this reference point. Each electrode was independently advanced using a remote-controlled hydraulic, 4-channel customized microstep-multidrive system (NAN-SYS-4, Plexon, Inc., Dallas, TX). As the electrode was lowered, a silent gap in the recording signal was usually observed as the electrode passed between the frontal and the temporal lobe. The post-gap depth at which the first robust spontaneous spiking activity was observed was marked as the initial recording site for each electrode; it is thus likely that much of the data were recorded from the supragranular layers. In addition to the coordinates of the recording site, which was perpendicular to the chamber grid plane, the coordinates of three additional points on the grid plane were determined as reference points. Thus, four reference points were available, as needed, to reconstruct the coordinates of each recording site in 3D space from individual anatomical MRI images (voxel size 1 mm) after each recording session. This becomes important for the standardization of the coordinates in relation to auditory areas using a population-average brain (see below for more details).

We attempted to select neurons based neither on their stimulus preference nor on the shape of their spiking activity. The signal from each electrode was passed through a head stage with gain one and high input impedance (HST/8o50-G1, Plexon Inc.) and then split to extract the spiking activity through a preamplifier system (PBX2/16sp/16fp, Plexon Inc.). The spike signals were filtered with a pass-band of 150–8000 Hz, further amplified, and then digitized at 40 kHz. Voltage-thresholding was applied to spiking activity, and spike waveforms were stored after threshold crossing. In many cases, the signal on each electrode contained activity from more than one neuron. After time-voltage thresholding, we separated the multi-unit spike trains into single-unit spike trains using the Valley Seeking algorithm (Offline Sorter, Plexon, Inc.). When we found more than one cluster, the separation quality of multi-clusters was inspected using Multivariate Analysis of Variance (MANOVA), and only data with *p* < 0.01 were considered to be separate neurons from the same electrodes and included as such in the analysis. During long recording sessions, temporal stability was sometimes lost due to electrode drift. Such instability appeared as a discontinuous cluster in time. These units were excluded from analysis.

We also inspected the inter-spike interval (ISI) for each cluster. An ISI distribution with entries smaller than the refractory period (1 ms) signifies that the recorded spikes were from more than a single neuron. In most such cases, we changed the threshold at the stage of voltage-thresholding and re-ran the cluster analysis. However, if an ISI < 1 ms was still observed in a small proportion of a newly sorted ISI distribution (usually <0.3%), the furthest spike waveform from the cluster center in 2D feature space with ISIs less than the refractory ISI was removed from the unit.
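The refractory-period check described above amounts to counting inter-spike intervals shorter than 1 ms. A minimal sketch follows; the function name is hypothetical and the input is assumed to be spike times in milliseconds for one sorted cluster.

```python
def isi_violation_fraction(spike_times_ms, refractory_ms=1.0):
    """Fraction of inter-spike intervals shorter than the refractory period."""
    times = sorted(spike_times_ms)
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    if not isis:
        return 0.0  # fewer than two spikes: no intervals to violate
    violations = sum(1 for isi in isis if isi < refractory_ms)
    return violations / len(isis)
```

A cluster whose fraction stays above a small tolerance after re-thresholding (the text mentions values usually below 0.3%) would be a candidate for further cleaning or exclusion.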

Time stamps indicating the timing of auditory stimulus, behavioral response, and reward events were sent through CORTEX (CIO-DAS1602/12, CIO-DIO24, ComputerBoards), and continuous data, such as sound waveforms and eye movements monitored by an infrared-based eye-tracking system at 60 Hz (ETL-200, ISCAN, Inc.), used to check the animals' state of wakefulness, were sent to a Multichannel Acquisition Processor system (MAP, Plexon, Inc.) and then integrated with the spike data. During the recording session, spikes were roughly sorted by real-time acquisition programs using template matching and PCA clustering methods (RASPUTIN, Plexon), and rough estimations of the frequency- and intensity- tuning profiles of the neuron were examined online (Neuroexplorer, Nex Technologies, MA). Throughout the recording sessions, we monitored neuronal activity visually with an oscilloscope (HM407-2, HAMEG) and aurally through headphones (HD 280 Professional, Sennheiser). Data selection, pre-processing, and data analysis were performed using MATLAB and SPSS. All the results in this report are based on offline analysis conducted after the experiments were completed; the online analysis was used only as a quick evaluation of a neuron's characteristics for stimulus selection purposes.

## **DATA ANALYSIS**

The spike trains of single-unit activity (SUA) were binned at 1 ms for each trial and the average spontaneous firing rate and its variability per stimulus condition was first calculated during the baseline period (0–150 ms before sound onset). The spike trains were convolved with a Gaussian kernel (σ = 10) to construct spike-density peri-stimulus time histograms (PSTHs) and then normalized to the average variability (SD) of the raw baseline firing rate across all stimulus conditions. Neurons that showed responses 2.0 SDs above baseline for 10 consecutive 1-ms sampling points in the normalized PSTH to at least one sound (other than the S+, pink noise) were defined as "auditory-responsive." These constraints were imposed in order to exclude spurious activity or artifacts.
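The smoothing and responsiveness criterion can be sketched as follows. Several details are assumptions rather than the authors' implementation: σ is taken to be 10 ms (the bins are 1 ms), the kernel is truncated at ±3σ, the edges are zero-padded, and the baseline mean is subtracted before dividing by the baseline SD. All function names are illustrative.

```python
import math


def gaussian_kernel(sigma_ms, half_width_sd=3):
    """Unit-area Gaussian sampled at 1-ms steps, truncated at +/- 3 sigma."""
    half = int(round(half_width_sd * sigma_ms))
    weights = [math.exp(-0.5 * (k / sigma_ms) ** 2) for k in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def spike_density(binned, sigma_ms=10.0):
    """Convolve a 1-ms binned spike train with the kernel (zero-padded edges)."""
    kernel = gaussian_kernel(sigma_ms)
    half = len(kernel) // 2
    n = len(binned)
    out = []
    for t in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = t + k - half
            if 0 <= j < n:
                acc += w * binned[j]
        out.append(acc)
    return out


def normalize_to_baseline(sdf, baseline_mean, baseline_sd):
    """Express the density in units of baseline SD (mean subtraction assumed)."""
    return [(v - baseline_mean) / baseline_sd for v in sdf]


def is_auditory_responsive(z, threshold_sd=2.0, run_length=10):
    """True if z exceeds the threshold for run_length consecutive 1-ms bins."""
    run = 0
    for v in z:
        run = run + 1 if v > threshold_sd else 0
        if run >= run_length:
            return True
    return False
```

In this sketch a neuron would be labeled auditory-responsive if `is_auditory_responsive` returns true for at least one stimulus other than the S+.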

Tuning curves were constructed based on peak response magnitude (i.e., the maximum magnitude of the peak firing rate minus the average baseline firing rate) and then smoothed by moving the average along with the two neighboring points (i.e., two semitones) on each side of the frequency axis. The frequency that produced the maximum response on the tuning curve function was defined as the best frequency (BF) for the neuron. Neurons were classified as having either one peak on the frequency tuning curve function (i.e., a single-peaked neuron) or more than one peak (i.e., a multi-peaked neuron) with a clear excitation greater than 2.5 SDs above baseline firing rates and the half driven rate of the peak (i.e., 50% firing rates of the normalized peak magnitude in response to the BF). To obtain a clear tuning peak, we used this stricter criterion than the one defined above for auditory-responsive neurons.
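The smoothing and BF definition can be illustrated with a short sketch. It assumes a five-point moving average (the point plus two neighbors on each side) and shortens the window at the ends of the frequency axis; the edge handling and the names are assumptions, since the text does not specify them.

```python
def smooth_tuning_curve(responses, neighbors=2):
    """Moving average over each point and up to `neighbors` points per side."""
    n = len(responses)
    smoothed = []
    for i in range(n):
        lo = max(0, i - neighbors)
        hi = min(n, i + neighbors + 1)
        window = responses[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def best_frequency(freqs_hz, responses, neighbors=2):
    """Frequency at the maximum of the smoothed tuning curve."""
    smoothed = smooth_tuning_curve(responses, neighbors)
    idx = max(range(len(smoothed)), key=smoothed.__getitem__)
    return freqs_hz[idx]
```

Here `responses` would hold the baseline-subtracted peak response magnitude for each tone, in the same order as `freqs_hz`.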

Neuronal latency was calculated based on the spike density function with the Gaussian kernel (σ = 10) described above and was defined as the time from sound onset to the first millisecond bin in which spiking activity rose 2 SDs above baseline for 10 consecutive 1-ms bins. The SD calculated from the raw data (taking the grand average of variability across all stimulus conditions) generally yielded a higher value than that calculated from smoothed data. Minimum latency was defined as the shortest latency across all auditory responses of the neuron; this was sometimes different from the latency in response to the BF, which was measured from stimulus onset to the peak magnitude of the response. Minimum latency to S+ was calculated only when the neuron showed a significant response to S+ in both correct and incorrect trials. For incorrect trials, the recording session was included in the statistical analysis only if there were at least five "miss" trials (without a response to S+) within that session. If there were other types of errors (e.g., premature response to the positive stimulus), the trials were excluded altogether to avoid incorporating artifactual effects on neuronal activity, e.g., effects of motor responses. BF latency was defined as the latency in response to the BF, and the average latency was defined as the median latency across all auditory responses of the neuron. The coo-call latency was defined as the latency in response to the coo whose F0 was matched to the neuron's BF. Neuronal latencies and spike rates across subdivisions of auditory cortex were compared using the Kruskal-Wallis test, and *post-hoc* testing between subfields was performed using Tukey's "honestly significant difference" (HSD) test to correct for multiple comparisons.
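The latency criterion can be expressed compactly. The sketch below assumes a spike-density trace already normalized to baseline SD and sampled in 1-ms bins, with `onset_index` marking sound onset; the names and the use of `None` for responses that never meet the criterion are illustrative choices.

```python
from statistics import median


def response_latency_ms(z, onset_index, threshold_sd=2.0, run_length=10):
    """Time from onset to the first bin of the first supra-threshold run."""
    run = 0
    for i in range(onset_index, len(z)):
        if z[i] > threshold_sd:
            run += 1
            if run == run_length:
                # Report the first bin of the run, relative to sound onset.
                return (i - run_length + 1) - onset_index
        else:
            run = 0
    return None  # criterion never met


def summarize_latencies(latencies_ms):
    """Minimum and median latency across a neuron's auditory responses."""
    valid = [lat for lat in latencies_ms if lat is not None]
    if not valid:
        return None, None
    return min(valid), median(valid)
```

The pair returned by `summarize_latencies` corresponds to the minimum latency and the "average" (median) latency defined in the text.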

To analyze the tuning width of each neuron quantitatively, a bandwidth index (BI) was calculated by a method similar to one used by Lakatos et al. (2005; formula shown in **Figure 4A**) using a fine-tuning paradigm. The BI was calculated using normalized firing rates during the entire sound duration (0–300 ms) after subtracting mean baseline firing rates across all stimuli in the single-peaked neurons (**Figure 4A**). A BI close to 1 indicates sharp frequency tuning, whereas a BI near 0 indicates broad tuning. We also measured the traditional tuning width of the neuron's response peak at 30 dB above threshold (BW30; Sutter and Schreiner, 1991; Schreiner and Sutter, 1992). Pure tones with sound durations of 300 ms at five different intensities in 10-dB steps (30–80 dB SPL) were presented at different frequencies (E3–E10, 165 Hz–21 kHz) in octave steps. The tones were played in a pseudorandom order of different frequencies and intensities. The neuron's frequency response area (FRA) was determined as a contour line of 2.5 SD above baseline activity in the frequency and intensity domains, and the tuning width at an intensity of 30 dB SPL above the neuron's threshold in the FRA was determined as the neuron's BW30. To make the bandwidth results directly comparable between our study and Sutter and Schreiner's study, we computed the BW30 using both single- and multi-peaked neurons. Since we used a fixed stimulus set (40 tones), we were not always able to precisely determine the neuron's threshold, because some neurons still showed a response at the lowest sound intensity we employed. In this case we calculated the BW30 as the tuning width at 60 dB SPL, which is 30 dB above the lowest sound intensity we used. For the same reasons, if the neuron's threshold was as high as 60 dB SPL, we were not able to obtain the BW30.
Also, due to time constraints, we were not always able to fully determine the neuron's FRA after completing the other tests; thus our analysis for BW30 is limited to the neurons we actually tested in this paradigm.
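The rule for choosing the intensity at which BW30 is read off can be sketched as below. This is an illustration under stated assumptions: the threshold is approximated by the lowest tested intensity that evoked a response, the highest tested intensity is 80 dB SPL, and expressing the width in octaves between the lowest and highest responsive frequencies is a choice made here, not one stated in the text. The function names are hypothetical.

```python
import math


def bw30_level_db(threshold_db, highest_tested_db=80.0):
    """Intensity 30 dB above threshold, or None if it exceeds the tested range."""
    level = threshold_db + 30.0
    return level if level <= highest_tested_db else None


def bandwidth_octaves(responsive_freqs_hz):
    """Width in octaves between the lowest and highest responsive frequency."""
    if not responsive_freqs_hz:
        return None
    return math.log2(max(responsive_freqs_hz) / min(responsive_freqs_hz))
```

A neuron that still responds at 30 dB SPL is measured at 60 dB SPL, whereas a threshold of 60 dB SPL would call for 90 dB SPL, outside the tested range, so no BW30 is obtained.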

Multi-peaked neurons were tested with a fine-tuning paradigm that included 84 pure tones in chromatic scales (A2–G#9, 110–13289.8 Hz). We selected the best two (if there were only two) or three peaks (>2.5 SDs above baseline and the half-driven rate of the peak) and assigned them to BF1–BF3 in ascending order of their frequency at the peaks (i.e., the lowest peak frequency was assigned to BF1). Among these frequencies, the frequency that elicited the greatest peak response was selected as the neuron's overall BF. The criterion for the presence of a peak was a response above a specific threshold; the criterion for two peaks was a decrease in response below this threshold for at least one point between the two peaks, the minimum inter-peak interval being separation by more than two semitones. The frequency interval ratio (BF ratio) was calculated for all three combinations (BF1–BF2, BF1–BF3, and BF2–BF3) and normalized by the frequency of the lowest peak, a method similar to that used by Kadia and Wang (2003). The distribution of BF ratios was then binned by one tenth of an octave (Sutter and Schreiner, 1991), which is wider than semitone resolution. The distribution of BF ratios was calculated based on the peak firing rates during the early-response period (0–70 ms from sound onset) and during the late-response period (71–300 ms from sound onset). The 70 ms time window was used to separate the onset and sustained components of the response, since a typical auditory single-unit response showed a trough between onset and sustained responses at approximately 60–80 ms. The confidence interval (CI) was calculated from the distribution of BF ratios using the same bin width. We also calculated the CI from the distribution of BF ratios under the assumption that the peak interval relations of multi-peaked neurons were random. The number of occurrences of peak intervals was assigned to a given bin of BF ratio and the averaged distribution after 1000 permutations was computed.
Since the CI from the average distribution was always lower than the former CI using the raw distribution, we employed the CI calculated from the raw distribution in this study. To compare the number of harmonic intervals in multi-peaked neurons, we used one or two bins that were centered at the perfect fifth (1.5) and octave (2.0). When the BF ratio was in the middle of two bins, we used the bin with maximum peak.
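The BF-ratio computation for a multi-peaked neuron can be sketched as follows. Each pairwise ratio is taken as the higher peak frequency divided by the lower one, and bins are assumed to be centered on multiples of one tenth of an octave; the exact bin edges used in the study are not given, so this placement and the function names are assumptions.

```python
import math
from itertools import combinations


def bf_ratios(peak_freqs_hz):
    """Ratios (higher / lower) for every pair of tuning-curve peaks."""
    peaks = sorted(peak_freqs_hz)
    return [high / low for low, high in combinations(peaks, 2)]


def octave_bin(ratio, bin_width_oct=0.1):
    """Index of the tenth-octave bin whose center is nearest to the ratio."""
    return int(round(math.log2(ratio) / bin_width_oct))
```

For peaks at 440, 660, and 880 Hz, the ratios are 1.5 (BF1–BF2), 2.0 (BF1–BF3), and about 1.33 (BF2–BF3), so this hypothetical neuron would contribute one count to the perfect-fifth bin and one to the octave bin.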

Data from the subfields of LB (i.e., ML and AL) were grouped whenever the sample size for individual subfields was too small to allow for statistical testing.

#### **ASSIGNMENT OF RECORDING LOCATIONS TO CORTICAL AREAS**

The recording sites in this study were assigned to either the auditory core region (primary auditory cortex, A1) or to the auditory LB region [middle lateral field (ML) and anterolateral field (AL)] using the following criteria.

To reconstruct the boundaries between cortical areas along the anterior-posterior (AP) axis, in particular the boundary between ML and AL, we employed the standard approach of using the cortical tonotopic gradient map based on the neurons' best frequency (BF, **Figure 2B**; cf. Rauschecker et al., 1995). The mean BF along the AP axis was used to calculate the reversal point of the BF tuning curve along the AP axis (monkey H, 15.5; monkey P, 14.5, **Figure 2A**).

**FIGURE 2 | (A)** Recording sites: the location of A1 and lateral belt (LB) are depicted on the left supratemporal plane (STP) of Monkeys H and P. The anterior-posterior (AP) and medial-lateral (ML) coordinates were transformed into standardized coordinates based on the population-average macaque brain (McLaren et al., 2009, 2010). The curved solid black line on each map shows the estimated border between core and LB based on the atlas from a single subject (Saleem and Logothetis, 2007). The anterior-posterior border (dotted line) was drawn from the frequency reversal observed on mapping the best frequencies (BFs, see **Figure 2B**); this reversal occurred at a slightly different AP coordinate in the two monkeys (Monkey H, 15.5; Monkey P, 14.5). Shown on the right are two coronal MRI images of monkey P at the indicated AP levels, with A1 and LB on the supratemporal plane (STP) highlighted in red and blue, respectively. **(B)** Best frequency (BF) maps for each of the two animals. Frequency reversal on the BF maps was used to determine the anterior-posterior boundary of the auditory subdivisions for each animal. The dotted line was calculated based on the lowest frequency reversal point using mean values, smoothed by a 3-mm sliding window, along the AP direction. The scale bar on the right shows the frequency range of pure tones used to estimate the neurons' BF.

To reconstruct the boundaries between cortical areas along the medial-lateral (M-L) axis, i.e., between the putative core and LB regions, a similarly precise approach based on functional criteria cannot be taken, even though the neurons' response characteristics, in particular bandwidth tuning or tone-vs.-bandpass-noise preference, do differ between core and LB (Rauschecker et al., 1995). Therefore, we used an approach based on population-average T1-weighted MRI images (112RM-SL) provided by McLaren et al. (2009, 2010; the standardized atlas of the rhesus macaque brain can be downloaded from: www.brainmap.wisc.edu/monkey.html). This approach permits us to standardize electrophysiologically identified regions into a common space across multiple individuals and to assign the xyz-coordinate of each recording site to one of these regions. Most importantly for our study, these coordinates can be used to identify the medial-lateral boundary between core and LB.

The 112RM-SL database is the average of 112 rhesus macaque brains co-registered with the single-subject atlas (D99-SL) of Saleem and Logothetis (2007), which was itself co-registered with histological slices (Nissl, parvalbumin, SMI-32, calbindin and calretinin) aligned to cytoarchitectonic areas. We used the Analysis of Functional NeuroImages (AFNI) software (Cox, 1996) for MRI processing. The volumes were reconstructed from the original T1-weighted image of an individual animal in AFNI using the *to3d* function. To mask out non-brain areas occupied by a thick mass of muscle around the skull, the skull was removed from the images using the *3dSkullStrip* function, and the skull-stripped images were then aligned to the population-average brain (112RM-SL) to generate a transformation matrix that converted each xyz-coordinate of the recording site into standardized space. Since inter-individual variability is large, especially across the width of the supratemporal plane (STP), each co-registered brain yielded discrepancies for some brain structures, especially when the single-subject atlas was the reference for each site. In that case, we chose the gray matter closest to the site and visually assigned the cortical region based on the atlas.
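The coordinate conversion described above amounts to applying an affine transformation to each recording site. The following is a minimal sketch of that step, assuming the transformation is available as a 4 × 4 homogeneous matrix; the matrix values, the example site, and the function name `to_standard_space` are hypothetical, and AFNI's own file formats and coordinate conventions are not modeled here.

```python
def to_standard_space(xyz, affine):
    """Apply a 4x4 homogeneous affine matrix to a single (x, y, z) point in mm."""
    x, y, z = xyz
    homogeneous = (x, y, z, 1.0)
    # Only the first three rows produce coordinates; the last row is (0, 0, 0, 1).
    return tuple(
        sum(row[i] * homogeneous[i] for i in range(4)) for row in affine[:3]
    )


# Hypothetical matrix: 5% uniform scaling plus a translation (mm).
EXAMPLE_AFFINE = (
    (1.05, 0.0, 0.0, 2.0),
    (0.0, 1.05, 0.0, -1.5),
    (0.0, 0.0, 1.05, 0.5),
    (0.0, 0.0, 0.0, 1.0),
)

# Hypothetical recording site in the individual animal's native space (mm).
site_native = (20.0, 14.0, 18.0)
site_standard = to_standard_space(site_native, EXAMPLE_AFFINE)
print([round(c, 2) for c in site_standard])
```

Once every site is expressed in the common space, it can be compared directly with atlas-defined borders, which is what makes the medial-lateral assignment possible across animals.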

## **RESULTS**

The animals performed an auditory discrimination task on average at 96.3% accuracy (86.4% correct responses for the S+ trials and 98.8% correct for the S− trials). Most errors (1.8%) were failures to release the lever to the S+ ("miss" errors); the other types of errors were either premature responses to the S+ [lever release before sound offset (0.9%)] or "false-alarm" errors [lever release to a negative stimulus (0.9%)].
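The reported percentages can be cross-checked with simple arithmetic. The sketch below assumes that every S+ trial ended as a correct response, a miss, or a premature response, that false alarms were the only S− errors, and it uses the share of correct S+ trials (17.3% of all trials) reported later in the Results. The variable names are illustrative.

```python
# Percentages of ALL trials, as reported in the text.
correct_s_plus = 17.3   # correct S+ trials
miss = 1.8              # no lever release to the S+
premature = 0.9         # lever release before sound offset
false_alarm = 0.9       # lever release to an S- stimulus

# Assumption: these three outcomes exhaust the S+ trials.
s_plus_share = correct_s_plus + miss + premature
s_minus_share = 100.0 - s_plus_share

s_plus_accuracy = 100.0 * correct_s_plus / s_plus_share
s_minus_accuracy = 100.0 * (1.0 - false_alarm / s_minus_share)
overall_accuracy = 100.0 - (miss + premature + false_alarm)

print(round(s_plus_share, 1), round(s_plus_accuracy, 1),
      round(s_minus_accuracy, 1), round(overall_accuracy, 1))
```

Under these assumptions the derived values (about 86.5%, 98.9%, and 96.4%) agree with the reported 86.4%, 98.8%, and 96.3% to within rounding of the inputs, and they imply that roughly one trial in five was an S+ trial.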

Neurons in A1 and LB were recorded either separately or, more often, simultaneously using two to three electrodes. The spontaneous firing rates showed no significant differences across the three divisions of auditory cortex (monkey H: A1, 11.7 ± 8.6 spikes/s; ML: 15.5 ± 10.9 spikes/s; AL: 11.2 ± 8.1 spikes/s, *p* = 0.07; monkey P: A1, 14.0 ± 10.7 spikes/s; ML: 16.9 ± 12.4 spikes/s; AL: 15.0 ± 9.8 spikes/s, *p* = 0.18, Kruskal-Wallis test, mean ± SD).

### **HIERARCHICAL PROCESSING IN THREE SUBDIVISIONS OF AUDITORY CORTEX (A1, ML, AND AL) IN RESPONSE TO PURE TONES (PT) AND PINK-NOISE BURSTS (PNB)**

We first analyzed the responses of 596 single neurons (A1, 238; ML, 167; AL, 191) to pure tones (PTs). All three subfields of the auditory cortex generally responded to the PTs across a wide range of frequencies. The proportion of auditory neurons that showed a significant response to PTs decreased gradually from A1 to ML to AL (A1, 79%; ML, 74%; AL, 67%; see **Table 1** and **Figure 3A**). Although there was no overall statistically significant effect of PT responsiveness across subdivisions (chi-square test, χ<sup>2</sup> = 2.1, *df* = 2, *p* > 0.05), minimum onset latencies to PTs, i.e., the shortest latencies among all the responses of a neuron, differed significantly across the three subdivisions (*p* < 0.001, Kruskal-Wallis test), being shortest in A1 [median: 28 ms, 25th percentile (*Q*1) = 17 ms, 75th percentile (*Q*3) = 42 ms, *N* = 188] followed by ML and AL (ML: median: 35 ms, *Q*1 = 26 ms, *Q*3 = 50 ms, *N* = 124, *p* < 0.001; AL: median: 44 ms, *Q*1 = 22 ms, *Q*3 = 78 ms, *N* = 128, *p* < 0.001, Tukey's HSD test, **Figure 3B**, **Table 2**). Minimum latencies to the PNBs on correct trials were also compared across the three subdivisions: Like the gradual change in minimum latency observed in response to the PTs (**Figure 3C**, left), the latency to the PNBs differed significantly across the subdivisions (**Figure 3C**, right, *p* < 0.001, Kruskal-Wallis test). The *post-hoc* tests show that the median latency in A1 and ML differed significantly from that in AL, though the A1 and ML latencies did not differ from each other (A1: median: 36 ms, *Q*1 = 25 ms, *Q*3 = 47 ms; ML: median: 39 ms, *Q*1 = 31 ms, *Q*3 = 49 ms; AL, median: 51 ms, *Q*1 = 38 ms, *Q*3 = 74 ms, A1 vs. AL: *p* < 0.001; ML vs. AL: *p* < 0.05, Tukey's HSD test). To understand the variability between the two monkeys and three subdivisions of the auditory cortex, "monkey" and "area" were included as between-subject condition factors in Two-Way ANOVAs. The analysis revealed that for PT minimum latencies, there was a significant main effect of area [*F*(2, 434) = 4.929, *p* < 0.01] and monkey [*F*(1, 434) = 12.134, *p* < 0.01] but no interaction [*F*(2, 434) = 1.867, *p* = 0.16]. For PNB latencies, there was a significant main effect of area [*F*(2, 209) = 4.375, *p* < 0.02] but not monkey [*F*(1, 209) = 0.3495, *p* = 0.56] and there was no interaction [*F*(2, 209) = 0.14, *p* = 0.87]. The main effect of area was present for both PT and PNB latencies. Together, these data suggest that sound processing occurs along a cortical hierarchy from A1 to ML and AL.

**Table 1 | Population of auditory neurons driven by pure tones (PT) in different subfields of auditory cortex.**
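For readers unfamiliar with the Kruskal-Wallis test used for the latency comparisons, the following is a minimal, standard-library sketch of the statistic for exactly three groups. It is not the analysis code used in the study. The latency values are made up for illustration, the function name `kruskal_wallis_3` is hypothetical, and the closed-form p-value relies on the chi-square approximation with two degrees of freedom, which is only valid for three groups.

```python
import math
from collections import Counter


def kruskal_wallis_3(groups):
    """Tie-corrected H statistic and chi-square p-value for exactly three groups."""
    if len(groups) != 3:
        raise ValueError("the closed-form p-value below assumes three groups")
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j

    h = 12.0 / (n_total * (n_total + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3.0 * (n_total + 1)

    # Standard correction for tied observations.
    ties = Counter(pooled)
    correction = 1.0 - sum(t ** 3 - t for t in ties.values()) / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction

    # For df = 2 the chi-square survival function is exp(-H / 2).
    return h, math.exp(-h / 2.0)


# Made-up minimum latencies (ms), NOT the recorded data.
a1_latencies = [17, 22, 28, 31, 42]
ml_latencies = [26, 30, 35, 41, 50]
al_latencies = [22, 38, 44, 60, 78]
h_stat, p_value = kruskal_wallis_3([a1_latencies, ml_latencies, al_latencies])
print(round(h_stat, 3), round(p_value, 3))
```

A rank-based test is a natural choice here because the latency distributions are skewed, as the asymmetric quartiles around the medians indicate.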

We also compared the electrophysiological responses to the S+ (PNB) during correct trials (17.3% of all trials) and incorrect ("miss") trials (1.8% of all trials; see Methods). The minimum response latency for correct and incorrect trials during the same recording session did not differ significantly in either A1 or LB (A1: median: 35 vs. 37 ms, Q1: 22 vs. 21 ms; Q3: 47 vs. 47 ms, respectively, *N* = 54, *p* = 0.20; LB: median: 46 vs. 45 ms, Q1: 34 vs. 33 ms, Q3: 62 vs. 66 ms, respectively, *N* = 40, *p* = 0.41, Wilcoxon signed-rank test). The latencies were generally longer in LB than in A1 for both trial types (correct trials, *p* < 0.001; incorrect trials: *p* < 0.05), as expected from the responses to PTs and PNBs, consistent with the notion of a cortical hierarchy from A1 to LB.

### **MULTI-PEAKED NEURONS WITH HARMONICALLY RELATED INTERVALS IN A1 AND LB**

We constructed a frequency tuning curve using the neuron's peak magnitude to calculate the best frequency (BF) of each auditory neuron. Among 205 neurons recorded in the fine-tuning pure-tone paradigms using chromatic scales with semitone steps, 142 neurons (69%) were single-peaked, and 63 neurons (31%) were multi-peaked. The proportion of multi-peaked neurons in A1 and LB did not differ significantly in Monkey P (A1 vs. LB: 47 vs. 37%, A1: *N* = 17; LB: *N* = 10; χ<sup>2</sup> = 0.65, *df* = 1, *p* > 0.05) but it decreased in Monkey H (35 vs. 17%, A1: *N* = 16; LB: *N* = 16; χ<sup>2</sup> = 5.85, *df* = 1, *p* < 0.05). The distribution of BFs for multi-peaked neurons (*N* = 63) was not significantly different from that for single-peaked neurons (*N* = 142), when the BFs eliciting the neurons' maximum peak response were compared (Monkey H: mean ± SE, 2826 ± 619 Hz vs. 3625 ± 430 Hz, *p* = 0.63; Monkey P: 3244 ± 779 Hz vs. 2560 ± 614 Hz, *p* = 0.28, multi-peaked vs. single-peaked, Wilcoxon rank-sum test).

**FIGURE 3 | (A)** Proportion of auditory neurons driven by pure tones in A1, ML, and AL. The proportion was greatest in A1, followed by ML and AL. **(B)** Cumulative proportion of minimum response latency to pure tones. Latencies in LB (i.e., ML and AL) were significantly longer than in A1 [median: A1: 28 ms, *N* = 188; ML: 35 ms, *N* = 124; AL: 44 ms, *N* = 128; A1 vs. ML, *p* < 0.001; A1 vs. AL, *p* < 0.001 (Tukey's HSD test)]. **(C)** Minimum response latency to pure tones (PT) (left) and pink-noise bursts (PNB) (right) across the three subdivisions of auditory cortex. The central marks of the boxplots show the median latency between the 25th and 75th percentiles. The asterisks denote the significance level of *post-hoc* testing (Tukey's HSD test, ∗*p* < 0.05, ∗∗∗*p* < 0.001).
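The distinction between single-peaked and multi-peaked tuning can be illustrated with a short sketch. It follows the criteria given in the caption of Figure 5B (baseline subtraction, 5-point smoothing, a threshold at 50% of the highest normalized peak, and selection of up to three highest peaks), but the edge handling of the smoothing window, the local-maximum rule, the function names, and the synthetic tuning curve are all assumptions made for illustration.

```python
def smooth5(rates):
    """5-point moving average; the window shrinks at the edges (an assumption)."""
    n = len(rates)
    smoothed = []
    for i in range(n):
        window = rates[max(0, i - 2):min(n, i + 3)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def find_peaks_above_half(rates, baseline):
    """Indices of local maxima at or above the half-driven rate."""
    driven = smooth5([r - baseline for r in rates])
    threshold = 0.5 * max(driven)  # half-driven rate
    peaks = [
        i for i in range(1, len(driven) - 1)  # endpoints are ignored
        if driven[i] >= threshold
        and driven[i] > driven[i - 1]
        and driven[i] >= driven[i + 1]
    ]
    return peaks, driven


def best_frequency_indices(rates, baseline, max_peaks=3):
    """Keep the highest peaks (at most max_peaks), in order of ascending frequency."""
    peaks, driven = find_peaks_above_half(rates, baseline)
    top = sorted(peaks, key=lambda i: driven[i], reverse=True)[:max_peaks]
    return sorted(top)


# Synthetic tuning curve (spikes/s) over 30 semitone steps, NOT recorded data.
baseline = 10
rates = [baseline] * 30
for center, shape in ((5, (10, 30, 55, 30, 10)),
                      (13, (10, 30, 52, 30, 10)),
                      (22, (20, 60, 100, 60, 20))):
    for offset, extra in zip(range(-2, 3), shape):
        rates[center + offset] += extra

bf_indices = best_frequency_indices(rates, baseline)
is_multi_peaked = len(bf_indices) > 1
print(bf_indices, is_multi_peaked)
```

With semitone-spaced stimuli, the difference between two peak indices is the interval in semitones, which is the quantity analyzed in the following paragraphs.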

We next analyzed the sharpness of frequency tuning in A1 and LB using a bandwidth index (BI; see Methods) similar to the one used by Lakatos et al. (2005), with a BI close to 1 indicating sharp frequency tuning, and a BI near 0 indicating broad tuning. There was a main effect of recording site on BI (A1: 0.52 ± 0.02, *N* = 47; ML: 0.52 ± 0.02, *N* = 36; AL: 0.46 ± 0.01, *N* = 59 (mean ± SE), Kruskal-Wallis test, *p* < 0.02, **Figure 4A**) with A1 neurons displaying sharper tuning compared to AL (Tukey's HSD test, *p* < 0.02). Tuning width was further examined in a subset of neurons with the traditional approach measuring BW30 (Sutter and Schreiner, 1991; see Methods). This analysis had a similar outcome with a significant difference in frequency tuning width between A1 and AL (A1: 1.9 ± 0.21 octaves, *N* = 29; ML: 2.9 ± 0.66 octaves, *N* = 19; AL: 3.0 ± 0.51 octaves, *N* = 14, mean ± SE, *p* < 0.05, Wilcoxon rank sum test, **Figure 4B**).

**Figures 5A,B** show an example of a single neuron from area ML with a multi-peaked response. Whereas the tuning of the excitatory onset response was broad across a wide range of frequencies, several discrete frequency peaks can be distinguished in the sustained response after the drop-off of the initial onset response (see raster plots in **Figure 5A**). This neuron's BF (frequency with highest peak response; 112.8 spikes/s) was 1865 Hz (A#6). However, the sustained response showed three additional peaks above threshold, which were all distinct in frequency (440 Hz = A4, 622 Hz = D#5, and 932 Hz = A#5). The best three peaks were chosen based on the peak firing rates and assigned as BF1 (D#5, 59.4 spikes/s), BF2 (A#5, 60.6 spikes/s), and BF3 (A#6, 112.8 spikes/s) in order of ascending frequency (see Methods). The frequency ratios of the best three BFs in relation to each other were 3.0 (BF3/BF1, 19 semitones), 1.5 (BF2/BF1, 7 semitones), and 2.0 (BF3/BF2, 12 semitones), which correspond to "perfect" harmonic or musical intervals (i.e., ratios of 2.0 = octave; and ratios of 1.5 and 3.0 = "perfect fifths").
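The correspondence between peak frequencies, note names, semitone distances, and frequency ratios in this example can be verified with a few lines of code. This is a minimal sketch assuming twelve-tone equal temperament with A4 = 440 Hz; the function names are illustrative.

```python
import math

NOTE_NAMES = ("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")


def semitones_between(f_low_hz, f_high_hz):
    """Interval between two frequencies in equal-tempered semitones."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


def ratio_from_semitones(n_semitones):
    """Frequency ratio spanned by a given number of semitones."""
    return 2.0 ** (n_semitones / 12.0)


def note_name(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered note name, assuming A4 = 440 Hz."""
    steps_from_a4 = round(12.0 * math.log2(freq_hz / a4_hz))
    index = 9 + steps_from_a4  # semitones above C4 (A is 9 semitones above C)
    return f"{NOTE_NAMES[index % 12]}{4 + index // 12}"


# The three best frequencies of the example ML neuron, in Hz.
bf1, bf2, bf3 = 622.0, 932.0, 1865.0
for low, high in ((bf1, bf2), (bf2, bf3), (bf1, bf3)):
    print(note_name(low), note_name(high),
          round(semitones_between(low, high)),
          round(high / low, 2))
```

Note that an equal-tempered interval of 7 semitones corresponds to a ratio of about 1.498 rather than exactly 3:2, so the ratios of 1.5 and 3.0 quoted for this neuron are approximations at the resolution of the semitone-spaced stimulus set.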

If LB contributes more than A1 to the spectral integration of harmonically-related interval information, the distribution of distances between two peaks of multi-peaked neurons might tend toward harmonically-related interval ratios more often in LB than in A1. We calculated the interval ratio between best frequencies (BF ratio) in all multi-peaked neurons (**Figure 5C**). This was done separately for early (0–70 ms from sound onset) and late responses (>70 ms). The distribution of BF ratios in LB showed a maximum at the perfect-fifth interval (3:2 = 1.5) in both the early and late periods (above the confidence interval (CI) at 99.9%) and at the octave (2:1 = 2.0, above the CI at 99.5%) for late periods, whereas BF ratios in A1 showed a peak at the perfect-fifth interval only in the distribution of late responses (above the CI at 99.9%). A significant difference in the distribution of peak distances was found between A1 and LB for early (A1: *N* = 72 intervals measured, LB: *N* = 28, *p* < 0.01, Wilcoxon signed rank test) but not for late responses, when the same bin-by-bin paired comparison was performed (A1: *N* = 65 intervals, LB: *N* = 52 intervals, *p* = 0.60). The proportion of harmonic intervals in the early period was significantly greater in LB than in A1, and the different bin widths did not affect the results (bin width = 2: 39 vs. 15%, χ<sup>2</sup> = 5.28, *df* = 1, *p* < 0.025; bin width = 1, 25 vs. 4%, χ<sup>2</sup> = 8.75, *p* < 0.005, χ<sup>2</sup> test).

### **RESPONSE TO PITCH-SHIFTED COOS**

If, as hypothesized, the spectral integration of harmonically related frequencies takes place in LB, a sound with harmonic structure should be more effective in evoking a response in this area than would a pure tone, even one at the BF. To test this hypothesis, we shifted the pitch of a coo call to match the neuron's BF and compared the responses between A1 and LB. A coo call was used because a previous study showed that LB neurons can be driven quite selectively by species-specific vocalizations (Tian et al., 2001). Auditory responses were sometimes elicited by a coo with the same pitch as a low tone sharing the same F0, particularly in neurons responsive to low frequencies, even if the coo's overtones were outside the neuron's excitatory receptive field (RF).

AL (Tukey's HSD test, *p* < 0.01). Tuning width using BW30 gave similar results with significantly sharper tuning in A1 than AL (*p* < 0.05, Wilcoxon rank sum test). Means are plotted and standard errors are represented by bars.

The classification of neurons was based on each neuron's RF, and this was limited to neurons (*N* = 24) whose lower PT frequency cutoff fell within the range of frequencies we used in this paradigm (196–4435 Hz, see Methods). The neurons were classified into two groups: (1) frequency-representative neurons that responded to coo stimuli when the overtone harmonics fell into the neuron's RF, even though the F0 of the coo was outside the RF; (2) pitch-selective neurons that responded when the F0 of coo stimuli fell into the RF but did not respond to coo stimuli when the overtone harmonics fell into the neuron's RF. Other types of neurons showed various kinds of responses that deviated from the above two groups; these neurons were categorized as "non-classified" (*n* = 64). **Figure 6A** illustrates an example of a neuron in A1 that showed a frequency-representative response (Unit A). This neuron had a single peak (**Figure 6B**), and its BF was 2218 Hz (C#7). Since the overtone harmonics (h2–h6, **Figure 1**) fell into the neuron's RF even when the F0 of the coo was outside the RF, the tuning curve in response to pitch-shifted coo calls was broader than that in response to PTs (**Figure 6B**). The onset latency to the BF was 52 ms, whereas the latency in response to the coo (whose F0 matched the BF) was 46 ms, with no difference in mean firing rate to the two stimuli (PT, 62.2 ± 20.7 spikes/s; coo, 73.2 ± 19.9 spikes/s; *p* = 0.13, Wilcoxon rank-sum test, **Figure 6C**). Of the 88 neurons tested, 21 (24%) were of this type, and there was no difference in proportion between A1 and LB (8 vs. 13 neurons, respectively, χ<sup>2</sup> = 2.1, *df* = 1, *p* = 0.15). Although the number is small (*N* = 3), there were neurons that exhibited similar tuning in response to PTs and pitch-shifted coos with a shorter latency to the coo than to the PT-BF, and an enhanced response to the coo relative to the response to the PT-BF (Supplementary Figure 1).

**FIGURE 5 | Response of multi-peaked neurons. (A)** Raster plots of single-unit activity in an example neuron from area ML with multi-peaked tuning (bin width 1 ms). Plots are aligned to sound onset (vertical red line) in response to 84 tones presented in semitone-steps between 110 Hz (A2) and 13.3 kHz (G#9). Sound offset is indicated by vertical black line at far right. Y-axis indicates frequency (Hz) of pure-tone stimuli on a log scale. The neuron responded to four distinct frequency bands (440, 622, 932, and 1865 Hz), which are indicated by arrows. **(B)** Rate tuning curve of the multi-peaked neuron shown in **(A)** based on peak response magnitude during the entire duration of the sound (normalized by subtracting baseline activity and 5-point smoothing). Four peaks above the half driven rate (defined as 50% of the highest normalized peak firing rate, here indicated as a dotted line) were detected, and we chose the three highest peaks as best frequencies (BF1, BF2, BF3) to analyze the interval relations of multiple peaks. In this neuron, the frequency interval ratios of the three peaks were 7, 12, and 19 semitones, which correspond to 1.5 (BF2/BF1), 2.0 (BF3/BF2), and 3.0 (BF3/BF1), all of which are harmonically related (i.e., perfect harmonic or musical intervals): P5 (perfect fifth), 7 semitones apart; P8 (perfect eighth, or octave), 12 semitones apart, and another P5 (perfect fifth), 19 semitones apart. **(C)** Distribution of peak distance in multi-peaked neurons of A1 and LB. The distribution of BF ratios was calculated based on the peak firing rates during the early-response period (0–70 ms from sound onset) and during the late-response period (71–300 ms from sound onset). The interval distance between two peaks was estimated based on frequency interval ratio (BF ratio, x-axis), and the relative frequency (i.e., number of intervals relative to the total number of intervals in each subdivision) is shown on the y-axis. The confidence interval (CI) at 99.5% is indicated by a dashed line, and at 99.9%, it is indicated by a dotted line. BF ratios above 5.0 are not shown in the figure for display purposes; however, none of those peaks reached the CI threshold of 99.5%.
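The two response classes used for the pitch-shifted coo paradigm (frequency-representative and pitch-selective) can be expressed as a simple decision rule. The sketch below is only one possible reading of those definitions: the requirement that a neuron respond to all (or none) of the relevant coos, the RF limits in the example, and the function names are assumptions made for illustration. The F0 range (196–4435 Hz in half-octave steps) and the overtones h2–h6 are taken from the text.

```python
HARMONICS = range(2, 7)  # overtones h2-h6


def half_octave_f0s(start_hz=196.0, n_steps=10):
    """F0 values from 196 Hz upward in half-octave steps (ends near 4435 Hz)."""
    return [start_hz * 2.0 ** (k / 2.0) for k in range(n_steps)]


def classify_neuron(rf_low_hz, rf_high_hz, coo_trials):
    """coo_trials: iterable of (f0_hz, responded) pairs, one per pitch-shifted coo."""

    def in_rf(freq_hz):
        return rf_low_hz <= freq_hz <= rf_high_hz

    f0_in_rf = []        # responses to coos whose F0 lies inside the RF
    overtone_only = []   # responses to coos with F0 outside but an overtone inside
    for f0, responded in coo_trials:
        if in_rf(f0):
            f0_in_rf.append(responded)
        elif any(in_rf(h * f0) for h in HARMONICS):
            overtone_only.append(responded)

    # Assumed rule: consistent responses across all coos of a given kind.
    if overtone_only and all(overtone_only):
        return "frequency-representative"
    if f0_in_rf and all(f0_in_rf) and overtone_only and not any(overtone_only):
        return "pitch-selective"
    return "non-classified"


# Hypothetical RF of 1500-3000 Hz around a BF of 2218 Hz.
rf_low, rf_high = 1500.0, 3000.0
f0s = half_octave_f0s()

# A neuron that fires whenever F0 or any overtone h2-h6 falls in the RF.
overtone_driven = [
    (f0, any(rf_low <= h * f0 <= rf_high for h in range(1, 7))) for f0 in f0s
]
# A neuron that fires only when F0 itself falls in the RF.
f0_driven = [(f0, rf_low <= f0 <= rf_high) for f0 in f0s]

print(classify_neuron(rf_low, rf_high, overtone_driven))
print(classify_neuron(rf_low, rf_high, f0_driven))
```

In practice responses are graded rather than binary, which is one reason a large share of neurons may fall outside both classes under any strict rule of this kind.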

### **AVERAGE RESPONSE LATENCIES TO PURE TONES AT THE BF AND TO F0-MATCHED COMPLEX TONES ("COO" CALLS)**

The gradual increase in minimum latency from A1 to ML, and from ML to AL in response to both PT and PNB (without pitch), as shown earlier in **Figure 3C**, suggests that this auditory information is processed hierarchically along these three subdivisions. Furthermore, the presence of harmonically-related interval ratios between peaks of multi-peaked neurons in their onset responses in LB but not in A1 (**Figure 5C**) suggests that harmonic processing occurs initially and preferentially in LB rather than A1. As one might predict from this hierarchy of PT processing (**Figure 3C**), the average BF latency was also longer in LB than in A1 (LB: median: 59 ms, *Q*1 = 42 ms, *Q*3 = 90 ms, *N* = 93; A1: 38 ms, *Q*1 = 27 ms, *Q*3 = 50 ms, *N* = 75; *p* < 10<sup>−5</sup>, Wilcoxon rank-sum test, **Figure 7**). By contrast, the latency to coos that were F0-matched to the BFs did not differ between LB and A1. This was due to the response latencies in LB to coos being significantly shorter than the response latencies to PTs at the BF (coo: median: 43 ms, *Q*1 = 28 ms, *Q*3 = 62 ms, *N* = 36; PT: 59 ms, *Q*1 = 42 ms, *Q*3 = 90 ms, *N* = 93; *p* < 0.01).

The above analysis restricts latency calculation to BF and the corresponding F0-matched coo. If minimum coo latency is calculated instead (i.e., the shortest latency of auditory responses elicited by all effective coos), similar results are obtained. Again, minimum coo latencies showed no difference between A1 and LB. In monkey H, the respective values were 40 vs. 43 ms (median, Q1: 23 vs. 25 ms; Q3: 57 vs. 61 ms, *p* = 0.80); in monkey P, they were 39 vs. 47 ms (median, Q1: 31 vs. 34 ms, Q3: 53 vs. 58 ms, *p* = 0.42). By contrast, the minimum response latency to PT increased significantly from A1 to LB (monkey H: median: 30 vs. 43 ms, Q1: 17 vs. 27 ms, Q3: 42 vs. 79 ms, *p* < 10<sup>−4</sup>; monkey P: median, 27 vs. 32 ms, Q1: 16 vs. 19 ms, Q3: 38 vs. 49 ms, *p* < 0.01). Corresponding latency data from single-unit recordings are displayed separately for the three subdivisions in **Figure 3C**.

**frequency-representative responses to pure tones and pitch-shifted coos. (A)** Spike rasters (upper half of each graph) and PSTHs (lower half) aligned to sound onset (red dotted line) in response to pure tones (left column) and pitch-shifted coos (right column) at the same pitch (196–4435 Hz at half-octave steps). The blue line marks sound offset. **(B)** Tuning curves for Unit A, shown in **(A)**. The tuning curves resulting from peak responses to PTs and to the pitch-shifted coo, which is based on the fundamental frequency (F0) of the coos. Black horizontal line indicates half-driven rate. This type of neuron continued to show high firing rates when the overtone harmonics of the pitch-shifted coo fell into the neuron's RF, even when the F0 of the coo was outside the RF. **(C)** Averaged PSTH of the responses of Unit A to its BF (solid blue line) and to a coo (solid red line) whose F0 was matched to the BF. The latency in response to the BF-PT was 46 ms, whereas the latency in response to the coo was 52 ms. Sound duration period is shown in pink; sound onset is indicated by the vertical edge at time zero.

## **DISCUSSION**

We recorded single-unit activity from auditory core cortex (A1) and from the middle and anterior divisions of the lateral belt (LB) in response to pure tones and natural coo calls in two rhesus monkeys while they performed an auditory discrimination task. There were three major findings: (1) Latencies to pure-tone and pink-noise stimuli were significantly longer in LB than in A1; (2) responses to natural coo calls, which consist of complex harmonic tones with a defined fundamental frequency (F0), were observed with essentially equal latencies in LB and A1; together with finding 1, this suggests neuronal facilitation by communication calls with harmonic structures; and (3) although multi-peaked neurons were found in all three divisions, peak intervals in LB showed harmonic relationships in both early and late responses, whereas harmonic peak intervals in A1 were only found in modest numbers and only in late responses. These findings suggest that LB neurons play a critical role in the processing of auditory harmonics in animal communication calls.

**FIGURE 7 | Average response to the pure-tone BF and to coo calls with F0 matched to the BF. (A)** PSTH of the responses in A1 (red) and LB (blue) averaged separately across all auditory neurons to their BF (left panel) and to a coo (right panel) whose F0 was matched to the BF. The similarly color-coded thin lines show the standard deviations from the average response. The dotted vertical line indicates sound onset. Latencies to the BF were longer in LB than in A1 (A1 vs. LB, monkey H: median: 39 vs. 60 ms, Q1: 29 vs. 43 ms, Q3: 48 vs. 95 ms, *p* < 0.001; monkey P: median: 38 vs. 56 ms, Q1: 27 vs. 34 ms, Q3: 56 vs. 80 ms, *p* < 0.05). By contrast, there were no latency differences between A1 and LB in response to a coo (monkey H: median: 35 vs. 33 ms, Q1: 11 vs. 25 ms, Q3: 58 vs. 59 ms, *p* = 0.55; monkey P: median: 43 vs. 46 ms, Q1: 37 vs. 43 ms, Q3: 59 vs. 63 ms, *p* = 0.38). **(B)** Neural latencies to the BF in response to pure tones (PT) and to the coo with its F0 matched to the neuron's BF. Unlike the latencies observed in response to PTs and PNBs (**Figure 3**), there was no difference between A1 and LB latency in response to coo. Furthermore, the latency to coo was significantly shorter than the latency to the BF in LB at the population level (two animals: 43 ms vs. 59 ms, *p* < 0.01). The abbreviations used for the box and whisker plots are the same as in **Figure 3**. ∗*p* < 0.05, ∗∗*p* < 0.01, ∗∗∗*p* < 0.001.

### **LATENCY DIFFERENCES BETWEEN A1 AND LB**

The gradual increase of pure-tone latencies from A1 to LB (A1, 28 ms; ML, 35 ms; AL, 44 ms; **Figure 3C**) is comparable to that observed in other studies of macaques (Recanzone et al., 2000; Camalier et al., 2012). We observed a similar latency increase from A1 to LB in response to pink noise bursts (PNB) (A1, 36 ms; ML, 39 ms; AL, 51 ms; **Figure 3C**). On the other hand, Lakatos et al. (2005) showed that the latency to a noise stimulus was reduced in belt regions compared to A1. However, their recording sites appeared to be in the posterior medial belt, whereas ours were in the lateral belt. More importantly, that study used narrow-band noise (NBN) stimuli, which elicit a pitch percept, as opposed to PNB stimuli that have no pitch. Thus, the reduced latencies to NBN in the study by Lakatos et al. (2005) correspond more closely to the relative latency reduction in response to (harmonic) coo stimuli reported here.

Since recent studies have reported that neuronal activity in the auditory cortex differs depending on task context or task demands (Scott et al., 2007; Sutter and Shamma, 2011; Niwa et al., 2012), neural latencies to S+ were analyzed separately for correct and incorrect trials. Although our results did not show a significant difference between the two conditions, it may be of interest to address this question more systematically in the future. This will require a more balanced design, since the number of error trials was very small (1.8%) in the present study.

Absolute latencies were longer overall in our study than in previous studies. One of the main reasons for this may be the use of raw data (in 1-ms bins) during the baseline period, which causes higher variability of baseline firing rates than does using Gaussian-smoothed data (see Methods). Furthermore, in our study, the variability of baseline firing rate across all stimuli was taken into account. Shorter latencies are generally observed in studies measuring multi-unit activity and current-source density responses, because neural latencies can be more clearly identified from such signals (Lakatos et al., 2005).

### **MULTI-PEAKED NEURONS AND HARMONIC INTERVALS**

Multi-peaked neurons tuned to harmonically-related intervals have been reported in the primary auditory cortex of several species, including bats (Suga et al., 1979), marmosets (Kadia and Wang, 2003), and cats (Oonishi and Katsuki, 1965; Sutter and Schreiner, 1991; Eggermont, 2007; Noreña et al., 2008). Specifically, octave and perfect-fifth coding has been reported in A1 of cats (perfect fifth: Sutter and Schreiner, 1991) and marmosets (octave: Kadia and Wang, 2003). While all studies agree that spectral integration begins already at an early stage of auditory cortical processing, our study demonstrates that the number of neurons with harmonically-related intervals between best-frequency peaks increases significantly from A1 to LB (**Figure 5**). Furthermore, while we found multi-peaked neurons with harmonic intervals in both A1 and LB, there was a clear difference between the two regions in terms of response type: The distribution of peak distances in LB had a maximum at the perfect fifth for both early (<70 ms) and late response components (>70 ms) and a peak at one octave for late response components. By contrast, in A1 only a peak at the perfect fifth was found, and only for late response components (**Figure 5C**). Different (preferred) harmonic intervals were reported in A1 of cats (perfect fifth: Sutter and Schreiner, 1991), marmosets (octave: Kadia and Wang, 2003), and humans (Moerel et al., 2013), and it would be interesting to perform a cross-species comparison of preferred harmonic intervals in multi-peaked responses as well as their cortical distribution in the future.

The relatively small amount of harmonic tuning observed in the early responses of A1 neurons suggests the possibility that LB is the first stage of convergence of inputs creating harmonic tuning, and that A1 neurons may reflect harmonic tuning mainly via feedback from higher-order regions like LB. Alternatively, it is possible that LB receives direct thalamic inputs that integrate over a broad frequency range at regular frequency intervals, or that inhibitory intracortical inputs play a role in sculpting the harmonically-related intervals. Taking all the evidence together, it seems most likely that convergent cortical projections from A1 create harmonically tuned cells in LB. This mechanism is commonly referred to as spectral "combination sensitivity" (Suga et al., 1979; Margoliash and Fortune, 1992; Rauschecker et al., 1995). The overall narrower tuning in A1 compared to LB observed in our study is consistent with this conclusion and is also supported by previous findings of others (Schroeder et al., 2001; Fu et al., 2004; Lakatos et al., 2005).

In one behavioral study, Izumi (2000) showed that Japanese macaques are poor at discriminating a single tone from simultaneously presented two-tone stimuli separated by either one octave or by a perfect fifth that share the same pitch. This suggests that the monkey makes use of perceptual grouping based on harmonically-related tones. Correspondingly, in another study (Kadia and Wang, 2003), response modulation was observed when sounds were presented outside the classical RFs of A1 in awake marmosets. Using a two-tone paradigm, these authors found that frequency-tuning peaks in multi-peaked neurons were often harmonically related, and they observed response facilitation when such harmonically related pairs of tones were presented simultaneously. Similar effects have been reported by other studies in A1 (Fitzpatrick et al., 1993; Brosch and Schreiner, 1997; Brosch et al., 1999; Kanwal et al., 1999), further supporting mechanisms of combination sensitivity. Since we did not employ a two-tone paradigm, direct response facilitation (increased firing rates) by a combination of tones was not examined here in either A1 or LB. Further studies will also be needed to examine whether neurons in A1 or LB are in fact more sensitive to consonant than to dissonant sound structures of a complex tone, since a recent study highlighted responses in primary auditory cortex to nonharmonic sounds (Fishman and Steinschneider, 2010).

### **RESPONSES TO COMPLEX TONAL "COO" CALLS**

The average response to a PT at the best frequency (BF) and to a pitch-shifted coo at the same frequency also revealed that latencies to PTs were significantly shorter in A1 than in LB, whereas adding higher harmonics to the fundamental frequency shortened the latencies of LB neurons, resulting in essentially equal latencies to natural coo calls in A1 and LB (**Figure 7**), a finding that may seem surprising given the standard view of hierarchical cortical processing. This finding further underscores that convergence of inputs in LB results in facilitation of responses to complex harmonic tones, as LB neurons generally prefer complex sounds over PTs (Rauschecker et al., 1995). Alternatively, responses to PTs and coos could depend on input from different divisions of the medial geniculate nucleus (MGN) with differential frequency tuning and latency (Hackett, 2011). Indeed, more multi-peaked neurons are found in the dorsal than in the ventral part of the MGN (Bartlett and Wang, 2011). We showed two possible neuron types that may contribute to the serial and parallel processing in A1 and LB (**Figure 6** and Supplementary Figure 1). However, this classification was not able to cover all the neurons recorded in the PT and pitch-shifted coo paradigm because of the constraints on BF frequency ranges (see Methods). The relationships between the frequency tuning of the neurons and response latency (**Figures 6**, **7**) remain unclear; specifically, we found only three neurons showing similar tuning to pitch-shifted coos and PTs (Supplementary Figure 1), and this needs to be addressed in further studies.

In sum, the findings of this study demonstrate that a purely serial model of cortical processing may be insufficient. On the other hand, the principles of hierarchical convergence and combination sensitivity in auditory processing (Rauschecker, 1998; DeWitt and Rauschecker, 2012) still stand. The latency reduction to harmonically-structured conspecific vocalizations and the existence of neurons tuned to simple frequency interval ratios in monkey nonprimary auditory cortex could be evidence of efficient information processing for ethologically relevant sounds. Harmonics are among the essential acoustic structures observed in natural acoustic environments that are generally limited to species-specific vocalizations (including human speech), which are the main sounds of biological interest for most species. In this study we employed a natural vocalization instead of synthetic stimuli to maximize our chances of eliciting neural responses, based on the evidence that neurons in the anterolateral belt area (AL) are more responsive to species-specific vocalizations (Tian et al., 2001). Since the previous study treated various harmonic and nonharmonic vocalizations as one category ("monkey calls") and the F0 of the harmonic vocalizations was not varied, in this study we controlled the pitch and harmonic structure of monkey vocalizations by using a coo call, one of the most frequently heard vocalizations in both field and lab environments. Although the coo call has ethological meaning for the animals used in this study, we cannot determine from our results whether LB neurons respond to the harmonic structure of the calls, or whether they respond instead to complex acoustic features that might relate to their ecological relevance. Identification of the cortical areas that are involved in the transition from processing complex acoustic features (i.e., pitch and harmonicities) to processing natural conspecific calls is an important question. 
This issue is highlighted in a recent study by Fukushima et al. (2014) using microelectrocorticography in awake macaques: the classification of vocalizations was better than that for synthetic stimuli as the recording sites moved from caudal to rostral within the auditory ventral stream. Further studies will be needed to address this point at different neurological scales, including the single-unit level. Also, it would be of interest to learn more about the underlying neuronal mechanisms of harmonic preference observed at the behavioral level (Schellenberg and Trainor, 1996; Izumi, 2000) and when this important evolutionary development first occurred.

## **ACKNOWLEDGMENTS**

The authors thank M. Lawson, C. Silver, M. P. Mullarkey, and J. Lee for animal care and assistance with the experiments, P. Kuśmierek for preparation of the acoustic stimuli, M. Ortiz for technical advice with AFNI, K. King for audiological screening, and P. Kuśmierek, B. Scott, M. Sutter, S. Baumann, and M. Fukushima for discussion. This work was supported by grants from the National Institutes of Health (R01 NS052494, R56 NS052494-06A1) and the National Science Foundation (PIRE grant OISE-0730255) to Josef P. Rauschecker, and by the Intramural Research Programs of NIMH and NIDCD.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnins.2014.00204/abstract

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 19 February 2014; accepted: 30 June 2014; published online: 21 July 2014. Citation: Kikuchi Y, Horwitz B, Mishkin M and Rauschecker JP (2014) Processing of harmonics in the lateral belt of macaque auditory cortex. Front. Neurosci. 8:204. doi: 10.3389/fnins.2014.00204*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2014 Kikuchi, Horwitz, Mishkin and Rauschecker. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Feedforward and feedback projections of caudal belt and parabelt areas of auditory cortex: refining the hierarchical model

#### *Troy A. Hackett<sup>1</sup>\*, Lisa A. de la Mothe<sup>2</sup>, Corrie R. Camalier<sup>1,3</sup>, Arnaud Falchier<sup>4,5</sup>, Peter Lakatos<sup>4,5</sup>, Yoshinao Kajikawa<sup>4,5</sup> and Charles E. Schroeder<sup>4,5</sup>*

*<sup>1</sup> Department of Hearing and Speech Sciences, Vanderbilt University School of Medicine, Nashville, TN, USA*

*<sup>2</sup> Department of Psychology, Tennessee State University, Nashville, TN, USA*

*<sup>3</sup> Laboratory of Neuropsychology, National Institute of Mental Health, Bethesda, MD, USA*

*<sup>4</sup> Cognitive Neuroscience and Schizophrenia Program, Nathan Kline Institute, Orangeburg, NY, USA*

*<sup>5</sup> Department of Psychiatry, Columbia University College of Physicians and Surgeons, New York, NY, USA*

#### *Edited by:*

*Yukiko Kikuchi, Newcastle University Medical School, UK*

#### *Reviewed by:*

*Lizabeth M. Romanski, University of Rochester School of Medicine and Dentistry, USA*

*Brian H. Scott, National Institute of Mental Health, USA*

#### *\*Correspondence:*

*Troy A. Hackett, Department of Hearing and Speech Sciences, Vanderbilt University School of Medicine, 465 21st Avenue South MRB-3 Suite 7110, Nashville, TN 37232, USA e-mail: troy.a.hackett@vanderbilt.edu*

Our working model of the primate auditory cortex recognizes three major regions (core, belt, parabelt), subdivided into thirteen areas. The connections between areas are topographically ordered in a manner consistent with information flow along two major anatomical axes: core-belt-parabelt and caudal-rostral. Remarkably, most of the connections supporting this model were revealed using retrograde tracing techniques. Little is known about laminar circuitry, as anterograde tracing of axon terminations has rarely been used. The purpose of the present study was to examine the laminar projections of three areas of auditory cortex, pursuant to analysis of all areas. The selected areas were: middle lateral belt (ML); caudomedial belt (CM); and caudal parabelt (CPB). Injections of anterograde tracers yielded data consistent with major features of our model, and also new findings that compel modifications. Results supporting the model were: (1) feedforward projections from ML and CM terminated in CPB; (2) feedforward projections from ML and CPB terminated in rostral areas of the belt and parabelt; and (3) feedback projections typified inputs to the core region from belt and parabelt. At odds with the model was the convergence of feedforward inputs into rostral medial belt from ML and CPB. This was unexpected since CPB is at a higher stage of the processing hierarchy, with mainly feedback projections to all other belt areas. Lastly, extending the model, feedforward projections from CM, ML, and CPB overlapped in the temporal parietal occipital area (TPO) in the superior temporal sulcus, indicating significant auditory influence on sensory processing in this region.
The combined results refine our working model and highlight the need to complete studies of the laminar inputs to all areas of auditory cortex. Their documentation is essential for developing informed hypotheses about the neurophysiological influences of inputs to each layer and area.

**Keywords: connections, brain, monkey, functional organization, laminar, architecture, anatomy**

**Abbreviations:** A1, Auditory area 1 (core); AChE, Acetylcholinesterase; AL, Anterolateral area (belt); AS, Arcuate sulcus; Cis, Circular sulcus; CL, Caudolateral area (belt); CM, Caudomedial area (belt); CO, Cytochrome oxidase; CPB, Caudal parabelt area (parabelt); CS, Central sulcus; Id, Insula, dysgranular; Ig, Insula, granular; IPS, Intraparietal sulcus; ITG, Inferior temporal gyrus; Lim, Limitans nucleus; LS, Lateral sulcus; LuS, Lunate sulcus; MGad, Medial geniculate complex, anterodorsal division; MGC, Medial geniculate complex; MGd, Medial geniculate complex, dorsal division; MGm, Medial geniculate complex, magnocellular division; MGpd, Medial geniculate complex, posterodorsal division; MGv, Medial geniculate complex, ventral division; ML, Middle lateral area (belt); MT, Middle temporal area; Pro, Proisocortical area; proA, Prokoniocortex area; PS, Principal sulcus; PV, Parvalbumin; R, Rostral area (core); Ri, Retroinsular area; RM, Rostromedial area (belt); RPB, Rostral parabelt area (parabelt); RT, Rostrotemporal area (core); RTL, Rostrotemporal lateral area (belt); RTM, Rostrotemporal medial area (belt); S2, Somatosensory area 2; Sg, Suprageniculate nucleus; STG, Superior temporal gyrus; STS, Superior temporal sulcus; TPO, Temporal parietal occipital area; TPOc, TPO, caudal sector; TPOr, TPO, rostral sector; Tpt, Temporal parietotemporal area; VGluT1, Vesicular glutamate transporter 1; VGluT2, Vesicular glutamate transporter 2.

## **INTRODUCTION**

The auditory cortex of primates is spread out over a large portion of the superior temporal gyrus (STG) and plane. Current models recognize 13 areas, grouped into three major regions (core, belt, parabelt). The identification and classification of areas and regions is based on interpretation of their neuroanatomical and neurophysiological profiles (Pandya et al., 1969; Pandya and Sanides, 1973; Burton and Jones, 1976; Jones and Burton, 1976; Imig et al., 1977; Fitzpatrick and Imig, 1980; Galaburda and Pandya, 1983; Cipolloni and Pandya, 1989; Morel and Kaas, 1992; Morel et al., 1993; Jones et al., 1995; Kosaki et al., 1997; Hackett et al., 1998; de la Mothe et al., 2006; Smiley et al., 2007; Hackett and de la Mothe, 2009). Among the most informative and distinguishing features of the three regions are the topographic patterns of connectivity within and between them. The three areas that comprise the core region are densely interconnected with about eight areas in the surrounding belt region. The belt areas have strong connections with the parabelt region, which is currently divided into two areas. The core areas have only sparse connections with the parabelt.

On the basis of these connections, a regional hierarchy has been proposed in which information received by the core is sequentially processed by areas in the belt, and then the parabelt (Hackett et al., 1998). A second gradient has also been proposed along the caudal-rostral axis of the temporal lobe, based on the patterns of connections and architectonic gradients (Hackett, 2011). Although scant data are available, the known laminar projections suggest that information tends to flow from caudal to rostral areas in a feedforward manner (dominant inputs to layer 4), whereas projections from rostral onto caudal areas tend to exhibit feedback laminar profiles (dominant inputs to supragranular and/or infragranular layers, especially layer 1) (Rockland and Pandya, 1979; Felleman and Van Essen, 1991). There is also some evidence that the most caudal auditory belt areas (caudomedial, CM; caudolateral, CL) direct some feedforward projections caudally toward auditory-related areas in the temporoparietal junction, such as Tpt (Fitzpatrick and Imig, 1980; Galaburda and Pandya, 1983; de la Mothe et al., 2006). Thus, information flow within the auditory cortex appears to move along two major anatomical axes: core-belt-parabelt and caudal-rostral. Correlated with these anatomical patterns are gradients in neuronal response properties. Frequency tuning bandwidth, response latencies and stimulus specificity generally increase along these axes, whereas temporal precision tends to decrease (Rauschecker et al., 1995, 1997; Rauschecker, 1998a,b; Rauschecker and Tian, 2004; Lakatos et al., 2005a; Bendor and Wang, 2008; Petkov et al., 2008; Kusmierek and Rauschecker, 2009; Kikuchi et al., 2010; Scott et al., 2011; Camalier et al., 2012; Kusmierek et al., 2012).

Beyond the confines of the auditory cortex, information from the belt and parabelt areas reaches multiple auditory-related areas distributed throughout the brain (Tranel et al., 1988; Kosmal et al., 1997; Hackett et al., 1999; Romanski et al., 1999a,b; Cavada et al., 2000; Lewis and Van Essen, 2000; Falchier et al., 2002, 2009; Ghashghaei and Barbas, 2002; Lavenex et al., 2002; Petrides and Pandya, 2002; Yukie, 2002; Rockland and Ojima, 2003; Barbas, 2007; Smiley et al., 2007; Saleem et al., 2008, 2013; Markov et al., 2014). A rostrally-directed stream reaches targets in the temporal pole, ventral, rostral and medial prefrontal cortex, rostral cingulate, parahippocampal areas and the amygdala. A caudally-directed stream flows from the caudal belt and parabelt areas into the temporoparietal junction, posterior parietal and occipital regions (such as secondary visual cortex), caudal and dorsal prefrontal areas, dorsal cingulate and parahippocampal areas. Additional output streams flow laterally from the belt and parabelt regions to the upper bank of the superior temporal sulcus (STS) and medially into the insula and retroinsular areas within the lateral sulcus (Galaburda and Pandya, 1983; Hackett et al., 1998; de la Mothe et al., 2006; Smiley et al., 2007).

At present, the wiring diagrams that inform our models of auditory cortical function are low in resolution. The most widely used schematics depict the auditory areas on surface maps of the brain, using lines and arrows to denote a *connection* between one area and another (**Figures 1A,B,D**). These diagrams are a useful guide for describing the basic layout and connections of the auditory cortex, but reveal nothing about the laminar distributions of the somata, dendrites and axon terminals that comprise those connections and contribute to their functional importance (e.g., feedforward, feedback, etc.).

Unfortunately, these kinds of low-resolution maps reflect most of the current knowledge base. There are two reasons for this. First, most of the intrinsic and extrinsic connections identified in the experiments cited above used retrograde tracers, which permit identification of the input sources (neuronal somata) to the area targeted by a tracer injection. Relatively few of these studies used tracers with anterograde transport properties, which reveal the laminar projections from an area to its targets. Second, several studies utilized flattened brain preparations (cut parallel to the pial surface) for areal reconstruction of the patterns of labeled cells. While these methods are advantageous for creating surface maps of the connections between areas and regions, information about the laminar circuitry is not preserved. Fortunately, a few studies have been published in which the connections were studied in coronal sections using at least some tracers with anterograde transport properties (Fitzpatrick and Imig, 1980; Galaburda and Pandya, 1983; Aitkin et al., 1988; de la Mothe et al., 2006; de la Mothe et al., 2012). As noted above, sufficient data could be gleaned from these studies to support hypotheses about information flow along the two major axes (Hackett, 2011). However, these data are far from complete, derived from mixed primate species using different methods, and inconsistent at several levels of analysis. Lacking, and desperately needed, is an extended series of detailed anatomical studies in which anterograde and retrograde tracers are employed to more completely work out the laminar distribution patterns that actually comprise the connections between areas. Ideally, a survey of the input/output connections of each area should be obtained so that a complete wiring diagram of the auditory cortex could be made available. 
Such maps were developed over two decades ago for primate visual cortex (Felleman and Van Essen, 1991), and studies of both visual and somatosensory cortex have since progressed toward defining the connections of morphologically and neurochemically distinct neuronal subpopulations in every layer and sublayer (Callaway, 2002; Thomson and Bannister, 2003; Bannister, 2005; Douglas and Martin, 2007; Briggs, 2010; Feldmeyer et al., 2013). This implies that for primate auditory cortex, we are at the early stages of obtaining the kinds of basic data that were summarized over 20 years ago by researchers studying the visual cortex. This is an essential task, and one that will require sustained effort to complete.

With that in mind, the present study represents the beginning of what we intend to expand into a comprehensive accounting of the intrinsic connections of the primate auditory cortex. By injecting tracers with bidirectional transport properties in caudal belt (ML, CM) and parabelt (CPB) areas, our primary goals were to begin acquisition of the anatomical data, and at the same time test key predictions of our working model. For example, if the belt is driving activity in the parabelt, then projections from the belt should (at least) target layer 4 (L4) of the parabelt. Similarly, if caudal areas are driving rostral areas, then projections from the caudal areas should target L4 of the rostral areas. In addition, these experiments provided us with an opportunity to better characterize the nature of the inputs to the temporoparietal occipital area (TPO) on the upper bank of the STS, which is broadly connected with numerous auditory, visual, somatosensory, prefrontal, and posterior parietal areas. Earlier observations suggested that projections from the STG project to L4 and other layers of TPO (Cusick et al., 1995; Seltzer et al., 1996), indicating that the parabelt could be the main source of feedforward auditory input to this multisensory region.

**FIGURE 1 | Location and subdivisions of the auditory cortex in macaque monkeys. (A)** Lateral view of the left hemisphere showing position of auditory and auditory-related regions in the superior temporal region. The parabelt region is located on the surface of the superior temporal gyrus (STG). The core and belt regions lie on the superior temporal plane, visible after graphical removal of the overlying parietal and frontal opercula (cut). Area TPO lies on the upper bank of the superior temporal sulcus. The locations of the areas on the STG rostral to auditory cortex (Ts2, Ts1, and Pro) are also labeled (after Galaburda and Pandya, 1983). **(B)** Location and connections of areas within and around auditory cortex (see list of abbreviations for details). *Core region*: A1, R, RT; *Medial belt region*: CM, MM, RM, RTM; *Lateral belt region*: CL, ML, AL, RTL; *Parabelt region*: CPB, RPB; *auditory-related fields*, Tpt, Ri, Pro, S2, Ig, Id, Ts2. Arrows denote simplified patterns of connections between areas. Connections between non-adjacent areas not shown. **(C)** Photograph of the left hemisphere of Case 2 showing locations of injections (FR, filled circle; BDA, filled square) and blocking cuts (dashed lines a and b). Microtome sections were cut parallel to line b. **(D)** Dorsal views of the superior temporal plane showing locations of core, belt, and surrounding regions. The approximate locations of several areas are labeled. Caudal is up, lateral is right. Black contour lines run across the brain surface from medial (left) to lateral (right). Scale bars, 10 mm.

#### **MATERIALS AND METHODS**

Three macaque monkeys were used in this study (1 *Macaca mulatta*, 2 *Macaca radiata*). All procedures involving animals were conducted in accordance with international standards on animal welfare, followed NIH Guidelines for the Care and Use of Laboratory Animals, and were approved in advance by the Vanderbilt University Institutional Animal Care and Use Committee.

#### **GENERAL SURGICAL PROCEDURES**

Aseptic techniques were employed during all surgical procedures. Animals were premedicated with cefazolin (25 mg/kg), dexamethasone (2 mg/kg), and Robinul (0.015 mg/kg). Anesthesia was induced by intramuscular injection of ketamine hydrochloride (10 mg/kg) then maintained by continuous isoflurane (2–3%) inhalation blended with 100% oxygen (1 L/min) through an endotracheal tube. Body temperature was held at 37°C with a water-circulating heating pad. Heart rate, expiratory CO2, and O2 saturation were continuously monitored throughout the surgery and used to adjust anesthetic depth. For all surgical procedures, the head was held by hollow ear bars affixed to a stereotaxic frame (David Kopf Instruments, Tujunga, CA).

In Cases 1 and 3, injections of ML and CM were made using a vertical approach through chronic recording chambers (Crist Instruments, Hagerstown, MD) implanted over the auditory cortex in the left hemisphere. Injections were made through the injection ports of 24-channel linear array electrodes (U-Probe, Plexon Inc., Dallas, TX) after completion of chronic electrophysiological recordings. The details of this procedure are explained in Smiley et al. (2007), where we first used the U-Probe for this purpose. Briefly, current source density analysis was used to initially position the deepest electrode channels between the pia and white matter. Slight adjustments in position were made to center the injection port at the prominent current sink in L4. The full volume of tracer (**Table 1**) was injected in 3 equal boluses, which dispersed by capillary action up and down the electrode shaft, effectively depositing tracer across all cortical layers.

In Case 2, a midline incision was made exposing the skull, followed by retraction of the temporal muscle. A craniotomy was performed exposing the left dorsal STG, lateral fissure, and overlying parietal cortex. After retraction of the dura, warm sterilized silicone oil was applied to the brain to prevent desiccation of the cortex. Tracer injections were made into target areas through a pulled glass pipette affixed to a 1 µl Hamilton syringe. The pipette was advanced into cortex under stereo microscopic observation to a depth of 1000 µm using a stereotaxic micromanipulator. After manual pressure injection of tracer into each target area (**Table 1**), the syringe was held in place for 10 min under continuous observation to maximize uptake and minimize leakage. Injections of the CPB were made directly into the lateral surface of the STG after removal of the dura. Injection of ML was achieved by slight retraction of the banks of the lateral fissure, as previously described (Hackett et al., 2005).

#### **TRACER INJECTIONS**

In all cases, tracers were injected by pressure into target areas of the auditory cortex. **Table 1** contains the relevant experimental details of each case, including tracer type, tracer volume, injection device, and area injected. In Cases 1 and 3, tracer injections were made subsequent to a series of electrophysiological recordings to more accurately identify the target area and surrounding areas. In Case 1, the injection into area ML was made through the recording chamber during a routine awake-behaving recording session. Recordings in this case broadly covered 10 areas of auditory cortex, as reported in Camalier et al. (2012). In Case 3, the injection into the rostral-medial limb of area CM was made through an established recording chamber following recordings focused on areas A1 and CM. Spatial mapping was not dense in either of these cases, and did not significantly interfere with tracer transport or architectonic assays. In some figures, electrode tracks or lesions can be seen and these are marked with asterisks. In Case 2, the injections into areas ML (near its caudal and lateral borders) and CPB (caudal and ventral quadrant) were made directly into cortex through a craniotomy under general anesthesia (see above) and in the absence of electrophysiology. Stereotaxic coordinates and surface landmarks were used to identify the target locations. In an attempt to avoid involving the dorsal CPB, the ML injection was made medial to the middle cerebral vein, by slight retraction of the dorsal bank of the lateral sulcus (Hackett et al., 2005). Note the proximity of the injection to the CPB border (**Figure 4E**). The spread of tracer was minimal, but may have encroached slightly into the CPB, and if so, mainly in L1–3a. Although the injection obscured architectonic features and precluded absolute confirmation, the dense retrograde labeling in A1 is consistent with a significant deposit in ML, since CPB injections rarely produce labeled cells in A1 (**Supplementary Figure 2** and **Supplementary Table 2**). If the FR tracer was transported by any cells in the dorsal CPB, we are unable to determine that from the labeling patterns observed.

Three tracers were used in these studies: cholera toxin subunit B (CTB) (Vector Labs); 10 kDa tetramethylrhodamine (aka fluororuby, abbreviated as FR) (Molecular Probes); 10 kDa biotinylated dextran amine (BDA) (Sigma). FR and BDA have the potential for bidirectional axonal transport (anterograde and retrograde), but are most sensitive as anterograde tracers and produce well-defined labeling of axon arbors and terminal puncta. CTB is a very sensitive retrograde tracer, but uptake often produces strong anterograde transport, as in Case 1 and our previous studies in marmosets (de la Mothe et al., 2006). Rather than punctate labeling of terminals and their axonal arbors, anterograde transport of CTB typically produces a "dust-like" deposit in the terminal zone intermingled with some punctate terminal labeling. This labeling is fine enough to be localized to specific laminae and sublaminae, but the contacts of individual terminal puncta cannot usually be resolved.

*Areas of tracer injections (ML, middle lateral belt; CM, caudal medial belt; CPB, caudal parabelt). Neuroanatomical tracers (CTB, cholera toxin subunit B; BDA, biotinylated dextran amine; FR, fluororuby). Aqueous concentrations and volumes injected are listed for each tracer.*

#### **PERFUSION AND HISTOLOGY**

After a 14–21 day survival period, a lethal dose of pentobarbital (120 mg/kg) was administered intravenously. Just after cardiac arrest the animal was perfused through the heart with cold (4°C) saline, followed by cold (4°C) 4% paraformaldehyde dissolved in 0.1 M phosphate buffer. Following perfusion the brains were removed and photographed. The cerebral hemispheres were blocked and placed in 30% sucrose for 3 days. The cerebral hemispheres of each case were cut at slightly different angles. In Cases 1 and 3, the angle was very close to coronal, matching the angle of electrode penetrations through the recording chamber. In Case 2, the angle was perpendicular to the lateral sulcus, with sections cut in the caudal-to-rostral direction at 40 µm, as shown in **Figure 1C** (line "b"). This minimized cross-cutting across cortical columns for areas in the lateral, superior temporal, inferior parietal, and central sulci.

In each brain, series of 12 sections were alternately processed for the following set of histochemical markers: (i) fluorescent tracer microscopy; (ii) biotinylated dextran amine (BDA) or cholera toxin subunit B (CTB); (iii) acetylcholinesterase (AChE) (Geneser-Jensen and Blackstad, 1971); (iv) Nissl substance (thionin stain). Additional reactions were performed in some cases to facilitate reconstruction, or obtain data for other studies: (v) cytochrome oxidase (Wong-Riley, 1979); (vi) parvalbumin (PV); (vii) vesicular glutamate transporters 1 (VGluT1) and 2 (VGluT2) (Hackett and de la Mothe, 2009); (viii) neuron specific nuclear protein, NeuN (Hackett and de la Mothe, 2009); and (ix) myelinated fibers (MF) (Gallyas, 1979). In Cases 2 and 3, multifluorescent immunohistochemistry (IHC) of NeuN and VGluT2, VGluT1 or PV were combined with fluorescent detection of tracers in single sections to relate areal and laminar boundaries to the locations of axon terminals and somata. Auditory areas were identified using these markers according to detailed architectonic criteria established in previous studies of macaque monkeys (Hackett et al., 2001; Smiley et al., 2007; Hackett and de la Mothe, 2009), and also applied to marmoset monkeys (de la Mothe et al., 2006). In brief, the cytoarchitecture of the core areas stands out for a broad densely packed L4, separated from L6 by a cell-sparse L5. The middle layers of the core stain darkly for PV, VGluT2, MF, and AChE, with sharp reductions in density at the borders with adjacent belt areas. The transition from lateral belt to parabelt is usually within 1 mm of the edge of the lateral sulcus. It is not always sharply demarcated, but is characterized by a reduction in L4 density of these markers. There is a gradual reduction in the prominence of these markers along the caudal-to-rostral axis, which is used to distinguish adjacent areas along that axis. The border between parabelt and TPO is generally near the lateral edge of the STS, and its precise location also appears to meander somewhat along the rostral-caudal axis. **Supplementary Figures 1A,B** show an example of fluorescent NeuN IHC paired with an adjacent section stained for VGluT2 and identification of areas in a coronal section through A1. The panels below illustrate how laminar boundaries were located in sections containing FR and BDA tracers in three different areas (C–F, area ML; G–J, area MM; K–N, area TPO).

#### **MICROSCOPY AND RECONSTRUCTION OF SECTIONS**

Digital images of brightfield sections were acquired using a Neurolucida system (MicroBrightField, Inc., Williston, VT) and Nikon 80i microscope. Fluorescent images were acquired using a Hamamatsu Orca digital camera and Nikon 90i microscope. All of the images in **Figures 2**–**9** are photomontages stitched from a matrix of multiple photographs obtained using a 10× objective and Nikon Elements AR software. These images were cropped and adjusted for brightness and contrast using Adobe Photoshop CS6 software. Images of sections containing transported tracers were selected at regular intervals (∼1:24) for illustration in rostral-to-caudal sequence. Final figures containing images and line drawings were made using Adobe Illustrator CS6 (Adobe Systems, Inc.). All descriptions of anterograde tracer deposits recorded in the text and figures were based on calculation of relative optical density in each layer of each cortical area. Digitized images of each cortical area (see **Figures 5**, **7**, **9**) were converted to 8-bit grayscale images and imported into ImageJ at full resolution (Rasband, W.S., ImageJ, U. S. National Institutes of Health, Bethesda, Maryland, USA, http://imagej.nih.gov/ij/, 1997–2014). For each cortical area, inverted grayscale levels (GL) from 0 (white) to 255 (black) were measured in each layer from regions of interest encompassing labeled terminals drawn using the polygon tool, avoiding artifacts, blood vessels, and retrogradely labeled cells. Background grayscale levels (BL) were measured from a separate region devoid of terminals in the white matter just below layer 6. The Gray Level Index (GLI) was calculated as follows: GLI = (GL − BL)/BL. These values are recorded in **Supplementary Table 1**, and used to set the intensity of layers in each panel of **Figure 10**. A "feedforward" (FF) connection type was defined as an axonal projection that produced concentrated terminal labeling in L4. Almost invariably, feedforward inputs to L4 of a recipient area were accompanied by a band of labeled axons and terminals concentrated in other layers, most often L1 and L6. This was sometimes accompanied by weaker axon and terminal labeling in the intervening layers (L2–3B, L5). We refer to these projections as "lateral" connections. Projections that were concentrated in L1 or L1 and L6 are referred to as "feedback." These designations are based on prior studies (Rockland and Pandya, 1979; Felleman and Van Essen, 1991).
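
The Gray Level Index defined above can be illustrated with a minimal Python sketch. It assumes that the mean inverted gray level of each laminar region of interest, and of the white-matter background region, has already been measured (for example in ImageJ). The function name, the layer labels, and all numeric values are hypothetical and serve only as an illustration; they are not data or code from this study.

```python
def gray_level_index(gl, bl):
    """Return GLI = (GL - BL) / BL for one region of interest.

    gl: mean inverted gray level of the labeled region (0 = white, 255 = black)
    bl: mean inverted gray level of the background region (must be positive)
    """
    if bl <= 0:
        raise ValueError("background gray level must be positive")
    return (gl - bl) / bl


# Hypothetical measurements for a single cortical area (not data from the study).
background = 40.0
layer_gray_levels = {"L1": 60.0, "L2-3": 50.0, "L4": 100.0, "L5": 48.0, "L6": 70.0}

# Normalize each layer's gray level to the white-matter background.
gli_by_layer = {
    layer: gray_level_index(gl, background)
    for layer, gl in layer_gray_levels.items()
}

for layer, gli in gli_by_layer.items():
    print(f"{layer}: GLI = {gli:.2f}")
```

With these made-up numbers the index is highest in L4, which is the kind of laminar profile the text describes as feedforward. Because the index is normalized to the background level, it is dimensionless and can be compared across sections with different overall staining intensity.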

## **RESULTS**

## **CASE 1: ML INJECTION**

The sentinel case, which initiated the present study, was Case 1 (**Figure 2**). An injection of CTB was placed into ML, as part of an ongoing study of retrograde transport to the thalamus. **Figure 2** contains images of tissue sections at the level of ML, containing the injection site (C,D), and more rostral sections at the level of the RPB (A,B). At each level, sections stained for CTB (A,C) are depicted with a nearby section stained for AChE (B,D), as an example of the architecture that was used to identify areas of auditory cortex. In the panels below (a–e), higher magnification images from the RPB (a), CPB (b), A1 (c), and TPO (d,e) show retrograde and anterograde transport across laminae.

**FIGURE 2 | CTB labeling (A,C, a–e) and AChE histochemistry (B,D) in Case 1.** CTB injection into area ML visible in **(C,D)** (spearhead symbol). Adjacent sections processed for CTB and AChE at a more rostral location at the level of the RPB **(A,B)**. Rectangles in **(A,C)** denote locations of panels **a–e**, below. These panels contain higher magnification images from the RPB **(a)**, CPB **(b)**, A1 **(c)**, and TPO **(d,e)** to show retrograde and anterograde transport in layers of each cortical area. Scale bars: **(A–D)**, 2 mm; **a–e**, 500 µm.

The injection spanned all cortical layers and also appeared to involve the white matter directly below layer 6. Transport from this injection produced retrograde labeling of somata in nearly all areas of auditory cortex and ventrally across much of TPO on the upper bank of the STS. In addition, although CTB does not always produce detectable anterograde labeling, this case was an exception, as anterograde deposits were reliably present in areas where retrogradely labeled cells were found, and also in laminae (e.g., L1, L4) where they were not. Of particular interest was that anterograde transport was present in L4 and L6 of the RPB and CPB (**Figures 2Aa,Cb**), and extended across most of TPO (**Figure 2Cd,e**). Dense patches of overlapping anterograde and retrograde labeling, sometimes spanning all layers, were also present in TPO (d,e). In contrast, anterograde labeling in L4 of A1 was largely absent (c), despite strong anterograde and retrograde labeling in supragranular and infragranular layers. These laminar profiles suggested that feedforward and lateral projections from ML targeted RPB, CPB, and TPO, whereas projections to A1 were a type of feedback projection.

Further analyses of cortical transport were not pursued in this case, because of uncertainty about the quality of anterograde CTB transport, and some concern that, because the injection involved the white matter, tracer may have been taken up by fibers of passage to or from the caudal parabelt. However, as the overall projection patterns were highly similar to those observed in subsequent cases, the data have supplementary value. Further, as CTB is a highly sensitive retrograde tracer, this case served as an important control for the weaker retrograde transport associated with the BDA and FR tracers noted in the other cases.

#### **CASE 2: ML AND CPB INJECTIONS**

The FR and BDA tracers used in this study produced both retrograde labeling of neuronal somata and punctate anterograde labeling of axons and terminals. Typically, labeled terminals were distributed regularly along axon segments, often oriented horizontally (parallel to pial surface and laminae), crossing several cell columns. Some of these axons could be followed for several millimeters, and tended to be more common in L6 and white matter. Diagonal and vertical (radial) orientations were less common, and could occur in any layer. Occasionally, a radially-oriented axon traversed multiple layers, spanning infra- and supragranular domains.

**Figure 3** depicts sections at different magnification from Case 2, in which BDA-labeling from injection of CPB was made fluorescent by reacting the sections with streptavidin tagged with a green fluorescent marker (AlexaFluor 488). This allowed simultaneous viewing of terminals from the FR injection of ML (red) and BDA injection of CPB in the same section (NeuN fluorescence for laminar identification not illustrated, but see **Supplementary Figure 1**). In panels A and B, sections from caudal (at level of A1) and rostral (at level of R) locations were selected for illustration. The panels below (C–E) are higher magnification views through portions of CPB, RM, and TPO where overlapping signals from both tracers are clearly visible. These panels reveal zones of overlapping and non-overlapping transport of FR and BDA to L3 and L4 of these areas. These results illustrate the expected topographic differences between two different areas of auditory cortex, but also reveal a high degree of overlap in L4 and L6, for example, which implies that both input sources (ML and CPB) impact activity in the same cortical columns.

**FIGURE 3 | Dual fluorescence images of BDA (green) and FR (red) labeling of axons, terminals, and somata in Case 2. (A,B)** Low magnification sections from caudal **(A)** and rostral **(B)** sections. Asterisks denote position of layer 4. **(C–E)** Higher magnification views through portions of CPB, RM, and TPO show overlapping BDA and FR signals. Note overlapping bands in L4 and L6 from both tracers, and patchy columnar labeling across layers in all panels. Scale bars: **(A,B)** 2 mm; **(C–E)** 500 µm.

Note that, as in **Figure 2**, anterograde tracer deposits were dense enough in some locations that their laminar distributions were easily discerned even in low magnification images. Otherwise, labeled terminals and axons could only be resolved and visualized at higher magnification. In the figures of cases 2 and 3 that follow (**Figures 4**–**9**), anterograde terminal labeling is illustrated at lower and higher magnification for visualization of the details most relevant to this study. Graphical representations of the projection patterns for each injection were also prepared to summarize this aspect of the results (**Figure 10**). Plots and cell counts of retrograde transport in this case are summarized in **Supplementary Figure 2** and **Supplementary Table 2**.

#### *Projections of area ML*

The principal outputs of caudal ML reached core, belt, and parabelt areas along the entire rostral-caudal extent of the AC (**Figures 4**, **5**, **10**). Terminal labeling differed between areas and between layers. The strongest projections targeted areas at the same rostrocaudal level as the injection and areas just rostral to it.

Rostral to the injection, the most prominent projections from ML reached RPB, AL, RM, RTM and ProA (**Figures 4A–D**, **5A,C,F**). These were characterized by lateral and feedforward style inputs, usually with foci in L4 and L6, which stood out from somewhat reduced density in other layers. Most surprising were the strong projections to RM and RTM, as these areas were not expected to be important targets of feedforward projections by ML. These inputs spanned all layers, with prominent foci in L4 and L6 that extended medially into Pro in the floor of the circular sulcus. Weaker projections also reached the rostral core area, R, and rostral A1 where a few branching axons were located in L1/2 and L6. Otherwise, in A1 (**Figure 5K**), labeled terminals extended in L1/2 across the entire width of field from ML to MM. There was no significant input to RT or rostral TPO, and very sparse inputs to L1 of RTL.

At the approximate rostrocaudal position of the injection (**Figure 4E**), lateral connections with A1 were most dense in the L1/2 band (**Figures 5M,O**), moderate in the layers beneath, and sparse in L4. In MM, terminals were mainly found in bands spanning L1/2 and L5/6 (**Figure 5N**).

Caudal to the injection, terminal labeling was found in CL, CM, CPB, Tpt, and TPO (**Figures 4F–H**). The strongest labeling in auditory cortex was in CL, where patches of dense terminal labeling extending across layers were bounded by patches of lesser labeling in L1–3b and L5–6 (**Figure 5Q**). Light terminal labeling was found in L4. The laminar distributions in the intervening patches were comparable, but density was reduced overall across layers. In CM, labeling was patchy, as in CL, but not as dense (**Figure 5R**). In some ways, the laminar patterns in CM resembled those of A1, with heaviest concentrations in L1. Unlike in A1, however, moderate terminal labeling extended across L2 and L3a, then became very sparse in L3b–6. In CPB caudal to the ML injection, terminal labeling was moderate in supra- and infragranular layers, but avoided L3b and L4. In Tpt, very light terminal labeling, continuous with CL and CPB, persisted onto its gyral and planar domains, but was primarily restricted to axonal branching in L1–3a. Retrogradely labeled cells were mainly located in L3b of CL and CM, with some neurons in L3a and L5/6 of these areas. These results indicate that caudally-directed projections from ML reach the caudal belt areas and Tpt, and are clearly biased toward the supragranular layers (L1–3b). Terminal labeling in L4 was light in CL, sparse in CM, and absent from Tpt, indicating that caudally-directed feedforward projections from ML are extremely limited. The retrograde labeling suggests, however, that ML may be a recipient of feedforward projections from CL and CM.

Projections from ML to TPO on the upper bank of the STS were primarily confined to its caudal half (**Figure 4G**), and roughly in line with the rostral-caudal span of the CPB. These inputs were characterized by two basic laminar patterns, often visible in the same section: (1) full columns of labeled terminals that spanned all layers (**Figure 5P**), or (2) terminal foci in L1/2, L4, and L6 with very sparse label in the intervening layers. This laminar pattern often occurred in the zones, or patches, that separated those with the columnar labeling pattern. In rostral sections, in line with the RPB, light terminal labeling continued from the ventral RPB for a short distance into the lateral edge of TPO before ceasing entirely. Therefore, ML has direct lateral and feedforward projections to mainly the caudal sector of TPO that bypass the CPB.

Two other patterns of interest concerned the projections to RPB and AL from ML. First, terminal labeling in RPB was concentrated in the dorsal half of the field, with sparse inputs to the ventral half (**Figures 4A,B**). In contrast, the projections to CPB from ML were more uniform from dorsal to ventral across the STG surface, then weakened near the transition to TPO in the STS (**Figures 4C–E**). The concentration of labeling in the dorsal RPB may reflect topographic gradients, but also may conform to a possible cytoarchitectonic boundary separating the dorsal and ventral RPB. This possibility has been documented on the basis of architectonic features (Saleem and Logothetis, 2012). Second, in AL, terminals were located in a continuous band involving L1–3a that extended to the lateral edge of the core area, R (**Figure 4B**). Vertical patches of labeling spanning multiple cell columns were also located in AL (**Figure 5D**). These were characterized by somewhat higher concentrations of terminals in L4 and L2/3a. Thus, there are inputs to L1 across all of AL with patches characterized by feedforward and lateral inputs.

Altogether, the projections of area ML are consistent with the flow of information along the core-belt-parabelt axis, feeding also into TPO, and also along the caudal-rostral axis within auditory cortex.

#### *Projections of CPB*

With a few notable exceptions, the principal outputs of the ventral CPB injection were very similar to those of caudal ML, reaching nearly all of the core, belt, and parabelt areas along the entire rostral-caudal extent of the AC, as well as TPO and Tpt (**Figures 6**, **7**, **10**). The span of the CPB projections covered a larger range along the rostro-caudal axis than those of ML, and therefore a greater number of sections are illustrated at low magnification in **Figure 6**.

Rostral to the injection site, the main targets of feedforward and lateral projections from the CPB were RPB, RTM, RM, ProA, and TPO (**Figures 6A–H**). These projections typically formed prominent bands of terminal labeling in L1/2, L4 and L6, with labeling of variable density in other layers. In some locations, most notably TPO, radial columns of anterograde labeling spanning all layers added to these horizontal bands (**Figures 7A,C,D,E,H**). These columns were separated by zones with reduced label in L3 and L5. Weaker projections to rostral areas reached AL, primarily in L1–3a and L6, and also the core area, R, where inputs were restricted to a continuous band in L1. As observed for area ML, the weakest CPB projections in the rostral direction were to RTL, where only an occasional axon segment was found in L1. No projections were found to the putative core area, RT.

At about the same rostrocaudal level as the injection (**Figures 6H–J**), strong labeling across all layers formed columns in TPO, CPB, ML, MM, and CL (**Figures 7H–K**). As in more rostral locations, these dense projections occurred in patches, between which labeling density was significantly reduced. As an example, images of two adjacent patches from area ML are illustrated in **Figures 7H,I**. In the left panel (H), terminal labeling spanned all layers, and was most dense in L1/2. In the right panel (I), terminal labeling was reduced and mainly found in L1/2 and L6. The visible band in L4 is mainly produced by non-specific background staining, as only a few labeled terminals were found there.

Also at this rostrocaudal level, note that projections to A1 from CPB were limited to L1, which typically formed a continuous band that ran across the entire lateral to medial extent of A1 at all levels (**Figure 7L**). Combined with similar projections to L1 of R, it appears that this part of CPB projects evenly to L1 across the entire surface of A1 and R in the core. Projections to MM were concentrated in L1–3a and L5–6, but very sparse to absent in L3b–4.

Caudal to the injection site, terminal labeling was present in the belt areas, Tpt, and TPO. In CPB (**Figure 7N**), terminal labeling was mainly located in L1–3a and L5–6, but very sparse in L3b and L4. Further caudal in CPB, the laminar pattern was maintained, but projection density was reduced (**Figure 7O**). In CL, labeled terminals spanning all layers occurred in dense patches (**Figure 7Q**), with zones of lighter labeling concentrated in L1/2 and L6 between them (**Figure 6O**). In CM (**Figure 7P**), terminal labeling was primarily in L1/2 and L6, similar to the lighter zones in CL. In caudal TPO, the patchy columnar labeling seen in rostral TPO and other areas continued across at least the lateral 2/3 of the upper bank (**Figure 7K**) of the STS before diminishing at the caudal level of Tpt (**Figures 6I–P**). In Tpt, anterograde projections mainly targeted L1–3a (**Figure 7R**).

Together, these data indicate that except for RT and RTL, CPB has some type of projection to all areas of the auditory cortex, including the core. Feedforward and/or lateral projections mainly target rostral belt and parabelt areas, TPO, and the lateral belt areas adjacent to CPB. Varieties of feedback projections were more typical of the core and caudal medial belt areas. As noted for ML, these projections are consistent with prominent paths of information flow along the core-belt-parabelt-TPO and caudal-rostral axes in the auditory cortex.

#### **CASE 3: CM INJECTION**

In Case 3, a BDA injection was placed into the rostral and medial portion of CM, near its junction with MM (**Figures 8E,F**). Overall, anterograde and retrograde transport was very strong to the caudal portion of auditory cortex and uniformly weak or absent to rostral areas.

Rostral to the injection site, BDA-labeled axons and terminals were found in A1, ML, MM, and CPB, but not in any of the rostral core, belt or parabelt areas. A few labeled somata were noted in the insula of the most rostral section illustrated (**Figure 8A**, open symbols), at the level of R, but no labeled axons or terminals were found at this level in any field. In rostral A1 and the transition from MM to RM (**Figures 8B,C**), a small number of axons and terminals were found in L1–3 (not illustrated). Caudally (**Figures 8D**, **9B,C**), labeling in A1 and MM was fairly high in L1, moderate in L2–3b, then light in L4–6. In ML, labeling was found in patchy columns, where terminal labeling was concentrated in L3a–4, and light in L1–2 and L5–6 (**Figure 8A**). In CPB rostral to the injection site, only a few isolated axons were found scattered in L3.

Caudal to and in line with the injection site in CM, terminal labeling in CL and other portions of CM formed patchy columns that were distributed throughout the territory covered by both fields (**Figures 8F,G**, **9D,E**). In these columnar patches, labeled terminals were found in all layers, but the distribution was uneven. The greatest concentrations of terminals were in L3b and L4, with lighter labeling in L1–3a and weak labeling in L5–6. The patches were typically linked by reduced axon and terminal density in all layers, but a continuous band in L4 remained prominent, visible even at low magnification (e.g., **Figure 8G**, area CL). Although not entirely visible in **Figure 8G**, the L4 band was fairly continuous from CM to CL and into CPB on the STG. The terminal labeling in L4 of CPB (**Figure 9G**) was much lighter compared to the patches in CM and CL, but comparable to labeling in the intervening zones between patches. In Tpt, a few branching axons with labeled terminals were found in L2–3b and L5–6, with a very light band of terminal labeling in L4 (**Figure 9H**). This laminar pattern was very similar to that observed in CPB caudal to the injection, but with reduced axon density. In caudal TPO (**Figure 9F**), terminal labeling was concentrated in L4, with sparse labeling in other layers.

Although the spread of projections from CM was more constricted than for ML and CPB, the areal and laminar projection patterns were consistent with information flow along the same major anatomical axes. We are not certain whether the restricted projections, especially in the rostral direction, exemplify the projections of this part of CM, or whether unknown methodological factors limited transport, or perhaps both.

#### **LAMINAR PROJECTION PATTERNS**

The most common laminar patterns observed were: (1) columns of labeled axons and terminals that spanned all layers, but with distinct or prominent bands in L1, L4, and/or L6. This was typical of projections from areas in presumably lower hierarchical stages to one or more areas at a higher stage where feedforward projections were found (e.g., ML and MM to CPB; ML and CPB to RPB and RM; MM, ML and CPB to TPO); (2) terminal labeling focused in supragranular and infragranular layers that avoided the middle layers, including L3b and L4, corresponding to a lateral type of projection that lacked a feedforward component; and (3) terminal labeling concentrated primarily in L1 or L1–2 (**Figure 10**, right panels). This was typical of projections to A1 and R from CPB and ML, for example, implying a feedback style of input from a higher to a lower stage of processing. Other examples of this type were from CPB to CM and portions of CL. A somewhat unusual pattern was observed in the projections from CPB and ML to Tpt, which were focused in L1–3a. A very similar pattern characterized the projection from CM to MM and A1, which favored L1–3b. One other unusual pattern was the projection to ProA from ML and CPB, which produced labeled terminals almost entirely confined to L4 and L6. ProA was the only area whose inputs did not also include terminals in L1 or other supragranular layers.
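The three-way scheme above can be restated as a simple decision rule. The sketch below is illustrative only and is not the procedure used in this study, in which patterns were judged qualitatively from the sections. It assumes a hypothetical ordinal density score per layer (0 = absent, 1 = sparse/light, 2 = moderate, 3 = dense) and a hypothetical threshold (`strong`) for what counts as a prominent band; the function name, layer groupings, and example profiles are likewise assumptions introduced for illustration.

```python
from typing import Mapping

# Layer groupings follow the text; the 0-3 density scale is a hypothetical
# ordinal score (0 absent, 1 sparse/light, 2 moderate, 3 dense), not measured data.
LAYERS = ("L1", "L2", "L3a", "L3b", "L4", "L5", "L6")
SUPRA = ("L1", "L2", "L3a")
MIDDLE = ("L3b", "L4")
INFRA = ("L5", "L6")


def classify_laminar_pattern(profile: Mapping[str, int], strong: int = 2) -> str:
    """Assign a terminal-density profile to one of the three common patterns.

    Layers missing from `profile` are treated as unlabeled (score 0).
    """
    d = {layer: profile.get(layer, 0) for layer in LAYERS}
    supra = max(d[layer] for layer in SUPRA)
    middle = max(d[layer] for layer in MIDDLE)
    infra = max(d[layer] for layer in INFRA)

    # Pattern 1: a prominent L4 band, with or without labeling in other layers.
    if d["L4"] >= strong:
        return "feedforward"
    # Pattern 2: supra- and infragranular labeling that avoids L3b and L4.
    if middle < strong and supra >= strong and infra >= strong:
        return "lateral"
    # Pattern 3: labeling concentrated in L1 or L1-2 only.
    if (middle < strong and infra < strong
            and max(d["L1"], d["L2"]) >= strong and d["L3a"] < strong):
        return "feedback"
    # Anything else (e.g., labeling focused in L1-3a) is left unclassified.
    return "unclassified"


# Hypothetical example profiles, invented for illustration.
columnar = {"L1": 3, "L2": 2, "L3a": 2, "L3b": 2, "L4": 3, "L5": 2, "L6": 3}
l1_only = {"L1": 3}
print(classify_laminar_pattern(columnar), classify_laminar_pattern(l1_only))
```

Under these assumptions, a profile restricted to L4 and L6 (as described for ProA) falls into the first category, whereas a profile focused in L1–3a (as described for Tpt) is left unclassified, in keeping with its description as a somewhat unusual pattern.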

#### **RECIPROCITY AND NON-RECIPROCITY OF CONNECTIONS**

Although we did not focus on the patterns of retrograde cell labeling in this study, a few observations are worth noting for future consideration. Most of the locations that received inputs from CPB, ML, or CM also contained retrogradely labeled cells that project back to the injection site. Generally, these labeled somata were located in L3, and less often in L5 or 6. The absence, or near absence, of retrogradely labeled cells was noted in the connections between several areas. Examples included: (1) CPB to A1, R, ProA, and some portions of CM; (2) ML to A1, R, ProA, and Tpt; and (3) MM to TPO, CPB, and Tpt (**Supplementary Figure 2** and **Supplementary Table 2**). Weaker retrograde labeling was also frequently observed in the territory between columns in which dense terminal labeling across layers was accompanied by numerous retrogradely labeled somata concentrated in L3. The lack of labeled somata in these patches, or zones, was most obvious in the projections to TPO where terminals labeled bands in L1/2, L4, and L6 that joined the more prominent columnar patches. These intervening zones or patches are, therefore, sites which receive inputs from the injected areas, but may not project back to those sources.

The absence of labeled neurons in a location may be significant, but the relative rarity of retrogradely labeled cells in L5 and 6 is of questionable validity, as this would imply that most of the reciprocity between connected areas is accomplished via the connections of L3 neurons. This is unlikely. One possible explanation is that this reflects a technical artifact. It is widely known that 10 kDa BDA does not produce extensive retrograde labeling of neurons. Further, we have noted in prior studies of macaque auditory cortex that retrograde transport of the fluoruby and fluoremerald dextran tracers tends to be biased toward supragranular neurons for some unknown reason (Smiley et al., 2007). On one hand, in all of the areas where significant concentrations of terminals were found, retrogradely labeled cells were also present. By that definition, we could surmise that those interareal connections were reciprocal. However, the labeled cells were usually concentrated in L3, with fewer cells in L5 or 6, which may be an underrepresentation of the actual projection. We would add here that in contrast to cases 2–3, the CTB injection of ML in Case 1 (**Figure 2**) produced robust retrograde labeling of supragranular and infragranular neurons in most areas, which provides additional support for the biased transport conjecture. Therefore, although the absence of labeled somata in an area or layer may accurately reflect the absence of a connection, we cannot be entirely certain. For this reason, we elected not to emphasize the retrograde connection patterns in this study. For reference, however, plots and cell counts for Case 2 are summarized in **Supplementary Figure 2** and **Supplementary Table 2**.

## **DISCUSSION**

The purpose of the present study was to explore the laminar projections of selected caudal belt and parabelt areas, pursuant to a complete survey of the laminar projections of each area of the macaque monkey auditory cortex. Because prior studies and our current models of auditory cortical organization are based primarily on the analysis of neuronal somata labeled by retrograde transport, surprisingly little is known about the laminar circuitry of auditory areas within the superior temporal region. From those foundational earlier studies, low resolution wiring diagrams were generated, which now form the basis of our working models of auditory cortex organization in primates (Hackett et al., 1998; Kaas and Hackett, 1998, 2000; de la Mothe et al., 2006; Hackett, 2011). These diagrams depict connections between areas using lines and arrows (**Figure 1**), but lack information about the laminar projections of these areas. Therefore, much remains to be learned about these circuits by generating high-resolution wiring diagrams of the laminar circuitry, noting that such models were generated long ago for the visual cortex (Felleman and Van Essen, 1991), and continue to be refined (Markov and Kennedy, 2013). Looking ahead, development of these models is essential for generating and testing meaningful and informed hypotheses about auditory cortical function. Although limited in scope, the present study yielded several new discoveries of sufficient importance to compel modifications of our working model, as discussed below. These small steps increase our motivation to greatly expand this line of inquiry, as additional modifications of the model may result.

To summarize the present findings (**Figure 10**), the laminar projection patterns of the caudal belt and parabelt support and extend the hypothesis that information flows along two major axes in auditory cortex: core-belt-parabelt and caudal-rostral. First, projections with feedforward characteristics are directed from the caudal belt areas (ML, CM) to caudal domains of the parabelt (CPB) and TPO. The CPB also projects in this manner to caudal TPO. These patterns are consistent with a stream of information flow directed along the core-belt-parabelt axis of auditory cortex, that also feeds strongly into TPO from two different stages of the hierarchy (caudal belt and parabelt). Second, feedforward patterns were also evident in the projections of the caudal belt and parabelt to rostral belt and parabelt areas and rostral TPO. Some of these areas even received overlapping inputs from caudal belt and parabelt (e.g., RPB, RM, and RTM). Overall, these patterns are consistent with a flow of information from caudal to rostral among auditory and related areas in the superior temporal region (Hackett, 2011). Overlaid on these two major patterns of projections were the more complex area-specific projection patterns, for which the laminar relationships were more variable. The balance of the discussion highlights some of the more interesting details, which are presented in the context of a revised model of auditory cortical wiring.

### **SIMILARITIES AND DIFFERENCES IN THE PROJECTIONS OF ML, CM, AND CPB**

The similarities and differences in the connections of ML, CM, and CPB were enlightening with respect to general patterns of information flow and differences between individual areas. Perhaps the most robust finding was that ML and CPB have comparable laminar patterns of feedforward and lateral projections to several of the same belt and parabelt areas located rostral to or in line with the location of their injections (e.g., *rostral:* RPB, RTM, RM, ProA; *caudal:* CPB, and caudal TPO) (**Figure 10**). Rostral CM targeted some of the same areas (CPB and caudal TPO). These projections often spanned all layers, but were usually characterized by prominent bands of terminal labeling in L1/2, L4, and L6. Overall, these results indicate that the outputs of ML, CM, and CPB are directed along the two major anatomical axes within auditory cortex (core-belt-parabelt and caudal-rostral). The forward-directed projections along these axes have a prominent serial component that extends into TPO, but the projections to this region are not strictly serial since outputs from both belt and parabelt areas directly reach this field. These connections are discussed further below.

Other similarities were the laminar patterns of the projections that ran against the dominant feedforward gradients. Projections from ML and CPB to areas caudal to their injection sites (MM, CM, CL, caudal CPB, Tpt) were biased toward supragranular layers, or toward supra- and infragranular layers, avoiding the middle layers and reflecting feedback or lateral connectivity. Likewise, projections to the core areas, mainly A1 and R, generally avoided the middle layers and were often concentrated in L1. The projections of CPB to most of the belt areas also avoided the middle layers. These patterns indicate that feedback types of projections tend to characterize information moving in the opposite direction along the major axes. Elements of these patterns have been variably noted in prior studies (Galaburda and Pandya, 1983; de la Mothe et al., 2006).

In terms of differences between the injected areas, a few are highlighted here. First, ML had feedforward and lateral projections to AL, but the projections from CPB to AL reached supra- and infragranular layers only. This indicates that AL is in the line of rostrally-directed feedforward projections from ML in the caudal belt, but not from the caudal parabelt. The absence of feedforward inputs to AL from CPB is consistent with straightforward core-belt-parabelt hierarchical relationships (Hackett et al., 1998). A second notable difference was that CPB had dense feedforward and lateral projections to caudal and rostral TPO, but projections to rostral TPO from ML were sparse to absent. This is intriguing since ML had strong forward projections to auditory areas at that rostral level (i.e., RPB, RM, and RTM), but not rostral TPO. It may turn out that CPB is the only caudal area with significant projections to rostral TPO. We predict that some or perhaps all of the rostral belt and parabelt areas will target that area, however. A third difference was that, whereas forward-directed projections from the rostral CM injection targeted some of the same areas as ML and CPB (i.e., CPB, caudal TPO), projections were also concentrated in the middle layers of ML, CL, caudal portions of CM, and weakly in gyral Tpt. This suggests that some feedforward projections are directed laterally and caudally from the rostral CM position. Evidence of caudally-directed information flow has been observed in some studies (de la Mothe et al., 2006), but the data remain thin and will require further study of the caudal and medial belt areas. Finally, the absence of projections to all rostral auditory areas from this CM injection was striking. We are not certain whether technical factors could account for this, as projections from this region to rostral locations were noted in marmosets and macaques (de la Mothe et al., 2006; Smiley et al., 2007).

Finally, the concentration of inputs to L1/2, L4, and L6 indicates that, in addition to classic feedforward projections (to L4), significant inputs also terminate in other layers. This implies that within the bundle of projections from one area to another are multiple "strands" that target neurons in different layers. Although highly intriguing, it is not known whether the signals carried along each of these strands bear the same information or even have the same timing. At present, we do not know the specific cell types or laminar positions of the projecting (source) neurons, only that the majority are pyramidal neurons in layers 3 and 5. Future anatomical studies should incorporate methods to dissect these details. Ideally, these would be coupled with physiological recordings using laminar arrays to characterize the properties of the signals carried by each of these strands and their impact on neurons in all layers.

In summary, there were prominent similarities in the feedforward, lateral, and feedback projections of the caudal belt and parabelt areas that were injected. These lend support to our hypotheses about information flow in auditory cortex (Hackett, 2011). The results also reveal significant differences in the laminar projections of individual areas. This highlights the notion that each area, and likely each layer, has a different functional role, and sets the stage for studies that can bring out those features.

### **DIVERGENT AND CONVERGENT PROJECTIONS BLUR HIERARCHICAL RELATIONSHIPS**

The projection patterns of the caudal belt and parabelt had divergent and convergent characteristics. These patterns were in line with some, but not all, of the hierarchical relationships established in prior studies.

Divergent projections were reflected in two main ways. First, multiple areas of auditory cortex (and TPO) were labeled by injections of each area (**Figure 10**). The fact that a single area of auditory cortex has connections with several others is not novel, but it is worth exploring further, since the present findings indicate that each area can foster feedforward, lateral or feedback projections to several other areas and at more than one level of auditory processing. ML, CM, and CPB had projections to multiple core, belt, and parabelt areas, as well as TPO. Also notable is that none of these tracer injections filled the entire target area. This is interesting since each of the injected areas is large, and its connections could be topographically distributed, meaning that the projections from different loci within a source area may be different in some ways (e.g., tonotopy, binaural integration, inputs from other areas, etc.). It will be important in future studies to compare injections placed in different portions of the same area to reveal whether its outputs are topographically organized. Second, within a single recipient area (e.g., RPB, TPO, A1), the projections from one of the injected areas were typically not contained within a single topographic locus (e.g., point-to-point). Instead, inputs were often distributed over multiple loci with different laminar profiles (e.g., **Figure 10**, twin panels in CPB, ML, A1, TPOc, CPB, CL). For example, patches that contained columnar labeling across all layers were often separated by inter-patch regions where terminal labeling was concentrated only in L4 and L6. This was common in belt, parabelt, and TPO. In other areas, the inputs were evenly distributed in some layers across much or possibly all of the entire field (e.g., continuous labeling of bands in L1 of A1 and R).
These varied patterns of divergence imply that the outputs of a given area are processed in parallel by several areas, and by multiple locations within each of the recipient areas. There is little evidence of point-to-point connectivity in these circuits.

Convergent projections from two or more sites onto a single area or locus within an area were frequently observed. Three key observations are worth noting here. First, the projections of at least two, and sometimes all three, injected areas reached many of the same areas (**Figure 10**). The main exception, noted above, was rostral TPO, which was only reached by the CPB injection. Second, these projections sometimes originated from different levels of the core-belt-parabelt hierarchy. RPB, RM, RTM, and caudal TPO are examples of recipient areas in which convergent projections originated in belt and parabelt areas. Third, projections to a single area often overlapped in a single column, layer, or layers. An obvious example is in **Figure 3**, where fluorescent labeling was used to reveal the projections from ML and CPB in the same sections from Case 2. Overlap was substantial in L4 and L6 of CPB and TPO. Although CM was injected in a different case and not reflected in **Figure 3**, the locations of its projections to L4 of the CPB and TPO (**Figures 8**, **9**) rather strongly suggest that its inputs would also be overlapping. Altogether, these patterns imply that each area, specific layers within each area, and even multiple columns within each area process convergent inputs in parallel from two or more other auditory cortical areas at different hierarchical levels.
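The first of these observations amounts to inverting a source-to-target list and counting sources per target. The sketch below illustrates that bookkeeping under stated assumptions: the target sets are a simplified hand transcription of the feedforward and lateral targets named in this Discussion (abbreviations only, with caudal and rostral TPO written as `TPOc` and `TPOr`), the entry for CPB under the CPB injection stands for other portions of that area, and laminar detail and projection strength are omitted. The variable and function names are hypothetical.

```python
# Simplified transcription of targets named in the text (an assumption made
# for illustration; laminar profiles and projection density are not encoded).
FORWARD_TARGETS = {
    "ML": {"RPB", "RTM", "RM", "ProA", "CPB", "TPOc"},
    "CPB": {"RPB", "RTM", "RM", "ProA", "CPB", "TPOc", "TPOr"},
    "CM": {"CPB", "TPOc"},
}


def sources_by_target(targets_by_source):
    """Invert {source: targets} into {target: set of sources}."""
    inverted = {}
    for source, targets in targets_by_source.items():
        for target in targets:
            inverted.setdefault(target, set()).add(source)
    return inverted


def convergent_targets(targets_by_source, min_sources=2):
    """Return targets reached by at least `min_sources` injected areas."""
    inverted = sources_by_target(targets_by_source)
    return {
        target: sorted(sources)
        for target, sources in inverted.items()
        if len(sources) >= min_sources
    }


convergence = convergent_targets(FORWARD_TARGETS)
for target in sorted(convergence):
    print(target, "<-", ", ".join(convergence[target]))
```

With this transcription, caudal TPO and CPB are listed under all three injections, RPB, RTM, RM, and ProA under two, and rostral TPO under the CPB injection alone.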

The functional implications of such widespread divergence and convergence are not very clear, as there are minimal data on the differences in neuronal response properties between hierarchical areas. Recent neurophysiological data from our laboratory provide some room for speculation. In recordings from 10 core, belt, and parabelt fields, an increasing gradient in response (spike) latencies was observed along the core-belt-parabelt and caudal-to-rostral axes (Camalier et al., 2012) in response to clicks, tones, and noise bursts. The gradient was strongest from caudal to rostral areas, and weakest from belt to parabelt. The results of this study, and several others with related findings, generally support the notion that feedforward signal flow is directed along the two major anatomical axes. However, we also noted that mean response latencies were only slightly longer in parabelt vs. lateral belt areas, and that their distributions were highly overlapping. When considered alongside the present study, one could predict that the information delivered to an area, or specific location within an area, by convergent projections from ML and CPB may arrive within a narrow temporal window. This would apply to several areas, based on the present study (RPB, RM, RTM, caudal TPO). Although the nature of the information delivered to a given site through convergent inputs is presumably distinct, those signals could reach that site at about the same time. This is especially intriguing since multiple areas at different hierarchical levels appear to receive at least some convergent inputs from different levels. In addition, for a given site, the precise timing of these events could vary between input layers. As one example, earlier arriving (e.g., modulatory) inputs to L1 could set the tone for later arriving (e.g., driving) inputs to L4, or perhaps reset the phase of ongoing oscillations (Lakatos et al., 2005b).

Along these lines, then, an important future line of inquiry will be to explore the possibility that each of these diverging and converging strands has different functional properties. It is likely that converging inputs from several areas are distinct, but are the divergent projections from one site propagating the same signal in parallel to multiple others? Do those signals differ by the laminar position and cell type of both source and target? These details are essential for understanding the ways in which signals are processed and distributed between areas of auditory cortex.

#### **FEEDBACK PROJECTIONS TO THE CORE FROM THE PARABELT**

It has been frequently observed in previous studies in primates that the parabelt does not receive significant input from the core areas A1 or R, and receives only sparse inputs from RT. Projections to the parabelt from within the auditory cortex arise almost exclusively from the belt areas. This is the key anatomical support for a core-belt-parabelt hierarchy (Hackett et al., 1998). However, as previously noted in macaques (Pandya and Rosene, 1993) and marmosets (de la Mothe et al., 2006), the parabelt region does appear to have a significant projection back to L1 of the core. In the present study, our results affirm those observations, and also indicate that those inputs are broadly distributed over the core (**Figure 10**, laminar profiles). The ventral CPB location injected in Case 2 projected evenly to L1 across the entire surface of A1 and R, although input to L1 of the putative third member of the core region, RT, was very sparse. The broad spread of the L1 projection over the core is very similar to that noted by Pandya and Rosene (1993) after a large isotope injection of the STG that probably involved CPB and ML. If we assume this pattern to be true of the parabelt region in its entirety (rostral and caudal divisions), then the feedback projections to the core would form an expansive and dense matrix over the entire core region that lacks obvious rostrocaudal topography. Given that apical dendrites from subpopulations of cells in almost all layers ramify in L1, the projections to L1 from even a single location in the parabelt could exert a powerful influence over global activity within the core region.

#### **IS THE ROSTRAL MEDIAL BELT A CONNECTIONAL CROSSROADS?**

In our studies of the connectivity of the auditory cortex in primates, we have noted that the rostromedial belt area, RM, is broadly connected with rostral and caudal auditory areas in a manner distinct from other belt areas (Hackett et al., 1998; de la Mothe et al., 2006; Smiley et al., 2007). See also Galaburda and Pandya (1983) (**Figure 10**). Retrograde tracing studies showed that whereas the caudal and rostral belt areas tend to have stronger connections with other caudal and rostral areas, the connections of RM appeared to lack such topography. Thus, the outputs of RM are more broadly distributed to belt and parabelt areas along the caudal-rostral axis. The surprising results of the present study add to this quandary, by revealing that the caudal belt and parabelt are sources of strong convergent inputs to RM that have both feedforward and lateral features, such as dense terminal labeling in L4, L6, and other layers. We also noted that RTM received similar inputs from the same areas. In contrast, the caudal medial areas in this study (MM, CM) did not have these convergent input profiles, implying that there is not a generalized pattern of feedforward or lateral inputs to the medial belt from the lateral belt or parabelt.

#### **INPUTS TO AREA TPT**

The caudal borders of the belt and parabelt regions are bordered by the temporal parietotemporal area (Tpt), which is mostly known for its multisensory features, including auditory responsiveness in some domains (Leinonen et al., 1980). Systematic studies of its neurophysiological properties have not yet been carried out. In prior anatomical studies from our research groups, retrograde tracer injections of Tpt and adjacent belt areas (CM, CL) revealed that its strongest cortical connections included the caudal belt and parabelt regions, whereas connections with the auditory core region are sparse (Hackett et al., 1998; Smiley et al., 2007). The principal thalamic inputs to Tpt include the medial/magnocellular division of the medial geniculate body (MGm) and multisensory nuclei of the posterior thalamus (i.e., suprageniculate, Sg; limitans, Lim; posterior, Po; medial pulvinar, PM), whereas inputs from the dorsal divisions of the MG (MGd) are sparse and variable (Hackett et al., 2007). On the basis of these connections, we have long considered Tpt to be an auditory-related field that is strongly influenced by the caudal belt and parabelt, and other sensory areas.

In the present study, projections to Tpt from CPB and ML targeted L1–3a, and sparse projections from rostral CM reached L1–4 (**Figure 10**). In the absence of other data, these patterns raise the question of which, if any, of the auditory cortical areas are a significant source of feedforward inputs to Tpt. The most likely sources would be CL and CM. Although our rostral CM injection revealed only sparse projections to L1–4 of gyral Tpt, stronger inputs may arise from caudal CM and parts of CL. In the absence of significant feedforward projections to L4 of Tpt, however, it is still possible that the inputs to L1–3a from caudal belt or parabelt areas could significantly impact auditory activity in this area. Given its position in the temporal-parietal-occipital junction, and projections to posterior parietal and dorsal prefrontal cortex (Hackett et al., 1999; Romanski et al., 1999a,b; Lewis and Van Essen, 2000), Tpt is a potentially important link between higher order sensory cortex and the targets of the dorsal stream. Detailed studies of Tpt are long overdue.

#### **FEEDFORWARD PROJECTIONS TO TPO FROM CAUDAL BELT AND PARABELT AREAS**

In numerous prior studies, it has been observed that areas of the STG corresponding to the auditory belt and parabelt are broadly connected with areas on the upper bank of the STS corresponding to the TPO. We did not make efforts to subdivide TPO architectonically, but it appears that most of the terminal and cellular labeling was located in the rostral (TPOr), intermediate (TPOi) and caudal (TPOc) divisions (∼ TPO2–4), which approximately corresponds to the superior temporal polysensory area (STP) in other nomenclature (Jones and Powell, 1970; Seltzer and Pandya, 1978, 1989b, 1994; Cipolloni and Pandya, 1989; Barnes and Pandya, 1992; Cusick et al., 1995; Seltzer et al., 1996; Hackett et al., 1998; Padberg et al., 2003). These studies, which primarily used retrograde tracers, revealed that populations of labeled cells in TPO are rather dense, tend to be clustered in patches, and exhibit some degree of rostral-caudal topography, although there is also substantial overlap of rostral and caudal areas of the belt and parabelt along this axis. Data on the laminar input patterns from anterograde tracers are unfortunately rather scarce. Seltzer et al. (1996) made a large injection of the caudal parabelt, confined to the surface of the STG, that produced patches of terminal labeling along the caudal-rostral extent of TPO in the upper bank of the STS. In a second case, the CPB injection also extended into the upper bank of the STS. In this case, additional terminal labeling was found to extend beyond TPO and the upper bank to the fundus and lower bank, producing label in areas such as MT, MST, and FST. Cusick et al. (1995) also found that injection of the caudal parabelt produced patches of terminal labeling in TPO. In both studies, it was noted that these terminations spanned across layers in columns, were focused on L4, or were mixed. Thus, the laminar patterns they observed are highly similar to those identified in the present study (**Figure 10**).

Almost all of the existing data reveal that connections with auditory cortex do not extend significantly beyond the fundus of the STS to its ventral bank or to the inferior temporal gyrus (ITG), nor are there any clear connections with the middle temporal (MT) complex, or V5, which is a visual region known to be involved in visual motion processing. Otherwise, TPO and adjacent fields are broadly connected with primary and secondary sensory areas of visual and somatosensory cortex, and higher order areas of prefrontal and posterior parietal cortex (Seltzer and Pandya, 1978, 1989a,b; Pandya and Seltzer, 1982; Ungerleider and Desimone, 1986; Boussaoud et al., 1990; Cusick et al., 1995; Lewis and Van Essen, 2000; Saleem et al., 2000; Padberg et al., 2003; Markov et al., 2014). The connections of parietal and dorsolateral prefrontal cortex tend to overlap in rostral and caudal sectors of TPO, while connections of the posterior parietal and STG tend to be adjacent and non-overlapping (Barnes and Pandya, 1992; Seltzer et al., 1996). An interesting feature of the convergence of inputs in TPO is that they can be patchy, overlapping or interdigitating. It is not yet clear how patches associated with auditory cortex relate to those associated with other cortical fields. Although unimodal, bimodal, and trimodal responses to auditory, somatosensory, and visual stimuli have been recorded in TPO (Benevento et al., 1977; Desimone and Gross, 1979; Bruce et al., 1981; Baylis et al., 1987; Hikosaka et al., 1988; Schroeder and Foxe, 2002), the anatomical data suggest that while sensory inputs to STS may be initially segregated by modality, local connectivity provides a basis for multisensory interactions.

#### **CONNECTIONS BEYOND THE SUPERIOR TEMPORAL REGION**

We did not evaluate the projections to subcortical and other cortical regions for the present study, but in cursory inspections, we did note that the areas injected produced labeling in frontal, medial temporal, and thalamic locations. We did not observe labeling in posterior parietal areas, however, as might have been expected from prior studies (Pandya and Kuypers, 1969; Pandya et al., 1969; Lewis and Van Essen, 2000; Smiley et al., 2007). Because transport was judged to be very good from at least two of these injections, we are inclined to conclude that ML and CPB do not have significant projections to posterior parietal areas. Rather, judging from the earlier studies and more recent data (Markov et al., 2014), it is likely that Tpt, and perhaps CM or CL may be the most dominant sources of inputs to posterior parietal areas from superior temporal cortex. Interestingly, one recent study found significant projections between posterior parietal and RPB areas (Markov et al., 2014), suggesting that a more determined survey of these connections is warranted.

#### **INTEGRATION WITH PRIOR STUDIES**

One of the earliest studies of the laminar patterns of projections in auditory cortex was conducted in owl monkeys. Fitzpatrick and Imig (1980) analyzed projections in the core and belt after placing isotope injections into A1 or R. They found that projections from these core areas to belt areas often spanned layers, but sometimes with concentrations in L4 alone, L3a/4, or L3a/4/6. These laminar patterns were similar to those observed in the feedforward and lateral projections between areas in the present study. They also found that within the core, projections from A1 to R targeted L4, consistent with a rostrally-directed flow of information within the core.

Three studies used tracers with anterograde or mixed anterograde and retrograde tracing properties to study auditory cortical connections in marmosets (Aitkin et al., 1988; de la Mothe et al., 2006; de la Mothe et al., 2012). Both sets of studies found that projections from core to belt areas often resulted in columnar terminations spanning layers, but often with a focal band of higher density in L4. In de la Mothe et al. (2006), injections of RM and CM or MM labeled columns of terminals in the lateral belt and parabelt areas, variably punctuated by more intense bands in L2/3a, 4 and 6. These columns were often separated by columns of weaker labeling, but it was common for dense terminal labeling to persist continuously in the L4 and 6 bands. This suggested that the medial belt areas give rise to feedforward and lateral projections to the lateral belt and parabelt. One additional note: when anterograde terminals were concentrated in L4 of another belt or parabelt area from these injections, they tended to be located in L4 in sites rostral to the injection site, but not caudal. So, in addition to a core-belt-parabelt pattern, there were hints of a rostrally-directed bias in the feedforward projections from the medial belt.

In their foundational study, Galaburda and Pandya (1983) used isotope tracers with anterograde transport properties to study the connections of auditory areas in the macaque monkey STG. Surprisingly, after 30 years, this study stands alone as the most extensive survey of anterograde projections in the auditory cortex of macaques. A major conclusion of that study supported a rostrally-directed pattern of connectivity between areas corresponding to the core, belt, and parabelt regions (terminology transposed to match our nomenclature). These patterns were tied to progressive stages of architectonic differentiation along this axis (caudal to rostral). Although injection sites were typically large, covering more than one field, some general patterns were identified that were also observed and refined by the present study. For example, in case IX, a large isotope injection involving the caudal parabelt and Tpt resulted in feedforward projections to the caudal parabelt and lateral belt, projections to L1 of the core (A1), and terminations across layers in areas corresponding to the RM and MM fields. In cases IV and V, injections of primarily the RPB generated feedforward projections to belt and parabelt areas rostral to the injection site, but a predominantly L1 projection to caudal CPB. In general, the patterns showed that rostrally-directed projections mainly originate in L3 and terminate in L4 or across all layers in the rostral targets. Caudally-directed projections tended to originate in infragranular layers and terminate in superficial layers. Projections from the core to belt areas were focused on L4, and projections from belt and parabelt to core were focused in L1. Projections from lateral belt to medial belt were spread broadly across layers. Overall, these patterns were generally comparable to those of the present study, although the detailed laminar patterns of connectivity revealed herein varied in a more specific manner between areal targets, and the greater sensitivity of the tracers revealed the presence of axons and terminals in additional layers.

## **CAVEATS AND FUTURE DIRECTIONS**

The present study is based on only 4 tracer injections in 3 areas of 3 different experimental cases. Ideally, we would aim to have at least two injections from each target area as a means to evaluate reliability. There is always some variability in transport between injections into cortex. Reliable control of the precise size and location of the injections is generally not possible, even if all experimental variables are exactly repeated. In part, this is because the transport properties of different tracers vary, and the uptake and transport of the same tracer can vary due to factors that appear to be beyond experimental control. For the cases illustrated here, the injections were judged to be very good in terms of placement within a single area, involvement of all layers, and transport, and so we have a high degree of confidence in the results. Clearly, additional studies of this type are desperately needed to obtain detailed and comprehensive wiring diagrams of the projections of all auditory cortical areas, and bolster the findings of the present study. This will be a challenging pursuit, as the territory is relatively large, and many of the areas are buried in locations that require passing through or retracting the cortex of intervening regions. This is a necessary endeavor, however, as our understanding of auditory cortical function depends critically on knowledge of its circuitry. In the meantime, however, the results of this study raised several important questions about the diversity of ways in which signals are passed between areas, layers, and even specific cell types. Many of these questions can be addressed now by using laminar array recording techniques to document the properties of the signals carried by the various strands of projections that reach a given site.

## **ACKNOWLEDGMENTS**

The authors gratefully acknowledge the support of NIH grants R01DC04318 to Troy A. Hackett, R01DC011490 to Charles E. Schroeder, and R21DC012918 to Yoshinao Kajikawa.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnins.2014.00072/abstract

**Supplementary Figure 1 | Examples of architectonic features used to delineate areas and layers of auditory cortex. (A)** Coronal section at the level of mid-A1 reacted for NeuN IHC (blue fluorescence). Borders between other areas indicated by lines following radial orientation of cell columns. **(B)** Coronal section in same series as the section in **(A)**, but stained for VGluT2. **(C–F)** Coronal section through area ML, showing triple fluorescent labeling of FR, BDA, and NeuN. **(F)** is a grayscale conversion of **(E)**, showing position of layers. **(G–J)** Coronal section through MM. Same conventions as **(C–F)**. **(K–N)** Coronal section through area TPO, showing BDA transport and NeuN. Scale bars: **(A–N)**, 500 µm.

#### **Supplementary Figure 2 | Plots of retrogradely labeled cells in Case 2 following tracer injections into ML (FR, open triangles) and CPB (BDA, filled circles).**

**Supplementary Table 1 | Gray Level Index (GLI) values of anterograde labeling in auditory cortical areas from tracer injections in Cases 2 and 3.** Measurements were taken from the sections illustrated in **Figures 4**–**9**. For each injection, GLI values are sorted by area and layer. Letters A–R in the top row correspond to panel numbers in **Figures 5**, **7**, **9**. These data are summarized graphically in **Figure 10**.

#### **Supplementary Table 2 | Retrograde-labeled cell counts and percent of total cells from tracer injections into ML (FR) and CPB (BDA) (Case 2).**

Sorted by cortical area in supragranular (S) and infragranular (I) layers. The results are graphically summarized in the charts below and in **Supplementary Figure 2**.

## **REFERENCES**


support a dorsal stream supramodal timing advantage in primates. *Proc. Natl. Acad. Sci. U.S.A.* 109, 18168–18173. doi: 10.1073/pnas.1206387109


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 24 January 2014; accepted: 25 March 2014; published online: 22 April 2014. Citation: Hackett TA, de la Mothe LA, Camalier CR, Falchier A, Lakatos P, Kajikawa Y and Schroeder CE (2014) Feedforward and feedback projections of caudal belt and parabelt areas of auditory cortex: refining the hierarchical model. Front. Neurosci. 8:72. doi: 10.3389/fnins.2014.00072*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2014 Hackett, de la Mothe, Camalier, Falchier, Lakatos, Kajikawa and Schroeder. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## *Olivier Joly\*, Simon Baumann, Fabien Balezeau, Alexander Thiele and Timothy D. Griffiths*

*Auditory Group, Institute of Neuroscience, Newcastle University, Newcastle Upon Tyne, UK*

#### *Edited by:*

*Yukiko Kikuchi, Newcastle University Medical School, UK*

#### *Reviewed by:*

*Elia Formisano, Maastricht University, Netherlands; Kazuyo Tanji, Yamagata University, Japan*

#### *\*Correspondence:*

*Olivier Joly, MRC - Cognition and Brain Sciences Unit, Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford, OX1 3UD, UK e-mail: olivier.j.joly@gmail.com*

Recent neuroimaging studies in primates aim to define the functional properties of auditory cortical areas, especially areas beyond A1, in order to further our understanding of auditory cortical organization. Precise mapping of functional magnetic resonance imaging (fMRI) results and interpretation of their localization among the many small auditory subfields remain challenging. To facilitate this mapping, we combined information from cortical folding, micro-anatomy, a surface-based atlas, and tonotopic mapping. For the first time, we used a phase-encoded fMRI design to map the tonotopic organization in the monkey. From posterior to anterior, we found a high-low-high progression of frequency preference on the superior temporal plane. We show a faithful representation of the fMRI results on a locally flattened surface of the superior temporal plane. In a tentative scheme to delineate core versus belt regions, which share similar tonotopic organizations, we used the ratio of T1-weighted and T2-weighted MR images as a measure of cortical myelination. Our results, presented along with a co-registered surface-based atlas, can be interpreted in terms of a current model of the monkey auditory cortex.

**Keywords: auditory cortex, monkey, tonotopy, phase-encoded design, cortical surface**

## **INTRODUCTION**

The auditory cortex is located in the temporal lobe of the primate brain, and in macaque monkeys it lies mainly on the superior temporal plane (**Figures 1A,B**). The monkey auditory cortex can be divided into three core regions surrounded by seven or eight belt areas, which project mainly to parabelt regions on the convexity of the superior temporal gyrus (Hackett, 2011). The current model of organization of the monkey auditory cortex consists of 12 regions, which are defined on the basis of architectonic boundaries and connections (Jones et al., 1995; Hackett et al., 1998). Neurophysiological studies and, more recently, functional magnetic resonance imaging (fMRI) studies (Petkov et al., 2006, 2008; Baumann et al., 2010; Tanji et al., 2010; Joly et al., 2012b) have aimed to improve our understanding of the functional properties of these auditory subfields. However, the attribution of auditory cortical subfields in fMRI studies can be problematic because of imaging issues (e.g., EPI distortion and partial volume effects) and problems with the distinction between adjacent core and narrow belt areas that have the same tonotopic preference.

Imaging issues related to field definition using fMRI are manifold: (1) Spatial resolution is currently between 1 and 2 mm, which is about the width of medial belt regions. (2) Echo-planar images (T2\*-weighted fMRI) typically show geometric distortions that are only partially corrected, contributing to misalignment of fMRI results with anatomical images. (3) Voxel-based representation of the auditory cortex using axial slices or oblique slices aligned with the posterior part of the superior temporal plane cannot faithfully represent the anterior auditory cortex, which is folded, in particular at the level of the circular sulcus (**Figures 1B,C**).

Tonotopy is a basic organizing principle along the auditory pathway, including the primary auditory cortex. In macaques, many neurons in the superior temporal plane respond to auditory tones and show tuning for specific stimulation frequencies. The auditory core neurons in particular, which receive direct input from the strongly tonotopically organized ventral division of the medial geniculate complex, show an organization of characteristic frequencies that progresses mainly along a posterior-anterior axis (**Figure 1C**). Reversals in this progression functionally define the borders of three core regions: from posterior to anterior these are A1, R, and RT (**Figure 1C**) (Merzenich and Brugge, 1973; Morel et al., 1993). Tonotopic maps in the core region form a continuum with the frequency preference found in the surrounding belt areas (see illustration in Figure 1 in Hackett et al., 2001). This shared frequency tuning between core and adjacent belt regions means that this property cannot be used to functionally define a border between core and belt regions. Other functional properties have been suggested to distinguish core and belt, such as selectivity of frequency tuning (Moerel et al., 2012), response latency, and preference for noise over pure tones in belt areas (Recanzone, 2000; Rauschecker and Tian, 2004). However, neural response latency is difficult to assess with fMRI because of its temporal resolution (haemodynamic filter), and the latency difference between core and belt is small (Camalier et al., 2012). Anatomical consideration provides a more direct measure of core. At the microscopic level, a major anatomical characteristic of the core (mainly A1 and R) is that it shows heavy staining for parvalbumin, acetylcholinesterase, cytochrome-oxidase, and myelin as compared to belt regions (Hackett, 2011). At the macroscopic level, anatomical landmarks can largely predict the functional borders in humans, where the utility of the transverse temporal gyrus has been emphasized (Da Costa et al., 2011), although micro-anatomy (cyto-architectural borders) does not always follow macro-anatomical landmarks (Morosan et al., 2001). Rhesus monkeys do not have a transverse temporal gyrus but do have other anatomical features that we assess here.
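The idea that reversals of the frequency progression mark areal borders can be illustrated with a minimal sketch. It assumes a hypothetical one-dimensional profile of preferred frequency sampled along the posterior-anterior axis; the function name and the toy profile are illustrative and are not taken from the study's analysis pipeline.

```python
import numpy as np

def find_reversals(pref_freq_hz):
    """Return indices where the tonotopic gradient changes sign.

    pref_freq_hz: preferred frequency at successive positions along a
    posterior-to-anterior line (hypothetical input). Flat steps (zero
    slope) are not handled, so a reversal across a plateau is missed.
    """
    log_f = np.log2(np.asarray(pref_freq_hz, dtype=float))  # octave scale
    slope_sign = np.sign(np.diff(log_f))
    # A reversal occurs where consecutive slopes have opposite signs.
    change = slope_sign[:-1] * slope_sign[1:] < 0
    return (np.nonzero(change)[0] + 1).tolist()

# Toy high-low-high profile: a single low-frequency reversal.
profile = [8000, 4000, 2000, 1000, 500, 1000, 2000, 4000, 8000]
reversals = find_reversals(profile)
```

In this toy profile the only reversal falls on the 500 Hz position, which would correspond to a low-frequency border such as the one between A1 and R. Real maps are noisy and two-dimensional, so some smoothing would be needed before such a test is meaningful.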

Here, we present a set of data and analyses integrating information from 4 different sources: (1) tonotopic mapping based on phase-encoding fMRI design known to be efficient, producing relatively robust maps from relatively short scanning time (Da Costa et al., 2011; Engel, 2012). (2) Macro-anatomical features related to the curvature of the cortical surface. (3) Mapping of anatomical properties of the auditory cortex using an index derived from the ratio of T1 over T2 weighted MR images (Glasser and Van Essen, 2011). (4) A surface-based anatomical atlas. This has allowed us to produce a detailed map of areal organization on the 3-dimensional superior temporal plane based on fMRI and T1 and T2 maps and a current model of organization (Hackett et al., 1998; Hackett, 2011; Baumann et al., 2013). Comparison of the map with macroscopic anatomy suggests that the low-frequency representation at the border between A1 and R is associated with the posterior end of the circular sulcus.
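As a rough illustration of the third source of information, the sketch below computes a voxel-wise T1w/T2w ratio. It assumes the two images are already co-registered and sampled on the same grid; the function name, the mask argument, and the toy arrays are hypothetical and do not reproduce the processing used in the study.

```python
import numpy as np

def t1w_t2w_ratio(t1w, t2w, mask, eps=1e-6):
    """Voxel-wise T1w/T2w ratio inside a mask.

    Voxels outside the mask, or with a T2w intensity at or below eps,
    are set to NaN to avoid division by (near) zero.
    """
    t1w = np.asarray(t1w, dtype=float)
    t2w = np.asarray(t2w, dtype=float)
    valid = np.asarray(mask, dtype=bool) & (t2w > eps)
    ratio = np.full(t1w.shape, np.nan)
    ratio[valid] = t1w[valid] / t2w[valid]
    return ratio

# Toy 2 x 2 "images" (arbitrary intensity units).
t1w = np.array([[100.0, 200.0], [50.0, 0.0]])
t2w = np.array([[50.0, 100.0], [0.0, 10.0]])
ratio = t1w_t2w_ratio(t1w, t2w, mask=np.ones((2, 2), dtype=bool))
```

Higher ratio values would be read as heavier myelination, which is the property expected to distinguish core from belt.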

## **MATERIALS AND METHODS**

## **SUBJECTS**

Two male adult rhesus monkeys (*Macaca mulatta*), 10 and 5 years of age, weighing 18 and 11 kg, participated in the experiment. The animals, M1 and M2, had previous exposure to experimental auditory stimuli. Before the scanning sessions, monkeys were trained to perform a visual fixation task with the head of the animal rigidly positioned with a head holder attached to a cranial implant (see Thiele et al., 2006 for details, including surgical procedures). The visual fixation task was used to equalize attention across runs as much as possible and, more importantly, to minimize stress and body movement during scanning. All experiments were carried out in accordance with the European Communities Council Directive RL 2010/63/EC, the US National Institutes of Health Guidelines for the Care and Use of Animals for Experimental Procedures and the UK Animals Scientific Procedures Act and were performed with great care to ensure the well-being of the animals.

## **STIMULI**

Sound stimuli were generated at the beginning of each functional run. The stimuli were computed with a sampling rate of 44.1 kHz using a custom-made Python-based program, *PrimatePy*, which mainly relies on PsychoPy (Peirce, 2007), a psychophysics package (www.psychopy.org/). *PrimatePy* also uses other Python libraries for the generation of sound arrays and for the control of the multi-threading architecture. Stimuli were pure tone bursts and were presented in either low-to-high or high-to-low progression of frequencies (**Figure 2A**). Frequencies were 500, 707, 1000, 1414, 2000, 2828, 4000, 5657, and 8000 Hz (half-octave steps). Tone bursts were either 50 ms or 200 ms in duration (inter-stimulus interval 50 ms) and were alternated in pseudorandomized order during the 2 s block. Pure tone bursts of each frequency were presented in 2 s blocks in succession until all 9 frequencies had been presented. This 18 s low-to-high progression was followed by a 12 s silent pause, and this 30 s cycle was presented 15 times. A run lasted for 8 min and the two run types, with either low-to-high or high-to-low progression (**Figure 2A**), were alternated. Stimuli were delivered through MR-compatible insert earphones (Sensimetrics, Model S14, www.sens.com) at about 75–80 dB sound pressure level (SPL). Scan noise was attenuated by the insert earphones and by dense foam padding around the ears.
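The stimulus parameters above can be reproduced with a short sketch. This is not the PrimatePy code, which is not shown in the text; the function names and the 5 ms onset/offset ramp are assumptions made for illustration, while the sampling rate, frequencies, block length, and pause length are taken from the description.

```python
import numpy as np

SAMPLE_RATE = 44100  # Hz, as stated in the text

def half_octave_frequencies(f_low=500.0, n_steps=9):
    """Frequencies spaced by half an octave, starting at f_low."""
    return f_low * 2.0 ** (np.arange(n_steps) / 2.0)

def tone_burst(freq_hz, duration_s, ramp_s=0.005):
    """Pure tone with raised-cosine ramps (ramp length is an assumption)."""
    n = int(round(duration_s * SAMPLE_RATE))
    t = np.arange(n) / SAMPLE_RATE
    tone = np.sin(2 * np.pi * freq_hz * t)
    n_ramp = int(round(ramp_s * SAMPLE_RATE))
    ramp = 0.5 * (1 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    envelope = np.ones(n)
    envelope[:n_ramp] = ramp          # fade in
    envelope[-n_ramp:] = ramp[::-1]   # fade out
    return tone * envelope

def cycle_schedule(freqs, block_s=2.0, pause_s=12.0):
    """Block onsets (s) paired with frequency, plus total cycle duration."""
    onsets = [(i * block_s, float(f)) for i, f in enumerate(freqs)]
    return onsets, len(freqs) * block_s + pause_s

freqs = half_octave_frequencies()            # low-to-high run
schedule, cycle_s = cycle_schedule(freqs)    # reverse freqs for high-to-low
burst = tone_burst(freqs[2], 0.05)           # one 50 ms burst at 1000 Hz
```

Rounded to the nearest hertz, the nine frequencies match those listed in the text, and nine 2 s blocks plus the 12 s pause give the 30 s cycle.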


#### **BEHAVIORAL TASK**

The animal performed the visual fixation task during the acquisition of a full time series (8 min). Each time series was followed by a break of about 1 min. Eye position was monitored at 60 Hz by camera-based tracking of the pupil under infrared illumination, using iView software (SMI, www.smivision.com, Teltow, Germany). The eye position (*X* and *Y* coordinates) was communicated to PrimatePy via a UDP/IP socket. The task was as follows: a fixation target (a small red square) appeared at the center of the screen; when the eye trace entered a fixation window (about 2–3 degrees of visual angle, centered on the target), a timer started and the fixation target turned green. Continuous visual fixation (no saccades) for a randomly defined duration of 2–2.5 s was immediately followed by the delivery of a juice reward using a gravity-fed dispenser. The reward was controlled by PrimatePy via a USB data acquisition device, LabJack (U3-LV, http://labjack.com/).
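The reward logic described above can be sketched as a simple counter over eye-position samples. This is only an illustration: the square window shape, the 1.25 degree half-width, and a fixed rather than randomly drawn required duration are assumptions, and none of the names below come from PrimatePy or iView.

```python
def in_window(x, y, cx=0.0, cy=0.0, half_width_deg=1.25):
    """True if gaze (x, y) lies in a square window around the target."""
    return abs(x - cx) <= half_width_deg and abs(y - cy) <= half_width_deg

def reward_samples(samples, required_s=2.0, rate_hz=60.0):
    """Indices of eye samples at which a reward would be triggered.

    samples: iterable of (x, y) gaze positions in degrees, one per
    tracker sample. Fixation must be held without interruption for
    required_s seconds; leaving the window resets the timer.
    """
    needed = int(round(required_s * rate_hz))
    held = 0
    rewards = []
    for i, (x, y) in enumerate(samples):
        if in_window(x, y):
            held += 1
            if held >= needed:
                rewards.append(i)
                held = 0      # start timing the next fixation period
        else:
            held = 0          # fixation broken
    return rewards

# 2 s of fixation, one sample outside the window, then just under 2 s.
trace = [(0.0, 0.0)] * 120 + [(5.0, 5.0)] + [(0.0, 0.0)] * 119
rewards = reward_samples(trace)
```

In the experiment the required duration varied between 2 and 2.5 s; drawing `required_s` anew after each reward would mimic that.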

#### **DATA ACQUISITION—MAGNETIC RESONANCE IMAGING**

Magnetic resonance images were acquired at 4.7 Tesla with an actively shielded vertical scanner (Bruker Biospec 47/60 VAS, inner-bore width of 38 cm, Bruker GA-385 gradient system) dedicated to imaging non-human primates. Shimming was performed with the FASTMAP algorithm (Gruetter, 1993), which uses projections through a predefined volume to measure B0 inhomogeneity and applies first- and second-order corrections. Functional MRI measurements by blood oxygen level-dependent (BOLD) contrast consisted of single-shot gradient-echo echo-planar imaging (GE-EPI) sequences with the following parameters: *TR* = 1400 ms, *TE* = 21 ms, 90° flip angle, receiver bandwidth (BW) = 138 kHz, matrix size 92 × 92, FOV 110 × 110 mm, in-plane resolution 1.2 × 1.2 mm, slice thickness = 1.2 mm, yielding 1.72 mm<sup>3</sup> voxels. Functional time series lasted 8 min and consisted of a continuous acquisition of 343 volumes (plus 2 dummy scans) with 20 axial interleaved slices (ascending order, no gaps) acquired with parallel imaging with 2-fold GRAPPA acceleration using an 8-channel array receive coil. RF transmission was done with the Bruker birdcage volume coil in transmit mode. A TTL pulse was triggered by the scanner at the start of every volume and sent to PrimatePy via the LabJack for synchronization purposes. In total, 23 runs were acquired (M1: 15, M2: 8), which represents 23 × 343 = 7889 EPIs. Based on the behavior (amount of body motion), only a subset of 16 runs entered into the analyses (M1: 8, M2: 8).
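The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the parameters given in the text; the variable names are illustrative.

```python
# Acquisition parameters quoted in the text
tr_s = 1.4            # repetition time (s)
n_volumes = 343       # volumes per run (dummy scans excluded)
n_runs_total = 23     # runs acquired across both animals
fov_mm = 110.0        # field of view
matrix = 92           # matrix size
slice_mm = 1.2        # slice thickness

run_duration_s = tr_s * n_volumes          # about 480 s, i.e. about 8 min
in_plane_mm = fov_mm / matrix              # about 1.196 mm, reported as 1.2 mm
voxel_volume_mm3 = 1.2 * 1.2 * slice_mm    # about 1.728 mm^3 with the nominal 1.2 mm
total_epis = n_runs_total * n_volumes      # 7889 volumes

print(f"run duration: {run_duration_s:.1f} s ({run_duration_s / 60:.2f} min)")
print(f"in-plane resolution: {in_plane_mm:.3f} mm")
print(f"voxel volume: {voxel_volume_mm3:.3f} mm^3")
print(f"total EPIs: {total_epis}")
```

With the nominal 1.2 mm edge length the voxel volume is 1.728 mm<sup>3</sup>, which the text reports as 1.72 mm<sup>3</sup>.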

[Figure caption fragment] [...] segmentation of the white matter (M1). MCA, middle cerebral artery.

Anatomical MR images consisted of 2 sequences: T1-weighted (T1w) and T2-weighted (T2w) images. The T1w images consisted of a 2D magnetization-prepared rapid gradient-echo (MPRAGE) sequence with a 130° preparation pulse, *TR* = 2100 ms, *TE* = 7 ms, *TI* = 800 ms, 27° flip angle, receiver bandwidth = 30 kHz. The T2w images consisted of a 2D Rapid Acquisition with Relaxation Enhancement (RARE) sequence with *TR* = 6500 ms, *TE* = 14 ms, effective *TE* = 56 ms, *BW* = 50 kHz, RARE factor 8. The geometry was the same for both T1w and T2w images: matrix 166 × 166, FOV 100 × 100 mm, slice thickness 0.6 mm, and 54 axial slices. Because of time constraints, these anatomical scans were acquired during separate scanning sessions but with the same visual fixation task to minimize body motion and stress and to control the animal's behavior.

#### **DATA ANALYSES**

MR images were first converted from the Bruker file format into 3D (anatomical data) or 4D (*x, y, z, t* functional data) MINC file format (.mnc) using the Perl script pvconv.pl available online (http://pvconv.sourceforge.net/) and next from MINC to NIfTI format using the MINC tools.

#### *Structural images*

Structural images were resampled at 0.25 mm isotropic voxels with a 7th-order B-spline interpolation method and reoriented to MNI space (alignment of posterior and anterior commissures). The resampling allows a smoother definition of the cortical surface, that is, the interface between gray and white matter. The ratio between T1w and T2w structural images was used to segment the brain, as it increases the contrast between white and gray matter and reduces biases from B0 inhomogeneities and from the receiver sensitivity profile (Glasser and Van Essen, 2011). Next, semiautomatic segmentation (Yushkevich et al., 2006) of the white matter was performed using ITK-SNAP (http://www.itksnap.org). The binary image (after dilation of 0.25 mm) was used to generate a 3D triangulated mesh (**Figure 2C**), including smoothing and decimation to reduce the number of vertices, using the BrainVisa suite (http://brainvisa.info). A selection of the sub-surface corresponding to the superior temporal plane (lower bank of the lateral sulcus) was saved in the GIFTI (www.nitrc.org/projects/gifti/) file format. The STP surface area was about 350 mm<sup>2</sup> in M1 and about 280 mm<sup>2</sup> in M2. The atlas surface from the macaque F99 atlas, which is available in Caret software (Van Essen et al., 2001), was co-registered with our surfaces using ICP (Iterative Closest Point) registration (affine registration) as implemented in VTK (www.vtk.org). The benefit of this local surface-based registration is to enable the comparison of our functional MRI results with the anatomical atlas (Markov et al., 2011), which uses the well-established nomenclature and subdivisions of the auditory cortex defined earlier (Hackett et al., 1998; Kaas and Hackett, 1998). To improve the visibility of the cortical surface, we flattened the superior temporal plane.
As this flattening process is only applied to a local part of the cortical surface, it does not require full brain inflation and spherical coordinate transformation. Instead it consists of a two-step procedure: (1) computation of the weighted adjacency matrix that reflects the position of the vertices (Dijkstra's algorithm); and (2) computation of the multidimensional scaling (MDS) of the adjacency matrix (Joly et al., 2009).
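The two-step flattening procedure can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes that edge weights are Euclidean edge lengths, that the matrix passed to MDS holds all-pairs shortest-path (geodesic) distances, and that classical (eigendecomposition-based) MDS is used; the text does not specify the MDS variant. Function names are hypothetical.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import dijkstra

def geodesic_distances(vertices, triangles):
    """All-pairs shortest-path distances along the mesh edges (Dijkstra)."""
    vertices = np.asarray(vertices, dtype=float)
    tri = np.asarray(triangles)
    # Collect the three edges of every triangle and keep each undirected edge once
    edges = np.vstack([tri[:, [0, 1]], tri[:, [1, 2]], tri[:, [2, 0]]])
    edges = np.unique(np.sort(edges, axis=1), axis=0)
    weights = np.linalg.norm(vertices[edges[:, 0]] - vertices[edges[:, 1]], axis=1)
    n = len(vertices)
    graph = coo_matrix((weights, (edges[:, 0], edges[:, 1])), shape=(n, n)).tocsr()
    return dijkstra(graph, directed=False)

def classical_mds(dist, n_dims=2):
    """Classical MDS: embed points so that Euclidean distances approximate `dist`."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    evals, evecs = np.linalg.eigh(gram)
    order = np.argsort(evals)[::-1][:n_dims]          # largest eigenvalues first
    return evecs[:, order] * np.sqrt(np.maximum(evals[order], 0.0))

# Toy example: a unit square split into two triangles
verts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
tris = np.array([[0, 1, 2], [0, 2, 3]])
dist = geodesic_distances(verts, tris)
flat = classical_mds(dist)        # one 2D coordinate per vertex
```

Because distances are measured along mesh edges, they slightly overestimate true geodesic distances on coarse meshes; a dense mesh reduces this effect.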

T1w/T2w ratio images were sampled across cortical depth, along the normal of the surface at each vertex. Sampling along the normal was initiated on the slightly dilated gray/white matter surface. At each vertex, T1w/T2w values along the normal that exceeded ±1 SD of all values were excluded: this had the effect of removing voxels that contained significant blood vessel signal with very high T1w/T2w values or CSF signal with very low values (Glasser and Van Essen, 2011). Finally, maps were smoothed across the surface using a Gaussian average weighted by geodesic distance (FWHM = 1 mm), which reduced high spatial frequency information that appeared to be mostly noise.
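A minimal sketch of the depth sampling and outlier exclusion is given below. It assumes that "±1 SD of all values" refers to the mean ± 1 SD computed over every sampled value on the surface, which is one possible reading of the text; function names and the sampling depths are hypothetical, and interpolation of the volume at the sample points is omitted.

```python
import numpy as np

def sample_points(vertices, normals, depths_mm):
    """Coordinates sampled along each vertex normal, shape (n_vertices, n_depths, 3)."""
    vertices = np.asarray(vertices, dtype=float)
    normals = np.asarray(normals, dtype=float)
    depths_mm = np.asarray(depths_mm, dtype=float)
    return vertices[:, None, :] + depths_mm[None, :, None] * normals[:, None, :]

def depth_averaged_ratio(profiles):
    """Average T1w/T2w over depth after excluding values beyond mean +/- 1 SD.

    `profiles` has shape (n_vertices, n_depths). The mean and SD are taken over
    all samples (an assumption). A vertex with no surviving sample yields NaN.
    """
    profiles = np.asarray(profiles, dtype=float)
    mu, sd = np.nanmean(profiles), np.nanstd(profiles)
    keep = np.abs(profiles - mu) <= sd
    return np.nanmean(np.where(keep, profiles, np.nan), axis=1)

# Toy example: the value 5.0 mimics a voxel dominated by blood vessel signal
profiles = np.array([[1.0, 1.1, 0.9, 5.0],
                     [1.0, 1.0, 1.0, 1.0]])
index = depth_averaged_ratio(profiles)
```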

#### *Functional images*

Raw fMRI data were preprocessed using Statistical Parametric Mapping (SPM8) software (*www.fil.ion.ucl.ac.uk/spm/*), including slice timing correction and rigid body motion correction. The fMRI data were also spatially smoothed with a Gaussian kernel (FWHM = 1.5 mm). Voxel-based analyses were performed in SPM and consisted of the estimation of a general linear model (GLM) with a block design of alternating ON (18 s) and OFF (12 s) blocks, including the motion parameters as regressors. The quality of the co-registration of the 3D LS surface with the functional time series was visually verified using the mean image of the time series (**Figure 2B**). Projection of the functional volumetric data onto the cortical surface (**Figure 2C**) was then performed with the Caret command line (interpolated voxel method) and resulted in surface-based (texture) time series. The resulting time series were further processed using Python scripts (nitime and nibabel libraries). Time series were filtered with an infinite impulse response (IIR) filter to remove fluctuations below 0.02 Hz and above 0.1 Hz. The filtered time series of each vertex was then normalized as percentage of signal change relative to the mean signal of that vertex. For each vertex, the cross-correlation between the time series from the two run types was computed, and the time delay between the two signals (argument of the maximum correlation) revealed the preferred frequency. Maps of frequency preference were generated from this computation at each vertex, using an inclusive mask of correlation values above 0.2.
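The lag-based estimate of frequency preference can be illustrated with a short sketch. This is not the authors' nitime-based pipeline: the band-pass filtering step is omitted, the signals are simulated on a hypothetical 1 s grid spanning one 30 s cycle, and a circular cross-correlation is assumed because the stimulation is periodic. A vertex that responds early in the low-to-high run responds late in the high-to-low run, so the lag between the two averaged responses varies with the preferred frequency.

```python
import numpy as np

def percent_signal_change(ts):
    """Express a vertex time series as % change relative to its own mean."""
    ts = np.asarray(ts, dtype=float)
    return 100.0 * (ts - ts.mean()) / ts.mean()

def best_lag(up, down, dt):
    """Return (lag_s, r_max) from the circular cross-correlation of two series.

    A positive lag means the response in `down` occurs later than in `up`.
    """
    up = np.asarray(up, dtype=float)
    down = np.asarray(down, dtype=float)
    n = up.size
    up_z = (up - up.mean()) / up.std()
    down_z = (down - down.mean()) / down.std()
    # Correlation theorem: c[k] = (1/n) * sum_t up_z[t] * down_z[(t + k) % n]
    c = np.fft.irfft(np.conj(np.fft.rfft(up_z)) * np.fft.rfft(down_z), n) / n
    k = int(np.argmax(c))
    r_max = float(c[k])
    if k > n // 2:              # wrap the lag into the range (-n/2, n/2]
        k -= n
    return k * dt, r_max

# Simulated vertex: response peaks at 9 s in one run type and at 17 s in the other
dt = 1.0
t = np.arange(0.0, 30.0, dt)

def bump(t_peak):
    d = np.minimum(np.abs(t - t_peak), 30.0 - np.abs(t - t_peak))  # circular distance
    return 1000.0 + 10.0 * np.exp(-0.5 * (d / 3.0) ** 2)

up = percent_signal_change(bump(9.0))
down = percent_signal_change(bump(17.0))
lag, r = best_lag(up, down, dt)
include = r > 0.2               # correlation mask used in the text
```

Because the hemodynamic delay shifts both responses equally, it cancels in the lag, which is one appeal of comparing the two run types directly.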

## **RESULTS**

## **SOUND ACTIVATIONS**

In both monkeys, we first analyzed the BOLD activation associated with sound stimulation in voxel space. The resulting SPM maps were projected onto the 3D surfaces and maps, as shown in **Figure 3B**. In both animals, robust sound-related activation was found in most of the superior temporal plane, and the SPM maps show a rather symmetrical pattern of activation. Maxima were found in a region posterior and lateral to the fundus of the circular sulcus, illustrated with the medial white line (**Figure 3B**). Regions with less or no significant activation were found in the most posterior part (around putative area Tpt) and the most anterior part (temporal pole) of the superior temporal plane, and also medial to the fundus of the circular sulcus in the insula.

### **SURFACE-BASED ANALYSES AND FREQUENCY PREFERENCES**

Next, we performed surface-based analyses in which functional time series are defined at each point (vertex) of the surface of the superior temporal plane. Time series for each run type were averaged, as illustrated in **Figure 4**, and cross-correlations were computed between the two averaged time series. Best frequency maps (**Figure 5**) represent at each vertex the lag with the highest correlation (see Materials and Methods section). Iso-frequency lines were also generated and overlaid to highlight the feature of interest. In both monkeys, largely symmetrical maps were observed. Low-frequency preference regions were found in the 4 hemispheres at the coordinate *y* ∼ −3 mm, and this region corresponds functionally to the putative border between A1 and R. In both monkeys, this low-frequency region was found at the posterior end of the circular sulcus. Posterior to this region, a high-frequency preference was often observed at the coordinate *y* ∼ −10 mm. This high-frequency region, which was less clear in the left hemisphere of M2, was located at about 10 mm from the posterior end of the sulcus. In the left hemisphere of M2, the posterior border of A1, with high-frequency preference in the medial and lateral parts, also shows vertices with a low-frequency preference and vertices with a very low cross-correlation value in the masked region (gray). The posterior high-frequency region represents a putative functional border between A1 and caudal belt regions (CM and CL). Posterior to A1, nearly 10 mm of cortical surface would, according to the adjusted atlas, remain to house the caudal belt regions and area Tpt. In both monkeys, another high-frequency region was observed at *y* ∼ +5 mm and represents a putative border between area R and area RT. Finally, a last low-frequency region was observed anteriorly in all hemispheres, but to a lesser extent in the left hemisphere of M2. The co-registered F99 atlas (affine transformation) illustrates the subdivisions of the auditory cortex and highlights the main tonotopic progression in areas A1 and R.

[Figure caption fragment] Local flattening allows the 2D representation of the 3D surface with minimal distortion as compared to full brain flattening. Mean curvature illustrates the original folding of the left and right STP in both monkeys. White lines [...] lines. **(B)** Activations to sounds as compared to silent baseline. SPM t-maps (volumes) are projected onto the surfaces. Statistical maps are thresholded at *p* < 0.001, uncorrected for multiple comparisons.

#### **ANATOMICAL FEATURES AND T1W OVER T2W RATIO**

Finally, the T1w over T2w ratio image was used to compute an index that represents the average intensity across the cortical thickness. For the lateral sulcus, the derived maps are shown in **Figure 6A** for both monkeys. The tentative outlines (red dashed lines) illustrate the cortical region within which auditory core area A1 and the posterior part of R would be located based on this mapping. The highest intensities of gray matter voxels in the T1w over T2w ratio image were found within a posterior region of the lateral sulcus where A1 is expected according to the anatomical atlas and according to the tonotopic progression in each animal (see **Figure 5**). High values of this T1w/T2w-derived index were also found anteriorly, where the core region R is predicted from atlas-based parcellation (**Figure 6**) and from the frequency progression (low-to-high) illustrated in this region (**Figure 5**). Note that in 3 out of 4 hemispheres (M1: right; M2: left and right), high values were also found to extend posteriorly into caudal belt regions (areas CM and CL).

## **DISCUSSION**

Here we combine information from functional MRI and anatomical MRI, which allows improved parcellation of auditory cortex and thus improved mapping of function to auditory subfields. For the first time in monkeys, we used a phase-encoded fMRI design to map frequency preference in the auditory cortex. Phase-encoded fMRI designs have been successfully applied to retinotopic mapping in humans (e.g., Sereno et al., 1995) and in monkeys (Kolster et al., 2009) and to human tonotopic mapping (Talavage et al., 2004; Da Costa et al., 2011; Striem-Amit et al., 2011). Combined with macro-anatomical features (cortical folding) and a co-registered surface-based atlas, we present here a detailed tonotopic map of the monkey auditory cortex in good agreement with the current model of organization of the monkey auditory cortex.

## **RELATION TO PREVIOUS TONOTOPIC MAPPING IN MONKEYS**

Previous tonotopic mappings in monkeys were performed and illustrated in voxel space, and it is therefore difficult to compare our results with these studies. Previous work (Petkov et al., 2006) used a voxel-based representation of oblique fMRI slices (2 mm; averaged over 1–3 slices, up to 6 mm) in contrast to our surface mapping. Moreover, the underlying structural images in their study had limited contrast, which further complicates the exact assignment to different cortical areas. Tanji et al. (2010) used maximum projection maps to illustrate more precisely the average of signal or *t*-values taken across 4 slices (6 mm), but it remains difficult to relate their map to the flattened representation from Hackett et al. (1998, 2001). In these previous fMRI studies, the circular sulcus is represented with a single line, while a more precise surface representation can illustrate the fundus or floor and the outer bank of the ventral circular sulcus. While A1 mainly lies in the posterior part of the ventral bank of the lateral sulcus, which is rather planar, substantial parts of R and RT are found within the circular sulcus, and therefore these regions suffer seriously from the axial (voxel-based) representation. This was partly addressed by Tanji et al. (2010), where area RT is shown within the circular sulcus via coronal sections (see their **Figure 5**) and R stretches from the STP into the adjacent postero-lateral bank of the circular sulcus. To overcome this limitation, 3D surface mapping and isotropic sampling are needed to faithfully represent auditory cortex beyond A1. One further challenge in subdividing the auditory cortex resides in the fact that core and the adjacent lateral belt regions share similar frequency preferences, and that medial belt regions are narrow and therefore difficult to isolate from neighboring core regions with functional MRI. In macaque monkeys, auditory fMRI mapping is more difficult than visual mapping because of the relatively small surface area of the auditory core (A1, R and RT), which is about 100 mm<sup>2</sup>, whereas V1 is about 1400 mm<sup>2</sup> (V2 ∼ 1000 mm<sup>2</sup>). In the future, our combination of anatomical and functional features could enter a hierarchical clustering algorithm such as Ward's approach, which can use the graph connectedness of our triangulated surface and perform an automatic parcellation of the auditory subfields for a given number of clusters.

In our two monkeys, we found a bilaterally symmetrical low-frequency preference at the posterior end of the circular sulcus, which according to the co-registered atlas would be the border between A1 and R. This correspondence between the low-frequency preference and a macro-anatomical landmark (posterior part of the circular sulcus) is reminiscent of the recent finding in humans (Da Costa et al., 2011) where low-frequency preference was found on the crown of Heschl's gyrus (HG) (or within the sulcal divide of duplicated HG) in 10 subjects. Interestingly, this correspondence between function and macro-anatomy is probably the very reason for the success of human tonotopic fMRI (group) studies (e.g., Langers and van Dijk, 2012), which rely on non-rigid normalization into a common brain space (e.g., MNI space). Indeed, these human studies successfully demonstrated the tonotopic organization for at least two reasons: (1) the non-rigid normalization process aligns the individual Heschl's gyri well to the averaged structure of the MNI template; and (2) there is a sufficiently strong correspondence between the tonotopic organization and the underlying macro-anatomy. Moreover, the macro-anatomy in both species suggests that the postero-lateral bank of the circular sulcus in monkeys might roughly correspond to the antero-medial bank of HG in humans (Baumann et al., 2013). There is often a forme fruste of Heschl's gyrus in the macaque in the form of a ridge in a similar position and orientation relative to the STP (Baumann et al., 2013).

Future developments in voxel-based quantification (VBQ) using quantitative MRI (Sigalovsky et al., 2006; Bock et al., 2013; Weiskopf et al., 2013) would allow the same definition of brain regions in humans and non-human primates. In the current study, we derived a map of cortical myelin and found highly myelinated regions that seem to be centered on A1 and extend anteriorly into the posterior part of area R, as expected from Hackett et al. (1998). In their anatomical observations, Hackett et al. (1998) illustrated (cf. their Figure 2C) a strong gradient from a very heavily myelinated area A1 toward much less stained areas R and RT; these observations are also in agreement with our maps. However, high values in our maps also extend posteriorly into caudal belt regions. This could be related to recent observations showing involvement in fast processing of sounds and short latencies of neural responses (shorter than in A1) in dorsal auditory regions (Kusmierek and Rauschecker, 2014). Despite its general agreement with known myelin maps (e.g., primary cortices, area MT), it remains difficult to know how reliably our current implementation of cortical myelin mapping predicts core versus belt regions. In the future, this could be assessed in individual monkeys using a co-registration of MRI and post-mortem histological studies. In the meantime, improvements in functional MRI resolution and reduction of geometric deformations would be highly beneficial to increase the accuracy with which functional MRI findings are associated with the different auditory subfields. Until these improvements are achieved, combining information from anatomical and functional properties, as described here, can help better localize recording and activation sites within the auditory subfields.
This will help to guide electrophysiological recordings in monkeys precisely and efficiently, and also to relate monkey fMRI results to similar fMRI studies in humans (Da Costa et al., 2011; Dick et al., 2012; Joly et al., 2012a).

### **ACKNOWLEDGMENTS**

We would like to thank the reviewers for their comments; the staff of the Comparative Biology Centre (Newcastle University) for helping with excellent animal welfare; and Christopher I Petkov, David Hunter and Li Sun for sharing technical equipment and for technical support. The help of Melissa Saenz with the generation of stimuli is also acknowledged. This research was funded by Wellcome Trust UK (WT091681MA and WT085002MA).

#### **REFERENCES**


functional magnetic resonance imaging. *Science* 268, 889–893. doi: 10.1126/science.7754376


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 April 2014; accepted: 25 June 2014; published online: 21 July 2014.*

*Citation: Joly O, Baumann S, Balezeau F, Thiele A and Griffiths TD (2014) Merging functional and structural properties of the monkey auditory cortex. Front. Neurosci. 8:198. doi: 10.3389/fnins.2014.00198*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2014 Joly, Baumann, Balezeau, Thiele and Griffiths. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Functional MRI of the vocalization-processing network in the macaque brain

Michael Ortiz-Rios <sup>1, 2, 3</sup>\*, Paweł Kuśmierek <sup>1</sup>, Iain DeWitt <sup>1</sup>, Denis Archakov <sup>1, 4</sup>, Frederico A. C. Azevedo <sup>2, 3</sup>, Mikko Sams <sup>4</sup>, Iiro P. Jääskeläinen <sup>4</sup>, Georgios A. Keliris <sup>2, 5, 6</sup> and Josef P. Rauschecker <sup>1, 4, 7</sup>\*

<sup>1</sup> Department of Neuroscience, Georgetown University Medical Center, Washington, DC, USA, <sup>2</sup> Department of Physiology of Cognitive Processes, Max Planck Institute for Biological Cybernetics, Tübingen, Germany, <sup>3</sup> IMPRS for Cognitive and Systems Neuroscience, Tübingen, Germany, <sup>4</sup> Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University School of Science, Aalto, Finland, <sup>5</sup> Bernstein Centre for Computational Neuroscience, Tübingen, Germany, <sup>6</sup> Department of Biomedical Sciences, University of Antwerp, Wilrijk, Belgium, <sup>7</sup> Institute for Advanced Study and Department of Neurology, Klinikum Rechts der Isar, Technische Universität München, München, Germany

#### Edited by:

Monica Munoz-Lopez, University of Castilla-La Mancha, Spain

#### Reviewed by:

Simon Baumann, Newcastle University, UK; Huan Luo, Chinese Academy of Sciences, China; Olivier Joly, MRC Cognition and Brain Sciences Unit, UK

#### \*Correspondence:

Michael Ortiz-Rios and Josef P. Rauschecker, Department of Neuroscience, Georgetown University Medical Center, NRB WP19, 3970 Reservoir Rd. NW, Washington, DC 20057, USA michael.ortiz@tuebingen.mpg.de; rauschej@georgetown.edu

#### Specialty section:

This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience

Received: 30 December 2014; Accepted: 17 March 2015; Published: 01 April 2015

#### Citation:

Ortiz-Rios M, Kuśmierek P, DeWitt I, Archakov D, Azevedo FAC, Sams M, Jääskeläinen IP, Keliris GA and Rauschecker JP (2015) Functional MRI of the vocalization-processing network in the macaque brain. Front. Neurosci. 9:113. doi: 10.3389/fnins.2015.00113

Using functional magnetic resonance imaging in awake behaving monkeys we investigated how species-specific vocalizations are represented in auditory and auditory-related regions of the macaque brain. We found clusters of active voxels along the ascending auditory pathway that responded to various types of complex sounds: inferior colliculus (IC), medial geniculate nucleus (MGN), auditory core, belt, and parabelt cortex, and other parts of the superior temporal gyrus (STG) and sulcus (STS). Regions sensitive to monkey calls were most prevalent in the anterior STG, but some clusters were also found in frontal and parietal cortex on the basis of comparisons between responses to calls and environmental sounds. Surprisingly, we found that spectrotemporal control sounds derived from the monkey calls ("scrambled calls") also activated the parietal and frontal regions. Taken together, our results demonstrate that species-specific vocalizations in rhesus monkeys activate preferentially the auditory ventral stream, and in particular areas of the antero-lateral belt and parabelt.

Keywords: auditory cortex, monkey, species-specific calls, spectrotemporal features, higher-level representations

## Introduction

The concept of two streams in auditory cortical processing, analogous to that in visual cortex (Mishkin et al., 1983), was proposed more than a decade ago (Rauschecker, 1998a; Rauschecker and Tian, 2000). The concept was supported by contrasting patterns of anatomical connections in the macaque from anterior/ventral and posterior/dorsal belt regions of auditory cortex to segregated domains of lateral prefrontal cortex (Romanski et al., 1999) and by different physiological properties of these belt regions. In particular, the anterior lateral belt (area AL) in the macaque exhibited enhanced selectivity for the identity of sounds (monkey vocalizations), whereas the caudal lateral belt (area CL) was particularly selective to sound location (Tian et al., 2001; see also Kuśmierek and Rauschecker, 2014). Evidence for segregated streams of auditory cortical processing has also been provided in human studies (Maeder et al., 2001; Arnott et al., 2004; Ahveninen et al., 2006).

Use of species-specific vocalizations for auditory stimulation in the macaque is of particular interest in the context of the ongoing debate about the evolution of speech and language (Rauschecker, 2012; Bornkessel-Schlesewsky et al., 2015). Comparative approaches have focused on identifying the common neural networks involved in the processing of speech in humans and of vocalizations in nonhuman primates (Gil-da-Costa et al., 2004; Frey et al., 2008, 2014; Petrides and Pandya, 2009; Joly et al., 2012b). Monkey calls convey semantic information about objects and events in the environment as well as about affective states of individuals, similar to information contained in human communication sounds and speech (Cheney and Seyfarth, 1990; Ghazanfar and Hauser, 1999; Yovel and Belin, 2013). An open question regarding the vocalization-processing network in the macaque brain is whether it also carries information about the motor actions necessary to produce the vocalizations, as has been shown in humans listening to speech and music (Wilson et al., 2004; Leaver et al., 2009).

Several studies have examined the representation of complex sounds, including vocalizations, in the macaque brain using neuroimaging techniques (Poremba et al., 2003; Petkov et al., 2008; Joly et al., 2012b). In particular, the first fMRI study by Petkov et al. (2008) found activation specific to monkey vocalizations in the anterior STG region. One of the aims in later studies has been to characterize the physiological properties of the anterior superior temporal (aSTG) region that shows sensitivity to higher-level spectrotemporal features in vocalizations (Russ et al., 2008; Kikuchi et al., 2010, 2014; Perrodin et al., 2011; Fukushima et al., 2014). A recent comparative study by Joly et al. (2012b) replicated and extended these results by analyzing fMRI images of the entire brain and found an involvement of orbitofrontal cortex in the processing of monkey vocalizations. Given that the ventral pathway continues into orbitofrontal and ventrolateral prefrontal cortex (vlPFC) (Barbas, 1993; Romanski et al., 1999; Cohen et al., 2007; Petkov et al., 2015), this finding is of particular interest.

In humans, the ventral auditory pathway is thought to be particularly involved in the recognition and identification of vocalizations as well as speech (Binder et al., 2000; DeWitt and Rauschecker, 2012). By contrast, the dorsal pathway is involved primarily in processing sound source location and motion in both humans and animals (Maeder et al., 2001; Tian et al., 2001; Arnott et al., 2004). However, a recent proposal, derived from both human and non-human primate studies, suggests that the dorsal stream may also play a role in sensorimotor integration and control of complex sounds, including speech (Rauschecker and Scott, 2009; Rauschecker, 2011). Thus, activation of frontal and parietal regions might also be expected when monkeys are presented with conspecific vocalization sounds.

Here we identified which brain regions of the macaque monkey are sensitive to conspecific vocalizations using whole-brain functional magnetic resonance imaging (fMRI). We found the most distinct activation in the anterior STG and along the auditory ventral stream, but some clusters of activation were also found in prefrontal, premotor, and parietal cortex when comparing monkey vocalizations to environmental sounds. These findings are discussed in terms of their functional significance.

## Materials and Methods

## Subjects

Two male rhesus monkeys (*Macaca mulatta*) weighing 10–12 kg participated in our awake-fMRI experiments. Each animal was implanted with an MRI-compatible headpost (Applied Prototype) secured to the skull with ceramic screws (Thomas Recording), plastic strips, and bone cement (Osteobond, Zimmer). All surgical procedures were performed under general anesthesia with isoflurane (1–2%) following pre-anesthetic medication with ketamine (13 mg/kg) and midazolam (0.12 mg/kg). The experiments were approved by the Georgetown University Animal Care and Use Committee and conducted in accordance with standard NIH guidelines.

## Behavioral Training

To ensure the monkeys attended to each stimulus for which a brain volume was acquired, we adapted a go/no-go auditory discrimination task (Kuśmierek and Rauschecker, 2009; Kikuchi et al., 2010) for sparse-sampling functional MRI.

First, each monkey was trained to lie in sphinx position in an MRI-compatible primate chair (Applied Prototype) placed inside a double-walled acoustic chamber simulating the scanner environment. Inside the chamber, the animals were trained to be accustomed to wearing headphone equipment and hearing (simulated) scanner noise, presented by a loudspeaker. Eye movements were monitored using an infrared eye-tracking system (ISCAN). Analog output of the tracker was sampled with an analog-to-digital conversion device (National Instruments). A PC running Presentation software (Neurobehavioral Systems) was used to present visual and auditory stimuli, control the reward system, and trigger imaging data acquisition (see below).

After the animal completed the fixation training, a go/no-go auditory discrimination task was introduced, in which the monkeys could initiate a trial by holding fixation on a central red spot while a block of auditory stimuli was simultaneously presented. After the first 6 s of auditory stimulation, a trigger was sent to the scanner, starting the acquisition of an image volume (**Figure 1B**). Following acquisition and a random delay, the target sound (white noise) was presented, cueing a saccade to the left or to the right side as signaled at the beginning of each experimental session (**Figure 1A**). To provide feedback, after the response window, a yellow spot was shown indicating the correct target location. Finally, contingent on performance, the animal received a juice reward. An inter-trial interval of at least 2 s was enforced before the next trial could be initiated by fixation. Every sound presentation trial was followed by a "silence" trial, allowing for measurement of the baseline blood oxygen level dependent (BOLD) signal. Monkey 1 (M1) performed the task correctly on over 90% of the trials. Monkey 2 (M2) was not able to perform the saccadic go/no-go discrimination task with high accuracy and was therefore scanned while passively listening to the acoustic stimuli. To ensure stable attention, M2 was rewarded for successfully holding fixation throughout the trial.

## Auditory Stimuli

Three sound categories were used in the experiments: environmental sounds (Env), monkey vocalizations or calls (MC), and scrambled monkey calls (SMC). Spectrograms of example clips from each of these three categories are illustrated in **Figure 1C**. Environmental sounds were obtained from multiple online sources and from recordings made in our laboratory facilities (Kuśmierek and Rauschecker, 2009). They included the sounds of vehicles, cages, water, food containers, clocks, cameras, applause, coins, footsteps, chewing, heartbeats, horns, and telephones (n = 56). The mean duration of the Env stimuli was 1.14 s (range: 0.96–2.6 s). Monkey calls were obtained from recordings made outside our colony [M. Hauser and/or Laboratory of Neuropsychology (LN) library]. Monkey vocalizations (n = 63) consisted of grunts, barks, warbles, coos, and screams, as used in prior studies (Rauschecker et al., 1995; Tian et al., 2001; Kuśmierek et al., 2012). The mean duration of the vocalization stimuli was 0.67 s (range: 0.13–2.34 s). SMC were generated by randomly rearranging 200 ms by 1-octave tiles of the constant-Q spectrogram (Brown, 1991) for each monkey call and reconstructing a time-domain waveform with an inverse transform (Schörkhuber and Klapuri, 2010). Transposition along the time axis was not constrained, while transposition along the frequency axis was restricted to displacement by a single octave. For each trial, a random selection of stimuli from one class (MC, Env, or SMC) was arranged sequentially into a smooth auditory clip that lasted for the duration of the trial (8 s).

Sounds were presented through modified electrostatic in-ear headphones (SRS-005S + SRM-252S, STAX), mounted on ear-mold impressions of each animal's pinna (Sarkey Eden Prairie) and covered with a custom-made earmuff system for sound attenuation. To match loudness, the stimuli were played through the sound presentation system and re-recorded with a probe microphone (Brüel and Kjær, type 4182 SPL meter) inserted in the ear-mold of an anesthetized monkey. The recordings were then filtered with an inverted macaque audiogram (Jackson et al., 1999) to simulate the effect of different ear sensitivity at different frequencies, analogous to the dB(A) scale for humans. The stimuli were finally equalized so that they produced equal maximum root mean square (RMS) amplitude (using a 200-ms sliding window) in filtered recordings (Kuśmierek and Rauschecker, 2009). During experiments, all stimuli were amplified (Yamaha AX-496) and delivered at a calibrated RMS amplitude of ∼80 dB SPL.
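The equalization step can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes the recordings have already been filtered with the inverted audiogram, and the function names, the target value, and the synthetic clips are hypothetical placeholders. It computes the maximum RMS over a 200-ms sliding window and rescales each clip to a common value.

```python
import numpy as np

def max_sliding_rms(signal, fs, window_s=0.2):
    """Maximum RMS over all full windows of length window_s (seconds)."""
    x = np.asarray(signal, dtype=float)
    n = max(1, int(round(window_s * fs)))
    if x.size < n:  # clip shorter than one window: use the whole clip
        n = x.size
    # The cumulative sum of squares yields every window's energy in O(N).
    csum = np.concatenate(([0.0], np.cumsum(x ** 2)))
    window_energy = csum[n:] - csum[:-n]
    return float(np.sqrt(window_energy.max() / n))

def equalize(stimuli, fs, target_rms=0.1, window_s=0.2):
    """Scale each clip so its maximum sliding-window RMS equals target_rms."""
    return [np.asarray(s, dtype=float) * (target_rms / max_sliding_rms(s, fs, window_s))
            for s in stimuli]

# Hypothetical example: two synthetic noise clips with different levels.
fs = 44100
rng = np.random.default_rng(0)
clips = [0.5 * rng.standard_normal(fs), 0.05 * rng.standard_normal(2 * fs)]
balanced = equalize(clips, fs)
levels = [max_sliding_rms(c, fs) for c in balanced]
```

Matching the maximum windowed RMS, rather than the RMS of the whole clip, keeps a short loud event in a long quiet recording from being amplified beyond the level of the other stimuli.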

## Analyses of Sound Categories

A modulation spectrum analysis (Singh and Theunissen, 2003) was performed for each sound with the STRFpak Matlab toolbox (http://strfpak.berkeley.edu). We obtained a spectrogram of each sound by decomposing it into frequency bands using a bank of Gaussian filters (244 bands, filter width = 125 Hz). The filters were evenly spaced on the frequency axis (64–48,000 Hz) and separated from each other by one standard deviation. The decomposition resulted in a set of narrow-band signals, which were then cross-correlated with each other and themselves to yield a cross-correlation matrix. This matrix was calculated for time delays of ±150 ms, and the two-dimensional Fourier transform of this matrix was calculated to obtain the modulation spectrum of each sound (**Figure 1D**).
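As a rough illustration of what a modulation spectrum measures, the following Python sketch uses a simplified stand-in for the procedure described above: instead of the STRFpak filter bank and cross-correlation matrix, it takes the two-dimensional Fourier transform of a log spectrogram computed with a Gaussian-windowed short-time Fourier transform. All parameter values (window length, hop size, test signal) are illustrative assumptions, not the settings used in the study.

```python
import numpy as np

def log_spectrogram(x, fs, win_s=0.010, hop_s=0.005):
    """Log-power spectrogram (frequency x time) with a Gaussian taper."""
    x = np.asarray(x, dtype=float)
    n_win = int(round(win_s * fs))
    n_hop = int(round(hop_s * fs))
    k = np.arange(n_win) - (n_win - 1) / 2.0
    window = np.exp(-0.5 * (k / (n_win / 6.0)) ** 2)  # sigma = window/6 (arbitrary)
    n_frames = 1 + (x.size - n_win) // n_hop
    frames = np.stack([x[i * n_hop:i * n_hop + n_win] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Also return the frequency step (Hz) and the time step (s).
    return np.log(power + 1e-12).T, fs / n_win, n_hop / fs

def modulation_spectrum(x, fs):
    """2D power spectrum of the mean-removed log spectrogram."""
    spec, df, dt = log_spectrogram(x, fs)
    spec = spec - spec.mean()
    mod = np.abs(np.fft.fftshift(np.fft.fft2(spec))) ** 2
    spectral_mod = np.fft.fftshift(np.fft.fftfreq(spec.shape[0], d=df))  # cycles/Hz
    temporal_mod = np.fft.fftshift(np.fft.fftfreq(spec.shape[1], d=dt))  # Hz
    return mod, spectral_mod, temporal_mod

# Hypothetical test signal: white noise amplitude-modulated at 8 Hz.
fs = 16000
t = np.arange(0, 2.0, 1.0 / fs)
rng = np.random.default_rng(0)
x = (1.0 + 0.9 * np.sin(2 * np.pi * 8.0 * t)) * rng.standard_normal(t.size)
mod, ws, wt = modulation_spectrum(x, fs)
i, j = np.unravel_index(np.argmax(mod), mod.shape)
peak_temporal_hz = abs(wt[j])
peak_spectral = abs(ws[i])
```

For this test signal the energy concentrates at a temporal modulation near 8 Hz and zero spectral modulation, because the amplitude envelope is identical across frequency. Harmonic sounds such as vocalizations would instead add energy at non-zero spectral modulations.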

## Data Acquisition

Images were acquired with a horizontal MAGNETOM Trio 3-T scanner (Siemens) with a 60-cm bore diameter. A 12-cm custom-made saddle-shaped radiofrequency coil (Windmiller Kolster Scientific) covered the entire brain and was optimized for imaging the temporal lobe. The time series consisted of gradient-echo echo-planar (GE-EPI) whole-brain images obtained in a sparse acquisition design. Sparse sampling allows single volumes to be recorded coincidentally with the predicted peak of the evoked hemodynamic response (Hall et al., 1999). This helps to avoid contamination of the measured stimulus-specific BOLD response by the scanner-noise-evoked BOLD response. Further, by triggering acquisition 6 s after stimulus onset, the auditory stimulus was presented without acoustic interference from gradient-switching noise, typical of a continuous fMRI design. For the functional data, individual volumes with 25 ordinal slices were acquired with an interleaved single-shot GE-EPI sequence (TE = 34 ms, TA = 2.18 s, flip angle = 90°, field of view (FOV) = 100 × 100 mm<sup>2</sup>, matrix size = 66 × 66 voxels, slice thickness = 1.9 mm, voxel size = 1.5 × 1.5 × 1.9 mm<sup>3</sup>). On each experiment day, a low-resolution FLASH anatomical scan was acquired with the same geometry as the functional images (TE = 14 ms, TR = 3 s, TA = 2.18 s, FOV = 100 × 100 mm<sup>2</sup>, matrix = 512 × 512 voxels, slice thickness = 1.9 mm, number of averages = 2, flip angle = 150°). For overlaying our functional images, we created a high-resolution anatomical template (0.5 × 0.5 × 0.5 mm<sup>3</sup> isotropic voxels) by averaging five high-resolution anatomical scans acquired under general anesthesia with an MP-RAGE sequence (TE = 3.0 ms, TR = 2.5 s, flip angle = 8°, FOV = 116 × 96 × 128 mm<sup>3</sup>; matrix = 232 × 192 × 256 voxels).

#### Data Analysis

For M1, nine EPI runs (180 time points each) were acquired over six sessions. For M2, seven runs were acquired over four sessions. All data analyses were performed using AFNI (Cox, 1996) (http://afni.nimh.nih.gov/afni), FreeSurfer (Dale et al., 1999; Fischl et al., 1999) (http://surfer.nmr.mgh.harvard.edu/), SUMA (http://afni.nimh.nih.gov/) and custom code written in Matlab (MathWorks). Preprocessing involved slice timing correction, motion correction (relative to the run-specific mean GE-EPI), spatial smoothing with a 3.0 mm full width at half-maximum Gaussian kernel, and normalization of the time series at each voxel by its mean. All volumes that had motion values with shifts >0.5 mm and/or rotations >0.5° were excluded from further analyses. Lastly, we performed linear least-squares detrending to remove non-specific variations (i.e., scanner drift). Following preprocessing, data were submitted to generalized linear model analyses. The model included three stimulus-specific regressors and six estimated motion regressors of no interest. For each stimulus category (Env, MC, SMC) we estimated a regressor by convolving a one-parameter gamma distribution estimate of the hemodynamic response function with the square-wave stimulus function. We performed t-tests contrasting all sounds vs. baseline ("silence" trials), MC vs. Env, and MC vs. SMC. Finally, we co-registered and normalized our functional data to the population-average MRI-based template for rhesus monkeys 112RM-SL (McLaren et al., 2009) and then displayed the results on a semi-inflated cortical surface of the template extracted with FreeSurfer and displayed with SUMA to facilitate visualization and identification of cortical activations. The anatomical boundaries described here are based on the macaque brain atlas of Saleem and Logothetis (2012).
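The regression step can be sketched in Python as follows. This is a simplified illustration, not the AFNI pipeline used in the study: it omits the motion regressors and volume censoring, treats the time axis as continuous rather than sparse, and the gamma parameters (p = 8.6, q = 0.547), the block layout, and the simulated voxel are assumptions chosen for the example.

```python
import numpy as np

def gamma_hrf(tr, duration=20.0, p=8.6, q=0.547):
    """Gamma-variate response sampled every tr seconds (equals 1 at t = p*q)."""
    t = np.arange(0.0, duration, tr)
    return (t / (p * q)) ** p * np.exp(p - t / q)

def make_regressor(boxcar, tr):
    """Convolve a square-wave stimulus function with the gamma response."""
    return np.convolve(boxcar, gamma_hrf(tr))[:len(boxcar)]

def contrast_t(y, X, c):
    """Ordinary least-squares fit and t-statistic for the contrast vector c."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - np.linalg.matrix_rank(X)
    sigma2 = resid @ resid / dof
    var_c = sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c)
    return float(c @ beta / np.sqrt(var_c))

# Hypothetical design: 10-sample blocks cycling Env, MC, SMC, silence.
n, tr = 180, 1.0
block = (np.arange(n) // 10) % 4
boxcars = [(block == k).astype(float) for k in range(3)]  # Env, MC, SMC
X = np.column_stack([make_regressor(b, tr) for b in boxcars] + [np.ones(n)])

# Simulated voxel responding most strongly to MC.
rng = np.random.default_rng(0)
y = 100 + 0.5 * X[:, 0] + 2.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.5, n)
y = 100.0 * y / y.mean()  # normalize the time series by its mean

t_mc_vs_env = contrast_t(y, X, np.array([-1.0, 1.0, 0.0, 0.0]))
t_mc_vs_smc = contrast_t(y, X, np.array([0.0, 1.0, -1.0, 0.0]))
```

In this sketch the "silence" blocks are modeled implicitly by the intercept, so a contrast of all three stimulus regressors against zero would correspond to the comparison of all sounds vs. baseline.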

To quantify the lateralization of the BOLD response across hemispheres we measured a lateralization index [LI = (L<sub>h</sub> − R<sub>h</sub>)/(L<sub>h</sub> + R<sub>h</sub>)], where L<sub>h</sub> and R<sub>h</sub> are the mean responses in the left and right hemisphere, respectively, so that a positive index indicates a left-hemisphere bias and a negative index a right-hemisphere bias. The LI curve analyses ensure that the lateralization effect is not caused by small numbers of highly activated voxels across hemispheres. The LI curves were based on the t-values obtained from each contrast condition and were calculated using the LI-toolbox (Wilke and Lidzba, 2007) with the following options: ±5 mm mid-sagittal exclusive mask, clustering with a minimum of 5 voxels and default bootstrapping parameters (min/max sample size 5/10,000 and bootstrapping set to 25% of data). The bootstrapping method calculates LIs 10,000 times using different thresholds ranging from zero to the maximum t-value for a specific contrast condition. For each threshold a cut-off mean value is obtained from which a weighted mean (LI-wm) index value can then be calculated (Wilke and Lidzba, 2007). This yields a single value between −1 and 1 indicating right- or left-sided hemisphere dominance.

## Results

Our first goal was to identify brain regions involved in the processing of conspecific vocalizations by the macaque brain. To this end, we collected functional MR images of two monkeys in a horizontal 3-T scanner while stimuli from three different sound categories were presented to the animals. Complex sounds are characterized by having a wide range of spectrotemporal features. While environmental sounds typically contain sharp temporal onsets, monkey vocalizations contain greater modulations in the spectral domain because of the harmonics contained in these sounds. Environmental sounds also carry abstract information about the identity of objects, so a comparison between BOLD responses to monkey vocalizations and environmental sounds is useful in determining brain structures involved in higher-level processing. However, specific spectrotemporal differences exist between these two types of sounds. This can be seen, for instance, in the spectral modulation of monkey vocalizations at approximately 1.5–2 cycles/kHz, which is not present for other sound categories (**Figure 1D**). Thus, scrambled versions of monkey calls (SMC) were used to further control for the local spectrotemporal features in the vocalizations (see **Figure 1C** and Material and Methods). Comparison of average modulation spectra between categories showed that SMC were acoustically better matched to MC than Env (correlation coefficient between the modulation spectra: SMC vs. MC: 0.92, Env vs. MC: 0.86; **Figure 1D**).

Overall, sound stimulation elicited significant BOLD responses compared to silent trials irrespective of auditory stimulus category [q (FDR) < 0.05, p < 10<sup>−3</sup>, one-tailed t-test, t range: 2.3–10, cluster size > 10 voxels] in a broad network of brain regions, including subcortical auditory pathways, classical auditory areas of the superior temporal gyrus (STG), but also regions in parietal and prefrontal cortices (**Figure 2**). The clusters in **Figure 2A** highlight the main activation sites on the cortical surface of monkey M1. **Figure 2B** shows selected coronal slices for both animals (M1 and M2) showing activation in the ascending auditory pathway. These regions include the cochlear nucleus (CN), the inferior colliculus (IC), the medial geniculate nucleus (MGN), the primary auditory cortex (A1), and areas in the anterior superior temporal cortex, including the rostral (R) and anterolateral (AL) areas, the rostrotemporolateral area (RTL), and the rostrotemporal pole (RTp) region.

[Figure caption fragments: "… the superior temporal sulcus (STS), ventral intraparietal area (VIP), and the frontal pole (Fp). Activated dorsal-stream regions included the ips and …"; "… main auditory activation showed a right-hemisphere bias in both animals (M1, weighted mean = −0.33; M2, weighted mean = −0.66)."]

Activation clusters (averaged across animals and hemispheres) taken from a normalized number of voxels (i.e., equal number of left and right voxels) were found in: IC [N = 84 voxels, peak coordinate = (4, −1, 12)]; A1 [N = 198 voxels, peak coordinate = (22, 6, 24)]; R/AL [N = 131 voxels, peak coordinate = (24, 17, 12)]; and RTL/RTp [N = 165 voxels, peak coordinate = (23, 22, 8)].
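The q (FDR) < 0.05 criterion applied to the all-sounds contrast controls the expected proportion of false positives among the voxels declared active. The text does not state which FDR procedure was used, so the Python sketch below assumes the standard Benjamini-Hochberg step-up rule. The p-values in the example are made up for illustration.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of tests rejected at FDR level q (step-up rule)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Find the largest rank k with p_(k) <= (k / m) * q.
    below = ranked <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[:k + 1]] = True  # reject everything up to rank k
    return reject

# Hypothetical voxel-wise p-values.
p_example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(p_example, q=0.05)
n_significant = int(significant.sum())
```

Only the two smallest p-values survive here, even though five are below 0.05 uncorrected, which illustrates why contrasts with weaker effects may show no voxels after correction.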

For both animals we observed a larger amplitude and spatial extent of the BOLD response in the right hemisphere as compared to the left hemisphere (**Figure 2B**). Activation (percent signal change) in selected clusters for each hemisphere is shown in **Figure 2C**. We compared the activation between the two hemispheres by calculating a laterality index (LI), with a positive index indicating a left-hemisphere bias and a negative index indicating a right-hemisphere bias. Given the fact that LIs show a threshold dependency (Nagata et al., 2001), we measured LI curves to provide a more comprehensive estimate over a whole range of thresholds (Wilke and Lidzba, 2007). Using this adaptive thresholding approach we found a right-hemisphere bias in the LI curves for general auditory activation (all sounds vs. baseline) in both monkeys (M1, weighted mean = −0.33; M2, weighted mean = −0.66). For higher thresholds, the activation was clustered in primary auditory cortex (A1) of the right hemisphere in each animal.
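A threshold-dependent LI curve of the kind described above can be sketched in Python as follows. This is a simplified illustration and not the LI-toolbox algorithm: it omits bootstrapping, clustering, and the mid-sagittal mask, and weighting each LI by its threshold is an assumption made for this example. The sign convention follows the text (positive = left-hemisphere bias), that is, LI = (L − R)/(L + R); the t-values are hypothetical.

```python
import numpy as np

def lateralization_index(left, right):
    """LI = (L - R) / (L + R); NaN when no voxel survives in either hemisphere."""
    total = left + right
    return (left - right) / total if total > 0 else float("nan")

def li_curve(t_left, t_right, n_thresholds=20):
    """LI of summed supra-threshold t-values at thresholds from 0 up to max t."""
    t_left = np.asarray(t_left, dtype=float)
    t_right = np.asarray(t_right, dtype=float)
    t_max = max(t_left.max(), t_right.max())
    thresholds = np.linspace(0.0, t_max, n_thresholds, endpoint=False)
    lis = np.array([
        lateralization_index(t_left[t_left > th].sum(), t_right[t_right > th].sum())
        for th in thresholds
    ])
    return thresholds, lis

def weighted_mean_li(thresholds, lis):
    """Mean LI weighted by threshold, so stricter thresholds count more."""
    valid = ~np.isnan(lis)
    weights = thresholds[valid]
    if weights.sum() == 0:
        return float("nan")
    return float(np.sum(weights * lis[valid]) / np.sum(weights))

# Hypothetical t-values with stronger right-hemisphere activation.
t_left = [1.0, 2.0, 3.0]
t_right = [2.0, 4.0, 6.0, 8.0]
thresholds, lis = li_curve(t_left, t_right)
li_wm = weighted_mean_li(thresholds, lis)
```

Because the right-hemisphere values are larger at every threshold, the curve stays negative throughout and the weighted mean is negative, matching the direction of the right-hemisphere bias reported for both animals.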

Vocalizations are complex naturalistic stimuli that contain behaviorally relevant information. In order to investigate if the auditory system contained representations that are sensitive to this sound category vs. other types of behaviorally relevant complex sounds, we contrasted monkey calls against environmental sounds (see Material and Methods). Environmental sounds also carry abstract information about object identity in their spectrotemporal patterns. We, therefore, also looked for areas showing elevated response to these sounds relative to monkey vocalizations. When correcting for multiple comparisons [q (FDR) < 0.05], no differences were observed for the contrast of MC vs. Env. However, at uncorrected thresholds, we found significantly higher activations by MC as compared to Env in both monkeys across regions in temporal, parietal and prefrontal cortices (M1, p < 10<sup>−3</sup> uncorrected, t-value range: −4.2 to 6.1, cluster size > 5 voxels; M2, p < 10<sup>−2</sup> uncorrected, t-value range: −3.6 to 5.9, cluster size > 5 voxels) (**Figure 3A**). Specifically, activations sensitive to MC were found in the anterior STG region, including areas AL and RTp of the rostral belt/parabelt, and further along the auditory ventral stream in ventrolateral prefrontal cortex (vlPFC). In addition, we observed activation patches in the inferior parietal lobule (areas PF/PFG) of the right parietal cortex, and bilaterally inside the inferior branch of the arcuate sulcus, possibly corresponding to Brodmann's area (BA) 44, and posterior to the arcuate sulcus, in a region that is part of ventral premotor cortex (PMv). In addition, we found regions sensitive to environmental sounds (blue) along the superior temporal sulcus (STS) and inferotemporal (IT) cortex. To investigate hemispheric lateralization in the processing of vocalizations, we measured LI curves for this contrast (MC > Env), finding a slight right-hemisphere bias in monkey M1 (weighted mean = −0.19) and a moderate right-hemisphere bias in monkey M2 (weighted mean = −0.42).

In order to determine whether spectrotemporal features alone could have driven the activation in these areas, we further contrasted monkey calls (MC) with scrambled monkey calls (SMC). The results showed similar patterns of MC activation in both monkeys, specifically in the RTL region of the aSTG (M1, p < 10<sup>−3</sup> uncorrected, t-value range: −4.8 to 7.5, cluster size > 5 voxels; M2, p < 10<sup>−2</sup> uncorrected, t-value range: −4.3 to 6.1, cluster size > 5 voxels) (**Figure 3B**). In monkey M2, a second region, the middle medial belt (MB), was also more strongly activated by monkey vocalizations than by their scrambled counterparts. The weighted-mean lateralization index (LI) for this contrast (MC > SMC) also showed higher values toward the right hemisphere (M1: weighted mean = −0.34; M2: weighted mean = −0.44). A summary is shown in **Table 1**.

Some differences in the patterns of activity were observed across the two animals. These differences might be explained either by variability across subjects or by differences in attentional state: M1 was significantly engaged in completing the task (>90% success), whereas M2 was scanned passively while holding fixation. To compensate for this variability, we calculated the minimum t-statistic (p < 0.01 uncorrected) across contrasts in each monkey (a conjunction test) and across monkeys in each contrast (**Figure 4**). Conjunction across contrasts (MC > Env and MC > SMC) and monkeys (M1 and M2) found a single area in the right hemisphere to be specifically involved across both conjunction analyses, area RTL/RTp (peak coordinate: 24, 17, 12).
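The minimum-statistic conjunction described above can be sketched in a few lines of Python. The t-maps and the threshold value are hypothetical placeholders. A voxel passes only if every contributing map exceeds the threshold, which is equivalent to thresholding the voxel-wise minimum.

```python
import numpy as np

def conjunction(t_maps, t_threshold):
    """Voxel-wise minimum t across maps, plus the supra-threshold mask."""
    stacked = np.stack([np.asarray(m, dtype=float) for m in t_maps])
    min_t = stacked.min(axis=0)
    return min_t, min_t > t_threshold

# Hypothetical t-values for five voxels in two contrasts of one monkey.
t_mc_env = [3.1, 0.5, 4.0, 2.6, -1.0]
t_mc_smc = [2.9, 3.5, 1.0, 2.7, 2.0]
min_t, mask = conjunction([t_mc_env, t_mc_smc], t_threshold=2.4)
```

The same function applies to the conjunction across monkeys: the list would then hold the M1 and M2 maps for a single contrast, after both have been brought into the common template space.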

## Discussion

Species-specific vocalizations in non-human primates ("monkey calls") convey important information about affective/emotional states as well as the recognition of objects and individuals (Ghazanfar and Hauser, 1999). We used whole-brain functional magnetic resonance imaging (fMRI) in awake behaving monkeys to examine auditory responses to stimuli from three different sound categories: (a) multiple types of conspecific monkey calls, (b) environmental sounds, and (c) scrambled versions of the same monkey calls largely preserving their local spectrotemporal features.

For all three sound categories combined we found robust BOLD responses along various regions in the ascending auditory pathways (CN, IC, MGB, and A1, **Figures 2A,B**). These results, using a 3-T scanner without contrast agent, corroborate previous fMRI findings obtained on a 1.5-T magnet with the contrast agent MION, showing activation by complex sounds along the auditory pathway (Joly et al., 2012a). The results further attest to the fact that complex sounds are highly effective for mapping subcortical and cortical auditory structures (Rauschecker et al., 1995; Rauschecker, 1998b; Poremba et al., 2003). Furthermore, our results confirm the general trend of a slight right-hemisphere bias (**Table 1**) in the processing of complex sounds in the macaque auditory cortex, as measured with fMRI (Petkov et al., 2008; Joly et al., 2012a). Similar results have been found in humans for non-speech voice sounds (Belin et al., 2000).

TABLE 1 | LI-weighted-mean values for the overall sound activation and for each contrast condition.

| Contrast | M1 (LI-wm) | M2 (LI-wm) |
|---|---|---|
| All sounds vs. baseline | −0.33 | −0.66 |
| MC > Env | −0.19 | −0.42 |
| MC > SMC | −0.34 | −0.44 |

Mean lateralization index values (LI-wm) are shown that were obtained from LI curves measured as a function of the statistical threshold (t-value) for the overall auditory activation (all sounds vs. baseline), for the contrast between monkey calls and environmental sounds (MC > Env) and for the contrast between monkey calls and scrambled monkey calls (MC > SMC). A positive index indicates a left-hemisphere bias, while a negative index indicates a right-hemisphere bias. LI-wm values are shown separately for monkeys M1 and M2.

When we compared activations produced by monkey vocalizations vs. the other two sound categories using a conjunction analysis, we found consistent activations in regions along the anterior STG, in particular in areas AL, RTL and RTp, in both animals (**Figure 4**). Our results extend previous findings of increased sensitivity to monkey vocalizations in anterior STG regions (Poremba et al., 2003; Petkov et al., 2008; Kikuchi et al., 2010; Joly et al., 2012a,b; Fukushima et al., 2014) by using control stimuli (SMC) that retained the low-level acoustic information of macaque vocalizations and whose acoustic structure was better matched to the vocalizations than the acoustic structure of other complex sounds (**Figure 1D**). Single-unit studies of the R/AL region have also found increased selectivity either to monkey calls, or to sound categories including vocalizations (Tian et al., 2001; Kuśmierek et al., 2012), consistent with the present results (**Figures 3, 4**).

Thus, the cortical representation of vocalizations involves an auditory ventral pathway, consisting of a chain of interconnected regions in anterior STG and vlPFC that extract abstract information for the recognition and categorization of vocalizations (Rauschecker, 2012). The rostral belt, parabelt and aSTG send afferent projections into ventrolateral, polar, orbital, and medial regions of the prefrontal cortex (PFC) (Jones and Powell, 1970; Hackett et al., 1999; Romanski et al., 1999; Cavada et al., 2000; Kaas and Hackett, 2000; Hackett, 2011; Yeterian et al., 2012), and together these regions form the ventral cortical stream in audition. Vocalization-sensitive neurons are found along with face-sensitive neurons in the vlPFC (Romanski et al., 2005), allowing these regions to integrate vocalizations with the corresponding facial gestures (Romanski and Goldman-Rakic, 2002; Cohen et al., 2007; Diehl and Romanski, 2014). The PFC is involved in higher-level integrative processes for the cognitive control of vocalizations as well as in the interpretation of semantic content in vocalizations (Romanski and Averbeck, 2009). The activation patterns observed in PFC (**Figure 3A**) could represent categorical or affective information reflected in the vocalizations. Further imaging studies and multivariate analyses comparing multiple vocalization types might elucidate the differential contribution of each subregion of the PFC.

Our stimuli also activated higher-level visual areas, such as the middle temporal (MT) and inferior temporal areas (IT). These areas are known to be involved in the processing of visual motion (Maunsell and Van Essen, 1983) and in object perception (including faces), respectively (Tsao et al., 2006; Ku et al., 2011). Their activation by purely auditory stimuli raises interesting questions regarding their possible role in the multisensory processing of dynamic audio-visual stimuli, such as facial expressions that naturally occur in conjunction with vocalizations and/or motion of the face (Furl et al., 2012; Polosecki et al., 2013; Perrodin et al., 2014). However, to answer these questions more definitively, further imaging experiments utilizing dynamic audio-visual stimuli would be necessary. Such studies could enlighten us on how auditory information combines with visual information in both the ventral and dorsal pathways, building multimodal representations from dynamic facial expressions combined with vocalizations (Ghazanfar and Logothetis, 2003).

When we contrasted monkey calls to environmental sounds, we also found differential activation in regions PF/PFG (area 7b) (Pandya and Seltzer, 1982; Rozzi et al., 2006) of the inferior parietal lobule (IPL), in addition to the well-known regions in the STG sensitive to monkey vocalizations. Parietal regions inside the intraparietal sulcus (IPS) have been known to receive auditory projections (Lewis and Van Essen, 2000) and to contain neurons that respond to auditory and multimodal stimuli (Stricanne et al., 1996; Bushara et al., 1999; Grunewald et al., 1999; Cohen and Andersen, 2000; Cohen, 2009), but the role of these regions has traditionally been assumed to lie in spatial processing and control of eye movements.

Similarly, we found an engagement of the ventral premotor cortex (PMv) in the processing of monkey vocalizations (**Figure 3A**). This region has previously been thought to be involved in the processing of the location (but not quality) of nearby sounds (Graziano et al., 1999). Surprisingly, when we compared the effects of vocalizations (MC) against vocalizations that were scrambled in both the spectral and temporal domains (SMC), we did not observe greater activation in parietal or prefrontal areas for MC, suggesting that the scrambled versions of the MC evoked the same amount of activity in these regions. Similar results were obtained by Joly et al. (2012b) with temporally scrambled vocalizations activating large regions of premotor and parietal cortices. Ventral premotor cortex (PMv) has also been implicated in the initiation of vocalizations in the macaque monkey (Hage and Nieder, 2013). It appears possible, therefore, that the same neurons are the source of an efference copy signal (Kauramäki et al., 2010), which is responsible for the suppression of auditory cortex during self-initiated vocalizations (Eliades and Wang, 2003). More generally, they could be part of an audio-motor network connecting perception and production of sounds (Rauschecker and Scott, 2009; Rauschecker, 2011).

## Author contributions

MO co-designed the study, trained the animals, programmed stimulus presentation, acquired part of the data, conducted most analyses, and co-wrote the manuscript. PK programmed the behavioral task and participated in writing the manuscript. DA trained the animals and acquired part of the data. ID generated the scrambled stimuli and acquired part of the data. FA contributed to data analyses and participated in writing the manuscript. GK interpreted data and participated in writing the manuscript. MS, IJ, and JR co-designed the study and participated in writing the manuscript.

## Acknowledgments

Special thanks to Josie Cui for animal care and assistance with the experiments, and John VanMeter for fMRI data optimization. This work was supported by grants from the National Institutes of Health (R01-DC03489, R01-NS052494, and R56-NS052494 to JR), a PIRE Grant from the National Science Foundation (OISE-0730255 to JR), and a FiDiPro award from the Academy of Finland (JR).

## References


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Ortiz-Rios, Kuśmierek, DeWitt, Archakov, Azevedo, Sams, Jääskeläinen, Keliris and Rauschecker. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Anatomical pathways for auditory memory II: information from rostral superior temporal gyrus to dorsolateral temporal pole and medial temporal cortex

M. Muñoz-López<sup>1,2</sup>\*, R. Insausti<sup>2</sup>, A. Mohedano-Moriano<sup>2</sup>, M. Mishkin<sup>1</sup> and R. C. Saunders<sup>1</sup>

<sup>1</sup> Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA, <sup>2</sup> Human Neuroanatomy Laboratory and Regional Centre for Biomedical Research (CRIB), School of Medicine, University of Castilla-La Mancha, Albacete, Spain

#### Edited by:

Marc Schönwiesner, University of Montreal, Canada

#### Reviewed by:

James Bigelow, University of Iowa, USA Lisa De La Mothe, Tennessee State University, USA

#### \*Correspondence:

M. Muñoz-López, School of Medicine, University of Castilla-La Mancha, Avenida Almansa 14, 02006 Albacete, Spain monica.munozlopez@uclm.es

#### Specialty section:

This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience

> Received: 23 January 2015 Accepted: 16 April 2015 Published: 18 May 2015

#### Citation:

Muñoz-López M, Insausti R, Mohedano-Moriano A, Mishkin M and Saunders RC (2015) Anatomical pathways for auditory memory II: information from rostral superior temporal gyrus to dorsolateral temporal pole and medial temporal cortex. Front. Neurosci. 9:158. doi: 10.3389/fnins.2015.00158

Auditory recognition memory in non-human primates differs from recognition memory in other sensory systems. Monkeys learn the rule for visual and tactile delayed matching-to-sample within a few sessions, and then show one-trial recognition memory lasting 10–20 min. In contrast, monkeys require hundreds of sessions to master the rule for auditory recognition, and then show retention lasting no longer than 30–40 s. Moreover, unlike the severe effects of rhinal lesions on visual memory, such lesions have no effect on the monkeys' auditory memory performance. The anatomical pathways for auditory memory may differ from those in vision. Long-term visual recognition memory requires anatomical connections from the visual association area TE with areas 35 and 36 of the perirhinal cortex (PRC). We examined whether there is a similar anatomical route for auditory processing, or whether poor auditory recognition memory may reflect the lack of such a pathway. Our hypothesis is that an auditory pathway for recognition memory originates in the higher order processing areas of the rostral superior temporal gyrus (rSTG), and then connects via the dorsolateral temporal pole to access the rhinal cortex of the medial temporal lobe. To test this, we placed retrograde (3% FB and 2% DY) and anterograde (10% BDA, 10,000 MW) tracer injections in rSTG and the dorsolateral area 38DL of the temporal pole. Results showed that area 38DL receives dense projections from auditory association areas Ts1, TAa, TPO of the rSTG, from the rostral parabelt and, to a lesser extent, from areas Ts2-3 and PGa. In turn, area 38DL projects densely to area 35 of PRC, entorhinal cortex (EC), and to areas TH/TF of the posterior parahippocampal cortex. Significantly, this projection avoids most of area 36r/c of PRC. This anatomical arrangement may contribute to our understanding of the poor auditory memory of rhesus monkeys.

Keywords: auditory, memory, superior temporal gyrus, primate, temporal pole, medial temporal cortex

## Introduction

Primates have a surprisingly poor ability to store auditory sensory information into long-term memory (Fritz et al., 2005; Scott et al., 2012). This contrasts with their remarkable capability to form long-term visual and tactile memories (Murray and Mishkin, 1984; Goulet and Murray, 2001). As tested with the delayed nonmatching to sample (DNMS) task, visual recognition memory is learned quickly and displays a high level of performance at long delays or with many items to remember. In contrast, an auditory version of the DMS/DNMS tasks is very difficult to learn, taking many thousands of trials and months to acquire the basic rule and, once learned, performance is poor, with monkeys unable to remember more than a single stimulus and only for a few seconds. Similarly, recent behavioral data in humans show a better ability to remember visual and tactile information than that presented in the auditory modality (Bigelow and Poremba, 2014). As noted in our earlier paper of this series (Muñoz-López et al., 2010), comparison of the visual, tactile, and auditory anatomical pathways might provide us with an explanation as to the difference in recognition memory ability.

The visual system is organized into ventral and dorsal processing streams, with the ventral stream important for object identity and recognition memory (Mishkin and Ungerleider, 1982; Kravitz et al., 2013). The ventral stream is described as organized anatomically in a hierarchical series of connections characterized functionally by the processing of increased stimulus complexity (i.e., 3D objects) at progressively more rostral areas (Mishkin and Ungerleider, 1982; Desimone, 1996; Nakamura and Kubota, 1996; Tanaka, 1996; Kravitz et al., 2013). This processing stream originates in the striate cortex (V1) and courses through the occipitotemporal cortex (V4, TEO) to its anterior temporal target (area TE, Kravitz et al., 2013). Area TE then projects into the memory related areas of the medial temporal cortex, i.e., perirhinal (PRC), posterior parahippocampal (PHC) cortices, and from these to the entorhinal (EC) cortex (Suzuki and Amaral, 1994a,b). Furthermore, tactile information reaches area 35 of the PRC from higher processing somatosensory insular area SII (Friedman et al., 1986). Damage to these rhinal cortical areas results in a severe visual (Meunier et al., 1993; Malkova et al., 2001) but also tactile recognition memory impairment (Goulet and Murray, 2001).

Ventral and dorsal processing streams have also been described anatomically with respect to audition (Romanski et al., 1999). However, the details on the anatomy and function of the auditory ventral stream are still poorly understood. The auditory ventral stream, thought to be important for processing information about stimulus identity, originates in primary core areas A1/R/RT and courses rostrally in a multistep fashion within the STP and in parallel through the belt areas RM, AL, RTL, RTM (Kaas and Hackett, 2000, see **Figure 1**). From these rostral belt areas connections course downstream within the parabelt (Ts3), to areas Ts2 and Ts1 on the dorsolateral surface of rSTG (Galaburda and Pandya, 1983; Pandya and Yeterian, 1984) and make their way as far rostral as the dorsolateral temporal pole. Functional imaging studies suggest that the rostral STP and dorsolateral temporal pole are important for processing of complex stimuli such as species-specific calls (Gil-da-Costa et al., 2004; Poremba et al., 2004; Petkov et al., 2008). More specifically, neural responses from the belt/core areas have short latencies to basic acoustic properties of sounds (i.e., frequencies, Tian et al., 2001) while responses in the anterior belt and parabelt have longer latencies, and respond more selectively to complex sounds such as monkey calls (Kikuchi et al., 2010; Perrodin et al., 2011; Fukushima et al., 2014). Taken together, the data suggests a rostrally directed stimulus identity processing stream in STG.

It would appear that direct connections between the auditory association areas of the superior temporal gyrus (STG) with the medial temporal cortex might also underlie recognition memory for sounds. However, monkeys do not appear to have very good auditory recognition memory, at least as tested using conventional tests. This poor auditory memory may be reflected in a difference in the anatomical organization of the auditory system with the medial temporal cortex.

The aim of the present report is to examine the auditory projections from the rostral auditory association areas into areas 35 and 36 of PRC, EC, and areas TH and TF of PHC (see **Figure 1B**). To investigate this anatomical pathway, we examined first the auditory cortical afferent connections to the dorsolateral temporal pole (area 38DL) by means of retrograde injections in 38DL and anterograde tracer injections in the rostral STP and rSTG. The second step was to determine the pattern of efferent projections from rSTG areas and 38DL to EC, PRC, and PHC by means of anterograde tracer injections into 38DL and Ts2, Ts3, and RTL.

**Abbreviations:** 35, area 35 of the perirhinal cortex (Brodmann, 1909); 36<sup>r</sup>, rostral division of area 36 of the perirhinal cortex (Insausti et al., 1987); 36c, caudal division of area 36 of the perirhinal cortex (Insausti et al., 1987); 36pDM, dorsal medial division of the temporal pole; 36pVM, ventral medial division of the temporal pole; 38DL, dorsal lateral division of the temporal pole; 38VL, ventral lateral division of the temporal pole; A1, area A1 (Kaas and Hackett, 2000); AL, area AL (Kaas and Hackett, 2000); amts, anterior middle temporal sulcus; cc, corpus callosum; CL, area CL (Kaas and Hackett, 2000); CM, area CM (Kaas and Hackett, 2000); DM, area DM (Kaas and Hackett, 2000); EC, entorhinal cortex; E<sub>C</sub>, caudal subfield of EC (Amaral et al., 1987); ECL, caudal limiting subfield of EC (Amaral et al., 1987); E<sup>I</sup>, intermediate subfield of EC (Amaral et al., 1987); ELc, lateral caudal subfield of EC (Amaral et al., 1987); ELr, lateral rostral subfield of EC (Amaral et al., 1987); EO, olfactory subfield of EC (Amaral et al., 1987); ER, rostral subfield of EC (Amaral et al., 1987); Ia, insula; IPa, area IPa (Seltzer and Pandya, 1978, 1989); la, lateral sulcus; MM, area MM (Kaas and Hackett, 2000); PaI, parainsular cortex; PHC, areas TH and TF of the parahippocampal cortex; PGa, area PGa (Seltzer and Pandya, 1978, 1989); pmts, posterior middle temporal sulcus; PRC, areas 35 and 36 of the perirhinal cortex; R, area R (Kaas and Hackett, 2000); RM, area RM (Kaas and Hackett, 2000); rs, rhinal sulcus; RT, area RT (Kaas and Hackett, 2000); RTL, area RTL (Kaas and Hackett, 2000); RTM, area RTM (Kaas and Hackett, 2000); STG, superior temporal gyrus; STP, superior temporal plane; ts, superior temporal sulcus; TAa, area TAa (Seltzer and Pandya, 1978, 1989); TE, area TE (Von Bonin and Bailey, 1947); TF, area TF (Von Bonin and Bailey, 1947); TF<sup>l</sup>, lateral division of area TF (Insausti et al., 1987); TFm, medial division of area TF (Insausti et al., 1987); TH, area TH (Von Bonin and Bailey, 1947); THc, caudal division of area TH (Insausti et al., 1987); TH<sup>r</sup>, rostral division of area TH (Insausti et al., 1987); TPC, temporal pole cortex; TPO, area TPO (Seltzer and Pandya, 1978, 1989); Tpt, area Tpt (Seltzer and Pandya, 1978, 1989); Ts1, area Ts1 (Seltzer and Pandya, 1978, 1989); Ts2, area Ts2 (Seltzer and Pandya, 1978, 1989); Ts3, area Ts3 (Seltzer and Pandya, 1978, 1989).

## Materials and Methods

## Subjects

Rhesus monkeys (Macaca mulatta, N = 14) of both sexes weighing between 6.0 and 10.0 kg and Cynomolgus monkeys (Macaca fascicularis, N = 2, males) weighing between 3.0 and 5.0 kg were used in this study. Five Rhesus monkeys (M3, M6, M7, M8, M12) had undergone forebrain commissurotomy prior to tracer injections and were used as part of a previous study (Muñoz et al., 2009). Experiments were carried out in strict adherence to the Guide for the Care and Use of Laboratory Animals (Clark et al., 1997), under an approved NIMH Animal Study Proposal, and in accordance with the European Union rules for care and use of animals (UE 86/609/CEE), with the supervision and approval of the Ethical Committee of Animal Research of the University of Castilla-La Mancha (UCLM), Spain.


## Tracers

Surgical details have been described previously (Muñoz et al., 2009). Discrete 1 µl injections of the fluorescent retrograde tracers Fast Blue and Diamidino Yellow (FB and DY, Sigma Chemical Co., St. Louis, MO) suspended in distilled water at concentrations of 3% (FB) and 2% (DY), and of the anterograde tracer biotin dextran amine (BDA, 10,000 MW, Molecular Probes, Eugene, OR) at a concentration of 10% in 0.01 M phosphate buffer, were made with a Hamilton syringe at a depth of 1.5–2 mm below the cortical surface. A total of nineteen tracer injections aimed at the rostral regions of the STG were analyzed.

## Tissue Processing

After a survival period of 2 weeks, the animals were deeply anesthetized and transcardially perfused with 4% paraformaldehyde. The brains were cryoprotected, quickly frozen in −80°C isopentane, and then cut in the coronal plane at 50 µm with a sliding microtome coupled to a freezing stage (Muñoz et al., 2009). Eight one-in-10 series were processed as follows: two series were immediately mounted onto subbed slides and air-dried, with one series then stored at −80°C, and the second series stored at −20°C in air-tight boxes. Both series were coverslipped with 0.4 M potassium bicarbonate, protected from light, and used for fluorescent retrograde label analysis. A third series was stained with thionin and used for cytoarchitectonic evaluation. A fourth series was processed to visualize BDA transport as follows: (a) 30 min incubation in 1% hydrogen peroxide; (b) incubation in 0.5 µg/ml streptavidin-horseradish peroxidase conjugate (Molecular Probes, Inc., Eugene, OR) in 0.05 M Tris buffer (pH 7.6) for 4 h at room temperature and then overnight at 4°C (the two Macaca fascicularis brains were processed for BDA visualization following the same avidin-biotin principle, but with the avidin-biotin complex [ABC, Vector Ltd., Peterborough, UK] for 2 h at room temperature); (c) development in a 0.025% 3,3′-diaminobenzidine tetrahydrochloride reaction (DAB, Sigma Co., St. Louis, MO) with 0.075% hydrogen peroxide and 0.1–0.2% nickel sulfate in 0.05 M Tris buffer (pH 8.0) to intensify the staining and minimize background (3–5 min). Sections were then mounted, dried, and coverslipped. In two animals, a fifth series was stained with a modified Gallyas procedure to visualize fibers (Gallyas, 1979). The sixth, seventh, and eighth series were processed to visualize parvalbumin, cytochrome-oxidase, and acetylcholinesterase, which were used to delimit core and belt auditory cortex boundaries. For parvalbumin immunohistochemistry, a lyophilized mouse monoclonal IgG1 primary antibody (1:8000; Swant, Bellinzona, Switzerland) and a biotinylated horse anti-mouse IgG secondary antibody (1:200; Vector, Burlingame, CA) were used. Visualization started with streptavidin-horseradish peroxidase conjugate in 0.05 M TBS (pH 7.6) and was completed in 0.025% DAB with 0.075% hydrogen peroxide in 0.05 M TBS (pH 8.0). Cytochrome-oxidase staining was based on the method of Wong-Riley (1979). Briefly, sections were incubated for 1–2 h at 37°C in 0.4% cytochrome C (Sigma Chemical Co., St. Louis, MO) in 0.1 M PB (pH 7.4) with 0.6% DAB and 0.04% glycerol. Sections were rinsed in 0.1 M PB, air dried, and coverslipped. Acetylcholinesterase staining was based on the procedure of Tago et al. (1986) as described in Turchi et al. (2005).

## Data Analysis

Individual retrogradely labeled fluorescent cells and anterogradely labeled axons in the cerebral cortex of the hemisphere ipsilateral to the injections were plotted from coronal sections 1 mm apart at a magnification of 20× with the aid of an Axiophot Zeiss microscope equipped with a digital video camera (CCD, Optronics, Goleta, CA) and an image analysis system (Bioquant Nova, R&M Biometrics Inc., Nashville, TN). Cytoarchitectonic divisions were analyzed in adjacent thionin sections and were superimposed on sections with anterograde and retrograde label with the aid of a camera lucida. The two Macaca fascicularis cases (102BDA and 302BDA) were analyzed with an Olympus B50 microscope, and labeled fibers were drawn with a camera lucida at a magnification of 20×. Two-dimensional, unfolded maps were constructed for each monkey's temporal lobe following the procedure of Van Essen and Maunsell (1980) (see **Figure 2**). We used the rhinal sulcus as a reference to extend the temporal cortex outline along the layer IV/V boundary (Muñoz and Insausti, 2005). Label in the unfolded maps is depicted in green for layers II–III and in black for layers V–VI.
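The sampling arithmetic implied by the sectioning and plotting parameters can be made explicit with a minimal sketch. The values are the ones stated in the text (50 µm sections, one-in-10 series, plotting from sections 1 mm apart); the function and variable names are hypothetical, and the sketch assumes no tissue shrinkage and that the plotted sections were drawn from a single series, neither of which is stated in the text.

```python
# Illustrative arithmetic only; names are hypothetical and not from the study.
SECTION_THICKNESS_UM = 50   # coronal sections cut at 50 µm (from the text)
SERIES_INTERVAL = 10        # one-in-10 series (from the text)
PLOT_SPACING_UM = 1000      # label plotted from sections 1 mm apart (from the text)


def series_spacing_um(thickness_um: int, interval: int) -> int:
    """Distance between consecutive sections belonging to the same series."""
    return thickness_um * interval


def plotting_step(plot_spacing_um: int, spacing_um: int) -> int:
    """Number of series sections to advance between two plotted sections."""
    if plot_spacing_um % spacing_um != 0:
        raise ValueError("plot spacing is not a multiple of the series spacing")
    return plot_spacing_um // spacing_um


spacing = series_spacing_um(SECTION_THICKNESS_UM, SERIES_INTERVAL)
step = plotting_step(PLOT_SPACING_UM, spacing)
print(f"series spacing: {spacing} µm; plot every {step} sections of a series")
```

Under these assumptions, consecutive sections within one series lie 500 µm apart, so plotting at 1 mm intervals corresponds to using every second section of a series.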

## Nomenclature

The approximate location of the architectonic subdivisions of the superior temporal plane (STP), STG, temporal pole cortex (TPC), inferior temporal gyrus (ITG), and the medial temporal cortex are indicated in **Figures 1**, **2**. Rhesus and Cynomolgus monkeys share the cytoarchitectonic features of the areas studied here, with no major differences other than the exact boundary location, and therefore, we used the same nomenclature for both species.

## Temporal Pole

The TPC extends anteriorly from the rostral tip of the superior temporal sulcus (ts) to the tip of the temporal lobe. The caudal limit medially is near the limen insulae, where it borders with the agranular insular cortex. TPC has been identified as a separate cytoarchitectonic area in humans (area 38 of Brodmann, 1909) and in monkeys (area TG of Von Economo, 1927) and later by Von Bonin and Bailey (1947) (for historical comparative review see Insausti, 2013). Studies of the anatomy of the temporal pole have distinguished an isocortical lateral portion and a medial portion with a more limbic appearance. TPC has been divided into approximately four quadrants according to their laminar organization, with special reference to the presence or absence of layer IV (Moran et al., 1987; Gower, 1989; Kondo et al., 2003). The lateral and medial subdivisions have also been identified in humans, where the lateral temporopolar cortex (TPCl) is related architectonically with the STG, and the medial temporopolar division (TPCm) is closer anatomically to the limbic cortex (Blaizot et al., 2010). In the monkey, our previous cytoarchitectonic descriptions of the medial temporal cortex, and those of others, have included the temporal pole as part of area 36 of PRC given that it shares some architectonic features and has connections with EC (Insausti et al., 1987; Suzuki and Amaral, 1994b, 2003; Blaizot et al., 2004; Lavenex et al., 2004). We have retained the term 36p for the medial side of the temporal pole. Area 36p can be subdivided into a dorsomedial (36pDM) and a ventromedial (36pVM) portion. The lateral aspect of the temporal pole resembles the cytoarchitectonics of the adjacent neocortical areas of the STG and ITG, and therefore, we used the term area 38 of Brodmann. We further divided area 38 into dorsolateral (38DL) and ventrolateral (38VL) divisions.

FIGURE 4 | Location of retrograde tracer injections (FB and DY) in different dorsoventral parts of the lateral temporal pole and location of anterograde (BDA) tracer injections in areas RTL, Ts3, and Ts2, and in area 38DL. The pial surface (solid line) and layer IV (dotted line) indicate the laminar involvement of the tracer injection sites. Tracer injections were placed in either the right or the left hemisphere (indicated with R and L), but all injection sites are shown in the left hemisphere.

## Areas 38DL and 38VL

These two areas lie on the gyral surface of the temporal pole (see **Figures 1**–**3**). Briefly, as delineated using thionin stained coronal sections, area 38DL has a characteristic thin layer II rich in small darkly stained cells well differentiated from layer III, a clearly demarcated layer IV, particularly at caudal levels, and fused layers V and VI (**Figure 3**). In contrast, area 38VL lacks the characteristic darkly stained cells of layer II, but its border with layer III is easily distinguishable. Layer III in area 38VL has the highest cell density and, although modest, the most radial appearance of the adjacent fields of the temporal pole, which becomes progressively more radial at caudal levels (**Figure 3**). Layer IV in 38VL is more clearly demarcated than in 38DL, but layers V and VI still appear fused, as in 38DL. Area 38VL is the area within the temporal pole with the closest resemblance to the ventral part of area 36, and further caudally to the six-layered neocortex of area TE. Myelin staining confirmed these subareas of the lateral temporal pole. Briefly, as **Figure 3** illustrates, the outer portion of layer I in area 38DL contains a band of horizontally oriented myelinated axons. This band becomes narrower in area 38VL. Layer IV, typically formed by a dense plexus of myelinated axons oriented perpendicular to the pial surface, is very prominent in areas 38VL and 38DL (outer stripe of Baillarger), especially in their caudal portions. This plexus diminishes more medially in areas 36pVM and 36pDM (**Figure 3**). Caudally, 38DL borders with parabelt area Ts1 of the rostral STG. Compared with area 38DL, area Ts1 has a clearer demarcation between layers II and III, a higher cell density in layer III, a more prominent layer IV, and a less prominent layer V.

## Areas 36pVM and 36pDM

These areas lie on the medial aspect of the temporal pole. Area 36pDM is located dorsomedially while area 36pVM occupies the ventromedial portion (**Figures 1**–**3**). As with other areas of PRC (36r/c), area 36pDM is characterized by the clusters of darkly stained cells that appear throughout layer II, although these tend to be less dense than in the more caudal area 36r/c. There is a wide layer III with a blurred demarcation with layer IV. There is no clear distinction between layers V and VI. In contrast, area 36pVM has smaller cell clusters in layer II, a thinner layer III, and although layer IV is still faint, the layer III and V boundaries are more distinct than in 36pDM. The analysis of the myelin stained sections showed that the fiber bundles typical of layer I are wider in both subfields of area 36p relative to areas 38VL and 38DL. Area 36pDM has a characteristic band of axons that take an arch-like shape (with one end in 36pDM and the other in area 38DL, see myelin stained sections in **Figure 3**). In both thionin and myelin stained series, layer IV loses its prominence from 38DL toward area 36pDM, which lacks layer IV. Area 36pVM contains an incipient layer IV that becomes more prominent near the border with area 38VL. Layers V and VI appear fused and darkly stained in the entire TPC, with the exception of area 36pDM, which contains a lighter density of myelin stained axons.

## Superior Temporal Gyrus (STG)

We used the cytoarchitectonic delimitation of areas Ts1, Ts2, Ts3, TPO, PGa, IPa and TAa as defined by Pandya and colleagues (Pandya and Sanides, 1973; Seltzer and Pandya, 1978, 1989, see correspondence with Macaca fascicularis in Figure 1 in Muñoz-López et al., 2010). In area Ts1, as in area 38DL, layers V–VI appear fused, but in contrast, Ts1 has a very distinguishable granular layer IV. In myelin sections, Ts1 appears more myelinated and exhibits a clear outer stripe of Baillarger compared with area 38DL. Caudally, areas Ts2 and Ts3 show better laminar organization and an increase in myelination, and the inner band of Baillarger begins to emerge. Layers V–VI show a better differentiation in Ts2 than in area Ts1. In area Ts3, the prominent pyramidal cells in layer V make this layer prevail over layer III pyramids and give this area a limbic appearance. Medial to areas Ts2/3, area TAa lies entirely in the dorsal bank of the ts. Area TAa has prominent pyramids in layers III (IIIc) and V (Va) and a discrete demarcation between layers V and VI. Area TAa can be distinguished from Ts2/3 by its relatively equal proportion of supra- and infragranular cell layers, and by a characteristic radial arrangement of cells. In myelin sections, the outer bands of Baillarger are darker in Ts2/3 than those in TAa. Area TPO occupies the dorsal bank of the ts, medial to area TAa. In area TPO, layer III is broad with many distinct IIIc pyramids, layer IV appears well developed although non-columnar, layer V is not quite as prominent as layer Va in area TAa, and layers V–VI show a broader space between them than in TAa, owing to the smaller number of layer VI cells. Area PGa is the third zone in the dorsal bank of the ts, medial to TPO. Rostrally, this area is difficult to locate because of its location in the fundus of the sulcus, but caudally it expands and occupies almost the entire extent of the ts. It is a thin cortex with most of its layers only modestly developed. Layer II is thick and layer VI exhibits a characteristic cluster-like arrangement. Area PGa is better myelinated than the adjacent area PG; it has both bands of Baillarger (with a faint inner one) and a dense plexus of vertical fibers. The inner band of Baillarger is scarcely visible, but the vertically oriented myelinated fibers are better developed than in area TAa.

Earlier auditory processing areas within the rostral portion of the STP, including core and belt areas (i.e., A1/R, RM, AL, RTL, RTM, ML, MM, CL, and CM), were identified according to Kaas and Hackett (2000). Briefly, core areas (A1/R) lie in the center of the STP and are characterized by their high density of cytochrome-oxidase, acetylcholinesterase, parvalbumin, and myelinated fibers and a prominent layer IV as seen in Nissl stain. There is a progressive decrease of positive staining and of layer IV laterally and rostrally in the surrounding belt areas (AL, RM, RTM, RTL, ML, MM, CL, and CM), and even more so in the adjacent parabelt areas. For the parabelt areas located lateral to the belt areas we have used Pandya's architectonic divisions Ts3-1 (**Figure 2**).

#### Inferior Temporal Gyrus (ITG)

We adopted the architectonic divisions of Von Bonin and Bailey (1947) with modifications (Seltzer and Pandya, 1989, **Figure 3**). Briefly, within ITG, there are three architectonic areas from medial to lateral (TE1, TE2, and TE3) and two in the ventral bank of the ts (TEm and TEa). There is a progression in the architectonic organization from medial to lateral whereby supragranular layers become more prominent, pyramidal cells in layer IIIc make this layer progressively more distinct, layer IV is gradually more differentiated, and layer VI becomes clearly apparent and differentiated from layer V.

#### Medial Temporal Cortex

In this study, we adopted the terminology of Amaral et al. (1987) for EC architectonics and the terminology of Suzuki and Amaral (1994a,b, 2003) for PRC and PHC with two slight modifications. First, we unified 36rm-36rl under the term 36<sup>r</sup> and 36cm-36cl as 36c, and second, we found an increasingly prominent layer IV caudally in area TH, and therefore we used the term THc to differentiate this region from the more rostral portion, namely THr, in which layer IV is absent.

## Results

### Injection Sites

**Figure 4** illustrates the location of the 12 retrograde tracer injections at different dorsoventral levels within area 38 that were used to investigate the auditory projections to the temporal polar cortex. Second, anterograde tracer injections in areas RTL, Ts2, and Ts3 were used to examine in more detail the STG projections to the temporal polar cortex and to explore possible direct connections to the medial temporal cortex. **Figure 4** also shows the anterograde tracer injections in area 38DL intended to determine the full extent of the projection from this area to the medial temporal cortex.

TABLE 1 | Percentage of labeled neurons in the architectonic areas of the rostral STG (RTM, RTL, Ts1-3, TAa, TPO, PGa, IPa), inferior temporal gyrus (TE), EC, areas 35 and 36 of PRC, and areas TH and TF of PHC.

<sup>a</sup>Percent of retrogradely labeled neurons of the total labeled neurons in the whole cerebral cortex.

<sup>b</sup>Percent of retrogradely labeled neurons of the total labeled neurons in the temporal cortex.
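The two percentages defined in the footnotes to Table 1 differ only in their denominator: all labeled neurons in the cerebral cortex, or only those in the temporal cortex. A minimal sketch of that calculation follows. The function name, the area names used as keys, and all cell counts are hypothetical placeholders chosen for illustration; they are not data from this study.

```python
from typing import Dict, Optional, Set


def percent_of_total(counts: Dict[str, int],
                     areas: Optional[Set[str]] = None) -> Dict[str, float]:
    """Percent of labeled neurons per area.

    The denominator is the total count over `areas`, or over every area in
    `counts` when `areas` is None.
    """
    if areas is None:
        selected = dict(counts)
    else:
        selected = {a: n for a, n in counts.items() if a in areas}
    total = sum(selected.values())
    if total == 0:
        raise ValueError("no labeled neurons in the selected areas")
    return {area: 100.0 * n / total for area, n in selected.items()}


# Made-up counts for illustration only; NOT data from the study.
example_counts = {"Ts1": 300, "TAa": 200, "TPO": 100, "orbitofrontal": 400}
temporal_areas = {"Ts1", "TAa", "TPO"}

whole_cortex = percent_of_total(example_counts)                    # footnote a
temporal_only = percent_of_total(example_counts, temporal_areas)   # footnote b

for area in sorted(temporal_areas):
    print(f"{area}: {whole_cortex[area]:.1f}% of cortex, "
          f"{temporal_only[area]:.1f}% of temporal cortex")
```

Because the temporal-cortex denominator is a subset of the whole-cortex denominator, the same area always accounts for a share of temporal input at least as large as its share of total cortical input.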

## Afferent Projections from STP and STG to the Temporal Pole

## Retrograde Injections in the Dorsolateral Temporal Pole (Area 38DL)

The three retrograde tracer injections placed in area 38DL resulted in extensive retrograde labeling of neurons in adjacent areas of the rostral STG. As **Figures 5**, **6** show, layers II–III and V–VI of areas Ts1 and rostral TAa contained the largest density of labeled neurons, accounting for up to 70% of the temporal cortical input to this area. The density of retrograde label decreased substantially at more caudal levels in area Ts3 (**Figures 5**, **6**, **Table 1**). Areas RTL and RTM contained up to 11% of the temporal cortex input to 38DL; this label was distributed across layers II–III and V–VI rostrally, and in layers II–III more caudally. As illustrated in **Figures 5**, **6**, the density of retrograde label was high in the multimodal area of the dorsal bank of the ts, specifically at rostral and mid-levels of area TPO (layers II–III and V–VI), accounting for up to 21% of the total temporal cortex input to 38DL. The density of retrograde label in area TPO decreased medially toward the fundus of the ts (area PGa) and caudally at the level of Ts2-Ts3. Visual processing area TE1 of the inferior temporal cortex had very modest retrograde label (2% of temporal cortex input) located in layers II–III and V–VI (**Table 1**, **Figure 6**).

In contrast, as **Figures 7**, **8** show, injections in area 38VL (n = 5) yielded the largest amount of labeled cells in the multimodal area of the dorsal bank of the ts, accounting for up to 60% of the temporal input to this area. Within the cortex of the dorsal bank of the ts, layers II–III and V–VI of area TPO had the highest density of retrograde label (43% of the temporal input), followed by area PGa in the fundus of the ts (17%, **Table 1**, **Figure 7**). This projecting region of the ts continued laterally and encroached on area TAa (22%) on the gyral surface of the rostral STG (**Figure 7**). The density of retrograde label decreased in areas Ts1, RTL, and RTM (**Figures 7**, **8**). However, visual processing area TE had a high density of retrograde label (up to 16% of the temporal input).

Label in area TE was concentrated primarily in the rostral portion of subareas TEa and TEm in the ventral bank of the ts, and in two large patches in area TE1 on the gyral surface, one rostral and another one located more caudally.


The retrograde injections in area 38 near the 38DL/38VL boundary (n = 3) labeled neurons in a pattern of distribution transitional between those seen after the more dorsal and the more ventral injections in area 38. On the one hand, as **Figures 9**, **10** show, injections near the 38DL/38VL boundary resulted in the highest density of retrograde label in layers II–III and V–VI of the multimodal areas TPO and PGa of the dorsal bank of the ts, accounting for up to 40% of the temporal cortex input to this area, with the heaviest contribution from area TPO (34% of the temporal input) followed by area PGa (up to 17%). On the other hand, the next heaviest projections originated, with similar densities, from both auditory and visual processing areas, namely the rostral part of area TAa (17%) and areas TE1-2, TEa, and TEm (16%). Areas Ts1 (7%), Ts2 (5%), Ts3 (1%), and RTL/RTM (2%) of the STG also contained retrograde label. As in previous cases, the density of retrograde label decreased progressively at more caudal levels in all areas (**Figures 9**, **10**).

#### Anterograde Injections in Areas Ts3, Ts2, and RTL

The anterograde injections in areas Ts2 and Ts3 (**Figures 11**, **12**, respectively) yielded similar patterns of anterograde label in the temporal lobe. Both injections resulted in dense bundles of labeled axons with termination in layers II–III and V–VI of the neighboring areas Ts1, RTL, RTM, and the parainsular area PaI as well as in areas TAa and TPO within the cortex of the dorsal bank of the ts. These bundles of labeled axons coursed rostrally within the temporal lobe white matter with extensive termination label in layers II–III and V–VI of area 38DL and area 36pDM (**Figures 11**, **12**). Anterograde labeled fibers appeared to form columns in the temporopolar cortex, but only occasionally in areas Ts1, RTL, RTM, PaI, TPO, and none were observed in area TAa (**Figures 11**, **12**).

The BDA injection in area RTL of the rostral STP (**Figure 13**) labeled axons that coursed medially toward the adjacent area RTM and the parainsular cortex (PaI), where terminal label took a columnar-like appearance across layers II–III and V–VI. Another bundle of labeled fibers coursed laterally to areas Ts1-Ts3, TAa, and, to a lesser extent, area TPO. Labeled fibers continued rostrally toward the temporal pole to terminate primarily in layers II–III and V–VI of areas 38DL and 36pDM in the dorsal temporopolar cortex (**Figure 13A**).

It is worth noting that none of these anterograde injections in rostral STG areas resulted in any substantial anterograde label in the medial temporal cortex, whereas anterograde label was found in the temporal pole and multimodal areas of the ts. However, the medial temporal cortical areas that receive this scarce projection also receive projections from 38DL (i.e., ELr, ER, and E<sup>I</sup> and areas TH and TF, see next section).


## Projections from the Dorsolateral Temporal Pole (38DL) to Medial Temporal Cortex

## Temporal Pole Intrinsic Connections

Anterograde tracer injections in area 38DL yielded a high density of labeled axons and terminals in layers II–III and V–VI of the adjacent area 36pDM, extending more moderately into 36pVM, suggesting a pattern of dense local connectivity within the most dorsal subdivisions of the temporal pole and less so with the more ventral subdivisions (**Figure 14**).

## Entorhinal Cortex (EC)

While the density of anterograde label in the EC olfactory division (EO) was scarce, it increased substantially in layers I–III and V–VI of the rostral-lateral EC (ER, ELr, and ELc) and then decreased again caudally and medially in the subdivisions E<sup>I</sup>, EC, and ECL, with label primarily in layers I–III (**Figure 14**). Anterograde label tended to occupy all layers of the EC when label was densest and layers II–III when label was moderate to light.

It is interesting to note that the laminar and topographical distribution of label in EC was different after the retrograde and anterograde injections in 38DL. In contrast to the rostral-lateral distribution of anterograde label in EC, retrograde label was almost absent in the lateral divisions (ELr and ELc) and distributed medially in ER, E<sup>I</sup>, EC, and ECL (compare **Figures 5**, **6** with **Figure 14**). In terms of laminar distribution, retrograde label was more restricted and concentrated primarily in layers V–VI of the EC projecting subdivisions (ER, E<sup>I</sup>, EC, and ECL).


## Perirhinal Cortex (PRC)

As shown in **Figure 14**, area 36<sup>r</sup> had a modest density of anterograde label, which was located in its most rostral portion and distributed across layers. Area 36<sup>c</sup> had only a very light density of labeled fibers that often continued with label in area TF<sup>l</sup> of the posterior parahippocampal cortex. In contrast, area 35 of PRC, along the fundus of the rhinal sulcus, had a moderate density of labeled fibers primarily in layers V and VI. It is worth noting that the topographical distribution of anterograde and retrograde label in areas 35 and 36 of PRC after 38DL injections was similar.

## Posterior Parahippocampal Cortex (PHC)

Anterograde label was found in the rostral half of areas TH and TF, primarily in layers I–III and V–VI of the lateral division of area TF (TF<sup>l</sup>). Anterograde label became progressively lighter and more restricted to layers I–III more caudally in TF<sup>m</sup> and area TH (**Figure 14**). It is worth noting that the topographical distribution of anterograde and retrograde label in areas TH and TF of PHC after 38DL injections was similar.

## Discussion

The aim of this study was to determine whether and how highly processed auditory information might enter the medial temporal cortex. Our results showed, first, that about 70% of the total temporal input to area 38DL of the dorsolateral temporal pole originated in the auditory processing areas Ts1 and TAa of the rostral STG and area RTL of the rostral STP. Second, area 38DL sends this information to EC, area 35 of PRC, and areas TH-TF of the PHC. Third, the projections to area 36 of PRC are restricted to the most rostral part of its rostral subdivision 36<sup>r</sup> and the most caudal portion of 36<sup>c</sup>; this caudal patch of cells was often continuous with that of TF<sup>l</sup> (see summary in **Figure 15**). Fourth, area 38DL of the temporal pole receives a proportion of its input from polysensory areas of the cerebral cortex (i.e., dorsal bank of the ts, orbital frontal, medial frontal, agranular insular, and medial temporal cortices), and therefore, this area may integrate auditory information with inputs from other sensory modalities. We discuss our results in relation to previous studies of auditory processing within the STG and STP and conclude with the implications of our results for the anatomical organization of memory pathways for audition.

## Sensory Domains in the Temporal Pole

The importance of the projections from the rostral part of the STG to the temporal pole for the processing of higher order auditory information was first suggested by Jones and Powell (1970) and supported by Moran et al. (1987). They showed that, whereas the medial subdivisions of the temporal pole receive primarily olfactory and limbic input, the dorsolateral temporal pole (38DL here) receives input from auditory processing areas. Later anatomical studies suggested an anatomical schema whereby anterior subdivisions of the auditory belt send projections to progressively more anterior portions of the STG (Seltzer and Pandya, 1978; Galaburda and Pandya, 1983; Cipolloni and Pandya, 1989; Kaas and Hackett, 2000). This stream of connections would course rostrally to reach the temporal pole (Markowitsch et al., 1985), in particular, the dorsolateral aspect of the temporal pole (Moran et al., 1987, our own results). Our results, therefore, support previous studies and add that auditory input represents about 50% of the total cortical input and 70% of the total temporal cortex input to area 38DL of the dorsolateral temporal pole.

## The Ventral Auditory Stream

Whether there is a unique auditory ventral stream within the STG directed rostrally or an additional one directed mediolaterally toward the gyral convexity and the cortex of the STG remains an open question (Bendor and Wang, 2008, see discussion in Kikuchi et al., 2010; Tanji et al., 2010). Although our study addressed primarily the rostral end of the ventral stream, our results reinforce the hypothesis that downstream projections within the rostral STG might be organized in two main parallel streams. As illustrated in **Figures 11**–**13**, anterograde injections in areas Ts3, Ts2, and RTL labeled axons that course toward area 38DL of the dorsolateral temporal pole in a rostrally directed stream, but these injections also labeled axons that course laterally to areas TAa and TPO of the gyral convexity and dorsal bank of the ts. Although the mechanisms underlying stimulus processing by both streams are unknown, fMRI and electrophysiological data suggest that the adjacent areas Ts1 and Ts2 are especially important for encoding complex sounds, including conspecific calls in monkeys (Petkov et al., 2008; Kikuchi et al., 2010; Fukushima et al., 2014). Although the call-activated areas in fMRI data are located in areas Ts1-2, PET studies in primates have shown that the dorsal aspect of the temporal pole (area 38DL in this study) is especially responsive to species-specific calls (Poremba et al., 2003, 2004; Gil-da-Costa et al., 2006). The differences in functional activation between fMRI and PET reports might be explained by differences in vulnerability to scanning artifacts. A comparative PET-fMRI study in humans showed speech-activated regions in the temporal pole region using PET but not fMRI, suggesting that whereas the fMRI signal in the temporal pole is more vulnerable to artifacts, PET can detect activity in this region more reliably (Devlin et al., 2000). The authors also suggest that fMRI requires adapted data acquisition paradigms and/or the use of ROI analysis to match PET sensitivity. This leaves the door open for comparisons between primate PET and fMRI studies on complex auditory stimulus processing.

However, the cortical network for recognition of species-specific monkey calls might be a large one, of which the dorsolateral temporal pole (area 38DL) is only one part. According to functional 2-deoxyglucose data, auditory processing includes the entire STG, and some regions of the frontal, parietal, and medial temporal cortical areas (Poremba et al., 2003). In line with this, fMRI data suggest that area 38DL of the dorsolateral temporal pole (Gil-da-Costa et al., 2004; Poremba et al., 2004; Petkov et al., 2008), inferior frontal and parietal regions (possible analogs of Broca's and Wernicke's areas, Gil-da-Costa et al., 2006), area 32 of the medial frontal cortex, amygdala, and hippocampus are especially important for the processing of species-specific calls (Gil-da-Costa et al., 2004). All the components of the network have connections with area 38DL (Muñoz et al., 2003, present results) from which information is forwarded to 36pDM, EC, area 35 of PRC and posterior areas TH and TF of PHC and to the most rostral portion of area 36r. This rostral STG-38DL-EC/PRC/PHC pathway, although not functionally sufficient to support long-term recognition of purely auditory information as tested with DMS tasks, may still be important for the storage of complex auditory information in rhesus monkeys, especially conspecific calls (Wich and de Vries, 2006; Ng et al., 2009).

A recent study reported neurons in the dorsolateral temporal pole (area 38DL here) that responded to task-relevant events in a delayed matching task, with some neuronal responses associated with accuracy in recognition performance in a DMS task (area dTP in Ng et al., 2014). Some neurons in area 38DL showed match suppression responses similar to those observed in the visual object identification pathway located in the ventral part of the temporal pole (area 38VL here, Desimone, 1996; Nakamura and Kubota, 1996). This suggests that the dorsolateral temporal pole might be an important area for memory encoding.

It is important to mention here the case of tactile memory. Even though monkeys and humans retain tactile information in mind efficiently for long delays (Goulet and Murray, 2001; Bigelow and Poremba, 2014), the projection from higher order somatosensory areas that process touch in the granular insula is restricted to area 35 of PRC (Murray and Mishkin, 1984; Friedman et al., 1986; Schneider et al., 1993). However, the anatomical pathway for touch, despite being restricted like the auditory one, appears to be sufficient to hold tactile information in mind long enough for it to be transferred into long-term memory in monkeys as well as in humans (Bigelow and Poremba, 2014). A possible explanation for this is that tactile information is translated internally into a visual representation and is remembered by means of the visual memory pathway. This is a working hypothesis that calls for further research.

## Auditory Memory Pathway

The rostral part of STG (38DL, Ts1, TAa) and area TPO in the dorsal bank of the ts send information directly to EC (Amaral et al., 1983, see review in Mohedano-Moriano et al., 2007; Insausti and Amaral, 2008). However, with the exception of a dense projection from area TPO, these areas of the rostral STG only send a meager projection to areas 35 and 36r/<sup>c</sup> of PRC (Suzuki and Amaral, 1994b; Kondo et al., 2003; Muñoz et al., 2003). There is another minor entry of auditory input to the medial temporal cortex via a small projection from the caudal part of STG to area TH of PHC (Tranel et al., 1988; Suzuki and Amaral, 1994b). Our results show that the areas that form the rostral STG project mainly to area 38DL, which in turn projects to EC, area 35 of PRC, and areas TH and TF of PHC. However, and in striking contrast with the pathway important for visual memory (TE-PRC-EC), this projection bypasses most of area 36r/<sup>c</sup> of PRC. This finding in particular might explain, at least in part, the poor auditory recognition memory of rhesus monkeys. The same explanation might extend to humans, whose memory for sounds is poorer than their memory for touch and vision (Bigelow and Poremba, 2014).

## Conclusion

We have shown that area 38DL receives 70% of its cortical input from the auditory association region of the rostral STG, with a substantial input from the polysensory areas of the ts, medial frontal, orbitofrontal, insular, and medial temporal cortices. These results are consistent with lesion and functional imaging in rhesus monkeys suggesting that, among other functions, the dorsolateral temporal pole processes complex auditory stimuli (including species-specific calls). Area 38DL sends heavy projections to the EC, area 35 of PRC and areas TH and TF of PHC, but bypasses most of area 36r/<sup>c</sup> of PRC. This anatomical arrangement may contribute to our understanding of the poor auditory memory of rhesus monkeys.

## Acknowledgments

This study was supported by NIMH/IRP grant BFI 2003-09581 and the Spanish Ministry of Science and Innovation grant BFU 2006-12964. The technical support of Elena Hernáez and Marta Fonollosa is greatly appreciated.

## References


boundaries applied to the primate medial temporal lobe. Neuroscience 120, 893–906. doi: 10.1016/S0306-4522(03)00281-1


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Muñoz-López, Insausti, Mohedano-Moriano, Mishkin and Saunders. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Neural correlates of short-term memory in primate auditory cortex

## *James Bigelow, Breein Rossi and Amy Poremba\**

*Department of Psychology, University of Iowa, Iowa City, IA, USA*

#### *Edited by:*

*Monica Munoz-Lopez, University of Castilla-La Mancha, Spain*

#### *Reviewed by:*

*Christopher I. Petkov, Newcastle University, UK*

*Ricardo Insausti, University of Castilla-La Mancha, Spain*

#### *\*Correspondence:*

*Amy Poremba, Department of Psychology, University of Iowa, E11 Seashore Hall, Iowa City, IA 52242, USA e-mail: amy-poremba@uiowa.edu*

Behaviorally-relevant sounds such as conspecific vocalizations are often available for only a brief amount of time; thus, goal-directed behavior frequently depends on auditory short-term memory (STM). Despite its ecological significance, the neural processes underlying auditory STM remain poorly understood. To investigate the role of the auditory cortex in STM, single- and multi-unit activity was recorded from the primary auditory cortex (A1) of two monkeys performing an auditory STM task using simple and complex sounds. Each trial consisted of a sample and test stimulus separated by a 5-s retention interval. A brief wait period followed the test stimulus, after which subjects pressed a button if the sounds were identical (match trials) or withheld button presses if they were different (non-match trials). A number of units exhibited significant changes in firing rate for portions of the retention interval, although these changes were rarely sustained. Instead, they were most frequently observed during the early and late portions of the retention interval, with inhibition being observed more frequently than excitation. At the population level, responses elicited on match trials were briefly suppressed early in the sound period relative to non-match trials. However, during the latter portion of the sound, firing rates increased significantly for match trials and remained elevated throughout the wait period. Related patterns of activity were observed in prior experiments from our lab in the dorsal temporal pole (dTP) and prefrontal cortex (PFC) of the same animals. The data suggest that early match suppression occurs in both A1 and the dTP, whereas later match enhancement occurs first in the PFC, followed by A1 and later in dTP.
Because match enhancement occurs first in the PFC, we speculate that enhancement observed in A1 and dTP may reflect top–down feedback. Overall, our findings suggest that A1 forms part of the larger neural system recruited during auditory STM.

**Keywords:** *Macaca mulatta***, working memory, A1, primary auditory cortex, rhesus macaque, recognition memory**

## **INTRODUCTION**

One of the vital cognitive processes enabling adaptive behaviors in humans and other animals is short-term memory (STM), i.e., the temporary retention of behaviorally-relevant information in the absence of direct stimulation (Goldman-Rakic, 1995). In contrast to the sizable literature describing visual STM and its neural substrates, relatively few studies have investigated auditory STM at the behavioral or neuronal levels. This central function of the auditory system is fundamental to vital behaviors such as conspecific communication. Thus, one of the remaining steps toward a complete view of the functional organization of the auditory system is a more detailed understanding of auditory STM and its underlying neural processes.

Early studies investigating the neural substrates of visual STM in non-human primates singled out the critical involvement of the lateral prefrontal cortex (PFC) in tasks that included a memory delay. Thus, bilateral lesions of the PFC produced severe performance deficits in canonical tests of STM such as delayed response and delayed matching-to-sample (DMS; Jacobsen, 1935; Mishkin and Manning, 1978; Goldman-Rakic, 1987). Further, electrophysiological studies have shown that neuronal activity in the PFC changes in ways that correspond to STM task demands. For example, many studies have reported that significant proportions of PFC neurons exhibit sustained changes in firing rates (often elevated but sometimes suppressed) during the retention phase of STM tasks (e.g., Fuster and Alexander, 1971; Miller et al., 1996; Shafi et al., 2007). Moreover, when task contingencies require the subject to identify whether a test stimulus matches a prior sample stimulus, many PFC neurons exhibit significantly enhanced firing rates when a match is detected, whereas other cells exhibit significant match suppression (e.g., Miller et al., 1996).

Although additional research has largely validated the prominent role of the PFC in STM, growing evidence has required expanded models of STM, which accommodate the involvement of earlier sensory cortical areas (Constantinidis and Procyk, 2004; Pasternak and Greenlee, 2005; Postle, 2006). For non-spatial forms of visual STM, this includes coactivation and functional interactions between PFC and visual areas in the temporal lobes (Fuster and Jervey, 1981, 1982; Fuster et al., 1985; Miller et al., 1993, 1996; Miller and Desimone, 1994; Constantinidis and Procyk, 2004; Ranganath, 2006), whereas spatial forms of visual STM rely heavily upon fronto-parietal interaction (Friedman and Goldman-Rakic, 1994; Chafee and Goldman-Rakic, 1998, 2000; Quintana and Fuster, 1999; Curtis, 2006; Klingberg, 2006). Further, correlates of visual STM have been observed in early visual areas in the occipital lobe including primary visual cortex (Supèr et al., 2001; Sligte et al., 2009; Emrich et al., 2013), as well as in the mediodorsal nucleus of the thalamus (Schulman, 1964; Fuster and Alexander, 1973; Isseroff et al., 1982). Thus, contemporary views hold that visual STM is enabled by collaborations among multiple nodes of a widespread network comprising cortical and subcortical structures. Within this system, the roles of the PFC include integrating sensory inputs, selecting task-relevant information, and exerting top–down influence on earlier sensory areas, thus modulating responses to behaviorally-relevant stimuli and ultimately guiding goal-directed behavior (Miller and Cohen, 2001; Fuster, 2008).

Fewer studies have investigated the neural substrates of auditory STM, perhaps in part due to the difficulties associated with training non-human primates to perform auditory tasks (Cohen et al., 2005; Fritz et al., 2005; Munoz-Lopez et al., 2010; Scott et al., 2012; Bigelow and Poremba, 2014). Nevertheless, the available evidence suggests that neural circuits underlying auditory and visual STM share at least some of the same organizational and functional principles (Poremba and Bigelow, 2013). Some of the earliest attempts to characterize the role of the PFC in auditory STM used delayed response or DMS tasks in which subjects were trained to match an auditory sample to a visual test. In both spatial and non-spatial versions of these tasks, neurons in the PFC exhibited changes in firing rate during the retention interval similar to those observed in visual tasks (Joseph and Barone, 1987; Bodner et al., 1996; Fuster et al., 2000). Correspondingly, performance in these tasks was significantly impaired by PFC lesions or cooling inactivations (Blum, 1952; Sierra-Paredes and Fuster, 2002). Subsequent studies have also observed neurophysiological correlates of audiospatial STM in the PFC using purely auditory delayed response and DMS tasks (Kikuchi-Yorioka and Sawaguchi, 2000; Artchakov et al., 2007, 2009). Outside of the PFC, several lesion and recording studies have indicated that auditory areas in the temporal lobe are important for non-spatial auditory STM (Colombo et al., 1990, 1996; Fritz et al., 2005; Ng et al., 2014), and one study has indicated the involvement of the lateral intraparietal area in spatial auditory STM (Mazzoni et al., 1996). Further, correlates of auditory STM for tone frequencies have been reported in primary auditory cortex (Gottlieb et al., 1989; Sakurai, 1990, 1994) as well as auditory thalamus (Sakurai, 1990). 
Thus, like the visual system, auditory STM may rely on the coordinated action of multiple brain areas including the PFC, temporal and parietal sensory association areas, primary sensory cortex, and thalamus.

Despite the moderate amount of progress toward understanding the neural substrates of auditory STM, there are still many remaining questions. For example, very few studies have investigated non-spatial STM using purely auditory tasks that include complex, naturalistic sound types such as conspecific vocalizations, which may be important for communication (Poremba et al., 2013). Our lab has recently conducted neurophysiological recording studies in the PFC (Plakke et al., 2013) and dorsal temporal pole (dTP), the rostral-most portion of the superior temporal gyrus (Poremba et al., 2003, 2004; Poremba and Mishkin, 2007; Ng et al., 2014), in an effort to fill these gaps in knowledge. Non-human primate subjects were trained to perform a same/different DMS task, wherein sample and test sounds were separated by a 5-s retention interval. Subjects were trained to press a button ("go" response) if the sounds were identical, and to withhold button presses ("no-go" response) if the sounds were non-identical. In the interest of separating sound-evoked responses from activity related to the button press and/or rewards, subjects were required to wait 1 s after the test sound had terminated to make their response. During the retention interval, portions of cells in both PFC and dTP exhibited significant changes in firing rate, though in smaller proportion, and with less consistency than has been reported in most unit-recording studies of visual STM. In the PFC, matching test sounds often evoked enhanced firing rates relative to non-matching sounds as well as the sample (Plakke et al., 2013). On average, these firing rates remained elevated throughout the wait period before a behavioral response was made. In dTP, on the other hand, matching sounds were typically associated with suppressed firing rates that were observed very early during the sound presentation period (Ng et al., 2014).
Perhaps as a result of top–down feedback originating in PFC, firing rates on match trials later increased during the wait period such that they exceeded firing rates on non-match trials.

Taken together, the findings that matching sounds which require "go" responses produce elevated firing rates in PFC but initially suppress firing rates in dTP suggest that separate neural mechanisms may be involved in differentiating matching vs. non-matching sounds. One possibility is that early match suppression effects in dTP and match enhancement effects in PFC (and later in dTP) reflect bottom–up processes involved in detecting changes in the acoustic environment (e.g., Jääskeläinen et al., 2007), and top–down processes involved in detecting events that are needed to guide prospective behavior, respectively. If true, early match suppression and late enhancement effects similar to those observed in dTP might be observed at earlier levels of the auditory system, including primary auditory cortex (A1). On the other hand, cue enhancement and suppression effects and other task-driven modulations of neurophysiological activity, such as delay-related changes in firing rate, might not be observed at this early stage in the auditory processing stream. To investigate these possibilities, neurophysiological activity was recorded from A1 in subjects performing an auditory STM task.

## **METHODS**

#### **SUBJECTS AND SURGERY**

Two adult macaque monkeys (*Macaca mulatta*) served as subjects for this experiment (monkey A: female; monkey O: male). The subjects were the same as those used in prior experiments from our lab investigating neural correlates of auditory STM in PFC (Plakke et al., 2013) and dTP (Ng et al., 2014). Both animals had extensive prior experience with auditory STM tasks and passive sound exposures (Plakke et al., 2008, 2013; Ng et al., 2009, 2014; Bigelow and Poremba, 2013a). The monkeys were housed under a 12:12 light:dark cycle in individual cages with *ad libitum* access to water and controlled feeding schedules. Subjects were fed after training each day (Harlan monkey diet plus fruit, vegetables, and treats) and maintained above 85% of their free-feeding weight throughout the duration of the experiment. Prior to the experiment, the monkeys were surgically prepared with electrophysiological recording chambers. Subjects were sedated with ketamine (10 mg/kg) and anesthetized with isoflurane (1–2%). Prior to surgery, each monkey was scanned with magnetic resonance imaging (MRI: 2T Sigma unit; GE Medical Systems, WI) to locate the coordinates of A1 and to verify the placement of electrodes within the recording grid (see below). Using a stereotaxic apparatus (David Kopf Instruments, Tujunga, CA), an angled 45-degree recording chamber (Crist Instruments, Hagerstown, MD) was implanted on the skull over the left hemisphere, centered at −2 mm posterior and −23 mm lateral of stereotaxic 0,0 (Saleem and Logothetis, 2007), and its position was secured with titanium screws and dental acrylic. A stainless steel head post was attached to the back of the skull to enable head restraint during electrophysiological recordings. Antibiotics and analgesics were administered as needed following surgery. Recording chambers were routinely cleaned with antiseptics using sterile instruments to inhibit infection. 
All surgical and experimental procedures conformed to standards provided by the National Institutes of Health and were approved by the Institutional Animal Care and Use Committee at the University of Iowa.

#### **APPARATUS AND RECORDING PROCEDURE**

Experiments were conducted in a double-walled sound attenuation chamber (Industrial Acoustics Company, Bronx, NY). Subjects sat in a custom-made primate chair that allowed free arm movements while restraining head movements with a bar that attached to the head post. Sounds were presented through a central speaker located approximately 40 cm from the head region. Responses were made via a single acrylic button positioned 3 cm below the speaker. Small food rewards were dispensed from a pellet dispenser (Med Associates, Georgia, VT) into a dish located 3 cm below the response button. An overhead "house light" provided illumination for the duration of the experiment, and a second overhead light provided additional illumination during the intertrial interval (ITI). Custom-designed software (LabView, National Instruments, Dallas, TX) controlled and recorded all task events. A small overhead camera with microphone allowed audiovisual observation by the experimenter.

At the outset of each session, a multielectrode system was used to lower 1–4 tungsten microelectrodes (1–3 MΩ impedance; FHC Inc., Bowdoin, ME) into A1. Each electrode was held by a 23-g sterile guide cannula positioned in an x-y grid attached to a micromanipulator, and was advanced to the region of interest using a computer-controlled electrode drive system (NAN Instruments, Nazareth, Israel). Spiking activity was extracted by applying a band-pass filter (0.5–10 kHz) to the raw extracellular signal. The resulting spike waveforms were amplified, digitized, and displayed in real time using a Multichannel Acquisition Processor (Plexon Inc., Dallas, TX), with spike times saved to hard disk at 40 kHz. Task events such as stimulus presentations and behavioral responses were recorded concurrently with the neurophysiological data. Both single- (SUA) and multi-unit activity (MUA) were collected. At many recording sites, it was possible to isolate SUA using a combination of online (dual window discriminators; Sort Client, Plexon Inc., Dallas, TX) and offline (e.g., principal components analysis, template matching; Offline Sorter, Plexon Inc., Dallas, TX) spike-sorting techniques. MUA was defined as the unsorted spike activity exceeding a site-specific amplitude threshold. Neurophysiological recordings were initiated after one or more single units had been isolated. A total of 334 units (SUA: 160 units; MUA: 174 units) were recorded and analyzed.

The position of A1 was estimated using electrode coordinates based on the recording grid position and electrode depth in conjunction with the animals' MRIs. The locations of all units included in the analyses below were estimated between −22 and −28 mm from bregma in the medial to lateral plane, and between 9.5 and 4 mm in the anterior to posterior plane (Saleem and Logothetis, 2007), covering the full area of A1, with multiple dorsal to ventral electrode penetrations. Following the STM task, each unit was passively exposed to a range of 56 pure tones and band-passed noise stimuli (500-ms duration) with center frequencies spanning 0.1–18.9 kHz. Each stimulus was repeated 9–11 times in pseudorandom order separated by a variable interstimulus interval (mean: 1320 ms; range: 1200–1500 ms). Consistent with the estimated recording location of the electrodes, most of the available units (300/313) exhibited significant frequency selectivity (21 of the units were lost before the passive exposure phase and so were not analyzed), where the firing rate elicited by the best tone frequency exceeded the mean response elicited by the remaining frequencies by at least two standard deviations (example units shown in Figure S1). In addition, mean peak amplitudes and latencies elicited by tones and noise were assessed for each unit. Previous studies have indicated that belt areas exhibit significantly greater peak amplitudes with shorter latencies in response to noise stimuli compared to tones, whereas no significant differences are observed in A1 (e.g., Lakatos et al., 2005; Kayser et al., 2008). Using repeated-measures analysis of variance (ANOVA), no significant differences in peak amplitude [*F*(1, 312) = 1.1, *p* > 0.05] or latency [*F*(1, 312) = 2.1, *p* > 0.05] elicited by tones and noise were detected at the unit population level. 
Thus, in conjunction with the estimated anatomical coordinates, the physiological results suggest that our unit population was recorded primarily within A1, although it is possible that some units were recorded from the immediately adjacent cortical fields.
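The frequency-selectivity criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code used in the study: the function name, the dictionary input format, and the example firing rates are hypothetical, and the standard deviation is assumed to be computed over the responses to the remaining (non-best) frequencies, which the text does not specify.

```python
from statistics import mean, stdev


def best_frequency_is_selective(rates_by_freq, n_sd=2.0):
    """Apply a two-standard-deviation selectivity criterion to one unit.

    rates_by_freq maps stimulus frequency (Hz) to the mean firing rate
    (spikes/s) evoked by that frequency.

    Assumption: the SD is taken over the responses to the remaining
    frequencies (those other than the best frequency).
    """
    if len(rates_by_freq) < 3:
        raise ValueError("need at least three frequencies")
    # Best frequency = frequency evoking the highest mean firing rate.
    best_freq = max(rates_by_freq, key=rates_by_freq.get)
    others = [r for f, r in rates_by_freq.items() if f != best_freq]
    # The unit counts as selective if the best response exceeds the mean
    # of the remaining responses by at least n_sd standard deviations.
    threshold = mean(others) + n_sd * stdev(others)
    return best_freq, rates_by_freq[best_freq] >= threshold


# Hypothetical tuning data (spikes/s), not taken from the study.
tuned = {500: 4.0, 1000: 5.0, 2000: 30.0, 4000: 6.0, 8000: 5.0}
flat = {500: 4.0, 1000: 5.0, 2000: 6.5, 4000: 6.0, 8000: 5.0}

print(best_frequency_is_selective(tuned))
print(best_frequency_is_selective(flat))
```

With the hypothetical values above, the sharply tuned unit passes the criterion, whereas the nearly flat unit does not.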

#### **SHORT-TERM MEMORY TASK**

The auditory STM task employed in this experiment was the same/different variation of the DMS task (D'Amato and Worsham, 1974; Wright, 2007), which is suitable for auditory stimuli. A schematic diagram of the task is depicted in **Figure 1**. Following a variable ITI (mean: 9 s, range: 8–10 s), each trial began with a sample stimulus, followed by a 5-s retention interval, after which a test stimulus was presented. Similar to previous experiments from our lab (Plakke et al., 2013; Ng et al., 2014), a 750-ms pre-response wait period began after the test stimulus had terminated. This was included to ensure that sound-evoked responses were not contaminated by artifact related to behavioral responses or reward expectancy (e.g., Brosch et al., 2005; Yin et al., 2008). Following the pre-response wait period, the response button was illuminated by an orange backlight for 1 s, indicating the response window. Responses outside of the response window (e.g., during the sound presentations or wait period) aborted the trial and these trials were not included in subsequent analyses. For trials on which the sample and test sounds were identical (*same* or *match* trials), the correct response was defined as a button press ("go" response). For trials on which the sample and test sounds were non-identical (*different* or *non-match* trials), the correct response was defined as the absence of a button press ("no-go" response). Responses were subject to an asymmetric reinforcement contingency in which correct "go" responses on match trials were rewarded with a small food pellet and incorrect button presses on non-match trials ("false match" responses) were occasionally punished by a brief, mild air puff presented indirectly from a distance of approximately 15 cm from the animal. During the monkeys' initial training, false match responses regularly resulted in punishment; however, following acquisition of the task, approximately 1/10 of the "false match" responses were punished on a variable schedule. Similar DMS tasks using the go/no-go paradigm and asymmetric reinforcement contingency have been used in previous studies of auditory STM in monkeys and other animals as they facilitate learning the same/different rule (Stepien and Cordeau, 1960; Nelson and Wasserman, 1978; Kojima, 1985; Colombo and D'Amato, 1986; Ng et al., 2009; Munoz-Lopez et al., 2010). Each session comprised 200 trials with an equal number of match and non-match trials presented in pseudorandom order.

**FIGURE 1 | Diagram of the auditory short-term memory task.** Each trial consisted of 500-ms sample and test sounds separated by a 5-s retention interval. For match trials, the sounds were identical and the correct response was a button press, whereas for non-match trials the sounds were non-identical and the correct response was to withhold from pressing the button. Sample and test sounds were pseudorandomly selected for each trial from a variety of naturalistic and artificial sound exemplars (see Methods). A pre-response wait period followed the test stimulus, after which the response button was illuminated to signal the response window. Responses outside of the response window (e.g., during the sound presentations or wait period) aborted the trial and these trials were not included in subsequent analyses. Overhead lighting provided constant low-level illumination throughout the session, and a second overhead light was turned on during the ITI to serve as a cue by which trials could be segregated.
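The go/no-go response contingencies of the task can be summarized in a short Python sketch. It is offered only as an illustration: the function names and the example trial list are hypothetical, and the labels "miss" and "correct rejection" are standard signal-detection terms assumed here for the two outcomes that involve no button press.

```python
def classify_trial(is_match, pressed):
    """Label one completed trial of the same/different go/no-go task.

    is_match: True if sample and test sounds were identical.
    pressed:  True if the button was pressed in the response window.
    """
    if is_match:
        # Match trials: a button press ("go") is the correct response.
        return "hit" if pressed else "miss"
    # Non-match trials: withholding ("no-go") is the correct response.
    return "false alarm" if pressed else "correct rejection"


def is_correct(is_match, pressed):
    """A trial is correct when the response agrees with the trial type."""
    return is_match == pressed


def accuracy(trials):
    """Proportion of correct trials in a list of (is_match, pressed)."""
    return sum(is_correct(m, p) for m, p in trials) / len(trials)


# Hypothetical mini-session of four completed trials.
session = [(True, True), (True, False), (False, True), (False, False)]
print([classify_trial(m, p) for m, p in session])
print(accuracy(session))
```

Under the asymmetric reinforcement contingency described above, only the "hit" outcome would be followed by a food pellet, and only the "false alarm" outcome could be followed by an air puff.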

#### **STIMULI**

One of 12 stimulus sets was pseudorandomly selected for each experimental session. Each set contained one exemplar of each of the following eight sound types: conspecific monkey vocalizations, human vocalizations, animal vocalizations, natural/environmental sounds, music samples, synthetic sounds, pure tones, and band-passed noise. All stimuli were trimmed to 500 ms with the exception of several vocalization stimuli that were shorter than 500 ms. The sounds were volume normalized using Audition (Adobe Systems, San Jose, CA) and presented at 72 ± 5 dB. Spectrograms and temporal envelopes for each of these stimuli are shown in Figure S2. Monkey vocalizations were recorded at a natural monkey reserve in South Carolina, USA (by Amy Poremba), and included coos, grunts, screams, and shrill barks. Human vocalizations included various speech and nonspeech vocal sounds from a variety of male and female speakers. Animal vocalization exemplars were drawn from a variety of birds and non-primate mammals. Natural and environmental sounds included recordings of events such as flowing water and rushing wind. Music samples were recordings of instrumental music, e.g., a three-note sequence played on a piano. Synthetic sounds were sounds that do not occur naturally, e.g., sounds generated electronically with a synthesizer. Pure tones and band-passed noise exemplars were digitally generated with a range of center frequencies spanning 1083–8820 Hz. Each sound was presented with equal frequency as the sample and test sounds on both match and non-match trials.

In addition to these eight sound types, two variations of a white noise burst were included in each session and presented on the same number of trials as the other sounds. For match trials, the white noise burst comprised two 200-ms periods of noise separated by a 100-ms silence gap. For non-match trials, the noise burst comprised four 100-ms periods of noise separated by three silence gaps (100 ms total). These stimuli were included, among other objectives, to investigate whether subjects were sensitive to the differential contingencies (match vs. non-match) associated with the subtle temporal variations in the noise bursts. We found only limited evidence that they did so: accuracy did not change when the white noise variation associated with match trials was presented, and accuracy benefitted only modestly when the white noise variation associated with non-match trials was presented as the test sound, but not as the sample. In light of recent findings by Scott et al. (2013), the failure of our subjects to exploit the information contained in the noise burst stimuli is not surprising. Scott and colleagues found that monkeys performing an auditory DMS task made little use of the temporal information contained in a variety of natural and artificial sounds, similar to those used in our experiment. Instead, the monkeys relied heavily upon spectral content of the sounds in making the match/non-match decision. Thus, non-matching sound pairs with uncorrelated spectra and disparate harmonics-to-noise ratios (HNRs) were associated with higher non-match decision rates. Indeed, the white noise bursts in our study were spectrally distinct from the remaining sound types, which may have contributed to the modest benefit in accuracy when presented as a non-matching test stimulus. Thus, for the purposes of the current study, the noise burst stimuli were not evaluated separately from the other sound types.
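The temporal structure of the two white noise burst variations can be made explicit with a brief Python sketch. It assumes that the three silent gaps in the non-match variation are of equal length (about 33.3 ms each), which the text does not state; it gives only their total duration of 100 ms. The function name and segment representation are hypothetical.

```python
def burst_segments(n_noise, noise_ms, total_gap_ms):
    """Return the burst as a list of (label, duration_ms) segments.

    Assumption: the silent gaps between noise periods are equal in
    length and together last total_gap_ms.
    """
    n_gaps = n_noise - 1
    gap_ms = total_gap_ms / n_gaps
    segments = []
    for i in range(n_noise):
        segments.append(("noise", noise_ms))
        if i < n_gaps:
            # Insert a silent gap between consecutive noise periods.
            segments.append(("gap", gap_ms))
    return segments


# Match variation: two 200-ms noise periods, one 100-ms gap.
match_burst = burst_segments(2, 200, 100)
# Non-match variation: four 100-ms noise periods, three gaps (100 ms total).
nonmatch_burst = burst_segments(4, 100, 100)

for name, burst in (("match", match_burst), ("non-match", nonmatch_burst)):
    total = sum(duration for _, duration in burst)
    print(name, len(burst), "segments,", round(total), "ms total")
```

Both variations therefore have the same total duration as the other 500-ms stimuli and differ only in how the noise and silence are distributed in time.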

#### **DATA ANALYSIS**

Behavioral data were analyzed by computing mean accuracy, response latency, and d-prime values for each session. Comparisons between trial types were tested using ANOVA with the session means as individual data points. Subjects occasionally quit participating in the task prior to the end of the programmed session. These trials were not included in behavioral or neurophysiological analyses to ensure that any observed effects were attributable to mnemonic factors, rather than motivation or arousal. As in previous studies (Bigelow and Poremba, 2013a,b), for sessions in which subjects made no responses during the last 20 trials or more, the final response was considered as the end of the session (6.0% of total trials).
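For readers unfamiliar with the d-prime measure, the following Python sketch shows the standard signal-detection computation, d' = z(hit rate) − z(false alarm rate), where z is the inverse of the standard normal cumulative distribution. It is a minimal sketch rather than the analysis code used in the study; the function name and the example rates are hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate).

    Both rates must lie strictly between 0 and 1; rates of exactly
    0 or 1 would need a correction before the z-transform, which this
    sketch does not attempt.
    """
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical session: 80% hits on match trials and
# 40% false alarms on non-match trials.
print(round(d_prime(0.80, 0.40), 2))
```

Because the z-transform is nonlinear, the mean of per-session d-prime values generally differs from the d-prime computed from hit and false alarm rates pooled across sessions.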

The sorted SUA and MUA data were exported to neurophysiological data analysis software (NeuroExplorer, Nex Technologies, Littleton, MA), wherein spike activity related to task events such as the sample and test sounds was evaluated using peristimulus time histograms (PSTH). Unless otherwise indicated, average firing rates were sampled in 20-ms bins. For individual unit analyses, single trial means comprised individual data points (note that non-identical numbers of trials were typically used for comparisons between conditions, such as match vs. non-match, and were therefore considered independent). For population analyses, the session means for each unit (collapsed across individual trials) served as individual data points. Population analyses combined SUA and MUA except where noted (cf. Kayser et al., 2008). Changes in firing rate during the retention interval were assessed with ANOVA plus *post-hoc* tests comparing 10 successive 500-ms segments of the retention interval to a 500-ms pretrial baseline period (*p* < 0.05, Fisher's LSD). Differences in firing rate between conditions (e.g., match vs. non-match) were tested with ANOVA using a 100-ms sliding window, advancing in 20-ms increments (cf. Apicella et al., 1997; Darbaky et al., 2005; Chandrasekaran and Ghazanfar, 2009). Effects were only considered significant in cases where significant differences (*p* < 0.05) were obtained for two or more consecutive steps. Because population analyses included a relatively large number of units, and since additional comparisons were made between conditions during the sample stimulus resulting in a larger number of tests, a more conservative alpha level was adopted (*p* < 0.005).
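The binning and sliding-window steps described above can be sketched as follows. This is an illustrative reconstruction, not the NeuroExplorer procedure the authors used; the function names `psth_rates` and `sliding_window_means` and the example spike times are hypothetical. The sketch only produces the windowed firing rates; the ANOVA applied to those values at each step would be run separately.

```python
def psth_rates(spike_times_ms, n_trials, t_start_ms, t_stop_ms, bin_ms=20):
    """Trial-averaged firing rate (spikes/s) in consecutive bins.

    spike_times_ms pools spike times (ms, relative to the task event)
    across all trials of one condition.
    """
    n_bins = (t_stop_ms - t_start_ms) // bin_ms
    counts = [0] * n_bins
    for t in spike_times_ms:
        idx = int((t - t_start_ms) // bin_ms)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    # Convert counts per bin to spikes per second, averaged over trials.
    return [c * 1000.0 / (n_trials * bin_ms) for c in counts]


def sliding_window_means(rates, bin_ms=20, window_ms=100, step_ms=20):
    """Mean rate in a window_ms window that advances by step_ms."""
    bins_per_window = window_ms // bin_ms
    bins_per_step = step_ms // bin_ms
    return [
        sum(rates[i:i + bins_per_window]) / bins_per_window
        for i in range(0, len(rates) - bins_per_window + 1, bins_per_step)
    ]


# Hypothetical example: four spikes pooled over two trials, 0-200 ms.
rates = psth_rates([5, 25, 30, 110], n_trials=2, t_start_ms=0, t_stop_ms=200)
print(rates)                        # [25.0, 50.0, 0.0, 0.0, 0.0, 25.0, 0.0, 0.0, 0.0, 0.0]
print(sliding_window_means(rates))  # [15.0, 15.0, 5.0, 5.0, 5.0, 5.0]
```

With 20-ms bins, each 100-ms window spans five bins and overlaps its neighbor by four, so adjacent steps are not independent. Requiring significance on consecutive steps is a guard against isolated chance results.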

#### **RESULTS**

#### **BEHAVIORAL RESULTS**

The subjects attained, on average, 65.5% overall accuracy based on 75 total behavioral sessions. This modest level of accuracy is common for non-human primates performing auditory STM tasks, even after extensive training (Fritz et al., 2005; Scott et al., 2012, 2013; Bigelow and Poremba, 2013a, 2014). Although accuracy was relatively poor compared to studies of visual STM in monkeys (e.g., Fritz et al., 2005), a comparison of the number of correct and incorrect trials per session confirmed that performance was well above chance [*F*(1, 74) = 315.1, *p* < 0.05]. As in prior studies from our lab using the same subjects as well as other animals (Bigelow and Poremba, 2013a,b; Plakke et al., 2013; Ng et al., 2014), a strong "go" bias was observed: subjects correctly responded on 75.5% of match trials ("hits"), but incorrectly responded on 44.1% of non-match trials ("false alarms"; mean d-prime: 1.1). Also consistent with our previous experiments was the finding that correct hits were made significantly faster (response latency = 394 ms) than false alarms [response latency = 462 ms; *F*(1, 74) = 397.6, *p* < 0.05].

#### **RETENTION INTERVAL**

Changes in firing rate during the retention interval were assessed by comparing the mean firing rate during the pretrial baseline (500 ms prior to sample onset) to 10 successive 500-ms segments during the retention interval. Example units exhibiting significant changes from baseline in one or more segments of the retention interval are shown in **Figure 2**, and a summary of units with significant changes from baseline during each segment is presented in **Figure 3A**. The largest portion of units (23.4%) exhibited an increased firing rate relative to baseline in the first 500-ms period of the retention interval (i.e., the sample offset period). However, for the majority of these units, the elevated firing rate did not persist into the retention interval any further. Although suppressed firing rates relative to baseline were less common during the first segment of the retention interval, they were observed more frequently further into the retention interval (e.g., 2000 ms after sample offset). Also, more units exhibited suppressed firing rates (117 units; 35.0%) compared to elevated firing rates (93 units; 27.8%) for at least one 500-ms segment of the retention interval, with a large portion of suppression effects observed during the latter portion of the retention interval. Consistent with these observations, repeated-measures ANOVA revealed that mean population firing rates varied significantly from baseline during the retention interval [*F*(10, 3330) = 19.5, *p* < 0.05]. *Post-hoc* tests (*p* < 0.05, Bonferroni correction) indicated that firing rates were briefly elevated at sample stimulus offset, but then became suppressed. After returning to baseline near the midpoint of the retention interval, firing rates again fell significantly below baseline during the last 1500 ms prior to test stimulus onset (**Figure 3B**).

i–iv), the firing rate fell below baseline during the latter portion of the retention interval. In other cases (units v–vi), the firing rate returned to baseline levels from a significantly elevated firing rate earlier in the

onset). Shaded gray areas indicate sample and test stimulus presentation periods.

returning to baseline near the midpoint of the retention interval, firing rates again fell significantly below baseline during the latter portion of the retention interval prior to the test stimulus. Asterisks indicate retention interval periods that differed significantly from baseline (500 ms prior to trial onset) indicated by the dashed line.

The pattern of firing rate changes observed during the retention interval in the current study is similar in many ways to the results of previous studies from our lab investigating neuronal activity during auditory STM in the PFC (Plakke et al., 2013) and in dTP (Ng et al., 2014). Units in all three cortical areas exhibited significant increases or decreases in firing rate relative to baseline. In the current study, units more frequently exhibited reduced firing rates with the exception of the first bin of the retention interval (i.e., the sample offset period). In particular, suppressed responses were dominant during the latter portion of the retention interval, where firing rates were significantly below the pretrial baseline at the population level (**Figure 3**). Similar changes from baseline firing rates were not observed at the population level in either PFC or dTP (Plakke et al., 2013; Ng et al., 2014), suggesting a greater degree of suppression in A1.

In general, the percentages of units exhibiting changes from baseline during the retention interval in each of our studies (PFC, dTP, and A1) have been smaller than what has typically been reported in studies of visual STM in various cortical areas (Fuster and Alexander, 1971; Fuster and Jervey, 1981; Miller et al., 1996; Shafi et al., 2007). Moreover, in contrast to the sustained changes in firing rate in these studies, the units recorded in our experiments typically exhibited changes in firing rate that were transient or intermittent. Indeed, only 16 units (4.8%) in the present study exhibited significant changes from baseline for half of the retention interval or more, and only 1 unit exhibited sustained suppression throughout the entire retention interval. These findings also differ from a previous study of neuronal activity in A1 during an auditory STM task, which showed changes in firing rate (both increased and decreased) that were sustained throughout the entire retention interval (Gottlieb et al., 1989). However, the retention interval in that study was only 1 s, leading to the possibility that the responses might have returned to baseline during an extended retention interval. Moreover, in contrast to the go/no-go paradigm used in the present study, Gottlieb et al. (1989) trained their subject to perform a two-alternative forced-choice DMS task in which reward was available on every trial (pending correct responses). The differences in task contingencies may have thus contributed to the differences in firing rate changes during the retention interval, inasmuch as response and reward anticipation has been shown to influence neuronal activity in A1 (Brosch et al., 2005, 2011; Yin et al., 2008) and other cortical areas (Curtis and D'Esposito, 2003).



*Enhancement and suppression effects based on comparison of mean firing rates between cue types using a 100-ms sliding window, advancing in 20-ms steps. Effects are reported where significant differences were obtained for two or more consecutive bins. "Cue"* = *cue presentation period; "Offset"* = *500 ms post-cue period; "Wait"* = *500 ms post-offset period. Percentages based on 334 units (SUA: 160 units; MUA: 174 units).*

#### **CUE-EVOKED RESPONSES: INDIVIDUAL UNIT ANALYSES**

Cue enhancement and suppression effects were examined on an individual unit basis and at the population level by comparing firing rates between cue types using a 100-ms sliding window (20-ms step). Previous studies have observed "match enhancement" both during and after the cue presentation period (Plakke et al., 2013; Ng et al., 2014). To capture these possible effects, comparisons were made to test for potential differences in firing rate on match and non-match trials during the cue presentation period (0–500 ms from cue onset) as well as the offset period (0–500 ms from cue offset) and the pre-response wait period (500–1000 ms from cue offset). Units that exhibited significant positive differences (*p* < 0.05) for two or more consecutive bins were considered to show enhancement effects, whereas units that exhibited significant negative differences for two or more consecutive bins were considered to show suppression. The results are summarized in **Table 1**, with individual unit examples presented in **Figure 4**. In general, a higher proportion of units exhibited match enhancement compared to suppression (**Table 1**). An exception to this general outcome was that, during the cue period only, the single-unit subpopulation more frequently exhibited suppressed responses to matching test stimuli. The proportion of units exhibiting significant match enhancement effects increased as the trial progressed from the cue presentation period to the cue offset and pre-response wait periods.
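The classification rule described above (significant differences of the same sign on two or more consecutive sliding-window steps) can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name `classify_modulation` and its inputs are hypothetical, and the per-step p-values are assumed to have been computed beforehand by the ANOVA at each step.

```python
def classify_modulation(differences, p_values, alpha=0.05, min_run=2):
    """Return the set of effects a unit shows within one analysis period.

    differences[i]: match minus non-match mean firing rate at step i.
    p_values[i]:    p-value of the match vs. non-match test at step i
                    (assumed to be computed elsewhere).
    """
    effects = set()
    run_sign, run_len = 0, 0
    for diff, p in zip(differences, p_values):
        # Sign of a significant difference: +1, -1, or 0 if not significant.
        sign = ((diff > 0) - (diff < 0)) if p < alpha else 0
        if sign != 0 and sign == run_sign:
            run_len += 1
        else:
            run_sign, run_len = sign, (1 if sign != 0 else 0)
        if run_len >= min_run:
            effects.add("enhancement" if run_sign > 0 else "suppression")
    return effects


# Hypothetical unit: two consecutive significant positive steps, then one
# isolated significant negative step that does not meet the criterion.
print(classify_modulation([1.2, 2.0, -0.8, -0.5, 3.1],
                          [0.01, 0.02, 0.04, 0.20, 0.01]))  # {'enhancement'}
```

Under this rule a single unit can be counted under both enhancement and suppression if it meets the criterion for each sign at different times within the same period.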

#### **CUE-EVOKED RESPONSES: POPULATION ANALYSES**

In general, the trends observed in the individual unit enhancement and suppression analyses were reflected in the population-averaged firing rate shown in **Figure 5**. There were no significant differences between trial types during the sample stimulus period or retention interval, or during the peak response evoked at the onset of the test stimulus (∼0–100 ms post-stimulus onset). However, significantly enhanced firing rates were observed beginning approximately 300 ms after test onset and continuing throughout the offset and pre-response periods. At this latency, the significant match enhancement effects observed in the present study follow those observed in the PFC by at least 100 ms (Plakke et al., 2013). Moreover, the magnitude of the match enhancement effects in A1 was relatively modest compared to those reported in PFC. These observations are consistent with previous studies suggesting task-specific response modulation in A1 likely reflects feedback from other cortical areas including PFC, where task-relevant information is identified and responses are selected (Scheich et al., 2007). In dTP, firing rates elicited by matching test sounds were significantly elevated over non-matching test sounds only during the late offset and pre-response wait periods (Ng et al., 2014). This suggests that task-related feedback originating in higher cortical areas such as PFC may reach A1 first, and in turn propagate along the superior temporal gyrus.

Because there were differences in the percentages of single- and multi-units that exhibited match enhancement effects (**Table 1**), a subpopulation analysis was conducted that included only the single units (**Figure 6**). The general trends observed in the single-unit subpopulation were similar to those in the entire population analysis, although fewer differences between match and non-match trials reached statistical significance. One of the most substantial differences was that significant early match suppression effects were observed in the single-unit subpopulation. Because comparisons were made between 100-ms averages (advancing in 20-ms steps), these effects could have occurred as early as 40–60 ms post-stimulus onset, a latency comparable to the match suppression observed at 30–60 ms post-stimulus onset in dTP (Ng et al., 2014).

The observations of both early match suppression and late match enhancement effects in both A1 and dTP lend support to the possibility of separate neural mechanisms enabling auditory STM and ultimately differential behavioral responses on match and non-match trials. Although the mechanisms underlying reduced firing rates for repeated sounds are still under debate (Ng et al., 2014), early match suppression could reflect bottom–up stimulus-specific adaptation effects produced by local recurrent connections and input from thalamus and other cortical areas (Jääskeläinen et al., 2007; Liu et al., 2009; Ng et al., 2014). Indeed, modest adaptation effects have been observed in passive-exposure paradigms at interstimulus intervals of up to 5 s (Werner-Reiss et al., 2006). On the other hand, the ensuing elevated firing rates observed for matching sounds might reflect top–down feedback from higher cortical areas such as PFC, which are predominantly involved in integrating task-relevant sensory information and response selection.

One final observation that was evident in the population average firing rate (**Figure 5**) was a small excitatory response beginning approximately 120 ms into the response window. This response was apparently elicited by the orange backlight of the response button that signaled the response window. These modest light-evoked responses are consistent with previous studies demonstrating activation of A1 by non-acoustic stimulation including visual, somatosensory, and motor events, particularly if they are related to an auditory task in trained subjects (Fu et al., 2003; Brosch et al., 2005; Ghazanfar et al., 2005; Scheich et al., 2007, 2011; Kayser et al., 2008; Yin et al., 2008).

#### **FIGURE 5 | Population spiking activity during auditory short-term memory task.** Firing rates elicited by matching and non-matching test stimuli are depicted in the right panel, and firing rates elicited by the sample stimuli are shown in the left panel as a control comparison. Beginning during the latter portion of the test stimulus presentation period, firing rates became significantly higher for match compared to non-match trials. This difference was sustained with minimal interruption throughout the offset and pre-response wait periods. The black bars below the firing rate histograms indicate significant differences between trial types (assessed with a 100-ms sliding window, advancing in 20-ms steps). Differences were only accepted if significant effects were obtained for two or more consecutive steps. The gray bars above the abscissae indicate the sample and test stimulus presentation periods (0–500 ms from cue onset) as well as the onset of the response window (R.W.).

#### **FIGURE 6 | Single-unit subpopulation spiking activity during auditory short-term memory task.** Firing rates elicited by matching and non-matching test stimuli are depicted in the right panel, and firing rates elicited by the sample stimuli are shown in the left panel as a control comparison. Similar trends were observed in the population (**Figure 5**) and single-unit subpopulation analyses. However, early match suppression effects reached significance only in the single-unit subpopulation. In addition, the elevated firing rates beginning during the latter portion of the test stimulus were less robust, reaching significance only during the late pre-response period. The black bars below the firing rate histograms indicate significant differences between trial types (assessed with a 100-ms sliding window, advancing in 20-ms steps). Differences were only accepted if significant effects were obtained for two or more consecutive steps. The gray bars above the abscissae indicate the sample and test stimulus presentation periods (0–500 ms from cue onset) as well as the onset of the response window (R.W.).

#### **ERROR TRIALS**

Additional analyses were conducted to test for potential differences in firing rates on non-match trials in which subjects incorrectly made button presses (false alarms). As seen in **Figure 7**, there were no differences in firing rate between non-match error trials and correct trial types during or immediately following the sample stimulus presentation period. Non-match error trials also did not differ from correct trials during the baseline firing rate or retention interval. During the latter portion of the test stimulus, however, firing rates on non-match error trials exceeded those observed on correct non-match trials, similar to what was observed on correct match trials. The differences were initially as great as those observed between correct match and non-match trials, but diminished later in the offset and pre-response periods, such that firing rates on non-match error trials eventually fell significantly below firing rates on correct match trials. The observation that firing rates on non-match error trials exhibit a relatively late increase similar to that on correct match trials reinforces the idea that "match enhancement" may be related to top–down feedback reflecting response selection and/or anticipation, inasmuch as button presses were made for both trial types. This notion is corroborated by the observation that firing rates were similarly elevated on non-match error trials in PFC (Plakke et al., 2013). However, along with the differential response latencies observed for these two response types, the finding that elevated non-match error firing rates were not sustained to the same degree as firing rates on correct match trials suggests that processes underlying these two "go" trial types are not identical. Rather, neuronal and behavioral activity observed during non-match error trials appears to be intermediate between true match responses and correct non-match rejections, perhaps, as others have suggested, reflecting reduced certainty in the behavioral choice (Benjamin and Bjork, 1996).

#### **DISCUSSION**

The foregoing results reveal that neurophysiological activity in A1 was associated with several aspects of auditory STM processing at the individual-unit and population levels. As in PFC and dTP, a modest number of units exhibited significant increases and decreases in firing rate during the retention interval. Moreover, stimulus-evoked responses were frequently modulated depending on the context in which the sounds were presented. Specifically, many units exhibited enhanced or suppressed responses depending on whether the sound was presented as a matching or non-matching test stimulus (**Table 1**, **Figures 4**–**6**). Analyses of error trials suggested these modulation effects in part reflected the subjects' perceptual decisions (**Figure 7**). Overall, these observations highlight flexible task-engagement of neurons at this early stage of auditory cortical processing.

As in our earlier studies of PFC and dTP (Plakke et al., 2013; Ng et al., 2014), both increases and decreases in firing rate were observed during the retention interval (**Figures 2**, **3**). In the present study, increased firing rates were more frequently observed immediately following the sample stimulus, but the majority of significant effects thereafter reflected decreases in firing rate relative to baseline. In contrast to the results from PFC and dTP, these effects were sufficiently prevalent that firing rates fell significantly below baseline during the latter portion of the retention interval at the population level (**Figure 3B**). In studies of visual STM, sustained changes in firing rate during the retention interval have typically been interpreted as a correlate of mnemonic retention of a sensory cue for the guidance of prospective behavior (e.g., Shafi et al., 2007). Since these effects have been observed in many cortical areas, and have been shown to depend on interactions among these areas (e.g., Fuster et al., 1985), they are generally assumed to reflect sustained interactions among a distributed cortical/subcortical network that collectively enables neural representation of the sensory cue once it has passed from the environment. Delay-related changes in firing rate observed in the current study could reflect similar processes. Alternatively, since these firing rate changes were not sustained, but were mostly observed for 1 or 2 s following the sample stimulus and prior to the test stimulus, they could reflect mechanisms encoding the timing of the trial sequence (e.g., decreased firing rates near the end of the retention interval could reflect anticipation of the test stimulus). One additional possibility is that the suppression effects observed prior to the onset of the test stimulus could serve to increase the signal-to-noise ratio of the behaviorally relevant sounds. Each of these possibilities deserves further experimental attention in studies using appropriate variations in task contingencies (e.g., variable vs. fixed retention interval).

In all three areas (PFC, dTP, A1), significant changes from baseline firing rates were generally not sustained, and were observed in a smaller proportion of units than typically reported in studies of visual STM (Fuster and Alexander, 1971; Fuster and Jervey, 1981; Miller et al., 1996; Shafi et al., 2007). One factor that might have contributed to these differences is the fact that, under the asymmetric response/reward contingency employed in our studies, subjects could not predict behavioral responses or rewards during the retention interval, which has been shown to modulate firing rates in PFC and other cortical areas (e.g., Kobayashi et al., 2002; Curtis and D'Esposito, 2003; Brosch et al., 2005, 2011; Shafi et al., 2007; Yin et al., 2008). Besides this difference in task contingency, each of our studies has used exclusively auditory stimuli as memoranda. Several earlier experiments investigating task-related activation of PFC neurons by auditory or visual stimuli during various STM and discrimination tasks invariably reported that fewer cells were activated by auditory stimuli, and that behavioral accuracy was lower for auditory trials (Watanabe, 1992; Kikuchi-Yorioka and Sawaguchi, 2000; Artchakov et al., 2007). These observations raise the possibility that delay-related changes in firing rate in PFC might be less robust for auditory stimuli, which could have downstream effects in A1 and dTP.

The early suppressed firing rates elicited by matching test stimuli relative to non-matching stimuli in the single unit subpopulation (**Figure 6**) are comparable to those observed in dTP (Ng et al., 2014). Although the mechanisms underlying match suppression are still under investigation (Grill-Spector et al., 2006), they may include local interactions among recurrent connections as well as inputs from thalamus and other cortical areas (Liu et al., 2009; Ng et al., 2014). Although it is possible that suppression effects could originate in A1 and subsequently bias firing rates in dTP, the early timing of the effects in both areas, and the fact that both areas receive direct input from auditory thalamus (Markowitsch et al., 1985), raise the possibility that they may arise independently in each area. In either case, these suppression effects appear to be the earliest indicator of a matching test stimulus. This signal could feed forward to higher cortical areas such as PFC, ultimately setting the stage for the distinct behavioral responses required by the STM task following repeated vs. different sound presentations.

Following the early suppression effects on match trials, firing rates became relatively elevated compared to non-match trials beginning approximately 300 ms after test stimulus onset, and remained elevated throughout the cue offset and pre-response periods (**Figure 5**). In dTP, firing rates on match trials similarly became relatively enhanced (Ng et al., 2014), but only beginning in the latter cue offset period, several hundred milliseconds later than the effects observed in A1. Of particular significance, match enhancement effects were also observed in PFC, and they were of larger magnitude and occurred earlier than in both A1 and dTP (**Figure 8**; Plakke et al., 2013). These observations suggest that the relatively late elevated firing rates observed on match trials in A1 and dTP might be produced by top–down feedback originating in PFC, where task-relevant information is extracted. On the other hand, the relatively early match suppression effects might reflect bottom–up influences involved in detecting change in the acoustic environment. These influences could work together to enable detection of matching sounds and selection of appropriate behavioral responses.

An additional observation supporting the hypothesis that match enhancement reflects top–down feedback related to behavioral choice is that firing rates were similarly elevated on non-match error trials, wherein subjects incorrectly reported a "match" decision. Similar elevated firing rates were observed on non-match error trials in the PFC (Plakke et al., 2013), as well as in dTP during the pre-response wait period (Ng et al., 2014). The late elevated firing rates on correct match trials and non-match error trials are therefore associated with the subjects' perceptual choices, rather than the actual same/different relationship of the sample and test sounds. Passive response modulation influences such as stimulus-order facilitation (e.g., Kilgard and Merzenich, 2002) are unlikely to account for these effects, inasmuch as the enhanced responses were observed on trials with both repeated (match) and distinct sounds (non-match error). Our observation of such effects in A1 corroborates earlier reports that A1 activity was correlated with subjects' perceptual decisions during auditory discrimination tasks (Sutter and Shamma, 2010; Niwa et al., 2012, 2013; Bizley et al., 2013).

The foregoing results can be added to a growing body of evidence that undermines the notion of A1 as a strictly unisensory area exclusively involved in processing acoustic information, e.g., detecting specific sound frequencies (Scheich et al., 2007; Weinberger, 2010). In addition to the correlates of perceptual choices discussed above, A1 activity has been shown to be modulated by non-auditory influences including visual and somatosensory events (Brosch et al., 2005; Ghazanfar et al., 2005; Bizley et al., 2007; Kayser et al., 2008; Scheich et al., 2011), motor activity (Brosch et al., 2005; Yin et al., 2008; Scheich et al., 2011), and reward feedback (Brosch et al., 2011). Scheich et al. (2007) have argued that these responses are unlikely to be generated by A1 itself, but rather reflect dynamic interactions with numerous other cortical areas that are driven by task demands. Our results are quite consistent with this view, since, with the exception of early match suppression, changes in firing rate associated with subjects' subsequent behavioral choices followed similar effects observed in PFC.

The studies reporting that fewer PFC neurons were activated by auditory stimuli during STM and discrimination tasks (Watanabe, 1992; Kikuchi-Yorioka and Sawaguchi, 2000; Artchakov et al., 2007) provide evidence for an important difference in the neural processes underlying visual and auditory STM in primates (see also Muñoz et al., 2009). Another significant difference was demonstrated in a study by Fritz et al. (2005), which showed that lesions of the perirhinal and entorhinal cortices significantly impair visual but not auditory DMS task performance. Notably, preoperative performance was superior for the visual task, but postoperative performance was similar for visual and auditory tasks. These performance outcomes are consistent with anatomical studies showing substantial projections to the rhinal cortices from visual and somatosensory, but not auditory cortical areas (Brown and Aggleton, 2001; Munoz-Lopez et al., 2010). These differences notwithstanding, other studies have provided evidence for similarities between auditory and visual STM circuits, such as a prominent role of the PFC in identifying task-relevant events and selecting appropriate behavioral responses, the involvement of other cortical areas including primary sensory cortex, and similar physiological phenomena including match enhancement and delay-related changes in firing rate. Thus, the available evidence reveals both substantial similarities and differences in neural processes underlying visual and auditory STM.

The current results contribute to a small but growing body of literature casting light on the neural processes underlying auditory STM. In combination with our prior studies of PFC and dTP, the current study strengthens the idea that distinct neural mechanisms may be involved in mediating the match/non-match decision during the auditory DMS task. Specifically, early bottom–up processes might enable the basic distinction of repeated vs. non-repeated sounds, and top–down influences might reflect selection of the appropriate behavioral response. In addition, all three studies have revealed that changes in firing rate during the retention interval are generally less robust during auditory STM. Because these types of activity have been shown to be important for performance of visual STM tasks, the less robust auditory effects might be related to the inferior performance that has been observed in numerous studies of auditory STM in primates (Cohen et al., 2005; Fritz et al., 2005; Munoz-Lopez et al., 2010; Scott et al., 2012; Bigelow and Poremba, 2014). Notwithstanding the contributions of the current experiment and other recent studies, our understanding of the neural substrates of auditory STM remains largely incomplete. For example, simultaneous recordings from multiple cortical and subcortical areas, perhaps paired with lesions or inactivations, could be conducted to directly test the speculative possibility that early match suppression and late enhancement effects represent bottom–up influences and top–down influences from PFC and other cortical areas, respectively. Additional studies are also needed to clarify the extent to which auditory and visual STM depend on similar neural processes and circuitry. In particular, studies using comparable auditory and visual STM tasks and ideally the same subjects hold the potential to explain differences observed at the behavioral level and aid in interpreting the function of neurophysiological phenomena such as cue-modulation effects and changes in firing rate during mnemonic retention.

#### **ACKNOWLEDGMENTS**

We thank Drs. Bethany Plakke and Chi-Wing Ng for assistance with training the animals and Ryan Opheim for assistance with data collection. We also thank Mortimer Mishkin, Richard Saunders, and Megan Malloy at the National Institute of Mental Health for their support. This research was supported by funding awarded to Amy Poremba from the University of Iowa and a grant from the National Institute on Deafness and Other Communication Disorders (DC0007156).

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnins.2014.00250/abstract

#### **REFERENCES**


Miller, E. K., and Desimone, R. (1994). Parallel neuronal mechanisms for short-term memory. *Science* 263, 520–522. doi: 10.1126/science.8290960


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 April 2014; accepted: 28 July 2014; published online: 14 August 2014.*

*Citation: Bigelow J, Rossi B and Poremba A (2014) Neural correlates of short-term memory in primate auditory cortex. Front. Neurosci. 8:250. doi: 10.3389/fnins.2014.00250*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2014 Bigelow, Rossi and Poremba. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Specialized prefrontal "auditory fields": organization of primate prefrontal-temporal pathways

## *Maria Medalla<sup>1,2</sup> and Helen Barbas<sup>1,2,3</sup>\**

*<sup>1</sup> Department of Anatomy and Neurobiology, Boston University, Boston, MA, USA*

*<sup>2</sup> Neural Systems Laboratory, Department of Health Sciences, Boston University, Boston, MA, USA*

*<sup>3</sup> Department of Health Sciences, Boston University, Boston, MA, USA*

#### *Edited by:*

*Monica Munoz-Lopez, University of Castilla-La Mancha, Spain*

#### *Reviewed by:*

*Ricardo Insausti, University of Castilla-La Mancha, Spain*

*Brian H. Scott, National Institute of Mental Health, USA*

#### *\*Correspondence:*

*Helen Barbas, Neural Systems Laboratory, Boston University, 635 Commonwealth Ave., Room 431, Boston, MA 02215, USA e-mail: barbas@bu.edu*

No other modality is more frequently represented in the prefrontal cortex than the auditory, but the role of auditory information in prefrontal functions is not well understood. Pathways from auditory association cortices reach distinct sites in the lateral, orbital, and medial surfaces of the prefrontal cortex in rhesus monkeys. Among prefrontal areas, frontopolar area 10 has the densest interconnections with auditory association areas, spanning a large antero-posterior extent of the superior temporal gyrus from the temporal pole to auditory parabelt and belt regions. Moreover, auditory pathways make up the largest component of the extrinsic connections of area 10, suggesting a special relationship with the auditory modality. Here we review anatomic evidence showing that frontopolar area 10 is indeed the main frontal "auditory field" as the major recipient of auditory input in the frontal lobe and chief source of output to auditory cortices. Area 10 is thought to be the functional node for the most complex cognitive tasks of multitasking and keeping track of information for future decisions. These patterns suggest that the auditory association links of area 10 are critical for complex cognition. The first part of this review focuses on the organization of prefrontal-auditory pathways at the level of the system and the synapse, with a particular emphasis on area 10. Then we explore ideas on how the elusive role of area 10 in complex cognition may be related to the specialized relationship with auditory association cortices.

**Keywords: frontopolar cortex, frontal pole, area 10, anterior cingulate cortex, synaptic pathways, inhibitory neurons, laminar connections**

## **OVERVIEW**

It is quite remarkable that there is not a waking moment that is completely free of sound. Whether it is the buzzing of our surroundings or on-going conversations, our minds are bombarded by endless streams of auditory signals [e.g., (Conway et al., 2001; Denham and Winkler, 2006; Jaaskelainen et al., 2007; Micheyl et al., 2007); reviewed in (Bee and Micheyl, 2008; Winkler et al., 2009)]. Superimposed on the external auditory environment is an inward stream of thoughts, akin to the external stream, that contributes to the sea of auditory signals [e.g., (Scott et al., 2013b); reviewed in (Haykin and Chen, 2005; Allen et al., 2008; Perrone-Bertolotti et al., 2014)]. But what is more remarkable is our ability to sort out what is important in this sea of noise. The prefrontal cortex is necessary for the function of selecting relevant information and suppressing irrelevant signals for the task at hand (reviewed in Knight et al., 1999; Miller and Cohen, 2001). The interaction of prefrontal cortices with auditory association cortices provides an excellent demonstration of this prefrontal executive function (Chao and Knight, 1997), which is thought to reach beyond auditory processing *per se*, and extend to the global process of "using our thoughts" to guide cognitive tasks [e.g., (Frith, 1996; Wenzlaff and Wegner, 2000; Brewin and Smart, 2005); reviewed in (Knight et al., 1999; Allen et al., 2008; Winkler et al., 2009; Perrone-Bertolotti et al., 2014)]. The behavioral exemplars of these prefrontal-auditory interactions are evident in our daily lives, from following a conversation in a crowded room to tackling an inner debate on what to order from a menu, but the neural substrate and mechanisms are unclear.

From a neuroanatomical perspective, the importance of auditory information in prefrontal function is intuitive given that no other sensory modality is more frequently and vastly represented in the prefrontal cortex than the auditory modality (for review see Barbas et al., 2002). Pathways from auditory association cortices reach lateral, medial, and orbital surfaces of the prefrontal cortex. But the densest prefrontal interconnections with auditory association areas are with the frontopolar cortex, area 10, which mediates the most complex and abstract cognitive tasks (**Figure 1**; reviewed in Barbas et al., 2002; Burgess et al., 2007; Koechlin and Hyafil, 2007; Smith et al., 2007; Badre and D'Esposito, 2009).

The frontopolar cortex is situated in the most anterior part of the prefrontal cortex, extending from the lateral to the medial and orbital surfaces (Barbas and Pandya, 1989; Petrides, 2005). The function of this region had remained elusive and it was not until the advent of human functional neuroimaging that its role in complex cognition began to emerge (reviewed in Burgess et al., 2007; Koechlin and Hyafil, 2007). Early detailed physiologic work on prefrontal function in non-human primates had focused on visual-related processing, specifically the caudal periarcuate area 46 and area 8, the frontal eye fields (FEF) (Jacobsen, 1936; Robinson and Fuchs, 1969; Fuster, 1973; Niki and Watanabe, 1976). Studies in these periarcuate regions have led to important findings on the role of the prefrontal cortex in the active maintenance of information for a task at hand (working memory) and attention (reviewed in Goldman-Rakic, 1995; Fuster, 2001; Constantinidis and Procyk, 2004; Funahashi, 2006). While these findings are pivotal to our understanding of the prefrontal cortex, it is often not emphasized that visual information in lateral prefrontal cortex is represented in rather restricted areas within the caudal periarcuate region and at the most posterior part of the principal sulcus (Barbas and Mesulam, 1981, 1985; Barbas, 1988; Bullier et al., 1996).

Early anatomic studies showed that visual and visuomotor connections were confined to caudal lateral prefrontal areas (Barbas and Mesulam, 1981, 1985). Moreover, the studies revealed a striking opposite gradient in visual and auditory input along the rostro-caudal extent of the principal sulcus in rhesus monkeys (Barbas and Mesulam, 1985). While visual input was robust caudally, projections from auditory association cortices were sparse in caudal peri-principalis regions and became progressively denser in rostral areas, with the densest auditory pathways directed to the rostral frontopolar cortex in area 10 (**Figure 2**). While classical and modern anatomic studies have found strong auditory-related connections in the frontal pole (Pandya and Kuypers, 1969; Chavis and Pandya, 1976; Petrides and Pandya, 1988; Barbas et al., 1999, 2005; Hackett et al., 1999; Romanski et al., 1999a,b; Medalla et al., 2007), it is generally understated in the functional literature. This is in striking contrast to the well-known functional involvement of the caudal periarcuate FEF in visual tasks [e.g., (Mishkin, 1966; Robinson and Fuchs, 1969; Mohler et al., 1973); reviewed in (Bruce and Goldberg, 1984; Schiller and Tehovnik, 2001; Schall and Boucher, 2007)]. The extrinsic connections of the frontal pole are nearly exclusively with auditory association areas in rhesus monkeys, which rival in strength the visual-related connections of the FEF [(Barbas and Mesulam, 1985); reviewed in (Barbas et al., 2002; Lynch and Tian, 2005)]. Moreover, while most of the prefrontal cortex receives highly-processed high-order sensory information, both area 10 and the FEF receive projections from relatively "early" unimodal sensory association cortices as well [(e.g., auditory belt and parabelt areas for area 10, and areas V2 and V4 for FEF); (Barbas and Mesulam, 1985); reviewed in (Barbas et al., 2002)]. 
Early-processing sensory areas receive strong direct input from the primary cortices [e.g., (Pandya et al., 1988; Rauschecker et al., 1997; Kaas and Hackett, 1998)]. In this light, the frontopolar area 10 may be regarded as the frontal "auditory field," to reflect the emphasis of its rich and varied bidirectional connections with auditory association cortices.

## **ORGANIZATION OF AUDITORY-RELATED PREFRONTAL AREAS**

While auditory connections predominate for area 10 among prefrontal areas, auditory input impinges on several prefrontal auditory "hotspots" on the lateral and medial surfaces (**Figure 1**; Pandya and Kuypers, 1969; Chavis and Pandya, 1976; Barbas and Mesulam, 1981, 1985; Petrides and Pandya, 1988, 2002; Morel et al., 1993; Kaas and Hackett, 1998; Barbas et al., 1999, 2005; Hackett et al., 1999; Romanski et al., 1999a,b; Medalla et al., 2007). Auditory pathways also reach areas within the largely multimodal orbitofrontal cortex, but this is discussed elsewhere (reviewed in Barbas et al., 2011).

## **AUDITORY ASSOCIATION CORTICES THAT ARE LINKED WITH PREFRONTAL CORTICES**

The auditory cortices that are most strongly connected with prefrontal areas lie within the superior temporal gyrus (STG), extending from the inferior bank of the lateral fissure to the upper (medial) bank of the superior temporal sulcus (**Figure 1**, top). This temporal region is subdivided into distinct areas, according to the maps of Galaburda and Pandya (1983) and Hackett et al. (1999). These areas fall within four main subdivisions of the functional map of the auditory cortex: the core area, which includes the primary auditory cortex; the adjacent belt and parabelt regions; and the anterior temporal polar region (reviewed in Romanski and Averbeck, 2009; **Figure 1**, top).

The prefrontal cortex is interconnected roughly with the anterior two thirds of STG, which extends from temporal polar cortex through anterior parabelt and belt areas (**Figure 1**, top). These parts of STG consist mostly of high-order auditory association areas that respond to complex auditory stimuli [(Plakke et al., 2013; Ng et al., 2014); reviewed in (Romanski and Averbeck, 2009)]. A small subset of prefrontal interconnections includes caudal auditory belt areas (**Figure 1**, top; Hackett et al., 1999; Romanski et al., 1999a,b). Physiologic and metabolic mapping studies show activation of these temporal and interconnected prefrontal areas in response to auditory stimuli (Rauschecker et al., 1997; Poremba et al., 2004; Plakke et al., 2013; Ng et al., 2014). Details of the organization of the auditory cortex can be found elsewhere (reviewed in Romanski and Averbeck, 2009). This review focuses specifically on the anatomic organization of prefrontal-temporal pathways that may shed light on the mechanism of communication and information transfer within a network for high-order cognition.

## **TOPOGRAPHY OF PREFRONTAL "HOTSPOTS" FOR AUDITORY INPUTS AND OUTPUTS**

In the lateral prefrontal cortex, there is a graded increase in the density of auditory connections along the caudal to rostral axis (**Figure 2**; Barbas and Mesulam, 1985). Within the caudal lateral prefrontal cortex, auditory input is relatively restricted to specific domains of rostral dorsal area 8 (Barbas and Mesulam, 1981) and areas 45 and 12 in the ventrolateral prefrontal cortex (**Figure 1**, top; Hackett et al., 1999; Romanski et al., 1999a,b). These areas receive pathways from auditory association cortices from a restricted and more caudal part of STG, within the parabelt and belt areas (**Figure 1**, top; Barbas and Mesulam, 1981, 1985; Hackett et al., 1999; Romanski et al., 1999a,b). These areas also receive significant projections from visual association cortices and are thought to be sites of visual-auditory convergence in the prefrontal cortex [(Barbas, 1988); reviewed in (Romanski, 2007)]. Electrical stimulation of rostral dorsal area 8 (area 8a) in the upper bank of the arcuate sulcus, elicits large amplitude saccades (Robinson and Fuchs, 1969; Mohler et al., 1973; Bruce and Goldberg, 1985) and has been discussed as a region that may help direct attention to peripheral visual and auditory stimuli [e.g., (Barbas and Mesulam, 1981)]. Ventrolateral areas 12 and 45 have been the most well studied in terms of single-unit recordings during auditory tasks in awake behaving monkeys. In the ventrolateral prefrontal cortex, neurons are responsive to complex acoustic stimuli, including species-specific vocalizations that involve a complex interplay of visual and auditory information (reviewed in Romanski and Averbeck, 2009). In the more anterior lateral prefrontal areas, mid-dorsolateral areas 46 and 9, pathways from auditory cortices are stronger as visual input wanes (**Figures 2B,C**). 
This mid-dorsolateral prefrontal cortex is thought to play a prominent role in classic working memory, the temporary active maintenance of information needed to perform the task at hand (reviewed in Levy and Goldman-Rakic, 2000; Petrides, 2005).

Unlike the restricted patches in lateral prefrontal cortices that have connections with auditory association cortices, the representation of the auditory modality is widespread in the medial prefrontal cortex (**Figure 1**). Dorsolateral area 9 extends to the medial surface, where there is a graded increase in connections with auditory cortices (Barbas et al., 1999). Dense auditory pathways along the medial wall reach all the way from rostral medial prefrontal area 10 to more caudally situated areas 14, 32, and 25 (**Figure 1**, bottom; Barbas et al., 1999). In particular, auditory connections in areas 32 and 25 of the anterior cingulate cortex (ACC) rival in density auditory connections with area 10. However, unlike area 10, which is privy to information from both early and high-order auditory association cortices, the ACC has access only to highly-processed auditory information through dense interconnections with the rostral STG, especially with areas near the temporal pole [(Barbas et al., 1999); reviewed in (Barbas et al., 2002)]. Nonetheless, with its robust anatomic links with the rostral auditory association cortices, the ACC can be regarded as the medial frontal auditory "hotspot." In fact, the ACC has a demonstrated robust and functional interaction with auditory areas. Electrical stimulation of the ACC can evoke species-specific vocalizations in monkeys, a pathway thought to mediate emotional communication [(Muller-Preuss et al., 1980; Muller-Preuss and Ploog, 1981); reviewed in (Vogt and Barbas, 1988)]. Activity in the ACC has also been correlated with auditory processing of actual and internal speech in humans (Frith et al., 1995; McGuire et al., 1995).

[…] lateral surface. In **(A)** a complete map of pathways directed to area 10 shows […] Rh, rhinal; ST, superior temporal. Adapted from Barbas and Mesulam, 1985.

In contrast to the lateral and medial prefrontal cortices described above, the role of auditory connections in area 10 processing is largely unknown. It was not until recently that the first electrophysiologic recordings from individual neurons in area 10 of non-human primates were conducted, ironically using visual stimuli (Tsujimoto et al., 2010). In that study, a subset of neurons in area 10 showed decision-selective activity but only during the feedback period of a visual working memory task. In humans, highly complex cognitive tasks that engage area 10 also entail high-order verbal processing [e.g., (Brewin and Smart, 2005; Bunge et al., 2005; Christoff et al., 2011); reviewed in (Wenzlaff and Wegner, 2000; Burgess et al., 2007; Koechlin and Hyafil, 2007; Badre and D'Esposito, 2009)]. Nonetheless, the functionally unexplored frontopolar-auditory network is anatomically robust and in recent years we have used high-resolution tract-tracing and imaging techniques to elucidate the organization of these pathways at the level of the system and the synapse.

## **STRUCTURAL ORGANIZATION OF FRONTOPOLAR AREA 10**

Frontopolar area 10 in rhesus monkeys is a granular cortical area with a well-defined layer IV; it encompasses about the anterior quarter of the prefrontal cortex (**Figure 3A**; Barbas and Pandya, 1989). Area 10 has expanded significantly in humans, especially on the lateral surface, concomitant with the expansion of other lateral prefrontal areas (Semendeferi et al., 2011). Recent mapping studies that compare functional coupling of activation across cortical areas in humans and monkeys have shown that the entire macaque area 10 corresponds only to the medial part of the human frontal pole (Sallet et al., 2013; Neubert et al., 2014).

In the rhesus monkey, area 10 has dorsal, medial and ventral (basal) subdivisions, all of which are interconnected with auditory areas of the STG (**Figures 3A**, right, **3B**; Pandya and Kuypers, 1969; Chavis and Pandya, 1976; Petrides and Pandya, 1988). In addition to auditory connections, area 10 is heavily interconnected with other parts of the prefrontal cortex, especially dorsolateral prefrontal areas 9/46 and ACC area 32 [**Figure 2A**, black; (Barbas and Pandya, 1989; Barbas et al., 1999; Medalla and Barbas, 2010); reviewed in (Barbas et al., 2002)]. With regard to the extrinsic connections of area 10 outside the prefrontal cortex, most are with auditory association areas (**Figure 2A**, blue), and the rest are with the cortical limbic system in the cingulate, retrosplenial, rhinal, and anterior temporal polar cortices [**Figure 2A**, green; (Barbas and Mesulam, 1985; Barbas et al., 1999); reviewed in (Barbas et al., 2002)]. Area 10 receives massive input from a wide spectrum of STG areas, spanning from rostral sectors (temporal pole, Ts1-2 or STGr) to the more caudally situated anterior parabelt (area Ts3 or areas RP/RTL/AL) and belt (PaI/RTM and ProA/RM) auditory association cortices (**Figures 1A**, **2A**; Barbas and Mesulam, 1981, 1985; Hackett et al., 1999; Romanski et al., 1999a,b). The reciprocal connections from area 10 to STG are also robust, encompassing a similarly widespread rostro-caudal extent along the STG (**Figures 3B**, **4A**; Barbas et al., 2005; Germuska et al., 2006; Medalla et al., 2007). In recent years, we have investigated the fine structural features of these pathways from area 10 to STG to shed light on the potential influence of frontopolar area 10 on auditory processing within the STG.

## **FRONTOPOLAR AREA 10 PATHWAYS TO AUDITORY ASSOCIATION CORTICES**

#### **GRADED LAMINAR TERMINATIONS FROM AREA 10 TO DISTINCT AUDITORY ASSOCIATION AREAS**

**FIGURE 3 | Architecture of area 10 and prefrontal projection neurons to auditory cortex. (A)** Photomicrograph of a coronal Nissl-stained section shows the cytoarchitecture of area 10 in a rhesus monkey brain, with delineated dorsal, medial and ventral subregions (right). Thin line marks the top of layer IV. Inset shows location of a higher magnification photomicrograph of dorsal area 10 (left). **(B)** Lateral view of a rhesus monkey brain shows injection site of a retrograde tracer in STG area Ts2 (black); **(a–c)**, Coronal sections show plots of projection neurons in prefrontal cortices directed to the STG site in area Ts2. Adapted from Medalla et al., 2007.

[…] the relative density of area 10 pathway terminations in distinct STG areas. Axon terminals were labeled after injection of anterograde tracers in dorsal area 10. Density is normalized to the highest in the set. Long dashes demarcate banks of sulci schematically unfolded; short dashes delineate areal boundaries. Bottom inset shows coronal sections **(a–c)** through STG with plots of labeled terminations from area 10. Rostro-caudal level of each section is indicated on the whole brain (top). **(B)** The laminar distribution (expressed as percent in the upper layers I–III) of prefrontal interconnections relative to the laminar distribution of inhibitory neurons labeled with PV and CB in distinct STG areas. Top panel […] summarizes the predominant pattern of connections (boutons, black dots; projection neurons, blue triangles) of prefrontal cortices with the agranular and dysgranular (limbic) parts of the temporal pole (top) and with a caudal eulaminate area of the superior temporal cortex (bottom), and their relationship to PV+ (red ovals) and CB+ (orange ovals) inhibitory interneurons. Black arrows (left) show the predominant laminar termination of prefrontal axons in superior temporal areas; blue arrow (right) shows the predominant laminar origin of projection neurons in superior temporal areas directed to prefrontal cortex. Abbreviations as in **Figures 1**, **2**. Adapted from Barbas et al., 2005.

Pathways from area 10 terminate densely in rostral parts of STG and extend caudally to auditory parabelt and belt areas (**Figure 4A**; Barbas et al., 2005; Germuska et al., 2006; Medalla et al., 2007). The highest densities of axon terminals from area 10 reach areas Ts1-3, especially anterior parabelt area Ts3 (**Figure 4A**). There is a graded pattern of terminations in the cortical layers targeted by area 10 in STG (**Figures 4B,C**). Interestingly, this graded pattern of laminar termination coincides with graded changes in cortical structure of the targeted cortices in STG, characterized by an increase in neuronal density and granularity (appearance of layer IV) from temporal polar areas to more posterior STG cortices (Barbas and Rempel-Clower, 1997; Rempel-Clower and Barbas, 2000; Barbas et al., 2005). A large part of the temporal polar cortex is limbic type of cortex, characterized by an absent layer IV (agranular) or a poorly delineated layer IV (dysgranular). Posterior association areas are granular, with well-delineated cortical layers (Galaburda and Pandya, 1983). Axon terminals from area 10 are densest in the deep cortical layers (V–VI) of the agranular/dysgranular STG cortices in the temporal pole (**Figures 4B**, black dots; **4C** top), but are densest in the upper layers (I–IIIa) of the granular posterior auditory association areas Ts1-3 (**Figures 4B**, black dots, **4C** bottom; Barbas et al., 2005). This pattern is consistent with our structural model for cortico-cortical connections, which relates the laminar pattern of connections to the structural difference between interconnected areas (Barbas and Rempel-Clower, 1997; Rempel-Clower and Barbas, 2000; Barbas et al., 2005; Medalla and Barbas, 2006). Pathways that terminate in the upper layers emanate from areas that have a simpler laminar structure than the area of termination (e.g., a pathway from an agranular to a granular cortex). Pathways that terminate in the middle-deep layers link areas with the opposite structural relationship.
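The rule stated by the structural model can be illustrated with a minimal sketch, which is not part of the original analyses. It assumes that laminar elaboration can be coded as an ordinal level. The `CortexType` levels, their numeric values, and the function name `predicted_termination` are hypothetical and introduced here only for illustration. The text does not describe the case of two areas with equal structure, so the sketch leaves that case unspecified.

```python
from enum import IntEnum


class CortexType(IntEnum):
    """Ordinal coding of laminar structure (illustrative assumption)."""
    AGRANULAR = 1    # layer IV absent
    DYSGRANULAR = 2  # layer IV poorly delineated
    GRANULAR = 3     # well-delineated layers, including layer IV


def predicted_termination(origin: CortexType, target: CortexType) -> str:
    """Return the layers where terminals are expected to predominate.

    Rule from the text: a pathway from a structurally simpler area to a
    more elaborate one terminates in the upper layers; the opposite
    relationship gives terminations in the middle-deep layers.
    """
    if origin < target:
        return "upper layers"
    if origin > target:
        return "middle-deep layers"
    # Equal structure is not covered by the rule as stated in the text.
    return "unspecified"


if __name__ == "__main__":
    # Example from the text: agranular origin, granular target.
    print(predicted_termination(CortexType.AGRANULAR, CortexType.GRANULAR))
    # Opposite relationship: granular origin, agranular target.
    print(predicted_termination(CortexType.GRANULAR, CortexType.AGRANULAR))
```

A three-level coding is too coarse to separate granular areas from one another, such as area 10 and areas Ts1-3. Capturing graded differences among them would require a finer scale, which is not attempted here.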

The significance of laminar termination patterns is that cortical layers are also distinct in terms of the excitatory and inhibitory neuronal microenvironment in different auditory association cortices [**Figures 4B,C**; (Barbas et al., 2005); reviewed in (Barbas et al., 2002)]. In the temporal pole, fibers from area 10 are concentrated in the middle-deep layers where they overlap with the dense population of excitatory layer V–VI pyramidal neurons that project back to area 10, but largely avoid the large population of inhibitory neurons found in the upper layers (**Figures 4B,C**, top). In contrast, axon fibers from area 10 reach primarily the upper layers of auditory association areas Ts1-3, where they co-mingle with the population of excitatory projection neurons in layers II–III that project to area 10, as well as the population of calbindin and calretinin inhibitory neurons, which are dense in the upper layers (**Figures 4B,C**, bottom). Thus, while there is a laminar "match" with regard to the prevalence of excitatory connections and inhibitory neurons in the auditory association areas Ts1-3, there is a laminar "mismatch" in the temporal pole where excitatory connections predominate in the deep layers but most inhibitory neurons are found in the upper layers in this region (Barbas et al., 2005).

#### **LAMINAR SPECIFIC SYNAPTIC FEATURES OF AREA 10 TERMINATIONS IN AUDITORY ASSOCIATION CORTEX**

Our recent work has focused on the pathways within the prefrontal-auditory network at the synaptic level, including features of axon terminals from area 10 to distinct cortical layers of STG areas Ts1-2 (**Figures 5A,B**, pathways *a, b, c*). We found a progressive increase in the size (volume) of area 10 axon terminals (boutons) in STG from layer I (**Figures 5A,B**, *a*), to layers II–IIIa (**Figures 5A,B**, *b*), and to the middle layer IV (**Figures 5A,B**, *c*; Germuska et al., 2006; Medalla et al., 2007). Thus, boutons from area 10 (**Figure 5B**, blue dots) in layer IV are larger than terminals in layer I of STG. The middle cortical layers are recipients of cortico-cortical and cortico-thalamic "feedforward" driving input, while layer I receives "feedback" modulatory pathways [e.g., (Hashikawa et al., 1995); reviewed in (Felleman and Van Essen, 1991; Jones, 1998; Abbott and Chance, 2005; Lee and Sherman, 2010)]. Large boutons that terminate in layer IV contain more synaptic vesicles and have a larger mitochondrial content than small boutons in layer I (Germuska et al., 2006), suggesting higher synaptic efficacy. The size of presynaptic terminals is correlated with the number of synaptic vesicles (Germuska et al., 2006; Zikopoulos and Barbas, 2007) and the probability of neurotransmitter release (Tong and Jahr, 1994; Murthy et al., 1997, 2001). One of the most efficient "drivers" of cortical neurons, especially in sensory areas, is the thalamic pathway that terminates in layer IV [(Rose and Metherate, 2005; Lee and Sherman, 2008; Cruikshank et al., 2010); reviewed in (Castro-Alamancos and Connors, 1997; Sherman and Guillery, 1998, 2002; Guillery and Sherman, 2002; Jones, 2002; Abbott and Chance, 2005; Silberberg et al., 2005; Lee and Sherman, 2010)]. 
Thus, as we have suggested previously (Germuska et al., 2006; Medalla et al., 2007) the laminar differences in size of axonal boutons from area 10 that terminate in auditory association cortex suggest differences in the strength of synaptic influence across cortical layers. This evidence suggests that area 10 can exercise diverse excitatory effects on STG association areas depending on the predominance of terminations in the upper or middle layers. In areas Ts1-2, the smaller upper layer terminals from area 10 suggest a predominant modulatory role of this specific pathway. Based on these synaptic relationships and findings of changes in the relative density of upper vs. deep layer terminals from area 10 to distinct STG areas, we have suggested that area 10 may drive activity in the temporal polar areas of STG through dense terminations in the middle-deep layers (Barbas et al., 2005).

Laminar terminations encounter specific microenvironments with regard to populations and dendritic segments of excitatory and inhibitory postsynaptic targets (reviewed in Peters, 1987; White, 1989; Callaway, 2002; Douglas and Martin, 2004). We found that most (about 80%) of the synapses in the pathway from area 10 to STG (areas Ts1-2) target spines (Germuska et al., 2006; Medalla et al., 2007), which are enriched on the dendrites of cortical excitatory neurons (**Figures 5B**, green; **5C,E,F** from Medalla et al., 2007). The laminar specificity of these spine-targeting boutons can influence which dendritic domains or population of neurons are innervated (reviewed in Silberberg et al., 2005; Spruston, 2008). Layer I is populated with the distal apical dendrites of neurons from the layers below, while the middle-deep layers consist mostly of proximal and basal dendrites of pyramidal neurons (**Figure 5B**, green P; Larkman and Mason, 1990; Larkman, 1991). Layer IV consists mostly of spiny stellate excitatory neurons (**Figure 5B**, s) that receive direct thalamic input in sensory cortices [(Peters et al., 1994); reviewed in (White, 1989)]. Thus, area 10 innervates mostly spines from apical dendrites of pyramidal neurons in layer I, but may interact with other dendritic segments and excitatory neuronal types in the deep layers.

## **SYNAPTIC INTERACTION OF AREA 10 WITH INHIBITORY NEURONS IN STG**

Inhibitory neurons in the primate cortex can be reliably identified and grouped by the expression of one of three calcium-binding proteins. One group of neurons expresses parvalbumin (PV, **Figures 5A,B**, red), a second group expresses calbindin (CB, **Figures 5A,B**, magenta), and a third group expresses calretinin (not shown). In primates, these neurochemical classes of inhibitory neurons represent distinct non-overlapping populations that differ in distribution, morphology, physiology and synaptic interactions with neighboring neurons (reviewed in Defelipe, 1997). Parvalbumin labels inhibitory neurons that innervate neighboring pyramidal neurons at their proximal dendrites or somata (basket cells) or the axon initial segments (chandelier cells) (**Figures 5A,B**, red; Defelipe et al., 1989b; Kawaguchi and Kubota, 1997; Thomson and Deuchars, 1997). Parvalbumin neurons have distinct fast-spiking firing properties and are the most reliably identified by physiologic methods [(Kawaguchi and Kubota, 1997; Krimer et al., 2005); reviewed in (Markram et al., 2004)]. The proximal innervation pattern and fast-firing properties of PV neurons suggest strong inhibition with rapid temporal dynamics for controlling the timing of spike output of pyramidal neurons (Rao et al., 1999; Constantinidis and Goldman-Rakic, 2002; Trevelyan and Watkinson, 2005). On the other hand, calbindin labels inhibitory neurons that innervate the distal dendrites and spines of excitatory neurons (**Figures 5A,B**, magenta; Defelipe et al., 1989a; Kawaguchi and Kubota, 1997; Peters and Sethares, 1997). CB inhibitory neurons are physiologically diverse but they are non-fast spiking and generally have slower firing dynamics than PV neurons (Kawaguchi and Kubota, 1997; Krimer et al., 2005; Zaitsev et al., 2005). It has been suggested that CB neurons engage a modulatory type of dendritic inhibition to selectively enhance the signal-to-noise ratio of relevant inputs within a cortical column (Wang et al., 2004). Interestingly, PV and CB inhibitory neurons in the primate cortex, including area 10 and the STG, have complementary laminar distributions (Hendry et al., 1989; Conde et al., 1994; Kondo et al., 1994; Gabbott and Bacon, 1996; Dombrowski et al., 2001; Medalla and Barbas, 2006). While PV neurons predominate in the middle-deep layers (IIIb–VI), CB neurons are densest in the upper layers (II–IIIa).

**FIGURE 5 | Synapses of prefrontal axons in auditory association cortex. (A,B)** Schematic summarizes the predominant synaptic connections of the pathway from frontopolar area 10 to auditory association areas Ts1-3. The pathway from area 10, which terminates predominantly in the upper layers, shows a progressive increase in the volume of axon terminals **(B**, blue dots**)** from smallest in layer I (pathway *a*), through layers II–IIIa (*b*) and largest in layer IV (*c*). Boutons that terminate in different layers interact with specific excitatory (green) and inhibitory (red and magenta) dendritic domains and possibly with distinct populations of inhibitory neurons. Area 10 innervates mostly spines of pyramidal neurons (P) in layers I–IIIa, but may interact with other excitatory neuronal types, such as the spiny stellate neurons (s) in layer IV. Among the minority of area 10 axons that innervate inhibitory neurons, synapses are formed on both PV+ (red) and CB+ (magenta) inhibitory neurons that inhibit specific dendritic domains of pyramidal neurons. Inhibitory control may also occur at the site of origin of the pathway **(A**, pathway *d***)** through inhibition of STG-directed projection neurons in area 10 (P, blue). STG-directed projection neurons are dense in the upper layers of area 10, with apical dendrites overlapping extensively with CB+ inhibitory neurons. **(C)** Example of an electron micrograph shows tracer-labeled bouton from prefrontal cortex forming an asymmetric (excitatory) synapse (green arrow) on a PV+ dendrite (red arrowheads) in STG. Note the nearby unlabeled synapse (black arrow) on the PV+ dendritic shaft. **(D)** An electron micrograph shows a labeled prefrontal bouton forming a synapse with a spine in STG (green arrow). Note that the spine receives a symmetric (inhibitory) synapse (red arrow) from a CB+ inhibitory terminal (red arrowheads). **(E–G)** Examples of three-dimensional reconstructions from serial sections through labeled presynaptic axon terminals (At, blue) from prefrontal pathways and their corresponding postsynaptic densities (PSD, red) and postsynaptic targets in STG photographed in the electron microscope. **(E)** A small and **(F)** a large prefrontal bouton (At) each form a synapse (PSD) on a spine (sp, white). **(G)** A prefrontal bouton forms a synapse with a smooth/aspiny dendrite from a presumed inhibitory neuron in STG. Note the lack of spines and presence of nearby synapses on the shaft from unlabeled boutons, characteristics of smooth dendrites of inhibitory neurons. Adapted from Medalla et al., 2007.

In addition to synapses on spines of presumed excitatory neurons, a smaller subset (∼20% or less) of synapses from area 10 innervates dendrites of presumed inhibitory neurons in areas Ts1-2 (**Figures 5D,G**; Germuska et al., 2006; Medalla et al., 2007). By morphology, cortical inhibitory neurons have smooth or sparsely spiny dendrites [(Feldman and Peters, 1978; Kawaguchi et al., 2006); reviewed in (Peters et al., 1991; Fiala and Harris, 1999)], which are features that can readily be assessed at high-resolution, using three-dimensional electron microscopic methods (**Figure 5G**; Germuska et al., 2006; Medalla et al., 2007; Medalla and Barbas, 2009, 2010, 2012). We have found that area 10 innervates inhibitory neurons in layers I, II–IIIa, and IV of STG, with a trend for a slightly higher frequency in progressively deeper layers (II–IIIa and IV) compared to layer I (Germuska et al., 2006; Medalla et al., 2007). The middle-deep layers of STG are more densely populated by PV inhibitory neurons (Barbas et al., 2005). These layers are also innervated by excitatory "feedforward" cortico-cortical and corticothalamic fibers in the auditory cortex (Rose and Metherate, 2005; Lee and Sherman, 2008) and other cortical areas [(Melchitzky et al., 1999; Zhu and Connors, 1999; Beierlein et al., 2003; Gonchar and Burkhalter, 2003; Negyessy and Goldman-Rakic, 2005; Zikopoulos and Barbas, 2007; Cruikshank et al., 2010); reviewed in (White, 1989; Peters et al., 1994)]. We found that terminals from area 10 in layers II–IIIa of STG (areas Ts1-2) innervate CB neurons, as well as PV neurons (**Figure 5B**; Medalla et al., 2007). Thus, area 10 has the potential to engage two distinct modes of inhibition in STG: modulatory CB-mediated as well as rapid and strong PV-mediated inhibition.

In addition to inhibition at the site of termination in STG, inhibitory control via the area 10 pathway may also occur locally within area 10, by engaging inhibitory neurons that innervate projection neurons directed to STG (**Figure 5A**, pathway *d*). We have shown that projection neurons directed to STG areas arise mostly from the upper layers (II–III) of area 10 (Medalla et al., 2007). Pyramidal neurons in layers II–III extend their apical dendrites and arborize profusely in layer I [e.g., (Larkman and Mason, 1990; Larkman, 1991); reviewed in (Silberberg et al., 2005)]. Thus, the proximal and extensive distal apical domains of STG-directed projection neurons in area 10 are sites of potential synaptic innervation by the distinct classes of inhibitory neurons. In particular, the robust laminar overlap of STG projection neurons and CB neurons in layers II–IIIa suggests CB-mediated inhibition of auditory-directed projection neurons in area 10 (**Figure 5A**, magenta; Medalla et al., 2007).

In summary, pathway terminations from area 10 are diverse in distribution and synaptic features, which depend on the specific STG area and cortical layer of termination. These varied innervation patterns suggest that area 10 may have diverse influences on excitatory and inhibitory microcircuits in STG areas, allowing flexibility in prefrontal-auditory functional interactions for complex cognition.

## **FUNCTIONAL IMPLICATIONS OF PREFRONTAL-AUDITORY PATHWAYS IN HIGH-ORDER COGNITION**

#### **AUDITORY CONNECTIONS OF AREA 10 FOR COMPLEX COGNITION**

The robust and diverse synaptic pathways from area 10 to the STG suggest a tight link between area 10 function and auditory processing. The evidence reviewed points to a specialized relationship of area 10 with the auditory association cortex as a key frontal "auditory field." Area 10 receives information from almost all levels of processing in the STG, from the very detailed and early sensory processing in belt and parabelt areas to the complex high-order processing of acoustic stimuli for conspecific communication in temporal polar areas [e.g., (Poremba et al., 2004; Kusmierek and Rauschecker, 2009; Kikuchi et al., 2010); reviewed in (Romanski and Averbeck, 2009)]. The question thus arises as to how these connections are used in high-order cognitive functions mediated by area 10.

The frontal pole is thought to be part of the working memory network together with dorsolateral prefrontal areas 9/46, engaged for active maintenance of information for the task at hand (reviewed in Petrides, 2000; Barbas et al., 2002). Baddeley (1996) proposed two important components for working memory in humans: a visuospatial scratchpad and an articulatory loop. This is not surprising from an anatomical perspective, given that the working memory system in the lateral prefrontal cortex is indeed dominated posteriorly by visual-related periarcuate areas (caudal area 46 and FEF) and anteriorly by auditory-related areas (area 10 and mid-dorsolateral areas 46/9). Interestingly, this auditory-visual gradient along the rostro-caudal axis of lateral prefrontal areas is thought to coincide with a functional hierarchy by complexity of processing (reviewed in Petrides, 2000; Barbas et al., 2002; Burgess et al., 2007; Koechlin and Hyafil, 2007; Smith et al., 2007; Badre and D'Esposito, 2009). Area 10 is thought to be at the top of this hierarchy, which mediates the most complex and abstract cognitive tasks. This idea is exemplified in human functional neuroimaging studies that have shown specific recruitment of area 10 during complex multi-tasking, when one task must be temporarily suspended to attend to another task [(Koechlin et al., 1999; Burgess et al., 2000; Braver et al., 2003; Dreher et al., 2008; Tsujimoto et al., 2010); reviewed in (Burgess et al., 2007; Koechlin and Hyafil, 2007; Smith et al., 2007; Badre and D'Esposito, 2009)]. This can be illustrated, for instance, when a person interrupts cooking a meal to answer the phone and subsequently resumes cooking where they left off.
Moreover, the functional imaging studies have shown that multi-task functions and other related complex cognitive tasks mediated by area 10 rely on phonologic processing of "inner thoughts" for mental tracking of multiple information streams [e.g., (Bunge et al., 2005; Christoff et al., 2011); reviewed in (Burgess et al., 2007)]. Thus, area 10 may engage its strong auditory links for abstract representation of information in organized thought during complex cognitive tasks.

It is also interesting that the evolved complexity of cognition from monkeys to humans seems to parallel the cortical expansion of both the auditory system and the frontal pole. In particular, in humans the language cortices have evolved as specialized systems for verbal articulation as the frontal pole has also expanded laterally (Semendeferi et al., 2011; Sallet et al., 2013; Neubert et al., 2014). This evolutionary trend is also evident in the connections of area 10 in different species of non-human primates. Neural tracing in marmoset monkeys has shown a smaller representation of auditory connections in area 10 compared to macaque monkeys (Barbas et al., 1999; Burman et al., 2011). The above evidence is consistent with the idea that as the auditory system evolved in humans, area 10 kept pace with more extensive auditory connections. This trend is also reflected in behavior: as cognitive tasks in humans rely more on verbal information, more complex tasks can be tackled [e.g., (Frith, 1996; Wenzlaff and Wegner, 2000; Brewin and Smart, 2005; Bunge et al., 2005; Christoff et al., 2011); reviewed in (Knight et al., 1999; Allen et al., 2008; Winkler et al., 2009; Perrone-Bertolotti et al., 2014)]. By comparison with humans, monkeys have relatively poor capacity for working memory in the auditory domain (Ng et al., 2009; Scott et al., 2012, 2013a). Based on the functional and anatomical evidence, it is likely that the auditory interactions in a highly evolved area 10 are crucial to a role in high-order cognition in humans.

#### **INTERACTION OF FRONTOPOLAR AND MEDIAL PREFRONTAL "AUDITORY FIELDS" FOR COGNITIVE CONTROL**

In addition to area 10, auditory signals impinge on a wide spectrum of prefrontal areas (**Figure 1**). Particularly strong auditory connections are seen for medial prefrontal areas 32 and 25 in the ACC. Importantly, these prefrontal auditory "hotspots" are also robustly interconnected with each other through intrinsic prefrontal pathways (Barbas and Mesulam, 1985; Barbas and Pandya, 1989; Barbas et al., 1999; Medalla and Barbas, 2010). Thus, we have previously suggested that the prefrontal cortex may use auditory information either through direct connections with STG, or indirectly through local interconnections between auditory-related prefrontal areas (Barbas et al., 2005; Medalla et al., 2007; Medalla and Barbas, 2010).

Frontopolar area 10 and anterior cingulate areas, the rostral and medial frontal "auditory fields" that are most strongly interconnected with the STG, are also robustly linked with each other. In particular, a pathway from ACC area 32 innervates spines of excitatory neurons in area 10 through large and synaptically effective boutons in layers II–III [(Medalla and Barbas, 2010); reviewed in (Barbas et al., 2013)]. These ACC boutons are larger than in the pathways linking area 10 with other dorsolateral prefrontal areas (46 or 9), and they are comparable in size to the "feedforward/driving" pathway terminations in layer IV of sensory cortices (Melchitzky et al., 2001; Anderson and Martin, 2002, 2005, 2006; Medalla et al., 2007), including the terminations from area 10 to layer IV of STG (Germuska et al., 2006). We have previously suggested that the large ACC terminations may drive and redirect activity in area 10 to help select relevant signals (presumably from auditory areas), and suppress noise for complex multi-task functions (Medalla and Barbas, 2010). This idea is consistent with the prominent role of the ACC in allocating attention and in task-switching, especially when cognitive demand is high (reviewed in Barbas and Zikopoulos, 2007; Botvinick, 2007; Lee et al., 2007; Schall and Boucher, 2007). Interestingly, the ACC has a demonstrated influence in several high-order auditory-related functions by affecting activity in auditory cortices. Microstimulation of ACC evokes species-specific vocalization in monkeys and affects activity in auditory cortices [(Muller-Preuss et al., 1980; Muller-Preuss and Ploog, 1981); reviewed in (Vogt and Barbas, 1988)]. In humans, correlated gamma-band activity in ACC and auditory areas suggests functional coupling between these cortices during demanding cognitive tasks (Mulert et al., 2007).

The ACC-frontopolar-auditory network may mediate high-order filtering of auditory processing to allow communication in a noisy environment. Such filtering has been discussed for the auditory modality, in general (reviewed in Conway et al., 2001; Denham and Winkler, 2006; Jaaskelainen et al., 2007; Micheyl et al., 2007). The pathways that link area 10 with the auditory areas may help keep track of internal thoughts, which is important for working memory and problem solving [e.g., (Brewin and Smart, 2005); reviewed in (Wenzlaff and Wegner, 2000)]. This hypothesis is consistent with findings that the ACC and area 10 are activated during mental tracking of multiple tasks (reviewed in Burgess et al., 2007). The anatomical evidence supports this hypothesis, but behavioral and functional studies that employ auditory-related tasks are needed to investigate the role of auditory input, and the specialized projections from ACC, for cognitive processing in area 10.

## **PREFRONTAL-AUDITORY PATHWAY DISRUPTION IN DISEASE**

Pathology in the prefrontal-auditory network has been implicated in schizophrenia, a disease characterized by high distractibility, disordered thought patterns and auditory hallucinations (reviewed in Cohen et al., 1996; Honey and Fletcher, 2006; Allen et al., 2008). *Post-mortem* studies in brains of schizophrenic patients show that specific markers for populations of inhibitory and excitatory neurons are diminished in auditory-related ACC and mid-dorsolateral prefrontal areas, disrupting the excitatory and inhibitory balance necessary for normal cognitive functions (reviewed in Benes, 2000; Beasley et al., 2002; Volk and Lewis, 2002; Vogels and Abbott, 2007; Fornito et al., 2009; Eisenberg and Berman, 2010). For instance, the ACC shows a decrease in pyramidal neuron density in the deep layers (Benes et al., 2001) and reduced overall activity in schizophrenia (Fletcher et al., 1999; Kerns et al., 2005; Snitz et al., 2005; Allen et al., 2007; Leicht et al., 2010). The deep layers of ACC give rise to projections to lateral prefrontal cortices in monkeys, in a pattern expected to hold for humans based on the predictions of the structural model for connections (Barbas and Rempel-Clower, 1997). Based on available data on the synaptic circuits within the prefrontal-auditory network, we have speculated that pathologic hypofunction in ACC may weaken its output to inhibitory neurons in other auditory-related prefrontal cortices such as frontopolar area 10 and mid-dorsolateral areas 9/46 (Medalla and Barbas, 2009, 2010). The strong influence of the ACC on CB inhibitory neurons, in particular, suggests a mechanism to suppress noise (Wang et al., 2004). By the same principle, hypofunction especially in the deep layers of ACC may reduce excitation to frontopolar area 10, which is engaged when keeping track of internal thoughts to perform multiple tasks in humans (reviewed in Burgess et al., 2007; Koechlin and Hyafil, 2007; Smith et al., 2007; Badre and D'Esposito, 2009). 
Weakening of these interactions may account for the high distractibility and disordered thought patterns in schizophrenia (reviewed in Cohen et al., 1996; Honey and Fletcher, 2006; Allen et al., 2008). Finally, the relative activation of ACC and auditory cortices appears to help distinguish actual from inner speech in humans, in functions that are disrupted in schizophrenic patients who experience auditory hallucinations (Frith et al., 1995; McGuire et al., 1995). The strong and specific anatomic pathways interlinking prefrontal and auditory cortices reviewed here thus suggest a key role of these interactions in high-order cognition, and may help explain the impairments in processing of "inner thoughts" that account for the distractibility, disordered thought process and auditory hallucinations in schizophrenia.

## **ACKNOWLEDGMENTS**

This work is supported by grants from NIH (NIMH R01MH057414 and NINDS R01NS024760, Barbas; NIMH K99MH101234, Medalla) and CELEST, an NSF Science of Learning Center (NSF OMA-0835976).

#### **REFERENCES**




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 02 February 2014; paper pending published: 14 February 2014; accepted: 27 March 2014; published online: 16 April 2014.*

*Citation: Medalla M and Barbas H (2014) Specialized prefrontal "auditory fields": organization of primate prefrontal-temporal pathways. Front. Neurosci. 8:77. doi: 10.3389/fnins.2014.00077*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2014 Medalla and Barbas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Auditory connections and functions of prefrontal cortex

## *Bethany Plakke\* and Lizabeth M. Romanski*

*Department of Neurobiology and Anatomy, University of Rochester School of Medicine and Dentistry, Rochester, NY, USA*

#### *Edited by:*

*Yukiko Kikuchi, Newcastle University Medical School, UK*

#### *Reviewed by:*

*Kadharbatcha S. Saleem, National Institutes of Health, USA Christopher I. Petkov, Newcastle University, UK*

#### *\*Correspondence:*

*Bethany Plakke, Department of Neurobiology and Anatomy, University of Rochester School of Medicine and Dentistry, 601 Elmwood Ave., Box 603, Rochester, NY 14642, USA e-mail: bethany\_plakke@urmc.rochester.edu*

The functional auditory system extends from the ears to the frontal lobes with successively more complex functions occurring as one ascends the hierarchy of the nervous system. Several areas of the frontal lobe receive afferents from both early and late auditory processing regions within the temporal lobe. Afferents from the early part of the cortical auditory system, the auditory belt cortex, which are presumed to carry information regarding auditory features of sounds, project to only a few prefrontal regions and are most dense in the ventrolateral prefrontal cortex (VLPFC). In contrast, projections from the parabelt and the rostral superior temporal gyrus (STG) most likely convey more complex information and target a larger, widespread region of the prefrontal cortex. Neuronal responses reflect these anatomical projections as some prefrontal neurons exhibit responses to features in acoustic stimuli, while other neurons display task-related responses. For example, recording studies in non-human primates indicate that VLPFC is responsive to complex sounds including vocalizations and that VLPFC neurons in area 12/47 respond to sounds with similar acoustic morphology. In contrast, neuronal responses during auditory working memory involve a wider region of the prefrontal cortex. In humans, the frontal lobe is involved in auditory detection, discrimination, and working memory. Past research suggests that dorsal and ventral subregions of the prefrontal cortex process different types of information with dorsal cortex processing spatial/visual information and ventral cortex processing non-spatial/auditory information. While this is apparent in the non-human primate and in some neuroimaging studies, most research in humans indicates that specific task conditions, stimuli or previous experience may bias the recruitment of specific prefrontal regions, suggesting a more flexible role for the frontal lobe during auditory cognition.

**Keywords: monkey, working memory, acoustic, frontal lobe**

### **INTRODUCTION**

Connections from the auditory cortex to the frontal lobes mediate a number of functions including language, object recognition and spatial localization. Discerning what types of auditory information reach the frontal cortex, where that auditory input originates, and how the information is utilized by the frontal lobes for complex behaviors, such as communication, is a fundamental question of neuroscience.

The frontal cortex is a heterogeneous region with multiple functional subdivisions, including the prefrontal cortex, which lies in the anterior frontal lobe and consists of medial, lateral, and orbital subdivisions. This review will focus on the lateral prefrontal cortex including the dorsolateral regions (DLPFC) (areas 8, 46, and 9) and the ventrolateral regions (VLPFC) (areas 12/47, 45, and 12 orbital) (**Figure 1**). Possible auditory functions and connections of frontal pole, medial and orbital areas of the frontal lobe are described elsewhere including Medalla and Barbas (2014).

The frontal lobe is well-known for its role in speech and language processes and executive functions that include working memory, planning, and decision making (Fuster, 2008). Early lesion studies indicated that lesions of prefrontal cortex caused impairments in delayed response, delayed spatial alternation, and delayed object alternation tasks (Pribram et al., 1952; Mishkin and Pribram, 1955, 1956; Pribram and Mishkin, 1956; Mishkin et al., 1969). Later, more precise lesion studies implicated DLPFC in spatial and delay processes (Malmo, 1942; Mishkin, 1957; Passingham, 1975; Mishkin and Manning, 1978). In contrast, lesions of VLPFC resulted in impaired performance in non-spatial tasks and implicated VLPFC in object recognition (Mishkin and Manning, 1978). In the last two decades there has been a wealth of neuroimaging studies in human subjects and single-unit recording studies in non-human primates, which confirm a role in working memory for the prefrontal cortex (Funahashi et al., 1993; Awh et al., 1996; McCarthy et al., 1996; Miller et al., 1996; Owen et al., 1996; Courtney et al., 1997; D'Esposito et al., 2000; Fuster et al., 2000; Bunge et al., 2003; Postle et al., 2003; Bor et al., 2004; Rowe et al., 2008). Unfortunately, most neurophysiology studies utilize visual working memory paradigms. Therefore, while these studies have shed light on the neuronal mechanisms underlying prefrontal visual information processing and visual memory, much less is known about prefrontal processing of auditory information. Fortunately, the past decade has seen several advances in our understanding of the organization of the primate auditory cortical system and how this system is critical for speech, auditory attention, and multisensory integration. These advances have made it possible and necessary to investigate the pathways that bring auditory information to the prefrontal cortex and the neural mechanisms which underlie auditory cognition.

#### **AUDITORY CONNECTIONS OF THE FRONTAL LOBES**

Historically, anatomical tract tracing and lesion degeneration studies provided evidence that presumptive auditory cortical regions send projections to prefrontal cortex. One general principle observed in these studies of prefrontal-auditory connections is a rostro-caudal topography (Pandya and Kuypers, 1969; Chavis and Pandya, 1976; Petrides and Pandya, 1988; Seltzer and Pandya, 1989; Barbas, 1992; Romanski et al., 1999a,b). Reciprocal connections are apparent between the caudal STG and caudal PFC, including caudal (dorsal) area 46, periarcuate area 8a and the inferior convexity, or ventral prefrontal cortex (areas 12 and 45) (Petrides and Pandya, 1988; Barbas, 1992). In addition, middle and rostral STG are reciprocally connected with rostral 46 and area 10 and orbito-frontal areas 11 and 12 (Pandya and Kuypers, 1969; Pandya et al., 1969; Chavis and Pandya, 1976). Furthermore, studies noted projections from the anterior temporal lobe to orbital and medial prefrontal cortex and the frontal pole (Petrides and Pandya, 1988; Barbas, 1993; Carmichael and Price, 1995; Hackett et al., 1999; Romanski et al., 1999a).

While these studies inform us of the existence of temporal-prefrontal connectivity, they do not indicate which of these connections carry acoustic information. To understand the flow of auditory information to the prefrontal cortex, it is necessary to know what parts of the temporal lobe are, in fact, auditory responsive. Progress in defining the connections and areal organization of the auditory cortex was greatly accelerated by advancements in auditory cortical neurophysiology and neuroanatomy. First, Rauschecker and colleagues delineated the physiological boundaries of auditory cortical core and belt regions (Rauschecker et al., 1995, 1997). These studies provided the first electrophysiological evidence for three separate tonotopic regions in the nonprimary lateral belt cortex (AL, ML, and CL) (antero-lateral belt, middle-lateral belt, caudal-lateral belt cortex respectively) with frequency reversals separating them. Compared with primary auditory cortical neurons, which readily respond to relatively simple acoustic elements, such as pure tones, neurons of the lateral belt association cortex prefer complex stimuli including bandpassed noise and vocalizations (Rauschecker et al., 1995, 1997). Simultaneous advances in anatomical organization confirmed and extended these findings. Several groups showed that primary and non-primary auditory cortex could be distinguished on the basis of differential staining for the calcium binding protein parvalbumin along with cytoarchitectonic changes (Morel et al., 1993; Jones et al., 1995; Kosaki et al., 1997; Hackett et al., 1998).
These combined physiological and anatomical studies made it possible to recognize individual boundaries of the auditory cortical system and showed its organization to consist of a primary, or core, region composed of potentially two areas, AI and R, surrounded by and connected to a medial and lateral belt of secondary auditory association cortex with a lower density of parvalbumin staining (Morel et al., 1993; Jones et al., 1995; Kosaki et al., 1997; Hackett et al., 1998). A third zone lying adjacent to the lateral belt is the parabelt auditory cortex. Further distinctions between the core and belt, and the belt and parabelt have been based on myeloarchitectonic and connectional differences. Recent neurophysiological studies have examined the more complex auditory and multisensory responses of the belt (Ghazanfar et al., 2005; Kusmierek et al., 2012), the rostral superior temporal gyrus (STG) (Kikuchi et al., 2010; Tsunada et al., 2012; Scott et al., 2013, SFN; Perrodin et al., 2014), and the cortex of the superior temporal sulcus (STS) (Ghazanfar et al., 2008; Kikuchi et al., 2010).

Two relevant studies followed on the heels of this revised characterization of auditory core and belt regions and described prefrontal-auditory connections in the context of these defined core, belt and parabelt regions (Hackett et al., 1999; Romanski et al., 1999a). A series of >15 tracer injections into discrete cytoarchitectonic regions of the prefrontal cortex showed that the rostral, orbital and ventrolateral areas of the prefrontal cortex are reciprocally connected with the rostral STG, the rostral belt (areas AL and anterior ML) and the rostral parabelt, whereas caudal principalis and some dorsolateral regions (46, 8, 9) of the prefrontal cortex are reciprocally connected with the caudal belt (caudal ML and CL) and caudal parabelt (Romanski et al., 1999a). Importantly, projections to the PFC from higher order cortical auditory regions such as parabelt and STS were more robust than those from early auditory cortical regions such as the lateral belt, suggesting a cascade of lighter to stronger projections to the prefrontal cortex from early to late auditory processing regions (**Figure 2**; Hackett et al., 1999; Romanski et al., 1999a). Furthermore, the ventrolateral prefrontal cortex (VLPFC) was shown to have a very dense reciprocal connection with the dorsal bank of the STS including areas TPO (temporal parieto-occipital junction), and TAa (temporal area a) (Romanski et al., 1999a; **Figures 2**, **3**).

While these anatomical studies suggest that the PFC receives auditory information, since afferents from the auditory belt and parabelt terminate in PFC, more direct evidence that projections are carrying acoustic information is obtained when anatomical and physiological methods are combined. In one such study, Romanski et al. (1999b) recorded auditory responses from lateral belt auditory areas AL, ML, and CL and placed injections of anatomical tracers into these physiologically defined regions. These connections were topographically organized such that projections from AL typically involved the frontal pole (area 10), the rostral principal sulcus (area 46), the inferior convexity (areas 12/47 and 45) and the lateral orbital cortex (areas 11, 12o). In contrast, projections from area CL targeted the dorsal periarcuate cortex (area 8a, frontal eye fields) and the caudal principal sulcus (area 46), with a small connection to the caudal inferior convexity (areas 12/47 and 45) and, in two cases, premotor cortex (area 6d). These highly specific rostrocaudal topographical frontal-temporal connections suggest the existence of separate streams of auditory information that target previously identified visual domains in the prefrontal cortex. One pathway, originating in CL, targets caudal DLPFC; the other pathway, originating in AL, targets rostral prefrontal cortex and VLPFC. Previous studies have designated these regions in the frontal lobe as being involved in visuo-spatial (DLPFC) and visual object (VLPFC) processing based on physiological responses to visual stimuli (Wilson et al., 1993; O'Scalaidhe et al., 1997). Thus, it is possible the pathways originating from anterior and posterior auditory belt and parabelt cortices are analogous to the "what" and "where" streams of the visual system and that auditory functions in VLPFC and DLPFC could also be object and spatially based, respectively.

Further exploration of prefrontal auditory connections has focused on the VLPFC following the discovery of auditory responsive neurons in VLPFC (Romanski and Goldman-Rakic, 2002). Anatomical connections of VLPFC regions with auditory belt, parabelt and rostral STG have been confirmed in other anatomical studies (Gerbella et al., 2010; Saleem et al., 2014), though clarification on whether area 45 or 12/47 receives greater auditory inputs is still needed (Romanski, 2012). Previous examination of responses in area 45 and the gradation of visual responses from the frontal eye fields located just dorsal to it argue in favor of stronger visual inputs to area 45 (Webster et al., 1994; Bullier et al., 1996; O'Scalaidhe et al., 1997). Previous cytoarchitectonic studies of VLPFC in *M. mulatta* differ from the recent studies by Gerbella et al. (2010) and Saleem et al. (2014). Our organization of VLPFC is based on parcellations mainly by Preuss and Goldman-Rakic (1991) with additional studies by Carmichael and Price (1995), Medalla and Barbas (2014), Price (2008), Barbas (1988), and Saleem et al. (2008). Furthermore, we maintain that characterization of VLPFC must be accomplished with both anatomical and physiological data as stated above. Cytoarchitectonic boundaries vary across the different studies we have referenced. Preuss and Goldman-Rakic (1991) show a much smaller boundary for area 45 while Saleem et al. (2014) show it to be much larger. Gerbella et al. (2010) and Petrides and Pandya (2002) show differences in their parcellation of area 12. These differences confirm that additional studies combining neurophysiology and anatomical methods are needed to understand the organization of the frontal lobe in general, and VLPFC specifically.

One principle that has emerged from anatomical studies is that a cascade of afferents reaches the VLPFC (**Figure 4**). The densest projections to VLPFC originate from the STS and as-yet-uncharacterized regions of the rostral STG, while the parabelt provides a moderate innervation of rostral and ventrolateral regions (area 12/47 and area 12o). In contrast, the anterior and middle auditory belt cortex provides only a modest input to VLPFC (Hackett et al., 1999; Romanski et al., 1999a,b; **Figures 3**, **4**), though their input may arrive earliest due to fewer synaptic junctions. This is in agreement with the notion that our association cortical regions receive highly processed information about a sensory stimulus after it has undergone transformations through earlier sensory cortical regions.

#### **PHYSIOLOGICAL RESPONSES OF NEURONS IN PFC**

Prior to 2000, responses to acoustic stimuli of a non-spatial nature were sporadically noted across a widespread region of the frontal lobe in Old and New World primates (Newman and Lindsley, 1976; Benevento et al., 1977; Wollberg and Sela, 1980; Tanila et al., 1992, 1993; Watanabe, 1992; Bodner et al., 1996). Several of these studies used auditory stimuli in combination with visual stimuli as task elements but did not systematically explore the selectivity of auditory responsive cells (Ito, 1982; Vaadia et al., 1986, 1989; Watanabe, 1992). Despite reports of responses to complex stimuli including clicks, environmental sounds and vocalizations, the prior neurophysiological recordings in the frontal lobe of non-human primates failed to demonstrate a discrete clustering of auditory cells indicative of an auditory responsive domain (Newman and Lindsley, 1976; Tanila et al., 1992, 1993). Building on the connectional studies which predicted an auditory-responsive region in VLPFC (Romanski et al., 1999a,b), neurophysiological studies investigated the responses of lateral PFC neurons. Romanski and Goldman-Rakic (2002) described a discrete auditory responsive region in the macaque prefrontal cortex in which a region of VLPFC had neurons which responded to a variety of complex acoustic stimuli including species-specific vocalizations. The auditory responsive region was small (4 × 4 mm) and was localized to the VLPFC, mostly area 12/47 and potentially area 45 (Romanski and Goldman-Rakic, 2002). Further analysis showed that prefrontal neurons typically responded to stimuli that were acoustically similar (Romanski et al., 2005). Specifically, neurons responded to species-specific vocalizations that had a similar acoustic morphology and not a similar behavioral referent (Romanski et al., 2005). Analysis of the classification of the vocalizations with a hidden Markov model (HMM) showed that the HMM was more effective at discriminating among the call classes than previous methods, reaching a classification performance of almost 75% correct. Furthermore, the complex responses of prefrontal neurons to these sounds could be predicted as linear functions of the probabilistic output of the HMM (Averbeck and Romanski, 2006).

Temporal Sulcus; STG, Superior Temporal Gyrus; AL, Antero-lateral; ML, Middle-lateral; CL, Caudal-lateral; TPO, temporal parieto-occipital area; TAa, Temporal area.
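The idea that a neuron's response can be expressed as a linear function of the HMM's probabilistic output can be illustrated with a minimal sketch. This is not the analysis of Averbeck and Romanski (2006); the class probabilities, firing rates, and function names below are hypothetical, and an ordinary least-squares fit is assumed purely for illustration. Because class probabilities sum to one, no separate intercept is fitted, so each weight can be read as the firing rate expected when the model is certain of that call class.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_response(class_probs, rates):
    """Least-squares weights w such that rate ~ sum_k w[k] * p[k]."""
    k = len(class_probs[0])
    xtx = [[sum(p[i] * p[j] for p in class_probs) for j in range(k)]
           for i in range(k)]
    xty = [sum(p[i] * r for p, r in zip(class_probs, rates))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_rate(weights, probs):
    """Predicted firing rate for one stimulus given its class probabilities."""
    return sum(w * p for w, p in zip(weights, probs))


# Hypothetical data: per-stimulus class probabilities for three call classes
# and the firing rate (imp/s) of one neuron for each stimulus.
example_probs = [
    [0.8, 0.1, 0.1],
    [0.1, 0.7, 0.2],
    [0.2, 0.2, 0.6],
    [0.4, 0.4, 0.2],
]
example_rates = [17.7, 7.9, 12.2, 12.4]

weights = fit_linear_response(example_probs, example_rates)
```

In this toy example the rates were generated from fixed weights, so the fit recovers them; with real recordings the quality of the linear prediction would have to be assessed on held-out stimuli.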

The auditory responsive region in VLPFC lies adjacent to a region where visually responsive neurons, face cells and face-responsive patches have been localized (O'Scalaidhe et al., 1997, 1999; Tsao et al., 2008). Thus, the idea that VLPFC neurons might be responsive to both vocalizations and faces is hardly surprising. VLPFC, as mentioned previously, receives afferents from both auditory and visual portions of the temporal lobe as well as a robust innervation from the multisensory area TPO in the dorsal bank of the STS (Barbas, 1988; Romanski et al., 1999a,b). A study by Benevento et al. (1977) found neurons in VLPFC (area 12o) that were responsive to simple auditory and visual stimuli (clicks and light flashes), and, as demonstrated with intracellular recordings, at least some of these interactions were due to convergence on single cortical cells. Using species-specific vocalizations and their accompanying facial gestures, Romanski and colleagues demonstrated multisensory responses to simultaneously presented faces and vocalizations in VLPFC neurons (Sugihara et al., 2006). Sugihara et al. (2006) further characterized multisensory responses as enhanced or suppressed. Multisensory neurons accounted for about half the recorded population with ∼4% unimodal auditory responses and ∼50% unimodal visual responses, suggesting that a large proportion of VLPFC neurons are likely to be multisensory if tested properly. Since the region of VLPFC where multisensory neurons are located overlaps extensively with the location of previously characterized auditory responses, it is probable that previous studies which examined either unimodal auditory or unimodal visual functions included multisensory cells in their populations.

The anatomical studies described above have shown that the auditory responsive region in VLPFC receives very dense innervation from the multisensory zones of areas TPO and TAa on the dorsal bank of the STS (Romanski et al., 1999a; **Figures 2**, **4**), with moderate projections from the rostral STG and parabelt and lighter inputs from the anterior and middle belt (AL and ML) to VLPFC. Thus, VLPFC neurons may receive acoustic afferents from early (belt) or late (TPO/rostral STG) regions of the auditory cortical hierarchy. It is possible that the specific pattern of afferent input may dictate the types of neurophysiological responses found in VLPFC. The fact that neurons in VLPFC exhibit a wide range of response latencies to auditory stimuli (30–330 ms) also supports this concept of heterogeneous afferents (Romanski and Hwang, 2012). For example, a small number of auditory responsive neurons have extremely fast latency responses, with narrow selectivity and phasic onsets to acoustic stimuli; these cells could be receiving inputs from early auditory cortical areas (Romanski et al., 2005; Romanski and Hwang, 2012). It is possible that these feature-sensitive, rapid onset responses could arise from early auditory cortex such as the anterior belt region AL, which is known to project sparsely to this region and whose input would arrive first. In contrast, neurons which respond to combinations of complex acoustic features, or more generally to task variables, may be more likely to receive afferents from parabelt and rostral STG, which would be several synapses away from VLPFC and would presumably take longer and provide more highly processed information about an auditory object. Finally, multisensory responses in VLPFC could arise as a *de novo* integration of inputs from auditory belt, parabelt or rostral STG and extrastriate visual cortical areas such as TE. Alternatively, multisensory VLPFC responses could originate from multisensory cells of TPO or TAa on the dorsal bank of the STS, which send dense projections to VLPFC. Multisensory responses in VLPFC have longer latencies than unimodal auditory response latencies measured in the same cells (multisensory response range 50–490 ms; Romanski and Hwang, 2012).

#### **LOCALIZATION OF AUDITORY FUNCTION IN DLPFC AND VLPFC: ANIMAL STUDIES**

As reviewed above, the frontal cortex receives afferents from early and late auditory cortical processing stations, allowing frontal lobe neurons to detect and discriminate auditory stimuli (Ito, 1982; Watanabe, 1992; Romanski and Goldman-Rakic, 2002; Poremba et al., 2004), or to retain them during auditory working memory processes (Plakke et al., 2013a). Divergent processing pathways conforming to ventral and dorsal "what" and "where" streams, respectively, originate in the belt and parabelt auditory cortex and terminate in VLPFC and DLPFC regions as described above. DLPFC receives information from caudal auditory regions, which have been shown to preferentially process auditory location information, and VLPFC receives input from rostral auditory regions that show a greater preference for type of stimuli (Romanski et al., 1999b; Rauschecker and Tian, 2000; Tian et al., 2001; Kuśmierek et al., 2012). Based on these anatomical connections it has been proposed that DLPFC is primarily involved in spatial processing while VLPFC may be preferentially involved in object processing.

This traditional division of labor between dorsal and ventral prefrontal regions is supported by some neurophysiology studies. Early studies demonstrated that DLPFC neurons were preferentially responsive when acoustic stimuli were presented from specific directions (Azuma and Suzuki, 1984) or when animal subjects localized auditory or visual stimuli (Vaadia et al., 1989). In later studies which focused on working memory processes, neurons in DLPFC were active during the mnemonic processing of auditory and visual location (Kikuchi-Yorioka and Sawaguchi, 2000; Artchakov et al., 2007). In both studies, a portion of DLPFC neurons were spatially selective during the delay for both auditory and visual cues.

However, other neurophysiological studies demonstrated that DLPFC neurons were active during non-spatial tasks. Studies by Watanabe (1992) showed that prefrontal neurons responded when tones were predictive of juice reward and Bodner et al. (1996) described auditory working memory cells in DLPFC during a task where a tone was paired with a color to predict reward. More recently, recordings during a non-spatial auditory delayed match-to-sample task demonstrated task related activity in neurons in both dorsal and ventral PFC (Plakke et al., 2013a). Prefrontal neurons responded to sound cues during both the sample and match/nonmatch presentations, and also during the delay, response, and reward periods of the task (Plakke et al., 2013a; **Figure 5**). During this task, cells in this region appeared to track when a relevant stimulus needed to be remembered or responded to. The general task responses of these neurons suggest that the role of the DLPFC in auditory working memory may be for rule representation or response control, as previously suggested in studies of visual working memory (Fuster et al., 1982; Miller et al., 1996; Iba and Sawaguchi, 2002; Warden and Miller, 2007). Together these studies suggest that the role of DLPFC in auditory memory may relate more to task and cognitive requirements than to acoustic stimulus encoding.

**FIGURE 5 | Example cells with activity occurring during the presentation of the auditory sample, match/nonmatch and during the decision period of an auditory delayed match-to-sample task. (A)** An example cell with increased activity during the auditory cues, wait time and response periods for correct trials. **(B)** An example cell with increased firing rate during auditory cue and wait time periods for correct trials. Y-axis label is frequency (imp/s); bin = 100 ms; asterisk signifies significant change in firing rate from baseline.
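The firing-rate histograms referred to in Figure 5 (frequency in imp/s, 100 ms bins) can be understood through a minimal sketch of how a peri-stimulus time histogram is typically computed. The function name, the spike times, and the trial structure below are hypothetical and are not taken from Plakke et al. (2013a); spike times are assumed to be given in seconds relative to stimulus onset.

```python
def peri_stimulus_histogram(spike_times_per_trial, t_start, t_stop,
                            bin_width=0.1):
    """Trial-averaged firing rate (imp/s) in consecutive time bins.

    spike_times_per_trial: list of trials, each a list of spike times (s)
    t_start, t_stop: analysis window in seconds relative to stimulus onset
    bin_width: bin size in seconds (0.1 s, matching the 100 ms bins above)
    """
    n_bins = int(round((t_stop - t_start) / bin_width))
    counts = [0] * n_bins
    for trial in spike_times_per_trial:
        for t in trial:
            if t_start <= t < t_stop:
                index = int((t - t_start) / bin_width)
                counts[min(index, n_bins - 1)] += 1
    n_trials = len(spike_times_per_trial)
    # Dividing by trials and bin width converts counts to impulses per second.
    return [c / (n_trials * bin_width) for c in counts]


# Hypothetical example: two trials, a 200 ms window, two 100 ms bins.
example_trials = [[0.05, 0.12, 0.15], [0.02, 0.18]]
example_rates = peri_stimulus_histogram(example_trials, 0.0, 0.2)
```

A significant change from baseline, as marked by the asterisks in the figure, would additionally require a statistical comparison against a pre-stimulus window, which this sketch does not attempt.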

In contrast to the task related processes in DLPFC, neurophysiology in non-human primates suggests that VLPFC may perform both stimulus and task related processes. As described above, VLPFC contains neurons that are responsive to complex sounds including species-specific vocalizations and human vocalizations (Romanski and Goldman-Rakic, 2002; Romanski et al., 2005), suggesting a role for VLPFC in auditory object processing. VLPFC involvement in auditory feature processing is supported by studies showing single-units that encode categories of vocalization call types (Averbeck and Romanski, 2004, 2006; Plakke et al., 2013b). Moreover, evidence that VLPFC cells are multisensory and respond to the simultaneous presentation of faces and their corresponding vocalizations strongly suggests a role in recognition and identity processing, a ventral stream function (Sugihara et al., 2006).

Several studies from Cohen and colleagues have examined neuronal responses in VLPFC during non-spatial auditory performance tasks (Cohen et al., 2006, 2007; Russ et al., 2008a; Tsunada et al., 2011). For example, VLPFC neurons were modulated during non-spatial auditory discrimination but showed no modulation during spatial auditory discrimination (Cohen et al., 2009). Further recordings over a large region of PFC, which Cohen termed "vPFC," during categorization and decision making paradigms demonstrate that prefrontal neuronal activity is correlated with behavioral choices (Russ et al., 2008b; Lee et al., 2009), although the location of these prefrontal neurons does not appear to overlap entirely with the ventrolateral PFC regions previously shown to be auditory responsive (Romanski et al., 2005). Nonetheless, inactivation studies are needed to determine whether VLPFC is essential in the performance of working memory or decision making tasks. Toward this end, a recent study by Plakke et al. (2013c, SFN) shows that transient inactivation of VLPFC impairs performance in an audiovisual working memory task and suggests an essential role in mnemonic processing when acoustic stimuli are involved. Thus, processing of auditory information in DLPFC may relate more to the task demands, while processing of auditory information in VLPFC is clearly related to auditory features and task demands.

#### **PROCESSING OF AUDITORY INFORMATION IN THE HUMAN DORSAL AND VENTRAL PFC**

The anatomical and neurophysiological studies performed in nonhuman primates delineate somewhat separable roles for dorsal and ventral frontal lobe regions. How these functional streams in nonhuman primates map onto auditory function in the human brain is, as yet, not completely clear. Although it is well known that speech and language functions rely on the cortex within the inferior frontal gyrus (IFG), neuroimaging studies have provided evidence that the human frontal lobe is also active during auditory discrimination (Zatorre et al., 1994), auditory detection (Benedict et al., 1998, 2002), auditory attention/oddball tasks (Stevens et al., 2000), auditory judgments (Zatorre et al., 1998), and auditory working memory (Anurova et al., 2003, 2001; Grady et al., 2008; Protzner and McIntosh, 2009). These studies have described discrete activations in DLPFC and VLPFC that are related to the type of information processed. For example, several imaging studies have described activations in DLPFC (superior frontal gyrus, superior frontal sulcus) during auditory spatial localization (Griffiths et al., 1998; Martinkauppi et al., 2000; Weeks et al., 2000; Lipschutz et al., 2002; Lutzenberger et al., 2002; Zatorre et al., 2002; Gaab et al., 2003; Leiberg et al., 2006). Conversely, VLPFC activation (IFG; BA 45, 47) has been noted during auditory non-spatial processes, such as listening to melodies, attending to pitch/rhythm, determining sound length, word/voice discrimination and auditory working memory (Zatorre et al., 1994, 1998; Platel et al., 1997; Linden et al., 1999; Pedersen et al., 2000; Alain et al., 2001; Kiehl et al., 2001; Muller et al., 2001; Kaiser et al., 2003; Maddock et al., 2003; Arnott et al., 2004; Rämä et al., 2004; Rämä and Courtney, 2005; Kaiser et al., 2009; Koelsch et al., 2009).

In addition, activation of DLPFC (Brodmann's area 46/9) occurs during various complex working memory paradigms. For instance, there were increases in activity in DLPFC when participants listened to numbers and made self-ordered choices (Petrides et al., 1993). Dorsolateral activity is also increased during studies of divided auditory attention (Benedict et al., 1998) as well as encoding of nonverbal sounds (Opitz et al., 2000). Taken together, these studies suggest DLPFC may be recruited more frequently based on cognitive demands, including the type of process that is necessary, such as monitoring information in memory, encoding auditory information, as well as manipulation of spatial information.

In contrast, the IFG and related VLPFC regions are activated during phonological processing (Klein et al., 1995; Buchanan et al., 2000; Strand et al., 2008), semantic processing (Caplan et al., 2000; Burton et al., 2003), syntactic operations (Waters et al., 2003), naming objects (Tranel et al., 2003), word discrimination (Buchanan et al., 2000; Vaden et al., 2013), and directed auditory attention (Hill and Miller, 2010), reinforcing the connection of this region with language and auditory feature processing. Interestingly, there has also been activation within the IFG during nonverbal auditory stimulus detection (Linden et al., 1999; Kiehl et al., 2001; Maeder et al., 2001), nonverbal auditory discrimination (Zatorre et al., 1994; Muller et al., 2001), and auditory working memory (Kaiser et al., 2003). The activation of the more anterior regions of the IFG (areas 47 and 45) during nonverbal auditory sound detection, discrimination and auditory feature detection (Zatorre et al., 2004; Fecteau et al., 2005) suggests these areas may play a more fundamental role in auditory processing, paralleling the auditory responsive region that has been described in non-human primates (Romanski and Goldman-Rakic, 2002; Romanski et al., 2005). The role of VLPFC in general sound discrimination is also supported by its activation when listening to rhymes (Burton et al., 2003) and by the case of a patient with an inferior frontal lesion who was impaired at detecting modulated sounds (Griffiths et al., 2000).

#### **VERBAL vs. NON-VERBAL STIMULI AND COGNITIVE REQUIREMENTS**

Localization of auditory cognition to discrete networks in the human brain is complicated by the potential activation of language networks when verbal stimuli are used as memoranda in cognitive tasks. Comparing studies when verbal and nonverbal stimuli have been used has revealed activation in both DLPFC and VLPFC including the middle frontal gyrus and the anterior and posterior portions of the IFG. As might be predicted, VLPFC is active for language related functions but VLPFC activation also occurs for simple nonverbal auditory target detection/discrimination with tones (Linden et al., 1999; Kiehl et al., 2001; Muller and Basho, 2004; Huang et al., 2012), animal cries (Maeder et al., 2001), and melodies (Zatorre et al., 1994). Conversely, verbal discrimination has activated DLPFC (middle frontal gyrus) (Pedersen et al., 2000). This suggests that the prefrontal cortex is not simply dividing the processing of auditory information based solely on verbal information (**Figure 6**).

In order to examine auditory function independent of language circuits, noise bursts were used for both a spatial (localization) and non-spatial (pitch discrimination) auditory task (Alain et al., 2001). As predicted by the dorsal/ventral streams model, pitch processing evoked greater activation in the IFG while localization evoked greater activity in the superior frontal gyrus (Alain et al., 2001). The use of identical auditory stimuli under different demands, which led to diverse activation patterns, indicates cognitive load can recruit specialized areas within the frontal cortex (Alain et al., 2001). A similar pattern of results emerged in Du et al. (2013). In this study, subjects were trained to discriminate simultaneously presented vowel sounds. Vowels were presented with different frequencies or from different locations; this information was irrelevant for correct performance, but served as implicit information. After training, participants were exposed to both spatial and pitch differences while making vowel judgments and improved accuracy of vowel discrimination was observed when the pair of vowels presented matched their previous training (frequency or location). In addition, magnetoencephalography (MEG) activity was localized to the anterior ventral frontal regions for the group exposed to frequency changes, while MEG changes were more frequent in dorsal frontal regions for the group exposed to location changes (Du et al., 2013). Thus, even when participants did not make any explicit frequency or location choices the short term exposure to implicit spatial and object information segregated the dorsal and ventral prefrontal cortex respectively. This demonstrates that the activation of a particular neural network can be biased based on subtle cognitive demands.

In general, a division of labor for spatial and non-spatial information may exist (Ahveninen et al., 2013), and may be most prominent in non-human primates, which do not possess language functions. However, it is the underlying cognitive contingencies of a task that may ultimately recruit specific regions of frontal cortex. For example, pitch discrimination/detection and auditory attention have been found to activate both DLPFC (Griffiths et al., 1998; Linden et al., 1999; Muller et al., 2001; Gaab et al., 2003; Seydell-Greenwald et al., 2013) and VLPFC (Linden et al., 1999; Alain et al., 2001; Gaab et al., 2003; Seydell-Greenwald et al., 2013). Moreover, attention may bias which auditory network is recruited. Lipschutz et al. (2002) demonstrated that during dichotic listening when attention was divided, both the lateral middle frontal gyrus and the IFG were activated, although the middle frontal gyrus was active when participants were told to selectively attend. Therefore, examining only whether a task has a spatial component is insufficient to determine which prefrontal regions will be recruited. Performing diverse types of cognitive processes such as making a pitch discrimination or dividing auditory attention may rely on different or overlapping auditory networks.

It has been questioned whether frontal lobe activity is related to cognitive demands or the stimulus properties within the task. Surprisingly, when verbal working memory is required, more dorsal regions (area 46/9) are recruited (Petrides et al., 1993; Petrides, 1996; Crottaz-Herbette et al., 2004), whereas ventral regions (BA 47/12; 45) are utilized during active retrieval (Petrides, 1996; Kostopoulos and Petrides, 2008). In addition, areas of activation in frontal cortex can be shared by different auditory working memory demands (Arnott et al., 2005). It has also been suggested that within the auditory domain, DLPFC is more important for heavier memory loads, while VLPFC is necessary for dealing with attentional interference (Huang et al., 2013). Postle (2006) has reviewed the role of the prefrontal cortex with respect to the encoding, segregation, and manipulation of information for visual working memory. Similar treatment needs to be given to the processing of auditory information and how dorsal and ventral prefrontal areas contribute to its encoding, manipulation, and short-term storage.

## **SUMMARY**

The prefrontal cortex is involved in auditory cognition and receives information from a wide array of auditory regions including multisensory (STS) and unimodal auditory cortical regions. Understanding how that information is processed by the PFC and utilized during auditory cognition is an ongoing investigation. In the non-human primate, single-unit studies have indicated VLPFC has a specialized region for processing auditory stimuli but is also multisensory (Sugihara et al., 2006; Romanski, 2012) and involved in some aspects of higher auditory function (Cohen et al., 2007; Bizley and Cohen, 2013). In contrast, the DLPFC may have auditory responsive units but activity has mainly been observed during tasks requiring cognitive processes or localization of sound (Bodner et al., 1996; Kikuchi-Yorioka and Sawaguchi, 2000; Artchakov et al., 2007; Plakke et al., 2013a). Research in non-human primates suggests a functional division between DLPFC and VLPFC, with DLPFC utilized for spatial and auditory task requirements, while VLPFC is recruited for non-spatial and auditory feature processing as well as some cognitive operations. In humans, more cortical regions and cognitive ability complicate the picture. The spatial and non-spatial divide is somewhat supported; but new research suggests a more nuanced view is necessary and that different neural areas are recruited under various stimulus and cognitive demands. These recent neuroimaging studies provide support for a role of the prefrontal cortex in complex auditory cognition and demonstrate that attentional demands can shift which prefrontal network is activated. Overall, research from both humans and non-human primates suggests that the frontal cortex is essential in auditory cognition. Determining which specific cortical networks and prefrontal regions are critical in various aspects of auditory cognition is necessary for comprehending and treating communication disorders.

### **ACKNOWLEDGMENTS**

This work was supported by a National Institutes of Health grant (DC: 04845) and the Training and Hearing Balance and Spatial Orientation grant (DC: 009974). The authors thank Lia Soneson for graphic design assistance on **Figures 3**, **4**.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnins.2014.00199/abstract

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 March 2014; accepted: 26 June 2014; published online: 23 July 2014. Citation: Plakke B and Romanski LM (2014) Auditory connections and functions of prefrontal cortex. Front. Neurosci. 8:199. doi: 10.3389/fnins.2014.00199 This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2014 Plakke and Romanski. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Hearing in action; auditory properties of neurons in the red nucleus of alert primates

## *Jonathan M. Lovell<sup>1,2</sup>\*, Judith Mylius<sup>1</sup>, Henning Scheich<sup>1</sup> and Michael Brosch<sup>1</sup>*

*<sup>1</sup> Special Lab for Primate Neurobiology, Leibniz Institute for Neurobiology, Magdeburg, Germany*

*<sup>2</sup> Deutsches Zentrum für Neurodegenerative Erkrankungen, Magdeburg, Germany*

#### *Edited by:*

*Monica Munoz-Lopez, University of Castilla-La Mancha, Spain*

#### *Reviewed by:*

*Simon Baumann, Newcastle University, UK Manuel S. Malmierca, University of Salamanca, Spain*

#### *\*Correspondence:*

*Jonathan M. Lovell, Special Lab for Primate Neurobiology, Leibniz Institute for Neurobiology, Brenneckestrasse 6, 39118 Magdeburg, Germany e-mail: jlovell@ifn-magdeburg.de*

The responses of neurons in the Red Nucleus pars magnocellularis (RNm) to both tone bursts and electrical stimulation were observed in three cynomolgus monkeys (*Macaca fascicularis*), in a series of studies primarily designed to characterize the influence of the dopaminergic ventral midbrain on auditory processing. Compared to its role in motor behavior, little is known about the sensory response properties of neurons in the red nucleus (RN), particularly those concerning the auditory modality. Sites in the RN were recognized by observing electrically evoked body movements characteristic for this deep brain structure. In this study we applied brief monopolar electrical stimulation to 118 deep brain sites at a maximum intensity of 200 µA, thus evoking minimal body movements. Auditory sensitivity of RN neurons was analyzed more thoroughly at 15 sites, with the majority exhibiting broad tuning curves and phase locking up to 1.03 kHz. Since the RN appears to receive inputs from a very early stage of the ascending auditory system, our results suggest that sounds can modify the motor control exerted by this brain nucleus. At selected locations, we also tested for the presence of functional connections between the RN and the auditory cortex by inserting additional microelectrodes into the auditory cortex and investigating how action potentials and local field potentials (LFPs) were affected by electrical stimulation of the RN.

**Keywords: red nucleus, auditory, primate, electrophysiology, neuron**

## **INTRODUCTION**

The Red Nucleus (RN) or *nucleus ruber* lies in the rostral midbrain and derives its name from the high concentration of iron-containing pigments within its cellular structure (Hernandez, 1931). The RN is comprised of two subnuclei, the rostral pars parvocellularis (Oka and Jinnai, 1978; Onodera and Hicks, 2009), and the smaller caudal pars magnocellularis (Yamaguchi and Goto, 2006). Gibson et al. (1985) found that in rhesus monkeys, the magnocellular RN controls the onset, velocity, and duration of specific upper limb movements, with electrical stimulation causing discrete contractions of limb muscles in the shoulder, elbow, wrist, and digits.

Compared to its role in motor behavior, little is known about sensory response properties of neurons in the RN, particularly those concerning the auditory modality. In chloralose-anesthetized cats, neurons in the rostral RN have been found to exhibit short-latency responses to clicks and tones (Massion and Albe-Fessard, 1963; Irvine, 1980). Tuning curves are generally very broad, and neurons receive inputs from both ears and are sensitive to interaural time and level differences (Shinkarenko, 1984; Shinkarenko et al., 1985). Bratus et al. (1981) recorded evoked potentials from the magnocellular RN in anesthetized cats, and showed that the response latency varies from 3.5 to 10.5 ms, depending on the intensity of the auditory stimulation. It has been hypothesized that the RN is a component of the subcortical path for reflexes connected with turning the ear in the direction of sound (Courville, 1968), and that it is a part of the complex pathways concerned with the animal's postural and defensive responses to acoustic stimulation (Martin and Dom, 1970).

Here we report auditory response properties of the RN in alert macaque monkeys. These data were obtained in a series of studies that were primarily designed to characterize effects of the dopaminergic ventral midbrain on auditory processing. The auditory responses in the RN attracted our attention because of their potential role in audiomotor interactions, i.e., its possible involvement in guiding movement-related reactions to sounds, and its involvement in monitoring the sounds that are evoked by movement.

In these experiments, we moved microelectrodes along tracks oriented approximately in a mediolateral direction, through different deep brain structures toward RN. Sites in the RN were recognized by observing electrically evoked body movements characteristic for this deep brain structure. At the same time, the electrodes served to record action potentials and local field potentials (LFPs) from the RN, and to characterize neuronal responses to clicks and pure tones. At selected locations, we also tested for the presence of functional connections between the RN and the auditory cortex by inserting additional microelectrodes into the auditory cortex and analysing how action potentials and LFPs were affected by electrical stimulation of the RN.

## **MATERIALS AND METHODS**

The subjects used in this experiment were three adult cynomolgus monkeys (*Macaca fascicularis*): monkey E (male, 5.5 kg), monkey W (male, 6.2 kg), and monkey M (female, 4 kg). Each monkey was fitted with a head holder and a recording chamber (18 mm diameter), which was positioned in the right (monkeys E and M) or the left (monkey W) temporal region of the skull. Details of the surgery are given elsewhere (Brosch and Scheich, 2008). Experiments were conducted in a sound-attenuated double-walled room. During the experiments the monkeys were seated in a primate chair and their heads were fixed to allow for acute recordings. All experiments were carried out with the approval of the animal care and ethics committee of the State Sachsen-Anhalt (No. 28-14 42502/2-806 IfN) and in accordance with the guidelines for animal experimentation of the European Communities Council Directive (86/609/EEC).

Micro-fiber electrodes were used for both electrical stimulation and recording neuronal activity. The electrodes were constructed from a single fiber with a 25 µm tungsten and iridium core, insulated with quartz glass to a total diameter of 80 µm, with tips sharpened to a point using an ultrafine diamond grinder. The impedance of each electrode was tested and found to range from 0.5 MΩ for recording electrodes to less than 25 kΩ for low-impedance stimulation electrodes. The electrodes were then fashioned to fit in the microdrive and were able to extend over 40 mm through the macaque brain. Electrodes were advanced remotely into the brain from outside of the sound-attenuating room using a seven-channel microdrive (System Eckhorn, Thomas Recording).

Through these electrodes, we simultaneously recorded multiunit activity using a band-pass filter set between 1 and 7 kHz, and LFPs with a band-pass of 0.1 to 250 Hz, using the systems SUA-02 and LFP-03 (Thomas Recording). All signals were fed into an AD data-acquisition system (32-channel Alpha-Map, Alpha-Omega). The sampling rate was 658 Hz for LFPs and 50 kHz for multiunit activity. We stored time stamps of multiunit activity only when spike amplitudes exceeded noise thresholds.

Initially, MRI images were used to plan electrode trajectories toward the RN. As can be seen in **Figure 1A**, the iron-rich pigmentation within the RN makes this structure highly visible when imaged using MRI. The temporal location of the recording chambers restricted the electrode trajectories to those shown in **Figure 1B**. Thus, electrodes were inserted into the brain at an angle of 18 (±6) degrees from the horizontal plane. Once the electrodes had been advanced to around 19 mm from the surface of the dura membrane, we commenced presentation of electrical and auditory stimulation to the monkey. Finer localization of the two red nuclei was achieved by observing upper body movements (e.g., arms, shoulders, face, mouth, and eyelids) in response to the electrical stimulation (note that the monkeys were head fixed and seated in a primate chair, thus limiting the full expression of certain electrically evoked movements). No histological verification of stimulation sites was undertaken because more than one hundred stimulation sites were tested in each monkey and because all monkeys in the current study are still involved in other research projects, following the principles of Replacement, Reduction, and Refinement in animal experimentation.

Multiunit activity and LFPs were recorded from the auditory cortex using a 16-channel microdrive (System Eckhorn, Thomas Recording). The microdrive was oriented slightly off the dorsoventral plane (10°). Electrodes entered the cortex between stereotactic coordinates A5 to A9 and D7.5 to D12.5 and subsequently were moved through the parietal cortex into the core fields of the auditory cortex.

[Figure caption fragment: … responses to electrical stimulation were observed. **(B)** Frequency of sites where electrical stimulation evoked upper body movement for all electrode tracks (diamonds, monkey E; triangles, monkey W) and frequency of … evoked by the electrical stimulation of the left and right RN (monkeys E and W combined). A, arm; S, shoulder; F, face; M, mouth; E, eye (left nucleus: circles; right nucleus: triangles).]

Monopolar electrical stimulation was delivered using a Multichannel Systems STG 400-4 stimulator, programmed to generate 50 ms trains of square-edged biphasic pulses. These were presented at a frequency of 100 Hz, with a pulse width of 700 µs and a maximum intensity of 200 µA, to evoke minimal body movements only. We presented acoustic stimulation in the form of clicks and tone bursts. Clicks were produced by a waveform generator (WG1, Tucker-Davis Technologies) and presented every 2 s during each of the electrode tracks. Tone bursts were generated using a computer coupled to an array processor (Tucker-Davis Technologies, Gainesville, FL), which converted the signal from digital to analog at a sample rate of 100 kHz. Tone frequencies (40 in total) ranged from 0.11 to 27.2 kHz, covering a range of 8 octaves in steps of 0.2 octave, and each frequency was repeated 10 times in pseudorandom order. The tone burst envelope was 100 ms in duration, ramped by 5 ms, with an inter-tone interval of 900 ms. The auditory stimulus gain was increased using an amplifier (model A202; Pioneer, Long Beach, CA), and the signal was then fed to two bilaterally placed free-field loudspeakers (Canton Karat 720.2), positioned 1.1 m from the subject, at intensities not exceeding 65 dB (re. 20 µPa). The sound pressure level was measured with a free-field 1/2-inch microphone (40AC, G.R.A.S.), located near the monkey's head.
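The tone set described above can be reproduced numerically. The following is a minimal sketch, assuming that the 40 frequencies were spaced evenly on a logarithmic (octave) axis with both end points included; the function name `log_spaced_frequencies` is illustrative and does not come from the original study.

```python
import math

F_LOW_KHZ = 0.11    # lowest tone frequency stated in the text
F_HIGH_KHZ = 27.2   # highest tone frequency stated in the text
N_FREQS = 40        # number of tone frequencies stated in the text
N_REPEATS = 10      # repetitions per frequency stated in the text


def log_spaced_frequencies(f_low, f_high, n):
    """Return n frequencies spaced equally on a log2 (octave) axis,
    together with the step size in octaves (assumed spacing rule)."""
    step_oct = math.log2(f_high / f_low) / (n - 1)
    freqs = [f_low * 2 ** (k * step_oct) for k in range(n)]
    return freqs, step_oct


freqs_khz, step_oct = log_spaced_frequencies(F_LOW_KHZ, F_HIGH_KHZ, N_FREQS)
span_oct = math.log2(F_HIGH_KHZ / F_LOW_KHZ)
n_tone_bursts = N_FREQS * N_REPEATS

print(f"span: {span_oct:.2f} octaves, step: {step_oct:.3f} octave")
print(f"tone bursts per site: {n_tone_bursts}")
```

Under this assumption the span and step come out close to the quoted 8 octaves and 0.2 octave, and the total number of tone bursts per site is 40 × 10 = 400.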

Custom-written MATLAB (version 2007b, MathWorks, Natick, MA, USA) programs were used for the off-line analyses of multiunit and LFP recordings. In the RN, tuning curves were generated from each multiunit recording of responses to the tone bursts, which were presented at 40 different frequencies, as described elsewhere (Brosch et al., 1999). From these tuning curves, we obtained the best frequency, the bandwidth, the first-spike latency, and the last-spike latency. In the auditory cortex, we computed a post-stimulus time histogram (PSTH) with a bin size of 10 ms for each of the multiunit recordings, relative to the onset of electrical stimulation in the RN (≥50 trials). Because the electrical stimulation generated an artifact that lasted maximally 20 ms, this period was excluded from the analysis. A multiunit recording was considered to have responded to the electrical stimulation if the number of discharges within at least two of 30 consecutive 10-ms post-stimulus bins was significantly above the number of discharges immediately before stimulus presentation (Wilcoxon signed rank tests, two-sided). For each of the LFP recordings, evoked potentials were calculated by averaging the LFP relative to the onset of auditory or electrical stimulation. In order to analyse auditory-evoked potentials, the lowest value of the first main trough of the waveform was used to indicate the response latency and frequency bandwidth.
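The PSTH step described above can be sketched as follows. This is an illustrative Python version rather than the authors' MATLAB code; the input format (one list of spike times in ms per trial) and the function names are assumptions, and the Wilcoxon signed rank comparison against pre-stimulus activity is deliberately left out.

```python
def psth_counts(trials_ms, bin_ms=10.0, n_bins=30, artifact_ms=20.0):
    """Per-trial spike counts in consecutive post-stimulus bins.

    trials_ms: list of trials, each a list of spike times in ms relative
               to stimulation onset (hypothetical input format).
    Returns (counts, valid), where counts[t][b] is the spike count of
    trial t in bin b, and valid[b] is False for bins that overlap the
    stimulation artifact window [0, artifact_ms).
    """
    counts = []
    for spikes in trials_ms:
        row = [0] * n_bins
        for t in spikes:
            b = int(t // bin_ms)
            if t >= 0 and b < n_bins:
                row[b] += 1
        counts.append(row)
    valid = [b * bin_ms >= artifact_ms for b in range(n_bins)]
    return counts, valid


def mean_rate_hz(counts, valid, bin_ms=10.0):
    """Trial-averaged firing rate (spikes/s) per bin; None for masked bins."""
    n_trials = len(counts)
    rates = []
    for b, ok in enumerate(valid):
        if not ok:
            rates.append(None)
        else:
            total = sum(row[b] for row in counts)
            rates.append(total / n_trials / (bin_ms / 1000.0))
    return rates
```

The per-trial counts returned by `psth_counts` are the kind of paired data (one value per trial and bin) that a signed rank test against a pre-stimulus baseline would operate on.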

## **RESULTS**

In this study we applied brief electrical stimulation to 118 deep brain sites in two monkeys (monkey E and monkey W). **Figure 2A** shows the results of a representative track through the red nuclei, following the center dashed line from **Figure 1B**. At 86 of these sites (76 in monkey E and 10 in monkey W) we could evoke upper body movements that are characteristic for the RN. **Figures 2B,C** show the frequency and probability of evoked upper body movements and no responses to the electrical stimulation for all tracks in monkeys E and W. At the remaining sites, no such movements were observed. Upper body and face movements were exclusively observed in two clusters between 22 and 28 mm and between 30 and 35 mm from the dura surface, which is in good correspondence to the location of the two red nuclei in both hemispheres and relative to published brain atlases (see **Figure 1B**).

When we recorded action potentials from these 118 stimulation sites, we found that 18 sites (14 in monkey E and 4 in monkey W) responded to the click stimulation, which was presented every 2 s to the monkeys. With one exception, the auditory responses were found exclusively at the 86 sites where electrical stimulation evoked upper body movements. Thus, auditory responses were found at 19.7% of the sites in the RN. We also tested the click stimulation at another 335 sites (312 in monkey E and 23 in monkey W) at which electrical stimulation was not applied directly, in order to reduce the number of electrical stimulations delivered to the monkeys, but which fell within 200 µm of a site from which upper body and face movements were elicited. Most of these sites are presumed to be located in the RN; 67 of them (19.5%; 55 in monkey E and 12 in monkey W) responded to the click stimulation. **Figure 2D** shows the probability of finding click responses for all of the 335 sites in and around the RN. **Figure 2E** shows the category and frequency of the main body movements evoked by electrical stimulation of the left and right RN.

Indications of auditory responses in the RN were also obtained at 23 of 51 sites in a third monkey (monkey M). From this monkey, neuronal recordings were obtained from a brain region that corresponded to that tested in monkeys E and W, following the same stereotactic coordinates used in the other two monkeys. Electrical stimulation was omitted as long as the electrode was estimated to be within the RN, in order to reserve electrical stimulation exclusively for locations within the ventral tegmental area where the monkey would receive brain stimulation reward for bar pressing. In this experiment, the presence of auditory responses around the estimated depth of the RN was actually used as a landmark to determine the position of the proximal ventral tegmental area.

Auditory sensitivity of RN neurons was analyzed more thoroughly at 15 sites (12 in monkey E and 3 in monkey W), where strong auditory responses to clicks were observed and from where upper body movements could be electrically evoked, or which fell within 200 µm of such a site. To achieve this, we presented 400 tone bursts over a range of frequencies from 0.11 to 27.2 kHz. **Figures 3A,B** show spike responses to the tonal stimuli from two representative sites.

The auditory sensitivity of the RN was also detected in the LFPs that were recorded in parallel with the spikes. **Figure 3C** shows auditory-evoked potentials that were obtained from the same site at which the tuning curve was obtained from the spike responses shown in **Figure 3B**. We noticed that, similar to the spike responses, auditory-evoked potentials were phase locked to low stimulus frequencies during the full duration of the tones (note that our filter setting did not allow us to reliably detect frequency-following responses above 250 Hz). For higher stimulus frequencies, by contrast, auditory-evoked potentials consisted only of an initial, negative-going wave that occurred about 10 ms after stimulus onset, but which was not followed by any later wave. Auditory potentials were evoked over a frequency range that was about as wide as the range that elicited multiunit responses.

In our sample, RN neurons had best frequencies that were distributed between 0.16 and 17.75 kHz (median 0.9 kHz; **Figure 4A**) and generally exhibited quite broad tuning curves (median 7.2 octaves; **Figure 4B**). Responses started between 9 and 17 ms after tone onset (median 11 ms; **Figure 4C**) and lasted, in most cases, for the full duration of the stimulus sound (**Figure 4D**). We also noted that at 6 sites (4 in monkey E and 2 in monkey W) the responses were partially phase locked to the waveform of the tested sounds. The upper limit of phase locking ranged between 0.44 and 1.03 kHz (median 0.9 kHz; **Figure 4E**).
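The text does not state how phase locking was quantified. One commonly used measure is vector strength, and the sketch below shows how it could be computed from spike times. It is offered only as an illustration under that assumption, not as the method used in this study, and the function name is hypothetical.

```python
import math


def vector_strength(spike_times_ms, tone_freq_khz):
    """Vector strength of spike times relative to a tone's period.

    1.0 means every spike falls at the same phase of the tone; values
    near 0 mean no phase preference. With times in ms and frequency in
    kHz, the product t * f is directly the number of elapsed cycles.
    """
    if not spike_times_ms:
        return 0.0
    phases = [2.0 * math.pi * ((t * tone_freq_khz) % 1.0) for t in spike_times_ms]
    c = sum(math.cos(p) for p in phases)
    s = sum(math.sin(p) for p in phases)
    return math.hypot(c, s) / len(phases)
```

For example, spikes occurring once per cycle of a 0.5 kHz tone (every 2 ms) give a value of 1, whereas spikes spread evenly over the cycle give a value near 0.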

Our experimental approach also allowed us to demonstrate a functional connection from the RN to the primary auditory cortex. To this end, we applied monopolar electrical stimulation to the RN and performed microelectrode recordings of LFPs and spikes from the auditory cortex. Analysis of 204 stimulation/recording pairs (monkeys E and W) revealed that such stimulation always resulted in an electrically evoked potential in the auditory cortex, which is similar to the representative evoked potentials shown in **Figure 5A**. The earliest negative trough of the evoked waveform occurred around 34 ms post-electrical stimulation. This was rapidly followed by a positive peak and a second negative trough, followed by a slow positive deflection ending after about 400 ms post-stimulation. At a few sites (*n* = 15, 7.4%) in the auditory cortex, electrical stimulation in the RN even resulted in an elevated firing of the neurons at a latency of around 50 ms and a duration of 90 ms (**Figure 5B**).

## **DISCUSSION**

This is the first time that auditory responses have been recorded from the RN of primates, and the recordings reveal that groups of neurons that control upper body movement also respond to acoustic signals. The auditory sensitivity observed here is generally in good agreement with that previously observed in chloralose-anesthetized cats (Massion and Albe-Fessard, 1963; Irvine, 1980; Bratus et al., 1981; Shinkarenko, 1984; Shinkarenko et al., 1985). Thus, neurons in the RN respond with short latencies to a broad range of tone frequencies, can phase lock their responses to quite high tone frequencies, and respond to the inputs from the two ears, resulting in some selectivity for localizing sound sources. These auditory properties suggest that the RN receives inputs from the non-lemniscal part of the auditory brainstem, even though such projections appear not to be present (Massion, 1967; Mylius et al., 2013). A striking difference from the previous studies is the percentage of sites in the RN at which auditory responses were found. Whereas in all cat studies more than 90% of neurons were found to respond to clicks or tones, in the monkey RN such responses were found at only around 20% of the sites. These differences may reflect methodological differences in defining the borders of the RN (e.g., anatomy vs. electrically evoked movements, or anesthetized vs. alert). They may even reflect species differences, such as the importance of pinnae movements for sound source localization. Thus, it is possible that the RN may be under less auditory influence in primates than in lower mammals, and this perhaps reflects a diminished importance of the RN in motor functions, particularly in primates.

The existence of responses to auditory and electrical stimulation in these brainstem nuclei is of considerable use, especially as a landmark when targeting proximal "harder to detect" structures, such as the ventral tegmental area. Our study also suggests a functional connection from the RN to the auditory cortex. Electrical stimulation of the RN evoked post-synaptic potentials in the auditory cortex, which were suprathreshold in some neurons, thereby causing them to fire action potentials. The latter finding, together with the short response latency of ∼30 ms, suggests the presence of a projection from the RN to the auditory cortex that has a slow conduction velocity on the order of 1 mm/ms, although the existence of this pathway still needs to be anatomically demonstrated.
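The quoted latency and conduction velocity can be related by a one-line estimate. As a rough sketch, assuming (this is not stated in the text) that the entire latency of about 30 ms reflects conduction along a single pathway with no synaptic or integration delays, the implied path length $d$ is

$$ d \approx v \cdot t \approx 1\ \mathrm{mm/ms} \times 30\ \mathrm{ms} = 30\ \mathrm{mm}. $$

For a fixed path length, any synaptic delay along the route would leave less time for conduction and so imply a somewhat higher velocity; the figure is best read as an order-of-magnitude check.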

Our findings corroborate previous accounts that the RN provides a place for audiomotor interactions (Courville, 1968; Martin and Dom, 1970; Shinkarenko, 1984; Shinkarenko et al., 1985). Since the RN appears to receive inputs from a very early stage of the ascending auditory system, our results suggest that sounds can modify the motor control exerted by this brain nucleus. Our finding of a functional connection from the RN to the auditory cortex suggests that neuronal activity controlling motor behavior might also affect auditory processing in the auditory system. This pathway might even provide a source for motor-related activity in the auditory cortex (Brosch et al., 2005).

## **ACKNOWLEDGMENTS**

The authors would like to thank Jörg Stadler for producing the MRI images. Supported by the Deutsche Forschungsgemeinschaft (DFG, SFB 779) and the European Regional Development Fund (ERDF 2007-2013).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 08 March 2014; accepted: 22 April 2014; published online: 13 May 2014. Citation: Lovell JM, Mylius J, Scheich H and Brosch M (2014) Hearing in action; auditory properties of neurons in the red nucleus of alert primates. Front. Neurosci. 8:105. doi: 10.3389/fnins.2014.00105*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2014 Lovell, Mylius, Scheich and Brosch. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Responses of neurons in the marmoset primary auditory cortex to interaural level differences: comparison of pure tones and vocalizations

Leo L. Lui<sup>1,2</sup>\*, Yasamin Mokri<sup>1</sup>, David H. Reser<sup>1</sup>, Marcello G. P. Rosa<sup>1,2</sup> and Ramesh Rajan<sup>1,2,3</sup>

*<sup>1</sup> Department of Physiology, Monash University, Clayton, VIC, Australia, <sup>2</sup> Australian Research Council, Centre of Excellence for Integrative Brain Function, Monash University, Clayton, VIC, Australia, <sup>3</sup> Ear Sciences Institute of Australia, Subiaco, WA, Australia*

#### Edited by:

*Yukiko Kikuchi, Newcastle University Medical School, UK*

#### Reviewed by:

*Yi Zhou, Arizona State University, USA*

*Pawel Kusmierek, Georgetown University, USA*

#### \*Correspondence:

*Leo L. Lui, Department of Physiology, Monash University, Building 13F, Clayton, VIC 3800, Australia leo.lui@monash.edu*

#### Specialty section:

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience*

Received: *19 September 2014* Accepted: *01 April 2015* Published: *20 April 2015*

#### Citation:

*Lui LL, Mokri Y, Reser DH, Rosa MGP and Rajan R (2015) Responses of neurons in the marmoset primary auditory cortex to interaural level differences: comparison of pure tones and vocalizations. Front. Neurosci. 9:132. doi: 10.3389/fnins.2015.00132*

Interaural level differences (ILDs) are the dominant cue for localizing the sources of high frequency sounds that differ in azimuth. Neurons in the primary auditory cortex (A1) respond differentially to ILDs of simple stimuli such as tones and noise bands, but the extent to which this applies to complex natural sounds, such as vocalizations, is not known. In sufentanil/N2O anesthetized marmosets, we compared the responses of 76 A1 neurons to three vocalizations (Ock, Tsik, and Twitter) and pure tones at each cell's characteristic frequency. Each stimulus was presented with ILDs ranging from 20 dB favoring the contralateral ear to 20 dB favoring the ipsilateral ear to cover most of the frontal azimuthal space. The response to each stimulus was tested at three average binaural levels (ABLs). Most neurons were sensitive to ILDs of vocalizations and pure tones. For all stimuli, the majority of cells had monotonic ILD sensitivity functions favoring the contralateral ear, but we also observed ILD sensitivity functions that peaked near the midline and functions favoring the ipsilateral ear. Representation of ILD in A1 was better for pure tones and the Ock vocalization in comparison to the Tsik and Twitter calls; this was reflected by higher discrimination indices and greater modulation ranges. ILD sensitivity was heavily dependent on ABL: changes in ABL by ±20 dB SPL from the optimal level for ILD sensitivity led to significant decreases in ILD sensitivity for all stimuli, although ILD sensitivity to pure tones and Ock calls was most robust to such ABL changes. Our results demonstrate differences in ILD coding for pure tones and vocalizations, showing that ILD sensitivity in A1 to complex sounds cannot be simply extrapolated from that to pure tones. They also show that A1 neurons do not show level-invariant representation of ILD, suggesting that such a representation of auditory space is likely to require population coding and further processing at subsequent hierarchical stages.

Keywords: primate, auditory cortex, response properties, sound localization, interaural level differences

## Introduction

The mammalian primary auditory cortex (A1) is critical for sound localization (Thompson and Cortez, 1983; Jenkins and Merzenich, 1984; Heffner and Heffner, 1990), and the responses of single A1 neurons encode information about sound source locations (e.g., Phillips and Irvine, 1981, 1983; Rajan et al., 1990; Recanzone et al., 2000; Mrsic-Flogel et al., 2005; Woods et al., 2006; Kusmierek and Rauschecker, 2014; see Grothe et al., 2010 for review). However, the spatial sensitivity of A1 neurons can vary with stimulus level (e.g., Brugge et al., 1996; Middlebrooks et al., 1998; Reale et al., 2003; Zhou and Wang, 2012). This is in contrast to psychophysical performance in both humans and monkeys, in which sound localization abilities remain relatively constant at intensities of 30 dB sound pressure level (SPL) and greater (Su and Recanzone, 2001; Recanzone and Beckerman, 2004; Vliegen and Van Opstal, 2004; Sabin et al., 2005). Pooling models, in which population responses are used to disambiguate sound location across changes in stimulus level, can reconcile the discrepancy (Stecker et al., 2005; Miller and Recanzone, 2009). However, physiological studies to date have used only a limited range of auditory stimuli, primarily tones and/or noise bands, and thus we do not know whether such read-out strategies apply to more complex stimuli, including many types of natural sounds.

The social behavior of marmoset monkeys requires a range of context-specific vocalizations (Stevenson and Poole, 1976). These provide auditory stimuli that are both complex and biologically relevant (Miller et al., 2009), and they have become the focus of many studies (e.g., Lu et al., 2001; Nagarajan et al., 2002; Eliades and Wang, 2003, 2013). Neurons in marmoset A1 respond reliably to marmoset vocalizations (e.g., Wang et al., 1995; Wang, 2000), including in animals anesthetized with sufentanil/N2O (Rajan et al., 2013). Although the spatial receptive fields in response to noise bands have been well characterized in marmoset A1 (Zhou and Wang, 2012, 2014), stimulus-driven responses to complex stimuli may not necessarily reflect those elicited by noise bands and pure tones, given that nonlinear spectrotemporal interactions underlie A1 neuronal responses (Sadagopan and Wang, 2009).

The perceived location of a sound source is derived from the processing of two types of binaural cues (interaural level differences [ILDs] and interaural timing differences [ITDs]), as well as monaural cross-frequency band comparison (Wightman and Kistler, 1997; Jin et al., 2004; Grothe et al., 2010). The latter dominates localization in the vertical plane (elevation), whereas localization in the horizontal plane (azimuth) depends on the two binaural cues. The range of ILDs in the marmoset head-related transfer function is compatible with detection (Slee and Young, 2010); moreover, since ILDs are most useful for localizing high frequency sounds (Wise and Irvine, 1983; Grothe et al., 2010) and marmosets hear relatively high frequencies, ILD must be considered a strong cue for sound localization. Indeed, sensitivity to ILDs has been found in the nucleus of the brachium of the inferior colliculus of marmosets (Slee and Young, 2013), which is part of the input pathway to A1.

Here we address the extent to which A1 neurons code for ILDs in marmoset vocalizations as examples of complex naturalistic stimuli, and whether sensitivity to ILDs of vocalizations differs from sensitivity to ILDs in pure tones. Moreover, we test whether the neuronal encoding of ILDs changes at different overall SPLs. This study is directed to sound localization based specifically on ILD cues, which we hypothesize to be a major contributor. Our data yield new information regarding primate A1 responses to behaviorally relevant complex sounds, which can facilitate the development of models of sound localization based upon ILD in natural conditions.

## Experimental Procedures

Experiments were conducted in six adult marmosets (*Callithrix jacchus*), in acute (24–72 h) recording sessions that targeted the auditory cortical areas on the surface of the left superior temporal gyrus. Experiments were approved by the Monash University Animal Experimentation Ethics Committee, which also monitored the welfare of the animals. All procedures followed the guidelines of the Australian Code of Practice for the Care and Use of Animals for Scientific Purposes.

## Preparation

The preparation has been described in detail previously (Rajan et al., 2013) and closely followed the protocols used in previous studies of visual (e.g., Lui et al., 2006) and motor (Burman et al., 2008) cortex physiology in marmosets. Each animal was premedicated with intramuscular injections of diazepam (3 mg/kg) and atropine sulfate (0.2 mg/kg). After 20 min, anesthesia was induced with intramuscular alfaxalone (Alfaxan, 10 mg/kg; Jurox, Rutherford, Australia), allowing tracheotomy, vein cannulation and craniotomy to be performed. After all surgical procedures were completed, the animal was administered an intravenous infusion of sufentanil (8 µg/kg/h; Janssen-Cilag, Sydney, Australia) and dexamethasone (0.4 mg/kg/h; David Bull, Melbourne, Australia) diluted in lactated Ringer's solution (injection volume 1.5 ml/h). Artificial ventilation with a gaseous mixture of nitrous oxide and oxygen (7:3) was delivered via the tracheal cannula. The electrocardiogram, blood pressure and SpO<sub>2</sub> level were continuously monitored, and core body temperature was maintained at 38 ± 0.5°C using a thermostatically controlled electric blanket regulated by a rectal thermometer. A head bar held in a stand was fixed to the forehead using a short screw and dental cement to hold the head rigidly. The ear canals were surgically exposed to allow the insertion of sound delivery tubes connected to Sennheiser headphone speakers (Rajan, 2000). This preparation allowed very stable recordings, including monitoring of the same cells over periods in excess of 2 h (Bourne and Rosa, 2003; Rajan et al., 2013). Targeting of the electrode toward A1 was based on previously published maps (Aitkin et al., 1986; de la Mothe et al., 2006; Reser et al., 2009; Paxinos et al., 2012) and direct visualization of the lateral sulcus, which was visible through the silicone oil-covered dura mater.

## Stimuli

Presentation of stimuli, together with acquisition and processing of neural responses, was performed using custom-developed Matlab software (MathWorks, Natick, MA), which has been employed in previously published studies (Rajan et al., 2013). Sounds were delivered using probe tubes fitted snugly into the external ear canal, at a distance of about 1 mm from the eardrum. SPL was calibrated using a type 2673 microphone, powered by a type 2804 microphone power supply, against the speaker output probe tube, and feeding into a Bruel and Kjaer (Copenhagen, Denmark) sound level calibrator type 4230 (94 dB @ 1000 Hz).

Pure tone stimuli (100 ms duration) were generated using TDT hardware (Tucker Davis Technologies, Alachua, FL). Tones were created in a TDT RX6 multi-function processor (which also controlled the presentation of vocalization stimuli, see below). Stimuli were filtered using a low-order 10 kHz high-pass filter to flatten the speaker output, which was otherwise strongly biased to low frequencies: the unfiltered output was 20–30 dB greater from 1 to 3 kHz than at frequencies from 8 to 40 kHz, where the output was flat within ±3 dB. This would create problems in generating a broadband stimulus in which the output at individual frequency components needed to be equal. The high-pass filter, which had a slope of 10 dB/octave, attenuated the output at the lower frequencies to levels similar to those from 8 to 40 kHz, so that the maximum output of the speaker was very similar across frequencies. Note that this has no effect on any across-frequency variations in level within a call (see below), but allowed the output for a broadband noise stimulus to be even across frequencies. The signal was then passed to a PA5 programmable attenuator and subsequently to an HB7 headphone driver (Tucker Davis Technologies), before delivery via the speakers.

Three recorded marmoset calls ("Ock," "Tsik," and "Twitter"; Epple, 1968; Aitkin and Park, 1993; **Figure 1**) were also used. These calls served as examples of complex, biologically relevant auditory stimuli. Only one call (token) of each type of vocalization was used per experiment to ensure that neural variability was not due to variation in the stimulus. In behaving marmosets, "Ock" and "Tsik" are social mobbing calls, used to attract conspecifics and potentially provide them with the cues necessary for localization, whereas "Twitter" is a contact call, produced upon visual contact with a conspecific (Aitkin and Park, 1993). Comparatively, the Ock call is a broadband stimulus, not only with power in the lower frequencies, but also with power distributed through the entire audible range of the marmoset. Meanwhile, the Tsik and Twitter have similar frequency characteristics, carrying multiple narrower bands of spectral energy from around 7 kHz up to 20 and 30 kHz for the Twitter and Tsik, respectively (see **Figure 1**).

Calls were recorded from monkeys not included in this sample, using Aco Pacific Type 1 microphones (Model 7012, Aco Pacific, Belmont, CA) in a sound-attenuated chamber. Microphone output was acquired via a preamplifier by a Tascam HD-P2 digital recorder (192 kHz A/D rate; TEAC America, Inc., Montebello, CA) and stored as individual .WAV files. Specific calls were extracted from the recording using a custom MATLAB interface, which allowed an experienced user to classify each call type and capture it as a separate file. Call files were trimmed, normalized in amplitude, and played at the same rate of 192 kHz. For calibration, the .WAV file was played out at maximum amplitude by the TDT system, through the same sound delivery speaker used in experiments, into a closed coupler system that contained the condenser microphone placed against the sound delivery tube. The condenser microphone sensitivity function was used to calibrate the system output as the SPL averaged over the call duration. This was used to generate .WAV files of recorded calls at other desired SPLs.

## Testing for ILD Sensitivity

Neuronal sensitivity to ILDs was tested for each stimulus. ILD sensitivity was tested using the Average Binaural Level (ABL) constant method, in which ILDs are generated by varying the level of the stimulus in the two ears symmetrically around a base (average) binaural level (Irvine, 1987). For each stimulus three ABLs were used: 30, 50, and 70 dB SPL, and for each ABL, nine ILDs were generated by varying the levels in the two ears symmetrically around that ABL. Test ILDs used for each stimulus ranged from −20 dB (20 dB louder in the ear contralateral to the recorded A1) to +20 dB (20 dB louder in the ear ipsilateral to the recorded A1), usually in 5 dB increments, and always included 0 (same level in both ears). This stimulus set encompasses the majority of the range of ILDs measured in the marmoset, for the frequencies that predominate in the vocalizations we used (Slee and Young, 2010; Rajan et al., 2013).
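The ABL-constant method described above amounts to a simple rule for the level in each ear. The following sketch assumes the usual 5 dB increments and takes ILD as the ipsilateral minus the contralateral level, consistent with the sign convention stated in the text; the names `ear_levels` and `conditions` are illustrative.

```python
ABLS_DB = (30, 50, 70)               # average binaural levels from the text
ILDS_DB = tuple(range(-20, 21, 5))   # -20 ... +20 dB in 5 dB steps (nine values)


def ear_levels(abl_db, ild_db):
    """Return (contra_db, ipsi_db) for one ABL/ILD combination.

    Convention from the text: negative ILD means the contralateral ear
    is louder, so ILD is taken here as ipsi minus contra. The two levels
    are placed symmetrically around the ABL.
    """
    contra_db = abl_db - ild_db / 2.0
    ipsi_db = abl_db + ild_db / 2.0
    return contra_db, ipsi_db


# All 3 x 9 = 27 ABL/ILD combinations for one stimulus.
conditions = [(abl, ild, *ear_levels(abl, ild)) for abl in ABLS_DB for ild in ILDS_DB]
```

For instance, at an ABL of 50 dB SPL an ILD of −20 dB corresponds to 60 dB SPL in the contralateral ear and 40 dB SPL in the ipsilateral ear, so the mean of the two ear levels stays at the ABL for every ILD.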

## Recordings

All recordings were done with the animal inside a sound-attenuated and lightproof room. Parylene-coated tungsten electrodes (FHC, Bowdoin, ME) with an impedance of 2–4 MΩ at 1 kHz were positioned approximately normal to the cortical surface. The electrodes were advanced through the intact dura mater until the first depth at which responses from a multi-unit cluster could be reliably observed above background activity (generally when the response amplitude was about 1.5 × mean noise level). The dura mater (covered in silicone oil) was readily penetrated by the electrode, and first recordings were generally obtained within 100–200 µm of the surface. From this point, multi-unit clusters were sampled at approximately 150 µm intervals, until the white matter was reached (approximately 2–3 mm from the pial surface, depending on the electrode's angle relative to the banks of the lateral sulcus). As expected, neurons recorded at adjacent sites had similar characteristic frequencies (CFs). Nonetheless, other response properties changed, confirming that the separations between recording sites were sufficient to avoid repeated recordings from the same cell. Sampling (25 kHz), amplification (×1000) and filtering (bandpass 500 Hz–5 kHz) of the electrophysiological signal were achieved by a Model RA4PA Medusa Pre-amplifier and an RA16 Medusa Base Station (Tucker Davis Technologies). Data were stored digitally for both online and offline processing.

At all recording sites, the CF and threshold of the multi-unit responses were first determined under interactive control by the experimenter, using online monitoring of neuronal responses while pure tone stimuli were varied in frequency and level (Rajan et al., 2013). This assessment was followed by a systematic characterization of the responses to a randomized array of stimuli consisting of up to 22 pure tones with frequencies linearly spaced from 500 Hz to 32 kHz, and up to eight amplitudes (10–80 dB). Each tone (100 ms duration, 0.5 ms cosine rise–fall ramps) was presented binaurally at equal levels, with a 500 ms inter-stimulus interval. The full stimulus matrix was presented a minimum of 10 times, and the responses to each frequency–level combination were summed online and displayed as tone frequency–level response areas (Rajan, 1998; Carrasco and Lomber, 2009). The CF was then determined as the frequency at which responses could be reliably evoked at the lowest intensity in the frequency–level response areas.

(right). The Fourier transforms and spectrograms were computed using

color bar for the spectrogram.
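The CF rule described above (the frequency at which responses are reliably evoked at the lowest intensity of the frequency–level response area) can be sketched as follows. This is only an illustration: the text does not give a numerical definition of "reliably evoked", so the `criterion` spike count and the function name are hypothetical assumptions.

```python
def characteristic_frequency(freqs_hz, levels_db, counts, criterion):
    """Estimate CF from a frequency-level response area.

    counts[i][j] is the summed spike count for levels_db[i] and freqs_hz[j].
    `criterion` is a hypothetical minimum count standing in for the
    experimenters' judgment of a reliably evoked response.
    Returns (cf_hz, threshold_db), or None if no response meets the criterion.
    """
    # Scan levels from the lowest upward and stop at the first level where
    # the best frequency reaches the criterion.
    order = sorted(range(len(levels_db)), key=lambda k: levels_db[k])
    for i in order:
        row = counts[i]
        best_j = max(range(len(row)), key=lambda j: row[j])
        if row[best_j] >= criterion:
            return freqs_hz[best_j], levels_db[i]
    return None


# Toy response area: three frequencies by three levels.
cf = characteristic_frequency(
    freqs_hz=[4000, 8000, 16000],
    levels_db=[10, 20, 30],
    counts=[[0, 1, 0], [1, 6, 2], [3, 9, 7]],
    criterion=5,
)
```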

After CF determination, sensitivity to ILDs was tested using a pure tone of appropriate CF for the cell, as well as the three vocalizations. Each of these stimuli was presented at 3 ABLs and 9 ILDs. The order of test conditions was randomized, and at least 20 trials were conducted in each test condition; for example, each panel in **Figure 2** contains data from at least 540 trials (3 ABL × 9 ILD × 20 trials).

types (pure tone, Ock, Tsik, Twitter, and FRA) are arranged in columns. The color code identifies the ABL as per the legend in the top-left sub-plot. For any stimulus/ABL combinations that were sensitive to changes in ILD (*p* < 0.005), the optimal model (solid line) is also illustrated. Negative ILD values refer to ILDs that are louder in the contralateral (contra) ear, and

of these values correspond to their respective ABLs as indicated by legend. CFs for cell (A) 15 kHz, (B) 10 kHz, (C) 22 kHz, (D) difficult to define, recorded at the area of ≈28 kHz which was used. Spike rates in the FRAs are indicated by the color bar on the very right, the number on the top indicates the maximum value in spikes/second.

## Data Analysis

Single and multi-unit data were extracted from the recorded signal using the Offline Sorter program (Plexon Inc., Dallas, TX). Single units were identified by separation of spike waveform and principal components analysis. Typically only one type (single or multi) of neural recording was extracted from each recording site. Both single and multi-units were analyzed, and the results presented in this report pool the data from both types of neural recordings. Spike time data were exported from the Offline Sorter to MATLAB, and all subsequent analyses were performed using purpose-written MATLAB programs.

A single trial response was computed as the mean firing rate over the first 100 ms after stimulus onset. The 100 ms matched the duration of the pure tone; while the durations of all calls were shorter than 100 ms, the neural responses to the calls generally lasted for at least 100 ms and therefore we opted to keep the analysis time window consistent between stimuli. The 100 ms interval represents a reasonable time for neural activity to be read-out for a behavioral response, and a similar window has been used in other studies of sound localization encoding in marmoset A1 (Zhou and Wang, 2012).

The level of spontaneous activity was defined as the mean firing rate in the 50 ms prior to stimulus onset. Only cells that responded at a level at least two standard deviations above the mean spontaneous rate at the optimal average binaural level (ABLopt) and ILD for at least 1 call or the pure tone were included in subsequent analyses. To determine if an individual cell exhibited ILD sensitivity for a particular stimulus across all ABLs, we applied a Two-Way ANOVA, with the independent variables being ILD and the ABL level (SPL) in that test condition. A cell was considered to be sensitive for ILD if there was a significant main effect for ILD, and/or a significant interaction effect of ILD and ABL on the firing rate. To correct for the fact that we were conducting this analysis four times for each cell, once for each stimulus type, we applied a Bonferroni correction to choose the conservative value of α = 0.01.
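The response window and inclusion criterion described above can be sketched as follows. The sketch assumes spike times are expressed in ms relative to stimulus onset and that the standard deviation of the spontaneous rate is taken across trials; the text does not specify how the standard deviation was computed, and the function names are hypothetical.

```python
from statistics import mean, stdev


def rate_in_window(spike_times_ms, start_ms, end_ms):
    """Firing rate (spikes/s) in [start_ms, end_ms), times relative to onset."""
    n = sum(1 for t in spike_times_ms if start_ms <= t < end_ms)
    return n * 1000.0 / (end_ms - start_ms)


def is_responsive(trials):
    """trials: list of spike-time lists (ms re: stimulus onset), one per trial.

    Driven rate: first 100 ms after onset. Spontaneous rate: 50 ms before onset.
    Assumption: the SD of the spontaneous rate is computed across trials.
    """
    driven = [rate_in_window(t, 0, 100) for t in trials]
    spont = [rate_in_window(t, -50, 0) for t in trials]
    return mean(driven) >= mean(spont) + 2 * stdev(spont)
```

In the analysis described in the text, this check would be applied at the optimal ABL and ILD for each stimulus, and a cell would be retained if it passed for at least one call or the pure tone.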

To assess whether cells were sensitive to ILD at a given ABL for a particular stimulus, we used a One-Way ANOVA (p < 0.005); again, this conservative value was adopted to correct for multiple comparisons, as we were conducting this analysis three times (three ABL levels) for each stimulus, equating to 12 times per unit (4 stimuli × 3 analyses). In addition, we determined the ILD sensitivity type and parameters for all significant cells by fitting two ILD-response functions, a monotonic function (Equation 1) and a peaked function (Equation 2), consistent with the large body of literature segregating ILD functions in auditory cortex and lower levels of the auditory pathway into these two predominant classes. The sensitivity type was assigned on the basis of the best fitting function as described below.

Equation (1) is a sigmoid, representing a monotonic relationship between the ILD and cell response as measured by spike rate:

$$R(d) = B + A\,\mathrm{erf}\left[\frac{d - d_0}{\sigma}\right] \quad \text{(erf is the error function)} \tag{1}$$

Here, R(d) is the fitted response with respect to d, where d is the ILD. The four free parameters are A, which represents the peak firing rate with the baseline response subtracted; d0, representing the midpoint of the sigmoid function; σ, representing the slope of the ILD-response function; and B, representing the baseline response.

For peaked ILD functions, we fitted the following Gaussian function:

$$R(d) = B + A \exp\left[-0.5\,\frac{(d - d_0)^2}{\sigma^2}\right] \tag{2}$$

where R(d) represents the fitted response with respect to ILD, and d is the ILD. Again, this function has four parameters: A represents the amplitude (maximum minus baseline), d0 the position of the peak, representing the optimal ILD, σ gives the slope, and B is the baseline response. In both instances, parameters were optimized using the "lsqcurvefit" function in MATLAB. The amplitude of the functions (parameter A) was constrained for both Equations (1) and (2), so that it never exceeded 1.2 times the modulation range (maximum response minus minimum response) of the cell; this minimized the probability of artificial peaks being fitted to the data, as confirmed by visual inspection of the curves. The σ is a measure of relative slope with respect to the amplitude (parameter A): a large σ indicates that the spiking activity changes relatively slowly with ILD, whereas a smaller σ indicates a more abrupt change. For the Gaussian function, this parameter, which in this case can be expressed in dB of ILD, is usually thought of as a measure of the width of the bell curve; because the width depends directly on the slope, σ can also be taken as a measure of slope. The absolute change in firing rate with respect to a change in ILD is clearly highly non-linear and depends on the other parameters of the model as well as on the ILD in question.
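For readers who wish to evaluate the two response functions, a minimal Python sketch of Equations (1) and (2) and of the amplitude constraint is given below. It is not the authors' MATLAB code. The function names are hypothetical, and applying the 1.2 × modulation-range limit symmetrically to negative amplitudes (needed for contralateral-preferring sigmoids, where A < 0) is an assumption.

```python
import math


def sigmoid_ild(d, A, d0, sigma, B):
    """Equation (1): monotonic ILD-response function."""
    return B + A * math.erf((d - d0) / sigma)


def gaussian_ild(d, A, d0, sigma, B):
    """Equation (2): peaked ILD-response function."""
    return B + A * math.exp(-0.5 * (d - d0) ** 2 / sigma ** 2)


def amplitude_bounds(mean_rates):
    """Bounds on parameter A: at most 1.2 x the modulation range.

    Assumption: the limit applies to the magnitude of A, so that
    negative amplitudes are bounded in the same way.
    """
    limit = 1.2 * (max(mean_rates) - min(mean_rates))
    return -limit, limit
```

Bounds of this kind could be passed to any bounded least-squares routine (for example, the `bounds` argument of `scipy.optimize.curve_fit`) to mirror the constrained fit described in the text.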

As both functions had four free parameters, the optimal model was determined by comparing coefficients of determination (R²) calculated using the residuals of each fit. Responses that had higher R² values for the Gaussian function were considered to exhibit peaked sensitivity, whereas responses that had higher R² values for the sigmoid function were considered sensitive to ILD in a monotonic manner. In our convention, monotonic cells with parameter A < 0 had a higher firing rate to the contralateral side for a given stimulus/ILD combination (negative ILD values denoting louder in the contralateral ear), and A > 0 indicates cells that preferred the ipsilateral side.
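The model-selection rule can be sketched as below, taking as input the observed mean responses and the values predicted by each fitted function. The names are hypothetical, and the handling of exact ties (assigned here to the monotonic class) is an assumption, since the text does not address it.

```python
def r_squared(observed, fitted):
    """Coefficient of determination from the residuals of a fit."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def classify_sensitivity(observed, sigmoid_fit, gaussian_fit, A_sigmoid):
    """Assign the sensitivity type from the better-fitting function.

    A_sigmoid is the fitted amplitude of Equation (1); its sign gives the
    preferred side for monotonic responses (A < 0: contralateral).
    Assumption: ties in R^2 are assigned to the monotonic class.
    """
    if r_squared(observed, gaussian_fit) > r_squared(observed, sigmoid_fit):
        return "peaked"
    return "monotonic-contra" if A_sigmoid < 0 else "monotonic-ipsi"
```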

To determine a neuron's optimal ABL for ILD sensitivity, we calculated the following discrimination index (DI):

$$\mathrm{DI} = \frac{R_{\max} - R_{\min}}{R_{\max} - R_{\min} + 2\sqrt{\mathrm{SSE}/(N - M)}} \tag{3}$$

where Rmax and Rmin are the maximum and minimum firing rates respectively, SSE is the sum squared error around the mean response, N is the number of observations, and M is the number of stimulus values tested. This index was chosen because it takes neuronal variability into account in addition to the mean responses: it expresses the ability of a neuron to discriminate changes in the stimulus relative to its intrinsic level of variability (Prince et al., 2002; DeAngelis and Uka, 2003). We also calculated the modulation range of the firing rate as Rmax − Rmin, which does not take into account the variance of responses.
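A minimal sketch of Equation (3) is given below. It assumes that Rmax and Rmin are the largest and smallest mean rates across ILDs, that SSE is accumulated around the mean response of each ILD condition, and that N counts single trials; these readings follow the definitions above but are interpretations, and the function name is hypothetical.

```python
import math


def discrimination_index(trials_by_ild):
    """Equation (3). trials_by_ild maps each ILD to its single-trial rates."""
    means = {ild: sum(r) / len(r) for ild, r in trials_by_ild.items()}
    mod_range = max(means.values()) - min(means.values())   # Rmax - Rmin
    # Assumption: squared error is taken around each condition's own mean.
    sse = sum((x - means[ild]) ** 2
              for ild, rates in trials_by_ild.items() for x in rates)
    n = sum(len(r) for r in trials_by_ild.values())   # observations (trials)
    m = len(trials_by_ild)                            # stimulus values tested
    return mod_range / (mod_range + 2.0 * math.sqrt(sse / (n - m)))


# Toy data: three ILDs, two trials each.
di = discrimination_index({-20: [38.0, 42.0], 0: [19.0, 21.0], 20: [9.0, 11.0]})
```

In this toy case the modulation range is 30 spikes/s, SSE is 12 and N − M is 3, so DI = 30 / (30 + 2 × 2) ≈ 0.88. The index approaches 1 as trial-to-trial variability becomes small relative to the modulation range.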

Either parametric (ANOVA and t-test) or non-parametric (Kruskal–Wallis, Wilcoxon rank-sum test) measures were used to compare means or medians, depending on the normality of the distributions; these are identified where applicable in the reporting of the results.

#### Histology

At the end of the experiment the animal was administered an overdose of sodium pentobarbitone (300 mg/kg) (Rhone-Merieux, Brisbane, Australia) and perfused transcardially with 0.9% saline, followed by 4% paraformaldehyde in 0.1 M phosphate buffer (pH 7.4). After cryoprotection in increasing concentrations of sucrose (10–30%), and sectioning (40 µm, coronal plane), alternate sections were stained for Nissl substance, myelin (Gallyas, 1979) and cytochrome oxidase (Wong-Riley, 1979), for the reconstruction of electrode tracks relative to histological borders (see Rajan et al., 2013). Electrode tracks were reconstructed with the aid of small electrolytic lesions (4 µA, 10 s), which were placed at various sites during the experiments. The laminar distribution of the recorded units was also determined based on the cytoarchitecture.

The areas from which recordings were obtained were identified using criteria defined by previous studies (de la Mothe et al., 2006; Reser et al., 2009), and the recent stereotaxic atlas of the marmoset brain (Paxinos et al., 2012). One of the principal histological features that characterizes A1 is a thick band of dense cytochrome oxidase reactivity in the middle cortical layers (including the lower part of layer 3, but centered on layer 4). The characteristics of this cytochrome oxidase band allow discrimination of A1 from the adjacent core and belt areas. The myelination pattern of A1 relative to the latter areas is another useful indicator of the anatomical boundary (de la Mothe et al., 2006).

## Results

The responses of 76 A1 units from six animals, to CF pure tones and three natural marmoset vocalizations (Ock, Twitter, and Tsik), were characterized, using ILDs from −20 dB to +20 dB created at three ABLs. The majority of these recordings (66%) were multi-units. All recording sites in the present report were confirmed to be in A1 by histological reconstruction of the electrode tracks, which revealed that the sample included cells from layers 2 to 6. The CFs ranged from 0.9 to 31 kHz, with a median of 12 kHz. 75% of these cells had a CF between 5 and 20 kHz, a range that covers most of the energy content of the vocalizations tested (see **Figure 1**). Only 12 cells had CFs ≥ 20 kHz.

The total number of cells presented with each stimulus is shown in **Table 1**; of the 76 cells, the majority (62) were tested with all four stimuli, 10 were tested with three out of the four stimuli, and the remaining four cells had data from two stimuli. The number of cells sensitive to ILD is also shown in **Table 1**, and this depended on stimulus type [χ²(3) = 9.23, P = 0.026]: more cells were sensitive to the Ock (77%) and the appropriate CF tone (78%) compared to the number of cells sensitive to the Tsik (63%) and Twitter (69%). The fact that large numbers of cells were responsive for any given call may be explained by the fact that these calls have spectral power across a broad range of frequencies, including the plateau spectrum range outside the main spectral peak, and would therefore cover at least some part of the response area of these cells. Five (of the 76) cells did not respond differentially to ILDs in any of the four stimuli. These cells were found adjacent to cells which did respond to ILDs in two or more of the stimuli, suggesting that the lack of sensitivity reflected a genuine neuronal property, and not inadequate stimulation of the cell by the acoustic stimuli (see Discussion).

TABLE 1 | The number of cells that were tested for each stimulus, the number that were ILD sensitive and which ABL these preferred.

## Types of Sensitivity Functions for ILD

**Figure 2** shows the response patterns of four cells which exemplify the types of ILD sensitivity observed in A1. Cell A was always monotonically sensitive, favoring the contralateral ear regardless of stimulus type; this was the most frequently observed pattern in A1, representing approximately 59% of the total sample (**Figure 3**: red bars). Cell B is an example of a cell with sensitivity that, when present, is generally best described by a peaked function (here for pure tone, Tsik, and Twitter stimuli). Cell C's sensitivity to ILD depended on both stimulus type and ABL. Both peaked and monotonic responses were observed, depending on the stimulus and ABL (e.g., pure tone response). Interestingly, this cell was sensitive for ILDs in the Tsik call, but did not respond to the Twitter call, which has a similar frequency spectrum. The activity of Cell D can also be considered as complex, as its responses to certain stimuli were not predictable based on their spectral content; it responded only to the Ock stimulus, a stimulus which encompasses this cell's CF of 26 kHz, yet it did not show any response to ILDs in CF pure tones, or other calls.

It is also worth noting that where cells were deemed to be insensitive to ILD for a particular stimulus/ABL combination, in the majority of cases (87%) these cells were actually non-responsive to that combination (e.g., Cell D, Twitter responses, in **Figure 2**). Conversely, where cells were responsive to a given stimulus/ABL combination, the majority (78%) of these responses were significantly sensitive to ILD (p < 0.005), consistent with the fact that ILD is a major stimulus parameter influencing A1 neurons.

As demonstrated by the examples shown in **Figure 2**, the pattern of ILD sensitivity for each stimulus type almost always depended on ABL. This is summarized for our population of cells in **Figure 4**; each subpanel presents data for one stimulus type. For all stimuli, the majority of cells were sensitive to ILD (first bar in each subpanel). However, the proportion of ILD-sensitive units was always marginally lower than that of cells sensitive to ABL (second bar in each subpanel), and more importantly, the majority of ILD-sensitive cells showed a significant interaction with ABL (the gray segment of each bar in these subpanels) irrespective of stimulus. This is exemplified by Cell C in **Figure 2**: its sensitivity (or the lack thereof) always depended on the ABL for every stimulus to which it was responsive. Cells that showed an interaction effect represented a large majority of the total cells tuned for ILD (76–81%); these percentages were not different between stimuli [χ²(3) = 0.02, p > 0.05; **Figure 4**]. The percentage of cells that showed an interaction between ILD and ABL was about 50–60% across all stimuli (gray bars in **Figure 4**), and this difference corresponded to the total number of cells that were sensitive to ILD.

The distribution of the different patterns of ILD sensitivity for each of the four stimulus types is illustrated in **Figure 3**, with data segregated according to ABL. For the ILD-sensitive cells, the most common response function was monotonic (blue + red segments of bars in **Figure 3**; e.g., **Figure 2**, Cell A, for most ABLs and all stimuli), with 59% of ILD-sensitive cells favoring the contralateral ear (red segment of bars in **Figure 3**; e.g., **Figure 2**, Cell A), and 18% of ILD-sensitive cells favoring the ipsilateral ear (blue segment of bars in **Figure 3**; e.g., **Figure 2**, Cell C, Tsik and CF tone stimuli). The remaining 23% of ILD-sensitive cells were best described by a peaked function (gray segment of bars in **Figure 3**; e.g., **Figure 2**, Cell B, Tsik and Twitter calls). The distribution of sensitivities did not differ across stimulus type or level [χ²(11) = 19.7, p = 0.60]. For the Tsik and Twitter calls, the proportion of ILD-sensitive cells was significantly smaller at lower sound levels [Tsik: χ²(2) = 12.0, p = 0.002; Twitter: χ²(2) = 13.9, p < 0.001], whereas ILD sensitivity did not change with level for pure tones [χ²(2) = 0.68, p = 0.71] or the Ock call [χ²(2) = 2.2, p = 0.32].

FIGURE 4 | Percentage of cells sensitive to ILD and ABL for each stimulus type. The total percentages of cells sensitive to ILD and to ABL (including interaction effect) for each stimulus are shown in the four sub-plots. The color code indicates whether cells are sensitive to only one variable, or have a significant interaction effect. Cells are considered to be sensitive to ILD if there is a significant main effect for ILD and/or a significant interaction of ILD and ABL (Two-Way ANOVA, *p* < 0.01). The corresponding rule applies to ABL. These categories are not mutually exclusive, i.e., a cell can be sensitive to both ILD and ABL and therefore would contribute to both bars; however, each cell can contribute to a given bar only once, i.e., cells with interaction effects will appear in gray regardless of main effect(s).

## Sensitivity of A1 Neurons to ILD–Discrimination Index and Modulation Range

We characterized the discrimination quality afforded by the ILD sensitivity of A1 cells by calculating DIs and the modulation range of firing rates over the test ILD range. **Figure 5** illustrates the distributions of DIs (Column A) and the modulation range of spike rates (Column B) for the four stimuli. The modulation range and DI were computed at each cell's optimal level (ABLopt), which was defined as follows: if the within-level analysis (One-Way ANOVA) revealed only one ABL with significant ILD sensitivity, that ABL was defined as ABLopt. Otherwise, the ABL with the highest DI was defined as ABLopt. Therefore, data points in each sub-panel of **Figure 5** are independent, as each cell is only represented once at the ABLopt. Note that the DI calculation does not make any assumptions about the shape of the sensitivity function; therefore, ILD-insensitive (p > 0.05) cells simply have low DIs, reflecting the experimental observations. The distributions of ABLopt for each stimulus are shown in **Table 1**; this distribution was significantly dependent on stimulus [χ²(6) = 28.7, p < 0.0001]: for the pure tone, the number of cells at each ABLopt is relatively consistent, whereas this number increases with ABL especially for the Tsik and Twitter calls, where more cells have an ABLopt of 70 dB.

FIGURE 5 | Discrimination index (DI) and modulation range. Column (A) shows the distribution of DI at ABLopt for all stimulus types. Column (B) shows the distribution of modulation range with respect to changes in ILD at the ABLopt. Shading of bars in both columns indicates whether or not these cells were significantly modulated in response to different ILDs. Arrows indicate the means of the distributions.
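The rule for choosing ABLopt described above can be sketched as follows. The inputs are the DI at each ABL and the set of ABLs at which the One-Way ANOVA was significant. The function name is hypothetical, and taking the highest DI over all three ABLs in the "otherwise" case (rather than only over the significant ones) is an assumption.

```python
def select_abl_opt(di_by_abl, significant_abls):
    """Pick the optimal ABL for one cell and one stimulus.

    di_by_abl: dict mapping ABL (dB SPL) to the discrimination index.
    significant_abls: set of ABLs with significant ILD sensitivity.
    """
    if len(significant_abls) == 1:
        # Exactly one ABL with significant ILD sensitivity: use it directly.
        return next(iter(significant_abls))
    # Otherwise (none or several significant): use the ABL with the highest DI.
    # Assumption: the maximum is taken over all tested ABLs.
    return max(di_by_abl, key=di_by_abl.get)


di = {30: 0.30, 50: 0.45, 70: 0.40}
```

For example, with the DIs above, a cell that is significant only at 70 dB has an ABLopt of 70 dB, whereas a cell that is significant at both 50 and 70 dB has an ABLopt of 50 dB.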

The mean (±SEM) DIs for the CF tone, and the Ock, Tsik and Twitter calls were 0.3990 ± 0.0098, 0.4016 ± 0.0099, 0.3668 ± 0.0103, and 0.3483 ± 0.0094, respectively (**Figure 5**, Column A); significant differences were found between groups (p < 0.0001, repeated measures ANOVA). Note that the highest DI was found for the stimulus in our set that encompassed the largest range of frequencies (Ock), but that the next highest DI was found for the CF tones, which have the narrowest frequency spectra. This, however, has to be interpreted with caution as the CF tones, by definition, were played at the cells' characteristic frequency, and hence the majority of cells were responsive. A similar trend was observed for the modulation range (**Figure 5**, Column B), with the neuronal responses to the CF pure tone (28.8 ± 2.5) and the Ock call (31.1 ± 2.8) showing higher mean modulation ranges than those to the Tsik (24.3 ± 2.7) and Twitter (17.6 ± 1.7) calls; again, significant differences were evident (p < 0.0001, repeated measures ANOVA). Post-hoc tests on both measures revealed that Ock and the CF tone resulted in significantly higher DIs and modulation ranges than the Twitter call (P < 0.05); the remaining pairwise comparisons did not reveal significant differences. In summary, for both CF tones and vocalizations, a large proportion of cells were modulated by ILDs. ILD sensitivity depended on stimulus type: ILD-related modulation was strongest in response to the Ock call and the CF pure tone, in comparison to the Tsik and Twitter calls.

## Characterization of ILD Sensitivity Functions

For cells that were significantly ILD-sensitive (p < 0.005, ANOVA) we conducted analyses to characterize the sensitivity function in terms of the ILD location of the midpoint of the ILD sensitivity function and the slope of the ILD sensitivity function.

#### Lateral Biases in ILD-Sensitivity Functions

A key parameter for characterizing the ILD sensitivity function is the midpoint, which corresponds to the point of inflection of monotonic functions (Equation 1), or the peak of Gaussian functions (Equation 2); in both cases, this is represented by parameter d0. For cells that are best described by a monotonic ILD-response function, this parameter indicates the point around which the neuronal responses are most sensitive to small changes in ILD. For the monotonic functions, we wanted to represent the midpoint ILD relative to the preferred side (the set of ILDs at which firing rate was maximum), independent of whether that corresponded to ILDs favoring the contralateral or ipsilateral ear. This was achieved by multiplying d0 by −1 for every monotonic cell that had preferred ILDs favoring the contralateral ear (parameter A < 0). As a result, all positive values in **Figure 6**, Columns A–C, indicate midpoint ILDs that are closer to the preferred set of ILDs, and conversely, negative values indicate midpoint ILDs closer to the non-preferred set of ILDs.
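The sign convention for midpoints can be expressed in a few lines. The sketch follows the rule stated above (d0 is multiplied by −1 when A < 0); the function name is hypothetical.

```python
def midpoint_relative_to_preferred(d0, A):
    """Re-express the sigmoid midpoint relative to the preferred side.

    d0 and A are the fitted parameters of Equation (1). Cells with A < 0
    prefer ILDs favoring the contralateral ear (negative ILDs), so their
    midpoints are sign-flipped; positive results then always mean that the
    midpoint lies toward the preferred set of ILDs.
    """
    return -d0 if A < 0 else d0
```

For example, a contralateral-preferring cell (A < 0) with a fitted midpoint of −8 dB is plotted at +8 dB, while an ipsilateral-preferring cell (A > 0) with a midpoint of +6 dB is left unchanged.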

**Figure 6** Column A shows midpoint data at only the ABLopt, for all cells with significant monotonic ILD sensitivity functions. Each cell accounts for only one data point in the plots of **Figure 6** Column A. For all stimuli, the midpoint ILDs of the monotonic sensitivity functions were generally closer to the preferred side, as indicated by the skew toward positive values in **Figure 6A**. These distributions were significantly different from zero (all 4 stimuli: p < 0.0001, one sample t-test; **Figure 6A**).

In **Figure 6** Columns B and C, we separated the monotonically sensitive responses into those that preferred ILDs favoring the ipsilateral ear (B) or the contralateral ear (C). We now included in the analysis data for significant ILD-sensitive functions at nonoptimal ABLs; hence in these plots, each cell could potentially account for three data points (one at each ABL). This analysis was performed separately according to laterality of the ILD sensitivity curve, given that pooling data could result in a systematic bias in favor of the more numerous type (the contralateral-preferring) of cells. The results confirm a previously observed bias in the pooled data toward the preferred ear, with only one exception: Ock ipsilateral (p = 0.22) where the distribution was not significantly shifted away from zero. The means of all other distributions differed significantly from zero (CF tone, ipsilateral: p < 0.0001, contralateral: p < 0.0001; Ock, contralateral: p < 0.0001; Tsik, ipsilateral p = 0.0003, contralateral: p < 0.0001; Twitter, ipsilateral: p = 0.0006, contralateral: p < 0.0001; one-sample t-test in all cases). Moreover, all distributions that were significantly different from zero had mean midpoint ILDs favoring the preferred ear, i.e., for cells that preferred the contralateral ear, the midpoints were also closer to the ILDs favoring the contralateral ear, and for cells which preferred the ipsilateral ear, the midpoints were also biased to ILDs favoring the ipsilateral ear.

We also investigated whether the position of peaks in peaked cells favored either ear. **Figure 6** Column D shows data for cells with peak sensitivity at the ABLopt. This analysis was simply performed relative to recording hemisphere: negative values represent peak-response ILDs favoring the contralateral ear and positive values represent peak-response ILDs favoring the ipsilateral ear. The distributions of peaks for the Tsik and Twitter calls were significantly biased to ILDs favoring the contralateral ear (p = 0.005 and p = 0.04 respectively, one sample t-test). This effect was not seen for the Ock or CF tone stimuli (p > 0.05), which showed no significant bias. However, interpretation of this result must be tempered by the small number of A1 cells with peaked ILD-response for Tsik (7 units) and Twitter (7 units) calls. When we analyzed all conditions that elicited significantly peaked tuning, regardless of level (**Figure 6**, Column E), we also found that the distributions for all stimuli were not different from zero (p > 0.05). In summary, laterality effects were generally absent among the populations of cells for which ILD sensitivity was best described by peaked functions; i.e., peaked ILD functions tended to have their peaks at zero ILD which corresponds to the midline in azimuth.

## Slope of ILD Sensitivity Functions

The slope of ILD sensitivity functions of A1 cells was quantified for both monotonic functions (Equation 1) and peaked functions (Equation 2) by the parameter σ, which has been commonly used to identify relative rates of change in sensory neuroscience for Gaussian and monotonic response functions (e.g., DeAngelis and Uka, 2003; Lui et al., 2006). Note that σ values are comparable between models/equations, with a given σ value reflecting a similar relative gradient. Indeed, when we examined the fitted σ values of both peaked and monotonically sensitive cells (at ABLopt; **Figure 7** Column A), no difference was found between the slopes for the two types of ILD sensitivity function for any of the stimulus types (p > 0.05, Wilcoxon rank-sum test). Median σ values were between 7.2 and 10.2, and there was no significant difference between responses to different stimuli (p = 0.22, Kruskal–Wallis test, used because of the non-normal distributions; **Figure 7** Column A). In real terms, the median σ values of 7.2 and 10.2 corresponded to more than 80% of the cell's dynamic range spanning approximately 13.1 and 18.5 dB, respectively, in ILD. When the analysis was expanded to also include responses at non-optimal ABLs (**Figure 7** Column B), the same conclusion was reached (p > 0.05, Wilcoxon rank-sum test).

preferring groups, respectively, and all significant responses are included; in this instance, there could be up to three data points from each unit if

indicate the mean, \**p* < 0.05, \*\**p* < 0.001. Abbreviations: contra, contralateral; ipsi, ipsilateral; pref, preferred; non-pref, non-preferred.
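The correspondence reported in this section between the median σ values and the ILD span covering about 80% of a cell's dynamic range can be checked against Equation (1). The sketch below is one reading that approximately reproduces the reported figures: it assumes the central 80% of the sigmoid's range, i.e., the ILDs over which the erf term runs from −0.8 to +0.8. The text does not state how the spans were computed, so this derivation and the function names are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def erf_inv(y):
    """Inverse error function, using erf(x) = 2*Phi(x*sqrt(2)) - 1."""
    return NormalDist().inv_cdf((y + 1.0) / 2.0) / sqrt(2.0)


def central_span_db(sigma, fraction=0.8):
    """ILD span (dB) over which Equation (1) covers the central `fraction`
    of its output range, i.e., erf((d - d0)/sigma) in [-fraction, +fraction]."""
    return 2.0 * sigma * erf_inv(fraction)


span_low = central_span_db(7.2)    # median sigma at the lower end
span_high = central_span_db(10.2)  # median sigma at the upper end
```

Under this assumption the span is about 1.81 × σ, which gives values close to the approximately 13.1 and 18.5 dB quoted for σ = 7.2 and σ = 10.2.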

#### Changes in Sensitivity Type with Changes in ABL

Our initial analysis (**Figure 4**) demonstrated that the sensitivity to ILD of A1 neurons depends on stimulus level (ABL). To explore this relationship in greater detail at the level of single cell responses, we analyzed all ILD-sensitive responses as determined by Two-Way ANOVA (see Experimental Procedures). For this analysis, it was also necessary for ILD-sensitive units to be significantly sensitive to ILD at their ABLopt (One-Way ANOVA); therefore, we adopted the additional criterion that the unit had to be significantly sensitive to ILD for at least one of the three test ABLs in order to be considered as ILD-sensitive. With this additional criterion, several units (3 units for the pure tone, 2 for Ock, 2 for Tsik and 9 for Twitter; not necessarily the same units) were excluded from the ILD-sensitive group and not included in this analysis. Note that this discrepancy is possible because of the increase in the number of trials when data from all ABLs were considered, together with the more stringent criterion applied to the One-Way ANOVA to correct for multiple comparisons. For cells which were significantly sensitive at only one ABL, that ABL was considered to be ABLopt; for units which were sensitive to ILD at multiple ABLs, the ABL that yielded the best DI was considered the ABLopt. Units that were not significantly selective for ILD were not included in this analysis. The total number of units with preference at each ABL for each stimulus type is shown in **Figure 10**.

responses are shown, so there could potentially be three data points from each cell. Shading indicates the type of ILD-response function. Arrows indicate the medians of the distributions. No significant differences were found.

For a neuron to code for ILD independent of ABL, a major criterion is that it must maintain its sensitivity type (monotonic or peaked) across changes in ABL. This stability of ILD sensitivity type with changes in ABL was observed in 30–50% of cells when ABL shifted by 20–40 dB from ABLopt (**Figure 8**). Although this is not the only criterion that needs to be fulfilled for the ability to code ILD independent of ABL (as we will address below), as a first pass we classed such cells as having "level-invariant" sensitivity type (black bars). The remaining cells were considered to have level-variant sensitivity (gray and white bars). Of the latter group, the majority (68%) of ABL-variant responses became insensitive for ILD when ABL was shifted away from optimal (white bars in **Figure 8**; example cell shown in **Figure 2**, Cell B, pure tone), while the remainder of the level-variant responses showed a change in ILD sensitivity function type when the ABL was shifted to a non-optimal level 20 dB away from ABLopt (gray bars in **Figure 8**; example cell in **Figure 2**, Cell C, pure tone). It could be argued that neurons with different ILD-response functions at different ABLs may still prefer the same "side" of space. However, despite such an overall preference for the same "side" of space, the different ILD sensitivity function type indicates that there will be ILDs for which these sensitivity functions may give quite different responses. In population read-outs, especially for accuracy tasks (i.e., where in space is this sound coming from?), different shapes of ILD-response functions will contribute differently to the overall read-out (see Ma et al., 2006) and this will confuse the read-out. Thus, we argue that it is important to restrict the classification of level-invariant cells only to those with the same ILD-sensitivity function, and not to broadly include even those with changes in ILD-sensitivity function but maintained preference for the same "side" of space.

FIGURE 8 | Distribution of level invariance in ILD sensitivity with respect to stimulus type and ABL. Sub-plots in each row illustrate data from different stimuli; left and right bar groups indicate the ABL difference from ABLopt. Shading indicates the percentage of each cell type as indicated by the legend. Numbers in brackets indicate the number of cells included for each stimulus.
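The three-way classification of level invariance used in **Figure 8** can be sketched as follows. The inputs are the sensitivity type at ABLopt and at a non-optimal ABL; the labels and the function name are hypothetical. Treating contralateral- and ipsilateral-preferring monotonic functions as the same "monotonic" type follows the two types named in the text (monotonic or peaked), but it is an assumption.

```python
def level_invariance(type_at_opt, type_at_other):
    """Classify one cell/stimulus at a non-optimal ABL.

    Types are 'monotonic' or 'peaked'; None means the response was not
    significantly ILD-sensitive at that ABL.
    """
    if type_at_other is None:
        return "variant: insensitive"    # white bars in Figure 8
    if type_at_other == type_at_opt:
        return "invariant"               # black bars in Figure 8
    return "variant: type changed"       # gray bars in Figure 8
```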

Level invariance depended on stimulus [χ²(6) = 21.92, p = 0.0013; **Figure 8**]: more cells were level-invariant to the pure tone and the Ock call compared to the Tsik and Twitter calls. There was also a change in the distribution of level invariance when ABL was shifted by 20 dB from ABLopt vs. when it was shifted by 40 dB from ABLopt (**Figure 8**, left vs. right). In the latter case, a smaller proportion of cells remained level-invariant [χ²(2) = 12.8, p = 0.002]. These results were also reflected in the DIs: for the pure tone and Ock, the reduction in DI as ABL shifted away from ABLopt was less than for Tsik or Twitter [**Figure 9**; main effect for stimulus type F(3, 1) = 3.28, p = 0.02]. Predictably, the reduction in DI was also more substantial when the ABL was further from ABLopt [**Figure 9**; main effect for ABL F(3, 1) = 3.28, p < 0.00001]; this result was consistent for all stimuli as no interaction effect was found [F(3, 1) = 0.28, p > 0.05].
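As a reading aid, the two χ² results can be checked by hand. The sketch below is not the authors' analysis code. It assumes, from the reported degrees of freedom and the three response categories in Figure 8, that the tests were run on a 4 (stimuli) × 3 (categories) table and a 2 (ABL shifts) × 3 (categories) table; that layout is an inference, not something stated in the text. The function names are hypothetical, and the p-value uses the closed-form upper tail of the χ² distribution, which holds only for even degrees of freedom.

```python
import math


def contingency_dof(n_rows, n_cols):
    """Degrees of freedom of a chi-square test on an n_rows x n_cols table."""
    return (n_rows - 1) * (n_cols - 1)


def chi2_upper_tail_even_dof(statistic, dof):
    """Upper-tail probability of a chi-square statistic for even dof.

    For dof = 2m the tail is exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    """
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form needs a positive, even dof")
    half = statistic / 2.0
    terms = (half ** i / math.factorial(i) for i in range(dof // 2))
    return math.exp(-half) * sum(terms)


# Assumed table layouts (inferred from the reported degrees of freedom).
dof_stimulus = contingency_dof(4, 3)  # 4 stimuli x 3 response categories
dof_shift = contingency_dof(2, 3)     # 2 ABL shifts x 3 response categories

# Statistics as reported in the text.
p_stimulus = chi2_upper_tail_even_dof(21.92, dof_stimulus)
p_shift = chi2_upper_tail_even_dof(12.8, dof_shift)

print(f"stimulus effect: dof = {dof_stimulus}, p = {p_stimulus:.4f}")
print(f"ABL-shift effect: dof = {dof_shift}, p = {p_shift:.4f}")
```

Under these assumptions the recomputed probabilities round to the reported values (0.0013 and 0.002), which suggests the inferred table layouts are at least consistent with the reported statistics.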

In summary, 30–50% of cells maintained the same sensitivity type (i.e., were level-invariant) when ABL shifted from its optimal level. The percentage of such level-invariant cells and the change in DI depended on the stimulus type and on the difference in ABL from ABLopt. In the following sections, we describe changes in response properties that account for these level-dependent changes.

## Level-Dependent Changes in ILD-Response Functions

Consistent with the loss of sensitivity to ILD at non-optimal ABLs, a cell's modulation range was the most reliable indicator of the ABLopt for each stimulus. Predictably, as this measure was indirectly used to calculate ABLopt, the modulation range was smaller for non-optimal ABLs in comparison to ABLopt for all stimulus types (repeated measures t-test: p < 0.00001 for all four stimuli). This effect is illustrated in Columns A–C of **Figure 10**, which plots the modulation range for the ABLopt vs. that for the non-optimal ABL. Note that for all stimuli, and for all ABLopt levels (Column A: ABLopt = 30 dB; Column B: ABLopt = 50 dB; Column C: ABLopt = 70 dB), the majority of data points fall below the line of equality, indicating lower dynamic ranges for the non-optimal ABL than the ABLopt (see **Figure 10**, Column D for summary). As the DI, which was used to calculate ABLopt, is heavily dependent on the modulation range, the modulation range will be greater for ABLopt in comparison to non-optimal ABLs. What is important here is that shifting the ABL from its optimal level by 20 dB will on average reduce the modulation range by approximately 40%.

illustrated for each of the four stimuli separately; bars indicate means and error bars are SEM. Color code as per the legend indicates the absolute difference of ABL from ABLopt. Numbers in brackets indicate the number of cells included for each stimulus.

To determine the ability of a cell to code for ILD independent of ABL, as well as maintaining sensitivity type (as noted above), the position and rate of change of the ILD-response should also be independent of ABL. We restrict analysis to level-invariant cells for three reasons: first, for level-variant cells, a change in sensitivity type (including loss of tuning) already indicates that the neuron cannot code for ILD in the same way for those two ABLs; secondly, parameters fitted to a function refer to specific aspects of the tuning curve for monotonic (Equation 1) and peaked (Equation 2) curves; and lastly, it is not possible to reliably compare fitted parameters when a cell has lost its sensitivity at a non-optimal ABL. The two parameters examined here, σ and d0, were not involved in the calculation of ABLopt; therefore, potential differences found here can be attributed to the behavior of the neurons, and not to the method of analysis. We first evaluated the consistency of ILD-response function positions when ABL was varied, by comparing midpoints (parameter d0) between ILD-response functions from different ABLs (for level-invariant responses only). These data are illustrated in Column A of **Figure 11**. If the position of ILD-response functions was maintained across intensities, positive correlations would be evident in these data. Significant correlations were present but sporadic, being manifest at 20 dB from ABLopt (circles) only for the pure tone and the Twitter call, and at 40 dB from ABLopt (crosses) for the Ock call (as denoted by ∗ beside the r-value in the legend). Given that the majority of the cells were sensitive to ILD in a monotonic way (see **Figure 3**), we tested the hypothesis that there was a systematic shift in the midpoint, either toward or away from the preferred ear, when ABL was increased by 20 dB (regardless of which ABL was optimal).
While shifts in either direction were observed for individual neurons, on the population level, these shifts were not significant in one direction or the other for any test stimulus (**Figure 11** Column B; CF tone: p = 0.46; Ock call: p = 0.11; Tsik call: p = 0.66; Twitter call: p = 0.52; repeated measures t-test in all cases). We also tested the hypothesis that there was a systematic shift in midpoint, either toward or away from the preferred ear, when ABL was changed away from ABLopt by 20 dB. Again, we did not observe any systematic changes (**Figure 11** Column C; CF tone: p = 0.13; Ock call: p = 0.21; Tsik call: p = 0.62; Twitter call: p = 0.24; repeated measures t-test in all cases).
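The distinction drawn here, between shifts in individual cells and the absence of a net population shift, can be illustrated with a paired (repeated measures) t statistic. The following is a minimal sketch and not the authors' analysis code: the function name is hypothetical, and the midpoint values are invented for illustration only and are not data from this study.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Return (t, degrees_of_freedom) for the paired differences second - first."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [b - a for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical midpoints d0 (dB ILD) for five cells at two ABLs 20 dB apart.
midpoints_low_abl = [4.0, 8.0, -2.0, 10.0, 6.0]
midpoints_high_abl = [5.0, 6.0, -1.0, 11.0, 5.0]

t_value, dof = paired_t_statistic(midpoints_low_abl, midpoints_high_abl)
print(f"t({dof}) = {t_value:.2f}")
```

In this toy example every cell's midpoint moves by 1 or 2 dB, yet the shifts toward and away from the preferred ear cancel, so the mean difference and therefore the t statistic are zero. This is the pattern the population analysis describes.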

We also compared the rate of change in firing rates with ILD for level-invariant cells and found that the slope of the ILD-response functions did not change when stimulus ABL increased, as measured by comparing the fitted parameter σ for level-invariant responses at multiple ABLs. The results of this analysis are shown in Column A of **Figure 12**. While changes in slope were observed for individual cells, no significant changes with respect to increasing ABL were found for any of our stimuli when considered across the population (p > 0.05; Wilcoxon rank-sum test due to non-normality). We also assessed whether the slope of the ILD-response curves changed when ABL was shifted away from optimal (**Figure 12** Column B). No such changes were evident across our entire stimulus set (p > 0.05; Wilcoxon rank-sum test).

In summary, we found that the changes in ILD-response properties with respect to ABL were mostly accounted for by the reduction in modulation range (of spike rates) when the ABL was non-optimal. The midpoints of the ILD-response functions were maintained for level-invariant cells for some, but not all, stimulus types. Finally, as a population, there were no systematic shifts in midpoints or slopes of ILD-response functions, when the ABL of the stimulus changed.

## Discussion

This study provides new information regarding ILD sensitivity of vocalization responses of A1 neurons in the marmoset, and extends previous findings of ILD tuning to CF tones. Single cell ILD responses in A1 to CF pure tones and calls were similar in that sensitivity was heavily dependent on stimulus level (ABL), and the distributions of response types (contralateral/ipsilateral monotonic or peaked) were similar. However, on a population level, more cells were responsive to the appropriate CF pure tone and to the Ock call than to the Tsik or Twitter calls, and most cells showed greater response sensitivity (e.g., greater modulation depth of responses) to the CF pure tones and the Ock call than to the Tsik or Twitter calls.

The majority of the ILD-sensitive cells had monotonic response functions with a strong bias toward the side from which stronger responses were elicited. By definition, this was obviously true of the ILDs that elicited the maximum firing rate in each cell, but we also found this bias in the midpoints of the ILD-response functions, which also tended to be displaced toward the ear eliciting the stronger responses (regardless of whether this was the contralateral or ipsilateral ear). For cells that had peaked ILD-response functions, evidence for lateralization was very weak. Regardless of the type of ILD-response function, a wide range of slopes was observed, with the dynamic range of neurons typically spanning 12–20 dB.

We also found that, for all stimuli, ILD sensitivity was heavily dependent on sound level, with a decrease in the number of neurons that exhibited the same form of ILD sensitivity when the ABL was shifted away from ABLopt, as well as a reduction in modulation range and DI. However, this also depended on the stimulus, and ILD-sensitive responses to the pure tone and the Ock call appear to be more robust to changes in ABL than those to the Tsik or the Twitter call. Finally, we found no systematic change in the position of the midpoints or slope of the ILD-response functions to any stimulus, with changes in stimulus level.

## Broad ILD-Response Functions of A1 Neurons Suggest a Distributed Population Code for Auditory Space

Typically, spatial receptive fields in A1 span tens of degrees along the azimuth, and are confined to one hemifield or the other, with the majority of cells showing sensitivity to the contralateral hemifield (Brugge et al., 1996; Reale et al., 2003; Zhou and Wang, 2012). Our finding that most cells displayed monotonic sensitivity for ILDs that favor contralateral azimuths is consistent with this pattern, with the important caveat that ILDs do not map linearly to varying spatial locations (e.g., Martin and Webster, 1989; Middlebrooks et al., 1989; Slee and Young, 2010). While we only used the ILD cue, our results are generally compatible with those using free-field sound sources (e.g., Middlebrooks et al., 1998; Woods et al., 2006), and particularly with those in the studies of Zhou and Wang (2012, 2014), which were also performed in the marmoset. At the very least, this would suggest that sound localization information is available in A1 neurons upon the presentation of stimuli with ILD without ITD.

Large receptive fields that favor either hemifield may appear to conflict with behavioral results in both monkeys and humans, in which the lowest thresholds are observed near the midline (e.g., Recanzone and Beckerman, 2004). Furthermore, localization accuracy using broad-band noise and pure tones is substantially better than the width of typical A1 spatial receptive fields (Makous and Middlebrooks, 1990; Nelken et al., 2008). These findings can be reconciled by the notion of distributed coding, where a large number of broadly sensitive units combine to give information about space, rather than a point code where a small number of active neurons with small precise receptive fields for different spatial locations yield sufficient information about space, as reported in primary visual cortex (Middlebrooks et al., 1998; Stecker and Middlebrooks, 2003; Stecker et al., 2005). The concept of distributed coding of auditory space localization in A1 (e.g., Jenkins and Merzenich, 1984) is supported by the ILD-response function widths. In the marmoset, ILDs at medium to high frequencies range from approximately +25 dB to −25 dB (mainly from +20 dB to −20 dB) across the head (Slee and Young, 2010). The modulation range of a typical A1 cell in our sample covers approximately one third of this range, and therefore the cell can be considered relatively broadly tuned. However, when population responses are taken into account, broad ILD responses will increase the number of cells that participate in the computation, thereby improving neural decoding by averaging out noise between more cells (see Bejjanki et al., 2011).

In distributed population codes, it has been shown both computationally (Ma et al., 2006; Law and Gold, 2009) and experimentally (Purushothaman and Bradley, 2005; Jazayeri and Movshon, 2006) that the steepest portion of the response function, if given the right situation, is the most influential with respect to perception. This portion of the ILD-response function is represented by the midpoint (parameter d0) in our data. For auditory spatial sensitivity, this most informative part of the response function for the majority of cells has also been shown to be located close to the midline (Stecker et al., 2005; Campbell et al., 2006). However, equal distribution of midpoints around the midline could not account for observed deficits after unilateral lesions, which are confined mostly to the contralateral space (e.g., Jenkins and Masterton, 1982; Jenkins and Merzenich, 1984; Kavanagh and Kelly, 1987; Bizley et al., 2007). We found that the midpoints of monotonic ILD-response functions were not evenly distributed around the center (0 ILD), but were shifted toward the preferred side of space. This suggests that the "center of mass" of information across the population of A1 neurons will favor the contralateral side, which could better explain the lateralization deficits observed following unilateral A1 lesions. Even so, given that (a) ILD-response functions are broad, (b) the center of mass of d0 is only on average 6 dB in ILD away from equality, (c) both contralateral and ipsilateral-preferring cells are present in each hemisphere, and (d) cells from both hemispheres contribute to sound localization, our results are also compatible with the fact that the best behavioral thresholds are found around the midline.
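To make concrete why the midpoint marks the steepest, and hence most informative, part of a monotonic response function, the sketch below evaluates a generic logistic curve. This is an assumed stand-in: the paper's Equation 1 is not reproduced in this section and may be parameterized differently, so the function name and the `width_db` parameter (a hypothetical analog of the fitted slope parameter) are illustrative only. The 6 dB midpoint echoes the average displacement quoted above; the width value is arbitrary.

```python
import math


def monotonic_ild_response(ild_db, midpoint_db, width_db, r_min=0.0, r_max=1.0):
    """Generic logistic ILD-response curve (an assumed form, not Equation 1)."""
    return r_min + (r_max - r_min) / (1.0 + math.exp(-(ild_db - midpoint_db) / width_db))


def local_slope(ild_db, midpoint_db, width_db, step=1e-3):
    """Central-difference estimate of the change in response per dB of ILD."""
    upper = monotonic_ild_response(ild_db + step, midpoint_db, width_db)
    lower = monotonic_ild_response(ild_db - step, midpoint_db, width_db)
    return (upper - lower) / (2.0 * step)


midpoint_db = 6.0  # illustrative: the average displacement quoted in the text
width_db = 4.0     # hypothetical value chosen only for this example

# Scan ILDs from -25 dB to +25 dB in 0.5 dB steps and find the steepest point.
ilds = [0.5 * k for k in range(-50, 51)]
steepest_ild = max(ilds, key=lambda ild: local_slope(ild, midpoint_db, width_db))
peak_slope = local_slope(midpoint_db, midpoint_db, width_db)

print(f"steepest ILD: {steepest_ild} dB, slope there: {peak_slope:.4f} per dB")
```

For this assumed form the slope peaks exactly at the midpoint and equals (r_max − r_min)/(4 × width), so a small change in ILD near the midpoint produces the largest change in firing rate.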

## ILD Responses Depend on ABL for Complex Naturalistic Stimuli

Our results support and extend those obtained by Zhou and Wang (2012) in the marmoset, who reported changes in spatial receptive fields when the sound level of broadband noise stimuli (in their study) was varied in free-field conditions. The pattern of ILD sensitivity to CF tones we observed in marmoset A1 was similar to that described in A1 neurons from other species (e.g., Semple and Kitzes, 1993a,b; Irvine et al., 1996; Grothe et al., 2010), where ILD sensitivity was also dependent on sound level. Our data extend these findings into the domain of natural stimuli (vocalizations). By using these stimuli, we were able to demonstrate that the extent to which ILD sensitivity changes with ABL depends on the stimulus. ILD sensitivity to the pure tone or Ock call was more robust to changes in ABL than that to the Tsik or the Twitter call. Interestingly, this result correlated with the overall ILD sensitivity of the different stimuli; we stress here that this result is unlikely to be a consequence of our analysis, as we analyzed the difference in DI from that of the ABLopt, which would account for any differences in overall sensitivity (**Figure 8**). The stimulus-dependent ILD sensitivity can most likely, at least in part, be attributed to the spectral composition of the stimulus relative to the cell's preference; others indeed have found that spatial responses in auditory cortex depend on stimulus bandwidth (e.g., Eisenman, 1974; Rajan et al., 1990; Clarey et al., 1995), although these studies did not use naturalistic stimuli.

Marmoset vocalizations do not have a stereotypical template; factors like bandwidth, harmonic ratio and duration vary from animal to animal and even between calls of the same animal for a particular call type (Dimattina and Wang, 2006). We chose to use only one token per call for each experiment, as we wanted the variance in the neural responses to reflect neural noise rather than the variation in the stimulus. Moreover, while our stimuli were calls recorded from marmosets, we did not use control stimuli of comparable spectral and temporal complexity (e.g., Fukushima et al., 2014); it is therefore difficult to conclude that these ILD responses, or differences in ILD sensitivity, can be directly attributed to the identity of these calls. The results here should be interpreted as responses to complex naturalistic stimuli of varying bandwidths and complexities.

At first sight, the proportion of ILD-sensitive neurons in our sample (∼60 to ∼80%; **Table 1**) appears less than that reported in the cat (Semple and Kitzes, 1993a), where over 90% of A1 neurons are jointly influenced by ILD and ABL. However, this may not reflect a species difference, as the majority of ILD-insensitive cells in our sample were actually unresponsive to a particular stimulus. In the Semple and Kitzes studies (1993a,b) only pure tones were used as stimuli, and only cells responsive to pure tones were then tested for ILD sensitivity. In contrast, in our study all cells were tested for ILD sensitivity. We found that 78% of cells were ILD sensitive to pure tones, and if we exclude from our pure tone tally the nine cells that were insensitive to ILDs in pure tones (but were sensitive to ILDs in at least one call), the percentage of cells sensitive to ILDs in pure tones rose to 90%. Note that we do not discount other possible factors that would explain differences between our data and those in the cat, one of which is the difference in the anesthesia regime employed in each study. The barbiturate anesthesia used by Semple and Kitzes (1993a,b) potentiates inhibition, and may have increased the amount of non-monotonic type sensitivity, a property associated with cortical inhibition (Razak and Fuzessery, 2010; see also Rajan, 1998, 2001; Rajan et al., 2013 for discussion on the effects of sufentanil).

Several studies have found that spatial receptive fields broaden as sound levels increase (Brugge et al., 1996; Middlebrooks et al., 1998; Xu et al., 1998; Reale et al., 2003; Mrsic-Flogel et al., 2005). The expansion of spatial receptive fields could be reflected in ILD-response functions in two ways: the midpoint of monotonic ILD-response functions could shift away from the preferred side toward the non-preferred side, and/or the slope of the ILD-response function (σ) could decrease. While some individual cells exhibited this behavior, a similar number of cells also exhibited the opposite, which equated to no net change for the population. This result applied for CF tones and all calls, when ABL increased. One trivial explanation for the observed pattern could be that only cells which maintained sensitivity at multiple ABLs were included in our analysis, whereas the response curves broaden when cells become insensitive at non-optimal intensities. However, other investigators have reported compatible results: while spatial receptive field sizes may change with increasing level, the positive and negative changes offset each other, which equated to no significant changes in receptive field size (Mickey and Middlebrooks, 2003; Woods et al., 2006; Zhou and Wang, 2012). Although the latter studies were conducted in awake animals, we have shown that our opiate anesthetic regime yields auditory cortical recordings that are more comparable to those described for awake animals, at least in early hierarchical stages of processing such as A1 (Rajan et al., 2013). Thus, our results suggest that while ILD information is carried by different groups of cells across different intensities, neither receptive field size nor the distribution of midpoints changes with stimulus level, at least for the cell population as a whole. This pattern could serve to simplify a putative level-invariant read-out strategy.

On the population level, we also observed a disproportionate number of ILD-sensitive cells which had ABLopt at higher levels (**Table 1**); interestingly, this was only observed for the calls, and was particularly pronounced for the Tsik and Twitter calls. Considering also that these two calls had a greater reduction in DI when ABL was shifted away from ABLopt, this suggests that sources of the Tsik and Twitter calls can be more easily localized at higher intensities. This hypothesis, as far as we are aware, has never been tested.

## Level-Invariant Representation of Space in A1

To understand the generation of level-invariant representations of auditory space, knowledge of the responses of single cells is critical, as population averages usually have broader sensitivity than the sum of the individual units, and neuronal read-outs leading to behavior do not necessarily follow the average of all cells. In addition to our investigation of responses to ILDs in vocalizations, we have specifically addressed the issue of the effects of shifting the sound level away from optimal on the ILD-response properties of individual neurons in A1. The principal result of such a change is that decreased ILD sensitivity is largely accounted for by modulation of spike rates; while the shape (σ) and position in ILD space (d0) for individual neurons may vary between ABLs, these as a population do not change systematically. These effects occurred for both vocalizations and CF pure tone stimuli.

Even for the most reliably level-invariant cells (e.g., **Figure 2A**, Ock), the sound level (and the stimulus type) has to be known before one can reliably infer ILD from neuronal response. Therefore, one assumes that for a system to decode space from the ILD responses of a population of A1 cells, it has to know the level of the stimulus a priori. Indeed, this would be the most accurate way to determine ILD: the "weights" to be given to the information carried by each neuron contributing to the overall perceptual decision can be assigned according to stimulus level. It has been suggested that neurons can change their input weights according to the reliability of evidence from different sensory channels (Fetsch et al., 2009), and a similar mechanism could function in this context. However, our data also suggest that a level-invariant read-out, in which all weights remain the same regardless of level, may be effective. For linear classifiers, adding cells that contain no information will not degrade the system's performance; thus, level-variant cells that become insensitive to ILD at non-optimal levels would not preclude level-invariant read-outs. It remains to be seen whether a large population of A1 neurons could represent ILD in a level-invariant way, either through simulation of multiple realistic single neurons (e.g., Ma et al., 2006), or via multiple neurons recorded simultaneously, which takes into account real correlations between neurons (e.g., Graf et al., 2011). Considering that we observed no population shifts in the distribution of midpoints (d0 parameter) and steepness of ILD-response functions with respect to ABL, in theory this increases the chances of a level-invariant coding strategy being successful.

It has previously been thought that the spike rates of cortical neurons are insufficient to account for behavioral performance, with spike patterns consistently carrying more information than spike counts alone (Middlebrooks et al., 1994, 1998). More recently, it has been suggested that population spike counts of cortical neurons using an "opponent-channel" strategy can perform relatively well (Stecker et al., 2005; Miller and Recanzone, 2009). The spike counts of A1 units sampled in the present study carry more information regarding space than those investigated by Middlebrooks et al. (1994), with the positive correlation between positions of ILD-response functions (midpoints) for different levels potentially facilitating level-invariant decoding.

## Conclusions

Our study extends previous findings of ILD sensitivity of A1 neurons in response to pure tones to encompass natural marmoset vocalizations. While similar types of ILD-response functions were found for each stimulus, A1 cells were more sensitive to ILD for the Ock vocalization and the CF pure tone in comparison to the Tsik and Twitter calls. ILD sensitivity of A1 neurons was dependent on ABL; the extent to which this occurred was dependent on stimulus type, reiterating that A1 responses to complex sounds cannot always be predicted from responses to pure tones. Altogether, our results suggest that a large number of A1 neurons participate in sound localization in order to create a representation of space that is invariant to level.

## Acknowledgments

This work was funded by the National Health and Medical Research Council (NHMRC) project grants 545982 (RR and MR), 1029342 (RR and MR) and 1066232 (LL and RR). LL was funded by a Discovery Early Career Researcher Award from the Australian Research Council (ARC; DE130100493). Multiple authors were funded by the ARC Centre of Excellence for Integrative Brain Function (CE140100007). We thank Katrina Worthy for the histological work, and Rowan Tweedale and Tristan Chaplin for useful comments on the manuscript. We also thank Janssen-Cilag Pty (Australia) for the donation of sufentanil citrate, which made these experiments possible.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Lui, Mokri, Reser, Rosa and Rajan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Conjugating time and frequency: hemispheric specialization, acoustic uncertainty, and the mustached bat

Stuart D. Washington<sup>1,2,3</sup>\* and John S. Tillinghast<sup>4,5</sup>

*<sup>1</sup> Center for Functional and Molecular Imaging, Georgetown University Medical Center, Washington, DC, USA, <sup>2</sup> Department of Neurology, Georgetown University Medical Center, Washington, DC, USA, <sup>3</sup> Center for Neuroscience Research, Children's National Medical Center, Washington, DC, USA, <sup>4</sup> Department of Mathematics and Statistics, American University, Washington, DC, USA, <sup>5</sup> Department of Statistics, The George Washington University, Washington, DC, USA*

A prominent hypothesis of hemispheric specialization for human speech and music states that the left and right auditory cortices (ACs) are respectively specialized for precise calculation of two canonically-conjugate variables: time and frequency. This spectral-temporal asymmetry does not account for sex, brain-volume, or handedness, and is in opposition to closed-system hypotheses that restrict this asymmetry to humans. Mustached bats have smaller brains, but greater ethological pressures to develop such a spectral-temporal asymmetry, than humans. Using the Heisenberg-Gabor Limit (i.e., the mathematical basis of the spectral-temporal asymmetry) to frame mustached bat literature, we show that recent findings in bat AC (1) support the notion that hemispheric specialization for speech and music is based on hemispheric differences in temporal and spectral resolution, (2) discredit closed-system, handedness, and brain-volume theories, (3) underscore the importance of sex differences, and (4) provide new avenues for phonological research.

Keywords: acoustic uncertainty, echolocation, Heisenberg-Gabor limit, hemispheric specialization, mustached bats, music, speech, spectral-temporal

## Hemispheric Lateralization for Language: a Multi-Faceted Controversy

The finding that damage to the left cerebral hemisphere in humans impairs receptive language (e.g., speech perception) is seminal to the field of neuroscience (Wernicke, 1874). More precisely, damage to portions of the temporal lobe in a human's left cerebral hemisphere disrupts one's ability to comprehend vocalizations that symbolize objects, ideas, and meanings to oneself and other humans (e.g., words, phrases, and sentences). Comparable right hemispheric damage has fewer effects on receptive language but impairs musical processing (Milner, 1962; Samson and Zatorre, 1991; Zatorre et al., 1994) and pitch discrimination (Sidtis and Volpe, 1988; Robin et al., 1990; Zatorre et al., 1994) as well as the ability to identify a speaker and the prosody of his or her speech (Robinson and Fallside, 1991). Both classical and modern studies commonly show that left cerebral specialization for receptive language is less pronounced in human females than in conspecific males (Lansdell, 1964; McGlone, 1977; Shaywitz et al., 1995).

Neuroscientists have proposed numerous hypotheses to explain this asymmetry. An early hypothesis erroneously credited to Paul Broca relates both left lateralization of language function and, via decussation, right handedness to a general left hemispheric dominance common to most humans (Harris, 1991). This "Broca handedness rule" implies that most left-handed people display right hemispheric dominance for language, an assertion not validated by rigorous empirical studies (Knecht et al., 2000). However, since human tool usage is irrefutably the most advanced in the animal kingdom and is inextricably linked to handedness, the "Broca handedness rule" is appealing as it links left-lateralized cortical control of handedness to left-lateralized cortical control of speech and language. Another hypothesis proposes that in larger mammalian brains, such as those of humans, time-critical neuronal computations strain the capacity of the corpus callosum and would be performed more quickly by intrahemispheric circuits (Ringo et al., 1994). The brain-volume hypothesis implies that hemispheric specialization for communication sound processing would be greater in the left hemispheres of mammals with greater brain volumes than humans, such as proboscideans (e.g., elephants) and cetaceans (e.g., dolphins). These two hypotheses are generally respected as plausible explanations for language lateralization in humans and are not mutually exclusive.

#### Edited by:

*Yukiko Kikuchi, Newcastle University Medical School, UK*

#### Reviewed by:

*Khaleel A. Razak, University of California, Riverside, USA; Robert J. Zatorre, McGill University, Canada*

#### \*Correspondence:

*Stuart D. Washington, Department of Neurology, Georgetown University Medical Center, Room LM14, Preclinical Sciences Building, 3900 Reservoir Rd. NW, Washington, DC 20057, USA; sdw4@georgetown.edu*

#### Specialty section:

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience*

Received: *24 November 2014* Accepted: *07 April 2015* Published: *27 April 2015*

#### Citation:

*Washington SD and Tillinghast JS (2015) Conjugating time and frequency: hemispheric specialization, acoustic uncertainty, and the mustached bat. Front. Neurosci. 9:143. doi: 10.3389/fnins.2015.00143*

Two other hypotheses are mutually exclusive and have thus generated much debate in the last half century. The first of these is the "closed system" hypothesis, which argues that neural mechanisms underlying receptive auditory communication in humans (i.e., speech perception) are unique to humans, specific for speech and language, and are contained within a "speech organ" in the left hemisphere (Liberman and Mattingly, 1989). Specifically, advocates of the "closed system" hypothesis state that neurons comprising this uniquely human, left-lateralized "speech organ" exclusively process linguistically salient aspects of speech sounds, such as consonants and vowels, and relegate the processing of pitch, loudness, timbre, and location to other, less specialized auditory neural substrates.

The various "domain-general" hypotheses, which state that speech sounds are processed by the same neural substrates as all other sounds and describe language dominance via auditory signal processing, are irreconcilable with the "closed system" hypothesis. Though domain-general hypotheses differ, most implicate a fundamental principle of acoustics called "acoustic uncertainty" as the underlying, evolutionary force driving left hemispheric dominance for language (Zatorre et al., 2002). The acoustic uncertainty principle describes a trade-off between time and frequency such that, at the upper limit of resolution, increasing time (temporal) resolution can only be achieved at the expense of frequency (spectral) resolution and vice-versa. Domain-general theories emerged from decades of observations showing temporal domain processing deficits across multiple language disorders, including aphasia (Efron, 1963; Tallal and Piercy, 1973, 1975), dyslexia (Tallal, 1980; Tallal et al., 1993; Temple et al., 2000), and dysphasia (Tallal and Newcombe, 1978; Tallal et al., 1991, 1993). These language studies were followed by other studies showing either deficits in the spectral domain following right hemispheric lesions (Zatorre, 1985, 1988; Samson and Zatorre, 1988) or a double-dissociation for temporal or spectral domain processing following left or right hemispheric lesions, respectively (Robin et al., 1990). Advocates of domain-general hypotheses argue that both the left and right auditory cortices process speech and other sounds, but only the left auditory cortex has the temporal resolution necessary to differentiate consonant sounds by their formant transition rates and voice-onset-times. A lack of such temporal resolution, either by congenital defect or neurological damage, would render most consonant sounds indistinguishable and thus most spoken languages incomprehensible. 
In what would appear to be a classic example of "multiple independent discovery," two research groups proposed similar domain-general models based around hemispheric differences in spectral and temporal resolution at nearly the same time (Zatorre et al., 2002; Poeppel, 2003). To avoid favoring one set of terminology over the other, we will use the acronym Asymmetry for Spectral versus Temporal Integration and Resolution (ASTIR) as an umbrella term for hypotheses that explain auditory hemispheric specialization via a trade-off between acoustic spectral and temporal resolution (see Supplementary Section 1 for a more in-depth perspective). Each of the hypotheses conforming to ASTIR shares one polarizing implication: hemispheric specialization should emerge within the brains of any species whose survival hinges upon extracting refined temporal and spectral information from the auditory signals in its environment, regardless of brain volume or handedness.

Our aim here is to further validate either the closed-system or domain-general hypothesis of receptive auditory communication by exploring this last implication of ASTIR. We base our exploration of ASTIR around the functional organization of the mustached bat (Pteronotus parnellii) auditory cortex due to (1) the fact that individuals in this species primarily orient themselves and communicate with each other using complex auditory signals and (2) the vast and well-established literature describing the auditory cortical maps in this species.

## The Heisenberg-Gabor Limit: the Mathematical Basis for Acoustic Uncertainty

The Acoustic Uncertainty Principle is one of the many uncertainty principles common to physical sciences. Uncertainty principles are defined by mathematical inequalities that place a limit on the precision of simultaneous measurements of two canonically-conjugate variables (i.e., variables that are Fourier transformations of one another) (Joos, 1948). The Acoustic Uncertainty Principle states that frequency and time are canonically-conjugate variables of sound waves. The mathematical basis of acoustic uncertainty is the Heisenberg-Gabor Limit (Schuller and Batliner, 2014), a principle of signal processing that is applicable to all functions and which states:

$$
\Delta f \cdot \Delta t \geq \frac{1}{4\pi},
$$

where Δf is the standard deviation of frequency and Δt is the standard deviation of time from the peak intensity of the signal.

The Heisenberg-Gabor Limit demonstrates that simultaneous, precise measurements of a function in both the temporal (time) and spectral (frequency) domains are impossible, because refined temporal resolution only comes at the expense of spectral resolution and vice-versa, as shown in **Figure 1**. In terms of short-time Fourier transformations, a wide temporal window permits refined spectral resolution whereas a narrow temporal window permits refined temporal resolution. Such a multiwindow system by definition resigns itself to measuring spectral and temporal components of the same sound on different time scales. We mathematically articulate similarities between Heisenberg's Quantum Uncertainty Principle and the Acoustic Uncertainty Principle and further explore both concepts in Supplementary Section 1.
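As a numerical illustration of the limit, the following minimal Python sketch estimates the temporal and spectral standard deviations of two pulse envelopes and compares their product with 1/(4π). The function names, the envelope shapes, and the pulse widths are illustrative assumptions rather than values taken from the studies cited here. The spectral spread is obtained from the time-domain derivative via Parseval's theorem rather than from an explicit Fourier transform.

```python
import math

GABOR_LIMIT = 1.0 / (4.0 * math.pi)  # lower bound on std_t * std_f


def spreads(envelope, t_min, t_max, n=20001):
    """Return (std_t, std_f) for a real-valued envelope s(t) on [t_min, t_max].

    std_t is the standard deviation of time weighted by |s(t)|^2 (seconds).
    std_f is the standard deviation of frequency weighted by |S(f)|^2 (Hz).
    """
    dt = (t_max - t_min) / (n - 1)
    ts = [t_min + i * dt for i in range(n)]
    s = [envelope(t) for t in ts]
    energy = sum(v * v for v in s) * dt
    mean_t = sum(t * v * v for t, v in zip(ts, s)) * dt / energy
    var_t = sum((t - mean_t) ** 2 * v * v for t, v in zip(ts, s)) * dt / energy
    # A real envelope has a spectrum symmetric about 0 Hz, so the mean
    # frequency is 0. Parseval's theorem then gives
    #   integral f^2 |S(f)|^2 df = (1 / (4 pi^2)) * integral |s'(t)|^2 dt,
    # and s'(t) is approximated here by central differences.
    ds = [(s[i + 1] - s[i - 1]) / (2.0 * dt) for i in range(1, n - 1)]
    var_f = sum(d * d for d in ds) * dt / (4.0 * math.pi ** 2 * energy)
    return math.sqrt(var_t), math.sqrt(var_f)


def gaussian(sigma_t):
    """Gaussian envelope whose energy |s(t)|^2 has standard deviation sigma_t."""
    return lambda t: math.exp(-t * t / (4.0 * sigma_t ** 2))


def hann(width):
    """Raised-cosine (Hann) envelope of total duration `width`, zero outside."""
    return lambda t: math.cos(math.pi * t / width) ** 2 if abs(t) <= width / 2 else 0.0


# Hypothetical pulse widths (seconds), chosen only for illustration.
sigma = 0.005
std_t_g, std_f_g = spreads(gaussian(sigma), -10 * sigma, 10 * sigma)
std_t_h, std_f_h = spreads(hann(0.020), -0.010, 0.010)

print(f"limit    : {GABOR_LIMIT:.5f}")
print(f"Gaussian : {std_t_g * std_f_g:.5f}")
print(f"Hann     : {std_t_h * std_f_h:.5f}")
```

In this sketch the Gaussian envelope should sit at the bound to within numerical error, since a Gaussian is the equality case of the inequality, whereas the Hann envelope should exceed it slightly. Narrowing either envelope in time lowers its temporal spread and raises its spectral spread by the same factor, leaving the product unchanged.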

Independent groups of researchers postulated that the specialization for speech and music characteristic of the left and right hemispheres of humans, respectively, stems from the use of narrow temporal windows by left auditory cortex and wide temporal windows by right auditory cortex (Zatorre et al., 2002; Poeppel, 2003). This cortical asymmetry may stem from a right ear advantage for temporal information and a left ear advantage for spectral information that is evident at the level of the cochlea (Sininger and Cone-Wesson, 2004), a natural spectrum analyzer also subject to the Heisenberg-Gabor Limit. The open-ended mathematical nature of ASTIR suggests that such hemispheric specialization would develop in any mammalian species whose survival hinges upon the extraction of precise spectral and temporal information from sounds. For instance, acquisition of both refined temporal and spectral information is key to the survival of mustached bats.

## Echolocation in Mustached Bats: New Perspectives on a Classic Model

The behavior of mustached bats and the functional organization of their auditory cortices have been explored primarily from the perspective of echolocation, the method by which micro-bats (microchiroptera) generate sonar signals to orient themselves and hunt insects (Suga, 1985). During echolocation, mustached bats emit sounds that are composed of a constant frequency (CF) and downward frequency modulation (FM) and the three harmonics thereof. These four signals (fundamental + 3 harmonics) are labeled H1−4, where the fundamental and each harmonic are composed of CF (CF1−4) and FM (FM1−4) components (**Figure 2**). When flying toward a stationary target, a bat detects both its pulse (i.e., emitted) and echo (i.e., returning) signals, the latter of which has been Doppler-shifted upward in frequency relative to the former. Any possibility of masking by temporal overlap between the pulse and echo is averted since the mustached bat's auditory periphery evolved an enhanced sensitivity to the echo-CF<sup>2</sup> (60–63 kHz in P.p. parnellii, Kanwal et al., 1999, and 57.5–60 kHz in P.p. rubiginosus, Xiao and Suga, 2002) and a relative insensitivity to the pulse-CF<sup>2</sup> (Suga, 1985; Kanwal, 1999; Kanwal et al., 1999).

FIGURE 2 | … frequency (CF) and frequency-modulated (FM) components present in the pulse and echo. (B) Lateral view of the mustached bat auditory cortex showing the location of the DSCF area (shown in gray) as … landmarks (blood vessels shown by thick lines) and tuning properties of neuronal responses were used to identify the Doppler-shifted constant frequency (DSCF), anterior primary auditory (A1a), posterior primary auditory (A1p), dorsomedial (DM), CF/CF, FM-FM, and dorsal fringe (DF) areas (adapted from Suga et al., 1983).

Doppler-shifts and echo-delays respectively impose key spectral and temporal changes to the echo H1−4 that differentiate it from the pulse H1−4, and these differences are in turn exploited by "combination-sensitive" neurons in the mustached bat auditory cortex (Suga, 1978). The neural responses of combination-sensitive neurons are facilitated when certain CF and FM components of the pulse and echo are presented together, such that the facilitated responses are greater (i.e., a greater spike count or higher spike rate) than the sum of the responses elicited when the individual CF and FM components are presented alone. For instance, neurons in the FM-FM processing subregion are facilitated by pulse-echo pairs of FMs (e.g., pulse-FM1+echo-FM3) and tuned to their delays (i.e., inter-stimulus-intervals, 0.4–18 ms) (O'Neill and Suga, 1979). The temporal combination sensitivity of FM-FM neurons enables the bat to detect target range, and the spatial organization of FM-FM neurons forms a cortical map of ranges based on echo-delays in non-primary auditory cortex. Thus, the bat is able to receive accurate range information due to the refined temporal processing of neurons in its auditory cortex.
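The relationship between echo-delay and target range can be made concrete with a short Python sketch. It assumes the standard sonar relation in which the echo travels to the target and back, so that range equals the speed of sound times the delay, divided by two. The speed of sound used here (343 m/s in air) and the function name are assumptions introduced for illustration and are not taken from the cited studies.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air; not taken from the text


def target_range_m(echo_delay_s, speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """One-way distance to a target given the round-trip echo delay."""
    return speed_m_per_s * echo_delay_s / 2.0


# Limits of the best-delay range quoted for FM-FM neurons (0.4-18 ms).
for delay_ms in (0.4, 18.0):
    print(f"{delay_ms:5.1f} ms -> {target_range_m(delay_ms / 1000.0):.3f} m")
```

Under this assumption, the 0.4–18 ms span of delays corresponds to target distances from roughly 7 cm to roughly 3 m.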

The CF/CF processing area contains neurons that are facilitated when CFs in the pulse-CF<sup>1</sup> range are combined with CFs in either the echo-CF<sup>2</sup> or echo-CF<sup>3</sup> ranges. Similarly, neurons in the Doppler-shifted constant frequency (DSCF) processing subregion (sometimes referred to as the auditory fovea; Schnitzler and Denzinger, 2011) are facilitated when CFs in the echo-CF<sup>2</sup> range are paired at onset with CFs in the pulse-FM<sup>1</sup> range (23–27 kHz) (Fitzpatrick et al., 1993; Kanwal et al., 1999). The refined spectral resolution of CF/CF and DSCF neurons enables them to distinguish between the pulse- and echo-CF<sup>2</sup> or, in the case of some CF/CF neurons, the pulse- and echo-CF<sup>3</sup>. Thus, the bat is able to receive accurate velocity information due to the refined spectral processing of neurons in its auditory cortex. Both the CF/CF and DSCF areas contain maps of relative velocities derived from representations of frequencies at or near echo-CF<sup>2</sup> or, in the case of some CF/CF neurons, echo-CF<sup>3</sup> (Suga and Jen, 1976). The CF/CF area and its velocity map occupy a relatively small portion of the bat's auditory cortex. The DSCF area and its velocity map, on the other hand, occupy the centermost 50% of the primary auditory cortex (A1) and will be of particular importance to this discussion going forward.

For mustached bats, both refined spectral and temporal resolution are essential to tracking the velocity and range of targets. As with momentum and position or frequency and time, precise sonar measurements of velocity and range are impossible to achieve on the same time scale (Parker, 2011). This Doppler Ambiguity could explain nuances of the mustached bat's echolocation behavior. As **Figure 2** shows, when the bat is at rest, the 20-ms CF components of biosonar signals always precede the 3-ms FM components (Suga, 1985). Thus, the bat processes any spectral differences between the pulse and echo CF components imposed by Doppler-shifts (i.e., target velocity) via a wide temporal window prior to processing temporal differences (delays) between the pulse and echo FM in a narrower window. We detail the role of acoustic uncertainty in the context of mustached bat pursuit behavior in Supplementary Section 2. Pharmacobehavioral results confirm that (1) mustached bats can discriminate between 20-ms CFs presented within the pulse- and echo-CF<sup>2</sup> ranges with a 0.05 kHz resolution and (2) this refined frequency discrimination is performed by neurons in the DSCF area (Riquimaroux et al., 1992). Substituting Δt in the Heisenberg-Gabor Limit formula with 20 ms shows that the finest frequency discrimination (i.e., spectral resolution) possible using a typical echo-CF<sup>2</sup> is 4 Hz (or 0.004 kHz).
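The substitution can be written out explicitly. The only input is the 20-ms duration of the CF component, and the equality case of the limit is treated here as the best achievable resolution:

$$
\Delta f \geq \frac{1}{4\pi\,\Delta t} = \frac{1}{4\pi \times 0.020\ \mathrm{s}} \approx 3.98\ \mathrm{Hz} \approx 0.004\ \mathrm{kHz}.
$$

On this reading, the 0.05 kHz (50 Hz) behavioral discrimination threshold is roughly 12 to 13 times coarser than the theoretical bound and therefore remains well within the limit.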

Doppler Ambiguity exposes a potential flaw in ASTIR. Velocity and range are processed within the bilateral, cortically adjacent subregions of DSCF, CF/CF, and FM-FM, despite being canonically-conjugate variables. Thus, echolocation demonstrates that the ethological need to precisely calculate canonically-conjugate variables is not necessarily sufficient biological pressure to impose hemispheric specialization. Indeed, anterior auditory field (AAF) in rodents (Linden et al., 2003; Trujillo et al., 2011) and cats (Schreiner and Urbas, 1988; Tian and Rauschecker, 1994; Imaizumi et al., 2004; Carrasco and Lomber, 2009) as well as the AAF homolog of the rhesus macaque (rostral auditory field, or Field R) (Rauschecker et al., 1997) are specialized for faster temporal processing relative to A1. It is conceivable that AAF and A1, like FM-FM and DSCF, could process canonically-conjugate variables like time and frequency bilaterally, making ASTIR unnecessary. Such an issue would be a stronger criticism of ASTIR if the extent of our knowledge on mustached bat auditory cortex were limited to its role in echolocation.

## Conflict and Concord: Social Call and Biosonar Signal Processing in the FM-FM and DSCF Areas

Among animals, only human speech (Liberman et al., 1967) and the social calls of cetaceans (Payne and McVay, 1971), mimicking birds (Marler and Pickert, 1984), and some primates (Sutton, 1979) show equal or greater spectrotemporal acoustic complexity than those of mustached bats and other CF-FM bats (Kanwal et al., 1994; Kanwal, 1999; Clement et al., 2006; Ma et al., 2006). Multi-dimensional scaling helped to classify the 19 recurring mustached bat social call syllables as CFs, FMs, or NBs (noisebands) (Kanwal et al., 1994). Unlike the repeating, stereotypic call sequences of frogs (Wells and Schwartz, 1984) and song birds (Marler and Pickert, 1984), mustached bats emit a variety of simple syllabic social calls and calls that are composites of simple syllables. These composite calls reveal a phonetic-like syntax to mustached bat communication, as only 11 of the 19 syllables are even used to construct composites and mustached bats emit only 4% (15/342) of all the possible composites.
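The proportion quoted above can be reproduced by simple counting. The calculation below assumes that a composite is an ordered pair of two different syllables drawn from the 19-syllable repertoire; this reading is an inference that happens to match the figure of 342, not a definition stated in the cited work.

$$
19 \times 18 = 342, \qquad \frac{15}{342} \approx 0.044 \approx 4\%.
$$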

Classic studies of biosonar signal processing in mustached bat auditory cortex described the FM-FM, CF/CF, and DSCF areas as "specialized" for echolocation. Prominent language researchers interpreted this specialization for echolocation as a closed-system, likening it to and presenting it as evidence for the closed-system model of speech (Liberman and Mattingly, 1989). CF/CF neural responses to social calls have not been sufficiently studied to warrant discussion here. However, neurons in the FM-FM (Esser et al., 1997; Kanwal, 1999, 2006) and DSCF (Kanwal, 1999, 2006; Washington and Kanwal, 2008) areas respond robustly to conspecific social call syllables, a result noted by critics of closed-system models of speech (Tallal, 2012). Call selectivity within the FM-FM and DSCF areas has a semblance of compatibility with their respective temporal (range) and spectral (velocity) domain processing roles in echolocation. FM-FM neuron responses to composite social calls decline when an artificial silent interval is introduced between the two simple syllables; as the duration of that silent interval increases, FM-FM neuron responses monotonically decrease (Ohlemiller et al., 1994; Esser et al., 1997). Furthermore, either reversing the natural order of composite calls or presenting a simple or composite call in reverse is sufficient to reduce the magnitude of FM-FM neural responses to social calls (Esser et al., 1997). Each of these experimental manipulations had the effect of corrupting the natural temporal structure of social calls and of diminishing excitatory responses of temporally combination-sensitive FM-FM neurons.

Likewise, the magnitudes of DSCF neuron responses to certain social calls are known to be comparable to and may even surpass the magnitudes of their responses to pulse-echo CF components (Kanwal, 1999, 2006). Call selectivity in DSCF neurons is based primarily on spectral facilitation. Specifically, when the spectral components of social calls that traverse both the pulse-FM<sup>1</sup> and echo-CF<sup>2</sup> ranges are extracted from the call and presented separately, the neuron's response to both call-components (pulse-FM1-range+echo-CF2-range) is facilitated such that its magnitude is greater than the sum of response magnitudes elicited by each call-component alone. Such band-pass filtered call-components may elicit responses of greater magnitude than the entire natural social call due to the absence of spectral energy traversing inhibitory response areas. Further, similar to FM-FM neurons, DSCF neuron responses to social calls are greatly diminished by reversing the call, but this phenomenon in DSCF neurons may be attributed to asymmetrical inhibitory areas flanking the narrow, excitatory echo-CF<sup>2</sup> range.

Temporal processing of social calls amongst neurons in the FM-FM area appears concordant with their role in calculating target range during echolocation. However, the means by which DSCF neurons process social calls often differs from how they calculate target velocity via Doppler-shift. Most simple call syllables of mustached bats contain linear, curvilinear, or sinusoidal FMs. Frequency-modulated mustached bat social calls often contain FMs with rates surpassing 500 Hz/ms (e.g., bent-upward FM), and some curvilinear or sinusoidal FM calls have instantaneous rates higher than 5 kHz/ms (e.g., stretched-rippled FM and checked-downward FM). Many neurons in the DSCF area are responsive to rapidly-modulated call components that traverse the echo-CF<sup>2</sup> range (Kanwal, 1999, 2006; Washington and Kanwal, 2008). FM selectivity is a commonality mustached bat DSCF neurons share with A1 neurons in other mammalian species (Heil et al., 1992a,b; Mendelson et al., 1993; Shamma et al., 1993; Nelken and Versnel, 2000; Zhang et al., 2003; Godey et al., 2005; Atencio et al., 2007). Further, DSCF neurons as a group show upward direction selectivity for linear FMs centered within the echo-CF<sup>2</sup> range (Washington and Kanwal, 2008, 2012). Some DSCF neurons are direction selective for linear FM with durations as short as 1.3 ms and modulation rates as rapid as 4.0 kHz/ms (Washington and Kanwal, 2008). DSCF neurons are capable of responding to linear FMs with durations as short as 0.7 ms and rates as fast as 8 kHz/ms.

Even the most elaborate neural circuits are subject to physical laws. Thus, the mustached bat auditory system is no exception to the Heisenberg-Gabor Limit. The ability of DSCF neurons to detect and respond to such rapid modulations of frequency necessitates that they make use of some form of narrow temporal window. However, neurons in the DSCF area are defined by their refined spectral resolution, which requires the use of a wide temporal window. The constraint that neurons in the DSCF area must process auditory signals using both wide and narrow temporal windows creates a fundamental conflict between integration and resolution.

In theory, DSCF neurons may have evolved in such a way as to contend with this conflict. Potential strategies include (1) having one group of DSCF neurons process signals using wide temporal windows and another group using narrow windows, (2) having each neuron contain a group of synaptic or dendritic microcircuits which process signals using wide temporal windows and another set that does so using narrow windows, and (3) metabolically adjusting excitatory and inhibitory response areas such that temporal windows are wide while hunting and narrow while socializing. However, all but the first of these strategies would be energy-intensive, computationally problematic, or behaviorally untenable. The second strategy is computationally problematic on two counts. First, temporal domain computations would be performed either faster and/or at a more consistent pace than spectral domain computations, creating a bottleneck at the axon hillock by which the slow and/or intermittent flow of spectral computations interferes with the rapid and/or steady flow of temporal computations. Second, a method would be needed to differentiate between any firing patterns elicited by the spectral and temporal components of signals since they would be generated within the same neuron and thus propagating down the same axon(s). The third strategy is energy-intensive for an organism with an already high metabolism and behaviorally untenable since the bats constantly echolocate, even in social situations (Clement et al., 2006).

It is known that the same groups of neurons can accommodate multiple dimensions of stimuli within overlapping primary auditory cortex maps (e.g., cochleotopy, Merzenich et al., 1975, aural dominance, Liu and Suga, 1997, and rate selectivity, Heil et al., 1992a; Mendelson et al., 1993), much like in the primary visual cortex (e.g., retinotopy, ocular dominance, and orientation, Goodhill, 2007). However, none of the neurons constituting these overlapping maps appears to be processing stimulus dimensions derived from canonically-conjugate variables and doing so on the same time scale. Although the FM-FM and DSCF areas process refined temporal and spectral information respectively, exist within the same hemisphere, and process different stimulus components, the DSCF and FM-FM areas simply do not constitute overlapping cortical maps.

The hypothesis of two functional groups of DSCF neurons, one with refined spectral resolution and another with refined temporal resolution, begs the question of how these two groups would be organized. Echolocation and Doppler Ambiguity demonstrate that canonically-conjugate variables (i.e., velocity and range via sonar) can be processed within bilateral adjacent cortical regions. However, there are key differences in the acoustic structure of biosonar signals and social calls as well as the neural circuitry used to process them. CF components always precede FM components during echolocation whereas the same cannot be said of composites (e.g., composites of the single-humped FM and short-quasi CF calls) and other sequences of social calls (Clement et al., 2006; Clement and Kanwal, 2012). As for neural circuitry, the DSCF area is located within A1 (Suga and Jen, 1976), and FM-FM is located within non-primary auditory cortex (Suga, 1985). Together, these facts illustrate the different neurocomputational constraints placed on calculating one pair of canonically-conjugate variables (velocity and range between the FM-FM and DSCF areas during echolocation) versus another (time and frequency within the DSCF area during communication). First, DSCF neurons may begin processing echo-CF components dozens of milliseconds before the FM-FM neurons even receive echo-FM information, giving them a substantial head-start in calculating velocity. Second, A1 and non-primary auditory cortex receive their own direct, separate inputs from different regions of the medial geniculate body of the thalamus (Burton and Jones, 1976; Huang and Winer, 2000). Thus, the DSCF area could start processing spectrally-based velocity information while the FM-FM area processes temporally-based range information in quasi-parallel.

Neither possibility exists for the two hypothetical populations of DSCF neurons. During communication, both neural populations would intermittently receive refined spectral information (in the form of echo-CF components of biosonar signals, CF-type syllables in the echo-CF<sup>2</sup> range, or both at once) and refined temporal information (in the form of rapid FM-type syllables traversing the echo-CF<sup>2</sup> range). Thus, the DSCF area would need to contend with two subpopulations that intermittently perform computations on different time scales while sharing many of the same inputs and projections. If their inputs largely originate from one cochlea (itself subject to the Heisenberg-Gabor Limit), how one population with refined spectral and another with refined temporal resolution managed to co-exist (i.e., co-evolve or co-develop) within the cochleotopic axis of A1 would be difficult to understand.

Further, any regions receiving projections from these separate populations would in all likelihood adapt, over the course of either development in the short term or evolution in the long term, by starting to specialize in spectral or temporal domain processing as well. By analogy, the neuronal coalition composing the DSCF area would be broken because its constituent neurons split into two factions that are incapable of coordinating with each other, and their conflict would eventually spread to neighboring regions.

If accurate, ASTIR would represent an elegant solution to the DSCF area's internal conflict over acoustic uncertainty. According to ASTIR, these two subpopulations of DSCF neurons could simply reside in different cerebral hemispheres. One population would be capable of slowly processing the refined spectral information necessary for tracking the velocity of a distant insect while the other population is quickly processing a steady stream of rapid FM call syllables. Communication via commissural connectivity would enable the two populations to combine information or modulate each other's activity as needed. Projections from neurons in these left and right DSCF areas to nearby cortical areas would be primarily ipsilateral, potentially resulting in entire cerebral hemispheres populated by functional areas specialized for higher-order functions ultimately rooted in temporal or spectral processing. To further stretch an analogy, ASTIR offers the spectral and temporal DSCF neural populations a most generous two-state solution.

From a population coding perspective, ASTIR holds even greater advantages over local intrahemispheric specialization for the precise processing of temporal and spectral information, especially in mustached bats. First, subregions of the auditory cortex in one hemisphere are to some degree interconnected and hierarchically organized. If a region (or set of regions) responsible for processing rapidly changing signals within a narrow temporal window is connected to an adjacent region responsible for fine frequency discrimination (a necessarily slow process relative to temporal domain processing), the resulting circuit will only be as fast as its slowest node. That is to say that the region responsible for fine frequency discrimination will become an unnecessary rate-limiting step, slowing down the processes of other adjacent regions responsible for rapid auditory processing. Housing the spectral and temporal processing regions in different hemispheres would allow the auditory cortices in both hemispheres to process signals at rates ideal for maximizing spectral and temporal information while allowing them to communicate via the corpus callosum as needed. Second, A1 is an example of primary sensory cortex. In mustached bats and other animals, there is an advantage to processing auditory signals with refined spectral (velocity) and temporal (communication) information at this level of the cortical hierarchy. Appropriating a region adjacent to a primary sensory cortical area like A1, which has different cytoarchitecture from A1 and could otherwise perform higher-level analyses on information it receives from A1, would not necessarily be evolutionarily advantageous or neuroplastically trivial. Such a waste of cortical resources would be egregious if the sole purpose for appropriating this adjacent region was to analyze Fourier-transformed (i.e., canonically-conjugate) versions of the same auditory information processed by A1. This waste of cortical resources becomes even more nonsensical when there is another A1 on the other side of the brain that is ideally situated to perform an analysis of Fourier-transformed versions of the same information in parallel.

Taken from another perspective, ASTIR asserts that the human brain developed this specialization due to environmental pressures necessitating precise acoustic calculation of time and frequency, especially as they relate to speech and music (Zatorre et al., 2002; Poeppel, 2003). Though music's ethological purpose is still debated, its existence within every known culture suggests a role in alleviating some environmental pressures, such as adapting to living in social groups (Loersch and Arbuckle, 2013). Though spectral domain processing is necessary for detecting prosody (Lakshminarayanan et al., 2003) and speaker identity (Robin et al., 1990), humans with right auditory cortical infarct are reported as having fewer speech processing deficits than those with similar left hemispheric infarct (Purves, 2004). On the other hand, the loss of velocity tracking in a mustached bat would greatly compromise the hunting abilities of an animal with a very high metabolism, resulting in its starvation in as little as 48 h. Likewise, a mustached bat's inability to process rapid FMs, akin to receptive aphasia in a human, would likely result in social isolation, aggression from conspecifics, and/or a loss of mating opportunities. In short, the environmental pressure to develop ASTIR, the neural mechanism purported to underlie hemispheric specialization for speech and music, is arguably greater for mustached bats than for humans. Compelling environmental pressures to develop such refined spectral and temporal processing within the same auditory cortical subregion (e.g., A1) that result in an acoustic uncertainty conflict are not evident in other mammals, such as mice, cats, and macaques. The question going forward is whether such a mechanism evolved within the small brains of mustached bats and what this implies for the closed-system and domain-general hypotheses of human speech processing.

## Converge and Impact: Evidence and Implications of Hemispheric Differences in Mustached Bat Auditory Cortex

Hemispheric differences in neural processing in the FM-FM area have never been the specific topic of a scientific paper. However, a prominent bat researcher reported maps of range (i.e., echo-delay) in the left and right hemispheric FM-FM areas (Suga, 1985) in a single mustached bat. This researcher concluded "that the distributions of best delays for facilitation are not the same between the left and right FM-FM areas of a single [mustached] bat." Closer examination of these maps reveals that the left FM-FM area is highly organized and refined in the time dimension such that populations of neurons responding to fine changes in echo-delays are organized into narrow, parallel columns running along the dorsal-ventral axis. These same columns were wider and contained neurons tuned to broader echo-delays in the right hemisphere. Neurons in the left FM-FM area of this single bat have more refined temporal resolution (i.e., narrower gap detection thresholds) than those in the right FM-FM area. The sex of this bat remains unknown. Intriguing as these cortical FM-FM maps are, however, conclusions about hemispheric specialization in mustached bats cannot be extrapolated from a single animal.

On the other hand, the spectral and temporal domain processing of neurons within the left and right hemispheric DSCF areas of six bats were directly tested using linear FMs centered on each neuron's best frequency in the echo-CF<sup>2</sup> range (Washington and Kanwal, 2012). Temporal domain processing was tested by

```
FIGURE 3 | Comparisons of temporal and spectral metrics of neural
responses from left (blue) and right (red) hemispheric DSCF neurons in
male mustached bats. (A) Latencies of peak DSCF neural responses elicited
by a 30-ms tone in the echo-CF2
                                 range paired at onset with a 30-ms tone in
the pulse-FM1
              range. Left: Bar plot shows the average response latency for 88
left and 70 right hemispheric DSCF neurons. Right: Kernel plot shows the
distribution of the same data at left. (B) Selectivity for the rates of FMs centered
in the echo-CF2
                range and paired at onset with a 30-ms tone in the pulse-FM1
range. Left: Average of normalized curves derived from magnitudes of peak
DSCF neural responses (proportional to spike rates) elicited by FMs increasing
in modulation rate from 0.04 to 4.0 kHz/ms in the left (46 neurons) and right
(45 neurons) hemispheres. The abscissa axis shows FM rates from 0.04 to
4.0 kHz/ms and includes a separate demarcation for best tone pairs in the
echo-CF2 and pulse-FM1
                          ranges. The ordinate axis represents the percentage
of the average DSCF neuron's peak response to FMs elicited at each rate in
the 0.04–4.0 kHz/ms range. At the far left is the magnitude of the average
DSCF neuron's response to its best tone-pairs as a percentage of its maximum
responses to FMs. The dotted line represents the average best FM rate of
0.59 kHz/ms. Right: Pie charts representing the percentage of left hemispheric
(top) and right hemispheric (bottom) neurons with best FM rates above (dark)
and below (light) the average best FM rate of 0.59 kHz/ms. (C) Selectivity for
the bandwidths of FMs centered in the echo-CF2
                                                 range and paired at onset
with a 30-ms tone in the pulse-FM1
                                   range. Left: Average of normalized curves
derived from magnitudes of peak DSCF neural responses elicited by FMs
increasing in bandwidth from 0.4 to 7.9 kHz in the left (74 neurons) and right
(47 neurons) hemispheres. The abscissa axis shows FM bandwidths from 0.4
to 7.9 kHz. The ordinate axis represents the percentage of the average DSCF
neuron's peak response to FMs elicited at each bandwidth in the 0.4–7.9 kHz
range. The dotted line represents the average best FM bandwidth of 4.5 kHz.
Right: Pie charts representing the percentage of left hemispheric (top) and
right hemispheric (bottom) neurons with best FM bandwidths above (dark) and
                                                                 (Continued)
```
#### FIGURE 3 | Continued

below (light) the average best FM bandwidth of 4.5 kHz. (D) Latencies of peak DSCF neural responses elicited by FMs optimized for rate, bandwidth, and amplitude, centered in the echo-CF2 range, and paired at onset with a 30-ms tone in the pulse-FM1 range. Left: Bar plot shows the average peak response latency for 64 left and 43 right hemispheric DSCF neurons. Right: Kernel plot shows the distribution of the same data at left. Though tone-pairs generally tend to elicit greater responses in DSCF neurons than FMs, FMs optimized for rate, bandwidth, and modulation direction commonly elicit greater responses from these neurons than do tone-pairs (Washington and Kanwal, 2012). In males, the average best FM rates, durations, and bandwidths for left hemispheric DSCF neurons were 0.99 kHz/ms ± 0.13 S.E.M. (standard error of the mean), 14.04 ms ± 1.96 S.E.M., and 4.39 kHz ± 0.26 S.E.M., whereas these values for right hemispheric DSCF neurons were 0.27 kHz/ms ± 0.05 S.E.M., 34.22 ms ± 4.94 S.E.M., and 3.49 kHz ± 0.29 S.E.M. Thus, best FM rate and bandwidth are significantly greater (*p* < 0.05) amongst left DSCF neurons and best FM duration is greater amongst right DSCF neurons. Adapted from Washington and Kanwal (2012). Reproduced with the permission of Dr. Jagmeet S. Kanwal and the American Physiological Society.

varying the rates (Δf/Δt) of FMs, specifically by changing their durations (Δt) while keeping their bandwidths (Δf) constant. Spectral domain processing was tested by varying the bandwidths of FMs while maintaining their rates at the preferred rate for each neuron. FMs were always paired at onset with a CF at the best frequency in the pulse-FM<sup>1</sup> range so as to optimize neural responses via facilitation.
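As a brief illustration of how FM rate, bandwidth, and duration constrain one another (rate equals bandwidth divided by duration), the sketch below computes sweep durations at a fixed bandwidth. It uses the 5.25 kHz bandwidth and the 0.04 and 4 kHz/ms rates quoted in the Figure 4 legend. The function names, the intermediate 0.4 kHz/ms rate, and the assumption of a strictly linear sweep are introduced here for illustration only and are not taken from the original stimulus-generation procedure.

```python
def fm_duration_ms(bandwidth_khz: float, rate_khz_per_ms: float) -> float:
    """Duration of a linear FM sweep: duration = bandwidth / rate."""
    if rate_khz_per_ms <= 0:
        raise ValueError("rate must be positive")
    return bandwidth_khz / rate_khz_per_ms


def fm_bandwidth_khz(rate_khz_per_ms: float, duration_ms: float) -> float:
    """Bandwidth of a linear FM sweep: bandwidth = rate * duration."""
    return rate_khz_per_ms * duration_ms


if __name__ == "__main__":
    bandwidth = 5.25  # kHz, held constant while the rate is varied
    # 0.04 and 4.0 kHz/ms come from the Figure 4 legend; 0.4 is a hypothetical midpoint.
    for rate in (0.04, 0.4, 4.0):
        duration = fm_duration_ms(bandwidth, rate)
        print(f"{rate:5.2f} kHz/ms -> {duration:7.2f} ms")
```

Holding bandwidth constant in this way means that any change in rate is, by construction, also a change in duration, which is why rate and duration selectivity have to be disentangled in a separate analysis.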

Responses recorded from 158 neurons (LH = 88, RH = 70) in the DSCF areas of six bats showed profound hemispheric differences that conformed to ASTIR (**Figure 3**). Latencies of responses elicited by pairs of CFs presented at the best frequencies in the echo-CF<sup>2</sup> and pulse-FM<sup>1</sup> ranges were significantly longer (LH = 15 ms; RH = 18 ms) and showed greater variance (LH = 21 ms; RH = 33 ms) amongst right DSCF neurons than those on the left. Likewise, latencies of responses elicited by FMs optimized for the spectral and temporal selectivities of each neuron (i.e., best FM bandwidths and rates) showed even greater hemispheric differences, such that latencies were nearly twice as long (LH = 13 ms; RH = 23 ms) and showed almost 20 times the variance in the right hemisphere relative to the left (LH = 16 ms; RH = 311 ms). Left DSCF neurons selected for FMs with faster rates (1 kHz/ms) than those on the right (0.2 kHz/ms) whereas right DSCF neurons selected for FMs with narrower bandwidths (3.5 kHz) than those on the left (4.4 kHz). Right DSCF neurons also selected for FMs with durations over twice as long (34 ms) as those on the left (14 ms) and had longer response durations (31 ms) than those on the left (20 ms). Further analyses ruled out the possibility of FM duration selectivity and hierarchical linear modeling ruled out the possibility that these results were biased to individual bats.
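The verbal comparisons above ("nearly twice as long," "almost 20 times the variance," "over twice as long") can be checked directly against the reported values. The sketch below is only that arithmetic; the dictionary keys and the function name are illustrative labels, and the numbers are transcribed from the paragraph above rather than recomputed from the underlying data.

```python
# Values transcribed from the text (male bats); each tuple is (LH, RH).
# Variance is listed with the units given in the text.
reported = {
    "CF-pair latency (ms)": (15.0, 18.0),
    "FM latency (ms)": (13.0, 23.0),
    "FM latency variance": (16.0, 311.0),
    "best FM duration (ms)": (14.0, 34.0),
    "response duration (ms)": (20.0, 31.0),
}


def rh_to_lh_ratio(lh: float, rh: float) -> float:
    """Right-hemisphere value expressed as a multiple of the left-hemisphere value."""
    return rh / lh


ratios = {name: rh_to_lh_ratio(lh, rh) for name, (lh, rh) in reported.items()}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name:24s} RH/LH = {ratio:5.2f}")
```

A simple ratio is used here because it matches the way the differences are phrased in the text; it says nothing about statistical significance, which was assessed in the original study.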

These results require some further explanation. Left DSCF neurons were generally less selective than those on the right. For instance, although left DSCF neurons selected for faster FM rates, they were more likely to respond to a multitude of FM rates, both rapid and slow. These FM rate selectivity results are consistent with behavioral (Schwartz and Tallal, 1980) and neuroimaging (Belin et al., 1998) results for formant transitions in humans. The responses of left DSCF neurons to FM bandwidths could be similarly characterized. Right DSCF neurons, on the other hand, generally responded robustly to long, slow, narrowband FMs but showed few if any responses to short, rapid, or broadband FMs. Indeed, on average, there is a 900 Hz difference (LH > RH) in best FM bandwidth between left and right DSCF neurons, which is 18 times the spectral resolution the bat needs to detect differences between a pulse- and echo-CF<sup>2</sup> (Riquimaroux et al., 1992; Washington and Kanwal, 2012). The selectivity of right DSCF neurons for longer, narrowband sounds and their longer response durations strongly suggest that, consistent with ASTIR, right hemispheric DSCF neurons employ longer temporal integration windows relative to those on the left. Further, left and right DSCF neurons differ not only in their ability to detect rapid changes in stimulus features but also in how quickly and for how long they respond to stimuli in general. FM rate selectivity and response latency would appear to be unrelated measures, but they both reflect finer temporal domain processing in left hemispheric DSCF neurons relative to those on the right. Right hemispheric DSCF neurons take longer to respond to stimuli and have less reliable spike times (i.e., are less time-locked) than their left hemispheric counterparts. The finding that multiple temporal measures (e.g., FM rate selectivity, latency, and response duration) are shorter and/or more refined in the left hemisphere suggests that hemispheric differences in the widths of temporal integration windows manifest in multiple ways, even at the single neuron level, in the mustached bat's A1 (**Figure 4**).
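Worked through, the bandwidth comparison in this paragraph amounts to the following arithmetic. The 50 Hz figure is not stated explicitly in the text; it is simply what the quoted factor of 18 implies, and the symbol names are chosen here for illustration.

```latex
\begin{align*}
\Delta BW &= BW_{\mathrm{LH}} - BW_{\mathrm{RH}}
           \approx 4.4\ \mathrm{kHz} - 3.5\ \mathrm{kHz}
           = 0.9\ \mathrm{kHz} = 900\ \mathrm{Hz}, \\
\frac{\Delta BW}{18} &= \frac{900\ \mathrm{Hz}}{18} = 50\ \mathrm{Hz}.
\end{align*}
```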

There is one factor mitigating the results above. The results represent only half of the mustached bat population: Males.

The same study described above reported recordings not only from neurons in the DSCF areas of six male mustached bats but also from 168 neurons (LH = 91, RH = 77) in the DSCF areas of four female mustached bats (Washington and Kanwal, 2012). While sometimes significant, hemispheric differences in spectral versus temporal processing were decidedly less pronounced in females than in males (**Figure 5**). Latencies of responses to CF-pairs were remarkably similar in duration (LH = 15 ms; RH = 14 ms) and variance (LH = 15 ms; RH = 20 ms) across hemispheres, comparable to those of the left hemisphere in males. Left and right DSCF neurons also selected for FMs with similar rates (LH = 0.41 kHz/ms; RH = 0.47 kHz/ms) and bandwidths (LH = 4.93 kHz; RH = 4.74 kHz). However, as in males, right DSCF neurons from female bats selected for FMs with significantly longer durations (65 ms) and had longer response durations relative to those on the left (37 ms). Likewise, response latencies to FMs in females were significantly longer on the right (30 ms) than on the left (17 ms), similar to males. Left DSCF neurons were again less selective for slow FM rates in females but to a far lesser extent than in males. There were no appreciable hemispheric differences in spectral domain processing in females. Yet, response characteristics of neurons in the right DSCF areas of female bats showed multiple signs of processing sounds using longer integration windows relative to those in the left hemisphere. As in males, DSCF neurons in female bats displayed no selectivity for FM durations, and these results were not biased to individual bats.

Despite these sex differences in hemispheric specialization, what must be emphasized is that ASTIR appears to be a feature of both the male and female mustached bat auditory cortex. Further, data collected from multiple species suggest that these sex differences represent less a flaw in the hypothesis proposed here than a feature of hemispheric specialization for communication sounds. Hemispheric specialization for song production and perception is greater in male than in female songbirds (Nottebohm and Arnold, 1976; DeVoogd and Nottebohm, 1981). Male songbirds are also able to use both spectral and temporal information to classify call stimuli by the sex of the caller but females can only use temporal information (Vicario, 2004). Certainly, hemispheric specialization for speech and language is often, but not always (Obleser et al., 2001, 2004), found to be stronger in men than in women (Lansdell, 1964; McGlone, 1977; Dawe and Corballis, 1986; Shaywitz et al., 1995). Men are reported to have greater left hemispheric specialization (i.e., right-ear-advantage) for temporal domain processing than women as well (Brown et al., 1999). A sex-dependent asymmetry in mustached bat auditory cortex implies that this asymmetry is at least analogous to the asymmetries found in songbirds, rats, and humans. Please note that, though there was no evidence for right-lateralized refined spectral domain processing in female bats, refined spectral processing in the right hemispheric DSCF areas of male bats was less statistically robust than refined temporal processing in the left hemisphere. Thus, the apparent lack of right-lateralized, refined spectral domain processing in female bats may simply reflect their overall diminished hemispheric specialization relative to males.

FIGURE 4 | Temporal response parameters of DSCF neurons as evidence for asymmetric sampling of time in mustached bats. All stimuli presented in the echo-CF2 range (57.5–60 kHz in *P.p. rubiginosus*) were paired at onset with a 30-ms CF tone-burst in the pulse-FM1 range (23–28 kHz). Responses shown in (A–C) are from six different DSCF neurons, selected because they best illustrated a particular ASTIR-related concept. (A top): A 30-ms, constant-frequency tone presented at echo-CF2. (A middle): Voltage trace from a typical left hemispheric DSCF neuron in a male mustached bat following one presentation of a 30-ms CF tone-burst presented at the neuron's best frequency (BF) and best amplitude of excitation (BAE). This neuron is responding within 10 ms after stimulus onset. (A bottom): Voltage trace from a typical right hemisphere DSCF neuron in a male mustached bat following one presentation at BAE of a 30-ms CF tone-burst centered on the neuron's BF. This neuron is responding >20 ms after stimulus onset. Left DSCF neurons typically respond to tonal stimuli 3–5 ms before those on the right in male, but not female, bats (Washington and Kanwal, 2012). Assuming DSCF neurons conform to typical integrate-and-fire models, in male more so than female bats, ASTIR takes the form of left DSCF neurons integrating salient stimulus features and firing in less time than right DSCF neurons. (B top): A 1.31-ms, upward FM centered on echo-CF2, which has a modulation rate of 4 kHz/ms and a bandwidth of 5.25 kHz. (B middle): Voltage trace from a typical left DSCF neuron in a male bat following one presentation of a 5.25 kHz, 4 kHz/ms upward FM at BAE and centered on the neuron's BF. This neuron is responding within 10 ms after stimulus onset. (B bottom): Voltage trace from a typical right DSCF neuron in a male bat following presentation at BAE of a 5.25 kHz, 4 kHz/ms upward FM centered at the neuron's BF. This neuron is simply not responding. Relative to left DSCF neurons, right DSCF neurons are less responsive to shorter FM signals (Washington and Kanwal, 2012). This selectivity for longer sounds suggests right DSCF neurons have longer integration windows and are thus less likely to respond to such short sounds. Though this hemispheric difference is observed in both sexes, it is more pronounced in males. (C top): A 131-ms, upward FM centered at echo-CF2, which has a modulation rate of 0.04 kHz/ms and a bandwidth of 5.25 kHz. (C middle): Voltage traces from a typical left DSCF neuron in a male bat following four presentations at BAE of a 5.25 kHz, 0.04 kHz/ms upward FM centered on the neuron's BF. This neuron's four responses are time-locked and occur within the first 30 ms of the stimulus. (C bottom): Voltage traces from a typical right DSCF neuron in a male bat following four presentations at BAE of a 5.25 kHz, 0.04 kHz/ms upward FM centered at the neuron's BF. This neuron's four responses are not time-locked (i.e., tonic or burst firing) and occur after the first 70 ms of the stimulus. In both sexes, the maximum response duration of the left DSCF neuron (Δ*t*<sub>L</sub>) is less than that of the right DSCF neuron (Δ*t*<sub>R</sub>) (Washington and Kanwal, 2012). Since, in general, Δ*t*<sub>R</sub> > Δ*t*<sub>L</sub>, right DSCF neurons in general are less capable of processing precise temporal information than left DSCF neurons. Washington and Kanwal, unpublished data, reproduced with the permission of Jagmeet S. Kanwal, PhD.

Placing mustached bat echolocation and communication into a computational context via the Heisenberg-Gabor Limit allows us to begin answering longstanding questions. First, the facts that ASTIR appears to be greater in male bats and that advantages for temporal and spectral domain processing are found in the left and right hemispheres respectively, and not vice-versa, strongly suggest that hemispheric specialization in mustached bats is analogous to such specialization in the human brain. Second, ASTIR's presence within the brains of mustached bats, when coupled with the fact that neither echolocation nor communication represents a closed system in this species, is evidence against closed-system hypotheses of speech processing. If ASTIR can occur in mustached bats and amongst the same neurons responsible for processing both biosonar signals and social calls, a language-only "speech organ" existing within the left superior temporal gyri of humans seems unnecessary. Third, ASTIR's presence in mustached bats is even stronger evidence against brain-volume and handedness hypotheses. The small brains of mustached bats and the large brains of humans are capable of having similar hemispheric differences. Mustached bats also do not have hands or even use tools.
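For reference, the Heisenberg-Gabor Limit invoked in this discussion is commonly written in the form below, where σ<sub>t</sub> and σ<sub>f</sub> denote the standard deviations of a signal's energy distribution in time and in frequency. This is the standard textbook statement; the symbols and this particular normalization are chosen here for illustration and may differ from the notation used elsewhere in this article.

```latex
\sigma_t \, \sigma_f \;\geq\; \frac{1}{4\pi}
```

Under this bound, an analysis window that is longer in time (larger σ<sub>t</sub>) can in principle achieve finer frequency resolution (smaller σ<sub>f</sub>), and a shorter window gains temporal precision at the cost of spectral precision. This is the trade-off that ASTIR maps onto the right and left hemispheres, respectively.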

Nonetheless, this theoretical discussion of neural mechanisms of hemispheric specialization and the evidence supporting their existence in mustached bat auditory cortex raises many questions. Those questions stemming from sex differences are admittedly some of the most difficult: Why do these sex differences exist? What adaptive purpose do they serve? The hypothesis presented above asserts that powerful ethological pressures related to hunting and socialization in mustached bats underlie the development of ASTIR in mustached bat auditory cortex. However, some form of ethological pressure also drove hemispheric specialization for communication in humans and songbirds while leaving some startling exemptions for females in those species. It is likely that the sex differences for hemispheric specialization in mustached bats are present for the same reason similar sex differences are present in humans and songbirds. However, there is no consensus on why these sex differences exist in any of these species. Testosterone levels in utero and during infancy are known to modulate hemispheric specialization for speech and language in humans (Geschwind and Galaburda, 1985; Tallal et al., 1988, 1993; Beech and Beauvois, 2006). However, such mechanistic explanations do not adequately address the ethological question as to why such sex differences evolved in the first place.

To this end, current results in the mustached bat may be more useful for questioning answers than for answering questions. Specifically, anthropologists have associated sex differences in hemispheric specialization for speech and language with the respective hunter and gatherer roles of men and women (Joseph, 2000). This anthropological explanation is inadequate for mustached bats since the males and females in this species are both insectivorous hunters. There are behavioral differences between male and female mustached bats relevant to how they process both biosonar signals and social calls: pulse-CFs<sup>2</sup> are higher in frequency amongst female bats (Suga et al., 1987), males emit social calls more often than females, and the sexes differ in the types of social calls they emit (Clement and Kanwal, 2012). Pharmacobehavioral techniques previously used to determine DSCF frequency resolution (Riquimaroux et al., 1992) could be altered (e.g., unilateral muscimol application) so as to determine the extent to which the left and right hemispheric DSCF areas differ in spectral resolution in male and female mustached bats. Further, field studies of mustached bats that ask behavioral questions framed by comparisons between sexes (e.g., is there finer-tuned velocity tracking amongst males?) would shed light on the reasons for this phenomenon in humans and birds. Such field studies could ultimately have a surprising impact on anthropological theories concerning the evolution of hemispheric specialization for speech and language.

FIGURE 5 | Comparisons of temporal and spectral metrics of neural responses from left (blue) and right (red) hemispheric DSCF neurons in female mustached bats. (A) Latencies of peak DSCF neural responses elicited by a 30-ms tone in the echo-CF2 range paired at onset with a 30-ms tone in the pulse-FM1 range. Left: Bar plot shows the average response latency for 91 left and 77 right hemispheric DSCF neurons. Right: Kernel plot shows the distribution of the same data at left. (B) Selectivity for the rates of FMs centered in the echo-CF2 range and paired at onset with a 30-ms tone in the pulse-FM1 range. Left: Average of normalized curves derived from magnitudes of peak DSCF neural responses elicited by FMs increasing in modulation rate from 0.04 to 4.0 kHz/ms in the left (50 neurons) and right (46 neurons) hemispheres. The abscissa axis shows FM rates from 0.04 to 4.0 kHz/ms and includes a separate demarcation for best tone pairs in the echo-CF2 and pulse-FM1 ranges. The ordinate axis represents the percentage of the average DSCF neuron's peak response to FMs that is elicited at each rate in the 0.04–4.0 kHz/ms range. At the far left is the magnitude of the average DSCF neuron's response to its best tone-pairs as a percentage of its maximum responses to FMs. The dotted line represents the average best FM rate of 0.59 kHz/ms. Right: Pie charts representing the percentage of left hemispheric (top) and right hemispheric (bottom) neurons with best FM rates above (dark) and below (light) the average best FM rate of 0.59 kHz/ms. (C) Selectivity for the bandwidths of FMs centered in the echo-CF2 range and paired at onset with a 30-ms tone in the pulse-FM1 range. Left: Average of normalized curves derived from magnitudes of peak DSCF neural responses elicited by FMs increasing in bandwidth from 0.4 to 7.9 kHz in the left (76 neurons) and right (59 neurons) hemispheres. The abscissa axis shows FM bandwidths from 0.4 to 7.9 kHz. The ordinate axis represents the percentage of the average DSCF neuron's peak response to FMs that is elicited at each bandwidth in the 0.4–7.9 kHz range. The dotted line represents the average best FM bandwidth of 4.5 kHz. Right: Pie charts representing the percentage of left hemispheric (top) and right hemispheric (bottom) neurons with best FM bandwidths above (dark) and below (light) the average best FM bandwidth of 4.5 kHz. (D) Latencies of peak DSCF neural responses elicited by FMs optimized for rate, bandwidth, and amplitude, centered in the echo-CF2 range, and paired at onset with a 30-ms tone in the pulse-FM1 range. Left: Bar plot shows the average peak response latency for 63 left and 75 right hemispheric DSCF neurons. Right: Kernel plot shows the distribution of the same data at left. In females, the average best FM rates, durations, and bandwidths for left hemispheric DSCF neurons were 0.41 kHz/ms ± 0.06 S.E.M. (standard error of the mean), 36.68 ms ± 5.65 S.E.M., and 4.93 kHz ± 0.29 S.E.M., whereas these values for right hemispheric DSCF neurons were 0.44 kHz/ms ± 0.10 S.E.M., 62.32 ms ± 8.69 S.E.M., and 4.60 kHz ± 0.32 S.E.M. Thus, only best FM duration is significantly different such that it is greater amongst right DSCF neurons. Adapted from Washington and Kanwal (2012). Reproduced with the permission of Dr. Jagmeet S. Kanwal and the American Physiological Society.

Neurophysiological studies employing more sophisticated equipment and experimental designs will be needed to fully explore this sex-dependent asymmetry. Mapping studies could determine if males and females have a different spatial distribution of neurons in the auditory cortex or a different morphology of auditory cortical fields. Neuropharmacological techniques previously used in the study of neural selectivity for social calls in bats (Klug et al., 2002) could be employed to manipulate GABA, glutamate, and perhaps even sex hormones to observe how they alter the firing of left and right hemispheric neurons in the auditory cortices of male and female mustached bats. Further, the evidence presented above suggests that right hemispheric DSCF neurons in males would be more selective for CF-type social calls relative to left hemispheric DSCF neurons in male bats or DSCF neurons of either hemisphere in female bats. An otherwise rigorous study of hemispheric differences in the processing of social calls in the DSCF area did not address this key question (Kanwal, 2012). Neuroimaging could determine not only this sex difference's consistency across animals but also the extent to which hemispheric specialization for audiovocal communication in general pervades the mustached bat auditory cortex, much as neuroimaging studies have in songbird nuclei (Poirier et al., 2009). Far from being a simple scientific anomaly, sex-dependent ASTIR in mustached bats may inspire experiments that will unravel persistent neurophysiological, phonological, and anthropological mysteries.

## Acknowledgments

We thank Dr. Robert Rudnitsky of the National Institute of Standards and Technology for in-depth editorial comments on this manuscript. Insights and critiques of the scientific ideas proposed here were given by (in alphabetical surname order) Drs. Pascal Belin, Iain DeWitt, Jeffrey Krichmar, William Parke, and Maximilian Riesenhuber. We would like to thank Dr. Jagmeet Kanwal, whose oversight and support of Dr. Washington's PhD thesis led to many of the ideas proposed here and for his permission to use the data presented here. We would also like to thank the American Physiological Society for permitting us to reproduce previously published data here. Ultimately, we would like to thank Drs. Vittorio Gallo, John VanMeter and Edward Healton for their support and for the support of their respective institutions, namely the Center for Neuroscience Research at Children's National Medical Center, the Center for Functional and Molecular Imaging at Georgetown University, and Georgetown University Medical Center's Department of Neurology. Some of the ideas presented here were generated by SW while being supported by DC02054 and DC008822 (to J. S. Kanwal), DC75763 (to SW), and HD046388 (to V. Gallo). We would also like to credit the anonymous artist whose public domain, freely-downloadable animated gif of a flying bat was incorporated into a figure in Supplementary Section 2.

## Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnins.2015.00143/abstract

## References

and a program for research. Arch. Neurol. 42, 428–459. doi: 10.1001/archneur.1985.04060050026008


Joos, M. (1948). Acoustic Phonetics. Baltimore, MD: Linguistic Society of America.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Washington and Tillinghast. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Functional significance of the electrocorticographic auditory responses in the premotor cortex

Kazuyo Tanji<sup>1</sup>\*, Kaori Sakurada<sup>2</sup>, Hayato Funiu<sup>2</sup>, Kenichiro Matsuda<sup>2</sup>, Takamasa Kayama<sup>2</sup>, Sayuri Ito<sup>1</sup> and Kyoko Suzuki<sup>1</sup>

*<sup>1</sup> Department of Clinical Neuroscience, Yamagata University Graduate School of Medicine, Yamagata, Japan, <sup>2</sup> Department of Neurosurgery, Yamagata University Graduate School of Medicine, Yamagata, Japan*

#### Edited by:

*Monica Munoz-Lopez, University of Castilla-La Mancha, Spain*

#### Reviewed by:

*Daniela Sammler, Max Planck Institute for Human Cognitive and Brain Sciences, Germany; Kiyoshi Kurata, Hirosaki University Graduate School of Medicine, Japan*

#### \*Correspondence:

*Kazuyo Tanji, Department of Clinical Neuroscience, Yamagata University Graduate School of Medicine, Iida-nishi 2-2-2, Yamagata 990-9585, Japan; kaztanji@gmail.com*

#### Specialty section:

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience*

> Received: *13 December 2014* Accepted: *22 February 2015* Published: *16 March 2015*

#### Citation:

*Tanji K, Sakurada K, Funiu H, Matsuda K, Kayama T, Ito S and Suzuki K (2015) Functional significance of the electrocorticographic auditory responses in the premotor cortex. Front. Neurosci. 9:78. doi: 10.3389/fnins.2015.00078*

Other than well-known motor activities in the precentral gyrus, functional magnetic resonance imaging (fMRI) studies have found that the ventral part of the precentral gyrus is activated in response to linguistic auditory stimuli. It has been proposed that the premotor cortex in the precentral gyrus is responsible for the comprehension of speech, but the precise function of this area is still debated because patients with frontal lesions that include the precentral gyrus do not exhibit disturbances in speech comprehension. We report on a patient who underwent resection of a tumor in the precentral gyrus with electrocorticographic recordings while she performed the verb generation task during awake brain craniotomy. Consistent with previous fMRI studies, high-gamma band auditory activity was observed in the precentral gyrus. Due to the location of the tumor, the patient underwent resection of the auditory responsive precentral area, which resulted in the post-operative expression of a characteristic articulatory disturbance known as apraxia of speech (AOS). The language function of the patient was otherwise preserved and she exhibited intact comprehension of both spoken and written language. The present findings demonstrated that a lesion restricted to the ventral precentral gyrus is sufficient for the expression of AOS and suggest that the auditory-responsive area plays an important role in the execution of fluent speech rather than the comprehension of speech. These findings also confirm that the function of the premotor area is predominantly motor in nature and that its sensory responses are more consistent with the "sensory theory of speech production," in which it was proposed that sensory representations are used to guide motor-articulatory processes.

Keywords: premotor area, apraxia of speech, TMS, mirror neuron, motor theory of speech perception

## Introduction

Other than well-known motor activities in the precentral gyrus, functional magnetic resonance imaging (fMRI) studies have found that the ventral part of the precentral gyrus is activated in response to linguistic auditory stimuli (Wilson et al., 2004). It has been suggested that this activity is best interpreted according to the "motor theory of speech perception" which argues that phonetic information is perceived in a "module," or a biologically based link between perception and production specialized to detect the intended gestures of the speaker, rather than by translation from auditory impressions (Liberman and Mattingly, 1985). Accordingly, a number of recent studies have demonstrated that transcranial magnetic stimulation (TMS) to the left ventral premotor cortex modulates the efficiency and accuracy of phoneme comprehension. For example, deactivation of the left premotor cortex with repetitive TMS is associated with a decline in the accuracy of auditory syllable perception (Meister et al., 2007) while double TMS to the ventral precentral gyrus facilitates reaction time during speech perception (D'Ausilio et al., 2009). These results support the involvement of the motor cortices in phoneme comprehension which, in turn, underlies the motor theory of speech perception. However, it may not be possible to link these relatively minor TMS-induced behavioral effects with the symptoms of patients suffering from physical lesions in the precentral gyrus. Such TMS effects are reported to emerge only when the speech sounds are partially ambiguous with addition of noise and/or when the behavioral measure is reaction time rather than accuracy (Sato et al., 2009; Hickok et al., 2011). 
Additionally, clinical studies have clearly established that lesions in the precentral gyrus are associated with articulatory disturbances rather than speech comprehension disturbances (Duffy, 2012). This suggests that the auditory response in the precentral gyrus is better interpreted with the "sensory theory of speech production," in which sensory representations are proposed to guide motor-articulatory processes, the reverse of the relation proposed in motor theories of speech perception (Venezia and Hickok, 2009; Hickok et al., 2011).

Cortical speech motor disorders are typically classified into two categories: dysarthria and apraxia of speech (AOS). Dysarthria is defined as a group of speech disorders resulting from disturbances in muscular control of the speech mechanism (Darley et al., 1975). However, this term should be interpreted with caution because speech disturbances that result from cortical lesions are often broadly classified using the generic term "cortical dysarthria" and do not include a qualitative description of the speech disturbance (e.g., Kim et al., 2003). Patients with AOS could easily be misclassified as having cortical dysarthria. AOS, also known as pure anarthria or aphemia, is characterized by unpredictable and irregular errors in the absence of paralytic disorders and linguistic disturbances. Although the original definition of AOS simply states that the condition is a motor planning or programming disturbance (Darley et al., 1975), the actual symptoms of AOS are not homogeneous, creating frequent disagreements in terms of diagnosis, even among experienced speech pathologists (Haley et al., 2012). It is likely that this debate continues because most patients do not display the pure symptoms of AOS (Laganaro, 2012), many cases present with comorbid AOS and dysarthria (Duffy, 2012), and, often, AOS is part of an aphasia syndrome.

A number of studies have attempted to better characterize the locations of lesions that accompany AOS (Dronkers, 1996; Hillis et al., 2004; Richardson et al., 2012). It has been proposed that lesioning of several structures, including the insula and Broca's area, is crucial for development of AOS. However, the exact nature of the relationship between lesion location and clinical symptoms has yet to be established. This may be due to the fact that pure cases of AOS are rare and that most data on this disorder are based on findings from cases with large lesions in which AOS manifests as a component of an aphasia syndrome. Although some researchers have argued that damage to Broca's area is crucial for development of AOS (Hillis et al., 2004), this notion is not consistent with the fact that lesions restricted to Broca's area do not result in AOS (Mohr et al., 1978). Moreover, such studies typically include patients with large lesions that span the prefrontal cortex, the precentral gyrus, and the underlying white matter (Hillis et al., 2004).

On the other hand, AOS has consistently been associated with lesions that are restricted to the precentral gyrus. The first such case that included a sufficient description of both the location of the lesion and the qualities of the articulatory disturbance was reported by Lecours and Lhermitte (1976). These authors found that phonetic disintegration syndrome, which is disorganization of the exclusive choice of a certain number of features and their largely cotemporal integration into a more complex unit (the phoneme), resulted from a lesion restricted to the ventral precentral gyrus. This report was followed by several studies describing lesions that were generally restricted to the precentral gyrus but that exhibited heterogeneous clinical symptoms. Although a majority of these cases featured a phonetic disorder (such as a distortion in speech), there were also cases in which precentral lesions led to phonemic issues with sequential errors (Sasanuma, 1971; Tanji et al., 2001), cases that were characterized by phonemic issues without sequential errors (Larner et al., 2004), and cases that were characterized by a foreign accent syndrome (Sakurai et al., 2014). It is notable that the clinical symptoms associated with lesions in the precentral gyrus, which tends to be regarded as a structure supporting relatively simple motor functions, are so diverse. Based on the findings of these case studies, it is clear that the exact relationship between a lesion and the symptoms thereof in terms of articulatory disturbances, even within the precentral gyrus, has yet to be established. Such heterogeneity is likely the result of not only variability in the location of the lesioned area but also variations of functional organization within the precentral gyrus.

In fact, organization within the precentral gyrus is not as simple as is generally believed. The term "precentral gyrus" is often used as a synonym for the primary motor area, or Brodmann area (BA) 4, and the fact that the convex surface thereof is mostly occupied by the premotor cortex, or BA 6 (Rizzolatti et al., 1998), is often overlooked. Depending on its precise location within the precentral gyrus, a lesion could affect BA 4 and BA 6 to varying degrees and this may explain the co-occurrence of dysarthria and AOS. Additionally, it has been proposed that dissociation of premotor function occurs along the dorsoventral axis (Duffau, 2003). Given the complexities of the articulatory disorders that result from precentral lesions, it is possible that other unknown organizing principles also influence the functional distribution of premotor activity.

Neuroimaging studies affording excellent temporal and spatial resolution should help reveal the nature of such organizing principles and elucidate their functional distribution within the precentral gyrus. Thus, the present study details the electrophysiological responses induced by a verb generation task in the precentral premotor areas of a patient undergoing an awake craniotomy for tumor resection. The present study used electrocorticography (ECoG) in an attempt to confirm the reproducibility of the auditory activity that has been previously reported by fMRI studies (Wilson et al., 2004). It was expected that the distribution of the auditory activity within the precentral gyrus in response to linguistic auditory stimuli would be revealed, due to the excellent temporal and spatial resolution provided by direct cortical recording. To this end, the present study used equivalent linguistic stimuli in the form of written words (a visual modality) to confirm whether the activity was modality-specific or cross-modal; to the best of our knowledge, this has never been assessed in humans. Additionally, a description of the articulatory disorder in this patient that developed following the resection of this area of the precentral gyrus is provided.

## Materials and Methods

## Subject

A 52-year-old right-handed woman was admitted to a hospital after an episode of partial seizure with secondary generalization. Magnetic resonance imaging revealed a brain tumor in the left precentral gyrus (**Figure 1**), and she was referred to our hospital for surgical consultation. She underwent an awake craniotomy for resection of the tumor and, at the same time, participated in this language mapping experiment. This study was approved by the ethics committee of the Faculty of Medicine at Yamagata University and posed no additional risk to the patient. The patient provided signed informed consent after a detailed explanation of the procedure. She did not have any preoperative disturbance in language or motor function, except for very mild dysarthria. The Wechsler Adult Intelligence Scale-III (WAIS-III) revealed a verbal intelligence quotient (IQ) of 96, a performance IQ of 95, and a full-scale IQ of 95. The patient was initially anesthetized with propofol but all medication was discontinued prior to the start of the experiment.

## Stimuli and Tasks

The task began when a fixation cross appeared at the center of a screen. Subsequently, a series of nouns was presented and the patient was asked to think of an associated verb for each noun and then make an overt response after the "Go" signal, which was visually displayed 2 s after presentation of each noun (delayed response design). The nouns were presented either visually or auditorily in separate trial runs. In the first run, the patient was asked only to listen to auditorily presented nouns. In the remaining runs, the patient was asked to make an overt response after the "Go" signal to each noun; the nouns were presented auditorily in the second and fourth runs and visually in the third and fifth runs. The delayed response design was intended to separate the neural activities that were related to sensory stimulus processing, motor processing, and the delay period. Each run consisted of the presentation of 33 words and, thus, a total of 66 words were presented for the verb generation tasks in each modality.

While in the operating room, all auditory stimuli were presented at a comfortable sound level via a loudspeaker placed 1 m away from the patient, and all visual stimuli were presented in the center of a computer monitor (refresh rate: 60 Hz) placed 0.8 m from the patient; all nouns were presented with an interstimulus interval of 6 s. Only concrete high-familiarity nouns were used for this task and the patient was familiarized with the procedure prior to her operation. The intraoperative performance of the patient was 92.4% for the auditory task and 97.0% for the visual task. The electrocorticogram was recorded from electrodes that were temporarily placed directly on and surrounding the precentral gyrus, with an inter-electrode distance of 5 mm (**Figures 2A,B**). Only signals from electrodes that covered the pericentral gyri were analyzed in the present study.

## Data Analysis

The potentials at each electrode were re-referenced to an intracranial average electrode. The signals were digitized at 1000 Hz, recorded onto a computer hard disk, and bandpass-filtered at 0.1–300 Hz. Any trial containing an epileptic discharge or other artifact was eliminated from further analysis. Event-related spectral perturbation (ERSP) measures the average dynamic changes in the amplitude of the broadband electroencephalographic frequency spectrum as a function of time relative to an experimental event (Makeig, 1993). In the present study, time-frequency analyses were conducted using three-cycle standard Morlet wavelets at each frequency from 2.9 to 150 Hz progressing through 6.5 s data epochs (from 1.5 s prior to and 5 s after the stimulus onset) in 26.8 ms steps. A bootstrap resampling method was used to test whether the ERSP deviations in spectral power in the post-stimulus interval were significantly larger than in the pre-stimulus period. Bootstrapping addresses the significance of deviations from pre-stimulus baseline power by randomly resampling the spectral estimates of the selected pre-stimulus epoch data of each trial and then averaging these, thus constructing a surrogate baseline data distribution. Additionally, the False Discovery Rate (FDR) algorithm was applied to correct for multiple comparisons. The ERSP plots provided time-frequency points at which the mean log power was significantly higher or lower (bootstrap, p < 0.05, FDR-corrected) than the mean power during the 1.5 s pre-stimulus baseline period for the same epochs. The increase in power in a given electrode was considered to be significant if a cluster of pixels of power larger than 1.5 dB in the gamma band range was arranged in a continuous array spanning at least 40 Hz in width. The clusters that fulfilled this condition were detected with the Matlab function bwconncomp.m, which finds connected components in an image.
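The multiple-comparison correction and the cluster criterion described above can be illustrated with a minimal Python sketch. This is not the Matlab code used in the study. The function names are hypothetical, the 60–140 Hz band limits are borrowed from the band plotted in Figure 3 rather than stated for this criterion, "spanning at least 40 Hz in width" is read here as the frequency extent of a single connected cluster, and 8-connectivity is assumed because it is, to our understanding, the default of `bwconncomp` for two-dimensional images.

```python
from collections import deque


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg FDR: flag the p-values that survive at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0  # largest 1-based rank k with p_(k) <= (k / m) * q
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    keep = [False] * m
    for idx in order[:cutoff]:
        keep[idx] = True
    return keep


def cluster_spans(power_db, freqs_hz, min_db=1.5, band=(60.0, 140.0)):
    """Frequency extent (Hz) of each 8-connected supra-threshold cluster.

    power_db[i][j] is the ERSP value (dB) at frequency freqs_hz[i] and time
    bin j, with non-significant points assumed to be already set to 0.
    The band limits are an assumption, not a value stated in the text.
    """
    n_f, n_t = len(power_db), len(power_db[0])
    mask = [[band[0] <= freqs_hz[i] <= band[1] and power_db[i][j] > min_db
             for j in range(n_t)] for i in range(n_f)]
    seen = [[False] * n_t for _ in range(n_f)]
    spans = []
    for i in range(n_f):
        for j in range(n_t):
            if not mask[i][j] or seen[i][j]:
                continue
            seen[i][j] = True
            queue = deque([(i, j)])
            lo = hi = freqs_hz[i]
            while queue:  # breadth-first flood fill over the 8 neighbours
                a, b = queue.popleft()
                lo, hi = min(lo, freqs_hz[a]), max(hi, freqs_hz[a])
                for x in (a - 1, a, a + 1):
                    for y in (b - 1, b, b + 1):
                        if (0 <= x < n_f and 0 <= y < n_t
                                and mask[x][y] and not seen[x][y]):
                            seen[x][y] = True
                            queue.append((x, y))
            spans.append(hi - lo)
    return spans


def electrode_is_significant(power_db, freqs_hz, min_span_hz=40.0):
    """True if any cluster spans at least min_span_hz along the frequency axis."""
    return any(s >= min_span_hz for s in cluster_spans(power_db, freqs_hz))
```

In this sketch, the mask returned by `benjamini_hochberg` would be applied to the bootstrap p-values of all time-frequency points before the surviving power values are passed to `electrode_is_significant`.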

## Speech Analysis (Post-Operative)

In the present case study, resection of the ventral portion of the precentral gyrus was inevitable due to the following clinical findings: (1) in addition to pathological gadolinium enhancement on preoperative MRI, which is associated with more aggressive lesions (Upadhyay and Waldman, 2011), thallium uptake was high on preoperative single photon emission computed tomography (SPECT), which is known to be a good indicator of histological grade (Comte et al., 2006); and (2) a histological diagnosis of glioblastoma multiforme was made on both the intraoperative and post-operative histological examinations. Her post-operative speech findings were first evaluated 1 month after the surgery with the Japanese Standard Language Test of Aphasia (SLTA), which is the standardized test battery most commonly used to evaluate Japanese aphasic patients. This test consists of 26 subtests addressing four modalities (speaking, listening, writing, and reading), three component levels (phoneme, word, and sentence), and two character types of the Japanese language: kanji (morphograms) and kana (syllabograms). Because the patient developed non-aphasic speech disturbances post-operatively, the qualitative features of her articulatory disturbance were recorded on an IC-recorder and analyzed. One month after her surgery, the articulation of the patient was evaluated at the word level using the same list of 180 words with 3–5 syllables to examine her repetition and reading aloud abilities on the same day. The error types were classified as follows: sequential phonemic error, omission (of the word-initial consonant), non-sequential phonemic error, and distortion. Sequential errors were defined as errors caused by the inadequate ordering of phonemes or syllables in which the distance between the original position of a target and its actual position was within two syllables.
As described previously (LaPointe and Horner, 1976) these errors were subdivided as follows: pre-positioning, in which a phoneme is replaced by one that occurs later in the word; post-positioning, in which a phoneme is replaced by one that occurs earlier in the word; and metathesis, in which two phonemes switch places. Classification as a non-sequential phonemic error was made conservatively, such that a phoneme was considered to fit into the distortion category if it shared the place of articulation with the target syllable even when it was judged to be free of distortion. Two experienced neurologists and an experienced speech pathologist transcribed the speech sample of the patient. First, one examiner orthographically transcribed the speech sample and the resulting transcript was then independently verified or modified by the other two examiners, yielding a total of three transcripts. These three transcripts were then compared and consolidated to produce a composite transcript that reflected the consensus of at least two of the three examiners. No further analyses were performed on utterances for which no agreement could be reached. If the patient discontinued articulating a word halfway through her verbalization thereof, the errors in the discontinued word were evaluated if it was intelligible and decipherable from the target word, or by her corrected utterance.
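The subdivision of sequential errors described above can be made concrete with a short Python sketch. It is an illustration only and not the procedure applied by the examiners: the function name is hypothetical, words are assumed to be already segmented into equal-length lists of units (so omissions are not handled), the two-position window is applied to whatever unit the lists contain, and the "equivocal" label is assumed to mean that the intruding unit occurs both earlier and later in the target.

```python
def classify_sequential_errors(target, produced, max_distance=2):
    """Label each mismatched position between a target and a produced word.

    Both arguments are equal-length lists of units (phonemes or syllables).
    """
    if len(target) != len(produced):
        raise ValueError("this sketch only handles substitutions")
    wrong = [i for i, (t, p) in enumerate(zip(target, produced)) if t != p]

    # Metathesis: exactly two units have switched places.
    if len(wrong) == 2:
        i, j = wrong
        if (j - i <= max_distance
                and produced[i] == target[j] and produced[j] == target[i]):
            return [(i, "metathesis"), (j, "metathesis")]

    labels = []
    for i in wrong:
        # Does the intruding unit belong later or earlier in the target word?
        later = produced[i] in target[i + 1:i + 1 + max_distance]
        earlier = produced[i] in target[max(0, i - max_distance):i]
        if later and earlier:
            label = "equivocal"          # assumed meaning of "equivocal"
        elif later:
            label = "pre-positioning"    # replaced by a unit that occurs later
        elif earlier:
            label = "post-positioning"   # replaced by a unit that occurs earlier
        else:
            label = "non-sequential"
        labels.append((i, label))
    return labels
```

For a hypothetical target such as `["ta", "ma", "go"]`, the production `["ma", "ma", "go"]` would be labeled as pre-positioning at the first position, whereas `["ta", "ta", "go"]` would be labeled as post-positioning at the second position.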

## Results

## ECoG Findings from the Verb Generation Task

The recorded area was localized based on the central sulcus and the precentral sulcus, which were identified according to the hand-knob sign (Yousry et al., 1997), and preoperative fMRI was used to locate the hand motor area (thumb opposition task). Time-frequency analysis of the electrocorticogram revealed significant responses characterized by broadband high-gamma activity in multiple electrodes on the precentral gyrus (**Figure 2C**). Whereas early-phase responses that were time-locked to the stimulus (noun) presentation were observed under auditory conditions, these responses were not observed following visual presentation of stimuli (**Figures 2D,E**, **3**). Five electrodes in the precentral gyrus exhibited activity in response to auditory stimuli (Channels 29, 30, 38, 40, and 50; **Figure 2D**). In contrast, late-phase responses that were time-locked to the "Go" signal were observed in electrodes common to both conditions (**Figures 2D,E**) and represented the motor response. All of the sites exhibiting auditory responses were characterized by biphasic activity, with early and late phases (e.g., Channels 30 and 50; **Figure 3**). Also, some electrodes exhibited only late-phase activity that was not accompanied by any sensory activity (e.g., Channels 28 and 41; **Figure 3**), and likely corresponded to a purely motor area. Passive listening to auditory presentation of the noun (without a verbal response) induced significant responses from the set of electrodes that exhibited early-phase responses in the auditory verb generation task (**Supplementary Figure 1**). It is of note that adjacent electrodes often displayed completely different response patterns (e.g., Channel 50 vs. Channel 41; **Figure 2D**).

FIGURE 3 | Power curve in the 60–140 Hz band under auditory (red) and visual (green) conditions. The upper row shows biphasic responses, in which the auditory-selective response is accompanied by the motor response (Channels 30 and 50), and the lower row shows monophasic, purely motor responses (Channels 28 and 41). The end of stimulus presentation is indicated by the dotted line at 0.65 s, while the other dotted line indicates the timing of the "Go" signal at 2.65 s. The gray line indicates the mean voice onset time of the subject's response.
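The distinction between biphasic and monophasic channels drawn above can be sketched in Python as a band-averaged power curve followed by a simple window check. This is a simplified illustration, not the analysis reported here, in which significance was assessed with bootstrap statistics. The function names, the 1.5 dB threshold, and the rule of splitting the epoch at the "Go" signal (2.65 s after stimulus onset, as in Figure 3) are assumptions.

```python
def band_power_curve(power_db, freqs_hz, band=(60.0, 140.0)):
    """Average the time-frequency power (dB) over the rows inside `band`."""
    rows = [row for f, row in zip(freqs_hz, power_db)
            if band[0] <= f <= band[1]]
    if not rows:
        raise ValueError("no frequency falls inside the requested band")
    return [sum(col) / len(rows) for col in zip(*rows)]


def response_pattern(times_s, curve_db, go_time_s=2.65, threshold_db=1.5):
    """Label a channel by whether band power exceeds the threshold
    before and/or after the 'Go' signal (time 0 = stimulus onset)."""
    early = any(0.0 <= t < go_time_s and v > threshold_db
                for t, v in zip(times_s, curve_db))
    late = any(t >= go_time_s and v > threshold_db
               for t, v in zip(times_s, curve_db))
    if early and late:
        return "biphasic"
    if late:
        return "monophasic (late only)"
    if early:
        return "early only"
    return "no response"
```

Under these assumptions, a channel such as Channel 50 would be expected to return "biphasic" in the auditory condition, and a channel such as Channel 41 would be expected to return "monophasic (late only)".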

## Post-Operative Speech Findings

In the present case study, the ventral portion of the precentral area, which included an auditory-responsive cluster, had to be resected due to the location of the tumor. Excision of the tumor was done with minimal damage to surrounding structures, with an extent of resection (EOR) of 100% (**Figure 4**). Postoperatively, the patient suffered from AOS. She did not show any apparent orofacial motor weakness or buccofacial apraxia and her language function, including auditory and reading comprehension of syllables, words, and sentences, and the writing of words and sentences, was normal according to the SLTA: the patient achieved full scores (10 out of 10 questions) in all of the following 9 subscores of the SLTA which were relevant for the present evaluation, namely: "auditory comprehension of words," "auditory comprehension of short sentences," "auditory comprehension (to obey verbal commands)," "auditory comprehension of syllables," "written naming of pictures with kanji letters," "written naming of pictures with kana letters," "dictation of kana letters," "writing to dictation with kanji words," "writing to dictation of kana words." Her speech was characterized by frequent distortion, a slow overall rate with abnormal prosody, lengthened segment durations and segmentation, and variable articulatory disturbances including sound distortions, substitutions, omissions, and sequential errors. Some of the errors were interpreted to be phonemic rather than phonetic based on their sequential nature (**Table 1**). In the word-level reading and repetition task, sequential errors were identified in 15.8% of words (15.6% during repetition and 16.1% during reading). Of the 57 sequential errors, 53 (93%) were pre-positioning, three (5.6%) were post-positioning, and one was equivocal. Metathesis was not observed. Omissions of the word-initial consonant were identified in 8.0% of the words, non-sequential phonemic errors in 13.3% of the words, and distortion in 21.7% of the words.
An additional characteristic finding was that when the patient had trouble pronouncing the syllable "mu" and was unsuccessful even after several attempts, she would begin with the syllable "ma" and say "ma mi mu," which is a part of the M row of the "50-on," an overlearned Japanese kana syllabary. The patient strategically used the 50-on syllabary as a cue and could eventually pronounce "mu" successfully. Similar behavior was observed six times during the session.

## Discussion

In the present study, we recorded ECoG from the pericentral region of a patient with a brain tumor in this area during an awake craniotomy. Activities were observed in the precentral gyrus in response to auditory, but not visual, stimuli, which suggests that the observed responses did not reflect general linguistic processes, such as word production or comprehension, or the premotor preparatory activity that precedes articulation, because no activities were observed in response to the visual presentation of nouns in the same task. Resection of a part of the ventral precentral area, from which a positive ECoG auditory response was detected (Channel 50), was inevitable due to the location of the tumor. Consistent with previous case studies of patients with lesions in the precentral gyrus, the present patient did not develop any post-operative comprehension disorders. However, she did develop AOS which was characterized by dysprosodic, slow, irregular speech with distortion and phoneme substitutions, including multiple sequential errors (pre-positioning and post-positioning). This was likely because the lesion included the auditory-responsive precentral premotor area.

## Role of the Auditory-Responsive Ventral Premotor Subarea in Speech Production

In clinical studies, the uniformity of functional structure within the precentral gyrus has rarely been questioned, except for the distinction between BA 4 and BA 6. In experimental studies using monkeys, the ventral premotor area (PMv) is thought to specialize in direct sensory-motor mapping while the dorsal premotor area (PMd) is thought to be involved in indirect sensory-motor mapping (Hoshi and Tanji, 2004). PMv in monkeys has been extensively discussed as a structure associated with mirror neurons, which fire during the execution of an action as well as during the observation of an action, and are proposed to support the understanding of others' actions via motor simulation (Di Pellegrino et al., 1992). A large number of neuroimaging studies have demonstrated the existence of the mirror mechanism in humans in the posterior inferior frontal area and the PMv (Rizzolatti and Craighero, 2004). Some see BA 44 as the human homolog of macaque F5 (Rizzolatti and Arbib, 1998), but others see BA 6 as the human homolog of F5 (Morin and Grèzes, 2008). Mirror neuron findings were generalized to speech understanding (Rizzolatti and Arbib, 1998) based on analogy to the motor theory of speech perception (Liberman and Mattingly, 1985). However, as was mentioned in the Introduction, it is increasingly clear that the motor system may be crucially recruited for speech perception only under certain conditions that make speech discrimination hard (D'Ausilio et al., 2012; Krieger-Redwood et al., 2013). Consistent with these recent findings, our patient performed flawlessly on all standard tasks evaluating language comprehension, including the phoneme identification task. Although only parts of the auditory-responsive area were resected and it is possible that the remaining regions could account for intact speech perception in this case, previously reported cases with lesions restricted to the precentral gyrus also showed intact phoneme identification in the same sets of tasks (Tanji et al., 2001; Kasahata, 2013). Also, if premotor neurons are responsible for understanding of a certain behavior, it would be expected that the auditory response accompanied by this understanding should be cross-modal, as was proposed following the discovery of audiovisual mirror neurons in monkeys (Kohler et al., 2002). However, in the present study, the dissociation between the neural activities induced by auditorily and visually presented words suggests that this process was sensory-specific (auditory, in this case). Taken together, these observations suggest that the auditory-motor response observed in our case is unlikely to reflect the proposed "mirror" property.

FIGURE 4 | (A) Post-operative axial MR image of the resected region (arrow) in a slice most similar to Figure 1B. (B) Resected area (black) superimposed on the cortical surface, with the positioning of the electrodes.

The auditory-motor activation of the PMv in our case might be better explained by a recently proposed model of language processing comprising two parallel streams, in which the dorsal stream serves auditory–motor integration (Hickok and Poeppel, 2007). The dorsal auditory stream was found to be closely linked with the vocal motor control system, and PMv was proposed to serve as a key component in theories that propose production-oriented auditory processing in the dorsal stream (Houde and Nagarajan, 2011; Guenther and Vladusich, 2012). The DIVA (Directions Into Velocities of Articulators) model, a neural model of speech acquisition and production that provides a conceptual and computational framework for interpreting data concerning brain activity during speech and language tasks (Guenther et al., 2006), proposes that speech sound map neurons gradually acquire the feedforward motor command programs corresponding to the auditory target sounds. The link between perception and action thus arises in the DIVA model because the motor reference frame is brought into register with the auditory reference frame. The model therefore predicts a causal relationship between the speech sounds acquired in auditory coordinates and their associated motor programs (Guenther and Vladusich, 2012). Consistent with the DIVA model, the state feedback control (SFC) model proposes that auditory input is used not only for comprehension during listening but also for speech production. In this model, physical auditory feedback of one's own speech is compared with a prediction derived from an efference copy of the motor output and is used to train and update the internal model. It was postulated that the auditory-responsive portion of the ventral premotor area is ideally placed to serve this intermediary role that mediates the prediction and correction processes running between motor and sensory cortices (Houde and Nagarajan, 2011). 
The fact that PMv is active during passive listening to speech is consistent with the reciprocal connections between premotor and sensory areas and was regarded as evidence that the premotor cortex plays such an intermediary role in speech production.
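The comparison between predicted and actual auditory feedback described above can be illustrated with a toy Python sketch. It is not the published SFC or DIVA implementation: the single scalar gain, the delta-rule update, the learning rate, and the function name are all simplifying assumptions chosen only to show how a prediction error derived from an efference copy could be used to update an internal model.

```python
def train_forward_model(commands, true_gain=2.0, initial_gain=1.0,
                        learning_rate=0.2):
    """Toy forward-model update driven by auditory prediction errors.

    The 'speech plant' maps a motor command to an auditory outcome through a
    fixed gain; the internal model holds an estimate of that gain.
    """
    gain = initial_gain
    errors = []
    for command in commands:
        predicted = gain * command       # prediction from the efference copy
        heard = true_gain * command      # actual auditory feedback
        error = heard - predicted        # sensory prediction error
        gain += learning_rate * error * command  # delta-rule model update
        errors.append(error)
    return gain, errors
```

With repeated practice on a small set of hypothetical commands, for example `train_forward_model([0.5, 1.0, 1.5] * 10)`, the estimated gain approaches the true gain and the prediction error shrinks, which is the qualitative behavior that the prediction and correction loop is meant to capture.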

A recent study investigated ECoG responses in the premotor area during the articulation of pseudo-words under two conditions: one in which the subjects were required to repeat the presented syllables and one in which the subjects were required to respond using pre-associated pseudo-words (Cogan et al., 2014). Under both conditions, the decoding of the initial auditory response in the premotor cortex successfully predicted the paired response, which is consistent with the existence of the representation of "parity" between sensory and motor processes in the premotor cortex.

#### Implication for the Pathophysiology of AOS

A recent study investigating the lesion distributions of AOS patients without concomitant aphasia found that the lesions were predominantly located in the left premotor cortex (BA 6) (Jacks et al., 2010). Furthermore, single-case studies on patients with lesions contained within the precentral gyrus have repeatedly reported the development of AOS (e.g., Lecours and Lhermitte, 1976). Along with these case studies, the present findings confirm that complex speech disturbances characterized by prosodic abnormalities, inconsistent distortion, and phonemic inaccuracies with sequential errors, result from lesions in the precentral gyrus.

Several issues render the diagnosis of AOS difficult, and indeed debatable (see Introduction), such as the problem of whether articulatory disturbances should be classified as purely phonetic instead of phonemic. While AOS is predominantly characterized by distorted sounds not attributable to deficits in muscle tone or reflexes, which would reflect disturbances during the encoding of phonological patterns into appropriate speech movements (Canter et al., 1985), it is argued that speech disturbances accompanied by good acoustically and perceptually produced sounds that are missequenced should be classified as phonemic paraphasias, which are typically associated with posterior lesions (McNeil et al., 2009). In fact, it has been asserted that the debate over whether AOS is a phonological disorder or a motor programming disorder is fatuous because, by definition, AOS is a motor planning/programming disorder (McNeil et al., 2009). The clear-cut demarcation of an abstract amodal phonology from the motor mechanisms of speaking supports these arguments (Ziegler et al., 2012). However, it is challenging to delineate a clear boundary between AOS and phonological paraphasias based on the distinction between phonemic and phonetic errors because patients with AOS often produce apparently well-articulated phonemic errors (Goodglass, 1993; Ziegler et al., 2012), as in the present case. Lapointe and Johns found that all 13 of their AOS patients produced sequential errors, albeit in various proportions from 0.8 to 20% of all errors (La Pointe and Johns, 1975). In other studies, extreme cases of AOS have been reported. One such patient with a frontal lesion that included the precentral gyrus was primarily characterized by metathesis in which approximately 90% of the phonemic errors were classified as sequential errors (Sasanuma, 1971). 
An MRI that was performed later revealed a cortical lesion that involved the left precentral gyrus and the insula and extended to the underlying subcortical regions (Kawachi, 1987). Another similar case with a lesion that was restricted to the precentral gyrus and insula developed predominantly sequential errors (52% of all phonemic errors) that were typically due to pre-positioning (Tanji et al., 2001). Lesions common to both cases involved the ventral precentral gyrus and the insula. These authors suggested that these regions may be responsible for articulatory sequencing during which the proper timing for each phoneme and syllable is coded (Tanji et al., 2001). In the present case, sequential errors of phonemes or syllables were identified in 16% of the words that were repeated or read aloud. Although it remains uncertain whether or not non-sequential phonemic errors result from random distortion, sequential errors can be confidently identified as phonemic because these types of errors reflect the influence of contextual phonemic information. Although it has been argued that sequential errors result exclusively from disturbances in the phonological system of the posterior language area (McNeil et al., 2009), cases such as these indicate that sequential errors can also be caused by lesions localized to the premotor area. It has been suggested that the primitives of speech motor programs are the size of syllables and that the overlearned syllable-sized programs form a mental syllabary that is stored in the ventral premotor cortex (Levelt, 2001). The cited authors further proposed that word forms comprised of two or more syllables are not stored as pre-specified motor routines and, consequently, when a word is articulated, pre-compiled programs must be assembled into a sequence. Hence, this type of sequencing of syllables into larger utterances must be performed online during speech production and should depend on the motor execution system, including the premotor cortex. 
The mechanism to "bind" the temporally segregated sub-entities of these events into a single unified entity was proposed to be provided by the PMv (Fiebach and Schubotz, 2006). This is also consistent with the existence of the motor phonological system (Hickok et al., 2011), which implies that the premotor cortex is involved in the phonological process.

Another finding from the present study that supports the division of labor between the sensory phonological and motor phonological processes is the characteristic groping behavior regarding the articulation of "mu" (see Results). The most likely interpretation of this behavior is that the patient in the present study had a clear idea of the target syllable in terms of sensory phonology, which is supported by the fact that she was able to write the letter correctly, but she was not able to retrieve an articulatory motor counterpart therefor. However, she was able to overcome this issue using the "50-on," an overlearned Japanese kana-syllabary, as a contextual cue. This is consistent with the well-accepted characterization of AOS: The affected patient cannot speak properly although he or she is aware of what he or she wants to say and how it should sound (Hillis et al., 2004). As in the present case, some of the struggling behavior that is often observed in patients with AOS may be interpreted as an inability to recall the motor plan for production of a speech sound (Van der Merwe, 2009).

Turning to the heterogeneity of AOS symptoms, if each patient with AOS is carefully assessed on a case-by-case basis, even patients with circumscribed lesions in the ventral precentral cortex can show a diversity of symptoms (Kasahata, 2013). This likely reflects the heterogeneity of physiological properties within the precentral gyrus. As indicated by the ECoG findings in the present study, subregions with distinct functional properties exist within circumscribed areas of the precentral cortex. It is reasonable to assume that a slight shift in the location of a lesion would thus result in different symptoms. As discussed above, resection of the tumor in the present study occurred in a cortical area associated with characteristic sensory-motor responses. To the best of our knowledge, this is the first patient study which has reported AOS as a consequence of the resection of a brain region within the precentral gyrus, with a confirmed auditory response.

## Conclusions

Based on the dual-route model (Hickok and Poeppel, 2007), premotor auditory activity likely reflects input from the area Spt at the parietal-temporal boundary via the arcuate fasciculus. Moreover, recent ECoG data suggest that the area Spt exhibits premotor activity which precedes the production of speech (Edwards et al., 2010). This finding is consistent with the clinical observation that a lesion in the temporoparietal junction including the area Spt can result in phonemic paraphasia (Damasio and Damasio, 1980). The present observation that a lesion in the premotor subregion of the precentral gyrus, which processes auditory inputs, results in AOS, suggests that the premotor area plays an important role during articulation, using input from the auditory linguistic system. These findings also support the argument that the premotor area is more appropriately interpreted using the "sensory theory of speech production" rather than the "motor theory of speech perception."

## Acknowledgments

The authors thank Aya Sato for taking part in the classification of articulatory disturbance. We also thank Jun Tanji and Atsushi Yamadori for valuable comments on the manuscript.

## References


## Disclosures

This study was supported by a Grant-in-Aid for Scientific Research from the Japanese Ministry of Education, Culture, Sports, Science, and Technology (Grants No. 21890021).

The English in this document has been checked by at least two professional editors, both native speakers of English. For a certificate, please see: http://www.textcheck.com/certificate/LnWwEE

## Supplementary Material

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnins.2015.00078/abstract

Supplementary Figure 1 | Passive listening to auditorily presented nouns induced significant responses, time-locked to the auditory stimuli, in electrodes common to the verb generation condition (Channels 29, 30, 38, 40, and 50).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Tanji, Sakurada, Funiu, Matsuda, Kayama, Ito and Suzuki. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## An acoustic gap between the NICU and womb: a potential risk for compromised neuroplasticity of the auditory system in preterm infants

## *Amir Lahav1,2\* and Erika Skoe3*

*<sup>1</sup> Department of Pediatrics and Newborn Medicine, Brigham and Women's Hospital, Boston, MA, USA*

*<sup>2</sup> Department of Pediatrics, Harvard Medical School, MassGeneral Hospital for Children, Boston, MA, USA*

*<sup>3</sup> Department of Speech, Language, and Hearing Sciences, Department of Psychology Affiliate, Cognitive Sciences Program Affiliate, University of Connecticut, Storrs, CT, USA*

#### *Edited by:*

*Monica Munoz-Lopez, University of Castilla-La Mancha, Spain*

#### *Reviewed by:*

*Catherine J. Stevens, University of Western Sydney, Australia*

*Karel Allegaert, Universitaire Ziekenhuizen Leuven, Belgium*

#### *\*Correspondence:*

*Amir Lahav, Department of Pediatrics and Newborn Medicine, Brigham and Women's Hospital, 75 Francis St. Boston, MA 02115, USA e-mail: amir\_lahav@hms.harvard.edu*

The intrauterine environment allows the fetus to begin hearing low-frequency sounds in a protected fashion, ensuring initial optimal development of the peripheral and central auditory system. However, the auditory nursery provided by the womb vanishes once the preterm newborn enters the high-frequency (HF) noisy environment of the neonatal intensive care unit (NICU). The present article draws a concerning line between auditory system development and HF noise in the NICU, which we argue is not necessarily conducive to fostering this development. Overexposure to HF noise during critical periods disrupts the functional organization of auditory cortical circuits. As a result, we theorize that the ability to tune out noise and extract acoustic information in a noisy environment may be impaired, leading to increased risks for a variety of auditory, language, and attention disorders. Additionally, HF noise in the NICU often masks human speech sounds, further limiting quality exposure to linguistic stimuli. Understanding the impact of the sound environment on the developing auditory system is an important first step in meeting the developmental demands of preterm newborns undergoing intensive care.

**Keywords: auditory development, NICU, preterm infants, noise exposure, high frequency**

## **AN ACOUSTIC GAP BETWEEN THE NICU AND THE WOMB**

Surrounded by amniotic fluid, the fetus first experiences low-frequency digestive noises and maternal sounds transmitted through the bones of the skull (Querleu et al., 1988; Lecanuet and Schaal, 1996; Sohmer et al., 2001). Preterm infants (born <37 weeks of gestation), however, are no longer surrounded by fluid, and this new reality forces them to hear primarily through air conduction even though their auditory system is accustomed to bone conduction. This major difference in the primary mode of hearing (bone vs. air conduction) and the medium of sound transmission (fluid vs. air) presents an acoustic gap between the unnatural acoustic environment of the hospital and the developmental demands of the newborn's auditory system. The developmental implications of this acoustic gap remain largely unstudied. Differences between the auditory environments in the neonatal intensive care unit (NICU) vs. the womb are summarized in **Table 1**. Unlike in the womb, the primary auditory stimulation available to intensive care neonates is environmental noise generated by ventilators, infusion pumps, fans, telephones, pagers, monitors, and alarms. Such excessive exposure to high-frequency noise and recurrent electronic beeps, which would not be present had the baby remained protected by the intrauterine environment rather than being born prematurely, constitutes a trauma to the auditory system of a preterm infant. This acoustic trauma, we argue, may be harmful, increasing the risk for auditory, language, and attention disorders. Although cases of hearing disorders in newborns are typically associated with congenital malformations, prenatal infections, and drug exposure (for review see Resendes et al., 2001; Beswick et al., 2012), this article is specifically focused on auditory impairments induced by environmental noise.

While exposure to loud noise is intuitively understood to be distracting and harmful, the shortage of biological and periodic auditory stimuli in the NICU environment is less acknowledged to be of concern. For example, the sensory perception of the maternal heartbeat in the womb provides the fetus with an important rhythmic experience that likely explains the natural tendency of the newborn to seek auditory entrainment soon after birth (Ingersoll and Thoman, 1994; Ullal-Gupta et al., 2013). In contrast, the more random, aperiodic nature of NICU noise suppresses opportunities for rhythmic entrainment known to facilitate arousal regulation (Smith and Steinschneider, 1975) and social interactions (Phillips-Silver et al., 2010) in early infancy.

## **THE FREQUENCY SPECTRA IN THE NICU vs. THE WOMB: IMPLICATIONS FOR THE TONOTOPIC DEVELOPMENT OF THE AUDITORY SYSTEM**

Auditory development is a slow process that begins *in utero*. Critical aspects of this development take place before full gestation and are therefore vulnerable to disruption by the NICU environment, especially given that the frequency spectrum of the NICU environment is quite different from what is experienced in the womb. Previous studies have shown that the acoustic environment of the NICU contains a significant amount of HF noise (>500 Hz), emanating from a wide variety of medical equipment and human activity, that is unlikely to be heard in the womb (Kellam and Bhatia, 2008; Livera et al., 2008). A recent study using sound spectral analysis over a five-day period showed that NICU infants were exposed to frequencies between 500 and 16,000 Hz 57% of the time, with the majority of exposure occurring during daytime and falling in the range of 501–3150 Hz (Lahav, 2014). The potential risk of HF noise exposure in the NICU is further increased by the fact that the frequency spectrum of NICU noise is rarely monitored, with the majority of studies in the field focused solely on measuring loudness levels.

#### **Table 1 | An acoustic gap between the NICU and the womb environments.**

High-frequency noise exposure in the NICU is a concern because the auditory system is still functionally underdeveloped at birth, with critical stages of development occurring during the final weeks of gestation (for review, see Graven and Browne, 2008). While the structural components of the inner ear (bony labyrinth of the cochlea) are already formed by 15 weeks gestational age (GA), the onset of cochlear function does not occur until 24 weeks GA or later (Pujol et al., 1991; Moore and Linthicum, 2007). As evidence of the functional onset of hearing, electrophysiological data from preterm neonates demonstrate that brainstem auditory evoked potentials are first recordable between 25 and 32 weeks GA (Starr et al., 1977; Amin et al., 2003; Yin et al., 2008; Coenraad et al., 2011; Jiang and Chen, 2014). After 34 weeks GA, once the spiral ganglion neurons in the cochlea have formed sufficient neural connections with the auditory brainstem and have begun to extend those connections toward the auditory cortex, evoked potentials to sound become more robust (Pujol and Lavigne-Rebillard, 1992; Hepper and Shahidullah, 1994; Hall, 2000).

Development of the cochlea and central auditory system is complex. Within the cochlea reside thousands of inner hair cells, the sensory receptors, each of which responds maximally to a specific frequency (Pujol et al., 1991; Pujol and Lavigne-Rebillard, 1992; Morlet et al., 1993). These hair cells are arranged tonotopically, with high-frequency hair cells located basally (closer to the middle ear) and low-frequency ones located apically (Kandler et al., 2009) (see **Figure 1**). This cochlear tonotopy is preserved along the auditory neuroaxis as a consequence of spiral ganglion neurons establishing precise connections between cochlear hair cells and target neurons in the auditory brainstem that code for different sound frequencies (Pujol and Lavigne-Rebillard, 1992; Appler and Goodrich, 2011). Gradual development of these tonotopic frequency maps occurs with low-frequency regions maturing before high-frequency ones, a process often referred to as "frequency-dependent plasticity" (Talavage et al., 2000). This low-to-high developmental gradient is promoted by the acoustic makeup of the womb, in which frequencies above 500 Hz are attenuated by maternal tissues and fluids within the intrauterine cavity. Toward the end of pregnancy, as the walls of the uterine lining begin to thin, gradually more HF energy (>500 Hz) is passed through the womb (Bench, 1968; Gerhardt, 1989; Gerhardt et al., 1990; Hepper and Shahidullah, 1994; Abrams and Gerhardt, 2000). Thus, while the womb provides an optimal medium for the initial phases of hearing development by limiting exposure to HF sounds (Hall, 2000), the sound frequencies present in the NICU are not necessarily conducive to furthering this development (Graven, 2000) (see **Figure 1**).

Increased exposure to HF stimulation in the NICU, while a majority of cochlear neurons are still migrating (Battin et al., 1998; Bystron et al., 2008) and cortical folding is still in flux, may disrupt the normal tonotopic tuning of cochlear hair cells and hinder auditory development subcortically and cortically (Walker et al., 1971). Thus, owing to the experience-dependent nature of auditory development (Zhang et al., 2001; Chang and Merzenich, 2003; Oliver et al., 2011; Zhou et al., 2011), the statistical properties of the acoustic environment in the NICU may misguide the topographic assembly of the auditory brain system (Pujol and Lavigne-Rebillard, 1992), resulting in poorer frequency resolution. It is therefore likely that overexposure to HF noise during this critical period may impede the developing auditory system, with effects seen well beyond the postnatal period.
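The basal-to-apical frequency layout described above can be made concrete with a place-frequency map. The sketch below uses the Greenwood function with its commonly cited human constants (A = 165.4, a = 2.1, k = 0.88). Neither the function nor these constants appear in this article, so they should be read as assumptions brought in for illustration, and the function names are hypothetical.

```python
import math

# Commonly cited human constants for the Greenwood place-frequency function.
# These are assumptions for illustration; they are not taken from this article.
A, ALPHA, K = 165.4, 2.1, 0.88


def greenwood_frequency(x):
    """Characteristic frequency (Hz) at relative position x along the
    basilar membrane, where x = 0 is the apex and x = 1 is the base."""
    return A * (10 ** (ALPHA * x) - K)


def greenwood_position(freq_hz):
    """Inverse map: relative distance from the apex (0..1) for a frequency."""
    return math.log10(freq_hz / A + K) / ALPHA


if __name__ == "__main__":
    for f in (100, 250, 500, 1000, 3000):
        print(f"{f:>5} Hz -> {greenwood_position(f):.2f} of the way from apex to base")
```

Under these assumptions, 500 Hz maps to roughly 28% of the distance from the apex, so the sounds above 500 Hz that the womb attenuates would fall on the basal majority of the cochlea.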

## **CAN THE FETUS HEAR HF SOUNDS ORIGINATING OUTSIDE OF THE WOMB?**

As a consequence of the acoustic properties of the womb, the fetus receives, for the most part, a low-pass filtered version of the auditory environment in the world. Given the low-to-high frequency development of the cochlea, this raises the question of whether the fetus can in fact hear HF sounds that are loud enough to penetrate the womb and, if so, when this responsivity to HF sounds emerges. These questions have been addressed directly and indirectly by several studies using a variety of techniques. Hepper and Shahidullah (1994) examined the responsiveness of the human fetus to external auditory stimuli (pure tones) presented by a loudspeaker placed on the maternal abdomen at different frequencies (100, 250, 500, 1000, and 3000 Hz). Recording of fetal movements via ultrasound revealed a preferential sensitivity of the fetus to external sounds in the low-frequency range (<500 Hz) as early as 19 weeks of gestation. At 27 weeks GA, the vast majority of fetuses responded to sounds below 500 Hz but none responded to the higher frequency sounds at 1000 Hz or 3000 Hz. Responsiveness to sounds above 1000 Hz was not observed until 33 weeks gestation. For all frequencies presented, there was a significant decrease in the intensity required to elicit a response with increased GA, likely due to the maturation of the auditory system and the thinning of the intrauterine walls in the last trimester of the pregnancy (Querleu et al., 1988). A follow-up study by Kisilevsky et al. (2000) using high-volume, high-pass filtered white noise (800–20,000 Hz) presented to the mother's abdomen showed that sound-evoked responses (in the form of cardiac acceleration and body movement) emerged at 30 weeks for both low-risk and high-risk fetuses, and required less intense stimulation to evoke responses later in development. While both studies indicate that sensitivity to HF sound emerges during the 7–8th month of gestation, neither study examined auditory system function directly but instead used fetal movements as an indirect measurement of hearing sensitivity.
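As a rough illustration of what a low-pass filtered version of the environment means for the test tones used by Hepper and Shahidullah (1994), the sketch below evaluates the gain of a first-order low-pass filter with a 500 Hz cutoff. Both the first-order shape and the exact cutoff are simplifying assumptions chosen to match the attenuation above 500 Hz described in the text; real transmission through maternal tissue is not claimed to follow this curve.

```python
import math

CUTOFF_HZ = 500.0  # assumed cutoff, taken from the ">500 Hz" boundary in the text


def lowpass_gain_db(freq_hz, cutoff_hz=CUTOFF_HZ):
    """Gain (dB) of an ideal first-order low-pass filter at freq_hz.

    |H(f)| = 1 / sqrt(1 + (f / fc)^2), so the gain in dB is
    -10 * log10(1 + (f / fc)^2).
    """
    return -10.0 * math.log10(1.0 + (freq_hz / cutoff_hz) ** 2)


if __name__ == "__main__":
    # Pure-tone frequencies from the fetal responsiveness study cited above.
    for f in (100, 250, 500, 1000, 3000):
        print(f"{f:>5} Hz: {lowpass_gain_db(f):6.1f} dB")
```

In this toy model the 250 Hz tone loses about 1 dB while the 3000 Hz tone loses more than 15 dB, which illustrates why a higher-frequency stimulus would need to be more intense to reach the fetus at a comparable level.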

Studies using MEG- and fMRI-based techniques in fetuses provide more direct measurement of auditory cortical responses to HF sounds presented at high intensity. This body of research has provided modest evidence that the auditory cortex is activated by frequencies above 500 Hz by 33 weeks (Draganova et al., 2005; Jardri et al., 2008), that by 33–36 weeks the fetus can differentiate a 500 Hz sound from a higher frequency one (Draganova et al., 2005), and that later in gestation (37–41 weeks) the auditory cortex is activated by naturalistic sounds containing a broad spectrum of frequencies (Moore et al., 2001). Thus, the likelihood exists that by ∼33 weeks, there is a degree of HF penetration of external sounds through the abdomen, allowing the fetal auditory system to be activated by the high-frequency aspects of speech and other complex, naturalistic stimulation. This prenatal exposure to HF naturalistic sounds has been argued to prime the fetus for voice recognition, vowel discrimination, and melody discrimination, among other complex auditory skills (reviewed in Granier-Deferre et al., 2011).

Based on the studies reviewed in this section, it appears that fetuses can respond to HF sounds transmitted through the maternal abdomen after ∼33 weeks, and that responsiveness increases with GA. However, the mere fact that fetuses are capable of responding to HF noise does not necessarily imply they should be exposed to such sounds, especially in high doses. The existing literature does not rule out the possibility that the intense exposure to HF sounds used in these specific experiments, especially during early stages of development, was in fact harmful. Therefore, the potential harm of intense, direct, and repeated exposure to HF noise without the protection of the maternal abdomen, as experienced by extremely preterm infants in the NICU, should be evaluated more carefully. However, unlike HF noise, exposure to potentially positive HF sounds (e.g., speech, music) during the last stages of gestation may in fact help set the stage for hearing and language skills.

## **NOISE-INDUCED PLASTICITY IN THE AUDITORY SYSTEM FUNCTION**

Our concern regarding the adverse effects of HF noise exposure is supported by evidence from animal studies. Animal models have revealed that sensory impoverishment during critical periods of development, in the form of acoustic noise or reduced complexity of auditory input, can lead to malformed tonotopic cortical maps, reduced neural synchrony, and broader tuning curves, which reflect decreased frequency-sensitivity of the auditory system (Zhang et al., 2001, 2002; Oliver et al., 2011). For example, young rats repeatedly exposed to HF tone pips showed distorted auditory function later in life (Oliver et al., 2011). The residual effects of that early augmented environment included altered brainstem auditory evoked potentials to the frequency of overstimulation, in addition to expanded neural frequency maps (Oliver et al., 2011). As a consequence of this early unnatural sound experience, the rats' auditory system became tuned to the frequency of the tone pips at the expense of processing other sound frequencies. Modification of tonotopic maps in deaf individuals (Guiraud et al., 2007) and musicians (Pantev et al., 1998) suggests that similar experience-dependent developmental principles operate in humans. It is therefore likely that the abrupt transition from the womb to the NICU changes the typical patterns of auditory development, specifically altering how frequency information is processed and coded.

In addition to compromising tonotopy, increased HF noise exposure during the neonatal period may have other long-term consequences for the functional integrity of the auditory system. In recent years, there has been growing concern about environmental noise in individuals of all ages. Indeed, sound intensities once thought to be safe for the auditory system are now considered less safe, especially for more extended exposures (Maison et al., 2013; Basner et al., 2014; Gourevitch et al., 2014). In laboratory animals, prolonged noise exposure has been shown to impede auditory development (Chang and Merzenich, 2003), accelerate age-related hearing losses (Kujawa and Liberman, 2006), increase neural loss (Salthouse and Lichty, 1985; Maison et al., 2013), and reduce neural efficiency by increasing the spontaneous firing of auditory neurons in the absence of sound stimulation (Costalupes et al., 1984; Seki and Eggermont, 2003). Moreover, in children, excessive noise exposure can manifest in decreased reading and cognitive performance (Cohen et al., 1973; Bronzaft and McCarthy, 1975; Hygge et al., 2003), and may change how children discriminate and attend to auditory stimuli (Cohen et al., 1973; Evans and Kantrowitz, 2002; Evans et al., 2009), even when tested in quiet environments.

Considering the acoustic gap between the NICU environment and the womb, it is not surprising that auditory development is compromised in preterm compared to full-term newborns. Studies using brainstem auditory evoked potentials suggest that preterm infants have delayed myelination of the central auditory pathway (Pasman et al., 1996; Roopakala et al., 2011; Hasani and Jafari, 2013) in addition to atypical neural pathways when processing, discriminating, and memorizing auditory information (Fellman et al., 2004; Therien et al., 2004). Thus, exposing preterm infants to HF noise too early, while the auditory system is still immature, may hinder the normal development of hearing and subsequent language acquisition.

## **INCREASED BEHAVIORAL RELEVANCE OF HF NOISE AS A CONSEQUENCE OF NICU EXPERIENCE**

In addition to the presumably harmful effects of HF noise exposure on auditory system development, the collection of electronic noises in the NICU environment (coming from ventilators, telephones, pagers, and alarms) can often produce sufficient acoustic energy to mask natural human speech sounds potentially important to the preterm infant, whose exposure to linguistic stimuli is already restricted. This impoverished linguistic experience increases the behavioral relevance of noise by shifting attentional focus away from speech sounds toward the noise in the environment. Behavioral and neurophysiological data from fetuses and healthy newborns have revealed that fetuses become sensitive to sounds in the environment that are transmitted through the amniotic fluid, including the sound of the mother's and father's voices (Fifer and Moon, 1994; Kisilevsky et al., 2003; Beauchemin et al., 2011; Voegtline et al., 2013; Lee and Kisilevsky, 2014) (reviewed in Fava et al., 2011), with evidence of experience-dependent auditory learning emerging before birth (Kujala et al., 2003; Partanen et al., 2013; Krueger and Garvan, 2014). Given the importance of early experience in molding the auditory system (Skoe and Chandrasekaran, 2014), increased exposure to noise may oversensitize infants to noise, and, as a consequence, neural circuits may be formed to make noise the primary target of attention rather than treating it as a background stimulus that should be ignored. While this is an intriguing possibility, further research is needed to confirm or dispute this hypothesis.

Another factor that may impede the preterm infant's ability to tune out noise is the immaturity of auditory feedback mechanisms (Morlet et al., 1993; Graven and Browne, 2008). In addition to inner hair cells, the cochlea contains outer hair cells that receive feedback from the central auditory system, which buffers noise-induced damage and improves speech intelligibility in noise (Guinan, 2006). Background noise leads to a decrease in speech intelligibility that poses a perceptual challenge even for healthy adults with normal hearing. In addition to masking the signal due to physical overlap between the acoustics of noise and the acoustics of speech, noise acts as a competing signal that interferes with the ability to attend to a concurrent speech stream (Assmann and Summerfield, 1999). If greater behavioral relevance (i.e., unconscious attention) is placed on noise in the environment, or if biological feedback mechanisms are not fully intact, this could create further challenges for processing speech in noise for the preterm infant, both immediately and later in life. In support of this possibility, studies using brainstem auditory evoked potentials suggest that preterm infants have delayed myelination of the central auditory pathways (Pasman et al., 1996; Roopakala et al., 2011), which is associated with a variety of auditory processing disorders (APD). Whether or not the high prevalence of APD in the preterm population is attributable to the presence of high-frequency noise in the NICU is undetermined. However, one of the hallmarks of APD is difficulty processing target signals within a background of noise (Keith, 1999), further supporting the possibility that exposure to the HF noise of the NICU environment may impede the preterm infant's ability to pull out signals from noise.

## **OPTIMAL FREQUENCY EXPOSURE FOR INTENSIVE CARE NEONATES: LACK OF RECOMMENDED STANDARDS**

Current guidelines set by the American Academy of Pediatrics (AAP) are primarily focused on loudness levels, leaving the potential risks of HF noise exposure in NICU infants largely unaddressed. According to AAP standards (White et al., 2013), the combination of continuous background sound and operational sound shall not exceed an hourly Leq of 45 dB and an hourly L10 of 50 dB, while transient sounds (Lmax) shall not exceed 65 dB, all measured as A-weighted, slow-response levels. However, in practice, previous studies examining noise in the NICU have reported extremely high noise levels, exceeding the AAP recommended standards more than 70% of the time (Williams et al., 2007). Sound levels within the NICU environment have been measured between 62 and 70 dBA (Philbin and Gray, 2002), with peak impulses exceeding 90 dBA (Williams et al., 2007) and 120 dBA (Kent et al., 2002). In another study, sound measurements yielded an overall average hourly level (Leq) of approximately 60 dBA with peak levels (Lmax) of 78.39 dBA (Krueger et al., 2005).
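To make the three AAP metrics concrete, the sketch below computes Leq, L10, and Lmax from one hour of sound-level readings and checks them against the limits quoted above. It assumes the readings are equally spaced, A-weighted, slow-response levels in dB; the function names and the sample data are hypothetical, and L10 is computed with a simple nearest-rank percentile.

```python
import math

# Limits quoted in the text (hourly Leq, hourly L10, Lmax), in dBA.
LIMITS = {"Leq": 45.0, "L10": 50.0, "Lmax": 65.0}


def leq(levels_db):
    """Equivalent continuous level: average the sound energy, not the dB values."""
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)


def l10(levels_db):
    """Level exceeded 10% of the time (90th percentile, nearest-rank method)."""
    ordered = sorted(levels_db)
    rank = math.ceil(9 * len(ordered) / 10)
    return ordered[rank - 1]


def hourly_metrics(levels_db):
    """Return the three metrics for one hour of equally spaced readings."""
    return {"Leq": leq(levels_db), "L10": l10(levels_db), "Lmax": max(levels_db)}


def exceeded(metrics, limits=LIMITS):
    """Names of the metrics that are above their limit."""
    return [name for name, value in metrics.items() if value > limits[name]]


if __name__ == "__main__":
    # Hypothetical hour: mostly 40 dBA with brief 70 dBA alarms (5% of readings).
    readings = [40.0] * 3420 + [70.0] * 180
    metrics = hourly_metrics(readings)
    print({name: round(value, 1) for name, value in metrics.items()})
    print("Exceeded:", exceeded(metrics))
```

In this hypothetical hour, alarms present for only 5% of the readings raise Leq to about 57 dBA even though L10 stays at 40 dBA, because energy averaging is dominated by the loudest events.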

The problem of loud noise in the NICU has been diminished by modifications to NICU architectural designs, as more hospitals transition toward the single-room model, in which newborns are housed in private rooms, vs. the open-bay model, where multiple babies are cared for together in a large room (White, 2011). Studies have shown that private-room NICUs are generally quieter than open-bay NICUs (Szymczak and Shellhaas, 2014), except when high-frequency ventilation is used (Liu, 2012). However, while private rooms may have more favorable acoustics than open-bay designs, care should still be taken to ensure that the peak intensity and frequency characteristics in the NICU environment, even within a private room, are optimal for preterm newborns. Thus, in the absence of clear guidelines and recommendations regarding the acoustic makeup of optimal sound exposure at birth, the NICU may present an acoustic danger zone for preterm newborns.

## **LIFELONG AUDITORY PLASTICITY: RECOVERY OPTIONS FOR PRETERM INFANTS**

While experience-dependent plasticity is greatest in the early years, the auditory system maintains the potential for malleability throughout life (Sanes and Woolley, 2011). For example, auditory brain plasticity has been demonstrated in older adults following short-term sound-based training (Tremblay et al., 2001; Song et al., 2008; Anderson et al., 2013). Similarly, musical training has been shown to improve linguistic and cognitive abilities (Moreno et al., 2009; Strait et al., 2014) and speech intelligibility in noise (Strait et al., 2012) in young children, leading to neural enhancements of brain structure and function (Hyde et al., 2009; Halwani et al., 2011; Ellis et al., 2012; Strait and Kraus, 2014), and buffering against auditory aging in older adults (Parbery-Clark et al., 2009, 2012). In addition, cochlear implants can induce functional plasticity in the auditory brainstem even after many years of deafness in childhood, demonstrating the high degree of modifiability in brain mechanisms that support hearing abilities (Gordon et al., 2011; Cardon et al., 2012). Thus, our auditory histories, whether in the form of excessive noise, acoustic deprivation, or augmented sound training, can influence auditory processes across the lifespan (Skoe and Chandrasekaran, 2014).

What are the implications of this lifelong plasticity for preterm infants following NICU discharge? While the NICU environment may initially compromise auditory development, it is encouraging that the post-NICU environment may help to close the developmental gap by allowing for near normal to normal auditory functionality later in life. An enriched home literacy environment and quality exposure to auditory and linguistic stimuli in the post-NICU environment are considered fundamental building blocks for this auditory neuroplasticity, laying the foundation for speech and language development (Burgess et al., 2002; Roberts et al., 2005; Rowe and Goldin-Meadow, 2009; Hart and Risley, 2010; Hammer et al., 2010; Skoe et al., 2013; Ramirez-Esparza et al., 2014). Although hearing, language, and attention deficits are common among preterm infants (Vohr, 2014), the fact that some children born prematurely manage to catch up to their peers suggests that despite the initial auditory trauma induced by the NICU environment, the window of opportunity for further plasticity and recovery remains open.

## **CONCLUSIONS**

The acoustic gap between the NICU and the womb, although somewhat inescapable, poses a hazard that may disrupt auditory development in intensive care neonates. As a consequence of the NICU environment, preterm infants receive a heavier dose of HF noise than would normally be possible in the womb. The long-term effects of HF noise exposure on the development of newborns delivered before full gestation are a growing area of research of particular clinical importance. The negative plasticity of the auditory brain system in response to HF noise exposure is concerning and highlights the importance of the newborn's sensory experience during postnatal hospitalization. It is tempting to theorize that excessive exposure to high-frequency noise during critical periods may be a contributing factor to the language, attention, and cognitive deficits often seen in the preterm population. Despite these evident concerns regarding HF noise exposure, current guidelines set by the AAP (White et al., 2013) are primarily focused on loudness levels, leaving the potential risks of HF noise exposure in the NICU largely overlooked. More knowledge of the spectral content of NICU noise would help in evaluating the auditory developmental consequences in NICU graduates. Intensive care neonates deserve to have a better protection plan against toxic sounds.

## **ACKNOWLEDGMENTS**

We gratefully thank Linda Malie for assisting with graphic design of **Figure 1** and Parker Tichko for comments on the manuscript. We are also grateful for the generous support Amir Lahav received from the Charles H. Hood Foundation, Peter and Elizabeth C. Tower Foundation, Gerber Foundation, Little Giraffe Foundation, Hailey's Hope Foundation, Jackson L. Graves Foundation, and TripAdvisor Fund.

## **REFERENCES**


Vohr, B. (2014). Speech and language outcomes of very preterm infants. *Semin. Fetal Neonatal Med*. 19, 78–83. doi: 10.1016/j.siny.2013.10.007


Williams, A. L., van Drongelen, W., and Lasky, R. E. (2007). Noise in contemporary neonatal intensive care. *J. Acoust. Soc. Am*. 121(Pt 1), 2681–2690. doi: 10.1121/1.2717500

Yin, R., Wilkinson, A. R., Chen, C., Brosi, D. M., and Jiang, Z. D. (2008). No close correlation between brainstem auditory function and peripheral auditory threshold in preterm infants at term age. *Clin. Neurophysiol*. 119, 791–795. doi: 10.1016/j.clinph.2007.12.012

Zhang, L. I., Bao, S., and Merzenich, M. M. (2001). Persistent and specific influences of early acoustic environments on primary auditory cortex. *Nat. Neurosci*. 4, 1123–1130. doi: 10.1038/nn745

Zhang, L. I., Bao, S., and Merzenich, M. M. (2002). Disruption of primary auditory cortex by synchronous auditory inputs during a critical period. *Proc. Natl. Acad. Sci. U.S.A*. 99, 2309–2314. doi: 10.1073/pnas.261707398

Zhou, X., Panizzutti, R., de Villers-Sidani, E., Madeira, C., and Merzenich, M. M. (2011). Natural restoration of critical period plasticity in the juvenile and adult primary auditory cortex. *J. Neurosci*. 31, 5625–5634. doi: 10.1523/JNEUROSCI.6470-10.2011

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 September 2014; accepted: 08 November 2014; published online: 05 December 2014.*

*Citation: Lahav A and Skoe E (2014) An acoustic gap between the NICU and womb: a potential risk for compromised neuroplasticity of the auditory system in preterm infants. Front. Neurosci. 8:381. doi: 10.3389/fnins.2014.00381*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2014 Lahav and Skoe. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Mapping tonotopic organization in human temporal cortex: representational similarity analysis in EMEG source space

## *Li Su<sup>1,2</sup>\*, Isma Zulfiqar<sup>2</sup>, Fawad Jamshed<sup>2</sup>, Elisabeth Fonteneau<sup>2</sup> and William Marslen-Wilson<sup>2,3</sup>*

*<sup>1</sup> Department of Psychiatry, University of Cambridge, Cambridge, UK*

*<sup>2</sup> Department of Psychology, University of Cambridge, Cambridge, UK*

*<sup>3</sup> MRC Cognition and Brain Sciences Unit, Cambridge, UK*

#### *Edited by:*

*Yukiko Kikuchi, Newcastle University Medical School, UK*

#### *Reviewed by:*

*Fatima T. Husain, University of Illinois at Urbana-Champaign, USA*

*Peter Cariani, Harvard Medical School, USA*

#### *\*Correspondence:*

*Li Su, Department of Psychiatry, School of Clinical Medicine, University of Cambridge, Cambridge Biomedical Campus, Box 189, Level E4, Cambridge CB2 0SP, UK e-mail: ls514@cam.ac.uk*

A wide variety of evidence, from neurophysiology, neuroanatomy, and imaging studies in humans and animals, suggests that human auditory cortex is in part tonotopically organized. Here we present a new means of resolving this spatial organization using a combination of non-invasive observables (EEG, MEG, and MRI), model-based estimates of spectrotemporal patterns of neural activation, and multivariate pattern analysis. The method exploits both the fine-grained temporal patterning of auditory cortical responses and the millisecond-scale temporal resolution of EEG and MEG. Participants listened to 400 English words while MEG and scalp EEG were measured simultaneously. We estimated the location of cortical sources using the MRI anatomically constrained minimum norm estimate (MNE) procedure. We then combined a form of multivariate pattern analysis (representational similarity analysis) with a spatiotemporal searchlight approach to successfully decode information about patterns of neuronal frequency preference and selectivity in bilateral superior temporal cortex. Observed frequency preferences in and around Heschl's gyrus matched current proposals for the organization of tonotopic gradients in primary acoustic cortex, while the distribution of narrow frequency selectivity similarly matched results from the fMRI literature. The spatial maps generated by this novel combination of techniques seem comparable to those that have emerged from fMRI or ECoG studies, and a considerable advance over earlier MEG results.

**Keywords: MEG, tonotopy, auditory cortex, spatiotemporal searchlight, RSA**

## **INTRODUCTION**

For most sensory systems, the spatial organization of cortical neuronal responses resembles that of the sensory surfaces, for example, retinotopy in vision, cochleotopy in audition, and somatotopy in the cutaneous senses. Although tonotopy has been found in non-human primates in the cochlea, the auditory brainstem and auditory cortex (Merzenich and Brugge, 1973; Gross et al., 1974; Ryan and Miller, 1978; Zwiers et al., 2004), it was historically difficult to observe this in humans before the advent of non-invasive neuroimaging methods (Ojemann, 1983; Howard et al., 1996). Here, we argue that the methods used to date are limited in their capacity to capture the rich temporal dynamics intrinsic to auditory processing (Gutschalk et al., 2004), and that improved methods are needed to map the neural substrates of auditory processing, in real time and non-invasively, in the human brain.

In recent decades, non-invasive methods, primarily MEG and EEG, have played an increasingly important role in neuroimaging investigations of auditory processes ranging from elementary auditory perception to complex speech processing (e.g., Luo and Poeppel, 2007). A pioneering neuromagnetic study by Romani et al. (1982), for example, used a single gradiometer device to provide evidence suggesting tonotopic organization in human auditory cortex. Most subsequent MEG studies used equivalent current dipole (ECD) methods to model the MEG sources for the N100 component (Sarvas, 1987), typically finding a gradient from high to low frequency in auditory cortex. Specifically, dipoles that correspond to high frequency inputs are located in deeper parts of the superior temporal plane, while low frequency dipoles are located in more superficial parts of the superior temporal plane (Romani et al., 1982; Pantev et al., 1996; Huotilainen et al., 1998; Cansino et al., 2003). However, it is now recognized that the assumptions made by such ECD methods are not empirically defensible (Lutkenhoner et al., 2003). In particular, when multiple sources are likely to exist in auditory cortex, it is unrealistic to assume that there is a single dipole in this region, and doing so leads to unreliable conclusions about the organization of the auditory system (Lutkenhoner et al., 2003). In addition, the exclusive focus on the N100 component in many earlier MEG studies militated against exploring frequency responses in other time windows, although other auditory components have been examined in some studies (e.g., Pantev et al., 1995).

It is only in the last decade, when high field (≥3 T) fMRI began to be routinely used to explore the human brain, that more detailed tonotopic maps of human auditory cortex have emerged. A series of important studies have identified at least two tonotopic maps in human auditory cortex, composed of multiple gradients from high- to low-frequency-specific brain regions (e.g., Formisano et al., 2003; Dick et al., 2012; Moerel et al., 2012, 2013). These fMRI studies have superseded the earlier MEG research by virtue of their superior spatial resolution. Nonetheless, BOLD fMRI is driven by the slow processes of blood flow and only indirectly reflects the underlying neural processes that generate changes in the BOLD response. The intrinsically sluggish and indirect nature of these measures makes them unable to capture the millisecond-by-millisecond temporal dynamics of neural processing, which is a core property of the auditory system (e.g., Gutschalk et al., 2004).

To overcome these and other shortfalls of the existing approaches used to non-invasively investigate human auditory cortex, we have combined a multivariate pattern analysis method, called Spatiotemporal Searchlight Representational Similarity Analysis (ssRSA), with a sliding time window approach to decode information about frequency preference and selectivity directly from the dynamic neural activity of the brain as reconstructed in combined MEG and EEG (EMEG) source space (Su et al., 2012). This method is an extension of the fMRI RSA (Kriegeskorte et al., 2006) to time resolved imaging modalities. All analyses in this paper were carried out using the Matlab Toolbox for RSA (Nili et al., 2014) and its MEG/EEG extension (http://www.mrc-cbu.cam.ac.uk/methods-and-resources/toolboxes/).

The reason for using time resolved imaging modalities is that the auditory cortex has to process an auditory input with rich, millisecond-by-millisecond temporal dynamics. Using this new approach, we focus on a key temporal property of auditory inputs such as speech, namely the complex and communicatively critical variations in frequency in the speech input over time, and on the cortical structures that dynamically process and represent these variations, tonotopically or otherwise. We map out both frequency preference and selectivity in bilateral superior temporal areas using combined MEG, EEG, and MRI data, where *frequency preference* refers to the dominant frequency range that a specific brain region may encode, and *frequency selectivity* refers to how broad or narrow the frequency response of this same region is. These two metrics tap into fundamental and important aspects of auditory processing.

The core procedure in ssRSA is the computation of similarity structures that express the dynamic patterns of neural activation at specific points in space and time. This similarity structure is encoded in a representational dissimilarity matrix (RDM), where each cell in the RDM is the correlation distance between the neural activation patterns elicited by pairs of experimental conditions (in this context, auditory stimuli). These *brain data RDMs*, reflecting the pattern of brain activity within the spatiotemporal window defined by the searchlight procedure (Kriegeskorte et al., 2006, 2008a), are then related to *model RDMs*, which express specific theoretical hypotheses about the properties of this activity. In the current study, exploring the frequency preferences and selectivity of auditory cortex, the model RDMs capture the similarity between each stimulus at each frequency band, derived from a computational model of the early stages of auditory processing (Patterson, 1987). This cross-correlational procedure makes it possible to relate low-level neural patterns directly to abstract higher-level functional hypotheses about the organization of the auditory cortex.

To illustrate and investigate the flexibility and the potential power of this ssRSA approach, and following the lead of Moerel et al. (2012, 2013), we use tokens of natural speech (single words) to probe the activity in human auditory cortex. The RSA approach does not require a stimulus set to be pre-structured in advance to test a set of hypotheses. Natural speech intrinsically contains variations in its frequency properties, and if these can be extracted and identified using computational modeling methods, then these more ecologically valid stimuli can form the basis for a set of model RDMs exploring the properties of the neural response to these variations. Although it will also be desirable to use the RSA approach with stimuli such as pure tones, the results of Moerel et al. (2012, 2013), using high-field fMRI, suggest a good concordance between the results for pure tones and the results for natural sounds. This is a comparison that we hope to pursue in the ssRSA/EMEG environment in future research.

In terms of neuroimaging methods, we record EEG simultaneously with MEG because the use of combined MEG and EEG delivers better source localisation than either of these modalities alone. This is because of their complementary sensitivity to neural generators at different orientations and depths (Sharon et al., 2007; Molins et al., 2008; Goldenholz et al., 2009; Henson et al., 2009; Hauk and Stenroos, 2014). The combination of MEG and EEG with neuroanatomical constraints from structural MR for each participant leads to still better source reconstruction results. We combine these three sources of constraint using well established minimum norm estimation techniques (Hämäläinen and Ilmoniemi, 1994; Gramfort et al., 2014), rather than the problematic ECD approach.

The ssRSA method, finally, not only provides a dynamic perspective on the functional organization of human auditory cortex, but also has the potential to relate directly the functional properties of human auditory cortex to neurophysiological evidence from behaving animals, as Kriegeskorte et al. (2008b) have already demonstrated in the visual domain.

## **METHODS**

#### **PARTICIPANTS, MATERIALS, AND PROCEDURES**

Seventeen right-handed native speakers of British English (6 males, mean age = 25 years, range = 19–35, with self-reported normal hearing and no history of hearing problems) were recruited for the study. All gave informed consent and were paid for their participation. The study was approved by the Peterborough and Fenland Ethical Committee (UK).

The study used 400 English verbs and nouns (e.g., *talk*, *claim*) some of which had past tense inflections (e.g., *arrived*, *jumped*). These materials were prepared for another experiment, and we assume (a) that their linguistic properties were independent of the basic auditory parameters being examined here and (b) that they provide a reasonably extensive and random sample of naturally occurring frequency variation in human speech. All analyses conducted here were restricted to the first 200 ms of each word. The stimuli were recorded in a sound-attenuated room by a female native speaker of British English onto a DAT recorder, digitized at a sampling rate of 22 kHz with 16-bit conversion, and stored as separate files using Adobe Audition (Adobe Inc., San Jose, CA). They averaged 593 ms in length.

Each trial began with a centrally presented fixation cross for a time-interval jittered between 250 and 500 ms. While the cross stayed on for another 1000 ms, a spoken word was presented binaurally at approximately 65 dB SPL via non-magnetic earpieces (Etymotics ER2 Acoustic Stimulator) driven through 2 m plastic tubes<sup>1</sup>. The spoken word was followed by a blank screen of 1500 ms. On the majority of trials the participant made no response and was asked to simply listen attentively to the spoken words. On 8% of the trials, chosen at random, a written probe word was presented after the blank screen. On these trials (designed to ensure the participants were paying attention) the participants performed a one-back memory task, indicating whether the visually presented word matched the preceding acoustic stimulus or not by pressing a response button. Half of the participants answered "yes" with the right hand and "no" with the left hand. The other half used the reverse combination. Feedback was presented on the screen for 1000 ms and followed by a blank screen of 500 ms. The presentation and timing of stimuli were controlled using Eprime software (www.pstnet.com). Each item was presented twice in a pseudorandom order within 7 blocks. Each participant received 20 practice trials, which included a presentation of each different stimulus type and three exemplars of one-back memory trials.

#### **DATA RECORDING**

Continuous MEG data were recorded using a 306-channel VectorView system (Elekta Neuromag, Helsinki, Finland) containing 102 identical sensor triplets, each composed of two orthogonal planar gradiometers and one magnetometer, covering the entire head of the subject. Participants sat in a dimly lit magnetically shielded room (IMEDCO AG, Switzerland). The position of the head relative to the sensor array was monitored continuously by feeding sinusoidal currents into four Head-Position Indicator (HPI) coils attached to the scalp. EEG was recorded simultaneously from 70 Ag-AgCl electrodes placed within an elastic cap (EASYCAP GmbH, Herrsching-Breitbrunn, Germany) according to the extended 10/20 system and using a nose electrode as the recording reference. Vertical and horizontal EOG were also recorded. All data were sampled at 1 kHz with a band-pass filter from 0.03 to 330 Hz. A 3D digitizer (Fastrak Polhemus Inc., Colchester, VA) was used to record the locations of the EEG electrodes, the HPI coils and approximately 50–100 "headpoints" on the scalp, relative to three anatomical fiducials.

#### **DATA PRE-PROCESSING**

Static MEG bad channels were detected and excluded from all subsequent analyses using MaxFilter (Elekta Neuromag). Compensation for head movements (measured by HPI coils every 200 ms) and a temporal extension of the signal-space separation technique (SSS) were applied to the MEG data using MaxFilter. Static EEG bad channels were visually detected and interpolated (Hämäläinen and Ilmoniemi, 1994). The EEG data were re-referenced to the average over all channels. The continuous data were low-pass filtered at 40 Hz and epoched relative to the onset of each word, with each epoch covering the first 200 ms of the stimulus. By analysing the neural response to the early part of each word, we hope to minimize the potential influence of high-level linguistic and cognitive processes (Hauk et al., 2012). Baseline correction was applied by subtracting the average response over the 100 ms prior to the onset of the epoch. EEG and MEG epochs in which the EEG or EOG exceeded 200 µV, or in which the value on any gradiometer channel exceeded 2000 fT/m, were rejected as potentially containing artifacts. In addition, artifact components associated with eyeblinks and saccades were automatically detected and removed using the independent component analysis tools of EEGLAB (Delorme and Makeig, 2004).
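The baseline correction and amplitude-based rejection described above can be illustrated with a minimal sketch. It assumes one list of samples per channel at the 1 kHz sampling rate, applies the thresholds to absolute values after baseline correction, and uses hypothetical function names; none of this is taken from the authors' pipeline, which relied on MaxFilter and EEGLAB.

```python
from statistics import mean

# Thresholds quoted in the text; units follow the text as written.
EEG_EOG_LIMIT_UV = 200.0
GRADIOMETER_LIMIT_FT_PER_M = 2000.0
N_BASELINE_SAMPLES = 100  # 100 ms pre-onset at the 1 kHz sampling rate


def baseline_correct(channel, n_baseline=N_BASELINE_SAMPLES):
    """Subtract the mean of the pre-onset samples from every sample."""
    baseline = mean(channel[:n_baseline])
    return [sample - baseline for sample in channel]


def exceeds(channel, limit):
    """True if the absolute value of any sample is above the limit (assumed criterion)."""
    return any(abs(sample) > limit for sample in channel)


def keep_epoch(eeg_eog_channels, gradiometer_channels):
    """Return True if no channel crosses its rejection threshold."""
    if any(exceeds(ch, EEG_EOG_LIMIT_UV) for ch in eeg_eog_channels):
        return False
    if any(exceeds(ch, GRADIOMETER_LIMIT_FT_PER_M) for ch in gradiometer_channels):
        return False
    return True


# Toy epochs: 100 baseline samples followed by 200 post-onset samples.
clean_eeg = baseline_correct([5.0] * 100 + [25.0] * 200)
blink_eeg = baseline_correct([5.0] * 100 + [400.0] * 200)
quiet_grad = baseline_correct([0.0] * 100 + [150.0] * 200)

print(keep_epoch([clean_eeg], [quiet_grad]))  # epoch retained
print(keep_epoch([blink_eeg], [quiet_grad]))  # epoch rejected
```

Whether the published thresholds were applied to absolute or peak-to-peak amplitudes is not stated in the text, so the criterion here is only one plausible reading.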

#### **SOURCE RECONSTRUCTION**

We estimate the location of cortical sources with the anatomically constrained minimum norm estimate (MNE; Hämäläinen and Ilmoniemi, 1994). MR structural images were obtained using a GRAPPA 3D MPRAGE sequence (*TR* = 2250 ms; *TE* = 2.99 ms; flip angle = 9°; acceleration factor = 2) on a 3 T Trio (Siemens, Erlangen, Germany) with 1 mm isotropic voxels. From the MRI data, a representation of each participant's cerebral cortex was constructed using the FreeSurfer program (http://surfer.nmr.mgh.harvard.edu/). The forward model was calculated with a three-layer Boundary Element Model (BEM) using the outer surface of the scalp as well as the outer and inner surfaces of the skull identified in the anatomical MRI. This combination of MRI, MEG, and EEG data provides better source localization than MEG or EEG alone. The constructed cortical surface was decimated to yield approximately 12,000 vertices that were used as the locations of the dipoles. To perform group analysis, the cortical surfaces of individual subjects were inflated and aligned using a spherical morphing technique implemented by MNE (Gramfort et al., 2014). Sensitivity to neural sources was improved by calculating a noise covariance matrix based on the 100 ms pre-stimulus period. The activations at each location of the cortical surface were estimated over 1 ms windows.

<sup>1</sup>The output of this sound transduction system is essentially flat up to 5 kHz with a 12 dB drop off at 6 kHz.

### **SPATIOTEMPORAL SEARCHLIGHT REPRESENTATIONAL SIMILARITY ANALYSIS (ssRSA)**

In ssRSA, we need to represent the similarity structure in the observed dynamic patterns of brain activation as well as the theoretically relevant similarity structure in the stimuli (Su et al., 2012). As noted earlier, the former is called the data RDM and the latter is called the model RDM. Brain data RDMs express the pairwise similarity between neural activation patterns in the EMEG data. In general, a model RDM expresses a hypothesis about what the neural system might encode. If the brain data RDM matches the model RDM, we can infer that the cortical region from which the brain data RDM is derived may indeed code information captured by the model RDM. If there is only one hypothesis about the data, the simplest approach to determine the match between data and model RDMs is to compute a Spearman's correlation between them (e.g., Su et al., 2012). This approach is not used in the current study, since there are multiple interrelated hypotheses, i.e., several model RDMs each of which represents a hypothesis about a particular frequency band in the sound. This is because (as specified below) we generate a model RDM for each of sixteen frequency components in the stimuli ranging from 30 to 8000 Hz. Instead of Spearman's correlation, we used a general linear model (GLM) approach to estimate the contribution of each hypothesis (as expressed in each model RDM) in explaining the brain data RDM. The GLM estimates a set of parameters, which show how well each model RDM matches the data. The advantage of this approach is that the comparisons between brain data RDM and multiple model RDMs are performed in a single step, and GLM can also consider correlations between different model RDMs while estimating the relevant parameters (Mur et al., 2013).
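To make the GLM step concrete, the sketch below fits a vectorised data RDM as a weighted sum of several vectorised model RDMs by ordinary least squares. It is an illustration under stated assumptions rather than the toolbox implementation: the function names are hypothetical, no intercept term is included (the text does not say whether one was used), and the normal equations are solved directly in plain Python.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("model RDMs are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_rdm_glm(data_vec, model_vecs):
    """Least-squares betas for data_vec ~ sum_k beta_k * model_vecs[k]."""
    k = len(model_vecs)
    xtx = [
        [sum(a * b for a, b in zip(model_vecs[i], model_vecs[j])) for j in range(k)]
        for i in range(k)
    ]
    xty = [sum(a * y for a, y in zip(model_vecs[i], data_vec)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy example: two model RDM vectors and a data vector built from them.
model_a = [1.0, 0.0, 1.0, 0.0, 1.0]
model_b = [0.0, 1.0, 1.0, 0.0, 0.0]
data = [2.0 * a + 0.5 * b for a, b in zip(model_a, model_b)]

betas = fit_rdm_glm(data, [model_a, model_b])
print(betas)
```

Because all model RDMs enter the same fit, correlated models share variance in the way the paragraph above describes, which separate pairwise correlations would not capture.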

The ssRSA procedure extends the concept of spatial searchlight developed for fMRI to include an extra temporal dimension (Su et al., 2012). This is done by combining a spatial searchlight with a sliding time window under the assumption that neuro-cognitive representations may be realized in the continuous spatiotemporal patterns of the source-reconstructed EMEG data. Specifically, at each spatial location (or vertex) on the MR-estimated cortical surface, the searchlight covers a hexagonal cortical patch (approximately 20 mm radius), which includes about 128 vertices. Based on previous research (e.g., Moerel et al., 2012), we predefined a search area that covered bilateral superior temporal cortex, including Heschl's gyrus, superior temporal gyrus (STG), and superior temporal sulcus (STS) using FreeSurfer cortical parcellation (Fischl et al., 2004; Desikan et al., 2006) (see **Figure 7** below).

In this study, we analyzed the data from the onset of the word to 200 ms after onset. The choice of this short time period reduces the potential influence of linguistic processing, providing a clearer view of auditory cortical organization. The temporal dimension of the ssRSA searchlight was set to 30 ms in width and was moved, as a sliding window, from word onset to 200 ms after onset in incremental steps of 10 ms. These parameters were chosen to give sufficiently fine-grained temporal resolution to reveal the development of neural responses in the brain. The same sliding time window was applied to the EMEG data, when computing the brain data RDM, as was applied to the stimuli when constructing the model RDMs. This allowed us, when comparing a model RDM to a data RDM, to compare RDMs derived from the same time period.
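The window parameters above can be enumerated with a short sketch. It assumes that every window lies entirely within the 0–200 ms epoch, which the text implies but does not state explicitly; the function name is hypothetical.

```python
def sliding_windows(start_ms, end_ms, width_ms, step_ms):
    """List (window_start, window_end) pairs that fit inside the epoch."""
    windows = []
    t = start_ms
    while t + width_ms <= end_ms:
        windows.append((t, t + width_ms))
        t += step_ms
    return windows


# 30 ms windows moved in 10 ms steps from word onset to 200 ms.
windows = sliding_windows(0, 200, 30, 10)
print(len(windows), windows[0], windows[-1])
```

With source estimates at 1 ms resolution, each window bound can be used directly as a sample index into the EMEG data and into the stimulus cochleagram.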

The brain data RDM was computed for each spatiotemporal searchlight location and assigned to the center vertex of the hexagonal searchlight region. We allow the searchlight to overlap in space and in time resulting in separate brain activation RDMs for each vertex at each time point. Sampling with overlapping spatiotemporal searchlights enables us to detect distributed and transient representations that might otherwise straddle the boundary between adjacent cortical patches or successive temporal windows and fail to be analyzed as a single pattern.

The data RDM for each vertex and each time point was computed for each subject individually, and an averaged RDM was then derived across all subjects. This averaging at the level of dissimilarity rather than at the level of neural response allows ssRSA to take into account individual variability in neural representations. This method is less affected by differences in how a stimulus is actually encoded in a subject's auditory cortex, because while the same auditory stimulus may elicit a unique EMEG pattern for that specific individual (Moerel et al., 2013), we can nonetheless assume that the similarity structure across the 400 words is directly comparable across subjects.
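Averaging at the level of dissimilarity, as opposed to averaging raw activation patterns, amounts to an element-wise mean over the subjects' RDMs. The following is a minimal sketch with hypothetical names and toy 3 × 3 matrices; the study's RDMs are 400 × 400.

```python
def average_rdms(subject_rdms):
    """Element-wise mean of equally sized square RDMs, one per subject."""
    n_subjects = len(subject_rdms)
    size = len(subject_rdms[0])
    return [
        [sum(rdm[i][j] for rdm in subject_rdms) / n_subjects for j in range(size)]
        for i in range(size)
    ]


# Two toy subjects whose RDMs share a similar structure.
subject_a = [[0.0, 0.2, 0.9], [0.2, 0.0, 0.7], [0.9, 0.7, 0.0]]
subject_b = [[0.0, 0.4, 1.1], [0.4, 0.0, 0.5], [1.1, 0.5, 0.0]]

group_rdm = average_rdms([subject_a, subject_b])
print(group_rdm)
```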

We explain below how we compute brain data RDMs based on the EMEG data and model RDMs based on human cochlear models of the stimuli. We then describe how we estimate parameters from the GLM relating brain data and model RDMs, and how we map frequency preference and selectivity across human auditory cortex using ssRSA.

#### **CONSTRUCTING BRAIN DATA RDMs**

As previously discussed, a brain data RDM is derived from the dynamic patterns of neural activation over space and time. Each entry in the RDM is a correlation distance (1 minus the correlation value) between the activation patterns elicited by a pair of experimental conditions (here, individual words). These activation patterns are the source estimations of the EMEG data for each pair of words, as computed within a window defined by the searchlight algorithm (see **Figure 1**). In general, such a pattern is based on the distribution of EMEG source estimation over a number of vertices over a period of time. For the same group of vertices and the same time window, the pattern of activation will differ between conditions (pairs of words) because the underlying neural population responds differently to different auditory stimuli. If two stimuli are similar in their physical properties, e.g., sounds with similar frequency components, then the distribution of source estimations for these stimuli should be more similar in auditorily sensitive cortex. Conversely, two distinct auditory stimuli should elicit more dissimilar activation patterns in these regions.

As previously implemented in the fMRI variant of RSA (Kriegeskorte et al., 2006), we define the dissimilarity between two activation patterns as the Pearson correlation distance between the two data vectors of EMEG source estimations. Here, the data vector refers to a one-dimensional vector derived from the activation pattern, which is a three-dimensional object (two dimensions for space along the cortical surface and one dimension of time). **Figure 1** shows an example brain data RDM computed from a location in the superior temporal lobe. The center of the searchlight was placed in the region of auditory cortex, with a radius of 20 mm, and a time window that runs 30 ms from the onset of the stimulus. The example shown captures the initial response in the auditory cortex, since the latency of a detectable signal in the auditory core area is approximately 10 ms post-stimulus onset in animals (e.g., Heil, 1997). By using a sliding time window approach for the generation of model and brain data RDMs, we can track the unfolding of this response over time, while the searchlight method covers the entire search area by moving the center of the searchlight vertex-by-vertex throughout this area.

On the right hand side of **Figure 1**, it can be seen that elements on the main diagonal of the RDM are zeros by definition. In the off-diagonal parts of the matrix, a large value (shown in red) indicates that the two conditions have elicited highly dissimilar spatiotemporal activation patterns, and vice versa for small values (shown in blue). The RDM is a 400 × 400 matrix representing the pairwise similarity between the 400 words used in the experiment (ordered alphabetically). RDMs computed using this method are symmetric about the main diagonal, and subsequent computations were restricted to the portion of the matrix falling above the diagonal.
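As a sketch of how such a matrix can be built, the code below computes a correlation-distance RDM from a list of flattened activation patterns and then extracts the entries above the diagonal. The function names and the toy patterns are illustrative assumptions; in the study each pattern would hold the source estimates for all vertices and time points inside one searchlight.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_distance_rdm(patterns):
    """Symmetric RDM whose entries are 1 minus the Pearson correlation."""
    n = len(patterns)
    rdm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            distance = 1.0 - pearson(patterns[i], patterns[j])
            rdm[i][j] = rdm[j][i] = distance
    return rdm


def upper_triangle(rdm):
    """Entries strictly above the main diagonal, row by row."""
    n = len(rdm)
    return [rdm[i][j] for i in range(n) for j in range(i + 1, n)]


# Toy patterns for three conditions (one flattened vector per word).
patterns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # same shape as the first pattern
    [4.0, 3.0, 2.0, 1.0],  # reversed shape
]
rdm = correlation_distance_rdm(patterns)
print(upper_triangle(rdm))
```

For 400 conditions the upper triangle holds 400 × 399 / 2 = 79,800 entries, so restricting later computations to it roughly halves the work without losing information.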

#### **THEORETICAL MODEL RDMs**

The motivation for generating model RDMs to map tonotopic distributions is to make predictions about the spatiotemporal patterns induced in EMEG data by the frequency dimension of a stimulus set. The performance of the ssRSA procedure is specific to the hypotheses being tested and will only show effects that match the specified model RDMs. To keep the model RDMs as realistic as possible, as a reflection of the spectral characteristics of early auditory processes, the stimuli were filtered using a Gammatone filter bank (Patterson, 1987) to generate a probable representation of the spectral response for each word, as output by the human cochlea. This representation was chosen because of the inherent bias of the human ear to different frequency components in the auditory stimuli, and because the Gammatone filter bank model has been validated by numerous empirical investigations (e.g., Patterson, 1987).

The filter bank aims to generate a cochlear representation of an auditory signal by convolving the signal with the impulse responses of individual Gammatone filters. The envelope of each filter is designed in such a way that it is narrow at low frequency channels and the impulse responses are wider for high frequencies (Patterson, 1987). We used an implementation of 4th order non-phase aligned Gammatone filters in Matlab by Christopher Hummersone (http://www.mathworks.co.uk/matlabcentral/fileexchange/32212-gammatone-filterbank). The center frequency of each channel is equally spaced between low and high center frequencies on the Equivalent Rectangular Bandwidth (ERB) scale. As previously mentioned, we set the lowest frequency channel to 30 Hz and the highest to 8000 Hz. These values were chosen based on the power spectral density of the stimuli such that most of the energy of the sound was covered by our analysis. The total number of frequency channels was set to 128 in order to cover the frequency range from 30 to 8000 Hz with sufficient resolution without losing spectral information. This resulted in a cochleagram that represents activation across time for each of the 128 frequency channels. **Table 1** gives the center frequencies of these frequency channels, and **Figure 2** provides two examples of cochleagrams computed from word onset to 200 ms post-onset. The model RDMs are computed by sampling from these cochleagrams in a 30 ms time window moved in incremental 10 ms steps over this 200 ms epoch.

**Table 1 | Center frequencies for 128 Gammatone filters grouped into 16 bands ranging from 30 to 8000 Hz (equally spaced on a logarithmic scale).**
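The channel spacing described above can be sketched as follows. The sketch assumes the widely used Glasberg and Moore (1990) ERB-rate formula, which the text does not name, and hypothetical function names; it is not claimed to reproduce the exact values of Table 1 or the behaviour of the Matlab implementation cited above.

```python
from math import log10


def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale (assumed Glasberg-Moore form)."""
    return 21.4 * log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 0.00437


def erb_spaced_centers(low_hz, high_hz, n_channels):
    """Center frequencies equally spaced on the ERB-rate scale."""
    low, high = hz_to_erb_rate(low_hz), hz_to_erb_rate(high_hz)
    step = (high - low) / (n_channels - 1)
    return [erb_rate_to_hz(low + i * step) for i in range(n_channels)]


# 128 channels from 30 to 8000 Hz, grouped into 16 bands of 8 channels.
centers = erb_spaced_centers(30.0, 8000.0, 128)
bands = [centers[i:i + 8] for i in range(0, len(centers), 8)]

for index, band in enumerate(bands, start=1):
    print(f"band {index:2d}: {band[0]:8.1f} to {band[-1]:8.1f} Hz")
```

On this scale the channels are densely packed at low frequencies and progressively sparser at high frequencies, so the 16 bands span increasingly wide ranges in Hz.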

To explore how auditory cortex responds to different frequency components, we divided each cochleagram into 16 frequency bands with 8 channels in each band, as shown in **Table 1**. This allows us to preserve the rich frequency-varying properties of the stimuli while reducing the number of model RDMs that need to be tested, thereby improving the efficiency of the ssRSA algorithm. The selected 30 ms time window moves with an incremental step of 10 ms in order to capture the detailed dynamical changes of spectral information in the stimuli. Within each frequency band, at each incremental time step, we generate a frequency-characteristic curve by taking the mean energy of all constituent frequency channels across time and converting this into magnitude gain in dB. A model RDM is built up by computing, for each combination of frequency band and time window, the pairwise correlation distance (1 − Pearson's correlation) between the corresponding frequency-characteristic curves for each pair of words (see **Figure 3**, left panel). In this way, each full model RDM reflects the similarity between all 400 stimuli at a particular combination of frequency band and time window. Our key hypothesis here is that if a brain data RDM matches one of these model RDMs, then the brain region from which the data RDM is derived may encode information for the corresponding frequency band. **Figure 3** shows 16 model RDMs for different frequency bands calculated for a time window of 0–30 ms from the onset of the stimuli.
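One possible reading of the frequency-characteristic curve is sketched below: at every time sample inside a window, the energy is averaged over the channels of one band and converted to decibels. The text leaves several details open, so the layout `cochleagram[channel][time]`, the `10 * log10` conversion, the small floor that avoids taking the logarithm of zero, and the function name are all assumptions made for illustration.

```python
from math import log10


def band_curve_db(cochleagram, band_channels, window, floor=1e-12):
    """Mean band energy per time sample within a window, in dB (assumed convention)."""
    start, stop = window
    curve = []
    for t in range(start, stop):
        mean_energy = sum(cochleagram[ch][t] for ch in band_channels) / len(band_channels)
        curve.append(10.0 * log10(max(mean_energy, floor)))
    return curve


# Toy cochleagram: 2 channels x 4 time samples of non-negative energy.
toy_cochleagram = [
    [1.0, 10.0, 100.0, 1.0],
    [1.0, 10.0, 100.0, 1.0],
]
curve = band_curve_db(toy_cochleagram, band_channels=[0, 1], window=(0, 3))
print(curve)
```

Computing one such curve per word for a given band and window, and then taking the correlation distance between every pair of curves, would yield one model RDM of the kind described above.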

#### **COMPARING BRAIN DATA RDMs TO THEORETICAL MODEL RDMs**

From the sequence of operations outlined in the preceding sections, we obtain a set of model RDMs, which define a set of hypotheses about potential stimulus properties, and a set of brain data RDMs averaged over subjects for every vertex and time point as the searchlight moves. The comparisons between brain activation RDMs and model RDMs were performed by fitting a GLM as shown in **Figure 4**. In the GLM, the data RDM is expressed by a linear combination of 16 model RDMs, each representing a different frequency band. Estimating the GLM gave 16 beta parameters and a residual matrix for this particular vertex for a particular time window. These parameters reflect how much variance in the brain activation RDM can be explained by the corresponding frequency component in the stimuli. Since neurons in the brain do not respond only to a single frequency but potentially to a range of different frequencies (e.g., Moerel et al., 2012, 2013), we fit a Gaussian distribution to the 16 estimated parameters. We suggest that this Gaussian is akin to the tuning curve found in neurophysiology representing the frequency preference and selectivity of the underlying neural population. After we have fitted the Gaussian distribution, we assign the center and the standard deviation (*SD*) of this Gaussian to the centroid of this searchlight (Formisano et al., 2003; Moerel et al., 2012). In the presentation of the results below (**Figures 6**–**10**) we focus on these measures of frequency response in order to preserve comparability with the Moerel et al. (2012) results. In addition, to give a more complete picture of the sensitivity of the EMEG/ssRSA technique, we also provide information about the distribution of the *peak* frequencies observed for each Gaussian (see **Figure 8**).
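The Gaussian fit to the 16 beta parameters can be illustrated with a coarse grid search. This is a sketch under stated assumptions, not the authors' fitting routine, which the text does not specify: the x-axis is taken to be the band index 1 to 16, the candidate centers and SDs move in steps of 0.1 band, the amplitude is solved in closed form for each candidate and is not constrained to be positive, and the function names are hypothetical.

```python
from math import exp


def gaussian(x, center, sd):
    """Unit-height Gaussian evaluated at x."""
    return exp(-0.5 * ((x - center) / sd) ** 2)


def fit_gaussian_tuning(betas):
    """Grid-search least-squares fit of amplitude * gaussian(band, center, sd)."""
    n_bands = len(betas)
    band_index = range(1, n_bands + 1)
    best = None
    for c in range(10, 10 * n_bands + 1):      # centers 1.0 .. n_bands, step 0.1
        center = c / 10
        for s in range(3, 10 * n_bands + 1):   # SDs 0.3 .. n_bands, step 0.1
            sd = s / 10
            curve = [gaussian(x, center, sd) for x in band_index]
            norm = sum(v * v for v in curve)
            amplitude = sum(b * v for b, v in zip(betas, curve)) / norm
            error = sum((b - amplitude * v) ** 2 for b, v in zip(betas, curve))
            if best is None or error < best[0]:
                best = (error, center, sd, amplitude)
    return best[1], best[2], best[3]


# Synthetic betas drawn from a known tuning curve (center band 6, SD 2 bands).
true_betas = [3.0 * gaussian(x, 6.0, 2.0) for x in range(1, 17)]
center, sd, amplitude = fit_gaussian_tuning(true_betas)
print(center, sd, amplitude)
```

In this sketch the fitted center plays the role of the frequency preference and the fitted SD the role of the frequency selectivity assigned to the searchlight centroid.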

Note that this computation of the Gaussian tuning curve for any specific vertex at a single time window (e.g., 40–70 ms), as shown in **Figure 4**, will take into account the match between this data RDM and the model RDM that has the same start and end points as this time window. The result of the GLM is assigned to the mid-point (i.e., 55 ms) of the sliding time window under analysis. In the next overlapping position of the sliding time window (i.e., 50–80 ms), we carry through the same procedure for a new pair of data and model RDMs computed for this new time window. The result of this GLM is assigned to the 65 ms mid-point for this window.

More generally, it is important to note that we assume—in common with the field in general—that the functional properties of auditory cortex are relatively stable over time, and certainly over the 200 ms epoch employed here. This means that the tonotopic maps generated at each time window are providing information about the properties of the same underlying object of interest—in this study the organization of frequency-sensitive processes in bilateral human superior temporal cortex. At the different time windows sampled here, however, the varying frequency properties of the stimuli will be different, so that the model RDMs probe the auditory system with differing spectral context across time windows. In this sense, every time-point can be seen as a new test for auditory cortex.

[...] **representing the pairwise correlations between the 400 stimulus words in a 400 × 400 matrix.** The **left hand panels** show two [...] indicate the frequency characteristic curve for each word for the band indicated.

At the endpoint of the ssRSA process, which has generated Gaussian frequency tuning curves at every vertex in the superior temporal cortex search area at every incremental time point from onset to 200 ms, it is necessary to combine this information to give a unified view of the functional organization of these brain areas. We do this by averaging the center frequencies and the SDs across the 200 ms analysis epoch at each vertex. These values are then mapped back to the brain, so that the results of the ssRSA analysis are brain maps showing the frequency preference reflected by the center of the Gaussian and the frequency selectivity reflected by the SD. It is these averaged data, appropriately statistically thresholded, that we present in the Results Section below<sup>2</sup>.
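As a rough illustration of this final combination step, the sketch below averages per-window Gaussian centers and SDs at each vertex. The array layout (vertices by time windows) and the optional `keep` mask are assumptions of this sketch; the text does not specify whether thresholding is applied before or after averaging, so the mask is shown only as one possible way to exclude windows.

```python
import numpy as np

def average_tuning_over_time(centers, sds, keep=None):
    """Average Gaussian centers and SDs over time windows, per vertex.

    centers, sds : arrays of shape (n_vertices, n_windows), in band units.
    keep         : optional boolean mask of the same shape; windows marked
                   False are left out of the average (assumed, for example,
                   to be windows that fail a significance threshold).
    Returns (mean_center, mean_sd), each of shape (n_vertices,). Vertices
    with no retained window get NaN.
    """
    centers = np.asarray(centers, dtype=float)
    sds = np.asarray(sds, dtype=float)
    if keep is None:
        keep = np.ones(centers.shape, dtype=bool)
    keep = np.asarray(keep, dtype=bool)
    counts = keep.sum(axis=1)

    def masked_mean(values):
        totals = np.where(keep, values, 0.0).sum(axis=1)
        means = np.full(len(counts), np.nan)
        np.divide(totals, counts, out=means, where=counts > 0)
        return means

    return masked_mean(centers), masked_mean(sds)

# Two vertices, three time windows (illustrative numbers only).
centers = np.array([[4.0, 6.0, 8.0], [5.0, 5.0, 5.0]])
sds = np.array([[3.0, 4.0, 5.0], [2.0, 2.0, 2.0]])
keep = np.array([[True, True, False], [False, False, False]])

mean_center, mean_sd = average_tuning_over_time(centers, sds, keep)
plain_center, plain_sd = average_tuning_over_time(centers, sds)
```

The resulting per-vertex values are what would be painted back onto the cortical surface as the preference (center) and selectivity (SD) maps.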

#### **CORRECTION FOR MULTIPLE COMPARISONS**

In the source estimations of EMEG data, there can be tens of thousands of vertices multiplied by hundreds of time points. The analysis will therefore contain very large numbers of individual comparisons even within our relatively restricted search area (bilateral superior temporal cortex). This creates a massive spatiotemporal multiple comparisons problem, potentially resulting in high proportions of false positives. To assess the significance of the results and to control for false positives, we therefore performed permutation-based statistical tests (Bullmore et al., 1996; Brammer et al., 1997; Nichols and Holmes, 2001). Under the assumption of the null hypothesis, the superior temporal cortex does not represent the frequency properties of a sound, so that any two sounds are equally similar in terms of their neural response to frequency. If the null hypothesis holds, we can freely relabel each condition, and the relationship between our model RDMs and the condition-relabeled brain activation RDMs would not change.

<sup>2</sup>The question of whether the tonotopic organization of auditory cortex may have dynamic properties, and therefore vary over time (possibly as a function of the distribution of recent inputs), is an interesting one, but outside the scope of this report.

To simulate the null hypothesis, we randomly permuted the condition labels of our 400 stimuli by swapping the rows and columns of all brain data RDMs. When performing this random permutation, we kept the order of the permutation unchanged for all data RDMs across vertices and time points in order to preserve the spatiotemporal autocorrelation in the data. We performed 1000 permutations of our data RDMs. For each permutation, we compared the new data RDMs with the same set of model RDMs using GLM as we did with the original data. We then selected the maximum beta parameters for each GLM. From these 1000 permutations, we were able to build a null distribution of the maximum beta parameters (28,028,000 data points in total), which assumes that the fit between data and model RDMs was due only to random noise. Thresholding the null distribution to select the top 5% will ensure that we have a risk equivalent to *p* = 0.05 of detecting any vertex as significant (i.e., as a false positive) if the null hypothesis were true. We therefore control false positives by thresholding our tonotopic maps with the appropriate beta value that picks out the top 5% of the null distribution. This procedure addresses the spatiotemporal multiple comparisons problem, and ensures that the results are robust to noise.
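The permutation scheme can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code itself: RDMs are vectorized by their upper triangle, the GLM includes an intercept, and the problem size (12 stimuli, 4 model RDMs, 5 data RDMs, 200 permutations) is deliberately tiny. As in the text, one permutation of the condition labels is applied to the rows and columns of every data RDM, the largest model beta is kept for each GLM, and these maxima are pooled over GLMs and permutations to form the null distribution. All function names are hypothetical.

```python
import numpy as np

def random_rdm(n_stim, rng):
    """Symmetric random matrix with a zero diagonal (stand-in for an RDM)."""
    a = rng.random((n_stim, n_stim))
    rdm = (a + a.T) / 2
    np.fill_diagonal(rdm, 0.0)
    return rdm

def max_betas(data_rdms, pinv, iu, ju):
    """Largest model beta for each data RDM (one GLM per data RDM)."""
    y = data_rdms[:, iu, ju]           # (n_glms, n_pairs)
    betas = y @ pinv.T                 # (n_glms, n_models + 1)
    return betas[:, :-1].max(axis=1)   # drop the intercept, max over models

def permutation_null(data_rdms, model_rdms, n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    n_stim = data_rdms.shape[1]
    iu, ju = np.triu_indices(n_stim, k=1)
    X = np.column_stack([m[iu, ju] for m in model_rdms] + [np.ones(len(iu))])
    pinv = np.linalg.pinv(X)           # model RDMs never change, so reuse
    observed = max_betas(data_rdms, pinv, iu, ju)
    null = np.empty((n_perm, data_rdms.shape[0]))
    for p in range(n_perm):
        perm = rng.permutation(n_stim)
        # Same relabeling for all data RDMs: permute rows and columns together.
        shuffled = data_rdms[:, perm][:, :, perm]
        null[p] = max_betas(shuffled, pinv, iu, ju)
    threshold = np.quantile(null, 0.95)   # top 5% of the pooled null
    return observed, null, threshold

rng = np.random.default_rng(1)
n_stim, n_models, n_glms = 12, 4, 5
model_rdms = np.stack([random_rdm(n_stim, rng) for _ in range(n_models)])
data_rdms = np.stack([random_rdm(n_stim, rng) for _ in range(n_glms)])
# Give the first data RDM genuine structure so that it should survive.
data_rdms[0] = model_rdms[2] + 0.01 * random_rdm(n_stim, rng)

observed, null, threshold = permutation_null(data_rdms, model_rdms, n_perm=200)
significant = observed > threshold
```

Because the model design matrix is identical for every permutation, its pseudoinverse is computed once and reused, which keeps the cost of each permutation to a single matrix product.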

## **RESULTS**

## **THE NULL DISTRIBUTION**

After permuting the condition labels in our data RDMs 1000 times, we obtain a null distribution of the maximal beta parameters (see **Figure 5**). This null distribution is skewed toward zero and has a long tail toward the positive end. This distribution reflects the fact that when we randomly permute the condition labels, thereby disassociating the EMEG data of each word from their spectral characteristics, most of the model RDMs fail to explain much of the variance in the permuted data RDMs, resulting in beta values close to zero. This in fact is what the null hypothesis is assuming. Thus, values above zero reflect false positives and *p* = 0.05 corresponds to the threshold of beta values at 0.0029, which selects the top 5% of the null distribution.

## **FREQUENCY RESPONSE DISTRIBUTIONS**

**Figures 6A,B** show the distributions of preferred frequencies for all vertices in bilateral superior temporal cortex, summarizing the results compiled over the 200 ms from stimulus onset. These distributions are very similar for the two hemispheres although the right hemisphere shows a small shift toward higher frequencies. The peak of the distribution on the left is at around 250–350 Hz, which is in the range of the fundamental frequency of the female voice (the stimuli were recorded by a female speaker). The peak for the right hemisphere was slightly higher, at 450 Hz. The distribution of frequency preferences for both hemispheres was bimodal with a second (much reduced) peak centered at around 900–1000 Hz, likely reflecting the harmonic formant structures in speech. This result suggests that the majority of the area sampled (extending well beyond primary auditory cortex as standardly defined) responded most strongly to the predominant frequencies in the voice range (200–2000 Hz). Note that this did not mean an absence of responses to higher frequencies, as can be seen by the peak frequency plots provided in **Figure 8**.

**Figures 6C,D** show the distributions of the standard deviation (*SD*) of the Gaussian tuning curve for each vertex in bilateral superior temporal cortex. Note that because the unit of the SD is expressed in frequency bands, the SDs for Gaussians with higher frequency preferences will cover larger ranges of frequencies (in Hz). The distribution for the left hemisphere (**Figure 6C**) is strongly bimodal, with a narrow selectivity group centered around a peak SD of 3.7, and a broader sensitivity group with a peak SD of 5.2. The right hemisphere (**Figure 6D**) shows a very different pattern, with most of the region tested being only weakly frequency selective. There is a primarily unimodal distribution, corresponding to the broader sensitivity group in the left hemisphere, with the peak of the SD distribution falling at 4.5. A much smaller set of vertices show stronger selectivity, with SDs falling in the range 2–3. The spatial mapping of frequency preference and selectivity is shown in **Figures 7** and **9** below.

### **MAPS OF FREQUENCY PREFERENCE (CENTER AND PEAK FREQUENCIES)**

The results of the ssRSA procedure, after applying the threshold derived from the permutation testing, were two maps of bilateral human superior temporal cortex. The first map shows frequency preference (the center frequency of the Gaussian tuning curve) and the second shows frequency selectivity (the SD of the tuning curve). In both maps we see frequency sensitive processes extending across the entire area of interest, and well outside primary auditory areas (Heschl's gyrus, planum temporale, etc.). **Figure 7** shows that in the left hemisphere, regions exhibiting higher center frequency preferences (yellow) are seen medially in Heschl's gyrus, with a substantial area of higher frequency preference extending posteriorly into planum temporale and posterior STG. Further higher frequency sensitive areas, at some distance from primary auditory cortex, are found more ventrally in posterior STS. Low frequency regions (darker orange/red) occur in an extensive region of the middle part of the superior temporal lobe, extending laterally and ventrally from Heschl's gyrus into STG and STS. Heschl's gyrus itself appears to exhibit the high-low and low-high tonotopic gradients frequently reported for this region in earlier studies (Baumann et al., 2013; Saenz and Langers, 2014; see below for further discussion).

In the right hemisphere, Heschl's gyrus exhibits primarily lower center frequency preferences, similarly to the left hemisphere, with an apparent low-high tonotopic gradient extending postero-medially. A pronounced higher center frequency preference area is located lateral to Heschl's gyrus, extending to STS. A further higher center frequency preference strip is seen in anterior STG and STS, reaching as far as the temporal pole. No comparable antero-medial region of frequency sensitivity is seen on the left. The right hemisphere also shows a substantial region more posteriorly of lower center frequency preferences, falling mainly in posterior STS. It should be noted, however, that center frequency preferences in right superior temporal cortex are generally accompanied by broadly tuned frequency selectivity (see **Figure 9**).

**FIGURE 7 | Frequency preferences in left and right hemisphere superior temporal cortex derived from ssRSA analysis of EMEG data.** The search areas were restricted to the regions denoted by the white lines in the upper panels. Green dashed lines (lower panels) show the outlines of Heschl's gyrus (HG). Other anatomical landmarks are superior temporal gyrus (STG), superior temporal sulcus (STS), and planum temporale (PT). The outlines of HG were generated based on the FreeSurfer cortical parcellation (Fischl et al., 2004; Desikan et al., 2006) and on Moerel et al. (2014).

**FIGURE 8 | Peak frequencies in left and right hemisphere superior temporal cortex derived from ssRSA analysis of EMEG data plotted in logarithmic scale in order to accommodate the full frequency range.** Dashed green lines show the outlines of Heschl's gyrus (HG).

In addition, to complement these plots of center frequency preference for each vertex, we also provide plots of the significant peak frequencies observed for the same vertices (see **Figure 8**). These plots, showing patches of peak frequency extending up to 8000 Hz, show a very similar distribution across superior temporal cortex to the center frequency results in **Figure 7**. Unsurprisingly, when the Gaussian tuning curve includes responses to lower frequencies, this may shift the center frequency of the Gaussian (see **Figure 7**) downwards from the peak frequency (**Figure 8**).

## **MAPS OF FREQUENCY SELECTIVITY**

**Figures 6C,D** suggested marked differences between the hemispheres in frequency selectivity. This is reflected in the spatial maps of frequency selectivity (**Figure 9**). The right hemisphere shows relatively broad frequency selectivity (dark orange/red) across almost all of the regions tested. Only Heschl's gyrus, indicating primary auditory cortex, shows a substantial patch of narrow frequency selectivity (yellow), corresponding to the region of low frequency preference seen in **Figure 7**. The left hemisphere shows generally narrower frequency selectivity (light orange to yellow), with an area of narrow selectivity also falling into Heschl's gyrus, in a comparable location to the right. There is also a marked and well-defined area of narrow selectivity in posterior STG and STS, which is not present on the right.

## **DISCUSSION**

#### **ORGANIZATION OF AUDITORY CORTEX**

The ssRSA analyses of real-time neural responses to spectrally complex spoken words, as summarized in **Figures 6**–**9**, seem to deliver statistically robust and regionally coherent patterns of frequency sensitivity and frequency selectivity across bilateral superior temporal cortex. The critical issue in evaluating these outcomes, however, is their interpretability relative to existing data and theory where human auditory cortex is concerned, as well as to the analyses of tonotopy carried out in non-human primates using invasive methods.

We present below some initial comparisons along these lines, but we emphasize that these comparisons can only be preliminary. A more quantitative treatment of the parallels between the current ssRSA/EMEG results and the high-field fMRI results will require a deeper analysis of how the very different properties of the two analysis processes might affect how variations in frequency preference and selectivity are captured by each process, and the consequences of this for spatial maps of these variations. This analysis is outside the scope of the current paper, and the comparisons we provide below should be regarded as preliminary and illustrative.

**Figure 10** summarizes some salient aspects of the relationship between the ssRSA/EMEG results and current tonotopic maps based (primarily) on high field (3 T/7 T) fMRI, as summarized in recent reviews and commentaries (e.g., Baumann et al., 2013; Moerel et al., 2014; Saenz and Langers, 2014). As these reviews make clear, there has been a surprising degree of controversy over the past decade about the exact tonotopic organization of the auditory "core" in human auditory cortex, reflecting both the relatively small size and inaccessibility of the relevant brain areas and the rapid evolution of neuroimaging technologies providing a succession of new perspectives on neural responses in these regions.

Research in non-human primates has long converged (e.g., Merzenich and Brugge, 1973) on the view that mirror-symmetric tonotopic gradients are found in the AI and R region of core auditory cortex in the macaque. These frequency preference gradients, running from high to low to high, were generally agreed to fall along the posterior-anterior axis of the macaque auditory core, either collinearly (e.g., Kaas and Hackett, 2000) or forming an angled pattern converging at the AI/R midline (e.g., Baumann et al., 2010). It has been less clear how (and whether) these core regions, and their tonotopic gradients, map onto human primary auditory cortex, and onto Heschl's gyrus in particular. Earlier MEG studies (e.g., Pantev et al., 1995), using less accurate dipole-based analysis methods, were interpreted as consistent with a simpler tonotopic arrangement in humans, with a single gradient running low to high, lateral to medial along Heschl's gyrus. Only with the advent of high field fMRI studies (e.g., Formisano et al., 2003) did it become clear that human auditory cortex also exhibited multiple mirror-symmetric high-low-high tonotopic gradients akin to those seen in macaque. Even so, there has been continuing disagreement about the orientation of these gradients relative to Heschl's gyrus, with the collinear arrangement suggested by Formisano et al. (2003), running high-low-high along the gyrus, being challenged by Humphries et al. (2010) and others, arguing that tonotopic gradients run perpendicularly across Heschl's gyrus rather than along it.

Recent reviews by Baumann et al. (2013) and subsequently Saenz and Langers (2014), argue convincingly for a third view of the orientation of symmetric tonotopic gradients in and around Heschl's gyrus, related to the angled orientation proposed in earlier macaque studies (e.g., Baumann et al., 2010). On this account, there is a clear low-frequency trough in the mid-to-lateral half of HG, which is flanked by high-frequency representations, running anteromedially toward the planum polare and posteromedially toward the planum temporale (see Figure 2 in Saenz and Langers, 2014). Both Baumann et al. (2013) and Saenz and Langers (2014) argue that this V-shaped arrangement of the tonotopy gradient also has the advantage of being much closer to current views of macaque auditory cortex.

Inspection of the current ssRSA results (**Figure 10**) suggests a very similar arrangement. **Figure 10A** (reproducing the LH frequency preference map from **Figure 7**) shows a predominance of lower frequency preferences in Heschl's gyrus. Higher frequency regions are located anteromedially and posteromedially, forming a potentially V-shaped organization of symmetric high-low-high tonotopic gradients. This arrangement, indicated by black arrows in **Figure 10A**, corresponds well to the schematic diagram (**Figure 10B**) of tonotopic maps for the same region derived from recent high-field fMRI studies (Moerel et al., 2014; Saenz and Langers, 2014). It is also consistent with the layout described by Moerel et al. (2012), based like the current study on frequency preferences elicited from natural sounds.

[...] to high frequency gradients in V-shaped orientation. **(C)** Frequency [...] based on information from Saenz and Langers (2014) and Moerel et al. (2014). Dashed green lines show the outlines of Heschl's gyrus.

Outside Heschl's gyrus and surrounding areas (likely corresponding to core and belt auditory regions), both ssRSA and fMRI schematic maps exhibit a predominant LH preference for lower frequencies, both in middle and anterior STG and STS, and in posterior STS. This may reflect a tuning of these areas to the frequency properties of speech in particular (Moerel et al., 2012; Norman-Haignere et al., 2013). In the current study we explored a wide frequency range from 30 to 8000 Hz. While we found clear evidence of (peak) frequency sensitivity up to 8000 Hz (see **Figure 8**), the (center) frequency preference distributions (**Figures 6A,B**) suggest both auditory cortex and surrounding superior temporal regions were predominantly driven by the lower end of this frequency range, from 200 to 2000 Hz.

Frequency selectivity, referring here to the width of the Gaussian tuning curve computed at each spatiotemporal searchlight window, has featured prominently in neurophysiological studies of the response properties of individual neurons (Kanold et al., 2014). In the non-invasive human literature, where any tuning curve will average over many thousands of neurons with potentially heterogeneous frequency selectivities, less emphasis has been placed on this aspect of the neural substrate for auditory cortex. An exception is the recent fMRI research by Moerel et al. (2012, 2013), who also computed Gaussian tuning curves from which selectivity can be derived. Their results, schematically represented in **Figure 10D**, show substantial similarity with the results we obtained. The ssRSA map of the SD of the tuning curve (**Figure 10C**) shows a LH region of narrower frequency selectivity (marked by black dashed lines) in the anterolateral half of Heschl's gyrus, extending into anterior STS and STG, with a second well-defined patch of narrow selectivity in posterior STS. These are both regions that also show frequency selectivity in the fMRI studies (**Figure 10D**).

The ssRSA frequency selectivity results for Heschl's gyrus, both on the left and on the right (see **Figure 9**) can also be linked to neurophysiological research with non-human primates. This research uses several methods to distinguish core from belt auditory cortex, including a functional definition based on the width of the frequency-tuning curve. Neurons in the core auditory field have much sharper tuning curves compared with neurons in the belt region (Rauschecker et al., 1995; Hackett et al., 1998). The results here are consistent with this, showing that the primary auditory cortex located midway in Heschl's gyrus was more frequency selective than the areas surrounding it.

#### **METHODOLOGICAL IMPLICATIONS**

The novel combination of techniques presented here has three characteristics which, taken together, distinguish this research from previous studies of tonotopy—and, indeed, of cortical function more generally.

First, the analyses are conducted in MRI-constrained EMEG source space, using minimum norm distributed source reconstruction methods, which map signals recorded at the scalp back to the whole brain cortical surface (defined as the white matter/gray matter boundary). While necessarily noisy and imperfect, these are nonetheless the best available non-invasive methods for measuring and representing the dynamic electrophysiological events that underpin real-time brain function, with millisecond-level temporal resolution and potentially sub-centimeter spatial resolution.

Second, the ssRSA method provides a statistically unbiased and robust means of interrogating this representation of real-time neural activity in order to determine the *qualitative* functional properties of the neural computations supported by this dynamic electrophysiological activity in EMEG source space. Every model RDM encodes (implicitly or explicitly) a theoretical claim about the functional dimensions that constrain neural activity within the spatiotemporal window sampled by the ssRSA searchlight procedure. The match between model RDM and brain data RDM is not simply evidence that a pattern match can be found (as in machine learning-based pattern-classification techniques) but that this is a pattern match with specific neurocomputational implications.

In the current study, the model RDMs express a neurobiologically plausible theoretical model of how frequency variation is encoded in the early stages of auditory processing, modulated by the spectral properties of a given auditory input dynamically changing over time. The significant fit of these multiple model RDMs to the correlational structure encoded in each brain data RDM therefore licenses direct inferences about the qualitative properties of the neural computations being conducted within the spatiotemporal window covered by that data RDM: in this study, inferences about the frequency preferences of the brain area being sampled, and the selectivity of these preferences. Critically, these inferences are not based on simple variations in the amount of activity associated with a given frequency dimension, but rather on the multivariate correlational pattern elicited (in this case) across a 400 × 400 stimulus matrix.

Third, the ssRSA approach (and RSA in general) allows the use of naturalistic stimuli (in this experiment, naturally spoken words) that correspond more closely to the kinds of sensory inputs to which the neural systems of interest are normally exposed. Naturally spoken words present the auditory object processing system with frequency variation in its natural environment, i.e., reflecting the complex mixture of spectral patterns imposed on the speech output by the human speech apparatus in order to serve specific human communicative functions. This is the ecologically central environment for human auditory processing of spectral variation, and a necessary context in which to study these processes. Studies using artificially generated tone sequences, in a psychophysical testing format, may pick out neural response properties that are not in fact the salient modes of processing in more ecologically natural contexts<sup>3</sup>.

The ability of ssRSA to use naturalistic stimuli derives from its use of model RDMs. In so far as the relevant dimensions for constructing the correlational structure of a model RDM can be extracted from a given stimulus set (in this study, frequency variation from spoken words), then any stimulus set which allows this is potentially usable. This also means that multiple functional dimensions can be extracted from the same stimulus sets. Naturally spoken words exhibit a rich set of properties over many dimensions, ranging from the acoustic to the phonemic to the lexical. ssRSA allows us to probe the neural responses to such stimuli across any functional dimension for which it is possible to specify a model RDM; for an example of a preliminary study using lexical models to probe word-recognition processes in the same set of words, see Su et al. (2012). Here we demonstrate this for the salient dimension of frequency-based processes, especially critical for speech comprehension. In complementary research on the same stimulus set, we can interrogate EMEG source space along phonetic and phonemic dimensions, providing a broader functional context for interpreting the auditory processing characteristics observed in the current study of tonotopy.

## **CONCLUSIONS**

In summary, the current results present a credible and realistic analysis of the neural distribution of frequency sensitive processes in human bilateral superior temporal cortex. Frequency preferences in and around Heschl's gyrus are consistent with current proposals for the organization of tonotopic gradients in primary auditory cortex, while the distribution of narrow frequency selectivity similarly matches results from the fMRI literature. While we cannot provide exact measures of localizational accuracy in the spatial domain, the group level maps provided here seem comparable to those that have emerged from fMRI or ECoG studies, and a considerable advance over earlier MEG research.

More generally, the *in-vivo* and non-invasive ssRSA approach can be combined with both neurophysiological and cytoarchitectonic methods for locating and subdividing auditory cortex. For example, when measuring the neural response in human auditory cortex using single-unit recording in selected patients, frequency preference and selectivity can be mapped using techniques directly comparable to those used in nonhuman primates. New quantitative MRI techniques allow the mapping of tissue microstructure and provide information about the density of myelination of the neurons, resulting in an *in-vivo* map of human primary auditory cortex (Dick et al., 2012). We believe that combining advances in imaging techniques (fMRI, EEG, and MEG) with advanced computational methods such as ssRSA will provide important new opportunities to unravel the functional organization of human auditory cortex.

<sup>3</sup>It is worth noting, nonetheless, that Moerel et al. (2012), who used both natural words and pure tone sequences, found broadly similar spatial maps of frequency sensitivity for both types of input.

## **ACKNOWLEDGMENTS**

We would like to thank Niko Kriegeskorte, Roy Patterson, Elia Formisano, Howard Bowman, Andrew Thwaites, as well as the Frontiers editorial and reviewing team, for their insightful comments on the research. This work was supported by a European Research Council Advanced Grant (230570 Neurolex) and Medical Research Council Cognition and Brain Sciences Unit funding to William Marslen-Wilson (U.1055.04.002.00001.01). The involvement of Li Su was also partly supported by the NIHR Biomedical Research Centre and Biomedical Research Unit in Dementia based at Cambridge University Hospitals NHS Foundation Trust.

## **REFERENCES**


EEG: comparison to fMRI in focally stimulated visual cortex. *Neuroimage* 36, 1225–1235. doi: 10.1016/j.neuroimage.2007.03.066


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 27 April 2014; accepted: 27 October 2014; published online: 12 November 2014.*

*Citation: Su L, Zulfiqar I, Jamshed F, Fonteneau E and Marslen-Wilson W (2014) Mapping tonotopic organization in human temporal cortex: representational similarity analysis in EMEG source space. Front. Neurosci. 8:368. doi: 10.3389/fnins.2014.00368*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2014 Su, Zulfiqar, Jamshed, Fonteneau and Marslen-Wilson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## *Michelle Moerel<sup>1,2,3</sup>, Federico De Martino<sup>1,2</sup> and Elia Formisano<sup>1,2</sup>\**

*<sup>1</sup> Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands*

*<sup>2</sup> Maastricht Brain Imaging Center, Maastricht University, Maastricht, Netherlands*

*<sup>3</sup> Department of Radiology, Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN, USA*

#### *Edited by:*

*Yukiko Kikuchi, Newcastle University Medical School, UK*

*Reviewed by: Simon Baumann, Newcastle University, UK Li Su, University of Cambridge, UK*

#### *\*Correspondence:*

*Elia Formisano, Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, PO Box 616, Oxfordlaan 55, Maastricht, 6229 EV, Netherlands e-mail: e.formisano@maastrichtuniversity.nl*

While advances in magnetic resonance imaging (MRI) throughout the last decades have enabled the detailed anatomical and functional inspection of the human brain non-invasively, to date there is no consensus regarding the precise subdivision and topography of the areas forming the human auditory cortex. Here, we propose a topography of the human auditory areas based on insights on the anatomical and functional properties of human auditory areas as revealed by studies of cyto- and myelo-architecture and fMRI investigations at ultra-high magnetic field (7 Tesla). Importantly, we illustrate that—whereas a group-based approach to analyze functional (tonotopic) maps is appropriate to highlight the main tonotopic axis—the examination of tonotopic maps at single subject level is required to detail the topography of primary and non-primary areas that may be more variable across subjects. Furthermore, we show that considering multiple maps indicative of anatomical (i.e., myelination) as well as of functional properties (e.g., broadness of frequency tuning) is helpful in identifying auditory cortical areas in individual human brains. We propose and discuss a topography of areas that is consistent with old and recent anatomical *post-mortem* characterizations of the human auditory cortex and that may serve as a working model for neuroscience studies of auditory functions.

**Keywords: human auditory cortex, tonotopy, ultra-high field fMRI, cytoarchitectonic parcellation, auditory cortical areas**

### **INTRODUCTION: CHALLENGES FOR THE INVESTIGATION OF THE HUMAN AUDITORY CORTEX**

A major scientific approach in brain research has been to divide the cortex into smaller anatomical areas based on their microstructural properties (Brodmann, 1909; Zilles and Amunts, 2009; Nieuwenhuys, 2012) and examine each area's functional properties through the analysis of the responses of neurons and neuronal populations. Whereas in animal models the link between micro-structural and functional properties of an area can be studied directly and in the same individual animal, in non-invasive research in humans such a link is much more labile, as it relies on the gross correspondence to macro-anatomical landmarks or matching to probabilistic atlases derived from post-mortem analysis of different brains (Morosan et al., 2001). Establishing an accurate parcellation of the cortical areas is thus essential in human research for studying the functional role of the various areas and for comparing results across experiments and laboratories. Furthermore, such a parcellation is crucial for understanding homologies and differences between human and animal cortex. Research into the visual system is a prominent example where such an approach has been successful. Functional magnetic resonance imaging (fMRI) has enabled mapping of the retinotopic organization in the human visual cortex *in vivo* and non-invasively (Engel et al., 1994; Sereno et al., 1995; Goebel et al., 1998). Because adjacent areas have opposite representations of the retinal image, the area borders can be outlined by calculating the sign of the local visual field (Sereno et al., 1995). With such an approach, the functional topography of early visual areas could be objectively mapped in individual human subjects and compared to topography of areas in the monkey visual cortex (Van Essen, 2004). This methodology provided a crucial tool for studying in detail the role of the distinct visual areas in visual information processing. 
Furthermore, similar methods have been used for discovering the location and functional topography of high-order visual areas in both the ventral-temporal (Malach et al., 2002; Hasson et al., 2003) and parietal cortex (Sereno et al., 2001).
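The field-sign idea mentioned above can be illustrated with a minimal sketch. It assumes two hypothetical maps sampled on a regular grid (eccentricity and polar angle) and computes the sign of the Jacobian determinant of the mapping, which flips where the representation of the visual field is mirrored. The grid, the toy maps, and the function name `field_sign` are illustrative assumptions; the published procedure operates on the reconstructed cortical surface and is not reproduced here.

```python
def field_sign(ecc, ang):
    """Sign of the Jacobian determinant of the (eccentricity, angle) map.

    ecc and ang are equally sized 2D lists (rows x columns).
    Border points are left at 0 because central differences need neighbors.
    """
    rows, cols = len(ecc), len(ecc[0])
    signs = [[0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Central differences along x (columns) and y (rows).
            de_dx = (ecc[i][j + 1] - ecc[i][j - 1]) / 2.0
            de_dy = (ecc[i + 1][j] - ecc[i - 1][j]) / 2.0
            da_dx = (ang[i][j + 1] - ang[i][j - 1]) / 2.0
            da_dy = (ang[i + 1][j] - ang[i - 1][j]) / 2.0
            det = de_dx * da_dy - de_dy * da_dx
            signs[i][j] = (det > 0) - (det < 0)
    return signs


# Toy maps (hypothetical): eccentricity grows with the row index, polar
# angle rises up to column 4 and then falls again, mimicking two adjacent
# areas with mirrored representations.
ecc = [[float(i) for j in range(9)] for i in range(5)]
ang = [[4.0 - abs(j - 4) for j in range(9)] for i in range(5)]

signs = field_sign(ecc, ang)
print(signs[2])  # [0, -1, -1, -1, 0, 1, 1, 1, 0]
```

In this toy example the sign change at column 4 marks the border between the two mirrored areas. No equivalent sign change is expected between auditory core and belt, where frequency gradients are thought to run in parallel.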

Although fMRI research on the auditory system began at approximately the same time as that on the visual system (see Talavage and Hall, 2012), to date there is no functional parcellation scheme of human auditory cortical areas that is generally accepted and routinely used across laboratories. While some of the impediments are of a technical nature (e.g., the experimental limitations arising from the acoustic noise generated by the MR scanner, see Di Salle et al., 2003; Talavage and Hall, 2012), the main reasons remain fundamentally neuroscientific. First, there is no dominant model of anatomical parcellation of human auditory cortical areas. In the monkey, the auditory cortex presents a hierarchical organization with a core of primary auditory areas, which receive ascending projections from the auditory portion of the thalamus, surrounded by non-primary belt and parabelt regions (Hackett et al., 1998, 2001). Each of these cortical partitions (i.e., core, belt, and parabelt) contains a number of auditory areas that can be distinguished based on their micro-anatomical and functional properties and their connectivity to sub-cortical structures and other cortical areas (Kaas and Hackett, 2000). This anatomical model of monkey auditory cortex is well-established and similar cortical models exist for a range of other species (Kaas, 2011). However, large differences exist between monkey and human auditory cortex even at the macro-anatomical level. For example, in the human brain, the auditory cortex presents an expansion of cortical surface, with additional gyri and with a much larger inter-individual variability compared to the monkey (Galaburda et al., 1978; Hackett et al., 2001). Thus, when the goal is to define the detailed topography of auditory areas in individual human subjects, the monkey model may not be directly applicable.
Studies of *post-mortem* anatomy indicate that the human auditory cortex has an organization similar to that of the monkey, with core, belt, and parabelt subdivisions (Hackett et al., 1998; Morosan et al., 2001). Strikingly, however, at the finer level of area definition, there are large differences among the various reports with respect to both the number of presumed auditory areas and their location (see below).

Second, while in the visual system adjacent areas have opposite representations of the retinal image (Sereno et al., 1995), in the auditory system the frequency preference (i.e., the tonotopic gradient) is expected to run in parallel throughout the core and the directly adjacent belt area (Rauschecker and Tian, 2004). Thus, based on tonotopy maps alone, it is not possible to delineate precise areal borders. Because of this intrinsic indeterminacy, and despite the feasibility of obtaining fMRI tonotopic maps of the human auditory cortex, a consensus regarding a tonotopy-based parcellation of the auditory areas has not yet been reached (Langers and van Dijk, 2012; Baumann et al., 2013; Saenz and Langers, 2014).

The aim of this review is to suggest a topography of the human auditory areas that may serve as a reference for fMRI studies of auditory functions. First, we review old and recent anatomical studies that provide a cyto- or myelo-architectonic characterization of the human auditory cortex with the goal of defining a consistent anatomical subdivision of the human auditory cortex and of reconciling reports that used different methods and different nomenclatures. Next, we show that the tonotopic maps found in different laboratories using different stimuli and acquisition/analysis methods are largely consistent. We demonstrate that whereas a group-based approach is appropriate to highlight the main high-low-high primary frequency gradient, the analysis of the maps at single subject level is required to detail the topography of areas and tonotopic gradients that may be more variable across subjects. Finally, we interpret the tonotopic maps in the light of recent characterizations of the human auditory cortex beyond frequency preference and propose a model that is compatible with both anatomical and functional characterizations of human auditory cortex.

## **ANATOMY OF THE HUMAN AUDITORY CORTEX**

## **MACROANATOMY OF THE HUMAN AUDITORY CORTEX**

The human auditory cortex is situated on the supratemporal plane, and comprises the superior two-thirds of the superior temporal gyrus (STG; Celesia, 1976; Galaburda and Sanides, 1980; Rivier and Clarke, 1997). On a macroscopic scale, the human auditory cortex can be divided into three regions (Kim et al., 2000; **Figure 1**). In the anterior-to-posterior direction, the auditory cortex includes planum polare (PP), the transverse temporal gyrus or Heschl's gyrus (HG), and planum temporale (PT). HG is a convolution on the supratemporal plane, branching obliquely from the STG and hidden in the depth of the Sylvian fissure (SF). HG is evolutionarily new: this convolution is not present in the macaque monkey (but see Baumann et al., 2013), and can be discerned in only a subset of chimpanzee brains (Hackett et al., 2001). There is considerable variability in the number of convolutions on the human supratemporal plane, ranging from one to three complete duplications of the transverse gyrus per hemisphere (compare **Figures 1B–D**; Campain and Minckler, 1976; Penhune et al., 1996). Besides complete duplications, a shallow intermediate sulcus (SI) may divide a single HG incompletely (**Figure 1C**). HG is bordered medially by the insular cortex, laterally by STG, and anteriorly and posteriorly by the first transverse sulcus and Heschl's sulcus, respectively (but see variations in **Figures 1B–D**). PT is posterior to HG. This triangular region is bordered medially by the SF, and laterally by the rim of the supratemporal plane. It shows a marked asymmetry and is consistently larger in the left hemisphere (Geschwind and Levitsky, 1968; Galaburda et al., 1978; Bonte et al., 2013). In humans, the PT region is much expanded compared to the monkey (Galaburda et al., 1978). Anterior to HG, separated by the FTS, lies PP, further delimited by the insula and the frontal operculum (Kim et al., 2000).

**FIGURE 1 | Anatomical landmarks on the human supratemporal plane. (A)** Lateral view of the left hemisphere, with STG indicated in red. **(B–D)** Top view of left supratemporal plane, after removal of a large part of the parietal cortex. PP, HG, and PT are indicated in blue, yellow, and green, respectively. Major sulci are outlined in black (FTS, first transverse sulcus; SI, intermediate sulcus; HS, Heschl's sulcus; HS1, first Heschl's sulcus; HS2, second Heschl's sulcus). Panels include hemispheres with one HG, an incomplete separation of HG, and two HG in **(B–D)**, respectively.

### **CYTOARCHITECTONIC SUBDIVISIONS**

In addition to describing the human auditory cortex in terms of its major anatomical landmarks, it has been labeled according to a variety of architectonic schemes (Galaburda and Sanides, 1980; Rivier and Clarke, 1997; Hackett et al., 2001; Morosan et al., 2001). Across architectonic studies, however, large differences exist with respect to the number of observed auditory areas, the location of these regions, and nomenclature. These differences already exist when parcellating HG, and discrepancies between studies grow with increasing distance from HG. Here, we present an overview of the results obtained and propose how the different studies may be reconciled (see **Table 1** and **Figure 2**).

All cytoarchitectonic studies delineate regions homologous to monkey primary auditory cortex (PAC) or "core," referring to the highly granular koniocortex within the auditory cortex (see yellow region in **Figure 2**). The core has a well-developed layer IV, presumably reflecting dense thalamic input from the auditory portion of the thalamus, the medial geniculate body (MGB). Layer III of the core can be characterized by the presence of small to medium sized pyramidal cells (Clarke and Morosan, 2012). Chemo-architectonically, the core has a dense expression of AChE, cytochrome oxidase (CyOx), and parvalbumin in the neuropil of layer IV (Clarke and Morosan, 2012). Brodmann (1909) named the core auditory area BA 41 and it may correspond to area TC of Von Economo and Horn (1930). Rivier and Clarke (1997) confirmed the presence of a primary area using CyOx staining, and referred to it as AI. Morosan et al. (2001) refer to it as Te1. In accordance with the monkey auditory core, which includes two [auditory area 1 (AI), rostral field (R)] or three [AI, R, and rostrotemporal field (RT)] subdivisions (Rauschecker et al., 1995; Hackett et al., 2001), several studies divided the human core into subfields, most likely reflecting the human homologs of monkey AI and R (column "PAC/core" of **Table 1**; KAm and KAlt: Galaburda and Sanides, 1980; AI and LP: Wallace et al., 2002), and possibly RT (green field lateral to the core in **Figure 2**; PaAr: Galaburda and Sanides, 1980; Te1.2: Morosan et al., 2001; ALA: Wallace et al., 2002). Alternatively, the region lateral to the core could reflect an extension of the lateral belt (TB in Von Economo and Horn, 1930).

The position of the human PAC relative to sulcal and gyral landmarks is variable. While in the macaque monkey the core region is elongated along the rostro-caudal axis of the temporal lobe, in the chimpanzee—where a rudimentary HG appears in part of the brains—the core is roughly aligned to the main axis of HG that is oriented from posteromedial to anterolateral direction across the supratemporal plane (Hackett et al., 2001). In human brains, when only one HG is present, the core is confined to this HG and occupies its medial and central parts. However, when other combinations of HGs are present (occurring in the majority of the population), the PAC may extend postero-medially into medial HS and even onto the second HG (Galaburda and Sanides, 1980; Rivier and Clarke, 1997; Hackett et al., 2001; Morosan et al., 2001; Sweet et al., 2005). Importantly, the PAC has been reported to occupy approximately half of the HG volume (Rademacher et al., 2001). To date, a cytoarchitectonic analysis is needed to univocally determine the anatomical location of the PAC in humans (but see below for recent MR-based developments).

In monkey auditory cortex, a belt region is situated around the core. The belt contains various subdivisions, including the anterolateral field (AL), middle lateral field (ML), caudolateral field (CL), caudomedial field (CM), and middle medial field (MM) (Hackett et al., 1998, 2001; Rauschecker and Tian, 2004). Belt subfield CM seems to be intermediate in hierarchy between core and belt regions (Hackett et al., 2001). Immediately adjacent to the lateral belt, on the exposed surface of the STG, lie the rostral and caudal parabelt (Hackett et al., 2001). In accordance with the monkey model, in the human cortex several less granular fields surround the PAC. The cell packing in these fields is less dense than in the core, and pyramidal cells in layer III are larger and more numerous (Clarke and Morosan, 2012). Occupying HS, posterior and immediately adjacent to the PAC, an area with a reduced granular structure compared to primary core areas (parakoniocortex) and with large pyramidal neurons in layer IIIc has been consistently reported (column "lateral belt" in **Table 1**; green regions in **Figure 2**; PaAi, Galaburda and Sanides, 1980; PA/LA, Rivier and Clarke, 1997 and Wallace et al., 2002; Te2, Morosan et al., 2001, 2005; region TB, Von Economo and Horn, 1930). Posterior to HS, bordering PaAi and extending along the STG, Galaburda and Sanides (1980) distinguished an additional region named PaAe (column "parabelt" of **Table 1**; posterior green region and red region in **Figure 2**). The medial part of this region may correspond to posterior BA 42, while its lateral part may correspond to BA 22 (Te3 in Morosan et al., 2005). At this approximate cortical region, other studies described subfields oriented in medial to lateral direction (areas PA, LA, and STA; Rivier and Clarke, 1997; Wallace et al., 2002; see **Table 1** and **Figure 2**). Posterior to these regions, extending toward the temporoparietal junction, lies area Tpt (gray region in **Figure 2**).
Tpt extends beyond the PT, including the postero-lateral STG, portions of the parietal operculum, and part of the supramarginal gyrus (Galaburda and Sanides, 1980; Sweet et al., 2005). Cytoarchitectonically, this is a transitional region between specialized sensory and more general cortex.

*Interpreted from Brodmann (1909); Von Economo and Horn (1930); Galaburda and Sanides (1980); Hackett et al. (2001); Morosan et al. (2001, 2005); Rivier and Clarke (1997), and Wallace et al. (2002). Regions defined by Hopf (1954) and Beck (1928) are included, as they were summarized in Nieuwenhuys (2012). The "medial junction" refers to the intersection of posteromedial HG, the retroinsula, and the medial aspect of the parietal operculum.*

At the intersection of the most postero-medial end of HG, around the retroinsular region and the medial aspect of the parietal operculum lies a region described as parakoniocortex (PaAc/d) by Galaburda and Sanides (1980; column "posteromedial HG" in **Table 1**). While this region shares parakoniocortical features, its layer III pyramidal cells are smaller than in the lateral belt posterior and lateral to PAC (PaAi). This region may correspond to the most medial part of Te1 (Te1.1; Morosan et al., 2001), area TD of Von Economo and Horn (1930), and reflect the human homolog of monkey belt region CM (and possibly CL), the region intermediate in hierarchy between core and belt (Hackett et al., 2001).

Anteromedial to the PAC, at the border between insular and temporal cortex, another region is discriminated across studies ("medial belt" in **Table 1**). Galaburda and Sanides (1980) distinguish area ProA, which may roughly correspond to area AA or MA in Wallace et al. (2002), and BA 52 (Brodmann, 1909; blue regions in **Figure 2**). This area is characterized by its relatively thin cortical ribbon and prominent infragranular layers. It may reflect the human homolog of monkey medial belt areas. The description of correspondences between monkey and human auditory cortex beyond the PAC is complicated by evolutionarily recent expansions of the human cortex (Galaburda et al., 1978), and by the lack of a thorough characterization of the functional properties of these auditory cortical areas in either monkey or human (Schreiner and Winer, 2007). Comparative studies are needed to further our understanding of homologies and differences in the functional neuroanatomy of the human and monkey auditory cortex. To this end, considerable progress is expected from the acquisition of functional MRI (fMRI) measurements in the monkey brain (Petkov et al., 2006; Remedios et al., 2009; Joly et al., 2011).

#### **MYELOARCHITECTONIC PARCELLATIONS OF THE SUPRATEMPORAL PLANE**

In addition to parcellating the supratemporal plane based on its cell types and density (cytoarchitecture) or chemical pattern (chemoarchitecture), variations in myelin content provide another possible subdivision (myeloarchitecture). In the macaque auditory cortex, myeloarchitectonic studies revealed that the auditory core can be discriminated from surrounding belt cortex by its heavy myelination, reflecting its high density of thalamocortical connections. Within the core region, the most caudally located A1 stains more heavily for myelin than R and RT (Hackett et al., 1998). After the establishment of myeloarchitectonics in the school of Vogt and Vogt (Nieuwenhuys, 2012), two influential parcellations of the human temporal lobe were carried out by Hopf (1954) and Beck (1928). Beck divided the temporal lobe into six regions, while Hopf distinguished seven regions. Each of these regions could be divided into subregions that could then be further divided into areas, resulting in an impressive number of regions in both parcellation schemes (74 and 60 areas for Beck and Hopf, respectively).

Nieuwenhuys (2012) recently summarized these myeloarchitectonic schemes. As in the monkey, a densely myelinated region was defined on HG (ttr or ttr1; Nieuwenhuys, 2012), presumably reflecting human PAC. Myelin density was highest on the crown of HG, and decreased when moving from medial to lateral HG. While the caudal part of HG was astriate, with no stripe visible in layers IV/Vb due to a uniformly dense myelination, weaker myelination of layer Va in the rostral part of HG resulted in a unistriate pattern (layer IV was visible). This region of densest myelination could be divided further in medial-lateral direction (ttrIi and ttrIe in Beck, 1928) possibly reflecting the human homologs of monkey core fields A1 and R, and caudo-lateral direction (Hopf, 1954), possibly reflecting regions KAm/KAlt described in Galaburda and Sanides (1980).

While a correspondence between myeloarchitectonic and cytoarchitectonic schemes beyond the PAC remains highly tentative, it is consistently reported that myelination decreases with distance from HG, most likely reflecting belt and parabelt regions. More specifically, posterior to HG on PT a bi-striate myelination has been reported (Hackett et al., 2001), resulting from the lower myelination of layers Va and VIa and the resulting visibility of the inner and outer stripes of Baillarger (layers Vb and IV, respectively). This region, most likely reflecting human lateral belt, corresponds to region ttr2 in Hopf (1954; ttrII in Beck, 1928). Hopf (1954) further segregates ttr2 along the anterior/posterior axis, which resembles the distinction between Te2.1/Te2.2 (Clarke and Morosan, 2012) and PaAi/PaAe (Galaburda and Sanides, 1980). However, while ttr2 and Te2 occupy only PT, PaAe extends onto the STG. Here, both the parcellation by Morosan et al. (2005) and the myeloarchitectonic schemes of both Beck and Hopf (as described in Nieuwenhuys, 2012) discriminate a tertiary or parabelt type cortex (Te3 and the lateral part of ts/tsep, respectively, possibly including tpartr of Hopf). The medial part of ts/tsep is situated anterior to the densely myelinated core, and as such may correspond to regions ProA (Galaburda and Sanides, 1980), AA/MA (Rivier and Clarke, 1997; Wallace et al., 2002), BA 52 (Brodmann, 1909), and the monkey medial belt (Hackett et al., 2001).

While the cyto-, myelo-, and chemoarchitectonic parcellations each give different schemes and seem hard to reconcile at first glance, several studies emphasize that greater precision of boundary definition is achieved when multiple architectonic techniques are applied simultaneously (Hackett et al., 2001). A similar idea was pursued by Zilles et al. (2002), who mapped the human cortex based on multiple transmitter receptors (Zilles et al., 2002; Morosan et al., 2005). They found that human PAC contained a high density of cholinergic muscarinic M2 and nicotinic receptors, most densely expressed in middle cortical layers. Both M2 and nicotinic receptor density dropped sharply at the lateral border of PAC with the belt (Clarke and Morosan, 2012). The combination of cyto-, myelo-, and receptor architecture mapping (by staining alternating brain slices with different methods) applied to regions beyond the auditory core may in the future provide the unification of the studies summarized above.

## **TONOTOPIC MAPS IN THE AUDITORY CORTEX**

#### **TONOTOPY IN THE NON-HUMAN PRIMATE**

Numerous studies have investigated tonotopy—the orderly spatial representation of a neuron's preferred sound frequency—in the auditory cortex. Although tonotopy has been shown to break down at the level of individual cortical neurons (Bandyopadhyay et al., 2010; Rothschild et al., 2010), at a larger spatial scale tonotopic maps can reliably be found in the auditory cortex across species (Merzenich et al., 1973; Reale and Imig, 1980; Morel et al., 1993; Bendor and Wang, 2005). In primates, tonotopic maps are present in the core auditory region (Merzenich and Brugge, 1973), with reversals in the frequency gradient indicating the borders between the separate auditory fields (AI, R, and RT). The low-frequency border shared between AI and R and high-frequency border between R and RT appear to coincide with histologically defined borders (Merzenich and Brugge, 1973; Morel et al., 1993; Kaas and Hackett, 2000). The frequency selectivity or sharpness of tuning—reflecting the range of frequencies to which a neuron responds—is narrowest in core regions (Rauschecker et al., 1995; Hackett et al., 1998; Rauschecker and Tian, 2004; Kajikawa et al., 2005; Kusmierek and Rauschecker, 2009). In the belt areas, a number of auditory fields (e.g., AL, ML, CL, CM, and MM) have also been shown to contain a tonotopic map (Merzenich and Brugge, 1973; Rauschecker et al., 1995; Kosaki et al., 1997; Rauschecker and Tian, 2004; Kusmierek and Rauschecker, 2009). The primary frequency gradient (in the regions R, AI, and CM) runs parallel to the gradient in belt areas (AL, ML, and CL, respectively; Rauschecker and Tian, 2004). Consequently, reversals in the tonotopic gradient, used to divide core and belt into subfields, cannot be used to distinguish core from belt auditory cortex. 
Tuning width of neurons is commonly used to achieve this feat, as neurons in belt regions have a broader tuning width than those in core areas (Rauschecker et al., 1995; Hackett et al., 1998; Rauschecker and Tian, 2004; Kajikawa et al., 2005; Kusmierek and Rauschecker, 2009). Auditory cortex beyond the belt is not well-characterized in terms of its tuning to acoustic features (e.g., frequency preference, selectivity, and spectral/temporal modulations; Schreiner and Winer, 2007).
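The reasoning in this paragraph can be sketched in a few lines of code. The example below assumes a hypothetical one-dimensional best-frequency profile sampled along the cortex and locates gradient reversals, which would mark borders between fields such as AI, R, and RT. It then shows that a belt profile running parallel to the core yields the same reversals, so that a second measure (here, tuning width with an arbitrary illustrative threshold) is needed to separate core from belt. All values, names, and the threshold are assumptions made for illustration only.

```python
def gradient_reversals(best_freq):
    """Indices at which the profile switches between rising and falling."""
    reversals = []
    prev = 0
    for i in range(1, len(best_freq)):
        step = best_freq[i] - best_freq[i - 1]
        direction = (step > 0) - (step < 0)
        if direction != 0:
            if prev != 0 and direction != prev:
                reversals.append(i - 1)
            prev = direction
    return reversals


def label_by_tuning(widths_oct, threshold_oct):
    """Label sites as core (narrow tuning) or belt (broad tuning)."""
    return ["core" if w < threshold_oct else "belt" for w in widths_oct]


# Hypothetical best frequencies (kHz) along the core and the adjacent belt.
core_profile = [8, 4, 2, 1, 0.5, 1, 2, 4, 8, 4, 2]
belt_profile = list(core_profile)  # parallel gradient, same reversals

print(gradient_reversals(core_profile))  # [4, 8]
print(gradient_reversals(core_profile) == gradient_reversals(belt_profile))  # True

# Hypothetical tuning widths (octaves) and an arbitrary 0.8-octave threshold.
print(label_by_tuning([0.3, 0.4, 1.2], threshold_oct=0.8))  # ['core', 'core', 'belt']
```

The two reversals divide the core profile into three fields, while the identical belt profile shows why frequency preference alone cannot locate the core-belt border.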

#### **TONOTOPIC MAPS IN THE HUMAN AUDITORY CORTEX**

fMRI studies in humans have partially confirmed the functional organization of the monkey auditory system. Early studies (Bilecen et al., 1998; Talavage et al., 2000; Engelien et al., 2002; Schönwiesner et al., 2002) gathered evidence for the presence of multiple frequency-selective responses along Heschl's region, but failed to obtain detailed topographical representations of these frequency-selective responses. In one of the first neuroscientific applications of ultra-high field MR (7 Tesla), Formisano et al. (2003) depicted the detailed tonotopic layout of human PAC. Based on the spatial arrangement and mirror-symmetry of the frequency-selective responses, this tonotopic map was interpreted as reflecting the human homologs of monkey areas A1 and R (hA1 and hR; Merzenich and Brugge, 1973; Merzenich et al., 1973; Reale and Imig, 1980; Kaas and Hackett, 2000).

In recent years, the extraction of tonotopic maps throughout the human superior temporal plane with fMRI has become increasingly feasible (Talavage et al., 2004; Woods et al., 2009, 2010; Humphries et al., 2010; Da Costa et al., 2011; Striem-Amit et al., 2011; Langers and van Dijk, 2012). Resulting maps show good correspondence across studies. A large low frequency region on HG is consistently observed, surrounded posteriorly (on HS and PT), antero-medially, and antero-laterally (on PP) by regions preferring high frequencies (**Figures 3B,C**). The regions preferring high frequencies adjoin at the medial end of HG, creating a "V" shaped pattern (blue regions in **Figures 3B,C**). It is commonly agreed that the human auditory core is situated within this main high-low-high gradient, yet studies vary widely in the exact part of the tonotopic gradient that they assign to the core (Baumann et al., 2013; Saenz and Langers, 2014). Interpretations vary from a placement of the auditory core along HG ("classical interpretation," **Figure 3B**), to a placement across HG ("orthogonal interpretation," **Figure 3C**) and everything in between (Baumann et al., 2013). The classical interpretation is in agreement with cytoarchitectonic investigations of human auditory cortex that reliably place the core on the medial and central part of HG. However, as the long axis of monkey auditory core runs parallel to the STG, across-species consistency may favor a perpendicular (Da Costa et al., 2011) or oblique (Baumann et al., 2013) arrangement of the core. Moreover, while some studies interpret the complete high-low-high map, stretching from PP to PT, as reflecting two primary auditory fields hA1 and hR (Da Costa et al., 2011), other studies suggest that part of this large gradient reflects auditory belt fields (Talavage et al., 2004; Woods et al., 2009, 2010; Humphries et al., 2010; Striem-Amit et al., 2011). Beyond the main high-low-high frequency gradient, an additional low frequency region is often reported at the antero-lateral border of the main gradient on PP/anterior STS (region 3 in **Figure 3D**; Talavage et al., 2004; Woods et al., 2009; Humphries et al., 2010; Moerel et al., 2012). Together with part of the anterior high frequency part of the main gradient, this region may reflect the human homolog of primary region RT (hRT).

#### **ADDITIONAL FREQUENCY GRADIENTS IN FINE-GRAINED TONOTOPIC MAPS**

As we explore tonotopic maps at higher spatial resolution, refrain from smoothing maps with large spatial filters, and inspect single subject maps, it becomes apparent that the auditory cortex contains a larger number of frequency reversals than commonly assumed (see **Figure 3D**). These additional gradients on the supratemporal plane are evident in individual subject maps, yet possibly due to their small extension and relatively variable location across individuals, are often not evident on group maps. Consequently, they are generally not discussed. Although we acknowledge that care must be taken with overinterpreting small regions, there are four patterns beyond the main gradient that consistently appear in single subject tonotopic maps (indicated with white circles in the tonotopic maps of **Figure 4**). These patterns may provide important information for defining a functional topography of human auditory areas.
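The effect of spatial smoothing on small gradients can be illustrated with a toy example. The sketch below assumes a hypothetical single-subject best-frequency profile (in octaves, relative to an arbitrary reference) containing the main high-low-high gradient plus one small additional reversal pair, and compares the number of reversals before and after a three-point moving average. The profile and the filter width are illustrative assumptions, not data from any of the cited studies.

```python
def count_reversals(profile):
    """Number of switches between rising and falling segments."""
    count = 0
    prev = 0
    for i in range(1, len(profile)):
        step = profile[i] - profile[i - 1]
        direction = (step > 0) - (step < 0)
        if direction != 0:
            if prev != 0 and direction != prev:
                count += 1
            prev = direction
    return count


def moving_average(profile, half_width=1):
    """Boxcar smoothing; the window is truncated at the edges."""
    smoothed = []
    for i in range(len(profile)):
        lo = max(0, i - half_width)
        hi = min(len(profile), i + half_width + 1)
        window = profile[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical profile: main high-low-high gradient with a small local dip.
raw = [3, 2, 1, 0, 1, 2, 1.8, 2.2, 3]
smoothed = moving_average(raw)

print(count_reversals(raw))       # 3
print(count_reversals(smoothed))  # 1
```

In this toy case only the main low-frequency reversal survives smoothing, which mirrors the observation that small gradients visible in individual maps tend to disappear in smoothed or group-averaged maps.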

First, the large low frequency region on HG and adjacent STG can be divided into two smaller regions (indicated with numbers 1A and 1B in **Figure 3D**; see regions "4" and "6" in left and right hemisphere, respectively in Striem-Amit et al., 2011, and the progression between endpoints 3 and 6 in Talavage et al., 2004). An ellipse-shaped low frequency region of which the long axis runs along HG is located in the middle of HG; this part of the large low frequency region most likely belongs to the auditory core (region 1A in **Figure 3D**; included within the black outlines on the tonotopic maps in **Figure 4**). A larger low frequency region can be discriminated on lateral HG/middle STG (region 1B in **Figure 3D**; white circle on middle STG in **Figure 4**). While these two low frequency regions often merge into one large frequency patch, only the medial region on HG belongs to the high-low-high core gradient. The region on lateral HG/STG may be part of the lateral belt.

Second, the large high frequency region on the anterior part of the auditory cortex (regions 2A and 2B in **Figure 3D**) is divided into two smaller regions by a small low frequency region (region 4 in **Figure 3D**, and most anterior white circle in **Figure 4**). This small low frequency region appears in a substantial number of single subject maps across fMRI investigations (Da Costa et al., 2011; Herdener et al., 2013; Moerel et al., 2013; see **Figure 4**), and may reflect fields of the human medial belt cortex (Brodmann area 52; Brodmann, 1909).

Third, another reversal in frequency is present on the posteromedial end of HG (region 5 in **Figure 3D** and medial white circle in **Figure 4**; Talavage et al., 2004; Da Costa et al., 2011; Langers, 2014). This region may correspond to monkey regions CM/CL, which have been reported to contain tonotopic gradients and share a high frequency border with A1 and ML, respectively (Kajikawa et al., 2005).

Finally, posterior to the main high-low-high frequency gradient, extending from HS covering PT and posterior STG, additional frequency regions are located (regions 6A,B in **Figure 3D**). A low frequency region has been reported (Humphries et al., 2010; Da Costa et al., 2011 describe this reversal attributing it mainly to the right hemisphere; Langers, 2014; the frequency progression between endpoints 3 and 8 in Talavage et al., 2004), and in many maps additional clusters preferring high frequencies are present (posterior white circle on PT in **Figure 4**). Depending on the orientation of the auditory core, these regions have been interpreted as reflecting lateral belt and parabelt cortex (classical interpretation; Barton et al., 2012; Moerel et al., 2012), or CM/CL (orthogonal and oblique orientations; Humphries et al., 2010; Striem-Amit et al., 2011; Baumann et al., 2013). Because of large differences in anatomy between human and non-human PT (Hackett et al., 2001) and a relatively poor functional characterization of the monkey cortex beyond belt (Schreiner and Winer, 2007), we can no longer build on results from the non-human primate for an interpretation and parcellation of this region of cortex. Based on human cytoarchitecture studies, this region should include PaAe (Galaburda and Sanides, 1980; Te3, Morosan et al., 2005; LA/PA/STA, Wallace et al., 2002) and Tpt (Galaburda and Sanides, 1980). Understanding the precise topology of these regions requires additional knowledge regarding their response properties (e.g., tuning to temporal/spectral modulations, latency).

## **MAGNETO-ENCEPHALOGRAPHIC AND ELECTROPHYSIOLOGICAL RECORDINGS OF FREQUENCY SELECTIVE AUDITORY CORTICAL RESPONSES**

Major contributions to our current knowledge of the functional topography of human auditory cortex come from methods other than fMRI. Using MEG, the presence of a tonotopic organization has been investigated by exploring frequency-dependent shifts of auditory-evoked fields (AEFs). An early MEG study showed that evoked responses increased in depth (i.e., toward medial HG) with increases in frequency, presumably reflecting the tonotopic gradient in hA1. The observed tonotopic progression was described as a logarithmic mapping, in which the location of the evoked response shifted as a function of the logarithm of the stimulus frequency (Romani et al., 1982). MEG investigations of the human tonotopic organization since this study present conflicting outcomes. While some studies did not observe evidence of a tonotopic organization in their data (Roberts and Poeppel, 1996), the majority of MEG studies report posteromedial shifts of the equivalent dipole location with increasing frequency in agreement with Romani et al. (1982; N19m-P30m response in Scherg et al., 1989; N100m in Pantev et al., 1988; steady state response in Wienbruch et al., 2006). Findings of invasive (intracranial) electrophysiological recordings in humans are in accordance with this pattern. These recordings showed that the neurons' characteristic frequency (CF) increased toward postero-medial locations, supporting the presence of one tonotopic gradient on human medial HG (Howard et al., 1996). However, using MEG several other tonotopic patterns were observed as well, including one frequency gradient with reversed direction (compatible with the low-to-high gradient in hR; Hari and Mäkelä, 1986) and a mirror symmetric pattern (Pantev et al., 1995; N100m and Pam response reflecting high-to-low and low-to-high pattern when moving from postero-medial to antero-lateral locations, respectively). Throughout MEG studies, reported gradients are reproducible within an individual but highly variable across individuals (Lütkenhöner et al., 2003).
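A logarithmic mapping of this kind can be written as position = a + b · log10(frequency), so that each doubling of frequency shifts the source by a constant distance. The sketch below fits such a model by ordinary least squares. The frequencies, the synthetic source depths, and the parameter values are invented for illustration and are not taken from Romani et al. (1982) or any other cited study.

```python
import math


def fit_log_mapping(freqs_hz, positions_mm):
    """Least-squares fit of position = a + b * log10(frequency)."""
    xs = [math.log10(f) for f in freqs_hz]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(positions_mm) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, positions_mm))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic example: depths generated from a known log-linear rule
# (a = 10 mm, b = 5 mm per decade), purely for illustration.
freqs = [250, 500, 1000, 2000, 4000]
depths = [10.0 + 5.0 * math.log10(f) for f in freqs]

a, b = fit_log_mapping(freqs, depths)
shift_per_octave = b * math.log10(2)  # constant displacement per doubling
print(round(a, 6), round(b, 6), round(shift_per_octave, 3))
```

With real dipole locations the fit would of course not be exact; the residuals and the sign of the slope are what would carry information about the presence and direction of a gradient.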

The variability across individuals and studies may be explained when considering advantages and disadvantages of using MEG as a tool for tonotopic mapping. While MEG does not suffer from fMRI drawbacks such as relatively low temporal resolution or interference of scanner noise, MEG is limited by other factors when mapping tonotopy (see discussion in Formisano et al., 2003). First, equivalent dipole modeling of neural activity originating from simultaneously active locations is problematic, and resulting dipoles generally reflect the combined activity from these sources. Tonotopic gradients within human auditory cortex are close to each other in time and space, and may therefore not be distinguished from each other using MEG. Second, because the multiple tonotopic gradients in subdivisions of human auditory cortex are variously oriented, it is not possible to distinguish whether an observed shift of dipole location with frequency originates from one tonotopic gradient, or from the relative weighting of these subdivisions. As a result, MEG may again be unable to distinguish frequency gradients from each other. Third, interpretation of the N100m component that most MEG tonotopy studies are based on is controversial. While some studies interpret it as reflecting activity in PAC, substantial evidence points to it originating from secondary auditory areas such as PT instead (Lütkenhöner and Steinsträter, 1998; Engelien et al., 2000). This is supported by investigations of the anatomical origin of auditory evoked potentials using invasive electrophysiological recordings, which ascribed the generator of the N100 to PT and possibly the lateral part of HG (Liégeois-Chauvel et al., 1994). Thus, while MEG is well-suited to capture the dynamics of auditory processing, and has made substantial contributions to, for example, the investigation of cortical speech processing (Lütkenhöner and Poeppel, 2009), it is not optimal for mapping the relatively small frequency gradients within the human cortical tonotopic map.

Alternatively, invasive (intracranial) electrophysiological recordings (Liégeois-Chauvel et al., 1994; Howard et al., 1996; Nourski et al., 2014) have good spatial and temporal resolution, and thereby provide a unique window into the workings of human auditory cortex. For example, using invasive electrocorticography (ECoG) a recent study observed a dynamic mirror-symmetric tonotopic gradient on postero-lateral STG, supporting the view that cortical tonotopy maps extend far beyond the auditory core (Striem-Amit et al., 2011; Moerel et al., 2012). However, invasive electrophysiological recordings have limited applicability (i.e., they are mostly restricted to patients undergoing neurosurgical procedures for epilepsy or brain tumor) and limited spatial coverage (i.e., grid placement is determined by clinical criteria). As such, these measurements have not yet been able to provide a complete picture of spectral selectivity throughout the supratemporal plane.

## **CHARACTERIZATIONS OF AUDITORY CORTEX BEYOND TONOTOPY**

#### **LIMITATIONS OF TONOTOPIC MAPS**

Based on results from the monkey auditory cortex, the frequency gradient in the human core is commonly assumed to run parallel to the gradient in belt areas (Rauschecker and Tian, 2004). Consequently, the auditory cortex cannot be divided into core, belt, and parabelt based on maps of tonotopy alone. This leaves several gaps in our knowledge of the human auditory cortex. As frequently discussed in the auditory neuroscience community, tonotopic maps alone are insufficient to determine the *orientation* of the auditory core with respect to HG (classical, orthogonal, or oblique; compare maps in bottom row of **Figure 3**). Equally important, the *size* of the human core cannot be determined from tonotopic maps. For example, in the bottom row of **Figure 3** the auditory core can be equally well-represented by the black lines and the white dotted lines. Cytoarchitectonic parcellations of the auditory cortex showed that the average size of the human auditory core is approximately 1650 mm³, roughly half of the entire HG (average size of HG = 3200 mm³). The size of the auditory core, and the relation between the size of the core and the size of HG, was shown to vary greatly across individuals (Rademacher et al., 2001). These results should be taken into account when interpreting tonotopic maps. While it is commonly agreed that the auditory core must include HG, several studies interpret not only the entire HG but also surrounding areas on PP and PT as auditory core (Da Costa et al., 2011; Herdener et al., 2013; Langers, 2014), leading to a substantial overestimation of the auditory core size. Finally, macroanatomy, microanatomy, and tonotopic pattern vary substantially across individuals (Rademacher et al., 2001; Da Costa et al., 2011). Part of the tonotopic map varies with macroanatomy, such that the main low frequency patch is likely to move in a posterior direction in the case of partial or complete duplications of HG (Da Costa et al., 2011). However, with only tonotopy as a characterization of the auditory cortex, *interindividual variation* in cortical organization, including interindividual variation in the orientation of the core, cannot be resolved (Rademacher et al., 2001). Reliable estimates of core size and location in individuals are crucial if we are to systematically study the functional properties of auditory fields, the transformations of sound representations throughout these fields, and deviations in special cases (e.g., musicians, tinnitus patients, or cochlear implant patients).

#### *IN VIVO* **MAPPING OF MYELO-ARCHITECTURE**

Recent studies have explored functional and anatomical properties of the auditory cortex beyond its frequency preference. One promising research stream is to map cortical myelin density non-invasively using MRI (Glasser and Van Essen, 2011; Dick et al., 2012). While histological examination of cortical myelin density can only be performed on post-mortem tissue, recent studies showed that MRI contrast can reveal myelin-related maps *in vivo*. Specifically, myelin-related maps have been created using quantitative T1 (Sigalovsky et al., 2006; Dick et al., 2012; Sereno et al., 2012), quantitative T2∗ (Cohen-Adad et al., 2012), or T2 or T2∗ weighted contrasts (Glasser and Van Essen, 2011; De Martino et al., 2014). In agreement with post-mortem studies, experiments mapping myelin non-invasively in the human cortex with MRI revealed a heavily myelinated region on the superior temporal plane. Specifically, anterior regions (PP) showed the least myelination, posterior regions (PT) were moderately myelinated, and HG was found to be most densely myelinated (Sigalovsky et al., 2006; Glasser and Van Essen, 2011). The highly myelinated region on HG coincided with probabilistic cytoarchitectonic regions Te1.1 and Te1.0 (Morosan et al., 2005; Glasser and Van Essen, 2011), and with two mirror-symmetric tonotopic gradients oriented along HG that were interpreted as reflecting hA1 and hR (Dick et al., 2012). The medial part of this mirror-symmetric gradient showed a slightly greater myelination than the lateral part. De Martino et al. (2014) partially replicated these findings at 7T. Using a clustering approach and multiple MR contrasts, they automatically identified the most densely myelinated region in individual hemispheres. This region overlapped with a single high-to-low frequency gradient (see third column in **Figure 4**), and was interpreted as reflecting hA1. Importantly, both studies showed that myelin-related contrast varied among hemispheres and individuals, illustrating the need to obtain a distinct measure beyond tonotopy in order to identify the core in individual hemispheres (Dick et al., 2012; De Martino et al., 2014).

### **FUNCTIONAL CORTICAL TUNING BEYOND FREQUENCY**

In addition to cortical myelin contrasts, functional properties may provide crucial information on the auditory cortical organization. In the monkey auditory cortex, cortical tuning width is employed to distinguish core from belt areas. Tuning width refers to the frequency selectivity of a neuron, which is narrower in core than in belt regions (Rauschecker et al., 1995; Hackett et al., 1998; Rauschecker and Tian, 2004; Kajikawa et al., 2005; Kusmierek and Rauschecker, 2009). Recent studies used a computational model to analyze responses to natural sounds measured with fMRI, and thereby obtained maps of tuning width throughout the human auditory cortex (Moerel et al., 2012; De Martino et al., 2013). Regions of both narrow and broader tuning could be identified throughout the supratemporal plane. A narrowly tuned region along HG was evident in both hemispheres (second column of **Figure 4**). When only interpreting the narrow part of the tonotopy map as the PAC, a high-low-high-low tonotopic gradient was distinguished running in antero-lateral direction along HG (**Figure 4**). This region was identified as reflecting hA1, hR, and hRT. Note that tuning width maps only reflect the width of the main spectral peak, and therefore do not convey information regarding the complexity of spectral tuning (Moerel et al., 2013). Furthermore, as each fMRI voxel combines the signal coming from a substantial cortical patch and a large number of neuronal populations, the tuning width maps may reflect at least in part the homogeneity of neuronal spectral tuning rather than the tuning width alone. Consequently, while tuning width maps may be used to identify PAC in individuals, they may not be informative for regions beyond HG.
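To make the notion of tuning width concrete, the sketch below computes a Q-factor-like index (peak frequency divided by the full width at half maximum of the main spectral peak) from a sampled tuning profile. This is an illustrative assumption and not the computational model used in the cited fMRI studies; the function name `tuning_width_index` and the example profiles are hypothetical, and responses are assumed to be non-negative with a positive peak.

```python
def tuning_width_index(freqs_hz, responses):
    """Return (cf, bandwidth, q) for the main peak of a tuning profile.

    cf is the frequency with the largest response, bandwidth is the full
    width at half maximum (linearly interpolated between samples), and
    q = cf / bandwidth. A larger q means narrower tuning.
    """
    peak = max(range(len(responses)), key=lambda i: responses[i])
    half = responses[peak] / 2.0

    def crossing(step):
        # Walk away from the peak while the response stays above half maximum.
        i = peak
        while 0 <= i + step < len(responses) and responses[i + step] > half:
            i += step
        j = i + step
        if not 0 <= j < len(responses):
            return freqs_hz[i]  # profile never drops to half maximum on this side
        # Linear interpolation between sample i (above half) and j (at or below).
        t = (responses[i] - half) / (responses[i] - responses[j])
        return freqs_hz[i] + t * (freqs_hz[j] - freqs_hz[i])

    low, high = crossing(-1), crossing(+1)
    bandwidth = high - low
    return freqs_hz[peak], bandwidth, freqs_hz[peak] / bandwidth


# Hypothetical profiles sampled at the same frequencies (not measured data).
freqs_hz = [1000, 1500, 2000, 2500, 3000]
narrow_profile = [0.0, 0.0, 1.0, 0.0, 0.0]
broad_profile = [0.0, 1.0, 2.0, 1.0, 0.0]

narrow = tuning_width_index(freqs_hz, narrow_profile)
broad = tuning_width_index(freqs_hz, broad_profile)
```

Because the index only considers the main peak, it shares the limitation noted above: a profile with several spectral peaks would be summarized by the width of the largest one alone.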

As natural sounds can be characterized well by their energy modulations in the spectral and temporal dimensions, it has been suggested that preferential processing of these auditory features may be crucial to describe the topography of the auditory cortex. Indeed, in the monkey auditory midbrain (Baumann et al., 2011) and cat auditory cortex (Langner et al., 2009), a map of periodicity preference orthogonal to the tonotopic map has been observed. Periodicity refers to the rate of temporal modulations in a sound, which evokes a corresponding pitch percept. Recent studies used fMRI to map frequency and periodicity preference throughout the human auditory cortex (Barton et al., 2012; Herdener et al., 2013). Based on the combination of these maps, Barton et al. (2012) parcellated the auditory cortex into "cloverleaf" clusters. Within this parcellation scheme, tonotopic reversals serve to segregate cloverleaf clusters from each other, while periodotopic reversals divide a cluster into auditory fields. In this manner 11 auditory subfields were identified, with core regions hA1 and hR occupying medial and lateral HG, respectively. Conversely, Herdener et al. (2013) observed a gradient of periodotopic preference along HG, with medial and lateral parts preferring high and low temporal modulations, respectively. This discrepancy has so far not been resolved by studies that simultaneously explored preference for combined spectral and temporal modulations, using either artificial sounds ("ripples"; Langers et al., 2003; Schönwiesner and Zatorre, 2009) or complex natural sounds (Santoro et al., 2014). While the cortical spectral modulation preferences revealed by these studies consistently showed that regions along and antero-ventrally to HG process fine-grained spectral information, such consistency across studies was not apparent with regard to resulting maps of temporal modulation preference (i.e., periodotopy). Future studies will be needed to elucidate these findings.

Beyond large-scale maps of feature preference, research in primates (Petkov et al., 2008) and humans (Belin et al., 2000; Zatorre et al., 2002) indicates that the non-PAC contains regions where neuronal populations respond more strongly to conspecific vocalizations than to other sound categories (i.e., speech and voice regions, see **Figure 4**). These speech/voice regions contribute to the formation of higher-level sound representations, at least partially abstracted from the sound acoustics (Belin et al., 2000; Formisano et al., 2008). A recent exploration of the relation between these higher level regions and low level feature maps revealed a consistent overlap of speech/voice regions and the low frequency part of tonotopic maps (Moerel et al., 2012; compare first and last columns of **Figure 4**). This overlap was present even when simple tones were presented and was interpreted as reflecting a specialized filter mechanism, enhancing those low level features (i.e., the low frequencies) crucial to speech and voice sounds. These results suggest that, similar to eccentricity mapping in the visual system (Malach et al., 2002), tonotopic mapping may also help define the topography of high-order auditory areas.

## **A WORKING MODEL OF HUMAN AUDITORY CORTEX**

Over a decade after the first fMRI studies showing tonotopic maps in the human auditory cortex, the discussion of how these maps should be interpreted is still in full force (Langers and van Dijk, 2012; Baumann et al., 2013; Saenz and Langers, 2014). To add to this discussion, we propose a working model of the human auditory cortex (**Figure 5**) and attempt to reconcile results from high-resolution tonotopic mapping and other non-invasive functional characterizations with results from *post-mortem* and *in vivo* anatomical studies. Furthermore, we discuss similarities and divergence with respect to the commonly accepted model of monkey auditory cortex (Hackett et al., 1998).

#### **THE ORIENTATION AND SIZE OF THE HUMAN CORE**

Based on human cytoarchitectonics, the core size should on average correspond to half of HG. Furthermore, it should be largely restricted to HG, yet deviations from HG can occur at the postero-medial end especially in the case of partial or complete duplications. In individual subjects the core should coincide with a narrowly tuned region of tuning width maps. Accordingly, we place the core largely in the medio-lateral direction of HG, oriented at a relatively small angle from the long direction of the STG (**Figures 4**, **5**). The medial part of the core coincides with the region of highest myelination. This orientation of the core is compatible with the macaque model. In a recent review, Baumann et al. (2013) clarified that contrary to common assumption the macaque auditory cortex contains a protuberance that may be interpreted as the precursor of human HG. The main high-low-high tonotopic gradient runs at a slight angle with respect to this protuberance, extending slightly beyond its postero-medial and antero-lateral endpoints. The proposed core region in **Figures 4**, **5** is consistent with this arrangement. Note that while our proposed region is very similar in orientation compared to the region proposed by Baumann et al. (2013), there is a crucial difference in the location of hA1. Baumann et al. (2013) proposed to place hA1 in the medial HG with overlap into medial and central HS and possibly extending onto PT. Instead, in our model hA1 occupies medial locations including medial HG and medial HS, but excludes the most lateral part of hA1 (i.e., the extension of hA1 into central HS/PT) as proposed by Baumann et al. (2013).

The first high frequency maximum occupies the most medial part of HG. The main high-to-low frequency gradient, reflecting hA1, proceeds laterally ending in the main low frequency maximum. The tonotopic gradient reverses direction at this low frequency maximum and travels to the second high frequency maximum on anterior and lateral HG. This creates the second complete frequency map, reflecting hR. The frequency gradients in hA1 and hR run at an approximately 90° angle to each other (see two white arrows in hA1 and hR in **Figure 5**). The organization of the resulting human PAC tonotopy model is strikingly similar to what was proposed for the non-human primate. Compared to the model as proposed by Hackett et al. (1998, 2001; see Kaas and Hackett (2000) for the orientation of isofrequency bands within the core), the current PAC is rotated to align with HG but its tonotopic organization remains identical (compare human tonotopy gradient in **Figure 5** "Group – Core" to primate model inset in this panel). A second frequency gradient reversal occurs in antero-lateral HG (reflecting hRT; most anterior and lateral white arrow in **Figure 5**), resulting in a high-low-high-low tonotopic pattern within human PAC.
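The role of gradient reversals in delineating fields can be sketched in a few lines. The example below assumes that preferred frequency has been sampled at evenly spaced points along a path from medial to antero-lateral HG; the function name `find_reversals` and the frequency values are hypothetical and serve only to illustrate how a high-low-high-low pattern yields two reversals and therefore three candidate fields.

```python
def find_reversals(best_freqs_hz):
    """Return indices where the preferred-frequency gradient changes sign.

    Each returned index marks a local maximum or minimum along the sampled
    path, i.e., a candidate border between mirror-symmetric fields.
    Plateaus (equal neighboring values) are skipped.
    """
    reversals = []
    prev_sign = 0
    for i in range(1, len(best_freqs_hz)):
        diff = best_freqs_hz[i] - best_freqs_hz[i - 1]
        sign = (diff > 0) - (diff < 0)
        if sign != 0:
            if prev_sign != 0 and sign != prev_sign:
                reversals.append(i - 1)
            prev_sign = sign
    return reversals


# Hypothetical high-low-high-low profile (not measured data).
path_hz = [4000, 2000, 1000, 500, 1000, 2000, 4000, 2000, 1000]
borders = find_reversals(path_hz)  # indices of the low and high extrema
n_fields = len(borders) + 1
```

In this toy profile the first reversal falls on the low frequency minimum (the candidate hA1/hR border) and the second on the following high frequency maximum (the candidate hR/hRT border).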

#### **EXPLORING BELT AND PARABELT REGIONS**

An auditory responsive region surrounds the core, possibly reflecting the medial and lateral belt (middle part of **Figure 5**). In the medial portion, a small low frequency region divides the large region preferring high frequencies into two separate regions (hMM and hRM), possibly reflecting the homologous belt fields in the macaque (Kusmierek and Rauschecker, 2009). At the medial junction of the belt regions, located on the medial crown of HS, another reversal in frequency is found. We interpret it as reflecting regions hCM and hCL, each containing a full tonotopic gradient (Hackett et al., 1998; Kajikawa et al., 2005), but with strong predominance of responses to high frequencies. The lateral belt (middle part of **Figure 5**) contains a single high-to-low tonotopic gradient in postero-medial to antero-lateral direction (that follows the tonotopic gradient in hA1). Although only one tonotopic gradient occupies this region, it may comprise two functionally separate subdivisions. The medial part of this region includes a full high-to-low tonotopic gradient, and may correspond to the human homolog of lateral belt region ML in the monkey (Hackett et al., 1998). Conversely, the lateral part of this region is strongly tuned to low frequencies and overlaps with the speech/voice sensitive region on lateral HS/middle STG (white outlines in **Figure 5**; Belin et al., 2000; Moerel et al., 2012). We interpret this region as reflecting hAL. The low frequency tuning of this region could reflect a uniquely human property, deriving from the need to process the low-frequency spectral energy of voices and speech. Furthermore, based on its anatomical location (lateral to the hA1/hR boundary) and low frequency tuning, this region may correspond to the human homolog of the "pitch" region (Griffiths and Hall, 2012), which was shown to contain a large proportion of low-frequency tuned neurons responding selectively to missing fundamental pitch (Bendor and Wang, 2005).

The human parabelt may be situated posterior-laterally to the lateral belt (right part of **Figure 5**; Galaburda and Sanides, 1980; Hackett et al., 1998; Morosan et al., 2005). This region is substantially larger in the left than in the right hemisphere. Correspondingly, while the parabelt may be largely situated on the external part of PT in the left hemisphere, it may be shifted laterally onto posterior STG/STS in the right hemisphere. Systematic research across individuals is required to confirm this proposal. The border between lateral belt and parabelt cannot be reliably identified non-invasively. Progress in making this division may be expected from MRI explorations of myelin contrast. In addition to the cluster with densest myelination, which was interpreted as reflecting the core, De Martino et al. (2014) identified three regions with varying patterns of myelination throughout the cortical depth. Whether and how these clusters relate to belt/parabelt divisions is a topic for further investigation. Alternatively, we may learn more about the parabelt regions by exploring their functional properties. Additional gradients of tonotopy and tuning width occupy these regions on the STG and the posterior end of the temporal plane. Within these regions, speech/voice sensitive regions reside. Mapping the response properties of these regions (e.g., tuning to temporal/spectral modulations, latency; Santoro et al., 2014) and the transformations of sound representations may give insights into the topology of the auditory parabelt.

#### **HEMISPHERIC DIFFERENCES IN THE TOPOGRAPHY OF HUMAN AUDITORY CORTEX**

While a subset of tonotopy studies observed hemispheric biases, reporting a more prominent tonotopic organization in either left (Wessinger et al., 1997) or right hemisphere (Bilecen et al., 1998; Langers et al., 2007), in others (Woods et al., 2009; Da Costa et al., 2011; Striem-Amit et al., 2011; Moerel et al., 2012) the main tonotopic axis in the vicinity of HG is similar across hemispheres and also the orientation of the narrowly tuned region with respect to HG appears stable across hemispheres (Moerel et al., 2012). The additional frequency regions reported above (on middle STG, in the FTS, and on the posteromedial end of HG) are consistently observed in the right hemisphere as well as in the left. The only exception may be the additional frequency gradients posterior to the main high-low-high frequency gradient, extending from HS covering PT and posterior STG. While the additional low frequency reversal on posterior STG/lateral PT (region 6A in **Figure 3D**) is present in the right hemisphere, the right hemisphere tonotopic map may miss a part in medial PT (region 6B in **Figure 3D**).

We observed an increase in intersubject variability in the right hemisphere tonotopic maps compared to the left hemisphere tonotopic maps (Moerel et al., 2013). It is not clear whether this increased variability is due to poorer across-subject alignment of macroanatomy, or if it reflects true variability in the tonotopic pattern. In terms of gross macroanatomy, the right supratemporal plane is shifted anteriorly and laterally compared to the left supratemporal plane (shift of approximately 7 and 5 mm anteriorly and laterally, respectively; Rademacher et al., 2001), and the STS is deeper in the right than left hemisphere (Ochiai et al., 2004). Conversely, the SF is longer and more horizontal in the left hemisphere than the right (Steinmetz et al., 1990; Ide et al., 1996), and PT is larger in the left than the right supratemporal plane (Geschwind and Levitsky, 1968). These asymmetries in macroanatomy already exist in infants (Witelson and Pallie, 1973; Glasel et al., 2011), suggesting that they are genetically determined. Interestingly, a recent study showed that across-subject variability in macroanatomical landmarks and functional responses increases with development in the right compared to the left hemisphere. This suggests that the right supratemporal plane may be shaped by unique individual developmental experiences (Bonte et al., 2013). The increased inter-subject variability in the right hemispheric tonotopic maps compared to the left is in accordance with this suggestion. Interhemispheric differences have also been reported at the microanatomical level. As PT is larger in the left hemisphere, cytoarchitectonically defined region Tpt on PT is larger in the left hemisphere as well (Galaburda and Sanides, 1980). While the size of left and right PAC is similar (Galaburda and Sanides, 1980; Rademacher et al., 2001), regions PaAi and PaAe that are located posterior to HG and anterior to Tpt are larger in the right hemisphere (Galaburda and Sanides, 1980). Therefore, we expect lateral belt regions (middle column in **Figure 5**) to be wider in the right hemisphere compared to our working model.

The left and right hemispheres have different functional roles in sound processing. Studies showed a relative dominance for language processing and tonal, music, and voice processing for left and right hemisphere, respectively (Zatorre, 1988; Belin et al., 2000; Hickok and Poeppel, 2000; Scott et al., 2000). Hemispheric biases in acoustic feature processing reflect the computational demands arising from this task-dependent specialization, such that the left hemisphere is relatively optimized for temporal processing (Shannon et al., 1995; Liégeois-Chauvel et al., 1999; Zatorre et al., 2002), while the right hemisphere is relatively superior in fine spectral processing (Liégeois-Chauvel et al., 2001; Zatorre and Belin, 2001; Zatorre et al., 2002). To the best of our knowledge, the relation of this functional asymmetry to the underlying tonotopic maps has not been studied. As such, it is a challenge for future research to explore how these reported hemispheric biases in spectrotemporal processing are reflected within the different auditory fields.

## **ACKNOWLEDGMENTS**

This work was supported by Maastricht University and the Netherlands Organization for Scientific Research (NWO: VICI grant [Elia Formisano] 453-12-002; VIDI grant [Federico De Martino] 864-13-012; Rubicon [Michelle Moerel] 446-12-010).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 May 2014; accepted: 08 July 2014; published online: 29 July 2014.*

*Citation: Moerel M, De Martino F and Formisano E (2014) An anatomical and functional topography of human auditory cortical areas. Front. Neurosci. 8:225. doi: 10.3389/fnins.2014.00225*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2014 Moerel, De Martino and Formisano. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The relevance of task-irrelevant sounds: hemispheric lateralization and interactions with task-relevant streams

## *Ana A. Amaral<sup>1,2</sup>\* and Dave R. M. Langers<sup>2,3</sup>*

*<sup>1</sup> International Neuroscience Doctoral Programme, Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal*

*<sup>2</sup> Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, Netherlands*

*<sup>3</sup> National Institute for Health Research, Nottingham Hearing Biomedical Research Unit, School of Medicine, University of Nottingham, Nottingham, UK*

#### *Edited by:*

*Yukiko Kikuchi, Newcastle University Medical School, UK*

#### *Reviewed by:*

*Kimmo Alho, University of Helsinki, Finland; Amy Poremba, University of Iowa, USA*

#### *\*Correspondence:*

*Ana A. Amaral, Champalimaud Foundation, Champalimaud Centre for the Unknown, Avenida Brasília, 1400-038 Lisboa, Portugal e-mail: ana.amaral@neuro.fchampalimaud.org*

The effect of unattended task-irrelevant auditory stimuli in the context of an auditory task is not well understood. Using human functional magnetic resonance imaging (fMRI) we compared blood oxygenation level dependent (BOLD) signal changes resulting from monotic task-irrelevant stimulation, monotic task-relevant stimulation and dichotic stimulation with an attended task-relevant stream to one ear and an unattended task-irrelevant stream to the other ear simultaneously. We found strong bilateral BOLD signal changes in the auditory cortex (AC) resulting from monotic stimulation in a passive listening condition. Consistent with previous work, these responses were largest on the side contralateral to stimulation. AC responses to the unattended (task-irrelevant) sounds were preferentially contralateral and strongest for the most difficult condition. Stronger bilateral AC responses occurred during monotic passive-listening than to an unattended stream presented in a dichotic condition, with attention focused on one ear. Additionally, the visual cortex showed negative responses compared to the baseline in all stimulus conditions including passive listening. Our results suggest that during dichotic listening, with attention focused on one ear, (1) the contralateral and the ipsilateral auditory pathways are suppressively interacting; and (2) cross-modal inhibition occurs during purely acoustic stimulation. These findings support the existence of response suppressions within and between modalities in the presence of competing interfering stimuli.

**Keywords: auditory cortex, dichotic listening, lateralization, cross-modal inhibition, Humans, fMRI**

## **INTRODUCTION**

The auditory system relies on various cues to segregate concurrent sound streams. These include cues related to sound source location, derived from, for instance, head-related transfer functions, interaural time differences, and interaural level differences (Ehret and Romand, 1997; Moore et al., 2010). The relationship between the lateralization of sound that is detected by the two ears and the lateralization of sound-evoked brain responses in the two hemispheres has been well studied. Both ears are known to project to both auditory cortices through contralateral and ipsilateral auditory pathways. Contralateral connections are more numerous than the ipsilateral ones (Rosenzweig, 1951; Hall and Goldstein, 1968; Reite et al., 1981). Brain responses resulting from monotic stimulation are bilateral and stronger in the hemisphere contralateral to stimulus presentation (Rosenzweig, 1951; Reite et al., 1981; Pantev et al., 1986; Scheffler et al., 1998; Alho et al., 1999; Woldorff et al., 1999; Langers et al., 2005; Della Penna et al., 2007). Furthermore, it has been shown using functional magnetic resonance imaging (fMRI) (Scheffler et al., 1998; Jäncke et al., 2002; Krumbholz et al., 2005) and magnetoencephalography (MEG) (Pantev et al., 1986; Fujiki et al., 2002; Kaneko et al., 2003) that the responses are sub-additive, that is, the sum of brain responses to left and right monotic stimulation exceeds the response to dichotic stimulation, a phenomenon known as 'binaural interaction'. To explain this, it has been suggested that a competition arises between the two pathways causing the stronger contralateral pathway to suppress the ipsilateral one, decreasing the overall brain responses (Fujiki et al., 2002; Kaneko et al., 2003; Brancucci et al., 2004; Della Penna et al., 2007).
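The two quantities discussed above, contralateral dominance and sub-additivity, can be written down as simple indices. The sketch below is illustrative only: the function names `binaural_interaction` and `laterality_index` and all response amplitudes are hypothetical, and the cited studies may have quantified these effects differently.

```python
def binaural_interaction(left_monotic, right_monotic, dichotic):
    """Summed monotic responses minus the dichotic response.

    A positive value indicates sub-additivity: the dichotic response is
    smaller than the sum of the two monotic responses.
    """
    return (left_monotic + right_monotic) - dichotic


def laterality_index(contralateral, ipsilateral):
    """Normalized contralateral dominance.

    Ranges from -1 to 1 for non-negative inputs with a positive sum;
    positive values indicate a stronger contralateral response.
    """
    return (contralateral - ipsilateral) / (contralateral + ipsilateral)


# Hypothetical response amplitudes in arbitrary units (not measured data).
interaction = binaural_interaction(left_monotic=1.0, right_monotic=1.25, dichotic=1.5)
dominance = laterality_index(contralateral=1.5, ipsilateral=0.5)
```

With these illustrative numbers the interaction term is positive (sub-additive) and the laterality index is positive (contralateral dominance), which is the qualitative pattern described in the text.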

Competition between inputs from the ipsi- and contralateral ear has been observed in the context of dichotic listening tasks, where participants are requested to attend to both ears receiving task-relevant streams and report the stimulus that was best heard. However, because dichotic stimulation typically involves multiple stimulus streams, attention forms an important confounding factor. Attention is known to influence auditory information processing (Jäncke et al., 1999; Petkov et al., 2004; Fritz et al., 2005; Rinne et al., 2005, 2009; Polley et al., 2006), and it can modulate neural responses in a "top-down" fashion (Kastner et al., 1999; Kastner, 2000; Fu et al., 2001). A common characteristic among some dichotic listening experiments is that subjects distribute their attention across the two presented streams (*divided attention*) and, when required, they are free to report from any ear (*non-forced attention*). Giving subjects the freedom to choose where to attend adds variability to these experiments: attentional shifts between ears are likely to occur and may interfere with lateralization effects related to "bottom-up" acoustic cues. In electroencephalography (EEG) research the amplitude of the N1 component of the auditory event-related potential (ERP) that is evoked by the auditory stimuli is larger when the stimuli are attended than when the stimuli are unattended (Picton et al., 1971; Hillyard et al., 1973; Woldorff et al., 1993), but only if subjects are able to sustain their attention to the relevant stimuli (Donald and Young, 1982). In summary, it is unclear if the competition arising between the contralateral and ipsilateral pathways results from a bottom-up acoustic process, a top-down cognitive and attentional mechanism, or both.

In order to elucidate this, a change to the dichotic listening task can be introduced (Bryden et al., 1983) by forcing subjects to focus on the information presented to one of the two ears (*focused attention*) according to a provided instruction (*forced attention*). Thus, direction of attention becomes a controlled parameter in the experiment. Knowing a priori which task needs to be performed enables a person to focus on particular modalities, stimuli or stimulus features. Focusing on a task introduces a bias toward the stimuli or modalities that are relevant for task performance. However, the presence of task-irrelevant distracting stimuli can cause interference, which can result in unintended shifts of attention and consequently decreased performance or increased reaction times (Berti and Schröger, 2001). Recent EEG studies showed that the N1 amplitude is reduced in the presence of competing task-irrelevant auditory distractions presented to an unattended ear, when attention is directed to a task-relevant stream simultaneously presented to the other ear (Ahveninen et al., 2011; Ponjavic-Conte et al., 2012). This suggests that the presence of distractions interferes with top-down attentional enhancement of task-relevant stimuli.

A number of studies investigating the neural processing of task-irrelevant unattended stimuli showed that it may involve early sensory or later cognitive stages (Berti and Schröger, 2001; Sætrevik and Hugdahl, 2007; Sætrevik and Specht, 2009; Sabri et al., 2013). A recent study introduced a modified version of the dichotic listening paradigm with attentional instruction in which the relative intensity of the presented stimuli in both ears was varied (Westerhausen et al., 2009, 2010). This allowed not only the manipulation of a top-down cognitive cue (the instruction which ear to attend) but also a bottom-up acoustic cue (the interaural level difference). This study found that bottom-up and top-down mechanisms do not act independently. The authors identified two networks responsible for the interaction of the two different processes—a medial-lateral frontal cognitive control and a fronto-parietal attention control network. Moreover, in agreement with other studies (Barch et al., 1997; Duncan and Owen, 2000), they showed increases in the activations in frontal and parietal areas known to be involved in control of attention, indicating that degradation of the sensory input increases task difficulty that can be compensated with increased attention. However, interestingly, the study by Westerhausen et al. (2010) did not reveal any changes in activation in the auditory cortex (AC) nor a significant effect of stimulus manipulation.

The nature and mechanisms underlying the interactions between contralateral and ipsilateral auditory pathways remain an open question. In particular it is not known how these interactions change in the presence of differently attended or unattended stimulus streams. Research focusing on the role of task-irrelevant stimuli in auditory processing can be particularly relevant to increase our understanding of attentional disorders such as ADHD (e.g., Cherkasova and Hechtman, 2009), and conditions like tinnitus, also known as 'ringing in the ears', where subjects perceive sounds unrelated to their acoustical environment (e.g., Roberts et al., 2013).

In the present study, we used a forced attention dichotic listening task and varied the instruction and the task-irrelevant unattended stimulus identity, while maintaining an identical attended stimulus stream. This enabled us to modulate the top-down attentional processing and the bottom-up acoustic responses in relation to the processing of unattended stimuli. We used fMRI to test the hypothesis that unattended stimuli are essentially processed in bottom-up fashion, without top-down enhancement.

## **METHODS**

#### **SUBJECTS**

Twenty-one healthy subjects (11 female, 2 left-handed: one female and one male) aged between 20 and 61 (mean 40.4 ± 11.1 *SD*) years were recruited through advertising. All subjects reported normal hearing, which was verified through standard pure tone audiometry. Averaged over both ears, mean thresholds across octave frequencies from 0.25 to 2 kHz equaled 6.3 ± 7.0 dB HL. All subjects had normal, or corrected-to-normal, vision. Each subject gave written informed consent in approved accordance with the guidelines of the Medical Ethical Committee of the University Medical Center Groningen in The Netherlands. This work is part of a larger study in which subjects participated on two separate days. The present report concerns one of the two 1 h neuroimaging sessions, which was preceded by an instruction and practice session of approximately half an hour.

#### **TASK AND STIMULI**

The stimuli that were used in the neuroimaging session were letters (8 consonants: "L," "T," "R," "C," "H," "K," "S," "Q") spoken by a Dutch speaker as consonant-vowel or vowel-consonant utterances (/εl/, /te:/, /εr/, /se:/, /ha:/, /ka:/, /εs/, and /ky/, respectively). These were presented at a fixed rate of 1 Hz through MR-compatible headphones (MR Confon GmbH, Magdeburg, Germany; Baumgart et al., 1998).

Subjects performed an auditory one-back task in which a task-relevant stream was presented in either the left or the right ear and, at the same time, a task-irrelevant stream was presented in the other ear. The two streams were spoken by different talkers, a female voice for the task-relevant stream and a male voice for the task-irrelevant stream. Subjects were required to attend to the task-relevant stream, compare consecutive stimuli, and press, at every stimulus presentation, one button if the stimuli were the same (i.e., a target), and a different button if the stimuli were different. Target stimuli were present at 30% probability. Subjects were instructed to answer as quickly and accurately as possible. All subjects' button presses were recorded (**Figure 1**).

The stimuli that were presented in the task-irrelevant stream could either consist of the same letter as that in the target stream, a different letter, or it could consist of something different from a letter; in the latter case the competing stimuli comprised bird song syllables (Joly et al., 2012). The length of each of the auditory stimuli (letter and non-letter) ranged from 350 to 450 ms.

During the dichotic conditions the left and right ear stimulus onset was synchronous. In addition, two monotic control conditions were included in which only one stimulus stream was presented. In one, the subjects performed the one-back task in the absence of a task-irrelevant stream; the other was a passive listening condition. This resulted in a total of five different conditions: *Concordant*, *Discordant*, *Non-Letter*, *No-Distractor*, and *No-Task*.


The neuroimaging session comprised six runs of approximately 7 min each. Before each run subjects were orally instructed to pay attention to the target stream in either the left (L) or the right (R) ear. Stimuli were presented in a block design. Each run comprised 10 blocks, two of each condition. Each block started with a 2-s visual instruction informing the subjects whether they had to perform the one-back task (for *Concordant*, *Discordant*, *Non-Letter* and *No-Distractor*) or not (for *No-Task*), followed by 23 stimulus presentations. Consecutive blocks were separated by 15 s during which no stimuli were presented. The order of the runs, conditions and stimuli was randomized. Subjects were instructed not to close their eyes (except for blinking) and fixate on a white dot on a screen during all runs (**Figure 1**).


The percentage of correct responses to stimulus presentations (number of correct "same" or "different" responses divided by the number of trials) was determined as a measure of subjects' performance. In order to avoid any masking effects of scanner noise we discarded all trials that coincided with acquisitions. Subjects' responses were considered to belong to a stimulus if a button was pressed between 100 and 1100 ms after its onset of presentation.
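The scoring rule described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name, the trial and press data structures, and the example values are all hypothetical. It also assumes a half-open response window (100 ms inclusive to 1100 ms exclusive), that only the first press inside the window counts, and that a trial without a press counts as incorrect; the text does not specify these details.

```python
def percent_correct(trials, presses, window_ms=(100, 1100)):
    """Percentage of correct responses over all scorable trials.

    trials  -- list of dicts with keys 'onset_ms', 'expected', 'masked'
               ('masked' marks trials that coincided with an acquisition)
    presses -- list of (time_ms, button) tuples
    """
    lo, hi = window_ms
    ordered = sorted(presses)  # earliest press first
    n_scored = 0
    n_correct = 0
    for trial in trials:
        if trial["masked"]:
            continue  # discarded: trial coincided with scanner noise
        n_scored += 1
        # Buttons pressed inside this trial's response window (assumed half-open).
        in_window = [button for time_ms, button in ordered
                     if lo <= time_ms - trial["onset_ms"] < hi]
        # Assumption: the first press in the window is the response.
        if in_window and in_window[0] == trial["expected"]:
            n_correct += 1
    if n_scored == 0:
        raise ValueError("no scorable trials")
    return 100.0 * n_correct / n_scored


# Hypothetical example: stimuli at 1 Hz, third trial masked by an acquisition.
trials = [
    {"onset_ms": 0, "expected": "different", "masked": False},
    {"onset_ms": 1000, "expected": "same", "masked": False},
    {"onset_ms": 2000, "expected": "different", "masked": True},
    {"onset_ms": 3000, "expected": "different", "masked": False},
]
presses = [(450, "different"), (1600, "different"),
           (2500, "different"), (4200, "different")]
score = percent_correct(trials, presses)
print(f"{score:.1f}% correct")
```

With a 1 Hz presentation rate, the half-open window ensures that a press is never attributed to two consecutive stimuli.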

Performance was analyzed by means of a Two-Way repeated measures analysis of variance (ANOVA) with factors for attended ear (2 levels: L, R) and condition (4 levels: *Concordant*, *Discordant*, *Non-Letter*, *No-Distractor*; *No-Task* was excluded because it did not produce behavioral results).

#### **FUNCTIONAL MRI**

Neuroimaging was performed using a Philips Intera 3-Tesla MR system, equipped with an 8-channel phased-array (SENSE) head coil, at the Neuroimaging Center (NiC) in Groningen. An anatomical T1-weighted image was acquired for each subject before the functional imaging acquisition. Blood oxygenation level dependent (BOLD) images were acquired using a silent sparse sampling paradigm (TR = 13 s, TA = 2.0 s, which resulted in a silent interval of 11 s) to avoid interference from acoustic scanner noise (Hall et al., 1999). For each of the six runs, 32 dynamic T2∗-sensitive echo planar imaging volume acquisitions were collected (TE = 22 ms, FOV = 192 × 192 × 144 mm<sup>3</sup>, 64 × 62 × 48 matrix). All subjects wore earplugs to attenuate MRI-related gradient-induced noise.
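The timing figures quoted in the task and acquisition descriptions can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text. The one assumption is that the 15 s pauses occur only between consecutive blocks (nine pauses per run); the variable names are illustrative.

```python
# Values quoted in the text.
TR_S = 13.0              # repetition time
TA_S = 2.0               # acquisition time
VOLUMES_PER_RUN = 32
BLOCKS_PER_RUN = 10
INSTRUCTION_S = 2.0      # visual instruction at the start of each block
STIMULI_PER_BLOCK = 23
STIMULUS_RATE_HZ = 1.0
GAP_S = 15.0             # silent pause between consecutive blocks

silent_interval_s = TR_S - TA_S
run_duration_s = VOLUMES_PER_RUN * TR_S
block_s = INSTRUCTION_S + STIMULI_PER_BLOCK / STIMULUS_RATE_HZ

# Assumption: pauses occur only between blocks, not before the first
# or after the last block of a run.
paradigm_s = BLOCKS_PER_RUN * block_s + (BLOCKS_PER_RUN - 1) * GAP_S

print(f"silent interval per TR: {silent_interval_s:.0f} s")
print(f"run duration: {run_duration_s:.0f} s ({run_duration_s / 60:.1f} min)")
print(f"stimulation paradigm per run: {paradigm_s:.0f} s")
```

Under these assumptions the paradigm fits within the acquisition time of a run, which is consistent with the stated run length of approximately 7 min.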

Data were preprocessed with the Statistical Parametric Mapping software package (SPM8, FIL Wellcome Trust Centre for Neuroimaging, London, UK) running in MATLAB® (Natick, Massachusetts: The MathWorks Inc.). The first dynamical scan in each run was used to trigger sound presentation but was excluded from the analyses due to lack of magnetization equilibrium. Each subject's data were realigned, the anatomical images were coregistered to the functional images, all images were normalized into Montreal Neurological Institute (MNI) stereotaxic space and smoothed using an isotropic 5-mm full-width at half-maximum Gaussian kernel. A logarithmic transformation was applied to express all voxels' signals in units of percentage signal change relative to the mean. A general linear model (GLM) was constructed for each subject that included 10 regressors modeling all experimental conditions (2 ears × 5 tasks), six regressors containing the estimated motion parameters modeling residual motion effects, and four regressors for each run describing a 0th to 3rd order Legendre polynomial modeling baseline and scanner drift effects. All statistical parametric maps resulting from constructed contrasts were thresholded at *p* < 0.05, corrected for family-wise errors (FWE), unless stated otherwise.
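To illustrate the drift terms in the GLM, the following sketch generates 0th to 3rd order Legendre polynomials over the scans of one run using the standard three-term recurrence. It is not the authors' SPM8/MATLAB code. The function name is hypothetical, and the choice of 31 scans per run (32 acquired minus the excluded first scan) and the reading of "six motion regressors" as six in total rather than per run are assumptions.

```python
def legendre_drift(n_scans, max_order=3):
    """Return Legendre polynomials P0..P_max_order sampled on n_scans
    points spread evenly over [-1, 1] (one list per polynomial)."""
    if n_scans < 2:
        raise ValueError("need at least two scans")
    xs = [-1.0 + 2.0 * i / (n_scans - 1) for i in range(n_scans)]
    columns = [[1.0] * n_scans, list(xs)]  # P0 = 1, P1 = x
    for n in range(1, max_order):
        prev, cur = columns[n - 1], columns[n]
        # Bonnet recurrence: (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}
        columns.append([((2 * n + 1) * x * c - n * p) / (n + 1)
                        for x, c, p in zip(xs, cur, prev)])
    return columns[:max_order + 1]


# Assumption: 31 analyzed scans per run (32 acquired, first one excluded).
drift = legendre_drift(31, max_order=3)

# Column count of the design matrix under the stated reading of the text.
N_RUNS = 6
n_condition = 2 * 5             # 2 ears x 5 tasks
n_motion = 6                    # assumed: six motion regressors in total
n_drift = len(drift) * N_RUNS   # four drift terms for each run
n_columns = n_condition + n_motion + n_drift
print(n_columns)
```

The constant term (P0) models the baseline of each run, while the higher orders capture slow scanner drift; because the terms are defined per run, they do not mix signal levels across runs.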

Anatomically defined regions of interest (ROI) were obtained from the WFU\_PickAtlas software toolbox (Lancaster et al., 1997, 2000; Maldjian et al., 2003). All the areas from the Brodmann area (BA) atlas based on the Talairach Daemon database were used as ROIs. In addition, the following sensory ROIs were defined and separated into subvolumes in the left and right hemisphere for further analysis: primary auditory cortex (PAC: L-BA41+42 and R-BA41+42), secondary auditory cortex (SAC: L-BA22 and R-BA22), primary visual cortex (PVC: L-BA17 and R-BA17) and secondary visual cortex (SVC: L-BA19 and R-BA19).

For the aforementioned ROIs, the estimated regression coefficients were averaged across all voxels. Effects of interest were assessed by means of a Two-Way repeated-measures ANOVA comprising a 2-level factor ear (L or R) and a 5-level factor condition (*Concordant*, *Discordant*, *Non-Letter*, *No-Distractor* or *No-Task*). In the cases where the assumption of sphericity was violated the degrees of freedom were adjusted using Greenhouse-Geisser correction. *Post-hoc* analysis was performed using pairwise comparisons between conditions. A correction for multiple comparisons was performed using False Discovery Rate (FDR) criteria, controlled at 0.05 level. The following families of null hypotheses were assessed by means of Student *t*-tests: stimulus effect (*No-Task* vs. baseline); task effect (*No-Distractor* vs. *No-Task*); distractor effect (*Concordant* vs. *No-Distractor*; *Discordant* vs. *No-Distractor*; *Non-Letter* vs. *No-Distractor*); distractor comparison (*Concordant* vs. *Discordant*; *Concordant* vs. *Non-Letter*; *Discordant* vs. *Non-Letter*); instruction effect (L-*No-Task* vs. R-*No-Task*; L-*No-Distractor* vs. R-*No-Distractor*; L-*Concordant* vs. R-*Concordant*; L-*Discordant* vs. R-*Discordant*; L-*Non-Letter* vs. R-*Non-Letter*).
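As an illustration of FDR control within one family of comparisons, the sketch below implements the Benjamini-Hochberg step-up procedure. The text does not state which FDR procedure was applied, so the choice of Benjamini-Hochberg is an assumption, and the *p*-values in the example are hypothetical rather than taken from the study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    when the false discovery rate is controlled at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= q * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for one family of eight comparisons.
family = [0.205, 0.001, 0.074, 0.008, 0.039, 0.060, 0.041, 0.042]
decisions = benjamini_hochberg(family, q=0.05)
print(decisions)
```

Note that several uncorrected values below 0.05 in this example are not rejected, which is the intended effect of correcting within a family.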

## **RESULTS**

#### **BEHAVIOR**

As shown in **Figure 2**, task performance was high for all subjects. The mean percentage of correct responses over all subjects was above 85% for each combination of ear and condition. The ANOVA did not reveal any significant dependence upon the factor ear (*p* = 0.949). However, for the factor condition there was a significant dependence (*p* = 5 × 10<sup>−6</sup>). Correct responses to the discordant letter condition were lowest, followed by the non-letter and concordant conditions. Since the interaction between both factors was not significant (*p* = 0.93), results were averaged over the non-significant factor ear and a paired Wilcoxon signed-rank test was calculated for all pairs of conditions, and FDR corrected for multiple comparisons (controlled at 0.05 level). Significant differences were found for all three pairs involving the discordant letter condition: *No-Distractor* vs. *Discordant* (*p* = 2.2 × 10<sup>−3</sup>), *Concordant* vs. *Discordant* (*p* = 4.0 × 10<sup>−5</sup>), *Non-Letter* vs. *Discordant* (*p* = 3.5 × 10<sup>−3</sup>). The *Concordant* vs. *Non-Letter*, *No-Distractor* vs. *Non-Letter* and *No-Distractor* vs. *Concordant* comparisons did not reach statistical significance (*p* > 0.1).

#### **fMRI CONTRASTS**

The significance of the (de)activation to all 10 conditions relative to baseline according to an omnibus *F-*test is shown in **Figure 3**. Activation was found in widespread areas of the brain: sensory auditory and visual cortices in the temporal and occipital lobes, motor and pre-motor areas, as well as regions in the frontal lobe.

Group-level activation to passive listening (*No-Task*), contrasted against baseline (silence), for the L and R ear presentations is shown in **Figure 4A**. For each contrast there was bilateral activation in the auditory cortex that was stronger in the hemisphere contralateral to the stimulated ear. In addition, bilateral signal decreases relative to baseline were observed in the medial visual cortex in the calcarine sulcus.

The effect of task performance, addressed through a *No-Distractor* vs. *No-Task* contrast, is shown in **Figure 4B**. When subjects were instructed to attend either the left (L) or the right (R) ear, activation was found in the supplementary motor area and cerebellum.

To study the effect of the task-irrelevant stimuli, contrasts were computed for the three respective conditions (*Concordant*, *Discordant*, and *Non-Letter*) against the condition without any task-irrelevant stream (*No-Distractor*), for the left and right ear presentations separately. **Figure 4C** shows the mean effect of all three conditions when presenting task-irrelevant information to the right ear when attending the left ear, or to the left ear when attending the right ear. Unilateral activation was observed in the auditory cortex contralateral to the irrelevant stimuli (that is, ipsilateral to the attended ear). **Figure 4D** further shows the contrasts involving each of these three conditions separately. As before, the effect was present on the side contralateral to the presentation of the task-irrelevant stimuli. The most extensive activation was observed in the contrast involving the discordant letters (i.e., *Discordant* vs. *No-Distractor*). The non-letter stimuli resulted in more confined activation (i.e., *Non-Letter* vs. *No-Distractor*), although activation still peaked in similar locations. Finally, the concordant letters (i.e., *Concordant* vs. *No-Distractor*) evoked the least extensive activation (or no significant effect at all when attending the right ear).

in right and left auditory cortices. Visual and motor-related areas were also active as was a small region in frontal cortex. **(A)** A "glass brain" image,

Images were thresholded at a confidence level *p* < 0.05 (FWE-corrected) and cluster size *k* > 100 voxels. A, anterior; L, left; P, posterior; R, right.

(*No-Task*) relative to the silent baseline for left (L) and right (R) ear stimulus presentations. **(B)** Task related activation (deactivation) derived from the contrast *No-Distractor* vs. *No-Task*. **(C)** Activation (deactivation) for the average distractor effect relative to task condition [(*Concordant* + *Discordant* + *Non-Letter*)/3 vs. *No-Distractor*] when attending the left (L) and right (R) ear. **(D)** Activation (deactivation) for each distractor effect relative to task condition (*Concordant* vs. *No-Distractor*; *Discordant* vs. *No-Distractor*; *Non-Letter* vs. *No-Distractor*) when attending the left ear (L) and right ear (R). All images were thresholded at a confidence level *p* < 0.05 (FWE-corrected). Activations (deactivations) of nine adjacent slices were overlaid over the mean anatomical image. Hot (cold) colors refer to increased (decreased) signals. A, anterior; L, left; P, posterior; R, right.

In order to assess whether the apparent differences in activation patterns in **Figure 4C** and **Figure 4A** were significant, a comparison between these two contrasts was made. The resulting contrast [(*Concordant* + *Discordant* + *Non-Letter*)/3 − *No-Distractor*] vs. *No-Task* is shown in **Figure 5**. When comparing the activation evoked by a task-irrelevant stream in the left ear (when attending to the right ear in a task condition *with* distractors) to the activation evoked by a single task-irrelevant stream in the same left ear (during passive listening *without* distractors), bilateral decreased responses in the auditory cortex were observed. A similarly bilateral but less extensive decreased response pattern was observed when comparing the activation evoked by a task-irrelevant stream in the right ear (when attending to the left ear) to the activation evoked by a single task-irrelevant stream in the same right ear (during passive listening). That is, bilateral auditory cortex responded significantly less strongly to a task-irrelevant stream that was presented to one ear in the presence of a task-relevant stream in the other ear than to a task-irrelevant stream that was presented alone in one ear. In other words, the activations evoked by the task-relevant and task-irrelevant streams combine sub-additively in both hemispheres. **Figure 5** further shows activation of the primary visual cortex in the calcarine sulcus, which was significant only for the comparison concerning the activation evoked by task-irrelevant streams presented to the right ear.
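The additivity contrast [(*Concordant* + *Discordant* + *Non-Letter*)/3 − *No-Distractor*] vs. *No-Task* can be written as a weight vector over the ten condition regressors. The sketch below is one possible construction, not the authors' implementation: the regressor ordering, function names, and example coefficients are hypothetical. It assumes that the ear label denotes the attended ear for task conditions and the stimulated ear for *No-Task*, and that each regression coefficient is already expressed relative to the silent baseline.

```python
CONDITIONS = ["Concordant", "Discordant", "Non-Letter", "No-Distractor", "No-Task"]
EARS = ["L", "R"]
# Hypothetical regressor order: all conditions for L, then all for R.
REGRESSORS = [(ear, cond) for ear in EARS for cond in CONDITIONS]
DICHOTIC = ["Concordant", "Discordant", "Non-Letter"]


def additivity_contrast(distractor_ear):
    """Weights testing whether the response to a task-irrelevant stream in
    `distractor_ear` differs between dichotic and passive monotic listening."""
    attended_ear = "R" if distractor_ear == "L" else "L"
    weights = {reg: 0.0 for reg in REGRESSORS}
    for cond in DICHOTIC:
        weights[(attended_ear, cond)] = 1.0 / 3.0   # mean dichotic response
    weights[(attended_ear, "No-Distractor")] = -1.0  # remove task-relevant part
    weights[(distractor_ear, "No-Task")] = -1.0      # remove passive response
    return [weights[reg] for reg in REGRESSORS]


def apply_contrast(weights, betas):
    """Weighted sum of regression coefficients (same order as REGRESSORS)."""
    return sum(w * b for w, b in zip(weights, betas))


# Hypothetical coefficients (percent signal change relative to baseline).
example = {"Concordant": 1.2, "Discordant": 1.2, "Non-Letter": 1.2,
           "No-Distractor": 1.0, "No-Task": 0.6}
betas = [example[cond] for _, cond in REGRESSORS]
w_left = additivity_contrast("L")
value = apply_contrast(w_left, betas)
print(round(value, 3))  # a negative value indicates sub-additivity
```

Under this construction the weights sum to −1 rather than 0, because the silent baseline is implicit in each coefficient. A negative contrast value means that the dichotic response is smaller than the sum of the two monotic responses.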

#### **ROIs**

All Brodmann areas that showed significance for any factor in the ANOVA are presented in **Table 1**. The pattern of brain areas that were responsive to the stimuli and task agreed well with the pattern identified by the voxel-wise omnibus test in **Figure 3**. Interactions between ear and condition were never significant.

Subsequently, the left and right primary and secondary auditory and visual cortices were analyzed further. The BOLD percentage signal change in these sensory ROIs is presented in **Figure 6** by means of barplots indicating the mean activation for each combination of ear and condition; ANOVA results are listed in **Table 1** as well.

and on the right ear. Images were thresholded at a confidence level *p* < 0.05 (FWE-corrected). Supra-additivity (sub-additivity) of nine adjacent slices was overlaid over the mean anatomical image over all subjects. Hot (cold) colors refer to increased (decreased) signals. L, left; R, right.

In general, activity in the left PAC was larger than activity in the right PAC. Left PAC and SAC exhibited a highly significant effect of the factor condition; the effect of ear was not significant. For the right PAC, both the effects of ear and condition were significant. For the right SAC there was a significant effect of condition but no significant effect of ear.

We subsequently considered various pairwise comparisons of interest (**Table 2**). Consistent with the contrasts in **Figure 4A**, *post-hoc* analysis revealed that for both bilateral PAC and SAC, and both attending L and R, activation to the *No-Task* condition was significantly different from baseline. Furthermore, and also consistent with the contrasts in **Figure 4A**, in the passive monotic condition (*No-Task*), right PAC activation was stronger when stimuli were presented to the left ear than when presentation was to the right ear (L-*No-Task* vs. R-*No-Task*); this comparison was not significant for left PAC, although a similar trend toward a contralateral preference existed. During the monotic condition with task performance (*No-Distractor*) right PAC activations were stronger when attending the left ear than when attending the right ear (L-*No-Distractor* vs. R-*No-Distractor*). This was only a trend for left PAC. Left and right PAC/SAC activations exhibited a similar pattern: activation during the monotic active task condition (*No-Distractor*) significantly increased relative to passive listening conditions (*No-Task*).

*All Brodmann areas (BA) were tested; areas that are not listed showed no significant effect for either factor. The sensory ROIs that were defined for further analysis are all included as well.* **\****p < 0.05;* **\*\****p < 0.01;* **\*\*\****p < 0.001; n.s., non-significant. BA labels according to Strotzer (2009). <sup>a</sup>Greenhouse-Geisser corrected significance of the two factors is reported for the sensory ROIs.*

A further increase of signal occurred due to the presence of an irrelevant stream (*Concordant*, *Discordant*, *Non-Letter*) relative to the condition without such a stream (*No-Distractor*). This effect was strongest and always significant in PAC on the side contralateral to the task-irrelevant stream (thus, ipsilateral to the attended ear). This increase was also significant for the left PAC when attending the right ear but only for *No-Distractor* vs. *Discordant* and *No-Distractor* vs. *Non-Letter* comparisons. Left SAC partially exhibited the same pattern with the following comparisons being significant: L-*No-Distractor* vs. L-*Discordant* and R-*No-Distractor* vs. R-*Discordant*. For right SAC none of these comparisons reached significance.

Comparisons between the dichotic conditions (*Concordant* vs. *Discordant*; *Concordant* vs. *Non-Letter*; and *Discordant* vs. *Non-Letter*) did not reach significance, although the figures show a trend with the most difficult condition (*Discordant*) systematically resulting in the largest mean activation.

The ROIs in the visual cortex showed negative BOLD responses relative to baseline (**Figures 6C,D**). Left and right primary visual cortex (PVC) showed no significant effect of either condition or ear (**Figure 6C**). Left and right SVC both showed a significant effect of condition but no significant effect of ear (**Figure 6D**).

*Post-hoc* analyses revealed, for left and right PVC and SVC, a significant difference between the activation during passive listening conditions and the baseline, except for right SVC, which exhibited only a trend. In contrast with PVC, SVC showed stronger negative responses relative to baseline for each of the four conditions requiring task performance when compared to the *No-Task* condition. No further comparisons between conditions for left and right SVC reached significance. Comparisons between conditions for left and right PVC were not significant.

**Table 2 | Significance of the contrasts of the sensory ROIs, for the Attend Left (L) and Attend Right (R) experimental conditions.**


*Contrasts for the selected families of null-hypotheses were assessed by means of Student t-tests and corrected for multiple comparisons using False Discovery Rate (FDR) criteria, controlled at 0.05 level. Additionally, comparisons were made between corresponding stimuli presented to the left or right ear; these are not listed in the Table, but see the main text for significant results.* **\****p < 0.05;* **\*\****p < 0.01;* **\*\*\****p < 0.001; n.s., non-significant.*

## **DISCUSSION**

This study provides a comparison between fMRI BOLD signal changes resulting from monotic unattended stimulation (*No-Task*), monotic attended stimulation (*No-Distractor*), and dichotic stimulation with one attended and one unattended ear simultaneously (*Concordant*, *Discordant*, *Non-Letter*). Widespread areas of the brain were shown to be active for all task conditions. Consistent with previous work, strong BOLD signal changes resulting from monotic stimulation were observed in bilateral AC, responses being largest on the side contralateral to stimulation. We showed that this was the case for attended task-relevant sound stimuli as well as for unattended task-irrelevant sound stimuli, whether accompanied by task-relevant stimulation of the other ear or not. Moreover, we found that the preferred contralateral activation to task-irrelevant stimuli was strongest for the condition involving stimuli that interfered most strongly with task performance. The activations to two distinct unattended streams were contrasted directly: one from a monotic passive listening condition and the other from a dichotic condition in the presence of a task-relevant stream to the other ear. The latter showed weaker activation in the bilateral AC than the former. Finally, we showed that passive listening is enough to deactivate primary and secondary visual cortex, suggesting that cross-modal inhibition does not require task performance.

#### **ACTIVATION TO MONOTIC STIMULI**

The present study comprised five distinct conditions, per attended ear. The passive listening condition (No-Task) serves as a model for a situation in which a subject is exposed to environmental stimuli but not specifically attending to them since no task is required to be performed. Following monotic task-irrelevant stimulation (*No-Task*), both contralateral and ipsilateral AC were active. Yet, voxel-based analyses showed that the strongest activity was measured in the hemisphere contralateral to the stimulated ear. In agreement was the ROI-based analysis showing that PAC activation was stronger when the stimuli were presented to the contralateral ear. This was particularly significant for the right PAC, whereas left PAC exhibited only a trend. This is in agreement with previously reported results, both in animals (e.g., Rosenzweig, 1951; Hall and Goldstein, 1968; Mrsic-Flogel et al., 2005, 2006; Nelken et al., 2008; Werner-Reiss and Groh, 2008) and humans (e.g., Pantev et al., 1998; Alho et al., 1999; Fujiki et al., 2002; Jäncke et al., 2002; Petkov et al., 2004; Langers et al., 2005; Della Penna et al., 2007; Woods et al., 2009).

Although the stimuli presented during the passive condition were task-irrelevant, we cannot completely exclude that attention may still have been drawn to them. However, given the fast stimulus presentation rate (Alain and Izenberg, 2003), it seems reasonable to assert that a state of sustained and focused attention was absent during this condition, or at least weaker if compared to the other conditions that required task performance. After the session the majority of subjects reported that they had indeed not been performing the task when not instructed to. Furthermore, supporting this view were the ROI BOLD responses: the No-Task condition exhibited weaker activation in the auditory cortices (PAC and SAC) than the No-Distractor task condition. Such task-related attentional enhancement of activity relative to No-Task, during the presentation of the same stimuli, has been previously reported (Grady et al., 1997; Jäncke et al., 1999; Hall et al., 2000).

Conversely, for the four task conditions (No-Distractor, Concordant, Discordant and Non-Letter) focused attention was present. This was firstly confirmed by the behavioral results showing that all subjects performed well above chance level, suggesting that all subjects were engaged during task conditions. Secondly, the No-Distractor vs. No-Task contrast showed activation in motor and premotor cortices, supplementary motor area (SMA) and pre-SMA, and cerebellum. These areas are known to be active during task performance that comprises sensory, cognitive and motor processing (Picard and Strick, 2001; Jäncke et al., 2003; Salmi et al., 2009; Baumann and Mattingley, 2010). In summary, our observations suggest that during passive listening there was no task-related activity and subjects were likely not internally performing the task, while during task conditions subjects were attentively engaged in the one-back task.

#### **ACTIVATION TO DICHOTIC STIMULI**

During three of the five conditions (*Concordant*, *Discordant* and *Non-Letter*) subjects were presented with both a task-relevant and a task-irrelevant stimulus stream. The introduction of additional task-irrelevant distracting stimuli increased the difficulty of the task. Yet, subjects' mean performance remained above 85% for all conditions with few subjects scoring above 90% for all conditions. Ceiling effects may be present suggesting that the task was not difficult enough, allowing the subjects not only to attend the stimuli in order to perform the task but also to attend the distractors. Still, we were able to show significant differences in task performance between various pairs of conditions (Discordant vs. Concordant, Discordant vs. Non-Letter, Discordant vs. No-Distractor). Thus, we argue that the various distractor types interfered differently with task performance.

Responses resulting from contrasting all distractor conditions against the No-Distractor condition were not found bilaterally (**Figures 4C,D**). These were strongest in the hemisphere contralateral to the additional distractor stream, ipsilateral to the attended ear. Also, in the ROI analyses, the addition of a second task-irrelevant stream did not produce significantly stronger AC responses on the side contralateral to the attended ear, with the exception of the left PAC and SAC. These areas additionally showed significant differences between the No-Distractor condition and some of the dichotic conditions. These results were consistent with the stronger contralateral responses in all monotic conditions. There was an acoustic response increasing the contralateral hemispheric responses due to the task-irrelevant stream presentation, which may be explained by the fact that the No-Distractor condition was monotic while the distractor conditions were dichotic. AC BOLD responses to dichotic stimulus presentations have been reported to be stronger than those to monotic presentations (Scheffler et al., 1998). Furthermore, the absence of a corresponding strong increase in ipsilateral responses may indicate that the already strong contralateral responses to the task-relevant stream could not be significantly elevated by adding a second task-irrelevant stream. This may be attributable to hemodynamic response saturation. BOLD fMRI is not directly sensitive to neural activity, but measures the increased blood flow that follows the increased metabolic demand of activated brain tissue. Because the achievable amount of vascular dilation is limited, BOLD responses tend to saturate at high levels. Such non-linearities would also express themselves as apparent suppression of evoked responses when baseline activity is elevated by the attended sound.

We found stronger activation in the left PAC when compared to the right PAC, in dichotic listening conditions and regardless of the side that is being attended. This can be related to the previously reported phenomenon known as "right ear advantage" (REA) for verbal stimuli in subjects showing left-lateralization for language processing (e.g., Kimura, 1961; Foundas et al., 2006; Della Penna et al., 2007). It has been shown that, behaviorally, attention plays an important role in dichotic listening (Kinsbourne, 1970; Bryden et al., 1983). Correspondingly, neuroimaging studies have shown that the level of activation in the auditory cortex depends on the direction of attention: selective attention directed to one ear increases activation in the auditory cortex contralateral to the attended ear (Alho et al., 1999, 2012; Jäncke et al., 2001, 2003). Thus, it would be expected that when directing attention to the left ear the corresponding contralateral responses, in the right hemisphere, would be strongest when compared to the ipsilateral responses, in the left hemisphere. These results are in agreement with previous research reporting left hemisphere preference for language processing (Damasio and Geschwind, 1984; Giraud et al., 2007).

Activations were different for different distractor types, although ROI-based analyses showed only a trend. Among the dichotic conditions, the strongest activation was measured for the most difficult condition, in both primary and secondary AC. One could argue that this is due to pure bottom-up effects related to the complexity of the auditory scene: two voices speaking different letters (in Discordant), or one voice and one non-voice (in Non-Letter), may require more acoustic processing to be disentangled than two voices speaking the same letter (in Concordant). However, even in the Concordant condition the two streams were spoken by different speakers, one male and one female. Although semantically the same, these were therefore acoustically very different. In particular, with regard to average pitch or spectra, the two Concordant streams differed about as much as the Discordant streams. More generally, for all three of our dichotic conditions the auditory scene consisted of two clearly distinguishable sound sources or auditory objects. This suggests that the differences in activation evoked by the various dichotic distractor conditions were not purely due to the required amount of low-level acoustic processing, but were affected by more high-level functions as well. This may have comprised increased attentional requirements in order to manage greater interference, in accordance with previously reported studies on attentional modulation in the AC (e.g., Jäncke et al., 1999; Petkov et al., 2004; Woods et al., 2009). Given these arguments we conclude that not only bottom-up acoustic mechanisms but also top-down attentional processing was present during dichotic presentations.

Previous work has discussed the influence of attentional load and task difficulty in stimulus processing. While some reported decreased responses with increased task demands (Lavie, 1995; Rees et al., 1997), others reported the opposite relationship (Fockert et al., 2001) or differentiated effects (Alain and Izenberg, 2003; Chait et al., 2012). Lavie et al. (2004) suggested the existence of two types of load: perceptual load and working memory load, with opposite effects (for a review see Lavie, 2005). Recent work (Sabri et al., 2013) showed that increased perceptual load in the attended ear correlates with decreased responses in the auditory cortex to task-irrelevant sounds. This is the opposite of the trend that we observe (increased responses for the most demanding conditions) and furthermore inconsistent with another recent study that did not show any modulatory effect (Murphy et al., 2013). However, our present paradigm differs from Lavie's model in an important regard: we did not vary the perceptual load of the attended stream itself, which remained unchanged over the whole experiment. Instead, we only changed the congruency or the category of the unattended task-irrelevant stream. We surmise that there was an indirect load change that happened through the interference of the different distractors. The most interfering distractor acted to increase the cognitive load of the condition. Thus, differences regarding the nature of the task used might be relevant, considering that Sabri et al. (2013) used a perceptual detection task while the current study required the participants to perform a cognitive control task, specifically a working memory task. Cognitive control of attentional processes is necessary for minimizing distractor interference, which is the case in the present experiment where one task-relevant stream competes for attention with another task-irrelevant stream.
The discrepancy between these studies may therefore be attributed to the observation that working memory load and perceptual load involve different perceptual and cognitive processes (Fockert et al., 2001; Lavie and Fockert, 2005; Dalton et al., 2009).

#### **SUPPRESSIVE BINAURAL INTERACTION**

To further understand task-irrelevant processing, we addressed the differences between responses to a monotic task-irrelevant stream and responses to a task-irrelevant stream in a dichotic stimulation during simultaneous presentation of a task-relevant stream. We were primarily interested in the neural responses due to the additional unattended distractor stream. To assess the response to an unattended distractor stream, we compared a dichotic condition with an attended and an unattended stream to a monotic condition with an attended stream alone. Additionally, we wished to assess whether the presence of the attended stream influences the measured response to the unattended stream. For this purpose, we also measured the neural response to an unattended monotic stream (*No-Task*) compared to a baseline without any streams. Given the responses to an unattended stream in the presence and absence of another attended stream, we could finally assess the interaction between both streams.

The monotic presentation resulted in stronger bilateral auditory cortex activation. Based on previous research it seems reasonable to expect that the two existing ipsilateral pathways are suppressed during dichotic listening, due to dichotic interaction. However, which pathway is being suppressed by the other cannot be distinguished from our results. We can, however, say that there is evidence of a suppressive interaction mechanism involving the contralateral and the ipsilateral pathways.

Suppressive binaural interaction was proposed in previous studies comparing left and right monotic with dichotic stimulations (e.g., Fujiki et al., 2002; Kaneko et al., 2003). Fujiki et al. (2002) reported suppression of the ipsilateral responses during dichotic stimulation when compared to monotic stimulation, in both hemispheres. The authors discussed this result in terms of existing inhibitory effects present during dichotic stimulation that lead to competition between auditory stimuli. We argue that a similar mechanism occurred in this experiment, which requires the processing of two distinct streams: the task-relevant stream, which is supposedly attended, and a task-irrelevant distractor stream that has to be ignored. For left and right presentations, only the contralateral responses to the additional presentation of a task-irrelevant stream (in the presence of a task-relevant stream in the other ear) showed significance, and not the ipsilateral responses to the same stimuli. This can be related to an increase of the ipsilateral responses to the attended stimuli. However, it also suggests that the ipsilateral response of the task-irrelevant stream was suppressed by the stronger contralateral attended task-relevant stream or by an active suppression mechanism of the task-irrelevant stimuli, in agreement with what was suggested in previous research (Alho et al., 1999). Thus, we cannot exclusively argue in favor of the existence of a suppression of the ipsilateral responses of the attended stream. This is an interesting finding which might be correlated with the previously mentioned EEG result showing that the N1 amplitude is reduced in the presence of competing task-irrelevant auditory distractions presented to an unattended ear, when attention is directed to a task-relevant stream simultaneously presented in the other ear (Ahveninen et al., 2011; Ponjavic-Conte et al., 2012).

Although we were limited in the number of conditions due to practical concerns, we concede that other conditions might have been of interest. For example, the inclusion of a passive dichotic listening condition (possibly comprising all three combinations of streams used in this study) would enable a comparison between activation to the presentation of two unattended streams (passive dichotic) with activation to one unattended stream in the presence of another attended stream (active dichotic). It would therefore allow the assessment of the effect of task-relevance on one stream, for instance through top-down attention. Given our primary focus on the unattended stream, we chose to include only the conditions that were required to make the assessments that we report on. We nevertheless feel that future studies including these other conditions constitute an important complement to the present work.

#### **VISUAL CORTEX RESPONSES**

Deactivation of SVC during passive listening relative to the baseline condition was not completely unexpected; PVC deactivation, however, was. Additionally, the ROI analyses showed that in comparison to PVC, SVC appeared to be more strongly affected by task performance: the No-Distractor vs. No-Task comparison was significantly different for SVC and not for PVC; during passive listening (No-Task) PVC deactivation was stronger than that in SVC. This suggests stronger task-related attentional influences in non-primary than in primary visual cortex, in agreement with previous studies (Hairston et al., 2008; Mozolic et al., 2008). Moreover, increased task difficulty, with the addition of distractors, did not produce any significant change compared to the active condition with no distractor (No-Distractor), which is different from what has previously been suggested for the SVC (Hairston et al., 2008). Hairston et al. (2008) employed an auditory temporal-order judgment task at different levels of difficulty that were adjusted for each individual's own threshold. This is considered to be a perceptually demanding task. Possibly, the present study employed an easier task requiring fewer attention-related resources and consequently producing a weaker task-difficulty modulatory effect. Additionally, as argued before, differences in the results obtained may reflect distinct neural processes that are task-related, since the present study used a cognitive working memory task (as opposed to a perceptual task).

Cross-modal inhibition has been reported in previous studies. In the context of unimodal stimulus presentations, several studies have shown decreased responses in sensory areas that are not classically considered to be relevant to the processing of the presented stimuli (Haxby et al., 1994; Zatorre et al., 1999; Laurienti et al., 2002; Johnson and Zatorre, 2005; Hairston et al., 2008; Salo et al., 2013), although others do not consistently present similar results (for a review see Shulman et al., 1997). In particular, decreased responses to unimodal auditory stimulation have been reported in visual areas, during active conditions requiring sustained auditory attention (Zatorre et al., 1999; Johnson and Zatorre, 2005; Hairston et al., 2008; Mozolic et al., 2008; Salo et al., 2013) and also, although less commonly reported, during passive stimulation (Laurienti et al., 2002; Johnson and Zatorre, 2005). The visual areas referred to were generally confined to higher processing regions (BA19). Interestingly, however, in the present study we show decreased responses not only in higher visual cortex (BA19) but also in the earlier visual processing region in the primary visual cortex (BA17).

We show decreased responses in the primary visual cortex during both auditory active (with or without distractor presence) and passive stimulation (without distractor presence), and with simultaneous increased responses in the auditory cortex. The existence of anatomical connections between auditory and visual areas has been reported before in nonhuman primates (Falchier et al., 2002; Rockland and Ojima, 2003; Clavagnier et al., 2004; Cappe and Barone, 2005). Recently, an interesting study has shown that activation of auditory cortex to passive sound exposure drives synaptic-inhibition in the primary visual cortex, through recruitment of local inhibitory circuitry (Iurilli et al., 2012). Our results for the primary visual cortex are in agreement with the existence of a functional relation between auditory and visual cortex that does not necessarily require attention, and strongly suggest that an automatic sensory processing mechanism occurs within the visual cortices, during acoustic stimulation. Since secondary visual areas seem to be more attentionally modulated than the primary visual, it can be speculated that deactivation of primary sensory areas triggers the (re)allocation of attentional resources within a modality, potentially through the involvement of supramodal areas like frontal and parietal cortices, for further use by the relevant cortices. Future research is necessary to better understand the mechanisms underlying cross-modal interactions.

## **ACKNOWLEDGMENTS**

The author Ana A. Amaral was supported by the research grant SFRH/BD/33945/2009 from the Fundação para a Ciência e a Tecnologia (FCT) funded through the Portuguese Ministério da Educação e Ciência (MEC), Portugal; Dave R. M. Langers was funded by VENI research grant 016.096.011 from the Netherlands Organisation for Scientific Research (NWO) and the Netherlands Organization for Health Research and Development (ZonMw). Additionally the authors would like to acknowledge the Fundação para a Ciência e Tecnologia (FCT) and the Heinsius Houbolt Foundation for the financial support. Conflict of interest: none to declare.

## **REFERENCES**


Damasio, A. R., and Geschwind, N. (1984). The neural basis of language. *Annu. Rev. Neurosci.* 7, 127–147. doi: 10.1146/annurev.ne.07.030184.001015


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 August 2013; accepted: 16 December 2013; published online: 27 December 2013.*

*Citation: Amaral AA and Langers DRM (2013) The relevance of task-irrelevant sounds: hemispheric lateralization and interactions with task-relevant streams. Front. Neurosci. 7:264. doi: 10.3389/fnins.2013.00264*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2013 Amaral and Langers. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Across-frequency behavioral estimates of the contribution of inner and outer hair cell dysfunction to individualized audiometric loss

## *Peter T. Johannesen<sup>1,2</sup>, Patricia Pérez-González<sup>1,2</sup> and Enrique A. Lopez-Poveda<sup>1,2,3</sup>\**

*<sup>1</sup> Auditory Computation and Psychoacoustics, Instituto de Neurociencias de Castilla y León, University of Salamanca, Salamanca, Spain*

*<sup>2</sup> Grupo de Audiología, Instituto de Investigación Biomédica de Salamanca, University of Salamanca, Salamanca, Spain*

*<sup>3</sup> Departamento de Cirugía, Facultad de Medicina, Universidad de Salamanca, Salamanca, Spain*

#### *Edited by:*

*Monica Munoz-Lopez, University of Castilla-La Mancha, Spain*

#### *Reviewed by:*

*Chris Plack, The University of Manchester, UK; Tim Juergens, Universität Oldenburg, Germany*

#### *\*Correspondence:*

*Enrique A. Lopez-Poveda, Instituto de Neurociencias de Castilla y León, Universidad de Salamanca, Calle Pintor Fernando Gallego 1, 37007 Salamanca, Spain e-mail: ealopezpoveda@usal.es*

Identifying the multiple contributors to the audiometric loss of a hearing-impaired (HI) listener at a particular frequency is becoming gradually more useful as new treatments are developed. Here, we infer the contribution of inner (IHC) and outer hair cell (OHC) dysfunction to the total audiometric loss in a sample of 68 hearing aid candidates with mild-to-severe sensorineural hearing loss, and for test frequencies of 0.5, 1, 2, 4, and 6 kHz. It was assumed that the audiometric loss (HLTOTAL) at each test frequency was due to a combination of cochlear gain loss, or OHC dysfunction (HLOHC), and inefficient IHC processes (HLIHC), all of them in decibels. HLOHC and HLIHC were estimated from cochlear I/O curves inferred psychoacoustically using the temporal masking curve (TMC) method. 325 I/O curves were measured and 59% of them showed a compression threshold (CT). The analysis of these I/O curves suggests that (1) HLOHC and HLIHC account on average for 60–70 and 30–40% of HLTOTAL, respectively; (2) these percentages are roughly constant across frequencies; (3) across-listener variability is large; (4) residual cochlear gain is negatively correlated with hearing loss while residual compression is not correlated with hearing loss. Altogether, the present results support the conclusions from earlier studies and extend them to a wider range of test frequencies and hearing-loss ranges. Twenty-four percent of I/O curves were linear and suggested total cochlear gain loss. The number of linear I/O curves increased gradually with increasing frequency. The remaining 17% of I/O curves suggested audiometric losses due mostly to IHC dysfunction and were more frequent at low (≤1 kHz) than at high frequencies. It is argued that in a majority of listeners, hearing loss is due to a common mechanism that concomitantly alters IHC and OHC function and that IHC processes may be more labile in the apex than in the base.

**Keywords: cochlear non-linearity, auditory masking, hearing aid, cochlear damage, hearing loss, hearing impairment**

## **INTRODUCTION**

Cochlear hearing loss occurs when absolute hearing thresholds for pure tones are higher than normal without signs of middle-ear or auditory neural pathology (Moore, 2007). In the healthy cochlea, inner hair cells (IHCs) transduce mechanical basilar membrane (BM) vibrations into nerve signals, while outer hair cells (OHCs) amplify BM responses to low-level sounds and are thus responsible for our high auditory sensitivity (Bacon et al., 2004). A reduction in the number of OHCs or lesions to the OHCs or associated structures can reduce the cochlear gain to low-level sounds and hence cause an audiometric loss. Similarly, a reduction in IHC count or lesions to the IHCs or their associated structures can increase the BM excitation required for detecting a signal, which may also cause an audiometric loss (Moore, 2007). Although it is not generally possible to establish a one-to-one correspondence between audiometric loss and the degree of physical IHC/OHC loss or injury (Chen and Fechter, 2003; Lopez-Poveda and Johannesen, 2012), it is reasonable to assume that the audiometric loss may be due to combined loss or dysfunction of IHCs and OHCs. Indeed, some authors have assumed that the audiometric loss (HLTOTAL) for a given test frequency may be conveniently expressed as the sum of two contributions: one associated with cochlear mechanical gain loss, or OHC dysfunction (HLOHC), and one associated with inefficient IHC transduction, or IHC dysfunction (HLIHC), where HLTOTAL, HLIHC and HLOHC are all in decibels (dB) (Moore and Glasberg, 1997; Plack et al., 2004; Moore, 2007; Jepsen and Dau, 2011; Lopez-Poveda and Johannesen, 2012). The aim of the present study was to assess HLOHC and HLIHC over the frequency range from 500 Hz to 6 kHz in a large sample of listeners with mild-to-severe sensorineural hearing loss.

The prevailing view is that OHCs are generally more labile than IHCs and that IHCs and OHCs in the basal region of the cochlea are damaged first and to a greater extent than cells in the apical region (reviewed by Møller, 2000). The relative degree of physical IHC/OHC loss or dysfunction and the location of the dysfunction, however, almost certainly depend on the cause and magnitude of the lesion. Noise-induced hearing loss is associated mostly with loss of basal OHCs (Chen and Fechter, 2003). In humans, temporal bone studies of noise-induced hearing loss report increased cell death in basal BM locations and fewer surviving OHCs than IHCs (McGill and Schuknecht, 1976). On the other hand, acoustic trauma damages IHC and OHC stereocilia to similar degrees, which suggests that noise-induced hearing loss probably has a substantial contribution from IHC dysfunction (Liberman and Dodds, 1984). The cochlear location of the dysfunction almost certainly depends on the noise spectrum.

Some ototoxic drugs also cause a hearing loss. In this case, the degree of physical IHC and OHC damage depends on the drug employed. Aminoglycosides cause mostly OHC dysfunction, and basal OHCs are affected first and more severely than apical OHCs (van Ruijven et al., 2004; Selimoglu, 2007; Pickles, 2008). Carboplatin, by contrast, does not reduce otoacoustic emission levels (Trautwein et al., 1996) or the sharpness of neural response tuning curves (Wang et al., 1997), which suggests that carboplatin hardly affects cochlear mechanics and affects mostly IHCs or their related structures. Furthermore, carboplatin raises the tips of neural tuning curves comparably at all frequencies (Wang et al., 1997), which indicates that its effect on IHCs is comparable along the cochlear length. In humans, histological studies of aminoglycoside-induced hearing loss report increased cell death in basal BM locations and fewer surviving OHCs than IHCs (Huizing and de Groot, 1987).

Sensorineural hearing loss, however, need not always be caused by reduced counts or injury to hair cells or their associated structures. Metabolic presbycusis, for example, a form of age-related hearing loss (Schmiedt et al., 2002), causes a reduction of the endocochlear potential that can simultaneously reduce the cochlear mechanical gain (Saremi and Stenfelt, 2013) *and* the IHC response (Meddis et al., 2010; Panda et al., 2014). Functionally, this can manifest as a simultaneous dysfunction of IHCs and OHCs. Computer simulation studies suggest that whatever the mechanism, a reduction of the endocochlear potential always raises absolute thresholds more at high than at low frequencies (Meddis et al., 2010; Saremi and Stenfelt, 2013; Panda et al., 2014), which probably explains the association between aging and gradually sloping high-frequency losses. Likewise, aspirin, an ototoxic agent that impairs OHC function, broadens psychoacoustical tuning curves, reduces two-tone suppression, and linearizes growth-of-masking functions slightly more at 3 kHz than at 750 Hz, which can be explained in terms of greater involvement of labile cochlear non-linear processes in basal than in apical cochlear regions (Hicks and Bacon, 1999). In summary, it would be erroneous to conclude that the typically greater high-frequency losses are always due to comparatively greater loss or injury of basal than apical IHCs and/or OHCs.

Regardless of its actual cause, sensorineural hearing loss is typically treated with hearing aids. In programming a hearing aid, the assumption is made that HLTOTAL is partly due to cochlear mechanical gain loss (akin to HLOHC) and partly due to other factors (akin to HLIHC). Individual across-frequency estimates of HLOHC and HLIHC would be highly useful to optimize individualized treatment with hearing aids (Muller and Janssen, 2004; Mills, 2006). Estimation of HLOHC and HLIHC is, however, hard because it can only be done using indirect methods. For this reason, large-scale studies are rare. Using a loudness model, Moore and Glasberg (1997) concluded that HLOHC and HLIHC account on average for 80 and 20% of HLTOTAL, respectively, but reported that for a few listeners the loss attributable to OHC damage appears to be less than 50%. Plack et al. (2004) used the temporal masking curve (TMC) method (Nelson et al., 2001) to infer I/O curves at 4 kHz and estimated that HLOHC contributes 65% of HLTOTAL. Also based on TMC data, Jepsen and Dau (2011) used a computer auditory model to estimate HLIHC and HLOHC at 1 and 4 kHz in 10 hearing impaired (HI) listeners. Their results were broadly consistent with the common view that HLOHC is greater and more frequent than HLIHC, but they also reported some cases with substantial HLIHC at low frequencies. Jürgens et al. (2011) concluded that at 4 kHz cochlear gain loss (or HLOHC) was proportional to HLTOTAL but 10–15 dB lower (p. 189). More recently, we have proposed a more refined method for estimating HLOHC and HLIHC from the analysis of TMC-based input/output (I/O) curves. We concluded that HLOHC and HLIHC account on average for 60 and 40% of HLTOTAL with large variability across cases; indeed, percentages were sometimes reversed (Lopez-Poveda and Johannesen, 2012). 
Our conclusions were based on 26 I/O curves (most of them for a test frequency of 4 kHz) from 18 listeners with mild-to-moderate sensorineural hearing losses and are awaiting confirmation and extension to other frequencies and broader range of hearing losses.

The main aim of the present study was to assess HLIHC and HLOHC from behaviorally inferred I/O curves for a large sample of hearing aid candidates (*N* = 68) and for test frequencies of 0.5, 1, 2, 4, and 6 kHz. A second objective was to investigate to what extent HLIHC and HLOHC vary across test frequencies to examine potential structure-function correlations; that is, to examine the potential correspondence between HLOHC and HLIHC with existing evidence regarding physical loss or injury and/or dysfunction of OHCs and IHCs and their distribution across frequency. A third objective was to investigate the degree of variability of HLOHC and HLIHC across listeners. We used virtually the same approach as in our recent study (Lopez-Poveda and Johannesen, 2012). The present work, however, extends our previous study in several important aspects: first, HLOHC and HLIHC estimates in our previous study were restricted to I/O curves that showed a "knee-point" or a compression threshold (CT), whereas the present analysis is extended to all I/O curves; second, the present study is for a much larger subject sample and for a wider range of frequencies; third, the present study included participants with hearing losses from mild to severe, hence more representative of the hearing-aid candidate population.

## **METHODS**

#### **APPROACH AND ASSUMPTIONS**

Our approach was virtually identical to that of Lopez-Poveda and Johannesen (2012). The details can be found in that publication and for conciseness only a summary is provided here. After Moore and Glasberg (1997), we assumed that the total audiometric loss may be split into two contributions: one pertaining to a reduction of mechanical cochlear gain due to OHC dysfunction and a remaining component, which, for convenience, will be assumed due to inefficient IHC processes, or IHC dysfunction:

$$\text{HL}_{\text{TOTAL}} = \text{HL}_{\text{OHC}} + \text{HL}_{\text{IHC}}, \tag{1}$$

where HLTOTAL, HLOHC, and HLIHC are all in dB. In what follows, HLOHC and HLIHC will be referred to as "OHC loss" and "IHC loss," respectively, and should be interpreted as contribution to audiometric loss (in dB) rather than as anatomical lesions or reduced cell counts.

We further assumed that HLOHC can be found using the OHC dysfunction model of Plack et al. (2004). In this model, a cochlear mechanical input/output (I/O) curve is modeled by a function consisting of a linear segment (slope ∼1 dB/dB) at low input levels, followed by a compressive segment at mid-level inputs (slope *<* 1 dB/dB), eventually followed by another linear segment at high input levels. The breakpoint between the low-level linear segment and the compressive segment is referred to as the CT and the breakpoint between the mid-level compressive segment and the high-level linear segment is referred to as the return-to-linearity threshold (RLT). OHC dysfunction causes a loss of low-level cochlear gain and is modeled as a horizontal shift of the low-level linear segment of the I/O curve toward higher input levels without affecting the slope of the compressive segment (Plack et al., 2004). An assumption of our approach is that HLOHC can be found by comparing the CT of a given hearing-impaired (HI) listener with a reference CT for normal hearing (NH) listeners (Lopez-Poveda and Johannesen, 2012). When an HLOHC estimate is available, then HLIHC can be estimated using Equation (1) as the difference between HLTOTAL and HLOHC. For a sufficiently large OHC dysfunction, all the cochlear gain is lost, the I/O curve becomes linear (absent CT) and HLOHC is assumed to be equal to the NH gain.
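The three-segment I/O function described above can be sketched as a continuous piecewise-linear curve. This is a minimal illustration, not the fitting procedure used in the study: the function name `io_curve`, the exact unit slope of the linear segments (the text gives ∼1 dB/dB), and all numerical values in the example are assumptions chosen for readability.

```python
def io_curve(input_db, ct_db, rlt_db, compressive_slope, gain_db):
    """Three-segment cochlear I/O curve; all levels in dB.

    Below ct_db:  linear (assumed slope exactly 1 dB/dB), offset by gain_db.
    ct_db..rlt_db: compressive, slope < 1 dB/dB.
    Above rlt_db: linear again (assumed slope exactly 1 dB/dB).
    The segments are joined so that the curve is continuous.
    """
    if not 0.0 <= compressive_slope < 1.0:
        raise ValueError("compressive slope must lie in [0, 1) dB/dB")
    if rlt_db < ct_db:
        raise ValueError("RLT must not be below CT")
    out_at_ct = ct_db + gain_db
    if input_db <= ct_db:
        return input_db + gain_db
    if input_db <= rlt_db:
        return out_at_ct + compressive_slope * (input_db - ct_db)
    out_at_rlt = out_at_ct + compressive_slope * (rlt_db - ct_db)
    return out_at_rlt + (input_db - rlt_db)

# Hypothetical parameters: CT = 35 dB, RLT = 85 dB, slope = 0.25 dB/dB, gain = 40 dB.
example = [io_curve(level, ct_db=35, rlt_db=85, compressive_slope=0.25, gain_db=40)
           for level in (20, 35, 85, 95)]
```

In this parameterization a larger `ct_db` with an unchanged `compressive_slope` extends the low-level linear segment to higher input levels, which is one way to picture a higher CT in an HI listener.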

IHC dysfunction is assumed to increase the BM excitation needed for signal detection at threshold. When estimating the I/O curve with a psychophysical approach, only the part of the I/O curve that is above the cochlear mechanical excitation required for detection can be measured. For a large increase in BM excitation, a CT may be absent and only a part of the compressive portion of the I/O curve is available. Therefore, the absence of a CT with presence of a compressive segment in the I/O curve is assumed as indicative of substantial HLIHC. For these cases, it is assumed that Equation (1) does not hold and that HLTOTAL ∼ HLIHC. In other words, it is assumed that even though HLOHC may occur, it does not contribute to the audiometric hearing loss (for a full explanation, see Figure 1D in Lopez-Poveda and Johannesen, 2012 and its related text).
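The three cases described in this section (CT present, linear curve without CT, compressive curve without CT) can be summarized in a short sketch. It assumes that "comparing" the HI and NH compression thresholds means taking their difference, which is our reading rather than something stated explicitly here; the function name, the `curve_type` labels, and the example values are hypothetical.

```python
def split_audiometric_loss(hl_total, curve_type, ct_hi=None, ct_nh=None, nh_gain=None):
    """Return (hl_ohc, hl_ihc) in dB for one I/O curve.

    curve_type:
      "ct"          - a compression threshold (CT) is present
      "linear"      - no CT and a linear curve (all cochlear gain lost)
      "compressive" - no CT, only part of the compressive segment is visible
    """
    if curve_type == "ct":
        if ct_hi is None or ct_nh is None:
            raise ValueError("ct_hi and ct_nh are required when a CT is present")
        hl_ohc = ct_hi - ct_nh              # assumption: CT comparison is a difference
        return hl_ohc, hl_total - hl_ohc    # Equation (1)
    if curve_type == "linear":
        if nh_gain is None:
            raise ValueError("nh_gain is required for linear curves")
        hl_ohc = nh_gain                    # HL_OHC assumed equal to the NH gain
        return hl_ohc, hl_total - hl_ohc    # Equation (1)
    if curve_type == "compressive":
        # Equation (1) is assumed not to hold: HL_TOTAL ~ HL_IHC. OHC dysfunction
        # may be present but is treated as not contributing to the audiometric loss.
        return 0.0, hl_total
    raise ValueError("unknown curve_type: %r" % (curve_type,))

# Hypothetical listeners (all values in dB).
with_ct = split_audiometric_loss(50, "ct", ct_hi=65, ct_nh=35)
linear = split_audiometric_loss(60, "linear", nh_gain=40)
compressive = split_audiometric_loss(45, "compressive")
```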

Estimation of HLIHC and HLOHC as outlined above requires access to cochlear I/O curves. We assumed that I/O curves can be inferred behaviorally using the TMC method of Nelson et al. (2001). Briefly, this method consists of measuring the levels of a pure tone forward masker required to just mask a following fixed low-level probe tone as a function of the masker-probe time gap. Two TMCs are measured to infer an I/O curve: one for a condition where the masker is processed linearly by the BM (linear reference); and one for a condition where the masker and the probe tones are equal in frequency (on-frequency). It is assumed that the slope of the linear-reference TMC reflects the post-mechanical rate of recovery from forward masking while the slope of the on-frequency TMC reflects both BM compression on the masker *and* the post-mechanical rate of recovery from forward masking. Under the assumption that the post-mechanical rate of recovery is independent of masker level and frequency, a cochlear I/O curve can be inferred by plotting the masker levels of the linear reference TMC as a function of the masker levels for the on-frequency TMC for paired time gaps (Nelson et al., 2001). Lopez-Poveda et al. (2003) proposed to use a common linear-reference TMC for a high-frequency probe and a low-frequency masker to infer I/O curves for all probe frequencies on the assumption that the recovery from forward masking is also independent of the probe frequency.
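The pairing step of the TMC method can be sketched as follows. The sketch pairs raw masker levels at matching time gaps, whereas analyses of this kind typically pair fitted TMCs; the function name, the dictionary representation, and the masker levels are hypothetical.

```python
def infer_io_curve(on_frequency_tmc, linear_reference_tmc):
    """Pair masker levels at matching masker-probe time gaps.

    Both arguments map time gap (ms) -> masker level at masked threshold (dB SPL).
    Returns (input_db, output_db) points ordered by gap, where the input is the
    on-frequency masker level and the output is the linear-reference masker
    level measured at the same gap.
    """
    shared_gaps = sorted(set(on_frequency_tmc) & set(linear_reference_tmc))
    return [(on_frequency_tmc[g], linear_reference_tmc[g]) for g in shared_gaps]

def segment_slopes(io_points):
    """Slope (dB/dB) between consecutive I/O points; values below 1 suggest compression."""
    return [(y1 - y0) / (x1 - x0)
            for (x0, y0), (x1, y1) in zip(io_points, io_points[1:])]

# Hypothetical TMCs: the on-frequency masker level grows faster with gap than the
# linear-reference level, the signature of BM compression acting on the masker.
on_freq = {10: 50, 20: 60, 30: 80, 40: 95}
lin_ref = {10: 70, 20: 74, 30: 78, 40: 82, 50: 86}
io_points = infer_io_curve(on_freq, lin_ref)
slopes = segment_slopes(io_points)
```

Gaps present in only one TMC (here 50 ms) are dropped, since an I/O point needs both masker levels at the same gap.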

### **PARTICIPANTS**

A total of 68 listeners (43 males) with symmetrical sensorineural hearing loss participated in the study. Their ages ranged from 25 to 82 years (median = 61 years). Air conduction absolute thresholds were measured using a clinical audiometer (Interacoustics AD229e) at the typical audiometric frequencies (0.125, 0.25, 0.5, 1, 2, 3, 4, 6, and 8 kHz) (ANSI, 1996). Bone conduction thresholds were measured at 0.5, 1, 2, 3, and 4 kHz. Air and bone conduction thresholds were also measured at 0.75 and 1.5 kHz for a large subset of subjects. A hearing loss was regarded as sensorineural when tympanometry was normal and air-bone gaps were smaller than or equal to 15 dB at one frequency and smaller than or equal to 10 dB at any other frequency. Participants were recruited for a large-scale bilateral hearing-aid outcome study. Therefore, they were additionally required to be hearing-aid candidates (as judged by an experienced audiologist) and to have symmetrical bilateral loss. A hearing loss was regarded as symmetrical when the mean air conduction thresholds at 0.5, 1, and 2 kHz differed by less than 15 dB between the two ears, and the mean difference at 3, 4, and 6 kHz was less than 30 dB (AAO-HNS, 1993). For the current purpose, each participant was tested in one ear. The ear was selected to maximize the number of test frequencies for which TMCs could be obtained. For the majority of cases, this meant selecting the ear with better thresholds in the 2–6 kHz frequency range (30 left ears, 38 right ears). **Figure 1** gives an idea of the distribution of hearing losses (see below).
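The two audiometric inclusion rules above can be written as simple checks. The sketch reflects one reading of the criteria: at most one frequency may show an air-bone gap above 10 dB, and that gap may not exceed 15 dB; symmetry is tested on the absolute difference between ears of the mean thresholds in each frequency band. The function names and example thresholds are hypothetical, and the clinical judgment of hearing-aid candidacy is not modeled.

```python
LOW_BAND_KHZ = (0.5, 1, 2)
HIGH_BAND_KHZ = (3, 4, 6)

def is_sensorineural(air_bone_gaps_db, tympanometry_normal):
    """air_bone_gaps_db: one air-bone gap (dB) per tested frequency."""
    if not tympanometry_normal:
        return False
    over_10 = [gap for gap in air_bone_gaps_db if gap > 10]
    # At most one frequency may exceed 10 dB, and it must not exceed 15 dB.
    return len(over_10) <= 1 and all(gap <= 15 for gap in over_10)

def band_mean(thresholds, band):
    """Mean air conduction threshold (dB HL) over the frequencies in `band`."""
    return sum(thresholds[f] for f in band) / len(band)

def is_symmetrical(left, right):
    """left, right: dicts mapping frequency (kHz) -> air conduction threshold (dB HL)."""
    low_diff = abs(band_mean(left, LOW_BAND_KHZ) - band_mean(right, LOW_BAND_KHZ))
    high_diff = abs(band_mean(left, HIGH_BAND_KHZ) - band_mean(right, HIGH_BAND_KHZ))
    return low_diff < 15 and high_diff < 30

# Hypothetical audiograms for one listener.
left_ear = {0.5: 30, 1: 35, 2: 40, 3: 50, 4: 55, 6: 60}
right_ear = {0.5: 35, 1: 40, 2: 50, 3: 60, 4: 70, 6: 80}
eligible = is_sensorineural([5, 10, 15, 0, 5], True) and is_symmetrical(left_ear, right_ear)
```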

The data from our previous related study were also included in the present analysis (Lopez-Poveda and Johannesen, 2012). These comprised reference data for 15 NH listeners and data for 18 listeners with mild-to-moderate sensorineural hearing loss. Results for these groups will be clearly identified below.

All procedures were approved by the human experimentation ethical review board of the University of Salamanca. Subjects gave their signed informed consent prior to their inclusion in the study.

#### **TMC STIMULI AND PROCEDURE**

Stimuli and procedure were similar to those of Lopez-Poveda and Johannesen (2012). On-frequency TMCs were measured for probe frequencies (*fP*) of 0.5, 1, 2, 4, and 6 kHz. Maskers and probes were sinusoids. The duration of the maskers was 210 ms including 5-ms cosine-squared onset and offset ramps. Probes had durations of 10 ms, including 5-ms cosine-squared onset and offset ramps with no steady state portion, except for the 500-Hz probe, whose duration was 30 ms with 15-ms ramps and no steady state portion. The level of the probes was fixed at 10 dB above the individual absolute threshold for the probe. Masker-probe time gaps, defined as the period from masker offset to probe onset, ranged from 10 to 100 ms in 10-ms steps with an additional gap of 2 ms. Masker levels sometimes reached the maximum permitted sound level output (105 dB SPL) after a few time gaps. If the number of measured data points was insufficient for curve fitting (see below), masker levels were measured for additional intermediate gaps (e.g., 5, 15, 25 ms). In a few cases, masker levels were atypically low for a time gap of 100 ms. In these cases, masker levels were measured for additional gaps in the range 110–140 ms.
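As an illustration of the stimulus description above, the following Python sketch builds a sinusoid gated with cosine-squared onset and offset ramps. The original stimuli were generated in Matlab; this is not the authors' code, the function name `gated_tone` is hypothetical, and level scaling and calibration are left out.

```python
import math


def gated_tone(freq_hz, dur_ms, ramp_ms, fs=44100):
    """Return a unit-amplitude sinusoid with cosine-squared on/off ramps.

    dur_ms is the total duration including the ramps; fs is the sampling
    frequency in Hz (44100 Hz is the rate reported for the sound card).
    """
    n = round(fs * dur_ms / 1000)
    n_ramp = round(fs * ramp_ms / 1000)
    samples = []
    for i in range(n):
        if i < n_ramp:
            # Onset: envelope rises from 0 towards 1 along a squared sinusoid.
            env = math.sin(0.5 * math.pi * i / n_ramp) ** 2
        elif i >= n - n_ramp:
            # Offset: mirror image of the onset ramp.
            env = math.sin(0.5 * math.pi * (n - 1 - i) / n_ramp) ** 2
        else:
            env = 1.0
        samples.append(env * math.sin(2 * math.pi * freq_hz * i / fs))
    return samples


masker = gated_tone(4000, dur_ms=210, ramp_ms=5)  # 210-ms masker, 5-ms ramps
probe = gated_tone(4000, dur_ms=10, ramp_ms=5)    # 10-ms probe, essentially no steady state
```

The 4-kHz frequency is just one of the probe frequencies listed above; a forward-masking trial would concatenate the masker, a silent gap of the desired duration, and the probe.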

A single linear reference TMC was measured for each listener and it was used to infer I/O curves for all other probe frequencies (Lopez-Poveda et al., 2003). The linear reference TMC was for a probe frequency of 2, 4, or 6 kHz and for a masker frequency equal to 0.4*fp* or 0.5*fp*. The selection of linear reference condition depended on the listener's hearing loss at the linear-reference probe frequency and on the maximum permitted sound level output (105 dB SPL). Following the indications of earlier studies (Lopez-Poveda et al., 2003; Lopez-Poveda and Alves-Pinto, 2008), the linear reference conditions were sought in the order of priority shown in **Table 1**.

Stimuli were generated digitally in Matlab and output via an RME Fireface 400 sound card (sampling frequency of 44100 Hz, 24-bit resolution) and delivered to the listeners through Sennheiser HD-580 headphones. Sound pressure levels (SPL) were calibrated by placing the headphones on a KEMAR equipped with a Zwislocki DB-100 artificial ear connected to a sound level meter. Calibration was performed at 1 kHz only and the obtained sensitivity was used at all other frequencies.

**Table 1 | Prioritized linear-reference TMC conditions and number of cases (***N***) where each condition applied (see also red curves in Figure 2).**

*Note that these numbers add up to 63 rather than to the total number of participants (N = 68). This is because a linear reference TMC could not be measured for four participants, and the data from one additional participant who performed inconsistently during the task were excluded from the analysis.*

Masker levels at threshold were measured using a two-interval, two-alternative, forced-choice adaptive procedure with feedback. The inter-stimulus interval was 500 ms. The initial masker level was set sufficiently low that the listener always could hear both the masker and the probe. Masker level was then changed according to a two-up, one-down adaptive procedure to estimate the 71% point on the psychometric function (Levitt, 1971). An initial step size of 6 dB was applied, which was decreased to 2 dB after three reversals. The adaptive procedure continued until a total of 12 reversals in masker level were measured. Threshold was calculated as the mean masker level at the last 10 reversals. A measurement was discarded if the standard deviation of the last 10 reversals exceeded 6 dB. Three threshold estimates were obtained in this way and their mean was taken as the threshold. If the standard deviation of these three measurements exceeded 6 dB, one or more additional threshold estimates were obtained and included in the mean. Measurements were made in a double-wall sound attenuating booth. Listeners were given at least 2 h of training on the TMC task before data collection began.
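The adaptive rule described above can be sketched as follows. This is a minimal simulation, not the experimental software: the function name `run_track`, the starting level of 40 dB SPL, and the idealized listener (who detects the probe whenever the masker is below 70 dB SPL) are assumptions made for illustration.

```python
from statistics import mean, stdev


def run_track(detects, start_level=40.0, max_level=105.0):
    """Two-up, one-down track on masker level.

    detects(level) returns True when the listener responds correctly with
    the masker at `level` (dB SPL). The masker goes up after two
    consecutive correct responses and down after one incorrect response.
    The step is 6 dB until the third reversal and 2 dB afterwards; the
    track stops after 12 reversals. Returns the mean and the standard
    deviation of the masker level at the last 10 reversals.
    """
    level, step = start_level, 6.0
    correct_in_row = 0
    direction = 0  # +1 if the last change was upwards, -1 if downwards
    reversals = []
    while len(reversals) < 12:
        if detects(level):
            correct_in_row += 1
            if correct_in_row < 2:
                continue  # wait for a second consecutive correct response
            correct_in_row, move = 0, +1
        else:
            correct_in_row, move = 0, -1
        if direction and move != direction:
            reversals.append(level)
            if len(reversals) == 3:
                step = 2.0
        direction = move
        level = min(level + move * step, max_level)
    last10 = reversals[-10:]
    return mean(last10), stdev(last10)


# Idealized, noise-free listener (hypothetical): probe heard below 70 dB SPL.
threshold, spread = run_track(lambda level: level < 70.0)
keep_estimate = spread <= 6.0  # discard rule quoted in the text
```

For this noise-free listener the track settles into alternating between 68 and 70 dB SPL, so the estimate is 69 dB SPL. A real listener responds probabilistically, which is why the rule targets the 71% point rather than a sharp boundary.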

Absolute thresholds for the probes and maskers were measured using a similar procedure except that the adaptive procedure was one-up, two-down.

TMC and absolute threshold measurements took between 12 and 15 h per participant in total and were distributed in several (1- or 2-h) sessions on several days.

## **TMC FITTING**

Linear-reference and on-frequency TMCs were fitted before they were used to infer I/O curves. Linear reference TMCs were fitted with a double exponential function with four parameters (Lopez-Poveda and Johannesen, 2012); on-frequency TMCs were fitted with a function consisting of the double exponential function fitted to the linear reference TMC plus a second-order Boltzmann function with six parameters (Lopez-Poveda and Johannesen, 2012). When fitting the on-frequency TMC, the parameters of the double exponential function were held fixed and only the parameters of the second-order Boltzmann function were allowed to vary. When the number of data points in a TMC was equal to or fewer than the number of parameters of the double exponential or the second-order Boltzmann function, single exponential (two parameters) and first-order (four parameters) Boltzmann functions were used instead. A full justification of this approach can be found elsewhere (Lopez-Poveda and Johannesen, 2012). The goodness-of-fit was assessed using the root-mean-square (RMS) error between measured and fitted TMCs. RMS errors were less than 2 dB for all linear reference TMCs, and less than 4 dB for on-frequency TMCs, except for three cases for which RMS errors were less than 6 dB.
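The model-selection rule and the goodness-of-fit measure described above can be sketched as below. The function names are hypothetical, and the fitting functions themselves are not reproduced because their exact forms are given only in the cited reference.

```python
import math


def rms_error_db(measured_db, fitted_db):
    """Root-mean-square difference (dB) between measured and fitted masker levels."""
    if not measured_db or len(measured_db) != len(fitted_db):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(m - f) ** 2 for m, f in zip(measured_db, fitted_db)]
    return math.sqrt(sum(squared) / len(squared))


def linear_reference_model(n_points):
    """Double exponential (4 parameters) unless there are too few data points."""
    return "double exponential" if n_points > 4 else "single exponential"


def boltzmann_order(n_points):
    """Second-order Boltzmann (6 parameters) unless there are too few data points."""
    return 2 if n_points > 6 else 1


# Hypothetical measured and fitted masker levels (dB SPL) for three gaps.
error = rms_error_db([60.0, 70.0, 80.0], [61.0, 69.0, 80.0])
```

With these made-up values the RMS error is about 0.8 dB, which would fall within the 2-dB bound reported for linear reference TMCs.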

#### **INFERENCE OF I/O CURVES**

I/O curves were inferred for each participant by plotting the masker levels of his/her linear reference TMC against the masker levels for the on-frequency TMCs paired according to time gaps (Nelson et al., 2001). For any given participant, a common linear reference condition was used to infer I/O curves at all test frequencies (Lopez-Poveda et al., 2003). A linear reference TMC could not be found for four participants because their hearing loss was too severe at the linear-reference probe frequencies (**Table 1**). In these four cases, an average linear reference (mean across all other participants for the condition *fP* = 4 kHz and *fm* = 1.6 kHz) was used to infer I/O curves. This average linear reference TMC was also used for reanalysis of four cases from our previous study (Lopez-Poveda and Johannesen, 2012) that did not have a linear reference for the same reason.
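The pairing step can be sketched in Python as follows. The data structure (a dictionary from time gap to masker level) and the function name are assumptions for illustration; in the study the pairing was applied to fitted TMCs, which can be evaluated at any common set of gaps.

```python
def infer_io_curve(on_freq_tmc, linear_ref_tmc):
    """Pair two TMCs by time gap to obtain an I/O curve.

    Each TMC maps time gap (ms) -> masker level (dB SPL). The on-frequency
    masker level is taken as the input level and the linear-reference
    masker level at the same gap as the output level.
    Returns (input, output) pairs sorted by input level.
    """
    shared_gaps = sorted(set(on_freq_tmc) & set(linear_ref_tmc))
    return sorted((on_freq_tmc[g], linear_ref_tmc[g]) for g in shared_gaps)


# Hypothetical fitted TMCs evaluated at a few gaps.
on_frequency = {2: 50.0, 10: 60.0, 20: 80.0}
linear_reference = {2: 70.0, 10: 72.0, 20: 76.0, 30: 78.0}
io_curve = infer_io_curve(on_frequency, linear_reference)
```

Only gaps present in both TMCs contribute, so the 30-ms point of the hypothetical linear reference is dropped here.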

#### **RESULTS**

#### **HEARING LOSS DISTRIBUTIONS**

Absolute thresholds for the maskers were used to assess hearing losses. Masker duration was shorter for the present participants (200 ms) than for the NH reference group or the HI listeners used in our previous study (300 ms in Lopez-Poveda and Johannesen, 2012). Because absolute threshold depends on signal duration, this difference in masker duration could have introduced a small difference in threshold for the participants in each study. Given that HLTOTAL was defined as the difference between masker thresholds of the HI and NH listeners, an attempt was made to correct the present masker thresholds for the influence of duration on absolute thresholds by adding the difference between NH absolute thresholds for pure tone durations of 300 and 200 ms to the masker thresholds of the present HI subjects (Watson and Gengel, 1969). Corrections were smaller than 1 dB at all frequencies. **Figure 1** shows the corrected absolute thresholds for the present participants; thresholds for the HI participants from our previous study are omitted in **Figure 1** but can be found in the original reference. Clearly, on average, participants had high-frequency losses typical of presbycusis, but hearing losses varied widely across participants at each frequency.

### **TEMPORAL MASKING CURVES**

**Figure 2** shows fitted linear-reference and on-frequency TMCs for 67 participants; one participant performed inconsistently during the TMC task and her data were excluded from further analysis. Measured TMCs are omitted in **Figure 2** to avoid clutter. Each column is for a different test frequency as indicated by column title, and each row is for a hearing-loss range as indicated by the text on the right-most ordinate. Both linear-reference (red curves) and on-frequency TMCs (black curves) had characteristics similar in most aspects to those published in earlier reports (Nelson et al., 2001; Plack et al., 2004; Lopez-Poveda et al., 2005; Jepsen and Dau, 2011; Lopez-Poveda and Johannesen, 2012). The on-frequency masker levels for the shortest time gap (2 ms) decreased with decreasing frequency and on-frequency and linear-reference TMCs were less parallel (i.e., on-frequency TMCs were steeper than linear-reference TMCs) at lower than at higher frequencies. Both these aspects are consistent with listeners having less hearing loss (**Figure 1**) and presumably less gain loss and more compression at low than at high frequencies.

Lopez-Poveda and Alves-Pinto (2008) argued that an ideal linear-reference TMC for inferring I/O curves would be for *fp* = 4 kHz and *fm* = 1.6 kHz on the grounds that the slope of such a TMC would be unlikely to be affected by cochlear compression and would reflect only the post-mechanical rate of recovery from forward masking. As explained above, the hearing loss of some listeners was so large at 4 kHz that it was not possible to measure this preferred linear-reference TMC and alternative linear references were measured instead (**Table 1**). Before using these linear references to infer I/O curves, we verified that their slopes were statistically comparable to the slopes of the preferred linear reference. To do so, we calculated the mean slope of all measured linear reference TMCs across all available time gaps (red curves in **Figure 2**), and compared the mean slope of the preferred linear reference condition (denoted as priority 1 in **Table 1**) with the mean slope for every other condition (denoted as priority orders 2 to 6 in **Table 1**) using a Student's *t*-test. The tests confirmed that all linear references had statistically equivalent slopes (*p* > 0.05). The difference for conditions 1 and 5 was close to being significant (*p* = 0.055) but did not reach significance. Therefore, we concluded that all linear-reference TMCs had statistically comparable slopes and that it was reasonable to use them to infer I/O curves.

#### **I/O CURVES INFERRED FROM TMCs**

**Figure 3** shows the I/O curves inferred from the TMCs of **Figure 2**. Dotted lines depict linearity with no gain (input level = output level). The large majority of I/O curves had shapes typical of HI subjects: they often had a linear segment at low input levels followed by a compressive segment at mid input levels, followed sometimes by another linear segment at high input levels. Other I/O curves were best described by an almost straight line with either a compressive slope or with a slope close to linearity. A few I/O curves showed unusual characteristics. For example, their RLTs were surprisingly low (50–70 dB SPL), particularly at low frequencies. Also, some I/O curves were almost flat (e.g., at 1 kHz for hearing loss below 15 dB HL). The latter occurs because their corresponding linear reference TMCs were very shallow. Overall, I/O curves extended to lower input levels at low than at high frequencies, a reasonable result considering that on average participants had greater hearing losses for high than for low frequencies (**Figure 1**).

#### **I/O CURVE ANALYSES AND TAXONOMY**

Lopez-Poveda and Johannesen (2012) argued that HLOHC and HLIHC may be reliably obtained from an I/O curve only if the I/O curve in question shows a CT. They nonetheless hinted that the shape of the I/O curves may be indicative of the type and extent of HLOHC or HLIHC (see their Figure 1 and related text). Here, each I/O curve was analyzed in search of HLOHC and HLIHC using their reasoning and following the logic outlined in **Figure 4**.

A CT was first sought for each I/O curve. Lopez-Poveda and Johannesen (2012) arbitrarily defined the CT as the input level where the I/O curve reached a slope of 0.5 dB/dB from a higher value at lower input levels (**Figure 5B**). To take into account the experimental TMC variability on the CT estimate, rather than inferring the CT from the mean I/O curve, they simulated 100 I/O curves for each condition using a Monte-Carlo approach and used the median CT of those simulations in their subsequent analysis. They regarded the obtained CT as unreliable when it was the lowest input level in the mean I/O curve or if the mean I/O curve slope did not reach the criterion value of 0.5 dB/dB, something infrequent in their data (see Lopez-Poveda and Johannesen, 2012). Here, we first tried to apply their same criteria but found many instances where the resulting CTs were unreliable. To maximize the number of I/O curves with valid CTs, we opted to apply slightly different criteria: (1) that 60% of the Monte-Carlo simulated I/O curves showed a valid CT; *and* (2) that the residual cochlear gain of the mean I/O curve (estimated as described below) was greater than zero. The median CT of the Monte-Carlo simulated I/O curves was taken as the final CT.
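The CT search and the first validity criterion can be sketched as follows. Several details are assumptions made for illustration rather than the authors' procedure: slopes are computed by finite differences between neighboring I/O points, the simulated variability is modeled as Gaussian noise added to the output levels, "60%" is read as "at least 60%", and the function names are hypothetical. The second criterion (residual gain greater than zero) is not included.

```python
import random
from statistics import median


def find_ct(io_points, criterion=0.5):
    """Input level where the I/O slope falls to `criterion` from a higher value.

    io_points: (input, output) pairs in dB, sorted by input level.
    Returns None when the slope never crosses the criterion from above.
    """
    slopes = [((x0 + x1) / 2, (y1 - y0) / (x1 - x0))
              for (x0, y0), (x1, y1) in zip(io_points, io_points[1:])]
    for (xa, sa), (xb, sb) in zip(slopes, slopes[1:]):
        if sa > criterion >= sb:
            # Linear interpolation between the two slope estimates.
            return xa + (sa - criterion) * (xb - xa) / (sa - sb)
    return None


def monte_carlo_ct(io_points, sd_db, n_sim=100, min_valid=0.6, seed=0):
    """Median CT over simulated curves, or None if too few show a CT."""
    rng = random.Random(seed)
    cts = []
    for _ in range(n_sim):
        noisy = [(x, y + rng.gauss(0.0, sd_db)) for x, y in io_points]
        ct = find_ct(noisy)
        if ct is not None:
            cts.append(ct)
    if len(cts) < min_valid * n_sim:
        return None
    return median(cts)


# Hypothetical mean I/O curve: linear up to 50 dB, then a slope of 0.2 dB/dB.
mean_curve = [(30, 30), (40, 40), (50, 50), (60, 52), (70, 54), (80, 56)]
ct_mean = find_ct(mean_curve)
ct_noise_free = monte_carlo_ct(mean_curve, sd_db=0.0)
```

In this made-up curve the slope drops from 1 to 0.2 dB/dB between the segment midpoints at 45 and 55 dB, so the interpolated CT is 51.25 dB. A purely linear curve returns `None`, which corresponds to the "no valid CT" outcome.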

A large proportion of I/O curves showed a CT (**Table 2**). Many other I/O curves, however, were best described as straight lines with varying slopes (as depicted in **Figure 5D** or **Figure 5F**) or showed a compressive segment and an RLT but no CT (as shown in **Figure 5H**). The distinction between these cases was made based on residual gain and mean slope using the logic depicted in **Figure 4**.

Gain was defined here as the difference in sensitivity for low and high input levels, as illustrated in the right panels of **Figure 5**. That is, gain was defined as the horizontal distance between intersects with the abscissa of two lines with slopes 1 dB/dB that passed through the end points of the I/O curve. Of course, if the measured I/O curve were only a segment of the actual underlying I/O curve, as would happen for instance for straight-line I/O curves like those shown in **Figure 5D** or **Figure 5F**, this gain estimate would be smaller than the actual residual gain. Actually, insofar as an I/O curve is inferred from an on-frequency and a linear-reference TMC (compare the left and right panels of **Figure 5**), gain for all types of I/O curves was directly obtained from the corresponding TMCs as follows:

$$\mathrm{gain} = (L_{\rm A} - L_{\rm B}) - (L_{\rm C} - L_{\rm D}) \quad (2)$$

where *L*<sub>A</sub>, *L*<sub>B</sub>, *L*<sub>C</sub>, and *L*<sub>D</sub> were defined as in **Figure 5**.

If gain was not significantly different from zero<sup>1</sup>, then the I/O curve was regarded as linear, hence indicative of total gain loss. If, however, gain was greater than zero, we tried to find an RLT<sup>2</sup> in the I/O curve (as shown in **Figure 5H**). If absent, the I/O curve was regarded as linear when its average slope was steeper than an arbitrary value of 0.75 dB/dB. This criterion prevented cases with small amounts of residual gain and a moderate degree of compression from being erroneously classified as total gain loss; that is, it served to distinguish cases like that shown in **Figure 5F**, almost certainly indicative of significant IHC dysfunction, from cases like that shown in **Figure 5D**, almost certainly indicative of total gain loss. If, however, an RLT was present or if the average slope of the I/O curve was < 0.75 dB/dB, then we assumed that compression was present and that the I/O curve was indicative of significant IHC dysfunction.
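The decision rules described in this section can be summarized as a small function. This sketch follows the order of the text rather than Figure 4 itself, the function and argument names are hypothetical, and a slope of exactly 0.75 dB/dB (a case the text does not specify) is treated here as linear.

```python
def classify_io_curve(has_ct, gain_significant, has_rlt, mean_slope):
    """Assign an I/O curve to one of the three types used in the text.

    has_ct:           a valid compression threshold was found
    gain_significant: residual gain (Equation 2) is significantly above zero
    has_rlt:          a return-to-linearity threshold was found
    mean_slope:       average slope of the I/O curve in dB/dB
    """
    if has_ct:
        return "Type 1"  # CT present
    if not gain_significant:
        return "Type 2"  # linear, taken as total gain loss
    if has_rlt or mean_slope < 0.75:
        return "Type 3"  # CT absent but compression present
    return "Type 2"      # residual gain but a slope close to linearity
```

For example, a curve without a CT or RLT, with significant residual gain and a mean slope of 0.5 dB/dB, would be assigned to Type 3, whereas the same curve with a slope of 0.9 dB/dB would be assigned to Type 2.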

**Table 2** shows the number of I/O curves in each of the three categories (Type 1: CT present; Type 2: linear; Type 3: CT absent with compression). The proportion of linear I/O curves was greater at and above 2 kHz than at lower frequencies. The proportion of Type 3 I/O curves was greater at lower than at higher frequencies. In a few cases, the hearing loss was so high (above ∼70 dB HL) that measuring the TMC needed to infer an I/O curve would have required masker levels beyond the maximum sound pressure output of our system. These cases, classified as "too-high loss" in **Table 2**, increased slightly in number with increasing frequency.

**FIGURE 5 | A taxonomy of I/O curves (blue line, right panels) and their corresponding TMCs (left panels).** Left: linear-reference (red curves) and on-frequency (black curves) TMCs. Right: corresponding, inferred I/O curves. **(A,B)** for Type 1 I/O curves. **(C,D)** for Type 2 I/O curves. **(E,F)** for Type 3 I/O curves with little residual gain. **(G,H)** for Type 3 I/O curves with large residual gain. CT, compression threshold; RLT, return-to-linearity threshold. See main text for details.

**Table 2 | Number of I/O curves according to their shapes.**

*The table includes the present data plus data from Lopez-Poveda and Johannesen (2012).*

<sup>1</sup>Because each TMC was measured at least three times, we could assess the variance in *L*<sub>A</sub>, *L*<sub>B</sub>, *L*<sub>C</sub>, and *L*<sub>D</sub>, hence the gain variance (Equation 2). A Student's *t*-test was then used to verify if the mean gain estimate was statistically greater than zero at the 5% significance level.

<sup>2</sup>The return-to-linearity threshold (RLT) was defined as the input level at which the slope of the I/O curve reached an arbitrary value of 0.5 dB/dB from a lower value at lower input levels. It was obtained using the same method and criteria that were used to obtain the CT.

Once classified, the different I/O curve types were analyzed in search of HLOHC and HLIHC. Type 1 I/O curves were analyzed as suggested by Lopez-Poveda and Johannesen (2012); Type 2 and Type 3 I/O curves were analyzed differently, as described below.

## **HLOHC AND HLIHC ESTIMATES FROM I/O CURVES**

#### *From I/O curves with a compression threshold*

For I/O curves with a CT (Type 1), HLOHC was calculated as the difference between the CT and the mean CT for the reference NH group multiplied by (1–*c*) (Equation 2 in Lopez-Poveda and Johannesen, 2012), where *c* is the mean compression exponent over the compressive segment of the NH I/O curves. HLIHC was obtained as HLTOTAL–HLOHC (Equation 1). This procedure required having mean reference CT and *c* values for NH listeners at each of the test frequencies (0.5, 1, 2, 4, and 6 kHz). Lopez-Poveda and Johannesen (2012) provided reference data for 0.5, 1, and 4 kHz but, to the best of our knowledge, reference data are still lacking at 2 and 6 kHz. For this reason, in the current analysis, the reference values at 4 kHz were used to infer HLOHC and HLIHC also at 2 and 6 kHz. The impact of this approximation on the results is discussed below.
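The Type 1 calculation described above amounts to two lines of arithmetic, sketched here in Python. The function name and the numerical values in the example are hypothetical; they are not reference values from the study.

```python
def type1_components(ct, ct_nh, c_nh, hl_total):
    """Split HLTOTAL into HLOHC and HLIHC for an I/O curve with a CT.

    ct:       compression threshold of the hearing-impaired listener (dB SPL)
    ct_nh:    mean compression threshold of the NH reference group (dB SPL)
    c_nh:     mean compression exponent of the NH I/O curves (dB/dB)
    hl_total: total hearing loss re. the NH reference group (dB)
    """
    hl_ohc = (ct - ct_nh) * (1.0 - c_nh)  # cochlear gain loss
    hl_ihc = hl_total - hl_ohc            # remainder, from Equation (1)
    return hl_ohc, hl_ihc


# Hypothetical values: CT raised by 25 dB, NH compression exponent of 0.25.
hl_ohc, hl_ihc = type1_components(ct=60.0, ct_nh=35.0, c_nh=0.25, hl_total=30.0)
```

With these made-up numbers, a 25-dB increase in CT maps to 18.75 dB of HLOHC, leaving 11.25 dB of the 30-dB total to be attributed to HLIHC.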

**Figure 6** illustrates HLOHC (top) and HLIHC (bottom) as a function of HLTOTAL. Note that HLTOTAL is defined here as the difference between a participant's absolute threshold for the masker and the mean absolute masker threshold of the reference NH group (the latter was not 0 dB HL as noted by Lopez-Poveda and Johannesen, 2012). Each column illustrates results for a different test frequency, as indicated at the top of each column. The lower insets in each panel show corresponding linear-regression functions and the number of data points (N) used in the regression; the upper insets show regression statistics, where *R*<sup>2</sup> is the proportion of variance explained by the regression line, and the *p*-value is the probability of the relationship between the two variables occurring by chance. Red dashed lines depict 95% confidence intervals for a new observation rather than the confidence intervals of the regression lines.

The linear regression functions in **Figure 6** show that HLOHC contributed between 61 and 70% to HLTOTAL, and HLIHC contributed the rest (30–39%). Interestingly, these percentages were approximately constant across test frequencies, as shown by the slopes of the regression lines. The individual variability of the contributions HLOHC and HLIHC can be assessed from the confidence limits for new single observations. The confidence intervals for HLOHC and HLIHC were around ±9 dB at 0.5 kHz and around ±6 dB over the range 1–6 kHz. In all cases, the confidence intervals were almost independent of the HLTOTAL. Recall that these results were only for Type 1 I/O curves.

**FIGURE 6 | The contribution of HLOHC (top) and HLIHC (bottom) to HLTOTAL assessed from the analysis of Type 1 I/O curves (i.e., from I/O curves with CT present).** Each column is for a different test frequency, as indicated by the column title. Results for the current hearing-impaired listeners are depicted as blue symbols; results for NH listeners and for listeners with mild-to-moderate loss from our earlier study (Lopez-Poveda and Johannesen, 2012) are depicted by black and red symbols, respectively. Continuous lines illustrate mean linear regression functions; dotted lines illustrate 5 and 95% confidence intervals of new individual observations. The insets show linear regression functions and related statistics.

**Figure 7** allows statistical judgment of the incidence of cases suffering from pure IHC loss, pure OHC loss, or mixed IHC/OHC loss. The figure illustrates absolute threshold (in dB HL) as a function of cochlear gain loss (HLOHC) separately for each frequency. The vertical dotted red line (at HLOHC = 0 dB) indicates the hypothetical location of cases whose hearing loss was exclusively due to IHC dysfunction (pure IHC loss). The blue diagonal line depicts the hypothetical location of cases whose hearing loss was exclusively due to cochlear gain loss (pure OHC loss). The blue-dotted diagonal lines show 5–95% confidence intervals for gain loss as calculated from the reference NH listeners (Lopez-Poveda and Johannesen, 2012). Note that the diagonal does not match with the condition HLTOTAL = HLOHC, as one might expect, because as explained by Lopez-Poveda and Johannesen (2012), their NH listeners did not have a mean hearing loss of 0 dB HL. The shaded area indicates the placement of cases whose hearing loss is due partly to cochlear gain loss (HLOHC) plus an additional component (mixed OHC/IHC loss). The results from I/O curves with a CT are depicted as blue circles in the top panels of the figure. For completeness, also shown are the results for listeners with NH (black circles) and mild-to-moderate hearing loss (red circles) from our earlier study (Lopez-Poveda and Johannesen, 2012).

**Figure 7** (top) shows that pure OHC loss was rare and occurred mostly for low absolute thresholds (or, equivalently, small hearing losses). There were no cases of pure IHC loss, something not surprising considering that significant HLIHC would probably make it impossible to measure a CT (Figure 1 in Lopez-Poveda and Johannesen, 2012) and **Figure 7** (top) only shows results for cases with a CT. Most cases were in the shaded areas and thus were consistent with mixed IHC/OHC loss. The number of cases with mixed loss tended to increase with increasing absolute threshold (or hearing loss). Incidentally, the number of cases with mixed loss appeared somewhat larger at 2 kHz than at other frequencies. This may be somewhat artifactual due to our using the mean NH CT and absolute threshold at 4 kHz to estimate HLOHC at 2 kHz. Any difference between the mean NH CTs at 2 and 4 kHz would bias the data horizontally and a difference between the mean NH absolute threshold at 2 and 4 kHz would bias the data vertically and thus might contribute to an apparent higher incidence of mixed IHC/OHC loss at 2 kHz.

Type 1 (circles) and Type 2 (crosses) I/O curves. **Bottom panels:** results for Type 3 I/O curves (left-pointing triangles with arrows). Each column is for a different test frequency, as indicated by the column title. In each panel, the diagonal blue line and associated dotted lines indicate mean values and 5% confidence limits for pure OHC loss (HLTOTAL = HLOHC), and the vertical black, dotted line depicts the hypothetical location of cases with total cochlear

circles) (Johannesen and Lopez-Poveda, 2008). The red dotted lines depict the hypothetical location of cases with pure IHC loss (i.e., hearing loss with zero HLOHC). The shaded areas indicate mixed OHC/IHC losses. Results for the current listeners are depicted as blue symbols; results for NH listeners and for listeners with mild-to-moderate loss from our earlier study (Lopez-Poveda and Johannesen, 2012) are depicted as black and red symbols, respectively.

#### *From linear I/O curves*

Linear I/O curves were assumed to be indicative of total gain loss. Hence, HLOHC for these cases was set equal to the average cochlear gain for the NH reference group. The latter was estimated using Equation (2), and was equal to 35.2, 43.5, 42.7, 42.7, and 42.7 dB at 0.5, 1, 2, 4, and 6 kHz, respectively. HLIHC was then obtained using Equation (1).

Results for these cases are shown as blue crosses in the top panels of **Figure 7**. Clearly, the great majority of these cases were in the shaded area, hence were indicative of mixed OHC/IHC loss. In other words, for most of these cases, hearing loss was greater than the maximum possible mechanical cochlear gain loss (the gain loss of NH listeners), hence HLIHC > 0 dB.

#### *From compressive I/O curves without a compression threshold*

As explained above, I/O curves that were either compressive straight lines (with slopes < 0.75 dB/dB; **Figure 5F**), or that showed an RLT but not a CT (as in **Figure 5H**) were assumed indicative of IHC dysfunction. This is because any gain reduction will only affect the low-level linear portion of the I/O curve and IHC dysfunction may increase the BM response at detection threshold above the knee-point of the I/O curve (Figure 1B of Lopez-Poveda and Johannesen, 2012). Lopez-Poveda and Johannesen (2012) argued that for these cases Equation (1) does not hold, and that it is reasonable to assume that the audiometric loss can be fully explained in terms of inefficient IHC transduction combined with residual compression (see their Figure 1D). Therefore, we assumed that for these cases HLTOTAL was equal to HLIHC.

This is not to say, however, that cochlear gain loss did not occur in these cases; we are saying that if cochlear gain loss did occur, it is unlikely that it contributed to the audiometric loss (see Figure 1D in Lopez-Poveda and Johannesen, 2012). Indeed, an estimate of (residual) gain was obtained as illustrated in **Figure 5F** or **Figure 5H** using Equation (2). Note that this gain estimate was almost certainly less than the actual residual gain because, due to IHC dysfunction, the measured compressive segment of the I/O curve was only a portion of the true compressive segment. Cochlear gain loss (HLOHC) was estimated by subtracting the obtained gain estimate from the reference gain for NH listeners (see the previous section). The bottom panels of **Figure 7** illustrate residual gain for these cases. The left-pointing arrows indicate that the actual HLOHC was probably *smaller* than estimated, hence that symbols should be to the left of their position in the figure, and closer to the red-dotted line indicative of pure IHC loss. The figure reveals two important results: first, that most of these cases are indicative of mixed IHC and OHC dysfunction (indeed, mixed dysfunction appears more frequent for these cases than for I/O curves with a CT; compare the placement of blue triangles and circles in the bottom and top panels of **Figure 7**); and second, that for any given absolute threshold (or hearing loss), there were comparatively more cases with little gain loss (i.e., indicative of IHC dysfunction) at lower than at higher frequencies. In other words, low-frequency hearing loss is more likely related to IHC dysfunction than to cochlear gain loss.
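The rules for linear (Type 2) and compressive, CT-less (Type 3) I/O curves described in this and the preceding subsection can be sketched as follows. The NH reference gains are the values quoted in the text; the function names and example inputs are hypothetical.

```python
# Mean cochlear gain of the NH reference group (dB), keyed by frequency in kHz.
NH_GAIN_DB = {0.5: 35.2, 1: 43.5, 2: 42.7, 4: 42.7, 6: 42.7}


def type2_components(freq_khz, hl_total):
    """Linear I/O curve: total gain loss, so HLOHC equals the NH gain."""
    hl_ohc = NH_GAIN_DB[freq_khz]
    hl_ihc = hl_total - hl_ohc  # Equation (1)
    return hl_ohc, hl_ihc


def type3_components(freq_khz, hl_total, residual_gain_estimate):
    """Compressive I/O curve without a CT.

    The audiometric loss is attributed entirely to IHC dysfunction
    (HLIHC = HLTOTAL). The returned HLOHC is an upper bound, because the
    residual gain estimate is likely smaller than the true residual gain.
    """
    hl_ohc_upper_bound = NH_GAIN_DB[freq_khz] - residual_gain_estimate
    return hl_ohc_upper_bound, hl_total


# Hypothetical listeners.
ohc_lin, ihc_lin = type2_components(4, hl_total=60.0)
ohc_max, ihc_t3 = type3_components(0.5, hl_total=50.0, residual_gain_estimate=20.0)
```

In the first made-up case, a 60-dB loss at 4 kHz with a linear I/O curve gives an HLOHC of 42.7 dB and an HLIHC of about 17.3 dB. In the second, HLOHC is at most about 15.2 dB and HLIHC equals the full 50 dB, so the two components are not expected to sum to HLTOTAL.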

### **ACROSS LISTENER VARIABILITY OF HLOHC**

**Figure 6** suggests that HLOHC accounted on average for 61–70% of HLTOTAL but it also suggests that there was large across-listener variability. **Figure 8** illustrates this variability more clearly by showing the distribution of HLOHC for three different ranges of HLTOTAL: 15–35, 35–55, and 55–80 dB. Results are based on Type 1 and Type 2 I/O curves. At 2 kHz and above, HLOHC tended to increase with increasing HLTOTAL, while at 0.5 and 1 kHz it decreased slightly or remained approximately constant. The main result from this figure is, however, that for a given frequency and hearing-loss range, HLOHC was broadly distributed across cases. For example, based on data for 25 subjects, at 4 kHz and for a hearing-loss range of 35–55 dB, HLOHC accounted for between 55 and 100% of HLTOTAL. [Note that the figure suggests that in a few cases with small losses, HLOHC accounted for more than 100% of HLTOTAL. These were cases whose CTs were lower than the mean CT for the reference, NH group (i.e., cases below the diagonal line in **Figure 7**)].

#### **PREVALENCE OF IHC AND OHC DYSFUNCTION**

The previous analyses have focused mostly on the relative contribution of HLOHC and HLIHC to HLTOTAL. The data may be alternatively analyzed with a focus on the type of hearing loss; that is, on how many data points fall in each of several regions depicted in **Figure 7**. To this end, Type 1 (CT present) and Type 2 (linear) I/O curves were split into two subcategories: "Pure OHC dysfunction," when the audiometric loss could be entirely explained as loss of cochlear gain, that is, when HLTOTAL ∼ HLOHC (points within the diagonal range in **Figure 7**); and "Mixed OHC/IHC dysfunction," when the audiometric loss exceeded the cochlear gain loss (i.e., when HLIHC > 0; points in the shaded area of **Figure 7**). For the reasons explained above, for Type 3 I/O curves, the absence of a CT was taken as indicative that the audiometric loss could be explained entirely in terms of IHC dysfunction (HLTOTAL ∼ HLIHC). As shown in the bottom panels of **Figure 7**, however, cochlear gain loss of uncertain extent still occurred in a majority of these cases even though it probably did not contribute to the audiometric loss. Therefore, Type 3 I/O curves were also regarded as indicative of mixed OHC/IHC dysfunction.

The top part of **Table 3** gives the number of cases in each of these categories, and the bottom part of **Table 3** the corresponding percentages. Note that the number of cases of Type 3 I/O curves decreased with increasing frequency, suggesting that IHC dysfunction contributed more to audiometric loss at low frequencies than did cochlear gain loss. Note also that the percentage of cases of pure OHC loss decreased with increasing frequency, while the percentage of cases of mixed loss increased with increasing frequency, and that the two percentages add up to 100%. Mixed OHC/IHC loss was significantly more frequent than pure OHC loss at all frequencies. The bottom part of **Table 3** gives one additional percentage: "Total gain loss" refers to the total percentage of linear I/O curves, whether indicative of pure OHC dysfunction or mixed OHC/IHC dysfunction. The percentage of these cases increased with increasing frequency. χ<sup>2</sup> tests were used to test if the above described frequency trends were statistically significant. The null hypothesis was that for each I/O curve type, the frequency distribution followed the distribution of the total number of cases (i.e., the distribution in the line labeled as "Total" in the table).
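The null hypothesis stated above can be made concrete with a short sketch of the test statistic. The counts below are hypothetical and are not the values of Table 3; the function name is also an assumption. Only the statistic and its degrees of freedom are computed, since converting them to a *p*-value requires the χ<sup>2</sup> distribution, which is not part of the Python standard library.

```python
def chi_squared_vs_totals(observed, totals):
    """Chi-squared statistic for one I/O curve type across frequencies.

    observed: number of cases of that type at each frequency
    totals:   total number of cases at each frequency (the "Total" row)
    Expected counts follow the distribution of the totals, scaled to the
    number of observed cases. Returns (statistic, degrees of freedom).
    """
    scale = sum(observed) / sum(totals)
    expected = [t * scale for t in totals]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


# Hypothetical counts at 0.5, 1, 2, 4, and 6 kHz.
type3_counts = [20, 15, 8, 5, 2]
total_counts = [60, 60, 60, 60, 60]
stat, dof = chi_squared_vs_totals(type3_counts, total_counts)
```

Here the expected count is 10 at every frequency, so the statistic is (100 + 25 + 4 + 25 + 64) / 10 = 21.8 with 4 degrees of freedom. Small expected counts make the test unreliable, which is presumably the situation flagged by the asterisk in the table note.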

## **VERIFICATION AND EXTENSION OF MODEL ASSUMPTIONS**

The present analysis was based on the hearing loss model of Plack et al. (2004) whereby OHC loss would reduce cochlear gain without significantly altering the amount of compression; that is, OHC loss would shift the low-level linear segment of the I/O curve without altering the slope of the compressive segment (Figure 7D of Plack et al., 2004). Their model was based on their observed lack of correlation between the compression exponent and absolute threshold accompanied by a strong negative correlation between gain and absolute threshold (their Figure 6). Their data were restricted to mild-to-moderate hearing losses and to a probe frequency of 4 kHz. Hence, one might object to the present analyses on the grounds that their model has not yet been corroborated for larger hearing losses or for the wider range of test frequencies used here. Our data, however, do support their model. **Figure 9** shows that the CT, a parameter of the I/O curve directly related to cochlear gain, is positively and highly significantly

#### **Table 3 | Number of cases per I/O curve type and frequency (top) and percentage of cases per loss type (bottom).**


*p indicates significance levels for chi-squared tests. The asterisk indicates that the statistical test was not reliable because the number of cases was insufficient.*

correlated with absolute threshold (**Figure 9**, bottom) while the average slope over the compressive segment of the I/O curve (i.e., over the input level range from the CT to the RLT) is uncorrelated with absolute threshold (**Figure 9**, top). This supports the results of Plack et al. (2004) at 4 kHz, extends their model to greater hearing losses and to a wider frequency range from 0.5 to 6 kHz, and supports the validity of our approach.

## **DISCUSSION**

The aim of the current study was threefold: (1) to assess to what extent the audiometric loss is due to a reduction in cochlear gain (or OHC dysfunction), and/or to an additional component, referred to here as IHC dysfunction; (2) to investigate the frequency distribution of the two potential contributions; and (3) to investigate the degree of variability of the two contributions across listeners. Our approach was based on the analysis of behaviorally inferred cochlear I/O curves, as we proposed elsewhere (Lopez-Poveda and Johannesen, 2012).

Regarding the first and second aims, results for Type 1 I/O curves (i.e., for curves with a CT) suggest that on average IHC and OHC dysfunction contribute 30–40 and 60–70% to the audiometric loss, respectively, and that these percentages hold approximately constant across the frequency range from 500 Hz to 6 kHz (**Figure 6**). Regarding the third aim, results suggest that the proportion of the audiometric loss attributed to cochlear gain loss can vary widely across listeners with similar hearing losses, without a clear frequency pattern (**Figure 8**). Cases for which audiometric thresholds could be explained exclusively in terms of IHC dysfunction (Type 3 I/O curves) or in terms of cochlear gain loss (points in the diagonal region of **Figure 7**) were comparatively more numerous at low than at high frequencies (**Table 3**). The large majority of cases, however, were consistent with mixed OHC/IHC dysfunction, even though in some of these cases (Type 3 I/O curves) cochlear gain loss was unlikely to contribute to the audiometric loss (**Table 3**). Total cochlear gain loss (i.e., linear I/O curves) occurred more frequently at high frequencies than at low frequencies (**Table 3**).

### **POTENTIAL METHODOLOGICAL SOURCES OF BIAS**

#### *On the accuracy of the TMC method for estimating I/O curves*

In inferring I/O curves from TMCs, the assumption has been made that the post-mechanical rate of recovery from forward masking is independent of masker frequency and level (Nelson et al., 2001). Evidence exists, however, that for NH listeners the recovery rate is twice as fast for masker levels below about 83 dB SPL as for higher masker levels (Wojtczak and Oxenham, 2009). This level effect, however, does not occur for HI listeners (Wojtczak and Oxenham, 2010). There is also evidence that the recovery rate might be slower at low (≤1 kHz) than at high probe frequencies (Stainsby and Moore, 2006), although this evidence is controversial (Lopez-Poveda and Alves-Pinto, 2008). Lopez-Poveda and Johannesen (2012) argued that if these assumptions did not hold, Type 1 I/O curves (i.e., curves with a CT) would lead to larger HLIHC and smaller HLOHC. In the present context, this means that if the assumptions were not valid, the contribution of HLIHC to the total hearing loss might be higher than reported in **Figure 6**.

#### *Ambiguity of linear I/O curves*

Linear I/O curves have been assumed to be indicative of total cochlear gain loss. This assumption may sometimes be inaccurate. Assuming that cochlear I/O curves become linear at high input levels (something still controversial; Robles and Ruggero, 2001, pp. 1308–1309), for cases with substantial IHC dysfunction, the mechanical cochlear response at the probe detection threshold might be so much higher with respect to NH that only the high-level linear segment of the I/O curve can be measured (e.g., Figure 1D of Lopez-Poveda and Johannesen, 2012). Hence, linear I/O curves at high input levels may indicate two different things: total cochlear gain loss or substantial IHC dysfunction. It is not possible to distinguish between these two cases. Therefore, some of the cases presently classified as "total cochlear gain loss" (or total OHC dysfunction) may actually reflect substantial IHC dysfunction.

An arbitrary slope criterion of 0.75 dB/dB has been used to separate Type 2 from Type 3 I/O curves. A sensitivity analysis was done to test to what extent results depended on the slope criterion value and we found that only five out of the 325 I/O curves would change type if the slope criterion were varied from 0.6 to 1 dB/dB. Therefore, I/O curve classification seems rather insensitive to slope criterion within these limits.
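
The sensitivity analysis described above can be sketched as follows. The sketch assumes that, among curves without a CT, a slope at or above the criterion is labeled Type 2 (linear) and a slope below it is labeled Type 3. That direction is inferred from the description of Type 2 curves as linear. The slope values and the grid of criterion values are hypothetical.

```python
def curve_type_without_ct(slope, criterion=0.75):
    """Return 2 (linear) if slope >= criterion (in dB/dB), else 3."""
    return 2 if slope >= criterion else 3

def count_type_changes(slopes, criteria):
    """Count curves whose type differs for at least two criterion values."""
    changed = 0
    for slope in slopes:
        labels = {curve_type_without_ct(slope, c) for c in criteria}
        if len(labels) > 1:
            changed += 1
    return changed

# Hypothetical slopes (dB/dB) of I/O curves without a CT; not study data.
slopes = [0.20, 0.45, 0.65, 0.95, 1.05, 1.20]
# Criterion values spanning the range explored in the text (0.6 to 1 dB/dB).
criteria = [0.6, 0.7, 0.8, 0.9, 1.0]
n_changed = count_type_changes(slopes, criteria)
```

Only curves whose slope falls between the lowest and highest criterion values can change type, which is why a classification with few slopes in that interval is insensitive to the exact criterion.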

#### *The impact of using a mean linear-reference TMC for some cases*

A linear reference TMC could not be measured for eight participants (four of them from our previous study) because their hearing losses at the linear reference probe frequencies (**Table 1**) were so high that masker levels would have exceeded the maximum output level of our system. I/O curves for these cases were inferred using a mean linear reference TMC from all other subjects (see Methods). It is unlikely that this methodological difference affected the main results. First, CTs inferred using the mean linear reference TMC were within 5 dB of the corresponding estimates inferred using the variant TMC method of Lopez-Poveda and Alves-Pinto (2008), a method that does not require a linear reference TMC (results not shown). Second, the number of I/O curves inferred using a mean linear reference TMC was only a very small fraction of the total number of I/O curves used in the present study.

#### *Cochlear gain for normal hearing listeners and total OHC loss*

Linear I/O curves were regarded as indicative of total cochlear gain loss (**Figures 4**, **5**). For these cases, HLOHC was set equal to the mean cochlear gain of the reference, NH group. If the latter were inaccurate, this could have affected the present estimates of HLOHC (i.e., the number and position of blue crosses in **Figure 7**). Gain for the NH group was calculated as described in section I/O Curve Analyses and Taxonomy and one might argue that this method underestimated gain for those NH I/O curves with absent CT or RLT; that is, for I/O curves that were still compressive at the lowest or the highest input levels in the I/O curve. The present NH gain values at high frequencies, however, compare well with previously reported values inferred using different psychoacoustical methods and with values inferred from direct BM recordings. For example, at 4 kHz, mean gain was 42.7 dB, hence comparable to the value (43.5 dB) reported by Plack et al. (2004). Plack et al. estimated gain as the difference between the masker levels of the linear-reference and on-frequency TMCs for the shortest gap, while gain was defined here as the sensitivity difference for low and high input levels (see Ruggero et al., 1997 for a discussion of different gain definitions). Gain for the present NH group would have been 48.9 dB had it been calculated using the definition of Plack et al. (2004), hence slightly higher than the value of Plack et al. The present NH gain also compares well with the value (35 dB at 6 kHz) that would be obtained from the I/O curves in Figure 2 of Oxenham and Plack (1997) that were inferred using a different psychoacoustical method known as growth of forward masking. Also, the present NH gain values at 4 kHz are within the value range suggested by direct basal BM recordings (range = 19–62 dB; median = 40 dB; mean = 38 dB; Table 1 of Robles and Ruggero, 2001). Altogether, this suggests that the present high-frequency NH gain values were reasonable.

Direct BM recordings in animals suggest that cochlear gain is lower for apical than for basal BM regions, although it is possible that the difference is partly due to damage of apical cochlear mechanics during experimental recordings. For example, the change of chinchilla BM sensitivity at the characteristic frequency between low and high input levels is 10–20 dB at 500–800 Hz compared to 50 dB at 8–9 kHz (Tables 2, 3 in Robles and Ruggero, 2001). Previous psychoacoustical reports in humans using other methods and assumptions also suggest less gain at low frequencies but do not provide quantitative estimates (Plack et al., 2008). Gain estimates for the present NH group were 35.2 dB at 500 Hz and 42.7 dB at 4 kHz. The frequency trend in the present results is thus qualitatively consistent with direct BM observations, and quantitative differences might be due to differences in cochlear tonotopic mappings across species. If, however, the post-mechanical rate of recovery from forward masking were after all faster at lower frequencies (see previous sections), then cochlear gain would be smaller than reported here and the pattern of results would become more consistent with the animal data.

In summary, the NH gain values used here to quantify HLOHC for cases of total OHC loss (linear I/O curves) seem reasonable at high frequencies but are less certain at low frequencies.

Incidentally, it is noteworthy that the present NH gain increased from 35.2 dB at 500 Hz to 43.5 dB at 1 kHz (unpaired, equal variance, *t*-test, *p* = 0.014) and then remained constant at higher frequencies (42.7 dB at 4 kHz). This pattern differed slightly from that reported by Johannesen and Lopez-Poveda (2008), from which some of the present NH data were taken. Indeed, in that study, gain increased gradually with increasing frequency from 37 dB at 500 Hz to 55 dB at 4 kHz (see their Figure 11A). This discrepancy is almost certainly due to methodological differences. First, the two studies used different definitions of gain; Johannesen and Lopez-Poveda (2008) calculated gain as the difference between the RLT and CT. Second, the present NH data combined data from the 10 participants that took part in the study of Johannesen and Lopez-Poveda (2008) plus data for five more NH participants from Lopez-Poveda and Johannesen (2009); the latter contributed data particularly at 0.5 and 1 kHz. Third, Johannesen and Lopez-Poveda (2008) fitted their I/O curves with a third-order polynomial, which "forces" an RLT when a CT is present because the slopes of a third-order polynomial are identical below and above its inflection point. Indeed, fewer of the I/O curves from the study of Johannesen and Lopez-Poveda (2008) retained an RLT when they were re-analyzed using the present fitting approach.

#### *The influence of conductive hearing loss on the results*

Participants were controlled for conductive hearing loss. Nonetheless, their air-bone gaps could have differed by ≤15 dB at one frequency and/or ≤10 dB at any other frequency (see Methods). Small conductive losses might have increased probe absolute threshold and hence TMC masker levels by an amount equal to the conductive loss at the corresponding probe frequencies. The influence on the inferred I/O curve would be an upward vertical shift of the I/O curve equal to the conductive loss at the frequency of the linear reference probe and a rightward horizontal shift equal to the conductive loss at the frequency of the on-frequency masker. The CT would be affected only by the horizontal shift. Therefore, conductive loss at the particular frequency might lead to an overestimate of HLOHC at that frequency. Pearson's correlation between HLOHC and air-bone gap was significant only at 1 kHz and indicated decreasing HLOHC for increasing air-bone gap. The direction of the effect was therefore opposite to the presumed effect of conductive hearing loss on HLOHC and hence we concluded that conductive loss was unlikely to affect mean HLOHC estimates in **Figure 6**.
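
The geometric effect described above can be made concrete with a short sketch. It treats an inferred I/O curve as a list of (input level, output level) points in dB and applies the two shifts stated in the text. The function name, the argument names and the numerical values are hypothetical.

```python
def apply_conductive_loss(io_curve, ct_db, loss_on_freq_db, loss_ref_freq_db):
    """Shift an inferred I/O curve and its CT as described in the text.

    io_curve: list of (input_level_db, output_level_db) points.
    ct_db: compression threshold (an input level) without conductive loss.
    loss_on_freq_db: conductive loss at the on-frequency masker frequency.
    loss_ref_freq_db: conductive loss at the linear-reference probe frequency.
    """
    # Rightward shift of the input axis, upward shift of the output axis.
    shifted = [(x + loss_on_freq_db, y + loss_ref_freq_db) for x, y in io_curve]
    # The CT is an input level, so only the horizontal shift affects it.
    shifted_ct = ct_db + loss_on_freq_db
    return shifted, shifted_ct

# Hypothetical three-point I/O curve with a CT at 50 dB; not study data.
curve = [(30.0, 50.0), (50.0, 70.0), (80.0, 76.0)]
shifted_curve, shifted_ct = apply_conductive_loss(curve, 50.0, 10.0, 5.0)
```

In this example the CT moves from 50 to 60 dB. If HLOHC is derived from the elevation of the CT relative to the NH reference, as the text implies, that 10-dB shift would be read as additional cochlear gain loss.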

#### *The potential influence of dead regions on the results*

A "dead region" is "a region in the cochlea where the IHCs and/or neurons are functioning so poorly that a tone which produces peak BM vibration in that region is detected via an adjacent region where the IHCs and/or neurons are functioning more efficiently" (p. 272 in Moore, 2007). In principle, dead regions could affect TMC measures as the probe presented in a dead region would be detected at a cochlear place removed from the probe place: e.g., at a place where the on-frequency masker might be subject to a compression regime different from compression at the normal probe place. For example, if the 4-kHz cochlear region was dead, a 4-kHz probe might be detected at the 2-kHz cochlear region where a 1.6-kHz (off-frequency) masker, which is typically regarded as a linear-reference condition, might be actually subject to significant compression.

Dead regions occur almost always for hearing losses above ∼60 dB HL (Table 1 in Vinay and Moore, 2007) and the present listeners were roughly selected to have hearing losses <80 dB HL to be able to measure TMCs for a majority of test frequencies (**Figure 1**). Despite this, TMCs could not be measured for the higher losses. Of the 325 measured I/O curves, the number that may have been affected by dead regions can be roughly estimated from the data in Table 1 of Vinay and Moore (2007) (note that their data extend only to 4 kHz and we have assumed that the incidence of dead regions is identical at 4 and 6 kHz). Our analysis revealed that the expected incidence of dead regions was one, two and two at 2, 4, and 6 kHz, respectively. These numbers are so low that they are unlikely to have biased the reported HLOHC and HLIHC.

### **COMPARISON WITH EARLIER STUDIES**

Based on our analysis of Type 1 I/O curves, we have shown that HLOHC is 60–70% of HLTOTAL across the frequency range from 0.5 to 6 kHz. This number is roughly consistent with that reported by earlier studies for more restricted frequency ranges, mostly at 4 kHz (Plack et al., 2004; Lopez-Poveda and Johannesen, 2012). It is, however, slightly lower than the 80–90% value reported elsewhere based on loudness models (Moore and Glasberg, 1997). Jürgens et al. (2011) showed that the two approaches (loudness model and TMCs) should give similar results. Therefore, the reason for this difference is uncertain.

We have also shown that even though the percentage of cases for which HLIHC accounts entirely for HLTOTAL (the percentage of Type 3 I/O curves) or the percentage of cases for which HLOHC ∼ HLTOTAL (the percentage of pure OHC dysfunction) are small, they are both larger for frequencies ≤1 kHz and decrease with increasing frequency (**Table 3**). To the best of the authors' knowledge, these trends have not been reported explicitly before, possibly due to the use of small sample sizes in earlier studies, but are not without precedent. For example, Moore and Glasberg (1997) used a model of loudness growth to estimate HLIHC and found that it increased with decreasing frequencies for three listeners. Likewise, Jepsen and Dau (2011) reported greater HLIHC at lower frequencies for a few subjects, although their average results were still consistent with the common notion that the most typical functional deficit is the loss of mechanical gain in the cochlear base.

An important distinction between the present and earlier analyses is that here, HLIHC and HLOHC were not always regarded as mutually exclusive, additive contributions to HLTOTAL. Instead, the possibility has been contemplated that Equation (1) does not hold for cases where IHC dysfunction is so significant that it makes it impossible to measure a CT. In these cases, it was assumed that HLTOTAL may be explained fully in terms of HLIHC even though concomitant cochlear gain loss probably did occur (**Figure 7**, bottom).
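
Written as formulas, and assuming that Equation (1) is the additive decomposition in dB implied by the surrounding text, the two regimes might be summarized as follows. The second line only paraphrases the assumption stated above for cases without a measurable CT.

$$
\begin{aligned}
\mathrm{HL}_{\mathrm{TOTAL}} &= \mathrm{HL}_{\mathrm{OHC}} + \mathrm{HL}_{\mathrm{IHC}} && \text{(CT measurable; Equation 1)} \\
\mathrm{HL}_{\mathrm{TOTAL}} &\approx \mathrm{HL}_{\mathrm{IHC}}, \quad \mathrm{HL}_{\mathrm{OHC}} > 0 \text{ of uncertain extent} && \text{(no measurable CT; Type 3)}
\end{aligned}
$$

As a purely hypothetical illustration of the first line, a listener with a Type 1 I/O curve and HLTOTAL = 50 dB whose HLOHC was 60–70% of HLTOTAL would have HLOHC = 30–35 dB and HLIHC = 15–20 dB.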

### **STRUCTURE-FUNCTION RELATIONSHIPS**

Great care must be exercised in establishing a direct link between the behavioral deficits seen here (audiometric loss and cochlear gain loss) and hair cell pathophysiology in humans. Discussing potential relationships might nonetheless be useful.

We have shown that for a large percentage of cases (Type 1 I/O curves), 60–70% of HLTOTAL is due to HLOHC and 30–40% is due to HLIHC, and that these percentages are roughly constant across frequencies (**Figure 6**). It would probably be wrong to conclude that this implies identical physical damage to OHCs and IHCs along the cochlear length. First, when physical hair cell damage occurs (e.g., after noise exposure), it is typically greater in the cochlear base than in the apex (Møller, 2000). Second, the median age of the present participants was 61 years, hence for most of them the cause of hearing loss was probably presbycusis. Presbycusis is associated with a reduction of the endocochlear potential that causes high-frequency hearing loss (Schmiedt et al., 2002). This high-frequency loss is almost certainly due to concomitant, combined IHC and OHC dysfunction. A given reduction of the endocochlear potential causes greater loss of cochlear gain in the cochlear base than in the apex (Figure 8 of Saremi and Stenfelt, 2013), and a reduced response in the IHCs (Meddis et al., 2010; Panda et al., 2014). The present results for Type 1 I/O curves are consistent with concomitant IHC and OHC dysfunction characteristic of metabolic presbycusis and less so with the alternative and perhaps prevailing view that high-frequency loss is due to greater anatomical loss or damage of basal OHCs.

We have also shown, however, that the percentage of Type 2, linear I/O curves increases with increasing test frequency (**Figure 7**, top, and **Table 3**). If metabolic presbycusis linearized cochlear responses (Saremi and Stenfelt, 2013), this might indicate that metabolic presbycusis reduces the endocochlear potential more in the cochlear base than in the apex, which seems unlikely. A more parsimonious explanation for the higher percentage of linear I/O curves at high frequencies would be that they are actually due to severe physical OHC loss or damage. The latter explanation would be consistent with the prevailing view that physical OHC damage is greater in the cochlear base than in the apex (Møller, 2000).

Lastly, we have also shown that the percentage of Type 3 I/O curves is greatest for test frequencies ≤1 kHz and decreases with increasing frequency. This trend of more frequent IHC dysfunction at apical sites remains intriguing. A few studies have reported similar trends. For example, apical IHCs were found to be more labile than basal IHCs in guinea pigs treated with polypeptide antibiotics (Kohonen, 1965). Similarly, after administration of tobramycin, IHCs were found to be normal in the base but completely damaged in the apex whereas the OHCs were found to be normal in the apex and damaged in the base (Aran et al., 1982). Therefore, some Type 3 I/O curves might be indicative of antibiotic-induced hearing loss.

Unfortunately, confirmation of these conjectures was not possible due to the lack of accurate information regarding the etiology of hearing loss for the present participants.

## **CONCLUSIONS**

With regard to the contribution of IHC and OHC dysfunction to the audiometric loss, the main conclusions are:


With regard to the incidence of dysfunction types, the main conclusions are:


Overall, the present results undermine the common view that high-frequency loss is typically due to greater physical damage of basal OHCs, and suggest that in a large percentage of cases, it is due to a common mechanism that concomitantly affects IHCs and OHCs, possibly reduced endocochlear potential. They further suggest that IHC processes may be more labile in the apex than in the base and/or that IHC dysfunction may have a greater impact on auditory threshold than cochlear gain loss at low frequencies.

## **ACKNOWLEDGMENTS**

We thank Bill Woods and Sridhar Kalluri for insightful discussions, Almudena Eustaquio-Martin for technical support, and the staff of the ENT Service of Salamanca University Hospital and "La Alamedilla" Clinic (Salamanca, Spain) for their invaluable help with participant recruitment. Work supported by Starkey Laboratories (USA); Junta de Castilla y León; and the Spanish Ministry of Economy and Competitiveness (ref. BFU2012-39544-C02).

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 May 2014; paper pending published: 27 May 2014; accepted: 02 July 2014; published online: 23 July 2014.*

*Citation: Johannesen PT, Pérez-González P and Lopez-Poveda EA (2014) Across-frequency behavioral estimates of the contribution of inner and outer hair cell dysfunction to individualized audiometric loss. Front. Neurosci. 8:214. doi: 10.3389/fnins.2014.00214*

*This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.*

*Copyright © 2014 Johannesen, Pérez-González and Lopez-Poveda. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*