Edited by: Guillaume Andeol, Institut de Recherche Biomédicale des Armées, France
Reviewed by: Peter Cariani, Harvard Medical School, USA; Kenneth Stuart Henry, University of Rochester, USA
*Correspondence: Ana Alves-Pinto, Research Unit of the Buhl–Strohmaier Foundation for Pediatric Neuro-Orthopaedics and Cerebral Palsy, Klinikum rechts der Isar, Technische Universität München, Ismaninger Strasse 22, 81675 Munich, Germany e-mail:
This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience.
†Present address: Ana Alves-Pinto, Research Unit of the Buhl–Strohmaier Foundation for Pediatric Neuro-Orthopaedics and Cerebral Palsy, Klinikum rechts der Isar, Technische Universität München, Munich, Germany
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
The interaction of sound waves with the human pinna introduces high-frequency notches (5–10 kHz) in the stimulus spectrum that are thought to be useful for vertical sound localization. A common view is that these notches are encoded as rate profiles in the auditory nerve (AN). Here, we review previously published psychoacoustical evidence in humans and computer-model simulations of inner hair cell responses to noises with and without high-frequency spectral notches that dispute this view. We also present new recordings from guinea pig AN and “ideal observer” analyses of these recordings that suggest that discrimination between noises with and without high-frequency spectral notches is probably based on the information carried in the temporal pattern of AN discharges. The exact nature of the neural code involved nevertheless remains uncertain: computer model simulations suggest that high-frequency spectral notches are encoded in spike timing patterns that may be operant in the 4–7 kHz frequency regime, while “ideal observer” analysis of experimental neural responses suggests that an effective cue for high-frequency spectral discrimination may be based on sampling rates of spike arrivals of AN fibers using non-overlapping time binwidths of between 4 and 9 ms. Neural responses show that sensitivity to high-frequency notches is greater for fibers with low and medium spontaneous rates than for fibers with high spontaneous rates. Based on this evidence, we conjecture that inter-subject variability in high-frequency spectral notch detection and, consequently, in vertical sound localization may partly reflect individual differences in the available number of functional medium- and low-spontaneous-rate fibers.
The ridges and cavities of the outer ear alter the spectra of sounds that enter the ear canal, mainly (but not only) attenuating energy at high frequencies, such that notches are introduced into the spectra (Shaw and Teranishi,
The spectrum of a sound may be encoded in the AN activity in at least two ways: in the average discharge rate across fibers tuned to different frequencies (a
The question of how high frequency spectral notches are encoded in the AN can be approached by simply testing the hypothesis that they are encoded as AN rate profiles. If this were the case, then the internal, AN representation, and consequently the perception, of high-frequency spectral notches should deteriorate at high sound levels due, firstly, to the broadening of the fibers' frequency response at high levels (Rose et al.,
We have previously tested the hypothesis that the internal representation of high-frequency spectral notches deteriorates with increasing sound level in a series of psychoacoustical and computational modeling studies. The results of these studies, reviewed here in the sections ‘Human psychophysics’ and ‘Computational simulation of inner hair cell receptor potentials evoked by flat-spectrum and notch noises’ respectively, did not support the rate-profile code and pointed instead to alternative codes. The section ‘Analysis of AN responses to flat-spectrum and notch noises’ presents new data and analyses pertaining to AN activity elicited by stimuli identical to those used in our previous studies. This new set of physiological data also undermines the rate-profile code and suggests instead that the information required for discriminating between noises with different high-frequency spectra is carried in a temporal code. The combined evidence from this series of related psychoacoustical, computational modeling, and physiological studies is discussed in the last section in terms of its implications for spatial hearing and for the across-listener variability in auditory-based spatial skills.
Localization of impulsive sounds in the medial sagittal plane by human listeners deteriorates with increasing sound level up to about 70 dB SPL (Hartmann and Rakerd,
Surprisingly, however, threshold notch depth varied
Hence, the initial hypothesis of a monotonic increase in notch detection thresholds with increasing level was not supported by the experimental results, which rather suggested that the notch must be better represented internally at levels above and below around 70–80 dB SPL than at these mid-levels. This result prompted further research aimed at investigating the quality of the internal representation of the spectra of flat-spectrum and notch noises at increasing sound levels using diverse approaches: first, by comparing psychoacoustical masking patterns evoked by the two noises; second, by comparing computer simulations of the peripheral auditory system response to the two noises; and lastly, by analyses of direct AN fiber responses to the two noises.
The quality of the rate-profile representation of flat-spectrum and notch noises was assessed psychoacoustically by measuring the forward-masking patterns of the two noises (Alves-Pinto and Lopez-Poveda,
The forward-masking patterns of the flat-spectrum and notch noises were obtained by measuring the masked detection threshold of pure tones with frequencies covering the spectral region of the notch. They were measured for low (50 dB SPL), medium (70 and 80 dB SPL), and high (90 dB SPL) overall masker levels, to allow comparison with the non-monotonic effect of level in the main discrimination task (Figure
The spectral notch was clearly visible in the difference masking patterns at 50 dB SPL, less obvious at 70 and 80 dB SPL, and barely visible at 90 dB SPL (Figure
The quality of the internal AN representation of high-frequency spectral notches must be limited by the signal processing that takes place before the AN. The inner hair cell (IHC) receptor potential is the driving potential of AN fibers' activity and therefore sets a limit on the quality of the representation of spectral information in the AN. It is possible, for example, that the excitation pattern representation of the stimulus spectrum degrades at high sound levels because saturation already occurs at the level of the IHC receptor potential (e.g., Russell and Sellick,
The model included realistic cochlear mechanical level-dependent gain and tuning and a realistic IHC model (see Lopez-Poveda et al.,
The results of the simulations showed that the quality of the IHC excitation pattern representation of the spectral notch (blue line in Figure
If psychoacoustical discrimination between the flat-spectrum and notch noise were determined by differences between the IHC representations of the flat-spectrum and notch noise spectra, then the simulations suggested that discrimination based on the excitation pattern should be increasingly more difficult with increasing level (Figure
What is the origin of the non-monotonic effect of level in the population receptor potential FFT? This issue was addressed by Lopez-Poveda et al. (
Even though the model may not perfectly simulate the human IHC response (Lopez-Poveda et al.,
Second, the similarity between the effects of intensity on the difference IHC receptor potential FFT (Figure
Inspired by this, further insight about the neuronal code responsible for the internal representation of high-frequency spectral notches and for the main psychoacoustical discrimination results was sought by directly measuring the activity of AN fibers in response to the flat-spectrum and notch noises used in the psychoacoustical and simulation experiments. These new data are described in the following section.
The quality of the internal representation of the high-frequency spectral notch at the level of the AN was assessed physiologically by directly recording the activity of guinea-pig AN fibers in response to stimuli like those used in the main psychoacoustical study. Following the evidence from the psychoacoustical and simulation studies (reviewed above), analyses of neuronal activity included an evaluation of the representation of the spectral notch not only in the average rate profile but also in the temporal pattern of AN fiber discharges. For the latter, we could not apply the FFT analysis that we had used to analyze IHC receptor potential simulations because of (1) the discrete nature of the AN spike trains, (2) the short duration of the recording interval (110 ms), and (3) the limited number of recorded AN units. Instead, we used an “ideal observer” analysis (see below).
Recordings from AN fibers of anaesthetized guinea pigs were made using the methods described in Palmer et al. (
AN fibers were stimulated with bursts of broadband (0.02–16 kHz) noise similar to those used in the psychoacoustical and simulation experiments. Two types of noises were used: one had a flat spectrum; the other was similar except for a frequency region centered at 7 kHz where it had a rectangular spectral notch (Figure
The noise bursts were generated as described in the related behavioral study (Alves-Pinto and Lopez-Poveda,
In this analysis a subpopulation of 106 fibers, for which at least 5 and typically 10 complete spike trains were recorded for all stimulus conditions tested, was used. The mean discharge rate was calculated over the whole stimulus duration (110 ms). Raw rate profiles are uninformative of the spectral content of the stimulus due to the large across-fiber variability in spontaneous and saturated rates (Rice et al.,
The psychoacoustical threshold notch depth for discriminating between a flat-spectrum and a notch noise, Δα, was predicted from the responses collected for the sample of AN fibers according to the following equation (Siebert,
Given the discrete nature of the recorded AN responses and the limited number of stimulus conditions tested, a discrete version of the above equation was adopted for the current analysis:
Figure
It becomes evident that this analysis is designed to detect the maximum relative change in discharge rate available throughout the stimulus duration and throughout the population of fibers and that it optimizes the information that each fiber can convey in its response toward the detection of a change in the stimulus, hence the term “ideal observer” analysis. The information carried in the variance of firing rate in each time bin counts and, in this sense, this “ideal observer” analysis contrasts with the average rate profile analysis that disregards any rate fluctuations in time and considers only the information conveyed in the overall discharge rate of the fibers assessed throughout the whole stimulus duration.
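The logic of this kind of analysis can be illustrated with a minimal Python sketch. It is not a reproduction of the article's Equation (1): it assumes a generic Poisson ideal observer in which each fiber and each non-overlapping time bin contributes the squared change in mean spike count divided by an estimate of the count variance. The function names, the use of the average of the two mean counts as the variance estimate, and the linear scaling of sensitivity with notch depth in `predicted_threshold` are all illustrative assumptions.

```python
import math


def bin_counts(spike_times_ms, duration_ms, binwidth_ms):
    """Count spikes in non-overlapping bins covering [0, duration_ms)."""
    n_bins = int(math.ceil(duration_ms / binwidth_ms))
    counts = [0] * n_bins
    for t in spike_times_ms:
        if 0.0 <= t < duration_ms:
            counts[min(int(t // binwidth_ms), n_bins - 1)] += 1
    return counts


def mean_counts(trials, duration_ms, binwidth_ms):
    """Average the binned counts over repeated presentations of one stimulus."""
    binned = [bin_counts(trial, duration_ms, binwidth_ms) for trial in trials]
    return [sum(column) / len(binned) for column in zip(*binned)]


def dprime_squared(flat_by_fiber, notch_by_fiber, duration_ms, binwidth_ms):
    """Sum over fibers and bins of (change in mean count)^2 / variance estimate.

    Assumption: Poisson-like counts, so the variance is approximated by the
    mean count (here the average of the two stimulus conditions).
    """
    total = 0.0
    for flat_trials, notch_trials in zip(flat_by_fiber, notch_by_fiber):
        flat = mean_counts(flat_trials, duration_ms, binwidth_ms)
        notch = mean_counts(notch_trials, duration_ms, binwidth_ms)
        for f, n in zip(flat, notch):
            variance = 0.5 * (f + n)
            if variance > 0.0:  # bins that are empty in both conditions add nothing
                total += (n - f) ** 2 / variance
    return total


def predicted_threshold(notch_depth_db, d2):
    """Notch depth giving d' = 1, assuming d' scales linearly with depth."""
    return math.inf if d2 <= 0.0 else notch_depth_db / math.sqrt(d2)


# Toy data (not from the article): one fiber, one trial per stimulus.
flat = [[[1.0, 5.0, 9.0]]]
notch = [[[1.0]]]
whole = dprime_squared(flat, notch, 10.0, 10.0)   # one bin spanning the stimulus
halves = dprime_squared(flat, notch, 10.0, 5.0)   # two 5-ms bins
```

In this toy case the two-bin analysis yields a larger summed sensitivity than the single-bin analysis, because the difference between the responses is concentrated in the second half of the stimulus. With a single bin equal to the stimulus duration, the computation reduces to a comparison of overall spike counts.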
Equation (1) was derived on the assumption that the occurrence of AN spikes follows a Poisson distribution, that is, that spikes occur at times that are independent of each other. Furthermore, in using Equation (1) to predict psychoacoustical discrimination thresholds, the implicit assumption is made that the listener can make optimal use of every bit of information available in the activity of the population of fibers, as explained above. Although neither of these two assumptions applies here (Siebert,
Δα was computed for different time bin durations, Δ
First, we tested whether psychoacoustical spectral discrimination could be accounted for using only the AN rate-profile representation of the stimulus spectrum. A simple visual analysis of both normalized and difference rate profiles (Figures
The above conclusion was confirmed by a signal-detection-theory
The results (Figure
The “ideal observer” analysis (Siebert,
Remarkably, the
To confirm this optimal analysis time binwidth, Kendall's τ non-parametric correlation coefficient (Press et al.,
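For readers unfamiliar with Kendall's τ, the following is a minimal pure-Python sketch of a rank correlation of this kind, using a tie-corrected form (often called τ-b). It is illustrative only: whether this exact variant was used in the analysis is an assumption, and the function name `kendall_tau` and the example numbers are hypothetical.

```python
import math


def kendall_tau(x, y):
    """Tie-corrected Kendall rank correlation between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
            elif dx == 0 and dy != 0:
                tied_x_only += 1
            elif dy == 0 and dx != 0:
                tied_y_only += 1
            # pairs tied in both variables are ignored
    untied = concordant + discordant
    denominator = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0.0:
        return math.nan  # undefined when one variable is constant
    return (concordant - discordant) / denominator


# Made-up threshold values (not data from the article), one per sound level.
behavioral = [10.0, 14.0, 18.0, 15.0, 12.0]
predicted = [2.0, 3.5, 5.0, 4.0, 3.0]
tau = kendall_tau(behavioral, predicted)
```

Because τ depends only on the ordering of the values, it compares the shapes of two threshold-versus-level functions without requiring that they share the same units or scale.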
The notch depth threshold values predicted by the “ideal observer” analysis of AN fiber responses shown in Figure
The “ideal observer” analysis for a time binwidth equal to the stimulus duration (110 ms) disregards any temporal information. Hence, it was another way of testing the rate-profile code hypothesis. The shape of the associated predicted function (diamonds in Figure
The possibility exists that the non-monotonic shape of the behavioral threshold notch depth
This conjecture was tested here by applying the “ideal observer” analysis to two groups of AN fibers, with units classified according to spontaneous rate as HSR or LSR+MSR when their spontaneous rate was higher or lower than 18 spikes/s, respectively (Liberman,
Predicted threshold notch depth
In the psychoacoustical discrimination study, it was observed that threshold notch depths for discrimination were on average 2.5 times larger for a short (20-ms duration) than for a long (220 ms) stimulus, and that this ratio was approximately constant across sound levels (Alves-Pinto and Lopez-Poveda,
The resulting predicted thresholds were higher for the short than for the long stimulus (Figure
We have shown that psychoacoustical discrimination between auditory broadband stimuli with and without high-frequency spectral notches is uncorrelated with the differences in the overall AN rate-profile representations of their spectra. Although the spectral notch is visible in the rate-profile for all sound levels above 50 dB SPL provided it is sufficiently deep (Figure
Differences in neuronal processing between humans and guinea pigs may have contributed to the mismatch between the psychoacoustical and the neural results in terms of the level dependence of rate-profile-derived discrimination thresholds. The anesthetic may also have affected neuronal responses. However, both of these factors would also have affected the correspondence between the psychoacoustical and neural results based on the “ideal observer” analysis. Moreover, the idea that some form of temporal code may be used for high-frequency spectral discrimination is not new and agrees with evidence from other independent studies in a number of respects. It has been put forward, for example, to explain the limits of human auditory frequency discrimination for single tones (Heinz et al.,
What is the nature of the temporal code? We have no definite answer, only conjectures. Any AN fiber is effectively driven by a half-wave rectified, low-pass filtered version of the basilar membrane response waveform at its corresponding place in the cochlea. With broadband noise stimulation, this response can be described as a randomly amplitude-modulated carrier with a carrier frequency near the fiber's CF. The range of modulation frequencies is limited by the BW of the cochlear filter (Louage et al.,
That said, however, any difference in the envelopes evoked by the flat-spectrum and notch noises should show up in the aggregated FFTs of the simulated IHC receptor potential waveforms; that is, they should show up in Figures
The present neural results support the “multiple-looks” model for auditory long-term temporal integration: the decrease in threshold with increases in the stimulus duration. Such temporal integration does not actually involve integrating stimulus energy (or correspondingly accumulating nerve spikes) over time, but is more consistent with a model whereby “multiple-looks” of the output envelopes from auditory filters are taken in non-overlapping time windows of about 5–10 ms of duration (Viemeister and Wakefield,
The present physiological results are also consistent with explanations proposed for the so-called “dynamic range problem” of hearing. This refers to the apparent mismatch between the wide range of sound levels over which good intensity discrimination can be shown and the dynamic range of most AN fibers (Viemeister,
Some questions remain. First, the “ideal observer” predictions showed that performance could improve substantially if the discharge rate of AN fibers were sampled in time binwidths shorter than 8 ms (Figure
Second, the amount of perceptually-relevant information for high-frequency spectral discrimination was shown to be less for sound levels around 80 dB SPL than for lower or higher levels. This still needs explaining. The results presented here demonstrate that it is unrelated to having two fiber populations with different thresholds and dynamic ranges. It is possible that spectral representation of the notch in the BM excitation pattern may be compromised at mid-levels due to cochlear mechanical compression (see Lopez-Poveda et al.,
It has been long thought that high-frequency spectral notches in the head-related transfer function (HRTF) are important cues for human (vertical) sound localization (e.g., Butler and Belendiuk,
In any case, the ability of listeners to actually use high-frequency HRTF notches as sound localization cues must depend on a complex combination of their level of performance in notch detection tasks, the shape of their ears, and the characteristics of the stimulus (duration and level).
Performance in high-frequency notch detection tasks, and hence in spatial localization involving detection of these spectral features, will ultimately depend on the quality of the representation of the spectral notch in the AN. The evidence provided here suggests that high-frequency spectral information may be encoded in the temporal pattern of AN discharges, analyzed over time binwidths 4–9 ms long. Studies on the temporal aspects of spectral processing in sound localization also reported that information about the spectrum level of a cochlear filter can only be reliably obtained when the signal from that filter is integrated over a time window of about 5 ms (Jin,
Spectral notch encoding based on the temporal patterns of discharge of AN fibers is likely to be more susceptible to variability than encoding based on the long-term average discharge rate. Spikes occur stochastically in time, and spike counts for constant stimuli are likely to vary from time bin to time bin. Variations in the number of spikes have a larger relative effect in a short time window than in a long one, so that changes unrelated to the stimulus more strongly affect the quality of the information encoded in the spike pattern. This higher susceptibility to variability could partly contribute to the large variability in the detection of spectral notches across listeners observed here.
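The dependence of count variability on window length can be made concrete with a short sketch. It assumes idealized Poisson firing, which, as noted above, does not strictly hold for AN fibers. Under that assumption, the coefficient of variation of the spike count in a window equals one over the square root of the mean count. The function name and the example discharge rate are illustrative.

```python
import math


def poisson_count_cv(rate_spikes_per_s, window_ms):
    """Coefficient of variation of the spike count in one window (Poisson assumption)."""
    mean_count = rate_spikes_per_s * window_ms / 1000.0
    if mean_count <= 0.0:
        raise ValueError("rate and window must be positive")
    return 1.0 / math.sqrt(mean_count)


# Hypothetical fiber firing at 100 spikes/s.
cv_short = poisson_count_cv(100.0, 8.0)     # 8-ms analysis bin
cv_long = poisson_count_cv(100.0, 110.0)    # whole 110-ms stimulus
```

For this hypothetical fiber, the count in a single 8-ms bin is relatively much more variable than the count over the full 110-ms stimulus, which is the sense in which short-window codes are more exposed to non-stimulus-related fluctuations.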
Finally, discrimination thresholds derived from the “ideal observer” analysis of responses of LSR and MSR fibers were comparable to those derived using all fibers, including HSR fibers (Figure
For most listeners, high-frequency spectral notch detection becomes gradually more difficult with increasing level up to 70–80 dB SPL and improves at higher levels. However, across-listener variability is high and depends both on the stimulus characteristics (duration and level) and on the notch BW.
Psychoacoustical, modeling, and physiological results consistently suggest that the non-monotonic effect of level on notch detection is inconsistent with the notch being encoded only in the rate profile of AN fibers. They support, instead, the view that the relevant information is conveyed by the temporal pattern of AN discharges monitored in time binwidths of 4–9 ms. Physiological data suggest that LSR fibers are key to notch encoding.
The present evidence suggests that high-frequency spectral notch detection and, consequently, vertical sound localization accuracy require information carried in the temporal characteristics of AN activity, particularly in that of low- and medium-spontaneous-rate fibers. The number of such fibers likely varies substantially across individuals, which might contribute to across-listener variability in sound localization.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Experimental work supported by the Spanish Fondo de Investigaciones Sanitarias (grants PI02/0343 and G03/203) and by European Regional Development Funds to Enrique A. Lopez-Poveda. The preparation of this paper was supported by the Spanish Ministry of Innovation and Competitiveness to Enrique A. Lopez-Poveda (grant BFU2012-39544-C02).
¹ 90–95% of the population of spiral ganglion neurons are type I cells. These are connected to the inner hair cells and encode most auditory information. The remaining 5–10% of the population consists of type II afferents that are connected to outer hair cells. Their role in auditory coding remains unclear but they are likely involved in the regulation of the operating point of the “cochlear amplifier” (Pickles,