Edited by: Annie Tremblay, University of Kansas, USA
Reviewed by: Wendy Herd, Mississippi State University, USA; Christine E. Shea, University of Iowa, USA; Adrian Garcia-Sierra, University of Connecticut, USA
*Correspondence: Shannon L. Barrios
This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
To attain native-like competence, second language (L2) learners must establish mappings between familiar speech sounds and new phoneme categories. For example, Spanish learners of English must learn that [d] and [ð], which are allophones of the same phoneme in Spanish, can distinguish meaning in English (e.g., /deɪ/ “day” and /ðeɪ/ “they”). Because adult listeners are less sensitive to allophonic than phonemic contrasts in their native language (L1), novel target language contrasts between L1 allophones may pose special difficulty for L2 learners. We investigate whether advanced Spanish late-learners of English overcome native language mappings to establish new phonological relations between familiar phones. We report behavioral and magnetoencephalographic (MEG) evidence from two experiments that measured the sensitivity and pre-attentive processing of three listener groups (L1 English, L1 Spanish, and advanced Spanish late-learners of English) to differences between three nonword stimulus pairs ([idi]-[iði], [idi]-[iɾi], and [iði]-[iɾi]) which differ in phones that play a different functional role in Spanish and English. Spanish and English listeners demonstrated greater sensitivity (larger d' scores) for nonword pairs distinguished by phonemic than by allophonic contrasts, mirroring previous findings. Spanish late-learners demonstrated sensitivity (large d' scores and MMN responses) to all three contrasts, suggesting that these L2 learners may have established a novel [d]-[ð] contrast despite the phonological relatedness of these sounds in the L1. Our results suggest that phonological relatedness influences perceived similarity, as evidenced by the results of the native speaker groups, but may not cause persistent difficulty for advanced L2 learners. Instead, L2 learners are able to use cues that are present in their input to establish new mappings between familiar phones.
Linguistic experience shapes listeners' sensitivities to phonetic distinctions. Specifically, extensive experience with one's native language (coupled with a lack of experience with nonnative sounds and contrasts) limits listeners' sensitivity to nonnative phonemic distinctions (Lisker and Abramson,
The present study contributes to this literature on sound category learning by investigating the role of language-specific phonological patterning in L2 phonological development. We use both behavioral methods and magnetoencephalographic (MEG) recordings to investigate how adult second language learners' knowledge of native language (L1) phonological patterns impacts the acquisition of their second language sound system. In particular, we ask whether advanced adult late-learners of a second language overcome native language mappings to establish new phonological relations between familiar phones.
Languages differ in their mappings between predictable surface variants (i.e., allophones) and more abstract phonological categories (i.e., phonemes) (Kenstowicz,
Although three very similar phonetic categories, [d], [ð], and [ɾ], exist in both Spanish and English, the functional significance of these categories varies between the two languages. The phones [d] and [ð] distinguish word meanings in English (e.g., [ðeɪ] “they” and [deɪ] “day”). In contrast, a productive phonological pattern causes the voiced obstruents /b, d, ɡ/ to surface as the approximants [β, ð, ɣ] intervocalically in Spanish
A consequence of cross-linguistic variation in the mapping between speech sounds and phonemes is that L2 learners may need to establish new mappings between familiar phones. For example, to attain native-like competence in English, a Spanish learner must learn that [d] and [ð], which are allophones of a single phoneme (i.e., /d/) in Spanish, can distinguish word meaning in English. Doing so is assumed to entail the updating of internalized knowledge about the distribution of the phones in the L2 (i.e., learning that the phones are not restricted to particular environments in the target language, but instead can occur in the same phonological environments). Eckman et al. (
L1 context-dependent allophones present unique challenges for the L2 learner from the perspective of production and perception. The learner must learn to detect the target language phonemic contrasts in perception, and suppress L1 positional variants in L2 production (even when the phonological context is appropriate for their production). Both anecdotal and experimental evidence from speech production (Lado,
Research with adult native listeners has revealed that speech perception is not only influenced by listeners' experience (or lack of experience) with the phones in question; the phonological status of a sound contrast also affects listeners' perception. Several behavioral studies have reported differences in the perception of familiar phones (i.e., phones that occur regularly in the native language of the listener) depending on whether the sounds in the pair function as contrastive phonemes or non-contrastive allophones in the listener's native language. In particular, these studies report that sounds which are contrastive are discriminated more readily, and are rated less perceptually similar than allophonically related phones (Pegg and Werker,
A similar pattern was also reported by Pegg and Werker (
Peperkamp et al. (
In a recent study, Boomershine et al. (
In addition to the behavioral studies reviewed above, research using neurophysiological techniques has also reported important differences in the processing of contrastive vs. non-contrastive sound pairs (Näätänen et al.,
A negative component of the event-related potential known as the mismatch negativity (MMN), and its magnetic counterpart, the mismatch field (MMF) response recorded using MEG, provide an early, automatic change-detection response (Näätänen,
A number of studies have demonstrated that aspects of a listener's native phonology modulate MMN amplitude (Näätänen et al.,
The MMN response has also been used as an index of nonnative vowel phoneme acquisition by second language listeners. Winkler et al. (
In a study which looked specifically at the pre-attentive processing of phonemes vs. allophones, Kazanina et al. (
In a recent training study with L2 learners, Herd (
A related question in bilingual speech perception has been whether early stages of speech representation which are indexed by the MMN can be affected by the language being used. For instance, in a follow up to their earlier study, Winkler et al. (
In contrast, a recent study by García-Sierra et al. (
In sum, listeners' perception of speech sounds is strongly and systematically constrained by the native language phonology, with the discriminability of pairs of phones being influenced by phonological status in the native language. This pattern of relative insensitivity to phone pairs which are allophones of a single phoneme category in the listener's native language is observed both in behavioral and neural responses. While these patterns of perception may be optimal for listeners when listening to their native language, such learned, early, and automatic insensitivity to L1 allophones may present challenges for L2 learners who are faced with the task of establishing a novel contrast among familiar pairs of target language phones. These findings prompt the question of whether and to what extent these patterns of perception can be overcome with experience. In particular, do L1 context-dependent allophones continue to play a role in L2 perception?
In this study we further investigate the acquisition of novel target language contrasts among L1 context-dependent allophones by L2 learners. We take advantage of the cross-linguistic differences in the mappings between the phones [d], [ð], and [ɾ] and their respective phoneme categories in English and Spanish. To this end, two experiments were conducted to investigate the representation and processing of three sound contrasts [d]-[ð], [d]-[ɾ], and [ð]-[ɾ] by three participant groups: English native speakers, Spanish native speakers, and advanced L1 Spanish late-learners of English.
We used an AX discrimination task as a behavioral measure of participants' sensitivity to various tokens of three nonword pairings [idi]-[iði], [idi]-[iɾi], and [iði]-[iɾi]. Following Boomershine et al. (
Magnetoencephalographic (MEG) recordings were also used to measure the detailed time-course of brain activity in each of the three listener groups. By making a three-way comparison of pre-attentive processing of the three phones of interest by Spanish, English, and L2 listeners, we can gain insight into the interlanguage phonological representations of the L2 learners. By using the presence of an MMN as an index of category identification, we will be able to show whether L2 learners represent the phones [d], [ð], and [ɾ] as English speakers or Spanish speakers do. If early auditory brain responses are shaped by the functional significance of the sound categories in the listeners' native language (Kazanina et al.,
Three groups of participants were recruited to participate in these experiments for monetary compensation: 15 English native speakers (Female = 5, Male = 10, mean age = 22.3 years, range = 19–28), 15 Spanish native speakers (Female = 8, Male = 7, mean age = 34.7 years, range = 23–45), and 15 advanced L1 Spanish late-learners of English (Female = 8, Male = 7, mean age = 30.1 years, range = 24–38). The learner group had a mean age of exposure of 10.1 years (
The proficiency of each of the listener groups was assessed by self report. Participants were asked to rate their abilities in the areas of speaking, listening, reading, and writing on a scale of 1–10 (where 1 = poor and 10 = excellent) in both Spanish and English. The English speaker means were 10 (
Materials for our experiments consisted of 10 natural tokens of each of the following VCV sequences: [idi], [iði], [iɾi] spoken by a single female speaker of American English with phonetic training. Multiple instances of each stimulus type were recorded using a head-mounted microphone in a soundproof room. The vowel [i] was chosen for the vowel context because Spanish [i] and English [i] have the greatest perceived similarity by listeners of both groups (Flege et al.,
One challenge for this kind of design is ensuring that the tokens used are relatively natural exemplars across both languages. We examined a number of acoustic parameters to determine to what extent this was true of the current stimuli. The initial [i] of each token had a duration of 160 ms, intensity of 77 dB, F0 of 190 Hz, F1 of 359 Hz, F2 of 2897 Hz, and F3 of 3372 Hz. The initial [i] was cross-spliced with the natural consonant and final [i] productions. The files were matched from positive going zero-crossing to positive going zero-crossing. The final [i] tokens had a mean duration of 177 ms (
To ensure that participants in the study also identified the stimuli as instances of the intended category, each performed a brief identification task following the MEG recording and the AX discrimination task. Participants were presented with 40 stimuli (each of the 30 experimental items and 10 filler items) and were instructed to use the keys 1, 2, and 3 to identify the stimulus they heard. Naturally, the labels for the identification task had to vary across language, such that the English speakers were asked to label stimuli as an instance of a nonword “eithee,” “eady,” or “other” and the Spanish speakers as the nonwords “idi,” “iri,” or “other.” In order to implement the task in a similar way across groups we had to decide which labeling to request from the Learners. Given that our primary interest in the identification task was to learn if our stimulus tokens would be categorized as instances of the expected stimulus type in the listeners' L1, we opted to use L1 labeling options for all three listener groups.
Figure
During the AX discrimination task participants wore headphones and were seated in a quiet room in front of a computer. The presentation of experimental stimuli was controlled by DMDX (Forster and Forster,
Magnetic fields were recorded in DC (no high-pass filter) using a whole-head MEG device with 157 axial gradiometers (Kanazawa Institute of Technology, Kanazawa, Japan) at a sampling rate of 1 kHz. An online low-pass filter of 200 Hz and a 60 Hz notch filter were applied during data acquisition. All stimuli were presented binaurally via Etymotic ER3A insert earphones at a comfortable listening level (~70 dB). MEG recording sessions included 4 runs: 1 screening run and 3 experimental blocks, which are described in greater detail below. Participants passively viewed a silent movie during the experimental runs to avoid fatigue. Each MEG recording session lasted approximately 90 min in total.
In the screening run, participants were presented approximately 100 repetitions of a 1 kHz sinusoidal tone. Each tone was separated by a randomly chosen ISI of 1000, 1400, or 1800 ms. Data from the screening run were averaged and examined to verify a canonical M100 response. The M100 is an evoked response which is produced whenever an auditory stimulus has a clear onset and is observed regardless of attentional state (Näätänen and Picton,
In the experimental blocks, stimuli were presented using a modified version of the optimal passive oddball paradigm (Näätänen et al.,
The experimental procedures were completed in the following order for all participants: [1] participants were given an overview of the procedures and gave their informed consent, [2] participants completed a language background and handedness questionnaire to ensure they met the study requirements, [3] MEG recordings were made, and [4] AX discrimination and identification data were collected.
Data from four Spanish participants (S003, S004, S011, S014) whose performance was at or below chance (i.e., 50% accuracy) on the control contrast (i.e., [iði]-[iɾi]) were excluded from subsequent AX discrimination analyses. For the remaining participants, d' scores were computed for each individual and each different pair according to the Same-Different Independent Observations Model (Macmillan and Creelman,
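As a rough illustration of the d' computation described above, the following Python sketch implements the independent-observation formula for same-different designs as it is commonly stated: the hit and false-alarm rates are first converted to an unbiased proportion correct, which is then converted to d'. The function name and the example rates are hypothetical, and the requirement that both rates lie strictly between 0 and 1 is an assumption of this sketch; the text does not say how rates of 0 or 1 were handled. This is not the authors' analysis code.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def dprime_same_different_io(hit_rate, false_alarm_rate):
    """d' for an AX (same-different) task, independent-observation model.

    hit_rate: proportion of "different" responses on different trials.
    false_alarm_rate: proportion of "different" responses on same trials.
    Both rates must lie strictly between 0 and 1 (assumption of this sketch).
    """
    z = _STD_NORMAL.inv_cdf
    # Step 1: proportion correct an unbiased observer would achieve.
    pc_max = _STD_NORMAL.cdf((z(hit_rate) - z(false_alarm_rate)) / 2)
    # Step 2: convert the unbiased proportion correct to d'.
    # The sign term keeps below-chance performance negative.
    sign = math.copysign(1.0, pc_max - 0.5)
    root = math.sqrt(2 * abs(pc_max - 0.5))
    return sign * 2 * z(0.5 * (1 + root))


if __name__ == "__main__":
    # Hypothetical rates for one listener on one nonword pair.
    print(round(dprime_same_different_io(0.90, 0.10), 2))
```

Equal hit and false-alarm rates yield a d' of zero under this formula, and more widely separated rates yield a larger d'.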
MEG data were imported into Matlab and de-noised using a multi-shift PCA noise reduction algorithm (de Cheveigné and Simon,
For each participant, the 10 strongest left hemisphere channels (5 from left anterior, 5 from left posterior) were identified and selected visually in MEG160 from the peak of the average M100 response to 1 kHz tones elicited during the auditory localizer pre-screening test. Because the MMNm to phoneme prototypes has been found to be stronger in the left hemisphere than in the right (Näätänen et al.,
We created a single summary deviant response for each of the three contrasts by averaging together the two relevant deviant responses. For example, for the [iɾi]-[iði] control contrast, we averaged together the response to [iði] deviants in an [iɾi] block and the response to [iɾi] deviants in an [iði] block. The averaged responses elicited by standards were also pooled, resulting in a single summary standard response. The grand average waveform from −100 ms pre-stimulus to 800 ms post-stimulus was then computed for each language group by averaging across participants (
The mean RMS power over a single 100 ms time window from 310 to 410 ms for each of the participants for each of the experimental conditions was computed. This time window was chosen because the vowel offset and consonant onset occurred at 160 ms and the MMN is expected to occur about 150–250 ms following the onset of a detectable change. Our statistical comparisons used linear mixed effects modeling to examine whether the difference in the mean RMS of the response to deviants and the response to standards reached significance over the MMN time window (310–410 ms).
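The window arithmetic and the summary measure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes that RMS is taken across the selected channels at each time sample, that the window is half-open (310 ms inclusive, 410 ms exclusive), and that one sample corresponds to 1 ms at the 1 kHz sampling rate. All function names and the toy data are hypothetical.

```python
import math


def mmn_window(change_onset_ms, lag_min_ms=150, lag_max_ms=250):
    """Expected MMN window given the onset of the detectable change."""
    return change_onset_ms + lag_min_ms, change_onset_ms + lag_max_ms


def rms_across_channels(epoch):
    """epoch: list of channels, each a list of samples.

    Returns one RMS value per time sample (assumed: RMS over channels).
    """
    n_channels = len(epoch)
    n_samples = len(epoch[0])
    return [
        math.sqrt(sum(channel[t] ** 2 for channel in epoch) / n_channels)
        for t in range(n_samples)
    ]


def mean_in_window(values, times_ms, start_ms, end_ms):
    """Mean of values whose time stamp falls in [start_ms, end_ms)."""
    selected = [v for v, t in zip(values, times_ms) if start_ms <= t < end_ms]
    return sum(selected) / len(selected)


if __name__ == "__main__":
    start, end = mmn_window(160)  # consonant onset at 160 ms
    times = list(range(-100, 800))  # 1 ms steps, -100 to 799 ms
    # Toy data: two channels with constant amplitudes 3 and 4.
    toy_epoch = [[3.0] * len(times), [4.0] * len(times)]
    rms = rms_across_channels(toy_epoch)
    print(start, end, mean_in_window(rms, times, start, end))
```

Computing this value separately for the summary deviant and the summary standard response gives the two quantities whose difference is tested over the MMN window.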
Statistical analyses of d' scores were performed with linear mixed effects modeling using R package
We conducted nine planned tests of our experimental hypotheses regarding listeners' sensitivity to allophonic vs. phonemic contrasts using simultaneous tests for general linear hypotheses with the
For the [d]-[ɾ] contrast, which is phonemic in Spanish, but allophonic in English, it was expected that the L1 Spanish listeners would outperform the English listeners. This prediction was also borne out. The English listeners performed significantly worse than both the Spanish listeners (β = 1.56,
For us the most important question is what level of discrimination performance Spanish late-learners of English would show on a contrast that is phonemic in English but allophonic in Spanish (i.e., [d]-[ð]). First, as expected, the Spanish group performed significantly more poorly on this contrast than the English listeners (β = −0.99,
We again used linear mixed effects modeling in R to conduct the statistical analyses of mean RMS amplitude over the 310–410 ms time window. Our first linear mixed effects analysis was designed to confirm that there were no reliable differences between the responses to the different standards. This is important to establish because we would like to collapse across the response to standards in our subsequent critical planned comparisons of the MMN response by contrast. Analyses of mean RMS amplitude for the response elicited by the standards consisted of fixed effects Language Group (English, Learner, Spanish) and Standard Type ([idi] standard, [iði] standard, [iɾi] standard), as well as Language Group × Standard Type interaction and subject as random effect. These statistical analyses revealed no significant results, suggesting that the mean power elicited by standard stimuli did not differ by Language Group [
Figure
In our statistical analyses of the listeners' responses to deviants, we conducted three planned comparisons separately for each listener group using simultaneous tests for general linear hypotheses with the
As expected for the English listeners, the response to the control contrast [iði]-[iɾi] was larger than the response to the standard stimuli (β = 21.21,
Unfortunately, the MMN responses for the Spanish listeners followed none of our predictions. We found a marginal difference between the response to the standard stimuli and the [idi]-[iði] pair which are phonologically related in the language (β = 15.23,
For the critical learner group, the MMN results followed the pattern predicted according to the hypothesis that learners successfully implemented the phonological knowledge of their second language at an early, pre-attentive stage of processing. A significant difference was observed between the standards and L1 allophonic contrast [idi]-[iði] (β = 20.31,
In this study we explored the impact of phonological knowledge on perceptual categorization, particularly in cases in which the phonemic status in a late-learned second language directly conflicts with the native language. Our Spanish and English listeners demonstrated greater sensitivity for nonword pairs distinguished by phonemic than by allophonic contrasts on an AX discrimination task, mirroring previous findings. Interestingly, Spanish late-learners demonstrated sensitivity (large d' scores and MMN responses) to all three contrasts, suggesting that these L2 learners may have established a novel [d]-[ð] contrast despite the phonological relatedness of these sounds in the L1. We discuss each of these findings in turn.
Our behavioral findings from the native speaker groups provide support for the hypothesis that listeners form equivalence classes on the basis of phoneme categories. In particular, we observed better discrimination of the [idi]-[iɾi] contrast by Spanish listeners for whom the pair are phonemic than by English listeners for whom the pair is allophonic in their L1. Similarly, English listeners outperformed Spanish listeners in the discrimination of the [idi]-[iði] pair which is phonemic in English, but allophonic in Spanish. Finally, both Spanish and English listener groups performed comparably well on the [iði]-[iɾi] control contrast which is a phonemic distinction in both languages. These results replicate previous behavioral findings from Boomershine et al. (
The MEG data also provide partial support for the hypothesis that listeners establish equivalence classes on the basis of phonemes. Given this hypothesis, we expected to observe an MMN when the stimulus presented as the deviant is in contrast in the listener's native language with the stimulus serving as the standard in an experimental block, but not when the standard and deviant are phonologically related as allophones of the same phoneme in the listener's L1. As expected, a significant MMN was observed for the [iði]-[iɾi] control contrast, but not for the allophonic [idi]-[iɾi] contrast for English listeners. Contrary to our expectations, however, no MMN was observed for the phonemic [idi]-[iði] pair. In contrast with the data from the English listeners, the results for the Spanish listeners did not provide support for our hypothesis. A significant MMN was observed for the [idi]-[iði] contrast, which is allophonic in Spanish, while no MMN was observed for either the [idi]-[iɾi] or the [iði]-[iɾi] pair, which are phonemic in Spanish.
It is not clear how to explain the unexpected MMN patterns observed in the two native listener groups. First, any explanation based on poor stimulus quality seems inconsistent with the behavioral data, which showed the predicted pattern of discrimination across groups for all contrasts (although it is of course logically possible that the behavioral responses were based on a late-stage process that the early MMN does not reflect). Second, it is not clear how any simple explanation based on the acoustic properties of the stimuli could explain the cross-linguistic differences in responses. However, we note that the only surprising datapoint in the English listener data was the absence of a significant MMN in the phonemic [idi]-[iði] contrast, but that the response was trending in the right direction. Therefore, we might speculatively attribute this result to a Type II error.
One factor that may have reduced our power to detect MMN differences in the current paradigm is that the position of the deviant within the standard stream was somewhat more predictable than in many MMN studies. In our experiment, a deviant always occurred after either 4, 5, or 6 intervening standards. Previous work has demonstrated that when the position of a deviant within the standard stream is completely predictable, the MMN is almost completely neutralized (see Sussman et al.,
In addition, the slightly non-canonical status of the speech stimuli as neither perfectly English-like nor perfectly Spanish-like may have caused some of the unexpected MMN patterns observed in the Spanish and English groups. In an active task like AX discrimination, increased attention might mitigate the impact of slightly non-canonical tokens on categorization, but in a passive listening mode, as in the MMN paradigm, participants might not have automatically perceived and grouped the tokens according to their native speech categories. On the other hand, the bilingual participants might be more permissive of irregularities even in passive listening, based on their exposure to different distributions of sounds across the two languages. Strange and Shafer (
Our primary research question asked whether advanced L1 Spanish late-learners of English overcome learned insensitivities to L1 context-dependent allophones and acquire a new target-language contrast among familiar phones [d] and [ð]. The behavioral and neural data from L2 learners which we report here converge to suggest that the answer to this question is affirmative. On both tasks we observed no difference between learners' ability to discriminate between phone pairs which are L1 allophones and L1 phonemes, suggesting that they do not classify the two phones as allophones of the same underlying phoneme category. That is, with experience, the advanced L2 learners in our study have acquired adequate knowledge of the L2 phonological system to distinguish the English /d/-/ð/ contrast in perception. Moreover, this learned sensitivity is observable both behaviorally, and in listeners' early, pre-attentive brain responses. We note that the neural data must be interpreted somewhat more cautiously than the behavioral data. Although the MMN pattern observed in the late-learner group was exactly what was predicted if they had successfully acquired the L2 phonological system, the two native listener groups did not show the MMN patterns predicted based on their L1 phonology, as described above. Therefore, further replication will be needed to confirm the interpretation of the MMN pattern in the late-learner group.
Given that the behavioral and MEG data from our Learner group were elicited in an English language context (all testing was conducted in an English-speaking environment and all interactions and instructions were given in English), we might have expected the Learners' neural and behavioral responses to look maximally English-like (i.e., discriminating [d]-[ð] and [ð]-[ɾ], but not [d]-[ɾ]). However, this was not what was observed (contra García-Sierra et al.,
A question that arises naturally from our learner data is: how do L2 learners acquire the ability to perceive novel target language contrasts among familiar phones? In particular, what is the role of the input in shaping the learners' hypotheses about the phonological system they are acquiring, and how do learners' expectations about the characteristics of the target language influence the learning process? With respect to mechanisms, three possibilities have been discussed in the infant literature (Seidl and Cristia,
Another possibility is that listeners make use of distributional information (Maye et al.,
Lexical mechanisms of various sorts have also been proposed, such as knowledge of word meanings and knowledge of words' phonological forms. For example, the availability of minimal pairs has been shown to enhance the perception of nonnative phonetic contrasts in both infants and adults (Hayes-Harb,
Finally, in addition to the implicit learning mechanisms mentioned above, it has also been suggested that adults might avail themselves of explicit learning mechanisms and that these may serve to initiate the acquisition process (Shea,
In sum, the behavioral and neural results presented here suggest that phonological relatedness influences perceived similarity, as evidenced by the results of the native speaker groups, but may not cause persistent difficulty for advanced L2 learners in perception. Instead, L2 learners overcome learned insensitivities to L1 allophones in perception as they gain experience with the target language. These findings provide a starting point to investigate when and how this learning takes place, as well as determine the respective contributions of the proposed mechanisms to the acquisition of novel target language contrasts among L1 context-dependent allophones in the L2.
Conceived and designed the experiments: SB, AN, EL, NF, WI. Performed the data collection: SB, AN. Analyzed the data: SB, EL, WI. Wrote the manuscript: SB. Revised the manuscript critically for important intellectual content: SB, AN, EL, NF, WI. Provided final approval of the version to be published: SB, AN, EL, NF, WI. Agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved: SB, AN, EL, NF, WI.
This research was partially supported by a University of Utah University Research Committee Faculty Research and Creative Grant awarded to SB, as well as a University of Maryland, Department of Linguistics, Baggett Scholarship awarded to AN.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We are grateful to the audiences of the 2014 Second Language Research Forum, the University of Maryland Cognitive Neuroscience of Language Lab, the University of Utah Speech Acquisition Lab, and the reviewers and editor for their comments and suggestions.
The Supplementary Material for this article can be found online at:
1Waltmunson (
2Patterson and Connine (
3It is worth noting that to attain truly native-like competence in English, the Spanish learner must also learn to treat the phones [d] and [ɾ] as allophones of the same phoneme in English. Since this would involve the joining of L1 allophones, we might call this learning scenario ‘allophonic union.’
4Other related work in L2 phonology has investigated the acquisition of positional variants in the target language by L2 learners in production (Zampini,
5It is worth noting that these results were observed despite the fact that [d] does not occur naturally in an intervocalic environment in either Spanish or English.
6It is worth noting that, in addition to language experience, the participants in the Spanish-speaking group likely differ from the listeners in the other two groups in a number of other respects, including SES, level of education, experience and level of comfort working with computers, etc. While it may have been possible to find a better matched group of Spanish speakers elsewhere, we were constrained by the location of accessible MEG equipment. This is not an obvious concern for our MEG data (which require no behavioral response), but could impact the quality of our behavioral data, which required participants to respond by pressing buttons on a computer keyboard.
7Simonet et al. (