Cognitive and Electrophysiological Correlates of the Bilingual Stroop Effect

Naylor, Lavelda J.; Stanley, Emily M.; Wicha, Nicole Y.

doi:10.3389/fpsyg.2012.00081

ORIGINAL RESEARCH article

Front. Psychol., 02 April 2012

Sec. Cognition

Volume 3 - 2012 | https://doi.org/10.3389/fpsyg.2012.00081

This article is part of the Research Topic Interfaces Between Language And Cognition

Lavelda J. Naylor¹

Emily M. Stanley²

Nicole Y. Wicha¹,³,⁴*

¹ Department of Biology, University of Texas at San Antonio, San Antonio, TX, USA
² Department of Psychology, University of Delaware, Newark, DE, USA
³ Neurosciences Institute, University of Texas at San Antonio, San Antonio, TX, USA
⁴ Research Imaging Institute, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA

The color word Stroop effect in bilinguals is commonly half the magnitude when the written and naming languages are different (between) than when they are the same (within). This between-within language Stroop difference (BWLS) is likened to a response set effect, with greater response conflict for response relevant than irrelevant words. The nature of the BWLS was examined using a bilingual Stroop task. In a given block (Experiment 1), color congruent and incongruent words appeared in the naming language or not (single), or randomly in both languages (mixed). The BWLS effect was present for both balanced and unbalanced bilinguals, but only partially supported a response set explanation. As expected, color incongruent trials during single language blocks led to slower response times within than between languages. However, color congruent trials during mixed language blocks led to slower times between than within languages, indicating that response-irrelevant stimuli interfered with processing. In Experiment 2, to investigate the neural timing of the BWLS effect, event related potentials were recorded while balanced bilinguals named silently within and between languages. Replicating monolingual findings, an N450 effect was observed with larger negative amplitude for color incongruent than congruent trials (350–550 ms post-stimulus onset). This effect was equivalent within and between languages, indicating that color words from both languages created response conflict, contrary to a strict response set effect. A sustained negativity (SN) followed with larger amplitude for color incongruent than congruent trials, resolving earlier for between than within language Stroop. This effect shared timing (550–700 ms), but not morphology or scalp distribution with the commonly reported sustained potential. Finally, larger negative amplitude (200–350 ms) was observed between than within languages independent of color congruence. This negativity, likened to a no-go N2, may reflect processes of inhibitory control that facilitate the resolution of conflict at the SN, while the N450 reflects parallel processing of distracter words, independent of response set (or language). In sum, the BWLS reflects brain activity over time with contributions from language and color conflict at different points.

Introduction

The Stroop effect has captivated researchers for over 75 years and has resulted in a vast (and daunting) body of literature. Versions of the Stroop paradigm have been used to study diverse cognitive phenomena, like selective attention, inhibition and executive control, conflict detection and monitoring, and automaticity and lexical access (see MacLeod, 1991), and have been used clinically to test for deficits in many areas (Green et al., 2010; Peckham et al., 2010; Pukrop and Klosterkötter, 2010). In the field of bilingualism, the Stroop paradigm has been commonly used to analyze the degree of interference or alternatively the degree of automaticity of access to words in each language and across languages (see Francis, 1999, for a review). The color word Stroop task (Stroop, 1935) has participants name the color of words printed in congruent (RED in red) or incongruent ink color (RED in green). The Stroop effect occurs when incongruent items elicit slower naming times than congruent items, which is generally thought to reflect interference due to the automaticity of reading words compared to naming colors. Bilinguals add the complexity of being able to perform the Stroop task in both of their languages. Moreover, the languages used for the distracter words and naming can match (within) or not (between), such that interference within each language and between languages can be measured. Because the Stroop paradigm taps into a complex set of cognitive processes, there is still much debate over the nature of this powerful effect. The goal of the current study is to examine the behavioral and neural correlates of the bilingual Stroop task to inform word access, attention, and inhibition in the bilingual brain, as well as the nature of the Stroop effect more generally.

The Stroop effect has commonly been explained as a response level conflict, by accounts like the relative speed of processing – where competition occurs strictly at response, in having to choose the color over the faster processed word – and automaticity of access – where faster spread of activation throughout a network of concepts, and inversely smaller attentional demands, occurs for more automatic processes, like reading than naming (see MacLeod, 1991). Connectionist models of the Stroop, such as Cohen et al.’s (1990) model, propose that interference can arise from any level of processing, from input to output. Information from the color and the word is processed in parallel in a distributed network with interconnections that are weighted based on experience. Attention plays a critical role in tuning these weights, such that an attentional set can be created for the specific task and even the specific response set simply by virtue of the strength of the connections between the attended items. MacLeod (1991; MacLeod and MacDonald, 2000) has argued that connectionist models present a more parsimonious account of the many factors that affect performance on Stroop tasks, accounting for both the speed of processing and automaticity differences. However, these models do not fully address the nature of the bilingual Stroop.

The Stroop effect is modulated by factors unique to operating in a bilingual mode. There is even some evidence that bilinguals can perform better on the Stroop task compared to monolinguals (Bialystok et al., 2008), a skill thought to emerge from the cognitive demands of managing two languages. Individual factors, such as dominance and relative proficiency in the languages (Mägiste, 1985; Chen and Ho, 1986; Tzelgov et al., 1990; Francis, 1999; Rosselli et al., 2002; Zied et al., 2004; Gasquoine et al., 2007), and form level factors of the stimuli, such as orthographic or phonological overlap between the languages (Preston and Lambert, 1969; Roelofs, 2003), both affect performance on the Stroop task. Bilinguals with one dominant language (herein, unbalanced bilinguals) experience greater Stroop interference when performing in the dominant than weaker language on within language trials, and experience more interference from distracter words written in the dominant than the weaker language on between language trials. In contrast, bilinguals with equivalent proficiency in both languages (herein, balanced bilinguals) generally exhibit no difference in the amount of interference across their languages, whether naming within or between languages. This dynamic has been shown to change as the relative proficiency of a bilingual’s languages changes (Mägiste, 1984, 1985; Chen and Ho, 1986).

In addition, bilinguals experience different magnitudes of Stroop interference based on the degree of overlap of the word forms across languages (Sumiya and Healy, 2004). When color words share orthographic features across languages (green, grün) the magnitude of the Stroop effect is equivalent within a language (written and naming languages are the same) and between languages (Roelofs, 2003). However, when there is no orthographic overlap across languages (black, schwarz) the within language Stroop effect (incongruent versus congruent) is on average twice the magnitude of the between language effect (Francis, 1999). This has been referred to recently as the within language Stroop superiority effect (WLSSE; Goldfarb and Tzelgov, 2007), but we feel this inappropriately deemphasizes the importance of the between language effect. Therefore, we refer to this between-within language Stroop difference herein as the BWLS or the bilingual Stroop effect, interchangeably. This phenomenon was first observed by Dalrymple-Alford (1968), Dyer (1971) and Preston and Lambert (1969) and has since been replicated across several languages and tasks (Dyer, 1971; Chen and Ho, 1986; Tzelgov et al., 1990; Goldfarb and Tzelgov, 2007; see reviews by MacLeod, 1991; Francis, 1999). Spanish and English bilinguals (our target sample) generally show this BWLS (Preston and Lambert, 1969; Dyer, 1971), with few exceptions (Rosselli et al., 2002).

Under the accounts of the Stroop effect discussed above, which do not directly address the bilingual language system, it is clear how the proficiency of a language could affect the automaticity and/or speed of processing of the words in each language, but it is not clear how within language distracters elicit a significantly larger effect than between language distracters without further restrictions on the processors. This complexity is a result of bilinguals having two lexical representations for a single concept (“red” and “rojo” for concept RED; Okuniewska, 2007). There is growing support for a model of bilingual lexical access in which both languages are non-selectively activated, at least at some stages of word recognition, even if processing demand is restricted to one language (Green, 1998; Spivey and Marian, 1999; Dijkstra and Van Heuven, 2002; Rodriguez-Fornells et al., 2005; Costa et al., 2006; Sunderman and Kroll, 2006). These lexical items must be kept at bay when they are not needed, but there is less of a consensus about how bilinguals, particularly those with high proficiency in a second language, prevent cross language interference.

Some contend that a mechanism of inhibition is required (Green, 1998; Kroll et al., 2010), while others propose that only language relevant items are “flagged” when attending to one language on a task, creating an attentional set of plausible responses (Roelofs, 2003, 2010). A third account proposes a mechanism of access through activation thresholds similar to other connectionist models (Dijkstra and Van Heuven, 2002). Spread of activation can occur between languages at various levels of processing, from semantic (Dijkstra et al., 1998; Lemhöfer and Dijkstra, 2004) to orthographic (Dijkstra et al., 1998; Jared and Kroll, 2001), and as a function of proficiency (see also Sunderman and Kroll, 2006, for a different account). Only one of these models has addressed the BWLS directly, claiming that it is something equivalent to a response set effect in monolinguals (Roelofs, 2003, 2010; Goldfarb and Tzelgov, 2007).

A response set effect (or membership effect) is observed when distracter words that are actively used for responding on the task, e.g., GREEN, RED, YELLOW, BLUE, cause more interference (larger Stroop effect) than other color words that are not being actively used to respond, e.g., PINK (Klein, 1964; Proctor, 1978; Glaser and Glaser, 1989; Lamers et al., 2010). Most accounts of the response set effect propose that it occurs at response and not during access to meaning. Cohen et al. (1990) describe response set effects as occurring at the output level of processing by attentional selection of a set of relevant responses. In a slightly different account, Roelofs (2003, 2010) restricts the response set effect to the response level, but does so by “flagging” the response relevant items at the conceptual level in the multi-tiered WEAVER++ model. The flag results in setting and maintaining an attentional set for the response relevant items (see also Treisman and Fearnley, 1969), shielding valid responses from interference anywhere except at the output layer (response selection). Hence, response set effects elicit response conflict, not because the response-irrelevant words elicit competing responses directly, but rather by spread of activation to the response set at the semantic level. It has been argued that this attentional set account can better explain the response set effect than models that propose inhibition of irrelevant responses during stimulus evaluation (see Lamers et al., 2010). Roelofs has argued that the BWLS can be explained parsimoniously with monolingual data as a response set effect. Similar to the word PINK in the example above, the between language words, that is words that are viewed but not actively prepared for naming, e.g., VERDE, ROJO, AMARILLO, AZUL, receive less activation than the equivalent within language response set of words. In this way, the BWLS effect would be caused by differential spread of activation from the response set to related color words in the other language. If this is the case, then there should always be greater activation for response set items, and color incongruent items should be named more slowly for the response relevant than irrelevant language. Similarly, the neural correlate for the BWLS should reflect this differential spread of activation, perhaps as a modulation of amplitude from response relevant to irrelevant but related words.

This is the first study to use event related potentials (ERP) to address the source of the BWLS. In recent history, the debate over the source of Stroop interference, more generally, has been informed by electrophysiological techniques, which provide a way of experimentally disentangling semantic and response level effects. Scalp-recorded ERP, which have extraordinary temporal resolution (on the order of milliseconds), are especially well suited to investigate the timing of cognitive events. Early ERP studies of Stroop interference focused on the P300, a component found to vary in latency with stimulus evaluation, but not response selection (Kutas et al., 1977; for a review of the P300, see Polich, 2007). Since the P300 latency is insensitive to color congruence on the Stroop task, the Stroop effect must occur later in processing, that is at response selection (Duncan-Johnson and Kopell, 1981; Ilan and Polich, 1999; Rosenfeld and Skogsberg, 2006; however Lansbergen and Kenemans, 2008, found modulation of P300 with low probability of Stroop trials).

In fact, robust Stroop effects have been observed later in time at the N450 (or medial frontal negativity – MFN) and the conflict sustained potential or SP (Rebai et al., 1997; West and Alain, 1999; Liotti et al., 2000; Markela-Lerenc et al., 2004; West et al., 2004, 2005; Larson et al., 2009). While the functional significance of these components is not yet fully understood, they are thought to index different levels of conflict processing and are distinguished both by what modulates them and by their topographical distribution. The conflict SP, which can range in latency and duration based on task demands, generally occurs after the N450, showing greater amplitude for color incongruent than congruent trials (West and Alain, 1999; Liotti et al., 2000; West, 2003; Markela-Lerenc et al., 2004; West et al., 2005; Larson et al., 2009). The activity in this window may reflect a complex of cognitive processes, including response selection, and response monitoring and conflict adaptation, respectively by region of the SP (West et al., 2005; Chen et al., 2011).

The N450 precedes the SP as a medial fronto-central negativity between 300 and 500 ms post-stimulus onset. It is more negative in amplitude for color incongruent than color congruent stimuli, and increasing the degree of conflict increases N450 amplitude (West and Alain, 2000). Though its timing can vary with task demand, the N450 has been observed on a variety of Stroop-like tasks (West et al., 2005), with both covert (silent naming) and overt (naming aloud) responses (Liotti et al., 2000). The component’s neural generators have been source localized to the anterior cingulate cortex (ACC; West, 2003; Markela-Lerenc et al., 2004). Some have argued that the ACC is responsible for “directing attention to a goal, even in the absence of conflict” (MacLeod and MacDonald, 2000), while others contend that it is responsible for conflict detection and monitoring (Van Veen and Carter, 2002; Carter and Van Veen, 2007) and that separate parts of the ACC respond to semantic (stimulus) and response conflict (Roelofs, 2003; Van Veen and Carter, 2005; Wendt et al., 2007; Aarts et al., 2009; Bialystok and Craik, 2010). At least one study suggests that the ACC should be more involved in between- than within language processes (Abutalebi et al., 2008) to prevent interference from the non-target language.

The N450 effect has been observed for both response and non-response type conflict on a counting task, suggesting that it might be sensitive to both incongruent but response eligible (i.e., response set) and incongruent but response ineligible items (West et al., 2004). This would suggest that both within and between language words might modulate N450 amplitude. However, a more recent study showed that only response conflict, and not stimulus conflict, modulated the N450 on a 2-1 mapping color word Stroop task (Chen et al., 2011). By mapping two color words to one finger (index finger, BLUE/GRAY; middle finger, GREEN/WHITE; ring finger, YELLOW/PURPLE), the source of conflict was parsed by presenting trials with color incongruent words that created stimulus (GREEN/WHITE) or response (and stimulus) conflict (YELLOW/GRAY; Chen et al., 2011). N450 amplitude was more negative for response incongruent than color congruent trials, but no different for stimulus incongruent and congruent trials. Based on these findings, the BWLS may be reflected as a modulation of the N450, with a larger Stroop effect for within than between language trials, since only within language distracters belong to the response set.

Finally, response set (and the BWLS) may modulate earlier ERP components than the N450 and conflict SP, in particular the N2 (Folstein and Van Petten, 2008). Although the conflict N2 has not been robustly elicited in a Stroop task (West et al., 2005), its amplitude increases with increasing magnitude of conflict on other tasks, like the Eriksen flanker task (Van Veen and Carter, 2002; Wendt et al., 2007). If the conflict N2 is sensitive to the degree of conflict on the bilingual Stroop task, then greater N2 amplitude might be expected for within than between language distracters. Alternatively, attention to response relevant information, or attentional set, specifically in word recognition tasks, has been shown to modulate N2 (or N200) amplitude with increased negativity for attention to orthographic features of a word (Ruz and Nobre, 2008; see also Grainger et al., 2006, for a similar component that is modulated by orthographic processes in a priming paradigm). The N2 has been modulated on bilingual tasks that focus attention on one language at a time or cause a switch between languages (Jackson et al., 2001; Rodriguez-Fornells et al., 2005). In addition, Proverbio et al. (2009) found that bilinguals can use orthographic information to distinguish between real and pseudo native language words (Italian) as early as 160–180 ms. Hence, the language of response relevant words in the bilingual Stroop task may be detected and processed early, reflected by modulation of the N2 (see Atkinson et al., 2003, for early perceptual effects in a Stroop task).

The current study used behavioral and electrophysiological measures to investigate how Spanish–English bilinguals process language and color congruence in a modified bilingual Stroop task across two experiments. Our central aims were to investigate (1) the unique contribution of language incongruence in the bilingual Stroop paradigm and (2) the temporal dynamics and neural correlates of cognitive control in balanced bilinguals while performing a bilingual Stroop task. In Experiment 1, we collected response time (RT) and error data across single and mixed language blocks to determine the pattern of within and between language effects for our sample (Spanish–English bilinguals) and to explore the possibility that balanced and unbalanced bilinguals use different strategies in mixed versus single language context to manage cross language interference. In Experiment 2, we collected ERP data using EEG to record brain activity while balanced bilinguals performed the single language blocks from Experiment 1 both overtly (for behavioral analysis) and covertly (for ERP analysis) to determine the source of the bilingual Stroop effect or BWLS.

Part I

Experiment 1

The primary goal of Experiment 1 was to determine the pattern of within- and between language Stroop effects in our sample population of Spanish–English bilinguals. We manipulated several variables that had been tested separately in previous studies to attempt to create a complete picture within the same individuals. First, researchers have been inconsistent in their method of categorizing their study population, which may account for the variability in observing the BWLS across studies (e.g., Rosselli et al., 2002). Here we use a battery of independent measures to categorize our participants into separate groups, as proficient balanced bilinguals and bilinguals with a dominant language. Based on previous findings, we expected to observe a BWLS for both groups, but predicted that language dominance would play a role in the size of the BWLS, with larger effects when reading the dominant than non-dominant language (Dyer, 1971). Alternatively, balanced bilinguals might not show a BWLS effect if the strength of the connections for words is equivalent between and within languages. Second, previous research has shown that performance can be affected by the presence of two languages simultaneously (mixed language blocks) compared to processing a single language (e.g., Christoffels et al., 2007). This may be due to the specific strategy adopted to cope with each stimulus type. We included both mixed and single language blocks to test the robustness of the BWLS. We predicted that the BWLS would be observed for both types of stimuli, but that the nature of the BWLS could vary. Specifically, interference in the form of slower RTs would be smaller during single than mixed language blocks, since the distracter language could be consistently inhibited. Finally, if the BWLS is the equivalent of a response set effect in monolinguals then color-naming times should always be slower for within language than between language trials.

Methods

Participants. Ninety-two Spanish–English bilinguals, recruited from the University of Texas at San Antonio (UTSA) and the University of Texas Health Science Center San Antonio (UTHSCSA), were paid for their participation. Data were excluded for 6 participants due to experimenter error or equipment failure and for 12 participants as outliers (±2 SD from the mean) based on RT (4), accuracy (2), language dominance (4), or age¹ (2). The remaining 74 participants (mean age 25.88 years, SD = 6.56, range = 18–46 years, and handedness: right = 70, left = 4) included 50 women and 24 men, 68 (91.9%) of whom reported being of Hispanic origin. All participants had normal or corrected-to-normal vision and reported no cognitive or physical impairments that could affect their performance on the task.

Language profiles. A total of 12 verbal fluency tests (VFT) were used to screen potential participants by phone; 1 min was given per test to name as many words as possible beginning with F, A, or S for English and P, T, or M for Spanish, or that fit into the categories of fruits, vegetables, or animals in each language. Proper names, repetitions, and variations of the same word were excluded; the number of remaining words was averaged for each language separately. Individuals with a minimum five-word average in the non-dominant language were subsequently tested on-site with a series of language measures. The 60-picture Boston naming test (BNT: Kaplan et al., 2001) was administered untimed in one language then the other. The order of languages tested on the VFT and BNT was counterbalanced across participants. The language history questionnaire (LHQ) assessed, for each language, the age of exposure, percent daily use and self-assessed ability in reading, writing, comprehension, and listening (measured on a scale of 1–7 with “beginner” at 1, “intermediate” at 4, and “native speaker” at 7). Finally, word-reading (color words in black font) and color-naming (color circles) times were measured in each language (random order per participant; one 40-trial block for each task/language with 10 presentations of each item). In addition to the language battery, participants completed a biographical questionnaire (e.g., age, ethnicity, and hearing and sight conditions) and an abridged version of the Edinburgh Handedness Inventory.

Boston naming test scores and reading and naming times were used as objective productive-language measures to group participants as balanced (N = 24) or unbalanced bilinguals (N = 50)². Participants were operationally defined as balanced bilinguals if they met at least two of the following three criteria: (1) a non-significant difference (t-test, p > 0.05) between Spanish and English reading times, (2) a non-significant difference between Spanish and English naming times, and (3) a difference of 10 points or less between their Spanish and English BNT scores. Unbalanced bilinguals performed better (i.e., faster, more accurately and named more pictures) in the same language on at least two of the three measures³. Table 1 shows performance on the language measures for each group.
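The two-of-three grouping rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring procedure: the function name `is_balanced`, its parameters, and the use of precomputed t-test p-values as inputs are assumptions made for the example, and the sketch does not check the additional requirement that unbalanced bilinguals favor the same language across measures.

```python
def is_balanced(reading_p, naming_p, bnt_spanish, bnt_english,
                alpha=0.05, bnt_margin=10):
    """Return True if at least two of the three balance criteria are met.

    reading_p / naming_p: hypothetical p-values from t-tests comparing
    Spanish and English word-reading and color-naming times.
    bnt_spanish / bnt_english: Boston naming test scores (0-60).
    """
    criteria = [
        reading_p > alpha,                             # (1) reading times do not differ
        naming_p > alpha,                              # (2) naming times do not differ
        abs(bnt_spanish - bnt_english) <= bnt_margin,  # (3) BNT scores within 10 points
    ]
    return sum(criteria) >= 2


# Illustrative (invented) participant: reading times equivalent,
# naming times different, BNT scores 5 points apart.
print(is_balanced(reading_p=0.30, naming_p=0.02, bnt_spanish=52, bnt_english=47))
```

Under this sketch, a participant who meets fewer than two criteria would simply be labeled "not balanced"; deciding which language is dominant would require the direction of each difference as well.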

Table 1. Language profile means (SD), for balanced (Experiments 1 and 2, N = 24) and unbalanced (Experiment 1, N = 50) bilinguals.

Materials and procedure. Qualified participants read and signed a consent form under the guidelines of UTSA’s and UTHSCSA’s Institutional Review Boards for Human Subject Research, after which they sat approximately 55′′ away from a 19′′ color CRT monitor and named the font color of capitalized centered half-inch tall color words (GREEN, BLUE, YELLOW, RED, VERDE, AZUL, AMARILLO, ROJO). Each color word appeared equally in each of the four font colors (green, blue, yellow, red). Stimuli were randomized and presented on a light gray background using E-Prime software (Psychological Software Tools, Inc., Pittsburgh, PA, USA). Each trial started with the presentation of three fixation crosses (“+++”; randomly 500–750 ms duration, with 200 ms blank screen ISI), followed by the stimulus (150 ms duration with 200 ms blank screen ISI; per Liotti et al., 2000), then a single fixation cross (“+”) which remained on the screen until a verbal response was detected by the integrated voice-key of a PST serial response box by way of an external microphone (Psychological Software Tools, Inc., Pittsburgh, PA, USA). An additional microphone and digital recorder collected verbal responses for accuracy analyses.

A total of 8 blocks were presented, consisting of 96 trials each (768 total trials). In each block, half of the words were color congruent (CC, e.g., “RED” written in red) and half were color incongruent (CI, e.g., “BLUE” written in red); see Table 2 for sample stimuli. Naming language was held constant across an entire block with four blocks named in English, four in Spanish (naming language order was randomized per participant). Four blocks were presented in a single language (SL, two blocks of Spanish color words and two of English color words) and four in mixed languages (ML, Spanish and English color words in the same block). To manipulate language, half of the trials in mixed language blocks, and half of the single language blocks, were printed in the same language as the naming language (language congruent trials, LC), and half were not (language incongruent trials, LI). An equal number of trials were presented in each minimal contrast (e.g., ML–LC–CC versus SL–LC–CC). Each block was preceded by a short practice session that informed the participant in which language to name the font colors. The inter-block interval lasted no longer than 5 min and the entire session lasted approximately 1.5 h.
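The trial counts implied by this design can be checked with a short Python sketch. It assumes a fully balanced crossing that the text does not spell out: for each naming language, one language congruent and one language incongruent single language block plus two mixed language blocks. The names `build_blocks` and `cell_counts` are hypothetical helpers introduced only for this illustration.

```python
from collections import Counter
from itertools import product

TRIALS_PER_BLOCK = 96


def build_blocks():
    """List the 8 blocks under the assumed balanced crossing."""
    blocks = []
    for naming in ("English", "Spanish"):
        # Single language (SL) blocks: written language is fixed per block,
        # so each block is entirely LC or entirely LI.
        for lang_cong in ("LC", "LI"):
            blocks.append({"type": "SL", "naming": naming, "lang": [lang_cong]})
        # Mixed language (ML) blocks: LC and LI trials are interleaved.
        for _ in range(2):
            blocks.append({"type": "ML", "naming": naming, "lang": ["LC", "LI"]})
    return blocks


def cell_counts(blocks):
    """Count trials per (block type, language congruence, color congruence)."""
    counts = Counter()
    for block in blocks:
        cells = list(product(block["lang"], ("CC", "CI")))
        per_cell = TRIALS_PER_BLOCK // len(cells)  # 48 in SL blocks, 24 in ML blocks
        for lang_cong, color_cong in cells:
            counts[(block["type"], lang_cong, color_cong)] += per_cell
    return counts


counts = cell_counts(build_blocks())
for cell, n in sorted(counts.items()):
    print(cell, n)
print("total trials:", sum(counts.values()))
```

Under these assumptions each of the eight minimal-contrast cells (for example ML–LC–CC and SL–LC–CC) receives the same number of trials, which is consistent with the equal-cell statement above.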

Table 2. Sample stimuli.

Results

Error trials and accurate RTs were analyzed for each group separately. RTs in milliseconds were measured from the onset of the visual word to detection of the voice response (Balanced Bilinguals, M = 375.60, SD = 94.25; Unbalanced Bilinguals, M = 351.96, SD = 101.25). RTs more than ±2 SD away from the condition means and all response errors (defined as wrong font color response, wrong language response, or unintelligible response) were excluded from RT analyses. For balanced bilinguals, a 2 Block Type (single language, mixed language) × 2 Naming Language (English, Spanish) × 2 Color Congruence (congruent, incongruent) × 2 Language Congruence (congruent, incongruent) repeated-measures ANOVA was used. Since unbalanced bilinguals had a known dominant language in which they were expected to perform better, and that language was not always the same across participants, we collapsed across Naming Language to create a level of Language Dominance (dominant, non-dominant) in the ANOVA design. All planned contrasts were Bonferroni adjusted for multiple comparisons. When a Color Congruence × Language Congruence interaction was found, additional paired samples t-tests were conducted to evaluate the Stroop effect size (color incongruent minus color congruent trials) of within and between language interference (when naming and written languages were congruent and incongruent, respectively).
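The trimming rule and the Stroop effect size described above can be illustrated with a minimal Python sketch. It is not the authors' analysis pipeline: the helper names (`trim`, `stroop_effect`, `bwls`) are hypothetical, the sample standard deviation is an assumption (the text does not say which estimator was used), and the RT values in the example are invented for illustration.

```python
from statistics import mean, stdev


def trim(rts, n_sd=2.0):
    """Drop RTs more than n_sd sample SDs from the condition mean."""
    m, s = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= n_sd * s]


def stroop_effect(ci_rts, cc_rts):
    """Stroop effect size: mean color incongruent minus mean color congruent RT."""
    return mean(trim(ci_rts)) - mean(trim(cc_rts))


def bwls(within, between):
    """Between-within language Stroop difference (within minus between effect)."""
    return (stroop_effect(within["CI"], within["CC"])
            - stroop_effect(between["CI"], between["CC"]))


# Invented RTs (ms) for one participant, for illustration only.
within = {"CI": [400, 410, 390], "CC": [300, 310, 290]}
between = {"CI": [370, 380, 360], "CC": [300, 310, 290]}
print(stroop_effect(within["CI"], within["CC"]))
print(bwls(within, between))
```

A positive `bwls` value in this sketch corresponds to a larger Stroop effect within than between languages; the paired samples t-tests reported in the text compare these per-participant effect sizes across conditions.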

Unbalanced bilinguals.

Error analyses. Overall, unbalanced bilinguals made more errors on color incongruent than congruent trials [M = 3.5%, SD = 2.4% versus M = 0.7%, SD = 0.7%; F(1, 48) = 86.174, p < 0.001], and more errors on language congruent than incongruent trials [LC; M = 5.7%, SD = 0.8% versus LI; M = 1.7%, SD = 1.3%; F(1, 26) = 18.580, p < 0.001], Figure 1. Although there was a significant Color Congruence effect for both Within and Between language conditions (p < 0.001), the effect was significantly larger for language congruent than language incongruent trials; F(1, 48) = 22.087, p = 0.0001. Effects of Block Type and Language Dominance were not significant.

Figure 1. Main effects of color congruence and language congruence from Experiment 1. Mean proportion of incorrect responses and mean response times in milliseconds reported for each group: **p ≤ 0.001; *p ≤ 0.050; nsd, non-significant differences.

Response times analyses. Response times in milliseconds were analyzed for accurate trials only (M = 96.43%, SD = 2.33%). As expected, a robust Color Congruence effect was observed, with faster naming times on color congruent than incongruent trials [M = 309.73, SD = 97.42 versus M = 394.20, SD = 107.27; F(1, 49) = 361.458, p < 0.001], Figure 1. In addition, faster naming times were observed overall for language incongruent compared to congruent trials [M = 348.62, SD = 99.54 versus M = 355.30, SD = 103.97; F(1, 49) = 5.185, p = 0.027], and marginally faster naming times were observed for single than mixed language trials [M = 348.58, SD = 101.70 versus M = 355.34, SD = 102.25; F(1, 49) = 3.882, p = 0.054]. These main effects were qualified by a Color Congruence by Language Congruence interaction, F(1, 49) = 32.078, p < 0.001, and by interactions involving Block Type: Color Congruence by Language Congruence by Block Type, F(1, 49) = 7.173, p = 0.010, and Language Congruence by Block Type, F(1, 49) = 33.042, p < 0.001, but not Color Congruence by Block Type. Analyses focusing first on the Color Congruence effect then the Language Congruence effect explain the source of these interactions.

The Color Congruence effect was observed both within and between languages (p < 0.001), but the effect was significantly larger (i.e., a larger difference between color congruent and incongruent trials) on language congruent (within language) than language incongruent trials [between languages; M_diff = 98.97, SD = 43.74 versus M_diff = 69.97, SD = 26.76, t(49) = 5.664, p = 0.001]. This classic between- versus within language Stroop effect difference, or BWLS, was present for both mixed- and single language presentation (p < 0.005), but was larger for mixed language trials [t(49) = 2.678, p = 0.010], Figure 2. Language congruent trials were slower than language incongruent trials only during single- (p < 0.001), and not mixed language presentation. Planned contrasts revealed an interesting pattern in the simple effects. The effect of Language Congruence for single language trials was carried by the color incongruent trials, Table 3. There was no effect of language congruence when color was congruent, but when color was incongruent, language congruent trials were significantly slower than language incongruent trials (p < 0.001), indicating that interference from the color incongruent distracter word was greater for the response relevant language. In contrast, for mixed language trials, there was an effect of language congruence both when color was congruent and incongruent, but the effects were opposite of each other, Figure 2.
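The contrast described above can be sketched in a few lines of Python. The example below uses invented per-participant condition means (not data from this study) to compute the within language and between language Stroop effects, their difference (the BWLS), and a paired t statistic. All function and variable names are illustrative assumptions rather than the authors' analysis code.

```python
import math
from statistics import mean, stdev

def stroop_effects(cclc, cilc, ccli, cili):
    """Return (within, between, bwls) Stroop effects in ms for one participant."""
    within = cilc - cclc    # language congruent: color incongruent minus congruent
    between = cili - ccli   # language incongruent: color incongruent minus congruent
    return within, between, within - between

def paired_t(xs, ys):
    """Paired-samples t statistic and its degrees of freedom."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical mean RTs (ms) per participant: (CCLC, CILC, CCLI, CILI)
participants = [
    (300.0, 400.0, 305.0, 375.0),
    (320.0, 410.0, 318.0, 390.0),
    (290.0, 395.0, 300.0, 372.0),
    (310.0, 402.0, 312.0, 380.0),
]
effects = [stroop_effects(*p) for p in participants]
within = [e[0] for e in effects]
between = [e[1] for e in effects]
mean_bwls = mean(e[2] for e in effects)
t, df = paired_t(within, between)
print(f"mean BWLS = {mean_bwls:.2f} ms, t({df}) = {t:.2f}")
```

A positive BWLS in this sketch corresponds to a larger Stroop effect within than between languages, the pattern reported here.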

FIGURE 2

Figure 2. Mean response times in milliseconds showing the interaction between color congruence and language congruence by block type and group from Experiment 1. Results are presented for unbalanced bilinguals (UB) for dominant (A,D) and non-dominant (B,E) naming languages separately and for balanced bilinguals (BB) collapsed across naming language (C,F). Panels A–C show results for blocks of stimuli presented in a single written language, collapsed across Spanish and English; panels D–F show results for stimuli presented alternately in Spanish and English in the same block. In all six plots, the effect of color congruence was significant at p ≤ 0.001 and this effect was significantly larger within than between languages at p ≤ 0.05. All other effects noted: **p ≤ 0.001; *p ≤ 0.050; nsd, non-significant differences. CC, color congruent; CI, color incongruent; LC, language congruent; LI, language incongruent.

TABLE 3

Table 3. Simple effects means (SD) in milliseconds.

Specifically, when color was congruent, language congruent trials were significantly faster than language incongruent trials (CCLC versus CCLI, p < 0.001), but when color was incongruent, language congruent trials were significantly slower than language incongruent trials (CILC versus CILI, p < 0.001). The language incongruent trials were slower overall during mixed than single language presentation (CCLI, p < 0.001; CILI, p < 0.004), indicating that the language of the distracter words caused more interference during mixed language presentation. The possible effect of strategy and processing of non-response set words is discussed below.

Finally, with regard to naming language, unbalanced bilinguals were faster overall when responding in their dominant than in their non-dominant language [M = 333.97, SD = 101.70 versus M = 369.96, SD = 104.46; F(1, 49) = 43.008, p = 0.001]. The effect of color congruence was modulated by language dominance [Color Congruence by Dominant Language, F(1, 49) = 7.535, p = 0.008; Color Congruence by Dominant Language by Block Type, F(1, 49) = 4.516, p = 0.039]. During mixed language presentation, the Color Congruence effect was the same whether naming in the dominant or non-dominant language; conversely, the effect of language dominance was the same for both color congruent and incongruent trials. However, during single language presentation, the Color Congruence effect was larger when naming in the non-dominant and reading the dominant language than vice versa; conversely, the difference between the dominant and non-dominant response languages was greater for color incongruent than color congruent trials [t(49) = 3.52, p = 0.001].

No other effects were significant.

Balanced bilinguals.

Error analyses. Data from 26 balanced bilinguals were included in the error analyses. One participant did not have complete accuracy data due to a voice-recording error on the last block of trials. Based on this individual’s percent errors on the other blocks (4.9%), we estimate that approximately 5 error trials were not accounted for here and were included in the RT analyses.

Overall, balanced bilinguals made more errors on color incongruent than congruent trials [M = 5.2%, SD = 4.45% versus M = 0.7%, SD = 0.71%; F(1, 23) = 24.311, p < 0.001], and more errors on language congruent than incongruent trials [M = 3.7%, SD = 3.09% versus M = 2.4%, SD = 2.07%; F(1, 23) = 13.725, p = 0.001], Figure 1. Although there was a significant Color Congruence effect for both Within and Between language conditions (p < 0.001), the effect was significantly larger for language congruent than language incongruent trials [M_diff = 5.7%, SD = 1.17% versus M_diff = 3.6%, SD = 0.74%; F(1, 23) = 16.695, p < 0.001].

There were no main effects of Naming Language or Block Type. These factors did, however, interact: Naming Language × Block Type, F(1, 23) = 6.425, p = 0.019; Block Type × Naming Language × Language Congruence, F(1, 23) = 4.652, p = 0.042. These effects are consistent with a speed–accuracy trade-off when naming in Spanish (see RTs below).

Response time analyses. Response times in milliseconds were analyzed for accurate trials only (M = 95.39% of total trials, SD = 3.21%; see text footnote 4). As with unbalanced bilinguals, balanced bilinguals showed a robust effect of Color Congruence, with faster naming times on color congruent than incongruent trials [M = 332.08, SD = 94.22 versus M = 424.13, SD = 98.95; F(1, 23) = 289.33, p = 0.001]. There was no main effect of Language Congruence, but Color Congruence and Language Congruence interacted, F(1, 23) = 14.257, p = 0.001. As with the error data, although a Color Congruence effect was observed both within and between languages (p < 0.001), the effect was larger on language congruent (within language) than incongruent trials [between languages; M_diff = 100.50, SD = 29.04 versus M_diff = 83.60, SD = 28.33; t(23) = 3.776, p = 0.001], see Table 3 and Figure 2.

There was no main effect of Block Type, and no interaction between Block Type and Color Congruence, or Block Type, Color Congruence, and Language Congruence, indicating that, contrary to unbalanced bilinguals, this within- versus between language difference on the color congruence effect was not larger during mixed- than single language presentation, Figure 2.

However, similar to unbalanced bilinguals, a Block Type by Language Congruence interaction revealed a trend for faster naming times on language incongruent than congruent items [M = 365.51, SD = 103.34 versus M = 382.21, SD = 93.29; F(1, 23) = 9.693, p = 0.005] on single language trials; language incongruent took longer than language congruent items on mixed language trials (M = 387.35, SD = 96.53 versus M = 377.37, SD = 100.71; p = 0.047), see Figure 2. No other interactions with Block Type reached significance.

Although the participants were considered balanced in their two languages based on performance on the language measures (see Table 1), naming times were faster overall in Spanish⁴ than English [M = 358.14, SD = 98.66 versus M = 389.58, SD = 100.31; F(1, 23) = 12.423, p = 0.002]. There were no significant interactions with Naming Language.

Discussion

The primary goal of Experiment 1 was to determine the pattern of within- and between language Stroop effects in our sample population of Spanish–English balanced and unbalanced bilinguals. In brief, we observed the classic Stroop effect, with longer RTs for color incongruent than congruent trials. This effect was observed both when the naming and reading languages were the same (within language) and when they were different (between language). In addition, we observed a larger Stroop effect within than between languages (the bilingual Stroop effect, or BWLS), which was present across all conditions, regardless of group, block type, or naming language (Figure 2). We discuss the BWLS effect in detail, beginning with naming language and block type effects for each group separately.

The pattern of Stroop effects was very similar for both groups of bilinguals. The primary difference between the groups was a larger Stroop effect for unbalanced bilinguals when naming in the non-dominant language – showing more cross language interference from reading the dominant than non-dominant language. Balanced bilinguals showed the same pattern in both languages. These findings are consistent with previous research (Dyer, 1971) and can be explained by a difference in automaticity of access to the words in each language based on dominance (Cohen et al., 1990; Kroll and Stewart, 1994). Interestingly, the language dominance effect was observed only for single language blocks, and disappeared on mixed language trials. This pattern reflects a differential mixing cost across the groups driven by the distracter language. Although naming was performed in a single language in the current study, unbalanced bilinguals exhibited a mixing cost in line with Christoffels et al. (2007), who observed mixing costs for German–Dutch unbalanced bilinguals on a picture-naming task, with longer RTs for mixed than single language trials. Perhaps the language dominance effect disappears in unbalanced bilinguals, because they experience more interference when naming between languages on mixed language trials, where reading both languages prevents one from becoming fully active as in the single language case.

Bilingual word recognition models, such as BIA+ (Dijkstra et al., 1998; Green, 1998; Dijkstra and Van Heuven, 2002), assume that some form of inhibition is required to allow one language to surface as the target (for an alternative view see the WEAVER++ model, Roelofs, 2003, 2010; Lamers et al., 2010). For bilinguals with asymmetric language dominance, stronger inhibition is required to keep the dominant language in check when operating in the weaker language, which in turn requires more effort to overcome in order to access the dominant language again. During single language presentation, the need to inhibit the distracter words on between language trials presents an asymmetric problem biased toward more interference from the distracters when naming in the non-dominant language. However, during mixed language presentation, the need to inhibit distracters from the stronger language is present both when naming in the dominant and non-dominant languages. Thus, the powerful effect of language dominance disappears when the languages are presented together.

An alternative explanation for the slower naming times on mixed than single language trials could be a cost from switching languages from trial to trial, in line with the idea that a language switch reverses activation and inhibition patterns in the languages (e.g., BIA+ or Green Inhibitory control model; Jackson et al., 2001; Moreno et al., 2002; Hernandez, 2009; Midgley et al., 2009). However, analyses of variance showed no difference in naming times between switch and non-switch trials in the mixed language blocks for either group, and switching did not interact with response language (no switch-cost asymmetry). Hence, the difference in the Stroop effect between mixed and single language blocks may be due to the mere presence of both languages, rather than switching costs per se. Activation and inhibition of the non-target language will be tested further in Experiment 2.

Despite these group differences, the presence of a between language Stroop effect across all conditions (groups, blocks, naming language) indicates that the words from the non-target language consistently cause interference, in line with our bilinguals performing in a “bilingual mode” (Grosjean, 1998) and contrary to findings that bilinguals can ignore the irrelevant language (Rodriguez-Fornells et al., 2002). The second and key finding from Experiment 1 was the presence of the bilingual Stroop effect or BWLS across all conditions. As discussed above, it has been proposed that the BWLS is simply a response set effect, equivalent to the effect observed in monolinguals (Roelofs, 2003, 2010; Goldfarb and Tzelgov, 2007). Bilinguals are thought to treat the color words in the other language as response-irrelevant, similar to irrelevant words in the same language, because they are not actively producing those words on a given block of trials. The BWLS arises from response conflict, but the source of the conflict may arise at output or at higher levels of processing (Cohen et al., 1990; Roelofs, 2003, 2010). To look for response set effects, it was necessary to look at the Stroop data in an unconventional way; rather than look for color-Stroop effects across languages, we looked at the effect of language in the presence or absence of color interference.

Figure 2 shows that although there was a BWLS in all conditions, the exact pattern of effects varied within each group by block type and naming language. This pattern provides only partial support for the response set explanation, under which two things should be true. First, color congruent items should be named faster for the response relevant than the irrelevant language, due to the converging information in the color and word channels. This was observed consistently during mixed language presentation, regardless of language dominance (Figures 2D–F), indicating that the language of the distracter word can elicit naming interference in the absence of color interference (i.e., the word BLUE in blue versus the word AZUL in blue). However, this was not true during single language presentation (Figures 2A–C). In the absence of color-Stroop interference (color congruent trials – CC) there was an effect of language congruence only for unbalanced bilinguals when naming in their dominant language (Figure 2A). In this case, language congruent items were named slower than language incongruent items⁵. This interaction indicates that during mixed language presentation, the language of the distracter word can elicit interference in the absence of color interference (i.e., the word BLUE in blue versus the word AZUL in blue), which argues against a simple response set effect (Roelofs, 2003; Goldfarb and Tzelgov, 2007) or that the task-irrelevant language can be ignored (Rodriguez-Fornells et al., 2002). This may be due to the strength of the connections for the weaker language (e.g., Cohen et al., 1990), such that even processing a fully congruent word in the dominant language leads to slower color-naming times compared to reading a weaker cross language equivalent. However, the fact that there was no difference between language congruent and incongruent items for balanced and unbalanced bilinguals reading their dominant language (Figures 2B,C) indicates that response set did not play a role on color congruent trials. Overall, these effects suggest that bilinguals are able to control interference from the irrelevant language during single language presentation, perhaps through inhibitory mechanisms, but do less well when distracters are presented in both languages.

Second, if the BWLS is a response set effect then color incongruent items should be named slower for the response relevant than irrelevant language. This was true during single language presentation (Figures 2A–C), where there was consistently more interference from within language distracters (CILC) than between language distracters (CILI) regardless of naming language and in both groups. However, during mixed language presentation this difference was present only for unbalanced bilinguals naming in the dominant language (Figure 2D) and marginal (Figure 2E) or absent (Figure 2F) when reading a proficient language. In particular, for balanced bilinguals the source of the BWLS during single language presentation was greater interference within than between languages on color incongruent trials, but during mixed language presentation was caused by a language effect on color congruent trials and the absence of a language congruence effect on color incongruent trials. Therefore, although the magnitude of the BWLS was the same across blocks, the cause of the BWLS appears to be quite different. This may again indicate that the mere presence of both languages on mixed language blocks makes inhibiting words from the non-target language more difficult.

In brief, the results from Experiment 1 indicate that both balanced and unbalanced bilinguals were unable to ignore the task-irrelevant language (Rodriguez-Fornells et al., 2002), and that a simple response set effect does not fully account for the BWLS (e.g., Roelofs, 2003; Goldfarb and Tzelgov, 2007). The goal of Experiment 2 was to identify the electrophysiological correlates for the bilingual Stroop task in order to delineate what type of activity drives the BWLS, and the Stroop effect more generally, and at what stage of processing it occurs.

PART II: Electrophysiological Correlates for the Bilingual Stroop Effect

Experiment 2

Experiment 2 was designed to uncover the cognitive and neural correlates of the bilingual Stroop effect. To make this initial ERP analysis of the BWLS feasible, we chose to begin exploring this question with balanced bilinguals during single language presentation, given that language dominance in the unbalanced bilinguals played a role in both the language and color congruence effects, and to isolate the BWLS effect in the absence of any mixing effects. Future studies are planned to explore the nature of the mixing effect and the effect of language dominance on the ERP BWLS. Thus, ERPs were recorded while balanced bilinguals performed the single language bilingual Stroop task from Experiment 1, naming the colors of color words first overtly then covertly. RT and accuracy from overt naming trials and ERPs from covert naming trials are presented herein.

The monolingual ERP literature does not provide clear predictions for the ERP correlates of the BWLS, and often does not align with the debate over the source of the BWLS in the behavioral literature. However, we predicted that, consistent with the monolingual ERP Stroop literature, color congruence would modulate the N450 (Liotti et al., 2000; West et al., 2004, 2005; Chen et al., 2011). Based on the assumption that the N450 reflects response conflict, it would be present for within but not between language trials. The N2, which indexes response inhibition on both non-language (Liotti et al., 2007; Pliszka et al., 2007) and language tasks (Jackson et al., 2001; Rodriguez-Fornells et al., 2005), would likely show more negative amplitude for language incongruent than congruent trials. Finally, since the late SP is thought to index general conflict reprocessing (West, 2003), we predicted both color and language congruence effects on this component.

Methods

Participants. Participants were recruited from the UTSA and UTHSCSA general populations. Screening procedures were the same as for balanced bilinguals in Experiment 1 (see Table 1). Thirty Spanish–English right-handed balanced bilinguals were paid for their participation. Data from 6 participants were excluded due to excessive EEG artifact (4), recording error (1), or task performance error (1). The remaining 24 participants (age range 18–35 years; M = 25 years, SD = 4.76) included 21 women and 3 men, all reportedly of Hispanic origin. Twelve participants (50%) previously participated in Experiment 1. Inclusion criteria on the language measures were the same as for balanced bilinguals in Experiment 1. All participants had normal or corrected-to-normal vision and reported no cognitive or physical impairments that could affect task performance.

Materials and procedure. The stimuli and paradigm were similar to Experiment 1 for the single language blocks only, with a few methodological changes. First, naming on the critical ERP trials was silent (covert). Second, two measures were used to ensure naming language and performance accuracy. An overt naming block preceded each covert naming block in the same language, and eight probe trials were included in the covert blocks. These trials were underlined color words cuing the participant to name that trial aloud. Third, the fixation cross that appeared after each word remained on the screen for 1000 ms before the onset of the next trial, see Figure 3. Participants were asked to refrain from blinking during this time to avoid eye movement artifact in the EEG.

FIGURE 3

Figure 3. The timing of paradigm events in Experiment 2, overlaid on grand average ERPs at electrode MiCe (vertex) time-locked to the onset of the visually presented words.

As in Experiment 1, the covert naming trials consisted of four single language blocks, two in Spanish and two in English (language order was randomized across subjects), for a total of 384 critical trials (equal number of randomly presented trials per condition and color in each block). An E-Prime coding error resulted in a loss of 4 trials of CCLI and 12 trials of CILI when naming in Spanish; thus, pairwise analyses of conditions were performed with trials collapsed across English and Spanish. For each language, 1 block was named in the same language as the written words (language congruent) and 1 block in the incongruent language.

Participants read and signed a consent form under the guidelines of the UTSA and UTHSCSA Institutional Review Board for Human Subject Research. Participants were fitted with EEG electrodes and sat in a sound attenuating, RF shielded chamber approximately 55′′ away from a 19′′ color CRT monitor. Participants were allowed to take breaks between blocks; no single break lasted longer than 5 min. The entire ERP session lasted approximately 2.5 h.

EEG recording. Continuous scalp-recorded EEG was acquired using a geodesic array of 26 pre-amplified sintered Ag–AgCl electrodes embedded in a custom electrode cap (Electro-Cap International Inc.). Additional electrodes were placed below and at the outer canthi of the left and right eyes to record blinks and eye movement respectively, and on the left and right mastoid processes to serve as offline reference. Preamplifiers in each electrode reduced induced noise between the electrode and the amplification/digitization system (BioSemi ActiveTwo, BioSemi B.V., Amsterdam), allowing high electrode impedances. Electrode offsets were kept below 40 mV. A first-order analog anti-aliasing filter with a half-power cutoff at 3.6 kHz was applied (see www.biosemi.com). The data were sampled at 512 Hz (2048 Hz with a decimation factor of 1/4) with a bandwidth of DC to 134 Hz, using a fifth order digital sinc filter. Each active electrode was measured online with respect to a common mode sense (CMS) active electrode producing a monopolar (non-differential) channel, and was referenced offline to the average of the left and right mastoids⁶. Data were processed using BrainVision Analyzer 2 (Brain Products GmbH, Munich). Non-causal Butterworth digital filters were applied with a low cutoff at 0.1 Hz (12 dB/oct) and high cutoff at 30.0 Hz (12 dB/oct). The EEG data were segmented in intervals of 1000 ms time-locked to stimulus onset, followed by DC local detrend for 100 ms blocks (Hennighausen et al., 1993) and baseline correction using −100 to 0 ms prestimulus.
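As a rough illustration of the final step described above, the sketch below baseline-corrects a single-channel epoch by subtracting the mean of the −100 to 0 ms prestimulus interval, at the 512 Hz rate reported in the text. It is not the BrainVision Analyzer implementation: the function name, the list-based epoch layout (prestimulus samples first), and the toy signal are assumptions, and filtering and detrending are omitted.

```python
SAMPLING_RATE_HZ = 2048 // 4   # 2048 Hz decimated by a factor of 4 gives 512 Hz

def baseline_correct(epoch_uv, prestim_ms=100, srate=SAMPLING_RATE_HZ):
    """Subtract the mean prestimulus voltage from every sample of an epoch.

    epoch_uv: voltages in microvolts; the first prestim_ms of the list are
    assumed to precede stimulus onset (hypothetical layout).
    """
    n_base = round(prestim_ms * srate / 1000)   # number of baseline samples
    baseline = sum(epoch_uv[:n_base]) / n_base
    return [v - baseline for v in epoch_uv]

# Toy epoch: a constant 2 uV offset before onset, then a 7 uV plateau.
n_pre = round(100 * SAMPLING_RATE_HZ / 1000)
epoch = [2.0] * n_pre + [7.0] * 100
corrected = baseline_correct(epoch)
```

After correction, the prestimulus samples in this toy epoch sit at zero and the post-stimulus plateau is expressed relative to that baseline.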

Prefrontal channels were removed from analyses due to excessive artifacts restricted to those channels. The remaining 21 channels were processed using the following artifact rejection measures: maximum step of 75 μV/ms to capture voltage spikes, maximum amplitude difference of 150 μV/200 ms to capture signal drift, maximum amplitude of ±70 μV to capture blinks, and minimum amplitude difference of 0.5 μV/50 ms to capture flat lining and saccades. Only participants who retained 70% or more of the critical trials were included in the averages. The mean proportion of trials lost to artifact or error was 14.17%. Average waveforms were calculated for each condition time-locked to the onset of each word.
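The four rejection criteria listed above can be approximated in plain Python as follows. This is a sketch under stated assumptions, not the procedure actually run in BrainVision Analyzer: the function names, the sample-by-sample sliding windows, and the test signals are hypothetical, and the thresholds are simply the values given in the text applied to one channel of one epoch.

```python
import math

SRATE_HZ = 512  # sampling rate reported in the text

def window_ranges(epoch, win_ms, srate=SRATE_HZ):
    """Yield max-minus-min voltage for each sliding window of win_ms."""
    n = max(2, round(win_ms * srate / 1000))
    for start in range(0, len(epoch) - n + 1):
        chunk = epoch[start:start + n]
        yield max(chunk) - min(chunk)

def reject_epoch(epoch, srate=SRATE_HZ):
    """Return the list of criteria (possibly empty) that an epoch violates."""
    ms_per_sample = 1000 / srate
    reasons = []
    # Voltage step between neighbouring samples, expressed in uV per ms.
    steps = (abs(b - a) / ms_per_sample for a, b in zip(epoch, epoch[1:]))
    if any(s > 75 for s in steps):
        reasons.append("step")
    if any(r > 150 for r in window_ranges(epoch, 200, srate)):
        reasons.append("drift")
    if any(abs(v) > 70 for v in epoch):
        reasons.append("amplitude")
    if any(r < 0.5 for r in window_ranges(epoch, 50, srate)):
        reasons.append("low_activity")
    return reasons

# Hypothetical single-channel epochs (1 s at 512 Hz, values in uV).
clean = [10 * math.sin(2 * math.pi * 10 * i / SRATE_HZ) for i in range(512)]
blink = clean.copy()
blink[200] = 90.0          # one sample exceeding the +/-70 uV limit
flat = [0.0] * 512         # a flat-lined channel
print(reject_epoch(clean), reject_epoch(blink), reject_epoch(flat))
```

In practice an epoch would be discarded if any retained channel returned a non-empty list; that aggregation rule is also an assumption here.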

Results

Behavioral responses for overt naming trials. To determine the pattern of behavioral effects for the participants in Experiment 2, naming errors and RTs in milliseconds for the overt naming trials were analyzed using the same procedure as for balanced bilinguals in Experiment 1. As in Experiment 1, color incongruent trials elicited more errors than color congruent trials [M = 5.7%, SD = 6.4% versus M = 1.3%, SD = 2.8%; F(1, 20) = 12.843, p = 0.002], and the color-Stroop effect was larger for language congruent than language incongruent trials [F(1, 20) = 5.091, p = 0.035], see Figure 4.

FIGURE 4

Figure 4. The interaction between color and language in the overt behavioral results (mean response times) and all three time windows of the covert ERP results (mean amplitude). Note that interactions were observed only for data in (A,D); only main effects of language congruence and color congruence were observed in data from (B; the difference between language congruent and incongruent stimuli trended at p = 0.122) and (C), respectively: **p ≤ 0.001; *p ≤ 0.050; nsd, non-significant differences.

Similarly, slower naming times were observed for color incongruent than congruent trials [M = 382.23, SD = 97.43 versus M = 306.15, SD = 94.60; F(1, 23) = 149.931, p < 0.001]. Unlike Experiment 1, the main effect of Language Congruence did reach significance, with slower naming times overall for language congruent than incongruent trials [M = 352.11, SD = 94.66 versus M = 336.27, SD = 97.59; F(1, 23) = 6.004, p = 0.022]. The Color Congruence effect was significantly larger within than between languages [M_diff = 85.29, SD = 37.11 versus M_diff = 66.88, SD = 30.60; F(1, 23) = 8.840, p = 0.007]. Naming times were again faster overall in Spanish than English [M = 328.27, SD = 99.98 versus M = 360.11, SD = 93.60; F(1, 23) = 15.583, p = 0.001].

Covert naming ERP results. Naming accuracy on probe trials for the covert naming blocks was at 95.4%, indicating that participants were performing the task correctly. Because responses were covert, we were unable to remove trials with naming errors. However, previous studies have shown equivalent ERP patterns from covert and overt performance on a Stroop task, supporting the validity of this task (Liotti et al., 2000). Inclusion of the few unknown error trials should not significantly affect the pattern of effects. All artifact free trials were included in the ERP analyses.

Overall, the ERP to each word was characterized by early sensory components – N1 and P2 – followed by two successive biphasic negative–positive deflections, with negative peaks at approximately 300 and 530 ms post-stimulus onset (note that the N400 that typically occurs to words is presumably suppressed due to the extensive repetition of each item), see Figure 5. Note that the ERP components of interest are overlaid on the visual onset and offset potentials to the fixation cross that follows the target word, see Figure 3. Visual inspection of the main effects of language and color congruence revealed two modulations with different timing. Language incongruent trials elicited more negativity than congruent trials starting approximately at 200 ms post-stimulus onset and ending before 500 ms, in line with the timing of the N2 (or N200) observed in the language literature, Figure 6A. Color incongruent trials elicited more negativity than congruent trials starting around 350 ms post-stimulus onset and resolving toward the end of the epoch, which is in line with the timing of the classic Stroop N450 in the early part of this deflection, Figure 6B. The effect after the N450 did not have the typical distribution or polarity shift reported in the literature for the conflict SP (e.g., West et al., 2005); hence, it is referred to herein simply as a sustained negativity (SN). However, previous findings support the dissociation of activity in these two time windows (West, 2003; Markela-Lerenc et al., 2004). Based on these contrasts, three separate time windows were selected for analyses: N2 (200–350 ms), N450 (350–550 ms), and SN (550–700 ms). Figure 4 plots the BWLS effects for mean amplitude in each time window.
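To make the mean amplitude measure concrete, the following sketch averages a single-channel waveform over the three analysis windows named above. The window boundaries and the 512 Hz sampling rate come from the text; the function names, the assumption that the epoch begins 100 ms before stimulus onset, and the step-shaped toy waveform are illustrative only.

```python
SRATE_HZ = 512
WINDOWS_MS = {"N2": (200, 350), "N450": (350, 550), "SN": (550, 700)}

def idx(ms, prestim_ms=100, srate=SRATE_HZ):
    """Sample index for a latency in ms relative to stimulus onset.

    Assumes (hypothetically) that the epoch starts prestim_ms before onset.
    """
    return round((ms + prestim_ms) * srate / 1000)

def mean_amplitude(epoch_uv, start_ms, end_ms):
    """Mean voltage (uV) between start_ms and end_ms after stimulus onset."""
    samples = epoch_uv[idx(start_ms):idx(end_ms)]
    return sum(samples) / len(samples)

def window_means(epoch_uv):
    """Mean amplitude in each named analysis window."""
    return {name: mean_amplitude(epoch_uv, lo, hi)
            for name, (lo, hi) in WINDOWS_MS.items()}

# Toy waveform: zero everywhere except a constant level inside each window.
toy = [0.0] * 512
for name, level in (("N2", -1.0), ("N450", -3.0), ("SN", -2.0)):
    lo, hi = WINDOWS_MS[name]
    for i in range(idx(lo), idx(hi)):
        toy[i] = level
print(window_means(toy))
```

Applied per participant, condition, and electrode, values of this kind would form the cells entered into a repeated-measures analysis.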

FIGURE 5

Figure 5. A bird’s eye view of the geodesic electrode array showing grand average ERPs. Voltage is plotted in microvolts on the y-axis with negative up; time is plotted in milliseconds on the x-axis with 0 ms marking the onset of the visually presented words and 100 ms tick marks.

FIGURE 6

Figure 6. Grand average ERPs for nine representative recording sites and spline-interpolated scalp topographies showing the three measured time windows for language congruence in (A) and color congruence in (B) (note that projection toward prefrontal channels is an estimate). Vertical gray lines mark the time windows used for analyses for the N2, N450, and sustained negativity (SN). Electrodes are labeled from left to right: left frontal, central, parietal, medial central (vertex), parietal and occipital, right frontal, central, parietal.

Mean amplitudes for each ERP component were subjected to repeated-measures ANOVAs with Naming Language (English, Spanish) × Color Congruence (congruent, incongruent) × Language Congruence (congruent, incongruent) × Electrode. Omnibus ANOVAs with 21 electrodes were used in each window, followed by ANOVAs including 16 electrodes for scalp distribution analyses, with factors of Hemisphere (left, right), Anteriority (frontal, central, occipital), and Laterality (medial, lateral). In addition, region of interest analyses were used as appropriate for each effect. Effects for repeated-measures with greater than one degree of freedom are reported after Greenhouse–Geisser correction; planned contrasts were Bonferroni adjusted for multiple comparisons.

N2 (200–350 ms). Figure 6 shows grand average ERPs at representative electrodes and a spline-interpolated scalp topography for the effect of language congruence. The omnibus ANOVA revealed a trend toward an effect of Language Congruence [F(1, 23) = 3.625; p = 0.070; Language Congruence by Electrode, F(20, 460) = 2.214; p = 0.062].

The distributional analysis revealed a Language Congruence by Laterality interaction [F(1, 23) = 4.521; p = 0.044] with a larger negativity for language incongruent than congruent trials that was significant at medial sites (p = 0.039) in planned contrasts. In post hoc analyses, data from medio-central and right-dorsal electrodes, which encompass the N2 distribution (LMFr, LMCe, RMFr, RMCe, RDFr, RDCe, MiCe, MiPa), were subjected to repeated-measures ANOVA. This confirmed that language incongruent trials elicited more negative amplitude than congruent trials over this region [Language Congruence, F(1, 23) = 5.820, p = 0.024].

N450 (350–550 ms). As expected, the omnibus ANOVA revealed a color-Stroop effect with a larger negativity for color incongruent than congruent trials, see Figure 6 [Color Congruence, F(1, 23) = 5.120, p = 0.033; Color Congruence by Electrode, F(20, 460) = 4.744, p = 0.001]. Distributional analyses revealed the color-Stroop effect was present only at medial sites (p = 0.003) across all levels of anteriority with the strongest effect at medial central sites (Frontal, p = 0.006; Central p = 0.002; Occipital, p = 0.013), [Color Congruence × Laterality, F(1, 23) = 15.806, p < 0.001, Color Congruence × Laterality × Anteriority, F(2, 46) = 3.384, p = 0.055].

Sustained negativity (550–700 ms). The omnibus ANOVA revealed a color-Stroop effect with larger negativity for color incongruent than color congruent trials [Color Congruence, F(1, 23) = 8.058, p = 0.009], and a significant interaction between Color Congruence and Electrode [F(20, 460) = 4.118, p = 0.014], Figure 6 ⁷. The distributional analysis yielded a Color Congruence by Laterality interaction that showed the effect to be present at medial, but not lateral recording sites [F(1, 23) = 6.927, p = 0.015], and a Color Congruence by Anteriority interaction which revealed an effect at Frontal and Central, but not Occipital sites [F(2, 46) = 5.017, p = 0.032].

The interaction between Color Congruence and Language Congruence trended toward significance, F(1, 23) = 3.717, p = 0.066. Figure 7 shows what appears to be an increased negativity as early as 400 ms for the within language Stroop effect compared to the between language Stroop effect. A sliding window analysis in 50 ms increments across the head revealed that both the between and within language Stroop effects were significant from 550 to 600 ms post-stimulus. Then the between language effect disappeared between 600 and 650 ms leading to a brief interaction between Color Congruence and Language Congruence [F(1, 23) = 5.046, p = 0.035], while the negativity for the Color Congruence effect within language continued through 700 ms.
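The sliding window procedure can be sketched as follows. This is a simplified illustration under stated assumptions: per-participant waveforms are plain lists sampled at known latencies, the function names `window_mean` and `sliding_window_t` are hypothetical, and each 50 ms window is tested with a paired t on the incongruent-minus-congruent difference rather than with the full ANOVA (including the interaction term) reported here.

```python
from math import sqrt
from statistics import mean, stdev


def window_mean(wave, times, start, stop):
    """Mean amplitude over samples whose latency t satisfies start <= t < stop (ms)."""
    return mean(v for v, t in zip(wave, times) if start <= t < stop)


def sliding_window_t(incong, cong, times, first=550, last=700, step=50):
    """Paired t for the Stroop difference in consecutive windows.

    `incong` and `cong` are hypothetical lists holding one waveform per
    participant (same participant order), sampled at latencies `times` (ms).
    Returns {(start, stop): t}; each t has n - 1 degrees of freedom.
    """
    results = {}
    for start in range(first, last, step):
        stop = start + step
        # Per-participant difference of window means (incongruent - congruent).
        diffs = [
            window_mean(i, times, start, stop) - window_mean(c, times, start, stop)
            for i, c in zip(incong, cong)
        ]
        n = len(diffs)
        results[(start, stop)] = mean(diffs) / (stdev(diffs) / sqrt(n))
    return results
```

Running the function separately on within and between language trials would give the two time courses compared in the text. The interaction in a given window corresponds to a paired test on the difference between those two Stroop differences.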

Figure 7. Difference ERPs (color incongruent minus congruent) for within and between language trials separately. Sliding window analysis in 50 ms increments revealed that through 600 ms both between and within language Stroop effects were significant and no different from each other, then from 600 to 650 ms (highlighted with gray vertical bars) there was a brief interaction between color and language congruence, where only the within language Stroop effect was present.

Discussion

The goal of Experiment 2 was to study the temporal dynamics, and the corresponding neural and cognitive correlates, of the bilingual Stroop. The findings have implications for explaining the Stroop effect, both for bilinguals and monolinguals. Our data speak to the suggestion that the bilingual Stroop effect reflects a response set effect. We discuss the implications of our findings after a brief summary.

A large N450 effect was observed for the color congruence manipulation, replicating monolingual findings. Color incongruent trials elicited larger negative amplitude than color congruent trials between 350 and 550 ms post-stimulus onset. This effect was the same within and between languages, indicating that the N450 was sensitive to color congruence regardless of whether the distracters were from the response set or not. Following the N450, there was an effect of color congruence, with larger SN amplitude for color incongruent than congruent trials. This effect was observed in the same time window as the conflict SP (550–700 ms post-stimulus onset), but did not share the typical distribution reported in monolingual studies (a sustained positivity over central–parietal scalp sites that reverses in polarity over lateral frontal sites; West, 2003; Markela-Lerenc et al., 2004). Finally, there was a language congruence effect at the N2 (200–350 ms), with greater negativity for between than within language trials. This effect was present at central and right frontal sites. The N2 was not modulated by color congruence.

A majority of monolingual Stroop ERP studies suggests that the N450 reflects response conflict and the SP reflects both response and stimulus level conflict. In particular, based on Chen et al. (2011), response-irrelevant items, such as between language distracters, should not elicit response conflict, and any form of conflict should elicit effects in the subsequent time window. We found the opposite pattern of effects. The N450 was not significantly modulated by language congruence, with a strong effect for both between and within language naming, while the SN was. If indeed the N450 reflects cognitive control related to response conflict, then our data indicate that color incongruent words created equal conflict and cognitive control demands regardless of whether they belonged to the response set or not. This is not to say that the N450 is completely insensitive to language congruence, or perhaps even to response set effects more generally. In fact, there appear to be hints of an interaction between color and language congruence, for example at vertex (MiCe) in Figures 3 and 5, although the interaction did not even approach significance at these locations (with p-values of 0.5–0.8 across the time window). Perhaps balanced bilinguals present a unique case in which the cross language lexical equivalents for the response set create response conflict at the N450. A critical test of this in future research would be to include words in both languages in line with the typical response set effects (e.g., PINK/ROSA), so that the degree of spread of activation between words within and between languages could be measured. Likewise, perhaps unbalanced bilinguals might show an N450 asymmetry across languages, with a larger effect for reading response relevant items in the dominant than non-dominant language – a testable question for future research.

Another characteristic of the N450 in this balanced bilingual sample is the broader distribution compared to monolinguals, which might reflect recruitment of additional neural substrates to process the dueling sources of interference (color and language) in the bilingual paradigm. There is growing evidence that bilinguals activate information in both of their languages even when using only one (Marian and Spivey, 2003; Kroll et al., 2006; Sunderman and Kroll, 2006; Duyck et al., 2007; Thierry and Wu, 2007). Consequently, to produce a word in the target language, bilinguals must inhibit the competing non-target language (Green, 1986, 1998; Meuter and Allport, 1999; Bialystok and Martin, 2004; Costa et al., 2006; Kroll et al., 2008; Hernández et al., 2010). Due to this demand, bilinguals may develop an inhibitory control mechanism that is specialized for language (Green, 1998) or domain-general (Roelofs et al., 2011) with benefits for inhibitory control on a variety of non-linguistic tasks, such as the Stroop, Simon, and card sorting tasks (Bialystok and Martin, 2004; Bialystok et al., 2004; Bialystok and Craik, 2010). Costa and Santesteban (2004) have suggested that benefits to executive control are moderated by proficiency across languages; while unbalanced bilinguals rely on inhibitory control to limit access, balanced bilinguals use a language-specific selection mechanism to control cross language interference. This suggestion is perhaps in line with Stroop performance in monolinguals, for whom a steady increase in the amount of Stroop interference is observed until attaining a third grade reading level (Comalli et al., 1962; Schiller, 1966), after which greater reading skill decreases the magnitude of the Stroop effect (Protopapas et al., 2007), reflecting gains in executive function and attentional control (Tzelgov et al., 1990). 
However, the between language N450 effect found in the current study suggests that the non-target language continues to be processed (beyond the N2), even on a task that does not require more than word form processing (c.f., Rodriguez-Fornells et al., 2002), and even for a response set that has minimal cross language orthographic overlap. Hence, the presence of an N450 Stroop effect both between and within languages lends support for non-selective activation of both languages in balanced bilinguals.

The results also reveal that language membership information is processed prior to the N450 – specifically at the N2. The N2 is thought to be a complex of components that are functionally and distributionally distinct based on stimuli and task demands (for a review of N2 findings, see Folstein and Van Petten, 2008). Most relevant for the current study, the N2 has sometimes been associated with early processes at the level of word form (see also Grainger et al., 2006, for a related component for word recognition). Larger N2 amplitude has been observed in response to word form information when it is attended than when it is not attended (Ruz and Nobre, 2008). By inference then, the attended response relevant language in the current study should have elicited larger N2 amplitude than the response-irrelevant language. We observed the opposite effect, indicating that the N2 observed herein is not related to attention to the response set (c.f. Lamers et al., 2010). Another possibility is that the N2 reflects conflict detection, such as that observed on the Eriksen flanker task where both stimulus and response level conflict have resulted in an increase in N2 amplitude (Van Veen and Carter, 2002; Carter and Van Veen, 2007; Wendt et al., 2007). Our data are again inconsistent with the direction of this modulation, since within language trials create more conflict in the behavioral results, and by inference should elicit larger N2 amplitude.

Instead, our data are most consistent with a third type of N2 effect. The direction and scalp distribution of the N2 effect in the current study (slight right-lateralization with a fronto-central maximum; c.f. Aron et al., 2003) are more in line with a no-go N2 (Pliszka et al., 2000, 2007; Liotti et al., 2007) than with either an attentional set effect or a conflict N2. The no-go N2 typically shows larger negative amplitude related to inhibiting a response (Pliszka et al., 2000; Schmajuk et al., 2006; Woodward et al., 2007; Folstein and Van Petten, 2008). In the bilingual Stroop paradigm, within language items are all potential go candidates as part of the response set, while between language distracters are all no-go items. Thus language membership is recognized early, presumably based on word form information, triggering mechanisms of inhibition as reflected by a no-go N2 for between language distracters. Yet, inhibition of the response for between language trials cannot completely explain our data. First, response relevant distracters should also elicit a no-go N2 relative to congruent trials. Our design does not have the power to determine if there is a no-go effect for within language distracters, but future research may show a graded effect for inhibition of response relevant and irrelevant items across languages. Second, this stage of processing clearly does not reflect complete inhibition of between language distracters, given the subsequent N450 and SN. Instead, it may reflect a stage of processing parallel to that of the N450, which together may contribute to the end-state behavioral bilingual Stroop effect.

The behavioral findings from Experiments 1 and 2 were consistent with the majority of the literature, showing a larger color word Stroop effect within language than between languages (MacLeod, 1991; Francis, 1999). For this reason, the most surprising effect, or lack thereof, in Experiment 2 was the absence of a clear interaction between color and language congruence. If not from a direct interaction at the N450 or earlier brain activity, where does the interaction between color and language in the RTs come from? It is possible that ERP technology is not sensitive to the source of the BWLS, if for example it is driven by weak or deep sources of brain activity (or sources that cancel at the scalp). This seems unlikely given that our data show robust effects for both color and language congruence that are in line with previous findings. Instead, our data seem to indicate that color and language conflict are processed independently at different time intervals and interact only for a fleeting moment during the late time window of the SN.

It is possible that the BWLS is purely due to the underlying processes reflected in the brief interaction at the late SN. Our data reflect a broadly distributed SN, in line with earlier reports of ERP effects in a complex Stroop task (West and Alain, 1999). Despite the similarity in scalp distribution, it is unlikely that the SN is simply sustained activity from the N450. The SN appears to resolve more quickly between than within languages. Perhaps this negativity is functionally related to the conflict SP, thought to reflect response monitoring and conflict adaptation (West and Alain, 2000; West, 2003; West et al., 2005; Chen et al., 2011). It could result from a global trial-to-trial difference in conflict adaptation, with quicker adaptation to between than within language conflict, or a greater impact of response relevant words on response monitoring. Still, these processes must be triggered by earlier stages of processing in which the conflict within or between languages is detected. Perhaps this earlier stage of processing is reflected in the N2 effect. Thus, rather than complete inhibition of the between language distracters, the N2 may index processes of inhibitory control that facilitate later resolution of conflict at the SN. Between language distracters trigger this early inhibitory (no-go) mechanism, resulting in a larger N2 and subsequently quicker resolution of the SN. The intermediate effect at the N450 must then reflect parallel processing of the distracter words, regardless of response set (or language) membership. Thus, the behavioral bilingual Stroop effect could be a product of activity across parallel processing of language and color rather than the presence of a direct interaction of the two. In other words, it is possible that the RT effects reflect the summed brain activity over time, with contributions from language conflict and color conflict at different points in time (c.f., Cohen et al., 1990; Roelofs, 2003).

Conclusion

In summary, we reported data from two bilingual Stroop experiments aimed at uncovering the source of the well-documented bilingual Stroop effect, referred to herein as the between-within language Stroop effect or BWLS. Experiment 1 replicated the BWLS in both balanced and unbalanced bilinguals. This effect was present regardless of language dominance, and during both single language and mixed language presentation. However, by taking an unconventional look at the Stroop data, analyzing the effect of language congruence in the presence or absence of color-Stroop interference, we were able to show that the source of the BWLS varied based on these manipulations. In the process of thoroughly delineating the behavior of our population on the bilingual Stroop task, we were able to address the leading explanation for the BWLS. We show that a response set effect can only partially explain this effect. Experiment 2 delineated the time course and stage of processing at which the BWLS occurs using a real time electrophysiological measure. Our ERP data provide evidence that balanced bilinguals process language congruence prior to color congruence on a bilingual color word Stroop task, as indexed by a language effect at the N2. Importantly, distinguishing the distracters based on language did not affect later processes at the N450, indicating that color incongruent words created equal conflict and cognitive control demands regardless of whether they belonged to the response set or not. Rather than complete inhibition of the between language distracters, the N2 may reflect processes of inhibitory control that facilitate the resolution of conflict at the SN, while the N450 reflects parallel processing of the distracter words, regardless of response set (or language). In sum, the behavioral BWLS may reflect summed brain activity over time, with contributions from language conflict and color conflict at different time points.
Our findings add to a vast literature, informing models of both monolingual and bilingual conflict processing on the Stroop task, and present new questions for the field.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We have many individuals to thank for advice and technical assistance on this project, including Ryan J. Giuliano, Amanda Martinez-Lincoln, Shukhan Ng, David Pillow, Elena Salillas, Mai-Anh Tran Ngoc, and especially Delia Kothmann Paskos who inspired this research study. Resources and support were provided by the Computational Biology Initiative. Funding was provided by NICHD/NIGMS HD060435 and the UTSA College of Science to N. Y. Y. Wicha.

Footnotes

¹ Participants were excluded for age based on findings indicating that Stroop performance declines after age 55 (Jolles et al., 1995).
² Performance on the VFT and BNT was highly correlated [r(87) = 0.80, p < 0.01].
³ One participant was included as balanced having scored as English dominant on one measure, Spanish dominant on another and balanced on the third, resulting in no clearly dominant language. This participant tested as balanced on two of the three measures upon retesting the naming and reading time measures for participation in Experiment 2. This occurred with other participants as well, who switched from dominant in one language to balanced in both, or vice versa, on a specific measure. This highlights the dynamic nature of bilinguals over time, and the importance of collecting more than one measure of language proficiency/dominance, in particular when classifying individuals as balanced.
⁴ Balanced bilinguals as a group (but not all individuals) were faster at naming colors in Spanish than English on the baseline color-naming task, paired samples t(26) = 2.768, p = 0.010 (Table 1). However, unbalanced bilinguals named colors in their dominant language equally fast whether they were dominant in English or Spanish; the difference is therefore not due to a general naming bias for Spanish as a language (c.f., Chen and Ho, 1986).
⁵ Our color-naming baseline produced faster naming times than all other trials. Future studies could employ an improved neutral baseline to determine if this difference is facilitatory for within language or inhibitory for between language trials.
⁶ The average reference and average mastoid reference have shown equivalent results in previous studies (see Chen et al., 2011).
⁷ Complex interactions with Naming Language in the distribution analysis could be explained by the loss of trials in Spanish (see Methods) and were not analyzed further.

References

Aarts, E., Roelofs, A., and van Turennout, M. (2009). Attentional control of task and response in lateral and medial frontal cortex: brain activity and reaction time distributions. Neuropsychologia 47, 2089–2099.

Abutalebi, J., Annoni, J. M., Zimine, I., Pegna, A. J., Seghier, M. L., Lee-Jahnke, H., Lazeyras, F., Cappa, S. F., and Khateb, A. (2008). Language control and lexical competition in bilinguals: an event-related fMRI study. Cereb. Cortex 18, 1496–1505.

Aron, A. R., Fletcher, P. C., Bullmore, E. T., Sahakian, B. J., and Robbins, T. W. (2003). Stop-signal inhibition disrupted by damage to right inferior frontal gyrus in humans. Nat. Neurosci. 6, 115–116.

Atkinson, C. M., Drysdale, K. A., and Fulham, W. R. (2003). Event-related potentials to Stroop and reverse Stroop stimuli. Int. J. Psychophysiol. 47, 1–21.

Bialystok, E., Craik, F. I., Klein, R., and Viswanathan, M. (2004). Bilingualism, aging, and cognitive control: evidence from the Simon task. Psychol. Aging 19, 290–303.

Bialystok, E., and Craik, F. I. M. (2010). Cognitive and linguistic processing in the bilingual mind. Curr. Dir. Psychol. Sci. 19, 19–23.