Music has been called the universal language of mankind (Longfellow, 1835), reflecting longstanding curiosity about the relationship between music and language. Both share many traits, including being perceived primarily through the auditory system, having similar acoustic attributes, and reflecting analogous generative syntactic systems. This has led to decades of scientific research, exemplified by the papers included in this volume, exploring their overlapping neurophysiological, perceptual, and cognitive underpinnings. These range from the mechanism for encoding basic auditory cues (Wong et al., 2007; Kraus and Chandrasekaran, 2010), to mechanisms supporting acquisition (Slevc and Miyake, 2006; Schön et al., 2008), to the mechanism for detecting violations in predicted structure (Slevc et al., 2009).
Much of this research with respect to music has made use of trained musicians, in part to look for evidence that the cognitive and neural correlates of specialization for music are similar to the human specialization for language (e.g., Besson and Faita, 1995; Patel et al., 1998a; Maess et al., 2001; Schön et al., 2008; Kraus and Chandrasekaran, 2010). While using trained musicians has led to great strides in our understanding of how music is processed, it has obscured another important similarity between music and language: both may be acquired implicitly, without the aid of explicit instruction. In this paper, we review independent bodies of research exploring the role of implicitly acquired knowledge and associated neural structures in the acquisition of language and musical grammar.
We first consider the role of implicit memory in language by looking at both natural and artificial language learning studies. The studies discussed in the Section “Implicit Memory and Language” show that the implicit memory system plays an important role in acquiring the grammar, or rules, of language at all levels of linguistic structure (Table 1). Similarly, implicit learning in music is found in the acquisition of rhythm, pitch, and melodic structures. The studies discussed in the Section “Implicit Memory and Music” suggest a potentially common learning mechanism shared by both music and language that allows for the acquisition of these complex systems without the need for instruction (Table 1).
Table 1. Summary of representative neurological findings associating implicit memory, language, and music.
The studies we discuss below help us understand this mechanism by highlighting the fact that both music and language involve expectation and the tracking of dependencies between sequential elements. Neurally, there is a significant three-way overlap of the brain structures implicated in implicit memory and those involved in learning language and learning music. This convergence encourages new work that juxtaposes music and language in the context of the implicit memory system. Given the known relationship between dopamine and the implicit memory system, we may also consider more directly the genomic and molecular bases of music and language abilities.
Implicit memory is generally defined as acquired knowledge that is not available to conscious access (Schacter and Graf, 1986; Schacter, 1987). This contrasts with explicit memory, which is characterized by knowledge that involves conscious recollection, recall, or recognition. The majority of behavioral evidence for an implicit memory system is based on experiments wherein experience leads to altered performance on some task without participants being aware of having learned anything.
One type of implicit memory stems from perceptual learning, which involves changes to the perceptual system and to perceptual categories (e.g., phonemes, chords) due to experience. For example, in one study (Wade and Holt, 2005), participants played a video game that involved navigating through a maze. A non-critical feature of the game was that certain non-speech auditory cues were associated with certain events. After playing the game, participants were better able to distinguish the sounds and reliably learned the sound-event patterns. Importantly, this learning was qualitatively different from, and in some cases better than, the learning produced by explicit training on these same patterns. While explicit attention has been shown to facilitate this sort of perceptual learning (e.g., Ranganath and Rainer, 2003), it is also well established that perceptual learning can be subliminal and implicit (Goldstone, 1998; Seitz and Watanabe, 2003).
Another type of implicit memory involves the implicit learning of sequences (e.g., sentences, melodies). A commonly used paradigm to test implicit memory for sequences is the serial reaction time test (SRTT; Nissen and Bullemer, 1987). In this test, participants are exposed to some stimuli (e.g., objects appearing sequentially at different points on a screen) cuing participants to respond (e.g., by indicating where the stimuli appear) as quickly as possible. While the sequences of stimuli appear to be random to the participant, embedded within the random sequences is a fixed pattern, repeatedly interspersed throughout the random sequences. Over the course of the experiment, response times and accuracy on the fixed sequences improve relative to the random sequences, presumably because the participants are learning this repeated sequence. Crucially, participants do not exhibit an improved ability to explicitly recall this repeated sequence as compared to recalling random non-repeated sequences. The fact that participants show implicit learning without explicit knowledge suggests that these memory systems can operate independently, and that people can learn about the sequencing of some stimuli without being explicitly aware of it. While implicit memory is relevant for both sequence learning and category learning, and both are relevant to language and music, we focus primarily on implicit memory in the context of sequence learning.
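The trial structure of an SRTT-style task can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the procedure of Nissen and Bullemer (1987): the function names, the six-item pattern, the number of screen positions, and the block lengths are all hypothetical choices.

```python
import random

def build_srtt_trials(pattern, n_repeats, filler_len, n_positions=4, seed=0):
    """Return a list of (screen_position, is_fixed) trials.

    Each cycle holds `filler_len` random trials followed by one copy of the
    fixed `pattern`, so the pattern is interspersed among random trials.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_repeats):
        # Random filler trials: position drawn uniformly at random
        trials += [(rng.randrange(n_positions), False) for _ in range(filler_len)]
        # Embedded fixed pattern, identical on every repetition
        trials += [(pos, True) for pos in pattern]
    return trials

def learning_score(rts, trials):
    """Mean RT on random trials minus mean RT on fixed-pattern trials.

    A positive score means responses were faster on the repeated pattern.
    """
    fixed = [rt for rt, (_, is_fixed) in zip(rts, trials) if is_fixed]
    rand = [rt for rt, (_, is_fixed) in zip(rts, trials) if not is_fixed]
    return sum(rand) / len(rand) - sum(fixed) / len(fixed)

# Hypothetical six-item pattern over four screen positions
trials = build_srtt_trials([0, 2, 1, 3, 2, 0], n_repeats=10, filler_len=8)
```

Given a list of measured response times aligned with `trials`, `learning_score` gives the difference that such experiments typically report. Whether that learning is implicit still has to be established separately, for example with a recall test.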
A more specific kind of implicit sequence learning often discussed in the context of language and music is statistical learning (e.g., Saffran et al., 1996a). Statistical learning involves the same basic idea that participants can learn sequences without explicit awareness, but adds an additional component of tracking statistics over these sequences (see footnote 1). For example, in a series of studies, Saffran et al. (1996a,b, 1999) showed that adults, children, and infants are able to track transitional probabilities between syllables and tones. Participants were exposed to seemingly random sequences of syllables obscuring consistent differences in the probability that certain syllables followed others (see below for more details). Participants were sensitive to these differences in transitional probability, and subsequent work has explored what types of statistics and what types of dependencies can be implicitly tracked (Knowlton and Squire, 1996; Aslin et al., 1998; Gomez, 2002).
More recently, neurological studies have, for the most part, supported this dissociation between implicit and explicit memory systems for tasks like the SRTT (Curran, 1997) and other similar sensory–motor learning tasks. Evidence from both lesion studies (Vakil et al., 2000; Exner et al., 2002; Peach and Wong, 2004) and functional imaging (Rauch et al., 1997; Koechlin et al., 2002) implicates the striatum, and more specifically the caudate, in implicit learning. For example, Alzheimer’s patients, characterized by degeneration of the medial temporal lobe, have little trouble with the SRTT despite exhibiting problems with declarative memory, while learning in the same task is impaired for people with diseases characterized by degeneration of the basal ganglia, including Parkinson’s patients (Reber and Squire, 1994; Jackson et al., 1995) and Huntington’s patients (Gabrieli et al., 1997). More generally, the basal ganglia have been implicated in implicit learning across a number of different tasks (Squire and Knowlton, 2000; Eichenbaum and Cohen, 2001). In addition to implicit memory, the basal ganglia, and the caudate specifically, have also been implicated in motor learning (Knowlton et al., 1996), general learning plasticity (Graybiel, 2005), and learning from feedback (Packard and Knowlton, 2002). There is also some evidence that the inferior frontal gyrus, and in particular Broca’s area and its right homolog, is involved in learning sequences (Doyon et al., 1997, 1998; Peigneux et al., 1999a). More generally, Broca’s area has also been associated with a wide range of linguistic functions (Grodzinsky and Santi, 2008) including hierarchical processing (Musso et al., 2003), recursion, binding (Hagoort, 2005), and speech articulation (see Bookheimer, 2002 for a review).
Finally, because dopamine receptors are found in the basal ganglia, and in particular the striatum (which includes the caudate, putamen, and nucleus accumbens), implicit memory has been associated with dopamine. This has been supported by studies showing that increasing dopamine levels in the brain can lead to improved implicit learning (de Vries et al., 2010b), that dopamine deficiencies, as in Parkinson’s patients, result in poor implicit learning though explicit learning is intact (Shohamy et al., 2009), and that dopaminergic neurons in primates show a burst of activity when learning implicitly (see Shohamy and Adcock, 2010 for a review).
Implicit Memory and Language
Language learning shares a number of important similarities with the learning of sensory–motor sequences, which have been classically associated with implicit memory and which, as will be discussed below, are also implicated in acquiring a musical system. As with the tasks used in implicit learning experiments (e.g., the SRTT), people are often unaware of, or unable to articulate, many of the rules of their language (Fodor, 1983). People can also learn language without any explicit instruction (Chomsky, 1957). This is particularly true before school age, when children learn language with relative ease (Lenneberg, 1967), which has been argued to be due, in part, to children’s good implicit memory capacity as compared to adults (DiGiulio et al., 1994; DeKeyser and Larson-Hall, 2005). Finally, certain aspects of linguistic knowledge, namely the rules of combination, may be represented probabilistically or as information about the distributional relationships at different levels of linguistic structure (e.g., phonemes, morphemes, words, and sentences; Redington and Chater, 1997). This knowledge is generally not consciously accessible to speakers of a language and is similar in nature to the probabilistic knowledge acquired in implicit learning.
Implicitly Learned Artificial Grammars
The use of implicitly learned distributional information for language learning has been demonstrated at many different levels of linguistic structure. For example, at the level of word segmentation, Saffran et al. (1996a) exposed 8-month-old infants to a stream of running speech consisting of four three-syllable words without any breaks or pauses indicating word-hood. Thus, the only cue to word segmentation was the transitional probability between syllables, where within-word transitional probability of syllables was 1.0 and between-word transitional probability of syllables was 0.33 (no word followed itself). Infants showed a significant ability to discriminate words from part-words (formed by combining the final syllable from one word with the first two syllables of another). Adults performed similarly (Saffran et al., 1996b) in what is argued to reflect implicit learning of word segmentation (Evans et al., 2009). Importantly, this ability is suggested to be domain general as it also applies to tones (Saffran et al., 1999 and below in the discussion on implicit memory and music) and visual stimuli (Fiser and Aslin, 2001).
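The transitional-probability cue described above can be illustrated with a short Python sketch. It builds a continuous syllable stream from four three-syllable words, with no word following itself, and estimates the probability of each syllable given the one before it. The word forms, stream length, and function names are placeholders chosen for illustration and should not be read as the original stimuli or analysis.

```python
import random
from collections import Counter

# Four three-syllable placeholder words (illustrative, not the original stimuli)
WORDS = [("bi", "da", "ku"), ("pa", "do", "ti"),
         ("go", "la", "bu"), ("tu", "pi", "ro")]

def make_stream(n_words, seed=0):
    """Concatenate randomly chosen words with no breaks; no word follows itself."""
    rng = random.Random(seed)
    stream, prev = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w is not prev])
        stream.extend(word)
        prev = word
    return stream

def transitional_probabilities(stream):
    """Estimate P(next syllable | current syllable) from adjacent pairs."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

stream = make_stream(3000)
tp = transitional_probabilities(stream)

# Within-word transitions: syllable 1 -> 2 and syllable 2 -> 3 of each word
within = [tp[(w[0], w[1])] for w in WORDS] + [tp[(w[1], w[2])] for w in WORDS]
# Between-word transitions: any pair starting at a word-final syllable
finals = {w[2] for w in WORDS}
between = [p for (a, _), p in tp.items() if a in finals]
```

In this sketch every within-word transition has probability 1.0 by construction, while each between-word transition falls near one third, because a word-final syllable can be followed by the first syllable of any of the three other words.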
Analogous behavior is also found with respect to the acquisition of phonotactics. Phonotactics are the restrictions on where phonemes can occur in a word in a language (e.g., English prohibits ng starting a word or h ending one). In one study (Onishi et al., 2002), adults briefly exposed to pseudo-words reflecting some non-English phonotactic generalization showed speeded repetition to words that adhered to the generalization as compared to words that did not.
Another study on implicit phonotactic learning (Dell et al., 2000) found that when participants are tasked with repeating sets of words reflecting some phonotactic generalization, their speech errors tend to reflect these newly learned generalizations, as is true of one’s native language. The authors assessed the implicitness of learning using something they call the “ask-tell technique.” This involved asking all participants whether they had noticed anything about the words they were pronouncing; the experimenters also told half the participants, explicitly, what the phonotactics would be before starting. Neither the uninformed nor informed participants were able to identify any regularities in the experimental materials. These results, in addition to the fact that the speech errors were not intentional, suggest that this learning is, in fact, implicit.
Another important component of learning the phonology of a language, acquiring phonological rules, has been shown to relate to non-linguistic implicit learning as well (Ettlinger et al., in press). In this study, participants took both an artificial grammar learning experiment and a test of implicit learning. The artificial grammar learning task involved exposure to words that reflected a set of rules for forming plural and diminutive variants (e.g., dog, dogs, doggie, doggies). The test of implicit learning was a modified version of the Tower of London task (Shallice, 1982). In this task, participants were required to solve puzzles, increasing in difficulty, which involved virtually moving colored balls on three sticks to match a predetermined pattern. Embedded within the puzzles were repeated sequences of moves, and participants were asked to think through their moves before starting, to minimize the effects of motor coordination, unlike the SRTT. Implicit learning was measured by looking at the improvement in performance on the repeated sequences (Phillips et al., 1999). Results showed a strong correlation between learning the artificial language and performance on the Tower of London task, suggesting that implicit memory and language learning are linked.
In another set of experiments exploring the possible implicit learning of syntactic structure, Reber (1967) taught participants an artificial finite state grammar for sequences of letters (Figure 1). After exposure to strings of letters generated by the grammar, participants were asked to judge the grammaticality of novel sequences of letters. Participants were able to successfully distinguish what constituted a valid sequence without being able to explicitly describe the rules of the grammar.
Figure 1. Examples of finite state grammars used in language (A) and music (B) learning experiments. (A) is a finite state grammar used to generate sequences of letters that participants are exposed to in implicit language learning experiments (from Reber, 1967); (B) shows a similar structure, using notes instead of letters (from Tillmann and Poulin-Charronnat, 2010). Participants can acquire grammars of this sort and identify valid versus invalid sequences without being explicitly aware of any specific aspects of the grammar for both music and language.
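To make the notion of a finite state grammar concrete, the following Python sketch generates strings from a small grammar and checks whether a given string is grammatical. The grammar here is a hypothetical toy example. It is not the grammar of Reber (1967) or either grammar in Figure 1, and the state numbering, letters, and function names are assumptions made for illustration.

```python
import random

# Hypothetical toy grammar: state -> list of (letter, next_state).
# A next_state of None means the string ends after that letter.
GRAMMAR = {
    0: [("T", 1), ("V", 2)],
    1: [("P", 1), ("T", 3)],   # "P" loops on state 1
    2: [("X", 2), ("V", 3)],   # "X" loops on state 2
    3: [("S", None)],
}

def generate_string(rng):
    """Random walk through the grammar from state 0 until the exit is reached."""
    state, letters = 0, []
    while state is not None:
        letter, state = rng.choice(GRAMMAR[state])
        letters.append(letter)
    return "".join(letters)

def is_grammatical(string):
    """True if some path through the grammar produces exactly this string."""
    states = {0}
    for ch in string:
        states = {nxt
                  for s in states if s is not None
                  for letter, nxt in GRAMMAR[s] if letter == ch}
        if not states:
            return False
    return None in states

rng = random.Random(0)
examples = [generate_string(rng) for _ in range(5)]
```

In an experiment of this kind, strings such as those in `examples` would serve as exposure items, and test items would mix new grammatical strings with strings that `is_grammatical` rejects.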
In addition to these associative studies, more concrete evidence on the role of implicit memory in language learning is provided by recent imaging studies.
McNealy et al. (2006) adapted a version of Saffran et al.’s (1996) word-segmentation paradigm for functional imaging by presenting participants with three speech streams: one containing no regularities, one containing the statistical regularities as detailed above, and a third containing statistical regularities plus a standard phonetic word-segmentation cue. Greater activation was found in inferior frontal gyrus for the statistical cue and statistical cue plus phonetic cue conditions as compared to the random condition. Additional activation was found in the superior temporal gyrus, which is associated with the processing of speech (Geschwind, 1970). Another study found that when word meaning is introduced to this experimental paradigm, greater activation is also found in the basal ganglia, specifically the caudate (Mestres-Misse et al., 2008), plus the thalamus, which serves as a relay between subcortical (e.g., basal ganglia) and cortical networks and is involved in sensory perception (Steriade et al., 1997). This has led to the hypothesis that the basal ganglia, and therefore, presumably, implicit memory, is important for the integration of multiple information sources during the process of language learning (Rodriguez-Fornells et al., 2009). Similarly, patients with early stage Huntington’s disease and striatal damage also do poorly on tasks of this sort (De Diego-Balaguer et al., 2008).
As with the SRTT and word segmentation, a fronto-striatal network is implicated in acquiring the finite state grammars described above. Alzheimer’s and amnesic patients, with degeneration or lesions of the temporal cortices, can still successfully learn artificial grammars of this sort while having trouble with more explicit language tasks (Reber and Squire, 1994; Knowlton and Squire, 1996; Reber et al., 2003). The ability of Parkinson’s patients with degeneration of the basal ganglia to learn artificial grammars is less clear, however, with conflicting evidence reported (Peigneux et al., 1999b; Witt et al., 2002). Functional imaging yields similar results, with the basal ganglia and inferior frontal gyrus supporting the acquisition of implicit knowledge of an underlying pattern governing a sequence of letters, while the medial temporal lobe supports the recall of specific sequences (Lieberman et al., 2004; Petersson et al., 2004). Activation of the caudate is also found in a study of syntactic processing (Moro et al., 2001). In this latter study, participants were exposed to a version of Italian (the participants’ native language) where all content words were replaced with pseudo-words, with function words left intact, which served to eliminate any semantic component of processing. Syntactic (word order), morphological (determiner agreement), and phonotactic violations were juxtaposed using PET. The results showed activation of Broca’s area and the right IFG for the morphological and syntactic conditions; these regions have long been associated with syntactic processing (Embick et al., 2000; Grodzinsky, 2000) and may be part of a basal ganglia thalamocortical circuit (Ullman, 2006). Greater activation was also found in the left caudate, which is associated with implicit memory (see above).
This result has been replicated a number of times with different types of artificial syntactic grammar, with activation consistently found in Broca’s area and the caudate (Forkstam et al., 2006; Petersson et al., in press).
Implicit Learning and Natural Language
With respect to language learning in more ecologically valid settings, a few behavioral studies have shown a relationship between natural language processing and implicit learning. Misyak et al. (2010) created an implicit learning task that combined the SRTT and artificial grammar learning and showed that performance correlated with participants’ ability to comprehend complex English sentences. Evans et al. (2009) looked at children with specific language impairment, ages five to seven, and showed that these children also performed worse on the word-segmentation task from Saffran et al. (1996) as compared to a control group with the same non-verbal IQ. They conclude that specific language impairment may not, in fact, be specific to language, but rather reflects an impairment of implicit learning, which is crucial for language learning but distinct from other measures of intelligence (see also Kaufman et al., 2010 for a similar view). Furthermore, performance on the word-segmentation task correlated with vocabulary size within each participant group, suggesting implicit learning facilitates word learning. Finally, research looking into language processing in more realistic settings has also considered language processing in noise (e.g., Wong et al., 2008, 2009a, 2010; Harris et al., 2009). In particular, Conway et al. (2010) showed a relationship between an ability to perceive speech in noise and implicit sensory–motor sequence learning. Participants who were good at an SRTT-like task were similarly good at perceiving sentences embedded in noise when the last word in the sentence had high-predictability (e.g., Her entry should win first prize), even when controlling for working memory and intelligence. The correlation disappeared for sentences ending in low-predictability words (e.g., The arm is riding on the beach).
This suggests that an important way in which implicit memory is related to language is through prediction and anticipation. A number of studies using eye-tracking (see Kamide, 2008 for a review) and event-related potentials (ERPs; see Van Berkum, 2008 for a review) have shown that people make significant use of context to facilitate processing. For example, participants look more often at a picture of beer than a doll when hearing the beginning of the sentence the man will taste the… (Kamide et al., 2003). A violation of an anticipated sentence completion will also yield a specific ERP response, either N400 for semantic incongruency or P600 for syntactic. The same ERP response is elicited on encountering anomalies in predicted outcome for artificial grammars similar to Reber (1967, above; Friederici et al., 2002) and music (Patel et al., 1998a).
Additional neural evidence comes from functional imaging, showing a significant overlap in the brain regions associated with implicit memory and language. As mentioned above, Broca’s area has been implicated in implicit memory tasks, and Broca’s area has a longstanding association with language learning and language processing (Embick et al., 2000; Grodzinsky, 2000; Sahin et al., 2009). Broca’s area has also been implicated in prediction, and the expectations that yield the N400 or P600, above, result in activation of the bilateral inferior frontal gyrus (i.e., Broca’s area, nearby regions and their right homologs) in addition to the middle temporal gyrus (Kiehl et al., 2002).
Thus, there is a wide range of similarities between language and implicit knowledge both in terms of their neural substrates (the fronto-striatal system) and in their cognitive underpinnings (sequential knowledge, expectation). These similarities have motivated myriad theories in linguistic processing positing that the dissociation between the words and rules of a language is homologous to the dissociation of explicit and implicit memory, respectively (Paradis, 1994, 2009; Pinker, 1999; Pinker and Ullman, 2002). Evidence for this dissociation is discussed below, and is based on the idea that we can explicitly recall and conceptualize the words of our language, a kind of knowledge that is declarative in nature, whereas the rules of language as applied when speaking naturally (as contrasted with attempting to adhere to a style guide, for example) are generally more difficult, if not impossible, to articulate.
To conclude, there is extensive and convergent evidence for a close relationship between the cognitive and neurophysiological underpinnings of language learning and implicit memory. Language learning involves cognitive abilities that are generally learned implicitly, including tracking dependencies and developing expectations regarding adjacent linguistic structures. Language and implicit memory are also both supported by a set of neural structures including the anterior portion of the inferior frontal gyrus and the basal ganglia. As will be reviewed below, music shares many of these same associations with implicit memory and these shared associations are not restricted to musicians with formal musical training, but extend to everyday music listeners.
Implicit Memory and Music
Although music is sometimes held to be the domain of specialists, its near-ubiquity in daily life, from mp3 players to Internet radio, cinema, and advertising, shows that affinity for music is widespread. Indeed, music has frequently been postulated by anthropologists to be a human universal, present in all known cultures (Blacking, 1973; Zatorre and Peretz, 2001). Although the ability to perform music skillfully is not evenly distributed and often relies on years of formal training, the ability to listen, process, and respond emotionally to music is shared across most of the population and seems to depend only on implicit exposure. For example, Bigand et al. (2005) showed that people with and without formal training responded largely interchangeably to non-vocal classical music. Other deep musical abilities in people without explicit training, such as the ability to perceive the relationship between a theme and its variations and to learn new compositional systems, are chronicled in Bigand and Poulin-Charronnat (2006). With little to no explicit training, how is it possible for people to develop the ability to represent and respond appropriately to the complex syntactic structures of music?
Desain and Honing (1999) demonstrate that even a seemingly simple and near-universal ability like tapping to a beat depends on complex internal representations of harmonic and syntactic musical structures. Indeed, research summarized in Krumhansl (1990) shows that implicit exposure to Western tonal music is sufficient for listeners to develop internal representations of the pitch relationships that music theorists hold to underlie tonality. Given a tonal context, such as a scale or chord progression, listeners without formal training can accurately judge how well a given continuation fits the established tonality.
One of the mechanisms by which passive exposure can ultimately yield sophisticated internal representations is statistical learning. Saffran et al. (1999) constructed long isochronous tone sequences out of 6 three-note “figures” repeated in random order, with no breaks or other indication of boundaries between the figures, and constrained so that the same figure never appeared twice in succession. When infants were exposed to this series of tones over a 20-min period, they were able to abstract the constituent three-note figures, despite the fact that nothing but the reduced transition probabilities between them delineated the figures in the continuous stream of the musical surface. The infants, it seemed, had carefully tracked continuation probabilities in the sequence, despite the fact that their exposure to it was entirely passive. This ability to track common outcomes in musical repertoires may seem arbitrary, but in fact has been held by music theorists and psychologists since Meyer (1956) to form the basis of affective responses to music (see Huron and Margulis, 2010 for a summary). Continuations that are recognized, even implicitly, as unusual are thought to result in perceptions of special expressivity or esthetic charge. In this way, the ability to implicitly track statistics about continuations may form the fundamental scaffolding for the widespread ability to respond emotionally to music, even in the absence of formal training.
Implicit memory for music also reveals itself in various well-documented priming effects. Priming is generally defined as an implicit memory effect in which exposure to a stimulus influences responses to a later stimulus without awareness of, or an ability to recall, the specific prime (Tulving et al., 1982). For example, Hutchins and Palmer (2008) showed that participants were more accurate in singing back the last tone of a short melody if that tone had appeared previously in the melody. Musical priming can also evidence itself in the form of faster and more accurate judgments about pitches or chords that are normative and expected given the tonal context. This kind of tonal priming has been documented in responses to melodic continuations (Margulis and Levine, 2006) and harmonic continuations (Bigand and Pineau, 1997) by listeners with no formal training. It has even been documented in children (Schellenberg et al., 2005). fMRI studies have implicated suppressed activity in bilateral inferior frontal regions of the brain during harmonic priming (Tillmann et al., 2000, 2003). Bharucha and Stoeckig (1986, 1987) provide evidence that harmonic priming is cognitive (based on the implicit abstraction of regularities in the musical environment) rather than sensory (based on psychoacoustic relationships) in nature. Tillmann et al. (2000) propose a self-organizing network model that can account for the kind of implicit learning of tonal structure revealed by priming studies. These priming effects are also observed to reflect the acquisition of musical grammars implicitly learned in the same fashion as in the implicit language learning experiments above (Figure 1; Tillmann and Poulin-Charronnat, 2010).
It is not only continuation statistics that listeners track implicitly. Duple and quadruple meters are more common than triple meters in Western music, and Brochard et al. (2003) confirmed that when presented with an ambiguous stimulus, listeners assume a binary division of the beat. Relatedly, the major mode is more common than the minor mode in Western music, and Huron (2006) confirmed that when presented with an ambiguous stimulus, listeners assume the major mode. And although absolute pitch perception is restricted to a tiny fraction of the population, Levitin (1994) demonstrated that ordinary listeners generally sing familiar songs within a semitone or so of their actual pitch level, suggesting that people have some implicit sense of pitch even in the absence of formal training on scales, producing notes, performing in key, or tuning an instrument. It is clear that mere exposure, independent of formal training or active use (such as performance or participation), is sufficient to engender highly structured and highly specific memory traces in ordinary listeners.
Implicit memory for music emerges consistently in preference effects. Halpern and O’Connor (2000) showed that although explicit recognition memory for melodies deteriorated with age, implicit memory was retained, in the form of elevated preference (the mere exposure effect first documented in Zajonc, 1968). A battery of studies over the past several decades (summarized nicely in Szpunar et al., 2004) illustrates that listeners’ preference increases for music that has been encountered before. This effect is even stronger for music that is complex or ecologically valid (Bornstein, 1989). Halpern and Mullensiefen (2008) exploit this preference toward previously encountered music as a measure of implicit memory, showing that when melodies that are encountered in an exposure phase are later replayed in new timbres, participants continue to report increased liking for them, even when explicit memory of the music is obscured (i.e., the timbre change prevented them from recognizing explicitly that they had heard the excerpts before). Similarly, Peretz et al. (1998) found that explicit recognition memory was more susceptible to decay over time than implicit memory measured by elevated preference. They concluded that, in contrast with explicit memory, implicit memory as manifested in affective judgments operates obligatorily, in an automatic and unconscious fashion. Samson and Peretz (2005) further conclude, based on an analysis of patients with temporal lobe lesions on either the right or left side, that the right temporal lobe is more active in the formation of representations that underlie implicit musical memory, and the left temporal lobe is more active in processes related to explicit retrieval of musical memories.
In addition to the implicit learning of normative patterns in a particular musical style, many people are able to gain competence in more than one musical system through mere passive exposure, independent of any experience performing or producing the sound and of any explicit instruction (formal musical training) in the style. Wong et al. (2009b) illustrate that passive exposure to the music of two cultures can result in the development of true bimusicals who approach both styles with an affective and cognitive competence lacking in monomusicals of similar age and background. Wong et al. (in press) used structural equation modeling to investigate fMRI data from bimusical and monomusical listeners, finding greater connectivity and greater differentiation between the musical systems in bimusicals. These differences imply that even the implicit learning of multiple musical systems can result in fundamental changes to the way the brain approaches expressive sound.
Electrophysiological evidence also supports this conclusion. Violations of expected harmonic, melodic, and rhythmic patterns result in a late positive component (LPC) characteristic of the detection of an incongruity, even when the participants lacked formal training and were unable to explicitly identify the surprises (Besson and Faita, 1995). The elicitation of ERP components related to syntactic violations in music seems to be independent of the task relevance of unexpected chords, which provides strong evidence for important implicit components of musical ability (Koelsch et al., 2000). Patel et al. (1998b) were the first to show that the P600, a known marker of syntactic violations in language, extends to violations of musical grammars that listeners abstract implicitly. Generally, these responses have been found even when the musical exposure is entirely passive, as in Koelsch and Jentschke (2008), in which participants watched a silent movie. Koelsch (2010) emphasizes that the early right anterior negativity (ERAN) that emerges in response to syntactic violations in music depends on the long-term extraction of statistical regularities in music, not on short-term exposure to particular sequences.
Predictions based on these abstractions of musical syntax are thought to be localized in the premotor cortex and the inferior frontal gyrus (IFG; particularly Broca’s area). Evidence for localization to the IFG comes from MEG (Maess et al., 2001), fMRI (Tillmann et al., 2003), and lesion studies (Sammler et al., 2011) exploring participants’ responses to ungrammatical or incongruent musical stimuli (see Koelsch, 2006 for a review). There is also some evidence that the ERP component responding to expectation violation may originate in right temporal–limbic areas, which are associated with affect and emotive processing (James et al., 2008).
The processing of syntactic violations in music has also been shown to interfere with the processing of syntactic violations in language, suggesting overlap between these two functions. When participants read garden path sentences while hearing chord progressions, they took longer to process syntactically unexpected words when those words coincided with syntactically unexpected harmonies; however, no such interference occurred when the musical surprise was not syntactic in nature (e.g., when a chord sounded in a different timbre; Slevc et al., 2009). Implicit memory thus seems to play an important role in syntactic processing in both language and music.
Implicit Memory in Language and Music
We have reviewed above independent sets of empirical studies implicating the implicit memory system in music and language, summarized in Table 1. In particular, we have discussed evidence that explicit training is not required for the processing of language or music. It is important to note that these studies examined music or language alone. To ascertain common pathways in processing and/or representation, music and language should be examined in tandem. In terms of processing, studies could be conducted along the lines of those by Patel, Slevc, and colleagues (Patel et al., 1998b; Slevc et al., 2009), in which musical and linguistic stimuli were combined. However, it is preferable to examine everyday music listeners, to ascertain that the results are not due solely to formal musical training or to a genetic difference in trained musicians.
Studies examining the dependence and independence of musical and linguistic functions sometimes yield conflicting results. In particular, the lesion literature favors independence, while studies of neurologically normal subjects favor dependence. It is beyond the scope of this review to discuss this debate extensively, except to mention that a reconciliation has been proposed that distinguishes between representation and processing, at least for syntax (Patel, 2008). In his Shared Syntactic Integration Resource Hypothesis, Patel (2003) postulates that while musical and linguistic syntactic representations are maintained separately, the processing of musical and linguistic syntactic structures draws on overlapping neural resources. While the processing aspect of this hypothesis has much support (Patel et al., 1998b) and is conceivably more feasible to test, representations are difficult to examine. However, neural repetition-suppression/enhancement paradigms have recently been used to examine mental representations in humans (Grill-Spector et al., 2006) and could potentially be used to test whether musical and linguistic representations overlap in neural regions. With respect to the implicit memory system more specifically, we believe such experiments could be conducted with music and language studied side by side.
Major divisions of the dopaminergic system comprise neurons in the substantia nigra pars compacta and ventral tegmental area that project to divisions of the striatum, the prefrontal cortex, and other regions (see Seamans and Yang, 2004 for a review). As discussed above, these brain regions are also associated with the implicit memory system. Recent studies in humans, including pharmacological (de Vries et al., 2010b), molecular imaging (e.g., McNab et al., 2009), and genomic (e.g., Klein et al., 2007a,b) studies, have examined the role of dopamine and related genes in a variety of implicit behaviors, such as acquiring an artificial grammar (de Vries et al., 2010a) and learning from feedback in a statistical learning paradigm (Klein et al., 2007b). Future research into the role of the implicit memory system in music and language could employ similar methods to examine their potentially shared molecular neurobiological mechanisms more directly.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
The authors would like to thank Lionel Newman for assistance in editing this manuscript. Support was provided by grant T32 NS047987 to Marc Ettlinger and by NIH grants R01DC008333 and R21DC007468 and NSF grant BCS-1125144 to Patrick C. M. Wong.
Footnote: “Statistical learning” is also sometimes used to refer to certain types of perceptual learning (e.g., Maye et al., 2002). Here, we use it to refer to sequence-based learning only.
References
Bigand, E., Vieillard, S., Madurell, F., Marozeau, J., and Dacquet, A. (2005). Multidimensional scaling of emotional responses to music: the effect of musical expertise and of the duration of the excerpts. Cogn. Emot. 19, 1113–1139.
Brochard, R., Abecasis, D., Potter, D., Ragot, R., and Drake, C. (2003). The “ticktock” of our internal clock: direct brain evidence of subjective accents in isochronous sequences. Psychol. Sci. 14, 362–366.
De Diego-Balaguer, R., Couette, M., Dolbeau, G., Durr, A., Youssov, K., and Bachoud-Levi, A. C. (2008). Striatal degeneration impairs language learning: evidence from Huntington’s disease. Brain 131, 2870–2881.
de Vries, M. H., Barth, A. C. R., Maiworm, S., Knecht, S., Zwitserlood, P., and Flöel, A. (2010a). Electrical stimulation of Broca’s area enhances implicit learning of an artificial grammar. J. Cogn. Neurosci. 22, 2427–2436.
de Vries, M. H., Ulte, C., Zwitserlood, P., Szymanski, B., and Knecht, S. (2010b). Increasing dopamine levels in the brain improves feedback-based procedural learning in healthy participants: an artificial-grammar-learning experiment. Neuropsychologia 48, 3193–3197.
DeKeyser, R., and Larson-Hall, J. (2005). “What does the critical period really mean?” in Handbook of Bilingualism: Psycholinguistic Approaches, eds. J. F. Kroll, and A. M. B. De Groot (New York, NY: Oxford University Press), 88–108.
Dell, G. S., Reed, K. D., Adams, D. R., and Meyer, A. S. (2000). Speech errors, phonotactic constraints, and implicit learning: a study of the role of experience in language production. J. Exp. Psychol. Learn. Mem. Cogn. 26, 1355–1367.
Doyon, J., Gaudreau, D., Laforce, R. Jr., Castonguay, M., Bedard, P. J., Bedard, F., and Bouchard, J. P. (1997). Role of the striatum, cerebellum, and frontal lobes in the learning of a visuomotor sequence. Brain Cogn. 34, 218–245.
Doyon, J., Laforce, R. Jr., Bouchard, G., Gaudreau, D., Roy, J., Poirier, M., Bedard, P. J., Bedard, F., and Bouchard, J. P. (1998). Role of the striatum, cerebellum and frontal lobes in the automatization of a repeated visuomotor sequence of movements. Neuropsychologia 36, 625–641.
Ettlinger, M., Bradlow, A. R., and Wong, P. C. M. (in press). “The persistence and obliteration of opaque interactions,” in Proceedings of the 45th Annual Meeting of the Chicago Linguistic Society, ed. R. Bochnak (Chicago, IL: Chicago Linguistic Society).
Exner, C., Koschack, J., and Irle, E. (2002). The differential role of premotor frontal cortex and basal ganglia in motor sequence learning: evidence from focal basal ganglia lesions. Learn. Mem. 9, 376–386.
Friederici, A. D., Steinhauer, K., and Pfeifer, E. (2002). Brain signatures of artificial language processing: evidence challenging the critical period hypothesis. Proc. Natl. Acad. Sci. U.S.A. 99, 529–534.
Gabrieli, J. D., Stebbins, G. T., Singh, J., Willingham, D. B., and Goetz, C. G. (1997). Intact mirror-tracing and impaired rotary-pursuit skill learning in patients with Huntington’s disease: evidence for dissociable memory systems in skill learning. Neuropsychology 11, 272–281.
Jackson, G. M., Jackson, S. R., Harrison, J., Henderson, L., and Kennard, C. (1995). Serial reaction time learning and Parkinson’s disease: evidence for a procedural learning deficit. Neuropsychologia 33, 577–593.
James, C. E., Britz, J., Vuilleumier, P., Hauert, C. A., and Michel, C. M. (2008). Early neuronal responses in right limbic structures mediate harmony incongruity processing in musical experts. Neuroimage 42, 1597–1608.
Kamide, Y., Scheepers, C., and Altmann, G. T. (2003). Integration of syntactic and semantic information in predictive processing: cross-linguistic evidence from German and English. J. Psycholinguist. Res. 32, 37–55.
Knowlton, B. J., and Squire, L. R. (1996). Artificial grammar learning depends on implicit acquisition of both abstract and exemplar-specific information. J. Exp. Psychol. Learn. Mem. Cogn. 22, 169–181.
Koechlin, E., Danek, A., Burnod, Y., and Grafman, J. (2002). Medial prefrontal and subcortical mechanisms underlying the acquisition of motor and cognitive action sequences in humans. Neuron 35, 371–381.
Koelsch, S. (2010). “Unconscious memory representations underlying music-syntactic processing and processing of auditory oddballs,” in Unconscious Memory Representations in Perception: Processes and Mechanisms in the Brain, eds. I. Czigler, and I. Winkler (Herndon, VA: John Benjamins Publishing Co.), 209–244.
Lieberman, M. D., Chang, G. Y., Chiao, J., Bookheimer, S. Y., and Knowlton, B. J. (2004). An event-related fMRI study of artificial grammar learning in a balanced chunk strength design. J. Cogn. Neurosci. 16, 427–438.
McNab, F., Varrone, A., Farde, L., Jucaite, A., Bystritsky, P., Forssberg, H., and Klingberg, T. (2009). Changes in cortical dopamine D1 receptor binding associated with cognitive training. Science 323, 800–802.
Paradis, M. (1994). “Neurolinguistic aspects of implicit and explicit memory: implications for bilingualism,” in Implicit and Explicit Learning of Second Languages, ed. N. Ellis (London: Academic Press), 393–419.
Peigneux, P., Maquet, P., Van Der Linden, M., Meulemans, T., Degueldre, C., Delfiore, G., Luxen, A., Cleeremans, A., and Franck, G. (1999a). Left inferior frontal cortex is involved in probabilistic serial reaction time learning. Brain Cogn. 40, 215–219.
Rauch, S. L., Whalen, P. J., Savage, C. R., Curran, T., Kendrick, A., Brown, H. D., Bush, G., Breiter, H. C., and Rosen, B. R. (1997). Striatal recruitment during an implicit sequence learning task as measured by functional magnetic resonance imaging. Hum. Brain Mapp. 5, 124–132.
Rodriguez-Fornells, A., Cunillera, T., Mestres-Misse, A., and De Diego-Balaguer, R. (2009). Neurophysiological mechanisms involved in language learning in adults. Philos. Trans. R. Soc. Lond. B Biol. Sci. 364, 3711–3735.
Shohamy, D., Myers, C. E., Hopkins, R. O., Sage, J., and Gluck, M. A. (2009). Distinct hippocampal and basal ganglia contributions to probabilistic learning and reversal. J. Cogn. Neurosci. 21, 1821–1833.
Slevc, L. R., Rosenberg, J. C., and Patel, A. D. (2009). Making psycholinguistics musical: self-paced reading time evidence for shared processing of linguistic and musical syntax. Psychon. Bull. Rev. 16, 374–381.
Squire, L. R., and Knowlton, B. J. (2000). “The medial temporal lobe, the hippocampus, and the memory systems of the brain,” in The New Cognitive Neurosciences, ed. M. S. Gazzaniga (Cambridge, MA: MIT Press), 765–780.
Vakil, E., Kahan, S., Huberman, M., and Osimani, A. (2000). Motor and non-motor sequence learning in patients with basal ganglia lesions: the case of serial reaction time (SRT). Neuropsychologia 38, 1–10.
Wong, P. C. M., Chan, A. H. D., Roy, A., and Margulis, E. H. (in press). The bimusical brain is not two monomusical brains in one: evidence from musical affective processing. J. Cogn. Neurosci. doi: 10.1162/jocn_a_00105. [Epub ahead of print].