Oculomotor learning revisited: a model of reinforcement learning in the basal ganglia incorporating an efference copy of motor actions

Fee, Michale Sean

doi:10.3389/fncir.2012.00038

HYPOTHESIS AND THEORY article

Front. Neural Circuits, 27 June 2012
Volume 6 - 2012 | https://doi.org/10.3389/fncir.2012.00038

Michale S. Fee*

Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA

In its simplest formulation, reinforcement learning is based on the idea that if an action taken in a particular context is followed by a favorable outcome, then, in the same context, the tendency to produce that action should be strengthened, or reinforced. While reinforcement learning forms the basis of many current theories of basal ganglia (BG) function, these models do not incorporate distinct computational roles for signals that convey context, and those that convey what action an animal takes. Recent experiments in the songbird suggest that vocal-related BG circuitry receives two functionally distinct excitatory inputs. One input is from a cortical region that carries context information about the current “time” in the motor sequence. The other is an efference copy of motor commands from a separate cortical brain region that generates vocal variability during learning. Based on these findings, I propose here a general model of vertebrate BG function that combines context information with a distinct motor efference copy signal. The signals are integrated by a learning rule in which efference copy inputs gate the potentiation of context inputs (but not efference copy inputs) onto medium spiny neurons in response to a rewarded action. The hypothesis is described in terms of a circuit that implements the learning of visually guided saccades. The model makes testable predictions about the anatomical and functional properties of hypothesized context and efference copy inputs to the striatum from both thalamic and cortical sources.

Introduction

One of the most fundamental problems an animal faces is how to modify its future actions based on the consequences of its past actions. One solution to this problem was first formulated by Edward Thorndike as the Law of Effect (Thorndike, 1911), according to which: “Responses that produce a satisfying effect in a particular situation become more likely to occur again in that situation.” The implementation of this principle, which forms the basis of reinforcement learning, instrumental learning, and stimulus-response learning (Sutton and Barto, 1998; Packard and Knowlton, 2002; Graybiel, 2008), requires three pieces of information: the action (response) that the animal makes, the context (situation) in which an action takes place, and an evaluation of the outcome (effect) of the action. Without any one of these components, learning of this type cannot occur.

Neural circuitry in the basal ganglia (BG) is intimately involved in the control of learned behaviors (Graybiel et al., 1994; Graybiel, 1998), and is thought to be essential for the modification of behavior through reinforcement (Barto, 1995; Doya, 2000; Daw and Doya, 2006). In the past few decades, a great deal of progress has been made in understanding the neural pathways that convey two of the key pieces of information noted above—outcome and context. An evaluation of the outcome of past actions is thought to be transmitted to the BG by neurons in dopaminergic brain centers (Wickens and Kötter, 1995; Reynolds et al., 2001; Hikosaka et al., 2006), whose firing rate signals the appearance of unexpected rewards or the absence of rewards that were expected (Montague et al., 1996; Schultz et al., 1997; Schultz, 2002; Cohen et al., 2012). The current situation, or context, in which the animal finds itself is thought to be transmitted to the striatum (the primary input structure of the BG) by a massive input from nearly all areas of sensory, motor, and premotor cortex (Graybiel et al., 1994; Wickens and Arbuthnott, 2010). In this framework, the term context includes all relevant information that might determine whether a particular action will lead to a positive outcome, including sensory stimuli, memory of recent past events, the time within a motor sequence, behavioral state, social context, and many others¹.

The combination of reward signals with these cortical signals allows the BG to determine which patterns of cortical activity are associated with, or predictive of, a favorable outcome and thus to bias motor and cognitive circuits toward favorable actions (Lauwereyns et al., 2002; Watanabe et al., 2003; Samejima et al., 2005; Frank and O'Reilly, 2006). In many current models of BG function, the learning about which cortical states are favorable is thought to proceed by a modification of the strength of corticostriatal inputs under the control of dopaminergic inputs (Kreitzer and Malenka, 2008; Shen et al., 2008). In other words, cortical inputs that are associated with subsequent reward become strengthened, thus allowing the BG to detect patterns of cortical activity that lead to reward.

The difficulty with this conception of BG function is that the ultimate function of the BG is to shape behavior, and in order to learn which particular actions led to a reward, BG circuitry must know what actions the animal actually made. One possibility is that the “actor” that generates exploratory behaviors during learning is within the BG itself (Berns and Sejnowski, 1998; Hikosaka et al., 1999; Ito and Doya, 2011). This view is based on the highly influential “actor/critic” model of reinforcement learning (Barto, 1995; Sutton and Barto, 1998). However, it is unlikely that the BG is the origin of exploratory behaviors during learning. First, the brain contains many behavior-generating circuits distributed throughout motor cortex and the brainstem (Swanson, 2000). Second, many studies suggest that the BG may be involved in the generation of learned behaviors after learning, but that it is not a generator of spontaneous motor actions before learning. For example, lesions of the striatum can affect the ability of rats to learn the association between a visual cue and the correct response in a maze, but these lesions have little effect on their ability to engage in motor aspects of task training and navigation in the maze (Packard et al., 1989; McDonald and White, 1994; Packard and McGaugh, 1996). Nor do striatal lesions prevent the animal from learning spatial aspects of the task, which are instead affected by hippocampal lesions (Packard et al., 1989; Packard and Knowlton, 2002; Featherstone and McDonald, 2004). Similarly, lesions of the vocal-related BG in juvenile songbirds prevent subsequent vocal learning (Sohrabji et al., 1990; Scharff and Nottebohm, 1991), but have little immediate effect on the generation of exploratory vocal variability during learning (Goldberg and Fee, 2011). Lesions in adult birds, after learning, likewise have no effect on song production (Scharff and Nottebohm, 1991). Together, these findings are consistent with an emerging view that the BG may not be the source of motor actions, but may serve to select from, or modify through learning, motor programs generated elsewhere (Cools, 1980; Mink, 1996; Redgrave et al., 1999; Gurney et al., 2001; Brown et al., 2004; Grillner et al., 2005; Atallah et al., 2007).

So if the BG does not originate motor actions, then, in order to have information about what actions were taken, it must receive a copy of motor commands generated by motor circuits located elsewhere. Indeed, the songbird BG receives an efference copy of signals generated by cortical premotor vocal circuits (Vates and Nottebohm, 1995; Hessler and Doupe, 1999; Fee and Scharff, 2010). These findings have inspired a simple model of vocal learning in the songbird in which BG circuitry receives three separate inputs: an efference copy of motor signals that drive exploratory song variations, a context signal indicating the time in the song, and a reinforcement signal carrying song performance information (Fee and Goldberg, 2011). The BG then integrates this information to determine which vocal variations at each time in the song result in better performance (Fee and Goldberg, 2011). In the songbird model, the results of this computation are then transmitted through the output pathways of the songbird BG to bias neural activity and direct synaptic plasticity within cortical motor circuits. In this view, the BG is not the generator of motor actions (the “actor”); rather, its role is to evaluate the outcome of actions generated elsewhere and to use the results of this computation to bias (or direct) activity in other motor circuits to improve the likelihood that future actions will lead to reward.

To explore the implications of this view for reinforcement learning in the mammalian BG, here I combine these same elements into a simple model of oculomotor learning in the primate that incorporates an efference copy of actions (eye movements). I chose the oculomotor system because there is a tremendous amount of information about the firing patterns of neurons in different parts of the BG pathway during oculomotor learning. I will first describe the basic elements of the current view of BG function in oculomotor control and highlight a potential weakness in our current mechanistic understanding of oculomotor learning. I will then review recent insights from the songbird that may solve this problem. Finally, I will return to the oculomotor system to show how a songbird-inspired model of BG function works well in this system, and potentially in other behaviors.

A Model of BG Control of Oculomotor Behavior

Much is known about the basic functional organization of the BG, which has been reviewed thoroughly elsewhere (Mink, 1996; Graybiel, 2005; Tseng and Steiner, 2010). Of particular importance is a series of studies from Hikosaka and Wurtz in which monkeys were trained to produce saccadic eye movements toward a visual target (Hikosaka and Wurtz, 1983a; Hikosaka et al., 2000). The BG controls eye movements via an inhibitory projection of the substantia nigra pars reticulata (SNr) onto intermediate layers of the superior colliculus (SC) (Jayaraman et al., 1977; Graybiel, 1978; Chevalier et al., 1984; May and Hall, 1984). Neurons in the SNr exhibit a pause in their tonic activity prior to eye movement into a particular part of the visual field (Hikosaka and Wurtz, 1983a; Basso and Wurtz, 2002), releasing the SC from inhibition, thus driving or augmenting the performance of a saccade (Figures 1A,B). The movement fields of SNr neurons match the movement fields in the region of the superior colliculus to which they project (Robinson, 1972; Hikosaka and Wurtz, 1983b), consistent with the idea that SNr neurons are segregated into different output “channels” that can influence saccades in different directions (Figure 1A).

FIGURE 1

Figure 1. The basal ganglia can drive learned changes in visually guided saccades. (A) Schematic diagram of the direct pathway of an oculomotor circuit in the BG. The output of the BG can be thought of as having discrete motor “channels.” Shown are two channels that project to the superior colliculus and can drive saccades to the left or right. In this simple model, these channels can be driven by sensory inputs from cortex, illustrated here by neurons responding to the appearance of visual targets 1 and 2. (B) Neurons in the substantia nigra pars reticulata (SNr) are tonically active and inhibit the generation of saccades by the superior colliculus. SNr neurons can be inhibited by spiking in medium spiny neurons (MSNs) in the striatum, thus releasing the superior colliculus from inhibition (adapted from Hikosaka et al., 2000). (C) Illustration of a stimulus-response task in which only saccades in one direction (e.g., leftward saccades) are rewarded while saccades in the other direction are not rewarded. (D) During training, saccades in the rewarded direction become faster and are generated with a shorter latency than saccades in the unrewarded direction. This behavioral change is thought to be mediated by activation of MSNs in the rewarded “channel” by the appropriate cortical inputs. Images in panels (C,D) are taken from Lauwereyns et al. (2002).

SNr neurons receive inhibitory input from medium spiny neurons (MSNs) in the striatum, allowing MSNs to influence the behavior of downstream motor circuits². For example, when monkeys are trained to make a saccade to a visual target, some MSNs generate a burst of activity (Hikosaka et al., 1989a) that inhibits downstream SNr neurons. The resulting “pause” in the tonic activity of SNr neurons leads to a dis-inhibition of saccade-generating neurons in the SC (Hikosaka et al., 2000; Figure 1B). The firing of MSNs is associated with increased speed and reduced latency of saccades (Watanabe et al., 2003).
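The disinhibition chain described above (MSN burst, SNr pause, SC release) can be illustrated with a minimal rate-based sketch. The function name, the firing rates, and the unit synaptic weights are all illustrative assumptions chosen for clarity; none of them are taken from the recordings cited in the text.

```python
def direct_pathway(msn_rate, snr_tonic=60.0, w_msn_snr=1.0,
                   w_snr_sc=1.0, sc_drive=30.0):
    """Toy rate model of one oculomotor channel (all rates in Hz).

    Every parameter value here is a hypothetical placeholder.
    """
    # MSN spiking inhibits the tonically active SNr neuron.
    snr_rate = max(0.0, snr_tonic - w_msn_snr * msn_rate)
    # The SNr in turn inhibits saccade-generating neurons in the SC.
    sc_rate = max(0.0, sc_drive - w_snr_sc * snr_rate)
    return snr_rate, sc_rate


# With a silent MSN, tonic SNr firing fully suppresses the SC.
quiet = direct_pathway(msn_rate=0.0)
# An MSN burst pauses the SNr and releases the SC from inhibition.
burst = direct_pathway(msn_rate=50.0)
print(quiet, burst)
```

In this sketch the MSN never excites the SC directly; it only removes inhibition, which is why the SC still needs its own excitatory drive (`sc_drive`, an assumed input) for a saccade to occur.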

Importantly, MSNs are not active prior to spontaneous saccades (Hikosaka et al., 1989a), but become active and exhibit complex firing patterns under conditions in which the animal is rewarded for some behaviors and not others (Hikosaka et al., 1989b,c). For example, if a monkey is trained to make saccades to several different targets, but only saccades to one of these targets are rewarded (Figure 1C), the animal begins to make saccades more rapidly in the rewarded direction and more slowly in the unrewarded direction (Kawagoe et al., 1998, 2004; Figure 1D). In these experiments, many MSNs became active prior to saccades only in the rewarded direction, and only when the rewarded direction was into a particular part of the visual field (to the left, for example). A similar degree of selectivity for reward and saccade direction was expressed by neurons in the SNr, which exhibited a suppression of activity only for rewarded saccades in a particular direction (Sato and Hikosaka, 2002), and in the superior colliculus, which exhibited increased activity in the rewarded direction (Ikeda and Hikosaka, 2003). These neurons exhibit the precise firing patterns expected for the striatal, SNr, and SC neurons in the model shown in Figure 1A (Hikosaka et al., 2006).

MSNs that, after learning, exhibit enhanced activity prior to a saccade in only one direction (say, to the left), and only in blocks of trials in which that direction was rewarded, can be thought of as signaling the value of a leftward saccade in a particular context. As a population, these neurons signal the expected value of all the different actions the animal might perform (Samejima et al., 2005). It is specifically these “action-value” MSNs that are used in the circuits described in this paper (Figures 1–4). Other MSNs develop a range of responses to different aspects of the target stimuli and the task (Hikosaka et al., 1989b,c; Lauwereyns et al., 2002). For example, some neurons become active only after a specific action is taken, and the size of the response correlates with the size of the reward the animal received from that choice of action (“chosen-value” neurons; Lau and Glimcher, 2008; Cai et al., 2011). Yet other MSNs developed a large visual response to targets that cue a saccade in any rewarded direction, independent of saccade direction (Kawagoe et al., 1998; Kobayashi et al., 2007), thus signaling the predicted value of the cue. The activity patterns of “chosen-value” and “cue-value” MSNs likely play a key role in computing the difference between the actual and predicted rewards (reward prediction error; Schultz et al., 1997). These MSNs may reside in the patch/striosome part of the striatum, which projects to dopaminergic centers rather than to the SNr (Graybiel and Ragsdale, 1978; Gerfen et al., 1987).
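The reward prediction error mentioned above is simply the difference between the reward received and the reward predicted. The following sketch shows that difference together with a simple incremental update of the prediction; the function name and the learning rate are assumptions made for illustration and are not part of the model proposed in this paper.

```python
def reward_prediction_error(predicted, actual, lr=0.2):
    """Return (error, updated_prediction).

    error > 0: reward was better than predicted (unexpected reward).
    error < 0: an expected reward was omitted.
    The learning rate `lr` is an arbitrary illustrative value.
    """
    error = actual - predicted
    updated = predicted + lr * error
    return error, updated


unexpected = reward_prediction_error(predicted=0.0, actual=1.0)
fully_predicted = reward_prediction_error(predicted=1.0, actual=1.0)
omitted = reward_prediction_error(predicted=1.0, actual=0.0)
```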

In the oculomotor learning task described above, the target stimuli represent the context that determines whether a particular action (e.g., a saccade in a particular direction) will be rewarded. For example, the appearance of a stimulus (target 1), followed by a saccade to the left results in reward, but the appearance of another stimulus (target 2) followed by a saccade to the left does not result in reward. In general, the association between a context (stimulus) and a response is arbitrary. For example, monkeys can be trained to saccade toward or away from a particular stimulus (Kunimatsu and Tanaka, 2010), or can be trained to make saccades in a particular direction depending on the identity of an object image, rather than its location (Pasupathy and Miller, 2005).

Information about target stimuli in the oculomotor learning task is thought to be transmitted to the striatum by cortical inputs (Hikosaka et al., 2006). Thus the model shown in Figure 1A contains two cortical “units” that signal the appearance of each of the two targets³. Because the two cortical units represent arbitrary stimuli to which saccades can become associated, we assume that the connectivity from the cortical units to the MSNs is potentially all-to-all, and is initially weak. Let us imagine now that saccades to the left are rewarded when they occur after the appearance of target 1. In this case, we want to strengthen the connection from Ctx-1 to the left MSN (MSN-L), since activation of the left MSN will enhance the probability of generating (or increase the velocity of) a leftward saccade. One synaptic learning rule that has been hypothesized to implement this plastic change can be summarized as follows: strengthen a corticostriatal synapse whenever its presynaptic input is coincident with activity in the postsynaptic MSN, and is followed by reward (Houk et al., 1995; Wickens and Kötter, 1995; Hikosaka et al., 2006). In this case, the Ctx-1 to MSN-L synapse will be strengthened whenever target 1 appears (activating Ctx-1), and MSN-L happens to be active, causing the monkey to make a left saccade and resulting in a global dopamine reward signal. As desired, the strengthening of the Ctx-1 to MSN-L synapse will lead to activation of the left MSN, and a bias toward leftward saccades on future appearances of target 1.
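The learning rule summarized above (presynaptic activity, coincident postsynaptic MSN activity, then reward) can be sketched as a small simulation in which the MSNs themselves act as the “actors.” This is a toy illustration under several assumptions that are not in the text: MSN variability is modeled as additive Gaussian noise, the more active channel determines the saccade, rightward saccades are rewarded after target 2, and all numerical values are arbitrary.

```python
import random


def train_msn_actor_rule(n_trials=500, lr=0.1, noise=1.0, seed=0):
    """Three-factor rule with MSNs as actors: dW = lr * pre * post * reward."""
    rng = random.Random(seed)
    # w[target][channel]: all-to-all and initially weak, as in the text.
    w = {1: {"L": 0.1, "R": 0.1}, 2: {"L": 0.1, "R": 0.1}}
    rewarded = {1: "L", 2: "R"}  # the target 2 mapping is an assumption
    for _ in range(n_trials):
        target = rng.choice([1, 2])
        # MSN activity = cortical drive plus trial-to-trial variability.
        activity = {ch: w[target][ch] + rng.gauss(0.0, noise)
                    for ch in ("L", "R")}
        saccade = max(activity, key=activity.get)  # winning channel acts
        reward = 1.0 if saccade == rewarded[target] else 0.0
        pre, post = 1.0, 1.0  # active Ctx unit, active (winning) MSN
        w[target][saccade] += lr * pre * post * reward
    return w


actor_weights = train_msn_actor_rule()
print(actor_weights)
```

Note that the rule only works here because the saccade is read out directly from MSN activity, which is exactly the assumption that the following paragraphs call into question.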

Implicit in this model of oculomotor learning is the idea that the two MSNs are the “actors” that choose which way the monkey will saccade during learning. Early in the learning process, the monkey does not know which way to saccade in order to receive a reward, and will thus tend to make a random choice on each trial. In this model there must be some mechanism that adds a “randomness” or variability to MSN activity. In one prescription of the learning process (Hikosaka et al., 2006), MSNs are thought to be activated by the sensory cortical inputs even early in learning, so the randomness involved in the activation of MSNs is specified to arise from trial-to-trial variations in the strength of these corticostriatal synapses. Of course, with the learning rule stated above, any mechanism would work in which random variations in saccade directions are caused by random variations in MSN activity.

It is unlikely, however, that MSNs in the striatum are the “variability generators” responsible for driving the random trial-and-error saccades early in the learning process. First, MSNs are largely silent in untrained animals (Hikosaka et al., 1989a). Second, SNr neurons do not appear to produce pauses prior to spontaneous saccades, as they do after learning (Hikosaka and Wurtz, 1983a). Furthermore, lesions of the SNr lead to increased spontaneous saccade generation (Hikosaka and Wurtz, 1985), rather than the decrease that might be predicted if the BG were the motor source of spontaneous saccades early during learning. It appears, therefore, that such trial-and-error saccades are not initiated in the striatum, but are likely generated by one of the many brain circuits that project to the superior colliculus and are capable of triggering or influencing saccade generation (Wurtz and Albano, 1980).

In order to learn the outcome of actions, striatal MSNs must know what actions an animal just took. But if MSNs don't generate the actions during learning, how do they get this information? The resolution of this paradox was made explicit in a recent model of vocal learning in the songbird (Fee and Goldberg, 2011). In the songbird, exploratory variability during singing is generated by a cortical brain region that also transmits an efference copy of motor commands to the BG. By receiving an efference copy of actions generated elsewhere in the brain, MSNs in the model are able to evaluate the outcome of these actions, and then appropriately influence the “actors,” even when those actors are circuits outside of the BG. I will now turn to a description of the songbird model before applying this principle to the problem of oculomotor learning.

A Model of Songbird BG Incorporating an Efference Copy of Motor Actions

Songbirds acquire their songs by vocal imitation, and it has been proposed that this is achieved by a reinforcement learning mechanism (Doya and Sejnowski, 1995, 2000; Tumer and Brainard, 2007; Fee and Goldberg, 2011). An essential brain area underlying vocal learning in the songbird is Area X (Sohrabji et al., 1990; Scharff and Nottebohm, 1991), a BG circuit with a high degree of homology with the mammalian BG (Figures 2A,B; Jarvis, 2004; Reiner et al., 2004; Doupe et al., 2005; Person et al., 2008). Area X includes both striatal medium spiny neurons (Farries and Perkel, 2002; Goldberg and Fee, 2010), as well as pallidal neurons that project to the thalamus (Luo and Perkel, 1999; Farries et al., 2005; Goldberg et al., 2010).

FIGURE 2

Figure 2. A model of vocal learning in the songbird. (A) Schematic diagram of nuclei involved in song production and song learning. (B) Hypothesized homology between songbird and mammalian brain areas (MC, motor cortex). (C) Firing patterns of a single corticostriatal neuron in LMAN during singing. Each row of the raster plot shows the spikes produced during a different rendition of the song (spectrogram shown at top). The high degree of variability in LMAN activity is thought to drive exploratory song variations during learning. (D) Firing patterns of seven different corticostriatal neurons in HVC during singing. Raster plot shows spikes produced during 10 sequential song renditions for each neuron. Note the highly stereotyped and sparse burst pattern of each neuron. The spiking of MSNs shows a similar degree of sparseness, potentially allowing MSNs to compute the value of LMAN fluctuations independently at each time in the song. (E) A simple hypothesized circuit for song learning. LMAN and HVC inputs converge, together with dopaminergic inputs from the VTA, onto a single medium spiny neuron (MSN) in Area X. The LMAN input to the MSN (hollow circle) arises as an axon collateral of the projection of LMAN to the motor pathway (RA). This efference copy input does not drive spiking in the MSN, but gates synaptic plasticity at the HVC input (filled circle). If LMAN activity is coincident with the HVC input, and leads to improved song performance (signaled by increased dopamine input), then the HVC-MSN synapse is strengthened. On future song renditions, the HVC input drives the MSN to spike, thus disinhibiting the thalamus and biasing the LMAN neuron to be more active at that time. (F) Schematic showing the closed topographical loops between LMAN, Area X and the thalamic nucleus DLM (Luo et al., 2001). This allows Area X to independently evaluate and bias activity in each different subregion of LMAN. Also shown are the hypothesized divergent inputs from HVC.

Area X receives two distinct glutamatergic inputs. One input arises from the lateral nucleus of the anterior nidopallium (LMAN; Vates and Nottebohm, 1995), a cortical area known to be important for vocal learning (Bottjer et al., 1984; Scharff and Nottebohm, 1991; Brainard and Doupe, 2002). A key function of LMAN in vocal learning is the generation of vocal babbling and exploratory variability in learning birds (Kao et al., 2005; Olveczky et al., 2005; Kao and Brainard, 2006; Tumer and Brainard, 2007; Aronov et al., 2008; Hampton et al., 2009; Stepanek and Doupe, 2010). Individual LMAN neurons project to the motor pathway and produce a collateral that terminates in Area X. During singing, these LMAN neurons generate highly variable patterns of activity (Figure 2C; Hessler and Doupe, 1999; Kao et al., 2005, 2008; Olveczky et al., 2005; Aronov et al., 2008) that drive variability in the vocal motor pathway (Sober et al., 2008; Olveczky et al., 2011). Thus, the input to Area X from LMAN is an efference copy of an ongoing motor signal that drives vocal exploration.

Importantly, complete bilateral lesions of Area X have little effect on vocal variability in juvenile birds, suggesting that the BG circuitry is not directly involved in the generation of vocal exploration during learning (Goldberg and Fee, 2011). Furthermore, local brain cooling within LMAN results in slowing of the characteristic timescales of vocal babbling, suggesting that the biophysical and circuit dynamics within LMAN are involved in generating vocal variability (Aronov et al., 2011). These experiments support the idea that the cortical nucleus LMAN is the “variability generator” that drives vocal exploration during learning.

A second input to Area X comes from nucleus HVC (used as a proper name), a cortical region that controls the temporal structure of the song (Margoliash and Yu, 1996; Hahnloser et al., 2002; Long and Fee, 2008; Long et al., 2010). The HVC neurons projecting to Area X burst very sparsely, many generating a single highly reliable burst of spikes at one or a few specific moments in the song (Kozhevnikov and Fee, 2007; Prather et al., 2008; Fujimoto et al., 2011; Figure 2D). This input has been hypothesized to serve as a “context” signal that carries information about the current time in the song (Kozhevnikov and Fee, 2007; Fee and Goldberg, 2011). Interestingly, MSNs in Area X also fire extremely sparsely during singing, producing at most one burst of spikes at a specific moment of the song (Goldberg and Fee, 2010). This pattern suggests that the spiking of MSNs is likely driven by HVC (“context”) inputs rather than LMAN (“efference copy”) inputs. Furthermore, it indicates that spiking in any one MSN is driven by a very small subset of HVC inputs that are coactive at one moment in the song.

In songbirds, Area X also receives a large dopaminergic projection from VTA (Gale et al., 2008), and several lines of evidence suggest that this input could be important for song learning (Ding et al., 2003; Harding, 2004; Gale and Perkel, 2005; Kubikova and Kostal, 2010; Kubikova et al., 2010). In particular, it has been suggested that this input may serve as a fast reward prediction error signal that carries real-time information about song performance (Gale and Perkel, 2010; Fee and Goldberg, 2011). The evaluation of song performance in auditory cortical areas could be transmitted to VTA via descending forebrain projections (Gale et al., 2008; Gale and Perkel, 2010; Las et al., 2011). With such a song evaluation signal, and an efference copy of the LMAN activity leading to vocal variability, Area X would be in a position to determine which variations lead to a better song outcome.

Of course, just as a leftward saccade might lead to reward after the appearance of target 1 but lead to no reward after target 2, a particular song variation generated by LMAN might make the song better at one point in the song, but make it worse at another point. Thus, the evaluation of LMAN activity would need to be carried out independently at every time point in the song. The sparse firing of MSNs and of HVC inputs to Area X would facilitate the temporal specificity of this computation (Fiete et al., 2004). It has been proposed that such a context-specific evaluation of LMAN activity could be implemented with a synaptic learning rule, perhaps related to gated Hebbian “triplet” learning rules (Farries and Fairhall, 2007; Fiete et al., 2007; Izhikevich, 2007; Redondo and Morris, 2011), that detects coincident activation of LMAN and HVC inputs, followed by a dopaminergic reward signal (Fee and Goldberg, 2011; Figure 2E). The result of this pattern of coincident inputs would be to strengthen HVC-to-MSN synapses such that the activity of a medium spiny neuron at a particular time would indicate that activity in the LMAN neuron at that time consistently led to a better song performance. Finally, the output of Area X would be transmitted back to LMAN through the thalamus to bias future LMAN-generated song variations in the direction of better song performance (Kao et al., 2005; Olveczky et al., 2005; Fee and Goldberg, 2011). Such biased vocal variability has been directly observed in learning birds (Andalman and Fee, 2009; Warren et al., 2011).
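One possible way to write the rule just described in symbols is given below. This is only a hedged formalization for illustration; every symbol is an assumption introduced here: H_j(t) is the activity of HVC input j, L_i(t) is the LMAN efference copy arriving at MSN i, D(t) is the dopaminergic reward signal, e_ij(t) is an eligibility trace with decay time τ_e, W_ij is the strength of the HVC-to-MSN synapse, and η is a learning rate.

```latex
\tau_e \frac{d e_{ij}}{dt} = -e_{ij}(t) + H_j(t)\,L_i(t),
\qquad
\frac{d W_{ij}}{dt} = \eta\, D(t)\, e_{ij}(t)
```

In this form the trace e_ij is nonzero only after HVC and LMAN inputs coincide, so a later dopamine signal strengthens only those HVC synapses that were active at the moment of the LMAN activity being evaluated. The LMAN synapse itself does not appear on the left-hand side of the weight equation, reflecting the proposal that the efference copy gates plasticity without being potentiated.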

An important feature of the HVC and LMAN inputs to Area X is their pattern of axonal projections (Figure 2F). The projection from LMAN to Area X is local and topographically organized, as is the projection of LMAN to the robust nucleus of the arcopallium (RA), the myotopically organized output nucleus of the motor pathway (Vicario, 1991; Iyengar et al., 1999). Thus Area X can be considered as divided into discrete motor “channels.” Furthermore, the projections from Area X to the pallidorecipient thalamic nucleus DLM (medial portion of the dorsolateral thalamus) and from DLM back to LMAN are also topographically organized, such that the connections among these three nuclei form discrete closed loops (Johnson et al., 1995; Luo et al., 2001). The closed-loop nature of these projections allows MSNs to feed back and selectively influence the activity of the particular subset of LMAN neurons from which they receive inputs (Figure 2F).

A Novel Model of Oculomotor Learning That Incorporates Efference Copy of Motor Actions

Returning now to the problem of oculomotor learning, we can imagine that early during learning—before the monkey has learned the association between a visual target and the saccade direction that leads to reward—there is a “variability generator” that generates random “guesses” at saccade direction during each behavioral trial. It could do this by transmitting a command signal to the superior colliculus, analogous to the commands sent by LMAN to the songbird vocal motor pathway. In order to explain how the striatum can learn the value of the saccade guesses, I hypothesize that MSNs must receive an efference copy of these saccade commands, just as Area X receives an efference copy of LMAN activity. There are several brain regions that could potentially generate saccade “guesses” during learning, but one likely possibility is the cortical frontal eye fields (FEF). The FEF sends a topographically organized projection to the superior colliculus (Bruce and Goldberg, 1985; Komatsu and Suzuki, 1985), an efference copy of which is transmitted to the striatum (Kunzle and Akert, 1977; Alexander et al., 1986). In the model shown in Figure 3A, the role of the efference copy input is not to drive spiking activity in the MSN, but rather to gate synaptic plasticity at cortical context inputs. The efference copy input is envisioned as a glutamatergic synapse that could operate at a mechanistic level by depolarizing MSN dendrites sufficiently to “enable” corticostriatal plasticity (Charpier and Deniau, 1997; Reynolds et al., 2001; Plotkin et al., 2011), but not necessarily enough to drive spiking.

FIGURE 3

Figure 3. A model of oculomotor learning in the BG incorporating an efference copy signal. (A) “Random” saccades during learning are generated in cortical frontal eye fields (dark yellow, FEF). Efference copy inputs to the MSN (hollow circle, analogous to LMAN inputs to Area X) arise from a collateral of the descending motor commands from the FEF to the superior colliculus (SC). Context inputs to the MSN (filled circles, analogous to HVC inputs to Area X) arise from cortical neurons conveying sensory inputs. The output of the SNr biases saccade generation by a projection to intermediate “motor” layers of SC. (B) The hypothesized learning rule that incorporates efference copy, context, and reward signals. Coincident activation of context (CX) and efference copy (EC) inputs activates a transient eligibility trace (E_trace). If a reward signal (R_eward) coincides with the eligibility trace, then the CX input is strengthened (ΔW_CX-MSN > 0). (C) Hypothesized sequence of events during learning. (1) Cortical neuron Ctx-1 becomes active, indicating the appearance of a particular target (i.e., Target 1). (2) The FEF generates a “random guess” at a saccade direction, in this case, to the left. This combination activates an eligibility trace in the Ctx-1 to MSN-L synapse. (3) If leftward saccades are rewarded in response to Target 1, the monkey receives a reward, resulting in increased spiking in dopaminergic VTA neurons. (4) The coincidence of the reward and eligibility trace results in strengthening of the Ctx-1 to MSN-L synapse. Thus, future appearances of Target 1 will bias the monkey to make a leftward saccade.

If a particular action occurring within a particular context leads to reward, then future occurrences of that context should cause the action to occur with a higher probability (Thorndike, 1911). At the synaptic level this could be achieved by the same learning rule described above for the songbird: strengthening the context inputs that were active simultaneously with the arrival of a motor efference copy signal (action), and that are followed by a reward signal (Figure 3B). In the model of oculomotor learning shown in Figure 3A, the sequence of events during learning would occur as follows (Figure 3C): on one particular trial, the appearance of target 1 (which activates the Ctx-1 neurons) may be followed by a chance activation of the FEF-L neuron that initiates a saccade to the left. The left MSN will then simultaneously receive a context input from Ctx-1 and an efference copy input from FEF-L. If left saccades are rewarded after the appearance of target 1, this context-action pairing will be followed by a widespread dopaminergic reward signal from VTA/SNc. This combination of inputs would strengthen the Ctx-1 to MSN-L synapse. The Ctx-1 to MSN-R synapse will not be strengthened because, in this model, the efference copy of left saccades is transmitted only to the left MSN, and corticostriatal plasticity will only be enabled in this MSN (Figure 3C, right panel). The result of this learning rule is that future appearances of target 1 will activate the left MSN neuron, which would initiate leftward saccades by the direct action of SNr neurons on the SC. In short, by incorporating an efference copy signal, the model is able to learn, in a highly specific manner, the value of any action in any context, as long as MSNs controlling that action receive CX inputs from a neuron signaling that context.
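
The sequence of events described above can be sketched as a small simulation. This is a minimal illustration rather than an implementation from the model itself: the function name `train`, the labels `target1`/`target2`, the task contingency in `rewarded`, the learning rate, and the saturating weight update are all assumptions introduced here. For simplicity the saccade guesses stay purely random and are not biased by the learned weights.

```python
import random

def train(n_trials=200, lr=0.1, seed=0):
    """Toy three-factor rule: the EC input gates plasticity at the active CX synapse."""
    rng = random.Random(seed)
    contexts = ["target1", "target2"]            # Ctx-1, Ctx-2 (hypothetical labels)
    msns = ["L", "R"]                            # MSN-L and MSN-R, one per saccade direction
    rewarded = {"target1": "L", "target2": "R"}  # assumed task contingency
    # w[(context, msn)] is the strength of the CX synapse onto that MSN
    w = {(c, m): 0.0 for c in contexts for m in msns}
    for _ in range(n_trials):
        context = rng.choice(contexts)           # a target appears
        guess = rng.choice(msns)                 # FEF "random guess" at a saccade direction
        reward = 1.0 if rewarded[context] == guess else 0.0
        for m in msns:
            # The efference copy reaches only the MSN in the channel of the executed saccade
            ec = 1.0 if m == guess else 0.0
            # The CX input for this context is active, so eligibility is set by the EC gate
            eligibility = ec
            # Only the CX synapse changes; saturation at 1 is an assumption of this sketch
            w[(context, m)] += lr * reward * eligibility * (1.0 - w[(context, m)])
    return w

weights = train()
print(weights)
```

In this sketch the Ctx-1 to MSN-L weight grows while the Ctx-1 to MSN-R weight stays at zero, because eligibility is only ever set in the channel that received the efference copy.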

MSNs could also influence saccade direction via the pallidal projection to thalamic nuclei that project back to the FEF (Figure 4A; Alexander et al., 1986). This could serve to bias the “variability generator” in the FEF to generate leftward saccades after the appearance of target 1 through a BG-thalamocortical loop, in much the same way that Area X has been proposed to bias vocal variability from LMAN during learning (Andalman and Fee, 2009; Fee and Scharff, 2010; Fee and Goldberg, 2011; Warren et al., 2011). After learning, the BG would consistently drive activity in the FEF-L neuron after the appearance of target 1 (signaled by the activation of Ctx-1). Indeed, recordings of FEF neurons have revealed reward-related bias similar to that observed in the striatum (Ding and Hikosaka, 2006). This consistent pairing of activity in Ctx-1 and FEF-L could lead to strengthening of a direct connection from Ctx-1 to FEF-L, similar to the consolidation of LMAN driven bias into the direct HVC-to-RA projection, hypothesized in a recently proposed model of songbird vocal learning (Andalman and Fee, 2009; Fee and Scharff, 2010; Fee and Goldberg, 2011; Warren et al., 2011). This is also a mechanism by which often-repeated stimulus-response associations could be transformed into a cortically driven habitual behavior (Hikosaka et al., 2002; Yin and Knowlton, 2006; Graybiel, 2008).

FIGURE 4

Figure 4. Three additional models of BG circuits incorporating efference copy. (A) Efference copy comes from the FEF, as in Figure 3A, but the output of the BG acts to bias saccade generation in the FEF through the pallido-thalamo-cortical loop, rather than acting directly on the SC. (B) A model in which “random” saccades may be driven by any input to the SC, and efference copy signals to the BG arise from ascending tectothalamic and thalamostriatal pathways. In this model, the BG biases saccade generation by acting directly on the SC. (C) A model in which both efference copy inputs and context inputs arise from thalamostriatal pathways. This model is hypothesized to represent an evolutionarily early role for the BG in controlling brainstem-generated behavior.

Asymmetries Between Context and Efference Copy Inputs

This model incorporates two fundamental asymmetries between context (CX) inputs and efference copy (EC) inputs to the striatum. First, plasticity is only produced at CX inputs. In this model, the function of the BG circuit is to drive or bias a particular action (leftward saccade) in a given context (i.e., appearance of target 1). Such bias is naturally produced by strengthening the context input onto MSNs. From the perspective of learning an oculomotor stimulus-response association, it makes no sense to also strengthen the EC input to the MSN, the result of which would be that a spontaneous leftward saccade would tend to initiate another leftward saccade, independent of context.

A second essential asymmetry between CX and EC inputs relates to the convergence and divergence of these inputs onto MSNs. Namely, the projection of EC signals must be local within one motor channel of the BG, while the projection of CX signals must be highly divergent across many motor channels. Local projections of EC inputs within one motor channel of the striatum are required because, in order to learn the value of a saccade in a particular direction, the efference copy signal indicating a saccade in a particular direction needs to project precisely to the same MSNs that can influence that saccade direction in the future. For example, in the model shown in Figure 3A, if the efference copy inputs from the FEF each projected to both MSNs, there would be no specificity in the synaptic learning rule. Most generally, crosstalk of EC inputs across motor channels would have a detrimental effect, causing spurious actions to be learned. The highly topographic projection from LMAN to Area X exhibits precisely the kind of specificity suggested (Johnson et al., 1995; Luo et al., 2001), allowing MSNs to evaluate the effect of variability introduced into distinct “channels” of the motor pathway.
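
The loss of specificity under EC crosstalk can be illustrated with a toy calculation. The sketch below is illustrative only: the function `learn`, the projection tables `local` and `crosstalk`, the two-target task, and the saturating update are assumptions introduced here, not quantities taken from the model.

```python
def learn(ec_projection, n_sweeps=20, lr=0.1):
    """Apply the EC-gated rule with a given EC-to-MSN projection pattern."""
    contexts = ["target1", "target2"]
    channels = ["L", "R"]
    rewarded = {"target1": "L", "target2": "R"}   # assumed task contingency
    w = {(c, m): 0.0 for c in contexts for m in channels}
    for _ in range(n_sweeps):
        for c in contexts:
            for action in channels:               # each context/guess pairing occurs once per sweep
                reward = 1.0 if rewarded[c] == action else 0.0
                for m in channels:
                    # ec_projection[action][m]: how strongly the EC of `action` reaches MSN m
                    gate = ec_projection[action][m]
                    w[(c, m)] += lr * reward * gate * (1.0 - w[(c, m)])
    return w

# EC confined to its own motor channel versus EC projecting to both MSNs
local = {"L": {"L": 1.0, "R": 0.0}, "R": {"L": 0.0, "R": 1.0}}
crosstalk = {"L": {"L": 1.0, "R": 1.0}, "R": {"L": 1.0, "R": 1.0}}

w_local = learn(local)
w_cross = learn(crosstalk)
print(w_local[("target1", "L")], w_local[("target1", "R")])
print(w_cross[("target1", "L")], w_cross[("target1", "R")])
```

With the local projection only the rewarded channel is strengthened. With full crosstalk both MSNs acquire the same context weight, so the context no longer favors either saccade direction.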

There is evidence for this type of channel specificity in mammalian BG circuits. In primates, for example, there is a coarse topographic organization of separate cortico-BG-thalamocortical loops for skeletomotor, oculomotor, prefrontal, and limbic circuits (Alexander et al., 1986). There is even evidence for some finer-grained topographic specificity within these larger loops. For example, multiple distinct pathways have been identified within the skeletomotor BG-thalamocortical circuit (Hoover and Strick, 1993), and there is some evidence for discrete topography in the output pathways of frontal eye fields (Robinson and Fuchs, 1969; Bruce and Goldberg, 1985; Komatsu and Suzuki, 1985; Schlag-Rey et al., 1992). It is unknown if motor and other coarse loops of the BG-cortical circuits exhibit the kind of fine-grained functional topography observed in the songbird, and predicted by the model proposed here.

In contrast to the local projections of EC inputs, the projection of CX signals must be highly divergent across many motor channels because context inputs to the striatum should have no intrinsic meaning in relation to actions. A particular action might lead to a reward in one context, but lead to an undesirable outcome in another context, and there may be little a priori knowledge about which contexts require which actions. It therefore seems adaptive to build in an enormous divergence in the projection of cortical context inputs onto MSNs. Indeed, the large degree of convergence of cortical projections onto striatal MSNs is widely recognized to be important (Goldman-Rakic and Selemon, 1986; Flaherty and Graybiel, 1991, 1993, 1994) in part because it may endow MSNs with an enormous capacity for pattern recognition (Kincaid et al., 1998; Zheng and Wilson, 2002; Bar-Gad et al., 2003) and identification of cortical “states” (Houk and Wise, 1995; Houk, 1995; Graybiel, 1998). From the perspective of our model, this could be used to link a wide variety of contexts to any action. In the songbird, for example, the projections from HVC to Area X are not topographically organized (Nottebohm et al., 1982; Luo et al., 2001), thus potentially allowing MSNs in every motor channel to evaluate LMAN activity at every time point in the song (Fee and Goldberg, 2011).

The view that the striatum receives functionally distinct cortical signals has already been proposed on the basis that cortical neurons produce two distinct types of projections to the striatum (Reiner et al., 2010)—one from pyramidal tract (PT) neurons in deep cortical layers and another from intratelencephalic (IT) neurons in layer 3 and upper layer 5 (Ramon y Cajal, 1911; Wilson, 1987; Cowan and Wilson, 1994; Levesque et al., 1996a,b; Levesque and Parent, 1998; Reiner et al., 2003; Parent and Parent, 2006). PT neurons (by definition) project to the spinal cord or to brainstem motor structures, and mediate the descending “motor” output of cortex. The axons of these neurons produce a fine collateral axon that terminates focally within the striatum, and typically forms a dense terminal plexus no more than 500 μm in diameter (Cowan and Wilson, 1994; Kincaid and Wilson, 1996; Parent and Parent, 2006). In contrast, IT neurons project only within the telencephalon, often to the contralateral cortical hemisphere (Gerfen and Wilson, 1996; Wright et al., 2001), and produce an axon collateral that projects diffusely within the striatum, typically over distances of several millimeters (Cowan and Wilson, 1994; Kincaid and Wilson, 1996; Parent and Parent, 2006). This pattern of striatal projections is consistent with the notion that PT neurons carry efference copy information and IT neurons carry context information. Thus, in the models shown in Figures 3A and 4A, the efference copy inputs to the striatum are hypothesized to be carried by topographically localized PT fibers from the FEF, while the context inputs would be carried by the diffuse IT fibers from a wide range of cortical areas, including potentially sensory and task-related frontal cortical areas.

Notably, the different firing patterns of PT and IT neurons in the motor and premotor cortex of primates also suggest that these inputs serve different functions (Reiner et al., 2010), perhaps related to their hypothesized role as efference copy and context signals, respectively. For example, the activity of PT neurons is more dense and more closely related to variations in motor activity, while that of IT neurons is sparser and perhaps more related to movement planning (Bauswein et al., 1989; Turner and DeLong, 2000; Beloozerova et al., 2003).

A Model of Motor Learning with a Thalamic Source of Efference Copy Signals

In relation to oculomotor learning, thalamostriatal projections are another particularly attractive candidate source of efference copy information. Several thalamic nuclei receive projections from the superior colliculus (McHaffie et al., 2005), in particular from the intermediate and deep layers in which neurons exhibit saccade-related motor activity (Harting et al., 1980; Krout et al., 2001). In primates, electrophysiological recordings in one of these thalamic nuclei (the lateral portion of the mediodorsal nucleus, MD) have confirmed the presence of neurons with robust saccade-related corollary discharge activity, producing a premotor burst of spikes tuned to saccades of a particular magnitude and direction (Sommer and Wurtz, 2002). While the focus of this work was on the role of MD in carrying efference copy signals to the frontal eye fields (Sommer and Wurtz, 2008), there is some evidence from anatomical studies in the cat that neurons in the homologous portions of the MD may also project to the striatum (Royce, 1983).

Figure 4B illustrates a simple model of oculomotor learning incorporating a thalamic source of efference copy information originating in the SC. The learning rule, and the logic by which action-specific learning happens in this circuit, is identical to the model with a cortical efference copy shown in Figures 3A and 4A. However, these previous models suffer from the difficulty that a large number of brain areas, besides the FEF, project to deep and intermediate layers of the superior colliculus (Wurtz and Albano, 1980), and can potentially influence saccade decisions. Thus, if efference copy information comes only from the FEF, the BG would not have access to information about saccades generated by these other components of the oculomotor system, and would not be able to learn from saccades generated by these other circuits. The advantage of the model shown in Figure 4B, in which efference copy signals arise from low-level brainstem motor systems and are transmitted back to the striatum through thalamostriatal pathways, is that the efference copy informs the striatum what action actually took place, not just what was instructed by the FEF.

Thalamostriatal inputs exhibit some features that may be consistent with a possible role in transmitting efference copy signals. The thalamostriatal projection forms excitatory glutamatergic synapses onto both direct- and indirect-pathway MSNs (Kemp and Powell, 1971; Wilson et al., 1983; Smith and Bolam, 1990; Doig et al., 2010; Reiner et al., 2010) that have properties distinct from corticostriatal synapses (Ding et al., 2008) and have been hypothesized to play a distinct role in striatal computations (Smith et al., 2011). Most importantly, striatal projections from some thalamic nuclei have been shown to produce a localized and dense terminal plexus (Deschenes et al., 1995; Deschenes and Bourassa, 1996; McFarland and Haber, 2001), similar to that described above for cortical PT neurons. Thus, there is some evidence that direct pathway MSNs receive topographically localized projections from the thalamus and topographically diffuse projections from IT-type cortical neurons, as required for the model shown in Figure 4B. We will return to a discussion of the indirect pathway in a later section of the paper.

The Ultrastructure of Striatal Inputs: Context and Efference Copy

In the model described above, context and efference copy inputs have a fundamental asymmetry in how they drive spiking activity in MSNs and how they undergo plasticity. Namely, context inputs are the primary drivers of spiking activity in MSNs, and are the only site of corticostriatal plasticity during learning. The model, as formulated, does not require that EC inputs drive spiking or undergo plastic changes. Such functional differences would likely be reflected in a structural asymmetry at the synaptic level. Of particular interest are reports that many thalamostriatal fibers terminate on the dendritic shafts of MSNs (Sadikot et al., 1992; Smith et al., 1994; Sidibe and Smith, 1996; Doig et al., 2010), while IT-type corticostriatal fibers synapse primarily on dendritic spines (Kemp and Powell, 1971; Reiner et al., 2003). How does this pattern relate to the hypothesized function of context and efference copy inputs? Because CX inputs onto MSNs are highly divergent, with each IT fiber probably contacting only a single spine on a given MSN (Kincaid et al., 1998; Zheng and Wilson, 2002), neighboring spines likely carry very different context signals. Thus, in order to avoid cross talk between neighboring CX inputs, the postsynaptic signals mediating plasticity at these inputs should be highly restricted to a single synapse. Such localization of synaptic signals and synaptic apparatus is thought to be one of the most important physiological functions of synaptic spines (Yuste, 2011).

In contrast, in the proposed models, EC inputs can be treated computationally as a single cell-wide input to an entire MSN; there is no reason to isolate EC synapses from either CX inputs or other EC inputs. Thus, based on the computational role of CX and EC inputs, it would make sense for context inputs (e.g., from IT fibers) to terminate on MSN spines and for efference copy inputs to terminate on dendritic shafts. According to the earlier identification of LMAN inputs to Area X as an efference copy signal and HVC inputs as a context signal, two clear predictions of this model are that (1) LMAN axons should terminate preferentially onto dendritic shafts of Area X MSNs, and (2) the dendritic spines of these MSNs should be contacted primarily by HVC axons.

How might efference copy and context inputs to MSNs interact to produce the desired learning rule, depicted in Figure 3B? It has previously been observed that corticostriatal plasticity is strongly modulated by the depolarization state of the postsynaptic MSN (Charpier and Deniau, 1997). Thus, EC inputs could act to directly depolarize MSN dendrites, providing a widely distributed intracellular signal that could “enable” plasticity at corticostriatal context inputs, perhaps by pushing the dendrite into the “up” state (Wilson and Kawaguchi, 1996; Stern et al., 1998). Corticostriatal LTP is dependent on postsynaptic calcium (Charpier and Deniau, 1997), and postsynaptic calcium influx into spines following glutamatergic activation can be localized to a single spine, and is enhanced when the neuron is in a depolarized up state (Carter and Sabatini, 2004). Additionally, corticostriatal plasticity exhibits a strong dependence on the relative timing between cortical input and depolarization induced by backpropagating action potentials (Pawlak and Kerr, 2008) such that corticostriatal input followed by MSN spiking leads to long-term potentiation of cortical input. While Pawlak and Kerr interpreted these findings in terms of a Hebbian relation between presynaptic input and postsynaptic activity, such a mechanism could also result in selective potentiation of corticostriatal CX inputs at which presynaptic input is followed by dendritic depolarization due to an efference copy input, rather than from backpropagating action potentials.

Synaptic spines that received corticospinous CX input followed by depolarization from dendritic EC input would then be eligible for synaptic potentiation depending on the subsequent arrival of a reinforcement signal. It is known that corticostriatal LTP is dependent on dopaminergic signaling through D1-type receptors (Wickens and Kötter, 1995; Pawlak and Kerr, 2008). It has been suggested that postsynaptic calcium may constitute, or initiate, an “eligibility trace” (Houk et al., 1995; Wickens and Kötter, 1995; Suri and Schultz, 1999) that serves as a memory of prior corticostriatal activation until the later arrival of a dopaminergic reward signal. If the context signal is sufficiently sparse, such an eligibility trace allows for the strengthening of the correct synapses even if the reward signal is significantly delayed after the action occurs (Fiete et al., 2007; Fee and Goldberg, 2011). Dopaminergic inputs into the striatum have been reported to synapse preferentially onto the necks of dendritic spines (Freund et al., 1984), placing them in closer proximity to the site of cortical inputs than to the thalamostriatal synapses located on dendritic shafts (Smith et al., 1994, 2004).
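
The role of a decaying eligibility trace in bridging a delay between action and reward can be sketched in discrete time. The following is a minimal illustration under stated assumptions: the function name `delayed_reward_update`, the exponential decay, the time constant `tau`, the unit time step, and the learning rate are hypothetical choices, not measured quantities.

```python
import math

def delayed_reward_update(cx_times, ec_time, reward_time, tau=5.0, lr=0.5, n_steps=50):
    """Return the weight change at each CX synapse when reward arrives after a delay.

    cx_times[i] is the single time step at which sparse context input i is active.
    """
    decay = math.exp(-1.0 / tau)              # per-step decay of the trace (assumed exponential)
    traces = [0.0] * len(cx_times)
    dw = [0.0] * len(cx_times)
    for t in range(n_steps):
        for i, t_cx in enumerate(cx_times):
            traces[i] *= decay
            # The trace is set only where CX input coincides with the EC input
            if t == t_cx and t == ec_time:
                traces[i] = 1.0
        if t == reward_time:
            # Dopaminergic reward converts the remaining trace into a weight change
            dw = [lr * e for e in traces]
    return dw

print(delayed_reward_update(cx_times=[10, 20, 30], ec_time=20, reward_time=25))
```

Because each context input is active at only one time step in this sketch, only the synapse whose context coincided with the efference copy carries a trace when the reward arrives, and the size of the update shrinks as the reward is delayed further.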

Further evidence for the hypothesized relation between synaptic ultrastructure and the division of striatal inputs into context and efference copy comes from a closer examination of different types of thalamostriatal projections. While I have focused so far on potential thalamic sources of efference copy signals, it is possible that some thalamostriatal projections carry context signals. Context inputs from the thalamus might be expected to form diffuse widespread axonal arborizations in the striatum, just like cortical IT neurons. Notably, the literature on the thalamostriatal system in both rats (Deschenes et al., 1995) and in monkeys (McFarland and Haber, 2001) provides strong evidence for the existence of both diffuse and focal axonal arborizations within the striatum. In rats, for example, projections from the caudal intralaminar nuclei tend to make a focal, dense cluster of terminations that may make multiple contacts onto single MSNs (Deschenes et al., 1995; Parent and Parent, 2005). These thalamic areas receive input from middle and deep layers of the superior colliculus, and the striatal targets of these thalamic nuclei project to regions of the SNr that in turn project to the deep layers of the SC, thus forming a topographically ordered subcortical loop (McHaffie et al., 2005). This organization forms the anatomical basis of the feedback loop shown in Figure 4B, in which the topographically localized thalamostriatal projection serves as a motor efference copy signal.

In contrast to the focal, clustered terminal arborizations produced by neurons in the caudal intralaminar nuclei, projections to the striatum arising from several other thalamic nuclei make diffuse, sparse projections (Deschenes et al., 1995; McFarland and Haber, 2001). One source of diffuse projections is the lateral posterior thalamus (LP), part of the extrageniculate visual system that receives input from the superficial, exclusively visual layers of the SC (Graybiel, 1972; Abramson and Chalupa, 1988; Berson and Graybiel, 1991). The fact that the striatal projection from LP likely carries sensory information is consistent with the hypothesis that context inputs should project diffusely within the striatum.

Remarkably, studies of the synaptic ultrastructure of these two thalamostriatal inputs reveal an asymmetric pattern that correlates with the pattern of their axonal arborizations. As described earlier, the focal, clustered axonal arborizations of the caudal intralaminar nuclei—those potentially carrying efference copy information from deep layers of the SC—preferentially terminate on dendritic shafts of MSNs. In contrast, the diffuse projections from LP—thought to carry visual information—produce en-passant synapses preferentially onto the spines of striatal MSNs (Ichinohe et al., 2001; McHaffie et al., 2005), just as IT-type cortical fibers terminate preferentially onto synaptic spines of direct-pathway MSNs. Thus, a number of diverse findings on the circuit connectivity, axonal arborization patterns, and synaptic ultrastructure of different thalamostriatal circuits can be interpreted in light of the functional asymmetry between context and efference copy inputs hypothesized above.

One interesting corollary of the hypothesis that LTP at corticostriatal synapses is “gated on” by an excitatory input to dendritic shafts (such as an efference copy signal) is that LTP could also be “gated off” by an inhibitory input. Indeed, most inhibitory synapses onto MSNs, including those from other MSNs, terminate on dendritic shafts rather than on spines (Wilson and Groves, 1980), and are thus well suited to serve this function. In this case, spiking activity in one MSN would inhibit synaptic potentiation in the other MSNs to which it projects. It has long been suspected that inhibitory interactions between MSNs might introduce a winner-take-all mechanism that could increase sparseness and selectivity in MSN responses to cortical inputs (Wilson and Groves, 1980; Wickens et al., 1991). While recent evidence suggests that the lateral inhibition between MSNs is probably too sparse and weak to generate winner-take-all firing rate dynamics (Maass, 2000; Wilson, 2007; Plenz and Wickens, 2010), a competitive interaction that suppresses LTP of cortical inputs would likely require weaker lateral interactions, and could implement the previously hypothesized dimensionality reduction in the cortical-to-striatal transformation (Bar-Gad et al., 2003). Furthermore, when such lateral inhibition is coupled to the plasticity-promoting effect of an efference copy input, this competitive mechanism would tend to produce a compact, low-dimensional representation of the cortical context in which a particular action leads to reward.

The Indirect Pathway

It is interesting to speculate on how the indirect pathway might be integrated into the proposed view of BG function. In the classical division of the BG into direct and indirect pathways, tonically active pallidal output neurons in the GPi and SNr receive an inhibitory input from tonically active neurons in the external segment of the globus pallidus (GPe; Mink, 1996). These GPe neurons can, in turn, be inhibited by a distinct population of “indirect-pathway” MSNs expressing D2-type dopamine receptors. Cortical activation of indirect-pathway MSNs thus inhibits GPe neurons, causing increased spiking in the GPi/SNr output neurons and increased inhibition of downstream thalamic and motor targets (Alexander and Crutcher, 1990; Smith et al., 1998). Thus, activation of indirect-pathway MSNs has an effect exactly opposite that of activating direct-pathway MSNs, and is thought to be a mechanism to put a “brake” on downstream motor targets (Nambu, 2004).

For example, if a particular motor action produces a worse-than-expected outcome in a particular context, then strengthening of the corticostriatal CX inputs onto the appropriate indirect-pathway MSNs would allow the context inputs to suppress that motor action in that context. More specifically, one can imagine a simple indirect-pathway counterpart to the models shown in Figures 3 and 4 in which EC and CX inputs converge onto indirect-pathway MSNs. Of course, for this model to work, the corticostriatal synapses onto indirect-pathway MSNs would require a different learning rule than those onto direct-pathway neurons. Namely, indirect-pathway corticostriatal LTP should result from simultaneous activation of a CX and EC input followed by the unexpected absence of a reward [signaled by a transient decrease in dopamine input (Schultz et al., 1997)], rather than the unexpected appearance of a reward (signaled by a transient increase in dopamine). Indeed, such differences in learning rules onto direct- and indirect-pathway MSNs have been reported in studies of corticostriatal plasticity that distinguish between MSNs on the basis of the different types of dopamine receptors expressed in these two pathways (Shen et al., 2008).
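
The opposite dopamine dependence proposed for the two pathways can be written as a pair of sign-selective updates. This is a schematic sketch only: the function name `weight_changes`, the signed `dopamine_delta` (positive for a transient increase, negative for a transient decrease), the rectification, and the learning rate are assumptions made for illustration and are not taken from the plasticity studies cited above.

```python
def weight_changes(cx_active, ec_active, dopamine_delta, lr=0.1):
    """Return (direct, indirect) CX weight changes for one context-action pairing."""
    # Eligibility requires coincident context and efference copy input in both pathways
    eligibility = 1.0 if (cx_active and ec_active) else 0.0
    # Direct pathway: potentiate after an unexpected reward (dopamine increase)
    d_direct = lr * eligibility * max(dopamine_delta, 0.0)
    # Indirect pathway: potentiate after an unexpected omission (dopamine decrease)
    d_indirect = lr * eligibility * max(-dopamine_delta, 0.0)
    return d_direct, d_indirect

print(weight_changes(True, True, 1.0))    # better than expected
print(weight_changes(True, True, -1.0))   # worse than expected
print(weight_changes(True, False, -1.0))  # no efference copy, so no eligibility
```

Under these assumptions a better-than-expected outcome strengthens only the direct-pathway context synapse, a worse-than-expected outcome strengthens only the indirect-pathway one, and neither changes without a coincident efference copy.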

An additional prediction of this extended model is that EC inputs should form topographically localized projections onto both direct- and indirect-pathway MSNs. Using the arguments made above about thalamostriatal inputs, EC inputs might also be expected to form synapses onto the dendritic shafts in both MSN types. Indeed, some studies have indicated that thalamostriatal inputs form axodendritic synapses on both MSN types with roughly equal probability (Doig et al., 2010). However, other studies suggest that, while thalamic inputs to the striatum terminate on the dendritic shafts of direct-pathway MSNs, they terminate significantly less often onto indirect-pathway MSNs (Sidibe and Smith, 1996; Smith et al., 2004). In addition, the PT fibers that are the hypothesized cortical source of efference copy inputs preferentially contact indirect-pathway MSNs, and, furthermore, preferentially contact synaptic spines (Reiner et al., 2003, 2010). Of course, it is possible that PT fibers make sufficient contacts with direct-pathway MSNs (perhaps even onto dendritic shafts) to function as an EC input. But it is not clear, at this point, how the various reported differences between cortical and thalamic innervation of direct and indirect pathway MSNs can be related to the model proposed here.

One possibility, suggested by reports that PT inputs preferentially contact synaptic spines of indirect-pathway MSNs, is that these motor efference copy signals may also serve as a “context” signal in the indirect pathway, perhaps acting (as suggested by Reiner et al., 2010) to suppress specific motor channels that interfere with ongoing motor actions. More specifically, in the context of an ongoing motor action (represented in this case by a PT fiber acting as a context input), the occurrence of a conflicting motor action (represented by an efference copy input) would lead to a lower probability of reward. By the learning rule described above, this would lead to potentiation of the PT input onto the indirect-pathway MSN, such that during future occurrences of the ongoing motor action (the context), there would be a lower probability of generating the conflicting action. While this picture might explain the termination of PT axons onto the spines of indirect-pathway MSNs, it still violates the “principle” that context inputs should be topographically diffuse. Thus, while the anatomy of cortical and thalamic inputs to the direct pathway fits the proposed model reasonably well, a number of reported anatomical features of the inputs to the indirect pathway are not predicted by the model, as it is currently formulated.

Other Motor Systems

I have presented the argument that information about eye movements important for oculomotor learning may be transmitted from brainstem circuits as an efference copy through thalamostriatal pathways. Of course, the same view would likely apply to other motor systems as well. Central pattern generator circuits in the brainstem are capable of generating a wide range of behaviors: locomotion, turning, innate vocalizations, feeding, postural tone, and possibly even facial expressions, and other displays of emotions (Russell and Bullock, 1985; Grillner et al., 2005). It has been suggested (Grillner et al., 2005) that the BG play an important role in the selective and flexible control of this “tool box of motor infrastructure” (Takada et al., 1994; Swanson, 2000; Grillner, 2003; Hikosaka, 2007). If the hypothesis presented here for the oculomotor system is correct, then other brainstem motor circuits under the control of the BG should also send an efference copy of ongoing behaviors back to the striatum, as hypothesized for the oculomotor system.

For example, the BG are thought to exert control over posture and locomotion via projections of the SNr to the pedunculopontine tegmental nucleus (PPN; Garcia-Rill, 1986; Takakusaki et al., 2003; Hikosaka, 2007) and the mesencephalic locomotor region (MLR; Takakusaki et al., 2004). Indeed, it has been noted that the PPN and perhaps other brainstem motor structures exhibit a remarkably parallel pattern of interactions with the BG as that seen with the superior colliculus (Winn et al., 2010). This includes the presence of feedback connections to the striatum passing through the thalamus (Erro et al., 1999; Mengual et al., 1999) that could potentially carry some forms of efference copy signals useful for learning.

Another area of motor function for which the BG have been hypothesized to be important is in learning action sequences (Berns and Sejnowski, 1998; Hikosaka et al., 1999). Efference copy signals transmitted to the striatum about ongoing motor actions could in principle be used to learn such sequences. More specifically, let us imagine a situation in which action B has a higher probability of yielding a reward when it follows action A. In this case, if an efference copy of action A is available as one of the context inputs to an MSN that controls action B, then the learning rule described above (Figure 3B) will lead to a strengthening of this action A → action B context input. In this way, simple pairs or short sequences of actions could be learned. The potential utility of efference copy signals as context inputs suggests that axons carrying EC signals may possibly serve both of these functions. It would be interesting to determine if some EC projections have axons that form a diffuse projection that synapses onto dendritic spines of MSNs and a focal projection that synapses onto dendritic shafts. Indeed, it has been suggested that individual thalamostriatal axons from the ventral anterior and the ventral lateral (VA/VL) thalamic nuclei may form both a focal and a diffuse projection (McFarland and Haber, 2001).
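
The use of an efference copy of a preceding action as a context input can be sketched with the same gated rule. This is a toy illustration: the function name `learn_transitions`, the trial tuples, the action labels, and the saturating update are hypothetical and introduced only to make the A → B example concrete.

```python
def learn_transitions(trials, actions=("A", "B"), lr=0.2):
    """Each trial is (previous_action, current_action, rewarded)."""
    # w[(prev, m)]: strength of the "previous action" context input onto the MSN for action m
    w = {(prev, m): 0.0 for prev in actions for m in actions}
    for prev, current, rewarded in trials:
        reward = 1.0 if rewarded else 0.0
        for m in actions:
            # The focal EC of the current action gates plasticity only in its own channel
            ec_gate = 1.0 if m == current else 0.0
            # The diffuse copy of the previous action plays the role of the context input
            w[(prev, m)] += lr * reward * ec_gate * (1.0 - w[(prev, m)])
    return w

trials = [
    ("A", "B", True),    # B after A is rewarded
    ("B", "B", False),   # B after B is not
    ("A", "A", False),   # A after A is not
    ("A", "B", True),
]
w = learn_transitions(trials)
print(w[("A", "B")], w[("B", "B")], w[("A", "A")])
```

Only the weight linking the copy of action A to the MSN controlling action B grows in this example, so the occurrence of A would come to bias the system toward B.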

Evolutionary Implications

The basal ganglia and its subcomponents are highly conserved throughout vertebrate evolution, as are its interactions with downstream motor structures (Ganz et al., 2012; Stephenson-Jones et al., 2012). The potential role of the thalamostriatal system in transmitting efference copy signals arising from brainstem motor circuitry suggests an argument related to the evolution of the BG. The BG could have evolved to evaluate and reinforce brainstem-generated behaviors. This function could initially have been carried out using context and efference copy signals originating entirely in the brainstem and transmitted through the thalamus, rather than involving corticostriatal systems (Figure 4C). Consistent with this view, excitatory inputs to the striatum in amphibians originate almost entirely from thalamic nuclei, with comparatively little cortical input (Wilczynski and Northcutt, 1983; Reiner et al., 1998). For example, dorsal thalamic sensory relay nuclei, which in mammals and birds project principally to primary sensory cortices, project almost exclusively to the striatum in amphibians (Butler, 1994). Indeed, it has been suggested that the relatively minor projections to the striatum from specific sensory thalamic nuclei in mammals may be a remnant of the much larger striatal projection from these nuclei in ancestral amphibians (Reiner et al., 1998). In light of the model presented here, I would predict that the thalamostriatal projection in amphibians would also include substantial efference copy signals from intermediate layers of the tectum and other brainstem motor circuits. It would be further expected that sensory inputs and efference copy inputs from the thalamus in the frog would share the projection patterns and ultrastructural features described above for the mammalian LP and caudal intralaminar areas, respectively.

It is interesting that the amphibian pallium (the likely evolutionary precursor of neocortex) does not contain neurons that project out of the forebrain (Nieuwenhuys et al., 1998; Roth et al., 2007), and thus does not have the equivalent of pyramidal tract neurons by which mammalian cortical outputs can directly influence brainstem or spinal motor functions. In these animals, “cortical” access to brainstem/spinal motor circuits may be mediated, at least in part, by the small but extant telencephalic projection to the striatopallidum (i.e., the BG; Nieuwenhuys et al., 1998; Roth et al., 2007). A prediction of the model I describe here is that these “corticostriatal” projections would act as context inputs, and would exhibit the anatomical, ultrastructural, and functional properties of IT fibers in mammalian striatum.

Of course, sensory context signals arising from thalamic and subcortical circuitry would tend to be more transient and much simpler than the kinds of responses produced by cortex. Mammalian cortical circuits can generate highly sophisticated representations of behaviorally important context information, including short-term memory (Funahashi et al., 1991; Rainer et al., 1998; Romo et al., 1999), complex receptive fields (Tanaka, 1996), sensitivity to high-order combinations of sensory stimuli (Fitzpatrick et al., 1993), and object invariance (Freiwald and Tsao, 2010; Li and DiCarlo, 2010). The massive expansion of the pallium in amniotes (reptiles, birds, and mammals) could have been driven by the advantage of having such a rich set of context signals with which the striatum could evaluate the animal's actions. This expansion of context representations in cortex would have necessitated a corresponding increase in the number of MSNs, as reflected in the parallel evolutionary expansion of striatal and cortical sizes (Reiner et al., 1998).

As cortex evolved to carry out motor and executive functions, cortical inputs to the striatum would also need to include efference copy signals of descending motor commands, as well as more elaborate context signals. Examples of the latter include representations of temporal order within sequential tasks, such as the signals transmitted from HVC to Area X in the songbird (Kozhevnikov and Fee, 2007; Fujimoto et al., 2011), and representations of the task or behavioral rules by circuits in prefrontal cortex (Miller and Cohen, 2001). These signals would allow even more sophisticated evaluations of actions, not just within the context of the external state of the world, but also in relation to the internal state of the animal, including emotional and social states and long-term goals.

Summary

The model I have presented here provides a very general framework by which the BG could learn to link specific contexts to actions, even if those actions are generated outside the striatum. It has long been hypothesized that the striatum receives two signals necessary for reinforcement learning: context signals that indicate the current state of the world and the animal, and an evaluation signal carrying information about rewards. Based on our developing understanding of the mechanisms of vocal learning in the songbird, I hypothesize here that the striatum receives an additional signal: an efference copy of motor command signals generated in either cortical or brainstem motor circuits. The role of this input is to allow the striatum to determine which actions, in which contexts, lead to a reward. I have proposed that this learning can be accomplished by a simple synaptic learning rule in which motor efference copy signals “enable” synaptic plasticity in context inputs to striatal MSNs. The proposed model generates a number of predictions about differences in the divergence of the projections of context and efference copy striatal inputs, namely that efference copy inputs should be topographically localized and that context inputs should be diffuse. The model also makes predictions about synaptic plasticity in these inputs that may be consistent with the known ultrastructure of cortical and thalamic inputs. Specifically, it is hypothesized that context inputs terminate on dendritic spines to provide precise localization of synaptic plasticity, while efference copy inputs terminate preferentially on the dendritic shafts of MSNs, providing a widely distributed cellular signal that controls plasticity at MSN spines, perhaps by transiently driving the MSN into an “up” state.

Conflict of Interest Statement

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

I thank Michael Farries, Jesse Goldberg, Anusha Narayan, and Michael Stetner for helpful discussions and comments on earlier versions of the manuscript. I also thank the three reviewers for their insightful comments and suggestions. This work was supported by funding from the NIH (R01MH067105).

Footnotes

^In this paper, the model of song learning incorporates the time in the song as the context, whereas the model of oculomotor learning incorporates a visual stimulus as the context.
^There are several different types of striatal MSNs, typically classified by their output targets and their expression of different dopamine receptor subtypes (Gerfen et al., 1990; Smith et al., 1998). So-called “direct pathway” MSNs project to motor centers, such as the SNr, and preferentially express D1-type dopamine receptors. “Indirect pathway” MSNs project to the external segment of the globus pallidus (GPe) and preferentially express D2-type dopamine receptors. The activity of indirect-pathway MSNs has a net inhibitory effect on motor output. Yet another group of MSNs reside in the patch/striosome part of the striatum (Graybiel and Ragsdale, 1978; Gerfen et al., 1987), and appear to project preferentially to midbrain dopaminergic centers rather than to motor-related pallidal outputs (Gerfen, 1985). The models shown in Figures 1–4 apply only to the direct-pathway D1 MSNs of the matrix (motor output) regions of the BG. I will return to a discussion of the indirect pathway in a later section.
^In this description of context inputs, I have made the simplification that the cortical neurons represent a visual response to the target cue. In the actual experiments carried out by Kawagoe et al. (1998), the monkeys performed a memory-guided saccade task, in which the cue was presented several seconds before the saccade was made. Thus, it may be more correct to think of the context inputs as coming from neurons that represent a short-term memory of the cue, rather than from neurons that have a direct visual response.

References

Abramson, B. P., and Chalupa, L. M. (1988). Multiple pathways from the superior colliculus to the extrageniculate visual thalamus of the cat. J. Comp. Neurol. 271, 397–418.

Alexander, G. E., and Crutcher, M. D. (1990). Functional architecture of basal ganglia circuits: neural substrates of parallel processing. Trends Neurosci. 13, 266–271.
