# A Hebbian learning rule gives rise to mirror neurons and links them to control theoretic inverse models

^{1}Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland^{2}Neuroscience Center Zurich (ZNZ), Zurich, Switzerland^{3}Department of Applied Physics, Stanford University, Stanford, USA

Mirror neurons are neurons whose responses to the observation of a motor act resemble responses measured during production of that act. Computationally, mirror neurons have been viewed as evidence for the existence of internal inverse models. Such models, rooted within control theory, map-desired sensory targets onto the motor commands required to generate those targets. To jointly explore both the formation of mirrored responses and their functional contribution to inverse models, we develop a correlation-based theory of interactions between a sensory and a motor area. We show that a simple eligibility-weighted Hebbian learning rule, operating within a sensorimotor loop during motor explorations and stabilized by heterosynaptic competition, naturally gives rise to mirror neurons as well as control theoretic inverse models encoded in the synaptic weights from sensory to motor neurons. Crucially, we find that the correlational structure or stereotypy of the neural code underlying motor explorations determines the nature of the learned inverse model: random motor codes lead to causal inverses that map sensory activity patterns to their motor causes; such inverses are maximally useful, by allowing the imitation of arbitrary sensory target sequences. By contrast, stereotyped motor codes lead to less useful predictive inverses that map sensory activity to future motor actions. Our theory generalizes previous work on inverse models by showing that such models can be learned in a simple Hebbian framework without the need for error signals or backpropagation, and it makes new conceptual connections between the causal nature of inverse models, the statistical structure of motor variability, and the time-lag between sensory and motor responses of mirror neurons. Applied to bird song learning, our theory can account for puzzling aspects of the song system, including necessity of sensorimotor gating and selectivity of auditory responses to bird's own song (BOS) stimuli.

## Introduction

Complex vertebrate motor behaviors are generated by dedicated cortical circuits. The organization of these circuits and the plasticity rules that lead to their development and that guarantee their maintenance are functionally related to neural activity in single units and across larger populations (Gallese et al., 1996; Rizzolatti et al., 1996; Rizzolatti and Craighero, 2004; Harvey et al., 2012). For example, neural activity often strongly co-varies with motor behavior, allowing for estimation of detailed limb movement parameters from mere single-neuron recordings (Georgopoulos et al., 1986; Schwartz et al., 1988) and facilitating neural prosthesis (Santhanam et al., 2006; Ethier et al., 2012). However, in other cases, the amount of firing variability in single neurons can be dramatically dissociated from behavioral variability. For example, in songbirds, two distinct premotor areas are responsible for the generation of different aspects of the same vocal behavior. On the one hand, the cortical area HVC is involved in generating stereotyped adult song; lesions of HVC lead to degradation of typical adult song toward more unstructured subsong typical of very young birds (Nottebohm et al., 1976; Aronov et al., 2008). On the other hand, its counterpart, the lateral magnocellular nucleus of the anterior nidopallium (LMAN) in very young birds is involved in subsong production and in adults it is involved in the production of very subtle song variability that is barely noticeable to the human ear (Aronov et al., 2008). Lesions of LMAN in juveniles abolish song learning (Bottjer et al., 1984), and lesions in adults reduce the already small variability of adult undirected songs (the songs not direct toward another bird), manifest for example by reduced fluctuations of sound pitch (Kao et al., 2005; Stepanek and Doupe, 2010).

These lesion studies ascribing differential roles of HVC and LMAN to song production, are paralleled by findings from electrophysiology. In HVC of singing birds, single principal neurons fire highly stereotyped spiking patterns associated with a given song syllable, with precision of individual action potentials in the sub millisecond range (Hahnloser et al., 2002; Kozhevnikov and Fee, 2007). By contrast, in LMAN of birds singing undirected songs, neurons fire very variable spike patterns, patterns that fluctuate on a trial-to-trial basis between loosely timed high-frequency bursts of action potentials and no spiking at all (Olveczky et al., 2005; Kao et al., 2008). Thus, stereotyped adult song is subserved by precise firing in HVC whereas subtle variability of adult song is subserved by large firing variability in LMAN, Figure 1. The diverse neural codes in LMAN and HVC are integrated in a dedicated nucleus that mediates both differential influences from these stereotypy and variability generators. Both HVC and LMAN project to the robust nucleus of the arcopallium (RA), which is the cortical output nucleus that directly innervates syringeal and respiratory motor neurons.

**Figure 1. Spiking activity in single neurons of singing zebra finches, illustrating (A) a variable premotor code in LMAN and (B) a stereotyped code in HVC. (A)** Spike raster plot of LMAN projection neuron aligned to 41 renditions of the stereotyped song motif. Three exemplary sound oscillograms of the motif are shown on top. The neuron produces single spikes and spike bursts at different times in each rendition of the motif. The motif-averaged firing rate is shown at the bottom. **(B)** Spike raster plot of HVC projection neuron aligned to 22 renditions of the stereotyped song motif (in a different bird), three exemplary motif oscillograms are shown on top. In each rendition of the motif the neuron produces a brief burst of spikes at precisely the same time.

Whether stereotyped or variable, internal motor patterns responsible for generating behavior cannot be fully understood without considering the sensory input reaching the motor system. Indeed, the very development of motor systems as well as the formation of motor plans are profoundly shaped by sensory inputs. For example, the development of the mirror neuron system depends on sensorimotor experience (Catmur, 2012) and, the successful development of birdsong depends on intact HVC and LMAN activity during sensory exposure (Basham et al., 1996; Roberts et al., 2012).

We have learned much about the integration of sensory inputs into motor systems from single neuron studies examining responses during motor production and during matched sensory states. Among the key findings are mirror neurons that fire similarly when an animal executes a motor act and when it sees or hears another animal perform that same act. For example, mirror neurons in F5 of monkey premotor cortex fire both when the monkey touches an object and sees another subject touch that object (Rizzolatti et al., 1996; Rizzolatti and Craighero, 2004). Mirror neurons also exist in HVC of songbirds; these neurons fire at a precise time in the song, both when the bird sings the song and when it hears a similar song produced by another bird (Prather et al., 2008).

Mirror neurons establish a link between the observation of an act in another and self-generation of that same act. Such a remarkable correspondence between sensory and motor roles in single neurons has led to numerous suggestions about the function of mirror neurons in communication, imitation learning, cultural learning, and language development (Rizzolatti and Craighero, 2004; Oztop et al., 2012). Most importantly, mirrored responses have been proposed to be causally related to streams of motor and sensory activity (Oztop et al., 2006, 2012). A recent proposal is to tie properties of the mirror neuron system to correlative learning rules (Cooper et al., 2012). Accordingly, sensory responses in mirror neurons could develop from the contingency of motor-related firing and its sensory consequences feeding back to motor areas. Here we develop this idea and propose a simple mathematical theory of mirror neuron formation from correlational learning rules. To examine the critical role of motor variability, we study, based on earlier work (Hahnloser and Ganguli, 2013), mirror neuron formation for both motor codes with strongly correlated firing patterns among neurons, as in HVC, as well as for motor codes with uncorrelated firing patterns among neurons, as in LMAN.

We are particularly interested in relating mirror neuron properties to their computational role in control theoretic inverse models. Mirror neurons have previously been recognized as direct evidence of inverse models, which are models that transform desired sensory states into motor commands that can achieve those states and may be used for action generation (Oztop et al., 2012). From the control-theoretic perspective, internal inverse models give rise to mirrored responses because of the precise correspondence between a desired sensory target, the motor commands for producing that target, and the resulting sensory feedback. We pursue this idea and elucidate the conditions under which inverse models can arise from correlational learning during sensory feedback-dependent motor explorations.

We assume inverse models form in a context without prior knowledge of structure of either the motor apparatus or the delayed sensory feedback. We design an eligibility-weighted correlational learning rule that allows for the formation of both inverse models and mirror neurons. In the rule we propose, synaptic strengthening depends on contiguous co-activation of pre-and postsynaptic neurons, whereas synaptic weakening depends on heterosynaptic competition between sensory afferents innervating the same motor neuron. We argue that from a synaptic perspective, this rule is considerably simpler and more plausible than previously proposed rules and computational approaches toward systems-level inverse models based on error backpropagation (Jordan and Rumelhart, 1992). Our rule is most closely related to direct inverse model approaches (Miller, 1987; Slotine, 1987), in which, however, the possibility of unknown feedback delays has not been adequately addressed. Most importantly, we find that whether the formed mirror neuron system and inverse model is suitable for action imitation depends on the correlational structure of the neural code associated with motor production. Whereas a variable (explorative) motor code leads to causal inverse models and is suitable for mirror-neuron dependent action imitation, a stereotyped (repetitive) motor code leads to predictive inverse models and is not suitable for action imitation. Thus, our work provides an interesting link between the correlational structure of motor behavior, its underlying neural code, and fine-grained temporal properties of mirror neuron responses and their suitability for flexible action imitation.

Furthermore, these conceptual connections suggest a set of natural experiments designed to probe for the existence, and characterize the causal nature of, inverse models by measuring the fine grained temporal properties of the sensory and motor responses of mirror neurons. As we discuss below, when applied to the bird song system, these experiments make a specific, testable prediction about the existence and temporal properties of mirror neurons in the variable motor circuit LMAN, as well as explain the origin of previously observed temporal properties of mirror neurons in the stereotyped motor circuit HVC.

## Results

### A Linear Framework

We develop our theory in a simple linear framework in which the sensory response *a*(*t*) in a sensory brain area at time *t* is a vector of firing rates that is linearly related to the motor cause *m*(*t* − τ) at an earlier time *t* − τ, where *m*(*t* − τ) is a vector of firing rates in a motor area such as HVC or LMAN. The *time delay of sensory feedback τ = τ _{m}* + τ

_{a}is the sum of the time τ

_{m}needed to translate motor activity into behavioral (vocal) output and the time τ

_{a}it takes for a vocalization to elicit a sensory response. We assume a linear

*motor-sensory mapping*modeled by the matrix

**Q**, allowing us to specify the form of delayed sensory feedback as

*a*(

*t*) =

**Q**

*m*(

*t*− τ), Figure 2.

**Figure 2. Delayed feedback and inverse model, illustrated by vocal production in birds.** In our model of delayed sensory feedback the auditory response *a*(*t*) in a sensory area at time *t* depends linearly on motor activity *m*(*t* − τ) in a motor brain area at an earlier time *t*-τ according to *a*(*t*) = **Q***m*(*t* − τ), where **Q** is the unknown motor-sensory mapping and τ the unknown delay of auditory feedback. An inverse **V** is a mapping from sensory neurons back onto motor neurons that inverts the action of **Q**: **V** = **Q**^{−1}.

Note that for simplicity we assume linearity of the motor-sensory mapping **Q**. However, the simple linearity assumption inherent in **Q** need not be inconsistent with the existence of non-linearities between motor neuron activity and behavioral output (for example, song) and also with non-linearities between behavioral output and sensory responses. While it is the case that each of these transformations is highly non-linear, the dimensionality of motor behavior patterns realizable by muscle activity, or recorded by early sensory responses, is much smaller than the dimensionality of sensory or motor activity patterns deep within the cortex, by virtue of the fact that cortical motor and sensory neurons largely outnumber the few muscles and sensory receptors involved in the composite motor to sensory feedback loop. So for example, within the bird song system, it is thus probable that the low dimensional, composite non-linear transformation from cortical motor patterns, to muscle activity in the syrinx, to song, to cochlear response, back to cortical sensory feedback, could be well-approximated by a direct high dimensional linear map from the cortical motor area back to the cortical sensory area. This is in exact analogy to the theory of support vector regression approaches from machine learning, in which low dimensional non-linear maps can be well-approximated by high dimensional linear maps (Smola and Schölkopf, 2004). Thus, for our purposes, all we assume is that there exists at least one high dimensional linear map from cortical motor patterns to cortical sensory feedback patterns that approximates the composite feedback pathway implemented through the non-linear processes of motor generation and perception.

Now, an inverse model in this context is a mapping **V** = **Q**^{−1} expressed in the synaptic weights **V** from sensory onto motor neurons. Such a mapping allows sensory neurons to postdict the possible motor cause *m*^{a} of a sensory target (vector) *a* (either driven externally, recalled from memory, or resulting from a planning strategy) according to *m*^{a} = **V***a*. Such a postdiction ability of inverse models can be used in feedforward motor control in which the appropriate stream of motor commands *m*^{a}(*t*) can be computed for a given desired sensory target sequence *a*(*t*) according to *m*^{a}(*t*) = **V***a*(*t*).

The goals of our theory are to outline a biologically plausible, local mechanism for learning of the synaptic mapping **V** and to characterize the associated emergence of mirror neurons in this process.

### Eligibility-Weighted Hebbian Learning

We designed a simple learning rule in which potentiation of sensory-to-motor synaptic connections **V** arises from correlated firing in pairs of sensory and motor neurons. Because sensory feedback is delayed, synapses must be able to detect correlated firing within some non-zero time window, which we achieve by introducing an eligibility trace *e*(*s*) that establishes a link between activity at time *t* in a motor neuron and activity in a sensory neuron at a later time *t* + *s* (see also Figures 3A,C). The eligibility trace modulates the change in synaptic strength associated with correlated pre- and postsynaptic firing—it is a biophysical process that resides on the postsynaptic side of **V** synapses and is triggered by activity (i.e., spikes) in the postsynaptic (motor) neuron. Intuitively, we imagine that the spiking of a motor neuron, elicited for example from an internal source of motor variation that generates exploratory motor behavior, triggers the eligibility trace that in turn makes all synapses from sensory neurons onto that motor neuron eligible for future modification. Thus, if the delayed sensory feedback arrives to the sensory area within the window of eligibility, sensory to motor synapses can potentially learn to postdict the motor cause by correlating the current sensory feedback with past motor activity that might have generated it. We further assume that the eligibility is monotonically decaying in time, implying that sensory inputs preferentially connect onto motor neurons that were recently and reliably activated rather than motor neurons that were activated a long time ago. Necessarily, the decay of the eligibility trace must be slow enough to be able to attribute significant eligibility to sensory inputs with motor-to-sensory delays τ, which we subsume in the condition *e*(τ)» 0.

**Figure 3. Cross-correlation functions for variable and stereotyped motor codes. (A)** In a variable motor code *m*(*t*) (shaded area). Activity bursts *m*_{1} (black) and *m*_{2} (blue) of width *t*_{0} in two example motor neurons occur at diverse time lags relative to each other across renditions of the song motif. Auditory tuning in the shown sensory neuron is such that it responds *a*_{1} to bursts *m*_{1} after a time lag τ. Repeated co-activation *m*_{1} → *a*_{1} and non-zero eligibility *e*(τ) (red bar) at time lag τ leads to increased synaptic weight V_{11} (red arrow) and to a causal inverse. Lack of correlation between *m*_{2} and *a*_{1}, as well as heterosynaptic competition, prevents V_{21} from similarly increasing (blue thin arrow). **(B)** The cross-correlation function *C*_{ij}(*t*) for variable codes is flat except the auto-correlation peak at zero time lag (motor activity is uncorrelated among neuron pairs). Note: based on square activity pulses in motor neurons in **(A)** the true cross-correlation shape is triangular (blue dotted line) which we approximate by a square pulse of width *t*_{0} ≃ 10 ms. The auto-correlation peak height is *C*_{0}. **(C)** In a stereotyped motor code *m*(*t*) (shaded area), bursts *m*_{1} (black) and *m*_{2} (blue) occur at a fixed time lag relative to each other across renditions of the song motif (traveling pulse of activity). Repeated co-activation *m*_{2} → *a*_{1} at higher eligibility (red bar) than the eligibility of *m*_{1} → *a*_{1} leads to strengthening of synapse V_{21} (red arrow) and to a predictive inverse. **(D)** The cross-correlation function *C*_{ij}(*t*) for stereotyped codes peaks also at non-zero time lags.

The full correlational learning rule describing changes in synaptic strength V_{ij} from auditory neuron *j* onto motor neuron *i* reads:

where ${\widehat{{m}}}_{{i}}{(}{t}{)}{=}{\displaystyle {{\sum}}_{{k}}{{\text{V}}}_{{i}{k}}{{a}}_{{k}}{(}{t}{)}}$ is the (silently) postdicted motor activity, corresponding to the summed auditory input to neuron *i* at time *t*. The subtractive term ${\widehat{{m}}}_{{i}}{{a}}_{{j}}$ provides an equal time heterosynaptic depression (Lynch et al., 1977; Chistiakova and Volgushev, 2009) among all sensory afferent synapses onto a motor neuron. The strength of this depression depends on the amount of presynaptic activity but does not depend directly on postsynaptic activation. The utility of such depression is not only to stabilize activity but also to force synaptic connections towards inverse mappings as we will see. Note that we assume **V** synapses are “silently” correlating pre- and postsynaptic activity in Equation 1, i.e., **V** synapses do not contribute either to postsynaptic depolarization or to postsynaptic hyperpolarization. In other words, while the inverse model is being learned, the motor activity *m*_{i}(*t*) is entirely driven by some other source than the afferent auditory input. Thus, from the perspective of extracellular physiology, it would appear that sensory feedback arriving to the motor area through the inverse model, is gated out of the motor area while that motor area is engaged in internally generated motor explorations.

In the following we examine the outcome of this learning rule in response to various forms of motor codes, with the goal of computing the synaptic weight matrix **V** at a steady-state of the learning rule, $\frac{{\text{d}}}{{\text{dt}}}{\text{}}{\langle}{V}{\rangle}{=}{0}$, where 〈〉 denotes averaging over time (e.g., over different renditions of the song). To simplify the calculations, we assume motor codes with narrow spike-train cross-correlation functions, i.e., the width *t*_{0} of spike-train cross-correlation functions is much smaller than the characteristic decay time of the eligibility trace. Although such functions have not been extensively studied due to the difficulty of simultaneously recording from several neurons during singing, narrow cross-correlation is plausible for RA and HVC neurons because pseudo simultaneous recordings can be constructed from serial recordings thanks to high firing stereotypy in these cells, yielding cross-correlation widths on the order of 10 ms (Leonardo and Fee, 2005). Note that in LMAN, because of high firing variability, similar estimation of cross-correlation width is virtually impossible.

We model motor codes with diverse inherent levels of randomness. We model stereotyped motor codes by assuming that spike-train cross correlations extend over large time lags, in agreement with a traveling pulse of activity (Hahnloser et al., 2002; Harvey et al., 2012). We model variable motor codes by assuming that cross-correlations vanish except in a peak at zero time lag (white noise assumption), Figure 3B.

### A Variable Neural Code Yields Causal Inverses

If motor activity is uncorrelated among different neuron pairs, the resulting sensory to motor map **V** = *e*(τ)*t*_{0}**Q**^{−1} equals the inverse of **Q** weighted by the eligibility at time lag τ (Equation A5, for the derivation see Appendix A3). Hence, **V** is a causal inverse that maps sensory representations onto their motor causes (in Figure 3A, auditory neurons map onto those motor neurons whose firing correlates most strongly with their own).

For example, during singing the motor cause *m*_{1}(*t* − τ) (say a neuron that generates a 4 kHz tone) will frequently be followed by auditory response *a*_{1}(*t*) (a 4 kHz detector neuron), leading to strengthening of synapse V_{11}. By contrast, due to high variability of the motor code, associations between *m*_{2} (*t* − τ) (say a neuron that generates a 3 kHz tone) and *a*_{1} (*t*) are much less frequent (because the bird randomizes the production of 3 and 4 kHz tones). Hence, synapse V_{21} from the 4 kHz detector onto the 3 kHz generator will lose to synapse V_{11} due to heterosynaptic competition (Figure 3A).

### A Stereotyped Neural Code Yields Predictive Inverses

If the motor code is stereotyped and different motor neuron pairs are correlated at even very large time lags (extending over the full range of the eligibility trace and possibly beyond), then **V**≃*e*(0)*t*_{0}**H**^{τ} **Q**^{−1} is approximately a concatenation of the inverse of **Q** and a shifter matrix **H**^{τ} that maps motor activity at one time onto motor activity at a time lag τ later, i.e., **V** is a predictive inverse of **Q** (Equation A10). Under a predictive inverse **V**, a sensory neuron maps onto those motor neurons that were most recently active (and reliably follow in activation other motor neurons that give rise to the sensory neuron's response).

For example, during singing, the motor cause *m*_{1}(*t* − τ) of a 4 kHz tone will frequently occur before the cause *m*_{2}(*t*) of a 3 kHz tone (because the bird produces stereotyped downsweep syllables). Hence, the 4 kHz auditory detector response *a*_{1}(*t*) will find much higher eligibility in motor neuron 2, leading to strengthening of V_{21} at the expense of V_{11}, i.e., the 4 kHz detector neuron connects onto the 3 kHz generator neuron (Figure 3C).

### Lack of Response to Perturbed Auditory Feedback and Selectivity for the BOS

During Hebbian learning of **V** in Equation 1 we required that synapses **V** are not able to drive spike responses in motor neurons during singing (**V** synapses learn silently). The main intuitive reason for the necessity of silent learning is that the learning goal of the inverse model synapses are to silently correlate the motor and sensory streams, without perturbing the motor stream that would result if sensory feedback were to pass through and drive spikes in the motor area. If the inverse model synapses allowed sensory feedback to significantly drive motor spikes, then the incoming sensory signals would serve to drive motor activity resulting in cyclic motor output with cycle time approximately equal to τ, i.e., birds would unavoidably produce repetitive motor output (stuttering).

Interestingly, there is much evidence for the gating out of sensory information in song motor nuclei. Principal motor neurons in LMAN and HVC do not respond to playback of white noise stimuli during singing (Leonardo, 2004; Kozhevnikov and Fee, 2007) and during states of high arousal (Cardin and Schmidt, 2003), though there are reports of distorted feedback responses in HVC interneurons in Bengalese finches (Sakata and Brainard, 2008). Lack of feedback sensitivity in principal motor neurons is usually ascribed to a form of gating caused by specific thalamic or neuromodulatory mechanisms (Dave et al., 1998; Schmidt and Konishi, 1998; Shea and Margoliash, 2003; Cardin and Schmidt, 2004; Coleman et al., 2007; Hahnloser et al., 2008), see also the Discussion.

By contrast, LMAN (Doupe and Konishi, 1991; Doupe, 1997; Solis and Doupe, 1999; Roy and Mooney, 2007) and HVC neurons (Katz and Gurney, 1981; Margoliash, 1983, 1986; Williams and Nottebohm, 1985) respond to auditory stimulation while birds are anesthetized or asleep, which we model as gating on of **V** synapses, i.e., we assume that auditory responses in motor neurons are driven via the learned inverse models.

The puzzling observation of the gating out of sensory inputs to motor areas during motor exploration, is naturally accounted for in our theory by necessity of correlating the current presynaptic sensory stream with past postsynaptic motor streams, to learn an unbiased inverse model (of unperturbed motor stream).

Also, interestingly, in both HVC and LMAN sensory responses are strongest for bird's own song (BOS) stimuli compared to other stimuli including the tutor song or the BOS played back in reverse time (McCasland and Konishi, 1981; Margoliash, 1986; Lewicki, 1996; Solis and Doupe, 1999). Such selectivity follows naturally from our model assumptions: for both stereotyped and variable motor codes, the mappings, whether causal or predictive, can only invert sensory responses that lie in the image of **Q** and cannot invert the full space of responses orthogonal to the image of **Q**. Such a restriction arises because only sensations that could arise through combinations of previously experienced sensory feedback during singing can actually be inverted into appropriate motor commands. In other words, the inverse model synapses map prior sensory feedback generated by the bird's own previous song into appropriate motor commands, but necessarily fails to map sensory activity patterns that are very different from the BOS into coherent motor patterns. Thus, assuming HVC and LMAN can be thought of as downstream of the output of an inverse model, our Hebbian learning rule generating inverse models can naturally account for the preference of sensory responses in HVC and LMAN for BOS; sounds very different from BOS are not appropriately inverted, and therefore presumably do not lead to coherent activation of motor patterns via sensory inputs propagating through the inverse model synapses.

### Inverse Models and Sensorimotor Mirroring

The Hebbian learning rule in Section Eligibility-Weighted Hebbian Learning determines the wiring of sensory afferents into motor areas based on sensorimotor experience. How could one experimentally test for the existence of such wiring without painstaking, detailed inspection of anatomical connections and characterization of the sensorimotor mapping **Q**? Here we outline the design of experiments to probe for the existence of either causal or predictive inverse models. We propose to record from single neurons both in sensory and motor states and to compare motor activity and sensory-evoked responses using cross-correlation functions: as we will show, the time lag of peak cross correlation provides evidence for either predictive or causal inverses.

In such mirroring experiments that we propose, a single neuron is first recorded during singing and then during playback of the just recorded songs while the bird is asleep in the dark (during which the auditory gate is open and motor neurons become responsive to auditory stimuli, presumably through an inverse model from an upstream sensory area). In our model, sensory responses ${{m}}_{{i}}^{{a}}{(}{t}{)}{=}{\widehat{{m}}}_{{i}}{(}{t}{)}{=}{\displaystyle {{\sum}}_{{k}}{{\text{V}}}_{{i}{j}}{{a}}_{{j}}{(}{t}{)}}$ during playback are driven via synaptic weights **V** (assumed to be at a steady-state of Equation 1, $\frac{{\text{d}}}{{\text{dt}}}{\text{}}{\langle}{V}{\rangle}{=}{0}$). Computing the cross-correlation functions Corr(*s*) of the sensory response *m*^{a}_{i}(*t*) with motor activity *m*_{i}(*t*) (as a function of time lag *s*) yields that (see Figure 4):

- For variable neural codes we have that
*m*^{a}_{i}(*t*) =*e*(τ)*t*_{0}*m*_{i}(*t*− τ) and the cross-correlation function Corr(*s*) is 0 except at time lags ${s}{\in}{\left[}{\tau}{-}\frac{{{t}}_{{0}}}{{2}}{,}{\tau}{+}\frac{{{t}}_{{0}}}{{2}}{\right]}$ thus the peak correlation is near the sensorimotor time lag τ. In other words, for causal inverses the cross-correlation function between motor activity and sensory-evoked response peaks near time lag τ. That is, causal inverses are associated with large mirroring offsets equal to the loop delay τ. The reason is that the auditory response*m*^{a}_{1}(*t*) ∝*m*_{1}(*t*− τ) to song playback lags the song generating motor activity by a time lag τ. Intuitively, in a causal inverse model, if a motor neuron's activity is time locked to a particular song feature, it must fire before that feature in the motor production state, but after that feature in the sensory response state. This yields a temporal lag, or mirroring offset between the two (song-aligned) spike trains of a neuron recorded during motor production and during sensory exposure. - For stereotyped neural codes we have that
*m*^{a}_{i}(*t*) ≃*e*(0)*t*_{0}*m*(_{i}*t*) and the cross-correlation function Corr(*s*) as a function of the time lag*s*is proportional to the eligibility trace*e*(*s*). Thus, assuming a monotonic decay of eligibility, Corr(*s*) peaks at time lag*s*= 0. In other words, for predictive inverses the cross-correlation function peaks near zero time lag and predictive inverses are associated with close to zero mirroring offsets. The reason is that the auditory response*m*^{a}_{1}(*t*) ∝*m*_{1}(*t*) to song playback shows no lag with respect to the song generating motor activity. Intuitively, in a predictive inverse model, sensory feedback from a past motor action maps to concurrent motor activity in a stereotyped motor stream, which necessarily occurs after the motor activity that caused the sensory feedback. Thus, the past song elicits the firing of a motor neuron that generates future song. This implies that for any neuron there is no lag between its sensory and motor responses; the motor and sensory-evoked spikes of a motor neuron downstream of a predictive inverse model occur at the same time relative to song.

For derivations and model assumptions see Appendices A2–A4. In particular, here we assumed no synaptic delay between auditory and motor neuron, though this assumption can be relaxed. In summary, for both stereotyped and variable motor codes, sensory responses mirror motor activity. The amount of randomness in the motor code dictates the time lag of peak cross-correlation between motor activity and sensory-evoked responses, which we refer to as the *mirroring offset*. The mirroring offset thus serves as an important experimental observable that provides a window into fundamental differences in the types of inverse models that are computed by Hebbian learning, Figure 4.

**Figure 4. The mirroring offset is determined in recordings of neural activity across (A) singing and (B) auditory stimulation by song playback; the offset (red) is large for causal inverses (middle column) and close to zero for predictive inverses (right column). (A)** During singing, a motor-neuron burst *m*_{1} drives a song feature (green notes) after a time delay τ_{m}. **(B)** Playback of that song feature leads to auditory response *a*_{1} after a time delay τ_{a}. And, auditory response *a*_{1} leads to motor neuron response *m*^{a}_{1} in case of a causal inverse (middle column) and to motor neuron response *m*^{a}_{2} in case of a predictive inverse (right column) after an additional time lag τ_{s} (spike propagation time from auditory to motor area) that is assumed to be 0 for simplicity. Thus, after alignment of motor activity and sensory response with song (green arrows), in the case of a causal inverse, the mirroring offset Δ*t* defined as the time lag between motor activity and playback response (red bar with extension set by red dashed lines) is equal to the sensory feedback delay τ = τ_{m} + τ_{a}, whereas in the case of a predictive inverse the mirroring offset Δ*t* is close to 0. The reason for the 0 offset associated with a predictive inverse is that the auditory burst *a*_{1} driving the playback response *m*^{a}_{2} is selective to the sound feature that during singing was generated by the much earlier burst *m*_{1} in a different neuron (black burst in **(A)**, right panel), but not the feature generated by *m*_{2} (blue burst in **(A)**, right panel).

Note that variable motor codes are associated with weaker mirroring than stereotyped codes, i.e., the cross-correlation functions for variable codes exhibit lower peak amplitudes than cross-correlation functions associated with stereotyped codes: In our model, the ratio of peak cross correlation is given by the eligibility at time lag τ divided by the eligibility at time lag zero (Equation A12 derived in the Appendices A3, A4). Thus, the steeper the eligibility trace, the weaker the mirrored response in case of variable motor codes. By contrast, the shape of the eligibility trace is expected to have almost no influence in case of stereotyped codes.

Note that the auditory response ${{m}}_{{i}}^{{a}}{(}{t}{)}{=}{\displaystyle {{\sum}}_{{k}}{{\text{V}}}_{{i}{k}}}{{a}}_{{k}}{(}{t}{)}$ in a motor neuron to song playback is mathematically identical to the (silently) postdicted motor activity ${\widehat{{m}}}_{{i}}{(}{t}{)}{=}{\displaystyle {{\sum}}_{{k}}{{\text{V}}}_{{i}{k}}}{{a}}_{{k}}{(}{t}{)}$ defined after Equation 1, and used in learning the inverse model. Nevertheless, we use different symbols for these quantities to disentangle their meaning, i.e., the former being a superthreshold sensory response elicited in a quiet non-singing state of the bird, the latter being a subthreshold subtractive term that stabilizes synaptic learning during singing. The biophysical underpinnings of these two terms might largely be identical, with the silent nature of the posticted activity arising from some form of response gating (see also the Discussion).

### Gradient Descent

We note that the learning rule in Equation 1 corresponds to gradient descent on the following error function:

For a derivation, see Appendix A1. Thus, synaptic weights **V** converge such as to yield optimal postdiction ${\widehat{{m}}}_{{i}}{(}{t}{)}{=}{\displaystyle {{\sum}}_{{k}}{{\text{V}}}_{{i}{k}}}{{a}}_{{k}}{(}{t}{)}$ of motor activity from sensory feedback. The origin of our eligibility-weighted Hebbian learning rule with heterosynaptic competition, from gradient descent of an energy function, confers a degree of robustness to the learning, as well as suggests generalizations to situations in which the synaptic transformation from sensory to motor areas is non-linear.

### Probabilistic Models

More realistic neuron models are non-linear and contain spikes that are potentially probabilistic and certainly binary events. Also, more realistically, we may want to explicitly model intrinsic noise in motor and sensory-related responses rather than deal with motor variability only through their effects on cross correlations. As a first step to dealing with such realism, we have derived two probabilistic neuron models in which inverse models and mirroring can be studied in similar manners as in the linear model, outlined in the following.

In one of these models we calculate the influence of probabilistic (binary) responses on the strength of mirroring. We consider a random motor area that at any time can only be in one of two possible states *M* = 1 and *M* = 0 with prior probability ${p}{(}{M}{=}{1}{)}{=}\frac{{1}}{{2}}$. Assume analogously that the sensory area is such that a particular sensory feature is either detected (*S* = 1) or not detected (*S* = 0). We then model the relationship between motor activity and sensory consequence in terms of conditional dependencies between these two random variables. We assess the strength of mirroring in this model in terms of the cross-correlation coefficient between the two random variables (as derived in Appendix A5) and find the following result:

(A) For perfect auditory tuning (sensory neurons exhibit no noise to repeated sensory stimulation) the cross-correlation coefficient is given by the difference between the probabilities that the detected sensory feature is driven by the motor area vs. not driven by it. In other words, mirrored responses are proportional to the strength with which a single motor neuron contributes toward generation of the detected feature.

(B) In case of equal intrinsic noise in motor and sensory systems we find that the correlation coefficient is positive and proportional to the squared difference between the probabilities that the detected sensory feature is driven by the motor area vs. not driven by it.

Thus, the simple probabilistic model shows that the strength of mirroring may also be strongly reduced by the amount of intrinsic noise present in sensory and motor systems.

## Discussion

We have presented a simple model for the development of mirror neuron systems that is mathematically tractable, allowing us to relate mirror neuron properties such as the correlative strengths and the time lag of peak mirrored responses to the stereotypy (the correlation structure) of motor-related firing. Mirroring properties depend on the variability of the neural motor code which may be dissociated from apparent variability of the motor behavior as is the case in LMAN neurons that fire highly variable spike patterns despite high song stereotypy in adults. Our conclusions are valid for arbitrary sensory systems, provided they are able to signal sensory feedback from motor actions with sufficient sensitivity matched to the behavioral richness generated by the motor system (and of course provided that sensory afferents are subject to correlative Hebbian learning). In our derivation we have assumed that cross-correlation functions among motor neuron pairs are narrow, which was a simplifying assumption that allowed us to derive simple analytical forms of the sensory-to-motor mapping **V** and of mirroring properties. Approximate inverses should also result for motor codes with more complex time dependence, because by construction, the learning rule we considered corresponds to a gradient-descent rule that achieves minimal inversion error.

Although inverse models are attractive as models for vocal learning (Guenther et al., 2006; Hahnloser and Ganguli, 2013), they have previously been judged to be inappropriate for vocal learning in songbirds because of mainly two reasons: (1) young birds require many song repetitions with auditory feedback (Doya and Sejnowski, 2000), and (2) the learning schemes proposed either used a biologically implausible algorithm (Jordan and Rumelhart, 1992) or assumed the preexistence of an approximate inverse model (Kawato, 1990). Here we suggested a resolution to both of these issues and shown that in contrary to previous beliefs, inverse models constitute a potentially plausible framework for vocal learning in birds, too: the many song explorations used by young birds could be required to actually learn the high dimensional inverse model; and, the correlational learning we proposed is quite plausible and simple (but non-trivial nevertheless). This suggests potentially opening up the hypothesis space for learning rules operating within cortico-basal ganglia circuits, in both mammalian and bird song systems, to include models spanning the range from pure reinforcement learning (RL) to pure inverse model learning. Of particular interest would be intermediate learning rules that synergistically incorporate both dopamine-dependent plasticity thought to underlie RL as well, as Hebbian based plasticity shown here to mediate inverse model learning, in order to implement sophisticated model-based RL strategies. For example, a simple proposal would be that dopamine delivered to striatal synapses from the ventral tegmental area (VTA) might not be released purely nonspecifically, but instead might be delivered by an inverse model that can partially map errors in sensory coordinates to errors in motor coordinates, thereby guiding learning in ways more sophisticated than pure RL (O'Reilly and Frank, 2006).

The key to learning causal inverse models is motor variability. In motor areas such as HVC that fire stereotyped patterns, auditory afferents cannot disentangle cause-and-effect, leading to preferential formation of predictive inverses rather than causal ones. Predictive inverses have limited usefulness for action imitation from action observation, because under a predictive inverse, observation of a particular motor gesture will lead to imitation of the subsequent gesture in the imitator's motor repertoire, which may not be part of the actions to be imitated. For example, if a bird repeatedly sings *ABCD* during formation of the inverse and wants to later imitate repetitions of *ABDB*, then its predictive inverse will constrain it to produce repetitions of *BCDA* because perception of *A* maps to production of *B*, perception of *B* maps to production of *C*, etc.

Small temporal delays between motor activity and activity evoked by playback of BOS or BOS-resembling sounds have been reported previously. Prather et al. (2008) showed there is a small mirroring offsets of just a few milliseconds in HVC_{X} neurons of awake swamp sparrows and report similar (not quantified) results in Bengalese finches. Furthermore, Dave and Margoliash (2000) observed a small time lag of auditory-evoked activity also in RA neurons of sleeping zebra finches. Both these experimental findings reflect a predictive inverse. While predictive inverses have limited usefulness for action imitation they might provide stability in sequential vocalization. Indeed Sakata and Brainard report that perturbation of auditory feedback can change song syntax in Bengalese finches (Sakata and Brainard, 2006, 2008; Hanuschkin et al., 2011). By contrast, a causal inverse revealing itself by a large mirroring offset is maximally useful for song imitation. Indeed, preliminary results indicated a large non-zero mirroring offsets in LMAN (Giret et al., 2012).

An important element of our theory is the eligibility trace. To endow Hebbian learning with such a trace is necessary in realistic situations in which effects (sensory feedback) follow their cause (motor command) with some non-zero time lag arising from signal propagation delays, from twitch times of muscles, and from sensory and synaptic receptor latencies. In humans such a lag could span up to several hundreds of milliseconds, whereas in birds it may be as short as several tens of milliseconds. Eligibility traces also appear in RL theories (Seung, 2003; Fiete et al., 2007) and seem to be a general prerequisite for learning in the context of delayed feedback or delayed reward. We can imagine that neurons and synapses may hold decaying eligibility traces in terms of dedicated molecules such as calcium. Action potential generation is associated with rapid calcium entry that decays over the time course from several hundreds of milliseconds to a few seconds (McGeown et al., 1996; Wallace et al., 2008). The monotonic decay of intracellular calcium is well-suited to modeling a monotonically decaying eligibility trace. However, a monotonic decay of eligibility harbors both advantages and disadvantages. The disadvantage, as discussed, is the problem associated with stereotyped motor generators that can only hold predictive inverse models; to make inverse models causal, motor variability is required. Another way to guarantee causal inverse models—even under stereotyped motor explorations—would be to consider eligibility traces that do not monotonically decay but that peak at precisely the time delay inherent in closed sensorimotor feedback loops. The main caveat of such eligibility traces is that it may be questionable whether different muscles recruited for the same behavior must necessarily be associated with the same sensorimotor delay—and it is presently unclear how such variable delays could be matched to variable eligibility traces across synapses in a way that would ensure the learning of a causal inverse model. Moreover, phenomena such as speech co-articulation make it unlikely that there exists a constant sensorimotor delay across a large range of premotor neurons. The advantage, on the other hand, of a decaying eligibility trace is that sensorimotor contingencies and inverses can be learned regardless of sensorimotor latencies, providing robustness of sensorimotor learning.

Convergence of the sensory to motor synaptic weights toward inverses depends on details of the heterosynaptic competitive term. Heterosynaptic competitive terms have a certain appeal because of the useful normalization they provide (Fiete et al., 2010). In the context of this work, such terms imply locally available information at a single synapse about sensory inputs to other synapses. Though this information need not be provided instantaneously, we can only speculate about possible mechanisms for sharing such information among different synapses onto the same postsynaptic neuron. One possibility is that some form of intracellular signaling conveys this information. Another possibility to be explored is whether there exists an entire class of such competitive terms with a similar effect. For example, provided that motor and sensory codes are sufficiently sparse, it is conceivable that very simple subtractive terms might suffice for inverse formation. Whether other (even simpler) competitive terms result in approximate inverses needs to be further explored. We would like to point out preliminary evidence that inverses can be learned with Hebbian rules that include no heterosynaptic competitive terms (Senn and Pawelzik, pers. communication).

Our Hebbian learning theory has been analyzed so far in linear circuits, but we have indicated ways to overcome linearity by pinpointing extensions of our work to include nonlinear mappings and probabilistic neuron models. Further work will be required to test whether our correlative learning approach is suitable also for inverse model learning employing detailed biophysical models of the avian syrinx.

## Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

## Acknowledgments

We acknowledge support by the European Research Council (ERC-Advanced Grant 268911) and the Swiss National Science Foundation (Grant 31003A_127024), and support from the Swartz, Sloan, and Burroughs-Wellcome foundations, and Defense Advanced Research Projects Agency (DARPA). R. H. R. Hahnloser thanks Walter Senn for helpful discussions on inverse models and S. Ganguli thanks Michael Brainard and Kris Bouchard for useful discussions on birdsong learning, and the transition from causal to predictive inverse models.

## References

Aronov, D., Andalman, A. S., and Fee, M. S. (2008). A specialized forebrain circuit for vocal babbling in the juvenile songbird. *Science* 320, 630–634. doi: 10.1126/science.1155140

Basham, M. E., Nordeen, E. J., and Nordeen, K. W. (1996). Blockade of NMDA receptors in the anterior forebrain impairs sensory acquisition in the zebra finch (*Poephila guttata*). *Neurobiol. Learn. Mem*. 66, 295–304. doi: 10.1006/nlme.1996.0071

Bottjer, S., Miesner, E., and Arnold, A. (1984). Forebrain lesions disrupt development but not maintenance of song in passerine birds. *Science* 224, 901–903. doi: 10.1126/science.6719123

Cardin, J. A., and Schmidt, M. F. (2003). Song system auditory responses are stable and highly tuned during sedation, rapidly modulated and unselective during wakefulness, and suppressed by arousal. *J. Neurophysiol*. 90, 2884–2899. doi: 10.1152/jn.00391.2003

Cardin, J. A., and Schmidt, M. F. (2004). Auditory responses in multiple sensorimotor song system nuclei are co-modulated by behavioral state. *J. Neurophysiol*. 91, 2148–2163. doi: 10.1152/jn.00918.2003

Catmur, C. (2012). Sensorimotor learning and the ontogeny of the mirror neuron system. *Neurosci. Lett*. 540, 21–27. doi: 10.1016/j.neulet.2012.10.001

Chistiakova, M., and Volgushev, M. (2009). Heterosynaptic plasticity in the neocortex. *Exp. Brain Res*. 199, 377–390. doi: 10.1007/s00221-009-1859-5

Coleman, M. J., Roy, A., Wild, J. M., and Mooney, R. (2007). Thalamic gating of auditory responses in telencephalic song control nuclei. *J. Neurosci*. 27, 10024–10036. doi: 10.1523/jneurosci.2215-07.2007

Cooper, R. P., Cook, R., Dickinson, A., and Heyes, C. M. (2012). Associative (not Hebbian) learning and the mirror neuron system. *Neurosci. Lett*. 540, 28–36. doi: 10.1016/j.neulet.2012.10.002

Dave, A. S., and Margoliash, D. (2000). Song replay during sleep and computational rules for sensorimotor vocal learning. *Science* 290, 812–816. doi: 10.1126/science.290.5492.812

Dave, A. S., Yu, A. C., and Margoliash, D. (1998). Behavioral state modulation of auditory activity in a vocal motor system. *Science* 282, 2250–2254.

Doupe, A. J. (1997). Song- and order-selective neurons in the songbird anterior forebrain and their emergence during vocal development. *J. Neurosci*. 17, 1147–1167.

Doupe, A. J., and Konishi, M. (1991). Song-selective auditory circuits in the vocal control system of the zebra finch. *Proc. Natl. Acad. Sci. U.S.A*. 88, 11339–11343.

Doya, K., and Sejnowski, T. (2000). “A computational model of avian song learning,” in *The New Cognitive Neurosciences*, 2nd Edn, ed M. S. Gazzaniga (Cambridge, London: MIT Press), 469–484.

Ethier, C., Oby, E. R., Bauman, M. J., and Miller, L. E. (2012). Restoration of grasp following paralysis through brain-controlled stimulation of muscles. *Nature* 485, 368–371. doi: 10.1038/nature10987

Fiete, I. R., Fee, M. S., and Seung, H. S. (2007). Model of birdsong learning based on gradient estimation by dynamic perturbation of neural conductances. *J. Neurophysiol*. 98, 2038–2057. doi: 10.1152/jn.01311.2006

Fiete, I. R., Senn, W., Wang, C. Z. H., and Hahnloser, R. H. R. (2010). Spike-time-dependent plasticity and heterosynaptic competition organize networks to produce long scale-free sequences of neural activity. *Neuron* 65, 563–576. doi: 10.1016/j.neuron.2010.02.003

Gallese, V., Fadiga, L., Fogassi, L., and Rizzolatti, G. (1996). Action recognition in the premotor cortex. *Brain* 119, 593–609. doi: 10.1093/brain/119.2.593

Georgopoulos, A., Schwartz, A., and Kettner, R. (1986). Neuronal population coding of movement direction. *Science* 233, 1416–1419. doi: 10.1126/science.3749885

Giret, N., Kornfeld, J., Ganguli, S., and Hahnloser, R. (2012). “Delayed mirrored activity in the Zebra finch song system,” in *Proceeding of 609.15/FFF53 Society for Neuroscience Meeting*, (New Orleans).

Guenther, F. H., Ghosh, S. S., and Tourville, J. A. (2006). Neural modeling and imaging of the cortical interactions underlying syllable production. *Brain Lang*. 96, 280–301. doi: 10.1016/j.bandl.2005.06.001

Hahnloser, R., and Ganguli, S. (2013). “Vocal learning with inverse models,” in *Principles of Neural Coding*, eds S. Panzeri and P. Quiroga (Boca Raton, FL: CRC Taylor and Francis), 547–564.

Hahnloser, R. H. R., Kozhevnikov, A. A., and Fee, M. S. (2002). An ultra-sparse code underlies the generation of neural sequences in a songbird. *Nature* 419, 65–70. doi: 10.1038/nature00974

Hahnloser, R. H. R., Wang, C. Z.-H., Nager, A., and Naie, K. (2008). Spikes and bursts in two types of thalamic projection neurons differentially shape sleep patterns and auditory responses in a songbird. *J. Neurosci*. 28, 5040–5052. doi: 10.1523/jneurosci.5059-07.2008.

Hanuschkin, A., Diesmann, M., and Morrison, A. (2011). A reafferent and feed-forward model of song syntax generation in the Bengalese finch. *J. Comput. Neurosci*. 31, 509–532. doi: 10.1007/s10827-011-0318-z

Harvey, C. D., Coen, P., and Tank, D. W. (2012). Choice-specific sequences in parietal cortex during a virtual-navigation decision task. *Nature* 484, 62–68. doi: 10.1038/nature10918

Jordan, M. I., and Rumelhart, D. E. (1992). Forward models: supervised learning with a distal teacher. *Cogn. Sci*. 16, 307–354. doi: 10.1207/s15516709cog1603_1

Kao, M. H., Doupe, A. J., and Brainard, M. S. (2005). Contributions of an avian basal ganglia-forebrain circuit to real-time modulation of song. *Nature* 433, 638–643. doi: 10.1038/nature03127

Kao, M. H., Wright, B. D., and Doupe, A. J. (2008). Neurons in a forebrain nucleus required for vocal plasticity rapidly switch between precise firing and variable bursting depending on social context. *J. Neurosci*. 28, 13232–13247. doi: 10.1523/jneurosci.2250-08.2008

Katz, L. C., and Gurney, M. E. (1981). Auditory responses in the zebra finch's motor system for song. *Brain Res*. 221, 192–197. doi: 10.1016/0006-899391073-8

Kawato, M. (1990). “Feedback-error learning neural network for supervised motor learning,” in *Advanced Neural Computer*, ed R. Eckmiller (North-Holland: Elsevier), 365–372.

Kozhevnikov, A. A., and Fee, M. S. (2007). Singing-related activity of identified HVC neurons in the zebra finch. *J. Neurophysiol*. 97, 4271–4283. doi: 10.1152/jn.00952.2006

Leonardo, A. (2004). Experimental test of the birdsong error-correction model. *Proc. Natl. Acad. Sci. U.S.A*. 101, 16935–16940. doi: 10.1073/pnas.0407870101

Leonardo, A., and Fee, M. S. (2005). Ensemble coding of vocal control in birdsong. *J. Neurosci*. 25, 652–661. doi: 10.1523/jneurosci.3036-04.2005

Lewicki, M. S. (1996). Intracellular characterization of song-specific neurons in the zebra finch auditory forebrain. *J. Neurosci*. 16, 5855–5863.

Lynch, G. S., Dunwiddie, T., and Gribkoff, V. (1977). Heterosynaptic depression: a postsynaptic correlate of long-term potentiation. *Nature* 266, 737–739. doi: 10.1038/266737a0

Margoliash, D. (1983). Acoustic parameters underlying the responses of song-specific neurons in the white-crowned sparrow. *J. Neurosci*. 3, 1039–1057.

Margoliash, D. (1986). Preference for autogenous song by auditory neurons in a song system nucleus of the white-crowned sparrow. *J. Neurosci*. 6, 1643–1661.

McCasland, J. S., and Konishi, M. (1981). Interaction between auditory and motor activities in an avian song control nucleus. *Proc. Natl. Acad. Sci. U.S.A*. 78, 7815–7819. doi: 10.1073/pnas.78.12.7815

McGeown, J. G., Drummond, R. M., McCarron, J. G., and Fay, F. S. (1996). The temporal profile of calcium transients in voltage clamped gastric myocytes from Bufo marinus. *J. Physiol*. 497(Pt 2), 321–336.

Miller, W. I. (1987). Sensor-based control of robotic manipulators using a general learning algorithm. *IEEE J. Rob. Autom*. 3, 157–165. doi: 10.1109/jra.1987.1087081

Nottebohm, F., Stokes, T. M., and Leonard, C. M. (1976). Central control of song in the canary, *Serinus canarius*. *J. Comp. Neurol*. 165, 457–486. doi: 10.1002/cne.901650405

Olveczky, B. P., Andalman, A. S., and Fee, M. S. (2005). Vocal experimentation in the juvenile songbird requires a basal ganglia circuit. *PLoS Biol*. 3:e153. doi: 10.1371/journal.pbio.0030153

O'Reilly, R. C., and Frank, M. J. (2006). Making working memory work: a computational model of learning in the prefrontal cortex and basal ganglia. *Neural Comput*. 18, 283–328. doi: 10.1162/089976606775093909

Oztop, E., Kawato, M., and Arbib, M. (2006). Mirror neurons and imitation: a computationally guided review. *Neural Netw*. 19, 254–271. doi: 10.1016/j.neunet.2006.02.002

Oztop, E., Kawato, M., and Arbib, M. A. (2012). Mirror neurons: functions, mechanisms and models. *Neurosci. Lett*. 540, 43–55. doi: 10.1016/j.neulet.2012.10.005

Prather, J. F., Peters, S., Nowicki, S., and Mooney, R. (2008). Precise auditory-vocal mirroring in neurons for learned vocal communication. *Nature* 451, 305–310. doi: 10.1038/nature06492

Rizzolatti, G., and Craighero, L. (2004). The mirror-neuron system. *Annu. Rev. Neurosci*. 27, 169–192. doi: 10.1146/annurev.neuro.27.070203.144230

Rizzolatti, G., Fadiga, L., Gallese, V., and Fogassi, L. (1996). Premotor cortex and the recognition of motor actions. *Cogn. Brain Res*. 3, 131–141.

Roberts, T. F., Gobes, S. M. H., Murugan, M., Ölveczky, B. P., and Mooney, R. (2012). Motor circuits are required to encode a sensory model for imitative learning. *Nat. Neurosci*. 15, 1454–1459. doi: 10.1038/nn.3206

Roy, A., and Mooney, R. (2007). Auditory plasticity in a basal ganglia-forebrain pathway during decrystallization of adult birdsong. *J. Neurosci*. 27, 6374–6387. doi: 10.1523/jneurosci.0894-07.2007

Sakata, J. T., and Brainard, M. S. (2006). Real-time contributions of auditory feedback to avian vocal motor control. *J. Neurosci*. 26, 9619–9628. doi: 10.1523/jneurosci.2027-06.2006

Sakata, J. T., and Brainard, M. S. (2008). Online contributions of auditory feedback to neural activity in avian song control circuitry. *J. Neurosci*. 28, 11378–11390. doi: 10.1523/jneurosci.3254-08.2008

Santhanam, G., Ryu, S. I., Yu, B. M., Afshar, A., and Shenoy, K. V. (2006). A high-performance brain-computer interface. *Nature* 442, 195–198. doi: 10.1038/nature04968

Schmidt, M. F., and Konishi, M. (1998). Gating of auditory responses in the vocal control system of awake songbirds. *Nat. Neurosci*. 1, 513–518. doi: 10.1038/2232

Schwartz, A. B., Kettner, R. E., and Georgopoulos, A. P. (1988). Primate motor cortex and free arm movements to visual targets in three-dimensional space. I. Relations between single cell discharge and direction of movement. *J. Neurosci*. 8, 2913–2927.

Seung, H. S. (2003). Learning in spiking neural networks by reinforcement of stochastic synaptic transmission. *Neuron* 40, 1063–1073.

Shea, S. D., and Margoliash, D. (2003). Basal forebrain cholinergic modulation of auditory activity in the zebra finch song system. *Neuron* 40, 1213–1226.

Slotine, J.-J. E. (1987). On the adaptive control of robot manipulators. *Int. J. Rob. Res*. 6, 49–59. doi: 10.1177/027836498700600303

Smola, A., and Schölkopf, B. (2004). A tutorial on support vector regression. *Stat. Comput*. 14, 199–222. doi: 10.1023/B:STCO.0000035301.49549.88

Solis, M. M., and Doupe, A. J. (1999). Contributions of tutor and bird's own song experience to neural selectivity in the songbird anterior forebrain. *J. Neurosci*. 19, 4559–4584.

Stepanek, L., and Doupe, A. J. (2010). Activity in a cortical-basal ganglia circuit for song is required for social context-dependent vocal variability. *J. Neurophysiol*. 104, 2474–2486. doi: 10.1152/jn.00977.2009

Wallace, D. J., Meyer, S., Astori, S., Yang, Y., Bausen, M., Palmer, A. E., et al. (2008). Single-spike detection *in vitro* and *in vivo* with a genetic Ca^{2+} sensor. *Nat. Methods* 5, 797–804. doi: 10.1038/nmeth.1242

Williams, H., and Nottebohm, F. (1985). Auditory responses in avian vocal motor neurons: a motor theory for song perception in birds. *Science* 229, 279–282.

## Appendices

### A1. Gradient Descent Derivation of Eligibility-Weighted Hebbian Learning

We can derive the learning rule in Equation 1 by gradient (steepest) descent on an error function E. The differential change δ**V** in synaptic weight is proportional to the gradient and we can write:

The error function E_{i}(*t*) for neuron *i* we define as the square difference between motor activity *m*_{i}(*t* − *s*) and postdicted motor activity ${\widehat{{m}}}_{{i}}{(}{t}{)}{=}{\displaystyle {{\sum}}_{{k}}{{\text{V}}}_{{i}{k}}}{{a}}_{{k}}{(}{t}{)}$, weighted by the eligibility associated with the time lag *s*.

The total error is simply the sum of errors over all neurons E = ∑_{i}*E*_{i}.

By taking the gradient with respect to the *i*, *j* th weight only we find

Assume a normalized eligibility trace (∫_{0}^{∞}*e*(*s*)ds = 1):

#### A1.1. Extension to non-linear network

Note that our linear approach can be extended by introducing a nonlinear function *f* in the auditory to motor mapping in Equation 2:

Where *f*_{i} = *f*(∑_{k}V_{ik} *a*_{k}(*t*)) and *f*′_{i} = (∑_{k}V_{ik}*a*_{k}(*t*))/dV_{ij}.

#### A1.2. Probabilistic derivation of hebbian learning rule

We derive a version of the Hebbian learning rule in Equation 1 that is based on the following probabilistic Boltzmann neuron model. For simplicity, we do not include the time dependence in the derivation (τ = 0). The auditory feedback response *a* given a motor activation *m* is given by the conditional probability

parameterized by the matrix **Q**, which is the motor-sensory mapping (as before) and where

is the partition function. The posterior probability of *m* is given by

In a sensory state, auditory responses in motor neurons are driven via synapses **V** according to the probabilistic model:

with partition function

The error function in Equation 2 is replaced by the Kullbach-Leibler (KL) divergence between P_{Q}(*m*|*a*) and P_{V}(*m*|*a*):

Before taking the derivative of D_{KL} we compute the derivative of the partition function:

based on which it follows that

Using this relationship and Equation A3 we can calculate the derivative of the KL-divergence with respect to V_{ij}:

Thus, the gradient decent leads to

This is the probabilistic analog of Equation A2, in which the silent postdictive motor activity ${\widehat{{m}}}_{{i}}$ is replaced by the conditional expectation 〈*m*_{i}*a*〉_{m} of activity in motor neuron *i* given the sensory response *a*.

### A2. Correlation of Motor Activity Determines Average Synaptic Change

The average synaptic change under learning rule Equation A2 satisfies

where we have substituted *t*′ = *t* − *s*. We can write this equation as

where C_{ij}(*s*) = 〈*m*_{i}(*t*)*m*_{j}(*t* + *s*)〉 is the cross-correlation matrix of motor activity at time lag *s*. In the following we assume without loss of generality that the delay τ_{s} of synaptic transmission between auditory and motor neurons is negligibly small.

### A3. Variable Motor Code

#### A3.1. V is a causal inverse

We assume a motor code with a narrow correlation function that is non-zero only for small *t*_{0}.

Where **1** is the unity matrix and *C*_{0} is a positive constant. The steady state solution 〈δ**V**〉 = 0 of Equation A4 leads to

Assuming that the eligibility trace is constant over short time intervals of duration *t*_{0} (over which the correlation function is non-zero) yields

This implies that the auditory to motor mapping **V** is proportional to the inverse of **Q** weighted by the eligibility at time lag τ:

Thus, **V** is a causal inverse, at least when restricted to the image of **Q**.

#### A3.2. Variable motor codes are associated with large mirroring offsets

We simulate a mirroring experiment in which we cross correlate in a given neuron the motor activity *m*_{i}(*t*) and the activity *m*^{a}_{i}(*t*) that results from observation of the motor act (achieved in birds by song playback though a loudspeaker).

The auditory response in motor neuron *i* is given by

Where *δ _{i,j}* is the Kronecker-Delta, (

*δ*= 1 for

_{i,j}*i*=

*j*and

*δ*= 0 otherwise). Note that relative to song (either produced by the bird or played through the loudspeaker) the playback-evoked activity

_{i,j}*m*

^{a}

_{i}(

*t*) is shifted with respect to the motor activity

*m*

_{i}(

*t*− τ) by a time shift τ (as illustrated in Figure 4). The cross correlation Corr(

*s*) between sensory-evoked and motor generated activity is defined as,

Where *T*′ is the duration of the motor behavior (e.g., the song motif or song). Inserting the expression for the motor activity evoked by the auditory response into Equation A6 yields,

Thus, the cross correlation is nonzero in a small time centered around τ. The peak cross-correlation value is given by the eligibility trace at time lag τ.

Note that our calculations are valid in principle for mean-subtracted *m*_{i}(*t*). In case of non-mean subtracted *m*_{i} we should replace the cross correlation in Equation A6 by the cross covariance to obtain the same findings. However, in practice, mean subtraction is not necessary because the peak location (the mirroring offset) is independent of the mean.

### A4. Stereotyped Motor Codes

#### A4.1. V is a predictive inverse

We describe the motor activity by a traveling pulse ${{m}}_{{i}}{(}{t}{)}{=}{\eta}{\text{}}{\left(}\frac{{i}}{{\omega}}{-}{t}{\right)}$ with speed ω, where

and *t*_{0} = 1/ω. The cross-correlation matrix for such a traveling pulse is a triangular pulse of height *t*_{0}/*T* and width 2*t*_{0} which we approximate by a square pulse of width ${{t}}_{{0}}{,}{{\text{C}}}_{{i}{j}}{(}{s}{)}{\simeq}\frac{{{t}}_{{0}}}{{T}}{\eta}{\text{}}{\left(}\frac{{i}{-}{j}}{{\omega}}{+}{s}{\right)}$ in the following (illustrated in Figure 3D). To facilitate comparison of inverses associated with stereotyped and variable motor codes, we assume their peak correlations are identical, i.e.,

At a steady state 〈δ**V**〉 = 0, Equation A4 yields

Assuming again that the eligibility trace is constant over short time intervals of width *t*_{0}, we find

Given sensory input *a*_{j}(*t*) and the inverse map **V**, the auditory-evoked activity *m*^{a}_{j}(*t*) is proportional to the eligibility trace:

defined for ${t}{\ge}\frac{{i}}{{\omega}}$.

By approximating the eligibility trace only by its maximum value ${e}{\left(}{t}{-}\frac{{i}}{{\omega}}{\right)}{=}{e}{(}{0}{)}{\eta}{\left(}{t}{-}\frac{{i}}{{\omega}}{\right)}$ we have that approximately

and so the playback-evoked activity *m*^{a}_{i}(*t*) is roughly identical to motor activity *m*_{i}(*t*) (as illustrated in Figure 4).

To compute the matrix **V** we use the same approximation for the eligibility trace to obtain

where we have set ω = 1, and where **H** is a shifter matrix (also called cyclic permutation or circulant matrix), e.g., for *n* = 4:

With this approximation we find that the synaptic mapping

is the inverse of the motor map shifted in time by τ, i.e., **V** maps sensory activity evoked by motor activity at time *t* onto motor activity at time *t* + τ. In other words, the sensory lag is compensated and sensory-evoked motor activity at time *t* predicts motor activity at time *t*. Hence, **V** is a predictive inverse.

#### A4.2. Stereotyped motor codes are associated with small mirroring offsets

Based on the sensory-evoked activity *m*^{a}_{i}(*t*) derived in Equation A8 we find for the cross-correlation function Corr(*s*) between motor activity *m*_{i}(*t*) and sensory-evoked activity *m*^{a}_{i}(*t*):

Thus, the cross-correlation function is proportional to the eligibility trace. If the eligibility trace is monotonically decaying we find that the peak of Corr(*s*) occurs at *s* = 0 and is given by

In other words, stereotyped neural codes are associated with zero mirroring offsets.

The ratio of peak cross correlation for variable (A6) and stereotyped (A10) motor codes is given by

implying that the more stereotyped a neural code, the stronger is the observed mirroring effect.

### A5. Mirrored Response Strength in a Probabilistic Model

In the following we define a probabilistic model of a motor neuron that allows us to compute the mirroring strength, i.e., the correlation between motor activity and sensory-evoked activity. We assume a minimal model in which a neuron has only two states *R* = 1 (active) and *R* = 0 (inactive). In addition, we assume two behavioral states *B* = 1 (behavioral feature present), and *B* = 0 (behavioral feature absent). During motor production, the degeneracy of the motor code quantified by the conditional probability of neural activity given that the feature of interest is present during the behavior (e.g., the finger is extended or the song pitch is high) is

and the probability that the neuron is active while the behavioral feature is absent (intrinsic noise) is

Hence, the average motor response [for prior *P*(*B* = 1) = 1/2] is given by

In the sensory state (during observation of the behavior), the reliability of a response quantified by the conditional probability of triggering a sensory response given presence of the behavioral feature in the stimulus is given by,

and the probability of a sensory response without the behavioral feature (intrinsic noise) is:

The parameters *p*_{1}, *p*_{2}, *q*_{1}, and *q*_{2} can be freely chosen in this minimal model, for example *q*_{2} = *p*_{2} if intrinsic noise in sensory and motor states are assumed to be equal.

The average response in the sensory state is given by

The correlation between sensory and motor responses in this cell is

And, the correlation coefficient between motor- and sensory-evoked response is

We can discuss the following special cases:

- perfect sensory tuning (
*q*_{1}= 1,*q*_{2}= 0, no instrinsic noise in sensory state):${\text{CorrCoeff}}{=}\frac{{{p}}_{{1}}{-}{{p}}_{{2}}}{{2}\sqrt{{p}{(}{1}{-}{p}{)}}}$ - same tuning and same intrinsic noise in motor and in sensory states (
*q*_{1}=*p*_{1},*q*_{2}=*p*_{2}):${\text{CorrCoeff}}{=}\frac{{{\left(}{{p}}_{{1}}{-}{{p}}_{{2}}{\right)}}^{{2}}}{{4}{p}{\left(}{1}{-}{p}{\right)}}$

In summary, the strength of mirrored responses scales linearly or quadratically with the contrastive probability that neural responses are locked to the behavioral feature vs. spontaneously driven.

Keywords: mirror neurons, inverse problem, linear models, songbird, sensory motor learning

Citation: Hanuschkin A, Ganguli S and Hahnloser RHR (2013) A Hebbian learning rule gives rise to mirror neurons and links them to control theoretic inverse models. *Front. Neural Circuits* **7**:106. doi: 10.3389/fncir.2013.00106

Received: 01 November 2012; Accepted: 15 May 2013;

Published online: 19 June 2013.

Edited by:

Ahmed El Hady, Max Planck Institute for Dynamics and Self Organization, GermanyCopyright © 2013 Hanuschkin, Ganguli and Hahnloser. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.

*Correspondence: R. H. R. Hahnloser, Institute of Neuroinformatics, University of Zurich and ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland e-mail: rich@ini.phys.ethz.ch

## COMMENTARY