Edited by: Tobias H. Donner, University Medical Center Hamburg-Eppendorf, Germany
Reviewed by: Markus Siegel, University of Tübingen, Germany; Asif A. Ghazanfar, Princeton University, United States
*Correspondence: Bryan C. Daniels
This article was submitted to Decision Neuroscience, a section of the journal Frontiers in Neuroscience
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
A central question in cognitive neuroscience is how unitary, coherent decisions at the whole organism level can arise from the distributed behavior of a large population of neurons with only partially overlapping information. We address this issue by studying neural spiking behavior recorded from a multielectrode array with 169 channels during a visual motion direction discrimination task. It is well known that in this task there are two distinct phases in neural spiking behavior. Here we show Phase I is a distributed or incompressible phase in which uncertainty about the decision is substantially reduced by pooling information from many cells. Phase II is a redundant or compressible phase in which numerous single cells contain all the information present at the population level in Phase I, such that the firing behavior of a single cell is enough to predict the subject's decision. Using an empirically grounded dynamical modeling framework, we show that in Phase I large cell populations with low redundancy produce a slow timescale of information aggregation through critical slowing down near a symmetry-breaking transition. Our model indicates that increasing collective amplification in Phase II leads naturally to a faster timescale of information pooling and consensus formation. Based on our results and others in the literature, we propose that a general feature of collective computation is a “coding duality” in which there are accumulation and consensus formation processes distinguished by different timescales.
The nervous system is a distributed information processing system. Functional encodings have been identified at the level of single cells (e.g., Shadlen and Newsome,
Here we ask how coherent output is produced when neurons in a relevant target population have different “opinions” about an input and are not coordinated by a “Deus Ex Machina” or central controller (e.g., Gazzaniga,
We show that these two views and the data supporting them can be reconciled by framing the problem of coherent output as one of
We use data from a well-known experimental paradigm, the Random Dot Motion discrimination task (RDM) (Shadlen and Newsome,
Timing of trial events. A monkey is trained to discriminate opposed directions of motion in a random dot display and to report the perceived direction with an eye movement (saccade) to one of two visual targets. In each trial the subject is presented with the visual stimulus of dots drifting left or right across the screen for a fixed duration. Once the dots disappear, and after a delay, a “go” cue is given to prompt subjects to indicate their decision about the direction in which the dots are moving—by looking either to the left or right—with a mean reaction time of 245 ms.
The measured neural activity is qualitatively different before and after the go cue, demarcating two time intervals that we call Phase I and Phase II (Figure
The causal pathway for perceptual decisions in the primate brain is still debated. The lateral intraparietal cortex (LIP) has been a contender as a causal decision-making locus, as it demonstrates accumulation of perceptual evidence (Shadlen and Newsome,
Here, we use data from one of these closely related areas, area 8Ar in dorsolateral prefrontal cortex. Area 8Ar, like LIP, carries information about planned saccades in direction discrimination tasks (Kim and Shadlen,
At the most abstract level, decision-making can be fitted using a variety of continuous or discrete one-dimensional random walks or diffusion models with fixed or variable thresholds (Gold and Shadlen,
Most of these models assume a single phase in which information is accumulated. They do not, however, explicitly consider the collective properties of this accumulation—is the information about the decision localized in individual neurons or encoded at the population level? Furthermore, is accumulated information shared or transmitted across the population of neurons? The observation that neuronal behavior is qualitatively different before and after the go cue (see Figures
The amount of information that individual neural units encode about the decision output varies strongly over units and over trial time.
Existing decision-making literature tends to neglect neural behavior after the go cue, treating it as “choice execution.” We argue that post-go-cue behavior is an extension of decision making at the system level and view the process between the go cue and the saccade as essential to collective decision-making—“reading out” the information that is, before this point, only available by pooling information from many cells.
In this paper we propose, building on previous work (Flack,
Hence in Phase I (slow aggregation) we propose information is acquired through a process of sensory accumulation. To improve the reliability of the information given noisy input and propensity for error at the component level, a sum or other integration is performed at the population or subpopulation level. This is essentially crowd-sourcing. In the measured neurons, this happens during the stimulus presentation and delay period (Figure
In Phase II (fast propagation) we propose that information at the level of units in Phase I is propagated quickly across a population of cells that may or may not have participated in Phase I. The outcome of propagation is neural consensus in so far as it results in the decision being encoded in each individual neuron. This consensus allows the system to act.
In our study system we find evidence for both Phase I and Phase II. Our results suggest Phase II occurs post-go-cue and is achieved through increased information amplification and sharing.
Finally, we develop a dynamical rate model that explains this behavior in terms of varying distance from a symmetry-breaking transition. In the simplest form of the model, this distance is controlled using a time-varying recurrent excitation among informative neurons. The model demonstrates a fundamental connection between timescales and redundancy, with the formation of a collective slow timescale requiring a population with lower informational redundancy.
First, we quantify how much information about the decision is encoded in individual neural firing rates. We find substantial heterogeneity over neurons and as a function of time. As shown in Figure
Figure
We next assess whether information about the decision is encoded collectively at the whole population level or within a subpopulation and how this quantity compares to the information encoded at the individual neuron level.
An encoding based on Linear Discriminant Analysis (LDA) allows us to verify that the population encodes more information than any single neural unit by producing a lower bound on the mutual information encoded jointly by the entire population (see Methods). As shown in Figure
Neurons encode information about the output decision, with distinct dynamics in Phase I (pre-go cue) and Phase II (post-go cue).
Interestingly, at the same time that the collective information jumps to its maximum value, there is a switch in the distributed nature of the encoding: many individual units become highly informative in Phase II, providing redundant information. This contrasts with Phase I, in which much more information is contained at the population level than in any single unit. To quantify the redundancy of the encoding, we ask how many units need to be included in the LDA encoding in order to reach 95% of the maximal collective predictive power. As shown in Figure
At the time of the physical implementation of the decision, encoding shifts: Phase I resembles a population code, whereas in Phase II individual rates are predictive.
Figure
A subset of neurons collectively encodes information about the decision in Phase I, and more neurons contain information in Phase II. Information about the decision gradually builds during Phase I in some neurons (Class H; red), whereas in other neurons (Class M; orange), little information is present until just before the saccade representing the decision (mean saccade time indicated by dotted line). Class L units (blue) always contain little information.
Qualitative summary of informational properties of each neural class.
Class | Phase I | Phase II
H | Synergistic, slow growth | Redundant, fast growth
M | Uninformative | Redundant, fast growth
L | Uninformative | Uninformative
We also find information at the population level specifically about the input stimulus, but it is small compared to the information about the decision, and is significant only during Phase I. This can be seen in
Neurons encode information about the input stimulus, but much less than about the output decision. Analogous plots to Figure
We explore the relationship between timescales of information accumulation, memory, and informational redundancy using a simple dynamical model. We start simply by representing individual neurons as having a state that (1) is persistent on the timescale of tens of ms, (2) transiently affects the states of other neurons via a firing rate that saturates as a function of the current state, and (3) is subject to random noise. The model consists of
Here,
Our simple dynamic rate model produces behavior that is critically dependent on the degree of recurrent excitation, controlled by
Long timescales and low redundancy occur near the collective transition at which memory storage becomes feasible. We plot three aspects of the collective behavior produced by a simple distributed, dynamic rate model as a function of recurrent excitation and noise, for
The degree of recurrent excitation
First, the relevant timescale for motion along the decision direction
Secondly, sufficiently close to the threshold, noisy individual cells are only weakly constrained to have a similar state as the others; whereas summing over many cells can reliably predict the behavior of the whole, individual cells do not encode much information. This corresponds to the “synergistic” state of Phase I. As recurrent excitation increases, consensus is more strongly enforced, leading to individuals containing more information. This corresponds to the “redundant” state of Phase II. This dependence is demonstrated in Figure
The combination of low redundancy and slow dynamics therefore suggests that Phase I should correspond to
Decision-making model performance decreases if recurrent excitation is large throughout the entire trial. The strength of the model's input signal
Dual coding through critical slowing down. During Phase I the state of cells is determined largely by the coherence of the input signal,
Additionally, the model explains how neurons responsible for the decision can encode a relatively small amount of information about the input stimulus coherence. As the attractors representing the decision do not depend on the coherence, information is only contained in the speed with which those attractors are approached. This speed is in turn related to the magnitude of the input current (
Using a simple linear relationship between stimulus coherence and input
In this study we used information theory and the theory of collective phenomena to analyze time series data from a microelectrode array capturing 169 neural channels in the prefrontal cortex area 8Ar of a macaque monkey. This is a well-studied area of the brain that has been shown to play an important role in both visual decision-making and motor behavior (Kim and Shadlen,
In the neural time series studied here, the idea of Phase I as information accumulation is in good agreement with many prior studies of MT, LIP, and prefrontal cortex in which information is integrated at the population level (essentially through crowd-sourcing) in order to increase the accuracy of a decision (Kiani et al.,
Neural behavior post-Phase I has received less attention. Our results suggest a second phase, during which a large subset of cells becomes correlated and acquires redundant information extremely rapidly. This phase of “consensus formation,” in which information rapidly spreads from the “knowledgeable” neurons to many neurons, dramatically increases redundancy in the system. Our simple rate model accomplishes this switch by changing the degree of recurrent excitation, but it could alternatively be controlled by external inputs to the circuit through a perturbation that moves the system away from the symmetry-breaking transition.
Our results suggest investigating other forms of neural decision making to look for similar dynamic consensus phenomena. We expect the separation of timescales between Phases I and II to be most clear in cases involving a gradual accumulation of evidence, such as comparing two extended auditory signals (Erlich et al.,
Although Phase II, and consensus formation more generally, has received little attention in neuroscience, the role of consensus formation in collective computation has been a focus in the study of social processes. For example, search engines and auctions illustrate both slow accumulation and fast consensus (e.g., Leise,
Collective computation based on information accumulation and consensus formation has also been observed in the formation of power structures in primate societies (Flack and Krakauer,
In all three examples (neural, search, power) accumulation is slow and consensus is fast. In the power example it has additionally been shown that an advantage of this timescale separation in collective computation is that it produces a slowly changing yet accurate power structure that serves as a reliable “background” against which individuals can, on a fast timescale, tune strategies quickly and effectively (Flack,
One additional important difference between the neural case and the social cases is that in the social cases both accumulation and consensus can be occurring simultaneously but on different timescales. In the neural case presented here, accumulation (Phase I) occurs first with consensus (Phase II) following, but this may be an artifact of the experimental setup with an externally forced go-cue.
In large systems that are processing information from multiple sources it is difficult to conceive of any way of achieving an efficient, accurate, coordinated representation of environmental regularities other than through a dual-process dynamic. This is because (1) it takes time to integrate information from noisy sources, and (2) not all cells have equal access to information and therefore must acquire input from informed cells. We refer to this requirement as “coding duality,” as it implies a shift from an emphasis on populations of cells pooling resources in Phase I to single cells in possession of all adaptive information through consensus mechanisms in Phase II.
These results help clarify the debate between proponents of the modern neuron-doctrine and distributed-representation theory (Bowers,
In prior studies the representational status of areas 8Ar and LIP has remained ambiguous, and is often described as partly sensory and partly motor. We find that whereas spiking activity in 8Ar cells is strongly predictive of saccadic eye-movements, there is little residual information concerning the visual stimuli (Figures
Yet this does not rule out the measured neurons being part of a group of similar cells that are collectively fully responsible for the decision. In the rate model, the simulated cells are fully responsible for the decision but measuring a subset of the cells reveals only a small amount of information about the input signal—and even this information is quickly lost once the system reaches an attractor state representing the decision. Thus, because we have data on only a small fraction of all neurons in these areas, it is feasible that 8Ar neurons as a whole could be solely responsible for mapping sensory data onto the decision.
Many previous studies that attempt to model the perceptual decision-making system (e.g., Wang,
These neural classes are typically identified from electrode data by differences in firing rate, spiking waveform, burstiness, and refractory period (e.g., Csicsvari et al.,
In Table
Related perceptual binary decision-making models.
Drift diffusion (Bogacz et al., | +ξ | Threshold
Leaky integrator (Kiani et al., | +ξ | − | Threshold
Ornstein-Uhlenbeck (OU) (Usher and McClelland, | +ξ | − | + | Threshold
Distributed saturating OU [Equation (1)] | +ξ | − | Attractor
Distributed spiking models (Wang, | Detailed spiking dynamics | Attractor
The simplest models are most analytically tractable, abstracting away both the distributed nature of the computation and mechanisms for saturation and persistent memory, beginning with the simplest drift diffusion model (Bogacz et al.,
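As a concrete illustration, a minimal drift-diffusion trial can be simulated in a few lines. The drift gain, noise level, and threshold below are our own illustrative choices, not fitted values from the experiment.

```python
import numpy as np

# Minimal drift-diffusion sketch: accumulated evidence x drifts at a rate
# proportional to stimulus coherence, plus noise, until it crosses a fixed
# threshold. All parameter values here are illustrative.
rng = np.random.default_rng(4)

def ddm_trial(coherence, drift_gain=0.2, sigma=1.0, threshold=30.0, dt=1.0):
    x, t = 0.0, 0
    while abs(x) < threshold:
        x += drift_gain * coherence * dt + sigma * np.sqrt(dt) * rng.normal()
        t += 1
    return np.sign(x), t  # choice (+1 or -1) and decision time in steps

trials = [ddm_trial(coherence=0.32) for _ in range(200)]
choices = np.array([c for c, _ in trials])
times = np.array([t for _, t in trials])
accuracy = np.mean(choices == 1)  # fraction of trials choosing the drift direction
```

Higher coherence raises the drift rate and hence both speed and accuracy; with zero coherence the model chooses at chance.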
However, models that do not include decay or saturation of firing rates cannot explain the loss of information about the coherence of the input late in the trial (Figure
Instead, the loss of information about coherence and the loss of susceptibility can be parsimoniously explained if the decision process involves approaching stable attractors that store the decision during the delay period. Including feedback is one way to produce multiple attractors. In the simplest case, this leads to the Ornstein-Uhlenbeck (OU) model (Busemeyer and Townsend,
None of these low-dimensional models, however, address the issue of how the computation is distributed over multiple cells. The noisiness of individual neurons likely necessitates larger populations of neurons in order to accumulate persistent information over longer timescales.
Existing models that incorporate collections of cells include those with populations of spiking neurons (Wang,
The model presented in Equation (1) captures the details necessary to describe the distributed nature of the computation but abstracts away most details of neurobiology. It can be viewed as a distributed implementation of an Ornstein-Uhlenbeck model (Busemeyer and Townsend,
The results we find here in the specific case of the random dots task in prefrontal cortex hint at more general design principles for decision-making and collective computation. For instance, much attention has been paid to the idea that it may be beneficial for collective information processing systems to exist near a symmetry-breaking transition, or critical point (Langton,
Mutual information is the information shared by any two streams of data
In these analyses, we employ rates of neural firing averaged over time bins of length 200 ms (assuming rate encoding and ignoring details about the precise timing of spikes) in order to calculate dependencies among neural and behavioral states using mutual information. Entropies are estimated using the NSB method (Nemenman et al.,
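A simple plug-in estimate of such a mutual information can be sketched as follows; the NSB estimator used in the analysis additionally corrects for sampling bias, and the synthetic spike counts and rates below are invented for illustration.

```python
import numpy as np

def plugin_mi_bits(x, y):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                px, py = np.mean(x == xv), np.mean(y == yv)
                mi += pxy * np.log2(pxy / (px * py))
    return mi

# Synthetic example: spike counts in a 200 ms bin whose rate depends on a
# binary decision variable (rates chosen arbitrarily for illustration)
rng = np.random.default_rng(2)
decision = rng.integers(0, 2, 2000)
counts = np.minimum(rng.poisson(3 + 4 * decision), 12)  # cap the state space

mi = plugin_mi_bits(counts, decision)  # positive: counts predict the decision
```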
To combine neural firing rates into a collective encoding, we use LDA. Given neural rate data from many trials and classification of those trials as left decisions and right decisions, LDA attempts to find the linear combination of neural rates that is most informative of the class (left or right).
LDA makes the simplifying assumption that data from each class is produced by a multidimensional Gaussian specified by the observed mean
produces a number designed to be informative about the class from which the data vector came (with maximal performance guaranteed when
LDA simultaneously provides a framework for predicting the output given rate data. Looking at any given set of rates
Thus, positive
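A hand-rolled version of this read-out, on synthetic rate data, might look like the following; the unit count, baseline rates, and the fixed class shift of 0.8 are all invented for illustration.

```python
import numpy as np

# LDA read-out sketch on synthetic "firing rate" data for two decision classes.
rng = np.random.default_rng(1)

n_units, n_trials = 10, 400
base = rng.uniform(3.0, 8.0, n_units)  # baseline rate of each unit (arbitrary)
rates_left = rng.normal(base, 1.0, (n_trials, n_units))
rates_right = rng.normal(base + 0.8, 1.0, (n_trials, n_units))

# Class means and pooled within-class covariance (LDA's shared-Gaussian assumption)
m_l, m_r = rates_left.mean(axis=0), rates_right.mean(axis=0)
centered = np.vstack([rates_left - m_l, rates_right - m_r])
cov = centered.T @ centered / (2 * n_trials - 2)

# LDA weight vector w = cov^-1 (mu_right - mu_left); the sign of the
# projection relative to the class midpoint predicts the decision
w = np.linalg.solve(cov, m_r - m_l)
midpoint = w @ (m_l + m_r) / 2
accuracy = np.mean(np.concatenate([
    rates_left @ w < midpoint,
    rates_right @ w > midpoint,
]))
```

In practice a library implementation such as scikit-learn's LinearDiscriminantAnalysis would typically be used, but the computation reduces to the few lines above.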
Equation (1) describes a dynamic Hopfield network (Hopfield,
We imagine that we measure some subset
Although this simplified picture does not include spiking, models that include spiking have been shown to produce similar behavior (Wang,
Experimental data were provided by Roozbeh Kiani and William Newsome. Data were gathered in accordance with the recommendations of the National Institutes of Health Guides for the Care and Use of Laboratory Animals. The protocol was approved by the Stanford University Animal Care and Use Committee (IACUC number 9720). The data is a subset of that described in Kiani et al. (
The physical locations of measured neural units. Colored circles indicate units that have significant mutual information with the decision sometime before the go cue (Class H, left) and only after the go cue (Class M, right). Stacked circles indicate multiple neural units detected by a single electrode.
In a continuous-time model of neural activity (Hopfield,
where the timescale of the cell returning to equilibrium in the absence of other signals is set by τ
Adding a Gaussian noise term ξ, with
The timescale τ is set at 10 ms to match the order of magnitude of the characteristic timescales of synaptic receptors (Moreno-Bote and Parga,
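An Euler-Maruyama simulation of a model of this type might look like the following sketch; the symbols J, I_in, and sigma, the noise scaling, and the reduced population size are our own illustrative choices rather than the paper's fitted parameters.

```python
import numpy as np

# Sketch of a distributed saturating rate model with recurrent excitation.
rng = np.random.default_rng(0)

N = 50        # number of model neurons (illustrative; smaller than the paper's)
tau = 10.0    # single-cell relaxation timescale, ms
dt = 1.0      # integration step, ms
J = 1.1       # recurrent excitation strength (assumed symbol and value)
I_in = 0.03   # constant input current favoring one decision (assumed)
sigma = 0.1   # noise amplitude (assumed)

x = np.zeros(N)  # persistent state of each cell
for _ in range(2000):  # simulate 2 s
    # Euler-Maruyama step: leak toward 0, saturating recurrent drive, input, noise
    recurrent = (J / N) * np.sum(np.tanh(x))
    drift = (-x + recurrent + I_in) / tau
    x = x + drift * dt + sigma * np.sqrt(dt / tau) * rng.normal(size=N)

# With J above the symmetry-breaking threshold, the population settles into
# a consensus state whose sign encodes the decision
decision = np.sign(np.mean(x))
```

Lowering J below the transition makes the collective state decay back toward zero, so the memory of the input is lost; above it, one of the two attractors is selected and held.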
Importantly, we expect the qualitative features of the relationship between timescales and redundancy to be insensitive to the exact form of the dynamics. These features are produced near any collective transition displaying critical slowing down, as described below.
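The link between proximity to the transition and slow collective dynamics can be made explicit in a mean-field reduction (the notation here is ours): writing m for the population-average state and J for recurrent excitation, τ dm/dt = −m + J tanh(m) linearizes near m = 0 to a relaxation time τ/(1 − J), which diverges as J approaches the transition at J = 1.

```python
# Critical slowing down in the mean-field reduction (notation is ours):
# tau * dm/dt = -m + J * tanh(m) linearizes near m = 0 to a relaxation
# time of tau / (1 - J), which diverges as J -> 1 from below.
tau = 10.0  # ms, matching the single-cell timescale used in the text

def relaxation_time(J):
    return tau / (1.0 - J)

times = {J: relaxation_time(J) for J in (0.5, 0.9, 0.99)}
# approximately 20 ms, 100 ms, and 1000 ms: the closer to the transition,
# the more slowly the collective variable integrates (and forgets)
```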
As the strength
As can be seen by measuring the stability of the
The remaining parameters of the model can be set by matching the qualitative characteristics of informational dynamics observed in the data (see Figures
Second, because the dynamics is only slightly perturbed by the input (Figure
Mutual information with respect to the output decision, an information-theoretic measure of predictive power. Here we plot the mutual information with respect to the output decision
Finally, we set the form of the external input
Stochastic dynamic model parameters used throughout the paper, unless otherwise specified.
τ | 10 ms |
500 | |
5 | |
0.03 | |
1.1 | |
1.5 | |
Γ | 0.16 |
We emphasize that there are very likely other sets of parameters that equally well match the qualitative features of the data, so the values of the individual parameters are not meant to represent best choices to be used in other contexts. For instance, moving along the transition line in Figure
The model simulation recapitulates the timing of trials. In Phase I, with
The mutual information is a standard measure of dependence in information theory. Calculating mutual information between random variables
Intuitively,
Estimating the mutual information between the collective firing pattern of the measured neurons and a behavioral variable requires estimating the entropy of each (Equation 9). The entropy of the neural firing is difficult to calculate because it has many dimensions and thus many possible states.
If we hypothesize a specific encoding, however, we can use the Data Processing Inequality to produce a lower bound on the mutual information. LDA constitutes such a hypothesis; by transforming the neural rate data
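The logic can be illustrated with a toy example: any deterministic summary of the population pattern (here, a sum over two synthetic "neurons") can only lose information, so its mutual information with the decision is a valid lower bound on the population information. The rates and trial counts below are invented.

```python
import numpy as np

def plugin_mi_bits(x, y):
    # plug-in mutual information (bits) of the empirical joint distribution
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                mi += pxy * np.log2(pxy / (np.mean(x == xv) * np.mean(y == yv)))
    return mi

rng = np.random.default_rng(3)
y = rng.integers(0, 2, 2000)   # binary decision
r1 = rng.poisson(2 + 3 * y)    # two synthetic informative neurons
r2 = rng.poisson(2 + 3 * y)

mi_joint = plugin_mi_bits(100 * r1 + r2, y)  # full joint rate pattern
mi_summary = plugin_mi_bits(r1 + r2, y)      # a deterministic encoding of it
# Data Processing Inequality: mi_summary can never exceed mi_joint
```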
In Figure
Neural rates are calculated using a bin width of 200 ms for measuring information about the decision (Figure
To test the amount of information present about coherence of the input, we split the trials into “strong coherence” trials (coherence value equal to 0.08, 0.16, or 0.32; 820 trials; relative frequency 0.461) and “weak coherence” trials (coherence value equal to 0, 0.01, 0.02, or 0.04; 958 trials; relative frequency 0.539). Mutual information and out-of-sample prediction are measured with respect to the binary classification of strong vs. weak, leading to a maximum possible mutual information of −0.461log0.461−0.539log0.539 = 0.996 bits. Previous studies have found that variation in firing rates is opposite in sign depending on the output direction (Shadlen and Newsome,
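The quoted ceiling is simply the entropy of the binary strong/weak label at the observed trial frequencies:

```python
import numpy as np

# Entropy of the binary strong/weak coherence label at the observed trial
# frequencies; this caps the mutual information any encoding can achieve.
p_strong, p_weak = 0.461, 0.539
h_max = -p_strong * np.log2(p_strong) - p_weak * np.log2(p_weak)
# h_max is approximately 0.996 bits, matching the value quoted in the text
```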
In Figure
To measure the number of units needed to reach a certain predictive performance, we add neurons one at a time ordered by their individual mutual information with the output at the given time (calculated with bin width 200 ms). Plotted lines indicate means and shaded regions indicate standard deviations over 20 realizations of in-sample and out-of-sample partitioning, as described above with regard to Figure
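The counting procedure can be sketched on synthetic data. Here a simple nearest-class-mean decoder stands in for the LDA read-out, and units are ranked by individual decoding accuracy rather than mutual information, purely for illustration.

```python
import numpy as np

# Greedy redundancy measure: rank units by individual informativeness, then
# count how many must be pooled to reach 95% of the full population's
# performance. Data and decoder here are illustrative.
rng = np.random.default_rng(5)

n_units, n_trials = 20, 1000
y = rng.integers(0, 2, n_trials)
signal = np.linspace(1.5, 0.0, n_units)  # informativeness decays across units
rates = rng.normal(0.0, 1.0, (n_trials, n_units)) + np.outer(2 * y - 1, signal)

def decoder_accuracy(units):
    # nearest-class-mean decoder restricted to the chosen units
    X = rates[:, units]
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = ((X - m0) ** 2).sum(axis=1) > ((X - m1) ** 2).sum(axis=1)
    return np.mean(pred == y)

order = np.argsort([-decoder_accuracy([i]) for i in range(n_units)])
full = decoder_accuracy(list(range(n_units)))
n_needed = next(k for k in range(1, n_units + 1)
                if decoder_accuracy(list(order[:k])) >= 0.95 * full)
```

In a highly redundant code n_needed stays near 1; in a distributed code it grows with the population size.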
Though it is intuitive, our measure of the number of individual units needed to reach the collective performance may not be ideal for future experiments in that we expect it to become uninformative for large
estimating the collective and individual mutual informations in the same way as in Figure
Informational redundancy decreases during Phase I and increases during Phase II. Redundancy is measured by Equation 10, with similar dynamics displayed in
Finally, we would like to test directly the extent to which the change in redundancy happens before or after the saccade. Aligning by the saccade time instead of the go cue, and using a smaller time window of 100 ms for finer temporal resolution, produces the right half of Figure
Changes to distributedness happen prior to saccade. The left plot is the same as Figure
Besides avoiding premature saturation, another argument for decreased interactions during decision-making comes from a “wisdom of the crowd” argument, in which noise in individual decisions is best removed from an average by having individuals cast independent votes. We do not focus on this explanation because the magnitude of the effect in general depends on the specifics of the interactions, and for some cases (e.g., each individual moves their opinion closer to the average of individuals it interacts with) has no effect on the accuracy of the decision.
For fixed means and variances, it is true that having noise correlations that are the same sign as signal correlations leads to worse performance (as in Jeanne et al.,
This study was carried out in accordance with the recommendations of the National Institutes of Health Guides for the Care and Use of Laboratory Animals. The protocol was approved by the Stanford University Animal Care and Use Committee (IACUC number 9720).
BD, JF, and DK conceptualized the study and wrote the paper. BD performed the data analysis.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This research was supported by two grants to the Santa Fe Institute from the John Templeton Foundation for the study of complexity, a grant to the Santa Fe Institute from the Templeton Foundation to study the mind-brain problem, and Templeton grant JTF number 60501, and by ARO contract W911NF-13-1-0340. The authors thank Bill Newsome and Roozbeh Kiani for use of the data and helpful discussion, and John Krakauer, Eleanor Brush, Chris Ellison, Philip Poon, and Eddie Lee for helpful discussion.
1This logic can likely be generalized to decision-making involving more than two options by using Hopfield networks that involve both positive and negative interactions. An analogous parameter to
2Too far below the threshold, information is quickly forgotten, and too far above the threshold, a decision is made prematurely.
3The exact scaling with
4The duration of this delay period was randomized in the experiment, taking values between 300 and 1500 ms, in order to avoid anticipation of the go cue. We do not include this variation in our simulations as we do not expect it to affect the results—the behavior of the model during the delay period is largely static.
5In the experiment, the coherence level was chosen at random from this set for each trial, so the number of trials per coherence was not precisely uniform.
6Measuring entropy in bits corresponds to taking logarithms with base 2 in the following formulas.
7Even if we bin the spikes into time intervals and create binary data, to guarantee a good estimate of the entropy, we need a number of samples